
Math 240: Calculus III

Edvard Fagerholm
[email protected]

June 20, 2012

Contents

1 About
  1.1 Content of these notes
  1.2 About notation

2 Vector Spaces
  2.1 Vectors
  2.2 Definition of a vector space
  2.3 Span of vectors
  2.4 Linear independence of vectors
  2.5 Dimension
  2.6 Generalizing our Definition of a Vector (optional)

3 Matrices
  3.1 Introduction
  3.2 Matrix Algebra
  3.3 The transpose of a matrix
  3.4 Some important types of matrices
  3.5 Systems of linear equations and elementary matrices
  3.6 Gaussian elimination
  3.7 Rank of a matrix
  3.8 Rank and systems of linear equations
  3.9 Determinants
  3.10 Properties of the determinant
  3.11 Some other formulas for the determinant
  3.12 Matrix inverse
  3.13 Eigenvalues and eigenvectors
  3.14 Diagonalization

4 Higher-Order ODEs
  4.1 Basic definitions
  4.2 Homogeneous equations
  4.3 Nonhomogeneous equations
  4.4 Homogeneous linear equations with constant coefficients
  4.5 Undetermined Coefficients
  4.6 Variation of parameters (optional)
  4.7 Cauchy-Euler equations
  4.8 Linear Models
    4.8.1 Free undamped motion
    4.8.2 Free damped motion
    4.8.3 Driven motion

5 Systems of linear ODEs
  5.1 Basic definitions
  5.2 Homogeneous linear systems
  5.3 Homogeneous linear systems – complex eigenvalues
  5.4 Solutions by diagonalization

6 Series solutions to ODEs
  6.1 Solutions around ordinary points
  6.2 Solutions around singular points

7 Vector Calculus
  7.1 Line Integrals
  7.2 Independence of Path
  7.3 Multiple integrals
  7.4 Green's theorem
  7.5 Change of variable formula for multiple integrals
  7.6 Surface integrals
  7.7 Divergence theorem
  7.8 Stokes theorem

1 About

1.1 Content of these notes

These notes will cover all the material covered in class; what is mostly missing are pictures and examples. The course text (Zill & Cullen, Advanced Engineering Mathematics, 3rd ed.) will be very useful for more thorough and harder (read: longer) examples than the ones I'm willing to spend my time typing up. You'll also find more problems there to complement the homework that I will assign, since the more problems you do the better.

All the theory that's taught in class will be in these notes, and then some. During lecture I might sometimes skip parts of some proofs that you'll find in these notes, or instead just present the general idea by doing, e.g., a special case instead of the general one. This will usually happen when:

1. The full proof will not add anything useful to your understanding of the topic.

2. The full proof is not a computational technique that will turn out useful when solving practical problems. This is an applied class, after all.

Embedded in each section you'll find some examples. After almost every definition there will be something I would call a trivial example, which should help you check that you understand what the definition is saying. Make sure you understand them before moving on, since not understanding them is a sign that you've misunderstood something.

Each section also ends with a list of the most basic computational problems related to that topic. These are things you will be expected to perform in your sleep, so make sure you understand them. Almost any problem you will encounter in this class will reduce to solving a sequence of these problems, so they will be the "bricks" of most applications that you'll encounter.


1.2 About notation

    The following basic notations will be used in the class:

∅ = the set with no elements
N = {0, 1, 2, 3, …}
Z = {0, 1, −1, 2, −2, …}
Q = {x/y | x, y ∈ Z, y ≠ 0}
R = the real numbers
C = the complex numbers
∀ = read "for all"
∃ = read "there exists"
∈ = read "element of"
∉ = read "not element of"
s.t. = short for "such that"

For example, the sentence "there exists a real number that is larger than zero" can be written in shorthand as ∃x ∈ R s.t. x > 0. I will also assume you know the following set-theoretic notations. Given any sets A, B we may define union, intersection, difference, equality and subset. These are defined as follows:

x ∈ A ∪ B ⇔ x ∈ A or x ∈ B
x ∈ A ∩ B ⇔ x ∈ A and x ∈ B
x ∈ A \ B ⇔ x ∈ A and x ∉ B
A = B ⇔ (x ∈ A if and only if x ∈ B)
A ⊂ B ⇔ (if x ∈ A, then x ∈ B)

Notice that A = B can also be written as A ⊂ B and B ⊂ A. Finally, we sometimes use the notation

A ⊊ B,

which means that A ⊂ B but A ≠ B. In other words, B contains an element that is not in A, but all elements of A are in B.

We will also sometimes use the following notation to describe sets. Say we want to define the set of all whole numbers that are even. We can write this set as

{x ∈ Z | x is even}.

More generally, assume that P is a predicate (informally, something that's either true or false depending on what you feed it) and let A be any set. We write

{x ∈ A | P(x)}

to mean "the elements x of A s.t. P(x) is true". In our example A = Z and P(x) = "x is even". If x is constructed of multiple parts, we might sometimes write, e.g.,

{(x, y) ∈ R^2 | y = 2x},

which uses as its predicate P(x, y) the statement "y = 2x". Usually these should not lead to too much confusion, since most of the notation is quite self-explanatory.

For functions we also use the familiar notation f : A → B. This means that f is a function from A to B, i.e. A is the domain of f and B the codomain of f. In other words, given an x ∈ A, f assigns to it a unique f(x) ∈ B. In this setting one also talks about the image of f, which is defined as

Im f = {y ∈ B | ∃x ∈ A s.t. y = f(x)}.

In other words, the image consists of all the points in B onto which some element of A maps.

    2 Vector Spaces

    2.1 Vectors

In math 114 a sort of mixed notation was used for vectors. Sometimes we wrote v⃗ = 2i + j + k, while we also used the notation v⃗ = ⟨2, 1, 1⟩. We see that there's an immediate equivalence between points of R^3 and vectors. If O denotes the origin in R^3, then a point P = (2, 1, 1) defines the vector

OP⃗ = ⟨2, 1, 1⟩.

The moral of the story is that a point P = (x, y, z) represents a vector, and a point is essentially just a list of numbers. Those of you familiar with computers might have heard the term vector processing, which simply means that something operates on a list of numbers. In this spirit we will make the following definition:

Definition 2.1.1 A vector is just an n-tuple (a₁, …, aₙ) of numbers. Typically we will have (a₁, …, aₙ) ∈ R^n, i.e. each aᵢ is a real number.

Example 2.1.2 (1, 1) is a vector in R^2 and (1, 0, 1, 2) is a vector in R^4. Note that in the latter case there's really no way to "visualize" where this vector points, so it has less of a geometric meaning, since we can't visualize four dimensions.

Note 2.1.3 In applications one often encounters vectors in, say, R^100. For example, say we sample 100 numbers independently for some statistical application. The result will be a vector (a₁, …, a₁₀₀) ∈ R^100, where aᵢ is the number obtained in the ith sample. Here the vector is chosen from a 100-dimensional space. This does not make any sense visually. However, it does make sense if we think of the dimension as simply being the degrees of freedom of the experiment: each number is chosen independently of the others.

Note 2.1.4 Essentially any phenomenon that depends freely on n variables can be modeled by something called an n-dimensional vector space, which is the object of study in linear algebra. This is why linear algebra tends to be the most useful subfield of mathematics for practical applications. From a mathematical perspective it would make much more sense to teach it before calculus, but the world doesn't seem to work that way.

    2.2 Definition of a vector space

Let (a₁, …, aₙ), (b₁, …, bₙ) ∈ R^n and c ∈ R. In other words, two vectors and a real number. Write x = (a₁, …, aₙ) and y = (b₁, …, bₙ). We can add these vectors and multiply them by real numbers as follows:

x + y = (a₁, …, aₙ) + (b₁, …, bₙ) = (a₁ + b₁, …, aₙ + bₙ)

cx = c(a₁, …, aₙ) = (ca₁, …, caₙ).

In other words, add component by component and multiply each component by the constant. The real number c ∈ R will be called a scalar. This leads us to the concept of a vector space.


Definition 2.2.1 A vector space is a subset V ⊂ R^n s.t. the following conditions hold:

1. If x, y ∈ V, then x + y ∈ V.

2. If x ∈ V and c ∈ R, then cx ∈ V.

The former condition is called being closed under vector addition, while the latter is called being closed under multiplication by a scalar.

Example 2.2.2 Let n = 2, so we are looking at R^2. Let V = {(x, y) | y = 2x}. In other words, V consists of all the points on the line y = 2x. Let's show that V is a vector space by checking that conditions (1) and (2) in the definition hold. This is done as follows:

1. Let (a₁, a₂), (b₁, b₂) ∈ V be points on the line y = 2x. By definition a₂ = 2a₁ and b₂ = 2b₁. It follows that a₂ + b₂ = 2a₁ + 2b₁ = 2(a₁ + b₁). In other words,

(a₁, a₂) + (b₁, b₂) = (a₁ + b₁, a₂ + b₂) ∈ V.

2. Now let (a₁, a₂) ∈ V and c ∈ R. Since a₂ = 2a₁, we also have ca₂ = 2ca₁, so c(a₁, a₂) = (ca₁, ca₂) ∈ V.

Example 2.2.3 Let V = {(x, y) | y = 2x + 1}, i.e. the points on the line y = 2x + 1. Now (1, 3) is on the line, but (−1)(1, 3) = (−1, −3) is not. Thus V is not closed under multiplication by a scalar, so it's not a vector space.
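If you like to experiment, here is a minimal sketch in Python/NumPy (not part of the course material; the helper on_line is my own name) checking the two examples above numerically:

```python
# A minimal sketch (not from the notes) checking Examples 2.2.2 and 2.2.3.
import numpy as np

def on_line(p, intercept):
    """Is the point p = (x, y) on the line y = 2x + intercept?"""
    return np.isclose(p[1], 2 * p[0] + intercept)

u, v = np.array([1.0, 2.0]), np.array([3.0, 6.0])   # both on y = 2x

# V = {y = 2x} is closed under addition and scalar multiplication:
print(on_line(u + v, 0), on_line(-1.0 * u, 0))       # True True

# W = {y = 2x + 1} is not: the counterexample from Example 2.2.3.
w = np.array([1.0, 3.0])                             # (1, 3) lies on y = 2x + 1,
print(on_line(-1.0 * w, 1))                          # but (-1, -3) does not: False
```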

    2.3 Span of vectors

A typical problem one needs to solve is the following. Assume we are given vectors x₁, …, xₖ ∈ R^n. We want to find all the vectors x s.t.

x = c₁x₁ + … + cₖxₖ,  cᵢ ∈ R, i = 1, …, k.

Conversely, given x we want to determine whether it is of this form.

Definition 2.3.1 A vector x that can be written as c₁x₁ + … + cₖxₖ is called a linear combination of the vectors x₁, …, xₖ.


Example 2.3.2 Assume we are given one vector x = (1, 0) ∈ R^2. Then we may write (2, 0) = 2x, so (2, 0) is a linear combination of x. However, consider the vector (1, 1). Clearly,

(1, 1) ≠ cx = c(1, 0) = (c, 0)

for any choice of c ∈ R. In other words, (1, 1) is not a linear combination of x.

Example 2.3.3 Assume we are given vectors x₁ = (1, 0, 1), x₂ = (2, 1, 0), x₃ = (0, 0, 1). We want to find all the linear combinations of x₁, x₂, x₃, i.e. all x ∈ R^3 s.t.

x = c₁x₁ + c₂x₂ + c₃x₃.

Write x = (a₁, a₂, a₃). To determine if x is a linear combination, we need to solve for the unknowns c₁, c₂, c₃ s.t. the combination gives us x. This gives

(a₁, a₂, a₃) = c₁x₁ + c₂x₂ + c₃x₃
            = c₁(1, 0, 1) + c₂(2, 1, 0) + c₃(0, 0, 1)
            = (c₁, 0, c₁) + (2c₂, c₂, 0) + (0, 0, c₃)
            = (c₁ + 2c₂, c₂, c₁ + c₃).

Equating components gives us a linear system of equations:

c₁ + 2c₂ = a₁
c₂ = a₂
c₁ + c₃ = a₃.

In other words, we need to determine when this linear system of equations in the unknowns c₁, c₂ and c₃ has a solution, given a₁, a₂, a₃. We will be able to answer this question in general later in the class when we study systems of linear equations and Gaussian elimination.
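For the impatient, here is a hedged numerical sketch of Example 2.3.3. Writing the xᵢ as the columns of a matrix turns the question into exactly the linear system above; np.linalg.solve applies only because that matrix happens to be square and invertible, which means every x ∈ R^3 is a combination here:

```python
# A sketch of Example 2.3.3 in NumPy: solve A c = x for c = (c1, c2, c3),
# where the columns of A are x1, x2, x3.
import numpy as np

A = np.column_stack([(1, 0, 1), (2, 1, 0), (0, 0, 1)]).astype(float)
x = np.array([3.0, 1.0, 2.0])            # an arbitrary target vector

c = np.linalg.solve(A, x)                # works since this A is invertible
print(c)                                 # [1. 1. 1.]
print(np.allclose(A @ c, x))             # True: the combination reproduces x
```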

Definition 2.3.4 Let x₁, …, xₖ ∈ R^n be a collection of vectors. We define their span to be the set

V = {x ∈ R^n | x = c₁x₁ + … + cₖxₖ, cᵢ ∈ R, i = 1, …, k}.


In other words, V is the set of all vectors that can be written as a linear combination of x₁, …, xₖ. We denote this by

V = span(x₁, …, xₖ).

Definition 2.3.5 Suppose V ⊂ R^n is a vector space and V = span(x₁, …, xₖ) for some vectors x₁, …, xₖ ∈ R^n. Then x₁, …, xₖ is called a spanning set of vectors of V.

Note 2.3.6 The previous definition goes the other way around: we are given the vector space V, and we say that x₁, …, xₖ is a spanning set if it happens that V = span(x₁, …, xₖ).

Theorem 2.3.7 V = span(x₁, …, xₖ) ⊂ R^n is a vector space.

Proof. Let x, y ∈ V, so that

x = c₁x₁ + … + cₖxₖ,  y = d₁x₁ + … + dₖxₖ,

where cᵢ, dᵢ ∈ R are scalars. Then

x + y = (c₁ + d₁)x₁ + … + (cₖ + dₖ)xₖ ∈ V,

since this is a linear combination with scalars cᵢ + dᵢ. Similarly,

cx = cc₁x₁ + … + ccₖxₖ ∈ V.

Theorem 2.3.8 For any vectors x₁, …, xₖ ∈ R^n, we have

span(x₁, …, xₖ₋₁) ⊂ span(x₁, …, xₖ).

Proof. x = c₁x₁ + … + cₖ₋₁xₖ₋₁ = c₁x₁ + … + cₖ₋₁xₖ₋₁ + 0xₖ.

The previous result is also useful for creating larger vector spaces. Given a vector space V = span(x₁, …, xₖ), by choosing x ∈ R^n \ V, i.e. a vector not in V, we can create a strictly larger vector space V ⊊ W = span(x₁, …, xₖ, x), since W contains x.


WARNING 2.3.9 The span of no vectors is by convention defined to be the vector space containing just the zero vector 0 ∈ R^n. This comes up when defining dimension later on.

Theorem 2.3.10 Given a list of vectors x₁, …, xₙ, the following operations will not change their span:

1. Multiplying a vector in the list by a nonzero constant.

2. Exchanging two vectors in the list (thus renaming one to be the other and vice versa).

3. Replacing xᵢ by xᵢ + axⱼ for some j ≠ i, i.e. adding a scalar multiple of one vector to another.

Proof. c₁x₁ + … + cₙxₙ = c₁x₁ + … + (cᵢa^(−1))(axᵢ) + … + cₙxₙ proves (1). The others follow similar logic.

    2.4 Linear independence of vectors

We mentioned earlier in Note 2.1.4 that we are often interested in the degrees of freedom of a problem. In linear algebra this is called dimension. It's based on the following concept. Given vectors x₁, …, xₖ ∈ R^n and some vector x ∈ span(x₁, …, xₖ), by definition we know that

x = c₁x₁ + … + cₖxₖ.

We can now ask the following question: are the cᵢ's unique, or are there multiple choices of the cᵢ s.t. we get x as the linear combination? Concretely, can we find cᵢ, dᵢ ∈ R s.t. cᵢ ≠ dᵢ for some i and still

x = c₁x₁ + … + cₖxₖ = d₁x₁ + … + dₖxₖ?

Example 2.4.1 Let x₁ = (1, 0), x₂ = (0, 1), x₃ = (1, 1) be vectors in R^2 and choose x = (2, 1). We see that

x = 2x₁ + x₂ + 0x₃ = x₁ + 0x₂ + x₃,

so there are multiple choices for the scalars.


Here's the reason why we are interested in this. We are looking for a definition of dimension, which is based on the following. Assume we are given vectors x₁, …, xₖ ∈ R^n. Let's remove a vector from this collection, say xₖ. By Theorem 2.3.8 we know that

span(x₁, …, xₖ₋₁) ⊂ span(x₁, …, xₖ),

but do we have equality or not?

Example 2.4.2 Let's go back to the previous example, so x₁ = (1, 0), x₂ = (0, 1), x₃ = (1, 1). Then we know that span(x₁, x₂) ⊂ span(x₁, x₂, x₃). However, since we can write x₃ = x₁ + x₂, we have

x = c₁x₁ + c₂x₂ + c₃x₃ = c₁x₁ + c₂x₂ + c₃(x₁ + x₂) = (c₁ + c₃)x₁ + (c₂ + c₃)x₂.

This shows that if x ∈ span(x₁, x₂, x₃), then x ∈ span(x₁, x₂), so actually

span(x₁, x₂) = span(x₁, x₂, x₃).

On the other hand, since x₂ = 0x₁ + x₂, we clearly have x₂ ∈ span(x₁, x₂). But (0, 1) = x₂ ≠ c₁x₁ = (c₁, 0) for any choice of c₁ ∈ R, so x₂ ∉ span(x₁). Thus we have

span(x₁) ⊊ span(x₁, x₂) = span(x₁, x₂, x₃).

Is there a general reason why this is the case? Yes, and we will answer that next.

Definition 2.4.3 Given vectors x₁, …, xₖ ∈ R^n, we say that x₁, …, xₖ are linearly independent if for every x ∈ span(x₁, …, xₖ) the equation

x = c₁x₁ + … + cₖxₖ

has a unique solution in terms of the cᵢ.

Example 2.4.4 One vector x₁ ∈ R^n is always linearly independent, assuming x₁ ≠ 0. This is because for distinct scalars c, d ∈ R we obviously have cx₁ ≠ dx₁, so the equation

x = c₁x₁

has a unique solution for any x ∈ span(x₁).


The alert reader might notice that this definition is quite problematic for practical computations. Since span(x₁, …, xₖ) is an infinite set (unless all the vectors are 0), checking linear independence directly would mean checking the condition for infinitely many x ∈ span(x₁, …, xₖ). This would not be practical, since we would never be done. Fortunately we have the following theorem, which implies we only need to check one element:

Theorem 2.4.5 The vectors x₁, …, xₖ ∈ R^n are linearly independent if and only if the equation

0 = c₁x₁ + … + cₖxₖ

has the unique solution cᵢ = 0 for i = 1, …, k.

Proof. We first show that linear independence implies the condition. Assume that x₁, …, xₖ ∈ R^n are linearly independent. Since 0x = 0 for any vector x ∈ R^n, we get

0 = 0x₁ + … + 0xₖ.

But the definition of linear independence says precisely that any such solution is unique. Since we have just found one, this must be the unique one, so cᵢ = 0 for i = 1, …, k is the unique solution.

Next assume that cᵢ = 0 is the only solution of the equation in the theorem. Pick x ∈ span(x₁, …, xₖ) and assume that

x = c₁x₁ + … + cₖxₖ = d₁x₁ + … + dₖxₖ.

It follows that

0 = x − x = (c₁ − d₁)x₁ + … + (cₖ − dₖ)xₖ.

By our assumption, we must have cᵢ − dᵢ = 0, i.e. cᵢ = dᵢ, for i = 1, …, k. It follows that x = c₁x₁ + … + cₖxₖ has a unique solution.
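Theorem 2.4.5 is what makes independence checkable in practice. A small sketch of the test: the vectors are independent exactly when the matrix with the xᵢ as columns has rank k. (Rank is defined in section 3.7; NumPy's matrix_rank computes it numerically via the SVD.)

```python
# A sketch of the test in Theorem 2.4.5: x1, ..., xk are independent iff
# c = 0 is the only solution of 0 = c1 x1 + ... + ck xk, i.e. iff the
# matrix with columns x1, ..., xk has rank k.
import numpy as np

def independent(*vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(independent((1, 0), (0, 1)))           # True
print(independent((1, 0), (0, 1), (1, 1)))   # False, as in Example 2.4.1
```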

    2.5 Dimension

Now that we know linear independence and how to test for it, we can begin defining the concept of dimension. We start with the following theorem.


Theorem 2.5.1 Assume that x₁, …, xₖ ∈ R^n are not linearly independent. Then we may remove some vector xᵢ from the list without changing the span, i.e.

span(x₁, …, xᵢ₋₁, xᵢ₊₁, …, xₖ) = span(x₁, …, xₖ).

Proof. By Theorem 2.4.5 we may find some c₁, …, cₖ ∈ R s.t. cᵢ ≠ 0 for some i and

0 = c₁x₁ + … + cₖxₖ.

Since we can just rename our variables, we may as well assume that c₁ ≠ 0 to simplify notation.

We already know that span(x₂, …, xₖ) ⊂ span(x₁, …, xₖ), so we need to show the opposite inclusion. Pick x ∈ span(x₁, …, xₖ). We need to show that we may write

x = d₂x₂ + … + dₖxₖ.

It follows from 0 = c₁x₁ + … + cₖxₖ that −c₁x₁ = c₂x₂ + … + cₖxₖ. By our assumption c₁ ≠ 0, so we can divide both sides by −c₁, giving

x₁ = −(c₂/c₁ x₂ + … + cₖ/c₁ xₖ).

Now x ∈ span(x₁, …, xₖ), so for some αᵢ ∈ R, i = 1, …, k, we get by substituting for x₁ that

x = α₁x₁ + … + αₖxₖ
  = −α₁(c₂/c₁ x₂ + … + cₖ/c₁ xₖ) + α₂x₂ + … + αₖxₖ
  = (α₂ − α₁c₂/c₁)x₂ + … + (αₖ − α₁cₖ/c₁)xₖ.

It follows that x ∈ span(x₂, …, xₖ), so the proof is complete.

Algorithm 2.5.2 (Find linearly independent subset) The previous theorem provides us with the following algorithm. Assume we are given a list of vectors x₁, …, xₖ ∈ R^n. Note that the previous proof shows that if 0 = c₁x₁ + … + cₖxₖ, then we can remove any element from the list for which cᵢ ≠ 0 without changing the span.

1. If our list contains vectors that are 0, we may remove them without affecting the span.

2. If x₁, …, xₖ are not linearly independent, then 0 = c₁x₁ + … + cₖxₖ with some cᵢ ≠ 0. Remove the vector xᵢ from the list. This won't affect the span.

3. If the list that is left is not linearly independent, repeat the previous step.

4. Continue until left with a linearly independent list of vectors with the same span.

At each stage the list of vectors decreases in length by 1, so the process has to stop, and the algorithm always ends with a linearly independent list. This proves the following:

Theorem 2.5.3 Assume that V = span(x₁, …, xₖ). Then we may always find some linearly independent subset A ⊂ {x₁, …, xₖ} s.t.

span(x₁, …, xₖ) = span(A).

Example 2.5.4 Let x₁ = (1, 0), x₂ = (0, 1), x₃ = (1, 1). We showed in Example 2.4.2 that these vectors are not linearly independent. Furthermore, we showed that

span(x₁) ⊊ span(x₁, x₂) = span(x₁, x₂, x₃).

The subset A = {x₁, x₂} ⊂ {x₁, x₂, x₃} is linearly independent. Thus our algorithm would have stopped after step (2).
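Here is a sketch of Algorithm 2.5.2 in code, using rank as an oracle for "does removing this vector shrink the span?". Depending on removal order it may keep a different independent subset than the one above, which is fine: by the next theorem any such subset has the same size.

```python
# A sketch of Algorithm 2.5.2: repeatedly drop a vector whose removal does
# not shrink the span. Spans are compared via the rank of the matrix whose
# rows are the vectors (rank = dimension of the row span, section 3.7).
import numpy as np

def span_dim(vectors):
    return 0 if not vectors else int(np.linalg.matrix_rank(np.array(vectors)))

def independent_subset(vectors):
    vs = list(vectors)
    i = 0
    while i < len(vs):
        rest = vs[:i] + vs[i + 1:]
        if span_dim(rest) == span_dim(vs):   # vs[i] is redundant: drop it
            vs = rest
        else:
            i += 1                           # vs[i] is needed: keep it
    return vs

print(independent_subset([(1, 0), (0, 1), (1, 1)]))   # e.g. [(0, 1), (1, 1)]
```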

Definition 2.5.5 Let V = span(x₁, …, xₖ) and let A be a linearly independent subset of {x₁, …, xₖ} s.t. V = span(A). We define the dimension of V to be the number

dim V = #A.

(Here #A denotes the number of elements in the set A, i.e. in the linearly independent list produced by the algorithm.) The vector space V = {0} ⊂ R^n is spanned by an empty list of vectors, so by convention it has dimension 0.

Definition 2.5.6 Let V = span(x₁, …, xₖ) s.t. x₁, …, xₖ are linearly independent. Then the vectors x₁, …, xₖ are called a basis of V.


Note 2.5.7 The alert reader might notice a serious problem here. Suppose V = span(x₁, …, xₖ) = span(y₁, …, yₗ) for two different lists of vectors. Then running the algorithm on our first list produces some

A ⊂ {x₁, …, xₖ},

while running the algorithm on the second list produces some subset

B ⊂ {y₁, …, yₗ}.

Our definition then says that #A = dim V = #B, so our definition of dimension only makes sense if A and B contain the same number of vectors. Fortunately, we have the following theorem, which tells us that vector spaces behave pretty much exactly the way we would like them to. In particular, A and B will always contain the same number of vectors.

Theorem 2.5.8 Let V ⊂ R^n be a vector space. Then the following are true:

1. V has a basis, so there always exists a finite list of vectors x₁, …, xₖ s.t. they are linearly independent and V = span(x₁, …, xₖ).

2. Any two bases of V have the same number of elements, so dim V is a well-defined constant independent of the chosen basis. Mathematicians would call such a thing an intrinsic property of V.

3. If W ⊂ V ⊂ R^n are vector spaces, then dim W ≤ dim V ≤ dim R^n = n.

4. If W ⊂ V ⊂ R^n and dim W = dim V, then W = V.

Proof. Take either math 312 or math 370.

Note 2.5.9 Gaussian elimination, taught later in the class, will provide us with an effective method for computing the dimension of the span of some vectors.

    2.6 Generalizing our Definition of a Vector (optional)

Our current definition of a vector is not completely adequate for our class. We will need a slightly more general definition of a vector when solving systems of differential equations. If you trace through all the proofs made so far, you'll see that we have used the following properties of vector addition:


1. x + y ∈ V for all x, y ∈ V (closure under addition)

2. x + y = y + x for all x, y ∈ V (commutativity)

3. x + (y + z) = (x + y) + z for all x, y, z ∈ V (associativity)

4. ∃0 ∈ V s.t. 0 + x = x for all x ∈ V (existence of a zero vector)

5. Given x ∈ V there's a vector y ∈ V s.t. x + y = 0 (existence of inverses)

Now assume V is any set with an addition satisfying all the above properties. Let F denote either R or C. If for each x ∈ V and c ∈ F we can define a new vector cx ∈ V s.t. the following hold:

1. For all c ∈ F and x ∈ V, cx ∈ V (closure under scalar multiplication)

2. For all c ∈ F and x, y ∈ V, c(x + y) = cx + cy (distributivity)

3. For all c₁, c₂ ∈ F and x ∈ V, (c₁ + c₂)x = c₁x + c₂x (distributivity)

4. For all c₁, c₂ ∈ F and x ∈ V, (c₁c₂)x = c₁(c₂x)

5. 1x = x for all x ∈ V,

then we call the product cx a scalar multiplication on V.

Definition 2.6.1 A set V with an addition satisfying the properties above and equipped with a scalar multiplication by F is a vector space. When F = R we call V a real vector space, and when F = C we call V a complex vector space.

If the previous definition felt abstract, that's precisely because it is. A few examples will show that it's not that complicated.

Example 2.6.2 Let P(R) denote the polynomials with real coefficients and set V = P(R). We can clearly multiply a real polynomial by a real number, e.g. 2(1 + x + x^2) = 2 + 2x + 2x^2, and similarly a sum of two real polynomials is a real polynomial. Thus we pick F = R. The reader can check that addition of polynomials and multiplication by a scalar satisfy all the axioms given above, where the zero vector is just the trivial polynomial 0.
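As a quick illustration (my own encoding, not from the text): if we cap the degree, a polynomial can be stored as its coefficient list, and the vector space operations become the component-wise operations of section 2.2. The degree cap matters; the full space P(R) is infinite dimensional, as the warning at the end of this section notes.

```python
# Polynomials of degree <= 2 as coefficient vectors: a0 + a1 x + a2 x^2
# is stored as [a0, a1, a2].
import numpy as np

p = np.array([1.0, 1.0, 1.0])   # 1 + x + x^2
q = np.array([0.0, 2.0, 0.0])   # 2x

print(2 * p)                    # [2. 2. 2.], i.e. 2 + 2x + 2x^2
print(p + q)                    # [1. 3. 1.], i.e. 1 + 3x + x^2
```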


Example 2.6.3 Let V = {f : [0, 1] → R} denote the collection of all functions from [0, 1] to R. If f, g are two such functions, then we can add them pointwise, defining the function f + g by (f + g)(x) = f(x) + g(x). Similarly, we can multiply such a function by a real number: if c ∈ R, then cf is the function (cf)(x) = c · f(x), i.e. we multiply the value at each point by c. This is precisely what it intuitively means to multiply a function by a number. Again, it's a simple exercise to check that V together with F = R satisfies all the properties of a vector space.

When solving differential equations, we will have to deal with vector spaces V whose elements, i.e. the vectors x ∈ V, are solutions to a differential equation. Solving an equation will then be the problem of finding a basis for this so-called solution space. Here's a simple example:

Example 2.6.4 Assume we are given the differential equation y = y′. From previous courses we know that the solutions of this equation are the functions ce^x. Now let V = {ce^x | c ∈ R}. Given x, y ∈ V, we have x = c₁e^x and y = c₂e^x, so that x + y = (c₁ + c₂)e^x ∈ V. This shows that V is closed under addition. Scalar multiplication is just multiplying the function by a real number as in the previous example, so if ce^x ∈ V, then for any real r ∈ R also rce^x ∈ V. Again, one can check that this addition and scalar multiplication satisfy the axioms of a vector space. Furthermore, e^x is a basis of V.

Example 2.6.5 A final standard example that we will encounter is vector spaces where the vectors are lists of functions, e.g. (x + 1, e^x, x^2). Again, these can be added component-wise, and similarly for scalar multiplication.

WARNING 2.6.6 When working with these more general examples, Theorem 2.5.8 stops being true. For example, there's no finite list of vectors spanning the vector space of all real polynomials P(R). Such vector spaces are called infinite dimensional. Those of you who want to think about this can look at the optional homework problems.


3 Matrices

3.1 Introduction

Definition 3.1.1 A matrix is a rectangular array of numbers or functions,

[ a₁,₁ a₁,₂ ⋯ a₁,ₘ ]
[ a₂,₁ a₂,₂ ⋯ a₂,ₘ ]
[  ⋮    ⋮       ⋮  ]
[ aₙ,₁ aₙ,₂ ⋯ aₙ,ₘ ]

By aᵢ,ⱼ we simply mean the element in the ith row and jth column. Sometimes I will also use the term (i, j)-element of a matrix, which just means the element aᵢ,ⱼ. By the dimensions of a matrix we mean the number of rows and columns: a matrix with n rows and m columns is called an n-by-m matrix. We will often denote an n-by-m matrix by A = (aᵢⱼ), where i = 1, …, n and j = 1, …, m. This shorter notation is usually needed when proving things about matrices.

Definition 3.1.2 n-by-m matrices are denoted by Matₙ,ₘ(R), Matₙ,ₘ(C), etc., where the set in parentheses denotes what set the elements of the matrix belong to.

Example 3.1.3 The following are matrices in Mat₂,₂(R) and Mat₂,₃(C), respectively:

[ 1 √2 ]     [ 1 0 i ]
[ 0  1 ],    [ 2 1 0 ],

while the following is a matrix whose entries are functions:

[ x^2 + 1  e^x + 1 ]
[    x        1    ]

A matrix with function elements can be thought of as a function into either Matₙ,ₘ(R) or Matₙ,ₘ(C), i.e. for each value of the variable we get a real or complex matrix.

Note 3.1.4 A real matrix is of course also a complex matrix, since real numbers are just complex numbers with zero imaginary part. Thus Matₙ,ₘ(R) ⊂ Matₙ,ₘ(C).


Definition 3.1.5 Given an n-by-m matrix

[ a₁,₁ a₁,₂ ⋯ a₁,ₘ ]
[ a₂,₁ a₂,₂ ⋯ a₂,ₘ ]
[  ⋮    ⋮       ⋮  ]
[ aₙ,₁ aₙ,₂ ⋯ aₙ,ₘ ],

we call x = (aᵢ,₁, …, aᵢ,ₘ) the ith row vector and

    [ a₁,ⱼ ]
y = [  ⋮   ] = (a₁,ⱼ, …, aₙ,ⱼ)
    [ aₙ,ⱼ ]

the jth column vector. Notice that we also denote vectors with the bracket notation, so we identify an n-vector with an n-by-1 matrix.

In the definition above we see that, in the matrix setting, vectors can also contain functions as elements. Thus you should think of a row or column vector as just the list of elements in the corresponding row or column of the matrix.

Example 3.1.6 Given the complex matrix from the previous example,

[ 1 0 i ]
[ 2 1 0 ],

the first column vector is (1, 2), while the second row vector is (2, 1, 0). The first row vector of

[ x^2 + 1  e^x + 1 ]
[    x        1    ]

is (x^2 + 1, e^x + 1).

    3.2 Matrix Algebra

Definition 3.2.1 The sum of two matrices is defined as follows:

[ a₁,₁ ⋯ a₁,ₘ ]   [ b₁,₁ ⋯ b₁,ₘ ]   [ a₁,₁ + b₁,₁ ⋯ a₁,ₘ + b₁,ₘ ]
[  ⋮   ⋱   ⋮  ] + [  ⋮   ⋱   ⋮  ] = [      ⋮      ⋱       ⋮     ]
[ aₙ,₁ ⋯ aₙ,ₘ ]   [ bₙ,₁ ⋯ bₙ,ₘ ]   [ aₙ,₁ + bₙ,₁ ⋯ aₙ,ₘ + bₙ,ₘ ]

Note that matrices can only be added if they have the same dimensions. More compactly, if A = (aᵢⱼ) and B = (bᵢⱼ) are of equal dimensions, then A + B = (aᵢⱼ + bᵢⱼ).

Definition 3.2.2 Given a number c and a matrix, we can define scalar multiplication by

  [ a₁,₁ ⋯ a₁,ₘ ]   [ ca₁,₁ ⋯ ca₁,ₘ ]
c [  ⋮   ⋱   ⋮  ] = [   ⋮   ⋱    ⋮  ]
  [ aₙ,₁ ⋯ aₙ,ₘ ]   [ caₙ,₁ ⋯ caₙ,ₘ ]

The number c is called a scalar, and depending on the context we might assume c is a natural, whole, rational, real or complex number. Again, we may write A = (aᵢⱼ) and then cA = (caᵢⱼ).

Example 3.2.3 Some concrete examples:

[ 1 √2 ]   [ x^2 + 1  e^x + 1 ]   [ x^2 + 2  e^x + 1 + √2 ]
[ 0  1 ] + [    x        1    ] = [     x         2       ]

  [ 1 0 i ]   [ 2 0 2i ]
2 [ 2 1 0 ] = [ 4 2  0 ]

Theorem 3.2.4 Given matrices A, B, C of the same dimensions and scalars c, d, scalar multiplication and matrix addition satisfy the following properties:

1. A + B = B + A (commutativity)

2. A + (B + C) = (A + B) + C (associativity)

3. (cd)A = c(dA)

4. c(A + B) = cA + cB, (c + d)A = cA + dA (distributivity)

Proof. I'll prove the first; the rest are just as easy to check and will be homework. Write A = (aᵢⱼ) and B = (bᵢⱼ), the dimensions again being the same. Then

A + B = (aᵢⱼ + bᵢⱼ) = (bᵢⱼ + aᵢⱼ) = B + A.


Note 3.2.5 With regard to the generalized definition of a vector space, the reader can check that n-by-m matrices with real coefficients form a real vector space. More generally, let V be any vector space with scalars F (i.e. either R or C) and let W = Matₙ,ₘ(V) be the set of n-by-m matrices whose elements are elements of V. Then W is also an F vector space under component-wise addition and scalar products.

Definition 3.2.6 If we are given two lists x = (a₁, …, aₙ), y = (b₁, …, bₙ) representing row or column vectors of a matrix, we can define their dot product as

x · y = a₁b₁ + a₂b₂ + … + aₙbₙ.

Example 3.2.7 Let x denote the first column vector and y the second row vector of

[ x^2 + 1  e^x + 1 ]
[    x        1    ]

Then

x · y = (x^2 + 1, x) · (x, 1) = (x^2 + 1)x + x = x^3 + 2x.

We can also multiply matrices, but the product of matrices has a somewhat unintuitive definition. Matrices were originally invented to handle systems of linear equations, which we will also study, so therein lies the motivation for the definition. The idea was to write a system of m linear equations in n variables,

a₁,₁x₁ + a₁,₂x₂ + … + a₁,ₙxₙ = b₁
⋮
aₘ,₁x₁ + aₘ,₂x₂ + … + aₘ,ₙxₙ = bₘ,

in the following form:

[ a₁,₁ ⋯ a₁,ₙ ] [ x₁ ]   [ b₁ ]
[  ⋮   ⋱   ⋮  ] [ ⋮  ] = [ ⋮  ]
[ aₘ,₁ ⋯ aₘ,ₙ ] [ xₙ ]   [ bₘ ]

Therefore, we want the "product" on the left to equal

[ a₁,₁x₁ + a₁,₂x₂ + … + a₁,ₙxₙ ]
[              ⋮               ]
[ aₘ,₁x₁ + aₘ,₂x₂ + … + aₘ,ₙxₙ ]

The following definition of the matrix product accomplishes just this:

Definition 3.2.8 The product of two matrices is defined as follows. Let A be an n-by-p matrix and B a p-by-m matrix. Denote the row vectors of A by x₁, …, xₙ and the column vectors of B by y₁, …, yₘ. Then

     [ x₁ ]                 [ x₁ · y₁ ⋯ x₁ · yₘ ]
AB = [ ⋮  ] [ y₁ ⋯ yₘ ]  =  [    ⋮    ⋱    ⋮    ]
     [ xₙ ]                 [ xₙ · y₁ ⋯ xₙ · yₘ ]

Thus the (i, j)-element of the matrix AB is the dot product of the ith row vector of A and the jth column vector of B. The product AB is an n-by-m matrix. Using the shorter notation with A = (aᵢⱼ) and B = (bᵢⱼ), we have AB = (cᵢⱼ), where

cᵢⱼ = aᵢ,₁b₁,ⱼ + aᵢ,₂b₂,ⱼ + … + aᵢ,ₚbₚ,ⱼ.

WARNING 3.2.9 The dot product between row vectors of A and column vectors of B has to make sense, i.e. they have to be lists of equal length. This means that the matrix product AB is only defined if the number of columns of A equals the number of rows of B. The sizes in a matrix product work as follows:

(n-by-p)(p-by-m) = (n-by-m).

WARNING 3.2.10 Even if the product AB is defined, this does not imply that BA is defined!


Example 3.2.11 Here are a bunch of examples:

[ 1 0 2 ] [ 1 1 1 ]   [ 1+2 1 1+2 ]   [ 3 1 3 ]
[ 0 1 1 ] [ 0 0 0 ] = [  1  0  1  ] = [ 1 0 1 ]
          [ 1 0 1 ]

[ 1 0 ] [ 1 1 ]   [  1   1  ]   [ 1 1 ]
[ 2 1 ] [ 1 1 ] = [ 2+1 2+1 ] = [ 3 3 ]

[ 1 1 ] [ 1 0 ]   [ 1+2 1 ]   [ 3 1 ]
[ 1 1 ] [ 2 1 ] = [ 1+2 1 ] = [ 3 1 ]

[ 2 1 ] [ x₁ ]   [ 2x₁ + x₂ ]
[ 1 2 ] [ x₂ ] = [ x₁ + 2x₂ ]

The last product shows how matrix multiplication relates to systems of linear equations.

WARNING 3.2.12 As can be seen from the previous example, matrix multiplication is not commutative even when both AB and BA make sense. It's actually rare for two matrices to commute! Write down a few 2-by-2 matrices and compute their products in both orders; you'll see that very few, if any, of them commute.
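A short NumPy sketch of two of the products above, plus the shape rule of Warning 3.2.9 and the non-commutativity of Warning 3.2.12:

```python
# Two products from Example 3.2.11, checked with NumPy's @ operator.
import numpy as np

A = np.array([[1, 0], [2, 1]])
B = np.array([[1, 1], [1, 1]])
print(A @ B)            # [[1 1] [3 3]]
print(B @ A)            # [[3 1] [3 1]]  -- AB and BA differ

C = np.array([[1, 0, 2], [0, 1, 1]])              # 2-by-3
D = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])   # 3-by-3
print((C @ D).shape)    # (2, 3): (2-by-3)(3-by-3) = (2-by-3)
# D @ C would raise a ValueError: 3 columns cannot meet 2 rows.
```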

Theorem 3.2.13 Given matrices A, B, C of such dimensions that the products are defined, the matrix product satisfies the following properties:

1. A(BC) = (AB)C (associativity)

2. (A + B)C = AC + BC, A(B + C) = AB + AC (distributivity)

Proof. These are annoying to check, but it is best done using the shorthand notation for matrices and the sum formula for the elements of the product matrix. The interested student can try to check these.


3.3 The transpose of a matrix

Definition 3.3.1 Given an n-by-m matrix

    [ a₁,₁ ⋯ a₁,ₘ ]
A = [  ⋮   ⋱   ⋮  ]
    [ aₙ,₁ ⋯ aₙ,ₘ ],

we define its transpose to be the m-by-n matrix

      [ a₁,₁ ⋯ aₙ,₁ ]
A^T = [  ⋮   ⋱   ⋮  ]
      [ a₁,ₘ ⋯ aₙ,ₘ ].

In other words, the ith row vector of A^T is the ith column vector of A. Using our shorter notation, define a′ⱼᵢ = aᵢⱼ; then if A = (aᵢⱼ), i = 1, …, n, j = 1, …, m, we get A^T = (a′ⱼᵢ), where j = 1, …, m and i = 1, …, n.

Example 3.3.2

[ 1 0 1 ]^T   [ 1 2 ]
[ 2 1 1 ]   = [ 0 1 ]
              [ 1 1 ]

Theorem 3.3.3 The transpose satisfies the following properties:

1. (A^T)^T = A

2. (A + B)^T = A^T + B^T

3. (AB)^T = B^T A^T

4. (cA)^T = cA^T

Proof. Most are obvious if you try a few examples. I will assign a few as homework.
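A quick numerical spot-check of property (3), the one whose shape bookkeeping is easiest to get wrong; the random test matrices are my own choice:

```python
# Checking (AB)^T = B^T A^T numerically on random rectangular matrices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

print(np.allclose((A @ B).T, B.T @ A.T))   # True
# Note that A.T @ B.T is not even defined here: the shapes (3, 2) and
# (4, 3) don't line up, which is why the order must reverse.
```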

Definition 3.3.4 A matrix A is symmetric if A^T = A.

Symmetric matrices will come up later in the class when we study eigenvalues. They also play an important role in the definition of quadratic forms, which are very important in number theory and are also used in geometry to classify all plane conics. Again, this is outside the scope of this course; the interested reader can look up these terms on Wikipedia.

    3.4 Some important types of matrices

In this section we will look at some very important types of matrices that arise in applications. Often we are interested in, e.g., being able to factor a matrix into products of matrices of the following types. Such factorizations are usually called matrix decompositions, and they are very important in, e.g., numerical analysis and when implementing matrix computations on a computer. Unfortunately, we don't have time to go into this. The most important class of matrices is the following:

Definition 3.4.1 A matrix with the same number of rows and columns is called a square matrix.

Square matrices have the nice property that if we restrict ourselves to n-by-n square matrices, then we can add and multiply them and the result will always be an n-by-n square matrix: they are closed under the standard matrix operations.

Definition 3.4.2 Denote by Iₙ the n-by-n square matrix with 1's on the diagonal and 0's everywhere else. This is called the n-by-n identity matrix, or just the identity matrix if the dimensions are clear from the context, in which case we will simply write I. More concretely, the matrix looks as follows:

[ 1 ⋯ 0 ]
[ ⋮ ⋱ ⋮ ]
[ 0 ⋯ 1 ]

Theorem 3.4.3 Let A be an n-by-n square matrix. Then

IA = A = AI,

so I is the identity element with respect to matrix multiplication of n-by-n matrices. This also explains the name of the matrix.

Proof. Write I = (bᵢⱼ), A = (aᵢⱼ), where i, j = 1, …, n, so that bᵢⱼ = 1 if i = j and bᵢⱼ = 0 if i ≠ j. Letting IA = (cᵢⱼ) and AI = (c′ᵢⱼ), we get

cᵢⱼ = bᵢ,₁a₁,ⱼ + … + bᵢ,ₙaₙ,ⱼ = bᵢᵢaᵢⱼ = aᵢⱼ

and

c′ᵢⱼ = aᵢ,₁b₁,ⱼ + … + aᵢ,ₙbₙ,ⱼ = aᵢⱼbⱼⱼ = aᵢⱼ.

Definition 3.4.4 A square matrix whose nonzero elements appear only on the diagonal is called a diagonal matrix.

Example 3.4.5 The matrix on the left is a diagonal matrix, while the matrix on the right is not:

[ −1 0 0 ]    [ −1 2 0 ]
[  0 1 0 ],   [  0 1 0 ]
[  0 0 3 ]    [  0 0 3 ]

Definition 3.4.6 A square matrix is called upper triangular if all elements below the diagonal are 0's. Similarly, a matrix is called lower triangular if all elements above the diagonal are 0's.

Example 3.4.7 The matrix on the left is lower triangular, while the matrix on the right is upper triangular:

[ −1 0 0 ]    [ −1 2 0 ]
[  4 1 0 ],   [  0 1 0 ]
[  1 0 3 ]    [  0 0 3 ]

    26

Note 3.4.8 Note that a diagonal matrix is both lower and upper triangular. Moreover, a diagonal matrix is also symmetric.
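If you want to experiment, NumPy has helpers matching these definitions; a small sketch:

```python
# np.diag builds a diagonal matrix; np.tril/np.triu extract the lower/upper
# triangular parts of a matrix.
import numpy as np

D = np.diag([-1, 1, 3])                  # the diagonal matrix of Example 3.4.5

print(np.array_equal(D, np.tril(D)))     # True: lower triangular
print(np.array_equal(D, np.triu(D)))     # True: upper triangular
print(np.array_equal(D, D.T))            # True: symmetric (Note 3.4.8)
```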

3.5 Systems of linear equations and elementary matrices

We start by looking again at a linear system of equations:

a₁,₁x₁ + a₁,₂x₂ + … + a₁,ₙxₙ = b₁
⋮
aₘ,₁x₁ + aₘ,₂x₂ + … + aₘ,ₙxₙ = bₘ

We may also write it in the matrix form AX = B, where A, X and B are as follows:

     [ a₁,₁ ⋯ a₁,ₙ ] [ x₁ ]   [ b₁ ]
AX = [  ⋮   ⋱   ⋮  ] [ ⋮  ] = [ ⋮  ] = B.
     [ aₘ,₁ ⋯ aₘ,ₙ ] [ xₙ ]   [ bₘ ]

If we evaluate the product on the left, the matrix equation simply becomes

[ a₁,₁x₁ + a₁,₂x₂ + … + a₁,ₙxₙ ]   [ b₁ ]
[              ⋮               ] = [ ⋮  ]
[ aₘ,₁x₁ + aₘ,₂x₂ + … + aₘ,ₙxₙ ]   [ bₘ ]

and if we equate components, this gives us precisely the original linear system.

Definition 3.5.1 The matrix A in the matrix notation is called the coefficient matrix of the system, B is often called the constant vector, and X the vector of unknowns.

Definition 3.5.2 A linear system is consistent if it has at least one solution. If it has no solutions, it's called inconsistent.

Definition 3.5.3 A linear system is underdetermined if it has fewer equations than unknowns. It is overdetermined if there are more equations than unknowns.

    27

Example 3.5.4 The system on the left is consistent while the one on the right is not:

x + y = 0        x + y = 1
x − y = 0,       x + y = 0.

In matrix form these systems are:

[ 1  1 ] [ x ]   [ 0 ]      [ 1 1 ] [ x ]   [ 1 ]
[ 1 −1 ] [ y ] = [ 0 ],     [ 1 1 ] [ y ] = [ 0 ].

Given a linear system, we can clearly do the following without changing the solutions of the system:

1. We may multiply an equation by a nonzero constant.

2. We may switch the order of two equations in the system.

3. We may add a constant multiple of one equation to another.

Definition 3.5.5 The previous three kinds of operations are called elementary operations on a system. Notice the resemblance to Theorem 2.3.10. This is not a coincidence.

Definition 3.5.6 Denote by Eᵢⱼ the n-by-n matrix for which every element is zero except that the (i, j)-element is 1.

Note 3.5.7 If A = (aᵢⱼ) is an n-by-n matrix, then A = Σᵢ,ⱼ aᵢⱼEᵢⱼ (summing over i, j = 1, …, n), so this gives us a nice notation for modifying matrices.

Definition 3.5.8 The following matrices will be called elementary matrices:

1. Iₙ + (a − 1)Eᵢᵢ

2. Iₙ + Eᵢⱼ + Eⱼᵢ − Eᵢᵢ − Eⱼⱼ

3. Iₙ + aEᵢⱼ, i ≠ j

The first one is just the identity matrix with the element a in position (i, i). The second one is the identity matrix with the ith and jth rows interchanged. The third one is the identity matrix, but with the 0 in position (i, j) replaced by a.

Theorem 3.5.9 Let A be an n-by-m matrix with row vectors x₁, …, xₙ. Multiplying by the elementary matrices on the left has the following effects. Write E₁ = Iₙ + (a − 1)Eᵢᵢ, E₂ = Iₙ + Eᵢⱼ + Eⱼᵢ − Eᵢᵢ − Eⱼⱼ and E₃ = Iₙ + aEᵢⱼ, and assume i < j. Then:

E₁A has rows x₁, …, xᵢ₋₁, axᵢ, xᵢ₊₁, …, xₙ;
E₂A has rows x₁, …, xᵢ₋₁, xⱼ, xᵢ₊₁, …, xⱼ₋₁, xᵢ, xⱼ₊₁, …, xₙ;
E₃A has rows x₁, …, xᵢ₋₁, xᵢ + axⱼ, xᵢ₊₁, …, xₙ.

Thus multiplying by E₁ multiplies row i by a, multiplying by E₂ interchanges rows i and j, and multiplying by E₃ adds axⱼ to row i.

Proof. These are all straightforward using the short notation. We won't go into details.

Example 3.5.10 Here's an example of an elementary matrix of each type:

[ a 0 0 ]   [ 1 0 0 ]   [ 1 a 0 ]
[ 0 1 0 ],  [ 0 0 1 ],  [ 0 1 0 ]
[ 0 0 1 ]   [ 0 1 0 ]   [ 0 0 1 ]

Multiplying a 3-by-2 matrix by the middle one gives, e.g.,

[ 1 0 0 ] [ 1 1 ]   [ 1 1 ]
[ 0 0 1 ] [ 2 4 ] = [ 3 1 ]
[ 0 1 0 ] [ 3 1 ]   [ 2 4 ]

so it interchanges the second and third rows as desired. The reader should check that multiplying by the other two has the effect claimed in the previous theorem.
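The same computation in code: a sketch building each elementary matrix by editing an identity matrix, then applying it on the left of the 3-by-2 matrix above.

```python
# Elementary matrices as row operations (Theorem 3.5.9), applied on the left.
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 4.0], [3.0, 1.0]])

E1 = np.eye(3); E1[0, 0] = 5.0       # type 1: multiply row 0 by a = 5
E2 = np.eye(3)[[0, 2, 1]]            # type 2: interchange rows 1 and 2
E3 = np.eye(3); E3[1, 0] = -2.0      # type 3: add -2 * (row 0) to row 1

print(E1 @ A)   # [[5. 5.] [2. 4.] [3. 1.]]
print(E2 @ A)   # [[1. 1.] [3. 1.] [2. 4.]], as computed above
print(E3 @ A)   # [[1. 1.] [0. 2.] [3. 1.]]
```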

Definition 3.5.11 The operations performed on a matrix by multiplying by an elementary matrix on the left are called elementary row operations. Thus these are:

1. Multiplying a row by a nonzero constant.

2. Exchanging two rows of a matrix.

3. Adding a constant multiple of one row to another.

Definition 3.5.12 Let A be a matrix and let B be a matrix obtained by performing elementary row operations on A. Then A and B are said to be equivalent.

Definition 3.5.13 Let A be an n-by-m matrix and B an n-by-p matrix. We write (A | B) to denote the matrix

          [ a₁,₁ ⋯ a₁,ₘ  b₁,₁ ⋯ b₁,ₚ ]
(A | B) = [  ⋮   ⋱   ⋮    ⋮   ⋱   ⋮  ]
          [ aₙ,₁ ⋯ aₙ,ₘ  bₙ,₁ ⋯ bₙ,ₚ ]

In other words, (A | B) is the matrix formed by taking the column vectors of A and then appending the column vectors of B. The matrix (A | B) is called the augmented matrix of A by B.

Definition 3.5.14 Given a linear system of equations

a₁,₁x₁ + a₁,₂x₂ + … + a₁,ₙxₙ = b₁
⋮
aₘ,₁x₁ + aₘ,₂x₂ + … + aₘ,ₙxₙ = bₘ,

we call the matrix

[ a₁,₁ ⋯ a₁,ₙ  b₁ ]
[  ⋮   ⋱   ⋮   ⋮  ]
[ aₘ,₁ ⋯ aₘ,ₙ  bₘ ]

the augmented matrix of the system. In other words, it's just the matrix (A | B) when the system is written as AX = B.

Definition 3.5.15 A linear system is called homogeneous if bᵢ = 0 for all i. If this does not hold, then the system is nonhomogeneous.

Homogeneous systems are important for the following reason. Assume we are given the system AX = B and that we have found two different vectors X₁ and X₂ as solutions. Then

A(X₂ − X₁) = B − B = 0,

so X₂ − X₁ is a solution of the corresponding homogeneous system AX = 0. Thus X₂ = X₁ + (X₂ − X₁), so the other solution is X₁ plus a solution of the homogeneous equation. Conversely, if Xₕ is a solution of the homogeneous equation AX = 0, then

A(X₁ + Xₕ) = AX₁ + AXₕ = B + 0 = B,

so X₁ + Xₕ is a solution of the system AX = B. This argument shows that to solve AX = B one only needs to find one particular vector, say Xₚ, solving it. Then we can look for all the solutions of the corresponding homogeneous system AX = 0, which is generally easier to solve, and the full set of solutions of AX = B will be the vectors

Xₚ + Xₕ,

where Xₕ varies over all the solutions of AX = 0.
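A tiny numerical illustration of this decomposition, for a hypothetical one-equation system x + y = 2 of my own choosing:

```python
# Every Xp + t*Xh solves AX = B when Xp is a particular solution and
# Xh solves the homogeneous system AX = 0.
import numpy as np

A = np.array([[1.0, 1.0]])
B = np.array([2.0])

Xp = np.array([2.0, 0.0])    # one particular solution of AX = B
Xh = np.array([1.0, -1.0])   # a solution of the homogeneous system AX = 0

for t in (0.0, 1.0, -3.5):
    print(np.allclose(A @ (Xp + t * Xh), B))   # True True True
```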

3.6 Gaussian elimination

The tool for solving systems of linear equations will be the following. Assume that we have a system whose augmented matrix looks like, e.g., the following:

[ 1 1 1 3 ]
[ 0 1 2 1 ]
[ 0 0 1 1 ]

The corresponding linear system is

x + y + z = 3
y + 2z = 1
z = 1

We see that solving the system is very simple, since we already know the value of z. Thus we can plug z = 1 into the second equation, solve for y = −1, and finally plug y and z into the first equation to get x = 3. The matrix defining the system has a very special form, which is what makes solving the system easy.

Definition 3.6.1 We say that a matrix is in row-echelon form if it satisfies the following:

1. The first nonzero entry in each row is a 1.

2. The first nonzero entry in a lower row always appears to the right of the first nonzero entry of an upper row.

3. All zero rows are at the bottom of the matrix.

Example 3.6.2 The matrix in the first example of this section is in row-echelon form. Another example is

[ 1 0 2 −1 0 ]
[ 0 0 1 −5 7 ]
[ 0 0 0  1 0 ]
[ 0 0 0  0 0 ]

Definition 3.6.3 The first nonzero element in each row is called a pivot.

Algorithm 3.6.4 A linear system can be solved using the following procedure:

1. By applying only elementary row operations to its augmented matrix, transform the matrix into row-echelon form.

2. Now solve as in the first example of this section (if possible) by solving for the variables from the bottom up.


Definition 3.6.5 The procedure used to perform step (1) is called Gaussian elimination. The procedure is best understood through a few examples, but I'll give an algorithmic description below.

Algorithm 3.6.6 (Gaussian elimination) Let A be an n-by-m matrix.

1. Look for the leftmost nonzero column of A.

2. Switch rows if needed so that the uppermost element of that column, i.e. the pivot, is nonzero.

3. Multiply the row containing the pivot by a scalar so that the pivot is 1.

4. Make the elements below the pivot zero by subtracting a multiple of the pivot row from each row below it.

5. Continue the procedure on the "submatrix" that you get by removing the pivot row and column.
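For reference, here is a compact sketch of this algorithm in Python. It is my own implementation, pivoting on the first nonzero entry as in the text; numerical software would instead pivot on the largest entry for stability.

```python
# A sketch of Algorithm 3.6.6: reduce a matrix to row-echelon form.
import numpy as np

def row_echelon(A, tol=1e-12):
    A = A.astype(float).copy()
    n, m = A.shape
    row = 0
    for col in range(m):                              # step 1: next column
        nonzero = np.nonzero(np.abs(A[row:, col]) > tol)[0]
        if nonzero.size == 0:
            continue                                  # nothing to pivot on
        p = row + nonzero[0]
        A[[row, p]] = A[[p, row]]                     # step 2: swap pivot up
        A[row] /= A[row, col]                         # step 3: scale pivot to 1
        for r in range(row + 1, n):                   # step 4: clear below
            A[r] -= A[r, col] * A[row]
        row += 1                                      # step 5: recurse on submatrix
        if row == n:
            break
    return A

M = np.array([[1, 3, -2, -7], [4, 1, 3, 5], [2, -5, 7, 19]])
print(row_echelon(M))   # reproduces the row-echelon form of Example 3.6.7
```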

Example 3.6.7 We want to transform the following matrix into row-echelon form using only elementary row operations:

[ 1  3 −2 −7 ]
[ 4  1  3  5 ]
[ 2 −5  7 19 ]

This works as follows. For the matrix to be in row-echelon form, the elements below the top-left element need to be zeros. Denote the ith row by Rᵢ. We can replace R₂ by R₂ − 4R₁ and R₃ by R₃ − 2R₁. The process gives the following chain. First R₂ ← R₂ − 4R₁ and R₃ ← R₃ − 2R₁:

[ 1   3  −2  −7 ]
[ 0 −11  11  33 ]
[ 0 −11  11  33 ]

Then R₂ ← −R₂/11 and R₃ ← −R₃/11:

[ 1 3 −2 −7 ]
[ 0 1 −1 −3 ]
[ 0 1 −1 −3 ]

Finally R₃ ← R₃ − R₂:

[ 1 3 −2 −7 ]
[ 0 1 −1 −3 ]
[ 0 0  0  0 ]

This matrix in row-echelon form corresponds to the system

x + 3y − 2z = −7
y − z = −3
0 = 0,

which is equivalent to the original one and thus has the same set of solutions. Note that even though we started with three equations, the system essentially reduced to two. This will be explained in more detail in the next section. The system is now underdetermined, so we can solve it as follows. Since the last equation puts no constraints on z, we may choose z freely, say z = 1. From the second equation we then get y = −2, and finally from the first x = 1.

Example 3.6.8 Here's an example of what happens when a system has no solutions. Assume our system has the augmented matrix

[  1 2 1 0 ]
[ −1 0 2 1 ]
[  0 2 3 2 ]

Applying Gaussian elimination to this matrix gives the following chain. First R₂ ← R₂ + R₁:

[ 1 2 1 0 ]
[ 0 2 3 1 ]
[ 0 2 3 2 ]

Then R₃ ← R₃ − R₂:

[ 1 2 1 0 ]
[ 0 2 3 1 ]
[ 0 0 0 1 ]

Finally R₂ ← R₂/2:

[ 1 2 1   0   ]
[ 0 1 3/2 1/2 ]
[ 0 0 0   1   ]

Translating this back to a linear system, we see that our system is

x + 2y + z = 0
y + 3z/2 = 1/2
0 = 1.

Obviously, there's no way to satisfy the last equation.

Example 3.6.9 Here's a more complicated example where we need to apply all the row operations, including row switching. This is also closer to what solving a real-life system amounts to, since the elements in the matrix typically end up being rational numbers, making the computation slightly annoying.

[ 1 3 5 −1 2 ]
[ 1 3 0  1 1 ]
[ 2 2 8  4 0 ]
[ 1 0 3  0 1 ]

R₂ ← R₂ − R₁, R₃ ← R₃ − 2R₁, R₄ ← R₄ − R₁:

[ 1  3  5 −1  2 ]
[ 0  0 −5  2 −1 ]
[ 0 −4 −2  6 −4 ]
[ 0 −3 −2  1 −1 ]

R₂ ↔ R₃:

[ 1  3  5 −1  2 ]
[ 0 −4 −2  6 −4 ]
[ 0  0 −5  2 −1 ]
[ 0 −3 −2  1 −1 ]

R₂ ← −R₂/4:

[ 1  3  5   −1    2 ]
[ 0  1  1/2 −3/2  1 ]
[ 0  0 −5    2   −1 ]
[ 0 −3 −2    1   −1 ]

R₄ ← R₄ + 3R₂:

[ 1 3  5   −1    2 ]
[ 0 1  1/2 −3/2  1 ]
[ 0 0 −5    2   −1 ]
[ 0 0 −1/2 −7/2  2 ]

R₃ ← −R₃/5:

[ 1 3  5   −1    2   ]
[ 0 1  1/2 −3/2  1   ]
[ 0 0  1   −2/5  1/5 ]
[ 0 0 −1/2 −7/2  2   ]

R₄ ← R₄ + R₃/2:

[ 1 3 5   −1     2     ]
[ 0 1 1/2 −3/2   1     ]
[ 0 0 1   −2/5   1/5   ]
[ 0 0 0   −37/10 21/10 ]

R₄ ← −10R₄/37:

[ 1 3 5   −1    2      ]
[ 0 1 1/2 −3/2  1      ]
[ 0 0 1   −2/5  1/5    ]
[ 0 0 0    1   −21/37  ]

    3.7 Rank of a matrix

Definition 3.7.1 Let A be an n-by-m matrix and denote the row vectors of A by x₁, …, xₙ. We define the rank of A to be the dimension of

V = span(x₁, …, xₙ),

i.e. rank A = dim V.

From chapter 2 we already know a method for computing this: we just apply our algorithm that removes vectors from the list of row vectors without changing the span. Once we are left with a linearly independent list of vectors, the size of that list will be the rank of A.

However, there is a much more efficient way. Assume that we have applied Gaussian elimination to A, transforming it into row-echelon form. Gaussian elimination performs only elementary row operations on the matrix, and Theorem 2.3.10 says that these won't affect the span! So given a matrix in row-echelon form, is there an easy way to figure out the dimension of the span of the row vectors?

Let's look at an example from the previous section. The following matrix is in row-echelon form:

[ 1 0 2 −1 0 ]
[ 0 0 1 −5 7 ]
[ 0 0 0  1 0 ]
[ 0 0 0  0 0 ]

We'll denote the row vectors by x₁, …, x₄. First notice that the last vector is the zero vector, so it adds nothing to the span and we can just throw x₄ away. We are then left with the three nonzero row vectors. Now look at the equation

0 = c₁x₁ + c₂x₂ + c₃x₃

in R^5. Notice that the first component of x₁ is 1, since the vector equals (1, 0, 2, −1, 0), but the first component is zero for the other vectors. If we look at the above equation in the first component only, it reads

0 = c₁ · 1 + c₂ · 0 + c₃ · 0.

This forces c₁ = 0. The equation in the third component then becomes

0 = 0 · 2 + c₂ · 1 + c₃ · 0,

so again c₂ = 0, and finally c₃ = 0. Therefore our three row vectors are linearly independent and the rank of the matrix is 3! You should convince yourself that this argument works for any matrix in row-echelon form. This provides us with the following method for computing the rank of a matrix:

Algorithm 3.7.2 (Compute the rank of a matrix) Let A be an n-by-m matrix.

1. Apply Gaussian elimination to reduce A to row-echelon form.

2. Count the number of nonzero rows in the row-echelon form.

3. The number of nonzero rows is the rank of the matrix.
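As a cross-check: the matrix of Example 3.6.7 reduced to a row-echelon form with two nonzero rows, so the algorithm gives rank 2. NumPy's built-in matrix_rank, which works via the SVD rather than elimination, agrees on this example.

```python
# Rank of the matrix from Example 3.6.7: two nonzero rows in its
# row-echelon form, hence rank 2.
import numpy as np

A = np.array([[1, 3, -2, -7], [4, 1, 3, 5], [2, -5, 7, 19]])
print(np.linalg.matrix_rank(A))   # 2
```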


This method has another nice application to the material on vector spaces. Assume that we are given some vectors x₁, …, xₖ ∈ R^n and we want to figure out the dimension of V = span(x₁, …, xₖ). This can be done using the idea of the previous algorithm as follows:

Algorithm 3.7.3 (Find a basis of V = span(x₁, …, xₖ)) We perform the following steps:

1. Form the k-by-n matrix A which has the vectors x₁, …, xₖ as its row vectors.

2. Compute the rank using the previous algorithm.

3. The rank will be the dimension of V, and the nonzero rows of the row-echelon form of A will form a basis of V.

Note 3.7.4 The previous algorithm provides us with a basis of V. It will not, however, find a subset of the list x₁, …, xₖ which is a basis. To do that you have to use Algorithm 2.5.2.

    3.8 Rank and systems of linear equations

There's one more important application of rank. The rank of the augmented matrix of a linear system lets us analyze whether or not a linear system of equations has any solutions and, if it does, whether there's a unique solution or infinitely many of them. Given a system AX = B, the size of its solution set is completely determined by comparing the rank of A to the rank of (A | B).

Assume that we start with a matrix A and reduce it to row-echelon form. Since the rank of A can never exceed the number of rows of A, we have two cases:

1. The row-echelon form has no zero rows. We say A has full rank, since the rank equals the number of rows of A.

2. The row-echelon form has one or more zero rows at the bottom, so the rank is less than the number of rows of A.

Now let's think about what happens if we add one more column to A and compute the row-echelon form of the new matrix. If you look at the examples of Gaussian elimination from the previous section, you'll notice that at each stage of the algorithm we only care about one column at a time, since we try to clear the elements below the current pivot to zero. Therefore nothing changes in the algorithm when we add a new column, except when we reach that column. The end result is that adding one more column on the right might turn a zero row of the old row-echelon form into a nonzero row in the row-echelon form of the new matrix. Thus we get the following:

Proposition 3.8.1 Given a linear system AX = B, we have the following possibilities:

1. rank A = rank(A | B).

2. rank A < rank(A | B).

The significance is that in the first case the system has solutions, while in the second case it does not. The second case essentially means that the row-echelon form of (A | B) will have a row of the form

[ 0 ⋯ 0 1 ].

But in the linear system corresponding to the matrix, this row corresponds to the equation 0 = 1, which has no solution! Thus we have the following theorem:

Theorem 3.8.2 A linear system AX = B has a solution if and only if rank A = rank(A | B).

This gives us a simple algorithm for determining whether a system has a solution:

Algorithm 3.8.3 (Checking if a linear system has a solution) Let the system be AX = B with augmented matrix (A | B). We do the following:

1. Reduce the matrix (A | B) to row-echelon form.

2. Let M denote the reduced matrix and N the reduced matrix with the last column removed.

3. If rank M > rank N, then the system has no solution.

Finally, if the system has solutions, then we have two cases:

1. The system has a unique solution.

2. The system has more than one solution.

    Now assume that the latter holds, then we may find two solutions to theequation AX = B, so let X1 and X2 be solutions. As we saw earlier thismeans that

    A(X1 −X2) = 0,

    but then we can multiply X1−X2 by a scalar getting c(X1−X2) and, again,

    A(c(X1 −X2)) = cA(X1 −X2) = 0,

so it follows that X1 + c(X1 − X2) is a solution to the system for all choices of c. Since X1 − X2 is not the zero vector (the solutions being distinct), different choices of c give different solutions, so the system has infinitely many solutions. Hence, we have the theorem:

Theorem 3.8.4 A consistent linear system has either a unique solution or infinitely many solutions.

There’s a very simple way to distinguish between these. Let A be the n-by-m coefficient matrix of AX = B, so the system has n equations and m unknowns. Then, for a consistent system, the following holds:

    1. If rankA < m, then we have infinitely many solutions.

2. If rankA = m, then we have a unique solution.

To collect everything together, we have the following algorithm for completely solving a linear system:

Algorithm 3.8.5 (Solving a linear system) Assume we are given a system of n equations and m unknowns, so AX = B, where A is an n-by-m matrix. We do the following:

1. Compute the row-echelon form of (A | B).

2. Let M denote the reduced matrix and N the reduced matrix with the last column removed.

    3. If rankM > rankN , then stop since the system has no solutions.

    4. If rankM = rankN = m, then the system has a unique solution.

5. If rankM = rankN < m, then the system has infinitely many solutions.

6. If we have solutions and r = rankM, then solve by back substitution as in the example at the beginning of section 3.6. During the back substitution we can freely choose the values of m − r variables. (A code sketch of steps 1–5 follows.)
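The following sketch (Python with sympy; the classification function and the example system are my own, not from the notes) implements steps 1–5 of the algorithm. The actual back substitution of step 6 is left out:

    from sympy import Matrix

    def classify(A, B):
        # Classify the system AX = B following Algorithm 3.8.5 (steps 1-5).
        m = A.cols                       # number of unknowns
        r = A.rank()                     # rank of the coefficient matrix
        r_aug = A.row_join(B).rank()     # rank of the augmented matrix
        if r < r_aug:
            return "no solutions"
        if r == m:
            return "a unique solution"
        return "infinitely many solutions, %d free variables" % (m - r)

    A = Matrix([[1, 2, 3],
                [2, 4, 6]])
    B = Matrix([1, 2])
    print(classify(A, B))   # infinitely many solutions, 2 free variables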

    3.9 Determinants

In math 114 you have already met a form of the determinant when computing the cross product of two vectors. The determinant is a tool that you can think about as follows. You feed it a list of n vectors in Rn and out comes a real number. In other words, it’s a function that takes vectors as input and outputs a number. The function is denoted by det, and in the case n = 2 it would give us a number

    det((a1, b1), (a2, b2)).

The problem of this section is to determine what we would like this function to do and then how to define it.

The starting point is the following. Assume that we are given two vectors x1, x2 in R2. Then from 114 you might remember that they form a parallelogram with vertices 0, x1, x2, x1 + x2. We would like the determinant to measure the signed area, so

    det(x1,x2) = ±area of parallelogram.

For three vectors y1, y2, y3 ∈ R3, we would like the determinant to measure the signed volume of the parallelepiped they span. This concept of sign has to do with something called the orientation of the measured geometric object. We won’t go into that.

This idea of “volume” can be generalized to Rn, and we want the previous two examples to generalize to the definition of volume in higher dimensions. If you don’t want to think about what something like that would philosophically mean (e.g. what’s a cube in 4 dimensions and what’s its volume?), just assume that all the following examples have n = 2 or n = 3.

If the determinant measures volume, then it should certainly satisfy the following:

    det(cx1,x2, . . . ,xn) = c det(x1,x2, . . . ,xn), (1)

so if I stretch (or shrink) the parallelepiped in one dimension by a factor of c, then certainly the volume also changes by a factor of c. Now if you add a vector x to x1, then we should get

    det(x1 + x,x2, . . . ,xn) = det(x1,x2, . . . ,xn) + det(x,x2, . . . ,xn) (2)

since the parallelepiped on the left can be cut into pieces and reassembled to form the ones on the right (draw a picture of the n = 2 case). Also, if two vectors are the same, then the volume should be zero, since the object is “flat”: in R3 the parallelepiped would have no height, and similarly the parallelogram defined by two copies of the same vector has no height. Thus we want that

    det(x1, . . . ,xn) = 0 (3)

if xi = xj for some i ≠ j.

To make notation easier, we can define the determinant on matrices. Since the determinant takes as input n vectors x1, . . . , xn ∈ Rn, we can just let A be an n-by-n matrix whose row vectors are x1, . . . , xn, i.e.

A = [ x1 ]
    [ ⋮  ]
    [ xn ]

and then let detA = det(x1, . . . , xn). Now we are ready to actually define the determinant function through a list of conditions we want it to satisfy:

Definition 3.9.1 The determinant is a function that takes as input an n-by-n matrix and outputs a number. The function is assumed to have the following properties:

    1. The identity matrix has determinant 1, so det I = 1.

    41

2. If we replace the ith row vector by ax + by, we have the following equality:

det(x1, . . . , xi−1, ax + by, xi+1, . . . , xn) = a det(x1, . . . , xi−1, x, xi+1, . . . , xn) + b det(x1, . . . , xi−1, y, xi+1, . . . , xn).

    3. If two rows in A are equal, then detA = 0.

WARNING 3.9.2 The determinant is only defined for square matrices, so whenever we speak of the determinant of a matrix it’s automatically assumed that the matrix is a square matrix.

Note 3.9.3 These conditions are nothing but a compact way of expressing what we did above. At least the middle condition deserves some comment. It actually says a few things. First, if b = 0, then it simply says that multiplying a row by the constant a multiplies the determinant by a, so it’s just equation (1) above. When a = b = 1 it’s simply condition (2) about adding another vector to one of the rows. The last condition is precisely condition (3).

So we haven’t actually given a formula for computing the determinant, and we don’t even know if there is a function that satisfies our properties, since the list of properties might be contradictory. For example, you can’t find a function s.t. f(1) = 0 and f(1) = 1, so just listing a bunch of properties and saying we pick a function that has these properties is not very correct from a mathematical point of view. Thus, our current definition of the determinant is more like a wishlist. However, in this course we will just take for granted that the determinant function exists and has all the properties listed.
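Even though we take existence on faith, nothing stops us from spot-checking the wishlist numerically. The sketch below (Python with numpy; the random test matrices and the whole setup are mine) treats numpy.linalg.det as the function whose existence we are assuming and verifies properties 1–3 on a 4-by-4 example:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    a, b = 2.0, -3.0

    # Property 1: det I = 1.
    assert np.isclose(np.linalg.det(np.eye(n)), 1.0)

    # Property 2: linearity in the first row, the other rows held fixed.
    Axy, Ax, Ay = A.copy(), A.copy(), A.copy()
    Axy[0], Ax[0], Ay[0] = a * x + b * y, x, y
    lhs = np.linalg.det(Axy)
    rhs = a * np.linalg.det(Ax) + b * np.linalg.det(Ay)
    assert np.isclose(lhs, rhs)

    # Property 3: two equal rows force the determinant to be zero.
    A2 = A.copy()
    A2[1] = A2[0]
    assert np.isclose(np.linalg.det(A2), 0.0)

Passing asserts are of course not a proof, but they make the three conditions tangible.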

    3.10 Properties of the determinant

In this section we will derive some properties of the determinant starting from the three properties listed in the definition of the determinant. These properties end up being extremely important for practical computations.

Theorem 3.10.1 Interchanging two rows of a matrix changes the sign of the determinant.

Proof. I’ll use the vector notation for this one, since it’s easier. Assume that A has row vectors x1, . . . , xn and assume we switch the first and second row. The argument for any other pair is the same, but the notation is just messier. We expand using property (2) and use (3) to get rid of terms, so

0 = det(x1 + x2, x1 + x2, x3, . . . , xn)
  = det(x1, x1, x3, . . . , xn) + det(x1, x2, x3, . . . , xn)
    + det(x2, x1, x3, . . . , xn) + det(x2, x2, x3, . . . , xn)
  = det(x1, x2, x3, . . . , xn) + det(x2, x1, x3, . . . , xn).

Since the sum of the last two terms is zero, they must have the same magnitude and opposite signs.

Theorem 3.10.2 If a matrix contains a row of all zeros, then the determinant is zero.

Proof. I’ll assume the first row is zero to simplify notation. The proof for any other row is the same. Let the row vectors be 0, x2, . . . , xn. Then

det(0, x2, . . . , xn) = det(0 + 0, x2, . . . , xn)
                      = det(0, x2, . . . , xn) + det(0, x2, . . . , xn).

Now subtract det(0, x2, . . . , xn) from both sides.

Theorem 3.10.3 Adding a scalar multiple of one row to a different row will not change the value of the determinant.

Proof. Assume we add a scalar multiple of the row xi, i ≠ 1, to the first row. The general case is again similar. Let the row vectors be x1, x2, . . . , xn. Then

det(x1 + axi, x2, . . . , xn) = det(x1, x2, . . . , xn) + a det(xi, x2, . . . , xn).

But xi occurs twice in the list of vectors in the second term, so property (3) of the determinant tells us that the term is zero.

The following theorem is very important, and the observation used in the proof will be used to actually compute determinants.

Theorem 3.10.4 If an n-by-n matrix has rank less than n, then the determinant is zero.

Proof. Let A have rank(A) < n and let B be the row-echelon form of A. Then B will have a row of all zeros at the bottom, so by the previous theorem detB = 0. The theorem then follows from observing the following:

1. If the determinant of a matrix is nonzero, then multiplying a row by a nonzero constant won’t make the determinant zero (since the determinant just gets multiplied by the constant).

2. Interchanging two rows of a matrix will not make the determinant zero, since it just switches the sign of the determinant.

3. Adding a constant multiple of one row to a different row does not change the determinant.

Thus we see that applying elementary row operations to a matrix with nonzero determinant won’t make the determinant zero. Since B has been derived from A precisely in this way, detB = 0 only if detA = 0.

The next theorem is the basis for actually computing determinants, since it gives an extremely simple formula for the determinant of certain matrices.

Theorem 3.10.5 If a matrix is upper or lower triangular, then the determinant equals the product of the diagonal elements.

Proof. Doing this in full generality is quite annoying, so I’ll show the idea in the case of a 3-by-3 matrix for the upper triangular case. Thus, we have the matrix

[ a11 a12 a13 ]
[ 0   a22 a23 ]
[ 0   0   a33 ]

First notice that if any of the diagonal elements is zero, then reducing the matrix to row-echelon form will produce a zero row, so the determinant is zero. Thus the determinant equals the product of the diagonal elements in this case.

Now assume that all diagonal elements are nonzero. Subtracting a23/a33 times the third row from the second row and a13/a33 times the third row from the first row, the matrix becomes

[ a11 a12 0   ]
[ 0   a22 0   ]
[ 0   0   a33 ].

Now subtract a12/a22 times the second row from the first, so the matrix becomes

[ a11 0   0   ]
[ 0   a22 0   ]
[ 0   0   a33 ].

What we have left is just the identity matrix with the rows multiplied by some scalars. In other words,

det [ a11 0   0   ]
    [ 0   a22 0   ]
    [ 0   0   a33 ]

  = a11 det [ 1 0   0   ]
            [ 0 a22 0   ]
            [ 0 0   a33 ]

  = a11a22 det [ 1 0 0   ]
               [ 0 1 0   ]
               [ 0 0 a33 ]

  = a11a22a33 det I = a11a22a33,

since the identity matrix has determinant 1. The proof of the lower triangular case is essentially the same.

The following is the most efficient way of computing the determinant for large matrices.

    Algorithm 3.10.6 (Determinant by Gaussian elimination) Let A be an n-by-n matrix.

1. Reduce the matrix to row-echelon form, keeping track of how many rows have been switched and how many times a row has been multiplied by a constant. (These are the operations that change the determinant.)

2. The row-echelon matrix is upper triangular, so compute its determinant by multiplying the elements on the diagonal. Let the computed determinant of the row-echelon form be d.

3. Assume that rows have been multiplied by the scalars c1, c2, . . . , ck in order during the Gaussian elimination and that a row has been switched with another s times.

4. detA = (−1)^s (c1 · · · ck)^−1 d. (A code sketch of this procedure follows.)
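Here is a minimal sketch of this algorithm in Python (my own implementation, using exact fractions to avoid rounding issues). This variant never multiplies a row by a constant, so step 3 degenerates to tracking row switches only, and the determinant is just the signed product of the pivots:

    from fractions import Fraction

    def det_by_elimination(rows):
        # Determinant via Gaussian elimination, tracking row switches.
        A = [[Fraction(v) for v in row] for row in rows]
        n = len(A)
        sign = 1
        for j in range(n):
            # Find a nonzero pivot in column j, at or below row j.
            pivot = next((i for i in range(j, n) if A[i][j] != 0), None)
            if pivot is None:
                return Fraction(0)       # rank < n, so det = 0
            if pivot != j:
                A[j], A[pivot] = A[pivot], A[j]
                sign = -sign             # a row switch flips the sign
            for i in range(j + 1, n):    # clear the entries below the pivot
                f = A[i][j] / A[j][j]
                A[i] = [x - f * p for x, p in zip(A[i], A[j])]
        d = Fraction(1)
        for j in range(n):
            d *= A[j][j]                 # product of the diagonal elements
        return sign * d

    print(det_by_elimination([[2, 0, 1], [-2, 3, 4], [-5, 5, 6]]))  # 1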

The determinant also has the following two properties that are much harder to prove:

    Theorem 3.10.7 det(AB) = det(A) det(B).

    Theorem 3.10.8 det(A) = det(AT ).

The latter theorem has lots of useful consequences. Essentially, any claim about columns becomes a claim about rows in the transpose, so we can translate all our theorems to involve operations on columns instead of rows. Using this and our previous results, we can summarize the following long list of properties of the determinant:

1. Interchanging two rows or columns in a matrix A changes the sign of the determinant.

2. For a triangular (upper or lower) matrix A, the determinant is the product of the diagonal elements. In particular, det I = 1.

    3. If a matrix A has a zero row or column, then detA = 0.

4. If one row or column of A is a linear combination of the others, then detA = 0.

5. detA does not change if we add a scalar multiple of a row or column to another row or column.

6. Multiplying a row or a column by a constant c multiplies the determinant by c.

    7. detA = detAT .

    8. det(AB) = det(A) det(B).

9. For an n-by-n matrix, det(cA) = c^n detA (we multiply n rows by c).

    10. If an n-by-n matrix A has rankA < n, then detA = 0.

    3.11 Some other formulas for the determinant

In this section we list some other concrete ways of computing the determinant. I won’t provide any proofs or justification for these, since it takes quite a bit of work to derive them from the results in the previous section.

Theorem 3.11.1 (Determinant of 2-by-2 matrix) For a 2-by-2 matrix the determinant has the following simple formula:

det [ a11 a12 ]  = a11a22 − a12a21.
    [ a21 a22 ]

    Proof. This follows from row-reducing the matrix.

Theorem 3.11.2 (Determinant of 3-by-3 matrix) The determinant of a 3-by-3 matrix

[ a11 a12 a13 ]
[ a21 a22 a23 ]
[ a31 a32 a33 ]

has the following simple visual rule:

a11 a12 a13 | a11 a12
a21 a22 a23 | a21 a22
a31 a32 a33 | a31 a32

Copy the first two columns to the right of the matrix; the three products along the diagonals going down and to the right are taken with a + sign, and the three products along the diagonals going up and to the right are taken with a − sign.

In other words,

det [ a11 a12 a13 ]
    [ a21 a22 a23 ]  = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a11a23a32 − a12a21a33.
    [ a31 a32 a33 ]

Proof. Again we can reduce symbolically to row-echelon form. The details are quite messy.

Unfortunately, no simple formula exists for n-by-n matrices when n > 3. The only ways to compute the determinant are the methods from the previous section or the method of cofactor expansion described next.

Definition 3.11.3 Let A be a matrix. The (i, j) minor of A is the matrix that we get by removing the ith row and jth column of A.

Example 3.11.4 The matrix on the right is the (2, 1) minor of the one on the left:

[ −3  2 −1 ]
[  0  5  3 ] ,    [  2 −1 ]
[  1 −1  2 ]      [ −1  2 ]

Definition 3.11.5 Let A = (aij) be an n-by-n matrix. The cofactor Cij is defined to be (−1)^(i+j) detMij, where Mij is the (i, j) minor of A.

Theorem 3.11.6 Let A = (aij) be an n-by-n square matrix. Then the determinant satisfies the following:

    detA = ai1Ci1 + ai2Ci2 + . . .+ ainCin.

Thus we multiply each element in row i by the cofactor we get by removing the row and column of the element. Similarly, we can do the same for columns, so

    detA = a1iC1i + a2iC2i + . . .+ aniCni.

Definition 3.11.7 The method of computing the determinant in the previous theorem is called the cofactor expansion of the determinant.

Note 3.11.8 The previous formula lets you express the determinant of an n-by-n matrix as a sum of determinants of (n − 1)-by-(n − 1) matrices.
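The recursion in the previous note translates directly into code. A short sketch (Python; the function is my own illustration, and, as Note 3.11.10 below warns, not an efficient method): we expand along the first row, deleting row 0 and column j to form each minor:

    def det_cofactor(A):
        # Determinant by cofactor expansion along the first row.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # The (1, j+1) minor: delete the first row and column j.
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det_cofactor(minor)
        return total

    print(det_cofactor([[-3, 2, -1], [0, 5, 3], [1, -1, 2]]))  # -28

Here (−1) ** j is the cofactor sign (−1)^(1+(j+1)) written in 0-indexed form.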

Example 3.11.9 Expanding the determinant in the previous example along the third column gives

det [ −3  2 −1 ]
    [  0  5  3 ]
    [  1 −1  2 ]

= (−1)(−1)^(1+3) det [ 0  5 ]  + 3(−1)^(2+3) det [ −3  2 ]  + 2(−1)^(3+3) det [ −3 2 ].
                     [ 1 −1 ]                    [  1 −1 ]                    [  0 5 ]

Note 3.11.10 The cofactor expansion is rarely an efficient method for computing determinants of matrices larger than 4-by-4. However, in some special cases it might be useful. For example, if there’s a row or column in the matrix that has mostly zeros in it, then expanding along that row or column results in very few terms in the sum in Theorem 3.11.6, since most of the aij’s will be zero.

Example 3.11.11 The following matrix has mostly zeros in the last column, so the expansion along that column gives

det [ −3  2 0 ]
    [  0  5 0 ]
    [  1 −1 2 ]

= 0·(−1)^(1+3) det [ 0  5 ]  + 0·(−1)^(2+3) det [ −3  2 ]  + 2(−1)^(3+3) det [ −3 2 ]
                   [ 1 −1 ]                     [  1 −1 ]                    [  0 5 ]

= 2 det [ −3 2 ].
        [  0 5 ]

Similarly, we could expand along the second row, which would have the same effect.

3.12 Matrix inverse

Definition 3.12.1 Let A be an n-by-n matrix. An n-by-n matrix B is called a left inverse if BA = I. Similarly, B is called a right inverse if AB = I. If BA = I = AB, then B is called an inverse of A.

Matrix multiplication takes a lot of work on larger matrices, which should be apparent by now. Therefore, checking that a matrix is an inverse takes a lot of work, since we have to perform two matrix multiplications to do it. Fortunately, the following result allows us to cut the work in half.

Theorem 3.12.2 Let B be a left inverse of A; then B is also a right inverse. Thus to check if B is an inverse of A, one only needs to check that BA = I.

Proof. This is tricky without the theory of linear maps. The proof can be found in most texts on linear algebra.

    Corollary 3.12.3 If AB = I, then BA = I.

Proof. A is now a left inverse of B, so it’s also a right inverse by the previous theorem.

Theorem 3.12.4 If a matrix A has an inverse, then it is unique. This lets us talk about the inverse of A.

    Proof. Assume that both B and C are inverses of A. Then

    B = BI = B(AC) = (BA)C = IC = C.

Definition 3.12.5 A matrix A with an inverse is called an invertible matrix. The inverse of A will be denoted by A−1.

The following shows how the matrix inverse is extremely useful for solving systems of linear equations. Assume we are given an invertible n-by-n matrix A. Now given a system of linear equations

    AX = B

we can just multiply both sides by A−1 from the left. It follows that

    X = A−1B.

What’s important about this is that the matrix inverse gives the solution right away for any choice of vector B, since the solution X is now a function of B.

Example 3.12.6 You should check by multiplication that the following matrices are inverses:

A = [  2  0  1 ]          A−1 = [ −2   5  −3 ]
    [ −2  3  4 ] ,              [ −8  17 −10 ]
    [ −5  5  6 ]                [  5 −10   6 ]

Thus if we have a linear system AX = B, then we get

[ x ]   [ −2   5  −3 ] [ b1 ]   [ −2b1 + 5b2 − 3b3   ]
[ y ] = [ −8  17 −10 ] [ b2 ] = [ −8b1 + 17b2 − 10b3 ]
[ z ]   [  5 −10   6 ] [ b3 ]   [ 5b1 − 10b2 + 6b3   ],

so once we know the matrix inverse of the coefficient matrix of a system, solving it becomes a triviality. The moral of the story is that if you need to solve many systems of equations with the same coefficient matrix but varying constant vectors, then the most economical approach is to try to compute the matrix inverse of the coefficient matrix. If it exists, then you will get a formula like the one above, where you can just plug in your constants, i.e. you don’t need to do Gaussian elimination again for every choice of constant vector.
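A sketch of this “invert once, solve for any B” idea in Python with sympy (the choice of tool is mine; the matrix is the one from Example 3.12.6, and sympy’s inv() performs internally the kind of computation the rest of this section develops by hand):

    from sympy import Matrix, symbols

    A = Matrix([[ 2, 0, 1],
                [-2, 3, 4],
                [-5, 5, 6]])
    Ainv = A.inv()                        # computed once, up front

    b1, b2, b3 = symbols("b1 b2 b3")
    print(Ainv * Matrix([b1, b2, b3]))    # the general formula above

    # Each new constant vector now costs one matrix-vector product:
    print(Ainv * Matrix([1, 0, 0]))       # Matrix([[-2], [-8], [5]])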

Having a nice tool like the matrix inverse is not of much use unless we actually know how to compute it. The rest of this section will be devoted to developing an algorithm that lets you compute the inverse if it exists, or determine that it does not.

Assume we are given matrices A, B with the same number of rows and let (A | B) again be the augmented matrix. Assume we want to perform an elementary row operation on (A | B); say, we want to switch two rows in the matrix. We know from previous sections that this can be performed by multiplying with an elementary matrix E from the left. The reader should convince himself/herself that the following formula then holds:

    E(A | B) = (EA | EB).

It simply says that it doesn’t matter if we switch rows in the whole matrix or if we do it separately for the two parts. The end result will still be the same. The same thing is true for any of the elementary row operations. Thus we have the following theorem:

Theorem 3.12.7 Let A and B be two matrices with n rows and let E be an elementary n-by-n matrix. Then

    E(A | B) = (EA | EB).

Next assume that A is an n-by-n square matrix and I is the identity matrix. Assume that we can transform A into the identity matrix by a sequence of elementary row operations. Since each row operation corresponds to multiplying from the left by an elementary matrix, there will be a sequence of elementary matrices E1, . . . , Ek corresponding to the row operations; in other words,

    EkEk−1 · · ·E2E1A = I. (4)

Now what happens if we perform the same row operations on the matrix (A | I)? Then we will instead end up with

    EkEk−1 · · ·E2E1(A | I) = (I | EkEk−1 · · ·E2E1).

Thus the second half of the matrix will contain the matrix B = EkEk−1 · · ·E2E1. However, (4) tells us that BA = I, so we have in fact found a way to compute the matrix inverse! It follows that we have the following theorem:

Theorem 3.12.8 Assume that an n-by-n matrix A can be converted into the identity matrix through elementary row operations. Then A is invertible and the previous method lets us compute the inverse.

Example 3.12.9 We compute the inverse of the matrix

[ 1 −1  2 ]
[ 3  1  2 ]
[ 1  1 −1 ]

using the method just described. We get

[ 1 −1  2 | 1 0 0 ]      [ 1 −1  2 |  1 0 0 ]
[ 3  1  2 | 0 1 0 ]  ⇒   [ 0  4 −4 | −3 1 0 ]
[ 1  1 −1 | 0 0 1 ]      [ 0  2 −3 | −1 0 1 ]

[ 1 −1  2 |    1    0  0 ]      [ 1 −1  2 |    1    0   0 ]
[ 0  1 −1 | −3/4  1/4  0 ]  ⇒   [ 0  1 −1 | −3/4  1/4   0 ]
[ 0  2 −3 |   −1    0  1 ]      [ 0  0  1 | −1/2  1/2  −1 ]

The matrix on the left is now in row-echelon form and we can work backwards, subtracting lower rows from upper rows, to transform the left matrix into the identity matrix. This works as follows:

[ 1 −1  0 |    2   −1   2 ]      [ 1 0 0 |  3/4 −1/4   1 ]
[ 0  1  0 | −5/4  3/4  −1 ]  ⇒   [ 0 1 0 | −5/4  3/4  −1 ]
[ 0  0  1 | −1/2  1/2  −1 ]      [ 0 0 1 | −1/2  1/2  −1 ]

It follows that

[ 1 −1  2 ]−1    [  3/4 −1/4   1 ]
[ 3  1  2 ]   =  [ −5/4  3/4  −1 ].
[ 1  1 −1 ]      [ −1/2  1/2  −1 ]
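The same computation can be sketched in Python with sympy (my choice of tool): rref() carries the augmented matrix (A | I) all the way to (I | A−1) in one call, doing both the forward elimination and the working-backwards step:

    from sympy import Matrix, eye

    A = Matrix([[1, -1,  2],
                [3,  1,  2],
                [1,  1, -1]])

    aug = A.row_join(eye(3))   # the augmented matrix (A | I)
    R, _ = aug.rref()          # reduced row-echelon form (I | A^-1)
    Ainv = R[:, 3:]            # the right half is the inverse
    print(Ainv)                # Matrix([[3/4, -1/4, 1], ...])
    assert A * Ainv == eye(3)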

Now let’s make the convention that a vector x = (a1, . . . , an) is an n-by-1 matrix. Then multiplying with an m-by-n matrix A makes sense, i.e., we may compute the product

A [ a1 ]
  [ ⋮  ]
  [ an ]

and the result will be an m-by-1 matrix, which is a vector in Rm.

Definition 3.12.10 An m-by-n matrix defines a function Rn → Rm which maps a vector x ∈ Rn to Ax ∈ Rm. Such a function is called a linear map, or linear transformation, or a linear operator.

Using the concept of a linear map, we will give a complete answer to the following question: when does a matrix have an inverse? Going back to Example 3.12.9, let’s look at the working-backwards step after we have reached the row-echelon form. Notice that the method used to work backwards to the identity matrix works for any n-by-n matrix in row-echelon form, assuming that the last row is not a zero row. However, these matrices are precisely the square matrices that have nonzero determinant, since the row-echelon form has determinant one and the original determinant differs from it by a nonzero constant corresponding to row switches and scalings of rows by nonzero scalars during the Gaussian elimination.

Conversely, assume that the matrix A has an inverse A−1. If the row-echelon form of A has a zero row, then rankA < n, so the system AX = 0 has a nonzero solution, call it x. But then

    x = Ix = A−1Ax = A−10 = 0,

which would contradict the fact that x ≠ 0. It follows that if rankA < n, then no inverse can exist, so if the inverse exists, then the row-echelon form does not have a zero row at the bottom. We have thus proved the following theorem:

Theorem 3.12.11 A square matrix A has an inverse if and only if detA ≠ 0.

Algorithm 3.12.12 (Finding inverse of matrix) Let A be a square matrix.

1. Compute detA.

    2. If detA = 0, then no inverse exists.

3. If detA ≠ 0, use the method in Example 3.12.9 to find the inverse.

Finally, there’s a useful formula for 2-by-2 matrices, which is very quick to compute:

Theorem 3.12.13 Write

A = [ a b ]
    [ c d ].

If detA ≠ 0, then we have the formula:

A−1 = (1/detA) [  d −b ]  =  (1/(ad − bc)) [  d −b ]
               [ −c  a ]                   [ −c  a ].

Proof. Compute the product of A and the claimed inverse and simplify.

Definition 3.12.14 An invertible square matrix will be called nonsingular while a noninvertible square matrix will be called singular.

    3.13 Eigenvalues and eigenvectors

This section introduces one of the most important computational tools in mathematics, with an extremely diverse set of applications in engineering and the sciences. The interested reader should check the discussion of applications on the Wikipedia page explaining eigenvectors: http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors#Applications

In the previous section we mentioned that an n-by-n matrix A defines a function Rn → Rn, x ↦ Ax. We are often interested in what this map does geometrically, since the maps determined by matrices tend to have a very geometric description. The following examples serve as illustrations.

Example 3.13.1 The matrix below rotates points in R2 counterclockwise around the origin by the angle θ, which the reader can easily check by multiplying different (a, b) with the matrix:

[ cos θ  −sin θ ]
[ sin θ   cos θ ]

This matrix and its generalizations have important applications in e.g. computer graphics, where it’s used to rotate objects before computing a projection of the scene to a 2D screen. Note that the matrix rotating clockwise, i.e. in the opposite direction by θ, is the inverse of the matrix above.

Example 3.13.2 The following matrix reflects points around the x-axis:

[ 1   0 ]
[ 0  −1 ]

Example 3.13.3 Write θ = arctan a and let Aθ be the rotation matrix above corresponding to the angle θ. Denote by A the reflection matrix above. Then the matrix

Aθ A Aθ−1

reflects points around the line y = ax.

Example 3.13.4 Let Aθ be as in the previous example. Define

B = [ c 0 ]
    [ 0 1 ],

so that B scales the x-component of a vector by c. Then the matrix

Aθ B Aθ−1

scales the component along the line y = ax by c.
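A numeric spot check of the last two examples (Python with numpy; the value a = 2, the scale c = 3, and the test vectors are my own): conjugating by the rotation onto the line y = ax should fix the direction (1, a) of the line and flip its normal (−a, 1), and the scaling version should stretch (1, a) by c:

    import numpy as np

    a = 2.0
    theta = np.arctan(a)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])    # rotation by theta (Example 3.13.1)
    F = np.array([[1, 0], [0, -1]])    # reflection in the x-axis (3.13.2)

    M = R @ F @ np.linalg.inv(R)       # reflection about y = a*x (3.13.3)

    v = np.array([1.0, a])             # along the line: should stay fixed
    n = np.array([-a, 1.0])            # normal to the line: should flip
    assert np.allclose(M @ v, v)
    assert np.allclose(M @ n, -n)

    B = np.array([[3.0, 0], [0, 1]])   # B with c = 3 (Example 3.13.4)
    N = R @ B @ np.linalg.inv(R)
    assert np.allclose(N @ v, 3 * v)   # component along the line scaled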

It turns out that by defining matrices for different operations, we can use them as building blocks to compute quite complicated maps, which would be very hard to construct directly. The last two examples serve as illustrations. While this is extremely useful in practice, we won’t have time to go into this. I will write an optional short section on this which the interested student can skim.

Notice that in the second example a vector along the x-axis stays fixed by the matrix, in the third example a vector along the line stays fixed, and in the fourth example a vector along the line gets scaled by c. The geometry of such a map can usually be analyzed through the following concept.

Definition 3.13.5 Let A be an n-by-n matrix, so it defines a linear map Rn → Rn. Then a nonzero vector x ∈ Rn will be called an eigenvector of A if

    Ax = λx

for some λ ∈ R. In other words, an eigenvector is a nonzero vector for which multiplying by A has the effect of just scaling the vector. The eigenvalues of A are the scalars λ for which a corresponding eigenvector exists that satisfies the equation above.

Example 3.13.6 In the second example above, (1, 0) is an eigenvector corresponding to the eigenvalue 1. In the third example the vector (1, a) is an eigenvector corresponding to the eigenvalue 1, while in the fourth example (1, a) is an eigenvector corresponding to the eigenvalue c.

Example 3.13.7 Let A = I be the identity matrix. For every vector x ∈ Rn, we then have that Ax = x. This shows that every nonzero vector is an eigenvector corresponding to the eigenvalue 1. In particular, 1 is the only eigenvalue.

Example 3.13.8 Let A be an n-by-n matrix. Then A has the eigenvalue 0 if and only if AX = 0 has a nonzero solution, i.e. rankA < n. This follows, since if x is a nonzero solution, then Ax = 0 = 0x, so 0 is an eigenvalue. Conversely, if 0 is an eigenvalue, then there’s a corresponding eigenvector satisfying the equation Ax = 0x = 0. Since eigenvectors are nonzero, this gives a nonzero solution to AX = 0.

    Theorem 3.13.9 λ ∈ R is an eigenvalue of A if and only if det(A−λI) = 0.

Proof. Assume first that λ ∈ R is an eigenvalue of A, so there exists a corresponding eigenvector x. We have that

Ax = λx ⇔ Ax − λx = 0 ⇔ (A − λI)x = 0.

Since x ≠ 0, it follows that B = A − λI does not have an inverse, so detB = 0.

Conversely, if det(A − λI) = 0, then A − λI does not have full rank, so the equation (A − λI)X = 0 has a nonzero solution. Thus there is a nonzero vector x s.t. (A − λI)x = 0. The chain of equivalences above shows that λ is then an eigenvalue corresponding to the eigenvector x.

This theorem shows us how to compute the eigenvalues of a matrix. Assume that we start with a matrix

A = [ a11 · · · a1n ]
    [  ⋮   ⋱   ⋮   ]
    [ an1 · · · ann ].

Then the expression A − λI corresponds to the matrix

A − λI = [ a11 − λ  · · ·  a1n     ]
         [   ⋮       ⋱      ⋮     ]
         [ an1      · · ·  ann − λ ],

i.e. we subtract λ from each diagonal element of A. Having fixed the matrix A, denote

p(λ) = det(A − λI),

so p : R → R is a function of the variable λ, and the eigenvalues of A are precisely the zeros of p. Let’s try to figure out what this function p looks like. We will start with an example.

Example 3.13.10 Write

A = [ a b ]
    [ c d ],

so that

p(λ) = det(A − λI) = det [ a − λ    b    ]  = (a − λ)(d − λ) − bc.
                         [   c    d − λ  ]

If we simplify the expression on the right, we see that p is a quadratic polynomial in λ. The following is true in general.

Theorem 3.13.11 If A is an n-by-n matrix, then the function p(λ) = det(A − λI) is a polynomial in the variable λ of degree n.

    Proof. This is an easy induction proof.

Definition 3.13.12 The polynomial p(λ) = det(A − λI) of an n-by-n square matrix is called the characteristic polynomial of A.

Finding the eigenvalues of a matrix is now relatively easy. We just compute the characteristic polynomial and look for its roots. This can be done exactly whenever n < 5, since we have explicit formulas for the roots of any polynomial of degree less than 5. What we still need to figure out is how to compute the eigenvectors. We do this next.
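In practice one often lets a computer algebra system grind out the polynomial and its roots. A sketch with sympy (the tooling choice and the 2-by-2 example matrix are mine):

    from sympy import Matrix, eye, symbols, solve

    lam = symbols("lambda")
    A = Matrix([[2, 1],
                [1, 2]])

    p = (A - lam * eye(2)).det()   # the characteristic polynomial
    print(p.expand())              # lambda**2 - 4*lambda + 3
    print(solve(p, lam))           # [1, 3] -- the eigenvalues

    # Jumping slightly ahead, sympy can also return the eigenvectors:
    print(A.eigenvects())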

Note 3.13.13 Not all polynomials with real coefficients have real roots, so some matrices have no eigenvalues. An example is given by Example 3.13.1.

    Let A be an n-by-n matrix an