
LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1982, 2008.

This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990. It is available free to all individuals, on the understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied, with or without permission from the author. However, this document may not be kept on any information storage and retrieval system without permission from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 1

LINEAR EQUATIONS

1.1. Introduction

Example 1.1.1. Try to draw the two lines
\[ 3x + 2y = 5, \qquad 6x + 4y = 5. \]
It is easy to see that the two lines are parallel and do not intersect, so that this system of two linear equations has no solution.

Example 1.1.2. Try to draw the two lines
\[ 3x + 2y = 5, \qquad x + y = 2. \]
It is easy to see that the two lines are not parallel and intersect at the point (1, 1), so that this system of two linear equations has exactly one solution.

Example 1.1.3. Try to draw the two lines
\[ 3x + 2y = 5, \qquad 6x + 4y = 10. \]
It is easy to see that the two lines overlap completely, so that this system of two linear equations has infinitely many solutions.


In these three examples, we have shown that a system of two linear equations on the plane R2 may have no solution, one solution or infinitely many solutions. A natural question to ask is whether there can be any other conclusion. Well, we can see geometrically that two lines cannot intersect at more than one point without overlapping completely. Hence there can be no other conclusion.

In general, we shall study a system of m linear equations of the form
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
\tag{1}
\]
with n variables x1, x2, . . . , xn. Here we may not be so lucky as to be able to see geometrically what is going on. We therefore need to study the problem from a more algebraic viewpoint. In this chapter, we shall confine ourselves to the simpler aspects of the problem. In Chapter 6, we shall study the problem again from the viewpoint of vector spaces.

If we omit reference to the variables, then system (1) can be represented by the array
\[
\left( \begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array} \right)
\tag{2}
\]
of all the coefficients. This is known as the augmented matrix of the system. Here the first row of the array represents the first linear equation, and so on.

We also write Ax = b, where
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
\quad\text{and}\quad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
\]
represent the coefficients and
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]
represents the variables.

Example 1.1.4. The array
\[
\left( \begin{array}{ccccc|c} 1 & 3 & 1 & 5 & 1 & 5 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 2 & 4 & 0 & 7 & 1 & 3 \end{array} \right)
\tag{3}
\]
represents the system of 3 linear equations
\[
\begin{aligned}
x_1 + 3x_2 + x_3 + 5x_4 + x_5 &= 5, \\
x_2 + x_3 + 2x_4 + x_5 &= 4, \\
2x_1 + 4x_2 + 7x_4 + x_5 &= 3,
\end{aligned}
\tag{4}
\]


with 5 variables x1, x2, x3, x4, x5. We can also write
\[
\begin{pmatrix} 1 & 3 & 1 & 5 & 1 \\ 0 & 1 & 1 & 2 & 1 \\ 2 & 4 & 0 & 7 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 5 \\ 4 \\ 3 \end{pmatrix}.
\]

1.2. Elementary Row Operations

Let us continue with Example 1.1.4.

Example 1.2.1. Consider the array (3). Let us interchange the first and second rows to obtain
\[
\left( \begin{array}{ccccc|c} 0 & 1 & 1 & 2 & 1 & 4 \\ 1 & 3 & 1 & 5 & 1 & 5 \\ 2 & 4 & 0 & 7 & 1 & 3 \end{array} \right).
\]
Then this represents the system of equations
\[
\begin{aligned}
x_2 + x_3 + 2x_4 + x_5 &= 4, \\
x_1 + 3x_2 + x_3 + 5x_4 + x_5 &= 5, \\
2x_1 + 4x_2 + 7x_4 + x_5 &= 3,
\end{aligned}
\tag{5}
\]
essentially the same as the system (4), the only difference being that the first and second equations have been interchanged. Any solution of the system (4) is a solution of the system (5), and vice versa.

Example 1.2.2. Consider the array (3). Let us add 2 times the second row to the first row to obtain
\[
\left( \begin{array}{ccccc|c} 1 & 5 & 3 & 9 & 3 & 13 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 2 & 4 & 0 & 7 & 1 & 3 \end{array} \right).
\]
Then this represents the system of equations
\[
\begin{aligned}
x_1 + 5x_2 + 3x_3 + 9x_4 + 3x_5 &= 13, \\
x_2 + x_3 + 2x_4 + x_5 &= 4, \\
2x_1 + 4x_2 + 7x_4 + x_5 &= 3,
\end{aligned}
\tag{6}
\]
essentially the same as the system (4), the only difference being that we have added 2 times the second equation to the first equation. Any solution of the system (4) is a solution of the system (6), and vice versa.

Example 1.2.3. Consider the array (3). Let us multiply the second row by 2 to obtain
\[
\left( \begin{array}{ccccc|c} 1 & 3 & 1 & 5 & 1 & 5 \\ 0 & 2 & 2 & 4 & 2 & 8 \\ 2 & 4 & 0 & 7 & 1 & 3 \end{array} \right).
\]
Then this represents the system of equations
\[
\begin{aligned}
x_1 + 3x_2 + x_3 + 5x_4 + x_5 &= 5, \\
2x_2 + 2x_3 + 4x_4 + 2x_5 &= 8, \\
2x_1 + 4x_2 + 7x_4 + x_5 &= 3,
\end{aligned}
\tag{7}
\]


essentially the same as the system (4), the only difference being that the second equation has been multiplied through by 2. Any solution of the system (4) is a solution of the system (7), and vice versa.

In the general situation, it is not difficult to see the following.

PROPOSITION 1A. (ELEMENTARY ROW OPERATIONS) Consider the array (2) corresponding to the system (1).
(a) Interchanging the i-th and j-th rows of (2) corresponds to interchanging the i-th and j-th equations in (1).
(b) Adding a multiple of the i-th row of (2) to the j-th row corresponds to adding the same multiple of the i-th equation in (1) to the j-th equation.
(c) Multiplying the i-th row of (2) by a non-zero constant corresponds to multiplying the i-th equation in (1) by the same non-zero constant.
In all three cases, the collection of solutions to the system (1) remains unchanged.
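The three operations are easy to carry out by machine. The following short Python sketch (ours, not part of the original notes; the function names are our own, and exact arithmetic is done with the standard fractions module) applies each operation to an augmented matrix stored as a list of rows:

```python
from fractions import Fraction

def swap_rows(M, i, j):
    # (a) interchange the i-th and j-th rows (0-indexed)
    M[i], M[j] = M[j], M[i]

def add_multiple(M, i, j, c):
    # (b) add c times the i-th row to the j-th row
    M[j] = [a + c * b for a, b in zip(M[j], M[i])]

def multiply_row(M, i, c):
    # (c) multiply the i-th row by a non-zero constant c
    assert c != 0
    M[i] = [c * a for a in M[i]]

# The augmented matrix (3) of Example 1.1.4, with exact arithmetic.
M = [[Fraction(v) for v in row] for row in
     [[1, 3, 1, 5, 1, 5], [0, 1, 1, 2, 1, 4], [2, 4, 0, 7, 1, 3]]]
add_multiple(M, 0, 2, -2)   # the first step taken in Example 1.2.4 below
```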

Let us investigate how we may use elementary row operations to help us solve a system of linear equations. As a first step, let us continue again with Example 1.1.4.

Example 1.2.4. Consider again the system of linear equations
\[
\begin{aligned}
x_1 + 3x_2 + x_3 + 5x_4 + x_5 &= 5, \\
x_2 + x_3 + 2x_4 + x_5 &= 4, \\
2x_1 + 4x_2 + 7x_4 + x_5 &= 3,
\end{aligned}
\tag{8}
\]
represented by the array
\[
\left( \begin{array}{ccccc|c} 1 & 3 & 1 & 5 & 1 & 5 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 2 & 4 & 0 & 7 & 1 & 3 \end{array} \right).
\tag{9}
\]

Let us now perform elementary row operations on the array (9). At this point, do not worry if you do not understand why we are taking the following steps. Adding −2 times the first row of (9) to the third row, we obtain
\[
\left( \begin{array}{ccccc|c} 1 & 3 & 1 & 5 & 1 & 5 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 0 & -2 & -2 & -3 & -1 & -7 \end{array} \right).
\]
From here, we add 2 times the second row to the third row to obtain
\[
\left( \begin{array}{ccccc|c} 1 & 3 & 1 & 5 & 1 & 5 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{array} \right).
\tag{10}
\]
Next, we add −3 times the second row to the first row to obtain
\[
\left( \begin{array}{ccccc|c} 1 & 0 & -2 & -1 & -2 & -7 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{array} \right).
\]
Next, we add the third row to the first row to obtain
\[
\left( \begin{array}{ccccc|c} 1 & 0 & -2 & 0 & -1 & -6 \\ 0 & 1 & 1 & 2 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{array} \right).
\]


Finally, we add −2 times the third row to the second row to obtain
\[
\left( \begin{array}{ccccc|c} 1 & 0 & -2 & 0 & -1 & -6 \\ 0 & 1 & 1 & 0 & -1 & 2 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{array} \right).
\tag{11}
\]

We remark here that the array (10) is said to be in row echelon form, while the array (11) is said to be in reduced row echelon form – precise definitions will follow in Sections 1.3–1.4. Let us see how we may solve the system (8) by using the arrays (10) or (11). First consider (10). Note that this represents the system

\[
\begin{aligned}
x_1 + 3x_2 + x_3 + 5x_4 + x_5 &= 5, \\
x_2 + x_3 + 2x_4 + x_5 &= 4, \\
x_4 + x_5 &= 1.
\end{aligned}
\tag{12}
\]

First of all, take the third equation
\[ x_4 + x_5 = 1. \]
If we let x5 = t, then x4 = 1 − t. Substituting these into the second equation, we obtain (you must do the calculation here)
\[ x_2 + x_3 = 2 + t. \]
If we let x3 = s, then x2 = 2 + t − s. Substituting all these into the first equation, we obtain (you must do the calculation here)
\[ x_1 = -6 + t + 2s. \]
Hence
\[ x = (x_1, x_2, x_3, x_4, x_5) = (-6 + t + 2s,\ 2 + t - s,\ s,\ 1 - t,\ t) \]
is a solution of the system (12) for every s, t ∈ R. In view of Proposition 1A, these are also precisely the solutions of the system (8). Alternatively, consider (11) instead. Note that this represents the system
\[
\begin{aligned}
x_1 - 2x_3 - x_5 &= -6, \\
x_2 + x_3 - x_5 &= 2, \\
x_4 + x_5 &= 1.
\end{aligned}
\tag{13}
\]

First of all, take the third equation
\[ x_4 + x_5 = 1. \]
If we let x5 = t, then x4 = 1 − t. Substituting these into the second equation, we obtain (you must do the calculation here)
\[ x_2 + x_3 = 2 + t. \]
If we let x3 = s, then x2 = 2 + t − s. Substituting all these into the first equation, we obtain (you must do the calculation here)
\[ x_1 = -6 + t + 2s. \]
Hence
\[ x = (x_1, x_2, x_3, x_4, x_5) = (-6 + t + 2s,\ 2 + t - s,\ s,\ 1 - t,\ t) \]


is a solution of the system (13) for every s, t ∈ R. In view of Proposition 1A, these are also precisely the solutions of the system (8). However, if you have done the calculations as suggested, you will notice that the calculation is easier for the system (13) than for the system (12). This is clearly a case of the array (11) in reduced row echelon form having more 0's than the array (10) in row echelon form, so that the system (13) has fewer non-zero coefficients than the system (12).

1.3. Row Echelon Form

Definition. A rectangular array of numbers is said to be in row echelon form if the following conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row has value 1. These are called the pivot entries.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of a non-zero row occurring higher in the array.

Next, we investigate how we may reduce a given array to row echelon form. We shall illustrate the ideas by working on an example.

Example 1.3.1. Consider the array
\[
\begin{pmatrix} 0 & 0 & 5 & 0 & 15 & 5 \\ 0 & 2 & 4 & 7 & 1 & 3 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 1 & 2 & 4 & 1 & 2 \end{pmatrix}.
\]

Step 1: Locate the left-most non-zero column and cover all columns to the left of this column (in our illustration here, × denotes an entry that has been covered). We now have
\[
\begin{pmatrix} \times & 0 & 5 & 0 & 15 & 5 \\ \times & 2 & 4 & 7 & 1 & 3 \\ \times & 1 & 2 & 3 & 0 & 1 \\ \times & 1 & 2 & 4 & 1 & 2 \end{pmatrix}.
\]

Step 2: Consider the part of the array that remains uncovered. By interchanging rows if necessary, ensure that the top-left entry is non-zero. So let us interchange rows 1 and 4 to obtain
\[
\begin{pmatrix} \times & 1 & 2 & 4 & 1 & 2 \\ \times & 2 & 4 & 7 & 1 & 3 \\ \times & 1 & 2 & 3 & 0 & 1 \\ \times & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]

Step 3: If the top entry on the left-most uncovered column is a, then we multiply the top uncovered row by 1/a to ensure that this entry becomes 1. So let us divide row 1 by 1 to obtain
\[
\begin{pmatrix} \times & 1 & 2 & 4 & 1 & 2 \\ \times & 2 & 4 & 7 & 1 & 3 \\ \times & 1 & 2 & 3 & 0 & 1 \\ \times & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]

Step 4: We now try to make all entries below the top entry on the left-most uncovered column zero. This can be achieved by adding suitable multiples of row 1 to the other rows. So let us add −2 times row 1 to row 2 to obtain
\[
\begin{pmatrix} \times & 1 & 2 & 4 & 1 & 2 \\ \times & 0 & 0 & -1 & -1 & -1 \\ \times & 1 & 2 & 3 & 0 & 1 \\ \times & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]


Then let us add −1 times row 1 to row 3 to obtain
\[
\begin{pmatrix} \times & 1 & 2 & 4 & 1 & 2 \\ \times & 0 & 0 & -1 & -1 & -1 \\ \times & 0 & 0 & -1 & -1 & -1 \\ \times & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]
Step 5: Now cover the top row. We then obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & 0 & 0 & -1 & -1 & -1 \\ \times & 0 & 0 & -1 & -1 & -1 \\ \times & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]

Step 6: Repeat Steps 1–5 on the uncovered array, and as many times as necessary so that eventually the whole array gets covered. So let us continue. Following Step 1, we locate the left-most non-zero column and cover all columns to the left of this column. We now have
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & 0 & -1 & -1 & -1 \\ \times & \times & 0 & -1 & -1 & -1 \\ \times & \times & 5 & 0 & 15 & 5 \end{pmatrix}.
\]

Following Step 2, we interchange rows if necessary to ensure that the top-left entry is non-zero. So let us interchange rows 1 and 3 (here we do not count any covered rows) to obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & 5 & 0 & 15 & 5 \\ \times & \times & 0 & -1 & -1 & -1 \\ \times & \times & 0 & -1 & -1 & -1 \end{pmatrix}.
\]

Following Step 3, we multiply the top row by a suitable number to ensure that the top entry on the left-most uncovered column becomes 1. So let us multiply row 1 by 1/5 to obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & 1 & 0 & 3 & 1 \\ \times & \times & 0 & -1 & -1 & -1 \\ \times & \times & 0 & -1 & -1 & -1 \end{pmatrix}.
\]

Following Step 4, we do nothing! Following Step 5, we cover the top row. We then obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & 0 & -1 & -1 & -1 \\ \times & \times & 0 & -1 & -1 & -1 \end{pmatrix}.
\]

Following Step 1, we locate the left-most non-zero column and cover all columns to the left of this column. We now have
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & -1 & -1 & -1 \\ \times & \times & \times & -1 & -1 & -1 \end{pmatrix}.
\]

Following Step 2, we do nothing! Following Step 3, we multiply the top row by a suitable number to ensure that the top entry on the left-most uncovered column becomes 1. So let us multiply row 1 by −1 to obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & 1 & 1 & 1 \\ \times & \times & \times & -1 & -1 & -1 \end{pmatrix}.
\]


Following Step 4, we now try to make all entries below the top entry on the left-most uncovered column zero. So let us add row 1 to row 2 to obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & 1 & 1 & 1 \\ \times & \times & \times & 0 & 0 & 0 \end{pmatrix}.
\]

Following Step 5, we cover the top row. We then obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & 0 & 0 & 0 \end{pmatrix}.
\]

Following Step 1, we locate the left-most non-zero column and cover all columns to the left of this column. We now have
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]

Step ∞. Uncover everything! We then have
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},
\]
in row echelon form.

In practice, we do not actually cover any entries of the array, so let us repeat here the same argument without covering anything – the reader is advised to compare this with the earlier discussion. We start with the array
\[
\begin{pmatrix} 0 & 0 & 5 & 0 & 15 & 5 \\ 0 & 2 & 4 & 7 & 1 & 3 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 1 & 2 & 4 & 1 & 2 \end{pmatrix}.
\]

Interchanging rows 1 and 4, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 2 & 4 & 7 & 1 & 3 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]
Adding −2 times row 1 to row 2, and adding −1 times row 1 to row 3, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 5 & 0 & 15 & 5 \end{pmatrix}.
\]
Interchanging rows 2 and 4, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 5 & 0 & 15 & 5 \\ 0 & 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & -1 & -1 & -1 \end{pmatrix}.
\]


Multiplying row 2 by 1/5, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 0 & -1 & -1 & -1 \end{pmatrix}.
\]
Multiplying row 3 by −1, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & -1 & -1 & -1 \end{pmatrix}.
\]
Adding row 3 to row 4, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},
\]
in row echelon form.
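The covering procedure of Steps 1–6 translates directly into a short program. Here is a minimal Python sketch of ours (the function name row_echelon_form is our own, not notation from the text), again using exact fractions:

```python
from fractions import Fraction

def row_echelon_form(A):
    # Reduce a rectangular array to row echelon form, following Steps 1-6.
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0]) if M else 0
    top = 0                                   # rows above `top` are "covered"
    for col in range(cols):                   # Step 1: scan columns left to right
        piv = next((r for r in range(top, rows) if M[r][col] != 0), None)
        if piv is None:
            continue                          # column is zero below the cover
        M[top], M[piv] = M[piv], M[top]       # Step 2: interchange rows
        M[top] = [v / M[top][col] for v in M[top]]   # Step 3: pivot entry becomes 1
        for r in range(top + 1, rows):        # Step 4: clear entries below the pivot
            M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[top])]
        top += 1                              # Step 5: cover the top row
    return M

ref = row_echelon_form([[0, 0, 5, 0, 15, 5],
                        [0, 2, 4, 7, 1, 3],
                        [0, 1, 2, 3, 0, 1],
                        [0, 1, 2, 4, 1, 2]])
# ref is a row echelon form of the array above; it need not coincide with the
# one obtained in the text, since row echelon forms are not unique.
```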

Remarks. (1) As already observed earlier, we do not actually physically cover rows or columns. In any practical situation, we simply copy these entries without changes.

(2) The steps indicated in the first part of the last example are for guidance only. In practice, we do not have to follow the steps above religiously, and what we do is to a great extent dictated by good common sense. For instance, suppose that we are faced with the array
\[
\begin{pmatrix} 2 & 3 & 2 & 1 \\ 3 & 2 & 0 & 2 \end{pmatrix}.
\]
If we follow the steps religiously, then we shall multiply row 1 by 1/2. However, note that this will introduce fractions to some entries of the array, and any subsequent calculation will become rather messy. Instead, let us multiply row 1 by 3 to obtain
\[
\begin{pmatrix} 6 & 9 & 6 & 3 \\ 3 & 2 & 0 & 2 \end{pmatrix}.
\]
Then let us multiply row 2 by 2 to obtain
\[
\begin{pmatrix} 6 & 9 & 6 & 3 \\ 6 & 4 & 0 & 4 \end{pmatrix}.
\]
Adding −1 times row 1 to row 2, we obtain
\[
\begin{pmatrix} 6 & 9 & 6 & 3 \\ 0 & -5 & -6 & 1 \end{pmatrix}.
\]
In this way, we have avoided the introduction of fractions until later in the process. In general, if we start with an array with integer entries, then it is possible to delay the introduction of fractions by omitting Step 3 until the very end.


Example 1.3.2. Consider the array
\[
\begin{pmatrix} 2 & 1 & 3 & 2 & 5 \\ 1 & 3 & 2 & 4 & 1 \\ 3 & 2 & 0 & 0 & 2 \end{pmatrix}.
\]

Try following the steps indicated in the first part of the previous example religiously and try to see how complicated the calculations get. On the other hand, we can modify the steps with some common sense. First of all, we interchange rows 1 and 2 to obtain
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 2 & 1 & 3 & 2 & 5 \\ 3 & 2 & 0 & 0 & 2 \end{pmatrix}.
\]

The reason for taking this step is to put an entry 1 at the top left without introducing fractions anywhere. When we next add multiples of row 1 to the other rows to make 0's below this 1, we do not introduce fractions either. Now adding −2 times row 1 to row 2, we obtain
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 0 & -5 & -1 & -6 & 3 \\ 3 & 2 & 0 & 0 & 2 \end{pmatrix}.
\]

Adding −3 times row 1 to row 3, we obtain
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 0 & -5 & -1 & -6 & 3 \\ 0 & -7 & -6 & -12 & -1 \end{pmatrix}.
\]

Next, multiplying row 2 by −7, we obtain
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 0 & 35 & 7 & 42 & -21 \\ 0 & -7 & -6 & -12 & -1 \end{pmatrix}.
\]

Multiplying row 3 by −5, we obtain
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 0 & 35 & 7 & 42 & -21 \\ 0 & 35 & 30 & 60 & 5 \end{pmatrix}.
\]

Note that here we are essentially covering up row 1. Also, we have multiplied rows 2 and 3 by suitable multiples so that their leading non-zero entries are the same, in preparation for taking the next step without introducing fractions. Now adding −1 times row 2 to row 3, we obtain
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 0 & 35 & 7 & 42 & -21 \\ 0 & 0 & 23 & 18 & 26 \end{pmatrix}.
\]

Here, the array is almost in row echelon form, except that the leading non-zero entries in rows 2 and 3 are not equal to 1. However, we can always multiply row 2 by 1/35 and row 3 by 1/23 if we want to obtain the row echelon form
\[
\begin{pmatrix} 1 & 3 & 2 & 4 & 1 \\ 0 & 1 & 1/5 & 6/5 & -3/5 \\ 0 & 0 & 1 & 18/23 & 26/23 \end{pmatrix}.
\]

If this differs from the answer you got when you followed the steps indicated in the previous example religiously, do not worry. Row echelon forms are not unique!


1.4. Reduced Row Echelon Form

Definition. A rectangular array of numbers is said to be in reduced row echelon form if the following conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row has value 1. These are called the pivot entries.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of a non-zero row occurring higher in the array.
(4) Each column containing a pivot entry has 0's everywhere else in the column.

We now investigate how we may reduce a given array to reduced row echelon form. Here, we basically take an extra step to convert an array from row echelon form to reduced row echelon form. We shall illustrate the ideas by continuing on an earlier example.

Example 1.4.1. Consider again the array
\[
\begin{pmatrix} 0 & 0 & 5 & 0 & 15 & 5 \\ 0 & 2 & 4 & 7 & 1 & 3 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 1 & 2 & 4 & 1 & 2 \end{pmatrix}.
\]
We have already shown in Example 1.3.1 that this array can be reduced to row echelon form
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]

Step 1: Cover all zero rows at the bottom of the array. We now have
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]

Step 2: We now try to make all the entries above the pivot entry on the bottom row zero (here again we do not count any covered rows). This can be achieved by adding suitable multiples of the bottom row to the other rows. So let us add −4 times row 3 to row 1 to obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 0 & -3 & -2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]

Step 3: Now cover the bottom row. We then obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 0 & -3 & -2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]

Step 4: Repeat Steps 2–3 on the uncovered array, and as many times as necessary so that eventually the whole array gets covered. So let us continue. Following Step 2, we add −2 times row 2 to row 1 to obtain
\[
\begin{pmatrix} 0 & 1 & 0 & 0 & -9 & -4 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]


Following Step 3, we cover row 2 to obtain
\[
\begin{pmatrix} 0 & 1 & 0 & 0 & -9 & -4 \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]
Following Step 2, we do nothing! Following Step 3, we cover row 1 to obtain
\[
\begin{pmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \end{pmatrix}.
\]

Step ∞. Uncover everything! We then have
\[
\begin{pmatrix} 0 & 1 & 0 & 0 & -9 & -4 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},
\]
in reduced row echelon form.

Again, in practice, we do not actually cover any entries of the array, so let us repeat here the same argument without covering anything – the reader is advised to compare this with the earlier discussion. We start with the row echelon form
\[
\begin{pmatrix} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]
Adding −4 times row 3 to row 1, we obtain
\[
\begin{pmatrix} 0 & 1 & 2 & 0 & -3 & -2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]
Adding −2 times row 2 to row 1, we obtain
\[
\begin{pmatrix} 0 & 1 & 0 & 0 & -9 & -4 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},
\]
in reduced row echelon form.
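The extra pass of this section is equally mechanical. A minimal sketch of ours, building on the row_echelon_form function given in Section 1.3:

```python
def reduced_row_echelon_form(A):
    # First reduce to row echelon form, then clear the entries above each
    # pivot, working from the bottom row upwards (Steps 1-4 of this section).
    M = row_echelon_form(A)          # the sketch from Section 1.3
    for r in range(len(M) - 1, -1, -1):
        pivot = next((c for c, v in enumerate(M[r]) if v != 0), None)
        if pivot is None:
            continue                 # zero row: nothing to clear
        for above in range(r):       # make all entries above the pivot zero
            M[above] = [a - M[above][pivot] * b
                        for a, b in zip(M[above], M[r])]
    return M
```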

1.5. Solving a System of Linear Equations

Let us first summarize what we have done so far. We study a system (1) of m linear equations in n variables x1, . . . , xn. If we omit reference to the variables, then the system (1) can be represented by the array (2), with m rows and n + 1 columns. We next reduce the array (2) to row echelon form or reduced row echelon form by elementary row operations.

By Proposition 1A, the system of linear equations represented by the array in row echelon form or reduced row echelon form has the same solution set as the system (1). It follows that to solve the system


(1), it remains to solve the system represented by the array in row echelon form or reduced row echelon form. We now describe a simple way to obtain all solutions of this system.

Definition. Any column of an array (2) in row echelon form or reduced row echelon form containing a pivot entry is called a pivot column.

First of all, let us eliminate the situation when the system has no solutions. Suppose that the array (2) has been reduced to row echelon form, and that this contains a row of the form
\[
( \underbrace{0 \ \cdots \ 0}_{n} \mid 1 ),
\]
corresponding to the last column of the array being a pivot column. This row represents the equation
\[
0x_1 + \cdots + 0x_n = 1;
\]
clearly the system cannot have any solution.

Definition. Suppose that the array (2) in row echelon form or reduced row echelon form satisfies the condition that its last column is not a pivot column. Then any variable xi corresponding to a pivot column is called a pivot variable. All other variables are called free variables.

Example 1.5.1. Consider the array
\[
\left( \begin{array}{ccccc|c} 0 & 1 & 0 & 0 & -9 & -4 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right),
\]
representing the system
\[
\begin{aligned}
x_2 - 9x_5 &= -4, \\
x_3 + 3x_5 &= 1, \\
x_4 + x_5 &= 1.
\end{aligned}
\]
Note that the zero row in the array represents an equation which is trivial! Here the last column of the array is not a pivot column. Now columns 2, 3, 4 are the pivot columns, so that x2, x3, x4 are the pivot variables and x1, x5 are the free variables.

To solve the system, we allow the free variables to take any values we choose, and then solve for the pivot variables in terms of the values of these free variables.
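For readers who wish to check such computations by machine, the sympy library can report the pivot columns and a parametric solution. The following sketch (ours, not from the text) treats the array of Example 1.5.1:

```python
from sympy import Matrix, linsolve, symbols

x1, x2, x3, x4, x5 = symbols('x1:6')
A = Matrix([[0, 1, 0, 0, -9],
            [0, 0, 1, 0, 3],
            [0, 0, 0, 1, 1],
            [0, 0, 0, 0, 0]])
b = Matrix([-4, 1, 1, 0])
_, pivots = A.rref()
print(pivots)          # (1, 2, 3): columns 2, 3, 4 in the text's 1-based count
print(linsolve((A, b), x1, x2, x3, x4, x5))
# {(x1, 9*x5 - 4, 1 - 3*x5, 1 - x5, x5)}: x1 and x5 remain as free parameters
```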

Example 1.5.2. Consider the system of 4 linear equations
\[
\begin{aligned}
5x_3 + 15x_5 &= 5, \\
2x_2 + 4x_3 + 7x_4 + x_5 &= 3, \\
x_2 + 2x_3 + 3x_4 &= 1, \\
x_2 + 2x_3 + 4x_4 + x_5 &= 2,
\end{aligned}
\tag{14}
\]
in the 5 variables x1, x2, x3, x4, x5. If we omit reference to the variables, then the system can be represented by the array
\[
\left( \begin{array}{ccccc|c} 0 & 0 & 5 & 0 & 15 & 5 \\ 0 & 2 & 4 & 7 & 1 & 3 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 1 & 2 & 4 & 1 & 2 \end{array} \right).
\tag{15}
\]


As in Example 1.3.1, we can reduce the array (15) to row echelon form
\[
\left( \begin{array}{ccccc|c} 0 & 1 & 2 & 4 & 1 & 2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right),
\tag{16}
\]
representing the system
\[
\begin{aligned}
x_2 + 2x_3 + 4x_4 + x_5 &= 2, \\
x_3 + 3x_5 &= 1, \\
x_4 + x_5 &= 1.
\end{aligned}
\tag{17}
\]

Alternatively, as in Example 1.4.1, we can reduce the array (15) to reduced row echelon form
\[
\left( \begin{array}{ccccc|c} 0 & 1 & 0 & 0 & -9 & -4 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right),
\tag{18}
\]
representing the system
\[
\begin{aligned}
x_2 - 9x_5 &= -4, \\
x_3 + 3x_5 &= 1, \\
x_4 + x_5 &= 1.
\end{aligned}
\tag{19}
\]

By Proposition 1A, the three systems (14), (17) and (19) have exactly the same solution set. Now, we observe from (16) or (18) that columns 2, 3, 4 are the pivot columns, so that x2, x3, x4 are the pivot variables and x1, x5 are the free variables. If we assign values x1 = s and x5 = t, then we have, from (17) (harder) or (19) (easier), that
\[
(x_1, x_2, x_3, x_4, x_5) = (s,\ 9t - 4,\ -3t + 1,\ -t + 1,\ t).
\tag{20}
\]
It follows that (20) is a solution of the system (14) for every s, t ∈ R.

Example 1.5.3. Let us return to Example 1.2.4, and consider again the system (8) of 3 linear equations in the 5 variables x1, x2, x3, x4, x5. If we omit reference to the variables, then the system can be represented by the array (9). We can reduce the array (9) to row echelon form (10), representing the system (12). Alternatively, we can reduce the array (9) to reduced row echelon form (11), representing the system (13). By Proposition 1A, the three systems (8), (12) and (13) have exactly the same solution set. Now, we observe from (10) or (11) that columns 1, 2, 4 are the pivot columns, so that x1, x2, x4 are the pivot variables and x3, x5 are the free variables. If we assign values x3 = s and x5 = t, then we have, from (12) (harder) or (13) (easier), that
\[
(x_1, x_2, x_3, x_4, x_5) = (-6 + t + 2s,\ 2 + t - s,\ s,\ 1 - t,\ t).
\tag{21}
\]
It follows that (21) is a solution of the system (8) for every s, t ∈ R.

Example 1.5.4. In this example, we do not bother even to reduce the matrix to row echelon form. Consider the system of 3 linear equations
\[
\begin{aligned}
2x_1 + x_2 + 3x_3 + 2x_4 &= 5, \\
x_1 + 3x_2 + 2x_3 + 4x_4 &= 1, \\
3x_1 + 2x_2 &= 2,
\end{aligned}
\tag{22}
\]


in the 4 variables x1, x2, x3, x4. If we omit reference to the variables, then the system can be represented by the array
\[
\left( \begin{array}{cccc|c} 2 & 1 & 3 & 2 & 5 \\ 1 & 3 & 2 & 4 & 1 \\ 3 & 2 & 0 & 0 & 2 \end{array} \right).
\tag{23}
\]

As in Example 1.3.2, we can reduce the array (23) to the form
\[
\left( \begin{array}{cccc|c} 1 & 3 & 2 & 4 & 1 \\ 0 & 35 & 7 & 42 & -21 \\ 0 & 0 & 23 & 18 & 26 \end{array} \right),
\tag{24}
\]
representing the system
\[
\begin{aligned}
x_1 + 3x_2 + 2x_3 + 4x_4 &= 1, \\
35x_2 + 7x_3 + 42x_4 &= -21, \\
23x_3 + 18x_4 &= 26.
\end{aligned}
\tag{25}
\]

Note that the array (24) is almost in row echelon form, except that the pivot entries are not 1. By Proposition 1A, the two systems (22) and (25) have exactly the same solution set. Now, we observe from (24) that columns 1, 2, 3 are the pivot columns, so that x1, x2, x3 are the pivot variables and x4 is the free variable. If we assign the value x4 = s, then we have, from (25), that
\[
(x_1, x_2, x_3, x_4) = \left( \tfrac{16}{23}s + \tfrac{28}{23},\ -\tfrac{24}{23}s - \tfrac{19}{23},\ -\tfrac{18}{23}s + \tfrac{26}{23},\ s \right).
\tag{26}
\]
It follows that (26) is a solution of the system (22) for every s ∈ R.

1.6. Homogeneous Systems

Consider a homogeneous system of m linear equations of the form
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0,
\end{aligned}
\tag{27}
\]

with n variables x1, x2, . . . , xn. If we omit reference to the variables, then system (27) can be represented by the array
\[
\left( \begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & 0 \end{array} \right)
\tag{28}
\]
of all the coefficients.

Note that the system (27) always has a solution, namely the trivial solution

x1 = x2 = . . . = xn = 0.

Indeed, if we reduce the array (28) to row echelon form or reduced row echelon form, then it is not difficult to see that the last column is a zero column and so cannot be a pivot column.


On the other hand, if the system (27) has a non-trivial solution, then we can multiply this solution by any non-zero real number different from 1 to obtain another non-trivial solution. We have therefore proved the following simple result.

PROPOSITION 1B. The homogeneous system (27) either has the trivial solution as its only solution or has infinitely many solutions.

The purpose of this section is to discuss the following stronger result.

PROPOSITION 1C. Suppose that the system (27) has more variables than equations; in other words, suppose that n > m. Then there are infinitely many solutions.

To see this, let us consider the array (28) representing the system (27). Note that (28) has m rows, corresponding to the number of equations. Also (28) has n + 1 columns, where n is the number of variables. However, the column of (28) on the extreme right is a zero column, corresponding to the fact that the system is homogeneous. Furthermore, this column remains a zero column if we perform elementary row operations on the array (28). If we now reduce (28) to row echelon form by elementary row operations, then there are at most m pivot columns, since there are only m equations in (27) and m rows in (28). It follows that if we exclude the zero column on the extreme right, then the remaining n columns cannot all be pivot columns. Hence at least one of the variables is a free variable. By assigning this free variable arbitrary real values, we end up with infinitely many solutions for the system (27).
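A quick machine check of this argument (our sketch, using sympy; the example matrix is our own): for a homogeneous system with more variables than equations, the solution space of Ax = 0 is non-trivial:

```python
from sympy import Matrix

# A homogeneous system with m = 2 equations and n = 4 > m variables.
A = Matrix([[1, 2, 3, 4],
            [2, 4, 1, 0]])
basis = A.nullspace()     # basis vectors of the solution space of Ax = 0
print(len(basis))         # 2: two free variables, hence infinitely many solutions
```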

1.7. Application to Network Flow

Systems of linear equations arise when we investigate the flow of some quantity through a network. Such networks arise in science, engineering and economics. Two such examples are the pattern of traffic flow through a city and the distribution of products from manufacturers to consumers through a network of wholesalers and retailers.

A network consists of a set of points, called the nodes, and directed lines connecting some or all of the nodes. The flow is indicated by a number or a variable. We observe the following basic assumptions:

• The total flow into a node is equal to the total flow out of a node.
• The total flow into the network is equal to the total flow out of the network.

Example 1.7.1. The picture below represents a system of one way streets in a particular part of some city and the traffic flow along the streets between the junctions:

[Diagram: junctions A, B (top) and C, D (bottom); external flows of 200 into A and x1 out of A, 200 into B and 300 out of B, 400 into C and 300 out of C, 500 into D and 300 out of D; street flows x2 from A to B, x3 from C to A, x4 from B to D, x5 from D to C.]


We first equate the total flow into each node with the total flow out of the same node:

node A: 200 + x3 = x1 + x2,
node B: 200 + x2 = 300 + x4,
node C: 400 + x5 = 300 + x3,
node D: 500 + x4 = 300 + x5.

We then equate the total flow into and out of the network:
\[
400 + 200 + 200 + 500 = 300 + 300 + x_1 + 300.
\]

These give rise to a system of 5 linear equations
\[
\begin{aligned}
x_1 + x_2 - x_3 &= 200, \\
x_2 - x_4 &= 100, \\
x_3 - x_5 &= 100, \\
x_4 - x_5 &= -200, \\
x_1 &= 400,
\end{aligned}
\]
in the 5 variables x1, . . . , x5, with augmented matrix
\[
\left( \begin{array}{ccccc|c} 1 & 1 & -1 & 0 & 0 & 200 \\ 0 & 1 & 0 & -1 & 0 & 100 \\ 0 & 0 & 1 & 0 & -1 & 100 \\ 0 & 0 & 0 & 1 & -1 & -200 \\ 1 & 0 & 0 & 0 & 0 & 400 \end{array} \right).
\]
This has reduced row echelon form
\[
\left( \begin{array}{ccccc|c} 1 & 0 & 0 & 0 & 0 & 400 \\ 0 & 1 & 0 & 0 & -1 & -100 \\ 0 & 0 & 1 & 0 & -1 & 100 \\ 0 & 0 & 0 & 1 & -1 & -200 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right).
\]
We have general solution (x1, . . . , x5) = (400, t − 100, t + 100, t − 200, t), where t is a parameter. Since one way streets do not permit negative flow, all the coordinates have to be non-negative. It follows that t ≥ 200.
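The same solution can be checked with sympy (a sketch of ours, not part of the original notes):

```python
from sympy import Matrix, linsolve, symbols

x1, x2, x3, x4, x5 = symbols('x1:6')
A = Matrix([[1, 1, -1, 0, 0],
            [0, 1, 0, -1, 0],
            [0, 0, 1, 0, -1],
            [0, 0, 0, 1, -1],
            [1, 0, 0, 0, 0]])
b = Matrix([200, 100, 100, -200, 400])
print(linsolve((A, b), x1, x2, x3, x4, x5))
# {(400, x5 - 100, x5 + 100, x5 - 200, x5)}: x5 plays the role of t
```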

Example 1.7.2. The picture below represents the quantities of a particular product that flow from manufacturers M1, M2, M3, through wholesalers W1, W2, W3 and retailers R1, R2, R3, R4, to consumers:

[Diagram: flows 200 and x1 into wholesaler W1, 300 and x2 into W2, and x3 into W3 from the manufacturers; flows x4, x5, 300, x6, 100, x7 from the wholesalers to the retailers; flows 400, x8, 200, 500 from the retailers R1, R2, R3, R4 to the consumers.]


We first equate the total flow into each node with the total flow out of the same node:

node W1: 200 + x1 = x4 + x5,
node W2: 300 + x2 = 300 + x6,
node W3: x3 = 100 + x7,
node R1: x4 = 400,
node R2: 300 + x5 = x8,
node R3: 100 + x6 = 200,
node R4: x7 = 500.

We then equate the total flow into and out of the network:
\[
200 + x_1 + x_2 + 300 + x_3 = 400 + x_8 + 200 + 500.
\]

These give rise to a system of 8 linear equations
\[
\begin{aligned}
x_1 - x_4 - x_5 &= -200, \\
x_2 - x_6 &= 0, \\
x_3 - x_7 &= 100, \\
x_4 &= 400, \\
x_5 - x_8 &= -300, \\
x_6 &= 100, \\
x_7 &= 500, \\
x_1 + x_2 + x_3 - x_8 &= 600,
\end{aligned}
\]
in the 8 variables x1, . . . , x8, with augmented matrix
\[
\left( \begin{array}{cccccccc|c} 1 & 0 & 0 & -1 & -1 & 0 & 0 & 0 & -200 \\ 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 100 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 400 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & -300 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 100 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 500 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & -1 & 600 \end{array} \right).
\]
This has row echelon form
\[
\left( \begin{array}{cccccccc|c} 1 & 0 & 0 & -1 & -1 & 0 & 0 & 0 & -200 \\ 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 100 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 400 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & -300 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 100 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 500 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right).
\]
We have general solution (x1, . . . , x8) = (t − 100, 100, 600, 400, t − 300, 100, 500, t), where t is a parameter. If no goods are returned, then all the coordinates have to be non-negative. It follows that t ≥ 300.

1.8. Application to Electrical Networks

A simple electric circuit consists of two basic components, electrical sources where the electrical potential E is measured in volts (V), and resistors where the resistance R is measured in ohms (Ω). We are interested in determining the current I measured in amperes (A).


The electrical potential between two points is sometimes called the voltage drop between these two points. Currents and voltage drops can be positive or negative.

The current flow in an electrical circuit is governed by three basic rules:

• Ohm's law: The voltage drop E across a resistor with resistance R with a current I passing through it is given by E = IR.
• Current law: The sum of the currents flowing into any point is the same as the sum of the currents flowing out of the point.
• Voltage law: The sum of the voltage drops around any closed loop is equal to zero.

Around any loop, we select a positive direction – clockwise or anticlockwise as we see fit. We have the following convention:

• The voltage drop across a resistor is taken to be positive if the current flows in the positive direction of the loop, and negative if the current flows in the negative direction of the loop.
• The voltage drop across an electrical source is taken to be positive if the positive direction of the loop is from + to −, and negative if the positive direction of the loop is from − to +.

Example 1.8.1. Consider the electric circuit shown in the diagram below:

[Diagram: a circuit with junctions A (top) and B (bottom); the left branch contains a 20V source and an 8Ω resistor carrying current I1, the middle branch a 4Ω resistor carrying current I2, and the right branch a 16V source and a 20Ω resistor carrying current I3.]

We wish to determine the currents I1, I2 and I3. Applying the Current law to the point A, we obtain I1 = I2 + I3. Applying the Current law to the point B, we obtain the same. Hence we have the linear equation
\[ I_1 - I_2 - I_3 = 0. \]
Next, let us consider the left hand loop, and let us take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 8Ω resistor is 8I1, while the voltage drop across the 4Ω resistor is 4I2. On the other hand, the voltage drop across the 20V electrical source is negative, since the positive direction of the loop is from − to +. The Voltage law applied to this loop now gives 8I1 + 4I2 − 20 = 0, and we have the linear equation
\[ 8I_1 + 4I_2 = 20, \quad\text{or}\quad 2I_1 + I_2 = 5. \]
Next, let us consider the right hand loop, and let us take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 20Ω resistor is 20I3, while the voltage drop across the 4Ω resistor is −4I2. On the other hand, the voltage drop across the 16V electrical source is negative, since the positive direction of the loop is from − to +. The Voltage law applied to this loop now gives 20I3 − 4I2 − 16 = 0, and we have the linear equation
\[ 4I_2 - 20I_3 = -16, \quad\text{or}\quad I_2 - 5I_3 = -4. \]


We now have a system of three linear equations
\[
\begin{aligned}
I_1 - I_2 - I_3 &= 0, \\
2I_1 + I_2 &= 5, \\
I_2 - 5I_3 &= -4.
\end{aligned}
\tag{29}
\]
The augmented matrix is given by
\[
\left( \begin{array}{ccc|c} 1 & -1 & -1 & 0 \\ 2 & 1 & 0 & 5 \\ 0 & 1 & -5 & -4 \end{array} \right),
\quad\text{with reduced row echelon form}\quad
\left( \begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{array} \right).
\]
Hence I1 = 2 and I2 = I3 = 1. Note here that we have not considered the outer loop. Suppose again that we take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 8Ω resistor is 8I1, while the voltage drop across the 20Ω resistor is 20I3. On the other hand, the voltage drops across the 20V and 16V electrical sources are both negative. The Voltage law applied to this loop then gives 8I1 + 20I3 − 36 = 0. But this equation can be obtained by combining the last two equations in (29).
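Such a system is easily checked by machine (our sketch, using sympy):

```python
from sympy import Matrix, linsolve, symbols

I1, I2, I3 = symbols('I1 I2 I3')
A = Matrix([[1, -1, -1],
            [2,  1,  0],
            [0,  1, -5]])
b = Matrix([0, 5, -4])
print(linsolve((A, b), I1, I2, I3))   # {(2, 1, 1)}
```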

Example 1.8.2. Consider the electric circuit shown in the diagram below:

[Diagram: a circuit with junctions A (top) and B (bottom); the left branch contains a 20V source, an 8Ω resistor (top) and a 5Ω resistor (bottom) carrying current I1, the middle branch a 6Ω resistor carrying current I2, and the right branch a 30V source and an 8Ω resistor carrying current I3.]

We wish to determine the currents I1, I2 and I3. Applying the Current law to the point A, we obtain I1 + I2 = I3. Applying the Current law to the point B, we obtain the same. Hence we have the linear equation
\[ I_1 + I_2 - I_3 = 0. \]
Next, let us consider the left hand loop, and let us take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 8Ω resistor is 8I1, the voltage drop across the 6Ω resistor is −6I2, while the voltage drop across the 5Ω resistor is 5I1. On the other hand, the voltage drop across the 20V electrical source is negative, since the positive direction of the loop is from − to +. The Voltage law applied to this loop now gives 8I1 − 6I2 + 5I1 − 20 = 0, and we have the linear equation
\[ 13I_1 - 6I_2 = 20. \]
Next, let us consider the outer loop, and let us take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 8Ω resistor on the top is 8I1, the voltage drop across the 8Ω resistor on the right is 8I3, while the voltage drop across the 5Ω resistor is 5I1. On the other hand, the voltage drops across the 30V and 20V electrical sources are both negative, since the positive direction of the loop is from − to + in each case. The Voltage law applied to this loop now gives 8I1 + 8I3 + 5I1 − 50 = 0, and we have the linear equation
\[ 13I_1 + 8I_3 = 50. \]


We now have a system of three linear equations
\[
\begin{aligned}
I_1 + I_2 - I_3 &= 0, \\
13I_1 - 6I_2 &= 20, \\
13I_1 + 8I_3 &= 50.
\end{aligned}
\tag{30}
\]
The augmented matrix is given by
\[
\left( \begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 13 & -6 & 0 & 20 \\ 13 & 0 & 8 & 50 \end{array} \right),
\quad\text{with reduced row echelon form}\quad
\left( \begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 3 \end{array} \right).
\]
Hence I1 = 2, I2 = 1 and I3 = 3. Note here that we have not considered the right hand loop. Suppose again that we take the positive direction to be clockwise. By Ohm's law, the voltage drop across the 8Ω resistor is 8I3, while the voltage drop across the 6Ω resistor is 6I2. On the other hand, the voltage drop across the 30V electrical source is negative. The Voltage law applied to this loop then gives 8I3 + 6I2 − 30 = 0. But this equation can be obtained by combining the last two equations in (30).

1.9. Application to Economics

In this section, we describe a simple exchange model due to the economist Leontief. An economy is divided into sectors. We know the total output for each sector as well as how outputs are exchanged among the sectors. The value of the total output of a given sector is known as the price of the output.

Leontief has shown that there exist equilibrium prices that can be assigned to the total output of the sectors in such a way that the income for each sector is exactly the same as its expenses.

Example 1.9.1. An economy consists of three sectors A, B, C which purchase from each other according to the table below:

                          proportion of output from sector
                                 A      B      C
  purchased by sector A         0.2    0.6    0.1
  purchased by sector B         0.4    0.1    0.5
  purchased by sector C         0.4    0.3    0.4

Let pA, pB, pC denote respectively the value of the total output of sectors A, B, C. For the expense to match the value for each sector, we must have
\[
\begin{aligned}
0.2p_A + 0.6p_B + 0.1p_C &= p_A, \\
0.4p_A + 0.1p_B + 0.5p_C &= p_B, \\
0.4p_A + 0.3p_B + 0.4p_C &= p_C,
\end{aligned}
\]
leading to the homogeneous linear equations
\[
\begin{aligned}
0.8p_A - 0.6p_B - 0.1p_C &= 0, \\
0.4p_A - 0.9p_B + 0.5p_C &= 0, \\
0.4p_A + 0.3p_B - 0.6p_C &= 0,
\end{aligned}
\]
giving rise to the augmented matrix
\[
\left( \begin{array}{ccc|c} 0.8 & -0.6 & -0.1 & 0 \\ 0.4 & -0.9 & 0.5 & 0 \\ 0.4 & 0.3 & -0.6 & 0 \end{array} \right),
\quad\text{or simply}\quad
\left( \begin{array}{ccc|c} 8 & -6 & -1 & 0 \\ 4 & -9 & 5 & 0 \\ 4 & 3 & -6 & 0 \end{array} \right).
\]


This can be reduced by elementary row operations to
\[
\left( \begin{array}{ccc|c} 16 & 0 & -13 & 0 \\ 0 & 12 & -11 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right),
\]
leading to the solution (pA, pB, pC) = t(13/16, 11/12, 1) if we assign the free variable pC = t, or to the solution (pA, pB, pC) = t(39, 44, 48) if we assign the free variable pC = 48t, where t is a real parameter. For the latter, the choice t = 10^6 gives rise to the prices of 39, 44 and 48 million for the three sectors A, B, C respectively.
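The equilibrium prices can also be recovered by machine as a null space computation (our sketch, using sympy):

```python
from sympy import Matrix

# Coefficient matrix of the homogeneous system, scaled to integer entries.
A = Matrix([[8, -6, -1],
            [4, -9,  5],
            [4,  3, -6]])
v = A.nullspace()[0]      # the solution space is one-dimensional
v = v / v[2] * 48         # normalize so that pC = 48
print(v.T)                # Matrix([[39, 44, 48]])
```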

1.10. Application to Chemistry

Chemical equations consist of reactants and products. The problem is to balance such equations so that the following two rules apply:

• Conservation of mass: No atoms are produced or destroyed in a chemical reaction.
• Conservation of charge: The total charge of the reactants is equal to the total charge of the products.

Example 1.10.1. Consider the oxidation of ammonia to form nitric oxide and water, given by the chemical equation
\[
(x_1)\,NH_3 + (x_2)\,O_2 \longrightarrow (x_3)\,NO + (x_4)\,H_2O.
\]
Here the reactants are ammonia (NH3) and oxygen (O2), while the products are nitric oxide (NO) and water (H2O). Our problem is to find the smallest positive integer values of x1, x2, x3, x4 such that the equation balances. To do this, the technique is to equate the total number of each type of atoms on the two sides of the chemical equation:

atom N: x1 = x3,
atom H: 3x1 = 2x4,
atom O: 2x2 = x3 + x4.

These give rise to a homogeneous system of 3 linear equations
\[
\begin{aligned}
x_1 - x_3 &= 0, \\
3x_1 - 2x_4 &= 0, \\
2x_2 - x_3 - x_4 &= 0,
\end{aligned}
\]
in the 4 variables x1, . . . , x4, with augmented matrix
\[
\left( \begin{array}{cccc|c} 1 & 0 & -1 & 0 & 0 \\ 3 & 0 & 0 & -2 & 0 \\ 0 & 2 & -1 & -1 & 0 \end{array} \right),
\]
which can be simplified by elementary row operations to
\[
\left( \begin{array}{cccc|c} 1 & 0 & -1 & 0 & 0 \\ 0 & 2 & -1 & -1 & 0 \\ 0 & 0 & 3 & -2 & 0 \end{array} \right),
\]

leading to the general solution (x1, . . . , x4) = t(2/3, 5/6, 2/3, 1) if we assign the free variable x4 = t. The choice t = 6 gives rise to the smallest positive integer solution (x1, . . . , x4) = (4, 5, 4, 6), leading to the balanced chemical equation
\[
4NH_3 + 5O_2 \longrightarrow 4NO + 6H_2O.
\]
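Balancing thus reduces to a null space computation, which we can sketch in Python (our own illustration, not from the text; the variable names are ours):

```python
from sympy import Matrix, lcm

# Columns x1..x4 of (x1)NH3 + (x2)O2 -> (x3)NO + (x4)H2O; rows are atoms N, H, O.
A = Matrix([[1, 0, -1,  0],    # N
            [3, 0,  0, -2],    # H
            [0, 2, -1, -1]])   # O
v = A.nullspace()[0]                       # rational solution (2/3, 5/6, 2/3, 1)
coeffs = v * lcm([term.q for term in v])   # clear denominators
print(coeffs.T)                            # Matrix([[4, 5, 4, 6]])
```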


Example 1.10.2. Consider the chemical equation
\[
(x_1)\,CO + (x_2)\,CO_2 + (x_3)\,H_2 \longrightarrow (x_4)\,CH_4 + (x_5)\,H_2O.
\]
We equate the total number of each type of atoms on the two sides of the chemical equation:

atom C: x1 + x2 = x4,
atom O: x1 + 2x2 = x5,
atom H: 2x3 = 4x4 + 2x5.

These give rise to a homogeneous system of 3 linear equations
\[
\begin{aligned}
x_1 + x_2 - x_4 &= 0, \\
x_1 + 2x_2 - x_5 &= 0, \\
2x_3 - 4x_4 - 2x_5 &= 0,
\end{aligned}
\]
in the 5 variables x1, . . . , x5, with augmented matrix
\[
\left( \begin{array}{ccccc|c} 1 & 1 & 0 & -1 & 0 & 0 \\ 1 & 2 & 0 & 0 & -1 & 0 \\ 0 & 0 & 2 & -4 & -2 & 0 \end{array} \right),
\]
with reduced row echelon form
\[
\left( \begin{array}{ccccc|c} 1 & 0 & 0 & -2 & 1 & 0 \\ 0 & 1 & 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -2 & -1 & 0 \end{array} \right),
\]

leading to the general solution (x1, . . . , x5) = s(2, −1, 2, 1, 0) + t(−1, 1, 1, 0, 1) if we assign the two free variables x4 = s and x5 = t. The choice s = 2 and t = 3 leads to the solution (x1, . . . , x5) = (1, 1, 7, 2, 3), with balanced chemical equation
\[
CO + CO_2 + 7H_2 \longrightarrow 2CH_4 + 3H_2O;
\]
the choice s = 3 and t = 4 leads to the solution (x1, . . . , x5) = (2, 1, 10, 3, 4), with balanced chemical equation
\[
2CO + CO_2 + 10H_2 \longrightarrow 3CH_4 + 4H_2O;
\]
while the choice s = 3 and t = 5 leads to the solution (x1, . . . , x5) = (1, 2, 11, 3, 5), with balanced chemical equation
\[
CO + 2CO_2 + 11H_2 \longrightarrow 3CH_4 + 5H_2O.
\]

All these are known to happen.

1.11. Application to Mechanics

In this section, we consider the problem of systems of weights, light ropes and smooth light pulleys, subject to the following two main principles:

• If a light rope passes around one or more smooth light pulleys, then the tension at the two ends are the same.
• Newton's second law of motion: We have F = mẍ, where F denotes force, m denotes mass and ẍ denotes acceleration.


Example 1.11.1. Two particles, of mass 2 and 4 (kilograms), are attached to the ends of a light rope passing around a smooth light pulley suspended from the ceiling as shown in the diagram below:

[Diagram: a pulley suspended from the ceiling; the particle of mass 2 hangs at distance x1 on one side and the particle of mass 4 at distance x2 on the other.]

We would like to find the tension in the rope and the acceleration of each particle. Here it will be convenient that the distances x1 and x2 are measured downwards, and we take this as the positive direction, so that any positive acceleration is downwards. We first apply Newton's law of motion to each particle. The picture below summarizes the forces acting on the two particles:

[Diagram: each particle experiences the tension T acting upwards and its weight, 2g and 4g respectively, acting downwards.]

Here T denotes the tension in the rope, and g denotes acceleration due to gravity. Newton's law of motion applied to the two particles (downwards) then gives the equations
\[
2\ddot{x}_1 = 2g - T \quad\text{and}\quad 4\ddot{x}_2 = 4g - T.
\]
We also have the conservation of the length of the rope, in the form x1 + x2 = C, so that ẍ1 + ẍ2 = 0. To summarize, for the three variables ẍ1, ẍ2, T, we have the system of linear equations
\[
\begin{aligned}
2\ddot{x}_1 + T &= 2g, \\
4\ddot{x}_2 + T &= 4g, \\
\ddot{x}_1 + \ddot{x}_2 &= 0,
\end{aligned}
\]
with augmented matrix
\[
\left( \begin{array}{ccc|c} 2 & 0 & 1 & 2g \\ 0 & 4 & 1 & 4g \\ 1 & 1 & 0 & 0 \end{array} \right),
\]
which can be reduced by elementary row operations to
\[
\left( \begin{array}{ccc|c} 1 & 1 & 0 & 0 \\ 0 & -2 & 1 & 2g \\ 0 & 0 & 3 & 8g \end{array} \right).
\]
This leads to the solution (ẍ1, ẍ2, T) = (−g/3, g/3, 8g/3).
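This computation is easily verified with sympy (our sketch, writing a1 and a2 for the accelerations ẍ1 and ẍ2):

```python
from sympy import Matrix, linsolve, symbols

a1, a2, T, g = symbols('a1 a2 T g')   # a1, a2 stand for the accelerations
A = Matrix([[2, 0, 1],
            [0, 4, 1],
            [1, 1, 0]])
b = Matrix([2*g, 4*g, 0])
print(linsolve((A, b), a1, a2, T))    # {(-g/3, g/3, 8*g/3)}
```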


Example 1.11.2. We now generalize the problem in the previous example. Two particles, of mass m1 and m2, are attached to the ends of a light rope passing around a smooth light pulley suspended from the ceiling as shown in the diagram below:

[Diagram: as in Example 1.11.1, with the particle of mass m1 hanging at distance x1 and the particle of mass m2 at distance x2.]

For the three variables ẍ1, ẍ2, T, we now have the system of linear equations
\[
\begin{aligned}
m_1\ddot{x}_1 + T &= m_1 g, \\
m_2\ddot{x}_2 + T &= m_2 g, \\
\ddot{x}_1 + \ddot{x}_2 &= 0,
\end{aligned}
\]
with augmented matrix
\[
\left( \begin{array}{ccc|c} m_1 & 0 & 1 & m_1 g \\ 0 & m_2 & 1 & m_2 g \\ 1 & 1 & 0 & 0 \end{array} \right),
\]
which can be reduced by elementary row operations to
\[
\left( \begin{array}{ccc|c} 1 & 1 & 0 & 0 \\ 0 & m_1 m_2 & m_1 & m_1 m_2 g \\ 0 & 0 & m_1 + m_2 & 2 m_1 m_2 g \end{array} \right).
\]

This leads to the solution
\[
(\ddot{x}_1, \ddot{x}_2, T) = \left( \frac{m_1 - m_2}{m_1 + m_2}\,g,\ \frac{m_2 - m_1}{m_1 + m_2}\,g,\ \frac{2m_1 m_2}{m_1 + m_2}\,g \right).
\]

Note that if m1 = m2, then ẍ1 = ẍ2 = 0, so that the particles are stationary. On the other hand, if m2 > m1, then ẍ2 > 0 and ẍ1 < 0. Then
\[
T < \frac{2m_1 m_2}{m_1 + m_1}\,g = m_2 g \quad\text{and}\quad T > \frac{2m_1 m_2}{m_2 + m_2}\,g = m_1 g.
\]
Hence m1g < T < m2g.


Problems for Chapter 1

1. Consider the system of linear equations

\[
\begin{aligned}
2x_1 + 5x_2 + 8x_3 &= 2, \\
x_1 + 2x_2 + 3x_3 &= 4, \\
3x_1 + 4x_2 + 4x_3 &= 1.
\end{aligned}
\]
a) Write down the augmented matrix for this system.
b) Reduce the augmented matrix by elementary row operations to row echelon form.
c) Use your answer in part (b) to solve the system of linear equations.

2. Consider the system of linear equations

\[
\begin{aligned}
4x_1 + 5x_2 + 8x_3 &= 0, \\
x_1 + 3x_3 &= 6, \\
3x_1 + 4x_2 + 6x_3 &= 9.
\end{aligned}
\]
a) Write down the augmented matrix for this system.
b) Reduce the augmented matrix by elementary row operations to row echelon form.
c) Use your answer in part (b) to solve the system of linear equations.

3. Consider the system of linear equations

\[
\begin{aligned}
x_1 - x_2 - 7x_3 + 7x_4 &= 5, \\
-x_1 + x_2 + 8x_3 - 5x_4 &= -7, \\
3x_1 - 2x_2 - 17x_3 + 13x_4 &= 14, \\
2x_1 - x_2 - 11x_3 + 8x_4 &= 7.
\end{aligned}
\]
a) Write down the augmented matrix for this system.
b) Reduce the augmented matrix by elementary row operations to row echelon form.
c) Use your answer in part (b) to solve the system of linear equations.

4. Solve the system of linear equations

\[
\begin{aligned}
x + 3y - 2z &= 4, \\
2x + 7y + 2z &= 10.
\end{aligned}
\]

5. For each of the augmented matrices below, reduce the matrix to row echelon or reduced row echelon form, and solve the system of linear equations represented by the matrix:

a)
\[
\left( \begin{array}{cccc|c} 1 & 1 & 2 & 1 & 5 \\ 3 & 2 & -1 & 3 & 6 \\ 4 & 3 & 1 & 4 & 11 \\ 2 & 1 & -3 & 2 & 1 \end{array} \right)
\]
b)
\[
\left( \begin{array}{cccc|c} 1 & 2 & 3 & -3 & 1 \\ 2 & -5 & -3 & 12 & 2 \\ 7 & 1 & 8 & 5 & 7 \end{array} \right)
\]

6. Reduce each of the following arrays by elementary row operations to reduced row echelon form:

a)
\[
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 2 & 3 & 4 & 5 \\ 0 & 0 & 3 & 4 & 5 \\ 0 & 0 & 0 & 4 & 5 \end{pmatrix}
\]
b)
\[
\begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}
\]
c)
\[
\begin{pmatrix} 1 & 11 & 21 & 31 & 41 & 51 \\ 2 & 12 & 22 & 32 & 42 & 52 \\ 3 & 13 & 23 & 33 & 43 & 53 \end{pmatrix}
\]


7. Consider a system of linear equations in five variables x = (x1, x2, x3, x4, x5) and expressed in matrix form Ax = b, where x is written as a column matrix. Suppose that the augmented matrix (A|b) can be reduced by elementary row operations to the row echelon form
\[
\left( \begin{array}{ccccc|c} 1 & 3 & 2 & 0 & 6 & 4 \\ 0 & 0 & 1 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 1 & 7 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right).
\]
a) Which are the pivot variables and which are the free variables?
b) Determine all the solutions of the system of linear equations.

8. Consider a system of linear equations in five variables x = (x1, x2, x3, x4, x5) and expressed in matrix form Ax = b, where x is written as a column matrix. Suppose that the augmented matrix (A|b) can be reduced by elementary row operations to the row echelon form
\[
\left( \begin{array}{ccccc|c} 1 & 2 & 0 & 3 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right).
\]
a) Which are the pivot variables and which are the free variables?
b) Determine all the solutions of the system of linear equations.

9. Consider the system of linear equations

\[
\begin{aligned}
x_1 + \lambda x_2 - x_3 &= 1, \\
2x_1 + x_2 + 2x_3 &= 5\lambda + 1, \\
x_1 - x_2 + 3x_3 &= 4\lambda + 2, \\
x_1 - 2\lambda x_2 + 7x_3 &= 10\lambda - 1.
\end{aligned}
\]
a) Reduce its associated augmented matrix to row echelon form. [Hint: After one or two steps, we will find the calculations extremely unpleasant, particularly since we do not know whether λ is zero or non-zero. Try rewriting the system of equations as a system in the variables x1, x3, x2, so that columns 2 and 3 of the augmented matrix are now swapped.]
b) Find a value of λ for which the system is soluble.
c) Solve the system.

10. Find the minimum value for x4 in the following system of one way streets:

[Diagram: junctions A, B (top) and C, D (bottom); external flows 120 and 150 at the top, 50 at A, 80 at B, 100 at C and 100 at D; street flows x1 between A and B, x2 and x3 joining the two rows, x4 between C and D.]


11. Consider the traffic flow in the following system of one way streets:

[Diagram: junctions A, B (top) and C, D (bottom); external flows 50 at A, x2 at B, 80 at C and 10 at D; street flows x1 between A and B, x3 and x4 joining the two rows, x5 between C and D.]

a) Find the general solution of the system.
b) Find the range for x5, and then determine the range for each of the other four variables.

12. Consider the traffic flow in the following system of one way streets:

[Diagram: junctions A, B, C, D along the top and E, F, G, H along the bottom; external flows 30 and 30 at the top, 60 at A, 40 at D, 10 and 20 in the middle, 50 at E, x9 at H, and 50 and 40 at the bottom; street flows x1, x2, x3 along the top row, x6, x7, x8 along the bottom row, and x4, x5 joining the two rows.]

a) Find the general solution of the system.
b) Find the range for x8, and then determine the range for each of the other eight variables.

13. Consider the electric circuit shown in the diagram below:

[Diagram: a circuit with junctions A and B; currents I1, I2, I3 through branches containing a 10Ω resistor, a 5Ω resistor and a 6Ω resistor respectively, together with 20V and 40V sources.]

Determine the currents I1, I2 and I3. You must explain each step carefully, quoting all the relevant laws on electric circuits. In particular, you must clearly indicate the positive direction of each loop you are considering, and ensure that the voltage drop across every resistor and electrical source on the loop carries the correct sign.


14. Consider the electric circuit shown in the diagram below:

[Diagram: a circuit with junctions A and B; currents I1, I2, I3 through branches containing an 8Ω resistor, a 1Ω resistor and an 8Ω resistor respectively, together with 60V and 20V sources.]

Determine the currents I1, I2 and I3. You must explain each step carefully, quoting all the relevant laws on electric circuits. In particular, you must clearly indicate the positive direction of each loop you are considering, and ensure that the voltage drop across every resistor and electrical source on the loop carries the correct sign.

15. Consider the electric circuit shown in the diagram below:

[Diagram: a circuit with junctions A, B, C, D; currents I1, I2, I3, I4, I5, I6 through branches containing 8Ω and 20Ω resistors at the top, two 5Ω resistors in the middle, and 50V and 10V sources.]

Determine the currents I1, I2, I3, I4, I5 and I6. You must explain each step carefully, quoting all the relevant laws on electric circuits. In particular, you must clearly indicate the positive direction of each loop you are considering, and ensure that the voltage drop across every resistor and electrical source on the loop carries the correct sign.

16. Three industries A, B, C consume their own outputs and also buy from each other according to the table below:

                       proportion of output of industry
                             A      B      C
  bought by industry A      0.35   0.50   0.30
  bought by industry B      0.25   0.20   0.30
  bought by industry C      0.40   0.30   0.40

Use the simple exchange model due to the economist Leontief to determine equilibrium prices that they can charge each other so that no money changes hands.


17. An arrangement exists for three colleagues A, B, C who work for themselves and each other according to the table below:

                   percentage of time spent by
                       A      B      C
  working for A       50     40     10
  working for B       10     20     60
  working for C       40     40     30

Use the simple exchange model due to the economist Leontief to determine equilibrium fees that they can charge each other so that no money changes hands.

18. Three farmers A, B, C grow bananas, oranges and apples respectively, and buy off each other. Farmer A buys 50% of the oranges and 20% of the apples, farmer B buys 30% of the bananas and 40% of the apples, while farmer C buys 50% of the bananas and 20% of the oranges. Use the simple exchange model due to the economist Leontief to determine equilibrium prices that they can charge each other so that no money changes hands.

19. For each of the following chemical reactions, determine the balanced chemical equation:
a) reactants Al and O2; product Al2O3
b) reactants C2H6 and O2; products CO2 and H2O
c) reactants PbO2 and HCl; products PbCl2, Cl2 and H2O
d) reactants C2H5OH and O2; products CO2 and H2O
e) reactants MnO2, H2SO4 and H2C2O4; products MnSO4, CO2 and H2O

20. Two particles, of mass m1 and m2 (kilograms), are arranged with light ropes and smooth light pulleys as shown in the diagram below:

[Diagram: an arrangement of light ropes and smooth light pulleys, with the particle of mass m1 at distance x1 and the particle of mass m2 at distance x2.]

a) Consider first of all the case when m1 = m2 = 3.
(i) Show that the augmented matrix for a system of linear equations in the three variables ẍ1, ẍ2, T, where T denotes the tension of the rope, has reduced row echelon form
\[
\left( \begin{array}{ccc|c} 1 & 0 & 0 & -g/5 \\ 0 & 1 & 0 & 2g/5 \\ 0 & 0 & 1 & 9g/5 \end{array} \right).
\]
(ii) Determine ẍ1, ẍ2, T in this case.
(iii) In which direction is the particle on the right moving?


b) Show that in the general case, we have
\[
(\ddot{x}_1, \ddot{x}_2, T) = \left( \frac{(m_1 - 2m_2)g}{m_1 + 4m_2},\ \frac{2(2m_2 - m_1)g}{m_1 + 4m_2},\ \frac{3m_1 m_2 g}{m_1 + 4m_2} \right).
\]
c) What relationship must m1 and m2 have in order to achieve equilibrium?

21. Three particles, of mass m1, m2 and m3 (kilograms), are arranged with light ropes and smooth light pulleys as shown in the diagram below:

    [Diagram: particles of mass m1, m2, m3 at displacements x1, x2, x3, connected by light ropes over smooth light pulleys]

a) Show that we have

    (x1, x2, x3, T) = ( (1 − 4m2m3/M)g, (1 − 8m1m3/M)g, (1 − 4m1m2/M)g, 4m1m2m3g/M ),

where T denotes the tension of the rope and M = m1m2 + m2m3 + 4m1m3.
b) Show that equilibrium occurs precisely when m2 = 2m1 = 2m3.


LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1982, 2008.

This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.

It is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 2

MATRICES

2.1. Introduction

A rectangular array of numbers of the form

    ( a11 ... a1n )
    (  :        :  )                                        (1)
    ( am1 ... amn )

is called an m × n matrix, with m rows and n columns. We count rows from the top and columns from the left. Hence

    ( ai1 ... ain )        and        ( a1j )
                                      (  :  )
                                      ( amj )

represent respectively the i-th row and the j-th column of the matrix (1), and aij represents the entry in the matrix (1) on the i-th row and j-th column.

Example 2.1.1. Consider the 3 × 4 matrix

    (  2 4 3 −1 )
    (  3 1 5  2 )
    ( −1 0 7  6 )

Here

    ( 3 1 5 2 )        and        ( 3 )
                                  ( 5 )
                                  ( 7 )

represent respectively the 2-nd row and the 3-rd column of the matrix, and 5 represents the entry in the matrix on the 2-nd row and 3-rd column.

We now consider the question of arithmetic involving matrices. First of all, let us study the problem of addition. A reasonable theory can be derived from the following definition.

Definition. Suppose that the two matrices

    A = ( a11 ... a1n )        and        B = ( b11 ... b1n )
        (  :        :  )                      (  :        :  )
        ( am1 ... amn )                       ( bm1 ... bmn )

both have m rows and n columns. Then we write

    A + B = ( a11 + b11 ... a1n + b1n )
            (     :              :    )
            ( am1 + bm1 ... amn + bmn )

and call this the sum of the two matrices A and B.

Example 2.1.2. Suppose that

    A = (  2 4 3 −1 )        and        B = (  1 2 −2  7 )
        (  3 1 5  2 )                       (  0 2  4 −1 )
        ( −1 0 7  6 )                       ( −2 1  3  3 )

Then

    A + B = (  2+1  4+2  3−2  −1+7 )   (  3 6  1 6 )
            (  3+0  1+2  5+4   2−1 ) = (  3 3  9 1 )
            ( −1−2  0+1  7+3   6+3 )   ( −3 1 10 9 )

Example 2.1.3. We do not have a definition for "adding" the matrices

    (  2 4 3 −1 )        and        (  2 4 3 )
    ( −1 0 7  6 )                   (  3 1 5 )
                                    ( −1 0 7 )

PROPOSITION 2A. (MATRIX ADDITION) Suppose that A, B, C are m × n matrices. Suppose further that O represents the m × n matrix with all entries zero. Then
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A; and
(d) there is an m × n matrix A′ such that A + A′ = O.

Proof. Parts (a)–(c) are easy consequences of ordinary addition, as matrix addition is simply entry-wise addition. For part (d), we can consider the matrix A′ obtained from A by multiplying each entry of A by −1. ©

The theory of multiplication is rather more complicated, and includes multiplication of a matrix by a scalar as well as multiplication of two matrices.

We first study the simpler case of multiplication by scalars.


Definition. Suppose that the matrix

    A = ( a11 ... a1n )
        (  :        :  )
        ( am1 ... amn )

has m rows and n columns, and that c ∈ R. Then we write

    cA = ( ca11 ... ca1n )
         (   :         :  )
         ( cam1 ... camn )

and call this the product of the matrix A by the scalar c.

Example 2.1.4. Suppose that

    A = (  2 4 3 −1 )
        (  3 1 5  2 )
        ( −1 0 7  6 )

Then

    2A = (  4 8  6 −2 )
         (  6 2 10  4 )
         ( −2 0 14 12 )

PROPOSITION 2B. (MULTIPLICATION BY SCALAR) Suppose that A, B are m × n matrices, and that c, d ∈ R. Suppose further that O represents the m × n matrix with all entries zero. Then
(a) c(A + B) = cA + cB;
(b) (c + d)A = cA + dA;
(c) 0A = O; and
(d) c(dA) = (cd)A.

Proof. These are all easy consequences of ordinary multiplication, as multiplication by scalar c is simply entry-wise multiplication by the number c. ©

The question of multiplication of two matrices is rather more complicated. To motivate this, let us consider the representation of a system of linear equations

    a11 x1 + ... + a1n xn = b1,
            :                                               (2)
    am1 x1 + ... + amn xn = bm,

in the form Ax = b, where

    A = ( a11 ... a1n )        and        b = ( b1 )
        (  :        :  )                      (  :  )       (3)
        ( am1 ... amn )                       ( bm )

represent the coefficients and

    x = ( x1 )
        (  :  )                                             (4)
        ( xn )


represents the variables. This can be written in full matrix notation by

    ( a11 ... a1n ) ( x1 )   ( b1 )
    (  :        :  ) (  :  ) = (  :  )
    ( am1 ... amn ) ( xn )   ( bm )

Can you work out the meaning of this representation?

Now let us define matrix multiplication more formally.

Definition. Suppose that

    A = ( a11 ... a1n )        and        B = ( b11 ... b1p )
        (  :        :  )                      (  :        :  )
        ( am1 ... amn )                       ( bn1 ... bnp )

are respectively an m × n matrix and an n × p matrix. Then the matrix product AB is given by the m × p matrix

    AB = ( q11 ... q1p )
         (  :        :  )
         ( qm1 ... qmp )

where for every i = 1, ..., m and j = 1, ..., p, we have

    qij = Σ_{k=1}^{n} aik bkj = ai1 b1j + ... + ain bnj.

Remark. Note first of all that the number of columns of the first matrix must be equal to the number of rows of the second matrix. On the other hand, for a simple way to work out qij, the entry in the i-th row and j-th column of AB, we observe that the i-th row of A and the j-th column of B are respectively

    ( ai1 ... ain )        and        ( b1j )
                                      (  :  )
                                      ( bnj )

We now multiply the corresponding entries – from ai1 with b1j, and so on, until ain with bnj – and then add these products to obtain qij.

Example 2.1.5. Consider the matrices

    A = (  2 4 3 −1 )        and        B = ( 1  4 )
        (  3 1 5  2 )                       ( 2  3 )
        ( −1 0 7  6 )                       ( 0 −2 )
                                            ( 3  1 )

Note that A is a 3 × 4 matrix and B is a 4 × 2 matrix, so that the product AB is a 3 × 2 matrix. Let us calculate the product

    AB = ( q11 q12 )
         ( q21 q22 )
         ( q31 q32 )


Consider first of all q11. To calculate this, we need the 1-st row of A and the 1-st column of B, so let us cover up all unnecessary information, so that

    ( 2 4 3 −1 ) ( 1 × )   ( q11 × )
    ( × × × ×  ) ( 2 × ) = (  ×  × )
    ( × × × ×  ) ( 0 × )   (  ×  × )
                 ( 3 × )

From the definition, we have

    q11 = 2·1 + 4·2 + 3·0 + (−1)·3 = 2 + 8 + 0 − 3 = 7.

Consider next q12. To calculate this, we need the 1-st row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that

    ( 2 4 3 −1 ) ( ×  4 )   ( × q12 )
    ( × × × ×  ) ( ×  3 ) = ( ×  ×  )
    ( × × × ×  ) ( × −2 )   ( ×  ×  )
                 ( ×  1 )

From the definition, we have

    q12 = 2·4 + 4·3 + 3·(−2) + (−1)·1 = 8 + 12 − 6 − 1 = 13.

Consider next q21. To calculate this, we need the 2-nd row of A and the 1-st column of B, so let us cover up all unnecessary information, so that

    ( × × × × ) ( 1 × )   (  ×  × )
    ( 3 1 5 2 ) ( 2 × ) = ( q21 × )
    ( × × × × ) ( 0 × )   (  ×  × )
                ( 3 × )

From the definition, we have

    q21 = 3·1 + 1·2 + 5·0 + 2·3 = 3 + 2 + 0 + 6 = 11.

Consider next q22. To calculate this, we need the 2-nd row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that

    ( × × × × ) ( ×  4 )   ( ×  ×  )
    ( 3 1 5 2 ) ( ×  3 ) = ( × q22 )
    ( × × × × ) ( × −2 )   ( ×  ×  )
                ( ×  1 )

From the definition, we have

    q22 = 3·4 + 1·3 + 5·(−2) + 2·1 = 12 + 3 − 10 + 2 = 7.

Consider next q31. To calculate this, we need the 3-rd row of A and the 1-st column of B, so let us cover up all unnecessary information, so that

    (  × × × × ) ( 1 × )   (  ×  × )
    (  × × × × ) ( 2 × ) = (  ×  × )
    ( −1 0 7 6 ) ( 0 × )   ( q31 × )
                 ( 3 × )

From the definition, we have

    q31 = (−1)·1 + 0·2 + 7·0 + 6·3 = −1 + 0 + 0 + 18 = 17.


Consider finally q32. To calculate this, we need the 3-rd row of A and the 2-nd column of B, so let us cover up all unnecessary information, so that

    (  × × × × ) ( ×  4 )   ( ×  ×  )
    (  × × × × ) ( ×  3 ) = ( ×  ×  )
    ( −1 0 7 6 ) ( × −2 )   ( × q32 )
                 ( ×  1 )

From the definition, we have

    q32 = (−1)·4 + 0·3 + 7·(−2) + 6·1 = −4 + 0 − 14 + 6 = −12.

We therefore conclude that

    AB = (  2 4 3 −1 ) ( 1  4 )   (  7  13 )
         (  3 1 5  2 ) ( 2  3 ) = ( 11   7 )
         ( −1 0 7  6 ) ( 0 −2 )   ( 17 −12 )
                       ( 3  1 )
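The definition of qij translates directly into a triple loop. The following minimal sketch, in plain Python (our choice of language for illustration), reproduces the product AB just computed:

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, straight from the
    definition q_ij = a_i1*b_1j + ... + a_in*b_nj."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must match rows of B"
    Q = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            Q[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return Q

A = [[2, 4, 3, -1], [3, 1, 5, 2], [-1, 0, 7, 6]]
B = [[1, 4], [2, 3], [0, -2], [3, 1]]
print(mat_mul(A, B))   # [[7, 13], [11, 7], [17, -12]], as in Example 2.1.5
```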

Example 2.1.6. Consider again the matrices

    A = (  2 4 3 −1 )        and        B = ( 1  4 )
        (  3 1 5  2 )                       ( 2  3 )
        ( −1 0 7  6 )                       ( 0 −2 )
                                            ( 3  1 )

Note that B is a 4 × 2 matrix and A is a 3 × 4 matrix, so that we do not have a definition for the "product" BA.

We leave the proofs of the following results as exercises for the interested reader.

PROPOSITION 2C. (ASSOCIATIVE LAW) Suppose that A is an m × n matrix, B is an n × p matrix and C is a p × r matrix. Then A(BC) = (AB)C.

PROPOSITION 2D. (DISTRIBUTIVE LAWS)
(a) Suppose that A is an m × n matrix and B and C are n × p matrices. Then A(B + C) = AB + AC.
(b) Suppose that A and B are m × n matrices and C is an n × p matrix. Then (A + B)C = AC + BC.

PROPOSITION 2E. Suppose that A is an m × n matrix, B is an n × p matrix, and that c ∈ R. Then c(AB) = (cA)B = A(cB).

2.2. Systems of Linear Equations

Note that the system (2) of linear equations can be written in matrix form as

Ax = b,

where the matrices A, x and b are given by (3) and (4). In this section, we shall establish the following important result.

PROPOSITION 2F. Every system of linear equations of the form (2) has either no solution, one solution or infinitely many solutions.


Proof. Clearly the system (2) has either no solution, exactly one solution, or more than one solution. It remains to show that if the system (2) has two distinct solutions, then it must have infinitely many solutions. Suppose that x = u and x = v represent two distinct solutions. Then

    Au = b        and        Av = b,

so that

    A(u − v) = Au − Av = b − b = 0,

where 0 is the zero m × 1 matrix. It now follows that for every c ∈ R, we have

    A(u + c(u − v)) = Au + A(c(u − v)) = Au + c(A(u − v)) = b + c0 = b,

so that x = u + c(u − v) is a solution for every c ∈ R. Clearly we have infinitely many solutions. ©

2.3. Inversion of Matrices

For the remainder of this chapter, we shall deal with square matrices, those where the number of rows equals the number of columns.

Definition. The n × n matrix

    In = ( a11 ... a1n )
         (  :        :  )
         ( an1 ... ann )

where

    aij = 1 if i = j,   and   aij = 0 if i ≠ j,

is called the identity matrix of order n.

Remark. Note that

    I1 = ( 1 )        and        I4 = ( 1 0 0 0 )
                                      ( 0 1 0 0 )
                                      ( 0 0 1 0 )
                                      ( 0 0 0 1 )

The following result is relatively easy to check. It shows that the identity matrix In acts as the identity for multiplication of n × n matrices.

PROPOSITION 2G. For every n × n matrix A, we have AIn = InA = A.

This raises the following question: Given an n × n matrix A, is it possible to find another n × n matrix B such that AB = BA = In?

We shall postpone the full answer to this question until the next chapter. In Section 2.5, however, we shall be content with finding such a matrix B if it exists. In Section 2.6, we shall relate the existence of such a matrix B to some properties of the matrix A.


Definition. An n × n matrix A is said to be invertible if there exists an n × n matrix B such that AB = BA = In. In this case, we say that B is the inverse of A and write B = A−1.

PROPOSITION 2H. Suppose that A is an invertible n × n matrix. Then its inverse A−1 is unique.

Proof. Suppose that B satisfies the requirements for being the inverse of A. Then AB = BA = In. It follows that

A−1 = A−1In = A−1(AB) = (A−1A)B = InB = B.

Hence the inverse A−1 is unique. ©

PROPOSITION 2J. Suppose that A and B are invertible n × n matrices. Then (AB)−1 = B−1A−1.

Proof. In view of the uniqueness of inverse, it is sufficient to show that B−1A−1 satisfies the requirements for being the inverse of AB. Note that

(AB)(B−1A−1) = A(B(B−1A−1)) = A((BB−1)A−1) = A(InA−1) = AA−1 = In

and

(B−1A−1)(AB) = B−1(A−1(AB)) = B−1((A−1A)B) = B−1(InB) = B−1B = In

as required. ©

PROPOSITION 2K. Suppose that A is an invertible n × n matrix. Then (A−1)−1 = A.

Proof. Note that both (A−1)−1 and A satisfy the requirements for being the inverse of A−1. Equality follows from the uniqueness of inverse. ©

2.4. Application to Matrix Multiplication

In this section, we shall discuss an application of invertible matrices. Detailed discussion of the technique involved will be covered in Chapter 7.

Definition. An n × n matrix

    A = ( a11 ... a1n )
        (  :        :  )
        ( an1 ... ann )

where aij = 0 whenever i ≠ j, is called a diagonal matrix of order n.

Example 2.4.1. The 3 × 3 matrices

    ( 1 0 0 )        and        ( 0 0 0 )
    ( 0 2 0 )                   ( 0 0 0 )
    ( 0 0 0 )                   ( 0 0 0 )

are both diagonal.

Given an n × n matrix A, it is usually rather complicated to calculate

    A^k = A ... A        (k factors).

However, the calculation is rather simple when A is a diagonal matrix, as we shall see in the following example.


Example 2.4.2. Consider the 3 × 3 matrix

    A = (  17 −10  −5 )
        (  45 −28 −15 )
        ( −30  20  12 )

Suppose that we wish to calculate A^98. It can be checked that if we take

    P = (  1 1 2 )
        (  3 0 3 )
        ( −2 3 0 )

then

    P−1 = ( −3    2   1 )
          ( −2  4/3   1 )
          (  3 −5/3  −1 )

Furthermore, if we write

    D = ( −3 0 0 )
        (  0 2 0 )
        (  0 0 2 )

then it can be checked that A = PDP−1, so that

    A^98 = (PDP−1) ... (PDP−1)   (98 factors)   = P D^98 P−1 = P ( 3^98   0     0   ) P−1.
                                                               (   0   2^98    0   )
                                                               (   0     0   2^98  )

This is much simpler than calculating A^98 directly. Note that this example is only an illustration. We have not discussed here how the matrices P and D are found.
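A minimal numerical check of this example, assuming Python with NumPy; to keep the printed numbers readable we compute A^8 rather than A^98, but the mechanism P D^k P−1 is exactly the same.

```python
import numpy as np

A = np.array([[17, -10, -5], [45, -28, -15], [-30, 20, 12]])
P = np.array([[1, 1, 2], [3, 0, 3], [-2, 3, 0]])
D = np.diag([-3, 2, 2])

P_inv = np.linalg.inv(P)
assert np.allclose(P @ D @ P_inv, A)        # A = P D P^{-1}, as claimed

k = 8                                       # A^98 works the same way
Dk = np.diag(np.diag(D) ** k)               # powers of a diagonal matrix are entrywise
Ak = P @ Dk @ P_inv
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
print(Ak)
```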

2.5. Finding Inverses by Elementary Row Operations

In this section, we shall discuss a technique by which we can find the inverse of a square matrix, if the inverse exists. Before we discuss this technique, let us recall the three elementary row operations we discussed in the previous chapter. These are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant.

Let us now consider the following example.

Example 2.5.1. Consider the matrices

    A = ( a11 a12 a13 )        and        I3 = ( 1 0 0 )
        ( a21 a22 a23 )                        ( 0 1 0 )
        ( a31 a32 a33 )                        ( 0 0 1 )

• Let us interchange rows 1 and 2 of A and do likewise for I3. We obtain respectively

    ( a21 a22 a23 )        and        ( 0 1 0 )
    ( a11 a12 a13 )                   ( 1 0 0 )
    ( a31 a32 a33 )                   ( 0 0 1 )

Note that

    ( a21 a22 a23 )   ( 0 1 0 ) ( a11 a12 a13 )
    ( a11 a12 a13 ) = ( 1 0 0 ) ( a21 a22 a23 )
    ( a31 a32 a33 )   ( 0 0 1 ) ( a31 a32 a33 )

• Let us interchange rows 2 and 3 of A and do likewise for I3. We obtain respectively

    ( a11 a12 a13 )        and        ( 1 0 0 )
    ( a31 a32 a33 )                   ( 0 0 1 )
    ( a21 a22 a23 )                   ( 0 1 0 )

Note that

    ( a11 a12 a13 )   ( 1 0 0 ) ( a11 a12 a13 )
    ( a31 a32 a33 ) = ( 0 0 1 ) ( a21 a22 a23 )
    ( a21 a22 a23 )   ( 0 1 0 ) ( a31 a32 a33 )

• Let us add 3 times row 1 to row 2 of A and do likewise for I3. We obtain respectively

    (    a11        a12        a13    )        and        ( 1 0 0 )
    ( 3a11+a21   3a12+a22   3a13+a23  )                   ( 3 1 0 )
    (    a31        a32        a33    )                   ( 0 0 1 )

Note that

    (    a11        a12        a13    )   ( 1 0 0 ) ( a11 a12 a13 )
    ( 3a11+a21   3a12+a22   3a13+a23  ) = ( 3 1 0 ) ( a21 a22 a23 )
    (    a31        a32        a33    )   ( 0 0 1 ) ( a31 a32 a33 )

• Let us add −2 times row 3 to row 1 of A and do likewise for I3. We obtain respectively

    ( −2a31+a11   −2a32+a12   −2a33+a13 )        and        ( 1 0 −2 )
    (    a21         a22         a23    )                   ( 0 1  0 )
    (    a31         a32         a33    )                   ( 0 0  1 )

Note that

    ( −2a31+a11   −2a32+a12   −2a33+a13 )   ( 1 0 −2 ) ( a11 a12 a13 )
    (    a21         a22         a23    ) = ( 0 1  0 ) ( a21 a22 a23 )
    (    a31         a32         a33    )   ( 0 0  1 ) ( a31 a32 a33 )

• Let us multiply row 2 of A by 5 and do likewise for I3. We obtain respectively

    (  a11   a12   a13 )        and        ( 1 0 0 )
    ( 5a21  5a22  5a23 )                   ( 0 5 0 )
    (  a31   a32   a33 )                   ( 0 0 1 )

Note that

    (  a11   a12   a13 )   ( 1 0 0 ) ( a11 a12 a13 )
    ( 5a21  5a22  5a23 ) = ( 0 5 0 ) ( a21 a22 a23 )
    (  a31   a32   a33 )   ( 0 0 1 ) ( a31 a32 a33 )

• Let us multiply row 3 of A by −1 and do likewise for I3. We obtain respectively

    (  a11   a12   a13 )        and        ( 1 0  0 )
    (  a21   a22   a23 )                   ( 0 1  0 )
    ( −a31  −a32  −a33 )                   ( 0 0 −1 )

Note that

    (  a11   a12   a13 )   ( 1 0  0 ) ( a11 a12 a13 )
    (  a21   a22   a23 ) = ( 0 1  0 ) ( a21 a22 a23 )
    ( −a31  −a32  −a33 )   ( 0 0 −1 ) ( a31 a32 a33 )

Let us now consider the problem in general.

Definition. By an elementary n × n matrix, we mean an n × n matrix obtained from In by an elementary row operation.

We state without proof the following important result. The interested reader may wish to construct a proof, taking into account the different types of elementary row operations.

PROPOSITION 2L. Suppose that A is an n × n matrix, and suppose that B is obtained from A by an elementary row operation. Suppose further that E is an elementary matrix obtained from In by the same elementary row operation. Then B = EA.

We now adopt the following strategy. Consider an n × n matrix A. Suppose that it is possible to reduce the matrix A by a sequence α1, α2, ..., αk of elementary row operations to the identity matrix In. If E1, E2, ..., Ek are respectively the elementary n × n matrices obtained from In by the same elementary row operations α1, α2, ..., αk, then

In = Ek . . . E2E1A.

We therefore must have

A−1 = Ek . . . E2E1 = Ek . . . E2E1In.

It follows that the inverse A−1 can be obtained from In by performing the same elementary row operations α1, α2, ..., αk. Since we are performing the same elementary row operations on A and In, it makes sense to put them side by side. The process can then be described pictorially by

    (A | In) --α1--> (E1A | E1In) --α2--> (E2E1A | E2E1In) --α3--> ... --αk--> (Ek...E2E1A | Ek...E2E1In) = (In | A−1).

In other words, we consider an array with the matrix A on the left and the matrix In on the right. We now perform elementary row operations on the array and try to reduce the left hand half to the matrix In. If we succeed in doing so, then the right hand half of the array gives the inverse A−1.

Example 2.5.2. Consider the matrix

    A = (  1 1 2 )
        (  3 0 3 )
        ( −2 3 0 )

To find A−1, we consider the array

    (A | I3) = (  1 1 2 | 1 0 0 )
               (  3 0 3 | 0 1 0 )
               ( −2 3 0 | 0 0 1 )


We now perform elementary row operations on this array and try to reduce the left hand half to the matrix I3. Note that if we succeed, then the final array is clearly in reduced row echelon form. We therefore follow the same procedure as reducing an array to reduced row echelon form. Adding −3 times row 1 to row 2, we obtain

    (  1  1  2 |  1 0 0 )
    (  0 −3 −3 | −3 1 0 )
    ( −2  3  0 |  0 0 1 )

Adding 2 times row 1 to row 3, we obtain

    ( 1  1  2 |  1 0 0 )
    ( 0 −3 −3 | −3 1 0 )
    ( 0  5  4 |  2 0 1 )

Multiplying row 3 by 3, we obtain

    ( 1  1  2 |  1 0 0 )
    ( 0 −3 −3 | −3 1 0 )
    ( 0 15 12 |  6 0 3 )

Adding 5 times row 2 to row 3, we obtain

    ( 1  1  2 |  1 0 0 )
    ( 0 −3 −3 | −3 1 0 )
    ( 0  0 −3 | −9 5 3 )

Multiplying row 1 by 3, we obtain

    ( 3  3  6 |  3 0 0 )
    ( 0 −3 −3 | −3 1 0 )
    ( 0  0 −3 | −9 5 3 )

Adding 2 times row 3 to row 1, we obtain

    ( 3  3  0 | −15 10 6 )
    ( 0 −3 −3 |  −3  1 0 )
    ( 0  0 −3 |  −9  5 3 )

Adding −1 times row 3 to row 2, we obtain

    ( 3  3  0 | −15 10  6 )
    ( 0 −3  0 |   6 −4 −3 )
    ( 0  0 −3 |  −9  5  3 )

Adding 1 times row 2 to row 1, we obtain

    ( 3  0  0 | −9  6  3 )
    ( 0 −3  0 |  6 −4 −3 )
    ( 0  0 −3 | −9  5  3 )

Multiplying row 1 by 1/3, we obtain

    ( 1  0  0 | −3  2  1 )
    ( 0 −3  0 |  6 −4 −3 )
    ( 0  0 −3 | −9  5  3 )


Multiplying row 2 by −1/3, we obtain

    ( 1 0  0 | −3   2  1 )
    ( 0 1  0 | −2 4/3  1 )
    ( 0 0 −3 | −9   5  3 )

Multiplying row 3 by −1/3, we obtain

    ( 1 0 0 | −3    2   1 )
    ( 0 1 0 | −2  4/3   1 )
    ( 0 0 1 |  3 −5/3  −1 )

Note now that the array is in reduced row echelon form, and that the left hand half is the identity matrix I3. It follows that the right hand half of the array represents the inverse A−1. Hence

    A−1 = ( −3    2   1 )
          ( −2  4/3   1 )
          (  3 −5/3  −1 )
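The procedure just carried out by hand is easy to mechanize. The following is a minimal sketch in Python with NumPy of the (A | In) method, with a pivot-selection step added for numerical safety; it is an illustration of the idea, not a production routine.

```python
import numpy as np

def invert(A):
    """Invert a square matrix by row-reducing the array (A | I)."""
    n = len(A)
    M = np.hstack([np.array(A, dtype=float), np.eye(n)])
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))      # choose a pivot row
        if np.isclose(M[p, j], 0.0):
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]                     # interchange two rows
        M[j] /= M[j, j]                           # scale pivot row so pivot is 1
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]            # clear the rest of the column
    return M[:, n:]                               # right hand half is the inverse

A = [[1, 1, 2], [3, 0, 3], [-2, 3, 0]]
print(invert(A))   # matches the inverse found in Example 2.5.2
```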

Example 2.5.3. Consider the matrix

    A = ( 1 1 2 3 )
        ( 2 2 4 5 )
        ( 0 3 0 0 )
        ( 0 0 0 1 )

To find A−1, we consider the array

    (A | I4) = ( 1 1 2 3 | 1 0 0 0 )
               ( 2 2 4 5 | 0 1 0 0 )
               ( 0 3 0 0 | 0 0 1 0 )
               ( 0 0 0 1 | 0 0 0 1 )

We now perform elementary row operations on this array and try to reduce the left hand half to the matrix I4. Adding −2 times row 1 to row 2, we obtain

    ( 1 1 2  3 |  1 0 0 0 )
    ( 0 0 0 −1 | −2 1 0 0 )
    ( 0 3 0  0 |  0 0 1 0 )
    ( 0 0 0  1 |  0 0 0 1 )

Adding 1 times row 2 to row 4, we obtain

    ( 1 1 2  3 |  1 0 0 0 )
    ( 0 0 0 −1 | −2 1 0 0 )
    ( 0 3 0  0 |  0 0 1 0 )
    ( 0 0 0  0 | −2 1 0 1 )

Interchanging rows 2 and 3, we obtain

    ( 1 1 2  3 |  1 0 0 0 )
    ( 0 3 0  0 |  0 0 1 0 )
    ( 0 0 0 −1 | −2 1 0 0 )
    ( 0 0 0  0 | −2 1 0 1 )


At this point, we observe that it is impossible to reduce the left hand half of the array to I4. For those who remain unconvinced, let us continue. Adding 3 times row 3 to row 1, we obtain

    ( 1 1 2  0 | −5 3 0 0 )
    ( 0 3 0  0 |  0 0 1 0 )
    ( 0 0 0 −1 | −2 1 0 0 )
    ( 0 0 0  0 | −2 1 0 1 )

Adding −1 times row 4 to row 3, we obtain

    ( 1 1 2  0 | −5 3 0  0 )
    ( 0 3 0  0 |  0 0 1  0 )
    ( 0 0 0 −1 |  0 0 0 −1 )
    ( 0 0 0  0 | −2 1 0  1 )

Multiplying row 1 by 6 (here we want to avoid fractions in the next two steps), we obtain

    ( 6 6 12  0 | −30 18 0  0 )
    ( 0 3  0  0 |   0  0 1  0 )
    ( 0 0  0 −1 |   0  0 0 −1 )
    ( 0 0  0  0 |  −2  1 0  1 )

Adding −15 times row 4 to row 1, we obtain

    ( 6 6 12  0 |  0 3 0 −15 )
    ( 0 3  0  0 |  0 0 1   0 )
    ( 0 0  0 −1 |  0 0 0  −1 )
    ( 0 0  0  0 | −2 1 0   1 )

Adding −2 times row 2 to row 1, we obtain

    ( 6 0 12  0 |  0 3 −2 −15 )
    ( 0 3  0  0 |  0 0  1   0 )
    ( 0 0  0 −1 |  0 0  0  −1 )
    ( 0 0  0  0 | −2 1  0   1 )

Multiplying row 1 by 1/6, multiplying row 2 by 1/3, multiplying row 3 by −1 and multiplying row 4 by −1/2, we obtain

    ( 1 0 2 0 | 0  1/2 −1/3 −5/2 )
    ( 0 1 0 0 | 0    0  1/3    0 )
    ( 0 0 0 1 | 0    0    0    1 )
    ( 0 0 0 0 | 1 −1/2    0 −1/2 )

Note now that the array is in reduced row echelon form, and that the left hand half is not the identity matrix I4. Our technique has failed. In fact, the matrix A is not invertible.

2.6. Criteria for Invertibility

Examples 2.5.2–2.5.3 raise the question of when a given matrix is invertible. In this section, we shall obtain some partial answers to this question. Our first step here is the following simple observation.

PROPOSITION 2M. Every elementary matrix is invertible.

Proof. Let us consider elementary row operations. Recall that these are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant. These elementary row operations can clearly be reversed by elementary row operations. For (1), we interchange the two rows again. For (2), if we have originally added c times row i to row j, then we can reverse this by adding −c times row i to row j. For (3), if we have multiplied any row by a non-zero constant c, we can reverse this by multiplying the same row by the constant 1/c. Note now that each elementary matrix is obtained from In by an elementary row operation. The inverse of this elementary matrix is clearly the elementary matrix obtained from In by the elementary row operation that reverses the original elementary row operation. ©

Suppose that an n × n matrix B can be obtained from an n × n matrix A by a finite sequence of elementary row operations. Then since these elementary row operations can be reversed, the matrix A can be obtained from the matrix B by a finite sequence of elementary row operations.

Definition. An n × n matrix A is said to be row equivalent to an n × n matrix B if there exist a finite number of elementary n × n matrices E1, ..., Ek such that B = Ek...E1A.

Remark. Note that B = Ek...E1A implies that A = E1^{−1}...Ek^{−1}B. It follows that if A is row equivalent to B, then B is row equivalent to A. We usually say that A and B are row equivalent.

The following result gives conditions equivalent to the invertibility of an n × n matrix A.

PROPOSITION 2N. Suppose that

    A = ( a11 ... a1n )
        (  :        :  )
        ( an1 ... ann )

and that

    x = ( x1 )        and        0 = ( 0 )
        (  :  )                      ( : )
        ( xn )                       ( 0 )

are n × 1 matrices, where x1, ..., xn are variables.
(a) Suppose that the matrix A is invertible. Then the system Ax = 0 of linear equations has only the trivial solution.
(b) Suppose that the system Ax = 0 of linear equations has only the trivial solution. Then the matrices A and In are row equivalent.
(c) Suppose that the matrices A and In are row equivalent. Then A is invertible.

Proof. (a) Suppose that x0 is a solution of the system Ax = 0. Then since A is invertible, we have

x0 = Inx0 = (A−1A)x0 = A−1(Ax0) = A−10 = 0.

It follows that the trivial solution is the only solution.

(b) Note that if the system Ax = 0 of linear equations has only the trivial solution, then it can be reduced by elementary row operations to the system

    x1 = 0, ..., xn = 0.

This is equivalent to saying that the array

    ( a11 ... a1n | 0 )
    (  :        :  | : )
    ( an1 ... ann | 0 )

can be reduced by elementary row operations to the reduced row echelon form

    ( 1 ... 0 | 0 )
    ( :     : | : )
    ( 0 ... 1 | 0 )

Hence the matrices A and In are row equivalent.

(c) Suppose that the matrices A and In are row equivalent. Then there exist elementary n × n matrices E1, ..., Ek such that In = Ek...E1A. By Proposition 2M, the matrices E1, ..., Ek are all invertible, so that

    A = E1^{−1}...Ek^{−1}In = E1^{−1}...Ek^{−1}

is a product of invertible matrices, and is therefore itself invertible. ©

2.7. Consequences of Invertibility

Suppose that the matrix

    A = ( a11 ... a1n )
        (  :        :  )
        ( an1 ... ann )

is invertible. Consider the system Ax = b, where

    x = ( x1 )        and        b = ( b1 )
        (  :  )                      (  :  )
        ( xn )                       ( bn )

are n × 1 matrices, where x1, ..., xn are variables and b1, ..., bn ∈ R are arbitrary. Since A is invertible, let us consider x = A−1b. Clearly

    Ax = A(A−1b) = (AA−1)b = Inb = b,

so that x = A−1b is a solution of the system. On the other hand, let x0 be any solution of the system. Then Ax0 = b, so that

    x0 = Inx0 = (A−1A)x0 = A−1(Ax0) = A−1b.

It follows that the system has a unique solution. We have proved the following important result.

PROPOSITION 2P. Suppose that

    A = ( a11 ... a1n )
        (  :        :  )
        ( an1 ... ann )

and that

    x = ( x1 )        and        b = ( b1 )
        (  :  )                      (  :  )
        ( xn )                       ( bn )

are n × 1 matrices, where x1, ..., xn are variables and b1, ..., bn ∈ R are arbitrary. Suppose further that the matrix A is invertible. Then the system Ax = b of linear equations has the unique solution x = A−1b.


We next attempt to study the question in the opposite direction.

PROPOSITION 2Q. Suppose that

    A = ( a11 ... a1n )
        (  :        :  )
        ( an1 ... ann )

and that

    x = ( x1 )        and        b = ( b1 )
        (  :  )                      (  :  )
        ( xn )                       ( bn )

are n × 1 matrices, where x1, ..., xn are variables. Suppose further that for every b1, ..., bn ∈ R, the system Ax = b of linear equations is soluble. Then the matrix A is invertible.

Proof. Suppose that

    b1 = ( 1 )   ,  ...  ,   bn = ( 0 )
         ( 0 )                    ( 0 )
         ( : )                    ( : )
         ( 0 )                    ( 0 )
         ( 0 )                    ( 1 )

In other words, for every j = 1, ..., n, bj is an n × 1 matrix with entry 1 on row j and entry 0 elsewhere. Now let

    x1 = ( x11 )   ,  ...  ,   xn = ( x1n )
         (  :  )                    (  :  )
         ( xn1 )                    ( xnn )

denote respectively solutions of the systems of linear equations

    Ax = b1,  ...,  Ax = bn.

It is easy to check that

    A ( x1 ... xn ) = ( b1 ... bn );

in other words,

    A ( x11 ... x1n )
      (  :        :  ) = In,
      ( xn1 ... xnn )

so that A is invertible. ©

We can now summarize Propositions 2N, 2P and 2Q as follows.

PROPOSITION 2R. In the notation of Proposition 2N, the following four statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and In are row equivalent.
(d) The system Ax = b of linear equations is soluble for every n × 1 matrix b.


2.8. Application to Economics

In this section, we describe briefly the Leontief input-output model, where an economy is divided into n sectors.

For every i = 1, ..., n, let xi denote the monetary value of the total output of sector i over a fixed period, and let di denote the output of sector i needed to satisfy outside demand over the same fixed period. Collecting together xi and di for i = 1, ..., n, we obtain the vectors

    x = ( x1 ) ∈ R^n        and        d = ( d1 ) ∈ R^n
        (  :  )                            (  :  )
        ( xn )                             ( dn )

known respectively as the production vector and demand vector of the economy.

On the other hand, each of the n sectors requires material from some or all of the sectors to produce its output. For i, j = 1, ..., n, let cij denote the monetary value of the output of sector i needed by sector j to produce one unit of monetary value of output. For every j = 1, ..., n, the vector

    cj = ( c1j ) ∈ R^n
         (  :  )
         ( cnj )

is known as the unit consumption vector of sector j. Note that we require the column sum

    c1j + ... + cnj ≤ 1                                     (5)

in order to ensure that sector j does not make a loss. Collecting together the unit consumption vectors, we obtain the matrix

    C = ( c1 ... cn ) = ( c11 ... c1n )
                        (  :        :  )
                        ( cn1 ... cnn )

known as the consumption matrix of the economy.

Consider the matrix product

    Cx = ( c11x1 + ... + c1nxn )
         (          :          )
         ( cn1x1 + ... + cnnxn )

For every i = 1, ..., n, the entry ci1x1 + ... + cinxn represents the monetary value of the output of sector i needed by all the sectors to produce their output. This leads to the production equation

    x = Cx + d.                                             (6)

Here Cx represents the part of the total output that is required by the various sectors of the economy to produce the output in the first place, and d represents the part of the total output that is available to satisfy outside demand.

Clearly (I − C)x = d. If the matrix I − C is invertible, then

    x = (I − C)−1 d

represents the perfect production level. We state without proof the following fundamental result.


PROPOSITION 2S. Suppose that the entries of the consumption matrix C and the demand vector d are non-negative. Suppose further that the inequality (5) holds for each column of C. Then the inverse matrix (I − C)−1 exists, and the production vector x = (I − C)−1 d has non-negative entries and is the unique solution of the production equation (6).

Let us indulge in some heuristics. Initially, we have demand d. To produce d, we need Cd as input. To produce this extra Cd, we need C(Cd) = C^2 d as input. To produce this extra C^2 d, we need C(C^2 d) = C^3 d as input. And so on. Hence we need to produce

    d + Cd + C^2 d + C^3 d + ... = (I + C + C^2 + C^3 + ...)d

in total. Now it is not difficult to check that for every positive integer k, we have

    (I − C)(I + C + C^2 + C^3 + ... + C^k) = I − C^{k+1}.

If the entries of C^{k+1} are all very small, then

    (I − C)(I + C + C^2 + C^3 + ... + C^k) ≈ I,

so that

    (I − C)−1 ≈ I + C + C^2 + C^3 + ... + C^k.

This gives a practical way of approximating (I − C)−1, and also suggests that

    (I − C)−1 = I + C + C^2 + C^3 + ... .

Example 2.8.1. An economy consists of three sectors. Their dependence on each other is summarized in the table below:

                                                    to produce one unit of monetary
                                                      value of output in sector
                                                         1       2       3
    monetary value of output required from sector 1    0.3     0.2     0.1
    monetary value of output required from sector 2    0.4     0.5     0.2
    monetary value of output required from sector 3    0.1     0.1     0.3

Suppose that the final demand from sectors 1, 2 and 3 are respectively 30, 50 and 20. Then the production vector and demand vector are respectively

    x = ( x1 )        and        d = ( d1 )   ( 30 )
        ( x2 )                       ( d2 ) = ( 50 )
        ( x3 )                       ( d3 )   ( 20 )

while the consumption matrix is given by

    C = ( 0.3 0.2 0.1 )        so that        I − C = (  0.7 −0.2 −0.1 )
        ( 0.4 0.5 0.2 )                               ( −0.4  0.5 −0.2 )
        ( 0.1 0.1 0.3 )                               ( −0.1 −0.1  0.7 )

The production equation (I − C)x = d has augmented matrix

    (  0.7 −0.2 −0.1 | 30 )                       (  7 −2 −1 | 300 )
    ( −0.4  0.5 −0.2 | 50 ),   equivalent to      ( −4  5 −2 | 500 ),
    ( −0.1 −0.1  0.7 | 20 )                       ( −1 −1  7 | 200 )


which can be converted to reduced row echelon form

    ( 1 0 0 | 3200/27 )
    ( 0 1 0 | 6100/27 )
    ( 0 0 1 |   700/9 )

This gives x1 ≈ 119, x2 ≈ 226 and x3 ≈ 78, to the nearest integers.
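A minimal numerical check of this example, assuming Python with NumPy: we solve (I − C)x = d directly, and also compare with a truncated series I + C + C^2 + ... + C^k, which the heuristic above suggests should approximate (I − C)−1.

```python
import numpy as np

C = np.array([[0.3, 0.2, 0.1],
              [0.4, 0.5, 0.2],
              [0.1, 0.1, 0.3]])
d = np.array([30.0, 50.0, 20.0])

x = np.linalg.solve(np.eye(3) - C, d)
print(np.round(x))          # [119. 226.  78.], as computed above

# Truncated Neumann series approximation to (I - C)^{-1} applied to d.
approx, term = d.copy(), d.copy()
for _ in range(50):         # C^k d is negligible well before 50 terms
    term = C @ term
    approx += term
print(np.round(approx))     # agrees with the exact solution
```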

2.9. Matrix Transformation on the Plane

Let A be a 2 × 2 matrix with real entries. A matrix transformation T : R^2 → R^2 can be defined as follows: For every x = (x1, x2) ∈ R^2, we write T(x) = y, where y = (y1, y2) ∈ R^2 satisfies

    ( y1 ) = A ( x1 )
    ( y2 )     ( x2 )

Such a transformation is linear, in the sense that T(x′ + x′′) = T(x′) + T(x′′) for every x′, x′′ ∈ R^2 and T(cx) = cT(x) for every x ∈ R^2 and every c ∈ R. To see this, simply observe that

    A ( x′1 + x′′1 ) = A ( x′1 ) + A ( x′′1 )        and        A ( cx1 ) = cA ( x1 )
      ( x′2 + x′′2 )     ( x′2 )     ( x′′2 )                     ( cx2 )      ( x2 )

We shall study linear transformations in greater detail in Chapter 8. Here we confine ourselves to looking at a few simple matrix transformations on the plane.

Example 2.9.1. The matrix

    A = ( 1  0 )        satisfies        A ( x1 ) = (  x1 )
        ( 0 −1 )                           ( x2 )   ( −x2 )

for every (x1, x2) ∈ R^2, and so represents reflection across the x1-axis, whereas the matrix

    A = ( −1 0 )        satisfies        A ( x1 ) = ( −x1 )
        (  0 1 )                           ( x2 )   (  x2 )

for every (x1, x2) ∈ R^2, and so represents reflection across the x2-axis. On the other hand, the matrix

    A = ( −1  0 )       satisfies        A ( x1 ) = ( −x1 )
        (  0 −1 )                          ( x2 )   ( −x2 )

for every (x1, x2) ∈ R^2, and so represents reflection across the origin, whereas the matrix

    A = ( 0 1 )         satisfies        A ( x1 ) = ( x2 )
        ( 1 0 )                            ( x2 )   ( x1 )

for every (x1, x2) ∈ R^2, and so represents reflection across the line x1 = x2. We give a summary in the table below:

    Transformation                 Equations               Matrix

    Reflection across x1-axis      y1 = x1,  y2 = −x2      ( 1 0; 0 −1 )
    Reflection across x2-axis      y1 = −x1, y2 = x2       ( −1 0; 0 1 )
    Reflection across origin       y1 = −x1, y2 = −x2      ( −1 0; 0 −1 )
    Reflection across x1 = x2      y1 = x2,  y2 = x1       ( 0 1; 1 0 )


Example 2.9.2. Let k be a fixed positive real number. The matrix

    A = ( k 0 )         satisfies        A ( x1 ) = ( kx1 )
        ( 0 k )                            ( x2 )   ( kx2 )

for every (x1, x2) ∈ R^2, and so represents a dilation if k > 1 and a contraction if 0 < k < 1. On the other hand, the matrix

    A = ( k 0 )         satisfies        A ( x1 ) = ( kx1 )
        ( 0 1 )                            ( x2 )   (  x2 )

for every (x1, x2) ∈ R^2, and so represents an expansion in the x1-direction if k > 1 and a compression in the x1-direction if 0 < k < 1, whereas the matrix

    A = ( 1 0 )         satisfies        A ( x1 ) = (  x1 )
        ( 0 k )                            ( x2 )   ( kx2 )

for every (x1, x2) ∈ R^2, and so represents an expansion in the x2-direction if k > 1 and a compression in the x2-direction if 0 < k < 1. We give a summary in the table below:

    Transformation                                               Equations              Matrix

    Dilation or contraction by factor k > 0                      y1 = kx1, y2 = kx2     ( k 0; 0 k )
    Expansion or compression in x1-direction by factor k > 0     y1 = kx1, y2 = x2      ( k 0; 0 1 )
    Expansion or compression in x2-direction by factor k > 0     y1 = x1,  y2 = kx2     ( 1 0; 0 k )

Example 2.9.3. Let k be a fixed real number. The matrix

    A = ( 1 k )         satisfies        A ( x1 ) = ( x1 + kx2 )
        ( 0 1 )                            ( x2 )   (    x2    )

for every (x1, x2) ∈ R^2, and so represents a shear in the x1-direction. For the case k = 1, we have the following:

    [Diagram: a grid of points and its image under the shear with k = 1]

For the case k = −1, we have the following:

    [Diagram: a grid of points and its image under the shear with k = −1]


Similarly, the matrix

    A = ( 1 0 )         satisfies        A ( x1 ) = (    x1    )
        ( k 1 )                            ( x2 )   ( kx1 + x2 )

for every (x1, x2) ∈ R^2, and so represents a shear in the x2-direction. We give a summary in the table below:

    Transformation             Equations                     Matrix

    Shear in x1-direction      y1 = x1 + kx2,  y2 = x2       ( 1 k; 0 1 )
    Shear in x2-direction      y1 = x1,  y2 = kx1 + x2       ( 1 0; k 1 )

Example 2.9.4. For anticlockwise rotation by an angle θ, we have T(x1, x2) = (y1, y2), where

    y1 + iy2 = (x1 + ix2)(cos θ + i sin θ),

and so

    ( y1 ) = ( cos θ  −sin θ ) ( x1 )
    ( y2 )   ( sin θ   cos θ ) ( x2 )

It follows that the matrix in question is given by

    A = ( cos θ  −sin θ )
        ( sin θ   cos θ )

We give a summary in the table below:

    Transformation                        Equations                                           Matrix

    Anticlockwise rotation by angle θ     y1 = x1 cos θ − x2 sin θ, y2 = x1 sin θ + x2 cos θ  ( cos θ −sin θ; sin θ cos θ )
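A minimal sketch, assuming Python with NumPy, that applies a few of the matrices from this section to a sample point and checks the rotation matrix against the complex-number derivation above; the names are illustrative only.

```python
import numpy as np

def rotation(theta):
    """Anticlockwise rotation by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

reflect_x1 = np.array([[1, 0], [0, -1]])   # reflection across the x1-axis
shear_x1   = np.array([[1, 2], [0, 1]])    # shear in the x1-direction, k = 2

x = np.array([3.0, 1.0])
print(reflect_x1 @ x)                      # [ 3. -1.]
print(shear_x1 @ x)                        # [ 5.  1.]

# Check rotation against (x1 + i x2)(cos t + i sin t) for a sample angle.
t = np.pi / 6
z = complex(*x) * complex(np.cos(t), np.sin(t))
assert np.allclose(rotation(t) @ x, [z.real, z.imag])
```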

We conclude this section by establishing the following result which reinforces the linearity of matrix transformations on the plane.

PROPOSITION 2T. Suppose that a matrix transformation T : R^2 → R^2 is given by an invertible matrix A. Then
(a) the image under T of a straight line is a straight line;
(b) the image under T of a straight line through the origin is a straight line through the origin; and
(c) the images under T of parallel straight lines are parallel straight lines.

Proof. Suppose that T(x1, x2) = (y1, y2). Since A is invertible, we have x = A−1y, where

    x = ( x1 )        and        y = ( y1 )
        ( x2 )                       ( y2 )

The equation of a straight line is given by αx1 + βx2 = γ or, in matrix form, by

    ( α β ) ( x1 ) = ( γ ).
            ( x2 )


Hence

    ( α β ) A−1 ( y1 ) = ( γ ).
                ( y2 )

Let

    ( α′ β′ ) = ( α β ) A−1.

Then

    ( α′ β′ ) ( y1 ) = ( γ ).
              ( y2 )

In other words, the image under T of the straight line αx1 + βx2 = γ is α′y1 + β′y2 = γ, clearly another straight line. This proves (a). To prove (b), note that straight lines through the origin correspond to γ = 0. To prove (c), note that parallel straight lines correspond to different values of γ for the same values of α and β. ©

2.10. Application to Computer Graphics

Example 2.10.1. Consider the letter M in the diagram below:

    [Diagram: the letter M drawn on the coordinate plane with vertices on integer points]

Following the boundary in the anticlockwise direction starting at the origin, the 12 vertices can be represented by the coordinates

    ( 0 )  ( 1 )  ( 1 )  ( 4 )  ( 7 )  ( 7 )  ( 8 )  ( 8 )  ( 7 )  ( 4 )  ( 1 )  ( 0 )
    ( 0 ), ( 0 ), ( 6 ), ( 0 ), ( 6 ), ( 0 ), ( 0 ), ( 8 ), ( 8 ), ( 2 ), ( 8 ), ( 8 ).

Let us apply a matrix transformation to these vertices, using the matrix

    A = ( 1 1/2 )
        ( 0  1  )

representing a shear in the x1-direction with factor 0.5, so that

    A ( x1 ) = ( x1 + x2/2 )
      ( x2 )   (     x2    )

for every (x1, x2) ∈ R^2.


Then the images of the 12 vertices are respectively

    ( 0 )  ( 1 )  ( 4 )  ( 4 )  ( 10 )  ( 7 )  ( 8 )  ( 12 )  ( 11 )  ( 5 )  ( 5 )  ( 4 )
    ( 0 ), ( 0 ), ( 6 ), ( 0 ), (  6 ), ( 0 ), ( 0 ), (  8 ), (  8 ), ( 2 ), ( 8 ), ( 8 ),

noting that

    ( 1 1/2 ) ( 0 1 1 4 7 7 8 8 7 4 1 0 )   ( 0 1 4 4 10 7 8 12 11 5 5 4 )
    ( 0  1  ) ( 0 0 6 0 6 0 0 8 8 2 8 8 ) = ( 0 0 6 0  6 0 0  8  8 2 8 8 )

In view of Proposition 2T, the image of any line segment that joins two vertices is a line segment that joins the images of the two vertices. Hence the image of the letter M under the shear looks like the following:

    [Diagram: the sheared image of the letter M]

Next, we may wish to translate this image. However, a translation by vector h = (h1, h2) ∈ R^2 is a transformation of the form

    ( y1 ) = ( x1 ) + ( h1 )        for every (x1, x2) ∈ R^2,
    ( y2 )   ( x2 )   ( h2 )

and this cannot be described by a matrix transformation on the plane. To overcome this deficiency, we introduce homogeneous coordinates. For every point (x1, x2) ∈ R^2, we identify it with the point (x1, x2, 1) ∈ R^3. Now we wish to translate a point (x1, x2) to (x1, x2) + (h1, h2) = (x1 + h1, x2 + h2), so we attempt to find a 3 × 3 matrix A* such that

    ( x1 + h1 )      ( x1 )
    ( x2 + h2 ) = A* ( x2 )        for every (x1, x2) ∈ R^2.
    (    1    )      (  1 )

It is easy to check that

    ( x1 + h1 )   ( 1 0 h1 ) ( x1 )
    ( x2 + h2 ) = ( 0 1 h2 ) ( x2 )        for every (x1, x2) ∈ R^2.
    (    1    )   ( 0 0  1 ) (  1 )

It follows that using homogeneous coordinates, translation by vector h = (h1, h2) ∈ R^2 can be described by the matrix

    A* = ( 1 0 h1 )
         ( 0 1 h2 )
         ( 0 0  1 )


Remark. Consider a matrix transformation T : R^2 → R^2 on the plane given by a matrix

    A = ( a11 a12 )
        ( a21 a22 )

Suppose that T(x1, x2) = (y1, y2). Then

    ( y1 ) = A ( x1 ) = ( a11 a12 ) ( x1 )
    ( y2 )     ( x2 )   ( a21 a22 ) ( x2 )

Under homogeneous coordinates, the image of the point (x1, x2, 1) is now (y1, y2, 1). Note that

    ( y1 )   ( a11 a12 0 ) ( x1 )
    ( y2 ) = ( a21 a22 0 ) ( x2 )
    (  1 )   (  0   0  1 ) (  1 )

It follows that homogeneous coordinates can also be used to study all the matrix transformations we have discussed in Section 2.9. By moving over to homogeneous coordinates, we simply replace the 2 × 2 matrix A by the 3 × 3 matrix

    A* = ( A 0 )
         ( 0 1 )

Example 2.10.2. Returning to Example 2.10.1 of the letter M, the 12 vertices are now represented by homogeneous coordinates, put in an array in the form

    ( 0 1 1 4 7 7 8 8 7 4 1 0 )
    ( 0 0 6 0 6 0 0 8 8 2 8 8 )
    ( 1 1 1 1 1 1 1 1 1 1 1 1 )

Then the 2 × 2 matrix

    A = ( 1 1/2 )
        ( 0  1  )

is now replaced by the 3 × 3 matrix

    A* = ( 1 1/2 0 )
         ( 0  1  0 )
         ( 0  0  1 )

Note that

    A* ( 0 1 1 4 7 7 8 8 7 4 1 0 )   ( 0 1 4 4 10 7 8 12 11 5 5 4 )
       ( 0 0 6 0 6 0 0 8 8 2 8 8 ) = ( 0 0 6 0  6 0 0  8  8 2 8 8 )
       ( 1 1 1 1 1 1 1 1 1 1 1 1 )   ( 1 1 1 1  1 1 1  1  1 1 1 1 )

Next, let us consider a translation by the vector (2, 3). The matrix under homogeneous coordinates for this translation is given by

    B* = ( 1 0 2 )
         ( 0 1 3 )
         ( 0 0 1 )


Note that

    B*A* ( 0 1 1 4 7 7 8 8 7 4 1 0 )   ( 2 3 6 6 12 9 10 14 13 7  7  6 )
         ( 0 0 6 0 6 0 0 8 8 2 8 8 ) = ( 3 3 9 3  9 3  3 11 11 5 11 11 )
         ( 1 1 1 1 1 1 1 1 1 1 1 1 )   ( 1 1 1 1  1 1  1  1  1 1  1  1 )

giving rise to coordinates in R^2, displayed as an array

    ( 2 3 6 6 12 9 10 14 13 7  7  6 )
    ( 3 3 9 3  9 3  3 11 11 5 11 11 )

Hence the image of the letter M under the shear followed by translation looks like the following:

    [Diagram: the image of the letter M under the shear followed by the translation]

Example 2.10.3. Under homogeneous coordinates, the transformation representing a reflection across the x1-axis, followed by a shear by factor 2 in the x1-direction, followed by anticlockwise rotation by 90°, and followed by translation by vector (2, −1), has matrix

    ( 1 0  2 ) ( 0 −1 0 ) ( 1 2 0 ) ( 1  0 0 )   ( 0  1  2 )
    ( 0 1 −1 ) ( 1  0 0 ) ( 0 1 0 ) ( 0 −1 0 ) = ( 1 −2 −1 )
    ( 0 0  1 ) ( 0  0 1 ) ( 0 0 1 ) ( 0  0 1 )   ( 0  0  1 )
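A minimal sketch, assuming NumPy, verifying the composite matrix of Example 2.10.3: each factor is written in homogeneous coordinates and the product is taken right-to-left, reflection first. The helper name is ours.

```python
import numpy as np

def homogeneous(A2, h=(0, 0)):
    """Embed a 2x2 matrix A2 and a translation vector h into a 3x3 matrix."""
    M = np.eye(3)
    M[:2, :2] = A2
    M[:2, 2] = h
    return M

reflect   = homogeneous([[1, 0], [0, -1]])          # reflection across x1-axis
shear     = homogeneous([[1, 2], [0, 1]])           # shear, factor 2, x1-direction
rotate    = homogeneous([[0, -1], [1, 0]])          # anticlockwise rotation by 90 degrees
translate = homogeneous([[1, 0], [0, 1]], h=(2, -1))

M = translate @ rotate @ shear @ reflect            # rightmost factor acts first
print(M)   # [[0, 1, 2], [1, -2, -1], [0, 0, 1]], as in Example 2.10.3
```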

2.11. Complexity of a Non-Homogeneous System

Consider the problem of solving a system of linear equations of the form Ax = b, where A is an n × n invertible matrix. We are interested in the number of operations required to solve such a system. By an operation, we mean interchanging, adding or multiplying two real numbers.


One way of solving the system Ax = b is to write down the augmented matrix

    ( a11 ... a1n | b1 )
    (  :        :  |  : )                                   (7)
    ( an1 ... ann | bn )

and then convert it to reduced row echelon form by elementary row operations.

The first step is to reduce it to row echelon form:

(I) First of all, we may need to interchange two rows in order to ensure that the top left entry in the array is non-zero. This requires n + 1 operations.

(II) Next, we need to multiply the new first row by a constant in order to make the top left pivot entry equal to 1. This requires n + 1 operations, and the array now looks like

    (  1  a12 ... a1n | b1 )
    ( a21 a22 ... a2n | b2 )
    (  :   :       :  |  : )
    ( an1 an2 ... ann | bn )

Note that we are abusing notation somewhat, as the entry a12 here, for example, may well be different from the entry a12 in the augmented matrix (7).

(III) For each row i = 2, ..., n, we now multiply the first row by −ai1 and then add to row i. This requires 2(n − 1)(n + 1) operations, and the array now looks like

    ( 1 a12 ... a1n | b1 )
    ( 0 a22 ... a2n | b2 )                                  (8)
    ( :  :       :  |  : )
    ( 0 an2 ... ann | bn )

(IV) In summary, to proceed from the form (7) to the form (8), the number of operations required is at most 2(n + 1) + 2(n − 1)(n + 1) = 2n(n + 1).

(V) Our next task is to convert the smaller array

    ( a22 ... a2n | b2 )
    (  :        :  |  : )
    ( an2 ... ann | bn )

to an array that looks like

    ( 1 a23 ... a2n | b2 )
    ( 0 a33 ... a3n | b3 )
    ( :  :       :  |  : )
    ( 0 an3 ... ann | bn )

These have one row and one column fewer than the arrays (7) and (8), and the number of operations required is at most 2m(m + 1), where m = n − 1. We continue in this way systematically to reach row echelon form, and conclude that the number of operations required to convert the augmented matrix (7) to row echelon form is at most

    Σ_{m=1}^{n} 2m(m + 1) ≈ (2/3)n^3.


The next step is to convert the row echelon form to reduced row echelon form. This is simpler, as many entries are now zero. It can be shown that the number of operations required is bounded by something like 2n^2 – indeed, by something like n^2 if one analyzes the problem more carefully. In any case, these estimates are insignificant compared to the estimate (2/3)n^3 earlier.

We therefore conclude that the number of operations required to solve the system Ax = b by reducing the augmented matrix to reduced row echelon form is bounded by something like (2/3)n^3 when n is large.

Another way of solving the system Ax = b is to first find the inverse matrix A−1. This may involve converting the array

    ( a11 ... a1n | 1        )
    (  :        :  |   ·.    )
    ( an1 ... ann |        1 )

to reduced row echelon form by elementary row operations. It can be shown that the number of operations required is something like 2n^3, so this is less efficient than our first method.

2.12. Matrix Factorization

In some situations, we may need to solve systems of linear equations of the form Ax = b, with the same coefficient matrix A but for many different vectors b. If A is an invertible square matrix, then we can find its inverse A−1 and then compute A−1b for each vector b. However, the matrix A may not be a square matrix, and we may have to convert the augmented matrix to reduced row echelon form.

In this section, we describe a more efficient way of solving this problem. To describe this, we first need a definition.

Definition. A rectangular array of numbers is said to be in quasi row echelon form if the following conditions are satisfied:
(1) The left-most non-zero entry of any non-zero row is called a pivot entry. It is not necessary for its value to be equal to 1.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of a non-zero row occurring higher in the array.

In other words, the array looks like row echelon form in shape, except that the pivot entries do not have to be equal to 1.

We consider first of all a special case.

PROPOSITION 2U. Suppose that an m × n matrix A can be converted to quasi row echelon form by elementary row operations but without interchanging any two rows. Then A = LU, where L is an m × m lower triangular matrix with diagonal entries all equal to 1 and U is a quasi row echelon form of A.

Sketch of Proof. Recall that applying an elementary row operation to an m × n matrix corresponds to multiplying the matrix on the left by an elementary m × m matrix. On the other hand, if we are aiming for quasi row echelon form and not row echelon form, then there is no need to multiply any row of the array by a non-zero constant. Hence the only elementary row operation we need to perform is to add a multiple of one row to another row. In fact, it is sufficient even to restrict this to adding a multiple of a row higher in the array to another row lower in the array, and it is easy to see that the corresponding elementary matrix is lower triangular, with diagonal entries all equal to 1. Let us call such elementary matrices unit lower triangular. If an m × n matrix A can be reduced in this way to quasi row echelon form U, then

    U = Ek ... E2 E1 A,

where the elementary matrices E1, E2, ..., Ek are all unit lower triangular. Let L = (Ek ... E2 E1)−1. Then A = LU. It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. Hence L is a unit lower triangular matrix as required. ©

If Ax = b and A = LU , then L(Ux) = b. Writing y = Ux, we have

Ly = b and Ux = y.

It follows that the problem of solving the system Ax = b corresponds to first solving the system Ly = b and then solving the system Ux = y. Both of these systems are easy to solve since both L and U have many zero entries. It remains to find L and U.

If we reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array, then U can be taken as the quasi row echelon form resulting from this. It remains to find L. However, note that L = (Ek ... E2 E1)−1, where U = Ek ... E2 E1 A, and so

    I = Ek ... E2 E1 L.

This means that the very elementary row operations that convert A to U will convert L to I. We therefore wish to create a matrix L such that this is satisfied. It is simplest to illustrate the technique by an example.

Example 2.12.1. Consider the matrix

    A = ( 2  −1   2  −2  3 )
        ( 4   1   6  −5  8 )
        ( 2 −10  −4   8 −5 )
        ( 2 −13  −6  16 −5 )

The entry 2 in row 1 and column 1 is a pivot entry, and column 1 is a pivot column. Adding −2 times row 1 to row 2, adding −1 times row 1 to row 3, and adding −1 times row 1 to row 4, we obtain

    ( 2  −1  2  −2  3 )
    ( 0   3  2  −1  2 )
    ( 0  −9 −6  10 −8 )
    ( 0 −12 −8  18 −8 )

Note that the same three elementary row operations convert

    ( 1 0 0 0 )        ( 1 0 0 0 )
    ( 2 1 0 0 )   to   ( 0 1 0 0 )
    ( 1 * 1 0 )        ( 0 * 1 0 )
    ( 1 * * 1 )        ( 0 * * 1 )

Next, the entry 3 in row 2 and column 2 is a pivot entry, and column 2 is a pivot column. Adding 3 times row 2 to row 3, and adding 4 times row 2 to row 4, we obtain

    ( 2 −1 2 −2  3 )
    ( 0  3 2 −1  2 )
    ( 0  0 0  7 −2 )
    ( 0  0 0 14  0 )


Note that the same two elementary row operations convert1 0 0 00 1 0 00 −3 1 00 −4 ∗ 1

to

1 0 0 00 1 0 00 0 1 00 0 ∗ 1

.

Next, the entry 7 in row 3 and column 4 is a pivot entry, and column 4 is a pivot column. Adding −2times row 3 to row 4, we obtain the quasi row echelon form

U =

2 −1 2 −2 30 3 2 −1 20 0 0 7 −20 0 0 0 4

,

where the entry 4 in row 4 and column 5 is a pivot entry, and column 5 is a pivot column. Note that the same elementary row operation converts

[ 1  0  0  0 ]        [ 1  0  0  0 ]
[ 0  1  0  0 ]   to   [ 0  1  0  0 ]
[ 0  0  1  0 ]        [ 0  0  1  0 ]
[ 0  0  2  1 ]        [ 0  0  0  1 ].

Now observe that if we take

L = [ 1   0  0  0 ]
    [ 2   1  0  0 ]
    [ 1  −3  1  0 ]
    [ 1  −4  2  1 ],

then L can be converted to I4 by the same elementary row operations that convert A to U.

The strategy is now clear. Every time we find a new pivot, we note its value and the entries below it. The lower triangular entries of L are formed by these columns, with each column divided by the value of the pivot entry in that column.

Example 2.12.2. Let us examine our last example again. The pivot columns at the time of establishing the pivot entries are respectively

[ 2 ]   [   ∗ ]   [  ∗ ]   [ ∗ ]
[ 4 ]   [   3 ]   [  ∗ ]   [ ∗ ]
[ 2 ]   [  −9 ]   [  7 ]   [ ∗ ]
[ 2 ]   [ −12 ]   [ 14 ]   [ 4 ].

Dividing them respectively by the pivot entries 2, 3, 7 and 4, we obtain respectively the columns

[ 1 ]   [  ∗ ]   [ ∗ ]   [ ∗ ]
[ 2 ]   [  1 ]   [ ∗ ]   [ ∗ ]
[ 1 ]   [ −3 ]   [ 1 ]   [ ∗ ]
[ 1 ]   [ −4 ]   [ 2 ]   [ 1 ].

Note that the lower triangular entries of the matrix

L = [ 1   0  0  0 ]
    [ 2   1  0  0 ]
    [ 1  −3  1  0 ]
    [ 1  −4  2  1 ]

correspond precisely to the entries in these columns.


LU FACTORIZATION ALGORITHM.
(1) Reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array. Let U be the quasi row echelon form obtained.
(2) Record any new pivot column at the time of its first recognition, and modify it by replacing any entry above the pivot entry by zero and dividing every other entry by the value of the pivot entry.
(3) Let L denote the square matrix obtained by letting the columns be the pivot columns as modified in step (2).
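The algorithm translates readily into code. The following is a minimal Python sketch under the same assumption as Proposition 2U, namely that A can be reduced without interchanging rows; the function name lu_factorize is our own choice and not from any standard library.

    def lu_factorize(A):
        # A is a list of m rows; it is assumed reducible to quasi row
        # echelon form without row interchanges (Proposition 2U).
        m, n = len(A), len(A[0])
        U = [row[:] for row in A]          # working copy, becomes U
        L = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
        r, c = 0, 0
        while r < m and c < n:
            if U[r][c] == 0:               # not a pivot column; move right
                c += 1
                continue
            for i in range(r + 1, m):
                L[i][r] = U[i][c] / U[r][c]        # step (2): entry / pivot
                for j in range(c, n):              # step (1): row operation
                    U[i][j] -= L[i][r] * U[r][j]
            r, c = r + 1, c + 1
        return L, U

For the matrix of Example 2.12.1, lu_factorize returns precisely the matrices L and U found there.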

Example 2.12.3. We wish to solve the system of linear equations Ax = b, where

A = [  3  −1    2   −4   1 ]
    [ −3   3   −5    5  −2 ]
    [  6  −4   11  −10   6 ]
    [ −6   8  −21   13  −9 ]

and

b = [   1 ]
    [  −2 ]
    [   9 ]
    [ −15 ].

Let us first apply LU factorization to the matrix A. The first pivot column is column 1, with modified version

[  1 ]
[ −1 ]
[  2 ]
[ −2 ].

Adding row 1 to row 2, adding −2 times row 1 to row 3, and adding 2 times row 1 to row 4, we obtain

[ 3  −1    2  −4   1 ]
[ 0   2   −3   1  −1 ]
[ 0  −2    7  −2   4 ]
[ 0   6  −17   5  −7 ].

The second pivot column is column 2, with modified version

[  0 ]
[  1 ]
[ −1 ]
[  3 ].

Adding row 2 to row 3, and adding −3 times row 2 to row 4, we obtain

[ 3  −1   2  −4   1 ]
[ 0   2  −3   1  −1 ]
[ 0   0   4  −1   3 ]
[ 0   0  −8   2  −4 ].

The third pivot column is column 3, with modified version

[  0 ]
[  0 ]
[  1 ]
[ −2 ].

Adding 2 times row 3 to row 4, we obtain the quasi row echelon form

[ 3  −1   2  −4   1 ]
[ 0   2  −3   1  −1 ]
[ 0   0   4  −1   3 ]
[ 0   0   0   0   2 ].


The last pivot column is column 5, with modified version

[ 0 ]
[ 0 ]
[ 0 ]
[ 1 ].

It follows that

L = [  1   0   0  0 ]
    [ −1   1   0  0 ]
    [  2  −1   1  0 ]
    [ −2   3  −2  1 ]

and

U = [ 3  −1   2  −4   1 ]
    [ 0   2  −3   1  −1 ]
    [ 0   0   4  −1   3 ]
    [ 0   0   0   0   2 ].

We now consider the system Ly = b, with augmented matrix

[  1   0   0  0 |   1 ]
[ −1   1   0  0 |  −2 ]
[  2  −1   1  0 |   9 ]
[ −2   3  −2  1 | −15 ].

Using row 1, we obtain y1 = 1. Using row 2, we obtain y2 − y1 = −2, so that y2 = −1. Using row 3, we obtain y3 + 2y1 − y2 = 9, so that y3 = 6. Using row 4, we obtain y4 − 2y1 + 3y2 − 2y3 = −15, so that y4 = 2. Hence

y = [  1 ]
    [ −1 ]
    [  6 ]
    [  2 ].

We next consider the system Ux = y, with augmented matrix

[ 3  −1   2  −4   1 |  1 ]
[ 0   2  −3   1  −1 | −1 ]
[ 0   0   4  −1   3 |  6 ]
[ 0   0   0   0   2 |  2 ].

Here the free variable is x4. Let x4 = t. Using row 4, we obtain 2x5 = 2, so that x5 = 1. Using row 3, we obtain 4x3 = 6 + x4 − 3x5 = 3 + t, so that x3 = 3/4 + t/4. Using row 2, we obtain

2x2 = −1 + 3x3 − x4 + x5 = 9/4 − t/4,

so that x2 = 9/8 − t/8. Using row 1, we obtain 3x1 = 1 + x2 − 2x3 + 4x4 − x5 = 27t/8 − 3/8, so that x1 = 9t/8 − 1/8. Hence

(x1, x2, x3, x4, x5) = ( (9t − 1)/8, (9 − t)/8, (3 + t)/4, t, 1 ), where t ∈ R.

Remarks. (1) In practical situations, interchanging rows is usually necessary to convert a matrix A to quasi row echelon form. The technique here can be modified to produce a matrix L which is not unit lower triangular, but which can be made unit lower triangular by interchanging rows.

(2) Computing an LU factorization of an n × n matrix takes approximately (2/3)n^3 operations. Solving the systems Ly = b and Ux = y requires approximately 2n^2 operations.

(3) LU factorization is particularly efficient when the matrix A has many zero entries, in which case the matrices L and U may also have many zero entries.


2.13. Application to Games of Strategy

Consider a game with two players. Player R, usually known as the row player, has m possible moves, denoted by i = 1, 2, 3, . . . , m, while player C, usually known as the column player, has n possible moves, denoted by j = 1, 2, 3, . . . , n. For every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, let aij denote the payoff that player C has to make to player R if player R makes move i and player C makes move j. These numbers give rise to the payoff matrix

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ am1  . . .  amn ].

The entries can be positive, negative or zero.

Suppose that for every i = 1, 2, 3, . . . , m, player R makes move i with probability pi, and that for every j = 1, 2, 3, . . . , n, player C makes move j with probability qj. Then

p1 + . . .+ pm = 1 and q1 + . . .+ qn = 1.

Assume that the players make moves independently of each other. Then for every i = 1, 2, 3, . . . , m and j = 1, 2, 3, . . . , n, the number piqj represents the probability that player R makes move i and player C makes move j. Then the double sum

EA(p, q) = Σ_{i=1}^{m} Σ_{j=1}^{n} aij pi qj

represents the expected payoff that player C has to make to player R.

The matrices

p = ( p1 . . . pm )    and    q = [ q1 ]
                                  [ ... ]
                                  [ qn ]

are known as the strategies of player R and player C respectively. Clearly the expected payoff

EA(p, q) = Σ_{i=1}^{m} Σ_{j=1}^{n} aij pi qj = pAq,

the product of the row matrix p, the payoff matrix A and the column matrix q. Here we have slightly abused notation: the right hand side is a 1 × 1 matrix!

We now consider the following problem: Suppose that A is fixed. Is it possible for player R to choose a strategy p to try to maximize the expected payoff EA(p, q)? Is it possible for player C to choose a strategy q to try to minimize the expected payoff EA(p, q)?

FUNDAMENTAL THEOREM OF ZERO SUM GAMES. There exist strategies p∗ and q∗ such that

EA(p∗,q) ≥ EA(p∗,q∗) ≥ EA(p,q∗)

for every strategy p of player R and every strategy q of player C.

Remark. The strategy p∗ is known as an optimal strategy for player R, and the strategy q∗ is known as an optimal strategy for player C. The quantity EA(p∗, q∗) is known as the value of the game. Optimal strategies are not necessarily unique. However, if p∗∗ and q∗∗ are another pair of optimal strategies, then EA(p∗, q∗) = EA(p∗∗, q∗∗).


Zero sum games which are strictly determined are very easy to analyze. Here the payoff matrix A contains saddle points. An entry aij in the payoff matrix A is called a saddle point if it is a least entry in its row and a greatest entry in its column. In this case, the strategies

p∗ = ( 0 . . . 0 1 0 . . . 0 )    and    q∗ = [  0  ]
                                              [ ... ]
                                              [  0  ]
                                              [  1  ]
                                              [  0  ]
                                              [ ... ]
                                              [  0  ],

where the 1's occur in position i in p∗ and position j in q∗, are optimal strategies, so that the value of the game is aij.

Remark. It is very easy to show that different saddle points in the payoff matrix have the same value.
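In computational terms, locating saddle points is a direct search over the entries. Here is a small Python sketch, written for illustration only; the function name saddle_points is our own.

    def saddle_points(A):
        # Return all (i, j) with A[i][j] a least entry in row i and a
        # greatest entry in column j (0-based indices).
        points = []
        for i, row in enumerate(A):
            for j, entry in enumerate(row):
                column = [A[k][j] for k in range(len(A))]
                if entry == min(row) and entry == max(column):
                    points.append((i, j))
        return points

Applied to the payoff matrix of Example 2.13.1 below, this returns [(0, 2)], corresponding to the entry −5 in row 1 and column 3.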

Example 2.13.1. In some sports mad school, the teachers require 100 students to each choose between rowing (R) and cricket (C). However, the students cannot make up their mind, and will only decide when the identities of the rowing coach and cricket coach are known. There are 3 possible rowing coaches and 4 possible cricket coaches the school can hire. The number of students who will choose rowing ahead of cricket in each scenario is as follows, where R1, R2 and R3 denote the 3 possible rowing coaches, and C1, C2, C3 and C4 denote the 4 possible cricket coaches:

C1 C2 C3 C4

R1 75 50 45 60

R2 20 60 30 55

R3 45 70 35 30

[For example, if coaches R2 and C1 are hired, then 20 students will choose rowing, and so 80 students will choose cricket.] We first reset the problem by subtracting 50 from each entry and create a payoff matrix

A = [  25   0   −5   10 ]
    [ −30  10  −20    5 ]
    [  −5  20  −15  −20 ].

[For example, the top left entry denotes that if each sport starts with 50 students, then 25 is the number cricket concedes to rowing.] Here the entry −5 in row 1 and column 3 is a saddle point, so the optimal strategy for rowing is to use coach R1 and the optimal strategy for cricket is to use coach C3.

In general, saddle points may not exist, so that the problem is not strictly determined. Such optimization problems are then solved by linear programming techniques which we do not discuss here. However, in the case of 2 × 2 payoff matrices

A = [ a11  a12 ]
    [ a21  a22 ]

which do not contain saddle points, we can write p2 = 1 − p1 and q2 = 1 − q1. Then

EA(p, q) = a11p1q1 + a12p1(1 − q1) + a21(1 − p1)q1 + a22(1 − p1)(1 − q1)
         = ((a11 − a12 − a21 + a22)p1 − (a22 − a21))q1 + (a12 − a22)p1 + a22.


Let

p1 = p∗1 = (a22 − a21)/(a11 − a12 − a21 + a22).

Then

EA(p∗, q) = (a12 − a22)(a22 − a21)/(a11 − a12 − a21 + a22) + a22 = (a11a22 − a12a21)/(a11 − a12 − a21 + a22),

which is independent of q. Similarly, if

q1 = q∗1 = (a22 − a12)/(a11 − a12 − a21 + a22),

then

EA(p, q∗) = (a11a22 − a12a21)/(a11 − a12 − a21 + a22),

which is independent of p. Hence

EA(p∗, q) = EA(p∗, q∗) = EA(p, q∗) for all strategies p and q.

Note that

p∗ = ( (a22 − a21)/(a11 − a12 − a21 + a22)   (a11 − a12)/(a11 − a12 − a21 + a22) )    (9)

and

q∗ = [ (a22 − a12)/(a11 − a12 − a21 + a22) ]
     [ (a11 − a21)/(a11 − a12 − a21 + a22) ],    (10)

with value

EA(p∗, q∗) = (a11a22 − a12a21)/(a11 − a12 − a21 + a22).
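The closed formulas (9) and (10) are easy to implement. The following Python sketch, ours and purely illustrative, assumes a 2 × 2 payoff matrix with no saddle point, so that the denominator a11 − a12 − a21 + a22 is non-zero.

    def optimal_2x2(a11, a12, a21, a22):
        # Formulas (9) and (10) for a 2 x 2 game without saddle points.
        d = a11 - a12 - a21 + a22
        p_star = ((a22 - a21) / d, (a11 - a12) / d)    # row player
        q_star = ((a22 - a12) / d, (a11 - a21) / d)    # column player
        value = (a11 * a22 - a12 * a21) / d
        return p_star, q_star, value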


Problems for Chapter 2

1. Consider the four matrices

A = [ 2  5 ]
    [ 1  4 ]
    [ 2  1 ],

B = [ 1  7  2  9 ]
    [ 9  2  7  1 ],

C = [ 1  0  4 ]
    [ 2  1  3 ]
    [ 1  1  5 ]
    [ 3  2  1 ],

D = [ 1  0  7 ]
    [ 2  1  2 ]
    [ 1  3  0 ].

Calculate all possible products.

2. In each of the following cases, determine whether the products AB and BA are both defined; if so, determine also whether AB and BA have the same number of rows and the same number of columns; if so, determine also whether AB = BA:

a) A = [ 0  3 ]    and    B = [ 2  −1 ]
       [ 4  5 ]               [ 3   2 ]

b) A = [ 1  −1  5 ]    and    B = [ 2  1 ]
       [ 3   0  4 ]               [ 3  6 ]
                                  [ 1  5 ]

c) A = [ 2  −1 ]    and    B = [  1  −4 ]
       [ 3   2 ]               [ 12   1 ]

d) A = [  3   1  −4 ]    and    B = [ 2  0   0 ]
       [ −2   0   5 ]               [ 0  5   0 ]
       [  1  −2   3 ]               [ 0  0  −1 ]

3. Evaluate A2, where

A = [ 2  −5 ]
    [ 3   1 ],

and find α, β, γ ∈ R, not all zero, such that the matrix αI + βA + γA2 is the zero matrix.

4. a) Let

A = [ 6  −4 ]
    [ 9  −6 ].

Show that A2 is the zero matrix.
b) Find all 2 × 2 matrices

B = [ α  β ]
    [ γ  δ ]

such that B2 is the zero matrix.

5. Prove that if A and B are matrices such that I − AB is invertible, then the inverse of I − BA is given by the formula (I − BA)−1 = I + B(I − AB)−1A.
[Hint: Write C = (I − AB)−1. Then show that (I − BA)(I + BCA) = I.]

6. For each of the matrices below, use elementary row operations to find its inverse, if the inverse exists:

a) [ 1   1  1 ]      b) [ 1  2  −2 ]      c) [ 1   5  2 ]
   [ 1  −1  1 ]         [ 1  5   3 ]         [ 1   1  7 ]
   [ 0   0  1 ]         [ 2  6  −1 ]         [ 0  −3  4 ]

d) [ 2  3  4 ]      e) [ 1  a  b+c ]
   [ 3  4  2 ]         [ 1  b  a+c ]
   [ 2  3  3 ]         [ 1  c  a+b ]


7. a) Using elementary row operations, show that the inverse of

[ 2  5  8  5 ]        [  3  −2   1  −5 ]
[ 1  2  3  1 ]   is   [ −2   5  −2   3 ]
[ 2  4  7  2 ]        [  0  −2   1   0 ]
[ 1  3  5  3 ]        [  1  −1   0  −1 ].

b) Without performing any further elementary row operations, use part (a) to solve the system of linear equations

2x1 + 5x2 + 8x3 + 5x4 = 0,
 x1 + 2x2 + 3x3 +  x4 = 1,
2x1 + 4x2 + 7x3 + 2x4 = 0,
 x1 + 3x2 + 5x3 + 3x4 = 1.

8. Consider the matrix

A = [ 1  0  3  1 ]
    [ 1  1  5  5 ]
    [ 2  1  9  8 ]
    [ 2  0  6  3 ].

a) Use elementary row operations to find the inverse of A.
b) Without performing any further elementary row operations, use your solution in part (a) to solve the system of linear equations

 x1      + 3x3 +  x4 = 1,
 x1 + x2 + 5x3 + 5x4 = 0,
2x1 + x2 + 9x3 + 8x4 = 0,
2x1      + 6x3 + 3x4 = 0.

9. In each of the following, solve the production equation x = Cx + d:

a) C = [ 0.1  0.5 ]    and    d = [ 50000 ]
       [ 0.6  0.2 ]               [ 30000 ]

b) C = [ 0    0.6 ]    and    d = [ 36000 ]
       [ 0.5  0.2 ]               [ 22000 ]

c) C = [ 0.2  0.2  0   ]    and    d = [ 4000000 ]
       [ 0.1  0    0.2 ]               [ 8000000 ]
       [ 0.3  0.1  0.3 ]               [ 6000000 ]

10. Consider three industries A, B and C. For industry A to manufacture $1 worth of its product, it needs to purchase 25c worth of product from each of industries B and C. For industry B to manufacture $1 worth of its product, it needs to purchase 65c worth of product from industry A and 5c worth of product from industry C, as well as use 5c worth of its own product. For industry C to manufacture $1 worth of its product, it needs to purchase 55c worth of product from industry A and 10c worth of product from industry B. In a particular week, industry A receives $500000 worth of outside order, industry B receives $250000 worth of outside order, but industry C receives no outside order. What is the production level required to satisfy all the demands precisely?

11. Suppose that C is an n × n consumption matrix with all column sums less than 1. Suppose further that x′ is the production vector that satisfies an outside demand d′, and that x′′ is the production vector that satisfies an outside demand d′′. Show that x′ + x′′ is the production vector that satisfies an outside demand d′ + d′′.


12. Suppose that C is an n × n consumption matrix with all column sums less than 1. Suppose further that the demand vector d has 1 for its top entry and 0 for all other entries. Describe the production vector x in terms of the columns of the matrix (I − C)−1, and give an interpretation of your observation.

13. Consider a pentagon in R2 with vertices (1, 1), (3, 1), (4, 2), (2, 4) and (1, 3). For each of the following transformations on the plane, find the 3 × 3 matrix that describes the transformation with respect to homogeneous coordinates, and use it to find the image of the pentagon:
a) reflection across the x2-axis
b) reflection across the line x1 = x2
c) anticlockwise rotation by 90 degrees
d) translation by the fixed vector (3, −2)
e) shear in the x2-direction with factor 2
f) dilation by factor 2
g) expansion in the x1-direction by factor 2
h) reflection across the x2-axis, followed by anticlockwise rotation by 90 degrees
i) translation by the fixed vector (3, −2), followed by reflection across the line x1 = x2
j) shear in the x2-direction with factor 2, followed by dilation by factor 2, followed by expansion in the x1-direction by factor 2

14. In homogeneous coordinates, a 3 × 3 matrix that describes a transformation on the plane is of the form

A∗ = [ a11  a12  h1 ]
     [ a21  a22  h2 ]
     [  0    0    1 ].

Show that this transformation can be described by a matrix transformation on R2 followed by a translation in R2.

15. Consider the matrices

A∗1 = [   1      0    0 ]        A∗2 = [ secφ  −tanφ  0 ]
      [ sinφ   cosφ   0 ]   and        [   0      1   0 ]
      [   0      0    1 ]              [   0      0   1 ],

where φ ∈ (0, π/2) is fixed.
a) Show that A∗1 represents a shear in the x2-direction followed by a compression in the x2-direction.
b) Show that A∗2 represents a shear in the x1-direction followed by an expansion in the x1-direction.
c) What transformation on the plane does the matrix A∗2A∗1 describe?
[Remark: This technique is often used in computer graphics to speed up calculations.]

16. Consider the matrices

A∗1 = [ 1  −tanθ  0 ]        A∗2 = [    1     0  0 ]
      [ 0     1   0 ]   and        [ sin 2θ   1  0 ]
      [ 0     0   1 ]              [    0     0  1 ],

where θ ∈ R is fixed.
a) What transformation on the plane does the matrix A∗1 describe?
b) What transformation on the plane does the matrix A∗2 describe?
c) What transformation on the plane does the matrix A∗1A∗2A∗1 describe?
[Remark: This technique is often used to reduce the number of multiplication operations.]

17. Show that the products and inverses of 3 × 3 unit lower triangular matrices are also unit lower triangular.


18. For each of the following matrices A and b, find an LU factorization of the matrix A and use it to solve the system Ax = b:

a) A = [ 2  1  2 ]    and    b = [  6 ]
       [ 4  6  5 ]               [ 21 ]
       [ 4  6  8 ]               [ 24 ]

b) A = [ 3   1   3 ]    and    b = [  5 ]
       [ 9   4  10 ]               [ 18 ]
       [ 6  −1   5 ]               [  9 ]

c) A = [ 2  1  2  1 ]    and    b = [  1 ]
       [ 4  3  5  4 ]               [  9 ]
       [ 4  3  5  7 ]               [ 18 ]

d) A = [ 3  1   1   5 ]    and    b = [ 10 ]
       [ 9  3   4  19 ]               [ 35 ]
       [ 6  2  −1   0 ]               [  7 ]

e) A = [  2  −3    1   2 ]    and    b = [  1 ]
       [ −6  10   −5  −4 ]               [  1 ]
       [  4  −7    6  −1 ]               [ −6 ]
       [  4  −2  −10  19 ]               [ 28 ]

f) A = [  2  −2   1   2   2 ]    and    b = [  4 ]
       [  4  −3   0   7   5 ]               [ 12 ]
       [ −4   7  −5   3   2 ]               [ 14 ]
       [  6  −8  19  −8  18 ]               [ 48 ]

19. Consider a payoff matrix

A = [  4  −1  −6   4 ]
    [ −6   2   0   8 ]
    [ −3  −8   7  −5 ].

a) What is the expected payoff if p = ( 1/3 0 2/3 ) and

q = [ 1/4 ]
    [ 1/4 ]
    [ 1/4 ]
    [ 1/4 ]?

b) Suppose that player R adopts the strategy p = ( 1/3 0 2/3 ). What strategy should player C adopt?
c) Suppose that player C adopts the strategy q of part (a). What strategy should player R adopt?

20. Construct a simple example to show that optimal strategies are not unique.

21. Show that the entries in the matrices in (9) and (10) are in the range [0, 1].


LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1982, 2008.

This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.

It is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 3

DETERMINANTS

3.1. Introduction

In the last chapter, we have related the question of the invertibility of a square matrix to a question of solutions of systems of linear equations. In some sense, this is unsatisfactory, since it is not simple to find an answer to either of these questions without a lot of work. In this chapter, we shall relate these two questions to the question of the determinant of the matrix in question. As we shall see later, the task is reduced to checking whether this determinant is zero or non-zero. So what is the determinant?

Let us start with 1× 1 matrices, of the form

A = ( a ) .

Note here that I1 = ( 1 ). If a ≠ 0, then clearly the matrix A is invertible, with inverse matrix

A−1 = ( a−1 ) .

On the other hand, if a = 0, then clearly no matrix B can satisfy AB = BA = I1, so that the matrix A is not invertible. We therefore conclude that the value a is a good “determinant” to determine whether the 1 × 1 matrix A is invertible, since the matrix A is invertible if and only if a ≠ 0.

Let us then agree on the following definition.

Definition. Suppose that

A = ( a )

is a 1× 1 matrix. We write

det(A) = a,

and call this the determinant of the matrix A.


Next, let us turn to 2 × 2 matrices, of the form

A = [ a  b ]
    [ c  d ].

We shall use elementary row operations to find out when the matrix A is invertible. So we consider the array

(A|I2) = [ a  b | 1  0 ]
         [ c  d | 0  1 ],    (1)

and try to use elementary row operations to reduce the left hand half of the array to I2. Suppose first of all that a = c = 0. Then the array becomes

[ 0  b | 1  0 ]
[ 0  d | 0  1 ],

and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix I2. Consider next the case a ≠ 0. Multiplying row 2 of the array (1) by a, we obtain

[ a   b  | 1  0 ]
[ ac  ad | 0  a ].

Adding −c times row 1 to row 2, we obtain

[ a     b     |  1  0 ]
[ 0  ad − bc  | −c  a ].    (2)

If D = ad − bc = 0, then this becomes

[ a  b |  1  0 ]
[ 0  0 | −c  a ],

and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix I2. On the other hand, if D = ad − bc ≠ 0, then the array (2) can be reduced by elementary row operations to

[ 1  0 |  d/D  −b/D ]
[ 0  1 | −c/D   a/D ],

so that

A−1 = (1/(ad − bc)) [  d  −b ]
                    [ −c   a ].

Consider finally the case c ≠ 0. Interchanging rows 1 and 2 of the array (1), we obtain

[ c  d | 0  1 ]
[ a  b | 1  0 ].

Multiplying row 2 of the array by c, we obtain

[ c   d  | 0  1 ]
[ ac  bc | c  0 ].

Adding −a times row 1 to row 2, we obtain

[ c     d     | 0   1 ]
[ 0  bc − ad  | c  −a ].


Multiplying row 2 by −1, we obtain

[ c     d     |  0  1 ]
[ 0  ad − bc  | −c  a ].    (3)

Again, if D = ad − bc = 0, then this becomes

[ c  d |  0  1 ]
[ 0  0 | −c  a ],

and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix I2. On the other hand, if D = ad − bc ≠ 0, then the array (3) can be reduced by elementary row operations to

[ 1  0 |  d/D  −b/D ]
[ 0  1 | −c/D   a/D ],

so that

A−1 = (1/(ad − bc)) [  d  −b ]
                    [ −c   a ].

Finally, note that a = c = 0 is a special case of ad − bc = 0. We therefore conclude that the value ad − bc is a good “determinant” to determine whether the 2 × 2 matrix A is invertible, since the matrix A is invertible if and only if ad − bc ≠ 0.

Let us then agree on the following definition.

Definition. Suppose that

A = [ a  b ]
    [ c  d ]

is a 2 × 2 matrix. We write

det(A) = ad− bc,

and call this the determinant of the matrix A.

3.2. Determinants for Square Matrices of Higher Order

If we attempt to repeat the argument for 2 × 2 matrices to 3 × 3 matrices, then it is very likely that we shall end up in a mess with possibly no firm conclusion. Try the argument on 4 × 4 matrices if you must. Those who have their feet firmly on the ground will try a different approach.

Our approach is inductive in nature. In other words, we shall define the determinant of 2 × 2 matrices in terms of determinants of 1 × 1 matrices, define the determinant of 3 × 3 matrices in terms of determinants of 2 × 2 matrices, define the determinant of 4 × 4 matrices in terms of determinants of 3 × 3 matrices, and so on.

Suppose now that we have defined the determinant of (n − 1) × (n − 1) matrices. Let

A = [ a11  . . .  a1n ]
    [ ...         ... ]    (4)
    [ an1  . . .  ann ]


be an n × n matrix. For every i, j = 1, . . . , n, let us delete row i and column j of A to obtain the (n − 1) × (n − 1) matrix

Aij = [ a11      . . .  a1(j−1)      •  a1(j+1)      . . .  a1n     ]
      [ ...             ...          •  ...                 ...     ]
      [ a(i−1)1  . . .  a(i−1)(j−1)  •  a(i−1)(j+1)  . . .  a(i−1)n ]
      [ •        . . .  •            •  •            . . .  •       ]
      [ a(i+1)1  . . .  a(i+1)(j−1)  •  a(i+1)(j+1)  . . .  a(i+1)n ]
      [ ...             ...          •  ...                 ...     ]
      [ an1      . . .  an(j−1)      •  an(j+1)      . . .  ann     ].    (5)

Here • denotes that the entry has been deleted.

Definition. The number Cij = (−1)^{i+j} det(Aij) is called the cofactor of the entry aij of A. In other words, the cofactor of the entry aij is obtained from A by first deleting the row and the column containing the entry aij, then calculating the determinant of the resulting (n − 1) × (n − 1) matrix, and finally multiplying by a sign (−1)^{i+j}.

Note that the entries of A in row i are given by

( ai1 . . . ain ) .

Definition. By the cofactor expansion of A by row i, we mean the expression

Σ_{j=1}^{n} aijCij = ai1Ci1 + . . . + ainCin.    (6)

Note that the entries of A in column j are given by

[ a1j ]
[ ... ]
[ anj ].

Definition. By the cofactor expansion of A by column j, we mean the expression

Σ_{i=1}^{n} aijCij = a1jC1j + . . . + anjCnj.    (7)

We shall state without proof the following important result. The interested reader is referred to Section 3.8 for further discussion.

PROPOSITION 3A. Suppose that A is an n × n matrix given by (4). Then the expressions (6) and (7) are all equal and independent of the row or column chosen.

Definition. Suppose that A is an n × n matrix given by (4). We call the common value in (6) and (7) the determinant of the matrix A, denoted by det(A).
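This inductive definition can be transcribed directly into a short recursive program. The following Python sketch, included for illustration only, computes det(A) by cofactor expansion by row 1, that is, formula (6) with i = 1.

    def det(A):
        # Determinant by cofactor expansion by row 1; A is a square
        # matrix given as a list of rows.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1, column j
            total += (-1) ** j * A[0][j] * det(minor)         # (-1)**j is the sign (-1)^{1+(j+1)}
        return total

Note that this takes on the order of n! operations, which is one reason why the elementary row operation techniques of Section 3.4 are preferred in practice.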


Let us check whether this agrees with our earlier definition of the determinant of a 2 × 2 matrix. Writing

A = [ a11  a12 ]
    [ a21  a22 ],

we have

C11 = a22,  C12 = −a21,  C21 = −a12,  C22 = a11.

It follows that

by row 1:     a11C11 + a12C12 = a11a22 − a12a21,
by row 2:     a21C21 + a22C22 = −a21a12 + a22a11,
by column 1:  a11C11 + a21C21 = a11a22 − a21a12,
by column 2:  a12C12 + a22C22 = −a12a21 + a22a11.

The four values are clearly equal, and of the form ad − bc as before.

Example 3.2.1. Consider the matrix

A = [ 2  3  5 ]
    [ 1  4  2 ]
    [ 2  1  5 ].

Let us use cofactor expansion by row 1. Then

C11 = (−1)^{1+1} det [ 4  2 ] = (−1)^2 (20 − 2) = 18,
                     [ 1  5 ]

C12 = (−1)^{1+2} det [ 1  2 ] = (−1)^3 (5 − 4) = −1,
                     [ 2  5 ]

C13 = (−1)^{1+3} det [ 1  4 ] = (−1)^4 (1 − 8) = −7,
                     [ 2  1 ]

so that

det(A) = a11C11 + a12C12 + a13C13 = 36 − 3 − 35 = −2.

Alternatively, let us use cofactor expansion by column 2. Then

C12 = (−1)^{1+2} det [ 1  2 ] = (−1)^3 (5 − 4) = −1,
                     [ 2  5 ]

C22 = (−1)^{2+2} det [ 2  5 ] = (−1)^4 (10 − 10) = 0,
                     [ 2  5 ]

C32 = (−1)^{3+2} det [ 2  5 ] = (−1)^5 (4 − 5) = 1,
                     [ 1  2 ]

so that

det(A) = a12C12 + a22C22 + a32C32 = −3 + 0 + 1 = −2.

When using cofactor expansion, we should choose a row or column with as few non-zero entries as possible in order to minimize the calculations.


Example 3.2.2. Consider the matrix

A = [ 2  3  0  5 ]
    [ 1  4  0  2 ]
    [ 5  4  8  5 ]
    [ 2  1  0  5 ].

Here it is convenient to use cofactor expansion by column 3, since then

det(A) = a13C13 + a23C23 + a33C33 + a43C43 = 8C33 = 8(−1)^{3+3} det [ 2  3  5 ] = −16,
                                                                    [ 1  4  2 ]
                                                                    [ 2  1  5 ]

in view of Example 3.2.1.

3.3. Some Simple Observations

In this section, we shall describe two simple observations which follow immediately from the definition of the determinant by cofactor expansion.

PROPOSITION 3B. Suppose that a square matrix A has a zero row or has a zero column. Then det(A) = 0.

Proof. We simply use cofactor expansion by the zero row or zero column. ©

Definition. Consider an n × n matrix

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ an1  . . .  ann ].

If aij = 0 whenever i > j, then A is called an upper triangular matrix. If aij = 0 whenever i < j, then A is called a lower triangular matrix. We also say that A is a triangular matrix if it is upper triangular or lower triangular.

Example 3.3.1. The matrix

[ 1  2  3 ]
[ 0  4  5 ]
[ 0  0  6 ]

is upper triangular.

Example 3.3.2. A diagonal matrix is both upper triangular and lower triangular.

PROPOSITION 3C. Suppose that the n × n matrix

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ an1  . . .  ann ]

is triangular. Then det(A) = a11a22 . . . ann, the product of the diagonal entries.


Proof. Let us assume that A is upper triangular – for the case when A is lower triangular, change the term “left-most column” to the term “top row” in the proof. Using cofactor expansion by the left-most column at each step, we see that

det(A) = a11 det [ a22  . . .  a2n ] = a11a22 det [ a33  . . .  a3n ] = . . . = a11a22 . . . ann
                 [ ...         ... ]              [ ...         ... ]
                 [ an2  . . .  ann ]              [ an3  . . .  ann ]

as required. ©

3.4. Elementary Row Operations

We now study the effect of elementary row operations on determinants. Recall that the elementary row operations that we consider are: (1) interchanging two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by a non-zero constant.

PROPOSITION 3D. (ELEMENTARY ROW OPERATIONS) Suppose that A is an n × n matrix.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging two rows of A. Then det(B) = −det(A).
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one row of A to another row. Then det(B) = det(A).
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one row of A by a non-zero constant c. Then det(B) = c det(A).

Sketch of Proof. (a) The proof is by induction on n. It is easily checked that the result holds when n = 2. When n > 2, we use cofactor expansion by a third row, say row i. Then

det(B) = Σ_{j=1}^{n} aij(−1)^{i+j} det(Bij).

Note that the (n − 1) × (n − 1) matrices Bij are obtained from the matrices Aij by interchanging two rows of Aij, so that det(Bij) = −det(Aij). It follows that

det(B) = −Σ_{j=1}^{n} aij(−1)^{i+j} det(Aij) = −det(A)

as required.

(b) Again, the proof is by induction on n. It is easily checked that the result holds when n = 2. When n > 2, we use cofactor expansion by a third row, say row i. Then

det(B) = Σ_{j=1}^{n} aij(−1)^{i+j} det(Bij).

Note that the (n − 1) × (n − 1) matrices Bij are obtained from the matrices Aij by adding a multiple of one row of Aij to another row, so that det(Bij) = det(Aij). It follows that

det(B) = Σ_{j=1}^{n} aij(−1)^{i+j} det(Aij) = det(A)

as required.


(c) This is simpler. Suppose that the matrix B is obtained from the matrix A by multiplying row i of A by a non-zero constant c. Then

det(B) = Σ_{j=1}^{n} caij(−1)^{i+j} det(Bij).

Note now that Bij = Aij, since row i has been removed respectively from B and A. It follows that

det(B) = Σ_{j=1}^{n} caij(−1)^{i+j} det(Aij) = c det(A)

as required. ©

In fact, the above operations can also be carried out on the columns of A. More precisely, we have the following result.

PROPOSITION 3E. (ELEMENTARY COLUMN OPERATIONS) Suppose that A is an n × n matrix.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging two columns of A. Then det(B) = −det(A).
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one column of A to another column. Then det(B) = det(A).
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one column of A by a non-zero constant c. Then det(B) = c det(A).

Elementary row and column operations can be combined with cofactor expansion to calculate the determinant of a given matrix. We shall illustrate this point by the following examples.

Example 3.4.1. Consider the matrix

A = [ 2  3  2  5 ]
    [ 1  4  1  2 ]
    [ 5  4  4  5 ]
    [ 2  2  0  4 ].

Adding −1 times column 3 to column 1, we have

det(A) = det [ 0  3  2  5 ]
             [ 0  4  1  2 ]
             [ 1  4  4  5 ]
             [ 2  2  0  4 ].

Adding −1/2 times row 4 to row 3, we have

det(A) = det [ 0  3  2  5 ]
             [ 0  4  1  2 ]
             [ 0  3  4  3 ]
             [ 2  2  0  4 ].

Using cofactor expansion by column 1, we have

det(A) = 2(−1)^{4+1} det [ 3  2  5 ] = −2 det [ 3  2  5 ]
                         [ 4  1  2 ]          [ 4  1  2 ]
                         [ 3  4  3 ]          [ 3  4  3 ].

Adding −1 times row 1 to row 3, we have

det(A) = −2 det [ 3  2   5 ]
                [ 4  1   2 ]
                [ 0  2  −2 ].


Adding 1 times column 2 to column 3, we have

det(A) = −2 det [ 3  2  7 ]
                [ 4  1  3 ]
                [ 0  2  0 ].

Using cofactor expansion by row 3, we have

det(A) = −2 · 2(−1)^{3+2} det [ 3  7 ] = 4 det [ 3  7 ]
                              [ 4  3 ]         [ 4  3 ].

Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = 4(9 − 28) = −76. Let us start again and try a different way. Dividing row 4 by 2, we have

det(A) = 2 det [ 2  3  2  5 ]
               [ 1  4  1  2 ]
               [ 5  4  4  5 ]
               [ 1  1  0  2 ].

Adding −1 times row 4 to row 2, we have

det(A) = 2 det [ 2  3  2  5 ]
               [ 0  3  1  0 ]
               [ 5  4  4  5 ]
               [ 1  1  0  2 ].

Adding −3 times column 3 to column 2, we have

det(A) = 2 det [ 2  −3  2  5 ]
               [ 0   0  1  0 ]
               [ 5  −8  4  5 ]
               [ 1   1  0  2 ].

Using cofactor expansion by row 2, we have

det(A) = 2 · 1(−1)^{2+3} det [ 2  −3  5 ] = −2 det [ 2  −3  5 ]
                             [ 5  −8  5 ]          [ 5  −8  5 ]
                             [ 1   1  2 ]          [ 1   1  2 ].

Adding −2 times row 3 to row 1, we have

det(A) = −2 det [ 0  −5  1 ]
                [ 5  −8  5 ]
                [ 1   1  2 ].

Adding −5 times row 3 to row 2, we have

det(A) = −2 det [ 0   −5   1 ]
                [ 0  −13  −5 ]
                [ 1    1   2 ].

Using cofactor expansion by column 1, we have

det(A) = −2 · 1(−1)^{3+1} det [  −5   1 ] = −2 det [  −5   1 ]
                              [ −13  −5 ]          [ −13  −5 ].

Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = −2(25 + 13) = −76.


Example 3.4.2. Consider the matrix

A = [ 2  1  0  1  3 ]
    [ 2  3  1  2  5 ]
    [ 4  7  2  3  7 ]
    [ 1  0  1  1  3 ]
    [ 2  1  0  2  0 ].

Here we have the least number of non-zero entries in column 3, so let us work to get more zeros into this column. Adding −1 times row 4 to row 2, we have

det(A) = det [ 2  1  0  1  3 ]
             [ 1  3  0  1  2 ]
             [ 4  7  2  3  7 ]
             [ 1  0  1  1  3 ]
             [ 2  1  0  2  0 ].

Adding −2 times row 4 to row 3, we have

det(A) = det [ 2  1  0  1  3 ]
             [ 1  3  0  1  2 ]
             [ 2  7  0  1  1 ]
             [ 1  0  1  1  3 ]
             [ 2  1  0  2  0 ].

Using cofactor expansion by column 3, we have

det(A) = 1(−1)^{4+3} det [ 2  1  1  3 ] = −det [ 2  1  1  3 ]
                         [ 1  3  1  2 ]        [ 1  3  1  2 ]
                         [ 2  7  1  1 ]        [ 2  7  1  1 ]
                         [ 2  1  2  0 ]        [ 2  1  2  0 ].

Adding −1 times column 3 to column 1, we have

det(A) = −det [ 1  1  1  3 ]
              [ 0  3  1  2 ]
              [ 1  7  1  1 ]
              [ 0  1  2  0 ].

Adding −1 times row 1 to row 3, we have

det(A) = −det [ 1  1  1   3 ]
              [ 0  3  1   2 ]
              [ 0  6  0  −2 ]
              [ 0  1  2   0 ].

Using cofactor expansion by column 1, we have

det(A) = −1(−1)^{1+1} det [ 3  1   2 ] = −det [ 3  1   2 ]
                          [ 6  0  −2 ]        [ 6  0  −2 ]
                          [ 1  2   0 ]        [ 1  2   0 ].

Adding 1 times row 1 to row 2, we have

det(A) = −det [ 3  1  2 ]
              [ 9  1  0 ]
              [ 1  2  0 ].


Using cofactor expansion by column 3, we have

det(A) = −2(−1)^{1+3} det [ 9  1 ] = −2 det [ 9  1 ]
                          [ 1  2 ]          [ 1  2 ].

Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = −2(18 − 1) = −34.

Example 3.4.3. Consider the matrix

A = [ 1  0  2  4  1  0 ]
    [ 2  4  5  7  6  2 ]
    [ 4  6  1  9  2  1 ]
    [ 3  5  0  1  2  5 ]
    [ 2  4  5  3  6  2 ]
    [ 1  0  2  5  1  0 ].

Here note that rows 1 and 6 are almost identical. Adding −1 times row 1 to row 6, we have

det(A) = det [ 1  0  2  4  1  0 ]
             [ 2  4  5  7  6  2 ]
             [ 4  6  1  9  2  1 ]
             [ 3  5  0  1  2  5 ]
             [ 2  4  5  3  6  2 ]
             [ 0  0  0  1  0  0 ].

Adding −1 times row 5 to row 2, we have

det(A) = det [ 1  0  2  4  1  0 ]
             [ 0  0  0  4  0  0 ]
             [ 4  6  1  9  2  1 ]
             [ 3  5  0  1  2  5 ]
             [ 2  4  5  3  6  2 ]
             [ 0  0  0  1  0  0 ].

Adding −4 times row 6 to row 2, we have

det(A) = det [ 1  0  2  4  1  0 ]
             [ 0  0  0  0  0  0 ]
             [ 4  6  1  9  2  1 ]
             [ 3  5  0  1  2  5 ]
             [ 2  4  5  3  6  2 ]
             [ 0  0  0  1  0  0 ].

It follows from Proposition 3B that det(A) = 0.
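Proposition 3D also yields a much faster way to compute determinants by machine: reduce A to upper triangular form, keep track of row interchanges, and multiply the diagonal entries. A minimal Python sketch of this idea follows; it is ours and not from the text.

    def det_by_elimination(A):
        # Reduce A to upper triangular form using only row interchanges
        # (Proposition 3D(a): each flips the sign) and adding multiples
        # of one row to another (Proposition 3D(b): no change); then
        # det(A) is the signed product of the diagonal (Proposition 3C).
        M = [row[:] for row in A]
        n = len(M)
        sign = 1
        for c in range(n):
            pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
            if pivot is None:
                return 0                  # no pivot available: det(A) = 0
            if pivot != c:
                M[c], M[pivot] = M[pivot], M[c]
                sign = -sign
            for r in range(c + 1, n):
                factor = M[r][c] / M[c][c]
                for j in range(c, n):
                    M[r][j] -= factor * M[c][j]
        result = sign
        for i in range(n):
            result *= M[i][i]
        return result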

3.5. Further Properties of Determinants

Definition. Consider the n × n matrix

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ an1  . . .  ann ].

By the transpose At of A, we mean the matrix

At = [ a11  . . .  an1 ]
     [ ...         ... ]
     [ a1n  . . .  ann ]


obtained from A by transposing rows and columns.

Example 3.5.1. Consider the matrix

A = [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8  9 ].

Then

At = [ 1  4  7 ]
     [ 2  5  8 ]
     [ 3  6  9 ].

Recall that determinants of 2 × 2 matrices depend on determinants of 1 × 1 matrices; in turn, determinants of 3 × 3 matrices depend on determinants of 2 × 2 matrices, and so on. It follows that determinants of n × n matrices ultimately depend on determinants of 1 × 1 matrices. Note now that transposing a 1 × 1 matrix does not affect its determinant (why?). The result below follows in view of Proposition 3A.

PROPOSITION 3F. For every n× n matrix A, we have det(At) = det(A).

Example 3.5.2. We have

det [ 2  2  4  1  2 ]       [ 2  1  0  1  3 ]
    [ 1  3  7  0  1 ]       [ 2  3  1  2  5 ]
    [ 0  1  2  1  0 ] = det [ 4  7  2  3  7 ] = −34.
    [ 1  2  3  1  2 ]       [ 1  0  1  1  3 ]
    [ 3  5  7  3  0 ]       [ 2  1  0  2  0 ]

Next, we shall study the determinant of a product. In Section 3.8, we shall sketch a proof of the following important result.

PROPOSITION 3G. For any two n × n matrices A and B, we have det(AB) = det(A) det(B).

PROPOSITION 3H. Suppose that the n × n matrix A is invertible. Then

det(A−1) = 1/det(A).

Proof. In view of Propositions 3G and 3C, we have det(A) det(A−1) = det(In) = 1. The result follows immediately. ©

Finally, the main reason for studying determinants, as outlined in the introduction, is summarized by the following result.

PROPOSITION 3J. Suppose that A is an n × n matrix. Then A is invertible if and only if det(A) ≠ 0.

Proof. Suppose that A is invertible. Then det(A) ≠ 0 follows immediately from Proposition 3H. Suppose now that det(A) ≠ 0. Let us now reduce A by elementary row operations to reduced row echelon form B. Then there exist a finite sequence E1, . . . , Ek of elementary n × n matrices such that

B = Ek . . . E1A.

It follows from Proposition 3G that

det(B) = det(Ek) . . . det(E1) det(A).


Recall that all elementary matrices are invertible and so have non-zero determinants. It follows that det(B) ≠ 0, so that B has no zero rows by Proposition 3B. Since B is an n × n matrix in reduced row echelon form, it must be In. We therefore conclude that A is row equivalent to In. It now follows from Proposition 2N(c) that A is invertible. ©

Combining Propositions 2Q and 3J, we have the following result.

PROPOSITION 3K. In the notation of Proposition 2N, the following statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and In are row equivalent.
(d) The system Ax = b of linear equations is soluble for every n × 1 matrix b.
(e) The determinant det(A) ≠ 0.

3.6. Application to Curves and Surfaces

A special case of Proposition 3K states that a homogeneous system of n linear equations in n variables has a non-trivial solution if and only if the determinant of the coefficient matrix is equal to zero. In this section, we shall use this to solve some problems in geometry. We illustrate our ideas by a few simple examples.

Example 3.6.1. Suppose that we wish to determine the equation of the unique line on the xy-plane that passes through two distinct given points (x1, y1) and (x2, y2). The equation of a line on the xy-plane is of the form ax + by + c = 0. Since the two points lie on the line, we must have ax1 + by1 + c = 0 and ax2 + by2 + c = 0. Hence

 xa +  yb + c = 0,
x1a + y1b + c = 0,
x2a + y2b + c = 0.

Written in matrix notation, we have

[ x   y   1 ] [ a ]   [ 0 ]
[ x1  y1  1 ] [ b ] = [ 0 ].
[ x2  y2  1 ] [ c ]   [ 0 ]

Clearly there is a non-trivial solution (a, b, c) to this system of linear equations, and so we must have

det [ x   y   1 ]
    [ x1  y1  1 ] = 0,
    [ x2  y2  1 ]

the equation of the line required.
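As a check, expanding this determinant by cofactor expansion by row 1 (this intermediate step is ours, not in the original text) gives the explicit equation

(y1 − y2)x − (x1 − x2)y + (x1y2 − x2y1) = 0,

which is the familiar two-point form of the line through (x1, y1) and (x2, y2).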

Example 3.6.2. Suppose that we wish to determine the equation of the unique circle on the xy-plane that passes through three distinct given points (x1, y1), (x2, y2) and (x3, y3), not all lying on a straight line. The equation of a circle on the xy-plane is of the form a(x² + y²) + bx + cy + d = 0. Since the three points lie on the circle, we must have a(x1² + y1²) + bx1 + cy1 + d = 0, a(x2² + y2²) + bx2 + cy2 + d = 0, and a(x3² + y3²) + bx3 + cy3 + d = 0. Hence

  (x² + y²)a +  xb +  yc + d = 0,
(x1² + y1²)a + x1b + y1c + d = 0,
(x2² + y2²)a + x2b + y2c + d = 0,
(x3² + y3²)a + x3b + y3c + d = 0.


Written in matrix notation, we have

[ x² + y²    x   y   1 ] [ a ]   [ 0 ]
[ x1² + y1²  x1  y1  1 ] [ b ]   [ 0 ]
[ x2² + y2²  x2  y2  1 ] [ c ] = [ 0 ].
[ x3² + y3²  x3  y3  1 ] [ d ]   [ 0 ]

Clearly there is a non-trivial solution (a, b, c, d) to this system of linear equations, and so we must have

det [ x² + y²    x   y   1 ]
    [ x1² + y1²  x1  y1  1 ]
    [ x2² + y2²  x2  y2  1 ] = 0,
    [ x3² + y3²  x3  y3  1 ]

the equation of the circle required.

Example 3.6.3. Suppose that we wish to determine the equation of the unique plane in 3-space that passes through three distinct given points (x1, y1, z1), (x2, y2, z2) and (x3, y3, z3), not all lying on a straight line. The equation of a plane in 3-space is of the form ax + by + cz + d = 0. Since the three points lie on the plane, we must have ax1 + by1 + cz1 + d = 0, ax2 + by2 + cz2 + d = 0, and ax3 + by3 + cz3 + d = 0. Hence

 xa +  yb +  zc + d = 0,
x1a + y1b + z1c + d = 0,
x2a + y2b + z2c + d = 0,
x3a + y3b + z3c + d = 0.

Written in matrix notation, we have

[ x   y   z   1 ] [ a ]   [ 0 ]
[ x1  y1  z1  1 ] [ b ]   [ 0 ]
[ x2  y2  z2  1 ] [ c ] = [ 0 ].
[ x3  y3  z3  1 ] [ d ]   [ 0 ]

Clearly there is a non-trivial solution (a, b, c, d) to this system of linear equations, and so we must have

det [ x   y   z   1 ]
    [ x1  y1  z1  1 ]
    [ x2  y2  z2  1 ] = 0,
    [ x3  y3  z3  1 ]

the equation of the plane required.

Example 3.6.4. Suppose that we wish to determine the equation of the unique sphere in 3-space that passes through four distinct given points (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) and (x4, y4, z4), not all lying on a plane. The equation of a sphere in 3-space is of the form a(x² + y² + z²) + bx + cy + dz + e = 0. Since the four points lie on the sphere, we must have

a(x1² + y1² + z1²) + bx1 + cy1 + dz1 + e = 0,
a(x2² + y2² + z2²) + bx2 + cy2 + dz2 + e = 0,
a(x3² + y3² + z3²) + bx3 + cy3 + dz3 + e = 0,
a(x4² + y4² + z4²) + bx4 + cy4 + dz4 + e = 0.

Hence

  (x² + y² + z²)a +  xb +  yc +  zd + e = 0,
(x1² + y1² + z1²)a + x1b + y1c + z1d + e = 0,
(x2² + y2² + z2²)a + x2b + y2c + z2d + e = 0,
(x3² + y3² + z3²)a + x3b + y3c + z3d + e = 0,
(x4² + y4² + z4²)a + x4b + y4c + z4d + e = 0.


Written in matrix notation, we have

[ x² + y² + z²     x   y   z   1 ] [ a ]   [ 0 ]
[ x1² + y1² + z1²  x1  y1  z1  1 ] [ b ]   [ 0 ]
[ x2² + y2² + z2²  x2  y2  z2  1 ] [ c ] = [ 0 ].
[ x3² + y3² + z3²  x3  y3  z3  1 ] [ d ]   [ 0 ]
[ x4² + y4² + z4²  x4  y4  z4  1 ] [ e ]   [ 0 ]

Clearly there is a non-trivial solution (a, b, c, d, e) to this system of linear equations, and so we must have

det [ x² + y² + z²     x   y   z   1 ]
    [ x1² + y1² + z1²  x1  y1  z1  1 ]
    [ x2² + y2² + z2²  x2  y2  z2  1 ] = 0,
    [ x3² + y3² + z3²  x3  y3  z3  1 ]
    [ x4² + y4² + z4²  x4  y4  z4  1 ]

the equation of the sphere required.

3.7. Some Useful Formulas

In this section, we shall discuss two very useful formulas which involve determinants only. The first one enables us to find the inverse of a matrix, while the second one enables us to solve a system of linear equations. The interested reader is referred to Section 3.8 for proofs.

Recall first of all that for any n × n matrix

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ an1  . . .  ann ],

the number Cij = (−1)^{i+j} det(Aij) is called the cofactor of the entry aij, and the (n − 1) × (n − 1) matrix

Aij = [ a11      . . .  a1(j−1)      •  a1(j+1)      . . .  a1n     ]
      [ ...             ...          •  ...                 ...     ]
      [ a(i−1)1  . . .  a(i−1)(j−1)  •  a(i−1)(j+1)  . . .  a(i−1)n ]
      [ •        . . .  •            •  •            . . .  •       ]
      [ a(i+1)1  . . .  a(i+1)(j−1)  •  a(i+1)(j+1)  . . .  a(i+1)n ]
      [ ...             ...          •  ...                 ...     ]
      [ an1      . . .  an(j−1)      •  an(j+1)      . . .  ann     ]

is obtained from A by deleting row i and column j; here • denotes that the entry has been deleted.

Definition. The n × n matrix

adj(A) = [ C11  . . .  Cn1 ]
         [ ...         ... ]
         [ C1n  . . .  Cnn ]

is called the adjoint of the matrix A.

Remark. Note that adj(A) is obtained from the matrix A first by replacing each entry of A by its cofactor and then by transposing the resulting matrix.


PROPOSITION 3L. Suppose that the n × n matrix A is invertible. Then

A−1 = (1/det(A)) adj(A).

Example 3.7.1. Consider the matrix

A = [ 1  −1  0 ]
    [ 0   1  2 ]
    [ 2   0  3 ].

Then, writing [ p q; r s ] for the 2 × 2 matrix with rows ( p q ) and ( r s ), we have

adj(A) = [  det[ 1 2; 0 3 ]   −det[ −1 0; 0 3 ]    det[ −1 0; 1 2 ] ]
         [ −det[ 0 2; 2 3 ]    det[ 1 0; 2 3 ]    −det[ 1 0; 0 2 ]  ]
         [  det[ 0 1; 2 0 ]   −det[ 1 −1; 2 0 ]    det[ 1 −1; 0 1 ] ]

       = [  3   3  −2 ]
         [  4   3  −2 ]
         [ −2  −2   1 ].

On the other hand, adding 1 times column 1 to column 2 and then using cofactor expansion on row 1, we have

det(A) = det [ 1  −1  0 ]       [ 1  0  0 ]
             [ 0   1  2 ] = det [ 0  1  2 ] = det [ 1  2 ] = −1.
             [ 2   0  3 ]       [ 2  2  3 ]       [ 2  3 ]

It follows that

A−1 = [ −3  −3   2 ]
      [ −4  −3   2 ]
      [  2   2  −1 ].

Next, we turn our attention to systems of n linear equations in n unknowns, of the form

a11x1 + . . . + a1nxn = b1,
          ...
an1x1 + . . . + annxn = bn,

represented in matrix notation in the form

Ax = b,

where

A = [ a11  . . .  a1n ]        b = [ b1 ]
    [ ...         ... ]   and      [ ... ]    (8)
    [ an1  . . .  ann ]            [ bn ]

represent the coefficients and

x = [ x1 ]
    [ ... ]    (9)
    [ xn ]

represents the variables.


For every j = 1, . . . , n, write

Aj(b) = [ a11  . . .  a1(j−1)  b1  a1(j+1)  . . .  a1n ]
        [ ...         ...      ...  ...            ... ]    (10)
        [ an1  . . .  an(j−1)  bn  an(j+1)  . . .  ann ];

in other words, we replace column j of the matrix A by the column b.

PROPOSITION 3M. (CRAMER'S RULE) Suppose that the matrix A is invertible. Then the unique solution of the system Ax = b, where A, x and b are given by (8) and (9), is given by

x1 = det(A1(b))/det(A), . . . , xn = det(An(b))/det(A),

where the matrices A1(b), . . . , An(b) are defined by (10).

Example 3.7.2. Consider the system Ax = b, where

A = [ 1  −1  0 ]        b = [ 1 ]
    [ 0   1  2 ]   and      [ 2 ]
    [ 2   0  3 ]            [ 3 ].

Recall that det(A) = −1. By Cramer's rule, we have

x1 = det [ 1  −1  0 ] / det(A) = −3,    x2 = det [ 1  1  0 ] / det(A) = −4,    x3 = det [ 1  −1  1 ] / det(A) = 3.
         [ 2   1  2 ]                            [ 0  2  2 ]                            [ 0   1  2 ]
         [ 3   0  3 ]                            [ 2  3  3 ]                            [ 2   0  3 ]

Let us check our calculations. Recall from Example 3.7.1 that

A−1 = [ −3  −3   2 ]
      [ −4  −3   2 ]
      [  2   2  −1 ].

We therefore have

[ x1 ]   [ −3  −3   2 ] [ 1 ]   [ −3 ]
[ x2 ] = [ −4  −3   2 ] [ 2 ] = [ −4 ].
[ x3 ]   [  2   2  −1 ] [ 3 ]   [  3 ]

3.8. Further Discussion

In this section, we shall first discuss a definition of the determinant in terms of permutations. In order to do so, we need to make a digression and discuss first the rudiments of permutations on non-empty finite sets.

Definition. Let X be a non-empty finite set. A permutation φ on X is a function φ : X → X which is one-to-one and onto. If x ∈ X, we denote by xφ the image of x under the permutation φ.

It is not difficult to see that if φ : X → X and ψ : X → X are both permutations on X, then φψ : X → X, defined by xφψ = (xφ)ψ for every x ∈ X so that φ is followed by ψ, is also a permutation on X.


Remark. Note that we use the notation xφ instead of our usual notation φ(x) to denote the image of x under φ. Note also that we write φψ to denote the composition ψ ∘ φ. We shall do this only for permutations. The reasons will become a little clearer later in the discussion.

Since the set X is non-empty and finite, we may assume, without loss of generality, that it is {1, 2, . . . , n}, where n ∈ N. We now let Sn denote the set of all permutations on the set {1, 2, . . . , n}. In other words, Sn denotes the collection of all functions from {1, 2, . . . , n} to {1, 2, . . . , n} that are both one-to-one and onto.

PROPOSITION 3N. For every n ∈ N, the set Sn has n! elements.

Proof. There are n choices for 1φ. For each such choice, there are (n − 1) choices left for 2φ. And so on. ©

To represent particular elements of Sn, there are various notations. For example, we can use the notation

(  1    2   . . .   n  )
( 1φ   2φ   . . .  nφ  )

to denote the permutation φ.

Example 3.8.1. In S4,

( 1  2  3  4 )
( 2  4  1  3 )

denotes the permutation φ, where 1φ = 2, 2φ = 4, 3φ = 1 and 4φ = 3. On the other hand, the reader can easily check that

( 1  2  3  4 ) ( 1  2  3  4 )   ( 1  2  3  4 )
( 2  4  1  3 ) ( 3  2  4  1 ) = ( 2  1  3  4 ).

A more convenient way is to use the cycle notation. The permutations

( 1  2  3  4 )         ( 1  2  3  4 )
( 2  4  1  3 )   and   ( 3  2  4  1 )

can be represented respectively by the cycles (1 2 4 3) and (1 3 4). Here the cycle (1 2 4 3) gives the information 1φ = 2, 2φ = 4, 4φ = 3 and 3φ = 1. Note also that in the latter case, since the image of 2 is 2, it is not necessary to include this in the cycle. Furthermore, the information

( 1  2  3  4 ) ( 1  2  3  4 )   ( 1  2  3  4 )
( 2  4  1  3 ) ( 3  2  4  1 ) = ( 2  1  3  4 )

can be represented in cycle notation by (1 2 4 3)(1 3 4) = (1 2). We also say that the cycles (1 2 4 3), (1 3 4) and (1 2) have lengths 4, 3 and 2 respectively.

Example 3.8.2. In S6, the permutation

( 1  2  3  4  5  6 )
( 2  4  1  3  6  5 )

can be represented in cycle notation as (1 2 4 3)(5 6).

Example 3.8.3. In S4 or S6, we have (1 2 4 3) = (1 2)(1 4)(1 3).


The last example motivates the following important idea.

Definition. Suppose that n ∈ N. A permutation in Sn that interchanges two numbers among the elements of {1, 2, . . . , n} and leaves all the others unchanged is called a transposition.

Remark. It is obvious that a transposition can be represented by a 2-cycle, and is its own inverse.

Definition. Two cycles (x1 x2 . . . xk) and (y1 y2 . . . yl) in Sn are said to be disjoint if the elements x1, . . . , xk, y1, . . . , yl are all different.

The interested reader may try to prove the following result.

PROPOSITION 3P. Suppose that n ∈ N.
(a) Every permutation in Sn can be written as a product of disjoint cycles.
(b) For every subset {x1, x2, . . . , xk} of the set {1, 2, . . . , n}, where the elements x1, x2, . . . , xk are distinct, the cycle (x1 x2 . . . xk) satisfies

(x1 x2 . . . xk) = (x1 x2)(x1 x3) . . . (x1 xk);

in other words, every cycle can be written as a product of transpositions.
(c) Consequently, every permutation in Sn can be written as a product of transpositions.

Example 3.8.4. In S9, the permutation

( 1  2  3  4  5  6  7  8  9 )
( 3  2  5  1  7  8  4  9  6 )

can be written in cycle notation as (1 3 5 7 4)(6 8 9). By Proposition 3P(b), we have

(1 3 5 7 4) = (1 3)(1 5)(1 7)(1 4)    and    (6 8 9) = (6 8)(6 9).

Hence the permutation can be represented by (1 3)(1 5)(1 7)(1 4)(6 8)(6 9).

Definition. Suppose that n ∈ N. Then a permutation in Sn is said to be even if it is representable as the product of an even number of transpositions and odd if it is representable as the product of an odd number of transpositions. Furthermore, we write

ε(φ) = +1 if φ is even,
       −1 if φ is odd.

Remark. It can be shown that no permutation can be simultaneously odd and even.

We are now in a position to define the determinant of a matrix. Suppose that

A = [ a11  . . .  a1n ]
    [ ...         ... ]    (11)
    [ an1  . . .  ann ]

is an n × n matrix.

Definition. By an elementary product from the matrix A, we mean the product of n entries of A, no two of which are from the same row or same column.

It follows that any such elementary product must be of the form

a1(1φ) a2(2φ) . . . an(nφ),

where φ is a permutation in Sn.


Definition. By the determinant of an n × n matrix A of the form (11), we mean the sum

det(A) = Σ_{φ∈Sn} ε(φ) a1(1φ) a2(2φ) . . . an(nφ),    (12)

where the summation is over all the n! permutations φ in Sn.
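Formula (12) can also be implemented verbatim, though at a cost of n! terms. In the Python sketch below (ours, for illustration), the sign ε(φ) is computed by counting inversions, which has the same parity as the number of transpositions in any decomposition of φ.

    from itertools import permutations

    def det_by_permutations(A):
        # Formula (12): sum of signed elementary products over all
        # permutations of the column indices.
        n = len(A)
        total = 0
        for phi in permutations(range(n)):
            inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                             if phi[i] > phi[j])
            sign = -1 if inversions % 2 else 1
            product = 1
            for i in range(n):
                product *= A[i][phi[i]]      # the entry in row i, column i phi
            total += sign * product
        return total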

It can be shown that the determinant defined in this way is the same as that defined earlier by row or column expansions. Indeed, one can use (12) to establish Proposition 3A. The very interested reader may wish to make an attempt. Here we confine our study to the special cases when n = 2 and n = 3. In the two examples below, we use e to denote the identity permutation.

Example 3.8.5. Suppose that n = 2. We have the following:

elementary product    permutation    sign    contribution
a11a22                e              +1      +a11a22
a12a21                (1 2)          −1      −a12a21

Hence det(A) = a11a22 − a12a21 as shown before.

Example 3.8.6. Suppose that n = 3. We have the following:

elementary product    permutation    sign    contribution
a11a22a33             e              +1      +a11a22a33
a12a23a31             (1 2 3)        +1      +a12a23a31
a13a21a32             (1 3 2)        +1      +a13a21a32
a13a22a31             (1 3)          −1      −a13a22a31
a11a23a32             (2 3)          −1      −a11a23a32
a12a21a33             (1 2)          −1      −a12a21a33

Hence det(A) = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a11a23a32 − a12a21a33. We have the picture below:


  +    +    +    −    −    −

a11  a12  a13  a11  a12
a21  a22  a23  a21  a22
a31  a32  a33  a31  a32

[Diagram: the matrix A written out with its first two columns repeated on the right; the three left-to-right diagonals give the + products and the three right-to-left diagonals give the − products.]


Next, we discuss briefly how one may prove Proposition 3G concerning the determinant of the product of two matrices. The idea is to use elementary matrices. Corresponding to Proposition 3D, we can easily establish the following result.

PROPOSITION 3Q. Suppose that E is an elementary matrix.
(a) If E arises from interchanging two rows of In, then det(E) = −1.
(b) If E arises from adding one row of In to another row, then det(E) = 1.
(c) If E arises from multiplying one row of In by a non-zero constant c, then det(E) = c.

PROPOSITION 3Q. Suppose that E is an elementary matrix.(a) If E arises from interchanging two rows of In, then det(E) = −1.(b) If E arises from adding one row of In to another row, then det(E) = 1.(c) If E arises from multiplying one row of In by a non-zero constant c, then det(E) = c.

Combining Propositions 3D and 3Q, we can establish the following intermediate result.


PROPOSITION 3R. Suppose that E is an n × n elementary matrix. Then for any n × n matrix B, we have det(EB) = det(E) det(B).

Proof of Proposition 3G. Let us reduce A by elementary row operations to reduced row echelon form A′. Then there exist a finite sequence G1, . . . , Gk of elementary matrices such that A′ = Gk . . . G1A. Since elementary matrices are invertible with elementary inverse matrices, it follows that there exist a finite sequence E1, . . . , Ek of elementary matrices such that

A = E1 . . . EkA′.    (13)

Suppose first of all that det(A) = 0. Then it follows from (13) that the matrix A′ must have a zero row. Hence A′B must have a zero row, and so det(A′B) = 0. But AB = E1 . . . Ek(A′B), so it follows from Proposition 3R that det(AB) = 0. Suppose next that det(A) ≠ 0. Then A′ = In, and so it follows from (13) that AB = E1 . . . EkB. The result now follows on applying Proposition 3R. ©

We complete this chapter by establishing the two formulas discussed in Section 3.7.

Proof of Proposition 3L. It suffices to show that

A adj(A) = det(A)In,    (14)

as this clearly implies

A ( (1/det(A)) adj(A) ) = In,

giving the result. To show (14), note that

A adj(A) = [ a11  . . .  a1n ] [ C11  . . .  Cn1 ]
           [ ...         ... ] [ ...         ... ]    (15)
           [ an1  . . .  ann ] [ C1n  . . .  Cnn ].

Suppose that the right hand side of (15) is equal to

B = [ b11  . . .  b1n ]
    [ ...         ... ]
    [ bn1  . . .  bnn ].

Then for every i, j = 1, . . . , n, we have

bij = ai1Cj1 + . . . + ainCjn.    (16)

It follows that when i = j, we have

bii = ai1Ci1 + . . . + ainCin = det(A).

On the other hand, if i ≠ j, then (16) is equal to the determinant of the matrix obtained from A by replacing row j by row i. This matrix has therefore two identical rows, and so the determinant is 0 (why?). The identity (14) follows immediately. ©


Proof of Proposition 3M. Since A is invertible, it follows from Proposition 3L that

A−1 = (1/det(A)) adj(A).

By Proposition 2P, the unique solution of the system Ax = b is given by

x = A−1b = (1/det(A)) adj(A) b.

Written in full, this becomes

[ x1 ]               [ C11  . . .  Cn1 ] [ b1 ]               [ b1C11 + . . . + bnCn1 ]
[ ... ] = (1/det(A)) [ ...         ... ] [ ... ] = (1/det(A)) [          ...          ].
[ xn ]               [ C1n  . . .  Cnn ] [ bn ]               [ b1C1n + . . . + bnCnn ]

Hence, for every j = 1, . . . , n, we have

xj = (b1C1j + . . . + bnCnj)/det(A).

To complete the proof, it remains to show that

b1C1j + . . . + bnCnj = det(Aj(b)).

Note, on using cofactor expansion by column j, that

det(Aj(b)) = Σ_{i=1}^{n} bi(−1)^{i+j} det(Aij) = Σ_{i=1}^{n} biCij,

since the matrix obtained from Aj(b) by deleting row i and column j is precisely the matrix Aij obtained from A by deleting row i and column j, as required. ©


Problems for Chapter 3

1. Compute the determinant of each of the matrices in Problem 2.6.

2. Find the determinant of each of the following matrices:

P = [ 1  3  2 ]        Q = [  1   1  −1 ]        R = [ a  a²  a³ ]
    [ 8  4  0 ]            [  1  −1   1 ]            [ b  b²  b³ ]
    [ 2  1  2 ],           [ −1   1   1 ],           [ c  c²  c³ ].

3. Find the determinant of the matrix

[ 3  4  5  2 ]
[ 1  0  1  0 ]
[ 2  3  6  3 ]
[ 7  2  9  4 ].

4. By using suitable elementary row and column operations as well as row and column expansions, show that

det [ 2  3  7  1  3 ]
    [ 2  3  7  1  5 ]
    [ 2  3  6  1  9 ] = 2.
    [ 4  6  2  3  4 ]
    [ 5  8  7  4  5 ]

[Remark: Note that rows 1 and 2 of the matrix are almost identical.]

5. By using suitable elementary row and column operations as well as row and column expansions, show that

det [ 2  1  5  1  3 ]
    [ 2  1  5  1  2 ]
    [ 4  3  2  1  1 ] = 2.
    [ 4  3  2  0  1 ]
    [ 2  1  6  π  7 ]

[Remark: The entry π is not a misprint!]

6. If A and B are square matrices of the same size and det(A) = 2 and det(B) = 3, find det(A2B−1).

7. a) Compute the Vandermonde determinants

det [ 1  a  a² ]        det [ 1  a  a²  a³ ]
    [ 1  b  b² ]   and      [ 1  b  b²  b³ ]
    [ 1  c  c² ]            [ 1  c  c²  c³ ]
                            [ 1  d  d²  d³ ].

b) Establish a formula for the Vandermonde determinant

det [ 1  a1  a1²  . . .  a1^(n−1) ]
    [ 1  a2  a2²  . . .  a2^(n−1) ]
    [ ...                  ...    ]
    [ 1  an  an²  . . .  an^(n−1) ].


8. Compute the determinant

det [ a      b      c     ]
    [ a + x  b + x  c + x ]
    [ a + y  b + y  c + y ].

9. For each of the matrices below, compute its adjoint and use Proposition 3L to calculate its inverse:

a) [ 1   1  3 ]      b) [ 3  5  4 ]
   [ 2  −2  1 ]         [ 2  1  1 ]
   [ 0   1  0 ]         [ 1  0  1 ]

10. Use Cramer’s rule to solve the system of linear equations

2x1 + x2 + x3 = 4,−x1 + 2x3 = 2,3x1 + x2 + 3x3 = −2.


Chapter 4

VECTORS

4.1. Introduction

A vector is an object which has magnitude and direction.

Example 4.1.1. We may be travelling north-east at 50 kph. In this case, the direction of the velocity is north-east and the magnitude of the velocity is 50 kph. We can describe our velocity in kph as
$$\left(\frac{50}{\sqrt{2}}, \frac{50}{\sqrt{2}}\right),$$
where the first coordinate describes the speed with which we are moving east and the second coordinate describes the speed with which we are moving north.

Example 4.1.2. An object in the sky may be 100 metres away in the south-east direction, 45 degrees upwards. In this case, the direction of its position is south-east and 45 degrees upwards, and the magnitude of its distance is 100 metres. We can describe the position of the object in metres as
$$\left(50, -50, \frac{100}{\sqrt{2}}\right),$$
where the first coordinate describes the distance east, the second coordinate describes the distance north and the third coordinate describes the distance up.

The purpose of this chapter is to study the relationship between algebra and geometry. We shall first study some algebra which is motivated by geometric considerations. We then use the algebra later to better understand some problems in geometry.


4.2. Vectors in R2

A vector on the plane R2 can be described as an ordered pair u = (u1, u2), where u1, u2 ∈ R.

Definition. Two vectors u = (u1, u2) and v = (v1, v2) in R2 are said to be equal, denoted by u = v, if u1 = v1 and u2 = v2.

Definition. For any two vectors u = (u1, u2) and v = (v1, v2) in R2, we define their sum to be

u + v = (u1, u2) + (v1, v2) = (u1 + v1, u2 + v2).

Geometrically, if we represent the two vectors u and v by $\overrightarrow{AB}$ and $\overrightarrow{BC}$ respectively, then the sum u + v is represented by $\overrightarrow{AC}$ as shown in the diagram below:

[Diagram: triangle ABC, with $\overrightarrow{AB} = \mathbf{u}$, $\overrightarrow{BC} = \mathbf{v}$ and $\overrightarrow{AC} = \mathbf{u} + \mathbf{v}$.]

The next diagram demonstrates geometrically that u + v = v + u:

[Diagram: parallelogram ABCD, with u represented along AB and DC, v represented along BC and AD, and the diagonal AC representing both u + v and v + u.]

PROPOSITION 4A. (VECTOR ADDITION)
(a) For every u, v ∈ R2, we have u + v ∈ R2.
(b) For every u, v, w ∈ R2, we have u + (v + w) = (u + v) + w.
(c) For every u ∈ R2, we have u + 0 = u, where 0 = (0, 0) ∈ R2.
(d) For every u ∈ R2, there exists v ∈ R2 such that u + v = 0.
(e) For every u, v ∈ R2, we have u + v = v + u.

Proof. Write u = (u1, u2), v = (v1, v2) and w = (w1, w2), where u1, u2, v1, v2, w1, w2 ∈ R. To check part (a), simply note that u1 + v1, u2 + v2 ∈ R. To check part (b), note that
$$\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (u_1, u_2) + (v_1 + w_1, v_2 + w_2) = (u_1 + (v_1 + w_1), u_2 + (v_2 + w_2)) = ((u_1 + v_1) + w_1, (u_2 + v_2) + w_2) = (u_1 + v_1, u_2 + v_2) + (w_1, w_2) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}.$$
Part (c) is trivial. Next, if v = (−u1, −u2), then u + v = 0, giving part (d). To check part (e), note that u + v = (u1 + v1, u2 + v2) = (v1 + u1, v2 + u2) = v + u. ©


Definition. For any vector u = (u1, u2) in R2 and any scalar c ∈ R, we define the scalar multiple to be

cu = c(u1, u2) = (cu1, cu2).

Example 4.2.1. Suppose that u = (2, 1). Then −2u = (−4, −2). Geometrically, if we represent the two vectors u and −2u by $\overrightarrow{OA}$ and $\overrightarrow{OB}$ respectively, then we have the diagram below:

[Diagram: O, A, B collinear, with $\overrightarrow{OA} = \mathbf{u}$ and $\overrightarrow{OB} = -2\mathbf{u}$ pointing in the opposite direction.]

PROPOSITION 4B. (SCALAR MULTIPLICATION)
(a) For every c ∈ R and u ∈ R2, we have cu ∈ R2.
(b) For every c ∈ R and u, v ∈ R2, we have c(u + v) = cu + cv.
(c) For every a, b ∈ R and u ∈ R2, we have (a + b)u = au + bu.
(d) For every a, b ∈ R and u ∈ R2, we have (ab)u = a(bu).
(e) For every u ∈ R2, we have 1u = u.

Proof. Write u = (u1, u2) and v = (v1, v2), where u1, u2, v1, v2 ∈ R. To check part (a), simply note that cu1, cu2 ∈ R. To check part (b), note that
$$c(\mathbf{u} + \mathbf{v}) = c(u_1 + v_1, u_2 + v_2) = (c(u_1 + v_1), c(u_2 + v_2)) = (cu_1 + cv_1, cu_2 + cv_2) = (cu_1, cu_2) + (cv_1, cv_2) = c\mathbf{u} + c\mathbf{v}.$$
To check part (c), note that
$$(a + b)\mathbf{u} = ((a + b)u_1, (a + b)u_2) = (au_1 + bu_1, au_2 + bu_2) = (au_1, au_2) + (bu_1, bu_2) = a\mathbf{u} + b\mathbf{u}.$$
To check part (d), note that
$$(ab)\mathbf{u} = ((ab)u_1, (ab)u_2) = (a(bu_1), a(bu_2)) = a(bu_1, bu_2) = a(b\mathbf{u}).$$
Finally, to check part (e), note that 1u = (1u1, 1u2) = (u1, u2) = u. ©

Definition. For any vector u = (u1, u2) in R2, we define the norm of u to be the non-negative real number
$$\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2}.$$

Remarks. (1) The norm of a vector is simply its magnitude or length. The definition follows from the famous theorem of Pythagoras.

(2) Suppose that P(u1, u2) and Q(v1, v2) are two points on the plane R2. To calculate the distance d(P, Q) between the two points, we can first find a vector from P to Q. This is given by (v1 − u1, v2 − u2). The distance d(P, Q) is then the norm of this vector, so that
$$d(P, Q) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2}.$$


(3) It is not difficult to see that for any vector u ∈ R2 and any scalar c ∈ R, we have ‖cu‖ = |c|‖u‖.

Definition. Any vector u ∈ R2 satisfying ‖u‖ = 1 is called a unit vector.

Example 4.2.2. The vector (3, 4) has norm 5.

Example 4.2.3. The distance between the points (6, 3) and (9, 7) is $\sqrt{(9-6)^2 + (7-3)^2} = 5$.

Example 4.2.4. The vectors (1, 0) and (0,−1) are unit vectors in R2.

Example 4.2.5. The unit vector in the direction of the vector (1, 1) is $(1/\sqrt{2}, 1/\sqrt{2})$.

Example 4.2.6. In fact, all unit vectors in R2 are of the form (cos θ, sin θ), where θ ∈ R.

Quite often, we may want to find the angle between two vectors. The scalar product of the two vectors then comes in handy. We shall define the scalar product in two ways, one in terms of the angle between the two vectors and the other not in terms of this angle, and show that the two definitions are in fact equivalent.

Definition. Suppose that u = (u1, u2) and v = (v1, v2) are vectors in R2, and that θ ∈ [0, π] represents the angle between them. We define the scalar product u · v of u and v by
$$\mathbf{u}\cdot\mathbf{v} = \begin{cases} \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta & \text{if } \mathbf{u}\neq\mathbf{0} \text{ and } \mathbf{v}\neq\mathbf{0}, \\ 0 & \text{if } \mathbf{u}=\mathbf{0} \text{ or } \mathbf{v}=\mathbf{0}. \end{cases} \tag{1}$$
Alternatively, we write
$$\mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2. \tag{2}$$

The definitions (1) and (2) are clearly equivalent if u = 0 or v = 0. On the other hand, we have the following result.

PROPOSITION 4C. Suppose that u = (u1, u2) and v = (v1, v2) are non-zero vectors in R2, and that θ ∈ [0, π] represents the angle between them. Then
$$\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta = u_1v_1 + u_2v_2.$$

Proof. Geometrically, if we represent the two vectors u and v by $\overrightarrow{OA}$ and $\overrightarrow{OB}$ respectively, then the difference v − u is represented by $\overrightarrow{AB}$ as shown in the diagram below:

[Diagram: triangle OAB, with $\overrightarrow{OA} = \mathbf{u}$, $\overrightarrow{OB} = \mathbf{v}$ and $\overrightarrow{AB} = \mathbf{v} - \mathbf{u}$.]

By the Law of cosines, we have
$$AB^2 = OA^2 + OB^2 - 2\,OA\,OB\cos\theta;$$


in other words, we have
$$\|\mathbf{v}-\mathbf{u}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta,$$
so that
$$\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v}-\mathbf{u}\|^2\right) = \tfrac{1}{2}\left(u_1^2 + u_2^2 + v_1^2 + v_2^2 - (v_1-u_1)^2 - (v_2-u_2)^2\right) = u_1v_1 + u_2v_2$$

as required. ©

Remarks. (1) We say that two non-zero vectors in R2 are orthogonal if the angle between them is π/2. It follows immediately from the definition of the scalar product that two non-zero vectors u, v ∈ R2 are orthogonal if and only if u · v = 0.

(2) We can calculate the scalar product of any two non-zero vectors u, v ∈ R2 by the formula (2) and then use the formula (1) to calculate the angle between u and v.

Example 4.2.7. Suppose that u = (√3, 1) and v = (√3, 3). Then by the formula (2), we have u · v = 3 + 3 = 6. Note now that ‖u‖ = 2 and ‖v‖ = 2√3. It follows from the formula (1) that
$$\cos\theta = \frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|} = \frac{6}{4\sqrt{3}} = \frac{\sqrt{3}}{2},$$
so that θ = π/6.

Example 4.2.8. Suppose that u = (√3, 1) and v = (−√3, 3). Then by the formula (2), we have u · v = 0. It follows that u and v are orthogonal.

PROPOSITION 4D. (SCALAR PRODUCT) Suppose that u, v, w ∈ R2 and c ∈ R. Then
(a) u · v = v · u;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v = u · (cv);
(d) u · u ≥ 0; and
(e) u · u = 0 if and only if u = 0.

Proof. Write u = (u1, u2), v = (v1, v2) and w = (w1, w2), where u1, u2, v1, v2, w1, w2 ∈ R. Part (a) is trivial. To check part (b), note that
$$\mathbf{u}\cdot(\mathbf{v} + \mathbf{w}) = u_1(v_1 + w_1) + u_2(v_2 + w_2) = (u_1v_1 + u_2v_2) + (u_1w_1 + u_2w_2) = \mathbf{u}\cdot\mathbf{v} + \mathbf{u}\cdot\mathbf{w}.$$
Part (c) is rather simple. To check parts (d) and (e), note that $\mathbf{u}\cdot\mathbf{u} = u_1^2 + u_2^2 \geq 0$, and that equality holds precisely when u1 = u2 = 0. ©


Consider the diagram below:

(3) [Diagram: a and u represented by $\overrightarrow{OA}$ and $\overrightarrow{OP}$, with $\overrightarrow{OQ} = \mathbf{w}$ the projection of u on the line OA and $\overrightarrow{OR} = \mathbf{v}$ the projection of u on the line through O perpendicular to OA.]

Here we represent the two vectors a and u by $\overrightarrow{OA}$ and $\overrightarrow{OP}$ respectively. If we project the vector u on to the line OA, then the image of the projection is the vector w, represented by $\overrightarrow{OQ}$. On the other hand, if we project the vector u on to a line perpendicular to the line OA, then the image of the projection is the vector v, represented by $\overrightarrow{OR}$.

Definition. In the notation of the diagram (3), the vector w is called the orthogonal projection of the vector u on the vector a, and denoted by w = $\mathrm{proj}_{\mathbf{a}}\mathbf{u}$.

PROPOSITION 4E. (ORTHOGONAL PROJECTION) Suppose that u, a ∈ R2. Then
$$\mathrm{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}.$$

Remark. Note that the component of u orthogonal to a, represented by $\overrightarrow{OR}$ in the diagram (3), is
$$\mathbf{u} - \mathrm{proj}_{\mathbf{a}}\mathbf{u} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}.$$

Proof of Proposition 4E. Note that w = ka for some k ∈ R. It clearly suffices to prove that
$$k = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}.$$
It is easy to see that the vectors u − w and a are orthogonal. It follows that the scalar product (u − w) · a = 0. In other words, (u − ka) · a = 0. Hence
$$k = \frac{\mathbf{u}\cdot\mathbf{a}}{\mathbf{a}\cdot\mathbf{a}} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}$$

as required. ©

To end this section, we shall apply our knowledge gained so far to find a formula that gives the perpendicular distance of a point (x0, y0) from a line ax + by + c = 0. Consider the diagram below:

[Diagram: the line ax + by + c = 0 passes through a point O, with normal $\mathbf{n} = (a, b)$ in the direction $\overrightarrow{OQ}$; the point P represents $(x_0, y_0)$, $\overrightarrow{OP} = \mathbf{u}$, and D is the perpendicular distance from P to the line.]


Suppose that (x1, y1) is an arbitrary point O on the line ax + by + c = 0. For any other point (x, y) on the line ax + by + c = 0, the vector (x − x1, y − y1) is parallel to the line. On the other hand,
$$(a, b)\cdot(x - x_1, y - y_1) = (ax + by) - (ax_1 + by_1) = -c + c = 0,$$
so that the vector n = (a, b), in the direction $\overrightarrow{OQ}$, is perpendicular to the line ax + by + c = 0. Suppose next that the point (x0, y0) is represented by the point P in the diagram. Then the vector u = (x0 − x1, y0 − y1) is represented by $\overrightarrow{OP}$, and $\overrightarrow{OQ}$ represents the orthogonal projection $\mathrm{proj}_{\mathbf{n}}\mathbf{u}$ of u on the vector n. Clearly the perpendicular distance D of the point (x0, y0) from the line ax + by + c = 0 satisfies

$$D = \|\mathrm{proj}_{\mathbf{n}}\mathbf{u}\| = \left\|\frac{\mathbf{u}\cdot\mathbf{n}}{\|\mathbf{n}\|^2}\,\mathbf{n}\right\| = \frac{|(x_0-x_1, y_0-y_1)\cdot(a, b)|}{\sqrt{a^2+b^2}} = \frac{|ax_0+by_0-ax_1-by_1|}{\sqrt{a^2+b^2}} = \frac{|ax_0+by_0+c|}{\sqrt{a^2+b^2}}.$$

We have proved the following result.

PROPOSITION 4F. The perpendicular distance D of a point (x0, y0) from a line ax + by + c = 0 is given by
$$D = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}}.$$

Example 4.2.9. The perpendicular distance D of the point (5, 7) from the line 2x − 3y + 5 = 0 is given by
$$D = \frac{|10 - 21 + 5|}{\sqrt{4 + 9}} = \frac{6}{\sqrt{13}}.$$

4.3. Vectors in R3

In this section, we consider the same problems as in Section 4.2, but in 3-space R3. Any reader who feels confident may skip this section.

A vector in 3-space R3 can be described as an ordered triple u = (u1, u2, u3), where u1, u2, u3 ∈ R.

Definition. Two vectors u = (u1, u2, u3) and v = (v1, v2, v3) in R3 are said to be equal, denoted by u = v, if u1 = v1, u2 = v2 and u3 = v3.

Definition. For any two vectors u = (u1, u2, u3) and v = (v1, v2, v3) in R3, we define their sum to be

u + v = (u1, u2, u3) + (v1, v2, v3) = (u1 + v1, u2 + v2, u3 + v3).

Definition. For any vector u = (u1, u2, u3) in R3 and any scalar c ∈ R, we define the scalar multiple to be

cu = c(u1, u2, u3) = (cu1, cu2, cu3).

The following two results are the analogues of Propositions 4A and 4B. The proofs are essentially similar.


PROPOSITION 4A'. (VECTOR ADDITION)
(a) For every u, v ∈ R3, we have u + v ∈ R3.
(b) For every u, v, w ∈ R3, we have u + (v + w) = (u + v) + w.
(c) For every u ∈ R3, we have u + 0 = u, where 0 = (0, 0, 0) ∈ R3.
(d) For every u ∈ R3, there exists v ∈ R3 such that u + v = 0.
(e) For every u, v ∈ R3, we have u + v = v + u.

PROPOSITION 4B'. (SCALAR MULTIPLICATION)
(a) For every c ∈ R and u ∈ R3, we have cu ∈ R3.
(b) For every c ∈ R and u, v ∈ R3, we have c(u + v) = cu + cv.
(c) For every a, b ∈ R and u ∈ R3, we have (a + b)u = au + bu.
(d) For every a, b ∈ R and u ∈ R3, we have (ab)u = a(bu).
(e) For every u ∈ R3, we have 1u = u.

Definition. For any vector u = (u1, u2, u3) in R3, we define the norm of u to be the non-negative real number
$$\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + u_3^2}.$$

Remarks. (1) Suppose that P(u1, u2, u3) and Q(v1, v2, v3) are two points in R3. To calculate the distance d(P, Q) between the two points, we can first find a vector from P to Q. This is given by (v1 − u1, v2 − u2, v3 − u3). The distance d(P, Q) is then the norm of this vector, so that
$$d(P, Q) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2}.$$

(2) It is not difficult to see that for any vector u ∈ R3 and any scalar c ∈ R, we have

‖cu‖ = |c|‖u‖.

Definition. Any vector u ∈ R3 satisfying ‖u‖ = 1 is called a unit vector.

Example 4.3.1. The vector (3, 4, 12) has norm 13.

Example 4.3.2. The distance between the points (6, 3, 12) and (9, 7, 0) is 13.

Example 4.3.3. The vectors (1, 0, 0) and (0,−1, 0) are unit vectors in R3.

Example 4.3.4. The unit vector in the direction of the vector (1, 0, 1) is $(1/\sqrt{2}, 0, 1/\sqrt{2})$.

The theory of scalar products can be extended to R3 in the natural way.

Definition. Suppose that u = (u1, u2, u3) and v = (v1, v2, v3) are vectors in R3, and that θ ∈ [0, π] represents the angle between them. We define the scalar product u · v of u and v by
$$\mathbf{u}\cdot\mathbf{v} = \begin{cases} \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta & \text{if } \mathbf{u}\neq\mathbf{0} \text{ and } \mathbf{v}\neq\mathbf{0}, \\ 0 & \text{if } \mathbf{u}=\mathbf{0} \text{ or } \mathbf{v}=\mathbf{0}. \end{cases} \tag{4}$$
Alternatively, we write
$$\mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3. \tag{5}$$

The definitions (4) and (5) are clearly equivalent if u = 0 or v = 0. On the other hand, we have the following analogue of Proposition 4C. The proof is similar.


PROPOSITION 4C'. Suppose that u = (u1, u2, u3) and v = (v1, v2, v3) are non-zero vectors in R3, and that θ ∈ [0, π] represents the angle between them. Then
$$\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta = u_1v_1 + u_2v_2 + u_3v_3.$$

Remarks. (1) We say that two non-zero vectors in R3 are orthogonal if the angle between them is π/2. It follows immediately from the definition of the scalar product that two non-zero vectors u, v ∈ R3 are orthogonal if and only if u · v = 0.

(2) We can calculate the scalar product of any two non-zero vectors u, v ∈ R3 by the formula (5) and then use the formula (4) to calculate the angle between u and v.

Example 4.3.5. Suppose that u = (2, 0, 0) and v = (1, 1, √2). Then by the formula (5), we have u · v = 2. Note now that ‖u‖ = 2 and ‖v‖ = 2. It follows from the formula (4) that
$$\cos\theta = \frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|} = \frac{2}{4} = \frac{1}{2},$$
so that θ = π/3.

Example 4.3.6. Suppose that u = (2, 3, 5) and v = (1, 1, −1). Then by the formula (5), we have u · v = 0. It follows that u and v are orthogonal.

The following result is the analogue of Proposition 4D. The proof is similar.

PROPOSITION 4D'. (SCALAR PRODUCT) Suppose that u, v, w ∈ R3 and c ∈ R. Then
(a) u · v = v · u;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v = u · (cv);
(d) u · u ≥ 0; and
(e) u · u = 0 if and only if u = 0.

Suppose now that a and u are two vectors in R3. Then since two vectors are always coplanar, we can draw the following diagram which represents the plane they lie on:

(6) [Diagram: the same configuration as diagram (3), drawn in the plane containing a and u.]

Note that this diagram is essentially the same as the diagram (3), the only difference being that while the diagram (3) shows the whole of R2, the diagram (6) only shows part of R3. As before, we represent the two vectors a and u by $\overrightarrow{OA}$ and $\overrightarrow{OP}$ respectively. If we project the vector u on to the line OA, then the image of the projection is the vector w, represented by $\overrightarrow{OQ}$. On the other hand, if we project the vector u on to a line perpendicular to the line OA, then the image of the projection is the vector v, represented by $\overrightarrow{OR}$.

Definition. In the notation of the diagram (6), the vector w is called the orthogonal projection of the vector u on the vector a, and denoted by w = $\mathrm{proj}_{\mathbf{a}}\mathbf{u}$.


The following result is the analogue of Proposition 4E. The proof is similar.

PROPOSITION 4E'. (ORTHOGONAL PROJECTION) Suppose that u, a ∈ R3. Then
$$\mathrm{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}.$$

Remark. Note that the component of u orthogonal to a, represented by $\overrightarrow{OR}$ in the diagram (6), is
$$\mathbf{u} - \mathrm{proj}_{\mathbf{a}}\mathbf{u} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}.$$
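As a quick illustration of Propositions 4E and 4E' (a sketch of our own, not from the text), the projection formula can be computed directly from scalar products.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def proj(u, a):
    # Propositions 4E and 4E': proj_a(u) = (u . a / ||a||^2) a.
    k = dot(u, a) / dot(a, a)
    return tuple(k * ai for ai in a)

u, a = (3, 1, 2), (1, 0, 1)
w = proj(u, a)                              # (2.5, 0.0, 2.5)
r = tuple(ui - wi for ui, wi in zip(u, w))  # component of u orthogonal to a
print(w, r, dot(r, a))                      # dot(r, a) is 0, as expected
```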

4.4. Vector Products

In this section, we shall discuss a product of vectors unique to R3. The idea of vector products has wide applications in geometry, physics and engineering, and is motivated by the wish to find a vector that is perpendicular to two given vectors.

We shall use the right hand rule. In other words, if we hold the thumb on the right hand upwards and close the remaining four fingers, then the fingers point from the x-direction towards the y-direction, while the thumb points towards the z-direction. Alternatively, if we imagine Columbus had never lived and that the earth were flat, then taking the x-direction as east and the y-direction as north, the z-direction is upwards!

We shall frequently use the three vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) in R3.

Definition. Suppose that u = (u1, u2, u3) and v = (v1, v2, v3) are two vectors in R3. Then the vector product u × v is defined by the determinant
$$\mathbf{u}\times\mathbf{v} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix}.$$

Remarks. (1) Note that
$$\mathbf{i}\times\mathbf{j} = -(\mathbf{j}\times\mathbf{i}) = \mathbf{k}, \qquad \mathbf{j}\times\mathbf{k} = -(\mathbf{k}\times\mathbf{j}) = \mathbf{i}, \qquad \mathbf{k}\times\mathbf{i} = -(\mathbf{i}\times\mathbf{k}) = \mathbf{j}.$$

(2) Using cofactor expansion by row 1, we have
$$\mathbf{u}\times\mathbf{v} = \det\begin{pmatrix} u_2 & u_3 \\ v_2 & v_3 \end{pmatrix}\mathbf{i} - \det\begin{pmatrix} u_1 & u_3 \\ v_1 & v_3 \end{pmatrix}\mathbf{j} + \det\begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix}\mathbf{k} = (u_2v_3 - u_3v_2,\ u_3v_1 - u_1v_3,\ u_1v_2 - u_2v_1).$$

We shall first of all show that the vector product u× v is orthogonal to both u and v.

Chapter 4 : Vectors page 10 of 24

Page 106: Linear Algebra WWL Chen

Linear Algebra c© W W L Chen, 1982, 2008

PROPOSITION 4G. Suppose that u = (u1, u2, u3) and v = (v1, v2, v3) are two vectors in R3. Then
(a) u · (u × v) = 0; and
(b) v · (u × v) = 0.

Proof. Note first of all that
$$\mathbf{u}\cdot(\mathbf{u}\times\mathbf{v}) = u_1\det\begin{pmatrix} u_2 & u_3 \\ v_2 & v_3 \end{pmatrix} - u_2\det\begin{pmatrix} u_1 & u_3 \\ v_1 & v_3 \end{pmatrix} + u_3\det\begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix} = \det\begin{pmatrix} u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix},$$
in view of cofactor expansion by row 1. On the other hand, clearly
$$\det\begin{pmatrix} u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = 0.$$

This proves part (a). The proof of part (b) is similar. ©

Example 4.4.1. Suppose that u = (1, −1, 2) and v = (3, 0, 2). Then
$$\mathbf{u}\times\mathbf{v} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & -1 & 2 \\ 3 & 0 & 2 \end{pmatrix} = \left(\det\begin{pmatrix} -1 & 2 \\ 0 & 2 \end{pmatrix}, -\det\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}, \det\begin{pmatrix} 1 & -1 \\ 3 & 0 \end{pmatrix}\right) = (-2, 4, 3).$$
Note that (1, −1, 2) · (−2, 4, 3) = 0 and (3, 0, 2) · (−2, 4, 3) = 0.

PROPOSITION 4H. (VECTOR PRODUCT) Suppose that u, v, w ∈ R3 and c ∈ R. Then
(a) u × v = −(v × u);
(b) u × (v + w) = (u × v) + (u × w);
(c) (u + v) × w = (u × w) + (v × w);
(d) c(u × v) = (cu) × v = u × (cv);
(e) u × 0 = 0; and
(f) u × u = 0.

Proof. Write u = (u1, u2, u3), v = (v1, v2, v3) and w = (w1, w2, w3). To check part (a), note that
$$\det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = -\det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ u_1 & u_2 & u_3 \end{pmatrix}.$$
To check part (b), note that
$$\det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1+w_1 & v_2+w_2 & v_3+w_3 \end{pmatrix} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} + \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ w_1 & w_2 & w_3 \end{pmatrix}.$$
Part (c) is similar. To check part (d), note that
$$c\det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ cu_1 & cu_2 & cu_3 \\ v_1 & v_2 & v_3 \end{pmatrix} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ cv_1 & cv_2 & cv_3 \end{pmatrix}.$$
To check parts (e) and (f), note that
$$\mathbf{u}\times\mathbf{0} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ 0 & 0 & 0 \end{pmatrix} = \mathbf{0} \quad\text{and}\quad \mathbf{u}\times\mathbf{u} = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ u_1 & u_2 & u_3 \end{pmatrix} = \mathbf{0}$$

as required. ©


Next, we shall discuss an application of the vector product to the evaluation of the area of a parallelogram. To do this, we shall first establish the following result.

PROPOSITION 4J. Suppose that u = (u1, u2, u3) and v = (v1, v2, v3) are non-zero vectors in R3, and that θ ∈ [0, π] represents the angle between them. Then
(a) $\|\mathbf{u}\times\mathbf{v}\|^2 = \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u}\cdot\mathbf{v})^2$; and
(b) $\|\mathbf{u}\times\mathbf{v}\| = \|\mathbf{u}\|\|\mathbf{v}\|\sin\theta$.

Proof. Note that
$$\|\mathbf{u}\times\mathbf{v}\|^2 = (u_2v_3 - u_3v_2)^2 + (u_3v_1 - u_1v_3)^2 + (u_1v_2 - u_2v_1)^2 \tag{7}$$
and
$$\|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u}\cdot\mathbf{v})^2 = (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1v_1 + u_2v_2 + u_3v_3)^2. \tag{8}$$
Part (a) follows on expanding the right hand sides of (7) and (8) and checking that they are equal. To prove part (b), recall that
$$\mathbf{u}\cdot\mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta.$$
Combining with part (a), we obtain
$$\|\mathbf{u}\times\mathbf{v}\|^2 = \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - \|\mathbf{u}\|^2\|\mathbf{v}\|^2\cos^2\theta = \|\mathbf{u}\|^2\|\mathbf{v}\|^2\sin^2\theta.$$

Part (b) follows. ©

Consider now a parallelogram with vertices O, A, B, C. Suppose that u and v are represented by $\overrightarrow{OA}$ and $\overrightarrow{OC}$ respectively. If we imagine the side OA to represent the base of the parallelogram, so that the base has length ‖u‖, then the height of the parallelogram is given by ‖v‖ sin θ, as shown in the diagram below:

[Diagram: parallelogram OABC, with base OA of length ‖u‖ and height ‖v‖ sin θ.]

It follows from Proposition 4J that the area of the parallelogram is given by ‖u × v‖. We have proved the following result.

PROPOSITION 4K. Suppose that u, v ∈ R3. Then the parallelogram with u and v as two of its sides has area ‖u × v‖.
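As a small numerical illustration of Proposition 4K (a sketch of our own, not from the text), the area of a parallelogram comes straight from the norm of the vector product.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Proposition 4K: area of the parallelogram with sides u and v is ||u x v||.
u, v = (1, 0, 0), (1, 1, 0)
print(math.sqrt(sum(c * c for c in cross(u, v))))  # 1.0: base 1 and height 1
```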

We conclude this section by making a remark on the vector product u × v of two vectors in R3. Recall that the vector product is perpendicular to both u and v. Furthermore, it can be shown that the direction of u × v satisfies the right hand rule, in the sense that if we hold the thumb on the right hand outwards and close the remaining four fingers, then the thumb points towards the u × v-direction when the fingers point from the u-direction towards the v-direction. Also, we showed in Proposition 4J that


the magnitude of u × v depends only on the norms of u and v and the angle between the two vectors. It follows that the vector product is unchanged as long as we keep a right hand coordinate system. This is an important consideration in physics and engineering, where we may use different coordinate systems on the same problem.

4.5. Scalar Triple Products

Suppose that u, v, w ∈ R3 do not all lie on the same plane. Consider the parallelepiped with u, v, w as three of its edges. We are interested in calculating the volume of this parallelepiped. Suppose that u, v and w are represented by $\overrightarrow{OA}$, $\overrightarrow{OB}$ and $\overrightarrow{OC}$ respectively. Consider the diagram below:

[Diagram: parallelepiped with edges $\overrightarrow{OA} = \mathbf{u}$, $\overrightarrow{OB} = \mathbf{v}$ and $\overrightarrow{OC} = \mathbf{w}$; the vector v × w is perpendicular to the base containing O, B and C, OP is perpendicular to the base, and PA is perpendicular to OP.]

By Proposition 4K, the base of this parallelepiped, with O, B, C as three of the vertices, has area ‖v × w‖. Next, note that if OP is perpendicular to the base of the parallelepiped, then $\overrightarrow{OP}$ is in the direction of v × w. If PA is perpendicular to OP, then the height of the parallelepiped is equal to the norm of the orthogonal projection of u on v × w. In other words, the parallelepiped has height
$$\|\mathrm{proj}_{\mathbf{v}\times\mathbf{w}}\mathbf{u}\| = \left\|\frac{\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w})}{\|\mathbf{v}\times\mathbf{w}\|^2}(\mathbf{v}\times\mathbf{w})\right\| = \frac{|\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w})|}{\|\mathbf{v}\times\mathbf{w}\|}.$$
Hence the volume of the parallelepiped is given by
$$V = |\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w})|.$$
We have proved the following result.

PROPOSITION 4L. Suppose that u, v, w ∈ R3. Then the parallelepiped with u, v and w as three of its edges has volume |u · (v × w)|.

Definition. Suppose that u, v, w ∈ R3. Then u · (v × w) is called the scalar triple product of u, v and w.

Remarks. (1) It follows immediately from Proposition 4L that three vectors in R3 are coplanar if and only if their scalar triple product is zero.

(2) Note that
$$\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w}) = u_1\det\begin{pmatrix} v_2 & v_3 \\ w_2 & w_3 \end{pmatrix} - u_2\det\begin{pmatrix} v_1 & v_3 \\ w_1 & w_3 \end{pmatrix} + u_3\det\begin{pmatrix} v_1 & v_2 \\ w_1 & w_2 \end{pmatrix} = \det\begin{pmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{pmatrix}, \tag{9}$$

in view of cofactor expansion by row 1.


(3) It follows from identity (9) that
$$\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w}) = \mathbf{v}\cdot(\mathbf{w}\times\mathbf{u}) = \mathbf{w}\cdot(\mathbf{u}\times\mathbf{v}).$$
Note that each of the determinants can be obtained from the other two by twice interchanging two rows.

Example 4.5.1. Suppose that u = (1, 0, 1), v = (2, 1, 3) and w = (0, 1, 1). Then
$$\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w}) = \det\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{pmatrix} = 0,$$
so that u, v and w are coplanar.

Example 4.5.2. The volume of the parallelepiped with u = (1, 0, 1), v = (2, 1, 4) and w = (0, 1, 1) as three of its edges is given by
$$|\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w})| = \left|\det\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 4 \\ 0 & 1 & 1 \end{pmatrix}\right| = |-1| = 1.$$

4.6. Application to Geometry in R3

In this section, we shall study lines and planes in R3 by using our results on vectors in R3.

Consider first of all a plane in R3. Suppose that (x1, y1, z1) ∈ R3 is a given point on this plane. Suppose further that n = (a, b, c) is a vector perpendicular to this plane. Then for any arbitrary point (x, y, z) ∈ R3 on this plane, the vector
$$(x, y, z) - (x_1, y_1, z_1) = (x - x_1, y - y_1, z - z_1)$$
joins one point on the plane to another point on the plane, and so must be parallel to the plane and hence perpendicular to n = (a, b, c). It follows that the scalar product

(a, b, c) · (x− x1, y − y1, z − z1) = 0,

and so

a(x− x1) + b(y − y1) + c(z − z1) = 0. (10)

If we write −d = ax1 + by1 + cz1, then (10) can be rewritten in the form

ax+ by + cz + d = 0. (11)

Equation (10) is usually called the point-normal form of the equation of a plane, while equation (11) is usually known as the general form of the equation of a plane.

Example 4.6.1. Consider the plane through the point (2, −5, 7) and perpendicular to the vector (3, 5, −4). Here (a, b, c) = (3, 5, −4) and (x1, y1, z1) = (2, −5, 7). The equation of the plane is given in point-normal form by 3(x − 2) + 5(y + 5) − 4(z − 7) = 0, and in general form by 3x + 5y − 4z + 37 = 0. Here −d = 6 − 25 − 28 = −37.


Example 4.6.2. Consider the plane through the points (1, 1, 1), (2, 2, 0) and (4, −6, 2). Then the vectors
$$(2, 2, 0) - (1, 1, 1) = (1, 1, -1) \quad\text{and}\quad (4, -6, 2) - (1, 1, 1) = (3, -7, 1)$$
join the point (1, 1, 1) to the points (2, 2, 0) and (4, −6, 2) respectively and are therefore parallel to the plane. It follows that the vector product
$$(1, 1, -1)\times(3, -7, 1) = (-6, -4, -10)$$
is perpendicular to the plane. The equation of the plane is then given by −6(x − 1) − 4(y − 1) − 10(z − 1) = 0, or 3x + 2y + 5z − 10 = 0.

Consider next a line in R3. Suppose that (x1, y1, z1) ∈ R3 is a given point on this line. Suppose further that n = (a, b, c) is a vector parallel to this line. Then for any arbitrary point (x, y, z) ∈ R3 on this line, the vector
$$(x, y, z) - (x_1, y_1, z_1) = (x - x_1, y - y_1, z - z_1)$$
joins one point on the line to another point on the line, and so must be parallel to n = (a, b, c). It follows that there is some number λ ∈ R such that
$$(x - x_1, y - y_1, z - z_1) = \lambda(a, b, c),$$
so that
$$x = x_1 + a\lambda, \qquad y = y_1 + b\lambda, \qquad z = z_1 + c\lambda, \tag{12}$$
where λ is called a parameter. Suppose further that a, b, c are all non-zero. Then, eliminating the parameter λ, we obtain
$$\frac{x - x_1}{a} = \frac{y - y_1}{b} = \frac{z - z_1}{c}. \tag{13}$$

Equations (12) are usually called the parametric form of the equations of a line, while equations (13) are usually known as the symmetric form of the equations of a line.

Example 4.6.3. Consider the line through the point (2, −5, 7) and parallel to the vector (3, 5, −4). Here (a, b, c) = (3, 5, −4) and (x1, y1, z1) = (2, −5, 7). The equations of the line are given in parametric form by
$$x = 2 + 3\lambda, \qquad y = -5 + 5\lambda, \qquad z = 7 - 4\lambda,$$
and in symmetric form by
$$\frac{x - 2}{3} = \frac{y + 5}{5} = -\frac{z - 7}{4}.$$


Example 4.6.4. Consider the line through the points (3, 0, 5) and (7, 0, 8). Then a vector in the direction of the line is given by
$$(7, 0, 8) - (3, 0, 5) = (4, 0, 3).$$
The equations of the line are then given in parametric form by
$$x = 3 + 4\lambda, \qquad y = 0, \qquad z = 5 + 3\lambda,$$
and in symmetric form by
$$\frac{x - 3}{4} = \frac{z - 5}{3} \quad\text{and}\quad y = 0.$$

Consider the plane through three fixed points (x1, y1, z1), (x2, y2, z2) and (x3, y3, z3), not lying on the same line. Let (x, y, z) be a point on the plane. Then the vectors
$$(x, y, z) - (x_1, y_1, z_1) = (x - x_1, y - y_1, z - z_1),$$
$$(x, y, z) - (x_2, y_2, z_2) = (x - x_2, y - y_2, z - z_2),$$
$$(x, y, z) - (x_3, y_3, z_3) = (x - x_3, y - y_3, z - z_3),$$
each joining one point on the plane to another point on the plane, are all parallel to the plane. Using the vector product, we see that the vector
$$(x - x_2, y - y_2, z - z_2)\times(x - x_3, y - y_3, z - z_3)$$
is perpendicular to the plane, and so perpendicular to the vector (x − x1, y − y1, z − z1). It follows that the scalar triple product
$$(x - x_1, y - y_1, z - z_1)\cdot\big((x - x_2, y - y_2, z - z_2)\times(x - x_3, y - y_3, z - z_3)\big) = 0;$$
in other words,
$$\det\begin{pmatrix} x - x_1 & y - y_1 & z - z_1 \\ x - x_2 & y - y_2 & z - z_2 \\ x - x_3 & y - y_3 & z - z_3 \end{pmatrix} = 0.$$

This is another technique to find the equation of a plane through three fixed points.

Example 4.6.5. We return to the plane in Example 4.6.2, through the three points (1, 1, 1), (2, 2, 0) and (4, −6, 2). The equation is given by
$$\det\begin{pmatrix} x - 1 & y - 1 & z - 1 \\ x - 2 & y - 2 & z - 0 \\ x - 4 & y + 6 & z - 2 \end{pmatrix} = 0.$$
The determinant on the left hand side is equal to −6x − 4y − 10z + 20. Hence the equation of the plane is given by −6x − 4y − 10z + 20 = 0, or 3x + 2y + 5z − 10 = 0.

We observe that the calculation for the determinant above is not very pleasant. However, the technique can be improved in the following way by making less reference to the unknown point (x, y, z). Note that the vectors
$$(x, y, z) - (x_1, y_1, z_1) = (x - x_1, y - y_1, z - z_1),$$
$$(x_2, y_2, z_2) - (x_1, y_1, z_1) = (x_2 - x_1, y_2 - y_1, z_2 - z_1),$$
$$(x_3, y_3, z_3) - (x_1, y_1, z_1) = (x_3 - x_1, y_3 - y_1, z_3 - z_1),$$


each joining one point on the plane to another point on the plane, are all parallel to the plane. Using the vector product, we see that the vector
$$(x_2 - x_1, y_2 - y_1, z_2 - z_1)\times(x_3 - x_1, y_3 - y_1, z_3 - z_1)$$
is perpendicular to the plane, and so perpendicular to the vector (x − x1, y − y1, z − z1). It follows that the scalar triple product
$$(x - x_1, y - y_1, z - z_1)\cdot\big((x_2 - x_1, y_2 - y_1, z_2 - z_1)\times(x_3 - x_1, y_3 - y_1, z_3 - z_1)\big) = 0;$$
in other words,
$$\det\begin{pmatrix} x - x_1 & y - y_1 & z - z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{pmatrix} = 0.$$

Example 4.6.6. We return to the plane in Examples 4.6.2 and 4.6.5, through the three points (1, 1, 1), (2, 2, 0) and (4, −6, 2). The equation is given by
$$\det\begin{pmatrix} x - 1 & y - 1 & z - 1 \\ 2 - 1 & 2 - 1 & 0 - 1 \\ 4 - 1 & -6 - 1 & 2 - 1 \end{pmatrix} = 0.$$
The determinant on the left hand side is equal to
$$\det\begin{pmatrix} x - 1 & y - 1 & z - 1 \\ 1 & 1 & -1 \\ 3 & -7 & 1 \end{pmatrix} = -6(x - 1) - 4(y - 1) - 10(z - 1) = -6x - 4y - 10z + 20.$$
Hence the equation of the plane is given by −6x − 4y − 10z + 20 = 0, or 3x + 2y + 5z − 10 = 0.
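The second determinant technique is straightforward to mechanize. Here is a minimal Python sketch of our own that recovers the plane of Examples 4.6.2, 4.6.5 and 4.6.6 from its three points.

```python
def plane_through(p1, p2, p3):
    # Normal n = (p2 - p1) x (p3 - p1); the plane is n . (x - p1) = 0,
    # i.e. a x + b y + c z + d = 0 with (a, b, c) = n and d = -(n . p1).
    u = tuple(q - p for p, q in zip(p1, p2))
    v = tuple(q - p for p, q in zip(p1, p3))
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    d = -sum(ni * pi for ni, pi in zip(n, p1))
    return n + (d,)

print(plane_through((1, 1, 1), (2, 2, 0), (4, -6, 2)))
# (-6, -4, -10, 20): divide by -2 to get 3x + 2y + 5z - 10 = 0
```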

We next consider the problem of dividing a line segment in a given ratio. Suppose that x1 and x2 are two given points in R3.

We wish to divide the line segment joining x1 and x2 internally in the ratio α1 : α2, where α1 and α2 are positive real numbers. In other words, we wish to find the point x on the line segment joining x1 and x2 such that
$$\frac{\|\mathbf{x} - \mathbf{x}_1\|}{\|\mathbf{x} - \mathbf{x}_2\|} = \frac{\alpha_1}{\alpha_2},$$

as shown in the diagram below:

[Diagram: collinear points x1, x, x2 with x between x1 and x2, and the lengths ‖x − x1‖ and ‖x − x2‖ marked.]

Since x − x1 and x2 − x are both in the same direction as x2 − x1, we must have
$$\alpha_2(\mathbf{x} - \mathbf{x}_1) = \alpha_1(\mathbf{x}_2 - \mathbf{x}), \quad\text{or}\quad \mathbf{x} = \frac{\alpha_1\mathbf{x}_2 + \alpha_2\mathbf{x}_1}{\alpha_1 + \alpha_2}.$$

We wish next to find the point x on the line joining x1 and x2, but not between x1 and x2, such that
$$\frac{\|\mathbf{x} - \mathbf{x}_1\|}{\|\mathbf{x} - \mathbf{x}_2\|} = \frac{\alpha_1}{\alpha_2},$$


where α1 and α2 are positive real numbers, as shown in the diagrams below for the cases α1 < α2 and α1 > α2 respectively:

[Diagrams: collinear points with x outside the segment x1x2, lying beyond x1 when α1 < α2 and beyond x2 when α1 > α2.]

Since x − x1 and x − x2 are in the same direction as each other, we must have
$$\alpha_2(\mathbf{x} - \mathbf{x}_1) = \alpha_1(\mathbf{x} - \mathbf{x}_2), \quad\text{or}\quad \mathbf{x} = \frac{\alpha_1\mathbf{x}_2 - \alpha_2\mathbf{x}_1}{\alpha_1 - \alpha_2}.$$

Example 4.6.7. Let x1 = (1, 2, 3) and x2 = (7, 11, 6). The point
$$\mathbf{x} = \frac{2\mathbf{x}_2 + \mathbf{x}_1}{2 + 1} = \frac{2(7, 11, 6) + (1, 2, 3)}{3} = (5, 8, 5)$$
divides the line segment joining (1, 2, 3) and (7, 11, 6) internally in the ratio 2 : 1, whereas the point
$$\mathbf{x} = \frac{4\mathbf{x}_2 - 2\mathbf{x}_1}{4 - 2} = \frac{4(7, 11, 6) - 2(1, 2, 3)}{2} = (13, 20, 9)$$
satisfies
$$\frac{\|\mathbf{x} - \mathbf{x}_1\|}{\|\mathbf{x} - \mathbf{x}_2\|} = \frac{4}{2}.$$

Finally we turn our attention to the question of finding the distance of a plane from a given point. We shall prove the following analogue of Proposition 4F.

PROPOSITION 4F'. The perpendicular distance D of a plane ax + by + cz + d = 0 from a point (x0, y0, z0) is given by
$$D = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}}.$$

Proof. Consider the following diagram:

[Diagram: the plane ax + by + cz + d = 0 passes through a point O, with normal $\mathbf{n} = (a, b, c)$ in the direction $\overrightarrow{OQ}$; the point P represents $(x_0, y_0, z_0)$, $\overrightarrow{OP} = \mathbf{u}$, and D is the perpendicular distance from P to the plane.]


Suppose that (x1, y1, z1) is an arbitrary point O on the plane ax + by + cz + d = 0. For any other point (x, y, z) on the plane ax + by + cz + d = 0, the vector (x − x1, y − y1, z − z1) is parallel to the plane. On the other hand,
$$(a, b, c)\cdot(x - x_1, y - y_1, z - z_1) = (ax + by + cz) - (ax_1 + by_1 + cz_1) = -d + d = 0,$$
so that the vector n = (a, b, c), in the direction $\overrightarrow{OQ}$, is perpendicular to the plane ax + by + cz + d = 0. Suppose next that the point (x0, y0, z0) is represented by the point P in the diagram. Then the vector u = (x0 − x1, y0 − y1, z0 − z1) is represented by $\overrightarrow{OP}$, and $\overrightarrow{OQ}$ represents the orthogonal projection $\mathrm{proj}_{\mathbf{n}}\mathbf{u}$ of u on the vector n. Clearly the perpendicular distance D of the point (x0, y0, z0) from the plane ax + by + cz + d = 0 satisfies

$$D = \|\mathrm{proj}_{\mathbf{n}}\mathbf{u}\| = \left\|\frac{\mathbf{u}\cdot\mathbf{n}}{\|\mathbf{n}\|^2}\,\mathbf{n}\right\| = \frac{|(x_0-x_1, y_0-y_1, z_0-z_1)\cdot(a, b, c)|}{\sqrt{a^2+b^2+c^2}} = \frac{|ax_0+by_0+cz_0-ax_1-by_1-cz_1|}{\sqrt{a^2+b^2+c^2}} = \frac{|ax_0+by_0+cz_0+d|}{\sqrt{a^2+b^2+c^2}}$$

as required. ©

A special case of Proposition 4F' is when (x0, y0, z0) = (0, 0, 0) is the origin. This shows that the perpendicular distance of the plane ax + by + cz + d = 0 from the origin is
$$\frac{|d|}{\sqrt{a^2 + b^2 + c^2}}.$$

Example 4.6.8. Consider the plane 3x + 5y − 4z + 37 = 0. The distance of the point (1, 2, 3) from the plane is
$$\frac{|3 + 10 - 12 + 37|}{\sqrt{9 + 25 + 16}} = \frac{38}{\sqrt{50}} = \frac{19\sqrt{2}}{5}.$$
The distance of the origin from the plane is
$$\frac{|37|}{\sqrt{9 + 25 + 16}} = \frac{37}{\sqrt{50}}.$$

Example 4.6.9. Consider also the plane 3x + 5y − 4z − 1 = 0. Note that this plane is also perpendicular to the vector (3, 5, −4) and is therefore parallel to the plane 3x + 5y − 4z + 37 = 0. It is therefore reasonable to find the perpendicular distance between these two parallel planes. Note that the perpendicular distance between the two planes is equal to the perpendicular distance of any point on 3x + 5y − 4z − 1 = 0 from the plane 3x + 5y − 4z + 37 = 0. Note now that (1, 2, 3) lies on the plane 3x + 5y − 4z − 1 = 0. It follows from Example 4.6.8 that the distance between the two planes is $19\sqrt{2}/5$.

4.7. Application to Mechanics

Let u = (ux, uy) denote a vector in R2, where the components ux and uy are functions of an independent variable t. Then the derivative of u with respect to t is given by
$$\frac{d\mathbf{u}}{dt} = \left(\frac{du_x}{dt}, \frac{du_y}{dt}\right).$$


Example 4.7.1. When discussing planar particle motion, we often let r = (x, y) denote the position of a particle at time t. Then the components x and y are functions of t. The derivative
$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \left(\frac{dx}{dt}, \frac{dy}{dt}\right)$$
represents the velocity of the particle, and its derivative
$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \left(\frac{d^2x}{dt^2}, \frac{d^2y}{dt^2}\right)$$
represents the acceleration of the particle. We often write r = ‖r‖, v = ‖v‖ and a = ‖a‖.

Suppose that w = (wx, wy) is another vector in R2. Then it is not difficult to see that
$$\frac{d}{dt}(\mathbf{u}\cdot\mathbf{w}) = \mathbf{u}\cdot\frac{d\mathbf{w}}{dt} + \frac{d\mathbf{u}}{dt}\cdot\mathbf{w}. \tag{14}$$

Example 4.7.2. Consider a particle moving at constant speed along a circular path centred at the origin. Then r = ‖r‖ is constant. More precisely, the position vector r = (x, y) satisfies x² + y² = c1, where c1 is a positive constant, so that
$$\mathbf{r}\cdot\mathbf{r} = (x, y)\cdot(x, y) = c_1. \tag{15}$$
On the other hand, v = ‖v‖ is constant. More precisely, the velocity vector
$$\mathbf{v} = \left(\frac{dx}{dt}, \frac{dy}{dt}\right)$$
satisfies
$$\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 = c_2,$$
where c2 is a positive constant, so that
$$\mathbf{v}\cdot\mathbf{v} = \left(\frac{dx}{dt}, \frac{dy}{dt}\right)\cdot\left(\frac{dx}{dt}, \frac{dy}{dt}\right) = c_2. \tag{16}$$
Differentiating (15) and (16) with respect to t, and using the identity (14), we obtain respectively
$$\mathbf{r}\cdot\mathbf{v} = 0 \quad\text{and}\quad \mathbf{v}\cdot\mathbf{a} = 0. \tag{17}$$

Using the properties of the scalar product, we see that the equations in (17) show that the vector v is perpendicular to both vectors r and a, and so a must be in the same direction as or the opposite direction to r. Next, differentiating the first equation in (17), we obtain
$$\mathbf{r}\cdot\mathbf{a} + \mathbf{v}\cdot\mathbf{v} = 0, \quad\text{or}\quad \mathbf{r}\cdot\mathbf{a} = -v^2 < 0.$$
Let θ denote the angle between a and r. Then θ = 0° or θ = 180°. Since
$$\mathbf{r}\cdot\mathbf{a} = \|\mathbf{r}\|\|\mathbf{a}\|\cos\theta,$$
it follows that cos θ < 0, and so θ = 180°. We also obtain ra = v², so that a = v²/r. This is a vector proof that for circular motion at constant speed, the acceleration is towards the centre of the circle and of magnitude v²/r.

Let u = (ux, uy, uz) denote a vector in R3, where the components ux, uy and uz are functions of an independent variable t. Then the derivative of u with respect to t is given by
$$\frac{d\mathbf{u}}{dt} = \left(\frac{du_x}{dt}, \frac{du_y}{dt}, \frac{du_z}{dt}\right).$$


Suppose that w = (wx, wy, wz) is another vector in R3. Then it is not difficult to see that

d(u · w)/dt = u · dw/dt + du/dt · w.      (18)

Example 4.7.3. When discussing particle motion in 3-dimensional space, we often let r = (x, y, z) denote the position of a particle at time t. Then the components x, y and z are functions of t. The derivative

v = dr/dt = (dx/dt, dy/dt, dz/dt) = (ẋ, ẏ, ż)

represents the velocity of the particle, and its derivative

a = dv/dt = (d²x/dt², d²y/dt², d²z/dt²) = (ẍ, ÿ, z̈)

represents the acceleration of the particle.

Example 4.7.4. For a particle of mass m, the kinetic energy is given by

T = ½m(ẋ² + ẏ² + ż²) = ½m(ẋ, ẏ, ż) · (ẋ, ẏ, ż) = ½mv · v.

Using the identity (18), we have

dT/dt = ma · v = F · v,

where F = ma denotes the force. On the other hand, suppose that the potential energy is given by V. Using knowledge of functions of several real variables, we can show that

dV/dt = (∂V/∂x)(dx/dt) + (∂V/∂y)(dy/dt) + (∂V/∂z)(dz/dt) = (∂V/∂x, ∂V/∂y, ∂V/∂z) · v = ∇V · v,

where

∇V = (∂V/∂x, ∂V/∂y, ∂V/∂z)

is called the gradient of V. The law of conservation of energy says that T + V is constant, so that

dT/dt + dV/dt = (F + ∇V) · v = 0

holds for all vectors v, so that F(r) = −∇V(r) for all vectors r.
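As an illustration (our own sketch, not part of the original text, under the assumption of a harmonic potential), the following Python fragment checks numerically that T + V stays constant along a path satisfying ma = −∇V:

import numpy as np

# Harmonic potential V(r) = k ||r||^2 / 2, for which grad V = k r, so that
# F(r) = -grad V = -k r.  With m = k = 1, the path r(t) = (cos t, sin t, 0)
# satisfies m d2r/dt2 = F(r).
m, k = 1.0, 1.0
for t in (0.0, 0.4, 1.3):
    r = np.array([np.cos(t), np.sin(t), 0.0])
    v = np.array([-np.sin(t), np.cos(t), 0.0])  # dr/dt
    T = 0.5 * m * np.dot(v, v)                  # kinetic energy
    V = 0.5 * k * np.dot(r, r)                  # potential energy
    print(T + V)                                # 1.0 each time: T + V is conserved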

Example 4.7.5. If a force acts on a moving particle, then the work done is defined as the product of the distance moved and the magnitude of the force in the direction of motion. Suppose that a force F acts on a particle with displacement r. Then the component of the force in the direction of the motion is given by F · u, where

u = r/‖r‖

is a unit vector in the direction of the vector r. It follows that the work done is given by

‖r‖(F · r/‖r‖) = F · r.


For instance, we see that the work done in moving a particle along a vector r = (3, −2, 4) with applied force F = (2, −1, 1) is F · r = (2, −1, 1) · (3, −2, 4) = 12.

Example 4.7.6. We can also resolve a force into components. Consider a weight of mass m hanging from the ceiling on a rope as shown in the picture below:

[Figure: a weight of mass m suspended from the ceiling by a rope.]

Here the rope makes an angle of 60° with the vertical. We wish to find the tension T on the rope. To find this, note that the tension on the rope is a force, and we have the following picture of forces:

[Figure: the two tension forces T1 and T2, each at 60° to the vertical, together with the weight of magnitude mg acting vertically downwards.]

The force T1 has magnitude ‖T1‖ = T. Let z be a unit vector pointing vertically upwards. Using scalar products, we see that the component of the force T1 in the vertical direction is

T1 · z = ‖T1‖‖z‖ cos 60° = ½T.

Similarly, the force T2 has magnitude ‖T2‖ = T, and the component of it in the vertical direction is

T2 · z = ‖T2‖‖z‖ cos 60° = ½T.

Since the weight is stationary, the total force upwards on it is ½T + ½T − mg = 0. Hence T = mg.


Problems for Chapter 4

1. For each of the following pairs of vectors in R2, calculate u + 3v, u · v, ‖u − v‖ and find the angle between u and v:
a) u = (1, 1) and v = (−5, 0) b) u = (1, 2) and v = (2, 1)

2. For each of the following pairs of vectors in R2, calculate 2u − 5v, ‖u − 2v‖, u · v and the angle between u and v (to the nearest degree):
a) u = (1, 3) and v = (−2, 1) b) u = (2, 0) and v = (−1, 2)

3. For the two vectors u = (2, 3) and v = (5, 1) in the 2-dimensional euclidean space R2, determine each of the following:
a) u − v b) ‖u‖
c) u · (u − v) d) the angle between u and u − v

4. For each of the following pairs of vectors in R3, calculate u + 3v, u · v, ‖u − v‖, find the angle between u and v, and find a unit vector perpendicular to both u and v:
a) u = (1, 1, 1) and v = (−5, 0, 5) b) u = (1, 2, 3) and v = (3, 2, 1)

5. Find vectors v and w such that v is parallel to (1, 2, 3), v + w = (7, 3, 5) and w is orthogonal to (1, 2, 3).

6. Let ABCD be a quadrilateral. Show that the quadrilateral obtained by joining the midpoints of adjacent sides of ABCD is a parallelogram.
[Hint: Let a, b, c and d be vectors representing the four sides of ABCD.]

7. Suppose that u, v and w are vectors in R3 such that the scalar triple product u · (v × w) ≠ 0. Let

u′ = (v × w)/(u · (v × w)), v′ = (w × u)/(u · (v × w)), w′ = (u × v)/(u · (v × w)).

a) Show that u′ · u = 1.
b) Show that u′ · v = u′ · w = 0.
c) Use the properties of the scalar triple product to find v′ · v and w′ · w, as well as v′ · u, v′ · w, w′ · u and w′ · v.

8. Suppose that u, v, w, u′, v′ and w′ are vectors in R3 such that u′ · u = v′ · v = w′ · w = 1 and u′ · v = u′ · w = v′ · u = v′ · w = w′ · u = w′ · v = 0. Show that if u · (v × w) ≠ 0, then

u′ = (v × w)/(u · (v × w)), v′ = (w × u)/(u · (v × w)), w′ = (u × v)/(u · (v × w)).

9. Suppose that u, v and w are vectors in R3.
a) Show that u × (v × w) = (u · w)v − (u · v)w.
b) Deduce that (u × v) × w = (u · w)v − (v · w)u.

10. Consider the three points P(2, 3, 1), Q(4, 2, 5) and R(1, 6, −3).
a) Find the equation of the line through P and Q.
b) Find the equation of the plane perpendicular to the line in part (a) and passing through R.
c) Find the distance between R and the line in part (a).
d) Find the area of the parallelogram with the three points as vertices.
e) Find the equation of the plane through the three points.
f) Find the distance of the origin (0, 0, 0) from the plane in part (e).
g) Are the planes in parts (b) and (e) perpendicular? Justify your assertion.


11. Consider the points (1, 2, 3), (0, 2, 4) and (2, 1, 3) in R3.
a) Find the area of a parallelogram with these points as three of its vertices.
b) Find the perpendicular distance between (1, 2, 3) and the line passing through (0, 2, 4) and (2, 1, 3).

12. Consider the points (1, 2, 3), (0, 2, 4) and (2, 1, 3) in R3.
a) Find a vector perpendicular to the plane containing these points.
b) Find the equation of this plane and its perpendicular distance from the origin.
c) Find the equation of the line perpendicular to this plane and passing through the point (3, 6, 9).

13. Find the equation of the plane through the points (1, 2, −3), (2, −3, 4) and (−3, 1, 2).

14. Find the equation of the plane through the points (2, −1, 1), (3, 2, −1) and (−1, 3, 2).

15. Find the volume of a parallelepiped with the points (1, 2, 3), (0, 2, 4), (2, 1, 3) and (3, 6, 9) as four of its vertices.

16. Consider a weight of mass m hanging from the ceiling supported by two ropes as shown in the picture below:

[Figure: a weight of mass m suspended from the ceiling by two ropes.]

Here the rope on the left makes an angle of 45° with the vertical, while the rope on the right makes an angle of 60° with the vertical. Find the tension on the two ropes.


LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1994, 2008.

This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 5

INTRODUCTION TO

VECTOR SPACES

5.1. Real Vector Spaces

Before we give any formal definition of a vector space, we shall consider a few concrete examples of such an abstract object. We first study two examples from the theory of vectors which we first discussed in Chapter 4.

Example 5.1.1. Consider the set R2 of all vectors of the form u = (u1, u2), where u1, u2 ∈ R. Consider vector addition and also multiplication of vectors by real numbers. It is easy to check that we have the following properties:
(1.1) For every u, v ∈ R2, we have u + v ∈ R2.
(1.2) For every u, v, w ∈ R2, we have u + (v + w) = (u + v) + w.
(1.3) For every u ∈ R2, we have u + 0 = 0 + u = u.
(1.4) For every u ∈ R2, we have u + (−u) = 0.
(1.5) For every u, v ∈ R2, we have u + v = v + u.
(2.1) For every c ∈ R and u ∈ R2, we have cu ∈ R2.
(2.2) For every c ∈ R and u, v ∈ R2, we have c(u + v) = cu + cv.
(2.3) For every a, b ∈ R and u ∈ R2, we have (a + b)u = au + bu.
(2.4) For every a, b ∈ R and u ∈ R2, we have (ab)u = a(bu).
(2.5) For every u ∈ R2, we have 1u = u.

Example 5.1.2. Consider the set R3 of all vectors of the form u = (u1, u2, u3), where u1, u2, u3 ∈ R. Consider vector addition and also multiplication of vectors by real numbers. It is easy to check that we have properties analogous to (1.1)–(1.5) and (2.1)–(2.5) in the previous example, with reference to R2 being replaced by R3.

We next turn to an example from the theory of matrices which we first discussed in Chapter 2.


Example 5.1.3. Consider the set M2,2(R) of all 2 × 2 matrices with entries in R. Consider matrix addition and also multiplication of matrices by real numbers. Denote by O the 2 × 2 null matrix. It is easy to check that we have the following properties:
(1.1) For every P, Q ∈ M2,2(R), we have P + Q ∈ M2,2(R).
(1.2) For every P, Q, R ∈ M2,2(R), we have P + (Q + R) = (P + Q) + R.
(1.3) For every P ∈ M2,2(R), we have P + O = O + P = P.
(1.4) For every P ∈ M2,2(R), we have P + (−P) = O.
(1.5) For every P, Q ∈ M2,2(R), we have P + Q = Q + P.
(2.1) For every c ∈ R and P ∈ M2,2(R), we have cP ∈ M2,2(R).
(2.2) For every c ∈ R and P, Q ∈ M2,2(R), we have c(P + Q) = cP + cQ.
(2.3) For every a, b ∈ R and P ∈ M2,2(R), we have (a + b)P = aP + bP.
(2.4) For every a, b ∈ R and P ∈ M2,2(R), we have (ab)P = a(bP).
(2.5) For every P ∈ M2,2(R), we have 1P = P.

We also turn to an example from the theory of functions.

Example 5.1.4. Consider the set A of all functions of the form f : R → R. For any two functions f, g ∈ A, define the function f + g : R → R by writing (f + g)(x) = f(x) + g(x) for every x ∈ R. For every function f ∈ A and every number c ∈ R, define the function cf : R → R by writing (cf)(x) = cf(x) for every x ∈ R. Denote by λ : R → R the function where λ(x) = 0 for every x ∈ R. Then it is easy to check that we have the following properties:
(1.1) For every f, g ∈ A, we have f + g ∈ A.
(1.2) For every f, g, h ∈ A, we have f + (g + h) = (f + g) + h.
(1.3) For every f ∈ A, we have f + λ = λ + f = f.
(1.4) For every f ∈ A, we have f + (−f) = λ.
(1.5) For every f, g ∈ A, we have f + g = g + f.
(2.1) For every c ∈ R and f ∈ A, we have cf ∈ A.
(2.2) For every c ∈ R and f, g ∈ A, we have c(f + g) = cf + cg.
(2.3) For every a, b ∈ R and f ∈ A, we have (a + b)f = af + bf.
(2.4) For every a, b ∈ R and f ∈ A, we have (ab)f = a(bf).
(2.5) For every f ∈ A, we have 1f = f.

There are many more examples of sets where properties analogous to (1.1)–(1.5) and (2.1)–(2.5) in the four examples above hold. This apparent similarity leads us to consider an abstract object which will incorporate all these individual cases as examples. We say that these examples are all vector spaces over R.

Definition. A vector space V over R, or a real vector space V, is a set of objects, known as vectors, together with vector addition + and multiplication of vectors by elements of R, and satisfying the following properties:
(VA1) For every u, v ∈ V, we have u + v ∈ V.
(VA2) For every u, v, w ∈ V, we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 ∈ V such that for every u ∈ V, we have u + 0 = 0 + u = u.
(VA4) For every u ∈ V, there exists −u ∈ V such that u + (−u) = 0.
(VA5) For every u, v ∈ V, we have u + v = v + u.
(SM1) For every c ∈ R and u ∈ V, we have cu ∈ V.
(SM2) For every c ∈ R and u, v ∈ V, we have c(u + v) = cu + cv.
(SM3) For every a, b ∈ R and u ∈ V, we have (a + b)u = au + bu.
(SM4) For every a, b ∈ R and u ∈ V, we have (ab)u = a(bu).
(SM5) For every u ∈ V, we have 1u = u.

Remark. The elements a, b, c ∈ R discussed in (SM1)–(SM5) are known as scalars. Multiplication of vectors by elements of R is sometimes known as scalar multiplication.


Example 5.1.5. Let n ∈ N. Consider the set Rn of all vectors of the form u = (u1, . . . , un), where u1, . . . , un ∈ R. For any two vectors u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn and any number c ∈ R, write

u + v = (u1 + v1, . . . , un + vn) and cu = (cu1, . . . , cun).

To check (VA1), simply note that u1 + v1, . . . , un + vn ∈ R. To check (VA2), note that if w = (w1, . . . , wn), then

u + (v + w) = (u1, . . . , un) + (v1 + w1, . . . , vn + wn) = (u1 + (v1 + w1), . . . , un + (vn + wn))
= ((u1 + v1) + w1, . . . , (un + vn) + wn) = (u1 + v1, . . . , un + vn) + (w1, . . . , wn) = (u + v) + w.

If we take 0 to be the zero vector (0, . . . , 0), then u + 0 = 0 + u = u, giving (VA3). Next, writing −u = (−u1, . . . , −un), we have u + (−u) = 0, giving (VA4). To check (VA5), note that

u + v = (u1 + v1, . . . , un + vn) = (v1 + u1, . . . , vn + un) = v + u.

To check (SM1), simply note that cu1, . . . , cun ∈ R. To check (SM2), note that

c(u + v) = c(u1 + v1, . . . , un + vn) = (c(u1 + v1), . . . , c(un + vn))
= (cu1 + cv1, . . . , cun + cvn) = (cu1, . . . , cun) + (cv1, . . . , cvn) = cu + cv.

To check (SM3), note that

(a + b)u = ((a + b)u1, . . . , (a + b)un) = (au1 + bu1, . . . , aun + bun)
= (au1, . . . , aun) + (bu1, . . . , bun) = au + bu.

To check (SM4), note that

(ab)u = ((ab)u1, . . . , (ab)un) = (a(bu1), . . . , a(bun)) = a(bu1, . . . , bun) = a(bu).

Finally, to check (SM5), note that

1u = (1u1, . . . , 1un) = (u1, . . . , un) = u.

It follows that Rn is a vector space over R. This is known as the n-dimensional euclidean space.

Example 5.1.6. Let k ∈ N. Consider the set Pk of all polynomials of the form

p(x) = p0 + p1x + . . . + pk x^k, where p0, p1, . . . , pk ∈ R.

In other words, Pk is the set of all polynomials of degree at most k and with coefficients in R. For any two polynomials p(x) = p0 + p1x + . . . + pk x^k and q(x) = q0 + q1x + . . . + qk x^k in Pk and for any number c ∈ R, write

p(x) + q(x) = (p0 + q0) + (p1 + q1)x + . . . + (pk + qk)x^k and cp(x) = cp0 + cp1x + . . . + cpk x^k.

To check (VA1), simply note that p0 + q0, . . . , pk + qk ∈ R. To check (VA2), note that if we write r(x) = r0 + r1x + . . . + rk x^k, then we have

p(x) + (q(x) + r(x)) = (p0 + p1x + . . . + pk x^k) + ((q0 + r0) + (q1 + r1)x + . . . + (qk + rk)x^k)
= (p0 + (q0 + r0)) + (p1 + (q1 + r1))x + . . . + (pk + (qk + rk))x^k
= ((p0 + q0) + r0) + ((p1 + q1) + r1)x + . . . + ((pk + qk) + rk)x^k
= ((p0 + q0) + (p1 + q1)x + . . . + (pk + qk)x^k) + (r0 + r1x + . . . + rk x^k)
= (p(x) + q(x)) + r(x).


If we take 0 to be the zero polynomial 0 + 0x + . . . + 0x^k, then p(x) + 0 = 0 + p(x) = p(x), giving (VA3). Next, writing −p(x) = −p0 − p1x − . . . − pk x^k, we have p(x) + (−p(x)) = 0, giving (VA4). To check (VA5), note that

p(x) + q(x) = (p0 + q0) + (p1 + q1)x + . . . + (pk + qk)x^k
= (q0 + p0) + (q1 + p1)x + . . . + (qk + pk)x^k = q(x) + p(x).

To check (SM1), simply note that cp0, . . . , cpk ∈ R. To check (SM2), note that

c(p(x) + q(x)) = c((p0 + q0) + (p1 + q1)x + . . . + (pk + qk)x^k)
= c(p0 + q0) + c(p1 + q1)x + . . . + c(pk + qk)x^k
= (cp0 + cq0) + (cp1 + cq1)x + . . . + (cpk + cqk)x^k
= (cp0 + cp1x + . . . + cpk x^k) + (cq0 + cq1x + . . . + cqk x^k)
= cp(x) + cq(x).

To check (SM3), note that

(a + b)p(x) = (a + b)p0 + (a + b)p1x + . . . + (a + b)pk x^k
= (ap0 + bp0) + (ap1 + bp1)x + . . . + (apk + bpk)x^k
= (ap0 + ap1x + . . . + apk x^k) + (bp0 + bp1x + . . . + bpk x^k)
= ap(x) + bp(x).

To check (SM4), note that

(ab)p(x) = (ab)p0 + (ab)p1x + . . . + (ab)pk x^k = a(bp0) + a(bp1)x + . . . + a(bpk)x^k
= a(bp0 + bp1x + . . . + bpk x^k) = a(bp(x)).

Finally, to check (SM5), note that

1p(x) = 1p0 + 1p1x + . . . + 1pk x^k = p0 + p1x + . . . + pk x^k = p(x).

It follows that Pk is a vector space over R. Note also that the vectors are the polynomials.

There are a few simple properties of vector spaces that we can deduce easily from the definition.

PROPOSITION 5A. Suppose that V is a vector space over R, and that u ∈ V and c ∈ R.
(a) We have 0u = 0.
(b) We have c0 = 0.
(c) We have (−1)u = −u.
(d) If cu = 0, then c = 0 or u = 0.

Proof. (a) By (SM1), we have 0u ∈ V. Hence

0u + 0u = (0 + 0)u (by (SM3))
        = 0u (since 0 ∈ R).

It follows that

0u = 0u + 0 (by (VA3))
   = 0u + (0u + (−(0u))) (by (VA4))
   = (0u + 0u) + (−(0u)) (by (VA2))
   = 0u + (−(0u)) (from above)
   = 0 (by (VA4)).


(b) By (SM1), we have c0 ∈ V. Hence

c0 + c0 = c(0 + 0) (by (SM2))
        = c0 (by (VA3)).

It follows that

c0 = c0 + 0 (by (VA3))
   = c0 + (c0 + (−(c0))) (by (VA4))
   = (c0 + c0) + (−(c0)) (by (VA2))
   = c0 + (−(c0)) (from above)
   = 0 (by (VA4)).

(c) We have

(−1)u = (−1)u + 0 (by (VA3))
      = (−1)u + (u + (−u)) (by (VA4))
      = ((−1)u + u) + (−u) (by (VA2))
      = ((−1)u + 1u) + (−u) (by (SM5))
      = ((−1) + 1)u + (−u) (by (SM3))
      = 0u + (−u) (since 1 ∈ R)
      = 0 + (−u) (from (a))
      = −u (by (VA3)).

(d) Suppose that cu = 0 and c ≠ 0. Then c⁻¹ ∈ R and

u = 1u (by (SM5))
  = (c⁻¹c)u (since c ∈ R \ {0})
  = c⁻¹(cu) (by (SM4))
  = c⁻¹0 (assumption)
  = 0 (from (b)),

as required. ©

5.2. Subspaces

Example 5.2.1. Consider the vector space R2 of all points (x, y), where x, y ∈ R. Let L be a line through the origin 0 = (0, 0). Suppose that L is represented by the equation αx + βy = 0; in other words,

L = {(x, y) ∈ R2 : αx + βy = 0}.

Note first of all that 0 = (0, 0) ∈ L, so that (VA3) and (VA4) clearly hold in L. Also (VA2) and (VA5) clearly hold in L. To check (VA1), note that if (x, y), (u, v) ∈ L, then αx + βy = 0 and αu + βv = 0, so that α(x + u) + β(y + v) = 0, whence (x, y) + (u, v) = (x + u, y + v) ∈ L. Next, note that (SM2)–(SM5) clearly hold in L. To check (SM1), note that if (x, y) ∈ L, then αx + βy = 0, so that α(cx) + β(cy) = 0, whence c(x, y) = (cx, cy) ∈ L. It follows that L forms a vector space over R. In fact, we have shown that every line in R2 through the origin is a vector space over R.


Definition. Suppose that V is a vector space over R, and that W is a subset of V. Then we say that W is a subspace of V if W forms a vector space over R under the vector addition and scalar multiplication defined in V.

Example 5.2.2. We have just shown in Example 5.2.1 that every line in R2 through the origin is a subspace of R2. On the other hand, if we work through the example again, then it is clear that we have really only checked conditions (VA1) and (SM1) for L, and that 0 = (0, 0) ∈ L.

PROPOSITION 5B. Suppose that V is a vector space over R, and that W is a non-empty subset of V. Then W is a subspace of V if the following conditions are satisfied:
(SP1) For every u, v ∈ W, we have u + v ∈ W.
(SP2) For every c ∈ R and u ∈ W, we have cu ∈ W.

Proof. To show that W is a vector space over R, it remains to check that W satisfies (VA2)–(VA5) and (SM2)–(SM5). To check (VA3) and (VA4) for W, it clearly suffices to check that 0 ∈ W. Since W is non-empty, there exists u ∈ W. Then it follows from (SP2) and Proposition 5A(a) that 0 = 0u ∈ W. The remaining conditions (VA2), (VA5) and (SM2)–(SM5) hold for all vectors in V, and hence also for all vectors in W. ©

Example 5.2.3. Consider the vector space R3 of all points (x, y, z), where x, y, z ∈ R. Let P be a plane through the origin 0 = (0, 0, 0). Suppose that P is represented by the equation αx + βy + γz = 0; in other words,

P = {(x, y, z) ∈ R3 : αx + βy + γz = 0}.

To check (SP1), note that if (x, y, z), (u, v, w) ∈ P, then αx + βy + γz = 0 and αu + βv + γw = 0, so that α(x + u) + β(y + v) + γ(z + w) = 0, whence (x, y, z) + (u, v, w) = (x + u, y + v, z + w) ∈ P. To check (SP2), note that if (x, y, z) ∈ P, then αx + βy + γz = 0, so that α(cx) + β(cy) + γ(cz) = 0, whence c(x, y, z) = (cx, cy, cz) ∈ P. It follows that P is a subspace of R3. Next, let L be a line through the origin 0 = (0, 0, 0). Suppose that (α, β, γ) ∈ R3 is a non-zero point on L. Then we can write

L = {t(α, β, γ) : t ∈ R}.

Suppose that u = t(α, β, γ) ∈ L and v = s(α, β, γ) ∈ L, and that c ∈ R. Then

u + v = t(α, β, γ) + s(α, β, γ) = (t + s)(α, β, γ) ∈ L,

giving (SP1). Also, cu = c(t(α, β, γ)) = (ct)(α, β, γ) ∈ L, giving (SP2). It follows that L is a subspace of R3. Finally, it is not difficult to see that both {0} and R3 are subspaces of R3.

Example 5.2.4. Note that R2 is not a subspace of R3. First of all, R2 is not a subset of R3. Note also that vector addition and scalar multiplication are different in R2 and R3.

Example 5.2.5. Suppose that A is an m × n matrix and 0 is the m × 1 zero column matrix. Consider the system Ax = 0 of m homogeneous linear equations in the n unknowns x1, . . . , xn, where

    [ x1 ]
x = [  ⋮ ]
    [ xn ]

is interpreted as an element of the vector space Rn, with usual vector addition and scalar multiplication. Let S denote the set of all solutions of the system. Suppose that x, y ∈ S and c ∈ R. Then

A(x + y) = Ax + Ay = 0 + 0 = 0,

giving (SP1). Also, A(cx) = c(Ax) = c0 = 0, giving (SP2). It follows that S is a subspace of Rn. To summarize, the space of solutions of a system of m homogeneous linear equations in n unknowns is a subspace of Rn.


Example 5.2.6. As a special case of Example 5.2.5, note that if we take two non-parallel planes in R3 through the origin 0 = (0, 0, 0), then the intersection of these two planes is clearly a line through the origin. However, each plane is a homogeneous equation in the three unknowns x, y, z ∈ R. It follows that the intersection of the two planes is the collection of all solutions (x, y, z) ∈ R3 of the system formed by the two homogeneous equations in the three unknowns x, y, z representing these two planes. We have already shown in Example 5.2.3 that the line representing all these solutions is a subspace of R3.

Example 5.2.7. We showed in Example 5.1.3 that the set M2,2(R) of all 2 × 2 matrices with entries in R forms a vector space over R. Consider the subset W of M2,2(R) consisting of all matrices of the form

[ a11 a12 ]
[ a21  0  ],

where a11, a12, a21 ∈ R. Since

[ a11 a12 ]   [ b11 b12 ]   [ a11 + b11  a12 + b12 ]
[ a21  0  ] + [ b21  0  ] = [ a21 + b21      0     ]

and

  [ a11 a12 ]   [ ca11 ca12 ]
c [ a21  0  ] = [ ca21   0  ],

it follows that (SP1) and (SP2) are satisfied. Hence W is a subspace of M2,2(R).

Example 5.2.8. We showed in Example 5.1.4 that the set A of all functions of the form f : R → R forms a vector space over R. Let C0 denote the set of all functions of the form f : R → R which are continuous at x = 2, and let C1 denote the set of all functions of the form f : R → R which are differentiable at x = 2. Then it follows from the arithmetic of limits and the arithmetic of derivatives that C0 and C1 are both subspaces of A. Furthermore, C1 is a subspace of C0 (why?). On the other hand, let k ∈ N. Recall from Example 5.1.6 the vector space Pk of all polynomials of the form

p(x) = p0 + p1x + . . . + pk x^k, where p0, p1, . . . , pk ∈ R.

In other words, Pk is the set of all polynomials of degree at most k and with coefficients in R. Clearly Pk is a subspace of C1.

5.3. Linear Combination

In this section and the next two, we shall study ways of describing the vectors in a vector space V. Our ultimate goal is to be able to determine a subset B of vectors in V and describe every element of V in terms of elements of B in a unique way. The first step in this direction is summarized below.

Definition. Suppose that v1, . . . , vr are vectors in a vector space V over R. By a linear combination of the vectors v1, . . . , vr, we mean an expression of the type

c1v1 + . . . + crvr,

where c1, . . . , cr ∈ R.

Example 5.3.1. In R2, every vector (x, y) is a linear combination of the two vectors i = (1, 0) and j = (0, 1), for clearly (x, y) = xi + yj.

Example 5.3.2. In R3, every vector (x, y, z) is a linear combination of the three vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1), for clearly (x, y, z) = xi + yj + zk.


Example 5.3.3. In R4, the vector (1, 4, −2, 6) is a linear combination of the two vectors (1, 2, 0, 4) and (1, 1, 1, 3), for we have (1, 4, −2, 6) = 3(1, 2, 0, 4) − 2(1, 1, 1, 3). On the other hand, the vector (2, 6, 0, 9) is not a linear combination of the two vectors (1, 2, 0, 4) and (1, 1, 1, 3), for

(2, 6, 0, 9) = c1(1, 2, 0, 4) + c2(1, 1, 1, 3)

would lead to the system of four equations

c1 + c2 = 2,
2c1 + c2 = 6,
c2 = 0,
4c1 + 3c2 = 9.

It is easily checked that this system has no solutions.
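Such a check is easily carried out by machine: b is a linear combination of the given vectors precisely when adjoining b as an extra column does not increase the rank of the matrix whose columns are those vectors (this anticipates Proposition 6F in Chapter 6). A minimal Python sketch, not part of the original text, using the numpy library:

import numpy as np

# Columns of M are the vectors (1, 2, 0, 4) and (1, 1, 1, 3).
M = np.array([[1, 1],
              [2, 1],
              [0, 1],
              [4, 3]], dtype=float)

def is_linear_combination(M, b):
    # b lies in the span of the columns of M exactly when adjoining b
    # to M leaves the rank unchanged.
    return np.linalg.matrix_rank(np.column_stack([M, b])) == np.linalg.matrix_rank(M)

print(is_linear_combination(M, np.array([1, 4, -2, 6])))  # True
print(is_linear_combination(M, np.array([2, 6, 0, 9])))   # False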

Example 5.3.4. In the vector space A of all functions of the form f : R → R described in Example 5.1.4, the function cos 2x is a linear combination of the three functions cos²x, cosh²x and sinh²x. It is not too difficult to check that

cos 2x = 2cos²x + sinh²x − cosh²x,

noting that cos 2x = 2cos²x − 1 and cosh²x − sinh²x = 1.

We observe that in Example 5.3.1, every vector in R2 is a linear combination of the two vectors i and j. Similarly, in Example 5.3.2, every vector in R3 is a linear combination of the three vectors i, j and k. On the other hand, we observe that in Example 5.3.3, not every vector in R4 is a linear combination of the two vectors (1, 2, 0, 4) and (1, 1, 1, 3).

Let us therefore investigate the collection of all vectors in a vector space that can be represented as linear combinations of a given set of vectors in V.

Definition. Suppose that v1, . . . , vr are vectors in a vector space V over R. The set

span{v1, . . . , vr} = {c1v1 + . . . + crvr : c1, . . . , cr ∈ R}

is called the span of the vectors v1, . . . , vr. We also say that the vectors v1, . . . , vr span V if

span{v1, . . . , vr} = V;

in other words, if every vector in V can be expressed as a linear combination of the vectors v1, . . . , vr.

Example 5.3.5. The two vectors i = (1, 0) and j = (0, 1) span R2.

Example 5.3.6. The three vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) span R3.

Example 5.3.7. The two vectors (1, 2, 0, 4) and (1, 1, 1, 3) do not span R4.

PROPOSITION 5C. Suppose that v1, . . . , vr are vectors in a vector space V over R.
(a) Then span{v1, . . . , vr} is a subspace of V.
(b) Suppose further that W is a subspace of V and v1, . . . , vr ∈ W. Then span{v1, . . . , vr} ⊆ W.

Proof. (a) Suppose that u, w ∈ span{v1, . . . , vr} and c ∈ R. There exist a1, . . . , ar, b1, . . . , br ∈ R such that

u = a1v1 + . . . + arvr and w = b1v1 + . . . + brvr.


Then

u + w = (a1v1 + . . . + arvr) + (b1v1 + . . . + brvr) = (a1 + b1)v1 + . . . + (ar + br)vr ∈ span{v1, . . . , vr}

and

cu = c(a1v1 + . . . + arvr) = (ca1)v1 + . . . + (car)vr ∈ span{v1, . . . , vr}.

It follows from Proposition 5B that span{v1, . . . , vr} is a subspace of V.

(b) Suppose that c1, . . . , cr ∈ R and u = c1v1 + . . . + crvr ∈ span{v1, . . . , vr}. If v1, . . . , vr ∈ W, then it follows from (SM1) for W that c1v1, . . . , crvr ∈ W. It then follows from (VA1) for W that u = c1v1 + . . . + crvr ∈ W. ©

Example 5.3.8. In R2, any non-zero vector v spans the subspace {cv : c ∈ R}. This is clearly a line through the origin. Also, try to draw a picture to convince yourself that any two non-zero vectors that are not on the same line span R2.

Example 5.3.9. In R3, try to draw pictures to convince yourself that any non-zero vector spans a subspace which is a line through the origin; any two non-zero vectors that are not on the same line span a subspace which is a plane through the origin; and any three non-zero vectors that do not lie on the same plane span R3.

5.4. Linear Independence

We first study two simple examples.

Example 5.4.1. Consider the three vectors v1 = (1, 2, 3), v2 = (3, 2, 1) and v3 = (3, 3, 3) in R3. Then

span{v1, v2, v3} = {c1(1, 2, 3) + c2(3, 2, 1) + c3(3, 3, 3) : c1, c2, c3 ∈ R}
= {(c1 + 3c2 + 3c3, 2c1 + 2c2 + 3c3, 3c1 + c2 + 3c3) : c1, c2, c3 ∈ R}.

Write (x, y, z) = (c1 + 3c2 + 3c3, 2c1 + 2c2 + 3c3, 3c1 + c2 + 3c3). Then it is not difficult to see that

[ x ]   [ 1 3 3 ] [ c1 ]
[ y ] = [ 2 2 3 ] [ c2 ],
[ z ]   [ 3 1 3 ] [ c3 ]

and so (do not worry if you cannot understand why we take this next step)

           [ x ]              [ 1 3 3 ] [ c1 ]             [ c1 ]
( 1 −2 1 ) [ y ] = ( 1 −2 1 ) [ 2 2 3 ] [ c2 ] = ( 0 0 0 ) [ c2 ] = ( 0 ),
           [ z ]              [ 3 1 3 ] [ c3 ]             [ c3 ]

so that x − 2y + z = 0. It follows that span{v1, v2, v3} is a plane through the origin and not R3. Note, in fact, that 3v1 + 3v2 − 4v3 = 0. Note also that

det [ 1 3 3 ]
    [ 2 2 3 ] = 0.
    [ 3 1 3 ]


Example 5.4.2. Consider the three vectors v1 = (1, 1, 0), v2 = (5, 1, −3) and v3 = (2, 7, 4) in R3. Then

span{v1, v2, v3} = {c1(1, 1, 0) + c2(5, 1, −3) + c3(2, 7, 4) : c1, c2, c3 ∈ R}
= {(c1 + 5c2 + 2c3, c1 + c2 + 7c3, −3c2 + 4c3) : c1, c2, c3 ∈ R}.

Write (x, y, z) = (c1 + 5c2 + 2c3, c1 + c2 + 7c3, −3c2 + 4c3). Then it is not difficult to see that

[ x ]   [ 1  5 2 ] [ c1 ]
[ y ] = [ 1  1 7 ] [ c2 ],
[ z ]   [ 0 −3 4 ] [ c3 ]

so that

[ −25 26 −33 ] [ x ]   [ −25 26 −33 ] [ 1  5 2 ] [ c1 ]   [ 1 0 0 ] [ c1 ]   [ c1 ]
[   4 −4   5 ] [ y ] = [   4 −4   5 ] [ 1  1 7 ] [ c2 ] = [ 0 1 0 ] [ c2 ] = [ c2 ].
[   3 −3   4 ] [ z ]   [   3 −3   4 ] [ 0 −3 4 ] [ c3 ]   [ 0 0 1 ] [ c3 ]   [ c3 ]

It follows that for every (x, y, z) ∈ R3, we can find c1, c2, c3 ∈ R such that (x, y, z) = c1v1 + c2v2 + c3v3. Hence span{v1, v2, v3} = R3. Note that

det [ 1  5 2 ]
    [ 1  1 7 ] ≠ 0,
    [ 0 −3 4 ]

and that the only solution for

(0, 0, 0) = c1v1 + c2v2 + c3v3

is c1 = c2 = c3 = 0.

Definition. Suppose that v1, . . . , vr are vectors in a vector space V over R.
(LD) We say that v1, . . . , vr are linearly dependent if there exist c1, . . . , cr ∈ R, not all zero, such that c1v1 + . . . + crvr = 0.
(LI) We say that v1, . . . , vr are linearly independent if they are not linearly dependent; in other words, if the only solution of c1v1 + . . . + crvr = 0 in c1, . . . , cr ∈ R is given by c1 = . . . = cr = 0.

Example 5.4.3. Let us return to Example 5.4.1 and consider again the three vectors v1 = (1, 2, 3), v2 = (3, 2, 1) and v3 = (3, 3, 3) in R3. Consider the equation c1v1 + c2v2 + c3v3 = 0. This can be rewritten in matrix form as

[ 1 3 3 ] [ c1 ]   [ 0 ]
[ 2 2 3 ] [ c2 ] = [ 0 ].
[ 3 1 3 ] [ c3 ]   [ 0 ]

Since

det [ 1 3 3 ]
    [ 2 2 3 ] = 0,
    [ 3 1 3 ]

the system has non-trivial solutions; for example, (c1, c2, c3) = (3, 3, −4), so that 3v1 + 3v2 − 4v3 = 0. Hence v1, v2, v3 are linearly dependent.


Example 5.4.4. Let us return to Example 5.4.2 and consider again the three vectors v1 = (1, 1, 0), v2 = (5, 1, −3) and v3 = (2, 7, 4) in R3. Consider the equation c1v1 + c2v2 + c3v3 = 0. This can be rewritten in matrix form as

[ 1  5 2 ] [ c1 ]   [ 0 ]
[ 1  1 7 ] [ c2 ] = [ 0 ].
[ 0 −3 4 ] [ c3 ]   [ 0 ]

Since

det [ 1  5 2 ]
    [ 1  1 7 ] ≠ 0,
    [ 0 −3 4 ]

the only solution is c1 = c2 = c3 = 0. Hence v1, v2, v3 are linearly independent.
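These two determinant computations are easily reproduced by machine. An illustrative Python sketch, not part of the original text, using the numpy library:

import numpy as np

# Example 5.4.3: the vectors v1 = (1, 2, 3), v2 = (3, 2, 1), v3 = (3, 3, 3) as columns.
A = np.array([[1, 3, 3],
              [2, 2, 3],
              [3, 1, 3]], dtype=float)
print(np.linalg.det(A))   # 0.0 (up to rounding): the vectors are linearly dependent

# Example 5.4.4: the vectors v1 = (1, 1, 0), v2 = (5, 1, -3), v3 = (2, 7, 4) as columns.
B = np.array([[1, 5, 2],
              [1, 1, 7],
              [0, -3, 4]], dtype=float)
print(np.linalg.det(B))   # -1.0: non-zero, so the vectors are linearly independent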

Example 5.4.5. In the vector space A of all functions of the form f : R → R described in Example 5.1.4, the functions x, x² and sin x are linearly independent. To see this, note that for every c1, c2, c3 ∈ R, the linear combination c1x + c2x² + c3 sin x is never identically zero unless c1 = c2 = c3 = 0.

Example 5.4.6. In Rn, the vectors e1, . . . , en, where

ej = (0, . . . , 0, 1, 0, . . . , 0) for every j = 1, . . . , n,

with j − 1 zeros before the 1 and n − j zeros after it, are linearly independent (why?).

We observe in Examples 5.4.3–5.4.4 that the determination of whether a collection of vectors in R3 are linearly dependent is based on whether a system of homogeneous linear equations has non-trivial solutions. The same idea can be used to prove the following result concerning Rn.

PROPOSITION 5D. Suppose that v1, . . . , vr are vectors in the vector space Rn. If r > n, then v1, . . . , vr are linearly dependent.

Proof. For every j = 1, . . . , r, write

vj = (a1j, . . . , anj).

Then the equation c1v1 + . . . + crvr = 0 can be rewritten in matrix form as

[ a11 . . . a1r ] [ c1 ]   [ 0 ]
[  ⋮         ⋮ ] [  ⋮ ] = [ ⋮ ].
[ an1 . . . anr ] [ cr ]   [ 0 ]

If r > n, then there are more variables than equations. It follows that there must be non-trivial solutions c1, . . . , cr ∈ R. Hence v1, . . . , vr are linearly dependent. ©

Remarks. (1) Consider two vectors v1 = (a11, a21) and v2 = (a12, a22) in R2. To study linear independence, we consider the equation c1v1 + c2v2 = 0, which can be written in matrix form as

[ a11 a12 ] [ c1 ]   [ 0 ]
[ a21 a22 ] [ c2 ] = [ 0 ].

The vectors v1 and v2 are linearly independent precisely when

det [ a11 a12 ]
    [ a21 a22 ] ≠ 0.


This can be interpreted geometrically in the following way: The area of the parallelogram formed by the two vectors v1 and v2 is in fact equal to the absolute value of the determinant of the matrix formed with v1 and v2 as the columns; in other words, the absolute value of

det [ a11 a12 ]
    [ a21 a22 ].

It follows that the two vectors are linearly dependent precisely when the parallelogram has zero area; in other words, when the two vectors lie on the same line. On the other hand, if the parallelogram has positive area, then the two vectors are linearly independent.

(2) Consider three vectors v1 = (a11, a21, a31), v2 = (a12, a22, a32), and v3 = (a13, a23, a33) in R3. To study linear independence, we consider the equation c1v1 + c2v2 + c3v3 = 0, which can be written in matrix form as

[ a11 a12 a13 ] [ c1 ]   [ 0 ]
[ a21 a22 a23 ] [ c2 ] = [ 0 ].
[ a31 a32 a33 ] [ c3 ]   [ 0 ]

The vectors v1, v2 and v3 are linearly independent precisely when

det [ a11 a12 a13 ]
    [ a21 a22 a23 ] ≠ 0.
    [ a31 a32 a33 ]

This can be interpreted geometrically in the following way: The volume of the parallelepiped formed by the three vectors v1, v2 and v3 is in fact equal to the absolute value of the determinant of the matrix formed with v1, v2 and v3 as the columns; in other words, the absolute value of

det [ a11 a12 a13 ]
    [ a21 a22 a23 ].
    [ a31 a32 a33 ]

It follows that the three vectors are linearly dependent precisely when the parallelepiped has zero volume; in other words, when the three vectors lie on the same plane. On the other hand, if the parallelepiped has positive volume, then the three vectors are linearly independent.

(3) What is the geometric interpretation of two linearly independent vectors in R3? Well, note that if v1 and v2 are non-zero and linearly dependent, then there exist c1, c2 ∈ R, not both zero, such that c1v1 + c2v2 = 0. This forces the two vectors to be multiples of each other, so that they lie on the same line, whence the parallelogram they form has zero area. It follows that if two vectors in R3 form a parallelogram with positive area, then they are linearly independent.

5.5. Basis and Dimension

In this section, we complete the task of describing uniquely every element of a vector space V in terms of the elements of a suitable subset B. To motivate the ideas, we first consider an example.

Example 5.5.1. Let us consider the three vectors v1 = (1, 1, 0), v2 = (5, 1, −3) and v3 = (2, 7, 4) in R3, as in Examples 5.4.2 and 5.4.4. We have already shown that span{v1, v2, v3} = R3, and that the vectors v1, v2, v3 are linearly independent. Furthermore, we have shown that for every u = (x, y, z) ∈ R3, we can write u = c1v1 + c2v2 + c3v3, where c1, c2, c3 ∈ R are determined uniquely by

[ c1 ]   [ −25 26 −33 ] [ x ]
[ c2 ] = [   4 −4   5 ] [ y ].
[ c3 ]   [   3 −3   4 ] [ z ]


Definition. Suppose that v1, . . . , vr are vectors in a vector space V over R. We say that {v1, . . . , vr} is a basis for V if the following two conditions are satisfied:
(B1) We have span{v1, . . . , vr} = V.
(B2) The vectors v1, . . . , vr are linearly independent.

Example 5.5.2. Consider two vectors v1 = (a11, a21) and v2 = (a12, a22) in R2. Suppose that

det [ a11 a12 ]
    [ a21 a22 ] ≠ 0;

in other words, suppose that the parallelogram formed by the two vectors has non-zero area. Then it follows from Remark (1) in Section 5.4 that v1 and v2 are linearly independent. Furthermore, for every u = (x, y) ∈ R2, there exist c1, c2 ∈ R such that u = c1v1 + c2v2. Indeed, c1 and c2 are determined as the unique solution of the system

[ a11 a12 ] [ c1 ]   [ x ]
[ a21 a22 ] [ c2 ] = [ y ].

Hence span{v1, v2} = R2. It follows that {v1, v2} is a basis for R2.

Example 5.5.3. Consider three vectors of the type v1 = (a11, a21, a31), v2 = (a12, a22, a32) and v3 = (a13, a23, a33) in R3. Suppose that

det [ a11 a12 a13 ]
    [ a21 a22 a23 ] ≠ 0;
    [ a31 a32 a33 ]

in other words, suppose that the parallelepiped formed by the three vectors has non-zero volume. Then it follows from Remark (2) in Section 5.4 that v1, v2 and v3 are linearly independent. Furthermore, for every u = (x, y, z) ∈ R3, there exist c1, c2, c3 ∈ R such that u = c1v1 + c2v2 + c3v3. Indeed, c1, c2 and c3 are determined as the unique solution of the system

[ a11 a12 a13 ] [ c1 ]   [ x ]
[ a21 a22 a23 ] [ c2 ] = [ y ].
[ a31 a32 a33 ] [ c3 ]   [ z ]

Hence span{v1, v2, v3} = R3. It follows that {v1, v2, v3} is a basis for R3.
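In practice, the coordinates c1, c2, c3 of a given vector u relative to such a basis are found by solving the linear system above. A short Python sketch (ours, not part of the original text, using the numpy library), with the basis of Examples 5.4.2 and 5.5.1:

import numpy as np

# The basis vectors v1 = (1, 1, 0), v2 = (5, 1, -3), v3 = (2, 7, 4) as the columns of B.
B = np.array([[1, 5, 2],
              [1, 1, 7],
              [0, -3, 4]], dtype=float)
u = np.array([1.0, 2.0, 3.0])      # any vector in R^3 (our test choice)
c = np.linalg.solve(B, u)          # unique, since det(B) is non-zero
print(c)                           # the coordinates c1, c2, c3 of u
print(np.allclose(B @ c, u))       # True: u = c1*v1 + c2*v2 + c3*v3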

Example 5.5.4. In Rn, the vectors e1, . . . , en, where

ej = (0, . . . , 0, 1, 0, . . . , 0) for every j = 1, . . . , n,

with j − 1 zeros before the 1 and n − j zeros after it, are linearly independent and span Rn. Hence {e1, . . . , en} is a basis for Rn. This is known as the standard basis for Rn.

Example 5.5.5. In the vector space M2,2(R) of all 2 × 2 matrices with entries in R as discussed in Example 5.1.3, the set consisting of the four matrices

[ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ]
[ 0 0 ],  [ 0 0 ],  [ 1 0 ],  [ 0 1 ]

is a basis.

Example 5.5.6. In the vector space Pk of polynomials of degree at most k and with coefficients in R as discussed in Example 5.1.6, the set {1, x, x², . . . , x^k} is a basis.


PROPOSITION 5E. Suppose that {v1, . . . , vr} is a basis for a vector space V over R. Then every element u ∈ V can be expressed uniquely in the form

u = c1v1 + . . . + crvr, where c1, . . . , cr ∈ R.

Proof. Since u ∈ V = span{v1, . . . , vr}, there exist c1, . . . , cr ∈ R such that u = c1v1 + . . . + crvr. Suppose now that b1, . . . , br ∈ R such that

c1v1 + . . . + crvr = b1v1 + . . . + brvr.

Then

(c1 − b1)v1 + . . . + (cr − br)vr = 0.

Since v1, . . . , vr are linearly independent, it follows that c1 − b1 = . . . = cr − br = 0. Hence c1, . . . , cr are uniquely determined. ©

We have shown earlier that a vector space can have many bases. For example, any collection of three vectors not on the same plane is a basis for R3. In the following discussion, we attempt to find out some properties of bases. However, we shall restrict our discussion to the following simple case.

Definition. A vector space V over R is said to be finite-dimensional if it has a basis containing only finitely many elements.

Example 5.5.7. The vector spaces Rn, M2,2(R) and Pk that we have discussed earlier are all finite-dimensional.

Recall that in Rn, the standard basis has exactly n elements. On the other hand, it follows from Proposition 5D that any basis for Rn cannot contain more than n elements. However, can a basis for Rn contain fewer than n elements?

We shall answer this question by showing that all bases for a given vector space have the same number of elements. As a first step, we establish the following generalization of Proposition 5D.

PROPOSITION 5F. Suppose that {v1, . . . , vn} is a basis for a vector space V over R. Suppose further that r > n, and that the vectors u1, . . . , ur ∈ V. Then the vectors u1, . . . , ur are linearly dependent.

Proof. Since {v1, . . . , vn} is a basis for the vector space V, we can write

u1 = a11v1 + . . . + an1vn,
 ⋮
ur = a1rv1 + . . . + anrvn,

where aij ∈ R for every i = 1, . . . , n and j = 1, . . . , r. Let c1, . . . , cr ∈ R. Since v1, . . . , vn are linearly independent, it follows that if

c1u1 + . . . + crur = c1(a11v1 + . . . + an1vn) + . . . + cr(a1rv1 + . . . + anrvn)
= (a11c1 + . . . + a1rcr)v1 + . . . + (an1c1 + . . . + anrcr)vn = 0,

then a11c1 + . . . + a1rcr = . . . = an1c1 + . . . + anrcr = 0; in other words, we have the homogeneous system

[ a11 . . . a1r ] [ c1 ]   [ 0 ]
[  ⋮         ⋮ ] [  ⋮ ] = [ ⋮ ].
[ an1 . . . anr ] [ cr ]   [ 0 ]


If r > n, then there are more variables than equations. It follows that there must be non-trivial solutions c1, . . . , cr ∈ R. Hence u1, . . . , ur are linearly dependent. ©

PROPOSITION 5G. Suppose that V is a finite-dimensional vector space over R. Then any two bases for V have the same number of elements.

Proof. Note simply that by Proposition 5F, the vectors in the “basis” with more elements must be linearly dependent, and so cannot be a basis. ©

We are now in a position to make the following definition.

Definition. Suppose that V is a finite-dimensional vector space over R. Then we say that V is of dimension n if a basis for V contains exactly n elements.

Example 5.5.8. The vector space Rn has dimension n.

Example 5.5.9. The vector space M2,2(R) of all 2 × 2 matrices with entries in R, as discussed in Example 5.1.3, has dimension 4.

Example 5.5.10. The vector space Pk of all polynomials of degree at most k and with coefficients in R, as discussed in Example 5.1.6, has dimension (k + 1).

Example 5.5.11. Recall Example 5.2.5, where we showed that the set of solutions to a system of m homogeneous linear equations in n unknowns is a subspace of Rn. Consider now the homogeneous system

[ 1 3 −5 1  5 ] [ x1 ]   [ 0 ]
[ 1 4 −7 3 −2 ] [ x2 ]   [ 0 ]
[ 1 5 −9 5 −9 ] [ x3 ] = [ 0 ].
[ 0 3 −6 2 −1 ] [ x4 ]   [ 0 ]
                [ x5 ]

The solutions can be described in the form

       [  1 ]      [  1 ]
       [ −2 ]      [  3 ]
x = c1 [ −1 ] + c2 [  0 ],
       [  0 ]      [ −5 ]
       [  0 ]      [ −1 ]

where c1, c2 ∈ R (the reader must check this). It can be checked that (1, −2, −1, 0, 0) and (1, 3, 0, −5, −1) are linearly independent and so form a basis for the space of solutions of the system. It follows that the space of solutions of the system has dimension 2.
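A computer algebra system can reproduce such a basis directly. A minimal sketch (ours, not part of the original text) using the sympy library; note that sympy may return a different, but equally valid, basis for the same solution space — here its vectors are the negatives of the ones above:

from sympy import Matrix

A = Matrix([[1, 3, -5, 1, 5],
            [1, 4, -7, 3, -2],
            [1, 5, -9, 5, -9],
            [0, 3, -6, 2, -1]])
basis = A.nullspace()      # a basis for the solution space of Ax = 0
print(len(basis))          # 2: the space of solutions has dimension 2
for v in basis:
    print(v.T)             # each basis vector, printed as a row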

Suppose that V is an n-dimensional vector space over R. Then any basis for V consists of exactly n linearly independent vectors in V. Suppose now that we have a set of n linearly independent vectors in V. Will this form a basis for V?

We have already answered this question in the affirmative in the cases when the vector space is R2 or R3. To seek an answer to the general case, we first establish the following result.

PROPOSITION 5H. Suppose that V is a finite-dimensional vector space over R. Then any finite set of linearly independent vectors in V can be expanded, if necessary, to a basis for V.

Proof. Let S = {v1, . . . , vk} be a finite set of linearly independent vectors in V. If S spans V, then the proof is complete. If S does not span V, then there exists vk+1 ∈ V that is not a linear combination


of the elements of S. The set T = {v1, . . . , vk, vk+1} is a finite set of linearly independent vectors in V; for otherwise, there exist c1, . . . , ck, ck+1, not all zero, such that

c1v1 + . . . + ckvk + ck+1vk+1 = 0.

If ck+1 = 0, then c1v1 + . . . + ckvk = 0, contradicting the assumption that S is a finite set of linearly independent vectors in V. If ck+1 ≠ 0, then

vk+1 = −(c1/ck+1)v1 − . . . − (ck/ck+1)vk,

contradicting the assumption that vk+1 is not a linear combination of the elements of S. We now study the finite set T of linearly independent vectors in V. If T spans V, then the proof is complete. If T does not span V, then we repeat the argument. Note that the number of vectors in a linearly independent expansion of S cannot exceed the dimension of V, in view of Proposition 5F. So eventually some linearly independent expansion of S will span V. ©

PROPOSITION 5J. Suppose that V is an n-dimensional vector space over R. Then any set of n linearly independent vectors in V is a basis for V.

Proof. Let S be a set of n linearly independent vectors in V. By Proposition 5H, S can be expanded, if necessary, to a basis for V. By Proposition 5F, any expansion of S will result in a linearly dependent set of vectors in V. It follows that S is already a basis for V. ©

Example 5.5.12. Consider the three vectors v1 = (1, 2, 3), v2 = (3, 2, 1) and v3 = (3, 3, 3) in R3, as in Examples 5.4.1 and 5.4.3. We showed that these three vectors are linearly dependent, and span the plane x − 2y + z = 0. Note that

v3 = (3/4)v1 + (3/4)v2,

and that v1 and v2 are linearly independent. Consider now the vector v4 = (0, 0, 1). Note that v4 does not lie on the plane x − 2y + z = 0, so that v1, v2, v4 form a linearly independent set. It follows that {v1, v2, v4} is a basis for R3.


Problems for Chapter 5

1. Determine whether each of the following subsets of R3 is a subspace of R3:
a) {(x, y, z) ∈ R3 : x = 0} b) {(x, y, z) ∈ R3 : x + y = 0}
c) {(x, y, z) ∈ R3 : xz = 0} d) {(x, y, z) ∈ R3 : y ≥ 0}
e) {(x, y, z) ∈ R3 : x = y = z}

2. For each of the following collections of vectors, determine whether the first vector is a linear combination of the remaining ones:
a) (1, 2, 3); (1, 0, 1), (2, 1, 0) in R3
b) x³ + 2x² + 3x + 1; x³, x² + 3x, x² + 1 in P4
c) (1, 3, 5, 7); (1, 0, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1) in R4

3. For each of the following collections of vectors, determine whether the vectors are linearly independent:
a) (1, 2, 3), (1, 0, 1), (2, 1, 0) in R3 b) (1, 2), (3, 5), (−1, 3) in R2
c) (2, 5, −3, 6), (1, 0, 0, 1), (4, 0, 9, 6) in R4 d) x² + 1, x + 1, x² + x in P3

4. Find the volume of the parallelepiped in R3 formed by the vectors (1, 2, 3), (1, 0, 1) and (3, 0, 2).

5. Let S be the set of all functions y that satisfy the differential equation

2 d²y/dx² − 3 dy/dx + y = 0.

Show that S is a subspace of the vector space A described in Example 5.1.4.

6. For each of the sets in Problem 1 which is a subspace of R3, find a basis for the subspace, and then extend it to a basis for R3.


LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1994, 2008.

This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 6

VECTOR SPACES

ASSOCIATED WITH MATRICES

6.1. Introduction

Consider an m × n matrix

A = [ a11 . . . a1n ]
    [  ⋮         ⋮ ]      (1)
    [ am1 . . . amn ],

with entries in R. Then the rows of A can be described as vectors in Rn as

r1 = (a11, . . . , a1n), . . . , rm = (am1, . . . , amn),      (2)

while the columns of A can be described as vectors in Rm as

     [ a11 ]                [ a1n ]
c1 = [  ⋮  ], . . . , cn =  [  ⋮  ].      (3)
     [ am1 ]                [ amn ]

For simplicity, we sometimes write

    [ r1 ]
A = [  ⋮ ]   and   A = ( c1 . . . cn ).
    [ rm ]

We also consider the system of homogeneous equations Ax = 0.


In this chapter, we shall be concerned with three vector spaces that arise from the matrix A.

Definition. Suppose that A is an m × n matrix of the form (1), with entries in R.
(RS) The subspace span{r1, . . . , rm} of Rn, where r1, . . . , rm are given by (2) and are the rows of the matrix A, is called the row space of A.
(CS) The subspace span{c1, . . . , cn} of Rm, where c1, . . . , cn are given by (3) and are the columns of the matrix A, is called the column space of A.
(NS) The solution space of the system of homogeneous linear equations Ax = 0 is called the nullspace of A.

Remarks. (1) To see that span{r1, . . . , rm} is a subspace of Rn and that span{c1, . . . , cn} is a subspace of Rm, recall Proposition 5C.

(2) To see that the nullspace of A is a subspace of Rn, recall Example 5.2.5.

6.2. Row Spaces

Our aim in this section is to find a basis for the row space of a given matrix A with entries in R. This task is made considerably easier by the following result.

PROPOSITION 6A. Suppose that the matrix B can be obtained from the matrix A by elementary row operations. Then the row space of B is identical to the row space of A.

Proof. Clearly the rows of B are linear combinations of the rows of A, so that any linear combination of the rows of B is a linear combination of the rows of A. Hence the row space of B is contained in the row space of A. On the other hand, the rows of A are linear combinations of the rows of B, so a similar argument shows that the row space of A is contained in the row space of B. ©

To find a basis for the row space of A, we can now reduce A to row echelon form, and consider the non-zero rows that result from this reduction. It is easily seen that these non-zero rows are linearly independent.

Example 6.2.1. Let

A = [ 1 3 −5 1  5 ]
    [ 1 4 −7 3 −2 ]
    [ 1 5 −9 5 −9 ]
    [ 0 3 −6 2 −1 ].

Then

r1 = (1, 3, −5, 1, 5),
r2 = (1, 4, −7, 3, −2),
r3 = (1, 5, −9, 5, −9),
r4 = (0, 3, −6, 2, −1).

Also the matrix A can be reduced to row echelon form as

[ 1 3 −5 1  5 ]
[ 0 1 −2 2 −7 ]
[ 0 0  0 1 −5 ]
[ 0 0  0 0  0 ].


It follows that

v1 = (1, 3,−5, 1, 5), v2 = (0, 1,−2, 2,−7), v3 = (0, 0, 0, 1,−5)

form a basis for the row space of A.

Remark. Naturally, it is not necessary for the first non-zero entry of a basis element to be 1.
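For readers who wish to check such a reduction by machine: the sympy library computes the reduced row echelon form, whose non-zero rows also form a basis for the row space (a different basis from the one above, since the reduction is carried further, but with the same span). A minimal sketch, not part of the original text:

from sympy import Matrix

A = Matrix([[1, 3, -5, 1, 5],
            [1, 4, -7, 3, -2],
            [1, 5, -9, 5, -9],
            [0, 3, -6, 2, -1]])
R, pivots = A.rref()   # reduced row echelon form and the pivot column indices
print(R)               # the non-zero rows of R form a basis for the row space of A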

6.3. Column Spaces

Our aim in this section is to find a basis for the column space of a given matrix A with entries in R. Naturally, we can consider the transpose Aᵗ of A, and use the technique in Section 6.2 to find a basis for the row space of Aᵗ. This basis naturally gives rise to a basis for the column space of A.

Example 6.3.1. Let

A = [ 1 3 −5 1  5 ]
    [ 1 4 −7 3 −2 ]
    [ 1 5 −9 5 −9 ]
    [ 0 3 −6 2 −1 ].

Then

Aᵗ = [  1  1  1  0 ]
     [  3  4  5  3 ]
     [ −5 −7 −9 −6 ]
     [  1  3  5  2 ]
     [  5 −2 −9 −1 ].

The matrix Aᵗ can be reduced to row echelon form as

[ 1 1 1 0 ]
[ 0 1 2 2 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]
[ 0 0 0 0 ].

It follows that

w1 = (1, 1, 1, 0), w2 = (0, 1, 2, 2), w3 = (0, 0, 0, 1)

form a basis for the row space of Aᵗ, and so a basis for the column space of A.

Alternatively, we may pursue the following argument, which shows that elementary row operations do not affect the linear dependence relations among the columns of a matrix.

PROPOSITION 6B. Suppose that the matrix B can be obtained from the matrix A by elementary row operations. Then any collection of columns of A are linearly independent if and only if the corresponding collection of columns of B are linearly independent.

Proof. Let A∗ be a matrix made up of a collection of columns of A, and let B∗ be the matrix made up of the corresponding collection of columns of B. Consider the two systems of homogeneous linear equations

A∗x = 0 and B∗x = 0.


Since B∗ can be obtained from the matrix A∗ by elementary row operations, the two systems have the same solution set. On the other hand, the columns of A∗ are linearly independent precisely when the system A∗x = 0 has only the trivial solution, precisely when the system B∗x = 0 has only the trivial solution, precisely when the columns of B∗ are linearly independent. ©

To find a basis for the column space of A, we can now reduce A to row echelon form, and consider the pivot columns that result from this reduction. It is easily seen that these pivot columns are linearly independent, and that any non-pivot column is a linear combination of the pivot columns.

Example 6.3.2. Let

A = [ 1 3 −5 1  5 ]
    [ 1 4 −7 3 −2 ]
    [ 1 5 −9 5 −9 ]
    [ 0 3 −6 2 −1 ].

Then A can be reduced to row echelon form as

[ 1 3 −5 1  5 ]
[ 0 1 −2 2 −7 ]
[ 0 0  0 1 −5 ]
[ 0 0  0 0  0 ].

It follows that the pivot columns of A are the first, second and fourth columns. Hence

u1 = (1, 1, 1, 0), u2 = (3, 4, 5, 3), u3 = (1, 3, 5, 2)

form a basis for the column space of A.
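The pivot columns can likewise be read off by machine. A minimal Python sketch using the sympy library (ours, not part of the original text; sympy's indices are 0-based, so (0, 1, 3) means the first, second and fourth columns):

from sympy import Matrix

A = Matrix([[1, 3, -5, 1, 5],
            [1, 4, -7, 3, -2],
            [1, 5, -9, 5, -9],
            [0, 3, -6, 2, -1]])
_, pivots = A.rref()
print(pivots)              # (0, 1, 3)
for j in pivots:
    print(A.col(j).T)      # these columns of A form a basis for the column space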

6.4. Rank of a Matrix

For the matrix

A = [ 1 3 −5 1  5 ]
    [ 1 4 −7 3 −2 ]
    [ 1 5 −9 5 −9 ]
    [ 0 3 −6 2 −1 ],

we have shown that the row space has dimension 3, and so does the column space. In fact, we have the following important result.

PROPOSITION 6C. For any matrix A with entries in R, the dimension of the row space is the same as the dimension of the column space.

Proof. For any matrix A, we can reduce A to row echelon form. Then the dimension of the row space of A is equal to the number of non-zero rows in the row echelon form. On the other hand, the dimension of the column space of A is equal to the number of pivot columns in the row echelon form. However, the number of non-zero rows in the row echelon form is the same as the number of pivot columns. ©

Definition. The rank of a matrix A, denoted by rank(A), is equal to the common value of the dimension of its row space and the dimension of its column space.

Example 6.4.1. The matrix

A = [ 1 3 −5 1  5 ]
    [ 1 4 −7 3 −2 ]
    [ 1 5 −9 5 −9 ]
    [ 0 3 −6 2 −1 ]

has rank 3.
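Numerically, the rank may be computed directly; for instance, with the numpy library (an illustrative sketch, not part of the original text):

import numpy as np

A = np.array([[1, 3, -5, 1, 5],
              [1, 4, -7, 3, -2],
              [1, 5, -9, 5, -9],
              [0, 3, -6, 2, -1]], dtype=float)
print(np.linalg.matrix_rank(A))    # 3
print(np.linalg.matrix_rank(A.T))  # 3: row rank and column rank agree (Proposition 6C)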


6.5. Nullspaces

Example 6.5.1. Consider the matrix

$$A = \begin{pmatrix} 1 & 3 & -5 & 1 & 5 \\ 1 & 4 & -7 & 3 & -2 \\ 1 & 5 & -9 & 5 & -9 \\ 0 & 3 & -6 & 2 & -1 \end{pmatrix}.$$
We showed in Example 5.5.11 that the space of solutions of Ax = 0 has dimension 2. In other words, the nullspace of A has dimension 2. Note that in this particular case, the dimension of the nullspace of A and the dimension of the column space of A have a sum of 5, the number of columns of A.

Recall now that the nullspace of A is a subspace of Rn, where n is the number of columns of the matrix A.

PROPOSITION 6D. For any matrix A with entries in R, the sum of the dimension of its column space and the dimension of its nullspace is equal to the number of columns of A.

Sketch of Proof. We consider the system of homogeneous linear equations Ax = 0, and reduce A to row echelon form. The number of leading variables is now equal to the dimension of the row space of A, and so equal to the dimension of the column space of A. On the other hand, the number of free variables is equal to the dimension of the space of solutions, which is the nullspace. Note now that the total number of variables is equal to the number of columns of A. ©

Remark. Proposition 6D is sometimes known as the Rank-nullity theorem, where the nullity of a matrix is the dimension of its nullspace.
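The relationship in Proposition 6D can be verified numerically. The following is a minimal sketch, assuming sympy, applied to the matrix of Example 6.5.1:

```python
# A sketch verifying the rank-nullity relation (assuming sympy):
# rank + nullity = number of columns.
from sympy import Matrix

A = Matrix([[1, 3, -5, 1, 5],
            [1, 4, -7, 3, -2],
            [1, 5, -9, 5, -9],
            [0, 3, -6, 2, -1]])
rank = A.rank()
nullity = len(A.nullspace())   # dimension of the solution space of Ax = 0
print(rank, nullity, A.cols)   # 3 2 5, and indeed 3 + 2 = 5
```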

We conclude this section by stating the following result for square matrices.

PROPOSITION 6E. Suppose that A is an n × n matrix with entries in R. Then the following statements are equivalent:
(a) A can be reduced to In by elementary row operations.
(b) A is invertible.
(c) det A ≠ 0.
(d) The system Ax = 0 has only the trivial solution.
(e) The system Ax = b is soluble for every b ∈ Rn.
(f) The rows of A are linearly independent.
(g) The columns of A are linearly independent.
(h) A has rank n.

6.6. Solution of Non-Homogeneous Systems

Consider now a non-homogeneous system of equations

Ax = b, (4)

where
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}, \tag{5}$$
with entries in R.


Our aim here is to determine whether a given system (4) has a solution without making any attempt to actually find any solution.

Note first of all that
$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \ldots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1 + \ldots + a_{mn}x_n \end{pmatrix} = x_1\begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} + \ldots + x_n\begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}.$$
It follows that Ax can be described by

Ax = x1c1 + . . . + xncn,

where c1, . . . , cn are defined by (3) and are the columns of A. In other words, Ax is a linear combination of the columns of A. It follows that if the system (4) has a solution, then b must be a linear combination of the columns of A. This means that b must belong to the column space of A, so that the two matrices

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \quad\text{and}\quad (A|b) = \begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{pmatrix} \tag{6}$$
must have the same (column) rank.

On the other hand, if the two matrices A and (A|b) have the same rank, then b must be a linear combination of the columns of A, so that

b = x1c1 + . . . + xncn

for some x1, . . . , xn ∈ R. This gives a solution of the system (4).

We have just proved the following result.

PROPOSITION 6F. For any matrix A with entries in R, the non-homogeneous system of equations Ax = b has a solution if and only if the matrices A and (A|b) have the same rank. Here A and (A|b) are defined by (5) and (6) respectively.

Example 6.6.1. Consider the system Ax = b, where

$$A = \begin{pmatrix} 1 & 3 & -5 & 1 & 5 \\ 1 & 4 & -7 & 3 & -2 \\ 1 & 5 & -9 & 5 & -9 \\ 0 & 3 & -6 & 2 & -1 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 3 \end{pmatrix}.$$
We have already shown that rank(A) = 3. Now
$$(A|b) = \begin{pmatrix} 1 & 3 & -5 & 1 & 5 & 1 \\ 1 & 4 & -7 & 3 & -2 & 2 \\ 1 & 5 & -9 & 5 & -9 & 3 \\ 0 & 3 & -6 & 2 & -1 & 3 \end{pmatrix}$$
can be reduced to row echelon form as
$$\begin{pmatrix} 1 & 3 & -5 & 1 & 5 & 1 \\ 0 & 1 & -2 & 2 & -7 & 1 \\ 0 & 0 & 0 & 1 & -5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

so that rank(A|b) = 3. It follows that the system has a solution.


Example 6.6.2. Consider the system Ax = b, where

$$A = \begin{pmatrix} 1 & 3 & -5 & 1 & 5 \\ 1 & 4 & -7 & 3 & -2 \\ 1 & 5 & -9 & 5 & -9 \\ 0 & 3 & -6 & 2 & -1 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}.$$
We have already shown that rank(A) = 3. Now
$$(A|b) = \begin{pmatrix} 1 & 3 & -5 & 1 & 5 & 1 \\ 1 & 4 & -7 & 3 & -2 & 2 \\ 1 & 5 & -9 & 5 & -9 & 4 \\ 0 & 3 & -6 & 2 & -1 & 3 \end{pmatrix}$$
can be reduced to row echelon form as
$$\begin{pmatrix} 1 & 3 & -5 & 1 & 5 & 1 \\ 0 & 1 & -2 & 2 & -7 & 1 \\ 0 & 0 & 0 & 1 & -5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix},$$

so that rank(A|b) = 4. It follows that the system has no solution.
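The rank test of Proposition 6F is straightforward to mechanize. Below is a minimal sketch, assuming numpy; the helper name soluble is hypothetical, introduced here for illustration only.

```python
# A sketch of the rank test of Proposition 6F (assuming numpy): the system
# Ax = b is soluble precisely when rank(A) = rank(A|b).
import numpy as np

A = np.array([[1, 3, -5, 1, 5],
              [1, 4, -7, 3, -2],
              [1, 5, -9, 5, -9],
              [0, 3, -6, 2, -1]])

def soluble(A, b):
    # Compare the rank of A with the rank of the augmented matrix (A|b).
    Ab = np.column_stack([A, b])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Ab)

print(soluble(A, np.array([1, 2, 3, 3])))  # True  (Example 6.6.1)
print(soluble(A, np.array([1, 2, 4, 3])))  # False (Example 6.6.2)
```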

Remark. The matrix (A|b) is sometimes known as the augmented matrix.

We conclude this chapter by describing the set of all solutions of a non-homogeneous system of equations.

PROPOSITION 6G. Suppose that A is a matrix with entries in R. Suppose further that x0 is a solution of the non-homogeneous system of equations Ax = b, and that {v1, . . . , vr} is a basis for the nullspace of A. Then every solution of the system Ax = b can be written in the form

x = x0 + c1v1 + . . . + crvr, where c1, . . . , cr ∈ R. (7)

On the other hand, every vector of the form (7) is a solution to the system Ax = b.

Proof. Let x be any solution of the system Ax = b. Since Ax0 = b, it follows that A(x − x0) = 0, so that x − x0 belongs to the nullspace of A. Hence there exist c1, . . . , cr ∈ R such that

x− x0 = c1v1 + . . . + crvr,

giving (7). On the other hand, it follows from (7) that

Ax = A(x0 + c1v1 + . . . + crvr) = Ax0 + c1Av1 + . . . + crAvr = b + 0 + . . . + 0 = b.

Hence every vector of the form (7) is a solution to the system Ax = b. ©

Example 6.6.3. Consider the system Ax = b, where

$$A = \begin{pmatrix} 1 & 3 & -5 & 1 & 5 \\ 1 & 4 & -7 & 3 & -2 \\ 1 & 5 & -9 & 5 & -9 \\ 0 & 3 & -6 & 2 & -1 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 3 \end{pmatrix}.$$
We have already shown in Example 5.5.11 that v1 = (1, −2, −1, 0, 0) and v2 = (1, 3, 0, −5, −1) form a basis for the nullspace of A. On the other hand, x0 = (−4, 0, 1, 5, 1) is a solution of the non-homogeneous system. It follows that the solutions of the non-homogeneous system are given by

x = (−4, 0, 1, 5, 1) + c1(1,−2,−1, 0, 0) + c2(1, 3, 0,−5,−1) where c1, c2 ∈ R.
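The structure described in Proposition 6G can be checked numerically for this example. A minimal sketch, assuming numpy:

```python
# A numerical check of Example 6.6.3 (assuming numpy): x0 solves Ax = b,
# v1 and v2 lie in the nullspace, so x0 + c1*v1 + c2*v2 solves Ax = b
# for any choice of c1, c2.
import numpy as np

A = np.array([[1, 3, -5, 1, 5],
              [1, 4, -7, 3, -2],
              [1, 5, -9, 5, -9],
              [0, 3, -6, 2, -1]])
b = np.array([1, 2, 3, 3])
x0 = np.array([-4, 0, 1, 5, 1])
v1 = np.array([1, -2, -1, 0, 0])
v2 = np.array([1, 3, 0, -5, -1])

print(A @ x0)                   # [1 2 3 3], so Ax0 = b
print(A @ v1, A @ v2)           # zero vectors
print(A @ (x0 + 7*v1 - 2*v2))   # [1 2 3 3] again, with c1 = 7, c2 = -2
```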


Example 6.6.4. Consider the non-homogeneous system x − 2y + z = 2 in R3. Note that this system has only one equation. The corresponding homogeneous system is given by x − 2y + z = 0, and this represents a plane through the origin. It is easily seen that (1, 1, 1) and (2, 1, 0) form a basis for the solution space of x − 2y + z = 0. On the other hand, note that (1, 0, 1) is a solution of x − 2y + z = 2. It follows that the solutions of x − 2y + z = 2 are of the form

(x, y, z) = (1, 0, 1) + c1(1, 1, 1) + c2(2, 1, 0), where c1, c2 ∈ R.

Try to draw a picture for this problem.


Problems for Chapter 6

1. For each of the following matrices, find a basis for the row space and a basis for the column space by first reducing the matrix to row echelon form:
$$\text{a)}\ \begin{pmatrix} 5 & 9 & 3 \\ 3 & -5 & -6 \\ 1 & 5 & 3 \end{pmatrix} \qquad \text{b)}\ \begin{pmatrix} 1 & 2 & 4 & -1 & 5 \\ 1 & 2 & 3 & -1 & 3 \\ 1 & 2 & 0 & -4 & -3 \end{pmatrix} \qquad \text{c)}\ \begin{pmatrix} 1 & 1 & 2 \\ 1 & 3 & -8 \\ 4 & -3 & -7 \\ 1 & 12 & -3 \end{pmatrix} \qquad \text{d)}\ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & -1 & 5 \\ 3 & 4 & 11 & 2 \end{pmatrix}$$

2. For each of the following matrices, determine whether the non-homogeneous system of linear equations Ax = b has a solution:
$$\text{a)}\ A = \begin{pmatrix} 5 & 9 & 3 \\ 3 & -5 & -6 \\ 1 & 5 & 3 \end{pmatrix} \text{ and } b = \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix} \qquad \text{b)}\ A = \begin{pmatrix} 1 & 2 & 4 & -1 & 5 \\ 1 & 2 & 3 & -1 & 3 \\ 1 & 2 & 0 & -4 & -3 \end{pmatrix} \text{ and } b = \begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$$
$$\text{c)}\ A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 3 & -8 \\ 4 & -3 & -7 \\ 1 & 12 & -3 \end{pmatrix} \text{ and } b = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 4 \end{pmatrix} \qquad \text{d)}\ A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & -1 & 5 \\ 3 & 4 & 11 & 2 \end{pmatrix} \text{ and } b = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$


LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1982, 2008.

This chapter originates from material used by the author at Imperial College, University of London, between 1981 and 1990.

It is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 7

EIGENVALUES AND EIGENVECTORS

7.1. Introduction

Example 7.1.1. Consider a function f : R2 → R2, defined for every (x, y) ∈ R2 by f(x, y) = (s, t), where
$$\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$

Note that
$$\begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix}\begin{pmatrix} 3 \\ -1 \end{pmatrix} = \begin{pmatrix} 6 \\ -2 \end{pmatrix} = 2\begin{pmatrix} 3 \\ -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 6 \end{pmatrix} = 6\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
On the other hand, note that
$$v_1 = \begin{pmatrix} 3 \\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
form a basis for R2. It follows that every u ∈ R2 can be written uniquely in the form u = c1v1 + c2v2, where c1, c2 ∈ R, so that

Au = A(c1v1 + c2v2) = c1Av1 + c2Av2 = 2c1v1 + 6c2v2.

Note that in this case, the function f : R2 → R2 can be described easily in terms of the two special vectors v1 and v2 and the two special numbers 2 and 6. Let us now examine how these special vectors and numbers arise. We hope to find numbers λ ∈ R and non-zero vectors v ∈ R2 such that
$$\begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix} v = \lambda v.$$


Since
$$\lambda v = \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} v = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} v,$$
we must have
$$\left(\begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}\right)v = 0.$$
In other words, we must have
$$\begin{pmatrix} 3-\lambda & 3 \\ 1 & 5-\lambda \end{pmatrix} v = 0. \tag{1}$$
In order to have non-zero v ∈ R2, we must therefore ensure that
$$\det\begin{pmatrix} 3-\lambda & 3 \\ 1 & 5-\lambda \end{pmatrix} = 0.$$
Hence (3 − λ)(5 − λ) − 3 = 0, with roots λ1 = 2 and λ2 = 6. Substituting λ = 2 into (1), we obtain
$$\begin{pmatrix} 1 & 3 \\ 1 & 3 \end{pmatrix} v = 0, \quad\text{with root}\quad v_1 = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.$$
Substituting λ = 6 into (1), we obtain
$$\begin{pmatrix} -3 & 3 \\ 1 & -1 \end{pmatrix} v = 0, \quad\text{with root}\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
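This computation can be reproduced numerically. A minimal sketch, assuming numpy; note that eig returns normalized eigenvectors as columns, and the ordering of the eigenvalues is not guaranteed.

```python
# A numerical companion to Example 7.1.1 (assuming numpy): eig returns the
# eigenvalues and, column-wise, the corresponding normalized eigenvectors.
import numpy as np

A = np.array([[3.0, 3.0],
              [1.0, 5.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)     # approximately [2. 6.] (order may vary)
print(eigenvectors)    # columns proportional to (3, -1) and (1, 1)
```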

Definition. Suppose that
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \tag{2}$$
is an n × n matrix with entries in R. Suppose further that there exist a number λ ∈ R and a non-zero vector v ∈ Rn such that Av = λv. Then we say that λ is an eigenvalue of the matrix A, and that v is an eigenvector corresponding to the eigenvalue λ.

Suppose that λ is an eigenvalue of the n × n matrix A, and that v is an eigenvector corresponding to the eigenvalue λ. Then Av = λv = λIv, where I is the n × n identity matrix, so that (A − λI)v = 0. Since v ∈ Rn is non-zero, it follows that we must have

det(A− λI) = 0. (3)

In other words, we must have

$$\det\begin{pmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22}-\lambda & & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}-\lambda \end{pmatrix} = 0.$$
Note that (3) is a polynomial equation. Solving this equation (3) gives the eigenvalues of the matrix A. On the other hand, for any eigenvalue λ of the matrix A, the set

{v ∈ Rn : (A − λI)v = 0} (4)

is the nullspace of the matrix A − λI, a subspace of Rn.


Definition. The polynomial (3) is called the characteristic polynomial of the matrix A. For any root λ of (3), the space (4) is called the eigenspace corresponding to the eigenvalue λ.

Example 7.1.2. The matrix
$$\begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix}$$
has characteristic polynomial (3 − λ)(5 − λ) − 3 = 0; in other words, λ² − 8λ + 12 = 0. Hence the eigenvalues are λ1 = 2 and λ2 = 6, with corresponding eigenvectors
$$v_1 = \begin{pmatrix} 3 \\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
respectively. The eigenspace corresponding to the eigenvalue 2 is
$$\left\{ v \in \mathbb{R}^2 : \begin{pmatrix} 1 & 3 \\ 1 & 3 \end{pmatrix} v = 0 \right\} = \left\{ c\begin{pmatrix} 3 \\ -1 \end{pmatrix} : c \in \mathbb{R} \right\}.$$
The eigenspace corresponding to the eigenvalue 6 is
$$\left\{ v \in \mathbb{R}^2 : \begin{pmatrix} -3 & 3 \\ 1 & -1 \end{pmatrix} v = 0 \right\} = \left\{ c\begin{pmatrix} 1 \\ 1 \end{pmatrix} : c \in \mathbb{R} \right\}.$$

Example 7.1.3. Consider the matrix

$$A = \begin{pmatrix} -1 & 6 & -12 \\ 0 & -13 & 30 \\ 0 & -9 & 20 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} -1-\lambda & 6 & -12 \\ 0 & -13-\lambda & 30 \\ 0 & -9 & 20-\lambda \end{pmatrix} = 0;$$
in other words, (λ + 1)(λ − 2)(λ − 5) = 0. The eigenvalues are therefore λ1 = −1, λ2 = 2 and λ3 = 5. An eigenvector corresponding to the eigenvalue −1 is a solution of the system
$$(A+I)v = \begin{pmatrix} 0 & 6 & -12 \\ 0 & -12 & 30 \\ 0 & -9 & 21 \end{pmatrix} v = 0, \quad\text{with root}\quad v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
An eigenvector corresponding to the eigenvalue 2 is a solution of the system
$$(A-2I)v = \begin{pmatrix} -3 & 6 & -12 \\ 0 & -15 & 30 \\ 0 & -9 & 18 \end{pmatrix} v = 0, \quad\text{with root}\quad v_2 = \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}.$$
An eigenvector corresponding to the eigenvalue 5 is a solution of the system
$$(A-5I)v = \begin{pmatrix} -6 & 6 & -12 \\ 0 & -18 & 30 \\ 0 & -9 & 15 \end{pmatrix} v = 0, \quad\text{with root}\quad v_3 = \begin{pmatrix} 1 \\ -5 \\ -3 \end{pmatrix}.$$
Note that the three eigenspaces are all lines through the origin. Note also that the eigenvectors v1, v2 and v3 are linearly independent, and so form a basis for R3.


Example 7.1.4. Consider the matrix

$$A = \begin{pmatrix} 17 & -10 & -5 \\ 45 & -28 & -15 \\ -30 & 20 & 12 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 17-\lambda & -10 & -5 \\ 45 & -28-\lambda & -15 \\ -30 & 20 & 12-\lambda \end{pmatrix} = 0;$$
in other words, (λ + 3)(λ − 2)² = 0. The eigenvalues are therefore λ1 = −3 and λ2 = 2. An eigenvector corresponding to the eigenvalue −3 is a solution of the system
$$(A+3I)v = \begin{pmatrix} 20 & -10 & -5 \\ 45 & -25 & -15 \\ -30 & 20 & 15 \end{pmatrix} v = 0, \quad\text{with root}\quad v_1 = \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix}.$$
An eigenvector corresponding to the eigenvalue 2 is a solution of the system
$$(A-2I)v = \begin{pmatrix} 15 & -10 & -5 \\ 45 & -30 & -15 \\ -30 & 20 & 10 \end{pmatrix} v = 0, \quad\text{with roots}\quad v_2 = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} \quad\text{and}\quad v_3 = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}.$$
Note that the eigenspace corresponding to the eigenvalue −3 is a line through the origin, while the eigenspace corresponding to the eigenvalue 2 is a plane through the origin. Note also that the eigenvectors v1, v2 and v3 are linearly independent, and so form a basis for R3.

Example 7.1.5. Consider the matrix

$$A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 2-\lambda & -1 & 0 \\ 1 & -\lambda & 0 \\ 0 & 0 & 3-\lambda \end{pmatrix} = 0;$$
in other words, (λ − 3)(λ − 1)² = 0. The eigenvalues are therefore λ1 = 3 and λ2 = 1. An eigenvector corresponding to the eigenvalue 3 is a solution of the system
$$(A-3I)v = \begin{pmatrix} -1 & -1 & 0 \\ 1 & -3 & 0 \\ 0 & 0 & 0 \end{pmatrix} v = 0, \quad\text{with root}\quad v_1 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
An eigenvector corresponding to the eigenvalue 1 is a solution of the system
$$(A-I)v = \begin{pmatrix} 1 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 2 \end{pmatrix} v = 0, \quad\text{with root}\quad v_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$
Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin. On the other hand, the matrix
$$\begin{pmatrix} 1 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$


has rank 2, and so the eigenspace corresponding to the eigenvalue 1 is of dimension 1 and so is also a line through the origin. We can therefore only find two linearly independent eigenvectors, so that R3 does not have a basis consisting of linearly independent eigenvectors of the matrix A.

Example 7.1.6. Consider the matrix

$$A = \begin{pmatrix} 3 & -3 & 2 \\ 1 & -1 & 2 \\ 1 & -3 & 4 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 3-\lambda & -3 & 2 \\ 1 & -1-\lambda & 2 \\ 1 & -3 & 4-\lambda \end{pmatrix} = 0;$$
in other words, (λ − 2)³ = 0. The eigenvalue is therefore λ = 2. An eigenvector corresponding to the eigenvalue 2 is a solution of the system
$$(A-2I)v = \begin{pmatrix} 1 & -3 & 2 \\ 1 & -3 & 2 \\ 1 & -3 & 2 \end{pmatrix} v = 0, \quad\text{with roots}\quad v_1 = \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}.$$
Note now that the matrix
$$\begin{pmatrix} 1 & -3 & 2 \\ 1 & -3 & 2 \\ 1 & -3 & 2 \end{pmatrix}$$
has rank 1, and so the eigenspace corresponding to the eigenvalue 2 is of dimension 2 and so is a plane through the origin. We can therefore only find two linearly independent eigenvectors, so that R3 does not have a basis consisting of linearly independent eigenvectors of the matrix A.

Example 7.1.7. Suppose that λ is an eigenvalue of a matrix A, with corresponding eigenvector v. Then

A²v = A(Av) = A(λv) = λ(Av) = λ(λv) = λ²v.

Hence λ² is an eigenvalue of the matrix A², with corresponding eigenvector v. In fact, it can be proved by induction that for every natural number k ∈ N, λᵏ is an eigenvalue of the matrix Aᵏ, with corresponding eigenvector v.
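A quick numerical check of this fact, assuming numpy, with the matrix of Example 7.1.2 and k = 5:

```python
# A check of Example 7.1.7 (assuming numpy): if Av = 2v then A^k v = 2^k v,
# here with k = 5.
import numpy as np

A = np.array([[3.0, 3.0],
              [1.0, 5.0]])
v = np.array([3.0, -1.0])                  # eigenvector with eigenvalue 2
print(np.linalg.matrix_power(A, 5) @ v)    # [96. -32.] = 2**5 * v
```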

Example 7.1.8. Consider the matrix
$$A = \begin{pmatrix} 1 & 5 & 4 \\ 0 & 2 & 6 \\ 0 & 0 & 3 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 1-\lambda & 5 & 4 \\ 0 & 2-\lambda & 6 \\ 0 & 0 & 3-\lambda \end{pmatrix} = 0;$$
in other words, (λ − 1)(λ − 2)(λ − 3) = 0. It follows that the eigenvalues of the matrix A are given by the entries on the diagonal. In fact, this is true for all triangular matrices.


7.2. The Diagonalization Problem

Example 7.2.1. Let us return to Examples 7.1.1 and 7.1.2, and consider again the matrix
$$A = \begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix}.$$
We have already shown that the matrix A has eigenvalues λ1 = 2 and λ2 = 6, with corresponding eigenvectors
$$v_1 = \begin{pmatrix} 3 \\ -1 \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
respectively. Since the eigenvectors form a basis for R2, every u ∈ R2 can be written uniquely in the form

u = c1v1 + c2v2, where c1, c2 ∈ R, (5)

and

Au = 2c1v1 + 6c2v2. (6)

Write
$$c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}, \quad u = \begin{pmatrix} x \\ y \end{pmatrix}, \quad Au = \begin{pmatrix} s \\ t \end{pmatrix}.$$
Then (5) and (6) can be rewritten as
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \tag{7}$$
and
$$\begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 2c_1 \\ 6c_2 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \tag{8}$$
respectively. If we write
$$P = \begin{pmatrix} 3 & 1 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix},$$
then (7) and (8) become u = Pc and Au = PDc respectively, so that APc = PDc. Note that c ∈ R2 is arbitrary. This implies that (AP − PD)c = 0 for every c ∈ R2. Hence we must have AP = PD. Since P is invertible, we conclude that

P⁻¹AP = D.

Note here that
$$P = (\,v_1 \;\; v_2\,) \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$

Note also the crucial point that the eigenvectors of A form a basis for R2.


We now consider the problem in general.

PROPOSITION 7A. Suppose that A is an n × n matrix, with entries in R. Suppose further that A has eigenvalues λ1, . . . , λn ∈ R, not necessarily distinct, with corresponding eigenvectors v1, . . . , vn ∈ Rn, and that v1, . . . , vn are linearly independent. Then

P⁻¹AP = D,

where
$$P = (\,v_1 \;\cdots\; v_n\,) \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.$$

Proof. Since v1, . . . , vn are linearly independent, they form a basis for Rn, so that every u ∈ Rn can be written uniquely in the form

u = c1v1 + . . . + cnvn, where c1, . . . , cn ∈ R, (9)

and

Au = A(c1v1 + . . . + cnvn) = c1Av1 + . . . + cnAvn = λ1c1v1 + . . . + λncnvn. (10)

Writing
$$c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$
we see that (9) and (10) can be rewritten as
$$u = Pc \quad\text{and}\quad Au = P\begin{pmatrix} \lambda_1c_1 \\ \vdots \\ \lambda_nc_n \end{pmatrix} = PDc$$
respectively, so that

APc = PDc.

Note that c ∈ Rn is arbitrary. This implies that (AP − PD)c = 0 for every c ∈ Rn. Hence we must have AP = PD. Since the columns of P are linearly independent, it follows that P is invertible. Hence P⁻¹AP = D as required. ©

Example 7.2.2. Consider the matrix

$$A = \begin{pmatrix} -1 & 6 & -12 \\ 0 & -13 & 30 \\ 0 & -9 & 20 \end{pmatrix},$$
as in Example 7.1.3. We have P⁻¹AP = D, where
$$P = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & -5 \\ 0 & 1 & -3 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$


Example 7.2.3. Consider the matrix

$$A = \begin{pmatrix} 17 & -10 & -5 \\ 45 & -28 & -15 \\ -30 & 20 & 12 \end{pmatrix},$$
as in Example 7.1.4. We have P⁻¹AP = D, where
$$P = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} -3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$

Definition. Suppose that A is an n × n matrix, with entries in R. We say that A is diagonalizable if there exists an invertible matrix P, with entries in R, such that P⁻¹AP is a diagonal matrix, with entries in R.

It follows from Proposition 7A that an n × n matrix A with entries in R is diagonalizable if its eigenvectors form a basis for Rn. In the opposite direction, we establish the following result.

PROPOSITION 7B. Suppose that A is an n × n matrix, with entries in R. Suppose further that A is diagonalizable. Then A has n linearly independent eigenvectors in Rn.

Proof. Suppose that A is diagonalizable. Then there exists an invertible matrix P, with entries in R, such that D = P⁻¹AP is a diagonal matrix, with entries in R. Denote by v1, . . . , vn the columns of P; in other words, write

P = ( v1 . . . vn ).

Also write
$$D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.$$
Clearly we have AP = PD. It follows that
$$(\,Av_1 \;\cdots\; Av_n\,) = A(\,v_1 \;\cdots\; v_n\,) = (\,v_1 \;\cdots\; v_n\,)\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} = (\,\lambda_1v_1 \;\cdots\; \lambda_nv_n\,).$$

Equating columns, we obtain

Av1 = λ1v1, . . . , Avn = λnvn.

It follows that A has eigenvalues λ1, . . . , λn ∈ R, with corresponding eigenvectors v1, . . . , vn ∈ Rn. Since P is invertible and v1, . . . , vn are the columns of P, it follows that the eigenvectors v1, . . . , vn are linearly independent. ©

In view of Propositions 7A and 7B, the question of diagonalizing a matrix A with entries in R is reduced to one of linear independence of its eigenvectors.

PROPOSITION 7C. Suppose that A is an n × n matrix, with entries in R. Suppose further that A has distinct eigenvalues λ1, . . . , λn ∈ R, with corresponding eigenvectors v1, . . . , vn ∈ Rn. Then v1, . . . , vn are linearly independent.


Proof. Suppose that v1, . . . , vn are linearly dependent. Then there exist c1, . . . , cn ∈ R, not all zero, such that

c1v1 + . . . + cnvn = 0. (11)

Then

A(c1v1 + . . . + cnvn) = c1Av1 + . . . + cnAvn = λ1c1v1 + . . . + λncnvn = 0. (12)

Since v1, . . . , vn are all eigenvectors and hence non-zero, it follows that at least two numbers among c1, . . . , cn are non-zero, so that c1, . . . , cn−1 are not all zero. Multiplying (11) by λn and subtracting from (12), we obtain

(λ1 − λn)c1v1 + . . . + (λn−1 − λn)cn−1vn−1 = 0.

Note that since λ1, . . . , λn are distinct, the numbers λ1 − λn, . . . , λn−1 − λn are all non-zero. It follows that v1, . . . , vn−1 are linearly dependent. To summarize, we can eliminate one eigenvector and the remaining ones are still linearly dependent. Repeating this argument a finite number of times, we arrive at a linearly dependent set of one eigenvector, clearly an absurdity. ©

We now summarize our discussion in this section.

DIAGONALIZATION PROCESS. Suppose that A is an n × n matrix with entries in R.
(1) Determine whether the n roots of the characteristic polynomial det(A − λI) are real.
(2) If not, then A is not diagonalizable. If so, then find the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
$$P = (\,v_1 \;\cdots\; v_n\,) \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},$$
where λ1, . . . , λn ∈ R are the eigenvalues of A and where v1, . . . , vn ∈ Rn are respectively their corresponding eigenvectors. Then P⁻¹AP = D.
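The whole process can be carried out by machine. A minimal sketch, assuming sympy; diagonalize() returns matrices P and D with P⁻¹AP = D, and raises an error when the matrix is not diagonalizable.

```python
# A sketch of the diagonalization process (assuming sympy), applied to the
# matrix of Example 7.1.3.
from sympy import Matrix

A = Matrix([[-1, 6, -12],
            [0, -13, 30],
            [0, -9, 20]])
P, D = A.diagonalize()
print(D)                      # diag(-1, 2, 5), up to ordering of eigenvalues
print(P.inv() * A * P == D)   # True
```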

7.3. Some Remarks

In all the examples we have discussed, we have chosen matrices A such that the characteristic polynomial det(A − λI) has only real roots. However, there are matrices A where the characteristic polynomial has non-real roots. If we permit λ1, . . . , λn to take values in C and permit "eigenvectors" to have entries in C, then we may be able to "diagonalize" the matrix A, using matrices P and D with entries in C. The details are similar.

Example 7.3.1. Consider the matrix

$$A = \begin{pmatrix} 1 & -5 \\ 1 & -1 \end{pmatrix}.$$
To find the eigenvalues of A, we need to find the roots of
$$\det\begin{pmatrix} 1-\lambda & -5 \\ 1 & -1-\lambda \end{pmatrix} = 0;$$


in other words, λ² + 4 = 0. Clearly there are no real roots, so the matrix A has no eigenvalues in R. Try to show, however, that the matrix A can be "diagonalized" to the matrix
$$D = \begin{pmatrix} 2i & 0 \\ 0 & -2i \end{pmatrix}.$$
We also state without proof the following useful result which will guarantee many examples where the characteristic polynomial has only real roots.

PROPOSITION 7D. Suppose that A is an n × n matrix, with entries in R. Suppose further that A is symmetric. Then the characteristic polynomial det(A − λI) has only real roots.

We conclude this section by discussing an application of diagonalization. We illustrate this by an example.

Example 7.3.2. Consider the matrix

$$A = \begin{pmatrix} 17 & -10 & -5 \\ 45 & -28 & -15 \\ -30 & 20 & 12 \end{pmatrix},$$
as in Example 7.2.3. Suppose that we wish to calculate A⁹⁸. Note that P⁻¹AP = D, where
$$P = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} -3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$
It follows that A = PDP⁻¹, so that
$$A^{98} = \underbrace{(PDP^{-1})\cdots(PDP^{-1})}_{98} = PD^{98}P^{-1} = P\begin{pmatrix} 3^{98} & 0 & 0 \\ 0 & 2^{98} & 0 \\ 0 & 0 & 2^{98} \end{pmatrix}P^{-1}.$$
This is much simpler than calculating A⁹⁸ directly.
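This trick is easy to verify by machine. A minimal sketch, assuming sympy, comparing PD⁹⁸P⁻¹ against direct powering in exact arithmetic:

```python
# A check of Example 7.3.2 (assuming sympy): computing A^98 through the
# diagonalization A = P D P^(-1) and comparing against direct powering.
from sympy import Matrix

A = Matrix([[17, -10, -5],
            [45, -28, -15],
            [-30, 20, 12]])
P, D = A.diagonalize()
A98 = P * D**98 * P.inv()   # D**98 just raises the diagonal entries
print(A98 == A**98)         # True
```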

7.4. An Application to Genetics

In this section, we discuss very briefly the problem of autosomal inheritance. Here we consider a set of two genes designated by G and g. Each member of the population inherits one from each parent, resulting in possible genotypes GG, Gg and gg. Furthermore, the gene G dominates the gene g, so that in the case of human eye colours, for example, people with genotype GG or Gg have brown eyes while people with genotype gg have blue eyes. It is also believed that each member of the population has equal probability of inheriting one or the other gene from each parent. The table below gives these probabilities in detail. Here the genotypes of the parents are listed on top, and the genotypes of the offspring are listed on the left.

          GG−GG   GG−Gg   GG−gg   Gg−Gg   Gg−gg   gg−gg
    GG      1      1/2      0      1/4      0       0
    Gg      0      1/2      1      1/2     1/2      0
    gg      0       0       0      1/4     1/2      1


Example 7.4.1. Suppose that a plant breeder has a large population consisting of all three genotypes. At regular intervals, each plant he owns is fertilized with a plant known to have genotype GG, and is then disposed of and replaced by one of its offspring. We would like to study the distribution of the three genotypes after n rounds of fertilization and replacements, where n is an arbitrary positive integer. Suppose that GG(n), Gg(n) and gg(n) denote the proportion of each genotype after n rounds of fertilization and replacements, and that GG(0), Gg(0) and gg(0) denote the initial proportions. Then clearly we have

GG(n) +Gg(n) + gg(n) = 1 for every n = 0, 1, 2, . . . .

On the other hand, the left hand half of the table above shows that for every n = 1, 2, 3, . . . , we have

$$GG(n) = GG(n-1) + \tfrac{1}{2}Gg(n-1), \quad Gg(n) = \tfrac{1}{2}Gg(n-1) + gg(n-1), \quad gg(n) = 0,$$
so that
$$\begin{pmatrix} GG(n) \\ Gg(n) \\ gg(n) \end{pmatrix} = \begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1/2 & 1 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} GG(n-1) \\ Gg(n-1) \\ gg(n-1) \end{pmatrix}.$$
It follows that
$$\begin{pmatrix} GG(n) \\ Gg(n) \\ gg(n) \end{pmatrix} = A^n\begin{pmatrix} GG(0) \\ Gg(0) \\ gg(0) \end{pmatrix} \quad\text{for every } n = 1, 2, 3, \ldots,$$
where the matrix
$$A = \begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1/2 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$
has eigenvalues λ1 = 1, λ2 = 0, λ3 = 1/2, with respective eigenvectors
$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}.$$

We therefore write
$$P = \begin{pmatrix} 1 & 1 & 1 \\ 0 & -2 & -1 \\ 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1/2 \end{pmatrix}, \quad\text{with}\quad P^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & -1 & -2 \end{pmatrix}.$$
Then P⁻¹AP = D, so that A = PDP⁻¹, and so
$$A^n = PD^nP^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & -2 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1/2^n \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & -1 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 1-1/2^n & 1-1/2^{n-1} \\ 0 & 1/2^n & 1/2^{n-1} \\ 0 & 0 & 0 \end{pmatrix}.$$


It follows that
$$\begin{pmatrix} GG(n) \\ Gg(n) \\ gg(n) \end{pmatrix} = \begin{pmatrix} 1 & 1-1/2^n & 1-1/2^{n-1} \\ 0 & 1/2^n & 1/2^{n-1} \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} GG(0) \\ Gg(0) \\ gg(0) \end{pmatrix} = \begin{pmatrix} GG(0)+Gg(0)+gg(0)-Gg(0)/2^n-gg(0)/2^{n-1} \\ Gg(0)/2^n+gg(0)/2^{n-1} \\ 0 \end{pmatrix}$$
$$= \begin{pmatrix} 1-Gg(0)/2^n-gg(0)/2^{n-1} \\ Gg(0)/2^n+gg(0)/2^{n-1} \\ 0 \end{pmatrix} \to \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{as } n \to \infty.$$

This means that nearly the whole crop will have genotype GG.
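The convergence can also be observed by simply iterating the recurrence. A minimal sketch, assuming numpy:

```python
# A small simulation of Example 7.4.1 (assuming numpy): iterating the
# transition matrix drives any initial genotype distribution to (1, 0, 0).
import numpy as np

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0],
              [0.0, 0.0, 0.0]])
x = np.array([0.25, 0.5, 0.25])   # an arbitrary initial distribution
for n in range(20):
    x = A @ x
print(x)                           # close to [1. 0. 0.]
```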


Problems for Chapter 7

1. For each of the following 2 × 2 matrices, find all eigenvalues and describe the eigenspace of the matrix; if possible, diagonalize the matrix:
$$\text{a)}\ \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \qquad \text{b)}\ \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix}$$

2. For each of the following 3 × 3 matrices, find all eigenvalues and describe the eigenspace of the matrix; if possible, diagonalize the matrix:
$$\text{a)}\ \begin{pmatrix} -2 & 9 & -6 \\ 1 & -2 & 0 \\ 3 & -9 & 5 \end{pmatrix} \qquad \text{b)}\ \begin{pmatrix} 2 & -1 & -1 \\ 0 & 3 & 2 \\ -1 & 1 & 2 \end{pmatrix} \qquad \text{c)}\ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

3. Consider the matrices
$$A = \begin{pmatrix} -10 & 6 & 3 \\ -26 & 16 & 8 \\ 16 & -10 & -5 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & -6 & -16 \\ 0 & 17 & 45 \\ 0 & -6 & -16 \end{pmatrix}.$$
a) Show that A and B have the same eigenvalues.
b) Reduce A and B to the same diagonal matrix.
c) Explain why there is an invertible matrix R such that R⁻¹AR = B.

4. Find A⁸ and B⁸, where A and B are the two matrices in Problem 3.

5. Suppose that θ ∈ R is not an integer multiple of π. Show that the matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ does not have an eigenvector in R2.

6. Consider the matrix $A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$, where θ ∈ R.
a) Show that A has an eigenvector in R2 with eigenvalue 1.
b) Show that any vector v ∈ R2 perpendicular to the eigenvector in part (a) must satisfy Av = −v.

7. Let a ∈ R be non-zero. Show that the matrix $\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$ cannot be diagonalized.


LINEAR ALGEBRA

W W L CHEN

© W W L Chen, 1997, 2008.

This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 8

LINEAR TRANSFORMATIONS

8.1. Euclidean Linear Transformations

By a transformation from Rn into Rm, we mean a function of the type T : Rn → Rm, with domain Rn and codomain Rm. For every vector x ∈ Rn, the vector T(x) ∈ Rm is called the image of x under the transformation T, and the set

R(T) = {T(x) : x ∈ Rn},

of all images under T, is called the range of the transformation T.

Remark. For our convenience later, we have chosen to use R(T) instead of the usual T(Rn) to denote the range of the transformation T.

For every x = (x1, . . . , xn) ∈ Rn, we can write

T (x) = T (x1, . . . , xn) = (y1, . . . , ym).

Here, for every i = 1, . . . ,m, we have

yi = Ti(x1, . . . , xn), (1)

where Ti : Rn → R is a real valued function.

Definition. A transformation T : Rn → Rm is called a linear transformation if there exists a real matrix
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$


such that for every x = (x1, . . . , xn) ∈ Rn, we have T(x1, . . . , xn) = (y1, . . . , ym), where
$$\begin{aligned} y_1 &= a_{11}x_1 + \ldots + a_{1n}x_n, \\ &\ \ \vdots \\ y_m &= a_{m1}x_1 + \ldots + a_{mn}x_n, \end{aligned}$$
or, in matrix notation,
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{2}$$
The matrix A is called the standard matrix for the linear transformation T.

Remarks. (1) In other words, a transformation T : Rn → Rm is linear if the equation (1) for every i = 1, . . . , m is linear.

(2) If we write x ∈ Rn and y ∈ Rm as column matrices, then (2) can be written in the form y = Ax, and so the linear transformation T can be interpreted as multiplication of x ∈ Rn by the standard matrix A.

Definition. A linear transformation T : Rn → Rm is said to be a linear operator if n = m. In this case, we say that T is a linear operator on Rn.

Example 8.1.1. The linear transformation T : R5 → R3, defined by the equations

y1 = 2x1 + 3x2 + 5x3 + 7x4 − 9x5,

y2 = 3x2 + 4x3 + 2x5,

y3 = x1 + 3x3 − 2x4 ,

can be expressed in matrix form as
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 5 & 7 & -9 \\ 0 & 3 & 4 & 0 & 2 \\ 1 & 0 & 3 & -2 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}.$$
If (x1, x2, x3, x4, x5) = (1, 0, 1, 0, 1), then
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 5 & 7 & -9 \\ 0 & 3 & 4 & 0 & 2 \\ 1 & 0 & 3 & -2 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 6 \\ 4 \end{pmatrix},$$

so that T (1, 0, 1, 0, 1) = (−2, 6, 4).
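A one-line check of this computation, assuming numpy:

```python
# A check of Example 8.1.1 (assuming numpy): a linear transformation is
# just multiplication by its standard matrix.
import numpy as np

A = np.array([[2, 3, 5, 7, -9],
              [0, 3, 4, 0, 2],
              [1, 0, 3, -2, 0]])
x = np.array([1, 0, 1, 0, 1])
print(A @ x)   # [-2  6  4], so T(1, 0, 1, 0, 1) = (-2, 6, 4)
```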

Example 8.1.2. Suppose that A is the zero m × n matrix. The linear transformation T : Rn → Rm, where T(x) = Ax for every x ∈ Rn, is the zero transformation from Rn into Rm. Clearly T(x) = 0 for every x ∈ Rn.

Example 8.1.3. Suppose that I is the identity n × n matrix. The linear operator T : Rn → Rn, where T(x) = Ix for every x ∈ Rn, is the identity operator on Rn. Clearly T(x) = x for every x ∈ Rn.


PROPOSITION 8A. Suppose that T : Rn → Rm is a linear transformation, and that {e1, . . . , en} is the standard basis for Rn. Then the standard matrix for T is given by

A = (T (e1) . . . T (en) ) ,

where T (ej) is a column matrix for every j = 1, . . . , n.

Proof. This follows immediately from (2). ©

8.2. Linear Operators on R2

In this section, we consider the special case when n = m = 2, and study linear operators on R2. For every x ∈ R2, we shall write x = (x1, x2).

Example 8.2.1. Consider reflection across the x2-axis, so that T (x1, x2) = (−x1, x2). Clearly we have

$$T(e_1) = \begin{pmatrix} -1 \\ 0 \end{pmatrix} \quad\text{and}\quad T(e_2) = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
and so it follows from Proposition 8A that the standard matrix is given by
$$A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.$$
It is not difficult to see that the standard matrices for reflection across the x1-axis and across the line x1 = x2 are given respectively by
$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Also, the standard matrix for reflection across the origin is given by
$$A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$
We give a summary in the table below:

    Linear operator              Equations             Standard matrix
    Reflection across x2-axis    y1 = −x1, y2 = x2     $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$
    Reflection across x1-axis    y1 = x1, y2 = −x2     $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
    Reflection across x1 = x2    y1 = x2, y2 = x1      $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
    Reflection across origin     y1 = −x1, y2 = −x2    $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$

Example 8.2.2. For orthogonal projection onto the x1-axis, we have T(x1, x2) = (x1, 0), with standard matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$


Similarly, the standard matrix for orthogonal projection onto the x2-axis is given by
$$A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
We give a summary in the table below:

    Linear operator                       Equations           Standard matrix
    Orthogonal projection onto x1-axis    y1 = x1, y2 = 0     $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$
    Orthogonal projection onto x2-axis    y1 = 0, y2 = x2     $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$

Example 8.2.3. For anticlockwise rotation by an angle θ, we have T(x1, x2) = (y1, y2), where

y1 + iy2 = (x1 + ix2)(cos θ + i sin θ),

and so
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
It follows that the standard matrix is given by
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
We give a summary in the table below:

    Linear operator                      Equations                                           Standard matrix
    Anticlockwise rotation by angle θ    y1 = x1 cos θ − x2 sin θ, y2 = x1 sin θ + x2 cos θ  $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$

Example 8.2.4. For contraction or dilation by a non-negative scalar k, we have T(x1, x2) = (kx1, kx2), with standard matrix
$$A = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix}.$$
The operator is called a contraction if 0 < k < 1 and a dilation if k > 1, and can be extended to negative values of k by noting that for k < 0, we have
$$\begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} -k & 0 \\ 0 & -k \end{pmatrix}.$$
This describes contraction or dilation by non-negative scalar −k followed by reflection across the origin. We give a summary in the table below:

    Linear operator                        Equations             Standard matrix
    Contraction or dilation by factor k    y1 = kx1, y2 = kx2    $\begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix}$


Example 8.2.5. For expansion or compression in the x1-direction by a positive factor k, we have T(x1, x2) = (kx1, x2), with standard matrix
$$A = \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}.$$
This can be extended to negative values of k by noting that for k < 0, we have
$$\begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} -k & 0 \\ 0 & 1 \end{pmatrix}.$$
This describes expansion or compression in the x1-direction by positive factor −k followed by reflection across the x2-axis. Similarly, for expansion or compression in the x2-direction by a non-zero factor k, we have the standard matrix
$$A = \begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix}.$$
We give a summary in the table below:

    Linear operator                             Equations            Standard matrix
    Expansion or compression in x1-direction    y1 = kx1, y2 = x2    $\begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}$
    Expansion or compression in x2-direction    y1 = x1, y2 = kx2    $\begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix}$

Example 8.2.6. For shears in the x1-direction with factor k, we have T(x1, x2) = (x1 + kx2, x2), with standard matrix
$$A = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}.$$

For the case k = 1, we have the following.

[Figure: a row of points and its image under the shear T with k = 1.]

For the case k = −1, we have the following.

[Figure: a row of points and its image under the shear T with k = −1.]

Similarly, for shears in the x2-direction with factor k, we have standard matrix
$$A = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}.$$
We give a summary in the table below:

    Linear operator          Equations                  Standard matrix
    Shear in x1-direction    y1 = x1 + kx2, y2 = x2     $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$
    Shear in x2-direction    y1 = x1, y2 = kx1 + x2     $\begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$

Example 8.2.7. Consider a linear operator T : R2 → R2 which consists of a reflection across the x2-axis, followed by a shear in the x1-direction with factor 3 and then reflection across the x1-axis. To find the standard matrix, consider the effect of T on a standard basis {e1, e2} of R2. Note that
$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} -1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} -1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} -1 \\ 0 \end{pmatrix} = T(e_1),$$
$$e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 3 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 3 \\ -1 \end{pmatrix} = T(e_2),$$
so it follows from Proposition 8A that the standard matrix for T is
$$A = \begin{pmatrix} -1 & 3 \\ 0 & -1 \end{pmatrix}.$$
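The same standard matrix can be obtained by multiplying the three standard matrices in right-to-left order. A minimal sketch, assuming numpy:

```python
# A check of Example 8.2.7 (assuming numpy): composing the three operations
# amounts to multiplying their standard matrices, rightmost applied first.
import numpy as np

reflect_x2 = np.array([[-1, 0], [0, 1]])   # reflection across the x2-axis
shear = np.array([[1, 3], [0, 1]])         # shear in x1-direction, factor 3
reflect_x1 = np.array([[1, 0], [0, -1]])   # reflection across the x1-axis

A = reflect_x1 @ shear @ reflect_x2        # first operation on the right
print(A)                                    # [[-1  3], [ 0 -1]]
```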

Let us summarize the above and consider a few special cases. We have the following table of invertible linear operators with k ≠ 0. Clearly, if A is the standard matrix for an invertible linear operator T, then the inverse matrix A⁻¹ is the standard matrix for the inverse linear operator T⁻¹.

    Linear operator T                           Standard matrix A                                 Inverse matrix A⁻¹                                     Linear operator T⁻¹
    Reflection across line x1 = x2              $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$    $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$         Reflection across line x1 = x2
    Expansion or compression in x1-direction    $\begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}$    $\begin{pmatrix} k^{-1} & 0 \\ 0 & 1 \end{pmatrix}$    Expansion or compression in x1-direction
    Expansion or compression in x2-direction    $\begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix}$    $\begin{pmatrix} 1 & 0 \\ 0 & k^{-1} \end{pmatrix}$    Expansion or compression in x2-direction
    Shear in x1-direction                       $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$    $\begin{pmatrix} 1 & -k \\ 0 & 1 \end{pmatrix}$        Shear in x1-direction
    Shear in x2-direction                       $\begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$    $\begin{pmatrix} 1 & 0 \\ -k & 1 \end{pmatrix}$        Shear in x2-direction

Next, let us consider the question of elementary row operations on 2 × 2 matrices. It is not difficult to see that an elementary row operation performed on a 2 × 2 matrix A has the effect of multiplying the matrix A by some elementary matrix E to give the product EA. We have the following table.

    Elementary row operation                  Elementary matrix E
    Interchanging the two rows                $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
    Multiplying row 1 by non-zero factor k    $\begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}$
    Multiplying row 2 by non-zero factor k    $\begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix}$
    Adding k times row 2 to row 1             $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$
    Adding k times row 1 to row 2             $\begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$

Now, we know that any invertible matrix A can be reduced to the identity matrix by a finite number of elementary row operations. In other words, there exist a finite number of elementary matrices E1, . . . , Es of the types above with various non-zero values of k such that

Es . . . E1A = I,

so that

A = E1⁻¹ . . . Es⁻¹.

We have proved the following result.

PROPOSITION 8B. Suppose that the linear operator T : R2 → R2 has standard matrix A, where A is invertible. Then T is the product of a succession of finitely many reflections, expansions, compressions and shears.

In fact, we can prove the following result concerning images of straight lines.

PROPOSITION 8C. Suppose that the linear operator T : R2 → R2 has standard matrix A, where A is invertible. Then
(a) the image under T of a straight line is a straight line;
(b) the image under T of a straight line through the origin is a straight line through the origin; and
(c) the images under T of parallel straight lines are parallel straight lines.

Proof. Suppose that T(x1, x2) = (y1, y2). Since A is invertible, we have x = A⁻¹y, where
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$
The equation of a straight line is given by αx1 + βx2 = γ or, in matrix form, by
$$(\,\alpha \;\; \beta\,)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (\,\gamma\,).$$
Hence
$$(\,\alpha \;\; \beta\,)A^{-1}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (\,\gamma\,).$$


Let
$$(\,\alpha' \;\; \beta'\,) = (\,\alpha \;\; \beta\,)A^{-1}.$$
Then
$$(\,\alpha' \;\; \beta'\,)\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = (\,\gamma\,).$$
In other words, the image under T of the straight line αx1 + βx2 = γ is α′y1 + β′y2 = γ, clearly another straight line. This proves (a). To prove (b), note that straight lines through the origin correspond to γ = 0. To prove (c), note that parallel straight lines correspond to different values of γ for the same values of α and β. ©

8.3. Elementary Properties of Euclidean Linear Transformations

In this section, we establish a number of simple properties of euclidean linear transformations.

PROPOSITION 8D. Suppose that T1 : Rn → Rm and T2 : Rm → Rk are linear transformations. Then T = T2 ◦ T1 : Rn → Rk is also a linear transformation.

Proof. Since T1 and T2 are linear transformations, they have standard matrices A1 and A2 respectively. In other words, we have T1(x) = A1x for every x ∈ Rn and T2(y) = A2y for every y ∈ Rm. It follows that T(x) = T2(T1(x)) = A2A1x for every x ∈ Rn, so that T has standard matrix A2A1. ©

Example 8.3.1. Suppose that T1 : R2 → R2 is anticlockwise rotation by π/2 and T2 : R2 → R2 is orthogonal projection onto the x1-axis. Then the respective standard matrices are
$$A_1 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad A_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
It follows that the standard matrices for T2 ◦ T1 and T1 ◦ T2 are respectively
$$A_2A_1 = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad A_1A_2 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$
Hence T2 ◦ T1 and T1 ◦ T2 are not equal.

Example 8.3.2. Suppose that T1 : R2 → R2 is anticlockwise rotation by θ and T2 : R2 → R2 is anticlockwise rotation by φ. Then the respective standard matrices are
$$A_1 = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad\text{and}\quad A_2 = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}.$$
It follows that the standard matrix for T2 ◦ T1 is
$$A_2A_1 = \begin{pmatrix} \cos\phi\cos\theta - \sin\phi\sin\theta & -\cos\phi\sin\theta - \sin\phi\cos\theta \\ \sin\phi\cos\theta + \cos\phi\sin\theta & \cos\phi\cos\theta - \sin\phi\sin\theta \end{pmatrix} = \begin{pmatrix} \cos(\phi+\theta) & -\sin(\phi+\theta) \\ \sin(\phi+\theta) & \cos(\phi+\theta) \end{pmatrix}.$$
Hence T2 ◦ T1 is anticlockwise rotation by φ + θ.
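A numerical check of this composition rule, assuming numpy; the helper name rotation is hypothetical, introduced here for illustration.

```python
# A check of Example 8.3.2 (assuming numpy): composing rotations by theta
# and phi gives rotation by phi + theta.
import numpy as np

def rotation(t):
    # Standard matrix for anticlockwise rotation by angle t.
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

theta, phi = 0.3, 1.1
print(np.allclose(rotation(phi) @ rotation(theta),
                  rotation(phi + theta)))   # True
```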

Example 8.3.3. The reader should check that in R2, reflection across the x1-axis followed by reflection across the x2-axis gives reflection across the origin.

Linear transformations that map distinct vectors to distinct vectors are of special importance.


Definition. A linear transformation T : Rn → Rm is said to be one-to-one if for every x′, x′′ ∈ Rn, we have x′ = x′′ whenever T(x′) = T(x′′).

Example 8.3.4. If we consider linear operators T : R2 → R2, then T is one-to-one precisely when the standard matrix A is invertible. To see this, suppose first of all that A is invertible. If T(x′) = T(x′′), then Ax′ = Ax′′. Multiplying on the left by A⁻¹, we obtain x′ = x′′. Suppose next that A is not invertible. Then there exists x ∈ R2 such that x ≠ 0 and Ax = 0. On the other hand, we clearly have A0 = 0. It follows that T(x) = T(0), so that T is not one-to-one.

PROPOSITION 8E. Suppose that the linear operator T : Rn → Rn has standard matrix A. Then the following statements are equivalent:
(a) The matrix A is invertible.
(b) The linear operator T is one-to-one.
(c) The range of T is Rn; in other words, R(T) = Rn.

Proof. ((a)⇒(b)) Suppose that T(x′) = T(x′′). Then Ax′ = Ax′′. Multiplying on the left by A⁻¹ gives x′ = x′′.

((b)⇒(a)) Suppose that T is one-to-one. Then the system Ax = 0 has unique solution x = 0 in Rn. It follows that A can be reduced by elementary row operations to the identity matrix I, and is therefore invertible.

((a)⇒(c)) For any y ∈ Rn, clearly x = A−1y satisfies Ax = y, so that T (x) = y.

((c)⇒(a)) Suppose that {e1, . . . , en} is the standard basis for Rn. Let x1, . . . , xn ∈ Rn be chosen to satisfy T(xj) = ej, so that Axj = ej, for every j = 1, . . . , n. Write

C = ( x1 . . . xn ) .

Then AC = I, so that A is invertible. ©

Definition. Suppose that the linear operator T : Rn → Rn has standard matrix A, where A is invertible. Then the linear operator T⁻¹ : Rn → Rn, defined by T⁻¹(x) = A⁻¹x for every x ∈ Rn, is called the inverse of the linear operator T.

Remark. Clearly T−1(T (x)) = x and T (T−1(x)) = x for every x ∈ Rn.

Example 8.3.5. Consider the linear operator T : R2 → R2, defined by T(x) = Ax for every x ∈ R2, where
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$
Clearly A is invertible, and
$$A^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}.$$

Hence the inverse linear operator is T−1 : R2 → R2, defined by T−1(x) = A−1x for every x ∈ R2.

Example 8.3.6. Suppose that T : R2 → R2 is anticlockwise rotation by angle θ. The reader should check that T⁻¹ : R2 → R2 is anticlockwise rotation by angle 2π − θ.

Next, we study the linearity properties of euclidean linear transformations which we shall use later to discuss linear transformations in arbitrary real vector spaces.


PROPOSITION 8F. A transformation T : Rn → Rm is linear if and only if the following two conditions are satisfied:
(a) For every u, v ∈ Rn, we have T(u + v) = T(u) + T(v).
(b) For every u ∈ Rn and c ∈ R, we have T(cu) = cT(u).

Proof. Suppose first of all that T : Rn → Rm is a linear transformation. Let A be the standard matrix for T. Then for every u, v ∈ Rn and c ∈ R, we have

T (u + v) = A(u + v) = Au +Av = T (u) + T (v)

and

T (cu) = A(cu) = c(Au) = cT (u).

Suppose now that (a) and (b) hold. To show that T is linear, we need to find a matrix A such that T(x) = Ax for every x ∈ Rn. Suppose that {e1, . . . , en} is the standard basis for Rn. As suggested by Proposition 8A, we write

A = (T(e1) . . . T(en)),

where T(ej) is a column matrix for every j = 1, . . . , n. For any vector
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
in Rn, we have
$$Ax = (\,T(e_1) \;\cdots\; T(e_n)\,)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1T(e_1) + \ldots + x_nT(e_n).$$

Using (b) on each summand and then using (a) inductively, we obtain

Ax = T (x1e1) + . . .+ T (xnen) = T (x1e1 + . . .+ xnen) = T (x)

as required. ©

To conclude our study of euclidean linear transformations, we briefly mention the problem of eigenvalues and eigenvectors of euclidean linear operators.

Definition. Suppose that T : Rn → Rn is a linear operator. Then any real number λ ∈ R is called an eigenvalue of T if there exists a non-zero vector x ∈ Rn such that T(x) = λx. This non-zero vector x ∈ Rn is called an eigenvector of T corresponding to the eigenvalue λ.

Remark. Note that the equation T(x) = λx is equivalent to the equation Ax = λx. It follows that there is no distinction between eigenvalues and eigenvectors of T and those of the standard matrix A. We therefore do not need to discuss this problem any further.

8.4. General Linear Transformations

Suppose that V and W are real vector spaces. To define a linear transformation from V into W, we are motivated by Proposition 8F which describes the linearity properties of euclidean linear transformations.


By a transformation from V into W, we mean a function of the type T : V → W, with domain V and codomain W. For every vector u ∈ V, the vector T(u) ∈ W is called the image of u under the transformation T.

Definition. A transformation T : V → W from a real vector space V into a real vector space W is called a linear transformation if the following two conditions are satisfied:
(LT1) For every u, v ∈ V, we have T(u + v) = T(u) + T(v).
(LT2) For every u ∈ V and c ∈ R, we have T(cu) = cT(u).

Definition. A linear transformation T : V → V from a real vector space V into itself is called a linear operator on V.

Example 8.4.1. Suppose that V and W are two real vector spaces. The transformation T : V → W, where T(u) = 0 for every u ∈ V, is clearly linear, and is called the zero transformation from V to W.

Example 8.4.2. Suppose that V is a real vector space. The transformation I : V → V, where I(u) = u for every u ∈ V, is clearly linear, and is called the identity operator on V.

Example 8.4.3. Suppose that V is a real vector space, and that k ∈ R is fixed. The transformation T : V → V, where T(u) = ku for every u ∈ V, is clearly linear. This operator is called a dilation if k > 1 and a contraction if 0 < k < 1.

Example 8.4.4. Suppose that V is a finite dimensional vector space, with basis {w1, . . . , wn}. Define a transformation T : V → Rn as follows. For every u ∈ V, there exists a unique vector (β1, . . . , βn) ∈ Rn such that u = β1w1 + . . . + βnwn. We let T(u) = (β1, . . . , βn). In other words, the transformation T gives the coordinates of any vector u ∈ V with respect to the given basis {w1, . . . , wn}. Suppose now that v = γ1w1 + . . . + γnwn is another vector in V. Then u + v = (β1 + γ1)w1 + . . . + (βn + γn)wn, so that

T (u + v) = (β1 + γ1, . . . , βn + γn) = (β1, . . . , βn) + (γ1, . . . , γn) = T (u) + T (v).

Also, if c ∈ R, then cu = cβ1w1 + . . .+ cβnwn, so that

T (cu) = (cβ1, . . . , cβn) = c(β1, . . . , βn) = cT (u).

Hence T is a linear transformation. We shall return to this in greater detail in the next section.

Example 8.4.5. Suppose that Pn denotes the vector space of all polynomials with real coefficients and degree at most n. Define a transformation T : Pn → Pn as follows. For every polynomial

p = p0 + p1x + . . . + pnxⁿ

in Pn, we let

T(p) = pn + pn−1x + . . . + p0xⁿ.

Suppose now that q = q0 + q1x + . . . + qnxⁿ is another polynomial in Pn. Then

p + q = (p0 + q0) + (p1 + q1)x + . . . + (pn + qn)xⁿ,

so that

T(p + q) = (pn + qn) + (pn−1 + qn−1)x + . . . + (p0 + q0)xⁿ = (pn + pn−1x + . . . + p0xⁿ) + (qn + qn−1x + . . . + q0xⁿ) = T(p) + T(q).


Also, for any c ∈ R, we have cp = cp0 + cp1x + . . . + cpnxⁿ, so that

T(cp) = cpn + cpn−1x + . . . + cp0xⁿ = c(pn + pn−1x + . . . + p0xⁿ) = cT(p).

Hence T is a linear transformation.

Example 8.4.6. Let V denote the vector space of all real valued functions differentiable everywhere in R, and let W denote the vector space of all real valued functions defined on R. Consider the transformation T : V → W, where T(f) = f′ for every f ∈ V. It is easy to check from properties of derivatives that T is a linear transformation.

Example 8.4.7. Let V denote the vector space of all real valued functions that are Riemann integrable over the interval [0, 1]. Consider the transformation T : V → R, where
$$T(f) = \int_0^1 f(x)\,dx$$
for every f ∈ V. It is easy to check from properties of the Riemann integral that T is a linear transformation.

Consider a linear transformation T : V → W from a finite dimensional real vector space V into a real vector space W. Suppose that {v1, . . . , vn} is a basis of V. Then every u ∈ V can be written uniquely in the form u = β1v1 + . . . + βnvn, where β1, . . . , βn ∈ R. It follows that

T (u) = T (β1v1 + . . .+ βnvn) = T (β1v1) + . . .+ T (βnvn) = β1T (v1) + . . .+ βnT (vn).

We have therefore proved the following generalization of Proposition 8A.

PROPOSITION 8G. Suppose that T : V → W is a linear transformation from a finite dimensional real vector space V into a real vector space W. Suppose further that {v1, . . . , vn} is a basis of V. Then T is completely determined by T(v1), . . . , T(vn).

Example 8.4.8. Consider a linear transformation T : P2 → R, where T(1) = 1, T(x) = 2 and T(x²) = 3. Since {1, x, x²} is a basis of P2, this linear transformation is completely determined. In particular, we have, for example,

T(5 − 3x + 2x²) = 5T(1) − 3T(x) + 2T(x²) = 5.

Example 8.4.9. Consider a linear transformation T : R4 → R, where T(1, 0, 0, 0) = 1, T(1, 1, 0, 0) = 2, T(1, 1, 1, 0) = 3 and T(1, 1, 1, 1) = 4. Since {(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)} is a basis of R4, this linear transformation is completely determined. In particular, we have, for example,

T(6, 4, 3, 1) = T(2(1, 0, 0, 0) + (1, 1, 0, 0) + 2(1, 1, 1, 0) + (1, 1, 1, 1)) = 2T(1, 0, 0, 0) + T(1, 1, 0, 0) + 2T(1, 1, 1, 0) + T(1, 1, 1, 1) = 14.

We also have the following generalization of Proposition 8D.

PROPOSITION 8H. Suppose that V, W, U are real vector spaces. Suppose further that T1 : V → W and T2 : W → U are linear transformations. Then T = T2 ◦ T1 : V → U is also a linear transformation.

Proof. Suppose that u,v ∈ V . Then

T (u + v) = T2(T1(u + v)) = T2(T1(u) + T1(v)) = T2(T1(u)) + T2(T1(v)) = T (u) + T (v).

Also, if c ∈ R, then

T (cu) = T2(T1(cu)) = T2(cT1(u)) = cT2(T1(u)) = cT (u).

Hence T is a linear transformation. ©


8.5. Change of Basis

Suppose that V is a real vector space, with basis B = {u1, . . . , un}. Then every vector u ∈ V can be written uniquely as a linear combination

u = β1u1 + . . .+ βnun, where β1, . . . , βn ∈ R. (3)

It follows that the vector u can be identified with the vector (β1, . . . , βn) ∈ Rn.

Definition. Suppose that u ∈ V and (3) holds. Then the matrix
$$[u]_B = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}$$
is called the coordinate matrix of u relative to the basis B = {u1, . . . , un}.

Example 8.5.1. The vectors

u1 = (1, 2, 1, 0), u2 = (3, 3, 3, 0), u3 = (2,−10, 0, 0), u4 = (−2, 1,−6, 2)

are linearly independent in R4, and so B = {u1, u2, u3, u4} is a basis of R4. It follows that for any u = (x, y, z, w) ∈ R4, we can write

u = β1u1 + β2u2 + β3u3 + β4u4.

In matrix notation, this becomes
$$\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 1 & 3 & 2 & -2 \\ 2 & 3 & -10 & 1 \\ 1 & 3 & 0 & -6 \\ 0 & 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix},$$
so that
$$[u]_B = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 2 & -2 \\ 2 & 3 & -10 & 1 \\ 1 & 3 & 0 & -6 \\ 0 & 0 & 0 & 2 \end{pmatrix}^{-1}\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}.$$

Remark. Consider a function φ : V → Rn, where φ(u) = [u]B for every u ∈ V. It is not difficult to see that this function gives rise to a one-to-one correspondence between the elements of V and the elements of Rn. Furthermore, note that

[u + v]B = [u]B + [v]B and [cu]B = c[u]B,

so that φ(u + v) = φ(u) + φ(v) and φ(cu) = cφ(u) for every u, v ∈ V and c ∈ R. Thus φ is a linear transformation, and preserves much of the structure of V. We also say that V is isomorphic to Rn. In practice, once we have made this identification between vectors and their coordinate matrices, then we can basically forget about the basis B and imagine that we are working in Rn with the standard basis.

Clearly, if we change from one basis B = {u1, . . . , un} to another basis C = {v1, . . . , vn} of V, then we also need to find a way of calculating [u]C in terms of [u]B for every vector u ∈ V. To do this, note that each of the vectors v1, . . . , vn can be written uniquely as a linear combination of the vectors u1, . . . , un. Suppose that for i = 1, . . . , n, we have

vi = a1iu1 + . . . + aniun, where a1i, . . . , ani ∈ R,


so that
$$[v_i]_B = \begin{pmatrix} a_{1i} \\ \vdots \\ a_{ni} \end{pmatrix}.$$

For every u ∈ V , we can write

u = β1u1 + . . .+ βnun = γ1v1 + . . .+ γnvn, where β1, . . . , βn, γ1, . . . , γn ∈ R,

so that
$$[u]_B = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} \quad\text{and}\quad [u]_C = \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}.$$

Clearly
$$u = \gamma_1v_1 + \ldots + \gamma_nv_n = \gamma_1(a_{11}u_1 + \ldots + a_{n1}u_n) + \ldots + \gamma_n(a_{1n}u_1 + \ldots + a_{nn}u_n) = (\gamma_1a_{11} + \ldots + \gamma_na_{1n})u_1 + \ldots + (\gamma_1a_{n1} + \ldots + \gamma_na_{nn})u_n = \beta_1u_1 + \ldots + \beta_nu_n.$$
Hence
$$\begin{aligned} \beta_1 &= \gamma_1a_{11} + \ldots + \gamma_na_{1n}, \\ &\ \ \vdots \\ \beta_n &= \gamma_1a_{n1} + \ldots + \gamma_na_{nn}. \end{aligned}$$
Written in matrix notation, we have
$$\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}.$$

We have proved the following result.

PROPOSITION 8J. Suppose that B = {u1, . . . , un} and C = {v1, . . . , vn} are two bases of a real vector space V. Then for every u ∈ V, we have

[u]B = P [u]C ,

where the columns of the matrix

P = ( [v1]B . . . [vn]B )

are precisely the coordinate matrices of the elements of C relative to the basis B.

Remark. Strictly speaking, Proposition 8J gives [u]B in terms of [u]C. However, note that the matrix P is invertible (why?), so that [u]C = P⁻¹[u]B.

Definition. The matrix P in Proposition 8J is sometimes called the transition matrix from the basis C to the basis B.

Chapter 8 : Linear Transformations page 14 of 35

Page 173: Linear Algebra WWL Chen

Linear Algebra c© W W L Chen, 1997, 2008

Example 8.5.2. We know that with

u1 = (1, 2, 1, 0), u2 = (3, 3, 3, 0), u3 = (2,−10, 0, 0), u4 = (−2, 1,−6, 2),

and with

v1 = (1, 2, 1, 0), v2 = (1,−1, 1, 0), v3 = (1, 0,−1, 0), v4 = (0, 0, 0, 2),

both B = {u1, u2, u3, u4} and C = {v1, v2, v3, v4} are bases of R4. It is easy to check that

v1 = u1,

v2 = −2u1 + u2,

v3 = 11u1 − 4u2 + u3,

v4 = −27u1 + 11u2 − 2u3 + u4,

so that

P = ( [v1]B [v2]B [v3]B [v4]B ) =
    [ 1  −2  11  −27 ]
    [ 0   1  −4   11 ]
    [ 0   0   1   −2 ]
    [ 0   0   0    1 ]

Hence [u]B = P [u]C for every u ∈ R4. It is also easy to check that

u1 = v1,

u2 = 2v1 + v2,

u3 = −3v1 + 4v2 + v3,

u4 = −v1 − 3v2 + 2v3 + v4,

so that

Q = ( [u1]C [u2]C [u3]C [u4]C ) =
    [ 1  2  −3  −1 ]
    [ 0  1   4  −3 ]
    [ 0  0   1   2 ]
    [ 0  0   0   1 ]

Hence [u]C = Q[u]B for every u ∈ R4. Note that PQ = I. Now let u = (6, −1, 2, 2). We can check that u = v1 + 3v2 + 2v3 + v4, so that

    [u]C = [ 1 ]
           [ 3 ]
           [ 2 ]
           [ 1 ]

Then

    [u]B = [ 1  −2  11  −27 ] [ 1 ]   [ −10 ]
           [ 0   1  −4   11 ] [ 3 ] = [   6 ]
           [ 0   0   1   −2 ] [ 2 ]   [   0 ]
           [ 0   0   0    1 ] [ 1 ]   [   1 ]

Check that u = −10u1 + 6u2 + u4.
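To verify the computations of Example 8.5.2 numerically, one can observe that column j of P solves the system whose coefficient matrix has the vectors of B as columns. A minimal sketch, again assuming Python with numpy:

```python
import numpy as np

# Basis vectors of B and C from Example 8.5.2, stored as columns.
B = np.array([[1, 3,   2, -2],
              [2, 3, -10,  1],
              [1, 3,   0, -6],
              [0, 0,   0,  2]], dtype=float)
C = np.array([[1,  1,  1, 0],
              [2, -1,  0, 0],
              [1,  1, -1, 0],
              [0,  0,  0, 2]], dtype=float)

P = np.linalg.solve(B, C)  # transition matrix from C to B, i.e. B^{-1} C
Q = np.linalg.solve(C, B)  # transition matrix from B to C, i.e. C^{-1} B
print(np.round(P))
print(np.round(P @ Q))     # the identity matrix, so Q = P^{-1}
```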


Example 8.5.3. Consider the vector space P2. It is not too difficult to check that

u1 = 1 + x, u2 = 1 + x2, u3 = x+ x2

form a basis of P2. Let u = 1 + 4x − x2. Then u = β1u1 + β2u2 + β3u3, where

1 + 4x − x2 = β1(1 + x) + β2(1 + x2) + β3(x + x2) = (β1 + β2) + (β1 + β3)x + (β2 + β3)x2,

so that β1 + β2 = 1, β1 + β3 = 4 and β2 + β3 = −1. Hence (β1, β2, β3) = (3, −2, 1). If we write B = {u1, u2, u3}, then

    [u]B = [  3 ]
           [ −2 ]
           [  1 ]

On the other hand, it is also not too difficult to check that

v1 = 1, v2 = 1 + x, v3 = 1 + x+ x2

form a basis of P2. Also u = γ1v1 + γ2v2 + γ3v3, where

1 + 4x − x2 = γ1 + γ2(1 + x) + γ3(1 + x + x2) = (γ1 + γ2 + γ3) + (γ2 + γ3)x + γ3x2,

so that γ1 + γ2 + γ3 = 1, γ2 + γ3 = 4 and γ3 = −1. Hence (γ1, γ2, γ3) = (−3, 5, −1). If we write C = {v1, v2, v3}, then

    [u]C = [ −3 ]
           [  5 ]
           [ −1 ]

Next, note that

v1 = (1/2)u1 + (1/2)u2 − (1/2)u3,
v2 = u1,
v3 = (1/2)u1 + (1/2)u2 + (1/2)u3.

Hence

    P = ( [v1]B [v2]B [v3]B ) = [  1/2  1  1/2 ]
                                [  1/2  0  1/2 ]
                                [ −1/2  0  1/2 ]

To verify that [u]B = P [u]C , note that

    [  3 ]   [  1/2  1  1/2 ] [ −3 ]
    [ −2 ] = [  1/2  0  1/2 ] [  5 ]
    [  1 ]   [ −1/2  0  1/2 ] [ −1 ]

8.6. Kernel and Range

Consider first of all a euclidean linear transformation T : Rn → Rm. Suppose that A is the standard matrix for T . Then the range of the transformation T is given by

R(T ) = {T (x) : x ∈ Rn} = {Ax : x ∈ Rn}.


It follows that R(T ) is the set of all linear combinations of the columns of the matrix A, and is therefore the column space of A. On the other hand, the set

{x ∈ Rn : Ax = 0}

is the nullspace of A.

Recall that the sum of the dimension of the nullspace of A and the dimension of the column space of A is equal to the number of columns of A. This is known as the Rank-nullity theorem. The purpose of this section is to extend this result to the setting of linear transformations. To do this, we need the following generalization of the idea of the nullspace and the column space.

Definition. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W . Then the set

ker(T ) = {u ∈ V : T (u) = 0}

is called the kernel of T , and the set

R(T ) = {T (u) : u ∈ V }

is called the range of T .

Example 8.6.1. For a euclidean linear transformation T with standard matrix A, we have shown that ker(T ) is the nullspace of A, while R(T ) is the column space of A.

Example 8.6.2. Suppose that T : V → W is the zero transformation. Clearly we have ker(T ) = V and R(T ) = {0}.

Example 8.6.3. Suppose that T : V → V is the identity operator on V . Clearly we have ker(T ) = {0} and R(T ) = V .

Example 8.6.4. Suppose that T : R2 → R2 is orthogonal projection onto the x1-axis. Then ker(T ) is the x2-axis, while R(T ) is the x1-axis.

Example 8.6.5. Suppose that T : Rn → Rn is one-to-one. Then ker(T ) = {0} and R(T ) = Rn, in view of Proposition 8E.

Example 8.6.6. Consider the linear transformation T : V → W , where V denotes the vector space of all real valued functions differentiable everywhere in R, where W denotes the space of all real valued functions defined in R, and where T (f) = f ′ for every f ∈ V . Then ker(T ) is the set of all differentiable functions with derivative 0, and so is the set of all constant functions in R.

Example 8.6.7. Consider the linear transformation T : V → R, where V denotes the vector space of all real valued functions Riemann integrable over the interval [0, 1], and where

T (f) = ∫_0^1 f(x) dx

for every f ∈ V . Then ker(T ) is the set of all Riemann integrable functions in [0, 1] with zero mean, while R(T ) = R.

PROPOSITION 8K. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W . Then ker(T ) is a subspace of V , while R(T ) is a subspace of W .


Proof. Since T (0) = 0, it follows that 0 ∈ ker(T ) ⊆ V and 0 ∈ R(T ) ⊆ W . For any u, v ∈ ker(T ), we have

T (u + v) = T (u) + T (v) = 0 + 0 = 0,

so that u + v ∈ ker(T ). Suppose further that c ∈ R. Then

T (cu) = cT (u) = c0 = 0,

so that cu ∈ ker(T ). Hence ker(T ) is a subspace of V . Suppose next that w, z ∈ R(T ). Then there exist u, v ∈ V such that T (u) = w and T (v) = z. Hence

T (u + v) = T (u) + T (v) = w + z,

so that w + z ∈ R(T ). Suppose further that c ∈ R. Then

T (cu) = cT (u) = cw,

so that cw ∈ R(T ). Hence R(T ) is a subspace of W . ©

To complete this section, we prove the following generalization of the Rank-nullity theorem.

PROPOSITION 8L. Suppose that T : V → W is a linear transformation from an n-dimensional real vector space V into a real vector space W . Then

dim ker(T ) + dim R(T ) = n.

Proof. Suppose first of all that dim ker(T ) = n. Then ker(T ) = V , and so R(T ) = {0}, and the result follows immediately. Suppose next that dim ker(T ) = 0, so that ker(T ) = {0}. If {v1, . . . , vn} is a basis of V , then it follows that T (v1), . . . , T (vn) are linearly independent in W , for otherwise there exist c1, . . . , cn ∈ R, not all zero, such that

c1T (v1) + . . . + cnT (vn) = 0,

so that T (c1v1 + . . . + cnvn) = 0, a contradiction since c1v1 + . . . + cnvn ≠ 0. On the other hand, elements of R(T ) are linear combinations of T (v1), . . . , T (vn). Hence dim R(T ) = n, and the result again follows immediately. We may therefore assume that dim ker(T ) = r, where 1 ≤ r < n. Let {v1, . . . , vr} be a basis of ker(T ). This basis can be extended to a basis {v1, . . . , vr, vr+1, . . . , vn} of V . It suffices to show that

T (vr+1), . . . , T (vn) (4)

is a basis of R(T ). Suppose that u ∈ V . Then there exist β1, . . . , βn ∈ R such that

u = β1v1 + . . . + βrvr + βr+1vr+1 + . . . + βnvn,

so that

T (u) = β1T (v1) + . . . + βrT (vr) + βr+1T (vr+1) + . . . + βnT (vn) = βr+1T (vr+1) + . . . + βnT (vn),

since v1, . . . , vr ∈ ker(T ). It follows that (4) spans R(T ). It remains to prove that its elements are linearly independent. Suppose that cr+1, . . . , cn ∈ R and

cr+1T (vr+1) + . . . + cnT (vn) = 0. (5)


We need to show that

cr+1 = . . . = cn = 0. (6)

By linearity, it follows from (5) that T (cr+1vr+1 + . . .+ cnvn) = 0, so that

cr+1vr+1 + . . .+ cnvn ∈ ker(T ).

Hence there exist c1, . . . , cr ∈ R such that

cr+1vr+1 + . . .+ cnvn = c1v1 + . . .+ crvr,

so that

c1v1 + . . .+ crvr − cr+1vr+1 − . . .− cnvn = 0.

Since {v1, . . . , vn} is a basis of V , it follows that c1 = . . . = cr = cr+1 = . . . = cn = 0, so that (6) holds. This completes the proof. ©

Remark. We sometimes say that dim R(T ) and dim ker(T ) are respectively the rank and the nullity of the linear transformation T .
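The Rank-nullity theorem is easy to check by machine for euclidean linear transformations. A minimal sketch (assuming Python with sympy, used here for exact arithmetic) does so for the matrix of Problem 11(a) at the end of this chapter:

```python
import sympy as sp

# Standard matrix of a linear transformation T : R^3 -> R^3.
A = sp.Matrix([[1, -1, 3],
               [5, 6, -4],
               [7, 4, 2]])

rank = A.rank()               # dim R(T), the rank of T
nullity = len(A.nullspace())  # dim ker(T), the nullity of T
print(rank, nullity)          # 2 and 1, and 2 + 1 = 3, the number of columns
```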

8.7. Inverse Linear Transformations

In this section, we generalize some of the ideas first discussed in Section 8.3.

Definition. A linear transformation T : V → W from a real vector space V into a real vector space W is said to be one-to-one if for every u′, u′′ ∈ V , we have u′ = u′′ whenever T (u′) = T (u′′).

The result below follows immediately from our definition.

PROPOSITION 8M. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W . Then T is one-to-one if and only if ker(T ) = {0}.

Proof. (⇒) Clearly 0 ∈ ker(T ). Suppose that ker(T ) ≠ {0}. Then there exists a non-zero v ∈ ker(T ). It follows that T (v) = T (0), and so T is not one-to-one.

(⇐) Suppose that ker(T ) = {0}. Given any u′, u′′ ∈ V , we have

T (u′) − T (u′′) = T (u′ − u′′) = 0

if and only if u′ − u′′ = 0; in other words, if and only if u′ = u′′. ©

We have the following generalization of Proposition 8E.

PROPOSITION 8N. Suppose that T : V → V is a linear operator on a finite-dimensional real vector space V . Then the following statements are equivalent:
(a) The linear operator T is one-to-one.
(b) We have ker(T ) = {0}.
(c) The range of T is V ; in other words, R(T ) = V .

Proof. The equivalence of (a) and (b) is established by Proposition 8M. The equivalence of (b) and (c) follows from Proposition 8L. ©


Suppose that T : V → W is a one-to-one linear transformation from a real vector space V into a real vector space W . Then for every w ∈ R(T ), there exists exactly one u ∈ V such that T (u) = w. We can therefore define a transformation T−1 : R(T ) → V by writing T−1(w) = u, where u ∈ V is the unique vector satisfying T (u) = w.

PROPOSITION 8P. Suppose that T : V → W is a one-to-one linear transformation from a real vector space V into a real vector space W . Then T−1 : R(T ) → V is a linear transformation.

Proof. Suppose that w, z ∈ R(T ). Then there exist u, v ∈ V such that T−1(w) = u and T−1(z) = v. It follows that T (u) = w and T (v) = z, so that T (u + v) = T (u) + T (v) = w + z, whence

T−1(w + z) = u + v = T−1(w) + T−1(z).

Suppose further that c ∈ R. Then T (cu) = cw, so that

T−1(cw) = cu = cT−1(w).

This completes the proof. ©

We also have the following result concerning compositions of linear transformations, which requires no further proof, in view of our knowledge concerning inverse functions.

PROPOSITION 8Q. Suppose that V, W, U are real vector spaces. Suppose further that T1 : V → W and T2 : W → U are one-to-one linear transformations. Then
(a) the linear transformation T2 ∘ T1 : V → U is one-to-one; and
(b) (T2 ∘ T1)−1 = T1−1 ∘ T2−1.

8.8. Matrices of General Linear Transformations

Suppose that T : V → W is a linear transformation from a real vector space V to a real vector space W . Suppose further that the vector spaces V and W are finite dimensional, with dim V = n and dim W = m. We shall show that if we make use of a basis B of V and a basis C of W , then it is possible to describe T indirectly in terms of some matrix A. The main idea is to make use of coordinate matrices relative to the bases B and C.

Let us recall some discussion in Section 8.5. Suppose that B = {v1, . . . , vn} is a basis of V . Then every vector v ∈ V can be written uniquely as a linear combination

v = β1v1 + . . . + βnvn, where β1, . . . , βn ∈ R. (7)

The matrix

    [v]B = [ β1 ]
           [ ... ]        (8)
           [ βn ]

is the coordinate matrix of v relative to the basis B.

Consider now a transformation φ : V → Rn, where φ(v) = [v]B for every v ∈ V . The proof of the following result is straightforward.

PROPOSITION 8R. Suppose that the real vector space V has basis B = {v1, . . . , vn}. Then the transformation φ : V → Rn, where φ(v) = [v]B satisfies (7) and (8) for every v ∈ V , is a one-to-one linear transformation, with range R(φ) = Rn. Furthermore, the inverse linear transformation φ−1 : Rn → V is also one-to-one, with range R(φ−1) = V .


Suppose next that C = {w1, . . . , wm} is a basis of W . Then we can define a linear transformation ψ : W → Rm, where ψ(w) = [w]C for every w ∈ W , in a similar way. We now have the following diagram of linear transformations.

    V ———— T ————→ W
    φ ↓ ↑ φ−1         ψ ↓ ↑ ψ−1
    Rn                 Rm

Clearly the composition

S = ψ ∘ T ∘ φ−1 : Rn → Rm

is a euclidean linear transformation, and can therefore be described in terms of a standard matrix A. Our task is to determine this matrix A in terms of T and the bases B and C.

We know from Proposition 8A that

A = (S(e1) . . . S(en) ) ,

where {e1, . . . , en} is the standard basis for Rn. For every j = 1, . . . , n, we have

S(ej) = (ψ ∘ T ∘ φ−1)(ej) = ψ(T (φ−1(ej))) = ψ(T (vj)) = [T (vj)]C .

It follows that

A = ( [T (v1)]C . . . [T (vn)]C ) . (9)

Definition. The matrix A given by (9) is called the matrix for the linear transformation T with respect to the bases B and C.

We now have the following diagram of linear transformations.

    V ———— T ————→ W
    φ ↓ ↑ φ−1         ψ ↓ ↑ ψ−1
    Rn ——— S ———→ Rm

Hence we can write T as the composition

T = ψ−1 ∘ S ∘ φ : V → W.

For every v ∈ V , we have the following:

    v ——— φ ———→ [v]B ——— S ———→ A[v]B ——— ψ−1 ———→ ψ−1(A[v]B)


More precisely, if v = β1v1 + . . .+ βnvn, then

    [v]B = [ β1 ]
           [ ... ]
           [ βn ]

and

    A[v]B = A [ β1 ]   [ γ1 ]
              [ ... ] = [ ... ]
              [ βn ]   [ γm ]

say, and so T (v) = ψ−1(A[v]B) = γ1w1 + . . .+ γmwm. We have proved the following result.

PROPOSITION 8S. Suppose that T : V → W is a linear transformation from a real vector space V into a real vector space W . Suppose further that V and W are finite dimensional, with bases B and C respectively, and that A is the matrix for the linear transformation T with respect to the bases B and C. Then for every v ∈ V , we have T (v) = w, where w ∈ W is the unique vector satisfying [w]C = A[v]B.

Remark. In the special case when V = W , the linear transformation T : V → W is a linear operator on V . Of course, we may choose a basis B for the domain V of T and a basis C for the codomain V of T . In the case when T is the identity linear operator, we often choose B ≠ C, since this represents a change of basis. In the case when T is not the identity operator, we often choose B = C for the sake of convenience; we then say that A is the matrix for the linear operator T with respect to the basis B.

Example 8.8.1. Consider an operator T : P3 → P3 on the real vector space P3 of all polynomials with real coefficients and degree at most 3, where for every polynomial p(x) in P3, we have T (p(x)) = xp′(x), the product of x with the formal derivative p′(x) of p(x). The reader is invited to check that T is a linear operator. Now consider the basis B = {1, x, x2, x3} of P3. The matrix for T with respect to B is given by

A = ( [T (1)]B [T (x)]B [T (x2)]B [T (x3)]B ) = ( [0]B [x]B [2x2]B [3x3]B ) =
    [ 0 0 0 0 ]
    [ 0 1 0 0 ]
    [ 0 0 2 0 ]
    [ 0 0 0 3 ]

Suppose that p(x) = 1 + 2x + 4x2 + 3x3. Then

    [p(x)]B = [ 1 ]
              [ 2 ]
              [ 4 ]
              [ 3 ]

and

    A[p(x)]B = [ 0 0 0 0 ] [ 1 ]   [ 0 ]
               [ 0 1 0 0 ] [ 2 ] = [ 2 ]
               [ 0 0 2 0 ] [ 4 ]   [ 8 ]
               [ 0 0 0 3 ] [ 3 ]   [ 9 ]

so that T (p(x)) = 2x+ 8x2 + 9x3. This can be easily verified by noting that

T (p(x)) = xp′(x) = x(2 + 8x+ 9x2) = 2x+ 8x2 + 9x3.

In general, if p(x) = p0 + p1x + p2x2 + p3x3, then

    [p(x)]B = [ p0 ]
              [ p1 ]
              [ p2 ]
              [ p3 ]

and

    A[p(x)]B = [ 0 0 0 0 ] [ p0 ]   [  0  ]
               [ 0 1 0 0 ] [ p1 ] = [  p1 ]
               [ 0 0 2 0 ] [ p2 ]   [ 2p2 ]
               [ 0 0 0 3 ] [ p3 ]   [ 3p3 ]

so that T (p(x)) = p1x + 2p2x2 + 3p3x3. Observe that

T (p(x)) = xp′(x) = x(p1 + 2p2x + 3p3x2) = p1x + 2p2x2 + 3p3x3,

verifying our result.
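Since the matrix A is diagonal, the computation above is particularly transparent; a minimal numerical sketch (assuming numpy):

```python
import numpy as np

# Matrix for T(p(x)) = x p'(x) on P3 with respect to B = {1, x, x^2, x^3}:
# column j holds the B-coordinates of T(x^j).
A = np.diag([0.0, 1.0, 2.0, 3.0])

p = np.array([1, 2, 4, 3], dtype=float)  # coordinates of 1 + 2x + 4x^2 + 3x^3
print(A @ p)  # [0, 2, 8, 9], i.e. T(p(x)) = 2x + 8x^2 + 9x^3
```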


Example 8.8.2. Consider the linear operator T : R2 → R2, given by T (x1, x2) = (2x1 + x2, x1 + 3x2) for every (x1, x2) ∈ R2. Consider also the basis B = {(1, 0), (1, 1)} of R2. Then the matrix for T with respect to B is given by

A = ( [T (1, 0)]B [T (1, 1)]B ) = ( [(2, 1)]B [(3, 4)]B ) =
    [ 1 −1 ]
    [ 1  4 ]

Suppose that (x1, x2) = (3, 2). Then

    [(3, 2)]B = [ 1 ]
                [ 2 ]

and

    A[(3, 2)]B = [ 1 −1 ] [ 1 ] = [ −1 ]
                 [ 1  4 ] [ 2 ]   [  9 ]

so that T (3, 2) = −(1, 0) + 9(1, 1) = (8, 9). This can be easily verified directly. In general, we have

    [(x1, x2)]B = [ x1 − x2 ]
                  [   x2    ]

and

    A[(x1, x2)]B = [ 1 −1 ] [ x1 − x2 ] = [ x1 − 2x2 ]
                   [ 1  4 ] [   x2    ]   [ x1 + 3x2 ]

so that T (x1, x2) = (x1 − 2x2)(1, 0) + (x1 + 3x2)(1, 1) = (2x1 + x2, x1 + 3x2).
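The construction of A from scratch can also be automated: each column is obtained by solving a small linear system. A sketch (assuming numpy, with the helper function T mirroring the operator of this example):

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # columns are the basis vectors (1,0) and (1,1)

def T(x):
    # The operator of Example 8.8.2: T(x1, x2) = (2x1 + x2, x1 + 3x2).
    return np.array([2*x[0] + x[1], x[0] + 3*x[1]])

# Column j of A is [T(b_j)]_B, found by solving B a = T(b_j).
A = np.column_stack([np.linalg.solve(B, T(B[:, j])) for j in range(2)])
print(A)  # [[1, -1], [1, 4]]

# T(3, 2) recovered via coordinates, as psi^{-1}(A [v]_B) in the notation above.
print(B @ (A @ np.linalg.solve(B, np.array([3.0, 2.0]))))  # [8, 9]
```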

Example 8.8.3. Suppose that T : Rn → Rm is a linear transformation. Suppose further that B and C are the standard bases for Rn and Rm respectively. Then the matrix for T with respect to B and C is given by

A = ( [T (e1)]C . . . [T (en)]C ) = (T (e1) . . . T (en) ) ,

so it follows from Proposition 8A that A is simply the standard matrix for T .

Suppose now that T1 : V → W and T2 : W → U are linear transformations, where the real vector spaces V, W, U are finite dimensional, with respective bases B = {v1, . . . , vn}, C = {w1, . . . , wm} and D = {u1, . . . , uk}. We then have the following diagram of linear transformations.

    V ——— T1 ———→ W ——— T2 ———→ U
    φ ↓              ψ ↓              η ↓
    Rn ——— S1 ———→ Rm ——— S2 ———→ Rk

Here η : U → Rk, where η(u) = [u]D for every u ∈ U , is a linear transformation, and

S1 = ψ ∘ T1 ∘ φ−1 : Rn → Rm and S2 = η ∘ T2 ∘ ψ−1 : Rm → Rk

are euclidean linear transformations. Suppose that A1 and A2 are respectively the standard matrices for S1 and S2, so that they are respectively the matrix for T1 with respect to B and C and the matrix for T2 with respect to C and D. Clearly

S2 ∘ S1 = η ∘ T2 ∘ T1 ∘ φ−1 : Rn → Rk.

It follows that A2A1 is the standard matrix for S2 ∘ S1, and so is the matrix for T2 ∘ T1 with respect to the bases B and D. To summarize, we have the following result.


PROPOSITION 8T. Suppose that T1 : V → W and T2 : W → U are linear transformations, where the real vector spaces V, W, U are finite dimensional, with bases B, C, D respectively. Suppose further that A1 is the matrix for the linear transformation T1 with respect to the bases B and C, and that A2 is the matrix for the linear transformation T2 with respect to the bases C and D. Then A2A1 is the matrix for the linear transformation T2 ∘ T1 with respect to the bases B and D.

Example 8.8.4. Consider the linear operator T1 : P3 → P3, where for every polynomial p(x) in P3, we have T1(p(x)) = xp′(x). We have already shown that the matrix for T1 with respect to the basis B = {1, x, x2, x3} of P3 is given by

    A1 = [ 0 0 0 0 ]
         [ 0 1 0 0 ]
         [ 0 0 2 0 ]
         [ 0 0 0 3 ]

Consider next the linear operator T2 : P3 → P3, where for every polynomial q(x) = q0 + q1x + q2x2 + q3x3 in P3, we have

T2(q(x)) = q(1 + x) = q0 + q1(1 + x) + q2(1 + x)2 + q3(1 + x)3.

We have T2(1) = 1, T2(x) = 1 + x, T2(x2) = 1 + 2x + x2 and T2(x3) = 1 + 3x + 3x2 + x3, so that the matrix for T2 with respect to B is given by

A2 = ( [T2(1)]B [T2(x)]B [T2(x2)]B [T2(x3)]B ) =
    [ 1 1 1 1 ]
    [ 0 1 2 3 ]
    [ 0 0 1 3 ]
    [ 0 0 0 1 ]

Consider now the composition T = T2 ∘ T1 : P3 → P3. Let A denote the matrix for T with respect to B. By Proposition 8T, we have

    A = A2A1 = [ 1 1 1 1 ] [ 0 0 0 0 ]   [ 0 1 2 3 ]
               [ 0 1 2 3 ] [ 0 1 0 0 ] = [ 0 1 4 9 ]
               [ 0 0 1 3 ] [ 0 0 2 0 ]   [ 0 0 2 9 ]
               [ 0 0 0 1 ] [ 0 0 0 3 ]   [ 0 0 0 3 ]

Suppose that p(x) = p0 + p1x + p2x2 + p3x3. Then

    [p(x)]B = [ p0 ]
              [ p1 ]
              [ p2 ]
              [ p3 ]

and

    A[p(x)]B = [ 0 1 2 3 ] [ p0 ]   [ p1 + 2p2 + 3p3 ]
               [ 0 1 4 9 ] [ p1 ] = [ p1 + 4p2 + 9p3 ]
               [ 0 0 2 9 ] [ p2 ]   [    2p2 + 9p3   ]
               [ 0 0 0 3 ] [ p3 ]   [       3p3      ]

so that T (p(x)) = (p1 + 2p2 + 3p3) + (p1 + 4p2 + 9p3)x + (2p2 + 9p3)x2 + 3p3x3. We can check this directly by noting that

T (p(x)) = T2(T1(p(x))) = T2(p1x + 2p2x2 + 3p3x3) = p1(1 + x) + 2p2(1 + x)2 + 3p3(1 + x)3
         = (p1 + 2p2 + 3p3) + (p1 + 4p2 + 9p3)x + (2p2 + 9p3)x2 + 3p3x3.

Example 8.8.5. Consider the linear operator T : R2 → R2, given by T (x1, x2) = (2x1 + x2, x1 + 3x2) for every (x1, x2) ∈ R2. We have already shown that the matrix for T with respect to the basis B = {(1, 0), (1, 1)} of R2 is given by

    A = [ 1 −1 ]
        [ 1  4 ]


Consider the linear operator T ∘ T : R2 → R2. By Proposition 8T, the matrix for T ∘ T with respect to B is given by A2 = AA, namely

    A2 = [ 1 −1 ] [ 1 −1 ] = [ 0 −5 ]
         [ 1  4 ] [ 1  4 ]   [ 5 15 ]

Suppose that (x1, x2) ∈ R2. Then

    [(x1, x2)]B = [ x1 − x2 ]
                  [   x2    ]

and

    A2[(x1, x2)]B = [ 0 −5 ] [ x1 − x2 ] = [    −5x2     ]
                    [ 5 15 ] [   x2    ]   [ 5x1 + 10x2 ]

so that (T ∘ T )(x1, x2) = −5x2(1, 0) + (5x1 + 10x2)(1, 1) = (5x1 + 5x2, 5x1 + 10x2). The reader is invited to check this directly.

A simple consequence of Propositions 8N and 8T is the following result concerning inverse linear transformations.

PROPOSITION 8U. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V with basis B. Suppose further that A is the matrix for the linear operator T with respect to the basis B. Then T is one-to-one if and only if A is invertible. Furthermore, if T is one-to-one, then A−1 is the matrix for the inverse linear operator T−1 : V → V with respect to the basis B.

Proof. Simply note that T is one-to-one if and only if the system Ax = 0 has only the trivial solution x = 0. The last assertion follows easily from Proposition 8T, since if A′ denotes the matrix for the inverse linear operator T−1 with respect to B, then we must have A′A = I, the matrix for the identity operator T−1 ∘ T with respect to B. ©

Example 8.8.6. Consider the linear operator T : P3 → P3, where for every q(x) = q0 + q1x + q2x2 + q3x3 in P3, we have

T (q(x)) = q(1 + x) = q0 + q1(1 + x) + q2(1 + x)2 + q3(1 + x)3.

We have already shown that the matrix for T with respect to the basis B = {1, x, x2, x3} is given by

    A = [ 1 1 1 1 ]
        [ 0 1 2 3 ]
        [ 0 0 1 3 ]
        [ 0 0 0 1 ]

This matrix is invertible, so it follows that T is one-to-one. Furthermore, it can be checked that

    A−1 = [ 1 −1  1 −1 ]
          [ 0  1 −2  3 ]
          [ 0  0  1 −3 ]
          [ 0  0  0  1 ]

Suppose that p(x) = p0 + p1x + p2x2 + p3x3. Then

    [p(x)]B = [ p0 ]
              [ p1 ]
              [ p2 ]
              [ p3 ]

and

    A−1[p(x)]B = [ 1 −1  1 −1 ] [ p0 ]   [ p0 − p1 + p2 − p3 ]
                 [ 0  1 −2  3 ] [ p1 ] = [  p1 − 2p2 + 3p3   ]
                 [ 0  0  1 −3 ] [ p2 ]   [     p2 − 3p3      ]
                 [ 0  0  0  1 ] [ p3 ]   [        p3         ]

so that

T−1(p(x)) = (p0 − p1 + p2 − p3) + (p1 − 2p2 + 3p3)x + (p2 − 3p3)x2 + p3x3
          = p0 + p1(x − 1) + p2(x2 − 2x + 1) + p3(x3 − 3x2 + 3x − 1)
          = p0 + p1(x − 1) + p2(x − 1)2 + p3(x − 1)3 = p(x − 1).


8.9. Change of Basis

Suppose that V is a finite dimensional real vector space, with one basis B = {v1, . . . , vn} and another basis B′ = {u1, . . . , un}. Suppose that T : V → V is a linear operator on V . Let A denote the matrix for T with respect to the basis B, and let A′ denote the matrix for T with respect to the basis B′. If v ∈ V and T (v) = w, then

[w]B = A[v]B (10)

and

[w]B′ = A′[v]B′ . (11)

We wish to find the relationship between A′ and A.

Recall Proposition 8J, that if

P = ( [u1]B . . . [un]B )

denotes the transition matrix from the basis B′ to the basis B, then

[v]B = P [v]B′ and [w]B = P [w]B′ . (12)

Note that the matrix P can also be interpreted as the matrix for the identity operator I : V → V with respect to the bases B′ and B. It is easy to see that the matrix P is invertible, and

P−1 = ( [v1]B′ . . . [vn]B′ )

denotes the transition matrix from the basis B to the basis B′, and can also be interpreted as the matrix for the identity operator I : V → V with respect to the bases B and B′.

Combining (10) and (12), we conclude that

[w]B′ = P−1[w]B = P−1A[v]B = P−1AP [v]B′ .

Comparing this with (11), we conclude that

P−1AP = A′. (13)

This implies that

A = PA′P−1. (14)

Remark. We can use the notation

A = [T ]B and A′ = [T ]B′

to denote that A and A′ are the matrices for T with respect to the basis B and with respect to the basisB′ respectively. We can also write

P = [I]B,B′

to denote that P is the transition matrix from the basis B′ to the basis B, so that

P−1 = [I]B′,B.


Then (13) and (14) become respectively

[I]B′,B[T ]B[I]B,B′ = [T ]B′ and [I]B,B′ [T ]B′ [I]B′,B = [T ]B.

We have proved the following result.

PROPOSITION 8V. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V , with bases B = {v1, . . . , vn} and B′ = {u1, . . . , un}. Suppose further that A and A′ are the matrices for T with respect to the basis B and with respect to the basis B′ respectively. Then

A′ = P−1AP and A = PA′P−1,

where

P = ( [u1]B . . . [un]B )

denotes the transition matrix from the basis B′ to the basis B.

Remarks. (1) We have the following picture.

    v ——————— T ———————→ w

    [v]B′ ——— A′ ———→ [w]B′
      P ↓               ↓ P
    [v]B ———— A ————→ [w]B

(The top row shows the vectors, the lower rows their coordinate matrices; the vertical arrows P may be reversed using P−1.)

(2) The idea can be extended to the case of linear transformations T : V → W from a finite dimensional real vector space into another, with a change of basis in V and a change of basis in W .

Example 8.9.1. Consider the vector space P3 of all polynomials with real coefficients and degree at most 3, with bases B = {1, x, x2, x3} and B′ = {1, 1 + x, 1 + x + x2, 1 + x + x2 + x3}. Consider also the linear operator T : P3 → P3, where for every polynomial p(x) = p0 + p1x + p2x2 + p3x3, we have T (p(x)) = (p0 + p1) + (p1 + p2)x + (p2 + p3)x2 + (p0 + p3)x3. Let A denote the matrix for T with respect to the basis B. Then T (1) = 1 + x3, T (x) = 1 + x, T (x2) = x + x2 and T (x3) = x2 + x3, and so

A = ( [T (1)]B [T (x)]B [T (x2)]B [T (x3)]B ) =
    [ 1 1 0 0 ]
    [ 0 1 1 0 ]
    [ 0 0 1 1 ]
    [ 1 0 0 1 ]

Next, note that the transition matrix from the basis B′ to the basis B is given by

P = ( [1]B [1 + x]B [1 + x + x2]B [1 + x + x2 + x3]B ) =
    [ 1 1 1 1 ]
    [ 0 1 1 1 ]
    [ 0 0 1 1 ]
    [ 0 0 0 1 ]


It can be checked that

    P−1 = [ 1 −1  0  0 ]
          [ 0  1 −1  0 ]
          [ 0  0  1 −1 ]
          [ 0  0  0  1 ]

and so

    A′ = P−1AP = [ 1 −1  0  0 ] [ 1 1 0 0 ] [ 1 1 1 1 ]   [  1  1 0 0 ]
                 [ 0  1 −1  0 ] [ 0 1 1 0 ] [ 0 1 1 1 ] = [  0  1 1 0 ]
                 [ 0  0  1 −1 ] [ 0 0 1 1 ] [ 0 0 1 1 ]   [ −1 −1 0 0 ]
                 [ 0  0  0  1 ] [ 1 0 0 1 ] [ 0 0 0 1 ]   [  1  1 1 2 ]

is the matrix for T with respect to the basis B′. It follows that

T (1) = 1− (1 + x+ x2) + (1 + x+ x2 + x3) = 1 + x3,

T (1 + x) = 1 + (1 + x)− (1 + x+ x2) + (1 + x+ x2 + x3) = 2 + x+ x3,

T (1 + x+ x2) = (1 + x) + (1 + x+ x2 + x3) = 2 + 2x+ x2 + x3,

T (1 + x+ x2 + x3) = 2(1 + x+ x2 + x3) = 2 + 2x+ 2x2 + 2x3.

These can be verified directly.
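The change of basis formula of Proposition 8V is a one-line computation by machine. A minimal sketch (assuming numpy) reproduces A′ for this example:

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)  # matrix for T wrt B
P = np.array([[1, 1, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)  # transition matrix from B' to B

A_prime = np.linalg.inv(P) @ A @ P         # matrix for T wrt B'
print(np.round(A_prime))
```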

8.10. Eigenvalues and Eigenvectors

Definition. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V . Then any real number λ ∈ R is called an eigenvalue of T if there exists a non-zero vector v ∈ V such that T (v) = λv. This non-zero vector v ∈ V is called an eigenvector of T corresponding to the eigenvalue λ.

The purpose of this section is to show that the problem of eigenvalues and eigenvectors of the linear operator T can be reduced to the problem of eigenvalues and eigenvectors of the matrix for T with respect to any basis B of V . The starting point of our argument is the following theorem, the proof of which is left as an exercise.

PROPOSITION 8W. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V , with bases B and B′. Suppose further that A and A′ are the matrices for T with respect to the basis B and with respect to the basis B′ respectively. Then
(a) detA = detA′;
(b) A and A′ have the same rank;
(c) A and A′ have the same characteristic polynomial;
(d) A and A′ have the same eigenvalues; and
(e) the dimension of the eigenspace of A corresponding to an eigenvalue λ is equal to the dimension of the eigenspace of A′ corresponding to λ.

We also state without proof the following result.

PROPOSITION 8X. Suppose that T : V → V is a linear operator on a finite dimensional real vector space V . Suppose further that A is the matrix for T with respect to a basis B of V . Then
(a) the eigenvalues of T are precisely the eigenvalues of A; and
(b) a vector u ∈ V is an eigenvector of T corresponding to an eigenvalue λ if and only if the coordinate matrix [u]B is an eigenvector of A corresponding to the eigenvalue λ.


Suppose now that A is the matrix for a linear operator T : V → V on a finite dimensional real vector space V with respect to a basis B = {v1, . . . , vn}. If A can be diagonalized, then there exists an invertible matrix P such that

P−1AP = D

is a diagonal matrix. Furthermore, the columns of P are eigenvectors of A, and so are the coordinate matrices of eigenvectors of T with respect to the basis B. In other words,

P = ( [u1]B . . . [un]B ) ,

where B′ = {u1, . . . , un} is a basis of V consisting of eigenvectors of T . Furthermore, P is the transition matrix from the basis B′ to the basis B. It follows that the matrix for T with respect to the basis B′ is given by

    D = [ λ1           ]
        [     . . .    ]
        [           λn ]

where λ1, . . . , λn are the eigenvalues of T .

Example 8.10.1. Consider the vector space P2 of all polynomials with real coefficients and degree at most 2, with basis B = {1, x, x2}. Consider also the linear operator T : P2 → P2, where for every polynomial p(x) = p0 + p1x + p2x2, we have T (p(x)) = (5p0 − 2p1) + (6p1 + 2p2 − 2p0)x + (2p1 + 7p2)x2. Then T (1) = 5 − 2x, T (x) = −2 + 6x + 2x2 and T (x2) = 2x + 7x2, so that the matrix for T with respect to the basis B is given by

    A = ( [T (1)]B [T (x)]B [T (x2)]B ) = [  5 −2 0 ]
                                          [ −2  6 2 ]
                                          [  0  2 7 ]

It is a simple exercise to show that the matrix A has eigenvalues 3, 6, 9, with corresponding eigenvectors

    x1 = [  2 ]        x2 = [  2 ]        x3 = [ −1 ]
         [  2 ]             [ −1 ]             [  2 ]
         [ −1 ]             [  2 ]             [  2 ]

so that writing

    P = [  2  2 −1 ]
        [  2 −1  2 ]
        [ −1  2  2 ]

we have

    P−1AP = [ 3 0 0 ]
            [ 0 6 0 ]
            [ 0 0 9 ]

Now let B′ = {p1(x), p2(x), p3(x)}, where

    [p1(x)]B = [  2 ]        [p2(x)]B = [  2 ]        [p3(x)]B = [ −1 ]
               [  2 ]                   [ −1 ]                   [  2 ]
               [ −1 ]                   [  2 ]                   [  2 ]


Then P is the transition matrix from the basis B′ to the basis B, and D is the matrix for T with respect to the basis B′. Clearly p1(x) = 2 + 2x − x2, p2(x) = 2 − x + 2x2 and p3(x) = −1 + 2x + 2x2. Note now that

T (p1(x)) = T (2 + 2x − x2) = 6 + 6x − 3x2 = 3p1(x),
T (p2(x)) = T (2 − x + 2x2) = 12 − 6x + 12x2 = 6p2(x),
T (p3(x)) = T (−1 + 2x + 2x2) = −9 + 18x + 18x2 = 9p3(x).
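The diagonalization itself can be found numerically; a minimal sketch (assuming numpy). Note that numpy returns unit eigenvectors rather than the integer ones used above, but P−1AP is diagonal either way:

```python
import numpy as np

A = np.array([[ 5, -2, 0],
              [-2,  6, 2],
              [ 0,  2, 7]], dtype=float)  # matrix for T wrt B = {1, x, x^2}

eigvals, P = np.linalg.eig(A)             # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P
print(np.round(eigvals))                  # 3, 6, 9 in some order
print(np.round(D, 8))                     # diagonal matrix of eigenvalues
```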


Problems for Chapter 8

1. Consider the transformation T : R3 → R4, given by

T (x1, x2, x3) = (x1 + x2 + x3, x2 + x3, 3x1 + x2, 2x2 + x3)

for every (x1, x2, x3) ∈ R3.
a) Find the standard matrix A for T .
b) By reducing A to row echelon form, determine the dimension of the kernel of T and the dimension of the range of T .

2. Consider a linear operator T : R3 → R3 with standard matrix

    A = [ 1 2 3 ]
        [ 2 1 3 ]
        [ 1 3 2 ]

Let {e1, e2, e3} denote the standard basis for R3.
a) Find T (ej) for every j = 1, 2, 3.
b) Find T (2e1 + 5e2 + 3e3).
c) Is T invertible? Justify your assertion.

3. Consider the linear operator T : R2 → R2 with standard matrix

    A = [ 1 1 ]
        [ 0 1 ]

a) Find the image under T of the line x1 + 2x2 = 3.
b) Find the image under T of the circle x1² + x2² = 1.

4. For each of the following, determine whether the given transformation is linear:
a) T : V → R, where V is a real inner product space and T (u) = ‖u‖.
b) T : M2,2(R) → M2,3(R), where B ∈ M2,3(R) is fixed and T (A) = AB.
c) T : M3,4(R) → M4,3(R), where T (A) = At.
d) T : P2 → P2, where T (p0 + p1x + p2x2) = p0 + p1(2 + x) + p2(2 + x)2.
e) T : P2 → P2, where T (p0 + p1x + p2x2) = p0 + p1x + (p2 + 1)x2.

5. Suppose that T : R3 → R3 is a linear transformation satisfying the conditions T (1, 0, 0) = (2, 4, 1), T (1, 1, 0) = (3, 0, 2) and T (1, 1, 1) = (1, 4, 6).
a) Evaluate T (5, 3, 2).
b) Find T (x1, x2, x3) for every (x1, x2, x3) ∈ R3.

6. Suppose that T : R3 → R3 is orthogonal projection onto the x1x2-plane.
a) Find the standard matrix A for T .
b) Find A2.
c) Show that T ∘ T = T .

7. Consider the bases B = {u1, u2, u3} and C = {v1, v2, v3} of R3, where u1 = (2, 1, 1), u2 = (2, −1, 1), u3 = (1, 2, 1), v1 = (3, 1, −5), v2 = (1, 1, −3) and v3 = (−1, 0, 2).
a) Find the transition matrix from the basis C to the basis B.
b) Find the transition matrix from the basis B to the basis C.
c) Show that the matrices in parts (a) and (b) are inverses of each other.
d) Compute the coordinate matrix [u]C , where u = (−5, 8, −5).
e) Use the transition matrix to compute the coordinate matrix [u]B.
f) Compute the coordinate matrix [u]B directly and compare it to your answer in part (e).


8. Consider the bases B = {p1, p2} and C = {q1, q2} of P1, where p1 = 2, p2 = 3 + 2x, q1 = 6 + 3x and q2 = 10 + 2x.
a) Find the transition matrix from the basis C to the basis B.
b) Find the transition matrix from the basis B to the basis C.
c) Show that the matrices in parts (a) and (b) are inverses of each other.
d) Compute the coordinate matrix [p]C , where p = −4 + x.
e) Use the transition matrix to compute the coordinate matrix [p]B.
f) Compute the coordinate matrix [p]B directly and compare it to your answer in part (e).

9. Let V be the real vector space spanned by the functions f1 = sin x and f2 = cos x.
a) Show that g1 = 2 sin x + cos x and g2 = 3 cos x form a basis of V .
b) Find the transition matrix from the basis C = {g1, g2} to the basis B = {f1, f2} of V .
c) Compute the coordinate matrix [f ]C , where f = 2 sin x − 5 cos x.
d) Use the transition matrix to compute the coordinate matrix [f ]B.
e) Compute the coordinate matrix [f ]B directly and compare it to your answer in part (d).

10. Let P be the transition matrix from a basis C to another basis B of a real vector space V . Explain why P is invertible.

11. For each of the following linear transformations T , find ker(T ) and R(T ), and verify the Rank-nullity theorem:

a) T : R3 → R3, with standard matrix A = [ 1 −1  3 ]
                                          [ 5  6 −4 ]
                                          [ 7  4  2 ]

b) T : P3 → P2, where T (p(x)) = p′(x), the formal derivative.

c) T : P1 → R, where T (p(x)) = ∫_0^1 p(x) dx.

12. For each of the following, determine whether the linear operator T : Rn → Rn is one-to-one. If so, find also the inverse linear operator T−1 : Rn → Rn:
a) T (x1, x2, x3, . . . , xn) = (x2, x1, x3, . . . , xn)
b) T (x1, x2, x3, . . . , xn) = (x2, x3, . . . , xn, x1)
c) T (x1, x2, x3, . . . , xn) = (x2, x2, x3, . . . , xn)

13. Consider the operator T : R2 → R2, where T (x1, x2) = (x1 + kx2, −x2) for every (x1, x2) ∈ R2. Here k ∈ R is fixed.
a) Show that T is a linear operator.
b) Show that T is one-to-one.
c) Find the inverse linear operator T−1 : R2 → R2.

14. Consider the linear transformation T : P2 → P1, where T (p0 + p1x + p2x2) = (p0 + p2) + (2p0 + p1)x for every polynomial p0 + p1x + p2x2 in P2.
a) Find the matrix A for T with respect to the bases {1, x, x2} and {1, x}.
b) Find T (2 + 3x + 4x2) by using the matrix A.
c) Use the matrix A to recover the formula T (p0 + p1x + p2x2) = (p0 + p2) + (2p0 + p1)x.

15. Consider the linear operator T : R2 → R2, where T (x1, x2) = (x1 − x2, x1 + x2) for every (x1, x2) ∈ R2.
a) Find the matrix A for T with respect to the basis {(1, 1), (−1, 0)} of R2.
b) Use the matrix A to recover the formula T (x1, x2) = (x1 − x2, x1 + x2).
c) Is T one-to-one? If so, use the matrix A to find the inverse linear operator T−1 : R2 → R2.


16. Consider the real vector space V of all real sequences x = (x1, x2, x3, . . .) such that the series

    ∑_{n=1}^∞ xn

is convergent.
a) Show that the transformation T : V → R, given by

    T (x) = ∑_{n=1}^∞ xn

for every x ∈ V , is a linear transformation.
b) Is the linear transformation T one-to-one? If so, give a proof. If not, find two distinct vectors x, y ∈ V such that T (x) = T (y).

17. Suppose that T1 : R2 → R2 and T2 : R2 → R2 are linear operators such that

T1(x1, x2) = (x1 + x2, x1 − x2) and T2(x1, x2) = (2x1 + x2, x1 − 2x2)

for every (x1, x2) ∈ R2.
a) Show that T1 and T2 are one-to-one.
b) Find the formulas for T1−1, T2−1 and (T2 ∘ T1)−1.
c) Verify that (T2 ∘ T1)−1 = T1−1 ∘ T2−1.

18. Consider the transformation T : P1 → R2, where T (p(x)) = (p(0), p(1)) for every polynomial p(x) in P1.
a) Find T (1 − 2x).
b) Show that T is a linear transformation.
c) Show that T is one-to-one.
d) Find T−1(2, 3), and sketch its graph.

19. Suppose that V and W are finite dimensional real vector spaces with dim V > dim W . Suppose further that T : V → W is a linear transformation. Explain why T cannot be one-to-one.

20. Suppose that

    A = [ 1  3 −1 ]
        [ 2  0  5 ]
        [ 6 −2  4 ]

is the matrix for a linear operator T : P2 → P2 with respect to the basis B = {p1(x), p2(x), p3(x)} of P2, where p1(x) = 3x + 3x2, p2(x) = −1 + 3x + 2x2 and p3(x) = 3 + 7x + 2x2.
a) Find [T (p1(x))]B, [T (p2(x))]B and [T (p3(x))]B.
b) Find T (p1(x)), T (p2(x)) and T (p3(x)).
c) Find a formula for T (p0 + p1x + p2x2).
d) Use the formula in part (c) to compute T (1 + x2).

21. Suppose that B = {v1, v2, v3, v4} is a basis for a real vector space V . Suppose that T : V → V is a linear operator, with T (v1) = v2, T (v2) = v4, T (v3) = v1 and T (v4) = v3.
a) Find the matrix for T with respect to the basis B.
b) Is T one-to-one? If so, describe its inverse.


22. Let Pk denote the vector space of all polynomials with real coefficients and degree at most k. Consider P2 with basis B = {1, x, x2} and P3 with basis C = {1, x, x2, x3}. We define T1 : P2 → P3 and T2 : P3 → P2 as follows. For every polynomial p(x) = a0 + a1x + a2x2 in P2, we have T1(p(x)) = xp(x) = a0x + a1x2 + a2x3. For every polynomial q(x) in P3, we have T2(q(x)) = q′(x), the formal derivative of q(x) with respect to the variable x.
a) Show that T1 : P2 → P3 and T2 : P3 → P2 are linear transformations.
b) Find T1(1), T1(x), T1(x2), and compute the matrix A1 for T1 : P2 → P3 with respect to the bases B and C.
c) Find T2(1), T2(x), T2(x2), T2(x3), and compute the matrix A2 for T2 : P3 → P2 with respect to the bases C and B.
d) Let T = T2 ∘ T1. Find T (1), T (x), T (x2), and compute the matrix A for T : P2 → P2 with respect to the basis B. Verify that A = A2A1.

23. Suppose that T : V → V is a linear operator on a real vector space V with basis B. Suppose that for every v ∈ V , we have

    [T (v)]B = [ x1 − x2 + x3 ]
               [   x1 + x2    ]
               [   x1 − x2    ]

where

    [v]B = [ x1 ]
           [ x2 ]
           [ x3 ]

a) Find the matrix for T with respect to the basis B.
b) Is T one-to-one? If so, describe its inverse.

24. For each of the following, let V be the subspace with basis B = {f1(x), f2(x), f3(x)} of the space of all real valued functions defined on R. Let T : V → V be defined by T (f(x)) = f ′(x) for every function f(x) in V . Find the matrix for T with respect to the basis B:
a) f1(x) = 1, f2(x) = sin x, f3(x) = cos x
b) f1(x) = e2x, f2(x) = xe2x, f3(x) = x2e2x

25. Let P2 denote the vector space of all polynomials with real coefficients and degree at most 2, with basis B = {1, x, x2}. Consider the linear operator T : P2 → P2, where for every polynomial p(x) = a0 + a1x + a2x2 in P2, we have T (p(x)) = p(2x + 1) = a0 + a1(2x + 1) + a2(2x + 1)2.
a) Find T (1), T (x), T (x2), and compute the matrix A for T with respect to the basis B.
b) Use the matrix A to compute T (3 + x + 2x2).
c) Check your calculations in part (b) by computing T (3 + x + 2x2) directly.
d) What is the matrix for T ∘ T : P2 → P2 with respect to the basis B?
e) Consider a new basis B′ = {1 + x, 1 + x2, x + x2} of P2. Using a change of basis matrix, compute the matrix for T with respect to the basis B′.
f) Check your answer in part (e) by computing the matrix directly.

26. Consider the linear operator T : P1 → P1, where for every polynomial p(x) = p0 + p1x in P1, we have T (p(x)) = p0 + p1(x + 1). Consider also the bases B = {6 + 3x, 10 + 2x} and B′ = {2, 3 + 2x} of P1.
a) Find the matrix for T with respect to the basis B.
b) Use Proposition 8V to compute the matrix for T with respect to the basis B′.

27. Suppose that V and W are finite dimensional real vector spaces. Suppose further that B and B′ are bases for V , and that C and C′ are bases for W . Show that for any linear transformation T : V → W , we have

[I]C′,C [T ]C,B[I]B,B′ = [T ]C′,B′ .

28. Prove Proposition 8W.

29. Prove Proposition 8X.


30. For each of the following linear transformations T : R3 → R3, find a basis B of R3 such that the matrix for T with respect to the basis B is a diagonal matrix:
a) T (x1, x2, x3) = (−x2 + x3, −x1 + x3, x1 + x2)
b) T (x1, x2, x3) = (4x1 + x3, 2x1 + 3x2 + 2x3, x1 + 4x3)

31. Consider the linear operator T : P2 → P2, where

T (p0 + p1x + p2x2) = (p0 − 6p1 + 12p2) + (13p1 − 30p2)x + (9p1 − 20p2)x2.

a) Find the eigenvalues of T .
b) Find a basis B of P2 such that the matrix for T with respect to B is a diagonal matrix.


Chapter 9

REAL INNER PRODUCT SPACES

9.1. Euclidean Inner Products

In this section, we consider vectors of the form u = (u1, . . . , un) in the euclidean space Rn. In particular, we shall generalize the concept of dot product, norm and distance, first developed for R2 and R3 in Chapter 4.

Definition. Suppose that u = (u1, . . . , un) and v = (v1, . . . , vn) are vectors in Rn. The euclidean dot product of u and v is defined by

u · v = u1v1 + . . . + unvn,

the euclidean norm of u is defined by

‖u‖ = (u · u)^(1/2) = (u1² + . . . + un²)^(1/2),

and the euclidean distance between u and v is defined by

d(u, v) = ‖u − v‖ = ((u1 − v1)² + . . . + (un − vn)²)^(1/2).
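These definitions translate directly into code; a minimal sketch (assuming Python with numpy), using the two vectors of Example 9.3.4 below:

```python
import numpy as np

u = np.array([1.0, 1.0, 1.0, 0.0])
v = np.array([1.0, 0.0, 1.0, 1.0])

dot = u @ v                   # euclidean dot product u . v
norm_u = np.sqrt(u @ u)       # ||u|| = (u . u)^(1/2)
dist = np.linalg.norm(u - v)  # d(u, v) = ||u - v||
print(dot, norm_u, dist)      # 2.0, sqrt(3), sqrt(2)
```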

PROPOSITION 9A. Suppose that u, v, w ∈ Rn and c ∈ R. Then
(a) u · v = v · u;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v; and
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.


PROPOSITION 9B. (CAUCHY-SCHWARZ INEQUALITY) Suppose that u, v ∈ Rn. Then

|u · v| ≤ ‖u‖ ‖v‖.

In other words,

|u1v1 + . . . + unvn| ≤ (u1² + . . . + un²)^(1/2) (v1² + . . . + vn²)^(1/2).

PROPOSITION 9C. Suppose that u, v ∈ Rn and c ∈ R. Then
(a) ‖u‖ ≥ 0;
(b) ‖u‖ = 0 if and only if u = 0;
(c) ‖cu‖ = |c| ‖u‖; and
(d) ‖u + v‖ ≤ ‖u‖ + ‖v‖.

PROPOSITION 9D. Suppose that u, v, w ∈ Rn. Then
(a) d(u, v) ≥ 0;
(b) d(u, v) = 0 if and only if u = v;
(c) d(u, v) = d(v, u); and
(d) d(u, v) ≤ d(u, w) + d(w, v).

Remark. Parts (d) of Propositions 9C and 9D are known as the Triangle inequality.

In R2 and R3, we say that two non-zero vectors are perpendicular if their dot product is zero. We now generalize this idea to vectors in Rn.

Definition. Two vectors u,v ∈ Rn are said to be orthogonal if u · v = 0.

Example 9.1.1. Suppose that u, v ∈ Rn are orthogonal. Then

‖u + v‖² = (u + v) · (u + v) = u · u + 2u · v + v · v = ‖u‖² + ‖v‖².

This is an extension of Pythagoras's theorem.

Remarks. (1) Suppose that we write u,v ∈ Rn as column matrices. Then

u · v = vtu,

where we use matrix multiplication on the right hand side.

(2) Matrix multiplication can be described in terms of dot product. Suppose that A is an m × n matrix and B is an n × p matrix. If we let r1, . . . , rm denote the vectors formed from the rows of A, and let c1, . . . , cp denote the vectors formed from the columns of B, then

    AB = [ r1 · c1  . . .  r1 · cp ]
         [   ...             ...  ]
         [ rm · c1  . . .  rm · cp ]

9.2. Real Inner Products

The purpose of this section and the next is to extend our discussion to define inner products in real vector spaces. We begin by giving a reminder of the basics of real vector spaces, or vector spaces over R.


Definition. A real vector space V is a set of objects, known as vectors, together with vector addition + and multiplication of vectors by elements of R, and satisfying the following properties:
(VA1) For every u, v ∈ V , we have u + v ∈ V .
(VA2) For every u, v, w ∈ V , we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 ∈ V such that for every u ∈ V , we have u + 0 = 0 + u = u.
(VA4) For every u ∈ V , there exists −u ∈ V such that u + (−u) = 0.
(VA5) For every u, v ∈ V , we have u + v = v + u.
(SM1) For every c ∈ R and u ∈ V , we have cu ∈ V .
(SM2) For every c ∈ R and u, v ∈ V , we have c(u + v) = cu + cv.
(SM3) For every a, b ∈ R and u ∈ V , we have (a + b)u = au + bu.
(SM4) For every a, b ∈ R and u ∈ V , we have (ab)u = a(bu).
(SM5) For every u ∈ V , we have 1u = u.

Remark. The elements a, b, c ∈ R discussed in (SM1)–(SM5) are known as scalars. Multiplication of vectors by elements of R is sometimes known as scalar multiplication.

Definition. Suppose that V is a real vector space, and that W is a subset of V . Then we say that W is a subspace of V if W forms a real vector space under the vector addition and scalar multiplication defined in V .

Remark. Suppose that V is a real vector space, and that W is a non-empty subset of V . Then W is a subspace of V if the following conditions are satisfied:
(SP1) For every u, v ∈ W , we have u + v ∈ W .
(SP2) For every c ∈ R and u ∈ W , we have cu ∈ W .

The reader may refer to Chapter 5 for more details and examples.

We are now in a position to define an inner product on a real vector space V . The following definition is motivated by Proposition 9A concerning the properties of the euclidean dot product in Rn.

Definition. Suppose that V is a real vector space. By a real inner product on V , we mean a function 〈 , 〉 : V × V → R which satisfies the following conditions:
(IP1) For every u, v ∈ V , we have 〈u, v〉 = 〈v, u〉.
(IP2) For every u, v, w ∈ V , we have 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉.
(IP3) For every u, v ∈ V and c ∈ R, we have c〈u, v〉 = 〈cu, v〉.
(IP4) For every u ∈ V , we have 〈u, u〉 ≥ 0, and 〈u, u〉 = 0 if and only if u = 0.

Remarks. (1) The properties (IP1)–(IP4) describe respectively symmetry, additivity, homogeneity and positivity.

(2) We sometimes simply refer to an inner product if we know that V is a real vector space.

Definition. A real vector space with an inner product is called a real inner product space.

Our next definition is a natural extension of the idea of euclidean norm and euclidean distance.

Definition. Suppose that u and v are vectors in a real inner product space V . Then the norm of u is defined by

‖u‖ = 〈u, u〉^(1/2),

and the distance between u and v is defined by

d(u, v) = ‖u − v‖.


Example 9.2.1. For u, v ∈ Rn, let 〈u, v〉 = u · v, the euclidean dot product discussed in the last section. This satisfies Proposition 9A and hence conditions (IP1)–(IP4). The inner product is known as the euclidean inner product in Rn.

Example 9.2.2. Let w1, . . . , wn be positive real numbers. For u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn, let

〈u, v〉 = w1u1v1 + . . . + wnunvn.

It is easy to check that conditions (IP1)–(IP4) are satisfied. This inner product is called a weighted euclidean inner product in Rn, and the positive real numbers w1, . . . , wn are known as weights. The unit circle with respect to this inner product is given by

{u ∈ Rn : ‖u‖ = 1} = {u ∈ Rn : 〈u, u〉 = 1} = {u ∈ Rn : w1u1² + . . . + wnun² = 1}.

Example 9.2.3. Let A be a fixed invertible n × n matrix with real entries. For u, v ∈ Rn, interpreted as column matrices, let

〈u, v〉 = Au · Av,

the euclidean dot product of the vectors Au and Av. It can be checked that conditions (IP1)–(IP4) are satisfied. This inner product is called the inner product generated by the matrix A. To check conditions (IP1)–(IP4), it is useful to note that

〈u, v〉 = (Av)tAu = vtAtAu.
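Both of these inner products are immediate to implement; a minimal sketch (assuming numpy, with hypothetical helper names chosen for this illustration):

```python
import numpy as np

def weighted_inner(u, v, w):
    # Weighted euclidean inner product with positive weights w.
    return np.sum(w * u * v)

def generated_inner(u, v, A):
    # Inner product generated by an invertible matrix A: <u, v> = Au . Av.
    return (A @ u) @ (A @ v)

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(weighted_inner(u, v, np.array([2.0, 5.0])))                 # 2*3 + 5*(-2) = -4
print(generated_inner(u, v, np.array([[1.0, 1.0], [0.0, 1.0]])))  # (Au).(Av) = 4
```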

Example 9.2.4. Consider the vector space M2,2(R) of all 2 × 2 matrices with real entries. For matrices

    U = [ u11 u12 ]        V = [ v11 v12 ]
        [ u21 u22 ]   and      [ v21 v22 ]

in M2,2(R), let

〈U, V 〉 = u11v11 + u12v12 + u21v21 + u22v22.

It is easy to check that conditions (IP1)–(IP4) are satisfied.

Example 9.2.5. Consider the vector space P2 of all polynomials with real coefficients and of degree at most 2. For polynomials

p = p(x) = p0 + p1x + p2x2 and q = q(x) = q0 + q1x + q2x2

in P2, let

〈p, q〉 = p0q0 + p1q1 + p2q2.

It can be checked that conditions (IP1)–(IP4) are satisfied.

Example 9.2.6. It is not difficult to show that C[a, b], the collection of all real valued functions continuous in the closed interval [a, b], forms a real vector space. We also know from the theory of real valued functions that functions continuous over a closed interval [a, b] are integrable over [a, b]. For f, g ∈ C[a, b], let

〈f, g〉 = ∫_a^b f(x)g(x) dx.

It can be checked that conditions (IP1)–(IP4) are satisfied.


9.3. Angles and Orthogonality

Recall that in R2 and R3, we can actually define the euclidean dot product of two vectors u and v by the formula

u · v = ‖u‖ ‖v‖ cos θ, (1)

where θ is the angle between u and v. Indeed, this is the approach taken in Chapter 4, and the Cauchy-Schwarz inequality, as stated in Proposition 9B, follows immediately from (1), since | cos θ| ≤ 1.

The picture is not so clear in the euclidean space Rn when n > 3, although the Cauchy-Schwarz inequality, as given by Proposition 9B, does allow us to recover a formula of the type (1). But then the number θ does not have a geometric interpretation.

We now study the case of a real inner product space. Our first task is to establish a generalized version of Proposition 9B.

PROPOSITION 9E. (CAUCHY-SCHWARZ INEQUALITY) Suppose that u and v are vectors in a real inner product space V . Then

|〈u, v〉| ≤ ‖u‖ ‖v‖. (2)

Proof. Our proof here looks like a trick, but it works. Suppose that u and v are vectors in a real inner product space V . If u = 0, then since 0u = 0, it follows that

〈u,v〉 = 〈0,v〉 = 〈0u,v〉 = 0〈u,v〉 = 0,

so that (2) is clearly satisfied. We may suppose therefore that u 6= 0, so that 〈u,u〉 6= 0. For every realnumber t, it follows from (IP4) that 〈tu + v, tu + v〉 ≥ 0. Hence

0 ≤ 〈tu + v, tu + v〉 = t2〈u,u〉+ 2t〈u,v〉+ 〈v,v〉.

Since 〈u,u〉 ≠ 0, the right hand side is a quadratic polynomial in t. Since the inequality holds for every real number t, it follows that the quadratic polynomial

t2〈u,u〉+ 2t〈u,v〉+ 〈v,v〉

has either repeated roots or no real root, and so the discriminant is non-positive. In other words, we must have

0 ≥ (2〈u,v〉)2 − 4〈u,u〉〈v,v〉 = 4〈u,v〉2 − 4‖u‖2‖v‖2.

The inequality (2) follows once again. ∎

Example 9.3.1. Note that Proposition 9B is a special case of Proposition 9E. In fact, Proposition 9B represents the Cauchy-Schwarz inequality for finite sums: for u1, . . . , un, v1, . . . , vn ∈ R, we have

|u1v1 + . . . + unvn| ≤ (u1² + . . . + un²)^{1/2} (v1² + . . . + vn²)^{1/2}.
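
As a quick sanity check, a small numpy sketch of our own verifying this inequality on random data:

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.standard_normal(10)
    v = rng.standard_normal(10)

    lhs = abs(np.sum(u * v))
    rhs = np.sqrt(np.sum(u**2)) * np.sqrt(np.sum(v**2))
    print(lhs <= rhs)   # True for any choice of u and v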

Example 9.3.2. Applying Proposition 9E to the inner product in the vector space C[a, b] studied in Example 9.2.6, we obtain the Cauchy-Schwarz inequality for integrals: for f, g ∈ C[a, b], we have

| ∫_a^b f(x)g(x) dx | ≤ ( ∫_a^b f²(x) dx )^{1/2} ( ∫_a^b g²(x) dx )^{1/2}.


Next, we investigate norm and distance. We generalize Propositions 9C and 9D.

PROPOSITION 9F. Suppose that u and v are vectors in a real inner product space, and that c ∈ R. Then
(a) ‖u‖ ≥ 0;
(b) ‖u‖ = 0 if and only if u = 0;
(c) ‖cu‖ = |c| ‖u‖; and
(d) ‖u + v‖ ≤ ‖u‖ + ‖v‖.

PROPOSITION 9G. Suppose that u, v and w are vectors in a real inner product space. Then
(a) d(u,v) ≥ 0;
(b) d(u,v) = 0 if and only if u = v;
(c) d(u,v) = d(v,u); and
(d) d(u,v) ≤ d(u,w) + d(w,v).

The proofs are left as exercises.

The Cauchy-Schwarz inequality, as given by Proposition 9E, allows us to recover a formula of the type

〈u,v〉 = ‖u‖ ‖v‖ cos θ. (3)

Although the number θ does not have a geometric interpretation, we can nevertheless interpret it as the angle between the two vectors u and v under the inner product 〈 , 〉. Of particular interest is the case when cos θ = 0; in other words, when 〈u,v〉 = 0.

Definition. Suppose that u and v are non-zero vectors in a real inner product space V . Then the unique real number θ ∈ [0, π] satisfying (3) is called the angle between u and v with respect to the inner product 〈 , 〉 in V .

Definition. Two vectors u and v in a real inner product space are said to be orthogonal if 〈u,v〉 = 0.

Definition. Suppose that W is a subspace of a real inner product space V . A vector u ∈ V is said to be orthogonal to W if 〈u,w〉 = 0 for every w ∈ W . The set of all vectors u ∈ V which are orthogonal to W is called the orthogonal complement of W , and denoted by W⊥; in other words,

W⊥ = {u ∈ V : 〈u,w〉 = 0 for every w ∈ W}.

Example 9.3.3. In R3, the non-trivial subspaces are lines and planes through the origin. Under the euclidean inner product, two non-zero vectors are orthogonal if and only if they are perpendicular. It follows that if W is a line through the origin, then W⊥ is the plane through the origin and perpendicular to the line W . Also, if W is a plane through the origin, then W⊥ is the line through the origin and perpendicular to the plane W .

Example 9.3.4. In R4, let us consider the two vectors u = (1, 1, 1, 0) and v = (1, 0, 1, 1). Under the euclidean inner product, we have

‖u‖ = ‖v‖ = √3 and 〈u,v〉 = 2.

This verifies the Cauchy-Schwarz inequality. On the other hand, if θ ∈ [0, π] represents the angle between u and v with respect to the euclidean inner product, then (3) holds, and we obtain cos θ = 2/3, so that θ = cos−1(2/3).


Example 9.3.5. In R4, it can be shown that

W = {(w1, w2, 0, 0) : w1, w2 ∈ R}

is a subspace. Consider now the euclidean inner product, and let

A = {(0, 0, u3, u4) : u3, u4 ∈ R}.

We shall show that A ⊆ W⊥ and W⊥ ⊆ A, so that W⊥ = A. To show that A ⊆ W⊥, note that for every (0, 0, u3, u4) ∈ A, we have

〈(0, 0, u3, u4), (w1, w2, 0, 0)〉 = (0, 0, u3, u4) · (w1, w2, 0, 0) = 0

for every (w1, w2, 0, 0) ∈ W , so that (0, 0, u3, u4) ∈ W⊥. To show that W⊥ ⊆ A, note that for every (u1, u2, u3, u4) ∈ W⊥, we need to have

〈(u1, u2, u3, u4), (w1, w2, 0, 0)〉 = (u1, u2, u3, u4) · (w1, w2, 0, 0) = u1w1 + u2w2 = 0

for every (w1, w2, 0, 0) ∈ W . The choice (w1, w2, 0, 0) = (1, 0, 0, 0) requires us to have u1 = 0, while the choice (w1, w2, 0, 0) = (0, 1, 0, 0) requires us to have u2 = 0. Hence we must have u1 = u2 = 0, so that (u1, u2, u3, u4) ∈ A.

Example 9.3.6. Let us consider the inner product on M2,2(R) discussed in Example 9.2.4. Let

U = [ 1  0 ]    and    V = [ 4   2 ]
    [ 3  4 ]               [ 0  −1 ].

Then 〈U, V 〉 = 0, so that the two matrices are orthogonal.
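
Under this inner product, 〈U, V 〉 is simply the sum of the entrywise products, so the check is one line of numpy (a sketch of our own):

    import numpy as np

    U = np.array([[1, 0],
                  [3, 4]])
    V = np.array([[4, 2],
                  [0, -1]])

    # <U, V> = u11 v11 + u12 v12 + u21 v21 + u22 v22
    print(np.sum(U * V))   # 0, confirming that U and V are orthogonal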

Example 9.3.7. Let us consider the inner product on P2 discussed in Example 9.2.5. Let

p = p(x) = 1 + 2x + 3x² and q = q(x) = 4 + x − 2x².

Then 〈p, q〉 = 0, so that the two polynomials are orthogonal.

Example 9.3.8. Let us consider the inner product on C[a, b] discussed in Example 9.2.6. In particular, let [a, b] = [0, π/2]. Suppose that

f(x) = sin x − cos x and g(x) = sin x + cos x.

Then

〈f, g〉 = ∫_0^{π/2} f(x)g(x) dx = ∫_0^{π/2} (sin x − cos x)(sin x + cos x) dx = ∫_0^{π/2} (sin² x − cos² x) dx = 0,

so that the two functions are orthogonal.

Example 9.3.9. Suppose that A is an m × n matrix with real entries. Recall that if we let r1, . . . , rm denote the vectors formed from the rows of A, then the row space of A is given by

{c1r1 + . . . + cmrm : c1, . . . , cm ∈ R},

and is a subspace of Rn. On the other hand, the set

{x ∈ Rn : Ax = 0}

is called the nullspace of A, and is also a subspace of Rn. Clearly, if x belongs to the nullspace of A, then ri · x = 0 for every i = 1, . . . ,m. In fact, the row space of A and the nullspace of A are orthogonal


complements of each other under the euclidean inner product in Rn. On the other hand, the column space of A is the row space of At. It follows that the column space of A and the nullspace of At are orthogonal complements of each other under the euclidean inner product in Rm.

Example 9.3.10. Suppose that u and v are orthogonal vectors in an inner product space. Then

‖u + v‖2 = 〈u + v,u + v〉 = 〈u,u〉+ 2〈u,v〉+ 〈v,v〉 = ‖u‖2 + ‖v‖2.

This is a generalized version of Pythagoras’s theorem.

Remark. We emphasize here that orthogonality depends on the choice of the inner product. Very often, a real vector space has more than one inner product. Vectors orthogonal with respect to one may not be orthogonal with respect to another. For example, the vectors u = (1, 1) and v = (1,−1) in R2 are orthogonal with respect to the euclidean inner product

〈u,v〉 = u1v1 + u2v2,

but not orthogonal with respect to the weighted euclidean inner product

〈u,v〉 = 2u1v1 + u2v2.

9.4. Orthogonal and Orthonormal Bases

Suppose that v1, . . . ,vr are vectors in a real vector space V . We often consider linear combinations of the type c1v1 + . . . + crvr, where c1, . . . , cr ∈ R. The set

span{v1, . . . ,vr} = {c1v1 + . . . + crvr : c1, . . . , cr ∈ R}

of all such linear combinations is called the span of the vectors v1, . . . ,vr. We also say that the vectors v1, . . . ,vr span V if span{v1, . . . ,vr} = V ; in other words, if every vector in V can be expressed as a linear combination of the vectors v1, . . . ,vr.

It can be shown that span{v1, . . . ,vr} is a subspace of V . Suppose further that W is a subspace of V and v1, . . . ,vr ∈ W . Then span{v1, . . . ,vr} ⊆ W .

On the other hand, the spanning set {v1, . . . ,vr} may contain more vectors than are necessary to describe all the vectors in the span. This leads to the idea of linear independence.

Definition. Suppose that v1, . . . ,vr are vectors in a real vector space V .
(LD) We say that v1, . . . ,vr are linearly dependent if there exist c1, . . . , cr ∈ R, not all zero, such that c1v1 + . . . + crvr = 0.
(LI) We say that v1, . . . ,vr are linearly independent if they are not linearly dependent; in other words, if the only solution of c1v1 + . . . + crvr = 0 in c1, . . . , cr ∈ R is given by c1 = . . . = cr = 0.

Definition. Suppose that v1, . . . ,vr are vectors in a real vector space V . We say that {v1, . . . ,vr} is a basis for V if the following two conditions are satisfied:
(B1) We have span{v1, . . . ,vr} = V .
(B2) The vectors v1, . . . ,vr are linearly independent.

Suppose that {v1, . . . ,vr} is a basis for a real vector space V . Then it can be shown that every element u ∈ V can be expressed uniquely in the form u = c1v1 + . . . + crvr, where c1, . . . , cr ∈ R.

We shall restrict our discussion to finite-dimensional real vector spaces. A real vector space V is said to be finite-dimensional if it has a basis containing only finitely many elements. Suppose that {v1, . . . ,vn}


is such a basis. Then it can be shown that any collection of more than n vectors in V must be linearly dependent. It follows that any two bases for V must have the same number of elements. This common number is known as the dimension of V .

It can be shown that if V is a finite-dimensional real vector space, then any finite set of linearly independent vectors in V can be expanded, if necessary, to a basis for V . This establishes the existence of a basis for any finite-dimensional vector space. On the other hand, it can be shown that if the dimension of V is equal to n, then any set of n linearly independent vectors in V is a basis for V .

Remark. The above is discussed in far greater detail, including examples and proofs, in Chapter 5.

The purpose of this section is to add the extra ingredient of orthogonality to the above discussion.

Definition. Suppose that V is a finite-dimensional real inner product space. A basis {v1, . . . ,vn} of V is said to be an orthogonal basis of V if 〈vi,vj〉 = 0 for every i, j = 1, . . . , n satisfying i ≠ j. It is said to be an orthonormal basis if it satisfies the extra condition that ‖vi‖ = 1 for every i = 1, . . . , n.

Example 9.4.1. The usual basis {v1, . . . ,vn} in Rn, where

vi = (0, . . . , 0, 1, 0, . . . , 0),

with the 1 in position i, preceded by i − 1 zeros and followed by n − i zeros, for every i = 1, . . . , n, is an orthonormal basis of Rn with respect to the euclidean inner product.

Example 9.4.2. The vectors v1 = (1, 1) and v2 = (1,−1) are linearly independent in R2 and satisfy

〈v1,v2〉 = v1 · v2 = 0.

It follows that {v1,v2} is an orthogonal basis of R2 with respect to the euclidean inner product. Can you find an orthonormal basis of R2 by normalizing v1 and v2?

It is theoretically very simple to express any vector as a linear combination of the elements of an orthogonal or orthonormal basis.

PROPOSITION 9H. Suppose that V is a finite-dimensional real inner product space. If {v1, . . . ,vn} is an orthogonal basis of V , then for every vector u ∈ V , we have

u = (〈u,v1〉/‖v1‖²) v1 + . . . + (〈u,vn〉/‖vn‖²) vn.

Furthermore, if {v1, . . . ,vn} is an orthonormal basis of V , then for every vector u ∈ V , we have

u = 〈u,v1〉v1 + . . . + 〈u,vn〉vn.

Proof. Since {v1, . . . ,vn} is a basis of V , there exist unique c1, . . . , cn ∈ R such that

u = c1v1 + . . .+ cnvn.

For every i = 1, . . . , n, we have

〈u,vi〉 = 〈c1v1 + . . . + cnvn,vi〉 = c1〈v1,vi〉 + . . . + cn〈vn,vi〉 = ci〈vi,vi〉,

since 〈vj ,vi〉 = 0 if j ≠ i. Clearly vi ≠ 0, so that 〈vi,vi〉 ≠ 0, and so

ci = 〈u,vi〉/〈vi,vi〉

for every i = 1, . . . , n. The first assertion follows immediately. For the second assertion, note that 〈vi,vi〉 = 1 for every i = 1, . . . , n. ∎
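
In coordinates, Proposition 9H is a one-line computation. Below is a small numpy sketch of our own (the orthogonal basis of R3 used here reappears in Problem 13 at the end of this chapter; the helper name is ours):

    import numpy as np

    def coords_in_orthogonal_basis(u, basis):
        # c_i = <u, v_i> / <v_i, v_i>, as in Proposition 9H
        return [np.dot(u, v) / np.dot(v, v) for v in basis]

    v1 = np.array([2.0, -2.0, 1.0])
    v2 = np.array([2.0, 1.0, -2.0])
    v3 = np.array([1.0, 2.0, 2.0])
    u = np.array([-1.0, 0.0, 2.0])

    c1, c2, c3 = coords_in_orthogonal_basis(u, [v1, v2, v3])
    print(c1, c2, c3)                   # 0.0, -2/3, 1/3
    print(c1 * v1 + c2 * v2 + c3 * v3)  # reconstructs u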


Collections of vectors that are orthogonal to each other are very useful in the study of vector spaces, as illustrated by the following important result.

PROPOSITION 9J. Suppose that the non-zero vectors v1, . . . ,vr in a finite-dimensional real inner product space are pairwise orthogonal. Then they are linearly independent.

Proof. Suppose that c1, . . . , cr ∈ R and

c1v1 + . . .+ crvr = 0.

Then for every i = 1, . . . , r, we have

0 = 〈0,vi〉 = 〈c1v1 + . . .+ crvr,vi〉 = c1〈v1,vi〉+ . . .+ cr〈vr,vi〉 = ci〈vi,vi〉

since 〈vj ,vi〉 = 0 if j ≠ i. Clearly vi ≠ 0, so that 〈vi,vi〉 ≠ 0, and so we must have ci = 0 for every i = 1, . . . , r. It follows that c1 = . . . = cr = 0. ∎

Of course, the above is based on the assumption that an orthogonal basis exists. Our next task is to show that this is indeed the case. Our proof is based on a technique which orthogonalizes any given basis of a vector space.

PROPOSITION 9K. Every finite-dimensional real inner product space has an orthogonal basis, andhence also an orthonormal basis.

Remark. We shall prove Proposition 9K by using the Gram-Schmidt process. The central idea of this process, in its simplest form, can be described as follows. Suppose that v1 and u2 are two non-zero vectors in an inner product space, not necessarily orthogonal to each other. We shall attempt to remove some scalar multiple α1v1 from u2 so that v2 = u2 − α1v1 is orthogonal to v1; in other words, we wish to find a suitable real number α1 such that

〈v1,v2〉 = 〈v1,u2 − α1v1〉 = 0.

The idea is illustrated in the picture below.

[Figure: the vector u2 resolved into a component α1v1 along v1 and a component v2 = u2 − α1v1 orthogonal to v1.]

We clearly need 〈v1,u2〉 − α1〈v1,v1〉 = 0, and

α1 = 〈v1,u2〉/〈v1,v1〉 = 〈v1,u2〉/‖v1‖²

is a suitable choice, so that

v1 and v2 = u2 − (〈v1,u2〉/‖v1‖²) v1 (4)

are now orthogonal. Suppose in general that v1, . . . ,vs and us+1 are non-zero vectors in an inner product space, where v1, . . . ,vs are pairwise orthogonal. We shall attempt to remove some linear combination


α1v1 + . . . + αsvs from us+1 so that vs+1 = us+1 − α1v1 − . . . − αsvs is orthogonal to each of v1, . . . ,vs; in other words, we wish to find suitable real numbers α1, . . . , αs such that

〈vi,vs+1〉 = 〈vi,us+1 − α1v1 − . . .− αsvs〉 = 0

for every i = 1, . . . , s. We clearly need

〈vi,us+1〉 − α1〈vi,v1〉 − . . .− αs〈vi,vs〉 = 〈vi,us+1〉 − αi〈vi,vi〉 = 0,

and

αi = 〈vi,us+1〉/〈vi,vi〉 = 〈vi,us+1〉/‖vi‖²

is a suitable choice, so that

v1, . . . ,vs and vs+1 = us+1 − (〈v1,us+1〉/‖v1‖²) v1 − . . . − (〈vs,us+1〉/‖vs‖²) vs (5)

are now pairwise orthogonal.

Example 9.4.3. The vectors

u1 = (1, 2, 1, 0), u2 = (3, 3, 3, 0), u3 = (2,−10, 0, 0), u4 = (−2, 1,−6, 2)

are linearly independent in R4, since

det [ 1  3    2  −2 ]
    [ 2  3  −10   1 ]
    [ 1  3    0  −6 ]   ≠ 0.
    [ 0  0    0   2 ]

Hence {u1,u2,u3,u4} is a basis of R4. Let us consider R4 as a real inner product space with the euclidean inner product, and apply the Gram-Schmidt process to this basis. We have

v1 = u1 = (1, 2, 1, 0),

v2 = u2 − (〈v1,u2〉/‖v1‖²) v1
   = (3, 3, 3, 0) − (〈(1, 2, 1, 0), (3, 3, 3, 0)〉/‖(1, 2, 1, 0)‖²) (1, 2, 1, 0)
   = (3, 3, 3, 0) − (12/6)(1, 2, 1, 0) = (3, 3, 3, 0) + (−2,−4,−2, 0) = (1,−1, 1, 0),

v3 = u3 − (〈v1,u3〉/‖v1‖²) v1 − (〈v2,u3〉/‖v2‖²) v2
   = (2,−10, 0, 0) − (〈(1, 2, 1, 0), (2,−10, 0, 0)〉/‖(1, 2, 1, 0)‖²) (1, 2, 1, 0)
     − (〈(1,−1, 1, 0), (2,−10, 0, 0)〉/‖(1,−1, 1, 0)‖²) (1,−1, 1, 0)
   = (2,−10, 0, 0) + (18/6)(1, 2, 1, 0) − (12/3)(1,−1, 1, 0)
   = (2,−10, 0, 0) + (3, 6, 3, 0) + (−4, 4,−4, 0) = (1, 0,−1, 0),

v4 = u4 − (〈v1,u4〉/‖v1‖²) v1 − (〈v2,u4〉/‖v2‖²) v2 − (〈v3,u4〉/‖v3‖²) v3
   = (−2, 1,−6, 2) − (〈(1, 2, 1, 0), (−2, 1,−6, 2)〉/‖(1, 2, 1, 0)‖²) (1, 2, 1, 0)
     − (〈(1,−1, 1, 0), (−2, 1,−6, 2)〉/‖(1,−1, 1, 0)‖²) (1,−1, 1, 0)
     − (〈(1, 0,−1, 0), (−2, 1,−6, 2)〉/‖(1, 0,−1, 0)‖²) (1, 0,−1, 0)
   = (−2, 1,−6, 2) + (6/6)(1, 2, 1, 0) + (9/3)(1,−1, 1, 0) − (4/2)(1, 0,−1, 0)
   = (−2, 1,−6, 2) + (1, 2, 1, 0) + (3,−3, 3, 0) + (−2, 0, 2, 0) = (0, 0, 0, 2).


It is easy to verify that the four vectors

v1 = (1, 2, 1, 0), v2 = (1,−1, 1, 0), v3 = (1, 0,−1, 0), v4 = (0, 0, 0, 2)

are pairwise orthogonal, so that {v1,v2,v3,v4} is an orthogonal basis of R4. Normalizing each of these four vectors, we obtain the corresponding orthonormal basis

{(1/√6, 2/√6, 1/√6, 0), (1/√3, −1/√3, 1/√3, 0), (1/√2, 0, −1/√2, 0), (0, 0, 0, 1)}.
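
The whole computation mechanizes directly. Here is a minimal numpy sketch of our own implementing (4) and (5), which reproduces Example 9.4.3 (the function name gram_schmidt is ours, not from the text):

    import numpy as np

    def gram_schmidt(vectors):
        # orthogonalize linearly independent vectors, following (4) and (5)
        basis = []
        for u in vectors:
            v = np.array(u, dtype=float)
            for w in basis:
                v = v - (np.dot(w, u) / np.dot(w, w)) * w   # remove projection on w
            basis.append(v)
        return basis

    us = [(1, 2, 1, 0), (3, 3, 3, 0), (2, -10, 0, 0), (-2, 1, -6, 2)]
    vs = gram_schmidt(us)
    for v in vs:
        print(v)   # (1,2,1,0), (1,-1,1,0), (1,0,-1,0), (0,0,0,2), as above
    ws = [v / np.linalg.norm(v) for v in vs]   # the orthonormal basis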

Proof of Proposition 9K. Suppose that the vector space V has dimension n. Then it has a basis of the type {u1, . . . ,un}. We now let v1 = u1, and define v2, . . . ,vn inductively by (4) and (5) to obtain a set of pairwise orthogonal vectors v1, . . . ,vn. Clearly none of these n vectors is zero, for if vs+1 = 0, then it follows from (5) that v1, . . . ,vs,us+1, and hence u1, . . . ,us,us+1, are linearly dependent, clearly a contradiction. It now follows from Proposition 9J that v1, . . . ,vn are linearly independent, and so must form a basis of V . This proves the first assertion. To prove the second assertion, observe that each of the vectors

v1/‖v1‖, . . . , vn/‖vn‖

has norm 1. ∎

Example 9.4.4. Consider the real inner product space P2, where for polynomials

p = p(x) = p0 + p1x + p2x²    and    q = q(x) = q0 + q1x + q2x²,

the inner product is defined by

〈p, q〉 = p0q0 + p1q1 + p2q2.

The polynomials

u1 = 3 + 4x + 5x²,    u2 = 9 + 12x + 5x²,    u3 = 1 − 7x + 25x²

are linearly independent in P2, since

det [ 3   9    1 ]
    [ 4  12   −7 ]   ≠ 0.
    [ 5   5   25 ]

Hence {u1, u2, u3} is a basis of P2. Let us apply the Gram-Schmidt process to this basis. We have

v1 = u1 = 3 + 4x + 5x²,

v2 = u2 − (〈v1, u2〉/‖v1‖²) v1
   = (9 + 12x + 5x²) − (〈3 + 4x + 5x², 9 + 12x + 5x²〉/‖3 + 4x + 5x²‖²) (3 + 4x + 5x²)
   = (9 + 12x + 5x²) − (100/50)(3 + 4x + 5x²) = (9 + 12x + 5x²) + (−6 − 8x − 10x²) = 3 + 4x − 5x²,

v3 = u3 − (〈v1, u3〉/‖v1‖²) v1 − (〈v2, u3〉/‖v2‖²) v2
   = (1 − 7x + 25x²) − (〈3 + 4x + 5x², 1 − 7x + 25x²〉/‖3 + 4x + 5x²‖²) (3 + 4x + 5x²)
     − (〈3 + 4x − 5x², 1 − 7x + 25x²〉/‖3 + 4x − 5x²‖²) (3 + 4x − 5x²)
   = (1 − 7x + 25x²) − (100/50)(3 + 4x + 5x²) + (150/50)(3 + 4x − 5x²)
   = (1 − 7x + 25x²) + (−6 − 8x − 10x²) + (9 + 12x − 15x²) = 4 − 3x + 0x².


It is easy to verify that the three polynomials

v1 = 3 + 4x + 5x²,    v2 = 3 + 4x − 5x²,    v3 = 4 − 3x + 0x²

are pairwise orthogonal, so that {v1, v2, v3} is an orthogonal basis of P2. Normalizing each of these three polynomials, we obtain the corresponding orthonormal basis

{ 3/√50 + (4/√50)x + (5/√50)x²,  3/√50 + (4/√50)x − (5/√50)x²,  4/5 − (3/5)x + 0x² }.
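
Note that under this inner product a polynomial behaves exactly like its coefficient vector in R3, so the pairwise orthogonality above can be checked with the euclidean dot product (a quick sketch of our own):

    import numpy as np

    # coefficient vectors of v1, v2, v3 with respect to 1, x, x^2
    v1, v2, v3 = np.array([3, 4, 5]), np.array([3, 4, -5]), np.array([4, -3, 0])
    print(np.dot(v1, v2), np.dot(v1, v3), np.dot(v2, v3))   # 0 0 0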

9.5. Orthogonal Projections

The Gram-Schmidt process is an example of using orthogonal projections. The geometric interpretation of

v2 = u2 − (〈v1,u2〉/‖v1‖²) v1

is that we have removed from u2 its orthogonal projection on v1; in other words, we have removed from u2 the component of u2 which is "parallel" to v1, so that the remaining part must be "perpendicular" to v1.

It is natural to consider the following question. Suppose that V is a finite-dimensional real inner product space, and that W is a subspace of V . Given any vector u ∈ V , can we write

u = w + p,

where w ∈W and p ∈W⊥? If so, is this expression unique?

The following result answers these two questions in the affirmative.

PROPOSITION 9L. Suppose that V is a finite-dimensional real inner product space, and that W is a subspace of V . Suppose further that {v1, . . . ,vr} is an orthogonal basis of W . Then for any vector u ∈ V ,

w = (〈u,v1〉/‖v1‖²) v1 + . . . + (〈u,vr〉/‖vr‖²) vr

is the unique vector satisfying w ∈W and u−w ∈W⊥.

Proof. Note that the orthogonal basis {v1, . . . ,vr} of W can be extended to a basis

{v1, . . . ,vr,ur+1, . . . ,un}

of V which can then be orthogonalized by the Gram-Schmidt process to an orthogonal basis

{v1, . . . ,vr,vr+1, . . . ,vn}

of V . Clearly vr+1, . . . ,vn ∈ W⊥. Suppose now that u ∈ V . Then u can be expressed as a linear combination of v1, . . . ,vn in a unique way. By Proposition 9H, this unique expression is given by

u = (〈u,v1〉/‖v1‖²) v1 + . . . + (〈u,vn〉/‖vn‖²) vn = w + (〈u,vr+1〉/‖vr+1‖²) vr+1 + . . . + (〈u,vn〉/‖vn‖²) vn.

Clearly u − w ∈ W⊥. ∎


Definition. The vector w in Proposition 9L is called the orthogonal projection of u on the subspace W , and denoted by projWu. The vector p = u − w is called the component of u orthogonal to the subspace W .

Example 9.5.1. Recall Example 9.4.3. Consider the subspace W = span{u1,u2}. Note that v1 and v2 can each be expressed as a linear combination of u1 and u2, and that u1 and u2 can each be expressed as a linear combination of v1 and v2. It follows that {v1,v2} is an orthogonal basis of W . This basis can be extended to an orthogonal basis {v1,v2,v3,v4} of R4. It follows that W⊥ = span{v3,v4}.
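
The projection formula of Proposition 9L is equally mechanical. A small numpy sketch of our own, using the orthogonal basis {v1, v2} of W above (the test vector u and the helper name proj are our own choices):

    import numpy as np

    def proj(u, orthogonal_basis):
        # proj_W u = sum of (<u, v_i> / ||v_i||^2) v_i, by Proposition 9L
        return sum((np.dot(u, v) / np.dot(v, v)) * v for v in orthogonal_basis)

    v1 = np.array([1.0, 2.0, 1.0, 0.0])   # orthogonal basis of W from above
    v2 = np.array([1.0, -1.0, 1.0, 0.0])
    u = np.array([1.0, 0.0, 0.0, 3.0])    # an arbitrary test vector

    w = proj(u, [v1, v2])
    p = u - w
    print(np.dot(p, v1), np.dot(p, v2))   # both 0: the component p lies in W-perp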


Problems for Chapter 9

1. In each of the following, determine whether 〈 , 〉 is an inner product in the given vector space by checking whether conditions (IP1)–(IP4) hold:

a) R2; 〈u,v〉 = 2u1v1 − u2v2
b) R2; 〈u,v〉 = u1v1 + 2u1v2 + u2v2
c) R3; 〈u,v〉 = u1²v1² + u2²v2² + u3²v3²

2. Consider the vector space R2. Suppose that 〈 , 〉 is the inner product generated by the matrix

A = [ 2  1 ]
    [ 2  3 ].

Evaluate each of the following:
a) 〈(1, 2), (2, 3)〉
b) ‖(1, 2)‖
c) d((1, 2), (2, 3))

3. Suppose that the vectors u,v,w in an inner product space V satisfy 〈u,v〉 = 2, 〈v,w〉 = −3, 〈u,w〉 = 5, ‖u‖ = 1, ‖v‖ = 2 and ‖w‖ = 7. Evaluate each of the following:
a) 〈u + v,v + w〉
b) 〈2v −w, 3u + 2w〉
c) 〈u− v − 2w, 4u + v〉
d) ‖u + v‖
e) ‖2w − v‖
f) ‖u− 2v + 4w‖

4. Suppose that u and v are two non-zero vectors in the real vector space R2. Follow the steps below to establish the existence of a real inner product 〈 , 〉 on R2 such that 〈u,v〉 ≠ 0.
a) Explain, in terms of the euclidean inner product, why we may restrict our discussion to vectors of the form u = (x, y) and v = (ky,−kx), where x, y, k ∈ R satisfy (x, y) ≠ (0, 0) and k ≠ 0.
b) Explain next why we may further restrict our discussion to vectors of the form u = (x, y) and v = (y,−x), where x, y ∈ R satisfy (x, y) ≠ (0, 0).
c) Let u = (x, y) and v = (y,−x), where x, y ∈ R and (x, y) ≠ (0, 0). Consider the inner product on R2 generated by the real matrix

A = [ a  b ]
    [ b  c ],

where ac ≠ b². Show that 〈u,v〉 = (a² − c²)xy + b(a + c)(y² − x²).
d) Suppose that x² = y². Show that the choice a > c > b = 0 will imply 〈u,v〉 ≠ 0.
e) Suppose that x² ≠ y². Show that the choice c = a > b > 0 will imply 〈u,v〉 ≠ 0.

5. Consider the real vector space R2.
a) Find two distinct non-zero vectors u,v ∈ R2 such that 〈u,v〉 = 0 for every weighted euclidean inner product on R2.
b) Find two distinct non-zero vectors u,v ∈ R2 such that 〈u,v〉 ≠ 0 for any inner product on R2.

6. For each of the following inner product spaces and subspaces W , find W⊥:
a) R2 (euclidean inner product); W = {(x, y) ∈ R2 : x + 2y = 0}.
b) M2,2(R) (inner product discussed in Section 9.2);

W = { [ ta   0 ]
      [  0  tb ]  : t ∈ R },

where a and b are non-zero.

7. Suppose that {v1, . . . ,vn} is a basis for a real inner product space V . Does there exist v ∈ V which is orthogonal to every vector in this basis?

8. Use the Cauchy-Schwarz inequality to prove that (a cos θ + b sin θ)² ≤ a² + b² for every a, b, θ ∈ R. [Hint: First find a suitable real inner product space.]


9. Prove Proposition 9F.

10. Show that 〈u,v〉 = (1/4)‖u + v‖² − (1/4)‖u − v‖² for any u and v in a real inner product space.

11. Suppose that {v1, . . . ,vn} is an orthonormal basis of a real inner product space V . Show that for every u ∈ V , we have ‖u‖² = 〈u,v1〉² + . . . + 〈u,vn〉².

12. Show that if v1, . . . ,vn are pairwise orthogonal in a real inner product space V , then

‖v1 + . . . + vn‖² = ‖v1‖² + . . . + ‖vn‖².

13. Show that v1 = (2,−2, 1), v2 = (2, 1,−2) and v3 = (1, 2, 2) form an orthogonal basis of R3 under the euclidean inner product. Then write u = (−1, 0, 2) as a linear combination of v1,v2,v3.

14. Let u1 = (2, 2,−1), u2 = (4, 1, 1) and u3 = (1, 10,−5). Show that {u1,u2,u3} is a basis of R3, and apply the Gram-Schmidt process to this basis to find an orthonormal basis of R3.

15. Show that the vectors u1 = (0, 2, 1, 0), u2 = (1,−1, 0, 0), u3 = (1, 2, 0,−1) and u4 = (1, 0, 0, 1) form a basis of R4. Then apply the Gram-Schmidt process to find an orthogonal basis of R4. Find also the corresponding orthonormal basis of R4.

16. Consider the vector space P2 with the inner product

〈p, q〉 = ∫_0^1 p(x)q(x) dx.

Apply the Gram-Schmidt process to the basis {1, x, x²} to find an orthogonal basis of P2. Find also the corresponding orthonormal basis of P2.

17. Suppose that we apply the Gram-Schmidt process to non-zero vectors u1, . . . ,un without first checking that these form a basis of the inner product space, and obtain vs = 0 for some s = 1, . . . , n. What conclusion can we draw concerning the collection u1, . . . ,un?


Chapter 10

ORTHOGONAL MATRICES

10.1. Introduction

Definition. A square matrix A with real entries and satisfying the condition A−1 = At is called an orthogonal matrix.

Example 10.1.1. Consider the euclidean space R2 with the euclidean inner product. The vectors u1 = (1, 0) and u2 = (0, 1) form an orthonormal basis B = {u1,u2}. Let us now rotate u1 and u2 anticlockwise by an angle θ to obtain v1 = (cos θ, sin θ) and v2 = (− sin θ, cos θ). Then C = {v1,v2} is also an orthonormal basis.

[Figure: the basis vectors u1, u2 and their images v1, v2 under an anticlockwise rotation through the angle θ.]


The transition matrix from the basis C to the basis B is given by

P = ( [v1]B [v2]B ) = [ cos θ  −sin θ ]
                      [ sin θ   cos θ ].

Clearly

P−1 = P t = [  cos θ  sin θ ]
            [ −sin θ  cos θ ].

In fact, our example is a special case of the following general result.

PROPOSITION 10A. Suppose that B = {u1, . . . ,un} and C = {v1, . . . ,vn} are two orthonormal bases of a real inner product space V . Then the transition matrix P from the basis C to the basis B is an orthogonal matrix.

Example 10.1.2. The matrix

A = [ 1/3  −2/3   2/3 ]
    [ 2/3  −1/3  −2/3 ]
    [ 2/3   2/3   1/3 ]

is orthogonal, since

AtA = [  1/3   2/3  2/3 ] [ 1/3  −2/3   2/3 ]   [ 1  0  0 ]
      [ −2/3  −1/3  2/3 ] [ 2/3  −1/3  −2/3 ] = [ 0  1  0 ]
      [  2/3  −2/3  1/3 ] [ 2/3   2/3   1/3 ]   [ 0  0  1 ].

Note also that the row vectors of A, namely (1/3,−2/3, 2/3), (2/3,−1/3,−2/3) and (2/3, 2/3, 1/3), are orthonormal. So are the column vectors of A.
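
A quick numerical check of these observations (a numpy sketch of our own):

    import numpy as np

    A = np.array([[1, -2, 2],
                  [2, -1, -2],
                  [2, 2, 1]]) / 3.0

    print(np.allclose(A.T @ A, np.eye(3)))      # A^t A = I, so A is orthogonal
    print(np.allclose(np.linalg.inv(A), A.T))   # equivalently A^{-1} = A^t
    print(np.allclose(A @ A.T, np.eye(3)))      # the rows are orthonormal as well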

In fact, our last observation is not a coincidence.

PROPOSITION 10B. Suppose that A is an n× n matrix with real entries. Then
(a) A is orthogonal if and only if the row vectors of A form an orthonormal basis of Rn under the euclidean inner product; and
(b) A is orthogonal if and only if the column vectors of A form an orthonormal basis of Rn under the euclidean inner product.

Proof. We shall only prove (a), since the proof of (b) is almost identical. Let r1, . . . , rn denote the row vectors of A. Then

AAt = [ r1 · r1  . . .  r1 · rn ]
      [   ...             ...  ]
      [ rn · r1  . . .  rn · rn ].

It follows that AAt = I if and only if for every i, j = 1, . . . , n, we have

ri · rj = { 1 if i = j,
          { 0 if i ≠ j,

if and only if r1, . . . , rn are orthonormal. ∎

PROPOSITION 10C. Suppose that A is an n× n matrix with real entries. Suppose further that the inner product in Rn is the euclidean inner product. Then the following are equivalent:
(a) A is orthogonal.
(b) For every x ∈ Rn, we have ‖Ax‖ = ‖x‖.
(c) For every u,v ∈ Rn, we have Au ·Av = u · v.


Proof. ((a)⇒(b)) Suppose that A is orthogonal, so that AtA = I. It follows that for every x ∈ Rn, we have

‖Ax‖2 = Ax ·Ax = xtAtAx = xtIx = xtx = x · x = ‖x‖2.

((b)⇒(c)) Suppose that ‖Ax‖ = ‖x‖ for every x ∈ Rn. Then for every u,v ∈ Rn, we have

Au · Av = (1/4)‖Au + Av‖² − (1/4)‖Au − Av‖² = (1/4)‖A(u + v)‖² − (1/4)‖A(u − v)‖² = (1/4)‖u + v‖² − (1/4)‖u − v‖² = u · v.

((c)⇒(a)) Suppose that Au ·Av = u · v for every u,v ∈ Rn. Then

Iu · v = u · v = Au ·Av = vtAtAu = AtAu · v,

so that

(AtA− I)u · v = 0.

In particular, this holds when v = (AtA− I)u, so that

(AtA− I)u · (AtA− I)u = 0,

whence

(AtA− I)u = 0, (1)

in view of Proposition 9A(d). But then (1) is a system of n homogeneous linear equations in n unknowns satisfied by every u ∈ Rn. Hence the coefficient matrix AtA − I must be the zero matrix, and so AtA = I. ∎

Proof of Proposition 10A. For every u ∈ V , we can write

u = β1u1 + . . .+ βnun = γ1v1 + . . .+ γnvn, where β1, . . . , βn, γ1, . . . , γn ∈ R,

and where B = {u1, . . . ,un} and C = {v1, . . . ,vn} are two orthonormal bases of V . Then

‖u‖² = 〈u,u〉 = 〈β1u1 + . . . + βnun, β1u1 + . . . + βnun〉 = Σ_{i=1}^{n} Σ_{j=1}^{n} βiβj〈ui,uj〉 = β1² + . . . + βn² = (β1, . . . , βn) · (β1, . . . , βn).

Similarly,

‖u‖² = 〈u,u〉 = 〈γ1v1 + . . . + γnvn, γ1v1 + . . . + γnvn〉 = Σ_{i=1}^{n} Σ_{j=1}^{n} γiγj〈vi,vj〉 = γ1² + . . . + γn² = (γ1, . . . , γn) · (γ1, . . . , γn).

It follows that in Rn with the euclidean norm, we have ‖[u]B‖ = ‖[u]C‖, and so ‖P [u]C‖ = ‖[u]C‖ for every u ∈ V . Hence ‖Px‖ = ‖x‖ holds for every x ∈ Rn. It now follows from Proposition 10C that P is orthogonal. ∎


10.2. Eigenvalues and Eigenvectors

In this section, we give a brief review of eigenvalues and eigenvectors, first discussed in Chapter 7.

Suppose that

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ an1  . . .  ann ]

is an n × n matrix with real entries. Suppose further that there exist a number λ ∈ R and a non-zero vector v ∈ Rn such that Av = λv. Then we say that λ is an eigenvalue of the matrix A, and that v is an eigenvector corresponding to the eigenvalue λ. In this case, we have Av = λv = λIv, where I is the n × n identity matrix, so that (A − λI)v = 0. Since v ∈ Rn is non-zero, it follows that we must have

det(A− λI) = 0. (2)

In other words, we must have

det [ a11 − λ   a12      . . .  a1n     ]
    [ a21       a22 − λ  . . .  a2n     ]   = 0.
    [ ...                        ...    ]
    [ an1       an2      . . .  ann − λ ]

Note that (2) is a polynomial equation. The polynomial det(A − λI) is called the characteristic polynomial of the matrix A. Solving this equation (2) gives the eigenvalues of the matrix A.

On the other hand, for any eigenvalue λ of the matrix A, the set

{v ∈ Rn : (A − λI)v = 0} (3)

is the nullspace of the matrix A − λI, and forms a subspace of Rn. This space (3) is called the eigenspace corresponding to the eigenvalue λ.

Suppose now that A has eigenvalues λ1, . . . , λn ∈ R, not necessarily distinct, with corresponding eigenvectors v1, . . . ,vn ∈ Rn, and that v1, . . . ,vn are linearly independent. Then it can be shown that

P−1AP = D,

where

P = ( v1 . . . vn ) and D = diag(λ1, . . . , λn).

In fact, we say that A is diagonalizable if there exists an invertible matrix P with real entries such that P−1AP is a diagonal matrix with real entries. It follows that A is diagonalizable if its eigenvectors form a basis of Rn. In the opposite direction, one can show that if A is diagonalizable, then it has n linearly independent eigenvectors in Rn. It therefore follows that the question of diagonalizing a matrix A with real entries is reduced to one of linear independence of its eigenvectors.

We now summarize our discussion so far.


DIAGONALIZATION PROCESS. Suppose that A is an n× n matrix with real entries.
(1) Determine whether the n roots of the characteristic polynomial det(A− λI) are real.
(2) If not, then A is not diagonalizable. If so, then find the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write

P = ( v1 . . . vn ) and D = diag(λ1, . . . , λn),

where λ1, . . . , λn ∈ R are the eigenvalues of A and where v1, . . . ,vn ∈ Rn are respectively their corresponding eigenvectors. Then P−1AP = D.

In particular, it can be shown that if A has distinct eigenvalues λ1, . . . , λn ∈ R, with corresponding eigenvectors v1, . . . ,vn ∈ Rn, then v1, . . . ,vn are linearly independent. It follows that all such matrices A are diagonalizable.
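
The process translates almost line by line into numpy. The following is a rough sketch of our own under the stated steps, not a robust implementation (in particular, the rank test for step (2) is only a numerical approximation):

    import numpy as np

    def diagonalize(A, tol=1e-10):
        # follows the Diagonalization process: returns (P, D) with
        # P^{-1} A P = D, or None if A is not diagonalizable over R
        lams, P = np.linalg.eig(A)
        if np.max(np.abs(np.imag(lams))) > tol:
            return None   # step (1): some roots of det(A - lambda I) are not real
        P = np.real(P)
        if np.linalg.matrix_rank(P) < A.shape[0]:
            return None   # step (2): fewer than n independent eigenvectors
        return P, np.linalg.inv(P) @ A @ P

    A = np.array([[3.0, 1.0],
                  [0.0, 2.0]])   # distinct eigenvalues 3 and 2
    P, D = diagonalize(A)
    print(np.round(D, 10))       # diagonal matrix with the eigenvalues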

10.3. Orthonormal Diagonalization

We now consider the euclidean space Rn as an inner product space with the euclidean inner product. Given any n × n matrix A with real entries, we wish to find out whether there exists an orthonormal basis of Rn consisting of eigenvectors of A.

Recall that in the Diagonalization process discussed in the last section, the columns of the matrix P are eigenvectors of A, and these vectors form a basis of Rn. It follows from Proposition 10B that this basis is orthonormal if and only if the matrix P is orthogonal.

Definition. An n×n matrix A with real entries is said to be orthogonally diagonalizable if there exists an orthogonal matrix P with real entries such that P−1AP = P tAP is a diagonal matrix with real entries.

First of all, we would like to determine which matrices are orthogonally diagonalizable. For those that are, we then need to discuss how we may find an orthogonal matrix P to carry out the diagonalization.

To study the first question, we have the following result which gives a restriction on those matrices that are orthogonally diagonalizable.

PROPOSITION 10D. Suppose that A is an orthogonally diagonalizable matrix with real entries. Then A is symmetric.

Proof. Suppose that A is orthogonally diagonalizable. Then there exists an orthogonal matrix P and a diagonal matrix D, both with real entries and such that P tAP = D. Since PP t = P tP = I and Dt = D, we have

A = PDP t = PDtP t,

so that

At = (PDtP t)t = (P t)t(Dt)tP t = PDP t = A,

whence A is symmetric. ∎

Our first question is in fact answered by the following result which we state without proof.


PROPOSITION 10E. Suppose that A is an n × n matrix with real entries. Then it is orthogonally diagonalizable if and only if it is symmetric.

The remainder of this section is devoted to finding a way to orthogonally diagonalize a symmetric matrix with real entries. We begin by stating without proof the following result. The proof requires results from the theory of complex vector spaces.

PROPOSITION 10F. Suppose that A is a symmetric matrix with real entries. Then all the eigenvalues of A are real.

Our idea here is to follow the Diagonalization process discussed in the last section, knowing that since A is diagonalizable, we shall find a basis of Rn consisting of eigenvectors of A. We may then wish to orthogonalize this basis by the Gram-Schmidt process. This last step is considerably simplified in view of the following result.

PROPOSITION 10G. Suppose that u1 and u2 are eigenvectors of a symmetric matrix A with real entries, corresponding to distinct eigenvalues λ1 and λ2 respectively. Then u1 · u2 = 0. In other words, eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues are orthogonal.

Proof. Note that if we write u1 and u2 as column matrices, then since A is symmetric, we have

Au1 · u2 = u2tAu1 = u2tAtu1 = (Au2)tu1 = u1 · Au2.

It follows that

λ1u1 · u2 = Au1 · u2 = u1 ·Au2 = u1 · λ2u2,

so that (λ1 − λ2)(u1 · u2) = 0. Since λ1 ≠ λ2, we must have u1 · u2 = 0. ∎

We can now follow the procedure below.

ORTHOGONAL DIAGONALIZATION PROCESS. Suppose that A is a symmetric n×n matrix with real entries.
(1) Determine the n real roots λ1, . . . , λn of the characteristic polynomial det(A−λI), and find n linearly independent eigenvectors u1, . . . ,un of A corresponding to these eigenvalues as in the Diagonalization process.
(2) Apply the Gram-Schmidt orthogonalization process to the eigenvectors u1, . . . ,un to obtain orthogonal eigenvectors v1, . . . ,vn of A, noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors v1, . . . ,vn to obtain orthonormal eigenvectors w1, . . . ,wn of A. These form an orthonormal basis of Rn. Furthermore, write

P = ( w1 . . . wn ) and D = diag(λ1, . . . , λn),

where λ1, . . . , λn ∈ R are the eigenvalues of A and where w1, . . . ,wn ∈ Rn are respectively their orthogonalized and normalized eigenvectors. Then P tAP = D.

Remark. Note that if we apply the Gram-Schmidt orthogonalization process to eigenvectors corresponding to the same eigenvalue, then the new vectors that result from this process are also eigenvectors corresponding to this eigenvalue. Why?
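
In practice, numpy's eigh routine for symmetric matrices performs all three steps at once, returning an orthonormal set of eigenvectors directly. A sketch of our own, using the matrix of Example 10.3.1 below:

    import numpy as np

    A = np.array([[2.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 2.0]])        # symmetric, as in Example 10.3.1

    lams, P = np.linalg.eigh(A)            # eigenvalues in ascending order
    print(lams)                            # [1. 1. 7.]
    print(np.allclose(P.T @ P, np.eye(3))) # the columns of P are orthonormal
    print(np.round(P.T @ A @ P, 10))       # P^t A P = diag(1, 1, 7)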


Example 10.3.1. Consider the matrix

A = [ 2  2  1 ]
    [ 2  5  2 ]
    [ 1  2  2 ].

To find the eigenvalues of A, we need to find the roots of

det [ 2 − λ   2       1     ]
    [ 2       5 − λ   2     ]   = 0;
    [ 1       2       2 − λ ]

in other words, (λ − 7)(λ − 1)² = 0. The eigenvalues are therefore λ1 = 7 and (double root) λ2 = λ3 = 1. An eigenvector corresponding to λ1 = 7 is a solution of the system

(A − 7I)u = [ −5   2   1 ]
            [  2  −2   2 ] u = 0,  with root u1 = (1, 2, 1)t.
            [  1   2  −5 ]

Eigenvectors corresponding to λ2 = λ3 = 1 are solutions of the system

(A − I)u = [ 1  2  1 ]
           [ 2  4  2 ] u = 0,  with roots u2 = (1, 0, −1)t and u3 = (2, −1, 0)t,
           [ 1  2  1 ]

which are linearly independent. Next, we apply the Gram-Schmidt orthogonalization process to u2 and u3, and obtain

v2 = (1, 0, −1)t and v3 = (1, −1, 1)t,

which are now orthogonal to each other. Note that we do not have to do anything to u1 at this stage, in view of Proposition 10G. We now conclude that

v1 = (1, 2, 1)t,  v2 = (1, 0, −1)t,  v3 = (1, −1, 1)t

form an orthogonal basis of R3. Normalizing each of these, we obtain respectively

w1 = (1/√6, 2/√6, 1/√6)t,  w2 = (1/√2, 0, −1/√2)t,  w3 = (1/√3, −1/√3, 1/√3)t.

We now take

P = ( w1 w2 w3 ) = [ 1/√6   1/√2   1/√3 ]
                   [ 2/√6   0     −1/√3 ]
                   [ 1/√6  −1/√2   1/√3 ].

Then

P−1 = P t = [ 1/√6   2/√6   1/√6 ]                  [ 7  0  0 ]
            [ 1/√2   0     −1/√2 ]   and   P tAP =  [ 0  1  0 ]
            [ 1/√3  −1/√3   1/√3 ]                  [ 0  0  1 ].


Example 10.3.2. Consider the matrix

A = [ −1    6  −12 ]
    [  0  −13   30 ]
    [  0   −9   20 ].

To find the eigenvalues of A, we need to find the roots of

det [ −1 − λ    6        −12    ]
    [  0       −13 − λ    30    ]   = 0;
    [  0        −9       20 − λ ]

in other words, (λ + 1)(λ − 2)(λ − 5) = 0. The eigenvalues are therefore λ1 = −1, λ2 = 2 and λ3 = 5. An eigenvector corresponding to λ1 = −1 is a solution of the system

(A + I)u = [ 0    6  −12 ]
           [ 0  −12   30 ] u = 0,  with root u1 = (1, 0, 0)t.
           [ 0   −9   21 ]

An eigenvector corresponding to λ2 = 2 is a solution of the system

(A − 2I)u = [ −3    6  −12 ]
            [  0  −15   30 ] u = 0,  with root u2 = (0, 2, 1)t.
            [  0   −9   18 ]

An eigenvector corresponding to λ3 = 5 is a solution of the system

(A − 5I)u = [ −6    6  −12 ]
            [  0  −18   30 ] u = 0,  with root u3 = (1, −5, −3)t.
            [  0   −9   15 ]

Note that while u1,u2,u3 correspond to distinct eigenvalues of A, they are not orthogonal. The matrix A is not symmetric, and so Proposition 10G does not apply in this case.

Example 10.3.3. Consider the matrix

A = [  5  −2  0 ]
    [ −2   6  2 ]
    [  0   2  7 ].

To find the eigenvalues of A, we need to find the roots of

det [ 5 − λ   −2      0     ]
    [ −2      6 − λ   2     ]   = 0;
    [  0      2       7 − λ ]

in other words, (λ − 3)(λ − 6)(λ − 9) = 0. The eigenvalues are therefore λ1 = 3, λ2 = 6 and λ3 = 9. An eigenvector corresponding to λ1 = 3 is a solution of the system

(A − 3I)u = [  2  −2  0 ]
            [ −2   3  2 ] u = 0,  with root u1 = (2, 2, −1)t.
            [  0   2  4 ]

An eigenvector corresponding to λ2 = 6 is a solution of the system

(A − 6I)u = [ −1  −2  0 ]
            [ −2   0  2 ] u = 0,  with root u2 = (2, −1, 2)t.
            [  0   2  1 ]


An eigenvector corresponding to λ3 = 9 is a solution of the system

(A − 9I)u = [ −4  −2   0 ]
            [ −2  −3   2 ] u = 0,  with root u3 = (−1, 2, 2)t.
            [  0   2  −2 ]

Note now that the eigenvalues are distinct, so it follows from Proposition 10G that u1,u2,u3 are orthogonal, so we do not have to apply Step (2) of the Orthogonal diagonalization process. Normalizing each of these vectors, we obtain respectively

w1 = (2/3, 2/3, −1/3)t,  w2 = (2/3, −1/3, 2/3)t,  w3 = (−1/3, 2/3, 2/3)t.

We now take

P = ( w1 w2 w3 ) = [  2/3   2/3  −1/3 ]
                   [  2/3  −1/3   2/3 ]
                   [ −1/3   2/3   2/3 ].

Then

P−1 = P t = [  2/3   2/3  −1/3 ]                  [ 3  0  0 ]
            [  2/3  −1/3   2/3 ]   and   P tAP =  [ 0  6  0 ]
            [ −1/3   2/3   2/3 ]                  [ 0  0  9 ].


Problems for Chapter 10

1. Prove Proposition 10B(b).

2. Let

A = [ a + b  b − a ]
    [ a − b  b + a ],

where a, b ∈ R. Determine when A is orthogonal.

3. Suppose that A is an orthogonal matrix with real entries. Prove that
a) A−1 is an orthogonal matrix; and
b) detA = ±1.

4. Suppose that A and B are orthogonal matrices with real entries. Prove that AB is orthogonal.

5. Verify that for every a ∈ R, the matrix

A = (1/(1 + 2a²)) [ 1     −2a       2a² ]
                  [ 2a     1 − 2a²  −2a ]
                  [ 2a²    2a        1  ]

is orthogonal.

6. Suppose that λ is an eigenvalue of an orthogonal matrix A with real entries. Prove that 1/λ is also an eigenvalue of A.

7. Suppose that

A = [ a  b ]
    [ c  d ]

is an orthogonal matrix with real entries. Explain why a² + b² = c² + d² = 1 and ac + bd = 0, and quote clearly any result that you use. Deduce that A has one of the two possible forms

A = [ cos θ  −sin θ ]    or    A = [  cos θ  −sin θ ]
    [ sin θ   cos θ ]              [ −sin θ  −cos θ ],

where θ ∈ [0, 2π).

8. Consider the matrix

A = [  1   −√6  √3 ]
    [ −√6   2   √2 ]
    [  √3  √2    3 ].

a) Find the characteristic polynomial of A and show that A has eigenvalues 4 (twice) and −2.
b) Find an eigenvector of A corresponding to the eigenvalue −2.
c) Find two orthogonal eigenvectors of A corresponding to the eigenvalue 4.
d) Find an orthonormal basis of R3 consisting of eigenvectors of A.
e) Using the orthonormal basis in part (d), find a matrix P such that P tAP is a diagonal matrix.


9. Apply the Orthogonal diagonalization process to each of the following matrices:

a) A = [ 5   0   6 ]      b) A = [ 0  2  0 ]      c) A = [  1  −4   2 ]
       [ 0  11   6 ]             [ 2  0  1 ]             [ −4   1  −2 ]
       [ 6   6  −2 ]             [ 0  1  0 ]             [  2  −2  −2 ]

d) A = [  2  0  36 ]      e) A = [ 1  1  0  0 ]   f) A = [ −7  24   0   0 ]
       [  0  3   0 ]             [ 1  1  0  0 ]          [ 24   7   0   0 ]
       [ 36  0  23 ]             [ 0  0  0  0 ]          [  0   0  −7  24 ]
                                 [ 0  0  0  0 ]          [  0   0  24   7 ]

10. Suppose that B is an m × n matrix with real entries. Prove that the matrix A = BtB has an orthonormal set of n eigenvectors.


Chapter 11

APPLICATIONS OF

REAL INNER PRODUCT SPACES

11.1. Least Squares Approximation

Given a continuous function f : [a, b] → R, we wish to approximate f by a polynomial g : [a, b] → R of degree at most k, such that the error

∫_a^b |f(x) − g(x)|² dx

is minimized. The purpose of this section is to study this problem using the theory of real inner product spaces. Our argument is underpinned by the following simple result in the theory.

PROPOSITION 11A. Suppose that V is a real inner product space, and that W is a finite-dimensional subspace of V . Given any u ∈ V , the inequality

‖u− projWu‖ ≤ ‖u−w‖

holds for every w ∈W .

In other words, the distance from u to any w ∈ W is minimized by the choice w = projWu, the orthogonal projection of u on the subspace W . Alternatively, projWu can be thought of as the vector in W closest to u.

Proof of Proposition 11A. Note that

u− projWu ∈W⊥ and projWu−w ∈W.


It follows from Pythagoras’s theorem that

‖u−w‖2 = ‖(u− projWu) + (projWu−w)‖2 = ‖u− projWu‖2 + ‖projWu−w‖2,

so that

‖u−w‖2 − ‖u− projWu‖2 = ‖projWu−w‖2 ≥ 0.

The result follows immediately. ∎

Let V denote the vector space C[a, b] of all continuous real valued functions on the closed interval [a, b], with inner product

〈f, g〉 = ∫_a^b f(x)g(x) dx.

Then

∫_a^b |f(x) − g(x)|² dx = 〈f − g, f − g〉 = ‖f − g‖².

It follows that the least squares approximation problem is reduced to one of finding a suitable polynomial g to minimize the norm ‖f − g‖.

Now let W = Pk[a, b] be the collection of all polynomials g : [a, b] → R with real coefficients and of degree at most k. Note that W is essentially Pk, although the variable is restricted to the closed interval [a, b]. It is easy to show that W is a subspace of V . In view of Proposition 11A, we conclude that

g = projW f

gives the best least squares approximation among polynomials in W = Pk[a, b]. This subspace is of dimension k + 1. Suppose that {v0, v1, . . . , vk} is an orthogonal basis of W = Pk[a, b]. Then by Proposition 9L, we have

g = (〈f, v0〉/‖v0‖²) v0 + (〈f, v1〉/‖v1‖²) v1 + . . . + (〈f, vk〉/‖vk‖²) vk.

Example 11.1.1. Consider the function f(x) = x² in the interval [0, 2]. Suppose that we wish to find a least squares approximation by a polynomial of degree at most 1. In this case, we can take V = C[0, 2], with inner product

〈f, g〉 = ∫_0^2 f(x)g(x) dx,

and W = P1[0, 2], with basis {1, x}. We now apply the Gram-Schmidt orthogonalization process to this basis to obtain an orthogonal basis {1, x − 1} of W , and take

g = (〈x², 1〉/‖1‖²) · 1 + (〈x², x − 1〉/‖x − 1‖²) (x − 1).

Now

〈x², 1〉 = ∫_0^2 x² dx = 8/3 and ‖1‖² = 〈1, 1〉 = ∫_0^2 dx = 2,


while

〈x², x − 1〉 = ∫_0^2 x²(x − 1) dx = 4/3 and ‖x − 1‖² = 〈x − 1, x − 1〉 = ∫_0^2 (x − 1)² dx = 2/3.

It follows that

g = 4/3 + 2(x − 1) = 2x − 2/3.
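
These inner products are routine to evaluate numerically. A small sketch of our own using SciPy quadrature (assuming SciPy is available; the helper name ip is ours) recovers the coefficients above:

    import numpy as np
    from scipy.integrate import quad   # assumes SciPy is available

    def ip(f, g, a, b):
        # <f, g> = integral over [a, b] of f(x) g(x)
        return quad(lambda x: f(x) * g(x), a, b)[0]

    f = lambda x: x**2
    e0 = lambda x: 1.0        # the orthogonal basis {1, x - 1} of P1[0, 2]
    e1 = lambda x: x - 1.0

    c0 = ip(f, e0, 0, 2) / ip(e0, e0, 0, 2)   # (8/3) / 2 = 4/3
    c1 = ip(f, e1, 0, 2) / ip(e1, e1, 0, 2)   # (4/3) / (2/3) = 2
    print(c0, c1)   # so g(x) = 4/3 + 2(x - 1) = 2x - 2/3, as above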

Example 11.1.2. Consider the function f(x) = e^x in the interval [0, 1]. Suppose that we wish to find a least squares approximation by a polynomial of degree at most 1. In this case, we can take V = C[0, 1], with inner product

〈f, g〉 = ∫_0^1 f(x)g(x) dx,

and W = P1[0, 1], with basis {1, x}. We now apply the Gram-Schmidt orthogonalization process to this basis to obtain an orthogonal basis {1, x − 1/2} of W , and take

g = (〈e^x, 1〉/‖1‖²) · 1 + (〈e^x, x − 1/2〉/‖x − 1/2‖²) (x − 1/2).

Now

〈e^x, 1〉 = ∫_0^1 e^x dx = e − 1 and 〈e^x, x〉 = ∫_0^1 x e^x dx = 1,

so that

〈e^x, x − 1/2〉 = 〈e^x, x〉 − (1/2)〈e^x, 1〉 = 3/2 − e/2.

Also

‖1‖² = 〈1, 1〉 = ∫_0^1 dx = 1 and ‖x − 1/2‖² = 〈x − 1/2, x − 1/2〉 = ∫_0^1 (x − 1/2)² dx = 1/12.

It follows that

g = (e − 1) + (18 − 6e)(x − 1/2) = (18 − 6e)x + (4e − 10).

Remark. From the proof of Proposition 11A, it is clear that ‖u − w‖ is minimized by the unique choice w = projWu. It follows that the least squares approximation problem posed here has a unique solution.

11.2. Quadratic Forms

A real quadratic form in n variables x1, . . . , xn is an expression of the form

Σ_{1 ≤ i ≤ j ≤ n} cij xi xj , (1)

where cij ∈ R for every i, j = 1, . . . , n satisfying i ≤ j.


Example 11.2.1. The expression 5x1² + 6x1x2 + 7x2² is a quadratic form in two variables x1 and x2. It can be written in the form

5x1² + 6x1x2 + 7x2² = ( x1  x2 ) [ 5  3 ] [ x1 ]
                                 [ 3  7 ] [ x2 ].

Example 11.2.2. The expression 4x1² + 5x2² + 3x3² + 2x1x2 + 4x1x3 + 6x2x3 is a quadratic form in three variables x1, x2 and x3. It can be written in the form

4x1² + 5x2² + 3x3² + 2x1x2 + 4x1x3 + 6x2x3 = ( x1  x2  x3 ) [ 4  1  2 ] [ x1 ]
                                                            [ 1  5  3 ] [ x2 ]
                                                            [ 2  3  3 ] [ x3 ].

Note that in both examples, the quadratic form can be described in terms of a real symmetric matrix. In fact, this is always possible. To see this, note that given any quadratic form (1), we can write, for every i, j = 1, . . . , n,

aij = { cij        if i = j,
      { (1/2)cij   if i < j,       (2)
      { (1/2)cji   if i > j.

Then

Σ_{1 ≤ i ≤ j ≤ n} cij xi xj = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj = ( x1 . . . xn ) [ a11  . . .  a1n ] [ x1 ]
                                                                                  [ ...         ... ] [ ... ]
                                                                                  [ an1  . . .  ann ] [ xn ].

The matrix

A = [ a11  . . .  a1n ]
    [ ...         ... ]
    [ an1  . . .  ann ]

is clearly symmetric, in view of (2).

We are interested in the case when x1, . . . , xn take real values. In this case, we can write

x = (x1, . . . , xn)t.

It follows that a quadratic form can be written as

xtAx,

where A is an n× n real symmetric matrix and x takes values in Rn.

Many problems in mathematics can be studied using quadratic forms. Here we shall restrict our attention to two fundamental problems which are in fact related. The first is the question of what conditions the matrix A must satisfy in order that the inequality

xtAx > 0

holds for every non-zero x ∈ Rn. The second is the question of whether it is possible to have a change of variables of the type x = Py, where P is an invertible matrix, such that the quadratic form xtAx can be represented in the alternative form ytDy, where D is a diagonal matrix with real entries.


Definition. A quadratic form xtAx is said to be positive definite if xtAx > 0 for every non-zero x ∈ Rn. In this case, we say that the symmetric matrix A is a positive definite matrix.

To answer our first question, we shall prove the following result.

PROPOSITION 11B. A quadratic form xtAx is positive definite if and only if all the eigenvalues of the symmetric matrix A are positive.

Our strategy here is to prove Proposition 11B by first studying our second question. Since the matrix A is real and symmetric, it follows from Proposition 10E that it is orthogonally diagonalizable. In other words, there exists an orthogonal matrix P and a diagonal matrix D such that P tAP = D, and so A = PDP t. It follows that

xtAx = xtPDP tx,

and so, writing

y = P tx,

we have

xtAx = ytDy.

Also, since P is an orthogonal matrix, we also have x = Py. This answers our second question.

Furthermore, in view of the Orthogonal diagonalization process, the diagonal entries in the matrix D can be taken to be the eigenvalues of A, so that

D = diag(λ1, . . . , λn),

where λ1, . . . , λn ∈ R are the eigenvalues of A. Writing y = (y1, . . . , yn)t,

we have

xtAx = ytDy = λ1y1² + . . . + λnyn². (3)

Note now that x = 0 if and only if y = 0, since P is an invertible matrix. Proposition 11B now follows immediately from (3).
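
Proposition 11B gives an immediate computational test for positive definiteness. A minimal numpy sketch of our own (the helper name is ours; the two matrices are those of Examples 11.2.3 and 11.2.5 below):

    import numpy as np

    def is_positive_definite(A, tol=1e-12):
        # by Proposition 11B: positive definite iff all eigenvalues are positive
        return bool(np.all(np.linalg.eigvalsh(A) > tol))

    A = np.array([[2.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 2.0]])   # eigenvalues 7, 1, 1 (Example 11.2.3)
    B = np.array([[1.0, 1.0],
                  [1.0, 1.0]])        # eigenvalues 2, 0 (Example 11.2.5)

    print(is_positive_definite(A))    # True
    print(is_positive_definite(B))    # False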

Example 11.2.3. Consider the quadratic form 2x1² + 5x2² + 2x3² + 4x1x2 + 2x1x3 + 4x2x3. This can be written in the form xtAx, where

A = [ 2  2  1 ]
    [ 2  5  2 ]     and     x = (x1, x2, x3)t.
    [ 1  2  2 ]

The matrix A has eigenvalues λ1 = 7 and (double root) λ2 = λ3 = 1; see Example 10.3.1. Furthermore, we have P tAP = D, where

P = [ 1/√6   1/√2   1/√3 ]                [ 7  0  0 ]
    [ 2/√6   0     −1/√3 ]   and   D =    [ 0  1  0 ]
    [ 1/√6  −1/√2   1/√3 ]                [ 0  0  1 ].

Writing y = P tx, the quadratic form becomes 7y1² + y2² + y3², which is clearly positive definite.


Example 11.2.4. Consider the quadratic form 5x1² + 6x2² + 7x3² − 4x1x2 + 4x2x3. This can be written in the form xtAx, where

A = [  5  −2  0 ]
    [ −2   6  2 ]     and     x = (x1, x2, x3)t.
    [  0   2  7 ]

The matrix A has eigenvalues λ1 = 3, λ2 = 6 and λ3 = 9; see Example 10.3.3. Furthermore, we have P tAP = D, where

P = [  2/3   2/3  −1/3 ]                [ 3  0  0 ]
    [  2/3  −1/3   2/3 ]   and   D =    [ 0  6  0 ]
    [ −1/3   2/3   2/3 ]                [ 0  0  9 ].

Writing y = P tx, the quadratic form becomes 3y1² + 6y2² + 9y3², which is clearly positive definite.

Example 11.2.5. Consider the quadratic form x1² + x2² + 2x1x2. Clearly this is equal to (x1 + x2)² and is therefore not positive definite. The quadratic form can be written in the form xtAx, where

A = [ 1  1 ]     and     x = (x1, x2)t.
    [ 1  1 ]

It follows from Proposition 11B that the eigenvalues of A are not all positive. Indeed, the matrix A has eigenvalues λ1 = 2 and λ2 = 0, with corresponding eigenvectors (1, 1)t and (1, −1)t.

Hence we may take

P = [ 1/√2   1/√2 ]                [ 2  0 ]
    [ 1/√2  −1/√2 ]   and   D =    [ 0  0 ].

Writing y = P tx, the quadratic form becomes 2y1², which is not positive definite.

11.3. Real Fourier Series

Let E denote the collection of all functions f : [−π, π] → R which are piecewise continuous on the interval [−π, π]. This means that any f ∈ E has at most a finite number of points of discontinuity, at each of which f need not be defined but must have one sided limits which are finite. We further adopt the convention that any two functions f, g ∈ E are considered equal, denoted by f = g, if f(x) = g(x) for every x ∈ [−π, π] with at most a finite number of exceptions.

It is easy to check that E forms a real vector space. More precisely, let λ ∈ E denote the function λ : [−π, π] → R, where λ(x) = 0 for every x ∈ [−π, π]. Then the following conditions hold:

• For every f, g ∈ E, we have f + g ∈ E.
• For every f, g, h ∈ E, we have f + (g + h) = (f + g) + h.
• For every f ∈ E, we have f + λ = λ + f = f .
• For every f ∈ E, we have f + (−f) = λ.
• For every f, g ∈ E, we have f + g = g + f .
• For every c ∈ R and f ∈ E, we have cf ∈ E.
• For every c ∈ R and f, g ∈ E, we have c(f + g) = cf + cg.
• For every a, b ∈ R and f ∈ E, we have (a + b)f = af + bf .
• For every a, b ∈ R and f ∈ E, we have (ab)f = a(bf).
• For every f ∈ E, we have 1f = f .


We now give this vector space E more structure by introducing an inner product. For every f, g ∈ E, write

〈f, g〉 = (1/π) ∫_{−π}^{π} f(x)g(x) dx.

The integral exists since the function f(x)g(x) is clearly piecewise continuous on [−π, π]. It is easy to check that the following conditions hold:

• For every f, g ∈ E, we have 〈f, g〉 = 〈g, f〉.
• For every f, g, h ∈ E, we have 〈f, g + h〉 = 〈f, g〉 + 〈f, h〉.
• For every f, g ∈ E and c ∈ R, we have c〈f, g〉 = 〈cf, g〉.
• For every f ∈ E, we have 〈f, f〉 ≥ 0, and 〈f, f〉 = 0 if and only if f = λ.

Hence E is a real inner product space.
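
As a numerical sanity check of this inner product, one can approximate the defining integral by quadrature. A minimal sketch in Python, assuming NumPy and SciPy are available:

import numpy as np
from scipy.integrate import quad

def inner(f, g):
    # <f, g> = (1/pi) * integral of f(x) g(x) over [-pi, pi]
    value, _ = quad(lambda x: f(x) * g(x), -np.pi, np.pi)
    return value / np.pi

print(inner(np.sin, np.sin))                    # ~1.0: sin x is a unit vector
print(inner(np.sin, np.cos))                    # ~0.0: sin x, cos x orthogonal
print(inner(lambda x: 1 / np.sqrt(2), np.cos))  # ~0.0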

The difficulty here is that the inner product space E is not finite-dimensional. It is not straightforward to show that the set
$$\frac{1}{\sqrt{2}},\ \sin x,\ \cos x,\ \sin 2x,\ \cos 2x,\ \sin 3x,\ \cos 3x,\ \ldots \tag{4}$$
in E forms an orthonormal “basis” for E. The difficulty is to show that the set spans E.

Remark. It is easy to check that the elements in (4) form an orthonormal “system”. For every k, m ∈ N, we have
$$\left\langle \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{2}\,dx = 1;$$
$$\left\langle \frac{1}{\sqrt{2}}, \sin kx \right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{\sqrt{2}}\sin kx\,dx = 0;$$
$$\left\langle \frac{1}{\sqrt{2}}, \cos kx \right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{\sqrt{2}}\cos kx\,dx = 0;$$
as well as
$$\langle \sin kx, \sin mx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \sin kx \sin mx\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{2}\bigl(\cos(k-m)x - \cos(k+m)x\bigr)\,dx = \begin{cases} 1 & \text{if } k = m, \\ 0 & \text{if } k \neq m; \end{cases}$$
$$\langle \cos kx, \cos mx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \cos kx \cos mx\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{2}\bigl(\cos(k-m)x + \cos(k+m)x\bigr)\,dx = \begin{cases} 1 & \text{if } k = m, \\ 0 & \text{if } k \neq m; \end{cases}$$
and
$$\langle \sin kx, \cos mx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \sin kx \cos mx\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{1}{2}\bigl(\sin(k-m)x + \sin(k+m)x\bigr)\,dx = 0.$$

Let us assume that we have established that the set (4) forms an orthonormal basis for E. Then a natural extension of Proposition 9H gives rise to the following: Every function f ∈ E can be written uniquely in the form
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx), \tag{5}$$
known usually as the (trigonometric) Fourier series of the function f, with Fourier coefficients
$$\frac{a_0}{\sqrt{2}} = \left\langle f, \frac{1}{\sqrt{2}} \right\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{f(x)}{\sqrt{2}}\,dx,$$


and, for every n ∈ N,
$$a_n = \langle f, \cos nx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx \quad\text{and}\quad b_n = \langle f, \sin nx\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx.$$

Note that the constant term in the Fourier series (5) is given by
$$\left\langle f, \frac{1}{\sqrt{2}} \right\rangle \frac{1}{\sqrt{2}} = \frac{a_0}{2}.$$
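
These coefficient formulas translate directly into a short numerical routine. A sketch in Python, assuming NumPy and SciPy are available; f may be any piecewise continuous function on [−π, π]. For f(x) = x it returns a_n ≈ 0 and b_n ≈ 2(−1)^{n+1}/n, in agreement with Example 11.3.1 below.

import numpy as np
from scipy.integrate import quad

def fourier_coefficients(f, N):
    # Returns a_0 and the lists (a_1, ..., a_N), (b_1, ..., b_N)
    # for the Fourier series (5).
    a0 = quad(f, -np.pi, np.pi)[0] / np.pi
    a = [quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi
         for n in range(1, N + 1)]
    b = [quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)[0] / np.pi
         for n in range(1, N + 1)]
    return a0, a, b

a0, a, b = fourier_coefficients(lambda x: x, 4)
print(a0, a)   # all ~0, since f(x) = x is odd
print(b)       # approximately [2, -1, 2/3, -1/2]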

Example 11.3.1. Consider the function f : [−π, π] → R, given by f(x) = x for every x ∈ [−π, π]. For every n ∈ N ∪ {0}, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\cos nx\,dx = 0,$$
since the integrand is an odd function. On the other hand, for every n ∈ N, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin nx\,dx = \frac{2}{\pi}\int_{0}^{\pi} x\sin nx\,dx,$$

since the integrand is an even function. On integrating by parts, we have
$$b_n = \frac{2}{\pi}\left(-\left[\frac{x\cos nx}{n}\right]_0^{\pi} + \int_0^{\pi} \frac{\cos nx}{n}\,dx\right) = \frac{2}{\pi}\left(-\left[\frac{x\cos nx}{n}\right]_0^{\pi} + \left[\frac{\sin nx}{n^2}\right]_0^{\pi}\right) = \frac{2(-1)^{n+1}}{n}.$$

We therefore have the (trigonometric) Fourier series
$$\sum_{n=1}^{\infty} \frac{2(-1)^{n+1}}{n}\sin nx.$$

Note that the function f is odd, and this plays a crucial role in the vanishing of the Fourier coefficients a_n corresponding to the even part of the Fourier series.
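
One can watch the partial sums converge to f(x) = x, although only slowly, since the coefficients decay like 1/n. A brief sketch, assuming NumPy:

import numpy as np

def partial_sum(x, N):
    # N-th partial sum of the Fourier series of f(x) = x
    n = np.arange(1, N + 1)
    return np.sum(2.0 * (-1.0) ** (n + 1) / n * np.sin(n * x))

for N in (5, 50, 500):
    print(N, partial_sum(1.0, N))   # tends (slowly) to f(1) = 1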

Example 11.3.2. Consider the function f : [−π, π] → R, given by f(x) = |x| for every x ∈ [−π, π]. For every n ∈ N ∪ {0}, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} |x|\cos nx\,dx = \frac{2}{\pi}\int_0^{\pi} x\cos nx\,dx,$$
since the integrand is an even function. Clearly
$$a_0 = \frac{2}{\pi}\int_0^{\pi} x\,dx = \pi.$$

Furthermore, for every n ∈ N, on integrating by parts, we have
$$a_n = \frac{2}{\pi}\left(\left[\frac{x\sin nx}{n}\right]_0^{\pi} - \int_0^{\pi} \frac{\sin nx}{n}\,dx\right) = \frac{2}{\pi}\left(\left[\frac{x\sin nx}{n}\right]_0^{\pi} + \left[\frac{\cos nx}{n^2}\right]_0^{\pi}\right) = \begin{cases} 0 & \text{if } n \text{ is even}, \\ -\dfrac{4}{\pi n^2} & \text{if } n \text{ is odd}. \end{cases}$$

On the other hand, for every n ∈ N, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} |x|\sin nx\,dx = 0,$$


since the integrand is an odd function. We therefore have the (trigonometric) Fourier series
$$\frac{\pi}{2} - \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{4}{\pi n^2}\cos nx = \frac{\pi}{2} - \sum_{k=1}^{\infty} \frac{4}{\pi(2k-1)^2}\cos(2k-1)x.$$
Note that the function f is even, and this plays a crucial role in the vanishing of the Fourier coefficients b_n corresponding to the odd part of the Fourier series.

Example 11.3.3. Consider the function f : [−π, π] → R, given for every x ∈ [−π, π] by
$$f(x) = \operatorname{sgn}(x) = \begin{cases} +1 & \text{if } 0 < x \leq \pi, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } -\pi \leq x < 0. \end{cases}$$

For every n ∈ N ∪ {0}, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \operatorname{sgn}(x)\cos nx\,dx = 0,$$
since the integrand is an odd function. On the other hand, for every n ∈ N, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \operatorname{sgn}(x)\sin nx\,dx = \frac{2}{\pi}\int_0^{\pi} \sin nx\,dx,$$

since the integrand is an even function. It is easy to see that
$$b_n = -\frac{2}{\pi}\left[\frac{\cos nx}{n}\right]_0^{\pi} = \begin{cases} 0 & \text{if } n \text{ is even}, \\ \dfrac{4}{\pi n} & \text{if } n \text{ is odd}. \end{cases}$$

We therefore have the (trigonometric) Fourier series
$$\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{4}{\pi n}\sin nx = \sum_{k=1}^{\infty} \frac{4}{\pi(2k-1)}\sin(2k-1)x.$$

Example 11.3.4. Consider the function f : [−π, π] → R, given by f(x) = x² for every x ∈ [−π, π]. For every n ∈ N ∪ {0}, we have
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\cos nx\,dx = \frac{2}{\pi}\int_0^{\pi} x^2\cos nx\,dx,$$
since the integrand is an even function. Clearly
$$a_0 = \frac{2}{\pi}\int_0^{\pi} x^2\,dx = \frac{2\pi^2}{3}.$$

Furthermore, for every n ∈ N, on integrating by parts, we have
$$a_n = \frac{2}{\pi}\left(\left[\frac{x^2\sin nx}{n}\right]_0^{\pi} - \int_0^{\pi} \frac{2x\sin nx}{n}\,dx\right) = \frac{2}{\pi}\left(\left[\frac{x^2\sin nx}{n}\right]_0^{\pi} + \left[\frac{2x\cos nx}{n^2}\right]_0^{\pi} - \int_0^{\pi} \frac{2\cos nx}{n^2}\,dx\right)$$
$$= \frac{2}{\pi}\left(\left[\frac{x^2\sin nx}{n}\right]_0^{\pi} + \left[\frac{2x\cos nx}{n^2}\right]_0^{\pi} - \left[\frac{2\sin nx}{n^3}\right]_0^{\pi}\right) = \frac{4(-1)^n}{n^2}.$$


On the other hand, for every n ∈ N, we have
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\sin nx\,dx = 0,$$
since the integrand is an odd function. We therefore have the (trigonometric) Fourier series
$$\frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{4(-1)^n}{n^2}\cos nx.$$
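
Here the coefficients decay like 1/n², so the convergence is much faster than in Example 11.3.1; the tail of the series truncated at n = N has size about 4/N. A sketch, assuming NumPy:

import numpy as np

x = np.linspace(-np.pi, np.pi, 1001)
n = np.arange(1, 201)
# Partial sum of the series up to n = 200, evaluated on the grid x
coeffs = 4.0 * (-1.0) ** n / n ** 2
series = np.pi ** 2 / 3 + coeffs @ np.cos(np.outer(n, x))
print(np.max(np.abs(series - x ** 2)))   # small: roughly 4/200 = 0.02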


Problems for Chapter 11

1. Consider the function f : [−1, 1] → R : x ↦ x³. We wish to find a polynomial g(x) = ax + b which minimizes the error
$$\int_{-1}^{1} |f(x) - g(x)|^2\,dx.$$
Follow the steps below to find this polynomial g:
a) Consider the real vector space C[−1, 1]. Write down a suitable real inner product on C[−1, 1] for this problem, explaining carefully the steps that you take.
b) Consider now the subspace P1[−1, 1] of all polynomials of degree at most 1. Describe the polynomial g in terms of f and orthogonal projection with respect to the inner product in part (a). Give a brief explanation for your choice.
c) Write down a basis of P1[−1, 1].
d) Apply the Gram-Schmidt process to your basis in part (c) to obtain an orthogonal basis of P1[−1, 1].
e) Describe your polynomial in part (b) as a linear combination of the elements of your basis in part (d), and find the precise values of the coefficients.

2. For each of the following functions, find the best least squares approximation by linear polynomials of the form ax + b, where a, b ∈ R:
a) f : [0, π/2] → R : x ↦ sin x
b) f : [0, 1] → R : x ↦ x³
c) f : [0, 2] → R : x ↦ eˣ

3. Consider the quadratic form $2x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 + 2x_1x_3$ in three variables x_1, x_2, x_3.
a) Write the quadratic form in the form $x^tAx$, where
$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$
and where A is a symmetric matrix with real entries.
b) Apply the Orthogonal diagonalization process to the matrix A.
c) Find a transformation of the type x = Py, where P is an invertible matrix, so that the quadratic form can be written as $y^tDy$, where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$
and where D is a diagonal matrix with real entries. You should give the matrices P and D explicitly.
d) Is the quadratic form positive definite? Justify your assertion both in terms of the eigenvalues of A and in terms of your solution to part (c).

4. For each of the following quadratic forms in three variables, write it in the form $x^tAx$, find a substitution x = Py so that it can be written as a diagonal form in the variables y_1, y_2, y_3, and determine whether the quadratic form is positive definite:
a) $x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_2 + 4x_1x_3 + 4x_2x_3$
b) $3x_1^2 + 2x_2^2 + 3x_3^2 + 2x_1x_3$
c) $3x_1^2 + 5x_2^2 + 4x_3^2 + 4x_1x_3 - 4x_2x_3$
d) $5x_1^2 + 2x_2^2 + 5x_3^2 + 4x_1x_2 - 8x_1x_3 - 4x_2x_3$
e) $x_1^2 - 5x_2^2 - x_3^2 + 4x_1x_2 + 6x_2x_3$


5. Determine which of the following matrices are positive definite:
a) $\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$
b) $\begin{pmatrix} 3 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 1 \end{pmatrix}$
c) $\begin{pmatrix} 6 & 1 & 7 \\ 1 & 1 & 2 \\ 7 & 2 & 9 \end{pmatrix}$
d) $\begin{pmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{pmatrix}$
e) $\begin{pmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{pmatrix}$
f) $\begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$

6. Find the trigonometric Fourier series for each of the following functions f : [−π, π] → R:
a) f(x) = x|x| for every x ∈ [−π, π]
b) f(x) = |sin x| for every x ∈ [−π, π]
c) f(x) = |cos x| for every x ∈ [−π, π]
d) f(x) = 0 for every x ∈ [−π, 0] and f(x) = x for every x ∈ (0, π]
e) f(x) = sin x for every x ∈ [−π, 0] and f(x) = cos x for every x ∈ (0, π]
f) f(x) = cos x for every x ∈ [−π, 0] and f(x) = sin x for every x ∈ (0, π]
g) f(x) = cos(x/2) for every x ∈ [−π, π]
h) f(x) = sin(x/2) for every x ∈ [−π, π]


LINEAR ALGEBRA

W W L CHEN

c© W W L Chen, 1997, 2008.

This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain,

and may be downloaded and/or photocopied, with or without permission from the author.

However, this document may not be kept on any information storage and retrieval system without permission

from the author, unless such system is not accessible to any individuals other than its owners.

Chapter 12

COMPLEX VECTOR SPACES

12.1. Complex Inner Products

Our task in this section is to define a suitable complex inner product. We begin by giving a reminder of the basics of complex vector spaces or vector spaces over C.

Definition. A complex vector space V is a set of objects, known as vectors, together with vector addition + and multiplication of vectors by elements of C, and satisfying the following properties:

(VA1) For every u, v ∈ V, we have u + v ∈ V.
(VA2) For every u, v, w ∈ V, we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 ∈ V such that for every u ∈ V, we have u + 0 = 0 + u = u.
(VA4) For every u ∈ V, there exists −u ∈ V such that u + (−u) = 0.
(VA5) For every u, v ∈ V, we have u + v = v + u.
(SM1) For every c ∈ C and u ∈ V, we have cu ∈ V.
(SM2) For every c ∈ C and u, v ∈ V, we have c(u + v) = cu + cv.
(SM3) For every a, b ∈ C and u ∈ V, we have (a + b)u = au + bu.
(SM4) For every a, b ∈ C and u ∈ V, we have (ab)u = a(bu).
(SM5) For every u ∈ V, we have 1u = u.

Remark. Subspaces of complex vector spaces can be defined in a similar way as for real vector spaces.

An example of a complex vector space is the euclidean space Cn consisting of all vectors of the form u = (u_1, ..., u_n), where u_1, ..., u_n ∈ C. We shall first generalize the concepts of dot product, norm and distance, first developed for Rn in Chapter 9.

Definition. Suppose that u = (u_1, ..., u_n) and v = (v_1, ..., v_n) are vectors in Cn. The complex euclidean inner product of u and v is defined by
$$u \cdot v = u_1\overline{v_1} + \ldots + u_n\overline{v_n},$$


the complex euclidean norm of u is defined by
$$\|u\| = (u \cdot u)^{1/2} = (|u_1|^2 + \ldots + |u_n|^2)^{1/2},$$
and the complex euclidean distance between u and v is defined by
$$d(u, v) = \|u - v\| = (|u_1 - v_1|^2 + \ldots + |u_n - v_n|^2)^{1/2}.$$
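
These definitions can be written out directly in Python with NumPy; a minimal sketch (note that, in the convention used here, the conjugation falls on the second argument):

import numpy as np

u = np.array([1 + 1j, 2 - 1j])
v = np.array([3j, 1 + 2j])

# u . v = u_1 conj(v_1) + ... + u_n conj(v_n)
dot = np.sum(u * np.conj(v))
norm = np.sqrt(np.sum(np.abs(u) ** 2))   # agrees with np.linalg.norm(u); here sqrt(7)
dist = np.linalg.norm(u - v)             # complex euclidean distance

print(dot, norm, dist)
# Property (a) of Proposition 12A below: u . v = conj(v . u)
print(np.isclose(dot, np.conj(np.sum(v * np.conj(u)))))   # True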

Corresponding to Proposition 9A, we have the following result.

PROPOSITION 12A. Suppose that u, v, w ∈ Cn and c ∈ C. Then
(a) $u \cdot v = \overline{v \cdot u}$;
(b) u · (v + w) = (u · v) + (u · w);
(c) c(u · v) = (cu) · v; and
(d) u · u ≥ 0, and u · u = 0 if and only if u = 0.

The following definition is motivated by Proposition 12A.

Definition. Suppose that V is a complex vector space. By a complex inner product on V, we mean a function ⟨ , ⟩ : V × V → C which satisfies the following conditions:
(IP1) For every u, v ∈ V, we have $\langle u, v\rangle = \overline{\langle v, u\rangle}$.
(IP2) For every u, v, w ∈ V, we have ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
(IP3) For every u, v ∈ V and c ∈ C, we have c⟨u, v⟩ = ⟨cu, v⟩.
(IP4) For every u ∈ V, we have ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0.

Definition. A complex vector space with an inner product is called a complex inner product space or a unitary space.

Definition. Suppose that u and v are vectors in a complex inner product space V. Then the norm of u is defined by
$$\|u\| = \langle u, u\rangle^{1/2},$$
and the distance between u and v is defined by
$$d(u, v) = \|u - v\|.$$

Using this inner product, we can discuss orthogonality, orthogonal and orthonormal bases, the Gram-Schmidt orthogonalization process, as well as orthogonal projections, in a similar way as for real inner product spaces. In particular, the results in Sections 9.4 and 9.5 can be generalized to the case of complex inner product spaces.

12.2. Unitary Matrices

For matrices with real entries, orthogonal matrices and symmetric matrices play an important role in the orthogonal diagonalization problem. For matrices with complex entries, the analogous roles are played by unitary matrices and hermitian matrices respectively.

Definition. Suppose that A is a matrix with complex entries. Suppose further that the matrix $\overline{A}$ is obtained from the matrix A by replacing each entry of A by its complex conjugate. Then the matrix
$$A^* = \overline{A}^t$$
is called the conjugate transpose of the matrix A.


PROPOSITION 12B. Suppose that A and B are matrices with complex entries, and that c ∈ C. Then
(a) (A*)* = A;
(b) (A + B)* = A* + B*;
(c) $(cA)^* = \overline{c}A^*$; and
(d) (AB)* = B*A*.

Definition. A square matrix A with complex entries and satisfying the condition $A^{-1} = A^*$ is said to be a unitary matrix.
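
Both notions are easy to express in NumPy; a sketch (the .conj().T idiom computes the conjugate transpose A*):

import numpy as np

def conjugate_transpose(A):
    return A.conj().T                 # A* = (A-bar)^t

def is_unitary(A, tol=1e-12):
    # A is unitary when A^{-1} = A*, i.e. when A A* = I
    return np.allclose(A @ conjugate_transpose(A), np.eye(A.shape[0]),
                       atol=tol)

A = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # a standard unitary example
print(is_unitary(A))                            # True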

Corresponding to Proposition 10B, we have the following result.

PROPOSITION 12C. Suppose that A is an n × n matrix with complex entries. Then
(a) A is unitary if and only if the row vectors of A form an orthonormal basis of Cn under the complex euclidean inner product; and
(b) A is unitary if and only if the column vectors of A form an orthonormal basis of Cn under the complex euclidean inner product.

12.3. Unitary Diagonalization

Corresponding to the orthogonal diagonalization problem in Section 10.3, we now discuss the following unitary diagonalization problem.

Definition. A square matrix A with complex entries is said to be unitarily diagonalizable if there exists a unitary matrix P with complex entries such that $P^{-1}AP = P^*AP$ is a diagonal matrix with complex entries.

First of all, we would like to determine which matrices are unitarily diagonalizable. For those that are, we then need to discuss how we may find a unitary matrix P to carry out the diagonalization. As before, we study the question of eigenvalues and eigenvectors of a given matrix; these are defined as for the real case without any change.

In Section 10.3, we have indicated that a square matrix with real entries is orthogonally diagonalizable if and only if it is symmetric. The most natural extension to the complex case is the following.

Definition. A square matrix A with complex entries is said to be hermitian if A = A∗.

Unfortunately, it is not true that a square matrix with complex entries is unitarily diagonalizable if and only if it is hermitian. While it is true that every hermitian matrix is unitarily diagonalizable, there are unitarily diagonalizable matrices that are not hermitian. The explanation is provided by the following.

Definition. A square matrix A with complex entries is said to be normal if AA∗ = A∗A.

Remark. Note that every hermitian matrix is normal and every unitary matrix is normal.
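
Normality is likewise a one-line numerical test. A sketch, assuming NumPy, illustrating the remark:

import numpy as np

def is_normal(A, tol=1e-12):
    # A is normal when A A* = A* A
    A_star = A.conj().T
    return np.allclose(A @ A_star, A_star @ A, atol=tol)

H = np.array([[2, 1 - 1j], [1 + 1j, 3]])        # hermitian, hence normal
U = np.array([[0, -1], [1, 0]], dtype=complex)  # unitary but not hermitian
print(is_normal(H), is_normal(U))               # True True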

Corresponding to Propositions 10E and 10G, we have the following results.

PROPOSITION 12D. Suppose that A is an n × n matrix with complex entries. Then it is unitarily diagonalizable if and only if it is normal.

PROPOSITION 12E. Suppose that u_1 and u_2 are eigenvectors of a normal matrix A with complex entries, corresponding to distinct eigenvalues λ_1 and λ_2 respectively. Then u_1 · u_2 = 0. In other words, eigenvectors of a normal matrix corresponding to distinct eigenvalues are orthogonal.


We can now follow the procedure below.

UNITARY DIAGONALIZATION PROCESS. Suppose that A is a normal n × n matrix with complex entries.
(1) Determine the n complex roots λ_1, ..., λ_n of the characteristic polynomial det(A − λI), and find n linearly independent eigenvectors u_1, ..., u_n of A corresponding to these eigenvalues as in the Diagonalization process.
(2) Apply the Gram-Schmidt orthogonalization process to the eigenvectors u_1, ..., u_n to obtain orthogonal eigenvectors v_1, ..., v_n of A, noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors v_1, ..., v_n to obtain orthonormal eigenvectors w_1, ..., w_n of A. These form an orthonormal basis of Cn. Furthermore, write
$$P = (\,w_1\ \ldots\ w_n\,) \quad\text{and}\quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},$$
where λ_1, ..., λ_n ∈ C are the eigenvalues of A and where w_1, ..., w_n ∈ Cn are respectively their orthogonalized and normalized eigenvectors. Then P*AP = D.
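
For a hermitian matrix, NumPy's eigh routine carries out steps (1)-(3) in one call: it returns real eigenvalues together with an orthonormal set of eigenvectors, so the matrix P of eigenvectors is unitary. A sketch, using the matrix of Problem 12(a) below; for a merely normal matrix one would instead use np.linalg.eig and, within repeated eigenvalues, orthogonalize as in step (2).

import numpy as np

A = np.array([[4, 1 - 1j], [1 + 1j, 5]])        # hermitian: A = A*

eigenvalues, P = np.linalg.eigh(A)              # columns of P: orthonormal eigenvectors
D = np.diag(eigenvalues)

print(eigenvalues)                              # approximately [3, 6]
print(np.allclose(P.conj().T @ A @ P, D))       # True: P*AP = D
print(np.allclose(P.conj().T @ P, np.eye(2)))   # True: P is unitary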

We conclude this chapter by discussing the following important result which implies Proposition 10F, that all the eigenvalues of a symmetric real matrix are real.

PROPOSITION 12F. Suppose that A is a hermitian matrix. Then all the eigenvalues of A are real.

Sketch of Proof. Suppose that A is a hermitian matrix. Suppose further that λ is an eigenvalue of A, with corresponding eigenvector v. Then
$$Av = \lambda v.$$
Multiplying on the left by the conjugate transpose v* of v, we obtain
$$v^*Av = v^*\lambda v = \lambda v^*v.$$
To show that λ is real, it suffices to show that the 1 × 1 matrices v*Av and v*v both have real entries. Now
$$(v^*Av)^* = v^*A^*(v^*)^* = v^*Av \quad\text{and}\quad (v^*v)^* = v^*(v^*)^* = v^*v.$$
It follows that both v*Av and v*v are hermitian. It is easy to prove that hermitian matrices must have real entries on the main diagonal. Since v*Av and v*v are 1 × 1, it follows that they are real. Finally, since v ≠ 0, we have v*v ≠ 0, so that λ = (v*Av)/(v*v) is real. ∎


Problems for Chapter 12

1. Consider the set V of all matrices of the form
$$\begin{pmatrix} z & 0 \\ 0 & \overline{z} \end{pmatrix},$$
where z ∈ C, with matrix addition and scalar multiplication. Determine whether V forms a complex vector space.

2. Is Rn a subspace of Cn? Justify your assertion.

3. Prove Proposition 12A.

4. Suppose that u, v, w are elements of a complex inner product space, and that c ∈ C.
a) Show that ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩.
b) Show that $\langle u, cv\rangle = \overline{c}\langle u, v\rangle$.

5. Let V be the vector space of all continuous functions f : [0, 1] → C. Show that
$$\langle f, g\rangle = \int_0^1 f(x)\overline{g(x)}\,dx$$
defines a complex inner product on V.

6. Suppose that u, v are elements of a complex inner product space, and that c ∈ C.
a) Show that $\langle u - cv, u - cv\rangle = \langle u, u\rangle - \overline{c}\langle u, v\rangle - c\overline{\langle u, v\rangle} + c\overline{c}\langle v, v\rangle$.
b) Deduce that $\langle u, u\rangle - \overline{c}\langle u, v\rangle - c\overline{\langle u, v\rangle} + c\overline{c}\langle v, v\rangle \geq 0$.
c) Prove the Cauchy-Schwarz inequality, that |⟨u, v⟩|² ≤ ⟨u, u⟩⟨v, v⟩.

7. Generalize the results in Sections 9.4 and 9.5 to the case of complex inner product spaces. Try to prove as many results as possible.

8. Prove Proposition 12B.

9. Prove Proposition 12C.

10. Prove that the diagonal entries of every hermitian matrix are all real.

11. Suppose that A is a square matrix with complex entries.
a) Prove that $\det(\overline{A}) = \overline{\det A}$.
b) Deduce that $\det(A^*) = \overline{\det A}$.
c) Prove that if A is hermitian, then det A is real.
d) Prove that if A is unitary, then |det A| = 1.

12. Apply the Unitary diagonalization process to each of the following matrices:
a) $A = \begin{pmatrix} 4 & 1-i \\ 1+i & 5 \end{pmatrix}$
b) $A = \begin{pmatrix} 3 & -i \\ i & 3 \end{pmatrix}$
c) $A = \begin{pmatrix} 5 & 0 & 0 \\ 0 & -1 & -1+i \\ 0 & -1-i & 0 \end{pmatrix}$


13. Suppose that λ_1 and λ_2 are distinct eigenvalues of a hermitian matrix A, with eigenvectors u_1 and u_2 respectively.
a) Show that $u_1^*Au_2 = \lambda_1 u_1^*u_2$ and $u_1^*Au_2 = \lambda_2 u_1^*u_2$.
b) Complete the proof of Proposition 12E.

14. Suppose that A is a square matrix with complex entries, and that A* = −A.
a) Show that iA is a hermitian matrix.
b) Show that A is unitarily diagonalizable but has purely imaginary eigenvalues.
