SESA2021 Engineering Analysis:
Vector Spaces and Linear Algebra
Lecture Notes 2009/2010.
Lecturer: Dr A. A. Shah
School of Engineering Sciences, University of Southampton
Room 1027, Building 25
e-mail: A.Shah@soton.ac.uk
Copyright © 2010 University of Southampton
Contents
1 Introduction and example applications
2 Basic definitions and examples
3 Subspaces of vector spaces
4 Linear Transformations
5 Span
6 Linear independence
7 Basis and dimension
8 Changing the basis
9 Fundamental subspaces
10 Square matrices and systems of linear equations
11 Inner product spaces and orthogonality
12 Orthogonal and orthonormal bases
13 Orthogonal projections
14 The Gram-Schmidt process
15 Least squares approximations
Figure 1: A mechanical system with 2 masses and 3 springs (vibration example 1.1).
1 Introduction and example applications
In this course we will be concerned primarily with solving systems of linear
equations (including eigenvalue problems), which are difficult to avoid in any
aspect of engineering. These systems, which can be very large, can be written
as equations involving matrices and vectors. Let’s look at some examples in
which vectors, matrices and eigenvalues arise.
Example 1.1 Consider the system with 2 masses and 3 springs shown in Fig-
ure 1. We can use Newton’s second law along with Hooke’s law to write down a
system of equations for the displacements, x1 and x2, of the two masses
$$m\ddot{x}_1 + k_1 x_1 + k_2(x_1 - x_2) = 0$$
$$m\ddot{x}_2 + k_1 x_2 + k_2(x_2 - x_1) = 0 \tag{1}$$
where k1 and k2 are the spring constants. These equations can be written as
$$\ddot{x}_1 = -\left(\frac{k_1 + k_2}{m}\right) x_1 + \frac{k_2}{m}\, x_2$$
$$\ddot{x}_2 = \frac{k_2}{m}\, x_1 - \left(\frac{k_1 + k_2}{m}\right) x_2 \tag{2}$$
We could now write the system in matrix form by first introducing a “vector”
form of the solution: ~x = (x1, x2). Then
$$\underbrace{\begin{pmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{pmatrix}}_{\ddot{\vec{x}}} = \underbrace{\begin{pmatrix} -\beta & \alpha \\ \alpha & -\beta \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}}_{\vec{x}} \tag{3}$$
where β = (k1 + k2)/m and α = k2/m. The obvious thing to do is look for
oscillatory solutions that are of the form
$$\vec{x} = \vec{v}\, e^{i\omega t} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} e^{i\omega t} \tag{4}$$
where ω is the vibration frequency. The new vector ~v contains just constants, v1 and v2, which we would want to find. The variable part is in the e^{iωt}. Substituting (4) into (3) and cancelling the e^{iωt} terms on both sides gives us a new system of equations for ~v
$$-\omega^2 \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} -\beta & \alpha \\ \alpha & -\beta \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \qquad \text{or} \qquad A\vec{v} = -\omega^2 \vec{v} \tag{5}$$
This is an example of an eigenvalue problem, i.e., something of the form: “a
Figure 2: Output from an experiment in which temperature is measured with time. The objective is to get the best straight-line fit to the data (data fitting example 1.2).
transformation (in this case a matrix A) acting on an object (the vector ~v) and
giving us a constant (in this case −ω2) times the object”. We may now ask how
many solutions there are and what they look like. In this present case we would
be most interested in the frequencies of vibration and the corresponding solutions
(normal modes). It turns out that there are 2 frequencies because there are two
degrees of freedom.
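If you want to check this numerically, here is a minimal NumPy sketch. The values of m, k1 and k2 are assumed purely for illustration; they are not taken from Figure 1.

```python
import numpy as np

# Assumed, illustrative values (not from the notes): m = 1, k1 = 4, k2 = 1
m, k1, k2 = 1.0, 4.0, 1.0
beta, alpha = (k1 + k2) / m, k2 / m

A = np.array([[-beta, alpha],
              [alpha, -beta]])

# Equation (5) says A v = -omega^2 v, so each eigenvalue of A is -omega^2
lams, modes = np.linalg.eig(A)
omegas = np.sqrt(-lams)
print(omegas)   # the two vibration frequencies (here 2 and sqrt(6))
print(modes)    # the columns are the corresponding normal modes
```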
Example 1.2 Suppose you have run an experiment and collected some data that
you would like to fit to a line or curve. Let’s say you’ve taken measurements
of temperature against time and expect a linear rise in temperature but due to
experimental error, not all points will fall nicely onto a straight line, as seen
in Figure 2. Let’s say there are 4 temperature measurements T1 to T4 taken at
times t1 to t4, and we want to represent temperature as T = a+ bt. We need to
find a and b. If we take the two points T1 and T2 at times t1 and t2 we have
T1 = a + bt1,   T2 = a + bt2
which we can rearrange to find a and b. The problem is that if we use another
two points we will get different values of a and b. Let’s define the “vector”
(T1, T2, T3, T4). We need to find one value for a and one for b. The matrix
equation we need to solve is
$$\begin{pmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{pmatrix} = \begin{pmatrix} a + bt_1 \\ a + bt_2 \\ a + bt_3 \\ a + bt_4 \end{pmatrix} = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \tag{6}$$
Notice, however, that we have more equations than variables (a and b)! This is an example of an "overdetermined system". How do we solve the system? Well,
we can’t solve it exactly but what we can do is find the “best fit” in some sense.
One method we will look at to do this is called “least squares”.
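To preview where we are heading, here is a short NumPy sketch of the least-squares solution to a system like (6). The measurements below are hypothetical, made up purely for illustration.

```python
import numpy as np

# Hypothetical data: four temperature readings at four times
t = np.array([0.0, 1.0, 2.0, 3.0])
T = np.array([1.1, 2.9, 5.2, 6.8])

# The 4 x 2 matrix of equation (6): a column of ones and a column of times
M = np.column_stack([np.ones_like(t), t])

# Solve the overdetermined system for (a, b) in the least-squares sense
(a, b), *_ = np.linalg.lstsq(M, T, rcond=None)
print(a, b)   # intercept and slope of the best straight-line fit
```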
Example 1.3 Partial differential equations (conservation laws) from aerody-
namics are usually solved on a computer. The numerical solutions are con-
structed first by “discretising” the equations using the finite difference, finite
volume or finite element methods in space together with a time-stepping proce-
dure if the equations are unsteady. This means that you approximate the solu-
tions at discrete points in time and space and try to find the solutions at those
points. By discretising the equations this way you will end up with large matrix
systems. The more points (finer mesh) you choose and the higher the dimension, the larger the systems become. There are a great many ways of solving
these systems depending on the accuracy and speed required and the stability of
the methods. To understand these methods and to choose the most appropriate
(in, for example, a CFD code) for a given problem you need to understand some
linear algebra (theory of matrix systems).
We first need to develop the ideas of “vectors” and “transformations” (for our
purposes these are matrices) so that you are familiar with the language used to
describe matrix systems.
Figure 3: Vectors in the plane R2 (left) and in space R3 (right). For both these spaces we can represent vectors graphically but in higher dimensions this is obviously not possible.
2 Basic definitions and examples
A vector is a quite general object. It doesn’t have to be a directed line segment
in space or in the plane, as shown in Figure 3. In fact we can have vectors in
higher dimensions, as shown in example 1.2. More on this below.
When we look at a particular set of vectors we will call it a vector space
and give it a symbol like V . The individual vectors will be given symbols like ~u
or ~v .
So what is a “vector space” and what isn’t? Let’s look at a familiar example.
Example 2.1 Euclidean n spaces, denoted Rn, are vector spaces (we will see
why in a second). In this course we will deal almost exclusively with these vector
spaces.
There are two that you are familiar with: R2 (two-dimensional space) and R3 (three-dimensional space). An example of a vector in R2 is ~u = (2, 3), which is
sometimes written as 2~e1+3~e2. The numbers 2 and 3 are called the “coordinates”
of the vector ~u = (2, 3) in the “standard basis vectors” ~e1 = (1, 0) and ~e2 = (0, 1).
We will look at these concepts in detail later on. Vectors in R2 and R3 can be represented graphically as shown in Figure 3.
More generally we can define vectors that have n coordinates. These are
vectors in the vector space Rn. For example, the vector (T1, T2, T3, T4) in example
1.2 is in R4 and (1, 0,−1, 2, 4, 0, 1) is in R7. We are not able to visualise these
in a graph.
To construct a vector space we basically take a bunch (set) of vectors and
define ways of adding them together and multiplying them by numbers (scalars).
Let’s recall some basic facts about the familiar vectors in the Euclidean 2 and
3 spaces R2 and R3. These will help us to understand what a vector space is
precisely.
(1) On R2 (the plane) we can "add" two vectors as follows:
(1,−1) + (2, 5) = (1 + 2,−1 + 5) = (3, 4) (7)
i.e., we just add the individual coordinates. We can add vectors in any Rn
space in this way. Note that we have chosen to define “addition” in this way.
We could instead choose another way. By doing it as above, we have made
sure that the sum of two vectors in R2 is another vector in R2.
(2) On R2 we can multiply a vector by a scalar (number) as follows:
2(1, 3) = (2× 1, 2× 3) = (2, 6) (8)
where we just multiply each coordinate by the scalar (number) 2. We can
multiply vectors in any Rn space by a scalar in this way. Again, we have
defined “multiplication by a scalar” in a certain way. We could instead choose
another way. By doing it as above, we have made sure that multiplying a
vector in R2 by a scalar gives another vector in R2.
(3) Now that we have defined a way of adding vectors in Rn (add individual
components) and of multiplying them by scalars (multiply each component by
the scalar), it doesn’t matter which way round we add vectors in Rn or which
way round we multiply them by scalars. There are some obvious rules, such
as
(i) ~u + ~v = ~v + ~u   e.g. (2,−1) + (1, 0) = (1, 0) + (2,−1)
(ii) (c + k)~u = c~u + k~u   e.g. (3 + 2)(2,−1) = 3(2,−1) + 2(2,−1)
(9)
for any vectors ~u and ~v in Rn and any scalars c and k.
(4) In R2 we have a zero vector ~0, i.e. (0, 0). When we add ~0 to any vector, e.g.
(2,−1) + (0, 0) = (2,−1)
the vector doesn’t change.
In a general vector space V we have to define the way we add vectors and
multiply them by scalars. When constructing these definitions, we have to make
sure that the rules above for the familiar way of doing things in Rn are preserved.
For V to be a vector space:
• The way we “add” vectors in V has to lead to other vectors in V . We say
that V is closed with respect to addition if this is true.
• When a vector in V is multiplied by a scalar, the answer must be an-
other vector in V . We say that V is closed with respect to scalar
multiplication if this is true.
• The way we define addition and scalar multiplication of the vectors in V
has to preserve rules (9) and other similar rules.
• V has to have a zero vector and adding it to any vector should not change
that vector.
If just one of these requirements is not satisfied, V will NOT be a vector space.
Example 2.2 Let’s define vector addition in R2 in the usual way (add individ-
ual components), but instead of the usual scalar multiplication we will use
c~u = c(u1, u2) = (u1, cu2) (10)
i.e., we only multiply the second coordinate. Let’s try to satisfy the last rule in
equations (9) with any vector in R2 and any two scalars:
2 × (1, 1) + 3 × (1, 1) = (1, 2) + (1, 3) = (2, 5)
but
5 × (1, 1) = (1, 5) ≠ 2 × (1, 1) + 3 × (1, 1)
So, defining scalar multiplication this way does NOT lead to a vector space.
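A quick numerical check of this failure, with the modified scalar multiplication written out as a Python function (a sketch):

```python
import numpy as np

def odd_scale(c, u):
    # the scalar multiplication of equation (10): only the second
    # coordinate is multiplied by c
    return np.array([u[0], c * u[1]])

u = np.array([1.0, 1.0])
lhs = odd_scale(5, u)                       # 5 x (1,1) = (1,5)
rhs = odd_scale(2, u) + odd_scale(3, u)     # (1,2) + (1,3) = (2,5)
print(lhs, rhs, np.array_equal(lhs, rhs))   # [1. 5.] [2. 5.] False
```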
We can also treat functions and even more abstract objects as vectors in
vector spaces. In this course, however, we will not consider these types of spaces,
which are usually referred to as “function spaces”.
One last bit of notation. If V consists of vectors ~v1, ~v2, ~v3, ....., ~vn, we use
curly brackets as follows
V = {~v1, ~v2, ~v3, ....., ~vn}
to represent this set of vectors. For example, if we have a set of vectors consisting
of ~v1 = (1, 0) and ~v2 = (0, 1), we write
V = {(1, 0), (0, 1)}
3 Subspaces of vector spaces
For some vector spaces it is possible to take a subset W (i.e. some of the vectors)
of the original space V and obtain a new vector space using the same rules for
addition and scalar multiplication. We call W a subspace of V . There are
some very important subspaces we will meet later on.
It turns out that to be a subspace, we only need to make sure that the
subspace is closed with respect to addition and scalar multiplication, i.e.,
when we add vectors in W or multiply a vector in W by a scalar we get
another vector in W .
Example 3.1 Let W be the set of all vectors in R3 that are of the form (0, u2, u3),
i.e., the first coordinate is zero. Is this a subspace of R3 with the usual rules for
addition and scalar multiplication?
To find out we need to verify that addition and scalar multiplication of vectors
(0, u2, u3) lead to vectors of the form (0, u2, u3). This is the case (Exercise:
Check that it is), so the space consisting of such vectors is a subspace of R3.
Example 3.2 Let W be the set of all vectors (u1, u2) in R2 such that u1 ≥ 0,
i.e., the first coordinate is non-negative. Is this a subspace of R2 with the usual
rules for addition and scalar multiplication?
Multiply (u1, u2) by c < 0. We get (cu1, cu2), where the first coordinate is
negative. Therefore, this space is not closed with respect to scalar multiplica-
tion (multiplication by a negative scalar will give a vector that is not in W ).
Therefore, W is NOT a subspace of R2.
4 Linear Transformations
The idea of a transformation (or map) is that it takes a vector, say ~u in a
space V , and “transforms” or “maps” it into another vector A~u in a space W ,
which may or may not be the same as V . This is like a function f(x) taking a
number x and giving us another number y = f(x).
When we are dealing with the Euclidean n spaces, we can write a transfor-
mation as a matrix. In what sense is a matrix a transformation? Let’s take a
look at an example.
Example 4.1 Consider the following 3× 3 matrix
$$A = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 2 & -1 \\ 3 & -2 & 5 \end{pmatrix} \tag{11}$$
Let’s take the vector ~u = (1, 3, 0) in R3 and “transform” it with A (i.e. left
multiply it by A) into another vector ~b in R3
$$A\vec{u} = \underbrace{\begin{pmatrix} 2 & 0 & 1 \\ 1 & 2 & -1 \\ 3 & -2 & 5 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix}}_{\vec{u}} = \underbrace{\begin{pmatrix} 2 \\ 7 \\ -3 \end{pmatrix}}_{\vec{b}} \tag{12}$$
In this example, A takes a vector in R3 and transforms it by multiplication into
another vector in R3. We write A : R3 → R3 to signify this. This is pronounced
“A maps R3 to R3”. The output vector ~b is called the image of ~u under A.
In the general case, an n × m (n rows and m columns) matrix Anm takes any
vector in Rm and transforms it by multiplication into a vector in Rn, i.e., Anm : Rm → Rn.
• The domain of Anm is the set of inputs, which is Rm.
• The range of Anm is the set of all possible outputs (images) in Rn.
Let’s take a look at some more examples:
Example 4.2 Consider the following multiplication (transformation) of a vec-
tor ~u by a 4× 3 matrix A, which leads to another vector ~b
$$\underbrace{\begin{pmatrix} -1 & 5 & 1 \\ 0 & -4 & 3 \\ 9 & 2 & 1 \\ 3 & -7 & -3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}}_{\vec{u}\ \text{in}\ \mathbb{R}^3} = \underbrace{\begin{pmatrix} 2 \\ -7 \\ 19 \\ 2 \end{pmatrix}}_{\vec{b}\ \text{in}\ \mathbb{R}^4} \tag{13}$$
In this example we multiply a column vector in R3 (the domain) by A and get a
column vector in R4, so A : R3 → R4. The set of possible outputs in R4 is the
range of A. Clearly, ~b is in the range of A (it is one of the possible outputs).
Example 4.3 Consider the following multiplication of a vector by a 3× 2 ma-
trix A
$$\underbrace{\begin{pmatrix} 7 & 3 \\ -1 & 1 \\ 4 & -3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 1 \\ -2 \end{pmatrix}}_{\vec{u}\ \text{in}\ \mathbb{R}^2} = \underbrace{\begin{pmatrix} 1 \\ -3 \\ 10 \end{pmatrix}}_{\vec{b}\ \text{in}\ \mathbb{R}^3} \tag{14}$$
This time we multiply a column vector in R2 (the domain) by A and get a column
vector in R3, so A : R2 → R3. The set of possible outputs in R3 is the range of
A. ~b is in the range of A.
From the rules of matrix multiplication, we know that for any matrix A and
vectors ~u and ~v
A(~u+ ~v) = A~u+A~v and A(c~u) = cA~u (15)
where c is any number (scalar). These rules tell us that A preserves linear
combinations.
Example 4.4
$$A = \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \qquad \vec{u} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad \vec{v} = \begin{pmatrix} -1 \\ 0 \end{pmatrix} \tag{16}$$
Then
$$A\vec{u} + A\vec{v} = \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} + \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -2 \end{pmatrix} + \begin{pmatrix} -2 \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ -1 \end{pmatrix} \tag{17}$$
and
$$A(\vec{u} + \vec{v}) = A \begin{pmatrix} 1 - 1 \\ -1 + 0 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ -1 \end{pmatrix} = \begin{pmatrix} -3 \\ -1 \end{pmatrix} \tag{18}$$
Exercise: Check that A(5~u) = 5A~u, i.e., that A times 5~u is the same as 5 times
A~u.
Because matrices satisfy the rules (15) we call them linear transformations
(or linear maps).
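Here is a small NumPy sketch that checks the rules (15) for the matrix and vectors of example 4.4:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [-1.0, 1.0]])
u = np.array([1.0, -1.0])
v = np.array([-1.0, 0.0])

print(A @ u + A @ v)                           # [-3. -1.], as in (17)
print(A @ (u + v))                             # [-3. -1.], as in (18)
print(np.allclose(A @ (5 * u), 5 * (A @ u)))   # True: the exercise above
```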
5 Span
We want to be able to write all vectors in a space V as sums of some special
‘fundamental’ vectors. We will build up to this slowly over the next few sections.
Without perhaps knowing it, you have already done this in the Euclidean
spaces using the standard basis vectors in example 2.1. Let’s look at an
example.
Example 5.1 The standard basis vectors in R3 are
~e1 = (1, 0, 0) ~e2 = (0, 1, 0) ~e3 = (0, 0, 1)
You can write any vector in R3 as a linear combination of ~e1, ~e2 and ~e3.
This means that any vector is a constant times ~e1 plus a constant times ~e2 plus
a constant times ~e3. For example:
(2, 3, 1) = 2× ~e1 + 3× ~e2 + 1× ~e3
= 2(1, 0, 0) + 3(0, 1, 0) + (0, 0, 1)
= (2, 3, 1)
(19)
and this holds in all of the Euclidean spaces.
We can generalise this idea of linear combinations to general vector spaces.
We say that a vector ~w in V is a linear combination of vectors
~v1, ~v2, ~v3, ..., ~vn (all in V ) if it can be written as:
~w = c1~v1 + c2~v2 + .......+ cn~vn (20)
for some scalars c1, c2, ...cn.
Example 5.2 Is ~u = (−12, 20) in R2 a linear combination of ~v1 = (−1, 2) and
~v2 = (4,−6)?
If it is, then
(−12, 20) = c1(−1, 2) + c2(4,−6) (21)
or
−c1 + 4c2 = −12 and 2c1 − 6c2 = 20 (22)
The solution to these equations is c1 = 4 and c2 = −2. So ~u = 4~v1 − 2~v2, i.e.,
~u is a linear combination of ~v1 and ~v2.
Example 5.3 Is ~u = (1,−4) in R2 a linear combination of ~v1 = (2, 10) and
~v2 = (−3,−15)?
If it is, then
(1,−4) = c1(2, 10) + c2(−3,−15) (23)
or
2c1 − 3c2 = 1
10c1 − 15c2 = −4 ⇒ 2c1 − 3c2 = −4/5
(24)
The second equation contradicts the first, so there is no solution. ~u is NOT a
linear combination of ~v1 and ~v2.
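Both examples can be checked numerically: put the candidate vectors in the columns of a matrix and either solve for the coefficients or compare ranks. A sketch:

```python
import numpy as np

# Example 5.2: solve c1 v1 + c2 v2 = u with v1, v2 as columns
V = np.array([[-1.0, 4.0],
              [2.0, -6.0]])
print(np.linalg.solve(V, [-12.0, 20.0]))   # [ 4. -2.]: u = 4 v1 - 2 v2

# Example 5.3: the columns (2,10) and (-3,-15) are parallel (rank 1);
# appending u = (1,-4) raises the rank, so u is outside their span
W = np.array([[2.0, -3.0],
              [10.0, -15.0]])
print(np.linalg.matrix_rank(W))                                   # 1
print(np.linalg.matrix_rank(np.column_stack([W, [1.0, -4.0]])))   # 2
```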
Suppose we have a vector space V . We are interested in finding a set S of vectors
from V that allows us to write any vector in V as a linear combination of the
vectors in S. Let’s look at an example.
Example 5.4 Any vector in R3 can be written as a linear combination of the
standard basis vectors ~e1, ~e2 and ~e3. We say that these vectors “span R3”, i.e.,
all vectors in R3 can be written as linear combinations of them. Remember that
we write a set of vectors inside curly brackets, so the set of basis vectors in R3
is written {~e1, ~e2, ~e3}. To signify that this set spans R3 we write
R3 = span {~e1, ~e2, ~e3}
We now generalise the idea of a span to general vector spaces:
Let S = {~v1, ~v2, ....~vn} be a set of vectors in a space V and let W be the set
of all linear combinations of the vectors in S. The set W is called the span
of the vectors ~v1, ~v2, ....~vn and we write
W = spanS = span {~v1, ~v2, ..., ~vn}
Example 5.5 Do the following vectors span R3?
~v1 = (2, 0, 1) ~v2 = (−1, 3, 4) ~v3 = (1, 1,−2)
If they do, then any vector in R3, say ~u = (u1, u2, u3), can be written as a linear
combination of ~v1, ~v2 and ~v3:
~u = (u1, u2, u3) = c1~v1 + c2~v2 + c3~v3
(u1, u2, u3) = c1(2, 0, 1) + c2(−1, 3, 4) + c3(1, 1,−2)
(25)
or
2c1 − c2 + c3 = u1
3c2 + c3 = u2
c1 + 4c2 − 2c3 = u3
(26)
We need to be able to find values for c1, c2 and c3. These equations can be
written in matrix form as
$$\underbrace{\begin{pmatrix} 2 & -1 & 1 \\ 0 & 3 & 1 \\ 1 & 4 & -2 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{\vec{c}} = \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}}_{\vec{u}} \tag{27}$$
To have a solution ~c, the matrix A has to be invertible, i.e., have an inverse. For
then: ~c = A−1~u. To have an inverse, the determinant of A has to be non-zero.
Exercise: check that the determinant of A is −24.
So we can find values for c1, c2 and c3. Therefore
span {~v1, ~v2, ~v3} = R3
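A one-line numerical check of the determinant exercise:

```python
import numpy as np

# The coefficient matrix A of equation (27)
A = np.array([[2.0, -1.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 4.0, -2.0]])
print(np.linalg.det(A))   # -24.0 (non-zero), so v1, v2, v3 span R^3
```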
6 Linear independence
We want to know when a set of vectors S will span the whole of a vector space
V , i.e., when we can write all vectors in V as linear combinations of the vectors
in S. There are two things we have to make sure of: (i) there are enough
vectors in S to describe all of V and (ii) there are no redundant vectors in S
so we can write each vector in V as a linear combination in only one way. By
a redundant vector, we mean that it is a linear combination of the other vectors
in the set, so we don’t really need it.
In order to reach this goal, we need firstly to identify when vectors in a set
are “independent” of each other, by which again we mean that none of them is
a linear combination of the others.
Example 6.1 Consider the vectors ~v1 = (2,−2, 4), ~v2 = (3,−5, 4) and ~v3 = (0, 1, 1) in R3. If these vectors are "dependent" we can form linear combi-
nations, i.e., we should be able to get
c1~v1 + c2~v2 + c3~v3 = ~0 or c1~v1 = −c2~v2 − c3~v3 (28)
where the scalars c1, c2 and c3 cannot all be zero (otherwise it is not possible to
form a linear combination and the vectors are independent). Substituting, we get
c1(2,−2, 4) + c2(3,−5, 4) + c3(0, 1, 1) = (0, 0, 0)
which leads to a system of equations in matrix form
$$\underbrace{\begin{pmatrix} 2 & 3 & 0 \\ -2 & -5 & 1 \\ 4 & 4 & 1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{\vec{c}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{29}$$
For equations of the form A~c = ~0, there is only the trivial solution (~c = ~0) if A
is invertible. Otherwise, there will be a non-trivial solution (at least one of c1,
c2 and c3 will not be zero). Exercise: check that det(A) = 0.
So we can find at least one non-trivial solution ~c to equation (28). Thus, the vectors ~v1, ~v2 and ~v3 are not independent.
This leads us on to the definition of “linear independence”, which is just a
generalisation of the “dependence” concept above.
Let S = {~v1, ~v2, ....~vn} be a set of vectors in some vector space V . If the
equation
c1~v1 + c2~v2 + ...+ cn~vn = ~0
is only satisfied when c1 = c2 = ... = cn = 0, we say that the vectors
~v1, ~v2, ....~vn are linearly independent. Otherwise, we say that the vectors
are linearly dependent.
Let’s look at some more examples.
Example 6.2 Are the vectors ~v1 = (3,−1) and ~v2 = (−2, 2) in R2 linearly
independent?
Let’s set up the equation:
c1~v1 + c2~v2 = ~0 ⇒ c1(3,−1) + c2(−2, 2) = (0, 0)
which leads to a system of equations
3c1 − 2c2 = 0,   −c1 + 2c2 = 0
the only solution to which is c1 = c2 = 0 (the trivial solution). Therefore, ~v1 and ~v2 are linearly independent.
Example 6.3 The standard basis vectors in R3, ~e1, ~e2 and ~e3, are linearly in-
dependent. Try to find numbers c1, c2 and c3 such that c1~e1 + c2~e2 + c3~e3 = ~0.
It’s impossible!
7 Basis and dimension
To this point, we’ve been using the term “standard basis” in the Euclidean n
spaces, without really knowing what the “basis” part of this expression means.
Moreover, in these so-called “n-dimensional” spaces, what does “dimension”
actually mean. In R2 and R
3 the “dimension” is usually thought of geometrically
as the number of axes, typically labelled x, y and z. The more general concept of
dimension will reduce to this definition. First we will tackle the issue of “basis”.
Earlier, I said we were working towards writing all vectors in a space V as
sums (linear combinations) of some special ‘fundamental’ vectors, S = {~v1, ~v2, ..., ~vn}.
There should be enough vectors to span the whole of V , i.e., V = spanS (any
vector in V can be obtained from a linear combination of the vectors in S). At
the same time, there should be no redundant (linearly dependent) vectors in
S because the linear combinations should be unique. These two requirements
basically lead to the special set of vectors we are looking for, and we call this
set a “basis for V ”.
Let S = {~v1, ~v2, ....~vn} be a set of vectors in some vector space V . If
V = span {~v1, ~v2, ..., ~vn}
and ~v1, ~v2, ....~vn are linearly independent, we call S a basis for V .
Example 7.1 The standard basis vectors ~e1, ~e2 and ~e3 form a basis for R3
(hence the name).
(i) We already know from examples 5.1 and 5.4 that R3 = span {~e1, ~e2, ~e3}. ✔
(ii) From example 6.3 we know that the standard basis vectors are linearly inde-
pendent. ✔
Example 7.2 Determine if the vectors ~v1 = (1,−1, 1), ~v2 = (0, 1, 2) and ~v3 =
(3, 0,−1) form a basis for R3.
First we have to check whether these vectors are linearly dependent, i.e., can
we find c1, c2 and c3 (not all zero) such that c1~v1 + c2~v2 + c3~v3 = ~0?
c1(1,−1, 1) + c2(0, 1, 2) + c3(3, 0,−1) = (0, 0, 0)
or in matrix form
$$\underbrace{\begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{\vec{c}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{30}$$
Also, to have R3 = span {~v1, ~v2, ~v3}, any vector (u1, u2, u3) in R3 has to be a
linear combination of ~v1, ~v2 and ~v3
C1(1,−1, 1) + C2(0, 1, 2) + C3(3, 0,−1) = (u1, u2, u3)
or in matrix form
$$\underbrace{\begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix}}_{\vec{C}} = \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}}_{\vec{u}} \tag{31}$$
If det(A) is not zero, then (31) has a unique solution ~C and the only solution to
equation (30) is the trivial solution ~c = ~0. Exercise: Check that det(A) = −10.
Therefore, ~v1, ~v2 and ~v3 are linearly independent and they span R3. Thus, they
form a basis for R3.
We now come to the concept of “dimension”.
Suppose that S = {~v1, ~v2, ...., ~vn} is a basis for a vector space V . If the
number of vectors in S is finite, say n, we say that V is finite dimensional
with dimension n. We write dim(V ) = n . Otherwise, the space is said
to be infinite dimensional.
It turns out, importantly, that
All bases of V contain the same number of vectors
Example 7.3 All the spaces Rn are finite dimensional with dimension n. For
example, R3 has dimension 3. All bases for R3 will have 3 vectors. If there are
more, they will not be linearly independent. If there are fewer, they will not span
R3.
8 Changing the basis
We’ve already seen through examples that a basis for a vector space is not
unique. For example, the standard basis in R3 and the set of vectors {~v1, ~v2, ~v3}
in example 7.2 are both bases in R3.
The standard basis in Rn is generally the easiest one to work with but there
may be cases in which an alternative basis is preferable. Therefore, we need to
find a way to convert between different bases. Let’s look at an example to sort
out some terminology.
Example 8.1 Using the standard basis in R3 we can write the vector (3, 5, 2)
as
(3, 5, 2) = 3(1, 0, 0) + 5(0, 1, 0) + 2(0, 0, 1) = 3~e1 + 5~e2 + 2~e3
The numbers multiplying the basis vectors, 3, 5 and 2, are called the “coordi-
nates” of the vector. It is clear that the coordinates will change depending on
the basis. For the standard basis, the coordinates are simple to find: they are
just the numbers in the vector itself. For other bases, you have to think a bit
more.
We now generalise the idea of coordinates.
Let S = {~v1, ~v2, ....~vn} be a basis for a vector space V . Since S is a basis, we
can express any vector ~u in V as a linear combination of the vectors in S.
~u = c1~v1 + c2~v2 + ....+ cn~vn
The numbers c1, c2, ..., cn are called the coordinates of ~u with respect to
the basis S
The coordinates for a vector with respect to a basis S can themselves be written
as a vector in Rn, which we call a coordinate vector
(~u)S = (c1, c2, ..., cn)
The subscript S makes it clear that the coordinates are with respect to S. For
the standard bases in Rn, the coordinate vector (~u)S is exactly the same as the
vector ~u itself, as seen in the example above.
Example 8.2 Determine the coordinate vector (~u)S of the vector ~u = (10, 5, 0)
relative to the following bases.
(i) The standard basis in R3.
In this case
~u = 10~e1 + 5~e2 + 0~e3
so the coordinates are 10, 5 and 0, and the coordinate vector is simply
(~u)S = (10, 5, 0) = ~u
(ii) S = {~v1, ~v2, ~v3} where ~v1 = (1,−1, 1), ~v2 = (0, 1, 2) and ~v3 = (3, 0,−1).
In this case, we have to find the coordinates c1, c2 and c3 such that
c1(1,−1, 1) + c2(0, 1, 2) + c3(3, 0,−1) = (10, 5, 0)
This is equivalent to the system of equations
c1 + 3c3 = 10
−c1 + c2 = 5
c1 + 2c2 − c3 = 0
(32)
The answer is c1 = −2, c2 = 3 and c3 = 4. Exercise: Check this result.
Therefore,
(~u)S = (−2, 3, 4)
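Numerically, the coordinates in part (ii) are the solution of a 3 × 3 system whose columns are the basis vectors. A minimal sketch:

```python
import numpy as np

# Columns are the basis vectors v1, v2, v3 of part (ii)
V = np.array([[1.0, 0.0, 3.0],
              [-1.0, 1.0, 0.0],
              [1.0, 2.0, -1.0]])
u = np.array([10.0, 5.0, 0.0])
print(np.linalg.solve(V, u))   # [-2.  3.  4.], i.e. (u)_S = (-2, 3, 4)
```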
Now onto how to change bases. We will work in R2 to demonstrate the procedure.
Suppose we have two bases for the space R2:
B = {~v1, ~v2} Basis 1
C = {~w1, ~w2} Basis 2
Now because B is a basis for R2, each of the basis vectors in C can be written
as a linear combination of the basis vectors in B
~w1 = a~v1 + b~v2
~w2 = c~v1 + d~v2
(33)
This means that the coordinate vectors for ~w1 and ~w2 relative to the basis B
are
(~w1)B = (a, b) and (~w2)B = (c, d)
Unfortunately, we now have to introduce a new notation for writing these coor-
dinate vectors. Instead of ( )B we are going to write them using [ ]B and call
them coordinate matrices.
$$[\vec{w}_1]_B = \begin{pmatrix} a \\ b \end{pmatrix} \qquad \text{and} \qquad [\vec{w}_2]_B = \begin{pmatrix} c \\ d \end{pmatrix} \tag{34}$$
They are basically the same as the coordinate vectors, written as columns.
Next, let ~u be any vector in V . In terms of the basis C, we can write ~u as
~u = c1 ~w1 + c2 ~w2 (35)
The coordinate matrix of ~u relative to C is:
$$[\vec{u}]_C = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \tag{36}$$
Equation (33) tells us how to write the basis vectors in C as linear combinations
of the basis vectors B. Substituting equation (33) into equation (35), we get
~u = c1 ~w1 + c2 ~w2
= c1(a~v1 + b~v2) + c2(c~v1 + d~v2)
= (ac1 + cc2)~v1 + (bc1 + dc2)~v2
(37)
This gives us the coordinate matrix of ~u relative to the basis B
$$[\vec{u}]_B = \begin{pmatrix} ac_1 + cc_2 \\ bc_1 + dc_2 \end{pmatrix} \tag{38}$$
Let us re-write this as
$$[\vec{u}]_B = \begin{pmatrix} ac_1 + cc_2 \\ bc_1 + dc_2 \end{pmatrix} = \underbrace{\begin{pmatrix} a & c \\ b & d \end{pmatrix}}_{P} \underbrace{\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}}_{[\vec{u}]_C} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} [\vec{u}]_C \tag{39}$$
The matrix P is called the transition matrix from C to B: given the co-
ordinate matrix of a vector relative to the basis C, we can use it to find the
coordinate matrix relative to the basis B. Notice that its columns are the coor-
dinate matrices for the basis vectors C relative to B, [~w1]B and [~w2]B. We can
therefore write P compactly as
P = [[~w1]B [~w2]B]
Equation (39) can then be written compactly as
[~u]B = P [~u]C = [[~w1]B [~w2]B] [~u]C
We can now generalise this result.
Suppose we have two bases for the vector space V :
B = {~v1, ~v2, ..., ~vn} Basis 1
C = {~w1, ~w2, ..., ~wn} Basis 2
The transition matrix from C to B is defined as
P = [[~w1]B [~w2]B...... [~wn]B] (40)
where the ith column of P is the coordinate matrix of ~wi relative to B.
The coordinate matrix of a vector ~u in V relative to B is then related to
the coordinate matrix of ~u relative to C by
[~u]B = P [~u]C (41)
Example 8.3 Consider the standard basis B = {~e1, ~e2, ~e3} and the basis C =
{~v1, ~v2, ~v3}, where ~v1 = (1,−1, 1) and ~v2 = (0, 1, 2) and ~v3 = (3, 0,−1), for R3.
(i) Find the transition matrix from C to B
(ii) Find the transition matrix from B to C
(i) Recall that the columns of the transition matrix are coordinate matrices for
the basis vectors C relative to B. In other words, we have to find the coordinates
of the basis vectors ~v1, ~v2 and ~v3 when they are written as linear combinations
of ~e1, ~e2 and ~e3. We know from examples 8.1 and 8.2 that in the standard basis,
the coordinate vector (and therefore the coordinate matrix) is simply the vector
itself. Thus
$$[\vec{v}_1]_B = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \qquad [\vec{v}_2]_B = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \qquad [\vec{v}_3]_B = \begin{pmatrix} 3 \\ 0 \\ -1 \end{pmatrix} \tag{42}$$
From equation (40), the transition matrix from C to B is then
$$P = [[\vec{v}_1]_B\ [\vec{v}_2]_B\ [\vec{v}_3]_B] = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix} \tag{43}$$
(ii) To find the transition matrix from B to C we need the coordinate matrices
of the standard basis vectors relative to C. In other words, we have to find the
coordinates of ~e1, ~e2 and ~e3 when they are written as linear combinations of ~v1,
~v2 and ~v3. This requires more work (for you!)
Exercise: Verify that
$$\vec{e}_1 = \tfrac{1}{10}\vec{v}_1 + \tfrac{1}{10}\vec{v}_2 + \tfrac{3}{10}\vec{v}_3$$
$$\vec{e}_2 = -\tfrac{3}{5}\vec{v}_1 + \tfrac{2}{5}\vec{v}_2 + \tfrac{1}{5}\vec{v}_3$$
$$\vec{e}_3 = \tfrac{3}{10}\vec{v}_1 + \tfrac{3}{10}\vec{v}_2 - \tfrac{1}{10}\vec{v}_3$$
Therefore, the coordinate matrices of the standard basis vectors relative to C are
$$[\vec{e}_1]_C = \begin{pmatrix} 1/10 \\ 1/10 \\ 3/10 \end{pmatrix} \qquad [\vec{e}_2]_C = \begin{pmatrix} -3/5 \\ 2/5 \\ 1/5 \end{pmatrix} \qquad [\vec{e}_3]_C = \begin{pmatrix} 3/10 \\ 3/10 \\ -1/10 \end{pmatrix} \tag{44}$$
and the transition matrix from B to C is
$$P' = [[\vec{e}_1]_C\ [\vec{e}_2]_C\ [\vec{e}_3]_C] = \begin{pmatrix} 1/10 & -3/5 & 3/10 \\ 1/10 & 2/5 & 3/10 \\ 3/10 & 1/5 & -1/10 \end{pmatrix} \tag{45}$$
Example 8.4 Using the results of the previous example, compute
(i) [~u]B given (~u)C = (−2, 3, 4)
(ii) [~u]C given (~u)B = (10, 5, 0)
(i) All we need to do now is use equation (41), i.e., some matrix multiplication
$$[\vec{u}]_B = P[\vec{u}]_C = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} -2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 10 \\ 5 \\ 0 \end{pmatrix} \tag{46}$$
Looking back at example 8.2(ii), we can see that this is the right result. Once we
have the transition matrix, we can perform this computation quickly and easily
for many vectors.
(ii) This time, we swap the bases and use the transition matrix P ′ instead of P ,
since we’re going from B to C.
$$[\vec{u}]_C = P'[\vec{u}]_B = \begin{pmatrix} 1/10 & -3/5 & 3/10 \\ 1/10 & 2/5 & 3/10 \\ 3/10 & 1/5 & -1/10 \end{pmatrix} \begin{pmatrix} 10 \\ 5 \\ 0 \end{pmatrix} = \begin{pmatrix} -2 \\ 3 \\ 4 \end{pmatrix} \tag{47}$$
as expected from part (i).
There is one final observation to make
The transition matrix from the basis B to C is the inverse of the transition
matrix from C to B.
Exercise: Check that P ′ is the inverse of P in example 8.4.
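Here is a NumPy sketch of that check, reusing the matrices from examples 8.3 and 8.4:

```python
import numpy as np

# Transition matrix P from C to B, equation (43)
P = np.array([[1.0, 0.0, 3.0],
              [-1.0, 1.0, 0.0],
              [1.0, 2.0, -1.0]])

P_prime = np.linalg.inv(P)    # transition matrix from B to C
print(P_prime)                # matches equation (45)

u_C = np.array([-2.0, 3.0, 4.0])
print(P @ u_C)                # [10.  5.  0.] = [u]_B, as in (46)
print(P_prime @ (P @ u_C))    # back to [-2.  3.  4.], as in (47)
```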
9 Fundamental subspaces
There are some very important subspaces of Rn that we will be interested in.
These subspaces are associated with matrices. Let’s look at a general n × m
matrix
$$A_{nm} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix} \tag{48}$$
It has n rows and m columns. The row vectors are the vectors formed out
of the rows of Anm (these are in Rm) and the column vectors are the vectors
formed out of the columns of Anm (these are in Rn).
Example 9.1 Consider the 4× 2 matrix
$$A = \begin{pmatrix} -1 & 5 \\ 0 & -4 \\ 9 & 2 \\ 3 & -7 \end{pmatrix} \tag{49}$$
The row vectors are
~r1 = (−1, 5) ~r2 = (0,−4) ~r3 = (9, 2) ~r4 = (3,−7) (50)
which are vectors in R2 (there are m = 2 columns) and the column vectors are
$$\vec{c}_1 = \begin{pmatrix} -1 \\ 0 \\ 9 \\ 3 \end{pmatrix} \qquad \vec{c}_2 = \begin{pmatrix} 5 \\ -4 \\ 2 \\ -7 \end{pmatrix} \tag{51}$$
which are vectors in R4 (there are n = 4 rows).
There are three important subspaces of Rn and Rm associated with a matrix
Anm. We call them the fundamental subspaces of Anm.
First let’s recall that a matrix Anm is a linear transformation that takes
any column vector in Rm and transforms it by multiplication into a column vector in Rn. We write Anm : Rm → Rn. The domain of Anm is Rm (the set of inputs)
and the range of Anm is the set of all possible outputs (“images”) in Rn (which
is generally not all of Rn, just a subspace of it).
Now onto the fundamental subspaces of Anm.
(1) The first subspace is related to the zero vector in Rn. The set of all vectors
~u in the domain Rm that give
Anm~u = ~0 (52)
is called the null space or kernel of Anm. In other words, those vectors in
the domain (inputs) that when operated on by Anm give us the zero vector
in Rn. We write the null space of a matrix A as null(A) or ker(A)
(2) The span of the row vectors of Anm, i.e., the set of all linear combinations of
the row vectors, is called the row space of Anm. Because the row vectors
are in Rm, the row space is a subspace of Rm. We write the row space of a
matrix A as row(A)
(3) The span of the column vectors of Anm, i.e., the set of all linear combinations
of the column vectors, is called the column space of Anm. Because the
column vectors are in Rn, the column space is a subspace of Rn. We write
the column space of a matrix A as col(A)
We will be interested in finding bases for each of these spaces. First another
example.
Example 9.2 Find the null space ker(A) of the following matrix
$$A = \begin{pmatrix} 1 & -7 \\ -3 & 21 \end{pmatrix} \tag{53}$$
To find the null space, we use equation (52). Let’s assume that (u1, u2) is a
vector in ker(A). Then equation (52) leads to
$$A\vec{u} = \begin{pmatrix} 1 & -7 \\ -3 & 21 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \tag{54}$$
which can be written as a system of linear equations
u1 − 7u2 = 0
−3u1 + 21u2 = 0 ⇒ −u1 + 7u2 = 0
(55)
The two equations are equivalent, and are satisfied when (u1, u2) = (7t, t) for
any number t. Therefore, ker(A) consists of all vectors of the form (7t, t) for
any number t, of which there are infinitely many.
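For an exact check we can use sympy (a sketch):

```python
from sympy import Matrix

# Example 9.2 in exact arithmetic: the null space of A is spanned
# by (7, 1), i.e. it consists of all vectors (7t, t)
A = Matrix([[1, -7],
            [-3, 21]])
print(A.nullspace())   # [Matrix([[7], [1]])]
```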
Now, this is one way of finding the null space and a basis for it. However, we
want to be able to find bases for all the fundamental spaces for more complicated
matrices using just one procedure. This procedure is described through another
example.
Before we move onto the example, we first have to review the concepts of
augmented matrices and reduced echelon forms, which you have covered
in your first year maths modules.
Suppose we have a linear system of homogeneous (right hand sides are zero)
equations:
−u1 + 2u2 − u3 + 5u4 + 6u5 = 0
4u1 − 4u2 − 4u3 − 12u4 − 8u5 = 0
2u1 − 6u3 − 2u4 + 4u5 = 0
−3u1 + u2 + 7u3 − 2u4 + 12u5 = 0
(56)
We can write this in matrix form as
$$\underbrace{\begin{pmatrix} -1 & 2 & -1 & 5 & 6 \\ 4 & -4 & -4 & -12 & -8 \\ 2 & 0 & -6 & -2 & 4 \\ -3 & 1 & 7 & -2 & 12 \end{pmatrix}}_{A} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{57}$$
A convenient way of writing this system of equations is by forming the aug-
mented matrix
$$\left(\begin{array}{ccccc|c} -1 & 2 & -1 & 5 & 6 & 0 \\ 4 & -4 & -4 & -12 & -8 & 0 \\ 2 & 0 & -6 & -2 & 4 & 0 \\ -3 & 1 & 7 & -2 & 12 & 0 \end{array}\right) \tag{58}$$
The entries to the left of the line represent the coefficients of u1 to u5 in equations
(56) and (57). The zeros to the right of the line represent the terms on the right
hand sides of the ‘=’ signs in equations (56) and (57).
Now, in the system of equations (56) we can multiply or divide any equation
by a constant, we can add or subtract equations or we can swap the equations
around without altering the solutions. You do this, e.g., when you solve 2 linear
simultaneous equations.
Aside.
Solve the following system and make a note of the steps required.
u1 − 2u2 = 2,   3u1 + u2 = −2
The same is true, therefore, of the augmented matrix (58), which represents the
system of equations (56): We can
• Interchange 2 rows
• Multiply or divide a row by a non-zero number
• Add a multiple of one row to another.
These are called elementary row operations. They are equivalent to adding
equations (56), multiplying them by constants and interchanging them. The
augmented matrix is just a more compact way of doing it. We also have to be
careful about the right hand sides when we perform the operations. However,
for the homogeneous system above they are zero, so they do not affect the row
operations.
We now want to find the reduced row echelon form of the matrix. We
get this by performing elementary row operations until the augmented matrix
satisfies the following properties.
• In each row, the first non-zero entry from the left is 1. This is called the leading 1.
• The leading 1 in each row is to the right of the leading 1 in the row above.
• All rows consisting entirely of zeros are at the bottom of the matrix.
Exercise: go through the following steps on the augmented matrix (58)
(1) row 2 + 4 × row 1
(2) row 3 + 2 × row 1
(3) row 2 ÷ 4
(4) row 1 × −1
(5) row 3 ÷ 4
(6) row 4 + 3 × row 1
(7) row 3 - row 2
(8) row 4 + 5 × row 2
(9) row 4 ↔ row 3
(10) row 3 ÷ (−7)
to confirm that the reduced row echelon form is
$$U = \left(\begin{array}{ccccc|c} 1 & -2 & 1 & -5 & -6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \tag{59}$$
We now move onto the example.
Example 9.3 Determine a basis for the null space of the following 4×5 matrix
$$A = \begin{pmatrix} -1 & 2 & -1 & 5 & 6 \\ 4 & -4 & -4 & -12 & -8 \\ 2 & 0 & -6 & -2 & 4 \\ -3 & 1 & 7 & -2 & 12 \end{pmatrix} \tag{60}$$
To find the null space we need to solve equation (52) for ~u = (u1, u2, u3, u4, u5)
in R5. This is the same as equation (57) above. We put it into the augmented
matrix, which is given by matrix (58).
Now we need the reduced row echelon form of the matrix. Again, we have
done this already. The answer is given by matrix (59)
$$U = \left(\begin{array}{ccccc|c} 1 & -2 & 1 & -5 & -6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \tag{61}$$
Thus, we only have 3 equations (the top 3 rows), but 5 unknowns. Let’s set
u5 = s, where s is any number. The third equation (row) gives
u4 = 2u5 = 2s
Now set u3 = t for any number t. The second equation (row) gives
u2 = 2u3 − 2u4 − 4u5 = 2t− 8s
Finally, the first equation (row) gives
u1 = 2u2 − u3 + 5u4 + 6u5 = 3t
The full solution is
$$\vec{u} = \begin{pmatrix} 3t \\ 2t - 8s \\ t \\ 2s \\ s \end{pmatrix} = t \underbrace{\begin{pmatrix} 3 \\ 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}}_{\vec{u}_1} + s \underbrace{\begin{pmatrix} 0 \\ -8 \\ 0 \\ 2 \\ 1 \end{pmatrix}}_{\vec{u}_2} \tag{62}$$
for any numbers t and s. There are infinitely many solutions because the number
of unknowns is greater than the number of equations. So, the null space consists
of all vectors of the form c1~u1 + c2~u2.
In the above example, we haven't quite answered the question - we still haven't specified a basis! It looks like the vectors ~u1 and ~u2 could form a basis. They certainly span the whole of the null space, but are they linearly independent? Yes, they are (Exercise: check that they are). So, they satisfy the two properties required to be a basis.
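Here is a sympy sketch that confirms the hand calculation. Note that sympy's rref() also clears the entries above each leading 1, so it returns a fully reduced form; the pivot columns and the null space basis nevertheless agree with what we found.

```python
from sympy import Matrix

A = Matrix([[-1, 2, -1, 5, 6],
            [4, -4, -4, -12, -8],
            [2, 0, -6, -2, 4],
            [-3, 1, 7, -2, 12]])

R, pivots = A.rref()   # fully reduced row echelon form and pivot columns
print(pivots)          # (0, 1, 3): leading 1s in columns 1, 2 and 4
print(A.nullspace())   # the basis vectors (3,2,1,0,0) and (0,-8,0,2,1)
```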
We now come to the main reason for solving the system by finding the
reduced row echelon form.
Let Anm be an n×m matrix.
• The vectors found for the null space of the reduced echelon form of Anm
are always linearly independent. They form a basis for the null space of
the reduced echelon matrix and for the null space of the original matrix.
The dimension of the null space (i.e., number of basis vectors) is called
the nullity of Anm, written nullity(Anm) .
• The row vectors containing the leading 1’s in the reduced echelon form of
Anm form a basis for the row space of the reduced echelon matrix and for
the row space of the original matrix Anm.
• The column vectors containing the leading 1's in the reduced echelon form of Anm form a basis for the column space of the reduced echelon form. Suppose that these column vectors correspond to column numbers m1, m2, .., mk.
The column vectors of the original matrix Anm corresponding to column numbers m1, m2, .., mk form a basis for the column space of the original matrix.
Example 9.4 Let’s look again at the matrix A in example 9.3. The reduced
row echelon form U is given by equation (59)
$$U = \left(\begin{array}{ccccc|c} 1 & -2 & 1 & -5 & -6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \tag{63}$$
We found that there are two vectors in the basis for the null space. All bases
have the same number of vectors. Therefore nullity(A) = 2.
Rows 1, 2 and 3 of U contain the leading 1’s. Therefore, a basis for the row
space of both A and U is given by
~r1 = (1,−2, 1,−5,−6)
~r2 = (0, 1,−2, 2, 4)
~r3 = (0, 0, 0, 1,−2)
with dim(row(A))=3
Columns 1, 2 and 4 of U contain the leading 1’s. Therefore, a basis for the
column space of U is given by the 1st, 2nd and 4th column vectors of U
~c′1 = (1, 0, 0, 0)
~c′2 = (−2, 1, 0, 0)
~c′4 = (−5, 2, 1, 0)
A basis for the column space of A is therefore given by the 1st, 2nd and 4th
column vectors of A
~c1 = (−1, 4, 2,−3)
~c2 = (2,−4, 0, 1)
~c4 = (5,−12,−2,−2)
with dim(col(A))=3
Notice in this example that dim(row(A)) = dim(col(A)), i.e., the column and
row spaces have the same dimension. This is always true.
The row space and column space of a general n × m matrix Anm have the same
dimension. We call this dimension the rank of Anm, written rank(Anm) .
The second thing to notice from the example above is that nullity(A)+rank(A) =
2 + 3 = 5, i.e., the number of columns. This again is always true.
For a general n×m matrix Anm (m columns)
nullity(Anm) + rank(Anm) = m
For an n× n matrix A
nullity(A) + rank(A) = n (64)
10 Square matrices and systems of linear equations
The concepts of rank and nullity are important. Let’s consider a square n × n
matrix A : Rn → Rn. A typical problem in many applications of engineering is
to find a solution ~u in Rn to the equation
A~u = ~b (65)
where the vector ~b in Rn is known. We will look at certain aspects of this problem
with an example.
Example 10.1 Consider the matrix
$$A = \begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & -2 \\ -3 & 0 & 2 \end{pmatrix} = (\vec{c}_1\ \vec{c}_2\ \vec{c}_3) \tag{66}$$
where ~c1, ~c2 and ~c3 are the column vectors of A
$$\vec{c}_1 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} \qquad \vec{c}_2 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} \qquad \vec{c}_3 = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix} \tag{67}$$
Now consider the procedure for multiplying a vector ~u = (u1, u2, u3) by A
$$\begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & -2 \\ -3 & 0 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} 1 \times u_1 + (-2) \times u_2 + 1 \times u_3 \\ 2 \times u_1 + 1 \times u_2 + (-2) \times u_3 \\ (-3) \times u_1 + 0 \times u_2 + 2 \times u_3 \end{pmatrix} = u_1 \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} + u_2 \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + u_3 \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix} = u_1\vec{c}_1 + u_2\vec{c}_2 + u_3\vec{c}_3 \tag{68}$$
i.e., any matrix multiplication leads to a linear combination of the column vec-
tors, i.e., a vector in the column space.
From the above example we can see that if we want to solve equation (65), the
vector ~b has to be in the column space of A. It also shows that all output vectors
(i.e. the range of A) are in the column space of A.
The range of a square matrix is its column space
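A small NumPy sketch of this observation, using the matrix of example 10.1 and an arbitrary input vector (chosen here purely for illustration):

```python
import numpy as np

A = np.array([[1.0, -2.0, 1.0],
              [2.0, 1.0, -2.0],
              [-3.0, 0.0, 2.0]])
u = np.array([2.0, -1.0, 3.0])

# Equation (68): A u equals the combination u1 c1 + u2 c2 + u3 c3
combo = u[0] * A[:, 0] + u[1] * A[:, 1] + u[2] * A[:, 2]
print(np.allclose(A @ u, combo))   # True: every output lies in col(A)
```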
Next, let’s consider the nullity and rank. What happens when the rank of an
n × n matrix is less than n? From the definition of rank, we know that if
rank(A) < n, some of the column and row vectors will be linearly dependent -
they can be obtained from the other rows by forming linear combinations and
are, therefore, redundant. If we were to set up the matrix system (65) with some
vector ~b and look for a solution ~u, then we would not have enough equations or some equations would contradict each other. Therefore, a solution will either not exist at all or there will be infinitely many solutions ⇒ A will not have an inverse.
The rank of A is less than n ⇐⇒ A is not invertible
A square n×n matrix A with rank(A) = n is said to have full rank (obviously
the rank cannot be any bigger!) If rank(A) < n, the matrix A is said to be rank
deficient. We can restate the above as
A is rank deficient ⇐⇒ A is not invertible
By definition, if a matrix A is rank deficient some of the rows are linearly
dependent. By performing elementary row operations (adding multiples of rows
to other rows) we can get a new matrix B that will have a row of zeros. The
determinants of A and B will differ only by a non-zero constant factor. Therefore, since det(B) =
0, we have det(A) = 0, which means that A will not have an inverse.
A is rank deficient ⇐⇒ det(A) = 0
Another way to look at nullity and rank is by considering the solutions to
A~v = ~0 (69)
The solutions to this equation clearly give us the null space ker(A). The nullity
of A is the number of vectors in the basis for ker(A). If there are non-zero so-
lutions to equation (69), then nullity(A) > 0. Equation (64) then tells us that
rank(A) < n. In this case, we can write
A(~u+ ~v) = A~u+A~v = ~b+~0 = ~b
What does this tell us? It tells us that if ~u is a solution to equation (65) then so is ~u + ~v, and there may be an infinite number of such ~v. This suggests that if a solution to equation (65) exists, it will not be unique.
A is rank deficient ⇐⇒ no unique solution to A~u = ~b
11 Inner product spaces and orthogonality
There is a special class of spaces that we are going to look at. The Euclidean
spaces fall into this category.
What we would like to do, as in Rn, is measure the (i) magnitude (or
“length”) of a vector and (ii) angles and distances between vectors. In R2
and R3 you can visualise these but in higher dimensions you can’t. The basic idea
is to introduce generalisations of the familiar “dot product” and “magnitude”
of a vector in R2 or R3.
Example 11.1 The dot product in R2 and R3 is defined as follows
~u · ~v = (u1, u2, u3) · (v1, v2, v3) = u1v1 + u2v2 + u3v3
where we multiply the first, second, etc. coordinate of the first vector by the first,
second etc. coordinate of the second vector and add the results. The dot product
has a geometric interpretation
~u · ~v = |~u||~v| cos θ
where $|\vec{u}| = \sqrt{u_1^2 + u_2^2 + u_3^2}$ and $|\vec{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ are the "magnitudes" of the vectors and θ is the angle between the vectors in the plane that contains them both. Notice also that
$$\sqrt{\vec{u}\cdot\vec{u}} = \sqrt{u_1^2 + u_2^2 + u_3^2} = |\vec{u}|$$
and that
$$\vec{u}\cdot\vec{v} = \vec{v}\cdot\vec{u}$$
$$(\vec{u}+\vec{v})\cdot\vec{w} = \vec{u}\cdot\vec{w} + \vec{v}\cdot\vec{w}$$
$$(c\vec{u})\cdot\vec{v} = c(\vec{u}\cdot\vec{v})\ \text{for any scalar}\ c$$
$$\vec{u}\cdot\vec{u} = u_1^2 + u_2^2 + u_3^2 \geq 0$$
$$\vec{u}\cdot\vec{u} = 0\ \text{if and only if}\ \vec{u} = \vec{0} \tag{70}$$
In general Rn spaces we can define the same dot product (multiply individual
respective components)
~u · ~v = u1v1 + u2v2 + ....+ unvn
and the length of a vector in Rn is given by
$$|\vec{u}| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2} = \sqrt{\vec{u}\cdot\vec{u}}$$
Now let’s look at a general vector space V . We want similar measures of “angles”
and “magnitudes”.
• What we do is extend the idea of the dot product and call it an inner
product
• Like the dot product of two vectors, the inner product of two vectors gives
us a number.
• As with the dot product, we will be able to use the inner product to
measure “angles” and “magnitudes”.
• We write 〈~u,~v〉 to represent the inner product of two vectors.
Example 11.2 The dot product on Euclidean spaces is an example of an inner
product. It is called the standard inner product on these spaces.
Example 11.3 Let ~u = (1,−2, 4), ~v = (−2, 0, 1) and ~w = (3,−2, 2). With the
standard inner product (i.e., just the dot product)
$$\langle\vec{u},\vec{v}\rangle = \langle(1,-2,4),\ (-2,0,1)\rangle = 1\times(-2) + (-2)\times 0 + 4\times 1 = 2$$
$$\langle\vec{v},\vec{u}\rangle = \langle(-2,0,1),\ (1,-2,4)\rangle = (-2)\times 1 + 0\times(-2) + 1\times 4 = 2 = \langle\vec{u},\vec{v}\rangle$$
$$\langle\vec{u}+\vec{v},\ \vec{w}\rangle = \langle\vec{u},\vec{w}\rangle + \langle\vec{v},\vec{w}\rangle \quad \text{(Exercise: check this)}$$
$$\langle c\vec{u},\vec{v}\rangle = \langle(c,-2c,4c),\ (-2,0,1)\rangle = -2c + 0 + 4c = 2c = c\,\langle\vec{u},\vec{v}\rangle$$
$$\langle\vec{u},c\vec{v}\rangle = c\,\langle\vec{u},\vec{v}\rangle \quad \text{(Exercise: check this)}$$
$$\sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{1^2 + (-2)^2 + 4^2} = \sqrt{21} = |\vec{u}| \tag{71}$$
• The properties demonstrated in this example always hold. The property 〈~u,~v〉 =
〈~v, ~u〉 is called “symmetry”.
• The third property is termed "linearity in the first argument" (the two "arguments" are the vectors on either side of the comma).
Exercise: Show that 〈~u,~v + ~w〉 = 〈~u,~v〉+ 〈~u, ~w〉 for the vectors in the example
above. This means that the inner product is “linear in the second argument” as
well as the first. It is, therefore, bilinear.
Hardish exercise (used later on): Show that (“additivity” property)
〈(~v1 + ~v2 + ....+ ~vn), ~w〉 = 〈~v1, ~w〉+ 〈~v2, ~w〉+ .....+ 〈~vn, ~w〉 (72)
HINT: We can write ~v1 + ~v2 + ....+ ~vn = ~v1 + (~v2 + ....+ ~vn)
The sum ~s = ~v2 + ....+ ~vn is just a single vector when we perform the addition.
Then we can apply the third rule in (71). Repeat the procedure by taking out ~v2
from the sum ~s to form a new sum: ~s2 = ~v3+ ....+~vn. Keep going until the new
sum has only the term ~vn.
Example 11.4 We can define other inner products on the Rn spaces. To fix
ideas, let’s take vectors ~u = (u1, u2, u3) and ~v = (v1, v2, v3) in R3. The following
defines an inner product
New: 〈~u,~v〉 = w1u1v1 + w2u2v2 + w3u3v3
Standard inner product: 〈~u,~v〉 = u1v1 + u2v2 + u3v3
The new and standard inner products are the same except for the numbers w1, w2 and w3 multiplying the first, second and third terms in the sum respectively.
These numbers are called weights. This is an example of a “weighted inner
product”.
• A vector space on which we can define an inner product is called an inner product space.
• The inner product has to satisfy the rules (70) when we swap the dot
product for the inner product.
• We are mainly interested in the vector spaces Rn with the inner product
defined by standard inner product, i.e. dot product.
So how do we measure the “magnitude” of a vector?
In the last computation in example 11.3 you saw that $\sqrt{\langle\vec{u},\vec{u}\rangle}$ is the magnitude of ~u. Before we go on to define the magnitude in general we are going to rename it. We will not say the "magnitude of ~u" but will instead say the "norm of ~u". Moreover, we will not write the norm (magnitude) as |~u|, but instead we will write it as ‖~u‖. A norm can be defined without reference to an inner product. However, we are interested in inner product spaces and the inner product allows us to define a norm as
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle}$$
Example 11.5 In the Euclidean spaces with the standard inner product, the
norm induced by the inner product is
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$$
Note that for this space the norm ‖~u‖ is identical to the magnitude |~u|
Example 11.6 Find the norms of the vectors ~u = (3, 4) and ~v = (2,−1, 2,−3)
using the standard inner product
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{3^2 + 4^2} = 5$$
$$\|\vec{v}\| = \sqrt{\langle\vec{v},\vec{v}\rangle} = \sqrt{2^2 + (-1)^2 + 2^2 + (-3)^2} = \sqrt{18} = 3\sqrt{2} \tag{73}$$
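The same norms in NumPy, as a quick check:

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([2.0, -1.0, 2.0, -3.0])

print(np.sqrt(np.dot(u, u)))   # 5.0
print(np.linalg.norm(v))       # 4.2426... = 3 * sqrt(2)
```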
Example 11.7 In the Euclidean spaces, the norm induced by the standard inner
product satisfies certain properties. For example, for all vectors ~u = (u1, u2, u3)
in R3
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{u_1^2 + u_2^2 + u_3^2} = 0\ \text{if and only if}\ \vec{u} = \vec{0}$$
$$\|c\vec{u}\| = |c|\,\|\vec{u}\|\ \text{for any scalar}\ c \tag{74}$$
Exercise: For ~u = (1,−2, 2), check that ‖2~u‖ = 2‖~u‖ = 6
All norms must satisfy these properties.
Next we must find a way to compute “distances” between vectors. In R2 and
R3, the distance between ~u and ~v is given by |~u− ~v|, i.e., the magnitude of the
difference. For a general inner product space we have
The distance between two vectors ~u and ~v is given by the metric
$$d(\vec{u},\vec{v}) = \|\vec{u}-\vec{v}\| = \sqrt{\langle\vec{u}-\vec{v},\ \vec{u}-\vec{v}\rangle}$$
(also called distance function)
Example 11.8 In the Euclidean spaces with the standard inner product, the
metric is
$$d(\vec{u},\vec{v}) = \|\vec{u}-\vec{v}\| = \sqrt{\langle\vec{u}-\vec{v},\ \vec{u}-\vec{v}\rangle} = \sqrt{(u_1-v_1)^2 + (u_2-v_2)^2 + \cdots + (u_n-v_n)^2}$$
Note that for this space, ‖~u− ~v‖ is identical to |~u− ~v| .
Exercise: Try to show that
d(~u,~v) = d(~v, ~u)
HINT: (a− b)2 = (b− a)2 for any scalars a and b.
Example 11.9 Calculate the metric for ~u = (3, 4, 1,−1) and ~v = (2,−1, 2,−3)
$$d(\vec{u},\vec{v}) = \|\vec{u}-\vec{v}\| = \sqrt{(3-2)^2 + (4+1)^2 + (1-2)^2 + (-1+3)^2} = \sqrt{31}$$
Exercise: Check that d(~u,~v) = d(~v, ~u), i.e., ‖~u− ~v‖ = ‖~v − ~u‖.
Recall that in R2 and R3, two vectors are at right angles if ~u · ~v = 0 because
~u ·~v = |~u||~v| cos θ. We say that these vectors are orthogonal. In direct analogy,
for a general inner product space, we say that
~u and ~v are orthogonal if 〈~u,~v〉 = 0
Example 11.10 The standard basis vectors in Rn are orthogonal to each other
with the standard inner product. For example
〈(1, 0, 0), (0, 1, 0)〉 = 0, 〈(0, 1, 0), (0, 0, 1)〉 = 0
and so on (remember these are just dot products).
Now, suppose that W is a subspace of an inner product space V . We say that
a vector ~u from V is orthogonal to W if it is orthogonal to every vector in
W . The set of all vectors that are orthogonal to W is called the orthogonal
complement of W and is denoted by W⊥ ( “W perp”).
Example 11.11 Consider the space R3 with the standard basis. Let W be the
subspace of R3 consisting of all vectors that lie in the xy plane, i.e., of the form
~q = (q1, q2, 0), for any scalars q1 and q2. The orthogonal complement of W will
be all vectors ~u in R3 that are orthogonal to every vector in W , that is
〈~u, ~q〉 = 〈(u1, u2, u3), (q1, q2, 0)〉 = 0
For this to be true for any choice of u1, u2, u3, q1 and q2, we must have
u1 = u2 = 0. It doesn’t matter what u3 is because the third component of ~q
is always zero. So, we are looking at vectors of the form (0, 0, u3). These are
vectors in the direction of ~e3. The span of ~e3 is all linear combinations of ~e3,
which means vectors of the form c~e3 = (0, 0, c) for any c. Therefore
W⊥ = span {~e3}
Armed with the definition of orthogonal complement, let's briefly revisit the fun-
damental subspaces of a matrix. There is actually another one. It is associated
with the transpose of the matrix.
Example 11.12 Consider the 3× 3 matrix A and its transpose AT
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \qquad A^T = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \tag{75}$$
To get AT , we swap the columns for the rows. The 3 column vectors of A are
~c1 = (a11, a21, a31), ~c2 = (a12, a22, a32), ~c3 = (a13, a23, a33)
These are also the 3 row vectors of AT . It follows that
Finding a basis for the column space of A is equivalent to finding a basis
for the row space of AT .
Now consider the procedure for multiplying a vector ~u = (u1, u2, u3) by AT
$$\begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} a_{11}u_1 + a_{21}u_2 + a_{31}u_3 \\ a_{12}u_1 + a_{22}u_2 + a_{32}u_3 \\ a_{13}u_1 + a_{23}u_2 + a_{33}u_3 \end{pmatrix} = \begin{pmatrix} \langle\vec{u},\vec{c}_1\rangle \\ \langle\vec{u},\vec{c}_2\rangle \\ \langle\vec{u},\vec{c}_3\rangle \end{pmatrix} \tag{76}$$
Suppose that the vector ~u is in the null space of AT , i.e., ker(AT ). Then
AT~u = ~0
which, looking at equation (76) means that 〈~u,~ci〉 = 0, for i = 1, 2, 3 (~u is
orthogonal to every one of the column vectors of A). Let ~v be any vector in
col(A), i.e., all linear combinations of ~c1, ~c2 and ~c3. Then ~v has the form
~v = a1~c1 + a2~c2 + a3~c3 for some numbers a1, a2 and a3. This gives
〈~u,~v〉 = 〈~u, a1~c1 + a2~c2 + a3~c3〉
= a1 〈~u,~c1〉+ a2 〈~u,~c2〉+ a3 〈~u,~c3〉 = 0 + 0 + 0 = 0
(77)
which means that ~u is orthogonal to any vector ~v in col(A). We have
demonstrated that if ~u is in ker(AT ), it must also be in the orthogonal
complement of col(A), written as col(A)⊥.
Now suppose that ~u is in col(A)⊥. Then it is orthogonal to every vector in
col(A), in particular, to the individual column vectors ~c1, ~c2 and ~c3. From equa-
tion (76) we then see that AT~u = ~0, so ~u is in ker(AT ). We have demonstrated
that if ~u is in col(A)⊥, it must also be in ker(AT ). Combining this with the
previous result, we conclude that col(A)⊥ and ker(AT ) are the same thing! We
also know that col(A) and the range of A are the same. Therefore
ker(AT ) = col(A)⊥ = range(A)⊥
(HARD) Exercise: Use similar arguments to show that
ker(A) = row(A)⊥
ker(AT ) is the fourth fundamental subspace, called the left null space or
cokernel.
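To see this relationship concretely, here is a small Python/numpy sketch. The
matrix is an arbitrary rank-deficient example, and extracting a null space basis
from the SVD is a standard numerical trick (not covered in these notes):

    import numpy as np

    # Illustrative 3x3 matrix with rank 2: third column = first + second
    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])

    # The rows of Vt with (near-)zero singular values of A^T form an
    # orthonormal basis for ker(A^T)
    U, s, Vt = np.linalg.svd(A.T)
    rank = int(np.sum(s > 1e-10))
    null_AT = Vt[rank:]

    # Each vector in ker(A^T) is orthogonal to every column of A,
    # i.e. ker(A^T) = col(A)-perp
    print(np.allclose(null_AT @ A, 0.0))   # True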
12 Orthogonal and orthonormal bases
We now come back to the issue of basis. Recall that B is a basis for a vector
space V if every vector in V can be written as a linear combination of the
vectors in B and the vectors in B are linearly independent (none of them is a
linear combination of the others). If B is a basis for V and, furthermore, the
space V has an inner product (i.e. it is an inner product space), we can turn B
into a special type of basis. This new basis will have important and very useful
properties. Before showing you how to construct it, you will need to understand
a few basic concepts.
• Let S be a set of vectors in an inner product space. If each distinct pair
of vectors is orthogonal we call S an orthogonal set.
• If S is an orthogonal set and each vector in S has a norm of 1, then S is
called an orthonormal set.
Example 12.1 Given the vectors ~v1 = (2, 0,−1), ~v2 = (0,−1, 0) and ~v3 =
(2, 0, 4) in R3
(a) Show that they form an orthogonal set with the standard inner product but
do not form an orthonormal set.
(b) Turn them into an orthonormal set ~u1, ~u2 and ~u3.
(a) To show that they form an orthogonal set, we have to demonstrate that each
distinct pair is orthogonal.
〈~v1, ~v2〉 = 2× 0 + 0× (−1) + (−1)× 0 = 0
〈~v1, ~v3〉 = 2× 2 + 0× 0 + (−1)× 4 = 0
〈~v2, ~v3〉 = 0× 2 + (−1)× 0 + 0× 4 = 0
(78)
Exercise: Why didn’t we compute 〈~v2, ~v1〉, 〈~v3, ~v1〉 and 〈~v3, ~v2〉?
Now, to be an orthonormal set, the norms (magnitudes) of ~v1, ~v2 and ~v3 have
to be 1. Let’s compute them
‖~v1‖ = √〈~v1, ~v1〉 = √(22 + 02 + (−1)2) = √5 ✗
‖~v2‖ = √〈~v2, ~v2〉 = √(02 + (−1)2 + 02) = 1 ✔
‖~v3‖ = √〈~v3, ~v3〉 = √(22 + 02 + 42) = √20 = 2√5 ✗
(79)
(b) Most of the work is done. All we have to do is divide each vector by its norm
~u1 = ~v1/‖~v1‖ = (1/√5)(2, 0, −1) = (2/√5, 0, −1/√5)
~u2 = ~v2/‖~v2‖ = (0, −1, 0)
~u3 = ~v3/‖~v3‖ = (1/(2√5))(2, 0, 4) = (1/√5, 0, 2/√5)
(80)
Exercise: Verify that the norms of these vectors are 1 and that they are orthog-
onal.
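Here is a quick numerical check of this example (a Python/numpy sketch,
illustrative only):

    import numpy as np

    v1 = np.array([2.0, 0.0, -1.0])
    v2 = np.array([0.0, -1.0, 0.0])
    v3 = np.array([2.0, 0.0, 4.0])

    # (a) the pairwise inner products are all zero (an orthogonal set) ...
    print(v1 @ v2, v1 @ v3, v2 @ v3)   # 0.0 0.0 0.0

    # ... but the norms are not all 1, so the set is not orthonormal
    print([np.linalg.norm(v) for v in (v1, v2, v3)])   # [2.236..., 1.0, 4.472...]

    # (b) dividing each vector by its norm gives the orthonormal set
    u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
    print([np.linalg.norm(u) for u in (u1, u2, u3)])   # [1.0, 1.0, 1.0]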
Example 12.2 The standard basis vectors in Rn form an orthonormal set with
the standard inner product. For example, ~e1 = (1, 0, 0), ~e2 = (0, 1, 0) and
~e3 = (0, 0, 1) in R3. Exercise: Compute the norms of these vectors and their pairwise
inner products to show that they form an orthonormal set.
There is a special property of orthogonal/orthonormal sets that will come in
very handy
If S is an orthogonal set of vectors in an inner product space, then S is also
a set of linearly independent vectors
How can we show this? Let S = {~v1, ~v2, ...., ~vn} be the set of vectors in question.
We know they are orthogonal. Let’s recall the definition of linear independence:
The vectors ~v1, ~v2, ...., ~vn are linearly independent if the only way to get
c1 ~v1 + c2~v2 + ....+ cn~vn = ~0 (81)
is by having all the numbers c1, c2, ...cn equal to zero. This is equivalent to
saying that no vector can be a linear combination of the others. Let’s now take
the inner product of both sides of (81) with any of the vectors, let’s say ~v1
〈(c1~v1 + c2~v2 + .... + cn~vn), ~v1〉 = 〈~0, ~v1〉        (82)
The inner product has to satisfy equation (72) (called “additivity”), which gives
us a way to simplify the left hand side of equation (82)
〈(c1~v1 + c2~v2 + ....+ cn~vn), ~v1〉
= 〈c1~v1, ~v1〉+ 〈c2~v2, ~v1〉+ .....+ 〈cn~vn, ~v1〉
= c1 〈~v1, ~v1〉+ c2 〈~v2, ~v1〉+ .....+ cn 〈~vn, ~v1〉
= c1 〈~v1, ~v1〉
(83)
What happened to all the terms after c1 〈~v1, ~v1〉? Remember that the set S =
{~v1, ~v2, ...., ~vn} is orthogonal. Therefore, the inner product of two distinct vectors
is zero so the only nonzero term in the third line of (83) is c1 〈~v1, ~v1〉.
The right hand side of equation (82) is obviously zero, so we end up with
c1 〈~v1, ~v1〉 = 0
Now 〈~v1, ~v1〉 = ‖~v1‖2 > 0 unless ~v1 is the zero vector, which it isn’t. Therefore,
we must have c1 = 0. If we perform the same procedure with ~v2 instead of ~v1,
we will get c2 = 0, and so on with all the other scalars. Therefore, the set S is
linearly independent.
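For any particular orthogonal set this result is also easy to check numerically:
stack the vectors as the rows of a matrix and confirm that the rank equals the
number of vectors. A numpy sketch (illustrative) using the set from example 12.1:

    import numpy as np

    # The orthogonal set of example 12.1, stacked as rows
    S = np.array([[2.0, 0.0, -1.0],
                  [0.0, -1.0, 0.0],
                  [2.0, 0.0, 4.0]])

    # Linear independence <=> rank equals the number of vectors
    print(np.linalg.matrix_rank(S) == S.shape[0])   # True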
The great thing about having an orthogonal/orthonormal basis for a space V
is that we can easily find the coordinates of any vector in V with respect to this basis.
Remember that the coordinates are the numbers multiplying the basis vectors
in the linear combination: if S = {~v1, ~v2, ...., ~vn} is the orthogonal/orthonormal
basis for V , then any vector ~u (in V ) can be written as
~u = c1~v1 + c2~v2 + ....+ cn~vn
Let’s take the inner product of both sides with ~v1 (same as the procedure above)
〈~u,~v1〉 = 〈(c1~v1 + c2~v2 + ....+ cn~vn), ~v1〉
= c1 〈~v1, ~v1〉+ c2 〈~v2, ~v1〉+ .....+ cn 〈~vn, ~v1〉
= c1 〈~v1, ~v1〉
(84)
Since we know ~u and we know ~v1 we can find c1

c1 = 〈~u,~v1〉/〈~v1, ~v1〉 = 〈~u,~v1〉/‖~v1‖2

using the definition of the norm. Similarly

c2 = 〈~u,~v2〉/‖~v2‖2,   c3 = 〈~u,~v3〉/‖~v3‖2,   ......,   cn = 〈~u,~vn〉/‖~vn‖2
Therefore, we can write the vector ~u as

~u = (〈~u,~v1〉/‖~v1‖2)~v1 + (〈~u,~v2〉/‖~v2‖2)~v2 + .... + (〈~u,~vn〉/‖~vn‖2)~vn        (85)

If {~v1, ~v2, ....., ~vn} is an orthonormal basis, then

~u = 〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vn〉~vn        (86)
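Here is a Python/numpy sketch of equation (86) (illustrative; it reuses the
orthonormal set built in example 12.1 and an arbitrary vector ~u):

    import numpy as np

    # Orthonormal basis from example 12.1, stacked as the rows of V
    V = np.array([[2.0, 0.0, -1.0],
                  [0.0, -1.0, 0.0],
                  [2.0, 0.0, 4.0]])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # normalise each row

    u = np.array([1.0, 2.0, 3.0])   # an arbitrary vector to expand

    c = V @ u                      # coordinates c_i = <u, v_i>, equation (86)
    print(np.allclose(c @ V, u))   # True: u = c1 v1 + c2 v2 + c3 v3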
Figure 4: The orthogonal projection of a vector ~u in R3 on the xy plane W (example
13.1). In the figure, ~u = (2, 2, 1), projW ~u = (2, 2, 0) and ~v = (0, 0, 1); the angle
at Q is π/2.
13 Orthogonal projections
We now introduce the idea of “orthogonal projections”. Let’s look at a simple
example
Example 13.1 Let’s take the vector ~u = (2, 2, 1) = 2~e1+2~e2+~e3 in R3. We can
define a subspace W of R3 as that space with all vectors of the form ~q = (q1, q2, 0),
where q1 and q2 are any scalars. This is nothing more than those vectors in R3
that lie in the xy plane. They are linear combinations of ~e1 and ~e2.
The orthogonal projection of ~u on the xy plane W is the vector projW ~u =
(2, 2, 0). What is this exactly? Basically, what we do is drop a straight line from
the point P in Figure 4 to the xy plane, landing at a point Q. The line vector
~v from P to Q has to be perpendicular to the xy plane. The only possibility for
this is ~v = (0, 0, 1), i.e., it is parallel to the z axis. The vector from O to Q is
the orthogonal projection. We write it as projW ~u. Notice that it lies in W (the
xy plane).
Why do we call it orthogonal? Well, there is the obvious reason that the line
we drop from P to Q is perpendicular (orthogonal) to the xy plane. Notice also
that the vector ~v is orthogonal to every vector ~q = (q1, q2, 0) in W (the xy plane):

〈~v, ~q〉 = 〈(0, 0, 1), (q1, q2, 0)〉 = 0
Therefore, ~v is in the orthogonal complement W⊥ of W (see example 11.11).
The orthogonal projection projW ~u, on the other hand, is in W , and
~v + projW ~u = (0, 0, 1) + (2, 2, 0) = (2, 2, 1) = ~u
What we have managed to do is decompose ~u into two parts, one in W⊥ and the
other in W . The two parts are orthogonal to each other. We can get these two
parts by splitting the linear combination of orthogonal basis vectors
~u = (2~e1 + 2~e2) + ~e3,   where 2~e1 + 2~e2 = projW ~u (in W ) and ~e3 = ~v (in W⊥)
Finally, we can see from Figure 4 that the norm of ~v is the shortest distance
between P and the plane W . If we wanted to approximate the vector ~u using only
the basis vectors in W (~e1 and ~e2), projW ~u would be the best approximation.
Now this is all well and good, but what if we have a vector in a general Rn space
and we want to approximate it by a vector in a general subspace of Rn? For
instance, in the above example, rather than choosing the subspace as the xy
plane we could have chosen another plane through the origin, such as
2x + 3y − z = 0. We would then have to approximate the vector ~u by a linear
combination of basis vectors that describe this plane in order to obtain the
orthogonal projection.
Let ~u be a vector in Rn endowed with the standard inner product. Let W
be a subspace of Rn with an orthogonal basis {~v1, ~v2, .....~vk}, where k ≤ n.
The orthogonal projection of ~u on W is given by
projW ~u = (〈~u,~v1〉/‖~v1‖2)~v1 + (〈~u,~v2〉/‖~v2‖2)~v2 + .... + (〈~u,~vk〉/‖~vk‖2)~vk        (87)

If {~v1, ~v2, ....., ~vk} is an orthonormal basis for W , then

projW ~u = 〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vk〉~vk        (88)
• projW ~u is in W and the vector ~v = (~u−projW ~u) is in W⊥. The vectors
projW ~u and ~v are, therefore, orthogonal.
• The shortest “distance” between the vector ~u and the subspace W is the
norm (magnitude) of ~v: ‖~v‖ = shortest distance between ~u and W .
• Of all the vectors in the subspace W , the vector projW ~u is the best
approximation to ~u.
Most of these facts are suggested by example 13.1, but we haven’t quite shown
that they hold in the general case. Let’s start with the claim that the vectors
~v = (~u − projW ~u) and projW ~u are orthogonal. To simplify the notation
let’s assume that the basis {~v1, ~v2, .....~vk} is orthonormal, i.e. all the ~vi’s have
‖~vi‖ = 1. Then
〈~v, projW ~u〉 = 〈(~u − projW ~u), projW ~u〉
= 〈~u, projW ~u〉 − 〈projW ~u, projW ~u〉
= 〈~u, 〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vk〉~vk〉 − 〈projW ~u, projW ~u〉   (using (88))
= 〈~u, 〈~u,~v1〉~v1〉 + .... + 〈~u, 〈~u,~vk〉~vk〉 − 〈projW ~u, projW ~u〉
= 〈~u,~v1〉〈~u,~v1〉 + .... + 〈~u,~vk〉〈~u,~vk〉 − 〈projW ~u, projW ~u〉
= 〈~u,~v1〉2 + 〈~u,~v2〉2 + .... + 〈~u,~vk〉2 − 〈projW ~u, projW ~u〉
= 〈projW ~u, projW ~u〉 − 〈projW ~u, projW ~u〉 = 0
(89)
so they are indeed orthogonal.
Exercise: Repeat this procedure for an orthogonal (but not orthonormal)
basis for W .
Now, projW ~u clearly lies in W by the way it is defined (a linear combina-
tion of the basis vectors in W ). How do we show that ~v is in W⊥? If it is,
then ~v is orthogonal to every vector in W . Since every vector in W is a lin-
ear combination of the vectors in {~v1, ~v2, ...~vk}, we just need to show that ~v
is orthogonal to each of these basis vectors (why?). Again, let’s assume they
are orthonormal. We choose any one of them, say ~v1, and take the inner product
〈~v,~v1〉 = 〈(~u − projW ~u), ~v1〉
= 〈~u,~v1〉 − 〈projW ~u,~v1〉
= 〈~u,~v1〉 − 〈〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vk〉~vk, ~v1〉
= 〈~u,~v1〉 − (〈〈~u,~v1〉~v1, ~v1〉 + 〈〈~u,~v2〉~v2, ~v1〉 + .... + 〈〈~u,~vk〉~vk, ~v1〉)
= 〈~u,~v1〉 − (〈~u,~v1〉〈~v1, ~v1〉 + 〈~u,~v2〉〈~v2, ~v1〉 + .... + 〈~u,~vk〉〈~vk, ~v1〉)
= 〈~u,~v1〉 − 〈~u,~v1〉〈~v1, ~v1〉
= 〈~u,~v1〉 − 〈~u,~v1〉 = 0
(90)
We can do the same with all the basis vectors.
Exercise: Repeat this procedure for an orthogonal (but not orthonormal)
basis for W .
Now onto the statement about “shortest distance” and “best approximation”.
In R2 and R3 (as you can see in Figure 4), the vector ~v takes us from the point
P to the closest point on the subspace W because the shortest distance between
two points is a straight line! It is essentially this concept that we want to
generalise for higher dimensions.
Let’s restate clearly what we want to do: find the vector in W that gives us
the best approximation to a general vector ~u in Rn. We are claiming that this
Figure 5: Illustration of the Pythagorean theorem, c2 = a2 + b2.
vector is projW ~u. Let’s start by choosing any vector ~w in W that is NOT the
same as projW ~u. We can write (a simple mathematical trick that will help us)
~u− ~w = (~u− projW ~u) + (projW ~u− ~w)
The vector (projW ~u− ~w) is a combination of (basis) vectors in W and so belongs
to W itself. We already know that the vector ~v = ~u − projW ~u is in W⊥.
Therefore (~u− projW ~u) and (projW ~u− ~w) are orthogonal.
To proceed, we look at a familiar concept: the Pythagorean theorem
for a right triangle, demonstrated in Figure 5. For two vectors ~a and ~b in R2
or R3 that are at right angles (orthogonal), the Pythagorean theorem becomes
|~a + ~b|2 = |~a|2 + |~b|2. For two orthogonal vectors ~a and ~b in a general inner
product space, the equivalent theorem is
‖~a+~b‖2 = ‖~a‖2 + ‖~b‖2
Putting ~a = (~u− projW ~u) and ~b = (projW ~u− ~w) in this formula we get
‖~u − ~w‖2 = ‖~u − projW ~u‖2 + ‖projW ~u − ~w‖2
> ‖~u − projW ~u‖2 because ~w ≠ projW ~u
Therefore
‖~u− projW ~u‖2 < ‖~u− ~w‖2 for all vectors ~w in W , except projW ~u
In turn, this means that
• The shortest “distance” between the vector ~u and W is the norm (magnitude)
of the vector ~v = ~u− projW ~u.
• projW ~u gives us the “best approximation” to ~u by a vector in W .
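These facts are easy to check numerically for a concrete case. A Python/numpy
sketch (the basis vectors and ~u below are arbitrary illustrative choices; note
that 〈~v1, ~v2〉 = 0):

    import numpy as np

    # An orthogonal (not orthonormal) basis for a 2D subspace W of R^3
    v1 = np.array([1.0, 1.0, 0.0])
    v2 = np.array([1.0, -1.0, 2.0])   # <v1, v2> = 0

    u = np.array([2.0, 0.0, 1.0])

    # Equation (87): proj_W u = sum over i of (<u, v_i> / ||v_i||^2) v_i
    proj = sum((u @ vi) / (vi @ vi) * vi for vi in (v1, v2))

    # The residual v = u - proj_W u lies in W-perp ...
    r = u - proj
    print(np.isclose(r @ v1, 0.0), np.isclose(r @ v2, 0.0))   # True True

    # ... and proj_W u beats any other vector w in W, e.g. w = v1:
    print(np.linalg.norm(u - proj) < np.linalg.norm(u - v1))  # True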
One final note. We have shown that given any subspace W of Rn for which
we have an orthogonal basis, we can write any vector in Rn as a sum of
a vector in W and a vector in W⊥. The dimension of Rn has to be n.
Therefore the dimensions (number of basis vectors) of W and W⊥ have to sum
to n:
dimW + dimW⊥ = n
We have essentially partitioned Rn into the two spaces W and W⊥. We say,
therefore, that Rn is the direct sum of W and W⊥. This is written as
Rn = W ⊕W⊥
14 The Gram-Schmidt process
The first important application of orthogonal projections is the Gram-Schmidt
process. We will meet another in the next section.
Suppose we have an arbitrary (non-orthogonal) basis for Rn. The basis
vectors are linearly independent but we would prefer them to be orthogonal too.
It turns out that this is always possible: given n linearly independent vectors for
Rn we can turn them into an orthogonal basis. The way we do this is called the
Gram-Schmidt process. We will illustrate the procedure using two examples.
Throughout this section, the standard inner product on Rn will be assumed.
Figure 6: An illustration of the Gram-Schmidt process in R2 (see example 14.1).
Here, the orthogonal basis ~u1 and ~u2 is constructed from the old basis ~v1 and ~v2
by setting ~u1 = ~v1 and ~u2 = ~v2 − projW ~v2, where W = span {~u1}.
Example 14.1 Consider the basis S = {~v1, ~v2} for R2, where ~v1 = (3, 1) and
~v2 = (2, 2). This may not be a particularly convenient basis, unlike the standard
basis, where the vectors ~e1 and ~e2 are perpendicular (orthogonal). We’ve seen
how easy it is in that case to write down the coordinates for a general vector.
What if we could turn these linearly independent vectors into an orthogonal
basis {~u1, ~u2}? It would then resemble the standard basis, but with a different
orientation. In fact we can!
We start by putting ~u1 = ~v1. This will be our first orthogonal basis vector.
We now need to construct a second vector that is orthogonal to ~u1. We could
do this by simply looking at Figure 6 and making the simple observation that in
order for ~u2 to be orthogonal to ~u1 = ~v1, it must be of the form ~u2 = t(1,−3)
for any number t. However, we want a systematic way of doing it because there
are generally many more than two basis vectors.
Let’s form the subspace W = span {~u1} of R2. W is the set of all the linear
combinations (in this case multiples) of ~u1. We can project the vector ~v2 from
the original basis S onto W . This orthogonal projection, projW ~v2, is shown in
Figure 6. It is nothing more than the component of ~v2 that points in the direction
of ~u1. It is in the space W because it is a multiple of ~u1.
In the previous section we saw that any vector ~u in Rn can be written as a
sum of two vectors: (i) the projection projW ~u of ~u onto a subspace W (which
has an orthogonal basis) and (ii) the vector ~u − projW ~u in W⊥. We can ap-
ply this information here by putting ~u = ~v2 and W = span {~u1}. The vector
~v2 − projW ~v2 is orthogonal to every vector in W = span {~u1}, in particular to
~u1. So we set ~u2 = ~v2 − projW ~v2 to get a vector orthogonal to ~u1 = ~v1. The
formula for the projection is given by equation (87)

projW ~v2 = (〈~v2, ~u1〉/‖~u1‖2)~u1   (only one basis vector ~u1 in W )        (91)
This gives

~u2 = ~v2 − (〈~v2, ~u1〉/‖~u1‖2)~u1 = ~v2 − (4/5)~u1 = (2/5)(−1, 3)        (92)
Exercise: Check that ~u1 and ~u2 are orthogonal.
Example 14.2 Given the basis ~v1 = (2,−1, 0), ~v2 = (1, 0,−1) and ~v3 =
(3, 7,−1), find an orthogonal basis {~u1, ~u2, ~u3} for R3.
As in the last example we set ~u1 = ~v1 and form the subspace W1 = span {~u1}
of R3. Again, we project ~v2 on W1 to get the component of ~v2 in the direction
of ~u1. The component of ~v2 in W1⊥ then gives us ~u2

~u2 = ~v2 − projW1 ~v2
= ~v2 − (〈~v2, ~u1〉/‖~u1‖2)~u1
= (1, 0, −1) − (2/5)(2, −1, 0)   (〈~v2, ~u1〉 = 2 and ‖~u1‖2 = 5)
= (1/5)(1, 2, −5)
(93)
Now what? We repeat the previous steps. We want a vector orthogonal to both
~u1 and ~u2. The subspace W2 = span {~u1, ~u2} of R3 consists of all linear com-
binations of ~u1 and ~u2. Our task is, therefore, to find a vector ~u3 that lies in
W2⊥, the subspace of all vectors that are orthogonal to both ~u1 and ~u2. So let’s
project ~v3 on W2 to get a vector projW2~v3 in the subspace W2. Then, the vector
~v = ~v3 − projW2 ~v3 lies in W2⊥. The vector projW2 ~v3 is again given by equation
(87), so ~u3 is
~u3 = ~v3 − projW2 ~v3
= ~v3 − ((〈~v3, ~u1〉/‖~u1‖2)~u1 + (〈~v3, ~u2〉/‖~u2‖2)~u2)
= ~v3 − ((−1/5)~u1 + ((22/5)/(6/5))~u2)
= (8/3)(1, 2, 1)
(94)
Exercise: Check that ~u1, ~u2 and ~u3 are mutually orthogonal.
Exercise: How do we know that the orthogonal set of vectors we have con-
structed in this example, {~u1, ~u2, ~u3}, is actually a basis for R3? In other words,
does this set of vectors span the whole of R3?
HINT: Think about (i) the number of basis vectors required to span Rn, and
(ii) the relationship between linear independence and orthogonality.
In the two examples above we have developed a procedure for turning a general
basis into an orthogonal basis. We now summarise the procedure.
Let ~v1, ~v2, ....., ~vn be a set of linearly independent vectors for Rn. Then an
orthogonal basis ~u1, ~u2, ....., ~un for Rn can be found by the following Gram-
Schmidt process

Step 1.   ~u1 = ~v1

Step 2.   ~u2 = ~v2 − (〈~v2, ~u1〉/‖~u1‖2)~u1

Step 3.   ~u3 = ~v3 − ((〈~v3, ~u1〉/‖~u1‖2)~u1 + (〈~v3, ~u2〉/‖~u2‖2)~u2)

...

Step n.   ~un = ~vn − ((〈~vn, ~u1〉/‖~u1‖2)~u1 + (〈~vn, ~u2〉/‖~u2‖2)~u2 + ...... + (〈~vn, ~un−1〉/‖~un−1‖2)~un−1)
Note that to obtain an orthonormal basis from the new orthogonal basis, we
simply divide each new member of the orthogonal basis by its norm.
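The whole procedure is only a few lines of code. Here is a Python/numpy sketch
(a direct transcription of steps 1 to n above; the function name gram_schmidt is
our own, and for serious numerical work a QR factorisation or modified
Gram-Schmidt is more robust):

    import numpy as np

    def gram_schmidt(vectors):
        """Turn a list of linearly independent vectors into an orthogonal basis."""
        basis = []
        for v in vectors:
            u = v.astype(float)
            for b in basis:
                # subtract the projection of v along each earlier basis vector
                u -= (v @ b) / (b @ b) * b
            basis.append(u)
        return basis

    # The basis from example 14.2
    us = gram_schmidt([np.array([2.0, -1.0, 0.0]),
                       np.array([1.0, 0.0, -1.0]),
                       np.array([3.0, 7.0, -1.0])])

    print(us[1])   # [ 0.2  0.4 -1. ]       = (1/5)(1, 2, -5)
    print(us[2])   # [ 2.66...  5.33...  2.66... ] = (8/3)(1, 2, 1)

    # Dividing each u_i by its norm gives an orthonormal basis (example 14.3)
    ws = [u / np.linalg.norm(u) for u in us]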
Example 14.3 Convert the orthogonal basis found in example 14.2 into an or-
thonormal basis {~w1, ~w2, ~w3}.
The orthogonal basis is
~u1 = (2, −1, 0),   ~u2 = (1/5)(1, 2, −5),   ~u3 = (8/3)(1, 2, 1)

We compute

‖~u1‖ = √5,   ‖~u2‖ = √30/5,   ‖~u3‖ = 8√6/3

which yields

~w1 = ~u1/‖~u1‖ = (1/√5)(2, −1, 0)
~w2 = ~u2/‖~u2‖ = (1/√30)(1, 2, −5)
~w3 = ~u3/‖~u3‖ = (1/√6)(1, 2, 1)
Exercise: (a) Given the basis ~v1 = (1, 1, 1, 1), ~v2 = (1, 1, 1, 0), ~v3 = (1, 1, 0, 0)
and ~v4 = (1, 0, 0, 0) for R4, construct an orthogonal basis for R4. (b) Convert
the orthogonal basis found into an orthonormal basis.
15 Least squares approximations
We now come to a second important application of orthogonal projections. Re-
member the temperature data example 1.2 in which we wanted to fit a line to
the data but ended up with a system of equations that had more equations
than unknowns? This type of system is called “overdetermined”. The other
way round, when we have more unknowns than equations, the system is called
“underdetermined”. When a system has no exact solution it is called inconsistent;
overdetermined systems are typically inconsistent.
Suppose that we have an inconsistent system of n equations in m unknowns.
In matrix form, the system is:
A~u = ~b
for an n × m matrix A and vector ~b in Rn. There is no solution ~u (in Rm) to
this equation, i.e., there is no vector ~u such that A~u = ~b. Perhaps, however, we
this equation, i.e., there is no vector ~u such that A~u = ~b. Perhaps, however, we
can look for a vector ~u such that A~u will be close to ~b. To this end, let’s define
a residual ~r as follows:
~r = ~b−A~u (95)
~r is obviously a vector (both ~b and A~u are vectors). It is a measure of how close
a vector ~u will be to satisfying the equation. What we do is look for the vector
~u that makes the norm (magnitude) of ~r as small as possible. This leads to the
least squares solution.
Given an inconsistent system A~u = ~b, the vector ~ul that makes
‖~r‖ = ‖~b−A~ul‖
as small as possible is called the least squares solution.
Okay, this has given us some sort of criterion, but how exactly do we find this
vector ~ul? Recall example 10.1 in which it was shown that the multiplication
of a vector by a matrix results in a linear combination of the matrix column
vectors, i.e., all outputs A~u are in col(A), the column space.
Now put W = col(A). A~u will be in W for any ~u. Indeed, the set of outputs
A~u for all choices of ~u in Rm will span W ; any linear combination of the column
vectors is possible for the right choice of ~u = (u1, u2, ..., um). Therefore:
range(A) = col (A) = W
At this point let’s state the least squares problem in a different way:
Given an inconsistent system A~u = ~b for some ~b in Rn, find the vector A~ul
in W = col (A) that is the closest approximation to ~b, i.e.,
‖~b−A~ul‖ < ‖~b−A~u‖
for all possible choices of A~u.
Let’s restate a result from section 13 on orthogonal projections. Suppose W is
a subspace of Rn and ~x is a vector in Rn. The closest approximation to ~x
by a vector in W is given by projW~x, the orthogonal projection of ~x
on W. projW~x is in W by definition and ~x− projW~x is in W⊥.
Let’s put W = col (A) and swap ~x for A~u (all vectors in the column space
(range) of A). Then the closest approximation to ~b by a vector in W is projW~b.
This is the vector A~ul that we want:
A~ul = projW~b
We could find A~ul this way and then solve A~ul = projW~b for ~ul, but there is a
better way to solve the problem.
The least squares solution ~ul to the problem A~u = ~b also satisfies the
normal system:
ATA~ul = AT~b (96)
This system is always consistent. If the equation A~x = ~0 has only the
trivial solution ~x = ~0, a unique solution to the least squares problem is
~ul = (ATA)−1AT~b
Before we move onto an example, let’s see why the above statements are true.
We’ve determined that A~ul = projW~b, which is a vector in W = col (A). We
can always find ~ul = (u1, u2, ..., um), with the right choice of coordinates, such
that A~ul gives us the projection vector we want. This means we always have a
solution.
The residual, given by equation (95), satisfies:
~r = ~b−A~ul = ~b− projW~b
The vector on the right-hand side is in W⊥ = col (A)⊥, as stated above. In ex-
ample 11.12 we showed that for a matrix A, col (A)⊥ is the same as ker (AT ), the
null space of AT . So, for the least squares solution, the residual is in ker (AT ),
which means that AT~r = ~0, i.e.,
AT~r = AT (~b−A~ul) = ~0 or AT~b = ATA~ul
If we had two solutions ~ul,1 and ~ul,2, then both would satisfy A~ul = projW~b,
so we would have:

A~ul,1 = A~ul,2 or A(~ul,1 − ~ul,2) = ~0

But this has only the trivial solution ~ul,1 − ~ul,2 = ~0, so ~ul,1 = ~ul,2. In other
words, any two least squares solutions must coincide: the solution is unique.
Example 15.1 Use a least squares approximation to find the equation of the
line that will best approximate the points (x, y) = (−2, 65), (1, 20), (−7, 105)
and (5,−34).
The line will have the form y = ax+ b. If we put each of the x and y values
into y = ax + b we will get 4 equations for a and b (clearly too many!). The
system is overdetermined. It is written in matrix form as follows:
[ −2  1 ]            [  65 ]
[  1  1 ]  [ a ]  =  [  20 ]
[ −7  1 ]  [ b ]     [ 105 ]
[  5  1 ]            [ −34 ]
    A        ~u         ~b          (97)
The normal system (96) for the least squares solution is given by multiplying
both sides by the transpose of A:
[ −2  1  −7  5 ]  [ −2  1 ]            [ −2  1  −7  5 ]  [  65 ]
[  1  1   1  1 ]  [  1  1 ]  [ a ]  =  [  1  1   1  1 ]  [  20 ]
       AT         [ −7  1 ]  [ b ]            AT         [ 105 ]
                  [  5  1 ]    ~u                        [ −34 ]
                      A                                     ~b        (98)
which leads to a much simpler equation
[ 79  −3 ]  [ a ]     [ −1015 ]
[ −3   4 ]  [ b ]  =  [   156 ]        (99)
This is an easy system to solve. The answer is a ≈ −11.7 and b ≈ 30.2.
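We can confirm this with a few lines of Python/numpy (a sketch;
np.linalg.lstsq solves the same least squares problem directly):

    import numpy as np

    A = np.array([[-2.0, 1.0],
                  [ 1.0, 1.0],
                  [-7.0, 1.0],
                  [ 5.0, 1.0]])
    b = np.array([65.0, 20.0, 105.0, -34.0])

    # Solve the normal system (96): (A^T A) u = A^T b
    u_normal = np.linalg.solve(A.T @ A, A.T @ b)
    print(u_normal)   # [-11.70...  30.22...]

    # Cross-check with numpy's built-in least squares routine
    u_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(u_normal, u_lstsq))   # True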
Exercise Find the least squares solution to the following system: