SESA2021 Engineering Analysis:
Vector Spaces and Linear Algebra
Lecture Notes 2009/2010.
Lecturer: Dr A. A. Shah
School of Engineering Sciences, University of Southampton
Room 1027, Building 25
e-mail: A.Shah@soton.ac.uk
Copyright © 2010 University of Southampton
Contents
1 Introduction and example applications
2 Basic definitions and examples
3 Subspaces of vector spaces
4 Linear Transformations
5 Span
6 Linear independence
7 Basis and dimension
8 Changing the basis
9 Fundamental subspaces
10 Square matrices and systems of linear equations
11 Inner product spaces and orthogonality
12 Orthogonal and orthonormal bases
13 Orthogonal projections
14 The Gram-Schmidt process
15 Least squares approximations
Figure 1: A mechanical system with 2 masses and 3 springs (vibration example 1.1).
1 Introduction and example applications
In this course we will be concerned primarily with solving systems of linear
equations (including eigenvalue problems), which are difficult to avoid in any
aspect of engineering. These systems, which can be very large, can be written
as equations involving matrices and vectors. Let’s look at some examples in
which vectors, matrices and eigenvalues arise.
Example 1.1 Consider the system with 2 masses and 3 springs shown in Fig-
ure 1. We can use Newton’s second law along with Hooke’s law to write down a
system of equations for the displacements, x1 and x2, of the two masses
$$m\ddot{x}_1 + k_1 x_1 + k_2(x_1 - x_2) = 0$$
$$m\ddot{x}_2 + k_1 x_2 + k_2(x_2 - x_1) = 0 \tag{1}$$
where k1 and k2 are the spring constants. These equations can be written as
$$\ddot{x}_1 = -\left(\frac{k_1 + k_2}{m}\right) x_1 + \frac{k_2}{m}\, x_2$$
$$\ddot{x}_2 = \frac{k_2}{m}\, x_1 - \left(\frac{k_1 + k_2}{m}\right) x_2 \tag{2}$$
We could now write the system in matrix form by first introducing a “vector”
form of the solution: ~x = (x1, x2). Then
$$\underbrace{\begin{pmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{pmatrix}}_{\ddot{\vec{x}}} = \underbrace{\begin{pmatrix} -\beta & \alpha \\ \alpha & -\beta \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}}_{\vec{x}} \tag{3}$$
where β = (k1 + k2)/m and α = k2/m. The obvious thing to do is look for
oscillatory solutions that are of the form
$$\vec{x} = \vec{v}\, e^{i\omega t} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} e^{i\omega t} \tag{4}$$
where ω is the vibration frequency. The new vector ~v contains just constants, v1 and v2, which we would want to find. The variable part is in the e^{iωt}. Substituting (4) into (3) and cancelling the e^{iωt} terms on both sides gives us a new system of equations for ~v
$$-\omega^2 \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} -\beta & \alpha \\ \alpha & -\beta \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \qquad \text{or} \qquad A\vec{v} = -\omega^2 \vec{v} \tag{5}$$
This is an example of an eigenvalue problem, i.e., something of the form: “a
Figure 2: Output from an experiment in which temperature is measured with time. The objective is to get the best straight-line fit to the data (data fitting example 1.2).
transformation (in this case a matrix A) acting on an object (the vector ~v) and
giving us a constant (in this case −ω2) times the object”. We may now ask how
many solutions there are and what they look like. In this present case we would
be most interested in the frequencies of vibration and the corresponding solutions
(normal modes). It turns out that there are 2 frequencies because there are two
degrees of freedom.
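If you want to check this numerically, here is a minimal NumPy sketch. The values of m, k1 and k2 are assumed purely for illustration; they are not taken from Figure 1.

```python
import numpy as np

# Assumed, illustrative values (not from the notes): m = 1, k1 = 4, k2 = 1
m, k1, k2 = 1.0, 4.0, 1.0
beta, alpha = (k1 + k2) / m, k2 / m

A = np.array([[-beta, alpha],
              [alpha, -beta]])

# Equation (5) says A v = -omega^2 v, so each eigenvalue of A is -omega^2
lams, modes = np.linalg.eig(A)
omegas = np.sqrt(-lams)
print(omegas)   # the two vibration frequencies (here 2 and sqrt(6))
print(modes)    # the columns are the corresponding normal modes
```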
Example 1.2 Suppose you have run an experiment and collected some data that
you would like to fit to a line or curve. Let’s say you’ve taken measurements
of temperature against time and expect a linear rise in temperature but due to
experimental error, not all points will fall nicely onto a straight line, as seen
in Figure 2. Let’s say there are 4 temperature measurements T1 to T4 taken at
times t1 to t4, and we want to represent temperature as T = a+ bt. We need to
find a and b. If we take the two points T1 and T2 at times t1 and t2 we have
T1 = a + bt1,   T2 = a + bt2
which we can rearrange to find a and b. The problem is that if we use another
two points we will get different values of a and b. Let’s define the “vector”
(T1, T2, T3, T4). We need to find one value for a and one for b. The matrix
equation we need to solve is
$$\begin{pmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{pmatrix} = \begin{pmatrix} a + bt_1 \\ a + bt_2 \\ a + bt_3 \\ a + bt_4 \end{pmatrix} = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \tag{6}$$
Notice, however, that we have more equations than variables (a and b)! This is an example of an "overdetermined system". How do we solve the system? Well,
we can’t solve it exactly but what we can do is find the “best fit” in some sense.
One method we will look at to do this is called “least squares”.
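To preview where we are heading, here is a short NumPy sketch of the least-squares solution to a system like (6). The measurements below are hypothetical, made up purely for illustration.

```python
import numpy as np

# Hypothetical data: four temperature readings at four times
t = np.array([0.0, 1.0, 2.0, 3.0])
T = np.array([1.1, 2.9, 5.2, 6.8])

# The 4 x 2 matrix of equation (6): a column of ones and a column of times
M = np.column_stack([np.ones_like(t), t])

# Solve the overdetermined system for (a, b) in the least-squares sense
(a, b), *_ = np.linalg.lstsq(M, T, rcond=None)
print(a, b)   # intercept and slope of the best straight-line fit
```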
Example 1.3 Partial differential equations (conservation laws) from aerody-
namics are usually solved on a computer. The numerical solutions are con-
structed first by “discretising” the equations using the finite difference, finite
volume or finite element methods in space together with a time-stepping proce-
dure if the equations are unsteady. This means that you approximate the solu-
tions at discrete points in time and space and try to find the solutions at those
points. By discretising the equations this way you will end up with large matrix
systems. The more points (finer mesh) you choose and the higher the dimension, the larger the systems become. There are a great many ways of solving
these systems depending on the accuracy and speed required and the stability of
the methods. To understand these methods and to choose the most appropriate
(in, for example, a CFD code) for a given problem you need to understand some
linear algebra (theory of matrix systems).
We first need to develop the ideas of “vectors” and “transformations” (for our
purposes these are matrices) so that you are familiar with the language used to
describe matrix systems.
Figure 3: Vectors in the plane R2 (left) and in space R3 (right). For both these spaces we can represent vectors graphically but in higher dimensions this is obviously not possible.
2 Basic definitions and examples
A vector is a quite general object. It doesn’t have to be a directed line segment
in space or in the plane, as shown in Figure 3. In fact we can have vectors in
higher dimensions, as shown in example 1.2. More on this below.
When we look at a particular set of vectors we will call it a vector space
and give it a symbol like V . The individual vectors will be given symbols like ~u
or ~v .
So what is a “vector space” and what isn’t? Let’s look at a familiar example.
Example 2.1 Euclidean n spaces, denoted Rn, are vector spaces (we will see
why in a second). In this course we will deal almost exclusively with these vector
spaces.
There are two that you are familiar with: R2 (two-dimensional space) and R3 (three-dimensional space). An example of a vector in R2 is ~u = (2, 3), which is
sometimes written as 2~e1+3~e2. The numbers 2 and 3 are called the “coordinates”
of the vector ~u = (2, 3) in the “standard basis vectors” ~e1 = (1, 0) and ~e2 = (0, 1).
We will look at these concepts in detail later on. Vectors in R2 and R3 can be represented graphically as shown in Figure 3.
More generally we can define vectors that have n coordinates. These are
vectors in the vector space Rn. For example, the vector (T1, T2, T3, T4) in example
1.2 is in R4 and (1, 0,−1, 2, 4, 0, 1) is in R7. We are not able to visualise these
in a graph.
To construct a vector space we basically take a bunch (set) of vectors and
define ways of adding them together and multiplying them by numbers (scalars).
Let’s recall some basic facts about the familiar vectors in the Euclidean 2 and
3 spaces R2 and R3. These will help us to understand what a vector space is
precisely.
(1) On R2 (the plane) we can "add" two vectors as follows:
(1,−1) + (2, 5) = (1 + 2,−1 + 5) = (3, 4) (7)
i.e., we just add the individual coordinates. We can add vectors in any Rn
space in this way. Note that we have chosen to define “addition” in this way.
We could instead choose another way. By doing it as above, we have made
sure that the sum of two vectors in R2 is another vector in R2.
(2) On R2 we can multiply a vector by a scalar (number) as follows:
2(1, 3) = (2× 1, 2× 3) = (2, 6) (8)
where we just multiply each coordinate by the scalar (number) 2. We can
multiply vectors in any Rn space by a scalar in this way. Again, we have
defined “multiplication by a scalar” in a certain way. We could instead choose
another way. By doing it as above, we have made sure that multiplying a
vector in R2 by a scalar gives another vector in R2.
(3) Now that we have defined a way of adding vectors in Rn (add individual
components) and of multiplying them by scalars (multiply each component by
the scalar), it doesn’t matter which way round we add vectors in Rn or which
way round we multiply them by scalars. There are some obvious rules, such
as
(i) ~u + ~v = ~v + ~u   e.g. (2,−1) + (1, 0) = (1, 0) + (2,−1)
(ii) (c + k)~u = c~u + k~u   e.g. (3 + 2)(2,−1) = 3(2,−1) + 2(2,−1)
(9)
for any vectors ~u and ~v in Rn and any scalars c and k.
(4) In R2 we have a zero vector ~0, i.e. (0, 0). When we add ~0 to any vector, e.g.
(2,−1) + (0, 0) = (2,−1)
the vector doesn’t change.
In a general vector space V we have to define the way we add vectors and
multiply them by scalars. When constructing these definitions, we have to make
sure that the rules above for the familiar way of doing things in Rn are preserved.
For V to be a vector space:
• The way we “add” vectors in V has to lead to other vectors in V . We say
that V is closed with respect to addition if this is true.
• When a vector in V is multiplied by a scalar, the answer must be an-
other vector in V . We say that V is closed with respect to scalar
multiplication if this is true.
• The way we define addition and scalar multiplication of the vectors in V
has to preserve rules (9) and other similar rules.
• V has to have a zero vector and adding it to any vector should not change
that vector.
If just one of these requirements is not satisfied, V will NOT be a vector space.
Example 2.2 Let’s define vector addition in R2 in the usual way (add individ-
ual components), but instead of the usual scalar multiplication we will use
c~u = c(u1, u2) = (u1, cu2) (10)
i.e., we only multiply the second coordinate. Let’s try to satisfy the last rule in
equations (9) with any vector in R2 and any two scalars:
2 × (1, 1) + 3 × (1, 1) = (1, 2) + (1, 3) = (2, 5)
but
5 × (1, 1) = (1, 5) ≠ 2 × (1, 1) + 3 × (1, 1)
So, defining scalar multiplication this way does NOT lead to a vector space.
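A quick numerical check of this failure, with the modified scalar multiplication written out as a Python function (a sketch):

```python
import numpy as np

def odd_scale(c, u):
    # the scalar multiplication of equation (10): only the second
    # coordinate is multiplied by c
    return np.array([u[0], c * u[1]])

u = np.array([1.0, 1.0])
lhs = odd_scale(5, u)                       # 5 x (1,1) = (1,5)
rhs = odd_scale(2, u) + odd_scale(3, u)     # (1,2) + (1,3) = (2,5)
print(lhs, rhs, np.array_equal(lhs, rhs))   # [1. 5.] [2. 5.] False
```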
We can also treat functions and even more abstract objects as vectors in
vector spaces. In this course, however, we will not consider these types of spaces,
which are usually referred to as “function spaces”.
One last bit of notation. If V consists of vectors ~v1, ~v2, ~v3, ....., ~vn, we use
curly brackets as follows
V = {~v1, ~v2, ~v3, ....., ~vn}
to represent this set of vectors. For example, if we have a set of vectors consisting
of ~v1 = (1, 0) and ~v2 = (0, 1), we write
V = {(1, 0), (0, 1)}
3 Subspaces of vector spaces
For some vector spaces it is possible to take a subset W (i.e. some of the vectors)
of the original space V and obtain a new vector space using the same rules for
addition and scalar multiplication. We call W a subspace of V . There are
some very important subspaces we will meet later on.
It turns out that to be a subspace, we only need to make sure that the
subspace is closed with respect to addition and scalar multiplication, i.e.,
when we add vectors in W or multiply a vector in W by a scalar we get
another vector in W .
Example 3.1 Let W be the set of all vectors in R3 that are of the form (0, u2, u3),
i.e., the first coordinate is zero. Is this a subspace of R3 with the usual rules for
addition and scalar multiplication?
To find out we need to verify that addition and scalar multiplication of vectors
(0, u2, u3) lead to vectors of the form (0, u2, u3). This is the case (Exercise:
Check that it is), so the space consisting of such vectors is a subspace of R3.
Example 3.2 Let W be the set of all vectors (u1, u2) in R2 such that u1 ≥ 0,
i.e., the first coordinate is non-negative. Is this a subspace of R2 with the usual
rules for addition and scalar multiplication?
Multiply (u1, u2) by c < 0. We get (cu1, cu2), where the first coordinate is
negative. Therefore, this space is not closed with respect to scalar multiplica-
tion (multiplication by a negative scalar will give a vector that is not in W ).
Therefore, W is NOT a subspace of R2.
4 Linear Transformations
The idea of a transformation (or map) is that it takes a vector, say ~u in a
space V , and “transforms” or “maps” it into another vector A~u in a space W ,
which may or may not be the same as V . This is like a function f(x) taking a
number x and giving us another number y = f(x).
When we are dealing with the Euclidean n spaces, we can write a transfor-
mation as a matrix. In what sense is a matrix a transformation? Let’s take a
look at an example.
Example 4.1 Consider the following 3× 3 matrix
$$A = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 2 & -1 \\ 3 & -2 & 5 \end{pmatrix} \tag{11}$$
Let’s take the vector ~u = (1, 3, 0) in R3 and “transform” it with A (i.e. left
multiply it by A) into another vector ~b in R3
$$A\vec{u} = \underbrace{\begin{pmatrix} 2 & 0 & 1 \\ 1 & 2 & -1 \\ 3 & -2 & 5 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix}}_{\vec{u}} = \underbrace{\begin{pmatrix} 2 \\ 7 \\ -3 \end{pmatrix}}_{\vec{b}} \tag{12}$$
In this example, A takes a vector in R3 and transforms it by multiplication into
another vector in R3. We write A : R3 → R3 to signify this. This is pronounced
“A maps R3 to R3”. The output vector ~b is called the image of ~u under A.
In the general case, an n × m (n rows and m columns) matrix Anm takes any
vector in Rm and transforms it by multiplication into a vector in Rn, i.e., Anm : Rm → Rn.
• The domain of Anm is the set of inputs, which is Rm.
• The range of Anm is the set of all possible outputs (images) in Rn.
Let’s take a look at some more examples:
Example 4.2 Consider the following multiplication (transformation) of a vec-
tor ~u by a 4× 3 matrix A, which leads to another vector ~b
$$\underbrace{\begin{pmatrix} -1 & 5 & 1 \\ 0 & -4 & 3 \\ 9 & 2 & 1 \\ 3 & -7 & -3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}}_{\vec{u}\ \text{in}\ \mathbb{R}^3} = \underbrace{\begin{pmatrix} 2 \\ -7 \\ 19 \\ 2 \end{pmatrix}}_{\vec{b}\ \text{in}\ \mathbb{R}^4} \tag{13}$$
In this example we multiply a column vector in R3 (the domain) by A and get a
column vector in R4, so A : R3 → R4. The set of possible outputs in R4 is the
range of A. Clearly, ~b is in the range of A (it is one of the possible outputs).
Example 4.3 Consider the following multiplication of a vector by a 3× 2 ma-
trix A
$$\underbrace{\begin{pmatrix} 7 & 3 \\ -1 & 1 \\ 4 & -3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 1 \\ -2 \end{pmatrix}}_{\vec{u}\ \text{in}\ \mathbb{R}^2} = \underbrace{\begin{pmatrix} 1 \\ -3 \\ 10 \end{pmatrix}}_{\vec{b}\ \text{in}\ \mathbb{R}^3} \tag{14}$$
This time we multiply a column vector in R2 (the domain) by A and get a column
vector in R3, so A : R2 → R3. The set of possible outputs in R3 is the range of
A. ~b is in the range of A.
From the rules of matrix multiplication, we know that for any matrix A and
vectors ~u and ~v
A(~u+ ~v) = A~u+A~v and A(c~u) = cA~u (15)
where c is any number (scalar). These rules tell us that A preserves linear
combinations.
Example 4.4
$$A = \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \qquad \vec{u} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad \vec{v} = \begin{pmatrix} -1 \\ 0 \end{pmatrix} \tag{16}$$
Then
$$A\vec{u} + A\vec{v} = \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} + \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -2 \end{pmatrix} + \begin{pmatrix} -2 \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ -1 \end{pmatrix} \tag{17}$$
and
$$A(\vec{u} + \vec{v}) = A \begin{pmatrix} 1 - 1 \\ -1 + 0 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ -1 \end{pmatrix} = \begin{pmatrix} -3 \\ -1 \end{pmatrix} \tag{18}$$
Exercise: Check that A(5~u) = 5A~u, i.e., that A times 5~u is the same as 5 times
A~u.
Because matrices satisfy the rules (15) we call them linear transformations
(or linear maps).
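Here is a small NumPy sketch that checks the rules (15) for the matrix and vectors of example 4.4:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [-1.0, 1.0]])
u = np.array([1.0, -1.0])
v = np.array([-1.0, 0.0])

print(A @ u + A @ v)                           # [-3. -1.], as in (17)
print(A @ (u + v))                             # [-3. -1.], as in (18)
print(np.allclose(A @ (5 * u), 5 * (A @ u)))   # True: the exercise above
```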
5 Span
We want to be able to write all vectors in a space V as sums of some special
‘fundamental’ vectors. We will build up to this slowly over the next few sections.
Without perhaps knowing it, you have already done this in the Euclidean
spaces using the standard basis vectors in example 2.1. Let’s look at an
example.
Example 5.1 The standard basis vectors in R3 are
~e1 = (1, 0, 0) ~e2 = (0, 1, 0) ~e3 = (0, 0, 1)
You can write any vector in R3 as a linear combination of ~e1, ~e2 and ~e3.
This means that any vector is a constant times ~e1 plus a constant times ~e2 plus
a constant times ~e3. For example:
(2, 3, 1) = 2× ~e1 + 3× ~e2 + 1× ~e3
= 2(1, 0, 0) + 3(0, 1, 0) + (0, 0, 1)
= (2, 3, 1)
(19)
and this holds in all of the Euclidean spaces.
We can generalise this idea of linear combinations to general vector spaces.
We say that a vector ~w in V is a linear combination of vectors
~v1, ~v2, ~v3, ..., ~vn (all in V ) if it can be written as:
~w = c1~v1 + c2~v2 + .......+ cn~vn (20)
for some scalars c1, c2, ...cn.
Example 5.2 Is ~u = (−12, 20) in R2 a linear combination of ~v1 = (−1, 2) and
~v2 = (4,−6)?
If it is, then
(−12, 20) = c1(−1, 2) + c2(4,−6) (21)
or
−c1 + 4c2 = −12 and 2c1 − 6c2 = 20 (22)
The solution to these equations is c1 = 4 and c2 = −2. So ~u = 4~v1 − 2~v2, i.e.,
~u is a linear combination of ~v1 and ~v2.
Example 5.3 Is ~u = (1,−4) in R2 a linear combination of ~v1 = (2, 10) and
~v2 = (−3,−15)?
If it is, then
(1,−4) = c1(2, 10) + c2(−3,−15) (23)
or
2c1 − 3c2 = 1
10c1 − 15c2 = −4 ⇒ 2c1 − 3c2 = −4/5
(24)
The second equation contradicts the first, so there is no solution. ~u is NOT a
linear combination of ~v1 and ~v2.
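Both examples can be checked numerically: put the candidate vectors in the columns of a matrix and either solve for the coefficients or compare ranks. A sketch:

```python
import numpy as np

# Example 5.2: solve c1 v1 + c2 v2 = u with v1, v2 as columns
V = np.array([[-1.0, 4.0],
              [2.0, -6.0]])
print(np.linalg.solve(V, [-12.0, 20.0]))   # [ 4. -2.]: u = 4 v1 - 2 v2

# Example 5.3: the columns (2,10) and (-3,-15) are parallel (rank 1);
# appending u = (1,-4) raises the rank, so u is outside their span
W = np.array([[2.0, -3.0],
              [10.0, -15.0]])
print(np.linalg.matrix_rank(W))                                   # 1
print(np.linalg.matrix_rank(np.column_stack([W, [1.0, -4.0]])))   # 2
```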
Suppose we have a vector space V . We are interested in finding a set S of vectors
from V that allows us to write any vector in V as a linear combination of the
vectors in S. Let’s look at an example.
Example 5.4 Any vector in R3 can be written as a linear combination of the
standard basis vectors ~e1, ~e2 and ~e3. We say that these vectors “span R3”, i.e.,
all vectors in R3 can be written as linear combinations of them. Remember that
we write a set of vectors inside curly brackets, so the set of basis vectors in R3
is written {~e1, ~e2, ~e3}. To signify that this set spans R3 we write
R3 = span {~e1, ~e2, ~e3}
We now generalise the idea of a span to general vector spaces:
Let S = {~v1, ~v2, ....~vn} be a set of vectors in a space V and let W be the set
of all linear combinations of the vectors in S. The set W is called the span
of the vectors ~v1, ~v2, ....~vn and we write
W = spanS = span {~v1, ~v2, ..., ~vn}
Example 5.5 Do the following vectors span R3?
~v1 = (2, 0, 1) ~v2 = (−1, 3, 4) ~v3 = (1, 1,−2)
If they do, then any vector in R3, say ~u = (u1, u2, u3), can be written as a linear
combination of ~v1, ~v2 and ~v3:
~u = (u1, u2, u3) = c1~v1 + c2~v2 + c3~v3
(u1, u2, u3) = c1(2, 0, 1) + c2(−1, 3, 4) + c3(1, 1,−2)
(25)
or
2c1 − c2 + c3 = u1
3c2 + c3 = u2
c1 + 4c2 − 2c3 = u3
(26)
We need to be able to find values for c1, c2 and c3. These equations can be
written in matrix form as
$$\underbrace{\begin{pmatrix} 2 & -1 & 1 \\ 0 & 3 & 1 \\ 1 & 4 & -2 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{\vec{c}} = \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}}_{\vec{u}} \tag{27}$$
To have a solution ~c, the matrix A has to be invertible, i.e., have an inverse. For
then: ~c = A−1~u. To have an inverse, the determinant of A has to be non-zero.
Exercise: check that the determinant of A is −24.
So we can find values for c1, c2 and c3. Therefore
span {~v1, ~v2, ~v3} = R3
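A one-line numerical check of the determinant exercise:

```python
import numpy as np

# The coefficient matrix A of equation (27)
A = np.array([[2.0, -1.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 4.0, -2.0]])
print(np.linalg.det(A))   # -24.0 (non-zero), so v1, v2, v3 span R^3
```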
6 Linear independence
We want to know when a set of vectors S will span the whole of a vector space
V , i.e., when we can write all vectors in V as linear combinations of the vectors
in S. There are two things we have to make sure of: (i) there are enough
vectors in S to describe all of V and (ii) there are no redundant vectors in S
so we can write each vector in V as a linear combination in only one way. By
a redundant vector, we mean that it is a linear combination of the other vectors
in the set, so we don’t really need it.
In order to reach this goal, we need firstly to identify when vectors in a set
are “independent” of each other, by which again we mean that none of them is
a linear combination of the others.
Example 6.1 Consider the vectors ~v1 = (2,−2, 4), ~v2 = (3,−5, 4) and ~v3 = (0, 1, 1) in R3. If these vectors are "dependent" we can form linear combi-
nations, i.e., we should be able to get
c1~v1 + c2~v2 + c3~v3 = ~0 or c1~v1 = −c2~v2 − c3~v3 (28)
where the scalars c1, c2 and c3 cannot all be zero (otherwise it is not possible to
form a linear combination and the vectors are independent). Substituting, we get
c1(2,−2, 4) + c2(3,−5, 4) + c3(0, 1, 1) = (0, 0, 0)
which leads to a system of equations in matrix form
$$\underbrace{\begin{pmatrix} 2 & 3 & 0 \\ -2 & -5 & 1 \\ 4 & 4 & 1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{\vec{c}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{29}$$
For equations of the form A~c = ~0, there is only the trivial solution (~c = ~0) if A
is invertible. Otherwise, there will be a non-trivial solution (at least one of c1,
c2 and c3 will not be zero). Exercise: check that det(A) = 0.
So we can find at least one non-trivial solution ~c to equation (28). Thus, the vectors ~v1, ~v2 and ~v3 are not independent.
This leads us on to the definition of “linear independence”, which is just a
generalisation of the “dependence” concept above.
Let S = {~v1, ~v2, ....~vn} be a set of vectors in some vector space V . If the
equation
c1~v1 + c2~v2 + ...+ cn~vn = ~0
is only satisfied when c1 = c2 = ... = cn = 0, we say that the vectors
~v1, ~v2, ....~vn are linearly independent. Otherwise, we say that the vectors
are linearly dependent.
Let’s look at some more examples.
Example 6.2 Are the vectors ~v1 = (3,−1) and ~v2 = (−2, 2) in R2 linearly
independent?
Let’s set up the equation:
c1~v1 + c2~v2 = ~0 ⇒ c1(3,−1) + c2(−2, 2) = (0, 0)
which leads to a system of equations
3c1 − 2c2 = 0,   −c1 + 2c2 = 0
the only solution to which is c1 = c2 = 0 (the trivial solution). Therefore, ~v1 and ~v2 are linearly independent.
Example 6.3 The standard basis vectors in R3, ~e1, ~e2 and ~e3, are linearly in-
dependent. Try to find numbers c1, c2 and c3 such that c1~e1 + c2~e2 + c3~e3 = ~0.
It’s impossible!
7 Basis and dimension
To this point, we’ve been using the term “standard basis” in the Euclidean n
spaces, without really knowing what the “basis” part of this expression means.
Moreover, in these so-called “n-dimensional” spaces, what does “dimension”
actually mean. In R2 and R
3 the “dimension” is usually thought of geometrically
as the number of axes, typically labelled x, y and z. The more general concept of
dimension will reduce to this definition. First we will tackle the issue of “basis”.
Earlier, I said we were working towards writing all vectors in a space V as
sums (linear combinations) of some special ‘fundamental’ vectors, S = {~v1, ~v2, ..., ~vn}.
There should be enough vectors to span the whole of V , i.e., V = spanS (any
vector in V can be obtained from a linear combination of the vectors in S). At
the same time, there should be no redundant (linearly dependent) vectors in
S because the linear combinations should be unique. These two requirements
basically lead to the special set of vectors we are looking for, and we call this
set a “basis for V ”.
Let S = {~v1, ~v2, ....~vn} be a set of vectors in some vector space V . If
V = span {~v1, ~v2, ..., ~vn}
and ~v1, ~v2, ....~vn are linearly independent, we call S a basis for V .
Example 7.1 The standard basis vectors ~e1, ~e2 and ~e3 form a basis for R3
(hence the name).
(i) We already know from examples 5.1 and 5.4 that R3 = span {~e1, ~e2, ~e3}. ✔
(ii) From example 6.3 we know that the standard basis vectors are linearly inde-
pendent. ✔
Example 7.2 Determine if the vectors ~v1 = (1,−1, 1), ~v2 = (0, 1, 2) and ~v3 =
(3, 0,−1) form a basis for R3.
First we have to check whether these vectors are linearly dependent, i.e., can
we find c1, c2 and c3 (not all zero) such that c1~v1 + c2~v2 + c3~v3 = ~0?
c1(1,−1, 1) + c2(0, 1, 2) + c3(3, 0,−1) = (0, 0, 0)
or in matrix form
$$\underbrace{\begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}}_{\vec{c}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{30}$$
Also, to have R3 = span {~v1, ~v2, ~v3}, any vector (u1, u2, u3) in R3 has to be a
linear combination of ~v1, ~v2 and ~v3
C1(1,−1, 1) + C2(0, 1, 2) + C3(3, 0,−1) = (u1, u2, u3)
or in matrix form
$$\underbrace{\begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix}}_{\vec{C}} = \underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}}_{\vec{u}} \tag{31}$$
If det(A) is not zero, then (31) has a unique solution ~C and the only solution to
equation (30) is the trivial solution ~c = ~0. Exercise: Check that det(A) = −10.
Therefore, ~v1, ~v2 and ~v3 are linearly independent and they span R3. Thus, they
form a basis for R3.
We now come to the concept of “dimension”.
Suppose that S = {~v1, ~v2, ...., ~vn} is a basis for a vector space V . If the
number of vectors in S is finite, say n, we say that V is finite dimensional
with dimension n. We write dim(V ) = n . Otherwise, the space is said
to be infinite dimensional.
It turns out, importantly, that
All bases of V contain the same number of vectors
Example 7.3 All the spaces Rn are finite dimensional with dimension n. For
example, R3 has dimension 3. All bases for R3 will have 3 vectors. If there are
more, they will not be linearly independent. If there are fewer, they will not span
R3.
8 Changing the basis
We’ve already seen through examples that a basis for a vector space is not
unique. For example, the standard basis in R3 and the set of vectors {~v1, ~v2, ~v3}
in example 7.2 are both bases in R3.
The standard basis in Rn is generally the easiest one to work with but there
may be cases in which an alternative basis is preferable. Therefore, we need to
find a way to convert between different bases. Let’s look at an example to sort
out some terminology.
Example 8.1 Using the standard basis in R3 we can write the vector (3, 5, 2)
as
(3, 5, 2) = 3(1, 0, 0) + 5(0, 1, 0) + 2(0, 0, 1) = 3~e1 + 5~e2 + 2~e3
The numbers multiplying the basis vectors, 3, 5 and 2, are called the “coordi-
nates” of the vector. It is clear that the coordinates will change depending on
the basis. For the standard basis, the coordinates are simple to find: they are
just the numbers in the vector itself. For other bases, you have to think a bit
more.
We now generalise the idea of coordinates.
Let S = {~v1, ~v2, ....~vn} be a basis for a vector space V . Since S is a basis, we
can express any vector ~u in V as a linear combination of the vectors in S.
~u = c1~v1 + c2~v2 + ....+ cn~vn
The numbers c1, c2, ..., cn are called the coordinates of ~u with respect to
the basis S
The coordinates for a vector with respect to a basis S can themselves be written
as a vector in Rn, which we call a coordinate vector
(~u)S = (c1, c2, ..., cn)
The subscript S makes it clear that the coordinates are with respect to S. For
the standard bases in Rn, the coordinate vector (~u)S is exactly the same as the
vector ~u itself, as seen in the example above.
Example 8.2 Determine the coordinate vector (~u)S of the vector ~u = (10, 5, 0)
relative to the following bases.
(i) The standard basis in R3.
In this case
~u = 10~e1 + 5~e2 + 0~e3
so the coordinates are 10, 5 and 0, and the coordinate vector is simply
(~u)S = (10, 5, 0) = ~u
(ii) S = {~v1, ~v2, ~v3} where ~v1 = (1,−1, 1), ~v2 = (0, 1, 2) and ~v3 = (3, 0,−1).
In this case, we have to find the coordinates c1, c2 and c3 such that
c1(1,−1, 1) + c2(0, 1, 2) + c3(3, 0,−1) = (10, 5, 0)
This is equivalent to the system of equations
c1 + 3c3 = 10
−c1 + c2 = 5
c1 + 2c2 − c3 = 0
(32)
The answer is c1 = −2, c2 = 3 and c3 = 4. Exercise: Check this result.
Therefore,
(~u)S = (−2, 3, 4)
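Numerically, the coordinates in part (ii) are the solution of a 3 × 3 system whose columns are the basis vectors. A minimal sketch:

```python
import numpy as np

# Columns are the basis vectors v1, v2, v3 of part (ii)
V = np.array([[1.0, 0.0, 3.0],
              [-1.0, 1.0, 0.0],
              [1.0, 2.0, -1.0]])
u = np.array([10.0, 5.0, 0.0])
print(np.linalg.solve(V, u))   # [-2.  3.  4.], i.e. (u)_S = (-2, 3, 4)
```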
Now onto how to change bases. We will work in R2 to demonstrate the procedure.
Suppose we have two bases for the space R2:
B = {~v1, ~v2} Basis 1
C = {~w1, ~w2} Basis 2
Now because B is a basis for R2, each of the basis vectors in C can be written
as a linear combination of the basis vectors in B
~w1 = a~v1 + b~v2
~w2 = c~v1 + d~v2
(33)
This means that the coordinate vectors for ~w1 and ~w2 relative to the basis B
are
(~w1)B = (a, b) and (~w2)B = (c, d)
Unfortunately, we now have to introduce a new notation for writing these coor-
dinate vectors. Instead of ( )B we are going to write them using [ ]B and call
them coordinate matrices.
$$[\vec{w}_1]_B = \begin{pmatrix} a \\ b \end{pmatrix} \qquad \text{and} \qquad [\vec{w}_2]_B = \begin{pmatrix} c \\ d \end{pmatrix} \tag{34}$$
They are basically the same as the coordinate vectors, written as columns.
Next, let ~u be any vector in V . In terms of the basis C, we can write ~u as
~u = c1 ~w1 + c2 ~w2 (35)
The coordinate matrix of ~u relative to C is:
$$[\vec{u}]_C = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \tag{36}$$
Equation (33) tells us how to write the basis vectors in C as linear combinations
of the basis vectors B. Substituting equation (33) into equation (35), we get
~u = c1 ~w1 + c2 ~w2
= c1(a~v1 + b~v2) + c2(c~v1 + d~v2)
= (ac1 + cc2)~v1 + (bc1 + dc2)~v2
(37)
This gives us the coordinate matrix of ~u relative to the basis B
$$[\vec{u}]_B = \begin{pmatrix} ac_1 + cc_2 \\ bc_1 + dc_2 \end{pmatrix} \tag{38}$$
Let us re-write this as
$$[\vec{u}]_B = \begin{pmatrix} ac_1 + cc_2 \\ bc_1 + dc_2 \end{pmatrix} = \underbrace{\begin{pmatrix} a & c \\ b & d \end{pmatrix}}_{P} \underbrace{\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}}_{[\vec{u}]_C} = \begin{pmatrix} a & c \\ b & d \end{pmatrix} [\vec{u}]_C \tag{39}$$
The matrix P is called the transition matrix from C to B: given the co-
ordinate matrix of a vector relative to the basis C, we can use it to find the
coordinate matrix relative to the basis B. Notice that its columns are the coor-
dinate matrices for the basis vectors C relative to B, [~w1]B and [~w2]B. We can
therefore write P compactly as
P = [[~w1]B [~w2]B]
Equation (39) can then be written compactly as
[~u]B = P [~u]C = [[~w1]B [~w2]B] [~u]C
We can now generalise this result.
Suppose we have two bases for the vector space V :
B = {~v1, ~v2, ..., ~vn} Basis 1
C = {~w1, ~w2, ..., ~wn} Basis 2
The transition matrix from C to B is defined as
P = [[~w1]B [~w2]B...... [~wn]B] (40)
where the ith column of P is the coordinate matrix of ~wi relative to B.
The coordinate matrix of a vector ~u in V relative to B is then related to
the coordinate matrix of ~u relative to C by
[~u]B = P [~u]C (41)
Example 8.3 Consider the standard basis B = {~e1, ~e2, ~e3} and the basis C =
{~v1, ~v2, ~v3}, where ~v1 = (1,−1, 1) and ~v2 = (0, 1, 2) and ~v3 = (3, 0,−1), for R3.
(i) Find the transition matrix from C to B
(ii) Find the transition matrix from B to C
(i) Recall that the columns of the transition matrix are coordinate matrices for
the basis vectors C relative to B. In other words, we have to find the coordinates
of the basis vectors ~v1, ~v2 and ~v3 when they are written as linear combinations
of ~e1, ~e2 and ~e3. We know from examples 8.1 and 8.2 that in the standard basis,
the coordinate vector (and therefore the coordinate matrix) is simply the vector
itself. Thus
$$[\vec{v}_1]_B = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \qquad [\vec{v}_2]_B = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \qquad [\vec{v}_3]_B = \begin{pmatrix} 3 \\ 0 \\ -1 \end{pmatrix} \tag{42}$$
From equation (40), the transition matrix from C to B is then
$$P = [[\vec{v}_1]_B\ [\vec{v}_2]_B\ [\vec{v}_3]_B] = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix} \tag{43}$$
(ii) To find the transition matrix from B to C we need the coordinate matrices
of the standard basis vectors relative to C. In other words, we have to find the
coordinates of ~e1, ~e2 and ~e3 when they are written as linear combinations of ~v1,
~v2 and ~v3. This requires more work (for you!)
Exercise: Verify that
$$\vec{e}_1 = \tfrac{1}{10}\vec{v}_1 + \tfrac{1}{10}\vec{v}_2 + \tfrac{3}{10}\vec{v}_3$$
$$\vec{e}_2 = -\tfrac{3}{5}\vec{v}_1 + \tfrac{2}{5}\vec{v}_2 + \tfrac{1}{5}\vec{v}_3$$
$$\vec{e}_3 = \tfrac{3}{10}\vec{v}_1 + \tfrac{3}{10}\vec{v}_2 - \tfrac{1}{10}\vec{v}_3$$
Therefore, the coordinate matrices of the standard basis vectors relative to C are
$$[\vec{e}_1]_C = \begin{pmatrix} 1/10 \\ 1/10 \\ 3/10 \end{pmatrix} \qquad [\vec{e}_2]_C = \begin{pmatrix} -3/5 \\ 2/5 \\ 1/5 \end{pmatrix} \qquad [\vec{e}_3]_C = \begin{pmatrix} 3/10 \\ 3/10 \\ -1/10 \end{pmatrix} \tag{44}$$
and the transition matrix from B to C is
$$P' = [[\vec{e}_1]_C\ [\vec{e}_2]_C\ [\vec{e}_3]_C] = \begin{pmatrix} 1/10 & -3/5 & 3/10 \\ 1/10 & 2/5 & 3/10 \\ 3/10 & 1/5 & -1/10 \end{pmatrix} \tag{45}$$
Example 8.4 Using the results of the previous example, compute
(i) [~u]B given (~u)C = (−2, 3, 4)
(ii) [~u]C given (~u)B = (10, 5, 0)
(i) All we need to do now is use equation (41), i.e., some matrix multiplication
$$[\vec{u}]_B = P[\vec{u}]_C = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 0 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} -2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 10 \\ 5 \\ 0 \end{pmatrix} \tag{46}$$
Looking back at example 8.2(ii), we can see that this is the right result. Once we
have the transition matrix, we can perform this computation quickly and easily
for many vectors.
(ii) This time, we swap the bases and use the transition matrix P ′ instead of P ,
since we’re going from B to C.
$$[\vec{u}]_C = P'[\vec{u}]_B = \begin{pmatrix} 1/10 & -3/5 & 3/10 \\ 1/10 & 2/5 & 3/10 \\ 3/10 & 1/5 & -1/10 \end{pmatrix} \begin{pmatrix} 10 \\ 5 \\ 0 \end{pmatrix} = \begin{pmatrix} -2 \\ 3 \\ 4 \end{pmatrix} \tag{47}$$
as expected from part (i).
There is one final observation to make
The transition matrix from the basis B to C is the inverse of the transition
matrix from C to B.
Exercise: Check that P ′ is the inverse of P in example 8.4.
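Here is a NumPy sketch of that check, reusing the matrices from examples 8.3 and 8.4:

```python
import numpy as np

# Transition matrix P from C to B, equation (43)
P = np.array([[1.0, 0.0, 3.0],
              [-1.0, 1.0, 0.0],
              [1.0, 2.0, -1.0]])

P_prime = np.linalg.inv(P)    # transition matrix from B to C
print(P_prime)                # matches equation (45)

u_C = np.array([-2.0, 3.0, 4.0])
print(P @ u_C)                # [10.  5.  0.] = [u]_B, as in (46)
print(P_prime @ (P @ u_C))    # back to [-2.  3.  4.], as in (47)
```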
9 Fundamental subspaces
There are some very important subspaces of Rn that we will be interested in.
These subspaces are associated with matrices. Let’s look at a general n × m
matrix
$$A_{nm} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix} \tag{48}$$
It has n rows and m columns. The row vectors are the vectors formed out
of the rows of Anm (these are in Rm) and the column vectors are the vectors
formed out of the columns of Anm (these are in Rn).
Example 9.1 Consider the 4× 2 matrix
$$A = \begin{pmatrix} -1 & 5 \\ 0 & -4 \\ 9 & 2 \\ 3 & -7 \end{pmatrix} \tag{49}$$
The row vectors are
~r1 = (−1, 5) ~r2 = (0,−4) ~r3 = (9, 2) ~r4 = (3,−7) (50)
which are vectors in R2 (there are m = 2 columns) and the column vectors are
$$\vec{c}_1 = \begin{pmatrix} -1 \\ 0 \\ 9 \\ 3 \end{pmatrix} \qquad \vec{c}_2 = \begin{pmatrix} 5 \\ -4 \\ 2 \\ -7 \end{pmatrix} \tag{51}$$
which are vectors in R4 (there are n = 4 rows).
There are three important subspaces of Rn and Rm associated with a matrix
Anm. We call them the fundamental subspaces of Anm.
First let’s recall that a matrix Anm is a linear transformation that takes
any column vector in Rm and transforms it by multiplication into a column vector in Rn. We write Anm : Rm → Rn. The domain of Anm is Rm (the set of inputs)
and the range of Anm is the set of all possible outputs (“images”) in Rn (which
is generally not all of Rn, just a subspace of it).
Now onto the fundamental subspaces of Anm.
(1) The first subspace is related to the zero vector in Rn. The set of all vectors
~u in the domain Rm that give
Anm~u = ~0 (52)
is called the null space or kernel of Anm. In other words, those vectors in
the domain (inputs) that when operated on by Anm give us the zero vector
in Rn. We write the null space of a matrix A as null(A) or ker(A)
(2) The span of the row vectors of Anm, i.e., the set of all linear combinations of
the row vectors, is called the row space of Anm. Because the row vectors
are in Rm, the row space is a subspace of Rm. We write the row space of a
matrix A as row(A)
(3) The span of the column vectors of Anm, i.e., the set of all linear combinations
of the column vectors, is called the column space of Anm. Because the
column vectors are in Rn, the column space is a subspace of Rn. We write
the column space of a matrix A as col(A)
We will be interested in finding bases for each of these spaces. First another
example.
Example 9.2 Find the null space ker(A) of the following matrix
$$A = \begin{pmatrix} 1 & -7 \\ -3 & 21 \end{pmatrix} \tag{53}$$
To find the null space, we use equation (52). Let’s assume that (u1, u2) is a
vector in ker(A). Then equation (52) leads to
$$A\vec{u} = \begin{pmatrix} 1 & -7 \\ -3 & 21 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \tag{54}$$
which can be written as a system of linear equations
u1 − 7u2 = 0
−3u1 + 21u2 = 0 ⇒ −u1 + 7u2 = 0
(55)
The two equations are equivalent, and are satisfied when (u1, u2) = (7t, t) for
any number t. Therefore, ker(A) consists of all vectors of the form (7t, t) for
any number t, of which there are infinitely many.
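For an exact check we can use sympy (a sketch):

```python
from sympy import Matrix

# Example 9.2 in exact arithmetic: the null space of A is spanned
# by (7, 1), i.e. it consists of all vectors (7t, t)
A = Matrix([[1, -7],
            [-3, 21]])
print(A.nullspace())   # [Matrix([[7], [1]])]
```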
Now, this is one way of finding the null space and a basis for it. However, we
want to be able to find bases for all the fundamental spaces for more complicated
matrices using just one procedure. This procedure is described through another
example.
Before we move onto the example, we first have to review the concepts of
augmented matrices and reduced echelon forms, which you have covered
in your first year maths modules.
Suppose we have a linear system of homogeneous (right hand sides are zero)
equations:
−u1 + 2u2 − u3 + 5u4 + 6u5 = 0
4u1 − 4u2 − 4u3 − 12u4 − 8u5 = 0
2u1 − 6u3 − 2u4 + 4u5 = 0
−3u1 + u2 + 7u3 − 2u4 + 12u5 = 0
(56)
We can write this in matrix form as
$$\underbrace{\begin{pmatrix} -1 & 2 & -1 & 5 & 6 \\ 4 & -4 & -4 & -12 & -8 \\ 2 & 0 & -6 & -2 & 4 \\ -3 & 1 & 7 & -2 & 12 \end{pmatrix}}_{A} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{57}$$
A convenient way of writing this system of equations is by forming the aug-
mented matrix
$$\left(\begin{array}{ccccc|c} -1 & 2 & -1 & 5 & 6 & 0 \\ 4 & -4 & -4 & -12 & -8 & 0 \\ 2 & 0 & -6 & -2 & 4 & 0 \\ -3 & 1 & 7 & -2 & 12 & 0 \end{array}\right) \tag{58}$$
The entries to the left of the line represent the coefficients of u1 to u5 in equations
(56) and (57). The zeros to the right of the line represent the terms on the right
hand sides of the ‘=’ signs in equations (56) and (57).
Now, in the system of equations (56) we can multiply or divide any equation
by a constant, we can add or subtract equations or we can swap the equations
around without altering the solutions. You do this, e.g., when you solve 2 linear
simultaneous equations.
Aside.
Solve the following system and make a note of the steps required.
u1 − 2u2 = 2,   3u1 + u2 = −2
The same is true, therefore, of the augmented matrix (58), which represents the
system of equations (56): We can
• Interchange 2 rows
• Multiply or divide a row by a non-zero number
• Add a multiple of one row to another.
These are called elementary row operations. They are equivalent to adding
equations (56), multiplying them by constants and interchanging them. The
augmented matrix is just a more compact way of doing it. We also have to be
careful about the right hand sides when we perform the operations. However,
for the homogeneous system above they are zero, so they do not affect the row
operations.
We now want to find the reduced row echelon form of the matrix. We
get this by performing elementary row operations until the augmented matrix
satisfies the following properties.
• In each row, the first non-zero entry from the left is 1. This is called the leading 1.
• The leading 1 in each row is to the right of the leading 1 in the row above.
• All rows consisting entirely of zeros are at the bottom of the matrix.
Exercise: go through the following steps on the augmented matrix (58)
(1) row 2 + 4 × row 1
(2) row 3 + 2 × row 1
(3) row 2 ÷ 4
(4) row 1 × −1
(5) row 3 ÷ 4
(6) row 4 + 3 × row 1
(7) row 3 - row 2
(8) row 4 + 5 × row 2
(9) row 4 ↔ row 3
(10) row 3 ÷ (−7)
to confirm that the reduced row echelon form is
$$U = \left(\begin{array}{ccccc|c} 1 & -2 & 1 & -5 & -6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \tag{59}$$
We now move onto the example.
Example 9.3 Determine a basis for the null space of the following 4×5 matrix
$$A = \begin{pmatrix} -1 & 2 & -1 & 5 & 6 \\ 4 & -4 & -4 & -12 & -8 \\ 2 & 0 & -6 & -2 & 4 \\ -3 & 1 & 7 & -2 & 12 \end{pmatrix} \tag{60}$$
To find the null space we need to solve equation (52) for ~u = (u1, u2, u3, u4, u5)
in R5. This is the same as equation (57) above. We put it into the augmented
matrix, which is given by matrix (58).
Now we need the reduced row echelon form of the matrix. Again, we have
done this already. The answer is given by matrix (59)
$$U = \left(\begin{array}{ccccc|c} 1 & -2 & 1 & -5 & -6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \tag{61}$$
Thus, we only have 3 equations (the top 3 rows), but 5 unknowns. Let’s set
u5 = s, where s is any number. The third equation (row) gives
u4 = 2u5 = 2s
Now set u3 = t for any number t. The second equation (row) gives
u2 = 2u3 − 2u4 − 4u5 = 2t− 8s
Finally, the first equation (row) gives
u1 = 2u2 − u3 + 5u4 + 6u5 = 3t
The full solution is
$$\vec{u} = \begin{pmatrix} 3t \\ 2t - 8s \\ t \\ 2s \\ s \end{pmatrix} = t \underbrace{\begin{pmatrix} 3 \\ 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}}_{\vec{u}_1} + s \underbrace{\begin{pmatrix} 0 \\ -8 \\ 0 \\ 2 \\ 1 \end{pmatrix}}_{\vec{u}_2} \tag{62}$$
for any numbers t and s. There are infinitely many solutions because the number
of unknowns is greater than the number of equations. So, the null space consists
of all vectors of the form c1~u1 + c2~u2.
In the above example, we haven't quite answered the question - we still haven't specified a basis! It looks like the vectors ~u1 and ~u2 could form a basis. They certainly span the whole of the null space, but are they linearly independent? Yes, they are (Exercise: check that they are). So, they satisfy the two properties required to be a basis.
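Here is a sympy sketch that confirms the hand calculation. Note that sympy's rref() also clears the entries above each leading 1, so it returns a fully reduced form; the pivot columns and the null space basis nevertheless agree with what we found.

```python
from sympy import Matrix

A = Matrix([[-1, 2, -1, 5, 6],
            [4, -4, -4, -12, -8],
            [2, 0, -6, -2, 4],
            [-3, 1, 7, -2, 12]])

R, pivots = A.rref()   # fully reduced row echelon form and pivot columns
print(pivots)          # (0, 1, 3): leading 1s in columns 1, 2 and 4
print(A.nullspace())   # the basis vectors (3,2,1,0,0) and (0,-8,0,2,1)
```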
We now come to the main reason for solving the system by finding the
reduced row echelon form.
Let Anm be an n×m matrix.
• The vectors found for the null space of the reduced echelon form of Anm
are always linearly independent. They form a basis for the null space of
the reduced echelon matrix and for the null space of the original matrix.
The dimension of the null space (i.e., number of basis vectors) is called
the nullity of Anm, written nullity(Anm) .
• The row vectors containing the leading 1’s in the reduced echelon form of
Anm form a basis for the row space of the reduced echelon matrix and for
the row space of the original matrix Anm.
• The column vectors containing the leading 1's in the reduced echelon form of Anm form a basis for the column space of the reduced echelon form. Suppose that these column vectors correspond to column numbers m1, m2, .., mk.
The column vectors of the original matrix Anm corresponding to column numbers m1, m2, .., mk form a basis for the column space of the original matrix.
Example 9.4 Let’s look again at the matrix A in example 9.3. The reduced
row echelon form U is given by equation (59)
$$U = \left(\begin{array}{ccccc|c} 1 & -2 & 1 & -5 & -6 & 0 \\ 0 & 1 & -2 & 2 & 4 & 0 \\ 0 & 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \tag{63}$$
We found that there are two vectors in the basis for the null space. All bases
have the same number of vectors. Therefore nullity(A) = 2.
Rows 1, 2 and 3 of U contain the leading 1’s. Therefore, a basis for the row
space of both A and U is given by
~r1 = (1,−2, 1,−5,−6)
~r2 = (0, 1,−2, 2, 4)
~r3 = (0, 0, 0, 1,−2)
with dim(row(A))=3
Columns 1, 2 and 4 of U contain the leading 1’s. Therefore, a basis for the
column space of U is given by the 1st, 2nd and 4th column vectors of U
~c′1 = (1, 0, 0, 0)
~c′2 = (−2, 1, 0, 0)
~c′4 = (−5, 2, 1, 0)
A basis for the column space of A is therefore given by the 1st, 2nd and 4th
column vectors of A
~c1 = (−1, 4, 2,−3)
~c2 = (2,−4, 0, 1)
~c4 = (5,−12,−2,−2)
with dim(col(A))=3
Notice in this example that dim(row(A)) = dim(col(A)), i.e., the column and
row spaces have the same dimension. This is always true.
The row space and column space of a general n × m matrix Anm have the same
dimension. We call this dimension the rank of Anm, written rank(Anm) .
The second thing to notice from the example above is that nullity(A)+rank(A) =
2 + 3 = 5, i.e., the number of columns. This again is always true.
For a general n×m matrix Anm (m columns)
nullity(Anm) + rank(Anm) = m
For an n× n matrix A
nullity(A) + rank(A) = n (64)
10 Square matrices and systems of linear equations
The concepts of rank and nullity are important. Let’s consider a square n × n
matrix A : Rn → Rn. A typical problem in many applications of engineering is
to find a solution ~u in Rn to the equation
A~u = ~b (65)
where the vector ~b in Rn is known. We will look at certain aspects of this problem
with an example.
Example 10.1 Consider the matrix
$$A = \begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & -2 \\ -3 & 0 & 2 \end{pmatrix} = (\vec{c}_1\ \vec{c}_2\ \vec{c}_3) \tag{66}$$
where ~c1, ~c2 and ~c3 are the column vectors of A
$$\vec{c}_1 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} \qquad \vec{c}_2 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} \qquad \vec{c}_3 = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix} \tag{67}$$
Now consider the procedure for multiplying a vector ~u = (u1, u2, u3) by A
$$\begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & -2 \\ -3 & 0 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} 1 \times u_1 + (-2) \times u_2 + 1 \times u_3 \\ 2 \times u_1 + 1 \times u_2 + (-2) \times u_3 \\ (-3) \times u_1 + 0 \times u_2 + 2 \times u_3 \end{pmatrix} = u_1 \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} + u_2 \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + u_3 \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix} = u_1\vec{c}_1 + u_2\vec{c}_2 + u_3\vec{c}_3 \tag{68}$$
i.e., any matrix multiplication leads to a linear combination of the column vec-
tors, i.e., a vector in the column space.
From the above example we can see that if we want to solve equation (65), the
vector ~b has to be in the column space of A. It also shows that all output vectors
(i.e. the range of A) are in the column space of A.
The range of a square matrix is its column space
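A small NumPy sketch of this observation, using the matrix of example 10.1 and an arbitrary input vector (chosen here purely for illustration):

```python
import numpy as np

A = np.array([[1.0, -2.0, 1.0],
              [2.0, 1.0, -2.0],
              [-3.0, 0.0, 2.0]])
u = np.array([2.0, -1.0, 3.0])

# Equation (68): A u equals the combination u1 c1 + u2 c2 + u3 c3
combo = u[0] * A[:, 0] + u[1] * A[:, 1] + u[2] * A[:, 2]
print(np.allclose(A @ u, combo))   # True: every output lies in col(A)
```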
Next, let’s consider the nullity and rank. What happens when the rank of an
n × n matrix is less than n? From the definition of rank, we know that if
rank(A) < n, some of the column and row vectors will be linearly dependent -
they can be obtained from the other rows by forming linear combinations and
are, therefore, redundant. If we were to set up the matrix system (65) with some
vector ~b and look for a solution ~u, then we would not have enough equations or some equations would contradict each other. Therefore, a solution will either not exist at all or there will be infinitely many solutions ⇒ A will not have an inverse.
The rank of A is less than n ⇐⇒ A is not invertible
A square n×n matrix A with rank(A) = n is said to have full rank (obviously
the rank cannot be any bigger!) If rank(A) < n, the matrix A is said to be rank
deficient. We can restate the above as
A is rank deficient ⇐⇒ A is not invertible
By definition, if a matrix A is rank deficient some of the rows are linearly
dependent. By performing elementary row operations (adding multiples of rows
to other rows) we can get a new matrix B that will have a row of zeros. The
determinants of A and B will differ only by a non-zero constant factor. Therefore, since det(B) =
0, we have det(A) = 0, which means that A will not have an inverse.
A is rank deficient ⇐⇒ det(A) = 0
Another way to look at nullity and rank is by considering the solutions to
A~v = ~0 (69)
The solutions to this equation clearly give us the null space ker(A). The nullity
of A is the number of vectors in the basis for ker(A). If there are non-zero so-
lutions to equation (69), then nullity(A) > 0. Equation (64) then tells us that
rank(A) < n. In this case, we can write
A(~u+ ~v) = A~u+A~v = ~b+~0 = ~b
What does this tell us? It tells us that if ~u is a solution to equation (65) then so is ~u + ~v, and there may be an infinite number of such ~v. This suggests that if a solution to equation (65) exists, it will not be unique.
A is rank deficient ⇐⇒ no unique solution to A~u = ~b
11 Inner product spaces and orthogonality
There is a special class of spaces that we are going to look at. The Euclidean
spaces fall into this category.
What we would like to do, as in Rn, is measure the (i) magnitude (or
“length”) of a vector and (ii) angles and distances between vectors. In R2
and R3 you can visualise these but in higher dimensions you can’t. The basic idea
is to introduce generalisations of the familiar “dot product” and “magnitude”
of a vector in R2 or R3.
Example 11.1 The dot product in R2 and R3 is defined as follows
~u · ~v = (u1, u2, u3) · (v1, v2, v3) = u1v1 + u2v2 + u3v3
where we multiply the first, second, etc. coordinate of the first vector by the first,
second etc. coordinate of the second vector and add the results. The dot product
has a geometric interpretation
~u · ~v = |~u||~v| cos θ
where $|\vec{u}| = \sqrt{u_1^2 + u_2^2 + u_3^2}$ and $|\vec{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ are the "magnitudes" of the vectors and θ is the angle between the vectors in the plane that contains them both. Notice also that
$$\sqrt{\vec{u}\cdot\vec{u}} = \sqrt{u_1^2 + u_2^2 + u_3^2} = |\vec{u}|$$
and that
$$\vec{u}\cdot\vec{v} = \vec{v}\cdot\vec{u}$$
$$(\vec{u}+\vec{v})\cdot\vec{w} = \vec{u}\cdot\vec{w} + \vec{v}\cdot\vec{w}$$
$$(c\vec{u})\cdot\vec{v} = c(\vec{u}\cdot\vec{v})\ \text{for any scalar}\ c$$
$$\vec{u}\cdot\vec{u} = u_1^2 + u_2^2 + u_3^2 \geq 0$$
$$\vec{u}\cdot\vec{u} = 0\ \text{if and only if}\ \vec{u} = \vec{0} \tag{70}$$
In general Rn spaces we can define the same dot product (multiply individual
respective components)
~u · ~v = u1v1 + u2v2 + ....+ unvn
and the length of a vector in Rn is given by
$$|\vec{u}| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2} = \sqrt{\vec{u}\cdot\vec{u}}$$
Now let’s look at a general vector space V . We want similar measures of “angles”
and “magnitudes”.
• What we do is extend the idea of the dot product and call it an inner
product
• Like the dot product of two vectors, the inner product of two vectors gives
us a number.
• As with the dot product, we will be able to use the inner product to
measure “angles” and “magnitudes”.
• We write 〈~u,~v〉 to represent the inner product of two vectors.
Example 11.2 The dot product on Euclidean spaces is an example of an inner
product. It is called the standard inner product on these spaces.
Example 11.3 Let ~u = (1,−2, 4), ~v = (−2, 0, 1) and ~w = (3,−2, 2). With the
standard inner product (i.e., just the dot product)
$$\langle\vec{u},\vec{v}\rangle = \langle(1,-2,4),\ (-2,0,1)\rangle = 1\times(-2) + (-2)\times 0 + 4\times 1 = 2$$
$$\langle\vec{v},\vec{u}\rangle = \langle(-2,0,1),\ (1,-2,4)\rangle = (-2)\times 1 + 0\times(-2) + 1\times 4 = 2 = \langle\vec{u},\vec{v}\rangle$$
$$\langle\vec{u}+\vec{v},\ \vec{w}\rangle = \langle\vec{u},\vec{w}\rangle + \langle\vec{v},\vec{w}\rangle \quad \text{(Exercise: check this)}$$
$$\langle c\vec{u},\vec{v}\rangle = \langle(c,-2c,4c),\ (-2,0,1)\rangle = -2c + 0 + 4c = 2c = c\,\langle\vec{u},\vec{v}\rangle$$
$$\langle\vec{u},c\vec{v}\rangle = c\,\langle\vec{u},\vec{v}\rangle \quad \text{(Exercise: check this)}$$
$$\sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{1^2 + (-2)^2 + 4^2} = \sqrt{21} = |\vec{u}| \tag{71}$$
• The properties demonstrated in this example always hold. The property 〈~u,~v〉 =
〈~v, ~u〉 is called “symmetry”.
• The third property is termed "linearity in the first argument" (the two "arguments" are the vectors on either side of the comma).
Exercise: Show that 〈~u,~v + ~w〉 = 〈~u,~v〉+ 〈~u, ~w〉 for the vectors in the example
above. This means that the inner product is “linear in the second argument” as
well as the first. It is, therefore, bilinear.
Hardish exercise (used later on): Show that (“additivity” property)
〈(~v1 + ~v2 + ....+ ~vn), ~w〉 = 〈~v1, ~w〉+ 〈~v2, ~w〉+ .....+ 〈~vn, ~w〉 (72)
HINT: We can write ~v1 + ~v2 + ....+ ~vn = ~v1 + (~v2 + ....+ ~vn)
The sum ~s = ~v2 + ....+ ~vn is just a single vector when we perform the addition.
Then we can apply the third rule in (71). Repeat the procedure by taking out ~v2
from the sum ~s to form a new sum: ~s2 = ~v3+ ....+~vn. Keep going until the new
sum has only the term ~vn.
Example 11.4 We can define other inner products on the Rn spaces. To fix
ideas, let’s take vectors ~u = (u1, u2, u3) and ~v = (v1, v2, v3) in R3. The following
defines an inner product
New: 〈~u,~v〉 = w1u1v1 + w2u2v2 + w3u3v3
Standard inner product: 〈~u,~v〉 = u1v1 + u2v2 + u3v3
The new and standard inner products are the same except for the numbers w1, w2 and w3 multiplying the first, second and third terms in the sum respectively.
These numbers are called weights. This is an example of a “weighted inner
product”.
• A vector space on which we can define an inner product is called an inner product space.
• The inner product has to satisfy the rules (70) when we swap the dot
product for the inner product.
• We are mainly interested in the vector spaces Rn with the inner product
defined by standard inner product, i.e. dot product.
So how do we measure the “magnitude” of a vector?
In the last computation in example 11.3 you saw that $\sqrt{\langle\vec{u},\vec{u}\rangle}$ is the magnitude of ~u. Before we go on to define the magnitude in general we are going to rename it. We will not say the "magnitude of ~u" but will instead say the "norm of ~u". Moreover, we will not write the norm (magnitude) as |~u|, but instead we will write it as ‖~u‖. A norm can be defined without reference to an inner product. However, we are interested in inner product spaces and the inner product allows us to define a norm as
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle}$$
Example 11.5 In the Euclidean spaces with the standard inner product, the
norm induced by the inner product is
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$$
Note that for this space the norm ‖~u‖ is identical to the magnitude |~u|
Example 11.6 Find the norms of the vectors ~u = (3, 4) and ~v = (2,−1, 2,−3)
using the standard inner product
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{3^2 + 4^2} = 5$$
$$\|\vec{v}\| = \sqrt{\langle\vec{v},\vec{v}\rangle} = \sqrt{2^2 + (-1)^2 + 2^2 + (-3)^2} = \sqrt{18} = 3\sqrt{2} \tag{73}$$
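The same norms in NumPy, as a quick check:

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([2.0, -1.0, 2.0, -3.0])

print(np.sqrt(np.dot(u, u)))   # 5.0
print(np.linalg.norm(v))       # 4.2426... = 3 * sqrt(2)
```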
Example 11.7 In the Euclidean spaces, the norm induced by the standard inner
product satisfies certain properties. For example, for all vectors ~u = (u1, u2, u3)
in R3
$$\|\vec{u}\| = \sqrt{\langle\vec{u},\vec{u}\rangle} = \sqrt{u_1^2 + u_2^2 + u_3^2} = 0\ \text{if and only if}\ \vec{u} = \vec{0}$$
$$\|c\vec{u}\| = |c|\,\|\vec{u}\|\ \text{for any scalar}\ c \tag{74}$$
Exercise: For ~u = (1,−2, 2), check that ‖2~u‖ = 2‖~u‖ = 6
All norms must satisfy these properties.
Next we must find a way to compute “distances” between vectors. In R2 and
R3, the distance between ~u and ~v is given by |~u− ~v|, i.e., the magnitude of the
difference. For a general inner product space we have
The distance between two vectors ~u and ~v is given by the metric
$$d(\vec{u},\vec{v}) = \|\vec{u}-\vec{v}\| = \sqrt{\langle\vec{u}-\vec{v},\ \vec{u}-\vec{v}\rangle}$$
(also called distance function)
Example 11.8 In the Euclidean spaces with the standard inner product, the
metric is
$$d(\vec{u},\vec{v}) = \|\vec{u}-\vec{v}\| = \sqrt{\langle\vec{u}-\vec{v},\ \vec{u}-\vec{v}\rangle} = \sqrt{(u_1-v_1)^2 + (u_2-v_2)^2 + \cdots + (u_n-v_n)^2}$$
Note that for this space, ‖~u− ~v‖ is identical to |~u− ~v| .
Exercise: Try to show that
d(~u,~v) = d(~v, ~u)
HINT: (a− b)2 = (b− a)2 for any scalars a and b.
Example 11.9 Calculate the metric for ~u = (3, 4, 1,−1) and ~v = (2,−1, 2,−3)
$$d(\vec{u},\vec{v}) = \|\vec{u}-\vec{v}\| = \sqrt{(3-2)^2 + (4+1)^2 + (1-2)^2 + (-1+3)^2} = \sqrt{31}$$
Exercise: Check that d(~u,~v) = d(~v, ~u), i.e., ‖~u− ~v‖ = ‖~v − ~u‖.
Recall that in R2 and R3, two vectors are at right angles if ~u · ~v = 0 because
~u ·~v = |~u||~v| cos θ. We say that these vectors are orthogonal. In direct analogy,
for a general inner product space, we say that
~u and ~v are orthogonal if 〈~u,~v〉 = 0
Example 11.10 The standard basis vectors in Rn are orthogonal to each other
with the standard inner product. For example
〈(1, 0, 0), (0, 1, 0)〉 = 0, 〈(0, 1, 0), (0, 0, 1)〉 = 0
and so on (remember these are just dot products).
Now, suppose that W is a subspace of an inner product space V . We say that
a vector ~u from V is orthogonal to W if it is orthogonal to every vector in
W . The set of all vectors that are orthogonal to W is called the orthogonal
complement of W and is denoted by W⊥ ( “W perp”).
Example 11.11 Consider the space R3 with the standard basis. Let W be the
subspace of R3 consisting of all vectors that lie in the xy plane, i.e., of the form
~q = (q1, q2, 0), for any scalars q1 and q2. The orthogonal complement of W will
be all vectors ~u in R3 that are orthogonal to every vector in W , that is
〈~u, ~q〉 = 〈(u1, u2, u3), (q1, q2, 0)〉 = 0
For this to be true for any choice of u1, u2, u3, q1 and q2, we must have
u1 = u2 = 0. It doesn’t matter what u3 is because the third component of ~q
is always zero. So, we are looking at vectors of the form (0, 0, u3). These are
vectors in the direction of ~e3. The span of ~e3 is all linear combinations of ~e3,
which means vectors of the form c~e3 = (0, 0, c) for any c. Therefore
W⊥ = span {~e3}
Armed with the definition of orthogonal complement, let's briefly revisit the fun-
damental subspaces of a matrix. There is actually another one. It is associated
with the transpose of the matrix.
Example 11.12 Consider the 3× 3 matrix A and its transpose AT
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \qquad A^T = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \tag{75}$$
To get AT , we swap the columns for the rows. The 3 column vectors of A are
~c1 = (a11, a21, a31), ~c2 = (a12, a22, a32), ~c3 = (a13, a23, a33)
These are also the 3 row vectors of AT . It follows that
Finding a basis for the column space of A is equivalent to finding a basis
for the row space of AT .
Now consider the procedure for multiplying a vector ~u = (u1, u2, u3) by AT
$$\begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} a_{11}u_1 + a_{21}u_2 + a_{31}u_3 \\ a_{12}u_1 + a_{22}u_2 + a_{32}u_3 \\ a_{13}u_1 + a_{23}u_2 + a_{33}u_3 \end{pmatrix} = \begin{pmatrix} \langle\vec{u},\vec{c}_1\rangle \\ \langle\vec{u},\vec{c}_2\rangle \\ \langle\vec{u},\vec{c}_3\rangle \end{pmatrix} \tag{76}$$
Suppose that the vector ~u is in the null space of AT , i.e., ker(AT ). Then
AT~u = ~0
which, looking at equation (76) means that 〈~u,~ci〉 = 0, for i = 1, 2, 3 (~u is
orthogonal to every one of the column vectors of A). Let ~v be any vector in
col(A), i.e., all linear combinations of ~c1, ~c2 and ~c3. Then ~v has the form
~v = a1~c1 + a2~c2 + a3~c3 for some numbers a1, a2 and a3. This gives
〈~u,~v〉 = 〈~u, a1~c1 + a2~c2 + a3~c3〉
= a1 〈~u,~c1〉+ a2 〈~u,~c2〉+ a3 〈~u,~c3〉 = 0 + 0 + 0 = 0
(77)
which means that ~u is orthogonal to any vector ~v in col(A). We have
demonstrated that if ~u is in ker(AT ), it must also be in the orthogonal
complement of col(A), written as col(A)⊥.
Now suppose that ~u is in col(A)⊥. Then it is orthogonal to every vector in
col(A), in particular, to the individual column vectors ~c1, ~c2 and ~c3. From equa-
tion (76) we then see that AT~u = ~0, so ~u is in ker(AT ). We have demonstrated
that if ~u is in col(A)⊥, it must also be in ker(AT ). Combining this with the
previous result, we conclude that col(A)⊥ and ker(AT ) are the same thing! We
also know that col(A) and the range of A are the same. Therefore
ker(AT ) = col(A)⊥ = range(A)⊥
(HARD) Exercise: Use similar arguments to show that
ker(A) = row(A)⊥
ker(AT ) is the fourth fundamental subspace, called the left null space or
cokernel.
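To see this relationship concretely, here is a small Python/numpy sketch. The
matrix is an arbitrary rank-deficient example, and extracting a null space basis
from the SVD is a standard numerical trick (not covered in these notes):

    import numpy as np

    # Illustrative 3x3 matrix with rank 2: third column = first + second
    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])

    # The rows of Vt with (near-)zero singular values of A^T form an
    # orthonormal basis for ker(A^T)
    U, s, Vt = np.linalg.svd(A.T)
    rank = int(np.sum(s > 1e-10))
    null_AT = Vt[rank:]

    # Each vector in ker(A^T) is orthogonal to every column of A,
    # i.e. ker(A^T) = col(A)-perp
    print(np.allclose(null_AT @ A, 0.0))   # True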
12 Orthogonal and orthonormal bases
We now come back to the issue of basis. Recall that B is a basis for a vector
space V if every vector in V can be written as a linear combination of the
vectors in B and the vectors in B are linearly independent (none of them is a
linear combination of the others). If B is a basis for V and, furthermore, the
space V has an inner product (i.e. it is an inner product space), we can turn B
into a special type of basis. This new basis will have important and very useful
properties. Before showing you how to construct it, you will need to understand
a few basic concepts.
• Let S be a set of vectors in an inner product space. If each distinct pair
of vectors is orthogonal we call S an orthogonal set.
• If S is an orthogonal set and each vector in S has a norm of 1, then S is
called an orthonormal set.
Example 12.1 Given the vectors ~v1 = (2, 0,−1), ~v2 = (0,−1, 0) and ~v3 =
(2, 0, 4) in R3
(a) Show that they form an orthogonal set with the standard inner product but
do not form an orthonormal set.
(b) Turn them into an orthonormal set ~u1, ~u2 and ~u3.
(a) To show that they form an orthogonal set, we have to demonstrate that each
distinct pair is orthogonal.
〈~v1, ~v2〉 = 2× 0 + 0× (−1) + (−1)× 0 = 0
〈~v1, ~v3〉 = 2× 2 + 0× 0 + (−1)× 4 = 0
〈~v2, ~v3〉 = 0× 2 + (−1)× 0 + 0× 4 = 0
(78)
Exercise: Why didn’t we compute 〈~v2, ~v1〉, 〈~v3, ~v1〉 and 〈~v3, ~v2〉?
Now, to be an orthonormal set, the norms (magnitudes) of ~v1, ~v2 and ~v3 have
to be 1. Let’s compute them
‖~v1‖ = √〈~v1, ~v1〉 = √(22 + 02 + (−1)2) = √5 ✗
‖~v2‖ = √〈~v2, ~v2〉 = √(02 + (−1)2 + 02) = 1 ✔
‖~v3‖ = √〈~v3, ~v3〉 = √(22 + 02 + 42) = √20 = 2√5 ✗
(79)
(b) Most of the work is done. All we have to do is divide each vector by its norm
~u1 = ~v1/‖~v1‖ = (1/√5)(2, 0, −1) = (2/√5, 0, −1/√5)
~u2 = ~v2/‖~v2‖ = (0, −1, 0)
~u3 = ~v3/‖~v3‖ = (1/(2√5))(2, 0, 4) = (1/√5, 0, 2/√5)
(80)
Exercise: Verify that the norms of these vectors are 1 and that they are orthog-
onal.
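Here is a quick numerical check of this example (a Python/numpy sketch,
illustrative only):

    import numpy as np

    v1 = np.array([2.0, 0.0, -1.0])
    v2 = np.array([0.0, -1.0, 0.0])
    v3 = np.array([2.0, 0.0, 4.0])

    # (a) the pairwise inner products are all zero (an orthogonal set) ...
    print(v1 @ v2, v1 @ v3, v2 @ v3)   # 0.0 0.0 0.0

    # ... but the norms are not all 1, so the set is not orthonormal
    print([np.linalg.norm(v) for v in (v1, v2, v3)])   # [2.236..., 1.0, 4.472...]

    # (b) dividing each vector by its norm gives the orthonormal set
    u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
    print([np.linalg.norm(u) for u in (u1, u2, u3)])   # [1.0, 1.0, 1.0]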
Example 12.2 The standard basis vectors in Rn form an orthonormal set with
the standard inner product. For example, ~e1 = (1, 0, 0), ~e2 = (0, 1, 0) and
~e3 = (0, 0, 1) in R3. Exercise: Compute the norms of these vectors and their pairwise
inner products to show that they form an orthonormal set.
There is a special property of orthogonal/orthonormal sets that will come in
very handy
If S is an orthogonal set of vectors in an inner product space, then S is also
a set of linearly independent vectors
How can we show this? Let S = {~v1, ~v2, ...., ~vn} be the set of vectors in question.
We know they are orthogonal. Let’s recall the definition of linear independence:
The vectors ~v1, ~v2, ...., ~vn are linearly independent if the only way to get
c1 ~v1 + c2~v2 + ....+ cn~vn = ~0 (81)
is by having all the numbers c1, c2, ...cn equal to zero. This is equivalent to
saying that no vector can be a linear combination of the others. Let’s now take
the inner product of both sides of (81) with any of the vectors, let’s say ~v1
〈(c1~v1 + c2~v2 + .... + cn~vn), ~v1〉 = 〈~0, ~v1〉        (82)
The inner product has to satisfy equation (72) (called “additivity”), which gives
us a way to simplify the left hand side of equation (82)
〈(c1~v1 + c2~v2 + ....+ cn~vn), ~v1〉
= 〈c1~v1, ~v1〉+ 〈c2~v2, ~v1〉+ .....+ 〈cn~vn, ~v1〉
= c1 〈~v1, ~v1〉+ c2 〈~v2, ~v1〉+ .....+ cn 〈~vn, ~v1〉
= c1 〈~v1, ~v1〉
(83)
What happened to all the terms after c1 〈~v1, ~v1〉? Remember that the set S =
{~v1, ~v2, ...., ~vn} is orthogonal. Therefore, the inner product of two distinct vectors
is zero so the only nonzero term in the third line of (83) is c1 〈~v1, ~v1〉.
The right hand side of equation (82) is obviously zero, so we end up with
c1 〈~v1, ~v1〉 = 0
Now 〈~v1, ~v1〉 = ‖~v1‖2 > 0 unless ~v1 is the zero vector, which it isn’t. Therefore,
we must have c1 = 0. If we perform the same procedure with ~v2 instead of ~v1,
we will get c2 = 0, and so on with all the other scalars. Therefore, the set S is
linearly independent.
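For any particular orthogonal set this result is also easy to check numerically:
stack the vectors as the rows of a matrix and confirm that the rank equals the
number of vectors. A numpy sketch (illustrative) using the set from example 12.1:

    import numpy as np

    # The orthogonal set of example 12.1, stacked as rows
    S = np.array([[2.0, 0.0, -1.0],
                  [0.0, -1.0, 0.0],
                  [2.0, 0.0, 4.0]])

    # Linear independence <=> rank equals the number of vectors
    print(np.linalg.matrix_rank(S) == S.shape[0])   # True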
The great thing about having an orthogonal/orthonormal basis for a space V
is that we can easily find the coordinates of any vector in V with respect to this basis.
Remember that the coordinates are the numbers multiplying the basis vectors
in the linear combination: if S = {~v1, ~v2, ...., ~vn} is the orthogonal/orthonormal
basis for V , then any vector ~u (in V ) can be written as
~u = c1~v1 + c2~v2 + ....+ cn~vn
Let’s take the inner product of both sides with ~v1 (same as the procedure above)
〈~u,~v1〉 = 〈(c1~v1 + c2~v2 + ....+ cn~vn), ~v1〉
= c1 〈~v1, ~v1〉+ c2 〈~v2, ~v1〉+ .....+ cn 〈~vn, ~v1〉
= c1 〈~v1, ~v1〉
(84)
Since we know ~u and we know ~v1 we can find c1

c1 = 〈~u,~v1〉/〈~v1, ~v1〉 = 〈~u,~v1〉/‖~v1‖2

using the definition of the norm. Similarly

c2 = 〈~u,~v2〉/‖~v2‖2,   c3 = 〈~u,~v3〉/‖~v3‖2,   ......,   cn = 〈~u,~vn〉/‖~vn‖2
Therefore, we can write the vector ~u as

~u = (〈~u,~v1〉/‖~v1‖2)~v1 + (〈~u,~v2〉/‖~v2‖2)~v2 + .... + (〈~u,~vn〉/‖~vn‖2)~vn        (85)

If {~v1, ~v2, ....., ~vn} is an orthonormal basis, then

~u = 〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vn〉~vn        (86)
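Here is a Python/numpy sketch of equation (86) (illustrative; it reuses the
orthonormal set built in example 12.1 and an arbitrary vector ~u):

    import numpy as np

    # Orthonormal basis from example 12.1, stacked as the rows of V
    V = np.array([[2.0, 0.0, -1.0],
                  [0.0, -1.0, 0.0],
                  [2.0, 0.0, 4.0]])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # normalise each row

    u = np.array([1.0, 2.0, 3.0])   # an arbitrary vector to expand

    c = V @ u                      # coordinates c_i = <u, v_i>, equation (86)
    print(np.allclose(c @ V, u))   # True: u = c1 v1 + c2 v2 + c3 v3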
Figure 4: The orthogonal projection of a vector ~u in R3 on the xy plane W (example
13.1). In the figure, ~u = (2, 2, 1), projW ~u = (2, 2, 0) and ~v = (0, 0, 1); the angle
at Q is π/2.
13 Orthogonal projections
We now introduce the idea of “orthogonal projections”. Let’s look at a simple
example
Example 13.1 Let’s take the vector ~u = (2, 2, 1) = 2~e1+2~e2+~e3 in R3. We can
define a subspace W of R3 as that space with all vectors of the form ~q = (q1, q2, 0),
where q1 and q2 are any scalars. This is nothing more than those vectors in R3
that lie in the xy plane. They are linear combinations of ~e1 and ~e2.
The orthogonal projection of ~u on the xy plane W is the vector projW ~u =
(2, 2, 0). What is this exactly? Basically, what we do is drop a straight line from
the point P in Figure 4 to the xy plane, landing at a point Q. The line vector
~v from P to Q has to be perpendicular to the xy plane. The only possibility for
this is ~v = (0, 0, 1), i.e., it is parallel to the z axis. The vector from O to Q is
the orthogonal projection. We write it as projW ~u. Notice that it lies in W (the
xy plane).
Why do we call it orthogonal? Well, there is the obvious reason that the line
we drop from P to Q is perpendicular (orthogonal) to the xy plane. Notice also
that the vector ~v is orthogonal to every vector ~q = (q1, q2, 0) in W (the xy plane):

〈~v, ~q〉 = 〈(0, 0, 1), (q1, q2, 0)〉 = 0
Therefore, ~v is in the orthogonal complement W⊥ of W (see example 11.11).
The orthogonal projection projW ~u, on the other hand, is in W , and
~v + projW ~u = (0, 0, 1) + (2, 2, 0) = (2, 2, 1) = ~u
What we have managed to do is decompose ~u into two parts, one in W⊥ and the
other in W . The two parts are orthogonal to each other. We can get these two
parts by splitting the linear combination of orthogonal basis vectors
~u = (2~e1 + 2~e2) + ~e3,   where 2~e1 + 2~e2 = projW ~u (in W ) and ~e3 = ~v (in W⊥)
Finally, we can see from Figure 4 that the norm of ~v is the shortest distance
between P and the plane W . If we wanted to approximate the vector ~u using only
the basis vectors in W (~e1 and ~e2), projW ~u would be the best approximation.
Now this is all well and good, but what if we have a vector in a general Rn space
and we want to approximate it by a vector in a general subspace of Rn? For
instance, in the above example, rather than choosing the subspace as the xy
plane we could have chosen another plane through the origin, such as
2x + 3y − z = 0. We would then have to approximate the vector ~u by a linear
combination of basis vectors that describe this plane in order to obtain the
orthogonal projection.
Let ~u be a vector in Rn endowed with the standard inner product. Let W
be a subspace of Rn with an orthogonal basis {~v1, ~v2, .....~vk}, where k ≤ n.
The orthogonal projection of ~u on W is given by
projW ~u = (〈~u,~v1〉/‖~v1‖2)~v1 + (〈~u,~v2〉/‖~v2‖2)~v2 + .... + (〈~u,~vk〉/‖~vk‖2)~vk        (87)

If {~v1, ~v2, ....., ~vk} is an orthonormal basis for W , then

projW ~u = 〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vk〉~vk        (88)
• projW ~u is in W and the vector ~v = (~u−projW ~u) is in W⊥. The vectors
projW ~u and ~v are, therefore, orthogonal.
• The shortest “distance” between the vector ~u and the subspace W is the
norm (magnitude) of ~v: ‖~v‖ = shortest distance between ~u and W .
• Of all the vectors in the subspace W , the vector projW ~u is the best
approximation to ~u.
Most of these facts are suggested by example 13.1, but we haven’t quite shown
that they hold in the general case. Let’s start with the claim that the vectors
~v = (~u − projW ~u) and projW ~u are orthogonal. To simplify the notation
let’s assume that the basis {~v1, ~v2, .....~vk} is orthonormal, i.e. all the ~vi’s have
‖~vi‖ = 1. Then
〈~v, projW ~u〉 = 〈(~u − projW ~u), projW ~u〉
= 〈~u, projW ~u〉 − 〈projW ~u, projW ~u〉
= 〈~u, 〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vk〉~vk〉 − 〈projW ~u, projW ~u〉   (using (88))
= 〈~u, 〈~u,~v1〉~v1〉 + .... + 〈~u, 〈~u,~vk〉~vk〉 − 〈projW ~u, projW ~u〉
= 〈~u,~v1〉〈~u,~v1〉 + .... + 〈~u,~vk〉〈~u,~vk〉 − 〈projW ~u, projW ~u〉
= 〈~u,~v1〉2 + 〈~u,~v2〉2 + .... + 〈~u,~vk〉2 − 〈projW ~u, projW ~u〉
= 〈projW ~u, projW ~u〉 − 〈projW ~u, projW ~u〉 = 0
(89)
so they are indeed orthogonal.
Exercise: Repeat this procedure for an orthogonal (but not orthonormal)
basis for W .
Now, projW ~u clearly lies in W by the way it is defined (a linear combina-
tion of the basis vectors in W ). How do we show that ~v is in W⊥? If it is,
then ~v is orthogonal to every vector in W . Since every vector in W is a lin-
ear combination of the vectors in {~v1, ~v2, ...~vk}, we just need to show that ~v
is orthogonal to each of these basis vectors (why?). Again, let’s assume they
are orthonormal. We choose any one of them, say ~v1, and take the inner product
〈~v,~v1〉 = 〈(~u − projW ~u), ~v1〉
= 〈~u,~v1〉 − 〈projW ~u,~v1〉
= 〈~u,~v1〉 − 〈〈~u,~v1〉~v1 + 〈~u,~v2〉~v2 + .... + 〈~u,~vk〉~vk, ~v1〉
= 〈~u,~v1〉 − (〈〈~u,~v1〉~v1, ~v1〉 + 〈〈~u,~v2〉~v2, ~v1〉 + .... + 〈〈~u,~vk〉~vk, ~v1〉)
= 〈~u,~v1〉 − (〈~u,~v1〉〈~v1, ~v1〉 + 〈~u,~v2〉〈~v2, ~v1〉 + .... + 〈~u,~vk〉〈~vk, ~v1〉)
= 〈~u,~v1〉 − 〈~u,~v1〉〈~v1, ~v1〉
= 〈~u,~v1〉 − 〈~u,~v1〉 = 0
(90)
We can do the same with all the basis vectors.
Exercise: Repeat this procedure for an orthogonal (but not orthonormal)
basis for W .
Now onto the statement about “shortest distance” and “best approximation”.
In R2 and R3 (as you can see in Figure 4), the vector ~v takes us from the point
P to the closest point on the subspace W because the shortest distance between
two points is a straight line! It is essentially this concept that we want to
generalise for higher dimensions.
Let’s restate clearly what we want to do: find the vector in W that gives us
the best approximation to a general vector ~u in Rn. We are claiming that this
Figure 5: Illustration of the Pythagorean theorem, c2 = a2 + b2.
vector is projW ~u. Let’s start by choosing any vector ~w in W that is NOT the
same as projW ~u. We can write (a simple mathematical trick that will help us)
~u− ~w = (~u− projW ~u) + (projW ~u− ~w)
The vector (projW ~u− ~w) is a combination of (basis) vectors in W and so belongs
to W itself. We already know that the vector ~v = ~u − projW ~u is in W⊥.
Therefore (~u− projW ~u) and (projW ~u− ~w) are orthogonal.
To proceed, we look at a familiar concept: the Pythagorean theorem
for a right triangle, demonstrated in Figure 5. For two vectors ~a and ~b in R2
or R3 that are at right angles (orthogonal), the Pythagorean theorem becomes
|~a + ~b|2 = |~a|2 + |~b|2. For two orthogonal vectors ~a and ~b in a general inner
product space, the equivalent theorem is
‖~a+~b‖2 = ‖~a‖2 + ‖~b‖2
Putting ~a = (~u− projW ~u) and ~b = (projW ~u− ~w) in this formula we get
‖~u − ~w‖2 = ‖~u − projW ~u‖2 + ‖projW ~u − ~w‖2
> ‖~u − projW ~u‖2 because ~w ≠ projW ~u
Therefore
‖~u− projW ~u‖2 < ‖~u− ~w‖2 for all vectors ~w in W , except projW ~u
In turn, this means that
• The shortest “distance” between the vector ~u and W is the norm (magnitude)
of the vector ~v = ~u− projW ~u.
• projW ~u gives us the “best approximation” to ~u by a vector in W .
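These facts are easy to check numerically for a concrete case. A Python/numpy
sketch (the basis vectors and ~u below are arbitrary illustrative choices; note
that 〈~v1, ~v2〉 = 0):

    import numpy as np

    # An orthogonal (not orthonormal) basis for a 2D subspace W of R^3
    v1 = np.array([1.0, 1.0, 0.0])
    v2 = np.array([1.0, -1.0, 2.0])   # <v1, v2> = 0

    u = np.array([2.0, 0.0, 1.0])

    # Equation (87): proj_W u = sum over i of (<u, v_i> / ||v_i||^2) v_i
    proj = sum((u @ vi) / (vi @ vi) * vi for vi in (v1, v2))

    # The residual v = u - proj_W u lies in W-perp ...
    r = u - proj
    print(np.isclose(r @ v1, 0.0), np.isclose(r @ v2, 0.0))   # True True

    # ... and proj_W u beats any other vector w in W, e.g. w = v1:
    print(np.linalg.norm(u - proj) < np.linalg.norm(u - v1))  # True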
One final note. We have shown that given any subspace W of Rn for which
we have an orthogonal basis, we can write any vector in Rn as a sum of
a vector in W and a vector in W⊥. The dimension of Rn has to be n.
Therefore the dimensions (number of basis vectors) of W and W⊥ have to sum
to n:
dimW + dimW⊥ = n
We have essentially partitioned Rn into the two spaces W and W⊥. We say,
therefore, that Rn is the direct sum of W and W⊥. This is written as
Rn = W ⊕W⊥
14 The Gram-Schmidt process
The first important application of orthogonal projections is the Gram-Schmidt
process. We will meet another in the next section.
Suppose we have an arbitrary (non-orthogonal) basis for Rn. The basis
vectors are linearly independent but we would prefer them to be orthogonal too.
It turns out that this is always possible: given n linearly independent vectors for
Rn we can turn them into an orthogonal basis. The way we do this is called the
Gram-Schmidt process. We will illustrate the procedure using two examples.
Throughout this section, the standard inner product on Rn will be assumed.
Figure 6: An illustration of the Gram-Schmidt process in R2 (see example 14.1).
Here, the orthogonal basis ~u1 and ~u2 is constructed from the old basis ~v1 and ~v2
by setting ~u1 = ~v1 and ~u2 = ~v2 − projW ~v2, where W = span {~u1}.
Example 14.1 Consider the basis S = {~v1, ~v2} for R2, where ~v1 = (3, 1) and
~v2 = (2, 2). This may not be a particularly convenient basis, unlike the standard
basis, where the vectors ~e1 and ~e2 are perpendicular (orthogonal). We’ve seen
how easy it is in that case to write down the coordinates for a general vector.
What if we could turn these linearly independent vectors into an orthogonal
basis {~u1, ~u2}? It would then resemble the standard basis, but with a different
orientation. In fact we can!
We start by putting ~u1 = ~v1. This will be our first orthogonal basis vector.
We now need to construct a second vector that is orthogonal to ~u1. We could
do this by simply looking at Figure 6 and making the simple observation that in
order for ~u2 to be orthogonal to ~u1 = ~v1, it must be of the form ~u2 = t(1,−3)
for any number t. However, we want a systematic way of doing it because there
are generally many more than two basis vectors.
Let’s form the subspace W = span {~u1} of R2. W is the set of all the linear
combinations (in this case multiples) of ~u1. We can project the vector ~v2 from
the original basis S onto W . This orthogonal projection, projW ~v2, is shown in
Figure 6. It is nothing more than the component of ~v2 that points in the direction
of ~u1. It is in the space W because it is a multiple of ~u1.
In the previous section we saw that any vector ~u in Rn can be written as a
sum of two vectors: (i) the projection projW ~u of ~u onto a subspace W (which
has an orthogonal basis) and (ii) the vector ~u − projW ~u in W⊥. We can ap-
ply this information here by putting ~u = ~v2 and W = span {~u1}. The vector
~v2 − projW ~v2 is orthogonal to every vector in W = span {~u1}, in particular to
~u1. So we set ~u2 = ~v2 − projW ~v2 to get a vector orthogonal to ~u1 = ~v1. The
formula for the projection is given by equation (87)

projW ~v2 = (〈~v2, ~u1〉/‖~u1‖2)~u1   (only one basis vector ~u1 in W )        (91)
This gives

~u2 = ~v2 − (〈~v2, ~u1〉/‖~u1‖2)~u1 = ~v2 − (4/5)~u1 = (2/5)(−1, 3)        (92)
Exercise: Check that ~u1 and ~u2 are orthogonal.
Example 14.2 Given the basis ~v1 = (2,−1, 0), ~v2 = (1, 0,−1) and ~v3 =
(3, 7,−1), find an orthogonal basis {~u1, ~u2, ~u3} for R3.
As in the last example we set ~u1 = ~v1 and form the subspace W1 = span {~u1}
of R3. Again, we project ~v2 on W1 to get the component of ~v2 in the direction
of ~u1. The component of ~v2 in W1⊥ then gives us ~u2

~u2 = ~v2 − projW1 ~v2
= ~v2 − (〈~v2, ~u1〉/‖~u1‖2)~u1
= (1, 0, −1) − (2/5)(2, −1, 0)   (〈~v2, ~u1〉 = 2 and ‖~u1‖2 = 5)
= (1/5)(1, 2, −5)
(93)
Now what? We repeat the previous steps. We want a vector orthogonal to both
~u1 and ~u2. The subspace W2 = span {~u1, ~u2} of R3 consists of all linear com-
binations of ~u1 and ~u2. Our task is, therefore, to find a vector ~u3 that lies in
W2⊥, the subspace of all vectors that are orthogonal to both ~u1 and ~u2. So let’s
project ~v3 on W2 to get a vector projW2~v3 in the subspace W2. Then, the vector
~v = ~v3 − projW2 ~v3 lies in W2⊥. The vector projW2 ~v3 is again given by equation
(87), so ~u3 is
~u3 = ~v3 − projW2 ~v3
= ~v3 − ((〈~v3, ~u1〉/‖~u1‖2)~u1 + (〈~v3, ~u2〉/‖~u2‖2)~u2)
= ~v3 − ((−1/5)~u1 + ((22/5)/(6/5))~u2)
= (8/3)(1, 2, 1)
(94)
Exercise: Check that ~u1, ~u2 and ~u3 are mutually orthogonal.
Exercise: How do we know that the orthogonal set of vectors we have con-
structed in this example, {~u1, ~u2, ~u3}, is actually a basis for R3? In other words,
does this set of vectors span the whole of R3?
HINT: Think about (i) the number of basis vectors required to span Rn, and
(ii) the relationship between linear independence and orthogonality.
In the two examples above we have developed a procedure for turning a general
basis into an orthogonal basis. We now summarise the procedure.
Let ~v1, ~v2, ....., ~vn be a set of linearly independent vectors for Rn. Then an
orthogonal basis ~u1, ~u2, ....., ~un for Rn can be found by the following Gram-
Schmidt process

Step 1.   ~u1 = ~v1

Step 2.   ~u2 = ~v2 − (〈~v2, ~u1〉/‖~u1‖2)~u1

Step 3.   ~u3 = ~v3 − ((〈~v3, ~u1〉/‖~u1‖2)~u1 + (〈~v3, ~u2〉/‖~u2‖2)~u2)

...

Step n.   ~un = ~vn − ((〈~vn, ~u1〉/‖~u1‖2)~u1 + (〈~vn, ~u2〉/‖~u2‖2)~u2 + ...... + (〈~vn, ~un−1〉/‖~un−1‖2)~un−1)
Note that to obtain an orthonormal basis from the new orthogonal basis, we
simply divide each new member of the orthogonal basis by its norm.
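The whole procedure is only a few lines of code. Here is a Python/numpy sketch
(a direct transcription of steps 1 to n above; the function name gram_schmidt is
our own, and for serious numerical work a QR factorisation or modified
Gram-Schmidt is more robust):

    import numpy as np

    def gram_schmidt(vectors):
        """Turn a list of linearly independent vectors into an orthogonal basis."""
        basis = []
        for v in vectors:
            u = v.astype(float)
            for b in basis:
                # subtract the projection of v along each earlier basis vector
                u -= (v @ b) / (b @ b) * b
            basis.append(u)
        return basis

    # The basis from example 14.2
    us = gram_schmidt([np.array([2.0, -1.0, 0.0]),
                       np.array([1.0, 0.0, -1.0]),
                       np.array([3.0, 7.0, -1.0])])

    print(us[1])   # [ 0.2  0.4 -1. ]       = (1/5)(1, 2, -5)
    print(us[2])   # [ 2.66...  5.33...  2.66... ] = (8/3)(1, 2, 1)

    # Dividing each u_i by its norm gives an orthonormal basis (example 14.3)
    ws = [u / np.linalg.norm(u) for u in us]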
Example 14.3 Convert the orthogonal basis found in example 14.2 into an or-
thonormal basis {~w1, ~w2, ~w3}.
The orthogonal basis is
~u1 = (2, −1, 0),   ~u2 = (1/5)(1, 2, −5),   ~u3 = (8/3)(1, 2, 1)

We compute

‖~u1‖ = √5,   ‖~u2‖ = √30/5,   ‖~u3‖ = 8√6/3

which yields

~w1 = ~u1/‖~u1‖ = (1/√5)(2, −1, 0)
~w2 = ~u2/‖~u2‖ = (1/√30)(1, 2, −5)
~w3 = ~u3/‖~u3‖ = (1/√6)(1, 2, 1)
Exercise: (a) Given the basis ~v1 = (1, 1, 1, 1), ~v2 = (1, 1, 1, 0), ~v3 = (1, 1, 0, 0)
and ~v4 = (1, 0, 0, 0) for R4, construct an orthogonal basis for R4. (b) Convert
the orthogonal basis found into an orthonormal basis.
15 Least squares approximations
We now come to a second important application of orthogonal projections. Re-
member the temperature data example 1.2 in which we wanted to fit a line to
the data but ended up with a system of equations that had more equations
than unknowns? This type of system is called “overdetermined”. The other
way round, when we have more unknowns than equations, the system is called
“underdetermined”. When a system has no exact solution it is called inconsistent;
overdetermined systems are typically inconsistent.
Suppose that we have an inconsistent system of n equations in m unknowns.
In matrix form, the system is:
A~u = ~b
for an n × m matrix A and vector ~b in Rn. There is no solution ~u (in Rm) to
this equation, i.e., there is no vector ~u such that A~u = ~b. Perhaps, however, we
this equation, i.e., there is no vector ~u such that A~u = ~b. Perhaps, however, we
can look for a vector ~u such that A~u will be close to ~b. To this end, let’s define
a residual ~r as follows:
~r = ~b−A~u (95)
~r is obviously a vector (both ~b and A~u are vectors). It is a measure of how close
a vector ~u will be to satisfying the equation. What we do is look for the vector
~u that makes the norm (magnitude) of ~r as small as possible. This leads to the
least squares solution.
Given an inconsistent system A~u = ~b, the vector ~ul that makes
‖~r‖ = ‖~b−A~ul‖
as small as possible is called the least squares solution.
Okay, this has given us some sort of criterion, but how exactly do we find this
vector ~ul? Recall example 10.1 in which it was shown that the multiplication
of a vector by a matrix results in a linear combination of the matrix column
vectors, i.e., all outputs A~u are in col(A), the column space.
Now put W = col(A). A~u will be in W for any ~u. Indeed, the set of outputs
A~u for all choices of ~u in Rm will span W ; any linear combination of the column
vectors is possible for the right choice of ~u = (u1, u2, ..., um). Therefore:
range(A) = col (A) = W
At this point let’s state the least squares problem in a different way:
Given an inconsistent system A~u = ~b for some ~b in Rn, find the vector A~ul
in W = col (A) that is the closest approximation to ~b, i.e.,
‖~b−A~ul‖ < ‖~b−A~u‖
for all possible choices of A~u.
Let’s restate a result from section 13 on orthogonal projections. Suppose W is
a subspace of Rn and ~x is a vector in Rn. The closest approximation to ~x
by a vector in W is given by projW~x, the orthogonal projection of ~x
on W. projW~x is in W by definition and ~x− projW~x is in W⊥.
Let’s put W = col (A) and swap ~x for A~u (all vectors in the column space
(range) of A). Then the closest approximation to ~b by a vector in W is projW~b.
This is the vector A~ul that we want:
A~ul = projW~b
We could find A~ul this way and then solve A~ul = projW~b for ~ul, but there is a
better way to solve the problem.
The least squares solution ~ul to the problem A~u = ~b also satisfies the
normal system:
ATA~ul = AT~b (96)
This system is always consistent. If the equation A~x = ~0 has only the
trivial solution ~x = ~0, a unique solution to the least squares problem is
~ul = (ATA)−1AT~b
Before we move onto an example, let’s see why the above statements are true.
We’ve determined that A~ul = projW~b, which is a vector in W = col (A). We
can always find ~ul = (u1, u2, ..., um), with the right choice of coordinates, such
that A~ul gives us the projection vector we want. This means we always have a
solution.
The residual, given by equation (95), satisfies:
~r = ~b−A~ul = ~b− projW~b
The vector on the right-hand side is in W⊥ = col (A)⊥, as stated above. In ex-
ample 11.12 we showed that for a matrix A, col (A)⊥ is the same as ker (AT ), the
null space of AT . So, for the least squares solution, the residual is in ker (AT ),
which means that AT~r = ~0, i.e.,
AT~r = AT (~b−A~ul) = ~0 or AT~b = ATA~ul
If we had two solutions ~ul,1 and ~ul,2, then both would satisfy A~ul = projW~b,
so we would have:

A~ul,1 = A~ul,2 or A(~ul,1 − ~ul,2) = ~0

But this has only the trivial solution ~ul,1 − ~ul,2 = ~0, so ~ul,1 = ~ul,2. In other
words, any two least squares solutions must coincide: the solution is unique.
Example 15.1 Use a least squares approximation to find the equation of the
line that will best approximate the points (x, y) = (−2, 65), (1, 20), (−7, 105)
and (5,−34).
The line will have the form y = ax+ b. If we put each of the x and y values
into y = ax + b we will get 4 equations for a and b (clearly too many!). The
system is overdetermined. It is written in matrix form as follows:
[ −2  1 ]            [  65 ]
[  1  1 ]  [ a ]  =  [  20 ]
[ −7  1 ]  [ b ]     [ 105 ]
[  5  1 ]            [ −34 ]
    A        ~u         ~b          (97)
The normal system (96) for the least squares solution is given by multiplying
both sides by the transpose of A:
[ −2  1  −7  5 ]  [ −2  1 ]            [ −2  1  −7  5 ]  [  65 ]
[  1  1   1  1 ]  [  1  1 ]  [ a ]  =  [  1  1   1  1 ]  [  20 ]
       AT         [ −7  1 ]  [ b ]            AT         [ 105 ]
                  [  5  1 ]    ~u                        [ −34 ]
                      A                                     ~b        (98)
which leads to a much simpler equation
[ 79  −3 ]  [ a ]     [ −1015 ]
[ −3   4 ]  [ b ]  =  [   156 ]        (99)
This is an easy system to solve. The answer is a ≈ −11.7 and b ≈ 30.2.
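We can confirm this with a few lines of Python/numpy (a sketch;
np.linalg.lstsq solves the same least squares problem directly):

    import numpy as np

    A = np.array([[-2.0, 1.0],
                  [ 1.0, 1.0],
                  [-7.0, 1.0],
                  [ 5.0, 1.0]])
    b = np.array([65.0, 20.0, 105.0, -34.0])

    # Solve the normal system (96): (A^T A) u = A^T b
    u_normal = np.linalg.solve(A.T @ A, A.T @ b)
    print(u_normal)   # [-11.70...  30.22...]

    # Cross-check with numpy's built-in least squares routine
    u_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(u_normal, u_lstsq))   # True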
Exercise Find the least squares solution to the following system: