8/12/2019 Mathematics Methods
http://slidepdf.com/reader/full/mathematics-methods 1/212
MATHEMATICAL METHODS I
Contents

I Linear Algebra: by Gordon Royle and Alice Devillers

1 Systems of Linear Equations
  1.1 Systems of Linear Equations
  1.2 Solving Linear Equations
  1.3 Gaussian Elimination
  1.4 Back Substitution
2 Vector Spaces and Subspaces
  2.1 The vector space Rⁿ
  2.2 Subspaces
  2.3 Spans and Spanning Sets
  2.4 Linear Independence
  2.5 Bases
3 Matrices and Determinants
  3.1 Matrix Algebra
  3.2 Subspaces from matrices
  3.3 The null space
  3.4 Solving systems of linear equations
  3.5 Matrix Inversion
  3.6 Determinants
4 Linear transformations
  4.1 Introduction

II Differential Calculus: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici. Some pictures courtesy of Kevin Judd.

5 Vector functions and functions of several variables
  5.1 Vector valued functions
  5.2 Functions of two variables
  5.3 Functions of three or more variables
  5.4 Summary
6 Limits and continuity
  6.1 Scalar-valued functions of one variable
  6.2 Limits of vector functions
  6.3 Limits of functions of two variables
  6.4 Continuity of Functions
7 Differentiation
  7.1 Derivatives of functions of one variable
  7.2 Differentiation of vector functions
  7.3 Partial Derivatives
  7.4 Tangent Planes and differentiability
  7.5 The Jacobian matrix and the Chain Rule
  7.6 Directional derivatives and gradients
8 Maxima and Minima
  8.1 Functions of a single variable
  8.2 Functions of several variables
  8.3 Identifying local maxima and minima
9 Taylor Polynomials
  9.1 Taylor polynomials for functions of one variable
  9.2 Taylor polynomials for functions of two variables

III Differential equations and eigenstructure: by Des Hill

10 Differential Equations
  10.1 Introduction
  10.2 Mathematical modelling with ordinary differential equations
  10.3 First order ordinary differential equations
  10.4 Initial conditions
  10.5 Linear constant-coefficient ordinary differential equations
  10.6 Linear nonhomogeneous constant-coefficient differential equations
  10.7 Initial and boundary conditions
  10.8 Summary of method
  10.9 Partial differential equations
11 Eigenvalues and eigenvectors
  11.1 Introduction
  11.2 Finding eigenvalues and eigenvectors
  11.3 Some properties of eigenvalues and eigenvectors
  11.4 Diagonalisation
  11.5 Linear Systems
  11.6 Solutions of homogeneous linear systems of differential equations
  11.7 A change of variables approach to finding the solutions
12 Change of Basis
  12.1 Introduction
  12.2 Change of basis
  12.3 Linear transformations and change of bases

IV Sequences and Series: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici

13 Sequences and Series
  13.1 Sequences
  13.2 Infinite Series
  13.3 Absolute Convergence and the Ratio Test
  13.4 Power Series
  13.5 Taylor and MacLaurin Series

14 Index
Part I
Linear Algebra: by Gordon Royle and Alice Devillers
1
Systems of Linear Equations
This chapter covers the systematic solution of systems of linear equations using Gaussian elimination
and back-substitution and the description, both algebraic and geometric, of their solution space.
Before commencing this chapter, students should be able to:
• Plot linear equations in 2 variables, and
• Add and multiply matrices.
After completing this chapter, students will be able to:
• Systematically solve systems of linear equations with many variables, and
• Identify when a system of linear equations has 0, 1 or infinitely many solutions, and
• Give the solution set of a system of linear equations in parametric form.
1.1 Systems of Linear Equations
A linear equation is an equation of the form
x + 2 y = 4
where each term in the equation is either a number¹ (e.g. “6”) or a numerical multiple of a variable (e.g. “2x”, “4y”). If an equation involves powers or products of variables (x², xy, etc.) or any other functions (sin x, eˣ, etc.) then it is not linear.

¹ You will often see the word scalar used to refer to a number, and scalar multiple to describe a numerical multiple of a variable.
A system of linear equations is a set of one or more linear equa-
tions considered together, such as
x + 2 y = 4
x − y = 1
which is a system of two equations in the two variables² x and y.

² Often the variables are called “unknowns”, emphasising that solving a system of linear equations is a process of finding the unknowns.
A solution to a system of linear equations is an assignment of
values to the variables such that all of the equations in the system
are satisfied. For example, there is a unique solution to the system
given above which is
x = 2, y = 1.
Particularly when there are more variables, it will often be useful to give the solutions as vectors³ like

(x, y) = (2, 1).

³ In this case we could also just give the solution as (2, 1) where, by convention, the first component of the vector is the x-coordinate. Usually the variables will have names like x, y, z or x1, x2, . . . , xn and so we can just specify the vector alone, in this case (2, 1), and it will be clear which component corresponds to which variable.
If the system of linear equations involves just two variables, then
we can visualise the system geometrically by plotting the solutions to
each equation separately on the xy-plane. The solution to the system of linear equations is the point where the two plots intersect, which in this case is the point (2, 1).
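For a 2 × 2 system like this one, the intersection point can also be computed directly. A minimal pure-Python sketch (using the standard inverse formula for a 2 × 2 matrix; all variable names here are ours):

```python
# Solve  x + 2y = 4,  x - y = 1  via the 2x2 inverse (Cramer-style) formula.
a, b, c, d = 1, 2, 1, -1   # coefficient matrix [[a, b], [c, d]]
e, f = 4, 1                # right-hand sides

det = a * d - b * c        # non-zero determinant <=> a unique intersection point
x = (d * e - b * f) / det
y = (a * f - c * e) / det
print((x, y))              # the unique point of intersection: (2.0, 1.0)
```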
[Figure 1.1: The two linear equations plotted as intersecting lines in the xy-plane.]
It is easy to visualise systems of linear equations in two vari-
ables, but it is more difficult in three variables, where we need
3-dimensional plots. In three dimensions the solutions to a single
linear equation such as

x + 2y − z = 4

form a plane in 3-dimensional space. (This particular equation describes the plane with normal vector (1, 2, −1) containing the point (4, 0, 0).) While computer algebra systems can produce somewhat reasonable plots of surfaces in three dimensions, it is hard to interpret plots showing two or more intersecting surfaces.
With four or more variables any sort of visualisation is essentially impossible, and so to reason about systems of linear equations with many variables we need to develop algebraic tools rather than geometric ones. (It is still very useful to use geometric intuition to think about systems of linear equations with many variables, provided you are careful about where it no longer applies.)
1.1.1 Solutions to systems of linear equations
The system of linear equations shown in Figure 1.1 has a unique
solution. In other words there is just one (x, y) pair that satisfies
both equations, and this is represented by the unique point of inter-
section of the two lines. Some systems of linear equations have no solutions at all. (If a system of linear equations has at least one solution, then it is called consistent, and otherwise it is called inconsistent.) For example, there are no possible values for (x, y)
that satisfy both of the following equations:
x + 2 y = 4
2x + 4 y = 3
Geometrically, the two equations determine parallel but different lines, and so they do not meet.
[Figure 1.2: An inconsistent system plotted as lines in the xy-plane.]
There is another possibility for the number of solutions to a system of linear equations, which is that a system may have infinitely
many solutions. For example, consider the system
x + 2y + z = 4
y + z = 1        (1.1)
Each of the two equations determines a plane in three dimensions, and as the two planes are not parallel⁴, they meet in a line and so every point on the line is a solution to this system of linear equations.

⁴ The two planes are not parallel because the normal vectors to the two planes, that is n1 = (1, 2, 1) and n2 = (0, 1, 1), are not parallel.
How can we describe the solution set to a system of linear equations with
infinitely many solutions?
One way of describing an infinite solution set is in terms of free
parameters where one (or more) of the variables is left unspecified
with the values assigned to the other variables being expressed as
formulas that depend on the free variables.
Let’s see how this works with the system of linear equations
given by (1.1): here we can choose z to be the “free variable” but
then to satisfy the second equation it will be necessary to have
y = 1 − z. Then the first equation can only be satisfied by taking

x = 4 − 2y − z
  = 4 − 2(1 − z) − z    (using y = 1 − z)
  = 2 + z.
Thus the complete solution set S of system (1.1) is
S = {(2 + z, 1 − z, z) | z ∈ R}.
To find a particular solution to the linear system, you can pick
any desired value for z and then the values for x and y are deter-
mined. For example, if we take z = 1 then we get (3, 0, 1) as a solution, and if we take z = 0 then we get (2, 1, 0), and so on. For
this particular system of linear equations it would have been possi-
ble to choose one of the other variables to be the “free variable” and
we would then get a different expression for the same solution set.
Example 1.1. (Different free variable) To rewrite the solution set S =
{(2 + z, 1 − z, z) | z ∈ R} so that the y-coordinate is the free variable, just notice that as y = 1 − z, this implies that z = 1 − y and so the
solution set becomes S = {(3 − y, y, 1 − y) | y ∈ R}.
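The parametric description can be checked mechanically: every choice of the free parameter z must satisfy both equations of (1.1), and the re-parametrised form of Example 1.1 must trace out the same set. A small pure-Python sketch (function names are ours):

```python
# Members of S = {(2 + z, 1 - z, z) | z in R} for system (1.1):
#   x + 2y + z = 4  and  y + z = 1.
def solution(z):
    return (2 + z, 1 - z, z)

for z in [-3, 0, 1, 2.5]:
    x, y, w = solution(z)
    assert x + 2 * y + w == 4   # first equation of (1.1)
    assert y + w == 1           # second equation of (1.1)

# The re-parametrised form from Example 1.1 gives the same points:
def solution_in_y(y):
    return (3 - y, y, 1 - y)

assert solution(1) == solution_in_y(0)   # z = 1 corresponds to y = 1 - z = 0
```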
A system of linear equations can also be expressed as a single
matrix equation involving the product of a matrix and a vector of
variables. So the system of linear equations
x + 2 y + z = 5
y − z = −1
2x + 3 y − z = 3
can equally well be expressed as

[ 1  2  1 ] [ x ]   [  5 ]
[ 0  1 −1 ] [ y ] = [ −1 ]
[ 2  3 −1 ] [ z ]   [  3 ]

just using the usual rules for multiplying matrices.⁵

⁵ Matrix algebra is discussed in detail in Chapter 3, but for this representation as a system of linear equations, just the definition of the product of two matrices is needed.

In general, a
system of linear equations with m equations in n variables has the
form
A x = b
where A is an m × n coefficient matrix, x is an n × 1 vector of variables, and b is an m × 1 vector of scalars.
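The matrix form Ax = b needs nothing more than the row-by-column product rule. A pure-Python sketch (the helper name `matvec` is ours):

```python
# The system above as A x = b, with A the 3x3 coefficient matrix.
A = [[1, 2, 1],
     [0, 1, -1],
     [2, 3, -1]]
b = [5, -1, 3]

def matvec(A, x):
    """Multiply matrix A by vector x using the usual row-by-column rule."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# (x, y, z) = (1, 1, 2), found later in the chapter, satisfies A x = b:
assert matvec(A, [1, 1, 2]) == b
```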
1.2 Solving Linear Equations
In this section, we consider a systematic method of solving systems of linear equations. The method consists of two steps: first using Gaussian elimination to reduce the system to a simpler system of linear equations, and then back-substitution to find the solutions to the simpler system.

(In high school, systems of linear equations are often solved with ad hoc methods that are quite suitable for small systems, but which are not sufficiently systematic to tackle larger systems. It is very important to thoroughly learn the systematic method, as solving systems of linear equations is a fundamental part of many of the questions that arise in linear algebra. In fact, almost every question in linear algebra ultimately depends on setting up and solving a suitable system of linear equations!)
1.2.1 Elementary Row Operations
An elementary row operation is an operation that transforms a system
of linear equations into a different, but equivalent system of linear
equations, where “equivalent” means that the two systems have
identical solutions. The answer to the obvious question — “ Why
bother transforming one system of linear equations into another?” —
is that the new system might be simpler to solve than the original
system. In fact there are some systems of linear equations that are
extremely simple to solve, and it turns out that by systematically
applying a sequence of elementary row operations, we can transform
any system of linear equations into an equivalent system whose solutions are very simple to find.
Definition 1.2. (Elementary Row Operations)
An elementary row operation is one of the following three types of
transformation applied to a system of linear equations:
Type 1 Interchanging two equations.
Type 2 Multiplying an equation by a non-zero scalar.
Type 3 Adding a multiple of one equation to another equation.
In a system of linear equations, we let Ri denote the i-th equa-
tion, and so we can express an elementary row operation symboli-
cally as follows:
Ri ↔ Rj          Exchange equations Ri and Rj
Ri ← αRi         Multiply equation Ri by α
Ri ← Ri + αRj    Add α times Rj to Ri
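The three operations translate directly into code. A minimal pure-Python sketch (the function names are ours, and rows are 0-indexed here, while the text's R1, R2, . . . are 1-indexed):

```python
# Elementary row operations on a matrix stored as a list of rows.
def swap(M, i, j):                 # Type 1: Ri <-> Rj
    M[i], M[j] = M[j], M[i]

def scale(M, i, alpha):            # Type 2: Ri <- alpha*Ri, alpha non-zero
    assert alpha != 0, "Type 2 requires a non-zero scalar"
    M[i] = [alpha * a for a in M[i]]

def add_multiple(M, i, j, alpha):  # Type 3: Ri <- Ri + alpha*Rj
    M[i] = [a + alpha * b for a, b in zip(M[i], M[j])]

# Applying R3 <- R3 - 2*R1 to system (1.2), written as an augmented matrix:
M = [[1, 2, 1, 5], [0, 1, -1, -1], [2, 3, -1, 3]]
add_multiple(M, 2, 0, -2)
assert M[2] == [0, -1, -3, -7]    # the new third equation: -y - 3z = -7
```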
We will illustrate elementary row operations on a simple system
of linear equations:
x + 2y + z = 5
y − z = −1
2x + 3y − z = 3        (1.2)
Example 1.3. (Type 1 Elementary Row Operation) Applying the Type 1
elementary row operation R1 ↔ R2 (in words, “interchange equations 1
and 2") to the original system (1.2) yields the system of linear equations:
y − z = −1
x + 2 y + z = 5
2x + 3 y − z = 3
It is obvious that this new system of linear equations has exactly the same
solutions as the original system, because each individual equation is un-
changed and listing them in a different order does not alter which vectors
satisfy them all.
Example 1.4. (Type 2 Elementary Row Operation) Applying the Type
2 elementary row operation R2 ← 3R2 (in words, “multiply the second equation by 3”) to the original system (1.2) gives a new system of linear
equations:
x + 2 y + z = 5
3 y − 3 z = −3
2x + 3 y − z = 3
Again it is obvious that this system of linear equations has exactly the
same solutions as the original system. While the second equation is
changed, the solutions to this individual equation are not changed.⁶

⁶ This relies on the equation being multiplied by a non-zero scalar.
Example 1.5. (Type 3 Elementary Row Operation) Applying the Type 3
elementary row operation R3 ← R3 − 2R1 (in words, “add −2 times the first equation to the third equation”) to the original system (1.2) gives a
new system of linear equations:
x + 2y + z = 5
y − z = −1
−y − 3z = −7
In this case, it is not obvious that the system of linear equations has the
same solutions as the original. In fact, the system is actually different
from the original, but it happens to have the exact same set of solutions.
This is so important that it needs to be proved.
As foreshadowed in the last example, in order to use elementary
row operations with confidence, we must be sure that the set of solu-
tions to a system of linear equations is not changed when the system
is altered by an elementary row operation. To convince ourselves of
this, we need to prove⁷ that applying an elementary row operation to a system of linear equations neither destroys existing solutions nor creates new ones.

⁷ A proof in mathematics is a careful explanation of why some mathematical fact is true. A proof normally consists of a sequence of statements, each following logically from the previous statements, where each individual logical step is sufficiently simple that it can easily be checked. The word “proof” often alarms students, but really it is nothing more than a very simple line-by-line explanation of a mathematical statement. Creating proofs of interesting or useful new facts is the raison d’être of a professional mathematician.
Theorem 1.6. Suppose that S is a system of linear equations, and that
T is the system of linear equations that results by applying an elementary
row operation to S. Then the set of solutions to S is equal to the set of
solutions to T.
Proof. As discussed in the examples, this is obvious for Type 1 and Type 2 elementary row operations. So suppose that T arises from S by performing the Type 3 elementary row operation Ri ← Ri + αRj. Then S consists of m equations

S = {R1, R2, . . . , Rm}

while T only differs in the i-th equation

T = {R1, R2, . . . , Ri−1, Ri + αRj, Ri+1, . . . , Rm}.

It is easy to check that if a vector satisfies two equations Ri and Rj, then it also satisfies Ri + αRj, and so any solution to S is also a solution to T. What remains to be checked is that any solution to T is a solution to S. However if a vector satisfies all the equations in T, then it satisfies Ri + αRj and Rj, and so it satisfies the equation

(Ri + αRj) + (−αRj)

which is just Ri. Thus any solution to T also satisfies S.
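The key step of the proof, that a Type 3 operation is undone by Ri ← Ri − αRj, can also be checked numerically; a small sketch (names ours):

```python
# The inverse of Ri <- Ri + alpha*Rj is Ri <- Ri - alpha*Rj, so a Type 3
# operation neither destroys nor creates solutions (the heart of Theorem 1.6).
def add_multiple(M, i, j, alpha):
    M[i] = [a + alpha * b for a, b in zip(M[i], M[j])]

M = [[1, 2, 1, 5], [0, 1, -1, -1], [2, 3, -1, 3]]
original_row = list(M[2])
add_multiple(M, 2, 0, -2)   # apply  R3 <- R3 - 2*R1 ...
add_multiple(M, 2, 0, 2)    # ... and undo it with  R3 <- R3 + 2*R1
assert M[2] == original_row
```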
Now let’s consider applying an entire sequence of elementary
row operations to our example system of linear equations (1.2) to
reduce it to a much simpler form.
So, starting with
x + 2y + z = 5
y − z = −1
2x + 3y − z = 3
apply the Type 3 elementary row operation R3 ← R3 − 2R1 to get
x + 2y + z = 5
y − z = −1
−y − 3z = −7
followed by the Type 3 elementary row operation R3 ← R3 + R2
obtaining
x + 2y + z = 5
y − z = −1
−4z = −8
Now notice that the third equation only involves the variable z,
and so it can now be solved, obtaining z = 2. The second equation
involves just y, z and as z is now known, it really only involves y,
and we get y = 1. Finally, with both y and z known, the first equa-
tion only involves x and by substituting the values that we know
into this equation we discover that x = 1. Therefore, this system of linear equations has the unique solution (x, y, z) = (1, 1, 2).
Notice that the final system was essentially trivial to solve, and
so the elementary row operations converted the original system
into one whose solution was trivial.
1.2.2 The Augmented Matrix
The names of the variables in a system of linear equations are es-
sentially irrelevant — whether we call three variables x , y and z or
x1, x2 and x3 makes no fundamental difference to the equations or
their solution. Thus writing out each equation in full when writing down a sequence of systems of linear equations related by elemen-
tary row operations involves a lot of unnecessary repetition of the
variable names. Provided each equation has the variables in the
same order, all the information is contained solely in the coefficients,
and so these are all we need. Therefore we normally represent a
system of linear equations by a matrix known as the augmented
matrix of the system of linear equations; each row of the matrix rep-
resents a single equation, with the coefficients of the variables to the
left of the bar, and the constant term to the right of the bar. Each
column to the left of the bar contains all of the coefficients for a sin-
gle variable. For our example system (1.2), we have the following:
x + 2y + z = 5                [ 1  2  1 |  5 ]
y − z = −1                    [ 0  1 −1 | −1 ]
2x + 3y − z = 3               [ 2  3 −1 |  3 ]
Example 1.7. (From matrix to linear system) What system of linear
equations has the following augmented matrix?
[ 0 −1  2 | 3 ]
[ 1  0 −2 | 4 ]
[ 3  4  1 | 0 ]
The form of the matrix tells us that there are three variables, which we
can name arbitrarily, say x1, x2 and x3. Then the first row of the matrix
corresponds to the equation 0x1 − 1x2 + 2x3 = 3, and interpreting the
other two rows analogously, the entire system is
−x2 + 2x3 = 3
x1 − 2x3 = 4
3x1 + 4x2 + x3 = 0
We could also have chosen any other three names for the variables.
In other words, the augmented matrix for the system of linear
equations A x = b is just the matrix [ A | b].
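Building [A | b] is just a matter of appending each entry of b to the corresponding row of A; a short pure-Python sketch:

```python
# Form the augmented matrix [A | b] for the example system (1.2).
A = [[1, 2, 1], [0, 1, -1], [2, 3, -1]]
b = [5, -1, 3]

augmented = [row + [bi] for row, bi in zip(A, b)]
assert augmented == [[1, 2, 1, 5], [0, 1, -1, -1], [2, 3, -1, 3]]
```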
1.3 Gaussian Elimination
When solving a system of linear equations using the augmented
matrix, the elementary row operations⁸ are performed directly on the augmented matrix.

⁸ This is why they are called elementary row operations, rather than elementary equation operations, because they are always viewed as operating on the rows of the augmented matrix.

As explained earlier, the aim of the elementary row operations is
to put the matrix into a simple form from which it is easy to “read
off” the solutions; to be precise we need to define exactly the simple
form that we are trying to achieve.
Definition 1.8. (Row Echelon Form)
A matrix is in row echelon form if
1. Any rows of the matrix consisting entirely of zeros occur as the last
rows of the matrix, and
2. The first non-zero entry of each row is in a column strictly to the right
of the first non-zero entry in any of the earlier rows.
This definition is slightly awkward to read, but very easy to
grasp by example. Consider the two matrices
[ 1  1 −1  2  0 ]      [ 1  1 −1  2  0 ]
[ 0  0 −2  1  3 ]      [ 0  0 −2  1  3 ]
[ 0  0  0  1  0 ]      [ 0  2  0  1  0 ]
[ 0  0  0  0 −1 ]      [ 0  0  1  2 −1 ]
Neither matrix has any all-zero rows so the first condition is au-
tomatically satisfied. To check the second condition we need to
identify the first non-zero entry in each row — this is called the
leading entry:
[ 1  1 −1  2  0 ]      [ 1  1 −1  2  0 ]
[ 0  0 −2  1  3 ]      [ 0  0 −2  1  3 ]
[ 0  0  0  1  0 ]      [ 0  2  0  1  0 ]
[ 0  0  0  0 −1 ]      [ 0  0  1  2 −1 ]
In the first matrix, the leading entries in rows 1, 2, 3 and 4 occur in columns 1, 3, 4 and 5 respectively and so the leading entry for
each row always occurs strictly further to the right than the leading
entry in any earlier row. So this first matrix is in row-echelon form.
However for the second matrix, the leading entry in rows 2 and 3
occur in columns 3 and 2 respectively, and so the leading entry in
row 3 actually occurs to the left of the leading entry in row 2; hence
this matrix is not in row-echelon form.
Example 1.9. (Row-echelon form) The following matrices are all in row-
echelon form:

[ 1  0  2  1 ]   [ 1  1  2  3 ]   [ 1  0  0  0 ]
[ 0  0  1 −1 ]   [ 0  2  1 −1 ]   [ 0  2  1  1 ]
[ 0  0  0  0 ]   [ 0  0  3  0 ]   [ 0  0  0  1 ]
Example 1.10. (Not row-echelon form) None of the following matrices are
in row-echelon form:

[ 1  0  2  1 ]   [ 1  1  2  3 ]   [ 1  0  0  0 ]
[ 0  0  0  0 ]   [ 0  2  1 −1 ]   [ 0  0  1  1 ]
[ 0  0  1 −1 ]   [ 0  1  3  0 ]   [ 0  0  2  1 ]
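Definition 1.8 can be turned into a mechanical test. A pure-Python sketch (function names are ours), tried on the two matrices discussed earlier in this section:

```python
def leading_col(row):
    """Column index of the first non-zero entry, or None for an all-zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_row_echelon(M):
    """Check both conditions of Definition 1.8."""
    seen_zero_row = False
    prev = -1                       # leading column of the previous non-zero row
    for row in M:
        lead = leading_col(row)
        if lead is None:
            seen_zero_row = True    # all-zero rows must come last ...
        elif seen_zero_row or lead <= prev:
            return False            # ... and leading entries must move right
        else:
            prev = lead
    return True

assert is_row_echelon([[1, 1, -1, 2, 0], [0, 0, -2, 1, 3],
                       [0, 0, 0, 1, 0], [0, 0, 0, 0, -1]])
assert not is_row_echelon([[1, 1, -1, 2, 0], [0, 0, -2, 1, 3],
                           [0, 2, 0, 1, 0], [0, 0, 1, 2, -1]])
```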
Gaussian Elimination (sometimes called row-reduction) is a system-
atic method for applying elementary row operations to a matrix
until it is in row-echelon form. We’ll see in the next section that a
technique called back substitution, which involves processing the equations in reverse order, can easily determine the set of solu-
tions to a system of linear equations whose augmented matrix is in
row-echelon form.
Without further ado, here is the algorithm for Gaussian Elimina-
tion, first informally in words, and then more formally in symbols.
The algorithm is defined for any matrices, not just the augmented
matrices arising from a system of linear equations, because it has
many applications.
Definition 1.11. (Gaussian Elimination — in words)
Let A be an m × n matrix. At each stage in the algorithm, a particular
position in the matrix, called the pivot position, is being processed. Ini-
tially the pivot position is at the top-left of the matrix. What happens at
each stage depends on whether the pivot entry (that is, the number in the
pivot position) is zero or not.
1. If the pivot entry is zero then, if possible, interchange the pivot row
with one of the rows below it, in order to ensure that the pivot entry is
non-zero. This will be possible unless the pivot entry and every entry
below it are zero, in which case simply move the pivot position one
column to the right.
2. If the pivot entry is non-zero then, by adding a suitable multiple of the
pivot row to every row below the pivot row, ensure that every entry
below the pivot entry is zero. Then move the pivot position one column
to the right and one row down.
When the pivot position is moved off the matrix, then the process finishes
and the matrix will be in row-echelon form.
The process of “adding a multiple of the pivot row to every row
below it in order to zero out the column below the pivot entry” is
called pivoting on the pivot entry for short.
Example 1.12. (Gaussian Elimination) Consider the following matrix,
with the initial pivot position marked:
    [ 2  1  2  4 ]
A = [ 2  1  1  0 ]
    [ 4  3  2  4 ]
The initial pivot position is the (1, 1) position in the matrix, and the
pivot entry is therefore 2. Pivoting on the (1, 1)-entry is accomplished by
performing the two elementary row operations R2 ← R2 − R1 and R3 ← R3 − 2R1, leaving the matrix:

[ 2  1  2  4 ]
[ 0  0 −1 −4 ]   R2 ← R2 − R1
[ 0  1 −2 −4 ]   R3 ← R3 − 2R1
(The elementary row operations used are noted down next to the relevant rows to indicate how the row reduction is proceeding.) The new pivot entry is 0, but as the entry immediately under the pivot position is non-zero, interchanging the two rows moves a non-zero entry to the pivot position.

[ 2  1  2  4 ]
[ 0  1 −2 −4 ]   R2 ↔ R3
[ 0  0 −1 −4 ]   R3 ↔ R2
The next step is to pivot on this entry in order to zero out all the entries
below it and then move the pivot position. As the only entry below the
pivot is already zero, no elementary row operations need be performed, and
the only action required is to move the pivot:

[ 2  1  2  4 ]
[ 0  1 −2 −4 ]
[ 0  0 −1 −4 ]
Once the pivot position reaches the bottom row, there are no further opera-
tions to be performed (regardless of whether the pivot entry is zero or not)
and so the process terminates, leaving the matrix in row-echelon form
[ 2  1  2  4 ]
[ 0  1 −2 −4 ]
[ 0  0 −1 −4 ]
as required.
For completeness, and to provide a description more suitable
for implementing Gaussian elimination on a computer, we give the
same algorithm more formally, in a sort of pseudo-code.⁹

⁹ Pseudo-code is a way of expressing a computer program precisely, but without using the syntax of any particular programming language. In pseudo-code, assignments, loops, conditionals and other features that vary from language to language are expressed in natural language.
Definition 1.13. (Gaussian Elimination — in symbols)
Let A = (a_ij) be an m × n matrix and set two variables r ← 1, c ← 1.
(Here r stands for “row” and c for “column” and they store the pivot po-
sition.) Then repeatedly perform whichever one of the following operations
is possible (only one will be possible at each stage) until either r > m or
c > n, at which point the algorithm terminates.
1. If a_rc = 0 and there exists x > r such that a_xc ≠ 0, then perform the elementary row operation Rr ↔ Rx.

2. If a_rc = 0 and a_xc = 0 for all x > r, then set c ← c + 1.

3. If a_rc ≠ 0 then, for each x > r, perform the elementary row operation

Rx ← Rx − (a_xc / a_rc) Rr,

and then set r ← r + 1 and c ← c + 1.
When this algorithm terminates, the matrix will be in row-echelon form.
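The pseudo-code above translates almost line for line into Python. A sketch (the function name is ours; Fraction arithmetic keeps the reduction exact, and the pivot position is 0-indexed here rather than the definition's 1-indexed r and c):

```python
from fractions import Fraction

def gaussian_elimination(A):
    """Reduce a matrix to row echelon form, following Definition 1.13."""
    M = [[Fraction(a) for a in row] for row in A]
    m, n = len(M), len(M[0])
    r, c = 0, 0                              # the pivot position
    while r < m and c < n:
        if M[r][c] == 0:
            below = [x for x in range(r + 1, m) if M[x][c] != 0]
            if below:                        # case 1: swap a non-zero entry up
                M[r], M[below[0]] = M[below[0]], M[r]
            else:                            # case 2: column is zero, move right
                c += 1
        else:                                # case 3: pivot, zeroing out below
            for x in range(r + 1, m):
                factor = M[x][c] / M[r][c]
                M[x] = [a - factor * p for a, p in zip(M[x], M[r])]
            r, c = r + 1, c + 1
    return M

# Example 1.12 reproduced, including the row interchange:
ref = gaussian_elimination([[2, 1, 2, 4], [2, 1, 1, 0], [4, 3, 2, 4]])
assert ref == [[2, 1, 2, 4], [0, 1, -2, -4], [0, 0, -1, -4]]
```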
1.4 Back Substitution
Recall that the whole point of elementary row operations is to transform a system of linear equations into a simpler system with
the same solutions; in other words to change the problem to an easier
problem with the same answer. So after reducing the augmented
matrix of a system of linear equations to row-echelon form, we now
need a way to read off the solution set.
The first step is to determine whether the system is consistent or
otherwise, and this involves identifying the leading entries in each
row of the augmented matrix in row-echelon form — these are the
first non-zero entries in each row. In other words, run a finger along
each row of the matrix stopping at the first non-zero entry and
noting which column it is in.
Example 1.14. (Leading entries) The following two augmented matrices
in row-echelon form have their leading entries highlighted.
[ 1  0 −1  2 | 3 ]      [ 1  2 −1  2 | 3 ]
[ 0  0  2  1 | 0 ]      [ 0  0  2  1 | 0 ]
[ 0  0  0  0 | 2 ]      [ 0  0  0 −1 | 2 ]
[ 0  0  0  0 | 0 ]
The left-hand matrix of Example 1.14 has the property that one of the leading entries is on the right-hand side of the augmenting bar.

(You may wonder why we keep saying “to the right of the augmenting bar” rather than “in the last column”. The answer is that if we have more than one linear system with the same coefficient matrix, say Ax = b1 and Ax = b2, then we can form a “super-augmented” matrix [A | b1 b2] and solve both systems with one application of Gaussian elimination. So there may be more than one column to the right of the augmenting bar.)
If we “unpack” what this means for the system of linear equations,
then we see that the third row corresponds to the linear equation
0x1 + 0x2 + 0x3 + 0x4 = 2,
which can never be satisfied. Therefore this system of linear equa-tions has no solutions, or in other words, is inconsistent. This is in
fact a defining feature of an inconsistent system of linear equations, a
fact that is important enough to warrant stating separately.
Theorem 1.15. A system of linear equations is inconsistent if and only if
one of the leading entries in the row-echelon form of the augmented matrix
is to the right of the augmenting bar.
Proof. Left to the reader.
The right-hand matrix of Example 1.14 has no such problem,
and so we immediately conclude that the system is consistent — it
has at least one solution. Every column to the left of the augment-
ing bar corresponds to one of the variables in the system of linear
equations
  x1  x2  x3  x4
[ 1   2  −1   2 | 3 ]
[ 0   0   2   1 | 0 ]        (1.3)
[ 0   0   0  −1 | 2 ]
and so the leading entries identify some of the variables. In this case,
the leading entries are in columns 1, 3 and 4 and so the identified variables are x1, x3 and x4. The variables identified in this fashion
are called the basic variables (also known as leading variables) of the
system of linear equations. The following sentence is the key to
understanding solving systems of linear equations by back substitu-
tion:
Every non-basic variable of a system of linear equations is a free vari-
able or free parameter of the system of linear equations, while every
basic variable can be expressed uniquely as a combination of the free
parameters and/or constants.
The process of back-substitution refers to examining the equations in reverse order, and for each equation finding the unique expres-
sion for the basic variable corresponding to the leading entry of
that row. Let’s continue our examination of the right-hand matrix of
Example 1.14, also shown with the columns identified in (1.3).
The third row of the matrix, when written out as an equation,
says that −1x4 = 2, and so x4 = −2, which is an expression for
the basic variable x4 as a constant. The second row of the matrix
corresponds to the equation 2x3 + x4 = 0, but as we know now that
x4 = −2, this can be substituted in to give 2x3 − 2 = 0 or x3 = 1.
The first row of this matrix corresponds to the equation
x1 + 2x2 − x3 + 2x4 = 3
and after substituting in x3 = 1 and x4 = −2 this reduces to
x1 + 2x2 = 8. (1.4)
This equation involves one basic variable (that is, x1) together with a non-basic variable (that is, x2) and a constant (that is, 8). The rules
of back-substitution say that this should be manipulated to give an
expression for the basic variable in terms of the other things. So we
get
x1 = 8 − 2x2
and the entire solution set for this system of linear equations is
given by
S = {(8 − 2x2, x2, 1, −2) | x2 ∈ R}.
Therefore we conclude that this system of linear equations has
infinitely many solutions that can be described by one free parameter.
The astute reader will notice that (1.4) could equally well be written x2 = 4 − x1/2 and so we could use x1 as the free parameter,
rather than x2 — so why does back-substitution need to specify
which variable should be chosen as the free parameter? The answer
is that there is always an expression for the solution set that uses
the non-basic variables as the free parameters. In other words, the
process as described will always work.
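The back-substitution just performed can be checked numerically. The sketch below is our own illustration (assuming NumPy is available, and the helper name `solution` is ours): it encodes the row-echelon augmented matrix of Example 1.14 and confirms that the one-parameter family (8 − 2t, t, 1, −2) satisfies every equation for any choice of the free parameter t.

```python
import numpy as np

# Row-echelon system of Example 1.14:
#   x1 + 2x2 - x3 + 2x4 = 3
#             2x3 +  x4 = 0
#                   -x4 = 2
A = np.array([[1.0, 2.0, -1.0, 2.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, -1.0]])
b = np.array([3.0, 0.0, 2.0])

def solution(t):
    """The solution found by back-substitution, with x2 = t free."""
    return np.array([8.0 - 2.0 * t, t, 1.0, -2.0])

# Every member of the one-parameter family should satisfy A x = b.
for t in (-3.0, 0.0, 0.5, 7.0):
    assert np.allclose(A @ solution(t), b)
```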
Example 1.16. (Back substitution) Find the solutions to the system of
linear equations whose augmented matrix in row-echelon form is
0 2 −1 0 2 3 1
0 0 1 3 −1 0 2
0 0 0 0 1 1 0
0 0 0 0 0 1 5
First identify the leading entries in the matrix, and therefore the basic and
non-basic variables. The leading entries in each row are highlighted below
0 2 −1 0 2 3 1
0 0 1 3 −1 0 2
0 0 0 0 1 1 0
0 0 0 0 0 1 5
Therefore the basic variables are {x2, x3, x5, x6} while the free variables are {x1, x4} and so this system has infinitely many solutions that can be
described with two free parameters. Now back-substitute starting from the
last equation. The fourth equation is simply that x6 = 5, while the third
equation gives x5 + x6 = 0 which after substituting the known value for
x6 gives us x5 = −5. The second equation is
x3 + 3x4 − x5 = 2
and so it involves the basic variable x3 along with the free variable x4 and
the already-determined variable x5. Substituting the known value for x5
and rearranging to give an expression for x3, we get
x3 = −3 − 3x4 (1.5)
Finally the first equation is
2x2 − x3 + 2x5 + 3x6 = 1
and so substituting all that we have already determined we get
2x2 − (−3 − 3x4) + 2(−5) + 3(5) = 1
which simplifies to
x2 = (−7 − 3x4)/2.
What about x1? It is a variable in the system of linear equations, but it
did not actually occur in any of the equations. So if it does not appear
in any of the equations, then there are no restrictions on its values and
so it can take any value — therefore it is a free variable. Fortunately, the
rules for back-substitution have already identified it as a non-basic variable
as it should be. Therefore the final solution set for this system of linear
equations is
S = {(x1, (−7 − 3x4)/2, −3 − 3x4, x4, −5, 5) | x1, x4 ∈ R}
and therefore we have found an expression with two free parameters as
expected.
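As in the previous example, the two-parameter solution can be verified against the original equations. The check below is our own sketch (assuming NumPy; the helper name `solution` is ours), plugging arbitrary values of the free parameters x1 and x4 into the family found above.

```python
import numpy as np

# Row-echelon augmented matrix of Example 1.16.
M = np.array([[0.0, 2.0, -1.0, 0.0, 2.0, 3.0, 1.0],
              [0.0, 0.0, 1.0, 3.0, -1.0, 0.0, 2.0],
              [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 5.0]])
A, b = M[:, :-1], M[:, -1]

def solution(x1, x4):
    """The two-parameter family found by back-substitution."""
    return np.array([x1, (-7 - 3 * x4) / 2, -3 - 3 * x4, x4, -5.0, 5.0])

# Every choice of the free parameters should satisfy all four equations.
for x1, x4 in [(0.0, 0.0), (1.0, -2.0), (-4.0, 3.5)]:
    assert np.allclose(A @ solution(x1, x4), b)
```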
Key Concept 1.17. (Solving Systems of Linear Equations)
To solve a system of linear equations of the form A x = b, perform the
following steps:
1. Form the augmented matrix [ A | b].
2. Use Gaussian elimination to put the augmented matrix into row-
echelon form.
3. Use back-substitution to express each of the basic variables as a
combination of the free variables and constants.
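The back-substitution step of Key Concept 1.17 can be sketched as code. The function below is our own illustration, not from the text (assuming NumPy, and assuming the input augmented matrix is already in row-echelon form, i.e. step 2 has been done): it finds the pivot columns, treats the remaining columns as free parameters, and expresses the general solution as a particular solution plus one direction vector per free variable.

```python
import numpy as np

def back_substitute(M, tol=1e-12):
    """Given an augmented matrix [A | b] in row-echelon form, return
    (particular, directions) so that the general solution is
    particular + t1*directions[0] + ..., or None if inconsistent."""
    M = np.asarray(M, dtype=float)
    rows, cols = M.shape
    n = cols - 1                       # number of variables
    pivots = []                        # (row, pivot column) pairs
    for r in range(rows):
        nz = np.nonzero(np.abs(M[r]) > tol)[0]
        if len(nz) == 0:
            continue                   # all-zero row: no information
        if nz[0] == n:
            return None                # leading entry in last column
        pivots.append((r, nz[0]))
    free = [j for j in range(n) if j not in {c for _, c in pivots}]

    def solve_with(values):
        # back-substitute with the free variables fixed to 'values'
        x = np.zeros(n)
        for j, v in zip(free, values):
            x[j] = v
        for r, c in reversed(pivots):
            x[c] = (M[r, n] - M[r, c + 1:n] @ x[c + 1:]) / M[r, c]
        return x

    particular = solve_with([0.0] * len(free))
    directions = []
    for i in range(len(free)):
        e = [0.0] * len(free)
        e[i] = 1.0
        directions.append(solve_with(e) - particular)
    return particular, directions

# The matrix of Example 1.14 recovers x = (8, 0, 1, -2) + t*(-2, 1, 0, 0).
particular, directions = back_substitute([[1, 2, -1, 2, 3],
                                          [0, 0, 2, 1, 0],
                                          [0, 0, 0, -1, 2]])
assert np.allclose(particular, [8, 0, 1, -2])
assert np.allclose(directions[0], [-2, 1, 0, 0])
```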
1.4.1 Reasoning about systems of linear equations
Understanding the process of Gaussian elimination and back-substitution also allows us to reason about systems of linear equa-
tions, even if they are not explicitly defined, and make general
statements about the number of solutions to systems of linear equa-
tions. One of the most important is the following result, which says
that any consistent system of linear equations with more unknowns
than equations has infinitely many solutions.
Theorem 1.18. Suppose that A x = b is a system of m linear equations
in n variables. If m < n, then the system is either inconsistent or has
infinitely many solutions.
Proof. Consider the row-echelon form of the augmented matrix [A | b]. If the last column contains the leading entry of some row,
then the system is inconsistent. Otherwise, each leading entry is
in a column corresponding to a variable, and so there are at most
m basic variables. As there are n variables altogether, this leaves
at least n − m > 0 free parameters in the solution set and so there
are infinitely many solutions.
A homogeneous system of linear equations is one of the form
A x = 0 and these systems are always consistent.10 Thus Theorem 1.18 has the important corollary that “every homogeneous system
of linear equations with more unknowns than equations has infinitely
many solutions”.

10 Why is this true?
A second example of reasoning about a system of linear equa-
tions rather than just solving an explicit system is when the system
is not fully determined. For example, suppose that a and b are un-
known values. What can be said about the number of solutions of
the following system of linear equations?

⎡ 1 2 a ⎤ ⎡ x ⎤   ⎡ −3 ⎤
⎢ 0 1 2 ⎥ ⎢ y ⎥ = ⎢  b ⎥
⎣ 1 3 3 ⎦ ⎣ z ⎦   ⎣  0 ⎦
In particular, for which values of a and b will this system have 0, 1
or infinitely many solutions?
To answer this, start performing Gaussian elimination as usual,
treating a and b symbolically as their values are not known.11

11 Of course, it is necessary to make sure that you never compute anything that might be undefined, such as 1/a. If you need to use 1/a during the Gaussian elimination, then you need to separate out the cases a = 0 and a ≠ 0 and do them separately.

The row reduction proceeds in the following steps: the initial augmented matrix is

1 2 a −3
0 1 2 b
1 3 3 0
and so after pivoting on the top-left position we get
1 2 a −3
0 1 2 b
0 1 3 − a 3
R3 ← R3 − R1
and then
1 2 a −3
0 1 2 b
0 0 1 − a 3 − b
R3 ← R3 − R2
From this matrix, we can immediately see that if a ≠ 1 then 1 − a ≠ 0
and every variable is basic, which means that the system has a
unique solution (regardless of the value of b). On the other hand,
if a = 1 then either b ≠ 3, in which case the system is inconsistent,
or b = 3, in which case there are infinitely many solutions. We can
summarise this outcome:

a ≠ 1  Unique solution
a = 1 and b ≠ 3  No solutions
a = 1 and b = 3  Infinitely many solutions
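This case analysis can be spot-checked numerically. The sketch below is our own illustration (assuming NumPy; the function name `count_solutions` is ours) and uses the standard rank criterion rather than symbolic elimination: the system is inconsistent when the rank of the augmented matrix exceeds the rank of A, has a unique solution when rank(A) equals the number of variables, and otherwise has infinitely many solutions.

```python
import numpy as np

def count_solutions(a, b):
    """Classify the 3x3 system of the text, for given a and b, by rank."""
    A = np.array([[1.0, 2.0, a],
                  [0.0, 1.0, 2.0],
                  [1.0, 3.0, 3.0]])
    rhs = np.array([-3.0, b, 0.0])
    aug = np.column_stack([A, rhs])
    rA, rAug = np.linalg.matrix_rank(A), np.linalg.matrix_rank(aug)
    if rA < rAug:
        return "none"            # leading entry in the last column
    return "unique" if rA == 3 else "infinite"

assert count_solutions(a=2.0, b=7.0) == "unique"    # a != 1
assert count_solutions(a=1.0, b=0.0) == "none"      # a = 1, b != 3
assert count_solutions(a=1.0, b=3.0) == "infinite"  # a = 1, b = 3
```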
2
Vector Spaces and Subspaces
This chapter takes the first steps away from the geometric interpretation of vectors in familiar 2- or
3-dimensional space by introducing n-dimensional vectors and the vector space Rn, which must neces-
sarily be described and manipulated algebraically.
Before commencing this chapter, students should be able to:
• Solve systems of linear equations.
After completing this chapter, students will be able to:
• Determine when a set of vectors is a subspace, and
• Determine when a set of vectors is linearly independent, and
• Find a basis for a subspace, and hence determine its dimension.
2.1 The vector space Rn
The vector space Rn consists of all the n-tuples of real numbers,
which henceforth we call vectors; formally we say that
Rn = {(x1, x2, . . . , xn) | x1, x2, . . . , xn ∈ R}.
Thus R2 is just the familiar collection of pairs of real numbers that
we usually visualise by identifying each pair (x, y) with the point
(x, y) on the Cartesian plane, and R3 the collection of triples of real
numbers that we usually identify with 3-space.
A vector u = (u1, . . . , un) ∈ Rn may have different meanings:
• when n = 2 or 3 it could represent a geometric vector in Rn that
has both a magnitude and a direction;
• when n = 2 or 3 it could represent the coordinates of a point in
the Cartesian plane or in 3-space;
• it could represent certain quantities, e.g. u1 apples, u2 pears, u3
oranges, u4 bananas, . . .
• it may simply represent a string of real numbers.
The vector space Rn also has two operations that can be performed on vectors, namely vector addition and scalar multiplication.
Although their definitions are intuitively obvious, we give them
anyway:
Definition 2.1. (Vector Addition)
If u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) are vectors in Rn then their sum u + v is defined by
u + v = (u1 + v1, u2 + v2, . . . , un + vn).
In other words, two vectors are added coordinate-by-coordinate.
Definition 2.2. (Scalar Multiplication)
If v = (v1, v2, . . . , vn) and α ∈ R then the product αv is defined by

αv = (αv1, αv2, . . . , αvn).
In other words, each coordinate of the vector is multiplied by the scalar.
Example 2.3. Here are some vector operations:
(1,2, −1, 3) + (4,0,1,2) = (5,2,0,5)
(3,1,2) + (6, −1, −4) = (9,0, −2)
5(1,0, −1, 2) = (5,0, −5,10)
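These coordinate-wise definitions translate directly into code. The small helpers below are our own illustration (the names `vec_add` and `scalar_mult` are assumptions, not from the text) and reproduce the three computations of Example 2.3.

```python
def vec_add(u, v):
    """Coordinate-by-coordinate addition (Definition 2.1)."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def scalar_mult(alpha, v):
    """Multiply each coordinate by the scalar (Definition 2.2)."""
    return tuple(alpha * vi for vi in v)

assert vec_add((1, 2, -1, 3), (4, 0, 1, 2)) == (5, 2, 0, 5)
assert vec_add((3, 1, 2), (6, -1, -4)) == (9, 0, -2)
assert scalar_mult(5, (1, 0, -1, 2)) == (5, 0, -5, 10)
```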
Row or column vectors?
A vector in Rn is simply an ordered n-tuple of real numbers
and for many purposes all that matters is that we write it down
in such a way that it is clear which is the first coordinate, the
second coordinate and so on.
However a vector can also be viewed as a matrix, which is
very useful when we use matrix algebra (the subject of Chap-
ter 3) to manipulate equations involving vectors, and then a
choice has to be made whether to use a 1 × n matrix, i.e. a matrix with one row and n columns, or an n × 1 matrix, i.e. a
matrix with n rows and 1 column to represent the vector.
Thus a vector in R4 can be represented either as a row vector
such as
[1,2,3,4]
or as a column vector such as
1
2
3
4
.
For various reasons, mostly to do with the conventional no-
tation we use for functions (that is, we usually write f (x) rather
than (x) f ), it is more convenient mathematically to assume that
vectors are represented as column vectors most of the time. Unfortunately, in writing about mathematics, trying to typeset a row
vector such as [1,2,3,4] is much more convenient than typesetting a column vector such as
1
2
3
4
which, as you can see, leads to ugly and difficult-to-read paragraphs.
Some authors try to be very formal and use the notation for
a matrix transpose (see Chapter 3) to allow them to elegantly
typeset a column vector: so their text would read something
like: Let v = (1,2,3,4)T in which case everyone is clear that the
vector v is really a column vector.
In practice however, either the distinction between a row-
and column-vector is not important (e.g. adding two vectors
together) or it is obvious from the context; in either case there
is never any actual confusion caused by the difference. So to
reduce the notational overload of adding a slew of transpose
symbols that are almost never needed, in these notes we’ve
decided to write all vectors just as rows, but with the under-
standing that when it matters (in matrix equations), they are
really to be viewed as column vectors. In this latter case, it will
always be obvious from the context that the vectors must be column vectors anyway!
One vector plays a special role in linear algebra; the vector in
Rn with all components equal to zero is called the zero-vector and
denoted
0 = (0 , 0 , . . . , 0).
It has the obvious properties that v + 0 = 0 + v = v; this means that
it is an additive identity.1

1 This is just formal mathematical terminology for saying that you can add it to any vector without altering that vector.

2.2 Subspaces
In the study of 2- and 3- dimensional geometry, figures such as lines
and planes play a particularly important role and occur in many
different contexts. In higher-dimensional and more general vector
spaces, a similar role is played by a vector subspace or just subspace
which is a set of vectors that has three special additional properties.
Definition 2.4. (Vector Subspace)
Let S ⊆ Rn be a set of vectors. Then S is called a subspace of Rn if
(S1) 0 ∈ S, and
(S2) u + v ∈ S for all vectors u, v ∈ S, and
(S3) αv ∈ S for all scalars α ∈ R and vectors v ∈ S.
First we’ll go through these three conditions in turn and see
what they are saying. The first condition, (S1) simply says that a
subspace must contain the zero vector 0; when this condition does
not hold, it is an easy way to show that a given set of vectors is not a
subspace.
Example 2.5. The set of vectors S = {(x, y) | x + y = 1} is not a
subspace of R2 because the vector 0 = (0, 0) does not belong to S.
The second condition2 (S2) says that in order to be a subspace, a
set S of vectors must be closed under vector addition. This means that
if two vectors that are both in S are added together, then their sum
must remain in S.

2 Condition (S2) does not restrict what happens to the sum of two vectors that are not in S, or the sum of a vector in S and one not in S. It is only concerned with the sum of two vectors that are both in S.
Example 2.6. (Subspace)3 In R3, the xy-plane is the set of all vectors of
the form (x, y, 0) (where x and y can be anything). The xy-plane is closed
under vector addition because if we add any two vectors in the xy-plane
together, then the resulting vector also lies in the xy-plane.

3 In the next section, we’ll see how to present a formal proof that a set of vectors is closed under vector addition, but for these examples, geometric intuition is enough to see that what is being claimed is true.
Example 2.7. (Not a subspace) In R2, the unit disk is the set of vectors
{(x, y) | x2 + y2 ≤ 1}.
This set of vectors is not closed under vector addition because if we take u = (1, 0) and v = (0, 1), then both u and v are in the unit disk, but their
sum u + v = (1, 1) is not in the unit disk.
The third condition (S3) says that in order to qualify as a sub-
space, a set S of vectors must be closed under scalar multiplication,
meaning that if a vector is contained in S, then all of its scalar multi-
ples must also be contained in S.
Example 2.8. (Closed under scalar multiplication) In R2, the set of
vectors on the two axes, namely
S = {(x, y) | xy = 0} is closed under scalar multiplication, because it is clear that any multiple
of a vector on the x-axis remains on the x-axis, and any multiple of a
vector on the y-axis remains on the y-axis.
Example 2.9. (Not closed under scalar multiplication) In R2, the unit
disk, which was defined in Example 2.7, is not closed under scalar multi-
plication because if we take u = (1, 0) and α = 2, then αu = (2, 0) which
is not in the unit disk.
One of the fundamental skills needed in linear algebra is the
ability to identify whether a given set of vectors in Rn forms a subspace or not. Usually a set of vectors will be described in some
way, and you will need to be able to tell whether this set of vectors
is a subspace. To prove that a given set of vectors is a subspace,
it is necessary to show that all three conditions (S1), (S2) and (S3)
are satisfied, while to show that a set of vectors is not a subspace,
it is only necessary to show that one of the three conditions is not
satisfied.
2.2.1 Subspace proofs
In this subsection, we consider in more detail how to show whether
or not a given set of vectors is a subspace. It is much easier to
show that a set of vectors is not a subspace than to show that a set
of vectors is a subspace. The reason for this is that conditions (S2)
and (S3) apply to every pair of vectors in the given set. To show that
either condition fails, we only need to give a single example where
the condition does not hold, but to show that they are true, we need
to find a general argument that applies to every pair of vectors.
This asymmetry is so important that we give it a name.4

4 The “black swan” name comes from the famous notion that in order to prove or disprove the logical statement “all swans are white”, it would only be necessary to find one single black swan to disprove it, but it would be necessary to check every possible swan in order to prove it. Subspaces are the same — if a set of vectors is not a subspace, then it is only necessary to find one “black swan” showing that one of the conditions does not hold, but if it is a subspace then it is necessary to “check every swan” by proving that the conditions hold for every pair of vectors and scalars.
Key Concept 2.10. (The “Black Swan” concept)
To show that a set of vectors S is not closed under vector addition,
it is sufficient to find a single explicit example of two vectors u, v
that are contained in S, but whose sum u + v is not contained in S.
However, to show that a set of vectors S is closed under vector
addition, it is necessary to give a formal symbolic proof that applies
to every pair of vectors in S.
Example 2.11. (Not a subspace) The set S = {(w, x, y, z) | wx = yz} in
R4 is not a subspace because if u = (1,0,2,0) and v = (0,1,0,2), then
u, v ∈ S but u + v = (1,1,2,2) ∉ S, so condition (S2) does not hold.
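A “black swan” counterexample like the one in Example 2.11 is easy to check mechanically. The sketch below is our own illustration (the helper name `in_S` is ours): it verifies that u and v satisfy the defining condition wx = yz while their sum does not.

```python
def in_S(v):
    """Membership test for S = {(w, x, y, z) | wx = yz}."""
    w, x, y, z = v
    return w * x == y * z

u, v = (1, 0, 2, 0), (0, 1, 0, 2)
u_plus_v = tuple(a + b for a, b in zip(u, v))
assert in_S(u) and in_S(v)       # both vectors satisfy wx = yz
assert u_plus_v == (1, 1, 2, 2)
assert not in_S(u_plus_v)        # but their sum does not: (S2) fails
```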
One of the hardest techniques for first-time students of linear al-
gebra is to understand how to structure a proof that a set of vectors
is a subspace, so we’ll go slowly. Suppose that
S = {(x, y, z) | x − y = 2z}

is a set of vectors in R3, and we need to check whether or not it is a
subspace. Here is a model proof, interleaved with some discussion
about the proof.5

5 When you do your proofs it may help to structure them like this model proof, but don’t include the discussion — this is to help you understand why the model proof looks like it does, but it is not part of the proof itself.
(S1) It is obvious that
0 − 0 = 2(0)
and so 0 ∈ S.
Discussion: To check that 0 is in S, it is necessary to verify that the zero
vector satisfies the “defining condition” that determines S. In this case, the
defining condition is that the difference of the first two co-ordinates (that
is, x − y) is equal to twice the third coordinate (that is, 2 z). For the vector
0 = (0,0,0) we have all coordinates equal to 0, and so the condition is true.
(S2) Let u = (u1, u2, u3) ∈ S and v = (v1, v2, v3) ∈ S. Then
u1 − u2 = 2u3 (2.1)
v1 − v2 = 2v3 (2.2)
Discussion: To prove that S is closed under addition, we need to check
every possible pair of vectors, which can only be done symbolically. We give
symbolic names u and v to two vectors in S and write down the only facts
that we currently know — namely that they satisfy the defining condition
for S. These equations are given labels — in this case, (2.1) and (2.2),
because the proof must refer to these equations later.
Now consider the sum
u + v = (u1 + v1, u2 + v2, u3 + v3)
and test it for membership in S. As
(u1 + v1) − (u2 + v2) = u1 + v1 − u2 − v2 (rearranging)
= (u1 − u2) + (v1 − v2) (rearranging)
= 2u3 + 2v3 (by (2.1) and (2.2))
= 2(u3 + v3) (rearranging terms)
it follows that u + v ∈ S.
Discussion: To show that u + v is in S, we need to show that the difference
of its first two coordinates is equal to twice its third coordinate. So the
sequence of calculations starts with the difference of the first two coordinates
and then carefully manipulates this expression in order to show that it
is equal to twice the third coordinate. Every stage of the manipulation is justified either just as a rearrangement of the terms or by reference to some
previously known fact. At some stage in the manipulation, the proof must
use the two equations (2.1) and (2.2), because the result must depend on the
two original vectors being vectors in S.
(S3) Let u = (u1, u2, u3) ∈ S and α ∈ R. Then
u1 − u2 = 2u3 (2.3)
Discussion: To prove that S is closed under scalar multiplication, we need
to check every vector in S and scalar in R. We give the symbolic name u
to the vector in S and α to the scalar, and note down the only fact that we
currently know — namely that u satisfies the defining condition for S. We’ll need this fact later, and so give it a name, in this case (2.3).
Now consider the vector
αu = (αu1, αu2, αu3)
and test it for membership in S. As
αu1 − αu2 = α(u1 − u2) (rearranging)
= α(2u3) (by (2.3))
= 2(αu3)
it follows that αu ∈ S.
Discussion: To show that αu is in S, we need to show that the difference of its first two coordinates is equal to twice its third coordinate. So the
sequence of calculations starts with the difference of the first two coordinates
and then carefully manipulates it in order to show that it is equal to twice
the third coordinate. At some stage in the manipulation, the proof must use
the equations (2.3) because the result must depend on the original vector
being a member of S.
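Random spot-checks cannot replace a symbolic proof like the one above, but they are a useful sanity check before writing one. The sketch below is our own illustration (the helper names `in_S` and `random_member` are ours): it samples random vectors of S = {(x, y, z) | x − y = 2z} and confirms, as numerical evidence only, that sums and scalar multiples stay in S.

```python
import random

def in_S(v, tol=1e-9):
    """Membership test for S = {(x, y, z) | x - y = 2z}."""
    x, y, z = v
    return abs((x - y) - 2 * z) < tol

def random_member(rnd):
    # pick y and z freely, then force x = y + 2z so the vector lies in S
    y, z = rnd.uniform(-10, 10), rnd.uniform(-10, 10)
    return (y + 2 * z, y, z)

rnd = random.Random(0)
for _ in range(100):
    (u1, u2, u3), (v1, v2, v3) = random_member(rnd), random_member(rnd)
    alpha = rnd.uniform(-5, 5)
    assert in_S((u1 + v1, u2 + v2, u3 + v3))           # (S2), as evidence
    assert in_S((alpha * u1, alpha * u2, alpha * u3))  # (S3), as evidence
```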
It will take quite a bit of practice to be able to write this sort of
proof correctly, so do not get discouraged if you find it difficult at
first. Here are some examples to try out.
Example 2.12. These sets of vectors are subspaces:
1. The set of vectors {(w, x, y, z) | w + x + y + z = 0} in R4.
2. The xy-plane in R3.
3. The line x = y in R2.
4. The set of vectors {(x1, x2, . . . , xn) | x1 + x2 + . . . + xn−1 = xn} in
Rn.
while these sets of vectors are not subspaces:
1. The set of vectors {(w, x, y, z) | w + x + y + z = 1} in R4.
2. The plane normal to n = (1,2,1) passing through the point (1,1,1).
3. The line x = y − 1 in R2.
4. The set of vectors {(x1, x2, . . . , xn) | x1 + x2 + . . . + xn−1 ≥ xn} in
Rn for n ≥ 2.
2.2.2 Exercises
1. Show that a line in R2 is a subspace if and only if it passes
through the origin (0, 0).
2. Find a set of vectors in R2 that is closed under vector addition,
but not closed under scalar multiplication.
3. Find a set of vectors in R2 that is closed under scalar multiplica-
tion, but not closed under vector addition.
2.3 Spans and Spanning Sets
We start this section by considering a simple question:
What is the smallest subspace S of R2 containing the vector v =
(1, 2)?
If S is a subspace, then by condition (S1) of Definition 2.4 it must
also contain the zero vector, so it follows that S must contain at least
the two vectors {0, v}. But by condition (S2), the subspace S must
also contain the sum of any two vectors in S, and so therefore S
must also contain all the vectors
(2, 4), (3, 6), (4, 8), (5,10), . . .
But then, by condition (S3), it follows that S must also contain all
the multiples of v, such as
(−1, −2), (1/2,1), (1/4, 1/2), . . . ,
Therefore S must contain at least the set of vectors6

6 So if a subspace contains a vector v then it must contain every scalar multiple of v. In R2 and R3, this means that if a subspace contains a point, then it contains the line through the origin and that point.
{(α, 2α) | α ∈ R}
and in fact this set contains enough vectors to satisfy the three
conditions (S1), (S2) and (S3), and so it is the smallest subspace
containing v. Now let’s extend this result by considering the same
question but with a bigger starting set of vectors: Suppose that
A = {v1, v2, . . . , vk } is a set of vectors in Rn — what is the smallest
subspace of Rn that contains A?
To answer this, we need a couple more definitions:
Definition 2.13. (Linear Combination)
Let A = {v1, v2, . . . , vk} be a set of vectors in Rn. Then a linear combi-
nation of the vectors in A is any vector of the form
v = α1v1 + α2v2 + · · · + αk vk
where α1, α2, . . ., αk ∈ R are arbitrary scalars.
By slightly modifying the argument of the last paragraph, it should be clear that if a subspace contains the vectors v1, v2, . . . , vk,
then it also contains every linear combination of those vectors. As
we will frequently need to refer to the “set of all possible linear
combinations” of a set of vectors, we should give it a name:
Definition 2.14. (Span)
The span of A = {v1, v2, . . . , vk} is the set of all possible linear
combinations of the vectors in A, and is denoted span( A). In symbols,
span(A) = {α1v1 + α2v2 + · · · + αkvk | αi ∈ R, 1 ≤ i ≤ k}.
Therefore, if a subspace S contains a subset A ⊆ S then it also
contains the span of A. To answer the original question (“what is
the smallest subspace containing A”) it is enough to notice that the
span of A is always a subspace itself and so no further vectors need
to be added. This is sufficiently important to write out formally as a
theorem and to give a formal proof.
Theorem 2.15. (Span of anything is a subspace) Let A = {v1, v2, . . . , vk}
be a set of vectors in Rn. Then span(A) is a subspace of Rn, and is the
smallest subspace of Rn containing A.
Proof. We must show that the three conditions of Definition 2.4
hold.
(S1) It is clear that

0 = 0v1 + 0v2 + · · · + 0vk

and so 0 is a linear combination of the vectors in A and thus
0 ∈ span(A).

(The proof of (S2) below may seem like a lot of work just to say something that is almost obvious: if you take two linear combinations of a set of vectors and add them together, then the resulting vector is also a linear combination of the original set of vectors!)
(S2) Let u, v ∈ span( A). Then there are scalars αi, β i (1 ≤ i ≤ k )
such that
u = α1v1 + α2v2 + · · · + αk vk (2.4)
v = β1v1 + β2v2 + · · · + βk vk (2.5)
Now consider the sum u + v:

u + v = (α1v1 + α2v2 + · · · + αkvk) + (β1v1 + β2v2 + · · · + βkvk)   (by (2.4), (2.5))
      = (α1 + β1)v1 + (α2 + β2)v2 + · · · + (αk + βk)vk
and so u + v ∈ span( A).
(S3) Let v ∈ span(A) and α ∈ R. Then there are scalars βi (1 ≤ i ≤ k) such that
v = β1v1 + β2v2 + · · · + βk vk (2.6)
It is clear that
αv = α ( β1v1 + β2v2 + · · · + βk vk ) (by (2.6))
= (αβ1)v1 + (αβ2)v2 + · · · + (αβk )vk (rearranging)
and so αv ∈ span( A).
The arguments earlier in this section showed that any subspace
containing A also contains span( A) and as span( A) is a subspace
itself, it must be the smallest subspace containing A.
The span of a set of vectors gives us an easy way to find sub-
spaces — start with any old set of vectors, take their span and we get a subspace. If we do have a subspace given in this way, then
what can we say about it? Is this a useful way to create, or work
with, a subspace?
For example, suppose we start with
A = {(1,0,1), (3,2,3)}
as a set of vectors in R3. Then span( A) is a subspace of R3 — what
can we say about this subspace? The first thing to notice is that
we can easily check whether or not a particular vector is contained
in span(A), because it simply involves solving a system of linear equations.7

7 This shows that having a subspace of the form span(A) is a good representation of the subspace, because we can easily test membership of the subspace — that is, we “know” which vectors are contained in the subspace.
Continuing our example, to decide whether a vector v is con-
tained in span( A), we try to solve the vector equation
v = λ1(1,0,1) + λ2(3,2,3)
for the two “unknowns” λ1, λ2; this is a system of 3 linear equa-
tions in two unknowns and, as discussed in Chapter 1, can easily be
solved.
Example 2.16. (Vector not in span) If A = {(1,0,1), (3,2,3)}, then
v = (2,4,5) is not in span( A). This follows because the equation
(2,4,5) = λ1(1,0,1) + λ2(3,2,3)
yields the system of linear equations
λ1 + 3λ2 = 2
2λ2 = 4
λ1 + 3λ2 = 5
which is obviously inconsistent. Thus there is no linear combination of
the vectors in A that is equal to (2,4,5).
Example 2.17. (Vector in span) If A = {(1,0,1), (3,2,3)}, then v =
(5, −2, 5) is in span( A). This follows because the equation
(5, −2, 5) = λ1(1,0,1) + λ2(3,2,3)
yields the system of linear equations
λ1 + 3λ2 = 5
2λ2 = −2
λ1 + 3λ2 = 5

which has the unique solution λ2 = −1 and λ1 = 8. Therefore v ∈ span(A) because we have now found the particular linear combination
required.
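Deciding whether v ∈ span(A) is exactly a consistency question for a linear system, so it can be automated. The sketch below is our own illustration, not part of the text (assuming NumPy; the helper name `in_span` is ours): it solves the least-squares problem and declares membership when the residual is essentially zero.

```python
import numpy as np

def in_span(vectors, v, tol=1e-9):
    """Return True if v is a linear combination of the given vectors."""
    # columns of M are the spanning vectors; solve M @ lam ~ v
    M = np.column_stack(vectors)
    lam, *_ = np.linalg.lstsq(M, v, rcond=None)
    return np.allclose(M @ lam, v, atol=tol)

A = [np.array([1.0, 0.0, 1.0]), np.array([3.0, 2.0, 3.0])]
assert not in_span(A, np.array([2.0, 4.0, 5.0]))   # Example 2.16
assert in_span(A, np.array([5.0, -2.0, 5.0]))      # Example 2.17
```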
Continuing with A = {(1,0,1), (3,2,3)}, is there another descrip-
tion of the subspace span( A)? It is easy to see that every vector in
span( A) must have its first and third coordinates equal, and by try-
ing a few examples, it seems likely that every vector with first and
third coordinates equal is in span( A). To prove this, we would need
to demonstrate that a suitable linear combination of the two vectors can be found for any such vector. In other words, we need to show
that the equation
(x, y, x) = λ1(1,0,1) + λ2(3,2,3)
has a solution for all values of x and y. Fortunately this system of
linear equations can easily be solved symbolically with the result
that the system is always consistent with solution
λ1 = x − 3y/2,  λ2 = y/2.
Therefore we have the fact that
span({(1,0,1), (3,2,3)}) = {(x, y, x) | x, y ∈ R}
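The claimed parametrisation can be confirmed for arbitrary x and y. The quick check below is our own sketch: it draws random values of x and y, applies the symbolic solution λ1 = x − 3y/2, λ2 = y/2, and verifies that the combination lands exactly on (x, y, x).

```python
import random

v1, v2 = (1.0, 0.0, 1.0), (3.0, 2.0, 3.0)
rnd = random.Random(1)
for _ in range(50):
    x, y = rnd.uniform(-10, 10), rnd.uniform(-10, 10)
    lam1, lam2 = x - 3 * y / 2, y / 2      # the symbolic solution above
    combo = tuple(lam1 * a + lam2 * b for a, b in zip(v1, v2))
    assert all(abs(c - t) < 1e-9 for c, t in zip(combo, (x, y, x)))
```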
2.3.1 Spanning Sets
So far in this section, we have started with a small collection of vec-
tors (that is, the set A), and then built a subspace (that is, span( A))
from that set of vectors.

Why do we want to find a spanning set of vectors for a subspace? The answer is that a spanning set is an effective way of describing a subspace. Every subspace has a spanning set, and so it is also a universal way of describing a subspace. Basically, once you know a spanning set for a subspace, you can easily calculate everything about that subspace.

Now we consider the situation where we start with an arbitrary subspace V and try to find a set — hopefully a small set —
of vectors A such that V = span( A). This concept is sufficiently
important to warrant a formal definition:
Definition 2.18. (Spanning Set)
Let V ⊆ Rn be a subspace. Then a set A = {v1, v2, . . . , vk} of vectors,
each contained in V, is called a spanning set for V if
V = span( A).
Example 2.19. (Spanning Set) If V = {(x, y, 0) | x, y ∈ R}, then V is
a subspace of R3. The set A = {(1,0,0), (0,1,0)} is a spanning set for
V, because every vector in V is a linear combination of the vectors in A,
and every linear combination of the vectors in A is in V. There are other
spanning sets for V — for example, the set B = {(1,1,0), (1, −1, 0)} is
another spanning set for V.
Example 2.20. (Spanning Set) If V = {(w, x, y, z) | w + x + y + z = 0},
then V is a subspace of R4. The set
A = {(1, −1, 0, 0), (1, 0, −1, 0), (1, 0, 0, −1)}

is a spanning set for V because every vector in V is a linear combination
of the vectors in A, and every linear combination of the vectors in A is in
V. There are many other spanning sets for V — for example, the set

B = {(2, −1, −1, 0), (1, 0, −1, 0), (1, 0, 0, −1)}

is a different spanning set for the same subspace.
One critical point that often causes difficulty for students begin-
ning linear algebra is understanding the difference between “span”
and “spanning set”; the similarity in the phrases seems to cause
confusion. To help overcome this, we emphasise the difference.8

8 Another way to think of it is that a “spanning set” is like a list of LEGO® shapes that you can use to build a model, while the “span” is the completed model (the subspace). Finding a spanning set for a subspace is like starting with the completed model and asking “What shapes do I need to build this model?”
Key Concept 2.21. (Difference between span and spanning set) To
remember the difference between span and spanning set, make sure
you understand that:
• The span of a set A of vectors is the entire subspace that can be
“built” from the vectors in A by taking linear combinations in all
possible ways.
• A spanning set of a subspace V is a set of vectors that are
needed in order to “build” V.
In the previous examples, the spanning sets were just “given”
with no explanation of how they were found, and no proof that
they were the correct spanning sets. In order to show that a particu-
lar set A actually is a spanning set for a subspace V , it is necessary
to check two things:
1. Check that the vectors in A are actually contained in V — this
guarantees that span( A) ⊆ V .
2. Check that every vector in V can be made as a linear combination
of the vectors in A — this shows that V ⊆ span(A), and hence, together with the first step, that span(A) = V.
The first of these steps is easy and, by now, you will not be sur-
prised to discover that the second step can be accomplished by
solving a system of linear equations.9

9 In fact, almost everything in linear algebra ultimately involves nothing more than solving a system of linear equations!
Example 2.22. (Spanning Set With Proof) We show that the set A =
{(1,1, −1), (2,1,1)} is a spanning set for the subspace
V = {(x, y, z) | z = 2x − 3 y} ⊂ R3.
First notice that both (1,1, −1) and (2,1,1) satisfy the condition that
z = 2x − 3 y and so are actually in V. Now we need to show that every
vector in V is a linear combination of these two vectors. Any vector in
V has the form (x, y, 2x − 3 y) and so we need to show that the vector
equation
(x, y, 2x − 3 y) = λ1(1,1, −1) + λ2(2,1,1)
in the two unknowns λ1 and λ2 is consistent, regardless of the values of x
and y. Writing this out as a system of linear equations we get
λ1 + 2λ2 = x
λ1 + λ2 = y
−λ1 + λ2 = 2x − 3 y
Solving this system of linear equations using the techniques of the previ-
ous chapter shows that this system always has a unique solution, namely
λ1 = 2 y − x λ2 = x − y.
Hence every vector in V can be expressed as a linear combination of the
two vectors, showing that these two vectors are a spanning set for V.
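The claimed solution is easy to check by machine. The following sketch (plain Python; not part of the printed text, and the helper name is our own) confirms that λ1 = 2y − x, λ2 = x − y reproduces (x, y, 2x − 3y) for a grid of values:

```python
# Verify Example 2.22: every (x, y, 2x - 3y) in V is the combination
# (2y - x)(1, 1, -1) + (x - y)(2, 1, 1).
def is_combination(x, y):
    l1, l2 = 2 * y - x, x - y  # the unique solution of the system
    v = tuple(l1 * a + l2 * b for a, b in zip((1, 1, -1), (2, 1, 1)))
    return v == (x, y, 2 * x - 3 * y)

print(all(is_combination(x, y) for x in range(-5, 6) for y in range(-5, 6)))  # → True
```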
Actually finding a spanning set for a subspace is not so difficult,
because it can just be built up vector-by-vector. Suppose that a sub-
space V is given in some form (perhaps by a formula) and you need
to find a spanning set for V . Start by just picking any non-zero vec-
tor v1 ∈ V , and examine span({v1}) — if this is equal to V , then
you have finished, otherwise there are some vectors in V that can-
not yet be built just from v1. Choose one of these “unreachable”
vectors, say v2, add it to the set you are creating, and then examine
span({v1, v2}) to see if this is equal to V. If it is, then you are finished; otherwise there is another “unreachable” vector, which you call v3 and add to the set, and so on. After some finite number
of steps (say k steps), this process will eventually terminate10 when there are no more unreachable vectors in V, in which case

V = span({v1, v2, . . . , vk })

and you have found a spanning set for V.
10 The reason that this process must terminate (i.e. not go on for ever) will become clear over the next few sections.
Example 2.23. (Finding Spanning Set) Let
V = {(x, y, z) | z = 2x − 3 y},
which is a subspace of R3. To find a spanning set for V, we start by choos-
ing any non-zero vector that lies in V, say v1 = (1,0,2). It is clear that
span({v1}) is strictly smaller than V, because every vector in span({v1})
has a zero second coordinate, whereas there are vectors in V that do not
have this property. So we choose any one of these — say, v2 = (0,1, −3)
— and now consider span({v1, v2}), which we can now prove is actually
equal to V. Thus, a suitable spanning set for V is the set
A = {(1,0,2), (0,1, −3)}.
2.4 Linear Independence
In the last section, we learned how a subspace can always be de-
scribed by giving a spanning set for that subspace. There are many
spanning sets for any given subspace, and in this section we con-
sider when a spanning set is efficient — in the sense that it is as
small as it can be. For example, here are three different spanning sets for the xy-plane in R3 (remember that the xy-plane is the set of vectors {(x, y, 0) | x, y ∈ R}):

A1 = {(1,0,0), (0,1,0)}   A2 = {(1,1,0), (1, −1, 0), (1,3,0)}   A3 = {(2,2,0), (1,2,0)}
Which of these is the “best” spanning set to use? There is perhaps nothing much to choose between A1 and A3, because each of them contains two vectors11, but it is clear that A2 is unnecessarily big — if we throw out any of the three vectors in A2, then the remaining two vectors still span the same subspace. On the other hand, both of the spanning sets A1 and A3 are minimal spanning sets for the xy-plane in that if we discard any of the vectors, then the remaining set no longer spans the whole xy-plane.
11 But maybe A1 looks “more natural” because the vectors have such a simple form; later we will see that for many subspaces there is a natural spanning set, although this is not always the case.
The reason that A2 is not a smallest-possible spanning set for the
xy-plane is that the third vector is redundant — it is already a linear
combination of the first two vectors: (1,3,0) = 2(1,1,0) − (1, −1, 0)
and therefore any vector in span( A2) can be produced as a linear
combination only of the first two vectors. More precisely any linear
combination
α1(1,1,0) + α2(1, −1, 0) + α3(1,3,0)
of all three of the vectors in A2 can be rewritten as
α1(1,1,0) + α2(1, −1, 0) + α3(2(1,1,0) − (1, −1, 0))

which is equal to

(α1 + 2α3)(1,1,0) + (α2 − α3)(1, −1, 0)

which is just a linear combination of the first two vectors with altered scalars.
Therefore a spanning set for a subspace is an efficient way to
represent a subspace if none of the vectors in the spanning set is a
linear combination of the other vectors. While this condition is easy
to state, it is hard to work with directly, and so we use a condition
that means exactly the same thing but is easier to use.
Definition 2.24. (Linear Independence)
Let A = {v1, v2, . . . , vk} be a set of vectors in Rn. Then A is called linearly independent (or just independent) if the only solution to the
vector equation
λ1v1 + λ2v2 + · · · + λk vk = 0
in the unknowns λ1, λ2, . . ., λk is the trivial solution λ1 = λ2 = · · · =
λk = 0.
Before seeing why this somewhat strange definition means ex-
actly the same as having no one of the vectors being a linear com-
bination of the others, we’ll see a couple of examples. The astute reader (indeed, even the somewhat dull and apathetic reader) will
not be surprised to learn that testing a set of vectors for linear inde-
pendence involves solving a system of linear equations.
Example 2.25. (Independent Set) In order to decide whether the set of
vectors A = {(1,1,2,2), (1,0, −1, 2), (2,1,3,1)} in R4 is linearly
independent, we need to solve the vector equation
λ1(1,1,2,2) + λ2(1,0, −1, 2) + λ3(2,1,3,1) = (0,0,0,0).
This definitely has at least one solution, namely the trivial solution
λ1 = 0, λ2 = 0, λ3 = 0, and so the only question is whether it has
more solutions. The vector equation is equivalent to the system of linear
equations
λ1 + λ2 + 2λ3 = 0
λ1 + λ3 = 0
2λ1 − λ2 + 3λ3 = 0
2λ1 + 2λ2 + λ3 = 0
which can easily be shown, by the techniques of Chapter 1, to have a unique solution, namely the trivial one. Hence the set A is linearly independent.
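A mechanical way to run this check is to row-reduce and count pivots: the vectors are independent exactly when the number of pivots equals the number of vectors. This is a sketch in plain Python (the `rank` helper is ours, not from the text), using exact rational arithmetic to avoid rounding error:

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots after Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [(1, 1, 2, 2), (1, 0, -1, 2), (2, 1, 3, 1)]
print(rank(A) == len(A))   # → True: the three vectors are independent
```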
A set of vectors that is not linearly independent is called depen-
dent. To show that a set of vectors is dependent, it is only necessary
to find an explicit non-trivial linear combination12 of the vectors equal to 0.
12 There is an asymmetry here similar to the asymmetry in subspace proofs. To show that a set of vectors is dependent only requires one non-trivial linear combination, whereas to show that a set of vectors is independent it is necessary in principle to show that every non-trivial linear combination of the vectors is non-zero. Of course in practice this is done by solving the associated system of linear equations.
Example 2.26. (Dependent Set) Is the set
A = {(1,3, −1), (2,1,2), (4,7,0)}
in R3 linearly independent? To decide this, set up the vector equation
λ1(1,3, −1) + λ2(2,1,2) + λ3(4,7,0) = (0,0,0)
and check how many solutions it has. This is equivalent to the system of
linear equations
1λ1 + 2λ2 + 4λ3 = 0
3λ1 + λ2 + 7λ3 = 0
−λ1 + 2λ2 = 0
After Gaussian elimination, the augmented matrix for this system of linear
equations is
1 2 4 0
0 −5 −5 0
0 0 0 0
and so has infinitely many solutions, because λ3 is a free parameter.
While this is already enough to prove that A is dependent, it is always
useful to find an explicit solution which can then be used to double-check
the conclusion. As λ3 is free, we can find a solution by putting λ3 = 1, in
which case the second row gives λ2 = −1 and the first row gives λ1 = −2. And
indeed we can check that
−2(1,3, −1) − 1(2,1,2) + 1(4,7,0) = (0,0,0)
as required to prove dependence.
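The explicit dependence found above is a one-line check (a sketch, not part of the printed text):

```python
# Verify the non-trivial combination from Example 2.26:
# -2(1,3,-1) - 1(2,1,2) + 1(4,7,0) = (0,0,0).
v1, v2, v3 = (1, 3, -1), (2, 1, 2), (4, 7, 0)
combo = tuple(-2 * a - 1 * b + 1 * c for a, b, c in zip(v1, v2, v3))
print(combo)  # → (0, 0, 0)
```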
Now we’ll give a rigorous proof of the earlier claim that the
definition of linear independence (Definition 2.24) is just a way of
saying that none of the vectors is a linear combination of the others.
Theorem 2.27. Let A = {v1, v2, . . . , vk} be a set of vectors in Rn.
Then A is linearly independent if and only if none of the vectors in A are a
linear combination of the others.
Proof. We will actually prove the contrapositive13 statement, namely that A is linearly dependent if and only if one of the vectors is a linear combination of the others.
13 Given the statement “A implies B”, recall that the contrapositive statement is “not B implies not A”. If a statement is true then so is its contrapositive, and vice versa.
First suppose that one of the
vectors, say vi, is a linear combination of the others: then there are
scalars α1, α2, . . ., αi−1, α i+1, . . ., αk such that
vi = α1v1 + α2v2 + · · · + αi−1vi−1 + αi+1vi+1 + · · · + αk vk
and so there is a non-trivial linear combination of the vectors of A
equal to 0, namely:
α1v1 + α2v2 + · · · + αi−1vi−1 − 1vi + αi+1vi+1 + · · · + αk vk = 0
(This linear combination is not all-zero because the coefficient of vi
is equal to −1 which is definitely non-zero.)
Next suppose that the set A is linearly dependent. Then there is
some non-trivial linear combination of the vectors equal to 0:
α1v1 + α2v2 + · · · + αk vk = 0.
Because this linear combination is non-trivial, not all of the coefficients are equal to 0, and so we can pick one of them, say αi, that is
non-zero. But then
−αivi = α1v1 + α2v2 + · · · + αi−1vi−1 + αi+1vi+1 + · · · + αk vk,
and so because αi ≠ 0 we can divide by αi and get

vi = −(α1/αi)v1 − (α2/αi)v2 − · · · − (αi−1/αi)vi−1 − (αi+1/αi)vi+1 − · · · − (αk/αi)vk
so one of the vectors is a linear combination of the others.
There are two key facts about dependency that are intuitively
clear, but useful enough to state formally:
1. If A is a linearly independent set of vectors in Rn, then any sub-
set of A is also linearly independent.
2. If B is a linearly dependent set of vectors in Rn then any superset
of B is also linearly dependent.
In other words, you can remove vectors from an independent set
and it remains independent and you can add vectors to a depen-
dent set and it remains dependent.
2.5 Bases
In the last few sections, we have learned that giving a spanning set
for a subspace is an effective way of describing a subspace and that
a spanning set is efficient if it is linearly independent. Therefore an
excellent way to describe or transmit, for example by computer, a
subspace is to give a linearly independent spanning set for the sub-
space. This concept is so important that it has a special name:
Definition 2.28. (Basis)
Let V be a subspace of Rn. Then a basis for V is a linearly independent
spanning set for V. In other words, a basis is a set of vectors A ⊂ Rn
such that
• V = span( A), and
• A is linearly independent.
Example 2.29. (Basis) The set A = {(1,0,0), (0,1,0)} is a basis for
the xy-plane in R3, because it is a linearly independent set of vectors and
any vector in the xy-plane can be expressed as a linear combination of the vectors of A.
shows that you can start with any linearly independent set and augment it vector-by-vector to obtain a basis containing the original linearly independent set of vectors.16
16 We still have not yet shown that this process will actually terminate, but will do so in the next section.
Another approach to finding a basis of a subspace is to start with
a spanning set that is linearly dependent and to remove vectors from
it one-by-one. If the set is linearly dependent then one of the vec-
tors is a linear combination of the others, and so it can be removed
from the set without altering the span of the set of vectors. This
process can be repeated until the remaining vectors are linearly
independent in which case they form a basis.
Example 2.32. (Basis from a spanning set) Let
A = {(1,1, −2), (−2, −2, 4), (−1, −2, 3), (5, −5, 0)}
be a set of vectors in R3, and let V = span( A). What is a basis for V? We
start by testing whether A is linearly independent by solving the system of linear equations
λ1(1,1, −2) + λ2(−2, −2, 4) + λ3(−1, −2, 3) + λ4(5, −5, 0) = (0,0,0)
to see if it has any non-trivial solutions. If so, then one of the vectors can
be expressed as a linear combination of the others and discarded. In this
case, we discover that (−2, −2, 4) = −2(1,1, −2) and so we can throw
out (−2, −2, 4). Now we are left with
{(1,1, −2), (−1, −2, 3), (5, −5, 0)}
and test whether this set is linearly independent. By solving
λ1(1,1, −2) + λ2(−1, −2, 3) + λ3(5, −5, 0) = (0,0,0)
we discover that (5, −5, 0) = 15(1,1, −2) + 10(−1, −2, 3) and so we
can discard (5, −5, 0). Finally the remaining two vectors are linearly
independent and so the set
{(1,1, −2), (−1, −2, 3)}
is a basis for V.
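The pruning in Example 2.32 can be mechanised. The sketch below (helper names are our own; it builds the basis up greedily, keeping a vector only if it increases the rank of the vectors kept so far, rather than discarding dependent vectors one at a time as in the text, but it produces the same basis here):

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots after Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extract_basis(vectors):
    """Keep each vector that is not a combination of those already kept."""
    basis = []
    for v in vectors:
        if rank(basis + [list(v)]) > len(basis):
            basis.append(list(v))
    return [tuple(v) for v in basis]

A = [(1, 1, -2), (-2, -2, 4), (-1, -2, 3), (5, -5, 0)]
print(extract_basis(A))  # → [(1, 1, -2), (-1, -2, 3)]
```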
The most important property of a basis for a vector space V is
that every vector in V can be expressed as a linear combination of
the basis vectors in exactly one way; we prove this in the next result:
Theorem 2.33. Let B = {v1, v2, . . . , vk} be a basis for the sub-
space V. Then for any vector v ∈ V, there is a unique choice of scalars
α1, α2, . . . , αk such that
v = α1v1 + α2v2 + · · · + αk vk.
We call the scalars α1, . . . , αk the coordinates of v in the basis B.

Proof. As B is a spanning set for V, there is at least one way of expressing v as a linear combination of the basis vectors. So we just need to show that there cannot be two different linear combinations
each equal to v. So suppose that there are scalars α1, α2, . . . ,αk and
β1, β2, . . . , βk such that
v = α1v1 + α2v2 + · · · + αk vk
v = β1v1 + β2v2 + · · · + βk vk
Subtracting these expressions and rearranging, we discover that
0 = (α1 − β1)v1 + (α2 − β2)v2 + · · · + (αk − βk )vk .
As B is a linearly independent set of vectors, the only linear com-
bination equal to 0 is the trivial linear combination with all coeffi-
cients equal to 0, and so α1 − β1 = 0, α2 − β2 = 0, . . . , αk − βk = 0
and so αi = βi for all i. Therefore the two linear combinations for v
are actually the same.
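Theorem 2.33 suggests a computation: to find the coordinates of v in a basis B, solve the linear system whose columns are the basis vectors. A sketch (the `coords` helper is ours; it assumes B really is linearly independent and that v really lies in span(B)):

```python
from fractions import Fraction

def coords(basis, v):
    """Coordinates of v in `basis`: solve sum_j c_j * basis[j] = v exactly.
    Assumes `basis` is linearly independent and v lies in its span."""
    k, n = len(basis), len(v)
    # Augmented matrix: column j holds basis vector j, last column holds v.
    m = [[Fraction(basis[j][i]) for j in range(k)] + [Fraction(v[i])]
         for i in range(n)]
    r, pivots = 0, []
    for c in range(k):
        piv = next((i for i in range(r, n) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(n):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    sol = [Fraction(0)] * k
    for row, c in enumerate(pivots):
        sol[c] = m[row][-1] / m[row][c]
    return sol

B = [(1, 1, -2), (-1, -2, 3)]
v = (-1, -4, 5)              # equals 2*(1,1,-2) + 3*(-1,-2,3)
print(coords(B, v))          # → [Fraction(2, 1), Fraction(3, 1)]
```

By the theorem, these coordinates are the only possible ones, which is what makes the function well-defined.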
2.5.1 Dimension
As previously mentioned, a subspace of Rn will usually have infinitely many bases. However, these bases will all share one feature
— every basis for a subspace V contains the same number of vectors.
This fact is not at all obvious, and so we will give a proof for it.
Actually we will prove a slightly more technical result that has the
result about bases, along with a number of other useful results, as
simple consequences.17 While this is a result of fundamental importance, it uses some fiddly notation with lots of subscripts, so do not feel alarmed if you do not understand it first time through.
17 A result that is a straightforward consequence of a theorem is called a “corollary”.
Theorem 2.34. Let A = {v1, v2, . . . , vk} be a linearly independent set
of vectors. Then any set of more than k vectors contained in V = span(A) is dependent.
Proof. Let {w1, w2, . . . , wℓ} be a set of ℓ > k vectors in V. Then each of these can be expressed as a linear combination of the vectors in A, and so there are scalars αij ∈ R, where 1 ≤ i ≤ ℓ and 1 ≤ j ≤ k, such that:
w1 = α11v1 + α12v2 + · · · + α1k vk
w2 = α21v1 + α22v2 + · · · + α2k vk
...
wℓ = αℓ1v1 + αℓ2v2 + · · · + αℓk vk
Now consider what happens when we test {w1, w2, . . . , wℓ} for linear dependence. We try to solve the system of linear equations

β1w1 + β2w2 + · · · + βℓwℓ = 0 (2.7)
and determine if there are any non-trivial solutions to this system.
By replacing each wi in (2.7) with the corresponding expression as a linear combination of the vectors in A, we get a huge equation:

0 = β1(α11v1 + α12v2 + · · · + α1k vk)
  + β2(α21v1 + α22v2 + · · · + α2k vk)
  + · · ·
  + βℓ(αℓ1v1 + αℓ2v2 + · · · + αℓk vk).     (2.8)
However, this is a linear combination of the vectors in A that is
equal to the zero vector. Because A is a linearly independent set of
vectors, this happens if and only if the coefficients of the vectors vi
in (2.8) are all zero. In other words, the scalars {β1, β2, . . . , βℓ} must satisfy the following system of linear equations:

α11β1 + α21β2 + · · · + αℓ1βℓ = 0
α12β1 + α22β2 + · · · + αℓ2βℓ = 0
...
α1kβ1 + α2kβ2 + · · · + αℓkβℓ = 0.
This is a homogeneous18 system of linear equations, and so it is consistent. However, there are more variables than equations and so there is at least one non-basic variable, and hence there is at least one non-trivial choice of scalars {β1, β2, . . . , βℓ} satisfying (2.7), thereby showing that {w1, w2, . . . , wℓ} is linearly dependent.
18 The constant term in each equation is zero.
Corollary 2.35. Every basis for a subspace V of Rn contains the same
number of vectors; this number is then called the dimension of the sub-
space V and is denoted by dim(V ).
Proof. Suppose that A = {v1, v2, . . . , vk} and B = {w1, w2, . . . , wℓ} are two bases for V. Then both A and B are linearly independent sets of vectors and both have the same span. So by Theorem 2.34 it follows that k ≤ ℓ and ℓ ≤ k, and so ℓ = k.
Example 2.36. (Dimension of Rn) The standard basis for R2 contains
two vectors, the standard basis for R3 contains three vectors and the stan-
dard basis for Rn contains n vectors, so we conclude that the dimension
of Rn is equal to n.
Example 2.37. (Dimension of a line) A line through the origin in Rn is
a subspace consisting of all the multiples of a given non-zero vector:
L = {λv | λ ∈ R}
The set {v} containing the single vector v is a basis for L and so a line is
1-dimensional.
This shows that the formal algebraic definition of dimension
coincides with our intuitive geometric understanding of the word
dimension, which is reassuring.19
19 Of course, we would expect this to be the case, because the algebraic notion of “dimension” was developed as an extension of the familiar geometric concept.
Another important corollary of Theorem 2.34 is that we are fi-
nally in a position to show that the process of finding a basis by
extending20 a linearly independent set will definitely finish.
20 “Extending” just means “adding vectors to”.

Corollary 2.38. If V is a subspace of Rn, then any linearly independent
set A of vectors in V is contained in a basis for V.
Proof. If span(A) ≠ V, then adding a vector in V \ span(A) to A creates a strictly larger independent set of vectors. As no set
of n + 1 vectors in Rn is linearly independent, this process must
terminate in at most n steps, and when it terminates, the set is a
basis for V .
The next result seems intuitively obvious, because it just says
that dimension behaves as you would expect, in that a subspace can only properly contain other subspaces of strictly lower dimension.
Theorem 2.39. Suppose that S, T are subspaces of Rn and that S ⊊ T.
Then dim(S) < dim(T ).
Proof. Let BS be a basis for S. Then BS is a linearly independent set
of vectors contained in T , and so it can be extended to a basis for T .
As span(BS) = S ≠ T, the basis for T is strictly larger than the basis for S and so dim(S) < dim(T).
Corollary 2.35 has a number of important consequences. If A is a set of vectors contained in a subspace V, then normally there is no particular relationship between the properties “A is linearly independent” and “A is a spanning set for V”, in that A can have none,
either or both of these properties. However if A has the right size to
be a basis, then it must either have none or both of the properties.
Corollary 2.40. Let V be a k-dimensional subspace of Rn. Then
1. Any linearly independent set of k vectors of V is a basis for V.
2. Any spanning set of k vectors of V is a basis for V.
Proof. To prove the first statement, let A be a linearly independent set of k vectors in V. By Corollary 2.38, A can be extended to a basis, but by Corollary 2.35 this basis contains k vectors and so no vectors can be added to A. Therefore A is already a basis for V.
For the second statement, let B be a spanning set of k vectors, and let B′ be a largest linearly independent set of vectors contained in B. Then span(B′) = span(B) = V and so B′ is a basis for V. By Corollary 2.35 this basis contains k vectors and so B′ = B, showing that B was already linearly independent.
Example 2.41. (Dimension of a specific plane) The subspace V = {(x, y, z) | x + y + z = 0} is a plane through the origin in R3. What
is its dimension? We can quickly find two vectors in V, namely (1, −1, 0)
and (1,0, −1), and as they are not multiples of each other, the set
A = {(1, −1, 0), (1,0, −1)}
is linearly independent. As R3 is 3-dimensional, any proper subspace of R3 has dimension at most 2, and so this set of two vectors is a basis for V and dim(V) = 2.
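Numerically, the dimension claim amounts to a rank computation (a sketch with our own exact-arithmetic `rank` helper, not part of the printed text):

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots after Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# The two vectors of Example 2.41 both satisfy x + y + z = 0 ...
plane_basis = [(1, -1, 0), (1, 0, -1)]
assert all(x + y + z == 0 for x, y, z in plane_basis)
# ... and are independent, so dim(V) = 2.
print(rank(plane_basis))  # → 2
```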
3
Matrices and Determinants
This chapter introduces matrix algebra and explains the fundamental relationships between
matrices and their properties, and the various subspaces associated with a matrix.
Before commencing this chapter, students should be able to:
• Solve systems of linear equations,
• Confidently identify and manipulate subspaces, including rapidly determining spanning sets and
bases for subspaces, and
• Add and multiply matrices.
After completing this chapter, students will be able to:
• Understand the operations of matrix algebra and identify the similarities and differences between
matrix algebra and the algebra of real numbers,
• Describe, and find bases for, the row space, column space and null space of a matrix,
• Find the rank and nullity of a matrix and understand how they are related by the rank-nullity theo-
rem, and
• Compute determinants and understand the relationship between determinants, rank and invertibility
of matrices.
An m × n matrix is a rectangular array of numbers with m rows
and n columns.
3.1 Matrix Algebra
In this section we consider the algebra of matrices — that is, the system of mathematical operations such as addition, multiplication, inverses and so on, where the operands1 are matrices, rather than numbers. In isolation, the basic operations are all familiar from high school — in other words, adding two matrices or multiplying two matrices should be familiar to everyone — but matrix algebra is primarily concerned with the relationships between the operations.
1 This is mathematical terminology for “the objects being operated on”.
3.1.1 Basic Operations
The basic operations for matrix algebra are matrix addition, matrix multiplication, matrix transposition and scalar multiplication. For completeness, we give the formal definitions of these operations:
Definition 3.1. (Matrix Operations)
The basic matrix operations are matrix addition, matrix multiplication,
matrix transposition and scalar multiplication, which are defined as
follows:

Matrix Addition: Let A = (aij) and B = (bij) be two m × n matrices. Then their sum C = A + B is the m × n matrix defined by

cij = aij + bij.

Matrix Multiplication: Let A = (aij) be an m × p matrix, and B = (bij) be a p × n matrix. Then their product C = AB is the m × n matrix defined by

cij = ∑_{k=1}^{p} aik bkj.

Matrix Transposition: Let A = (aij) be an m × n matrix. Then the transpose C = AT of A is the n × m matrix defined by

cij = aji.
Scalar Multiplication: Let A = (aij ) be an m × n matrix, and α ∈ R be
a scalar. Then the scalar multiple C = α A is the m × n matrix defined by
cij = αaij .
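The four formulas of Definition 3.1 translate directly into code. A minimal sketch (plain Python with nested lists; the helper names are ours, not from the text):

```python
def mat_add(A, B):        # (A + B)_ij = a_ij + b_ij
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B):        # (AB)_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):         # (A^T)_ij = a_ji
    return [list(col) for col in zip(*A)]

def scalar_mul(alpha, A): # (alpha * A)_ij = alpha * a_ij
    return [[alpha * a for a in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_add(A, B))       # → [[1, 3], [4, 4]]
print(mat_mul(A, B))       # → [[2, 1], [4, 3]]
print(transpose(A))        # → [[1, 3], [2, 4]]
print(scalar_mul(2, A))    # → [[2, 4], [6, 8]]
```

Representing a matrix as a list of its rows keeps the row/column conventions of the chapter visible in the code.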
The properties of these operations are mostly obvious but, again for completeness, we list them all and give them their formal names.
Theorem 3.2. If A, B and C are matrices and α, β are scalars then,
whenever the relevant operations are defined, the following properties hold:
1. A + B = B + A (matrix addition is commutative)
2. ( A + B) + C = A + (B + C) (matrix addition is associative)
3. α( A + B) = α A + αB
4. (α + β) A = α A + β A
5. (αβ) A = α( β A)
6. A(BC) = ( AB)C (matrix multiplication is associative)
7. (α A)B = α( AB) and A(αB) = α( AB)
8. A(B + C) = AB + AC (multiplication is left-distributive over
addition)
9. ( A + B)C = AC + BC (multiplication is right-distributive over
addition)
10. ( AT )T = A
11. ( A + B)T = AT + BT
12. ( AB)T = BT AT
Proof. All of these can be proved by elementary algebraic manip-
ulation of the expressions for (i, j)-entry of the matrices on both
sides of each equation. We omit the proofs because they are slightly
tedious and not very illuminating.2
2 However it may be worth your while working through one of them, say the proof that matrix multiplication is associative, to convince yourself that you can do it.
Almost all of the properties in Theorem 3.2 are unsurprising and essentially mirror the properties of the algebra of real numbers.3
3 A 1 × 1 matrix can be viewed as essentially identical to a real number, and so this is also not surprising.
Probably the only property in the list that is not immediately “obvi-
ous” is property (12) stating that the transpose of a matrix product
is the product of the matrix transposes in reverse order.
( AB)T = B T AT
This property extends to longer products of matrices, for example
we can find the transpose of ABC as follows:4

(ABC)T = ((AB)C)T = CT (AB)T = CT (BT AT ) = CT BT AT

4 A cautious reader might — correctly — object that ABC is not a legitimate expression in matrix algebra, because we have only defined matrix multiplication to be a product of two matrices. To make this a legal expression, it really needs to be parenthesised, and so we should use either A(BC) or (AB)C. However, because matrix multiplication is associative, these evaluate to the same matrix and so, as it does not matter which way the product is parenthesised, we omit the parentheses altogether by convention.
It is a nice test of your ability to structure a proof by induction to
prove formally that
(A1 A2 · · · An−1 An)T = (An)T (An−1)T · · · (A2)T (A1)T.
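Property (12) and its extension to three factors can be spot-checked with small matrices (a sketch with our own helper functions, not part of the printed text):

```python
def mat_mul(A, B):
    # (AB)_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [2, 5]]
C = [[1, 1], [0, 1]]

# (AB)^T = B^T A^T
print(transpose(mat_mul(A, B)) ==
      mat_mul(transpose(B), transpose(A)))                         # → True
# (ABC)^T = C^T B^T A^T
print(transpose(mat_mul(mat_mul(A, B), C)) ==
      mat_mul(mat_mul(transpose(C), transpose(B)), transpose(A)))  # → True
```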
However, rather than considering the obvious properties that
are in the list, it is more instructive to consider the most obvious
omission from the list; in other words, an important property that
matrix algebra does not share with the algebra of real numbers. This
is the property of commutativity of multiplication because, while
the multiplication of real numbers is commutative, it is easy to check by example that matrix multiplication is not commutative. For
example,

[ 1 1 ] [ 3 −1 ]   [ 3 3 ]
[ 2 1 ] [ 0  4 ] = [ 6 2 ]

but

[ 3 −1 ] [ 1 1 ]   [ 1 2 ]
[ 0  4 ] [ 2 1 ] = [ 8 4 ].
If A and B are two specific matrices, then it might be the case that AB = BA, in which case the two matrices are said to commute, but usually it will be the case that AB ≠ BA. This is a key difference between matrix algebra and the algebra of real numbers.
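The non-commuting pair used above can be checked directly (a sketch; `mat_mul` is our own helper, not from the text):

```python
def mat_mul(A, B):
    # (AB)_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [2, 1]]
B = [[3, -1], [0, 4]]
print(mat_mul(A, B))                   # → [[3, 3], [6, 2]]
print(mat_mul(B, A))                   # → [[1, 2], [8, 4]]
print(mat_mul(A, B) == mat_mul(B, A))  # → False
```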
There are some other key differences worth delving into: in real
algebra, the numbers 0 and 1 play special roles, being the additive
identity and multiplicative identity respectively. In other words, for
any real number x ∈ R we have
x + 0 = 0 + x = x and 1 · x = x · 1 = x

and

0 · x = x · 0 = 0. (3.1)
In the algebra of square matrices (that is, n × n matrices for some n) we can analogously find an additive identity and a multiplicative
identity. The additive identity is the matrix On with every entry equal
to zero, and it is obvious that for any n × n matrix A,
A + On = On + A = A .
The multiplicative identity is the matrix In where every entry on the main diagonal5 is equal to one, and every entry off the main diagonal is equal to zero. Then for any n × n matrix A,

AIn = In A = A.

5 The main diagonal consists of the (1,1), (2,2), . . . , (n,n) positions. As an example,

I3 = [ 1 0 0 ]
     [ 0 1 0 ]
     [ 0 0 1 ].
When the size of the matrices is unspecified, or irrelevant, we will
often drop the subscript and just use O and I respectively. As the
terms “additive/multiplicative identity" are rather cumbersome, the
matrix O is usually called the zero matrix and the matrix I is usually
called the identity matrix or just the identity.
The property (3.1) relating multiplication and zero also holds in matrix algebra, because it is clear that
AO = O A = O
for any square matrix A. However, there are other important prop-
erties of real algebra that are not shared by matrix algebra. In par-
ticular, in real algebra there are no non-zero zero divisors, so that if
xy = 0 then at least one of x and y is equal to zero. However this is
not true for matrices — there are products equal to the zero matrix
even if neither matrix is zero. For example,
[ 1 1 ] [  2 −1 ]   [ 0 0 ]
[ 2 2 ] [ −2  1 ] = [ 0 0 ].
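This zero-divisor example is easy to confirm by machine (a sketch; `mat_mul` is our own helper):

```python
def mat_mul(A, B):
    # (AB)_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [2, 2]]
B = [[2, -1], [-2, 1]]
print(mat_mul(A, B))   # → [[0, 0], [0, 0]], though neither A nor B is zero
```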
Definition 3.3. (A menagerie of square matrices)
Suppose that A = (aij ) is an n × n matrix. Then
• A is the zero matrix if aij = 0 for all i, j.
• A is the identity matrix if aij = 1 if i = j and 0 otherwise.
• A is a symmetric matrix if AT = A.
• A is a skew-symmetric matrix if AT = − A.
• A is a diagonal matrix if aij = 0 for all i ≠ j.
• A is an upper-triangular matrix if aij = 0 for all i > j.
• A is a lower-triangular matrix if aij = 0 for all i < j .
• A is an idempotent matrix if A2 = A, where A2 = AA is the
product of A with itself.
• A is a nilpotent matrix if Ak = O for some k, where Ak = AA · · · A (k factors).
Example 3.4. (Matrices of various types) Consider the following 3 × 3
matrices:
A = [ 1 0 −1 ]   B = [ −1 0 0 ]   C = [ −1 0 1 ]
    [ 0 2  3 ]       [  0 2 0 ]       [  0 2 1 ]
    [ 0 0  1 ]       [  0 0 0 ]       [  1 1 3 ]
Then A is an upper-triangular matrix, B is a diagonal matrix and C is
a symmetric matrix.
Example 3.5. (Nilpotent matrix) The matrix
A = [  2  4 ]
    [ −1 −2 ]

is nilpotent because

A2 = [  2  4 ] [  2  4 ]   [ 0 0 ]
     [ −1 −2 ] [ −1 −2 ] = [ 0 0 ]
Exercise 3.1.1. Show that an upper-triangular matrix with zero diagonal
is nilpotent.
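Exercise 3.1.1 can be explored numerically before proving it: each time a strictly upper-triangular matrix is multiplied by itself, the non-zero entries are pushed further above the diagonal until none remain. A sketch for one 3 × 3 instance (the matrix N is our own example, not from the text):

```python
def mat_mul(A, B):
    # (AB)_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

N = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]          # upper-triangular with zero diagonal

N2 = mat_mul(N, N)
N3 = mat_mul(N2, N)
print(N2)   # → [[0, 0, 3], [0, 0, 0], [0, 0, 0]]
print(N3)   # → [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

In general an n × n matrix of this shape satisfies Nn = O, which is what the exercise asks you to prove.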
3.2 Subspaces from matrices
There are various vector subspaces associated with a matrix, and
there are useful relationships between the properties of the matrices
and subspaces. The three principal subspaces associated with a
matrix are the row space, the column space and the null space of a
matrix. The first two of these are defined analogously to each other, while the third is somewhat different. If A is an m × n matrix,
then each of the rows of the matrix can be viewed as a vector in Rn,
while each of the columns of the matrix can be viewed as a vector in
Rm.
Definition 3.6. (Row and Column Space)
Let A be an m × n matrix. Then the row space and column space are
defined as follows:
Row Space The row space of A is the subspace of Rn that is spanned
by the rows of A. In other words, the row space is the subspace consisting of all the linear combinations of the rows of A. We denote it by
rowsp( A).
Column Space The column space of A is the subspace of Rm that is
spanned by the columns of A. In other words, the column space is the
subspace consisting of all the linear combinations of the columns of A. We
denote it by colsp( A).
Example 3.7. (Row space and Column Space) Let A be the 3 × 4 matrix
A = [ 1  0 1 −1 ]
    [ 2 −1 2  0 ]        (3.2)
    [ 1  0 2  1 ]
Then the row space of A is the subspace of R4 defined by
rowsp( A) = span({(1,0,1, −1), (2, −1,2,0), (1,0,2,1)}),
while the column space of A is the subspace of R3 defined by
colsp( A) = span({(1,2,1), (0, −1, 0), (1,2,2), (−1,0,1)}).
(Remember the conventions regarding row and column vectors described
in Chapter 1.)
As described in Chapter 2, it is easy to answer any particular
question about a subspace if you know a spanning set for that
subspace. In particular, it is easy to determine whether a given
vector is in the row or column space of a matrix just by setting up
the appropriate system of linear equations.
Example 3.8. (Vector in row space) Is the vector (2,1,−1,3) in the row
space of the matrix A shown in (3.2) above? This question is equivalent to
asking whether there are scalars λ1, λ2 and λ3 such that
λ1(1,0,1, −1) + λ2(2, −1,2,0) + λ3(1,0,2,1) = (2,1, −1, 3).
By considering each of the four coordinates in turn, this corresponds to the
following system of four linear equations in the three variables:
     λ1 + 2λ2 +  λ3 =  2
        −  λ2       =  1
     λ1 + 2λ2 + 2λ3 = −1
    −λ1       +  λ3 =  3
The augmented matrix for this system is
    [  1  2  1 |  2 ]
    [  0 −1  0 |  1 ]
    [  1  2  2 | −1 ]
    [ −1  0  1 |  3 ]

which, after row-reduction, becomes

    [ 1  2  1 |  2 ]
    [ 0 −1  0 |  1 ]
    [ 0  0  1 | −3 ]
    [ 0  0  0 | 13 ]
and so the system is inconsistent and we conclude that (2,1,−1,3) ∉ rowsp(A). Notice that the coefficient matrix of the system of linear equations is the transpose of the original matrix.
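The row reduction in Example 3.8 can be reproduced mechanically. The Python sketch below is illustrative only: the `rref` helper is our own (it uses exact rational arithmetic from the standard `fractions` module and computes the fully reduced row-echelon form, which goes slightly further than the reduction above), but the inconsistency test is exactly the one used in the example:

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination over the rationals (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return M

# Augmented matrix of Example 3.8: coefficients from A^T, right side (2,1,-1,3).
aug = [[1, 2, 1, 2],
       [0, -1, 0, 1],
       [1, 2, 2, -1],
       [-1, 0, 1, 3]]
R = rref(aug)

# Inconsistent exactly when some row reads "0 = non-zero".
inconsistent = any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in R)
```

The reduced matrix differs from the row-echelon form shown above, but the tell-tale row with zero coefficients and a non-zero right-hand side appears either way, confirming that (2,1,−1,3) is not in the row space.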
Example 3.9. (Vector in column space) Is the vector (1,−1,2) in the column
space of the matrix A shown in (3.2) above? This question is equivalent to
asking whether there are scalars λ1, λ2, λ3 and λ4 such that
λ1(1,2,1) + λ2(0, −1, 0) + λ3(1,2,2) + λ4(−1,0,1) = (1, −1, 2).
By considering each of the three coordinate positions in turn, this corre-
sponds to a system of three equations in the four variables:
     λ1       +  λ3 − λ4 =  1
    2λ1 − λ2 + 2λ3       = −1
     λ1       + 2λ3 + λ4 =  2
The augmented matrix for this system is

    [ 1  0  1 −1 |  1 ]
    [ 2 −1  2  0 | −1 ]
    [ 1  0  2  1 |  2 ]

which, after row reduction, becomes

    [ 1  0  1 −1 |  1 ]
    [ 0 −1  0  2 | −3 ]
    [ 0  0  1  2 |  1 ]
and so this system of linear equations has three basic variables, one free
parameter and therefore infinitely many solutions. So we conclude that
(1, −1, 2) ∈ colsp( A). We could, if necessary, or just to check, find a
particular solution to this system of equations. For example, if we set the
free parameter λ4 = 1 then the corresponding solution is λ1 = 3, λ2 = 5,
λ3 = −1 and λ4 = 1 and we can check that
3(1,2,1) + 5(0, −1, 0) − (1,2,2) + (−1,0,1) = (1, −1, 2).
Notice that in this case, the augmented matrix of the system of linear
equations is just the original matrix itself.
In the previous two examples, the original question led to sys-
tems of linear equations whose coefficient matrix was either the
original matrix or the transpose of the original matrix.
In addition to being able to identify whether particular vectors
are in the row space or column space of a matrix, we would also
like to be able to find a basis for the subspace and thereby determine its dimension. In Chapter 2 we described a technique where
any spanning set for a subspace can be reduced to a basis by suc-
cessively throwing out vectors that are linear combinations of the
others. While this technique works perfectly well for determining
the dimension of the row space or column space of a matrix, there
is an alternative approach based on two simple observations:
1. Performing elementary row operations on a matrix does not
change its row space. (However, elementary row operations do
change the column space!)

2. The non-zero rows of a matrix in row-echelon form are linearly
independent, and therefore form a basis for the row space of that
matrix.
The consequence of these two facts is that it is very easy to find
a basis for the row space of a matrix — simply put it into row-echelon form and then write down the non-zero rows that are
found. However the basis that is found by this process will not
usually be a subset of the original rows of the matrix. If it is neces-
sary to find a basis for the row space whose vectors are all original
rows of the matrix, then either of the two earlier techniques can be
used.
Example 3.10. (Basis of row space) Consider the problem of finding a
basis for the row space of the 4 × 5 matrix
    A = [ 1  2 −1 −1  4 ]
        [ 0  0  0  0  0 ]
        [ 1 −1  1  0  1 ]
        [ 3  0  1 −1  6 ].
After performing the elementary row operations R3 ← R3 − R1, R4 ←
R4 − 3R1, then R2 ↔ R3 and finally R4 ← R4 − 2R2, we end up with the
following matrix, which we denote A′, which is in row-echelon form.

    A′ = [ 1  2 −1 −1  4 ]
         [ 0 −3  2  1 −3 ]
         [ 0  0  0  0  0 ]
         [ 0  0  0  0  0 ]
The key point is that the elementary row operations have not changed the
row space of the matrix in any way, and so rowsp(A′) = rowsp(A), where
A′ denotes the row-echelon matrix obtained above. However it is obvious
that the two non-zero rows

    {(1,2,−1,−1,4), (0,−3,2,1,−3)}

are a basis for the row space of A′, and so they are also a basis for the row
space of A.

To find a basis for the column space of the matrix A, we cannot
do elementary row operations because they alter the column space.
However it is clear that colsp(A) = rowsp(AT), and so just transposing
the matrix and then performing the same procedure will
find a basis for the column space of A.
Example 3.11. (Basis of column space) What is a basis for the column
space of the matrix A of Example 3.10? We first transpose the matrix,
getting
    AT = [  1  0  1  3 ]
         [  2  0 −1  0 ]
         [ −1  0  1  1 ]
         [ −1  0  0 −1 ]
         [  4  0  1  6 ]
and then perform Gaussian elimination to obtain the row-echelon matrix
    [ 1  0  1  3 ]
    [ 0  0 −3 −6 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]

whose row space has basis {(1,0,1,3), (0,0,−3,−6)}. Therefore these
two vectors are a basis for the column space of A.
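The procedure of Examples 3.10 and 3.11 can be automated. In the sketch below (the helper functions are our own, not from the text; exact arithmetic comes from the standard `fractions` module), full Gauss-Jordan reduction is applied to A and to its transpose. The particular basis vectors it returns differ from those found above, but they span the same subspaces, and in particular there are two of each:

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination over the rationals (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return M

def nonzero_rows(M):
    """The non-zero rows of a matrix in (reduced) row-echelon form."""
    return [row for row in M if any(x != 0 for x in row)]

A = [[1, 2, -1, -1, 4],
     [0, 0, 0, 0, 0],
     [1, -1, 1, 0, 1],
     [3, 0, 1, -1, 6]]
At = [list(col) for col in zip(*A)]   # transpose of A

row_basis = nonzero_rows(rref(A))     # spans rowsp(A)
col_basis = nonzero_rows(rref(At))    # spans colsp(A) = rowsp(A^T)
```

Both bases have two elements, matching the dimensions found in the examples.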
What can be said about the dimension of the row space and col-
umn space of a matrix? In the previous two examples, we found
that the row space of the matrix A is a 2-dimensional subspace of
R5, and the column space of A is a 2-dimensional subspace of R4.
In particular, even though they are subspaces of different ambient
vector spaces, the dimensions of the row space and column space
turn out to be equal. This is not an accident, and in fact we have the
following surprising result:
Theorem 3.12. Let A be an m × n matrix. Then the dimension of its row
space is equal to the dimension of its column space.
Proof. Suppose that {v1, v2, . . . , vk} is a basis for the column space
of A. Then each column of A can be expressed as a linear combi-
nation of these vectors; suppose that the i-th column ci is given
by
    ci = γ1iv1 + γ2iv2 + · · · + γkivk
Now form two matrices as follows: B is an m × k matrix whose
columns are the basis vectors vi, while C = (γij ) is a k × n matrix
whose i-th column contains the coefficients γ1i, γ2i, . . ., γki . It then
follows that A = BC. (You may have to try this out with a few small matrices first to see why this is true. It is not difficult when you see a small situation, but it is not immediately obvious either.)
However we can also view the product A = BC as expressing
the rows of A as a linear combination of the rows of C with the i-th
row of B giving the coefficients for the linear combination that de-
termines the i-th row of A. Therefore the rows of C are a spanning
set for the row space of A, and so the dimension of the row space of
A is at most k . We conclude that
dim(rowsp( A)) ≤ dim(colsp( A))
Applying the same argument to AT we also conclude that
dim(colsp( A)) ≤ dim(rowsp( A))
and hence these values are equal.
This number — the common dimension of the row space and
column space of a matrix — is an important property of a matrix
and has a special name:
Definition 3.13. (Matrix Rank)
The dimension of the row space (and column space) of a matrix is called
the rank of the matrix. If an n × n matrix has rank n, then the matrix is
said to be full rank.
There is a useful characterisation of the row- and column spaces
of a matrix that is sufficiently important to state separately.
Theorem 3.14. If A is an m × n matrix, then the set of vectors
{ A x | x ∈ Rn}
is equal to the column space of A, while the set of vectors
{y A | y ∈ Rm}
is equal to the row space of A.
Proof. Suppose that c1, c2, . . . , cn ∈ Rm are the n columns of
A. Then if x = (x1, x2, . . . , xn), it is easy to see that A x = x1c1 +
x2c2 + . . . + xncn. Therefore every vector of the form A x is a linear
combination of the columns of A, and every linear combination of
the columns of A can be obtained by multiplying A by a suitable
vector. A similar argument applies for the row space of A.
3.3 The null space
Suppose that A is an m × n matrix. If two vectors v1, v2 have the
property that Av1 = 0 and Av2 = 0, then simple manipulation shows that
A(v1 + v2) = Av1 + Av2 = 0 + 0 = 0
and for any λ ∈ R,
A(λv1) = λ Av1 = λ0 = 0
Therefore the set of vectors v with the property that Av = 0 is
closed under vector addition and scalar multiplication, and there-
fore it satisfies the requirements to be a subspace.
Definition 3.15. (Null space)
Let A be an m × n matrix. The set of vectors
{v ∈ Rn | Av = 0}
is a subspace of Rn called the null space of A and denoted by nullsp( A).
Example 3.16. (Null space) Is the vector v = (0,1,−1,2) in the null
space of the matrix

    A = [ 1 2 2 0 ]
        [ 3 0 2 1 ] ?

All that is needed is to check Av and see what arises. As

    [ 1 2 2 0 ] [  0 ]   [ 0 ]
    [ 3 0 2 1 ] [  1 ] = [ 0 ]
                [ −1 ]
                [  2 ]

it follows that v ∈ nullsp(A).
This shows that testing membership of the null space of a matrix
is a very easy task. What about finding a basis for the null space of a matrix? This turns out (no surprise here!) to be intimately related to the techniques
we used in Chapter 1 to solve systems of linear equations.

So, suppose we wish to find a basis for the null space of the matrix
    A = [ 1 2 2 0 ]
        [ 3 0 2 1 ]
from Example 3.16. The matrix equation A x = 0 yields the follow-
ing system of linear equations
x1 + 2x2 + 2x3 = 0
3x1 + 2x3 + x4 = 0
which has augmented matrix

    [ 1 2 2 0 | 0 ]
    [ 3 0 2 1 | 0 ]
After the single elementary row operation R2 ← R2 − 3R1, the
matrix is in row echelon form:

    [ 1  2  2  0 | 0 ]
    [ 0 −6 −4  1 | 0 ]
Therefore x3 and x4 are free parameters; the second equation shows
that x2 = (x4 − 4x3)/6 and substituting this into the first equation
yields x1 = −(2x3 + x4)/3. Thus, following the techniques of
Chapter 1 we can describe the solution set as
S = {(−(2x3 + x4)/3, (x4 − 4x3)/6, x3, x4) | x3, x4 ∈ R}
In order to find a basis for S, notice that we can rewrite the solution as a linear combination of vectors by separating out the terms involving x3 from the terms involving x4:
(−(2x3 + x4)/3, (x4 − 4x3)/6, x3, x4) =
= (−2x3/3, −4x3/6, x3, 0) + (−x4/3, x4/6, 0, x4)
= x3(−2/3, −2/3, 1, 0) + x4(−1/3, 1/6, 0, 1)
Therefore we can express the solution set S as follows:
S = {x3(−2/3, −2/3, 1, 0) + x4(−1/3, 1/6, 0, 1) | x3, x4 ∈ R}
However this immediately tells us that S just consists of all the linear combinations of the two vectors (−2/3,−2/3,1,0) and (−1/3,1/6,0,1), and therefore we have found, almost by accident, a spanning set for the subspace S. It is immediate that these
two vectors are linearly independent and therefore they form a
basis for the null space of A.
In general, this process will always find a basis for the null space
of a matrix. If the set of solutions to the system of linear equations
has s free parameters, then it can be expressed as a linear combina-
tion of s vectors. These s vectors will always be linearly independent
because in each of the s coordinate positions corresponding to the
free parameters, just one of the s vectors will have a non-zero entry.
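The same free-parameter bookkeeping can be carried out in code. In the sketch below (the `rref` and `null_space_basis` helpers are our own, using exact rationals from the standard `fractions` module), each free column gets a basis vector with that free variable set to 1, and the pivot entries are read off the reduced matrix. It reproduces the basis {(−2/3,−2/3,1,0), (−1/3,1/6,0,1)} found above:

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination over the rationals (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return M

def null_space_basis(A):
    """One basis vector per free column: set that free variable to 1."""
    R = rref(A)
    cols = len(R[0])
    pivots = [next(j for j, x in enumerate(row) if x != 0)
              for row in R if any(x != 0 for x in row)]
    basis = []
    for f in (j for j in range(cols) if j not in pivots):
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]   # pivot variable in terms of the free variable
        basis.append(v)
    return basis

A = [[1, 2, 2, 0],
     [3, 0, 2, 1]]
basis = null_space_basis(A)
```

Each returned vector can be checked directly: multiplying A by it gives the zero vector.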
Definition 3.17. (Nullity)
The dimension of the null space of a matrix A is called the nullity of A.
We close this section with one of the most important results in
elementary linear algebra, universally called the Rank-Nullity
Theorem, which has a surprisingly simple proof.
Theorem 3.18. Suppose that A is an m × n matrix. Then
rank( A) + nullity( A) = n.
Proof. Consider the system of linear equations A x = 0, which is
a system of m equations in n unknowns. This system is solved by
applying Gaussian elimination to the augmented matrix [ A | 0],
thereby obtaining a matrix in row-echelon form. The rank of A is equal to the number of non-zero rows of this row-echelon matrix, which is equal
to the number of basic variables in the system of linear equations.
The nullity of A is the number of free parameters in the solution
set to the system of linear equations and so it is equal to the num-
ber of non-basic variables. So the rank of A plus the nullity of A is
equal to the number of basic variables plus the number of non-basic
variables. As each of the n variables is either basic or non-basic, the
result follows.
Given a matrix, it is important to be able to put all the techniques
together and to determine the rank, the nullity, a basis for the nullspace and a basis for the column space of a given matrix. This is
demonstrated in the next example.
Example 3.19. (Rank and nullity) Find the rank, nullity and bases for
the row space and null space for the following 4 × 4 matrix:
    A = [ 1 0 2 1 ]
        [ 3 1 3 3 ]
        [ 2 1 1 0 ]
        [ 2 1 1 2 ].
All of the questions can be answered once the matrix is in row-echelon form, and so the first task is to apply Gaussian elimination, which will
result in the following matrix:
    A′ = [ 1 0  2  1 ]
         [ 0 1 −3  0 ]
         [ 0 0  0 −2 ]
         [ 0 0  0  0 ].
The matrix has 3 non-zero rows and so the rank of A is equal to 3. These
non-zero rows form a basis for the rowspace of A and so the basis for the
rowspace of A is
{(1,0,2,1), (0,1, −3, 0), (0,0,0, −2)}.
The null space is the set of solutions to the matrix equation Ax = 0,
and solving this equation by performing Gaussian elimination on the
augmented matrix [A | 0] would yield the row-echelon matrix above,
augmented by a column of zeros. (In other words, the Gaussian elimination
part only needs to be done once, because everything depends only on the
row-echelon form of the matrix. However it is important to remember that
although it is the same matrix, we are using it in two quite distinct
ways. This distinction is often missed by students studying linear
algebra for the first time.)
So given the augmented matrix

    [ 1 0  2  1 | 0 ]
    [ 0 1 −3  0 | 0 ]
    [ 0 0  0 −2 | 0 ]
    [ 0 0  0  0 | 0 ],
we see that x1, x2 and x4 are basic variables, while the solution space has
x3 as its only free parameter. Expressing the basic variables in terms of the
free parameters by back-substitution we determine that the solution set is
S = {(−2x3, 3x3, x3, 0) | x3 ∈ R}
and it is clear that a basis for this is {(−2,3,1,0)}, and so the nullity of A
is equal to 1.
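Example 3.19 can be checked numerically. The sketch below (the `rref` helper is our own, working over exact rationals from the standard `fractions` module; illustrative only) computes the rank from the reduced matrix, obtains the nullity via the Rank-Nullity Theorem, and verifies that the basis vector (−2,3,1,0) really lies in the null space:

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination over the rationals (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return M

A = [[1, 0, 2, 1],
     [3, 1, 3, 3],
     [2, 1, 1, 0],
     [2, 1, 1, 2]]

rank = sum(1 for row in rref(A) if any(x != 0 for x in row))
nullity = len(A[0]) - rank        # by the Rank-Nullity Theorem

# The null-space basis vector found in Example 3.19 satisfies Av = 0.
v = [-2, 3, 1, 0]
Av = [sum(a * x for a, x in zip(row, v)) for row in A]
```

Here rank + nullity = 4, the number of columns, exactly as Theorem 3.18 requires.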
3.4 Solving systems of linear equations
We have seen that the set of all solutions to the system of linear
equations A x = 0 is the nullspace of A. What can we say about the
set of solutions of
A x = b (3.3)
when b ≠ 0?
Suppose that we know one solution x1 and that v lies in the
nullspace of A. Then
A( x1 + v) = A x1 + Av = b + 0 = b
Hence if we are given one solution we can create many more by
simply adding elements of the nullspace of A.
Moreover, given any two solutions x1 and x2 of (3.3) we have that
A( x1 − x2) = A x1 − A x2 = b − b = 0
and so x1 − x2 lies in the nullspace of A. In particular, every solu-
tion of A x = b is of the form x1 + v for some v ∈ nullsp( A). This is
so important that we state it as a theorem.
Theorem 3.20. Let A x = b be a system of linear equations and let x1 be
one solution. Then the set of all solutions is
{ x1 + v | v ∈ nullsp( A)}
Corollary 3.21. The number of free parameters required for the set of
solutions of A x = b is the dimension of the nullspace of A.
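Theorem 3.20 can be illustrated with the matrix from Section 3.3. In the sketch below (the `apply` helper is our own, and the particular solution x1 is a hypothetical choice made for illustration), adding any combination of the null-space basis vectors found earlier to x1 leaves Ax = b satisfied:

```python
from fractions import Fraction

def apply(A, x):
    """Matrix-vector product with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 2, 0],
     [3, 0, 2, 1]]

# A particular solution x1, with b defined as Ax1 so that Ax = b holds.
x1 = [1, 0, 0, 0]
b = apply(A, x1)

# The null-space basis vectors of A found in Section 3.3.
v1 = [Fraction(-2, 3), Fraction(-2, 3), Fraction(1), Fraction(0)]
v2 = [Fraction(-1, 3), Fraction(1, 6), Fraction(0), Fraction(1)]

# x1 plus any combination of null-space vectors solves Ax = b as well.
x2 = [a + 2 * p + 3 * q for a, p, q in zip(x1, v1, v2)]
```

The vector x2 is a genuinely different solution, yet x2 − x1 lies in nullsp(A), matching the theorem.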
3.5 Matrix Inversion
In this section we consider the inverse of a matrix. The theory of inverses in matrix algebra is far more subtle and interesting than
that of inverses in real algebra, and it is intimately related to the
ideas of independence and rank that we have explored so far.
In real algebra, every non-zero element has a multiplicative inverse; in other words, for every x ≠ 0 we can find another number, x′, such that

    xx′ = x′x = 1

Rather than calling it x′, we use the notation x−1 to mean "the
number that you need to multiply x by in order to get 1", thus
getting the familiar

    2−1 = 1/2
In matrix algebra, the concept of a multiplicative identity and
hence an inverse only makes sense for square matrices. However,
even then, there are some complicating factors. In particular, be-
cause multiplication is not commutative, it is conceivable that a
product of two square matrices might be equal to the identity only
if they are multiplied in a particular order. Fortunately, this does
Theorem 3.22. Suppose that A and B are square matrices such that
AB = I n. Then BA = I n.
Proof. First we observe that B has rank equal to n. This follows
because if Bv = 0 for any non-zero vector v, then ABv = 0, which
contradicts the fact that AB = I n. So the null space of B contains
only the zero vector, so B has nullity 0 and therefore, by the Rank-
Nullity theorem, B has rank n.Secondly we do some simple manipulation using the properties
of matrix algebra that were outlined in Theorem 3.2:
Since AB = In, we have AB − In = On, and so

    On = B(AB − In)      (because BOn = On)
       = BAB − B         (distributivity)
       = (BA − In)B      (distributivity)

This manipulation shows that the matrix (BA − In)B = On.
However because the rank of B is n, it follows that the column
space of B is the whole of Rn, and so any vector v ∈ Rn can be expressed in the form Bx for some x. Therefore
(BA − I n)v = (BA − I n)B x (because B has rank n)
= On x (because (BA − I n)B = On)
= 0 (properties of zero matrix)
The only matrix whose product with every vector v is equal to 0 is
the zero matrix itself, so B A − I n = On or B A = I n as required.
This theorem shows that when defining the inverse of a matrix,
we don’t need to worry about the order in which the multiplication
occurs. (In some text-books, the authors introduce the idea that B is
the “left-inverse” of A if BA = I and the “right-inverse” of A if
AB = I, and then immediately prove Theorem 3.22, showing that a
left-inverse is a right-inverse and vice versa.)
Another property of inverses in the algebra of real numbers is
that a non-zero real number has a unique inverse. Fortunately, this
property also holds for matrices:
Theorem 3.23. If A, B and C are square matrices such that
AB = I n and AC = I n
then B = C.
Proof. The proof just proceeds by manipulation using the properties
of matrix algebra outlined in Theorem 3.2.
    B = BIn        (identity matrix property)
      = B(AC)      (hypothesis of theorem)
      = (BA)C      (associativity)
      = InC        (by Theorem 3.22)
      = C          (identity matrix property)
Therefore a matrix has at most one inverse.
Definition 3.24. (Matrix Inverse)
Let A be an n × n matrix. If there is a matrix B such that
AB = I n
then B is called the inverse of A, and is denoted A−1. From Theorems
3.22 and 3.23, it follows that B is uniquely determined, that BA = In,
and that B−1 = A.
A matrix is called invertible if it has an inverse, and non-invertible
otherwise.
Example 3.25. (Matrix Inverse) Suppose that
    A = [ 1 0 1 ]
        [ 2 1 1 ]
        [ 2 0 1 ]

Then if we take

    B = [ −1 0  1 ]
        [  0 1 −1 ]
        [  2 0 −1 ]
then it is easy to check that
AB = B A = I n
Therefore we conclude that A−1 exists and is equal to B and, naturally,
B−1 exists and is equal to A.
In real algebra, every non-zero number has an inverse, but this is not the case for matrices:
Example 3.26. (Non-zero matrix with no inverse) Suppose that
    A = [ 1 1 ]
        [ 2 2 ].

Then there is no possible matrix B such that AB = I2. Why is this? If
the matrix B existed, then it would necessarily satisfy

    [ 1 1 ] [ b11 b12 ]   [ 1 0 ]
    [ 2 2 ] [ b21 b22 ] = [ 0 1 ].
In order to satisfy this matrix equation, b11 + b21 must equal 1, while
2b11 + 2b21 = 2(b11 + b21) must equal 0; clearly this is impossible. So
the matrix A has no inverse.
One of the common mistakes made by students of elementary
linear algebra is to assume that every matrix has an inverse. If an
argument or proof about a generic matrix A ever uses A−1 as part
of the manipulation, then it is necessary to first demonstrate that A
is actually invertible. Alternatively, the proof can be broken down
into two separate cases, one covering the situation where A is as-
sumed to be invertible and a separate one for where it is assumed
to be non-invertible.
3.5.1 Finding inverses
This last example of the previous section (Example 3.26) essentially
shows us how to find the inverse of a matrix because, as usual, it all boils down to solving systems of linear equations. If A is an n × n
matrix then finding its inverse, if it exists, is just a matter of finding
a matrix B such that AB = I n. To find the first column of B, it is
sufficient to solve the equation A x = e1, then the second column
is the solution to A x = e2, and so on. If any of these equations has
no solutions then A does not have an inverse. (In fact, if any of them
have infinitely many solutions, then it also follows that A has no
inverse, because if the inverse exists it must be unique. Thus if one of
the equations has infinitely many solutions, then one of the other
equations must have no solutions.) Therefore, finding
the inverse of an n × n matrix involves solving n separate systems
of linear equations. However because each of the n systems has the
same coefficient matrix (that is, A), there are shortcuts that make
this procedure easier.
To illustrate this, we do a full example for a 3 × 3 matrix, although the principle is the same for any matrix. Suppose we want
to find the inverse of the matrix
    A = [ −1 0  1 ]
        [  0 1 −1 ]
        [  2 0 −1 ].
The results above show that we just need to find a matrix B = (bij )
such that
    [ −1 0  1 ] [ b11 b12 b13 ]   [ 1 0 0 ]
    [  0 1 −1 ] [ b21 b22 b23 ] = [ 0 1 0 ]
    [  2 0 −1 ] [ b31 b32 b33 ]   [ 0 0 1 ]
This can be done by solving three separate systems of linear equa-
tions, one to determine each column of B:
    A [ b11 ]   [ 1 ]      A [ b12 ]   [ 0 ]          A [ b13 ]   [ 0 ]
      [ b21 ] = [ 0 ] ,      [ b22 ] = [ 1 ]   and      [ b23 ] = [ 0 ] .
      [ b31 ]   [ 0 ]        [ b32 ]   [ 0 ]            [ b33 ]   [ 1 ]
Then the matrix A has an inverse if and only if all three of these
systems of linear equations have a solution, and in fact, each of
them must have a unique solution. If any one of the three equations
is inconsistent, then A is one of the matrices that just doesn’t have
an inverse.
Consider how the solution of these systems of linear equations
will proceed: for the first column, the Gaussian elimination proceeds
as follows. We start with the augmented matrix

    [ −1 0  1 | 1 ]
    [  0 1 −1 | 0 ]
    [  2 0 −1 | 0 ],
and after pivoting on the (1, 1)-entry we get

    [ −1 0  1 | 1 ]
    [  0 1 −1 | 0 ]
    [  0 0  1 | 2 ]      R3 ← R3 + 2R1
which is now in row-echelon form. As every variable is basic, this
system of linear equations has a unique solution, which can easily
be determined by back-substitution, giving b31 = 2, b21 = 2 and
b11 = 1.

Now we solve for the second column of B; this time the augmented
matrix is

    [ −1 0  1 | 0 ]
    [  0 1 −1 | 1 ]
    [  2 0 −1 | 0 ],
and after pivoting on the (1, 1)-entry we get

    [ −1 0  1 | 0 ]
    [  0 1 −1 | 1 ]
    [  0 0  1 | 0 ]      R3 ← R3 + 2R1
It is immediately apparent that we used the exact same elementary
row operations as we did for the previous system of linear equations
because, naturally enough, the coefficient matrix is the same matrix.
And obviously, we’ll do the same elementary row operations again
when we solve the third system of linear equations! So to avoid
repeating work unnecessarily, it is better to solve all three systems
simultaneously. This is done by using a sort of “super-augmented”
matrix that has three columns to the right of the augmenting bar,
representing the right-hand sides of the three separate equations:
    [ −1 0  1 | 1 0 0 ]
    [  0 1 −1 | 0 1 0 ]
    [  2 0 −1 | 0 0 1 ]
Then performing an elementary row operation on this bigger ma-
trix has exactly the same effect as doing it on each of the three
systems separately.
    [ −1 0  1 | 1 0 0 ]
    [  0 1 −1 | 0 1 0 ]
    [  0 0  1 | 2 0 1 ]      R3 ← R3 + 2R1
We could now break this apart again into three separate systems
of linear equations and solve each of them by back-substitution, but
again this involves a little bit more work than necessary.
Suppose instead that we do some more elementary row operations
in order to make the system(s) of linear equations even simpler than
before. Usually, when we pivot on an entry in the matrix, we use
the pivot row in order to zero-out the rest of the column below the
pivot entry. However there is nothing stopping us from zeroing out
the rest of the column above
the pivot entry as well. Let's do this on the (3, 3)-entry in the matrix
and see how useful it is:

    [ −1 0 0 | −1 0 −1 ]      R1 ← R1 − R3
    [  0 1 0 |  2 1  1 ]      R2 ← R2 + R3
    [  0 0 1 |  2 0  1 ]
One final elementary row operation puts the matrix into an especially
nice form.

    [ 1 0 0 | 1 0 1 ]      R1 ← (−1)R1
    [ 0 1 0 | 2 1 1 ]
    [ 0 0 1 | 2 0 1 ]
In this form not even any back substitution is needed to find the
solution; the first system of equations has solution b11 = 1, b21 = 2
and b31 = 2, while the second has solution b12 = 0, b22 = 1 and
b32 = 0, while the final system has solution b13 = 1, b23 = 1 and
b33 = 1. Thus the inverse of the matrix A is given by
    A−1 = [ 1 0 1 ]
          [ 2 1 1 ]
          [ 2 0 1 ]
which is just exactly the matrix that was found to the right of the
augmenting bar!

In this example, we've jumped ahead without using the formal
terminology or precisely defining the “especially nice form” of the
final matrix. We remedy this immediately.
Definition 3.27. (Reduced row-echelon form)
A matrix is in reduced row-echelon form if it is in row echelon form,
and
1. The leading entry of each row is equal to one, and
2. The leading entry of each row is the only non-zero entry in its column.
Example 3.28. (Reduced row-echelon form) The following matrices are
both in reduced row echelon form:
    [ 1 0 0 2 0 ]           [ 1 0 0 ]
    [ 0 1 0 0 0 ]    and    [ 0 1 0 ]
    [ 0 0 1 3 0 ]           [ 0 0 1 ]
    [ 0 0 0 0 1 ]           [ 0 0 0 ].
Example 3.29. (Not in reduced row-echelon form) The following matrices
are NOT in reduced row echelon form:
    [ 1  0 0 2 0 ]           [ 1 0 0 ]
    [ 0 −1 0 0 0 ]    and    [ 0 1 0 ]
    [ 0  0 1 3 0 ]           [ 0 0 1 ]
    [ 0  0 0 0 2 ]           [ 0 0 1 ].
A simple modification to the algorithm for Gaussian elimination
yields an algorithm for reducing a matrix to reduced row echelon
form; this algorithm is known as Gauss-Jordan elimination. It is pre-
sented below, with the differences between Gaussian elimination
and Gauss-Jordan elimination highlighted in boldface.
Definition 3.30. (Gauss-Jordan Elimination — in words)
Let A be an m × n matrix. At each stage in the algorithm, a particular
position in the matrix, called the pivot position, is being processed. Initially the pivot position is at the top-left of the matrix. What happens at
each stage depends on whether the pivot entry (that is, the number in the
pivot position) is zero or not.
1. If the pivot entry is zero then, if possible, interchange the pivot row
with one of the rows below it, in order to ensure that the pivot entry is
non-zero. This will be possible unless the pivot entry and every entry
below it are zero, in which case simply move the pivot position one
column to the right.
2. If the pivot entry is non-zero, multiply the pivot row by a suitable scalar to ensure
that the pivot entry is 1 and then, by adding a suitable multiple of
the pivot row to every row above and below the pivot row, ensure that
every entry above and below the pivot entry is zero. Then move the
pivot position one column to the right and one row down.
When the pivot position is moved off the matrix, then the process finishes
and the matrix will be in reduced row echelon form.
Now we can formally summarise the procedure for finding the
inverse of a matrix. Remember, however, that this is simply a way
of organising the calculations efficiently, and that there is nothing
more sophisticated occurring than solving systems of linear equations.
Key Concept 3.31. (Finding the inverse of a matrix) In order to find
the inverse of an n × n matrix A, proceed as follows:
1. Form the “super-augmented” matrix
[ A | I n]
2. Apply Gauss-Jordan elimination to this matrix to place it into
reduced row-echelon form
3. If the resulting reduced row echelon matrix has an identity matrix
to the left of the augmenting bar, then it must have the form
[I n | A−1]
and so A−1 will be the matrix on the right of the augmenting bar.
4. If the reduced row echelon matrix does not have an identity ma-
trix to the left of the augmenting bar, then the matrix A is not
invertible.
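The whole procedure of Key Concept 3.31 fits in a few lines of code. In the sketch below (the helper functions are our own, with exact rational arithmetic from the standard `fractions` module), we form [A | In], apply Gauss-Jordan elimination, and read off the inverse, returning None when the left block fails to reduce to the identity. It is applied to the 3 × 3 matrix of the worked example above and to the non-invertible matrix of Example 3.26:

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination over the rationals (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return M

def inverse(A):
    """Gauss-Jordan on [A | I]; returns A^-1, or None if A is not invertible."""
    n = len(A)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    R = rref(aug)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    if [row[:n] for row in R] != identity:
        return None               # no identity on the left: not invertible
    return [row[n:] for row in R]

A = [[-1, 0, 1],
     [0, 1, -1],
     [2, 0, -1]]
Ainv = inverse(A)
```

The returned inverse matches the one computed by hand above, and the Example 3.26 matrix correctly comes back as non-invertible.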
It is interesting to note that, while it is important to understand
what a matrix inverse is and how to calculate a matrix inverse, it is
almost never necessary to actually find an explicit matrix inverse
in practice. An explicit problem for which a matrix inverse might
be useful can almost always be solved directly (by some form of
Gaussian elimination) without actually computing the inverse.

However, as we shall see in the next section, understanding
the procedure for calculating an inverse is useful in developing
theoretical results.
3.5.2 Characterising invertible matrices
In the last two subsections we have defined the inverse of a matrix,
demonstrated that some matrices have inverses and others don’t
and given a procedure that will either find the inverse of a matrix
or demonstrate that it does not exist. In this subsection, we consider
some of the special properties of invertible matrices, focussing on
what makes them invertible, and what particular properties are
enjoyed by invertible matrices.
Theorem 3.32. An n × n matrix is invertible if and only if it has rank
equal to n.
Proof. This is so important that we give a couple of proofs in
slightly different language, though the fundamental concept is
the same in both proofs.
Proof 1: Applying elementary row operations to a matrix does not
alter its row space, and hence its rank. If a matrix A is invertible,
then Gauss-Jordan elimination applied to A will yield the identity matrix, which has rank n. If A is not invertible, then applying
Gauss-Jordan elimination to A yields a matrix with at least one row
of zeros, and so it does not have rank n.
Proof 2: If a matrix A is invertible then there is always a solution
to the matrix equation
A x = v
for every v, and so the column space of A is equal to the whole
of Rn. Conversely if A is not invertible, then at least one of the
standard basis vectors {e1, e2, . . . , en} is not in the column space of
A and so the rank of A is strictly less than n.
There are some other characterisations of invertible matrices that
may be useful, but they are all really just elementary restatements
of Theorem 3.32.
Theorem 3.33. Let A be an n × n matrix. Then
1. A is invertible if and only if its rows are linearly independent.
2. A is invertible if and only if its columns are linearly independent.
3. A is invertible if and only if its row space is Rn.
4. A is invertible if and only if its column space is Rn.
Proof. These are all ways of saying “the rank of A is n”.
Example 3.34. (Non-invertible matrix) The matrix
    A = [ 0 1  2 ]
        [ 1 2 −1 ]
        [ 1 3  1 ]
is not invertible because
(0,1,2) + (1,2, −1) = (1,3,1)
is a dependency among the rows, and so the rows are not linearly indepen-
dent.
Now let’s consider some of the properties of invertible matrices.
Theorem 3.35. Suppose that A and B are invertible n × n matrices, and
k is a positive integer. Then
1. The matrix AB is invertible, and
( AB)−1 = B−1 A−1.
2. The matrix Ak is invertible, and (Ak)−1 = (A−1)k.

3. The matrix AT is invertible, and (AT)−1 = (A−1)T.
.
Proof. To show that a matrix is invertible, it is sufficient to demonstrate the existence of some matrix whose product with the given matrix is the identity. Thus to show that AB is invertible, we must
find something that we can multiply AB by in order to end up with
the identity.
    (AB)(B^{−1} A^{−1}) = A(BB^{−1})A^{−1}    (associativity)
                        = A I_n A^{−1}        (properties of inverses)
                        = A A^{−1}            (properties of identity)
                        = I_n                 (properties of inverses)

This shows that AB is invertible, and that its inverse is B^{−1} A^{−1}
as required. The remaining two statements are straightforward to
prove.
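The three identities of Theorem 3.35 are easy to check numerically on a concrete invertible pair. A sketch assuming Python with numpy; the matrices are random but shifted so that they are (almost surely) invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # almost surely invertible
B = rng.standard_normal((3, 3)) + 3 * np.eye(3)

# 1. (AB)^{-1} = B^{-1} A^{-1}  (note the reversed order)
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))

# 2. (A^k)^{-1} = (A^{-1})^k, here with k = 3
assert np.allclose(np.linalg.inv(np.linalg.matrix_power(A, 3)),
                   np.linalg.matrix_power(np.linalg.inv(A), 3))

# 3. (A^T)^{-1} = (A^{-1})^T
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)
```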
This theorem shows that the collection of invertible n × n matrices is closed under matrix multiplication. In addition, there is a multiplicative identity (the matrix I_n) and every matrix has an inverse (obviously!). These turn out to be the conditions that define
an algebraic structure called a group. The group of invertible n × n
matrices plays a fundamental role in the mathematical subject of
group theory which is an important topic in higher-level Pure Mathematics.
3.6 Determinants
From high school we are all familiar with the formula for the inverse of a 2 × 2 matrix:

    [ a  b ]^{−1}        1      [  d  −b ]
    [ c  d ]       =  ------- · [ −c   a ]     if ad − bc ≠ 0
                      ad − bc

where the inverse does not exist if ad − bc = 0. In other words, a 2 × 2 matrix has an inverse if and only if ad − bc ≠ 0. This number ad − bc is called the determinant of the matrix, and it is either denoted det(A) or just |A|.

Example 3.36. (Determinant Notation) If

    A = [ 3  5 ]
        [ 2  4 ]

then we say either

    det(A) = 2    or    | 3  5 |  =  2
                        | 2  4 |
because 3 · 4 − 2 · 5 = 2.
In this section, we’ll extend the concept of determinant to n × n
matrices and show that it characterises invertible matrices in the
same way — a matrix is invertible if and only if its determinant is non-zero.
The determinant of a square matrix is a scalar value (i.e. a number) associated with that matrix that can be recursively defined as follows:
Definition 3.37. (Determinant)
If A = (a_ij) is an n × n matrix, then the determinant of A is a real number, denoted det(A) or |A|, that is defined as follows:

1. If n = 1, then |A| = a_11.

2. If n > 1, then

       |A| = ∑_{j=1}^{n} (−1)^{1+j} a_1j |A[1, j]|        (3.4)

   where A[i, j] is the (n − 1) × (n − 1) matrix obtained from A by deleting the i-th row and the j-th column.

Notice that when n > 1, this expresses an n × n determinant as an alternating sum of n terms, each of which is an (n − 1) × (n − 1) determinant.
Example 3.38. (A 3 × 3 determinant) What is the determinant of the matrix

    A = [ 2  5  3 ]
        [ 4  3  6 ]
        [ 1  0  2 ]  ?

First let's identify the matrices A[1, 1], A[1, 2] and A[1, 3]; recall these are obtained by deleting one row and column from A. For example, A[1, 2] is obtained by deleting the first row and second column from A, thus

    A[1, 2] = [ 4  6 ]
              [ 1  2 ]

The term (−1)^{1+j} simply alternates between +1 and −1 and so the first term is added because (−1)^2 = 1, the second subtracted because (−1)^3 = −1, the third added, and so on. Using the formula we get

    |A| = 2 · | 3  6 |  −  5 · | 4  6 |  +  3 · | 4  3 |
              | 0  2 |         | 1  2 |         | 1  0 |

        = 2 · 6 − 5 · 2 + 3 · (−3)

        = −7

where the three 2 × 2 determinants have just been calculated using the usual rule.
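Definition 3.37 translates almost line-for-line into a recursive program. A plain-Python sketch (the function name det is our own choice):

```python
def det(A):
    """Determinant by cofactor expansion along the first row (Definition 3.37)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor is A[1, j+1] in the notation of the text:
        # delete the first row and the (j+1)-th column
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1) ** j plays the role of (-1)^(1+j) for the 1-based column j+1
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 5, 3], [4, 3, 6], [1, 0, 2]]))  # -7, as in Example 3.38
```

As written this does on the order of n! work, which is exactly the cost problem Section 3.6.1 discusses.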
This procedure for calculating the determinant is called expanding along the first row, because each of the terms a_1j |A[1, j]| is associated with an entry in the first row. However it turns out, although we shall not prove it,12 that it is possible to do the expansion along any row or indeed, any column. So in fact we have the following result:

12 Proving this is not difficult but it involves a lot of manipulation of subscripts and nested sums, which is probably not the best use of your time.
Theorem 3.39. Let A = (a_ij) be an n × n matrix. Then for any fixed row index i we have

    |A| = ∑_{j=1}^{n} (−1)^{i+j} a_ij |A[i, j]|

and for any fixed column index j, we have

    |A| = ∑_{i=1}^{n} (−1)^{i+j} a_ij |A[i, j]|.
(Notice that the first of these two sums involves terms obtained from the i-
th row of the matrix, while the second involves terms from the j-th column
of the matrix.)
Proof. Omitted.
Example 3.40. (Expanding down the second column) Determine the determinant of

    A = [ 2  5  3 ]
        [ 4  3  6 ]
        [ 1  0  2 ]

by expanding down the second column. Notice that because we are using the second column, the signs given by the (−1)^{i+j} terms alternate −1, +1, −1 starting with a negative, not a positive. So the calculation gives

    |A| = (−1) · 5 · | 4  6 |  +  3 · | 2  3 |  +  (−1) · 0 · (don't care)
                     | 1  2 |         | 1  2 |

        = −5 · 2 + 3 · 1 + 0

        = −7

Also notice that because a_32 = 0, the term (−1)^{3+2} a_32 |A[3, 2]| is forced to be zero, and so there is no need to actually calculate |A[3, 2]|.

In general, you should choose the row or column of the matrix that has lots of zeros in it, in order to make the calculation as easy as possible!
Example 3.41. (Easy if you choose right) To determine the determinant of

    A = [ 3  1  0  2 ]
        [ 4  1  0  1 ]
        [ 2  1  0  1 ]
        [ 1  1  3  2 ]

use the third column which has only one non-zero entry, and get

    |A| = (+1) · 0 + (−1) · 0 + (+1) · 0 + (−1) · 3 · | 3  1  2 |
                                                      | 4  1  1 |
                                                      | 2  1  1 |

        = −6

rather than getting an expression with three or four 3 × 3 determinants to evaluate!
From this we can immediately deduce some theoretical results:
Theorem 3.42. Let A be an n × n matrix. Then

1. |A^T| = |A|,

2. |αA| = α^n |A|,

3. If A has a row of zeros, then |A| = 0.

4. If A is an upper (or lower) triangular matrix, then |A| is the product of the entries on the diagonal of the matrix.
Proof. To prove the first statement, we use induction on n. Certainly
the statement is true for 1 × 1 matrices. So now suppose that it
is true for all matrices of size up to n − 1. Notice that expanding
along the first row of A gives a sum with the same coefficients as
expanding down the first column of A^T, the signs of each term are the same because (−1)^{i+j} = (−1)^{j+i}, and all the (n − 1) × (n − 1) determinants in the first sum are just the transposes of those in the second sum, and so are equal by the inductive hypothesis.
For the second statement we again use induction. Certainly the
statement is true for 1 × 1 matrices. So now suppose that it is true
for all matrices of size up to n − 1. Then

    |αA| = ∑_{j=1}^{n} (−1)^{i+j} (αa_ij) |αA[i, j]|

         = ∑_{j=1}^{n} (−1)^{i+j} (αa_ij) α^{n−1} |A[i, j]|    (inductive hypothesis)

         = α · α^{n−1} ∑_{j=1}^{n} (−1)^{i+j} a_ij |A[i, j]|   (rearranging)

         = α^n |A|.

The third statement is immediate because if we expand along the
row of zeros, then every term in the sum is zero.
The fourth statement again follows from an easy induction argument.
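All four properties can be sanity-checked on a concrete matrix; a sketch assuming Python with numpy:

```python
import numpy as np

A = np.array([[2.0, 5.0, 3.0],
              [4.0, 3.0, 6.0],
              [1.0, 0.0, 2.0]])
n, alpha = 3, 2.0

# 1. |A^T| = |A|
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# 2. |alpha A| = alpha^n |A|
assert np.isclose(np.linalg.det(alpha * A), alpha ** n * np.linalg.det(A))

# 3. a row of zeros forces the determinant to be 0
Z = A.copy()
Z[1] = 0.0
assert np.isclose(np.linalg.det(Z), 0.0)

# 4. triangular matrix: determinant is the product of the diagonal
U = np.triu(A)
assert np.isclose(np.linalg.det(U), np.prod(np.diag(U)))
```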
Example 3.43. (Determinant of matrix in row echelon form) A matrix in
row echelon form is necessarily upper triangular, and so its determinant
can easily be calculated. For example, the matrix

    A = [ 2  0  1 −1 ]
        [ 0  1  2  1 ]
        [ 0  0 −3  2 ]
        [ 0  0  0  1 ]

which is in row echelon form has determinant equal to 2 · 1 · (−3) · 1 = −6 because this is the product of the diagonal entries. We can verify this easily from the formula by repeatedly expanding down the first column. So

    | 2  0  1 −1 |
    | 0  1  2  1 |  =  2 · | 1  2  1 |  =  2 · 1 · | −3  2 |  =  2 · 1 · (−3) · 1  =  −6
    | 0  0 −3  2 |         | 0 −3  2 |             |  0  1 |
    | 0  0  0  1 |         | 0  0  1 |
3.6.1 Calculating determinants
The recursive definition of a determinant expresses an n × n determinant as a linear combination of n terms each involving an (n − 1) × (n − 1) determinant. So to find a 10 × 10 determinant like this involves computing ten 9 × 9 determinants, each of which
like this involves computing ten 9 × 9 determinants, each of which
involves nine 8 × 8 determinants, each of which involves eight 7 × 7
determinants, each of which involves seven 6 × 6 determinants,
each of which involves six 5 × 5 determinants, each of which in-
volves five 4 × 4 determinants, each of which involves four 3 × 3
determinants, each of which involves three 2 × 2 determinants.
While this is possible (by computer) for a 10 × 10 matrix, even the
fastest supercomputer would not complete a 100 × 100 matrix in the
lifetime of the universe.

However, in practice, a computer can easily find a 100 × 100
determinant, so there must be another more efficient way. Once
again, this way is based on elementary row operations.
Theorem 3.44. Suppose A is an n × n matrix, and that A′ is obtained from A by performing a single elementary row operation.

1. If the elementary row operation is Type 1, then

       |A′| = −|A|.

   In other words, a Type 1 elementary row operation multiplies the determinant by −1.

2. If the elementary row operation is Type 2, say Ri ← αRi, then

       |A′| = α|A|.

   In other words, multiplying a row by the scalar α multiplies the determinant by α.

3. If the elementary row operation is of Type 3, then

       |A′| = |A|.

   In other words, adding a multiple of one row to another does not change the determinant.
Proof. Omitted for now.
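This theorem is exactly what makes the efficient method work: reduce to upper triangular form while tracking how each operation scales the determinant, then multiply the diagonal. A plain-Python sketch (the function name and structure are our own, not from the notes):

```python
def det_by_elimination(A):
    """Determinant via Gaussian elimination, using Theorem 3.44:
    row swaps flip the sign, Type 3 operations change nothing."""
    M = [row[:] for row in A]  # work on a copy
    n = len(M)
    sign = 1
    for col in range(n):
        # find a pivot; a row swap is a Type 1 operation, so flip the sign
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return 0  # no pivot in this column: the determinant is 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign
        # Type 3 operations R_r <- R_r - f R_col leave the determinant alone
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    d = sign  # M is now upper triangular: multiply the diagonal entries
    for i in range(n):
        d *= M[i][i]
    return d

print(det_by_elimination([[2, 5, 3], [4, 3, 6], [1, 0, 2]]))  # -7.0
```

This does on the order of n³ work rather than n!, which is why a 100 × 100 determinant is easy in practice.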
Previously we have used elementary row operations to find
solutions to systems of linear equations, and to find the basis and
dimension of the row space of a matrix. In both these applications,
the elementary row operations did not change the answer that was
being sought. For finding determinants however, elementary row
operations do change the determinant of the matrix, but they change it in a controlled fashion and so the process is still useful.13

13 Another common mistake for beginning students of linear algebra is to assume that row-reduction preserves every interesting property of a matrix. Instead, row-reduction preserves some properties, alters others in a controlled fashion, and destroys others. It is important to always know why the row-reduction is being done.
Example 3.45. (Finding determinant by row-reduction) We return to an
earlier example, of finding the determinant of
    A = [ 2  5  3 ]
        [ 4  3  6 ]
        [ 1  0  2 ]

Suppose that this unknown value is denoted d. Then after doing the Type 1 elementary row operation R1 ↔ R3 we get the matrix

    [ 1  0  2 ]
    [ 4  3  6 ]
    [ 2  5  3 ]

which has determinant −d, because Type 1 elementary row operations multiply the determinant by −1. If we now perform the Type 3 elementary row operations R2 ← R2 − 4R1 and R3 ← R3 − 2R1, then the resulting matrix

    [ 1  0  2 ]
    [ 0  3 −2 ]
    [ 0  5 −1 ]

still has determinant −d because Type 3 elementary row operations do not alter the determinant. Finally, the elementary row operation R3 ← R3 − (5/3)R2 yields the matrix

    [ 1  0  2 ]
    [ 0  3 −2 ]
    [ 0  0  7/3 ]

which still has determinant −d. However, it is easy to see that the determinant of this final matrix is 7, and so −d = 7, which immediately tells us that d = −7, confirming the results of Examples 3.38 and 3.40.
Thinking about this process in another way shows us that if a matrix A has determinant d and the matrix A′ is the row-echelon matrix obtained by performing Gaussian elimination on A, then

    |A′| = β |A|   for some β ≠ 0.

Combining this with the fourth property of Theorem 3.42 allows us to state the single most important property of determinants:
Theorem 3.46. A matrix A is invertible if and only if its determinant is
non-zero.
Proof. Consider the row echelon matrix A′ obtained by applying Gaussian elimination to A. If A is invertible, then A′ has no zero rows and so every diagonal entry is non-zero and thus |A′| ≠ 0, while if A is not invertible, A′ has at least one zero row and thus |A′| = 0. As the determinant of A′ is a non-zero multiple of the determinant of A, it follows that A has non-zero determinant if and only if it is invertible.
3.6.2 Properties of Determinant
We finish this chapter with some of the properties of determinants,
most of which follow immediately from the following theorem,
which shows that the determinant function is multiplicative.
Theorem 3.47. If A and B are two n × n matrices, then
|AB| = |A| · |B|.

Proof. There are several proofs of this result, none of which are very nice. We give a sketch outline14 of the most illuminating proof. First note that if either A or B (or both) is not invertible, then AB is not invertible and so the result is true if either of the determinants is zero.

14 This is a very brief outline of the proof so do not worry if you cannot follow it without some guidance on how to fill in the gaps.
Then proceed in the following steps:
1. Define an elementary matrix to be a matrix obtained by performing a single elementary row operation on the identity matrix.

2. Note that premultiplying a matrix A by an elementary matrix E, thereby forming the matrix EA, is exactly the same as performing the same elementary row operation on A.

3. Show that elementary matrices of Type 1, 2 and 3 have determinant −1, α and 1 respectively (where α is the non-zero scalar associated with an elementary row operation of Type 2).

4. Conclude that the result is true if A is an elementary matrix or a product of elementary matrices.

5. Finish by proving that a matrix is invertible if and only if it is a product of elementary matrices, because Gauss-Jordan elimination will reduce any invertible matrix to the identity matrix.
The other proofs of this result use a different description of the
determinant as the weighted sum of n! products of matrix entries,
together with extensive algebraic manipulation.
Just for fun, we’ll demonstrate this result for 2 × 2 matrices
purely algebraically, in order to give a flavour of the alternative
proofs. Suppose that

    A = [ a  b ]        B = [ a′  b′ ]
        [ c  d ]            [ c′  d′ ]

Then

    AB = [ aa′ + bc′   ab′ + bd′ ]
         [ ca′ + dc′   cb′ + dd′ ]

Therefore

    |AB| = (aa′ + bc′)(cb′ + dd′) − (ab′ + bd′)(ca′ + dc′)
         = aa′cb′ + aa′dd′ + bc′cb′ + bc′dd′ − ab′ca′ − ab′dc′ − bd′ca′ − bd′dc′
         = (aa′cb′ − ab′ca′) + aa′dd′ + bc′cb′ + (bc′dd′ − bd′dc′) − ab′dc′ − bd′ca′
         = 0 + (ad)(a′d′) + (bc)(b′c′) + 0 − (ad)(b′c′) − (bc)(a′d′)
         = (ad − bc)(a′d′ − b′c′)
         = |A| |B|.
The multiplicativity of the determinant immediately gives the
main properties.
Theorem 3.48. Suppose that A and B are n × n matrices and k is a
positive integer. Then
1. |AB| = |BA|

2. |A^k| = |A|^k

3. If A is invertible, then |A^{−1}| = 1/|A|.

Proof. The first two are immediate, and the third follows from the fact that AA^{−1} = I_n and so |A| |A^{−1}| = 1.
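Both theorems can be confirmed on a concrete pair of matrices; a sketch assuming Python with numpy, with A and B chosen arbitrarily:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 5.0]])  # det(A) = -1
B = np.array([[2.0, 1.0], [7.0, 4.0]])  # det(B) =  1

det = np.linalg.det

# Theorem 3.47: the determinant is multiplicative
assert np.isclose(det(A @ B), det(A) * det(B))

# Theorem 3.48
assert np.isclose(det(A @ B), det(B @ A))                          # 1. |AB| = |BA|
k = 4
assert np.isclose(det(np.linalg.matrix_power(A, k)), det(A) ** k)  # 2. |A^k| = |A|^k
assert np.isclose(det(np.linalg.inv(A)), 1 / det(A))               # 3. |A^{-1}| = 1/|A|
```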
4 Linear transformations
4.1 Introduction
First we will give the general definition of a function.
Definition 4.1. (Function)
Given two sets A and B, a function f : A → B is a rule that assigns to
each element of A a unique element of B. We often write
f : A −→ B
a ↦ f (a)
where f (a) is the element of B assigned to a, called the image of a under f . The set A is called the domain of f and is sometimes denoted dom( f ). The set B is called the codomain of f . The range of f (sometimes denoted range( f )) is the set of all elements of B that are the image
of some element of A.
Often f (a) is defined by some equation involving a (or whatever
variable is being used to represent elements of A), for example
f (a) = a². However, sometimes you may see f defined by listing f (a) for each a ∈ A. For example, if A = {1, 2, 3} we could define f by

    f : A −→ R
        1 ↦ 10
        2 ↦ 10
        3 ↦ 102
If f (x) is defined by some rule and the domain of f is not explicitly
given then we assume that the domain of f is the set of all values
on which f (x) is defined.
Note that the range of f need not be all of the codomain. For
example, if f : R → R is the function defined by f (x) = x² then the
codomain of f is R while the range of f is the set {x ∈ R | x ≥ 0}.
A linear transformation is a function from one vector space to another preserving the structure of vector spaces, that is, it preserves vector addition and scalar multiplication. More precisely:
Definition 4.2. (Linear transformation)
A function f from Rn to Rm is a linear transformation if:

1. f (u + v) = f (u) + f (v) for all u, v in Rn;

2. f (αv) = α f (v) for all v in Rn and all α in R.

(An interesting case is when n = m, in which case the domain and codomain are the same vector space.)

Example 4.3. (Linear transformation) In R3, the orthogonal projection to the xy-plane is a linear transformation. This maps the vector (x, y, z) to (x, y, 0). Check the two conditions to convince yourself!

Example 4.4. (Not a linear transformation) The function from R3 to R given by f (x, y, z) = x² + y² + z² is not a linear transformation. Indeed for v = (1, 0, 0) and α = 2, f (αv) = f (2, 0, 0) = 4 while α f (v) = 2 · 1 = 2.
Example 4.5. (Not a linear transformation) Let f : R → R be defined by f (x) = ax + b. Note that f (1) = a + b while f (2) = 2a + b ≠ 2(a + b) when b ≠ 0. Thus when b ≠ 0, the function f is not a linear transformation of the vector space R. We call f an affine function.
Example 4.6. Let A be an m × n matrix. Then the function f from Rn to
Rm such that f ( x) = A x is a linear transformation (where we see x as an
n × 1 column vector, as described in Chapter 1). Indeed
f (u + v) = A(u + v) = Au + Av = f (u) + f (v)
(using matrix property (8) of Theorem 3.2), and
f (αv) = A(αv) = α Av = α f (v)
(using matrix property (7) of Theorem 3.2).
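Example 4.6 can be tested numerically: pick any matrix A, define f (x) = Ax, and check the two conditions of Definition 4.2 on sample vectors. A sketch assuming Python with numpy (the matrix and the sample vectors are arbitrary choices of ours):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])  # any 2 x 3 matrix gives f : R^3 -> R^2

def f(x):
    return A @ x

u = np.array([1.0, 0.0, 2.0])
v = np.array([-1.0, 4.0, 0.5])
alpha = 3.0

assert np.allclose(f(u + v), f(u) + f(v))       # condition 1
assert np.allclose(f(alpha * v), alpha * f(v))  # condition 2

# Contrast with Example 4.4: g(x, y, z) = x^2 + y^2 + z^2 fails condition 2
def g(x):
    return float(x @ x)

e1 = np.array([1.0, 0.0, 0.0])
print(g(2 * e1), 2 * g(e1))  # 4.0 2.0 -- not equal, so g is not linear
```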
Theorem 4.7. Let f : Rn −→ Rm be a linear transformation.
(i) f (0) = 0.
(ii) The range of f is a subspace of Rm.
Proof. (i) This follows from an easy calculation and the fact that
0 + 0 = 0: f (0) = f (0 + 0) = f (0) + f (0). The result follows from
subtracting f (0) on each side.
(ii) We need to prove the three subspace conditions for range( f ) (see
Definition 2.4).
(S1) 0 ∈ range( f ) since 0 = f (0) by Part (i).

(S2) Let u′, v′ ∈ range( f ), that is, u′ = f (u), v′ = f (v) for some u, v ∈ Rn. Then u′ + v′ = f (u) + f (v) = f (u + v) and so u′ + v′ ∈ range( f ).

(S3) Let α ∈ R and v′ ∈ range( f ), that is, v′ = f (v) for some v ∈ Rn. Then αv′ = α f (v) = f (αv) and so αv′ ∈ range( f ) for all scalars α.
4.1.1 Linear transformations and bases
A linear transformation can be given by a formula, but there are
other ways to describe it. In fact, if we know the images under a
linear transformation of each of the vectors in a basis, then the rest
of the linear transformation is completely determined.
Theorem 4.8. Let {u1, u2, . . . , un} be a basis for Rn and let t1, t2, . . . , tn be n vectors of Rm. Then there exists a unique linear transformation f from Rn to Rm such that f (u1) = t1, f (u2) = t2, . . . , f (un) = tn. (For instance the basis of Rn can be the standard basis.)
Proof. We know by Theorem 2.33 that any vector v ∈ Rn can be
written in a unique way as v = α1u1 + α2u2 + . . . + αnun (where the
αi’s are real numbers). Define f (v) = α1t 1 + α2t 2 + . . . + αnt n. Then
f satisfies f (ui) = t i for all i between 1 and n and we can easily
check that f is linear. So we have that f exists.
Now suppose g is also a linear transformation satisfying g(ui) =
t i for all i between 1 and n. Then
g(v) = g(α1u1 + α2u2 + . . . + αnun)
= g(α1u1) + g(α2u2) + . . . + g(αnun)
(by the first condition for a linear function)
= α1 g(u1) + α2 g(u2) + . . . + αn g(un)
(by the second condition for a linear function)
= α1t 1 + α2t 2 + . . . + αnt n
= f (v)
Thus g(v) = f (v) for all v ∈ Rn so they are the same linear transformation, that is, f is unique.
Exercise 4.1.1. Say f is a linear transformation from R2 to R3 with f (1, 0) = (1, 2, 3) and f (0, 1) = (0, −1, 2). Determine f (x, y).
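The construction in the proof of Theorem 4.8 can be carried out numerically: express v in the chosen basis by solving a linear system, then map the coordinates onto the prescribed images. A sketch assuming Python with numpy; the basis and target vectors below are arbitrary illustrations of ours, deliberately different from the data of Exercise 4.1.1:

```python
import numpy as np

# a (non-standard) basis of R^2, to stress that Theorem 4.8 allows any basis
u1, u2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
# prescribed images t1, t2 in R^3, chosen arbitrarily for this illustration
t1, t2 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 3.0, 1.0])

def f(v):
    # write v = a1 u1 + a2 u2 by solving a linear system, then map linearly
    a1, a2 = np.linalg.solve(np.column_stack([u1, u2]), v)
    return a1 * t1 + a2 * t2

assert np.allclose(f(u1), t1) and np.allclose(f(u2), t2)  # f does what was asked
v, w = np.array([2.0, 5.0]), np.array([-1.0, 0.5])
assert np.allclose(f(v + w), f(v) + f(w))                 # and it is linear
```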
4.1.2 Linear transformations and matrices
We have seen that it is useful to choose a basis of the domain, say
{u1, u2, . . . , un}. Now we will also take a basis for the codomain:
{v1, v2, . . . , vm}. For now we can think of both the bases as the
standard ones, but later we will need the general case.

By Theorem 2.33, each vector f (u_j) (1 ≤ j ≤ n) has unique
coordinates in the basis {v1, v2, . . . , vm} of Rm. More precisely:
    f (u_j) = a_1j v1 + a_2j v2 + . . . + a_mj vm

where the symbols a_ij represent real numbers, for 1 ≤ i ≤ m, 1 ≤ j ≤ n. For short we write

    f (u_j) = ∑_{i=1}^{m} a_ij v_i.

Now we can determine the image of any vector x in Rn. If x = x1u1 + x2u2 + . . . + xnun, then we have:
    f ( x) = f (x1u1 + x2u2 + . . . + xnun)

           = f ( ∑_{j=1}^{n} x_j u_j )

           = ∑_{j=1}^{n} x_j f (u_j)                (by linearity)

           = ∑_{j=1}^{n} x_j ∑_{i=1}^{m} a_ij v_i

           = ∑_{i=1}^{m} ( ∑_{j=1}^{n} a_ij x_j ) v_i
Notice that ∑_{j=1}^{n} a_ij x_j is exactly the i-th element of the m × 1 matrix A x, where x is the n × 1 vector (x1, x2, . . . , xn)^T. Now let us see vectors as column vectors (coordinates in the chosen basis), that is, vectors in Rm are m × 1 matrices and vectors in Rn are n × 1 matrices. Then f ( x) = A x, where A = (a_ij) is the m × n matrix defined by f (u_j) = ∑_{i=1}^{m} a_ij v_i. Since A has size m × n and x has size n × 1, A x has size m × 1 and so is a vector of Rm.

(Together with Example 4.6, this tells us that linear transformations are essentially the same as matrices, after you have chosen a basis of the domain and a basis of the codomain.)

This gives us a very convenient way to express a linear transformation (as the matrix A) and to calculate the image of any vector.
Definition 4.9. (Matrix of a linear transformation)
The matrix of a linear transformation f , with respect to the basis B of the domain and the basis C of the codomain, is the matrix whose j-th column contains the coordinates in the basis C of the image under f of the j-th basis vector of B.
When both B and C are the standard bases then we refer to the matrix
as the standard matrix of f .
Whenever m = n we usually take B = C.
Example 4.10. (identity) The identity matrix I_n corresponds to the linear transformation that fixes every basis vector, and hence fixes every vector in Rn.
Example 4.11. (dilation) The linear transformation with matrix

    [ 2  0 ]
    [ 0  2 ]

(with respect to the standard basis, for both the domain and codomain) maps e1 to 2e1 and e2 to 2e2: it is a dilation of ratio 2 in R2. (A dilation is a function that maps every vector to a fixed multiple of itself: x ↦ λ x, where λ is called the ratio of the dilation.)
Example 4.12. (rotation) The linear transformation with matrix

    [ 0  −1 ]
    [ 1   0 ]

(with respect to the standard basis) maps e1 to e2 and e2 to −e1: it is the anticlockwise rotation of the plane by an angle of 90 degrees (or π/2) around the origin.
Exercise 4.1.2. In R2, an anticlockwise rotation of angle θ around the
origin is a linear transformation. What is its matrix with respect to the
standard basis?
4.1.3 Composition

Whenever you have two linear transformations such that the codomain of the first one is the same vector space as the domain of the second one, we can apply the first one followed by the second one.
Definition 4.13. (Composition)
Let f : Rn → Rm and g : Rm → Rp be functions. Then the function g ◦ f : Rn → Rp defined by

    (g ◦ f )( x) = g( f ( x))

is the composition of f by g. (Notice we read composition from right to left.)
Theorem 4.14. If f : Rn → Rm and g : Rm → Rp are linear transformations, then g ◦ f is also a linear transformation.
Proof. We need to prove the two conditions for a linear transformation.
For all u, v in Rn:
( g ◦ f )(u + v) = g( f (u + v)) (definition of composition)
= g( f (u) + f (v)) ( f is a linear transformation)
= g( f (u)) + g( f (v)) ( g is a linear transformation)
= ( g ◦ f )(u) + ( g ◦ f )(v) (definition of composition)
For all v in Rn and all α in R:
( g ◦ f )(αv) = g( f (αv)) (definition of composition)
= g(α f (v)) ( f is a linear transformation)
= α g( f (v)) ( g is a linear transformation)
= α( g ◦ f )(v) (definition of composition)
Let A = (a_ij) be the matrix corresponding to f with respect to the basis {u1, u2, . . . , un} of the domain and the basis {v1, v2, . . . , vm} of the codomain. Let B = (b_ij) be the matrix corresponding to g with respect to the basis {v1, v2, . . . , vm} of the domain and the basis {w1, w2, . . . , wp} of the codomain. So A is an m × n matrix and B is a p × m matrix. Let us look at the image of u1 under g ◦ f .

We first apply f , so the image f (u1) corresponds to the first column of A: f (u1) = a_11 v1 + a_21 v2 + . . . + a_m1 vm = ∑_{i=1}^{m} a_i1 v_i.
Then we apply g to f (u1):

    (g ◦ f )(u1) = g( ∑_{i=1}^{m} a_i1 v_i )

                 = ∑_{i=1}^{m} a_i1 g(v_i)                 (g is a linear transformation)

                 = ∑_{i=1}^{m} a_i1 ∑_{j=1}^{p} b_ji w_j

                 = ∑_{j=1}^{p} ( ∑_{i=1}^{m} a_i1 b_ji ) w_j

                 = ∑_{j=1}^{p} ( ∑_{i=1}^{m} b_ji a_i1 ) w_j

                 = ∑_{j=1}^{p} (BA)_j1 w_j
Notice this is exactly the first column of the matrix BA! We can do the same calculation with any u_j (1 ≤ j ≤ n) to see that the image (g ◦ f )(u_j) corresponds exactly to the j-th column of the matrix BA. Hence the matrix corresponding to g ◦ f with respect to the basis {u1, u2, . . . , un} of the domain and the basis {w1, w2, . . . , wp} of the codomain is BA.
Key Concept 4.15. Composition of linear transformations is the
same thing as multiplication of the corresponding matrices.
You may have thought that matrix multiplication was defined in
a strange way: it was defined precisely so that it corresponds with
composition of linear transformations.
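Key Concept 4.15 is easy to see in action: composing f (x) = Ax with g(y) = By gives the same result as applying the single matrix BA. A sketch assuming Python with numpy, with the two matrices chosen arbitrarily:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])     # f : R^2 -> R^3,  f(x) = A x
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]]) # g : R^3 -> R^2,  g(y) = B y

def f(x):
    return A @ x

def g(y):
    return B @ y

x = np.array([4.0, -2.0])
# composing the functions agrees with multiplying the matrices, in that order
assert np.allclose(g(f(x)), (B @ A) @ x)
```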
4.1.4 Inverses
An inverse function is a function that “undoes” another function:
if f (x) = y, the inverse function g maps y to x . More directly,
g( f (x)) = x, meaning that g ◦ f is the identity function, that is, it
fixes every vector. A function f that has an inverse is called invertible and the inverse function is then uniquely determined by f and
is denoted by f −1.
In the case where f is a linear transformation, it is invertible if
and only if its domain equals its codomain and its range, say Rn. In
this case, f −1 ◦ f is the identity function.
Let A be the matrix corresponding to f and B be the matrix
corresponding to f −1 (all with respect to the standard basis, say);
both are n × n matrices. We have seen in Example 4.10 that the
matrix corresponding to the identity function is the identity matrix
I_n. By Key Concept 4.15, we have that BA = I_n. In other words, B is the inverse matrix of A. Hence we have:
Theorem 4.16. The matrix corresponding to the inverse of an invertible
linear transformation f is the inverse of the matrix corresponding to f
(with respect to a chosen basis).
4.1.5 Rank-Nullity Theorem revisited
Remember the Rank-Nullity Theorem (Theorem 3.18): rank( A) +
nullity( A) = n for an m × n matrix A. We now know that A represents a linear transformation f : x → A x, so we are going to
interpret what the rank and the nullity are in terms of f .
The rank of A is the dimension of the column space of A. We
have seen that the columns represent the images f (u j) for each
basis vector u j of Rn. So the column space corresponds exactly to
the range of f . By Theorem 4.7, we know that the range of f is a
subspace, and so has a dimension: the rank of A corresponds to the
dimension of the range of f .
The nullity of A is the dimension of the null space of A. Recall
that the null space of A is the set of vectors x of Rn such that A x =
0. In terms of f , it corresponds to the vectors x of Rn such that
f ( x) = 0. This set is called the kernel of f .
Definition 4.17. (Kernel)
The kernel of a linear transformation f : Rn −→ Rm is the set
Ker( f ) = { x ∈ Rn| f ( x) = 0}.
The kernel of f is a subspace of Rn (try proving it!) and so has a dimension: the nullity of A corresponds to the dimension of the kernel of f .
We can now rewrite the Rank-Nullity Theorem as follows:
Theorem 4.18. Let f be a linear transformation. Then
dim(range( f )) + dim(Ker( f )) = dim(dom( f )).
We immediately get:
Corollary 4.19. The dimension of the range of a linear transformation
is at most the dimension of its domain.
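The restated theorem can be verified numerically: rank(A) gives dim(range( f )), and the rows of V^T beyond the rank in the singular value decomposition span Ker( f ). A sketch assuming Python with numpy (the SVD route is our choice of method, not from the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # a 2 x 3 matrix whose rows are parallel

m, n = A.shape
rank = np.linalg.matrix_rank(A)  # dim(range(f)) for f(x) = A x

# rows of Vt beyond the rank form an orthonormal basis of Ker(f)
_, _, Vt = np.linalg.svd(A)
kernel_basis = Vt[rank:]
nullity = kernel_basis.shape[0]

assert np.allclose(A @ kernel_basis.T, 0.0)  # kernel vectors really map to 0
assert rank + nullity == n                   # the Rank-Nullity Theorem
print(rank, nullity, n)  # 1 2 3
```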
Part II

Differential Calculus: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici. Some pictures courtesy of Kevin Judd.
5 Vector functions and functions of several variables
A great deal can be done with scalar functions of one real variable;
that is, functions that are rules for assigning to each real number in
some subset of the real line (the independent variable) another real
number (the dependent variable). An example of such a function is
f (x) = x²
where x is a real number. However, sometimes this type of function
is not enough to deal with the problems that arise in science, engineering, economics or other fields. For instance, a meteorologist
might want to deal with air pressure which varies with latitude,
longitude and time, so would need a function of three variables.
Other types of problems might involve just one independent variable (for instance, time) which affects several dependent variables
(e.g. the components in three directions of an electromagnetic field);
then we would have a set of three functions of one variable. We
could deal with all three at once by regarding them as the components of a vector.
Just as for scalar functions of one real variable, it is important to
know how these functions change as the variables change. This is
where calculus is required. Luckily the ideas of differential calculus
from high school can be extended to these new functions and that is
the focus of this part of the course.
5.1 Vector valued functions
Vector functions are functions that take a real number and assign to it a vector in Rn. We will usually be dealing with vectors in R2 and R3.
More precisely, given a subset I of R, a vector function with values
in R3 is a function
r : I −→ R3
t −→ r (t) = ( f (t), g(t), h(t))
The functions f (t), g(t) and h(t) are called coordinate functions of
r (t).
For instance, t could represent time and r (t) the position of some
object at that time. As t varies in I , r (t) draws a curve C in R3
which is the path of the object in space.
An example of such a function is

    r : R −→ R3
        t ↦ ( sin t, 2 cos t, t/10 )

that plots the curve given in Figure 5.1.

Figure 5.1: The curve defined by r (t) = ( sin(t), 2 cos(t), t/10 ) for t ∈ [0, 20]
In mathematical language, the curve C is the set described by
C = {( f (t), g(t), h(t)) | t ∈ I }.
We say that
C : x = f (t), y = g(t), z = h(t), for t ∈ I
are parametric equations of C with parameter t, or equivalently, that
( f (t), g(t), h(t)) is a parameterisation of C.
Remark 5.1. Any curve C (except if it consists of a single point) has
infinitely many different parameterisations, that is, can be described by
infinitely many vector functions. See Example 5.2(2) and (3) below.
A vector function with values in R2 has the form
r : I −→ R2
t −→ r (t) = ( f (t), g(t))
with coordinate functions f (t) and g(t). As above, when t varies in
I , r (t) draws a curve C in R2.
Example 5.2. 1. For t ∈ R, let r (t) = (x0 + at, y0 + bt, z0 + ct), where
x0, y0, z0, a, b, c are real constants. The corresponding curve C defined
by
x = x0 + at, y = y0 + bt, z = z0 + ct, for t ∈ R,
is a straight line in R3 – this is the line through the point P =
(x0, y0, z0) parallel to the vector v = (a, b, c). Briefly, r (t) = P + tv.
2. For t ∈ [0, 2π ] define r (t) = (cos(t), sin(t)). The corresponding curve is the unit circle with centre at (0, 0) and with radius one.

3. Another parameterisation of the unit circle is given by r1(t) = (cos(2t), sin(2t)) for t ∈ [0, π ].
5.2 Functions of two variables
We can think of two variables x and y as the vector (x, y) ∈ R2.
A function of two variables can then be thought of as a function
whose variable is a vector in R2. More formally, let D be a subset of
R2. A function of two variables defined on D is a rule that assigns to
each (x, y) ∈ D a real number f (x, y), that is,
f : D −→ R
(x, y) −→ f (x, y)
An example of a function of two variables is the height above sea
level at a location on a map defined by its x and y coordinates.
Given a function f : A → B, the graph of f is the set

graph(f) = {(a, f(a)) | a ∈ A}

When f is a scalar-valued function of one real variable, the elements of graph(f) are points in the plane, and so this concept corresponds to the graph in the plane that you are familiar with drawing.
When f is a function of two variables then

graph(f) = {(x, y, f(x, y)) | (x, y) ∈ D}

This defines a surface S in R3. We sometimes write

S : z = f(x, y), for (x, y) ∈ D,
which can be regarded as a parametric equation for S. So if we
think of R2 as the plane in R3 defined by z = 0, then z = f (x, y)
gives the distance of the surface above or below the plane at the
point (x, y) just as y = f (x) gives the distance of a curve above or
below the x-axis for functions of one variable.
Figure 5.2 shows the surface defined by the graph of the function
f (x, y) = y2 − x2.
Figure 5.2: The surface defined by f (x, y) = y2 − x2
Example 5.3. 1. Let
f : R2 −→ R
(x, y) −→ x2 + y2
Then
graph( f ) = {(x, y, x2 + y2) | (x, y) ∈ R2}
and defines a surface that is called a paraboloid and is shown in Figure 5.3. The domain of f is R2 and the range is [0, ∞).
Figure 5.3: The surface defined by f (x, y) = x 2 + y2
2. Let D = {(x, y) ∈ R2 | x2 + y2 ≤ 1} and define

f : D −→ R
(x, y) −→ √(1 − x2 − y2)

The graph of f is the upper hemisphere in R3 of radius 1 and centre (0, 0, 0), as displayed in Figure 5.4. The range of f is the interval [0, 1].

Figure 5.4: The surface defined by f(x, y) = √(1 − x2 − y2)
5.2.1 Level Curves
Another way of visually representing functions of two variables is
by using level curves.
Suppose f : D −→ R is a function of two variables. Then the set
of points (x, y) in D satisfying the equation
f (x, y) = k
where k is some real constant, defines a curve in D. It is called a level curve of f because it consists of all the points in D for which the corresponding points on the surface defined by f are at the same
level, that is, at height |k | above or below the plane. If k is not in the
range of f , then the level curve does not exist.
Examples of level curves with which you might be familiar are:
1. contour lines on maps, which are lines of constant height above
sea-level, and
2. the isobars on a weather chart, which are lines of constant atmo-
spheric pressure.
Level curves are a way of giving visual information about a surface
in R3 in a 2-dimensional form.
Example 5.4. 1. Let f(x, y) = x2 + y2. If k > 0, the level curve defined by f(x, y) = k is the circle x2 + y2 = k. If k = 0 then the level curve is the single point (0, 0), and if k < 0, then the level curve does not exist. Figure 5.5 displays some of the level curves.

Figure 5.5: Some level curves for f(x, y) = x2 + y2.

2. Let f(x, y) = 2x2 + 3y2. If k > 0, the level curve defined by f(x, y) = k is the ellipse 2x2 + 3y2 = k. If k = 0 the level curve is the single point (0, 0), and if k < 0, then the level curve does not exist. Some of the level curves are displayed in Figure 5.6.

Figure 5.6: Some level curves for f(x, y) = 2x2 + 3y2.
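The calculations in Example 5.4 are easy to check numerically. The following Python sketch (illustrative only; k = 4 is an arbitrary sample value) parameterises the circle and the ellipse and confirms that each point lies on the stated level curve:

```python
import math

def f(x, y):
    return x * x + y * y          # Example 5.4(1)

def g(x, y):
    return 2 * x * x + 3 * y * y  # Example 5.4(2)

k = 4.0
for i in range(36):
    t = 2 * math.pi * i / 36
    # Points of the circle x2 + y2 = k lie on the level curve f = k ...
    x, y = math.sqrt(k) * math.cos(t), math.sqrt(k) * math.sin(t)
    assert abs(f(x, y) - k) < 1e-9
    # ... and points of the ellipse 2x2 + 3y2 = k lie on the level curve g = k.
    x, y = math.sqrt(k / 2) * math.cos(t), math.sqrt(k / 3) * math.sin(t)
    assert abs(g(x, y) - k) < 1e-9
```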
5.3 Functions of three or more variables
Functions of three or more variables may also be defined, but it is not possible to visualise their graphs as curves or surfaces. For
example, if D is a subset of R3, a function of three variables defined in
D is a rule that assigns to each (x, y, z) ∈ D a real number f (x, y, z).
In a similar way, given a subset D of Rn, we can consider functions
f (x1, x2, . . . , xn) defined for (x1, x2, . . . , xn) ∈ D.
For a general function f of n variables,
graph( f ) = {( x1, x2, . . . , xn, f (x1, x2, . . . , xn) ) | (x1, x2, . . . , xn) ∈ D}
(this is an n-dimensional surface in Rn+1, which is a concept which
makes algebraic but not visual sense). For any k ∈ R the set
S = { x ∈ D | f ( x) = k }
is called a level surface of f (level curve when n = 2).
Example 5.5. 1. f (x, y, z) = ax + by + cz = c · x is a linear function
of three variables for each constant vector c = (a, b, c). Here x =
(x, y, z) ∈ R3. If c ≠ 0, then for each real number k the equation f(x) = k determines a plane in R3 with normal vector c. Changing k
gives different planes parallel to each other. That is, the level surfaces
defined by f are all planes perpendicular to c.
2. f(x) = c · x = c1x1 + . . . + cnxn with c ≠ 0. Then (see part (1) above) the level surfaces {x | f(x) = k} are mutually parallel (n − 1)-dimensional hyperplanes.
3. f(x, y, z) = x2 + y2 + z2. Then, for each k > 0, the level surface defined by f(x, y, z) = k is a sphere with radius √k and centre 0.
5.4 Summary
Summing up we have discussed three types of functions:
1. Scalar-valued functions of one real variable, that is, functions of
the form
f : A −→ R
for some subset A of R.
2. Vector functions, that is, functions of the form
f : A −→ Rn
for some subset A of R with n ≥ 2.
3. Functions of several variables, that is, functions of the form
f : A −→ R
for some subset A of Rn with n ≥ 2.
High school calculus considered functions of type (1). In these notes we will briefly refresh this and study the calculus of functions of types (2) and (3).
We can of course have even more general functions, that is, func-
tions of the form
f : A −→ Rm
for some subset A of Rn with n, m ≥ 2. Such functions are sometimes called vector fields or vector transformations.
We have already seen linear versions of such functions in Chapter 4: an m × n matrix B defines the function
f : Rn −→ Rm
x −→ B x
6
Limits and continuity
6.1 Scalar-valued functions of one variable
An important concept for the calculus of scalar-valued functions of
one variable is the notion of a limit. This should mostly be familiar to you from school. Recall that an open interval in R is a set denoted by (a, b) and consisting of all real numbers x such that a < x < b. A closed interval [a, b] consists of all real numbers x such that a ≤ x ≤ b.
Definition 6.1. (Intuitive definition of a limit)
Let f (x) be defined on an open interval containing a, except possibly
at a, and let L ∈ R. We say that f converges to L as x approaches
a if we can make the values of f (x) arbitrarily close to L by taking x to
be sufficiently close to a, but not equal to a. We write limx→a f(x) = L or f(x) → L as x → a.
Essentially what the definition is saying is that as x gets closer
and closer to a, the value f (x) gets closer and closer to L .
Sometimes we can calculate the limit of f(x) as x approaches a by simply evaluating f(a). For example, if f(x) = x2 then

limx→1 f(x) = f(1) = 1.

However, it may be the case that limx→a f(x) ≠ f(a). For example, consider the function f : R → R defined by

f(x) = { x if x ≠ 0
       { 2 if x = 0        (6.1)

The graph of this function is given in Figure 6.1. Then we clearly have that limx→0 f(x) = 0 while f(0) = 2.
It is also possible to define one-sided limits. We write

limx→a− f(x) = L

if f(x) gets arbitrarily close to L as x gets sufficiently close to a with x < a. This definition only requires f to be defined on an open interval (b, a) for some b < a. Similarly, limx→a+ f(x) = L if f(x) gets arbitrarily close to L as x gets sufficiently close to a with x > a.

Figure 6.1: The graph of the function f defined in (6.1)
Note that limx→a f(x) exists if and only if

limx→a− f(x) = limx→a+ f(x)
Example 6.2. 1. For any constant α > 0, limx→0 xα = 0.

2. limx→0 sin(x)/x = 1.

Figure 6.2: The graph of the function y = sin(x)/x for x ∈ [−5, 5].
3. Consider the function

f(x) = { 0 if x < 0
       { 1 if x ≥ 0

Then limx→0 f(x) does not exist. However, limx→0− f(x) = 0 and limx→0+ f(x) = 1.
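A numerical sketch of part (3), in Python (the sampling sequences 10^−n are an arbitrary illustrative choice):

```python
def f(x):
    # The step function of Example 6.2(3)
    return 0.0 if x < 0 else 1.0

# Sample f along sequences that approach 0 from each side.
from_left  = [f(-(10.0 ** -n)) for n in range(1, 8)]
from_right = [f(10.0 ** -n) for n in range(1, 8)]
assert all(v == 0.0 for v in from_left)   # lim as x -> 0- is 0
assert all(v == 1.0 for v in from_right)  # lim as x -> 0+ is 1
# The one-sided limits differ, so the two-sided limit does not exist.
```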
To make it easier to calculate limits we have the following rules that allow us to combine limits of known functions to determine limits of new functions.
Theorem 6.3 (Limit Laws). Let I be an open interval, a ∈ I, and let f(x) and g(x) be defined for x ∈ I \ {a} such that limx→a f(x) and limx→a g(x) exist. Then:

1. limx→a (f(x) + g(x)) = limx→a f(x) + limx→a g(x) and
   limx→a (f(x) − g(x)) = limx→a f(x) − limx→a g(x).

2. For any constant c ∈ R, limx→a (c · f(x)) = c · limx→a f(x).

3. limx→a (f(x) · g(x)) = (limx→a f(x)) · (limx→a g(x)).

4. If limx→a g(x) ≠ 0, then

   limx→a f(x)/g(x) = (limx→a f(x)) / (limx→a g(x)).
The statements in Theorem 6.3 remain true if x → a is replaced
by either x → a− or x → a+ for suitably defined functions.
The intuitive definition of a limit can be easily extended to the notion of the limit of f(x) as x approaches infinity. In particular we say that f(x) tends to L as x → ∞ if we can make f(x) arbitrarily close to L by taking x sufficiently large. We denote this by limx→∞ f(x) = L. We can define limx→−∞ f(x) in a similar fashion.
The statements in Theorem 6.3 still hold if we replace x → a by x → ∞ or x → −∞.
Example 6.4. 1. For any constant α > 0, limx→∞ 1/xα = 0.

2. Evaluate limx→∞ (x2 + 2x + 1)/(2x2 + x + 7).

Solution: We have

limx→∞ (x2 + 2x + 1)/(2x2 + x + 7)
  = limx→∞ [x2(1 + 2/x + 1/x2)] / [x2(2 + 1/x + 7/x2)]
  = limx→∞ (1 + 2/x + 1/x2) / (2 + 1/x + 7/x2)
  = (limx→∞ 1 + limx→∞ 2/x + limx→∞ 1/x2) / (limx→∞ 2 + limx→∞ 1/x + limx→∞ 7/x2)   (by Theorem 6.3)
  = (1 + 0 + 0)/(2 + 0 + 0)
  = 1/2
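This limit can also be observed numerically (Python sketch; the error bound 1/x used in the assertion is an illustrative estimate, not part of the example):

```python
def f(x):
    return (x * x + 2 * x + 1) / (2 * x * x + x + 7)

# As x grows the values settle near the limit 1/2 found above.
for x in (1e3, 1e6, 1e9):
    assert abs(f(x) - 0.5) < 1 / x  # the error shrinks roughly like 3/(4x)
```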
Another useful theorem for calculating limits is the following
Theorem 6.5 (The Squeeze Theorem, or The Sandwich Theorem).
Let I be an open interval, a ∈ I, and let f(x), g(x) and h(x) be defined for x ∈ I \ {a} with g(x) ≤ f(x) ≤ h(x) for all x ∈ I \ {a}. If

limx→a g(x) = limx→a h(x) = L

then limx→a f(x) = L.
The Squeeze Theorem still holds if x → a is replaced by any of
x → a−, x → a+, x → ∞ or x → −∞ with the appropriate changes
to the statement about the domain of the function.
Example 6.6. Determine limx→∞ sin(x)/x.

We have

−1/x ≤ sin(x)/x ≤ 1/x

for any x > 0, and limx→∞ (−1/x) = limx→∞ 1/x = 0. So by the Squeeze Theorem, limx→∞ sin(x)/x = 0.
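The squeeze in Example 6.6 can be checked numerically (Python sketch, illustrative only):

```python
import math

# The bound |sin(x)/x| <= 1/x from Example 6.6, checked at a few large x.
for x in (10.0, 100.0, 1000.0, 1e6):
    value = math.sin(x) / x
    assert -1 / x <= value <= 1 / x  # squeezed between -1/x and 1/x

# The squeezing bounds force the values towards the limit 0.
assert abs(math.sin(1e6) / 1e6) < 1e-6
```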
We saw in Example 6.2(3) one way in which a limit may not
exist. Another way in which a limit may fail to exist is if f(x)
diverges to ∞.
Definition 6.7. (Diverging to ∞)
Let f(x) be defined on an open interval (b, a). We say that f diverges to ∞ as x → a from below if f(x) gets arbitrarily large as x gets sufficiently close to a with x < a. We denote this by limx→a− f(x) = ∞.
In a similar way one defines limx→a− f(x) = −∞, limx→a+ f(x) = ∞ and limx→a+ f(x) = −∞.
Note that by writing limx→a− f(x) = ∞ we are not saying that the limit exists; we are simply using the notation to denote that f diverges to infinity from below.
Finally, if limx→a− f(x) = limx→a+ f(x) = ∞, we write limx→a f(x) = ∞. Similarly, limx→a f(x) = −∞ means that limx→a− f(x) = limx→a+ f(x) = −∞.
Example 6.8. For f(x) = 1/(x − 1), x ≠ 1, we have

limx→1− f(x) = −∞,  limx→1+ f(x) = ∞.
6.1.1 The precise definition of a limit
The intuitive definition of a limit given in Definition 6.1 is a bit imprecise; what exactly do ‘approach’ or ‘sufficiently close’ mean? This is particularly unclear when dealing with functions of several variables: how do you ‘approach’ 0? From which direction? It is also unclear when you start to deal with more complicated functions than the ones you will see in this course. Moreover, we need a more precise definition if we wish to give rigorous proofs of Theorems 6.3 and 6.5.
For these reasons, mathematicians require a more formal and
precise definition. We outline such a definition in this subsection
for the interested student. These ideas will be explored in tutorials.
Definition 6.9. (Limits at finite points)
Let f(x) be defined on an open interval I around a, except possibly at a, and let L ∈ R. We say that f converges to L as x → a if for all ε > 0 there exists δ > 0 such that

|f(x) − L| < ε for all x ∈ I \ {a} with |x − a| < δ
This definition needs quite a bit of unpacking. Essentially what it is saying is that no matter what ε > 0 you are given, it is possible to find a δ > 0 (possibly depending on ε) such that if x has distance at most δ from a then f(x) is at most distance ε from L. The quantity ε is the measure of ‘closeness’ of f(x) to L and δ is the measure of ‘closeness’ of x to a. Another way of thinking about it is that δ is the control on the input error that you need to guarantee an output error of ε.
Visualising this geometrically, what we have is the following: since |f(x) − L| < ε is equivalent to L − ε < f(x) < L + ε, limx→a f(x) = L means that for any ε > 0 we can find δ > 0 small enough that the part of the graph of f corresponding to the interval (a − δ, a + δ) is entirely in the horizontal strip L − ε < y < L + ε. In other words, when x approaches a the value f(x) of the function approaches L.
We will demonstrate this with a simple example.
Example 6.10. Let f(x) = 2x + 1 and a = 0. Intuitively we expect the limit as x tends to 0 to be 1. More formally, suppose that ε > 0. To show that limx→0 f(x) = 1 we need to find a δ > 0 so that if x is at distance at most δ from 0 then f(x) is at distance at most ε from 1. A good guess would be δ = ε/2. Indeed, if |x − 0| < δ then

|f(x) − 1| = |2x + 1 − 1| = |2x| = 2|x| < 2δ = ε

as required.
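The choice δ = ε/2 from Example 6.10 can be tested numerically (Python sketch; the sample grid is an arbitrary illustrative choice):

```python
def f(x):
    return 2 * x + 1

# Example 6.10: at a = 0 with L = 1, the choice delta = eps/2 works.
for eps in (0.5, 0.1, 1e-3):
    delta = eps / 2
    # Sample many x with 0 < |x| < delta and confirm |f(x) - 1| < eps.
    samples = [delta * k / 100 for k in range(-99, 100) if k != 0]
    assert all(abs(f(x) - 1) < eps for x in samples)
```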
We can also give a precise definition for limits as x approaches ∞.
Definition 6.11. (Limits at ∞ and −∞)
Let f(x) be defined on an interval (c, ∞) for some c ∈ R. We say that f(x) tends to L as x → ∞ if for all ε > 0 there exists b > c such that |f(x) − L| < ε for all x ≥ b. In a similar way we define limx→−∞ f(x) = L.

Geometrically, limx→∞ f(x) = L means that for any ε > 0 we can find some b > c (possibly depending on ε) so that the part of the graph of f corresponding to the interval [b, ∞) is entirely in the horizontal strip L − ε < y < L + ε. In other words, as the variable x becomes larger and larger the value f(x) of the function at x approaches L.
We can also give a precise definition for f to diverge to ∞.
Definition 6.12. (Diverging to ∞)
Let f (x) be defined on an interval of the form (b, a) for some b < a. We
say that f diverges to ∞ as x → a with x < a if for any M > 0 there
exists δ > 0 such that f (x) > M for all x < a with |x − a| < δ.
This is saying that no matter what positive number M you
choose, you can always find some margin δ so that if x is at most
δ away from a then f (x) will be greater than M.
6.2 Limits of vector functions
We can consider limits of vector functions in a similar way to how
we viewed limits for scalar functions.
Let r(t) = (f(t), g(t), h(t)) be a vector function defined on some open interval around a, except possibly at a. Define

limt→a r(t) = ( limt→a f(t), limt→a g(t), limt→a h(t) ),

if the limits of the coordinate functions exist. Limits can then be calculated using the techniques for scalar-valued functions.
We can also define limt→a− r(t) and limt→a+ r(t) in a similar fashion.
6.3 Limits of functions of two variables
We can also extend the notion of a limit to functions of two variables, but this requires a few modifications.
First, for functions of one real variable we used the notion of an
open interval of the real line. The appropriate notion in the two
variable case is that of an open disc: Given a = (a1, a2) ∈ R2 and
δ > 0, the open disc of radius δ and centre a is the set
B(a, δ) = { x ∈ R2 | d( x, a) < δ}
Here

d(x, a) = √((x1 − a1)2 + (x2 − a2)2)

is the distance between the points x = (x1, x2) and a.
Given a subset D of R2 we say that a is an interior point of D if D
contains some open disc centred at a.
The intuitive definition of a limit would then be:
Definition 6.13. (Intuitive definition of a limit)
Let f(x) be defined on an open disc centred at a, except possibly at a, and let L ∈ R. We say that f converges to L as x → a if we can make the values of f(x) arbitrarily close to L by taking x to be sufficiently close to a, but not equal to a. We write limx→a f(x) = L.
Example 6.14. 1. Let f(x, y) = x. Then

lim(x,y)→(a1,a2) f(x, y) = a1.

2. Determine lim(x,y)→(0,0) (x2 + x + xy + y)/(x + y).

Solution: Here it helps if we factorise the numerator. We have

lim(x,y)→(0,0) (x2 + x + xy + y)/(x + y) = lim(x,y)→(0,0) (x + y)(x + 1)/(x + y)
  = lim(x,y)→(0,0) (x + 1)
  = 1
For the one variable case there were only two paths along which to approach a real number a: from the left or from the right. For the limit to exist we require that the limit as we approach a from either side is the same. However, there are infinitely many paths approaching a point a ∈ R2, and they are not all straight lines! For the limit to exist the value of f(x) must approach the same value L no matter which path we choose.
More formally, for the limit to exist and to be equal to L we require that for every curve r(t) with limt→t0 r(t) = a we have

limt→t0 f(r(t)) = L

If the latter holds for only some curves but not all curves, then lim x→a f(x) does not exist. This gives a simple method for showing that a limit does not exist.
Example 6.15. Let D = R2 \ {(0, 0)} and define

f : D −→ R
(x, y) −→ xy/(x2 + y2)

Let a = (0, 0). First consider the curve r1(t) = (t, t) passing through a. Then limt→0 r1(t) = (0, 0), so

limt→0 f(r1(t)) = limt→0 t·t/(t2 + t2) = 1/2.

Next consider the curve r2(t) = (0, t) defined for all t > 0. Then limt→0+ r2(t) = a and along this curve f(r2(t)) = f(0, t) = 0, and so

limt→0+ f(r2(t)) = limt→0+ 0 = 0 ≠ 1/2
Since we have found two different paths with two different limits as they approach (0, 0), it follows that lim(x,y)→(0,0) f(x, y) does not exist.
To see what is actually happening with the function we can plot its graph, and we see in Figure 6.3 that the resulting surface seems to have some kind of pinch in it at (0, 0). The curve (r1(t), f(r1(t))) has gone along the top ridge while the curve (r2(t), f(r2(t))) has gone along the bottom ridge.

Figure 6.3: The surface defined by f(x, y) = xy/(x2 + y2).
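The two-path argument of Example 6.15 can be reproduced numerically (Python sketch, with illustrative sample points):

```python
def f(x, y):
    # The function of Example 6.15, defined away from the origin
    return x * y / (x * x + y * y)

# Along the path r1(t) = (t, t) the function is constantly 1/2 ...
along_diagonal = [f(t, t) for t in (0.1, 0.01, 0.001)]
assert all(abs(v - 0.5) < 1e-12 for v in along_diagonal)

# ... while along r2(t) = (0, t) it is constantly 0.
along_axis = [f(0.0, t) for t in (0.1, 0.01, 0.001)]
assert all(v == 0.0 for v in along_axis)
# Two paths into (0, 0) give two different limits, so the limit does not exist.
```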
Sometimes it is easier to find a limit, if it exists, by first changing the co-ordinate system; so we shall now consider another commonly used system of co-ordinates in R2.

Definition 6.16. (Polar coordinates in R2)
Let x = r cos(θ) and y = r sin(θ) for r ≥ 0 and θ ∈ [0, 2π). Then r, θ are called the polar coordinates of the point (x, y). Notice that r2 = x2 + y2, so r = √(x2 + y2) is the distance of the point (x, y) from the origin.
Exercise 6.3.1. Find lim(x,y)→(0,0) xy/√(x2 + y2) if it exists.

Solution. Introduce polar coordinates: x = r cos(θ) and y = r sin(θ). Since r = √(x2 + y2), we have r → 0 as (x, y) → (0, 0). Now

|xy/√(x2 + y2)| = |r2 cos(θ) sin(θ)|/r = r|cos(θ)||sin(θ)| ≤ r

and so

−r ≤ xy/√(x2 + y2) ≤ r.

Letting (x, y) → (0, 0), we have r → 0, so by the Squeeze Theorem,

lim(x,y)→(0,0) xy/√(x2 + y2) = 0.
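The polar-coordinate bound |xy/√(x2 + y2)| ≤ r can be checked numerically (Python sketch over a grid of directions and radii chosen for illustration):

```python
import math

def f(x, y):
    # The function of Exercise 6.3.1, defined away from the origin
    return x * y / math.sqrt(x * x + y * y)

# Check the polar-coordinate bound |f| <= r at many directions and radii.
for deg in range(0, 360, 15):
    theta = math.radians(deg)
    for r in (1.0, 0.1, 1e-3):
        x, y = r * math.cos(theta), r * math.sin(theta)
        assert abs(f(x, y)) <= r + 1e-12

# The bound squeezes f to 0 as r -> 0, whatever the direction of approach.
assert abs(f(1e-8, 1e-8)) < 1e-7
```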
We can define limits for functions of three or more variables in a similar way. We define the open ball of radius δ and centre a to be the set

B(a, δ) = { x ∈ Rn | d(x, a) < δ }
where d(x, a) is the distance between x and a. Sometimes we refer to open discs in R2 and open intervals in R as open balls so that we do not need to interchange terminology for each case. Then Definition 6.13 extends to the general setting by replacing “open disc” with “open ball”.
We end this section by giving the precise definition of a limit for
the interested student.
Definition 6.17. (Precise definition of limit)
Let f(x) be defined on an open ball around a, except possibly at a, and let L ∈ R. We write

L = limx→a f(x)

and say that the limit of f(x) as x approaches a is equal to L if for every ε > 0 there exists δ > 0 such that |f(x) − L| < ε holds for each x ∈ B(a, δ) with x ≠ a (that is, for each x with 0 < d(x, a) < δ).
6.4 Continuity of Functions
Continuity for scalar-valued functions of one variable essentially means that the graph of the function has no gaps in it, that is, it can be drawn without taking your pen off the page. We make this definition more precise and state it in a manner that also applies to vector functions and functions of several variables as follows:
Definition 6.18. (Continuity)
Let a ∈ Rm for some m ≥ 1 and let D ⊆ Rm such that D contains an open ball centred at a. Let f : D → Rn for some n ≥ 1. We say that f is continuous at a if

lim x→a f(x) = f(a)

We say that f is continuous if it is continuous at all points in its domain.
It may be the case that f is not defined on an open ball around each point in its domain. For example, if A = [0, ∞) and f : A → R is defined by f(x) = √x, then the point 0 is in the domain of f but f is not defined on an open interval around 0. In this case, to say that f is continuous at 0 means that limx→0+ f(x) = f(0). We can similarly define f to be continuous at the right end point of an interval.
One consequence of continuity is that a continuous function cannot skip values; that is, small changes in the variable result in small changes in the value of the function.
It is easy to see that if r (t) is a vector-valued function then it is
continuous at t0 if and only if each of the coordinate functions f (t),
g(t) and h(t) are continuous at t0.
We give some examples of continuous functions.
Example 6.19. 1. Given real numbers a0, a1, . . . , an, the polynomial

P(x) = a0xn + a1xn−1 + . . . + an−1x + an,  x ∈ R

is a continuous function, and therefore every rational function

R(x) = P(x)/Q(x)
where P and Q are polynomials, is continuous on its domain.
2. sin(x) and cos(x) are continuous on R.
3. ex is continuous on R, while ln(x) is continuous on its domain, which
is (0,∞).
4. For any constant α ∈ R, the function f (x) = xα is continuous for all
x > 0.
5. The function r(t) = (t3, ln(3 − t), √t) with domain [0, 3) is continuous since each of its coordinate functions is continuous on this interval.
6. It follows from part (1) that f (x, y) = x is a continuous function in
R2. Similarly, g(x, y) = y is continuous.
7. The function f defined in Equation (6.1) is not continuous at x = 0.
The next result tells us various ways of combining continuous functions to get new continuous functions.
Theorem 6.20. Let n ≥ 1, D ⊆ Rn and let f , g : D −→ R be
continuous at c ∈ D. Then:
1. The functions a f(x) + b g(x) (a, b ∈ R) and f(x) g(x) are continuous at c.

2. If g(c) ≠ 0, then f(x)/g(x) is continuous at c.

3. If h(t) is a function of a single variable t ∈ R, defined on the range of f and continuous at f(c), then h(f(x)) is continuous at c.

4. If t ∈ I ⊆ R, and r(t) is a vector function such that r(t) ∈ D for t ∈ I and r(t0) = c, and if r is continuous at t0, then the function f(r(t)) is continuous at t0.
Example 6.21. Since f(x, y) = x and g(x, y) = y are continuous functions (Example 6.19(6)), Theorem 6.20(1) implies that h(x, y) = x^p y^q is continuous for all integers p, q ≥ 0. Using Theorem 6.20(1) again, one gets that every polynomial

P(x, y) = a1 x^p1 y^q1 + a2 x^p2 y^q2 + . . . + ak x^pk y^qk

is continuous. Hence by Theorem 6.20(2) rational functions R(x, y) = P(x, y)/Q(x, y), where P and Q are polynomials, are continuous on their domains, that is, everywhere except where Q(x, y) = 0.
Then, applying Theorem 6.20(3), it follows that e^R(x,y), sin(R(x, y)) and cos(R(x, y)) are continuous functions (on their domains).
We note that if we know that a function f is continuous at a point a then it is easy to calculate lim x→a f(x): it is simply f(a).
Example 6.22. Determine lim(x,y)→(0,0) (x2 + y2)/(2x + 2).

Solution: Note that the denominator is nonzero when (x, y) = (0, 0). Thus, since both the numerator and the denominator are polynomials, Theorem 6.20(2) implies that the function f(x, y) = (x2 + y2)/(2x + 2) is continuous at (0, 0). Thus

lim(x,y)→(0,0) (x2 + y2)/(2x + 2) = (02 + 02)/(2(0) + 2) = 0
7
Differentiation
You should be familiar with differentiating functions of one vari-
able from high school. We will first revise this and then move on
to the appropriate analogues for vector functions and functions of
several variables.
7.1 Derivatives of functions of one variable
Definition 7.1. (Derivative)
Let f(x) be a function defined on some interval I and let a be an interior point of I. We say that f is differentiable at a if the limit

limx→a (f(x) − f(a))/(x − a)

exists, and we define f′(a) to be the value of the limit. Then f′(a) is called the (first) derivative of f at a.

In a similar way one defines the left derivative

f′(a−) = limx→a− (f(x) − f(a))/(x − a)

of f at a, and the right derivative

f′(a+) = limx→a+ (f(x) − f(a))/(x − a)

of f at a, whenever the corresponding limits exist.

Letting x = a + h we have that x → a is the same as h → 0, and so an equivalent definition of the derivative is that

f′(a) = limh→0 (f(a + h) − f(a))/h
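The limit defining f′(a) can be approximated numerically by evaluating the difference quotient at shrinking h (Python sketch; the test function f(x) = x2 and the step sizes are illustrative choices):

```python
def diff_quotient(f, a, h):
    # (f(a + h) - f(a)) / h, whose limit as h -> 0 is f'(a)
    return (f(a + h) - f(a)) / h

# For f(x) = x2 and a = 3 the derivative is 6; the quotients close in on it.
f = lambda x: x * x
errors = [abs(diff_quotient(f, 3.0, 10.0 ** -n) - 6.0) for n in range(1, 6)]
assert errors == sorted(errors, reverse=True)  # the error shrinks with h
assert errors[-1] < 1e-4
```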
Recall that another notation for the derivative of f at a is df/dx(a). Notice that the left and right derivatives can be considered not only for interior points a of I but also when a is a right or left end-point of I respectively.
We emphasise the following key concept as it will guide our later definitions of the derivative of other types of functions.
Key Concept 7.2. The derivative of f at a is the slope of the tangent line to the curve defined by y = f(x) at the point (a, f(a)). Moreover, close to x = a the tangent line is the best approximation to the curve by a straight line.
You should be familiar with finding the derivative of various functions from high school, such as polynomials, trigonometric functions, exponential functions and logarithmic functions. Moreover, for a function defined in a piecewise manner you often need to use the formal definition to determine the derivative. You should also be familiar with the following results.
Theorem 7.3. Let f(x) and g(x) be differentiable on an interval I. Then:

1. [f(x) + g(x)]′ = f′(x) + g′(x) and [f(x) − g(x)]′ = f′(x) − g′(x).

2. For any constant c ∈ R, [c f(x)]′ = c f′(x).

3. (Product Rule)

[f(x) g(x)]′ = f′(x) g(x) + f(x) g′(x)

4. (Quotient Rule) If g(x) ≠ 0 on I,

[f(x)/g(x)]′ = (f′(x) g(x) − f(x) g′(x)) / (g(x))2

5. (Chain Rule) Assume that h(x) is a differentiable function on some interval J and h(x) ∈ I for all x ∈ J. Then the function f(h(x)) is differentiable on J and

d/dx [f(h(x))] = f′(h(x)) h′(x)
Theorem 7.4. If f is differentiable at some a ∈ I, then f is continuous at a.
Remark 7.5. The converse statement of Theorem 7.4 is not true; that is, not every continuous function is differentiable. For example, f(x) = |x| is continuous everywhere on R. However

f′(0−) = limx→0− (|x| − 0)/(x − 0) = limx→0− (−x)/x = −1,

while

f′(0+) = limx→0+ (|x| − 0)/(x − 0) = limx→0+ x/x = 1 ≠ f′(0−),

so f is not differentiable at 0.¹

¹ There are continuous functions on R which are not differentiable at any point, but such examples are beyond the scope of this unit.
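The one-sided difference quotients of |x| at 0 can be computed directly (Python sketch; the helper name and step sizes are ours):

```python
def one_sided_quotients(f, a, sign):
    # Difference quotients of f at a, approaching from the right (sign=+1)
    # or from the left (sign=-1)
    return [(f(a + sign * h) - f(a)) / (sign * h) for h in (0.1, 0.01, 0.001)]

# For f(x) = |x| at a = 0: every right quotient is 1, every left one is -1,
# so f'(0+) = 1 and f'(0-) = -1 and |x| is not differentiable at 0.
assert all(q == 1.0 for q in one_sided_quotients(abs, 0.0, +1))
assert all(q == -1.0 for q in one_sided_quotients(abs, 0.0, -1))
```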
Note that even if f is differentiable at a point a it is not necessarily true that f′(a) = limx→a f′(x). This happens to be true only if f′(x) is continuous at a, and this is not necessarily the case. We say that f is continuously differentiable on an interval I if f′(x) exists and is continuous everywhere in I.
Definition 7.6. (nth derivative)
If f is differentiable at every x ∈ I, we get a function x → f′(x) defined on I which is called the (first) derivative of f on I. If f′(x) is also differentiable on I we define the second derivative of f on I by f′′(x) = (f′)′(x) for any x ∈ I. Repeating this process we can define the nth derivative by f(n)(x) = (f(n−1))′(x) for all x ∈ I.
7.2 Differentiation of vector functions
We can define the derivative of a vector function in an analogous
way to the derivative of a scalar function of one variable.
Definition 7.7. (Derivative of a vector function)
Let r(t) be a vector-valued function defined on an interval I containing the point t0. Define

r′(t0) = lims→0 (r(t0 + s) − r(t0))/s,

if the limit exists. In this case r′(t0) is called the derivative of r at t0, and r is called differentiable at t0.
The vector function r(t) is called continuously differentiable in I if r′(t) exists and is continuous everywhere in I. Just as for scalar functions of one variable, it is possible for a function to be differentiable on an interval but for the derivative not to be continuous.
The following result shows us that to differentiate a vector function we simply need to differentiate each of the coordinate functions. We state it for vector functions whose image lies in R2 but it clearly extends to any Rn.
Theorem 7.8. The vector function r(t) = (f(t), g(t)) is differentiable at t0 if and only if the coordinate functions f and g are differentiable at t0. When r(t) is differentiable at t0, its derivative is given by

r′(t0) = (f′(t0), g′(t0)).
Proof. We have

r′(t0) = lims→0 (r(t0 + s) − r(t0))/s
  = lims→0 (1/s)( f(t0 + s) − f(t0), g(t0 + s) − g(t0) )
  = ( lims→0 (f(t0 + s) − f(t0))/s , lims→0 (g(t0 + s) − g(t0))/s )
  = ( f′(t0), g′(t0) )

Recall that given a vector v = (x, y) ∈ R2, the length or magnitude of v is denoted by |v| and is given by √(x2 + y2).
What is the geometric interpretation of the derivative of a vector function? Recall that the vector function r : I → R2 describes a curve C in R2. If t0 ∈ I is such that r′(t0) exists and is non-zero, then r′(t0) is a tangent vector to the curve C at r(t0). It will sometimes be useful to use the corresponding unit tangent vector given by

T(t0) = r′(t0)/|r′(t0)|

The line ℓ passing through the point r(t0) and parallel to the vector r′(t0) is called the tangent line to the curve C at r(t0). The tangent line is called the (affine) linear approximation of C at r(t0). There is a unique linear parameterisation of ℓ by a vector function v(t) such that v(t0) = r(t0) and v′(t0) = r′(t0), namely

v(t) = r(t0) + (t − t0) r′(t0)

(However there are many other linear parameterisations of ℓ.) Near r(t0), it is the best approximation of C by a line.
We have a similar geometric interpretation for functions
r : I → R3.
The derivative of a vector function r(t) encodes more than just the slope of the tangent to the curve traced out by r(t). For example, consider the two vector functions r1(t) = (cos(t), sin(t)) and r2(t) = (cos(−2t), sin(−2t)). Both functions trace out the same curve in R2, namely the unit circle. Now r1′(t) = (−sin(t), cos(t)) while r2′(t) = (2 sin(−2t), −2 cos(−2t)). The first is a vector of magnitude 1 while the second has magnitude 2 and at time t = 0 points in the opposite direction to the first. This reflects the fact that the first function describes an anticlockwise trajectory around the circle while the second describes a clockwise trajectory that is travelling at twice the speed.

In particular, for a particle with position vector at time t given by r(t), the velocity vector is given by r′(t) and the speed is given by |r′(t)|. The velocity vector describes the rate of change of position (and so has a direction) while the speed is the rate of change of distance travelled (and so is only a scalar). The acceleration vector is given by r″(t).
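The claims about r1 and r2 can be checked numerically. The following sketch (illustrative Python, not part of the original notes) approximates r2′(0) by central differences and confirms a speed of 2, with the velocity pointing in the negative y-direction, i.e. clockwise at the point (1, 0):

```python
import math

# r2 traces the unit circle clockwise at double speed.
def r2(t):
    return (math.cos(-2 * t), math.sin(-2 * t))

def velocity(r, t, h=1e-6):
    # central-difference approximation of r'(t), one coordinate at a time
    p, q = r(t + h), r(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(p, q))

v = velocity(r2, 0.0)       # exact value: (2 sin(0), -2 cos(0)) = (0, -2)
speed = math.hypot(*v)      # |r2'(0)|, which should be close to 2
```

Replacing r2 by r1(t) = (cos(t), sin(t)) gives speed 1 and velocity (0, 1), the anticlockwise direction.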
Figure 7.1: Tangent line of a curve C at r(t0).
Example 7.9. 1. Sketch the curve C defined by

r(t) = (cos(t), sin(t), t) for t ∈ R,

find the tangent vector r′(t) and write down the parametric equations of the tangent line to C at r(t0). (This curve is called a helix.)

Solution. The curve C lies on the cylinder x² + y² = 1. The orthogonal projection of r(t) on the xy-plane is v(t) = (cos(t), sin(t)), while z = t is the height of the point r(t). The curve C is shown in Figure 7.2.

Figure 7.2: The curve defined by r(t) = (cos(t), sin(t), t).

We have

r′(t) = (−sin(t), cos(t), 1)

Given t0, the tangent line L to C at r(t0) is the line through r(t0) = (cos(t0), sin(t0), t0) parallel to r′(t0) = (−sin(t0), cos(t0), 1). Thus the parametric equations of L are:

L : x = cos(t0) − (t − t0) sin(t0) , y = sin(t0) + (t − t0) cos(t0) , z = t , t ∈ R.
2. Let r(t) = (t, f(t)), where f(t) is a differentiable function. Then the curve defined by

x = t, y = f(t)

coincides with the graph of the function f, and r′(t) = (1, f′(t)), which is a vector in the direction of the tangent to the graph.
Just as with scalar functions of one variable, we need some rules for finding the derivatives of combinations of vector functions. So we have the following.

Theorem 7.10. Let u(t) and v(t) be differentiable vector functions (both with values in Rn) and let f(t) be a differentiable real-valued function. Then:

1. [u(t) + v(t)]′ = u′(t) + v′(t) ;

2. [cu(t)]′ = cu′(t) for any constant c ∈ R ;

3. [f(t)u(t)]′ = f′(t)u(t) + f(t)u′(t) ;

4. [u(t) · v(t)]′ = u′(t) · v(t) + u(t) · v′(t) ;

5. (for vector functions in R3) [u(t) × v(t)]′ = u′(t) × v(t) + u(t) × v′(t) ;

6. (Chain Rule) [u(f(t))]′ = f′(t)u′(f(t)).
Proof. We will only prove parts (3) and (6), and only in the case of R2. Let u(t) = (u1(t), u2(t)).

(3) Using the Product Rule for real-valued functions we have

[f(t)u(t)]′ = (f(t)u1(t), f(t)u2(t))′
            = ((f(t)u1(t))′, (f(t)u2(t))′)
            = (f′(t)u1(t) + f(t)u1′(t), f′(t)u2(t) + f(t)u2′(t))
            = f′(t)(u1(t), u2(t)) + f(t)(u1′(t), u2′(t))
            = f′(t)u(t) + f(t)u′(t).

(6) Using the Chain Rule for real-valued functions, we get

[u(f(t))]′ = (u1(f(t)), u2(f(t)))′
           = ([u1(f(t))]′, [u2(f(t))]′)
           = (f′(t)u1′(f(t)), f′(t)u2′(f(t)))
           = f′(t)u′(f(t)).
Example 7.11. Show that if the vector function r(t) is continuously differentiable for t in an interval I and |r(t)| = c, a constant, for all t ∈ I, then r′(t) ⊥ r(t) for all t ∈ I.

Question: What would the curve described by r(t) look like?

Solution. Note that |r(t)|² = r(t) · r(t). Then differentiating

r(t) · r(t) = c²,

by using Theorem 7.10(4), we get

r′(t) · r(t) + r(t) · r′(t) = 0.

That is, 2r′(t) · r(t) = 0. Hence r′(t) ⊥ r(t) for all t.

Question: So which well-known geometric fact have we just proved?
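Numerically, the orthogonality r′(t) ⊥ r(t) for a curve of constant magnitude can be observed directly. A small sketch (illustrative Python) samples the dot product r(t) · r′(t) on the unit circle at a few values of t:

```python
import math

# For r(t) on the unit circle, |r(t)| is constant,
# so r'(t) should be perpendicular to r(t) at every t.
def r(t):
    return (math.cos(t), math.sin(t))

def rprime(t, h=1e-6):
    # central-difference approximation of r'(t), coordinate by coordinate
    p, q = r(t + h), r(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(p, q))

dots = [sum(a * b for a, b in zip(r(t), rprime(t)))
        for t in (0.0, 0.7, 2.1)]   # each should be close to 0
```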
7.3 Partial Derivatives
For scalar-valued functions of one variable, the derivative gave us the rate of change of the function as the single variable changed. A function of several variables may change at different rates with respect to its different variables. To accommodate this we need to introduce the notion of a partial derivative.
Definition 7.12. (Partial Derivative)
Let f(x, y) be a function defined on a subset D of R2, and let (a, b) be an interior point of D. For x near a, define g(x) = f(x, b), a function of x alone. We define f_x(a, b) = g′(a), if this exists.

We call f_x(a, b) the partial derivative of f with respect to x at (a, b). From the definition of the derivative for a function of a single variable we have:

f_x(a, b) = lim_{h→0} [f(a + h, b) − f(a, b)]/h.

The partial derivative of f with respect to y at (a, b) is defined similarly:

f_y(a, b) = lim_{h→0} [f(a, b + h) − f(a, b)]/h.
For f_x(a, b) we have fixed the value of y and let x vary, so we are back to dealing with a function of just one variable, a familiar situation. In general, for any value of x and y:

f_x(x, y) = lim_{h→0} [f(x + h, y) − f(x, y)]/h

and

f_y(x, y) = lim_{h→0} [f(x, y + h) − f(x, y)]/h,

whenever the limits exist. So f_x(x, y) gives us the rate of change of f with respect to x with y held constant, and f_y(x, y) the rate of change of f with respect to y with x held constant.
What is the geometrical interpretation? Imagine standing at a point on the surface given by the graph of the function f. The value of f_x(x, y) at the point will be the slope of the tangent line to the surface that is parallel to the xz-plane, that is, the slope you see when looking in the direction of the x-axis. This is demonstrated in Figure 7.3. This slope is likely to be different from the slope of the tangent line to the surface that is parallel to the yz-plane, which is given by f_y(x, y). See Figure 7.4.
There are lots of different notations used to denote partial derivatives. Some are2:

2 Note that we use ∂/∂x instead of d/dx to distinguish between partial differentiation and ordinary differentiation.
Figure 7.3: Tangent line parallel to xz-plane.
Figure 7.4: Tangent lines parallel to xz-plane and yz-plane.

f_x(x, y) = ∂/∂x f(x, y) = ∂f/∂x (x, y),
f_y(x, y) = ∂/∂y f(x, y) = ∂f/∂y (x, y).
To find the partial derivative of f(x, y) with respect to x we consider y as a constant and proceed as if f were a function of a single variable. This means that we can use the familiar rules for differentiation of functions of a single variable.
Example 7.13. Let f(x, y) = x²y + y sin(x). Then

f_x(x, y) = 2xy + y cos(x),
f_y(x, y) = x² + sin(x).
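A finite-difference sketch (illustrative Python, not from the notes) can confirm these two partial derivatives at a sample point, by holding one variable fixed while differencing in the other:

```python
import math

# f(x, y) = x^2 y + y sin(x), with f_x = 2xy + y cos(x), f_y = x^2 + sin(x)
def f(x, y):
    return x * x * y + y * math.sin(x)

def fx(x, y, h=1e-6):
    # hold y fixed, central difference in x
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    # hold x fixed, central difference in y
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.3, -0.8
exact_fx = 2 * x0 * y0 + y0 * math.cos(x0)
exact_fy = x0 * x0 + math.sin(x0)
```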
We can extend the notion of a partial derivative to define the partial derivatives f_x, f_y and f_z of a function f(x, y, z) of three variables, and more generally the partial derivatives f_{x_i} of a function f(x1, x2, . . . , xn) of n variables. To calculate the partial derivative with respect to the variable x_i, we consider all x_j with j ≠ i as constants and proceed as though f were a function of the single variable x_i. Again we can use the familiar rules for differentiation of functions of a single variable.
Example 7.14. 1. Let f(x, y, z) = (x + y + z)/(x² + y² + z²). Then

f_x(x, y, z) = [(x² + y² + z²) − (x + y + z)(2x)] / (x² + y² + z²)² = (y² + z² − x² − 2x(y + z)) / (x² + y² + z²)².

Similarly, or just using the symmetry with respect to x, y and z, one gets

f_y(x, y, z) = (x² + z² − y² − 2y(x + z)) / (x² + y² + z²)²,
f_z(x, y, z) = (x² + y² − z² − 2z(x + y)) / (x² + y² + z²)².
2. Let f(x, y, z) = e^{xyz} + y sin(xz). Then

f_x(x, y, z) = yz e^{xyz} + yz cos(xz) , f_y(x, y, z) = xz e^{xyz} + sin(xz) ,
f_z(x, y, z) = xy e^{xyz} + xy cos(xz).

3. Consider the linear function f(x) = c · x defined for x ∈ Rn, where c is a constant vector; that is,

f(x1, x2, . . . , xn) = c1x1 + c2x2 + . . . + cnxn.

Then f_{x_i}(x) = c_i for each i and each x ∈ Rn.
7.3.1 Higher Derivatives

With functions of one variable there is at most one second derivative, but as the number of variables increases, the number of possible second-order derivatives increases quite rapidly, as we shall see in what follows.

Definition 7.15. (Second partial derivative)
Let D be an open disc in R2 and let f : D −→ R. Suppose the partial derivative f_x(x, y) exists for all (x, y) ∈ D. This defines a function g = f_x : D −→ R. Suppose this new function g has a partial derivative with respect to x at some (a, b) ∈ D. We denote

f_xx(a, b) = g_x(a, b) = ∂/∂x f_x(a, b)

and call it the second partial derivative of f with respect to x at (a, b). Similarly we define

f_xy(a, b) = g_y(a, b) = ∂/∂y f_x(a, b)
Some alternative notation is:

f_xx = ∂/∂x (∂f/∂x) = ∂²f/∂x² , f_xy = ∂/∂y (∂f/∂x) = ∂²f/∂y∂x.

We can similarly define f_yx and f_yy. Thus we have a total of four second-order partial derivatives, although for many of the functions with which we shall be concerned it turns out that the two mixed derivatives, f_yx and f_xy, are identical.
We can also define third order derivatives:

f_xxy = ∂/∂y (∂/∂x (∂f/∂x)),

and similarly one defines f_xxx, f_xyx, etc.
Example 7.16. Let f(x, y) = x²y + xy − y³. Then

f_x(x, y) = 2xy + y and f_y(x, y) = x² + x − 3y².

Also

f_xx(x, y) = ∂/∂x (2xy + y) = 2y , f_xy(x, y) = ∂/∂y (2xy + y) = 2x + 1,
f_yx(x, y) = ∂/∂x (x² + x − 3y²) = 2x + 1 , f_yy(x, y) = ∂/∂y (x² + x − 3y²) = −6y.

Notice that f_xy(x, y) = f_yx(x, y) for this function. Also

f_xxy(x, y) = ∂/∂y (2y) = 2 , f_xyx(x, y) = ∂/∂x (2x + 1) = 2

and

f_yxx(x, y) = ∂/∂x (2x + 1) = 2.

Thus f_xxy(x, y) = f_xyx(x, y) = f_yxx(x, y).
Example 7.16 is an illustration of the following theorem.
Theorem 7.17 (Clairaut's Theorem). Let f(x, y) be defined on an open disc D of R2. If the functions f_xy and f_yx are both defined and continuous on D, then f_xy = f_yx on D.
An analogue of Theorem 7.17 holds for functions of three or more variables. For example,

f_xyzx(x, y, z) = f_xxyz(x, y, z) = f_yzxx(x, y, z) = f_yxzx(x, y, z), etc.,

provided the partial derivatives are continuous.
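The equality of mixed partials can be probed numerically. This sketch (illustrative Python) approximates the mixed partial of f(x, y) = x²y + xy − y³ from Example 7.16 with a symmetric four-point difference, which treats the two orders of differentiation identically (so it estimates f_xy and f_yx at once), and compares it with the exact value 2x + 1:

```python
def f(x, y):
    return x * x * y + x * y - y ** 3

def mixed(x, y, h=1e-4):
    # symmetric four-point approximation of f_xy (equivalently f_yx)
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

# pairs (numerical mixed partial, exact value 2x + 1)
checks = [(mixed(x, y), 2 * x + 1) for x, y in [(0.5, 1.0), (-1.2, 0.3)]]
```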
7.4 Tangent Planes and differentiability

When dealing with functions of one variable we often want to find the equation of the line tangent to the graph of the function at some point. One reason for doing this is that it is sometimes convenient to use the equation of the tangent line as an approximation to the more complicated equation of the actual function. Indeed, if f(x) is a function that is differentiable at x = a, then the tangent line to the curve defined by y = f(x) at x = a has equation

y = f′(a)(x − a) + f(a) (7.1)

The derivative of f at x = a is the slope of this line. This is demonstrated in Figure 7.5 for the function f(x) = x² at the point x = 3.
Figure 7.5: The tangent line to y = x² at x = 3.

The function L(x) = f′(a)(x − a) + f(a) is called the best linear approximation3 to f at x = a.

3 Note that L(x) : R → R is not necessarily a linear function in the sense of Chapter 4. Indeed, if f(a) ≠ 0 then L(λx) = f′(a)(λx − a) + f(a) ≠ λL(x). The function L(x) is what we would call an affine function.
Similar ideas occur in multi-variable calculus, but the linear approximations are of higher dimensions than one-dimensional tangent lines. For instance, for functions of two variables the analogue of a tangent line is a tangent plane.

Consider a function f : D −→ R where the domain D is a subset of R2, and let (a, b) be an interior point of D. Let S be the surface defined by

{(x, y, f(x, y)) | (x, y) ∈ D}.

Suppose the partial derivatives f_x(a, b) and f_y(a, b) exist. Fix y = b and consider the curve C1 on S defined by

r1(x) = (x, b, f(x, b))

for all (x, b) such that f(x, b) is defined. Then C1 is the intersection curve of the surface S with the plane y = b. The vector function r1(x) is differentiable at a, and r1′(a) = (1, 0, f_x(a, b)) is a tangent vector to C1 at P = r1(a) = (a, b, f(a, b)).

Similarly, the curve C2 defined by

r2(y) = (a, y, f(a, y))

for all (a, y) such that f(a, y) is defined, is the intersection of S with the plane x = a and has a tangent vector r2′(b) = (0, 1, f_y(a, b)) at r2(b) = P.

In conclusion, the vectors r1′(a) = (1, 0, f_x(a, b)) and r2′(b) = (0, 1, f_y(a, b)) are tangent to the surface S at the point P = (a, b, f(a, b)). These define tangent lines ℓ1 and ℓ2 to S parallel to the corresponding coordinate planes. This was depicted in Figure 7.4.

Since ℓ1 is the best linear approximation to C1 near P and ℓ2 is the best linear approximation to C2 near P, if there is a plane that is the best linear approximation to S near P it must contain both ℓ1 and ℓ2. There is a unique plane Π containing the point P = (a, b, f(a, b)) and parallel to the vectors r1′(a) and r2′(b), and Π is called the tangent plane to S at P. See Figure 7.6.
Figure 7.6: Tangent plane.

Any non-zero vector orthogonal to Π is called a normal vector to the surface S at the point P. Thus

n = r1′(a) × r2′(b) = (−f_x(a, b), −f_y(a, b), 1)

is a normal vector to S at P.4

4 Recall from the week 4 tutorial sheet that the cross product of two vectors is perpendicular to both of the original two vectors.

Since n is perpendicular to Π, an equation for the tangent plane Π has the form

−f_x(a, b) x − f_y(a, b) y + z = d

for some constant d, which can be determined by using the fact that P ∈ Π.
Example 7.18. Determine the equation of the tangent plane Π to the graph S of f(x, y) = x² + 3xy − y² at the point P = (1, −1, −3), and an upward normal to this plane.

Solution: Calculating the partial derivatives we have f_x(x, y) = 2x + 3y and f_y(x, y) = 3x − 2y. Thus f_x(1, −1) = −1 and f_y(1, −1) = 5, and therefore n = (1, −5, 1) is a normal vector to the plane Π.

Thus an equation for Π has the form x − 5y + z = d. Since P = (1, −1, −3) ∈ Π it follows that 1 + 5 − 3 = d, that is, d = 3. Hence the equation of the tangent plane is

x − 5y + z = 3.
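One can also verify numerically that the plane x − 5y + z = 3 is a first-order approximation to the graph of f near P. In the sketch below (illustrative Python), the gap between f and the plane along the direction (1, 1) shrinks quadratically as the step size h shrinks:

```python
def f(x, y):
    return x * x + 3 * x * y - y * y

def plane_z(x, y):
    # tangent plane x - 5y + z = 3 from Example 7.18, solved for z
    return 3 - x + 5 * y

# approach P = (1, -1, -3) along the direction (1, 1); the error
# between surface and plane should behave like a constant times h^2
errs = [abs(f(1 + h, -1 + h) - plane_z(1 + h, -1 + h))
        for h in (1e-1, 1e-2, 1e-3)]
```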
It is possible to write the equation of the tangent plane in a different manner. Since the point P is given by (a, b, f(a, b)), the equation of Π can also be written as

z − f(a, b) = f_x(a, b)(x − a) + f_y(a, b)(y − b)

or

z = [ f_x(a, b)  f_y(a, b) ] [ x − a ]  + f(a, b)    (7.2)
                             [ y − b ]

This looks similar to the equation of the tangent line given in (7.1). The tangent plane to S at P may or may not be a good linear approximation to S near P. If it is, then we define our function f to be differentiable. We can make this definition more precise, but this is beyond the scope of the course: we would define f to be differentiable at the point c = (a, b) if there is a linear function A : R2 → R such that

lim_{x→c} [f(x) − f(c) − A(x − c)] / d(x, c) = 0
Definition 7.19. (Derivative)
We say that f is differentiable at (a, b) if the tangent plane given by (7.2) is the best linear approximation to f near (a, b). The matrix

Df(a, b) = [ f_x(a, b)  f_y(a, b) ]

is called the derivative of f at (a, b). We say that f is differentiable if f is differentiable at every point in its domain.

We should note that the existence of ∂f/∂x and ∂f/∂y at (a, b) does not guarantee that f is differentiable at (a, b), as demonstrated in the following example.
Example 7.20. Consider the function

f : R2 −→ R
(x, y) −→ 0 if xy ≠ 0, and 1 if xy = 0.

The graph of f is given in Figure 7.7. When x = 0 or y = 0, the value of f(x, y) is 1, and so f is constant in both the x-direction and the y-direction. Hence f_x(0, 0) = 0 and f_y(0, 0) = 0. However, the tangent plane

z = [ f_x(0, 0)  f_y(0, 0) ] [ x − 0 ]  + f(0, 0) = 1
                             [ y − 0 ]

is clearly not a good approximation for f near (0, 0), as f(a, b) = 0 for (a, b) arbitrarily close to (0, 0) with ab ≠ 0.
A simple test for a function to be differentiable is the following theorem.

Theorem 7.21. If ∂f/∂x and ∂f/∂y exist on an open disc centred at (a, b) and are continuous at (a, b), then f is differentiable at (a, b).
Example 7.22. Recall the function f(x, y) = x² + 3xy − y² from Example 7.18. Its partial derivatives are f_x(x, y) = 2x + 3y and f_y(x, y) = 3x − 2y, which exist and are continuous for all (x, y) ∈ R2 as they are polynomials. Hence f is differentiable and its derivative is

Df(x, y) = [ 2x + 3y  3x − 2y ]

We also have the following analogue of Theorem 7.4.

Theorem 7.23. If f is differentiable at (a, b) then f is continuous at (a, b).
7.5 The Jacobian matrix and the Chain Rule

We saw in the previous section that if a function f : R2 → R is differentiable then the derivative of f at c = (a, b) is the 1 × 2 matrix given by

[ ∂f/∂x (c)  ∂f/∂y (c) ]

We also saw in Section 7.2 that if f(t) = (f1(t), f2(t)) is a differentiable vector function then its derivative is the vector function f′(t) = (df1/dt (t), df2/dt (t)). It is convenient to write both f(t) and f′(t) as column vectors5, that is

f(t) = [ f1(t) ]   and   f′(t) = [ df1/dt (t) ]
       [ f2(t) ]                 [ df2/dt (t) ]

that is, as 2 × 1 matrices.

5 This is consistent with our convention that all vectors are actually column vectors.

What happens if we have a function F : R2 → R2, that is, a function of the form

(u, v) = F(x, y)?

Then really we have two coordinate functions of x and y, and it is best to think of our function in terms of column vectors, that is

[ u ] = F(x, y) = [ f1(x, y) ]
[ v ]             [ f2(x, y) ]

If our function F is differentiable6 then the derivative at c = (a, b) is the 2 × 2 matrix

DF(c) = [ ∂f1/∂x (c)  ∂f1/∂y (c) ]    (7.3)
        [ ∂f2/∂x (c)  ∂f2/∂y (c) ]

consisting of all the first order partial derivatives.

6 We haven't said what it actually means for such a function to be differentiable, but essentially it is just that F can be approximated by an (affine) linear function.
Definition 7.24. (Jacobian matrix)
The matrix DF(c) is called the Jacobian matrix of F at c, and its determinant is called the Jacobian.

In general, if F : Rn → Rm then F has m coordinate functions and the Jacobian matrix of F is an m × n matrix of partial derivatives.

Example 7.25. 1. Polar coordinates transformation: Let F(r, θ) = (r cos(θ), r sin(θ)) for r ≥ 0 and θ ∈ [0, 2π). That is, F(r, θ) = (f1(r, θ), f2(r, θ)) where f1(r, θ) = r cos(θ) and f2(r, θ) = r sin(θ). Now if

D = {(r, θ) | r ∈ [0, a], θ ∈ [0, 2π)} for some a > 0, then

range(F) = {(x, y) ∈ R2 | x² + y² ≤ a²},

that is, the disc with centre at 0 and radius a.

Then for every c = (r, θ) we have

DF(c) = [ ∂f1/∂r (c)  ∂f1/∂θ (c) ]  =  [ cos(θ)  −r sin(θ) ]
        [ ∂f2/∂r (c)  ∂f2/∂θ (c) ]     [ sin(θ)   r cos(θ) ]

Notice that the Jacobian of F is det DF(c) = r cos²(θ) + r sin²(θ) = r.
2. Consider F : R2 −→ R2 given by F(x, y) = (3x + 4y, −2x + y). Then (writing vectors as columns) we have

F(x, y) = [ 3x + 4y ]  =  [  3  4 ] [ x ]
          [ −2x + y ]     [ −2  1 ] [ y ]

so F is the linear transformation defined by the matrix

A = [  3  4 ]
    [ −2  1 ]

Now

DF(x, y) = [ ∂f1/∂x  ∂f1/∂y ]  =  [  3  4 ]  =  A
           [ ∂f2/∂x  ∂f2/∂y ]     [ −2  1 ]

This should not be surprising: the derivative of a function gives the best linear approximation to the function near a point. If the function is already linear then the best linear approximation is the function itself. In the case of functions of one variable, the derivative of f(x) = ax is equal to a for all x ∈ R. For a linear function F(x) = Ax given by a matrix A, we have DF(x) = A for every x ∈ R2 (Exercise).
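The Jacobian of the polar-coordinates map can be approximated numerically. This sketch (illustrative Python, central differences) builds the 2 × 2 matrix column by column and checks that its determinant comes out close to r:

```python
import math

def F(r, theta):
    # polar-coordinates transformation from Example 7.25
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(F, r, theta, h=1e-6):
    # numerical 2x2 Jacobian: each column is a central difference
    # in one of the two input variables
    def col(dr, dth):
        p = F(r + dr * h, theta + dth * h)
        q = F(r - dr * h, theta - dth * h)
        return [(a - b) / (2 * h) for a, b in zip(p, q)]
    c1, c2 = col(1, 0), col(0, 1)
    return [[c1[0], c2[0]], [c1[1], c2[1]]]

r0, th0 = 2.0, 0.6
J = jacobian(F, r0, th0)
detJ = J[0][0] * J[1][1] - J[0][1] * J[1][0]   # should be close to r0
```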
7.5.1 The Chain Rule

If f is a scalar-valued function of a variable x, and x is in turn a scalar-valued function of t, then recall that the Chain Rule allows us to determine the derivative of f with respect to t without having to explicitly find f as a function of t:

df/dt = (df/dx)(dx/dt)

This naturally extends to functions of several variables. Suppose first that f : R2 → R is a function of the variables x and y, and both x and y are functions of t. Then f can be rewritten as a function of t and the derivative with respect to t will be:

Chain Rule I:  df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)    (7.4)

Note that f, x and y are all functions of the single variable t, and so we use df/dt, dx/dt and dy/dt instead of the partial derivative notation.
Example 7.26. Let f(x, y) = x² + y² where x = t² and y = e^t. Then

dx/dt = 2t and dy/dt = e^t

while

∂f/∂x = 2x and ∂f/∂y = 2y.

Thus by the Chain Rule we have

df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
      = 2x(2t) + 2y(e^t)
      = 2(t²)2t + 2e^t e^t    (substituting for x and y)
      = 4t³ + 2e^{2t}
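The two routes to df/dt in Example 7.26 — the Chain Rule formula and direct differentiation of f written as a function of t — can be compared numerically (illustrative Python):

```python
import math

def f_of_t(t):
    # f(x, y) = x^2 + y^2 with x = t^2 and y = e^t, written directly in t
    x, y = t * t, math.exp(t)
    return x * x + y * y

def chain_rule_df(t):
    # (df/dx)(dx/dt) + (df/dy)(dy/dt) = 2x * 2t + 2y * e^t
    x, y = t * t, math.exp(t)
    return 2 * x * 2 * t + 2 * y * math.exp(t)

def numeric_df(t, h=1e-6):
    # central-difference derivative of the composite function
    return (f_of_t(t + h) - f_of_t(t - h)) / (2 * h)

t0 = 0.9
closed_form = 4 * t0 ** 3 + 2 * math.exp(2 * t0)   # the answer above
```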
Alternatively, suppose that f : R2 → R is a function of the variables x and y, and both x and y are functions of the variables s and t. Then f is a function of s and t and its partial derivatives will be given by:

Chain Rule II:  ∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s)
                ∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t)    (7.5)
Example 7.27. Let f(x, y) = x² + y² with x = 2s + t and y = s² + 4. Then

∂f/∂x = 2x , ∂f/∂y = 2y ,
∂x/∂t = 1 , ∂x/∂s = 2 , ∂y/∂t = 0 , ∂y/∂s = 2s.

Then by the Chain Rule we have

∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s)
      = (2x)(2) + (2y)(2s)
      = 2(2s + t)(2) + 2(s² + 4)(2s)
      = 4s³ + 24s + 4t

and

∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t)
      = (2x)(1) + (2y)(0)
      = 2(2s + t) = 4s + 2t
Recall that the Chain Rule for scalar-valued functions of one variable also provides a way of determining the derivative of the composition of two functions. In this interpretation we have

d/dx f(g(x)) = f′(g(x)) g′(x).

For example, if f(x) = sin(x) and g(x) = 2x² then f(g(x)) = sin(2x²)7 and

d/dx f(g(x)) = cos(2x²) · 4x

7 Indeed, doing such differentiation has probably become second nature without even realising that you are using the Chain Rule.

This interpretation naturally extends to vector functions of an arbitrary number of variables, in a manner that does not require ever more complicated expressions along the lines of our previous versions of the Chain Rule in (7.4) and (7.5).
Theorem 7.28 (Chain Rule). Let f : Rn → Rm and g : Rs → Rn be differentiable functions and let F = f ◦ g : Rs → Rm. Then F is differentiable and

DF(x) = Df(g(x)) Dg(x)

Note that Df(g(x)) is an m × n matrix and Dg(x) is an n × s matrix, so their product is an m × s matrix as required.
Is this the same as Chain Rules I and II? Suppose that f is a function of x and y, and both x and y are in turn functions of t. Define g(t) = (x(t), y(t)), the change of variables function, and let F = f ◦ g. Then F can be viewed as the function given by f when written as a function of t. Then

Dg(t) = [ dx/dt ]   and   Df(x, y) = [ ∂f/∂x  ∂f/∂y ]
        [ dy/dt ]

Now Theorem 7.28 gives

DF(t) = [ ∂f/∂x  ∂f/∂y ] [ dx/dt ]  =  (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
                         [ dy/dt ]

that is,

df/dt = dF/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)

as we saw in Chain Rule I (7.4). Chain Rule II can be recovered in a similar manner.
Example 7.29. Let g(x, y) = (x² + 3y, y² + 3) and f(u, v) = (e^u − v, ln(v)). Find the Jacobian matrix and the Jacobian of F(x, y) = f(g(x, y)) at c = (0, 1).

We have g(c) = (3, 4) and

Dg(x, y) = [ 2x  3  ]   ,   Df(u, v) = [ e^u  −1  ]
           [ 0   2y ]                  [ 0    1/v ]

The Chain Rule implies

DF(c) = Df(g(c)) Dg(c) = [ e³  −1  ] [ 0  3 ]  =  [ 0  3e³ − 2 ]
                         [ 0   1/4 ] [ 0  2 ]     [ 0  1/2     ]

Hence the Jacobian, det(DF(c)), is 0.
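The matrix computation in Example 7.29 can be double-checked by differencing the composite F = f ∘ g directly (illustrative Python). The numerical Jacobian at c = (0, 1) should reproduce the matrix above, and its determinant should vanish:

```python
import math

def g(x, y):
    return (x * x + 3 * y, y * y + 3)

def f(u, v):
    return (math.exp(u) - v, math.log(v))

def F(x, y):
    # composite function F = f o g
    return f(*g(x, y))

def jac(fn, x, y, h=1e-6):
    # numerical 2x2 Jacobian of fn at (x, y) via central differences
    dx = [(a - b) / (2 * h) for a, b in zip(fn(x + h, y), fn(x - h, y))]
    dy = [(a - b) / (2 * h) for a, b in zip(fn(x, y + h), fn(x, y - h))]
    return [[dx[0], dy[0]], [dx[1], dy[1]]]

J = jac(F, 0.0, 1.0)
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]   # should be close to 0
```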
7.6 Directional derivatives and gradients

We saw in Section 7.3 that ∂f/∂x gives the rate of change of f in the direction of the x-axis and ∂f/∂y gives the rate of change of f in the direction of the y-axis. However, we may also be interested in the rate of change of f in some other direction. This is provided by the directional derivative.

Definition 7.30. (Directional Derivative)
Let f(x, y) be defined on some subset D of R2 and let c = (a, b) be an interior point of D. Given a non-zero vector v ∈ R2, define u = v/|v|, that is, the unit vector in the direction of v.

The directional derivative of f at c in the direction v is defined by

Dv f(c) = lim_{h→0} [f(c + hu) − f(c)]/h

whenever the limit exists.

One can define directional derivatives for functions of three or more variables in a similar fashion.

Geometrically, Dv f(c) is the rate of change of f at c in the direction of v. So if v is parallel to the x-axis then the directional derivative is just the partial derivative with respect to x. Similarly, if v is parallel to the y-axis then the directional derivative is the partial derivative with respect to y.

To enable us to calculate directional derivatives we introduce the notion of the gradient vector.8

8 You will have noticed that the gradient vector ∇f looks very similar to the derivative Df and may wonder why we have given it a different name. There is a subtle difference, and so many mathematicians usually write ∇f as a column vector to distinguish the two. The derivative Df(x) is a matrix and so defines a linear function.
Definition 7.31. (Gradient vector)
Let f : D → R with D ⊆ R2 and let (x, y) be an interior point of D. The gradient vector of f at (x, y) is the vector

∇f(x, y) = grad f(x, y) = (f_x(x, y), f_y(x, y))

if the partial derivatives f_x(x, y) and f_y(x, y) exist.

In a similar way one defines the gradient vector of a function of three or more variables. That is, if f(x, y, z) is a function of three variables, its gradient vector at (x, y, z) is defined by

∇f(x, y, z) = (f_x(x, y, z), f_y(x, y, z), f_z(x, y, z))

whenever the partial derivatives of f at (x, y, z) exist.

Example 7.32. 1. If f(x, y) = x²y − ln(x) for all x > 0 and y ∈ R, then the gradient vector of f is

∇f(x, y) = (2xy − 1/x, x²)

for each (x, y) in the domain of f.

2. Let f(x, y, z) = c · x = ax + by + cz, where c = (a, b, c) ∈ R3. Then ∇f(x, y, z) = (a, b, c) = c.
The next theorem gives a method for calculating directional derivatives.

Theorem 7.33. If f is a differentiable function at c, then for every non-zero vector v the directional derivative Dv f(c) exists and

Dv f(c) = ∇f(c) · u

where u = v/|v|.
Example 7.34. Find the directional derivative of f(x, y) = xe^y at c = (−3, 0) in the direction of the vector v = (1, 2).

Solution. Now ∇f(x, y) = (e^y, xe^y), and the partial derivatives f_x and f_y are continuous, so by Theorem 7.21, f is differentiable. Now ∇f(c) = (1, −3) and

u = v/|v| = (1, 2)/√5 = (1/√5, 2/√5).

Hence

Dv f(c) = ∇f(c) · u = (1, −3) · (1/√5, 2/√5) = −5/√5 = −√5.
7.6.1 Maximum rate of change

We would like to determine the direction in which f has the maximum rate of change. As we saw in Theorem 7.33, the directional derivative of f in the direction of v is

Dv f(c) = ∇f(c) · u

where u is the unit vector in the direction of v. Recall that we can also calculate the dot product of two vectors by

∇f(c) · u = |∇f(c)| |u| cos(θ)

where θ is the angle between ∇f(c) and u. Since |u| = 1 and −1 ≤ cos(θ) ≤ 1, it follows that the maximum value for Dv f(c) will be |∇f(c)| and will occur in the direction of ∇f(c). Thus we have proved the following theorem.

Theorem 7.35. Let f(x), for x ∈ D, be a differentiable function of two or more variables and let c be an interior point of D such that ∇f(c) ≠ 0. Then

max_{|u|=1} Du f(c) = |∇f(c)|

and the maximum is achieved only for u = ∇f(c)/|∇f(c)|.

Theorem 7.35 is saying that the gradient vector at a given point points in the direction in which the function is increasing most rapidly. The same argument shows that the function is decreasing most rapidly when the angle between v and ∇f(c) is π, that is, when v is in the direction of −∇f(c).
Example 7.36. The temperature at each point of a metal plate is given by the function T(x, y) = e^x cos(y) + e^y cos(x). In what direction does the temperature increase most rapidly at the point (0, 0)? What is this rate of increase?

Solution. ∇T = (e^x cos(y) − e^y sin(x), −e^x sin(y) + e^y cos(x)), so ∇T(0, 0) = (1, 1). This is the direction of fastest increase of T at (0, 0). The rate of increase in the direction of ∇T(0, 0) is |∇T(0, 0)| = √2.
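A finite-difference sketch (illustrative Python) recovers ∇T(0, 0) = (1, 1) and the maximal rate √2:

```python
import math

def T(x, y):
    # temperature function from Example 7.36
    return math.exp(x) * math.cos(y) + math.exp(y) * math.cos(x)

def grad_T(x, y, h=1e-6):
    # numerical gradient by central differences
    gx = (T(x + h, y) - T(x - h, y)) / (2 * h)
    gy = (T(x, y + h) - T(x, y - h)) / (2 * h)
    return (gx, gy)

g = grad_T(0.0, 0.0)     # should be close to (1, 1)
rate = math.hypot(*g)    # fastest rate of increase, close to sqrt(2)
```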
If you consider the contour curves on a map, then the contour curves indicate the directions in which the altitude is constant, and the direction in which the ground is steepest is the direction perpendicular to the contour curve. As we mentioned in Section 5.2.1, the contour lines on a map correspond to the level curves of a function of two variables. The following theorem tells us that the gradient vector is perpendicular to the level curves; since the gradient vector is the direction of the greatest rate of increase, this agrees with our experience.

Theorem 7.37. If f : D → R with D ⊆ R2 is differentiable at (a, b) and ∇f(a, b) ≠ 0, then ∇f(a, b) is perpendicular to the tangent line to the level curve of f at (a, b).

Proof. Let C be the level curve f(x, y) = k passing through c = (a, b) and let r(t) be a continuously differentiable parametrisation of C near c with r(t0) = c for some t0. Since r(t) ∈ C for all t, we have f(r(t)) = k for all t. Differentiating this equality with respect to t and using the Chain Rule gives

0 = Df(r(t)) Dr(t) = ∇f(r(t)) · r′(t)
For t = t0 this gives ∇f(c) ⊥ r′(t0). Since r′(t0) is the direction of the tangent line of C at c, it follows that ∇f(c) is orthogonal to C at c.
Theorem 7.37 is illustrated in Figure 7.8.
Figure 7.8: The gradient vector related to the level curves.
Similarly, let f(x) : D −→ R, where D ⊂ Rn, be a differentiable function of three or more variables. Consider the level surface

S : f(x) = k

and assume that f(c) = k and ∇f(c) ≠ 0. Then ∇f(c) is a normal vector to S at c, and the (n − 1)-dimensional (affine) subspace containing c and orthogonal to ∇f(c) is the tangent plane to S at c.
Example 7.38. 1. Find a normal vector to the surface

S : z − x e^y cos(z) = −1

at c = (1, 0, 0), and an equation for the tangent plane to S at c.
Solution. S is a level surface of f(x, y, z) = z − x e^y cos(z). We have

∇f(x, y, z) = (−e^y cos(z), −x e^y cos(z), 1 + x e^y sin(z))

so ∇f(c) = (−1, −1, 1) is a normal vector to S at c. Since the tangent plane is perpendicular to ∇f(c), it has an equation −x − y + z = d for some constant d. Since c lies in the tangent plane, we get −1 = d, so an equation of the plane is x + y − z = 1.
2. Let S be the graph of f(x, y) : D −→ R. Notice that S is a level surface of the function g(x, y, z) = z − f(x, y), so assuming f is differentiable at (a, b) ∈ D, the vector ∇g(c) = (−f_x(a, b), −f_y(a, b), 1) is a normal vector to S at c = (a, b, f(a, b)). This agrees with the definition of a normal vector given in Section 7.4.
8 Maxima and Minima
In this chapter we look at maxima and minima of functions of several variables. First we revise the single-variable case, which should be familiar from school, although our treatment may be different.
8.1 Functions of a single variable
Definition 8.1. (Maxima and minima)
Let D ⊆ R and let f : D → R. Given c ∈ D, we say that f has an
absolute maximum at c if f (x) ≤ f (c) for all x ∈ D. Similarly we say
that f has an absolute minimum at c if f (x) ≥ f (c) for all x ∈ D.
We say that f has a local maximum at c if there exists an open interval
I centred at c such that f (x) ≤ f (c) for all x ∈ I ∩ D. We define a
local minimum similarly.
A maximum or minimum of f is also called an extremum of f .
So a local maximum (or minimum) occurs at a point where the
value of the function is at least as great (or at least as small) as at
any point in the vicinity but which may be exceeded at some other
point in the domain, whereas an absolute maximum (minimum)
is the largest (least) value taken by the function anywhere in its
domain.
Example 8.2. The function f defined on the interval [−1, 3] and whose
graph is given in Figure 8.1 has an absolute maximum at x = 3, an
absolute minimum at x = −1, as well as a local maximum at x = 2/3 and a
local minimum at x = 2.
Theorem 8.3 (The Extreme Value Theorem). Let f be a continuous
scalar-valued function on a finite and closed interval [a, b]. Then f is
bounded and has an absolute maximum and an absolute minimum in
[a, b], that is, there exist c1, c2 ∈ [a, b] such that f (c1) ≤ f (x) ≤ f (c2)
for all x
∈[a, b].
Remark 8.4. 1. The points c1, c2 do not have to be unique.¹

¹ They can be the same point if f is a constant function.
Figure 8.1: Examples of maxima and minima.
2. The statement of Theorem 8.3 is not true (in general) for other types of
intervals. For example, the function f (x) = 1/x defined on x ∈ [1, ∞)
is a continuous function that has no absolute minimum. Although the
range of the function is bounded below by 0, it never actually achieves
this value.
Definition 8.5. (Critical points)
Let f be a scalar-valued function defined on the subset D of R. For
c ∈ D, we say that c is a critical point of f if
1. c is an interior point of D, and
2. either f ′(c) does not exist or f ′(c) = 0.
Theorem 8.6 (Fermat’s Theorem). If the function f has a local maxi-
mum or minimum at some interior point c of its domain, then c is a critical
point of f .
So, we have the following relations for interior points:
{points of absolute extrema} ⊆ {points of local extrema}⊆ {critical points} ∪ { boundary points}
In general, these sets are not equal. In particular, not all critical
points give rise to local maxima or minima. For example, if f (x) =
x3 then f ′(x) = 3x2, so x = 0 is a critical point yet it is neither a
local maximum nor a local minimum.
The above suggests the following process.
Key Concept 8.7. Procedure for finding the absolute extrema of a
continuous function f over a finite closed interval.
Step 1. Find the critical points of f and the values of f at these.
Step 2. Find the values of f at the ends of the interval.
Step 3. Compare the values from Steps 1 and 2 ; the largest of them
gives the absolute maximum, while the smallest gives the absolute
minimum.
The existence of the absolute maximum and minimum follows from
the Extreme Value Theorem.
Example 8.8. Find the absolute maximum and the absolute minimum of
the function f (x) = √x (5 − x2) on the interval [0, 4].

Step 1. For x > 0 we have

f ′(x) = (1/(2√x))(5 − x2) + √x · (−2x) = [(5 − x2) − 4x2] / (2√x) = 5(1 − x2) / (2√x).
So, for x ∈ (0, 4) we have f ′(x) = 0 only when x = 1, that is, x = 1 is
the only critical point of f . We have f (1) = 4.
Step 2. f (0) = 0 and f (4) = −22.
Step 3. Comparing f (1) = 4, f (0) = 0 and f (4) = −22, we conclude
that f (1) = 4 is the absolute maximum of the function, while f (4) =
−22 is its absolute minimum.
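The three-step procedure lends itself to a direct numerical check. The Python sketch below replays Example 8.8; the 1/1000 scan step is an arbitrary illustrative choice.

```python
import math

# Example 8.8 numerically: f(x) = sqrt(x) * (5 - x^2) on [0, 4].
def f(x):
    return math.sqrt(x) * (5 - x * x)

# Step 1: the only critical point found by hand is x = 1.
# Step 2: add the endpoint values.
candidates = {1: f(1), 0: f(0), 4: f(4)}
# Step 3: compare.
assert candidates[1] == 4 and candidates[0] == 0 and candidates[4] == -22
# A coarse scan over the interval should not beat these values.
values = [f(i / 1000) for i in range(0, 4001)]
assert max(values) <= 4 + 1e-9
assert min(values) >= -22 - 1e-9
```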
8.2 Functions of several variables
The ideas of Section 8.1 easily extend to functions of two variables.
Definition 8.9. (Maxima and minima)
Let D ⊆ R2 and let f : D → R. Given c ∈ D, we say that f has an
absolute maximum at c if f ( x) ≤ f (c) for all x ∈ D. Similarly we say
that f has an absolute minimum at c if f ( x) ≥ f (c) for all x ∈ D.
We say that f has a local maximum at c if there exists an open ball B
centred at c such that f ( x) ≤ f (c) for all x ∈ B ∩ D. We define a local
minimum similarly.
A maximum or minimum of f is also called an extremum of f .
Example 8.10. The function f (x, y) = 2x2 + y2 has an absolute mini-
mum at (0, 0) because f (x, y) ≥ 0 = f (0, 0) for all (x, y).
We have the natural analogue for critical points for functions of
two variables.
Definition 8.11. (Critical points)
Let f be a scalar-valued function defined on a subset D of R2. For
c ∈ D, we say that c is a critical point of f if
1. c is an interior point of D, and
2. ∇ f (c) does not exist or ∇ f (c) = 0.
We have the following analogue of Theorem 8.6.
Theorem 8.12. Let D ⊆ R2 and let f : D → R. If c is an interior point
of D and f has a local maximum or minimum at c, then c is a critical
point of f .
Again note that not every critical point is a local maximum or
minimum. For example, if f (x, y) = y2 − x2, we have ∇ f (0, 0) = 0,
but (0, 0) is neither a local maximum nor a local minimum. As can
be seen in Figure 8.2 the graph of the function near (0, 0) looks
like a saddle: it is increasing in some directions and decreasing in
others. This leads to the following definition.
Figure 8.2: The surface defined by f (x, y) = y2 − x2
Definition 8.13. (Saddle point)
Let f : D −→ R, with D ⊆ R2. A critical point of f is called a saddle
point of f if f has neither a local minimum nor a local maximum at the
point.
The Extreme Value Theorem also has an appropriate analogue for
functions of two variables, but first we need some definitions.
Definition 8.14. (Closed, open and bounded subsets)
The boundary ∂D of a subset D of R2 is the set of all points c ∈ R2 such
that every open disc centred at c contains points of D and points not in D.
• A subset D of R2 is called closed if all its boundary points are in the
subset, that is, if ∂D ⊆ D.
• A subset D of R2 is called open if none of its boundary points are in
the subset, that is, if ∂D ∩ D = ∅.
• A set D is called bounded if it is contained in a disc of finite radius.
The boundary of the set
D = {(x, y) | −2 ≤ x ≤ 3, 0 ≤ y ≤ 10}
is
{(x, y) ∈ D | x ∈ {−2, 3} or y ∈ {0, 10}}.

Thus D is closed and it is also bounded. The set {(x, y) | −2 ≤ x ≤ 3, y > 0} is unbounded.
Theorem 8.15 (The Extreme Value Theorem). Let D be a non-empty
closed and bounded subset of R2 and let f : D −→ R be a continuous
function. Then f is bounded and there exist a ∈ D and b ∈ D such that
f (a) ≤ f ( x) ≤ f (b) for all x ∈ D.
Again we have a strategy for finding the absolute maxima and
minima of a function.
Key Concept 8.16. Procedure for finding the absolute extrema of a
continuous function f over a closed and bounded set
Step 1. Find the critical points of f and the values of f at these.
Step 2. Find the maximum and the minimum of the restriction of f
to the boundary ∂D.
Step 3. Compare the values from Steps 1 and 2 ; the largest of them
gives the absolute maximum, while the smallest gives the absolute
minimum.
The existence of the absolute maximum and minimum follows from
the Extreme Value Theorem.
Example 8.17. Find the maximum and minimum of the function f (x, y) =
2x2 + x + y2 − 2 on D = {(x, y) | x2 + y2 ≤ 4}.
Solution: Step 1. To find the critical points of f , solve the system

f x(x, y) = 4x + 1 = 0
f y(x, y) = 2 y = 0

There are no points in D where ∇ f (x, y) fails to exist, so the only
critical point is P = (−1/4, 0), which is in the interior of D. We have

f (P) = 1/8 − 1/4 − 2 = −17/8.
Step 2. The boundary ∂D of D is the circle x2 + y2 = 4. On this set of
points we have y2 = 4 − x2, so

f (x, y) = 2x2 + x + (4 − x2) − 2 = x2 + x + 2
on ∂D. Notice that on ∂D we have x ∈ [−2, 2]. So, we have to find the
extreme values of g(x) = x2 + x + 2 on the interval [−2, 2].
From g′(x) = 2x + 1, the only critical point of g(x) on [−2, 2] is x =
−1/2. Since g(−1/2) = 7/4, and on the end points of the interval we
have g(−2) = 4 and g(2) = 8, it follows that the absolute maximum
of g(x) is g(2) = 8 and its absolute minimum is g(−1/2) = 7/4. These
are the maximum and the minimum values of f on the circle ∂D.
Step 3. Comparing the maximum and minimum values of f on ∂D with
the value of f at the only critical point P, we see that the absolute
minimum of f is −17/8 = f (−1/4, 0) and its absolute maximum is
8 = g(2) = f (2, 0).
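A brute-force check of Example 8.17 in Python; the parametrisation of ∂D by an angle t and the scan resolution are arbitrary choices for the sketch.

```python
import math

# Example 8.17: f(x, y) = 2x^2 + x + y^2 - 2 on the disc x^2 + y^2 <= 4.
def f(x, y):
    return 2 * x * x + x + y * y - 2

interior_val = f(-0.25, 0.0)              # value at the critical point P
assert abs(interior_val - (-17 / 8)) < 1e-12

# Scan the boundary circle x^2 + y^2 = 4 at steps of 0.001 radians.
boundary = [f(2 * math.cos(t / 1000), 2 * math.sin(t / 1000))
            for t in range(0, int(2 * math.pi * 1000) + 1)]
assert abs(max(boundary) - 8) < 1e-3      # maximum at (2, 0)
assert abs(min(boundary) - 7 / 4) < 1e-3  # minimum where x = -1/2
```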
If f is a function of three or more variables defined in some re-
gion D of Rn, one can define local and absolute extrema, critical
points and saddle points in the same way. The analogues of Theo-
rems 8.12 and 8.15 remain true.
8.3 Identifying local maxima and minima
If f (x) is a real-valued function of a single variable, one way to
identify whether a critical point c is a local maximum or minimum is
to examine the value of the derivative around c. For example, if on an
interval I centred at c the function f (x) is increasing for x < c (that
is, if f ′(x) > 0) and decreasing for x > c (that is, f ′(x) < 0), then
f has a local maximum at x = c. We can identify a local minimum in a
similar way.
Another way is to identify the concavity of the function.
Definition 8.18. (Inflection point)
A function f is called concave up on an interval I if f ′(x) is increasing
on I. Similarly, f is called concave down if f ′(x) is decreasing on I.
We say that c ∈ I is an inflection point for f if c is an interior point
of I and the type of concavity of f changes at c.
Remark 8.19. Of course, one way to identify whether f ′(x) is increasing or
decreasing is to examine the second derivative f ″(x). That is, if f ″(x)
exists and f ″(x) > 0 for all x ∈ I, then f is concave up on I. Similarly, if
f ″(x) < 0 for all x ∈ I then f is concave down on I.
Moreover, if c is a point of inflection then f ″(c) = 0.
Example 8.20. For g(x) = x3 whose graph is given in Figure 8.3, we
have g″(x) = 6x, so g″(x) < 0 for x < 0 and g″(x) > 0 for x > 0.
Thus, g is concave down on (−∞, 0) and concave up on (0,∞). The point
x = 0 is an inflection point. There are no other inflection points.
This provides the Second Derivative Test.
Theorem 8.21 (The Second Derivative Test). Let f (x) be a continuous
function defined on an interval I with continuous second derivative and
let c ∈ I be a critical point of f .
Figure 8.3: The graph of the function f (x) = x3.
1. If f ″(c) > 0 then f has a local minimum at c.
2. If f ″(c) < 0 then f has a local maximum at c.
If f ″(c) = 0, then c may or may not be an inflection point.
Example 8.22. The function f (x) = x4 for x ∈ R whose graph is
given in Figure 8.4, is concave up on the whole of R, so x = 0 is not an
inflection point for f (although f ″(0) = 0).
Figure 8.4: The graph of the function f (x) = x4.
Next consider a scalar-valued function f (x, y) of two variables
defined on a subset D of R2. There are now four second order
partial derivatives f xx , f xy , f yx and f yy to consider. We already saw
in Theorem 7.17 that if f xy and f yx are defined and continuous on D
then they are equal. For c ∈ D, we define the matrix

( f xx(c)  f xy(c) )
( f yx(c)  f yy(c) )

This matrix is sometimes called the Hessian matrix and denoted by D2 f (c).
The Hessian matrix defines a quadratic form

Qc(x, y) = (x  y) D2 f (c) (x  y)^T = f xx(c) x2 + ( f xy(c) + f yx(c) ) xy + f yy(c) y2

The quadratic form Qc is called positive definite if Qc(x, y) > 0 for
all (x, y) ≠ (0, 0), and called negative definite if Qc(x, y) < 0 for all
(x, y) ≠ (0, 0).
We will assume from now on that f xy and f yx are continuous
so that D2 f (c) is a symmetric matrix. Since D2 f (c) is symmetric,
the quadratic form Qc will be positive definite if and only if the
determinant Dc of its associated Hessian matrix is strictly positive
and f xx (c) > 0. The form Qc will be negative definite if and only if
Dc > 0 and f xx (c) < 0. When Dc < 0, the quadratic form takes both
positive and negative values. These three statements are perhaps
best justified by looking at the eigenvalues of the Hessian matrix, but
eigenvalues won’t be seen until later in the course.
Theorem 8.23 (Second Derivatives Test). Suppose f has continuous
second partial derivatives in an open set U in R2, c ∈ U and ∇ f (c) = 0
(that is, c is a critical point of f).
1. If Qc(x, y) is positive definite (that is, Dc > 0 and f xx (c) > 0), then f
has a local minimum at c.
2. If Qc(x, y) is negative definite (that is, Dc > 0 and f xx(c) < 0), then
f has a local maximum at c.
3. If Qc(x, y) assumes both positive and negative values (that is, Dc < 0),
then c is a saddle point of f .
Remark 8.24. 1. When Dc = 0, the Second Derivatives Test gives no
information.
2. Notice that the three cases in Theorem 8.23 do not cover all possible
cases for the quadratic form Qc(x, y). For example, it may happen
that Qc(x, y) ≥ 0 for all x, y (we then say that Qc is positive semi-definite).
Example 8.25. Find all critical points of the function f (x, y) = x3 −
3xy + y3 and determine their type (if possible).
Solution: There are no points in the domain where ∇ f (x, y) does not
exist, so the only critical points of f are solutions of the system
f x(x, y) = 3x2 − 3 y = 0,
f y(x, y) = −3x + 3 y2 = 0
From the first equation y = x2. Substituting this into the second equation
we get −x + x4 = 0. Thus, x = 0 or x = 1. Consequently, the critical
points of f are P1 = (0, 0) and P2 = (1, 1).
Next, f xx = 6x, f xy = −3 and f yy = 6 y. At P1 the Hessian matrix is

( 0  −3 )
( −3  0 )

which has determinant −9, and so P1 is a saddle point. At P2 the Hessian matrix is

( 6  −3 )
( −3  6 )
which has determinant 27. Since f xx (1, 1) = 6 > 0 the quadratic form is
positive definite and so f has a local minimum at P2 = (1, 1).
The surface defined by the graph of f is given in Figure 8.5.

Figure 8.5: The surface defined by f (x, y) = x3 − 3xy + y3.
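The classification in Example 8.25 follows mechanically from the second-order partials, as the Python sketch below shows; the function name `classify` is ours, and the formulas f xx = 6x, f xy = f yx = −3, f yy = 6y are taken straight from the example.

```python
# Example 8.25 via the Second Derivatives Test for f(x, y) = x^3 - 3xy + y^3.
def classify(a, b):
    fxx, fxy, fyy = 6 * a, -3, 6 * b
    det = fxx * fyy - fxy * fxy          # determinant of the Hessian
    if det < 0:
        return "saddle"
    if det > 0:
        return "local min" if fxx > 0 else "local max"
    return "inconclusive"                # the test gives no information

assert classify(0, 0) == "saddle"        # determinant -9 at P1
assert classify(1, 1) == "local min"     # determinant 27, f_xx = 6 > 0 at P2
```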
9 Taylor Polynomials
We saw in Chapter 7 that we could use the derivative of a function
to get a good linear approximation of the function. Sometimes,
an even better approximation is required. In particular, it is often
convenient to be able to approximate a function by a polynomial: it
is easy to calculate its value and easy to differentiate and integrate.
9.1 Taylor polynomials for functions of one variable
Definition 9.1. (Taylor polynomial)
Let f (x) be a real-valued function of one variable defined on some interval
I and having continuous derivatives f ′(x), f ″(x), . . . , f (n)(x) on I for
some integer n ≥ 1. Let a be an interior point of I.
The nth Taylor polynomial of f about a is defined by

T n,a(x) = f (a) + ( f ′(a)/1!)(x − a) + ( f ″(a)/2!)(x − a)2 + . . . + ( f (n)(a)/n!)(x − a)n
This is always a polynomial of degree at most n (however the de-
gree might be less than n, since it may happen that f (n)(a) = 0).
Note that T 1,a(x) is the equation of the tangent line to the curve
y = f (x) when x = a.
Remark 9.2. 1. Clearly T n,a(a) = f (a). Next,

T ′n,a(x) = f ′(a) + ( f ″(a)/1!)(x − a) + ( f ‴(a)/2!)(x − a)2 + . . . + ( f (n)(a)/(n − 1)!)(x − a)n−1

so T ′n,a(a) = f ′(a).
Continuing in this way, one sees that

T (k)n,a(a) = f (k)(a)

for all k = 0, 1, 2, . . . , n.
2. Using properties of polynomials one can actually prove that if Q(x) is
any polynomial of degree at most n such that Q(k)(a) = f (k)(a) for all
k = 0, 1, 2, . . . , n, then Q(x) = T n,a(x).
Example 9.3. 1. Let f (x) = e^{x} and a = 0. Since f ′(x) = e^{x} = f (k)(x)
for all k ≥ 0 we have f (k)(0) = 1 for all k ≥ 0, and so

T n,0(x) = 1 + x/1! + x2/2! + . . . + xn/n!
for any positive integer n.
2. Let f (x) = sin(x). Notice that

f ′(x) = cos(x) , f ″(x) = − sin(x) , f ‴(x) = − cos(x) , f (4)(x) = sin(x)

and the sequence of the derivatives is periodic with period 4.
Consider a = π/3 and n = 3. We have

f (π/3) = √3/2 , f ′(π/3) = 1/2 , f ″(π/3) = −√3/2 , f ‴(π/3) = −1/2

therefore

T 3,π/3(x) = √3/2 + (1/2)(x − π/3) − (√3/4)(x − π/3)2 − (1/12)(x − π/3)3
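A quick numerical sanity check of this polynomial: since every derivative of sin is bounded by 1, Taylor's formula (Theorem 9.4 below) bounds the error at distance h from a by h^4/4!. The Python sketch below tests this for a few hand-picked values of h.

```python
import math

# The third Taylor polynomial of sin about a = pi/3 from Example 9.3.
a = math.pi / 3

def T3(x):
    d = x - a
    return (math.sqrt(3) / 2 + d / 2
            - math.sqrt(3) / 4 * d ** 2 - d ** 3 / 12)

# Near a the error is at most |f''''(z)| h^4 / 4! <= h^4 / 24.
for h in (0.1, 0.05, 0.01):
    assert abs(T3(a + h) - math.sin(a + h)) <= h ** 4 / 24 + 1e-15
```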
Taylor polynomials are used to approximate the original func-
tion f . That is, given f , n and a as before, we consider the approxi-
mation
f (x) ≈ T n,a(x) for x ∈ I (9.1)
This can be written as
f (x) = T n,a(x) + Rn,a(x) , (9.2)
where Rn,a(x) is the error term (the so called remainder). Naturally,
the approximation (9.1) will not be of much use if we do not know
how small the error is. The following theorem gives some informa-
tion about the remainder Rn,a(x) which can be used to get practical
estimates about the size of the error.
Theorem 9.4 (Taylor’s Formula). Assume that f has continuous derivatives
up to order n + 1 on some interval I and a is an interior point of I.
Then for any x ∈ I there exists z between x and a such that

Rn,a(x) = ( f (n+1)(z)/(n + 1)!)(x − a)n+1

That is, for any x ∈ I,

f (x) = f (a) + ( f ′(a)/1!)(x − a) + ( f ″(a)/2!)(x − a)2 + . . . + ( f (n)(a)/n!)(x − a)n + ( f (n+1)(z)/(n + 1)!)(x − a)n+1

where z depends on x.
Despite the fact that we don’t know what z is, knowing an interval
in which it lies allows us to bound the error, as we see in the next example.
Example 9.5. Find T 6,0(x) for f (x) = sin(x) and estimate the maximal
possible error in the approximation
sin(x) ≈ T 6,0(x) (9.3)
for x ∈ [−0.3, 0.3]. Use this approximation to find sin(12◦) correct to 6
decimal places.
Solution: Recall the derivatives of sin(x):
f ′(x) = cos(x) , f ″(x) = − sin(x) , f ‴(x) = − cos(x) , f (4)(x) = sin(x) ,
f (5)(x) = cos(x) , f (6)(x) = − sin(x) , f (7)(x) = − cos(x)
Thus,

T 6,0(x) = 0 + (1/1!)x + 0 − (1/3!)x3 + 0 + (1/5!)x5 = x − x3/3! + x5/5!
To estimate the error, recall that
sin(x) = T 6,0(x) + R6,0(x) ,
where by Taylor’s formula, for any x the error term has the form

R6,0(x) = ( f (7)(z)/7!) x7

for some z between 0 and x. For x ∈ [−0.3, 0.3] we have |x| ≤ 0.3, so

|R6,0(x)| = (| cos(z)|/5040) |x|7 ≤ (1/5040)(0.3)7 = 0.4339285 × 10−7 < 4.4 × 10−8 .
Since 12◦ is π/15 in radians and π/15 < 0.3, the previous argument
shows that

sin(12◦) = sin(π/15) ≈ π/15 − (π/15)3/3! + (π/15)5/5! = 0.20791169...
is correct to at least 6 decimals.
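The same computation in code: the Python sketch below evaluates T 6,0 at π/15 and confirms both the worked error bound and the six-decimal agreement.

```python
import math

# Example 9.5: T_{6,0}(x) = x - x^3/3! + x^5/5! for sin(x).
def T6(x):
    return x - x ** 3 / 6 + x ** 5 / 120

x = math.pi / 15                    # 12 degrees in radians, < 0.3
approx = T6(x)
# Worked error bound: |R_{6,0}(x)| <= 0.3^7 / 7! < 4.4e-8 on [-0.3, 0.3].
assert abs(approx - math.sin(x)) < 4.4e-8
assert round(approx, 6) == round(math.sin(x), 6)   # 6 correct decimals
```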
9.2 Taylor polynomials for functions of two variables
We can also define Taylor polynomials for functions of two vari-
ables. We will only state the second degree approximation.
Theorem 9.6 (Taylor’s Theorem). Let D be an open disc in R2, let
f : D −→ R, and let c = (a, b) ∈ D. If f has continuous and bounded
partial derivatives up to third order in D, then there exists a constant
C > 0 such that for each (x, y) ∈ D we have

f (x, y) = f (a, b) + ( f x(a, b)  f y(a, b) ) (x − a  y − b)^T
  + (1/2) (x − a  y − b) ( f xx(a, b)  f xy(a, b) ; f yx(a, b)  f yy(a, b) ) (x − a  y − b)^T
  + R2(a, b; x, y)

where |R2(a, b; x, y)| ≤ C |(x − a, y − b)|3 = C [(x − a)2 + ( y − b)2]3/2.
Note that the first two terms in the approximation for f (x, y)
give the equation of the tangent plane to the surface z = f (x, y) at
the point (a, b, f (a, b)).
Remark 9.7. If (a, b) is a critical point of f then f x(a, b) = f y(a, b) = 0
and so the second degree Taylor polynomial is simply

f (a, b) + (1/2) (x − a  y − b) ( f xx(a, b)  f xy(a, b) ; f yx(a, b)  f yy(a, b) ) (x − a  y − b)^T
Note that the quadratic term involves the Hessian matrix for f and the
quadratic form used in the Second Derivative Test in Theorem 8.23.
Clearly, if the quadratic form only takes positive values near (a, b) then
f will have a local minimum at (a, b), and if the quadratic form only takes
negative values near (a, b) then f will have a local maximum at (a, b).
This justifies Theorem 8.23.
Example 9.8. Let f (x, y) = √(1 + x2 + y2) and let c = (0, 0). Then
f x(x, y) = x/√(1 + x2 + y2), so f x(0, 0) = 0, and similarly f y(0, 0) = 0.
Thus, the first degree approximation of f about c = (0, 0) is
f (x, y) ≈ f (0, 0) + ( f x(0, 0)  f y(0, 0) ) (x − 0  y − 0)^T = 1
This amounts to approximating the function by the equation of its tangent
plane.
Next,

f xx(x, y) = 1/√(1 + x2 + y2) − x2/(1 + x2 + y2)3/2

so f xx(0, 0) = 1. Similarly, f xy(0, 0) = 0 and f yy(0, 0) = 1, so the second
degree approximation of f about c = (0, 0) is
f (x, y) ≈ 1 + (1/2) (x − 0  y − 0) ( f xx(0, 0)  f xy(0, 0) ; f yx(0, 0)  f yy(0, 0) ) (x − 0  y − 0)^T = 1 + (1/2)(x2 + y2)
Now we are approximating the function by a second-order polynomial, so
by the equation of a curved, rather than flat, surface.
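The quality of the two approximations in Example 9.8 can be tested numerically: by Theorem 9.6 the error of the second degree approximation shrinks like the cube of the distance from the origin. A short Python sketch (sample radii are arbitrary):

```python
import math

# Example 9.8: second degree approximation of f(x, y) = sqrt(1 + x^2 + y^2)
# about (0, 0) is 1 + (x^2 + y^2)/2.
def f(x, y):
    return math.sqrt(1 + x * x + y * y)

def Q(x, y):
    return 1 + (x * x + y * y) / 2

# The error at distance ~r from the origin is O(r^3).
for r in (0.1, 0.01):
    assert abs(f(r, r) - Q(r, r)) < (2 * r * r) ** 1.5
```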
Part III
Differential equations and
eigenstructure: by Des Hill
10 Differential Equations
10.1 Introduction
In a vast number of situations a mathematical model of a system
or process will result in an equation (or set of equations) involving
not only functions of the dependent variables but also derivatives
of some or all of those functions with respect to one or more of the
variables. Such equations are called differential equations.
The simplest situation is that of a single function of a single
independent variable, in which case the equation is referred to
as an ordinary differential equation. A situation in which there is
more than one independent variable will involve a function of
those variables and an equation involving partial derivatives of that
function is called a partial differential equation.
Notationally, it is easy to tell the difference. For example, the
equation

∂ f /∂x + ∂ f /∂ y = f 2 (10.1)
is a partial differential equation to be solved for f (x, y), whereas
d2 f /dx2 + 3 d f /dx + 2 f = x4 (10.2)
is an ordinary differential equation to be solved for f (x).
The order of a differential equation is the degree of the highest
derivative that occurs in it. Partial differential equation (10.1) is
first order and ordinary differential equation (10.2) is second order.
For partial differential equations the degree of a mixed derivative
is the total number of derivatives taken. For example, the following
partial differential equation for f (x, t) has order five:
∂5 f /∂x3∂t2 + ∂2 f /∂x2 + ∂ f /∂t = 0 (10.3)
An important class of differential equations are those referred to
as linear. Roughly speaking linear differential equations are those
in which neither the function nor its derivatives occur in products,
powers or nonlinear functions. Differential equations that are not
linear are referred to as nonlinear. Equation (10.1) is nonlinear,
whereas equations (10.2) and (10.3) are both linear.
Example 10.1. Classify the following differential equations with respect
to (i) their nature (ordinary or partial), (ii) their order and (iii) linear or
nonlinear:
(a) ∂ f /∂x − ∂ f /∂ y = 1        (b) ∂2 g/∂t2 + g = sin(t)

(c) d3 y/dx3 + 8 y = x sin x     (d) (∂u/∂x)(∂u/∂t) = x + t

(e) P2 d2P/dx2 = x5 + 1          ( f ) ∂4F/∂x∂ y3 = t2F
Solution:
(i) Equations (a), (b), (d) and ( f ) involve partial derivatives and are
hence partial differential equations, whereas equations (c) and (e) involve
ordinary derivatives and are hence ordinary differential equations.
(ii) Recall that the order of a differential equation is the degree of the
highest derivative that occurs in it. The orders of the differential equations
are as follows:
(a) first (b) second (c) third (d) first
(e) second ( f ) fourth
(iii) Recall that linear differential equations are those in which neither
the function nor its derivatives occur in products, powers or nonlinear
functions. It doesn’t matter how the independent variables appear. We
observe that equations (a), (b), (c) and ( f ) are linear whereas equations
(d) and (e) are nonlinear.
10.1.1 Solutions of differential equations
When asked to solve an algebraic equation, for example x2 − 3x +
2 = 0, we expect the answers to be numbers. The situation as
regards differential equations is much more difficult because we are
being asked to find functions that will satisfy the given equation.
For example, in Example 10.1(a) we are asked for a function f (x, y)
that will satisfy the partial differential equation ∂ f /∂x − ∂ f /∂ y = 1, and
in Example 10.1(c) we are asked to find a function y(x) that will
satisfy d3 y/dx3 + 8 y = x sin x.
Unlike algebraic equations, which only have a discrete set of
solutions (for example x2 − 3x + 2 = 0 only has the solutions x = 1
or x = 2), differential equations can have whole families of solutions.
For example, y = A e^{3t} satisfies the ordinary differential equation
dy/dt = 3 y for any value of A.
If a differential equation is linear then there is a well-established
procedure for finding solutions and we shall cover this in detail for
ordinary differential equations. If an ordinary differential equa-
tion is nonlinear but is of first order then we may be able to find
solutions.
The theory of partial differential equations is outside the scope of
this unit but we shall touch upon it at the end of this section.
10.1.2 Verification of solutions of differential equations
To get a feel for things (and to practise our algebra) we’ll have a
quick look at the relatively simple procedure of verifying solutions
of differential equations by way of a few examples.
Example 10.2. Verify that
y(x) = A e^{2x} + B e^{−2x} − 2 cos(x) − 5x sin(x) (10.4)
is a solution of the ordinary differential equation
d2 y/dx2 − 4 y = 25x sin(x) (10.5)
for any value of the constants A and B.
Solution: We need to calculate d2 y/dx2. In order to do this we need the
product rule to differentiate x sin x. It gives

d/dx (x sin(x)) = sin(x) + x cos(x)

and

d2/dx2 (x sin(x)) = 2 cos(x) − x sin(x)

Hence

d2 y/dx2 = 4A e^{2x} + 4B e^{−2x} − 8 cos(x) + 5x sin(x)

and substitution of this and the expression (10.4) into equation (10.5)
quickly yields the required verification.
Example 10.3. Verify that both

(i) f (x, y) = xy − y2/2   and   (ii) f (x, y) = sin( y − x) + x2/2

are solutions of the partial differential equation

∂ f /∂x + ∂ f /∂ y = x
Solution: In each case we need to calculate ∂ f /∂x and ∂ f /∂ y. We have

(i) ∂ f /∂x = y and ∂ f /∂ y = x − y. Thus

∂ f /∂x + ∂ f /∂ y = y + (x − y) = x

(ii) ∂ f /∂x = − cos( y − x) + x and ∂ f /∂ y = cos( y − x). Hence

∂ f /∂x + ∂ f /∂ y = − cos( y − x) + x + cos( y − x) = x

In both cases we have verified the solution of the partial differential
equation.
10.2 Mathematical modelling with ordinary differential equa-
tions
For real-world systems changing continuously in time we can use
derivatives to model the rates of change of quantities. Our mathe-
matical models are thus differential equations.
Example 10.4. Modelling population growth.
The simplest model of population growth is to assume that the rate of
change of population is proportional to the population at that time. Let
P(t) represent the population at time t. Then the mathematical model is
dP/dt = rP for some constant r > 0

It can be shown (using a method called separation of variables, which
we shall learn shortly) that the function P(t) that satisfies this differential
equation is

P(t) = P0 e^{rt} where P0 is the population at t = 0
This model is clearly inadequate in that it predicts that the population will
increase without bound if r > 0. A more realistic model is the logistic
growth model
dP/dt = rP(C − P) where r > 0 and C > 0 are constants
The method of separation of variables can be used to show that the solution
of this differential equation is

P(t) = C P0 / (P0 + (C − P0) e^{−rCt}) where P0 = P(0)
This model predicts that as time goes on, the population will tend towards
the constant value C, called the carrying capacity.
Example 10.5. Newton’s law of cooling
The rate at which heat is lost from an object is proportional to the differ-
ence between the temperature of the object and the ambient temperature.
Let H (t) be the temperature of the object (in ◦C) at time t and suppose
the fixed ambient temperature is A◦C. Newton’s law of cooling says that
dH/dt = α( A − H ) for some constant α > 0

The method of separation of variables can be used to show that the solution
of this differential equation is

H (t) = A + ( H 0 − A) e^{−αt} where H 0 = H (0)
This model predicts that as time goes on, the temperature of the object will
approach that of its surroundings, which agrees with our intuition.
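The claimed solution of Newton's law of cooling is easy to check numerically. In the Python sketch below the derivative is approximated by a central difference; the values of α, A and H0 are illustrative, not from the text.

```python
import math

# Newton's law of cooling: check that H(t) = A + (H0 - A) e^{-alpha t}
# satisfies dH/dt = alpha (A - H).  Constants chosen for illustration.
alpha, A, H0 = 0.3, 20.0, 90.0

def H(t):
    return A + (H0 - A) * math.exp(-alpha * t)

def dH(t, h=1e-6):
    return (H(t + h) - H(t - h)) / (2 * h)

for t in (0.0, 1.0, 5.0, 20.0):
    assert abs(dH(t) - alpha * (A - H(t))) < 1e-6
# As t grows, the temperature approaches the ambient value A.
assert abs(H(50.0) - A) < 1e-4
```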
Example 10.6. One tank mixing process
Suppose we have a tank of salt water and we allow fresh water into the
tank at a rate of F m3/sec, and allow salt water out at the same rate. Note
Figure 10.1: A tank of volume V, with fresh water entering at rate F and salt water leaving at rate F.
this is a volume rate, and that the volume V of the tank is maintained
constant. We assume instantaneous mixing so that the tank has a uniform
concentration.
Let x(t) represent the salt concentration of the water (kg/m3) in the
tank at time t and a(t) represent the amount of salt (kg). We have
x(t) = a(t)/V. The tank starts with an amount of salt a0 kg.
The rate at which salt is being removed from the tank at time t is given by

da/dt = −x(t) × flow rate = −F x(t) = −(F/V) a(t) = −α a(t)

where α = F/V is a positive constant. This equation has the solution
a(t) = a0 e^{−αt}. This converges to zero as t → ∞.
Consider the same tank which is now filled with fresh water. Water pol-
luted with q kg/m3 of some chemical enters the tank at a rate of F m3 /sec,
and polluted water exits the tank at the same rate. We again assume in-
stantaneous mixing so that the tank has a uniform concentration.
Let x(t) represent the concentration of pollutant (kg/m3) in the water
in the tank at time t and a(t) represent the amount of pollutant (kg). We
have x(t) = a(t)/V. The rate at which pollutant is being added to the
tank at time t is given by
da/dt = amount of pollutant added per sec − amount of pollutant removed per sec

That is,

da/dt = qF − F x(t) = qF − (F/V) a(t)
Alternatively, we can obtain a differential equation for the concentration
x(t) by dividing through the above equation by V to give
dx/dt = (F/V)(q − x) ⇒ dx/dt = α(q − x)
where α = F/V is a positive constant. Notice that this is essentially
the same as the differential equation that we obtained for Newton’s law of
cooling.
10.3 First order ordinary differential equations
Most first order ordinary differential equations can be expressed
(by algebraic re-arrangement if necessary) in the form
dx/dt = f (t, x) (10.6)
where the function f (t, x) is known, and we are asked to find the
solution x(t).
10.3.1 Direction fields
Expression (10.6) means that for any point in the tx-plane (for
which f is defined) we can evaluate the gradient dx/dt and represent
this graphically by means of a small arrow. If we do this for a
whole grid of points in the tx-plane and place all of the arrows on
the same plot we produce what is called a direction field. Figure 10.2
displays the direction field in the case where f (t, x) = x2 − t2.
Figure 10.2: The direction field of dx/dt = x2 − t2.
A solution of equation (10.6) is a function relating x and t, which geometrically is a curve in the tx-plane. Since this solution satisfies the differential equation, the gradient of the curve is the same as the direction field vector at any point on the curve. That is, the direction field is a collection of arrows that are tangential to the solution curves. This observation enables us to roughly sketch solution curves without actually solving the differential equation, as long as we have a device to plot the direction field. We can indeed sketch many such curves (called a family of solution curves) superimposed on the same direction field.
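The arrows of a direction field are easy to tabulate by machine. The following Python sketch (an illustration added here, not part of the original notes; the grid range is an arbitrary choice) evaluates the slope f(t, x) = x^2 − t^2 of Figure 10.2 on a small grid; a plotting package would then draw a short arrow with that gradient at each grid point.

```python
# Slopes of the direction field for dx/dt = x^2 - t^2 on a 5x5 grid.
# Each value is the gradient an arrow at (t, x) would have.

def f(t, x):
    return x**2 - t**2

grid = [(t, x, f(t, x)) for t in range(-2, 3) for x in range(-2, 3)]

for t, x, slope in grid:
    print(f"(t={t:+d}, x={x:+d})  slope = {slope:+d}")
```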
Example 10.7. The direction field of dx/dt = x^2 − t^2 along with three solution curves is given in Figure 10.3. The top curve is the solution that goes through (t, x) = (0, 1). The middle curve is the solution that goes through (t, x) = (0, 0) and the bottom curve is the solution that goes through (t, x) = (0, −2).
Example 10.8. The direction field of dx/dt = 3x + e^t along with three solution curves is given in Figure 10.4. The top curve is the solution that goes through (t, x) = (0, 1). The middle curve is the solution that goes through (t, x) = (0, 0) and the bottom curve is the solution that goes through (t, x) = (0, −1).
Figure 10.3: Three solution curves for dx/dt = x^2 − t^2.

Figure 10.4: Three solution curves for dx/dt = 3x + e^t.
10.3.2 Separation of variables
If it is the case that a first order differential equation can be manipulated into the following form, called separable form:

dx/dt = h(t)/g(x)   for some g(x) ≠ 0    (10.7)

(and the equation is called a separable differential equation) then there is a simple way of determining its solution. Informally, we treat dx/dt as a fraction and re-arrange equation (10.7) into the form

g(x) dx = h(t) dt

We now (symbolically) integrate both sides to get

∫ g(x) dx = ∫ h(t) dt

If we can integrate g(x) and h(t) with respect to their respective arguments then we will have an expression that we can (in principle) re-arrange into a solution of the form x(t).
This informal approach can be made rigorous by appeal to the Chain Rule. We write equation (10.7) as

g(x) dx/dt = h(t)

and integrate both sides with respect to t to get

∫ g(x) (dx/dt) dt = ∫ h(t) dt   which implies   ∫ g(x) dx = ∫ h(t) dt
Example 10.9. Solve the first order differential equation dx/dt = x^2 sin(t).

Solution: The differential equation is separable. The solution is given by

∫ dx/x^2 = ∫ sin(t) dt   which implies   −1/x = −cos(t) + C    (10.8)

where C is a constant of integration. We can re-arrange equation (10.8) to get

x(t) = 1/(cos(t) − C)    (10.9)

Note that we could have added integration constants, say C1 and C2, to each side of equation (10.8), but these can easily be combined into one integration constant: C = C2 − C1.

Note also that equation (10.8) does not hold if x = 0. In such situations we have to investigate the original differential equation dx/dt = x^2 sin(t). In this case it turns out that x(t) = 0 is in fact a solution, but not of the form (10.9). Special situations like this are something that we should be aware of.
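Solutions obtained by separation of variables are easy to spot-check numerically. This Python sketch (the constant C = 3 is an arbitrary choice that keeps the denominator of (10.9) away from zero) compares a central-difference approximation of dx/dt with x^2 sin(t):

```python
import math

C = 3.0

def x(t):
    # Solution (10.9): x(t) = 1/(cos(t) - C)
    return 1.0 / (math.cos(t) - C)

h = 1e-6
for t in [0.0, 0.5, 1.0, 2.0]:
    lhs = (x(t + h) - x(t - h)) / (2 * h)   # approximate dx/dt
    rhs = x(t)**2 * math.sin(t)             # right-hand side of the ODE
    assert abs(lhs - rhs) < 1e-6, (t, lhs, rhs)
print("dx/dt = x^2 sin(t) holds numerically")
```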
10.3.3 The integrating factor method
If a first order ordinary differential equation is linear then it can be solved by the following process. The most general form for a first
order linear ordinary differential equation is

dx/dt + p(t)x = r(t)    (10.10)
The trick is to multiply this equation throughout by an as yet undetermined function g(t) and then appeal to the product rule of differentiation to guide us in the correct choice of g(t). Multiplying equation (10.10) throughout by g(t) gives

g(t) dx/dt + g(t)p(t)x = g(t)r(t)    (10.11)
We would like to be able to write the left-hand side of this equation in the form

d/dt [g(t)x]

Applying the product rule to this expression and comparing it to the left-hand side of equation (10.11) we have

g(t) dx/dt + (dg/dt) x = g(t) dx/dt + g(t)p(t)x

which implies that

dg/dt = g(t)p(t)    (10.12)

This equation is separable. The solution is

∫ dg/g(t) = ∫ p(t) dt

which implies

ln g(t) = ∫ p(t) dt    (10.13)

We can set the constant of integration to zero, since any function that satisfies equation (10.12) will do for our purposes. Exponentiating both sides of equation (10.13) gives

g(t) = e^{∫ p(t) dt}
and g(t) is referred to as an integrating factor. With g(t) having this special form, equation (10.11) is equivalent to

d/dt [g(t)x(t)] = g(t)r(t)

Integrating this equation with respect to t gives

g(t)x(t) = ∫ g(t)r(t) dt + C

and hence the solution of the ordinary differential equation is

x(t) = (1/g(t)) [∫ g(t)r(t) dt + C]   where g(t) = e^{∫ p(t) dt}    (10.14)

Note that we must include the integration constant C because we want the most general solution possible.
Example 10.10. Solve the first order linear differential equation

dx/dt − 3x = e^t    (10.15)

As a guide to the shape of the solutions, the direction field for this differential equation along with a number of solution curves appears in Figure 10.4.

Solution: The integrating factor is

g(t) = e^{∫ p(t) dt} = e^{∫ (−3) dt} = e^{−3t}

Multiplying the differential equation through by g(t) gives

e^{−3t} dx/dt − 3e^{−3t}x = e^{−2t}

We know from our recent work that the left-hand side of this can be rewritten in product form and so we obtain:

d/dt [e^{−3t}x] = e^{−2t}

Now integrate both sides with respect to t:

e^{−3t}x = ∫ e^{−2t} dt = −(1/2)e^{−2t} + C

and tidy up:

x(t) = −e^t/2 + Ce^{3t}    (10.16)

Note that we could have written down the solution immediately by appealing to equation (10.14), but when learning the method it is instructive to follow through each step in the process in order to gain a better understanding of how it works.
Example 10.11. Solve the first order linear differential equation

dx/dt + (2/t)x = sin(t)/t^2

Solution: The integrating factor is

g(t) = e^{∫ p(t) dt} = e^{∫ (2/t) dt} = e^{2 ln(t)} = e^{ln(t^2)} = t^2

Multiplying the differential equation through by this gives

t^2 dx/dt + 2tx = sin(t)

We know from our recent work that the left-hand side of this can be rewritten in product form:

d/dt [t^2 x] = sin(t)

Now integrate with respect to t:

t^2 x = ∫ sin(t) dt = −cos(t) + C

and tidy up:

x(t) = (−cos(t) + C)/t^2
10.4 Initial conditions
The values of constants of integration that arise when we solve
differential equations can be determined by making use of other
conditions (or restrictions) placed on the problem.
Example 10.12. Solve dx/dt − 3x = e^t subject to x(0) = 1, that is, x = 1 when t = 0.

Solution: We have already seen this differential equation. It is equation (10.15) and we have determined that its (most general) solution is given by (10.16):

x = −e^t/2 + Ce^{3t}

All we have to do is substitute x = 1 and t = 0 and solve the resulting algebraic equation for C. We have

1 = −e^0/2 + Ce^0  ⇒  1 = −1/2 + C  ⇒  C = 3/2

so the required solution is

x(t) = (3e^{3t} − e^t)/2    (10.17)
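A solution of an initial value problem can be checked on both counts, the differential equation and the initial condition. A numerical sketch for (10.17):

```python
import math

def x(t):
    # Solution (10.17): x(t) = (3 e^{3t} - e^t)/2
    return (3 * math.exp(3 * t) - math.exp(t)) / 2

assert abs(x(0) - 1) < 1e-12      # initial condition x(0) = 1

h = 1e-6
for t in [0.0, 0.3, 1.0]:
    dxdt = (x(t + h) - x(t - h)) / (2 * h)    # approximate dx/dt
    assert abs((dxdt - 3 * x(t)) - math.exp(t)) < 1e-5, t
print("initial value problem satisfied")
```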
In applications, the variable t is usually used to represent time.
The extra condition (for example x(0) = 1) is called an initial value
and the combined differential equation plus initial condition is
called an initial value problem. The solution curve of (10.17) appears
in Figure 10.4.
Occasionally there are situations when the 'initial condition' is actually specified at a nonzero time value.
Example 10.13. Solve the initial value problem

dx/dt = t^2/x   subject to x(1) = 4

Solution: We observe that the differential equation is separable. The solution is:

∫ x dx = ∫ t^2 dt  ⇒  x^2/2 = t^3/3 + C

which implies

x = ±√(2t^3/3 + 2C)

Notice that we have two different solutions to the differential equation, one positive and one negative. The initial condition x(1) = 4 allows us to eliminate the negative solution, so we are left with

x = √(2t^3/3 + 2C)

and substituting into this x = 4 and t = 1 gives

16 = 2/3 + 2C  ⇒  C = 23/3  ⇒  x = √((2t^3 + 46)/3)
10.5 Linear constant-coefficient ordinary differential equations
The general theory of higher order (that is, order greater than one) differential equations is beyond the scope of this unit; in any case, many applications lead to differential equations that have an especially simple structure, and we shall concentrate on these.

Recall that in a linear differential equation neither the function nor its derivatives occur in products, powers or nonlinear functions. We shall focus on second order linear differential equations. Extension to higher orders is straightforward, and first order ones can be handled by the methods already discussed.
The most general second order linear ordinary differential equation is

a(t) d^2x/dt^2 + b(t) dx/dt + c(t)x = f(t)

but we shall simplify this even further by assuming that the coefficient functions on the left-hand side are constants (that is, not functions of t). In this case we have the general second order linear constant-coefficient ordinary differential equation

a d^2x/dt^2 + b dx/dt + cx = f(t),   a ≠ 0    (10.18)
If f(t) = 0 then the equation is referred to as being homogeneous, otherwise it is called nonhomogeneous. We shall first of all concentrate on the homogeneous case, that is, we wish to solve

a d^2x/dt^2 + b dx/dt + cx = 0,   a ≠ 0    (10.19)

where a, b and c are given constants.
In seeking a solution technique we consider the first order equation

b dx/dt + cx = 0,   b ≠ 0

We can solve this by using either the separation approach or the integrating factor method. Using separation we have

dx/dt = −cx/b  ⇒  ∫ dx/x = ∫ (−c/b) dt  ⇒  ln(x) = −(c/b)t + C  ⇒  x = Ae^{mt}

where m = −c/b, C is the constant of integration and A = e^C.
By analogy we attempt to find a solution to the second order equation (10.19) in the form x = e^{mt}. Substituting x = e^{mt} into equation (10.19) yields

am^2 e^{mt} + bm e^{mt} + c e^{mt} = 0  ⇒  (am^2 + bm + c)e^{mt} = 0

and since e^{mt} can never be zero we can divide through by it to arrive at a quadratic equation for m:

am^2 + bm + c = 0
This equation is usually referred to as the characteristic equation (or auxiliary equation) of the differential equation. Label the roots of this equation m1 and m2. Then both e^{m1 t} and e^{m2 t} are solutions of the differential equation.

Homogeneous linear differential equations have the linearity property, that is, if both x1(t) and x2(t) are solutions then so is any linear combination of these (that is, Ax1(t) + Bx2(t) where A and B are constants).
We can easily show this for equation (10.19). Let x(t) = Ax1(t) + Bx2(t). Then

a d^2x/dt^2 + b dx/dt + cx
  = a[A d^2x1/dt^2 + B d^2x2/dt^2] + b[A dx1/dt + B dx2/dt] + c[Ax1 + Bx2]
  = A[a d^2x1/dt^2 + b dx1/dt + cx1] + B[a d^2x2/dt^2 + b dx2/dt + cx2]
  = A · 0 + B · 0 = 0
Making use of this linearity result allows us to write down the general solution of the second order homogeneous linear constant-coefficient ordinary differential equation:

x(t) = Ae^{m1 t} + Be^{m2 t}    (10.20)

Notice that in this case we have two undetermined constants in the solution. The general situation is that an nth order linear differential equation (even one with variable coefficients) will have n undetermined constants in its general solution. If the differential equation is nonlinear the situation is much less clear.
Recall that the roots of a quadratic equation can be either (i) both real and distinct; (ii) complex conjugates; or (iii) a repeated root. The actual solution form is different in each of these three cases.

Case 1. Two distinct real roots. In this case the solution is simply that given in (10.20).
Example 10.14. Solve the differential equation

d^2x/dt^2 − 5 dx/dt + 4x = 0

Solution: The characteristic equation is m^2 − 5m + 4 = 0, which factorises into (m − 1)(m − 4) = 0, and hence the required roots are m1 = 1 and m2 = 4 and the solution is

x(t) = Ae^t + Be^{4t}
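Second order solutions can be spot-checked with finite differences for both derivatives. This sketch tests x(t) = Ae^t + Be^{4t} against x'' − 5x' + 4x = 0; the constants A = 2, B = −1 are an arbitrary choice:

```python
import math

A, B = 2.0, -1.0

def x(t):
    return A * math.exp(t) + B * math.exp(4 * t)

h = 1e-4
for t in [0.0, 0.5, 1.0]:
    d1 = (x(t + h) - x(t - h)) / (2 * h)              # approx x'
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / h**2      # approx x''
    assert abs(d2 - 5 * d1 + 4 * x(t)) < 1e-3, t
print("x'' - 5x' + 4x = 0 holds numerically")
```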
Case 2. Complex conjugate roots. The differential equation is in terms of real variables so we would like to have a solution in terms of functions of real numbers. To achieve this we recall Euler's formula

e^{it} = cos(t) + i sin(t)

If the roots are m1 = α + iβ and m2 = α − iβ then the general solution (including both real and complex functions) is

x = Ae^{(α+iβ)t} + Be^{(α−iβ)t} = e^{αt} [Ae^{iβt} + Be^{−iβt}]

where A and B are complex numbers. Applying Euler's formula gives

x(t) = e^{αt} [A(cos(βt) + i sin(βt)) + B(cos(βt) − i sin(βt))]

which we can rewrite as

x(t) = C1 e^{αt} cos(βt) + C2 e^{αt} sin(βt)

where C1 = A + B and C2 = i(A − B). Since we are only interested in real-valued functions we are only interested in the cases where C1 and C2 are real.
Example 10.15. Solve the differential equation

d^2x/dt^2 − 4 dx/dt + 13x = 0

Solution: The characteristic equation is m^2 − 4m + 13 = 0. The quadratic formula gives the roots as m1,2 = 2 ± 3i and hence the solution is

x(t) = C1 e^{2t} cos(3t) + C2 e^{2t} sin(3t)

where C1 and C2 are any real numbers.
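The complex-roots solution form can be checked the same way; here x(t) = C1 e^{2t} cos(3t) + C2 e^{2t} sin(3t) is tested numerically against x'' − 4x' + 13x = 0 with arbitrarily chosen constants:

```python
import math

C1, C2 = 1.0, -2.0

def x(t):
    return math.exp(2 * t) * (C1 * math.cos(3 * t) + C2 * math.sin(3 * t))

h = 1e-4
for t in [0.0, 0.7, 1.5]:
    d1 = (x(t + h) - x(t - h)) / (2 * h)              # approx x'
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / h**2      # approx x''
    assert abs(d2 - 4 * d1 + 13 * x(t)) < 1e-3, t
print("x'' - 4x' + 13x = 0 holds numerically")
```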
Case 3. Equal roots. In this case we have only one solution of the form x1 = e^{mt}. The other solution will have the form x2 = te^{mt}. This can be deduced using a process called reduction of order, but this is outside the scope of this material. The required solution is

x(t) = Ae^{mt} + Bte^{mt}

Example 10.16. Solve the differential equation

d^2x/dt^2 + 6 dx/dt + 9x = 0

Solution: The characteristic equation is m^2 + 6m + 9 = 0, which factorises into (m + 3)^2 = 0, and hence the required component solutions are e^{−3t} and te^{−3t} and the solution of the differential equation is

x(t) = Ae^{−3t} + Bte^{−3t}
10.6 Linear nonhomogeneous constant-coefficient differential equations

We now consider equations of the form (10.18), reproduced here:

a d^2x/dt^2 + b dx/dt + cx = f(t),   a ≠ 0    (10.21)

where a, b and c are given constants. In the previous section we learnt how to solve such a differential equation if f(t) = 0 (that is, the equation is homogeneous). If f(t) ≠ 0 the equation is called nonhomogeneous.
In order to solve equation (10.21) we first need to solve the
equivalent homogeneous equation (that is, when f (t) = 0). The
solution so obtained is referred to as the homogeneous solution or
the complementary function. It is actually a whole family of solutions
because it has two free parameters (C1 and C2, or A and B).
The solution of equation (10.21) is the sum of this homogeneous
solution and a particular solution (that is, any solution that we
can find) to the nonhomogeneous differential equation (10.21) and
is referred to as the general solution of (10.21). It is also actually a
whole family of solutions for the same reasons.
We can show that such a combination will be a solution to equation (10.21). Let xc(t) be a solution of the homogeneous equation and xp(t) be a solution of equation (10.21). We need to show that x(t) = xc(t) + xp(t) is also a solution of (10.21):

a d^2x/dt^2 + b dx/dt + cx
  = a[d^2xc/dt^2 + d^2xp/dt^2] + b[dxc/dt + dxp/dt] + c[xc + xp]
  = [a d^2xc/dt^2 + b dxc/dt + cxc] + [a d^2xp/dt^2 + b dxp/dt + cxp]
  = 0 + f(t) = f(t)
and our claim is proved. We can also show that there are in fact no other possible solutions of equation (10.21) by showing that the difference between any two solutions (say x1(t) and x2(t)) is a solution of the homogeneous equation:

a d^2/dt^2 (x1 − x2) + b d/dt (x1 − x2) + c(x1 − x2)
  = [a d^2x1/dt^2 + b dx1/dt + cx1] − [a d^2x2/dt^2 + b dx2/dt + cx2]
  = f(t) − f(t) = 0

So the set of solutions of a linear differential equation behaves in the same way as what we saw for the set of solutions of a system of linear equations in Section 3.4.
In any given problem our tasks will be to solve the homogeneous equation (using the methods of the previous section) and to find a particular solution. There is a general method (called variation of parameters) for achieving the latter, but it often leads to difficult, if not impossible, integrals. We shall restrict ourselves to a class of right-hand side functions f(t) for which there is an easily applicable method.

This class is the collection of polynomial functions

αn t^n + αn−1 t^{n−1} + · · · + α1 t + α0,

exponential functions αe^{βt} and the trigonometric functions α cos(γt) and β sin(δt), or any linear combination of these, for example

f(t) = t^2 + 7t + 4 + e^{5t} + 6 cos(3t)

The method of finding a particular solution for such a class of right-hand side functions is called the method of undetermined coefficients.
10.6.1 Method of undetermined coefficients

Table 10.1 is a summary of the appropriate form of solution to seek. The values of αn, . . . , α0, α and β will be given and we are required to determine the correct values of Pn, . . . , P0, P and/or Q that will give a form that satisfies the differential equation.

  f(t)                                          Solution form
  αn t^n + αn−1 t^{n−1} + · · · + α1 t + α0     Pn t^n + Pn−1 t^{n−1} + · · · + P1 t + P0
  α e^{βt}                                      P e^{βt}
  α cos(γt)                                     P cos(γt) + Q sin(γt)
  β sin(δt)                                     P cos(δt) + Q sin(δt)

Table 10.1: Solution forms for the method of undetermined coefficients
Example 10.17. Solve the differential equation d^2x/dt^2 + 6 dx/dt + 9x = 4t^2 + 5.

Solution: In Example 10.16, we found the complementary solution (that is, the solution of the homogeneous problem). It is

xc(t) = Ae^{−3t} + Bte^{−3t}

We now need to determine a particular solution. Based on Table 10.1 we try

xp(t) = P2 t^2 + P1 t + P0    (10.22)

Note that we must include the term P1 t. Substitution of expression (10.22) into the differential equation gives

2P2 + 6(2P2 t + P1) + 9(P2 t^2 + P1 t + P0) = 4t^2 + 5
⇒ 9P2 t^2 + (9P1 + 12P2)t + (9P0 + 6P1 + 2P2) = 4t^2 + 5

and equating powers of t on each side of this equation leads to a set of algebraic equations to solve for the unknowns P0, P1 and P2:

9P2 = 4        9P1 + 12P2 = 0        9P0 + 6P1 + 2P2 = 5

The solution of this set of equations is

P0 = 23/27        P1 = −16/27        P2 = 4/9

Hence the particular solution is

xp(t) = (4/9)t^2 − (16/27)t + 23/27

and finally, the general solution of the nonhomogeneous differential equation is

x(t) = Ae^{−3t} + Bte^{−3t} + (4/9)t^2 − (16/27)t + 23/27    (10.23)
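Because the particular solution here is a polynomial, its substitution into the equation can be checked exactly with rational arithmetic. A sketch using Python's `fractions` module:

```python
from fractions import Fraction as F

P2, P1, P0 = F(4, 9), F(-16, 27), F(23, 27)

# For the quadratic x_p = P2 t^2 + P1 t + P0 the derivatives are exact:
# x_p' = 2 P2 t + P1 and x_p'' = 2 P2.
for t in [F(0), F(1), F(-3), F(7, 2)]:
    xp  = P2 * t**2 + P1 * t + P0
    xp1 = 2 * P2 * t + P1
    xp2 = 2 * P2
    assert xp2 + 6 * xp1 + 9 * xp == 4 * t**2 + 5
print("x_p'' + 6 x_p' + 9 x_p = 4t^2 + 5 exactly")
```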
Example 10.18. Solve the differential equation

d^2x/dt^2 − 5 dx/dt + 4x = 7 cos(3t)
Solution: In Example 10.14 we found the complementary solution (that is, the solution of the homogeneous problem). It is

xc(t) = Ae^t + Be^{4t}

We now need to determine a particular solution. Based on Table 10.1 we try

xp(t) = P cos(3t) + Q sin(3t)    (10.24)

Note that we must include both cos and sin terms. Substitution of expression (10.24) into the differential equation gives

−9P cos(3t) − 9Q sin(3t) − 5(−3P sin(3t) + 3Q cos(3t)) + 4(P cos(3t) + Q sin(3t)) = 7 cos(3t)

and equating coefficients of cos(3t) and sin(3t) on both sides of this equation leads to a set of algebraic equations to solve for the unknowns P and Q:

−9P − 15Q + 4P = 7        −9Q + 15P + 4Q = 0

The solution of this set of equations is

P = −7/50 and Q = −21/50

and so

xp(t) = −(7/50) cos(3t) − (21/50) sin(3t)

and finally, the general solution of the nonhomogeneous differential equation is

x(t) = Ae^t + Be^{4t} − (7/50) cos(3t) − (21/50) sin(3t)    (10.25)
Example 10.19. Solve the differential equation

d^2x/dt^2 − 4 dx/dt + 13x = 8e^{−3t}

Solution: In Example 10.15 we found the complementary solution (that is, the solution of the homogeneous problem). It is

xc(t) = C1 e^{2t} cos(3t) + C2 e^{2t} sin(3t)

We now need to determine a particular solution. Based on Table 10.1 we try

xp(t) = Pe^{−3t}    (10.26)

Substitution of expression (10.26) into the differential equation gives

9Pe^{−3t} + 12Pe^{−3t} + 13Pe^{−3t} = 8e^{−3t}

and cancelling throughout by e^{−3t} (which is valid because it can never be zero) gives

34P = 8  ⇒  P = 4/17  ⇒  xp(t) = (4/17)e^{−3t}

and hence the general solution of the nonhomogeneous differential equation is

x(t) = C1 e^{2t} cos(3t) + C2 e^{2t} sin(3t) + (4/17)e^{−3t}    (10.27)
10.7 Initial and boundary conditions
As mentioned earlier, the values of constants that arise when we
solve differential equations can be determined by making use of
other conditions (or restrictions) placed on the problem. If there are
n unknown constants then we will need n extra conditions.
If all of the extra conditions are given at one value of the independent variable then the extra conditions are called initial conditions and the combined differential equation plus initial conditions is called an initial value problem. This is so even if the independent variable doesn't represent time.

Formally, if the extra conditions are given at different values of the independent variable then they are called boundary conditions and the combined differential equation plus boundary conditions is called a boundary value problem.

In applications, the term initial condition is used if the independent variable represents time and the term boundary condition is used if the independent variable represents space. Usually, the two different uses of the terminology coincide.
Example 10.20. Solve the initial value problem

d^2x/dt^2 − 5 dx/dt + 4x = 7 cos(3t)   subject to x(0) = 1 and dx/dt(0) = 2

Solution: We have already seen this differential equation in Example 10.18 and determined that its general solution is given by (10.25):

x(t) = Ae^t + Be^{4t} − (7/50) cos(3t) − (21/50) sin(3t)

The initial conditions will give two equations to solve for the unknowns, A and B. Firstly,

x(0) = 1  ⇒  1 = A + B − 7/50    (10.28)

The second initial condition involves the derivative so first we need to find it:

dx/dt = Ae^t + 4Be^{4t} + (21/50) sin(3t) − (63/50) cos(3t)

and the second initial condition then gives

dx/dt(0) = 2  ⇒  2 = A + 4B − 63/50    (10.29)

Solving the pair of algebraic equations (10.28) and (10.29) gives

A = 13/30 and B = 53/75

so the required solution is

x(t) = (13/30)e^t + (53/75)e^{4t} − (7/50) cos(3t) − (21/50) sin(3t)
Example 10.21. Solve the boundary value problem

d^2x/dt^2 + 6 dx/dt + 9x = 4t^2 + 5   subject to x(0) = 7 and x(1) = −3

Solution: We have already seen this differential equation and determined that its general solution is given by (10.23):

x(t) = Ae^{−3t} + Bte^{−3t} + (4/9)t^2 − (16/27)t + 23/27

The boundary conditions give two equations to solve for the unknowns, A and B:

x(0) = 7  ⇒  7 = A + 23/27

x(1) = −3  ⇒  −3 = Ae^{−3} + Be^{−3} + 4/9 − 16/27 + 23/27

Solving this pair of algebraic equations gives

A = 166/27   and   B = −(166 + 100e^3)/27

so the required solution is

x(t) = (166/27)e^{−3t} − ((166 + 100e^3)/27) te^{−3t} + (4/9)t^2 − (16/27)t + 23/27
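A quick numerical check of the boundary conditions (the values A = 166/27 and B = −(166 + 100e^3)/27 used here come from solving the pair of boundary equations):

```python
import math

A = 166 / 27
B = -(166 + 100 * math.e**3) / 27

def x(t):
    # General solution (10.23) with the constants fixed by the boundary data.
    return (A + B * t) * math.exp(-3 * t) + 4 * t**2 / 9 - 16 * t / 27 + 23 / 27

assert abs(x(0) - 7) < 1e-9       # boundary condition x(0) = 7
assert abs(x(1) + 3) < 1e-9       # boundary condition x(1) = -3
print("both boundary conditions satisfied")
```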
Example 10.22. Solve the boundary value problem

d^2x/dt^2 − 4 dx/dt + 13x = 8e^{−3t}   subject to x(−π) = 2 and dx/dt(π) = 0

Solution: We have already seen this differential equation and determined that its general solution is given by (10.27):

x(t) = C1 e^{2t} cos(3t) + C2 e^{2t} sin(3t) + (4/17)e^{−3t}

The boundary conditions give two equations to solve for the unknowns, C1 and C2. The first is

x(−π) = 2  ⇒  2 = −C1 e^{−2π} + (4/17)e^{3π}

The derivative of the solution is

dx/dt = C1 [2e^{2t} cos(3t) − 3e^{2t} sin(3t)] + C2 [2e^{2t} sin(3t) + 3e^{2t} cos(3t)] − (12/17)e^{−3t}

and hence

dx/dt(π) = 0  ⇒  0 = −2C1 e^{2π} − 3C2 e^{2π} − (12/17)e^{−3π}

Solving this pair of algebraic equations gives

C1 = −2e^{2π} + (4/17)e^{5π}   and   C2 = (4/3)e^{2π} − (8/51)e^{5π} − (4/17)e^{−5π}

so the required solution is

x(t) = [−2e^{2π} + (4/17)e^{5π}] e^{2t} cos(3t) + [(4/3)e^{2π} − (8/51)e^{5π} − (4/17)e^{−5π}] e^{2t} sin(3t) + (4/17)e^{−3t}
10.8 Summary of method
We now summarize the method for solving a linear constant-
coefficient second order ordinary differential equation with initial
or boundary conditions.
Key Concept 10.23. Solve

a d^2x/dt^2 + b dx/dt + cx = f(t)   with a ≠ 0

given some initial or boundary conditions.

Step 1. Solve the corresponding characteristic (or auxiliary) equation

am^2 + bm + c = 0

Step 2. Use your solution to the characteristic equation to find the general solution xc(t) of the corresponding homogeneous equation

a d^2x/dt^2 + b dx/dt + cx = 0

by using one of the three general forms given in Section 10.5, depending on whether the characteristic equation has two distinct real roots, complex conjugate roots or equal roots. Your solution will have two unknown constants.

Step 3. Use the Method of Undetermined Coefficients outlined in Section 10.6.1 to find a particular solution xp(t).

Step 4. The general solution is now x(t) = xc(t) + xp(t).

Step 5. Use the initial or boundary conditions to determine the two unknown constants in x(t) coming from xc(t).
10.9 Partial differential equations
A partial differential equation (PDE) is one that involves an un-
known function of more than one variable and any number of its
partial derivatives. For example,
∂u
∂x +
∂u
∂ y = 0
is a partial differential equation for the unknown function u(x, y).
There may be any number of independent variables. For ex-
ample, an analysis of the vibrations of a solid structure (due to an
earthquake perhaps) would involve the three spacial variables x , y
and z, and time t. If there were only one independent variable thenthe equation would be an ordinary differential equation.
Partial differential equations arise in a large number of very important areas, for example, hydrodynamics and aerodynamics (the Navier-Stokes equations), groundwater flow, acoustics, the spread of pollution, combustion, electrodynamics (Maxwell's equations), quantum mechanics (the Schrödinger equation), meteorology and climate science.
Example 10.24. Maxwell's Equations

These are 4 coupled vector PDEs for the unknown electric and magnetic fields E = (Ex, Ey, Ez) and B = (Bx, By, Bz). They are, in vector derivative notation,

∇ · E = 4πρe        ∇ · B = 0

−∇ × E = (1/c) ∂B/∂t        ∇ × B = (1/c) ∂E/∂t + (4π/c) je

where ∇ is the so-called vector differential operator

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

Maxwell's Equations constitute an (overspecified) set of 8 scalar equations. The charge density ρe and current density je are specified.
Example 10.25. Three important partial differential equations are the following.

The Wave equation

∂^2u/∂t^2 = c^2 ∂^2u/∂x^2   for u(x, t)

where c is the wave speed.

The heat conduction (or diffusion) equation

∂T/∂t = κ ∂^2T/∂x^2   for T(x, t)

where κ is the thermal diffusivity.

The Laplace equation

∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 0   for ψ(x, y)
10.9.1 Finding solutions of partial differential equations
There are a number of ways to find solutions of PDEs, but they can involve a large amount of work and can be very subtle. A general discussion of how to find solutions to PDEs is beyond the scope of this unit. We shall instead focus on verification of a given solution and analysis of its behaviour and what this tells us about the situation being modelled.

Recall that when solving ODEs there would arise arbitrary constants, for example,

d^2y/dx^2 + y = 0  ⇒  y(x) = A cos x + B sin x
and these would be determined from given initial data, e.g.

y(0) = 1 and y′(0) = 0  ⇒  A = 1 and B = 0

In the case of PDEs, arbitrary functions can arise. For example, the solution of

∂u/∂x + ∂u/∂y = 0   is   u(x, y) = f(x − y)

where f is any differentiable function of one variable. We verify this using the Chain Rule as follows:

∂u/∂x + ∂u/∂y = f′(x − y)(1) + f′(x − y)(−1) = 0

as required.
10.9.2 Verification of given solutions of partial differential equations
This is simply a matter of substituting the given solution into the partial differential equation and evaluating the derivatives involved.

Example 10.26. Verify that u(x, t) = √(x/(1 + t)) is a solution of the PDE

∂u/∂t + u^2 ∂u/∂x = 0

Solution. We will need the Chain Rule and the quotient rule:

∂u/∂x = 1/(2√(x(1 + t)))        ∂u/∂t = −(1/2)√(x/(1 + t)^3)

and hence

∂u/∂t + u^2 ∂u/∂x = −(1/2)√(x/(1 + t)^3) + (x/(1 + t)) · 1/(2√(x(1 + t))) = 0

as required.
Example 10.27. Verify that u(x, y) = e^x f(y − x) is a solution of the PDE

∂u/∂x + ∂u/∂y = u

for any function f of one variable.

Solution. We will need the Chain Rule and the product rule:

∂u/∂x = e^x f(y − x) − e^x f′(y − x)        ∂u/∂y = e^x f′(y − x)

and hence

∂u/∂x + ∂u/∂y = e^x f(y − x) − e^x f′(y − x) + e^x f′(y − x) = e^x f(y − x) = u

as required.
Remark 10.28. Note that the PDE is symmetric in x and y but the solution doesn't appear to be. What happened? We can write a symmetric partner to this solution:

u(x, y) = e^x f(y − x) = e^y e^{x−y} f(y − x)

which implies that

u(x, y) = e^y g(x − y)   where g(z) = e^z f(−z)
A given partial differential equation can have different solution forms. For example, all of the functions

ψ(x, y) = x^2 − y^2
ψ(x, y) = e^x cos(y)
ψ(x, y) = ln(x^2 + y^2)

are solutions of the Laplace equation

∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 0

Verification of these solutions is a simple task.

If ψ(x, y) = x^2 − y^2 then ∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 2 + (−2) = 0.

If ψ(x, y) = e^x cos y then ∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = e^x cos y + e^x(−cos y) = 0.

If ψ(x, y) = ln(x^2 + y^2) then

∂ψ/∂x = 2x/(x^2 + y^2)   and   ∂ψ/∂y = 2y/(x^2 + y^2)

and hence

∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 2(y^2 − x^2)/(x^2 + y^2)^2 + 2(x^2 − y^2)/(x^2 + y^2)^2 = 0
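These verifications can be mirrored numerically with second central differences. The sketch below approximates the Laplacian of ψ(x, y) = e^x cos(y) at a few arbitrarily chosen points:

```python
import math

def psi(x, y):
    return math.exp(x) * math.cos(y)

h = 1e-4
for (x, y) in [(0.0, 0.0), (0.5, 1.0), (-1.0, 2.0)]:
    pxx = (psi(x + h, y) - 2 * psi(x, y) + psi(x - h, y)) / h**2   # psi_xx
    pyy = (psi(x, y + h) - 2 * psi(x, y) + psi(x, y - h)) / h**2   # psi_yy
    assert abs(pxx + pyy) < 1e-5, (x, y)
print("psi_xx + psi_yy = 0 holds numerically")
```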
10.9.3 Graphical description of solutions of partial differential equations

There are three ways of representing solutions of a PDE graphically. Consider the function u(x, y) = e^x cos(y). If we have access to a sophisticated graphing package we can plot the solution surface in three-dimensional space as in Figure 10.5.
Figure 10.5: Surface plot of u(x, y) = e^x cos(y).
Alternatively, we could produce a plot of the level curves of the
solution in two dimensions as in Figure 10.6.
Figure 10.6: Level-curve plot of u(x, y) = e^x cos(y).
We could also graph u against x for chosen values of y as in
Figure 10.7.
Figure 10.7: Section plots of u(x, y) = e^x cos(y) for y = 0, π/8, π/4.
10.9.4 Initial and Boundary Conditions
Recall that to fully determine the solution of an nth order ODE we needed to specify n initial or boundary conditions. To fully determine the solution of a PDE we need sufficiently many initial and/or boundary conditions. These will appear in the form of functions. An example of an initial condition is

u(x, 0) = sin(x) for all x

and an example of a boundary condition is

u(0, t) = e^{−t} for all t > 0

How many initial and/or boundary conditions do we need? If a PDE is nth order in time and mth order in space then we will require n initial conditions and m boundary conditions. An example of a fully specified problem is:

Solve ∂^2u/∂t^2 = ∂^3u/∂x^3

subject to the initial conditions

u(x, 0) = cos(x) and ∂u/∂t(x, 0) = 0 for 0 < x < 1

and the boundary conditions

u(0, t) = sin(t),   ∂u/∂x(0, t) = 1,   ∂^2u/∂x^2(0, t) = t   for t > 0
An initial-boundary value problem (IBVP) is a PDE for an unknown function of time and some spatial coordinates plus enough initial
and boundary information to determine any unknown functions or constants that arise. An example of an IBVP for the Wave equation in the region 0 < x < ∞, 0 < t < ∞ is:

Solve ∂^2u/∂t^2 = c^2 ∂^2u/∂x^2

subject to the initial conditions

u(x, 0) = cos(x) and ∂u/∂t(x, 0) = 0

and the boundary conditions

u(0, t) = 1 and u(π/2, t) = 0
Example 10.29. Verify that the function
    u(x, t) = x e^(−x) sin(t)
satisfies the initial-boundary value problem
    ∂²u/∂t² = ∂²u/∂x² + 2 ∂u/∂x
    u(x, 0) = 0,  ∂u/∂t(x, π/2) = 0
    u(0, t) = 0,  lim_{x→∞} u(x, t) = 0
Solution step 1. Verify that the function satisfies the partial differential equation:
    ∂²u/∂t² = −x e^(−x) sin(t)
and
    ∂²u/∂x² + 2 ∂u/∂x = −2e^(−x) sin(t) + x e^(−x) sin(t) + 2e^(−x) sin(t) − 2x e^(−x) sin(t)
                      = −x e^(−x) sin(t)
and hence the partial differential equation is satisfied.
Solution step 2. Verify the auxiliary (that is, initial and boundary) conditions:
    sin(0) = 0 ⇒ u(x, 0) = 0
    ∂u/∂t = x e^(−x) cos(t) and cos(π/2) = 0 ⇒ ∂u/∂t(x, π/2) = 0
    u(0, t) = 0
    lim_{x→∞} u(x, t) = lim_{x→∞} (x/e^x) sin(t) = 0 · sin(t) = 0
and hence all of the auxiliary conditions are satisfied. A plot of the solution surface in three-dimensional space is given in Figure 10.8.
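The two verification steps above can also be spot-checked numerically. The following is a minimal Python sketch (not part of the original notes) that approximates the derivatives of u(x, t) = x e^(−x) sin(t) by central finite differences and confirms that the PDE residual and the initial/boundary values are essentially zero:

```python
import math

def u(x, t):
    # Candidate solution from Example 10.29
    return x * math.exp(-x) * math.sin(t)

def pde_residual(x, t, h=1e-4):
    # u_tt - (u_xx + 2 u_x), with all derivatives taken by central differences
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    return u_tt - (u_xx + 2 * u_x)

residual = max(abs(pde_residual(x, t)) for x in (0.5, 1.0, 2.0) for t in (0.3, 1.0))
boundary = max(abs(u(0.0, t)) for t in (0.3, 1.0, 2.0))   # u(0, t) = 0
initial = max(abs(u(x, 0.0)) for x in (0.5, 1.0, 2.0))    # u(x, 0) = 0
assert residual < 1e-3 and boundary == 0.0 and initial == 0.0
```

The sample points are arbitrary; any interior (x, t) would do, since the identity holds everywhere.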
Figure 10.8: Solution surface for Example 10.29.
11
Eigenvalues and eigenvectors
11.1 Introduction
Matrix multiplication usually results in a change of direction; for example,
    [2 0; 1 3] [1; 4] = [2; 13]
which is not parallel to [1; 4].
The eigenvectors of a given (square) matrix A are those special nonzero vectors v that map to multiples of themselves under multiplication by the matrix A. That is,
Av = λv
The eigenvalues of A are the corresponding scale factors λ. Geometrically, the eigenvectors of a matrix A are stretched (or shrunk) on multiplication by A, whereas any other vector is rotated as well as being stretched or shrunk. Note that by definition 0 is not an eigenvector but we do allow 0 to be an eigenvalue.
Example 11.1. Observe that
    [2 6; 3 5] [1; 1] = [8; 8] = 8 [1; 1]
    [2 6; 3 5] [−2; 1] = [2; −1] = −1 [−2; 1]
Hence 8 and −1 are eigenvalues of the matrix [2 6; 3 5], with λ = 8 having corresponding eigenvector (1, 1) and λ = −1 having corresponding eigenvector (−2, 1).
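These two eigenpairs can be confirmed with a couple of lines of Python; this is just an illustrative check using a hand-rolled matrix-vector product, not anything from the original notes:

```python
def matvec(A, v):
    # Multiply a matrix (given as a list of rows) by a column vector
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 6], [3, 5]]
assert matvec(A, [1, 1]) == [8, 8]        # A(1, 1) = 8 (1, 1)
assert matvec(A, [-2, 1]) == [2, -1]      # A(-2, 1) = -1 (-2, 1)
```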
Some observations: any multiple of an eigenvector is also an eigenvector:
    A(cv) = c(Av) = c(λv) = λ(cv)
so cv is an eigenvector of A if v is, and with the same corresponding eigenvalue. The set of all eigenvectors of a given eigenvalue λ, together with the zero vector, is called the eigenspace corresponding to λ, and we write
    Eλ = {v | Av = λv}
The eigenspaces for Example 11.1 are shown in Figure 11.1.
Figure 11.1: Eigenspaces for Example 11.1.
Eigenspaces for nonzero eigenvalues are subspaces of the column space of the matrix while the 0-eigenspace is the nullspace
of the matrix. Geometrically, they are lines or planes through the
origin in R3 if A is a 3 × 3 matrix (other than the identity matrix).
11.2 Finding eigenvalues and eigenvectors
Let A be a given n × n matrix. Recall that the algebraic definition of
an eigenvalue-eigenvector pair is
Av = λv
where λ is a scalar and v is a nonzero column vector of length n.
We begin by rearranging and regrouping, and noting that v = I v
for any vector v, where I is the n × n identity matrix, as follows:
    Av − λv = 0
    Av − λIv = 0
( A − λI )v = 0
This is a homogeneous set of linear equations for the components
of v. If the matrix A − λI were invertible then the solution would
simply be v = 0 but this is not allowed by definition and in any
case would be of no practical use in applications. We hence require that A − λI not be invertible, and this will be the case if
det( A − λI) = 0 (11.1)
When we evaluate this determinant we will have a polynomial equation of degree n in the unknown λ. The solutions of this equation will be the required eigenvalues. Equation (11.1) is called the characteristic equation of the matrix A. The polynomial det(A − λI) is called the characteristic polynomial of A.
Example 11.2. Example 11.1 revisited. We start by forming
    A − λI = [2 6; 3 5] − λ [1 0; 0 1] = [2 − λ, 6; 3, 5 − λ]
The determinant of this matrix is
    det(A − λI) = (2 − λ)(5 − λ) − 18 = λ² − 7λ − 8
and hence the characteristic equation is
    λ² − 7λ − 8 = 0 ⇒ (λ − 8)(λ + 1) = 0 ⇒ λ = 8, −1
That is, the eigenvalues of the given matrix A are λ = 8 and λ = −1. For each eigenvalue λ we must solve the equation
( A − λI )v = 0
When λ = 8 we have
    [−6 6; 3 −3] [v1; v2] = [0; 0] ⇒ E8 = span{(1, 1)}
When λ = −1 we have
    [3 6; 3 6] [v1; v2] = [0; 0] ⇒ E−1 = span{(−2, 1)}
In solving these sets of equations we end up with the complete eigenspace
in each case, but we usually select just one member of each.
It is important to note that the reduced row echelon form of
( A − λI ) will always have at least one row of zeros. The number of zero
rows equals the dimension of the eigenspace.
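For a 2 × 2 matrix the characteristic equation is a quadratic, so the eigenvalues can be computed directly with the quadratic formula. A short Python sketch for the matrix of Example 11.2 (illustrative only):

```python
import math

# For A = [[a, b], [c, d]] the characteristic polynomial expands to
# lambda^2 - (a + d) lambda + (ad - bc), i.e. lambda^2 - trace(A) lambda + det(A)
a, b, c, d = 2, 6, 3, 5
trace, det = a + d, a * d - b * c          # 7 and -8
disc = math.sqrt(trace**2 - 4 * det)       # sqrt(81) = 9
eigenvalues = sorted([(trace - disc) / 2, (trace + disc) / 2])
assert eigenvalues == [-1.0, 8.0]
```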
Example 11.3. Consider the upper triangular matrix
    A = [1 2 6; 0 3 5; 0 0 4]
Since A is upper triangular, so is A − λI and hence its determinant is just
the product of the diagonal. Thus the characteristic equation is
(1 − λ)(3 − λ)(4 − λ) = 0 ⇒ λ = 1, 3, 4
Note that these are in fact the diagonal elements of A. The respective eigenspaces are
    E1 = span{(1, 0, 0)}   E3 = span{(1, 1, 0)}   E4 = span{(16, 15, 3)}
The eigenvalues, and corresponding eigenvectors, could be complex-valued. If the matrix A is real-valued then any complex roots of the characteristic polynomial, and hence any complex eigenvalues, will occur in complex conjugate pairs.
Example 11.4.
    A = [2 1; −5 4]
The characteristic equation is λ² − 6λ + 13 = 0. The eigenvalues are λ = 3 + 2i and λ = 3 − 2i. When we solve (A − λI)v = 0 we will get solutions containing complex numbers. Although we can't interpret them as vectors in R² there are many applications (particularly in Engineering) in which there is a natural interpretation in terms of the problem under investigation. The corresponding eigenvectors are
    v = [1 − 2i; 5] and v = [1 + 2i; 5]
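Python's built-in complex numbers make it easy to confirm both complex eigenpairs; the following check (an illustrative sketch, not from the notes) verifies Av = λv for each member of the conjugate pair:

```python
def matvec(A, v):
    # Matrix-vector product; works for complex entries too
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 1], [-5, 4]]
pairs = [(3 + 2j, [1 - 2j, 5]), (3 - 2j, [1 + 2j, 5])]
for lam, v in pairs:
    Av = matvec(A, v)
    # Integer/Gaussian-integer arithmetic here is exact, so equality is safe
    assert all(Av[i] == lam * v[i] for i in range(2))
```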
The characteristic polynomial may have repeated roots. If it factors into the form
    (λ − λ1)^m1 · · · (λ − λj)^mj · · · (λ − λp)^mp
we say that the algebraic multiplicity of λ = λj is mj. For example, if the characteristic polynomial were (λ + 3)(λ − 2)⁴(λ − 5) then the algebraic multiplicity of λ = 2 would be 4.
Example 11.5.
    A = [−2 2 −3; 2 1 −6; −1 −2 0]
The characteristic equation is
    −λ³ − λ² + 21λ + 45 = 0
and so
    (3 + λ)²(5 − λ) = 0 ⇒ λ = −3, −3, 5
Repeating the root −3 reflects the fact that λ = −3 has algebraic multiplicity 2.
To find the eigenvectors corresponding to λ = 5 we solve
    (A − 5I)v = 0 ⇒ [−7 2 −3; 2 −4 −6; −1 −2 −5] [v1; v2; v3] = [0; 0; 0]
After some work we arrive at the row echelon form of the augmented matrix
    [−1 −2 −5 0; 0 1 2 0; 0 0 0 0]
Note that we have one row of zeros. The solution will therefore involve one free parameter. Let v3 = t. The second row gives v2 = −2t and the first row then gives v1 = −t, and hence the eigenspace corresponding to λ = 5 is
    E5 = span{(1, 2, −1)}
Similarly, to find the eigenvectors corresponding to λ = −3 we solve
    (A + 3I)v = 0 ⇒ [1 2 −3; 2 4 −6; −1 −2 3] [v1; v2; v3] = [0; 0; 0]
It turns out that the reduced row echelon form has two rows of zeros and hence the solution will involve two free parameters. Let v2 = s and v3 = t. Calculation gives v1 = −2s + 3t and hence the eigenspace corresponding to λ = −3 is
    E−3 = span{(−2, 1, 0), (3, 0, 1)}
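All three spanning vectors found above can be checked directly against the definition Av = λv; a quick illustrative Python check:

```python
def matvec(A, v):
    # Multiply a matrix (list of rows) by a column vector
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[-2, 2, -3], [2, 1, -6], [-1, -2, 0]]
assert matvec(A, [1, 2, -1]) == [5, 10, -5]     # 5 times (1, 2, -1)
assert matvec(A, [-2, 1, 0]) == [6, -3, 0]      # -3 times (-2, 1, 0)
assert matvec(A, [3, 0, 1]) == [-9, 0, -3]      # -3 times (3, 0, 1)
```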
The dimension of the eigenspace of an eigenvalue is called its geometric multiplicity. In the above example the geometric multiplicity of λ = −3 is 2 and that of λ = 5 is 1. Note that the eigenspace corresponding to λ = −3 is a plane through the origin and the eigenspace corresponding to λ = 5 is a line through the origin, as displayed in Figure 11.2.
11.3 Some properties of eigenvalues and eigenvectors
Let A be an n × n matrix with eigenvalues λ1,λ2, · · · ,λn. (We
include all complex and repeated eigenvalues.) Then
• The determinant of the matrix A equals the product of the eigenvalues:
    det(A) = λ1 λ2 . . . λn
Figure 11.2: The eigenspaces for Example 11.5.
• The trace of a square matrix is the sum of its diagonal entries.
The trace of the matrix A equals the sum of the eigenvalues:
trace( A) ≡ a11 + a22 + · · · + ann = λ1 + λ2 + · · · + λn
Note that in both of these formulae all n eigenvalues must be counted.
• The eigenvalues of A⁻¹ (if it exists) are
    1/λ1, 1/λ2, · · · , 1/λn
• The eigenvalues of Aᵀ (that is, the transpose of A) are the same as for the matrix A:
    λ1, λ2, · · · , λn
• If k is a scalar then the eigenvalues of the matrix kA are
    kλ1, kλ2, · · · , kλn
• If k is a scalar and I the identity matrix then the eigenvalues of the matrix A + kI are
    λ1 + k, λ2 + k, · · · , λn + k
• If k is a positive integer then the eigenvalues of A^k are
    λ1^k, λ2^k, · · · , λn^k
• Any matrix polynomial in A:
    A^n + αn−1 A^(n−1) + · · · + α1 A + α0 I
has eigenvalues
    λ^n + αn−1 λ^(n−1) + · · · + α1 λ + α0 for λ = λ1, λ2, . . . , λn
• The Cayley-Hamilton Theorem: A matrix A satisfies its own characteristic equation; that is, if the characteristic equation is
    λ^n + cn−1 λ^(n−1) + · · · + c1 λ + c0 = 0
where c0, c1, · · · , cn−1 are constants, then
    A^n + cn−1 A^(n−1) + · · · + c1 A + c0 I = 0
• It can be shown that any set of vectors from different eigenspaces, that is, corresponding to different eigenvalues, is a linearly independent set.
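Several of these properties can be spot-checked for the 2 × 2 matrix of Examples 11.1 and 11.2, whose eigenvalues are 8 and −1. This Python sketch (illustrative only) verifies the determinant, trace, and Cayley-Hamilton properties:

```python
A = [[2, 6], [3, 5]]

# det(A) = product of eigenvalues, trace(A) = sum of eigenvalues
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
trace_A = A[0][0] + A[1][1]
assert det_A == 8 * (-1)       # -8
assert trace_A == 8 + (-1)     # 7

# Cayley-Hamilton: the characteristic equation is lambda^2 - 7 lambda - 8 = 0,
# so A^2 - 7A - 8I must be the zero matrix.
A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
residual = [[A2[i][j] - 7 * A[i][j] - (8 if i == j else 0) for j in range(2)] for i in range(2)]
assert residual == [[0, 0], [0, 0]]
```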
11.4 Diagonalisation
The eigenstructure (that is, the eigenvalue and eigenvector information) is often collected into matrix form. The eigenvalues (including multiplicities) are placed as the entries in a diagonal matrix D and one eigenvector for each eigenvalue is taken to form a matrix P.
Example 11.6. Example 11.5 revisited.
    A = [−2 2 −3; 2 1 −6; −1 −2 0]   P = [1 −2 3; 2 1 0; −1 0 1]   D = [5 0 0; 0 −3 0; 0 0 −3]
Note that the columns of D and P must correspond, e.g. λ = 5 is in
column 1 of D so the corresponding eigenvector must be in column 1 of P.
The following relationship between the matrices A, P and D will
always hold:
AP = PD (11.2)
We can see this by writing P = [v1 | v2 | · · · | vn] in terms of its eigenvector columns. The ith column of AP is Avi, while the ith column of PD is λi vi, so the ith column of equation (11.2) is just the eigenvalue equation for λi, that is
    Avi = λi vi
For Example 11.6:
    AP = [5 6 −9; 10 −3 0; −5 0 −3] = PD
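The identity AP = PD for Example 11.6 can be confirmed with a hand-rolled matrix product in Python (an illustrative check only):

```python
def matmul(X, Y):
    # Product of two square matrices given as lists of rows
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[-2, 2, -3], [2, 1, -6], [-1, -2, 0]]
P = [[1, -2, 3], [2, 1, 0], [-1, 0, 1]]
D = [[5, 0, 0], [0, -3, 0], [0, 0, -3]]
assert matmul(A, P) == matmul(P, D) == [[5, 6, -9], [10, -3, 0], [-5, 0, -3]]
```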
If matrix P is invertible then equation (11.2) can be re-written as:
    A = P D P⁻¹ (11.3)
or
    D = P⁻¹ A P (11.4)
Two matrices M and N are called similar matrices if there exists an invertible matrix Q such that
    N = Q⁻¹ M Q
Clearly A and the diagonal matrix D constructed from the eigenvalues of A are similar.
Diagonalisation is the process of determining a matrix P such that
    D = P⁻¹ A P
All we need to do is to find the eigenvalues and eigenvectors of A and form P as described above. Note, however, that not every matrix is diagonalisable.
Example 11.7.
    A = [1 −1 0; 0 1 1; 0 0 −2]
The eigenvalues of A are λ = −2, 1, 1 (that is, λ = 1 has algebraic multiplicity 2).
The eigenspace corresponding to λ = −2 is
    E−2 = span{(1, 3, −9)}
The eigenspace corresponding to λ = 1 is
    E1 = span{(1, 0, 0)}
Both λ = −2 and λ = 1 have geometric multiplicity 1. In order for P to
be invertible its columns must be linearly independent. Unfortunately,
there are not enough linearly independent eigenvectors to enable us to
build matrix P. Hence matrix A is not diagonalisable.
If an n × n matrix has n distinct eigenvalues then it will be diagonalisable, because each eigenvalue will give a representative eigenvector and these will be linearly independent because they correspond to different eigenvalues.
A matrix A is called a symmetric matrix if it equals its transpose, that is
    A = Aᵀ
It can be shown that the eigenvalues of a real symmetric n × n matrix A are all real and that we can always find enough linearly independent eigenvectors to form matrix P, even if there are fewer than n distinct eigenvalues. That is, a real symmetric matrix can always be diagonalised.
11.5 Linear Systems
A homogeneous system of linear differential equations is a system of the form
    x′ = A x
where
    x = [x1(t); . . . ; xn(t)] and A = [a11 . . . a1n; . . . ; an1 . . . ann], a constant matrix.
We'll see later that the general solution is a linear combination of certain vector solutions (which we'll label with a superscript bracketed number), x^(1)(t), x^(2)(t), etc.:
    x(t) = c1 x^(1)(t) + · · · + cn x^(n)(t)
where the constants c1, · · · , cn are called combination coefficients.
There is a connection between the solution of the system of differential equations and the eigenstructure of the associated matrix A.
Example 11.8. Consider the second-order differential equation
    x′′ + 2x′ − 3x = 0
Let x1 = x and x2 = x′. The derivatives of the new variables are
    x1′ = x′ = x2   x2′ = x′′ = 3x − 2x′ = 3x1 − 2x2
In matrix-vector form this is
    [x1′; x2′] = [0 1; 3 −2] [x1; x2]
The (eigenvalue) characteristic equation for the matrix is λ² + 2λ − 3 = 0. Note the similarity between this and the original differential equation. In order to solve the original problem we would try for a solution of the form x = e^(mt). Then x′ = m e^(mt) and x′′ = m² e^(mt) and substitution gives
    m² e^(mt) + 2m e^(mt) − 3e^(mt) = 0 ⇒ (m² + 2m − 3) e^(mt) = 0
Divide throughout by e^(mt) (which is never zero) to get m² + 2m − 3 = 0. This is the characteristic equation in m instead of λ.
We shall use a method based on eigenstructure to solve systems
of differential equations.
11.6 Solutions of homogeneous linear systems of differential equa-
tions
When n = 1 the system is simply x′ = ax where a = a11. The solution is x(t) = c e^(at) where c is a constant. To solve a homogeneous linear system, we seek solutions of the form
    x(t) = v e^(λt) (11.5)
where λ is a constant and v is a constant vector. Substitution into the system gives
    x′ = A x ⇒ λ v e^(λt) = A v e^(λt) ⇒ λ v = A v
This is the eigenvalue-eigenvector problem, so as soon as we know the eigenvalues and eigenvectors we can write down the component solutions just by substituting into equation (11.5).
Example 11.9. Example 11.8 revisited.
    [x1′; x2′] = [0 1; 3 −2] [x1; x2]
The eigenstructure of the matrix is
    v1 = [1; 1]   v−3 = [1; −3]
The two component solutions are
    x^(1)(t) = [1; 1] e^t   x^(2)(t) = [1; −3] e^(−3t)
and the general solution is
    x = c1 [1; 1] e^t + c2 [1; −3] e^(−3t)
In scalar form, the solutions are
    x ≡ x1 = c1 e^t + c2 e^(−3t)   x′ ≡ x2 = c1 e^t − 3 c2 e^(−3t)
Note that both x1 (position) and x2 (velocity) satisfy the original differential equation. This will be so if the system was derived from one higher order differential equation.
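As a sanity check, the scalar solution can be substituted back into x′′ + 2x′ − 3x = 0. This Python sketch writes the derivatives out analytically, with constants c1 = 2, c2 = −1 chosen arbitrarily for illustration:

```python
import math

c1, c2 = 2.0, -1.0   # arbitrary illustration constants

def x(t):
    return c1 * math.exp(t) + c2 * math.exp(-3 * t)

def xp(t):   # first derivative
    return c1 * math.exp(t) - 3 * c2 * math.exp(-3 * t)

def xpp(t):  # second derivative
    return c1 * math.exp(t) + 9 * c2 * math.exp(-3 * t)

# x'' + 2x' - 3x should vanish identically
residual = max(abs(xpp(t) + 2 * xp(t) - 3 * x(t)) for t in (0.0, 0.5, 1.0))
assert residual < 1e-9
```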
Example 11.10. The second order system
    x′(t) = A x(t),  x = [x1; x2],  A = [1 2; 5 4]
The eigensolution is
    v6 = [2; 5]   v−1 = [1; −1]
Hence the two component solutions are
    x^(1)(t) = [2; 5] e^(6t)   x^(2)(t) = [1; −1] e^(−t)
The general solution is
    x(t) = c1 [2; 5] e^(6t) + c2 [1; −1] e^(−t)
Example 11.11. The third order system
    x′(t) = A x(t),  x = [x1; x2; x3],  A = [1 2 3; 0 4 5; 0 0 6]
The eigensolution is
    v1 = [1; 0; 0]   v4 = [2; 3; 0]   v6 = [16; 25; 10]
The general solution is
    x(t) = c1 [1; 0; 0] e^t + c2 [2; 3; 0] e^(4t) + c3 [16; 25; 10] e^(6t)
Example 11.12. Repeated eigenvalue example. The third order system
    x′(t) = A x(t),  x = [x1; x2; x3],  A = [−2 2 −3; 2 1 −6; −1 −2 0]
The eigensolution is
    v5 = [1; 2; −1]   v−3 = [−2; 1; 0] and [3; 0; 1]
The general solution is
    x(t) = c1 [1; 2; −1] e^(5t) + (c2 [−2; 1; 0] + c3 [3; 0; 1]) e^(−3t)
11.7 A change of variables approach to finding the solutions
Equations (11.3) and (11.4) can be used to simplify difficult problems: use P and P⁻¹ to turn the original problem (involving A) into an easier problem (involving the diagonal matrix D), solve the easier problem, and then transform back to obtain the solution of the original problem.
Introduce a vector variable y(t) through the relationship
    x(t) = P y(t) (11.6)
where the columns of matrix P are the eigenvectors of A. Substitute this change of variables into the system of differential equations:
    x′ = A x ⇒ P y′ = A P y ⇒ y′ = P⁻¹ A P y
We know that P⁻¹ A P = D so the new system can be written as
    y′(t) = D y(t)
In expanded notation this new system of differential equations is
    [y1′; . . . ; yn′] = [λ1 . . . 0; . . . ; 0 . . . λn] [y1; . . . ; yn]
Each row of this matrix-vector equation is a simple scalar differential equation for the unknown functions y1(t), · · · , yn(t). They are
    y1′ = λ1 y1,  y2′ = λ2 y2,  · · · ,  yn′ = λn yn
The solutions of these are easily obtained:
    y1(t) = c1 e^(λ1 t),  y2(t) = c2 e^(λ2 t),  · · · ,  yn(t) = cn e^(λn t)
and substituting these into equation (11.6) gives the required solutions.
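The identity P⁻¹AP = D that drives this change of variables can be checked exactly for the matrix of Example 11.9, using Python's fractions module to avoid rounding (an illustrative sketch with a hand-rolled 2 × 2 inverse):

```python
from fractions import Fraction as F

A = [[F(0), F(1)], [F(3), F(-2)]]
P = [[F(1), F(1)], [F(1), F(-3)]]   # eigenvector columns for lambda = 1 and lambda = -3

# Inverse of a 2x2 matrix via the adjugate formula
detP = P[0][0] * P[1][1] - P[0][1] * P[1][0]
Pinv = [[P[1][1] / detP, -P[0][1] / detP], [-P[1][0] / detP, P[0][0] / detP]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

D = matmul(Pinv, matmul(A, P))
assert D == [[1, 0], [0, -3]]   # diagonal matrix of eigenvalues
```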
12
Change of Basis
12.1 Introduction
Recall that the standard basis in R² is
    S = {[1; 0], [0; 1]}
Any vector in R² can be written as a linear combination of these, for example,
    [2; 3] = 2 [1; 0] + 3 [0; 1]
Recall also that the standard basis in R³ is
    S = {[1; 0; 0], [0; 1; 0], [0; 0; 1]}
Any vector in R³ can be written as a linear combination of these, for example,
    [−4; 2; −7] = −4 [1; 0; 0] + 2 [0; 1; 0] − 7 [0; 0; 1]
In many areas of mathematics, both pure and applied, there is
good reason to opt to use a different basis than the standard one.
Judicious choice of a different basis will greatly simplify the problem. We saw this idea in Section 11.7 where we used a specially chosen (based on eigenstructure) change of variables to greatly simplify a system of differential equations. This plan of attack (that is,
a well chosen change of basis) has many uses.
Recall that a set of vectors B = {u1, u2, · · · , un} is a basis for a
subspace W of Rm if B is linearly independent and spans W . Note that
W may be all of Rm.
Example 12.1. Let W_P be the plane 5x + 11y − 9z = 0 through the origin in R³. A basis for this subspace is
    B = {[1; 2; 3], [4; −1; 1]}
We can verify this by observing that the basis vectors are clearly linearly independent (there are only two of them and one is not a multiple of the other) and noting that
    [1; 2; 3] × [4; −1; 1] = [5; 11; −9], a vector normal to W_P
Any vector v in W_P can be written as a linear combination of the basis vectors, for example,
    v = [−5; 8; 7] = 3 [1; 2; 3] − 2 [4; −1; 1]
Note that 5(−5) + 11(8) − 9(7) = 0.
In general, the task of finding the required linear combination of
basis vectors u1, u2, · · · , un that produce a given target vector v is
that of solving a set of simultaneous linear equations:
Find α1, α2, · · · , αn such that v = α1u1 + α2u2 + · · · + αnun
Example 12.2. Example continued. Express w = (16, −13, −7) in
terms of the basis B.
Solution. We must solve:
    [16; −13; −7] = α1 [1; 2; 3] + α2 [4; −1; 1] ⇒ α1 + 4α2 = 16, 2α1 − α2 = −13, 3α1 + α2 = −7
and so α1 = −4, α2 = 5.
The fact that we have found a solution means that w = (16, −13, −7) lies in the plane W_P. If it did not then the system of equations for α1, α2 would be inconsistent.
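The coordinate computation amounts to solving a small linear system; here is an illustrative Python sketch that solves the first two equations by Cramer's rule (with exact fractions) and then confirms the third equation is consistent:

```python
from fractions import Fraction as F

#   a1 + 4 a2 =  16
# 2 a1 -  a2 = -13
# 3 a1 +  a2 =  -7   (consistency check)
det = 1 * (-1) - 4 * 2                 # determinant of the first 2x2 block, -9
a1 = F(16 * (-1) - 4 * (-13), det)     # Cramer's rule
a2 = F(1 * (-13) - 2 * 16, det)
assert (a1, a2) == (-4, 5)
assert 3 * a1 + a2 == -7               # w really does lie in the plane W_P
```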
The coefficients α1, α2, · · · , αn are called the coordinates of v in the
basis B and we write
    [v]_B = [α1; . . . ; αn]
Example 12.3. Example continued. The coordinates of v and w in the basis B are
    [v]_B = [3; −2]   [w]_B = [−4; 5]
As mentioned before it is often useful to change the basis that
one is working with. Suppose that instead of the basis B in the
above example we chose to use another basis, say
    C = {[5; 1; 4], [−3; 3; 2]}
Example 12.4. Example 12.1 revisited again. The required coordinates are
    [(1, 2, 3)]_C = [1/2; 1/2]   [(4, −1, 1)]_C = [1/2; −1/2]
and hence
    P_CB = [1/2 1/2; 1/2 −1/2] and P_BC = P_CB⁻¹ = [1 1; 1 −1]
We can verify these. Recall that [v]_B = [3, −2] and [w]_C = [1/2, −9/2]. Then
    [v]_C = P_CB [v]_B = [1/2 1/2; 1/2 −1/2] [3; −2] = [1/2; 5/2]
which agrees with what we got before, and
    [w]_B = P_BC [w]_C = [1 1; 1 −1] [1/2; −9/2] = [−4; 5]
which also agrees with what we got before.
12.3 Linear transformations and change of bases
Recall that linear transformations can be represented using matrices; for example, counterclockwise rotation through an angle θ in R², using the standard basis for the original and transformed vectors, has transformation matrix
    A_SS = [cos θ −sin θ; sin θ cos θ]
We have used the subscript SS to indicate that the standard basis is being used to represent the original vectors and also the rotated vectors. For example, the vector e2 = [0, 1] under a rotation of π/2 becomes
    [0 −1; 1 0] [0; 1] = [−1; 0] = −e1, as expected.
If we desire to use different bases to represent the vectors, say basis B for the original vectors v and basis C for the transformed vectors v′, then we shall label the transformation matrix A_CB and the linear transformation will be
    [v′]_C = A_CB [v]_B (12.1)
We need a way of deducing A_CB. This can be achieved by employing the change of basis matrices P_BS and P_CS where
    [v]_B = P_BS [v]_S and [v′]_C = P_CS [v′]_S
Substitution of these formulae in equation (12.1) gives
    P_CS [v′]_S = A_CB P_BS [v]_S ⇒ [v′]_S = P_SC A_CB P_BS [v]_S
(recalling that P_CS = P_SC⁻¹). Using the standard basis for the transformation would be [v′]_S = A_SS [v]_S and hence we must have A_SS = P_SC A_CB P_BS, which we can rearrange to get the linear transformation change of basis formula
    A_CB = P_CS A_SS P_SB
Note that if we use the same basis B for both the original and transformed vectors then we have
    A_BB = P_BS A_SS P_SB (12.2)
Recall that two matrices M and N are similar if there exists an invertible matrix Q such that N = Q⁻¹ M Q. Equation (12.2) tells us that all linear transformation matrices in which the same basis is used for the original and for the transformed vectors are similar. Diagonalisation is an example of this situation when matrix A represents some linear transformation in the standard bases. If we switch to bases composed of the eigenvectors of A then the matrix representation of the linear transformation takes on the much simpler form D, a diagonal matrix.
Example 12.5. We determine the change of basis matrix from the standard basis S to a basis
    B = {u1, u2, · · · , un}
We note that [ui]_S = ui for all i = 1, · · · , n and hence we can immediately write
    P_SB = [u1 u2 · · · un]
Example 12.6. We determine the transformation matrix for counterclockwise rotation through an angle θ in R² using the basis
    B = {[1; −1], [1; 1]}
to represent both the original vectors and the transformed vectors.
We need to calculate P_SB and P_BS. We can write down immediately that
    P_SB = [1 1; −1 1] and P_BS = P_SB⁻¹ = (1/2) [1 −1; 1 1]
and so the desired transformation matrix is
    A_BB = P_BS A_SS P_SB = (1/2) [1 −1; 1 1] [cos θ −sin θ; sin θ cos θ] [1 1; −1 1]
        = (1/2) [1 −1; 1 1] [cos θ + sin θ, cos θ − sin θ; sin θ − cos θ, sin θ + cos θ]
        = [cos θ −sin θ; sin θ cos θ]
That is, A_BB = A_SS, but if we think about the geometry then this makes sense.
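The conclusion A_BB = A_SS can be confirmed numerically for a particular angle; this Python sketch (illustrative only, with θ = π/5 chosen arbitrarily) carries out the triple product:

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

theta = math.pi / 5   # arbitrary test angle
ASS = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
PSB = [[1, 1], [-1, 1]]
PBS = [[0.5, -0.5], [0.5, 0.5]]   # inverse of PSB

ABB = matmul(PBS, matmul(ASS, PSB))
err = max(abs(ABB[i][j] - ASS[i][j]) for i in range(2) for j in range(2))
assert err < 1e-12
```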
Example 12.7. We determine the transformation matrix for counterclockwise rotation through an angle θ in R² using the basis
    C = {[1; 0], [1; 1]}
to represent both the original vectors and the transformed vectors.
We need to calculate P_SC and P_CS. We can write down immediately that
    P_SC = [1 1; 0 1] and P_CS = P_SC⁻¹ = [1 −1; 0 1]
and so the desired transformation matrix is
    A_CC = P_CS A_SS P_SC = [1 −1; 0 1] [cos θ −sin θ; sin θ cos θ] [1 1; 0 1]
        = [1 −1; 0 1] [cos θ, cos θ − sin θ; sin θ, sin θ + cos θ]
        = [cos θ − sin θ, −2 sin θ; sin θ, sin θ + cos θ]
Example 12.8. The matrix of a particular linear transformation in R³ in which the standard basis has been used for both the original and transformed vectors is
    A_SS = [1 3 4; 2 −1 1; −3 5 1]
Determine the matrix of the linear transformation if basis B is used for the original vectors and basis C is used for the transformed vectors, where
    B = {[0; 0; 1], [0; 1; 0], [1; 0; 0]}   C = {[1; 1; 1], [0; 1; 1], [0; 0; 1]}
Solution. First we need to calculate P_CS and P_SB. We can immediately write down P_SB and P_SC, and use the fact that P_CS = P_SC⁻¹. We have
    P_SB = [0 0 1; 0 1 0; 1 0 0] and P_SC = [1 0 0; 1 1 0; 1 1 1] ⇒ P_CS = P_SC⁻¹ = [1 0 0; −1 1 0; 0 −1 1]
and hence
    A_CB = P_CS A_SS P_SB = [4 3 1; −3 −4 1; 0 6 −5]
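The final triple product can be verified with integer matrix arithmetic in Python (an illustrative check of the solution above):

```python
def matmul(X, Y):
    # Product of two square matrices given as lists of rows
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

ASS = [[1, 3, 4], [2, -1, 1], [-3, 5, 1]]
PSB = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
PCS = [[1, 0, 0], [-1, 1, 0], [0, -1, 1]]

ACB = matmul(PCS, matmul(ASS, PSB))
assert ACB == [[4, 3, 1], [-3, -4, 1], [0, 6, -5]]
```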
Example 12.9. Recall that a square matrix A is diagonalisable if there exists an invertible matrix P such that D = P⁻¹ A P where D is a diagonal matrix. This is in fact a change of basis formula. If A represents a linear transformation using the standard basis for both the original and transformed vectors, that is, A = A_SS, then we can interpret P as representing a change of basis, from a new basis B to the standard basis S (yes, it has to
Part IV
Sequences and Series: by
Luchezar Stoyanov,
Jennifer Hopwood and
Michael Giudici
13
Sequences and Series
13.1 Sequences
By a sequence we mean an infinite sequence of real numbers:
    a1, a2, a3, . . . , an, . . .
We denote such a sequence by {an} or {an}_{n=1}^∞. Sometimes our sequences will start with am for some m ≠ 1.
Example 13.1. 1. The sequence
    1, 1/2, 1/3, . . . , 1/n, . . .
Here an = 1/n for all integers n ≥ 1.
2. bn = (−1)^n n³ for n ≥ 1, defines the sequence
    −1, 2³, −3³, 4³, −5³, 6³, . . .
3. For any integer n ≥ 1, define an = 1 when n is odd, and an = 0 when
n is even. This gives the sequence
1,0,1,0,1,0,1,0, . . .
4. The sequence of the so-called Fibonacci numbers is defined recursively as follows: a1 = a2 = 1, an+2 = an + an+1 for n ≥ 1. This is
then the sequence
1,1,2,3,5,8,13, . . .
In the same way that we could define the limit as x → ∞ of a function f(x) we can also define the limit of a sequence. This is not surprising since a sequence can be regarded as a function with the domain being the set of positive integers. (Again this definition can be made more precise along the lines of Section 6.1.1 in the following manner: we say that {an} has a limit L if for every ε > 0 there exists a positive integer N such that |an − L| < ε for all n ≥ N.)
Definition 13.2. (Intuitive definition of the limit of a sequence)
Let {an} be a sequence and L be a real number. We say that {an} has a limit L if we can make an arbitrarily close to L by taking n to be sufficiently large. We denote this situation by
    lim_{n→∞} an = L
We say that {an} is convergent if lim_{n→∞} an exists; otherwise we say that {an} is divergent.
Example 13.3. 1. Let an = b for all n ≥ 1. Then lim_{n→∞} an = b.
2. Consider the sequence an = 1/n (n ≥ 1). Then lim_{n→∞} an = 0.
3. If α > 0 is a constant (that is, does not depend on n) and an = 1/n^α for any n ≥ 1, then lim_{n→∞} an = 0. For instance, taking α = 1/2 gives
    1, 1/√2, 1/√3, . . . , 1/√n, . . . → 0
4. Consider the sequence {an} from Example 13.1(3) above, that is
    1, 0, 1, 0, 1, 0, 1, 0, . . .
This sequence is divergent.
Just as for limits of functions of a real variable we have Limit
Laws and a Squeeze Theorem:
Theorem 13.4 (Limit Laws). Let {an} and {bn} be convergent sequences with lim_{n→∞} an = a and lim_{n→∞} bn = b. Then:
1. lim_{n→∞} (an ± bn) = a ± b.
2. lim_{n→∞} (c an) = c a for any constant c ∈ R.
3. lim_{n→∞} (an bn) = a b.
4. If b ≠ 0 and bn ≠ 0 for all n, then lim_{n→∞} an/bn = a/b.
Theorem 13.5 (The Squeeze Theorem or The Sandwich Theorem). Let {an}, {bn} and {cn} be sequences such that lim_{n→∞} an = lim_{n→∞} cn = a and
    an ≤ bn ≤ cn
for all n ≥ 1. Then the sequence {bn} is also convergent and lim_{n→∞} bn = a.
We can use Theorems 13.4 and 13.5 to calculate limits of various
sequences.
Example 13.6. Using Theorem 13.4 several times:
    lim_{n→∞} (n² − n + 1)/(3n² + 2n − 1)
        = lim_{n→∞} [n² (1 − 1/n + 1/n²)] / [n² (3 + 2/n − 1/n²)]
        = [lim_{n→∞} (1 − 1/n + 1/n²)] / [lim_{n→∞} (3 + 2/n − 1/n²)]
        = [lim_{n→∞} 1 − lim_{n→∞} 1/n + lim_{n→∞} 1/n²] / [lim_{n→∞} 3 + 2 lim_{n→∞} 1/n − lim_{n→∞} 1/n²]
        = 1/3
Here we used the fact that lim_{n→∞} 1/n² = 0 (see Example 13.3(3)).
Example 13.7. Find lim_{n→∞} cos(n)/n if it exists.
Solution. (Note: Theorem 13.4 is not applicable here, since lim_{n→∞} cos(n) does not exist.) Since −1 ≤ cos(n) ≤ 1 for all n, we have
    −1/n ≤ cos(n)/n ≤ 1/n
for all n ≥ 1. Using the Squeeze Theorem and the fact that
    lim_{n→∞} (−1/n) = lim_{n→∞} 1/n = 0
it follows that lim_{n→∞} cos(n)/n = 0.
A particular way for a sequence to be divergent is if it diverges to ∞ or −∞.
Definition 13.8. (Diverging to infinity)
We say that the sequence {an}_{n=1}^∞ diverges to ∞ if given any positive number M we can always find a point in the sequence after which all terms are greater than M. We denote this by lim_{n→∞} an = ∞ or an → ∞.
Similarly, we say that {an}_{n=1}^∞ diverges to −∞ if given any negative number M we can always find a point in the sequence after which all terms are less than M. We denote this by lim_{n→∞} an = −∞ or an → −∞.
Note, as in the limit of a function case, when we write lim_{n→∞} an = ∞ we do not mean that the limit exists and is equal to some special number ∞. We are just using a combination of symbols which it has been agreed will be taken to mean "the limit does not exist and the reason it does not exist is that the terms in the sequence increase without bound". In particular, ∞ IS NOT a number, so do not try to do arithmetic with it. If you do you will almost certainly end up with incorrect results, sooner or later, and you will always be writing nonsense.
Note that it follows from the definitions that if an → −∞, then |an| → ∞.
Example 13.9. 1. Let an = n and bn = −n² for all n ≥ 1. Then an → ∞, while bn → −∞, and |bn| = n² → ∞.
2. an = (−1)^n n does not diverge to ∞, and an does not diverge to −∞ either. However, |an| = n → ∞.
3. Let r > 1 be a constant. Then r^n → ∞. If r < −1, then r^n does not diverge to ∞ or −∞. However, |r^n| = |r|^n → ∞.
4. If an ≠ 0 for all n, then an → 0 if and only if 1/|an| → ∞. Similarly, if an > 0 for all n ≥ 1, then an → ∞ if and only if 1/an → 0. (These properties can be easily derived from the definitions.)
13.1.1 Bounded sequences
Definition 13.10. (Bounded)
A sequence {an}_{n=1}^∞ is called bounded above if an is less than or equal to some number (an upper bound) for all n. Similarly, it is called
bounded below if an is greater than or equal to some number (a lower
bound) for all n.
We say that a sequence is bounded if it has both an upper and lower
bound.
In Example 13.1, the sequences in parts (1) and (3) are bounded,
the one in part (2) is bounded neither above nor below, while the
sequence in part (4) is bounded below but not above.
Theorem 13.11. Every convergent sequence is bounded.
It is important to note that the converse statement in Theorem
13.11 is not true; there exist bounded sequences that are divergent.
See Example 13.1(3) above. So this theorem is not used to prove
that sequences are convergent but to prove that they are not, as in
the example below.
Example 13.12. The sequence an = (−1)^n n³ is not bounded, so by Theorem 13.11 it is divergent.
An upper bound of a sequence is not unique. For example, both 1 and 2 are upper bounds for the sequence an = 1/n. This motivates the following definition.
Definition 13.13. (Supremum and infimum)
Let A ⊂ R. The least upper bound of A (whenever A is bounded above) is
called the supremum of A and is denoted sup A.
Similarly, the greatest lower bound of A (whenever A is bounded be-
low) is called the infimum of A and is denoted inf A.
Notice that if the set A has a maximum, then sup A is the largest
element of A. Similarly, if A has a minimum, then inf A is the
smallest element of A. However, sup A always exists when A is
bounded above even when A has no maximal element. For example, if A = (0, 1) then A does not have a maximal element (for any a ∈ A there is always b ∈ A such that a < b). However, it has a
supremum and sup A = 1. Similarly, inf A always exists when A
is bounded below, regardless of whether A has a minimum or not.
For example, if A = (0, 1) then inf A = 0.
Definition 13.14. (Monotone)
A sequence {a_n} is called monotone if it is either non-decreasing or non-
increasing, that is, if either a_n ≤ a_{n+1} for all n or a_n ≥ a_{n+1} for all
n.
We can now state one important property of sequences.
Theorem 13.15 (The Monotone Sequences Theorem). If the sequence
{a_n}_{n=1}^∞ is non-decreasing and bounded above, then the sequence
is convergent and
lim_{n→∞} a_n = sup({a_n}).
If {a_n}_{n=1}^∞ is non-increasing and bounded below, then {a_n} is convergent
and
lim_{n→∞} a_n = inf({a_n}).
That is, every monotone bounded sequence is convergent.
Theorem 13.15 and the definition of divergence to ±∞ imply
the following:
Key Concept 13.16. For any non-decreasing sequence
a_1 ≤ a_2 ≤ a_3 ≤ ... ≤ a_n ≤ a_{n+1} ≤ ...
there are only two options:
Option 1. The sequence is bounded above. Then by the Monotone
Sequences Theorem the sequence is convergent, that is, lim_{n→∞} a_n exists.
Option 2. The sequence is not bounded above. It then follows
from the definition of divergence to ∞ that lim_{n→∞} a_n = ∞.
Similarly, for any non-increasing sequence
b_1 ≥ b_2 ≥ b_3 ≥ ... ≥ b_n ≥ b_{n+1} ≥ ...
either the sequence is bounded below, and then lim_{n→∞} b_n exists, or the
sequence is not bounded below, and then lim_{n→∞} b_n = −∞.
Example 13.17. Let a_n = 1/n for all n ≥ 1. As we know,
lim_{n→∞} a_n = 0 = inf{1/n | n = 1, 2, ...}.
This just confirms Theorem 13.15.
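As an illustration (not part of the original text), the Monotone Sequences Theorem can be watched numerically. The sequence a_n = 1 − 1/n, chosen here purely as an example, is non-decreasing and bounded above by 1, and its terms climb towards sup{a_n} = 1; a minimal Python sketch:

```python
# Sketch of the Monotone Sequences Theorem for a_n = 1 - 1/n
# (an illustrative choice: non-decreasing and bounded above by 1).
def a(n):
    return 1.0 - 1.0 / n

terms = [a(n) for n in range(1, 10001)]

# the sequence is non-decreasing ...
assert all(terms[i] <= terms[i + 1] for i in range(len(terms) - 1))
# ... bounded above by 1 ...
assert all(t <= 1.0 for t in terms)
# ... and the terms approach the supremum, 1
print(terms[9], terms[99], terms[9999])  # 0.9 0.99 0.9999
```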
Theorem 13.18. 1. For every real constant α > 0,
lim_{n→∞} (ln n)/n^α = 0.
2.
lim_{n→∞} ⁿ√n = lim_{n→∞} n^{1/n} = 1.
3. For every constant a ∈ R,
lim_{n→∞} a^n/n! = 0.
4. For every constant a ∈ R,
lim_{n→∞} (1 + a/n)^n = e^a.
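These standard limits can be checked numerically for sample values of the constants (the choices α = 1/2 and a = 3 below are ours, for illustration only); a Python sketch:

```python
import math

n = 10**6
# 1. ln(n)/n^alpha -> 0 for any alpha > 0 (here alpha = 1/2)
print(math.log(n) / n**0.5)          # already small
# 2. n^(1/n) -> 1
print(n**(1.0 / n))                  # very close to 1
# 3. a^n/n! -> 0 (here a = 3, evaluated at n = 50)
print(3.0**50 / math.factorial(50))  # essentially 0
# 4. (1 + a/n)^n -> e^a (here a = 3)
print((1.0 + 3.0 / n)**n, math.exp(3.0))
```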
13.2 Infinite Series
Definition 13.19. (Infinite series)
An infinite series is by definition an expression of the form
∑_{n=1}^∞ a_n = a_1 + a_2 + ... + a_n + ...   (13.1)
where a_1, a_2, ..., a_n, ... is a sequence of real numbers.
Sometimes we will label the series by ∑_{n=0}^∞ a_n = a_0 + a_1 + a_2 + ....
(It is important to be clear about the difference between a sequence and
a series. These terms are often used interchangeably in ordinary English,
but in mathematics they have distinctly different and precise meanings.)
Since an infinite series involves the sum of infinitely many terms,
this raises the issue of what the sum actually is. We can deal with
this in a precise manner by the following definition.
Definition 13.20. (Convergent and divergent series)
For every n ≥ 1,
s_n = a_1 + a_2 + ... + a_n
is called the nth partial sum of the series in (13.1). If lim_{n→∞} s_n = s, we say
that the infinite series is convergent and write
∑_{n=1}^∞ a_n = s.
The number s is then called the sum of the series.
If lim_{n→∞} s_n does not exist, we say that the infinite series (13.1) is divergent.
Note that when a series is convergent we have
∑_{n=1}^∞ a_n = lim_{n→∞} (a_1 + a_2 + ... + a_n).
Example 13.21. 1. The interval (0, 1] can be covered by subintervals
as follows: first take the subinterval [1/2, 1] (of length 1/2), then the
subinterval [1/4, 1/2] (of length 1/4), then the subinterval [1/8, 1/4]
(of length 1/8), etc. The total length of these subintervals is 1, which
implies
1 = 1/2 + 1/4 + 1/8 + ... + 1/2^n + ... .
2. Given a constant r ∈ R, consider the geometric series
∑_{n=0}^∞ r^n = 1 + r + r^2 + ... + r^{n−1} + ...
For the nth partial sum of the above series we have
s_n = 1 + r + r^2 + ... + r^{n−1} = (1 − r^n)/(1 − r)
when r ≠ 1.
(a) Let |r| < 1. Then
lim_{n→∞} s_n = lim_{n→∞} (1 − r^n)/(1 − r) = 1/(1 − r),
since lim r^n = 0 whenever |r| < 1. Thus the geometric series is
convergent in this case and we write
1/(1 − r) = 1 + r + r^2 + ... + r^{n−1} + ...   (13.2)
Going back to part (1), now rigorously we have
∑_{n=1}^∞ 1/2^n = −1 + ∑_{n=0}^∞ 1/2^n = −1 + 1/(1 − 1/2) = 1.
(b) If |r| > 1, then lim_{n→∞} r^n does not exist (in fact |r^n| → ∞ as n → ∞),
so s_n = (1 − r^n)/(1 − r) does not have a limit; that is, the geometric
series is divergent.
(c) If r = 1, then s_n = n → ∞ as n → ∞, so the geometric series is
again divergent.
(d) Let r = −1. Then s_n = 1 for odd n and s_n = 0 for even n. Thus the
sequence of partial sums is 1, 0, 1, 0, 1, 0, ..., which is a divergent
sequence. Hence the geometric series is again divergent.
Conclusion: the geometric series is convergent if and only if |r| < 1.
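This dichotomy is easy to observe by computing partial sums directly; a Python sketch (illustrative, not part of the original text):

```python
def geometric_partial_sum(r, n):
    """s_n = 1 + r + r^2 + ... + r^(n-1), computed term by term."""
    return sum(r**k for k in range(n))

# |r| < 1: the partial sums approach 1/(1 - r)
r = 0.5
print(geometric_partial_sum(r, 50), 1.0 / (1.0 - r))  # both about 2

# r = -1: the partial sums oscillate 1, 0, 1, 0, ... and never settle
print([geometric_partial_sum(-1, n) for n in range(1, 7)])  # [1, 0, 1, 0, 1, 0]
```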
Theorem 13.22. If the series ∑_{n=1}^∞ a_n is convergent, then lim_{n→∞} a_n = 0.
The above theorem says that lim_{n→∞} a_n = 0 is necessary for
convergence. The theorem is most useful in the following equivalent
form.
Theorem 13.23 (Test for Divergence). If the sequence {a_n} does not
converge to 0, then the series ∑_{n=1}^∞ a_n is divergent.
Example 13.24. The series ∑_{n=1}^∞ (n^2 + 2n + 1)/(−3n^2 + 4) is divergent by the Test for
Divergence, since
lim_{n→∞} a_n = lim_{n→∞} (n^2 + 2n + 1)/(−3n^2 + 4) = lim_{n→∞} (1 + 2/n + 1/n^2)/(−3 + 4/n^2) = −1/3 ≠ 0.
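Tabulating the terms of this series makes the failure of the necessary condition visible (an illustrative Python sketch, not part of the original text):

```python
def a(n):
    # the general term of the series in Example 13.24
    return (n**2 + 2 * n + 1) / (-3.0 * n**2 + 4)

for n in (10, 1000, 100000):
    print(n, a(n))  # the terms approach -1/3, not 0
```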
Note that Theorem 13.22 does not say that lim_{n→∞} a_n = 0 implies
that the series is convergent. In fact, the following example shows
that this is not the case.
Example 13.25. The harmonic series
1 + 1/2 + 1/3 + ... + 1/n + ...
is divergent, even though lim_{n→∞} a_n = 0.
Proof for the interested reader: First notice that s_n < s_{n+1} for every
n ≥ 1, so the sequence {s_n} of partial sums is increasing. Therefore, to
show that lim_{n→∞} s_n = ∞ it is enough to observe that the sequence {s_n} is
not bounded. To do this we will demonstrate that the subsequence
s_1, s_2, s_{2^2}, s_{2^3}, ..., s_{2^n}, ... → ∞
as n → ∞.
We have s_1 = 1 and s_2 = 1 + 1/2. Next,
s_{2^2} = (1 + 1/2) + (1/3 + 1/4) > s_2 + (1/4 + 1/4) = 1 + 2/2
and
s_{2^3} = (1 + 1/2 + 1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8)
> s_{2^2} + (1/8 + 1/8 + 1/8 + 1/8) > 1 + 2/2 + 1/2 = 1 + 3/2.
Assume that for some k ≥ 1 we have s_{2^k} > 1 + k/2. Then
s_{2^{k+1}} = (1 + 1/2 + ... + 1/2^k) + (1/(2^k + 1) + 1/(2^k + 2) + ... + 1/2^{k+1})
> s_{2^k} + 2^k × 1/2^{k+1} > 1 + k/2 + 1/2 = 1 + (k + 1)/2.
It now follows by the Principle of Mathematical Induction that
s_{2^n} > 1 + n/2
for all positive integers n. This obviously implies lim_{n→∞} s_{2^n} = ∞, and
therefore the sequence {s_n} is unbounded. Since this sequence is increasing (as
mentioned earlier), it follows that lim_{n→∞} s_n = ∞, so the harmonic series is
divergent.
Indeed, it can be shown that if s_n is the nth partial sum of the harmonic
series then
s_n/ln n = (1 + 1/2 + 1/3 + ... + 1/n)/ln n → 1
as n → ∞. That is, s_n → ∞ as fast as ln(n) → ∞.
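The slow, logarithmic growth of the harmonic partial sums can be seen numerically; a Python sketch (illustrative, not part of the original text):

```python
import math

s = 0.0
for n in range(1, 10**6 + 1):
    s += 1.0 / n  # s is now the partial sum s_n

# s_n grows without bound, but only about as fast as ln(n):
# after a million terms s_n is still under 15, and s_n/ln(n) is near 1.
print(s, math.log(10**6), s / math.log(10**6))
```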
Definition 13.26. (p-series)
One important series is the p-series given by
∑_{n=1}^∞ 1/n^p.
Here p ∈ R is an arbitrary constant.
Note that the case p = 1 is the harmonic series, which we have
already seen is divergent.
Theorem 13.27. The p-series
∑_{n=1}^∞ 1/n^p
is convergent if and only if p > 1.
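Partial sums cannot prove convergence or divergence, but they illustrate the contrast in Theorem 13.27: for p = 2 the partial sums settle down (the limit is known, by other means, to be π²/6), while for p = 1/2 they keep growing. An illustrative Python sketch:

```python
import math

def p_series_partial_sum(p, n):
    return sum(1.0 / k**p for k in range(1, n + 1))

# p = 2 > 1: convergent; partial sums settle near pi^2/6 = 1.6449...
print(p_series_partial_sum(2, 10**5), math.pi**2 / 6)

# p = 1/2 < 1: divergent; partial sums grow roughly like 2*sqrt(n)
for n in (10**2, 10**4, 10**6):
    print(n, p_series_partial_sum(0.5, n))
```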
There are several results and tests that allow you to determine
if a series is convergent or not. We outline some of them in the
theorems below.
Theorem 13.28. If the infinite series ∑_{n=1}^∞ a_n and ∑_{n=1}^∞ b_n are convergent,
then the series ∑_{n=1}^∞ (a_n ± b_n) and ∑_{n=1}^∞ c·a_n (for any constant c ∈ R) are also
convergent, with
∑_{n=1}^∞ (a_n ± b_n) = ∑_{n=1}^∞ a_n ± ∑_{n=1}^∞ b_n
and
∑_{n=1}^∞ c·a_n = c ∑_{n=1}^∞ a_n.
Example 13.29. Using Theorem 13.28 and the formula for the sum of a
(convergent) geometric series, we get
∑_{n=0}^∞ [(−2/3)^n + 3/4^{n+1}] = ∑_{n=0}^∞ (−2/3)^n + (3/4) ∑_{n=0}^∞ 1/4^n
= 1/(1 + 2/3) + (3/4) · 1/(1 − 1/4) = 8/5.
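The value 8/5 can be confirmed by summing enough terms numerically; a quick Python check (illustrative, not part of the original text):

```python
# Partial sum of the series in Example 13.29; both geometric pieces
# converge quickly, so 200 terms is far more than enough.
total = sum((-2.0 / 3.0)**n + 3.0 / 4.0**(n + 1) for n in range(200))
print(total)  # about 1.6 = 8/5
```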
Another way to determine if a series converges is to compare it
to a series whose behaviour we know. The first way to do this is
analogous to the Squeeze Theorem.
Theorem 13.30 (The Comparison Test). Let ∑_{n=1}^∞ a_n and ∑_{n=1}^∞ b_n be
infinite series such that
0 ≤ a_n ≤ b_n
holds for all sufficiently large n.
1. If ∑_{n=1}^∞ b_n is convergent, then ∑_{n=1}^∞ a_n is convergent.
2. If ∑_{n=1}^∞ a_n is divergent, then ∑_{n=1}^∞ b_n is divergent.
Since we know the behaviour of a p-series, we usually use one in
our comparison.
Example 13.31. 1. Consider the series
∑_{n=1}^∞ (1 + sin(n))/n^2.
Since −1 ≤ sin(n) ≤ 1, we have 0 ≤ 1 + sin(n) ≤ 2 for all n. Thus
0 ≤ (1 + sin(n))/n^2 ≤ 2/n^2   (13.3)
for all integers n ≥ 1.
The series ∑_{n=1}^∞ 1/n^2 is convergent, since it is a p-series with p = 2 > 1.
Thus ∑_{n=1}^∞ 2/n^2 is convergent, and now (13.3) and the Comparison Test
show that the series ∑_{n=1}^∞ (1 + sin(n))/n^2 is also convergent.
2. Consider the series ∑_{n=1}^∞ ln(n)/n. Since
0 < 1/n ≤ ln(n)/n
for all integers n ≥ 3, and the series ∑_{n=1}^∞ 1/n is divergent (it is the
harmonic series), the Comparison Test implies that the series ∑_{n=1}^∞ ln(n)/n
is also divergent.
Another way to compare two series is to take the limit of the ratio
of terms from the corresponding sequences. This allows us to
compare the rates at which the two sequences go to 0.
Theorem 13.32 (The Limit Comparison Test). Let ∑_{n=1}^∞ a_n and ∑_{n=1}^∞ b_n
be infinite series such that a_n ≥ 0 and b_n > 0 for sufficiently large n, and
let
c = lim_{n→∞} a_n/b_n ≥ 0.
(a) If 0 < c < ∞, then ∑_{n=1}^∞ a_n is convergent if and only if ∑_{n=1}^∞ b_n is
convergent.
(b) If c = 0 and ∑_{n=1}^∞ b_n is convergent, then ∑_{n=1}^∞ a_n is convergent.
(c) If lim_{n→∞} a_n/b_n = ∞ and ∑_{n=1}^∞ a_n is convergent, then ∑_{n=1}^∞ b_n is convergent.
Remark 13.33. 1. Clearly in case (a) above we have that ∑_{n=1}^∞ a_n is divergent
whenever ∑_{n=1}^∞ b_n is divergent. In case (b), if ∑_{n=1}^∞ a_n is divergent,
then ∑_{n=1}^∞ b_n must also be divergent. And in case (c), if ∑_{n=1}^∞ b_n is divergent,
then ∑_{n=1}^∞ a_n is divergent.
2. Notice that in cases (b) and (c) we have implications (not equivalences).
For example, in case (b), if we know that ∑_{n=1}^∞ a_n is convergent, we
cannot conclude anything about the convergence of ∑_{n=1}^∞ b_n.
Remark 13.36. The conclusion of Theorem 13.35 about the convergence
of an alternating series remains true if we assume that a_n ≥ 0 and a_n ≥
a_{n+1} for all sufficiently large n and lim_{n→∞} a_n = 0.
Example 13.37. The series
∑_{n=1}^∞ (−1)^{n−1} (1/n) = 1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + ...
is convergent according to the Alternating Series Test, since a_n = 1/n ≥ 0
for all n ≥ 1, and the sequence {a_n} is decreasing and converges to 0.
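The partial sums of this series oscillate above and below their limit, closing in as n grows; the limit can be shown, by other means, to be ln 2 ≈ 0.6931. An illustrative Python sketch (not part of the original text):

```python
import math

def alt_harmonic_partial_sum(n):
    # s_n = 1 - 1/2 + 1/3 - ... + (-1)^(n-1)/n
    return sum((-1.0)**(k - 1) / k for k in range(1, n + 1))

for n in (10, 11, 100, 101, 1000):
    print(n, alt_harmonic_partial_sum(n))  # oscillating around ln 2
print(math.log(2.0))  # the limit, ln 2
```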
13.3 Absolute Convergence and the Ratio Test
We begin with a theorem.
Theorem 13.38. If the infinite series ∑_{n=1}^∞ |a_n| is convergent, then the
series ∑_{n=1}^∞ a_n is also convergent.
This motivates the following definition.
Definition 13.39. (Absolute convergence)
An infinite series ∑_{n=1}^∞ a_n is called absolutely convergent if the series
∑_{n=1}^∞ |a_n| is convergent. If ∑_{n=1}^∞ a_n is convergent but ∑_{n=1}^∞ |a_n| is divergent,
then we say that ∑_{n=1}^∞ a_n is conditionally convergent.
As Theorem 13.38 shows, every absolutely convergent series is
convergent. The next example shows that the converse is not true.
Example 13.40. 1. The series ∑_{n=1}^∞ (−1)^{n−1}/n is convergent (by the Alternating
Series Test). However,
∑_{n=1}^∞ |(−1)^{n−1}/n| = ∑_{n=1}^∞ 1/n
is divergent (it is the harmonic series). Thus ∑_{n=1}^∞ (−1)^{n−1}/n is conditionally
convergent.
2. Consider the series ∑_{n=1}^∞ sin(n)/n^2. Since
0 ≤ |sin(n)/n^2| ≤ 1/n^2
and the series ∑_{n=1}^∞ 1/n^2 is convergent (a p-series with p = 2 > 1),
the Comparison Test implies that ∑_{n=1}^∞ |sin(n)/n^2| is convergent. Hence
∑_{n=1}^∞ sin(n)/n^2 is absolutely convergent and therefore convergent.
Theorem 13.41 (The Ratio Test). Let ∑_{n=1}^∞ a_n be such that
lim_{n→∞} |a_{n+1}/a_n| = L.
1. If L < 1, then ∑_{n=1}^∞ a_n is absolutely convergent.
2. If L > 1, then ∑_{n=1}^∞ a_n is divergent.
Note that when L = 1 the Ratio Test gives no information.
Example 13.42. Consider the series ∑_{n=1}^∞ n^2/2^n. We have
|a_{n+1}/a_n| = a_{n+1}/a_n = [(n + 1)^2/2^{n+1}] / [n^2/2^n] = (1/2) [(n + 1)/n]^2 → 1/2
as n → ∞. Since L = 1/2 < 1, the Ratio Test implies that ∑_{n=1}^∞ n^2/2^n is
absolutely convergent.
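The ratios in Example 13.42 can be tabulated directly and seen approaching L = 1/2; an illustrative Python sketch (not part of the original text):

```python
def a(n):
    # the general term of the series in Example 13.42
    return n**2 / 2.0**n

for n in (1, 10, 100, 1000):
    print(n, a(n + 1) / a(n))  # the ratios approach 1/2
```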
Example 13.43. For any constant b ∈ R the infinite series ∑_{n=0}^∞ b^n/n! is
absolutely convergent. We have
|a_{n+1}/a_n| = |b^{n+1}/(n + 1)!| / |b^n/n!| = |b|/(n + 1) → 0
as n → ∞. So, by the Ratio Test, the series ∑_{n=0}^∞ b^n/n! is absolutely
convergent. In particular (by the Test for Divergence), lim_{n→∞} b^n/n! = 0.
13.4 Power Series
Definition 13.44. (Power series)
Let a ∈ R be a given number, {a_n}_{n=0}^∞ a given sequence of real numbers
and x ∈ R a variable (parameter).
A series of the form
∑_{n=0}^∞ a_n (x − a)^n = a_0 + a_1 (x − a) + a_2 (x − a)^2 + ... + a_n (x − a)^n + ...
is called a power series centred at a. When a = 0, the series is simply
called a power series.
Clearly a power series can be regarded as a function of x defined
for all x ∈ R for which the infinite series is convergent.
Theorem 13.45. For a power series ∑_{n=0}^∞ a_n(x − a)^n one of the following
three possibilities occurs:
(a) The series is absolutely convergent for x = a and divergent for all
x ≠ a.
(b) The series is absolutely convergent for all x ∈ R.
(c) There exists R > 0 such that the series is absolutely convergent for
|x − a| < R and divergent for |x − a| > R.
In other words, a power series is convergent at only one point or
convergent everywhere or convergent only on an interval centred at
a. It is not possible for it to only be convergent at several separated
points or on several separate intervals.
The number R in part (c) of Theorem 13.45 is called the radius of
convergence of the power series. In case (a) we define R = 0, while in
(b) we set R = ∞. When |x − a| = R the series may or may not be
convergent. It is even possible for a series to be convergent for one
value of x for which |x − a| = R and divergent for another.
Example 13.46. Find all x ∈ R for which the series
∑_{n=1}^∞ ((−1)^n/√n) (x + 3)^n   (13.4)
is absolutely convergent, conditionally convergent or divergent.
Solution. When x = −3 the series is absolutely convergent.
Let x ≠ −3. Then
|a_{n+1}/a_n| = |(−1)^{n+1}(x + 3)^{n+1}/√(n + 1)| / |(−1)^n(x + 3)^n/√n| = |x + 3| √(n/(n + 1)) → |x + 3|
as n → ∞. Thus, by the Ratio Test, the series is absolutely convergent for
|x + 3| < 1, that is, for −4 < x < −2, and divergent for |x + 3| > 1, that
is, for x < −4 and x > −2.
It remains to check the points x = −4 and x = −2.
Substituting x = −4 in (13.4) gives the series
∑_{n=1}^∞ ((−1)^n/√n)(−1)^n = ∑_{n=1}^∞ 1/n^{1/2},
which is divergent (it is a p-series with p = 1/2 < 1).
When x = −2, the series (13.4) becomes ∑_{n=1}^∞ (−1)^n/√n, which is convergent
by the Alternating Series Test. However,
∑_{n=1}^∞ |(−1)^n/√n| = ∑_{n=1}^∞ 1/√n
is divergent (as we mentioned above), so ∑_{n=1}^∞ (−1)^n/√n is conditionally
convergent.
Conclusion: The series (13.4) is absolutely convergent for −4 < x < −2,
conditionally convergent for x = −2, and divergent for x ≤ −4 and for
x > −2.
for all |x − a| < R. Then substituting x = a in (13.6) gives f′(a) = a_1.
Similarly, differentiating (13.6) yields
f″(x) = 2a_2 + 6a_3 (x − a) + ... + n(n − 1) a_n (x − a)^{n−2} + ...
for all |x − a| < R, and substituting x = a in this equality gives
f″(a) = 2a_2, that is, a_2 = f″(a)/2!.
Continuing in this fashion we must have
a_n = f^{(n)}(a)/n! for each n.
Definition 13.49. (Taylor Series)
Assume that f(x) has derivatives of all orders on some interval I containing
the point a in its interior. Then the power series
∑_{n=0}^∞ (f^{(n)}(a)/n!) (x − a)^n = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)^2 + ...
is called the Taylor series of f about a. When a = 0 this series is called
the MacLaurin Series of f.
Recall from Chapter 9 that
T_{n,a}(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)^2 + (f‴(a)/3!)(x − a)^3 + ... + (f^{(n)}(a)/n!)(x − a)^n
is the nth Taylor polynomial of f at a. Clearly this is the (n + 1)st
partial sum of the Taylor series of f at a. For any given x ∈ I we
have
f(x) = T_{n,a}(x) + R_{n,a}(x)
where R_{n,a}(x) is the remainder (or error term). So, if for some x ∈ I
we have R_{n,a}(x) → 0 as n → ∞, then
f(x) = lim_{n→∞} T_{n,a}(x) = ∑_{n=0}^∞ (f^{(n)}(a)/n!) (x − a)^n.
That is, when R_{n,a}(x) → 0 as n → ∞, the Taylor series is convergent
at x and its sum is equal to f(x).
Example 13.50. 1. Consider the function f(x) = e^x. Then f^{(n)}(x) = e^x
for all integers n ≥ 1, so f^{(n)}(0) = 1 for all n. Thus, the Taylor series
for e^x about 0 has the form
∑_{n=0}^∞ x^n/n! = 1 + x/1! + x^2/2! + x^3/3! + ... + x^n/n! + ...   (13.7)
We will now show that the sum of this series is equal to e^x for all x ∈ R.
Fix a constant b > 0. For any x ∈ [−b, b] we have
f(x) = T_{n,0}(x) + R_{n,0}(x)
where T_{n,0}(x) is the nth Taylor polynomial of f about 0 and R_{n,0}(x) is
the corresponding remainder. By Taylor's formula,
R_{n,0}(x) = (f^{(n+1)}(z)/(n + 1)!) x^{n+1}
for some z between 0 and x. Thus |z| ≤ b, and therefore
0 ≤ |R_{n,0}(x)| = |(f^{(n+1)}(z)/(n + 1)!) x^{n+1}| ≤ (e^z/(n + 1)!) b^{n+1} ≤ e^b b^{n+1}/(n + 1)! → 0
as n → ∞, by Example 13.43. By the Squeeze Theorem (for the given
x), lim_{n→∞} |R_{n,0}(x)| = 0. Thus the Taylor series (13.7) is convergent
and its sum is e^x. Since b > 0 can be taken arbitrarily large, the above
conclusion can be made about any x ∈ R. Thus
e^x = ∑_{n=0}^∞ x^n/n!
for all x ∈ R.
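The convergence T_{n,0}(x) → e^x can be watched numerically; a Python sketch (the test value x = 3 is an arbitrary illustrative choice, not from the text):

```python
import math

def taylor_exp(x, n):
    """Evaluate the nth Taylor polynomial of e^x about 0 at x."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k  # term is now x^k / k!
        total += term
    return total

x = 3.0
for n in (2, 5, 10, 20):
    print(n, taylor_exp(x, n), math.exp(x))  # the error shrinks rapidly
```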
2. Using the above argument (with slight modifications), one shows that
sin(x) = ∑_{n=0}^∞ (−1)^n x^{2n+1}/(2n + 1)! = x/1! − x^3/3! + x^5/5! − ... + (−1)^n x^{2n+1}/(2n + 1)! + ...
for all x ∈ R. Here the right-hand side is the Taylor series of sin(x)
about 0.
3. Similarly,
cos(x) = ∑_{n=0}^∞ (−1)^n x^{2n}/(2n)! = 1 − x^2/2! + x^4/4! − x^6/6! + ... + (−1)^n x^{2n}/(2n)! + ...
for all x ∈ R. Here the right-hand side is the Taylor series of cos(x)
about 0.
Combining parts (1), (2) and (3), and extending the notion of a power
series to the complex numbers, gives a way of showing that e^{ix} =
cos(x) + i sin(x).
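That identity can be sanity-checked numerically with Python's complex exponential (an illustrative check, not a proof):

```python
import cmath
import math

x = 1.2345  # an arbitrary test value
lhs = cmath.exp(1j * x)                  # e^(ix)
rhs = complex(math.cos(x), math.sin(x))  # cos(x) + i sin(x)
print(lhs, rhs)  # the two sides agree to machine precision
```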
14
Index
p-series, 200
divergent, 194
absolute maximum, 129, 131
absolute minimum, 129, 131
absolutely convergent, 204
additive identity, 49
Alternating series, 203
associative, 48
augmented matrix, 15
auxiliary equation, 157
back substitution, 17
basic variables, 20
basis, 40
boundary, 132
boundary conditions, 162
boundary value problem, 162
bounded, 133
change of coordinates matrix, 185
characteristic equation, 157
closed, 132
codomain, 77
coefficient matrix, 12
column space, 51
column vector, 26
commutative, 48
commute, 49
conditionally convergent, 204
consistent, 10
continuous, 103
continuously differentiable, 109
contrapositive, 39
convergent, 194, 198
critical point, 130, 131
derivative, 109, 119
determinant, 69
diagonal matrix, 50
differentiable, 107, 119
differential equations, 145
dilation, 80
dimension, 44
direction field, 150
directional derivative, 124
divergent, 198
diverges to ∞, 195
domain, 77
eigenvalues, 171
eigenvectors, 171
elementary matrix, 74
elementary row operation, 12
extremum, 129, 131
free parameter, 20
free variable, 20
full rank, 55
function, 77
Gaussian Elimination, 17
gradient vector, 125
group, 68
harmonic series, 200
Hessian matrix, 135
homogeneous, 23, 156
idempotent matrix, 50
identity, 50
identity matrix, 50, 80
image, 77
inconsistent, 10
independent, 38
infimum, 196
infinite series, 198
inflection point, 134
initial value problem, 155
inverse, 59, 61
inverse function, 82
invertible, 61, 82
Jacobian, 121
Jacobian matrix, 121
kernel, 83
leading entries, 19
leading entry, 16
leading variables, 20
left-distributive, 48
linear combination, 32
linear transformation, 77
linearly independent, 38
local maximum, 129, 131
local minimum, 129, 131
lower-triangular matrix, 50
MacLaurin Series, 208
main diagonal, 50
matrix, 47
matrix addition, 47
matrix multiplication, 47
matrix transposition, 47
monotone, 197
multiplicative identity, 49
negative definite, 136
nilpotent matrix, 50
non-basic variable, 20
non-invertible, 61
nonhomogeneous, 156
normal vector, 118
null space, 56
nullity, 58
open, 132
ordinary differential equation, 145
orthogonal projection, 78
partial derivative, 113
partial differential equation, 145
pivot entry, 17
pivot position, 17
polar coordinates, 102
positive definite, 136
power series, 207
product, 26
proof, 14
quadratic form, 136
rank, 55
Rank-Nullity Theorem, 58
ratio, 80
reduced row-echelon form, 64
right-distributive, 48
row echelon form, 16
row space, 51
row vector, 26
row-reduction, 17
saddle point, 132
scalar, 9
scalar multiple, 9
scalar multiplication, 25, 47
separable differential equation, 152
sequence, 193
similar, 187
skew-symmetric matrix, 50
span, 32
spanning set, 35
standard basis, 41
standard basis vectors, 41
standard matrix, 80
subspace, 27
sum, 26
supremum, 196
symmetric matrix, 50
systems of linear equations, 9
Taylor polynomial, 139
Taylor series, 208
transpose, 48
upper-triangular matrix, 50
vector addition, 25
vector space, 25
vector subspace, 27
vectors, 25
zero divisors, 50
zero matrix, 50
zero-vector, 27