MATHEMATICAL METHODS I


Contents

I  Linear Algebra: by Gordon Royle and Alice Devillers

1  Systems of Linear Equations
   1.1  Systems of Linear Equations
   1.2  Solving Linear Equations
   1.3  Gaussian Elimination
   1.4  Back Substitution

2  Vector Spaces and Subspaces
   2.1  The vector space Rn
   2.2  Subspaces
   2.3  Spans and Spanning Sets
   2.4  Linear Independence
   2.5  Bases

3  Matrices and Determinants
   3.1  Matrix Algebra
   3.2  Subspaces from matrices
   3.3  The null space
   3.4  Solving systems of linear equations
   3.5  Matrix Inversion
   3.6  Determinants

4  Linear transformations
   4.1  Introduction

II  Differential Calculus: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici. Some pictures courtesy of Kevin Judd.

5  Vector functions and functions of several variables
   5.1  Vector valued functions
   5.2  Functions of two variables
   5.3  Functions of three or more variables
   5.4  Summary

6  Limits and continuity
   6.1  Scalar-valued functions of one variable
   6.2  Limits of vector functions
   6.3  Limits of functions of two variables
   6.4  Continuity of Functions

7  Differentiation
   7.1  Derivatives of functions of one variable
   7.2  Differentiation of vector functions
   7.3  Partial Derivatives
   7.4  Tangent Planes and differentiability
   7.5  The Jacobian matrix and the Chain Rule
   7.6  Directional derivatives and gradients

8  Maxima and Minima
   8.1  Functions of a single variable
   8.2  Functions of several variables
   8.3  Identifying local maxima and minima

9  Taylor Polynomials
   9.1  Taylor polynomials for functions of one variable
   9.2  Taylor polynomials for functions of two variables

III  Differential equations and eigenstructure: by Des Hill

10  Differential Equations
    10.1  Introduction
    10.2  Mathematical modelling with ordinary differential equations
    10.3  First order ordinary differential equations
    10.4  Initial conditions
    10.5  Linear constant-coefficient ordinary differential equations
    10.6  Linear nonhomogeneous constant-coefficient differential equations
    10.7  Initial and boundary conditions
    10.8  Summary of method
    10.9  Partial differential equations

11  Eigenvalues and eigenvectors
    11.1  Introduction
    11.2  Finding eigenvalues and eigenvectors
    11.3  Some properties of eigenvalues and eigenvectors
    11.4  Diagonalisation
    11.5  Linear Systems
    11.6  Solutions of homogeneous linear systems of differential equations
    11.7  A change of variables approach to finding the solutions

12  Change of Basis
    12.1  Introduction
    12.2  Change of basis
    12.3  Linear transformations and change of bases

IV  Sequences and Series: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici

13  Sequences and Series
    13.1  Sequences
    13.2  Infinite Series
    13.3  Absolute Convergence and the Ratio Test
    13.4  Power Series
    13.5  Taylor and MacLaurin Series

14  Index

Part I

Linear Algebra: by Gordon Royle and Alice Devillers

1  Systems of Linear Equations

This chapter covers the systematic solution of  systems of linear equations  using Gaussian elimination

and back-substitution and the description, both algebraic and geometric, of their  solution space.

Before commencing this chapter, students should be able to:

• Plot linear equations in 2  variables, and

• Add and multiply matrices.

After completing this chapter, students will be able to:

• Systematically solve systems of linear equations with many variables, and

• Identify when a system of linear equations has 0, 1 or infinitely many solutions, and

• Give the solution set of a system of linear equations in parametric form.

1.1   Systems of Linear Equations

A linear equation is an equation of the form

x + 2y = 4

where each term in the equation is either a number (e.g. “6”) or a numerical multiple of a variable (e.g. “2x”, “4y”). (You will often see the word scalar used to refer to a number, and scalar multiple to describe a numerical multiple of a variable.) If an equation involves powers or products of variables (x^2, xy, etc.) or any other functions (sin x, e^x, etc.) then it is not linear.

A system of linear equations is a set of one or more linear equations considered together, such as

x  + 2y  =  4
x  −  y  =  1

which is a system of two equations in the two variables x and y. (Often the variables are called “unknowns”, emphasizing that solving a system of linear equations is a process of finding the unknowns.)

A solution to a system of linear equations is an assignment of values to the variables such that all of the equations in the system are satisfied. For example, there is a unique solution to the system given above, which is

x = 2,  y = 1.

Particularly when there are more variables, it will often be useful to give the solutions as vectors like

(x, y) = (2, 1).

(In this case we could also just give the solution as (2, 1), where by convention the first component of the vector is the x-coordinate. Usually the variables will have names like x, y, z or x1, x2, . . . , xn, and so we can just specify the vector alone, in this case (2, 1), and it will be clear which component corresponds to which variable.)

If the system of linear equations involves just two variables, then we can visualise the system geometrically by plotting the solutions to each equation separately on the xy-plane. The solution to the system of linear equations is the point where the two plots intersect, which in this case is the point (2, 1).

[Figure 1.1: The two linear equations plotted as intersecting lines in the xy-plane.]

It is easy to visualise systems of linear equations in two variables, but it is more difficult in three variables, where we need 3-dimensional plots. In three dimensions the solutions to a single linear equation such as

x + 2y − z = 4

form a plane in 3-dimensional space. (This particular equation describes the plane with normal vector (1, 2, −1) containing the point (4, 0, 0).) While computer algebra systems can produce somewhat reasonable plots of surfaces in three dimensions, it is hard to interpret plots showing two or more intersecting surfaces.

With four or more variables any sort of visualisation is essentially impossible, and so to reason about systems of linear equations with many variables we need to develop algebraic tools rather than geometric ones. (It is still very useful to use geometric intuition to think about systems of linear equations with many variables, provided you are careful about where it no longer applies.)

1.1.1   Solutions to systems of linear equations

The system of linear equations shown in Figure 1.1 has a unique solution. In other words there is just one (x, y) pair that satisfies both equations, and this is represented by the unique point of intersection of the two lines. Some systems of linear equations have no solutions at all. (If a system of linear equations has at least one solution, then it is called consistent, and otherwise it is called inconsistent.) For example, there are no possible values for (x, y) that satisfy both of the following equations:

 x  + 2y  =  4
2x  + 4y  =  3

Geometrically, the two equations determine parallel but different lines, and so they do not meet.

[Figure 1.2: An inconsistent system plotted as lines in the xy-plane.]

There is another possibility for the number of solutions to a system of linear equations, which is that a system may have infinitely many solutions. For example, consider the system

x  + 2y  +  z  =  4
        y  +  z  =  1        (1.1)

Each of the two equations determines a plane in three dimensions, and as the two planes are not parallel, they meet in a line, and so every point on the line is a solution to this system of linear equations. (The two planes are not parallel because the normal vectors to the two planes, that is n1 = (1, 2, 1) and n2 = (0, 1, 1), are not parallel.)

 How can we describe the solution set to a system of linear equations with

infinitely many solutions?

One way of describing an infinite solution set is in terms of  free

 parameters where one (or more) of the variables is left unspecified

with the values assigned to the other variables being expressed as

 formulas that depend on the free variables.

Let’s see how this works with the system of linear equations given by (1.1): here we can choose z to be the “free variable”, but then to satisfy the second equation it will be necessary to have y = 1 − z. Then the first equation can only be satisfied by taking

x = 4 − 2y − z
  = 4 − 2(1 − z) − z    (using y = 1 − z)
  = 2 + z.

Thus the complete solution set S of system (1.1) is

S = {(2 + z, 1 − z, z) | z ∈ R}.

To find a particular solution to the linear system, you can pick any desired value for z and then the values for x and y are determined. For example, if we take z = 1 then we get (3, 0, 1) as a solution, and if we take z = 0 then we get (2, 1, 0), and so on. For this particular system of linear equations it would have been possible to choose one of the other variables to be the “free variable” and we would then get a different expression for the same solution set.

Example 1.1. (Different free variable) To rewrite the solution set S = {(2 + z, 1 − z, z) | z ∈ R} so that the y-coordinate is the free variable, just notice that as y = 1 − z, this implies that z = 1 − y, and so the solution set becomes S = {(3 − y, y, 1 − y) | y ∈ R}.
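As a quick check (this snippet is ours, not part of the original notes), both parametrizations can be substituted back into system (1.1) for a range of parameter values:

# Both parametrizations of the solution set should satisfy system (1.1):
#     x + 2y + z = 4
#     y + z = 1
def satisfies(x, y, z):
    return x + 2*y + z == 4 and y + z == 1

# Parametrized by z: (2 + z, 1 - z, z)
assert all(satisfies(2 + z, 1 - z, z) for z in range(-5, 6))
# Parametrized by y: (3 - y, y, 1 - y)
assert all(satisfies(3 - y, y, 1 - y) for y in range(-5, 6))
print("both parametrizations check out")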

A system of linear equations can also be expressed as a single matrix equation involving the product of a matrix and a vector of variables. So the system of linear equations

 x  + 2y  +  z  =   5
         y  −  z  =  −1
2x  + 3y  −  z  =   3

can equally well be expressed as

[ 1  2  1 ] [ x ]   [  5 ]
[ 0  1 −1 ] [ y ] = [ −1 ]
[ 2  3 −1 ] [ z ]   [  3 ]

just using the usual rules for multiplying matrices. (Matrix algebra is discussed in detail in Chapter 3, but for this representation of a system of linear equations, just the definition of the product of two matrices is needed.) In general, a system of linear equations with m equations in n variables has the form

A x = b

where A is an m × n coefficient matrix, x is an n × 1 vector of variables, and b is an m × 1 vector of scalars.
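As a quick illustration (ours, not part of the original notes), the system above can be entered and solved numerically with NumPy; this assumes the coefficient matrix is square and invertible, which np.linalg.solve requires.

import numpy as np

# Coefficient matrix A and right-hand side b for the system above
A = np.array([[1, 2,  1],
              [0, 1, -1],
              [2, 3, -1]], dtype=float)
b = np.array([5, -1, 3], dtype=float)

# Solve A x = b (A must be square and invertible for np.linalg.solve)
x = np.linalg.solve(A, b)
print(x)   # [1. 1. 2.], i.e. (x, y, z) = (1, 1, 2)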

1.2   Solving Linear Equations

In this section, we consider a systematic method of solving systems of linear equations. The method consists of two steps: first using Gaussian elimination to reduce the system to a simpler system of linear equations, and then back-substitution to find the solutions to the simpler system.

(In high school, systems of linear equations are often solved with ad hoc methods that are quite suitable for small systems, but which are not sufficiently systematic to tackle larger systems. It is very important to thoroughly learn the systematic method, as solving systems of linear equations is a fundamental part of many of the questions that arise in linear algebra. In fact, almost every question in linear algebra ultimately depends on setting up and solving a suitable system of linear equations!)

1.2.1   Elementary Row Operations

An elementary row operation is an operation that transforms a system

of linear equations into a  different, but equivalent system of linear

equations, where “equivalent” means that the two systems have

identical solutions. The answer to the obvious question —  “ Why

bother transforming one system of linear equations into another?”  —

is that the new system might be  simpler to solve than the original

system. In fact there are some systems of linear equations that are

extremely simple to solve, and it turns out that by systematically

applying a sequence of elementary row operations, we can transform

any system of linear equations into an equivalent system whose solutions are very simple to find.


Definition 1.2.  (Elementary Row Operations)

An elementary row operation is one of the following three types of transformation applied to a system of linear equations:

Type 1  Interchanging two equations.
Type 2  Multiplying an equation by a non-zero scalar.
Type 3  Adding a multiple of one equation to another equation.

In a system of linear equations, we let  R i  denote the  i-th equa-

tion, and so we can express an elementary row operation symboli-

cally as follows:

Ri ↔ R j   Exchange equations R i  and R j

Ri ← αRi   Multiply equation R i  by α

Ri ← Ri + αR j   Add α  times  R j  to R i

We will illustrate elementary row operations on a simple system

of linear equations:

 x  + 2y  +  z  =   5
         y  −  z  =  −1
2x  + 3y  −  z  =   3        (1.2)

Example 1.3.  (Type 1 Elementary Row Operation) Applying the Type 1 elementary row operation R1 ↔ R2 (in words, “interchange equations 1 and 2”) to the original system (1.2) yields the system of linear equations:

         y  −  z  =  −1
 x  + 2y  +  z  =   5
2x  + 3y  −  z  =   3

It is obvious that this new system of linear equations has exactly the same

solutions as the original system, because each individual equation is un-

changed and listing them in a different order does not alter which vectors

satisfy them all.

Example 1.4.  (Type 2 Elementary Row Operation) Applying the Type 2 elementary row operation R2 ← 3R2 (in words, “multiply the second equation by 3”) to the original system (1.2) gives a new system of linear equations:

 x  + 2y  +  z  =   5
       3y  − 3z  =  −3
2x  + 3y  −  z  =   3

Again it is obvious that this system of linear equations has exactly the same solutions as the original system. While the second equation is changed, the solutions to this individual equation are not changed. (This relies on the equation being multiplied by a non-zero scalar.)

Example 1.5.  (Type 3 Elementary Row Operation) Applying the Type 3 elementary row operation R3 ← R3 − 2R1 (in words, “add −2 times the first equation to the third equation”) to the original system (1.2) gives a new system of linear equations:

 x  + 2y  +  z  =   5
         y  −  z  =  −1
       −y  − 3z  =  −7

In this case, it is not obvious that the system of linear equations has the same solutions as the original. In fact, the system is actually different from the original, but it happens to have the exact same set of solutions. This is so important that it needs to be proved.

As foreshadowed in the last example, in order to use elementary row operations with confidence, we must be sure that the set of solutions to a system of linear equations is not changed when the system is altered by an elementary row operation. To convince ourselves of this, we need to prove that applying an elementary row operation to a system of linear equations neither destroys existing solutions nor creates new ones.

(A proof in mathematics is a careful explanation of why some mathematical fact is true. A proof normally consists of a sequence of statements, each following logically from the previous statements, where each individual logical step is sufficiently simple that it can easily be checked. The word “proof” often alarms students, but really it is nothing more than a very simple line-by-line explanation of a mathematical statement. Creating proofs of interesting or useful new facts is the raison d’être of a professional mathematician.)

Theorem  1.6.  Suppose that S is a system of linear equations, and that

T is the system of linear equations that results by applying an elementary

row operation to S. Then the set of solutions to S is equal to the set of 

solutions to T.

Proof.   As discussed in the examples, this is obvious for Type 1  and

Type  2  elementary row operations. So suppose that  T  arises from S

 by performing the Type 3  elementary row operation R i ←  Ri + αR j.

Then S  consists of  m  equations

S = {R1, R2, . . . , Rm}

while T  only differs in the  i-th equation

T  = {R1, R2, . . . , Ri−1, Ri + αR j, Ri+1, . . . , Rm}.

It is easy to check that if a vector satisfies two equations  R i  and

R j, then it also satisfies  R i +  αR j  and so any solution to  S  is also a

solution to T . What remains to be checked is that any solution to  T 

is a solution to S. However if a vector satisfies all the equations in

T , then it satisfies  R i + αR j  and R j, and so it satisfies the equation

(Ri + αR j) + (−αR j)

which is just  R i. Thus any solution to  T  also satisfies S.

Now let’s consider applying an entire sequence of elementary

row operations to our example system of linear equations (1.2) to

reduce it to a much simpler form.

So, starting with

 x  + 2y  +  z  =   5
         y  −  z  =  −1
2x  + 3y  −  z  =   3


apply the Type 3 elementary row operation R3 ← R3 − 2R1 to get

 x  + 2y  +  z  =   5
         y  −  z  =  −1
       −y  − 3z  =  −7

followed by the Type 3 elementary row operation R3 ← R3 + R2, obtaining

 x  + 2y  +  z  =   5
         y  −  z  =  −1
             −4z  =  −8

Now notice that the third equation only involves the variable  z,

and so it can now be solved, obtaining  z  =  2. The second equation

involves just y, z  and as z  is now known, it really only involves  y,

and we get y  =  1. Finally, with both  y  and  z  known, the first equa-

tion only involves  x  and by substituting the values that we know

into this equation we discover that x  =  1. Therefore, this system of linear equations has the unique solution  (x, y, z) = (1,1,2).

Notice that the final system was essentially trivial to solve, and

so the elementary row operations converted the original system

into one whose solution was trivial.
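The same sequence of row operations can be carried out numerically. The following sketch (Python/NumPy, ours, not part of the original notes) applies the two Type 3 operations to the augmented matrix of system (1.2) and then reads off the solution.

import numpy as np

# Augmented matrix [A | b] for system (1.2)
M = np.array([[1, 2,  1,  5],
              [0, 1, -1, -1],
              [2, 3, -1,  3]], dtype=float)

M[2] = M[2] - 2 * M[0]   # R3 <- R3 - 2 R1  (rows are 0-indexed in NumPy)
M[2] = M[2] + M[1]       # R3 <- R3 + R2

print(M)
# The third row is now [0, 0, -4, -8], i.e. -4z = -8, so z = 2;
# back-substituting gives y = 1 and then x = 1.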

1.2.2   The Augmented Matrix

The names of the variables in a system of linear equations are essentially irrelevant: whether we call three variables x, y and z or x1, x2 and x3 makes no fundamental difference to the equations or their solution. Thus writing out each equation in full when writing down a sequence of systems of linear equations related by elementary row operations involves a lot of unnecessary repetition of the variable names. Provided each equation has the variables in the same order, all the information is contained solely in the coefficients, and so these are all we need. Therefore we normally represent a system of linear equations by a matrix known as the augmented matrix of the system of linear equations; each row of the matrix represents a single equation, with the coefficients of the variables to the left of the bar, and the constant term to the right of the bar. Each column to the left of the bar contains all of the coefficients for a single variable. For our example system (1.2), we have the following:

 x  + 2y  +  z  =   5
         y  −  z  =  −1
2x  + 3y  −  z  =   3

[ 1  2  1 |  5 ]
[ 0  1 −1 | −1 ]
[ 2  3 −1 |  3 ]

Example 1.7.  (From matrix to linear system) What system of linear equations has the following augmented matrix?

[ 0 −1  2 | 3 ]
[ 1  0 −2 | 4 ]
[ 3  4  1 | 0 ]

The form of the matrix tells us that there are three variables, which we can name arbitrarily, say x1, x2 and x3. Then the first row of the matrix corresponds to the equation 0x1 − 1x2 + 2x3 = 3, and interpreting the other two rows analogously, the entire system is

      −   x2  + 2x3  =  3
 x1           − 2x3  =  4
3x1  + 4x2  +  x3  =  0

We could also have chosen any other three names for the variables.

In other words, the augmented matrix for the system of linear equations A x = b is just the matrix [A | b].
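In code, forming the augmented matrix is just a matter of appending b as an extra column of A. A minimal NumPy sketch (ours, not part of the original notes):

import numpy as np

A = np.array([[1, 2,  1],
              [0, 1, -1],
              [2, 3, -1]], dtype=float)
b = np.array([5, -1, 3], dtype=float)

# Augmented matrix [A | b]: b becomes an extra column on the right
aug = np.hstack([A, b.reshape(-1, 1)])
print(aug)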

1.3   Gaussian Elimination

When solving a system of linear equations using the augmented matrix, the elementary row operations are performed directly on the augmented matrix. (This is why they are called elementary row operations, rather than elementary equation operations: they are always viewed as operating on the rows of the augmented matrix.)

As explained earlier, the aim of the elementary row operations is to put the matrix into a simple form from which it is easy to “read off” the solutions; to be precise, we need to define exactly the simple form that we are trying to achieve.

Definition 1.8.  (Row Echelon Form)

 A matrix is in row echelon form if 

1. Any rows of the matrix consisting entirely of zeros occur as the last

rows of the matrix, and

2. The first non-zero entry of each row is in a column strictly to the right

of the first non-zero entry in any of the earlier rows.

This definition is slightly awkward to read, but very easy to grasp by example. Consider the two matrices

[ 1 1 −1 2  0 ]        [ 1 1 −1 2  0 ]
[ 0 0 −2 1  3 ]        [ 0 0 −2 1  3 ]
[ 0 0  0 1  0 ]        [ 0 2  0 1  0 ]
[ 0 0  0 0 −1 ]        [ 0 0  1 2 −1 ]

Neither matrix has any all-zero rows, so the first condition is automatically satisfied. To check the second condition we need to identify the first non-zero entry in each row — this is called the leading entry.

In the first matrix, the leading entries in rows 1, 2, 3 and 4 occur in columns 1, 3, 4 and 5 respectively, and so the leading entry for each row always occurs strictly further to the right than the leading entry in any earlier row. So this first matrix is in row-echelon form. However, for the second matrix, the leading entries in rows 2 and 3 occur in columns 3 and 2 respectively, and so the leading entry in row 3 actually occurs to the left of the leading entry in row 2; hence this matrix is not in row-echelon form.

Example 1.9.  (Row-echelon form) The following matrices are all in row-echelon form:

[ 1 0 2  1 ]        [ 1 1 2  3 ]        [ 1 0 0 0 ]
[ 0 0 1 −1 ]        [ 0 2 1 −1 ]        [ 0 2 1 1 ]
[ 0 0 0  0 ]        [ 0 0 3  0 ]        [ 0 0 0 1 ]

Example 1.10.  (Not row-echelon form) None of the following matrices are in row-echelon form:

[ 1 0 2  1 ]        [ 1 1 2  3 ]        [ 1 0 0 0 ]
[ 0 0 0  0 ]        [ 0 2 1 −1 ]        [ 0 0 1 1 ]
[ 0 0 1 −1 ]        [ 0 1 3  0 ]        [ 0 0 2 1 ]
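Definition 1.8 translates directly into a small checking routine. Here is a sketch (ours, not part of the original notes) that tests whether a matrix, given as a list of rows, is in row-echelon form:

def is_row_echelon(M):
    """Check the two conditions of Definition 1.8."""
    def leading_col(row):
        # Column index of the first non-zero entry, or None for an all-zero row
        for j, entry in enumerate(row):
            if entry != 0:
                return j
        return None

    seen_zero_row = False
    prev = -1
    for c in (leading_col(row) for row in M):
        if c is None:                 # condition 1: zero rows must come last
            seen_zero_row = True
        elif seen_zero_row or c <= prev:
            return False              # non-zero row after a zero row, or a leading
        else:                         # entry not strictly to the right of the last one
            prev = c
    return True

# The first matrix of Example 1.9 passes; the first of Example 1.10 fails:
print(is_row_echelon([[1, 0, 2, 1], [0, 0, 1, -1], [0, 0, 0, 0]]))  # True
print(is_row_echelon([[1, 0, 2, 1], [0, 0, 0, 0], [0, 0, 1, -1]]))  # False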

Gaussian Elimination (sometimes called  row-reduction) is a system-

atic method for applying elementary row operations to a matrix

until it is in row-echelon form. We’ll see in the next section that a

technique called back substitution, which involves processing the equations in reverse order, can easily determine the set of solu-

tions to a system of linear equations whose augmented matrix is in

row-echelon form.

Without further ado, here is the algorithm for Gaussian Elimina-

tion, first informally in words, and then more formally in symbols.

The algorithm is defined for any matrices, not just the augmented

matrices arising from a system of linear equations, because it has

many applications.

Definition 1.11.  (Gaussian Elimination — in words)

Let A be an m × n matrix. At each stage in the algorithm, a particular

 position in the matrix, called the  pivot position, is being processed. Ini-

tially the pivot position is at the top-left of the matrix. What happens at

each stage depends on whether the pivot entry (that is, the number in the

 pivot position) is zero or not.

1. If the pivot entry is zero then, if possible, interchange the pivot row

with one of the rows below it, in order to ensure that the pivot entry is

non-zero. This will be possible unless the pivot entry and every entry

below it are zero, in which case simply move the pivot position one

column to the right.


2. If the pivot entry is non-zero then, by adding a suitable multiple of the

 pivot row to every row below the pivot row, ensure that every entry

 below the pivot entry is zero. Then move the pivot position one column

to the right and one row down.

When the pivot position is moved off the matrix, then the process finishes

and the matrix will be in row-echelon form.

The process of “adding a multiple of the pivot row to every row

 below it in order to zero out the column below the pivot entry” is

called pivoting on the pivot entry for short.

Example 1.12.  (Gaussian Elimination) Consider the following matrix, with the initial pivot position at the top left:

A = [ 2 1 2 4 ]
    [ 2 1 1 0 ]
    [ 4 3 2 4 ]

The initial pivot position is the (1, 1) position in the matrix, and the pivot entry is therefore 2. Pivoting on the (1, 1)-entry is accomplished by performing the two elementary row operations R2 ← R2 − R1 and R3 ← R3 − 2R1, leaving the matrix:

[ 2 1  2  4 ]
[ 0 0 −1 −4 ]    R2 ← R2 − R1
[ 0 1 −2 −4 ]    R3 ← R3 − 2R1

(The elementary row operations used are noted down next to the relevant rows to indicate how the row reduction is proceeding.) The new pivot entry is 0, but as the entry immediately under the pivot position is non-zero, interchanging the two rows moves a non-zero entry to the pivot position.

[ 2 1  2  4 ]
[ 0 1 −2 −4 ]    R2 ↔ R3
[ 0 0 −1 −4 ]

The next step is to pivot on this entry in order to zero out all the entries below it and then move the pivot position. As the only entry below the pivot is already zero, no elementary row operations need be performed, and the only action required is to move the pivot:

[ 2 1  2  4 ]
[ 0 1 −2 −4 ]
[ 0 0 −1 −4 ]

Once the pivot position reaches the bottom row, there are no further operations to be performed (regardless of whether the pivot entry is zero or not) and so the process terminates, leaving the matrix in row-echelon form

[ 2 1  2  4 ]
[ 0 1 −2 −4 ]
[ 0 0 −1 −4 ]

as required.


For completeness, and to provide a description more suitable for implementing Gaussian elimination on a computer, we give the same algorithm more formally, in a sort of pseudo-code. (Pseudo-code is a way of expressing a computer program precisely, but without using the syntax of any particular programming language. In pseudo-code, assignments, loops, conditionals and other features that vary from language to language are expressed in natural language.)

Definition 1.13.  (Gaussian Elimination — in symbols)

Let A = (a_ij) be an m × n matrix and set two variables r ← 1, c ← 1. (Here r stands for “row” and c for “column”, and they store the pivot position.) Then repeatedly perform whichever one of the following operations is possible (only one will be possible at each stage) until either r > m or c > n, at which point the algorithm terminates.

1. If a_rc = 0 and there exists x > r such that a_xc ≠ 0, then perform the elementary row operation Rr ↔ Rx.

2. If a_rc = 0 and a_xc = 0 for all x > r, then set c ← c + 1.

3. If a_rc ≠ 0 then, for each x > r, perform the elementary row operation

   Rx ← Rx − (a_xc / a_rc) Rr,

   and then set r ← r + 1 and c ← c + 1.

When this algorithm terminates, the matrix will be in row-echelon form.
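As a concrete companion to Definition 1.13, here is a direct Python transcription (ours, not part of the original notes); it mutates a list-of-lists matrix in place and mirrors the three cases of the pseudo-code, with the pivot position 0-indexed.

def gaussian_elimination(A):
    """Reduce A (a list of lists of numbers) to row-echelon form in place,
    following the three cases of Definition 1.13."""
    m, n = len(A), len(A[0])
    r, c = 0, 0                      # pivot position (0-indexed here)
    while r < m and c < n:
        swap = next((x for x in range(r + 1, m) if A[x][c] != 0), None)
        if A[r][c] == 0 and swap is not None:
            A[r], A[swap] = A[swap], A[r]        # case 1: swap a non-zero entry up
        elif A[r][c] == 0:
            c += 1                               # case 2: column is zero from row r down
        else:
            for x in range(r + 1, m):            # case 3: clear entries below the pivot
                factor = A[x][c] / A[r][c]
                A[x] = [axj - factor * arj for axj, arj in zip(A[x], A[r])]
            r, c = r + 1, c + 1
    return A

# The matrix of Example 1.12:
A = [[2, 1, 2, 4], [2, 1, 1, 0], [4, 3, 2, 4]]
for row in gaussian_elimination(A):
    print(row)   # rows (2, 1, 2, 4), (0, 1, -2, -4), (0, 0, -1, -4), as in Example 1.12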

1.4   Back Substitution

Recall that the whole point of elementary row operations is to transform a system of linear equations into a simpler system with

the same solutions; in other words to change the problem to an easier

problem with the same answer. So after reducing the augmented

matrix of a system of linear equations to row-echelon form, we now

need a way to read off the solution set.

The first step is to determine whether the system is consistent or

otherwise, and this involves identifying the leading entries in each

row of the augmented matrix in row-echelon form — these are the

 first non-zero entries in each row. In other words, run a finger along

each row of the matrix stopping at the first non-zero entry and

noting which column it is in.

Example 1.14.  (Leading entries) The following two augmented matrices in row-echelon form have their leading entries (the first non-zero entry in each row) marked in bold in the original notes.

[ 1 0 −1 2 | 3 ]        [ 1 2 −1  2 | 3 ]
[ 0 0  2 1 | 0 ]        [ 0 0  2  1 | 0 ]
[ 0 0  0 0 | 2 ]        [ 0 0  0 −1 | 2 ]
[ 0 0  0 0 | 0 ]

The left-hand matrix of Example 1.14 has the property that one of the leading entries is on the right-hand side of the augmenting bar.

(You may wonder why we keep saying “to the right of the augmenting bar” rather than “in the last column”. The answer is that if we have more than one linear system with the same coefficient matrix, say A x = b1, A x = b2, then we can form a “super-augmented” matrix [A | b1 b2] and solve both systems with one application of Gaussian elimination. So there may be more than one column to the right of the augmenting bar.)


If we “unpack” what this means for the system of linear equations,

then we see that the third row corresponds to the linear equation

0x1 + 0x2 + 0x3 + 0x4 =  2,

which can never be satisfied. Therefore this system of linear equations has no solutions, or in other words, is inconsistent. This is in

fact a defining feature of an inconsistent system of linear equations, a

fact that is important enough to warrant stating separately.

Theorem  1.15.  A system of linear equations is inconsistent if and only if 

one of the leading entries in the row-echelon form of the augmented matrix

is to the right of the augmenting bar.

Proof.  Left to the reader.

The right-hand matrix of Example  1.14 has no such problem,

and so we immediately conclude that the system is consistent — it

has at least one solution. Every column to the left of the augment-

ing bar corresponds to one of the variables in the system of linear

equations

  x1  x2  x3  x4
[  1   2  −1   2 | 3 ]
[  0   0   2   1 | 0 ]
[  0   0   0  −1 | 2 ]        (1.3)

and so the leading entries identify  some of the variables. In this case,

the leading entries are in columns 1, 3 and 4, and so the identified variables are x1, x3 and x4. The variables identified in this fashion

are called the basic variables (also known as  leading variables) of the

system of linear equations. The following sentence is the key to

understanding solving systems of linear equations by back substitu-

tion:

Every  non-basic variable of a system of linear equations is a  free vari-

able or  free parameter of the system of linear equations, while every

basic variable can be expressed uniquely as a combination of the free

parameters and/or constants.

The process of back-substitution refers to examining the equations in reverse order, and for each equation finding the unique expres-

sion for the basic variable corresponding to the leading entry of 

that row. Let’s continue our examination of the right-hand matrix of 

Example 1.14, also shown with the columns identified in (1.3).

The third row of the matrix, when written out as an equation,

says that −1x4   =  2, and so  x4   = −2, which is an expression for

the basic variable  x4  as a constant. The second row of the matrix

corresponds to the equation 2x3 + x4  = 0, but as we know now that

x4   = −2, this can be substituted in to give 2x3 − 2  =  0 or x3   =  1.

The first row of this matrix corresponds to the equation

x1 + 2x2 − x3 + 2x4  =  3


and after substituting in x3  =  1 and  x4  = −2 this reduces to

x1 + 2x2  =  8. (1.4)

This equation involves one basic variable (that is, x1) together with a non-basic variable (that is, x2) and a constant (that is, 8). The rules

of back-substitution say that this should be manipulated to give an

expression for the basic variable in terms of the other things. So we

get

x1  =  8 − 2x2

and the entire solution set for this system of linear equations is

given by

S = {(8 − 2x2, x2, 1, −2) | x2 ∈ R}.

Therefore we conclude that this system of linear equations has

infinitely many solutions that can be described by  one free parameter.

The astute reader will notice that (1.4) could equally well be written x2 = 4 − x1/2, and so we could use x1 as the free parameter, rather than x2 — so why does back-substitution need to specify which variable should be chosen as the free parameter? The answer is that there is always an expression for the solution set that uses the non-basic variables as the free parameters. In other words, the process as described will always work.

Example 1.16.  (Back substitution) Find the solutions to the system of linear equations whose augmented matrix in row-echelon form is

[ 0 2 −1 0  2 3 | 1 ]
[ 0 0  1 3 −1 0 | 2 ]
[ 0 0  0 0  1 1 | 0 ]
[ 0 0  0 0  0 1 | 5 ]

First identify the leading entries in the matrix, and therefore the basic and non-basic variables. The leading entries in each row occur in columns 2, 3, 5 and 6.

Therefore the basic variables are {x2, x3, x5, x6} while the free variables are {x1, x4}, and so this system has infinitely many solutions that can be described with two free parameters. Now back-substitute starting from the last equation. The fourth equation is simply that x6 = 5, while the third equation gives x5 + x6 = 0, which after substituting the known value for x6 gives us x5 = −5. The second equation is

x3 + 3x4 − x5 = 2

and so it involves the basic variable x3 along with the free variable x4 and the already-determined variable x5. Substituting the known value for x5 and rearranging to give an expression for x3, we get

x3 = −3 − 3x4        (1.5)


Finally the first equation is

2x2 − x3 + 2x5 + 3x6  =  1

and so substituting all that we have already determined we get

2x2 − (−3 − 3x4) + 2(−5) + 3(5) = 1

which simplifies to

x2 = (−7 − 3x4)/2.

What about x1? It is a variable in the system of linear equations, but it

did not actually occur in any of the equations. So if it does not appear

in any of the equations, then there are no restrictions on its values and

so it can take any value — therefore it is a free variable. Fortunately, the

rules for back-substitution have already identified it as a non-basic variable

as it should be. Therefore the final solution set for this system of linear

equations is

S = {(x1, (−7 − 3x4)/2, −3 − 3x4, x4, −5, 5) | x1, x4 ∈ R}

and therefore we have found an expression with two free parameters as

expected.

Key Concept  1.17.  (Solving Systems of Linear Equations)

To solve a system of linear equations of the form A x  =  b, perform the

 following steps:

1. Form the augmented matrix [ A | b].

2. Use Gaussian elimination to put the augmented matrix into row-

echelon form.

3. Use back-substitution to express each of the basic variables as a

combination of the free variables and constants.
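For the common special case of a square system with a unique solution (no free variables), step 3 is a short loop. A minimal sketch (ours, not part of the original notes), run on the reduced system from Section 1.2:

def back_substitute(U, c):
    """Solve U x = c where U is square, upper triangular, with non-zero diagonal."""
    n = len(c)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                 # process equations in reverse order
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - known) / U[i][i]
    return x

# The row-echelon system from Section 1.2: x + 2y + z = 5, y - z = -1, -4z = -8
U = [[1, 2, 1], [0, 1, -1], [0, 0, -4]]
c = [5, -1, -8]
print(back_substitute(U, c))   # [1.0, 1.0, 2.0], i.e. (x, y, z) = (1, 1, 2)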

1.4.1   Reasoning about systems of linear equations

Understanding the process of Gaussian elimination and back-substitution also allows us to  reason about systems of linear equa-

tions, even if they are not explicitly defined, and make general

statements about the number of solutions to systems of linear equa-

tions. One of the most important is the following result, which says

that any consistent system of linear equations with more unknowns

than equations has infinitely many solutions.

Theorem  1.18.   Suppose that A x  =  b is a system of m linear equations

in n variables. If m   <  n, then the system is either inconsistent or has

infinitely many solutions.

Proof.  Consider the row-echelon form of the augmented matrix [A | b]. If the last column contains the leading entry of some row,


then the system is inconsistent. Otherwise, each leading entry is

in a column corresponding to a variable, and so there are exactly

m basic variables. As there are n  variables altogether, this leaves

n − m   >   0 free parameters in the solution set and so there are

infinitely many solutions.

A homogeneous system of linear equations is one of the form A x = 0, and these systems are always consistent. (Why is this true?) Thus Theorem 1.18 has the important corollary that “every homogeneous system of linear equations with more unknowns than equations has infinitely many solutions”.

A second example of reasoning  about a system of linear equa-

tions rather than just solving an explicit system is when the system

is not fully determined. For example, suppose that  a  and  b  are un-

known values. What can be said about the number of solutions of 

the following system of linear equations?

[ 1 2 a ] [ x ]   [ −3 ]
[ 0 1 2 ] [ y ] = [  b ]
[ 1 3 3 ] [ z ]   [  0 ]

In particular, for which values of  a  and  b  will this system have 0, 1

or infinitely many solutions?

To answer this, start performing Gaussian elimination as usual, treating a and b symbolically as their values are not known. (Of course, it is necessary to make sure that you never compute anything that might be undefined, such as 1/a. If you need to use 1/a during the Gaussian elimination, then you need to separate out the cases a = 0 and a ≠ 0 and do them separately.) The row reduction proceeds in the following steps: the initial augmented matrix is

[ 1 2 a | −3 ]
[ 0 1 2 |  b ]
[ 1 3 3 |  0 ]

and so after pivoting on the top-left position we get

[ 1 2    a    | −3 ]
[ 0 1    2    |  b ]
[ 0 1  3 − a  |  3 ]        R3 ← R3 − R1

and then

[ 1 2    a    |   −3  ]
[ 0 1    2    |    b  ]
[ 0 0  1 − a  | 3 − b ]        R3 ← R3 − R2

From this matrix, we can immediately see that if a ≠ 1 then 1 − a ≠ 0 and every variable is basic, which means that the system has a unique solution (regardless of the value of b). On the other hand, if a = 1 then either b ≠ 3, in which case the system is inconsistent, or b = 3, in which case there are infinitely many solutions. We can summarise this outcome:

a ≠ 1                       Unique solution
a = 1 and b ≠ 3             No solutions
a = 1 and b = 3             Infinitely many solutions
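The same case analysis can be reproduced symbolically. Here is a sketch using SymPy (ours, not part of the original notes); linsolve treats a and b generically, so the special values found above are checked separately:

import sympy as sp

a, b, x, y, z = sp.symbols('a b x y z')
eqs = [sp.Eq(x + 2*y + a*z, -3),
       sp.Eq(y + 2*z, b),
       sp.Eq(x + 3*y + 3*z, 0)]

# Generic case: a unique solution in terms of a and b (implicitly assumes a != 1)
print(sp.linsolve(eqs, [x, y, z]))

# a = 1 with b left symbolic: inconsistent for generic b
print(sp.linsolve([e.subs(a, 1) for e in eqs], [x, y, z]))

# a = 1 and b = 3: infinitely many solutions, with z as the free parameter
print(sp.linsolve([e.subs({a: 1, b: 3}) for e in eqs], [x, y, z]))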


2  Vector Spaces and Subspaces

This chapter takes the first steps away from the  geometric interpretation of vectors in familiar 2- or

3-dimensional space by introducing n-dimensional vectors and the vector space Rn, which must neces-

sarily be described and manipulated algebraically.

Before commencing this chapter, students should be able to:

• Solve systems of linear equations.

After completing this chapter, students will be able to:

• Determine when a set of vectors is a subspace, and

• Determine when a set of vectors is linearly independent, and

• Find a basis for a subspace, and hence determine its dimension.

2.1   The vector space Rn

The vector space Rn consists of all the n-tuples of real numbers,

which henceforth we call vectors; formally we say that

Rn = {(x1, x2, . . . , xn) | x1, x2, . . . , xn ∈ R}.

Thus R2 is just the familiar collection of  pairs of real numbers that

we usually visualise by identifying each pair  (x, y) with the point

(x, y) on the Cartesian plane, and R3 the collection of  triples of real

numbers that we usually identify with  3-space.

A vector u  = (u1, . . . , un) ∈ Rn may have different meanings:

• when n  =  2 or 3 it could represent a geometric vector in Rn that

has both a magnitude and a direction;

• when n  =  2 or 3 it could represent the coordinates of a point in

the Cartesian plane or in 3-space;

• it could represent certain quantities, e.g. u1 apples, u2 pears, u3 oranges, u4 bananas, . . .

• it may simply represent a string of real numbers.

The vector space Rn also has two operations that can be performed on vectors, namely vector addition and scalar multiplication.


Although their definitions are intuitively obvious, we give them anyway:

Definition 2.1.  (Vector Addition)

If u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) are vectors in Rn then their sum u + v is defined by

u + v = (u1 + v1, u2 + v2, . . . , un + vn).

In other words, two vectors are added coordinate-by-coordinate.

Definition 2.2.  (Scalar Multiplication)

If v = (v1, v2, . . . , vn) and α ∈ R then the product αv is defined by

αv = (αv1, αv2, . . . , αvn).

In other words, each coordinate of the vector is multiplied by the scalar.

Example 2.3.  Here are some vector operations:

(1,2, −1, 3) + (4,0,1,2) = (5,2,0,5)

(3,1,2) + (6, −1, −4) = (9,0, −2)

5(1,0, −1, 2) = (5,0, −5,10)
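These componentwise operations are exactly what NumPy arrays provide. A quick sketch (ours, not part of the original notes):

import numpy as np

u = np.array([1, 2, -1, 3])
v = np.array([4, 0, 1, 2])
print(u + v)                          # [5 2 0 5]: coordinate-by-coordinate addition
print(5 * np.array([1, 0, -1, 2]))    # [ 5  0 -5 10]: scalar multiplication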

Row or column vectors?

A vector in Rn is simply an ordered n-tuple of real numbers, and for many purposes all that matters is that we write it down in such a way that it is clear which is the first coordinate, the second coordinate and so on.

However a vector can also be viewed as a matrix, which is very useful when we use matrix algebra (the subject of Chapter 3) to manipulate equations involving vectors, and then a choice has to be made whether to use a 1 × n matrix (a matrix with one row and n columns) or an n × 1 matrix (a matrix with n rows and 1 column) to represent the vector.

Thus a vector in R4 can be represented either as a row vector such as

[1, 2, 3, 4]

or as a column vector such as

[ 1 ]
[ 2 ]
[ 3 ]
[ 4 ]


For various reasons, mostly to do with the conventional notation we use for functions (that is, we usually write f(x) rather than (x)f), it is more convenient mathematically to assume that vectors are represented as column vectors most of the time. Unfortunately, in writing about mathematics, typesetting a row vector such as [1, 2, 3, 4] is much more convenient than typesetting a column vector, which leads to ugly and difficult-to-read paragraphs.

Some authors try to be very formal and use the notation for a matrix transpose (see Chapter 3) to allow them to elegantly typeset a column vector: so their text would read something like: Let v = (1, 2, 3, 4)^T, in which case everyone is clear that the vector v is really a column vector.

In practice, however, either the distinction between a row and column vector is not important (e.g. adding two vectors together) or it is obvious from the context; in either case there is never any actual confusion caused by the difference. So to reduce the notational overload of adding a slew of transpose symbols that are almost never needed, in these notes we’ve decided to write all vectors just as rows, but with the understanding that when it matters (in matrix equations), they are really to be viewed as column vectors. In this latter case, it will always be obvious from the context that the vectors must be column vectors anyway!

One vector plays a special role in linear algebra; the vector in Rn with all components equal to zero is called the zero-vector and denoted

0 = (0, 0, . . . , 0).

It has the obvious properties that v + 0 = 0 + v = v; this means that it is an additive identity. (This is just formal mathematical terminology for saying that you can add it to any vector without altering that vector.)

2.2   Subspaces

In the study of 2- and 3- dimensional geometry, figures such as  lines

and planes play a particularly important role and occur in many

different contexts. In higher-dimensional and more general vector

spaces, a similar role is played by a vector subspace or just subspace

which is a set of vectors that has three special additional properties.

Definition 2.4.   (Vector Subspace)

Let S ⊆ Rn be a set of vectors. Then S is called a  subspace of Rn if 

(S1)   0 ∈ S, and


(S2)   u + v ∈ S for all vectors u, v ∈ S, and

(S3)   αv ∈ S for all scalars α ∈ R and vectors v ∈ S.

First we’ll go through these three conditions in turn and see

what they are saying. The first condition, (S1) simply says that a

subspace must contain the zero vector  0; when this condition does

not hold, it is an easy way to show that a given set of vectors is  not  a

subspace.

Example 2.5.  The set of vectors S   = {(x, y) |   x + y   =   1} is not a

subspace of R2 because the vector 0  = (0, 0) does not belong to S.

The second condition (S2) says that in order to be a subspace, a set S of vectors must be closed under vector addition. This means that if two vectors that are both in S are added together, then their sum must remain in S. (Condition (S2) does not restrict what happens to the sum of two vectors that are not in S, or the sum of a vector in S and one not in S. It is only concerned with the sum of two vectors that are both in S.)

Example 2.6.  (Subspace) In R3, the xy-plane is the set of all vectors of the form (x, y, 0) (where x and y can be anything). The xy-plane is closed under vector addition because if we add any two vectors in the xy-plane together, then the resulting vector also lies in the xy-plane. (In the next section, we’ll see how to present a formal proof that a set of vectors is closed under vector addition, but for these examples, geometric intuition is enough to see that what is being claimed is true.)

Example 2.7.  (Not a subspace) In R2, the unit disk is the set of vectors

{(x, y) | x^2 + y^2 ≤ 1}.

This set of vectors is not closed under vector addition because if we take u = (1, 0) and v = (0, 1), then both u and v are in the unit disk, but their

sum u + v = (1, 1) is  not  in the unit disk.

The third condition (S3) says that in order to qualify as a sub-

space, a set S  of vectors must be  closed under scalar multiplication,

meaning that if a vector is contained in  S, then all of its scalar multi-

 ples must also be contained in  S.

Example 2.8.  (Closed under scalar multiplication) In R2, the set of 

vectors on the two axes, namely

S = {(x, y) | xy = 0}

is closed under scalar multiplication, because it is clear that any multiple

of a vector on the x-axis remains on the x-axis, and any multiple of a

vector on the y-axis remains on the y-axis.

Example 2.9.  (Not closed under scalar multiplication) In R2, the unit

disk, which was defined in Example  2.7, is not closed under scalar multi-

 plication because if we take  u  = (1, 0) and  α  =  2, then αu = (2, 0) which

is not in the unit disk.

One of the fundamental skills needed in linear algebra is the

ability to identify whether a given set of vectors in Rn forms a subspace or not. Usually a set of vectors will be described in some


way, and you will need to be able to tell whether this set of vectors

is a subspace. To prove that a given set of vectors  is  a subspace,

it is necessary to show that all three conditions (S1), (S2) and (S3)

are satisfied, while to show that a set of vectors  is not a subspace,

it is only necessary to show that one of the three conditions is not

satisfied.

2.2.1   Subspace proofs

In this subsection, we consider in more detail how to show whether or not a given set of vectors is a subspace. It is much easier to

show that a set of vectors is  not  a subspace than to show that a set

of vectors is  a subspace. The reason for this is that conditions (S2)

and (S3) apply to every pair of vectors in the given set. To show that

either condition fails, we only need to give a single example where

the condition does not hold, but to show that they are true, we need

to find a general argument that applies to every pair of vectors.

This asymmetry is so important that we give it a name. (The “black swan” name comes from the famous notion that in order to prove or disprove the logical statement “all swans are white”, it would only be necessary to find one single black swan to disprove it, but it would be necessary to check every possible swan in order to prove it. Subspaces are the same: if a set of vectors is not a subspace, then it is only necessary to find one “black swan” showing that one of the conditions does not hold, but if it is a subspace then it is necessary to “check every swan” by proving that the conditions hold for every pair of vectors and scalars.)

Key Concept  2.10.  (The “Black Swan” concept)

To show that a set of vectors S is not closed under vector addition,

it is sufficient to find a single explicit example of two vectors u, v

that are contained in S, but whose sum u + v is not contained in S.

 However, to show that a set of vectors S  is closed under vector

addition, it is necessary to give a  formal symbolic proof  that applies

to every pair of vectors in S.

Example 2.11.  (Not a subspace) The set S  = {(w, x, y, z) | wx  =  yz} in

R4 is not a subspace because if  u  = (1,0,2,0) and  v  = (0,1,0,2), then

u, v ∈ S but u + v = (1,1,2,2)  /∈ S so condition (S2) does not hold.
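A black-swan counterexample is easy to verify mechanically. A sketch (ours, not part of the original notes) checking Example 2.11:

def in_S(w, x, y, z):
    # Membership test for S = {(w, x, y, z) | wx = yz}
    return w * x == y * z

u = (1, 0, 2, 0)
v = (0, 1, 0, 2)
s = tuple(p + q for p, q in zip(u, v))   # u + v = (1, 1, 2, 2)
print(in_S(*u), in_S(*v), in_S(*s))      # True True False, so (S2) fails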

One of the hardest techniques for first-time students of linear algebra is to understand how to structure a proof that a set of vectors is a subspace, so we’ll go slowly. Suppose that

S = {(x, y, z) | x − y = 2z}

is a set of vectors in R3, and we need to check whether or not it is a subspace. Here is a model proof, interleaved with some discussion about the proof. (When you do your proofs it may help to structure them like this model proof, but don’t include the discussion — this is to help you understand why the model proof looks like it does, but it is not part of the proof itself.)

(S1) It is obvious that

0 − 0 =  2(0)

and so 0 ∈ S.

Discussion:  To check that 0 is in S, it is necessary to verify that the zero

vector satisfies the “defining condition” that determines S. In this case, the

defining condition is that the difference of the first two co-ordinates (that

is, x − y) is equal to twice the third coordinate (that is,  2 z). For the vector

0 = (0,0,0) we have all coordinates equal to  0, and so the condition is true.


(S2) Let u  = (u1, u2, u3) ∈ S and  v  = (v1, v2, v3) ∈ S. Then

u1 − u2  =  2u3   (2.1)

v1 − v2  =  2v3   (2.2)

Discussion:  To prove that S is closed under addition, we need to check 

every possible pair of vectors, which can only be done symbolically. We give

symbolic names u  and  v  to two vectors in S and write down the only facts

that we currently know — namely that they satisfy the defining condition

 for S. These equations are given labels — in this case,  (2.1) and (2.2),

because the proof must refer to these equations later.

Now consider the sum

u + v = (u1 + v1, u2 + v2, u3 + v3)

and test it for membership in  S. As

(u1 + v1) − (u2 + v2) = u1 + v1 − u2 − v2   (rearranging)

= (u1 − u2) + (v1 − v2)   (rearranging)

= 2u3 + 2v3   (by (2.1) and (2.2))

= 2(u3 + v3)   (rearranging terms)

it follows that  u + v ∈ S.

Discussion:  To show that  u + v is in S, we need to show that the difference

of its first two coordinates is equal to twice its third coordinate. So the

sequence of calculations starts with the difference of the first two coordinates

and then carefully manipulates this expression in order to show that it

is equal to twice the third coordinate. Every stage of the manipulation is justified either just as a rearrangement of the terms or by reference to some

 previously known fact. At some stage in the manipulation, the proof must

use the two equations  (2.1) and  (2.2), because the result must depend on the

two original vectors being vectors in S.

(S3) Let u  = (u1, u2, u3) ∈ S and  α ∈ R. Then

u1 − u2  =  2u3   (2.3)

Discussion:  To prove that S is closed under scalar multiplication, we need

to check every vector in S and scalar in R. We give the symbolic name u

to the vector in S and α  to the scalar, and note down the only fact that we

currently know — namely that u satisfies the defining condition for S. We’ll need this fact later, and so give it a name, in this case (2.3).

Now consider the vector

αu = (αu1, αu2, αu3)

and test it for membership in  S. As

αu1 − αu2  =  α(u1 − u2)   (rearranging)

= α(2u3)   (by (2.3))

= 2(αu3)

it follows that αu ∈ S.

Discussion:  To show that αu is in S, we need to show that the difference of its first two coordinates is equal to twice its third coordinate. So the


sequence of calculations starts with the difference of the first two coordinates

and then carefully manipulates it in order to show that it is equal to twice

the third coordinate. At some stage in the manipulation, the proof must use

the equations (2.3) because the result must depend on the original vector

being a member of S.

It will take quite a bit of practice to be able to write this sort of 

proof correctly, so do not get discouraged if you find it difficult at

first. Here are some examples to try out.

Example 2.12.  These sets of vectors  are  subspaces:

1. The set of vectors {(w, x, y, z) | w + x + y + z =  0} in R4.

2. The xy-plane in R3.

3. The line x =  y in R2.

4. The set of vectors {(x1, x2, . . . , xn) |  x1 + x2 + . . . + xn−1  =  xn} in

Rn.

while these sets of vectors are not subspaces:

1. The set of vectors {(w, x, y, z) | w + x + y + z =  1} in R4.

2. The plane normal to n  = (1,2,1) passing through the point  (1,1,1).

3. The line x =  y − 1 in R2.

4. The set of vectors {(x1, x2, . . . , xn) |  x1 + x2 + . . . + xn−1 ≥  xn} in

Rn  for n ≥ 2.
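A random search can often find a black swan before you attempt a proof; note that it can disprove the subspace conditions but never prove them. A sketch (ours, not part of the original notes), applied to the first set in each list above:

import random

def closure_counterexample(member, dim, trials=1000):
    """Randomly search for a violation of (S2) or (S3); None means none was found."""
    for _ in range(trials):
        u = [random.randint(-5, 5) for _ in range(dim)]
        v = [random.randint(-5, 5) for _ in range(dim)]
        a = random.randint(-5, 5)
        if member(u) and member(v) and not member([s + t for s, t in zip(u, v)]):
            return (u, v)                  # (S2) fails for u + v
        if member(u) and not member([a * t for t in u]):
            return (a, u)                  # (S3) fails for a*u
    return None                            # no counterexample found (not a proof!)

print(closure_counterexample(lambda w: sum(w) == 0, 4))   # None: w+x+y+z = 0 survives
print(closure_counterexample(lambda w: sum(w) == 1, 4))   # a counterexample appears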

2.2.2   Exercises

1. Show that a line in R2 is a subspace if and only if it passes

through the origin (0, 0).

2. Find a set of vectors in R2 that is closed under vector addition,

 but not closed under scalar multiplication.

3. Find a set of vectors in R2 that is closed under scalar multiplica-

tion, but not closed under vector addition.

2.3   Spans and Spanning Sets

We start this section by considering a simple question:

What is the smallest subspace  S  of R2 containing the vector  v   =

(1, 2)?

If  S  is a subspace, then by condition (S1) of Definition  2.4 it must

also contain the zero vector, so it follows that  S  must contain at least

the two vectors {0, v}. But by condition (S2), the subspace S  must

also contain the sum  of any two vectors in  S, and so therefore S

must also contain all the vectors

(2, 4), (3, 6), (4, 8), (5,10), . . .


But then, by condition (S3), it follows that  S  must also contain all

the multiples of  v, such as

(−1, −2), (1/2,1), (1/4, 1/2), . . . ,

Therefore S must contain at least the set of vectors

{(α, 2α) | α ∈ R}

and in fact this set contains enough vectors to satisfy the three conditions (S1), (S2) and (S3), and so it is the smallest subspace containing v. (So if a subspace contains a vector v then it must contain every scalar multiple of v. In R2 and R3, this means that if a subspace contains a point, then it contains the line through the origin and that point.) Now let’s extend this result by considering the same question but with a bigger starting set of vectors: Suppose that A = {v1, v2, . . . , vk} is a set of vectors in Rn — what is the smallest subspace of Rn that contains A?

To answer this, we need a couple more definitions:

Definition 2.13.  (Linear Combination)

Let A = {v1, v2, . . . , vk} be a set of vectors in Rn. Then a linear combi-

nation of the vectors in A is any vector of the form

v =  α1v1 + α2v2 + · · · + αk vk

where α1, α2, . . ., αk  ∈ R are arbitrary scalars.

By slightly modifying the argument of the last paragraph, it should be clear that if a subspace contains the vectors v1, v2, . . . , vk,

then it also contains  every linear combination of those vectors. As

we will frequently need to refer to the “set of all possible linear

combinations” of a set of vectors, we should give it a name:

Definition 2.14.   (Span)

The span of A   =  {v1, v2, . . . , vk} is the set of  all possible linear

combinations of the vectors in A, and is denoted span( A). In symbols,

span(A) = {α1v1 + α2v2 + · · · + αkvk | αi ∈ R, 1 ≤ i ≤ k}.

Therefore, if a subspace  S  contains a subset A ⊆   S then it also

contains the span of  A. To answer the original question (“what is

the smallest subspace containing  A”) it is enough to notice that the

span of  A is always a subspace itself and so no further vectors need

to be added. This is sufficiently important to write out formally as a

theorem and to give a formal proof.

Theorem 2.15. (Span of anything is a subspace) Let A = {v1, v2, . . . , vk} be a set of vectors in Rn. Then span(A) is a subspace of Rn, and is the smallest subspace of Rn containing A.


Proof.  We must show that the three conditions of Definition  2.4

hold.

(S1) It is clear that

0 = 0v1 + 0v2 + · · · + 0vk

and so 0 is a linear combination of the vectors in A and thus 0 ∈ span(A).

(S2) Let u, v ∈  span( A). Then there are scalars  αi,  β i  (1 ≤   i ≤   k )

such that

u =  α1v1 + α2v2 + · · · + αk vk    (2.4)

v =  β1v1 + β2v2 + · · · + βk vk    (2.5)

Now consider the sum u + v:

u + v = (α1v1 + α2v2 + · · · + αkvk) + (β1v1 + β2v2 + · · · + βkvk)   (by (2.4), (2.5))
      = (α1 + β1)v1 + (α2 + β2)v2 + · · · + (αk + βk)vk

and so u + v ∈ span(A). (This seems like a lot of work just to say something that is almost obvious: if you take two linear combinations of a set of vectors and add them together, then the resulting vector is also a linear combination of the original set of vectors!)

(S3) Let v ∈ span(A) and α ∈ R. Then there are scalars βi (1 ≤ i ≤ k) such that

v =  β1v1 + β2v2 + · · · + βk vk    (2.6)

It is clear that

αv =  α ( β1v1 + β2v2 + · · · + βk vk )   (by (2.6))

= (αβ1)v1 + (αβ2)v2 + · · · + (αβk )vk    (rearranging)

and so αv ∈ span( A).

The arguments earlier in this section showed that any  subspace

containing A also contains span( A) and as span( A) is a subspace

itself, it must be the smallest subspace containing  A.

The span of a set of vectors gives us an easy way to find  sub-

spaces — start with any old set of vectors, take their span and we get a subspace. If we do have a subspace given in this way, then

what can we say about it? Is this a  useful way to create, or work

with, a subspace?

For example, suppose we start with

 A = {(1,0,1), (3,2,3)}

as a set of vectors in R3. Then span( A) is a subspace of R3 — what

can we say about this subspace? The first thing to notice is that

we can easily check  whether or not a particular vector is contained

in span(A), because it simply involves solving a system of linear equations.7

7 This shows that having a subspace of the form span(A) is a good representation of the subspace, because we can easily test membership of the subspace — that is, we "know" which vectors are contained in the subspace.


Continuing our example, to decide whether a vector  v  is con-

tained in span( A), we try to  solve the vector equation

v =  λ1(1,0,1) + λ2(3,2,3)

for the two “unknowns” λ1, λ2; this is a system of  3  linear equa-

tions in two unknowns and, as discussed in Chapter  1, can easily be

solved.

Example 2.16.  (Vector not in span) If A   = {(1,0,1), (3,2,3)}, then

v = (2,4,5) is not in span( A). This follows because the equation

(2,4,5) = λ1(1,0,1) + λ2(3,2,3)

 yields the system of linear equations

λ1   +   3λ2   =   2

2λ2   =   4

λ1   +   3λ2   =   5

which is obviously inconsistent. Thus there is no linear combination of 

the vectors in A that is equal to  (2,4,5).

Example 2.17.  (Vector in span) If A   = {(1,0,1), (3,2,3)}, then v   =

(5, −2, 5) is in span( A). This follows because the equation

(5, −2, 5) = λ1(1,0,1) + λ2(3,2,3)

 yields the system of linear equations

λ1 + 3λ2 =  5
       2λ2 = −2
λ1 + 3λ2 =  5

which has the unique solution λ2 = −1 and λ1 = 8. Therefore v ∈ span(A) because we have now found the particular linear combination required.
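Both of these membership tests amount to solving a small system of linear equations, which is easy to automate. A minimal numpy sketch (the helper name in_span is ours):

import numpy as np

# The columns of M are the vectors (1,0,1) and (3,2,3) of A.
M = np.array([[1.0, 3.0],
              [0.0, 2.0],
              [1.0, 3.0]])

def in_span(M, v, tol=1e-10):
    """Return (membership, coefficients) for v against the columns of M."""
    coeffs, *_ = np.linalg.lstsq(M, v, rcond=None)
    return np.allclose(M @ coeffs, v, atol=tol), coeffs

print(in_span(M, np.array([2.0, 4.0, 5.0])))    # False: (2,4,5) is not in span(A)
print(in_span(M, np.array([5.0, -2.0, 5.0])))   # True, with coefficients [8., -1.]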

Continuing with  A = {(1,0,1), (3,2,3)}, is there another descrip-

tion of the subspace span( A)? It is easy to see that every vector in

span( A) must have its first and third coordinates equal, and by try-

ing a few examples, it seems likely that  every vector with first and

third coordinates equal is in span( A). To prove this, we would need

to demonstrate that a suitable linear combination of the two vectors can be found for any such vector. In other words, we need to show

that the equation

(x, y, x) = λ1(1,0,1) + λ2(3,2,3)

has a solution for all values of  x  and  y. Fortunately this system of 

linear equations can easily be solved symbolically with the result

that the system is always consistent with solution

λ1 = x − 3y/2,   λ2 = y/2.

Therefore we have the fact that

span({(1,0,1), (3,2,3)}) = {(x, y, x) | x, y ∈ R}


2.3.1   Spanning Sets

So far in this section, we have started with a small collection of vec-

tors (that is, the set  A), and then  built a subspace (that is, span( A))

from that set of vectors. (Why do we want to find a spanning set of vectors for a subspace? The answer is that a spanning set is an effective way of describing a subspace. Every subspace has a spanning set, and so it is also a universal way of describing a subspace. Basically, once you know a spanning set for a subspace, you can easily calculate everything about that subspace.)

Now we consider the situation where we start with an arbitrary subspace V and try to find a set — hopefully a small set — of vectors A such that V = span(A). This concept is sufficiently

important to warrant a formal definition:

Definition 2.18.   (Spanning Set)

Let V  ⊆  Rn be a subspace. Then a set A  = {v1, v2, . . . , vk} of vectors,

each contained in V, is called a  spanning set for V if 

V  =  span( A).

Example 2.19.  (Spanning Set) If V   = {(x, y, 0) |  x, y ∈  R}, then V is

a subspace of R3. The set A  = {(1,0,0), (0,1,0)} is a  spanning set for

V, because every vector in V is a linear combination of the vectors in A,

and every linear combination of the vectors in A is in V. There are other

spanning sets for V — for example, the set B  = {(1,1,0), (1, −1, 0)} is

another spanning set for V.

Example 2.20.   (Spanning Set) If V  = {(w, x, y, z) | w + x + y + z =  0},

then V is a subspace of R4. The set

A = {(1, −1, 0, 0), (1, 0, −1, 0), (1, 0, 0, −1)}

is a spanning set for V because every vector in V is a linear combination

of the vectors in A, and every linear combination of the vectors in A is in

V. There are many other spanning sets for V — for example, the set

B = {(2, −1, −1, 0), (1, 0, −1, 0), (1, 0, 0, −1)}

is a different spanning set for the same subspace.

One critical point that often causes difficulty for students begin-

ning linear algebra is understanding the difference between “span”

and “spanning set”; the similarity in the phrases seems to cause

confusion. To help overcome this, we emphasise the difference.8

8 Another way to think of it is that a "spanning set" is like a list of LEGO® shapes that you can use to build a model, while the "span" is the completed model (the subspace). Finding a spanning set for a subspace is like starting with the completed model and asking "What shapes do I need to build this model?".

Key Concept  2.21.  (Difference between span and spanning set) To

remember the difference between span and spanning set, make sure

 you understand that:

• The span of a set A of vectors is the entire subspace that can be

“built” from the vectors in A by taking linear combinations in all

 possible ways.

• A spanning set of a subspace V is the  set of vectors  that are

needed in order to “build” V .


In the previous examples, the spanning sets were just “given”

with no explanation of how they were found, and no proof that

they were the correct spanning sets. In order to show that a particu-

lar set  A actually is  a spanning set for a subspace  V , it is necessary

to check two things:

1. Check that the vectors in  A are actually contained in V  — this

guarantees that span( A) ⊆ V .

2. Check that every vector in  V  can be made as a linear combination

of the vectors in  A — this shows that span( A) = V .

The first of these steps is easy and, by now, you will not be sur-

prised to discover that the second step can be accomplished by solving a system of linear equations.9

9 In fact, almost everything in linear algebra ultimately involves nothing more than solving a system of linear equations!

Example 2.22.  (Spanning Set With Proof) We show that the set A   =

{(1,1, −1), (2,1,1)} is a spanning set for the subspace

V  = {(x, y, z) |  z =  2x − 3 y} ⊂ R3.

First notice that both (1,1, −1) and  (2,1,1) satisfy the condition that

 z  =  2x − 3 y and so are actually in V. Now we need to show that  every

vector in V is a linear combination of these two vectors. Any vector in

V has the form (x, y, 2x − 3 y) and so we need to show that the vector

equation

(x, y, 2x − 3 y) = λ1(1,1, −1) + λ2(2,1,1)

in the two unknowns λ1 and  λ2 is consistent, regardless of the values of x

and y. Writing this out as a system of linear equations we get

λ1   +   2λ2   =   x

λ1   +   λ2   =   y

−λ1   +   λ2   =   2x − 3 y

Solving this system of linear equations using the techniques of the previ-

ous chapter shows that this system always has a  unique solution, namely

λ1  =  2 y − x   λ2 =  x − y.

 Hence every vector in V can be expressed as a linear combination of the

two vectors, showing that these two vectors are a spanning set for V.
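The symbolic solution in this example can be checked mechanically; here is a possible sympy sketch (the symbol names are ours):

import sympy as sp

x, y, l1, l2 = sp.symbols('x y lambda1 lambda2')

# (x, y, 2x - 3y) = l1*(1, 1, -1) + l2*(2, 1, 1), coordinate by coordinate
eqs = [sp.Eq(l1 + 2*l2, x),
       sp.Eq(l1 + l2, y),
       sp.Eq(-l1 + l2, 2*x - 3*y)]

print(sp.solve(eqs, [l1, l2]))   # {lambda1: -x + 2*y, lambda2: x - y}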

Actually finding a spanning set for a subspace is not so difficult,

 because it can just be built up vector-by-vector. Suppose that a sub-

space V  is given in some form (perhaps by a formula) and you need

to find a spanning set for  V . Start by just picking any non-zero vec-

tor v1 ∈   V , and examine span({v1}) — if this is equal to  V , then

you have finished, otherwise there are some vectors in  V  that can-

not yet be built just from  v1. Choose one of these “unreachable”

vectors, say v2, add it to the set you are creating, and then examine

span({v1, v2}) to see if this is equal to V. If it is, then you are finished and otherwise there is another "unreachable" vector, which you call v3 and add to the set, and so on. After some finite number


of steps (say k steps), this process will eventually terminate10 when there are no more unreachable vectors in V, in which case

V = span({v1, v2, . . . , vk})

and you have found a spanning set for V.

10 The reason that this process must terminate (i.e. not go on for ever) will become clear over the next few sections.

Example 2.23.  (Finding Spanning Set) Let

V  = {(x, y, z) |  z =  2x − 3 y},

which is a subspace of R3. To find a spanning set for V, we start by choos-

ing any non-zero vector that lies in V, say  v1   = (1,0,2). It is clear that

span({v1}) is strictly smaller than V, because every vector in span({v1})

has a zero second coordinate, whereas there are vectors in V that do not

have this property. So we choose any one of these — say,  v2  = (0,1, −3)

 — and now consider span(

{v1, v2

}), which we can now prove is actually

equal to V. Thus, a suitable spanning set for V is the set

 A = {(1,0,2), (0,1, −3)}.

2.4   Linear Independence

In the last section, we learned how a subspace can always be de-

scribed by giving a  spanning set for that subspace. There are many

spanning sets for any given subspace, and in this section we con-

sider when a spanning set is  efficient — in the sense that it is as

small as it can be. For example, here are three different spanning sets for the xy-plane in R3 (remember that the xy-plane is the set of vectors {(x, y, 0) | x, y ∈ R}):

A1 = {(1,0,0), (0,1,0)}
A2 = {(1,1,0), (1,−1,0), (1,3,0)}
A3 = {(2,2,0), (1,2,0)}

Which of these is the “best” spanning set to use? There is perhaps

nothing much to choose between  A1  and  A3, because each of them

contain two vectors,11 but it is clear that A2 is unnecessarily big — if we throw out any of the three vectors in A2, then the remaining two vectors still span the same subspace. On the other hand, both of the spanning sets A1 and A3 are minimal spanning sets for the xy-plane in that if we discard any of the vectors, then the remaining set no longer spans the whole xy-plane.

11 But maybe A1 looks "more natural" because the vectors have such a simple form; later we will see that for many subspaces there is a natural spanning set, although this is not always the case.

The reason that  A2  is not a smallest-possible spanning set for the

xy-plane is that the third vector is  redundant — it is  already a linear

combination of the first two vectors:  (1,3,0) =  2(1,1,0) − (1, −1, 0)

and therefore any vector in span( A2) can be produced as a linear

combination only of the first two vectors. More precisely any linear

combination

α1(1,1,0) + α2(1, −1, 0) + α3(1,3,0)


of all three of the vectors in  A2  can be rewritten as

α1(1,1,0) + α2(1, −1, 0) + α3 (2(1,1,0) − (1, −1, 0))

which is equal to

(α1 + 2α3)(1,1,0) + (α2 − α3)(1,−1,0)

which is just a linear combination of the first two vectors with al-

tered scalars.

Therefore a spanning set for a subspace is an efficient way to

represent a subspace if none of the vectors in the spanning set is a

linear combination of the other vectors. While this condition is easy

to state, it is hard to work with directly, and so we use a condition

that means exactly the same thing but is easier to use.

Definition 2.24.  (Linear Independence)

Let A = {v1, v2, . . . , vk} be a set of vectors in Rn. Then A is called linearly independent (or just independent) if the only solution to the

vector equation

λ1v1 + λ2v2 + · · · + λk vk  =  0

in the unknowns λ1, λ2, . . ., λk  is the trivial solution λ1  =  λ2  = · · ·  =

λk  = 0.

Before seeing why this somewhat strange definition means ex-

actly the same as having no one of the vectors being a linear com-

bination of the others, we'll see a couple of examples. The astute reader (indeed, even the somewhat dull and apathetic reader) will

not be surprised to learn that testing a set of vectors for linear inde-

pendence involves solving a system of linear equations.

Example 2.25.  (Independent Set) In order to decide whether the set of 

vectors A   = {(1,1,2,2), (1,0, −1, 2), (2,1,3,1)} in R4 is linearly

independent, we need to solve the vector equation

λ1(1,1,2,2) + λ2(1,0, −1, 2) + λ3(2,1,3,1) = (0,0,0,0).

This definitely has at least one solution, namely the trivial solution

λ1   =   0, λ2   =   0, λ3   =   0, and so the only question is whether it has

more solutions. The vector equation is equivalent to the system of linear

equations

λ1   +   λ2   +   2λ3   =   0

λ1   +   λ3   =   0

2λ1 −   λ2   +   3λ3   =   0

2λ1   +   2λ2   +   λ3   =   0

which can easily be shown, by the techniques of Chapter 1, to have a unique solution (the trivial one), and so the set A is linearly independent.
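Since the vectors are independent exactly when this system has only the trivial solution, the test reduces to a rank computation, as in this numpy sketch:

import numpy as np

# The three vectors of A, one per row.
A = np.array([[1, 1, 2, 2],
              [1, 0, -1, 2],
              [2, 1, 3, 1]])

# Independent exactly when the rank equals the number of vectors.
print(np.linalg.matrix_rank(A) == A.shape[0])   # True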

A set of vectors that is  not  linearly independent is called  depen-

dent. To show that a set of vectors is dependent, it is only necessary to find an explicit non-trivial linear combination12 of the vectors equal to 0.

12 There is an asymmetry here similar to the asymmetry in subspace proofs. To show that a set of vectors is dependent only requires one non-trivial linear combination, whereas to show that a set of vectors is independent it is necessary in principle to show that every non-trivial linear combination of the vectors is non-zero. Of course in practice this is done by solving the appropriate system of linear equations.


Example 2.26.  (Dependent Set) Is the set

 A = {(1,3, −1), (2,1,2), (4,7,0)}

in R3 linearly independent? To decide this, set up the vector equation

λ1(1,3, −1) + λ2(2,1,2) + λ3(4,7,0) = (0,0,0)

and check how many solutions it has. This is equivalent to the system of 

linear equations

1λ1   +   2λ2   +   4λ3   =   0

3λ1   +   λ2   +   7λ3   =   0

−λ1   +   2λ2   =   0

 After Gaussian elimination, the augmented matrix for this system of linear

equations is

[ 1  2  4 | 0 ]
[ 0 −5 −5 | 0 ]
[ 0  0  0 | 0 ]

and so has infinitely many solutions, because λ3 is a free parameter.

While this is already enough to prove that A is dependent, it is always

useful to find an explicit solution which can then be used to double-check 

the conclusion. As λ3  is free, we can find a solution by putting  λ3  =  1, in

which case the second row gives  λ2  = −1 and the first row λ1  = −2. And

indeed we can check that

−2(1,3, −1) − 1(2,1,2) + 1(4,7,0) = (0,0,0)

as required to prove dependence.
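An explicit dependence like this can also be read off from the null space of the matrix whose columns are the given vectors; a sympy sketch:

import sympy as sp

# Columns are the vectors of A, so M times (l1, l2, l3) is the linear combination.
M = sp.Matrix([[1, 2, 4],
               [3, 1, 7],
               [-1, 2, 0]])

print(M.nullspace())   # [Matrix([[-2], [-1], [1]])], matching l1 = -2, l2 = -1, l3 = 1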

Now we’ll give a rigorous proof of the earlier claim that the

definition of linear independence (Definition  2.24) is just a way of 

saying that none of the vectors is a linear combination of the others.

Theorem  2.27.   Let A   = {v1, v2, . . . , vk} be a set of vectors in Rn.

Then A is linearly independent if and only if none of the vectors in A are a

linear combination of the others.

Proof. We will actually prove the contrapositive13 statement, namely that A is linearly dependent if and only if one of the vectors is a linear combination of the others.

13 Given the statement "A implies B", recall that the contrapositive statement is "not B implies not A". If a statement is true then so is its contrapositive, and vice versa.

First suppose that one of the

vectors, say vi, is a linear combination of the others: then there are

scalars α1, α2, . . ., αi−1, α i+1, . . ., αk  such that

vi  =  α1v1 + α2v2 + · · · + αi−1vi−1 + αi+1vi+1 + · · · + αk vk

and so there is a non-trivial linear combination of the vectors of  A

equal to 0, namely:

α1v1 + α2v2 + · · · + αi−1vi−1 − 1vi + αi+1vi+1 + · · · + αk vk  =  0

(This linear combination is not all-zero because the coefficient of  vi

is equal to −1 which is definitely non-zero.)


Next suppose that the set  A is linearly dependent. Then there is

some non-trivial linear combination of the vectors equal to  0:

α1v1 + α2v2 + · · · + αk vk  =  0.

Because this linear combination is non-trivial, not all of the coefficients are equal to 0, and so we can pick one of them, say αi, that is

non-zero. But then

−αivi =  α1v1 + α2v2 + · · · + αi−1vi−1 + αi+1vi+1 + · · · + αk vk,

and so because αi ≠ 0 we can divide by αi and get

vi = −(α1/αi)v1 − (α2/αi)v2 − · · · − (αi−1/αi)vi−1 − (αi+1/αi)vi+1 − · · · − (αk/αi)vk

so one of the vectors is a linear combination of the others.

There are two key facts about dependency that are intuitively

clear, but useful enough to state formally:

1. If  A is a linearly independent set of vectors in Rn, then any sub-

set of  A is also linearly independent.

2. If  B  is a linearly dependent set of vectors in Rn then any superset

of  B  is also linearly dependent.

In other words, you can remove vectors from an independent set

and it remains independent and you can add vectors to a depen-

dent set and it remains dependent.

2.5   Bases

In the last few sections, we have learned that giving a  spanning set

for a subspace is an  effective way of describing a subspace and that

a spanning set is efficient if it is linearly independent. Therefore an

excellent way to describe or transmit, for example by computer, a

subspace is to give a linearly independent spanning set  for the sub-

space. This concept is so important that it has a special name:

Definition 2.28. (Basis)

Let V be a subspace of Rn. Then a basis for V is a linearly independent

spanning set for V. In other words, a basis is a set of vectors A  ⊂   Rn

such that

• V  =  span( A), and

• A is linearly independent.

Example 2.29.  (Basis) The set A   = {(1,0,0), (0,1,0)} is a  basis for

the xy-plane in R3, because it is a linearly independent set of vectors and

any vector in the xy-plane can be expressed as a linear combination of the vectors of A.


shows that you can start with any  linearly independent set and

augment it vector-by-vector to obtain a basis containing the original

linearly independent set of vectors.16

16 We still have not yet shown that this process will actually terminate, but will do so in the next section.

Another approach to finding a basis of a subspace is to start with

a spanning set that is linearly dependent and to remove vectors from

it one-by-one. If the set is linearly dependent then one of the vec-

tors is a linear combination of the others, and so it can be removed

from the set without altering the span of the set of vectors. This

process can be repeated until the remaining vectors are linearly

independent in which case they form a basis.

Example 2.32.  (Basis from a spanning set) Let

 A = {(1,1, −2), (−2, −2, 4), (−1, −2, 3), (5, −5, 0)}

be a set of vectors in R3, and let V  =  span( A). What is a basis for V? We

start by testing whether A is linearly independent by solving the system of linear equations

λ1(1,1, −2) + λ2(−2, −2, 4) + λ3(−1, −2, 3) + λ4(5, −5, 0) = (0,0,0)

to see if it has any non-trivial solutions. If so, then one of the vectors can

be expressed as a linear combination of the others and discarded. In this

case, we discover that  (−2, −2, 4) = −2(1,1, −2) and so we can throw

out (−2, −2, 4). Now we are left with

{(1,1, −2), (−1, −2, 3), (5, −5, 0)}

and test whether this set is linearly independent. By solving

λ1(1,1, −2) + λ2(−1, −2, 3) + λ3(5, −5, 0) = (0,0,0)

we discover that (5, −5, 0) =   15(1,1, −2) + 10(−1, −2, 3) and so we

can discard (5, −5, 0). Finally the remaining two vectors are linearly

independent and so the set

{(1,1, −2), (−1, −2, 3)}

is a basis for V.
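The vector-by-vector discarding process is exactly what the pivot columns of the reduced row-echelon form compute. A sympy sketch with the vectors of A as columns, so that the resulting basis consists of vectors from A itself:

import sympy as sp

M = sp.Matrix([[1, -2, -1, 5],
               [1, -2, -2, -5],
               [-2, 4, 3, 0]])

_, pivots = M.rref()
print(pivots)                              # (0, 2): the first and third columns survive
print([tuple(M.col(j)) for j in pivots])   # [(1, 1, -2), (-1, -2, 3)]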

The most important property of a basis for a vector space  V  is

that every vector in  V  can be expressed as a linear combination of 

the basis vectors in exactly one way; we prove this in the next result:

Theorem  2.33.   Let B   =   {v1, v2, . . . , vk} be a basis for the sub-

space V. Then for any vector v ∈  V, there is a unique choice of scalars

α1, α2, . . . , αk  such that

v =  α1v1 + α2v2 + · · · + αk vk.

We call the scalars α1, . . . , αk the coordinates of v in the basis B.

Proof. As B is a spanning set for V, there is at least one way of expressing v as a linear combination of the basis vectors. So we just need to show that there cannot be two different linear combinations


each equal to v. So suppose that there are scalars  α1, α2, . . . ,αk  and

 β1, β2, . . . , βk  such that

v =  α1v1 + α2v2 + · · · + αk vk

v =  β1v1 + β2v2 + · · · + βk vk

Subtracting these expressions and rearranging, we discover that

0 = (α1 − β1)v1 + (α2 − β2)v2 + · · · + (αk − βk )vk .

As B  is a linearly independent set of vectors, the only linear com-

 bination equal to 0  is the trivial linear combination with all coeffi-

cients equal to  0, and so α1 −  β1  =  0, α2 −  β2  =  0, . . . , αk −  βk   =  0

and so αi  =  βi  for all i. Therefore the two linear combinations for v

are actually the same.

2.5.1   Dimension

As previously mentioned, a subspace of Rn will usually have infinitely many bases. However, these bases will all share one feature

— every basis for a subspace  V  contains the same number  of vectors.

This fact is not at all obvious, and so we will give a proof for it.

Actually we will prove a slightly more technical result that has the

result about bases, along with a number of other useful results, as

simple consequences.17 While this is a result of fundamental importance, it uses some fiddly notation with lots of subscripts, so do not feel alarmed if you do not understand it first time through.

17 A result that is a straightforward consequence of a theorem is called a "corollary".

Theorem 2.34. Let A = {v1, v2, . . . , vk} be a linearly independent set of vectors. Then any set of ℓ > k vectors contained in V = span(A) is dependent.

Proof. Let {w1, w2, . . . , wℓ} be a set of ℓ > k vectors in V. Then each of these can be expressed as a linear combination of the vectors in A, and so there are scalars αij ∈ R, where 1 ≤ i ≤ ℓ and 1 ≤ j ≤ k, such that:

w1 = α11v1 + α12v2 + · · · + α1kvk
w2 = α21v1 + α22v2 + · · · + α2kvk
...
wℓ = αℓ1v1 + αℓ2v2 + · · · + αℓkvk

Now consider what happens when we test {w1, w2, . . . , wℓ} for linear dependence. We try to solve the system of linear equations

β1w1 + β2w2 + · · · + βℓwℓ = 0   (2.7)

and determine if there are any non-trivial solutions to this system.

By replacing each w i   in (2.7) with the corresponding expression as

a linear combination of the vectors in  A, we get a huge equation:

0 = β1(α11v1 + α12v2 + · · · + α1kvk)
  + β2(α21v1 + α22v2 + · · · + α2kvk)
  + · · · + βℓ(αℓ1v1 + αℓ2v2 + · · · + αℓkvk).   (2.8)


However, this is a linear combination of the vectors in  A that is

equal to the zero vector. Because  A is a linearly independent set of 

vectors, this happens if and only if the coefficients of the vectors  vi

in (2.8) are all zero. In other words, the scalars {β1, β2, . . . , βℓ} must

satisfy the following system of linear equations:

α11β1 + α21β2 + · · · + αℓ1βℓ = 0
α12β1 + α22β2 + · · · + αℓ2βℓ = 0
...
α1kβ1 + α2kβ2 + · · · + αℓkβℓ = 0.

This is a homogeneous18 system of linear equations, and so it is consistent. However, there are more variables than equations and so there is at least one non-basic variable, and hence there is at least one non-trivial choice of scalars {β1, β2, . . . , βℓ} satisfying (2.7), thereby showing that {w1, w2, . . . , wℓ} is linearly dependent.

18 The constant term in each equation is zero.

Corollary 2.35.  Every basis for a subspace V of Rn contains the same

number of vectors; this number is then called the  dimension of the sub-

space V and is denoted by dim(V ).

Proof. Suppose that A = {v1, v2, . . . , vk} and B = {w1, w2, . . . , wℓ} are two bases for V. Then both A and B are linearly independent sets of vectors and both have the same span. So by Theorem 2.34 it follows that k ≤ ℓ and ℓ ≤ k, and so ℓ = k.

Example 2.36.   (Dimension of Rn) The standard basis for R2 contains

two vectors, the standard basis for R3 contains three vectors and the stan-

dard basis for Rn contains n vectors, so we conclude that the  dimension

of Rn is equal to n.

Example 2.37.  (Dimension of a line) A line through the origin in Rn is

a subspace consisting of all the multiples of a given non-zero vector:

L = {λv | λ ∈ R}

The set {v} containing the single vector v  is a basis for L and so a line is

1-dimensional.

This shows that the formal algebraic definition of dimension

coincides with our intuitive geometric understanding of the word

dimension, which is reassuring.19

19 Of course, we would expect this to be the case, because the algebraic notion of "dimension" was developed as an extension of the familiar geometric concept.

Another important corollary of Theorem  2.34 is that we are fi-

nally in a position to show that the process of finding a basis by

extending20 a linearly independent set will definitely finish.

20 "extending" just means "adding vectors to".

Corollary 2.38. If V is a subspace of Rn, then any linearly independent

set A of vectors in V is contained in a basis for V.

Proof. If span(A) ≠ V, then adding a vector in V\span(A) to A creates a strictly larger independent set of vectors. As no set


of  n  + 1 vectors in Rn is linearly independent, this process must

terminate in at most n  steps, and when it terminates, the set is a

 basis for V .

The next result seems intuitively obvious, because it just says

that dimension behaves as you would expect, in that a subspace can only properly contain other subspaces of strictly lower dimension.

Theorem 2.39. Suppose that S, T are subspaces of Rn and that S ⊊ T.

Then dim(S) < dim(T ).

Proof.   Let  BS  be a basis for  S. Then  BS  is a linearly independent set

of vectors contained in  T , and so it can be extended to a basis for  T .

As span(BS) ≠ T, the basis for T is strictly larger than the basis for

S and so dim(S) < dim(T ).

Corollary 2.35 has a number of important consequences. If A is a set of vectors contained in a subspace V, then normally there is no particular relationship between the properties "A is linearly inde-

pendent” and “ A is a spanning set for  V ” in that  A can have none,

either or both of these properties. However if  A has the right size to

 be a basis, then it must either have none or both of the properties.

Corollary 2.40.  Let V be a k-dimensional subspace of Rn. Then

1. Any linearly independent set of k vectors of V is a basis for V.

2. Any spanning set of k vectors of V is a basis for V.

Proof.  To prove the first statement, let  A be a linearly independent

set of k vectors in V. By Corollary 2.38, A can be extended to a basis

 but by Corollary  2.35 this basis contains k  vectors and so no vectors

can be added to  A. Therefore  A is already a basis for  V .

For the second statement, let B be a spanning set of k vectors, and let B′ be a largest independent set of vectors contained in B. Then span(B′) = span(B) = V and so B′ is a basis for V. By Corollary 2.35 this basis contains k vectors and so B′ = B, showing that B was already linearly independent.

Example 2.41. (Dimension of a specific plane) The subspace V = {(x, y, z) | x + y + z = 0} is a plane through the origin in R3. What

is its dimension? We can quickly find two vectors in V, namely  (1, −1, 0)

and (1,0, −1), and as they are not multiples of each other, the set

 A = {(1, −1, 0), (1,0, −1)}

is linearly independent. As R3 is 3-dimensional, any proper subspace of 

R3 has dimension at most 2, and so this set of vectors is a basis.


3   Matrices and Determinants

This chapter introduces matrix algebra  and explains the fundamental relationships between

matrices and their properties, and the various subspaces associated with a matrix.

Before commencing this chapter, students should be able to:

• Solve systems of linear equations,

• Confidently identify and manipulate subspaces, including rapidly determining spanning sets and

 bases for subspaces, and

• Add and multiply matrices.

After completing this chapter, students will be able to:

• Understand the operations of matrix algebra and identify the similarities and differences between

matrix algebra and the algebra of real numbers,

• Describe, and find bases for, the row space, column space and null space of a matrix,

• Find the rank and nullity of a matrix and understand how they are related by the rank-nullity theo-

rem, and

• Compute determinants and understand the relationship between determinants, rank and invertibility

of matrices.

An m × n matrix is a rectangular array of numbers with  m  rows

and n  columns.

3.1   Matrix Algebra

In this section we consider the algebra of matrices — that is, the system of mathematical operations such as addition, multiplication, inverses and so on, where the operands1 are matrices, rather than numbers. In isolation, the basic operations are all familiar from high school — in other words, adding two matrices or multiplying two matrices should be familiar to everyone — but matrix algebra is primarily concerned with the relationships between the operations.

1 This is mathematical terminology for "the objects being operated on".

3.1.1   Basic Operations

The basic operations for matrix algebra are  matrix addition, matrix

multiplication, matrix transposition and scalar multiplication. For completeness, we give the formal definitions of these operations:


Definition 3.1.  (Matrix Operations)

The basic matrix operations are matrix addition, matrix multiplication,

matrix transposition and  scalar multiplication, which are defined as

follows:

Matrix Addition: Let A = (aij) and B = (bij) be two m × n matrices.

Then their sum  C  =  A + B is the m × n matrix defined by

cij  =  a ij  + bij .

Matrix Multiplication: Let A   = (aij ) be an m ×  p matrix, and B   =

(bij ) be a p × n matrix. Then their  product C  =  AB is the m × n matrix

defined by

cij = ∑_{k=1}^{p} aik bkj.

Matrix Transposition: Let A = (aij) be an m × n matrix. Then the transpose C = AT of A is the n × m matrix defined by

cij  =  a ji .

Scalar Multiplication: Let A = (aij ) be an m × n matrix, and α ∈ R be

a scalar. Then the scalar multiple C  =  α A is the m × n matrix defined by

cij  =  αaij .
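To make the index pattern in the multiplication formula concrete, here is a minimal pure-Python sketch of the definition (written for clarity, not efficiency; the helper name is ours):

def mat_mul(A, B):
    """Product of an m x p matrix A and a p x n matrix B, given as lists of rows."""
    m, p, n = len(A), len(B), len(B[0])
    assert all(len(row) == p for row in A), "inner dimensions must agree"
    # c[i][j] = sum over k of a[i][k] * b[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]

print(mat_mul([[1, 1], [2, 1]], [[3, -1], [0, 4]]))   # [[3, 3], [6, 2]]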

The properties of these operations are mostly obvious but, again for completeness, we list them all and give them their for-

mal names.

Theorem  3.2.  If A, B and C are matrices and  α, β  are scalars then,

whenever the relevant operations are defined, the following properties hold:

1. A + B =  B  + A (matrix addition is  commutative)

2.   ( A + B) + C =  A + (B + C) (matrix addition is associative)

3.   α( A + B) = α A + αB

4.   (α + β) A =  α A + β A

5.   (αβ) A =  α( β A)

6. A(BC) = ( AB)C (matrix multiplication is associative)

7.   (α A)B =  α( AB) and A(αB) = α( AB)

8. A(B + C) =   AB +  AC (multiplication is left-distributive over

addition)

9.   ( A +  B)C   =   AC +  BC (multiplication is right-distributive over

addition)

10.   ( AT )T  =  A

11.   ( A + B)T  =  AT  + BT 

12.   ( AB)T  =  BT  AT 


Proof.  All of these can be proved by elementary algebraic manip-

ulation of the expressions for the (i, j)-entry of the matrices on both

sides of each equation. We omit the proofs because they are slightly

tedious and not very illuminating.2

2 However it may be worth your while working through one of them, say the proof that matrix multiplication is associative, to convince yourself that you can do it.

Almost all of the properties in Theorem 3.2 are unsurprising and essentially mirror the properties of the algebra of real numbers.3

3 A 1 × 1 matrix can be viewed as essentially identical to a real number, and so this is also not surprising.

Probably the only property in the list that is not immediately “obvi-

ous” is property (12) stating that the transpose of a matrix product

is the product of the matrix transposes  in reverse order.

( AB)T  = B T  AT 

This property extends to longer products of matrices, for example

we can find the transpose of ABC as follows:4

(ABC)T = ((AB)C)T = CT(AB)T = CT(BT AT) = CT BT AT

4 A cautious reader might — correctly — object that ABC is not a legitimate expression in matrix algebra, because we have only defined matrix multiplication to be a product of two matrices. To make this a legal expression in matrix algebra, it really needs to be parenthesised, and so we should use either A(BC) or (AB)C. However, because matrix multiplication is associative, these evaluate to the same matrix and so, by convention, as it does not matter which way the product is parenthesised, we omit the parentheses altogether.

It is a nice test of your ability to structure a proof by induction  to

prove formally that

(A1 A2 · · · An−1 An)T = AnT An−1T · · · A2T A1T.

However, rather than considering the obvious properties that

are in the list, it is more instructive to consider the most obvious

omission from the list; in other words, an important property that

matrix algebra does not  share with the algebra of real numbers. This

is the property of commutativity of multiplication because, while

the multiplication of real numbers is commutative, it is easy to check by example that matrix multiplication is not commutative. For

example,

[ 1 1 ] [ 3 −1 ]   [ 3 3 ]
[ 2 1 ] [ 0  4 ] = [ 6 2 ]

but

[ 3 −1 ] [ 1 1 ]   [ 1 2 ]
[ 0  4 ] [ 2 1 ] = [ 8 4 ]

If  A and  B  are two specific matrices, then it might be the case that

 AB  =  BA, in which case the two matrices are said to commute, but

usually it will be the case that AB ≠ BA. This is a key difference between matrix algebra and the algebra of real numbers.

There are some other key differences worth delving into: in real

algebra, the numbers 0 and 1 play special roles, being the  additive

identity and  multiplicative identity respectively. In other words, for

any real number x ∈ R we have

x + 0 = 0 + x = x   and   1·x = x·1 = x

and

0·x = x·0 = 0.   (3.1)

In the algebra of square matrices (that is, n × n matrices for some n) we can analogously find an additive identity and a multiplicative


identity. The additive identity is the matrix On  with every entry equal

to zero, and it is obvious that for any  n × n matrix  A,

 A + On = On +  A  =  A .

The multiplicative identity is the matrix In where every entry on the main diagonal5 is equal to one, and every entry off the main diagonal is equal to zero. Then for any n × n matrix A,

AIn = InA = A.

5 The main diagonal consists of the (1,1), (2,2), . . ., (n,n) positions. As an example,

I3 = [ 1 0 0 ]
     [ 0 1 0 ]
     [ 0 0 1 ]

When the size of the matrices is unspecified, or irrelevant, we will

often drop the subscript and just use  O  and  I  respectively. As the

terms “additive/multiplicative identity" are rather cumbersome, the

matrix O  is usually called the  zero matrix and the matrix  I   is usually

called the identity matrix or just the identity.

The property (3.1) relating multiplication and zero also holds in matrix algebra, because it is clear that

 AO =  O A =  O

for any square matrix  A. However, there are other important prop-

erties of real algebra that are not  shared by matrix algebra. In par-

ticular, in real algebra there are no non-zero zero divisors, so that if 

xy  =  0 then at least one of  x  and  y  is equal to zero. However this is

not true for matrices — there are products equal to the zero matrix

even if neither matrix is zero. For example,

[ 1 1 ] [  2 −1 ]   [ 0 0 ]
[ 2 2 ] [ −2  1 ] = [ 0 0 ]

Definition 3.3.  (A menagerie of square matrices)

Suppose that A = (aij ) is an n × n matrix. Then

• A is the zero matrix if aij  =  0  for all i, j.

• A is the identity matrix if aij  =  1  if i =  j and 0  otherwise.

• A is a symmetric matrix if AT  =  A.

• A is a skew-symmetric matrix if AT  = − A.

• A is a diagonal matrix if aij  =  0  for all i =  j .

• A is an upper-triangular matrix if aij  =  0  for all i >   j.

• A is a lower-triangular matrix if aij  =  0  for all i <  j .

• A is an idempotent matrix if A2 =  A, where A2 =  AA is the

 product of A with itself.

• A is a nilpotent matrix if Ak = O for some k, where Ak = AA · · · A (k times).


Example 3.4.  (Matrices of various types) Consider the following 3 × 3

matrices:

A = [ 1 0 −1 ]   B = [ −1 0 0 ]   C = [ −1 0 1 ]
    [ 0 2  3 ]       [  0 2 0 ]       [  0 2 1 ]
    [ 0 0  1 ]       [  0 0 0 ]       [  1 1 3 ]

Then A is an upper-triangular matrix, B is a diagonal matrix and C is

a symmetric matrix.

Example 3.5.  (Nilpotent matrix) The matrix

A = [  2  4 ]
    [ −1 −2 ]

is nilpotent because

A2 = [  2  4 ] [  2  4 ]   [ 0 0 ]
     [ −1 −2 ] [ −1 −2 ] = [ 0 0 ]

Exercise 3.1.1.  Show that an upper-triangular matrix with zero diagonal

is nilpotent.
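For experimenting with this exercise, a small numpy sketch may help (the helper name is ours; it relies on the standard fact that an n × n matrix is nilpotent exactly when its n-th power is the zero matrix):

import numpy as np

def is_nilpotent(A):
    """Check nilpotency of a square matrix by testing A**n against zero."""
    n = A.shape[0]
    return np.allclose(np.linalg.matrix_power(A, n), 0)

print(is_nilpotent(np.array([[2, 4], [-1, -2]])))    # True, as in Example 3.5
print(is_nilpotent(np.triu(np.ones((3, 3)), k=1)))   # True: upper-triangular, zero diagonal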

3.2   Subspaces from matrices

There are various vector subspaces associated with a matrix, and

there are useful relationships between the properties of the matrices

and subspaces. The three principal subspaces associated with a

matrix are the row space, the column space and the null space of a

matrix. The first two of these are defined analogously to each other, while the third is somewhat different. If A is an m × n matrix,

then each of the rows of the matrix can be viewed as a vector in Rn,

while each of the columns of the matrix can be viewed as a vector in

Rm.

Definition 3.6.  (Row and Column Space)

Let A be an m × n matrix. Then the row space and column space are

defined as follows:

Row Space The  row space of A is the subspace of Rn that is spanned

 by the rows of A. In other words, the row space is the subspace con-sisting of all the linear combinations of the rows of A. We denote it by

rowsp( A).

Column Space The  column space of A is the subspace of Rm that is

spanned by the columns of A. In other words, the column space is the

subspace consisting of all the linear combinations of the columns of A. We

denote it by colsp( A).

Example 3.7.   (Row space and Column Space) Let A be the 3 × 4 matrix

A = [ 1  0 1 −1 ]
    [ 2 −1 2  0 ]   (3.2)
    [ 1  0 2  1 ]


Then the row space of A is the subspace of R4 defined by

rowsp( A) = span({(1,0,1, −1), (2, −1,2,0), (1,0,2,1)}),

while the column space of A is the subspace of R3 defined by

colsp( A) = span({(1,2,1), (0, −1, 0), (1,2,2), (−1,0,1)}).

(Remember the conventions regarding row and column vectors described

in Chapter 1.)

As described in Chapter 2, it is easy to answer any particular

question about a subspace if you know a spanning set for that

subspace. In particular, it is easy to determine whether a given

vector is in the row or column space of a matrix just by setting up

the appropriate system of linear equations.

Example 3.8. (Vector in row space) Is the vector (2, 1, −1, 3) in the row space of the matrix A shown in (3.2) above? This question is equivalent to

asking whether there are scalars  λ1, λ2 and  λ3  such that

λ1(1,0,1, −1) + λ2(2, −1,2,0) + λ3(1,0,2,1) = (2,1, −1, 3).

By considering each of the four coordinates in turn, this corresponds to the

 following system of four linear equations in the three variables:

 λ1 + 2λ2 +  λ3 =  2
      −λ2       =  1
 λ1 + 2λ2 + 2λ3 = −1
−λ1       +  λ3 =  3

The augmented matrix for this system is

[  1  2 1 |  2 ]
[  0 −1 0 |  1 ]
[  1  2 2 | −1 ]
[ −1  0 1 |  3 ]

which, after row-reduction, becomes

[ 1  2 1 |  2 ]
[ 0 −1 0 |  1 ]
[ 0  0 1 | −3 ]
[ 0  0 0 | 13 ]

and so the system is inconsistent and we conclude that (2, 1, −1, 3) ∉ rowsp(A). Notice that the coefficient matrix of the system of linear equa-

tions is the transpose of the original matrix.

Example 3.9. (Vector in column space) Is the vector (1, −1, 2) in the column space of the matrix A shown in (3.2) above? This question is equivalent to

asking whether there are scalars  λ1, λ2, λ3 and  λ4 such that

λ1(1,2,1) + λ2(0, −1, 0) + λ3(1,2,2) + λ4(−1,0,1) = (1, −1, 2).


By considering each of the three coordinate positions in turn, this corre-

sponds to a system of three equations in the four variables:

 λ1       +  λ3 − λ4 =  1
2λ1 − λ2 + 2λ3       = −1
 λ1       + 2λ3 + λ4 =  2

The augmented matrix for this system is

[ 1  0 1 −1 |  1 ]
[ 2 −1 2  0 | −1 ]
[ 1  0 2  1 |  2 ]

which, after row reduction, becomes

[ 1  0 1 −1 |  1 ]
[ 0 −1 0  2 | −3 ]
[ 0  0 1  2 |  1 ]

and so this system of linear equations has three basic variables, one free

 parameter and therefore infinitely many solutions. So we conclude that

(1, −1, 2) ∈   colsp( A). We could, if necessary, or just to check, find a

 particular solution to this system of equations. For example, if we set the

 free parameter λ4  = 1  then the corresponding solution is  λ1  = 3, λ2  = 5,

λ3 = −1 and  λ4 =  1  and we can check that

3(1,2,1) + 5(0, −1, 0) − (1,2,2) + (−1,0,1) = (1, −1, 2).

Notice that in this case, the augmented matrix of the system of linear

equations is just the original matrix itself.
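Both membership tests can be run numerically; a numpy sketch (the helper name is ours), using the fact that the row space of A is the column space of AT:

import numpy as np

A = np.array([[1.0, 0.0, 1.0, -1.0],
              [2.0, -1.0, 2.0, 0.0],
              [1.0, 0.0, 2.0, 1.0]])

def in_colsp(M, b, tol=1e-10):
    """b lies in colsp(M) iff the least-squares residual of Mx = b vanishes."""
    x, *_ = np.linalg.lstsq(M, b, rcond=None)
    return np.allclose(M @ x, b, atol=tol)

print(in_colsp(A, np.array([1.0, -1.0, 2.0])))          # True, as in Example 3.9
print(in_colsp(A.T, np.array([2.0, 1.0, -1.0, 3.0])))   # False: the row space test of Example 3.8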

In the previous two examples, the original question led to sys-

tems of linear equations whose coefficient matrix was either the

original matrix or the transpose of the original matrix.

In addition to being able to identify whether particular vectors

are in the row space or column space of a matrix, we would also

like to be able to find a basis for the subspace and thereby deter-

mine its dimension. In Chapter 2  we described a technique where

any spanning set for a subspace can be  reduced to a basis by suc-

cessively throwing out vectors that are linear combinations of the

others. While this technique works perfectly well for determining

the dimension of the row space or column space of a matrix, there

is an alternative approach based on two simple observations:

1. Performing elementary row operations on a matrix does not change its row space.6

2. The non-zero rows of a matrix in row-echelon form are linearly independent, and therefore form a basis for the row space of that matrix.

6 However, elementary row operations do change the column space!!

The consequence of these two facts is that it is very easy to find

a basis for the row space of a matrix — simply put it into row-echelon form and then write down the non-zero rows that are


found. However the basis that is found by this process will not

usually be a subset of the  original rows of the matrix. If it is neces-

sary to find a basis for the row space whose vectors are all original

rows of the matrix, then either of the two earlier techniques can be

used.

Example 3.10.  (Basis of row space) Consider the problem of finding a

basis for the row space of the  4 × 5 matrix

A = [ 1  2 −1 −1 4 ]
    [ 0  0  0  0 0 ]
    [ 1 −1  1  0 1 ]
    [ 3  0  1 −1 6 ]

After performing the elementary row operations R3 ← R3 − R1, R4 ← R4 − 3R1, then R2 ↔ R3 and finally R4 ← R4 − 2R2, we end up with the following matrix, which we denote A′, which is in row-echelon form.

A′ = [ 1  2 −1 −1  4 ]
     [ 0 −3  2  1 −3 ]
     [ 0  0  0  0  0 ]
     [ 0  0  0  0  0 ]

The key point is that the elementary row operations have not changed the

row space of the matrix in any way and so rowsp(A′) = rowsp(A).

However it is obvious that the two non-zero rows

{(1, 2, −1, −1, 4), (0, −3, 2, 1, −3)}

are a basis for the row space of A′, and so they are also a basis for the row space of A.

To find a basis for the column space of the matrix A, we cannot

do elementary row operations because they alter the column space.

However it is clear that colsp( A) =   rowsp( AT ), and so just trans-

posing the matrix and then performing the same procedure will

find a basis for the column space of  A.

Example 3.11.  (Basis of column space) What is a basis for the  column

space of the matrix A of Example  3.10? We first transpose the matrix,

 getting

AT = [  1 0  1  3 ]
     [  2 0 −1  0 ]
     [ −1 0  1  1 ]
     [ −1 0  0 −1 ]
     [  4 0  1  6 ]

and then perform Gaussian elimination to obtain the row-echelon matrix

[ 1 0  1  3 ]
[ 0 0 −3 −6 ]
[ 0 0  0  0 ]
[ 0 0  0  0 ]
[ 0 0  0  0 ]

whose row space has basis {(1, 0, 1, 3), (0, 0, −3, −6)}. Therefore these two vectors are a basis for the column space of A.
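Computer algebra systems package this whole procedure; for instance, a sympy sketch:

import sympy as sp

A = sp.Matrix([[1, 2, -1, -1, 4],
               [0, 0, 0, 0, 0],
               [1, -1, 1, 0, 1],
               [3, 0, 1, -1, 6]])

print(A.rowspace())     # two echelon rows: a basis for the row space
print(A.columnspace())  # two of the original columns: a basis for the column space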


What can be said about the dimension of the row space and col-

umn space of a matrix? In the previous two examples, we found

that the row space of the matrix  A is a 2-dimensional subspace of 

R5, and the column space of  A is a 2-dimensional subspace of R4.

In particular, even though they are subspaces of different ambient

vector spaces, the dimensions of the row space and column space

turn out to be equal. This is not an accident, and in fact we have the

following surprising result:

Theorem  3.12.  Let A be an m × n matrix. Then the dimension of its row

space is equal to the dimension of its column space.

Proof.   Suppose that {v1, v2, . . . , vk} is a basis for the column space

of  A. Then each column of  A can be expressed as a linear combi-

nation of these vectors; suppose that the i-th column ci  is given

 by

ci = γ1iv1 + γ2iv2 + · · · + γkivk

Now form two matrices as follows:  B is an  m × k  matrix whose

columns are the basis vectors  vi, while C   = (γij ) is a k × n matrix

whose i-th column contains the coefficients γ1i, γ2i, . . ., γki . It then

follows7 that A = BC.

7 You may have to try this out with a few small matrices first to see why this is true. It is not difficult when you see a small situation, but it is not immediately obvious either.

However we can also view the product  A   =   BC  as expressing

the rows of  A as a linear combination of the rows of  C  with the i-th

row of  B  giving the coefficients for the linear combination that de-

termines the i-th row of  A. Therefore the rows of  C  are a spanning

set for the row space of  A, and so the dimension of the row space of 

 A is at most k . We conclude that

dim(rowsp( A)) ≤ dim(colsp( A))

Applying the same argument to  AT  we also conclude that

dim(colsp( A)) ≤ dim(rowsp( A))

and hence these values are equal.

This number — the common dimension of the row space and

column space of a matrix — is an important property of a matrix

and has a special name:

Definition 3.13.   (Matrix Rank)

The dimension of the row space (and column space) of a matrix is called

the rank of the matrix. If an n × n matrix has rank n, then the matrix is

said to be full rank.

There is a useful characterisation of the row- and column spaces

of a matrix that is sufficiently important to state separately.

Theorem 3.14. If A is an m × n matrix, then the set of vectors

{Ax | x ∈ Rn}


is equal to the column space of A, while the set of vectors

{y A | y ∈ Rm}

is equal to the row space of A.

Proof. Suppose that c1, c2, . . . , cn ∈ Rm are the n columns of

 A. Then if  x   = (x1, x2, . . . , xn), it is easy to see that  A x   =   x1c1 +

x2c2 + . . . + xncn. Therefore every vector of the form  A x is a linear

combination of the columns of  A, and every linear combination of 

the columns of  A can be obtained by multiplying  A by a suitable

vector. A similar argument applies for the row space of  A.

3.3   The null space

Suppose that  A is an m × n matrix. If two vectors  v1, v2  have the

property that Av1 = 0 and Av2 = 0, then simple manipulation shows that

 A(v1 + v2) =  Av1 + Av2 =  0 + 0 =  0

and for any λ ∈ R,

 A(λv1) = λ Av1  =  λ0 =  0

Therefore the set of vectors  v  with the property that  Av   =   0 is

closed under vector addition and scalar multiplication; since it also contains the zero vector (as A0 = 0), it satisfies the requirements to be a subspace.

Definition 3.15.   (Null space)

Let A be an m × n matrix. The set of vectors

{v ∈ Rn |  Av =  0}

is a subspace of Rn called the null space of A and denoted by nullsp( A).

Example 3.16.  (Null space) Is the vector  v   = (0,1, −1, 2) in the null

space of the matrix

A = [ 1 2 2 0 ]
    [ 3 0 2 1 ] ?

 All that is needed is to check Av and see what arises. As

[ 1 2 2 0 ] [  0 ]   [ 0 ]
[ 3 0 2 1 ] [  1 ] = [ 0 ]
            [ −1 ]
            [  2 ]

it follows that  v ∈ nullsp( A).

This shows that testing membership of the null space of a matrix

is a very easy task. What about finding a basis for the null space of a matrix? This turns out8 to be intimately related to the techniques


we used in Chapter 1 to solve systems of linear equations.

8 No surprise here!

So, suppose we wish to find a basis for the null space of the matrix

A = [ 1 2 2 0 ]
    [ 3 0 2 1 ]

from Example 3.16.  The matrix equation  A x  =  0 yields the follow-

ing system of linear equations

x1   +   2x2   +   2x3   =   0

3x1   +   2x3   +   x4   =   0

which has augmented matrix

[ 1 2 2 0 | 0 ]
[ 3 0 2 1 | 0 ]

After the single elementary row operation R2 ← R2 − 3R1, the matrix is in row echelon form:

[ 1  2  2 0 | 0 ]
[ 0 −6 −4 1 | 0 ]

Therefore x3  and  x4  are  free parameters; the second equation shows

that x2   = (x4 − 4x3)/6 and substituting this into the first equation

yields  x1   = −(2x3 +  x4)/3. Thus, following the techniques of 

Chapter 1  we can describe the solution set as

S = {(−(2x3 + x4)/3,   (x4 − 4x3)/6,   x3,   x4) | x3, x4 ∈ R}

In order to find a basis for  S  notice that we can rewrite the solu-tion as a linear combination of vectors by separating out the terms

involving  x3  from the terms involving x4

(−(2x3 + x4)/3, (x4 − 4x3)/6, x3, x4)
   = (−2x3/3, −4x3/6, x3, 0) + (−x4/3, x4/6, 0, x4)
   = x3(−2/3, −2/3, 1, 0) + x4(−1/3, 1/6, 0, 1)

Therefore we can express the solution set  S  as follows:

S = {x3(−2/3, −2/3, 1, 0) + x4(−1/3, 1/6, 0, 1) | x3, x4 ∈ R}

However this immediately tells us that S just consists of all the linear combinations of the two vectors (−2/3, −2/3, 1, 0) and

(−1/3, 1/6, 0, 1) and therefore we have found, almost by acci-

dent, a spanning set for the subspace  S. It is immediate that these

two vectors are linearly independent and therefore they form a

 basis for the null space of  A.
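The same basis can be computed directly; a sympy sketch (note how each basis vector has one free parameter set to 1):

import sympy as sp

A = sp.Matrix([[1, 2, 2, 0],
               [3, 0, 2, 1]])

for v in A.nullspace():
    print(tuple(v))   # (-2/3, -2/3, 1, 0) and (-1/3, 1/6, 0, 1)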

In general, this process will  always find a basis for the null space

of a matrix. If the set of solutions to the system of linear equations

has s  free parameters, then it can be expressed as a linear combina-

tion of  s  vectors. These  s  vectors will always be linearly independent

 because in each of the s  coordinate positions corresponding to the

free parameters, just one of the s  vectors will have a non-zero entry.


Definition 3.17.   (Nullity)

The dimension of the null space of a matrix A is called the nullity of A.

We close this section with one of the most important results in elementary linear algebra, universally called the Rank-Nullity Theorem, which has a surprisingly simple proof.

Theorem  3.18.  Suppose that A is an m × n matrix. Then

rank( A) + nullity( A) = n.

Proof.  Consider the system of linear equations  A x   =   0, which is

a system of  m  equations in n  unknowns. This system is solved by

applying Gaussian elimination to the augmented matrix [A | 0], thereby obtaining the matrix [A′ | 0] in row-echelon form. The rank of A is equal to the number of non-zero rows of A′, which is equal

to the number of  basic variables in the system of linear equations.

The nullity of  A is the number of free parameters in the solution

set to the system of linear equations and so it is equal to the num-

 ber of  non-basic variables. So the rank of  A  plus the nullity of  A is

equal to the number of basic variables plus the number of non-basic

variables. As each of the  n  variables is either basic or non-basic, the

result follows.
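The theorem is easy to test empirically; a quick sketch (ours, using SymPy's random integer matrices) checks it for a handful of cases:

```python
from sympy import randMatrix

# rank(A) + nullity(A) should equal the number of columns of A.
for _ in range(5):
    A = randMatrix(3, 5, min=-4, max=4)   # a random 3 x 5 integer matrix
    assert A.rank() + len(A.nullspace()) == A.cols
print("Rank-Nullity Theorem holds for all 5 random matrices")
```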

Given a matrix, it is important to be able to put all the techniques together and to determine the rank, the nullity, a basis for the null space and a basis for the row space of a given matrix. This is demonstrated in the next example.

Example 3.19. (Rank and nullity) Find the rank, nullity and bases for the row space and null space for the following $4 \times 4$ matrix:
$$A = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 3 & 1 & 3 & 3 \\ 2 & 1 & 1 & 0 \\ 2 & 1 & 1 & 2 \end{pmatrix}.$$

All of the questions can be answered once the matrix is in row-echelon form, and so the first task is to apply Gaussian elimination, which will result in the following matrix:
$$A' = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
The matrix has 3 non-zero rows and so the rank of $A$ is equal to 3. These non-zero rows form a basis for the row space of $A$ and so the basis for the row space of $A$ is
$$\{(1, 0, 2, 1),\ (0, 1, -3, 0),\ (0, 0, 0, -2)\}.$$


The null space is the set of solutions to the matrix equation $A\mathbf{x} = \mathbf{0}$, and solving this equation by performing Gaussian elimination on the augmented matrix $[A \mid \mathbf{0}]$ would yield the augmented matrix $[A' \mid \mathbf{0}]$. (In other words, the Gaussian elimination part only needs to be done once, because everything depends only on the row-echelon form of the matrix $A$. However it is important to remember that although it is the same matrix, we are using it in two quite distinct ways. This distinction is often missed by students studying linear algebra for the first time.)

So given the augmented matrix
$$[A' \mid \mathbf{0}] = \left(\begin{array}{cccc|c} 1 & 0 & 2 & 1 & 0 \\ 0 & 1 & -3 & 0 & 0 \\ 0 & 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right),$$
we see that $x_1$, $x_2$ and $x_4$ are basic variables, while the solution space has $x_3$ as its only free parameter. Expressing the basic variables in terms of the free parameters by back-substitution we determine that the solution set is
$$S = \{(-2x_3,\ 3x_3,\ x_3,\ 0) \mid x_3 \in \mathbb{R}\}$$
and it is clear that a basis for this is $\{(-2, 3, 1, 0)\}$, and so the nullity of $A$ is equal to 1.
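A sketch of the same computation in SymPy (exact rational arithmetic, so the answers agree with the hand calculation above):

```python
from sympy import Matrix

A = Matrix([[1, 0, 2, 1],
            [3, 1, 3, 3],
            [2, 1, 1, 0],
            [2, 1, 1, 2]])

print(A.rank())        # 3
print(A.nullspace())   # a single column vector: (-2, 3, 1, 0)
print(A.rowspace())    # three rows spanning the row space
```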

3.4   Solving systems of linear equations

We have seen that the set of all solutions to the system of linear equations $A\mathbf{x} = \mathbf{0}$ is the null space of $A$. What can we say about the set of solutions of
$$A\mathbf{x} = \mathbf{b} \qquad (3.3)$$
when $\mathbf{b} \neq \mathbf{0}$?

Suppose that we know one solution $\mathbf{x}_1$ and that $\mathbf{v}$ lies in the null space of $A$. Then
$$A(\mathbf{x}_1 + \mathbf{v}) = A\mathbf{x}_1 + A\mathbf{v} = \mathbf{b} + \mathbf{0} = \mathbf{b}.$$
Hence if we are given one solution we can create many more by simply adding elements of the null space of $A$.

Moreover, given any two solutions $\mathbf{x}_1$ and $\mathbf{x}_2$ of (3.3) we have that
$$A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}$$
and so $\mathbf{x}_1 - \mathbf{x}_2$ lies in the null space of $A$. In particular, every solution of $A\mathbf{x} = \mathbf{b}$ is of the form $\mathbf{x}_1 + \mathbf{v}$ for some $\mathbf{v} \in \operatorname{nullsp}(A)$. This is so important that we state it as a theorem.

Theorem  3.20.   Let A x  =  b be a system of linear equations and let  x1 be

one solution. Then the set of all solutions is

{ x1 + v | v ∈ nullsp( A)}

Corollary 3.21.  The number of free parameters required for the set of 

solutions of A x =  b  is the dimension of the nullspace of A.
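Theorem 3.20 is easy to see in action. In the sketch below (our own illustration in SymPy) the particular solution is found by inspection, and adding any null space vector to it gives another solution:

```python
from sympy import Matrix

A = Matrix([[1, 2, 2, 0],
            [3, 0, 2, 1]])
x1 = Matrix([1, 1, 0, 0])   # one solution, found by inspection
b = A * x1                  # the corresponding right-hand side, (3, 3)

# x1 + v solves A x = b for every v in the null space of A.
for v in A.nullspace():
    assert A * (x1 + v) == b
print("every x1 + v is again a solution")
```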

3.5   Matrix Inversion

In this section we consider the inverse of a matrix. The theory of inverses in matrix algebra is far more subtle and interesting than that of inverses in real algebra, and it is intimately related to the ideas of independence and rank that we have explored so far.

In real algebra, every non-zero element has a multiplicative inverse; in other words, for every $x \neq 0$ we can find another number, $x'$, such that
$$x x' = x' x = 1.$$
Rather than calling it $x'$, we use the notation $x^{-1}$ to mean "the number that you need to multiply $x$ by in order to get 1", thus getting the familiar
$$2^{-1} = \frac{1}{2}.$$

In matrix algebra, the concept of a multiplicative identity and hence an inverse only makes sense for square matrices. However, even then, there are some complicating factors. In particular, because multiplication is not commutative, it is conceivable that a product of two square matrices might be equal to the identity only if they are multiplied in a particular order. Fortunately, this does not actually happen:

Theorem  3.22.  Suppose that A and B are square matrices such that

 AB  =  I n. Then BA =  I n.

Proof. First we observe that $B$ has rank equal to $n$. This follows because if $B\mathbf{v} = \mathbf{0}$ for any non-zero vector $\mathbf{v}$, then $AB\mathbf{v} = \mathbf{0}$, which contradicts the fact that $AB = I_n$. So the null space of $B$ contains only the zero vector, so $B$ has nullity 0 and therefore, by the Rank-Nullity Theorem, $B$ has rank $n$.

Secondly we do some simple manipulation using the properties of matrix algebra that were outlined in Theorem 3.2:
$$\begin{aligned} O_n &= AB - I_n && (\text{because } AB = I_n) \\ &= B(AB - I_n) && (\text{because } BO_n = O_n) \\ &= BAB - B && (\text{distributivity}) \\ &= (BA - I_n)B && (\text{distributivity}) \end{aligned}$$
This manipulation shows that the matrix $(BA - I_n)B = O_n$. However, because the rank of $B$ is $n$, it follows that the column space of $B$ is the whole of $\mathbb{R}^n$, and so any vector $\mathbf{v} \in \mathbb{R}^n$ can be expressed in the form $B\mathbf{x}$ for some $\mathbf{x}$. Therefore
$$\begin{aligned} (BA - I_n)\mathbf{v} &= (BA - I_n)B\mathbf{x} && (\text{because } B \text{ has rank } n) \\ &= O_n\mathbf{x} && (\text{because } (BA - I_n)B = O_n) \\ &= \mathbf{0} && (\text{properties of zero matrix}) \end{aligned}$$
The only matrix whose product with every vector $\mathbf{v}$ is equal to $\mathbf{0}$ is the zero matrix itself, so $BA - I_n = O_n$, or $BA = I_n$ as required.

This theorem shows that when defining the inverse of a matrix, we don't need to worry about the order in which the multiplication occurs. (In some textbooks, the authors introduce the idea that $B$ is the "left-inverse" of $A$ if $BA = I$ and the "right-inverse" of $A$ if $AB = I$, and then immediately prove Theorem 3.22, showing that a left-inverse is a right-inverse and vice versa.)


Another property of inverses in the algebra of real numbers is

that a non-zero real number has a  unique inverse. Fortunately, this

property also holds for matrices:

Theorem  3.23.  If A, B and C are square matrices such that

 AB  =  I n   and AC =  I n

then B =  C.

Proof. The proof just proceeds by manipulation using the properties of matrix algebra outlined in Theorem 3.2.
$$\begin{aligned} B &= BI_n && (\text{identity matrix property}) \\ &= B(AC) && (\text{hypothesis of theorem}) \\ &= (BA)C && (\text{associativity}) \\ &= I_n C && (\text{by Theorem 3.22}) \\ &= C && (\text{identity matrix property}) \end{aligned}$$
Therefore a matrix has at most one inverse.

Definition 3.24. (Matrix Inverse)
Let $A$ be an $n \times n$ matrix. If there is a matrix $B$ such that
$$AB = I_n$$
then $B$ is called the inverse of $A$, and is denoted $A^{-1}$. From Theorems 3.22 and 3.23, it follows that $B$ is uniquely determined, that $BA = I_n$, and that $B^{-1} = A$.

A matrix is called invertible if it has an inverse, and non-invertible otherwise.

Example 3.25. (Matrix Inverse) Suppose that
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 2 & 0 & 1 \end{pmatrix}.$$
Then if we take
$$B = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \end{pmatrix}$$
then it is easy to check that
$$AB = BA = I_3.$$
Therefore we conclude that $A^{-1}$ exists and is equal to $B$ and, naturally, $B^{-1}$ exists and is equal to $A$.
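A quick numerical confirmation of this example (our own sketch using NumPy):

```python
import numpy as np

A = np.array([[1, 0, 1],
              [2, 1, 1],
              [2, 0, 1]])
B = np.array([[-1, 0, 1],
              [0, 1, -1],
              [2, 0, -1]])

print(A @ B)                              # the 3 x 3 identity matrix
print(np.allclose(np.linalg.inv(A), B))   # True
```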

In real algebra, every non-zero number has an inverse, but this is not the case for matrices:


Example 3.26. (Non-zero matrix with no inverse) Suppose that
$$A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}.$$
Then there is no possible matrix $B$ such that $AB = I_2$. Why is this? If the matrix $B$ existed, then it would necessarily satisfy
$$\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
In order to satisfy this matrix equation, $b_{11} + b_{21}$ must equal 1, while $2b_{11} + 2b_{21} = 2(b_{11} + b_{21})$ must equal 0; clearly this is impossible. So the matrix $A$ has no inverse.
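Numerical libraries detect this situation rather than inventing an answer; the following NumPy sketch (ours) shows the failure explicitly:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print("no inverse:", err)   # numpy reports a singular matrix
```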

One of the common mistakes made by students of elementary linear algebra is to assume that every matrix has an inverse. If an argument or proof about a generic matrix $A$ ever uses $A^{-1}$ as part of the manipulation, then it is necessary to first demonstrate that $A$ is actually invertible. Alternatively, the proof can be broken down into two separate cases, one covering the situation where $A$ is assumed to be invertible and a separate one for where it is assumed to be non-invertible.

3.5.1   Finding inverses

This last example of the previous section (Example 3.26) essentially shows us how to find the inverse of a matrix because, as usual, it all boils down to solving systems of linear equations. If $A$ is an $n \times n$ matrix then finding its inverse, if it exists, is just a matter of finding a matrix $B$ such that $AB = I_n$. To find the first column of $B$, it is sufficient to solve the equation $A\mathbf{x} = \mathbf{e}_1$, then the second column is the solution to $A\mathbf{x} = \mathbf{e}_2$, and so on. If any of these equations has no solutions then $A$ does not have an inverse. (In fact, if any of them has infinitely many solutions, then it also follows that $A$ has no inverse, because if the inverse exists it must be unique. Thus if one of the equations has infinitely many solutions, then one of the other equations must have no solutions.) Therefore, finding the inverse of an $n \times n$ matrix involves solving $n$ separate systems of linear equations. However, because each of the $n$ systems has the same coefficient matrix (that is, $A$), there are shortcuts that make this procedure easier.

To illustrate this, we do a full example for a $3 \times 3$ matrix, although the principle is the same for any matrix. Suppose we want to find the inverse of the matrix
$$A = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \end{pmatrix}.$$

The results above show that we just need to find a matrix $B = (b_{ij})$ such that
$$\begin{pmatrix} -1 & 0 & 1 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$


This can be done by solving three separate systems of linear equations, one to determine each column of $B$:
$$A \begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad A \begin{pmatrix} b_{12} \\ b_{22} \\ b_{32} \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad A \begin{pmatrix} b_{13} \\ b_{23} \\ b_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

Then the matrix  A has an inverse if and only if all three of these

systems of linear equations have a solution, and in fact, each of 

them must have a unique solution. If any one of the three equations

is inconsistent, then  A is one of the matrices that just doesn’t have

an inverse.

Consider how the solution of these systems of linear equations will proceed: for the first column, the Gaussian elimination proceeds as follows. We start with the augmented matrix
$$\left(\begin{array}{ccc|c} -1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 2 & 0 & -1 & 0 \end{array}\right),$$
and after pivoting on the $(1,1)$-entry we get
$$\left(\begin{array}{ccc|c} -1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 2 \end{array}\right) \qquad R_3 \leftarrow R_3 + 2R_1$$

which is now in row-echelon form. As every variable is basic, this system of linear equations has a unique solution, which can easily be determined by back-substitution, giving $b_{31} = 2$, $b_{21} = 2$ and $b_{11} = 1$.

Now we solve for the second column of $B$; this time the augmented matrix is
$$\left(\begin{array}{ccc|c} -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 0 & -1 & 0 \end{array}\right),$$
and after pivoting on the $(1,1)$-entry we get
$$\left(\begin{array}{ccc|c} -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right) \qquad R_3 \leftarrow R_3 + 2R_1$$

It is immediately apparent that we used the exact same elementary row operations as we did for the previous system of linear equations because, naturally enough, the coefficient matrix is the same matrix. And obviously, we'll do the same elementary row operations again when we solve the third system of linear equations! So to avoid repeating work unnecessarily, it is better to solve all three systems simultaneously. This is done by using a sort of "super-augmented" matrix that has three columns to the right of the augmenting bar, representing the right-hand sides of the three separate equations:
$$\left(\begin{array}{ccc|ccc} -1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 1 & 0 \\ 2 & 0 & -1 & 0 & 0 & 1 \end{array}\right)$$


Then performing an elementary row operation on this bigger matrix has exactly the same effect as doing it on each of the three systems separately.
$$\left(\begin{array}{ccc|ccc} -1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array}\right) \qquad R_3 \leftarrow R_3 + 2R_1$$

We could now break this apart again into three separate systems of linear equations and solve each of them by back-substitution, but again this involves a little bit more work than necessary.

Suppose instead that we do some more elementary row operations in order to make the system(s) of linear equations even simpler than before. Usually, when we pivot on an entry in the matrix, we use the pivot row in order to zero-out the rest of the column below the pivot entry. However there is nothing stopping us from zeroing out the rest of the column above the pivot entry as well. Let's do this on the $(3,3)$-entry in the matrix and see how useful it is:
$$\left(\begin{array}{ccc|ccc} -1 & 0 & 0 & -1 & 0 & -1 \\ 0 & 1 & 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array}\right) \qquad \begin{aligned} R_1 &\leftarrow R_1 - R_3 \\ R_2 &\leftarrow R_2 + R_3 \end{aligned}$$

One final elementary row operation puts the matrix into an especially nice form.
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array}\right) \qquad R_1 \leftarrow (-1)R_1$$

In this form not even any back-substitution is needed to find the solution; the first system of equations has solution $b_{11} = 1$, $b_{21} = 2$ and $b_{31} = 2$, while the second has solution $b_{12} = 0$, $b_{22} = 1$ and $b_{32} = 0$, while the final system has solution $b_{13} = 1$, $b_{23} = 1$ and $b_{33} = 1$. Thus the inverse of the matrix $A$ is given by
$$A^{-1} = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 2 & 0 & 1 \end{pmatrix}$$
which is just exactly the matrix that was found to the right of the augmenting bar!

In this example, we've jumped ahead without using the formal terminology or precisely defining the "especially nice form" of the final matrix. We remedy this immediately.

Definition 3.27.  (Reduced row-echelon form)

 A matrix is in reduced row-echelon form if it is in row echelon form,

and

1. The leading entry of each row is equal to one, and

2. The leading entry of each row is the only non-zero entry in its column.


Example 3.28. (Reduced row-echelon form) The following matrices are both in reduced row-echelon form:
$$\begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Example 3.29. (Not in reduced row-echelon form) The following matrices are NOT in reduced row-echelon form:
$$\begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

A simple modification to the algorithm for Gaussian elimination yields an algorithm for reducing a matrix to reduced row-echelon form; this algorithm is known as Gauss-Jordan elimination. It is presented below, with the differences between Gaussian elimination and Gauss-Jordan elimination highlighted in boldface.

Definition 3.30. (Gauss-Jordan Elimination — in words)
Let $A$ be an $m \times n$ matrix. At each stage in the algorithm, a particular position in the matrix, called the pivot position, is being processed. Initially the pivot position is at the top-left of the matrix. What happens at each stage depends on whether the pivot entry (that is, the number in the pivot position) is zero or not.

1. If the pivot entry is zero then, if possible, interchange the pivot row with one of the rows below it, in order to ensure that the pivot entry is non-zero. This will be possible unless the pivot entry and every entry below it are zero, in which case simply move the pivot position one column to the right.

2. If the pivot entry is non-zero, multiply the pivot row to ensure that the pivot entry is 1 and then, by adding a suitable multiple of the pivot row to every row above and below the pivot row, ensure that every entry above and below the pivot entry is zero. Then move the pivot position one column to the right and one row down.

When the pivot position is moved off the matrix, then the process finishes and the matrix will be in reduced row-echelon form.
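To make the algorithm concrete, here is one possible rendering of Definition 3.30 in code. This is our own teaching sketch (Python with NumPy; the text assumes no software), and for numerical robustness it always swaps the largest available pivot upwards, a slight strengthening of step 1:

```python
import numpy as np

def rref(M, tol=1e-12):
    """Reduce M to reduced row-echelon form by Gauss-Jordan elimination."""
    A = np.array(M, dtype=float)
    rows, cols = A.shape
    r = 0                                    # current pivot row
    for c in range(cols):                    # pivot column sweeps rightwards
        if r == rows:
            break
        # Step 1: bring a non-zero entry into the pivot position (we take
        # the largest in absolute value, for numerical stability).
        p = r + np.argmax(np.abs(A[r:, c]))
        if abs(A[p, c]) < tol:
            continue                         # column is all zeros: move right
        A[[r, p]] = A[[p, r]]
        # Step 2: scale the pivot row so the pivot entry is 1, then zero out
        # every entry above and below the pivot.
        A[r] = A[r] / A[r, c]
        for i in range(rows):
            if i != r:
                A[i] = A[i] - A[i, c] * A[r]
        r += 1                               # move down one row
    return A
```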

Now we can formally summarise the procedure for finding the inverse of a matrix. Remember, however, that this is simply a way of organising the calculations efficiently, and that there is nothing more sophisticated occurring than solving systems of linear equations.


Key Concept 3.31. (Finding the inverse of a matrix) In order to find the inverse of an $n \times n$ matrix $A$, proceed as follows:

1. Form the "super-augmented" matrix
$$[A \mid I_n]$$

2. Apply Gauss-Jordan elimination to this matrix to place it into reduced row-echelon form.

3. If the resulting reduced row-echelon matrix has an identity matrix to the left of the augmenting bar, then it must have the form
$$[I_n \mid A^{-1}]$$
and so $A^{-1}$ will be the matrix on the right of the augmenting bar.

4. If the reduced row-echelon matrix does not have an identity matrix to the left of the augmenting bar, then the matrix $A$ is not invertible.
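Using the rref sketch above, Key Concept 3.31 takes only a few lines; we reuse the $3 \times 3$ matrix from the worked example (floating-point arithmetic, so we compare with allclose):

```python
import numpy as np

A = np.array([[-1.0, 0.0, 1.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, -1.0]])

# Form the super-augmented matrix [A | I] and reduce it.
R = rref(np.hstack([A, np.eye(3)]))
A_inv = R[:, 3:]                     # the block right of the augmenting bar

print(A_inv)                               # [[1, 0, 1], [2, 1, 1], [2, 0, 1]]
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```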

It is interesting to note that, while it is important to understand what a matrix inverse is and how to calculate a matrix inverse, it is almost never necessary to actually find an explicit matrix inverse in practice. An explicit problem for which a matrix inverse might be useful can almost always be solved directly (by some form of Gaussian elimination) without actually computing the inverse.

However, as we shall see in the next section, understanding the procedure for calculating an inverse is useful in developing theoretical results.

3.5.2   Characterising invertible matrices

In the last two subsections we have defined the inverse of a matrix, demonstrated that some matrices have inverses and others don't, and given a procedure that will either find the inverse of a matrix or demonstrate that it does not exist. In this subsection, we consider some of the special properties of invertible matrices, focussing on what makes them invertible and what particular properties are enjoyed by invertible matrices.

Theorem  3.32.   An n × n matrix is invertible if and only if it has rank 

equal to n.

Proof.   This is so important that we give a couple of proofs in

slightly different language, though the fundamental concept is

the same in both proofs.

Proof 1: Applying elementary row operations to a matrix does not

alter its row space, and hence its rank. If a matrix  A is invertible,

then Gauss-Jordan elimination applied to $A$ will yield the identity matrix, which has rank $n$. If $A$ is not invertible, then applying Gauss-Jordan elimination to $A$ yields a matrix with at least one row of zeros, and so it does not have rank $n$.

Proof 2: If a matrix  A is invertible then there is always a solution

to the matrix equation

 A x =  v

for every v, and so the column space of  A is equal to the whole

of Rn. Conversely if  A is not invertible, then at least one of the

standard basis vectors {e1, e2, . . . , en} is not in the column space of 

 A and so the rank of  A is strictly less than  n.

There are some other characterisations of invertible matrices that

may be useful, but they are all really just elementary restatements

of Theorem 3.32.

Theorem  3.33.  Let A be an n × n matrix. Then

1. A is invertible if and only if its rows are linearly independent.

2. A is invertible if and only if its columns are linearly independent.

3. A is invertible if and only if its row space is Rn.

4. A is invertible if and only if its column space is Rn.

Proof.  These are all ways of saying “the rank of  A is  n”.

Example 3.34. (Non-invertible matrix) The matrix
$$A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & -1 \\ 1 & 3 & 1 \end{pmatrix}$$
is not invertible because
$$(0, 1, 2) + (1, 2, -1) = (1, 3, 1)$$
is a dependency among the rows, and so the rows are not linearly independent.

Now let’s consider some of the  properties of invertible matrices.

Theorem 3.35. Suppose that $A$ and $B$ are invertible $n \times n$ matrices, and $k$ is a positive integer. Then

1. The matrix $AB$ is invertible, and $(AB)^{-1} = B^{-1}A^{-1}$.

2. The matrix $A^k$ is invertible, and $(A^k)^{-1} = (A^{-1})^k$.

3. The matrix $A^T$ is invertible, and $(A^T)^{-1} = (A^{-1})^T$.


Proof. To show that a matrix is invertible, it is sufficient to demonstrate the existence of some matrix whose product with the given matrix is the identity. Thus to show that $AB$ is invertible, we must find something that we can multiply $AB$ by in order to end up with the identity.
$$\begin{aligned} (AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} && (\text{associativity}) \\ &= AI_nA^{-1} && (\text{properties of inverses}) \\ &= AA^{-1} && (\text{properties of identity}) \\ &= I_n && (\text{properties of inverses}) \end{aligned}$$
This shows that $AB$ is invertible, and that its inverse is $B^{-1}A^{-1}$ as required. The remaining two statements are straightforward to prove.
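A quick numerical check of part 1 (our sketch; the random matrices are nudged towards a multiple of the identity so that they are safely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3 * np.eye(3)
B = rng.random((3, 3)) + 3 * np.eye(3)

print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True
```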

This theorem shows that the collection of invertible $n \times n$ matrices is closed under matrix multiplication. In addition, there is a multiplicative identity (the matrix $I_n$) and every matrix has an inverse (obviously!). These turn out to be the conditions that define an algebraic structure called a group. The group of invertible $n \times n$ matrices plays a fundamental role in the mathematical subject of group theory, which is an important topic in higher-level Pure Mathematics.

3.6   Determinants

From high-school we are all familiar with the formula for the inverse of a $2 \times 2$ matrix:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \quad \text{if } ad - bc \neq 0,$$
where the inverse does not exist if $ad - bc = 0$. In other words, a $2 \times 2$ matrix has an inverse if and only if $ad - bc \neq 0$. This number is called the determinant of the matrix, and it is either denoted $\det(A)$ or just $|A|$.

Example 3.36. (Determinant Notation) If
$$A = \begin{pmatrix} 3 & 5 \\ 2 & 4 \end{pmatrix}$$
then we say either
$$\det(A) = 2 \quad \text{or} \quad \begin{vmatrix} 3 & 5 \\ 2 & 4 \end{vmatrix} = 2$$
because $3 \cdot 4 - 2 \cdot 5 = 2$.

In this section, we'll extend the concept of determinant to $n \times n$ matrices and show that it characterises invertible matrices in the same way: a matrix is invertible if and only if its determinant is non-zero.


The determinant of a square matrix is a scalar value (i.e. a number) associated with that matrix that can be recursively defined as follows:

Definition 3.37. (Determinant)
If $A = (a_{ij})$ is an $n \times n$ matrix, then the determinant of $A$ is a real number, denoted $\det(A)$ or $|A|$, that is defined as follows:

1. If $n = 1$, then $|A| = a_{11}$.

2. If $n > 1$, then
$$|A| = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \, |A[1, j]| \qquad (3.4)$$
where $A[i, j]$ is the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $i$-th row and the $j$-th column.

Notice that when $n > 1$, this expresses an $n \times n$ determinant as an alternating sum of $n$ terms, each of which is an $(n-1) \times (n-1)$ determinant.

Example 3.38. (A $3 \times 3$ determinant) What is the determinant of the matrix
$$A = \begin{pmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{pmatrix}?$$
First let's identify the matrices $A[1,1]$, $A[1,2]$ and $A[1,3]$; recall these are obtained by deleting one row and column from $A$. For example, $A[1,2]$ is obtained by deleting the first row and second column from $A$, thus
$$A[1,2] = \begin{pmatrix} 4 & 6 \\ 1 & 2 \end{pmatrix}.$$
The term $(-1)^{1+j}$ simply alternates between $+1$ and $-1$, and so the first term is added because $(-1)^2 = 1$, the second subtracted because $(-1)^3 = -1$, the third added, and so on. Using the formula we get
$$|A| = 2 \cdot \begin{vmatrix} 3 & 6 \\ 0 & 2 \end{vmatrix} - 5 \cdot \begin{vmatrix} 4 & 6 \\ 1 & 2 \end{vmatrix} + 3 \cdot \begin{vmatrix} 4 & 3 \\ 1 & 0 \end{vmatrix} = 2 \cdot 6 - 5 \cdot 2 + 3 \cdot (-3) = -7,$$
where the three $2 \times 2$ determinants have just been calculated using the usual rule.
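Definition 3.37 translates directly into a recursive function. The sketch below (ours) expands along the first row exactly as in formula (3.4); note that with 0-based indexing the sign $(-1)^{1+j}$ becomes $(-1)^j$:

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # The minor A[1, j]: delete the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 5, 3], [4, 3, 6], [1, 0, 2]]))   # -7
```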

This procedure for calculating the determinant is called expanding along the first row, because each of the terms $a_{1j}|A[1,j]|$ is associated with an entry in the first row. However it turns out, although we shall not prove it (proving this is not difficult but involves a lot of manipulation of subscripts and nested sums, which is probably not the best use of your time), that it is possible to do the expansion along any row or indeed, any column. So in fact we have the following result:


Theorem 3.39. Let $A = (a_{ij})$ be an $n \times n$ matrix. Then for any fixed row index $i$ we have
$$|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \, |A[i,j]|$$
and for any fixed column index $j$, we have
$$|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \, |A[i,j]|.$$
(Notice that the first of these two sums involves terms obtained from the $i$-th row of the matrix, while the second involves terms from the $j$-th column of the matrix.)

Proof.   Omitted.

Example 3.40. (Expanding down the second column) Determine the determinant of
$$A = \begin{pmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{pmatrix}$$
by expanding down the second column. Notice that because we are using the second column, the signs given by the $(-1)^{i+j}$ terms alternate $-1$, $+1$, $-1$, starting with a negative, not a positive. So the calculation gives
$$|A| = (-1) \cdot 5 \cdot \begin{vmatrix} 4 & 6 \\ 1 & 2 \end{vmatrix} + 3 \cdot \begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} + (-1) \cdot 0 \cdot (\text{don't care}) = -5 \cdot 2 + 3 \cdot 1 + 0 = -7.$$
Also notice that because $a_{32} = 0$, the term $(-1)^{3+2} a_{32} |A[3,2]|$ is forced to be zero, and so there is no need to actually calculate $|A[3,2]|$.

In general, you should choose the row or column of the matrix that has lots of zeros in it, in order to make the calculation as easy as possible!

Example 3.41. (Easy if you choose right) To determine the determinant of
$$A = \begin{pmatrix} 3 & 1 & 0 & 2 \\ 4 & 1 & 0 & 1 \\ 2 & 1 & 0 & 1 \\ 1 & 1 & 3 & 2 \end{pmatrix},$$
use the third column, which has only one non-zero entry, and get
$$|A| = (+1) \cdot 0 + (-1) \cdot 0 + (+1) \cdot 0 + (-1) \cdot 3 \cdot \begin{vmatrix} 3 & 1 & 2 \\ 4 & 1 & 1 \\ 2 & 1 & 1 \end{vmatrix} = -6,$$
rather than getting an expression with three or four $3 \times 3$ determinants to evaluate!


From this we can immediately deduce some theoretical results:

Theorem 3.42. Let $A$ be an $n \times n$ matrix. Then

1. $|A^T| = |A|$,

2. $|\alpha A| = \alpha^n |A|$,

3. if $A$ has a row of zeros, then $|A| = 0$, and

4. if $A$ is an upper (or lower) triangular matrix, then $|A|$ is the product of the entries on the diagonal of the matrix.

Proof. To prove the first statement, we use induction on $n$. Certainly the statement is true for $1 \times 1$ matrices. So now suppose that it is true for all matrices of size up to $n - 1$. Notice that expanding along the first row of $A$ gives a sum with the same coefficients as expanding down the first column of $A^T$, the signs of each term are the same because $(-1)^{i+j} = (-1)^{j+i}$, and all the $(n-1) \times (n-1)$ determinants in the first sum are just the transposes of those in the second sum, and so are equal by the inductive hypothesis.

For the second statement we again use induction. Certainly the statement is true for $1 \times 1$ matrices. So now suppose that it is true for all matrices of size up to $n - 1$. Then
$$\begin{aligned} |\alpha A| &= \sum_{j=1}^{n} (-1)^{i+j} (\alpha a_{ij}) \, |\alpha A[i,j]| \\ &= \sum_{j=1}^{n} (-1)^{i+j} (\alpha a_{ij}) \, \alpha^{n-1} |A[i,j]| && (\text{inductive hypothesis}) \\ &= \alpha\,\alpha^{n-1} \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \, |A[i,j]| && (\text{rearranging}) \\ &= \alpha^n |A|. \end{aligned}$$
The third statement is immediate because if we expand along the row of zeros, then every term in the sum is zero.

The fourth statement again follows from an easy induction argument.

Example 3.43. (Determinant of matrix in row echelon form) A matrix in row echelon form is necessarily upper triangular, and so its determinant can easily be calculated. For example, the matrix
$$A = \begin{pmatrix} 2 & 0 & 1 & -1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
which is in row echelon form, has determinant equal to $2 \cdot 1 \cdot (-3) \cdot 1 = -6$ because this is the product of the diagonal entries. We can verify this easily from the formula by repeatedly expanding down the first column. So
$$\begin{vmatrix} 2 & 0 & 1 & -1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 2 \cdot \begin{vmatrix} 1 & 2 & 1 \\ 0 & -3 & 2 \\ 0 & 0 & 1 \end{vmatrix} = 2 \cdot 1 \cdot \begin{vmatrix} -3 & 2 \\ 0 & 1 \end{vmatrix} = 2 \cdot 1 \cdot (-3) \cdot 1 = -6.$$


3.6.1 Calculating determinants

The recursive definition of a determinant expresses an $n \times n$ determinant as a linear combination of $n$ terms each involving an $(n-1) \times (n-1)$ determinant. So to find a $10 \times 10$ determinant like this involves computing ten $9 \times 9$ determinants, each of which involves nine $8 \times 8$ determinants, each of which involves eight $7 \times 7$ determinants, each of which involves seven $6 \times 6$ determinants, each of which involves six $5 \times 5$ determinants, each of which involves five $4 \times 4$ determinants, each of which involves four $3 \times 3$ determinants, each of which involves three $2 \times 2$ determinants. While this is possible (by computer) for a $10 \times 10$ matrix, even the fastest supercomputer would not complete a $100 \times 100$ matrix in the lifetime of the universe.

However, in practice, a computer can easily find a $100 \times 100$ determinant, so there must be another more efficient way. Once again, this way is based on elementary row operations.

Theorem 3.44. Suppose $A$ is an $n \times n$ matrix, and that $A'$ is obtained from $A$ by performing a single elementary row operation.

1. If the elementary row operation is Type 1, then
$$|A'| = -|A|.$$
In other words, a Type 1 elementary row operation multiplies the determinant by $-1$.

2. If the elementary row operation is Type 2, say $R_i \leftarrow \alpha R_i$, then
$$|A'| = \alpha |A|.$$
In other words, multiplying a row by the scalar $\alpha$ multiplies the determinant by $\alpha$.

3. If the elementary row operation is of Type 3, then
$$|A'| = |A|.$$
In other words, adding a multiple of one row to another does not change the determinant.

Proof. Omitted for now.

Previously we have used elementary row operations to find solutions to systems of linear equations, and to find the basis and dimension of the row space of a matrix. In both these applications, the elementary row operations did not change the answer that was being sought. For finding determinants, however, elementary row operations do change the determinant of the matrix, but they change it in a controlled fashion and so the process is still useful. (Another common mistake for beginning students of linear algebra is to assume that row-reduction preserves every interesting property of a matrix. Instead, row-reduction preserves some properties, alters others in a controlled fashion, and destroys others. It is important to always know why the row-reduction is being done.)


Example 3.45. (Finding determinant by row-reduction) We return to an earlier example, of finding the determinant of
$$A = \begin{pmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{pmatrix}.$$
Suppose that this unknown value is denoted $d$. Then after doing the Type 1 elementary row operation $R_1 \leftrightarrow R_3$ we get the matrix
$$\begin{pmatrix} 1 & 0 & 2 \\ 4 & 3 & 6 \\ 2 & 5 & 3 \end{pmatrix}$$
which has determinant $-d$, because Type 1 elementary row operations multiply the determinant by $-1$. If we now perform the Type 3 elementary row operations $R_2 \leftarrow R_2 - 4R_1$ and $R_3 \leftarrow R_3 - 2R_1$, then the resulting matrix
$$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & -2 \\ 0 & 5 & -1 \end{pmatrix}$$
still has determinant $-d$ because Type 3 elementary row operations do not alter the determinant. Finally, the elementary row operation $R_3 \leftarrow R_3 - (5/3)R_2$ yields the matrix
$$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & -2 \\ 0 & 0 & 7/3 \end{pmatrix}$$
which still has determinant $-d$. However, it is easy to see that the determinant of this final matrix is 7, and so $-d = 7$, which immediately tells us that $d = -7$, confirming the results of Examples 3.38 and 3.40.
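The method of this example is also easy to code. The sketch below (ours, using exact fractions so nothing is lost to rounding) performs Gaussian elimination while tracking how each row operation changes the determinant, exactly as in Theorem 3.44:

```python
from fractions import Fraction

def det_by_elimination(M):
    """Determinant via row reduction to upper triangular form."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        # Find a pivot; a Type 1 swap flips the sign of the determinant.
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)      # a zero column forces determinant 0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        # Type 3 operations leave the determinant unchanged.
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    # Now upper triangular: the determinant is d times the diagonal product.
    for c in range(n):
        d *= A[c][c]
    return d

print(det_by_elimination([[2, 5, 3], [4, 3, 6], [1, 0, 2]]))   # -7
```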

Thinking about this process in another way shows us that if a matrix $A$ has determinant $d$ and the matrix $A'$ is the row-echelon matrix obtained by performing Gaussian elimination on $A$, then
$$|A'| = \beta |A| \quad \text{for some } \beta \neq 0.$$
Combining this with the fourth property of Theorem 3.42 allows us to state the single most important property of determinants:

Theorem  3.46.   A matrix A is invertible if and only if its determinant is

non-zero.

Proof. Consider the row echelon matrix $A'$ obtained by applying Gaussian elimination to $A$. If $A$ is invertible, then $A'$ has no zero rows and so every diagonal entry is non-zero, and thus $|A'| \neq 0$; while if $A$ is not invertible, $A'$ has at least one zero row and thus $|A'| = 0$. As the determinant of $A'$ is a non-zero multiple of the determinant of $A$, it follows that $A$ has non-zero determinant if and only if it is invertible.


3.6.2   Properties of Determinant

We finish this chapter with some of the properties of determinants,

most of which follow immediately from the following theorem,

which shows that the determinant function is  multiplicative.

Theorem 3.47. If $A$ and $B$ are two $n \times n$ matrices, then
$$|AB| = |A| \cdot |B|.$$

Proof. There are several proofs of this result, none of which are very nice. We give a sketch outline of the most illuminating proof. (This is a very brief outline of the proof, so do not worry if you cannot follow it without some guidance on how to fill in the gaps.) First note that if either $A$ or $B$ (or both) is not invertible, then $AB$ is not invertible and so the result is true if any of the determinants is zero.

Then proceed in the following steps:

1. Define an elementary matrix to be a matrix obtained by performing a single elementary row operation on the identity matrix.

2. Note that premultiplying a matrix $A$ by an elementary matrix $E$, thereby forming the matrix $EA$, is exactly the same as performing the same elementary row operation on $A$.

3. Show that elementary matrices of Type 1, 2 and 3 have determinant $-1$, $\alpha$ and 1 respectively (where $\alpha$ is the non-zero scalar associated with an elementary row operation of Type 2).

4. Conclude that the result is true if $A$ is an elementary matrix or a product of elementary matrices.

5. Finish by proving that a matrix is invertible if and only if it is the product of elementary matrices, because Gauss-Jordan elimination will reduce any invertible matrix to the identity matrix.

The other proofs of this result use a different description of the determinant as the weighted sum of $n!$ products of matrix entries, together with extensive algebraic manipulation.

Just for fun, we'll demonstrate this result for $2 \times 2$ matrices purely algebraically, in order to give a flavour of the alternative proofs. Suppose that
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}.$$
Then
$$AB = \begin{pmatrix} aa' + bc' & ab' + bd' \\ ca' + dc' & cb' + dd' \end{pmatrix}.$$
Therefore
$$\begin{aligned} |AB| &= (aa' + bc')(cb' + dd') - (ab' + bd')(ca' + dc') \\ &= aa'cb' + aa'dd' + bc'cb' + bc'dd' - ab'ca' - ab'dc' - bd'ca' - bd'dc' \\ &= (aa'cb' - ab'ca') + aa'dd' + bc'cb' + (bc'dd' - bd'dc') - ab'dc' - bd'ca' \\ &= 0 + (ad)(a'd') + (bc)(b'c') + 0 - (ad)(b'c') - (bc)(a'd') \\ &= (ad - bc)(a'd' - b'c') \\ &= |A| \, |B|. \end{aligned}$$


The multiplicativity of the determinant immediately gives the main properties.

Theorem 3.48. Suppose that $A$ and $B$ are $n \times n$ matrices and $k$ is a positive integer. Then

1. $|AB| = |BA|$

2. $|A^k| = |A|^k$

3. If $A$ is invertible, then $|A^{-1}| = 1/|A|$.

Proof. The first two are immediate, and the third follows from the fact that $AA^{-1} = I_n$ and so $|A| \, |A^{-1}| = 1$.


4

Linear transformations

4.1   Introduction

First we will give the general definition of a function.

Definition 4.1. (Function)
Given two sets $A$ and $B$, a function $f : A \to B$ is a rule that assigns to each element of $A$ a unique element of $B$. We often write
$$f : A \longrightarrow B, \quad a \longmapsto f(a)$$
where $f(a)$ is the element of $B$ assigned to $a$, called the image of $a$ under $f$. The set $A$ is called the domain of $f$ and is sometimes denoted $\operatorname{dom}(f)$. The set $B$ is called the codomain of $f$. The range of $f$ (sometimes denoted $\operatorname{range}(f)$) is the set of all elements of $B$ that are the image of some element of $A$.

Often $f(a)$ is defined by some equation involving $a$ (or whatever variable is being used to represent elements of $A$), for example $f(a) = a^2$. However, sometimes you may see $f$ defined by listing $f(a)$ for each $a \in A$. For example, if $A = \{1, 2, 3\}$ we could define $f$ by
$$f : A \longrightarrow \mathbb{R}, \quad 1 \longmapsto 10, \quad 2 \longmapsto 10, \quad 3 \longmapsto 10^2.$$
If $f(x)$ is defined by some rule and the domain of $f$ is not explicitly given then we assume that the domain of $f$ is the set of all values on which $f(x)$ is defined.

Note that the range of $f$ need not be all of the codomain. For example, if $f : \mathbb{R} \to \mathbb{R}$ is the function defined by $f(x) = x^2$ then the codomain of $f$ is $\mathbb{R}$ while the range of $f$ is the set $\{x \in \mathbb{R} \mid x \geq 0\}$.

A linear transformation is a function from one vector space to another preserving the structure of vector spaces, that is, it preserves vector addition and scalar multiplication. More precisely:


Definition 4.2. (Linear transformation)
A function $f$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a linear transformation if:

1. $f(\mathbf{u} + \mathbf{v}) = f(\mathbf{u}) + f(\mathbf{v})$ for all $\mathbf{u}, \mathbf{v}$ in $\mathbb{R}^n$;

2. $f(\alpha\mathbf{v}) = \alpha f(\mathbf{v})$ for all $\mathbf{v}$ in $\mathbb{R}^n$ and all $\alpha$ in $\mathbb{R}$.

(An interesting case is when $n = m$, in which case the domain and codomain are the same vector space.)

Example 4.3. (Linear transformation) In $\mathbb{R}^3$, the orthogonal projection to the $xy$-plane is a linear transformation. This maps the vector $(x, y, z)$ to $(x, y, 0)$. (Check the two conditions to convince yourself.)

Example 4.4. (Not a linear transformation) The function from $\mathbb{R}^3$ to $\mathbb{R}$ given by $f(x, y, z) = x^2 + y^2 + z^2$ is not a linear transformation. Indeed, for $\mathbf{v} = (1, 0, 0)$ and $\alpha = 2$, $f(\alpha\mathbf{v}) = f(2, 0, 0) = 4$ while $\alpha f(\mathbf{v}) = 2 \cdot 1 = 2$.

Example 4.5. (Not a linear transformation) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = ax + b$. Note that $f(1) = a + b$ while $f(2) = 2a + b \neq 2(a + b)$ when $b \neq 0$. Thus when $b \neq 0$, the function $f$ is not a linear transformation of the vector space $\mathbb{R}$. We call $f$ an affine function.

Example 4.6. Let $A$ be an $m \times n$ matrix. Then the function $f$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that $f(\mathbf{x}) = A\mathbf{x}$ is a linear transformation (where we see $\mathbf{x}$ as an $n \times 1$ column vector, as described in Chapter 1). Indeed
$$f(\mathbf{u} + \mathbf{v}) = A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = f(\mathbf{u}) + f(\mathbf{v})$$
(using matrix property (8) of Theorem 3.2), and
$$f(\alpha\mathbf{v}) = A(\alpha\mathbf{v}) = \alpha A\mathbf{v} = \alpha f(\mathbf{v})$$
(using matrix property (7) of Theorem 3.2).
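A quick numerical illustration of this example (our own sketch in NumPy), checking the two conditions of Definition 4.2 for $f(\mathbf{x}) = A\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 2))       # an arbitrary 3 x 2 matrix
u, v = rng.integers(-3, 4, size=(2, 2))    # two vectors in R^2
alpha = 5

print(np.array_equal(A @ (u + v), A @ u + A @ v))        # condition 1: True
print(np.array_equal(A @ (alpha * v), alpha * (A @ v)))  # condition 2: True
```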

Theorem 4.7. Let $f : \mathbb{R}^n \longrightarrow \mathbb{R}^m$ be a linear transformation.

(i) $f(\mathbf{0}) = \mathbf{0}$.

(ii) The range of $f$ is a subspace of $\mathbb{R}^m$.

Proof. (i) This follows from an easy calculation and the fact that $\mathbf{0} + \mathbf{0} = \mathbf{0}$: $f(\mathbf{0}) = f(\mathbf{0} + \mathbf{0}) = f(\mathbf{0}) + f(\mathbf{0})$. The result follows from subtracting $f(\mathbf{0})$ on each side.

(ii) We need to prove the three subspace conditions for $\operatorname{range}(f)$ (see Definition 2.4).

(S1) $\mathbf{0} \in \operatorname{range}(f)$ since $\mathbf{0} = f(\mathbf{0})$ by Part (i).

(S2) Let $\mathbf{u}, \mathbf{v} \in \operatorname{range}(f)$, that is, $\mathbf{u} = f(\mathbf{u}')$, $\mathbf{v} = f(\mathbf{v}')$ for some $\mathbf{u}', \mathbf{v}' \in \mathbb{R}^n$. Then $\mathbf{u} + \mathbf{v} = f(\mathbf{u}') + f(\mathbf{v}') = f(\mathbf{u}' + \mathbf{v}')$ and so $\mathbf{u} + \mathbf{v} \in \operatorname{range}(f)$.

(S3) Let $\alpha \in \mathbb{R}$ and $\mathbf{v} \in \operatorname{range}(f)$, that is, $\mathbf{v} = f(\mathbf{v}')$ for some $\mathbf{v}' \in \mathbb{R}^n$. Then $\alpha\mathbf{v} = \alpha f(\mathbf{v}') = f(\alpha\mathbf{v}')$ and so $\alpha\mathbf{v} \in \operatorname{range}(f)$ for all scalars $\alpha$.


4.1.1   Linear transformations and bases

A linear transformation can be given by a formula, but there are

other ways to describe it. In fact, if we know the images under a

linear transformation of each of the vectors in a basis, then the  rest

of the linear transformation is completely determined.

Theorem 4.8. Let $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a basis for $\mathbb{R}^n$ (for instance, the standard basis) and let $\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_n$ be $n$ vectors of $\mathbb{R}^m$. Then there exists a unique linear transformation $f$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that $f(\mathbf{u}_1) = \mathbf{t}_1$, $f(\mathbf{u}_2) = \mathbf{t}_2$, \ldots, $f(\mathbf{u}_n) = \mathbf{t}_n$.

Proof. We know by Theorem 2.33 that any vector $\mathbf{v} \in \mathbb{R}^n$ can be written in a unique way as $\mathbf{v} = \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \ldots + \alpha_n\mathbf{u}_n$ (where the $\alpha_i$'s are real numbers). Define $f(\mathbf{v}) = \alpha_1\mathbf{t}_1 + \alpha_2\mathbf{t}_2 + \ldots + \alpha_n\mathbf{t}_n$. Then $f$ satisfies $f(\mathbf{u}_i) = \mathbf{t}_i$ for all $i$ between 1 and $n$ and we can easily check that $f$ is linear. So we have that $f$ exists.

Now suppose $g$ is also a linear transformation satisfying $g(\mathbf{u}_i) = \mathbf{t}_i$ for all $i$ between 1 and $n$. Then
$$\begin{aligned} g(\mathbf{v}) &= g(\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \ldots + \alpha_n\mathbf{u}_n) \\ &= g(\alpha_1\mathbf{u}_1) + g(\alpha_2\mathbf{u}_2) + \ldots + g(\alpha_n\mathbf{u}_n) && (\text{first condition for a linear function}) \\ &= \alpha_1 g(\mathbf{u}_1) + \alpha_2 g(\mathbf{u}_2) + \ldots + \alpha_n g(\mathbf{u}_n) && (\text{second condition for a linear function}) \\ &= \alpha_1\mathbf{t}_1 + \alpha_2\mathbf{t}_2 + \ldots + \alpha_n\mathbf{t}_n \\ &= f(\mathbf{v}). \end{aligned}$$
Thus $g(\mathbf{v}) = f(\mathbf{v})$ for all $\mathbf{v} \in \mathbb{R}^n$ so they are the same linear transformation, that is, $f$ is unique.

Exercise 4.1.1. Say $f$ is a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^3$ with $f(1, 0) = (1, 2, 3)$ and $f(0, 1) = (0, -1, 2)$. Determine $f(x, y)$.

4.1.2 Linear transformations and matrices

We have seen that it is useful to choose a basis of the domain, say $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$. Now we will also take a basis for the codomain: $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$. For now we can think of both the bases as the standard ones, but later we will need the general case.

By Theorem 2.33, each vector $f(\mathbf{u}_j)$ ($1 \leq j \leq n$) has unique coordinates in the basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ of $\mathbb{R}^m$. More precisely:
$$f(\mathbf{u}_j) = a_{1j}\mathbf{v}_1 + a_{2j}\mathbf{v}_2 + \ldots + a_{mj}\mathbf{v}_m$$
where the symbols $a_{ij}$ represent real numbers, for $1 \leq i \leq m$, $1 \leq j \leq n$. For short we write
$$f(\mathbf{u}_j) = \sum_{i=1}^{m} a_{ij}\mathbf{v}_i.$$

Now we can determine the image of any vector $\mathbf{x}$ in $\mathbb{R}^n$. If $\mathbf{x} = x_1\mathbf{u}_1 + x_2\mathbf{u}_2 + \ldots + x_n\mathbf{u}_n$, then we have:
$$\begin{aligned} f(\mathbf{x}) &= f(x_1\mathbf{u}_1 + x_2\mathbf{u}_2 + \ldots + x_n\mathbf{u}_n) \\ &= f\Big(\sum_{j=1}^{n} x_j\mathbf{u}_j\Big) \\ &= \sum_{j=1}^{n} x_j f(\mathbf{u}_j) && (\text{by linearity}) \\ &= \sum_{j=1}^{n} x_j \sum_{i=1}^{m} a_{ij}\mathbf{v}_i \\ &= \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij}x_j\Big) \mathbf{v}_i \end{aligned}$$
Notice that $\sum_{j=1}^{n} a_{ij}x_j$ is exactly the $i$-th element of the $m \times 1$ matrix $A\mathbf{x}$, where $\mathbf{x}$ is the $n \times 1$ vector $(x_1, x_2, \ldots, x_n)^T$. Now let us see vectors as column vectors (coordinates in the chosen basis), that is, vectors in $\mathbb{R}^m$ are $m \times 1$ matrices and vectors in $\mathbb{R}^n$ are $n \times 1$ matrices. Then $f(\mathbf{x}) = A\mathbf{x}$, where $A = (a_{ij})$ is the $m \times n$ matrix defined by $f(\mathbf{u}_j) = \sum_{i=1}^{m} a_{ij}\mathbf{v}_i$. Since $A$ has size $m \times n$ and $\mathbf{x}$ has size $n \times 1$, $A\mathbf{x}$ has size $m \times 1$ and so is a vector of $\mathbb{R}^m$. (Together with Example 4.6, this tells us that linear transformations are essentially the same as matrices, after you have chosen a basis of the domain and a basis of the codomain.)

This gives us a very convenient way to express a linear transformation (as the matrix $A$) and to calculate the image of any vector.

Definition 4.9. (Matrix of a linear transformation)
The matrix of a linear transformation $f$, with respect to the basis $B$ of the domain and the basis $C$ of the codomain, is the matrix whose $j$-th column contains the coordinates in the basis $C$ of the image under $f$ of the $j$-th basis vector of $B$.

When both $B$ and $C$ are the standard bases then we refer to the matrix as the standard matrix of $f$.

Whenever $m = n$ we usually take $B = C$.

Example 4.10. (identity) The identity matrix $I_n$ corresponds to the linear transformation that fixes every basis vector, and hence fixes every vector in $\mathbb{R}^n$.

Example 4.11. (dilation) The linear transformation with matrix
$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$
(with respect to the standard basis, for both the domain and codomain) maps $\mathbf{e}_1$ to $2\mathbf{e}_1$ and $\mathbf{e}_2$ to $2\mathbf{e}_2$: it is a dilation of ratio 2 in $\mathbb{R}^2$. (A dilation is a function that maps every vector to a fixed multiple of itself: $\mathbf{x} \mapsto \lambda\mathbf{x}$, where $\lambda$ is called the ratio of the dilation.)

Example 4.12. (rotation) The linear transformation with matrix
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
(with respect to the standard basis) maps $\mathbf{e}_1$ to $\mathbf{e}_2$ and $\mathbf{e}_2$ to $-\mathbf{e}_1$: it is the anticlockwise rotation of the plane by an angle of 90 degrees (or $\pi/2$) around the origin.


Exercise 4.1.2.   In R2, an anticlockwise rotation of angle θ  around the

origin is a linear transformation. What is its matrix with respect to the

standard basis?

4.1.3 Composition

Whenever you have two linear transformations such that the codomain of the first one is the same vector space as the domain of the second one, we can apply the first one followed by the second one.

Definition 4.13. (Composition)
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ be functions. Then the function $g \circ f : \mathbb{R}^n \to \mathbb{R}^p$ defined by
$$(g \circ f)(\mathbf{x}) = g(f(\mathbf{x}))$$
is the composition of $f$ by $g$. (Notice we read composition from right to left.)

Theorem 4.14. If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ are linear transformations, then $g \circ f$ is also a linear transformation.

Proof. We need to prove the two conditions for a linear transformation.

For all $\mathbf{u}, \mathbf{v}$ in $\mathbb{R}^n$:
$$\begin{aligned} (g \circ f)(\mathbf{u} + \mathbf{v}) &= g(f(\mathbf{u} + \mathbf{v})) && (\text{definition of composition}) \\ &= g(f(\mathbf{u}) + f(\mathbf{v})) && (f \text{ is a linear transformation}) \\ &= g(f(\mathbf{u})) + g(f(\mathbf{v})) && (g \text{ is a linear transformation}) \\ &= (g \circ f)(\mathbf{u}) + (g \circ f)(\mathbf{v}) && (\text{definition of composition}) \end{aligned}$$
For all $\mathbf{v}$ in $\mathbb{R}^n$ and all $\alpha$ in $\mathbb{R}$:
$$\begin{aligned} (g \circ f)(\alpha\mathbf{v}) &= g(f(\alpha\mathbf{v})) && (\text{definition of composition}) \\ &= g(\alpha f(\mathbf{v})) && (f \text{ is a linear transformation}) \\ &= \alpha g(f(\mathbf{v})) && (g \text{ is a linear transformation}) \\ &= \alpha (g \circ f)(\mathbf{v}) && (\text{definition of composition}) \end{aligned}$$

Let $A = (a_{ij})$ be the matrix corresponding to $f$ with respect to the basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ of the domain and the basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ of the codomain. Let $B = (b_{ij})$ be the matrix corresponding to $g$ with respect to the basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ of the domain and the basis $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p\}$ of the codomain. So $A$ is an $m \times n$ matrix and $B$ is a $p \times m$ matrix. Let us look at the image of $\mathbf{u}_1$ under $g \circ f$.

We first apply $f$, so the image $f(\mathbf{u}_1)$ corresponds to the first column of $A$: $f(\mathbf{u}_1) = a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + \ldots + a_{m1}\mathbf{v}_m = \sum_{i=1}^{m} a_{i1}\mathbf{v}_i$.


Then we apply $g$ to $f(\mathbf{u}_1)$:
$$\begin{aligned} (g \circ f)(\mathbf{u}_1) &= g\Big(\sum_{i=1}^{m} a_{i1}\mathbf{v}_i\Big) \\ &= \sum_{i=1}^{m} a_{i1} g(\mathbf{v}_i) && (g \text{ is a linear transformation}) \\ &= \sum_{i=1}^{m} a_{i1} \sum_{j=1}^{p} b_{ji}\mathbf{w}_j \\ &= \sum_{j=1}^{p} \Big(\sum_{i=1}^{m} a_{i1}b_{ji}\Big) \mathbf{w}_j \\ &= \sum_{j=1}^{p} \Big(\sum_{i=1}^{m} b_{ji}a_{i1}\Big) \mathbf{w}_j \\ &= \sum_{j=1}^{p} (BA)_{j1}\mathbf{w}_j \end{aligned}$$
Notice this is exactly the first column of the matrix $BA$! We can do the same calculation with any $\mathbf{u}_j$ ($1 \leq j \leq n$) to see that the image $(g \circ f)(\mathbf{u}_j)$ corresponds exactly to the $j$-th column of the matrix $BA$. Hence the matrix corresponding to $g \circ f$ with respect to the basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ of the domain and the basis $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p\}$ of the codomain is $BA$.

Key Concept  4.15.  Composition of linear transformations is the

same thing as multiplication of the corresponding matrices.

You may have thought that matrix multiplication was defined in

a strange way: it was defined precisely so that it corresponds with

composition of linear transformations.
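As a small illustration (ours), composing the dilation of Example 4.11 with the rotation of Example 4.12 in NumPy:

```python
import numpy as np

D = np.array([[2, 0],    # dilation of ratio 2 (Example 4.11)
              [0, 2]])
R = np.array([[0, -1],   # anticlockwise rotation by 90 degrees (Example 4.12)
              [1, 0]])

e1 = np.array([1, 0])
print(R @ (D @ e1))   # dilate, then rotate: (0, 2)
print((R @ D) @ e1)   # same answer: the matrix of the composition is R D
```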

4.1.4 Inverses

An inverse function is a function that "undoes" another function: if $f(x) = y$, the inverse function $g$ maps $y$ to $x$. More directly, $g(f(x)) = x$, meaning that $g \circ f$ is the identity function, that is, it fixes every vector. A function $f$ that has an inverse is called invertible and the inverse function is then uniquely determined by $f$ and is denoted by $f^{-1}$.

In the case where $f$ is a linear transformation, it is invertible if and only if its domain equals its codomain and its range, say $\mathbb{R}^n$. In this case, $f^{-1} \circ f$ is the identity function.

Let $A$ be the matrix corresponding to $f$ and $B$ be the matrix corresponding to $f^{-1}$ (all with respect to the standard basis, say); both are $n \times n$ matrices. We have seen in Example 4.10 that the matrix corresponding to the identity function is the identity matrix $I_n$. By Key Concept 4.15, we have that $BA = I_n$. In other words, $B$ is the inverse matrix of $A$. Hence we have:


Theorem  4.16.  The matrix corresponding to the inverse of an invertible

linear transformation f is the inverse of the matrix corresponding to f 

(with respect to a chosen basis).

4.1.5 Rank-Nullity Theorem revisited

Remember the Rank-Nullity Theorem (Theorem 3.18): $\operatorname{rank}(A) + \operatorname{nullity}(A) = n$ for an $m \times n$ matrix $A$. We now know that $A$ represents a linear transformation $f : \mathbf{x} \mapsto A\mathbf{x}$, so we are going to interpret what the rank and the nullity are in terms of $f$.

The rank of $A$ is the dimension of the column space of $A$. We have seen that the columns represent the images $f(\mathbf{u}_j)$ for each basis vector $\mathbf{u}_j$ of $\mathbb{R}^n$. So the column space corresponds exactly to the range of $f$. By Theorem 4.7, we know that the range of $f$ is a subspace, and so has a dimension: the rank of $A$ corresponds to the dimension of the range of $f$.

The nullity of $A$ is the dimension of the null space of $A$. Recall that the null space of $A$ is the set of vectors $\mathbf{x}$ of $\mathbb{R}^n$ such that $A\mathbf{x} = \mathbf{0}$. In terms of $f$, it corresponds to the vectors $\mathbf{x}$ of $\mathbb{R}^n$ such that $f(\mathbf{x}) = \mathbf{0}$. This set is called the kernel of $f$.

Definition 4.17. (Kernel)
The kernel of a linear transformation $f : \mathbb{R}^n \longrightarrow \mathbb{R}^m$ is the set
$$\operatorname{Ker}(f) = \{\mathbf{x} \in \mathbb{R}^n \mid f(\mathbf{x}) = \mathbf{0}\}.$$

The kernel of $f$ is a subspace of $\mathbb{R}^n$ (try proving it!) and so has a dimension: the nullity of $A$ corresponds to the dimension of the kernel of $f$.

We can now rewrite the Rank-Nullity Theorem as follows:

Theorem  4.18.  Let f be a linear transformation. Then

dim(range( f )) + dim(Ker( f )) = dim(dom( f )).

We immediately get:

Corollary 4.19.  The dimension of the range of a linear transformation

is at most the dimension of its domain.


Part II

Differential Calculus: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici. Some pictures courtesy of Kevin Judd.


5

Vector functions and functions of several variables

A great deal can be done with scalar functions of one real variable; that is, functions that are rules for assigning to each real number in some subset of the real line (the independent variable) another real number (the dependent variable). An example of such a function is
$$f(x) = x^2$$
where $x$ is a real number. However, sometimes this type of function is not enough to deal with the problems that arise in science, engineering, economics or other fields. For instance, a meteorologist might want to deal with air pressure, which varies with latitude, longitude and time, so would need a function of three variables. Other types of problems might involve just one independent variable (for instance, time) which affects several dependent variables (e.g. the components in three directions of an electromagnetic field); then we would have a set of three functions of one variable. We could deal with all three at once by regarding them as the components of a vector.

Just as for scalar functions of one real variable, it is important to know how these functions change as the variables change. This is where calculus is required. Luckily the ideas of differential calculus from high school can be extended to these new functions and that is the focus of this part of the course.

5.1 Vector valued functions

Vector functions are functions that take a real number and assign to it a vector in $\mathbb{R}^n$. We will usually be dealing with vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$. More precisely, given a subset $I$ of $\mathbb{R}$, a vector function with values in $\mathbb{R}^3$ is a function
$$\mathbf{r} : I \longrightarrow \mathbb{R}^3, \quad t \longmapsto \mathbf{r}(t) = (f(t), g(t), h(t)).$$
The functions $f(t)$, $g(t)$ and $h(t)$ are called coordinate functions of $\mathbf{r}(t)$.

For instance, $t$ could represent time and $\mathbf{r}(t)$ the position of some object at that time. As $t$ varies in $I$, $\mathbf{r}(t)$ draws a curve $C$ in $\mathbb{R}^3$ which is the path of the object in space.


An example of such a function is
$$\mathbf{r} : \mathbb{R} \longrightarrow \mathbb{R}^3, \quad t \longmapsto \Big(\sin t,\ 2\cos t,\ \frac{t}{10}\Big),$$
which plots the curve given in Figure 5.1.

[Figure 5.1: The curve defined by $\mathbf{r}(t) = (\sin(t), 2\cos(t), t/10)$ for $t \in [0, 20]$.]
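A picture like Figure 5.1 can be reproduced with a few lines of matplotlib (our own sketch; any plotting tool would do):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 20, 500)
x, y, z = np.sin(t), 2 * np.cos(t), t / 10   # the coordinate functions

ax = plt.figure().add_subplot(projection="3d")
ax.plot(x, y, z)
plt.show()
```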

In mathematical language, the curve $C$ is the set described by
$$C = \{(f(t), g(t), h(t)) \mid t \in I\}.$$
We say that
$$C : \quad x = f(t), \quad y = g(t), \quad z = h(t), \quad \text{for } t \in I$$
are parametric equations of $C$ with parameter $t$, or equivalently, that $(f(t), g(t), h(t))$ is a parameterisation of $C$.

Remark 5.1. Any curve $C$ (except if it consists of a single point) has infinitely many different parameterisations, that is, can be described by infinitely many vector functions. See Example 5.2(2) and (3) below.

A vector function with values in $\mathbb{R}^2$ has the form
$$\mathbf{r} : I \longrightarrow \mathbb{R}^2, \quad t \longmapsto \mathbf{r}(t) = (f(t), g(t))$$
with coordinate functions $f(t)$ and $g(t)$. As above, when $t$ varies in $I$, $\mathbf{r}(t)$ draws a curve $C$ in $\mathbb{R}^2$.

Example 5.2. 1. For $t \in \mathbb{R}$, let $\mathbf{r}(t) = (x_0 + at,\ y_0 + bt,\ z_0 + ct)$, where $x_0, y_0, z_0, a, b, c$ are real constants. The corresponding curve $C$ defined by
$$x = x_0 + at, \quad y = y_0 + bt, \quad z = z_0 + ct, \quad \text{for } t \in \mathbb{R},$$
is a straight line in $\mathbb{R}^3$; this is the line through the point $P = (x_0, y_0, z_0)$ parallel to the vector $\mathbf{v} = (a, b, c)$. Briefly, $\mathbf{r}(t) = P + t\mathbf{v}$.

2. For $t \in [0, 2\pi]$ define $\mathbf{r}(t) = (\cos(t), \sin(t))$. The corresponding curve is the unit circle with centre at $(0, 0)$ and with radius one.

3. Another parameterisation of the unit circle is given by $\mathbf{r}_1(t) = (\cos(2t), \sin(2t))$ for $t \in [0, \pi]$.


5.2 Functions of two variables

We can think of two variables $x$ and $y$ as the vector $(x, y) \in \mathbb{R}^2$. A function of two variables can then be thought of as a function whose variable is a vector in $\mathbb{R}^2$. More formally, let $D$ be a subset of $\mathbb{R}^2$. A function of two variables defined on $D$ is a rule that assigns to each $(x, y) \in D$ a real number $f(x, y)$, that is,
$$f : D \longrightarrow \mathbb{R}, \quad (x, y) \longmapsto f(x, y).$$
An example of a function of two variables is the height above sea level at a location on a map defined by its $x$ and $y$ coordinates.

Given a function $f : A \to B$, the graph of $f$ is the set
$$\operatorname{graph}(f) = \{(a, f(a)) \mid a \in A\}.$$
When $f$ is a scalar-valued function of one real variable, the elements of $\operatorname{graph}(f)$ are points in the plane and so this concept corresponds to the graph in the plane that you are familiar with drawing.

When $f$ is a function of two variables then
$$\operatorname{graph}(f) = \{(x, y, f(x, y)) \mid (x, y) \in D\}.$$
This defines a surface $S$ in $\mathbb{R}^3$. We sometimes write
$$S : \quad z = f(x, y), \quad \text{for } (x, y) \in D,$$
which can be regarded as a parametric equation for $S$. So if we think of $\mathbb{R}^2$ as the plane in $\mathbb{R}^3$ defined by $z = 0$, then $z = f(x, y)$ gives the distance of the surface above or below the plane at the point $(x, y)$, just as $y = f(x)$ gives the distance of a curve above or below the $x$-axis for functions of one variable.

Figure 5.2 shows the surface defined by the graph of the function $f(x, y) = y^2 - x^2$.

[Figure 5.2: The surface defined by $f(x, y) = y^2 - x^2$.]

Example 5.3. 1. Let
$$f : \mathbb{R}^2 \longrightarrow \mathbb{R}, \quad (x, y) \longmapsto x^2 + y^2.$$
Then
$$\operatorname{graph}(f) = \{(x, y, x^2 + y^2) \mid (x, y) \in \mathbb{R}^2\}$$
and defines a surface that is called a paraboloid and is shown in Figure 5.3. The domain of $f$ is $\mathbb{R}^2$ and the range is $[0, \infty)$.

[Figure 5.3: The surface defined by $f(x, y) = x^2 + y^2$.]

2. Let $D = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \leq 1\}$ and define
$$f : D \longrightarrow \mathbb{R}, \quad (x, y) \longmapsto \sqrt{1 - x^2 - y^2}.$$
The graph of $f$ is the upper hemisphere in $\mathbb{R}^3$ of radius 1 and centre $(0, 0, 0)$ as displayed in Figure 5.4. The range of $f$ is the interval $[0, 1]$.

[Figure 5.4: The surface defined by $f(x, y) = \sqrt{1 - x^2 - y^2}$.]

5.2.1   Level Curves

Another way of visually representing functions of two variables is

 by using level curves.

Suppose   f   :  D −→  R  is a function of two variables. Then the set

of points (x, y) in  D  satisfying the equation

 f (x, y) = k 

where k  is some real constant defines a curve in  D . It is called a

level curve of f because it consists of all the points in D for which the corresponding points on the surface defined by f are at the same level, that is, at height |k| above or below the plane. If k is not in the range of f, then the level curve does not exist.

Examples of level curves with which you might be familiar are:

1. contour lines on maps, which are lines of constant height above

sea-level, and

2. the isobars on a weather chart, which are lines of constant atmo-

spheric pressure.

Level curves are a way of giving visual information about a surface

in R3 in a 2-dimensional form.

Example 5.4. 1. Let f(x, y) = x² + y². If k > 0, the level curve defined by f(x, y) = k is the circle x² + y² = k (a circle of radius √k). If k = 0 then the level curve is the single point (0, 0), and if k < 0 the level curve does not exist. Figure 5.5 displays some of the level curves.

Figure 5.5: Some level curves for f(x, y) = x² + y².

2. Let f(x, y) = 2x² + 3y². If k > 0, the level curve defined by f(x, y) = k is the ellipse 2x² + 3y² = k. If k = 0 the level curve is the single point (0, 0), and if k < 0 the level curve does not exist. Some of the level curves are displayed in Figure 5.6.

Figure 5.6: Some level curves for f(x, y) = 2x² + 3y².
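Pictures like Figures 5.5 and 5.6 can be reproduced in a few lines of code. The following is a minimal sketch using Python's matplotlib (an assumed tool, not part of the course); contour() draws the level curves f(x, y) = k for the listed values of k.

```python
import numpy as np
import matplotlib.pyplot as plt

x, y = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))

for z, name in [(x**2 + y**2, "x^2 + y^2"), (2*x**2 + 3*y**2, "2x^2 + 3y^2")]:
    cs = plt.contour(x, y, z, levels=[1, 2, 4, 8])  # the level curves f(x, y) = k
    plt.clabel(cs)                                  # label each curve with its value of k
    plt.gca().set_aspect("equal")
    plt.title("Level curves of " + name)
    plt.show()
```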

5.3   Functions of three or more variables

Functions of three or more variables may also be defined, but it is not possible to visualise their graphs as curves or surfaces. For example, if D is a subset of R³, a function of three variables defined on D is a rule that assigns to each (x, y, z) ∈ D a real number f(x, y, z).

In a similar way, given a subset  D  of Rn, we can consider functions

 f (x1, x2, . . . , xn) defined for (x1, x2, . . . , xn) ∈ D.

For a general function   f   of  n  variables,

graph( f ) = {( x1, x2, . . . , xn, f (x1, x2, . . . , xn) ) | (x1, x2, . . . , xn) ∈ D}

(this is an n-dimensional surface in R^{n+1}, a concept that

makes algebraic but not visual sense). For any k  ∈ R the set

S = { x ∈ D |   f ( x) = k }

is called a  level surface of   f   (level curve when n  =  2).

Example 5.5. 1. f(x, y, z) = ax + by + cz = c · x is a linear function of three variables for each constant vector c = (a, b, c). Here x = (x, y, z) ∈ R³. If c ≠ 0, then for each real number k the equation f(x) = k determines a plane in R³ with normal vector c. Changing k gives different planes parallel to each other. That is, the level surfaces defined by f are all planes perpendicular to c.

2. f(x) = c · x = c₁x₁ + … + cₙxₙ with c ≠ 0. Then (see part (1) above) the level surfaces {x | f(x) = k} are mutually parallel (n − 1)-dimensional hyperplanes.

3. f(x, y, z) = x² + y² + z². Then, for each k > 0, the level surface defined by f(x, y, z) = k is a sphere with radius √k and centre 0.

5.4   Summary

Summing up we have discussed three types of functions:

1. Scalar-valued functions of one real variable, that is, functions of 

the form

 f   :  A −→ R

for some subset  A of R.

2. Vector functions, that is, functions of the form

 f   :  A −→ Rn

for some subset  A of R with n ≥ 2.

3. Functions of several variables, that is, functions of the form

 f   :  A −→ R

for some subset  A of Rn with n ≥ 2.

High school calculus considered functions of type (1). In these

notes we will briefly refresh this and study the calculus of functions of types (2) and (3).


We can of course have even more general functions, that is, func-

tions of the form

 f   :  A −→ Rm

for some subset  A of Rn with n, m ≥  2. Such functions are some-

times called vector fields or vector transformations. We have already seen linear versions of such functions in Chapter 4: an m × n matrix B defines the function

 f   :   Rn −→   Rm

 x   −→   B x
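As a concrete sketch (the matrix B below is an arbitrary choice of ours), a 2 × 3 matrix defines a vector transformation from R³ to R² by matrix–vector multiplication:

```python
import numpy as np

B = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])   # a 2 x 3 matrix, so f : R^3 -> R^2

def f(x):
    """The vector transformation x -> Bx."""
    return B @ x

print(f(np.array([1.0, 1.0, 1.0])))   # [3. 0.]
```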


6   Limits and continuity

6.1   Scalar-valued functions of one variable

An important concept for the calculus of scalar-valued functions of 

one variable is the notion of a limit. This should mostly be familiar to you from school. Recall that an open interval in R is a set denoted by (a, b) and consisting of all real numbers x such that a < x < b. A closed interval [a, b] consists of all real numbers x such that a ≤ x ≤ b.

Definition 6.1.  (Intuitive definition of a limit)

Let f (x) be defined on an open interval containing a, except possibly

at a, and let L ∈   R. We say that f  converges to L  as  x  approaches

a if we can make the values of f (x) arbitrarily close to L by taking x to

be sufficiently close to a, but not equal to a. We write lim_{x→a} f(x) = L, or f(x) → L as x → a.

Essentially what the definition is saying is that as  x  gets closer

and closer to a, the value   f (x) gets closer and closer to  L .

Sometimes we can calculate the limit of   f (x) as  x  approaches a

 by simply evaluating  f (a). For example, if   f (x) =  x2 then

lim_{x→1} f(x) = f(1) = 1.

However, it may be the case that lim_{x→a} f(x) ≠ f(a). For example, consider the function f : R → R defined by

f(x) = x if x ≠ 0, and f(x) = 2 if x = 0.   (6.1)

The graph of this function is given in Figure 6.1. Then we clearly have that lim_{x→0} f(x) = 0 while f(0) = 2.

It is also possible to define one-sided limits. We write

lim_{x→a⁻} f(x) = L

if f(x) gets arbitrarily close to L as x gets sufficiently close to a with x < a. This definition only requires f to be defined on an open


Figure 6.1: The graph of the function f defined in (6.1).

interval (b, a) for some b < a. Similarly, lim_{x→a⁺} f(x) = L if f(x) gets arbitrarily close to L as x gets sufficiently close to a with x > a.

Note that lim_{x→a} f(x) exists if and only if

lim_{x→a⁻} f(x) = lim_{x→a⁺} f(x).

Example 6.2. 1. For any constant α > 0, lim_{x→0} x^α = 0.

2. lim_{x→0} (sin x)/x = 1.

Figure 6.2: The graph of the function y = sin(x)/x for x ∈ [−5, 5].

3. Consider the function

f(x) = 0 if x < 0, and f(x) = 1 if x ≥ 0.

Then lim_{x→0} f(x) does not exist. However, lim_{x→0⁻} f(x) = 0 and lim_{x→0⁺} f(x) = 1.

To make it easier to calculate limits we have the following rules that allow us to combine limits of known functions to determine limits of new functions.


Theorem 6.3 (Limit Laws). Let I be an open interval, a ∈ I, and let f(x) and g(x) be defined for x ∈ I \ {a} such that lim_{x→a} f(x) and lim_{x→a} g(x) exist. Then:

1. lim_{x→a} (f(x) + g(x)) = lim_{x→a} f(x) + lim_{x→a} g(x) and
   lim_{x→a} (f(x) − g(x)) = lim_{x→a} f(x) − lim_{x→a} g(x).

2. For any constant c ∈ R, lim_{x→a} (c · f(x)) = c · lim_{x→a} f(x).

3. lim_{x→a} (f(x) · g(x)) = (lim_{x→a} f(x)) · (lim_{x→a} g(x)).

4. If lim_{x→a} g(x) ≠ 0, then

   lim_{x→a} f(x)/g(x) = (lim_{x→a} f(x)) / (lim_{x→a} g(x)).

The statements in Theorem  6.3 remain true if  x →   a is replaced

 by either  x → a−  or x → a+ for suitably defined functions.

The intuitive definition of a limit can be easily extended to the

notion of the limit of   f (x) as  x  approaches infinity. In particular

we say that   f (x) tends to L as x →   ∞ if we can make   f (x) arbi-

trarily close to L by taking x sufficiently large. We denote this by lim_{x→∞} f(x) = L. We can define lim_{x→−∞} f(x) in a similar fashion.

The statements in Theorem  6.3 still hold if we replace  x →   a by

x → ∞ or x → −∞.

Example 6.4. 1. For any constant α > 0, lim_{x→∞} 1/x^α = 0.

2. Evaluate lim_{x→∞} (x² + 2x + 1)/(2x² + x + 7).

Solution: We have

lim_{x→∞} (x² + 2x + 1)/(2x² + x + 7)
   = lim_{x→∞} x²(1 + 2/x + 1/x²) / x²(2 + 1/x + 7/x²)
   = lim_{x→∞} (1 + 2/x + 1/x²)/(2 + 1/x + 7/x²)
   = (lim_{x→∞} 1 + lim_{x→∞} 2/x + lim_{x→∞} 1/x²) / (lim_{x→∞} 2 + lim_{x→∞} 1/x + lim_{x→∞} 7/x²)   by Theorem 6.3
   = (1 + 0 + 0)/(2 + 0 + 0)
   = 1/2.
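For the interested student, such limits can also be evaluated with a computer algebra system. A minimal sketch using Python's sympy library (an assumed tool, not part of the course):

```python
import sympy as sp

x = sp.symbols('x')

# The limit from Example 6.4(2):
print(sp.limit((x**2 + 2*x + 1) / (2*x**2 + x + 7), x, sp.oo))  # 1/2

# The limit from Example 6.2(2):
print(sp.limit(sp.sin(x) / x, x, 0))                            # 1
```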

Another useful theorem for calculating limits is the following


Theorem 6.5 (The Squeeze Theorem, or The Sandwich Theorem). Let I be an open interval, a ∈ I, and let f(x), g(x) and h(x) be defined for x ∈ I \ {a} with g(x) ≤ f(x) ≤ h(x) for all x ∈ I \ {a}. If

lim_{x→a} g(x) = lim_{x→a} h(x) = L,

then lim_{x→a} f(x) = L.

The Squeeze Theorem still holds if  x →   a is replaced by any of 

x →  a−,  x →  a+, x →  ∞ or x → −∞ with the appropriate changes

to the statement about the domain of the function.

Example 6.6. Determine lim_{x→∞} (sin x)/x.

We have

−1/x ≤ (sin x)/x ≤ 1/x

for any x > 0, and lim_{x→∞} (−1/x) = lim_{x→∞} 1/x = 0. So by the Squeeze Theorem, lim_{x→∞} (sin x)/x = 0.

We saw in Example 6.2(3) one way in which a limit may not

exist. Another way in which a limit may fail to exist is if f(x)

diverges to ∞.

Definition 6.7. (Diverging to ∞)
Let f(x) be defined on an open interval (b, a). We say that f diverges to ∞ as x → a from below if f(x) gets arbitrarily large as x gets sufficiently close to a with x < a. We denote this by lim_{x→a⁻} f(x) = ∞. In a similar way one defines lim_{x→a⁻} f(x) = −∞, lim_{x→a⁺} f(x) = ∞ and lim_{x→a⁺} f(x) = −∞.

Note that by writing lim_{x→a⁻} f(x) = ∞ we are not saying that the limit exists; we are simply using the notation to denote that f diverges to infinity from below.

Finally, if lim_{x→a⁻} f(x) = lim_{x→a⁺} f(x) = ∞, we write lim_{x→a} f(x) = ∞. Similarly, lim_{x→a} f(x) = −∞ means that lim_{x→a⁻} f(x) = lim_{x→a⁺} f(x) = −∞.

Example 6.8. For f(x) = 1/(x − 1), x ≠ 1, we have

lim_{x→1⁻} f(x) = −∞,   lim_{x→1⁺} f(x) = ∞.

6.1.1   The precise definition of a limit

The intuitive definition of a limit given in Definition  6.1 is a bit im-

precise; what exactly do ‘approach’ or ‘sufficiently close’ mean?

This is particularly unclear when dealing with functions of sev-

eral variables: how do you ‘approach’ 0? From which direction?

It is also unclear when you start to deal with more complicated

functions than the ones you will see in this course. Moreover, we

need a more precise definition if we wish to give rigorous proofs of Theorems 6.3 and 6.5.


For these reasons, mathematicians require a more formal and

precise definition. We outline such a definition in this subsection

for the interested student. These ideas will be explored in tutorials.

Definition 6.9. (Limits at finite points)
Let f(x) be defined on an open interval I around a, except possibly at a, and let L ∈ R. We say that f converges to L as x → a if for all ε > 0 there exists δ > 0 such that

|f(x) − L| < ε   for all x ∈ I \ {a} with |x − a| < δ.

This definition needs quite a bit of unpacking. Essentially what it is saying is that no matter what ε you are given, it is possible to find a δ > 0 (possibly depending on ε) such that if x is within distance δ of a then f(x) is within distance ε of L. The quantity ε is the measure of ‘closeness’ of f(x) to L and δ is the measure of ‘closeness’ of x to a. Another way of thinking about it is that δ is the control on the input error that you need to guarantee an output error of ε.

Visualising this geometrically, what we have is the following: since |f(x) − L| < ε is equivalent to L − ε < f(x) < L + ε, lim_{x→a} f(x) = L means that for any ε > 0 we can find δ > 0 small enough that the part of the graph of f corresponding to the interval (a − δ, a + δ) is entirely in the horizontal strip L − ε < y < L + ε. In other words, when x approaches a the value f(x) of the function approaches L.

We will demonstrate this with a simple example.

Example 6.10. Let f(x) = 2x + 1 and a = 0. Intuitively we expect the limit as x tends to 0 to be 1. More formally, suppose that ε > 0. To show that lim_{x→0} f(x) = 1 we need to find a δ > 0 so that if x is at distance at most δ from 0 then f(x) is at distance at most ε from 1. A good guess would be δ = ε/2. Indeed, if |x − 0| < δ then

|f(x) − 1| = |2x + 1 − 1| = |2x| = 2|x| < 2δ = ε

as required.
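The choice δ = ε/2 can also be tested empirically. The sketch below (Python with numpy; an illustration, not a proof) samples points with |x − 0| < δ and confirms that |f(x) − 1| < ε at each of them.

```python
import numpy as np

f = lambda x: 2*x + 1

for eps in [0.5, 0.1, 0.001]:
    delta = eps / 2                                  # the delta found in Example 6.10
    xs = np.linspace(-delta, delta, 1001)[1:-1]      # sample points with |x - 0| < delta
    assert np.all(np.abs(f(xs) - 1) < eps)           # every sampled f(x) is within eps of 1
    print(f"eps={eps}: |f(x) - 1| < eps for all sampled |x| < {delta}")
```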

We can also give a precise definition for limits as x  approaches ∞.

Definition 6.11. (Limits at ∞ and −∞)
Let f(x) be defined on an interval (c, ∞) for some c ∈ R. We say that f(x) tends to L as x → ∞ if for all ε > 0 there exists b > c such that |f(x) − L| < ε for all x ≥ b. In a similar way we define lim_{x→−∞} f(x) = L.

Geometrically, lim_{x→∞} f(x) = L means that for any ε > 0 we can find some b > c (possibly depending on ε) so that the part of the graph of f corresponding to the interval [b, ∞) is entirely in the horizontal strip L − ε < y < L + ε. In other words, as the variable x becomes larger and larger the value f(x) of the function at x approaches L.

We can also give a precise definition for   f  to diverge to ∞.

Definition 6.12.   (Diverging to ∞)

Let f (x) be defined on an interval of the form  (b, a) for some b   <  a. We

say that f  diverges to ∞ as x →   a with  x   <  a if for any M   >  0 there

exists δ > 0 such that f (x) >  M for all x < a with |x − a| < δ.

This is saying that no matter what positive number  M you

choose, you can always find some margin  δ  so that if  x  is at most

δ away from a  then   f (x) will be greater than  M.

6.2   Limits of vector functions

We can consider limits of vector functions in a similar way to how

we viewed limits for scalar functions.

Let r (t) = ( f (t), g(t), h(t)) be a vector function defined on some

open interval around a , except possibly at  a. Define

lim_{t→a} r(t) = ( lim_{t→a} f(t), lim_{t→a} g(t), lim_{t→a} h(t) ),

if the limits of the coordinate functions exist. Limits can then be

calculated using the techniques for scalar-valued functions.

We can also define lim_{t→a⁻} r(t) and lim_{t→a⁺} r(t) in a similar fashion.

6.3   Limits of functions of two variables

We can also extend the notion of a limit to functions of two vari-

ables but this requires a few modifications.

First, for functions of one real variable we used the notion of an

open interval of the real line. The appropriate notion in the two

variable case is that of an open disc: Given  a  = (a1, a2) ∈  R2 and

δ > 0, the open disc of radius δ  and centre  a  is the set

B(a, δ) = { x ∈ R2 | d( x, a) < δ}

Here

d(x, a) = √((x₁ − a₁)² + (x₂ − a₂)²)

is the distance between the points x = (x₁, x₂) and a = (a₁, a₂).

Given a subset  D  of R2 we say that  a  is an  interior point of  D  if  D

contains some open disc centred at  a.

The intuitive definition of a limit would then be:


Definition 6.13.  (Intuitive definition of a limit)

Let f ( x) be defined on an open disc centred at  a, except possibly at  a, and

let L ∈  R. We say that f  converges to  L  as  x →   a if we can make the

values of f(x) arbitrarily close to L by taking x to be sufficiently close to a, but not equal to a. We write lim_{x→a} f(x) = L.

Example 6.14. 1. Let f(x, y) = x. Then

lim_{(x,y)→(a₁,a₂)} f(x, y) = a₁.

2. Determine lim_{(x,y)→(0,0)} (x² + x + xy + y)/(x + y).

Solution: Here it helps if we factorise the numerator. We have

lim_{(x,y)→(0,0)} (x² + x + xy + y)/(x + y) = lim_{(x,y)→(0,0)} (x + y)(x + 1)/(x + y)
   = lim_{(x,y)→(0,0)} (x + 1)
   = 1.

For the one variable case there were only two paths along which

to approach a real number  a: from the left or from the right. For

the limit to exist we require that the limit as we approach  a  from

either side is the same. However, there are infinitely many paths

approaching a point a ∈ R², and they are not all straight lines! For the limit to exist, the value of f(x) must approach the same value L no matter which path we choose.

More formally, for the limit to exist and to be equal to L we require that for every curve r(t) with lim_{t→t₀} r(t) = a we have

lim_{t→t₀} f(r(t)) = L.

If the latter holds for only some curves but not all curves, then

lim x→a   f ( x) does not exist. This gives a simple method for showing

that a limit does not exist.

Example 6.15. Let D = R² \ {(0, 0)} and define

f :  D  −→  R
     (x, y)  −→  xy/(x² + y²)

Let a  = (0, 0).

First consider the curve r₁(t) = (t, t) passing through a. Then lim_{t→0} r₁(t) = (0, 0), so

lim_{t→0} f(r₁(t)) = lim_{t→0} t·t/(t² + t²) = 1/2.

Next consider the curve r₂(t) = (0, t) defined for all t > 0. Then lim_{t→0⁺} r₂(t) = a, and along this curve f(r₂(t)) = f(0, t) = 0, so

lim_{t→0⁺} f(r₂(t)) = lim_{t→0⁺} 0 = 0 ≠ 1/2.


Since we have found two different paths with two different limits as they approach (0, 0), it follows that lim_{(x,y)→(0,0)} f(x, y) does not exist.

To see what is actually happening with the function we can plot its graph, and we see in Figure 6.3 that the resulting surface seems to have some kind of pinch in it at (0, 0). The curve (r₁(t), f(r₁(t))) has gone along the top ridge while the curve (r₂(t), f(r₂(t))) has gone along the bottom ridge.

Figure 6.3: The surface defined by f(x, y) = xy/(x² + y²).
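The two-path behaviour of Example 6.15 is easy to see numerically (a sketch; the code is ours): along (t, t) the function is constantly 1/2, while along (0, t) it is constantly 0.

```python
f = lambda x, y: x*y / (x**2 + y**2)   # defined away from (0, 0)

for t in [0.1, 0.01, 0.001]:
    print(f"t={t}: along (t,t) f={f(t, t):.3f}, along (0,t) f={f(0.0, t):.3f}")
# Along (t,t) every value is 0.500; along (0,t) every value is 0.000,
# so f cannot have a limit at (0, 0).
```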

Sometimes it is easier to find a limit, if it exists, by first changing the co-ordinate system; so we shall now consider another commonly used system of co-ordinates in R².

Definition 6.16. (Polar coordinates in R²)
Let x = r cos(θ) and y = r sin(θ) for r ≥ 0 and θ ∈ [0, 2π). Then r, θ are called the polar coordinates of the point (x, y). Notice that r² = x² + y², so r = √(x² + y²) is the distance of the point (x, y) from the origin.

Exercise 6.3.1. Find lim_{(x,y)→(0,0)} xy/√(x² + y²) if it exists.

Solution. Introduce polar coordinates: x = r cos(θ) and y = r sin(θ). Since r = √(x² + y²), we have r → 0 as (x, y) → (0, 0). Now

|xy/√(x² + y²)| = |r² cos(θ) sin(θ)|/r = r|cos(θ)||sin(θ)| ≤ r

and so

−r ≤ xy/√(x² + y²) ≤ r.

Letting (x, y) → (0, 0), we have r → 0, so by the Squeeze Theorem,

lim_{(x,y)→(0,0)} xy/√(x² + y²) = 0.
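A numerical spot-check of this squeeze argument (a sketch with numpy; the helper name g is ours): at random points on a circle of radius r, the values of xy/√(x² + y²) are bounded in size by r.

```python
import numpy as np

g = lambda x, y: x*y / np.sqrt(x**2 + y**2)

rng = np.random.default_rng(0)
for r in [1.0, 0.1, 0.01]:
    theta = rng.uniform(0.0, 2*np.pi, 100)          # random directions
    vals = g(r*np.cos(theta), r*np.sin(theta))      # sample g on the circle of radius r
    assert np.all(np.abs(vals) <= r + 1e-12)        # |g| = r|cos(theta)||sin(theta)| <= r
    print(f"r={r}: max |g| on the circle = {np.abs(vals).max():.4f}")
```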

We can define limits for functions of three or more variables in a similar way. We define the open ball of radius δ and centre a to be the set

B(a, δ) = {x ∈ Rⁿ | d(x, a) < δ}

where d(x, a) is the distance between x and a. Sometimes we refer to open discs in R² and open intervals in R as open balls so that we do not need to switch terminology for each case. Then Definition 6.13 extends to the general setting by replacing “open disc” with “open ball”.

We end this section by giving the precise definition of a limit for

the interested student.

Definition 6.17. (Precise definition of limit)
Let f(x) be defined on an open ball around a, except possibly at a, and let L ∈ R. We write

L = lim_{x→a} f(x)

and say that the limit of f(x) as x approaches a is equal to L if for every ε > 0 there exists δ > 0 such that |f(x) − L| < ε holds for each x ∈ B(a, δ) with x ≠ a (that is, for each x with 0 < d(x, a) < δ).

6.4   Continuity of Functions

Continuity for scalar-valued functions of one variable essentially

means that the graph of the function has no gaps in it, that is, can

 be drawn without taking your pen off the page. We make this def-

inition more precise and state it in a manner that also applies to

vector functions and functions of several variables as follows:

Definition 6.18. (Continuity)
Let a ∈ Rᵐ for some m ≥ 1 and let D ⊆ Rᵐ be such that D contains an open ball centred at a. Let f : D → Rⁿ for some n ≥ 1. We say that f is continuous at a if

lim_{x→a} f(x) = f(a).

We say that f is continuous if it is continuous at all points in its domain.

It may be the case that f is not defined on an open ball around each point in its domain. For example, if A = [0, ∞) and f : A → R is defined by f(x) = √x then the point 0 is in the domain of f but f is not defined on an open interval around 0. In this case, to say that f is continuous at 0 means that lim_{x→0⁺} f(x) = f(0). We can similarly define f to be continuous at the right end point of an interval.

One consequence of continuity is that a continuous function cannot skip values; that is, small changes in the variable result in small changes in the value of the function.

It is easy to see that if r(t) is a vector-valued function then it is continuous at t₀ if and only if each of its coordinate functions f(t), g(t) and h(t) is continuous at t₀.

We give some examples of continuous functions.

Example 6.19. 1. Given real numbers a₀, a₁, …, aₙ, the polynomial

P(x) = a₀xⁿ + a₁xⁿ⁻¹ + … + aₙ₋₁x + aₙ,   x ∈ R,

is a continuous function, and therefore every rational function

R(x) = P(x)/Q(x)


where P and Q are polynomials, is continuous on its domain.

2.   sin(x) and  cos(x) are continuous on R.

3. ex is continuous on R, while ln(x) is continuous on its domain, which

is (0,∞).

4. For any constant α ∈  R, the function f (x) =  xα is continuous for all

x > 0.

5. The function r(t) = (t³, ln(3 − t), √t) with domain [0, 3) is continuous since each of its coordinate functions is continuous on this interval.

6. It follows from part (1) that f (x, y) =   x is a continuous function in

R2. Similarly, g(x, y) = y is continuous.

7. The function f defined in Equation (6.1) is not continuous at x =  0.

The next result tells us various ways of combining continuous functions to get new continuous functions.

Theorem  6.20.   Let n ≥   1, D ⊆   Rn and let f , g   :   D −→   R be

continuous at c ∈ D. Then:

1. The functions a f ( x) + bg( x) (a , b ∈  R) and f ( x) g( x) are continuous

at c.

2. If g(c) ≠ 0, then f(x)/g(x) is continuous at c.

3. If h(t) is a function of a single variable t ∈ R, defined on the range of f 

and continuous at f (c), then h( f ( x)) is continuous at c.

4. If t ∈   I  ⊆ R, and  r (t) is a vector function such that  r (t) ∈ D for t ∈   I 

and r (t0) =  c , and if  r  is continuous at t0, then the function f (r (t)) is

continuous at t0.

Example 6.21. Since f(x, y) = x and g(x, y) = y are continuous functions (Example 6.19(6)), Theorem 6.20(1) implies that h(x, y) = x^p y^q is continuous for all integers p, q ≥ 0. Using Theorem 6.20(1) again, one gets that every polynomial

P(x, y) = a₁x^{p₁}y^{q₁} + a₂x^{p₂}y^{q₂} + … + aₖx^{pₖ}y^{qₖ}

is continuous. Hence by Theorem 6.20(2) rational functions R(x, y) = P(x, y)/Q(x, y), where P and Q are polynomials, are continuous on their domains, that is, everywhere except where Q(x, y) = 0.

Then, applying Theorem 6.20(3), it follows that e^{R(x,y)}, sin(R(x, y)) and cos(R(x, y)) are continuous functions (on their domains).

We note that if we know that a function f is continuous at a point a then it is easy to calculate lim_{x→a} f(x): it is simply f(a).

Example 6.22. Determine lim_{(x,y)→(0,0)} (x² + y²)/(2x + 2).

Solution: Note that the denominator is nonzero at (x, y) = (0, 0). Thus since both the numerator and the denominator are polynomials,


Theorem 6.20(2) implies that the function f(x, y) = (x² + y²)/(2x + 2) is continuous at (0, 0). Thus

lim_{(x,y)→(0,0)} (x² + y²)/(2x + 2) = (0² + 0²)/(2(0) + 2) = 0.


7   Differentiation

You should be familiar with differentiating functions of one vari-

able from high school. We will first revise this and then move on

to the appropriate analogues for vector functions and functions of 

several variables.

7.1   Derivatives of functions of one variable

Definition 7.1.   (Derivative)

Let f (x) be a function defined on some interval I and let a be an interior

 point of I.

We say that f is differentiable at a if the limit

lim_{x→a} (f(x) − f(a))/(x − a)

exists, and we define f′(a) to be the value of the limit. Then f′(a) is called the (first) derivative of f at a.

In a similar way one defines the left derivative

f′(a⁻) = lim_{x→a⁻} (f(x) − f(a))/(x − a)

of f at a, and the right derivative

f′(a⁺) = lim_{x→a⁺} (f(x) − f(a))/(x − a)

of f at a, whenever the corresponding limits exist.

Letting x = a + h we have that x → a is the same as h → 0, and so an equivalent definition of the derivative is that

f′(a) = lim_{h→0} (f(a + h) − f(a))/h.

Recall that another notation for the derivative of f at a is (df/dx)(a).

Notice that the left and right derivatives can be considered not

only for interior points  a  of  I  but also when  a  is a right or left end-point of  I  respectively.
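The limit in Definition 7.1 suggests a simple numerical experiment: for small h, the difference quotient (f(a + h) − f(a))/h should approach f′(a). A sketch in plain Python for f(x) = x², where f′(3) = 6:

```python
f = lambda x: x**2
a = 3.0

for h in [1.0, 0.1, 0.01, 0.0001]:
    slope = (f(a + h) - f(a)) / h    # the difference quotient from Definition 7.1
    print(f"h={h}: (f(a+h)-f(a))/h = {slope}")
# The quotients 7.0, 6.1, 6.01, 6.0001 approach f'(3) = 6.
```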


We emphasise the following key concept as it will guide our later definitions of the derivative of other types of functions.

Key Concept  7.2.  The derivative of f at a is the slope of the tangent

line to the curve defined by y   =   f (x) at the point  (a, f (a)). More-

over, close to x  =  a the tangent line is the best approximation to the

curve by a straight line.

You should be familiar with finding the derivative of various

functions from high school, such as polynomials, trigonometric

functions, exponential functions and logarithmic functions. More-

over, for a function defined in a piecewise manner you often need

to use the formal definition to determine the derivative. You should

also be familiar with the following results.

Theorem 7.3. Let f(x) and g(x) be differentiable on an interval I. Then:

1. [f(x) + g(x)]′ = f′(x) + g′(x) and [f(x) − g(x)]′ = f′(x) − g′(x).

2. For any constant c ∈ R, [cf(x)]′ = cf′(x).

3. (Product Rule)

[f(x)g(x)]′ = f′(x)g(x) + f(x)g′(x)

4. (Quotient Rule) If g(x) ≠ 0 on I,

[f(x)/g(x)]′ = (f′(x)g(x) − f(x)g′(x))/(g(x))²

5. (Chain Rule) Assume that h(x) is a differentiable function on some interval J and h(x) ∈ I for all x ∈ J. Then the function f(h(x)) is differentiable on J and

d/dx [f(h(x))] = f′(h(x))h′(x)

Theorem 7.4. If f is differentiable at some a ∈ I, then f is continuous at a.

Remark 7.5. The converse statement of Theorem 7.4 is not true, that is, not every continuous function is differentiable. For example, f(x) = |x| is continuous everywhere on R. However

f′(0⁻) = lim_{x→0⁻} (|x| − 0)/(x − 0) = lim_{x→0⁻} (−x)/x = −1,

while

f′(0⁺) = lim_{x→0⁺} (|x| − 0)/(x − 0) = lim_{x→0⁺} x/x = 1 ≠ f′(0⁻),

so f is not differentiable at 0.¹

¹ There are continuous functions on R which are not differentiable at any point, but such examples are beyond the scope of this unit.


Note that even if f is differentiable at a point a, it is not necessarily true that f′(a) = lim_{x→a} f′(x). This happens to be true only if f′(x) is continuous at a, and this is not necessarily the case. We say that f is continuously differentiable on an interval I if f′(x) exists and is continuous everywhere in I.

Definition 7.6. (nth derivative)
If f is differentiable at every x ∈ I, we get a function x → f′(x) defined on I which is called the (first) derivative of f on I. If f′(x) is also differentiable on I we define the second derivative of f on I by f″(x) = (f′)′(x) for any x ∈ I. Repeating this process we can define the nth derivative by f⁽ⁿ⁾(x) = (f⁽ⁿ⁻¹⁾)′(x) for all x ∈ I.

7.2   Differentiation of vector functions

We can define the derivative of a vector function in an analogous

way to the derivative of a scalar function of one variable.

Definition 7.7. (Derivative of a vector function)
Let r(t) be a vector-valued function defined on an interval I containing the point t₀. Define

r′(t₀) = lim_{s→0} (r(t₀ + s) − r(t₀))/s,

if the limit exists. In this case r′(t₀) is called the derivative of r at t₀, and r is called differentiable at t₀.

The vector function r(t) is called continuously differentiable in I if r′(t) exists and is continuous everywhere in I. Just as for scalar functions of one variable, it is possible for a function to be differentiable on an interval but for the derivative not to be continuous. The following result shows us that to differentiate a vector function we simply need to differentiate each of the coordinate functions. We state it for vector functions whose image lies in R², but it clearly extends to any Rⁿ.

Theorem 7.8. The vector function r(t) = (f(t), g(t)) is differentiable at t₀ if and only if the coordinate functions f and g are differentiable at t₀. When r(t) is differentiable at t₀, its derivative is given by

r′(t₀) = (f′(t₀), g′(t₀)).


Proof. We have

r′(t₀) = lim_{s→0} (r(t₀ + s) − r(t₀))/s
   = lim_{s→0} (1/s) (f(t₀ + s) − f(t₀), g(t₀ + s) − g(t₀))
   = ( lim_{s→0} (f(t₀ + s) − f(t₀))/s , lim_{s→0} (g(t₀ + s) − g(t₀))/s )
   = (f′(t₀), g′(t₀)).

Recall that given a vector v = (x, y) ∈ R², the length or magnitude of v is denoted by |v| and is given by √(x² + y²).

What is the geometric interpretation of the derivative of a vector function? Recall that the vector function r : I → R² describes a curve C in R². If t₀ ∈ I is such that r′(t₀) exists and is non-zero, then r′(t₀) is a tangent vector to the curve C at r(t₀). It will sometimes be useful to use the corresponding unit tangent vector given by

T(t₀) = r′(t₀)/|r′(t₀)|.

The line ℓ passing through the point r(t₀) and parallel to the vector r′(t₀) is called the tangent line to the curve C at r(t₀). The tangent line is called the (affine) linear approximation of C at r(t₀). There is a unique linear parameterisation of ℓ by a vector function v(t) such that v(t₀) = r(t₀) and v′(t₀) = r′(t₀), namely

v(t) = r(t₀) + (t − t₀) r′(t₀).

(However there are many other linear parameterisations of ℓ.) Near r(t₀), ℓ is the best approximation of C by a line.

We have a similar geometric interpretation for functions r : I → R³.

The derivative of a vector function r(t) encodes more than just the slope of the tangent to the curve traced out by r(t). For example, consider the two vector functions r₁(t) = (cos(t), sin(t)) and r₂(t) = (cos(−2t), sin(−2t)). Both functions trace out the same curve in R², namely the unit circle. Now r₁′(t) = (−sin(t), cos(t)) while r₂′(t) = (2 sin(−2t), −2 cos(−2t)). The first is a vector of magnitude 1 while the second has magnitude 2 and at time t = 0 points in the opposite direction to the first. This reflects the fact that the first function describes an anticlockwise trajectory around the circle while the second describes a clockwise trajectory that is travelling at twice the speed.

In particular, for a particle with position vector at time t given by r(t), the velocity vector is given by r′(t) and the speed is given by |r′(t)|. The velocity vector describes the rate of change of position (and so has a direction) while the speed is the rate of change of distance travelled (and so is only a scalar). The acceleration vector is given by r″(t).


Figure 7.1: Tangent line of a curve C at r(t₀).

Example 7.9. 1. Sketch the curve C defined by

r(t) = (cos(t), sin(t), t) for t ∈ R,

find the tangent vector r′(t) and write down the parametric equations of the tangent line ℓ to C at r(t₀). (This curve is called a helix.)

Solution. The curve C lies on the cylinder x² + y² = 1. The orthogonal projection of r(t) on the xy-plane is v(t) = (cos(t), sin(t)), while z = t is the height of the point r(t). The curve C is shown in Figure 7.2.

Figure 7.2: The curve defined by r(t) = (cos(t), sin(t), t).

We have

r′(t) = (−sin(t), cos(t), 1).

Given t₀, the tangent line ℓ to C at r(t₀) is the line through r(t₀) = (cos(t₀), sin(t₀), t₀) parallel to r′(t₀) = (−sin(t₀), cos(t₀), 1). Thus the parametric equations of ℓ are:

ℓ :  x = cos(t₀) − (t − t₀) sin(t₀),   y = sin(t₀) + (t − t₀) cos(t₀),   z = t,   t ∈ R.
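The same computation can be carried out symbolically. A sketch with sympy (an assumed tool), differentiating the helix componentwise as in Theorem 7.8 and forming v(t) = r(t₀) + (t − t₀)r′(t₀) at t₀ = π/2:

```python
import sympy as sp

t = sp.symbols('t')
r = sp.Matrix([sp.cos(t), sp.sin(t), t])   # the helix of Example 7.9

r_prime = r.diff(t)                        # componentwise derivative (Theorem 7.8)
print(r_prime.T)                           # Matrix([[-sin(t), cos(t), 1]])

t0 = sp.pi / 2
line = r.subs(t, t0) + (t - t0) * r_prime.subs(t, t0)   # v(t) = r(t0) + (t - t0) r'(t0)
print(line.T)                              # a parameterisation of the tangent line
```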

2. Let r(t) = (t, f(t)), where f(t) is a differentiable function. Then the curve defined by

x = t,   y = f(t)

coincides with the graph of the function f, and r′(t) = (1, f′(t)), which is a vector in the direction of the tangent to the graph.

 Just as with scalar functions of one variable, we need some rules

for finding the derivatives of combinations of vector functions. So

we have the following

Theorem 7.10. Let u(t) and v(t) be differentiable vector functions (both with values in Rⁿ) and let f(t) be a differentiable real-valued function. Then:

1. [u(t) + v(t)]′ = u′(t) + v′(t);

2. [cu(t)]′ = cu′(t) for any constant c ∈ R;

3. [f(t)u(t)]′ = f′(t)u(t) + f(t)u′(t);

4. [u(t) · v(t)]′ = u′(t) · v(t) + u(t) · v′(t);

5. (for vector functions in R³) [u(t) × v(t)]′ = u′(t) × v(t) + u(t) × v′(t);

6. (Chain Rule) [u(f(t))]′ = f′(t)u′(f(t)).

Proof. We will only prove parts (3) and (6), and only in the case of R². Let u(t) = (u₁(t), u₂(t)).

(3) Using the Product Rule for real-valued functions we have

[f(t)u(t)]′ = (f(t)u₁(t), f(t)u₂(t))′
   = ((f(t)u₁(t))′, (f(t)u₂(t))′)
   = (f′(t)u₁(t) + f(t)u₁′(t), f′(t)u₂(t) + f(t)u₂′(t))
   = f′(t)(u₁(t), u₂(t)) + f(t)(u₁′(t), u₂′(t))
   = f′(t)u(t) + f(t)u′(t).

(6) Using the Chain Rule for real-valued functions, we get

[u(f(t))]′ = (u₁(f(t)), u₂(f(t)))′
   = ([u₁(f(t))]′, [u₂(f(t))]′)
   = (f′(t)u₁′(f(t)), f′(t)u₂′(f(t))) = f′(t) u′(f(t)).

Example 7.11. Show that if the vector function r(t) is continuously differentiable for t in an interval I and |r(t)| = c, a constant, for all t ∈ I, then r(t) ⊥ r′(t) for all t ∈ I.

Question: What would the curve described by r(t) look like?

Solution. Note that |r(t)|² = r(t) · r(t). Then differentiating

r(t) · r(t) = c²,

by using Theorem 7.10(4), we get

r′(t) · r(t) + r(t) · r′(t) = 0.

That is, 2r(t) · r′(t) = 0. Hence r(t) ⊥ r′(t) for all t.

Question: So which well-known geometric fact have we just proved?


7.3   Partial Derivatives

For scalar-valued functions of one variable, the derivative gave us

the rate of change of the function as the single variable changed.

A function of several variables may change at different rates with respect to its different variables. To accommodate this we need to introduce the notion of a

 partial derivative.

Definition 7.12. (Partial Derivative)
Let f(x, y) be a function defined on a subset D of R², and let (a, b) be an interior point of D. For x near a, define g(x) = f(x, b) to be a function of x. We define f_x(a, b) = g′(a), if this exists.

We call f_x(a, b) the partial derivative of f with respect to x at (a, b). From the definition of the derivative for a function of a single variable we have:

f_x(a, b) = lim_{h→0} (f(a + h, b) − f(a, b))/h.

The partial derivative of f with respect to y at (a, b) is defined similarly:

f_y(a, b) = lim_{h→0} (f(a, b + h) − f(a, b))/h.

For f_x(a, b) we have fixed the value of y and let x vary, so we are back to dealing with a function of just one variable, a familiar situation. In general, for any value of x and y:

f_x(x, y) = lim_{h→0} (f(x + h, y) − f(x, y))/h

and

f_y(x, y) = lim_{h→0} (f(x, y + h) − f(x, y))/h,

whenever the limits exist. So f_x(x, y) gives us the rate of change of f with respect to x with y held constant, and f_y(x, y) the rate of change of f with respect to y with x constant.

What is the geometrical interpretation?  Imagine standing at a

point on the surface given by the graph of the function   f . The

value of   f x(x, y) at the point will be the slope of the tangent

line to the surface that is parallel to the  xz-plane, that is the

slope you see when looking in the direction of the  x-axis. This

is demonstrated in Figure 7.3.  This slope is likely to be different

from the slope of the tangent line to the surface that is parallel

to the yz-plane, which is given by   f  y(x, y). See Figure 7.4.

There are lots of different notations used to denote partial derivatives. Some are²:

f_x(x, y) = (∂/∂x) f(x, y) = (∂f/∂x)(x, y),
f_y(x, y) = (∂/∂y) f(x, y) = (∂f/∂y)(x, y).

² Note that we use ∂/∂x instead of d/dx to distinguish between partial differentiation and ordinary differentiation.

Figure 7.3: Tangent line parallel to the xz-plane.

Figure 7.4: Tangent lines parallel to the xz-plane and yz-plane.

To find the partial derivative of   f (x, y) with respect to x  we con-

sider y  as a constant and proceed as if   f  were a function of a single

variable. This means that we can use the familiar rules for differen-

tiation of functions of a single variable.

Example 7.13. Let f(x, y) = x²y + y sin(x). Then

f_x(x, y) = 2xy + y cos(x),
f_y(x, y) = x² + sin(x).
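Partial derivatives like these can be checked with a computer algebra system. In sympy (an assumed tool), diff(f, x) treats y as a constant, exactly as described above:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2*y + y*sp.sin(x)

print(sp.diff(f, x))   # 2*x*y + y*cos(x)  =  f_x
print(sp.diff(f, y))   # x**2 + sin(x)     =  f_y
```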

We can extend the notion of a partial derivative to define the

partial derivatives   f x,   f  y  and   f  z  of a function   f (x, y, z) of three vari-

ables, and more generally the partial derivatives   f xi of a function

 f (x1, x2, . . . , xn) of  n  variables. To calculate the partial derivative

with respect to the variable x_i, we consider all x_j’s with j ≠ i as

constants and proceed as though   f  were a function of a single vari-

able x i. Again we can use the familiar rules for differentiation of 

functions of a single variable.

Example 7.14. 1. Let f(x, y, z) = (x + y + z)/(x² + y² + z²). Then

f_x(x, y, z) = ((x² + y² + z²) − (x + y + z)(2x))/(x² + y² + z²)² = (y² + z² − x² − 2x(y + z))/(x² + y² + z²)².

Similarly, or just using the symmetry with respect to x, y and z, one gets

f_y(x, y, z) = (x² + z² − y² − 2y(x + z))/(x² + y² + z²)²,   f_z(x, y, z) = (x² + y² − z² − 2z(x + y))/(x² + y² + z²)².

2. f(x, y, z) = e^{xyz} + y sin(xz). Then

f_x(x, y, z) = yz e^{xyz} + yz cos(xz),   f_y(x, y, z) = xz e^{xyz} + sin(xz),   f_z(x, y, z) = xy e^{xyz} + xy cos(xz).

3. Consider the linear function f(x) = c · x defined for x ∈ Rⁿ, where c is a constant vector; that is,

f(x₁, x₂, …, xₙ) = c₁x₁ + c₂x₂ + … + cₙxₙ.

Then f_{xᵢ}(x) = cᵢ for each i and each x ∈ Rⁿ.

7.3.1   Higher Derivatives

With functions of one variable there is at most one second

derivative but as the number of variables increases the number of 

possible second order derivatives increases quite rapidly, as we

shall see in what follows.

Definition 7.15. (Second partial derivative)
Let D be an open disc in R² and let f : D −→ R. Suppose the partial derivative f_x(x, y) exists for all (x, y) ∈ D. This defines a function g = f_x : D −→ R. Suppose this new function g has a partial derivative with respect to x at some (a, b) ∈ D. We denote

f_xx(a, b) = g_x(a, b) = (∂/∂x) f_x(a, b)

and call it the second partial derivative of f with respect to x at (a, b). Similarly we define

f_xy(a, b) = g_y(a, b) = (∂/∂y) f_x(a, b).

Some alternative notation is:

f_xx = ∂/∂x (∂f/∂x) = ∂²f/∂x²,   f_xy = ∂/∂y (∂f/∂x) = ∂²f/∂y∂x.

We can similarly define f_yx and f_yy. Thus we have a total of four second-order partial derivatives, although for many of the functions with which we shall be concerned it turns out that the two mixed derivatives, f_yx and f_xy, are identical.


We can also define third order derivatives:

f_xxy = ∂/∂y (∂/∂x (∂f/∂x)),

and similarly one defines f_xxx, f_xyx, etc.

Example 7.16. Let f(x, y) = x²y + xy − y³. Then

f_x(x, y) = 2xy + y and f_y(x, y) = x² + x − 3y².

Also

f_xx(x, y) = ∂/∂x (2xy + y) = 2y,   f_xy(x, y) = ∂/∂y (2xy + y) = 2x + 1,
f_yx(x, y) = ∂/∂x (x² + x − 3y²) = 2x + 1,
f_yy(x, y) = ∂/∂y (x² + x − 3y²) = −6y.

Notice that f_xy(x, y) = f_yx(x, y) for this function. Also

f_xxy(x, y) = ∂/∂y (2y) = 2,   f_xyx(x, y) = ∂/∂x (2x + 1) = 2

and

f_yxx(x, y) = ∂/∂x (2x + 1) = 2.

Thus f_xxy(x, y) = f_xyx(x, y) = f_yxx(x, y).
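A quick symbolic confirmation of these equalities (a sketch with sympy; diff(f, x, y) differentiates with respect to x and then y):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2*y + x*y - y**3

assert sp.diff(f, x, y) == sp.diff(f, y, x)        # f_xy = f_yx = 2x + 1
assert sp.diff(f, x, x, y) == sp.diff(f, x, y, x) == sp.diff(f, y, x, x)  # all equal 2
print(sp.diff(f, x, y), sp.diff(f, x, x, y))       # 2*x + 1   2
```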

Example 7.16 is an illustration of the following theorem.

Theorem 7.17 (Clairaut’s Theorem). Let f(x, y) be defined on an open disc D of R². If the functions f_xy and f_yx are both defined and continuous on D, then f_xy = f_yx on D.

An analogue of Theorem 7.17 holds for functions of three or more variables. For example,

f_xyzx(x, y, z) = f_xxyz(x, y, z) = f_yzxx(x, y, z) = f_yxzx(x, y, z), etc.,

provided the partial derivatives are continuous.

7.4   Tangent Planes and differentiability

When dealing with functions of one variable we often want to find

the equation of a line tangent to the graph of the function at some

point. One reason for doing this is that sometimes it is convenient

to use the equation of the tangent line as an approximation to the

more complicated equation of the actual function. Indeed if   f (x) is

a function that is differentiable at  x  =  a then the tangent line to the

curve defined by y  =   f (x) at  x  =  a  has equation

 y =   f (a)(x − a) +   f (a)   (7.1)

The derivative of   f   at x   =   a is the scalar multiple of  x  given in

the equation of the line. This is demonstrated in Figure 7.5 for thefunction   f (x) =  x2 at the point  x  =  3.


Figure 7.5: The tangent line to y = x² at x = 3.

The function L(x) = f′(a)(x − a) + f(a) is called the best linear approximation³ to f at x = a.

³ Note that L : R → R is not necessarily a linear function in the sense of Chapter 4. Indeed, if f(a) ≠ 0 then L(λx) = f′(a)(λx − a) + f(a) ≠ λL(x). The function L(x) is what we would call an affine function.

Similar ideas occur in multi-variable calculus but the linear ap-

proximations are of higher dimensions than one-dimensional tan-

gent lines. For instance, for functions of two variables the analogue

of a tangent line is a tangent plane.

Consider a function   f   :  D −→ R where the domain  D  is a subset

of R2, and let (a, b) be an interior point of  D . Let S  be the surface

defined by

{(x, y, f (x, y)) | (x, y) ∈ D}.

Suppose the partial derivatives   f x(a, b) and   f  y(a, b) exist. Fix  y  =  b

and consider the following curve  C1  on S  defined by

r 1(x) = (x, b, f (x, b))

for all (x, b) such that   f (x, b) is defined. Then  C1  is the intersection

curve of the surface S with the plane y = b. The vector function r₁(x) is differentiable at a, and r₁′(a) = (1, 0, f_x(a, b)) is a tangent vector to C₁ at P = r₁(a) = (a, b, f(a, b)).

Similarly, the curve C₂ defined by

r₂(y) = (a, y, f(a, y))

for all (a, y) such that f(a, y) exists, is the intersection of S with the plane x = a and has a tangent vector r₂′(b) = (0, 1, f_y(a, b)) at r₂(b) = P.

In conclusion, the vectors r₁′(a) = (1, 0, f_x(a, b)) and r₂′(b) = (0, 1, f_y(a, b)) are tangent to the surface S at the point P = (a, b, f(a, b)). These define tangent lines ℓ₁ and ℓ₂ to S parallel to the corresponding coordinate planes. This was depicted in Figure 7.4.

Since ℓ₁ is the best linear approximation to C₁ near P and ℓ₂ is the best linear approximation to C₂ near P, if there is a plane that is the best linear approximation to S near P it must contain both ℓ₁ and ℓ₂. There is a unique plane Π containing the point P = (a, b, f(a, b)) and parallel to the vectors r₁′(a) and r₂′(b), and Π is called the tangent plane to S at P. See Figure 7.6.


Figure 7.6: Tangent plane

Any non-zero vector orthogonal to Π is called a normal vector to the surface S at the point P. Thus

n = r₁′(a) × r₂′(b) = (−f_x(a, b), −f_y(a, b), 1)

is a normal vector to S at P.⁴

⁴ Recall from the week 4 tutorial sheet that the cross product of two vectors is perpendicular to both the original two vectors.

Since n is perpendicular to Π, an equation for the tangent plane Π has the form

−f_x(a, b) x − f_y(a, b) y + z = d

for some constant d which can be determined by using the fact that P ∈ Π.

Example 7.18. Determine the equation of the tangent plane Π to the graph S of f(x, y) = x² + 3xy − y² at the point P = (1, −1, −3) and an

upward normal to this plane.

Solution:  Calculating the partial derivatives we have f x(x, y) = 2x + 3 y

and f  y(x, y) =  3x − 2 y. Thus f x(1, −1) = −1 and f  y(1, −1) =  5, and

therefore  n  = (1, −5, 1) is a normal vector to the plane Π.

Thus an equation for Π has the form x − 5 y + z   =  d. Since P   =

(1, −1, −3) ∈  Π it follows that 1 + 5 − 3  =  d, that is, d  =  3. Hence the

equation of the tangent plane is

x − 5 y + z =  3.
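The steps of Example 7.18 translate directly into code. A sketch with sympy (an assumed tool) that recovers the normal vector n = (1, −5, 1) and the constant d = 3:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 3*x*y - y**2
a, b = 1, -1                                      # P = (1, -1, f(1, -1)) = (1, -1, -3)

fx = sp.diff(f, x).subs({x: a, y: b})             # f_x(1, -1) = -1
fy = sp.diff(f, y).subs({x: a, y: b})             # f_y(1, -1) = 5
n = (-fx, -fy, 1)                                 # normal vector (1, -5, 1)

d = n[0]*a + n[1]*b + n[2]*f.subs({x: a, y: b})   # substitute P into x - 5y + z = d
print(n, d)                                       # (1, -5, 1) 3
```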

It is possible to write the equation of the tangent plane in a different manner. Since the point P is given by (a, b, f(a, b)), the equation of Π can also be written as

z − f(a, b) = f_x(a, b)(x − a) + f_y(a, b)(y − b)

or

z = ( f_x(a, b)   f_y(a, b) ) ( x − a, y − b )ᵀ + f(a, b)   (7.2)

This looks similar to the equation of the tangent line given in (7.1). The tangent plane to S at P may or may not be a good linear approximation to S near P. If it is, then we define our function f to be differentiable. We can make this definition more precise, but this is beyond the scope of the course: we would define f to be differentiable at the point c = (a, b) if there is a linear function A : R² → R such that

lim_{x→c} (f(x) − f(c) − A(x − c))/d(x, c) = 0.


Definition 7.19. (Derivative)
We say that f is differentiable at (a, b) if the tangent plane given by (7.2) is the best linear approximation to f near (a, b). The matrix

Df(a, b) = ( f_x(a, b)   f_y(a, b) )

is called the derivative of f at (a, b). We say that f is differentiable if f is differentiable at every point in its domain.

We should note that the existence of ∂f/∂x and ∂f/∂y at (a, b) does not guarantee that f is differentiable at (a, b), as demonstrated in the following example.

Example 7.20. Consider the function

f :  R²  −→  R
     (x, y)  −→  0 if xy ≠ 0, and 1 if xy = 0.

The graph of f is given in Figure 7.7. When x = 0 or y = 0, the value of f(x, y) is 1 and so f is constant in both the x-direction and the y-direction. Hence f_x(0, 0) = 0 and f_y(0, 0) = 0. However, the tangent plane

z = ( f_x(0, 0)   f_y(0, 0) ) ( x − 0, y − 0 )ᵀ + f(0, 0) = 1

is clearly not a good approximation for f near (0, 0), as f(a, b) = 0 for (a, b) arbitrarily close to (0, 0) with ab ≠ 0.

A simple test for a function to be differentiable is the following theorem.

Theorem 7.21. If ∂f/∂x and ∂f/∂y exist on an open disc centred at (a, b) and are continuous at (a, b), then f is differentiable at (a, b).

Example 7.22. Recall the function f(x, y) = x² + 3xy − y² from Example 7.18. Its partial derivatives are f_x(x, y) = 2x + 3y and f_y(x, y) = 3x − 2y, which exist and are continuous for all (x, y) ∈ R² as they are polynomials. Hence f is differentiable and its derivative is

Df(x, y) = ( 2x + 3y   3x − 2y ).

Theorem  7.23.   If f is differentiable at (a, b) then f is continuous at

(a, b).

7.5   The Jacobian matrix and the Chain Rule

We saw in the previous section that if a function   f   :   R2 →   R

is differentiable then the derivative of   f   at c   = (a, b) is the 1 × 2


Figure 7.7: The graph of the function f defined in Example 7.20.

matrix given by ( ∂f/∂x(c)   ∂f/∂y(c) ).

We also saw in Section 7.2 that if f(t) = (f₁(t), f₂(t)) is a differentiable vector function then its derivative is the vector function f′(t) = (df₁/dt(t), df₂/dt(t)). It is convenient to write both f(t) and f′(t) as column vectors⁵, that is

f(t) = ( f₁(t), f₂(t) )ᵀ   and   f′(t) = ( df₁/dt(t), df₂/dt(t) )ᵀ,

that is, as 2 × 1 matrices.

⁵ This is consistent with our convention that all vectors are actually column vectors.

What happens if we have a function  F   :   R2 →   R2, that is, a

function of the form

(u, v) =  F(x, y)

Then really we have two coordinate functions of x and y, and it is best to think of our function in terms of column vectors, that is

( u, v )ᵀ = F(x, y) = ( f₁(x, y), f₂(x, y) )ᵀ.

If our function F is differentiable⁶ then the derivative at c = (a, b) is the 2 × 2 matrix

DF(c) = ( ∂f₁/∂x(c)   ∂f₁/∂y(c) )
        ( ∂f₂/∂x(c)   ∂f₂/∂y(c) )   (7.3)

consisting of all the first order partial derivatives.

⁶ We haven’t said what it actually means for such a function to be differentiable, but essentially it is just that F can be approximated by an (affine) linear function.


Definition 7.24.   (Jacobian matrix)

The matrix DF(c) is called the Jacobian matrix of F at  c  and its determi-

nant is called the Jacobian.

In general, if  F   :  Rn →  Rm then  F  has  m  coordinate functions

and the Jacobian matrix of  F  is an m × n matrix of partial deriva-

tives.

Example 7.25. 1. Polar coordinates transformation: Let F(r, θ) = (r cos(θ), r sin(θ)) for r ≥ 0 and θ ∈ [0, 2π). That is, F(r, θ) = (f₁(r, θ), f₂(r, θ)) where f₁(r, θ) = r cos(θ) and f₂(r, θ) = r sin(θ). Now if

D = {(r, θ) | r ∈ [0, a], θ ∈ [0, 2π)} for some a > 0, then

range(F) = {(x, y) ∈ R² | x² + y² ≤ a²},

that is, the disc with centre at 0 and radius a.

Then for every c = (r, θ) we have

DF(c) = ( ∂f₁/∂r(c)   ∂f₁/∂θ(c) )   =   ( cos(θ)   −r sin(θ) )
        ( ∂f₂/∂r(c)   ∂f₂/∂θ(c) )       ( sin(θ)    r cos(θ) )

Notice that the Jacobian of F is equal to r.
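sympy's Matrix.jacobian method carries out exactly this computation (a sketch; the variable names are ours):

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
F = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])   # the polar coordinates map

J = F.jacobian([r, theta])     # the Jacobian matrix DF
print(J)                       # Matrix([[cos(theta), -r*sin(theta)], [sin(theta), r*cos(theta)]])
print(sp.simplify(J.det()))    # r, the Jacobian
```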

2. Consider F : R² −→ R² given by F(x, y) = (3x + 4y, −2x + y). Then (writing vectors as columns) we have

F(x, y) = ( 3x + 4y )   =   (  3   4 ) ( x )
          ( −2x + y )       ( −2   1 ) ( y )

so F is the linear transformation defined by the matrix

A = (  3   4 )
    ( −2   1 )

Now

DF(x, y) = ( ∂f₁/∂x   ∂f₁/∂y )   =   (  3   4 )   =   A
           ( ∂f₂/∂x   ∂f₂/∂y )       ( −2   1 )

This should not be surprising: the derivative of a function gives the best linear approximation to the function near a point. If the function is already linear then the best linear approximation is the function itself. In the case of functions of one variable the derivative of f(x) = ax is equal to a for all x ∈ R. For a linear function F(x) = Ax given by a matrix A, DF(x) = A for every x ∈ R² (Exercise).


7.5.1   The Chain Rule

If   f  is a scalar-valued function of a variable  x  and  x  is in turn a

scalar-valued function of  t, then recall that the Chain Rule allows us

to determine the derivative of   f  with respect to t  without having to

explicitly find   f  as a function of  t :

df/dt = (df/dx)(dx/dt).

This naturally extends to functions of several variables.

Suppose first that   f   : R2 → R is a function of the variables  x  and

 y and both  x  and  y  are functions of  t. Then   f  can be rewritten as a

function of  t  and the derivative with respect to  t  will be:

Chain Rule I:   df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)   (7.4)

Note that f, x and y are all functions of the single variable t, and so we use df/dt, dx/dt and dy/dt instead of the partial derivative notation.

Example 7.26. Let f(x, y) = x² + y² where x = t² and y = eᵗ. Then

dx/dt = 2t and dy/dt = eᵗ

while

∂f/∂x = 2x and ∂f/∂y = 2y.

Thus by the Chain Rule we have

df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
   = 2x(2t) + 2y(eᵗ)
   = 2(t²)2t + 2eᵗeᵗ     substituting for x and y
   = 4t³ + 2e²ᵗ.
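We can confirm Example 7.26 by substituting for x and y first and then differentiating directly with respect to t, which must agree with the Chain Rule answer. A sketch with sympy:

```python
import sympy as sp

t = sp.symbols('t')
x, y = t**2, sp.exp(t)            # the substitutions of Example 7.26
f = x**2 + y**2                   # f written directly as a function of t

print(sp.expand(sp.diff(f, t)))   # 4*t**3 + 2*exp(2*t), as found by the Chain Rule
```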

Alternatively, suppose that f : R² → R is a function of the variables x and y, and both x and y are functions of the variables s and t. Then f is a function of s and t and its partial derivatives will be given by:

Chain Rule II:
∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s)
∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t)   (7.5)

Example 7.27. Let f(x, y) = x² + y² with x = 2s + t and y = s² + 4. Then

∂f/∂x = 2x,   ∂f/∂y = 2y,
∂x/∂t = 1,   ∂x/∂s = 2,
∂y/∂t = 0,   ∂y/∂s = 2s.

Then by the Chain Rule we have

∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s)
   = (2x)(2) + (2y)(2s)
   = 2(2s + t)(2) + 2(s² + 4)(2s)
   = 4s³ + 24s + 4t

and

∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t)
   = (2x)(1) + (2y)(0)
   = 2(2s + t) = 4s + 2t.

Recall that the Chain Rule for scalar-valued functions of one variable also provides a way of determining the derivative of the composition of two functions. In this interpretation we have

d/dx f(g(x)) = f′(g(x))g′(x).

For example, if f(x) = sin(x) and g(x) = 2x² then f(g(x)) = sin(2x²)⁷ and

d/dx f(g(x)) = cos(2x²) · 4x.

⁷ Indeed doing such differentiation has probably become second nature without even realising that you are using the Chain Rule.

This interpretation naturally extends to vector functions of an arbitrary number of variables, in a manner that does not require more and more complicated expressions along the lines of our previous versions of the Chain Rule in (7.4) and (7.5).

Theorem 7.28 (Chain Rule). Let f : Rⁿ → Rᵐ and g : Rˢ → Rⁿ be differentiable functions and let F = f ∘ g : Rˢ → Rᵐ. Then F is differentiable and

DF(x) = Df(g(x)) Dg(x).

Note that Df(g(x)) is an m × n matrix and Dg(x) is an n × s matrix, so their product is an m × s matrix as required.

Is this the same as Chain Rules I and II?  Suppose that   f  is a func-

tion of  x  and  y, and both  x  and  y  are in turn functions of  t. De-

fine  g(t) = (x(t), y(t)), the change of variables function and let

F   =   f  ◦  g. Then  F  can be viewed as the function given by   f   when

written as a function of  t . Then

Dg(t) = ( dx/dt, dy/dt )ᵀ   and   Df(x, y) = ( ∂f/∂x   ∂f/∂y ).

Now Theorem 7.28 gives

DF(t) = ( ∂f/∂x   ∂f/∂y ) ( dx/dt, dy/dt )ᵀ = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt),

that is,

df/dt = dF/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt),

as we saw in Chain Rule I (7.4). Chain Rule II can be recovered in a similar manner.

Example 7.29. Let $g(x,y) = (x^2 + 3y,\; y^2 + 3)$ and $f(u,v) = (e^u - v,\; \ln(v))$. Find the Jacobian matrix and the Jacobian of $F(x,y) = f(g(x,y))$ at $\mathbf{c} = (0,1)$.
We have $g(\mathbf{c}) = (3,4)$ and
\[ Dg(x,y) = \begin{pmatrix} 2x & 3 \\ 0 & 2y \end{pmatrix}, \qquad Df(u,v) = \begin{pmatrix} e^u & -1 \\ 0 & 1/v \end{pmatrix}. \]
The Chain Rule implies
\[ DF(\mathbf{c}) = Df(g(\mathbf{c}))\, Dg(\mathbf{c}) = \begin{pmatrix} e^3 & -1 \\ 0 & 1/4 \end{pmatrix} \begin{pmatrix} 0 & 3 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 3e^3 - 2 \\ 0 & 1/2 \end{pmatrix} \]
Hence the Jacobian, $\det(DF(\mathbf{c}))$, is $0$.
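Computations like Example 7.29 can be double-checked symbolically. Here is a minimal Python sketch (assuming SymPy is available) that builds both Jacobian matrices and multiplies them:

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')
    g = sp.Matrix([x**2 + 3*y, y**2 + 3])        # g : R^2 -> R^2
    f = sp.Matrix([sp.exp(u) - v, sp.log(v)])    # f : R^2 -> R^2

    Dg = g.jacobian([x, y])                      # 2 x 2 Jacobian matrices
    Df = f.jacobian([u, v])

    c = {x: 0, y: 1}
    gc = g.subs(c)                               # g(c) = (3, 4)
    DF = Df.subs({u: gc[0], v: gc[1]}) * Dg.subs(c)
    print(DF)         # the matrix [[0, 3e^3 - 2], [0, 1/2]] found above
    print(DF.det())   # 0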

7.6   Directional derivatives and gradients

We saw in Section  7.3, that  ∂ f ∂x   gives the rate of change of   f   in the

direction of the  x-axis and  ∂ f ∂ y  gives the rate of change of   f   in the

direction of the  y-axis. However, we may also be interested in the

rate of change of   f   in some other direction. This is provided by the

directional derivative.

Definition 7.30. (Directional Derivative)
Let $f(x,y)$ be defined on some subset $D$ of $\mathbb{R}^2$ and let $\mathbf{c} = (a,b)$ be an interior point of $D$. Given a non-zero vector $\mathbf{v} \in \mathbb{R}^2$, define $\mathbf{u} = \frac{\mathbf{v}}{|\mathbf{v}|}$, that is, the unit vector in the direction of $\mathbf{v}$.
The directional derivative of $f$ at $\mathbf{c}$ in the direction $\mathbf{v}$ is defined by
\[ D_{\mathbf{v}} f(\mathbf{c}) = \lim_{h \to 0} \frac{f(\mathbf{c} + h\mathbf{u}) - f(\mathbf{c})}{h} \]
whenever the limit exists.

One can define directional derivatives for functions of three or

more variables in a similar fashion.

Geometrically,  Dv f (c) is the rate of change of   f   at c  in the di-

rection of  v. So if  v  is parallel to the  x-axis then the directional

derivative is just the partial derivative with respect to  x . Similarly,

if  v  is parallel to the y-axis then the directional derivative is the

partial derivative with respect to y.

To enable us to calculate directional derivatives we introduce the notion of the gradient vector.⁸

⁸ You will have noticed that the gradient vector $\nabla f$ looks very similar to the derivative $Df$ and may wonder why we have given it a different name. There is a subtle difference, and so many mathematicians usually write $\nabla f$ as a column vector to distinguish the two: the derivative $Df(\mathbf{x})$ is a matrix and so defines a linear function.

Definition 7.31. (Gradient vector)
Let $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^2$ and let $(x,y)$ be an interior point of $D$. The gradient vector of $f$ at $(x,y)$ is the vector
\[ \nabla f(x,y) = \operatorname{grad} f(x,y) = (f_x(x,y),\; f_y(x,y)) \]
if the partial derivatives $f_x(x,y)$ and $f_y(x,y)$ exist.
In a similar way one defines the gradient vector of a function of three or more variables. That is, if $f(x,y,z)$ is a function of three variables, its gradient vector at $(x,y,z)$ is defined by
\[ \nabla f(x,y,z) = (f_x(x,y,z),\; f_y(x,y,z),\; f_z(x,y,z)) \]
whenever the partial derivatives of $f$ at $(x,y,z)$ exist.

Example 7.32. 1. If $f(x,y) = x^2 y - \ln(x)$ for all $x > 0$ and $y \in \mathbb{R}$, then the gradient vector of $f$ is
\[ \nabla f(x,y) = \Big(2xy - \frac{1}{x},\; x^2\Big) \]
for each $(x,y)$ in the domain of $f$.
2. Let $f(x,y,z) = \mathbf{c} \cdot \mathbf{x} = ax + by + cz$, where $\mathbf{c} = (a,b,c) \in \mathbb{R}^3$. Then
\[ \nabla f(x,y,z) = (a,b,c) = \mathbf{c}. \]

The next theorem gives a method for calculating directional derivatives.
Theorem 7.33. If $f$ is a differentiable function at $\mathbf{c}$, then for every non-zero vector $\mathbf{v}$, the directional derivative $D_{\mathbf{v}} f(\mathbf{c})$ exists and
\[ D_{\mathbf{v}} f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{u} \]
where $\mathbf{u} = \frac{\mathbf{v}}{|\mathbf{v}|}$.

Example 7.34. Find the directional derivative of $f(x,y) = x e^y$ at $\mathbf{c} = (-3,0)$ in the direction of the vector $\mathbf{v} = (1,2)$.
Solution. Now $\nabla f(x,y) = (e^y,\; x e^y)$ and the partial derivatives $f_x$ and $f_y$ are continuous, so by Theorem 7.21, $f$ is differentiable. Now $\nabla f(\mathbf{c}) = (1,-3)$ and
\[ \mathbf{u} = \frac{\mathbf{v}}{|\mathbf{v}|} = \frac{(1,2)}{\sqrt{5}} = \Big(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\Big). \]
Hence
\[ D_{\mathbf{v}} f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{u} = (1,-3) \cdot \Big(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\Big) = -\frac{5}{\sqrt{5}} = -\sqrt{5}. \]
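The formula in Theorem 7.33 is easy to confirm numerically against the limit definition. A small Python sketch (NumPy assumed) for Example 7.34:

    import numpy as np

    f = lambda p: p[0] * np.exp(p[1])          # f(x, y) = x e^y
    grad_f = lambda p: np.array([np.exp(p[1]), p[0] * np.exp(p[1])])

    c = np.array([-3.0, 0.0])
    v = np.array([1.0, 2.0])
    u = v / np.linalg.norm(v)                  # unit vector in the direction of v

    via_gradient = grad_f(c) @ u               # Theorem 7.33: grad f(c) . u
    h = 1e-7
    via_limit = (f(c + h*u) - f(c)) / h        # limit definition with small h
    print(via_gradient, via_limit, -np.sqrt(5))  # all approximately -2.23606...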

7.6.1 Maximum rate of change
We would like to determine the direction in which $f$ has the maximum rate of change. As we saw in Theorem 7.33 the directional derivative of $f$ in the direction of $\mathbf{v}$ is
\[ D_{\mathbf{v}} f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{u} \]
where $\mathbf{u}$ is the unit vector in the direction of $\mathbf{v}$. Recall that we can also calculate the dot product of two vectors by
\[ \nabla f(\mathbf{c}) \cdot \mathbf{u} = |\nabla f(\mathbf{c})|\,|\mathbf{u}|\cos(\theta) \]
where $\theta$ is the angle between $\nabla f(\mathbf{c})$ and $\mathbf{u}$. Since $|\mathbf{u}| = 1$ and $-1 \le \cos(\theta) \le 1$, it follows that the maximum value for $D_{\mathbf{v}} f(\mathbf{c})$ will be $|\nabla f(\mathbf{c})|$ and will occur in the direction of $\nabla f(\mathbf{c})$.
Thus we have proved the following theorem.
Theorem 7.35. Let $f(\mathbf{x})$, for $\mathbf{x} \in D$, be a differentiable function of two or more variables and let $\mathbf{c}$ be an interior point of $D$ such that $\nabla f(\mathbf{c}) \ne \mathbf{0}$. Then
\[ \max_{|\mathbf{u}|=1} D_{\mathbf{u}} f(\mathbf{c}) = |\nabla f(\mathbf{c})| \]
and the maximum is achieved only for $\mathbf{u} = \frac{\nabla f(\mathbf{c})}{|\nabla f(\mathbf{c})|}$.

Theorem 7.35 says that the gradient vector at a given point points in the direction in which the function is increasing most rapidly.
The same argument shows that the function is decreasing most rapidly when the angle between $\mathbf{v}$ and $\nabla f(\mathbf{c})$ is $\pi$, that is, when $\mathbf{v}$ is in the direction of $-\nabla f(\mathbf{c})$.

Example 7.36. The temperature at each point of a metal plate is given by the function $T(x,y) = e^x \cos(y) + e^y \cos(x)$. In what direction does the temperature increase most rapidly at the point $(0,0)$? What is this rate of increase?
Solution. $\nabla T = (e^x \cos(y) - e^y \sin(x),\; -e^x \sin(y) + e^y \cos(x))$, so $\nabla T(0,0) = (1,1)$. This is the direction of fastest increase of $T$ at $(0,0)$. The rate of increase in the direction of $\nabla T(0,0)$ is $|\nabla T(0,0)| = \sqrt{2}$.

If you consider the contour curves on a map, the contour curves indicate the directions in which the altitude is constant, and the direction in which the ground is steepest is perpendicular to the contour curve. As we mentioned in Section 5.2.1, the contour lines on a map correspond to the level curves of a function of two variables. The following theorem tells us that the gradient vector is perpendicular to the level curves; since the gradient vector is the direction of the greatest rate of increase, this agrees with our experience.

Theorem 7.37. If $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^2$ is differentiable at $(a,b)$ and $\nabla f(a,b) \ne \mathbf{0}$, then $\nabla f(a,b)$ is perpendicular to the tangent line to the level curve of $f$ at $(a,b)$.
Proof. Let $C$ be the level curve $f(x,y) = k$ passing through $\mathbf{c} = (a,b)$ and let $\mathbf{r}(t)$ be a continuously differentiable parametrisation of $C$ near $\mathbf{c}$ with $\mathbf{r}(t_0) = \mathbf{c}$ for some $t_0$. Since $\mathbf{r}(t) \in C$ for all $t$, we have $f(\mathbf{r}(t)) = k$ for all $t$. Differentiating this equality with respect to $t$ and using the Chain Rule gives
\[ 0 = Df(\mathbf{r}(t))\, D\mathbf{r}(t) = \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \]
For $t = t_0$ this gives $\nabla f(\mathbf{c}) \perp \mathbf{r}'(t_0)$. Since $\mathbf{r}'(t_0)$ is the direction of the tangent line of $C$ at $\mathbf{c}$, it follows that $\nabla f(\mathbf{c})$ is orthogonal to $C$ at $\mathbf{c}$.

Theorem 7.37 is illustrated in Figure 7.8.

[Figure 7.8: The gradient vector related to the level curves. The figure shows the surface $z = f(x,y)$, the level curve through $(a,b)$, and the tangent line, normal line and gradient vector $\nabla f(a,b)$ at $(a,b)$.]

Similarly, let $f(\mathbf{x}) : D \to \mathbb{R}$, where $D \subseteq \mathbb{R}^n$, be a differentiable function of three or more variables. Consider the level surface
\[ S : \; f(\mathbf{x}) = k \]
and assume that $f(\mathbf{c}) = k$ and $\nabla f(\mathbf{c}) \ne \mathbf{0}$. Then $\nabla f(\mathbf{c})$ is a normal vector to $S$ at $\mathbf{c}$, and the $(n-1)$-dimensional hyperplane containing $\mathbf{c}$ and orthogonal to $\nabla f(\mathbf{c})$ is the tangent plane to $S$ at $\mathbf{c}$.

Example 7.38. 1. Find a normal vector to the surface
\[ S : \; z - x e^y \cos(z) = -1 \]
at $\mathbf{c} = (1,0,0)$ and an equation for the tangent plane to $S$ at $\mathbf{c}$.
Solution. $S$ is a level surface of $f(x,y,z) = z - x e^y \cos z$. We have
\[ \nabla f(x,y,z) = (-e^y \cos z,\; -x e^y \cos z,\; 1 + x e^y \sin z) \]
so $\nabla f(\mathbf{c}) = (-1,-1,1)$ is a normal vector to $S$ at $\mathbf{c}$. Since the tangent plane is perpendicular to $\nabla f(\mathbf{c})$, it has an equation $-x - y + z = d$ for some constant $d$. Since $\mathbf{c}$ lies in the tangent plane, we get $-1 = d$, so an equation of the plane is $x + y - z = 1$.

2. Let $S$ be the graph of $f(x,y) : D \to \mathbb{R}$. Notice that $S$ is a level surface of the function $g(x,y,z) = z - f(x,y)$, so assuming $f$ is differentiable at $(a,b) \in D$, the vector $\nabla g(\mathbf{c}) = (-f_x(a,b),\; -f_y(a,b),\; 1)$ is a normal vector to $S$ at $\mathbf{c} = (a,b,f(a,b))$. This agrees with the definition of a normal vector given in Section 7.4.


8

 Maxima and Minima

In this chapter we look at maxima and minima of functions of several variables. First we revise the single variable case, which should be familiar from school, although our treatment may be different.

8.1   Functions of a single variable

Definition 8.1.  (Maxima and minima)

Let D ⊆   R and let f   :   D →   R. Given c ∈  D, we say that f has an

absolute maximum at c if f (x) ≤   f (c) for all x ∈  D. Similarly we say

that f has an absolute minimum at c if f (x) ≥   f (c) for all x ∈ D.

We say that $f$ has a local maximum at $c$ if there exists an open interval $I$ centred at $c$ such that $f(x) \le f(c)$ for all $x \in I \cap D$. We define a local minimum similarly.

 A maximum or minimum of f is also called an  extremum of f .

So a local maximum (or minimum) occurs at a point where the

value of the function is at least as great (or at least as small) as at

any point in the vicinity but which may be exceeded at some other

point in the domain, whereas an absolute maximum (minimum)

is the largest (least) value taken by the function anywhere in its

domain.

Example 8.2. The function $f$ defined on the interval $[-1,3]$ whose graph is given in Figure 8.1 has an absolute maximum at $x = 3$ and an absolute minimum at $x = -1$, as well as a local maximum at $x = 2/3$ and a local minimum at $x = 2$.

Theorem 8.3 (The Extreme Value Theorem). Let $f$ be a continuous scalar-valued function on a finite and closed interval $[a,b]$. Then $f$ is bounded and has an absolute maximum and an absolute minimum in $[a,b]$; that is, there exist $c_1, c_2 \in [a,b]$ such that $f(c_1) \le f(x) \le f(c_2)$ for all $x \in [a,b]$.
Remark 8.4. 1. The points $c_1, c_2$ do not have to be unique.¹

¹ They can be the same point if $f$ is a constant function.


[Figure 8.1: Examples of maxima and minima.]

2. The statement of Theorem 8.3 is not true (in general) for other types of intervals. For example, the function $f(x) = 1/x$ defined on $x \in [1,\infty)$ is a continuous function that has no absolute minimum: although the range of the function is bounded below by $0$, it never actually achieves this value.

Definition 8.5. (Critical points)
Let $f$ be a scalar-valued function defined on the subset $D$ of $\mathbb{R}$. For $c \in D$, we say that $c$ is a critical point of $f$ if
1. $c$ is an interior point of $D$, and
2. either $f'(c)$ does not exist or $f'(c) = 0$.

Theorem 8.6 (Fermat's Theorem). If the function $f$ has a local maximum or minimum at some interior point $c$ of its domain, then $c$ is a critical point of $f$.

So, we have the following relations for interior points:
\[ \{\text{points of absolute extrema}\} \subseteq \{\text{points of local extrema}\} \subseteq \{\text{critical points}\} \cup \{\text{boundary points}\} \]
In general, these sets are not equal. In particular, not all critical points give rise to local maxima or minima. For example, if $f(x) = x^3$ then $f'(x) = 3x^2$, so $x = 0$ is a critical point, yet it is neither a local maximum nor a local minimum.

The above suggests the following process.
Key Concept 8.7. Procedure for finding the absolute extrema of a continuous function $f$ over a finite closed interval.
Step 1. Find the critical points of $f$ and the values of $f$ at these.
Step 2. Find the values of $f$ at the ends of the interval.
Step 3. Compare the values from Steps 1 and 2; the largest of them gives the absolute maximum, while the smallest gives the absolute minimum.
The existence of the absolute maximum and minimum follows from the Extreme Value Theorem.

Example 8.8. Find the absolute maximum and the absolute minimum of the function $f(x) = \sqrt{x}\,(5 - x^2)$ on the interval $[0,4]$.
Step 1. For $x > 0$ we have
\[ f'(x) = \frac{1}{2\sqrt{x}}(5 - x^2) + \sqrt{x}\,(-2x) = \frac{(5 - x^2) - 4x^2}{2\sqrt{x}} = \frac{5(1 - x^2)}{2\sqrt{x}}. \]
So, for $x \in (0,4)$ we have $f'(x) = 0$ only when $x = 1$; that is, $x = 1$ is the only critical point of $f$. We have $f(1) = 4$.
Step 2. $f(0) = 0$ and $f(4) = -22$.
Step 3. Comparing $f(1) = 4$, $f(0) = 0$ and $f(4) = -22$, we conclude that $f(1) = 4$ is the absolute maximum of the function, while $f(4) = -22$ is its absolute minimum.
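The closed-interval procedure of Key Concept 8.7 translates directly into a few lines of code. A minimal Python sketch for Example 8.8 (NumPy assumed; the critical point $x = 1$ is taken from the working above):

    import numpy as np

    # Closed-interval method for f(x) = sqrt(x) * (5 - x^2) on [0, 4].
    f = lambda x: np.sqrt(x) * (5 - x**2)

    candidates = [0.0, 1.0, 4.0]        # endpoints plus the critical point x = 1
    values = {x: f(x) for x in candidates}
    print(values)                        # {0.0: 0.0, 1.0: 4.0, 4.0: -22.0}
    print('max at', max(values, key=values.get),
          'min at', min(values, key=values.get))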

8.2   Functions of several variables

The ideas of Section  8.1 easily extend to functions of two variables.

Definition 8.9. (Maxima and minima)
Let $D \subseteq \mathbb{R}^2$ and let $f : D \to \mathbb{R}$. Given $\mathbf{c} \in D$, we say that $f$ has an absolute maximum at $\mathbf{c}$ if $f(\mathbf{x}) \le f(\mathbf{c})$ for all $\mathbf{x} \in D$. Similarly we say that $f$ has an absolute minimum at $\mathbf{c}$ if $f(\mathbf{x}) \ge f(\mathbf{c})$ for all $\mathbf{x} \in D$.
We say that $f$ has a local maximum at $\mathbf{c}$ if there exists an open ball $B$ centred at $\mathbf{c}$ such that $f(\mathbf{x}) \le f(\mathbf{c})$ for all $\mathbf{x} \in B \cap D$. We define a local minimum similarly.
A maximum or minimum of $f$ is also called an extremum of $f$.

Example 8.10. The function $f(x,y) = 2x^2 + y^2$ has an absolute minimum at $(0,0)$ because $f(x,y) \ge 0 = f(0,0)$ for all $(x,y)$.

We have the natural analogue for critical points for functions of 

two variables.

Definition 8.11. (Critical points)
Let $f$ be a scalar-valued function defined on a subset $D$ of $\mathbb{R}^2$. For $\mathbf{c} \in D$ we say that $\mathbf{c}$ is a critical point of $f$ if
1. $\mathbf{c}$ is an interior point of $D$, and
2. $\nabla f(\mathbf{c})$ does not exist or $\nabla f(\mathbf{c}) = \mathbf{0}$.

We have the following analogue of Theorem 8.6.
Theorem 8.12. Let $D \subseteq \mathbb{R}^2$ and let $f : D \to \mathbb{R}$. If $\mathbf{c}$ is an interior point of $D$ and $f$ has a local maximum or minimum at $\mathbf{c}$, then $\mathbf{c}$ is a critical point of $f$.

Again note that not every critical point is a local maximum or minimum. For example, if $f(x,y) = y^2 - x^2$, we have $\nabla f(0,0) = \mathbf{0}$, but $(0,0)$ is neither a local maximum nor a local minimum. As can be seen in Figure 8.2, the graph of the function near $(0,0)$ looks like a saddle: it is increasing in some directions and decreasing in others. This leads to the following definition.

[Figure 8.2: The surface defined by $f(x,y) = y^2 - x^2$.]

Definition 8.13.   (Saddle point)

Let f   :  D −→  R, with D ⊆  R2. A critical point of f is called a  saddle

point of f if f has neither a local minimum nor a local maximum at the

 point.

The Extreme Value Theorem also has an appropriate analogue for functions of two variables, but first we need some definitions.

Definition 8.14.  (Closed, open and bounded subsets)

The boundary ∂D of a subset D of R2 is the set of all points  c ∈ R2 such

that every open disc centred at c  contains points of D and points not in D.

• A subset D of R2 is called closed if all its boundary points are in the

subset, that is, if  ∂D ⊆ D.

• A subset D of R2 is called open if none of its boundary points are in

the subset, that is, if  ∂D ∩ D = ∅.


• A set D is called bounded if it is contained in a disc of finite radius.

The boundary of the set
\[ D = \{(x,y) \mid -2 \le x \le 3,\; 0 \le y \le 10\} \]
is
\[ \{(x,y) \in D \mid x \in \{-2, 3\} \text{ or } y \in \{0, 10\}\}. \]
Thus $D$ is closed and it is also bounded. The set $\{(x,y) \mid -2 \le x \le 3,\; y > 0\}$ is unbounded.

Theorem 8.15 (The Extreme Value Theorem). Let $D$ be a non-empty closed and bounded subset of $\mathbb{R}^2$ and let $f : D \to \mathbb{R}$ be a continuous function. Then $f$ is bounded and there exist $\mathbf{a} \in D$ and $\mathbf{b} \in D$ such that $f(\mathbf{a}) \le f(\mathbf{x}) \le f(\mathbf{b})$ for all $\mathbf{x} \in D$.

Again we have a strategy for finding the absolute maxima and minima of a function.
Key Concept 8.16. Procedure for finding the absolute extrema of a continuous function $f$ over a closed and bounded set $D$.
Step 1. Find the critical points of $f$ and the values of $f$ at these.
Step 2. Find the maximum and the minimum of the restriction of $f$ to the boundary $\partial D$.
Step 3. Compare the values from Steps 1 and 2; the largest of them gives the absolute maximum, while the smallest gives the absolute minimum.
The existence of the absolute maximum and minimum follows from the Extreme Value Theorem.

Example 8.17. Find the maximum and minimum of the function $f(x,y) = 2x^2 + x + y^2 - 2$ on $D = \{(x,y) \mid x^2 + y^2 \le 4\}$.
Solution:
Step 1. To find the critical points of $f$, solve the system
\begin{align*}
f_x(x,y) &= 4x + 1 = 0 \\
f_y(x,y) &= 2y = 0
\end{align*}
There are no points in $D$ where $\nabla f(x,y)$ fails to exist, so the only critical point is $P = (-1/4, 0)$, which is in the interior of $D$. We have $f(P) = \frac{1}{8} - \frac{1}{4} - 2 = -\frac{17}{8}$.
Step 2. The boundary $\partial D$ of $D$ is the circle $x^2 + y^2 = 4$. On this set of points we have $y^2 = 4 - x^2$, so
\[ f(x,y) = 2x^2 + x + (4 - x^2) - 2 = x^2 + x + 2 \]
on $\partial D$. Notice that on $\partial D$ we have $x \in [-2,2]$. So, we have to find the extreme values of $g(x) = x^2 + x + 2$ on the interval $[-2,2]$.
From $g'(x) = 2x + 1$, the only critical point of $g(x)$ on $[-2,2]$ is $x = -1/2$. Since $g(-1/2) = \frac{7}{4}$ and on the end points of the interval we have $g(-2) = 4$ and $g(2) = 8$, it follows that the absolute maximum of $g(x)$ is $g(2) = 8$ and its absolute minimum is $g(-1/2) = \frac{7}{4}$. These are the maximum and the minimum values of $f$ on the circle $\partial D$.
Step 3. Comparing the maximum and minimum values of $f$ on $\partial D$ with the value of $f$ at the only critical point $P$, we see that the absolute minimum of $f$ is $-\frac{17}{8} = f(-1/4, 0)$ and its absolute maximum is $8 = g(2) = f(2,0)$.
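A brute-force numerical check of a result like Example 8.17 is straightforward: sample the function over the disc and take the extremes. A minimal Python sketch (NumPy assumed):

    import numpy as np

    # Sample f(x, y) = 2x^2 + x + y^2 - 2 over the disc x^2 + y^2 <= 4.
    f = lambda x, y: 2*x**2 + x + y**2 - 2

    xs = np.linspace(-2, 2, 2001)
    X, Y = np.meshgrid(xs, xs)
    inside = X**2 + Y**2 <= 4
    vals = np.where(inside, f(X, Y), np.nan)   # ignore points outside the disc
    print(np.nanmin(vals), np.nanmax(vals))    # approx -2.125 (= -17/8) and 8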

If   f   is a function of three or more variables defined in some re-

gion D  of Rn, one can define local and absolute extrema, critical

points and saddle points in the same way. The analogues of Theo-

rems 8.12 and  8.15 remain true.

8.3   Identifying local maxima and minima

If $f(x)$ is a real-valued function of a single variable, one way to identify whether a critical point $c$ is a local maximum or minimum is to examine the value of the derivative around $c$. For example, if on an interval $I$ centred at $c$ the function $f(x)$ is increasing for $x < c$ (that is, if $f'(x) > 0$) and decreasing for $x > c$ (that is, $f'(x) < 0$), then $f$ has a local maximum at $x = c$. We can identify a local minimum in a similar way.
Another way is to identify the concavity of the function.

Definition 8.18. (Inflection point)
A function $f$ is called concave up on an interval $I$ if $f'(x)$ is increasing on $I$. Similarly, $f$ is called concave down if $f'(x)$ is decreasing on $I$.
We say that $c \in I$ is an inflection point for $f$ if $c$ is an interior point of $I$ and the type of concavity of $f$ changes at $c$.
Remark 8.19. Of course, one way to identify whether $f'(x)$ is increasing or decreasing is to examine the second derivative $f''(x)$. That is, if $f''(x)$ exists and $f''(x) > 0$ for all $x \in I$ then $f$ is concave up on $I$. Similarly, if $f''(x) < 0$ for all $x \in I$ then $f$ is concave down on $I$.
Moreover, if $c$ is a point of inflection then $f''(c) = 0$.

Example 8.20. For $g(x) = x^3$, whose graph is given in Figure 8.3, we have $g''(x) = 6x$, so $g''(x) < 0$ for $x < 0$ and $g''(x) > 0$ for $x > 0$. Thus, $g$ is concave down on $(-\infty, 0)$ and concave up on $(0, \infty)$. The point $x = 0$ is an inflection point. There are no other inflection points.

[Figure 8.3: The graph of the function $f(x) = x^3$.]

This provides the Second Derivative Test.
Theorem 8.21 (The Second Derivative Test). Let $f(x)$ be a continuous function defined on an interval $I$ with continuous second derivative and let $c \in I$ be a critical point of $f$.
1. If $f''(c) > 0$ then $f$ has a local minimum at $c$.
2. If $f''(c) < 0$ then $f$ has a local maximum at $c$.
If $f''(c) = 0$, then $c$ may or may not be an inflection point.

Example 8.22. The function $f(x) = x^4$ for $x \in \mathbb{R}$, whose graph is given in Figure 8.4, is concave up on the whole of $\mathbb{R}$, so $x = 0$ is not an inflection point for $f$ (although $f''(0) = 0$).

[Figure 8.4: The graph of the function $f(x) = x^4$.]

Next consider a scalar-valued function $f(x,y)$ of two variables defined on a subset $D$ of $\mathbb{R}^2$. There are now four second order partial derivatives $f_{xx}$, $f_{xy}$, $f_{yx}$ and $f_{yy}$ to consider. We already saw in Theorem 7.17 that if $f_{xy}$ and $f_{yx}$ are defined and continuous on $D$ then they are equal. For $\mathbf{c} \in D$, we define the matrix
\[ \begin{pmatrix} f_{xx}(\mathbf{c}) & f_{xy}(\mathbf{c}) \\ f_{yx}(\mathbf{c}) & f_{yy}(\mathbf{c}) \end{pmatrix} \]
This matrix is sometimes called the Hessian matrix and denoted by $D^2 f(\mathbf{c})$.


The Hessian matrix defines a quadratic form
\[ Q_{\mathbf{c}}(x,y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} f_{xx}(\mathbf{c}) & f_{xy}(\mathbf{c}) \\ f_{yx}(\mathbf{c}) & f_{yy}(\mathbf{c}) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = f_{xx}(\mathbf{c})x^2 + (f_{xy}(\mathbf{c}) + f_{yx}(\mathbf{c}))xy + f_{yy}(\mathbf{c})y^2 \]
The quadratic form $Q_{\mathbf{c}}$ is called positive definite if $Q_{\mathbf{c}}(x,y) > 0$ for all $(x,y) \ne (0,0)$ and called negative definite if $Q_{\mathbf{c}}(x,y) < 0$ for all $(x,y) \ne (0,0)$.
We will assume from now on that $f_{xy}$ and $f_{yx}$ are continuous, so that $D^2 f(\mathbf{c})$ is a symmetric matrix. Since $D^2 f(\mathbf{c})$ is symmetric, the quadratic form $Q_{\mathbf{c}}$ will be positive definite if and only if the determinant $D_{\mathbf{c}}$ of its associated Hessian matrix is strictly positive and $f_{xx}(\mathbf{c}) > 0$. The form $Q_{\mathbf{c}}$ will be negative definite if and only if $D_{\mathbf{c}} > 0$ and $f_{xx}(\mathbf{c}) < 0$. When $D_{\mathbf{c}} < 0$, the quadratic form takes both positive and negative values. These three statements are perhaps best justified by looking at the eigenvalues of the Hessian matrix, but eigenvalues won't be seen until later in the course.

Theorem 8.23 (Second Derivatives Test). Suppose $f$ has continuous second partial derivatives in an open set $U$ in $\mathbb{R}^2$, $\mathbf{c} \in U$ and $\nabla f(\mathbf{c}) = \mathbf{0}$ (that is, $\mathbf{c}$ is a critical point of $f$).
1. If $Q_{\mathbf{c}}(x,y)$ is positive definite (that is, $D_{\mathbf{c}} > 0$ and $f_{xx}(\mathbf{c}) > 0$), then $f$ has a local minimum at $\mathbf{c}$.
2. If $Q_{\mathbf{c}}(x,y)$ is negative definite (that is, $D_{\mathbf{c}} > 0$ and $f_{xx}(\mathbf{c}) < 0$), then $f$ has a local maximum at $\mathbf{c}$.
3. If $Q_{\mathbf{c}}(x,y)$ assumes both positive and negative values (that is, $D_{\mathbf{c}} < 0$), then $\mathbf{c}$ is a saddle point of $f$.

Remark 8.24. 1. When $D_{\mathbf{c}} = 0$, the Second Derivatives Test gives no information.
2. Notice that the three cases in Theorem 8.23 do not cover all possible cases for the quadratic form $Q_{\mathbf{c}}(x,y)$. For example, it may happen that $Q_{\mathbf{c}}(x,y) \ge 0$ for all $x, y$ (we then say that $Q$ is positive semi-definite).

Example 8.25. Find all critical points of the function $f(x,y) = x^3 - 3xy + y^3$ and determine their type (if possible).
Solution: There are no points in the domain where $\nabla f(x,y)$ does not exist, so the only critical points of $f$ are solutions of the system
\begin{align*}
f_x(x,y) &= 3x^2 - 3y = 0, \\
f_y(x,y) &= -3x + 3y^2 = 0
\end{align*}
From the first equation $y = x^2$. Substituting this into the second equation we get $-x + x^4 = 0$. Thus, $x = 0$ or $x = 1$. Consequently, the critical points of $f$ are $P_1 = (0,0)$ and $P_2 = (1,1)$.
Next, $f_{xx} = 6x$, $f_{xy} = -3$ and $f_{yy} = 6y$. At $P_1$ the Hessian matrix is
\[ \begin{pmatrix} 0 & -3 \\ -3 & 0 \end{pmatrix} \]
which has determinant $-9$, and so $P_1$ is a saddle point. At $P_2$ the Hessian matrix is
\[ \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix} \]
which has determinant $27$. Since $f_{xx}(1,1) = 6 > 0$, the quadratic form is positive definite and so $f$ has a local minimum at $P_2 = (1,1)$.
The surface defined by the graph of $f$ is given in Figure 8.5.

[Figure 8.5: The surface defined by $f(x,y) = x^3 - 3xy + y^3$.]
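The classification in Example 8.25 can be automated with a computer algebra system. A minimal Python sketch (SymPy assumed) that finds the critical points and applies the Second Derivatives Test:

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = x**3 - 3*x*y + y**3

    crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
    H = sp.hessian(f, (x, y))                 # matrix of second partial derivatives

    for p in crit:
        if not (p[x].is_real and p[y].is_real):
            continue                          # skip any complex solutions
        D = H.det().subs(p)                   # determinant D_c
        fxx = H[0, 0].subs(p)
        if D > 0 and fxx > 0:
            kind = 'local minimum'
        elif D > 0 and fxx < 0:
            kind = 'local maximum'
        elif D < 0:
            kind = 'saddle point'
        else:
            kind = 'test inconclusive'
        print((p[x], p[y]), kind)             # (0,0) saddle, (1,1) local minimum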


9

Taylor Polynomials

We saw in Chapter 7 that we could use the derivative of a function to get a good linear approximation of the function. Sometimes an even better approximation is required. In particular, it is often convenient to be able to approximate a function by a polynomial: it is easy to calculate its value and easy to differentiate and integrate.

9.1 Taylor polynomials for functions of one variable
Definition 9.1. (Taylor polynomial)
Let $f(x)$ be a real-valued function of one variable defined on some interval $I$ and having continuous derivatives $f'(x), f''(x), \ldots, f^{(n)}(x)$ on $I$ for some integer $n \ge 1$. Let $a$ be an interior point of $I$.
The $n$th Taylor polynomial of $f$ about $a$ is defined by
\[ T_{n,a}(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \ldots + \frac{f^{(n)}(a)}{n!}(x-a)^n \]
This is always a polynomial of degree at most $n$ (however the degree might be less than $n$, since it may happen that $f^{(n)}(a) = 0$). Note that $T_{1,a}(x)$ is the equation of the tangent line to the curve $y = f(x)$ when $x = a$.

Remark 9.2. 1. Clearly $T_{n,a}(a) = f(a)$. Next,
\[ T'_{n,a}(x) = f'(a) + \frac{f''(a)}{1!}(x-a) + \frac{f'''(a)}{2!}(x-a)^2 + \ldots + \frac{f^{(n)}(a)}{(n-1)!}(x-a)^{n-1} \]
so $T'_{n,a}(a) = f'(a)$.
Continuing in this way, one sees that
\[ T^{(k)}_{n,a}(a) = f^{(k)}(a) \quad \text{for all } k = 0, 1, 2, \ldots, n. \]
2. Using properties of polynomials one can actually prove that if $Q(x)$ is any polynomial of degree at most $n$ such that $Q^{(k)}(a) = f^{(k)}(a)$ for all $k = 0, 1, 2, \ldots, n$, then $Q(x) = T_{n,a}(x)$.


Example 9.3. 1. Let $f(x) = e^x$ and $a = 0$. Since $f'(x) = e^x = f^{(k)}(x)$ for all $k \ge 0$ we have $f^{(k)}(0) = 1$ for all $k \ge 0$, and so
\[ T_{n,0}(x) = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \ldots + \frac{x^n}{n!} \]
for any positive integer $n$.
2. Let $f(x) = \sin(x)$. Notice that
\[ f'(x) = \cos(x), \quad f''(x) = -\sin(x), \quad f'''(x) = -\cos(x), \quad f^{(4)}(x) = \sin(x) \]
and the sequence of the derivatives is periodic with period 4.
Consider $a = \pi/3$ and $n = 3$. We have
\[ f(\pi/3) = \frac{\sqrt{3}}{2}, \quad f'(\pi/3) = \frac{1}{2}, \quad f''(\pi/3) = -\frac{\sqrt{3}}{2}, \quad f'''(\pi/3) = -\frac{1}{2} \]
therefore
\[ T_{3,\pi/3}(x) = \frac{\sqrt{3}}{2} + \frac{1}{2}(x - \pi/3) - \frac{\sqrt{3}}{4}(x - \pi/3)^2 - \frac{1}{12}(x - \pi/3)^3 \]
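A quick way to experiment with Taylor polynomials is to build them directly from Definition 9.1. A minimal Python sketch (SymPy assumed; the helper name taylor_poly is ours):

    import sympy as sp

    x = sp.symbols('x')

    def taylor_poly(f, a, n):
        """n-th Taylor polynomial of f about a (Definition 9.1)."""
        return sum(sp.diff(f, x, k).subs(x, a) / sp.factorial(k) * (x - a)**k
                   for k in range(n + 1))

    print(sp.expand(taylor_poly(sp.exp(x), 0, 4)))     # 1 + x + x**2/2 + x**3/6 + x**4/24
    T3 = taylor_poly(sp.sin(x), sp.pi/3, 3)            # Example 9.3(2)
    print(sp.simplify(T3.subs(x, 1.0) - sp.sin(1.0)))  # tiny error near x = a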

Taylor polynomials are used to approximate the original function $f$. That is, given $f$, $n$ and $a$ as before, we consider the approximation
\[ f(x) \approx T_{n,a}(x) \quad \text{for } x \in I \tag{9.1} \]
This can be written as
\[ f(x) = T_{n,a}(x) + R_{n,a}(x), \tag{9.2} \]
where $R_{n,a}(x)$ is the error term (the so-called remainder). Naturally, the approximation (9.1) will not be of much use if we do not know how small the error is. The following theorem gives some information about the remainder $R_{n,a}(x)$ which can be used to get practical estimates of the size of the error.

Theorem 9.4 (Taylor's Formula). Assume that $f$ has continuous derivatives up to order $n+1$ on some interval $I$ and $a$ is an interior point of $I$. Then for any $x \in I$ there exists $z$ between $x$ and $a$ such that
\[ R_{n,a}(x) = \frac{f^{(n+1)}(z)}{(n+1)!}(x-a)^{n+1} \]
That is, for any $x \in I$,
\[ f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \ldots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(z)}{(n+1)!}(x-a)^{n+1} \]
where $z$ depends on $x$.
Despite the fact that we don't know what $z$ is, knowing an interval in which it lies allows us to bound the error, as we see in the next example.


Example 9.5. Find $T_{6,0}(x)$ for $f(x) = \sin(x)$ and estimate the maximal possible error in the approximation
\[ \sin(x) \approx T_{6,0}(x) \tag{9.3} \]
for $x \in [-0.3, 0.3]$. Use this approximation to find $\sin(12°)$ correct to 6 decimal places.
Solution: Recall the derivatives of $\sin(x)$:
\[ f'(x) = \cos(x), \quad f''(x) = -\sin(x), \quad f'''(x) = -\cos(x), \quad f^{(4)}(x) = \sin(x), \]
\[ f^{(5)}(x) = \cos(x), \quad f^{(6)}(x) = -\sin(x), \quad f^{(7)}(x) = -\cos(x) \]
Thus,
\[ T_{6,0}(x) = 0 + \frac{1}{1!}x + 0 - \frac{1}{3!}x^3 + 0 + \frac{1}{5!}x^5 = x - \frac{x^3}{3!} + \frac{x^5}{5!} \]
To estimate the error, recall that
\[ \sin(x) = T_{6,0}(x) + R_{6,0}(x), \]
where by Taylor's formula, for any $x$ the error term has the form
\[ R_{6,0}(x) = \frac{f^{(7)}(z)}{7!}x^7 \]
for some $z$ between $0$ and $x$. For $x \in [-0.3, 0.3]$ we have $|x| \le 0.3$, so
\[ |R_{6,0}(x)| = \frac{|\cos(z)|}{5040}|x|^7 \le \frac{1}{5040}|0.3|^7 = 0.4339285 \times 10^{-7} < 4.4 \times 10^{-8}. \]
Hence the maximum error in the approximation (9.3) for $x \in [-0.3, 0.3]$ is at most $4.4 \times 10^{-8}$.
Since $12°$ is $\pi/15$ in radians and $\pi/15 < 0.3$, the previous argument shows that
\[ \sin(12°) = \sin(\pi/15) \approx \pi/15 - \frac{(\pi/15)^3}{3!} + \frac{(\pi/15)^5}{5!} = 0.20791169\ldots \]
is correct to at least 6 decimals.
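As a numerical confirmation of the error bound in Example 9.5, a small Python sketch (NumPy assumed):

    import numpy as np

    # Compare sin(x) with T_{6,0}(x) = x - x^3/3! + x^5/5! on [-0.3, 0.3].
    T6 = lambda x: x - x**3/6 + x**5/120

    xs = np.linspace(-0.3, 0.3, 10001)
    worst = np.max(np.abs(np.sin(xs) - T6(xs)))
    print(worst)                              # about 4.3e-08, below the bound 4.4e-08
    print(T6(np.pi/15), np.sin(np.pi/15))     # both 0.2079116... as claimed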

9.2 Taylor polynomials for functions of two variables
We can also define Taylor polynomials for functions of two variables. We will only state the second degree approximation.
Theorem 9.6 (Taylor's Theorem). Let $D$ be an open disc in $\mathbb{R}^2$, let $f : D \to \mathbb{R}$, and let $\mathbf{c} = (a,b) \in D$. If $f$ has continuous and bounded partial derivatives up to third order in $D$, then there exists a constant $C > 0$ such that for each $(x,y) \in D$ we have
\begin{align*}
f(x,y) = f(a,b) &+ \begin{pmatrix} f_x(a,b) & f_y(a,b) \end{pmatrix} \begin{pmatrix} x-a \\ y-b \end{pmatrix} \\
&+ \frac{1}{2} \begin{pmatrix} x-a & y-b \end{pmatrix} \begin{pmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{yx}(a,b) & f_{yy}(a,b) \end{pmatrix} \begin{pmatrix} x-a \\ y-b \end{pmatrix} + R_2(a,b;x,y)
\end{align*}
where $|R_2(a,b;x,y)| \le C\,|(x-a, y-b)|^3 = C\,[(x-a)^2 + (y-b)^2]^{3/2}$.


Note that the first two terms in the approximation for $f(x,y)$ give the equation of the tangent plane to the surface $z = f(x,y)$ at the point $(a, b, f(a,b))$.
Remark 9.7. If $(a,b)$ is a critical point of $f$ then $f_x(a,b) = f_y(a,b) = 0$ and so the second degree Taylor polynomial is simply
\[ f(a,b) + \frac{1}{2} \begin{pmatrix} x-a & y-b \end{pmatrix} \begin{pmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{yx}(a,b) & f_{yy}(a,b) \end{pmatrix} \begin{pmatrix} x-a \\ y-b \end{pmatrix} \]
Note that the quadratic term involves the Hessian matrix for $f$ and the quadratic form used in the Second Derivative Test in Theorem 8.23. Clearly, if the quadratic form only takes positive values near $(a,b)$ then $f$ will have a local minimum at $(a,b)$, and if the quadratic form only takes negative values near $(a,b)$ then $f$ will have a local maximum at $(a,b)$. This justifies Theorem 8.23.

Example 9.8. Let $f(x,y) = \sqrt{1 + x^2 + y^2}$ and let $\mathbf{c} = (0,0)$. Then $f_x(x,y) = x/\sqrt{1 + x^2 + y^2}$, so $f_x(0,0) = 0$, and similarly $f_y(0,0) = 0$. Thus, the first degree approximation of $f$ about $\mathbf{c} = (0,0)$ is
\[ f(x,y) \approx f(0,0) + \begin{pmatrix} f_x(0,0) & f_y(0,0) \end{pmatrix} \begin{pmatrix} x-0 \\ y-0 \end{pmatrix} = 1 \]
This amounts to approximating the function by the equation of its tangent plane.
Next,
\[ f_{xx}(x,y) = \frac{1}{\sqrt{1 + x^2 + y^2}} - \frac{x^2}{(1 + x^2 + y^2)^{3/2}} \]
so $f_{xx}(0,0) = 1$. Similarly, $f_{xy}(0,0) = 0$ and $f_{yy}(0,0) = 1$, so the second degree approximation of $f$ about $\mathbf{c} = (0,0)$ is
\[ f(x,y) \approx 1 + \frac{1}{2} \begin{pmatrix} x-0 & y-0 \end{pmatrix} \begin{pmatrix} f_{xx}(0,0) & f_{xy}(0,0) \\ f_{yx}(0,0) & f_{yy}(0,0) \end{pmatrix} \begin{pmatrix} x-0 \\ y-0 \end{pmatrix} = 1 + \frac{1}{2}(x^2 + y^2) \]
Now we are approximating the function by a second-order polynomial, so by the equation of a curved, rather than flat, surface.


Part III

Differential equations and eigenstructure: by Des Hill


10

Differential Equations

10.1 Introduction
In a vast number of situations a mathematical model of a system or process will result in an equation (or set of equations) involving not only functions of the dependent variables but also derivatives of some or all of those functions with respect to one or more of the variables. Such equations are called differential equations.
The simplest situation is that of a single function of a single independent variable, in which case the equation is referred to as an ordinary differential equation. A situation in which there is more than one independent variable will involve a function of those variables, and an equation involving partial derivatives of that function is called a partial differential equation.

Notationally, it is easy to tell the difference. For example, the equation
\[ \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = f^2 \tag{10.1} \]
is a partial differential equation to be solved for $f(x,y)$, whereas
\[ \frac{d^2 f}{dx^2} + 3\frac{df}{dx} + 2f = x^4 \tag{10.2} \]
is an ordinary differential equation to be solved for $f(x)$.

The order of a differential equation is the degree of the highest derivative that occurs in it. Partial differential equation (10.1) is first order and ordinary differential equation (10.2) is second order. For partial differential equations the degree of a mixed derivative is the total number of derivatives taken. For example, the following partial differential equation for $f(x,t)$ has order five:
\[ \frac{\partial^5 f}{\partial x^3 \partial t^2} + \frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} = 0 \tag{10.3} \]

An important class of differential equations are those referred to as linear. Roughly speaking, linear differential equations are those in which neither the function nor its derivatives occur in products, powers or nonlinear functions. Differential equations that are not linear are referred to as nonlinear. Equation (10.1) is nonlinear, whereas equations (10.2) and (10.3) are both linear.


Example 10.1. Classify the following differential equations with respect to (i) their nature (ordinary or partial), (ii) their order and (iii) linear or nonlinear:
\[ (a)\;\; \frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} = 1 \qquad\qquad (b)\;\; \frac{\partial^2 g}{\partial t^2} + g = \sin(t) \]
\[ (c)\;\; \frac{d^3 y}{dx^3} + 8y = x \sin x \qquad\qquad (d)\;\; \frac{\partial u}{\partial x}\frac{\partial u}{\partial t} = x + t \]
\[ (e)\;\; P^2 \frac{d^2 P}{dx^2} = x^5 + 1 \qquad\qquad (f)\;\; \frac{\partial^4 F}{\partial x \partial y^3} = t^2 F \]
Solution:
(i) Equations (a), (b), (d) and (f) involve partial derivatives and are hence partial differential equations, whereas equations (c) and (e) involve ordinary derivatives and are hence ordinary differential equations.
(ii) Recall that the order of a differential equation is the degree of the highest derivative that occurs in it. The orders of the differential equations are as follows:
(a) first, (b) second, (c) third, (d) first, (e) second, (f) fourth.
(iii) Recall that linear differential equations are those in which neither the function nor its derivatives occur in products, powers or nonlinear functions. It doesn't matter how the independent variables appear. We observe that equations (a), (b), (c) and (f) are linear whereas equations (d) and (e) are nonlinear.

10.1.1 Solutions of differential equations
When asked to solve an algebraic equation, for example $x^2 - 3x + 2 = 0$, we expect the answers to be numbers. The situation as regards differential equations is much more difficult because we are being asked to find functions that will satisfy the given equation. For example, in Example 10.1(a) we are asked for a function $f(x,y)$ that will satisfy the partial differential equation $\frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} = 1$, and in Example 10.1(c) we are asked to find a function $y(x)$ that will satisfy $\frac{d^3 y}{dx^3} + 8y = x \sin x$.
Unlike algebraic equations, which only have a discrete set of solutions (for example $x^2 - 3x + 2 = 0$ only has the solutions $x = 1$ or $2$), differential equations can have whole families of solutions. For example, $y = Ae^{3t}$ satisfies the ordinary differential equation $\frac{dy}{dt} = 3y$ for any value of $A$.
If a differential equation is linear then there is a well-established procedure for finding solutions and we shall cover this in detail for ordinary differential equations. If an ordinary differential equation is nonlinear but is of first order then we may be able to find solutions.
The theory of partial differential equations is outside the scope of this unit but we shall touch upon it at the end of this section.


10.1.2 Verification of solutions of differential equations
To get a feel for things (and to practise our algebra) we'll have a quick look at the relatively simple procedure of verifying solutions of differential equations by way of a few examples.

Example 10.2. Verify that
\[ y(x) = Ae^{2x} + Be^{-2x} - 2\cos(x) - 5x\sin(x) \tag{10.4} \]
is a solution of the ordinary differential equation
\[ \frac{d^2 y}{dx^2} - 4y = 25x\sin(x) \tag{10.5} \]
for any value of the constants $A$ and $B$.
Solution: We need to calculate $\frac{d^2 y}{dx^2}$. In order to do this we need the product rule to differentiate $x \sin x$. It gives
\[ \frac{d}{dx}(x\sin(x)) = \sin(x) + x\cos(x) \]
and
\[ \frac{d^2}{dx^2}(x\sin x) = 2\cos(x) - x\sin(x) \]
Hence
\[ \frac{d^2 y}{dx^2} = 4Ae^{2x} + 4Be^{-2x} - 8\cos(x) + 5x\sin(x) \]
and substitution of this and the expression (10.4) into equation (10.5) quickly yields the required verification.

Example 10.3. Verify that both
\[ (i)\;\; f(x,y) = xy - \frac{y^2}{2} \qquad\text{and}\qquad (ii)\;\; f(x,y) = \sin(y-x) + \frac{x^2}{2} \]
are solutions of the partial differential equation
\[ \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = x \]
Solution: In each case we need to calculate $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$. We have
\[ (i)\;\; \frac{\partial f}{\partial x} = y \quad\text{and}\quad \frac{\partial f}{\partial y} = x - y. \quad\text{Thus}\quad \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = y + x - y = x \]
\[ (ii)\;\; \frac{\partial f}{\partial x} = -\cos(y-x) + x \quad\text{and}\quad \frac{\partial f}{\partial y} = \cos(y-x). \]
Hence
\[ \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = -\cos(y-x) + x + \cos(y-x) = x \]
In both cases we have verified the solution of the partial differential equation.


10.2 Mathematical modelling with ordinary differential equations
For real-world systems changing continuously in time we can use derivatives to model the rates of change of quantities. Our mathematical models are thus differential equations.

Example 10.4. Modelling population growth.
The simplest model of population growth is to assume that the rate of change of population is proportional to the population at that time. Let $P(t)$ represent the population at time $t$. Then the mathematical model is
\[ \frac{dP}{dt} = rP \quad\text{for some constant } r > 0 \]
It can be shown (using a method called separation of variables, which we shall learn shortly) that the function $P(t)$ that satisfies this differential equation is
\[ P(t) = P_0 e^{rt} \quad\text{where } P_0 \text{ is the population at } t = 0 \]
This model is clearly inadequate in that it predicts that the population will increase without bound if $r > 0$. A more realistic model is the logistic growth model
\[ \frac{dP}{dt} = rP(C - P) \quad\text{where } r > 0 \text{ and } C > 0 \text{ are constants} \]
The method of separation of variables can be used to show that the solution of this differential equation is
\[ P(t) = \frac{C P_0}{P_0 + (C - P_0)e^{-rCt}} \quad\text{where } P_0 = P(0) \]
This model predicts that as time goes on, the population will tend towards the constant value $C$, called the carrying capacity.

Example 10.5. Newton's law of cooling.
The rate at which heat is lost from an object is proportional to the difference between the temperature of the object and the ambient temperature. Let $H(t)$ be the temperature of the object (in °C) at time $t$ and suppose the fixed ambient temperature is $A$°C. Newton's law of cooling says that
\[ \frac{dH}{dt} = \alpha(A - H) \quad\text{for some constant } \alpha > 0 \]
The method of separation of variables can be used to show that the solution of this differential equation is
\[ H(t) = A + (H_0 - A)e^{-\alpha t} \quad\text{where } H_0 = H(0) \]
This model predicts that as time goes on, the temperature of the object will approach that of its surroundings, which agrees with our intuition.

Example 10.6. One tank mixing process.
Suppose we have a tank of salt water and we allow fresh water into the tank at a rate of $F$ m³/sec, and allow salt water out at the same rate. Note this is a volume rate, and that the volume $V$ of the tank is maintained constant. We assume instantaneous mixing so that the tank has a uniform concentration.

[Figure 10.1: A tank, with fresh water flowing in and salt water flowing out.]

Let $x(t)$ represent the salt concentration of the water (kg/m³) in the tank at time $t$ and $a(t)$ represent the amount of salt (kg). We have $x(t) = a(t)/V$. The tank starts with an amount of salt $a_0$ kg.
The rate at which salt is being removed from the tank at time $t$ is given by
\[ \frac{da}{dt} = -x(t) \times \text{flow rate} = -Fx(t) = -\frac{F}{V}a(t) = -\alpha a(t) \]
where $\alpha = F/V$ is a positive constant. This equation has the solution $a(t) = a_0 e^{-\alpha t}$. This converges to zero as $t \to \infty$.
Consider the same tank which is now filled with fresh water. Water polluted with $q$ kg/m³ of some chemical enters the tank at a rate of $F$ m³/sec, and polluted water exits the tank at the same rate. We again assume instantaneous mixing so that the tank has a uniform concentration.
Let $x(t)$ represent the concentration of pollutant (kg/m³) in the water in the tank at time $t$ and $a(t)$ represent the amount of pollutant (kg). We have $x(t) = a(t)/V$. The rate at which pollutant is being added to the tank at time $t$ is given by
\[ \frac{da}{dt} = \text{amount of pollutant added per sec} - \text{amount of pollutant removed per sec} \]
That is,
\[ \frac{da}{dt} = qF - Fx(t) = qF - \frac{F}{V}a(t) \]
Alternatively, we can obtain a differential equation for the concentration $x(t)$ by dividing the above equation through by $V$ to give
\[ \frac{dx}{dt} = \frac{F}{V}(q - x) \quad\Rightarrow\quad \frac{dx}{dt} = \alpha(q - x) \]
where $\alpha = F/V$ is a positive constant. Notice that this is essentially the same as the differential equation that we obtained for Newton's law of cooling.

10.3 First order ordinary differential equations
Most first order ordinary differential equations can be expressed (by algebraic re-arrangement if necessary) in the form
\[ \frac{dx}{dt} = f(t,x) \tag{10.6} \]
where the function $f(t,x)$ is known, and we are asked to find the solution $x(t)$.

10.3.1 Direction fields
Expression (10.6) means that for any point in the $tx$-plane (for which $f$ is defined) we can evaluate the gradient $\frac{dx}{dt}$ and represent this graphically by means of a small arrow. If we do this for a whole grid of points in the $tx$-plane and place all of the arrows on the same plot we produce what is called a direction field. Figure 10.2 displays the direction field in the case where $f(t,x) = x^2 - t^2$.

[Figure 10.2: The direction field of $\frac{dx}{dt} = x^2 - t^2$.]

A solution of equation (10.6) is a function relating $x$ and $t$, which geometrically is a curve in the $tx$-plane. Since this solution satisfies the differential equation, the curve is such that its gradient is the same as the direction field vector at any point on the curve.
That is, the direction field is a collection of arrows that are tangential to the solution curves. This observation enables us to roughly sketch solution curves without actually solving the differential equation, as long as we have a device to plot the direction field. We can indeed sketch many such curves (called a family of solution curves) superimposed on the same direction field.
Example 10.7. The direction field of $\frac{dx}{dt} = x^2 - t^2$ along with three solution curves is given in Figure 10.3. The top curve is the solution that goes through $(t,x) = (0,1)$. The middle curve is the solution that goes through $(t,x) = (0,0)$ and the bottom curve is the solution that goes through $(t,x) = (0,-2)$.
Example 10.8. The direction field of $\frac{dx}{dt} = 3x + e^t$ along with three solution curves is given in Figure 10.4. The top curve is the solution that goes through $(t,x) = (0,1)$, the middle curve is the solution that goes through $(t,x) = (0,0)$ and the bottom curve is the solution that goes through $(t,x) = (0,-1)$.

[Figure 10.3: Three solution curves for $\frac{dx}{dt} = x^2 - t^2$.]
[Figure 10.4: Three solution curves for $\frac{dx}{dt} = 3x + e^t$.]
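For readers who want to reproduce plots like Figures 10.2-10.4, the following is a minimal Python sketch (NumPy and Matplotlib assumed) that draws the direction field of $\frac{dx}{dt} = x^2 - t^2$:

    import numpy as np
    import matplotlib.pyplot as plt

    # Direction field of dx/dt = f(t, x) = x^2 - t^2 on a grid of (t, x) points.
    f = lambda t, x: x**2 - t**2

    T, X = np.meshgrid(np.linspace(-3, 3, 25), np.linspace(-3, 3, 25))
    S = f(T, X)                      # slope at each grid point
    N = np.sqrt(1 + S**2)            # normalise so all arrows have equal length
    plt.quiver(T, X, 1/N, S/N, angles='xy')
    plt.xlabel('t'); plt.ylabel('x')
    plt.title('Direction field of dx/dt = x^2 - t^2')
    plt.show()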


10.3.2 Separation of variables
If it is the case that a first order differential equation can be manipulated into the following form, called separable form:
\[ \frac{dx}{dt} = \frac{h(t)}{g(x)} \quad\text{for some } g(x) \ne 0 \tag{10.7} \]
(and the equation is called a separable differential equation) then there is a simple way of determining its solution. Informally, we treat $\frac{dx}{dt}$ as a fraction and re-arrange equation (10.7) into the form
\[ g(x)\,dx = h(t)\,dt \]
We now (symbolically) integrate both sides to get
\[ \int g(x)\,dx = \int h(t)\,dt \]
If we can integrate $g(x)$ and $h(t)$ with respect to their respective arguments then we will have an expression that we can (in principle) re-arrange into a solution of the form $x(t)$.
This informal approach can be made rigorous by appeal to the Chain Rule. We write equation (10.7) as
\[ g(x)\frac{dx}{dt} = h(t) \]
and integrate both sides with respect to $t$ to get
\[ \int g(x)\frac{dx}{dt}\,dt = \int h(t)\,dt \quad\text{which implies}\quad \int g(x)\,dx = \int h(t)\,dt \]

Example 10.9. Solve the first order differential equation $\frac{dx}{dt} = x^2 \sin(t)$.
Solution: The differential equation is separable. The solution is given by
\[ \int \frac{dx}{x^2} = \int \sin(t)\,dt \quad\text{which implies}\quad -\frac{1}{x} = -\cos(t) + C \tag{10.8} \]
where $C$ is a constant of integration. We can re-arrange equation (10.8) to get
\[ x(t) = \frac{1}{\cos(t) - C} \tag{10.9} \]
Note that we could have added integration constants, say $C_1$ and $C_2$, to each side of equation (10.8) but these can easily be combined into one integration constant: $C = C_2 - C_1$.
Note also that equation (10.8) does not hold if $x = 0$. In such situations we have to investigate the original differential equation $\frac{dx}{dt} = x^2 \sin(t)$. In this case it turns out that $x(t) = 0$ is in fact a solution, but not of the form (10.9). Special situations like this are something that we should be aware of.
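Separable equations like the one in Example 10.9 can also be checked with a computer algebra system. A minimal Python sketch (SymPy assumed); the exact printed form of the constant may differ from (10.9):

    import sympy as sp

    t = sp.symbols('t')
    x = sp.Function('x')

    ode = sp.Eq(x(t).diff(t), x(t)**2 * sp.sin(t))   # dx/dt = x^2 sin(t)
    print(sp.dsolve(ode))
    # prints a solution equivalent to x(t) = 1/(cos(t) - C), as in (10.9),
    # up to the name and sign of the integration constant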

10.3.3 The integrating factor method
If a first order ordinary differential equation is linear then it can be solved by the following process. The most general form for a first

order linear ordinary differential equation is
\[ \frac{dx}{dt} + p(t)x = r(t) \tag{10.10} \]
The trick is to multiply this equation throughout by an as yet undetermined function $g(t)$ and then appeal to the product rule of differentiation to guide us in the correct choice of $g(t)$. Multiplying equation (10.10) throughout by $g(t)$ gives
\[ g(t)\frac{dx}{dt} + g(t)p(t)x = g(t)r(t) \tag{10.11} \]
We would like to be able to write the left-hand side of this equation in the form:
\[ \frac{d}{dt}[g(t)x] \]
Applying the product rule to this expression and comparing it to the left-hand side of equation (10.11) we have
\[ g(t)\frac{dx}{dt} + \frac{dg}{dt}x = g(t)\frac{dx}{dt} + g(t)p(t)x \]
which implies that
\[ \frac{dg}{dt} = g(t)p(t) \tag{10.12} \]
This equation is separable. The solution is
\[ \int \frac{dg}{g(t)} = \int p(t)\,dt \]
which implies
\[ \ln g(t) = \int p(t)\,dt \tag{10.13} \]
We can set the constant of integration to zero; any function that satisfies equation (10.12) will do for our purposes. Exponentiating both sides of equation (10.13) gives
\[ g(t) = e^{\int p(t)\,dt} \]
and $g(t)$ is referred to as an integrating factor. With $g(t)$ having this special form, equation (10.11) is equivalent to
\[ \frac{d}{dt}[g(t)x(t)] = g(t)r(t) \]
Integrating this equation with respect to $t$ gives
\[ g(t)x(t) = \int g(t)r(t)\,dt + C \]
and hence the solution of the ordinary differential equation is
\[ x(t) = \frac{1}{g(t)}\left[\int g(t)r(t)\,dt + C\right] \quad\text{where}\quad g(t) = e^{\int p(t)\,dt} \tag{10.14} \]
Note that we must include the integration constant $C$ because we want the most general solution possible.


Example 10.10. Solve the first order linear differential equation
\[ \frac{dx}{dt} - 3x = e^t \tag{10.15} \]
As a guide to the shape of the solutions, the direction field for this differential equation along with a number of solution curves appears in Figure 10.4.
Solution: The integrating factor is
\[ g(t) = e^{\int p(t)\,dt} = e^{\int -3\,dt} = e^{-3t} \]
Multiplying the differential equation through by $g(t)$ gives
\[ e^{-3t}\frac{dx}{dt} - 3e^{-3t}x = e^{-2t} \]
We know from our recent work that the left-hand side of this can be rewritten in product form and so we obtain:
\[ \frac{d}{dt}\left[e^{-3t}x\right] = e^{-2t} \]
Now integrate both sides with respect to $t$:
\[ e^{-3t}x = \int e^{-2t}\,dt = -\frac{1}{2}e^{-2t} + C \]
and tidy up:
\[ x(t) = -\frac{e^t}{2} + Ce^{3t} \tag{10.16} \]
Note that we could have written down the solution immediately by appealing to equation (10.14), but when learning the method it is instructive to follow through each step in the process in order to gain a better understanding of how it works.

Example 10.11. Solve the first order linear differential equation
\[ \frac{dx}{dt} + \frac{2}{t}x = \frac{\sin(t)}{t^2} \]
Solution: The integrating factor is
\[ g(t) = e^{\int p(t)\,dt} = e^{\int \frac{2}{t}\,dt} = e^{2\ln(t)} = e^{\ln(t^2)} = t^2 \]
Multiplying the differential equation through by this gives
\[ t^2\frac{dx}{dt} + 2tx = \sin(t) \]
We know from our recent work that the left-hand side of this can be rewritten in product form:
\[ \frac{d}{dt}\left[t^2 x\right] = \sin(t) \]
Now integrate with respect to $t$:
\[ t^2 x = \int \sin(t)\,dt = -\cos(t) + C \]
and tidy up:
\[ x(t) = \frac{-\cos(t) + C}{t^2} \]
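A solution like this is easy to verify by substituting it back into the differential equation. A minimal Python sketch (SymPy assumed):

    import sympy as sp

    t, C = sp.symbols('t C')
    x = (-sp.cos(t) + C) / t**2          # the solution found in Example 10.11

    # Substitute into dx/dt + (2/t) x - sin(t)/t^2 and simplify the residual.
    residual = x.diff(t) + (2/t)*x - sp.sin(t)/t**2
    print(sp.simplify(residual))         # 0, so the equation is satisfied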


10.4 Initial conditions
The values of constants of integration that arise when we solve differential equations can be determined by making use of other conditions (or restrictions) placed on the problem.
Example 10.12. Solve $\frac{dx}{dt} - 3x = e^t$ subject to $x(0) = 1$, that is, $x = 1$ when $t = 0$.
Solution: We have already seen this differential equation. It is equation (10.15) and we have determined that its (most general) solution is given by (10.16):
\[ x = -\frac{e^t}{2} + Ce^{3t} \]
All we have to do is substitute $x = 1$ and $t = 0$ and solve the resulting algebraic equation for $C$. We have
\[ 1 = -\frac{e^0}{2} + Ce^0 \quad\Rightarrow\quad 1 = -\frac{1}{2} + C \quad\Rightarrow\quad C = \frac{3}{2} \]
so the required solution is
\[ x(t) = \frac{3e^{3t} - e^t}{2} \tag{10.17} \]
In applications, the variable $t$ is usually used to represent time. The extra condition (for example $x(0) = 1$) is called an initial value and the combined differential equation plus initial condition is called an initial value problem. The solution curve of (10.17) appears in Figure 10.4.
Occasionally there are situations when the 'initial condition' is actually specified at a nonzero time value.

Example 10.13. Solve the initial value problem
\[ \frac{dx}{dt} = \frac{t^2}{x} \quad\text{subject to}\quad x(1) = 4 \]
Solution: We observe that the differential equation is separable. The solution is:
\[ \int x\,dx = \int t^2\,dt \quad\Rightarrow\quad \frac{x^2}{2} = \frac{t^3}{3} + C \]
which implies
\[ x = \pm\sqrt{\frac{2t^3}{3} + 2C} \]
Notice that we have two different solutions to the differential equation, one positive and one negative. The initial condition $x(1) = 4$ allows us to eliminate the negative solution, so we are left with
\[ x = \sqrt{\frac{2t^3}{3} + 2C} \]
and substituting into this $x = 4$ and $t = 1$ gives
\[ 16 = \frac{2}{3} + 2C \quad\Rightarrow\quad C = \frac{23}{3} \quad\Rightarrow\quad x = \sqrt{\frac{2t^3 + 46}{3}} \]


10.5 Linear constant-coefficient ordinary differential equations
The general theory of higher order (that is, greater than one) differential equations is beyond the scope of this unit, and in any case many applications lead to differential equations that have an especially simple structure, and we shall concentrate on these.
Recall that in a linear differential equation neither the function nor its derivatives occur in products, powers or nonlinear functions. We shall focus on second order linear differential equations. Extension to higher orders is straightforward, and first order ones can be handled by the methods already discussed.
The most general second order linear ordinary differential equation is
\[ a(t)\frac{d^2x}{dt^2} + b(t)\frac{dx}{dt} + c(t)x = f(t) \]
but we shall simplify this even further by assuming that the coefficient functions on the left-hand side are constants (that is, not functions of $t$). In this case we have the general second order linear constant-coefficient ordinary differential equation
\[ a\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = f(t), \qquad a \ne 0 \tag{10.18} \]
If $f(t) = 0$ then the equation is referred to as being homogeneous, otherwise it is called nonhomogeneous. We shall first of all concentrate on the homogeneous case, that is, we wish to solve
\[ a\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0, \qquad a \ne 0 \tag{10.19} \]
where $a$, $b$ and $c$ are given constants.
In seeking a solution technique we consider the first order equation
\[ b\frac{dx}{dt} + cx = 0, \qquad b \ne 0 \]
We can solve this by using either the separation approach or the integrating factor method. Using separation we have
\[ \frac{dx}{dt} = -\frac{cx}{b} \quad\Rightarrow\quad \int \frac{dx}{x} = \int -\frac{c}{b}\,dt \quad\Rightarrow\quad \ln(x) = -\frac{c}{b}t + C \quad\Rightarrow\quad x = Ae^{mt} \]
where $m = -c/b$, $C$ is the constant of integration and $A = e^C$.
By analogy we attempt to find a solution to the second order equation (10.19) in the form $x = e^{mt}$. Substituting $x = e^{mt}$ into equation (10.19) yields
\[ am^2 e^{mt} + bme^{mt} + ce^{mt} = 0 \quad\Rightarrow\quad (am^2 + bm + c)e^{mt} = 0 \]
and since $e^{mt}$ can never be zero we can divide through by it to arrive at a quadratic equation for $m$:
\[ am^2 + bm + c = 0 \]

This equation is usually referred to as the characteristic equation (or auxiliary equation) of the differential equation. Label the roots of this equation $m_1$ and $m_2$. Then both $e^{m_1 t}$ and $e^{m_2 t}$ are solutions of the differential equation.
Homogeneous linear differential equations have the linearity property, that is, if both $x_1(t)$ and $x_2(t)$ are solutions then so is any linear combination of these (that is, $Ax_1(t) + Bx_2(t)$ where $A$ and $B$ are constants).
We can easily show this for equation (10.19). Let $x(t) = Ax_1(t) + Bx_2(t)$. Then
\begin{align*}
a\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx &= a\left[A\frac{d^2x_1}{dt^2} + B\frac{d^2x_2}{dt^2}\right] + b\left[A\frac{dx_1}{dt} + B\frac{dx_2}{dt}\right] + c\left[Ax_1 + Bx_2\right] \\
&= A\left[a\frac{d^2x_1}{dt^2} + b\frac{dx_1}{dt} + cx_1\right] + B\left[a\frac{d^2x_2}{dt^2} + b\frac{dx_2}{dt} + cx_2\right] \\
&= 0A + 0B = 0
\end{align*}
Making use of this linearity result allows us to write down the general solution of the second order homogeneous linear constant-coefficient ordinary differential equation:
\[ x(t) = Ae^{m_1 t} + Be^{m_2 t} \tag{10.20} \]
Notice that in this case we have two undetermined constants in the solution. The general situation is that an $n$th order linear differential equation (even one with variable coefficients) will have $n$ undetermined constants in its general solution. If the differential equation is nonlinear the situation is much less clear.
Recall that the roots of a quadratic equation can be either (i) both real; (ii) complex conjugates; or (iii) a repeated root. The actual solution form is different in each of these three cases.
Case 1. Two real roots. In this case the solution is simply that given in (10.20).

Example 10.14. Solve the differential equation
\[ \frac{d^2x}{dt^2} - 5\frac{dx}{dt} + 4x = 0 \]
Solution: The characteristic equation is $m^2 - 5m + 4 = 0$, which factorises into $(m-1)(m-4) = 0$, and hence the required roots are $m_1 = 1$ and $m_2 = 4$ and the solution is
\[ x(t) = Ae^t + Be^{4t} \]

Case  2. Complex conjugate roots. The differential equation is

in terms of real variables so we would like to have a solution in

terms of functions of real numbers. To achieve this we recall Euler’s

formula

eit = cos(t) + i sin(t)

If the roots are  m1  =  α + i β and  m2  =  α − i β then general solution

(including both real and complex functions) is

x =  Ae(α+i β)t + Be(α−i β)t = eαt Aei βt + Be−i βt

8/12/2019 Mathematics Methods

http://slidepdf.com/reader/full/mathematics-methods 158/212

158

where A and B are complex numbers. Applying Euler's formula gives

x(t) = e^{αt} [A(cos(βt) + i sin(βt)) + B(cos(βt) − i sin(βt))]

which we can rewrite as

x(t) = C_1 e^{αt} cos(βt) + C_2 e^{αt} sin(βt)

where C_1 = A + B and C_2 = i(A − B). Since we are only interested in real-valued functions we are only interested in the cases where C_1 and C_2 are real.

Example 10.15. Solve the differential equation

d^2x/dt^2 − 4 dx/dt + 13x = 0

Solution: The characteristic equation is m^2 − 4m + 13 = 0. The quadratic formula gives the roots as m_{1,2} = 2 ± 3i and hence the solution is

x(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t)

where C_1 and C_2 are any real numbers.

Case 3. Equal roots. In this case we have only one solution of the form x_1 = e^{mt}. The other solution will have the form x_2 = t e^{mt}. This can be deduced using a process called reduction of order but this is outside the scope of this material. The required solution is

x(t) = A e^{mt} + B t e^{mt}

Example 10.16. Solve the differential equation

d^2x/dt^2 + 6 dx/dt + 9x = 0

Solution: The characteristic equation is m^2 + 6m + 9 = 0 which factorises into (m + 3)^2 = 0 and hence the required component solutions are e^{−3t} and t e^{−3t} and the solution of the differential equation is

x(t) = A e^{−3t} + B t e^{−3t}

10.6   Linear nonhomogeneous constant-coefficient differential equations

We now consider equations of the form (10.18), reproduced here:

a d^2x/dt^2 + b dx/dt + cx = f(t),   a ≠ 0   (10.21)

where a, b and c are given constants. In the previous section we learnt how to solve such a differential equation if f(t) = 0 (that is, the equation is homogeneous). If f(t) ≠ 0 the equation is called nonhomogeneous.


In order to solve equation  (10.21) we first need to solve the

equivalent homogeneous equation (that is, when   f (t) =  0). The

solution so obtained is referred to as the  homogeneous solution or

the complementary function. It is actually a whole family of solutions

 because it has two free parameters (C1  and  C2, or  A and  B).

The solution of equation  (10.21) is the sum of this homogeneous

solution and a particular solution (that is, any solution that we

can find) to the nonhomogeneous differential equation (10.21) and

is referred to as the  general solution  of  (10.21). It is also actually a

whole family of solutions for the same reasons.

We can show that such a combination will be a solution to equation (10.21). Let x_c(t) be a solution of the homogeneous equation and x_p(t) be a solution of equation (10.21). We need to show that x(t) = x_c(t) + x_p(t) is also a solution of (10.21):

a d^2x/dt^2 + b dx/dt + cx
   = a [d^2x_c/dt^2 + d^2x_p/dt^2] + b [dx_c/dt + dx_p/dt] + c [x_c + x_p]
   = [a d^2x_c/dt^2 + b dx_c/dt + c x_c] + [a d^2x_p/dt^2 + b dx_p/dt + c x_p]
   = 0 + f(t) = f(t)

and our claim is proved. We can also show that there are in fact no other possible solutions of equation (10.21) by showing that the difference between any two solutions (say x_1(t) and x_2(t)) is a solution of the homogeneous equation:

a d^2/dt^2 (x_1 − x_2) + b d/dt (x_1 − x_2) + c(x_1 − x_2)
   = [a d^2x_1/dt^2 + b dx_1/dt + c x_1] − [a d^2x_2/dt^2 + b dx_2/dt + c x_2]
   = f(t) − f(t) = 0

So the set of solutions of a linear differential equation behaves in the same way as the set of solutions of a system of linear equations that we saw in Section 3.4.

In any given problem our tasks will be to solve the homogeneous equation (using the methods of the previous section) and to find a particular solution. There is a general method (called variation of parameters) for achieving the latter but it often leads to difficult, if not impossible, integrals. We shall restrict ourselves to a class of right-hand side functions f(t) for which there is an easily applicable method.

This class is the collection of polynomial functions

α_n t^n + α_{n−1} t^{n−1} + ··· + α_1 t + α_0,

exponential functions α e^{βt} and the trigonometric functions α cos(γt) and β sin(δt), or any linear combination of these, for example

f(t) = t^2 + 7t + 4 + e^{5t} + 6 cos(3t)

The method of finding a particular solution for such a class of right-hand side functions is called the method of undetermined coefficients.


10.6.1   Method of undetermined coefficients

Table 10.1 is a summary of the appropriate form of solution to seek. The values of α_n, ..., α_0, α, and β will be given and we are required to determine the correct values of P_n, ..., P_0, P and/or Q that will give a form that will satisfy the differential equation.

f(t)                                            Solution form
α_n t^n + α_{n−1} t^{n−1} + ··· + α_1 t + α_0   P_n t^n + P_{n−1} t^{n−1} + ··· + P_1 t + P_0
α e^{βt}                                        P e^{βt}
α cos(γt)                                       P cos(γt) + Q sin(γt)
β sin(δt)                                       P cos(δt) + Q sin(δt)

Table 10.1: Solution forms for method of undetermined coefficients

Example 10.17. Solve the differential equation d^2x/dt^2 + 6 dx/dt + 9x = 4t^2 + 5.

Solution: In Example 10.16, we found the complementary solution (that is, the solution of the homogeneous problem). It is

x_c(t) = A e^{−3t} + B t e^{−3t}

We now need to determine a particular solution. Based on Table 10.1 we try

x_p(t) = P_2 t^2 + P_1 t + P_0   (10.22)

Note that we must include the term P_1 t. Substitution of expression (10.22) into the differential equation gives

2P_2 + 6(2P_2 t + P_1) + 9(P_2 t^2 + P_1 t + P_0) = 4t^2 + 5
⇒   9P_2 t^2 + (9P_1 + 12P_2)t + (9P_0 + 6P_1 + 2P_2) = 4t^2 + 5

and equating powers of t on each side of this equation leads to a set of algebraic equations to solve for the unknowns P_0, P_1 and P_2:

9P_2 = 4,   9P_1 + 12P_2 = 0,   9P_0 + 6P_1 + 2P_2 = 5

The solution of this set of equations is

P_0 = 23/27,   P_1 = −16/27,   P_2 = 4/9

Hence the particular solution is

x_p(t) = (4/9)t^2 − (16/27)t + 23/27

and finally, the general solution of the nonhomogeneous differential equation is

x(t) = A e^{−3t} + B t e^{−3t} + (4/9)t^2 − (16/27)t + 23/27   (10.23)
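The "equate coefficients and solve" step is routine enough to automate. A minimal sketch, assuming SymPy is available (the symbol names are ours), recovers P_0, P_1 and P_2 for this example:

    import sympy as sp

    t, P0, P1, P2 = sp.symbols('t P0 P1 P2')
    xp = P2*t**2 + P1*t + P0                  # trial form (10.22)

    # residual of x'' + 6x' + 9x - (4t^2 + 5); each coefficient must vanish
    residual = xp.diff(t, 2) + 6*xp.diff(t) + 9*xp - (4*t**2 + 5)
    coeffs = sp.Poly(residual, t).all_coeffs()
    print(sp.solve(coeffs, [P0, P1, P2]))     # {P0: 23/27, P1: -16/27, P2: 4/9}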

Example 10.18. Solve the differential equation

d^2x/dt^2 − 5 dx/dt + 4x = 7 cos(3t)


Solution: In Example 10.14 we found the complementary solution (that is, the solution of the homogeneous problem). It is

x_c(t) = A e^t + B e^{4t}

We now need to determine a particular solution. Based on Table 10.1 we try

x_p(t) = P cos(3t) + Q sin(3t)   (10.24)

Note that we must include both cos and sin terms. Substitution of expression (10.24) into the differential equation gives

−9P cos(3t) − 9Q sin(3t) − 5(−3P sin(3t) + 3Q cos(3t)) + 4(P cos(3t) + Q sin(3t)) = 7 cos(3t)

and equating coefficients of cos(3t) and sin(3t) on both sides of this equation leads to a set of algebraic equations to solve for the unknowns P and Q:

−9P − 15Q + 4P = 7,   −9Q + 15P + 4Q = 0

The solution of this set of equations is

P = −7/50   and   Q = −21/50

and so

x_p(t) = −(7/50) cos(3t) − (21/50) sin(3t)

and finally, the general solution of the nonhomogeneous differential equation is

x(t) = A e^t + B e^{4t} − (7/50) cos(3t) − (21/50) sin(3t)   (10.25)

Example 10.19. Solve the differential equation

d^2x/dt^2 − 4 dx/dt + 13x = 8e^{−3t}

Solution: In Example 10.15 we found the complementary solution (that is, the solution of the homogeneous problem). It is

x_c(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t)

We now need to determine a particular solution. Based on Table 10.1 we try

x_p(t) = P e^{−3t}   (10.26)

Substitution of expression (10.26) into the differential equation gives

9P e^{−3t} + 12P e^{−3t} + 13P e^{−3t} = 8e^{−3t}

and cancelling throughout by e^{−3t} (which is valid because it can never be zero) gives

34P = 8   ⇒   P = 4/17   ⇒   x_p(t) = (4/17) e^{−3t}

and hence the general solution of the nonhomogeneous differential equation is

x(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t) + (4/17) e^{−3t}   (10.27)


10.7   Initial and boundary conditions

As mentioned earlier, the values of constants that arise when we solve differential equations can be determined by making use of other conditions (or restrictions) placed on the problem. If there are n unknown constants then we will need n extra conditions.

If all of the extra conditions are given at one value of the independent variable then the extra conditions are called initial conditions and the combined differential equation plus initial conditions is called an initial value problem. This is so even if the independent variable doesn't represent time.

Formally, if the extra conditions are given at different values of the independent variable then they are called boundary conditions and the combined differential equation plus boundary conditions is called a boundary value problem.

In applications, the term initial condition is used if the independent variable represents time and the term boundary condition is used if the independent variable represents space. Usually, the two different uses of the terminology coincide.

Example 10.20. Solve the initial value problem

d^2x/dt^2 − 5 dx/dt + 4x = 7 cos(3t)   subject to   x(0) = 1 and dx/dt(0) = 2

Solution: We have already seen this differential equation in Example 10.18 and determined that its general solution is given by (10.25):

x(t) = A e^t + B e^{4t} − (7/50) cos(3t) − (21/50) sin(3t)

The initial conditions will give two equations to solve for the unknowns, A and B. Firstly,

x(0) = 1   ⇒   1 = A + B − 7/50   (10.28)

The second initial condition involves the derivative so first we need to find it:

dx/dt = A e^t + 4B e^{4t} + (21/50) sin(3t) − (63/50) cos(3t)

and the second initial condition then gives

dx/dt(0) = 2   ⇒   2 = A + 4B − 63/50   (10.29)

Solving the pair of algebraic equations (10.28) and (10.29) gives

A = 13/30   and   B = 53/75

so the required solution is

x(t) = (13/30) e^t + (53/75) e^{4t} − (7/50) cos(3t) − (21/50) sin(3t)
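A useful habit is to confirm such answers numerically. The sketch below (assuming NumPy and SciPy are installed; the names are ours) integrates the initial value problem and compares against the formula just derived:

    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, y):
        # y[0] = x, y[1] = dx/dt, so x'' = 5x' - 4x + 7cos(3t)
        return [y[1], 5*y[1] - 4*y[0] + 7*np.cos(3*t)]

    sol = solve_ivp(rhs, (0.0, 2.0), [1.0, 2.0], dense_output=True,
                    rtol=1e-10, atol=1e-12)

    t = np.linspace(0.0, 2.0, 9)
    exact = ((13/30)*np.exp(t) + (53/75)*np.exp(4*t)
             - (7/50)*np.cos(3*t) - (21/50)*np.sin(3*t))
    print(np.max(np.abs(sol.sol(t)[0] - exact)))   # tiny relative to x itself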


Example 10.21. Solve the boundary value problem

d^2x/dt^2 + 6 dx/dt + 9x = 4t^2 + 5   subject to   x(0) = 7 and x(1) = −3

Solution: We have already seen this differential equation and determined that its general solution is given by (10.23):

x(t) = A e^{−3t} + B t e^{−3t} + (4/9)t^2 − (16/27)t + 23/27

The boundary conditions give two equations to solve for the unknowns, A and B:

x(0) = 7   ⇒   7 = A + 23/27
x(1) = −3   ⇒   −3 = A e^{−3} + B e^{−3} + 4/9 − 16/27 + 23/27

Solving this pair of algebraic equations gives

A = 166/27   and   B = −(166 + 100e^3)/27

so the required solution is

x(t) = (166/27) e^{−3t} − ((166 + 100e^3)/27) t e^{−3t} + (4/9)t^2 − (16/27)t + 23/27

Example 10.22. Solve the boundary value problem

d^2x/dt^2 − 4 dx/dt + 13x = 8e^{−3t}   subject to   x(−π) = 2 and dx/dt(π) = 0

Solution: We have already seen this differential equation and determined that its general solution is given by (10.27):

x(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t) + (4/17) e^{−3t}

The boundary conditions give two equations to solve for the unknowns, C_1 and C_2:

x(−π) = 2   ⇒   2 = −C_1 e^{−2π} + (4/17) e^{3π}

The derivative of the solution is

dx/dt = C_1 [2e^{2t} cos(3t) − 3e^{2t} sin(3t)] + C_2 [2e^{2t} sin(3t) + 3e^{2t} cos(3t)] − (12/17) e^{−3t}

and hence

dx/dt(π) = 0   ⇒   0 = −2C_1 e^{2π} − 3C_2 e^{2π} − (12/17) e^{−3π}

Solving this pair of algebraic equations gives

C_1 = −2e^{2π} + (4/17) e^{5π}   and   C_2 = (4/3) e^{2π} − (8/51) e^{5π} − (4/17) e^{−5π}

so the required solution is

x(t) = (−2e^{2π} + (4/17) e^{5π}) e^{2t} cos(3t) + ((4/3) e^{2π} − (8/51) e^{5π} − (4/17) e^{−5π}) e^{2t} sin(3t) + (4/17) e^{−3t}


10.8   Summary of method

We now summarize the method for solving a linear constant-coefficient second order ordinary differential equation with initial or boundary conditions.

Key Concept 10.23. Solve

a d^2x/dt^2 + b dx/dt + cx = f(t)   with a ≠ 0

given some initial or boundary conditions.

Step 1. Solve the corresponding characteristic (or auxiliary) equation

a m^2 + b m + c = 0

Step 2. Use your solution to the characteristic equation to find the general solution x_c(t) of the corresponding homogeneous equation

a d^2x/dt^2 + b dx/dt + cx = 0

by using one of the three general forms given in Section 10.5, depending on whether the characteristic equation has two distinct real roots, complex conjugate roots or equal roots. Your solution will have two unknown constants.

Step 3. Use the Method of Undetermined Coefficients outlined in Section 10.6.1 to find a particular solution x_p(t).

Step 4. The general solution is now x(t) = x_c(t) + x_p(t).

Step 5. Use the initial or boundary conditions to determine the two unknown constants in x(t) coming from x_c(t).
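In a computer algebra system the five steps collapse into a single call. A sketch with SymPy (assumed available; names ours), applied to the problem of Example 10.20:

    import sympy as sp

    t = sp.symbols('t')
    x = sp.Function('x')

    # x'' - 5x' + 4x = 7cos(3t), with x(0) = 1 and x'(0) = 2
    ode = sp.Eq(x(t).diff(t, 2) - 5*x(t).diff(t) + 4*x(t), 7*sp.cos(3*t))
    sol = sp.dsolve(ode, x(t), ics={x(0): 1, x(t).diff(t).subs(t, 0): 2})
    print(sol)   # 13e^t/30 + 53e^{4t}/75 - 7cos(3t)/50 - 21sin(3t)/50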

10.9   Partial differential equations

A partial differential equation (PDE) is one that involves an unknown function of more than one variable and any number of its partial derivatives. For example,

∂u/∂x + ∂u/∂y = 0

is a partial differential equation for the unknown function u(x, y).

There may be any number of independent variables. For example, an analysis of the vibrations of a solid structure (due to an earthquake perhaps) would involve the three spatial variables x, y and z, and time t. If there were only one independent variable then the equation would be an ordinary differential equation.


Partial differential equations are important in a large number of areas, for example, hydrodynamics and aerodynamics (the Navier-Stokes equations), groundwater flow, acoustics, the spread of pollution, combustion, electrodynamics (Maxwell's equations), quantum mechanics (the Schrödinger equation), meteorology and climate science.

Example 10.24. Maxwell's Equations
These are 4 coupled vector PDEs for the vector unknowns, the electric and magnetic fields E = (E_x, E_y, E_z) and B = (B_x, B_y, B_z). They are, in vector derivative notation,

∇ · E = 4πρ_e        ∇ · B = 0
−∇ × E = (1/c) ∂B/∂t        ∇ × B = (1/c) ∂E/∂t + (4π/c) j_e

where ∇ is the so-called vector differential operator

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

Maxwell's Equations constitute an (overspecified) set of 8 scalar equations. The charge density ρ_e and current density j_e are specified.

Example 10.25. Three important partial differential equations are the following.

The Wave equation

∂^2u/∂t^2 = c^2 ∂^2u/∂x^2   for u(x, t)

where c is the wave speed.

The heat conduction (or diffusion) equation

∂T/∂t = κ ∂^2T/∂x^2   for T(x, t)

where κ is the thermal diffusivity.

The Laplace equation

∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 0   for ψ(x, y)

10.9.1   Finding solutions of partial differential equations

There are a number of ways to find solutions of PDEs but they can involve a large amount of work and can be very subtle. A general discussion of how to find solutions to PDEs is beyond the scope of this unit. We shall instead focus on verification of a given solution and analysis of its behaviour and what this tells us about the situation being modelled.

Recall that when solving ODEs there would arise arbitrary constants, for example,

d^2y/dx^2 + y = 0   ⇒   y(x) = A cos x + B sin x

and these would be determined from given initial data, e.g.

y(0) = 1 and y'(0) = 0   ⇒   A = 1 and B = 0

In the case of PDEs, arbitrary functions can arise. For example, the solution of

∂u/∂x + ∂u/∂y = 0   is   u(x, y) = f(x − y)

where f is any differentiable function of one variable. We verify this using the Chain Rule as follows:

∂u/∂x + ∂u/∂y = f'(x − y)(1) + f'(x − y)(−1) = 0

as required.

10.9.2   Verification of given solutions of partial differential equations

This is simply a matter of substituting the given solution into the partial differential equation and evaluating the derivatives involved.

Example 10.26. Verify that u(x, t) = √(x/(1 + t)) is a solution of the PDE

∂u/∂t + u^2 ∂u/∂x = 0

Solution. We will need the Chain Rule and the quotient rule:

∂u/∂x = 1/(2√(x(1 + t)))        ∂u/∂t = −(1/2)√(x/(1 + t)^3)

and hence

∂u/∂t + u^2 ∂u/∂x = −(1/2)√(x/(1 + t)^3) + (x/(1 + t)) · 1/(2√(x(1 + t))) = 0

as required.

Example 10.27. Verify that u(x, y) = e^x f(y − x) is a solution of the PDE

∂u/∂x + ∂u/∂y = u

for any function f of one variable.

Solution. We will need the chain rule and the product rule:

∂u/∂x = e^x f(y − x) − e^x f'(y − x)        ∂u/∂y = e^x f'(y − x)

and hence

∂u/∂x + ∂u/∂y = e^x f(y − x) − e^x f'(y − x) + e^x f'(y − x) = e^x f(y − x) = u

as required.

Remark 10.28. Note that the PDE is symmetric in x and y but the solution doesn't appear to be. What happened? We can write a symmetric partner to this solution:

u(x, y) = e^x f(y − x) = e^y e^{x−y} f(y − x)

which implies that

u(x, y) = e^y g(x − y)   where   g(z) = e^z f(−z)


A given partial differential equation can have different solution forms. For example, all of the functions

ψ(x, y) = x^2 − y^2
ψ(x, y) = e^x cos(y)
ψ(x, y) = ln(x^2 + y^2)

are solutions of the Laplace equation

∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 0

Verification of these solutions is a simple task.

If ψ(x, y) = x^2 − y^2 then ∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 2 + (−2) = 0.

If ψ(x, y) = e^x cos(y) then ∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = e^x cos(y) + e^x (−cos(y)) = 0.

If ψ(x, y) = ln(x^2 + y^2) then

∂ψ/∂x = 2x/(x^2 + y^2)   and   ∂ψ/∂y = 2y/(x^2 + y^2)

and hence

∂^2ψ/∂x^2 + ∂^2ψ/∂y^2 = 2(y^2 − x^2)/(x^2 + y^2)^2 + 2(x^2 − y^2)/(x^2 + y^2)^2 = 0
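Verification of this kind is mechanical, so it is also a natural job for a computer algebra system. A sketch assuming SymPy is available (names ours):

    import sympy as sp

    x, y = sp.symbols('x y')
    candidates = [x**2 - y**2, sp.exp(x)*sp.cos(y), sp.log(x**2 + y**2)]
    for psi in candidates:
        laplacian = sp.diff(psi, x, 2) + sp.diff(psi, y, 2)
        print(psi, '->', sp.simplify(laplacian))   # each prints 0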

10.9.3   Graphical description of solutions of partial differential equations

There are three ways of representing solutions of a PDE graphically. Consider the function u(x, y) = e^x cos(y). If we have access to a sophisticated graphing package we can plot the solution surface in three-dimensional space as in Figure 10.5.

Figure 10.5: Surface plot of u(x, y) = e^x cos(y).


Alternatively, we could produce a plot of the level curves of the solution in two dimensions as in Figure 10.6.

Figure 10.6: Level-curve plot of u(x, y) = e^x cos(y).

We could also graph u against x for chosen values of y as in Figure 10.7.

Figure 10.7: Section plots of u(x, y) = e^x cos(y) for y = 0, π/8, π/4.

10.9.4   Initial and Boundary Conditions

Recall that to fully determine the solution of an nth order ODE we needed to specify n initial or boundary conditions. To fully determine the solution of a PDE we need sufficiently many initial and/or boundary conditions. These will appear in the form of functions. An example of an initial condition is

u(x, 0) = sin(x)   for all x

and an example of a boundary condition is

u(0, t) = e^{−t}   for all t > 0

How many initial and/or boundary conditions do we need? If a PDE is nth order in time and mth order in space then we will require n initial conditions and m boundary conditions. An example of a fully specified problem is:

Solve

∂^2u/∂t^2 = ∂^3u/∂x^3

subject to the initial conditions

u(x, 0) = cos(x)   and   ∂u/∂t(x, 0) = 0   for 0 < x < 1

and the boundary conditions

u(0, t) = sin(t),   ∂u/∂x(0, t) = 1,   ∂^2u/∂x^2(0, t) = t   for t > 0

An initial-boundary value problem (IBVP) is a PDE for an unknown function of time and some spatial coordinates plus enough initial and boundary information to determine any unknown functions or constants that arise. An example of an IBVP for the Wave equation in the region 0 < x < ∞, 0 < t < ∞ is:

Solve

∂^2u/∂t^2 = c^2 ∂^2u/∂x^2

subject to the initial conditions

u(x, 0) = cos(x)   and   ∂u/∂t(x, 0) = 0

and the boundary conditions

u(0, t) = 1   and   u(π/2, t) = 0

Example 10.29. Verify that the function

u(x, t) = x e^{−x} sin(t)

satisfies the initial-boundary value problem

∂^2u/∂t^2 = ∂^2u/∂x^2 + 2 ∂u/∂x
u(x, 0) = 0,   ∂u/∂t(x, π/2) = 0
u(0, t) = 0,   lim_{x→∞} u(x, t) = 0

Solution step 1. Verify that the function satisfies the partial differential equation:

∂^2u/∂t^2 = −x e^{−x} sin(t)

and

∂^2u/∂x^2 + 2 ∂u/∂x = −2e^{−x} sin(t) + x e^{−x} sin(t) + 2e^{−x} sin(t) − 2x e^{−x} sin(t) + 2e^{−x} sin(t) − 2e^{−x} sin(t) = −x e^{−x} sin(t)

and hence the partial differential equation is satisfied.

Solution step 2. Verify the auxiliary (that is, initial and boundary) conditions:

sin(0) = 0   ⇒   u(x, 0) = 0
∂u/∂t = x e^{−x} cos(t) and cos(π/2) = 0   ⇒   ∂u/∂t(x, π/2) = 0
u(0, t) = 0
lim_{x→∞} u(x, t) = (lim_{x→∞} x/e^x) sin(t) = 0 · sin(t) = 0

and hence all of the auxiliary conditions are satisfied. A plot of the solution surface in three-dimensional space is given in Figure 10.8.


Figure 10.8: Solution surface for Example 10.29.


11

Eigenvalues and eigenvectors

11.1   Introduction

Matrix multiplication usually results in a change of direction, for example,

[ 2  0 ] [ 1 ]   [  2 ]
[ 1  3 ] [ 4 ] = [ 13 ]

which is not parallel to (1, 4).

The eigenvectors of a given (square) matrix A are those special non-zero vectors v that map to multiples of themselves under multiplication by the matrix A. That is,

Av = λv

The eigenvalues of A are the corresponding scale factors λ. Geometrically, the eigenvectors of a matrix A are stretched (or shrunk) on multiplication by A, whereas any other vector is rotated as well as being stretched or shrunk.

[Figure: the eigenvector v maps to Av = λv on the same line through the origin, while a general vector x does not map to a multiple of itself.]

Note that by definition 0 is not an eigenvector but we do allow 0 to be an eigenvalue.

Example 11.1. Observe that

[ 2  6 ] [ 1 ]   [ 8 ]       [ 1 ]
[ 3  5 ] [ 1 ] = [ 8 ]  =  8 [ 1 ]

[ 2  6 ] [ −2 ]   [  2 ]         [ −2 ]
[ 3  5 ] [  1 ] = [ −1 ]  =  −1 [  1 ]

Hence 8 and −1 are eigenvalues of the matrix [2 6; 3 5], with λ = 8 having corresponding eigenvector (1, 1) and λ = −1 having corresponding eigenvector (−2, 1).

Some observations: Any multiple of an eigenvector is also an eigenvector:

A(cv) = c(Av) = c(λv) = λ(cv)

so cv is an eigenvector of A if v is, and with the same corresponding eigenvalue. The set of all eigenvectors of a given eigenvalue together with the zero vector is called the eigenspace corresponding to the eigenvalue λ and we write

E_λ = {v | Av = λv}

The eigenspaces for Example 11.1 are shown in Figure 11.1.

Figure 11.1: Eigenspaces for Example 11.1.

Eigenspaces for nonzero eigenvalues are subspaces of the column space of the matrix while the 0-eigenspace is the nullspace of the matrix. Geometrically, they are lines or planes through the origin in R^3 if A is a 3 × 3 matrix (other than the identity matrix).

11.2   Finding eigenvalues and eigenvectors

Let A be a given n × n matrix. Recall that the algebraic definition of an eigenvalue-eigenvector pair is

Av = λv

where λ is a scalar and v is a nonzero column vector of length n. We begin by rearranging and regrouping, and noting that v = Iv for any vector v, where I is the n × n identity matrix, as follows:

Av − λv = 0
Av − λIv = 0
(A − λI)v = 0

This is a homogeneous set of linear equations for the components of v. If the matrix A − λI were invertible then the solution would simply be v = 0, but this is not allowed by definition and in any case would be of no practical use in applications. We hence require that A − λI be not invertible and this will be the case if

det(A − λI) = 0   (11.1)

When we evaluate this determinant we will have a polynomial equation of degree n in the unknown λ. The solutions of this equation will be the required eigenvalues. Equation (11.1) is called the characteristic equation of the matrix A. The polynomial det(A − λI) is called the characteristic polynomial of A.

Example 11.2. Example 11.1 revisited. We start by forming

A − λI = [ 2  6 ] − λ [ 1  0 ] = [ 2−λ    6  ]
         [ 3  5 ]     [ 0  1 ]   [  3   5−λ ]

The determinant of this matrix is

det(A − λI) = (2 − λ)(5 − λ) − 18 = λ^2 − 7λ − 8

and hence the characteristic equation is

λ^2 − 7λ − 8 = 0   ⇒   (λ − 8)(λ + 1) = 0   ⇒   λ = 8, −1

That is, the eigenvalues of the given matrix A are λ = 8 and λ = −1.

For each eigenvalue λ we must solve the equation

(A − λI)v = 0

When λ = 8 we have

[ −6   6 ] [ v_1 ]   [ 0 ]
[  3  −3 ] [ v_2 ] = [ 0 ]   ⇒   E_8 = span{(1, 1)}

When λ = −1 we have

[ 3  6 ] [ v_1 ]   [ 0 ]
[ 3  6 ] [ v_2 ] = [ 0 ]   ⇒   E_{−1} = span{(−2, 1)}

In solving these sets of equations we end up with the complete eigenspace in each case, but we usually select just one member of each.

It is important to note that the reduced row echelon form of (A − λI) will always have at least one row of zeros. The number of zero rows equals the dimension of the eigenspace.
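Numerically the whole computation is a single library call. The following sketch (assuming NumPy is available; names ours) reproduces the eigenstructure of Example 11.2, up to ordering and scaling of the eigenvectors:

    import numpy as np

    A = np.array([[2.0, 6.0],
                  [3.0, 5.0]])
    evals, evecs = np.linalg.eig(A)
    print(evals)    # [ 8. -1.] (possibly in another order)
    print(evecs)    # columns are unit-length multiples of (1, 1) and (-2, 1)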

Example 11.3. Consider the upper triangular matrix

    [ 1  2  6 ]
A = [ 0  3  5 ]
    [ 0  0  4 ]

Since A is upper triangular, so is A − λI and hence its determinant is just the product of the diagonal. Thus the characteristic equation is

(1 − λ)(3 − λ)(4 − λ) = 0   ⇒   λ = 1, 3, 4

Note that these are in fact the diagonal elements of A. The respective eigenspaces are

E_1 = span{(1, 0, 0)}        E_3 = span{(1, 1, 0)}        E_4 = span{(16, 15, 3)}

complex-valued. If the matrix  A  is real-valued then the eigenval-ues, that is, the roots of the characteristic polynomial, will occur in

complex conjugate pairs.

Example 11.4.

 A =

  2 1

−5 4

The characteristic polynomial is λ2 − 6λ + 13  =  0. The eigenvalues are

λ  =  3 + 2i and λ  =  3 − 2i. When we solve ( A − λI )v  =  0 we will get

solutions containing complex numbers. Although we can’t interpret them

as vectors in R2 there are many applications (particularly in Engineering)

in which there is a natural interpretation in terms of the problem underinvestigation. The corresponding eigenvectors are

v =

1 − 2i

5

  and   v =

1 + 2i

5

The characteristic polynomial may have repeated roots. If it factors into the form

(λ − λ_1)^{m_1} ··· (λ − λ_j)^{m_j} ··· (λ − λ_p)^{m_p}

we say that the algebraic multiplicity of λ = λ_j is m_j. For example, if the characteristic polynomial were (λ + 3)(λ − 2)^4(λ − 5) then the algebraic multiplicity of λ = 2 would be 4.

Example 11.5.

    [ −2   2  −3 ]
A = [  2   1  −6 ]
    [ −1  −2   0 ]

The characteristic equation is

−λ^3 − λ^2 + 21λ + 45 = 0

and so

(3 + λ)^2 (5 − λ) = 0   ⇒   λ = −3, −3, 5

Repeating the root −3 reflects the fact that λ = −3 has algebraic multiplicity 2.

To find the eigenvectors corresponding to λ = 5 we solve

(A − 5I)v = 0   ⇒   [ −7   2  −3 ] [ v_1 ]   [ 0 ]
                    [  2  −4  −6 ] [ v_2 ] = [ 0 ]
                    [ −1  −2  −5 ] [ v_3 ]   [ 0 ]

After some work we arrive at the row echelon form

[ −1  −2  −5  0 ]
[  0   1   2  0 ]
[  0   0   0  0 ]

Note that we have one row of zeros. The solution will therefore involve one free parameter. Let v_3 = t. The second row gives v_2 = −2t and the first row then gives v_1 = −t and hence the eigenspace corresponding to λ = 5 is

E_5 = span{(1, 2, −1)}

Similarly, to find the eigenvectors corresponding to λ = −3 we solve

(A + 3I)v = 0   ⇒   [  1   2  −3 ] [ v_1 ]   [ 0 ]
                    [  2   4  −6 ] [ v_2 ] = [ 0 ]
                    [ −1  −2   3 ] [ v_3 ]   [ 0 ]

It turns out that the reduced row echelon form has two rows of zeros and hence the solution will involve two free parameters. Let v_2 = s and v_3 = t. Calculation gives v_1 = −2s + 3t and hence the eigenspace corresponding to λ = −3 is

E_{−3} = span{(−2, 1, 0), (3, 0, 1)}

The dimension of the eigenspace of an eigenvalue is called its geometric multiplicity. In the above example the geometric multiplicity of λ = −3 is 2 and that of λ = 5 is 1. Note that the eigenspace corresponding to λ = −3 is a plane through the origin and the eigenspace corresponding to λ = 5 is a line through the origin, as displayed in Figure 11.2.

11.3   Some properties of eigenvalues and eigenvectors

Let A be an n × n matrix with eigenvalues λ_1, λ_2, ..., λ_n. (We include all complex and repeated eigenvalues.) Then

• The determinant of the matrix A equals the product of the eigenvalues:

det(A) = λ_1 λ_2 ··· λ_n


Figure 11.2: The eigenspaces for Example 11.5.

• The trace of a square matrix is the sum of its diagonal entries. The trace of the matrix A equals the sum of the eigenvalues:

trace(A) ≡ a_11 + a_22 + ··· + a_nn = λ_1 + λ_2 + ··· + λ_n

Note that in both of these formulae all n eigenvalues must be counted.

• The eigenvalues of A^{−1} (if it exists) are

1/λ_1, 1/λ_2, ..., 1/λ_n

• The eigenvalues of A^T (that is, the transpose of A) are the same as for the matrix A:

λ_1, λ_2, ..., λ_n

• If k is a scalar then the eigenvalues of the matrix kA are

kλ_1, kλ_2, ..., kλ_n

• If k is a scalar and I the identity matrix then the eigenvalues of the matrix A + kI are

λ_1 + k, λ_2 + k, ..., λ_n + k

• If k is a positive integer then the eigenvalues of A^k are

λ_1^k, λ_2^k, ..., λ_n^k


• Any matrix polynomial in A:

A^n + α_{n−1} A^{n−1} + ··· + α_1 A + α_0 I

has eigenvalues

λ^n + α_{n−1} λ^{n−1} + ··· + α_1 λ + α_0   for λ = λ_1, λ_2, ..., λ_n

• The Cayley-Hamilton Theorem: A matrix A satisfies its own characteristic equation, that is, if the characteristic equation is

λ^n + c_{n−1} λ^{n−1} + ··· + c_1 λ + c_0 = 0

where c_0, c_1, ..., c_{n−1} are constants then

A^n + c_{n−1} A^{n−1} + ··· + c_1 A + c_0 I = 0

• It can be shown that any set of vectors from different eigenspaces, that is, corresponding to different eigenvalues, is a linearly independent set.
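Several of these properties are easy to spot-check numerically. A sketch assuming NumPy is available (names ours), using the matrix of Example 11.5:

    import numpy as np

    A = np.array([[-2.0,  2.0, -3.0],
                  [ 2.0,  1.0, -6.0],
                  [-1.0, -2.0,  0.0]])
    lam = np.linalg.eigvals(A)                          # 5, -3, -3

    print(np.isclose(np.prod(lam), np.linalg.det(A)))   # determinant = product
    print(np.isclose(np.sum(lam), np.trace(A)))         # trace = sum

    # Cayley-Hamilton: -A^3 - A^2 + 21A + 45I = 0
    print(np.allclose(-A @ A @ A - A @ A + 21*A + 45*np.eye(3), 0))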

11.4   Diagonalisation

The eigenstructure (that is, the eigenvalue and eigenvector information) is often collected into matrix form. The eigenvalues (including multiplicities) are placed as the entries in a diagonal matrix D and one eigenvector for each eigenvalue is taken to form a matrix P.

Example 11.6. Example 11.5 revisited.

    [ −2   2  −3 ]        [  1  −2  3 ]        [ 5   0   0 ]
A = [  2   1  −6 ]    P = [  2   1  0 ]    D = [ 0  −3   0 ]
    [ −1  −2   0 ]        [ −1   0  1 ]        [ 0   0  −3 ]

Note that the columns of D and P must correspond, e.g. λ = 5 is in column 1 of D so the corresponding eigenvector must be in column 1 of P.

The following relationship between the matrices A, P and D will always hold:

AP = PD   (11.2)

We can see this by noting that the ith column of AP is Av_i (where v_i is the ith column of P), while the ith column of PD is λ_i v_i, so the ith column of equation (11.2) is just the eigenvalue equation for λ_i, that is,

Av_i = λ_i v_i

For Example 11.6:

     [  5   6  −9 ]
AP = [ 10  −3   0 ] = PD
     [ −5   0  −3 ]


If matrix P is invertible then equation (11.2) can be re-written as

A = P D P^{−1}   (11.3)

or

D = P^{−1} A P   (11.4)

Two matrices M and N are called similar matrices if there exists an invertible matrix Q such that

N = Q^{−1} M Q

Clearly A and the diagonal matrix D constructed from the eigenvalues of A are similar.

Diagonalisation is the process of determining a matrix P such that

D = P^{−1} A P

All we need to do is to find the eigenvalues and eigenvectors of A and form P as described above. Note, however, that not every matrix is diagonalisable.
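Before looking at a failure case, here is a numerical check of the procedure on Example 11.6 (NumPy assumed; names ours):

    import numpy as np

    A = np.array([[-2.0,  2.0, -3.0],
                  [ 2.0,  1.0, -6.0],
                  [-1.0, -2.0,  0.0]])
    P = np.array([[ 1.0, -2.0,  3.0],
                  [ 2.0,  1.0,  0.0],
                  [-1.0,  0.0,  1.0]])

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 12))    # diag(5, -3, -3)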

Example 11.7.

    [ 1  −1   0 ]
A = [ 0   1   1 ]
    [ 0   0  −2 ]

The eigenvalues of A are λ = −2, 1, 1 (that is, λ = 1 has algebraic multiplicity 2).

The eigenspace corresponding to λ = −2 is

E_{−2} = span{(1, 3, −9)}

The eigenspace corresponding to λ = 1 is

E_1 = span{(1, 0, 0)}

Both λ = −2 and λ = 1 have geometric multiplicity 1. In order for P to be invertible its columns must be linearly independent. Unfortunately, there are not enough linearly independent eigenvectors to enable us to build matrix P. Hence matrix A is not diagonalisable.

If an n × n matrix has n distinct eigenvalues then it will be diagonalisable, because each eigenvalue will give a representative eigenvector and these will be linearly independent because they correspond to different eigenvalues.

A matrix A is called a symmetric matrix if it equals its transpose, that is,

A = A^T

It can be shown that the eigenvalues of a real symmetric n × n matrix A are all real and that we can always find enough linearly independent eigenvectors to form matrix P, even if there are fewer than n distinct eigenvalues. That is, a real symmetric matrix can always be diagonalised.


11.5   Linear Systems

A homogeneous system of linear differential equations is a system of the form

x′ = Ax

where

    [ x_1(t) ]             [ a_11 ··· a_1n ]
x = [   ⋮    ]   and   A = [  ⋮         ⋮  ] , a constant matrix.
    [ x_n(t) ]             [ a_n1 ··· a_nn ]

We'll see later that the general solution is a linear combination of certain vector solutions (which we'll label with a superscript bracketed number), x^(1)(t), x^(2)(t), etc:

x(t) = c_1 x^(1)(t) + ··· + c_n x^(n)(t)

where the constants c_1, ..., c_n are called combination coefficients. There is a connection between the solution of the system of differential equations and the eigenstructure of the associated matrix A.

Example 11.8. Consider the second-order differential equation

x″ + 2x′ − 3x = 0

Let x_1 = x and x_2 = x′. The derivatives of the new variables are

x_1′ = x′ = x_2        x_2′ = x″ = 3x − 2x′ = 3x_1 − 2x_2

In matrix-vector form this is

[ x_1′ ]   [ 0   1 ] [ x_1 ]
[ x_2′ ] = [ 3  −2 ] [ x_2 ]

The (eigenvalue) characteristic equation for the matrix is λ^2 + 2λ − 3 = 0. Note the similarity between this and the original differential equation. In order to solve the original problem we would try for a solution of the form x = e^{mt}. Then x′ = m e^{mt} and x″ = m^2 e^{mt} and substitution gives

m^2 e^{mt} + 2m e^{mt} − 3e^{mt} = 0   ⇒   (m^2 + 2m − 3) e^{mt} = 0

Divide throughout by e^{mt} (which is never zero) to get

m^2 + 2m − 3 = 0

This is the characteristic equation in m instead of λ.

We shall use a method based on eigenstructure to solve systems of differential equations.

11.6   Solutions of homogeneous linear systems of differential equations

When n = 1 the system is simply x′ = ax where a = a_11. The solution is x(t) = c e^{at} where c is a constant. To solve a homogeneous linear system, we seek solutions of the form

x(t) = v e^{λt}   (11.5)


where λ is a constant and v is a constant vector. Substitution into the system gives

x′ = Ax   ⇒   λv e^{λt} = Av e^{λt}   ⇒   λv = Av

This is the eigenvalue-eigenvector problem, so as soon as we know the eigenvalues and eigenvectors we can write down the component solutions just by substituting into equation (11.5).

Example 11.9. Example 11.8 revisited.

[ x_1′ ]   [ 0   1 ] [ x_1 ]
[ x_2′ ] = [ 3  −2 ] [ x_2 ]

The eigenstructure of the matrix is

v_1 = (1, 1)        v_{−3} = (1, −3)

The two component solutions are

x^(1)(t) = (1, 1) e^t        x^(2)(t) = (1, −3) e^{−3t}

and the general solution is

x = c_1 (1, 1) e^t + c_2 (1, −3) e^{−3t}

In scalar form, the solutions are

x ≡ x_1 = c_1 e^t + c_2 e^{−3t}        x′ ≡ x_2 = c_1 e^t − 3c_2 e^{−3t}

Note that both x_1 (position) and x_2 (velocity) satisfy the original differential equation. This will be so if the system was derived from one higher order differential equation.
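The same recipe is easy to drive numerically. A sketch with NumPy (assumed available; names ours) builds the general solution of Example 11.9 from the eigenstructure and checks that it satisfies x′ = Ax:

    import numpy as np

    A = np.array([[0.0,  1.0],
                  [3.0, -2.0]])
    lam, V = np.linalg.eig(A)     # eigenvalues 1 and -3, eigenvectors in columns

    c = np.array([1.0, 1.0])      # any combination coefficients will do

    def x(t):
        return V @ (c * np.exp(lam * t))

    for t in (0.0, 0.5, 1.0):
        dxdt = V @ (c * lam * np.exp(lam * t))   # exact derivative of x(t)
        print(np.allclose(dxdt, A @ x(t)))       # True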

Example 11.10. The second order system

x′(t) = A x(t),   x = (x_1, x_2),   A = [ 1  2 ]
                                        [ 5  4 ]

The eigensolution is

v_6 = (2, 5)        v_{−1} = (1, −1)

Hence the two component solutions are

x^(1)(t) = (2, 5) e^{6t}        x^(2)(t) = (1, −1) e^{−t}

The general solution is

x(t) = c_1 (2, 5) e^{6t} + c_2 (1, −1) e^{−t}


Example 11.11. The third order system

x′(t) = A x(t),   x = (x_1, x_2, x_3),   A = [ 1  2  3 ]
                                             [ 0  4  5 ]
                                             [ 0  0  6 ]

The eigensolution is

v_1 = (1, 0, 0)        v_4 = (2, 3, 0)        v_6 = (16, 25, 10)

The general solution is

x(t) = c_1 (1, 0, 0) e^t + c_2 (2, 3, 0) e^{4t} + c_3 (16, 25, 10) e^{6t}

Example 11.12. Repeated eigenvalue example. The third order system

x′(t) = A x(t),   x = (x_1, x_2, x_3),   A = [ −2   2  −3 ]
                                             [  2   1  −6 ]
                                             [ −1  −2   0 ]

The eigensolution is

v_5 = (1, 2, −1)        v_{−3} = (−2, 1, 0), (3, 0, 1)

The general solution is

x(t) = c_1 (1, 2, −1) e^{5t} + [ c_2 (−2, 1, 0) + c_3 (3, 0, 1) ] e^{−3t}

11.7   A change of variables approach to finding the solutions

Equations (11.3) and (11.4) can be used to simplify difficult problems:

[Diagram: the original problem (matrix A) is transformed with P into an easier problem (diagonal matrix D), whose easy solution is mapped back with P^{−1} to give the original solution.]


Introduce a vector variable y(t) through the relationship

x(t) = P y(t)   (11.6)

where the columns of matrix P are the eigenvectors of A. Substitute this change of variables into the system of differential equations:

x′ = Ax   ⇒   P y′ = A P y   ⇒   y′ = P^{−1} A P y

We know that P^{−1} A P = D so the new system can be written as

y′(t) = D y(t)

In expanded notation this new system of differential equations is

[ y_1′ ]   [ λ_1  ···   0  ] [ y_1 ]
[  ⋮   ] = [  ⋮    ⋱    ⋮  ] [  ⋮  ]
[ y_n′ ]   [  0   ···  λ_n ] [ y_n ]

Each row of this matrix-vector equation is a simple scalar differential equation for the unknown functions y_1(t), ..., y_n(t). They are

y_1′ = λ_1 y_1,   y_2′ = λ_2 y_2,   ...,   y_n′ = λ_n y_n

The solutions of these are easily obtained:

y_1(t) = c_1 e^{λ_1 t},   y_2(t) = c_2 e^{λ_2 t},   ...,   y_n(t) = c_n e^{λ_n t}

and substituting these into equation (11.6) gives the required solutions.


12

Change of Basis

12.1   Introduction

Recall that the standard basis in R^2 is

S = {(1, 0), (0, 1)}

Any vector in R^2 can be written as a linear combination of these, for example,

(2, 3) = 2(1, 0) + 3(0, 1)

Recall also that the standard basis in R^3 is

S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

Any vector in R^3 can be written as a linear combination of these, for example,

(−4, 2, −7) = −4(1, 0, 0) + 2(0, 1, 0) − 7(0, 0, 1)

In many areas of mathematics, both pure and applied, there is good reason to opt to use a different basis than the standard one. Judicious choice of a different basis will greatly simplify the problem. We saw this idea in Section 11.7 where we used a specially chosen (based on eigenstructure) change of variables to greatly simplify a system of differential equations. This plan of attack (that is, a well chosen change of basis) has many uses.

Recall that a set of vectors B = {u_1, u_2, ..., u_n} is a basis for a subspace W of R^m if B is linearly independent and spans W. Note that W may be all of R^m.

Example 12.1. Let W_P be the plane 5x + 11y − 9z = 0 through the origin in R^3. A basis for this subspace is

B = {(1, 2, 3), (4, −1, 1)}


We can verify this by observing that the basis vectors are clearly linearly independent (there are only two of them and one is not a multiple of the other) and noting that

(1, 2, 3) × (4, −1, 1) = (5, 11, −9), a vector normal to W_P

Any vector v in W_P can be written as a linear combination of the basis vectors, for example,

v = (−5, 8, 7) = 3(1, 2, 3) − 2(4, −1, 1)        Note that 5(−5) + 11(8) − 9(7) = 0

In general, the task of finding the required linear combination of basis vectors u_1, u_2, ..., u_n that produce a given target vector v is that of solving a set of simultaneous linear equations:

Find α_1, α_2, ..., α_n such that v = α_1 u_1 + α_2 u_2 + ··· + α_n u_n

Example 12.2. Example continued. Express w = (16, −13, −7) in terms of the basis B.

Solution. We must solve:

(16, −13, −7) = α_1 (1, 2, 3) + α_2 (4, −1, 1)   ⇒   α_1 + 4α_2 = 16,   2α_1 − α_2 = −13,   3α_1 + α_2 = −7

and so α_1 = −4, α_2 = 5.

The fact that we have found a solution means that w = (16, −13, −7) lies in the plane W_P. If it did not then the system of equations for α_1, α_2 would be inconsistent.
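Finding coordinates is just this linear solve, so it is one library call. A sketch with NumPy (assumed available; names ours) reproduces Example 12.2; a least-squares solve is used because the system has three equations in two unknowns:

    import numpy as np

    U = np.column_stack([[1.0, 2.0, 3.0], [4.0, -1.0, 1.0]])  # basis vectors
    w = np.array([16.0, -13.0, -7.0])

    alpha, *_ = np.linalg.lstsq(U, w, rcond=None)
    print(alpha)                       # [-4.  5.]
    print(np.allclose(U @ alpha, w))   # True, so w really lies in W_P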

The coefficients α_1, α_2, ..., α_n are called the coordinates of v in the basis B and we write

[v]_B = (α_1, ..., α_n)

Example 12.3. Example continued.
The coordinates of v and w in the basis B are

[v]_B = (3, −2)        [w]_B = (−4, 5)

As mentioned before it is often useful to change the basis that one is working with. Suppose that instead of the basis B in the above example we chose to use another basis, say

C = {(5, 1, 4), (−3, 3, 2)}

12.2   Change of basis

Example 12.4. Example 12.1 revisited again. The required coordinates are

[(1, 2, 3)]_C = (1/2, 1/2)        [(4, −1, 1)]_C = (1/2, −1/2)

and hence

P_CB = [ 1/2   1/2 ]   and   P_BC = P_CB^{−1} = [ 1   1 ]
       [ 1/2  −1/2 ]                            [ 1  −1 ]

We can verify these. Recall that [v]_B = (3, −2) and [w]_C = (1/2, −9/2). Then

[v]_C = P_CB [v]_B = [ 1/2   1/2 ] (  3 )   ( 1/2 )
                     [ 1/2  −1/2 ] ( −2 ) = ( 5/2 )

which agrees with what we got before, and

[w]_B = P_BC [w]_C = [ 1   1 ] (  1/2 )   ( −4 )
                     [ 1  −1 ] ( −9/2 ) = (  5 )

which also agrees with what we got before.
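The same bookkeeping, checked numerically with NumPy (assumed available; names ours):

    import numpy as np

    P_CB = np.array([[0.5,  0.5],
                     [0.5, -0.5]])
    P_BC = np.linalg.inv(P_CB)
    print(P_BC)                 # [[1. 1.] [1. -1.]]

    v_B = np.array([3.0, -2.0])
    w_C = np.array([0.5, -4.5])
    print(P_CB @ v_B)           # [0.5 2.5]  = [v]_C
    print(P_BC @ w_C)           # [-4.  5.]  = [w]_B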

12.3   Linear transformations and change of bases

Recall that linear transformations can be represented using matrices; for example, counterclockwise rotation through an angle θ in R^2 using the standard basis for the original and transformed vectors has transformation matrix

A_SS = [ cos θ  −sin θ ]
       [ sin θ   cos θ ]

We have used the subscript SS to indicate that the standard basis is being used to represent the original vectors and also the rotated vectors. For example, the vector e_2 = (0, 1) under a rotation of π/2 becomes

[ 0  −1 ] ( 0 )   ( −1 )
[ 1   0 ] ( 1 ) = (  0 ) = −e_1, as expected

If we desire to use different bases to represent the vectors, say basis B for the original vectors v and basis C for the transformed vectors v′, then we shall label the transformation matrix A_CB and the linear transformation will be

[v′]_C = A_CB [v]_B   (12.1)

We need a way of deducing A_CB. This can be achieved by employing the change of basis matrices P_BS and P_CS where

[v]_B = P_BS [v]_S   and   [v′]_C = P_CS [v′]_S

Substitution of these formulae in equation (12.1) gives

P_CS [v′]_S = A_CB P_BS [v]_S   ⇒   [v′]_S = P_SC A_CB P_BS [v]_S


(recalling that P_CS = P_SC^{−1}). Using the standard basis for the transformation would be [v′]_S = A_SS [v]_S and hence we must have A_SS = P_SC A_CB P_BS, which we can rearrange to get the linear transformation change of basis formula

A_CB = P_CS A_SS P_SB

Note that if we use the same basis B for both the original and transformed vectors then we have

A_BB = P_BS A_SS P_SB   (12.2)

Recall that two matrices M and N are similar if there exists an invertible matrix Q such that N = Q^{−1} M Q. Equation (12.2) tells us that all linear transformation matrices in which the same basis is used for the original and for the transformed vectors are similar. Diagonalisation is an example of this situation when matrix A represents some linear transformation in the standard bases. If we switch to bases composed of the eigenvectors of A then the matrix representation of the linear transformation takes on the much simpler form D, a diagonal matrix.

Example 12.5. We determine the change of basis matrix from the standard basis S to a basis

B = {u_1, u_2, ..., u_n}

We note that [u_i]_S = u_i for all i = 1, ..., n and hence we can immediately write

P_SB = [u_1  u_2  ···  u_n]

Example 12.6. We determine the transformation matrix for counterclockwise rotation through an angle θ in R^2 using the basis

B = {(1, −1), (1, 1)}

to represent both the original vectors and the transformed vectors.

We need to calculate P_SB and P_BS. We can write down immediately that

P_SB = [  1  1 ]   and that   P_BS = P_SB^{−1} = (1/2) [ 1  −1 ]
       [ −1  1 ]                                       [ 1   1 ]

and so the desired transformation matrix is

A_BB = P_BS A_SS P_SB = (1/2) [ 1  −1 ] [ cos θ  −sin θ ] [  1  1 ]
                              [ 1   1 ] [ sin θ   cos θ ] [ −1  1 ]

      = (1/2) [ 1  −1 ] [ cos θ + sin θ   cos θ − sin θ ]
              [ 1   1 ] [ sin θ − cos θ   sin θ + cos θ ]

      = [ cos θ  −sin θ ]
        [ sin θ   cos θ ]

That is, A_BB = A_SS, but if we think about the geometry then this makes sense.


Example 12.7. We determine the transformation matrix for counterclockwise rotation through an angle θ in R^2 using the basis

C = {(1, 0), (1, 1)}

to represent both the original vectors and the transformed vectors.

We need to calculate P_SC and P_CS. We can write down immediately that

P_SC = [ 1  1 ]   and that   P_CS = P_SC^{−1} = [ 1  −1 ]
       [ 0  1 ]                                 [ 0   1 ]

and so the desired transformation matrix is

A_CC = P_CS A_SS P_SC = [ 1  −1 ] [ cos(θ)  −sin(θ) ] [ 1  1 ]
                        [ 0   1 ] [ sin(θ)   cos(θ) ] [ 0  1 ]

      = [ 1  −1 ] [ cos(θ)   cos(θ) − sin(θ) ]   [ cos(θ) − sin(θ)   −2 sin(θ)        ]
        [ 0   1 ] [ sin(θ)   sin(θ) + cos(θ) ] = [ sin(θ)             sin(θ) + cos(θ) ]
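Both rotation examples are quick to confirm numerically. A sketch with NumPy (assumed available; names ours) checks the A_CC formula of Example 12.7 at an arbitrary angle:

    import numpy as np

    theta = 0.7                       # any angle will do
    c, s = np.cos(theta), np.sin(theta)
    A_SS = np.array([[c, -s],
                     [s,  c]])
    P_SC = np.array([[1.0, 1.0],
                     [0.0, 1.0]])

    A_CC = np.linalg.inv(P_SC) @ A_SS @ P_SC
    print(np.allclose(A_CC, [[c - s, -2*s],
                             [s,      s + c]]))   # True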

Example 12.8. The matrix of a particular linear transformation in R^3 in which the standard basis has been used for both the original and transformed vectors is

       [  1   3  4 ]
A_SS = [  2  −1  1 ]
       [ −3   5  1 ]

Determine the matrix of the linear transformation if basis B is used for the original vectors and basis C is used for the transformed vectors, where

B = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}        C = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}

Solution. First we need to calculate P_CS and P_SB. We can immediately write down P_SB and P_SC, and use the fact that P_CS = P_SC^{−1}. We have

       [ 0  0  1 ]          [ 1  0  0 ]                          [  1   0  0 ]
P_SB = [ 0  1  0 ]   P_SC = [ 1  1  0 ]   ⇒   P_CS = P_SC^{−1} = [ −1   1  0 ]
       [ 1  0  0 ]          [ 1  1  1 ]                          [  0  −1  1 ]

and hence

       [  4   3   1 ]
A_CB = [ −3  −4   1 ]
       [  0   6  −5 ]

Example 12.9. Recall that a square matrix A is diagonalisable if there exists an invertible matrix P such that D = P^{−1} A P where D is a diagonal matrix. This is in fact a change of basis formula. If A represents a linear transformation using the standard basis for both the original and transformed vectors, that is, A = A_SS, then we can interpret P as representing a change of basis, from a new basis B to the standard basis S (yes, it has to be in that direction: by Example 12.5, the columns of P are the eigenvectors of A written in standard coordinates, so P = P_SB).


Part IV

Sequences and Series: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici


13

Sequences and Series

13.1   Sequences

By a sequence we mean an infinite sequence of real numbers:

a_1, a_2, a_3, ..., a_n, ...

We denote such a sequence by {a_n} or {a_n}_{n=1}^∞. Sometimes our sequences will start with a_m for some m ≠ 1.

Example 13.1. 1.
1, 1/2, 1/3, ..., 1/n, ...
Here a_n = 1/n for all integers n ≥ 1.

2. b_n = (−1)^n n^3 for n ≥ 1 defines the sequence
−1, 2^3, −3^3, 4^3, −5^3, 6^3, ...

3. For any integer n ≥ 1, define a_n = 1 when n is odd, and a_n = 0 when n is even. This gives the sequence
1, 0, 1, 0, 1, 0, 1, 0, ...

4. The sequence of the so called Fibonacci numbers is defined recursively as follows: a_1 = a_2 = 1, a_{n+2} = a_n + a_{n+1} for n ≥ 1. This is then the sequence
1, 1, 2, 3, 5, 8, 13, ...

In the same way that we could define the limit as x → ∞ of a function f(x) we can also define the limit of a sequence. This is not surprising since a sequence can be regarded as a function with the domain being the set of positive integers.¹

¹ Again this definition can be made more precise along the lines of Section 6.1.1 in the following manner: We say that {a_n} has a limit L if for every ε > 0 there exists a positive integer N such that |a_n − L| < ε for all n ≥ N.

Definition 13.2. (Intuitive definition of the limit of a sequence)
Let {a_n} be a sequence and L be a real number. We say that {a_n} has a limit L if we can make a_n arbitrarily close to L by taking n to be sufficiently large. We denote this situation by

lim_{n→∞} a_n = L


We say that {a_n} is convergent if lim_{n→∞} a_n exists; otherwise we say that {a_n} is divergent.

Example 13.3. 1. Let a_n = b for all n ≥ 1. Then lim_{n→∞} a_n = b.

2. Consider the sequence a_n = 1/n (n ≥ 1). Then lim_{n→∞} a_n = 0.

3. If α > 0 is a constant (that is, does not depend on n) and a_n = 1/n^α for any n ≥ 1, then lim_{n→∞} a_n = 0. For instance, taking α = 1/2 gives
1, 1/√2, 1/√3, ..., 1/√n, ... → 0

4. Consider the sequence {a_n} from Example 13.1(3) above, that is
1, 0, 1, 0, 1, 0, 1, 0, ...
This sequence is divergent.

Just as for limits of functions of a real variable we have Limit Laws and a Squeeze Theorem:

Theorem 13.4 (Limit Laws). Let {a_n} and {b_n} be convergent sequences with lim_{n→∞} a_n = a and lim_{n→∞} b_n = b. Then:

1. lim_{n→∞} (a_n ± b_n) = a ± b.

2. lim_{n→∞} (c a_n) = c a for any constant c ∈ R.

3. lim_{n→∞} (a_n b_n) = a b.

4. If b ≠ 0 and b_n ≠ 0 for all n, then lim_{n→∞} a_n/b_n = a/b.

Theorem 13.5 (The Squeeze Theorem or The Sandwich Theorem). Let {a_n}, {b_n} and {c_n} be sequences such that lim_{n→∞} a_n = lim_{n→∞} c_n = a and

a_n ≤ b_n ≤ c_n

for all n ≥ 1. Then the sequence {b_n} is also convergent and lim_{n→∞} b_n = a.

We can use Theorems 13.4 and 13.5 to calculate limits of various sequences.

Example 13.6. Using Theorem 13.4 several times,

lim_{n→∞} (n^2 − n + 1)/(3n^2 + 2n − 1)
   = lim_{n→∞} [n^2 (1 − 1/n + 1/n^2)] / [n^2 (3 + 2/n − 1/n^2)]
   = lim_{n→∞} (1 − 1/n + 1/n^2) / lim_{n→∞} (3 + 2/n − 1/n^2)
   = (lim_{n→∞} 1 − lim_{n→∞} 1/n + lim_{n→∞} 1/n^2) / (lim_{n→∞} 3 + 2 lim_{n→∞} 1/n − lim_{n→∞} 1/n^2)
   = 1/3


Here we used the fact that lim_{n→∞} 1/n^2 = 0 (see Example 13.3(3)).

Example 13.7. Find lim_{n→∞} cos(n)/n if it exists.

Solution. (Note: Theorem 13.4 is not applicable here, since lim_{n→∞} cos(n) does not exist.) Since −1 ≤ cos(n) ≤ 1 for all n, we have

−1/n ≤ cos(n)/n ≤ 1/n

for all n ≥ 1. Using the Squeeze Theorem and the fact that

lim_{n→∞} (−1/n) = lim_{n→∞} 1/n = 0

it follows that lim_{n→∞} cos(n)/n = 0.
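Limits like those in Examples 13.6 and 13.7 are easy to make plausible numerically; plain Python is enough (the names are ours):

    import math

    def a(n):
        return (n**2 - n + 1) / (3*n**2 + 2*n - 1)

    for n in (10, 1000, 100000):
        # a(n) approaches 1/3; cos(n)/n is squeezed towards 0
        print(n, a(n), math.cos(n)/n)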

A particular way for a sequence to be divergent is if it diverges to ∞ or −∞.

Definition 13.8. (Diverging to infinity)
We say that the sequence {a_n}_{n=1}^∞ diverges to ∞ if given any positive number M we can always find a point in the sequence after which all terms are greater than M. We denote this by lim_{n→∞} a_n = ∞ or a_n → ∞.
Similarly, we say that {a_n}_{n=1}^∞ diverges to −∞ if given any negative number M we can always find a point in the sequence after which all terms are less than M. We denote this by lim_{n→∞} a_n = −∞ or a_n → −∞.

Note, as in the limit of a function case, when we write lim_{n→∞} a_n = ∞ we do not mean that the limit exists and is equal to some special number ∞. We are just using a combination of symbols which it has been agreed will be taken to mean "the limit does not exist and the reason it does not exist is that the terms in the sequence increase without bound". In particular, ∞ IS NOT a number and so do not try to do arithmetic with it. If you do you will almost certainly end up with incorrect results, sooner or later, and you will always be writing nonsense.

Note that it follows from the definitions that if a_n → −∞, then |a_n| → ∞.

Example 13.9. 1. Let a_n = n and b_n = −n^2 for all n ≥ 1. Then a_n → ∞, while b_n → −∞, and |b_n| = n^2 → ∞.

2. a_n = (−1)^n n does not diverge to ∞ and a_n does not diverge to −∞, either. However, |a_n| = n → ∞.

3. Let r > 1 be a constant. Then r^n → ∞. If r < −1, then r^n does not diverge to ∞ or −∞. However, |r^n| = |r|^n → ∞.

4. If a_n ≠ 0 for all n, then a_n → 0 if and only if 1/|a_n| → ∞. Similarly, if a_n > 0 for all n ≥ 1, then a_n → ∞ if and only if 1/a_n → 0. (These properties can be easily derived from the definitions.)


13.1.1   Bounded sequences

Definition 13.10. (Bounded)
A sequence {a_n}_{n=1}^∞ is called bounded above if a_n is less than or equal to some number (an upper bound) for all n. Similarly, it is called bounded below if a_n is greater than or equal to some number (a lower bound) for all n.
We say that a sequence is bounded if it has both an upper and lower bound.

In Examples 13.1, the sequences in parts (1) and (3) are bounded, the one in part (2) is bounded neither above nor below, while the sequence in part (4) is bounded below but not above.

Theorem  13.11.  Every convergent sequence is bounded.

It is important to note that the converse statement in Theorem

13.11 is not true; there exist bounded sequences that are divergent.

See Example 13.1(3) above. So this theorem is not used to prove

that sequences are convergent but to prove that they are not, as in

the example below.

Example 13.12. The sequence a_n = (−1)^n n^3 is not bounded, so by Theorem 13.11 it is divergent.

An upper bound of a sequence is not unique. For example, both 1 and 2 are upper bounds for the sequence a_n = 1/n. This motivates the following definition.

Definition 13.13.  (Supremum and infimum)

Let A ⊂ R. The least upper bound of A (whenever A is bounded above) is

called the supremum of A and is denoted  sup A.

Similarly, the greatest lower bound of A (whenever A is bounded be-

low) is called the infimum of A and is denoted inf  A.

Notice that if the set  A has a maximum, then sup A  is the largest

element of  A. Similarly, if  A has a minimum, then inf  A is the

smallest element of  A. However, sup A  always exists when  A is

 bounded above even when  A has no maximal element. For exam-

ple, if  A  = (0, 1) then  A does not have a maximal element ( for any

a ∈   A there is always  b ∈   A such that  a   <   b). However, it has a

supremum and sup A   =  1. Similarly, inf  A always exists when  A

is bounded below, regardless of whether  A has a minimum or not.

For example, if  A = (0, 1) then inf  A  =  0.


Definition 13.14. (Monotone)
A sequence {a_n} is called monotone if it is either non-decreasing or non-increasing, that is, if either a_n ≤ a_{n+1} for all n or a_n ≥ a_{n+1} for all n.

We can now state one important property of sequences.

Theorem 13.15 (The Monotone Sequences Theorem). If the sequence {a_n}_{n=1}^∞ is non-decreasing and bounded above, then the sequence is convergent and

lim_{n→∞} a_n = sup({a_n})

If {a_n}_{n=1}^∞ is non-increasing and bounded below, then {a_n} is convergent and

lim_{n→∞} a_n = inf({a_n})

That is, every monotone bounded sequence is convergent.

Theorem 13.15 and the definition of divergence to ±∞ imply the following:

Key Concept 13.16. For any non-decreasing sequence

a_1 ≤ a_2 ≤ a_3 ≤ ... ≤ a_n ≤ a_{n+1} ≤ ···

there are only two options:

Option 1. The sequence is bounded above. Then by the Monotone Sequences Theorem the sequence is convergent, that is, lim_{n→∞} a_n exists.

Option 2. The sequence is not bounded above. It then follows from the definition of divergence to ∞ that lim_{n→∞} a_n = ∞.

Similarly, for any non-increasing sequence

b_1 ≥ b_2 ≥ b_3 ≥ ... ≥ b_n ≥ b_{n+1} ≥ ···

either the sequence is bounded below and then lim_{n→∞} b_n exists, or the sequence is not bounded below and then lim_{n→∞} b_n = −∞.

Example 13.17. Let $a_n = \frac{1}{n}$ for all $n \ge 1$. As we know,
$$\lim_{n\to\infty} a_n = 0 = \inf\left\{ \frac{1}{n} \;\middle|\; n = 1, 2, \ldots \right\}.$$
This just confirms Theorem 13.15.
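To see the Monotone Sequences Theorem in action numerically, here is a short Python sketch (an illustration added for this discussion; the recursively defined sequence is our own choice, not one of Examples 13.1). The sequence $a_1 = \sqrt{2}$, $a_{n+1} = \sqrt{2 + a_n}$ is non-decreasing and bounded above by 2, so the theorem guarantees convergence; the iterates visibly approach $\sup\{a_n\} = 2$.

```python
import math

# Illustration of the Monotone Sequences Theorem: the sequence
#   a_1 = sqrt(2),  a_{n+1} = sqrt(2 + a_n)
# is non-decreasing and bounded above by 2, so it must converge;
# its limit is sup{a_n} = 2.
a = math.sqrt(2)
for n in range(1, 11):
    print(f"a_{n} = {a:.10f}")
    a = math.sqrt(2 + a)
```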

Theorem 13.18.
1. For every real constant $\alpha > 0$,
$$\lim_{n\to\infty} \frac{\ln n}{n^{\alpha}} = 0.$$
2. $$\lim_{n\to\infty} \sqrt[n]{n} = \lim_{n\to\infty} n^{1/n} = 1.$$


3. For every constant $a \in \mathbb{R}$,
$$\lim_{n\to\infty} \frac{a^n}{n!} = 0.$$
4. For every constant $a \in \mathbb{R}$,
$$\lim_{n\to\infty} \left( 1 + \frac{a}{n} \right)^n = e^a.$$
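The four limits in Theorem 13.18 are easy to probe numerically. The following Python sketch (added here as an illustration; the choices $\alpha = 1/2$ and $a = 3$ are arbitrary) evaluates each expression at a large $n$; part 3 is computed through logarithms via `math.lgamma` to avoid overflowing $n!$.

```python
import math

# Numerical spot-checks of the four standard limits in Theorem 13.18.
n = 10**6
alpha, a = 0.5, 3.0
print(math.log(n) / n**alpha)        # part 1: -> 0
print(n ** (1 / n))                  # part 2: -> 1
# part 3: a^n / n! computed as exp(n ln a - ln n!) to avoid huge n!
n2 = 200
print(math.exp(n2 * math.log(a) - math.lgamma(n2 + 1)))  # -> 0
print((1 + a / n) ** n, math.exp(a))  # part 4: -> e^a
```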

13.2   Infinite Series

Definition 13.19. (Infinite series)
An infinite series is by definition an expression of the form
$$\sum_{n=1}^{\infty} a_n = a_1 + a_2 + \ldots + a_n + \ldots \tag{13.1}$$
where $a_1, a_2, \ldots, a_n, \ldots$ is a sequence of real numbers.

Sometimes we will label the series by $\sum_{n=0}^{\infty} a_n = a_0 + a_1 + a_2 + \ldots$. (It is important to be clear about the difference between a sequence and a series. These terms are often used interchangeably in ordinary English, but in mathematics they have distinctly different and precise meanings.)

Since an infinite series involves the sum of infinitely many terms, this raises the issue of what the sum actually is. We can deal with this in a precise manner via the following definition.

Definition 13.20. (Convergent and divergent series)
For every $n \ge 1$,
$$s_n = a_1 + a_2 + \ldots + a_n$$
is called the nth partial sum of the series in (13.1). If $\lim_{n\to\infty} s_n = s$ we say that the infinite series is convergent and write
$$\sum_{n=1}^{\infty} a_n = s.$$
The number $s$ is then called the sum of the series.

If $\lim_{n\to\infty} s_n$ does not exist, we say that the infinite series (13.1) is divergent.

Note that when a series is convergent we have
$$\sum_{n=1}^{\infty} a_n = \lim_{n\to\infty} (a_1 + a_2 + \ldots + a_n).$$

Example 13.21.
1. The interval $(0, 1]$ can be covered by subintervals as follows: first take the subinterval $[1/2, 1]$ (of length $1/2$), then the subinterval $[1/4, 1/2]$ (of length $1/4$), then the subinterval $[1/8, 1/4]$ (of length $1/8$), etc. The total length of these subintervals is 1, which implies
$$1 = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots + \frac{1}{2^n} + \ldots.$$


2. Given a constant $r \in \mathbb{R}$, consider the geometric series
$$\sum_{n=0}^{\infty} r^n = 1 + r + r^2 + \ldots + r^{n-1} + \ldots$$
For the nth partial sum of the above series we have
$$s_n = 1 + r + r^2 + \ldots + r^{n-1} = \frac{1 - r^n}{1 - r}$$
when $r \ne 1$.

(a) Let $|r| < 1$. Then
$$\lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{1 - r^n}{1 - r} = \frac{1}{1 - r},$$
since $\lim r^n = 0$ whenever $|r| < 1$. Thus the geometric series is convergent in this case and we write
$$\frac{1}{1 - r} = 1 + r + r^2 + \ldots + r^{n-1} + \ldots \tag{13.2}$$
Going back to part (1), now rigorously we have
$$\sum_{n=1}^{\infty} \frac{1}{2^n} = -1 + \sum_{n=0}^{\infty} \frac{1}{2^n} = -1 + \frac{1}{1 - 1/2} = 1.$$

(b) If $|r| > 1$, then $\lim_{n\to\infty} r^n$ does not exist (in fact $|r^n| \to \infty$ as $n \to \infty$), so $s_n = \frac{1 - r^n}{1 - r}$ does not have a limit; that is, the geometric series is divergent.

(c) If $r = 1$, then $s_n = n \to \infty$ as $n \to \infty$, so the geometric series is again divergent.

(d) Let $r = -1$. Then $s_n = 1$ for odd $n$ and $s_n = 0$ for even $n$. Thus the sequence of partial sums is $1, 0, 1, 0, 1, 0, \ldots$, which is a divergent sequence. Hence the geometric series is again divergent.

Conclusion: the geometric series is convergent if and only if $|r| < 1$.
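The following Python sketch (an added illustration, not part of the example) compares partial sums of the geometric series against the closed form $1/(1-r)$ for a few values of $|r| < 1$.

```python
# Partial sums of the geometric series approach 1/(1-r) when |r| < 1.
def geometric_partial_sum(r: float, n: int) -> float:
    """Return s_n = 1 + r + r^2 + ... + r^(n-1)."""
    return sum(r**k for k in range(n))

for r in (0.5, -0.5, 0.99):
    print(r, geometric_partial_sum(r, 2000), 1 / (1 - r))
# For |r| >= 1 (e.g. r = 1 or r = -1) the partial sums never settle,
# matching cases (b)-(d) above.
```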

Theorem 13.22. If the series $\sum_{n=1}^{\infty} a_n$ is convergent, then $\lim_{n\to\infty} a_n = 0$.

The above theorem says that $\lim_{n\to\infty} a_n = 0$ is necessary for convergence. The theorem is most useful in the following equivalent form.

Theorem 13.23 (Test for Divergence). If the sequence $\{a_n\}$ does not converge to 0, then the series $\sum_{n=1}^{\infty} a_n$ is divergent.

Example 13.24. The series $\sum_{n=1}^{\infty} \frac{n^2 + 2n + 1}{-3n^2 + 4}$ is divergent by the Test for Divergence, since
$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} \frac{n^2 + 2n + 1}{-3n^2 + 4} = \lim_{n\to\infty} \frac{1 + 2/n + 1/n^2}{-3 + 4/n^2} = -\frac{1}{3} \ne 0.$$
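Numerically (an added sketch, not part of the example), the terms visibly approach $-1/3$ rather than $0$:

```python
# The terms a_n = (n^2 + 2n + 1)/(-3n^2 + 4) tend to -1/3, not 0,
# so the series fails the necessary condition for convergence.
for n in (10, 100, 10000):
    print(n, (n**2 + 2 * n + 1) / (-3 * n**2 + 4))
```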

Note that Theorem 13.22 does not say that $\lim_{n\to\infty} a_n = 0$ implies that the series is convergent. In fact, the following example shows that this is not the case.


Example 13.25. The harmonic series
$$1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} + \ldots$$
is divergent, even though $\lim_{n\to\infty} a_n = 0$.

Proof for the interested reader: First notice that $s_n < s_{n+1}$ for every $n \ge 1$, so the sequence $\{s_n\}$ of partial sums is increasing. Therefore to show that $\lim_{n\to\infty} s_n = \infty$ it is enough to observe that the sequence $\{s_n\}$ is not bounded. To do this we will demonstrate that the subsequence
$$s_1, s_2, s_{2^2}, s_{2^3}, \ldots, s_{2^n}, \ldots \to \infty$$
as $n \to \infty$.

We have $s_1 = 1$ and $s_2 = 1 + \frac{1}{2}$. Next,
$$s_{2^2} = \left( 1 + \frac{1}{2} \right) + \left( \frac{1}{3} + \frac{1}{4} \right) > s_2 + \left( \frac{1}{4} + \frac{1}{4} \right) = 1 + \frac{2}{2}$$
and
$$s_{2^3} = \left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \right) + \left( \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} \right) > s_{2^2} + \left( \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} \right) > 1 + \frac{2}{2} + \frac{1}{2} = 1 + \frac{3}{2}.$$

Assume that for some $k \ge 1$ we have $s_{2^k} > 1 + \frac{k}{2}$. Then
$$s_{2^{k+1}} = \left( 1 + \frac{1}{2} + \ldots + \frac{1}{2^k} \right) + \left( \frac{1}{2^k + 1} + \frac{1}{2^k + 2} + \ldots + \frac{1}{2^{k+1}} \right) > s_{2^k} + 2^k \times \frac{1}{2^{k+1}} > 1 + \frac{k}{2} + \frac{1}{2} = 1 + \frac{k+1}{2}.$$

It now follows by the Principle of Mathematical Induction that
$$s_{2^n} > 1 + \frac{n}{2}$$
for all positive integers $n$. This obviously implies $\lim_{n\to\infty} s_{2^n} = \infty$ and therefore the sequence $\{s_n\}$ is unbounded. Since this sequence is increasing (as mentioned earlier), it follows that $\lim_{n\to\infty} s_n = \infty$, so the harmonic series is divergent.

Indeed, it can be shown that if $s_n$ is the nth partial sum of the harmonic series then
$$\frac{s_n}{\ln n} = \frac{1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n}}{\ln n} \to 1$$
as $n \to \infty$. That is, $s_n \to \infty$ as fast as $\ln(n) \to \infty$.
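The final claim, that $s_n$ grows like $\ln n$, can be observed numerically. The sketch below (added here; it is not part of the proof) accumulates the harmonic partial sums and prints the ratio $s_n / \ln n$:

```python
import math

# The harmonic partial sums s_n grow like ln(n): s_n / ln(n) -> 1,
# though the convergence of the ratio is slow.
s = 0.0
for n in range(1, 10**6 + 1):
    s += 1.0 / n
    if n in (10, 10**3, 10**6):
        print(n, s / math.log(n))
```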

Definition 13.26. (p-series)
One important series is the p-series given by
$$\sum_{n=1}^{\infty} \frac{1}{n^p}.$$
Here $p \in \mathbb{R}$ is an arbitrary constant.

Note that the case $p = 1$ is the harmonic series, which we have already seen is divergent.


Theorem 13.27. The p-series
$$\sum_{n=1}^{\infty} \frac{1}{n^p}$$
is convergent if and only if $p > 1$.
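Partial sums make the dichotomy of Theorem 13.27 visible. In the added Python sketch below, the $p = 2$ sums settle (in fact near $\pi^2/6 \approx 1.6449$, a value not derived in these notes) while the $p = 1$ (harmonic) sums keep growing:

```python
# Partial sums of two p-series: p = 2 converges, p = 1 diverges.
for N in (10**2, 10**4, 10**6):
    print(N,
          sum(1 / n**2 for n in range(1, N + 1)),   # settles near 1.6449
          sum(1 / n for n in range(1, N + 1)))      # keeps growing
```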

There are several results and tests that allow you to determine if a series is convergent or not. We outline some of them in the theorems below.

Theorem 13.28. If the infinite series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent, then the series $\sum_{n=1}^{\infty} (a_n \pm b_n)$ and $\sum_{n=1}^{\infty} c\,a_n$ (for any constant $c \in \mathbb{R}$) are also convergent, with
$$\sum_{n=1}^{\infty} (a_n \pm b_n) = \sum_{n=1}^{\infty} a_n \pm \sum_{n=1}^{\infty} b_n \qquad \text{and} \qquad \sum_{n=1}^{\infty} c\,a_n = c \sum_{n=1}^{\infty} a_n.$$

Example 13.29. Using Theorem 13.28 and the formula for the sum of a (convergent) geometric series, we get
$$\sum_{n=0}^{\infty} \left( \left( -\frac{2}{3} \right)^n + \frac{3}{4^{n+1}} \right) = \sum_{n=0}^{\infty} \left( -\frac{2}{3} \right)^n + \frac{3}{4} \sum_{n=0}^{\infty} \frac{1}{4^n} = \frac{1}{1 + 2/3} + \frac{3}{4} \cdot \frac{1}{1 - 1/4} = \frac{8}{5}.$$

Another way to determine if a series converges is to compare it to a series whose behaviour we know. The first way to do this is analogous to the Squeeze Theorem.

Theorem 13.30 (The Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be infinite series such that
$$0 \le a_n \le b_n$$
holds for all sufficiently large $n$.

1. If $\sum_{n=1}^{\infty} b_n$ is convergent then $\sum_{n=1}^{\infty} a_n$ is convergent.

2. If $\sum_{n=1}^{\infty} a_n$ is divergent then $\sum_{n=1}^{\infty} b_n$ is divergent.

Since we know the behaviour of a p-series, we usually use one in our comparison.

Example 13.31.
1. Consider the series
$$\sum_{n=1}^{\infty} \frac{1 + \sin(n)}{n^2}.$$
Since $-1 \le \sin(n) \le 1$, we have $0 \le 1 + \sin(n) \le 2$ for all $n$. Thus
$$0 \le \frac{1 + \sin(n)}{n^2} \le \frac{2}{n^2} \tag{13.3}$$


for all integers $n \ge 1$.

The series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent, since it is a p-series with $p = 2 > 1$. Thus $\sum_{n=1}^{\infty} \frac{2}{n^2}$ is convergent, and now (13.3) and the Comparison Test show that the series $\sum_{n=1}^{\infty} \frac{1 + \sin(n)}{n^2}$ is also convergent.

2. Consider the series $\sum_{n=1}^{\infty} \frac{\ln(n)}{n}$. Since
$$0 < \frac{1}{n} \le \frac{\ln(n)}{n}$$
for all integers $n \ge 3$, and the series $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent (it is the harmonic series), the Comparison Test implies that the series $\sum_{n=1}^{\infty} \frac{\ln(n)}{n}$ is also divergent.
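A numerical companion to Example 13.31(1) (an added illustration, not part of the example): the partial sums of $\sum (1 + \sin n)/n^2$ stay below those of the convergent series $\sum 2/n^2$, exactly as inequality (13.3) demands.

```python
import math

# The partial sums of sum (1 + sin n)/n^2 are squeezed below those of
# sum 2/n^2, which converges (to 2 * pi^2/6 = pi^2/3).
N = 10**5
s_a = sum((1 + math.sin(n)) / n**2 for n in range(1, N + 1))
s_b = sum(2 / n**2 for n in range(1, N + 1))
print(s_a, s_b, math.pi**2 / 3)
```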

Another way to compare two series is to take the limit of the ratio of terms from the corresponding sequences. This allows us to compare the rates at which the two sequences go to 0.

Theorem 13.32 (The Limit Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be infinite series such that $a_n \ge 0$ and $b_n > 0$ for sufficiently large $n$, and let
$$c = \lim_{n\to\infty} \frac{a_n}{b_n} \ge 0.$$

(a) If $0 < c < \infty$, then $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\sum_{n=1}^{\infty} b_n$ is convergent.

(b) If $c = 0$ and $\sum_{n=1}^{\infty} b_n$ is convergent then $\sum_{n=1}^{\infty} a_n$ is convergent.

(c) If $\lim_{n\to\infty} \frac{a_n}{b_n} = \infty$ and $\sum_{n=1}^{\infty} a_n$ is convergent then $\sum_{n=1}^{\infty} b_n$ is convergent.

Remark 13.33. 1. Clearly in case (a) above we have that $\sum_{n=1}^{\infty} a_n$ is divergent whenever $\sum_{n=1}^{\infty} b_n$ is divergent. In case (b), if $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} b_n$ must also be divergent. And in case (c), if $\sum_{n=1}^{\infty} b_n$ is divergent, then $\sum_{n=1}^{\infty} a_n$ is divergent.

an  is divergent.

2. Notice that in cases (b) and (c) we have implications (not equivalences). For example, in case (b), if we know that $\sum_{n=1}^{\infty} a_n$ is convergent, we cannot conclude anything about $\sum_{n=1}^{\infty} b_n$.

Theorem 13.35 (The Alternating Series Test). If the sequence $\{a_n\}$ satisfies $a_n \ge 0$ and $a_n \ge a_{n+1}$ for all $n$, and $\lim_{n\to\infty} a_n = 0$, then the alternating series
$$\sum_{n=1}^{\infty} (-1)^{n-1} a_n = a_1 - a_2 + a_3 - a_4 + \ldots$$
is convergent.

Remark 13.36. The conclusion of Theorem 13.35 about the convergence of an alternating series remains true if we assume that $a_n \ge 0$ and $a_n \ge a_{n+1}$ for all sufficiently large $n$ and $\lim_{n\to\infty} a_n = 0$.

Example 13.37. The series
$$\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \ldots$$
is convergent according to the Alternating Series Test, since $a_n = \frac{1}{n} \ge 0$ for all $n \ge 1$, and the sequence $\{a_n\}$ is decreasing and converges to 0.
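Numerically (an added sketch), the partial sums of this series settle down, with consecutive partial sums bracketing the limit; the limit happens to be $\ln 2$, a sum these notes do not derive.

```python
import math

# Partial sums of the alternating harmonic series of Example 13.37.
# Consecutive partial sums bracket the limit, which equals ln(2).
def s(N: int) -> float:
    return sum((-1) ** (n - 1) / n for n in range(1, N + 1))

for N in (10, 11, 1000, 1001):
    print(N, s(N))
print("ln(2) =", math.log(2))
```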

13.3   Absolute Convergence and the Ratio Test

We begin with a theorem.

Theorem 13.38. If the infinite series $\sum_{n=1}^{\infty} |a_n|$ is convergent, then the series $\sum_{n=1}^{\infty} a_n$ is also convergent.

This motivates the following definition.

Definition 13.39. (Absolute convergence)
An infinite series $\sum_{n=1}^{\infty} a_n$ is called absolutely convergent if the series $\sum_{n=1}^{\infty} |a_n|$ is convergent. If $\sum_{n=1}^{\infty} a_n$ is convergent but $\sum_{n=1}^{\infty} |a_n|$ is divergent, then we say that $\sum_{n=1}^{\infty} a_n$ is conditionally convergent.

As Theorem 13.38 shows, every absolutely convergent series is convergent. The next example shows that the converse is not true.

Example 13.40.
1. The series $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$ is convergent (by the Alternating Series Test). However,
$$\sum_{n=1}^{\infty} \left| \frac{(-1)^{n-1}}{n} \right| = \sum_{n=1}^{\infty} \frac{1}{n}$$
is divergent (it is the harmonic series). Thus $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$ is conditionally convergent.

2. Consider the series $\sum_{n=1}^{\infty} \frac{\sin(n)}{n^2}$. Since
$$0 \le \left| \frac{\sin(n)}{n^2} \right| \le \frac{1}{n^2}$$


and the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent (a p-series with $p = 2 > 1$), the Comparison Test implies that $\sum_{n=1}^{\infty} \left| \frac{\sin(n)}{n^2} \right|$ is convergent. Hence $\sum_{n=1}^{\infty} \frac{\sin(n)}{n^2}$ is absolutely convergent and therefore convergent.

Theorem 13.41 (The Ratio Test). Let $\sum_{n=1}^{\infty} a_n$ be such that
$$\lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = L.$$

1. If $L < 1$, then $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.

2. If $L > 1$, then $\sum_{n=1}^{\infty} a_n$ is divergent.

Note that when $L = 1$ the Ratio Test gives no information.

Example 13.42. Consider the series $\sum_{n=1}^{\infty} \frac{n^2}{2^n}$. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = \frac{a_{n+1}}{a_n} = \frac{(n+1)^2 / 2^{n+1}}{n^2 / 2^n} = \frac{1}{2} \left( \frac{n+1}{n} \right)^2 \to \frac{1}{2}$$
as $n \to \infty$. Since $L = \frac{1}{2} < 1$, the Ratio Test implies that $\sum_{n=1}^{\infty} \frac{n^2}{2^n}$ is absolutely convergent.
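The computed ratio can be checked numerically with the added sketch below; the ratios $a_{n+1}/a_n$ head towards $1/2$ as claimed.

```python
# For a_n = n^2 / 2^n the ratio a_{n+1}/a_n tends to 1/2.
def a(n: int) -> float:
    return n**2 / 2**n

for n in (1, 10, 100, 1000):
    print(n, a(n + 1) / a(n))
```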

Example 13.43. For any constant $b \in \mathbb{R}$ the infinite series $\sum_{n=0}^{\infty} \frac{b^n}{n!}$ is absolutely convergent. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = \left| \frac{b^{n+1}/(n+1)!}{b^n/n!} \right| = \frac{|b|}{n+1} \to 0$$
as $n \to \infty$. So, by the Ratio Test, the series $\sum_{n=0}^{\infty} \frac{b^n}{n!}$ is absolutely convergent. In particular (by the Test for Divergence), $\lim_{n\to\infty} \frac{b^n}{n!} = 0$.

13.4   Power Series

Definition 13.44. (Power series)
Let $a \in \mathbb{R}$ be a given number, $\{a_n\}_{n=0}^{\infty}$ a given sequence of real numbers, and $x \in \mathbb{R}$ a variable (parameter). A series of the form
$$\sum_{n=0}^{\infty} a_n (x - a)^n = a_0 + a_1 (x - a) + a_2 (x - a)^2 + \ldots + a_n (x - a)^n + \ldots$$
is called a power series centred at $a$. When $a = 0$, the series is simply called a power series.


Clearly a power series can be regarded as a function of $x$ defined for all $x \in \mathbb{R}$ for which the infinite series is convergent.

Theorem 13.45. For a power series $\sum_{n=0}^{\infty} a_n (x - a)^n$ one of the following three possibilities occurs:

(a) The series is absolutely convergent for $x = a$ and divergent for all $x \ne a$.

(b) The series is absolutely convergent for all $x \in \mathbb{R}$.

(c) There exists $R > 0$ such that the series is absolutely convergent for $|x - a| < R$ and divergent for $|x - a| > R$.

In other words, a power series is convergent at only one point, or convergent everywhere, or convergent on an interval centred at $a$. It is not possible for it to be convergent only at several separated points or on several separate intervals.

The number $R$ in part (c) of Theorem 13.45 is called the radius of convergence of the power series. In case (a) we define $R = 0$, while in (b) we set $R = \infty$. When $|x - a| = R$ the series may or may not be convergent. It is even possible for a series to be convergent for one value of $x$ with $|x - a| = R$ and divergent for another.

Example 13.46. Find all $x \in \mathbb{R}$ for which the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}} (x + 3)^n \tag{13.4}$$
is absolutely convergent, conditionally convergent or divergent.

Solution. When $x = -3$ the series is absolutely convergent.

Let $x \ne -3$. Then
$$\left| \frac{a_{n+1}}{a_n} \right| = \left| \frac{(-1)^{n+1} (x+3)^{n+1} / \sqrt{n+1}}{(-1)^n (x+3)^n / \sqrt{n}} \right| = |x + 3| \sqrt{\frac{n}{n+1}} \to |x + 3|$$
as $n \to \infty$. Thus, by the Ratio Test, the series is absolutely convergent for $|x + 3| < 1$, that is, for $-4 < x < -2$, and divergent for $|x + 3| > 1$, that is, for $x < -4$ and $x > -2$.

It remains to check the points $x = -4$ and $x = -2$.

Substituting $x = -4$ in (13.4) gives the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}} (-1)^n = \sum_{n=1}^{\infty} \frac{1}{n^{1/2}},$$
which is divergent (it is a p-series with $p = 1/2 < 1$).

When $x = -2$, the series (13.4) becomes $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$, which is convergent by the Alternating Series Test. However,
$$\sum_{n=1}^{\infty} \left| \frac{(-1)^n}{\sqrt{n}} \right| = \sum_{n=1}^{\infty} \frac{1}{\sqrt{n}}$$
is divergent (as we mentioned above), so $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$ is conditionally convergent.

Conclusion: The series (13.4) is absolutely convergent for $-4 < x < -2$, conditionally convergent for $x = -2$, and divergent for all other values of $x$.
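The three regimes of Example 13.46 show up clearly in partial sums. In the added Python sketch below, two truncation levels agree inside the interval of convergence, creep together slowly at the endpoint $x = -2$, and disagree wildly outside.

```python
import math

# Partial sums of the power series sum (-1)^n (x+3)^n / sqrt(n)
# of Example 13.46, truncated at two levels N.
def partial_sum(x: float, N: int) -> float:
    return sum((-1) ** n * (x + 3) ** n / math.sqrt(n)
               for n in range(1, N + 1))

for x in (-3.5, -2.5, -2.0, -1.5):  # inside, inside, endpoint, outside
    print(x, partial_sum(x, 200), partial_sum(x, 400))
```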

13.5   Taylor Series

Suppose that a function $f$ can be represented by a power series
$$f(x) = a_0 + a_1 (x - a) + a_2 (x - a)^2 + \ldots + a_n (x - a)^n + \ldots$$
for all $|x - a| < R$, where $R > 0$. Substituting $x = a$ gives $f(a) = a_0$. Differentiating the series term by term yields
$$f'(x) = a_1 + 2 a_2 (x - a) + 3 a_3 (x - a)^2 + \ldots + n\, a_n (x - a)^{n-1} + \ldots \tag{13.6}$$

for all $|x - a| < R$. Then substituting $x = a$ in (13.6) gives $f'(a) = a_1$.

Similarly, differentiating (13.6) yields
$$f''(x) = 2 a_2 + 6 a_3 (x - a) + \ldots + n (n-1) a_n (x - a)^{n-2} + \ldots$$
for all $|x - a| < R$, and substituting $x = a$ in this equality gives $f''(a) = 2 a_2$, that is, $a_2 = \frac{f''(a)}{2!}$.

Continuing in this fashion we must have
$$a_n = \frac{f^{(n)}(a)}{n!} \quad \text{for each } n.$$

Definition 13.49. (Taylor Series)
Assume that $f(x)$ has derivatives of all orders on some interval $I$ containing the point $a$ in its interior. Then the power series
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!} (x - a)^2 + \ldots$$
is called the Taylor series of $f$ about $a$. When $a = 0$ this series is called the MacLaurin Series of $f$.

Recall from Chapter 9 that
$$T_{n,a}(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \ldots + \frac{f^{(n)}(a)}{n!}(x - a)^n$$
is the nth Taylor polynomial of $f$ at $a$. Clearly this is the $(n+1)$st partial sum of the Taylor series of $f$ at $a$. For any given $x \in I$ we have
$$f(x) = T_{n,a}(x) + R_{n,a}(x)$$
where $R_{n,a}(x)$ is the remainder (or error term). So, if for some $x \in I$ we have $R_{n,a}(x) \to 0$ as $n \to \infty$, then
$$f(x) = \lim_{n\to\infty} T_{n,a}(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n.$$
That is, when $R_{n,a}(x) \to 0$ as $n \to \infty$, the Taylor series is convergent at $x$ and its sum is equal to $f(x)$.

Example 13.50.
1. Consider the function $f(x) = e^x$. Then $f^{(n)}(x) = e^x$ for all integers $n \ge 1$, so $f^{(n)}(0) = 1$ for all $n$. Thus, the Taylor series for $e^x$ about 0 has the form
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!} + \ldots \tag{13.7}$$
We will now show that the sum of this series is equal to $e^x$ for all $x \in \mathbb{R}$.

Fix a constant $b > 0$. For any $x \in [-b, b]$ we have
$$f(x) = T_{n,0}(x) + R_{n,0}(x)$$


where $T_{n,0}(x)$ is the nth Taylor polynomial of $f$ about 0 and $R_{n,0}(x)$ is the corresponding remainder. By Taylor's formula,
$$R_{n,0}(x) = \frac{f^{(n+1)}(z)}{(n+1)!} x^{n+1}$$
for some $z$ between 0 and $x$. Thus $|z| \le b$, and therefore
$$0 \le |R_{n,0}(x)| = \left| \frac{f^{(n+1)}(z)}{(n+1)!} x^{n+1} \right| \le \frac{e^z}{(n+1)!} b^{n+1} \le e^b \frac{b^{n+1}}{(n+1)!} \to 0$$
as $n \to \infty$ by Example 13.43. By the Squeeze Theorem, (for the given $x$) $\lim_{n\to\infty} |R_{n,0}(x)| = 0$. Thus the Taylor series (13.7) is convergent and its sum is $e^x$. Since $b > 0$ can be taken arbitrarily large, the above conclusion holds for any $x \in \mathbb{R}$. Thus
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$
for all $x \in \mathbb{R}$.
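The convergence $T_{n,0}(x) \to e^x$ can be watched numerically. The added sketch below evaluates the $n$th Taylor polynomial at $x = 2$ (an arbitrary choice) and prints the shrinking error $|T_{n,0}(2) - e^2|$.

```python
import math

# T_{n,0}(x) = sum_{k=0}^{n} x^k / k! approaches e^x as n grows.
def taylor_exp(x: float, n: int) -> float:
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 2.0
for n in (2, 5, 10, 20):
    print(n, taylor_exp(x, n), abs(taylor_exp(x, n) - math.exp(x)))
```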

2. Using the above argument (with slight modifications), one shows that
$$\sin(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} = \frac{x}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + \ldots$$
for all $x \in \mathbb{R}$. Here the right hand side is the Taylor series of $\sin(x)$ about 0.

3. Similarly,
$$\cos(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots + (-1)^n \frac{x^{2n}}{(2n)!} + \ldots$$
for all $x \in \mathbb{R}$. Here the right hand side is the Taylor series of $\cos(x)$ about 0.

Combining parts (1), (2) and (3) and extending the notion of a power series to the complex numbers gives a way of showing that $e^{ix} = \cos(x) + i\sin(x)$.
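This identity is easy to spot-check numerically (an added sketch, using Python's built-in complex arithmetic rather than the power-series argument):

```python
import cmath
import math

# Euler's formula e^{ix} = cos(x) + i sin(x), checked at one point.
x = 0.7
print(cmath.exp(1j * x))
print(complex(math.cos(x), math.sin(x)))
```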


14   Index

 p-series, 200

divergent,  194

absolute maximum,  129,  131

absolute minimum, 129,  131

absolutely convergent,  204

additive identity,  49

Alternating series,  203

associative, 48

augmented matrix, 15

auxiliary equation, 157

 back substitution,  17

 basic variables,  20

basis, 40
boundary, 132

 boundary conditions,  162

 boundary value problem, 162

 bounded, 133

change of coordinates matrix,  185

characteristic equation, 157

closed, 132

codomain, 77

coefficient matrix,  12

column space, 51

column vector, 26
commutative, 48

commute,  49

conditionally convergent, 204

consistent, 10

continuous, 103

continuously differentiable, 109

contrapositive, 39

convergent, 194, 198

critical point, 130,  131

derivative,  109, 119

determinant, 69

diagonal matrix,  50

differentiable, 107, 119

differential equations,  145

dilation, 80

dimension, 44

direction field, 150
directional derivative, 124

divergent,  198

diverges to ∞,  195

domain, 77

eigenvalues, 171

eigenvectors, 171

elementary matrix, 74

elementary row operation, 12

extremum, 129,  131

free parameter,  20

free variable,  20

full rank, 55

function, 77

Gaussian Elimination, 17

gradient vector, 125

group, 68

harmonic series, 200

Hessian matrix,  135

homogeneous, 23,  156

idempotent matrix,  50

identity, 50

identity matrix,  50,  80

image, 77

inconsistent., 10

independent, 38

infimum, 196

infinite series,  198

inflection point, 134

initial value problem,  155

inverse,  59, 61

inverse function, 82

invertible, 61,  82

 Jacobian, 121

 Jacobian matrix,  121

kernel, 83

leading entries, 19

leading entry,  16

leading variables,  20

left-distributive, 48

linear combination, 32

linear transformation, 77

linearly independent, 38

local maximum,  129,  131

local minimum, 129,  131

lower-triangular matrix, 50

MacLaurin Series, 208

main diagonal,  50

matrix, 47

matrix addition, 47

matrix multiplication,  47

matrix transposition, 47

monotone, 197

multiplicative identity, 49

negative definite, 136

nilpotent matrix,  50

non-basic variable, 20

non-invertible, 61

nonhomogeneous, 156

normal vector,  118

null space,  56

nullity, 58

open, 132

ordinary differential equation,  145

orthogonal projection,  78

partial derivative,  113

partial differential equation,  145

pivot entry, 17

pivot position,  17


polar coordinates, 102

positive definite,  136

power series,  207

product, 26

proof, 14

quadratic form, 136

rank, 55

Rank-Nullity Theorem, 58

ratio, 80

reduced row-echelon form,  64

right-distributive, 48

row echelon form, 16

row space,  51

row vector,  26

row-reduction, 17

saddle point, 132

scalar,  9

scalar multiple, 9

scalar multiplication, 25,  47

separable differential equation,  152

sequence, 193
similar, 187

skew-symmetric matrix, 50

span, 32

spanning set, 35

standard basis, 41

standard basis vectors,  41

standard matrix,  80

subspace, 27

sum, 26

supremum, 196

symmetric matrix,  50

systems of linear equations,  9

Taylor polynomial, 139

Taylor series,  208

transpose, 48

upper-triangular matrix, 50

vector addition,  25

vector space,  25

vector subspace, 27

vectors, 25

zero divisors, 50

zero matrix,  50

zero-vector,  27