Matrices Pam NortonMatrices are used in many areas of mathematics, and also have applications in diverse areas such as engineering, computer graphics, image processing, physical sciences, biological sciences and social sciences. Powerful calculators and computers can now carry out complicated and difficult numeric and algebraic computations involving matrix methods, and such technology is a vital tool in related real-life, problem-solving applications.
This book provides mathematics teachers with an elementary introduction to matrix algebra and its uses in formulating and solving practical problems, solving systems of linear equations, representing combinations of affine (including linear) transformations of the plane and modelling finite state Markov chains. The basic theory in each of these areas is explained and illustrated using a broad range of examples. A feature of the book is the complementary use of technology, particularly computer algebra systems, to do the calculations involving matrices required for the applications. A selection of student activities with solutions and text and web references are included throughout the book.
MathsWorks for Teachers
Series editor: David Leigh-Lancaster
Series overview
MathsWorks for Teachers has been developed to provide a coherent and contemporary framework for conceptualising and implementing aspects of middle and senior mathematics curricula.
Titles in the series are:
Functional Equations David Leigh-Lancaster
Contemporary Calculus Michael Evans
Matrices Pam Norton
Foundation Numeracy in Context David Tout & Gary Motteram
Data Analysis Applications Kay Lipson
Complex Numbers and Vectors Les Evans
ISBN 978-0-86431-508-3
MathsWorks for Teachers
Matrices
Pam Norton
CONTENTS
Introduction
About the author
1 An introduction to matrices
  History
  Matrices in the senior secondary mathematics curriculum
2 Rectangular arrays, matrices and operations
  Definition of a matrix
  Operations on matrices
  Addition and subtraction of two matrices
  Multiplication by a number (scalar multiple)
  Structure properties of matrix addition and scalar multiplication
  Matrix multiplication
  Zero and identity matrices
  The transpose of a matrix
  The inverse of a matrix
  Applications of matrices
3 Solving systems of simultaneous linear equations
  Solving systems of simultaneous linear equations using matrix inverse
  The method of Gaussian elimination
  Systems of simultaneous linear equations in various contexts
4 Transformations of the cartesian plane
  Linear transformations
  Linear transformation of a straight line
  Linear transformation of a curve
  Standard types of linear transformations
  Composition of linear transformations
  Affine transformations
  Composition of affine transformations
5 Transition matrices
  Conditional probability
  Transition probabilities
  The steady-state vector
  Applications of transition matrices
6 Curriculum connections
7 Solution notes to student activities
References and further reading
Notes
INTRODUCTION
MathsWorks is a series of teacher texts covering various areas of study and topics relevant to senior secondary mathematics courses. The series has been specifically developed for teachers to cover helpful mathematical background, and is written in an informal discussion style.
The series consists of six titles:
• Functional Equations
• Contemporary Calculus
• Matrices
• Data Analysis Applications
• Foundation Numeracy in Context
• Complex Numbers and Vectors
Each text includes historical and background material; discussion of key concepts, skills and processes; commentary on teaching and learning approaches; comprehensive illustrative examples with related tables, graphs and diagrams throughout; references for each chapter (text and web-based); student activities and sample solution notes; and a bibliography.
The use of technology is incorporated as applicable in each text, and a general curriculum link between chapters of each text and Australian state and territory as well as selected overseas courses is provided.
A Notes section has been provided at the end of the text for teachers to include their own comments, annotations and observations. It could also be used to record additional resources, references and websites.
ABOUT THE AUTHOR
Pam Norton is an experienced lecturer in university level mathematics. Her mathematical interests are in the applications of mathematics to sport, and her educational interests are in the use of technology in the teaching and learning of mathematics. She has been involved in the setting and assessing of examinations and extended assessment tasks in mathematics and curriculum review at both secondary and tertiary levels.
CHAPTER 1
AN INTRODUCTION TO MATRICES
Throughout history people have collected and recorded various data using sets (unordered lists), vectors (ordered one-dimensional lists) and matrices (ordered two-dimensional lists of lists, tables or rectangular arrays). Today arrays and tables of numbers and other information are found widely in everyday life. For example, sports ladders give numbers of wins, losses, draws, points and other information for teams in a competition. Each day stock tables are given in local newspapers. Results of opinion polls are usually given in table form in newspapers. Information of many forms is held in tables, and the spreadsheet is an electronic digital technology that is now widely used in everyday life wherever data is to be entered, stored and manipulated using tables and rectangular arrays.
The word matrix comes from the Latin word for ‘womb’. The term matrix is also used in areas other than mathematics, and generally means an environment, milieu, substance or place in which something is cast, shaped or developed.
It has become conventional in contemporary mathematics to describe a matrix in terms of the specification of its elements by reference to row by column, following the use of this form in developments by 19th century mathematicians, but it is interesting to speculate whether an element of a matrix would most likely have been referenced by column first then by row, had computerised spreadsheets been around prior to the modern development of matrices! Indeed, Chinese mathematicians who first recorded the use of matrices used an early form of computer algebra, the counting board, and did reference elements by column before row (see Hoe, 1980).
HISTORY
In the history of mathematics, the explicit and formal development of matrices is a relatively modern invention. Katz (1993), Grattan-Guinness (1994) and Fraleigh and Beauregard (1995) provide useful historical background, as do the websites referenced at the end of this chapter.
Although it is evident that ancient civilisations such as the Babylonians and Chinese were able to solve systems of simultaneous linear equations, it is not clear if systems with multiple solutions or no solutions were considered by them. The Egyptians were able to solve simple linear equations directly and by guessing an answer and adjusting it to the correct one.
The origins of matrices are found in the study of systems of simultaneous linear equations, and can be traced back to the Chinese in the Han Dynasty about 200 BCE. The Han dynasty, established in about 210 BCE, developed two important texts for mathematical education: Zhoubi suanjing (Arithmetical Classic of the Gnomon and the Circular Paths of Heaven) and Jiuzhang suanshu (Nine Chapters on the Mathematical Art). In the eighth chapter of the latter, systems of simultaneous linear equations were solved with the aid of a counting board by arranging the equations in tabular form, with each column containing the coefficients and the constant term for one of the equations. The solution was then obtained by multiplying and subtracting columns to get a triangular form, followed by back-substitution. English translations and commentary on these texts have recently become available (see Kangshen, Crossley & Lun, 1999).
Systems of simultaneous linear equations arise in many areas: economics, social sciences, medicine, engineering, biological and physical sciences and mathematics. The most useful method for solving such systems is known as Gaussian elimination. It was first used in ‘modern’ times by Gauss in 1809 to solve a system of six simultaneous equations in six unknowns that he had formulated while studying the orbit of the asteroid Pallas. Gauss simply developed the method first documented by the Chinese in about 200 BCE. Not only does this method have historical significance, but it is also the basis for the best direct methods for programming a computer to solve such systems today. While software programs such as MatLab have been specifically developed for high-speed numerical computation with high-order matrices, other programs and technologies such as spreadsheets, computer algebra systems (CAS) and graphics and CAS calculators can also carry out matrix computations with matrices that are too large for efficient by-hand computation. CAS technologies can also carry out computations with algebraic as well as numerical elements for matrices. Where quick and accurate computations are required, modern calculator and computer technologies are indispensable tools.
Gauss’s method was extended to become known as the Gauss–Jordan method for solving systems of simultaneous linear equations. Jordan’s contribution to this method is the incorporation of a systematic technique for back-substitution. The Gauss–Jordan method is used to obtain the reduced row echelon form of a matrix and hence to solve a system of simultaneous linear equations directly. The Jordan part was first described by Wilhelm Jordan, a German professor of geodesy, in the third edition (1888) of his Handbook of Geodesy. He used the method to solve symmetric systems of simultaneous linear equations arising out of a least squares application in geodesy.
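The elimination-then-back-substitution procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only (the book itself uses the CAS Derive for computations); the partial-pivoting row swap is a standard numerical refinement that is assumed here, not described in the text.

```python
def gaussian_elimination(A, b):
    """Solve A x = b: forward elimination (with partial pivoting),
    then back-substitution on the resulting triangular system."""
    n = len(A)
    # work on a copy of A augmented with b
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # partial pivoting: bring the row with the largest pivot to the top
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# 2x + y = 5, x + 3y = 10  has solution  x = 1, y = 3
print(gaussian_elimination([[2, 1], [1, 3]], [5, 10]))
```

The same triangular-form-then-substitute idea underlies both the Han-dynasty counting-board method and modern computer solvers.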
It is perhaps somewhat surprising that the idea of a matrix did not evolve until well after that of a determinant. In 1545, Cardan gave a rule for solving a system of two linear equations, which is essentially Cramer’s rule using determinants for solving a 2 × 2 system. The idea of a determinant appeared in Japan and Europe at approximately the same time. Seki Kowa, in a manuscript of 1683, introduced the notion of a determinant (without using this name). He described the use of determinants and showed how to find determinants of matrices up to order 5 × 5. In the same year in Europe, Leibniz also introduced the idea of a determinant to explain when a system of equations had a solution.
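For a 2 × 2 system ax + by = e, cx + dy = f, Cramer’s rule gives x and y as quotients of determinants. A minimal Python sketch (illustrative only, not from the text):

```python
def cramer_2x2(a, b, c, d, e, f):
    """Solve  ax + by = e,  cx + dy = f  by Cramer's rule."""
    det = a * d - b * c          # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("determinant is zero: no unique solution")
    x = (e * d - b * f) / det    # replace the x-column by the constants
    y = (a * f - e * c) / det    # replace the y-column by the constants
    return x, y

# x + y = 3, x - y = 1  has solution  x = 2, y = 1
print(cramer_2x2(1, 1, 1, -1, 3, 1))  # (2.0, 1.0)
```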
In 1750, Gabriel Cramer published his Introduction to the Analysis of Algebraic Curves. He was interested in the problem of determining the equation of a plane curve of a given degree that passes through a certain number of points. He stated a general rule, now known as Cramer’s rule, in an appendix, but did not explain how it worked. The term ‘determinant’ was first introduced by Gauss in 1801 while discussing quadratic forms. Binary quadratic forms are expressions such as ax2 + 2bxy + cy2, where x and y are variables and a, b and c coefficients, which can be represented in matrix form as:
[ x  y ] [ a  b ] [ x ]
         [ b  c ] [ y ]
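The matrix representation of a binary quadratic form can be verified numerically: multiplying out the row vector, the symmetric matrix and the column vector recovers ax² + 2bxy + cy². An illustrative Python check (not from the text):

```python
def quadratic_form(a, b, c, x, y):
    """Evaluate [x y] [[a, b], [b, c]] [x, y]^T by explicit products."""
    A = [[a, b], [b, c]]
    # row vector times matrix: [x y] A
    row = [x * A[0][0] + y * A[1][0], x * A[0][1] + y * A[1][1]]
    # then times the column vector [x, y]^T
    return row[0] * x + row[1] * y

a, b, c, x, y = 2, 3, 5, 7, 11
assert quadratic_form(a, b, c, x, y) == a*x**2 + 2*b*x*y + c*y**2
```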
Gauss considered linear substitutions for the variables x and y of the form

x = αx₁ + βy₁
y = γx₁ + δy₁

and composition of substitutions, which led to matrix multiplication.
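The link between composing substitutions and multiplying matrices can be shown in a short Python sketch (an illustration under my own naming, not from the text): applying one substitution after another agrees with a single substitution by the matrix product.

```python
def compose(s2, s1):
    """Compose two linear substitutions, each given as a 2 x 2 matrix
    [[p, q], [r, s]] meaning x = p*x1 + q*y1, y = r*x1 + s*y1.
    The composite substitution is the matrix product s2 * s1."""
    return [[sum(s2[i][k] * s1[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply the substitution m to the pair v = [x1, y1]."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

S1 = [[1, 2], [3, 4]]
S2 = [[5, 6], [7, 8]]
v = [1, 1]
# substituting by S1 and then S2 agrees with one substitution by S2*S1
assert apply(S2, apply(S1, v)) == apply(compose(S2, S1), v)
```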
Augustin-Louis Cauchy, in 1812, published work on the theory of determinants, both using the term determinant and the abbreviation (a1,n) to stand for the symmetric system
a(1,1)  a(1,2)  ...  a(1,n)
a(2,1)  a(2,2)  ...  a(2,n)
  ...     ...          ...
a(n,1)  a(n,2)  ...  a(n,n)
that is associated with the determinant. Although many of the basic results in calculating determinants were already known, he introduced work on minors and adjoints, and the procedure for calculating a determinant by expanding along any row or down any column (now called the Laplace expansion).
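The Laplace (cofactor) expansion just mentioned can be written directly as a recursive procedure: expand along the first row, with signs alternating. An illustrative Python sketch (not from the text, and far slower than the elimination-based methods used in practice):

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == -2
assert det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]) == -3
```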
It was not until 1850 that James Joseph Sylvester used the term matrix to refer to a rectangular array of numbers. He spent most of his time studying determinants that arose from matrices.
Soon after, Arthur Cayley showed that matrices were useful to represent systems of simultaneous linear equations, with the notion of an inverse matrix for their solution. In 1858 Cayley gave the first abstract definition of a matrix, and subsequently developed the algebra of matrices, defining the operations of addition, multiplication, scalar multiplication and inverse. He also showed that every matrix satisfies its characteristic equation, a result which is now known as the Cayley–Hamilton theorem. Cayley proved this theorem for 2 × 2 matrices, and had checked the result for 3 × 3 matrices, while William Rowan Hamilton had proved the special case for 4 × 4 matrices with his investigations into the quaternions. Georg Frobenius proved it for the general case in 1878.
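For a 2 × 2 matrix A, the characteristic equation is λ² − tr(A)λ + det(A) = 0, so the Cayley–Hamilton theorem says A² − tr(A)A + det(A)I is the zero matrix. A small numerical check (illustrative Python, not from the text):

```python
def cayley_hamilton_2x2(A):
    """Return A^2 - tr(A)*A + det(A)*I for a 2 x 2 matrix A.
    By the Cayley-Hamilton theorem this is the zero matrix."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    A2 = [[a*a + b*c, a*b + b*d], [c*a + d*c, c*b + d*d]]  # A squared
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)]
            for i in range(2)]

print(cayley_hamilton_2x2([[1, 2], [3, 4]]))  # [[0, 0], [0, 0]]
```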
Many other mathematicians have contributed to the theory of matrices and determinants, including Étienne Bézout, Colin Maclaurin, Alexandre-Théophile Vandermonde, Pierre Simon Laplace, Joseph Louis Lagrange, Ferdinand Gotthold Eisenstein, Camille Jordan, Jacques Sturm, Karl Gustav Jacob Jacobi, Leopold Kronecker and Karl Weierstrass.
Markov chains are named after the Russian mathematician Andrei Andreevich Markov, who first defined them in 1906 in a paper dealing with the Law of Large Numbers. He used examples from literary books, with the two possible states being vowels and consonants. To illustrate his results, he did a statistical study of the alternation of vowels and consonants in Pushkin’s Eugene Onegin.
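Markov’s vowel/consonant alternation is a two-state Markov chain, which can be described by a 2 × 2 transition matrix. The probabilities below are invented purely for illustration (they are not Markov’s Pushkin data); repeatedly multiplying the state vector by the transition matrix approaches a steady-state distribution.

```python
def step(p, T):
    """One step of a Markov chain: state row vector p times transition matrix T."""
    return [sum(p[i] * T[i][j] for i in range(2)) for j in range(2)]

# hypothetical transition probabilities: row = current state (vowel, consonant)
T = [[0.2, 0.8],   # after a vowel: P(next is vowel), P(next is consonant)
     [0.6, 0.4]]   # after a consonant
p = [1.0, 0.0]     # start on a vowel
for _ in range(50):
    p = step(p, T)
print(p)  # close to the steady-state distribution (3/7, 4/7) for this T
```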
Matrices and matrix methods are now used in many areas of practical and theoretical application. The Harvard economist Wassily W. Leontief was awarded the Nobel Prize in Economics for his work on input–output models, which relied heavily on matrices and solving systems of simultaneous linear equations. Matrices are also used extensively in business optimisation contexts, in particular where networks (graph theory) are applied to problems involving representation, connectedness and allocation. The mathematician Olga Taussky Todd developed matrix applications to analyse vibrations on airplanes during World War II, and made an important contribution to the development and application of matrix theory.
The development of the electronic digital computer has had a big impact on the use of matrices in many areas. Matrix methods are used extensively in computer graphics, a developing area especially driven by the demands of the movie and computer games industries. Matrix methods are also used extensively in the communication industry (especially for encryption and decryption), in engineering and the sciences, and in economic modelling and industry.
In mathematics, the study of matrices and determinants is part of linear algebra, and it is recognised that matrices and their natural operations provide models for the algebraic structures of a vector space and of a non-commutative ring.
MATRICES IN THE SENIOR SECONDARY MATHEMATICS CURRICULUM
During the 1960s and 1970s matrices began to be incorporated into aspects of the senior secondary mathematics curriculum, in particular following the innovations of the new mathematics program from the 1950s (see, for example, Allendoerfer & Oakley, 1955; Adler, 1958), and the increasing use of them as a tool for business-related applications. In curriculum terms this can also be related to a greater emphasis on discrete mathematics in curriculum design during the second half of the 20th century.
There have been several purposes for which matrices have been introduced into the senior secondary mathematics curriculum:
• to represent and solve systems of m simultaneous linear equations in n variables, where m, n ≥ 2
• to represent and apply transformations, and combinations of transformations, of the cartesian plane, in particular considering those subsets of the cartesian plane that represent graphs of functions and other relations
• as arrays to model and manipulate data related to practical situations such as stock inventories, sales and prices
• to represent and compute states and transitions between states, for example population modelling, tax scales and conditional probabilities, including Markov chains
• as a mathematical tool for analysis in graph theory (networks) and game theory
• to carry out numerical computation such as for approximation of irrational real numbers
• to provide a model for abstract mathematical structures such as non-commutative rings and vector spaces
• to provide a model for other mathematical entities such as complex numbers and vectors
The incorporation of matrices within the school mathematics curriculum has been both as objects for investigation in their own right, and for their instrumental application in particular contexts. For example, in The New Mathematics (Adler, 1958) matrices are introduced for analysis of linear transformations of the plane, then considered as an algebra in their own right, and finally used as a model for the complex numbers. In The Principles of Mathematics (Allendoerfer & Oakley, 1955) matrices are briefly introduced as an example of a non-commutative ring. During the 1970s in Aggregating Algebra (Holton & Pye, 1976), matrices are introduced in the context of systems of three simultaneous linear equations in three unknowns with an emphasis on geometric interpretation of the solutions. Similarly, in School Mathematics Research Foundation: Pure Mathematics (Cameron et al., 1970), matrices are introduced in the context of linear transformations of the plane and then applied to the solution of systems of simultaneous linear equations (with a detailed discussion on elementary matrices), finishing with a brief consideration of matrix algebras.
Around this time matrices also began to be used increasingly for practical applications within discrete business-oriented mathematics courses (see, for example, Kemeny et al., 1972), including applications of transition matrices. During the 1970s and early 1980s, within pure mathematics-oriented senior secondary courses, especially those intended for students with interest and aptitude in higher level mathematics, such as Pure Mathematics (Fitzpatrick & Galbraith, 1971), the use of matrices covered transformations, solution of systems of simultaneous linear equations and investigation of mathematical structures (groups of transformations of the plane, complex numbers and rings). In some cases this also included consideration of determinants for equation solving (Cramer’s rule) and finding inverses. However, a comprehensive study of determinants is a substantial area of mathematical investigation of its own, and it has typically not been included in senior secondary mathematics curricula in its own right, but rather in relation to the study of matrices (see, for example, Hodgson & Patterson, 1990). In terms of contemporary senior secondary curricula, matrices are covered in, for example, Further Mathematics and Mathematical Methods (CAS) studies in Victoria, Australia, and in the Further Pure AS module: Matrices and Transformations in the United Kingdom.
Matrix computations had the advantage, within the contexts typically used in senior secondary mathematics courses, of requiring relatively simple arithmetic calculations. However, for matrices of order 3 × 3 and greater, the extent of these calculations meant that reliability of correct calculation became problematic, and much of the time working on matrices was spent on learning algorithms for the necessary computations and then carrying these out mentally, by hand, or possibly with the assistance of arithmetic or scientific calculators.
A consequence of this was that only matrices of small orders were used, and computations involving finding inverses tended to be associated with more formal rather than practically oriented senior secondary mathematics courses. The advent from the late 1980s of powerful and readily accessible mathematically able technology, initially in the form of mini-computers and later as desk-top computers with software such as spreadsheets, numerical computation software and CAS, and subsequently hand-held graphics and CAS calculators, has meant that the computational load associated with matrix work can be carried out by these technologies. Thus, without the time and reliability constraints imposed by access only to mental and/or by hand arithmetic calculation, senior secondary mathematics students across a broad range of courses can utilise matrices, including those of higher orders, for a variety of purposes.
The efficient and effective use of these technologies requires students to have a sound conceptual understanding of key matrix definitions (such as order, row, column, types of matrices) and conditions for computations (such as conformability, existence of inverses), as well as practical mental and by hand facility with computation in simple cases so that they can understand what computations are taking place, verify the reasonableness of results, and anticipate the nature of these results to verify their mathematical working.
While matrices continue to have a strong role in providing a unifying abstract structure in contemporary senior secondary mathematics curricula, their instrumental value in practical modelling and applications can be enhanced in conjunction with the use of modern technology (see, for example, Kissane 1997; Garner, McNamara & Moya, 2004). Although purpose-designed computational software such as Matlab is used in complicated real-life applications where high-speed computations involve matrices of very high orders, general CAS such as Derive, Maple and Mathematica, and spreadsheets, also have powerful matrix functionalities. Student versions of these general CAS, or graphics and CAS calculators, can be used by students for examples suitable for senior secondary mathematics courses. The author has used the CAS software Derive (which also underpins computation in the TI-89, TI-92 and Voyage 200 series of hand-held CAS calculators) for matrix computations in this text. More recent hand-held technology, such as the CASIO ClassPad 300 series, the TI-nspire CAS+ and the HP 50G, has very user-friendly template-based matrix functionality.
SUMMARY
• Various data can be collected and recorded using sets (unordered lists), vectors (ordered one-dimensional lists) and matrices (ordered two-dimensional lists of lists).
• Matrices are ordered rectangular arrays, or lists of equal-sized lists, that are constructed by arranging data in rows and columns.
• Rectangular tables are simple examples of matrices.
• The origins of matrices can be found in Chinese mathematical instructional texts from around 200 BCE, with application to the solution of systems of simultaneous linear equations.
• In Europe, matrices arose from Gauss’s work in the early 19th century on solving systems of simultaneous linear equations. This work provides a general method (known as Gaussian elimination). The term matrix was first applied to a rectangular array of numbers by Sylvester in 1850.
• Gaussian elimination, and related methods, form the basis for algorithms used by modern technology to solve systems of simultaneous linear equations.
• The idea of a determinant arose independently in both Japan and Europe in the latter part of the 17th century.
• The modern theory of matrices and determinants was developed significantly in the latter part of the 19th century and the early 20th century.
References
Adler, I 1958, The new mathematics, John Day, New York.
Allendoerfer, CB & Oakley, CO 1955, Principles of mathematics, McGraw-Hill, New York.
Cameron, N, Clements, K, Green, LJ & Smith, GC 1970, School Mathematics Research Foundation: Pure mathematics, Cheshire, Melbourne.
Fitzpatrick, JB & Galbraith, P 1971, Pure mathematics, Jacaranda, Milton, Queensland.
Fraleigh, JB & Beauregard, AR 1995, Linear algebra (3rd edition with historical notes by VJ Katz), Addison-Wesley, Reading, MA.
Garner, S, McNamara, A & Moya, F 2004, CAS analysis supplement for Mathematical Methods CAS Units 3 and 4, Pearson, Melbourne.
Grattan-Guinness, I (ed.) 1994, Companion encyclopedia of the history and philosophy of the mathematical sciences, volume 1, Routledge, London.
SUMMARY (CONT.)
• Matrices began to be incorporated in senior secondary mathematics curricula from the 1960s, as discrete mathematics began to have a more significant role in curriculum design.
• Matrices have been used in the senior secondary mathematics curriculum to:
  – solve simple systems of simultaneous linear equations
  – apply transformations of the plane to sets of points
  – solve practical problems in which information can be modelled and manipulated using matrices and matrix operations
  – analyse transition states, such as in simple Markov chains
  – provide an example of an abstract mathematical structure
  – model complex numbers and vectors.
• From the 1960s through to the early 1980s, calculation of matrix operations in the senior secondary mathematics curriculum was generally carried out by hand (possibly with the assistance of a scientific calculator) and hence was restricted to applications involving matrices of low order.
• Access to spreadsheets, numeric processors, graphics and CAS calculators, and CAS software as enabling technology in the senior secondary mathematics curriculum from the late 1980s has permitted a broader range of applications involving higher order matrices to be addressed.
Hodgson, B & Patterson, J 1990, Change and approximation, Jacaranda, Melbourne.
Hoe, J 1980, ‘Zhu Shijie and his jade mirror of the four unknowns’, in JN Crossley (ed.), First Australian conference on the history of mathematics: Proceedings, Monash University, Clayton.
Holton, DA & Pye, W 1976, Aggregating algebra, Holt, Rinehart and Winston, Sydney.
Kangshen, S, Crossley, JN & Lun, A 1999, The nine chapters on the mathematical art: Companion and commentary, Oxford University Press, Oxford.
Katz, VJ 1993, A history of mathematics: An introduction, Harper-Collins, Reading, MA.
Kemeny, JG, Schleifer, A, Snell, JL & Thompson, GC 1972, Finite mathematics with business applications, Prentice-Hall, Englewood Cliffs, NJ.
Kissane, B 1997, More mathematics with a graphics calculator, Mathematical Association of Western Australia, Claremont.
Websites
http://www.ualr.edu/lasmoller/matrices.html – History Department, University of Arkansas at Little Rock
This website includes a concise overview of the history of matrices with a good list of links to related sites, including online applications for matrix computation.
http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Matrices_and_determinants.html – School of Mathematics and Statistics, University of St Andrews, Scotland
This website contains a comprehensive historical coverage of matrices, determinants and related mathematics from ancient Babylonia and China through to the modern era, with extensive cross references to key mathematicians and related topics.
CHAPTER 2
RECTANGULAR ARRAYS, MATRICES AND OPERATIONS
The purpose of this chapter is to provide some practical contexts to motivate the definition of matrices and their features, to introduce several operations on matrices, and to discuss their properties. When a new mathematical structure is introduced, students often need to explore the rationale behind this process, and find a concrete model for the elements of the structure and related operations defined on these elements that they can interpret in context. They can then subsequently explore certain generalisations or specialisations related to this structure as they become more confident with its elements and operations in their own right.
There are many different ways in which arrays can be made in two dimensions, typically based on regular geometric shapes such as polygons or circles. Such arrays can be devised by placing a finite set of physical objects according to some established procedure or frame of reference. In texts, writing runs in a given direction depending on the culture: along rows, as in English (left to right) or Arabic (right to left), or along columns, as in Chinese (top to bottom). Rows and columns can be used together to cross-reference information in texts, and a rectangular table is a convenient way of doing this. Information is frequently stored in such arrays, and conventions are required as to how information is to be read from these tables, or when and how information from such tables may be combined or otherwise manipulated.
In many contexts the information stored in specific locations within a rectangular array is numerical. For example, suppose we record the number of coins of the denominations 5 cents, 10 cents, 20 cents, 50 cents, one dollar and two dollars, in that order, of spare change that several individuals Michael, Jay, Sam and Lin, also in that order, collect over a week. We might write these in an ordered list:
{Michael {4, 0, 1, 3, 2, 2}; Jay {5, 1, 0, 0, 4, 2}; Sam {0, 0, 0, 4, 3, 0} and Lin {10, 4, 6, 0, 0, 1}}
Alternatively, this information could be displayed in a table:
          5 cent   10 cent   20 cent   50 cent   $1.00   $2.00
Michael 4 0 1 3 2 2
Jay 5 1 0 0 4 2
Sam 0 0 0 4 3 0
Lin 10 4 6 0 0 1
The corresponding more compact rectangular array of numbers is:
[  4   0   1   3   2   2 ]
[  5   1   0   0   4   2 ]
[  0   0   0   4   3   0 ]
[ 10   4   6   0   0   1 ]
Similarly, we might consider a business that has two outlets and sells only three products, A, B and C. We can represent the numbers of items of each product held in stock by each outlet in a table:
            Product
            A      B      C
Outlet 1    150    40     10
Outlet 2    70     20     10
The rectangular array of numbers representing current stock is

[ 150  40  10 ]
[  70  20  10 ]

The entry in each position corresponds to the number of products of a certain type in stock at a particular outlet. This is an example of a matrix: a simple rectangular array of numbers. Particular pieces of information can be obtained by reference to the row and column in which the desired information is located. For example, the number of items of Product C at Outlet 1, which is 10, is found in the first row, third column. It is common to designate matrices by capital letters, so this matrix could be called S (the stock matrix for this context) where:

S = [ 150  40  10 ]
    [  70  20  10 ]
Chapter 2: Rectangular arrays, matrices and operations
The piece of information discussed earlier, the numbers of items of Product C at Outlet 1, can be referred to as s13, where use of the lowercase s indicates an element of the matrix S, and the subscript 13 indicates this element is found in the first row and third column of the matrix S. This matrix has two rows and three columns, and this is summarised by saying that S is a matrix of size (order or dimension) 2 by 3, or alternatively a 2 × 3 matrix.
Teachers can then use students’ intuitive understanding of the practical context to provide natural definitions, and related conditions for the processes of addition, subtraction of matrices, scalar multiples of a matrix and the product of matrices.
For example, if a sale is made, say, of two items of Product B from Outlet 1, this can be represented by the matrix

[0  2  0]
[0  0  0]

and we can easily update the stock record by subtracting it from the stock matrix:

[150  40  10]   [0  2  0]   [150  38  10]
[ 70  20  10] - [0  0  0] = [ 70  20  10]
Similarly, if new stock is delivered, say, 10 items of Product A and 5 items of Product B to each of the outlets, this additional stock can be represented by the matrix

[10  5  0]
[10  5  0]

and the total stock of each type can be determined by addition:

[150  38  10]   [10  5  0]   [160  43  10]
[ 70  20  10] + [10  5  0] = [ 80  25  10]
In this sort of context it is usual to have a periodic valuation of stock. For example, it might be the case that at the end of each month the business accountant values the stock held at each outlet. Suppose Product A costs $50 per item, Product B $30 per item and Product C $80 per item. This can be
represented by the cost matrix

[50]
[30]
[80]

and we can easily calculate the value of the stock held by multiplication of the two matrices as follows:

[160  43  10]   [50]   [160×50 + 43×30 + 10×80]   [10090]
[ 80  25  10] × [30] = [ 80×50 + 25×30 + 10×80] = [ 5550]
                [80]
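This valuation is easy to reproduce with a few lines of code. A minimal Python sketch, using the figures from the worked example above, computes each outlet's stock value as a linear combination of quantities and unit costs:

```python
# Stock and cost figures from the worked example above.
stock = [[160, 43, 10],  # Outlet 1: items of products A, B, C
         [80, 25, 10]]   # Outlet 2
cost = [50, 30, 80]      # cost per item of A, B, C in dollars

# Each outlet's stock value is the linear combination of quantities and costs.
value = [sum(q * c for q, c in zip(row, cost)) for row in stock]
print(value)  # [10090, 5550]
```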
It should be noted that this form of product is essentially based on a linear combination of product numbers multiplied by their corresponding cost. Students will likely wonder at some stage why multiplication of matrices is not defined in terms of the corresponding arithmetic operation between elements in the same position in two matrices of the same order, as is the case for addition and subtraction of matrices. Practical examples such as the stock context provide a basis for understanding why the linear combination definition is used.
If at some time there is a 10% tax added to the cost of items, then the matrix of costs can easily be updated by multiplying the cost matrix by 100% + 10% = 110% or a factor of 1.1:
      [50]   [1.1 × 50]   [55]
1.1 × [30] = [1.1 × 30] = [33]
      [80]   [1.1 × 80]   [88]
The role of the scalar multiple can also be introduced through consideration of repeated addition, such as S + S = 2S, S + 2S = 3S, S + 3S = 4S … and so on; however, this has the limitation of restricting the scalar multiple to a whole number.
It is important to distinguish between this type of multiplication, a scalar multiple of a matrix, and the product of two matrices.
This simple example (see also VCAA 2005, 57–60, 149–50) demonstrates the use of matrices, and the operations of matrix addition, matrix multiplication and scalar multiplication of a matrix in a practical context where the natural, or intuitive interpretation of the processes being used motivates and models the development of matrix operations. Another context which is accessible to senior secondary school students is that of scoring for events in house or other sporting competitions. However this approach also means that the matrices used are, by definition, conformable for the operation that is to be applied, and their order is known prior to any related calculations.
To deal with matrices in their own right (that is, as reified objects), independently of a particular modelling context, and to explore their general properties it is necessary to be able to describe and work with them abstractly.
STUDENT ACTIVITY 2.1
a Use a suitable matrix product to calculate the total amount of change held by each of Michael, Jay, Sam and Lin in the given week.
b If the Australian–US dollar exchange rate is A$1 = US$0.76, use a suitable scalar multiple of the matrix in part a to find the equivalent value of their change in US dollars.
DEFINITION OF A MATRIX
An m × n matrix, A, is a rectangular array of numbers with m rows and n columns. We say A is of order, dimension or size, m by n and write m × n as shorthand for this. This does not mean that we wish to calculate the corresponding arithmetic product, although this will tell us the total number of elements in matrix A. Unfortunately it is not very helpful to know this as many matrices can have the same total number of elements.
As in our practical example, the position of each entry, or element, in the matrix is uniquely determined by its column and row numbers. Thus, we write
A = [a11  a12  ...  a1n]
    [a21  a22  ...  a2n]
    [ :    :          :]
    [am1  am2  ...  amn]
where the entry in the ith row and jth column, called the (i, j) entry of A, is denoted aij. In this case the letters i and j are index variables denoting position, where i runs through 1 to m, that is 1 ≤ i ≤ m, and j runs through 1 to n, that is 1 ≤ j ≤ n.
There are various notations that can be used for matrices. In this text we will use square brackets to enclose the entries of a matrix. Curved brackets are also used; however, it is conventional to use only one notation in a given context. Matrices are designated using upper case letters, and the entries in a matrix are identified with the corresponding lower case letter and subscript indices indicating their position; thus, we sometimes write A = [aij] where, as above, i is the row index and j the column index. As before, aij is the entry in the ith row and jth column of A, and the ranges of i and j are understood to be those given by the order of the matrix A.
Two special cases of note are that an m × 1 matrix is usually called a column matrix or column vector, while a 1 × n matrix is usually called a row matrix or row vector. If a rectangular array is not available for visual display, then a matrix can be written as a list of lists of equal size, where a list is an ordered set. For example, the matrix
[ 4   0    2]
[ 1   1    1]
[-5  10    3]
[ 0   0  3.4]

is the 4 × 3 matrix uniquely defined by the list {{4, 0, 2}, {1, 1, 1}, {-5, 10, 3}, {0, 0, 3.4}}.
When technology is used, the data to specify a matrix are either entered into a template of a specified size (where the dimensions of the matrix need to be specified first to obtain the desired template), or as a list of lists.
EXAMPLE 2.1

If A = [150  40  10]
       [ 70  20  10]

then A is a 2 × 3 matrix, where a21 = 70 and a12 = 40.

If B = [50]
       [30]
       [80]

then B is a 3 × 1 column matrix, or column vector, where b11 = 50, b21 = 30 and b31 = 80.

If C = [1  -2  4], then C is a 1 × 3 row matrix, or row vector, where c11 = 1, c12 = -2 and c13 = 4.
OPERATIONS ON MATRICES
As we have seen in the earlier practical example, matrices may be added, subtracted, multiplied by a number (scalar), or multiplied by other matrices. Some of these operations are not always possible; the sizes, or orders, of the matrices involved are important. That is to say, there are conditions to which two matrices need to conform for their sum, difference or product to be defined, or for them to be conformable for that operation. In practice, general computation with matrices of high order is carried out by technology, and the algorithms used by various programs to carry out these computations need the orders of the
matrices involved and definitions for the relevant processes in terms of elements and their indices. Since the various operations on matrices are defined in terms of their elements, it is important to note that these elements, and any scalars which may also be involved, are usually regarded as being drawn from some field, often the real number field, R. Thus, the operations of addition, subtraction and multiplication defined on elements of matrices are the natural operations of the relevant field.
An interesting exercise for teachers to work through with students is to devise programs using basic programming constructs in a suitable high-level programming language that carry out the operations of matrix arithmetic.
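As a sketch of what such a program might look like, the Python functions below implement addition, subtraction and scalar multiplication component-wise, exactly as in the definitions that follow. The function names are illustrative, not from any particular package:

```python
def add(A, B):
    """Entry-wise sum of two matrices of the same size."""
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def subtract(A, B):
    """Entry-wise difference of two matrices of the same size."""
    return [[A[i][j] - B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scalar_multiple(k, A):
    """Multiply every entry of A by the scalar k."""
    return [[k * A[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

print(add([[1, 3, 4], [2, 5, 1]], [[7, -3, 1], [-12, 1, 2]]))
# [[8, 0, 5], [-10, 6, 3]]
```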
ADDITION AND SUBTRACTION OF TWO MATRICES
If matrices A and B are of the same size m × n then A + B is the m × n matrix with (i, j) entry aij + bij, for i = 1 to m, j = 1 to n. That is, A + B = [aij] + [bij] = [aij + bij]. In other words, we simply add all the entries in their corresponding positions throughout the matrix.
Subtraction can be defined in the same way, and A – B = [aij] – [bij ] = [aij – bij]. In other words, we simply subtract all the entries in matrix B from their corresponding entries in matrix A.
EXAMPLE 2.2

[1  3  4]   [  7  -3  1]   [1+7       3+(-3)   4+1]   [  8  0  5]
[2  5  1] + [-12   1  2] = [2+(-12)   5+1      1+2] = [-10  6  3]

[1  3  4]   [  7  -3  1]   [1-7       3-(-3)   4-1]   [-6  6   3]
[2  5  1] - [-12   1  2] = [2-(-12)   5-1      1-2] = [14  4  -1]
MULTIPLICATION BY A NUMBER (SCALAR MULTIPLE)
Given a matrix A of size m × n and a number (scalar) k, then kA is the m × n matrix with (i, j) entry kaij for i = 1 to m and j = 1 to n. That is, if A = [aij ] and k is a scalar then kA = k[aij] = [k × aij].
EXAMPLE 2.3

If k = 3 and A = [ 2  -3  1]
                 [-2   7  5]

then

3A = A + A + A = 3 [ 2  -3  1] = [ 6  -9   3]
                   [-2   7  5]   [-6  21  15]
Note that subtraction can also be expressed in terms of a scalar multiple, with k = -1, and the addition operation. If A and B have the same size, then

A - B = A + (-B)

with entries the sums aij + (-bij) of the corresponding entries in A and (-B).
exAmple2.4
134
251
731
1212
1 734 1
2 125 1
663
1441
31 2-
-
- -
-
=
+ -
- +
- + -
+ - =
-
-
-
-
+
+ -
] ]
]
]
g g
g
g
R
T
SSSS
R
T
SSSS
R
T
SSSS
R
T
SSSS
V
X
WWWW
V
X
WWWW
V
X
WWWW
V
X
WWWW
There is a special matrix, called the zero matrix, O = [oij], where oij = 0 for all i and j. For any matrix A, A - A = O.
EXAMPLE 2.5

If A = [ 1   4]         [ 2  -1]
       [-2   2] and B = [ 4   3]
       [ 3  -1]         [-1   2]

then

i   A + B = [1+2      4+(-1)]   [3  3]
            [-2+4     2+3   ] = [2  5]
            [3+(-1)   -1+2  ]   [2  1]

ii  A - B = [1-2      4-(-1)]   [-1   5]
            [-2-4     2-3   ] = [-6  -1]
            [3-(-1)   -1-2  ]   [ 4  -3]

iii 2A - 3B = [ 2  8]   [ 6  -3]   [2-6      8-(-3)]   [ -4  11]
              [-4  4] - [12   9] = [-4-12    4-9   ] = [-16  -5]
              [ 6 -2]   [-3   6]   [6-(-3)   -2-6  ]   [  9  -8]

iv  A - A = [ 1   4]   [ 1   4]   [0  0]
            [-2   2] - [-2   2] = [0  0]
            [ 3  -1]   [ 3  -1]   [0  0]
STRUCTURE PROPERTIES OF MATRIX ADDITION AND SCALAR MULTIPLICATION

Let A, B and C be any matrices of a given size m × n, where m and n are non-zero. Then:
1  the sum of any two such matrices is always defined (Closure property for addition)
2  (A + B) + C = A + (B + C) (Associative property for addition)
3  A + O = A = O + A (Identity property for addition)
4  A + (-A) = O = (-A) + A (Inverse property for addition)
5  A + B = B + A (Commutative property for addition)

This collection of properties can be established by working from the general definition of matrix addition as applied to the matrices A = [aij], B = [bij], C = [cij] and O = [oij] of the same order. It may be helpful to have students undertake some general case calculations for matrices of a given order, for example 2 × 3 matrices. The results may appear to be obvious, or even trivial, to students; however, care should be taken to draw to their attention that these properties apply to matrices drawn from the same set of a given order (for any order) by virtue of the component-wise definition of addition of matrices and the corresponding number properties of their elements and operations. This may be summarised by saying that such a set of matrices forms a commutative (or abelian) group under addition. If we also consider multiplication of a matrix from this set by a scalar (scalar multiple), then for any scalars r and s the following properties also hold:
1  r(sA) = (rs)A (Associative property of scalar multiples)
2  (r + s)A = rA + sA (Right distributive property of scalar multiple over scalar addition)
3  r(A + B) = rA + rB (Left distributive property of scalar multiple over matrix addition)

Again, it may be useful for students to consider the general case for matrices of a given order. Taken together, these properties show that such a set of matrices with these operations of addition and scalar multiple forms what is called a vector space.
MATRIX MULTIPLICATION
As observed from the introductory example, the product matrix A × B, or product of two matrices, also written as AB, can only be defined when the number of columns in matrix A is equal to the number of rows in matrix B. Alternatively this may be expressed by saying that the matrices A and B are conformable for the product A × B when the number of elements in the rows of matrix A is the same as the number of elements in the columns of matrix B.
If A = [aij] is an m × p matrix, and B = [bij] is a p × n matrix, then AB = C = [cij] is an m × n matrix with (i, j) entry the number

cij = ai1 b1j + ai2 b2j + ... + aip bpj
That is, the (i, j) entry of the product is obtained by multiplying each of the entries in the ith row of A by the corresponding entries in the jth column of B, and then adding all these products.
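This definition translates directly into a triple loop. A minimal Python sketch, applied here to a 2 × 3 matrix and a 3 × 1 column vector:

```python
def multiply(A, B):
    """Product of an m x p matrix A and a p x n matrix B."""
    m, p, n = len(A), len(B), len(B[0])
    assert len(A[0]) == p, "A and B are not conformable for the product AB"
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # (i, j) entry: the sum over k of a_ik * b_kj
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(p))
    return C

print(multiply([[1, -2, 3], [4, 2, -1]], [[2], [-1], [3]]))  # [[13], [3]]
```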
EXAMPLE 2.6

a  If A = [1  -2   3]         [ 2]
          [4   2  -1] and B = [-1]
                              [ 3]
then

AB = [1  -2   3] [ 2]   [1×2 + (-2)×(-1) + 3×3  ]   [13]
     [4   2  -1] [-1] = [4×2 + 2×(-1) + (-1)×3  ] = [ 3]
                 [ 3]

Note that we cannot form the product BA, since the number of columns of B is not equal to the number of rows of A.

b  If A = [1  -2   3]         [-1  2]
          [4   2  -1] and B = [-2  3]
                              [ 1  4]
then

AB = [1×(-1) + (-2)×(-2) + 3×1    1×2 + (-2)×3 + 3×4 ]   [ 6   8]
     [4×(-1) + 2×(-2) + (-1)×1    4×2 + 2×3 + (-1)×4 ] = [-9  10]

and

BA = [(-1)×1 + 2×4    (-1)×(-2) + 2×2    (-1)×3 + 2×(-1)]   [ 7   6  -5]
     [(-2)×1 + 3×4    (-2)×(-2) + 3×2    (-2)×3 + 3×(-1)] = [10  10  -9]
     [1×1 + 4×4       1×(-2) + 4×2       1×3 + 4×(-1)   ]   [17   6  -1]
So we have the situation in Example 2.6 that AB is a 2 × 2 matrix and BA is a 3 × 3 matrix. While both products are defined, in this case they are not equal since the product matrices are of different size (order). Thus, if A and B are arbitrary matrices and the product AB is defined, it may be the case that the product BA is either not defined or, if it is defined, it may not be of the same
size as AB. It is important to point out this aspect of matrix multiplication to students at an early stage, and also that for two matrices each of a given order m × n, where m and n are different, neither product will be defined. These observations motivate consideration of conditions under which matrix multiplication might be generally defined for a given set of matrices, and also those circumstances under which addition might be generally defined for the same set of matrices.
A matrix of size m × n where m = n is called a square matrix or a n × n matrix. If A and B are two square matrices of the same size, then we can form the products AB and BA by definition, since the number of rows and columns in both matrices are equal. This also ensures that addition is defined on these matrices, and we know addition is commutative anyway. However, it is not so clear for multiplication of square matrices whether AB is the same as BA or not. Considering a particular example for two 2 × 2 matrices such as
A = [ 2  3]         [-1  2]
    [-1  5] and B = [ 2  3]

we have

AB = [ 2  3] [-1  2]   [2×(-1) + 3×2       2×2 + 3×3   ]   [ 4  13]
     [-1  5] [ 2  3] = [(-1)×(-1) + 5×2    (-1)×2 + 5×3] = [11  13]

and

BA = [-1  2] [ 2  3]   [(-1)×2 + 2×(-1)    (-1)×3 + 2×5]   [-4   7]
     [ 2  3] [-1  5] = [2×2 + 3×(-1)       2×3 + 3×5   ] = [ 1  21]
Clearly, AB ≠ BA in this case. Inspection of a range of other examples will generally show that it is not the case that AB is the same as BA. This is also the case for square matrices of other orders, a situation which students can readily investigate using suitable technology. They will readily develop a collection of cases which show that multiplication of square matrices is, in general, not commutative. A related investigation is to see if students can identify examples, and then sets, of square matrices for which the product is commutative, for example:
[3  0]     [6  0]
[0  2] and [0  1]
There are some important situations in which matrix products are commutative, for example when they are used to represent and compose certain types of transformations of the cartesian plane, such as rotations about the origin.
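Students can run this kind of investigation with a few lines of code. The sketch below (with a small product helper written out in full so the snippet stands alone) reproduces the non-commuting pair above and checks that two diagonal matrices do commute:

```python
def multiply(A, B):
    """Product of two conformable matrices as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 3], [-1, 5]]
B = [[-1, 2], [2, 3]]
AB = multiply(A, B)
BA = multiply(B, A)
print(AB, BA)    # [[4, 13], [11, 13]] [[-4, 7], [1, 21]]
print(AB == BA)  # False

# Diagonal matrices, by contrast, commute with each other.
D1 = [[3, 0], [0, 2]]
D2 = [[6, 0], [0, 1]]
print(multiply(D1, D2) == multiply(D2, D1))  # True
```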
Teachers may also wish to consider such arguments from the general definition of matrix multiplication for the case of, for example, 2 × 2 square matrices, and consideration of the equality of two matrices. Thus:
if A = [a  b]         [e  f]
       [c  d] and B = [g  h]

then

AB = [ae + bg   af + bh]
     [ce + dg   cf + dh]

and

BA = [ea + fc   eb + fd]
     [ga + hc   gb + hd]

If the elements of A and B are real numbers, then AB = BA when bg = fc, af + bh = eb + fd, and ce + dg = ga + hc.
Properties of matrix multiplication

If A, B and C are matrices of appropriate sizes, and k is a scalar, then:
1  A(B + C) = AB + AC (Distributive property of left multiplication over addition)
2  (B + C)A = BA + CA (Distributive property of right multiplication over addition)
3  (AB)C = A(BC) (Associative property of multiplication)
4  k(AB) = (kA)B
5  AB ≠ BA in general
ZERO AND IDENTITY MATRICES
A matrix with all entries zero is called the zero matrix (of appropriate size), and denoted O. A square matrix of size n × n with all entries zero except the diagonal entries, that is those in position (j, j) for j = 1 to n, which are all 1, is called the identity matrix of size n × n, denoted I. When the size (order) of matrices being considered is fixed, the symbols O and I can be used without ambiguity, otherwise the notation Om,n and In,n or just On (when m = n) and In can be used, as applicable.
EXAMPLE 2.7

O3,2 = [0  0]
       [0  0]
       [0  0]

is the zero matrix of size 3 × 2.

I2,2 = [1  0]
       [0  1]

is the identity matrix of size 2 × 2.

I3 = [1  0  0]
     [0  1  0]
     [0  0  1]

is the identity matrix of size 3 × 3.
Zero and identity matrices have properties similar to the numbers 0 and 1 with respect to addition and multiplication. Students should be able to convince themselves that if A is a matrix and O is the zero matrix of the same size, then A + O = A = O + A and A + (–A) = O = (–A) + A, and hence A – A = O. Similarly if A is an m × n matrix and I is the identity matrix of size n × n then AI = A, and if B is an n × m matrix then IB = B. If A is also an n × n matrix, then students should also be able to observe that AI = A = IA. Initially this may be tested by a judicious range of examples, and subsequently argued in terms of the general definitions of the relevant operations.
Exploration of these operations and their conformable computations for various combinations of matrices will enable students to see matrices as rectangular arrays that can be thought of as a list of lists of the same size, as a rectangular array and as abstract reified objects in their own right. They should also be aware that while particular computations involving matrices of relatively low order can be carried out fairly readily (if somewhat tediously) by hand, more complex computations and/or computations involving matrices of higher order are generally best carried out by technology designed for this purpose.
However, students should also be aware that general analysis of matrix operations is likely to involve component-wise operations with numbers using indexed sets of sums and/or products of these numbers. For example, students might be asked to show that for square matrices of a given size AO = O = OA but AB = O does not necessarily imply that A = O or B = O.
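For instance, the following sketch exhibits one such pair (the particular matrices are illustrative choices, not from the text): neither factor is the zero matrix, yet their product is.

```python
def multiply(A, B):
    """Product of two conformable matrices as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Neither A nor B is the zero matrix, but AB is.
A = [[2, 4], [3, 6]]
B = [[2, -2], [-1, 1]]
print(multiply(A, B))  # [[0, 0], [0, 0]]
```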
STUDENT ACTIVITY 2.2

a  Given A = [ 1  3]
             [-1  2],

   B = [ 2  1]
       [-2  4] and

   C = [2   4   6]
       [8  10  12],

   calculate:
   i 2A    ii A - B    iii A + B    iv AB    v BA    vi AC

b  Let M = [4  1]         [p]
           [8  6] and P = [q]

   where p and q are non-zero real numbers. Find all real values of the scalar k such that MP = kP.

c  Let A = [1  1]
           [1  1].

   Evaluate A^n for n = 2 to 5 and find a general form for n > 1.

d  Show that for square matrices of a given size AO = O = OA, but AB = O does not necessarily imply that A = O or B = O.

e  Let J = [ 0  1]
           [-1  0]

   and I be the identity matrix for multiplication for 2 × 2 matrices. Show that J^2 = -I and that J^4 = I.

f  Explain why, if X and Y are 2 × 2 matrices, then, in general, X^2 - Y^2 ≠ (X + Y)(X - Y), and illustrate this with a suitable (counter) example. Find two matrices X and Y for which this relationship is true.
THE TRANSPOSE OF A MATRIX

It is a natural question to ask what happens if a matrix is written 'the other way round'. The transpose of a matrix is obtained by interchanging its rows and columns. That is, the entries of the ith row become the entries of the ith column. So, if A = [aij] is an m × n matrix, then its transpose, denoted A^T = [aij^T], is an n × m matrix with aij^T = aji.
EXAMPLE 2.8

If A is the 3 × 2 matrix

[1  3]
[2  4]
[5  8]

then A^T is the 2 × 3 matrix

[1  2  5]
[3  4  8]
Property of matrix transposes

If A and B are matrices such that we can form the product AB, then

(AB)^T = B^T A^T
We illustrate this with an example.
EXAMPLE 2.9

Let A = [1  -2]         [2  -1  4]
        [3   0] and B = [0   1  3].

Then

AB = [2  -3  -2]              [ 2   6]
     [6  -3  12], so (AB)^T = [-3  -3]
                              [-2  12]

and

B^T A^T = [ 2  0] [ 1  3]   [ 2   6]
          [-1  1] [-2  0] = [-3  -3]
          [ 4  3]           [-2  12]
Note that we cannot form the product ATBT, since the number of columns in AT, namely 2, is not equal to the number of rows in BT, namely 3.
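The transpose property is also easy to check numerically. A small sketch using the matrices of Example 2.9:

```python
def transpose(A):
    """Interchange the rows and columns of A."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def multiply(A, B):
    """Product of two conformable matrices as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, -2], [3, 0]]
B = [[2, -1, 4], [0, 1, 3]]
print(transpose(multiply(A, B)))  # [[2, 6], [-3, -3], [-2, 12]]
print(transpose(multiply(A, B)) == multiply(transpose(B), transpose(A)))  # True
```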
Most equations involving matrices can be written in two forms: the original and its equivalent using transposes. For example, the matrix equation
[160  43  10]   [50]   [10090]
[ 80  25  10] × [30] = [ 5550]
                [80]
that we have seen earlier can also be written in transposed form:
[50  30  80] × [160  80]
               [ 43  25] = [10090  5550]
               [ 10  10]
Commonly, matrix equations involving transformations (see Chapter 4) and transitions (see Chapter 5) may also be written in a form that is the transpose of the form given in this book.
A symmetric matrix is the same as its transpose. If A = [aij] is a symmetric matrix, then aij = aji and AT = A. A symmetric matrix must be a square matrix.
EXAMPLE 2.10

[1  3]     [ 1  2  -4]
[3  2] and [ 2  2   0]
           [-4  0   3]

are symmetric matrices.
A diagonal matrix is a square matrix in which all the entries off the main diagonal are zero.
EXAMPLE 2.11

[1  0]     [1   0  0]
[0  2] and [0  -2  0]
           [0   0  3]

are diagonal matrices.
THE INVERSE OF A MATRIX

A square n × n matrix A is said to be invertible if there is a matrix C, written as A^-1, of the same size as A, such that AC = I and CA = I, where I is the identity matrix of size n × n. C is said to be the inverse matrix of A.

Not all square matrices have inverses. For example,

[2  4]
[3  6]

does not have an inverse.
Properties of inverses

It is important that students are familiar with some of the key properties of matrices and their inverses:
1  A square matrix has at most one inverse, where A^-1 A = I = A A^-1.
2  If A is invertible, then so is A^T, and (A^T)^-1 = (A^-1)^T.
3  If A is invertible, then so is A^-1, and (A^-1)^-1 = A.
4  If A and B are invertible matrices of the same size, then AB is invertible and (AB)^-1 = B^-1 A^-1.

These properties can be established fairly readily, and provide some good examples to students of proofs that are not lengthy, but are illustrative of important aspects of mathematical reasoning using structural properties such
as uniqueness, identity and inverse, definitions and the use of previously established results. Proofs of these properties are as follows:
1  If C and D are both inverses of A, then AC = I and CA = I, and AD = I and DA = I. So C = CI = C(AD) = (CA)D = ID = D, that is C = D, and the inverse A^-1 is unique.
2  AA^-1 = I, so I = I^T = (AA^-1)^T = (A^-1)^T A^T. Hence (A^T)^-1 = (A^-1)^T.
3  AA^-1 = I, so clearly A is the inverse matrix of A^-1.
4  Consider (B^-1 A^-1)(AB). By the associative property for multiplication, this is the same as B^-1(A^-1 A)B = B^-1 I B = B^-1 B = I, by the inverse and identity properties. Similarly (AB)(B^-1 A^-1) = I. So by 1, since inverses are unique, (AB)^-1 = B^-1 A^-1.
There are many uses of inverse matrices. The following are just a few.
1  The inverse of a matrix can be used for cancellation purposes in matrix equations. If A is invertible, then:
   i   AB = AC implies that B = C (since we can multiply both sides on the left by A^-1).
   ii  BA = CA implies that B = C (since we can multiply both sides on the right by A^-1).

   If A is not invertible, then this is not the case. For example, let

   A = [2  4]        [1  2  1]         [5  10  1]
       [3  6],   B = [3  4  2] and C = [1   0  2].

   Then

   AB = [2  4] [1  2  1]   [14  20  10]
        [3  6] [3  4  2] = [21  30  15]

   and

   AC = [2  4] [5  10  1]   [14  20  10]
        [3  6] [1   0  2] = [21  30  15]
   Hence AB = AC and yet B ≠ C.

2  The inverse matrix can be used to solve a system of simultaneous linear equations with a unique solution. Consider the following system of simultaneous equations:

   2x + 3y = 7
   4x + y = 3

   This system can be written in matrix form as AX = B, where

   A = [2  3]        [x]         [7]
       [4  1],   X = [y] and B = [3].

   If A^-1 exists, then we can multiply both sides of the equation on the left by A^-1, and we thus have X = A^-1 B, and so we can find the solution by matrix multiplication.
There are several ways to find the inverse of a matrix. The most useful one is via the Gauss–Jordan method (see, for example, Anton & Rorres, 2005, or Nicholson, 2003 for details). Since we shall generally only be concerned with the inverse of a 2 × 2 matrix, we will find it in a simple way.
EXAMPLE 2.12

Find the inverse of A = [a  b]
                        [c  d] if it exists.

Solution

Suppose [e  f]
        [g  h] is its inverse. Then

[a  b] [e  f]   [1  0]
[c  d] [g  h] = [0  1]

Hence:
ae + bg = 1   (i)
af + bh = 0   (ii)
ce + dg = 0   (iii)
cf + dh = 1   (iv)

Considering equation (ii), if we put f = -b and h = a, then the equation is satisfied. Similarly, for equation (iii) we can put e = d and g = -c.

Now consider the matrix product:

[a  b] [e  f]   [a  b] [ d  -b]   [ad - bc      0   ]
[c  d] [g  h] = [c  d] [-c   a] = [   0      ad - bc] = (ad - bc)I

and then it is easy to see that

A^-1 = 1/(ad - bc) [ d  -b]
                   [-c   a]

provided ad - bc ≠ 0, since if BA = kI, where k is a non-zero scalar, then ((1/k)B)A = I, and hence, by definition, A^-1 = (1/k)B.

If ad - bc = 0, then the matrix A does not have an inverse. The number ad - bc is called the determinant of A and denoted det(A) or |A|. (For more information on determinants, see the references on linear algebra.)
Inverse of a 2 × 2 matrix

A 2 × 2 matrix A = [a  b]
                   [c  d]

has an inverse if and only if ad - bc ≠ 0, and then

A^-1 = 1/(ad - bc) [ d  -b]
                   [-c   a]

(The reader can check that AA^-1 = A^-1 A = I.)

Note that an n × n matrix A has an inverse if and only if det(A) ≠ 0; see further references on linear algebra for the definition and calculation of such an inverse.
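A short sketch ties the last two ideas together: the 2 × 2 inverse formula, applied to the system 2x + 3y = 7, 4x + y = 3 from the text. Exact rational arithmetic via the standard library's Fraction avoids any rounding:

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the ad - bc (determinant) formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix has no inverse (determinant is zero)")
    return [[d / det, -b / det], [-c / det, a / det]]

# Solve AX = B, i.e. X = A^-1 B, for 2x + 3y = 7, 4x + y = 3.
A = [[2, 3], [4, 1]]
Ainv = inverse_2x2(A)
x = Ainv[0][0] * 7 + Ainv[0][1] * 3
y = Ainv[1][0] * 7 + Ainv[1][1] * 3
print(x, y)  # 1/5 11/5
```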
STUDENT ACTIVITY 2.3

a  Find the inverse of each of the following matrices:

   i  [1  2]     ii  [5  3]     iii  [2   1]
      [3  4]         [3  2]          [3  -1]

b  Use matrix inverses to solve the following system of simultaneous linear equations:

   2x + 3y = 7
   4x + y = 3
APPLICATIONS OF MATRICES

Fibonacci's rabbits

Suppose that newborn pairs of rabbits produce no offspring during the first month of their lives, but then produce one new pair every subsequent month. Start with F1 = 1 newborn pair in the first month and determine Fr, the number of pairs in the rth month, assuming that no rabbit dies.
Since the newborn pair do not produce offspring in the second month, we have F2 = F1 = 1. In the third month, the original pair will produce one pair of offspring, so F3 = 2. In the fourth month, the pairs in the second month will each produce another pair, so the total will be these newborn pairs added to the number of pairs from the previous month, that is F4 = 1 + 2 = 3, or F4 = F2 + F3.
Continuing in this manner, we see that
F5 = (Number of pairs alive in the third month, which each produce one pair of offspring in the fifth month) + (Number of pairs alive in the fourth month)
= F3 + F4
and in general that
Fr = (Number of pairs alive in month r – 1) + (Number of pairs alive in the (r – 2)th month, which each produce one pair of offspring in the rth month)
= Fr – 1 + Fr – 2.
Thus the sequence for the number of pairs of rabbits is
1, 1, 2, 3, 5, 8, 13, 21, 34 …
which is called the Fibonacci sequence, and has the property that from the third term on, each term is the sum of the preceding two terms in the sequence.
If Fr represents the rth term in this sequence, then
Fr = Fr – 1 + Fr – 2
We can express this in matrix form:
[Fr  ]   [1  1] [Fr-1]
[Fr-1] = [1  0] [Fr-2]

Then writing

fr = [Fr  ]                       [1  1]
     [Fr-1] for r > 1,   and  A = [1  0]

we find that fr = A fr-1. In particular, f3 = A f2, f4 = A f3 = A^2 f2, f5 = A f4 = A^3 f2 and, in general, fr = A^(r-2) f2. So elements of this sequence can be determined by finding powers of the matrix A, and multiplying by the column matrix f2.
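A minimal sketch of this computation, repeatedly multiplying by A rather than forming A^(r-2) explicitly (which amounts to the same thing):

```python
def multiply(A, B):
    """Product of two conformable matrices as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def fibonacci(r):
    """F_r, computed from f_r = A^(r-2) f_2 with A = [[1, 1], [1, 0]]."""
    A = [[1, 1], [1, 0]]
    f = [[1], [1]]          # f_2 = [F_2; F_1]
    for _ in range(r - 2):
        f = multiply(A, f)  # f_r = A f_(r-1)
    return f[0][0]

print([fibonacci(r) for r in range(1, 10)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34]
```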
STUDENT ACTIVITY 2.4
Find the 29th and 30th numbers in the Fibonacci sequence.
Codes
Governments, national security agencies, telecommunications providers, banks and other companies are often interested in the transmission of coded messages that are hard to decode by others, if intercepted, yet easily decoded at the receiving end. There are many interesting ways of coding a message, most of which use number theory or linear algebra. We will discuss one that is effective, especially when a large-size invertible matrix is used.
At a personal level, bank accounts, credit cards, superannuation funds, computer networks, phone companies, frequent flyer schemes, electronic
motorway toll systems and other commercial enterprises (such as online bookstores) require PINs or passwords to be able to do a transaction over the phone, at a machine or over the Internet. Banks now warn people not to write down the PIN/password at all, but to remember it. In some cases, people record PINs/passwords disguised as phone numbers, but this is not very secure and people are encouraged not to do it. Nowadays, a person usually has many different PINs/passwords which should be changed regularly. It is impossible to remember them all. However, it would be possible to write down the information in a coded form using the methods described below.
Let us start out with an invertible matrix M that is known only to the transmitting and receiving ends. For example:

M = [ –3  4 ]
    [ –1  2 ]
Suppose we want to code the message
L E A V E N O W
We make a table of letters of the alphabet with the number corresponding to the position of the letter in the alphabet under it. We use 0 for an empty space.
space A B C D E F G H
0 1 2 3 4 5 6 7 8
I J K L M N O P Q
9 10 11 12 13 14 15 16 17
R S T U V W X Y Z
18 19 20 21 22 23 24 25 26
We can use this table to replace each letter with the number that corresponds to the letter’s position in the alphabet.
 L   E   A   V   E  space  N   O   W  space
 12  5   1   22  5    0    14  15  23   0

(A final space, numbered 0, is included so that the numbers can be grouped in pairs.)
The message has now been converted into the sequence of numbers 12, 5, 1, 22, 5, 0, 14, 15, 23, 0, which we group as a sequence of column vectors:

[ 12 ]   [ 1  ]   [ 5 ]   [ 14 ]   [ 23 ]
[ 5  ] , [ 22 ] , [ 0 ] , [ 15 ] , [ 0  ]
and multiply on the left by M:

M [ 12 ] = [ –16 ] ,  M [ 1  ] = [ 85 ] ,  M [ 5 ] = [ –15 ] ,  M [ 14 ] = [ 18 ] ,  M [ 23 ] = [ –69 ]
  [ 5  ]   [ –2  ]      [ 22 ]   [ 43 ]      [ 0 ]   [ –5  ]      [ 15 ]   [ 16 ]      [ 0  ]   [ –23 ]

giving the sequence of numbers –16, –2, 85, 43, –15, –5, 18, 16, –69, –23. This is the coded message. Note that this could have been done in one step by the matrix multiplication:
M [ 12  1   5   14  23 ] = [ –16  85  –15  18  –69 ]
  [ 5   22  0   15  0  ]   [ –2   43  –5   16  –23 ]
To decode it, the recipient needs to compute M–1:

M–1 = [ –1    2   ]
      [ –1/2  3/2 ]
and multiply it by the vectors

[ –16 ]   [ 85 ]   [ –15 ]   [ 18 ]   [ –69 ]
[ –2  ] , [ 43 ] , [ –5  ] , [ 16 ] , [ –23 ]

to get back the original numbers:
M–1 [ –16 ] = [ 12 ] ,  M–1 [ 85 ] = [ 1  ] ,  M–1 [ –15 ] = [ 5 ] ,  M–1 [ 18 ] = [ 14 ] ,  M–1 [ –69 ] = [ 23 ]
    [ –2  ]   [ 5  ]        [ 43 ]   [ 22 ]        [ –5  ]   [ 0 ]        [ 16 ]   [ 15 ]        [ –23 ]   [ 0  ]

or

M–1 [ –16  85  –15  18  –69 ] = [ 12  1   5   14  23 ]
    [ –2   43  –5   16  –23 ]   [ 5   22  0   15  0  ]
Note: It is possible to use 2 × 2 matrices so that both the matrix and its inverse have integer entries—simply ensure that the determinant is equal to 1 or –1. It is also easy to extend this simple coding system to take account of alphanumeric data such as PINs.
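The whole scheme fits in a few lines of code. The sketch below is our own illustration of the method described above (the function names `encode` and `decode` are not from the text); it codes and then decodes the message, using M and the inverse computed earlier.

```python
# A sketch of the coding scheme described above, using the 2 x 2 key matrix M.
M     = [[-3, 4], [-1, 2]]       # det(M) = -2, so M is invertible
M_inv = [[-1, 2], [-0.5, 1.5]]   # M^(-1), from the 2 x 2 inverse formula

ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # position 0 is the space

def encode(message, key):
    nums = [ALPHABET.index(ch) for ch in message]
    if len(nums) % 2:                 # pad with a space to complete the last pair
        nums.append(0)
    out = []
    for i in range(0, len(nums), 2):  # multiply each column vector by the key
        x, y = nums[i], nums[i + 1]
        out += [key[0][0] * x + key[0][1] * y,
                key[1][0] * x + key[1][1] * y]
    return out

def decode(coded, key_inv):
    nums = []
    for i in range(0, len(coded), 2):  # multiply each pair by the inverse key
        x, y = coded[i], coded[i + 1]
        nums += [round(key_inv[0][0] * x + key_inv[0][1] * y),
                 round(key_inv[1][0] * x + key_inv[1][1] * y)]
    return "".join(ALPHABET[n] for n in nums)

coded = encode("LEAVE NOW", M)
print(coded)                          # [-16, -2, 85, 43, -15, -5, 18, 16, -69, -23]
print(decode(coded, M_inv).rstrip())  # recovers "LEAVE NOW"
```

With an integer-entry key whose determinant is 1 or –1, as suggested in the note above, the inverse also has integer entries and the rounding step becomes unnecessary.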
34
Matrices
mAthsWoRksfoRteACheRs
Student activity 2.5
a Based on this approach, code the message SAVE THE WHALES using the matrix

[ 5  3 ]
[ 3  2 ]

b The following message was received; it was coded using the matrix

M = [ 5  3 ]
    [ 3  2 ]

65, 42, 75, 50, 138, 87, 90, 54, 85, 54, 80, 49, 160, 99, 123, 76

Determine the original message.
Summary
• Matrices are ordered rectangular arrays of numbers. An array with m rows of n elements (that is, with n columns) is said to be a matrix of size (order, dimension) 'm by n', written as m × n. Capital letters are typically used to designate matrices.
• A matrix A of size m × n can be written as a list of lists: A = {{a11, a12, a13 … a1n}, {a21, a22, a23 … a2n}, {a31, a32, a33 … a3n} … {am1, am2, am3 … amn}} or using a template such as:

A = [aij] =
[ a11  a12  …  a1n ]
[ a21  a22  …  a2n ]
[  ⋮    ⋮        ⋮ ]
[ am1  am2  …  amn ]

where aij is the element in the ith row and the jth column, and 1 ≤ i ≤ m and 1 ≤ j ≤ n.
• Matrices can be added (are conformable for addition) if they are of the same size. The sum of two matrices is obtained by adding the elements in corresponding positions, that is A + B = [aij] + [bij] = [aij + bij] for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. For any matrices A, B and C of the same size, A + B = B + A and (A + B) + C = A + (B + C).
• The zero matrix of size m × n is defined by Om,n = [oij] where oij = 0 for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, and is often written as O when the order is known (fixed) in a given context and there is no ambiguity.
Summary (cont.)
• If A is a matrix of size m × n and k is a scalar then kA = k[aij] = [kaij] for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. In particular –1A = [–aij] = –A. For any matrix A and the corresponding zero matrix O, A + O = A = O + A, A + (–A) = O = (–A) + A and A – A = O. For any matrix A, and scalars r and s, r(sA) = (rs)A.
• If A is an m × n matrix and B is a p × q matrix then the product C = AB is defined when n = p and is a matrix of size m × q. Matrix multiplication is defined 'row by column', where cij = ai1b1j + ai2b2j + … + aipbpj for all 1 ≤ i ≤ m and 1 ≤ j ≤ q.
• In general, AB ≠ BA. If A, B and C are matrices and k is a scalar then (AB)C = A(BC) and k(AB) = (kA)B, provided the relevant matrix products are defined.
• The n × n identity matrix, written In,n = In or just I when the order is given and no ambiguity arises, is defined as the matrix with 1 for each element along its leading diagonal (top left to bottom right) and 0 for all other elements. For any square matrix A of a given order and the corresponding identity matrix I, AI = A = IA.
• If A, B and C are square matrices of a given size, and r and s are scalars, then A(B + C) = AB + AC, (B + C)A = BA + CA, (r + s)A = rA + sA and r(A + B) = rA + rB.
• The transpose of a matrix A = [aij] is the matrix AT whose element in the ith row and jth column is aji. If the product AB of two matrices is defined, then (AB)T = BTAT.
• The inverse of a square matrix A is a square matrix A–1 such that AA–1 = A–1A = I, the identity matrix of the same size as A. If AB is the product of two square matrices of the same size, then (AB)–1 = B–1A–1. For a 2 × 2 matrix

  A = [ a  b ]        A–1 = 1/(ad – bc) [  d  –b ]
      [ c  d ],                         [ –c   a ],

  provided ad – bc ≠ 0. (ad – bc is called the determinant of matrix A, written det(A) or |A|.)
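The last summary point is easy to check with exact arithmetic. The sketch below (plain Python with the standard `fractions` module; the function names are ours) implements the 2 × 2 inverse formula and verifies that AA–1 is the identity matrix.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via A^(-1) = 1/(ad - bc) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: ad - bc = 0")
    f = Fraction(1, det)
    return [[f * d, -f * b], [-f * c, f * a]]

def mat_mult(A, B):
    # 2 x 2 'row by column' multiplication
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, -1], [1, 1]]
print(mat_mult(A, inverse_2x2(A)))   # equals the identity matrix
```

Using `Fraction` rather than floating-point division means the product comes out as exactly the identity, with no rounding error.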
References
Anton, H & Rorres, C 2005, Elementary linear algebra (applications version), 9th edn, John Wiley and Sons, New York.
Cirrito, F (ed.) 1999, Mathematics higher level (core), 2nd edn, IBID Press, Victoria.
Hill, RO Jr 1996, Elementary linear algebra with applications, 3rd edn, Saunders College, Philadelphia.
Lipschutz, S & Lipson, M 2000, Schaum's outlines of linear algebra, 3rd edn, McGraw-Hill, New York.
Nicholson, KW 2001, Elementary linear algebra, 1st edn, McGraw-Hill Ryerson, Whitby, ON.
Nicholson, KW 2003, Linear algebra with applications, 4th edn, McGraw-Hill Ryerson, Whitby, ON.
Poole, D 2006, Linear algebra: A modern introduction, 2nd edn, Thomson Brooks/Cole, California.
Sadler, AJ & Thorning, DWS 1996, Understanding pure mathematics, Oxford University Press, Oxford.
Victorian Curriculum and Assessment Authority (VCAA) 2005, VCE Mathematics study design, VCAA, East Melbourne.
Wheal, M 2003, Matrices: Mathematical models for organising and manipulating information, 2nd edn, Australian Association of Mathematics Teachers, Adelaide.
Websites
http://wims.unice.fr/wims/en_tool~linear~matrix.html
This website provides a matrix calculator.
http://en.wikipedia.org/wiki/Matrix_(mathematics)
This website gives a concise introduction to matrices and matrix arithmetic, and has links to other resources and references.
http://www.sosmath.com/matrix/matrix.html
This site has some notes on basic matrix concepts and operations at quite a simple level.
Chapter 3
Solving systems of simultaneous linear equations
From the middle years of secondary schooling, students become familiar with linear relations of the form ‘a certain number of x s added to a certain number of y s are equal to a given number’, such as 2x + 3y = 24. Usually this is done by considering a table of whole number ordered pairs of values that satisfy the relation, and then plotting the corresponding points on a set of cartesian axes. This is then typically extrapolated (by an implicit continuity assumption) to consideration of the continuous straight line on which these points lie. Thus students learn to draw the graph, part of which is shown in Figure 3.1, of such
Figure 3.1: part of the graph of the linear relation 2x + 3y = 24
linear relations, by identification of their axis intercepts and drawing in the line containing these two points. The corresponding working might go something like: when x = 0, 3y = 24 and so y = 8, hence (0, 8) are the coordinates of the vertical, or y-axis, intercept. Similarly, when y = 0, 2x = 24 and so x = 12, hence (12, 0) are the coordinates of the horizontal, or x-axis, intercept.
Students will also learn how to identify the gradient of the straight line from this form, the graph, or possibly by re-writing it in the function form y = –(2/3)x + 8.
A single linear equation in two variables ax + by = k, where a, b and k are real constants, is used to define the linear relation corresponding to the set of points {(x, y): ax + by = k, x, y ∈ R} and this set of points can be used to draw the graph of a straight line in the cartesian plane R2. Any ordered pair (x, y) that satisfies this equation is a point on the graph of the straight line. If we are asked to consider the set of points which satisfy both this relation and another relation of the same kind together, then we have a set of two simultaneous linear equations in the variables x and y. Given that each relation corresponds to the graph of a straight line in R2, we can interpret this geometrically to see that there are three possibilities:
• The simultaneous linear equations corresponding to the rules of these relations have a unique solution, and their graphs have a unique point of intersection.
• The simultaneous linear equations corresponding to the rules of these relations have no solution, and their graphs are parallel and distinct straight lines.
• The simultaneous linear equations corresponding to the rules of these relations have infinitely many solutions, the equations represent the same relation and their graphs are the same straight line.
As a related activity students could be asked to identify the rules of several other linear relations that correspond to each of these cases with respect to 2x + 3y = 24, and verify these by using technology to draw the corresponding graphs. In such an activity, students are not considering the relationship in terms of its component parts, but as an object in its own right: each relation has a graph, and this graph may or may not have certain properties with respect to the given relation 2x + 3y = 24 and its graph.
In the first and last cases described above, the system of simultaneous linear equations is said to be consistent, in the second case it is said to be inconsistent. It is likely that a set of three or more arbitrarily selected
simultaneous linear equations in two variables will be inconsistent, however this is not always the case. For example the set of simultaneous linear equations {3x + 2y = 5, x – y = 0, –x + 4y = 3} has the unique solution (1, 1). Indeed, teachers can ask students to form systems of several simultaneous linear equations in two variables that are satisfied by a given ordered pair (m, n), simply by choosing the constant k to be am + bn for each combination a and b of coefficients for x and y in ax + by = k. Students will likely be familiar with solving simultaneous systems of two linear equations in two variables, using graphical, numerical and algebraic approaches, from their work in the middle years of secondary mathematics. The linear equations involved will have been alternatively presented in the forms y = mx + c, ax + by = k and Ax + By + C = 0, and students should be able to convert algebraically between these forms. They should also be aware that the first form only applies to linear relations that are functions; that is, it is not possible to use this form to describe linear relations such as {(x, y): x = 6, y ∈ R}. It is perhaps useful in this context to explicitly point out how this relates to the coordinate specification of the position of a point in the cartesian plane. For example, as shown in Figure 3.2, the point with coordinates (4, 7) corresponds to the solution of the pair of simultaneous linear equations x = 4, or 1x + 0y = 4, and y = 7, or 0x + 1y = 7:
Figure 3.2: parts of the graphs of the lines with equations x = 4 and y = 7 showing their point of intersection at (4, 7)
Practical and theoretical problems in many areas can be formulated in terms of solving a system of simultaneous linear equations. A linear equation in two variables is a special case of an equation of the form f(x, y) = k, where k is a real constant and f(x, y) has the form ax + by for real constants a and b. For example, f(x, y) = 3x – 2y = 10 is a (linear) equation where k = 10 and a and b are 3 and –2 respectively. Students would be familiar with the use of this form to specify the rule of a linear function in the cartesian plane, or R2, and its corresponding straight line graph, with gradient 3/2 and axis-intercepts at (0, –5) and (10/3, 0). If g(x, y) = l also has the same form, for example 4x + 5y = –3, then we have a pair of simultaneous linear equations {f(x, y) = k, g(x, y) = l} or {3x – 2y = 10, 4x + 5y = –3}. The solution to this pair of simultaneous linear equations will be the set of all ordered pairs (x, y) that satisfy both equations, in this case the single ordered pair (44/23, –49/23). The solution can be readily found using a computer algebra system (CAS), such as Derive:
SOLVE([3·x – 2·y = 10, 4·x + 5·y = –3], [x, y])

[x = 44/23 ∧ y = –49/23]
This ordered pair represents the point of intersection of the corresponding straight line graphs in the cartesian plane, as shown in Figure 3.3.
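The same exact solution can be reproduced without a CAS. The following sketch applies Cramer's rule for a 2 × 2 system (the helper `solve_2x2` is our own, shown only to illustrate that the solution is an exact pair of rationals rather than a decimal approximation).

```python
from fractions import Fraction

def solve_2x2(a, b, e, c, d, f):
    """Solve {ax + by = e, cx + dy = f} exactly by Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("no unique solution: coefficient matrix is singular")
    x = Fraction(e * d - b * f, det)   # replace the x-column by the constants
    y = Fraction(a * f - e * c, det)   # replace the y-column by the constants
    return x, y

x, y = solve_2x2(3, -2, 10, 4, 5, -3)   # {3x - 2y = 10, 4x + 5y = -3}
print(x, y)                             # 44/23 -49/23
```

Substituting back, 3(44/23) – 2(–49/23) = 230/23 = 10 and 4(44/23) + 5(–49/23) = –69/23 = –3, confirming the CAS output.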
Figure 3.3: graphs of the two linear functions 3x – 2y = 10 and 4x + 5y = –3 showing their point of intersection
We can similarly define a linear equation in three variables as a special case of an equation of the form f(x, y, z) = k, where k is a real constant and f(x, y, z) has the form ax + by + cz for real constants a, b and c. For example, if f(x, y, z) = 3x – 2y + z, then 3x – 2y + z = 0 is a (linear) equation where k = 0 and a, b and c are 3, –2 and 1 respectively. Some students will have worked with this form as representing the equation of a plane in three-dimensional cartesian space, or R3. Any point with coordinates (x, y, z) that satisfies this equation lies on the corresponding plane. Again, this single equation has infinitely many solutions: all the points in the plane that it defines.
Technologies such as CAS are useful tools to assist in graphically representing three-dimensional shapes. We can similarly form sets of simultaneous linear equations involving three variables, with two or more equations. For example, the intersection of 3x – 2y + z = 0 with x – y – z = 10, corresponds to the solution set:
SOLVE([3·x – 2·y + z = 0, x – y – z = 10], [x, y])

[x = –3·z – 20 ∧ y = –2·(2·z + 15)]
This is a parametric solution in terms of the variable z, where for each real value of z the corresponding values of x and y are given in terms of z. Thus, the solution set is {(–3z – 20, –2(2z + 15), z): z ∈ R}. Each value of z generates the coordinates of a point in R3, represented by an ordered triple, and these points all lie on the straight line formed in the three dimensional space, R3, where the two planes intersect, as shown in part in Figure 3.4.
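A quick way to confirm a parametric solution like this is to substitute it back into both equations; here is a small check of our own, using a sample of integer values of z (any values would do, since the algebra holds identically).

```python
# Check the parametric solution (x, y, z) = (-3z - 20, -2(2z + 15), z)
# against both plane equations for a range of sample z values.
for z in range(-10, 11):
    x = -3 * z - 20
    y = -2 * (2 * z + 15)
    assert 3 * x - 2 * y + z == 0    # first plane: 3x - 2y + z = 0
    assert x - y - z == 10           # second plane: x - y - z = 10
print("parametric solution verified for z in -10..10")
```

Expanding symbolically gives the same conclusion: 3(–3z – 20) – 2(–4z – 30) + z = 0 and (–3z – 20) – (–4z – 30) – z = 10 for every real z.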
Figure 3.4: graph of parts of 3x – 2y + z = 0 and x – y – z = 10 showing their intersection
This process can be extended to systems defined by sets of simultaneous linear equations in as many variables as we wish, although it is not then possible to provide simple geometric interpretations for more than three variables.
CAS, and other technologies, can be used to solve systems of simultaneous linear equations directly in ‘black-box’ mode (that is, providing results without detailing intermediate steps of calculation). In this text the use of such technologies is intended to elucidate the processes that are employed in applying algorithms to obtain solutions. While mental and by hand computation is an important part of the experience of developing understanding of key concepts, skills and processes, particularly in less complicated illustrative cases, CAS and other technologies have been developed to facilitate problem solving and analysis of the behaviour of mathematical systems by application of their functionality. They are essential tools to employ in contexts where many computations are required to be carried out quickly and reliably, such as with matrices of large size (order).
Although judgments about when, where and how it is appropriate to employ enabling technology (or not) will naturally vary with practical and philosophical considerations in context, individuals will ultimately make their own decisions on this matter where they have a choice. Teachers may find it useful to discuss with students a range of perspectives and considerations on this issue, including their own views and the rationales for these views.
Matrices provide a natural model for representations and manipulation of systems of simultaneous linear equations. For example, the linear form of f(x, y, z) = k corresponds to multiplying the variables x, y and z by their respective coefficients a, b and c, and making the sum of these values equal to k, that is, ax + by + cz = k. This is precisely the same as matrix row by column multiplication:
              [ x ]
[ a  b  c ] × [ y ]  =  [ k ]
              [ z ]
Moreover, this extends naturally to a system of simultaneous linear equations, because for matrices such multiplication is defined for each combination of rows and columns where the matrices involved are conformable for multiplication. That is, the equations involved have the same number of variables, even if some of their coefficients are zero. For example, the equations 3x – 2y + z = 0 and x – z = 4 can be represented as the two
matrix equations

              [ x ]
[ 3  –2  1 ] [ y ] = [ 0 ]
              [ z ]

and

              [ x ]
[ 1  0  –1 ] [ y ] = [ 4 ]
              [ z ]

respectively, or simultaneously via the single matrix equation

[ 3  –2   1 ] [ x ]   [ 0 ]
[ 1   0  –1 ] [ y ] = [ 4 ]
              [ z ]
Student activity 3.1
a Write down a system of two simultaneous linear equations in two variables that has (–5, 6) as its unique solution.
b Write down a system of two simultaneous linear equations in two variables that has (–5, 6) as one of its many solutions.
c Write down a system of two simultaneous linear equations in three variables that has (0, 0, 0) and (1, 1, 1) as solutions.
d Use the solve functionality of a CAS, or a like functionality of other suitable technology, to find the intersection of 3x – 2y + z = 0 and x – y – z = 10, and express the solution set in terms of y.
Solving systems of simultaneous linear equations using the matrix inverse
If a system of n simultaneous linear equations in n variables is consistent and has a unique solution, then square matrices and their inverse matrices may be used to find this unique solution. We will begin by considering two simple examples, give some geometric interpretations and then introduce some general notation. To start with, it is important for students to see how a familiar set of two simultaneous linear equations in two variables may be written as an array, as is often the case for by hand techniques, and subsequently as a single matrix equation.
Example 3.1
Consider the system of two simultaneous linear equations in two variables (often called unknowns in this context), x and y, defined by requiring two numbers x and y to have a difference of 1 and a sum of 3. This is given by {x – y = 1, x + y = 3} and can be written in a rectangular array form, with corresponding variables vertically aligned, as

  x – y = 1
  x + y = 3

or in matrix form as

[ 1  –1 ] [ x ]   [ 1 ]
[ 1   1 ] [ y ] = [ 3 ]
In this case the solution of x = 2 and y = 1 can be readily identified by inspection. However, this will not always be the case.
The system of three simultaneous linear equations in three variables, x, y and z, given by {x – y – z = 0, 6x + 4y = 20, –4y + 2z = 10} can be written in a rectangular array form, with corresponding variables vertically aligned, as

   x –  y –  z =  0
  6x + 4y      = 20
     – 4y + 2z = 10

or in matrix form as

[ 1  –1  –1 ] [ x ]   [  0 ]
[ 6   4   0 ] [ y ] = [ 20 ]
[ 0  –4   2 ] [ z ]   [ 10 ]
Other more complicated systems of n equations in n variables can also be likewise represented.
The matrices

[ 1  –1 ]        [ 1  –1  –1 ]
[ 1   1 ]  and   [ 6   4   0 ]
                 [ 0  –4   2 ]

are commonly called the coefficient matrices, the matrices

[ 1 ]        [  0 ]
[ 3 ]  and   [ 20 ]
             [ 10 ]

the constant matrices, and

[ x ]        [ x ]
[ y ]  and   [ y ]
             [ z ]

the matrices of the variables or unknowns.
Students should be encouraged to note that the matrices involved hold
systematic information about the system of equations. Each column of the coefficient matrix contains the coefficients of one variable, one coefficient from each of the equations. The variables are thus ordered according to which column of the matrix they correspond to. If we write A for the coefficient matrix, B for the constant matrix and X for the matrix of unknowns, then each of the above systems of simultaneous equations (and any other like system) can be represented in the same form by a single matrix equation
AX = B. This equation can be solved for X by left multiplying both sides of the matrix equation by A–1 and applying matrix algebra:

AX = B
A–1(AX) = A–1B      (multiplying by A–1 on the left)
(A–1A)X = A–1B      (by associativity of matrix multiplication)
IX = A–1B           (since A–1A = I)
X = A–1B            (since IX = X)
For the system of two simultaneous linear equations in two unknowns this gives:

[ x ]   [ 1  –1 ]–1 [ 1 ]   [ 1/2   1/2 ] [ 1 ]   [ 2 ]
[ y ] = [ 1   1 ]   [ 3 ] = [ –1/2  1/2 ] [ 3 ] = [ 1 ]
So x = 2 and y = 1 is the (simultaneous) solution to both the equations, as expected. Using the matrix inverse, both values are obtained at the same time, unlike other techniques in which the values are determined successively. Geometrically, each equation corresponds to a straight line in the cartesian plane, and they intersect at the point with coordinates (2, 1), as shown in Figure 3.5.
Figure 3.5: graphs of x – y = 1 and x + y = 3 and their point of intersection
For the system of three simultaneous linear equations in three unknowns this gives:

[ x ]   [ 1  –1  –1 ]–1 [  0 ]   [  2/11   3/22   1/11 ] [  0 ]   [ 40/11 ]
[ y ] = [ 6   4   0 ]   [ 20 ] = [ –3/11   1/22  –3/22 ] [ 20 ] = [ –5/11 ]
[ z ]   [ 0  –4   2 ]   [ 10 ]   [ –6/11   1/11   5/22 ] [ 10 ]   [ 45/11 ]

So x = 40/11, y = –5/11 and z = 45/11 is the simultaneous solution to all three linear equations, and this is not really evident 'by inspection'. Geometrically, each equation corresponds to a plane in the three-dimensional space R3, and these planes intersect at the point with coordinates (40/11, –5/11, 45/11), as shown in Figure 3.6.
Figure 3.6: graphs of x – y – z = 0, 6x + 4y = 20 and –4y + 2z = 10 and their point of intersection
These are called consistent systems, as there is a solution. In both cases the solution is also unique. This simple matrix method is ideal if there is a unique solution. However, in many cases there may be no solution or infinitely many solutions, and in these cases the coefficient matrix does not have an inverse.
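When a unique solution does exist, the computation X = A–1B can be carried out exactly in a few lines of code. The sketch below is our own minimal elimination routine (it assumes the coefficient matrix is invertible, and uses Python's exact `Fraction` type); it reproduces the solution of the 3 × 3 example above, and previews the systematic elimination method discussed later in this chapter.

```python
from fractions import Fraction

def solve(A, B):
    """Solve AX = B exactly for a unique solution, assuming the n x n
    coefficient matrix A is invertible."""
    n = len(A)
    # form the augmented matrix [A | B] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, B)]
    for col in range(n):
        # find a row with a non-zero entry in this column and move it up
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # scale the pivot row so the pivot becomes 1
        M[col] = [v / M[col][col] for v in M[col]]
        # eliminate every other entry in this column
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [v - M[r][col] * w for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]

A = [[1, -1, -1], [6, 4, 0], [0, -4, 2]]
B = [0, 20, 10]
print(solve(A, B))   # [Fraction(40, 11), Fraction(-5, 11), Fraction(45, 11)]
```

On the 2 × 2 example, `solve([[1, -1], [1, 1]], [1, 3])` returns the expected solution x = 2, y = 1.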
Intuitively students should be able to discuss and identify geometric interpretations of possible intersections of graphs of straight lines (using rulers) and planes (using sheets of paper) corresponding to the two-dimensional and three-dimensional cases respectively. If two students write down the equations of two straight lines of the form ax + by = c independently and then compare, there are three possibilities:
• They correspond to distinct lines with different gradients which have a unique point of intersection, for example {3x + 2y = 4, –x + y = 0}.
• They correspond to distinct lines with the same gradient which have no points of intersection, that is they are parallel with different axis intercepts, for example {3x + 2y = 4, 3x + 2y = 0}.
• They correspond to the same line with the same gradient and have infinitely many points of intersection, that is they are parallel with the same axis intercepts, for example {3x + 2y = 4, 1.5x + y = 2}.
If three students write down the equations of three planes of the form ax + by + cz = d independently and then compare, there are also several possibilities:
• The three planes are distinct and intersect in a unique point.
• The three planes are distinct and intersect in a line (like a three-page book).
• The three planes are identical and intersect in infinitely many points that form a single common plane.
• The three planes do not intersect all together. (There are several ways in which this might occur geometrically.)
Students should be able to discuss the idea that, as a single equation in three variables of the form ax + by + cz = d represents a single plane in R3, then a system of two simultaneous linear equations in three variables could have zero or infinitely many solutions, the former because the planes are parallel but distinct, the latter either because the planes are identical or because they define a line in R3 by their set of intersection points. Student exploration of these possibilities will be aided by access to CAS with two-dimensional and three-dimensional graphing functionalities.
Example 3.2
Consider the following systems of two simultaneous linear equations in the two variables x and y:

  x + y = 4            –6x + 2y = –8
  x + y = 2    and      3x –  y =  4

Their coefficient matrices are

[ 1  1 ]        [ –6   2 ]
[ 1  1 ]  and   [  3  –1 ]

respectively, and neither of these has an inverse. That is, they are both singular matrices as their determinants are both equal to zero.
For the first system, the corresponding lines are parallel and do not intersect, as shown in Figure 3.7, so there are no solutions. This system is said to be inconsistent.
Figure 3.7: graphs of x + y = 2 and x + y = 4, parallel straight lines with no points of intersection
For the second system, the corresponding lines are in fact the same line, as shown in Figure 3.8, hence each point on the line is a solution to the system. This system is consistent, but with infinitely many solutions.
Figure 3.8: graphs of 3x – y = 4 and –6x + 2y = –8, identical straight lines with infinitely many points of intersection
If we use y = k as the free variable, a set of solutions, or solution set, can be written parametrically in the form {((k + 4)/3, k): k ∈ R}. There are infinitely many other ways of writing the solution set for this system. For example, another solution set, using x = t as the free variable, is {(t, 3t – 4): t ∈ R}. Each value of k or t, as applicable, generates the coordinates of a solution point.
Student activity 3.2
a The system of linear equations {ax + by = 0, cx + dy = 0}, where a, b, c and d are real constants, is called a homogeneous system. Show that for this system there is either a unique solution or infinitely many solutions.
b Describe relationships between real constants a, b, c, d, e and f for which the system of simultaneous linear equations {ax + by = e, cx + dy = f} has:
– a unique solution
– no solution
– infinitely many solutions
c A system of equations has solution set {(3t + 1, 2t – 1): t ∈ R}. Find the corresponding cartesian equation.
d Write the solution set of the system of simultaneous equations

  –6x + 2y = –8
   3x –  y =  4

in two ways that are different from the ways given above.
Although we cannot use the inverse of the coefficient matrix to solve the above systems, we can use another technique, called Gaussian elimination, to solve such systems. This is a generalisation of the process of eliminating variables, carried out systematically and represented in matrix form. This algorithm can also be applied in cases where the number of equations and the number of variables differ, which is an important technical generalisation.
The method of Gaussian elimination
We have seen above that a system of simultaneous linear equations can be written in matrix form. While the method involving the inverse of the matrix of coefficients can be used when the system has the same number of equations as variables, and the required inverse exists (that is, the system is consistent and has a unique solution), it would be useful to have a more general method of analysis. Such a method should:
• enable us to determine whether a system is consistent or not, irrespective of its order
• determine the solutions to the system, involving parametric forms as applicable
• generalise existing methods for known simple cases.
Such a method exists, and is called Gaussian elimination. It is a numerical method, involving only the coefficients and constants of the system of equations, and can be effectively represented using matrices. This method underpins the processes used in technology such as CAS for solving systems of simultaneous linear equations and related operations. To do this we simply use the coefficient matrix and the matrix of constants as discussed previously, and place them side by side to form a new matrix called the augmented matrix for the system of simultaneous linear equations.
Example 3.3
The system

  x – y = 1
  x + y = 3

has augmented matrix

[ 1  –1 | 1 ]
[ 1   1 | 3 ]

The system

   x –  y –  z =  0
  6x + 4y      = 20
     – 4y + 2z = 10

has augmented matrix

[ 1  –1  –1 |  0 ]
[ 6   4   0 | 20 ]
[ 0  –4   2 | 10 ]
In general, the matrix equation AX = B, where A is an m × n coefficient matrix, X is the n × 1 matrix of variables and B is the m × 1 matrix of constants, gives rise to the m × (n + 1) (that is, one additional column) augmented matrix:
[A | B] =
[ a11  a12  …  a1n | b1 ]
[ a21  a22  …  a2n | b2 ]
[  ⋮    ⋮        ⋮ |  ⋮ ]
[ am1  am2  …  amn | bm ]
The important discussion is to lead from this form, which is only a representation of all the coefficients and constants of the original system, to another form which enables us to read off the solutions for x1 through to xn. The processes by which we move from one form to another must ensure that these forms are equivalent, where two systems of simultaneous linear equations are said to be equivalent if each has the same set of solutions. The idea of Gaussian elimination is to solve a system of simultaneous linear equations by writing a sequence of systems, each one equivalent to the previous system. Then each of these systems has the same set of solutions as the original one. The aim is to end up with a system that is easy to solve, such as one in what is called triangular form. Instead of writing the system of equations out each time, we simply write the corresponding augmented matrix, since the natural ordering of the matrix takes care of ‘tracking’ what happens to the coefficients of the variables.
To do this, only a certain type of operation, called an elementary operation, can routinely be performed on systems of simultaneous linear equations to produce equivalent systems. These are the operations we would use in solving such a system by hand.
1 Interchange: Any two equations can be interchanged.
2 Scaling: We can multiply an equation by a non-zero constant.
3 Elimination: We can add a constant multiple of one equation to another equation.
In practice, operations 2 and 3 are often applied in conjunction to add or subtract a multiple of one equation from a multiple of another equation, usually to 'eliminate' a variable from one equation.
Elementary operations performed on a system of simultaneous equations produce corresponding manipulations of the rows of the augmented matrix. Hence, either by hand or using technology, we manipulate the rows of the
Matrices
MathsWorks for Teachers
augmented matrix rather than the equations. These row operations have the same effect as the operations on equations, where the equations have been written as a rectangular array with the coefficients of the variables and the constants vertically aligned. The following are the corresponding elementary row operations for a matrix:
1 Interchange: Interchange any two rows in their entirety.
2 Scaling: Multiply any row by a non-zero constant.
3 Elimination: Add a constant multiple of one row to another row.
A matrix is said to be in (row) echelon form if it satisfies the following two conditions:
1 If there are any zero rows, they are at the bottom of the matrix.
2 The first non-zero entry in each non-zero row (called the leading entry or pivot) is to the right of the pivots in the rows above it.
Matrices which are in row echelon form have a ‘staircase’ appearance:

$$\begin{bmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Example 3.4
The following matrices are in echelon form:

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\quad \begin{bmatrix} 2 & 5 \\ 0 & 7 \\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 2 & 3 & 5 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 5 & 3 & 4 \\ 0 & 1 & 2 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 & 2 & 3 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$
This idea can be extended further: a matrix is said to be in reduced (row) echelon form if it is in echelon form and also:
3 Each pivot is 1.
4 Each pivot is the only non-zero entry in its column.
Example 3.5
The following matrices are in reduced row echelon form:

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & 1 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
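Conditions 1–4 can also be checked mechanically. The following Python sketch (the name `is_rref` is illustrative) tests a candidate matrix against them.

```python
def is_rref(M):
    """Check the four conditions for reduced (row) echelon form:
    zero rows at the bottom, pivots strictly moving right, each
    pivot equal to 1, and each pivot alone in its column."""
    last_pivot_col = -1
    seen_zero_row = False
    for row in M:
        nz = [j for j, x in enumerate(row) if x != 0]
        if not nz:
            seen_zero_row = True             # condition 1: zero rows last
            continue
        if seen_zero_row:
            return False                     # non-zero row below a zero row
        c = nz[0]                            # column of the leading entry
        if c <= last_pivot_col:              # condition 2: pivots move right
            return False
        if row[c] != 1:                      # condition 3: each pivot is 1
            return False
        column = [r[c] for r in M]
        if sum(1 for x in column if x != 0) != 1:
            return False                     # condition 4: pivot alone in column
        last_pivot_col = c
    return True

assert is_rref([[1, 0, 2], [0, 1, 1]])
assert not is_rref([[1, 2], [0, 3]])   # echelon, but not reduced
```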
Chapter 3: Solving systems of simultaneous linear equations
Two matrices A and B are said to be equivalent if one can be obtained from the other by a finite sequence of elementary row operations, and we write A ~ B to denote this equivalence.
Gaussian elimination is a procedure for bringing a matrix to its equivalent row echelon form, and is described in steps 1–4 of the Gauss–Jordan algorithm given below. At this stage the corresponding system of simultaneous linear equations is in triangular form and could be solved by back-substitution. The Gauss–Jordan algorithm, described in Table 3.1, is an extension of Gaussian elimination which brings the matrix to its equivalent reduced row echelon form, from which the solution (if there is one) can be directly written down.
Table 3.1: Gauss–Jordan algorithm
Step 1 Identify the leftmost non-zero column.
Step 2 If the first row has a zero in the column of Step 1, interchange it with one that has a non-zero entry in the same column.
Step 3 Obtain zeros below the leading entry (also called a pivot) by adding suitable multiples of the top row to the rows below it.
Step 4 Cover the top row and repeat the same process with the leftover sub-matrix and starting at step 1. Repeat this process with each row. (At this stage the matrix is in echelon form.)
Step 5 Start with the last non-zero row, work upwards. For each row, obtain a leading 1 (by dividing by the value of the pivot) and introduce zeros above it by adding suitable multiples of the row with the leading 1 to the corresponding rows.
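The algorithm in Table 3.1 can be sketched directly in Python. The `rref` function below is a hypothetical stand-in for the built-in reduced-row-echelon command most CAS provide, using exact `Fraction` arithmetic so that no rounding occurs.

```python
from fractions import Fraction

def rref(rows):
    """Bring an augmented matrix to reduced row echelon form,
    following Steps 1-5 of the Gauss-Jordan algorithm (a sketch)."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots = []
    r = 0
    for c in range(n):                     # Step 1: leftmost non-zero column
        pr = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]          # Step 2: interchange if needed
        for i in range(r + 1, m):          # Step 3: zeros below the pivot
            if M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1                             # Step 4: repeat on the sub-matrix
        if r == m:
            break
    for r, c in reversed(pivots):          # Step 5: leading 1s, zeros above
        M[r] = [a / M[r][c] for a in M[r]]
        for i in range(r):
            if M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
    return M

# The system x - y = 1, x + y = 3: rows come back as [1, 0, 2], [0, 1, 1]
result = rref([[1, -1, 1], [1, 1, 3]])
assert result == [[1, 0, 2], [0, 1, 1]]
```

The same function handles inconsistent and under-determined systems; it simply returns whatever reduced matrix the algorithm produces, leaving interpretation to the reader.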
Example 3.6
Use the Gauss–Jordan algorithm to solve the system of simultaneous linear equations formed by requiring two numbers x and y to have a difference of 1 and a sum of 3:
$$\begin{cases} x - y = 1 \\ x + y = 3 \end{cases}$$
As noted earlier, in this case the solution of x = 2 and y = 1 can readily be obtained by inspection; however, this will not always be the case. A simple example such as this enables students to attend to the process being illustrated rather than focus on the manipulations involved.
Solution
This system has the corresponding augmented matrix $\left[\begin{array}{cc|c} 1 & -1 & 1 \\ 1 & 1 & 3 \end{array}\right]$.
• The leftmost non-zero column is the first column (Step 1).
• The top entry in this column is non-zero, so proceed to Step 3 of the algorithm (Step 2).
• The new second row will be the old second row minus the first row (Step 3).
• The resulting matrix is $\left[\begin{array}{cc|c} 1 & -1 & 1 \\ 0 & 2 & 2 \end{array}\right]$, which is in row echelon form.
(This corresponds to the system of simultaneous linear equations $\begin{cases} x - y = 1 \\ 2y = 2 \end{cases}$, which is in a triangular form, and could easily be solved by back-substitution. The last equation is 2y = 2, and so y = 1. Substitute this value for y in the first equation, and solve for x, which gives x − 1 = 1, and so x = 2, hence the solution of the system is (2, 1).)
• Now we must use Step 5 to convert the matrix to reduced row echelon form. First, we need to turn the leading entry in the second row into a leading ‘1’. So divide the second row by 2 to obtain $\left[\begin{array}{cc|c} 1 & -1 & 1 \\ 0 & 1 & 1 \end{array}\right]$. To convert this to reduced row echelon form, we need to turn the entry in the first row, second column to 0. This can be done by writing a new first row which is equal to the second row added to the first row, to obtain $\left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & 1 \end{array}\right]$.
The matrix is now in reduced row echelon form, corresponding to the equivalent system of simultaneous linear equations $\begin{cases} x = 2 \\ y = 1 \end{cases}$, which is the solution, as noted earlier.
Example 3.7
Use the Gauss–Jordan algorithm to solve the following system of simultaneous linear equations:

$$\begin{cases} x - y - z = 0 \\ 6x + 4y = 20 \\ -4y + 2z = 10 \end{cases}$$
Solution
This system has the corresponding augmented matrix:

$$\left[\begin{array}{ccc|c} 1 & -1 & -1 & 0 \\ 6 & 4 & 0 & 20 \\ 0 & -4 & 2 & 10 \end{array}\right]$$
The leftmost non-zero column is again the first column (Step 1) and the top entry in this column is non-zero (Step 2), so we proceed to Step 3 of the algorithm. The new second row will be the old second row with six times the first row subtracted from it, and since the element of the first column in the third row is already zero, we do not need to do anything to the third row at this stage. The equivalent matrix is:
$$\left[\begin{array}{ccc|c} 1 & -1 & -1 & 0 \\ 0 & 10 & 6 & 20 \\ 0 & -4 & 2 & 10 \end{array}\right]$$
Now we are at Step 4 of the elimination process. If we cover the top row, the leftmost non-zero column is now the second column (Step 1) and the top entry is non-zero (Step 2), so the new third row will be the previous third row with $\frac{4}{10}$ × the second row added to it:

$$\left[\begin{array}{ccc|c} 1 & -1 & -1 & 0 \\ 0 & 10 & 6 & 20 \\ 0 & 0 & \frac{22}{5} & 18 \end{array}\right]$$
This matrix is now in echelon form, and we could use back-substitution to solve the corresponding system of simultaneous linear equations $\begin{cases} x - y - z = 0 \\ 10y + 6z = 20 \\ \frac{22}{5}z = 18 \end{cases}$.
Now we are at Step 5. We begin by dividing the last row by $4.4 = \frac{22}{5}$ to obtain a leading 1. The matrix is then:

$$\left[\begin{array}{ccc|c} 1 & -1 & -1 & 0 \\ 0 & 10 & 6 & 20 \\ 0 & 0 & 1 & \frac{45}{11} \end{array}\right]$$
We must next turn the first two numbers in the third column to 0 by using elementary row operations. The new first row will be the previous first row + the third row, and the new second row will be the previous second row with 6 × the third row subtracted from it. The new matrix will be:
$$\left[\begin{array}{ccc|c} 1 & -1 & 0 & \frac{45}{11} \\ 0 & 10 & 0 & -\frac{50}{11} \\ 0 & 0 & 1 & \frac{45}{11} \end{array}\right]$$
Next, we need to divide the second row by 10 to obtain a leading 1:

$$\left[\begin{array}{ccc|c} 1 & -1 & 0 & \frac{45}{11} \\ 0 & 1 & 0 & -\frac{5}{11} \\ 0 & 0 & 1 & \frac{45}{11} \end{array}\right]$$
Finally, to get the matrix into reduced row echelon form, we need to obtain a 0 in the first row, second column position. This we can do by adding the second row to the first row and replacing the previous first row with this.
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{40}{11} \\ 0 & 1 & 0 & -\frac{5}{11} \\ 0 & 0 & 1 & \frac{45}{11} \end{array}\right]$$
The final equivalent system of simultaneous linear equations is:
$$\begin{cases} x = \frac{40}{11} \\ y = -\frac{5}{11} \\ z = \frac{45}{11} \end{cases}$$
This is the required solution.
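Substituting back into the original equations of Example 3.7 confirms the solution exactly; a short Python check using `Fraction`:

```python
from fractions import Fraction as F

# Exact check of Example 3.7: x = 40/11, y = -5/11, z = 45/11
# must satisfy all three original equations.
x, y, z = F(40, 11), F(-5, 11), F(45, 11)
assert x - y - z == 0
assert 6 * x + 4 * y == 20
assert -4 * y + 2 * z == 10
```

Because the arithmetic is exact, the check either passes identically or fails; there is no rounding tolerance to choose.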
Students may well inquire what happens when this algorithm is applied to a system of simultaneous linear equations that corresponds to a pair of parallel lines or identical lines in the cartesian plane.
Example 3.8
Use the Gauss–Jordan algorithm to solve this system of simultaneous linear equations:
$$\begin{cases} x + y = 4 \\ x + y = 2 \end{cases}$$
Solution
This system corresponds to a pair of distinct parallel lines (same gradient, different intercepts). The corresponding augmented matrix for
this system is $\left[\begin{array}{cc|c} 1 & 1 & 4 \\ 1 & 1 & 2 \end{array}\right]$.
The first step in the Gauss–Jordan algorithm is to replace the second row with the previous second row minus the first row, to obtain $\left[\begin{array}{cc|c} 1 & 1 & 4 \\ 0 & 0 & -2 \end{array}\right]$.
(The system is now in triangular form, and we can see that the last row corresponds to the equation 0x + 0y = –2, which is impossible. Hence there are no solutions to this system.)
Next, we divide the elements in the second row by −2 to obtain $\left[\begin{array}{cc|c} 1 & 1 & 4 \\ 0 & 0 & 1 \end{array}\right]$, and finally we subtract four times the second row from the first row to obtain $\left[\begin{array}{cc|c} 1 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]$. Now we see that the last equation is 0x + 0y = 1, which is impossible, and so there are no solutions to this system of equations.
The above is typical of the result of elimination when there are no solutions. The last non-zero row of the augmented matrix will have zeroes everywhere except in the right-most position.
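That telltale row pattern can be detected mechanically; a small Python sketch (the name `is_inconsistent` is illustrative):

```python
def is_inconsistent(R):
    """A system is inconsistent exactly when its reduced augmented
    matrix has a non-zero row whose only non-zero entry is in the
    rightmost (constants) column."""
    for row in R:
        if any(x != 0 for x in row) and all(x == 0 for x in row[:-1]):
            return True
    return False

# Example 3.8 reduces to [[1, 1, 0], [0, 0, 1]]: the last row says
# 0x + 0y = 1, which is impossible.
assert is_inconsistent([[1, 1, 0], [0, 0, 1]])
assert not is_inconsistent([[1, 0, 2], [0, 1, 1]])
```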
Example 3.9
Use the Gauss–Jordan algorithm to solve the system of simultaneous linear equations:
$$\begin{cases} -6x + 2y = 8 \\ 3x - y = -4 \end{cases}$$
Solution
This system corresponds to a pair of identical parallel lines (same gradient, same intercepts). The corresponding augmented matrix for this
system is $\left[\begin{array}{cc|c} -6 & 2 & 8 \\ 3 & -1 & -4 \end{array}\right]$.
Replace the second row by itself + $\frac{1}{2}$ times the first row. This gives $\left[\begin{array}{cc|c} -6 & 2 & 8 \\ 0 & 0 & 0 \end{array}\right]$. We could complete the Gauss–Jordan algorithm by dividing the first row by −6, giving the reduced row echelon form of $\left[\begin{array}{cc|c} 1 & -\frac{1}{3} & -\frac{4}{3} \\ 0 & 0 & 0 \end{array}\right]$.
The variables in this example are x and y. There is one leading 1, corresponding to the x variable (as the coefficients of x were in the first column), and so we describe variable x as basic or leading and the other variable, y, as free. The second row now tells us that 0y = 0, which is true for any value of y, so we let y = k where k ∈ R is an arbitrary constant.
Then, from the first row of the reduced matrix, we have $x - \frac{1}{3}y = -\frac{4}{3}$, and, since y = k, $x = \frac{k}{3} - \frac{4}{3} = \frac{k - 4}{3}$. Thus, we can write the solution (in parametric form) as $\left\{\left(\frac{k - 4}{3},\; k\right) : k \in \mathbb{R}\right\}$.
In general, the solution to consistent systems such as the one just considered, but also to more complicated cases where there are both leading (corresponding to leading 1s) and free variables, is written by assigning arbitrary constants, or parameters, to the free variables, and then writing the leading variables in terms of these arbitrary constants.
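A parametric solution is easy to spot-check by substitution: for Example 3.9, every value of the parameter k should satisfy both (equivalent) equations.

```python
from fractions import Fraction as F

# For any k, the point ((k - 4)/3, k) lies on the line -6x + 2y = 8,
# equivalently 3x - y = -4.
for k in [F(0), F(1), F(-7), F(10, 3)]:
    x, y = (k - 4) / 3, k
    assert -6 * x + 2 * y == 8
    assert 3 * x - y == -4
```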
Most CAS have a function which automatically reduces an augmented matrix to reduced row echelon form, from which the solution can be determined. This is fine unless there is an arbitrary constant in the augmented matrix itself (for example, arising from one of the linear equations in the system involving an arbitrary constant for one of the coefficients or its constant term). Then it is necessary to be wary that a division by a function of the constant may have taken place, and that this operation will only be valid if the function of the constant is non-zero. The resulting reduced row echelon matrix may have no trace of the constant. Some CAS have a ‘fraction-free Gaussian elimination’ function which effectively only gives the row echelon matrix, and this can be used to investigate what happens for such systems.
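The caveat can be made concrete with the simplest possible parameterised system, the single equation ax = 1: reducing it divides by a, which is only valid when a is non-zero. A Python sketch (the helper name is illustrative):

```python
from fractions import Fraction as F

# Reducing [a | 1] divides by a. A CAS doing this blindly may return
# the row [1 | 1/a] with no trace of the assumption a != 0.
def solve_ax_eq_1(a):
    if a != 0:
        return ("unique", F(1) / a)      # rref row [1 | 1/a]
    return ("inconsistent", None)        # a = 0 gives the row [0 | 1]

assert solve_ax_eq_1(F(3)) == ("unique", F(1, 3))
assert solve_ax_eq_1(F(0)) == ("inconsistent", None)
```

The case split on a is exactly the information a fully automatic rref command can silently discard.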
Student Activity 3.3
For the following systems of equations, enter the augmented matrix into a CAS, or other suitable technology, and use this to obtain the reduced row echelon form. Hence solve the following systems of simultaneous linear equations.
a [a system of three simultaneous linear equations in x, y and z]
b [a system of three simultaneous linear equations in x, y and z]
c [a system of three simultaneous linear equations in x, y and z]
Systems of simultaneous linear equations in various contexts
Many different contexts give rise to systems of simultaneous linear equations in several variables, even when the relations involved may themselves be non-linear. Substitution of values of the variables in such contexts often results in a system of simultaneous linear equations relating coefficients and/or arbitrary constants.
Example 3.10
A circle has an equation of the form x2 + y2 + ax + by + c = 0, where a, b and c ∈ R. This circle passes through the points (–2, 3), (6, 3) and (2, 7) in the cartesian plane. Find the values of a, b and c.
Solution
Since (−2, 3) lies on the circle, substitution of these values into the equation for the circle gives (−2)² + 3² − 2a + 3b + c = 0. This simplifies to the linear equation:
−2a + 3b + c = −13
Similarly, as (6, 3) also lies on the circle we have 6² + 3² + 6a + 3b + c = 0, which simplifies to the linear equation:
6a + 3b + c = −45
For the point (2, 7), which also lies on the circle, we have 2² + 7² + 2a + 7b + c = 0, and hence:
2a + 7b + c = −53
We thus have a system of three simultaneous linear equations in a, b and c:

$$\begin{cases} -2a + 3b + c = -13 \\ 6a + 3b + c = -45 \\ 2a + 7b + c = -53 \end{cases}$$
The augmented matrix form is:

$$\left[\begin{array}{ccc|c} -2 & 3 & 1 & -13 \\ 6 & 3 & 1 & -45 \\ 2 & 7 & 1 & -53 \end{array}\right]$$
and using by-hand or CAS manipulation to bring this to reduced row echelon form yields:

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & -4 \\ 0 & 1 & 0 & -6 \\ 0 & 0 & 1 & -3 \end{array}\right]$$
The solution can be read directly from this: a = −4, b = −6 and c = −3, and so the equation of the circle is x² + y² − 4x − 6y − 3 = 0. Completing the square on both x and y results in the alternative equation (x − 2)² + (y − 3)² = 16 for the relation. The graph of this relation is a circle with centre (2, 3) and radius 4, as shown in Figure 3.9.
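A quick Python check confirms the computed coefficients against all three points, and the completed-square form:

```python
# Example 3.10: the circle x^2 + y^2 - 4x - 6y - 3 = 0 passes through
# the three given points, and has centre (2, 3) and radius 4.
a, b, c = -4, -6, -3
for (x, y) in [(-2, 3), (6, 3), (2, 7)]:
    assert x**2 + y**2 + a * x + b * y + c == 0
    assert (x - 2)**2 + (y - 3)**2 == 16
```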
Figure 3.9: Graph of the relation x² + y² − 4x − 6y − 3 = 0 or (x − 2)² + (y − 3)² = 16
Example 3.11
Three Toyotas, two Fords and four Holdens can be rented for $212 per day. Alternatively, two Toyotas, four Fords and three Holdens can be rented for $214 per day, or four Toyotas, three Fords and two Holdens could be rented for $204 per day.
Assuming that the rate for renting any type of car is fixed by the make, find the rental rates for each type of car per day.
Solution
Let a, b and c be the respective costs of renting a Toyota, a Ford and a Holden per day. Then we have three simultaneous linear equations in the three unknowns a, b and c.
$$\begin{cases} 3a + 2b + 4c = 212 \\ 2a + 4b + 3c = 214 \\ 4a + 3b + 2c = 204 \end{cases}$$
The augmented matrix corresponding to this system is:
$$\left[\begin{array}{ccc|c} 3 & 2 & 4 & 212 \\ 2 & 4 & 3 & 214 \\ 4 & 3 & 2 & 204 \end{array}\right]$$
and using by-hand or CAS manipulation to return the reduced row echelon form yields:

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 20 \\ 0 & 1 & 0 & 24 \\ 0 & 0 & 1 & 26 \end{array}\right]$$
Hence the rental rates are $20 per day for a Toyota, $24 per day for a Ford and $26 per day for a Holden.
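The rates can be checked by substituting back into the three rental combinations:

```python
# Example 3.11: daily rates of $20 (Toyota), $24 (Ford) and $26 (Holden)
# satisfy all three rental combinations.
a, b, c = 20, 24, 26
assert 3 * a + 2 * b + 4 * c == 212
assert 2 * a + 4 * b + 3 * c == 214
assert 4 * a + 3 * b + 2 * c == 204
```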
Example 3.12
A girl finds $5.20 in coins: 50 cent coins, 20 cent coins and 10 cent coins. She finds 21 coins in total. How many coins of each type could she have?
Solution
Suppose she has a 50 cent coins, b 20 cent coins and c 10 cent coins. Then
50a + 20b + 10c = 520
She has 21 coins in total, so:
a + b + c = 21
This gives us two simultaneous linear equations in three unknowns. We write the augmented matrix corresponding to the system:

$$\left[\begin{array}{ccc|c} 50 & 20 & 10 & 520 \\ 1 & 1 & 1 & 21 \end{array}\right]$$
and find its reduced row echelon form:

$$\left[\begin{array}{ccc|c} 1 & 0 & -\frac{1}{3} & \frac{10}{3} \\ 0 & 1 & \frac{4}{3} & \frac{53}{3} \end{array}\right]$$
Generally, there would be infinitely many possible solutions to these equations, but we require non-negative integers as solutions. The leading variables correspond to the columns with leading 1s, so are a and b.
The free variable is c, and it can take integer values between 0 and 21. We can write a and b in terms of c, from the reduced row echelon matrix above, as

$$a = \frac{10}{3} + \frac{c}{3} = \frac{10 + c}{3}, \qquad b = \frac{53}{3} - \frac{4c}{3} = \frac{53 - 4c}{3}$$

To find the possible integer solutions, we need to consider integer values of c between 0 and 21, and determine when 10 + c and 53 − 4c are both divisible by 3. This could easily be done by technology, forming a 22 × 3 matrix, with the first column containing c, the second $\frac{10 + c}{3}$ and the third $\frac{53 - 4c}{3}$.
c    (10 + c)/3    (53 − 4c)/3
0    3.3           17.7
1    3.7           16.3
2    4.0           15.0
3    4.3           13.7
4    4.7           12.3
5    5.0           11.0
6    5.3           9.7
7    5.7           8.3
8    6.0           7.0
9    6.3           5.7
10   6.7           4.3
11   7.0           3.0
12   7.3           1.7
13   7.7           0.3
14   8.0           −1.0
15   8.3           −2.3
16   8.7           −3.7
17   9.0           −5.0
18   9.3           −6.3
19   9.7           −7.7
20   10.0          −9.0
21   10.3          −10.3
Thus we see that there are several possibilities:
{a = 4, b = 15, c = 2} or {a = 5, b = 11, c = 5} or {a = 6, b = 7, c = 8} or {a = 7, b = 3, c = 11}
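The tabulation and the selection of integer rows can be reproduced in a few lines of Python:

```python
# Example 3.12 by technology: for each c from 0 to 21, keep the rows
# where a = (10 + c)/3 and b = (53 - 4c)/3 are non-negative integers.
solutions = []
for c in range(22):
    if (10 + c) % 3 == 0 and (53 - 4 * c) % 3 == 0 and 53 - 4 * c >= 0:
        solutions.append(((10 + c) // 3, (53 - 4 * c) // 3, c))
print(solutions)   # → [(4, 15, 2), (5, 11, 5), (6, 7, 8), (7, 3, 11)]
```

Each triple (a, b, c) also satisfies the two original equations, which makes a useful cross-check.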
Example 3.13
The scores of three players in a tournament have been lost. The only information available is the total of the scores for players 1 and 2, the total for players 2 and 3, and the total for players 3 and 1. Show that the original scores can be recovered.
Solution
Let x, y and z be the scores for players 1, 2 and 3 respectively, and a, b and c the totals for players 1 and 2, 2 and 3, and 3 and 1 respectively. Then

$$\begin{cases} x + y = a \\ y + z = b \\ z + x = c \end{cases}$$
is a system of three simultaneous linear equations in three unknowns x, y and z. The augmented matrix is:

$$\left[\begin{array}{ccc|c} 1 & 1 & 0 & a \\ 0 & 1 & 1 & b \\ 1 & 0 & 1 & c \end{array}\right]$$
Its corresponding reduced row echelon form is

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{a - b + c}{2} \\ 0 & 1 & 0 & \frac{a + b - c}{2} \\ 0 & 0 & 1 & \frac{-a + b + c}{2} \end{array}\right]$$

So the original scores are $x = \frac{a - b + c}{2}$, $y = \frac{a + b - c}{2}$ and $z = \frac{-a + b + c}{2}$.
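The recovery formulas can be sanity-checked with made-up scores (the values below are hypothetical test data):

```python
from fractions import Fraction as F

# Example 3.13: given pair totals a = x + y, b = y + z, c = z + x,
# these formulas recover the original scores.
def recover(a, b, c):
    return (F(a - b + c, 2), F(a + b - c, 2), F(-a + b + c, 2))

x, y, z = 7, 11, 4                     # hypothetical original scores
a, b, c = x + y, y + z, z + x          # the only surviving information
assert recover(a, b, c) == (x, y, z)
```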
Example 3.14
Find the rule for the family of parabolas which pass through the points (1, 2) and (3, 4).
Solution
Let the rule for the family of parabolas be y = ax² + bx + c, where a is non-zero. Since (1, 2) lies on any member of this family of curves:
a + b + c = 2
Similarly, since (3, 4) lies on any member of the family of curves:
9a + 3b + c = 4
Hence we have a system of two equations in three unknowns a, b and c. We first write the augmented matrix:

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 2 \\ 9 & 3 & 1 & 4 \end{array}\right]$$

and use CAS to reduce it to its reduced row echelon form:

$$\left[\begin{array}{ccc|c} 1 & 0 & -\frac{1}{3} & -\frac{1}{3} \\ 0 & 1 & \frac{4}{3} & \frac{7}{3} \end{array}\right]$$
Now a and b are the leading variables, and c is a free variable, so we let c = k where k ∈ R. Then the first row gives $a - \frac{c}{3} = -\frac{1}{3}$, and so $a = \frac{c - 1}{3} = \frac{k - 1}{3}$. The second row gives $b + \frac{4c}{3} = \frac{7}{3}$ and so $b = \frac{7 - 4c}{3} = \frac{7 - 4k}{3}$.
Hence the rule for the family of parabolas is:

$$y = \frac{k - 1}{3}x^2 + \frac{7 - 4k}{3}x + k, \quad k \in \mathbb{R}$$
We will graph a few of these curves.
If k = 1, then a = 0 and we obtain the straight line with equation y = x + 1, which passes through the two points.
If k = 4, then we obtain the parabola with equation y = x² − 3x + 4.
If k = −5, then we obtain the parabola y = −2x² + 9x − 5.
The graphs of these curves are shown in Figure 3.10.
Figure 3.10: Graphs of f(x) = −2x² + 9x − 5, g(x) = x² − 3x + 4 and h(x) = x + 1
In the above example, we found a and b in terms of c. We could have used the same procedure to find, say, b and c in terms of a. All we would need to do is to have the columns in the matrix corresponding to b and c come before the column corresponding to a. In this case, the augmented matrix would be $\left[\begin{array}{ccc|c} 1 & 1 & 1 & 2 \\ 3 & 1 & 9 & 4 \end{array}\right]$, and the reduced row echelon form would be $\left[\begin{array}{ccc|c} 1 & 0 & 4 & 1 \\ 0 & 1 & -3 & 1 \end{array}\right]$. Now we let a = r, where r ∈ R. Then from the first row we have b = 1 − 4r and from the second row we have c = 1 + 3r, giving the form of the family of parabolas as y = rx² + (1 − 4r)x + 1 + 3r where r ∈ R.
Students may find it of interest to explore what values of k and r are required to produce a collection of given quadratic functions.
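As one such exploration, both parametrisations can be verified to pass through (1, 2) and (3, 4) for a range of parameter values:

```python
from fractions import Fraction as F

# Example 3.14: both parametrisations of the family of parabolas
# through (1, 2) and (3, 4).
def f_k(k, x):
    return F(k - 1, 3) * x**2 + F(7 - 4 * k, 3) * x + k

def f_r(r, x):
    return r * x**2 + (1 - 4 * r) * x + 1 + 3 * r

for t in range(-5, 6):
    assert f_k(t, 1) == 2 and f_k(t, 3) == 4
    assert f_r(t, 1) == 2 and f_r(t, 3) == 4
```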
Example 3.15
Find the rule for the family of cubic polynomials which passes through the points (1, 0) and (–1, 0), with slope –4 when x = 1.
Solution
Let f(x) = ax³ + bx² + cx + d be the rule of a cubic polynomial function, with a, b, c and d the unknown coefficients. Since (1, 0) lies on the curve:
a + b + c + d = 0
Similarly, since (−1, 0) lies on the curve:
−a + b − c + d = 0
Now f′(x) = 3ax² + 2bx + c, and the slope at x = 1 is −4, so 3a + 2b + c = −4.
We now have a system of three simultaneous linear equations in four unknowns:

$$\begin{cases} a + b + c + d = 0 \\ -a + b - c + d = 0 \\ 3a + 2b + c = -4 \end{cases}$$
We can write down the augmented matrix corresponding to this system:

$$\left[\begin{array}{cccc|c} 1 & 1 & 1 & 1 & 0 \\ -1 & 1 & -1 & 1 & 0 \\ 3 & 2 & 1 & 0 & -4 \end{array}\right]$$
The reduced row echelon form of this is:

$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & -1 & -2 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 2 \end{array}\right]$$
There are three leading variables, a, b and c, and one free variable, d. Let d = k where k ∈ R, and express the leading variables in terms of k. The first row of the reduced matrix tells us that a – d = –2, and so a = –2 + k. The second row tells us that b + d = 0, and so b = –k and the third row tells us that c + d = 2, so c = 2 – k. Thus, the rule for the family of polynomials is:
f: R → R, where f(x) = (k − 2)x³ − kx² + (2 − k)x + k, and k ∈ R.
If k = 2, the function will be the quadratic with rule f(x) = −2x² + 2.
We can check that this is the general form by drawing the graphs of some members of this family:
If k = 0, f(x) = −2x³ + 2x.
If k = 3, f(x) = x³ − 3x² − x + 3.
If k = −1, f(x) = −3x³ + x² + 3x − 1.
The graphs of these, and for k = 2, are shown in Figure 3.11 on the following page.
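Each member of the family can be checked against the defining conditions, writing out the derivative explicitly:

```python
# Example 3.15: every member f(x) = (k - 2)x^3 - kx^2 + (2 - k)x + k
# passes through (1, 0) and (-1, 0) and has slope -4 at x = 1.
def f(k, x):
    return (k - 2) * x**3 - k * x**2 + (2 - k) * x + k

def fprime(k, x):
    # derivative of f with respect to x, written out term by term
    return 3 * (k - 2) * x**2 - 2 * k * x + (2 - k)

for k in range(-3, 4):
    assert f(k, 1) == 0 and f(k, -1) == 0
    assert fprime(k, 1) == -4
```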
Figure 3.11: Graphs of f(x) for k = −1, 0, 2 and 3
Example 3.16
Find the rule for the family of quartic polynomials (polynomials of degree 4) that pass through the points (1, 2) and (–2, –1), and have slope 5 at x = 1.
Solution
Let f(x) = ax⁴ + bx³ + cx² + dx + e be the rule for the family of quartic polynomials.
Since they pass through (1, 2), f(1) = 2, so:
a + b + c + d + e = 2
Since they pass through (−2, −1), f(−2) = −1, so:
16a − 8b + 4c − 2d + e = −1
Now f′(x) = 4ax³ + 3bx² + 2cx + d. Since we must have f′(1) = 5:
4a + 3b + 2c + d = 5
We now have a system of three simultaneous linear equations in the five unknowns a, b, c, d and e:
$$\begin{cases} a + b + c + d + e = 2 \\ 16a - 8b + 4c - 2d + e = -1 \\ 4a + 3b + 2c + d = 5 \end{cases}$$
The augmented matrix for this system is:

$$\left[\begin{array}{ccccc|c} 1 & 1 & 1 & 1 & 1 & 2 \\ 16 & -8 & 4 & -2 & 1 & -1 \\ 4 & 3 & 2 & 1 & 0 & 5 \end{array}\right]$$
which has reduced row echelon form:

$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & -\frac{1}{2} & -\frac{3}{4} & \frac{1}{12} \\ 0 & 1 & 0 & 0 & -\frac{1}{2} & \frac{5}{6} \\ 0 & 0 & 1 & \frac{3}{2} & \frac{9}{4} & \frac{13}{12} \end{array}\right]$$
So the leading variables are a, b and c, and the free variables d and e. Let d = s and e = t where s, t ∈ R. Our solution will now be given in terms of two parameters. Then the first row of the reduced echelon matrix corresponds to the equation $a - \frac{d}{2} - \frac{3e}{4} = \frac{1}{12}$, hence $a = \frac{1}{12} + \frac{s}{2} + \frac{3t}{4}$.
The second row of the reduced echelon matrix corresponds to the equation $b - \frac{e}{2} = \frac{5}{6}$, hence $b = \frac{5}{6} + \frac{t}{2}$.
The third row of the reduced echelon matrix corresponds to the equation $c + \frac{3d}{2} + \frac{9e}{4} = \frac{13}{12}$, hence $c = \frac{13}{12} - \frac{3s}{2} - \frac{9t}{4}$.
The solution set is

$$\left\{\left(\tfrac{1}{12} + \tfrac{s}{2} + \tfrac{3t}{4},\; \tfrac{5}{6} + \tfrac{t}{2},\; \tfrac{13}{12} - \tfrac{3s}{2} - \tfrac{9t}{4},\; s,\; t\right) : s, t \in \mathbb{R}\right\}$$

and the family of functions has the form:

$$f(x) = \left(\tfrac{1}{12} + \tfrac{s}{2} + \tfrac{3t}{4}\right)x^4 + \left(\tfrac{5}{6} + \tfrac{t}{2}\right)x^3 + \left(\tfrac{13}{12} - \tfrac{3s}{2} - \tfrac{9t}{4}\right)x^2 + sx + t$$
where s, t ∈ R.
As a solution, this looks fairly formidable, so it's a useful strategy to plot a few of its members.
• If s = 0 and t = 0, then we have the function $f(x) = \frac{1}{12}x^4 + \frac{5}{6}x^3 + \frac{13}{12}x^2$.
• If s = 1 and t = 0, then we have the function $f(x) = \frac{7}{12}x^4 + \frac{5}{6}x^3 - \frac{5}{12}x^2 + x$.
• If s = 0 and t = 1, then we have the function $f(x) = \frac{5}{6}x^4 + \frac{4}{3}x^3 - \frac{7}{6}x^2 + 1$.
• If s = 1 and t = 1, then we have the function $f(x) = \frac{4}{3}x^4 + \frac{4}{3}x^3 - \frac{8}{3}x^2 + x + 1$.
• If s = −1 and t = −1, then we have the function $f(x) = -\frac{7}{6}x^4 + \frac{1}{3}x^3 + \frac{29}{6}x^2 - x - 1$.
These members are shown in Figure 3.12 below.
Figure 3.12: Parts of the graphs of some members of the family of functions for various values of s and t
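The five plotted members (and indeed any choice of s and t) can be checked against the three defining conditions f(1) = 2, f(−2) = −1 and f′(1) = 5:

```python
from fractions import Fraction as F

# Example 3.16: the coefficient formulas from the reduced row echelon
# form, checked against the three defining conditions.
def coeffs(s, t):
    a = F(1, 12) + F(s, 2) + F(3 * t, 4)
    b = F(5, 6) + F(t, 2)
    c = F(13, 12) - F(3 * s, 2) - F(9 * t, 4)
    return a, b, c

def f(s, t, x):
    a, b, c = coeffs(s, t)
    return a * x**4 + b * x**3 + c * x**2 + s * x + t

def fprime(s, t, x):
    a, b, c = coeffs(s, t)
    return 4 * a * x**3 + 3 * b * x**2 + 2 * c * x + s

for s, t in [(0, 0), (1, 0), (0, 1), (1, 1), (-1, -1)]:
    assert f(s, t, 1) == 2
    assert f(s, t, -2) == -1
    assert fprime(s, t, 1) == 5
```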
Student Activity 3.4
a The scores of four players in a tournament have been lost. The only information available is the total of the scores for players 1 and 2, the total for players 2 and 3, the total for players 3 and 4 and the total for players 4 and 1. Can the original scores be recovered?
b Find the equation of the cubic polynomial which passes through the points (1, 0) and (–1, 0), with slope –4 when x = 1 and slope 12 when x = –1.
c Find the rule for the family of quartic polynomials (polynomials of degree 4) that pass through the points (1, 2), (–2, –1) and (2, 0), and have slope 5 at x = 1.
d Find the rule for the quartic polynomial (polynomial of degree 4) that passes through the points (1, 2), (–2, –1), (2, 0) and (–1, 5), and has slope 5 at x = 1.
Summary
• A linear system of m simultaneous equations in n variables x₁, x₂, …, xₙ is a set of m equations of the form

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$
The numbers a₁₁, a₁₂, …, a₁ₙ, …, aₘₙ are the coefficients of the system, and b₁, b₂, …, bₘ are the constant terms.
• A system of simultaneous linear equations is said to be consistent if it has either a unique solution or infinitely many solutions, and inconsistent if it does not have a solution.
• For a system of two simultaneous linear equations in two variables (m = n = 2):
– there is a unique solution which corresponds to the point of intersection of the graphs of the corresponding straight lines (different gradients) in the cartesian plane, R²
or
– there are infinitely many solutions which correspond to the infinite set of points that comprise the superimposition of the graphs of the same straight line (same gradient and same axis intercepts) specified by two equivalent linear relations, in the cartesian plane, R²
or
– there are no solutions, and the graphs of the corresponding straight lines are parallel (same gradient) but distinct (different axis intercepts) straight lines in the cartesian plane, R².
• For a system of three simultaneous linear equations in three variables (m = n = 3):
– there is a unique solution which corresponds to the point of intersection of the graphs of the corresponding planes in three-dimensional space, R³
or
– there are infinitely many solutions which correspond to the graphs of three planes aligned like ‘pages’ (which may include a superimposed page or pages) in a book along a common spine, in three-dimensional space, R³, and where at least two of these ‘pages’ are distinct, their intersection points form a straight line in R³
or
– there are no solutions, and the graphs of the three planes are all parallel but distinct; or one pair is parallel (and distinct) and the other oblique to this pair; or they are configured like a triangular prism.
• The system of simultaneous linear equations can be written in matrix form AX = B, where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

is the m × n coefficient matrix,

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

the n × 1 column matrix (vector) of variables, and

$$B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

the m × 1 column matrix (vector) of constant terms.
• If A is an invertible (non-singular) square matrix (m = n) then the inverse method can be employed and X = A⁻¹B.
• The m × (n + 1) augmented matrix of the system is the following matrix:

$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right]$$
• The system of simultaneous linear equations of the form AX = O is said to be homogeneous and is always consistent, with X = O (the relevant zero vector) a solution.
• To solve such systems of equations using the Gauss–Jordan method, there are three steps.
Step 1: Write the augmented matrix for the system of equations.
Step 2: Enter the augmented matrix into CAS, or other suitable technology, and obtain the reduced row echelon form of the matrix (using exact arithmetic where possible).
Step 3: Interpret the resulting reduced row echelon matrix, as follows:
Case 1: If the number of leading 1s is equal to the number of variables, and the last leading 1 is not in the rightmost column, then there is a unique solution which can be written down directly from the matrix.
Case 2: If the number of leading 1s is less than the number of variables, and the last leading 1 is not in the rightmost column, then there will be an infinite number of solutions. The solutions can be written by assigning an arbitrary constant to each of the free variables (those not corresponding to leading 1s), and writing the leading variables in terms of these constants.
Case 3: If the last non-zero row of the reduced row echelon matrix has a leading 1 in the rightmost column, then the system of equations is inconsistent (that is, has no solution).
• Non-linear functions and relations can be used to generate a system of simultaneous linear equations where substitution of some values for the variables leads to such a system expressed in terms of coefficients used to specify the particular functions and/or relations involved.

References
Anton, H & Rorres, C 2005, Elementary linear algebra (applications version), 9th edn, John Wiley and Sons, New York.
Cirrito, F (ed.) 1999, Mathematics higher level (core), 2nd edn, IBID Press, Victoria.
Hill, RO Jr 1996, Elementary linear algebra with applications, 3rd edn, Saunders College, Philadelphia.
Lipschutz, S & Lipson, M 2000, Schaum's outline of linear algebra, 3rd edn, McGraw-Hill, New York.
Nicholson, KW 2001, Elementary linear algebra, 1st edn, McGraw-Hill Ryerson, Whitby, ON.
Nicholson, KW 2003, Linear algebra with applications, 4th edn, McGraw-Hill Ryerson, Whitby, ON.
Poole, D 2006, Linear algebra: A modern introduction, 2nd edn, Thomson Brooks/Cole, California.
Wheal, M 2003, Matrices: Mathematical models for organising and manipulating information, 2nd edn, Australian Association of Mathematics Teachers, Adelaide.
Websites
http://en.wikipedia.org/wiki/Gaussian_elimination – Wikipedia
This site provides a comprehensive discussion with links to other resources and references.
http://mathworld.wolfram.com/GaussianElimination.html – Wolfram Research
This site is the online mathematical encyclopaedia from the developers of the CAS Mathematica. It provides a concise but comprehensive mathematical overview and includes links to related topics and a good list of other references.
http://www.sosmath.com/matrix/system1/system1.html – SOS Mathematics
This site provides an accessible discussion with worked examples using Gaussian elimination for some simple cases of systems of simultaneous linear equations.
http://aleph0.clarku.edu/~djoyce/ma105/simultaneous.html – Department of Mathematics and Computer Science, Clark University
This site includes a first-principles discussion of a practical problem based on ancient Chinese methods.
http://www.jgsee.kmutt.ac.th/exell/PracMath/SimLinEq.html – Practical Mathematics
This site covers straightforward examples for 2 × 2 and 3 × 3 systems, with a collection of related exercises.
http://mathforum.org/linear/choosing.texts/ – Drexel University
This site provides information on selected linear algebra texts, including those that are technology based.
http://www.sosmath.com/matrix/matrix.html – SOS Mathematics
This site has some notes with examples on solving systems of linear equations, and is at quite a simple level.
Chapter 4: Transformations of the cartesian plane
A transformation on the cartesian plane, R × R, or R² as it is also commonly designated, is a correspondence or mapping from the set of points in the plane to a set of points in the plane. That is, for every (original) point in the plane before the transformation is applied, there is a corresponding unique point, the image, in the plane after the transformation is applied.
In senior secondary mathematics curricula, particular importance is assigned to the study of the effects of transformations on certain subsets of the plane—those that correspond to the graphs of functions and other relations. Possibly the first case of analysis related to transformations of (graphs of) functions for middle school secondary students is the graph of the function with rule g(x) = a(x + b)2 + c, derived from the graph of the function with rule f(x) = x2 by a sequence of transformations involving a dilation from the y-axis, possibly a reflection in the x-axis (depending on the sign of a), a translation parallel to the x-axis and a translation parallel to the y-axis, although a good case could be made for considering graphs of linear functions of the form y = mx + c as a similar sequence of transformations of the graph of y = x.
The first two transformations mentioned above—dilation and reflection—are examples of what are commonly called linear transformations, while all three of these transformations are examples of what are called affine transformations (linear transformations and also translations). Some care will need to be taken for students to become clear that there are two types of ‘function’ involved in this context: the function of a single real variable whose graph corresponds to a particular type of subset of the cartesian plane {(x, y) where y = f(x) and x ∈ domain(f)}, and the function which is a transformation of the plane that maps an ordered pair (that is, a point) to another unique ordered pair.
Matrices
MathsWorks for Teachers
Matrices are well suited to the analysis of linear transformations, and provide a convenient notation for distinguishing between these two senses of function. Indeed, linear transformations can be used to provide a strong motivation for the definition of matrix multiplication, as applied to 2 × 2 matrices. The application of matrices to the analysis of linear transformations involves the solution of systems of simultaneous linear equations and matrix inverses.
LINEAR TRANSFORMATIONS
A linear transformation is a function T:R2 → R2, T(u) = w, where u and w are ordered pairs corresponding to points in the plane, which satisfies the following properties or axioms, called the linearity axioms:
T(u + v) = T(u) + T(v) for all u and v ∈ R²
T(kv) = kT(v) for all v ∈ R² and all scalars k ∈ R
More generally a linear transformation is defined as a function from one vector space (see Chapter 2, page 20) to another that satisfies the linearity axioms above. In this text we will consider only the restricted case of R2, where the underlying vector space is that of coordinate vectors in the cartesian plane. It is important that teachers clarify for students the nature of the cartesian plane as R × R, or R2. Students may or may not have come across the notion of the cartesian product of two sets X and Y, where
X × Y = {(x, y): x ∈ X and y ∈ Y}.
Even if they are familiar with this notion, for example from listing the event space for two events with a finite discrete set of possible outcomes, they may not transfer this conceptually to the case of uncountable continuous sets, or may at least require reminding of its application in this context. Indeed, their own practical experience is much more likely to have made them familiar with a well-known subset of R², that is, the set of all integer (whole number) valued grid points, part of which is shown in Figure 4.1, and which constitutes Z × Z = Z², where Z is the set of integers.
In the case of R2 = {(x, y): x, y ∈ R}, where X = Y = R, the corresponding cartesian product can be regarded as the set of all points in the cartesian plane or the set of all position vectors for these points with respect to the origin of the cartesian plane. Clearly there is a one-to-one correspondence between points and position vectors with respect to the origin, and matrix notation is quite useful in this context.
Since any u = (x, y) ∈ R² can be written as a column matrix $u = \begin{bmatrix} x \\ y \end{bmatrix}$, we can write

$$u = x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

that is, we can write u as a linear combination of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. By the linearity properties, to determine the image of u under T we simply need to determine the images of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ under T. The set of vectors $\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}$ is said to be a basis of the vector space they generate, in this case the set of all coordinate vectors in the cartesian plane, as any coordinate vector $(x, y) = \begin{bmatrix} x \\ y \end{bmatrix}$ is a linear combination of these two vectors. This simple but powerful concept underpins much of the work relating to transformations of the plane, so it is useful to take some time to ensure that students have a sound grasp of it. In work on vector representations in the plane, $\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}$ corresponds to the unit vectors commonly denoted {i, j} (see Evans, 2006, Chapter 8).
Figure 4.1: Intersecting lines indicating a subset of grid points of Z × Z = Z², where Z is the set of integers
Let $T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} a \\ c \end{bmatrix}$ and $T\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} b \\ d \end{bmatrix}$ for some real numbers a, b, c and d. Then

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = T\left(x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = xT\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) + yT\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

Such a linear transformation can be accomplished by a matrix multiplication, with the matrix determined by the images of the two points $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Conversely, any 2 × 2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ can be considered a linear transformation that transforms $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to the point $\begin{bmatrix} a \\ c \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to the point $\begin{bmatrix} b \\ d \end{bmatrix}$.
Any linear transformation T: R² → R² can be written in the form T(u) = Au, where A is a 2 × 2 matrix. If the transformation T is applied to a region with area s, then the area of the transformed region is equal to the product of s and the absolute value of the determinant of A, that is, |det(A)| × s. Moreover, A can be uniquely determined if the images of two points u and v are known, where u ≠ kv for any k ∈ R (that is, u and v are linearly independent). If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is the matrix for T, then:

$$T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix} \quad\text{and}\quad T\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}$$
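The correspondence between 2 × 2 matrices and linear transformations, together with the |det(A)| area-scaling property, is easy to check numerically. The following is a minimal Python sketch using plain lists and no libraries; the helper names `transform` and `det` are my own, not from the book:

```python
def transform(A, p):
    """Apply the 2x2 matrix A (a list of rows) to the point p = (x, y)."""
    (a, b), (c, d) = A
    x, y = p
    return (a * x + b * y, c * x + d * y)

def det(A):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = A
    return a * d - b * c

A = [[2, 1], [1, 3]]
# The images of (1, 0) and (0, 1) read off the columns of the matrix:
print(transform(A, (1, 0)), transform(A, (0, 1)))   # (2, 1) (1, 3)

# A region of area s is mapped to a region of area |det(A)| * s; here
# the unit square (area 1) maps to a parallelogram of area 5.
print(abs(det(A)))                                  # 5
```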
EXAMPLE 4.1

Let T be the transformation with matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Then $T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $T\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

a Find the image of the point (2, 4).
b Find the image of the point (x, y).

Solution

a To find the image of the point (2, 4), we simply find the matrix product:

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$$

So the point (2, 4) is transformed or mapped to the point (4, 2).

b Since $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}$, any point (x, y) is mapped to the point (y, x).
Geometrically, this transformation corresponds to a reflection in the line y = x, and can be used to determine inverse relations.
If the points whose images are known are not $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, then pairs of simultaneous linear equations or inverse matrices are used to find the matrix for the transformation.
EXAMPLE 4.2

If T is a linear transformation that transforms (1, 2) to (–1, 1) and (3, 1) to (0, 1), find the matrix for this transformation.

Solution

Let $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be the matrix corresponding to this linear transformation. Then

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

That is, a + 2b = –1 and c + 2d = 1, and 3a + b = 0 and 3c + d = 1. Combining the four equations, we have two equations involving a and b, and two equations involving c and d. For integer values of a, b, c and d, these can usually be readily, if somewhat tediously, solved by hand. The pair {a + 2b = –1, 3a + b = 0} has solution a = 1/5, b = –3/5, and the pair {c + 2d = 1, 3c + d = 1} has solution c = 1/5, d = 2/5, hence the required matrix is

$$\begin{bmatrix} \tfrac{1}{5} & -\tfrac{3}{5} \\ \tfrac{1}{5} & \tfrac{2}{5} \end{bmatrix}$$
Alternatively, the two matrix equations can be combined and written as

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix}$$

and then

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -\tfrac{1}{5} & \tfrac{3}{5} \\ \tfrac{2}{5} & -\tfrac{1}{5} \end{bmatrix} = \begin{bmatrix} \tfrac{1}{5} & -\tfrac{3}{5} \\ \tfrac{1}{5} & \tfrac{2}{5} \end{bmatrix}$$
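The alternative calculation in Example 4.2 can be mirrored in a few lines of Python with exact fractions: place the original points and their images as the columns of two matrices P and Q, so that TP = Q and T = QP⁻¹. This is an illustrative sketch with my own helper names (`inv2`, `matmul`), not the book's code:

```python
from fractions import Fraction as F

def inv2(A):
    """Inverse of a 2x2 matrix A = [[a, b], [c, d]] in exact arithmetic."""
    (a, b), (c, d) = [[F(v) for v in row] for row in A]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Columns of P are the original points (1, 2) and (3, 1);
# columns of Q are their images (-1, 1) and (0, 1).
P = [[1, 3], [2, 1]]
Q = [[-1, 0], [1, 1]]
T = matmul(Q, inv2(P))   # T P = Q, so T = Q P^(-1)
print(T)                 # the matrix [[1/5, -3/5], [1/5, 2/5]], as above
```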
EXAMPLE 4.3

The matrix $A = \begin{bmatrix} -1 & 2 \\ 3 & -8 \end{bmatrix}$ transforms the point P(x, y) onto the point Q(1, –7). Find the coordinates of the point P.

Solution

We have $\begin{bmatrix} -1 & 2 \\ 3 & -8 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ -7 \end{bmatrix}$. Hence

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ 3 & -8 \end{bmatrix}^{-1}\begin{bmatrix} 1 \\ -7 \end{bmatrix} = \begin{bmatrix} -4 & -1 \\ -\tfrac{3}{2} & -\tfrac{1}{2} \end{bmatrix}\begin{bmatrix} 1 \\ -7 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$$

So P has coordinates (3, 2).
STUDENT ACTIVITY 4.1

a Show that any linear transformation maps the origin to the origin.
b Explain why a linear transformation T with $T\left(\begin{bmatrix} 3 \\ -1 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ and $T\left(\begin{bmatrix} -3 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ is not uniquely determined. Find at least two linear transformations that satisfy the above conditions.
c Find the points that are mapped to the points (1, 0) and (0, 1) by the linear transformation with matrix $\begin{bmatrix} 4 & 3 \\ 5 & 4 \end{bmatrix}$.
LINEAR TRANSFORMATION OF A STRAIGHT LINE
Although a linear transformation acts upon individual points or position vectors, we usually want to see how a set of points corresponding to a subset of the plane of interest to us is transformed, particularly curves and figures. The transformation T: R² → R² with the rule T(x, y) = (x + y, x – y) can be written in matrix form, with (x, y) as an original point and (x₁, y₁) as an image point:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \quad\text{so}\quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^{-1}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

Hence x = ½(x₁ + y₁) and y = ½(x₁ – y₁). Under this transformation, the image of the graph of the relation with the equation ax + by + c = 0 (which is a straight line) is the graph of the relation with the equation a·½(x₁ + y₁) + b·½(x₁ – y₁) + c = 0, or ½(a + b)x₁ + ½(a – b)y₁ + c = 0 (which is also a straight line). In particular, the straight line with equation 2x + 3y – 6 = 0 is transformed onto the straight line with equation (5/2)x – (1/2)y – 6 = 0, as shown for part of the original line (that is, a line segment subset of the original line) and the corresponding image points in Figure 4.2.
Figure 4.2: Graph of 2x + 3y – 6 = 0, 0 < x < 3, and its image under T(x, y) = (x + y, x – y)
An equation for any straight line can easily be written in parametric form. Although not commonly used for straight lines in the cartesian plane (unless as a simple application of vector kinematics), the parametric form of an equation for a straight line is simply a vector equation for the line. It gives the position vector of each point on the line with respect to the origin. In this sense the coordinates of a point in the plane correspond to its position vector relative to the origin (0, 0). Parametric forms are very useful in computer graphic applications.
The following discussion should be developed through an exposition that connects the graphical picture with the conceptual and symbolic argument. Consider the straight line that passes through the two distinct points P, with position vector p = (x1, y1), and Q, with position vector q = (x2, y2). A direction vector for this line from P to Q is d = q – p = (x2 – x1, y2 – y1), and the position vector r of any point on the line is given by r = p + td, t ∈ R. That is, any point on the line that passes through P and Q must be some distance (a scalar multiple of the length of the directed line segment PQ ) along this line from the point P. This is illustrated in Figure 4.3. If we restrict t to the interval [0, 1], then we have exactly the directed line segment from P to Q.
As d = q – p, another way of writing this position vector is r = p + t(q – p) = (1 – t)p + t q. When t = 0, r = p, and when t = 1, r = q, the endpoints of the line segment. If 0 ≤ t ≤ 1, then the vector r is clearly the position vector of some point on the directed line segment PQ .
Figure 4.3: Vector representation of a line through two distinct points in the plane
For example, consider the line passing through the points P(1, 2) and Q(–3, 4). Then the position vector r of any point R on the line through P and Q can be written r = (1, 2) + t (–4, 2), where PQ = (–4, 2).
�3
Transformations of the cartesian plane
ChApteR4
It is natural for students to inquire how this relates to the more common representation of a straight line by the rule y = mx + c, where c is interpreted as the y-axis intercept and m represents the slope of the line (the ratio of the difference between the y values of two points to the corresponding difference between their x values). A simple parameterisation of y = mx + c is to let x = t, where t ∈ R; then y = mt + c and the corresponding vector parametric form is (t, mt + c), t ∈ R. This simple parameterisation can also be written in matrix form as

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ c \end{bmatrix} + t\begin{bmatrix} 1 \\ m \end{bmatrix}$$
In vector terms, the position vector of any point S on the line is the sum of the vector $\begin{bmatrix} 0 \\ c \end{bmatrix}$, the position vector of the point (0, c) (the y-axis intercept), and a scalar multiple of the vector $\begin{bmatrix} 1 \\ m \end{bmatrix}$, which gives the direction of the line. Note that the vector $\begin{bmatrix} 1 \\ m \end{bmatrix}$ has a horizontal component of 1 unit and a vertical component of m units, as shown in Figure 4.4.
Figure 4.4: Vector representation of y = mx + c in the plane
Using matrices and vectors, it can easily be shown that under a linear transformation with a non-singular matrix:
1 The origin is mapped onto itself.
2 The transformation is a one-to-one mapping of R² onto R².
3 A straight line (line segment) is mapped onto a straight line (line segment).
4 Any pair of distinct parallel lines is mapped onto another pair of distinct parallel lines.
5 A straight line that passes through the origin is mapped onto another straight line that passes through the origin.
If the matrix of the transformation is singular, then any line will be mapped to a line or a point.
There are several methods for determining the equation of the image of a line under a linear transformation. Some involve the use of the inverse of a 2 × 2 matrix, while others do not. The following discussion illustrates three different approaches. Teachers should encourage students to think about the mathematical strategies involved, and where various constructs arise in each case.
EXAMPLE 4.4

Consider the linear transformation associated with the matrix $A = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}$. Find the image of the line with rule y = 2x + 3 under this transformation.

Solution

Method 1
Since straight lines are transformed onto straight lines under a linear transformation, we can find the image by finding the images of any two distinct points on the original straight line. For example, the points (0, 3) and (1, 5) clearly lie on the straight line with equation y = 2x + 3. The corresponding image points are given by

$$\begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}\begin{bmatrix} 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 6 \\ -9 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}\begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 11 \\ -11 \end{bmatrix}$$

So the image of the line y = 2x + 3 passes through the points (6, –9) and (11, –11). Using the general form $y - y_1 = \left(\dfrac{y_2 - y_1}{x_2 - x_1}\right)(x - x_1)$, we get y + 9 = –(2/5)(x – 6), or y = –(1/5)(2x + 33).
Method 2
Use a vector parametric form for the straight line:

$$r = \begin{bmatrix} 0 \\ 3 \end{bmatrix} + t\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} t \\ 3 + 2t \end{bmatrix}, \quad t \in R$$

Then

$$T(r) = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}\begin{bmatrix} t \\ 3 + 2t \end{bmatrix} = \begin{bmatrix} 6 + 5t \\ -9 - 2t \end{bmatrix} = \begin{bmatrix} 6 \\ -9 \end{bmatrix} + t\begin{bmatrix} 5 \\ -2 \end{bmatrix}$$

This corresponds to a line through the point (6, –9) in the direction of $\begin{bmatrix} 5 \\ -2 \end{bmatrix}$, that is, with slope –2/5, and so has cartesian equation y + 9 = –(2/5)(x – 6), or y = –(1/5)(2x + 33), as before.
Method 3
Consider an arbitrary point (x, y) on the line. This is mapped to the point (x₁, y₁) where $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = A\begin{bmatrix} x \\ y \end{bmatrix}$, that is, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$. This gives us equations for x₁ and y₁ in terms of x and y. What we want is to solve these for x and y in terms of x₁ and y₁, and substitute into the original equation to give an equation involving x₁ and y₁. This can be done easily by multiplying both sides of the matrix equation by A⁻¹:

$$\begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}^{-1}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}^{-1}\begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \tfrac{3}{11} & \tfrac{2}{11} \\ \tfrac{4}{11} & -\tfrac{1}{11} \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{11}(3x_1 + 2y_1) \\ \tfrac{1}{11}(4x_1 - y_1) \end{bmatrix}$$

Hence y = 2x + 3 becomes

$$\tfrac{1}{11}(4x_1 - y_1) = 2 \times \tfrac{1}{11}(3x_1 + 2y_1) + 3$$

which simplifies to 5y₁ = –2x₁ – 33, or y₁ = –(1/5)(2x₁ + 33), and so the equation of the transformed line is y = –(1/5)(2x + 33), as before.
This method can be implemented directly by finding $\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$, substituting into y = mx + c, and then solving for y₁ in terms of x₁. These computations can be readily carried out in one step using a CAS. This method will not work if we cannot find the inverse of the transformation matrix, that is, if the transformation matrix is singular. In this case the transformation maps the plane onto a line through the origin or onto the origin itself. Method 3 is easy to adapt to finding the image of any function, and is the one we will use in general.
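The remark about carrying out Method 3 in one step with a CAS can be imitated with exact rational arithmetic in Python: invert the matrix, express x and y in terms of x₁ and y₁, substitute into y = mx + c, and read off the slope and intercept of the image line. This is a sketch under my own naming, using the matrix of Example 4.4:

```python
from fractions import Fraction as F

def inv2(A):
    """Inverse of a 2x2 matrix A = [[a, b], [c, d]] in exact arithmetic."""
    (a, b), (c, d) = [[F(v) for v in row] for row in A]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Image of the line y = m x + c under the transformation with matrix A
# (Method 3): substitute (x, y) = A^(-1)(x1, y1) into y - m x - c = 0.
A = [[1, 2], [4, -3]]
m, c = 2, 3
(p, q), (r, s) = inv2(A)
# The image line is (r - m p) x1 + (s - m q) y1 - c = 0, i.e. y1 = m1 x1 + c1.
coef_x, coef_y = r - m * p, s - m * q
m1, c1 = -coef_x / coef_y, c / coef_y
print(m1, c1)   # prints -2/5 -33/5, i.e. y = -(1/5)(2x + 33)
```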
EXAMPLE 4.5

Consider the linear transformation associated with the matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Under this transformation, what is the image of the line y = 2x + 3?

Solution

We cannot use Method 3 because the matrix A does not have an inverse, since det(A) = 1 – 1 = 0. We begin in a similar way, and find the image of the point with coordinates (x, y):

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ x + y \end{bmatrix}$$
Since the x1- and y1-coordinates are the same, and x + y is not constant, the image of the line is y = x. In fact, one can show that the image of any line other than those of the form y = –x + c is y = x, and that the image of y = –x + c is the point (c, c), which of course is on the line y = x. This transformation corresponds to a projection onto the line y = x. Projections onto lines will not be considered in any detail, since they do not occur when transforming functions.
STUDENT ACTIVITY 4.2

a Establish each of the properties of linear transformations (with non-singular matrices) listed above.
b Find the equation of the image of the line with equation y = 5 – 3x under the linear transformation with matrix $\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$.
c Find the equation of the image of the line with equation y = 5 – 3x under the linear transformation with matrix $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.
d Find the equations of lines which are mapped to points under the linear transformation with matrix $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.
e Find the image of the unit square (that is, the region bounded by line segments joining the vertices (0, 0), (0, 1), (1, 1) and (1, 0)) and the area of this region under the transformation with matrix $\begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}$.
LINEAR TRANSFORMATION OF A CURVE

To find the image of the graph of y = f(x) or f(x, y) = c under the linear transformation with matrix A, we can proceed as for a straight line, provided A has an inverse. Consider an arbitrary point (x, y) on the curve which is the graph of the function or relation we are interested in. If this is mapped to the point (x₁, y₁), where $A\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$, we can equivalently write $\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$, which gives expressions for x and y. These can simply be substituted into y = f(x) or f(x, y) = c to find the equation of the image function or relation.
EXAMPLE 4.6

Find the image of the function y = x² and the relation 4x² + y² = 1 under the linear transformation represented by the matrix $A = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}$.

Solution

Consider an arbitrary point (x, y) on the curve, which is mapped to the point (x₁, y₁) where $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$. This gives us equations for x₁ and y₁ in terms of x and y. What we want is to solve these for x and y in terms of x₁ and y₁, and substitute into the original equation of the curve to give an equation involving x₁ and y₁. This can be done easily by pre-multiplying both sides of the matrix equation by A⁻¹:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & -3 \end{bmatrix}^{-1}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \tfrac{3}{11} & \tfrac{2}{11} \\ \tfrac{4}{11} & -\tfrac{1}{11} \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{11}(3x_1 + 2y_1) \\ \tfrac{1}{11}(4x_1 - y_1) \end{bmatrix}$$

Hence the rule of the image of the function y = x² is the relation

$$\tfrac{1}{11}(4x_1 - y_1) = \left(\tfrac{1}{11}(3x_1 + 2y_1)\right)^2$$

Expanding this expression and replacing x₁ by x and y₁ by y gives 44x – 11y = 9x² + 12xy + 4y². The graphs of both the original function and the image relation are shown in Figure 4.5.
Figure 4.5: Graph of the function y = x² and its image relation under transformation by the matrix A
The rule of the image of the relation 4x² + y² = 1 is the relation

$$4\left(\tfrac{1}{11}(3x_1 + 2y_1)\right)^2 + \left(\tfrac{1}{11}(4x_1 - y_1)\right)^2 = 1$$

Expanding this expression and replacing x₁ by x and y₁ by y, we have 52x² + 40xy + 17y² = 121. The graphs of the original curve and its image are shown in Figure 4.6.
Figure 4.6: Graph of the relation 4x² + y² = 1 and its image relation under transformation by the matrix A
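The image relations in Example 4.6 can be spot-checked numerically: transform sample points of the original parabola and confirm that each image point satisfies the stated relation 44x – 11y = 9x² + 12xy + 4y². A short sketch (the function name `image` is my own):

```python
def image(A, p):
    """Image of the point p = (x, y) under the 2x2 matrix A."""
    (a, b), (c, d) = A
    x, y = p
    return (a * x + b * y, c * x + d * y)

A = [[1, 2], [4, -3]]

# Points (t, t^2) on y = x^2 should land on 44x - 11y = 9x^2 + 12xy + 4y^2.
for t in [-2, -1, 0, 1, 2]:
    x, y = image(A, (t, t * t))
    assert 44 * x - 11 * y == 9 * x * x + 12 * x * y + 4 * y * y
print("parabola image relation checked")
```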
STUDENT ACTIVITY 4.3

a Find the image of y = sin(x) under the linear transformation with matrix $\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$.
b Find the image of y = x² under the transformation with matrix $\begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}$.
c Find the image of y = x² under the transformation with matrix $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.
STANDARD TYPES OF LINEAR TRANSFORMATIONS
It is a key part of many senior secondary mathematics curricula to investigate the effects of certain standard types of transformations on the graphs of familiar functions and relations. In the following discussion, such an investigation is carried out with respect to the unit square, defined as the region bounded by and including the set of four line segments joining (0, 0) with (1, 0); (1, 0) with (1, 1); (1, 1) with (0, 1); and (0, 1) with (0, 0), and with respect to the graphs of the functions with domain R and rules f(x) = x² and g(x) = sin(x) respectively. Note that if a linear transformation with matrix A maps a region of area a₁ onto a region of area a₂, then a₂ = |det(A)| × a₁.
Computer algebra systems can be used to good effect to apply transformations to points, find algebraic relations corresponding to the transformation of variables, carry out computation involving compositions of transformations and their inverses, and draw graphs of original and transformed sets of points in the plane.
Dilations from the axes

Dilation by a factor k from the x-axis

The transformation matrix is of the form $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$ for k > 0.
EXAMPLE 4.7

What effect does the transformation matrix $\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$ have on the unit square and the graphs of the functions f and g?

Solution

Under this transformation, each point (x, y) is mapped to the point (x, 3y), and so the corners (vertices) of the unit square (0, 0), (1, 0), (1, 1) and (0, 1) are mapped to the points (0, 0), (1, 0), (1, 3) and (0, 3) respectively. The square has been stretched vertically, and the resulting rectangle has area 3 square units, as shown in Figure 4.7.
Figure 4.7: Graph of the unit square and its image under the transformation with matrix $\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$ (corresponding to a dilation by factor 3 from the x-axis)
In general, the point (x, y) is mapped to the point (x₁, y₁), where

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 3y \end{bmatrix}$$

and so x = x₁ and y = y₁/3. The effect on the graph of y = f(x) is given by substituting for x and y in y = x², which results in y₁/3 = x₁². The corresponding rule for the transformed function is y = f₁(x) = 3x². Part of the graph of the original function and its image function is shown in Figure 4.8.
Figure 4.8: Graph of part of f(x) = x² and its image under the transformation with matrix $\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$
The effect on the graph of y = g(x) is given by substituting for x and y in y = sin(x), which results in y₁/3 = sin(x₁), or y₁ = 3 sin(x₁). The rule for the transformed function is g₁(x) = 3 sin(x). The graphs of g and g₁ are shown in Figure 4.9.
Figure 4.9: Graph of part of g(x) = sin(x) and its image under the transformation with matrix $\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$
In general, dilation by factor k from the x-axis results in y = f(x) being transformed to y = kf(x).
Dilation by a factor k from the y-axis

The transformation matrix is of the form $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ for k > 0.
EXAMPLE 4.8

What effect does the transformation with matrix $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$ have on the unit square and the graphs of the functions f and g?

Solution

Under the transformation, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x \\ y \end{bmatrix}$, so each point (x, y) is mapped to the point (3x, y), and so the vertices of the unit square (0, 0), (1, 0), (1, 1) and (0, 1) are mapped to the points (0, 0), (3, 0), (3, 1) and (0, 1) respectively. The square has been stretched horizontally, and the area of the resulting rectangle is 3 square units, as shown in Figure 4.10.
Figure 4.10: Graph of the unit square and its image under the transformation with matrix $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$
In general, the effect on any curve or region is described by

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x \\ y \end{bmatrix}$$

and so x = x₁/3 and y = y₁. The effect on the graph of y = f(x) is given by substituting for x and y in y = x², which results in $y_1 = \left(\tfrac{x_1}{3}\right)^2 = \tfrac{x_1^2}{9}$. The corresponding rule for the transformed function is $y = f_1(x) = \tfrac{x^2}{9}$. Part of the graph of the original function and its image function are shown in Figure 4.11.
Figure 4.11: Graph of part of f(x) = x² and its image under the transformation with matrix $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$
The effect on the graph of y = g(x) is given by substituting for x and y in y = sin(x), which results in y₁ = sin(x₁/3). The rule for the transformed function is g₁(x) = sin(x/3). The graphs of g and g₁ are shown in Figure 4.12.
Figure 4.12: Graph of part of g(x) = sin(x) and its image under the transformation with matrix $\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$
In general, dilation by a factor k from the y-axis results in y = f(x) being transformed to $y = f\left(\tfrac{x}{k}\right)$.
Equal dilations from both axes (scalings)

These transformations are used in scale diagrams and maps. The transformation matrix is of the form $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$ for k > 0. This transformation is the product of a dilation from the x-axis followed by a dilation from the y-axis, or vice versa. That is,

$$\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$$
EXAMPLE 4.9

What effect does the transformation matrix $\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$ have on the unit square and on the graphs of the functions f and g?
Solution

Under the transformation, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x \\ 3y \end{bmatrix}$. Each point (x, y) is mapped to the point (3x, 3y), and so the vertices of the unit square (0, 0), (1, 0), (1, 1) and (0, 1) are mapped to the points (0, 0), (3, 0), (3, 3) and (0, 3) respectively. The square has been scaled by a factor of 3 and the resultant square has an area of 9 square units, as shown in Figure 4.13.
Figure 4.13: Graph of the unit square and its image under the transformation with matrix $\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$
Since $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x \\ 3y \end{bmatrix}$, we have x = x₁/3 and y = y₁/3, so the graphs of y = x² and y = sin(x) are transformed to the graphs of the functions $\tfrac{y_1}{3} = \left(\tfrac{x_1}{3}\right)^2$ and $\tfrac{y_1}{3} = \sin\left(\tfrac{x_1}{3}\right)$, that is, $y = \tfrac{x^2}{3}$ and $y = 3\sin\left(\tfrac{x}{3}\right)$ respectively.

In general, dilation by a factor k from both axes results in y = f(x) being transformed to $y = kf\left(\tfrac{x}{k}\right)$.
After a suitable range of specific examples has been explored from first principles for a variety of functions and relations, students can be led to consider the general case of a composition of dilations from both axes. This can be used to naturally and informally introduce the notion of composition of linear transformations. It also provides an example of a set of matrices for which multiplication is commutative. Thus, given the transformation $\begin{bmatrix} k_y & 0 \\ 0 & 1 \end{bmatrix}$ for $k_y > 0$ (a dilation from the y-axis) and the transformation $\begin{bmatrix} 1 & 0 \\ 0 & k_x \end{bmatrix}$ for $k_x > 0$ (a dilation from the x-axis), the product

$$\begin{bmatrix} k_y & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & k_x \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_x \end{bmatrix}\begin{bmatrix} k_y & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} k_y & 0 \\ 0 & k_x \end{bmatrix}$$

is a composite dilation, and this composition is commutative (it doesn't matter in which order the transformations are applied, the final image is the same). After such a composite dilation, y = f(x) becomes $y = k_x f\left(\tfrac{x}{k_y}\right)$.
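The commutativity claim for dilation matrices is easy to verify directly, for example with kₓ = 2 and k_y = 3 (a sketch with my own helper name `matmul`):

```python
def matmul(A, B):
    """Product of two 2x2 matrices (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

kx, ky = 2, 3
Dy = [[ky, 0], [0, 1]]   # dilation by factor ky from the y-axis
Dx = [[1, 0], [0, kx]]   # dilation by factor kx from the x-axis

# Both orders of composition give the same composite dilation.
assert matmul(Dy, Dx) == matmul(Dx, Dy) == [[ky, 0], [0, kx]]
print(matmul(Dy, Dx))    # [[3, 0], [0, 2]]
```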
Reflections in lines through the origin

The matrices $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ represent (i) a reflection in the y-axis, (ii) a reflection in the x-axis and (iii) a reflection in the line y = x respectively, since for any (x, y) we have:

i $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ y \end{bmatrix}$

ii $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$

iii $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}$
That these matrices are indeed the correct representations for the corresponding transformations can readily be seen by considering a general transformation matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and choosing a, b, c and d so that the requisite transformation of coordinates applies. For example, reflection in the y-axis maps the point (x, y) to the point (–x, y), or in matrix form

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ y \end{bmatrix}$$

Thus, as we require ax + by = –x, this can be obtained by having a = –1 and b = 0. Similarly, as we require cx + dy = y, this is obtained by having c = 0 and d = 1, so the required transformation matrix is

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$$

It is a useful exercise for students to similarly produce the other reflection matrices mentioned above, and to apply the same reasoning to the dilation matrices covered earlier.
EXAMPLE 4.10

What is the effect of the matrix $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$, corresponding to a reflection in the y-axis, on the graphs of the functions f and g?

Solution

Under this transformation, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ y \end{bmatrix}$, and so x = –x₁ and y = y₁. Thus, in general, y = f(x) is transformed to y₁ = f(–x₁), or y = f(–x). So y = x² is transformed to y = (–x)² = x² and y = sin(x) is transformed to y = sin(–x) = –sin(x). When the image of a function or relation under a transformation is the same as the original function or relation, the transformation illustrates a symmetry of the function or relation (see Leigh-Lancaster, 2006). In this case, for y = x², the symmetry exhibited is reflection (mirror) symmetry in the vertical coordinate axis.
EXAMPLE 4.11

What is the effect of the matrix $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, corresponding to a reflection in the x-axis, on the graphs of the functions f and g?

Solution

Under this transformation, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$, and so x = x₁ and y = –y₁. Thus, in general, y = f(x) is transformed to –y₁ = f(x₁), or y = –f(x). So y = x² is transformed to y = –x² and y = sin(x) is transformed to y = –sin(x).
EXAMPLE 4.12

What is the effect of the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, corresponding to a reflection in the line y = x, on the graphs of the functions f and g?

Solution

Under this transformation, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}$, and so x = y₁ and y = x₁. Thus, in general, y = f(x) is transformed to x₁ = f(y₁), or x = f(y). So y = x² is transformed to x = y² and y = sin(x) is transformed to x = sin(y).
The transformed functions are no longer functions but relations. In fact each function and its transformed relation are, by definition, inverse relations. For any one-to-one function h, the inverse relation is also a one-to-one function. Such functions are useful in solving equations by hand and using technology, since, for such a function, h, the equation h(x) = k will have the corresponding solution x = h–1(k).
Reflection in the line y = mx

To find the transformation matrix, we need to find the images of the point A with coordinates (1, 0) and the point B with coordinates (0, 1) under the transformation. We will assume in the following that 0 < m < 1; the reader can adapt the argument for other values of m. Consider the diagram shown in Figure 4.14.
Figure 4.14: Finding the image of A(1, 0) under reflection in the line y = mx, where 0 < m < 1
First we find the image of point A with coordinates (1, 0). The line through A perpendicular to the line y = mx cuts the line y = mx at Q and passes through the point P, where length of PQ = length of AQ. So P is the image of A after reflection in the line y = mx, and OP is also of length 1 unit. Let θ be the angle that the line makes with the positive x-axis, so tan (θ) = m. Then the angle POA = 2θ and so the coordinates of P are (cos(2θ), sin(2θ)) by definition. Next we find the image of point B with coordinates (0, 1), as shown in Figure 4.15.
The line through B perpendicular to y = mx intersects the line y = mx at R, cuts the x-axis at T, and S is the image of B after reflection in the line y = mx. Angles ORB and ORS are both right angles, and so, since the sum of angles in a triangle is 180°, the magnitude of angle TOS = 90° – 2θ. Hence the coordinates of S are (cos(–(90° – 2θ)), sin(–(90° – 2θ))). Using the double angle formulas
cos(A – B) = cos(A)cos(B) + sin(A)sin(B) and sin(A – B) = sin(A)cos(B) – cos(A)sin(B)
we see that the coordinates of S are (sin(2θ), –cos(2θ)). Hence a reflection in the line y = mx can be represented by the matrix

$$\begin{bmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{bmatrix}$$

where tan(θ) = m and $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$. In most cases it will not be possible to evaluate θ exactly, but

$$\sin(2\theta) = \frac{2m}{1 + m^2} \quad \text{and} \quad \cos(2\theta) = \frac{1 - m^2}{1 + m^2}$$

for a reflection in the line y = mx.
Example 4.13
The graph of the function y = f(x) is reflected in the line y = 2x. Find the equation of the transformed function.
Solution
Since m = tan(θ) = 2, it follows that θ = arctan(2), sin(2θ) = 0.8 and cos(2θ) = –0.6. Consider

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} -0.6 & 0.8 \\ 0.8 & 0.6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$
Figure 4.15: Finding the image of B (0, 1) under reflection in the line y = mx, where 0 < m < 1
As reflection transformations are self-inverse, the inverse of a reflection matrix will be the reflection matrix itself (which can be readily verified by using the standard form for the inverse of a 2 × 2 matrix), so

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -0.6 & 0.8 \\ 0.8 & 0.6 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}.$$
Thus, the equation of the transformed function will be 0.8x + 0.6y = f(–0.6x + 0.8y).
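These calculations can be checked numerically; the sketch below uses Python with NumPy as a stand-in for the CAS mentioned elsewhere in the text (the function name `reflection_matrix` is ours, not the book's):

```python
import numpy as np

def reflection_matrix(m):
    """Matrix for reflection in the line y = m*x, built from
    sin(2θ) = 2m/(1 + m²) and cos(2θ) = (1 − m²)/(1 + m²)."""
    s = 2 * m / (1 + m**2)        # sin(2θ)
    c = (1 - m**2) / (1 + m**2)   # cos(2θ)
    return np.array([[c,  s],
                     [s, -c]])

M = reflection_matrix(2)          # reflection in the line y = 2x
print(M)                          # [[-0.6  0.8]
                                  #  [ 0.8  0.6]]
print(M @ M)                      # the identity: a reflection is self-inverse
```

Running this for m = 2 reproduces the entries 0.8 and –0.6 used in Example 4.13.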
Rotations about the origin
What is the matrix representing rotation about the origin in the anticlockwise direction through an angle θ, as shown in Figure 4.16?
Figure 4.16: Rotation of the points (1, 0) and (0, 1) anticlockwise about the origin through an angle θ
The point (1, 0) is rotated to point P with coordinates (cos(θ), sin(θ)) and the point (0, 1) is rotated to the point Q with coordinates (cos(90° + θ), sin(90° + θ)) = (–sin(θ), cos(θ)). Hence the matrix corresponding to an anti-clockwise rotation about the origin through the angle θ is:
$$\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$$
This result can be used to establish the compound angle formulas used in the previous section, by considering a rotation through an angle of θ1 + θ2 as both a single rotation through an angle of θ1 + θ2 and as a composition of two rotations, one through an angle of θ1 followed by another through an angle of θ2. By definition, the two corresponding matrices must be equal, so the corresponding elements give the required identities (see Leigh-Lancaster, 2006, pp. 66–8).
Example 4.14
Find the resultant function when the graph of y = f(x) is rotated anticlockwise about the origin through an angle of 60°. Find the image for the particular case when f(x) = 2x.
Solution
The matrix corresponding to an anticlockwise rotation of 60° about the origin is

$$\begin{bmatrix} \cos(60^\circ) & -\sin(60^\circ) \\ \sin(60^\circ) & \cos(60^\circ) \end{bmatrix}.$$
Then

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos(60^\circ) & -\sin(60^\circ) \\ \sin(60^\circ) & \cos(60^\circ) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

so

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos(60^\circ) & -\sin(60^\circ) \\ \sin(60^\circ) & \cos(60^\circ) \end{bmatrix}^{-1} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \cos(60^\circ) & \sin(60^\circ) \\ -\sin(60^\circ) & \cos(60^\circ) \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

and y = f(x) is transformed to –x sin(60°) + y cos(60°) = f(x cos(60°) + y sin(60°)).
Using the known exact surd values for sin(60°) and cos(60°) we obtain

$$-\frac{\sqrt{3}}{2}x + \frac{y}{2} = f\!\left(\frac{x}{2} + \frac{\sqrt{3}}{2}y\right).$$

For the particular case f(x) = 2x, the image will be

$$-\frac{\sqrt{3}}{2}x + \frac{y}{2} = 2\left(\frac{x}{2} + \frac{\sqrt{3}}{2}y\right),$$

which can be rearranged to give $y = -\left(\frac{5\sqrt{3} + 8}{11}\right)x$.
Students should be encouraged to explore these special cases:
1 When θ = 90° the rotation matrix becomes $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.
2 When θ = 180° the rotation matrix becomes $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$.
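The rotation matrix, its special cases and the "composition of rotations is rotation through the sum" fact behind the compound angle formulas can all be checked numerically. A minimal sketch in Python with NumPy (the helper name `rotation_matrix` is ours):

```python
import numpy as np

def rotation_matrix(theta):
    """Anticlockwise rotation about the origin through angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Special case θ = 90°: the matrix [[0, -1], [1, 0]], up to rounding.
print(rotation_matrix(np.pi / 2).round(12))

# A rotation through θ followed by a rotation through φ equals a single
# rotation through θ + φ, the matrix identity behind the compound angle
# formulas.
theta, phi = 0.7, 1.1
assert np.allclose(rotation_matrix(phi) @ rotation_matrix(theta),
                   rotation_matrix(theta + phi))
```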
Composition of linear transformations
When several transformations are applied in a sequence, the matrix of the resulting transformation is simply the product of the corresponding matrices applied from right to left. If $\begin{bmatrix} x \\ y \end{bmatrix}$ is the coordinate vector of any point in the plane, and the linear transformation S is applied to $\begin{bmatrix} x \\ y \end{bmatrix}$, then the resultant image coordinates will be given by $\begin{bmatrix} x' \\ y' \end{bmatrix} = S\begin{bmatrix} x \\ y \end{bmatrix}$. If a second transformation T is applied to this, then the resultant image coordinates of the combined transformations will be given by $T\begin{bmatrix} x' \\ y' \end{bmatrix} = T\left(S\begin{bmatrix} x \\ y \end{bmatrix}\right) = TS\begin{bmatrix} x \\ y \end{bmatrix}$. The process is likewise repeated if further linear transformations are applied.
Example 4.15
Find the matrix of the transformation consisting of a reflection in the y-axis followed by an anticlockwise rotation about the origin through an angle of 45°.
Solution
The matrix for the first transformation is $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$, while the matrix for the second transformation is

$$\begin{bmatrix} \cos(45^\circ) & -\sin(45^\circ) \\ \sin(45^\circ) & \cos(45^\circ) \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
To obtain the image of each point (x, y) in the plane under the given sequence of these two transformations we use

$$\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$
So the matrix acting upon each point is the product of the rotation matrix and the reflection matrix:
$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
If the sequence of application of these two transformations is reversed, that is, we seek the matrix of the transformation consisting of a rotation about the origin through an angle of 45° followed by a reflection in the y-axis, then the combined transformation matrix will be

$$A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix},$$

which is not the same as the previous combined transformation matrix. The order of the transformations is important for the composition of these two transformations, unlike the case for composition of two rotations about the origin.
In general, the matrices corresponding to the linear transformations should be multiplied from right to left, matching the sequence of application of the transformations. Knowledge of the geometry of the transformations involved will provide insight into whether the composition of transformations in reverse order yields the same result as the composition in the original order, and consequently whether the matrix product is commutative. In general, since matrix multiplication is not commutative, the composition of transformations will not be the same when their sequence is reversed, unless this is a consequence of the nature of the transformations involved.
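The order-dependence in Example 4.15 is easy to confirm numerically; a sketch in Python with NumPy:

```python
import numpy as np

reflect_y = np.array([[-1, 0],     # reflection in the y-axis
                      [ 0, 1]])
rot45 = np.array([[1, -1],         # anticlockwise rotation through 45°
                  [1,  1]]) / np.sqrt(2)

A1 = rot45 @ reflect_y             # reflection first, then rotation
A2 = reflect_y @ rot45             # rotation first, then reflection
print(A1)
print(A2)
# The two products differ, so this composition is not commutative.
assert not np.allclose(A1, A2)
```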
Student Activity 4.4
a Find the coordinates of the images of the vertices of the unit square under the transformation with matrix $\begin{bmatrix} k_x & 0 \\ 0 & k_y \end{bmatrix}$, and show that the area of the image is given by $k_x k_y$.

b Show that the rule of the image of y = f(x) under the transformation with matrix $\begin{bmatrix} k_x & 0 \\ 0 & k_y \end{bmatrix}$, where $k_x \ne 0$ and $k_y \ne 0$, is $y = k_y f\!\left(\frac{x}{k_x}\right)$.

c Find the image of the relation x² + y² = 1 (the unit circle) under the transformation with matrix $\begin{bmatrix} k_x & 0 \\ 0 & k_y \end{bmatrix}$ where $k_x = a$ and $k_y = b$, with $a \ne 0$ and $b \ne 0$. Hence find the formula for the area of the ellipse with horizontal axis length 2a and vertical axis length 2b.

d Find the transformation matrix for an anticlockwise rotation about the origin through an angle θ followed by an anticlockwise rotation about the origin through an angle φ. Use the transformation matrix for an anticlockwise rotation about the origin through an angle θ + φ to show:
i sin(θ + φ) = sin(θ)cos(φ) + cos(θ)sin(φ)
ii cos(θ + φ) = cos(θ)cos(φ) – sin(θ)sin(φ)

e The matrix $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ represents a shear transformation in the x direction and the matrix $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$ represents a shear transformation in the y direction. Find the image of the unit square under the shear transformation with matrix $\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$ and draw the original and its image on the same graph.

f Find the image of the function y = f(x) under the shear transformation with matrix $\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$. In particular, find the image of y = x² under this transformation.
Affine transformations
A function T: R² → R² is called an affine transformation if T(u) = Au + B, where u is an ordered pair (position vector) corresponding to a point in the plane, A is a 2 × 2 matrix and B is a fixed element of R². A is called the matrix for T, and B the constant vector. It follows that every linear transformation is an affine transformation, with $B = (0, 0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. However, affine transformations also include translations parallel to the coordinate axes, and combinations of these translations. With some discussion similar to the earlier case for linear transformations, it will be apparent to students that affine transformations also transform straight lines to straight lines and line segments to line segments. However, unlike linear transformations, they do not necessarily transform the origin $(0, 0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ to itself, or straight lines through the origin to straight lines through the origin. Teachers may wish to explore some simple examples with students to establish this observation, for example, finding the image of y = x under the affine transformation

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$

and the natural generalisation.
Translations parallel to the axes
To translate a point p units in the positive direction of the x-axis, we would use $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = I\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} p \\ 0 \end{bmatrix}$, where I is the identity matrix of order 2. To translate a point q units in the positive direction of the y-axis, we would use $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = I\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ q \end{bmatrix}$. We can form the composite of these transformations to translate a point p units in the positive direction of the x-axis and q units in the positive direction of the y-axis, by $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = I\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} p \\ q \end{bmatrix}$. In this last case,

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = I\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}, \quad \text{so} \quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} - \begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} x_1 - p \\ y_1 - q \end{bmatrix}$$

and y = f(x) will be transformed to y − q = f(x − p).
Example 4.16
Find the image of the point (2, 6), the line y = 3x and the parabola y = x2 under a translation by three units in the x-direction and two units in the y-direction.
Solution
$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 3 \\ 2 \end{bmatrix}$$

so

$$T\left(\begin{bmatrix} 2 \\ 6 \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 6 \end{bmatrix} + \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \end{bmatrix} + \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 8 \end{bmatrix}.$$

Suppose (x₁, y₁) is the image of the point with coordinates (x, y). Then

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

and hence

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} x_1 - 3 \\ y_1 - 2 \end{bmatrix}.$$

Then x = x₁ − 3 and y = y₁ − 2, and the line y = 3x becomes the line y₁ − 2 = 3(x₁ − 3), or y = 3x − 7.

For the case of the parabola y = x², we have y₁ − 2 = (x₁ − 3)², or y = (x − 3)² + 2.
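The translation in Example 4.16 can be sketched in Python with NumPy (the helper function `T` is our own name):

```python
import numpy as np

A = np.eye(2)                     # linear part: the identity matrix
B = np.array([3, 2])              # translation by 3 in x and 2 in y

def T(u):
    """Affine transformation T(u) = Au + B."""
    return A @ u + B

print(T(np.array([2, 6])))        # [5. 8.]

# The rule transforms via the inverse: x = x1 - 3, y = y1 - 2, so
# y = x**2 becomes y = (x - 3)**2 + 2. Spot-check with one point:
x = 1.5
x1, y1 = T(np.array([x, x**2]))
assert np.isclose(y1, (x1 - 3)**2 + 2)
```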
Composition of affine transformations

If S and T are affine transformations, then so is the composite transformation T∘S defined by T∘S(u) = T(S(u)). Here S is applied to u first, and T is then applied to the result. Since u is a 2 × 1 coordinate (vector) matrix, S(u) is also a 2 × 1 coordinate (vector) matrix by conformability of matrix multiplication and addition. The same argument then applies for the application of T to the 2 × 1 coordinate matrix S(u).
Example 4.17
a Find the image of the point u = (x, y) after a reflection in the y-axis followed by a dilation from the x-axis by a factor of 2.
b Find the image of the point u = (x, y) after a dilation from the x-axis by a factor of 2 followed by a reflection in the y-axis.
Solution
a Let S be a reflection in the y-axis, so S has the matrix representation $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$; and let T be a dilation from the x-axis by a factor of 2, so T has the matrix representation $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$.

Then $T{\circ}S(u) = T(S(u)) = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ 2y \end{bmatrix}$, where T∘S has the matrix representation $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$.

b In the reverse order of application we have $S{\circ}T(u) = S(T(u)) = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -x \\ 2y \end{bmatrix}$, where S∘T also has the matrix representation $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$.
Thus composition of these two transformations is commutative, that is T°S = S°T.
Students should be able to verify this for themselves by considering the geometric interpretation of both compositions.
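That this particular pair of transformations commutes can be confirmed directly; a sketch in Python with NumPy:

```python
import numpy as np

S = np.array([[-1, 0],   # reflection in the y-axis
              [ 0, 1]])
T = np.array([[1, 0],    # dilation from the x-axis by a factor of 2
              [0, 2]])

# Both composites send (x, y) to (-x, 2y), so the products agree:
print(T @ S)             # [[-1  0]
                         #  [ 0  2]]
assert np.allclose(T @ S, S @ T)
```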
Example 4.18

The following sequence of affine transformations is applied to the region bounded by the unit circle {(x, y): x² + y² = 1, –1 ≤ x ≤ 1} to obtain an obliquely oriented ellipse:
1 dilation by a factor of 3 from the y-axis and a factor of 2 from the x-axis
2 rotation through an angle of 45° anticlockwise about the origin
3 translation 1 unit parallel to the x-axis and 1 unit parallel to the y-axis
Find the equation of the resulting ellipse, find its area and then draw the corresponding graph.
Solution
The dilations are applied by multiplying an arbitrary position vector $\begin{bmatrix} x \\ y \end{bmatrix}$ by the matrix $\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$. The rotation is then applied by multiplying the result by the matrix $\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$, and the translation is applied by addition of the matrix $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Hence, in combination,

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Reversing the order (that is, applying the inverse transformations) we obtain:

$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

which gives

$$\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$$

and so

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} \frac{\sqrt{2}}{6}x_1 + \frac{\sqrt{2}}{6}y_1 - \frac{\sqrt{2}}{3} \\ \frac{\sqrt{2}}{4}y_1 - \frac{\sqrt{2}}{4}x_1 \end{bmatrix}.$$
Substituting these values for x and y into the equation for the unit circle x² + y² = 1 yields the relation:

13x² − 2x(5y + 8) + 13y² − 16y = 56

The area of the original region is π square units.
The area of the transformed region is

$$\pi \times \left|\det \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\right| \times \left|\det \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}\right| = 6\pi$$

square units.
The graphs of both the original and the transformed relations are shown in Figure 4.17.
Figure 4.17: Composition of affine transformations from the unit circle to an oblique translated ellipse
The processes of finding the rule of an (affine) transformed function or relation can be carried out readily using CAS, and the corresponding functions or relations graphed.
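For Example 4.18, NumPy can stand in for a CAS to check both the ellipse equation and the area factor; a sketch (the parameter value t is arbitrary):

```python
import numpy as np

D = np.array([[3, 0], [0, 2]])                 # dilations
R = np.array([[1, -1], [1, 1]]) / np.sqrt(2)   # 45° anticlockwise rotation
b = np.array([1, 1])                           # translation

# The image of any point on the unit circle should satisfy
# 13x^2 - 2x(5y + 8) + 13y^2 - 16y = 56.
t = 0.37                                       # arbitrary parameter value
u = np.array([np.cos(t), np.sin(t)])
x, y = R @ D @ u + b
assert np.isclose(13*x**2 - 2*x*(5*y + 8) + 13*y**2 - 16*y, 56)

# The area scales by |det(RD)| = 1 * 6 = 6, giving an area of 6π.
print(abs(np.linalg.det(R @ D)) * np.pi)       # ≈ 18.85
```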
Student Activity 4.5
a The graph of the function with rule $f(x) = \frac{1}{x}$ is transformed as follows:
1 a dilation by a factor of 0.5 from the y-axis
2 a reflection in the y-axis
3 a translation of 3 units parallel to the x-axis and a translation of 1 unit parallel to the y-axis.
Use matrices to find the rule of the transformed function.

b The graph of the relation x – y² = 0 is reflected in the x-axis and then translated 2 units to the right and 1 unit down. Find the rule of the relation for the transformed graph.
10�
Matrices
mAthsWoRksfoRteACheRs
Summary

• Matrices provide a natural representation for coordinate (position) vectors in the cartesian plane, where $(x, y) = \begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.
• The matrices (vectors) $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are said to form a basis for the coordinate vectors of the cartesian plane, since any coordinate vector can be written as a linear combination of these two vectors.
• A linear transformation of the cartesian plane is a function T: R² → R², where T(x, y) = (ax + by, cx + dy) and a, b, c and d are real numbers. In matrix form this can be represented by: $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$
• Linear transformations (with non-singular matrices) map straight lines onto straight lines and preserve the parallel relation between straight lines. The image of the origin under a linear transformation is itself, and a straight line passing through the origin is mapped onto another straight line passing through the origin (provided the transformation matrix is non-singular).
• If the images of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ under a linear transformation T are $\begin{bmatrix} a \\ c \end{bmatrix}$ and $\begin{bmatrix} b \\ d \end{bmatrix}$ respectively, then $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$.
• If the images of $\begin{bmatrix} p \\ q \end{bmatrix}$ and $\begin{bmatrix} r \\ s \end{bmatrix}$ under a linear transformation T with matrix A are $\begin{bmatrix} a \\ c \end{bmatrix}$ and $\begin{bmatrix} b \\ d \end{bmatrix}$ respectively, then $A\begin{bmatrix} p & r \\ q & s \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} p & r \\ q & s \end{bmatrix}^{-1}$ if the inverse matrix exists.
• If P, with position vector p = (x₁, y₁), and Q, with position vector q = (x₂, y₂), are two distinct points, and d = q – p = (x₂ – x₁, y₂ – y₁), then the position vector r of any point on the line that contains P and Q is given by r = p + td, t ∈ R.
• The linear function with rule y = mx + c can be written in matrix (vector) parametric form as $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ c \end{bmatrix} + t\begin{bmatrix} 1 \\ m \end{bmatrix}$, where t ∈ R.
• To find the image of the graph of y = f(x) or f(x, y) = c under the linear transformation with matrix A, we substitute $\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ into y = f(x) or f(x, y) = c, if A⁻¹ exists.
• $\begin{bmatrix} k_x & 0 \\ 0 & k_y \end{bmatrix}$, where k_x and k_y are positive real numbers, represents a dilation by a factor k_x from the y-axis and a dilation by a factor k_y from the x-axis, in either order; $\begin{bmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{bmatrix}$ represents reflection in the line y = mx where m = tan(θ); $\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$ represents a rotation through an angle θ anticlockwise about the origin; and $\begin{bmatrix} a \\ b \end{bmatrix}$ represents translation by the vector (a, b) (a units in the x-direction and b units in the y-direction when a > 0 and b > 0).
• An affine transformation of the cartesian plane is a function T: R² → R² such that T(u) = Au + B. In matrix form this can be represented by $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} ax + by + e \\ cx + dy + f \end{bmatrix}$. Affine transformations (where A⁻¹ exists) map straight lines onto straight lines and preserve the parallel relation between straight lines; however, lines through the origin are not necessarily mapped to other lines through the origin.
• If S and T are affine transformations, then so is the composite transformation T∘S defined by T∘S(u) = T(S(u)). The composition may, or may not, be commutative. This may be determined by computation of the respective matrices, or by geometric interpretation of the transformations involved.
• When a translation is part of an affine transformation, this may occur before or after composition with other linear transformations.
– If the translation component occurs last, then the affine transformation has the form X₁ = AX + B, so X = A⁻¹(X₁ – B) will determine x and y in terms of x₁ and y₁ for substitution into the rule of the original function or relation to determine the rule of the transformed relation or function.
– If the translation component occurs first, then the affine transformation has the form X₁ = A(X + B), so X = A⁻¹X₁ – B will determine x and y in terms of x₁ and y₁ for substitution into the rule of the original function or relation to determine the rule of the transformed function or relation.

References

Anton, H & Rorres, C 2005, Elementary linear algebra (applications version), 9th edn, John Wiley and Sons, New York.
Cirrito, F (ed.) 1999, Mathematics higher level (core), 2nd edn, IBID Press, Victoria.
Evans, L 2006, Complex numbers and vectors, ACER Press, Camberwell.
Leigh-Lancaster, D 2006, Functional equations, ACER Press, Camberwell.
Nicholson, KW 2003, Linear algebra with applications, 4th edn, McGraw-Hill Ryerson, Whitby, ON.
Sadler, AJ & Thorning, DWS 1996, Understanding pure mathematics, Oxford University Press, Oxford.

Websites

http://wims.unice.fr/wims/en_tool~linear~matrix.html – WIMS
This website provides a matrix calculator.
http://en.wikipedia.org/wiki/Linear_transformation – Wikipedia
This site provides a comprehensive discussion on linear transformations with links to other resources and references.
http://en.wikipedia.org/wiki/Affine_transformation – Wikipedia
This site provides a comprehensive discussion on affine transformations with links to other resources and references.
http://www.ies.co.jp/math/java/misc/don_trans/don_trans.html
This website contains an applet that shows how the shape of a dog is transformed by a 2 × 2 matrix.
http://merganser.math.gvsu.edu/david/linear/linear.html
This website contains an applet that allows you to move a slider adjusting coefficients in a 2 × 2 matrix and see the effect of the equivalent transformation on the unit square.
Chapter 5: Transition matrices

Conditional probability
One of the key ideas that students come across early in their study of probabilities related to compound events for a given event space is the notion of conditional probability and the associated ideas of dependent and independent events. While students' experience with subjective probability makes them well acquainted with events that may or may not be dependent, such as the likelihood of scoring on the second of two free shots in a basketball game, given that one may or may not have scored on the first shot, their school study of probability often begins with compound events that are (physically) independent events. On the other hand, a combination of experience and knowledge indicates that certain events are dependent, for example, gender and colour blindness to red and green, or having a disease and the likelihood of testing positive for the disease. Whether events in a given context are independent or not is not always clear. Thus, experiments with tossing coins and rolling dice, which are physically independent, can lead to an implicit willingness, or preference, to assume that two events from a given event space are independent. One of the more intriguing and counter-intuitive scenarios involving conditional probability is the game show problem called the Monty Hall dilemma (see the UCSD website). This is a good context for stimulating student interest, as many people think the events involved are (or should be) independent.
If we consider two events A and B from the same event space, U, then the conditional probability of A given B, that is, the probability that event A occurs given that event B has occurred, written as Pr(A | B), corresponds to the proportion of B events that are also A events, relative to the proportion of B events; that is,

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$

Conditional probability can be used to discuss both dependence and independence of events.
If A and B are independent events, then Pr(A | B) will be the same as Pr(A), and so in this case

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \Pr(A) \quad \text{or} \quad \Pr(A \cap B) = \Pr(A) \times \Pr(B).$$

Similarly, Pr(B | A) will be the same as Pr(B), and so in this case

$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)} = \Pr(B) \quad \text{or, as before,} \quad \Pr(A \cap B) = \Pr(A) \times \Pr(B).$$

If Pr(A | B) is different from Pr(A), and likewise Pr(B | A) is different from Pr(B), then A and B are dependent events. Any two events A and B from a given event space may or may not be dependent; however, the following relationships (sometimes called the law of total probability for two events) hold irrespective of whether this is the case or not:

$$\Pr(A) = \Pr(A \mid B) \times \Pr(B) + \Pr(A \mid B') \times \Pr(B')$$
$$\Pr(B) = \Pr(B \mid A) \times \Pr(A) + \Pr(B \mid A') \times \Pr(A')$$

where A′ and B′ are the complements of A and B in U respectively.

In practice, these relationships can be represented using a Venn diagram or Karnaugh map, a tree diagram, or by matrices. It is important to ensure that students are familiar with each of these representations and their use, to assist in solving problems related to probabilities associated with simple compound events.
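The law of total probability can also be illustrated with a short computation; the numbers below are invented for illustration (a hypothetical diagnostic-test scenario, not from the text):

```python
# Law of total probability: Pr(A) = Pr(A|B)Pr(B) + Pr(A|B')Pr(B')
# Hypothetical numbers: a test is positive for 90% of carriers of a
# condition and for 5% of non-carriers; 2% of the population are carriers.
p_pos_given_carrier = 0.90   # Pr(positive | carrier)
p_pos_given_not     = 0.05   # Pr(positive | not a carrier)
p_carrier           = 0.02   # Pr(carrier)

p_pos = (p_pos_given_carrier * p_carrier
         + p_pos_given_not * (1 - p_carrier))
print(round(p_pos, 3))       # 0.067
```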
Transition probabilities
Many natural systems undergo a process of change where at any time the system can be in one of a finite number of distinct states. For example, the weather in a city could be sunny, cloudy and fine, or rainy. Such a system changes with time from one state to another, and at scheduled times, or stages, the state of the system is observed. At any given stage, the state to which it changes at the next stage cannot be determined with certainty, but the probability that a given state will occur next can be specified by knowing the current state of the system. That is, the probability that the system will be in a given state next is conditional only on the current state. Such a process of change is called a Markov chain or Markov process. The conditional probabilities involved (that is, the probabilities of going to one state given that the system was in a certain state) are called transition probabilities, and the process can be modelled using matrices.
We begin by considering the following example.

Jane always does the weekly shopping at one of two stores, A and B. She
never shops at A twice in a row. However, if she shops at B, she is three times as likely to shop at B the next time as at A. Suppose that she initially shops at A.
This is an example of a Markov chain, since the store at which she shops next depends only on the store she shopped at the week before, and the conditional probabilities for each possible outcome are the same on each occasion. There are two states, state 1 which corresponds to shopping at store A, and state 2 which corresponds to shopping at store B. We can represent this by a tree diagram, as shown in Figure 5.1 and set up a corresponding table of transition probabilities, as shown in Table 5.1.
Figure 5.1: Tree diagram representation for the first transition
Table 5.1: Summary of transition probabilities

                            Present week's store
                            A        B
Next week's store      A    0        0.25
                       B    1        0.75
Note that the columns of the table sum to 1. The store at which Jane shops in a given week is not determined. The most we can expect to know is the probability that she will shop at A or B in that week. Let $s_1^{(m)}$ denote the probability that she shops at A in week m, and $s_2^{(m)}$ the probability that she shops at B in week m.
Example 5.1
Use the law of total probability to find the probability that Jane shops at store A in week 1 and the probability that Jane shops at store B in week 1 given that she shops at store A initially.
Solution
As she shops at A initially, $s_1^{(0)} = 1$ and $s_2^{(0)} = 0$. For the next week (using the law of total probability):

$$s_1^{(1)} = \Pr(\text{shops at A in week 1} \mid \text{shopped at A in week 0}) \times \Pr(\text{shopped at A in week 0}) + \Pr(\text{shops at A in week 1} \mid \text{shopped at B in week 0}) \times \Pr(\text{shopped at B in week 0}) = 0 \times 1 + 0.25 \times 0 = 0$$

$$s_2^{(1)} = \Pr(\text{shops at B in week 1} \mid \text{shopped at A in week 0}) \times \Pr(\text{shopped at A in week 0}) + \Pr(\text{shops at B in week 1} \mid \text{shopped at B in week 0}) \times \Pr(\text{shopped at B in week 0}) = 1 \times 1 + 0.75 \times 0 = 1$$

If we write $S_0 = \begin{bmatrix} s_1^{(0)} \\ s_2^{(0)} \end{bmatrix}$ and $S_1 = \begin{bmatrix} s_1^{(1)} \\ s_2^{(1)} \end{bmatrix}$, then, in general, $S_m = \begin{bmatrix} s_1^{(m)} \\ s_2^{(m)} \end{bmatrix}$ is called the state vector for week m, since it gives the probabilities of being in any state after m weeks (or transitions). It is convenient to let S₀ correspond to the initial week. Since the definition of matrix multiplication corresponds naturally to the calculations we wish to carry out for this purpose, these calculations can be written in matrix form as follows:

$$S_1 = \begin{bmatrix} s_1^{(1)} \\ s_2^{(1)} \end{bmatrix} = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} = PS_0$$

where $P = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix}$ is the matrix of transition probabilities, and is called the transition matrix.

Teachers should take care to ensure that students follow the modelling process and make the conceptual connections between the law of total probability, its application to the transition state problem, and the subsequent representation using matrices and their products.
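The state-vector calculation can be carried out with NumPy in place of a CAS; a minimal sketch:

```python
import numpy as np

P = np.array([[0, 0.25],    # transition matrix: columns sum to 1;
              [1, 0.75]])   # column 1 is "from A", column 2 is "from B"
S0 = np.array([1, 0])       # Jane shops at A initially

S1 = P @ S0                 # state vector for week 1
print(S1)                   # [0. 1.]
```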
Hence, from S1 we can see that given that she shopped at A in week 0, the probability that she shops at A in week 1 is 0 and the probability that she shops at B in week 1 is 1. What happens two weeks after shopping at A initially?
If we use a tree diagram representation, as shown in Figure 5.2, we can calculate the corresponding probabilities for the first two transitions.
Figure 5.2: Tree diagram representation for the first two transitions
$$s_1^{(2)} = \Pr(\text{shops at A in week 2} \mid \text{shopped at A in week 1}) \times \Pr(\text{shopped at A in week 1}) + \Pr(\text{shops at A in week 2} \mid \text{shopped at B in week 1}) \times \Pr(\text{shopped at B in week 1}) = 0 \times 0 + 0.25 \times 1 = 0.25$$

$$s_2^{(2)} = \Pr(\text{shops at B in week 2} \mid \text{shopped at A in week 1}) \times \Pr(\text{shopped at A in week 1}) + \Pr(\text{shops at B in week 2} \mid \text{shopped at B in week 1}) \times \Pr(\text{shopped at B in week 1}) = 1 \times 0 + 0.75 \times 1 = 0.75$$

In matrix terms, this is equivalent to

$$S_2 = \begin{bmatrix} s_1^{(2)} \\ s_2^{(2)} \end{bmatrix} = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.25 \\ 0.75 \end{bmatrix} = PS_1.$$
Hence, given that Jane shopped at A in week 0, the probability that she shops at A in week 2 is 0.25 and the probability that she shops at B in week 2 is 0.75.
Moreover,

$$S_2 = PS_1 = P(PS_0) = P^2 S_0 = \begin{bmatrix} 0.25 & 0.1875 \\ 0.75 & 0.8125 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.25 \\ 0.75 \end{bmatrix}$$

(the first column of P²).

This can be extended to week 3; however, the process of calculation using tree diagrams becomes increasingly time and space consuming, whereas the
matrix form offers a much more convenient representation for these calculations, where $S_3 = PS_2 = P^3 S_0$, and in general, for m weeks later:

$$S_m = PS_{m-1} = P^m S_0$$
Where many transitions may take place the relevant matrix calculations are best done by technology, using, for example, a CAS.
Example 5.2
Find the probabilities that Jane shops at A (i) 3, (ii) 4, (iii) 5, (iv) 10, (v) 50 and (vi) 100 weeks later.
Solution
i $S_3 = PS_2 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.25 \\ 0.75 \end{bmatrix} = \begin{bmatrix} 0.1875 \\ 0.8125 \end{bmatrix}$, and so the probability that she shops at A three weeks later is 0.1875.

ii $S_4 = PS_3 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.1875 \\ 0.8125 \end{bmatrix} = \begin{bmatrix} 0.203125 \\ 0.796875 \end{bmatrix}$, and so the probability that she shops at A 4 weeks later is 0.203125.

iii $S_5 = P^5 S_0 \approx \begin{bmatrix} 0.199 \\ 0.801 \end{bmatrix}$, and so the probability that she shops at A 5 weeks later is approximately 0.199.

iv $S_{10} = P^{10} S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 10 weeks later is approximately 0.200.

v $S_{50} = P^{50} S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 50 weeks later is approximately 0.200.

vi $S_{100} = P^{100} S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 100 weeks later is approximately 0.200. We can note that $P^{100} \approx \begin{bmatrix} 0.2 & 0.2 \\ 0.8 & 0.8 \end{bmatrix}$; in fact there seems to be very little change, at this level of accuracy, for larger powers Pᵐ.

In this example, the state vectors S₀, S₁, S₂ ... Sₙ appear to converge to $S = \begin{bmatrix} 0.2 \\ 0.8 \end{bmatrix}$ as n increases.
If this is indeed the case, then we can say that the long-term probability of Jane shopping at A is 0.2 and shopping at B is 0.8. That is, in the long term she will shop at A 20% of the time and at B 80% of the time, provided that she doesn't change her pattern of behaviour in this regard.
If instead of shopping at A in week 0, we knew she shopped at B in week 0, then $S_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and the corresponding probabilities are shown in Example 5.3.
Example 5.3
Find the probabilities that Jane shops at A in (i) 1, (ii) 2, (iii) 3, (iv) 4, (v) 5, (vi) 10, (vii) 50 and (viii) 100 weeks later, given that she initially shopped at B.
Solution
i $S_1 = PS_0 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.25 \\ 0.75 \end{bmatrix}$, and so the probability that she shops at A 1 week later is 0.25. This state vector is the same as that after two transitions if she shops at A initially.

ii $S_2 = PS_1 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.25 \\ 0.75 \end{bmatrix} = \begin{bmatrix} 0.1875 \\ 0.8125 \end{bmatrix}$, and so the probability that she shops at A 2 weeks later is 0.1875.

iii $S_3 = PS_2 = \begin{bmatrix} 0 & 0.25 \\ 1 & 0.75 \end{bmatrix} \begin{bmatrix} 0.1875 \\ 0.8125 \end{bmatrix} = \begin{bmatrix} 0.203125 \\ 0.796875 \end{bmatrix}$, and so the probability that she shops at A 3 weeks later is 0.203125.

iv $S_4 = P^4 S_0 \approx \begin{bmatrix} 0.199 \\ 0.801 \end{bmatrix}$, and so the probability that she shops at A 4 weeks later is approximately 0.199.

v $S_5 = P^5 S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 5 weeks later is approximately 0.200.

vi $S_{10} = P^{10} S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 10 weeks later is approximately 0.200.

vii $S_{50} = P^{50} S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 50 weeks later is approximately 0.200.

viii $S_{100} = P^{100} S_0 \approx \begin{bmatrix} 0.200 \\ 0.800 \end{bmatrix}$, and so the probability that she shops at A 100 weeks later is approximately 0.200.
The long-term probabilities are as in the previous case, in which we assumed Jane initially shopped at A. So it appears that, with this pattern of shopping behaviour, in the long term Jane will shop at A in any week with probability 0.2, and at B in any week with probability 0.8, regardless of where she shopped initially.
Table 5.2 summarises the state vectors for five transitions given that Jane either shopped at A or B initially, as well as the transition matrix raised to the number of transitions, for the first five transitions.
Table 5.2: Summary of state and transition matrices for five transitions from either initial state

Number of transitions n | State vector given she shops at A initially | State vector given she shops at B initially | P^n
1 | [0, 1]^T | [0.25, 0.75]^T | [[0, 0.25], [1, 0.75]]
2 | [0.25, 0.75]^T | [0.1875, 0.8125]^T | [[0.25, 0.1875], [0.75, 0.8125]]
3 | [0.1875, 0.8125]^T | [0.203125, 0.796875]^T | [[0.1875, 0.203125], [0.8125, 0.796875]]
4 | [0.203125, 0.796875]^T | [0.199219, 0.800781]^T | [[0.203125, 0.199219], [0.796875, 0.800781]]
5 | [0.199219, 0.800781]^T | [0.200195, 0.799805]^T | [[0.199219, 0.200195], [0.800781, 0.799805]]
We can see from this table that the columns of P^n contain the state vectors (after n transitions) after initially shopping at A or at B respectively. So the (i, j)th element of P^n gives the probability of starting in state j and moving to state i after n transitions.
Chapter 5: Transition matrices
Student activity 5.1
a Suppose initially we are unsure of where Jane shops, but it could be at either A or B with equal probability. Then S0 = [0.5, 0.5]^T. Find S1, S2, S5, S10 and S50.
b Consider a Markov chain with two states, 1 and 2, and transition probability matrix P, where pij is the probability of going from state j to state i in one transition. Suppose

P^3 = [[0.445, 0.444], [0.555, 0.556]]

What is the probability of
i going from state 1 to state 2 in three transitions?
ii going from state 2 to state 2 in three transitions?
The steady-state vector
It appears that in the long term, after many transitions, the state vectors converge to the same vector regardless of where Jane initially shopped. Such a vector is called a steady-state vector. If this is the case, how can we find the steady-state vector for this problem?
We can phrase this question more generally, and suppose that P is the transition matrix of a Markov chain, and assume that the state vectors Sm converge to a limiting state vector S. Then Sm is very close to S for sufficiently large m, so Sm + 1 is also very close to S. Then the equation
Sm + 1 = PSm
is closely approximated by
S = PS
where S is a solution to this matrix equation. It is easily solved as it can be written as a system of linear equations in matrix form
(I – P)S = O
where the entries of S are the unknowns and I is the identity matrix. This homogeneous system has many solutions; the one we are most interested in is the one whose entries sum to 1.
Example 5.4
Find the steady-state vector S for Jane’s shopping.
Solution
I - P = [[1, 0], [0, 1]] - [[0, 0.25], [1, 0.75]] = [[1, -0.25], [-1, 0.25]]

Using Gaussian elimination, this reduces to [[1, -0.25], [0, 0]], and so the solution for S = [s1, s2]^T is given by s1 = 0.25 × s2 with s1 + s2 = 1, hence s1 = 0.2, s2 = 0.8.
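As a quick numerical check of this solution (a plain-Python sketch; the helper name mat_vec is ours), the vector [0.2, 0.8]^T should be unchanged by P:

```python
# Check that S = [0.2, 0.8] is fixed by Jane's transition matrix: PS = S.

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

P = [[0.0, 0.25],
     [1.0, 0.75]]
S = [0.2, 0.8]

PS = mat_vec(P, S)
print([round(x, 10) for x in PS])   # [0.2, 0.8]: S solves S = PS, entries sum to 1
```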
It is a natural question to ask if we can always find a steady-state vector, and whether the powers of the transition matrix converge to a matrix whose columns are equal to the steady-state vector. To answer this we need to introduce the notion of a regular transition probability matrix.
We say that a transition probability matrix P is regular if, for some positive integer m, the matrix P^m has no zero entries. It can be shown that, if P is a regular transition probability matrix, then it has a unique steady-state vector S (see, for example, the Iowa State maths website). Further, the matrix L = lim P^m as m → ∞ exists, and is given by L = [S | S | ... | S], that is, a matrix in which each column is a copy of the steady-state vector S; and if L = [lij], then lij is the long-term probability of being in state i if the system began in state j.
It is straightforward to determine the steady-state vector for a 2 × 2 transition probability matrix, given that it exists. To do this we use a general formulation, taking a transition probability matrix

P = [[1 - a, b], [a, 1 - b]]

where 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1, and solve (I - P)S = O for S, where I is the 2 × 2 identity matrix. Then

I - P = [[1, 0], [0, 1]] - [[1 - a, b], [a, 1 - b]] = [[a, -b], [-a, b]]

which reduces by Gaussian elimination to [[a, -b], [0, 0]].

Writing S = [x, y]^T, we have ax = by. There are now two possible cases:

Case 1: If b > 0, then y = ax/b, and as x + y = 1, we have x + ax/b = 1, so x(1 + a/b) = 1, that is x(a + b)/b = 1. Since b > 0, x = b/(a + b) and hence y = a/(a + b).
Note that if a = 0, then y = 0 and x = 1, and state 1 is called an absorbing state. This means that once the system is in state 1 it will never leave it. Note also that if a = 1 and b = 1, then P = [[0, 1], [1, 0]] is not regular, but [[0, 1], [1, 0]] [0.5, 0.5]^T = [0.5, 0.5]^T, and so the state vectors will converge if and only if the system has initial state vector [0.5, 0.5]^T; that is, there is no steady-state vector.
Case 2: If b = 0, then ax = 0, so a = 0 or x = 0. If a = 0, then the transition matrix is the identity matrix, so the system stays in whatever state it is in initially. So there is no steady state vector. If a ≠ 0 and x = 0, then y = 1, and state 2 is also an absorbing state.
In summary, if the transition probability matrix P = [[1 - a, b], [a, 1 - b]] is regular, then the steady-state vector is given by

S = [b/(a + b), a/(a + b)]^T
Student activity 5.2
a Show that the transition probability matrix P = [[0, 1/2], [1, 1/2]] is regular. Find the steady-state vector S and the limit matrix L = lim P^n as n → ∞.

b Show that the transition probability matrix P = [[0, 1], [1, 0]] is not regular. Does the limit matrix L = lim P^m as m → ∞ exist? Does it have a steady-state vector?

c For 0 < a < 1, show that the matrix P = [[a, 0], [1 - a, 1]] is not regular. Find a steady-state vector and lim P^n as n → ∞ if it exists.
Applications of transition matrices
Examples such as the following, and others from various practical and research contexts or from the literature, can be used to help students develop the formulation, solution and interpretation skills associated with modelling and problem solving that employs transition matrices and Markov chains. Key aspects of these processes are:
• consideration of features of the context that indicate a Markov process is likely to provide a suitable model
• identification of relevant states and transition (conditional) probabilities and initial state (or states)
• formulation of the transition matrix and initial state vector, and computation of relevant powers of the transition matrix and subsequent state vectors
• analysis of long-run behaviour, including investigation and interpretation of possible steady-state or other behaviour of the system
While the first few transitions for a two-state system, and its long-run behaviour, assuming convergence to a steady state, can readily be computed by hand, student familiarity with the use of a suitable technology such as CAS is required for computation of higher powers in two-state problems, and for problems where there are more than two states.
Example 5.5
OzBank offers customers two choices of credit card: Ordinary and Gold. Currently 70% of its customers have an Ordinary card and 30% have a Gold card. The bank wants to increase the percentage of its customers with a Gold card, as it gets higher fees from these customers, and so sends out an offer to all Ordinary cardholders offering a free upgrade to a Gold card for twelve months. It expects that each month for the next three months, 10% of its Ordinary cardholders will upgrade to a Gold card, but 1% of Gold cardholders will downgrade to an Ordinary card. What percentage of its customers would have Gold cards at the end of the three months?
Solution
This information can be summarised as a table:
                       Current
                       Ordinary    Gold
One month later
  Ordinary             0.90        0.01
  Gold                 0.10        0.99
The corresponding transition matrix is [[0.90, 0.01], [0.10, 0.99]], with initial state vector [0.70, 0.30]^T.
To find the percentages three months later, we calculate

[[0.90, 0.01], [0.10, 0.99]]^3 [0.70, 0.30]^T ≈ [0.52, 0.48]^T
So after three months the bank could expect approximately 48% of its customers to have a Gold card.
We can also observe that if the number of customers was fixed during this period at, say, 1000, then we could use [700, 300]^T instead of the initial state vector, since [700, 300]^T = 1000 × [0.70, 0.30]^T, and use this to calculate the numbers in each state after each transition. This only works if the total number of objects remains constant over transitions.
Example 5.6
A wombat has its burrow beside a creek and each night it searches for food on either the east or west side of the creek. The side on which it searches for food each night depends only on the side on which it searched the night before. If the wombat searches for food on the east side one night, then the probability of the wombat searching on the east side of the creek the next night is 0.2. The transition matrix for the probabilities of the wombat searching for food on either side of the creek given the side searched on the previous night is
P = [[0.2, 0.7], [0.8, 0.3]]
a If the wombat searches for food on the west side one night, what is the probability that it searches for food on the west side the next night?
b If the wombat searches for food on the west side on the Monday night, what is the probability it searches for food on the west side again on the following Saturday (5 days later)?
c In the long term, what proportion of nights will it spend searching for food on the west side?
Solution
a We can view the transition matrix as below. It is clear that if it searches on the west side one night then the probability that it searches for food on the west side the next night is 0.3.
                                Side searched for food current night
                                East     West
Side searched for food the next night
  East                          0.2      0.7
  West                          0.8      0.3
b We need to find

[[0.2, 0.7], [0.8, 0.3]]^5 [0, 1]^T = [77/160, 83/160]^T = [0.48125, 0.51875]^T

or, alternatively, simply calculate

[[0.2, 0.7], [0.8, 0.3]]^5 = [[0.45, 0.48125], [0.55, 0.51875]]

and read off the second column. The probability the wombat searches for food on the west side on the following Saturday is 83/160 = 0.51875.
c Using the formula for the steady-state solution, with a = 0.8 and b = 0.7:

x = 0.7/(0.7 + 0.8) = 7/15 and y = 0.8/(0.7 + 0.8) = 8/15

Hence in the long term the wombat will spend 8/15 of the nights searching for food on the west side.

Alternatively, we can solve the matrix equation

([[1, 0], [0, 1]] - [[0.2, 0.7], [0.8, 0.3]]) [x, y]^T = [0, 0]^T

for x and y, with x + y = 1. The coefficient matrix reduces to [[0.8, -0.7], [0, 0]] using Gaussian elimination (replacing row 2 by row 2 + row 1), and so 0.8x - 0.7y = 0. Then x = (0.7/0.8)y = (7/8)y. As x + y = 1, (7/8)y + y = 1, so (15/8)y = 1, and hence y = 8/15 and x = 7/15. Hence in the long term the wombat will spend 8/15 of the nights searching for food on the west side.
Also, we could take suitably large powers of P and observe whether there is significant change in the resultant values or not. Now

P^20 ≈ [[0.466667, 0.466667], [0.533333, 0.533333]] and P^30 ≈ [[0.466667, 0.466667], [0.533333, 0.533333]]

so we appear to have convergence to 6 decimal places, and so the wombat will spend approximately 0.466667 (or just under 47%) of the nights searching for food on the east side and 0.533333 (or just over 53%) of the nights searching for food on the west side.
Example 5.7
A wombat has its burrow beside a creek and each night it searches for food on either the other side of the creek or north or south of its burrow on the same side of the creek. The area in which it searches for food each night depends only on the area in which it searched for food the night before. If the wombat searches for food on the other side of the creek one night, then the probabilities of the wombat searching on the other side of the creek, or north or south of its burrow the next night are 0.2, 0.4 and 0.4 respectively. If the wombat searches for food north of its burrow one night, then the probability that it will search for food north of its burrow the next night is 0.1. The transition matrix for the probabilities of the wombat searching for food in each area given the area searched for food on the previous night is
P = [[0.2, 0.5, 0.5], [0.4, 0.1, 0.3], [0.4, 0.4, 0.2]]
a If the wombat searches for food on the south side of its burrow one night, what is the probability that it searches for food on the north side the next night?
b If the wombat searches for food on the north side of its burrow on the Monday night, what is the probability it searches for food on the north side of its burrow again on the following Saturday (5 days later)?
c In the long term, what proportion of nights will the wombat spend searching for food north of its burrow?
Solution
a We can view the transition matrix as below, and it is clear that if the wombat searches for food on the south side one night, then the probability that it searches for food on the north side the next night is 0.3.

                                Area searched for food current night
                                Other    North    South
Area searched for food the next night
  Other                         0.2      0.5      0.5
  North                         0.4      0.1      0.3
  South                         0.4      0.4      0.2
b We need to find

[[0.2, 0.5, 0.5], [0.4, 0.1, 0.3], [0.4, 0.4, 0.2]]^5 [0, 1, 0]^T = [7711/20000, 28101/100000, 1042/3125]^T ≈ [0.38555, 0.28101, 0.33344]^T

and so the probability it searches for food north of its burrow on the following Saturday is 28101/100000 = 0.28101.
c Solve

([[1, 0, 0], [0, 1, 0], [0, 0, 1]] - [[0.2, 0.5, 0.5], [0.4, 0.1, 0.3], [0.4, 0.4, 0.2]]) [x, y, z]^T = [0, 0, 0]^T

for x, y and z, with x + y + z = 1.
The coefficient matrix is

[[1, 0, 0], [0, 1, 0], [0, 0, 1]] - [[0.2, 0.5, 0.5], [0.4, 0.1, 0.3], [0.4, 0.4, 0.2]] = [[0.8, -0.5, -0.5], [-0.4, 0.9, -0.3], [-0.4, -0.4, 0.8]]

which reduces to

[[1, 0, -15/13], [0, 1, -11/13], [0, 0, 0]]

by Gaussian elimination.
Hence we have x = (15/13)z and y = (11/13)z, with x + y + z = 1. This gives 3z = 1, so z = 1/3, y = 11/39 and x = 5/13. Thus in the long term the wombat will spend 11/39 (≈ 0.28205) of nights searching for food north of its burrow.
Alternatively, consider suitably large powers of the transition matrix:

P^15 ≈ [[0.38462, 0.38462, 0.38462], [0.28205, 0.28205, 0.28205], [0.33333, 0.33333, 0.33333]]

and P^20 gives the same result to 5 decimal places, and so

lim P^n as n → ∞ ≈ [[0.38462, 0.38462, 0.38462], [0.28205, 0.28205, 0.28205], [0.33333, 0.33333, 0.33333]]

Hence the wombat will spend approximately 0.28205 of nights searching for food north of its burrow.
Example 5.8
Consider a simple genetic model, involving just two types of alleles, A and a, for a gene. Suppose that a physical trait such as eye colour is controlled by a pair of these genes, one inherited from each parent. An individual could then have one of three combinations of alleles of the form AA, Aa or aa. A person may be classified as being in one of three states:
Dominant (type AA): gene of type A from both parents
Hybrid (type Aa): gene of type A from one parent and gene of type a from the other parent
Recessive (type aa): gene of type a from both parents
Assume that the gene inherited from a parent is a random 'choice' from the parent's two genes; that is, each parent is equally likely to transmit either of its two genes to an offspring. We can form a Markov chain by starting with a population and always crossing with hybrids to produce offspring. The time required to produce a subsequent generation is the time period for the chain.
What is the corresponding transition matrix? Suppose we start with a person with dominant trait—type AA—and cross with a person with hybrid trait—type Aa. Type AA will always contribute A to the offspring, and type Aa will contribute A one half of the time and a one half of the time. If we start with a hybrid, and cross with a hybrid, we have the following situation: the first hybrid will contribute either A or
a to the offspring, each with probability one-half. The second hybrid will also contribute either A or a to the offspring, again each with probability one-half. Hence we have one-quarter probability of AA, one-quarter probability of aa and one-half probability of hybrid Aa. The transition matrix is as follows:
With both rows and columns ordered D, H, R:

P = [[1/2, 1/4, 0], [1/2, 1/2, 1/2], [0, 1/4, 1/2]]
a What proportion of the third generation offspring (that is, after two time periods) of the recessive population has the dominant trait?
b What proportion of the third generation offspring (after two time periods) of the hybrid population is not hybrid?
c If, initially, the entire population is hybrid, find the population vector in the next generation.
d If, initially, the population is evenly divided among the three states, find the population vector in the third generation (after two transitions).
e Show that this Markov chain is regular, and find the steady-state population vector.
Solution
a We need to calculate P^2 and obtain the entry in the (1, 3) position.

P^2 = [[3/8, 1/4, 1/8], [1/2, 1/2, 1/2], [1/8, 1/4, 3/8]]
Hence one-eighth of the third generation offspring of the recessive population has the dominant trait.
b From P2, we see that one-half of the third generation offspring of the hybrid population has the hybrid trait, while one-quarter has dominant and one-quarter has the recessive trait. So one-half is not hybrid.
c Here S0 = [0, 1, 0]^T, and so PS0 = [1/4, 1/2, 1/4]^T, and this is the population distribution vector for the next period.
d Here S0 = [1/3, 1/3, 1/3]^T, and so P^2 S0 = [1/4, 1/2, 1/4]^T is the population vector for the third generation. This seems familiar.

e As P^2 has no zero entries, P is regular. If we approximate a suitably high power of P, say P^20, we get approximately

[[0.25, 0.25, 0.25], [0.5, 0.5, 0.5], [0.25, 0.25, 0.25]]

So we can guess that the steady-state population distribution should be [0.25, 0.5, 0.25]^T. This should be checked. Begin with I - P:
I - P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] - [[1/2, 1/4, 0], [1/2, 1/2, 1/2], [0, 1/4, 1/2]] = [[1/2, -1/4, 0], [-1/2, 1/2, -1/2], [0, -1/4, 1/2]]

and use technology to obtain the reduced row echelon form:

[[1, 0, -1], [0, 1, -2], [0, 0, 0]]

Let S = [x, y, z]^T. From the above, x = z and y = 2z. Then since x + y + z = 1, z = 1/4 and so S = [0.25, 0.5, 0.25]^T.
At this stage it is useful to recall that a state in a Markov chain is called an absorbing state if it is not possible to leave that state over the next time period. If state i is an absorbing state, then in the ith column of the transition
matrix, there will be a 1 in the ith row and zeroes everywhere else. When we take powers of the transition matrix, the ith column will remain the same, and so the transition matrix is not regular, and may not have a steady-state vector.
Now, suppose that we always cross with recessives.
Then the transition matrix is

P = [[0, 0, 0], [1, 1/2, 0], [0, 1/2, 1]]

This has an absorbing state, the recessive state. Let us take some powers of P and see what happens.
P^2 = [[0, 0, 0], [1/2, 1/4, 0], [1/2, 3/4, 1]]

P^3 = [[0, 0, 0], [1/4, 1/8, 0], [3/4, 7/8, 1]]

P^10 = [[0, 0, 0], [1/512, 1/1024, 0], [511/512, 1023/1024, 1]]
Continuing with higher powers, it would appear that there will be a steady-state vector, equal to [0, 0, 1]^T. In the long term, we will end up with recessives.
Check that S = [0, 0, 1]^T is a steady-state vector; that is, show PS = S.
Example 5.9
Humans have two sets of chromosomes, one obtained from each parent, which determine their genetic makeup. In this example we investigate the inbreeding problem.
Assume that two individuals mate randomly. In the next generation, two of their offspring of opposite sex mate randomly. Suppose the process of brother and sister sibling mating continues each generation. We can regard this as a Markov chain whose states consist of six mating states:
State 1: AA × AA
State 2: AA × Aa
State 3: AA × aa
State 4: Aa × Aa
State 5: Aa × aa
State 6: aa × aa
Let P = [pij] be the corresponding transition matrix. Suppose that the parents are both of type AA. Then all children will be of type AA, and so a mating of brother and sister will only give AA. Hence p11 = 1.
Suppose that the parents are of type AA × Aa. Then half their children will be of type AA and half will be of type Aa. A mating of these offspring will give 0.25 of AA × AA, 0.5 of AA × Aa and 0.25 of Aa × Aa, and so (with pij the probability of moving from state j to state i) p12 = 0.25, p22 = 0.5 and p42 = 0.25.
Continuing in this way, we can obtain Table 5.3.
Table 5.3: Summary of parent, offspring and offspring mating combinations for the inbreeding problem, and related probabilities
Parents | Offspring | Offspring mating
AA × AA | all AA | all AA × AA
AA × Aa | 0.5 AA, 0.5 Aa | 0.25 AA × AA, 0.5 AA × Aa, 0.25 Aa × Aa
AA × aa | all Aa | all Aa × Aa
Aa × Aa | 0.25 AA, 0.5 Aa, 0.25 aa | 0.0625 AA × AA, 0.25 AA × Aa, 0.125 AA × aa, 0.25 Aa × Aa, 0.25 Aa × aa, 0.0625 aa × aa
Aa × aa | 0.5 Aa, 0.5 aa | 0.25 Aa × Aa, 0.5 Aa × aa, 0.25 aa × aa
aa × aa | all aa | all aa × aa
Note that states 1 and 6 are absorbing states.
Hence the transition matrix will be

P = [[1, 1/4, 0, 1/16, 0,   0],
     [0, 1/2, 0, 1/4,  0,   0],
     [0, 0,   0, 1/8,  0,   0],
     [0, 1/4, 1, 1/4,  1/4, 0],
     [0, 0,   0, 1/4,  1/2, 0],
     [0, 0,   0, 1/16, 1/4, 1]]
P is not regular, since all powers of P will have zeroes in the first column except for the first position and in the last column except for the last position.
What happens in the long term? We can easily use technology to find powers of the matrix P. To find the long-term steady-state solution if it exists we need to solve the homogeneous system (I – P)S = O, where I is the 6 × 6 identity matrix, S is the steady-state vector and O is the 6 × 1 zero matrix. This has been done with technology. The reduced row echelon form of (I – P) is
[[0, 1, 0, 0, 0, 0],
 [0, 0, 1, 0, 0, 0],
 [0, 0, 0, 1, 0, 0],
 [0, 0, 0, 0, 1, 0],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0]]
and so, if S = [si], then s2 = s3 = s4 = s5 = 0, and all we know about s1 and s6 is that they must sum to 1. In fact, for 0 ≤ a ≤ 1, S = [a, 0, 0, 0, 0, 1 - a]^T is a vector such that PS = S, but it is not a steady-state vector.
Using technology to compute large powers of P, we can guess that

lim P^n as n → ∞ = [[1, 0.75, 0.5, 0.5, 0.25, 0],
                    [0, 0,    0,   0,   0,    0],
                    [0, 0,    0,   0,   0,    0],
                    [0, 0,    0,   0,   0,    0],
                    [0, 0,    0,   0,   0,    0],
                    [0, 0.25, 0.5, 0.5, 0.75, 1]]

This means that in the long term we will end up with some combination of AA × AA and/or aa × aa, depending on the initial state vector.
Student activity 5.3
a For Example 5.8, suppose that we always cross with dominants. Determine the transition matrix P, calculate P^10 and P^20, and find lim P^n as n → ∞ and the steady-state vector if it exists.

b For Example 5.9, find the long-term distribution of the population if the initial state vector is
i [0.1, 0.2, 0.2, 0.2, 0.2, 0.1]^T
ii [0.2, 0, 0.1, 0.3, 0.4, 0]^T
iii [0.35, 0.15, 0.25, 0, 0, 0.25]^T

c There are three states in a country, called A, B and C. Each year 10% of the residents of state A move to state B and 30% to state C; 20% of the residents of state B move to state A and 20% to state C; and 5% of the residents of state C move to state A and 15% to state B. Suppose initially the population is equally divided between the three states.
i Find the percentage of the population in each state after 3 years.
ii Find the percentage of the population in the three states after a long period of time.
Summary

• A transition matrix P = [pij] for a Markov chain is a square matrix with non-negative entries such that the sum of the entries in each column is 1.
• pij = probability of moving from state j to state i in one transition.
• If the column vector S0 is the initial population distribution vector between states in a Markov chain with transition matrix P, the population distribution vector after one time period of the chain is PS0.
• P^m is the transition matrix for m time periods, so the population distribution after m time periods is P^m S0, and if pij(m) is the (i, j)th element of P^m, then pij(m) gives the probability of moving from state j to state i after m transitions.
• A Markov chain and its associated matrix P is called regular if there exists an integer m such that P^m has no zero entries.
• If P is a regular transition matrix for a Markov chain:
  – The columns of P^m all approach the same probability distribution vector S as m becomes large.
  – S is the unique probability vector satisfying PS = S.
  – As the number of time periods increases, the population distribution vectors approach S regardless of the initial population distribution (provided P is regular). Thus S is called the steady-state population distribution vector.
• For a 2 × 2 regular transition matrix [[1 - a, b], [a, 1 - b]], 0 ≤ a ≤ 1, 0 ≤ b ≤ 1, the steady-state vector is [x, y]^T, where x = b/(a + b) and y = a/(a + b) (a + b > 0).
• A state is called an absorbing state if once the system is in this state it will never leave it. If state j is an absorbing state, then pjj = 1 and pij = 0 for i ≠ j.

References
Anton, H & Rorres, C 2005, Elementary linear algebra (applications version), 9th edn, John Wiley and Sons, New York.
Nicholson, KW 2001, Elementary linear algebra, 1st edn, McGraw-Hill Ryerson, Whitby, ON.
Nicholson, KW 2003, Linear algebra with applications, 4th edn, McGraw-Hill Ryerson, Whitby, ON.
Poole, D 2006, Linear algebra: A modern introduction, 2nd edn, Brooks Cole, California.
Wheal, M 2003, Matrices: Mathematical models for organising and manipulating information, 2nd edn, Australian Association of Mathematics Teachers, Adelaide.
Websites
http://math.ucsd.edu/~crypto/Monty/monty.html
This website simulates the Monty Hall dilemma.
http://orion.math.iastate.edu/msm/AthertonRMSMSS05.pdf#search=%22proof%20that%20regular%20transition%20matrices%20converge%20to%20a%20steady%20state%22 (Iowa State Department of Mathematics)
This pdf discusses Markov chains.
http://math.rice.edu/~pcmi/mathlinks/montyurl.html
This website has links to many other sites that discuss or simulate the Monty Hall dilemma.
Chapter 6
Curriculum connections
Different school systems and educational jurisdictions have particular features in their senior secondary mathematics curricula that have been developed over decades, and even centuries in some cases, to meet the historical and contemporary educational needs of their cultures and societies. When these curricula are reviewed, it is often the case that this includes a process of benchmarking with respect to corresponding curricula in other systems and jurisdictions. This may be in a local, county, state, national or international context.
Over the past few decades, particularly in conjunction with renewed interest in comparative international assessments (such as TIMSS and the OECD's PISA), curriculum benchmarking has been employed extensively by educational authorities and ministries. Such benchmarking reveals much that is common in curriculum design and purpose in senior secondary mathematics courses around the world. Some key design constructs that are used to characterise the nature of senior secondary mathematics courses are:
• content (areas of study, topics, strands)
• aspects of working mathematically (concepts, skills and processes, numerical, graphical, analytical, problem-solving, modelling, investigative, computation and proof)
• the use of technology, and when it is permitted, required or restricted (calculators, spreadsheets, statistical software, dynamic geometry software, CAS)
• the nature of related assessments (examinations, school-based and the relationship between these)
• the relationship between the final year subjects and previous years in terms of the acquisition of important mathematical background (assumed knowledge and skills, competencies, prerequisites and the like)
• the amount and nature of prescribed material within the course (completely prescribed, unitised, modularised, core plus options)
• the amount of in-class (prescribed) and out-of-class (advised) time that students are expected to spend for completion of the course
In broad terms, it is possible to characterise four main sorts of senior secondary mathematics courses.
Type 1
Courses designed to consolidate and develop the foundation and numeracy skills of students with respect to the practical application of mathematics in other areas of study. These often have a thematic basis for course implementation.
Type 2
Courses designed to provide a general mathematical background for students proceeding to employment or further study with a numerical emphasis, and likely to draw strongly on data analysis and discrete mathematics. Such courses typically do not contain any calculus material, or only basic calculus material, related to the application of average and instantaneous rates of change. They may include, for example, business-related mathematics, linear programming, network theory, sequences, series and difference equations, practical applications of matrices and the like.
Type 3
Courses designed to provide a sound foundation in function, coordinate geometry, algebra, calculus and possibly probability with an analytical emphasis. These courses develop mathematical content to support further studies in mathematics, the sciences and sometimes economics.
Type 4
Courses designed to provide an advanced or specialist background in mathematics. These courses have a strong analytical emphasis and often incorporate a focus on mathematical proof. They typically include complex numbers, vectors, theoretical applications of matrices (for example transformations of the plane), higher level calculus (integration techniques, differential equations), kinematics and dynamics. In many cases Type 4 courses assume that students have previous or concurrent enrolment in a Type 3 course, or subsume them.
Table 6.1 provides a mapping in terms of curriculum connections between the chapters of this book, the four types of course identified above, and the courses currently (2006) offered in various Australian states and territories.
As this book is a teacher resource, these connections are with respect to the usefulness of material from the chapters in terms of mathematical background of relevance, rather than direct mapping to curriculum content or syllabuses in a particular state or territory.
Table 6.1: Curriculum connections for senior secondary final year mathematics courses in Australia
State or territory | Type of course | Relevant chapters
Victoria | 2: Further Mathematics | 1, 2 and 5
Victoria | 3: Mathematical Methods (CAS) | all
Victoria | 4: Specialist Mathematics | 1, 3 and 4
New South Wales | 2: General Mathematics | 3
New South Wales | 3: Mathematics and Mathematics Extension 1 | –
New South Wales | 4: Mathematics Extension 2 | –
Queensland | 2: Mathematics A | –
Queensland | 3: Mathematics B | 3 and 4
Queensland | 4: Mathematics C | all
South Australia/Northern Territory | 2: Mathematical Applications | 1, 2 and 5
South Australia/Northern Territory | 3: Mathematical Methods/Mathematical Studies | all
South Australia/Northern Territory | 4: Specialist Mathematics | –
Western Australia | 2: Discrete Mathematics | –
Western Australia | 3: Applicable Mathematics | 1, 3 and 4
Western Australia | 4: Calculus | –
Tasmania | 2: Mathematics Applied | –
Tasmania | 3: Mathematics Methods | –
Tasmania | 4: Mathematics Specialised | 1, 2, 3 and 4
Table 6.2 provides a mapping in terms of curriculum connections between the chapters of this book, the four types of course identified above, and some of the courses currently (2006) offered in various English-speaking jurisdictions around the world. Again, as this book is a teacher resource, these
connections indicate the usefulness of material from the chapters in terms of mathematical background of relevance, rather than direct mapping to curriculum content, or syllabuses, in a particular jurisdiction.
Table 6.2: Curriculum connections for senior secondary final year mathematics courses in some jurisdictions around the world

Jurisdiction                                     Type of course                       Relevant chapters
College Board US                                 3: Advanced Placement Calculus AB    –
                                                 4: Advanced Placement Calculus BC    –
International Baccalaureate Organisation (IBO)   3: Mathematics SL                    1, 2, 3 and 4
                                                 4: Mathematics HL                    1, 2, 3 and 4
UK                                               3: AS Mathematics                    1, 2, 3 and 4
                                                 4: Advanced level                    1, 2, 3 and 4
Content from the chapters of this book can be mapped explicitly to topics within particular courses, and teachers may find it useful to make these more specific connections informally when planning the implementation of a course of interest to them.
References
The following are the website addresses of Australian state and territory curriculum and assessment authorities, boards and councils. These include various teacher reference and support materials for curriculum and assessment.
The Senior Secondary Assessment Board of South Australia (SSABSA): http://www.ssabsa.sa.edu.au/
The Victorian Curriculum and Assessment Authority (VCAA): http://www.vcaa.vic.edu.au/
The Tasmanian Qualifications Authority (TQA): http://www.tqa.tas.gov.au/
The Queensland Studies Authority (QSA): http://www.qsa.qld.edu.au/
The Board of Studies New South Wales (BOS): http://www.boardofstudies.nsw.edu.au/
The Australian Capital Territory Board of Senior Secondary Studies (ACTBSSS): http://www.decs.act.gov.au/bsss/welcome.htm
The Curriculum Council Western Australia: http://www.curriculum.wa.edu.au/
The following are the website addresses of various international and overseas curriculum and assessment authorities, boards, councils and organisations:
College Board US Advanced Placement (AP) Calculus: http://www.collegeboard.com/student/testing/ap/sub_calab.html?calcab
International Baccalaureate Organisation (IBO): http://www.ibo.org/ibo/index.cfm
Qualifications and Curriculum Authority (QCA) UK: http://www.qca.org.uk/
OECD Program for International Student Assessment (PISA): http://www.pisa.oecd.org
Trends in International Mathematics and Science Study (TIMSS): http://nces.ed.gov/timss/
Chapter 7: Solution notes to student activities
Student activity 2.1
(In these notes, [a b; c d] denotes the matrix with rows (a, b) and (c, d), and [a; b] denotes a column vector.)
a  Each person's holding of each type of coin is recorded in a matrix, and multiplying this matrix by the column vector of coin values gives the total amount of money each person has:
(matrix of coin counts) × (vector of coin values) = [7.90; 8.35; 5.00; 4.10]
That is, Michael has $7.90, Jay has $8.35, Sam has $5.00 and Lin has $4.10.
b  Converting at the exchange rate of 0.76,
0.76 × [7.90; 8.35; 5.00; 4.10] = [6.004; 6.346; 3.800; 3.116]
That is, Michael has US$6.00, Jay US$6.35, Sam US$3.80 and Lin US$3.12.
Student activity 2.2
a  i   2A = [2 6; −2 4]
   ii  A − B = [−1 4; −3 −2]
   iii A + B = [3 2; 1 6]
   iv  AB = [8 11; 2 9]
   v   BA = [3 4; −2 14]
   vi  AC = [26 34 42; 14 16 18]
b  Suppose MP = kP. Then we have the two simultaneous equations
4p + q = kp
8p + 6q = kq
and collecting terms in p and q, we have
(4 − k)p + q = 0      (i)
8p + (6 − k)q = 0     (ii)
Multiply equation (i) by (6 − k) and subtract equation (ii) from it:
(6 − k)(4 − k)p − 8p = 0
That is, (k² − 10k + 16)p = 0. Since p ≠ 0, k² − 10k + 16 = 0, so k = 2 or 8.
c  A² = [2 2; 2 2] = 2[1 1; 1 1], A³ = [4 4; 4 4], A⁴ = [8 8; 8 8], A⁵ = [16 16; 16 16]
Hence Aⁿ = 2ⁿ⁻¹[1 1; 1 1] = 2ⁿ⁻¹A.
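The conjecture Aⁿ = 2ⁿ⁻¹A can be checked quickly by program as well as by hand; this is a minimal sketch using plain Python lists for 2 × 2 matrices (a CAS would do equally well):

```python
def matmul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [1, 1]]
P = A
for n in range(2, 6):
    P = matmul(P, A)                 # P is now A^n
    s = 2 ** (n - 1)
    assert P == [[s, s], [s, s]]     # matches A^n = 2^(n-1) A
```
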
d  Suppose A and O are of size n × n, where O is the zero matrix. The element in the ith row and jth column of AO is aᵢ₁o₁ⱼ + aᵢ₂o₂ⱼ + … + aᵢₙoₙⱼ = 0, since oₖⱼ = 0 for all k = 1, 2, …, n and j = 1, 2, …, n. Hence AO = O. Similarly, the element in the ith row and jth column of OA is oᵢ₁a₁ⱼ + … + oᵢₙaₙⱼ = 0, and so OA = O.
Take A = [2 1; 6 3] and B = [1 1; −2 −2]; then AB = O, but neither A = O nor B = O.
e  J² = [0 −1; 1 0][0 −1; 1 0] = [−1 0; 0 −1] = −I
J⁴ = J² × J² = [−1 0; 0 −1][−1 0; 0 −1] = [1 0; 0 1] = I
f  (X − Y)(X + Y) = X² − YX + XY − Y². Since generally YX ≠ XY for matrices, (X − Y)(X + Y) ≠ X² − Y² in general.
Take X = [4 1; 2 2] and Y = [3 0; 0 1].
Then (X − Y)(X + Y) = [1 1; 2 1][7 1; 2 3] = [9 4; 16 5]
and
X² − Y² = [18 6; 12 6] − [9 0; 0 1] = [9 6; 12 5]
and clearly (X − Y)(X + Y) ≠ X² − Y².
Take X = [4 0; 0 2] and Y = [3 0; 0 1].
Then (X − Y)(X + Y) = [1 0; 0 1][7 0; 0 3] = [7 0; 0 3]
and
X² − Y² = [16 0; 0 4] − [9 0; 0 1] = [7 0; 0 3]
In this case (X − Y)(X + Y) = X² − Y².
Student activity 2.3
a  i–iii Each inverse can be found directly (or by CAS), using the fact that for a 2 × 2 matrix, [a b; c d]⁻¹ = 1/(ad − bc) [d −b; −c a]; part iii requires a 3 × 3 inverse, which is most easily obtained by CAS.
b  This system can be written in matrix form as AX = B, where A = [2 3; 4 1], X = [x; y] and B = [7; 3]. If A⁻¹ exists, then we can multiply both sides of the equation on the left by A⁻¹, and we thus have X = A⁻¹B. Now, the inverse of A = [2 3; 4 1] is
A⁻¹ = [−0.1 0.3; 0.4 −0.2]
We can check this by matrix multiplication:
[2 3; 4 1][−0.1 0.3; 0.4 −0.2] = [−0.2 + 1.2  0.6 − 0.6; −0.4 + 0.4  1.2 − 0.2] = [1 0; 0 1]
Now, having found the inverse of the matrix A, we can solve the system of simultaneous linear equations:
X = [x; y] = A⁻¹B = [−0.1 0.3; 0.4 −0.2][7; 3] = [0.2; 2.2]
Then x = 0.2 and y = 2.2 is the solution of the system of simultaneous linear equations given.
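The inverse-matrix solution can be sketched in a few lines of Python (a CAS performs the same steps symbolically); `inv2` below is a helper implementing the 2 × 2 inverse formula:

```python
def inv2(M):
    # inverse of a 2x2 matrix via [d -b; -c a] / (ad - bc); assumes det != 0
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 3], [4, 1]]
B = [7, 3]
Ainv = inv2(A)                       # [[-0.1, 0.3], [0.4, -0.2]]
x = Ainv[0][0] * B[0] + Ainv[0][1] * B[1]
y = Ainv[1][0] * B[0] + Ainv[1][1] * B[1]
# x and y solve 2x + 3y = 7 and 4x + y = 3 (x = 0.2, y = 2.2 up to rounding)
```
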
Student activity 2.4
A = [1 1; 1 0] and f₂ = [1; 1], so
f₃₀ = A²⁸f₂ = [514229 317811; 317811 196418][1; 1] = [832040; 514229]
Hence the 29th number is 514 229 and the 30th is 832 040. Note that, in addition, the power of A also gives the 28th number, 317 811, and the 27th number, 196 418.
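The Fibonacci computation above can be reproduced with a few lines of code; this sketch uses plain Python lists in place of a CAS:

```python
def matmul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [1, 0]]
P = [[1, 0], [0, 1]]                      # identity
for _ in range(28):
    P = matmul(P, A)                      # P = A^28 after the loop
f30 = [P[0][0] + P[0][1], P[1][0] + P[1][1]]   # A^28 [1; 1]
# P = [[514229, 317811], [317811, 196418]] and f30 = [832040, 514229]
```
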
Student activity 2.5
a
Letter   S   A   V   E   space  T   H   E   space  W   H   A   L   E   S   space
Number   19  1   22  5   0      20  8   5   0      23  8   1   12  5   19  0

To code the message, we find the matrix product:
[5 3; 3 2][19 22 0 8 0 8 12 19; 1 5 20 5 23 1 5 0] = [98 125 60 55 69 43 75 95; 59 76 40 34 46 26 46 57]
and so the message sent would be 98, 59, 125, 76, 60, 40, 55, 34, 69, 46, 43, 26, 75, 46, 95, 57.
b  To find the original message, we must first find the inverse of the coding matrix:
M⁻¹ = [5 3; 3 2]⁻¹ = [2 −3; −3 5]
Arrange the received message in column vectors of length 2, and put them together into one matrix:
[65 75 138 90 85 80 160 123; 42 50 87 54 54 49 99 76]
Now multiply this on the left by M⁻¹ to recover the original message as a matrix:
[2 −3; −3 5][65 75 138 90 85 80 160 123; 42 50 87 54 54 49 99 76] = [4 0 15 18 8 13 23 18; 15 25 21 0 15 5 15 11]
So the original message is represented by the numbers 4, 15, 0, 25, 15, 21, 18, 0, 8, 15, 13, 5, 23, 15, 18, 11. Checking the coding listed previously, we see that the message reads DO YOUR HOMEWORK.
Student activity 3.1
a  {x + y = 1, 2x + y = −4} is one such system.
b  In this case, one equation must be a multiple of the other. {2x + y = −4, 4x + 2y = −8} is one such system.
c  Since (0, 0, 0) is a solution, the equations must be of the form ax + by + cz = 0. One example is {x − 2y + z = 0, 2x + 2y − 4z = 0}, and the corresponding planes intersect in a line with a one-parameter family of solutions {(t, t, t): t ∈ R}.
Another example is {x − 2y + z = 0, 2x − 4y + 2z = 0}; here the corresponding planes intersect in a plane (i.e. they are the same plane), with a two-parameter family of solutions {(s, t, 2t − s): s, t ∈ R}.
d  SOLVE([3·x − 2·y + z = 0, x − y − z = 10], [x, z]) gives
x = (3y + 10)/4 and z = −(y + 30)/4
so the solution set is {((3y + 10)/4, y, −(y + 30)/4): y ∈ R}.
Student activity 3.2
a  It is clear that (0, 0) is a solution of this system of equations. The coefficient matrix is [a b; c d]. This is invertible if ad − bc ≠ 0, in which case
[x; y] = [a b; c d]⁻¹[0; 0] = [0; 0]
the unique solution.
So we need to consider the case ad = bc. If c and d are both non-zero, then a/c = b/d and the equations are a multiple of one another, and so there will be infinitely many solutions. If, say, c is zero, then either a or d is zero. If d is zero, then the second equation is 0x + 0y = 0, which has R² as its solution set, and so the solution to the system is simply the set of points on the line ax + by = 0. If a is zero, then the equations are {by = 0, dy = 0}, so one is a multiple of the other, and the solution set is the set of points on the x-axis (provided one of b or d is non-zero). (A similar argument applies if d = 0.)
b  The system of linear equations {ax + by = e, cx + dy = f} will have infinitely many solutions when one equation is simply a non-zero multiple of the other. Assuming c ≠ 0, d ≠ 0 and f ≠ 0, we need a/c = b/d = e/f.
This can be written as ad − bc = 0, af − ce = 0 and bf − de = 0, and in this form we do not have to assume c ≠ 0, d ≠ 0 and f ≠ 0.
The system of linear equations {ax + by = e, cx + dy = f} will have no solutions if the corresponding lines are parallel but distinct. In this case we require ad − bc = 0 but either af − ce ≠ 0 or bf − de ≠ 0.
The system of linear equations will have a unique solution if ad − bc ≠ 0.
c  We have x = 3t + 1 and y = 2t − 1, and we need to eliminate t from these equations. From the first equation, t = (x − 1)/3. Use this to substitute for t in the second equation:
y = 2(x − 1)/3 − 1, or 3y = 2x − 5
d The underlying equation is 3x − y = 4, or y = 3x – 4. So simply choose any expression involving a parameter for x, and use the equation to write y as a function of the parameter. Equivalent solution sets would be {(r + 1, 3r – 1): r ∈ R} and {(s − 1, 3s – 7): s ∈ R}.
Student activity 3.3
a  The augmented matrix is
[1 2 1 −2; 1 4 3 −3; 2 5 3 −1]
which has reduced row echelon form
[1 0 −1 0; 0 1 1 0; 0 0 0 1]
Looking at the last row of the reduced form (corresponding to the equation 0x + 0y + 0z = 1) we see that the equations are inconsistent. Hence there is no solution.
b  The augmented matrix is
[1 −1 −1 1; 1 −1 3 5; 3 −2 1 −2]
which has reduced row echelon form
[1 0 0 −7; 0 1 0 −9; 0 0 1 1]
This corresponds to the system of equations x = −7, y = −9, z = 1, and hence we have the unique solution (−7, −9, 1).
c  The augmented matrix is
[2 −2 1 −5; 2 1 1 7; 4 1 2 10]
which has reduced row echelon form
[1 0 1/2 3/2; 0 1 0 4; 0 0 0 0]
There are two leading variables, x and y (corresponding to the two leading 1s), and one free variable, z. So let z = k, an arbitrary real number. Then the solution set is {((3 − k)/2, 4, k): k ∈ R}.
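The row reductions used throughout this activity can be reproduced with a short exact-arithmetic routine; this is a sketch using Python's fractions module rather than a CAS, run here on the augmented matrix of part b:

```python
from fractions import Fraction

def rref(M):
    # reduced row echelon form via Gauss-Jordan elimination, in exact arithmetic
    M = [[Fraction(x) for x in row] for row in M]
    nrows, ncols = len(M), len(M[0])
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, nrows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(nrows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == nrows:
            break
    return M

aug = [[1, -1, -1, 1], [1, -1, 3, 5], [3, -2, 1, -2]]   # part b
R = rref(aug)
assert [row[3] for row in R] == [-7, -9, 1]             # the unique solution
```
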
Student activity 3.4
a  Let v, x, y and z be the unknown scores of players 1, 2, 3 and 4 respectively, and a, b, c and d the known totals. Then
v + x = a
x + y = b
y + z = c
z + v = d
is a system of four equations in four unknowns. The augmented matrix is
[1 1 0 0 a; 0 1 1 0 b; 0 0 1 1 c; 1 0 0 1 d]
and its reduced row echelon form using CAS is
[1 0 0 1 0; 0 1 0 −1 0; 0 0 1 1 0; 0 0 0 0 1]
Now we have a problem. There is no sign of a, b, c or d, and the last line corresponds to an equation which reads
0v + 0x + 0y + 0z = 1
which is clearly a contradiction, indicating there are no solutions. So to find out what is really happening, we need to use the Gaussian elimination procedure to reduce the augmented matrix to echelon form (but not reduced form).
To begin, the new row 4 will be the old row 4 with row 1 subtracted from it, giving
[1 1 0 0 a; 0 1 1 0 b; 0 0 1 1 c; 0 −1 0 1 d − a]
Next, the new row 4 should be the old row 4 with row 2 added to it:
[1 1 0 0 a; 0 1 1 0 b; 0 0 1 1 c; 0 0 1 1 d − a + b]
Next, the new row 4 will be the old row 4 with row 3 subtracted from it:
[1 1 0 0 a; 0 1 1 0 b; 0 0 1 1 c; 0 0 0 0 d − a + b − c]
Now the last equation is 0v + 0x + 0y + 0z = d − a + b − c. Generally, if a + c ≠ b + d, there will be no solution.
In this example, (v + x) + (y + z) = a + c and (x + y) + (z + v) = b + d, so a + c = b + d = the sum of all players' scores in the tournament, and this means that the echelon form of the augmented matrix is
[1 1 0 0 a; 0 1 1 0 b; 0 0 1 1 c; 0 0 0 0 0]
which is equivalent to the reduced row echelon matrix
[1 0 0 1 a − b + c; 0 1 0 −1 b − c; 0 0 1 1 c; 0 0 0 0 0]
So there will be three basic variables, v, x and y, and one free variable, z; hence there will be an infinite number of solutions and the scores will not be able to be uniquely determined.
Writing z = k, k ∈ R, the first row of the matrix tells us that v = −k + a − b + c, the second row that x = k + b − c, and the third row that y = −k + c.
b  Let f(x) = ax³ + bx² + cx + d be the equation of a cubic polynomial function, with a, b, c and d the unknown coefficients. From Example 3.15 we have three equations:
a + b + c + d = 0
−a + b − c + d = 0
3a + 2b + c = −4
Since f′(x) = 3ax² + 2bx + c, and the slope at x = −1 is 12, 3a − 2b + c = 12 is a fourth equation to add to the system of three above.
The augmented matrix for this system of four equations is
[1 1 1 1 0; −1 1 −1 1 0; 3 2 1 0 −4; 3 −2 1 0 12]
which reduces to
[1 0 0 0 2; 0 1 0 0 −4; 0 0 1 0 −2; 0 0 0 1 4]
and so a = 2, b = −4, c = −2 and d = 4, giving
f(x) = 2x³ − 4x² − 2x + 4
[Figure: the graph of y = 2x³ − 4x² − 2x + 4, which has x-intercepts at x = −1, 1 and 2.]
c  Let f(x) = ax⁴ + bx³ + cx² + dx + e be the rule for the family of quartic polynomials.
Since they pass through (1, 2) and (−2, −1) and have slope 5 at x = 1, we have the following equations (see Example 3.16 for details):
a + b + c + d + e = 2
16a − 8b + 4c − 2d + e = −1
4a + 3b + 2c + d = 5
In addition, they must pass through the point (2, 0), so f(2) = 0 and 16a + 8b + 4c + 2d + e = 0.
This gives us a system of 4 equations in 5 unknowns. The augmented matrix is
[1 1 1 1 1 2; 16 −8 4 −2 1 −1; 4 3 2 1 0 5; 16 8 4 2 1 0]
which reduces to
[1 0 0 0 1/4 −35/24; 0 1 0 0 −1/2 5/6; 0 0 1 0 −3/4 137/24; 0 0 0 1 2 −37/12]
Now there are four leading variables, a, b, c and d, and one free variable, e. Let e = t, t ∈ R. Then the first row of the reduced matrix gives the equation
a + (1/4)e = −35/24, and so a = −35/24 − t/4, since e = t.
The second row of the reduced matrix gives b − (1/2)e = 5/6, and so b = 5/6 + t/2.
The third row of the reduced matrix gives c − (3/4)e = 137/24, and so c = 137/24 + 3t/4.
The fourth row of the reduced matrix gives d + 2e = −37/12, and so d = −37/12 − 2t.
Hence the family of functions is of the form
f(x) = (−35/24 − t/4)x⁴ + (5/6 + t/2)x³ + (137/24 + 3t/4)x² + (−37/12 − 2t)x + t,  t ∈ R
Again, we will plot some members of this family.
When t = 0, f(x) = −(35/24)x⁴ + (5/6)x³ + (137/24)x² − (37/12)x
When t = 1, f(x) = −(41/24)x⁴ + (4/3)x³ + (155/24)x² − (61/12)x + 1
When t = −8, f(x) = (13/24)x⁴ − (19/6)x³ − (7/24)x² + (155/12)x − 8
[Figure: the graphs of these three members of the family.]
d  Let f(x) = ax⁴ + bx³ + cx² + dx + e be the rule for the family of quartic polynomials.
Since they pass through (1, 2), (−2, −1) and (2, 0), and have slope 5 at x = 1, we have the following equations (see the example above for details):
a + b + c + d + e = 2
16a − 8b + 4c − 2d + e = −1
4a + 3b + 2c + d = 5
16a + 8b + 4c + 2d + e = 0
Now we are looking for the particular curve that passes through (−1, 5), so f(−1) = 5, and this gives a fifth equation, a − b + c − d + e = 5.
The augmented matrix for this system of 5 equations is
[1 1 1 1 1 2; 16 −8 4 −2 1 −1; 4 3 2 1 0 5; 16 8 4 2 1 0; 1 −1 1 −1 1 5]
which has reduced row echelon form
[1 0 0 0 0 −4/3; 0 1 0 0 0 7/12; 0 0 1 0 0 16/3; 0 0 0 1 0 −25/12; 0 0 0 0 1 −1/2]
Hence f(x) = −(4/3)x⁴ + (7/12)x³ + (16/3)x² − (25/12)x − 1/2.
[Figure: the graph of this quartic.]
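The 5 × 5 system in part d can be solved exactly in a few lines; a sketch using exact rational arithmetic (the rows encode f(1) = 2, f(−2) = −1, f′(1) = 5, f(2) = 0 and f(−1) = 5):

```python
from fractions import Fraction

def solve(A, b):
    # Gauss-Jordan elimination with exact Fractions; assumes a unique solution
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(c)] for row, c in zip(A, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                M[i] = [u - M[i][c] * v for u, v in zip(M[i], M[c])]
    return [row[n] for row in M]

A = [[1, 1, 1, 1, 1],
     [16, -8, 4, -2, 1],
     [4, 3, 2, 1, 0],
     [16, 8, 4, 2, 1],
     [1, -1, 1, -1, 1]]
b = [2, -1, 5, 0, 5]
coeffs = solve(A, b)   # a, b, c, d, e of the quartic
assert coeffs == [Fraction(-4, 3), Fraction(7, 12), Fraction(16, 3),
                  Fraction(-25, 12), Fraction(-1, 2)]
```
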
Student activity 4.1
a  [a b; c d][0; 0] = [0; 0], and hence any linear transformation maps the origin to itself.
b  T cannot be uniquely determined, since [−3; 1] = −[3; −1]; that is, one vector is a multiple of the other.
Let [a b; c d] be the matrix for T. Then
[a b; c d][3; −1] = [−1; 1] and [a b; c d][−3; 1] = [1; −1]
This yields the equations
3a − b = −1, 3c − d = 1
and
−3a + b = 1, −3c + d = −1
Grouping the equations for a and b and for c and d together, we have
{3a − b = −1, −3a + b = 1} and {3c − d = 1, −3c + d = −1}
But the two equations in each set are in fact multiples of one another, so we only have two distinct equations in four unknowns. If we choose any values for b and d, then the values of a and c will be determined by these equations.
Take b = 4 and d = −1. Then a = 1 and c = 0, giving [1 4; 0 −1] as a matrix for T.
Take b = 4 and d = 2. Then a = 1 and c = 1, giving [1 4; 1 2] as a matrix for T.
Take b = 4 and d = −4. Then a = 1 and c = −1, giving [1 4; −1 −4] as a matrix for T (and this matrix is singular).
c  Let [4 3; 5 4][x; y] = [1; 0] and [4 3; 5 4][r; s] = [0; 1].
Then [x; y] = [4 3; 5 4]⁻¹[1; 0] = [4; −5] and [r; s] = [4 3; 5 4]⁻¹[0; 1] = [−3; 4].
Checking, [4 3; 5 4][4; −5] = [1; 0] and [4 3; 5 4][−3; 4] = [0; 1].
So (4, −5) is mapped to (1, 0) and (−3, 4) is mapped to (0, 1).
Student activity 4.2
a  Property 1: [a b; c d][0; 0] = [0; 0], and hence any linear transformation maps the origin to itself.
Property 2: A non-singular linear transformation is onto, since if A is
the matrix of the transformation then any point v = (x, y) is the image of A−1v.
A non-singular linear transformation is one-to-one, since if A is the matrix of the transformation, then if Av1 = Av2 for any v1, v2 ∈ R2, then A(v1 − v2) = O, so A−1A(v1 − v2) = A−1O, that is, (v1 − v2) = O, and so v1 = v2.
Property 3: Consider a straight line with vector equation r = u + tv, and a transformation T with matrix A. Then T(r) = Au + tAv = u₁ + tv₁, where u₁, v₁ ∈ R², and so this is again the vector equation of a line.
Property 4: Two distinct parallel lines will have vector equations of the form r1 = u1 + tv and r2 = u2 + tv, where u1 ≠ u2 and v gives the direction of the lines. Under the transformation with matrix A, the direction of both lines will be Av, so will still be parallel and distinct, since Au1 ≠ Au2.
Property 5: A line through the origin can be written in vector form r = tv, and if A is the matrix of the transformation then T(r) = tAv, which also passes through the origin.
b  [x₁; y₁] = [2 1; 1 3][x; y]
[x; y] = [2 1; 1 3]⁻¹[x₁; y₁] = [3/5 −1/5; −1/5 2/5][x₁; y₁]
so x = (3/5)x₁ − (1/5)y₁ and y = −(1/5)x₁ + (2/5)y₁.
y = 5 − 3x becomes
−(1/5)x₁ + (2/5)y₁ = 5 − 3((3/5)x₁ − (1/5)y₁)
which simplifies to y₁ = 8x₁ − 25, or y = 8x − 25.
c  [x₁; y₁] = [1 1; 1 1][x; y]
so x₁ = x + y and y₁ = x + y.
Since x₁ = y₁ while x + y is not constant along the line, the image of y = 5 − 3x is the line y = x.
d  If a line maps to a single point then, from c, x₁ = y₁ = k, where k is a real constant. So lines of the form x + y = k are mapped to the point (k, k) under this transformation.
e  [2 1; 1 3][0; 0] = [0; 0], [2 1; 1 3][0; 1] = [1; 3], [2 1; 1 3][1; 1] = [3; 4], [2 1; 1 3][1; 0] = [2; 1]
So the transformation maps the unit square to the parallelogram with vertices (0, 0), (1, 3), (3, 4) and (2, 1), and the area of the parallelogram is
det[2 1; 1 3] × area of unit square = 5 square units.
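The area-scaling role of the determinant in part e can be checked numerically; this sketch applies the matrix to the unit square and measures the image with the shoelace formula:

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def image(M, pt):
    x, y = pt
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)

def polygon_area(pts):
    # shoelace formula for a simple polygon with vertices listed in order
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2

M = [[2, 1], [1, 3]]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
para = [image(M, p) for p in square]     # (0,0), (2,1), (3,4), (1,3)
assert polygon_area(para) == det2(M) == 5
```
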
Student activity 4.3
a  [x₁; y₁] = [3 0; 0 2][x; y]
so [x; y] = [x₁/3; y₁/2]. Hence the image of y = sin(x) is y₁/2 = sin(x₁/3), or y = 2 sin(x/3).
b  [x₁; y₁] = [3 5; 1 2][x; y]
[x; y] = [3 5; 1 2]⁻¹[x₁; y₁] = [2 −5; −1 3][x₁; y₁] = [2x₁ − 5y₁; −x₁ + 3y₁]
Hence the image of y = x² is −x₁ + 3y₁ = (2x₁ − 5y₁)², or 4x² − 20xy + 25y² + x − 3y = 0.
c  [x₁; y₁] = [1 1; 1 1][x; y]
So x + y = x₁ = y₁, and so the image of y = x² is y = x.
Student activity 4.4
a  [k_x 0; 0 k_y][0; 0] = [0; 0]
[k_x 0; 0 k_y][1; 0] = [k_x; 0]
[k_x 0; 0 k_y][1; 1] = [k_x; k_y]
[k_x 0; 0 k_y][0; 1] = [0; k_y]
Hence the area of the rectangle formed by the transformed vertices is k_x k_y.
b  [x₁; y₁] = [k_x 0; 0 k_y][x; y]
So x = x₁/k_x and y = y₁/k_y, and so y = f(x) is transformed to y₁/k_y = f(x₁/k_x), that is, y = k_y f(x/k_x).
c  From part b, x = x₁/a and y = y₁/b, and so x² + y² = 1 is transformed to (x₁/a)² + (y₁/b)² = 1, that is, (x/a)² + (y/b)² = 1. This is the equation of an ellipse with horizontal axis length 2a and vertical axis length 2b.
Hence area of ellipse = area of unit circle × det[a 0; 0 b] = πab.
d  The matrix for an anticlockwise rotation about the origin through an angle θ is
[cos(θ) −sin(θ); sin(θ) cos(θ)]
The matrix for an anticlockwise rotation about the origin through an angle φ is [cos(φ) −sin(φ); sin(φ) cos(φ)], and the matrix for an anticlockwise rotation about the origin through an angle θ + φ is [cos(θ + φ) −sin(θ + φ); sin(θ + φ) cos(θ + φ)].
Then, since a rotation through an angle θ followed by a rotation through an angle φ is equivalent to a rotation through an angle θ + φ:
[cos(φ) −sin(φ); sin(φ) cos(φ)][cos(θ) −sin(θ); sin(θ) cos(θ)] = [cos(θ + φ) −sin(θ + φ); sin(θ + φ) cos(θ + φ)]
Now the product on the left-hand side is
[cos(θ)cos(φ) − sin(θ)sin(φ)   −sin(θ)cos(φ) − cos(θ)sin(φ); sin(θ)cos(φ) + cos(θ)sin(φ)   cos(θ)cos(φ) − sin(θ)sin(φ)]
And so cos(θ + φ) = cos(θ)cos(φ) − sin(θ)sin(φ) and sin(θ + φ) = sin(θ)cos(φ) + cos(θ)sin(φ).
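The matrix identity behind part d can be spot-checked numerically for particular angles; a minimal sketch:

```python
import math

def rot(t):
    # anticlockwise rotation about the origin through angle t
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

theta, phi = 0.7, 1.1
lhs = matmul(rot(phi), rot(theta))        # rotate by theta, then by phi
rhs = rot(theta + phi)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

Entry (1, 1) of the product is the compound-angle formula for cos(θ + φ), and entry (2, 1) is the one for sin(θ + φ).
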
e  [1 3; 0 1][x; y] = [x + 3y; y], so the points (0, 0), (1, 0), (1, 1) and (0, 1) are transformed to the points (0, 0), (1, 0), (4, 1) and (3, 1) respectively.
[Figure: the unit square and its image under this transformation.]
f  [x₁; y₁] = [1 3; 0 1][x; y]
so [x; y] = [1 3; 0 1]⁻¹[x₁; y₁] = [1 −3; 0 1][x₁; y₁] = [x₁ − 3y₁; y₁]
Hence the function y = f(x) is transformed to y = f(x − 3y) under this transformation. The image of y = x² will be y = (x − 3y)².
Student activity 4.5
a  Dilation by factor 0.5 from the y-axis has matrix [0.5 0; 0 1].
Reflection in the y-axis has matrix [−1 0; 0 1].
Translation 3 units parallel to the x-axis and 1 unit parallel to the y-axis is represented by [3; 1].
Hence [x₁; y₁] = [−1 0; 0 1][0.5 0; 0 1][x; y] + [3; 1]
Then [x; y] = [0.5 0; 0 1]⁻¹[−1 0; 0 1]⁻¹([x₁; y₁] − [3; 1]) = [−2(x₁ − 3); y₁ − 1]
So y = f(x) becomes y − 1 = f(−2(x − 3)), and hence y = 1/x is transformed to
y − 1 = 1/(−2(x − 3)), or y = 1 − 1/(2(x − 3))
b  [1 0; 0 −1] gives reflection in the x-axis, and [2; −1] gives a translation of 2 units to the right and 1 unit down.
Then [x₁; y₁] = [1 0; 0 −1][x; y] + [2; −1]
and [x₁; y₁] − [2; −1] = [1 0; 0 −1][x; y]
that is, [x; y] = [1 0; 0 −1]([x₁; y₁] − [2; −1]) = [x₁ − 2; −(y₁ + 1)]
So x − y² = 0 is transformed to (x − 2) − (y + 1)² = 0.
Student activity 5.1
a  S₁ = PS₀ = [0 0.25; 1 0.75][0.5; 0.5] = [0.125; 0.875]
S₂ = P²S₀ = [0.21875; 0.78125]
S₅ = P⁵S₀ ≈ [0.19971; 0.80029]
S₁₀ = P¹⁰S₀ ≈ [0.20000; 0.80000]
S₅₀ = P⁵⁰S₀ ≈ [0.20000; 0.80000]
b  P³ = [0.445 0.444; 0.555 0.556]
Hence the probability of going from state 1 to state 2 in 3 transitions is the element in the (2, 1) position of P³, which is 0.555.
Similarly, the probability of going from state 2 to state 2 in 3 transitions is the element in the (2, 2) position of P³, which is 0.556.
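Successive state vectors can be generated by repeated multiplication; a minimal sketch in Python, using a two-state transition matrix and initial state consistent with the state vectors quoted in part a (columns of P sum to 1):

```python
def step(P, S):
    # next state vector: S' = P S
    return [sum(P[i][j] * S[j] for j in range(len(S))) for i in range(len(S))]

P = [[0.0, 0.25], [1.0, 0.75]]
S = [0.5, 0.5]
history = [S]
for _ in range(5):
    S = step(P, S)
    history.append(S)
# history[1] = [0.125, 0.875], history[2] = [0.21875, 0.78125],
# and history[5][0] is close to the steady-state value 0.2
```
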
Student activity 5.2
a  P² = [0.75 0.50; 0.25 0.50] has no zero entries, so P is regular.
Since a = 1/2 and b = 1, S = [2/3; 1/3] and L = [2/3 2/3; 1/3 1/3].
b  P = P³ = P⁵ = … and I = P² = P⁴ = P⁶ = …
P is not regular. The limit matrix L does not exist, as powers of P oscillate between [0 1; 1 0] and [1 0; 0 1], so the powers have no limiting behaviour; note, however, that if S = [0.5; 0.5], then PS = S.
c  It is not regular, since any power of P will have a zero in the (1, 2) position. (In fact, Pⁿ = [aⁿ 0; 1 − aⁿ 1].) The steady-state vector is S = [0; 1] and lim Pⁿ = [0 0; 1 1] as n → ∞.
Student activity 5.3
a  P = [1 1/2 0; 0 1/2 1; 0 0 0]
P¹⁰ = [1 1023/1024 511/512; 0 1/1024 1/512; 0 0 0]
P²⁰ = [1 1048575/1048576 524287/524288; 0 1/1048576 1/524288; 0 0 0]
lim Pⁿ = [1 1 1; 0 0 0; 0 0 0] as n → ∞
Steady-state vector S = [1; 0; 0]
Check: PS = [1 1/2 0; 0 1/2 1; 0 0 0][1; 0; 0] = [1; 0; 0] = S
b  Writing L = lim Pⁿ for the limiting matrix of the chain,
L = [1 0.75 0.5 0.5 0.25 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0.25 0.5 0.5 0.75 1]
i   L[0.1; 0.2; 0.2; 0.2; 0.2; 0.1] = [0.5; 0; 0; 0; 0; 0.5]
ii  L[0; 0.3; 0.4; 0.2; 0; 0.1] = [21/40; 0; 0; 0; 0; 19/40]
iii L[0; 0.25; 0.35; 0.15; 0.25; 0] = [0.5; 0; 0; 0; 0; 0.5]
c  The transition information can be tabulated as follows:

                        Current state
                        A      B      C
State next year   A     0.6    0.2    0.05
                  B     0.1    0.6    0.15
                  C     0.3    0.2    0.8

Then the transition matrix is
P = [0.6 0.2 0.05; 0.1 0.6 0.15; 0.3 0.2 0.8]
and the initial state vector is S₀ = [1/3; 1/3; 1/3], and so the distribution after 3 years is
S₃ = P³S₀ ≈ [0.226; 0.256; 0.518]
After 3 years approximately 22.6% of the population will live in state A, 25.6% in state B and 51.8% in state C.
To find the long-term population distribution, we can investigate powers of P:
P¹⁵ ≈ [0.1961 0.1961 0.1961; 0.2549 0.2549 0.2549; 0.5490 0.5490 0.5490]
P²⁰ ≈ [0.1961 0.1961 0.1961; 0.2549 0.2549 0.2549; 0.5490 0.5490 0.5490]
It appears that, in the long term, approximately 19.6% of the population will live in state A, 25.5% in state B and 54.9% in state C.
Alternatively, consider
I − P = [1 0 0; 0 1 0; 0 0 1] − [0.6 0.2 0.05; 0.1 0.6 0.15; 0.3 0.2 0.8] = [0.4 −0.2 −0.05; −0.1 0.4 −0.15; −0.3 −0.2 0.2]
which has reduced row echelon form [1 0 −5/14; 0 1 −13/28; 0 0 0]. If S = [x; y; z], then we have x = (5/14)z, y = (13/28)z and x + y + z = 1, which has solution
x = 10/51, y = 13/51, z = 28/51, or, approximately, x = 0.1961, y = 0.2549, z = 0.5490.
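The steady-state computation in part c can be reproduced exactly; a sketch using Python's fractions module, replacing the redundant third equation of (I − P)S = 0 with the normalisation x + y + z = 1:

```python
from fractions import Fraction

# Transition matrix of part c, with S_{n+1} = P S_n (columns sum to 1)
P = [[Fraction(6, 10), Fraction(2, 10), Fraction(5, 100)],
     [Fraction(1, 10), Fraction(6, 10), Fraction(15, 100)],
     [Fraction(3, 10), Fraction(2, 10), Fraction(8, 10)]]

# Build (I - P)S = 0, then replace the (redundant) last row by x + y + z = 1
rows = [[(1 if i == j else 0) - P[i][j] for j in range(3)] + [Fraction(0)]
        for i in range(3)]
rows[2] = [Fraction(1), Fraction(1), Fraction(1), Fraction(1)]

# Gauss-Jordan elimination in exact arithmetic
for c in range(3):
    p = next(i for i in range(c, 3) if rows[i][c] != 0)
    rows[c], rows[p] = rows[p], rows[c]
    piv = rows[c][c]
    rows[c] = [x / piv for x in rows[c]]
    for i in range(3):
        if i != c and rows[i][c] != 0:
            rows[i] = [u - rows[i][c] * v for u, v in zip(rows[i], rows[c])]

S = [r[3] for r in rows]
assert S == [Fraction(10, 51), Fraction(13, 51), Fraction(28, 51)]
```

The third row of (I − P)S = 0 can be dropped because the columns of I − P sum to zero, so that row is minus the sum of the other two.
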