CHAPTER 1
Matrix Algebra
In this chapter we collect results related to matrix algebra which are
relevant to this book. Some specific topics which are typically not
found in standard books are also covered here.
1.1. Preliminaries
We summarize the standard notation for this chapter here. Matrices are denoted
by capital letters A, B, etc. They can be rectangular with m rows
and n columns. Their elements or entries are referred to with small
letters aij, bij, etc., where i denotes the i-th row and j the j-th column
of the matrix. Thus
A = [ a11 a12 . . . a1n
      a21 a22 . . . a2n
      . . .
      am1 am2 . . . amn ]
Mostly we consider complex matrices belonging to Cm×n. Sometimes
we will restrict our attention to real matrices belonging to Rm×n.
Definition 1.1 [Square matrix] An m × n matrix is called a square
matrix if m = n.
Definition 1.2 [Tall matrix] An m × n matrix is called a tall
matrix if m > n, i.e. the number of rows is greater than the number of columns.
Definition 1.3 [Wide matrix] An m × n matrix is called a wide
matrix if m < n, i.e. the number of columns is greater than the number of rows.
Definition 1.4 [Main diagonal] Let A = [aij] be an m × n matrix.
The main diagonal consists of the entries aij with i = j, i.e. the main
diagonal is {a11, a22, . . . , akk} where k = min(m, n). The main diagonal
is also known as the leading diagonal, major diagonal, primary
diagonal or principal diagonal. The entries of A which are not
on the main diagonal are known as off-diagonal entries.
Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix
(usually a square matrix) whose entries outside the main diagonal
are zero.
Whenever we refer to a diagonal matrix which is not square, we
will use the term rectangular diagonal matrix.
A square diagonal matrix A is also represented by diag(a11, a22, . . . , ann),
which lists the entries on its main diagonal.
The transpose of a matrix A is denoted by AT while the Hermitian
transpose is denoted by AH . For real matrices AT = AH .
When matrices are square, we have the number of rows and columns
both equal to n and they belong to Cn×n.
If not specified, the square matrices will be of size n×n and rectangular
matrices will be of size m×n. If not specified the vectors (column vec-
tors) will be of size n×1 and belong to either Rn or Cn. Corresponding
row vectors will be of size 1× n.
For statements which are valid both for real and complex matrices,
sometimes we might say that matrices belong to Fm×n while the scalars
belong to F and vectors belong to Fn where F refers to either the field
of real numbers or the field of complex numbers. Note that this is not
consistently followed at the moment. Most results are written only for
Cm×n while still being applicable for Rm×n.
Identity matrix for Fn×n is denoted as In or simply I whenever the size
is clear from context.
Sometimes we will write a matrix in terms of its column vectors. We
will use the notation
A = [ a1 a2 . . . an ]
indicating n columns.
When we write a matrix in terms of its row vectors, we will use the
notation
A = [ a1^T
      a2^T
      . . .
      am^T ]
indicating m rows with ai being column vectors whose transposes form
the rows of A.
The rank of a matrix A is written as rank(A), while the determinant
as det(A) or |A|.
We say that an m × n matrix A is left-invertible if there exists an
n×m matrix B such that BA = I. We say that an m× n matrix A is
right-invertible if there exists an n×m matrix B such that AB = I.
We say that a square matrix A is invertible when there exists another
square matrix B of the same size such that AB = BA = I. A square
matrix is invertible iff it is both left- and right-invertible. The inverse of a
square invertible matrix is denoted by A−1.
A special left or right inverse is the pseudo-inverse, which is denoted
by A†.
Column space of a matrix is denoted by C(A), the null space by N (A),
and the row space by R(A).
We say that a matrix is symmetric when A = AT, and conjugate
symmetric or Hermitian when AH = A.
When a square matrix is not invertible, we say that it is singular. A
non-singular matrix is invertible.
The eigen values of a square matrix are written as λ1, λ2, . . . while the
singular values of a rectangular matrix are written as σ1, σ2, . . . .
The inner product or dot product of two column/row vectors u and
v belonging to Rn is defined as
u · v = 〈u, v〉 = ∑_{i=1}^n ui vi. (1.1.1)
The inner product or dot product of two column/row vectors u and
v belonging to Cn is defined as
u · v = 〈u, v〉 = ∑_{i=1}^n ui v̄i, (1.1.2)
where v̄i denotes the complex conjugate of vi.
1.1.1. Block matrix
Definition 1.6 A block matrix is a matrix whose entries are themselves
matrices, with the following constraints:
(1) entries in every row are matrices with the same number of
rows;
(2) entries in every column are matrices with the same number
of columns.
Let A be an m× n block matrix. Then
A = [ A11 A12 . . . A1n
      A21 A22 . . . A2n
      . . .
      Am1 Am2 . . . Amn ]   (1.1.3)
where Aij is a matrix with ri rows and cj columns.
A block matrix is also known as a partitioned matrix.
Example 1.1: 2×2 block matrices. Quite frequently we will be using
2×2 block matrices.
P = [ P11 P12
      P21 P22 ].   (1.1.4)
An example:
P = [ a b c
      d e f
      g h i ]
We have
P11 = [ a b
        d e ],  P12 = [ c
                        f ],  P21 = [ g h ],  P22 = [ i ].
• P11 and P12 have 2 rows.
• P21 and P22 have 1 row.
• P11 and P21 have 2 columns.
• P12 and P22 have 1 column.
□
Lemma 1.1 Let A = [Aij] be an m × n block matrix with Aij being
an ri × cj matrix. Then A is an r × c matrix where
r = ∑_{i=1}^m ri   (1.1.5)
and
c = ∑_{j=1}^n cj.   (1.1.6)
Remark. Sometimes it is convenient to think of a regular matrix as a
block matrix whose entries are 1× 1 matrices themselves.
Definition 1.7 [Multiplication of block matrices] Let A = [Aij]
be an m × n block matrix with Aij being a pi × qj matrix. Let
B = [Bjk] be an n × p block matrix with Bjk being a qj × rk matrix.
Then the two block matrices are compatible for multiplication
and their multiplication is defined by C = AB = [Cik] where
Cik = ∑_{j=1}^n Aij Bjk   (1.1.7)
and Cik is a pi × rk matrix.
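To make the block multiplication rule concrete, here is a minimal NumPy sketch (the partition sizes and random entries are illustrative assumptions, not from the text); it checks that block-wise multiplication per (1.1.7) agrees with ordinary multiplication of the assembled matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    # A is (2+3) x (4+1) split into a 2x2 grid of blocks; B is (4+1) x (2+2).
    A11, A12 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))
    A21, A22 = rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
    B11, B12 = rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
    B21, B22 = rng.standard_normal((1, 2)), rng.standard_normal((1, 2))

    A = np.block([[A11, A12], [A21, A22]])
    B = np.block([[B11, B12], [B21, B22]])

    # C_ik = sum_j A_ij B_jk, as in definition 1.7
    C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                  [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])

    assert np.allclose(C, A @ B)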
Definition 1.8 A block diagonal matrix is a block matrix
whose off diagonal entries are zero matrices.
1.2. Linear independence, span, rank
1.2.1. Spaces associated with a matrix
Definition 1.9 The column space of a matrix is defined as the
vector space spanned by columns of the matrix.
Let A be an m × n matrix with
A = [ a1 a2 . . . an ].
Then the column space is given by
C(A) = { x ∈ Fm : x = ∑_{i=1}^n αi ai for some αi ∈ F }. (1.2.1)
Definition 1.10 The row space of a matrix is defined as the
vector space spanned by rows of the matrix.
Let A be an m× n matrix with
A = [ a1^T
      a2^T
      . . .
      am^T ]
Then the row space is given by
R(A) = { x ∈ Fn : x = ∑_{i=1}^m αi ai for some αi ∈ F }. (1.2.2)
1.2.2. Rank
Definition 1.11 [Column rank] The column rank of a matrix
is defined as the maximum number of columns which are linearly
independent. In other words column rank is the dimension of the
column space of a matrix.
Definition 1.12 [Row rank] The row rank of a matrix is defined
as the maximum number of rows which are linearly independent.
In other words row rank is the dimension of the row space of a
matrix.
Theorem 1.2 The column rank and row rank of a matrix are
equal.
Definition 1.13 [Rank] The rank of a matrix is defined to be
equal to its column rank which is equal to its row rank.
Lemma 1.3 For an m× n matrix A
0 ≤ rank(A) ≤ min(m,n). (1.2.3)
Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero
matrix.
Definition 1.14 [Full rank matrix] An m × n matrix A is called
full rank if
rank(A) = min(m,n).
In other words it is either a full column rank matrix or a full row
rank matrix or both.
Lemma 1.5 [Rank of product of two matrices] Let A be an m × n matrix and B be an n × p matrix. Then
rank(AB) ≤ min(rank(A), rank(B)). (1.2.4)
Lemma 1.6 [Post-multiplication with a full row rank matrix] Let
A be an m× n matrix and B be an n× p matrix. If B is of rank
n then
rank(AB) = rank(A). (1.2.5)
Lemma 1.7 [Pre-multiplication with a full column rank matrix]
Let A be an m × n matrix and B be an n × p matrix. If A is of
rank n then
rank(AB) = rank(B). (1.2.6)
Lemma 1.8 The rank of a diagonal matrix is equal to the number
of non-zero elements on its main diagonal.
Proof. The columns corresponding to zero diagonal entries are zero
columns; the remaining columns are linearly independent.
The number of linearly independent rows is also the same. Hence their
count gives us the rank of the matrix. □
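As a quick numerical illustration of lemma 1.8 (a sketch; the diagonal entries are arbitrary):

    import numpy as np

    D = np.diag([3.0, 0.0, 2.0, 0.0])
    # The rank equals the number of non-zero diagonal entries.
    assert np.linalg.matrix_rank(D) == np.count_nonzero(np.diag(D)) == 2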
1.3. Invertible matrices
Definition 1.15 [Invertible] A square matrix A is called invert-
ible if there exists another square matrix B of same size such that
AB = BA = I.
The matrix B is called the inverse of A and is denoted as A−1.
Lemma 1.9 If A is invertible then its inverse A−1 is also invertible
and the inverse of A−1 is nothing but A.
Lemma 1.10 Identity matrix I is invertible.
Proof.
II = I =⇒ I−1 = I.
□
Lemma 1.11 If A is invertible then columns of A are linearly
independent.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Assume that columns of A are linearly dependent. Then there exists
u ≠ 0 such that
Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0,
a contradiction. Hence columns of A are linearly independent. □
Lemma 1.12 If an n × n matrix A is invertible then columns of
A span Fn.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Now let x ∈ Fn be any arbitrary vector. We need to show that there
exists α ∈ Fn such that
x = Aα.
But
x = Ix = ABx = A(Bx).
Thus if we choose α = Bx, then
x = Aα.
Thus columns of A span Fn. □
Lemma 1.13 If A is invertible, then columns of A form a basis
for Fn.
Proof. In Fn a basis is a set of vectors which is linearly inde-
pendent and spans Fn. By lemma 1.11 and lemma 1.12, columns of
an invertible matrix A satisfy both conditions. Hence they form a
basis. □
Lemma 1.14 If A is invertible then AT is invertible.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Applying transpose on both sides we get
BTAT = ATBT = I.
Thus BT is the inverse of AT and AT is invertible. □
Lemma 1.15 If A is invertible then AH is invertible.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Applying conjugate transpose on both sides we get
BHAH = AHBH = I.
Thus BH is the inverse of AH and AH is invertible. □
Lemma 1.16 If A and B are invertible then AB is invertible.
Proof. We note that
(AB)(B−1A−1) = A(BB−1)A−1 = AIA−1 = I.
Similarly
(B−1A−1)(AB) = B−1(A−1A)B = B−1IB = I.
Thus B−1A−1 is the inverse of AB. □
Lemma 1.17 The set of n × n invertible matrices under the matrix
multiplication operation forms a group.
Proof. We verify the properties of a group
Closure: If A and B are invertible then AB is invertible. Hence the
set is closed.
Associativity: Matrix multiplication is associative.
Identity element: I is invertible and AI = IA = A for all invertible
matrices.
Inverse element: If A is invertible then A−1 is also invertible.
Thus the set of invertible matrices is indeed a group under matrix
multiplication. □
Lemma 1.18 An n × n matrix A is invertible if and only if it is
full rank i.e.
rank(A) = n.
Corollary 1.19. The ranks of an invertible matrix and of its inverse are the
same.
1.3.1. Similar matrices
Definition 1.16 [Similar matrices] An n×n matrix B is similar
to an n× n matrix A if there exists an n× n non-singular matrix
C such that
B = C−1AC.
Lemma 1.20 If B is similar to A then A is similar to B. Thus
similarity is a symmetric relation.
Proof.
B = C−1AC =⇒ A = CBC−1 =⇒ A = (C−1)−1BC−1
Thus there exists a matrix D = C−1 such that
A = D−1BD.
Thus A is similar to B. □
Lemma 1.21 Similar matrices have the same rank.
Proof. Let B be similar to A. Thus there exists an invertible
matrix C such that
B = C−1AC.
Since C is invertible we have rank(C) = rank(C−1) = n. Now
using lemma 1.6, rank(AC) = rank(A), and using lemma 1.7 we have
rank(C−1(AC)) = rank(AC) = rank(A). Thus
rank(B) = rank(A).
□
Lemma 1.22 Similarity is an equivalence relation on the set of
n× n matrices.
Proof. Let A, B, C be n × n matrices. A is similar to itself through
the invertible matrix I. If A is similar to B then B is similar to A (lemma 1.20).
If B is similar to A via P s.t. B = P−1AP and C is similar to B
via Q s.t. C = Q−1BQ then C is similar to A via PQ such that
C = (PQ)−1A(PQ). Thus similarity is an equivalence relation on the
set of square matrices, and if A is any n × n matrix then the set of n × n matrices similar to A forms an equivalence class. □
1.3.2. Gram matrices
Definition 1.17 The Gram matrix of the columns of A is given by
G = AHA. (1.3.1)
Definition 1.18 The Gram matrix of the rows of A is given by
G = AAH. (1.3.2)
Remark. Usually when we talk about Gram matrix of a matrix we
are looking at the Gram matrix of its column vectors.
Remark. For real matrix A ∈ Rm×n, the Gram matrix of its column
vectors is given by ATA and the Gram matrix for its row vectors is
given by AAT .
The following results apply equally well to the real case.
Lemma 1.23 The columns of a matrix are linearly dependent if
and only if the Gram matrix of its column vectors AHA is not
invertible.
Proof. Let A be an m × n matrix and G = AHA be the Gram
matrix of its columns.
If columns of A are linearly dependent, then there exists a vector u ≠ 0
such that
Au = 0.
Thus
Gu = AHAu = 0.
Hence the columns of G are also dependent and G is not invertible.
Conversely let us assume that G is not invertible, thus columns of G
are dependent and there exists a vector v ≠ 0 such that
Gv = 0.
Now
vHGv = vHAHAv = (Av)H(Av) = ‖Av‖₂².
From the previous equation, we have
‖Av‖₂² = 0 =⇒ Av = 0.
Since v ≠ 0, hence columns of A are also linearly dependent. □
Corollary 1.24. The columns of a matrix are linearly independent if
and only if the Gram matrix of its column vectors AHA is invertible.
Proof. Columns of A can be dependent only if its Gram matrix is
not invertible. Thus if the Gram matrix is invertible, then the columns
of A are linearly independent.
The Gram matrix is not invertible only if columns of A are linearly
dependent. Thus if columns of A are linearly independent then the
Gram matrix is invertible. □
Corollary 1.25. Let A be a full column rank matrix. Then AHA is
invertible.
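A small NumPy sketch of corollary 1.25 for the real case (the random tall matrix, which is full column rank with probability one, is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 3))       # tall, almost surely full column rank
    assert np.linalg.matrix_rank(A) == 3
    G = A.T @ A                           # Gram matrix of the columns (real case)
    assert np.linalg.matrix_rank(G) == 3  # G is invertible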
Lemma 1.26 The null space of A and that of its Gram matrix AHA coincide, i.e.
N (A) = N (AHA). (1.3.3)
Proof. Let u ∈ N (A). Then
Au = 0 =⇒ AHAu = 0.
Thus
u ∈ N (AHA) =⇒ N (A) ⊆ N (AHA).
Now let u ∈ N (AHA). Then
AHAu = 0 =⇒ uHAHAu = 0 =⇒ ‖Au‖₂² = 0 =⇒ Au = 0.
Thus we have
u ∈ N (A) =⇒ N (AHA) ⊆ N (A).
□
Lemma 1.27 The rows of a matrix A are linearly dependent if and
only if the Gram matrix of its row vectors AAH is not invertible.
Proof. Rows of A are linearly dependent if and only if columns
of AH are linearly dependent. Thus there exists a vector v ≠ 0 s.t.
AHv = 0.
Thus
Gv = AAHv = 0.
Since v ≠ 0, hence G is not invertible.
Converse: assuming that G is not invertible, there exists a vector u ≠ 0
s.t.
Gu = 0.
Now
uHGu = uHAAHu = (AHu)H(AHu) = ‖AHu‖₂² = 0 =⇒ AHu = 0.
Since u ≠ 0, hence columns of AH and consequently rows of A are
linearly dependent. □
Corollary 1.28. The rows of a matrix A are linearly independent if
and only if the Gram matrix of its row vectors AAH is invertible.
Corollary 1.29. Let A be a full row rank matrix. Then AAH is in-
vertible.
1.3.3. Pseudo inverses
Definition 1.19 [Moore-Penrose pseudo-inverse] Let A be an m × n matrix. An n × m matrix A† is called its Moore-Penrose pseudo-
inverse if it satisfies all of the following criteria:
(1) AA†A = A.
(2) A†AA† = A†.
(3) (AA†)H = AA†, i.e. AA† is Hermitian.
(4) (A†A)H = A†A, i.e. A†A is Hermitian.
Theorem 1.30 [Existence and uniqueness] For any matrix A there
exists precisely one matrix A† which satisfies all the requirements
in definition 1.19.
We omit the proof for this. The pseudo-inverse can actually be ob-
tained by the singular value decomposition of A. This is shown in
lemma 1.110.
Lemma 1.31 Let D = diag(d1, d2, . . . , dn) be an n × n diagonal
matrix. Then its Moore-Penrose pseudo-inverse is D† =
diag(c1, c2, . . . , cn) where
ci = 1/di if di ≠ 0; ci = 0 if di = 0.
Proof. We note that D†D = DD† = F = diag(f1, f2, . . . , fn)
where
fi = 1 if di ≠ 0; fi = 0 if di = 0.
We now verify the requirements in definition 1.19.
DD†D = FD = D.
D†DD† = FD† = D†.
D†D = DD† = F is a diagonal hence Hermitian matrix. □
Lemma 1.32 Let D = diag(d1, d2, . . . , dp) be an m × n rectangular
diagonal matrix where p = min(m, n). Then its Moore-Penrose
pseudo-inverse is an n × m rectangular diagonal matrix
D† = diag(c1, c2, . . . , cp) where
ci = 1/di if di ≠ 0; ci = 0 if di = 0.
Proof. F = D†D = diag(f1, f2, . . . , fn) is an n × n matrix where
fi = 1 if i ≤ p and di ≠ 0; fi = 0 otherwise.
G = DD† = diag(g1, g2, . . . , gm) is an m × m matrix where
gi = 1 if i ≤ p and di ≠ 0; gi = 0 otherwise.
We now verify the requirements in definition 1.19.
DD†D = DF = D.
D†DD† = D†G = D†.
F = D†D and G = DD† are both diagonal hence Hermitian matrices.
□
Lemma 1.33 If A is full column rank then its Moore-Penrose
pseudo-inverse is given by
A† = (AHA)−1AH . (1.3.4)
It is a left inverse of A.
Proof. By corollary 1.25 AHA is invertible.
First of all we verify that it is a left inverse.
A†A = (AHA)−1AHA = I.
We now verify all the properties.
AA†A = AI = A.
A†AA† = IA† = A†.
Hermitian properties:
(AA†)H = (A(AHA)−1AH)H = A(AHA)−1AH = AA†.
(A†A)H = IH = I = A†A.
□
Lemma 1.34 If A is full row rank then its Moore-Penrose pseudo-
inverse is given by
A† = AH(AAH)−1. (1.3.5)
It is a right inverse of A.
Proof. By corollary 1.29 AAH is invertible.
First of all we verify that it is a right inverse.
AA† = AAH(AAH)−1 = I.
We now verify all the properties.
AA†A = IA = A.
A†AA† = A†I = A†.
Hermitian properties:
(AA†)H = IH = I = AA†.
(A†A)H = (AH(AAH)−1A)H = AH(AAH)−1A = A†A.
□
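Both formulas are easy to check numerically. Below is a minimal sketch for real matrices (the random full-rank test matrices are an illustrative assumption); it verifies (1.3.4) and (1.3.5) against NumPy's built-in Moore-Penrose pseudo-inverse.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))            # tall: full column rank
    B = rng.standard_normal((3, 5))            # wide: full row rank

    A_pinv = np.linalg.inv(A.T @ A) @ A.T      # (1.3.4), a left inverse of A
    B_pinv = B.T @ np.linalg.inv(B @ B.T)      # (1.3.5), a right inverse of B

    assert np.allclose(A_pinv @ A, np.eye(3))  # left inverse
    assert np.allclose(B @ B_pinv, np.eye(3))  # right inverse
    assert np.allclose(A_pinv, np.linalg.pinv(A))
    assert np.allclose(B_pinv, np.linalg.pinv(B))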
1.4. Trace and determinant
1.4.1. Trace
Definition 1.20 [Trace] The trace of a square matrix is defined
as the sum of the entries on its main diagonal. Let A be an n × n matrix, then
tr(A) = ∑_{i=1}^n aii (1.4.1)
where tr(A) denotes the trace of A.
Lemma 1.35 The trace of a square matrix and its transpose are
equal.
tr(A) = tr(AT ). (1.4.2)
Lemma 1.36 Trace of sum of two square matrices is equal to the
sum of their traces.
tr(A+B) = tr(A) + tr(B). (1.4.3)
Lemma 1.37 Let A be an m×n matrix and B be an n×m matrix.
Then
tr(AB) = tr(BA). (1.4.4)
Proof. Let AB = C = [cij]. Then
cij = ∑_{k=1}^n aik bkj.
Thus
cii = ∑_{k=1}^n aik bki.
Now
tr(C) = ∑_{i=1}^m cii = ∑_{i=1}^m ∑_{k=1}^n aik bki = ∑_{k=1}^n ∑_{i=1}^m aik bki = ∑_{k=1}^n ∑_{i=1}^m bki aik.
Let BA = D = [dij]. Then
dij = ∑_{k=1}^m bik akj.
Thus
dii = ∑_{k=1}^m bik aki.
Hence
tr(D) = ∑_{i=1}^n dii = ∑_{i=1}^n ∑_{k=1}^m bik aki = ∑_{i=1}^m ∑_{k=1}^n bki aik,
which matches the expression obtained for tr(C). This completes the proof. □
Lemma 1.38 Let A ∈ Fm×n, B ∈ Fn×p, C ∈ Fp×m be three ma-
trices. Then
tr(ABC) = tr(BCA) = tr(CAB). (1.4.5)
Proof. Let AB = D. Then
tr(ABC) = tr(DC) = tr(CD) = tr(CAB).
Similarly the other result can be proved. □
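A quick numerical check of the cyclic property (the sizes are chosen arbitrarily for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))
    C = rng.standard_normal((4, 2))

    t = np.trace(A @ B @ C)
    assert np.isclose(t, np.trace(B @ C @ A))   # lemma 1.38
    assert np.isclose(t, np.trace(C @ A @ B))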
Lemma 1.39 Traces of similar matrices are equal.
Proof. Let B be similar to A. Thus
B = C−1AC
for some invertible matrix C. Then
tr(B) = tr(C−1AC) = tr(CC−1A) = tr(A).
We used lemma 1.37. □
1.4.2. Determinants
The following are some results on the determinant of an n × n square matrix A.
Lemma 1.40 For any scalar α,
det(αA) = α^n det(A). (1.4.6)
Lemma 1.41 Determinant of a square matrix and its transpose
are equal.
det(A) = det(AT ). (1.4.7)
Lemma 1.42 Let A be a complex square matrix. Then
det(AH) = \overline{det(A)}, (1.4.8)
i.e. det(AH) is the complex conjugate of det(A).
Proof.
det(AH) = det(ĀT) = det(Ā) = \overline{det(A)}.
□
Lemma 1.43 Let A and B be two n× n matrices. Then
det(AB) = det(A) det(B). (1.4.9)
Lemma 1.44 Let A be an invertible matrix. Then
det(A−1) = 1/det(A). (1.4.10)
Lemma 1.45 Let A be a square matrix and p ∈ N. Then
det(Ap) = (det(A))p . (1.4.11)
Lemma 1.46 [Determinant of a triangular matrix] Determinant
of a triangular matrix is the product of its diagonal entries, i.e. if
A is an upper or lower triangular matrix then
det(A) = ∏_{i=1}^n aii. (1.4.12)
Lemma 1.47 [Determinant of a diagonal matrix] Determinant of
a diagonal matrix is the product of its diagonal entries, i.e. if A
is a diagonal matrix then
det(A) = ∏_{i=1}^n aii. (1.4.13)
Lemma 1.48 [Determinant of similar matrices] Determinants of
similar matrices are equal.
Proof. Let B be similar to A. Thus
B = C−1AC
for some invertible matrix C. Hence
det(B) = det(C−1AC) = det(C−1) det(A) det(C).
Now
det(C−1) det(A) det(C) = (1/det(C)) det(A) det(C) = det(A).
We used lemma 1.43 and lemma 1.44. □
Lemma 1.49 Let u and v be vectors in Fn. Then
det(I + uvT ) = 1 + uTv. (1.4.14)
Lemma 1.50 [Determinant of a small perturbation of identity
matrix] Let A be a square matrix and let ε ≈ 0. Then
det(I + εA) ≈ 1 + ε tr(A). (1.4.15)
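Both lemma 1.49 and lemma 1.50 are easy to check numerically; here is a minimal sketch (the random test data is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(4)
    u, v = rng.standard_normal(4), rng.standard_normal(4)
    # Lemma 1.49: det(I + u v^T) = 1 + u^T v
    assert np.isclose(np.linalg.det(np.eye(4) + np.outer(u, v)), 1 + u @ v)

    A = rng.standard_normal((4, 4))
    eps = 1e-6
    # Lemma 1.50: det(I + eps*A) ~ 1 + eps*tr(A), up to O(eps^2) terms
    assert abs(np.linalg.det(np.eye(4) + eps * A) - (1 + eps * np.trace(A))) < 1e-10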
1.5. Unitary and orthogonal matrices
1.5.1. Orthogonal matrix
Definition 1.21 [Orthogonal matrix] A real square matrix U is
called orthogonal if the columns of U form an orthonormal set.
In other words, let
U = [ u1 u2 . . . un ]
with ui ∈ Rn. Then we have
ui · uj = δi,j.
Lemma 1.51 An orthogonal matrix U is invertible with UT =
U−1.
Proof. Let
U = [ u1 u2 . . . un ]
be orthogonal with
UT = [ u1^T
       u2^T
       . . .
       un^T ].
Then the (i, j)-th entry of UTU is ui^T uj = ui · uj, so
UTU = [ ui · uj ] = I.
Since columns of U are linearly independent and span Rn, hence U is
invertible. Thus
UT = U−1.
□
Lemma 1.52 Determinant of an orthogonal matrix is ±1.
Proof. Let U be an orthogonal matrix. Then
det(UTU) = det(I) =⇒ (det(U))² = 1.
Thus we have
det(U) = ±1.
□
1.5.2. Unitary matrix
Definition 1.22 [Unitary matrix] A complex square matrix U is
called unitary if the columns of U form an orthonormal set. In
other words, let
U = [ u1 u2 . . . un ]
with ui ∈ Cn. Then we have
ui · uj = 〈ui, uj〉 = uj^H ui = δi,j.
Lemma 1.53 A unitary matrix U is invertible with UH = U−1.
Proof. Let
U = [ u1 u2 . . . un ]
be unitary with
UH = [ u1^H
       u2^H
       . . .
       un^H ].
Then the (i, j)-th entry of UHU is ui^H uj, so
UHU = [ ui^H uj ] = I.
Since columns of U are linearly independent and span Cn, hence U is
invertible. Thus
UH = U−1.
□
Lemma 1.54 The magnitude of determinant of a unitary matrix
is 1.
Proof. Let U be a unitary matrix. Then
det(UHU) = det(I) =⇒ det(UH) det(U) = 1 =⇒ \overline{det(U)} det(U) = 1.
Thus we have
|det(U)|² = 1 =⇒ |det(U)| = 1.
□
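A sketch illustrating both lemmas: the QR factorization of a random complex matrix yields a factor with orthonormal columns, hence a unitary matrix (the construction is an illustrative assumption, not from the text):

    import numpy as np

    rng = np.random.default_rng(5)
    Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    U, _ = np.linalg.qr(Z)                    # columns of U are orthonormal

    assert np.allclose(U.conj().T @ U, np.eye(4))    # U^H = U^{-1} (lemma 1.53)
    assert np.isclose(abs(np.linalg.det(U)), 1.0)    # |det(U)| = 1 (lemma 1.54)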
1.5.3. F unitary matrix
We provide a common definition for unitary matrices over any field F.
This definition applies to both real and complex matrices.
Definition 1.23 [F Unitary matrix] A square matrix U ∈ Fn×n is
called F unitary if the columns of U form an orthonormal set. In
other words, let
U = [ u1 u2 . . . un ]
with ui ∈ Fn. Then we have
〈ui, uj〉 = uj^H ui = δi,j.
We note that a suitable definition of inner product transports the def-
inition appropriately into orthogonal matrices over R and unitary ma-
trices over C.
When we are talking about F-unitary matrices, we will use the
symbol UH for the inverse. In the complex case it maps to the
conjugate transpose, while in the real case it maps to the simple transpose.
This definition helps us simplify some of the discussions in the sequel
(like singular value decomposition).
The following results apply equally to orthogonal matrices in the real case and
to unitary matrices in the complex case.
Lemma 1.55 [Norm preservation] F-unitary matrices preserve norm.
i.e.
‖Ux‖2 = ‖x‖2.
Proof.
‖Ux‖₂² = (Ux)H(Ux) = xHUHUx = xHIx = ‖x‖₂².
□
Remark. For the real case we have
‖Ux‖₂² = (Ux)T(Ux) = xTUTUx = xTIx = ‖x‖₂².
Lemma 1.56 [Inner product preservation] F-unitary matrices pre-
serve inner product. i.e.
〈Ux, Uy〉 = 〈x, y〉.
Proof.
〈Ux, Uy〉 = (Uy)HUx = yHUHUx = yHx.
□
Remark. For the real case we have
〈Ux, Uy〉 = (Uy)TUx = yTUTUx = yTx.
1.6. Eigen values
Much of the discussion in this section will be equally applicable to real
as well as complex matrices. We will use the complex notation mostly
and make specific remarks for real matrices wherever needed.
Definition 1.24 [Eigen value] A scalar λ is an eigen value of an
n×n matrix A = [aij] if there exists a non null vector x such that
Ax = λx. (1.6.1)
A non null vector x which satisfies this equation is called an eigen
vector of A for the eigen value λ.
An eigen value is also known as a characteristic value, proper
value or a latent value.
We note that (1.6.1) can be written as
Ax = λInx =⇒ (A− λIn)x = 0. (1.6.2)
Thus λ is an eigen value of A if and only if the matrix A−λI is singular.
Definition 1.25 [Spectrum of a matrix] The set comprising of
eigen values of a matrix A is known as its spectrum.
Remark. For each eigen vector x for a matrix A the corresponding
eigen value λ is unique.
Proof. Assume that for x there are two eigen values λ1 and λ2,
then
Ax = λ1x = λ2x =⇒ (λ1 − λ2)x = 0.
This can happen only when either x = 0 or λ1 = λ2. Since x is an
eigen vector, it cannot be 0. Thus λ1 = λ2. □
Remark. If x is an eigen vector of A, then the corresponding eigen
value is given by
λ = (xHAx)/(xHx). (1.6.3)
Proof.
Ax = λx =⇒ xHAx = λxHx =⇒ λ = (xHAx)/(xHx)
since x is non-zero. □
Remark. An eigen vector x of A for eigen value λ belongs to the null
space of A− λI, i.e.
x ∈ N (A− λI).
In other words x is a nontrivial solution to the homogeneous system of
linear equations given by
(A− λI)z = 0.
Definition 1.26 [Eigen space] Let λ be an eigen value for a square
matrix A. Then its eigen space is the null space of A − λI i.e.
N (A− λI).
Remark. The set comprising all the eigen vectors of A for an eigen
value λ is given by
N (A− λI) \ {0} (1.6.4)
since 0 cannot be an eigen vector.
Definition 1.27 [Geometric multiplicity] Let λ be an eigen value
for a square matrix A. The dimension of its eigen space N (A−λI)
is known as the geometric multiplicity of the eigen value λ.
Remark. Clearly
dim(N (A− λI)) = n− rank(A− λI).
Remark. A scalar λ can be an eigen value of a square matrix A if and
only if
det(A− λI) = 0.
det(A− λI) is a polynomial in λ of degree n.
Remark.
det(A − λI) = p(λ) = α_n λ^n + α_{n−1} λ^{n−1} + · · · + α_1 λ + α_0 (1.6.5)
where the coefficients α_i depend on the entries of A.
In this sense, an eigen value of A is a root of the equation
p(λ) = 0. (1.6.6)
It is easy to show that α_n = (−1)^n.
Definition 1.28 [Characteristic polynomial and equation] For any
square matrix A, the polynomial given by p(λ) = det(A − λI) is
known as its characteristic polynomial. The equation given by
p(λ) = 0 (1.6.7)
is known as its characteristic equation. The eigen values of
A are the roots of its characteristic polynomial or solutions of its
characteristic equation.
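As an illustrative sketch, NumPy can compute the characteristic polynomial; note that np.poly returns the coefficients of det(λI − A), which is (−1)^n p(λ) in the convention used here (the example matrix is arbitrary):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    coeffs = np.poly(A)        # coefficients of det(lambda*I - A) = lambda^2 - 5*lambda + 6
    roots = np.roots(coeffs)   # the eigen values of A
    assert np.allclose(np.sort(roots), np.sort(np.linalg.eigvals(A)))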
Lemma 1.57 [Roots of characteristic equation] For real square
matrices, if we restrict eigen values to real values, then the char-
acteristic polynomial can be factored as
p(λ) = (−1)^n (λ − λ1)^{r1} . . . (λ − λk)^{rk} q(λ). (1.6.8)
The polynomial has k distinct real roots. For each root λi, ri is a
positive integer indicating how many times the root appears. q(λ)
is a polynomial that has no real roots. The following is true:
r1 + · · · + rk + deg(q(λ)) = n. (1.6.9)
Clearly k ≤ n.
For complex square matrices where eigen values can be complex
(including real square matrices), the characteristic polynomial can
be factored as
p(λ) = (−1)^n (λ − λ1)^{r1} . . . (λ − λk)^{rk}. (1.6.10)
The polynomial can be completely factorized into first degree polynomials.
There are k distinct roots or eigen values. The following
is true:
r1 + · · · + rk = n. (1.6.11)
Thus including the duplicates there are exactly n eigen values for
a complex square matrix.
Remark. It is quite possible that a real square matrix doesn’t have
any real eigen values.
Definition 1.29 [Algebraic multiplicity] The number of times an
eigen value appears in the factorization of the characteristic poly-
nomial of a square matrix A is known as its algebraic multiplicity.
In other words ri is the algebraic multiplicity of λi in the above factorization.
Remark. In the above, the set {λ1, . . . , λk} forms the spectrum of A.
Let us consider the sum of the ri, which gives the count of the total number of
roots of p(λ):
m = ∑_{i=1}^k ri. (1.6.12)
With this there are m not-necessarily-distinct roots of p(λ). Let us
write p(λ) as
p(λ) = (−1)^n (λ − c1)(λ − c2) . . . (λ − cm) q(λ) (1.6.13)
where c1, c2, . . . , cm are m scalars (not necessarily distinct) of which r1
scalars are λ1, r2 are λ2 and so on. Obviously for the complex case
q(λ) = 1.
We will refer to the set (allowing repetitions) {c1, c2, . . . , cm} as the
eigen values of the matrix A where ci are not necessarily distinct. In
contrast the spectrum of A refers to the set of distinct eigen values of
A. The symbol c has been chosen based on the other name for eigen
values (the characteristic values).
We can put together the eigen vectors of a matrix into another matrix.
This can be a very useful tool. We start with a simple idea.
Lemma 1.58 Let A be an n × n matrix. Let u1, u2, . . . , ur be r
non-zero vectors from Fn. Let us construct an n × r matrix
U = [ u1 u2 . . . ur ].
Then all the r vectors are eigen vectors of A if and only if there
exists a diagonal matrix D = diag(d1, . . . , dr) such that
AU = UD. (1.6.14)
Proof. Expanding the equation, we can write
[ Au1 Au2 . . . Aur ] = [ d1u1 d2u2 . . . drur ].
Clearly we want
Aui = diui
where ui are non-zero. This is possible only when di is an eigen value
of A and ui is an eigen vector for di.
Converse: assume that ui are eigen vectors. Choose di to be the corresponding
eigen values. Then the equation holds. □
Lemma 1.59 0 is an eigen value of a square matrix A if and only
if A is singular.
Proof. Let 0 be an eigen value of A. Then there exists u ≠ 0 such
that
Au = 0u = 0.
Thus u is a non-trivial solution of the homogeneous linear system. Thus
A is singular.
Converse: assuming that A is singular, there exists u ≠ 0 s.t.
Au = 0 = 0u.
Thus 0 is an eigen value of A. □
Lemma 1.60 If a square matrix A is singular, then N (A) is the
eigen space for the eigen value λ = 0.
Proof. This is straightforward from the definition of eigen space
(see definition 1.26). □
Remark. Clearly the geometric multiplicity of λ = 0 equals nullity(A) =
n− rank(A).
Lemma 1.61 Let A be a square matrix. Then A and AT have
same eigen values.
Proof. The eigen values of AT are given by
det(AT − λI) = 0.
But
AT − λI = AT − (λI)T = (A− λI)T .
Hence (using lemma 1.41)
det(AT − λI) = det((A − λI)T) = det(A − λI).
Thus the characteristic polynomials of A and AT are the same. Hence the
eigen values are the same. In other words the spectra of A and AT are the
same. □
Remark (Direction preservation). If x is an eigen vector with a non-zero
real eigen value λ for A then Ax and x are collinear.
In other words the angle between Ax and x is either 0° when λ is
positive or 180° when λ is negative. Let us look at the inner
product:
〈Ax, x〉 = xHAx = xHλx = λ‖x‖₂².
Meanwhile
‖Ax‖₂ = ‖λx‖₂ = |λ|‖x‖₂.
Thus
|〈Ax, x〉| = ‖Ax‖₂‖x‖₂.
The angle θ between Ax and x is given by
cos θ = 〈Ax, x〉/(‖Ax‖₂‖x‖₂) = λ‖x‖₂²/(|λ|‖x‖₂²) = ±1.
Lemma 1.62 Let A be a square matrix and λ be an eigen value
of A. Let p ∈ N. Then λp is an eigen value of Ap.
Proof. For p = 1 the statement holds trivially since λ1 is an eigen
value of A1. Assume that the statement holds for some value of p.
Thus let λp be an eigen value of Ap and let u be corresponding eigen
vector. Now
Ap+1u = Ap(Au) = Apλu = λApu = λλpu = λp+1u.
Thus λp+1 is an eigen value of Ap+1 with the same eigen vector u. By
the principle of mathematical induction, the proof is complete. □
Lemma 1.63 Let a square matrix A be non-singular and let λ ≠ 0
be some eigen value of A. Then 1/λ is an eigen value of A−1.
Moreover, all eigen values of A−1 are obtained by taking inverses
of eigen values of A, i.e. if µ ≠ 0 is an eigen value of A−1 then 1/µ
is an eigen value of A also. Also, A and A−1 share the same set
of eigen vectors.
Proof. Let u ≠ 0 be an eigen vector of A for the eigen value λ.
Then
Au = λu =⇒ u = A−1λu =⇒ (1/λ)u = A−1u.
Thus u is also an eigen vector of A−1 for the eigen value 1/λ.
Now let B = A−1. Then B−1 = A. Thus if µ is an eigen value of B
then 1/µ is an eigen value of B−1 = A.
Thus if A is invertible then the eigen values of A and A−1 are in one to one
correspondence. □
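A small numerical illustration of lemma 1.63 (a random invertible test matrix is assumed for the sketch):

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 4))            # almost surely non-singular
    ev_inv = np.linalg.eigvals(np.linalg.inv(A))
    # every eigen value of A, inverted, appears among the eigen values of A^{-1}
    for lam in np.linalg.eigvals(A):
        assert np.min(np.abs(ev_inv - 1 / lam)) < 1e-8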
This result is very useful: if it can be shown that a matrix A is
similar to a diagonal or a triangular matrix whose eigen values are easy
to obtain, then the determination of the eigen values of A becomes
straightforward.
1.6.1. Invariant subspaces
Definition 1.30 [Invariant subspace] Let A be a square n × n
matrix and let W be a subspace of Fn, i.e. W ≤ Fn. Then W is
invariant relative to A if
Aw ∈ W ∀ w ∈ W, (1.6.15)
i.e. A(W) ⊆ W, or for every vector w ∈ W its mapping Aw is also
in W. Thus the action of A on W doesn't take us outside of W.
We also say that W is A-invariant.
Eigen vectors are generators of invariant subspaces.
Lemma 1.64 Let A be an n × n matrix. Let x1, x2, . . . , xr be r
eigen vectors of A. Let us construct an n × r matrix
X = [ x1 x2 . . . xr ].
Then the column space of X, i.e. C(X), is invariant relative to A.
Proof. Let us assume that c1, c2, . . . , cr are the eigen values corresponding
to x1, x2, . . . , xr (not necessarily distinct).
Let any vector x ∈ C(X) be given by
x = ∑_{i=1}^r αi xi.
Then
Ax = A ∑_{i=1}^r αi xi = ∑_{i=1}^r αi Axi = ∑_{i=1}^r αi ci xi.
Clearly Ax is also a linear combination of the xi, hence it belongs to C(X).
Thus C(X) is invariant relative to A, or C(X) is A-invariant. □
1.6.2. Triangular matrices
Lemma 1.65 Let A be an n×n upper or lower triangular matrix.
Then its eigen values are the entries on its main diagonal.
Proof. If A is triangular then A − λI is also triangular with its
diagonal entries being (aii − λ). Using lemma 1.46, we have
p(λ) = det(A − λI) = ∏_{i=1}^n (aii − λ).
Clearly the roots of the characteristic polynomial are the aii. □
Several small results follow from this lemma.
Corollary 1.66. Let A = [aij] be an n × n triangular matrix.
(a) The characteristic polynomial of A is p(λ) = ∏_{i=1}^n (aii − λ).
(b) A scalar λ is an eigen value of A iff it is one of the diagonal entries
of A.
(c) The algebraic multiplicity of an eigen value λ is equal to the number
of times it appears on the main diagonal of A.
(d) The spectrum of A is given by the distinct entries on the main
diagonal of A.
A diagonal matrix is naturally both an upper triangular matrix as well
as a lower triangular matrix. Similar results hold for the eigen values
of a diagonal matrix also.
Lemma 1.67 Let A = [aij] be an n × n diagonal matrix.
(a) Its eigen values are the entries on its main diagonal.
(b) The characteristic polynomial of A is p(λ) = ∏_{i=1}^n (aii − λ).
(c) A scalar λ is an eigen value of A iff it is one of the diagonal
entries of A.
(d) The algebraic multiplicity of an eigen value λ is equal to the
number of times it appears on the main diagonal of A.
(e) The spectrum of A is given by the distinct entries on the main
diagonal of A.
There is also a result for the geometric multiplicity of eigen values for
a diagonal matrix.
Lemma 1.68 Let A = [aij] be an n × n diagonal matrix. The
geometric multiplicity of an eigen value λ is equal to the number
of times it appears on the main diagonal of A.
Proof. The unit vectors ei are eigen vectors for A since
Aei = aiiei.
They are linearly independent. Thus if a particular eigen value appears r
times on the diagonal, then there are r linearly independent eigen vectors for that
eigen value. Thus its geometric multiplicity equals its algebraic
multiplicity. □
1.6.3. Similar matrices
Some very useful results are available for similar matrices.
Lemma 1.69 The characteristic polynomial and spectrum of similar
matrices are the same.
Proof. Let B be similar to A. Thus there exists an invertible
matrix C such that
B = C−1AC.
Now
B−λI = C−1AC−λI = C−1AC−λC−1C = C−1(AC−λC) = C−1(A−λI)C.
Thus B − λI is similar to A − λI. Hence due to lemma 1.48, their
determinant is equal i.e.
det(B − λI) = det(A− λI).
This means that the characteristic polynomials of A and B are the same.
Since eigen values are nothing but the roots of the characteristic polynomial,
they are the same too. This means that the spectrum (the set
of distinct eigen values) is the same. □
Corollary 1.70. If A and B are similar to each other then
(a) an eigen value has the same algebraic and geometric multiplicity for
both A and B;
(b) the (not necessarily distinct) eigen values of A and B are the same.
Although the eigen values are the same, the eigen vectors are different.
Lemma 1.71 Let A and B be similar with
B = C−1AC
for some invertible matrix C. If u is an eigen vector of A for an
eigen value λ, then C−1u is an eigen vector of B for the same
eigen value.
Proof. u is an eigen vector of A for the eigen value λ. Thus we
have
Au = λu.
Thus
BC−1u = C−1ACC−1u = C−1Au = C−1λu = λC−1u.
Now u ≠ 0 and C−1 is non-singular. Thus C−1u ≠ 0. Thus C−1u is an
eigen vector of B.
□
Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an
eigen value of a square matrix A. Then the geometric multiplicity
of λ is less than or equal to its algebraic multiplicity.
Corollary 1.73. If an n×n matrix A has n distinct eigen values, then
each of them has a geometric (and algebraic) multiplicity of 1.
Proof. The algebraic multiplicity of an eigen value is at least 1,
and the algebraic multiplicities sum to at most n. Since there are n distinct
eigen values, each of them has algebraic multiplicity 1. Now the
geometric multiplicity of an eigen value is at least 1 and at most
its algebraic multiplicity, hence it is also 1. □
Corollary 1.74. Let an n × n matrix A have k distinct eigen values
λ1, λ2, . . . , λk with algebraic multiplicities r1, r2, . . . , rk and geometric
multiplicities g1, g2, . . . , gk respectively. Then
∑_{i=1}^k gi ≤ ∑_{i=1}^k ri ≤ n.
Moreover if
∑_{i=1}^k gi = ∑_{i=1}^k ri
then
gi = ri for every i.
1.6.4. Linear independence of eigen vectors
Theorem 1.75 [Linear independence of eigen vectors for distinct
eigen values] Let A be an n × n square matrix. Let x1, x2, . . . , xk
be any k eigen vectors of A for distinct eigen values λ1, λ2, . . . , λk
respectively. Then x1, x2, . . . , xk are linearly independent.
Proof. We first prove the simpler case with 2 eigen vectors x1 and
x2 and corresponding eigen values λ1 and λ2 respectively.
Let there be a linear relationship between x1 and x2 given by
α1x1 + α2x2 = 0.
Multiplying both sides with (A − λ1I) we get
α1(A − λ1I)x1 + α2(A − λ1I)x2 = 0
=⇒ α1(λ1 − λ1)x1 + α2(λ2 − λ1)x2 = 0
=⇒ α2(λ2 − λ1)x2 = 0.
Since λ1 ≠ λ2 and x2 ≠ 0, hence α2 = 0.
Similarly by multiplying with (A − λ2I) on both sides, we can show
that α1 = 0. Thus x1 and x2 are linearly independent.
Now for the general case, consider a linear relationship between x1, x2, . . . , xk
given by
α1x1 + α2x2 + · · · + αkxk = 0.
Multiplying by ∏_{i=1, i≠j}^k (A − λiI) and using the fact that λi ≠ λj if
i ≠ j, we get αj = 0. Thus the only linear relationship is the trivial
one. This completes the proof. □
For eigen values with geometric multiplicity greater than 1 there are
multiple linearly independent eigen vectors corresponding to the eigen value.
In this context, the above theorem can be generalized
further.
Theorem 1.76 Let λ1, λ2, . . . , λk be k distinct eigen values of
A. Let {x_1^j, x_2^j, . . . , x_{gj}^j} be any gj linearly independent eigen vectors
from the eigen space of λj, where gj is the geometric multiplicity
of λj. Then the combined set of eigen vectors given by
{x_1^1, . . . , x_{g1}^1, . . . , x_1^k, . . . , x_{gk}^k}, consisting of ∑_{j=1}^k gj eigen vectors, is
linearly independent.
This result puts an upper limit on the number of linearly independent
eigen vectors of a square matrix.
Lemma 1.77 Let {λ1, . . . , λk} represent the spectrum of an n × n matrix A. Let g1, . . . , gk be the geometric multiplicities of λ1, . . . , λk
respectively. Then the number of linearly independent eigen vectors
of A is
∑_{i=1}^k gi.
Moreover if
∑_{i=1}^k gi = n
then a set of n linearly independent eigen vectors of A can be found
which forms a basis for Fn.
1.6.5. Diagonalization
Diagonalization is one of the fundamental operations in linear algebra.
This section discusses diagonalization of square matrices in depth.
Definition 1.31 [Diagonalizable matrix] An n × n matrix A is
said to be diagonalizable if it is similar to a diagonal matrix.
In other words there exists an n × n non-singular matrix P such
that D = P−1AP is a diagonal matrix. If this happens then we
say that P diagonalizes A or A is diagonalized by P .
Remark.
D = P−1AP ⇐⇒ PD = AP ⇐⇒ PDP−1 = A. (1.6.16)
We note that if we restrict to real matrices, then P and D should
also be real. If A ∈ Cn×n (it may still be real) then P and D can be
complex.
The next theorem is the culmination of a variety of results studied so
far.
Theorem 1.78 [Properties of diagonalizable matrices] Let A be a
diagonalizable matrix with D = P−1AP being its diagonalization.
Let D = diag(d1, d2, . . . , dn). Then the following hold:
(a) rank(A) = rank(D), which equals the number of non-zero entries
on the main diagonal of D.
(b) det(A) = d1 d2 . . . dn.
(c) tr(A) = d1 + d2 + · · · + dn.
(d) The characteristic polynomial of A is
p(λ) = (−1)^n (λ − d1)(λ − d2) . . . (λ − dn).
(e) The spectrum of A comprises the distinct scalars on the diagonal
of D.
(f) The (not necessarily distinct) eigen values of A are the diagonal
elements of D.
(g) The columns of P are (linearly independent) eigen vectors of
A.
(h) The algebraic and geometric multiplicities of an eigen value λ
of A equal the number of diagonal elements of D that equal λ.
Proof. From definition 1.31 we note that D and A are similar.
Due to lemma 1.48,
det(A) = det(D).
Due to lemma 1.47,
det(D) = ∏_{i=1}^n di.
Now due to lemma 1.39,
tr(A) = tr(D) = ∑_{i=1}^n di.
Further, due to lemma 1.69 the characteristic polynomial and spectrum
of A and D are the same. Due to lemma 1.67 the eigen values of D are
nothing but its diagonal entries. Hence they are also the eigen values
of A.
D = P−1AP =⇒ AP = PD.
Now writing
P = [ p1 p2 . . . pn ]
we have
AP = [ Ap1 Ap2 . . . Apn ] = PD = [ d1p1 d2p2 . . . dnpn ].
Thus the pi are eigen vectors of A.
Since the characteristic polynomials of A and D are the same, the
algebraic multiplicities of the eigen values are the same.
From lemma 1.71 we get that there is a one to one correspondence
between the eigen vectors of A and D through the change of basis
given by P. Thus the linear independence relationships between the
eigen vectors remain the same. Hence the geometric multiplicities of
individual eigen values are also the same.
This completes the proof. □
So far we have verified various results which are available if a matrix A
is diagonalizable. We haven’t yet identified the conditions under which
A is diagonalizable. We note that not every matrix is diagonalizable.
The following theorem gives necessary and sufficient conditions under
which a matrix is diagonalizable.
Theorem 1.79 An n × n matrix A is diagonalizable by an n × n non-singular matrix P if and only if the columns of P are (linearly
independent) eigen vectors of A.
Proof. We note that since P is non-singular hence columns of P
have to be linearly independent.
The necessary condition part was proven in theorem 1.78. We now
show that if P consists of n linearly independent eigen vectors of A
then A is diagonalizable.
Let the columns of P be p1, p2, . . . , pn and corresponding (not neces-
sarily distinct) eigen values be d1, d2, . . . , dn. Then
Api = dipi.
Thus by letting D = diag(d1, d2, . . . , dn), we have
AP = PD.
Now since columns of P are linearly independent, hence P is invertible.
This gives us
D = P−1AP.
Thus A is similar to a diagonal matrix D. This validates the sufficient
condition. □
A corollary follows.
Corollary 1.80. An n×n matrix is diagonalizable if and only if there
exists a linearly independent set of n eigenvectors of A.
Now we know that geometric multiplicities of eigen values of A provide
us information about linearly independent eigenvectors of A.
Corollary 1.81. Let A be an n × n matrix. Let λ1, λ2, . . . , λk be its k
distinct eigen values (comprising its spectrum). Let gj be the geometric
multiplicity of λj. Then A is diagonalizable if and only if
∑_{j=1}^k gj = n. (1.6.17)
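Numerically, a random matrix almost surely has n distinct eigen values and is therefore diagonalizable; the following sketch (under that assumption) recovers D = P−1AP from NumPy's eigen decomposition:

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((4, 4))
    evals, P = np.linalg.eig(A)      # columns of P are eigen vectors of A

    D = np.linalg.inv(P) @ A @ P     # similar to A via P
    assert np.allclose(D, np.diag(evals))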
1.6.6. Symmetric matrices
This subsection is focused on real symmetric matrices.
The following is a fundamental property of real symmetric matrices.
Theorem 1.82 Every real symmetric matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and
λ2 be any two distinct eigen values of A and let x1 and x2 be any
two corresponding eigen vectors. Then x1 and x2 are orthogonal.
Proof. By definition we have Ax1 = λ1x1 and Ax2 = λ2x2. Thus
x2^T A x1 = λ1 x2^T x1
=⇒ x1^T A^T x2 = λ1 x1^T x2
=⇒ x1^T A x2 = λ1 x1^T x2
=⇒ x1^T λ2 x2 = λ1 x1^T x2
=⇒ (λ1 − λ2) x1^T x2 = 0
=⇒ x1^T x2 = 0.
Thus x1 and x2 are orthogonal. In between we took the transpose on both
sides and used the facts that A = AT and λ1 − λ2 ≠ 0. □
Definition 1.32 [Orthogonally diagonalizable matrix] A real n × n matrix A is said to be orthogonally diagonalizable if there
exists an orthogonal matrix U which can diagonalize A, i.e.
D = UTAU
is a real diagonal matrix.
Lemma 1.84 Every orthogonally diagonalizable matrix A is sym-
metric.
Proof. We have a diagonal matrix D such that
A = UDUT .
Taking transpose on both sides we get
AT = UDTUT = UDUT = A.
Thus A is symmetric. □
Theorem 1.85 Every symmetric matrix A is orthogonally diago-
nalizable.
We skip the proof of this theorem.
1.6.7. Hermitian matrices
The following is a fundamental property of Hermitian matrices.
Theorem 1.86 Every Hermitian matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.87 The eigen values of a Hermitian matrix are real.
Proof. Let A be a Hermitian matrix and let λ be an eigen value
of A. Let u be a corresponding eigen vector. Then
Au = λu
=⇒ u^H A^H = λ̄ u^H
=⇒ u^H A^H u = λ̄ u^H u
=⇒ u^H A u = λ̄ u^H u
=⇒ λ u^H u = λ̄ u^H u
=⇒ ‖u‖₂² (λ − λ̄) = 0
=⇒ λ = λ̄,
thus λ is real. We used the facts that A = AH and u ≠ 0 =⇒ ‖u‖₂ ≠ 0. □
Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let
λ1 and λ2 be any two distinct eigen values of A and let x1 and
x2 be any two corresponding eigen vectors. Then x1 and x2 are
orthogonal.
Proof. By definition we have Ax1 = λ1x1 and Ax2 = λ2x2. Thus
x2^H A x1 = λ1 x2^H x1
=⇒ x1^H A^H x2 = λ1 x1^H x2
=⇒ x1^H A x2 = λ1 x1^H x2
=⇒ x1^H λ2 x2 = λ1 x1^H x2
=⇒ (λ1 − λ2) x1^H x2 = 0
=⇒ x1^H x2 = 0.
Thus x1 and x2 are orthogonal. In between we took the conjugate transpose
on both sides and used the facts that A = AH, λ1 is real (lemma 1.87),
and λ1 − λ2 ≠ 0. □
Definition 1.33 [Unitary diagonalizable matrix] A complex n × n matrix A is said to be unitary diagonalizable if there exists a
unitary matrix U which can diagonalize A, i.e.
D = UHAU
is a complex diagonal matrix.
Lemma 1.89 Let A be a unitary diagonalizable matrix whose di-
agonalization D is real. Then A is Hermitian.
Proof. We have a real diagonal matrix D such that
A = UDUH .
Taking conjugate transpose on both sides we get
AH = UDHUH = UDUH = A.
Thus A is Hermitian. We used the fact that DH = D since D is
real. □
Theorem 1.90 Every Hermitian matrix A is unitary diagonaliz-
able.
We skip the proof of this theorem. The theorem means that if A is
Hermitian then we can write A = UΛUH where U is unitary and Λ is diagonal.
Definition 1.34 [Eigen value decomposition of a Hermitian matrix]
Let A be an n × n Hermitian matrix. Let λ1, . . . , λn be its
eigen values such that |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Let
Λ = diag(λ1, . . . , λn).
Let U be a unitary matrix consisting of orthonormal eigen vectors
corresponding to λ1, . . . , λn. Then the eigen value decomposition
of A is defined as
A = UΛUH. (1.6.18)
If the λi are distinct, then the decomposition is unique. If they are
not distinct, then the choice of U, and hence the decomposition, is not unique.
Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider
some vector x ∈ Cn. Then
x^H Λ x = ∑_{i=1}^n λi |xi|². (1.6.19)
Now if λi ≥ 0 then
x^H Λ x ≤ λ1 ∑_{i=1}^n |xi|² = λ1 ‖x‖₂².
Also
x^H Λ x ≥ λn ∑_{i=1}^n |xi|² = λn ‖x‖₂².
Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen
values. Let λ1 be its largest and λn its smallest eigen value. Then
λn ‖x‖₂² ≤ x^H A x ≤ λ1 ‖x‖₂² ∀ x ∈ Cn. (1.6.20)
Proof. A has an eigen value decomposition given by
A = UΛUH.
Let x ∈ Cn and let v = UHx. Clearly ‖x‖₂ = ‖v‖₂. Then
x^H A x = x^H U Λ U^H x = v^H Λ v.
From the previous remark we have
λn ‖v‖₂² ≤ v^H Λ v ≤ λ1 ‖v‖₂².
Thus we get
λn ‖x‖₂² ≤ x^H A x ≤ λ1 ‖x‖₂².
□
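A sketch verifying lemma 1.91 on a randomly generated Hermitian matrix with non-negative eigen values (built as BHB, an illustrative construction):

    import numpy as np

    rng = np.random.default_rng(8)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B.conj().T @ B                       # Hermitian, eigen values >= 0

    evals = np.linalg.eigvalsh(A)            # sorted ascending
    lam_n, lam_1 = evals[0], evals[-1]

    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    q = (x.conj() @ A @ x).real              # x^H A x is real for Hermitian A
    n2 = np.linalg.norm(x) ** 2
    assert lam_n * n2 - 1e-9 <= q <= lam_1 * n2 + 1e-9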
1.6.8. Miscellaneous properties
This subsection lists some miscellaneous properties of eigen values of a
square matrix.
Lemma 1.92 λ is an eigen value of A if and only if λ + k is an
eigen value of A + kI. Moreover A and A + kI share the same
eigen vectors.
Proof.
Ax = λx
⇐⇒ Ax + kx = λx + kx
⇐⇒ (A + kI)x = (λ + k)x. (1.6.21)
Thus λ is an eigen value of A with an eigen vector x if and only if λ + k
is an eigen value of A + kI with the same eigen vector x. □
1.6.9. Diagonally dominant matrices
Definition 1.35 [Diagonally dominant matrix] Let A = [aij] be a
square matrix in Cn×n. A is called diagonally dominant if
|aii| ≥ ∑_{j≠i} |aij|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of each diagonal
element is greater than or equal to the sum of the absolute values of
all the off-diagonal elements in its row.
Definition 1.36 [Strictly diagonally dominant matrix] Let A =
[aij] be a square matrix in Cn×n. A is called strictly diagonally
dominant if
|aii| > ∑_{j≠i} |aij|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of each diagonal
element is strictly greater than the sum of the absolute values of all the
off-diagonal elements in its row.
Example 1.2: Strictly diagonally dominant matrix. Let us consider
A = [ −4 −2 −1  0
      −4  7  2  0
       3 −4  9  1
       2 −1 −3 15 ]
We can see that the strict diagonal dominance condition is satisfied for
each row as follows:
row 1 : |−4| > |−2| + |−1| + |0| = 3
row 2 : |7| > |−4| + |2| + |0| = 6
row 3 : |9| > |3| + |−4| + |1| = 8
row 4 : |15| > |2| + |−1| + |−3| = 6
□
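A direct check of the row condition is easy to code; here is a minimal sketch using the example matrix above:

    import numpy as np

    def is_strictly_diagonally_dominant(A):
        # |a_ii| > sum over j != i of |a_ij|, for every row (definition 1.36)
        abs_A = np.abs(np.asarray(A))
        diag = np.diag(abs_A)
        off_sums = abs_A.sum(axis=1) - diag
        return bool(np.all(diag > off_sums))

    A = np.array([[-4, -2, -1,  0],
                  [-4,  7,  2,  0],
                  [ 3, -4,  9,  1],
                  [ 2, -1, -3, 15]], dtype=float)

    assert is_strictly_diagonally_dominant(A)
    # Theorem 1.93 (below) then guarantees A is non-singular:
    assert np.linalg.matrix_rank(A) == 4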
Strictly diagonally dominant matrices have a very special property.
They are always non-singular.
Theorem 1.93 Strictly diagonally dominant matrices are non-
singular.
Proof. Suppose that A is strictly diagonally dominant and singular. Then
there exists a vector u ∈ Cn with u ≠ 0 such that
Au = 0. (1.6.22)
Let
u = [ u1 u2 . . . un ]^T.
We first show that the entries in u cannot all be equal in magnitude. Let
us assume that they are, i.e.
c = |u1| = |u2| = · · · = |un|.
Since u ≠ 0, hence c ≠ 0. Now for any row i in (1.6.22), we have
∑_{j=1}^n aij uj = 0
=⇒ ∑_{j=1}^n ±aij c = 0
=⇒ ∑_{j=1}^n ±aij = 0
=⇒ ∓aii = ∑_{j≠i} ±aij
=⇒ |aii| = |∑_{j≠i} ±aij|
=⇒ |aii| ≤ ∑_{j≠i} |aij| (using the triangle inequality)
but this contradicts our assumption that A is strictly diagonally dominant.
Thus all entries in u are not equal in magnitude.
Let us now assume that the largest entry in u lies at index i with
|ui| = c. Without loss of generality we can scale down u by c to
get another vector in which all entries are less than or equal to 1 in
magnitude while the i-th entry is ±1, i.e. ui = ±1 and |uj| ≤ 1 for all
other entries.
Now from (1.6.22) we get for the i-th row
∑_{j=1}^n aij uj = 0
=⇒ ±aii = ∑_{j≠i} uj aij
=⇒ |aii| ≤ ∑_{j≠i} |uj aij| ≤ ∑_{j≠i} |aij|
which again contradicts our assumption that A is strictly diagonally
dominant.
Hence strictly diagonally dominant matrices are non-singular. □
1.6.10. Gershgorin’s theorem
We are now ready to examine Gershgorin's theorem which provides very
useful bounds on the spectrum of a square matrix.
Theorem 1.94 Every eigen value λ of a square matrix A ∈ Cn×n
satisfies
|λ − aii| ≤ ∑_{j≠i} |aij| for some i ∈ {1, 2, . . . , n}. (1.6.23)
Proof. The proof is a straightforward application of the non-singularity
of strictly diagonally dominant matrices.
We know that for an eigen value λ, det(λI − A) = 0, i.e. the matrix
(λI − A) is singular. Hence it cannot be strictly diagonally dominant
due to theorem 1.93.
Thus looking at each row i of (λI − A) we can say that
|λ − aii| > ∑_{j≠i} |aij|
cannot be true for all rows simultaneously, i.e. it must fail for at least
one row. This means that there exists at least one row i for which
|λ − aii| ≤ ∑_{j≠i} |aij|
holds true. □
What this theorem means is pretty simple. Consider a disc in the
complex plane for the i-th row of A whose center is given by aii and
whose radius is given by r = ∑_{j≠i} |aij|, i.e. the sum of the magnitudes of
all non-diagonal entries in the i-th row.
There are n such discs corresponding to the n rows in A. (1.6.23) means
that every eigen value must lie within the union of these discs. It
cannot lie outside.
This idea is crystallized in the following definition.
Definition 1.37 [Gershgorin's disc] For the i-th row of the matrix A we
define the radius ri = ∑_{j≠i} |aij| and the center ci = aii. Then the
set given by
Di = {z ∈ C : |z − aii| ≤ ri}
is called the i-th Gershgorin's disc of A.
We note that the definition is equally valid for real as well as complex
matrices. For real matrices, the centers of disks lie on the real line. For
complex matrices, the centers may lie anywhere in the complex plane.
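A sketch computing the row discs and confirming that every eigen value falls inside their union (the random test matrix is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((5, 5))

    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)   # r_i = sum_{j != i} |a_ij|

    for lam in np.linalg.eigvals(A):
        # theorem 1.94: lam lies in at least one Gershgorin disc
        assert np.any(np.abs(lam - centers) <= radii + 1e-12)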
Clearly there is nothing magical about the rows of A. We can as well
consider the columns of A.
Theorem 1.95 Every eigen value of a matrix A must lie in a
Gershgorin disc corresponding to the columns of A, where the Gershgorin
disc for the j-th column is given by
Dj = {z ∈ C : |z − ajj| ≤ rj}
with
rj = ∑_{i≠j} |aij|.
Proof. We know that the eigen values of A are the same as the eigen values of
AT, and the columns of A are nothing but the rows of AT. Hence the eigen values of
A must satisfy the conditions of theorem 1.94 w.r.t. the matrix AT. This
completes the proof. □
1.7. Singular values
In the previous section we saw diagonalization of square matrices, which
resulted in an eigen value decomposition of the matrix. This matrix
factorization is very useful, yet it is not applicable in all situations. In
particular, the eigen value decomposition is useless if the square matrix
is not diagonalizable, or if the matrix is not square at all. Moreover,
the decomposition is particularly useful only for real symmetric or Hermitian
matrices, where the diagonalizing matrix is an F-unitary matrix
(see definition 1.23). Otherwise, one has to consider the inverse of the
diagonalizing matrix also.
Fortunately there happens to be another decomposition which applies
to all matrices and it involves just F-unitary matrices.
Definition 1.38 [Singular value] A non-negative real number σ is
a singular value for a matrix A ∈ Fm×n if and only if there exist
unit-length vectors u ∈ Fm and v ∈ Fn such that
Av = σu (1.7.1)
and
AHu = σv (1.7.2)
hold. The vectors u and v are called left-singular and right-
singular vectors for σ respectively.
We first present the basic result of singular value decomposition. We
will not prove this result completely although we will present proofs of
some aspects.
Theorem 1.96 For every A ∈ Fm×n with k = min(m,n), there
exist two F-unitary matrices U ∈ Fm×m and V ∈ Fn×n and a
sequence of real numbers
σ1 ≥ σ2 ≥ · · · ≥ σk ≥ 0
such that
UHAV = Σ (1.7.3)
where
Σ = diag(σ1, σ2, . . . , σk) ∈ Fm×n.
The non-negative real numbers σi are the singular values of A as
per definition 1.38.
The sequence of real numbers σi doesn’t depend on the particular
choice of U and V .
Σ is rectangular with the same size as A. The singular values of A lie
on the principal diagonal of Σ. All other entries in Σ are zero.
It is certainly possible that some of the singular values are 0 themselves.
Remark. Since UHAV = Σ hence
A = UΣV H . (1.7.4)
Definition 1.39 [Singular value decomposition] The decomposi-
tion of a matrix A ∈ Fm×n given by
A = UΣV H (1.7.5)
is known as its singular value decomposition.
Remark. When F is R then the decomposition simplifies to
UTAV = Σ (1.7.6)
and
A = UΣV T . (1.7.7)
Remark. Clearly there can be at most k = min(m,n) distinct singular
values of A.
Remark. We can also write
AV = UΣ. (1.7.8)
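As an illustrative sketch, NumPy's SVD returns U, the singular values and VH directly; reassembling the rectangular Σ recovers A (the random test matrix is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(10)
    A = rng.standard_normal((4, 6))

    U, s, Vh = np.linalg.svd(A)       # s holds sigma_1 >= ... >= sigma_k >= 0
    Sigma = np.zeros((4, 6))
    Sigma[:4, :4] = np.diag(s)        # rectangular, same shape as A

    assert np.allclose(A, U @ Sigma @ Vh)        # A = U Sigma V^H
    assert np.allclose(U.T @ U, np.eye(4))       # U is F-unitary (real orthogonal here)
    assert np.allclose(Vh @ Vh.T, np.eye(6))     # so is V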
Remark. Let us expand
A = UΣVH = [ u1 u2 . . . um ] [ σij ] [ v1^H
                                        v2^H
                                        . . .
                                        vn^H ] = ∑_{i=1}^m ∑_{j=1}^n σij ui vj^H.
Remark. Alternatively, let us expand
Σ = UHAV = [ u1^H
             u2^H
             . . .
             um^H ] A [ v1 v2 . . . vn ] = [ ui^H A vj ].
This gives us
σij = ui^H A vj. (1.7.9)
The following lemma verifies that Σ indeed consists of singular values of A
as per definition 1.38.
Lemma 1.97 Let A = UΣV H be a singular value decomposition
of A. Then the main diagonal entries of Σ are singular values.
The first k = min(m,n) column vectors in U and V are left and
right singular vectors of A.
Proof. We have
AV = UΣ.
Let us expand R.H.S.
UΣ =[∑m
j=1 uijσjk
]= [uikσk] =
[σ1u1 σ2u2 . . . σkuk 0 . . . 0
]where 0 columns in the end appear n− k times.
Expanding the L.H.S. we get
AV =[Av1 Av2 . . . Avn
].
Thus by comparing both sides we get
Avi = σiui for 1 ≤ i ≤ k
and
Avi = 0 for k < i ≤ n.
Now let us start with
A = UΣV H =⇒ AH = V ΣHUH =⇒ AHU = V ΣH .
Let us expand the R.H.S.:
V ΣH = [∑_{j=1}^{n} vij σjk] = [vik σk] = [σ1v1 σ2v2 . . . σkvk 0 . . . 0]
where the 0 columns appear m − k times.
Expanding the L.H.S. we get
AHU = [AHu1 AHu2 . . . AHum].
Comparing both sides we get
AHui = σivi for 1 ≤ i ≤ k
and
AHui = 0 for k < i ≤ m.
We now consider three cases.
For m = n, we have k = m = n, and we get
Avi = σiui, AHui = σivi for 1 ≤ i ≤ m.
Thus σi is a singular value of A, ui is a left singular vector and vi
is a right singular vector.
For m < n, we have k = m. For the first m vectors in V we get
Avi = σiui, AHui = σivi for 1 ≤ i ≤ m.
For the remaining n − m vectors in V , we have Avi = 0; they belong
to the null space of A.
For m > n, we have k = n. For the first n vectors in U we get
Avi = σiui, AHui = σivi for 1 ≤ i ≤ n.
For the remaining m − n vectors in U , we have AHui = 0. □
Lemma 1.98 ΣΣH is an m × m matrix given by
ΣΣH = diag(σ1², σ2², . . . , σk², 0, 0, . . . , 0)
where the number of 0’s following σk² is m − k.
Lemma 1.99 ΣHΣ is an n × n matrix given by
ΣHΣ = diag(σ1², σ2², . . . , σk², 0, 0, . . . , 0)
where the number of 0’s following σk² is n − k.
Lemma 1.100 [Rank and singular value decomposition] Let A ∈ Fm×n
have a singular value decomposition given by
A = UΣV H .
Then
rank(A) = rank(Σ). (1.7.10)
In other words, the rank of A is the number of non-zero singular values
of A. Since the singular values are ordered in descending order in Σ,
the first r singular values σ1, . . . , σr are the non-zero ones.
Proof. This is a straightforward application of lemma 1.6 and
lemma 1.7. Further, since the only non-zero entries in Σ appear on its
main diagonal, its rank is the number of non-zero singular values σi. □
Corollary 1.101. Let r = rank(A). Then Σ can be split as a block
matrix
Σ = [ Σr  0
       0  0 ] (1.7.11)
where Σr = diag(σ1, σ2, . . . , σr) is the r × r diagonal matrix of the
non-zero singular values. All other sub-matrices in Σ are 0.
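In floating point arithmetic the rank is computed by counting the singular values above a small tolerance; a minimal sketch (the tolerance choice here is an assumption, mirroring common practice):

```python
import numpy as np

def svd_rank(A, rtol=1e-10):
    """Rank of A as the number of singular values above a relative tolerance."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rtol * s[0]))

# Rank-1 example: outer product of two vectors.
A = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])
assert svd_rank(A) == 1 == np.linalg.matrix_rank(A)
```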
Lemma 1.102 The eigen values of the Hermitian matrix AHA ∈ Fn×n
are σ1², σ2², . . . , σk², 0, 0, . . . , 0 with n − k 0’s after σk². Moreover
the eigen vectors are the columns of V .
Proof.
AHA = (UΣV H)H UΣV H = V ΣHUHUΣV H = V ΣHΣV H .
We note that AHA is Hermitian. Hence AHA is diagonalized by V and
the diagonalization of AHA is ΣHΣ. Thus the eigen values of AHA are
σ1², σ2², . . . , σk², 0, 0, . . . , 0 with n − k 0’s after σk².
Clearly
(AHA)V = V (ΣHΣ)
thus the columns of V are the eigen vectors of AHA. □
Lemma 1.103 The eigen values of the Hermitian matrix AAH ∈ Fm×m
are σ1², σ2², . . . , σk², 0, 0, . . . , 0 with m − k 0’s after σk². Moreover
the eigen vectors are the columns of U .
Proof.
AAH = UΣV H (UΣV H)H = UΣV HV ΣHUH = UΣΣHUH .
We note that AAH is Hermitian. Hence AAH is diagonalized by U and
the diagonalization of AAH is ΣΣH . Thus the eigen values of AAH are
σ1², σ2², . . . , σk², 0, 0, . . . , 0 with m − k 0’s after σk².
Clearly
(AAH)U = U(ΣΣH)
thus the columns of U are the eigen vectors of AAH . □
Lemma 1.104 The Gram matrices AAH and AHA share the same
eigen values except for some extra 0’s. Their eigen values are the
squares of the singular values of A together with some extra 0’s. In
other words, the singular values of A are the square roots of the
non-zero eigen values of the Gram matrices AAH or AHA.
1.7.1. The largest singular value
Lemma 1.105 For all u ∈ Fn the following holds
‖Σu‖2 ≤ σ1‖u‖2 (1.7.12)
Moreover for all u ∈ Fm the following holds
‖ΣHu‖2 ≤ σ1‖u‖2 (1.7.13)
Proof. Let us expand the term Σu. Since Σ is diagonal with entries
σ1, . . . , σk on the main diagonal, we have
Σu = (σ1u1, σ2u2, . . . , σkuk, 0, . . . , 0)T .
Now since σ1 is the largest singular value, we have
|σiui| ≤ |σ1ui| ∀ 1 ≤ i ≤ k.
Thus
∑_{i=1}^{n} |σ1ui|² ≥ ∑_{i=1}^{n} |σiui|²
or
σ1²‖u‖2² ≥ ‖Σu‖2².
The result follows.
A simpler representation of Σu can be given using corollary 1.101. Let
r = rank(A). Thus
Σ = [ Σr  0
       0  0 ]
We split the entries in u as u = [(u1, . . . , ur), (ur+1, . . . , un)]T . Then
Σu = (σ1u1, σ2u2, . . . , σrur, 0, . . . , 0)T .
Thus
‖Σu‖2² = ∑_{i=1}^{r} |σiui|² ≤ σ1² ∑_{i=1}^{r} |ui|² ≤ σ1²‖u‖2².
The second result can be proven similarly. □
Lemma 1.106 Let σ1 be the largest singular value of an m × n
matrix A. Then
‖Ax‖2 ≤ σ1‖x‖2 ∀ x ∈ Fn. (1.7.14)
Moreover
‖AHx‖2 ≤ σ1‖x‖2 ∀ x ∈ Fm. (1.7.15)
Proof.
‖Ax‖2 = ‖UΣV Hx‖2 = ‖ΣV Hx‖2
since U is unitary. Now from the previous lemma we have
‖ΣV Hx‖2 ≤ σ1‖V Hx‖2 = σ1‖x‖2
since V H is also unitary. Thus we get the result
‖Ax‖2 ≤ σ1‖x‖2 ∀ x ∈ Fn.
Similarly
‖AHx‖2 = ‖V ΣHUHx‖2 = ‖ΣHUHx‖2
since V is unitary. Now from the previous lemma we have
‖ΣHUHx‖2 ≤ σ1‖UHx‖2 = σ1‖x‖2
since UH is also unitary. Thus we get the result
‖AHx‖2 ≤ σ1‖x‖2 ∀ x ∈ Fm. □
There is a direct connection between the largest singular value and the
2-norm of a matrix (see section 1.8.6).
Corollary 1.107. The largest singular value of A is nothing but its
2-norm, i.e.
σ1 = max_{‖u‖2=1} ‖Au‖2.
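A quick check with numpy (minimal sketch; random matrix; np.linalg.norm(A, 2) computes the spectral norm):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

sigma_1 = np.linalg.svd(A, compute_uv=False)[0]  # largest singular value
assert np.isclose(sigma_1, np.linalg.norm(A, 2)) # spectral norm of A

# sigma_1 also bounds the stretching of any vector: ||Ax||_2 <= sigma_1 ||x||_2
x = rng.standard_normal(3)
assert np.linalg.norm(A @ x) <= sigma_1 * np.linalg.norm(x) + 1e-12
```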
1.7.2. SVD and pseudo inverse
Lemma 1.108 [Pseudo-inverse of Σ] Let A = UΣV H and let r =
rank(A). Let σ1, . . . , σr be the r non-zero singular values of A.
Then the Moore-Penrose pseudo-inverse of Σ is an n × m matrix
Σ† given by
Σ† = [ Σr⁻¹  0
        0    0 ] (1.7.16)
where Σr = diag(σ1, . . . , σr).
Essentially Σ† is obtained by transposing Σ and inverting all its
non-zero (positive real) entries.
Proof. Straightforward application of lemma 1.32. □
Corollary 1.109. The rank of Σ and its pseudo-inverse Σ† are the
same, i.e.
rank(Σ) = rank(Σ†). (1.7.17)
Proof. The number of non-zero diagonal entries in Σ and Σ† is the
same. □
Lemma 1.110 Let A be an m× n matrix and let A = UΣV H be
its singular value decomposition. Let Σ† be the pseudo inverse of
Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of
A is given by
A† = V Σ†UH . (1.7.18)
Proof. As usual we verify the requirements for a Moore-Penrose
pseudo-inverse as per definition 1.19. We note that since Σ† is the
pseudo-inverse of Σ it already satisfies necessary criteria.
First requirement:
AA†A = UΣV HV Σ†UHUΣV H = UΣΣ†ΣV H = UΣV H = A.
Second requirement:
A†AA† = V Σ†UHUΣV HV Σ†UH = V Σ†ΣΣ†UH = V Σ†UH = A†.
We now consider
AA† = UΣV HV Σ†UH = UΣΣ†UH .
Thus
(AA†)H = (UΣΣ†UH)H = U(ΣΣ†)HUH = UΣΣ†UH = AA†
since ΣΣ† is Hermitian.
Finally we consider
A†A = V Σ†UHUΣV H = V Σ†ΣV H .
Thus
(A†A)H = (V Σ†ΣV H)H = V (Σ†Σ)HV H = V Σ†ΣV H = A†A
since Σ†Σ is also Hermitian.
This completes the proof. □
Finally we can connect the singular values of A with the singular values
of its pseudo-inverse.
Corollary 1.111. The rank of any m × n matrix A and its pseudo-
inverse A† are the same, i.e.
rank(A) = rank(A†). (1.7.19)
Proof. We have rank(A) = rank(Σ). Also it is easy to verify that
rank(A†) = rank(Σ†). Using corollary 1.109 completes the proof. □
Lemma 1.112 Let A be an m × n matrix and let A† be its n × m
pseudo-inverse as per lemma 1.110. Let k = min(m,n) denote the
number of singular values and let r = rank(A) denote the number of
non-zero singular values of A. Let σ1, . . . , σr be the non-zero
singular values of A. Then the number of singular values of A† is
the same as that of A and the non-zero singular values of A† are
1/σ1, . . . , 1/σr
while all other k − r singular values of A† are zero.
Proof. k = min(m,n) is the number of singular values for
both A and A†. Since the ranks of A and A† are the same, the number
of non-zero singular values is the same. Now look at
A† = V Σ†UH
where
Σ† = [ Σr⁻¹  0
        0    0 ].
Clearly Σr⁻¹ = diag(1/σ1, . . . , 1/σr).
Thus expanding the R.H.S. we get
A† = ∑_{i=1}^{r} (1/σi) vi uiH
where vi and ui are the first r columns of V and U respectively. If we
reverse the order of the first r columns of U and V and reverse the
first r diagonal entries of Σ†, the R.H.S. remains the same while we
are able to express A† in the standard singular value decomposition
form. Thus 1/σ1, . . . , 1/σr are indeed the non-zero singular values of
A†. □
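A minimal numerical illustration with numpy (arbitrary example matrix): the pseudo-inverse assembled from the SVD matches np.linalg.pinv, and its non-zero singular values are the reciprocals of those of A.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])          # rank 2, singular values {2, 1}

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

# A_dagger = V Sigma_dagger U^H, built from the reduced factors
A_dag = (Vh.conj().T[:, :r] / s[:r]) @ U[:, :r].conj().T
assert np.allclose(A_dag, np.linalg.pinv(A))

# non-zero singular values of A_dagger are 1/sigma_i
s_dag = np.linalg.svd(A_dag, compute_uv=False)
assert np.allclose(np.sort(s_dag[:r]), np.sort(1.0 / s[:r]))
```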
1.7.3. Full column rank matrices
In this subsection we consider some specific results related to singular
value decomposition of a full column rank matrix.
We will consider A to be an m × n matrix in Fm×n with m ≥ n and
rank(A) = n. Let A = UΣV H be its singular value decomposition.
From lemma 1.100 we observe that A has n non-zero singular values,
which we call σ1, σ2, . . . , σn. We define
Σn = diag(σ1, σ2, . . . , σn).
Clearly Σ is a 2 × 1 block matrix given by
Σ = [ Σn
       0 ]
where the lower 0 is an (m − n) × n zero matrix. From here we obtain
that ΣHΣ is an n × n matrix given by
ΣHΣ = Σn²
where
Σn² = diag(σ1², σ2², . . . , σn²).
Lemma 1.113 Let A be a full column rank matrix with singular
value decomposition A = UΣV H . Then ΣHΣ = Σn² = diag(σ1², σ2², . . . , σn²)
and ΣHΣ is invertible.
Proof. Since all singular values are non-zero, Σn² is invertible.
Thus
(ΣHΣ)⁻¹ = (Σn²)⁻¹ = diag(1/σ1², 1/σ2², . . . , 1/σn²). (1.7.20)
□
Lemma 1.114 Let A be a full column rank matrix with singular
value decomposition A = UΣV H . Let σ1 be its largest singular
value and σn be its smallest singular value. Then
σn²‖x‖2 ≤ ‖ΣHΣx‖2 ≤ σ1²‖x‖2 ∀ x ∈ Fn. (1.7.21)
Proof. Let x ∈ Fn. We have
‖ΣHΣx‖2² = ‖Σn²x‖2² = ∑_{i=1}^{n} |σi²xi|².
Now since
σn ≤ σi ≤ σ1
we have
σn⁴ ∑_{i=1}^{n} |xi|² ≤ ∑_{i=1}^{n} |σi²xi|² ≤ σ1⁴ ∑_{i=1}^{n} |xi|²
thus
σn⁴‖x‖2² ≤ ‖ΣHΣx‖2² ≤ σ1⁴‖x‖2².
Applying square roots, we get
σn²‖x‖2 ≤ ‖ΣHΣx‖2 ≤ σ1²‖x‖2 ∀ x ∈ Fn. □
We recall from corollary 1.25 that the Gram matrix of the columns
of A, G = AHA, is full rank and invertible.
Lemma 1.115 Let A be a full column rank matrix with singular
value decomposition A = UΣV H . Let σ1 be its largest singular
value and σn be its smallest singular value. Then
σn²‖x‖2 ≤ ‖AHAx‖2 ≤ σ1²‖x‖2 ∀ x ∈ Fn. (1.7.22)
Proof.
AHA = (UΣV H)H(UΣV H) = V ΣHΣV H .
Let x ∈ Fn. Let
u = V Hx =⇒ ‖u‖2 = ‖x‖2.
Let
r = ΣHΣu.
Then from the previous lemma we have
σn²‖u‖2 ≤ ‖ΣHΣu‖2 = ‖r‖2 ≤ σ1²‖u‖2.
Finally
AHAx = V ΣHΣV Hx = V r.
Thus
‖AHAx‖2 = ‖r‖2.
Substituting we get
σn²‖x‖2 ≤ ‖AHAx‖2 ≤ σ1²‖x‖2 ∀ x ∈ Fn. □
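A numerical check of these bounds (a minimal sketch; random full column rank matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))          # almost surely full column rank
s = np.linalg.svd(A, compute_uv=False)
s1, sn = s[0], s[-1]

x = rng.standard_normal(3)
lhs = np.linalg.norm(A.conj().T @ A @ x)
# sigma_n^2 ||x|| <= ||A^H A x|| <= sigma_1^2 ||x||
assert sn**2 * np.linalg.norm(x) <= lhs + 1e-12
assert lhs <= s1**2 * np.linalg.norm(x) + 1e-12
```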
There are bounds for the inverse of the Gram matrix also. First let us
establish the inverse of the Gram matrix.
Lemma 1.116 Let A be a full column rank matrix with singular
value decomposition A = UΣV H . Let the singular values of A be
σ1, . . . , σn. Let the Gram matrix of the columns of A be G = AHA.
Then
G⁻¹ = V ΨV H
where
Ψ = diag(1/σ1², 1/σ2², . . . , 1/σn²).
Proof. We have
G = V ΣHΣV H .
Thus
G⁻¹ = (V ΣHΣV H)⁻¹ = (V H)⁻¹(ΣHΣ)⁻¹V ⁻¹ = V (ΣHΣ)⁻¹V H .
From lemma 1.113 we have
Ψ = (ΣHΣ)⁻¹ = diag(1/σ1², 1/σ2², . . . , 1/σn²).
This completes the proof. □
We can now state the bounds:
Lemma 1.117 Let A be a full column rank matrix with singular
value decomposition A = UΣV H . Let σ1 be its largest singular
value and σn be its smallest singular value. Then
(1/σ1²)‖x‖2 ≤ ‖(AHA)⁻¹x‖2 ≤ (1/σn²)‖x‖2 ∀ x ∈ Fn. (1.7.23)
Proof. From lemma 1.116 we have
G⁻¹ = (AHA)⁻¹ = V ΨV H
where
Ψ = diag(1/σ1², 1/σ2², . . . , 1/σn²).
Let x ∈ Fn. Let
u = V Hx =⇒ ‖u‖2 = ‖x‖2.
Let
r = Ψu.
Then
‖r‖2² = ∑_{i=1}^{n} |ui/σi²|².
Thus
(1/σ1²)‖u‖2 ≤ ‖Ψu‖2 = ‖r‖2 ≤ (1/σn²)‖u‖2.
Finally
(AHA)⁻¹x = V ΨV Hx = V r.
Thus
‖(AHA)⁻¹x‖2 = ‖r‖2.
Substituting we get the result. □
1.7.4. Low rank approximation of a matrix
Definition 1.40 An m × n matrix A is called low rank if
rank(A) ≪ min(m,n). (1.7.24)
Remark. A matrix is low rank if the number of non-zero singular
values for the matrix is much smaller than its dimensions.
Following is a simple procedure for making a rank-r approximation
of a given matrix A; a sketch in code follows the list.
(1) Perform the singular value decomposition of A given by A =
UΣV H .
(2) Identify the singular values of A in Σ.
(3) Keep the first r singular values (where r ≪ min(m,n) is the
rank of the approximation) and set all other singular values
to 0 to obtain Σ̂.
(4) Compute Â = UΣ̂V H .
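A minimal sketch of this truncation with numpy (the choice of r and the test matrix are arbitrary):

```python
import numpy as np

def low_rank_approx(A, r):
    """Best rank-r approximation of A obtained by truncating its SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
A2 = low_rank_approx(A, 2)
assert np.linalg.matrix_rank(A2) == 2
```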
1.8. Matrix norms
This section reviews various matrix norms on the vector space of com-
plex matrices over the field of complex numbers (Cm×n,C).
We know (Cm×n,C) is a finite dimensional vector space with dimension
mn. We will usually refer to it as Cm×n.
Matrix norms will follow the usual definition of norms for a vector
space.
Definition 1.41 A function ‖ · ‖ : Cm×n → R is called a matrix
norm on Cm×n if for all A,B ∈ Cm×n and all α ∈ C it satisfies
the following
Positivity:
‖A‖ ≥ 0
with ‖A‖ = 0 ⇐⇒ A = 0.
Homogeneity:
‖αA‖ = |α|‖A‖.
Triangle inequality:
‖A+B‖ ≤ ‖A‖+ ‖B‖.
We recall some of the standard results on normed vector spaces.
All matrix norms are equivalent. Let ‖ · ‖ and ‖ · ‖′ be two different
matrix norms on Cm×n. Then there exist two positive constants a and
b such that
a‖A‖ ≤ ‖A‖′ ≤ b‖A‖ ∀ A ∈ Cm×n.
A matrix norm is a continuous function ‖ · ‖ : Cm×n → R.
1.8.1. Norms like lp on Cn
The following norms are analogous to the lp norms on the finite
dimensional complex vector space Cn. They arise from the fact that
the matrix vector space Cm×n is in one-to-one correspondence with the
complex vector space Cmn.
Definition 1.42 Let A ∈ Cm×n and A = [aij].
The matrix sum norm is defined as
‖A‖S = ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|. (1.8.1)
Definition 1.43 Let A ∈ Cm×n and A = [aij].
The matrix Frobenius norm is defined as
‖A‖F = (∑_{i=1}^{m} ∑_{j=1}^{n} |aij|²)^{1/2}. (1.8.2)
Definition 1.44 Let A ∈ Cm×n and A = [aij].
The matrix max norm is defined as
‖A‖M = max_{1≤i≤m, 1≤j≤n} |aij|. (1.8.3)
1.8.2. Properties of the Frobenius norm
We now prove some elementary properties of the Frobenius norm.
Lemma 1.118 The Frobenius norm of a matrix is equal to the
Frobenius norm of its Hermitian transpose.
‖AH‖F = ‖A‖F . (1.8.4)
Proof. Let A = [aij]. Then AH = [āji], so
‖AH‖F² = ∑_{j=1}^{n} ∑_{i=1}^{m} |aij|² = ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² = ‖A‖F².
Now
‖AH‖F² = ‖A‖F² =⇒ ‖AH‖F = ‖A‖F . □
Lemma 1.119 Let A ∈ Cm×n be written as a row of column vectors
A = [a1 . . . an].
Then
‖A‖F² = ∑_{j=1}^{n} ‖aj‖2². (1.8.5)
Proof. We note that
‖aj‖2² = ∑_{i=1}^{m} |aij|².
Now
‖A‖F² = ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² = ∑_{j=1}^{n} (∑_{i=1}^{m} |aij|²) = ∑_{j=1}^{n} ‖aj‖2². □
We have thus shown that the square of the Frobenius norm of a matrix
is the sum of squares of the l2 norms of its columns.
Lemma 1.120 Let A ∈ Cm×n be written as a column of row vectors
A = [ a1
      ...
      am ].
Then
‖A‖F² = ∑_{i=1}^{m} ‖ai‖2². (1.8.6)
Proof. We note that
‖ai‖2² = ∑_{j=1}^{n} |aij|².
Now
‖A‖F² = ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² = ∑_{i=1}^{m} ‖ai‖2². □
We now consider how the Frobenius norm is affected by the action
of unitary matrices.
Let A be an arbitrary matrix in Cm×n. Let U be a unitary matrix
in Cm×m and let V be a unitary matrix in Cn×n.
We present our first result that multiplication with unitary matrices
doesn’t change Frobenius norm of a matrix.
Theorem 1.121 The Frobenius norm of a matrix is invariant to
pre or post multiplication by a unitary matrix. i.e.
‖UA‖F = ‖A‖F (1.8.7)
and
‖AV ‖F = ‖A‖F . (1.8.8)
Proof. We can write A as
A = [a1 . . . an].
So
UA = [Ua1 . . . Uan].
Applying lemma 1.119,
‖UA‖F² = ∑_{j=1}^{n} ‖Uaj‖2².
But we know that unitary matrices are norm preserving. Hence
‖Uaj‖2² = ‖aj‖2².
Thus
‖UA‖F² = ∑_{j=1}^{n} ‖aj‖2² = ‖A‖F²
which implies
‖UA‖F = ‖A‖F .
Similarly writing A as
A = [ r1
      ...
      rm ]
we have
AV = [ r1V
       ...
       rmV ].
Applying lemma 1.120,
‖AV ‖F² = ∑_{i=1}^{m} ‖riV ‖2².
But we know that unitary matrices are norm preserving. Hence
‖riV ‖2² = ‖ri‖2².
Thus
‖AV ‖F² = ∑_{i=1}^{m} ‖ri‖2² = ‖A‖F²
which implies
‖AV ‖F = ‖A‖F .
An alternative approach for the second part of the proof, using the
first part, is just one line:
‖AV ‖F = ‖(AV )H‖F = ‖V HAH‖F = ‖AH‖F = ‖A‖F .
Here we use lemma 1.118 and the fact that V being unitary implies
that V H is also unitary; we have already shown that pre-multiplication
by a unitary matrix preserves the Frobenius norm. □
Theorem 1.122 Let A ∈ Cm×n and B ∈ Cn×P be two matrices.
Then the Frobenius norm of their product is less than or equal to
the product of the Frobenius norms of the matrices themselves, i.e.
‖AB‖F ≤ ‖A‖F ‖B‖F . (1.8.9)
Proof. We can write A as
A = [ aT1
      ...
      aTm ]
where ai are m column vectors whose transposes form the rows of A.
Similarly we can write B as
B = [b1 . . . bP ]
where bi are the column vectors of B. Then
AB = [ aT1
       ...
       aTm ] [b1 . . . bP ] = [aTi bj].
Now looking carefully,
aTi bj = 〈ai, bj〉.
Applying the Cauchy-Schwartz inequality we have
|〈ai, bj〉|² ≤ ‖ai‖2² ‖bj‖2².
Now
‖AB‖F² = ∑_{i=1}^{m} ∑_{j=1}^{P} |aTi bj|²
≤ ∑_{i=1}^{m} ∑_{j=1}^{P} ‖ai‖2² ‖bj‖2²
= (∑_{i=1}^{m} ‖ai‖2²)(∑_{j=1}^{P} ‖bj‖2²) = ‖A‖F² ‖B‖F²
which implies
‖AB‖F ≤ ‖A‖F ‖B‖F
by taking square roots on both sides. □
Corollary 1.123. Let A ∈ Cm×n and let x ∈ Cn. Then
‖Ax‖2 ≤ ‖A‖F ‖x‖2.
Proof. We note that the Frobenius norm of a column matrix is the
same as the l2 norm of the corresponding column vector, i.e.
‖x‖F = ‖x‖2 ∀ x ∈ Cn.
Now applying theorem 1.122 we have
‖Ax‖2 = ‖Ax‖F ≤ ‖A‖F ‖x‖F = ‖A‖F ‖x‖2 ∀ x ∈ Cn. □
It turns out that the Frobenius norm is intimately related to the
singular value decomposition of a matrix.
Lemma 1.124 Let A ∈ Cm×n. Let the singular value decomposition
of A be given by
A = UΣV H .
Let the singular values of A be σ1, . . . , σk with k = min(m,n). Then
‖A‖F = √(∑_{i=1}^{k} σi²). (1.8.10)
Proof.
A = UΣV H =⇒ ‖A‖F = ‖UΣV H‖F .
But
‖UΣV H‖F = ‖ΣV H‖F = ‖Σ‖F
since U and V are unitary matrices (see theorem 1.121).
Now the only non-zero entries in Σ are the singular values. Hence
‖A‖F = ‖Σ‖F = √(∑_{i=1}^{k} σi²). □
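A quick numerical confirmation (minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))

s = np.linalg.svd(A, compute_uv=False)
# ||A||_F equals the square root of the sum of squared singular values
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))
```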
1.8.3. Consistency of a matrix norm
Definition 1.45 A matrix norm ‖·‖ is called consistent on Cn×n
if
‖AB‖ ≤ ‖A‖‖B‖ (1.8.11)
holds true for all A,B ∈ Cn×n. A matrix norm ‖ · ‖ is called
consistent if it is defined on Cm×n for all m,n ∈ N and eq (1.8.11)
holds for all matrices A,B for which the product AB is defined.
A consistent matrix norm is also known as a sub-multiplicative
norm.
With this definition and the result in theorem 1.122, we can see that
the Frobenius norm is consistent.
1.8.4. Subordinate matrix norm
A matrix operates on vectors from one space to generate vectors in
another space. It is interesting to explore the connection between the
norm of a matrix and norms of vectors in the domain and co-domain
of a matrix.
Definition 1.46 Let m,n ∈ N be given. Let ‖ · ‖α be some norm
on Cm and ‖ · ‖β be some norm on Cn. Let ‖ · ‖ be some norm on
matrices in Cm×n. We say that ‖ · ‖ is subordinate to the vector
norms ‖ · ‖α and ‖ · ‖β if
‖Ax‖α ≤ ‖A‖‖x‖β (1.8.12)
for all A ∈ Cm×n and for all x ∈ Cn. In other words the length of
the vector doesn’t increase by the operation of A beyond a factor
given by the norm of the matrix itself.
If ‖ · ‖α and ‖ · ‖β are same then we say that ‖ · ‖ is subordinate
to the vector norm ‖ · ‖α.
We have shown earlier in corollary 1.123 that Frobenius norm is sub-
ordinate to Euclidean norm.
1.8.5. Operator norm
We now consider the maximum factor by which a matrix A can increase
the length of a vector.
Definition 1.47 Let m,n ∈ N be given. Let ‖ · ‖α be some norm
on Cn and ‖ · ‖β be some norm on Cm. For A ∈ Cm×n we define
‖A‖ ≜ ‖A‖α→β ≜ max_{x≠0} ‖Ax‖β / ‖x‖α. (1.8.13)
The ratio ‖Ax‖β / ‖x‖α represents the factor by which the length
of x is scaled by the operation of A; we simply pick the maximum
such scaling factor.
The norm so defined is known as the (α → β) operator norm,
the (α → β)-norm, or simply the α-norm if α = β.
Of course we need to verify that this definition satisfies all properties
of a norm.
Clearly if A = 0 then Ax = 0 always, hence ‖A‖ = 0.
Conversely, if ‖A‖ = 0 then ‖Ax‖β = 0 ∀ x ∈ Cn. In particular this is
true for the unit vectors ei ∈ Cn. The i-th column of A is given by Aei,
which is thus 0. Hence each column of A is 0 and A = 0.
Now consider c ∈ C:
‖cA‖ = max_{x≠0} ‖cAx‖β / ‖x‖α = |c| max_{x≠0} ‖Ax‖β / ‖x‖α = |c|‖A‖.
We now present some useful observations on the operator norm before
proving its triangle inequality.
For any x ∈ ker(A), Ax = 0, hence we only need to consider vectors
which do not belong to the kernel of A. Thus we can write
‖A‖α→β = max_{x∉ker(A)} ‖Ax‖β / ‖x‖α. (1.8.14)
We also note that
‖A(cx)‖β / ‖cx‖α = |c|‖Ax‖β / (|c|‖x‖α) = ‖Ax‖β / ‖x‖α ∀ c ≠ 0, x ≠ 0.
Thus, it is sufficient to take the maximum over unit norm vectors
(the denominator then drops out):
‖A‖α→β = max_{‖x‖α=1} ‖Ax‖β.
Lemma 1.125 The (α → β)-operator norm is subordinate to the
vector norms ‖ · ‖α and ‖ · ‖β, i.e.
‖Ax‖β ≤ ‖A‖α→β‖x‖α. (1.8.15)
Proof. For x = 0 the inequality is trivially satisfied. For x ≠ 0,
by definition we have
‖A‖α→β ≥ ‖Ax‖β / ‖x‖α =⇒ ‖A‖α→β‖x‖α ≥ ‖Ax‖β. □
Remark. There exists a vector x* ∈ Cn with unit norm (‖x*‖α = 1)
such that
‖A‖α→β = ‖Ax*‖β. (1.8.16)
Proof. Let x′ ≠ 0 be some vector which maximizes the expression
‖Ax‖β / ‖x‖α. Then
‖A‖α→β = ‖Ax′‖β / ‖x′‖α.
Now consider x* = x′/‖x′‖α. Thus ‖x*‖α = 1. We know that
‖Ax′‖β / ‖x′‖α = ‖Ax*‖β.
Hence
‖A‖α→β = ‖Ax*‖β. □
We are now ready to prove triangle inequality for operator norm.
Lemma 1.126 Operator norm as defined in definition 1.47 satis-
fies triangle inequality.
Proof. Let A and B be some matrices in Cm×n. Consider the
operator norm of matrix A + B. From previous remarks, there exists
some vector x∗ ∈ Cn with ‖x∗‖α = 1 such that
‖A+B‖ = ‖(A+B)x∗‖β.
Now
‖(A+B)x∗‖β = ‖Ax∗ +Bx∗‖β ≤ ‖Ax∗‖β + ‖Bx∗‖β.
From another remark we have
‖Ax∗‖β ≤ ‖A‖‖x∗‖α = ‖A‖
and
‖Bx∗‖β ≤ ‖B‖‖x∗‖α = ‖B‖
since ‖x∗‖α = 1.
Hence we have
‖A + B‖ ≤ ‖A‖ + ‖B‖. □
It turns out that operator norm is also consistent under certain condi-
tions.
Lemma 1.127 Let ‖ · ‖α be defined for all m ∈ N and let ‖ · ‖β =
‖ · ‖α. Then the operator norm
‖A‖α = max_{x≠0} ‖Ax‖α / ‖x‖α
is consistent.
Proof. We need to show that
‖AB‖α ≤ ‖A‖α‖B‖α.
Now
‖AB‖α = max_{x≠0} ‖ABx‖α / ‖x‖α.
We note that if Bx = 0, then ABx = 0. Hence we can rewrite this as
‖AB‖α = max_{Bx≠0} ‖ABx‖α / ‖x‖α.
Now if Bx ≠ 0 then ‖Bx‖α ≠ 0. Hence
‖ABx‖α / ‖x‖α = (‖ABx‖α / ‖Bx‖α) · (‖Bx‖α / ‖x‖α)
and
max_{Bx≠0} ‖ABx‖α / ‖x‖α ≤ (max_{Bx≠0} ‖ABx‖α / ‖Bx‖α)(max_{Bx≠0} ‖Bx‖α / ‖x‖α).
Clearly
‖B‖α = max_{Bx≠0} ‖Bx‖α / ‖x‖α.
Furthermore
max_{Bx≠0} ‖ABx‖α / ‖Bx‖α ≤ max_{y≠0} ‖Ay‖α / ‖y‖α = ‖A‖α.
Thus we have
‖AB‖α ≤ ‖A‖α‖B‖α. □
1.8.6. p-norm for matrices
We recall the definition of the lp norms for vectors x ∈ Cn:
‖x‖p = (∑_{i=1}^{n} |xi|^p)^{1/p} for p ∈ [1,∞), and ‖x‖∞ = max_{1≤i≤n} |xi|.
The operator norms ‖ · ‖p defined from lp vector norms are of specific
interest.
Definition 1.48 The p-norm for a matrix A ∈ Cm×n is defined as
‖A‖p ≜ max_{x≠0} ‖Ax‖p / ‖x‖p = max_{‖x‖p=1} ‖Ax‖p (1.8.17)
where ‖x‖p is the standard lp norm for vectors in Cn and Cm.
Remark. As per lemma 1.127, p-norms for matrices are consistent
norms. They are also subordinate to the lp vector norms.
Special cases are considered for p = 1, 2 and ∞.
Theorem 1.128 Let A ∈ Cm×n.
For p = 1 we have
‖A‖1 ≜ max_{1≤j≤n} ∑_{i=1}^{m} |aij|. (1.8.18)
This is also known as the max column sum norm.
For p = ∞ we have
‖A‖∞ ≜ max_{1≤i≤m} ∑_{j=1}^{n} |aij|. (1.8.19)
This is also known as the max row sum norm.
Finally for p = 2 we have
‖A‖2 ≜ σ1 (1.8.20)
where σ1 is the largest singular value of A. This is also known as
the spectral norm.
Proof. Let
A = [a1 . . . an].
Then
‖Ax‖1 = ‖∑_{j=1}^{n} xjaj‖1
≤ ∑_{j=1}^{n} ‖xjaj‖1
= ∑_{j=1}^{n} |xj| ‖aj‖1
≤ max_{1≤j≤n} ‖aj‖1 ∑_{j=1}^{n} |xj|
= max_{1≤j≤n} ‖aj‖1 ‖x‖1.
Thus,
‖A‖1 = max_{x≠0} ‖Ax‖1 / ‖x‖1 ≤ max_{1≤j≤n} ‖aj‖1
which is the maximum column sum. We need to show that this upper
bound is indeed attained.
Indeed for x = ej, where ej is the unit vector with 1 in the j-th entry
and 0 elsewhere,
‖Aej‖1 = ‖aj‖1.
Thus
‖A‖1 ≥ ‖aj‖1 ∀ 1 ≤ j ≤ n.
Combining the two, we see that
‖A‖1 = max_{1≤j≤n} ‖aj‖1.
For p = ∞, we proceed as follows:
‖Ax‖∞ = max_{1≤i≤m} |∑_{j=1}^{n} aijxj|
≤ max_{1≤i≤m} ∑_{j=1}^{n} |aij||xj|
≤ max_{1≤j≤n} |xj| max_{1≤i≤m} ∑_{j=1}^{n} |aij|
= ‖x‖∞ max_{1≤i≤m} ‖ai‖1
where ai are the rows of A. This shows that
‖A‖∞ ≤ max_{1≤i≤m} ‖ai‖1.
We need to show that this is indeed an equality.
Fix i = k and choose x such that
xj = sgn(akj).
Clearly ‖x‖∞ = 1. Then
‖Ax‖∞ = max_{1≤i≤m} |∑_{j=1}^{n} aijxj|
≥ |∑_{j=1}^{n} akjxj|
= ∑_{j=1}^{n} |akj|
= ‖ak‖1.
Thus,
‖A‖∞ ≥ max_{1≤i≤m} ‖ai‖1.
Combining the two inequalities we get
‖A‖∞ = max_{1≤i≤m} ‖ai‖1.
The remaining case is p = 2.
For any vector x with ‖x‖2 = 1,
‖Ax‖2 = ‖UΣV Hx‖2 = ‖U(ΣV Hx)‖2 = ‖ΣV Hx‖2
since the l2 norm is invariant to unitary transformations.
Let v = V Hx. Then ‖v‖2 = ‖V Hx‖2 = ‖x‖2 = 1. Now
‖Ax‖2 = ‖Σv‖2 = (∑_{j=1}^{n} |σjvj|²)^{1/2} ≤ σ1 (∑_{j=1}^{n} |vj|²)^{1/2} = σ1‖v‖2 = σ1.
This shows that
‖A‖2 ≤ σ1.
Now consider the vector x such that v = V Hx = (1, 0, . . . , 0). Then
‖Ax‖2 = ‖Σv‖2 = σ1.
Thus
‖A‖2 ≥ σ1.
Combining the two, we get ‖A‖2 = σ1. □
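These closed forms are easy to check numerically (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 5))

col_sum = np.abs(A).sum(axis=0).max()   # max column sum
row_sum = np.abs(A).sum(axis=1).max()   # max row sum
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]

assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))
assert np.isclose(sigma_1, np.linalg.norm(A, 2))
```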
1.8.7. The 2-norm
Theorem 1.129 Let A ∈ Cn×n have singular values σ1 ≥ σ2 ≥
· · · ≥ σn. Let the eigen values of A be λ1, λ2, . . . , λn with |λ1| ≥
|λ2| ≥ · · · ≥ |λn|. Then the following hold:
‖A‖2 = σ1 (1.8.21)
and if A is non-singular
‖A⁻¹‖2 = 1/σn. (1.8.22)
If A is symmetric and positive definite, then
‖A‖2 = λ1 (1.8.23)
and if A is non-singular
‖A⁻¹‖2 = 1/λn. (1.8.24)
If A is normal then
‖A‖2 = |λ1| (1.8.25)
and if A is non-singular
‖A⁻¹‖2 = 1/|λn|. (1.8.26)
1.8.8. Unitary invariant norms
Definition 1.49 A matrix norm ‖ · ‖ on Cm×n is called unitary
invariant if ‖UAV ‖ = ‖A‖ for any A ∈ Cm×n and any unitary
matrices U ∈ Cm×m and V ∈ Cn×n.
We have already seen in theorem 1.121 that the Frobenius norm is
unitary invariant. It turns out that the spectral norm is also unitary
invariant.
1.8.9. More properties of operator norms
In this section we will focus on operator norms connecting normed
linear spaces (Cn, ‖ · ‖p) and (Cm, ‖ · ‖q). Typical values of p, q would
be in {1, 2,∞}.
We recall that
‖A‖p→q = max_{x≠0} ‖Ax‖q / ‖x‖p = max_{‖x‖p=1} ‖Ax‖q = max_{‖x‖p≤1} ‖Ax‖q. (1.8.27)
Table 1 [5] shows how to compute different (p → q) norms. Some can
be computed easily while others are NP-hard to compute.
Table 1. Typical (p → q) norms
p    q    ‖A‖p→q     Calculation
1    1    ‖A‖1       Maximum l1 norm of a column
1    2    ‖A‖1→2     Maximum l2 norm of a column
1    ∞    ‖A‖1→∞     Maximum absolute entry of the matrix
2    1    ‖A‖2→1     NP-hard
2    2    ‖A‖2       Maximum singular value
2    ∞    ‖A‖2→∞     Maximum l2 norm of a row
∞    1    ‖A‖∞→1     NP-hard
∞    2    ‖A‖∞→2     NP-hard
∞    ∞    ‖A‖∞       Maximum l1 norm of a row
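The easily computable rows of the table translate directly to code (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

norm_1_to_2 = np.linalg.norm(A, axis=0).max()   # max l2 norm of a column
norm_2_to_inf = np.linalg.norm(A, axis=1).max() # max l2 norm of a row
norm_1_to_inf = np.abs(A).max()                 # max absolute entry

# sanity: ||A||_{1->2} is attained at a standard basis vector
j = np.argmax(np.linalg.norm(A, axis=0))
e = np.zeros(A.shape[1]); e[j] = 1.0
assert np.isclose(np.linalg.norm(A @ e), norm_1_to_2)
```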
The topological dual of the finite dimensional normed linear space
(Cn, ‖ · ‖p) is the normed linear space (Cn, ‖ · ‖p′) where
1/p + 1/p′ = 1.
The l2-norm is its own dual: it is self dual. The l1-norm and the
l∞-norm are dual to each other.
When a matrix A maps from the space (Cn, ‖ · ‖p) to the space
(Cm, ‖ · ‖q), we can view its conjugate transpose AH as a mapping
from the space (Cm, ‖ · ‖q′) to (Cn, ‖ · ‖p′).
Theorem 1.130 The operator norm of a matrix always equals the
operator norm of its conjugate transpose, i.e.
‖A‖p→q = ‖AH‖q′→p′ (1.8.28)
where
1/p + 1/p′ = 1, 1/q + 1/q′ = 1.
Specific applications of this result are:
‖A‖2 = ‖AH‖2. (1.8.29)
This is obvious since the maximum singular value of a matrix and of
its conjugate transpose are the same.
‖A‖1 = ‖AH‖∞, ‖A‖∞ = ‖AH‖1. (1.8.30)
This is also obvious since the max column sum of A is the same as the
max row sum of AH and vice versa.
‖A‖1→∞ = ‖AH‖1→∞. (1.8.31)
‖A‖1→2 = ‖AH‖2→∞. (1.8.32)
‖A‖∞→2 = ‖AH‖2→1. (1.8.33)
It remains to show the result for the general case (arbitrary 1 ≤ p, q ≤ ∞).
Proof. TODO □
Theorem 1.131
‖A‖1→p = max_{1≤j≤n} ‖aj‖p (1.8.34)
where
A = [a1 . . . an].
Proof.
‖Ax‖p = ‖∑_{j=1}^{n} xjaj‖p
≤ ∑_{j=1}^{n} ‖xjaj‖p
= ∑_{j=1}^{n} |xj| ‖aj‖p
≤ max_{1≤j≤n} ‖aj‖p ∑_{j=1}^{n} |xj|
= max_{1≤j≤n} ‖aj‖p ‖x‖1.
Thus,
‖A‖1→p = max_{x≠0} ‖Ax‖p / ‖x‖1 ≤ max_{1≤j≤n} ‖aj‖p.
We need to show that this upper bound is indeed attained.
Indeed for x = ej, where ej is the unit vector with 1 in the j-th entry
and 0 elsewhere,
‖Aej‖p = ‖aj‖p.
Thus
‖A‖1→p ≥ ‖aj‖p ∀ 1 ≤ j ≤ n.
Combining the two, we see that
‖A‖1→p = max_{1≤j≤n} ‖aj‖p. □
Theorem 1.132
‖A‖p→∞ = max_{1≤i≤m} ‖ai‖q (1.8.35)
where
1/p + 1/q = 1
and ai are the rows of A.
Proof. Using theorem 1.130, we get
‖A‖p→∞ = ‖AH‖1→q.
Using theorem 1.131, we get
‖AH‖1→q = max_{1≤i≤m} ‖ai‖q.
This completes the proof. □
Theorem 1.133 For two matrices A and B and p, q, s ≥ 1, we have
‖AB‖p→q ≤ ‖B‖p→s‖A‖s→q. (1.8.36)
Proof. We start with
‖AB‖p→q = max_{‖x‖p=1} ‖A(Bx)‖q.
From lemma 1.125, we obtain
‖A(Bx)‖q ≤ ‖A‖s→q‖Bx‖s.
Thus,
‖AB‖p→q ≤ ‖A‖s→q max_{‖x‖p=1} ‖Bx‖s = ‖A‖s→q‖B‖p→s. □
Theorem 1.134 For two matrices A and B and p ≥ 1, we have
‖AB‖p→∞ ≤ ‖A‖∞→∞‖B‖p→∞. (1.8.37)
Proof. We start with
‖AB‖p→∞ = max_{‖x‖p=1} ‖A(Bx)‖∞.
From lemma 1.125, we obtain
‖A(Bx)‖∞ ≤ ‖A‖∞→∞‖Bx‖∞.
Thus,
‖AB‖p→∞ ≤ ‖A‖∞→∞ max_{‖x‖p=1} ‖Bx‖∞ = ‖A‖∞→∞‖B‖p→∞. □
Theorem 1.135
‖A‖p→∞ ≤ ‖A‖p→p. (1.8.38)
In particular
‖A‖1→∞ ≤ ‖A‖1. (1.8.39)
‖A‖2→∞ ≤ ‖A‖2. (1.8.40)
Proof. Choosing q = ∞ and s = p and applying theorem 1.133,
‖IA‖p→∞ ≤ ‖A‖p→p‖I‖p→∞.
But ‖I‖p→∞ is the maximum norm of any row of I, which is 1. Thus
‖A‖p→∞ ≤ ‖A‖p→p. □
Consider the expression
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p. (1.8.41)
Here z ∈ C(AH), z ≠ 0 means there exists some vector u ∉ ker(AH)
such that z = AHu.
This expression measures the factor by which the non-singular part of
A can decrease the length of a vector.
Theorem 1.136 [5] The following bound holds for every matrix A:
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p ≥ 1/‖A†‖q→p. (1.8.42)
If A is surjective (onto), then equality holds. When A is bijective
(one-one onto, square, invertible), the result implies
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p = 1/‖A⁻¹‖q→p. (1.8.43)
Proof. The spaces C(AH) and C(A) have the same dimension, given
by rank(A). We recall that A†A is a projector onto the row space
C(AH), so
w = Az ⇐⇒ z = A†w = A†Az ∀ z ∈ C(AH).
As a result we can write
‖z‖p / ‖Az‖q = ‖A†w‖p / ‖w‖q
whenever z ∈ C(AH). Now
(min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p)⁻¹ = max_{z∈C(AH), z≠0} ‖z‖p / ‖Az‖q
= max_{w∈C(A), w≠0} ‖A†w‖p / ‖w‖q ≤ max_{w≠0} ‖A†w‖p / ‖w‖q.
When A is surjective, then C(A) = Cm. Hence
max_{w∈C(A), w≠0} ‖A†w‖p / ‖w‖q = max_{w≠0} ‖A†w‖p / ‖w‖q.
Thus, the inequality changes into equality. Finally
max_{w≠0} ‖A†w‖p / ‖w‖q = ‖A†‖q→p
which completes the proof. □
1.8.10. Row column norms
Definition 1.50 Let A be an m × n matrix with rows ai:
A = [ a1
      ...
      am ].
Then we define
‖A‖p,∞ ≜ max_{1≤i≤m} ‖ai‖p = max_{1≤i≤m} (∑_{j=1}^{n} |aij|^p)^{1/p} (1.8.44)
where 1 ≤ p < ∞, i.e. we take the p-norm of every row vector and
then take the maximum.
We define
‖A‖∞,∞ = max_{i,j} |aij|. (1.8.45)
This is equivalent to taking the l∞ norm on each row and then taking
the maximum of all the norms.
For 1 ≤ p, q < ∞, we define the norm
‖A‖p,q ≜ [∑_{i=1}^{m} (‖ai‖p)^q]^{1/q}, (1.8.46)
i.e., we compute the p-norm of all the row vectors to form another
vector and then take the q-norm of that vector.
Note that the norm ‖A‖p,∞ is different from the operator norm ‖A‖p→∞.
Similarly ‖A‖p,q is different from ‖A‖p→q.
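A minimal sketch of these row-column norms (following the definitions above; the example matrix is arbitrary):

```python
import numpy as np

def norm_pq(A, p, q):
    """Row-column norm ||A||_{p,q}: p-norm of each row, then q-norm of the result."""
    row_norms = np.linalg.norm(A, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
# ||A||_{2,inf}: maximum l2 norm of a row, here sqrt(3^2 + 4^2) = 5
assert np.isclose(norm_pq(A, 2, np.inf), 5.0)
```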
Theorem 1.137
‖A‖p,∞ = ‖A‖q→∞ (1.8.47)
where
1/p + 1/q = 1.
Proof. From theorem 1.132 we get
‖A‖q→∞ = max_{1≤i≤m} ‖ai‖p.
This is exactly the definition of ‖A‖p,∞. □
Theorem 1.138
‖A‖1→p = ‖AH‖p,∞. (1.8.48)
Proof. From theorem 1.130,
‖A‖1→p = ‖AH‖q→∞
where 1/p + 1/q = 1. From theorem 1.137,
‖AH‖q→∞ = ‖AH‖p,∞. □
Theorem 1.139 For any two matrices A, B, we have
‖AB‖p,∞ / ‖B‖p,∞ ≤ ‖A‖∞→∞. (1.8.49)
Proof. Let q be such that 1/p + 1/q = 1. From theorem 1.134, we
have
‖AB‖q→∞ ≤ ‖A‖∞→∞‖B‖q→∞.
From theorem 1.137,
‖AB‖q→∞ = ‖AB‖p,∞
and
‖B‖q→∞ = ‖B‖p,∞.
Thus
‖AB‖p,∞ ≤ ‖A‖∞→∞‖B‖p,∞. □
Theorem 1.140 Relations between (p, q) norms and (p → q) norms:
‖A‖1,∞ = ‖A‖∞→∞ (1.8.50)
‖A‖2,∞ = ‖A‖2→∞ (1.8.51)
‖A‖∞,∞ = ‖A‖1→∞ (1.8.52)
‖A‖1→1 = ‖AH‖1,∞ (1.8.53)
‖A‖1→2 = ‖AH‖2,∞ (1.8.54)
Proof. The first three are straightforward applications of theorem
1.137. The next two are applications of theorem 1.138. See also
table 1. □
1.8.11. Block diagonally dominant matrices and generalized
Gershgorin disc theorem
In [1] the idea of diagonally dominant matrices (see section 1.6.9) has
been generalized to block matrices using matrix norms. We consider
the specific case with the spectral norm.
Definition 1.51 [Block diagonally dominant matrix] Let A be a
square matrix in Cn×n which is partitioned in the following manner
A = [ A11 A12 . . . A1k
      A21 A22 . . . A2k
      ...
      Ak1 Ak2 . . . Akk ] (1.8.56)
where each of the submatrices Aij is a square matrix of size m × m.
Thus n = km.
A is called block diagonally dominant if (with the diagonal blocks
Aii nonsingular)
1/‖Aii⁻¹‖2 ≥ ∑_{j≠i} ‖Aij‖2
holds true for all 1 ≤ i ≤ k. If the inequality is strict for all i,
then A is called a block strictly diagonally dominant matrix.
Theorem 1.141 If the partitioned matrix A of definition 1.51 is
block strictly diagonally dominant, then it is nonsingular.
For a proof see [1].
This leads to the generalized Gershgorin disc theorem.
Theorem 1.142 Let A be a square matrix in Cn×n which is
partitioned in the following manner
A = [ A11 A12 . . . A1k
      A21 A22 . . . A2k
      ...
      Ak1 Ak2 . . . Akk ] (1.8.57)
where each of the submatrices Aij is a square matrix of size m × m.
Then each eigen value λ of A satisfies
1/‖(λI − Aii)⁻¹‖2 ≤ ∑_{j≠i} ‖Aij‖2 for some i ∈ {1, 2, . . . , k}, (1.8.58)
with the left hand side read as 0 when λ is an eigen value of Aii.
For a proof see [1].
Since the 2-norm of a Hermitian positive semidefinite matrix is its
largest eigen value, the theorem applies directly.
Corollary 1.143. Let A be a Hermitian positive semidefinite matrix,
partitioned as in theorem 1.142. Then its 2-norm ‖A‖2 satisfies
|‖A‖2 − ‖Aii‖2| ≤ ∑_{j≠i} ‖Aij‖2 for some i ∈ {1, 2, . . . , k}. (1.8.59)
1.9. Miscellaneous topics
1.9.1. Hadamard product
Standard linear algebra books usually don't dwell much on element-
wise or component-wise products of vectors or matrices. Yet in certain
contexts and algorithms these are quite useful. We define the notation
in this section. For further details see [3], [2] and [4].
Definition 1.52 The Hadamard product of two matrices A =
[aij] and B = [bij] with the same dimensions (not necessarily square)
with entries in a given ring R is the entry-wise product A ◦ B ≡
[aij bij], which has the same dimensions as A and B.
Example 1.3: Hadamard product. Let
A = [ 1 2
      3 4 ]  and  B = [ 5 −6
                        7 −3 ].
Then
A ◦ B = [ 5  −12
          21 −12 ]. □
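In numpy the Hadamard product is simply the * operator on arrays (a minimal sketch reproducing the example above):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, -6], [7, -3]])

H = A * B   # entry-wise (Hadamard) product, not the matrix product A @ B
assert (H == np.array([[5, -12], [21, -12]])).all()
```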
The Hadamard product is associative, distributive over addition, and
commutative.
Naturally it can also be defined for column vectors and row vectors.
The reason why this product is not mentioned in linear algebra texts
is that it is inherently basis dependent. Nevertheless it has a number
of uses in statistics and analysis.
In analysis, a similar concept is the point-wise product of functions,
defined by
(f · g)(x) = f(x)g(x).
1.10. Digest
1.10.1. Norms
All matrix norms are equivalent.
Sum norm:
‖A‖S = ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|.
Frobenius norm:
‖A‖F = (∑_{i=1}^{m} ∑_{j=1}^{n} |aij|²)^{1/2}.
Max norm:
‖A‖M = max_{1≤i≤m, 1≤j≤n} |aij|.
Frobenius norm of Hermitian transpose:
‖AH‖F = ‖A‖F .
Frobenius norm as sum of norms of column vectors:
‖A‖F² = ∑_{j=1}^{n} ‖aj‖2².
Frobenius norm as sum of norms of row vectors:
‖A‖F² = ∑_{i=1}^{m} ‖ai‖2².
Frobenius norm invariance w.r.t. unitary matrices:
‖UA‖F = ‖A‖F , ‖AV ‖F = ‖A‖F .
Frobenius norm is consistent:
‖AB‖F ≤ ‖A‖F ‖B‖F .
Frobenius norm is subordinate to the l2 norm (corollary 1.123):
‖Ax‖2 ≤ ‖A‖F ‖x‖2.
Frobenius norm and singular values:
‖A‖F = √(∑_{i} σi²).
Consistent norms:
‖AB‖ ≤ ‖A‖‖B‖,
also known as sub-multiplicative norms.
Subordinate matrix norm:
‖Ax‖α ≤ ‖A‖‖x‖β.
(α → β) operator norm:
‖A‖ ≜ ‖A‖α→β ≜ max_{x≠0} ‖Ax‖β / ‖x‖α
= max_{x∉ker(A)} ‖Ax‖β / ‖x‖α = max_{‖x‖α=1} ‖Ax‖β.
The (α → β) norm is subordinate:
‖Ax‖β ≤ ‖A‖α→β‖x‖α.
There exists a unit norm vector x* such that
‖A‖α→β = ‖Ax*‖β.
(α → α)-norms are consistent:
‖A‖α = max_{x≠0} ‖Ax‖α / ‖x‖α, ‖AB‖α ≤ ‖A‖α‖B‖α.
p-norm:
‖A‖p ≜ max_{x≠0} ‖Ax‖p / ‖x‖p = max_{‖x‖p=1} ‖Ax‖p.
Closed form p-norms:
‖A‖1 ≜ max_{1≤j≤n} ∑_{i=1}^{m} |aij|.
‖A‖∞ ≜ max_{1≤i≤m} ∑_{j=1}^{n} |aij|.
2-norm:
‖A‖2 ≜ σ1;
if A is non-singular,
‖A⁻¹‖2 = 1/σn.
If A is symmetric and positive definite:
‖A‖2 = λ1;
if non-singular,
‖A⁻¹‖2 = 1/λn.
If A is normal:
‖A‖2 = |λ1|;
if non-singular,
‖A⁻¹‖2 = 1/|λn|.
Unitary invariant norm: ‖UAV ‖ = ‖A‖ for any A ∈ Cm×n and any
unitary U and V .
Typical (p → q) norms: see table 1.
Dual norm and conjugate transpose:
‖A‖p→q = ‖AH‖q′→p′ with 1/p + 1/p′ = 1, 1/q + 1/q′ = 1.
‖A‖2 = ‖AH‖2.
‖A‖1 = ‖AH‖∞, ‖A‖∞ = ‖AH‖1.
‖A‖1→∞ = ‖AH‖1→∞.
‖A‖1→2 = ‖AH‖2→∞.
‖A‖∞→2 = ‖AH‖2→1.
(1 → p) norm:
‖A‖1→p = max_{1≤j≤n} ‖aj‖p.
(p → ∞) norm:
‖A‖p→∞ = max_{1≤i≤m} ‖ai‖q with 1/p + 1/q = 1.
Consistency of the (p → q) norm:
‖AB‖p→q ≤ ‖B‖p→s‖A‖s→q.
Consistency of the (p → ∞) norm:
‖AB‖p→∞ ≤ ‖A‖∞→∞‖B‖p→∞.
Dominance of the (p → ∞) norm by the (p → p) norm:
‖A‖p→∞ ≤ ‖A‖p→p.
‖A‖1→∞ ≤ ‖A‖1.
‖A‖2→∞ ≤ ‖A‖2.
Restricted minimum property:
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p ≥ 1/‖A†‖q→p.
If A is surjective (onto), equality holds. When A is bijective,
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p = 1/‖A⁻¹‖q→p.
Row column norms:
‖A‖p,∞ ≜ max_{1≤i≤m} ‖ai‖p = max_{1≤i≤m} (∑_{j=1}^{n} |aij|^p)^{1/p}.
‖A‖∞,∞ = max_{i,j} |aij|.
‖A‖p,q ≜ [∑_{i=1}^{m} (‖ai‖p)^q]^{1/q}.
Row column norm and (p → ∞) norm:
‖A‖p,∞ = ‖A‖q→∞ with 1/p + 1/q = 1.
Consistency of the (p,∞) norm:
‖AB‖p,∞ ≤ ‖A‖∞→∞‖B‖p,∞.
Relations between (p, q) norms and (p → q) norms:
‖A‖1,∞ = ‖A‖∞→∞
‖A‖2,∞ = ‖A‖2→∞
‖A‖∞,∞ = ‖A‖1→∞
‖A‖1→1 = ‖AH‖1,∞
‖A‖1→2 = ‖AH‖2,∞
Bibliography
[1] David G. Feingold and Richard S. Varga. Block diagonally dominant
matrices and generalizations of the Gerschgorin circle theorem.
Pacific J. Math, 12(4):1241–1250, 1962.
[2] Roger A. Horn. The Hadamard product. In Proc. Symp. Appl.
Math, volume 40, pages 87–169, 1990.
[3] Elizabeth Million. The Hadamard product, 2007.
[4] George P. H. Styan. Hadamard products and multivariate statistical
analysis. Linear Algebra and Its Applications, 6:217–240, 1973.
[5] Joel A. Tropp. Just relax: Convex programming methods for subset
selection and sparse approximation. 2004.
Index
F-unitary matrix, 25
p-norm for matrices, 82
Algebraic multiplicity, 30
Block diagonal matrix, 6
Block diagonally dominant matrix, 95
Block matrix, 4
Block strictly diagonally dominant matrix, 95
Characteristic equation, 29
Characteristic polynomial, 29
Characteristic value, 27
Column rank, 7
Column space, 6
Consistent matrix norm, 77
Diagonal matrix, 2
Diagonalizable, 41
Diagonally dominant matrix, 49
Eigen space, 28
Eigen value, 27
Eigen value decomposition of a Hermitian matrix, 47
Eigen vector, 27
Element-wise product, 97
Frobenius norm on matrix, 70
Full rank matrix, 7
Geometric multiplicity, 29
Gershgorin’s disc, 53
Gershgorin’s theorem, 52
Gram matrix of columns of a matrix, 13
Gram matrix of rows of a matrix, 13
Hadamard product, 97
Invariant subspace, 34
Inverse of a matrix, 8
Invertible matrix, 8
Latent value, 27
Left singular vector, 54
Low rank approximation, 69
Low rank matrix, 69
Main diagonal, 2
Matrix p-norm, 82
Matrix norm, 70
Max column sum norm, 82
Max norm on matrix, 71
Max row sum norm, 82
Moore-Penrose pseudo-inverse, 16
Multiplication of block matrices, 5
Off diagonal, 2
Operator norm, 78
Orthogonal matrix, 23
Orthogonally diagonalizable matrix, 45
Partitioned matrix, 4
Proper value, 27
Rank, 7
Rectangular diagonal matrix, 2
Right singular vector, 54
Row column norms, 92
Row rank, 7
Row space, 6
Similar matrices, 12
Singular value, 54
Singular value decomposition, 55
Spectral norm, 82
Spectrum of a matrix, 27
Square matrix, 1
Strictly diagonally dominant matrix, 49
Sub-multiplicative norm, 77
Subordinate matrix norm, 77
Sum norm on matrix, 70
Tall matrix, 1
Trace, 19
Unitary diagonalizable matrix, 47
Unitary invariant matrix norm, 86
Unitary matrix, 24
Wide matrix, 2