ICS 6N Computational Linear Algebra
Symmetric Matrices and Orthogonal Diagonalization
Xiaohui Xie
University of California, Irvine
Xiaohui Xie (UCI) ICS 6N 1 / 21
Symmetric matrices
An n × n matrix A is symmetric if A^T = A.
Componentwise: A is symmetric if
a_ij = a_ji for i, j = 1, 2, . . . , n
Matrix Diagonalization
Matrix A is diagonalizable if there exist an invertible matrix P and a diagonal matrix Λ such that
A = PΛP^−1
If A can be diagonalized, then A^k = PΛ^k P^−1.
Not all matrices can be diagonalized.
An n × n matrix can be diagonalized if and only if it has n linearly independent eigenvectors.
Some special cases:
If an n × n matrix A has n distinct eigenvalues, then it is diagonalizable.
If A is symmetric, then it is diagonalizable.
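As a numerical check (not part of the original slides), NumPy's `eig` can diagonalize a small matrix and confirm A^k = PΛ^k P^−1. The 2 × 2 matrix below is a hypothetical example with distinct eigenvalues:

```python
import numpy as np

# A hypothetical diagonalizable matrix with distinct eigenvalues (5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix P whose columns are eigenvectors.
eigvals, P = np.linalg.eig(A)
Lam = np.diag(eigvals)

# A = P Lam P^{-1}
assert np.allclose(A, P @ Lam @ np.linalg.inv(P))

# A^k = P Lam^k P^{-1}, computed here for k = 5.
k = 5
Ak = P @ np.diag(eigvals**k) @ np.linalg.inv(P)
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```

Raising Λ to the k-th power only requires powering the diagonal entries, which is why this factorization makes repeated multiplication cheap.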
Diagonalization of symmetric matrices
Example: diagonalize the matrix
A = [  6  −2  −1 ]
    [ −2   6  −1 ]
    [ −1  −1   5 ]
Characteristic equation of A is
0 = −λ^3 + 17λ^2 − 90λ + 144 = −(λ − 8)(λ − 6)(λ − 3)
so we have three distinct eigenvalues λ1 = 8, λ2 = 6, λ3 = 3.
Find the corresponding eigenvectors:

v1 = (−1, 1, 0)^T,  v2 = (−1, −1, 2)^T,  v3 = (1, 1, 1)^T

Note that v1^T v2 = 0, v1^T v3 = 0, v2^T v3 = 0, i.e., the eigenvectors are mutually orthogonal.
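These eigenvalue and orthogonality claims can be checked numerically; a small NumPy sketch using the matrix and vectors from the slide:

```python
import numpy as np

A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

v1 = np.array([-1.0,  1.0, 0.0])  # eigenvector for eigenvalue 8
v2 = np.array([-1.0, -1.0, 2.0])  # eigenvector for eigenvalue 6
v3 = np.array([ 1.0,  1.0, 1.0])  # eigenvector for eigenvalue 3

# Each v_i satisfies A v = lambda v.
assert np.allclose(A @ v1, 8 * v1)
assert np.allclose(A @ v2, 6 * v2)
assert np.allclose(A @ v3, 3 * v3)

# Eigenvectors for distinct eigenvalues of a symmetric matrix are orthogonal.
assert np.isclose(v1 @ v2, 0)
assert np.isclose(v1 @ v3, 0)
assert np.isclose(v2 @ v3, 0)
```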
Diagonalization of symmetric matrices
Example: diagonalize the matrix
A = [  6  −2  −1 ]
    [ −2   6  −1 ]
    [ −1  −1   5 ]
Further normalize the eigenvectors to be unit vectors:
u1 = (−1/√2, 1/√2, 0)^T,  u2 = (−1/√6, −1/√6, 2/√6)^T,  u3 = (1/√3, 1/√3, 1/√3)^T
Let
P = [ −1/√2  −1/√6  1/√3 ]
    [  1/√2  −1/√6  1/√3 ]
    [    0    2/√6  1/√3 ]
D = [ 8  0  0 ]
    [ 0  6  0 ]
    [ 0  0  3 ]
A = PDP^T, since P is an orthogonal matrix (P^−1 = P^T).
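A quick NumPy verification of this factorization, using exactly the matrices from the slide:

```python
import numpy as np

s2, s3, s6 = np.sqrt(2.0), np.sqrt(3.0), np.sqrt(6.0)
P = np.array([[-1/s2, -1/s6, 1/s3],
              [ 1/s2, -1/s6, 1/s3],
              [ 0.0,   2/s6, 1/s3]])
D = np.diag([8.0, 6.0, 3.0])
A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

# P is orthogonal: P^T P = I, so P^{-1} = P^T.
assert np.allclose(P.T @ P, np.eye(3))

# Orthogonal diagonalization: A = P D P^T.
assert np.allclose(A, P @ D @ P.T)
```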
Spectral theorem
If A is an n × n symmetric matrix
1 All eigenvalues of A are real.
2 A has exactly n real eigenvalues (counting multiplicity), but this doesn't mean they are distinct.
3 The geometric multiplicity of λ, dim(Null(A − λI)), equals the algebraic multiplicity of λ.
4 The eigenspaces are mutually orthogonal: if λ1 ≠ λ2 are two distinct eigenvalues, then their corresponding eigenvectors v1, v2 are orthogonal.
Proof
1 Let λ be an eigenvalue of A with corresponding eigenvector x, so Ax = λx and, taking complex conjugates (A is real), Ax* = λ*x*. Then
λ* x^T x* = x^T A x* = (Ax)^T x* = λ x^T x*.
Since x^T x* = Σ_i |x_i|^2 > 0, this forces λ* = λ, so λ is real.
2 Let x1 and x2 be two eigenvectors corresponding to two distinct eigenvalues λ1 and λ2.
x1^T A x2 = (x1^T A x2)^T = x2^T A^T (x1^T)^T = x2^T A x1
=⇒ λ2 x1^T x2 = λ1 x1^T x2 =⇒ (λ1 − λ2)(x1^T x2) = 0
Since λ1 ≠ λ2, x1^T x2 = 0, so they are orthogonal.
Orthogonal diagonalization
If an n × n matrix A is symmetric, its eigenvectors v1, · · · , vn can be chosen to be orthonormal.
If A has n distinct eigenvalues, then the n eigenvectors are orthogonal; normalize these vectors to make them orthonormal.
If an eigenvalue λ has multiplicity greater than 1, find an orthonormal basis of the corresponding eigenspace, Null(A − λI), and use the vectors in this basis as eigenvectors.
In this case, P = [v1 v2 . . . vn] is an orthogonal matrix, that is, P^−1 = P^T.
And A can be orthogonally diagonalized:
A = PΛP^T
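In practice, `numpy.linalg.eigh` performs this orthogonal diagonalization directly for symmetric matrices. A sketch using the earlier example matrix:

```python
import numpy as np

A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

# For symmetric input, eigh returns real eigenvalues in ascending order
# and a matrix P whose columns form an orthonormal set of eigenvectors.
w, P = np.linalg.eigh(A)

assert np.allclose(w, [3.0, 6.0, 8.0])       # real eigenvalues, ascending
assert np.allclose(P.T @ P, np.eye(3))       # P is orthogonal
assert np.allclose(A, P @ np.diag(w) @ P.T)  # A = P Lambda P^T
```

Unlike the general-purpose `eig`, `eigh` exploits symmetry and guarantees an orthonormal eigenvector basis even for repeated eigenvalues.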
Orthogonal diagonalization: an example
Orthogonally diagonalize the matrix
A = [  3  −2   4 ]
    [ −2   6   2 ]
    [  4   2   3 ]
Characteristic equation:
0 = −λ^3 + 12λ^2 − 21λ − 98 = −(λ − 7)^2(λ + 2)
Produce bases for the eigenspaces by solving linear equations:
λ = 7: v1 = (1, 0, 1)^T, v2 = (−1/2, 1, 0)^T;  λ = −2: v3 = (−1, −1/2, 1)^T

Apply Gram-Schmidt to produce an orthogonal basis for the eigenspace of λ = 7.
Orthogonal diagonalization: an example
Produce bases for the eigenspaces by solving linear equations:
λ = 7: v1 = (1, 0, 1)^T, v2 = (−1/2, 1, 0)^T;  λ = −2: v3 = (−1, −1/2, 1)^T
Apply Gram-Schmidt to produce orthogonal bases.
The component of v2 orthogonal to v1 is
z2 = v2 − ((v2 · v1)/(v1 · v1)) v1 = (−1/4, 1, 1/4)^T
Normalize v1, z2:
u1 = (1/√2, 0, 1/√2)^T,  u2 = (−1/√18, 4/√18, 1/√18)^T
Normalize v3 to obtain u3 = (−2/3, −1/3, 2/3)^T.
A = PDP^T where P = [u1, u2, u3] and D = diag(7, 7, −2).
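The whole procedure — Gram-Schmidt within the repeated eigenspace, normalization, and the final factorization — can be reproduced in a few lines of NumPy:

```python
import numpy as np

v1 = np.array([ 1.0,  0.0, 1.0])  # eigenspace of lambda = 7
v2 = np.array([-0.5,  1.0, 0.0])
v3 = np.array([-1.0, -0.5, 1.0])  # eigenspace of lambda = -2

# Gram-Schmidt within the lambda = 7 eigenspace: remove the v1-component of v2.
z2 = v2 - (v2 @ v1) / (v1 @ v1) * v1
assert np.allclose(z2, [-0.25, 1.0, 0.25])

# Normalize to obtain an orthonormal eigenvector basis.
u1 = v1 / np.linalg.norm(v1)
u2 = z2 / np.linalg.norm(z2)
u3 = v3 / np.linalg.norm(v3)

P = np.column_stack([u1, u2, u3])
D = np.diag([7.0, 7.0, -2.0])
A = np.array([[ 3.0, -2.0, 4.0],
              [-2.0,  6.0, 2.0],
              [ 4.0,  2.0, 3.0]])

assert np.allclose(P.T @ P, np.eye(3))  # P is orthogonal
assert np.allclose(A, P @ D @ P.T)      # A = P D P^T
```

Note that Gram-Schmidt is only needed inside an eigenspace of multiplicity greater than 1; eigenvectors for distinct eigenvalues of a symmetric matrix are automatically orthogonal.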
Application 1: Quadratic Forms
Any quadratic function of x can be expressed in the form
Q(x) = x^T A x
where x is a vector in R^n and A is an n × n symmetric matrix.
More explicitly,
x^T A x = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x_i x_j
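A minimal check of the double-sum identity, using a hypothetical 2 × 2 symmetric A and vector x (neither is from the slides):

```python
import numpy as np

# Hypothetical symmetric matrix and vector.
A = np.array([[2.0, 3.0],
              [3.0, 4.0]])
x = np.array([1.0, -2.0])

# x^T A x as a matrix product ...
quad = x @ A @ x

# ... equals the explicit double sum over a_ij x_i x_j.
double_sum = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))
assert np.isclose(quad, double_sum)
```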
Example
For example,
Q(x) = 2x1^2 + 3x2^2 + 4x3^2 + 5x2x3 + 6x1x2
can be written in quadratic form with matrix
A = [ 2    3    0  ]
    [ 3    3   5/2 ]
    [ 0   5/2   4  ]
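The construction rule (coefficients of squared terms on the diagonal, each cross-term coefficient split in half off the diagonal) can be verified numerically:

```python
import numpy as np

# Matrix of Q(x) = 2x1^2 + 3x2^2 + 4x3^2 + 5x2x3 + 6x1x2:
# squared terms on the diagonal, cross terms split in half off-diagonal.
A = np.array([[2.0, 3.0, 0.0],
              [3.0, 3.0, 2.5],
              [0.0, 2.5, 4.0]])

def Q(x):
    x1, x2, x3 = x
    return 2*x1**2 + 3*x2**2 + 4*x3**2 + 5*x2*x3 + 6*x1*x2

# x^T A x reproduces Q at a few sample points.
for x in [np.array([ 1.0, 0.0, 0.0]),
          np.array([ 1.0, 2.0, 3.0]),
          np.array([-1.0, 0.5, 2.0])]:
    assert np.isclose(x @ A @ x, Q(x))
```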
Optimizing quadratic functions
Consider the following optimization problem:
max Q(x) = 2x1^2 + 3x2^2 + 4x3^2
subject to ‖x‖ = 1
Optimizing quadratic functions
Consider the following optimization problem (without cross-product terms):
max Q(x) = 2x1^2 + 3x2^2 + 4x3^2
subject to ‖x‖ = 1
Solution: Since 2x1^2 ≤ 4x1^2 and 3x2^2 ≤ 4x2^2, we have
Q(x) ≤ 4x1^2 + 4x2^2 + 4x3^2 = 4
In addition, we can choose x1 = 0, x2 = 0, x3 = 1 to reach the maximum.
Optimizing quadratic functions
A more general problem:
max Q(x) = x^T A x
subject to ‖x‖ = 1
Optimizing quadratic functions
A more general problem:
max Q(x) = x^T A x
subject to ‖x‖ = 1
Solution: Use A = PΛP^T to transform the problem into an easier form:
Q(x) = x^T P Λ P^T x = (P^T x)^T Λ (P^T x)
Use y = P^T x to change variables. Since P is orthogonal, ‖y‖ = ‖x‖, so the problem becomes
max Q(y) = y^T Λ y = λ1 y1^2 + · · · + λn yn^2
subject to ‖y‖ = 1
max x^T A x subject to ‖x‖ = 1: λ_max(A)
min x^T A x subject to ‖x‖ = 1: λ_min(A)
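A numerical illustration of this fact, using a randomly generated symmetric matrix (an assumption, not from the slides): the value of x^T A x on any unit vector always falls between λ_min and λ_max:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical random symmetric matrix.
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2
w = np.linalg.eigvalsh(A)  # real eigenvalues in ascending order

# Sample 1000 random unit vectors (as columns of X).
X = rng.standard_normal((4, 1000))
X /= np.linalg.norm(X, axis=0)

# vals[j] = x_j^T A x_j for each unit vector x_j.
vals = np.einsum('ij,ik,kj->j', X, A, X)

# Every value lies in [lambda_min, lambda_max].
assert vals.min() >= w[0] - 1e-9
assert vals.max() <= w[-1] + 1e-9
```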
Optimizing quadratic functions: example
max Q(x) = x1^2 − 8x1x2 − 5x2^2
subject to ‖x‖ = 1
Optimizing quadratic functions: example
Solution:
The matrix of the quadratic form is
A = [  1  −4 ]
    [ −4  −5 ]
Orthogonally diagonalize A:
P = [  2/√5  1/√5 ]
    [ −1/√5  2/√5 ]
D = [ 3   0 ]
    [ 0  −7 ]
Change variables from x to y = P^T x (so x = Py), and rewrite the objective function:
x1^2 − 8x1x2 − 5x2^2 = x^T A x = (Py)^T A (Py) = y^T D y = 3y1^2 − 7y2^2
So the maximum of Q(x) over ‖x‖ = 1 is 3, the largest eigenvalue of A.
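Checking this example with `numpy.linalg.eigh`:

```python
import numpy as np

A = np.array([[ 1.0, -4.0],
              [-4.0, -5.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
w, P = np.linalg.eigh(A)

# Eigenvalues are -7 and 3, so the max of x^T A x over unit x is 3.
assert np.allclose(w, [-7.0, 3.0])

# The maximum is attained at the unit eigenvector for lambda = 3.
x = P[:, 1]
assert np.isclose(x @ A @ x, 3.0)
```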
Application 2: Principal Component Analysis (PCA)
Problem: Given a set of data points {x^(1), x^(2), · · · , x^(m)} in R^n, find the axis along which the data points have maximal variance.
Assume the data are centered at the origin. If not, subtract the mean from each data point.
Application 2: Principal Component Analysis (PCA)
Problem: Given a set of data points {x^(1), x^(2), · · · , x^(m)} in R^n, find the axis along which the data points have maximal variance.
Use a unit vector u in R^n to denote the direction of the axis.
Project each data point onto u to obtain {y^(1), y^(2), · · · , y^(m)}, where y^(i) = u^T x^(i).
The variance of the projected points is
σ^2 = (1/m) Σ_{i=1}^{m} (y^(i))^2 = (1/m) Σ_{i=1}^{m} u^T x^(i) (x^(i))^T u = u^T X u
where the matrix X, defined by
X = (1/m) Σ_{i=1}^{m} x^(i) (x^(i))^T,
is called the covariance matrix.
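A sketch of computing the covariance matrix from centered data (the data here are random, a stand-in for the slide's x^(i)):

```python
import numpy as np

rng = np.random.default_rng(1)

# m hypothetical data points in R^n, stored as the columns of `data`.
m, n = 200, 3
data = rng.standard_normal((n, m))
data -= data.mean(axis=1, keepdims=True)  # center the data at the origin

# X = (1/m) sum_i x_i x_i^T, written as a single matrix product.
X = (data @ data.T) / m

# Equivalent to the explicit sum of outer products.
X_sum = sum(np.outer(data[:, i], data[:, i]) for i in range(m)) / m
assert np.allclose(X, X_sum)

# X is symmetric, so the spectral theorem applies: real eigenvalues,
# orthonormal eigenvectors.
assert np.allclose(X, X.T)
```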
Application 2: Principle Component Analysis (PCA)
Problem: Given a set of data points {x^(1), x^(2), · · · , x^(m)} in R^n, find the axis along which the data points have maximal variance.
Reformulate the problem as a quadratic optimization problem:
max u^T X u
subject to ‖u‖ = 1
where X = (1/m) Σ_{i=1}^{m} x^(i) (x^(i))^T is the covariance matrix.
Solution: u is the eigenvector corresponding to the largest eigenvalue of X. The resulting y values are called the first principal component.
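Putting PCA together on synthetic 2-D data (a hypothetical dataset whose dominant axis is known by construction to be (1, 1)/√2):

```python
import numpy as np

rng = np.random.default_rng(2)

# Points of the form (t + eps, t - eps): most variance lies along (1, 1)/sqrt(2).
t = rng.standard_normal(500)
eps = 0.1 * rng.standard_normal(500)
data = np.stack([t + eps, t - eps])          # shape (2, 500), columns are points
data -= data.mean(axis=1, keepdims=True)     # center at the origin

X = (data @ data.T) / data.shape[1]          # covariance matrix
w, V = np.linalg.eigh(X)                     # eigenvalues ascending
u = V[:, -1]                                 # eigenvector for the largest eigenvalue

# The recovered principal axis is close to (1, 1)/sqrt(2), up to sign.
expected = np.array([1.0, 1.0]) / np.sqrt(2)
assert min(np.linalg.norm(u - expected), np.linalg.norm(u + expected)) < 0.05

# First principal component: projections of the data onto u. Its variance
# equals the largest eigenvalue, u^T X u.
y = u @ data
assert np.isclose(y.var(), w[-1], rtol=1e-6)
```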