ICS 6N Computational Linear Algebra
Symmetric Matrices and Orthogonal Diagonalization
Xiaohui Xie
University of California, Irvine
Xiaohui Xie (UCI) ICS 6N 1 / 21
Symmetric matrices
An n × n matrix A is symmetric if A^T = A.
Componentwise: A is symmetric if
a_ij = a_ji for i, j = 1, 2, . . . , n
Matrix Diagonalization
Matrix A is diagonalizable if there exist an invertible matrix P and a diagonal matrix Λ such that
A = PΛP^−1
If A can be diagonalized, then A^k = PΛ^k P^−1.
Not all matrices can be diagonalized.
An n × n matrix can be diagonalized if and only if it has n linearly independent eigenvectors.
Some special cases:
If an n × n matrix A has n distinct eigenvalues, then it is diagonalizable.
If A is symmetric, then it is diagonalizable.
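As a numerical check (not part of the original slides), NumPy's `eig` can diagonalize a small matrix and confirm A^k = PΛ^k P^−1. The 2 × 2 matrix below is a hypothetical example with distinct eigenvalues:

```python
import numpy as np

# A hypothetical diagonalizable matrix with distinct eigenvalues (5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix P whose columns are eigenvectors.
eigvals, P = np.linalg.eig(A)
Lam = np.diag(eigvals)

# A = P Lam P^{-1}
assert np.allclose(A, P @ Lam @ np.linalg.inv(P))

# A^k = P Lam^k P^{-1}, computed here for k = 5.
k = 5
Ak = P @ np.diag(eigvals**k) @ np.linalg.inv(P)
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```

Raising Λ to the k-th power only requires powering the diagonal entries, which is why this factorization makes repeated multiplication cheap.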
Diagonalization of symmetric matrices
Example: diagonalize the matrix
A = [  6  −2  −1 ]
    [ −2   6  −1 ]
    [ −1  −1   5 ]
Characteristic equation of A is
0 = −λ^3 + 17λ^2 − 90λ + 144 = −(λ − 8)(λ − 6)(λ − 3)
so we have three distinct eigenvalues λ1 = 8, λ2 = 6, λ3 = 3.
Find the corresponding eigenvectors:

v1 = (−1, 1, 0)^T,  v2 = (−1, −1, 2)^T,  v3 = (1, 1, 1)^T

Note that v1^T v2 = 0, v1^T v3 = 0, v2^T v3 = 0, i.e., the eigenvectors are mutually orthogonal.
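These eigenvalue and orthogonality claims can be checked numerically; a small NumPy sketch using the matrix and vectors from the slide:

```python
import numpy as np

A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

v1 = np.array([-1.0,  1.0, 0.0])  # eigenvector for eigenvalue 8
v2 = np.array([-1.0, -1.0, 2.0])  # eigenvector for eigenvalue 6
v3 = np.array([ 1.0,  1.0, 1.0])  # eigenvector for eigenvalue 3

# Each v_i satisfies A v = lambda v.
assert np.allclose(A @ v1, 8 * v1)
assert np.allclose(A @ v2, 6 * v2)
assert np.allclose(A @ v3, 3 * v3)

# Eigenvectors for distinct eigenvalues of a symmetric matrix are orthogonal.
assert np.isclose(v1 @ v2, 0)
assert np.isclose(v1 @ v3, 0)
assert np.isclose(v2 @ v3, 0)
```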
Diagonalization of symmetric matrices
Example: diagonalize the matrix
A = [  6  −2  −1 ]
    [ −2   6  −1 ]
    [ −1  −1   5 ]
Further normalize the eigenvectors to be unit vectors:
u1 = (−1/√2, 1/√2, 0)^T,  u2 = (−1/√6, −1/√6, 2/√6)^T,  u3 = (1/√3, 1/√3, 1/√3)^T
Let
P = [ −1/√2  −1/√6  1/√3 ]
    [  1/√2  −1/√6  1/√3 ]
    [    0    2/√6  1/√3 ]
D = [ 8  0  0 ]
    [ 0  6  0 ]
    [ 0  0  3 ]
A = PDP^T, since P is an orthogonal matrix (P^−1 = P^T).
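A quick NumPy verification of this factorization, using exactly the matrices from the slide:

```python
import numpy as np

s2, s3, s6 = np.sqrt(2.0), np.sqrt(3.0), np.sqrt(6.0)
P = np.array([[-1/s2, -1/s6, 1/s3],
              [ 1/s2, -1/s6, 1/s3],
              [ 0.0,   2/s6, 1/s3]])
D = np.diag([8.0, 6.0, 3.0])
A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

# P is orthogonal: P^T P = I, so P^{-1} = P^T.
assert np.allclose(P.T @ P, np.eye(3))

# Orthogonal diagonalization: A = P D P^T.
assert np.allclose(A, P @ D @ P.T)
```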
Spectral theorem
If A is an n × n symmetric matrix
1 All eigenvalues of A are real.
2 A has exactly n real eigenvalues (counting multiplicity), but this doesn't mean they are distinct.
3 The geometric multiplicity of λ, dim(Null(A − λI)), equals the algebraic multiplicity of λ.
4 The eigenspaces are mutually orthogonal: if λ1 ≠ λ2 are two distinct eigenvalues, then their corresponding eigenvectors v1, v2 are orthogonal.
Proof
1 Let λ be an eigenvalue of A with corresponding eigenvector x, so Ax = λx and, taking complex conjugates (A is real), Ax* = λ*x*. Then
λ* x^T x* = x^T A x* = (Ax)^T x* = λ x^T x*.
Since x^T x* = Σ_i |x_i|^2 > 0, this forces λ* = λ, so λ is real.
2 Let x1 and x2 be two eigenvectors corresponding to two distinct eigenvalues λ1 and λ2.
x1^T A x2 = (x1^T A x2)^T = x2^T A^T (x1^T)^T = x2^T A x1
=⇒ λ2 x1^T x2 = λ1 x1^T x2 =⇒ (λ1 − λ2)(x1^T x2) = 0
Since λ1 ≠ λ2, x1^T x2 = 0, so they are orthogonal.
Orthogonal diagonalization
If an n × n matrix A is symmetric, its eigenvectors v1, · · · , vn can be chosen to be orthonormal.
If A has n distinct eigenvalues, then the n eigenvectors are orthogonal; normalize these vectors to make them orthonormal.
If an eigenvalue λ has multiplicity greater than 1, find an orthonormal basis of the corresponding eigenspace, Null(A − λI), and use the vectors in this basis as eigenvectors.
In this case, P = [v1 v2 . . . vn] is an orthogonal matrix, that is, P^−1 = P^T.
And A can be orthogonally diagonalized:
A = PΛP^T
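In practice, `numpy.linalg.eigh` performs this orthogonal diagonalization directly for symmetric matrices. A sketch using the earlier example matrix:

```python
import numpy as np

A = np.array([[ 6.0, -2.0, -1.0],
              [-2.0,  6.0, -1.0],
              [-1.0, -1.0,  5.0]])

# For symmetric input, eigh returns real eigenvalues in ascending order
# and a matrix P whose columns form an orthonormal set of eigenvectors.
w, P = np.linalg.eigh(A)

assert np.allclose(w, [3.0, 6.0, 8.0])       # real eigenvalues, ascending
assert np.allclose(P.T @ P, np.eye(3))       # P is orthogonal
assert np.allclose(A, P @ np.diag(w) @ P.T)  # A = P Lambda P^T
```

Unlike the general-purpose `eig`, `eigh` exploits symmetry and guarantees an orthonormal eigenvector basis even for repeated eigenvalues.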
Orthogonal diagonalization: an example
Orthogonally diagonalize the matrix
A = [  3  −2   4 ]
    [ −2   6   2 ]
    [  4   2   3 ]
Characteristic equation:
0 = −λ^3 + 12λ^2 − 21λ − 98 = −(λ − 7)^2(λ + 2)
Produce bases for the eigenspaces by solving linear equations:
λ = 7: v1 = (1, 0, 1)^T, v2 = (−1/2, 1, 0)^T;  λ = −2: v3 = (−1, −1/2, 1)^T

Apply Gram-Schmidt to produce an orthogonal basis for the eigenspace of λ = 7.
Orthogonal diagonalization: an example
Produce bases for the eigenspaces by solving linear equations:
λ = 7: v1 = (1, 0, 1)^T, v2 = (−1/2, 1, 0)^T;  λ = −2: v3 = (−1, −1/2, 1)^T
Apply Gram-Schmidt to produce orthogonal bases.
The component of v2 orthogonal to v1 is
z2 = v2 − ((v2 · v1)/(v1 · v1)) v1 = (−1/4, 1, 1/4)^T
Normalize v1, z2:
u1 = (1/√2, 0, 1/√2)^T,  u2 = (−1/√18, 4/√18, 1/√18)^T
Normalize v3 to obtain u3 = (−2/3, −1/3, 2/3)^T.
A = PDP^T where P = [u1, u2, u3] and D = diag(7, 7, −2).
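The whole procedure — Gram-Schmidt within the repeated eigenspace, normalization, and the final factorization — can be reproduced in a few lines of NumPy:

```python
import numpy as np

v1 = np.array([ 1.0,  0.0, 1.0])  # eigenspace of lambda = 7
v2 = np.array([-0.5,  1.0, 0.0])
v3 = np.array([-1.0, -0.5, 1.0])  # eigenspace of lambda = -2

# Gram-Schmidt within the lambda = 7 eigenspace: remove the v1-component of v2.
z2 = v2 - (v2 @ v1) / (v1 @ v1) * v1
assert np.allclose(z2, [-0.25, 1.0, 0.25])

# Normalize to obtain an orthonormal eigenvector basis.
u1 = v1 / np.linalg.norm(v1)
u2 = z2 / np.linalg.norm(z2)
u3 = v3 / np.linalg.norm(v3)

P = np.column_stack([u1, u2, u3])
D = np.diag([7.0, 7.0, -2.0])
A = np.array([[ 3.0, -2.0, 4.0],
              [-2.0,  6.0, 2.0],
              [ 4.0,  2.0, 3.0]])

assert np.allclose(P.T @ P, np.eye(3))  # P is orthogonal
assert np.allclose(A, P @ D @ P.T)      # A = P D P^T
```

Note that Gram-Schmidt is only needed inside an eigenspace of multiplicity greater than 1; eigenvectors for distinct eigenvalues of a symmetric matrix are automatically orthogonal.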
Application 1: Quadratic Forms
Any quadratic function of x can be expressed in the form
Q(x) = x^T A x
where x is a vector in R^n and A is an n × n symmetric matrix.
More explicitly,
x^T A x = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x_i x_j
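A minimal check of the double-sum identity, using a hypothetical 2 × 2 symmetric A and vector x (neither is from the slides):

```python
import numpy as np

# Hypothetical symmetric matrix and vector.
A = np.array([[2.0, 3.0],
              [3.0, 4.0]])
x = np.array([1.0, -2.0])

# x^T A x as a matrix product ...
quad = x @ A @ x

# ... equals the explicit double sum over a_ij x_i x_j.
double_sum = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))
assert np.isclose(quad, double_sum)
```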
Example
For example,
Q(x) = 2x1^2 + 3x2^2 + 4x3^2 + 5x2x3 + 6x1x2
can be written in quadratic form with matrix
A = [ 2    3    0  ]
    [ 3    3   5/2 ]
    [ 0   5/2   4  ]
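The construction rule (coefficients of squared terms on the diagonal, each cross-term coefficient split in half off the diagonal) can be verified numerically:

```python
import numpy as np

# Matrix of Q(x) = 2x1^2 + 3x2^2 + 4x3^2 + 5x2x3 + 6x1x2:
# squared terms on the diagonal, cross terms split in half off-diagonal.
A = np.array([[2.0, 3.0, 0.0],
              [3.0, 3.0, 2.5],
              [0.0, 2.5, 4.0]])

def Q(x):
    x1, x2, x3 = x
    return 2*x1**2 + 3*x2**2 + 4*x3**2 + 5*x2*x3 + 6*x1*x2

# x^T A x reproduces Q at a few sample points.
for x in [np.array([ 1.0, 0.0, 0.0]),
          np.array([ 1.0, 2.0, 3.0]),
          np.array([-1.0, 0.5, 2.0])]:
    assert np.isclose(x @ A @ x, Q(x))
```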
Optimizing quadratic functions
Consider the following optimization problem:
max Q(x) = 2x1^2 + 3x2^2 + 4x3^2
subject to ‖x‖ = 1
Optimizing quadratic functions
Consider the following optimization problem (without cross-product terms):
max Q(x) = 2x1^2 + 3x2^2 + 4x3^2
subject to ‖x‖ = 1
Solution: Since 2x1^2 ≤ 4x1^2 and 3x2^2 ≤ 4x2^2, we have
Q(x) ≤ 4x1^2 + 4x2^2 + 4x3^2 = 4
In addition, we can choose x1 = 0, x2 = 0, x3 = 1 to reach the maximum.
Optimizing quadratic functions
A more general problem:
max Q(x) = x^T A x
subject to ‖x‖ = 1
Optimizing quadratic functions
A more general problem:
max Q(x) = x^T A x
subject to ‖x‖ = 1
Solution: Use A = PΛP^T to transform the problem into an easier form:
Q(x) = x^T P Λ P^T x = (P^T x)^T Λ (P^T x)
Use y = P^T x to change variables. Since P is orthogonal, ‖y‖ = ‖x‖, so the problem becomes
max Q(y) = y^T Λ y = λ1 y1^2 + · · · + λn yn^2
subject to ‖y‖ = 1
max x^T A x subject to ‖x‖ = 1: λ_max(A)
min x^T A x subject to ‖x‖ = 1: λ_min(A)
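A numerical illustration of this fact, using a randomly generated symmetric matrix (an assumption, not from the slides): the value of x^T A x on any unit vector always falls between λ_min and λ_max:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical random symmetric matrix.
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2
w = np.linalg.eigvalsh(A)  # real eigenvalues in ascending order

# Sample 1000 random unit vectors (as columns of X).
X = rng.standard_normal((4, 1000))
X /= np.linalg.norm(X, axis=0)

# vals[j] = x_j^T A x_j for each unit vector x_j.
vals = np.einsum('ij,ik,kj->j', X, A, X)

# Every value lies in [lambda_min, lambda_max].
assert vals.min() >= w[0] - 1e-9
assert vals.max() <= w[-1] + 1e-9
```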
Optimizing quadratic functions: example
max Q(x) = x1^2 − 8x1x2 − 5x2^2
subject to ‖x‖ = 1
Optimizing quadratic functions: example
Solution:
The matrix of the quadratic form is
A = [  1  −4 ]
    [ −4  −5 ]
Orthogonally diagonalize A:
P = [  2/√5  1/√5 ]
    [ −1/√5  2/√5 ]
D = [ 3   0 ]
    [ 0  −7 ]
Change variables from x to y = P^T x (so x = Py), and rewrite the objective function:
x1^2 − 8x1x2 − 5x2^2 = x^T A x = (Py)^T A (Py) = y^T D y = 3y1^2 − 7y2^2
So the maximum of Q(x) over ‖x‖ = 1 is 3, the largest eigenvalue of A.
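Checking this example with `numpy.linalg.eigh`:

```python
import numpy as np

A = np.array([[ 1.0, -4.0],
              [-4.0, -5.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
w, P = np.linalg.eigh(A)

# Eigenvalues are -7 and 3, so the max of x^T A x over unit x is 3.
assert np.allclose(w, [-7.0, 3.0])

# The maximum is attained at the unit eigenvector for lambda = 3.
x = P[:, 1]
assert np.isclose(x @ A @ x, 3.0)
```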
Application 2: Principal Component Analysis (PCA)
Problem: Given a set of data points {x^(1), x^(2), · · · , x^(m)} in R^n, find the axis along which the data points have maximal variance.
Assume the data are centered at the origin. If not, subtract the mean from each data point.
Application 2: Principal Component Analysis (PCA)
Problem: Given a set of data points {x^(1), x^(2), · · · , x^(m)} in R^n, find the axis along which the data points have maximal variance.
Use a unit vector u in R^n to denote the direction of the axis.
Project each data point onto u to obtain {y^(1), y^(2), · · · , y^(m)}, where y^(i) = u^T x^(i).
The variance of the projected points is
σ^2 = (1/m) Σ_{i=1}^{m} (y^(i))^2 = (1/m) Σ_{i=1}^{m} u^T x^(i) (x^(i))^T u = u^T X u
where the matrix X, defined by
X = (1/m) Σ_{i=1}^{m} x^(i) (x^(i))^T,
is called the covariance matrix.
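A sketch of computing the covariance matrix from centered data (the data here are random, a stand-in for the slide's x^(i)):

```python
import numpy as np

rng = np.random.default_rng(1)

# m hypothetical data points in R^n, stored as the columns of `data`.
m, n = 200, 3
data = rng.standard_normal((n, m))
data -= data.mean(axis=1, keepdims=True)  # center the data at the origin

# X = (1/m) sum_i x_i x_i^T, written as a single matrix product.
X = (data @ data.T) / m

# Equivalent to the explicit sum of outer products.
X_sum = sum(np.outer(data[:, i], data[:, i]) for i in range(m)) / m
assert np.allclose(X, X_sum)

# X is symmetric, so the spectral theorem applies: real eigenvalues,
# orthonormal eigenvectors.
assert np.allclose(X, X.T)
```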
Application 2: Principle Component Analysis (PCA)
Problem: Given a set of data points {x^(1), x^(2), · · · , x^(m)} in R^n, find the axis along which the data points have maximal variance.
Reformulate the problem as a quadratic optimization problem:
max u^T X u
subject to ‖u‖ = 1
where X = (1/m) Σ_{i=1}^{m} x^(i) (x^(i))^T is the covariance matrix.
Solution: u is the eigenvector corresponding to the largest eigenvalue of X. The resulting y values are called the first principal component.
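Putting PCA together on synthetic 2-D data (a hypothetical dataset whose dominant axis is known by construction to be (1, 1)/√2):

```python
import numpy as np

rng = np.random.default_rng(2)

# Points of the form (t + eps, t - eps): most variance lies along (1, 1)/sqrt(2).
t = rng.standard_normal(500)
eps = 0.1 * rng.standard_normal(500)
data = np.stack([t + eps, t - eps])          # shape (2, 500), columns are points
data -= data.mean(axis=1, keepdims=True)     # center at the origin

X = (data @ data.T) / data.shape[1]          # covariance matrix
w, V = np.linalg.eigh(X)                     # eigenvalues ascending
u = V[:, -1]                                 # eigenvector for the largest eigenvalue

# The recovered principal axis is close to (1, 1)/sqrt(2), up to sign.
expected = np.array([1.0, 1.0]) / np.sqrt(2)
assert min(np.linalg.norm(u - expected), np.linalg.norm(u + expected)) < 0.05

# First principal component: projections of the data onto u. Its variance
# equals the largest eigenvalue, u^T X u.
y = u @ data
assert np.isclose(y.var(), w[-1], rtol=1e-6)
```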