
1

Majorization for Changes in

Principal Angles Between Subspaces

Andrew V. Knyazev and Merico E. Argentati (speaker)

Department of Mathematics and

Center for Computational Mathematics

University of Colorado at Denver

2

Outline

1. Introduction and some definitions

2. A new inequality for perturbation of principal angles

3. Changes in the trial subspace in the Rayleigh Ritz Method

4. Conclusions

3

Why Are We Interested in Angles Between Subspaces?

1. Characterize distance or gap between subspaces

2. Statistics (canonical correlations), information retrieval, image

processing, signal processing

3. Optimization, robust control, system identification

4. Rayleigh Ritz Method (analysis of the influence of changes in a trial

subspace on Ritz values)

5. Grassmannian manifolds (chordal distance metric is the Frobenius

norm of the sines of the principal angles)

4

Definition of Principal Angles Between Subspaces

Let Mn be a complex n–dimensional vector space. Let F and G be

m–dimensional subspaces of Mn with 1 ≤ m < n. Then the principal angles

θ1, . . . , θm ∈ [0, π/2]

between F and G may be defined, e.g., [5, 2, 9], recursively for k = 1, . . . ,m

by

$$\sigma_k = \cos(\theta_k) = \max_{u \in F} \max_{v \in G} |u^* v| = |u_k^* v_k|$$

subject to

$$\|u\| = \|v\| = 1, \quad u^* u_i = 0, \quad v^* v_i = 0, \quad i = 1, \dots, k-1.$$

5

Definition of Principal Angles Between Subspaces

We use the notation

$$\cos(\angle_k\{F, G\}) = \sigma_k, \quad k = 1, \dots, m,$$

where 1 ≥ σ1 ≥ · · · ≥ σm ≥ 0, and the sines of the principal angles are

given by

$$\mu_1 \le \mu_2 \le \cdots \le \mu_m,$$

where $\mu_k = \sqrt{1 - \sigma_k^2}$, $k = 1, \dots, m$.

Let the columns of the matrices $Q_F$, $Q_G$ ∈ Mn×m form orthonormal bases for the subspaces F and G, respectively. Then the σ's are the m singular values of $Q_F^* Q_G$.
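The SVD characterization above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `principal_angle_cosines` and the random test matrices are assumptions introduced here, not part of the talk. Note that computing the sines as sqrt(1 − σ²) loses accuracy for very small angles, so it is used here purely for illustration.

```python
import numpy as np

def principal_angle_cosines(F, G):
    """Cosines of the principal angles between the column spaces of F and G.

    Hypothetical helper: orthonormalizes both bases with a reduced QR
    factorization, then takes the singular values of Q_F^* Q_G.
    """
    QF, _ = np.linalg.qr(F)
    QG, _ = np.linalg.qr(G)
    # Singular values are returned in decreasing order: sigma_1 >= ... >= sigma_m.
    sigma = np.linalg.svd(QF.conj().T @ QG, compute_uv=False)
    # Guard against rounding slightly outside [0, 1].
    return np.clip(sigma, 0.0, 1.0)

rng = np.random.default_rng(0)
F = rng.standard_normal((6, 2))   # basis of a 2-dimensional subspace of R^6
G = rng.standard_normal((6, 2))

sigma = principal_angle_cosines(F, G)   # cosines, decreasing
mu = np.sqrt(1.0 - sigma**2)            # sines, increasing
theta = np.arccos(sigma)                # principal angles in [0, pi/2]
```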

6

Properties of Principal Angles Between Subspaces

Let $P_F$ and $P_G$ be the orthogonal projectors onto the subspaces F and G, respectively. Then the singular values of $Q_F^* Q_G$ are the m largest singular values of $P_F P_G$. Thus the singular values of $P_F P_G$, in decreasing order, are

$$\sigma_1, \sigma_2, \dots, \sigma_m, 0, \dots, 0.$$

Let $\mu_1 \le \mu_2 \le \cdots \le \mu_m$ be the sines of the principal angles between F and G, and let $l = \dim\{F \cap G\}$. Then the singular values of $P_F - P_G$, in decreasing order, are

$$\mu_m, \mu_m, \dots, \mu_{l+1}, \mu_{l+1}, 0, \dots, 0.$$

7

Properties of Principal Angles Between Subspaces

The singular values of (I − PF )PG = PF⊥PG , in decreasing order, are

µm, · · · , µ1, 0, · · · , 0.

PF and PG play symmetric roles in all cases.

The largest principal angle is related to the notion of distance, or a gap,

between equidimensional subspaces, and the distance is defined [7] as

$$\mathrm{gap}(F, G) = \|P_F - P_G\| = \sin(\angle\{F, G\}) = \sin(\angle_m\{F, G\}) = \mu_m.$$
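The projector properties listed on these two slides can be checked numerically. The following is a minimal sketch, assuming real random subspaces in general position (so that l = dim{F ∩ G} = 0); all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2

# Orthonormal bases and orthogonal projectors for two random subspaces.
QF, _ = np.linalg.qr(rng.standard_normal((n, m)))
QG, _ = np.linalg.qr(rng.standard_normal((n, m)))
PF, PG = QF @ QF.T, QG @ QG.T

sigma = np.clip(np.linalg.svd(QF.T @ QG, compute_uv=False), 0.0, 1.0)  # cosines, decreasing
mu = np.sqrt(1.0 - sigma**2)                                           # sines, increasing

sv_prod = np.linalg.svd(PF @ PG, compute_uv=False)   # expect sigma_1, ..., sigma_m, 0, ..., 0
sv_diff = np.linalg.svd(PF - PG, compute_uv=False)   # expect each mu_k twice, then zeros
gap = np.linalg.norm(PF - PG, 2)                     # expect mu_m, the largest sine

# Each sine repeated twice, in decreasing order, padded with n - 2m zeros.
expected_diff = np.concatenate([np.sort(np.repeat(mu, 2))[::-1], np.zeros(n - 2 * m)])
```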

8

Some Motivation – Trigonometric Inequalities

If an angle θ ∈ [0, π/2] is perturbed by ε ∈ [0, π/2] such that

θ + ε ∈ [0, π/2], then

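$$0 \le \cos(\theta) - \cos(\theta + \varepsilon) \le \sin(\theta + \varepsilon) \sin(\varepsilon) \le \sin(\varepsilon), \qquad (1)$$

$$0 \le \sin(\theta + \varepsilon) - \sin(\theta) \le \cos(\theta) \sin(\varepsilon) \le \sin(\varepsilon), \qquad (2)$$

$$0 \le \cos^2(\theta) - \cos^2(\theta + \varepsilon) = \sin(2\theta + \varepsilon) \sin(\varepsilon) \le \sin(\varepsilon), \qquad (3)$$

$$0 \le \sin^2(\theta + \varepsilon) - \sin^2(\theta) = \sin(2\theta + \varepsilon) \sin(\varepsilon) \le \sin(\varepsilon). \qquad (4)$$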
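As a sanity check, inequalities (1) through (4) can be evaluated on a grid of admissible pairs (θ, ε). The grid resolution and the tolerance `tol` below are arbitrary choices made for this sketch.

```python
import numpy as np

# Grid of (theta, eps) pairs restricted to theta + eps <= pi/2.
grid = np.linspace(0.0, np.pi / 2, 181)
theta, eps = np.meshgrid(grid, grid)
mask = theta + eps <= np.pi / 2 + 1e-12
t, e = theta[mask], eps[mask]
tol = 1e-12

d_cos = np.cos(t) - np.cos(t + e)          # left-hand side of (1)
d_sin = np.sin(t + e) - np.sin(t)          # left-hand side of (2)
d_sq = np.cos(t) ** 2 - np.cos(t + e) ** 2 # left-hand side of (3), equal to that of (4)

ok1 = bool(np.all((d_cos >= -tol) & (d_cos <= np.sin(t + e) * np.sin(e) + tol)))
ok2 = bool(np.all((d_sin >= -tol) & (d_sin <= np.cos(t) * np.sin(e) + tol)))
# (3) and (4): the middle term is an identity, and it is bounded by sin(eps).
ok34 = bool(np.allclose(d_sq, np.sin(2 * t + e) * np.sin(e))
            and np.all((d_sq >= -tol) & (d_sq <= np.sin(e) + tol)))
```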

9

The One Dimensional Case

Let x, y, z ∈Mn and

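$$\cos(\angle\{x, y\}) = \frac{|(x, y)|}{\|x\| \|y\|}. \qquad (5)$$

Then

$$|\cos(\angle\{z, x\}) - \cos(\angle\{z, y\})| \le \sin(\angle\{x, y\}), \qquad (6)$$

and

$$|\sin(\angle\{z, x\}) - \sin(\angle\{z, y\})| \le \sin(\angle\{x, y\}). \qquad (7)$$

Also

$$|\cos^2(\angle\{z, x\}) - \cos^2(\angle\{z, y\})| = |\sin^2(\angle\{z, x\}) - \sin^2(\angle\{z, y\})| \le \sin(\angle\{x, y\}). \qquad (8)$$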
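Inequalities (6) through (8) can be spot-checked with random complex vectors. This is a sketch only; the helper names `cos_angle` and `sin_angle`, the dimension 4, and the number of samples are assumptions made for illustration.

```python
import numpy as np

def cos_angle(x, y):
    """Cosine of the acute angle between vectors x and y, as in (5)."""
    c = abs(np.vdot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    return min(1.0, c)  # guard against rounding above 1

def sin_angle(x, y):
    return np.sqrt(1.0 - cos_angle(x, y) ** 2)

rng = np.random.default_rng(2)
worst = -np.inf  # largest observed value of (left-hand side) - (right-hand side)
for _ in range(2000):
    x, y, z = (rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(3))
    bound = sin_angle(x, y)
    lhs6 = abs(cos_angle(z, x) - cos_angle(z, y))
    lhs7 = abs(sin_angle(z, x) - sin_angle(z, y))
    lhs8 = abs(cos_angle(z, x) ** 2 - cos_angle(z, y) ** 2)
    worst = max(worst, lhs6 - bound, lhs7 - bound, lhs8 - bound)
```

A non-positive `worst` (up to rounding) means no sampled triple violated any of the three bounds.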

10

Perturbation of Subspaces

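Let F, G be m–dimensional subspaces of Mn and let $\tilde{G}$ be an m–dimensional subspace of Mn that is a perturbation of G. What can we say about the cosines and sines of the principal angles between F and G and between F and $\tilde{G}$? Denote the cosines and sines of the principal angles between F and $\tilde{G}$ by $\tilde{\sigma}_k$ and $\tilde{\mu}_k$. We know from [9] that for k = 1, . . . , m,

$$|\sigma_k - \tilde{\sigma}_k| \le \sin(\angle\{G, \tilde{G}\}), \qquad (9)$$

and

$$|\mu_k - \tilde{\mu}_k| \le \sin(\angle\{G, \tilde{G}\}). \qquad (10)$$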

11


QUESTION: Can inequalities be characterized that involve all the angles?

Suppose that

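$$|\sigma_{i_1} - \tilde{\sigma}_{i_1}| \ge \cdots \ge |\sigma_{i_m} - \tilde{\sigma}_{i_m}|.$$

Is it true that for k = 1, . . . , m,

$$|\sigma_{i_k} - \tilde{\sigma}_{i_k}| \le \sin(\angle_{m-k+1}\{G, \tilde{G}\})?$$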

This is not true. However, majorization theory provides a tool that can be

used to obtain some very general inequalities that involve unitarily

invariant norms.

12


Let

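$$C = \mathrm{diag}(\sigma_1, \dots, \sigma_m), \quad \tilde{C} = \mathrm{diag}(\tilde{\sigma}_1, \dots, \tilde{\sigma}_m),$$

and

$$S = \mathrm{diag}(\mu_1, \dots, \mu_m), \quad \tilde{S} = \mathrm{diag}(\tilde{\mu}_1, \dots, \tilde{\mu}_m).$$

Then (9) and (10) are equivalent to

$$\|C - \tilde{C}\|_2 \le \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|_2$$

and

$$\|S - \tilde{S}\|_2 \le \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|_2.$$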

13

Majorization and Unitarily Invariant Norms

Let $x = [x_i]$, $y = [y_i] \in R^n$ be given vectors, and denote their algebraically decreasing ordered entries by $x_{i_1} \ge \cdots \ge x_{i_n}$ and $y_{i_1} \ge \cdots \ge y_{i_n}$. Then we say y weakly majorizes x if

$$\sum_{j=1}^{k} x_{i_j} \le \sum_{j=1}^{k} y_{i_j}, \quad k = 1, \dots, n.$$

If, in addition to satisfying the above inequalities, we have

$$\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i,$$

then we say y (strongly) majorizes x. We use the notation

$$[x_1, \dots, x_n] \prec_w [y_1, \dots, y_n]$$

to indicate that y weakly majorizes x.

14


1. Majorization inequalities [3] are important in characterizing the

relationship between the main diagonal entries and eigenvalues of a

Hermitian matrix, and have an important connection to unitarily

invariant norms.

2. As first proved by Schur, the eigenvalues of a Hermitian matrix majorize its diagonal entries [4]. Also, the singular values of a general matrix weakly majorize the absolute values of its diagonal entries [11].
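Both statements in item 2 can be illustrated on a random matrix. This is a sketch with an arbitrary 5 × 5 complex test matrix; the variable names and tolerances are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (B + B.conj().T) / 2  # Hermitian part of B

# Schur: eigenvalues of a Hermitian matrix majorize its diagonal entries.
eig = np.sort(np.linalg.eigvalsh(H))[::-1]
diag = np.sort(np.real(np.diag(H)))[::-1]
schur_ok = bool(np.all(np.cumsum(diag) <= np.cumsum(eig) + 1e-10)
                and np.isclose(diag.sum(), eig.sum()))

# Singular values of a general matrix weakly majorize |diagonal entries|.
sv = np.linalg.svd(B, compute_uv=False)             # decreasing order
abs_diag = np.sort(np.abs(np.diag(B)))[::-1]
weak_ok = bool(np.all(np.cumsum(abs_diag) <= np.cumsum(sv) + 1e-10))
```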

15


Let A ∈ Mn×m. A unitarily invariant matrix norm ‖ · ‖ satisfies ‖UAV‖ = ‖A‖ for all unitary matrices U ∈ Mn×n and V ∈ Mm×m.

Some examples of unitarily invariant norms include the 2–norm, the

Frobenius norm, the trace norm, p–norms (p ≥ 1) and Ky Fan k norms

[4][Theorem 7.4.45]. If A ∈Mn×m is a matrix, the singular values of A are

denoted by s1(A) ≥ · · · ≥ sq(A), where q = min{n,m}. Some of the

commonly recognized unitarily invariant norms include

$$\|A\|_2 = s_1(A), \quad \|A\|_F = \sqrt{\sum_{i=1}^{q} s_i^2(A)}, \quad \|A\|_{tr} = \sum_{i=1}^{q} s_i(A),$$

where ‖A‖2, ‖A‖F and ‖A‖tr are respectively, the 2–norm, the Frobenius

norm and the trace norm.
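Since each of these norms is a function of the singular values alone, unitary invariance follows at once. The sketch below computes the three norms and the Ky Fan k norms from the singular values; `ui_norms` and `ky_fan` are hypothetical helper names, and the orthogonal factors come from QR factorizations of random real matrices.

```python
import numpy as np

def ui_norms(A):
    """2-norm, Frobenius norm and trace norm of A, computed from its singular values."""
    s = np.linalg.svd(A, compute_uv=False)  # s_1 >= ... >= s_q
    return {"2": s[0], "F": np.sqrt(np.sum(s**2)), "tr": np.sum(s)}

def ky_fan(A, k):
    """Ky Fan k norm: the sum of the k largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s[:k])

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # orthogonal 4 x 4
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal 3 x 3

before = ui_norms(A)
after = ui_norms(U @ A @ V)  # should agree with `before` up to rounding
```

The Ky Fan 1 norm coincides with the 2–norm, and the Ky Fan q norm coincides with the trace norm.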

16


Theorem 1 Let A, B ∈ Mn×m be given matrices with respective singular values $s_1(A) \ge \cdots \ge s_q(A)$ and $s_1(B) \ge \cdots \ge s_q(B)$, where q = min{m, n}. In order that ‖A‖ ≤ ‖B‖ for every unitarily invariant norm ‖ · ‖ on Mn×m, it is sufficient that

$$s_i(A) \le s_i(B), \quad i = 1, \dots, q,$$

and it is necessary and sufficient that

$$\sum_{i=1}^{k} s_i(A) \le \sum_{i=1}^{k} s_i(B), \quad k = 1, \dots, q, \qquad (11)$$

that is, the singular values of B must weakly majorize the singular values of A.
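A small example shows that the first (entrywise) condition is sufficient but not necessary. The matrices below are chosen by hand for this sketch: their singular values are (2, 2) and (3, 1), so condition (11) holds although $s_2(A) > s_2(B)$.

```python
import numpy as np

A = np.diag([2.0, 2.0])   # singular values (2, 2)
B = np.diag([3.0, 1.0])   # singular values (3, 1)

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)

entrywise = bool(np.all(sA <= sB))                               # fails: 2 > 1
ky_fan_ok = bool(np.all(np.cumsum(sA) <= np.cumsum(sB) + 1e-12)) # holds: 2 <= 3, 4 <= 4

# Compare a few unitarily invariant norms of A and B.
norms = {
    name: (np.linalg.norm(A, ord_), np.linalg.norm(B, ord_))
    for name, ord_ in [("2", 2), ("F", "fro"), ("tr", "nuc")]
}
```

In line with Theorem 1, every pair in `norms` satisfies ‖A‖ ≤ ‖B‖, with equality for the trace norm.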

17

Perturbation Theorems for Angles

Ji-Guang Sun [13] has proven the following theorem.

Theorem 2 Let F, G and $\tilde{G}$ be m–dimensional subspaces of Mn with 1 ≤ m < n. Let $C = \mathrm{diag}(\sigma_1, \dots, \sigma_m)$ be the cosines of the principal angles between F and G and let $S = \mathrm{diag}(\mu_1, \dots, \mu_m)$ be the sines of the principal angles between F and G. Let $\tilde{C} = \mathrm{diag}(\tilde{\sigma}_1, \dots, \tilde{\sigma}_m)$ be the cosines of the principal angles between F and $\tilde{G}$ and let $\tilde{S} = \mathrm{diag}(\tilde{\mu}_1, \dots, \tilde{\mu}_m)$ be the sines of the principal angles between F and $\tilde{G}$. Then

$$\|C - \tilde{C}\| \le \|P_G - P_{\tilde{G}}\|$$

and

$$\|S - \tilde{S}\| \le \|P_G - P_{\tilde{G}}\|$$

in an arbitrary unitarily invariant norm.

18


This theorem implies that

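$$\|C - \tilde{C}\| \le 2 \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|$$

and

$$\|S - \tilde{S}\| \le 2 \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|$$

in an arbitrary unitarily invariant norm.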

19


Knyazev and Argentati [8] have proven the following theorem.

Theorem 3 Let F, G and $\tilde{G}$ be m–dimensional subspaces of Mn with 1 ≤ m < n. Let $C = \mathrm{diag}(\sigma_1, \dots, \sigma_m)$ be the cosines of the principal angles between F and G and let $S = \mathrm{diag}(\mu_1, \dots, \mu_m)$ be the sines of the principal angles between F and G. Let $\tilde{C} = \mathrm{diag}(\tilde{\sigma}_1, \dots, \tilde{\sigma}_m)$ be the cosines of the principal angles between F and $\tilde{G}$ and let $\tilde{S} = \mathrm{diag}(\tilde{\mu}_1, \dots, \tilde{\mu}_m)$ be the sines of the principal angles between F and $\tilde{G}$. Then

$$\|C^2 - \tilde{C}^2\| = \|S^2 - \tilde{S}^2\| \le \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|$$

in an arbitrary unitarily invariant norm.
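By Theorem 1, the bound in Theorem 3 holds in every unitarily invariant norm exactly when the absolute differences of the squared cosines are weakly majorized by the sines of the angles between the subspace and its perturbation. The following sketch tests that majorization on random real subspaces; the perturbation size 0.3, the dimensions, and the helper `cosines` are arbitrary choices for illustration, and a random test is of course no substitute for the proof in [8].

```python
import numpy as np

def cosines(X, Y):
    """Cosines of the principal angles between the column spaces of X and Y."""
    QX, _ = np.linalg.qr(X)
    QY, _ = np.linalg.qr(Y)
    return np.clip(np.linalg.svd(QX.T @ QY, compute_uv=False), 0.0, 1.0)

rng = np.random.default_rng(5)
n, m = 6, 3
violations = 0
for _ in range(500):
    F = rng.standard_normal((n, m))
    G = rng.standard_normal((n, m))
    G_t = G + 0.3 * rng.standard_normal((n, m))   # perturbed subspace

    c, c_t = cosines(F, G), cosines(F, G_t)
    lhs = np.sort(np.abs(c**2 - c_t**2))[::-1]              # |sigma_k^2 - sigma~_k^2|, decreasing
    rhs = np.sort(np.sqrt(1.0 - cosines(G, G_t) ** 2))[::-1]  # sines between G and G~, decreasing

    # Weak majorization: every partial sum of lhs is bounded by that of rhs.
    if np.any(np.cumsum(lhs) > np.cumsum(rhs) + 1e-8):
        violations += 1
```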

20

Asymptotic Inequalities

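Theorem 3 implies the following results. Since $C^2 - \tilde{C}^2 = (C - \tilde{C})(C + \tilde{C})$ and $C + \tilde{C} \to 2I$ as all principal angles approach zero, we have

$$\|C - \tilde{C}\| \lesssim \frac{1}{2} \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|,$$

and, by the same argument applied to $S^2 - \tilde{S}^2$, as all principal angles approach ninety degrees, we have

$$\|S - \tilde{S}\| \lesssim \frac{1}{2} \|\mathrm{diag}(\sin(\angle_1\{G, \tilde{G}\}), \dots, \sin(\angle_m\{G, \tilde{G}\}))\|.$$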

21

Changes in the Trial Subspace in the Rayleigh–Ritz Method

Concerning the perturbation of a subspace, an application is the analysis of

the influence of changes in a trial subspace in the Rayleigh–Ritz method.

Let A ∈ Mn×n be a Hermitian matrix and let X be an m–dimensional subspace of Mn. We can define an operator $\tilde{A} = P_X A|_X$ on X, where $P_X$ is the orthogonal projection onto X and $P_X A|_X$ denotes the restriction of $P_X A$ to X, as discussed in [12]. The eigenvalues of $\tilde{A}$ are called Ritz values,

$$\alpha_1 \ge \cdots \ge \alpha_m.$$

The Ritz values are also the eigenvalues of

$$Q_X^* A Q_X,$$

where the columns of $Q_X$ form an orthonormal basis for X, and the nonzero Ritz values are the nonzero eigenvalues of

$$P_X A P_X.$$
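The two matrix characterizations of the Ritz values can be compared directly. This is a minimal sketch with a random real symmetric matrix and a random trial subspace; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 2

B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                   # real symmetric test matrix
QX, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis of the trial subspace X
PX = QX @ QX.T                                      # orthogonal projector onto X

# Ritz values: eigenvalues of the m x m matrix Q_X^* A Q_X, in decreasing order.
ritz = np.sort(np.linalg.eigvalsh(QX.T @ A @ QX))[::-1]

# Eigenvalues of the n x n matrix P_X A P_X: the Ritz values plus n - m zeros.
eig_proj = np.sort(np.linalg.eigvalsh(PX @ A @ PX))
expected = np.sort(np.concatenate([ritz, np.zeros(n - m)]))

lam = np.linalg.eigvalsh(A)   # eigenvalues of A in increasing order
```

The Ritz values always lie between the smallest and largest eigenvalues of A, which the arrays `ritz` and `lam` make easy to confirm.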

22


One of the main results of [1, 10] is the following theorem.

Theorem 4 Let X and Y both be m–dimensional subspaces of Rn, and let α1 ≥ · · · ≥ αm and β1 ≥ · · · ≥ βm denote the Ritz values for A with respect to X and Y, i.e., the α's and β's are the stationary values of the Rayleigh quotient on the subspaces X and Y, respectively. Then

$$\max_{j=1,\dots,m} |\alpha_j - \beta_j| \le (\lambda_{\max} - \lambda_{\min}) \sin(\angle\{X, Y\}), \qquad (12)$$

where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of A.

23


One of the key results of [10][Theorem 10], that also involves unitarily

invariant norms, is given below.

Theorem 5 Let A ∈ Mn×n be a symmetric real-valued matrix and let X and Y both be m–dimensional subspaces of Mn, and let α1 ≥ · · · ≥ αm and β1 ≥ · · · ≥ βm denote the Ritz values for A with respect to X and Y, i.e., the α's and β's are the stationary values of the Rayleigh quotient on the subspaces X and Y, respectively. Then

$$\|\mathrm{diag}(\alpha_1, \dots, \alpha_m) - \mathrm{diag}(\beta_1, \dots, \beta_m)\| \le C \, \|\mathrm{diag}(\sin(\angle_1\{X, Y\}), \dots, \sin(\angle_m\{X, Y\}))\|$$

in an arbitrary unitarily invariant norm, where $C = \sqrt{2} (\lambda_{\max} - \lambda_{\min})$.

24


Interestingly, we can use Theorem 3 to prove Theorem 5 with a reduced

constant, if A is an orthogonal projector.

Theorem 6 Under the assumptions of Theorem 5, assuming that A is an orthogonal projector, the $\sqrt{2}$ factor in the constant C can be eliminated. Thus $C = \lambda_{\max} - \lambda_{\min} = 1$ for this case.

In fact, Theorem 3 is equivalent to Theorem 6.

25

Numerical Tests

We use a 4-dimensional vector space and 2-dimensional subspaces. There

are two principal angles between each pair of subspaces. We compute the

Ky–Fan Nk norms for k = 1, 2 and form the N1 ratio

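$$N_1 = \frac{\max\{|\alpha_1 - \beta_1|, |\alpha_2 - \beta_2|\}}{(\lambda_{\max} - \lambda_{\min}) \sin(\angle_2\{X, Y\})},$$

and the N2 ratio

$$N_2 = \frac{|\alpha_1 - \beta_1| + |\alpha_2 - \beta_2|}{(\lambda_{\max} - \lambda_{\min}) (\sin(\angle_1\{X, Y\}) + \sin(\angle_2\{X, Y\}))}.$$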

We plot the largest N1 (left) and N2 (right) ratios for 400,000 trials, each

trial involving a different random symmetric 4× 4 matrix. For each trial

(matrix), we vary the two principal angles from 0 to 90 degrees by 2 degree

increments.
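An experiment of this kind can be sketched as follows. Several details are assumptions made for this illustration rather than a description of the original test code: the subspaces are placed in a canonical position (X spanned by e1, e2, and Y rotated from X by the two prescribed principal angles), the angle grid starts at 2 degrees to avoid a zero denominator, and only 10 random matrices are used instead of 400,000.

```python
import numpy as np

def ritz_values(A, Q):
    """Ritz values of A on the subspace spanned by the orthonormal columns of Q."""
    return np.sort(np.linalg.eigvalsh(Q.T @ A @ Q))[::-1]

def ratios(A, spread, t1, t2):
    """N1 and N2 ratios for X = span{e1, e2} and Y at principal angles t1, t2 from X."""
    QX = np.eye(4)[:, :2]
    QY = np.zeros((4, 2))
    QY[0, 0], QY[2, 0] = np.cos(t1), np.sin(t1)   # first basis vector rotated by t1
    QY[1, 1], QY[3, 1] = np.cos(t2), np.sin(t2)   # second basis vector rotated by t2
    diff = np.abs(ritz_values(A, QX) - ritz_values(A, QY))
    s1, s2 = np.sin(t1), np.sin(t2)
    n1 = diff.max() / (spread * max(s1, s2))      # Ky Fan 1 ratio (largest angle)
    n2 = diff.sum() / (spread * (s1 + s2))        # Ky Fan 2 ratio (both angles)
    return n1, n2

rng = np.random.default_rng(7)
angles = np.deg2rad(np.arange(2, 91, 2))   # 2, 4, ..., 90 degrees
max_n1 = max_n2 = 0.0
for _ in range(10):
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2                      # random symmetric 4 x 4 matrix
    lam = np.linalg.eigvalsh(A)
    spread = lam[-1] - lam[0]              # lambda_max - lambda_min
    for t1 in angles:
        for t2 in angles:
            n1, n2 = ratios(A, spread, t1, t2)
            max_n1, max_n2 = max(max_n1, n1), max(max_n2, n2)
```

Theorem 4 guarantees that `max_n1` cannot exceed 1, and Theorem 5 guarantees that `max_n2` cannot exceed √2; whether `max_n2` also stays below 1 is the question the numerical tests are meant to probe.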

26

Numerical Tests

[Figure: two surface plots, titled "Ky Fan 1 − 4 x 4 Matrix" (left) and "Ky Fan 2 − 4 x 4 Matrix" (right). In each plot both horizontal axes are ticked from 0 to 80 and the vertical axis from 0.8 to 1.]

Figure 1: Ky–Fan N1 (left) and N2 (right) Ratios

27

Conclusions

1. We present a new result concerning perturbations of principal angles: the absolute values of the differences of the squares of the cosines/sines are weakly majorized by the sines of the angles between the original subspace and its perturbation, with a constant of one.

2. We prove a new result that the absolute value of the perturbations in the Ritz values is bounded by a constant times the gap between the original trial subspace and its perturbation, and the constant is sharp.

3. We generalize this result to unitarily invariant norms, but we have to increase the constant by a factor of √2.

4. Numerical results are consistent with our theorems and support our hypothesis that the √2 factor is artificial.

28

Conclusions

5. In general, utilizing all angles provides finer detail and information

concerning smaller changes, and is more natural for some applications,

e.g., in the analysis of Grassmannian manifolds [6], where the chordal

distance is the Frobenius norm of the sines of the principal angles.

29

References

[1] Merico E. Argentati. Principal Angles Between Subspaces as Related to Rayleigh

Quotient and Rayleigh Ritz Inequalities with Applications to Eigenvalue Accuracy and

an Eigenvalue Solver. PhD thesis, University of Colorado at Denver, 2003.

[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins

University Press, Baltimore, MD, third edition, 1996.

[3] G. H. Hardy, J. E. Littlewood, and G. Polya. Inequalities. Cambridge University Press,

Cambridge, 1959.

[4] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press,

Cambridge, 1990. Corrected reprint of the 1985 original.

[5] H. Hotelling. Relations between two sets of variates. Biometrika, 28:322–377, 1936.

[6] J. H. Conway, R. H. Hardin, and N. J. A. Sloane. Packing lines, planes, etc.: packings in

Grassmannian spaces. Experimental Mathematics, 5(2):139–159, 1996.

[7] T. Kato. Perturbation Theory for Linear Operators. Springer–Verlag, New York, NY,

corrected printing of the second edition, 1980.

[8] A. V. Knyazev and M. E. Argentati. Majorization for changes in principal angles between

subspaces. Draft Paper.


[9] A. V. Knyazev and M. E. Argentati. Principal angles between subspaces in an A-based

scalar product: Algorithms and perturbation estimates. SIAM Journal on Scientific

Computing, 23(6):2009–2041, 2002.

[10] A. V. Knyazev and M. E. Argentati. On proximity of Rayleigh quotients for different

vectors and Ritz values generated by different trial subspaces. Submitted to Linear

Algebra and its Applications, August 2004.

[11] A. W. Marshall and I. Olkin. Inequalities: theory of majorization and its applications,

volume 143 of Mathematics in Science and Engineering. Academic Press Inc. [Harcourt

Brace Jovanovich Publishers], New York, 1979.

[12] B. N. Parlett. The symmetric eigenvalue problem. Society for Industrial and Applied

Mathematics (SIAM), Philadelphia, PA, 1998. Corrected reprint of the 1980 original.

[13] Ji Guang Sun. Perturbation of angles between linear subspaces. J. Comput. Math.,

5(1):58–61, 1987.
