
Group Symmetry and Covariance Regularization

Parikshit Shah, University of Wisconsin

Joint work with Venkat Chandrasekaran


Motivation

- Symmetry is common in science and engineering.

- Symmetry in statistical models.

- How to exploit known group structure?

- Message: Symmetry-aware methods provide huge statistical and computational gains.


Applications: MAR processes

  x_0 ∼ N(0, I)

  x_1 = A_0 x_0 + w_1,    x_2 = A_0 x_0 + w_2

  x_3 = A_1 x_1 + w_3,    x_6 = A_1 x_2 + w_6

Important class of stochastic models for multi-scale processes, e.g. oceanography, computer vision.

- What is the covariance among the leaf nodes? (A small simulation sketch follows below.)

- Symmetries: automorphism group of T_d.

- Formally: Σ is invariant under the action of Z_2 wr Z_2 wr ... wr Z_2.

- Can we exploit symmetries? Haar wavelet transform ...
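To make the symmetry concrete, here is a minimal simulation sketch, assuming scalar states, unit-variance Gaussian noise, illustrative coefficients a0, a1, and the four-leaf depth-2 tree: the empirical leaf covariance is (up to sampling noise) unchanged when the two leaves under a common parent, or the two subtrees, are swapped.

```python
import numpy as np

# Minimal sketch: scalar-state MAR process on a depth-2 binary tree.
# The coefficients a0, a1 and the unit-variance noises are illustrative assumptions.
rng = np.random.default_rng(0)
a0, a1 = 0.9, 0.7
n_samples = 200_000

x0 = rng.standard_normal(n_samples)
x1 = a0 * x0 + rng.standard_normal(n_samples)
x2 = a0 * x0 + rng.standard_normal(n_samples)
leaves = np.stack([a1 * x1 + rng.standard_normal(n_samples),   # children of x1
                   a1 * x1 + rng.standard_normal(n_samples),
                   a1 * x2 + rng.standard_normal(n_samples),   # children of x2
                   a1 * x2 + rng.standard_normal(n_samples)])

Sigma_leaves = np.cov(leaves)                  # 4 x 4 covariance of the leaf nodes
print(np.round(Sigma_leaves, 2))

# Swapping the two leaves under a common parent (or the two subtrees) leaves the
# covariance unchanged up to sampling noise: the Z2 wr Z2 symmetry.
swap_leaves = np.eye(4)[[1, 0, 2, 3]]          # permutation exchanging leaves 1 and 2
swap_subtrees = np.eye(4)[[2, 3, 0, 1]]        # permutation exchanging the two subtrees
for P in (swap_leaves, swap_subtrees):
    print(np.max(np.abs(P @ Sigma_leaves @ P.T - Sigma_leaves)))   # small
```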


Applications: Random Fields

- Physical phenomena: oceanography, hydrology, electromagnetics.

- Poisson's equation (stochastic input):

  ∇²φ(x) = f(x).

- Green's function: covariance process R(x_1, x_2).

- Symmetry: Laplacian, boundary conditions, R(x_1, x_2).

- Symmetry-preserving discretization.


Other Applications

- Partial exchangeability: clinical tests
  1. N patients, T groups with similar characteristics.
  2. X_1, ..., X_N physiological responses.
  3. Patients within the same group are exchangeable (but not i.i.d.).

- Cyclostationarity: periodic phenomena such as vibrations, sinusoidal components, ...

We model symmetry of the covariance Σ via G-invariance.

Problem statement: Given G, infer information about Σ.


Group Theory: Basics

- Finite group G = (G, ◦)
  1. G: a collection of permutations on [p].
  2. ◦: composition.
  3. Closure under composition.

- Examples
  1. Symmetric group: S_p.
  2. Cyclic group: Z/pZ.
  3. Cartesian products: G_1 × G_2.
  4. Other products: semidirect, wreath.


Group Theory: Group Action

Let G be a finite group (of permutation matrices), and let R^{p×p}_+ denote the PSD matrices. A group action is a map

  A : G × R^{p×p}_+ → R^{p×p}_+
      (Π_g, Σ) ↦ Π_g Σ Π_g^T.

- G "acts on" matrices by permuting indices.

Definition: Σ is G-invariant if

  Π_g Σ Π_g^T = Σ   for all Π_g ∈ G.

- Formalizes the notion of a symmetric model.

- Fixed-point subspace: W_G = {Σ : Π_g Σ Π_g^T = Σ for all Π_g ∈ G}.


Fixed Point Subspace Projection

- Statistical model: X ∼ N(0, Σ), Σ ∈ R^{p×p}.

- Symmetry: Σ ∈ W_G.

- Model selection: given i.i.d. samples X_1, ..., X_n, recover Σ.

  Σ^n := (1/n) ∑_{i=1}^n X_i X_i^T

- In the high-dimensional regime (n ≪ p), Σ^n is a poor estimate.

- G-empirical covariance (a projection sketch follows below):

  Σ̂ := P_G(Σ^n).

- Main contribution: statistical analysis of this estimator.
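As a concrete illustration of the estimator, here is a minimal numpy sketch, assuming G is the cyclic group and computing P_G(Σ^n) by Reynolds averaging (the formula P_G(Σ^n) = (1/|G|) ∑_g Π_g Σ^n Π_g^T appears later in the talk); the dimensions and the ground-truth Σ are illustrative choices.

```python
import numpy as np

def cyclic_group(p):
    """Permutation matrices of the cyclic group Z/pZ (illustrative choice of G)."""
    shift = np.roll(np.eye(p), 1, axis=0)                 # one-step cyclic shift
    return [np.linalg.matrix_power(shift, k) for k in range(p)]

def project_fixed_point(S, group):
    """Reynolds averaging: P_G(S) = (1/|G|) sum_g Pi_g S Pi_g^T."""
    return sum(Pi @ S @ Pi.T for Pi in group) / len(group)

p, n = 50, 20                                             # high-dimensional regime: n << p
rng = np.random.default_rng(0)
G = cyclic_group(p)

# Illustrative G-invariant, positive definite ground truth.
M = rng.standard_normal((p, p))
Sigma = project_fixed_point(M @ M.T / p + np.eye(p), G)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S_n = X.T @ X / n                                         # sample covariance Sigma^n
S_hat = project_fixed_point(S_n, G)                       # G-empirical covariance P_G(Sigma^n)

print("||Sigma - Sigma^n||      =", np.linalg.norm(Sigma - S_n, 2))
print("||Sigma - P_G(Sigma^n)|| =", np.linalg.norm(Sigma - S_hat, 2))
```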


Fixed Point Projection: An Example

- MAR process invariant w.r.t. Z_2 wr Z_2 wr ... wr Z_2.

(Figure: depth-2 binary tree with a root, two internal nodes, and four leaves.)

  Σ = [ a  b  c  c ]        T = [ 1/2   1/2   1/√2    0   ]
      [ b  a  c  c ]            [ 1/2   1/2  -1/√2    0   ]
      [ c  c  a  b ]            [ 1/2  -1/2    0    1/√2  ]
      [ c  c  b  a ]            [ 1/2  -1/2    0   -1/√2  ]

  T*ΣT = diag(λ_1, λ_2, λ_3, λ_3)   (one eigenvalue appears with multiplicity two).

- How to compute the fixed-point subspace projection?

- Use the Haar wavelet transform T:

  P_G(Σ^n) = T D(T* Σ^n T) T*,

  where D(·) retains the block-diagonal part in the transformed basis (a numerical check follows below).
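A quick numerical check of the example above; the particular values of a, b, c are illustrative assumptions.

```python
import numpy as np

# Haar-type transform for the four-leaf example (columns: global average,
# between-subtree contrast, and two within-subtree contrasts).
s = 1 / np.sqrt(2)
T = np.array([[0.5,  0.5,  s,  0],
              [0.5,  0.5, -s,  0],
              [0.5, -0.5,  0,  s],
              [0.5, -0.5,  0, -s]])

a, b, c = 3.0, 1.0, 0.5            # illustrative entries of the invariant Sigma
Sigma = np.array([[a, b, c, c],
                  [b, a, c, c],
                  [c, c, a, b],
                  [c, c, b, a]])

D = T.T @ Sigma @ T                # diagonal in the symmetry-adapted basis
print(np.round(D, 10))
print(np.allclose(D, np.diag(np.diag(D))))   # True: off-diagonal terms vanish
```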


Statistical gains: Convergence in spectral norm

- ‖Σ − Σ^n‖ ≤ δ w.h.p. if n = O(p/δ²).

- However, ‖Σ − P_G(Σ^n)‖ ≤ δ w.h.p. if n = O(log p/δ²) for G = cyclic, symmetric.

- Proof: the Fourier transform diagonalizes circulant matrices (see the sketch below).

- How do we generalize?
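A minimal sketch of that proof ingredient, assuming an illustrative circulant (cyclic-invariant) matrix: the unitary DFT diagonalizes it exactly, so in the Fourier basis a cyclic-invariant Σ has p scalar "blocks".

```python
import numpy as np

p = 8
rng = np.random.default_rng(1)

# An arbitrary cyclic-invariant (circulant) symmetric matrix: each row is a shift of the first.
first_row = rng.standard_normal(p)
C = np.array([np.roll(first_row, k) for k in range(p)])
C = (C + C.T) / 2                                  # symmetrize; still circulant

F = np.fft.fft(np.eye(p)) / np.sqrt(p)             # unitary DFT matrix
D = F.conj().T @ C @ F
print(np.allclose(D, np.diag(np.diag(D)), atol=1e-10))   # True: D is diagonal
```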


Group Theory: Representation

- G-invariant matrices can be simultaneously block-diagonalized:

  T* M T = diag(M_1, ..., M_{|I|}),    M_i = diag(B_i, ..., B_i),

  where I indexes the (active) irreducible representations, s_i is the dimension of B_i, and m_i is the multiplicity of B_i.

- Theorem: ‖Σ − P_G(Σ^n)‖ ≤ δ w.h.p. provided

  n = O( max{ max_{i∈I} s_i/(m_i δ²), max_{i∈I} log p/(m_i δ²) } ).


Statistical gains: Convergence in ℓ∞ norm

- ‖Σ − Σ^n‖_{ℓ∞} ≤ O(√(log p / n)).

- ‖Σ − P_G(Σ^n)‖_{ℓ∞} ≤ O(√(log p / (p n))) for G = cyclic.

- Proof idea: Reynolds averaging

  P_G(Σ^n) = (1/|G|) ∑_{g∈G} Π_g Σ^n Π_g^T

  ⇒ average over edge orbits.

- For the cyclic group, edge orbits are of size p.

- How do we generalize?


Edge Orbit Parameters

- Combinatorial parameters:

  The edge orbit of (i, j) is O(i, j) := {(g(i), g(j)) | g ∈ G}.
  The degree d_ij is the maximum number of times any variable appears in O(i, j).

  1. O := min_{i,j} |O(i, j)|
  2. O_d := min_{i,j} |O(i, j)| / d_ij

- Theorem: We have w.h.p. that

  ‖Σ − P_G(Σ^n)‖_{ℓ∞} ≤ O( max{ √(log p / (n O)), log p / (n O_d) } ).

- Delicate issues: non-i.i.d. averaging, sample reuse (an orbit-enumeration sketch follows below).
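To make the combinatorial parameters concrete, here is a small enumeration sketch for the cyclic group; the group and the choice of p are illustrative assumptions.

```python
import itertools
import numpy as np

def edge_orbit(i, j, group_perms):
    """O(i, j) = {(g(i), g(j)) : g in G} for a group given as index permutations."""
    return {(g[i], g[j]) for g in group_perms}

p = 6
cyclic = [tuple((idx + k) % p for idx in range(p)) for k in range(p)]   # Z/pZ as permutations

orbit_sizes, degrees = [], []
for i, j in itertools.combinations_with_replacement(range(p), 2):
    orb = edge_orbit(i, j, cyclic)
    counts = np.bincount([v for e in orb for v in e], minlength=p)
    orbit_sizes.append(len(orb))
    degrees.append(counts.max())      # d_ij: max appearances of any variable in O(i, j)

O = min(orbit_sizes)
O_d = min(s / d for s, d in zip(orbit_sizes, degrees))
print(O, O_d)                         # for the cyclic group every edge orbit has size p
```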


Application: Covariance Estimation

- Covariance estimation: Bickel-Levina thresholding

  Σ̂ := threshold_t(Σ^n).

  If Σ has at most d nonzeros per row/column, then w.h.p.

  ‖Σ − Σ̂‖ < √(d² log p / n).

- Symmetry-aware thresholding: consider G = cyclic,

  Σ̂_G := threshold_t(P_G(Σ^n)).

  If Σ has at most d nonzeros per row/column, then w.h.p.

  ‖Σ − Σ̂_G‖ < √(d² log p / (p n)).

- The rates on the previous slides give results for general groups (a thresholding sketch follows below).
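A minimal sketch of the two estimators for the cyclic case; the hard-thresholding rule, the threshold level t, and the data-generating covariance are illustrative assumptions rather than the tuning prescribed by Bickel-Levina.

```python
import numpy as np

def hard_threshold(S, t):
    """Zero out off-diagonal entries with magnitude below t."""
    out = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(out, np.diag(S))
    return out

def project_cyclic(S):
    """Reynolds averaging over the cyclic group: average over simultaneous row/column shifts."""
    p = S.shape[0]
    return sum(np.roll(np.roll(S, k, axis=0), k, axis=1) for k in range(p)) / p

p, n, t = 50, 25, 0.3                                   # illustrative sizes and threshold
rng = np.random.default_rng(0)

# Sparse circulant covariance: d = 3 nonzeros per row, invariant under cyclic shifts.
first_row = np.zeros(p); first_row[0], first_row[1], first_row[-1] = 2.0, 0.4, 0.4
Sigma = np.array([np.roll(first_row, k) for k in range(p)])

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S_n = X.T @ X / n

err_plain = np.linalg.norm(Sigma - hard_threshold(S_n, t), 2)                  # plain thresholding
err_sym   = np.linalg.norm(Sigma - hard_threshold(project_cyclic(S_n), t), 2)  # symmetry-aware
print(err_plain, err_sym)
```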


Application: Gaussian Graphical Model Selection

- Zeros of Σ^{-1} encode conditional independence relations.

- ℓ1-regularized log-likelihood [Yuan and Lin; Ravikumar et al.]:

  Θ̂ := argmin_{Θ ∈ S^p_++}  tr(Σ^n Θ) − log det(Θ) + μ_n ‖Θ‖_{ℓ1}.

  Θ̂ and Σ^{-1} have the same zero pattern w.h.p. if n = O(d² log p), where d is the degree of the graph.

- If Σ is G-invariant for G = cyclic:

  Θ̂_G := argmin_{Θ ∈ S^p_++ ∩ W_G}  tr(Σ^n Θ) − log det(Θ) + μ_n ‖Θ‖_{ℓ1}.

  Θ̂_G and Σ^{-1} have the same zero pattern w.h.p. if n = O(d² log p / p).

- Again, the rates on the previous slides give the scaling for general groups (a graphical-lasso sketch follows below).
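As a rough illustration only: the sketch below runs the ℓ1-regularized estimator on the projected covariance P_G(Σ^n) instead of imposing the constraint Θ ∈ W_G inside the solver, and uses scikit-learn's graphical_lasso as the optimizer. Both of these choices, the cycle-graph model, and the regularization level alpha are assumptions made for the sketch, not the exact formulation analyzed on this slide.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def project_cyclic(S):
    """Reynolds averaging over the cyclic group (same helper as in the thresholding sketch)."""
    p = S.shape[0]
    return sum(np.roll(np.roll(S, k, axis=0), k, axis=1) for k in range(p)) / p

p, n, alpha = 30, 40, 0.2
rng = np.random.default_rng(0)

# Cycle-graph precision matrix (circulant, hence cyclic-invariant).
first_row = np.zeros(p); first_row[0], first_row[1], first_row[-1] = 1.5, -0.6, -0.6
Theta = np.array([np.roll(first_row, k) for k in range(p)])
Sigma = np.linalg.inv(Theta)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S_n = X.T @ X / n

_, Theta_hat   = graphical_lasso(S_n, alpha=alpha)                  # plain l1 estimator
_, Theta_hat_G = graphical_lasso(project_cyclic(S_n), alpha=alpha)  # symmetry-aware variant

true_support = np.abs(Theta) > 1e-12
print("plain support recovered:    ", np.array_equal(np.abs(Theta_hat)   > 1e-3, true_support))
print("symmetric support recovered:", np.array_equal(np.abs(Theta_hat_G) > 1e-3, true_support))
```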


Computational Gains

- When T is known, P_G(·) is efficiently computable.

- Exploiting symmetries in convex optimization: if the objective and constraint functions are G-invariant, then a solution lies in the fixed-point subspace
  ⇒ reduction in problem size
  ⇒ improved numerical conditioning.

- For example (a parameter-counting sketch follows below):

  argmin_{Θ ∈ S^p_++ ∩ W_G}  tr(Σ^n Θ) − log det(Θ) + μ_n ‖Θ‖_{ℓ1}
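A back-of-the-envelope sketch of the size reduction for the cyclic case, where W_G is the set of symmetric circulant matrices: the free variables drop from p(p+1)/2 to about p/2, and the decision variable can be parameterized by its first row alone. The parameterization below is an illustrative construction, not the solver used in the talk.

```python
import numpy as np

def circulant_from_first_row(r):
    """Symmetric circulant matrix in W_G (cyclic case) determined by its first row."""
    p = len(r)
    return np.array([np.roll(r, k) for k in range(p)])

p = 50
unrestricted = p * (p + 1) // 2      # free entries of a generic symmetric p x p matrix: 1275
restricted = p // 2 + 1              # distinct entries of a symmetric circulant first row: 26
print(unrestricted, restricted)

# A cyclic-invariant symmetric Theta is determined by a palindromic first row.
r = np.zeros(p); r[0], r[1], r[-1] = 1.5, -0.6, -0.6
Theta = circulant_from_first_row(r)
print(np.allclose(Theta, Theta.T))   # True: symmetric, and shift-invariant by construction
```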


Experiments

Gaussian model invariant with respect to the cyclic group, p = 50.

(Figure: Frobenius norm error, spectral norm error, and ℓ∞ norm error versus number of samples, comparing Σ^n with P_G(Σ^n).)

Inverse covariance corresponding to a cycle graph, invariant with respect to the cyclic group, p = 50.

(Figure: probability of correct model selection versus number of samples, p = 50, comparing the log det estimator based on P_G(Σ^n) with the one based on Σ^n.)


Conclusion

- Statistical models with symmetries.

- Fixed-point projection as a means of regularization.

- Improved rates for several model selection and estimation tasks.

- Computational benefits.

- Current efforts: approximately symmetric models.

http://arxiv.org/abs/1111.7061