
Page 1 (source: ethz.ch/content/dam/ethz/special-interest/math/mathematical...)

Joint variable and rank selection for parsimonious estimation of high dimensional matrices

Florentina Bunea
Department of Statistical Science
Cornell University

High-dimensional Problems in Statistics Workshop
ETH, September 2011

Florentina Bunea (Department of Statistical Science, Cornell University): Joint variable and rank selection for parsimonious estimation of high dimensional matrices

Page 2

1 Framework and motivation

2 Joint Rank and Row Selection (JRRS) Methods
  The construction of the one-step JRRS estimator
  Row and rank sparsity oracle inequalities via one-step JRRS
  One-step JRRS to select the best estimator from a finite list

3 Two-step JRRS estimators
  Rank Constrained Group Lasso (RCGL)
  Adaptive RCGL for joint row and rank selection
  Row and rank sparsity oracle inequalities via two-step JRRS

4 Numerical performance and examples

5 Summary


Page 3

A rank and row sparse model

Model: Y = XA + E; E is the noise matrix.

Data: m × n matrix Y and m × p matrix X .

Target: p × n matrix A ←→ pn unknown parameters

Rank of A is r ≤ n ∧ p. Number of non-zero rows of A is |J| ≤ p.

Row and Rank Sparse Target ←→ r(|J| + n − r) free parameters.

Full rank + all rows + large n and p = Hopeless, if m small.
Low rank + small |J| = HOPE, if m small.

Estimate A under joint rank and row constraints.
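As an aside (not on the slides), the free-parameter count is easy to sanity-check numerically; the values below match the large-p simulation setup used later in the talk (p = 100, n = 10, |J| = 15, r = 2):

```python
# Free parameters of a p x n matrix A with |J| non-zero rows and rank r:
# writing A = UV' with U supported on the |J| active rows and fixing the
# usual rotational redundancy leaves r * (|J| + n - r) free parameters.

def free_params(J_size, n, r):
    """Free-parameter count of a rank-r matrix with J_size active rows."""
    return r * (J_size + n - r)

p, n, J_size, r = 100, 10, 15, 2
print(free_params(J_size, n, r))  # 2 * (15 + 10 - 2) = 46 free parameters
print(p * n)                      # versus 1000 unconstrained entries
```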

Page 4

Why rank and row sparse Y = XA + E ?

• Multivariate response regression

Measure n response variables for m subjects: Yi ∈ Rn, 1 ≤ i ≤ m.

Measure p predictor variables for m subjects: Xi ∈ Rp, 1 ≤ i ≤ m.

No (rank/row) constraints on A ⇐⇒ n separate univariate regressions.

Zero rows in A ⇐⇒ Not all predictors in the model.

Low rank of A ⇐⇒ Only few orthogonal scores relevant.

Goal: Estimation tailored to row and rank sparsity

Use only a subset of the predictors to construct a few scores, with high predictive power, under JOINT rank and row restrictions on A.


Page 5

Why row and rank sparse Y = XA + E? (cont'd)

• Supervised row and rank sparse PCA.

• Provides framework for row and rank sparse PCA and CCA.

• Building block in functional data analysis (with predictors).

Y = matrix of discretized trajectories for n subjects;
X = matrix of basis functions evaluated at discrete data points, plus possibly other predictors of interest.

• Building block in multiple time series analysis.

(Macro-economics and forecasting)

Y = matrix of n time series observed over m time periods (n types of interest rates);
X = Y in the past + other predictive time series (other potentially connected macro-economic factors).


Page 6

A historical perspective on sparse Y = XA + E

Rank Sparse Models

• Reduced-Rank Regression: Y = XA + E, rank(A) = k known.
Asymptotic results as m → ∞: Anderson (1951, 1999, 2002); Rao (1979); Reinsel and Velu (1998); Izenman (1975, 2008).

• Low rank approximations: Y = XA + E, rank(A) = r unknown.
Adaptive estimation + finite sample theoretical analysis, valid for any m, n, p and any r.

Rank Selection Criterion (RSC): Bunea, She and Wegkamp (2011).

Nuclear Norm Penalized (NNP) estimators: Candès and Plan; Tao (2009+); Rohde and Tsybakov (2011); Negahban and Wainwright (2011); Koltchinskii, Lounici and Tsybakov (2011).


Page 7

A historical perspective on sparse Y = XA + E (cont'd)

Row-Sparse Models

• Predictor Xj not in the model ⇐⇒ The j-th row of A is zero.

• Individual variable selection in multivariate response regression ⇐⇒ group selection in univariate response regression.

Popular method: the Group Lasso. Yuan and Lin (2006); Lounici, Pontil, Tsybakov and van de Geer (2011).

No rank and row sparse models; no adaptive methods tailored to both.


Page 8

Joint rank and row selection: JRRS

• Will develop new criteria for joint rank and predictor selection.

• r ≤ n ∧ |J|, rank(X ) = q ≤ m ∧ p; |J| ≤ p; r and J unknown.

• Optimal risk rates achievable adaptively by the G-Lasso, RSC/NNP and (to show) JRRS:

G-Lasso: |J|n, in row-sparse models
RSC or NNP: (p + n)r, in rank-sparse models
JRRS: (|J| + n)r, in rank and row-sparse models

• JRRS rates never worse and typically much better.
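To make the comparison concrete, here is a small numeric sketch using the large-p simulation setup that appears later in the talk (p = 100, n = 10, |J| = 15, r = 2):

```python
# Risk-rate comparison at p = 100, n = 10, |J| = 15, r = 2.
p, n, J_size, r = 100, 10, 15, 2

glasso_rate = J_size * n      # |J|n: adapts to row sparsity only
rsc_rate = (p + n) * r        # (p + n)r: adapts to rank sparsity only
jrrs_rate = (J_size + n) * r  # (|J| + n)r: adapts to both

print(glasso_rate, rsc_rate, jrrs_rate)  # 150 220 50
```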


Page 9

A penalized least squares estimator

• Y is a m × n matrix; X is a m × p matrix.

• ‖M‖²_F is the sum of the squared entries of M ∈ M_{p×n}.

• Candidate model B ∈ M_{p×n} has number of parameters

(n + |J(B)| − rank(B)) rank(B) ≤ (n + |J(B)|) rank(B).

The one-step JRRS estimator

Â = argmin_{B ∈ M_{p×n}} { ‖Y − XB‖²_F + cσ²(2n + |J(B)|) rank(B) }.

• Generalizes to multivariate response models the AIC/Cp-type criteria developed for univariate response.
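A minimal sketch of evaluating this criterion for a candidate B (illustrative only: the slides leave the constant c and the row-support tolerance unspecified):

```python
import numpy as np

def jrrs_criterion(Y, X, B, sigma2, c=2.0, tol=1e-10):
    """One-step JRRS objective: ||Y - XB||_F^2 + c*sigma^2*(2n + |J(B)|)*rank(B)."""
    n = Y.shape[1]
    resid = np.linalg.norm(Y - X @ B, "fro") ** 2
    J_size = int(np.sum(np.linalg.norm(B, axis=1) > tol))  # non-zero rows |J(B)|
    return resid + c * sigma2 * (2 * n + J_size) * np.linalg.matrix_rank(B)

# Toy check: the zero matrix is charged only for its residual.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
Y = rng.standard_normal((30, 4))
B0 = np.zeros((5, 4))
assert jrrs_criterion(Y, X, B0, sigma2=1.0) == np.linalg.norm(Y, "fro") ** 2
```

Minimizing this over all B gives the estimator above; as a later slide notes, exhaustive search over row supports is only feasible for small p.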


Page 10

More on the one-step JRRS penalty

• B ∈ M_{p×n} with non-zero rows indexed by J(B).

• JRRS penalty: pen(B) ∝ σ²(n + |J(B)|) rank(B).

• B ∈ M_{p×n} (ignoring non-zero rows), rank(X) = q.

• RSC penalty: pen(B) ∝ σ²(n + q) rank(B).

• Squared "error level" in the full model = E d₁²(PE) ≈ σ²(n + q),
for E with iid sub-Gaussian entries and P = X(X′X)⁻X′.

• JRRS generalizes RSC to allow for variable selection.

• To reduce rank and select variables, work with:

E d₁²(P_{J(B)} E) ≈ σ²(n + |J(B)|).
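The ≈ here is up to constants. A quick Monte Carlo illustration (not on the slides; dimensions and constants are arbitrary) shows d₁(PE) concentrating near σ(√n + √q) for a rank-q projection P:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q, sigma = 200, 40, 10, 1.0

# P projects onto a random q-dimensional subspace of R^m.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))
P = Q @ Q.T

# Largest singular value d_1(PE) over repeated Gaussian noise draws.
d1 = [np.linalg.svd(P @ (sigma * rng.standard_normal((m, n))),
                    compute_uv=False)[0]
      for _ in range(200)]

avg, level = np.mean(d1), sigma * (np.sqrt(n) + np.sqrt(q))
assert 0.5 * level < avg < 1.5 * level  # so d_1^2(PE) is of order sigma^2 (n + q)
```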


Page 11

Oracle-type bounds for the risk of the one-step JRRS

• rank(A) = r , non-zero rows of A with indices in J(A) = J.

Adaptation to Row and Rank Sparsity via one-step JRRS

For all A and X,

E[‖XÂ − XA‖²] ≲ inf_B [ ‖XA − XB‖² + σ²(n + |J(B)|) r(B) ] ≲ σ²(n + |J|) r.

• RHS = the best bias-variance trade-off across B.

• Â is adaptive: it mimics the behavior of an optimal estimator computed knowing r and J.
Minimax rate, under suitable conditions.

• Bound valid for any m, n, p.


Page 12

Select the best from a finite list

• If p > 20, JRRS estimation over all B becomes computationally intractable.

• B = {B₁, . . . , B_L} = finite (large) collection of (random) matrices with different sparsity patterns; may depend on the data X and Y.

Optimal selection from a finite list via JRRS

For all A and X,

E[‖XÂ − XA‖²] ≲ inf_{1≤j≤L} [ ‖XA − XB_j‖² + σ²(n + |J(B_j)|) r(B_j) ],

where Â = argmin_{B ∈ B} { ‖Y − XB‖²_F + cσ²(2n + |J(B)|) rank(B) }.


Page 13

Rank Constrained Group Lasso: main building block

• One-step JRRS penalty pen(B) ∝ (n + |J(B)|) rank(B).
J(B) forces complete enumeration; for large p that's a problem!

• Idea: use the convex relaxation ‖B‖_{2,1} = Σ_{j=1}^p ‖b_j‖₂.

• Set λ_k ∝ σ √(k d₁²(X)), for each k.

B̂_k = argmin_{rank(B) ≤ k} { ‖Y − XB‖²_F + λ_k ‖B‖_{2,1} }.

• B̂_k is a Rank-Constrained G-Lasso (RCGL). Other "group" penalties are possible.
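For intuition only, here is a naive way to approximate a RCGL solution: alternate a gradient step on the squared loss, a row-wise group soft-threshold, and a rank-k SVD truncation. This is a hypothetical sketch, not the efficient algorithm of Bunea, She and Wegkamp (2011):

```python
import numpy as np

def rcgl_sketch(Y, X, k, lam, n_iter=500):
    """Heuristic for min ||Y - XB||_F^2 + lam * ||B||_{2,1} s.t. rank(B) <= k."""
    B = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)     # 1/Lipschitz of the loss
    for _ in range(n_iter):
        G = B - step * (2 * X.T @ (X @ B - Y))       # gradient step
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        B = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * G
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        s[k:] = 0.0                                  # truncate to rank <= k
        B = (U * s) @ Vt
    return B

# Row- and rank-sparse toy problem: 5 active rows, rank 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
A = np.zeros((20, 4))
A[:5] = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
Y = X @ A + 0.1 * rng.standard_normal((50, 4))
B = rcgl_sketch(Y, X, k=2, lam=5.0)
assert np.linalg.matrix_rank(B) <= 2  # the rank constraint is enforced exactly
```

For k = n ∧ p the truncation is inactive and each iteration is a plain G-Lasso step; for λ = 0 it is a projected reduced-rank fit, matching the two special cases noted on the next slide.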


Page 14

• B̂_k = argmin_{rank(B) ≤ k} { ‖Y − XB‖²_F + λ_k ‖B‖_{2,1} }.

• For k = n ∧ p, the estimator B̂_k is the G-Lasso.
• For λ = 0, the estimator B̂_k is a reduced-rank estimator.

• Otherwise, B̂_k is a synthesis of the two; a new algorithm is needed.
Efficient algorithm: Bunea, She and Wegkamp (2011).

• Works in high dimensions.


Page 15

Two-step JRRS: Method 1

Method 1

Step 1. Use the Rank Selection Criterion (RSC) to estimate r consistently by r̂.

Step 2. Compute the Rank Constrained G-Lasso estimator B̂_k with k = r̂ to obtain the final estimator B̃ = B̂_r̂.

Major Practical Advantage: Easy tuning, backed up by theory.

• For Step 1: The same tuning parameter of RSC gives the best MSE and the correct rank. Can use CV safely; other alternatives exist.

• For Step 2: We want the best MSE; CV is safe.
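Step 1 can be sketched as follows: count the singular values of the projected fit PY that exceed a threshold of order σ(√n + √q). This is a simplified reading of RSC; the constant c below is illustrative:

```python
import numpy as np

def rsc_rank(Y, X, sigma, c=2.0):
    """Estimate rank(A): singular values of PY above c*sigma*(sqrt(n)+sqrt(q))."""
    P = X @ np.linalg.pinv(X)          # projection onto the column space of X
    q = np.linalg.matrix_rank(X)
    mu = c * sigma * (np.sqrt(Y.shape[1]) + np.sqrt(q))
    d = np.linalg.svd(P @ Y, compute_uv=False)
    return int(np.sum(d > mu))

# Toy example with a strong rank-2 signal and low noise.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 10))
A = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 10))  # rank 2
Y = X @ A + 0.1 * rng.standard_normal((100, 10))
r_hat = rsc_rank(Y, X, sigma=0.1)
```

Step 2 then fits the RCGL with k = r_hat.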


Page 16

Two-step JRRS: Method 2

Method 2

Step 1. Pre-specify a grid of values Λ for λ. Use RCGL to construct

B = {B̂_{k,λ} : k ∈ {1, . . . , q}, λ ∈ Λ}.

Step 2. Compute

B̂ = argmin_{B ∈ B} { ‖Y − XB‖²_F + pen(B) },

with pen(B) ∝ σ²(n + |J(B)|) rank(B).

• Requires a 2-D grid search: more computationally involved than Method 1.


Page 17

Oracle-type bounds for the risk of the two-step JRRS

• Method 1 (RSC + RCGL) → B̃; Method 2 (RCGL + AIC-M) → B̂.

Adaptation to Row and Rank Sparsity via two-step JRRS

For all A and for X satisfying Assumption 1,

E[‖XB̂ − XA‖²] ≲ inf_B [ ‖XA − XB‖² + σ²(n + |J(B)|) r(B) ] ≲ σ²(n + |J(A)|) r(A).

If, in addition, d_r(XA) > 2√2 σ(√n + √q), the same inequality holds for B̃.

• RHS = the best bias-variance trade-off across all matrices B.

• B̃ and B̂ are adaptive: they mimic the behavior of an optimal estimator computed knowing r(A) and J(A).

• Bound valid for any m, n, p; computationally efficient.


Page 18

Mild conditions on the design matrix

Assumption 1

There exists a set J ⊂ {1, . . . , p} and a number δ_J > 0 such that

(1/m) ‖XB‖²_F ≥ δ_J Σ_{j∈J} ‖b_j‖²₂, for all B = [b₁ · · · b_p]ᵀ ∈ R^{p×n}.

• Only a sub-matrix of X′X has a non-zero smallest eigenvalue. Mild condition.
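One way to read the assumption numerically (an illustrative computation, not on the slides): minimizing ‖XB‖²_F over the rows of B outside J projects X_J off the column space of X_{J^c}, so the best constant is δ_J = λ_min((1/m) X_J′(I − P_{J^c})X_J):

```python
import numpy as np

def best_delta_J(X, J):
    """Largest delta_J with (1/m)||XB||_F^2 >= delta_J * sum_{j in J} ||b_j||^2."""
    m, p = X.shape
    mask = np.zeros(p, dtype=bool)
    mask[np.asarray(J)] = True
    XJ, XJc = X[:, mask], X[:, ~mask]
    R = XJ - XJc @ np.linalg.pinv(XJc) @ XJ   # X_J projected off col(X_{J^c})
    return float(np.linalg.eigvalsh(R.T @ R / m)[0])  # smallest eigenvalue

# A Gaussian design with m >> p satisfies Assumption 1 at a small J.
rng = np.random.default_rng(4)
X = rng.standard_normal((100, 20))
assert best_delta_J(X, np.arange(5)) > 0.0
```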


Page 19

Large p - small m numerical performance comparison

• m = 30, |J| = 15, p = 100, n = 10, r = 2, σ2 = 1.

• Performance comparison between: rank and row reduction via RSC→RCGL and G-LASSO→RSC; only row via G-LASSO; and only rank via RSC.

• All optimally tuned on a very large independent set.

Method       MSE
RSC→RCGL     363
G-LASSO→RSC  402
G-LASSO      511
RSC          1905


Page 20

Large m - small p numerical performance comparison

• m = 100, |J| = 15, p = 25, n = 25, r = 5, σ2 = 1.

• Performance comparison between: rank and row reduction via RSC→RCGL and G-LASSO→RSC; only row via G-LASSO; and only rank via RSC.

• All optimally tuned on a very large independent set.

Method       MSE
RSC→RCGL     8.1
G-LASSO→RSC  8.1
RSC          11.5
G-LASSO      17.7


Page 21

A study of the effect of HIV-infection on human cognitive abilities

• HIV-Neuroimaging laboratory at Brown University, PI R. Cohen.

• m = 62 HIV+ patients, also infected with Hepatitis C, and with a history of drug abuse.

• n = 13 neuro-cognitive indices (NCIs) from five domains: attention/working memory, speed of information processing, psychomotor abilities, executive function, and learning and memory.

• p = 234 predictors: (a) clinical and demographic predictors, and (b) brain volumetric and diffusion tensor imaging (DTI) derived measures of several white-matter regions of interest, such as fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity, along with all volumetrics × DTI interactions.


Page 22

RSC and JRRS: two rank-1 models

• Both methods: One new predictive score S .

• Left = RSC; MSE = 193; S = linear combination of all p = 234 predictors.

• Right = JRRS; MSE = 138; S = linear combination of only |J| = 10 predictors.

[Figure: estimated weights of the rank-1 predictive score S plotted against the original p = 234 predictors (axes: "Original predictors" vs. "Weights"); the left panel (RSC) assigns weight to all predictors, the right panel (JRRS) concentrates weight on 10 predictors.]


• JRRS selected rank 1 and only 10 predictors.

• Education is one of them, confirming past findings.

• The fractional anisotropy at the corpus callosum stands out among the very many DTI-derived measures in terms of predictive power.

• A new finding in the lab, and its first quantitative confirmation.


Summary

Methods                          Adaptation to    Assumptions                     Restrictions
                                 RR-sparsity      on X and/or A                   on p
----------------------------------------------------------------------------------------------
One-step JRRS (AIC-M)            Yes              None                            p ≤ 20
Two-step JRRS1 (RSC → RCGL)      Yes              Restricted Eigenvalue;          None
                                                  d_r(XA) > "noise level"
Two-step JRRS2 (RCGL → AIC-M)    Yes              Restricted Eigenvalue           None
GL → RSC                         Yes              Mutual coherence et al.;        None
                                                  min_j ‖a_j‖_2 > "noise level"

• RSC → RCGL: easy to tune in practice; backed up by theory. Best!
• RCGL → AIC-M: tuning requires a search over a 2-D grid. Second best!

• GL → RSC: (1) Most restrictive theoretical assumptions;

(2) Requires tuning for consistent group selection, an open problem!
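The two-step recipe compared above (a group-penalized fit that selects rows, followed by a rank-constrained refit on the selected predictors) can be sketched in plain NumPy. This is an illustrative sketch on simulated data, not the authors' exact RCGL/RSC implementation; the penalty level `lam` and the simulation setup are assumptions made for the demo.

```python
import numpy as np

def group_lasso_rows(Y, X, lam, n_iter=500):
    """Row selection by proximal gradient descent for
    min_A 0.5 * ||Y - X A||_F^2 + lam * sum_j ||a_j||_2,
    where a_j is the j-th row of A (a predictor is kept or dropped as a block)."""
    p, m = X.shape[1], Y.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    A = np.zeros((p, m))
    for _ in range(n_iter):
        B = A - step * (X.T @ (X @ A - Y))      # gradient step on the quadratic term
        norms = np.linalg.norm(B, axis=1)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        A = B * shrink[:, None]                 # row-wise (block) soft-thresholding
    return A

def reduced_rank_refit(Y, X, rows, r):
    """Refit by least squares on the selected rows, then truncate the
    fitted matrix X A to rank r via an SVD (reduced-rank regression)."""
    Xs = X[:, rows]
    A_ls = np.linalg.lstsq(Xs, Y, rcond=None)[0]
    U, s, Vt = np.linalg.svd(Xs @ A_ls, full_matrices=False)
    fit_r = (U[:, :r] * s[:r]) @ Vt[:r]         # best rank-r approximation of the fit
    A_full = np.zeros((X.shape[1], Y.shape[1]))
    A_full[rows] = np.linalg.lstsq(Xs, fit_r, rcond=None)[0]
    return A_full

# Simulated jointly row- and rank-sparse model Y = X A + E,
# with A supported on |J| = 5 rows and rank(A) = 2.
rng = np.random.default_rng(0)
n, p, m, r = 100, 30, 10, 2
J = [0, 1, 2, 3, 4]
A_true = np.zeros((p, m))
A_true[J] = 3.0 * rng.normal(size=(len(J), r)) @ rng.normal(size=(r, m))
X = rng.normal(size=(n, p))
Y = X @ A_true + 0.1 * rng.normal(size=(n, m))

A_gl = group_lasso_rows(Y, X, lam=5.0)
rows = np.where(np.linalg.norm(A_gl, axis=1) > 1e-8)[0]    # estimated row support
A_hat = reduced_rank_refit(Y, X, rows, r)
err = np.linalg.norm(X @ A_hat - X @ A_true) ** 2 / n      # per-sample prediction error
```

The refit step removes the shrinkage bias of the group penalty, which is why the two-step variants tune more easily than a single jointly penalized fit.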


Summary: Our contribution

Jointly rank and row-sparse models and their estimation

1 Introduced jointly rank- and row-sparse models.

2 Offered new procedures tailored to the new class of models.

3 Showed that the one-step JRRS is a theoretically optimal adaptive procedure: finite sample oracle inequalities for E‖XÂ − XA‖²_F, for all A and X.

4 Introduced computationally efficient two-step JRRS.

5 Two-step JRRS satisfy finite sample oracle inequalities under minimal conditions on X.

6 Guaranteed small E‖XÂ − XA‖²_F if A has low rank and few non-zero rows. The analysis is valid for all m, n, p, rank r and J; in particular, r and |J| can grow with m and n.
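In the notation used throughout the talk, the model class can be written as follows (a sketch; the precise definitions are in the technical report):

```latex
\[
  Y = XA + E, \qquad Y \in \mathbb{R}^{n \times m},\;
  X \in \mathbb{R}^{n \times p},\; A \in \mathbb{R}^{p \times m},
\]
\[
  J(A) = \{\, j : a_j \neq 0 \,\} \ \text{(the non-zero rows of } A\text{)},
  \qquad \operatorname{rank}(A) = r,
\]
with both $|J(A)|$ and $r$ allowed to be much smaller than $p$ and $\min(m, n)$, and allowed to grow with $m$ and $n$.
```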


Bibliography and acknowledgment

Talk based on

• Florentina Bunea, Yiyuan She and Marten Wegkamp. Joint variable and rank selection for parsimonious estimation of high dimensional matrices; Cornell University Technical Report, 2011.

• Florentina Bunea, Yiyuan She and Marten Wegkamp. Optimal selection of reduced rank estimators of high-dimensional matrices; Annals of Statistics, Vol. 39, 2011.

• Research partially supported by NSF-DMS 1007444.
