Framework and motivation
Joint Rank and Row Selection Methods
Two-step JRRS estimators
Numerical performance and examples
Summary
Joint variable and rank selection for parsimonious estimation of high dimensional matrices

Florentina Bunea
Department of Statistical Science
Cornell University

High-dimensional Problems in Statistics Workshop, ETH, September 2011
Florentina Bunea, Department of Statistical Science, Cornell University. Joint variable and rank selection for parsimonious estimation of high dimensional matrices.
1 Framework and motivation
2 Joint Rank and Row Selection (JRRS) Methods
  - The construction of the one-step JRRS estimator
  - Row and rank sparsity oracle inequalities via one-step JRRS
  - One-step JRRS to select the best estimator from a finite list
3 Two-step JRRS estimators
  - Rank Constrained Group Lasso (RCGL)
  - Adaptive RCGL for joint row and rank selection
  - Row and rank sparsity oracle inequalities via two-step JRRS
4 Numerical performance and examples
5 Summary
A rank and row sparse model
Model: Y = XA + E ; E noise matrix.
Data: m × n matrix Y and m × p matrix X .
Target: p × n matrix A ←→ pn unknown parameters
Rank of A is r ≤ n ∧ p. Number of non-zero rows of A is |J| ≤ p.

Row and Rank Sparse Target ←→ r(|J| + n − r) free parameters.

Full rank + all rows + large n and p = Hopeless, if m small.
Low rank + small |J| = HOPE, if m small.

Estimate A under joint rank and row constraints.
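The parameter count above can be checked directly. A minimal numpy sketch (dimensions borrowed from the talk's later simulation; all names hypothetical) builds a row- and rank-sparse A as U Vᵀ with U supported on |J| rows, and compares r(|J| + n − r) with the full pn count:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, r, J_size = 100, 10, 2, 15

# A = U V^T with the r columns of U supported on |J| rows:
# rank(A) <= r and at most |J| non-zero rows.
U = np.zeros((p, r))
J = rng.choice(p, size=J_size, replace=False)
U[J, :] = rng.standard_normal((J_size, r))
V = rng.standard_normal((n, r))
A = U @ V.T

assert np.linalg.matrix_rank(A) == r
assert (np.abs(A).sum(axis=1) > 0).sum() == J_size

# Free parameters of the sparse target vs. entries of a full A.
free = r * (J_size + n - r)
print(free, p * n)  # 46 vs 1000
```

With m small, fitting 46 free parameters is feasible where fitting 1000 is not, which is the point of the slide.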
Why rank and row sparse Y = XA + E ?
• Multivariate response regression
Measure n response variables for m subjects: Yi ∈ Rn, 1 ≤ i ≤ m.
Measure p predictor variables for m subjects: Xi ∈ Rp, 1 ≤ i ≤ m.
No (rank / row) constraints on A ⇐⇒ n separate univariate regressions.
Zero rows in A ⇐⇒ Not all predictors in the model.
Low rank of A ⇐⇒ Only few orthogonal scores relevant.
Goal: estimation tailored to row and rank sparsity.

Use only a subset of the predictors to construct a few scores, with high predictive power, under JOINT rank and row restrictions on A.
Why row and rank sparse Y = XA + E ? (cont'd)
• Supervised row and rank sparse PCA.
• Provides framework for row and rank sparse PCA and CCA.
• Building block in functional data analysis (with predictors).
Y = matrix of discretized trajectories for n subjects;
X = matrix of basis functions evaluated at discrete data points,
+ possibly other predictors of interest.
• Building block in multiple time series analysis.
(Macro-economics and forecasting)
Y = matrix of n time series observed over m time periods (n types of interest rates)
X = Y in the past + other predictive time series
(other potentially connected macro-economic factors).
A historical perspective on sparse Y = XA + E
Rank Sparse Models
• Reduced-Rank Regression: Y = XA + E, rank(A) = k = known.
  Asymptotic results as m → ∞: Anderson (1951, 1999, 2002);
  Rao (1979); Reinsel and Velu (1998); Izenman (1975, 2008).

• Low rank approximations: Y = XA + E, rank(A) = r = unknown.
  Adaptive estimation + finite sample theoretical analysis, valid for any m, n, p and any r.

  Rank Selection Criterion (RSC): Bunea, She and Wegkamp (2011).
  Nuclear Norm Penalized (NNP) estimators: Candès and Plan; Candès and Tao (2009+); Rohde and Tsybakov (2011); Negahban and Wainwright (2011); Koltchinskii, Lounici and Tsybakov (2011).
A historical perspective on sparse Y = XA + E (cont'd)
Row-Sparse Models
• Predictor Xj not in the model ⇐⇒ The j-th row of A is zero.

• Individual variable selection in multivariate response regression ⇐⇒ Group selection in univariate response regression.

Popular method: The Group Lasso. Yuan and Lin (2006); Lounici, Pontil, Tsybakov and van de Geer (2011).

No rank and row sparse models; no adaptive methods tailored to both.
Joint rank and row selection: JRRS
• Will develop new criteria for joint rank and predictor selection.

• r ≤ n ∧ |J|; rank(X) = q ≤ m ∧ p; |J| ≤ p; r and J unknown.

• Optimal risk rates achievable adaptively by the G-Lasso, RSC/NNP and (to show) JRRS:

  G-Lasso: |J|n, in row-sparse models
  RSC or NNP: (p + n)r, in rank-sparse models
  JRRS: (|J| + n)r, in rank and row-sparse models

• JRRS rates never worse and typically much better.
The construction of the one-step JRRS estimator
Row and rank sparsity oracle inequalities via one-step JRRS
One-step JRRS to select the best estimator from a finite list
A penalized least squares estimator
• Y is an m × n matrix; X is an m × p matrix.

• ‖M‖²_F is the sum of the squared entries of M ∈ M_{p×n}.

• Candidate model B ∈ M_{p×n} has number of parameters

  (n + |J(B)| − rank(B)) rank(B) ≤ (n + |J(B)|) rank(B).

The one-step JRRS estimator

  Â = argmin_{B ∈ M_{p×n}} { ‖Y − XB‖²_F + cσ²(2n + |J(B)|) rank(B) }.

• Generalizes to multivariate response models the AIC/Cp-type criteria developed for univariate response.
More on the one-step JRRS penalty
• B ∈ M_{p×n} with |J(B)| non-zero rows.

• JRRS penalty: pen(B) ∝ σ²(n + |J(B)|) rank(B).

• B ∈ M_{p×n} (ignoring non-zero rows), rank(X) = q.

• RSC penalty: pen(B) ∝ σ²(n + q) rank(B).

• Squared "error level" in the full model = E d₁²(PE) ≈ σ²(n + q),
  for E with iid sub-Gaussian entries and P = X(X′X)⁻X′.

• JRRS generalizes RSC to allow for variable selection.

• To reduce rank and select variables, work with

  E d₁²(P_{J(B)} E) ≈ σ²(n + |J(B)|).
Oracle-type bounds for the risk of the one-step JRRS
• rank(A) = r; non-zero rows of A with indices in J(A) = J.

Adaptation to Row and Rank Sparsity via one-step JRRS

For all A and X,

  E‖XÂ − XA‖²_F ≲ inf_B { ‖XA − XB‖²_F + σ²(n + |J(B)|) rank(B) } ≲ σ²(n + |J|) r.

• RHS = the best bias-variance trade-off across B.

• Â is adaptive: it mimics the behavior of an optimal estimator computed knowing r and J.
  Minimax rate, under suitable conditions.

• Bound valid for any m, n, p.
Select the best from a finite list
• If p > 20, JRRS estimation over all B becomes computationally intractable.

• B = {B₁, . . . , B_L} = finite (large) collection of (random) matrices with different sparsity patterns; may depend on the data X and Y.

Optimal selection from a finite list via JRRS

For all A and X,

  E‖XÂ − XA‖²_F ≲ min_{1≤j≤L} { ‖XA − XB_j‖²_F + σ²(n + |J(B_j)|) rank(B_j) },

where

  Â = argmin_{B ∈ B} { ‖Y − XB‖²_F + cσ²(2n + |J(B)|) rank(B) }.
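Selection from a finite list is then a one-line minimization of the penalized criterion over the candidates. A minimal sketch (names hypothetical, criterion inlined):

```python
import numpy as np

def jrrs_select(Y, X, candidates, sigma2, c=2.0, tol=1e-10):
    """Return the candidate minimizing ||Y-XB||_F^2 + c*sigma^2*(2n+|J(B)|)*rank(B)."""
    n = Y.shape[1]

    def crit(B):
        J_size = int((np.abs(B).sum(axis=1) > tol).sum())
        rank_B = int(np.linalg.matrix_rank(B, tol=tol))
        return np.linalg.norm(Y - X @ B, 'fro') ** 2 \
            + c * sigma2 * (2 * n + J_size) * rank_B

    return min(candidates, key=crit)
```

The oracle inequality above says the selected candidate performs, up to constants, as well as the best bias-variance trade-off in the list, even when the list was built from the data.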
Rank Constrained Group Lasso (RCGL)
Adaptive RCGL for joint row and rank selection
Row and rank sparsity oracle inequalities via two-step JRRS
Rank Constrained Group Lasso: main building block
• One-step JRRS penalty pen(B) ∝ (n + |J(B)|) rank(B).
  J(B) forces complete enumeration; for large p that's a problem!

• Idea: use the convex relaxation ‖B‖_{2,1} = Σ_{j=1}^p ‖b_j‖₂.

• Set λ_k ∝ σ √k d₁(X), for each k:

  B̂_k = argmin_{rank(B) ≤ k} { ‖Y − XB‖²_F + λ_k ‖B‖_{2,1} }.

• B̂_k is a Rank-Constrained G-Lasso (RCGL).
  Other "group" penalties possible.
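The talk points to an efficient algorithm in Bunea, She and Wegkamp (2011); as an illustration only (not the authors' algorithm), a heuristic projected proximal-gradient sketch alternates a gradient step on the least-squares term, a row-wise group soft-threshold for λ‖B‖_{2,1}, and a truncated-SVD projection onto rank ≤ k:

```python
import numpy as np

def rcgl_heuristic(Y, X, k, lam, n_iter=200):
    """Heuristic for argmin_{rank(B)<=k} ||Y - XB||_F^2 + lam*||B||_{2,1}.

    Proximal gradient plus a (non-convex) rank-k SVD projection;
    an illustrative sketch, not the algorithm cited in the talk.
    """
    p, n = X.shape[1], Y.shape[1]
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)   # 1/L for the smooth term
    B = np.zeros((p, n))
    for _ in range(n_iter):
        G = B - step * (2 * X.T @ (X @ B - Y))     # gradient step
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        with np.errstate(divide='ignore', invalid='ignore'):
            # row-wise group soft-threshold (prox of step*lam*||.||_{2,1})
            B = np.where(norms > 0,
                         np.maximum(0.0, 1 - step * lam / norms) * G, 0.0)
        U, d, Vt = np.linalg.svd(B, full_matrices=False)
        d[k:] = 0.0                                # project onto rank <= k
        B = (U * d) @ Vt
    return B
```

The soft-threshold zeroes out whole rows (variable selection) while the SVD truncation enforces the rank constraint, which is exactly the synthesis of G-Lasso and reduced-rank regression described on the next slide.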
• B̂_k = argmin_{rank(B) ≤ k} { ‖Y − XB‖²_F + λ_k ‖B‖_{2,1} }.

• For k = n ∧ p, the estimator B̂_k is the G-Lasso.

• For λ = 0, the estimator B̂_k is a reduced-rank estimator.

• Otherwise, B̂_k is a synthesis of the two; a new algorithm is needed.
  Efficient algorithm: Bunea, She and Wegkamp (2011).

• Works in high dimensions.
Two-step JRRS: Method 1
Method 1

Step 1. Use the Rank Selection Criterion (RSC) to estimate r consistently by r̂.

Step 2. Compute the Rank Constrained G-Lasso estimator B̂_k with k = r̂ to obtain the final estimator B̃ = B̂_r̂.

Major practical advantage: easy tuning, backed up by theory.

• For Step 1: the same tuning parameter of RSC gives the best MSE and the correct rank. Can use CV safely; other alternatives exist.

• For Step 2: we want the best MSE; CV is safe.
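Step 1 rests on thresholding singular values of the projected response. A minimal sketch of a rank estimate in the spirit of RSC (the threshold μ ∝ σ(√n + √q) and the constant are illustrative assumptions, not the talk's exact tuning):

```python
import numpy as np

def rsc_rank(Y, X, sigma, const=2.0):
    """Count singular values of PY above mu = const*sigma*(sqrt(n)+sqrt(q))."""
    P = X @ np.linalg.pinv(X)        # projection onto the column space of X
    q = int(np.round(np.trace(P)))   # rank(X)
    n = Y.shape[1]
    mu = const * sigma * (np.sqrt(n) + np.sqrt(q))
    d = np.linalg.svd(P @ Y, compute_uv=False)
    return int((d > mu).sum())
```

Since E d₁²(PE) ≈ σ²(n + q) for sub-Gaussian noise (the "error level" from the one-step penalty slide), singular values of PY below μ are indistinguishable from noise and are discarded.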
Two-step JRRS: Method 2
Method 2

Step 1. Pre-specify a grid of values Λ for λ. Use RCGL to construct

  B = {B̂_{k,λ} : k ∈ {1, . . . , q}, λ ∈ Λ}.

Step 2. Compute

  B̂ = argmin_{B ∈ B} { ‖Y − XB‖²_F + pen(B) },

with pen(B) ∝ σ²(n + |J(B)|) rank(B).

• Requires a 2-D grid search: more computationally involved than Method 1.
Oracle-type bounds for the risk of the two-step JRRS
• Method 1 (RSC → RCGL) → B̃; Method 2 (RCGL → AIC-M) → B̂.

Adaptation to Row and Rank Sparsity via two-step JRRS

For all A and for X satisfying Assumption 1,

  E‖XB̃ − XA‖²_F ≲ inf_B { ‖XA − XB‖²_F + σ²(n + |J(B)|) rank(B) } ≲ σ²(n + |J(A)|) rank(A).

If, in addition, d_r(XA) > 2√2 σ(√n + √q), the same inequality holds for B̂.

• RHS = the best bias-variance trade-off across all matrices B.

• B̃ and B̂ are adaptive: they mimic the behavior of an optimal estimator computed knowing r(A) and J(A).

• Bound valid for any m, n, p; computationally efficient.
Mild conditions on the design matrix
Assumption 1

There exist a set J ⊂ {1, . . . , p} and a number δ_J > 0 such that

  (1/m) ‖XB‖²_F ≥ δ_J Σ_{j∈J} ‖b_j‖²₂, for all B = [b₁ · · · b_p]ᵀ ∈ R^{p×n}.

• Only a sub-matrix of X′X needs a non-zero smallest eigenvalue. Mild condition.
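For B supported on the rows in J, a valid δ_J is the smallest eigenvalue of the Gram sub-matrix (1/m) X_J′ X_J, since then ‖XB‖²_F = ‖X_J B_J‖²_F ≥ λ_min(X_J′X_J) ‖B_J‖²_F. A minimal sketch computing that quantity (function name hypothetical):

```python
import numpy as np

def delta_J(X, J):
    """Smallest eigenvalue of (1/m) X_J' X_J: a valid delta_J for B supported on J."""
    XJ = X[:, list(J)]
    m = X.shape[0]
    return float(np.linalg.eigvalsh(XJ.T @ XJ / m).min())
```

Only this |J| × |J| sub-matrix needs to be well conditioned; the full p × p Gram matrix X′X/m may be singular when p > m.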
Large p - small m numerical performance comparison
• m = 30, |J| = 15, p = 100, n = 10, r = 2, σ2 = 1.
• Performance comparison between:
  rank and row reduction via RSC→RCGL and G-LASSO→RSC,
  only row via G-LASSO, and only rank via RSC.

• All optimally tuned on a very large independent set.

Method        MSE
RSC→RCGL      363
G-LASSO→RSC   402
G-LASSO       511
RSC          1905
Large m - small p numerical performance comparison
• m = 100, |J| = 15, p = 25, n = 25, r = 5, σ2 = 1.
• Performance comparison between:
  rank and row reduction via RSC→RCGL and G-LASSO→RSC,
  only row via G-LASSO, and only rank via RSC.

• All optimally tuned on a very large independent set.

Method        MSE
RSC→RCGL      8.1
G-LASSO→RSC   8.1
RSC          11.5
G-LASSO      17.7
A study of the effect of HIV-infection on human cognitive abilities
• HIV-Neuroimaging laboratory at Brown University, PI R. Cohen.
• m = 62 HIV+ patients, also infected with Hepatitis C, and with a history of drug abuse.

• n = 13 neuro-cognitive indices (NCIs) from five domains: attention/working memory, speed of information processing, psychomotor abilities, executive function, and learning and memory.

• p = 234 predictors: (a) clinical and demographic predictors and (b) brain volumetric and diffusion tensor imaging (DTI) derived measures of several white-matter regions of interest, such as fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity, along with all volumetrics × DTI interactions.
RSC and JRRS: two rank-1 models
• Both methods: One new predictive score S .
• Left = RSC; MSE = 193; S = lin. comb. of p = 234 predictors.
• Right = JRRS; MSE = 138; S = lin. comb. of |J| = 10 predictors.
[Figure: estimated weights on the original predictors for the two rank-1 fits. Left panel (RSC): non-zero weights on all p = 234 predictors, ranging roughly from −40 to 80. Right panel (JRRS): non-zero weights on only |J| = 10 predictors, ranging roughly from −10 to 5. Axis labels: "Original predictors" (x), "Weights" (y).]
Florentina Bunea Department of Statistical Science Cornell UniversityJoint variable and rank selection for parsimonious estimation of high dimensional matrices
Framework and motivation
Joint Rank and Row Selection Methods
Two-step JRRS estimators
Numerical performance and examples
Summary
• JRRS selected rank 1 and only 10 predictors.
• Education is one of them, confirming past findings.
• Among the very many DTI-derived measures, the fractional anisotropy at the corpus callosum stands out in terms of predictive power.
• This is a new finding for the lab, and its first quantitative confirmation.
Summary
| Methods | Adaptation to RR-sparsity | Assumptions on X and/or A | Restrictions on p |
|---|---|---|---|
| One-step JRRS (AIC-M) | Yes | None | p ≤ 20 |
| Two-step JRRS1 (RSC → RCGL) | Yes | Restricted Eigenvalue; d_r(XA) > "noise level" | None |
| Two-step JRRS2 (RCGL → AIC-M) | Yes | Restricted Eigenvalue | None |
| GL → RSC | Yes | Mutual coherence et al.; min_j ‖a_j‖_2 > "noise level" | None |
• RSC → RCGL: easy to tune in practice; backed up by theory. Best!
• RCGL → AIC-M: tuning requires a search over a 2-D grid. Second best!
• GL → RSC: (1) most restrictive theoretical assumptions; (2) requires tuning for consistent group selection — an open problem!
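As a rough illustration of the two-step idea compared above (row selection first, then a rank-constrained refit on the selected rows), here is a minimal numpy sketch. The proximal-gradient group-lasso solver, the penalty level `lam`, and the SVD-truncation refit are my own simplifications for exposition, not the RCGL implementation from the talk.

```python
import numpy as np

def group_lasso_rows(X, Y, lam, n_iter=500):
    """Row-wise (group) lasso for multivariate regression via proximal
    gradient descent: the l2 norm of each row of A is penalized, so entire
    rows (predictors) are zeroed out together."""
    p, m = X.shape[1], Y.shape[1]
    A = np.zeros((p, m))
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        G = X.T @ (X @ A - Y)                    # gradient of 0.5 * ||Y - XA||_F^2
        B = A - step * G
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        A = shrink * B                           # row-wise soft-thresholding
    return A

def reduced_rank_refit(X, Y, rows, r):
    """Rank-r least-squares refit on the selected rows: OLS fit, then
    project onto the top-r right singular factors of the fitted values."""
    Xs = X[:, rows]
    A_ols, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
    _, _, Vt = np.linalg.svd(Xs @ A_ols, full_matrices=False)
    return A_ols @ (Vt[:r].T @ Vt[:r])

# Simulated row- and rank-sparse model: only rows J of A are non-zero, rank r.
rng = np.random.default_rng(0)
n, p, m, r = 100, 20, 10, 2
J = [0, 1, 2]                                    # true non-zero rows
X = rng.standard_normal((n, p))
A_true = np.zeros((p, m))
A_true[J] = rng.standard_normal((len(J), r)) @ rng.standard_normal((r, m))
Y = X @ A_true + 0.1 * rng.standard_normal((n, m))

A1 = group_lasso_rows(X, Y, lam=5.0)             # step 1: select rows
rows = np.flatnonzero(np.linalg.norm(A1, axis=1) > 1e-8)
A2 = reduced_rank_refit(X, Y, rows, r)           # step 2: rank-constrained refit
```

The refit step removes the group-lasso shrinkage bias on the selected rows while enforcing the rank constraint, which is the motivation for the two-step schemes in the table above.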
Summary: Our contribution
Jointly rank and row-sparse models and their estimation
1 Introduced jointly rank and row sparse models.
2 Offered new procedures tailored to the new class of models.
3 Showed that the one-step JRRS is a theoretically optimal adaptive procedure: finite sample oracle inequalities for E‖XÂ − XA‖²_F, for all A and X.
4 Introduced computationally efficient two-step JRRS.
5 Two-step JRRS estimators satisfy finite sample oracle inequalities under minimal conditions on X.
6 Guaranteed small E‖XÂ − XA‖²_F if A has low rank and few non-zero rows. The analysis is valid for all m, n, p, rank r and J; in particular, r and |J| can grow with m and n.
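To make the setting concrete: writing q(B) for the number of non-zero rows of a candidate B and r(B) for its rank, the one-step JRRS estimator can be summarized schematically as follows (this paraphrase is mine; the exact penalty constants and any logarithmic terms are in the technical report):

```latex
% Model: Y \in \mathbb{R}^{n\times m}, X \in \mathbb{R}^{n\times p},
% Y = XA + E, with A row-sparse (support J) and of low rank r.
\widehat{A} \;\in\; \operatorname*{arg\,min}_{B \in \mathbb{R}^{p\times m}}
  \Bigl\{ \|Y - XB\|_F^2 + \operatorname{pen}(B) \Bigr\},
\qquad
\operatorname{pen}(B) \;\asymp\; \sigma^2 \, r(B)\,\bigl(q(B) + m\bigr).
% Heuristically, the penalty tracks the parameter count of a rank-r(B)
% matrix with q(B) non-zero rows and m columns, so it adapts jointly
% to rank sparsity and row sparsity.
```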
Bibliography and acknowledgment
Talk based on
• Florentina Bunea, Yiyuan She and Marten Wegkamp, Joint variable and rank selection for parsimonious estimation of high dimensional matrices; Cornell University Technical Report, 2011.
• Florentina Bunea, Yiyuan She and Marten Wegkamp, Optimal selection of reduced rank estimators of high-dimensional matrices; Annals of Statistics, Vol. 39, 2011.
• Research partially supported by NSF-DMS 1007444.