Sparsity with sign-coherent groups of variables via the cooperative-Lasso
TRANSCRIPT
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
Julien Chiquet¹, Yves Grandvalet², Camille Charbonnier¹
1 Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
2 Heudiasyc, CNRS & Université de Technologie de Compiègne
SSB – March 29, 2011
arXiv preprint.
http://arxiv.org/abs/1103.2697
R-package scoop.
http://stat.genopole.cnrs.fr/logiciels/scoop
cooperative-Lasso 1
Notations
Let
I Y be the output random variable,
I X = (X1, . . . , Xp) be the input random variables, where Xj is the jth predictor.
The data
Given a sample (yi, xi), i = 1, . . . , n of i.i.d. realizations of (Y, X), denote
I y = (y1, . . . , yn)ᵀ the response vector,
I xj = (xj1, . . . , xjn)ᵀ the vector of data for the jth predictor,
I X the n × p design matrix whose jth column is xj,
I D = {i : (yi, xi) ∈ training set},
I T = {i : (yi, xi) ∈ test set}.
Generalized linear models
Suppose Y depends linearly on X through a function g:
E(Y) = g(Xβ⋆).
We predict a response yi by ŷi = g(xi β̂) for any i ∈ T, where
β̂ = arg max_β ℓ_D(β) = arg min_β ∑_{i∈D} L_g(yi, xi β),
where L_g is a loss function depending on the function g. Typically,
I if Y is Gaussian and g = Id (OLS), L_g(y, xβ) = (y − xβ)²,
I if Y is binary and g : t ↦ (1 + e^{−t})^{−1} (logistic regression), L_g(y, xβ) = −(y · xβ − log(1 + e^{xβ})),
or any negative log-likelihood ℓ of an exponential family distribution.
Estimation and selection at the group level
1. Structure: the set I = {1, . . . , p} splits into a known partition:
I = ⋃_{k=1}^K Gk, with Gk ∩ Gℓ = ∅ for k ≠ ℓ.
2. Sparsity: the support S of β⋆ has few entries:
S = {i : β⋆_i ≠ 0}, with |S| ≪ p.
The group-Lasso estimator
Grandvalet and Canu '98, Bakin '99, Yuan and Lin '06
β̂group = arg min_{β∈Rp} −ℓ_D(β) + λ ∑_{k=1}^K wk ‖β_{Gk}‖.
I λ ≥ 0 controls the overall amount of penalty,
I wk > 0 adapts the penalty between groups (dropped hereafter).
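As a sanity check, the penalty above is straightforward to compute; a sketch with a hypothetical helper name, groups given as lists of coordinate indices:

```python
import numpy as np

def group_penalty(beta, groups, weights=None):
    # sum_k w_k * ||beta_{G_k}||_2, the group-Lasso penalty
    if weights is None:
        weights = [1.0] * len(groups)  # w_k = 1, as when weights are dropped
    return sum(w * np.linalg.norm(beta[list(g)])
               for g, w in zip(groups, weights))
```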
Toy example: the prostate dataset
Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients.
[Figure: Lasso — coefficient paths as a function of λ (log scale) for the 8 predictors.]
I lcavol log(cancer volume)
I lweight log(prostate weight)
I age age
I lbph log(benign prostatic hyperplasia amount)
I svi seminal vesicle invasion
I lcp log(capsular penetration)
I gleason Gleason score
I pgg45 percentage of Gleason scores 4 or 5
[Figure: hierarchical clustering of the 8 predictors (dendrogram).]
[Figure: group-Lasso — coefficient paths as a function of λ (log scale) for the 8 predictors.]
Application to splice site detection
Predict splice site status (0/1) from a sequence of 7 bases and their interactions.
[Figure: information content per base position.]
I order 0: 7 factors with 4 levels,
I order 1: C(7,2) factors with 4² levels,
I order 2: C(7,3) factors with 4³ levels,
I using dummy coding for factors, we form groups.
L. Meier, S. van de Geer, P. Bühlmann, 2008. The group-Lasso for logistic regression, JRSS series B.
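The factor/group bookkeeping above can be enumerated explicitly; a hypothetical sketch (index bookkeeping only, not the scoop package's API):

```python
from itertools import combinations

positions, levels = range(7), 4  # 7 base positions, 4 nucleotides
groups = [("order0", (i,), levels) for i in positions]
groups += [("order1", c, levels ** 2) for c in combinations(positions, 2)]
groups += [("order2", c, levels ** 3) for c in combinations(positions, 3)]

# one group per factor; its size is the number of dummy-coded columns
n_columns = sum(size for _, _, size in groups)
```

This gives 7 + 21 + 35 = 63 groups and 2604 dummy columns in total.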
[Figure: magnitudes of the selected groups (g18, g5, g4, g44, g54, g42, g49, g45, g61), colored by interaction order (0, 1, 2).]
Group-Lasso limitations
1. Not a single zero should belong to a group with non-zeros.
I Strong group sparsity (Huang and Zhang, '10, arXiv) establishes the conditions under which the group-Lasso outperforms the Lasso, and conversely.
2. No sign-coherence within groups.
I Required if groups gather consonant variables, e.g., groups defined by clusters of positively correlated variables.
The cooperative-Lasso
A penalty which assumes a sign-coherent group structure, that is to say, groups which gather either
I non-positive,
I non-negative,
I or null parameters.
Motivation: multiple network inference
experiment 1 experiment 2 experiment 3
inference inference inference
A group is a set of corresponding edges across tasks (e.g., the red or blue ones): sign-coherence matters!
J. Chiquet, Y. Grandvalet, C. Ambroise, 2010. Inferring multiple graphical structures, Statistics and Computing.
Motivation: joint segmentation of aCGH profiles
[Figure: log-ratios (CNVs) versus position on the chromosome.]
minimize_{β∈Rp} ‖β − y‖²,  s.t.  ∑_{i=1}^p |βi − βi−1| < s,
where
I y a vector in Rp,
I β a vector in Rp,
I a group gathers every position i across profiles.
Sign-coherence may avoid inconsistent variations across profiles.
K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
minimize_{β∈R^{n×p}} ‖β − Y‖²,  s.t.  ∑_{i=1}^p ‖βi − βi−1‖ < s,
where
I Y an n × p matrix of n profiles of size p,
I βi a size-n vector with the ith probes of the n profiles,
I a group gathers every position i across profiles.
Sign-coherence may avoid inconsistent variations across profiles.
K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
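The grouped total-variation constraint can be evaluated directly; a small sketch (helper name ours), with B the n × p matrix of profiles:

```python
import numpy as np

def joint_variation(B):
    # sum_i ||beta_i - beta_{i-1}||_2, where beta_i is the i-th column of B
    diffs = np.diff(B, axis=1)  # successive differences along positions
    return np.linalg.norm(diffs, axis=0).sum()
```

A breakpoint shared by all n profiles is counted once, through a single Euclidean norm, which is what encourages breakpoints at common positions.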
Outline
Definition
Resolution
Consistency
Model selection
Simulation studies
Sibling probe sets and gene selection
The cooperative-Lasso estimator
Definition
β̂coop = arg min_{β∈Rp} J(β), with J(β) = −ℓ_D(β) + λ‖β‖coop,
where, for any v ∈ Rp,
‖v‖coop = ‖v⁺‖group + ‖v⁻‖group = ∑_{k=1}^K ( ‖v⁺_{Gk}‖ + ‖v⁻_{Gk}‖ ),
and
I v⁺ = (v⁺_1, . . . , v⁺_p), with v⁺_j = max(0, v_j),
I v⁻ = (v⁻_1, . . . , v⁻_p), with v⁻_j = max(0, −v_j).
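Written out, the coop-norm splits each group into its positive and negative parts before taking group norms; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def coop_norm(v, groups):
    # ||v||_coop = sum_k ( ||v+_{G_k}||_2 + ||v-_{G_k}||_2 )
    v = np.asarray(v, dtype=float)
    vplus, vminus = np.maximum(v, 0.0), np.maximum(-v, 0.0)
    return sum(np.linalg.norm(vplus[list(g)]) + np.linalg.norm(vminus[list(g)])
               for g in groups)
```

On a sign-coherent group the coop-norm coincides with the group norm; with mixed signs it is larger, e.g. coop_norm([3, -4], [[0, 1]]) = 3 + 4 = 7, while the group norm is 5.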
A geometric view of sparsity
[Figure: level sets of ℓ(β1, β2) and the constraint region.]
minimize_{β1,β2} −ℓ(β1, β2) + λΩ(β1, β2)
⇕
maximize_{β1,β2} ℓ(β1, β2)  s.t.  Ω(β1, β2) ≤ c
Ball crafting: group-Lasso
Admissible set
I β = (β1, β2, β3, β4)ᵀ,
I G1 = {1, 2}, G2 = {3, 4}.
Unit ball
‖β‖group ≤ 1
[Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.]
Ball crafting: cooperative-Lasso
Admissible set
I β = (β1, β2, β3, β4)ᵀ,
I G1 = {1, 2}, G2 = {3, 4}.
Unit ball
‖β‖coop ≤ 1
[Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.]
Convex analysis: supporting hyperplane
A hyperplane supports a set iff
I the set is contained in one half-space,
I the set has at least one point on the hyperplane.
[Figure: convex and non-convex sets with supporting hyperplanes.]
There are supporting hyperplanes at all points of a convex set: they generalize tangents.
Convex analysis: dual cone and subgradient
Generalizes normals.
[Figure: supporting hyperplanes of the graph of a convex function at smooth and non-smooth points.]
g is a subgradient at x
⇕
the vector (g, −1) is normal to the supporting hyperplane at this point.
The subdifferential at x is the set of all subgradients at x.
Optimality conditions
Theorem
A necessary and sufficient condition for the optimality of β̂ is that the null vector 0 belongs to the subdifferential of the convex function J:
0 ∈ ∂_β J(β̂) = {v ∈ Rp : v = −∇_β ℓ(β̂) + λθ},
where θ ∈ Rp belongs to the subdifferential of the coop-norm. Define
φ_j(v) = (sign(v_j) v)⁺;
then θ is such that
∀k ∈ {1, . . . , K}, ∀j ∈ S_k(β̂):  θ_j = β̂_j / ‖φ_j(β̂_{Gk})‖,
∀k ∈ {1, . . . , K}, ∀j ∈ Sᶜ_k(β̂):  ‖φ_j(θ_{Gk})‖ ≤ 1.
We derive a subset algorithm to solve this problem (which you can enjoy in the paper and the package).
Linear regression with orthonormal design
Consider
β̂ = arg min_β { ½ ‖y − Xβ‖² + λΩ(β) },
with XᵀX = I. Hence (x_j)ᵀ(Xβ − y) = β_j − β̂ols_j, and
β̂ = arg min_β { ½ βᵀβ − βᵀβ̂ols + λΩ(β) }.
We may find a closed form of β̂ for, e.g.,
1. Ω(β) = ‖β‖lasso,
2. Ω(β) = ‖β‖group,
3. Ω(β) = ‖β‖coop.
∀j ∈ {1, . . . , p}:
β̂lasso_j = (1 − λ / |β̂ols_j|)⁺ β̂ols_j,  i.e.,  |β̂lasso_j| = (|β̂ols_j| − λ)⁺.
Fig.: Lasso as a function of the OLS coefficients.
∀k ∈ {1, . . . , K}, ∀j ∈ Gk:
β̂group_j = (1 − λ / ‖β̂ols_{Gk}‖)⁺ β̂ols_j,  i.e.,  ‖β̂group_{Gk}‖ = (‖β̂ols_{Gk}‖ − λ)⁺.
Fig.: Group-Lasso as a function of the OLS coefficients.
∀k ∈ {1, . . . , K}, ∀j ∈ Gk:
β̂coop_j = (1 − λ / ‖φ_j(β̂ols_{Gk})‖)⁺ β̂ols_j,  i.e.,  ‖φ_j(β̂coop_{Gk})‖ = (‖φ_j(β̂ols_{Gk})‖ − λ)⁺.
Fig.: Coop-Lasso as a function of the OLS coefficients.
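All three closed forms are thresholding rules applied to β̂ols; a NumPy sketch (function names ours, groups as index lists):

```python
import numpy as np

def prox_lasso(b_ols, lam):
    # coordinate-wise soft-thresholding
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def prox_group(b_ols, lam, groups):
    # shrink each group by its Euclidean norm
    b = np.zeros_like(b_ols)
    for g in groups:
        norm = np.linalg.norm(b_ols[list(g)])
        if norm > 0.0:
            b[list(g)] = max(1.0 - lam / norm, 0.0) * b_ols[list(g)]
    return b

def prox_coop(b_ols, lam, groups):
    # shrink the positive and negative parts of each group separately
    b = np.zeros_like(b_ols)
    for g in groups:
        sub = b_ols[list(g)]
        for part in (np.maximum(sub, 0.0), np.minimum(sub, 0.0)):
            norm = np.linalg.norm(part)
            if norm > 0.0:
                b[list(g)] += max(1.0 - lam / norm, 0.0) * part
    return b
```

With β̂ols = (3, −4) in one group and λ = 1, the group rule shrinks both coordinates by a common factor, while the coop rule thresholds each sign-coherent part on its own.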
Linear regression setup: technical assumptions
(A1) X and Y have finite fourth-order moments:
E‖X‖⁴ < ∞, E|Y|⁴ < ∞,
(A2) the covariance matrix Ψ = E[XXᵀ] ∈ Rp×p is invertible,
(A3) for every k = 1, . . . , K: if ‖(β⋆_{Gk})⁺‖ > 0 and ‖(β⋆_{Gk})⁻‖ > 0, then β⋆_j ≠ 0 for every j ∈ Gk.
(All sign-coherent groups are either included in or excluded from the true support.)
Irrepresentability condition
Define Sk = S ∩ Gk, the support within group k, and
[D(β)]jj = ‖[sign(β_j) β_{Gk}]⁺‖⁻¹.
Assume there exists η > 0 such that:
(A4) for every group Gk including at least one null coefficient,
max( ‖(Ψ_{SᶜkS} Ψ⁻¹_{SS} D(β⋆_S) β⋆_S)⁺‖ , ‖(Ψ_{SᶜkS} Ψ⁻¹_{SS} D(β⋆_S) β⋆_S)⁻‖ ) ≤ 1 − η,
(A5) for every group Gk intersecting the support and including either positive or negative coefficients, let νk be the sign of these coefficients (νk = 1 if ‖(β⋆_{Gk})⁺‖ > 0 and νk = −1 if ‖(β⋆_{Gk})⁻‖ > 0); then
νk Ψ_{SᶜkS} Ψ⁻¹_{SS} D(β⋆_S) β⋆_S ⪰ 0,
where ⪰ denotes componentwise inequality.
Consistency results
Theorem
If assumptions (A1–5) are satisfied, then for every sequence λn = λ0 n^{−γ} with γ ∈ (0, 1/2),
β̂coop →P β⋆  and  P(S(β̂coop) = S) → 1.
Asymptotically, the cooperative-Lasso is unbiased and enjoys exact support recovery (even when there are irrelevant variables within a group).
Sketch of the proof
1. Construct an artificial estimator β̄_S restricted to the true support S and extend it with 0 coefficients on Sᶜ.
2. Consider the event En on which β̄ satisfies the original optimality conditions. On En, β̄_S = β̂coop_S and β̂coop_{Sᶜ} = 0, by uniqueness.
3. We need to prove that lim_{n→∞} P(En) = 1.
4. Derive the asymptotic distribution of the derivative of the loss function Xᵀ(y − Xβ) from
I the CLT on second-order moments,
I the optimality conditions on β̄_S,
I the right choice of λn, which provides convergence in probability.
5. Assumptions (A4–5) require that the limits in probability satisfy the optimality constraints with strict inequalities.
6. As a result, the optimality conditions are satisfied (with non-strict inequalities) with probability tending to 1.
cooperative-Lasso 25
![Page 62: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/62.jpg)
Illustration
[Figure: coefficient paths as functions of log10(λ)]

Generate data y = Xβ? + σε, with
- β? = (1, 1, −1, −1, 0, 0, 0, 0),
- groups G1 = {1, 2, 3, 4} and G2 = {5, 6, 7, 8},
- σ = 0.1, R² ≈ 0.99, n = 20,
- the irrepresentability condition holds for the coop-Lasso but does not hold for the group-Lasso,
- averaged over 100 simulations.

Figure: 50% coverage intervals (upper/lower quartiles)
cooperative-Lasso 26
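To see why this toy setting separates the two penalties, the sketch below compares the group-Lasso penalty (sum of groupwise Euclidean norms) with the coop-Lasso penalty (groupwise norms of the positive and negative parts, as in the definition part of the talk) on β? above. The group structure G1 = {1,…,4}, G2 = {5,…,8} and all helper names are our assumptions.

```python
import math

# Toy setting of the illustration (0-based indexing: G1 = {0..3}, G2 = {4..7}).
beta_star = [1.0, 1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
groups = [range(0, 4), range(4, 8)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def group_penalty(beta):
    # group-Lasso penalty: sum of groupwise Euclidean norms
    return sum(norm([beta[j] for j in g]) for g in groups)

def coop_penalty(beta):
    # coop-Lasso penalty: groupwise norms of positive and negative parts
    tot = 0.0
    for g in groups:
        tot += norm([max(beta[j], 0.0) for j in g])
        tot += norm([max(-beta[j], 0.0) for j in g])
    return tot

print(group_penalty(beta_star))  # 2.0
print(coop_penalty(beta_star))   # 2*sqrt(2) ~ 2.83: positive and negative parts penalized separately
```

The sign-incoherent active group {1, 1, −1, −1} is thus charged more by the coop penalty than by the plain group penalty.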
![Page 63: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/63.jpg)
Illustration
Figure: group-Lasso, 50% coverage intervals (upper/lower quartiles)
cooperative-Lasso 26
![Page 64: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/64.jpg)
Illustration
Figure: coop-Lasso, 50% coverage intervals (upper/lower quartiles)
cooperative-Lasso 26
![Page 65: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/65.jpg)
Outline
Definition
Resolution
Consistency
Model selection
Simulation studies
Sibling probe sets and gene selection
cooperative-Lasso 27
![Page 66: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/66.jpg)
Optimism of the training error
- The training error:

  err = (1/|D|) Σ_{i∈D} L(y_i, x_i β̂).

- The test error ("extra-sample" error):

  Err_ex = E_{X,Y}[ L(Y, Xβ̂) | D ].

- The "in-sample" error:

  Err_in = (1/|D|) Σ_{i∈D} E_Y[ L(Y_i, x_i β̂) | D ].

Definition (Optimism)
Err_in = err + "optimism".
cooperative-Lasso 28
![Page 68: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/68.jpg)
Cp statistics
For squared-error loss (and some other losses),

Err_in = err + (2/|D|) Σ_{i∈D} cov(ŷ_i, y_i).

"The amount by which err underestimates the true error depends on how strongly y_i affects its own prediction. The harder we fit the data, the greater the covariance will be, thereby increasing the optimism." (ESL II, 5th printing)

Mallows' Cp statistic
For a linear regression fit ŷ with p inputs, Σ_{i∈D} cov(ŷ_i, y_i) = pσ², so that

Cp = err + 2 (df/|D|) σ², with df = p.
cooperative-Lasso 29
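The identity Σ cov(ŷ_i, y_i) = pσ² comes from writing ŷ = Hy with the OLS hat matrix H = X(XᵀX)⁻¹Xᵀ, so that Σ cov(ŷ_i, y_i) = σ² tr(H) = pσ². A small deterministic check (the toy design is our choice):

```python
# Check df = tr(H) = p for OLS, where H = X (XᵀX)^(-1) Xᵀ is the hat matrix.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]  # toy design, p = 2
n, p = len(X), len(X[0])

# XᵀX and its inverse (closed form for a 2x2 matrix)
a = sum(x[0] * x[0] for x in X)
b = sum(x[0] * x[1] for x in X)
d = sum(x[1] * x[1] for x in X)
det = a * d - b * b
inv = [[d / det, -b / det], [-b / det, a / det]]

# trace of H without forming it: tr(H) = sum_i x_iᵀ (XᵀX)^(-1) x_i
df = sum(
    sum(X[i][r] * inv[r][c] * X[i][c] for r in range(p) for c in range(p))
    for i in range(n)
)
print(round(df, 10))  # 2.0, i.e. df = p, hence Cp = err + 2 (df/|D|) sigma^2
```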
![Page 70: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/70.jpg)
Generalized degrees of freedom
Let ŷ(λ) = Xβ̂(λ) be the predicted values for a penalized estimator.

Proposition (Efron ('04) + Stein's lemma ('81))

df(λ) := (1/σ²) Σ_{i∈D} cov(ŷ_i(λ), y_i) = E_y[ tr( ∂ŷ(λ)/∂y ) ].

For the Lasso, Zou et al. ('07) show that

df_lasso(λ) = ‖β̂^lasso(λ)‖_0.

Assuming XᵀX = I, Yuan and Lin ('06) show for the group-Lasso that the trace term equals

df_group(λ) = Σ_{k=1}^K 1( ‖β̂^group_Gk(λ)‖ > 0 ) [ 1 + ( ‖β̂^group_Gk(λ)‖ / ‖β̂^ols_Gk‖ ) (p_k − 1) ].
cooperative-Lasso 30
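Both formulas are plug-in computations once the coefficients are fitted. A minimal sketch (all numbers purely illustrative; `beta_ols` plays the role of the OLS reference):

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

groups = [[0, 1, 2], [3, 4, 5]]                 # two groups, pk = 3 each
beta_hat = [0.5, 0.0, 0.25, 0.0, 0.0, 0.0]      # a lasso-type estimate
df_lasso = sum(1 for b in beta_hat if b != 0.0)  # the l0 "norm"

beta_grp = [0.6, 0.3, 0.2, 0.0, 0.0, 0.0]       # a group-lasso-type estimate
beta_ols = [0.8, 0.4, 0.4, 0.1, -0.2, 0.3]      # OLS reference estimate
df_group = 0.0
for g in groups:
    ng = norm([beta_grp[j] for j in g])
    if ng > 0:                                   # only active groups contribute
        df_group += 1 + (ng / norm([beta_ols[j] for j in g])) * (len(g) - 1)

print(df_lasso)             # 2
print(round(df_group, 3))   # 2.429
```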
![Page 73: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/73.jpg)
Approximated degrees of freedom for the coop-Lasso
Proposition
Assuming that the data are generated according to a linear regression model and that X is orthonormal, the following expression of df_coop(λ) is an unbiased estimate of df(λ):

df_coop(λ) = Σ_{k=1}^K  1{ ‖(β̂^coop_Gk(λ))_+‖ > 0 } [ 1 + (p_k+ − 1) ‖(β̂^coop_Gk(λ))_+‖ / ‖(β̂^ols_Gk)_+‖ ]
           + 1{ ‖(β̂^coop_Gk(λ))_−‖ > 0 } [ 1 + (p_k− − 1) ‖(β̂^coop_Gk(λ))_−‖ / ‖(β̂^ols_Gk)_−‖ ],

where p_k+ and p_k− are respectively the numbers of positive and negative entries in β̂^ols_Gk.
cooperative-Lasso 31
![Page 74: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/74.jpg)
Approximated degrees of freedom for the coop-Lasso

Proposition (ridge variant)
Under the same assumptions, with a ridge reference estimate β̂^ridge(γ) in place of the OLS estimate:

df_coop(λ) = Σ_{k=1}^K  1{ ‖(β̂^coop_Gk(λ))_+‖ > 0 } [ 1 + (p_k+ − 1)/(1 + γ) · ‖(β̂^coop_Gk(λ))_+‖ / ‖(β̂^ridge_Gk(γ))_+‖ ]
           + 1{ ‖(β̂^coop_Gk(λ))_−‖ > 0 } [ 1 + (p_k− − 1)/(1 + γ) · ‖(β̂^coop_Gk(λ))_−‖ / ‖(β̂^ridge_Gk(γ))_−‖ ],

where p_k+ and p_k− are respectively the numbers of positive and negative entries in β̂^ridge_Gk(γ).
cooperative-Lasso 31
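As with the group-Lasso, df_coop is a plug-in computation, now with separate terms for the positive and negative parts of each group. A sketch of the OLS-reference form (one group of four variables; all numbers are illustrative, not fitted values):

```python
import math

def pos(v): return [max(x, 0.0) for x in v]     # positive part, componentwise
def neg(v): return [max(-x, 0.0) for x in v]    # negative part, componentwise
def norm(v): return math.sqrt(sum(x * x for x in v))

groups = [[0, 1, 2, 3]]
beta_coop = [0.5, 0.2, -0.3, 0.0]   # a sign-mixed coop-Lasso estimate
beta_ols  = [0.7, 0.4, -0.5, -0.1]  # OLS reference estimate

df = 0.0
for g in groups:
    bc = [beta_coop[j] for j in g]
    bo = [beta_ols[j] for j in g]
    pk_plus  = sum(1 for x in bo if x > 0)   # positive entries of the reference
    pk_minus = sum(1 for x in bo if x < 0)   # negative entries of the reference
    if norm(pos(bc)) > 0:
        df += 1 + (pk_plus - 1) * norm(pos(bc)) / norm(pos(bo))
    if norm(neg(bc)) > 0:
        df += 1 + (pk_minus - 1) * norm(neg(bc)) / norm(neg(bo))

print(round(df, 3))  # 3.256: both half-groups are active and contribute
```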
![Page 75: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/75.jpg)
Approximated information criteria
Following Zou et al., we extend the Cp statistic to an "approximated" AIC,

AIC(λ) = ‖y − ŷ(λ)‖² / σ² + 2 df(λ),

and from the AIC it is a small step to the BIC:

BIC(λ) = ‖y − ŷ(λ)‖² / σ² + log(n) df(λ).

- K-fold cross-validation works well but is computationally intensive.
- It is required when the linear regression setup does not hold…
cooperative-Lasso 32
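In practice these criteria are evaluated along the regularization path and minimized over λ. A sketch (the rss and df values below are illustrative placeholders, not computed from data):

```python
import math

n, sigma2 = 100, 1.0
lambdas = [1.0, 0.5, 0.1, 0.01]
rss     = [120.0, 95.0, 88.0, 86.0]   # ||y - yhat(lambda)||^2 along the path
dfs     = [2.0, 4.0, 7.0, 12.0]       # df(lambda) along the path

aic = [r / sigma2 + 2 * d for r, d in zip(rss, dfs)]
bic = [r / sigma2 + math.log(n) * d for r, d in zip(rss, dfs)]
best_aic = lambdas[aic.index(min(aic))]
best_bic = lambdas[bic.index(min(bic))]
print(best_aic, best_bic)  # 0.1 0.5
```

As expected, the BIC penalizes df(λ) more heavily and therefore selects a sparser model (a larger λ) when the two criteria disagree.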
![Page 76: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/76.jpg)
Outline
Definition
Resolution
Consistency
Model selection
Simulation studies
Sibling probe sets and gene selection
cooperative-Lasso 33
![Page 77: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/77.jpg)
Revisiting Elastic-Net experiments (1)
[Figure: MSE boxplots for lasso, enet, group and coop]

Generate data y = Xβ? + σε, with
- β? = (0, …, 0, 2, …, 2, 0, …, 0, 2, …, 2), in four blocks of 10,
- G1 = {1, …, 10}, G2 = {11, …, 20}, G3 = {21, …, 30}, G4 = {31, …, 40},
- σ = 15, corr(x_i, x_j) = 0.5,
- training/validation/test = 100/100/400,
- averaged over 100 simulations.
cooperative-Lasso 34
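A sketch of this data generation; the slide only specifies pairwise correlation 0.5, so the equicorrelated design is built here from a shared latent factor (our construction):

```python
import random, math

random.seed(0)
n, p, sigma = 100, 40, 15.0
beta_star = [0.0] * 10 + [2.0] * 10 + [0.0] * 10 + [2.0] * 10

def draw_row():
    z0 = random.gauss(0, 1)  # shared factor
    # each x_j has variance 1 and pairwise correlation 0.5
    return [math.sqrt(0.5) * z0 + math.sqrt(0.5) * random.gauss(0, 1)
            for _ in range(p)]

X = [draw_row() for _ in range(n)]
y = [sum(xi * bi for xi, bi in zip(x, beta_star)) + sigma * random.gauss(0, 1)
     for x in X]
```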
![Page 78: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/78.jpg)
Revisiting Elastic-Net experiments (2)
[Figure: MSE boxplots for lasso, enet, group and coop]

Generate data y = Xβ? + σε, with
- β? = (3, …, 3, 0, …, 0), with 15 threes and 25 zeros,
- σ = 15,
- G1 = {1, …, 5}, G2 = {6, …, 10}, G3 = {11, …, 15}, G4 = {16, …, 40},
- x_j = Z1 + ε, Z1 ∼ N(0, 1), for all j ∈ G1,
- x_j = Z2 + ε, Z2 ∼ N(0, 1), for all j ∈ G2,
- x_j = Z3 + ε, Z3 ∼ N(0, 1), for all j ∈ G3,
- x_j ∼ N(0, 1), for all j ∈ G4,
- training/validation/test = 50/50/400,
- averaged over 100 simulations.
cooperative-Lasso 35
![Page 79: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/79.jpg)
Breiman's setup: simulation settings

A wave-like vector of parameters β?:
- p = 90 variables partitioned into K = 10 groups of size p_k = 9,
- 3 (partially) active groups, 7 groups of zeros,
- within an active group, β?_j ∝ (h − |5 − j|)_+ with h = 1, …, 5 and j = 1, …, 9 indexing positions within the group.

Figure: β? with h = 1, |S_k| = 1 non-zero coefficient in each active group.
cooperative-Lasso 36
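The wave-like β? can be sketched as follows. We assume the positive part (h − |5 − j|)_+ with j = 1, …, 9 indexing positions within each group (our reading of the slide, consistent with the |S_k| counts in the captions), and take the first three groups as active (an assumption; the slide does not say which groups are active):

```python
def wave_beta(h, n_groups=10, group_size=9, active=(0, 1, 2)):
    # beta_j proportional to (h - |5 - j|)_+ inside active groups, 0 elsewhere
    beta = []
    for k in range(n_groups):
        for j in range(1, group_size + 1):
            beta.append(max(h - abs(5 - j), 0) if k in active else 0)
    return beta

beta = wave_beta(h=3)
per_group = [sum(1 for b in beta[9 * k: 9 * k + 9] if b != 0)
             for k in range(10)]
print(per_group)  # [5, 5, 5, 0, 0, 0, 0, 0, 0, 0], i.e. |Sk| = 5 per active group
```

Varying h from 1 to 5 gives |S_k| = 1, 3, 5, 7, 9, matching the figure captions.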
![Page 80: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/80.jpg)
Figure: β? with h = 2, |Sk| = 3 non-zero coefficients in each active group.
cooperative-Lasso 36
![Page 81: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/81.jpg)
Figure: β? with h = 3, |Sk| = 5 non-zero coefficients in each active group.
cooperative-Lasso 36
![Page 82: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/82.jpg)
Figure: β? with h = 4, |Sk| = 7 non-zero coefficients in each active group.
cooperative-Lasso 36
![Page 83: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/83.jpg)
Figure: β? with h = 5, |Sk| = 9 non-zero coefficients in each active group.
cooperative-Lasso 36
![Page 84: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/84.jpg)
Breiman's setup: solution paths and signal recovery with the BIC choice

The signal is generated as:
- y = Xβ? + σε, with σ = 1 and n ranging from 30 to 500,
- rows of X drawn from N(0, Ψ) with Ψ_ij = ρ^{|i−j|} (ρ = 0.4 in the example),
- the magnitude of β? chosen so that R² ≈ 0.75.

Remark
The covariance structure is purposely disconnected from the group structure: none of the support recovery conditions is fulfilled.
cooperative-Lasso 37
![Page 85: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/85.jpg)
Breiman's setup: solution paths and signal recovery with the BIC choice

A single sample with n = 120.
cooperative-Lasso 37
![Page 86: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/86.jpg)
Breiman's setup: solution paths and signal recovery with the BIC choice

Figure: Lasso. Coefficient path β̂^lasso versus log10(λ) (left); true versus estimated signal over i = 1, …, 90 (right).
cooperative-Lasso 37
![Page 87: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/87.jpg)
Figure: Group-Lasso. Coefficient path β̂^group versus log10(λ) (left); true versus estimated signal over i = 1, …, 90 (right).
cooperative-Lasso 37
![Page 88: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/88.jpg)
Figure: Coop-Lasso. Coefficient path β̂^coop versus log10(λ) (left); true versus estimated signal over i = 1, …, 90 (right).
cooperative-Lasso 37
![Page 89: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/89.jpg)
Breiman's setup: errors as a function of the sample size n

Figure: prediction error (left) and sign error (right) versus n, for lasso, group and coop; h = 3, |S_k| = 5 (favoring the Lasso).
cooperative-Lasso 38
![Page 90: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/90.jpg)
Figure: prediction error (left) and sign error (right) versus n, for lasso, group and coop; h = 4, |S_k| = 7 (intermediate).
cooperative-Lasso 38
![Page 91: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/91.jpg)
Figure: prediction error (left) and sign error (right) versus n, for lasso, group and coop; h = 5, |S_k| = 9 (favoring the group-Lasso).
cooperative-Lasso 38
![Page 92: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/92.jpg)
Outline
Definition
Resolution
Consistency
Model selection
Simulation studies
Sibling probe sets and gene selection
cooperative-Lasso 39
![Page 93: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/93.jpg)
Robust microarray gene selection
Affymetrix chips typically contain multiple probe sets per gene, known as sibling probe sets.

Reasons (Li, Zhu, Cook, BMC Genomics 2008)
1. lack of knowledge: genome annotation maps probe sets to the same gene after chip design,
2. instability: probe sets cross-hybridize in an unpredictable manner,
3. designed on purpose: probe sets specific to an RNA variant (splicing).

At least two of these are good reasons to put sibling probe sets in the same group.
cooperative-Lasso 40
![Page 95: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/95.jpg)
Application: Basal tumor
Methodology
1. select a restricted number d of probes by differential analysis,
2. determine the genes associated with these d probes and retrieve all p probes related to those genes, regardless of their signal,
3. fit a model with group penalties, where groups are defined by genes.

Breast cancer data set
- 22,269 probes,
- n = 29 patients with basal tumors,
- predict the response to chemotherapy: pCR / not-pCR.
cooperative-Lasso 41
![Page 96: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/96.jpg)
Application: Basal tumor
Pretreatment
- order the p-values from the differential analysis (Jeanmougin et al., 2011),
- keep the d = 10 most differentially expressed probes,
- this corresponds to exactly 10 genes, for a total of 27 probes.

Methods compared
1. probes: logistic Lasso on the d = 10 most differentially expressed probes,
2. lasso: logistic Lasso on the p = 27 probes (no group effect),
3. group: logistic group-Lasso on the p = 27 probes (group effect),
4. coop: logistic coop-Lasso on the p = 27 probes (signed group effect).
cooperative-Lasso 42
![Page 97: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/97.jpg)
Results
| Gk (gene) | pk | probes | lasso | group | coop |
|---|---|---|---|---|---|
| frmd4b | 3 | 0.38 | 0.62 | 0.68 | 0.75 |
| rnps1 | 2 | 0 | 0 | 0 | 0 |
| phlda3 | 1 | 1.82 | 1.93 | 4.12 | 7.32 |
| tbc1d22a | 3 | 0 | 0 | 0 | 0 |
| ece1 | 2 | 0.89 | 0 | 0 | 1.87 |
| lzts1 | 6 | 1.34 | 1.57 | 1.15 | 0 |
| rpp38 | 1 | 0.95 | 0.90 | 1.92 | 3.66 |
| gtse1 | 5 | 0.88 | 0.85 | 1.21 | 0 |
| pak4 | 3 | 1.68 | 0.96 | 1.70 | 4.58 |
| chst10 | 1 | 0.79 | 0.36 | 1.08 | 2.50 |

Table: Genes corresponding to the probes selected by differential analysis, sizes of the groups of probes, and ℓ2-norm of each group of parameters for each estimate.
cooperative-Lasso 43
![Page 98: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/98.jpg)
Results

[Figure: ℓ2-norm per gene (frmd4b, rnps1, phlda3, tbc1d22a, ece1, lzts1, rpp38, gtse1, pak4, chst10)]

Figure: Lasso
cooperative-Lasso 44
![Page 99: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/99.jpg)
Results

[Figure: ℓ2-norm per gene (frmd4b, rnps1, phlda3, tbc1d22a, ece1, lzts1, rpp38, gtse1, pak4, chst10)]

Figure: Group-Lasso
cooperative-Lasso 44
![Page 100: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/100.jpg)
Results

[Figure: ℓ2-norm per gene (frmd4b, rnps1, phlda3, tbc1d22a, ece1, lzts1, rpp38, gtse1, pak4, chst10)]

Figure: Coop-Lasso
cooperative-Lasso 44
![Page 101: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/101.jpg)
Results

[Figure: cross-validated binomial deviance along the regularization path (‖β̂‖ on the x-axis) for probes, lasso, group and coop]

| method | CV(λ?) | CV? |
|---|---|---|
| probes | 0.511 | 0.474 |
| lasso | 0.513 | 0.499 |
| group | 0.430 | 0.372 |
| coop | 0.263 | 0.194 |

Table: Best average CV score CV(λ?) and averaged best CV score CV?.
cooperative-Lasso 45
![Page 102: Sparsity with sign-coherent groups of variables via the cooperative-Lasso](https://reader033.vdocuments.us/reader033/viewer/2022052410/5560de40d8b42a3d768b46b3/html5/thumbnails/102.jpg)
Conclusion
Summary
- A variant of the group-Lasso which assumes sign-coherent, possibly sparse, groups.
- The coop-Lasso comes with the "usual" accompanying tools:
  - a consistency theorem,
  - model selection criteria,
  - a subset algorithm,
  - the R package scoop.
- Very encouraging results on real genomic data.

Perspectives
- enhance the algorithms/implementation for large-scale experiments,
- deeper analysis in the gene selection framework,
- other applications in genomics (aCGH segmentation?).
cooperative-Lasso 46