Convex relaxation for Combinatorial Penalties
Guillaume Obozinski
Équipe Imagine
Laboratoire d'Informatique Gaspard Monge
École des Ponts ParisTech
Joint work with Francis Bach
Fête Parisienne in Computation, Inference and Optimization
IHÉS, March 20th 2013
Convex relaxation for Combinatorial Penalties 1/33
From sparsity...

Empirical risk: for w ∈ R^d,

    L(w) = (1/(2n)) ∑_{i=1}^n (y_i − x_i^⊤ w)^2.

Support of the model:

    Supp(w) = {i | w_i ≠ 0},  with  |Supp(w)| = ∑_{i=1}^d 1_{w_i ≠ 0}.

Penalization for variable selection:

    min_{w ∈ R^d} L(w) + λ |Supp(w)|

Lasso:

    min_{w ∈ R^d} L(w) + λ ‖w‖_1
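The two penalized objectives above differ only in how they charge the support. A minimal numerical sketch on synthetic data (all names and numbers here are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
w_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0])
y = X @ w_true  # noiseless responses, for simplicity

def emp_risk(w):
    # L(w) = (1/(2n)) sum_i (y_i - x_i^T w)^2
    r = y - X @ w
    return float(r @ r) / (2 * n)

def supp_size(w, tol=1e-12):
    # |Supp(w)| = number of nonzero coordinates of w
    return int(np.sum(np.abs(w) > tol))

lam = 0.1
obj_l0 = emp_risk(w_true) + lam * supp_size(w_true)      # combinatorial penalty
obj_l1 = emp_risk(w_true) + lam * np.abs(w_true).sum()   # Lasso relaxation
# At w_true the risk vanishes, so obj_l0 = 0.1 * 2 = 0.2
# and obj_l1 = 0.1 * 3.5 = 0.35.
```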
... to Structured Sparsity

The support is not only sparse: in addition, we have prior information about its structure.

Examples

The variables should be selected in groups.
The variables lie in a hierarchy.
The variables lie on a graph or network, and the support should be localized or densely connected on the graph.
Difficult inverse problem in Brain Imaging

(Figure: brain slice views at y = −84, x = 17, z = −13, estimated coefficients on a ±5·10⁻² scale; Scale 6, Fold 9.)

Jenatton et al. (2012)
Hierarchical Dictionary Learning

Figure: Hierarchical dictionary of image patches. Figure: Hierarchical topic model.

Mairal, Jenatton, Obozinski and Bach (2010)
Ideas in structured sparsity
Group Lasso and ℓ1/ℓp norm (Yuan and Lin, 2006)

Group Lasso: given G = {A_1, ..., A_m}, a partition of V := {1, ..., d}, consider

    ‖w‖_{ℓ1/ℓp} = ∑_{A∈G} δ_A ‖w_A‖_p.

Overlapping groups: direct extension of Jenatton et al. (2011).

Interesting induced structures:
→ induce patterns of rooted subtrees
→ induce "convex" patterns on a grid
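The ℓ1/ℓp norm on a partition can be computed directly; a small sketch, where the partition and the weights δ_A are made up for illustration:

```python
import numpy as np

def l1_lp_norm(w, groups, deltas, p=2):
    # ‖w‖_{ℓ1/ℓp} = Σ_{A∈G} δ_A ‖w_A‖_p, for G a partition of {0, ..., d-1}
    return sum(dA * np.linalg.norm(w[list(A)], ord=p)
               for A, dA in zip(groups, deltas))

w = np.array([3.0, 4.0, 0.0, 0.0, 5.0])
groups = [[0, 1], [2, 3], [4]]   # a partition of the 5 variables
deltas = [1.0, 1.0, 1.0]
val = l1_lp_norm(w, groups, deltas)
# ‖(3,4)‖_2 + ‖(0,0)‖_2 + ‖(5)‖_2 = 5 + 0 + 5 = 10
```

Because the penalty is an ℓ1 norm across groups, whole blocks of w are zeroed at once, which is what drives groupwise selection.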
Hierarchical Norms (Zhao et al., 2009; Bach, 2008)

(Figure: a tree with nodes 1–6.)

(Jenatton, Mairal, Obozinski and Bach, 2010a)

A covariate can only be selected after its ancestors.

Structure on parameters w: hierarchical penalization

    Ω(w) = ∑_{g∈G} ‖w_g‖_2,

where each group g in G is the set of descendants of some node in a tree.
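To see why a covariate can only enter after its ancestors, one can evaluate Ω on a small tree. The 6-node tree below is an assumed example (root 1 with children 2 and 3, etc.), not necessarily the one on the slide:

```python
import numpy as np

# Assumed tree: 1 → (2, 3), 2 → (4, 5), 3 → (6)
children = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

def descendants(node):
    out = [node]
    for c in children[node]:
        out += descendants(c)
    return out

def omega(w):
    # Ω(w) = Σ_g ‖w_g‖_2, one group per node g: the node plus its descendants
    # (w is indexed so that position i-1 holds the coefficient of node i)
    return sum(np.linalg.norm(w[[i - 1 for i in descendants(g)]])
               for g in children)

w_root = np.zeros(6); w_root[0] = 1.0   # only node 1 (the root) is active
w_leaf = np.zeros(6); w_leaf[3] = 1.0   # only node 4 (a leaf) is active
# omega(w_root) = 1: a single group (the root's) sees a nonzero entry.
# omega(w_leaf) = 3: the groups of nodes 1, 2 and 4 all see it, so a leaf
# is charged for its whole ancestor path.
```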
A new approach based on combinatorial functions
General framework

Let V = {1, ..., d}. Given a set function F : 2^V → R_+, consider

    min_{w ∈ R^d} L(w) + F(Supp(w)).

Examples of combinatorial functions

Use recursivity or counts of structures (e.g. trees) computed with dynamic programming.

Block-coding (Huang et al., 2011):

    G(A) = min_{B_1,...,B_k} F(B_1) + ... + F(B_k)  s.t.  B_1 ∪ ... ∪ B_k ⊇ A.

Submodular functions (work on convex relaxations by Bach (2010)).
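Block-coding can be evaluated by brute force on small instances; a sketch in which the block family and the cost F are invented for illustration:

```python
from itertools import combinations

def block_coding(A, blocks, F):
    # G(A) = min F(B_1) + ... + F(B_k)  s.t.  B_1 ∪ ... ∪ B_k ⊇ A,
    # minimized by brute force over all sub-families of `blocks`.
    A = frozenset(A)
    best = float("inf")
    for k in range(1, len(blocks) + 1):
        for combo in combinations(blocks, k):
            if A <= frozenset().union(*combo):
                best = min(best, sum(F(B) for B in combo))
    return best

blocks = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
F = len  # toy cost: the size of a block
g = block_coding({0, 2}, blocks, F)
# Cheapest cover of {0, 2}: {0,1} plus {1,2} (or {2,3}), total cost 2 + 2 = 4.
```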
A relaxation for F ...?

How to solve

    min_{w ∈ R^d} L(w) + F(Supp(w)) ?

→ Greedy algorithms
→ Non-convex methods
→ Relaxation

    |A|                          F(A)
    L(w) + λ |Supp(w)|           L(w) + λ F(Supp(w))
    ↓                            ↓ ?
    L(w) + λ ‖w‖_1               L(w) + λ ...?...
Previous relaxation result

Bach (2010) showed that if F is a submodular function, it is possible to construct the "tightest" convex relaxation of the penalty F for vectors w ∈ R^d such that ‖w‖_∞ ≤ 1.

Limitations and open issues:

The relaxation is defined on the unit ℓ_∞ ball.
It seems to implicitly assume that the w to be estimated lies in a fixed ℓ_∞ ball.
The choice of ℓ_∞ seems arbitrary.
The ℓ_∞ relaxation induces undesirable clustering artifacts in the absolute values of the coefficients.
What happens in the non-submodular case?
Penalizing and regularizing...

Given a function F : 2^V → R_+, consider for ν, µ > 0 the combined penalty

    pen(w) = µ F(Supp(w)) + ν ‖w‖_p^p.

Motivations

Compromise between variable selection and smooth regularization.
Required for F allowing large supports, such as A ↦ 1_{A ≠ ∅}.
Leads to a penalty which is coercive, so that a convex relaxation on R^d will not be trivial.
A convex and homogeneous relaxation

Looking for a convex relaxation of pen(w). Require as well that it is positively homogeneous → scale invariance.

Definition (Homogeneous extension of a function g)

    g_h : x ↦ inf_{λ>0} (1/λ) g(λx).

Proposition

The tightest convex positively homogeneous lower bound of a function g is the convex envelope of g_h.

This leads us to consider

    pen_h(w) = inf_{λ>0} (1/λ) (µ F(Supp(λw)) + ν ‖λw‖_p^p) ∝ Θ(w) := ‖w‖_p F(Supp(w))^{1/q},

with 1/p + 1/q = 1.
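For p = 2 (hence q = 2) the infimum over λ can be checked numerically: since Supp(λw) = Supp(w) for λ > 0, one minimizes (1/λ)(µF + νλ²‖w‖₂²), whose minimum is 2√(µνF)·‖w‖₂, a constant times Θ(w) = ‖w‖₂ F(Supp(w))^{1/2}. A sketch with arbitrary µ, ν, F:

```python
import numpy as np

mu, nu = 0.5, 2.0
w = np.array([1.0, 0.0, -3.0])
F_supp = 2.0                      # F evaluated at Supp(w) = {0, 2}
norm_w = np.linalg.norm(w)

# pen_h(w) = inf_{λ>0} (1/λ) (µ F(Supp(λw)) + ν ‖λw‖_2^2); only the
# norm term scales with λ, so evaluate on a fine geometric grid.
lams = np.geomspace(1e-4, 1e4, 400001)
vals = mu * F_supp / lams + nu * lams * norm_w**2
grid_min = float(vals.min())

closed_form = 2.0 * np.sqrt(mu * nu * F_supp) * norm_w   # ∝ Θ(w)
```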
Envelope of the homogeneous penalty Θ

Consider Ω_p with dual norm

    Ω*_p(s) = max_{A ⊂ V, A ≠ ∅} ‖s_A‖_q / F(A)^{1/q},

with unit ball

    B_{Ω*_p} := { s ∈ R^d | ∀ A ⊂ V, ‖s_A‖_q^q ≤ F(A) }.

Proposition

The norm Ω_p is the convex envelope (tightest convex lower bound) of the function w ↦ ‖w‖_p F(Supp(w))^{1/q}.

Proof. Denote Θ(w) = ‖w‖_p F(Supp(w))^{1/q}. Then

    Θ*(s) = max_{w ∈ R^d} w^⊤ s − ‖w‖_p F(Supp(w))^{1/q}
          = max_{A ⊂ V} max_{w_A ∈ R^A} w_A^⊤ s_A − ‖w_A‖_p F(A)^{1/q}
          = max_{A ⊂ V} ι_{‖s_A‖_q ≤ F(A)^{1/q}} = ι_{Ω*_p(s) ≤ 1}.
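The dual norm can be evaluated by enumerating supports when d is small. For instance, for the cardinality function F(A) = |A| with q = 2, ‖s_A‖₂ ≤ √|A|·max_i |s_i|, so the ratio is maximized by a single coordinate and Ω* reduces to ‖s‖_∞ (consistent with the relaxation Ω_p being the ℓ1 norm in that case, as for the Lasso). A brute-force sketch:

```python
import numpy as np
from itertools import combinations

def omega_star(s, F, q=2):
    # Ω*_p(s) = max over nonempty A ⊆ V of ‖s_A‖_q / F(A)^{1/q}
    d = len(s)
    best = 0.0
    for k in range(1, d + 1):
        for A in combinations(range(d), k):
            best = max(best, np.linalg.norm(s[list(A)], ord=q) / F(A) ** (1.0 / q))
    return best

s = np.array([3.0, -1.0, 2.0])
card = lambda A: float(len(A))    # F(A) = |A|
val = omega_star(s, card)
# For F(A) = |A| the maximum is attained at a singleton: val = ‖s‖_∞ = 3.
```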
![Page 48: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/48.jpg)
Envelope of the homogeneous penalty ΘConsider Ωp with dual norm
Ω∗p(s) = maxA⊂V ,A 6=∅
‖sA‖qF (A)1/q
.
with unit ball: BΩ∗p
:=s ∈ Rd | ∀A ⊂ V , ‖sA‖qq ≤ F (A)
Proposition
The norm Ωp is the convex envelope (tightest convex lower bound) ofthe function w 7→ ‖w‖p F (Supp(w))1/q.
Proof.
Denote Θ(w) = ‖w‖p F (Supp(w))1/q:
Θ∗(s) = maxw∈Rd
w>s − ‖w‖p F (Supp(w))1/q
= maxA⊂V
maxwA∈RA
w>A sA − ‖wA‖p F (A)1/q
= maxA⊂V
ι‖sA‖q6F (A)1/q = ιΩ∗p (s)61
Convex relaxation for Combinatorial Penalties 15/33
![Page 49: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/49.jpg)
Envelope of the homogeneous penalty ΘConsider Ωp with dual norm
Ω∗p(s) = maxA⊂V ,A 6=∅
‖sA‖qF (A)1/q
.
with unit ball: BΩ∗p
:=s ∈ Rd | ∀A ⊂ V , ‖sA‖qq ≤ F (A)
Proposition
The norm Ωp is the convex envelope (tightest convex lower bound) ofthe function w 7→ ‖w‖p F (Supp(w))1/q.
Proof.
Denote Θ(w) = ‖w‖p F (Supp(w))1/q:
Θ∗(s) = maxw∈Rd
w>s − ‖w‖p F (Supp(w))1/q
= maxA⊂V
maxwA∈RA
w>A sA − ‖wA‖p F (A)1/q
= maxA⊂V
ι‖sA‖q6F (A)1/q = ιΩ∗p (s)61
Convex relaxation for Combinatorial Penalties 15/33
![Page 50: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/50.jpg)
Envelope of the homogeneous penalty ΘConsider Ωp with dual norm
Ω∗p(s) = maxA⊂V ,A 6=∅
‖sA‖qF (A)1/q
.
with unit ball: BΩ∗p
:=s ∈ Rd | ∀A ⊂ V , ‖sA‖qq ≤ F (A)
Proposition
The norm Ωp is the convex envelope (tightest convex lower bound) ofthe function w 7→ ‖w‖p F (Supp(w))1/q.
Proof.
Denote Θ(w) = ‖w‖p F (Supp(w))1/q:
Θ∗(s) = maxw∈Rd
w>s − ‖w‖p F (Supp(w))1/q
= maxA⊂V
maxwA∈RA
w>A sA − ‖wA‖p F (A)1/q
= maxA⊂V
ι‖sA‖q6F (A)1/q = ιΩ∗p (s)61
Convex relaxation for Combinatorial Penalties 15/33
![Page 51: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/51.jpg)
Envelope of the homogeneous penalty ΘConsider Ωp with dual norm
Ω∗p(s) = maxA⊂V ,A 6=∅
‖sA‖qF (A)1/q
.
with unit ball: BΩ∗p
:=s ∈ Rd | ∀A ⊂ V , ‖sA‖qq ≤ F (A)
Proposition
The norm Ωp is the convex envelope (tightest convex lower bound) ofthe function w 7→ ‖w‖p F (Supp(w))1/q.
Proof.
Denote Θ(w) = ‖w‖p F (Supp(w))1/q:
Θ∗(s) = maxw∈Rd
w>s − ‖w‖p F (Supp(w))1/q
= maxA⊂V
maxwA∈RA
w>A sA − ‖wA‖p F (A)1/q
= maxA⊂V
ι‖sA‖q6F (A)1/q = ιΩ∗p (s)61
Convex relaxation for Combinatorial Penalties 15/33
![Page 52: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/52.jpg)
Envelope of the homogeneous penalty ΘConsider Ωp with dual norm
Ω∗p(s) = maxA⊂V ,A 6=∅
‖sA‖qF (A)1/q
.
with unit ball: BΩ∗p
:=s ∈ Rd | ∀A ⊂ V , ‖sA‖qq ≤ F (A)
Proposition
The norm Ωp is the convex envelope (tightest convex lower bound) ofthe function w 7→ ‖w‖p F (Supp(w))1/q.
Proof.
Denote Θ(w) = ‖w‖p F (Supp(w))1/q:
Θ∗(s) = maxw∈Rd
w>s − ‖w‖p F (Supp(w))1/q
= maxA⊂V
maxwA∈RA
w>A sA − ‖wA‖p F (A)1/q
= maxA⊂V
ι‖sA‖q6F (A)1/q
= ιΩ∗p (s)61
Convex relaxation for Combinatorial Penalties 15/33
![Page 53: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/53.jpg)
Envelope of the homogeneous penalty ΘConsider Ωp with dual norm
Ω∗p(s) = maxA⊂V ,A 6=∅
‖sA‖qF (A)1/q
.
with unit ball: BΩ∗p
:=s ∈ Rd | ∀A ⊂ V , ‖sA‖qq ≤ F (A)
Proposition
The norm Ωp is the convex envelope (tightest convex lower bound) ofthe function w 7→ ‖w‖p F (Supp(w))1/q.
Proof.
Denote Θ(w) = ‖w‖p F (Supp(w))1/q:
Θ∗(s) = maxw∈Rd
w>s − ‖w‖p F (Supp(w))1/q
= maxA⊂V
maxwA∈RA
w>A sA − ‖wA‖p F (A)1/q
= maxA⊂V
ι‖sA‖q6F (A)1/q = ιΩ∗p (s)61
Convex relaxation for Combinatorial Penalties 15/33
![Page 54: Convex relaxation for Combinatorial Penalties · From sparsity... Empirical risk: for w 2Rd, L(w) = 1 2n Xn i=1 (y i x>w)2 Support of the model: Supp(w) = fi jw i 6= 0 g: Penalization](https://reader030.vdocuments.us/reader030/viewer/2022041203/5d4ef7b988c99367198be8df/html5/thumbnails/54.jpg)
Graphs of the different penalties for w ∈ R²

Figure: graphs of F(Supp(w)) (left) and of pen(w) = μF(Supp(w)) + ν‖w‖₂² (right).
Graphs of the different penalties for w ∈ R²

Figure: graphs of Θ(w) = √(F(Supp(w))) ‖w‖₂ (left) and of Ω_F(w) (right).
A large latent group Lasso (Jacob et al., 2009)

    V = { v = (v^A)_{A⊂V} ∈ (R^V)^{2^V} s.t. Supp(v^A) ⊂ A }

    Ω_p(w) = min_{v∈V} Σ_{A⊂V} F(A)^{1/q} ‖v^A‖_p   s.t.   w = Σ_{A⊂V} v^A.

Figure: w written as a sum of latent vectors v^{1}, v^{2}, v^{1,2}, . . . , v^{1,2,3,4}, . . .
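The variational form is easy to explore by pricing candidate decompositions, since every feasible (v^A) upper-bounds Ω_p(w). A small sketch with F(A) = |A| (helper names are ours), where splitting w into singletons beats the single-block decomposition and attains ‖w‖₁, consistent with Ω_p = ℓ1 for this F:

```python
import numpy as np

def latent_cost(blocks, F, p=2.0):
    """sum_A F(A)^(1/q) ||v^A||_p for one candidate decomposition {A: v^A};
    every feasible decomposition of w upper-bounds Omega_p(w)."""
    q = p / (p - 1.0)
    return sum(F(frozenset(A)) ** (1.0 / q) * np.linalg.norm(v, ord=p)
               for A, v in blocks.items())

card = lambda A: float(len(A))                             # F(A) = |A|
w = np.array([0.5, -1.2, 0.3])
all_in_one = {(0, 1, 2): w}                                # v^V = w
singletons = {(i,): np.array([w[i]]) for i in range(3)}    # one block per entry
print(latent_cost(all_in_one, card))   # sqrt(3) * ||w||_2, roughly 2.31
print(latent_cost(singletons, card))   # ||w||_1 = 2.0, the optimum here
```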
Some simple examples

    F: |A|  →  Ω_p = ‖w‖₁
    F: 1_{A≠∅}  →  Ω_p = ‖w‖_p
    F: Σ_{B∈G} 1_{A∩B≠∅}, for G a partition of {1, . . . , d}  →  Ω_p = Σ_{B∈G} ‖w_B‖_p

When p = ∞ and F is submodular, our relaxation coincides with that of Bach (2010).
However, when G is not a partition and p < ∞, Ω_p is in general not an ℓ1/ℓp-norm!
→ New norms... e.g. the k-support norm of Argyriou et al. (2012).
Example

Consider V = {1, 2, 3} and G = {{1, 2}, {1, 3}, {2, 3}}, with
    F({1, 2}) = F({1, 3}) = F({2, 3}) = 1,
and F(A) = ∞ for the remaining sets, or defined by block-coding.
How tight is the relaxation? Example: the range function

Consider V = {1, . . . , p} and the function
    F(A) = range(A) = max(A) − min(A) + 1.
→ Leads to the selection of interval patterns.

What is its convex relaxation? ⇒ Ω^F_p(w) = ‖w‖₁: the relaxation fails.

The new concept of lower combinatorial envelope provides a tool to analyze the tightness of the relaxation.
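This collapse can be confirmed by brute-forcing the dual norm on a small instance: singletons have range 1 while range(A) ≥ |A|, so the dual unit ball is exactly the ℓ∞ ball, and hence Ω_p is the ℓ1 norm. An illustrative sketch (the enumeration is exponential in d, small instances only):

```python
import itertools
import numpy as np

def dual_norm(s, F, p=2.0):
    """Brute-force Omega*_p(s) = max over nonempty A of ||s_A||_q / F(A)^(1/q)."""
    q = p / (p - 1.0)
    d = len(s)
    return max(np.linalg.norm(s[list(A)], ord=q) / F(A) ** (1.0 / q)
               for r in range(1, d + 1)
               for A in itertools.combinations(range(d), r))

rng_F = lambda A: float(max(A) - min(A) + 1)   # range(A) on a 0-indexed ground set
s = np.array([0.3, -2.0, 0.7, 1.1, -0.2])
# The dual ball coincides with the l_infinity ball, i.e. Omega_p is just l1:
print(dual_norm(s, rng_F), np.max(np.abs(s)))
```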
Submodular penalties

A function F : 2^V → R is submodular if
    ∀A, B ⊂ V,  F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B).     (1)

For these functions, Ω^F_∞(w) = f(|w|), where f is the Lovász extension of F.

Properties of submodular functions:
- f is computed efficiently (via the so-called "greedy" algorithm)
- decomposition ("weak" separability) properties
- F and f can be minimized in polynomial time.

... which lead to properties of the corresponding submodular norms:
- regularized empirical risk minimization problems can be solved efficiently
- statistical guarantees in terms of consistency and support recovery.
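The greedy algorithm is only a few lines: sort the coordinates of x in decreasing order and charge each one the marginal gain of adding its index. A self-contained sketch (the brute-force submodularity check is ours, for small ground sets only):

```python
import itertools
import numpy as np

def lovasz(F, x):
    """Lovasz extension f(x) via the greedy algorithm: visit coordinates in
    decreasing order of x and charge each the marginal gain F(A_k) - F(A_{k-1})."""
    order = np.argsort(-x)
    f, prev, A = 0.0, 0.0, set()
    for i in order:
        A.add(int(i))
        FA = F(frozenset(A))
        f += x[i] * (FA - prev)
        prev = FA
    return f

def is_submodular(F, d):
    """Brute-force check of F(A) + F(B) >= F(A|B) + F(A&B) (small d only)."""
    subsets = [frozenset(A) for r in range(d + 1)
               for A in itertools.combinations(range(d), r)]
    return all(F(A) + F(B) >= F(A | B) + F(A & B) - 1e-12
               for A in subsets for B in subsets)

card = lambda A: float(len(A))        # the cardinality function is submodular
print(is_submodular(card, 4))         # True
w = np.array([0.5, -1.2, 0.3])
# For F(A) = |A|, Omega^F_infinity(w) = f(|w|) recovers the l1 norm:
print(lovasz(card, np.abs(w)), np.abs(w).sum())
```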
Consistency for the Lasso (Bickel et al., 2009)

Assume that y = Xw* + σε, with ε ∼ N(0, Id_n).
Let Q = (1/n) X⊤X ∈ R^{d×d} and denote J = Supp(w*).
Assume the ℓ1-Restricted Eigenvalue condition:
    ∀∆ s.t. ‖∆_{J^c}‖₁ ≤ 3‖∆_J‖₁,  ∆⊤Q∆ ≥ κ‖∆_J‖₂².

Then we have
    (1/n) ‖Xŵ − Xw*‖₂² ≤ (72 |J| σ² / κ) · (2 log p + t²)/n,
with probability larger than 1 − exp(−t²).
Support Recovery for the Lasso (Wainwright, 2009)

Assume y = Xw* + σε, with ε ∼ N(0, Id_n).
Let Q = (1/n) X⊤X ∈ R^{d×d} and denote J = Supp(w*).
Define ν = min_{j : w*_j ≠ 0} |w*_j| > 0.
Assume κ = λ_min(Q_{JJ}) > 0.
Assume the Irrepresentability Condition, i.e., that for some η > 0,
    |||Q_{JJ}^{-1} Q_{J J^c}|||_{∞,∞} ≤ 1 − η.

Then, if (2/η) √(2σ² log(p)/n) < λ < κν/|J|, the minimizer ŵ is unique and has support equal to J, with probability larger than 1 − 4 exp(−c₁ n λ²).
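Both Lasso theorems above can be watched empirically with a small coordinate-descent solver for min_w (1/2n)‖y − Xw‖² + λ‖w‖₁ (a self-contained sketch, not the authors' code; the problem sizes and the choice of λ are arbitrary):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent with soft-thresholding updates for
    (1/2n)||y - Xw||^2 + lam ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - Xw
    for _ in range(n_iter):
        for j in range(d):
            r += X[:, j] * w[j]       # partial residual excluding feature j
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.normal(size=(n, d))
w_star = np.zeros(d); w_star[[0, 3, 7]] = [1.5, -2.0, 1.0]
y = X @ w_star + 0.1 * rng.normal(size=n)
w_hat = lasso_cd(X, y, lam=0.15)
# With lam well above the noise level and well below min_j |w*_j|,
# the true support {0, 3, 7} should be recovered:
print(sorted(np.flatnonzero(np.abs(w_hat) > 1e-3).tolist()))
```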
An example: penalizing the range

Structured prior on support (Jenatton et al., 2011): the support is an interval of {1, . . . , p}.

Natural associated penalization: F(A) = range(A) = i_max(A) − i_min(A) + 1.
→ F is not submodular... its lower combinatorial envelope is G(A) = |A|.
But F(A) := d − 1 + range(A) is submodular!
In fact F(A) = Σ_{B∈G} 1_{A∩B≠∅} for groups B of the form: (figure of the groups not reproduced)

Jenatton et al. (2011) considered Ω(w) = Σ_{B∈G} ‖w_B ∘ d_B‖₂, with weights d_B.
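Both submodularity claims can be confirmed by brute force on a small ground set (an illustrative helper, with the usual normalization F(∅) = 0 in both cases):

```python
import itertools

def submodular_violations(F, d):
    """Count pairs (A, B) violating F(A)+F(B) >= F(A|B)+F(A&B), by brute force."""
    subsets = [frozenset(A) for r in range(d + 1)
               for A in itertools.combinations(range(d), r)]
    return sum(F(A) + F(B) < F(A | B) + F(A & B) - 1e-12
               for A in subsets for B in subsets)

d = 5
rng_F = lambda A: max(A) - min(A) + 1 if A else 0    # range(A), F(empty) = 0
shifted = lambda A: d - 1 + rng_F(A) if A else 0     # d - 1 + range(A)
print(submodular_violations(rng_F, d) > 0)   # True: range alone is not submodular
print(submodular_violations(shifted, d))     # 0: the shifted function is submodular
```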
Experiments

Figure: Signals
S1 constant
S2 triangular shape
S3 x ↦ |sin(x) sin(5x)|
S4 a slope pattern
S5 i.i.d. Gaussian pattern

Compare:
- Lasso
- Elastic Net
- Naive ℓ2 group-Lasso
- Ω2 for F(A) = d − 1 + range(A)
- Ω∞ for F(A) = d − 1 + range(A)
- the weighted ℓ2 group-Lasso of Jenatton et al. (2011).
Constant signal

Figure: best Hamming distance vs. n (d = 256, k = 160, σ = 0.5), comparing EN, GL+w, GL, L1, Sub p=∞, Sub p=2.
Triangular signal

Figure: best Hamming distance vs. n (d = 256, k = 160, σ = 0.5, signal S2, cov = id), comparing EN, GL+w, GL, L1, Sub p=∞, Sub p=2.
(x1, x2) ↦ |sin(x1) sin(5x1) sin(x2) sin(5x2)| signal in 2D

Figure: best Hamming distance vs. n (d = 256, k = 160, σ = 1.0), comparing EN, GL+w, GL, L1, Sub p=∞, Sub p=2.
i.i.d. random signal in 2D

Figure: best Hamming distance vs. n (d = 256, k = 160, σ = 1.0), comparing EN, GL+w, GL, L1, Sub p=∞, Sub p=2.
Summary

A convex relaxation for functions penalizing
(a) the support via a general set function,
(b) the ℓp norm of the parameter vector w.

Principled construction of:
- known norms like the group Lasso or ℓ1/ℓp-norms
- many new sparsity-inducing norms

Caveat: the relaxation can fail to capture the structure (e.g. the range function).

For submodular functions we obtain efficient algorithms, and theoretical results such as consistency and support recovery guarantees.
References

Argyriou, A., Foygel, R., and Srebro, N. (2012). Sparse prediction with the k-support norm. In Advances in Neural Information Processing Systems 25, pages 1466–1474.

Bach, F. (2008). Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems.

Bach, F. (2010). Structured sparsity-inducing norms through submodular functions. In Adv. NIPS.

Bickel, P., Ritov, Y., and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 37(4):1705–1732.

Huang, J., Zhang, T., and Metaxas, D. (2011). Learning with structured sparsity. J. Mach. Learn. Res., 12:3371–3412.

Jacob, L., Obozinski, G., and Vert, J. (2009). Group lasso with overlap and graph lasso. In ICML.

Jenatton, R., Audibert, J., and Bach, F. (2011). Structured variable selection with sparsity-inducing norms. JMLR, 12:2777–2824.

Jenatton, R., Gramfort, A., Michel, V., Obozinski, G., Eger, E., Bach, F., and Thirion, B. (2012). Multiscale mining of fMRI data with hierarchical structured sparsity. SIAM Journal on Imaging Sciences, 5(3):835–856.

Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. (2011). Convex and network flow optimization for structured sparsity. JMLR, 12:2681–2720.

Wainwright, M. J. (2009). Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming. IEEE Transactions on Information Theory, 55:2183–2202.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. B, 68:49–67.

Zhao, P., Rocha, G., and Yu, B. (2009). Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497.