On the k-Support and Related Norms
Massimiliano Pontil
Department of Computer Science
Centre for Computational Statistics and Machine Learning
University College London
(Joint work with Andrew McDonald and Dimitris Stamos)
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 1 / 14
Plan
Problem
Spectral regularization
k-support norm
Box norm
Link to cluster norm
Problem
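Learn a matrix from a set of linear measurements:
$$y_i = \langle W^*, X_i \rangle + \mathrm{noise}_i, \qquad i = 1, \dots, n$$
Method:
$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^{n} \big(y_i - \langle W, X_i \rangle\big)^2 + \lambda \Omega(W)$$
Matrix completion: $X_i = e_r e_c^\top$
Multitask learning: $X_i = e_r x_i^\top$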
Regularizer Ω encourages matrix structure
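A minimal sketch of the two measurement models above, assuming NumPy and zero-based indices. The helper names (`inner`, `completion_measurement`, `multitask_measurement`, `objective`) and the placeholder regularizer passed as `omega` are illustrative and not part of the talk.

```python
import numpy as np

def inner(W, X):
    """Frobenius inner product <W, X> = sum_ij W_ij * X_ij."""
    return float(np.sum(W * X))

def completion_measurement(d, m, r, c):
    """X = e_r e_c^T, so <W, X> reads off the single entry W[r, c]."""
    X = np.zeros((d, m))
    X[r, c] = 1.0
    return X

def multitask_measurement(d, r, x):
    """X = e_r x^T, so <W, X> is the dot product of row r of W with x."""
    e_r = np.zeros(d)
    e_r[r] = 1.0
    return np.outer(e_r, np.asarray(x, dtype=float))

def objective(W, Xs, y, lam, omega):
    """Squared loss over all measurements plus lam * omega(W)."""
    residuals = [y_i - inner(W, X_i) for X_i, y_i in zip(Xs, y)]
    return float(np.sum(np.square(residuals)) + lam * omega(W))

W = np.arange(6.0).reshape(2, 3)
print(inner(W, completion_measurement(2, 3, 1, 2)))          # entry W[1, 2]
print(inner(W, multitask_measurement(2, 0, [1.0, 0.0, 2.0])))  # row 0 of W dotted with x
```

Any matrix norm can be passed as `omega`; the slides that follow discuss which choices encourage useful structure.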
Spectral Regularization
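$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^{n} \big(y_i - \langle W, X_i \rangle\big)^2 + \lambda \Omega(W)$$
Ω favors matrix structure (low rank, low variance, clustering, etc.)
Choose an orthogonally invariant (OI) norm: $\Omega(W) \equiv \|W\| = \|UWV\|$ for all orthogonal $U, V$
von Neumann (1937): $\|W\| = g(\sigma(W))$, where $g$ is a symmetric gauge (SG) function
A well-studied example is the trace norm: $g(\cdot) = \|\cdot\|_1$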
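The trace norm example can be checked numerically. The sketch below, assuming NumPy, computes the norm as the ℓ1 norm of the singular values and verifies orthogonal invariance on random matrices. The helper names and the random test data are illustrative.

```python
import numpy as np

def trace_norm(W):
    """Trace norm: g(sigma(W)) with g the l1 norm of the singular values."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

def random_orthogonal(n, rng):
    """Orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
U = random_orthogonal(4, rng)
V = random_orthogonal(3, rng)

# Orthogonal invariance: ||U W V|| equals ||W|| up to rounding.
assert np.isclose(trace_norm(U @ W @ V), trace_norm(W))
# Agrees with NumPy's built-in nuclear norm.
assert np.isclose(trace_norm(W), np.linalg.norm(W, "nuc"))
```

Replacing the sum over singular values with another symmetric gauge function gives other OI-norms in the same way.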
k-Support Norm [Argyriou et al. 2012]
Special case of group lasso with overlap [Jacob et al., 2009]
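$$\|w\|_{(k)} = \inf \Big\{ \sum_{|J| \le k} \|v_J\|_2 \;:\; \sum_{|J| \le k} v_J = w, \ \mathrm{supp}(v_J) \subseteq J \Big\}$$
Includes the $\ell_1$-norm ($k = 1$) and the $\ell_2$-norm ($k = d$)
Unit ball of $\|\cdot\|_{(k)}$ is the convex hull of $\{ w : \mathrm{card}(w) \le k, \ \|w\|_2 \le 1 \}$
Dual norm: $\|u\|_{*,(k)} = \sqrt{\sum_{i=1}^{k} \big(|u|^{\downarrow}_i\big)^2}$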
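The dual norm is the ℓ2 norm of the k largest entries in absolute value, which takes only a few lines to compute. A minimal sketch, assuming NumPy; the function name is illustrative.

```python
import numpy as np

def k_support_dual_norm(u, k):
    """Dual k-support norm: l2 norm of the k largest entries of |u|."""
    a = np.sort(np.abs(np.asarray(u, dtype=float)))[::-1]  # |u| in nonincreasing order
    if not 1 <= k <= a.size:
        raise ValueError("k must satisfy 1 <= k <= d")
    return float(np.sqrt(np.sum(a[:k] ** 2)))

u = [3.0, -4.0, 1.0]
print(k_support_dual_norm(u, 1))  # 4.0, the l-infinity norm
print(k_support_dual_norm(u, 2))  # 5.0
print(k_support_dual_norm(u, 3))  # l2 norm of u
```

The two extreme cases mirror the primal ones: k = 1 gives the ℓ∞ norm (dual of ℓ1) and k = d gives the ℓ2 norm.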
Spectral k-Support Norm
The k-support norm is an SG-function, inducing the OI-norm
$$\|W\|_{(k)} := \|\sigma(W)\|_{(k)}$$
Proposition. The unit ball of $\|\sigma(\cdot)\|_{(k)}$ is the convex hull of
$$\{ W : \mathrm{rank}(W) \le k, \ \|W\|_F \le 1 \}$$
Includes the trace norm ($k = 1$) and the Frobenius norm ($k = d$)
Matrix Completion Experiment
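dataset                 norm  test error  r   k     a
ML 100k, ρ = 50%        tr    0.2017      13  -     -
ML 100k, ρ = 50%        en    0.2017      13  -     -
ML 100k, ρ = 50%        ks    0.1990      9   1.87  -
ML 100k, ρ = 50%        box   0.1989      10  2.00  1e-5
ML 1M, ρ = 50%          tr    0.1790      17  -     -
ML 1M, ρ = 50%          en    0.1789      15  -     -
ML 1M, ρ = 50%          ks    0.1782      17  1.80  -
ML 1M, ρ = 50%          box   0.1777      19  2.00  1e-6
Jester1, 20 per line    tr    0.1752      11  -     -
Jester1, 20 per line    en    0.1752      11  -     -
Jester1, 20 per line    ks    0.1739      11  6.38  -
Jester1, 20 per line    box   0.1726      11  6.40  2e-5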
MTL Experiment
Table: Multitask learning clustering on Lenk dataset, with simple thresholding.
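dataset            norm   test error     k     a
Lenk, 8 per task   fr     3.7869 (0.07)  -     -
Lenk, 8 per task   tr     1.9058 (0.04)  -     -
Lenk, 8 per task   en     1.8974 (0.04)  -     -
Lenk, 8 per task   ks     1.8933 (0.04)  1.02  -
Lenk, 8 per task   box    1.8916 (0.04)  1.01  5.5e-3
Lenk, 8 per task   c-fr   1.8667 (0.08)  -     -
Lenk, 8 per task   c-tr   1.7904 (0.03)  -     -
Lenk, 8 per task   c-en   1.7896 (0.03)  -     -
Lenk, 8 per task   c-ks   1.7775 (0.03)  1.89  -
Lenk, 8 per task   c-box  1.7754 (0.03)  1.12  9.5e-3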
Box Norm
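Let $\Theta \subseteq \mathbb{R}^d_{++}$ be bounded and convex, and consider the norm and its dual:
$$\|w\|_\Theta^2 = \inf_{\theta \in \Theta} \sum_{i=1}^{d} \frac{w_i^2}{\theta_i}, \qquad \|u\|_{*,\Theta}^2 = \sup_{\theta \in \Theta} \sum_{i=1}^{d} \theta_i u_i^2$$
Box norm: $\Theta = \big\{ \theta : a < \theta_i \le b, \ \sum_{i=1}^{d} \theta_i \le c \big\}$
Includes the k-support norm for $a = 0$, $b = 1$, $c = k$
Unit ball is the convex hull of
$$\bigcup_{|J| \le k} \Big\{ w \in \mathbb{R}^d : \sum_{i \in J} \frac{w_i^2}{b} + \sum_{i \notin J} \frac{w_i^2}{a} \le 1 \Big\}$$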
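The dual box norm is a supremum of a linear function of θ over a box with a budget constraint, so it can be evaluated greedily: give every θ_i its floor a, then spend the remaining budget on the largest u_i², raising each θ_i to at most b. The sketch below assumes NumPy and works on the closure of Θ (a ≤ θ_i rather than a < θ_i), which leaves the supremum unchanged. The function name and the validity checks are illustrative, not the speaker's implementation.

```python
import numpy as np

def dual_box_norm(u, a, b, c):
    """Dual box norm: sqrt of sup_theta sum_i theta_i * u_i^2 over the closed box set."""
    u2 = np.sort(np.asarray(u, dtype=float) ** 2)[::-1]  # u_i^2 in nonincreasing order
    d = u2.size
    if not (0 <= a < b and d * a <= c):
        raise ValueError("need 0 <= a < b and d * a <= c")
    theta = np.full(d, float(a))                  # every coordinate starts at its floor a
    budget = min(float(c), d * float(b)) - d * a  # mass that can still be distributed
    for i in range(d):                            # largest u_i^2 first
        step = min(b - a, budget)
        theta[i] += step
        budget -= step
        if budget <= 0:
            break
    return float(np.sqrt(theta @ u2))

# With a = 0, b = 1, c = k this is the dual k-support norm.
print(dual_box_norm([3.0, -4.0, 0.0], a=0.0, b=1.0, c=2.0))  # 5.0
```

The greedy step is the usual argument for a fractional knapsack: the objective is linear in θ, so mass is best placed on the coordinates with the largest weights.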
Unit Balls
Figure: Unit balls of the box norm in R2 for k = 1, a ∈ {0.01, 0.25, 0.50}.
Figure: Unit balls of the dual box norm in R2 for k = 1, a ∈ {0.01, 0.25, 0.50}.
Cluster Norm
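The box norm is an SG-function, inducing the OI-norm
$$\|W\|_\Theta^2 = \|\sigma(W)\|_\Theta^2 = \inf \Big\{ \sum_{i=1}^{d} \frac{\sigma_i(W)^2}{\theta_i} \;:\; \theta \in (a, b]^d, \ \sum_{i=1}^{d} \theta_i \le c \Big\}$$
The associated OI-norm has been used to favour task clustering [Jacob et al. 2008]. It can be written as
$$\|W\|_\Theta^2 = \inf \big\{ \mathrm{tr}(W \Sigma^{-1} W^\top) \;:\; aI \preceq \Sigma \preceq bI, \ \mathrm{tr}\,\Sigma \le c \big\}$$
Includes the spectral k-support norm for $a = 0$, $b = 1$, $c = k$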
Interpretation of “a”
Proposition. If $c = da + k(b - a)$, the solution of the regularization problem is given by $W = V + Z$, where
$$(V, Z) = \operatorname*{argmin}_{V, Z} \ \sum_{i=1}^{n} \big(y_i - \langle V + Z, X_i \rangle\big)^2 + \lambda \Big( \frac{1}{a} \|V\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big)$$
Parameter ‘a’ balances the relative importance of the two components
Cluster norm is the Moreau envelope of the spectral k-support norm:
$$\|W\|_\Theta^2 = \min_{Z \in \mathbb{R}^{d \times m}} \ \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2$$
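One way to see where the two terms and the condition on c come from is to look at the dual of the vector box norm. The following is a sketch rather than the proof from the talk: it takes the closure of Θ (which does not change the supremum), assumes 0 ≤ a < b and integer k, and introduces the auxiliary variable φ, which does not appear in the slides.

```latex
\begin{aligned}
\|u\|_{*,\Theta}^2
  &= \sup\Big\{ \sum_{i=1}^{d} \theta_i u_i^2 \;:\; a \le \theta_i \le b,\ \sum_{i=1}^{d} \theta_i \le da + k(b-a) \Big\} \\
  &= a\,\|u\|_2^2 + (b-a)\,\sup\Big\{ \sum_{i=1}^{d} \phi_i u_i^2 \;:\; 0 \le \phi_i \le 1,\ \sum_{i=1}^{d} \phi_i \le k \Big\}
     \qquad \big(\theta_i = a + (b-a)\phi_i\big) \\
  &= a\,\|u\|_2^2 + (b-a)\,\|u\|_{*,(k)}^2 .
\end{aligned}
```

The last step uses that the supremum over φ picks out the k largest values of u_i², which is the squared dual k-support norm. The squared dual norm is therefore a sum of a squared ℓ2 term with weight a and a squared dual k-support term with weight b − a, the dual counterpart of the minimization above with weights 1/a and 1/(b − a).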
Computation of the Θ norm
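Assume w.l.o.g. $w \ge 0$ with nonincreasing components
$$\|w\|_\Theta^2 = \frac{1}{b} \|w_{[1:q]}\|_2^2 + \frac{1}{c - qb - \ell a} \|w_{[q+1:d-\ell]}\|_1^2 + \frac{1}{a} \|w_{[d-\ell+1:d]}\|_2^2,$$
where $q, \ell \in \{0, \dots, d\}$ are uniquely determined
In particular:
$$\|w\|_{(k)}^2 = \|w_{[1:q]}\|_2^2 + \frac{1}{k - q} \|w_{[q+1:d]}\|_1^2$$
where $q \in \{0, \dots, k - 1\}$ is determined by
$$|w|^{\downarrow}_q \ \ge\ \frac{1}{k - q} \sum_{j=q+1}^{d} |w|^{\downarrow}_j \ >\ |w|^{\downarrow}_{q+1}$$
Computation of the norm is $O(d \log d)$
For the k-support norm this improves on the previous $O(kd)$ method
Efficient optimization using proximal-gradient methods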
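The k-support special case can be sketched in a few lines, assuming NumPy: sort |w|, precompute tail sums and head sums of squares, and scan for q. The function name is illustrative. As an assumption made here, ties are resolved by taking the smallest q whose tail average is at least the next sorted entry; when the defining inequalities hold with equality for two consecutive values of q, both give the same norm value, so the result is unaffected. The function returns the norm itself, that is, the square root of the two-term expression.

```python
import numpy as np

def k_support_norm(w, k):
    """k-support norm via sorting: one sort plus a scan over q = 0, ..., k-1."""
    z = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]  # |w| in nonincreasing order
    d = z.size
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    tail = np.cumsum(z[::-1])[::-1]                       # tail[q] = z[q] + ... + z[d-1]
    head_sq = np.concatenate(([0.0], np.cumsum(z ** 2)))  # head_sq[q] = z[0]^2 + ... + z[q-1]^2
    for q in range(k):
        # Tail average over k - q slots is at least the (q+1)-th largest entry.
        if tail[q] >= (k - q) * z[q]:
            return float(np.sqrt(head_sq[q] + tail[q] ** 2 / (k - q)))
    raise AssertionError("unreachable: q = k - 1 always passes the test")

w = [3.0, -4.0, 1.0]
print(k_support_norm(w, 1))  # 8.0, the l1 norm
print(k_support_norm(w, 3))  # l2 norm of w
```

The sort dominates the cost at O(d log d); the cumulative sums and the scan are linear in d.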
Extensions/Open Problems
Other sets Θ allow for exact prox, e.g. $\Theta = \{ \theta : \theta_1 \ge \dots \ge \theta_d > 0 \}$.
Can we give a general characterization?
Online learning / stochastic optimization
Kernel extensions