(Source: laurent.risser.free.fr › TMP_SHARE › optimCIMI2018)
Computing Nonnegative Matrix Factorizations
Nicolas Gillis
Joint work with François Glineur, Robert Luce, Stephen Vavasis, Arnaud Vandaele, Jeremy Cohen
Where is Mons?
Nonnegative Matrix Factorization (NMF)
Given a matrix $M \in \mathbb{R}^{p \times n}_+$ and a factorization rank $r \ll \min(p, n)$, find $U \in \mathbb{R}^{p \times r}$ and $V \in \mathbb{R}^{r \times n}$ such that

$$\min_{U \geq 0,\, V \geq 0} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \qquad \text{(NMF)}$$

NMF is a linear dimensionality reduction technique for nonnegative data:

$$\underbrace{M(:, i)}_{\geq 0} \approx \sum_{k=1}^{r} \underbrace{U(:, k)}_{\geq 0}\, \underbrace{V(k, i)}_{\geq 0} \quad \text{for all } i.$$
Why nonnegativity?
→ Interpretability: nonnegativity constraints lead to easily interpretable factors (and a sparse, part-based representation).
→ Many applications: image processing, text mining, hyperspectral unmixing, community detection, clustering, etc.
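The model above is easy to check numerically. A minimal NumPy sketch (the sizes p, n, r and the synthetic data are arbitrary illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, r = 6, 8, 3

# Nonnegative factors U (p-by-r) and V (r-by-n) generate nonnegative data M = UV.
U = rng.random((p, r))
V = rng.random((r, n))
M = U @ V

# The (NMF) objective: ||M - UV||_F^2 = sum_{i,j} (M - UV)_{ij}^2.
residual = np.sum((M - U @ V) ** 2)

# Each column M(:, i) is a nonnegative combination of the columns of U.
combo = sum(U[:, k] * V[k, 0] for k in range(r))
print(residual, np.allclose(combo, M[:, 0]))   # 0.0 True
```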
Example 1: Blind hyperspectral unmixing
Figure: Urban hyperspectral image, 162 spectral bands and 307-by-307 pixels.
Problem. Identify the materials and classify the pixels.
Linear mixing model
Example 1: Blind hyperspectral unmixing with NMF
The basis elements allow one to recover the different endmembers: U ≥ 0;
Abundances of the endmembers in each pixel: V ≥ 0.
Urban hyperspectral image
Figure: Decomposition of the Urban dataset.
Example 2: topic recovery and document classification
The basis elements allow one to recover the different topics;
the weights allow one to assign each document to its corresponding topics.
Example 3: feature extraction and classification
The basis elements extract facial features such as eyes, nose and lips.
Outline
1 Computational complexity
2 Standard non-linear optimization schemes and acceleration
3 Exact NMF (M = UV) and its geometric interpretation
4 NMF under the separability assumption
Computational Complexity of NMF
Complexity of NMF
$$\min_{U \in \mathbb{R}^{p \times r},\, V \in \mathbb{R}^{r \times n}} \|M - UV\|_F^2 \quad \text{such that } U \geq 0,\ V \geq 0.$$

For r = 1, the problem is solved via the Eckart–Young and Perron–Frobenius theorems.

Checking whether there exists an exact factorization M = UV is NP-hard (Vavasis, 2009) when p, n and r are not fixed.

Using quantifier elimination (a reformulation with a fixed number of variables):

Cohen and Rothblum [1991]: $(pn)^{O(pr+nr)}$, non-polynomial.
Arora et al. [2012]: $(pn)^{O(2^r)}$, polynomial for fixed r.
Moitra [2013]: $(pn)^{O(r^2)}$, polynomial for fixed r.
→ not really useful in practice . . .

This does not imply that rank$_+(M)$ (the minimum r such that M = UV) can be computed in polynomial time, because rank$_+$ can be as large as $\min(p, n)$, so r cannot be treated as a constant.
Complexity for other norms
$$\min_{u \in \mathbb{R}^p,\, v \in \mathbb{R}^n} \|M - uv^T\|_1 = \sum_{i,j} |M_{ij} - u_i v_j|. \qquad (\ell_1 \text{ norm})$$

If M is binary, $M \in \{0, 1\}^{p \times n}$, any optimal solution $(u^*, v^*)$ can be assumed to be binary, that is, $(u^*, v^*) \in \{0, 1\}^p \times \{0, 1\}^n$.

$$\min_{u \in \mathbb{R}^p,\, v \in \mathbb{R}^n} \|M - uv^T\|_W^2 = \sum_{i,j} W_{ij} (M - uv^T)_{ij}^2, \qquad (\text{weighted } \ell_2 \text{ norm})$$

where W is a nonnegative weight matrix. This model can be used when
data is missing ($W_{ij} = 0$ for missing entries), or
entries have different variances ($W_{ij} = 1/\sigma_{ij}^2$).

Both rank-one problems are NP-hard:
G., Vavasis, On the Complexity of Robust PCA and ℓ1-Norm Low-Rank Matrix Approximation, Mathematics of Operations Research, 2018.
G., Glineur, Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard, SIAM J. Matrix Anal. Appl., 2011.
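Despite the NP-hardness in general, the weighted model is easy to experiment with in the rank-one case. A minimal alternating sketch with missing data, where each update is the closed-form weighted least-squares solution for u (resp. v) with the other factor fixed (the sizes, seed, and 30% missing-data mask are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 8, 10
M = np.outer(rng.random(p), rng.random(n))    # exactly rank-one data
W = (rng.random((p, n)) > 0.3).astype(float)  # W_ij = 0 marks a missing entry

# Alternating minimization of sum_{i,j} W_ij (M_ij - u_i v_j)^2:
# with v fixed, u_i = sum_j W_ij M_ij v_j / sum_j W_ij v_j^2, and symmetrically for v.
u, v = rng.random(p), rng.random(n)
for _ in range(500):
    u = (W * M) @ v / np.maximum(W @ v**2, 1e-12)
    v = (W * M).T @ u / np.maximum(W.T @ u**2, 1e-12)

weighted_err = np.sum(W * (M - np.outer(u, v)) ** 2)
```

On this exactly rank-one instance the alternating scheme typically drives the weighted error to zero; the hardness results above concern the worst case.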
NMF Algorithms and Acceleration
NMF Algorithms
Given a matrix $M \in \mathbb{R}^{p \times n}_+$ and a factorization rank $r \in \mathbb{N}$:

$$\min_{U \in \mathbb{R}^{p \times r}_+,\, V \in \mathbb{R}^{r \times n}_+} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \qquad \text{(NMF)}$$

This is a difficult non-linear optimization problem with potentially many local minima.

Standard framework:

0. Initialize (U, V). Then, alternately update U and V:
1. Update $V \approx \operatorname{argmin}_{X \geq 0} \|M - UX\|_F^2$. (NNLS)
2. Update $U \approx \operatorname{argmin}_{Y \geq 0} \|M - YV\|_F^2$. (NNLS)
Most NMF algorithms come with no guarantees (except convergence to stationary points).

The solution is in general highly non-unique: identifiability issues.
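The two NNLS steps above can be sketched with SciPy's nnls solver; this is a generic illustration of the alternating framework (sizes, seed, and iteration count are arbitrary), not the authors' implementation:

```python
import numpy as np
from scipy.optimize import nnls

def anls(M, r, n_iter=50, seed=0):
    """Alternating nonnegative least squares for min ||M - UV||_F^2 s.t. U, V >= 0."""
    rng = np.random.default_rng(seed)
    p, n = M.shape
    U, V = rng.random((p, r)), rng.random((r, n))
    for _ in range(n_iter):
        for j in range(n):              # 1. V <- argmin_{X >= 0} ||M - UX||_F^2
            V[:, j], _ = nnls(U, M[:, j])
        for i in range(p):              # 2. U <- argmin_{Y >= 0} ||M - YV||_F^2
            U[i, :], _ = nnls(V.T, M[i, :])
    return U, V

rng = np.random.default_rng(3)
M = rng.random((6, 2)) @ rng.random((2, 8))   # exactly rank-2 nonnegative data
U, V = anls(M, 2)
rel_err = np.linalg.norm(M - U @ V) / np.linalg.norm(M)
```

Since each subproblem is solved exactly, the objective is monotonically non-increasing, but, as stated above, only convergence to stationary points can be guaranteed.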
Block coordinate descent method
Use block-coordinate descent on the NNLS subproblems
→ closed-form solutions for the columns of U and the rows of V:

$$U_{:k}^* = \operatorname{argmin}_{U_{:k} \geq 0} \|R_k - U_{:k} V_{k:}\|_F^2 = \max\left(0, \frac{R_k V_{k:}^T}{\|V_{k:}\|_2^2}\right) \quad \forall k,$$

where $R_k := M - \sum_{j \neq k} U_{:j} V_{j:}$, and similarly for V. This is the so-called HALS algorithm.

It can be accelerated:

1 Gauss–Seidel coordinate descent (Hsieh, Dhillon, 2011).
2 Loop several times over the columns of U / the rows of V to perform more iterations at a lower computational cost (Glineur, G., 2012).
3 Randomized shuffling (Chow, Wu, Yin, 2017).
4 Use an extrapolation step: $\hat{W}^{(k+1)} = W^{(k+1)} + \beta_k \left(W^{(k+1)} - W^{(k)}\right)$ (Ang, G., 2018).
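A direct NumPy transcription of the HALS update (a sketch: the small safeguard in the denominator and the test problem are my additions):

```python
import numpy as np

def hals_sweep(M, U, V):
    """One HALS sweep over the columns of U:
    U[:, k] <- max(0, R_k V_k:^T / ||V_k:||_2^2), R_k = M - sum_{j != k} U[:, j] V[j, :]."""
    R = M - U @ V
    for k in range(U.shape[1]):
        Rk = R + np.outer(U[:, k], V[k, :])   # residual with component k removed
        uk = np.maximum(0.0, Rk @ V[k, :]) / max(V[k, :] @ V[k, :], 1e-12)
        R = Rk - np.outer(uk, V[k, :])
        U[:, k] = uk
    return U

def hals(M, r, n_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((M.shape[0], r))
    V = rng.random((r, M.shape[1]))
    for _ in range(n_iter):
        U = hals_sweep(M, U, V)
        V = hals_sweep(M.T, V.T, U.T).T       # same update on the transposed problem
    return U, V

rng = np.random.default_rng(4)
M = rng.random((8, 3)) @ rng.random((3, 10))  # exactly rank-3 nonnegative data
U, V = hals(M, 3)
rel_err = np.linalg.norm(M - U @ V) / np.linalg.norm(M)
```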
Illustration on the CBCL face image data set
Exact NMF: Geometry and Extended Formulations
Geometric interpretation of exact NMF
Given M = UV, one can scale M and U so that they become column stochastic, which implies that V is column stochastic as well:

$$M = UV \iff M' = M D_M = (U D_U)(D_U^{-1} V D_M) = U' V'.$$

The columns of M are convex combinations of the columns of U:

$$M_{:j} = \sum_{i=1}^{r} U_{:i} V_{ij} \quad \text{with} \quad \sum_{i=1}^{r} V_{ij} = 1 \ \forall j, \quad V_{ij} \geq 0 \ \forall i, j.$$

In other words,

$$\operatorname{conv}(M) \subseteq \operatorname{conv}(U) \subseteq S^p,$$

where conv(X) is the convex hull of the columns of X, and $S^p = \{x \in \mathbb{R}^p \mid x \geq 0, \sum_{i=1}^p x_i = 1\}$ is the unit simplex.

Exact NMF ≡ find r points whose convex hull is nested between two given polytopes.
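The scaling step can be verified numerically (random sizes, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.random((5, 3))
V = rng.random((3, 7))
M = U @ V

# D_M and D_U rescale the columns of M and U to sum to one.
DM = np.diag(1.0 / M.sum(axis=0))
DU = np.diag(1.0 / U.sum(axis=0))

Mp = M @ DM                        # M' = M D_M
Up = U @ DU                        # U' = U D_U
Vp = np.linalg.inv(DU) @ V @ DM    # V' = D_U^{-1} V D_M

same_product = np.allclose(Mp, Up @ Vp)
# M' and U' are column stochastic by construction; V' then is as well.
all_stochastic = (np.allclose(Mp.sum(axis=0), 1.0)
                  and np.allclose(Up.sum(axis=0), 1.0)
                  and np.allclose(Vp.sum(axis=0), 1.0))
print(same_product, all_stochastic)   # True True
```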
Geometric interpretation of NMF
Example: Two nested hexagons (rank(Ma) = 3)
$$M_a = \frac{1}{a} \begin{pmatrix} 1 & a & 2a-1 & 2a-1 & a & 1 \\ 1 & 1 & a & 2a-1 & 2a-1 & a \\ a & 1 & 1 & a & 2a-1 & 2a-1 \\ 2a-1 & a & 1 & 1 & a & 2a-1 \\ 2a-1 & 2a-1 & a & 1 & 1 & a \\ a & 2a-1 & 2a-1 & a & 1 & 1 \end{pmatrix}, \quad a > 1.$$
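Since $M_a$ is circulant, the claim rank($M_a$) = 3 is easy to check numerically (scipy.linalg.circulant builds the matrix from its first column):

```python
import numpy as np
from scipy.linalg import circulant

def nested_hexagon(a):
    """M_a from the slide: circulant, first column (1, 1, a, 2a-1, 2a-1, a) / a."""
    return circulant(np.array([1.0, 1.0, a, 2 * a - 1, 2 * a - 1, a]) / a)

for a in (2.0, 3.0, 100.0):
    Ma = nested_hexagon(a)
    assert np.all(Ma >= 0)               # nonnegative for a > 1
    print(a, np.linalg.matrix_rank(Ma))  # rank is 3 for every a > 1
```

The nonnegative rank, by contrast, cannot be read off such a test: it increases from 3 to 5 as a grows, as the cases below show.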
Case 1: a = 2, rank$_+(M_a)$ = 3, col(M) = col(U).

Figure: conv($M_2$) and conv(U) in $\Delta^p \cap \operatorname{col}(M_2)$.
Case 2: a = 3, rank$_+(M_a)$ = 4, col(M) = col(U).

Figure: conv($M_3$) and conv(U) in $\Delta^p \cap \operatorname{col}(M_3)$.
Case 3: $a \to +\infty$, rank$_+(M_a)$ = 5, col(M) ≠ col(U).
An amazing result: NMF and extended formulations
Let P be a polytope

$$P = \{x \in \mathbb{R}^k \mid b_i - A(i,:)\,x \geq 0 \ \text{for } 1 \leq i \leq m\},$$

and let the $v_j$'s ($1 \leq j \leq n$) be its vertices.

We define the m-by-n slack matrix $S_P$ of P as follows:

$$S_P(i, j) = b_i - A(i,:)\,v_j \geq 0, \quad 1 \leq i \leq m,\ 1 \leq j \leq n.$$

The hexagon:

$$S_P = \begin{pmatrix} 0 & 1 & 2 & 2 & 1 & 0 \\ 0 & 0 & 1 & 2 & 2 & 1 \\ 1 & 0 & 0 & 1 & 2 & 2 \\ 2 & 1 & 0 & 0 & 1 & 2 \\ 2 & 2 & 1 & 0 & 0 & 1 \\ 1 & 2 & 2 & 1 & 0 & 0 \end{pmatrix}$$
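As a sanity check, this slack matrix has ordinary rank 3 (the dimension of the hexagon plus one), while its nonnegative rank is larger; the theorem on the next slide relates that nonnegative rank to the extension complexity. A small sketch:

```python
import numpy as np
from scipy.linalg import circulant

# Slack matrix of the hexagon from the slide (circulant, first column (0, 0, 1, 2, 2, 1)).
SP = circulant(np.array([0.0, 0.0, 1.0, 2.0, 2.0, 1.0]))

print(np.linalg.matrix_rank(SP))   # 3
```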
An amazing result: NMF and extended formulations

An extended formulation of P is a higher-dimensional polyhedron Q ⊆ R^{k+p} that (linearly) projects onto P. The minimum number of facets of such a polytope is called the extension complexity xp(P) of P.

Theorem (Yannakakis, 1991). rank_+(S_P) = xp(P).

Proof (one direction). Given P = { x ∈ R^k | b − Ax ≥ 0 }, any exact NMF S_P = UV with U ≥ 0, V ≥ 0 provides an explicit extended formulation (with some redundant equalities) of P:

    P = { x | b − Ax ≥ 0 } = { x | b − Ax = Uy for some y ≥ 0 }.

Remark. The slack matrix S_P of P satisfies conv(S_P) = S^m ∩ col(S_P).

To get a small factorization, we need to go to a higher-dimensional space: rank(U) > rank(M).
![Page 60: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/60.jpg)
The Hexagon

    S_P =
    | 0 1 2 2 1 0 |
    | 0 0 1 2 2 1 |
    | 1 0 0 1 2 2 |
    | 2 1 0 0 1 2 |
    | 2 2 1 0 0 1 |
    | 1 2 2 1 0 0 |

      =
    | 1 0 0 1/2  0  |
    | 0 1 0  1   0  |
    | 0 0 1 1/2  0  |   | 0 1 2 1 0 0 |
    | 0 0 1  0  1/2 | × | 0 0 1 0 0 1 |
    | 0 1 0  0   1  |   | 1 0 0 0 1 2 |
    | 1 0 0  0  1/2 |   | 0 0 0 2 2 0 |
                        | 2 2 0 0 0 0 |

with

    rank(S_P) = 3 ≤ rank_+(S_P) = 5 ≤ min(m, n) = 6.
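The factorization above can be checked numerically; a minimal NumPy sketch, with the matrices copied from the slide:

```python
import numpy as np

# Slack matrix of the regular hexagon (from the slide).
S = np.array([
    [0, 1, 2, 2, 1, 0],
    [0, 0, 1, 2, 2, 1],
    [1, 0, 0, 1, 2, 2],
    [2, 1, 0, 0, 1, 2],
    [2, 2, 1, 0, 0, 1],
    [1, 2, 2, 1, 0, 0],
], dtype=float)

# The nonnegative factors U (6x5) and V (5x6) from the slide.
U = np.array([
    [1, 0, 0, 0.5, 0],
    [0, 1, 0, 1,   0],
    [0, 0, 1, 0.5, 0],
    [0, 0, 1, 0,   0.5],
    [0, 1, 0, 0,   1],
    [1, 0, 0, 0,   0.5],
])
V = np.array([
    [0, 1, 2, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 2],
    [0, 0, 0, 2, 2, 0],
    [2, 2, 0, 0, 0, 0],
], dtype=float)

assert np.allclose(U @ V, S)              # exact NMF: S_P = UV with rank_+ ≤ 5
assert (U >= 0).all() and (V >= 0).all()  # both factors are nonnegative
print(np.linalg.matrix_rank(S))           # 3: the ordinary rank is only 3
```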
![Page 61: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/61.jpg)
Some implications

Problem: limits of LP for solving combinatorial problems: given a polytope, what is the most compact way to represent it?

Its extension complexity = the nonnegative rank of its slack matrix. Key tool: lower-bound techniques for the nonnegative rank.

Ex. The matching problem cannot be solved via a polynomial-size LP.
Rothvoss (2014). The matching polytope has exponential extension complexity, STOC.

This can be generalized to

- approximations (no poly-size LP can approximate these problems up to some precision).
  Braun, Fiorini, Pokutta & Steurer (2012). Approximation limits of linear programs (beyond hierarchies), FOCS.
- any convex cone, in particular the PSD cone (the so-called PSD rank).
  See the survey: Fawzi, Gouveia, Parrilo, Robinson & Thomas. Positive semidefinite rank, Mathematical Programming, 2015.
![Page 70: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/70.jpg)
Exact NMF computation and regular n-gons

Can we use numerical solvers to get insight into these problems? Yes!

We have developed a library to compute exact NMFs of small matrices using meta-heuristics.
[V14] Vandaele, G., Glineur & Tuyttens, Heuristics for Exact NMF (2014).

Extension complexity of the octagon?

    rank(S_P) = 3 ≤ rank_+(S_P) = 6 ≤ min(m, n) = 8.
![Page 71: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/71.jpg)
Exact NMF computation and regular n-gons

We observed a special structure in the solutions for regular n-gons, leading to the best known upper bound and closing the gap for some n-gons: with k such that 2^{k−1} < n ≤ 2^k,

    rank_+(S_n) ≤ 2⌈log2(n)⌉ − 1   for 2^{k−1} < n ≤ 2^{k−1} + 2^{k−2},
    rank_+(S_n) ≤ 2⌈log2(n)⌉       for 2^{k−1} + 2^{k−2} < n ≤ 2^k.

[V15] Vandaele, G. & Glineur, On the Linear Extension Complexity of Regular n-gons (2015).

Implication: conic quadratic programming is 'polynomially reducible' to linear programming.
[BTN01] Ben-Tal & Nemirovski (2001). On polyhedral approximations of the second-order cone. Mathematics of Operations Research, 26(2), 193-205.
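The case distinction above is easy to evaluate; a small sketch (the function name `ngon_bound` is ours), using k = ⌈log2(n)⌉, i.e. the k with 2^{k−1} < n ≤ 2^k:

```python
def ngon_bound(n):
    """Upper bound on rank_+(S_n) for the regular n-gon, from the slide."""
    k = (n - 1).bit_length()            # smallest k with n <= 2^k
    if n <= 2 ** (k - 1) + 2 ** (k - 2):
        return 2 * k - 1                # first case of the bound
    return 2 * k                        # second case

print(ngon_bound(6), ngon_bound(8))     # 5 6
```

The values 5 and 6 match the nonnegative ranks found earlier for the hexagon and the octagon, so the bound is tight for these small n-gons.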
![Page 73: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/73.jpg)
NMF under the separability assumption
![Page 74: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/74.jpg)
Separability Assumption

Separability of M: there exists an index set K with |K| = r and V ≥ 0 such that

    M = M(:,K) V,   with U = M(:,K).

[AGKM12] Arora, Ge, Kannan & Moitra, Computing a Nonnegative Matrix Factorization – Provably, STOC 2012.
![Page 77: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/77.jpg)
Applications

- In hyperspectral imaging, this is the pure-pixel assumption: for each material, there is a 'pure' pixel containing only that material.
  [M+14] Ma et al., A Signal Processing Perspective on Hyperspectral Unmixing: Insights from Remote Sensing, IEEE Signal Processing Magazine 31(1):67-81, 2014.
- In document classification: for each topic, there is a 'pure' word used only by that topic (an 'anchor' word).
  [A+13] Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
- Time-resolved Raman spectra analysis: each substance has a peak in its spectrum while the other spectra are (close to) zero.
  [L+16] Luce et al., Using Separable Nonnegative Matrix Factorization for the Analysis of Time-Resolved Raman Spectra, Appl. Spectrosc. 2016.
- Others: video summarization, foreground-background separation.
  [ESV12] Elhamifar, Sapiro & Vidal, See all by looking at a few: Sparse modeling for finding representative objects, CVPR 2012.
  [KS15] Kumar & Sindhwani, Near-separable Non-negative Matrix Factorization with ℓ1- and Bregman Loss Functions, SIAM Data Mining 2015.
![Page 80: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/80.jpg)
Geometric Interpretation
The columns of U are the vertices of the convex hull of the columns of M:

    M(:,j) = Σ_{k=1}^r U(:,k) V(k,j) for all j,   where Σ_{k=1}^r V(k,j) = 1, V ≥ 0.
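Recovering the convex weights V(:,j) of a single column is a small nonnegative least-squares problem. A sketch using SciPy's `nnls`, with the sum-to-one constraint enforced by a heavily weighted extra row (a standard trick, not from the slides; the test data is our own):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
U = rng.random((10, 4))            # columns of U = vertices
w = rng.dirichlet(np.ones(4))      # hidden convex weights
m = U @ w                          # a data column inside conv(U)

# Append a heavily weighted all-ones row so that sum(v) ~ 1.
rho = 1e3
A = np.vstack([U, rho * np.ones(4)])
b = np.append(m, rho)
v, _ = nnls(A, b)                  # v >= 0 by construction

assert np.allclose(U @ v, m, atol=1e-4)   # reproduces the column
assert abs(v.sum() - 1) < 1e-4            # weights are (almost) convex
```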
![Page 81: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/81.jpg)
Geometric Interpretation with Noise
The columns of U are approximately the vertices of the convex hull of the columns of M:

    M(:,j) ≈ Σ_{k=1}^r U(:,k) V(k,j) for all j,   where Σ_{k=1}^r V(k,j) = 1, V ≥ 0.

Goal: a theoretical analysis of the robustness to noise of separable NMF algorithms.
![Page 82: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/82.jpg)
Key Parameters: Noise and Conditioning
We assume

    M = U [I_r, V′] Π + N,

where V′ ≥ 0, Π is a permutation matrix and N is the noise.

We will assume that the noise is bounded (but otherwise arbitrary):

    ||N(:,j)||_2 ≤ ε for all j,

and some dependence on the conditioning κ(U) = σ_max(U)/σ_min(U) is unavoidable.
![Page 86: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/86.jpg)
Successive Projection Algorithm (SPA)
0: Initially K = ∅.
For i = 1 : r
  1: Find j* = argmax_j ||M(:,j)||.
  2: K = K ∪ {j*}.
  3: M ← (I − u u^T) M, where u = M(:,j*) / ||M(:,j*)||_2.
end

≈ modified Gram–Schmidt with column pivoting.

Theorem. If ε ≤ O(σ_min(U) / (√r κ²(U))), SPA satisfies

    ||U − M(:,K)|| = max_{1≤k≤r} ||U(:,k) − M(:,K(k))|| ≤ O(ε κ²(U)).

Advantages. Extremely fast, parameter-free.

Drawbacks. Requires U to be full rank; the bound is weak.

[GV14] G. & Vavasis, Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization, IEEE Trans. Pattern Anal. Mach. Intell. 36(4), pp. 698-714, 2014.
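The four steps above translate almost line by line into NumPy; a minimal sketch (the noiseless separable test matrix is our own construction):

```python
import numpy as np

def spa(M, r):
    """Successive Projection Algorithm: return r column indices of M
    whose columns play the role of U in a separable NMF M = M(:,K) V."""
    M = M.astype(float).copy()
    K = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(M, axis=0)))   # step 1: largest column
        K.append(j)                                     # step 2: record index
        u = M[:, j] / np.linalg.norm(M[:, j])
        M -= np.outer(u, u @ M)                         # step 3: project out u
    return K

# Noiseless separable example: the first 5 columns are the vertices U,
# the remaining columns are convex combinations of them.
rng = np.random.default_rng(1)
U = rng.random((30, 5))
Vp = rng.dirichlet(np.ones(5), size=40).T
M = np.hstack([U, U @ Vp])
print(sorted(spa(M, 5)))                                # [0, 1, 2, 3, 4]
```

In the noiseless case SPA provably recovers K exactly whenever U has full column rank, which is what the run above illustrates.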
![Page 88: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/88.jpg)
Pre-conditioning for More Robust SPA
Observation. Pre-multiplying M preserves separability:

    P M = P (U [I_r, V′] Π + N) = (PU) [I_r, V′] Π + PN.

Ideally, P = U^{−1} so that κ(PU) = 1 (assuming m = r).

Solving for the minimum-volume ellipsoid centered at the origin and containing all the columns of M (which is SDP-representable),

    min_{A ∈ S^r_+}  − log det(A)   s.t.   m_j^T A m_j ≤ 1 for all j,

allows one to approximate U^{−1}: in fact, A* ≈ (U U^T)^{−1}.

Theorem. If ε ≤ O(σ_min(U) / (r√r)), preconditioned SPA satisfies

    ||U − M(:,K)|| ≤ O(ε κ(U)).

[GV15] G. & Vavasis, SDP-based Preconditioning for More Robust Near-Separable NMF, SIAM J. on Optimization, 2015.
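The slides solve this SDP exactly. Purely as an illustrative sketch, the origin-centered minimum-volume ellipsoid can also be approximated by a Khachiyan-type first-order scheme (this particular algorithm is our assumption, not the authors' implementation):

```python
import numpy as np

def mvee_centered(M, tol=1e-3, max_iter=50000):
    """Approximate min-volume ellipsoid {x : x^T A x <= 1} centered at the
    origin containing the columns of M, via Khachiyan-style weight updates."""
    r, n = M.shape
    u = np.full(n, 1.0 / n)                      # weights on the columns
    for _ in range(max_iter):
        X = (M * u) @ M.T                        # X = M diag(u) M^T
        Xi = np.linalg.inv(X)
        w = np.einsum('ij,ji->i', M.T @ Xi, M)   # w_j = m_j^T X^{-1} m_j
        j = int(np.argmax(w))
        if w[j] <= r * (1 + tol):                # duality gap small enough
            break
        step = (w[j] - r) / (r * (w[j] - 1.0))   # line-search step (centered case)
        u *= (1 - step)
        u[j] += step
    return np.linalg.inv(X) / r                  # then m_j^T A m_j <= 1 (+tol)

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 50))
A = mvee_centered(M)
vals = np.einsum('ij,jk,ki->i', M.T, A, M)
assert vals.max() <= 1 + 5e-3                    # all columns inside the ellipsoid
```

With M ≈ U[I_r, V′]Π, the returned A approximates (UU^T)^{−1}, so a matrix square root of A can serve as the preconditioner P.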
![Page 92: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/92.jpg)
Geometric Interpretation
Figure: Geometric Interpretation of the SDP-based Preconditioning.
See also Mizutani, Ellipsoidal Rounding for Nonnegative MatrixFactorization Under Noisy Separability, JMLR, 2014.
![Page 93: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis](https://reader033.vdocuments.us/reader033/viewer/2022060319/5f0cb6837e708231d436c3ab/html5/thumbnails/93.jpg)
Synthetic data sets

Each entry of U ∈ R^{40×20}_+ is drawn uniformly in [0, 1]; each column is normalized.
The other columns of M are the middle points of pairs of columns of U (hence there are (20 choose 2) = 190 of them).
The noise moves the middle points toward the outside of the convex hull of the columns of U.
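This construction can be reproduced in a few lines. The exact normalization and the outward noise direction are assumptions (here: ℓ1-normalized columns, and a push away from the centroid of the columns of U):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
p, r = 40, 20
U = rng.uniform(size=(p, r))
U /= U.sum(axis=0)                      # normalize each column (l1 normalization, an assumption)

# All pairwise middle points: (20 choose 2) = 190 of them.
pairs = list(combinations(range(r), 2))
mids = np.stack([(U[:, i] + U[:, j]) / 2 for i, j in pairs], axis=1)

# Push the midpoints away from the centroid, i.e. toward the outside of conv(U).
delta = 0.05                            # noise level (illustrative)
centroid = U.mean(axis=1, keepdims=True)
M = np.hstack([U, mids + delta * (mids - centroid)])
```

M then has 20 + 190 = 210 columns, of which only the first 20 are vertices of the convex hull; sweeping `delta` reproduces the noise-level axis of the experiment.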
Results for the synthetic data sets

Figure: Average fraction of columns correctly extracted as a function of the noise level (for each noise level, 25 matrices are generated).
Combinatorial formulation for separable NMF

We want to find the index set K with |K| = r such that

M = M(:,K) V.

This is equivalent to finding X ∈ R^{n×n} with r non-zero rows such that

M = M X.

A combinatorial formulation:

min_X ||X||_{row,0} such that M = MX or ||M − MX|| ≤ ε.

How to make X row sparse?
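The equivalence between the two formulations can be checked on a toy separable matrix; the helper name `row_support_size` is ours, not from the slides:

```python
import numpy as np

def row_support_size(X, tol=1e-12):
    """||X||_{row,0}: the number of nonzero rows of X."""
    return int(np.count_nonzero(np.linalg.norm(X, axis=1) > tol))

# Separable M: the first r columns are the "pure" columns M(:, K).
rng = np.random.default_rng(1)
p, r, n = 5, 3, 8
W = rng.uniform(size=(p, r))
V = rng.uniform(size=(r, n))
V[:, :r] = np.eye(r)                    # so M(:, 0:r) = W
M = W @ V
K = [0, 1, 2]

# Place the rows of V at the indices K: then M = M X with exactly r nonzero rows.
X = np.zeros((n, n))
X[K, :] = V
assert np.allclose(M, M @ X)            # M = M(:, K) V rewritten as M = M X
assert row_support_size(X) == r
```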
A Linear Optimization Model

min_{X ∈ R^{n×n}_+} trace(X) = ||diag(X)||_1 (since X ≥ 0)

such that ||M − MX|| ≤ ε, X_ij ≤ X_ii ≤ 1 for all i, j.

Robustness: noise ≤ O(κ^{-1}) ⇒ error ≤ O(r ε κ) [GL14].

This model is an improvement over [B+12]: it is more robust and detects the factorization rank r automatically. It is equivalent [GL16] to using ||X||_{1,∞} = Σ_{i=1}^n ||X(i,:)||_∞ as a convex surrogate for ||X||_{row,0} [E+12].

[GL14] G., Luce, Robust Near-Separable NMF Using Linear Optimization, JMLR, 2014.
[B+12] Bittorf, Recht, Ré, Tropp, Factoring nonnegative matrices with LPs, NIPS, 2012.
[E+12] Esser et al., A convex model for NMF and dimensionality reduction on physical space, IEEE Trans. Image Processing, 2012.
[GL16] G., Luce, A Fast Gradient Method for Nonnegative Sparse Regression with Self Dictionary, IEEE Trans. Image Processing, 2018.
Practical Model and Algorithm

min_{X ∈ Ω} ||M − MX||²_F + μ tr(X),

Ω = { X ∈ R^{n×n} | X_ii ≤ 1, w_i X_ij ≤ w_j X_ii for all i, j }.

We used a fast gradient method (an optimal first-order method):

1. Choose an initial point X^(0); set Y = X^(0) and α_1 ∈ (0, 1).
2. For k = 1, 2, . . .
   1. X^(k) = P_Ω(Y − (1/L) ∇f(Y)).
   2. Y = X^(k) + β_k (X^(k) − X^(k−1)),
   where β_k = α_k(1 − α_k)/(α_k^2 + α_{k+1}) with α_{k+1} ≥ 0 such that α_{k+1}^2 = (1 − α_{k+1}) α_k^2.

The projection onto Ω can be computed efficiently in O(n² log(n)) operations.

The total computational cost is O(pn²) operations.

[GL16] G., Luce, A Fast Gradient Method for Nonnegative Sparse Regression with Self Dictionary, IEEE Trans. Image Processing, 2018.
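A minimal sketch of this fast gradient scheme for f(X) = ||M − MX||²_F + μ tr(X). For simplicity the projection onto Ω is replaced by the box {0 ≤ X ≤ 1} (which enforces X_ii ≤ 1 but drops the w_i X_ij ≤ w_j X_ii constraints — an assumption, not the O(n² log n) projection of [GL16]):

```python
import numpy as np

def fgm_self_dictionary(M, mu=1.0, n_iter=200):
    """Fast gradient for min_X ||M - MX||_F^2 + mu*tr(X) over the box 0 <= X <= 1
    (simplified stand-in for the projection onto Omega)."""
    n = M.shape[1]
    G = M.T @ M                                   # grad of ||M - MX||_F^2 is 2 G (X - I)
    L = 2 * np.linalg.eigvalsh(G)[-1]             # Lipschitz constant of the gradient
    X = np.zeros((n, n)); Y = X.copy(); alpha = 0.5
    for _ in range(n_iter):
        grad = 2 * G @ Y - 2 * G + mu * np.eye(n)
        X_new = np.clip(Y - grad / L, 0.0, 1.0)   # gradient step + box projection
        # alpha_{k+1} >= 0 solves alpha^2 = (1 - alpha) * alpha_k^2
        a2 = alpha ** 2
        alpha_next = (np.sqrt(a2 ** 2 + 4 * a2) - a2) / 2
        beta = alpha * (1 - alpha) / (a2 + alpha_next)
        Y = X_new + beta * (X_new - X)            # momentum (extrapolation) step
        X, alpha = X_new, alpha_next
    return X

rng = np.random.default_rng(0)
M = rng.uniform(size=(6, 10))
X = fgm_self_dictionary(M, mu=0.1)
f = lambda Z: np.linalg.norm(M - M @ Z) ** 2 + 0.1 * np.trace(Z)
```

With the exact α-recursion from the slide, the scheme attains the optimal O(1/k²) rate of first-order methods on this smooth convex problem.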
Hyperspectral unmixing

| Algorithm | Time (s.), r = 6 | Rel. err. (%), r = 6 | Time (s.), r = 8 | Rel. err. (%), r = 8 |
|---|---|---|---|---|
| VCA | 1.02 | 18.05 | 1.05 | 22.68 |
| VCA-500 | 0.03 | 7.19 | 0.09 | 7.25 |
| SPA | 0.26 | 9.58 | 0.32 | 9.45 |
| SPA-500 | <0.01 | 10.05 | <0.01 | 8.86 |
| SNPA | 13.60 | 9.63 | 23.02 | 5.64 |
| SNPA-500 | 0.15 | 10.05 | 0.25 | 8.86 |
| XRAY | 28.17 | 7.50 | 95.34 | 6.82 |
| XRAY-500 | 0.15 | 8.07 | 0.28 | 7.36 |
| H2NMF | 12.20 | 5.81 | 14.92 | 5.47 |
| H2NMF-500 | 0.27 | 5.87 | 0.37 | 5.68 |
| FGNSR-500 | 40.11 | **5.07** | 39.49 | **4.08** |

Table: Numerical results for the Urban HSI (the best result is highlighted in bold).
Figure: Abundance maps extracted by FGNSR-500.
Minimum-volume NMF: Relaxing separability

Separable NMF:

min_{K, V ≥ 0} ||M − M(:,K) V||²_F such that |K| = r.
The min-vol relaxation:

min_{U ≥ 0, V ≥ 0} vol(U) such that ||M − UV||²_F ≤ ε,

where vol(U) ∼ det(UᵀU) and V(:, j) ∈ Δ^r for all j.
Open problems: Efficient algorithm for min-vol NMF, robustness to noise.
Fu, Huang, Sidiropoulos, Ma, Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications, arXiv:1803.01257, 2018.
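The two ingredients of the model can be sketched directly: a log-det surrogate for vol(U) ∼ det(UᵀU), and the Euclidean projection of the columns of V onto the unit simplex Δ^r. The sort-based simplex projection below is a standard algorithm; pairing these two pieces here is illustrative, not a full min-vol NMF solver:

```python
import numpy as np

def log_volume(U):
    """log det(U^T U): a numerically safer surrogate for vol(U) ~ det(U^T U)."""
    sign, logdet = np.linalg.slogdet(U.T @ U)
    assert sign > 0, "U must have full column rank"
    return logdet

def project_simplex(v):
    """Euclidean projection of v onto the unit simplex {x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                   # sort in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / ks > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)   # shift so the active part sums to 1
    return np.maximum(v + theta, 0.0)

x = project_simplex(np.array([2.0, 1.0, -0.5]))   # lands on a vertex of the simplex
```

In an alternating scheme, one would minimize over V with such column-wise projections while penalizing log det(UᵀU + δI) in the U-update.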
Identifiability with sparsity

Decompose a low-rank matrix with known sparsity of the coefficients:

M = UV, rank(M) = rank(U) = r, ||V(:, j)||_0 ≤ k = r − s < r for all j.

Many theoretical results (see, e.g., [Gribonval 16]) and algorithms exist (dictionary learning). But:

✗ Not many results are specific to the low-rank case.
✗ Only two deterministic identifiability results [Elad 06, Georgiev 05].
✗ Not much in the NMF case except ℓ1 regularization.
Identifiability with sparsity: example

Example: p = 3, r = 3, s = sparsity = 1, n = 9.

Figure: data points, first decomposition, second decomposition.
Identifiability results

Theorem
Let M = UV where rank(U) = rank(M) = r and each column of V has at least s zeros. The factorization (U, V) is essentially unique if, on each hyperplane spanned by all but one column of U, there are ⌊r(r−2)/s⌋ + 1 data points with spark r.

– For s = 1, this requires r³ − 2r² + r data points, and it is tight up to the constant r (counterexamples for any n = r³ − 2r²).
– For s = r − 1, this requires r data points, and it is tight (one on each intersection of r − 1 hyperplanes).
– It is tight up to constant factors for s = βr, for any fixed constant β.
– Nonnegativity is not taken into account in the analysis; it helps both in theory and in practice: further work.

[CG18] Cohen, G., Identifiability of Low-Rank Sparse Component Analysis, arXiv:1808.08765.
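The quantities in the theorem are easy to check on small instances: a brute-force spark (the smallest number of linearly dependent columns; exponential time, so toy sizes only) and the ⌊r(r−2)/s⌋ + 1 per-hyperplane count. Both helper names are ours:

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest k such that some k columns of A are linearly dependent
    (returns n + 1 if every subset of columns is independent)."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return n + 1

def points_needed(r, s):
    """Data points required on each hyperplane in the theorem: floor(r(r-2)/s) + 1."""
    return r * (r - 2) // s + 1

# e1, e2, e1 + e2: any two columns are independent, all three are dependent.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
```

For the p = 3, r = 3, s = 1 example above, `points_needed(3, 1)` gives 4 points per hyperplane, matching the 4 + 3 + 2 = 9 shared points of the geometric picture.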
Geometric intuition

Example: p = 3, r = 3, sparsity = 1, n = 4 + 3 + 2 = 9.

Figure: data points, unique decomposition.
Sparsity in action

Spectral unmixing, R = 6, s = 4.

✓ Sparsity is another way to obtain identifiability for matrix decompositions.
✗ Hard combinatorial problems to solve...
Take-home messages

1. NMF is a useful and widely used linear model in data analysis and machine learning.
2. NMF is difficult (NP-hard) and ill-posed (non-uniqueness).
3. NMF is closely related to the nested polytopes problem and extended formulations.
4. NMF with a (self-)dictionary is tractable and well-posed (separable NMF).
5. To obtain identifiable NMF models, minimum volume or sparsity can be used but, as opposed to separability, this does not lead to tractable models. This is an important direction of research (robustness to noise, tractability).
Thank you for your attention!

Code and papers available from https://sites.google.com/site/nicolasgillis