TRANSCRIPT
Nonnegative Matrix and Tensor Factorizations: Models, Algorithms and Applications
Andersen Man Shun Ang
Supervisor: Nicolas Gillis
Homepage: angms.science
October 14, 2020
Overview
1 Introduction
2 Part I: Minimum volume NMF
3 Part II: NuMF
4 Part III: HER
5 Summary of contributions
2 / 53
About myself, and acknowledgments
I spent 9 years learning how to know you don’t know.
I acknowledge:
- My boss Nicolas Gillis
- Jury members: Xavier Siebert (UMONS), Thierry Dutoit (UMONS), Laurent Jacques (UCL), Francois Glineur (UCL), Lieven De Lathauwer (KUL)
- Colleagues and friends: Nicolas, Pierre, Hien, Arnaud, Valentin, Maryam, Tim, Francois, Punit, Jeremy, Junjun, Christophe, Flavia
3 / 53
About this “talk”
- 3 main parts to cover the 8 chapters of the thesis:
  - Ch1 Introduction
  - Ch2 NMF with minimum volume: minvol NMF
  - Ch3 Minvol NMF on hyperspectral unmixing
  - Ch4 Minvol NMF on audio source separation
  - Ch5 NMF with unimodality: NuMF
  - Ch6 Tensor algebra and factorization
  - Ch7 Heuristic extrapolation with restarts
  - Ch8 Conclusion
- Highly compressed → details in the thesis.
- Only core ideas and eye-catching items.
4 / 53
Overview
1 Introduction
2 Part I: Minimum volume NMF
3 Part II: NuMF
4 Part III: HER
5 Summary of contributions
5 / 53
(New) It is cool.
6 / 53
Thunder-fast review of NMF
7 / 53
NMF
- Numerical optimization

  minimize_{W,H}  (1/2)‖M − WH‖²_F  subject to  W ≥ 0 and H ≥ 0.

- Regularized model

  min_{W,H}  (1/2)‖M − WH‖²_F + g(W)  s.t.  W ≥ 0, H ≥ 0 and other constraints.

- Notation: W ≥ 0 and H ≥ 0 are hidden from now on.
9 / 53
The details I skipped
- Popularity of NMF in research
- The nonnegative rank
- How NMF is used in applications
- How NMF problems are solved
- HALS
- Solution space of NMF
- NMF is NP-hard
- Separable NMF
- How to solve Separable NMF
- Successive Projection Algorithm
... see chapter 1 of the thesis!
11 / 53
Geometry of M = WH
[Figure: two 3-D scatter plots (Coordinate 1, 2, 3): left panel "NMF" and right panel "Separable NMF", each showing the columns of M, the columns of W, and the columns of I3.]
Figure: Nested cone geometry of NMF: blue cone ⊆ red cone ⊆ green cone.
12 / 53
Minimum volume NMF
min_{W,H}  (1/2)‖M − WH‖²_F + λ V(W)  subject to some constraints.
13 / 53
What is “volume”?
- Lemma 2.1.1 (p.17 in the thesis) If W ∈ ℝ^{m×r}_+ is full rank, then √det(WᵀW) is the volume of conv([0 W]) in the column space of W, up to a constant factor.
  Proof idea: Gram-Schmidt orthogonalization and SVD.
- Figure illustration.
14 / 53
What is “volume”?
det(WᵀW) = ‖w_1‖²₂ · ‖P⊥_1 w_2‖²₂ · ‖P⊥_{1,2} w_3‖²₂ ⋯ ‖P⊥_{1,…,r−1} w_r‖²₂,

where P⊥_a = I − aaᵀ/‖a‖²₂ is the projector onto span⊥(a), and

  P⊥_1 = P⊥_{a_1},  a_1 = w_1
  P⊥_{1,2} = P⊥_1 P⊥_{a_2},  a_2 = P⊥_1 w_2
  P⊥_{1,2,3} = P⊥_{1,2} P⊥_{a_3},  a_3 = P⊥_{1,2} w_3
  ...
  P⊥_{1,…,r−1} = P⊥_{1,2,…,r−2} P⊥_{a_{r−1}},  a_{r−1} = P⊥_{1,2,…,r−2} w_{r−1}.
15 / 53
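The identity above is easy to check numerically; a small sketch, where the projector update mirrors the P⊥ recursion on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 6, 3
W = rng.random((m, r))          # a full-rank W

lhs = np.linalg.det(W.T @ W)    # det(W^T W)

# Product of the squared norms of the successively projected columns.
rhs = 1.0
P = np.eye(m)                   # projector onto span^perp of w_1 .. w_{j-1}
for j in range(r):
    v = P @ W[:, j]             # a_j = P_perp_{1..j-1} w_j
    rhs *= v @ v
    P = P - np.outer(v, v) / (v @ v)   # P_perp_{1..j} = P_perp_{1..j-1} P_perp_{a_j}

print(lhs, rhs)                 # the two values agree up to rounding
```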
Some volume functions for min_{W,H} (1/2)‖M − WH‖²_F + λ V(W)

  Name                     | Definition                          | In g ∘ σ(W)
  Determinant (det)        | V_det(W) = det(WᵀW)                 | ∏_{i=1}^r σ_i²
  log-determinant (logdet) | V_logdet(W) = log det(WᵀW + δI_r)   | ∑_{i=1}^r log(σ_i² + δ)
  Frobenius norm squared   | V_F(W) = ‖W‖²_F                     | ∑_{i=1}^r σ_i²
  Nuclear norm             | V_*(W) = ‖W‖_*                      | ∑_{i=1}^r σ_i
  Smooth Schatten-p norm   | V_{p,δ}(W) = Tr(WᵀW + δI_r)^{p/2}   | ∑_{i=1}^r (σ_i² + δ)^{p/2}
16 / 53
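Each regularizer in the table is a spectral function g ∘ σ(W). A quick sketch checking that the two columns agree (δ, p and the test matrix are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((8, 3))
s = np.linalg.svd(W, compute_uv=False)   # singular values sigma_i of W
G = W.T @ W                              # its eigenvalues are sigma_i^2
delta, p = 0.1, 1

V_det      = np.linalg.det(G)
V_logdet   = np.log(np.linalg.det(G + delta * np.eye(3)))
V_F        = np.linalg.norm(W, 'fro')**2
V_nuclear  = s.sum()
V_schatten = np.sum(np.linalg.eigvalsh(G + delta * np.eye(3))**(p / 2))

# Each value matches its expression in the singular values:
print(np.isclose(V_det,      np.prod(s**2)))
print(np.isclose(V_logdet,   np.sum(np.log(s**2 + delta))))
print(np.isclose(V_F,        np.sum(s**2)))
print(np.isclose(V_schatten, np.sum((s**2 + delta)**(p / 2))))
```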
New theory developed in the thesis: Identifiability of a minvol (N)MF

- Theorem 2.1.1 Let M = WH where H ∈ ℝ^{r×n}_+ satisfies the SSC, W ∈ ℝ^{m×r} satisfies Wᵀ1_m = 1_r and rank(M) = r. Then the (exact) solution of

    argmin_{W,H}  V_det(W)      % det-volume
    s.t.  M = WH                % matrix factorization
          H ≥ 0                 % nonnegativity
          Wᵀ1_m = 1_r           % normalization of the columns of W

  is essentially unique.
- Significance: justifies the use of minvol NMF in applications.
17 / 53
Illustration
http://angms.science/eg_SNPA_ini.gif
18 / 53
The details I skipped
- Details of the formulation of the minvol NMF
- Motivation of minvol NMF
- Details on the volume regularizer
- Comparing the volume regularizers
- Parameter tuning
- Rank-deficient case
- The proof of the identifiability theorem
- How to solve minvol NMF
- Computational cost of the algorithm
- Convergence analysis of the algorithm
... see chapter 2 of the thesis!
19 / 53
Hyperspectral Unmixing
[Figure: Endmember spectral profiles of the Jasper Ridge HSI data: reflectance (0 to 0.6) against band number (20 to 180) for Vegetation, Water, Dirt, and Road.]
20 / 53
Minvol NMF on Hyperspectral Unmixing
[Figure: the unmixing pipeline. The image at wavelength band 5 (x- and y-coordinates) is vectorized into one column; stacking all bands gives the data matrix M (wavelength × spatial coordinate). NMF factors M into W, whose columns w_j are spectral signatures plotted against wavelength, and H, whose rows h_j hold the abundances over the spatial coordinates; matricizing one row of H back onto the x-y grid gives an abundance map.]
21 / 53
22 / 53
[Figure: extracted spectra (reflectance vs. wavelength band): Vegetation types 1 to 4, Dirt types 1 and 2, and Water; aggregated Vegetation, Dirt, and Water; and the reference (Ref.) Vegetation, Dirt, and Water spectra.]
23 / 53
The details I skipped
- Details of the hyperspectral image
- Pure-pixel assumption
- Performance metrics for ranking algorithms
- Parameter tuning and experiment setup
- Details of many experimental results
  - minvol NMF outperforms SPA, MVC-NMF and RVolMin on synthetic and real datasets.
  - minvol NMF with V_det and V_logdet are the top two methods among the tested methods.
... see chapter 3 of the thesis!
24 / 53
Audio source separation
[Figure: the score of "Mary had a little lamb" (notes E4 D4 C4 D4 E4 E4 E4) and its decomposition: spectral templates and activations for C4, D4, E4, hammer noise, and a redundant component over 0 to 4.2 s.]
25 / 53
Illustration
https://www.youtube.com/watch?v=1BrpxvpghKQ
26 / 53
Correct identification of the note sequence
[Figure: activations of the extracted components over time, recovering the order of the notes.]
27 / 53
How does it work?
[Diagram: the separation pipeline. The time-domain input x(t) is mapped by a time-frequency transform Ψ to the complex-valued X(f,t); its real-valued amplitude V(f,t) (phase set aside) is factorized by NMF as [WH](f,t) = w1h1 + ... + wrhr; each rank-1 component, recombined with the phase, gives Y1(f,t), ..., Yr(f,t), and the inverse transform Ψ⁻¹ yields the time-domain estimates y1(t), ..., yr(t), which sum to the estimate y(t).]
And the data-fitting term is changed from (1/2)‖M − WH‖²_F to the β-divergence.
28 / 53
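A small sketch of the β-divergence data-fitting term, using its standard elementwise definition summed over the entries (the function name is mine; β = 2 gives half the squared Euclidean distance, β = 1 the KL divergence, β = 0 the Itakura-Saito divergence):

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x | y).
    beta = 2: half squared Euclidean; beta = 1: KL; beta = 0: Itakura-Saito."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if beta == 1:
        return np.sum(x * np.log(x / y) - x + y)
    if beta == 0:
        return np.sum(x / y - np.log(x / y) - 1)
    return np.sum((x**beta + (beta - 1) * y**beta
                   - beta * x * y**(beta - 1)) / (beta * (beta - 1)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 2.0, 2.5])
print(beta_divergence(x, y, 2))  # 0.25, i.e. 0.5 * ||x - y||^2
```

Unlike the Frobenius norm, the divergence for β < 2 penalizes errors relative to the magnitude of the entries, which suits audio spectrograms with large dynamic range.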
Assumptions on applying minvol NMF on audio data
On using minvol NMF to estimate the source s:
A1 No time delay in the mixing.
A2 Linear mixing model.
A3 Sources are balanced.
A4 Additive mixture V = ∑_i |S_i|.
A5 |Si| are well approximated by nonnegative rank-1 matrices.
A6 Reconstructed components are consistent.
A7 The data has no outlier.
Violating any one of these assumptions leads to errors, or ... new research opportunities!
29 / 53
On the consistency assumption
[Figure: the time-domain signal x ∈ ℝ^T is mapped by Ψ to X = V exp(iΦ) ∈ ℂ^(F×T′); two NMF fits give Y = W1H1 exp(iΦ) and Z = W2H2 exp(iΦ), which Ψ⁻¹ maps back to the time-domain signals y and z.]
Figure: Illustrating the consistency issue of decomposing a time-frequency matrix. Here the NMF result W1H1 is consistent as the matrix Y, while the result W2H2 is not.
30 / 53
The details I skipped
- Details of the audio source separation process
- Time-frequency transform
- β-divergence and β-divergence NMF
- Assumptions on applying NMF to audio data
- Parameter tuning and experiment setup
- Details of experimental results
... see chapter 4 of the thesis!
31 / 53
Overview
1 Introduction
2 Part I: Minimum volume NMF
3 Part II: NuMF
4 Part III: HER
5 Summary of contributions
32 / 53
Nonnegative unimodality
- A vector x is nonnegative unimodal (Nu) if ∃p ∈ [m] such that

  0 ≤ x_1 ≤ x_2 ≤ ⋯ ≤ x_p  and  x_p ≥ x_{p+1} ≥ ⋯ ≥ x_m ≥ 0.

  U^m_+: the set of Nu vectors in ℝ^m.
  U^{m,p}_+: the set of Nu vectors in U^m_+ with monotonicity change at p.
I Examples
33 / 53
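The definition translates directly into a membership test; a simple sketch (the function name is mine):

```python
import numpy as np

def is_nonneg_unimodal(x):
    """Membership test for U^m_+ : nonnegative entries, nondecreasing
    up to the (first) peak, nonincreasing after it."""
    x = np.asarray(x, float)
    if (x < 0).any():
        return False
    p = int(np.argmax(x))    # first position of the maximum
    return bool((np.diff(x[:p + 1]) >= 0).all() and (np.diff(x[p:]) <= 0).all())

print(is_nonneg_unimodal([0, 1, 3, 2, 0]))   # True
print(is_nonneg_unimodal([0, 2, 1, 2, 0]))   # False (two peaks)
print(is_nonneg_unimodal([1, -1, 0]))        # False (negative entry)
```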
Characterizing the Nu set
x ∈ U^m_+  ("x is unimodal")
⟺ ∃p s.t. x ∈ U^{m,p}_+ ∪ U^{m,p+1}_+  (a convex set)
⟺ 0 ≤ x_1, x_1 ≤ x_2, …, x_{p−1} ≤ x_p  and  x_{p+1} ≥ x_{p+2}, …, x_{m−1} ≥ x_m, x_m ≥ 0
   (a union of two systems of monic inequalities).
34 / 53
Characterizing the Nu set
0 ≤ x_1, x_1 ≤ x_2, …, x_{p−1} ≤ x_p,  x_{p+1} ≥ x_{p+2}, …, x_{m−1} ≥ x_m, x_m ≥ 0   ⟺   U_p x ≥ 0,

where U_p ∈ {−1, 0, 1}^{m×m} is block diagonal,

  U_p = ⎡ D            0_{p×(m−p)} ⎤
        ⎣ 0_{(m−p)×p}  D̄           ⎦,

with the p×p rising block D (ones on the diagonal, −1 on the subdiagonal: rows x_1 ≥ 0, x_2 − x_1 ≥ 0, …, x_p − x_{p−1} ≥ 0) and the (m−p)×(m−p) falling block D̄ (ones on the diagonal, −1 on the superdiagonal: rows x_{p+1} − x_{p+2} ≥ 0, …, x_{m−1} − x_m ≥ 0, x_m ≥ 0).
35 / 53
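A sketch that builds U_p from its two bidiagonal blocks and checks U_p x ≥ 0 on a Nu vector (the code is 0-indexed, while p is the 1-based peak position; the helper name is mine):

```python
import numpy as np

def make_Up(m, p):
    """The m x m matrix U_p of monic inequalities: x in U^{m,p}_+
    iff make_Up(m, p) @ x >= 0 (p is the 1-based peak position)."""
    U = np.zeros((m, m))
    U[0, 0] = 1                      # x_1 >= 0
    for j in range(1, p):            # rising part: x_{j+1} - x_j >= 0
        U[j, j - 1], U[j, j] = -1, 1
    for j in range(p, m - 1):        # falling part: x_{j+1} - x_{j+2} >= 0
        U[j, j], U[j, j + 1] = 1, -1
    U[m - 1, m - 1] = 1              # x_m >= 0
    return U

x = np.array([0.0, 1.0, 4.0, 3.0, 1.0, 0.5])   # Nu with peak at p = 3
print((make_Up(6, 3) @ x >= 0).all())          # True
```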
NuMF
min_{W,H}  (1/2)‖M − WH‖²_F
subject to  U_{p_j} w_j ≥ 0  for all j ∈ [r],
            w_jᵀ 1_m = 1  for all j ∈ [r],
            h_i ≥ 0  for all i ∈ [r].
How to solve it?
- There are r integer unknowns p_1, …, p_r
- Idea to solve it: brute force
- Improvement: accelerated projected gradient, peak detection, multi-grid
36 / 53
New theory developed in the thesis: Restriction operator preserves Nu

- Theorem 5.2.1 Let x ∈ U^m_+ and let R = R(a, b) ∈ ℝ^{m′×m} be the restriction operator with the tridiagonal stencil

    R(a, b) = ⎡ a b         ⎤
              ⎢ b a b       ⎥
              ⎢   ⋱ ⋱ ⋱     ⎥
              ⎢     b a b   ⎥
              ⎣       b a   ⎦,   a > 0, b > 0, a + 2b = 1.

  Then y = Rx ∈ U^{m′}_+. Furthermore, if x ∈ U^{m,p}_+ where p is even, then y ∈ U^{m,p_y}_+ with p_y ∈ {p/2 − 1, p/2, p/2 + 1}.
37 / 53
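A numerical sketch of the theorem with the square tridiagonal stencil shown above (the actual multi-grid restriction in the thesis also coarsens the grid; that downsampling step is omitted here, and the values a = 0.5, b = 0.25 are my choice satisfying a + 2b = 1):

```python
import numpy as np

def make_R(m, a=0.5, b=0.25):
    """Square tridiagonal smoothing operator R(a, b) with a + 2b = 1."""
    return b * np.eye(m, k=-1) + a * np.eye(m) + b * np.eye(m, k=1)

def is_nonneg_unimodal(x):
    x = np.asarray(x, float)
    p = int(np.argmax(x))
    return bool((x >= 0).all()
                and (np.diff(x[:p + 1]) >= 0).all()
                and (np.diff(x[p:]) <= 0).all())

x = np.array([0.0, 1.0, 3.0, 5.0, 4.0, 2.0, 1.0, 0.0])   # Nu, peak at p = 4
y = make_R(8) @ x                                        # smoothed signal
print(is_nonneg_unimodal(x), is_nonneg_unimodal(y))      # True True
```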
New theory developed in the thesis: 3 theorems related to identifiability of NuMF

- Theorem 5.3.1 Informally, if the w_i are Nu and have strictly disjoint supports, then the solution of NuMF is essentially unique.
  - Specialty of the theorem: it works with n ≥ 1, even if r > n.
- Theorem 5.3.2 Informally, if the w_i are Nu and have strictly disjoint supports, and H satisfies the independent sensing condition, then the (exact) solution of NuMF is essentially unique.
- Theorem 5.3.3 Informally, given x, y ∈ U^m_+ such that supp(x) ⊄ supp(y). If x, y can be generated by two non-zero Nu vectors u, v as x = au + bv and y = cu + dv, then the only possibilities are either u = x, v = y or u = y, v = x.
39 / 53
40 / 53
The details I skipped
- Details of the accelerated projected gradient
- Details of the peak detection
- Details of multi-grid
- Details of the identifiability results
... see chapter 5 of the thesis!
41 / 53
Overview
1 Introduction
2 Part I: Minimum volume NMF
3 Part II: NuMF
4 Part III: HER
5 Summary of contributions
42 / 53
HER
- A framework for accelerating algorithms for solving NMF, NTF and CPD.
[Figure: CPD of a third-order tensor X as a sum of r rank-1 terms a_j^(1) ⊗ a_j^(2) ⊗ a_j^(3), j = 1, …, r.]
- Key ideas
  - Extrapolation with restart.
  - Cheap computation by reusing already-computed components in the update.
- Extensive numerical evidence for the effectiveness of HER.
43 / 53
Extrapolation? Why?
[Figure: contour plots of an objective's level sets with iterate trajectories, comparing runs with and without extrapolation.]
44 / 53
45 / 53
[Diagram: one HER iteration, from iterate k−1 to k: for each block A1, A2, …, AN in turn, an update step (Up Ai), an extrapolation step (Ex Ai), and a step Pr Ai.]
46 / 53
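A much-simplified sketch of the update-extrapolate-restart loop for two-block NMF. This is illustrative only: the real HER scheme of chapter 7 manages the extrapolation weight adaptively and handles N blocks, while the multiplicative updates, the weight schedule, and the function name here are my choices.

```python
import numpy as np

def nmf_extrapolated(M, r, n_iter=300, beta=0.5, seed=0, eps=1e-9):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    We, He = W.copy(), H.copy()            # auxiliary (extrapolated) sequence
    err_prev = np.inf
    for _ in range(n_iter):
        # Update step: multiplicative updates evaluated at the extrapolated point.
        Hn = He * (We.T @ M) / (We.T @ We @ He + eps)
        Wn = We * (M @ Hn.T) / (We @ Hn @ Hn.T + eps)
        err = np.linalg.norm(M - Wn @ Hn)
        if err > err_prev:
            # Restart: drop the extrapolation, shrink the weight, re-step.
            We, He, beta = W.copy(), H.copy(), 0.7 * beta
        else:
            # Extrapolate, projecting back onto the nonnegative orthant
            # so the auxiliary sequence stays feasible.
            We = np.maximum(Wn + beta * (Wn - W), 0)
            He = np.maximum(Hn + beta * (Hn - H), 0)
            W, H, err_prev = Wn, Hn, err
    return W, H

M = np.outer([1.0, 2, 3], [1.0, 0, 1]) + np.outer([0.0, 1, 2], [0.0, 1, 1])
W, H = nmf_extrapolated(M, r=2)
print(np.linalg.norm(M - W @ H))   # small residual
```

The restart test on the data-fitting error is what keeps the scheme safe: a bad extrapolated step is simply discarded and the weight reduced.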
Results
See thesis: Chapter 7, pp. 113-125.
47 / 53
The details I skipped
- The discussion of HER in its original form for solving the NMF problem.
- The details of HER on NMF and NTF problems.
- Other similar algorithms.
... see chapter 7 of the thesis!
48 / 53
Overview
1 Introduction
2 Part I: Minimum volume NMF
3 Part II: NuMF
4 Part III: HER
5 Summary of contributions
49 / 53
Summary of contributions: the modeling aspect
- Minvol NMF
  - It generalizes Separable NMF, an important class of NMF.
  - Minvol NMF with V_det is identifiable under the SSC condition.
  - Minvol NMF is empirically superior to other approaches.
  - Minvol NMF with nuclear norm: new model.
- NuMF
  - Proposed a brute-force heuristic to solve it, with acceleration by APG, peak detection and a dimension-reduction step based on a multi-grid method.
  - The multi-grid method preserves unimodality.
  - 3 identifiability theorems of NuMF in 3 special cases.
50 / 53
Summary of contributions: the algorithm aspect
- A generic framework named HER for accelerating both exact and inexact BCD-type algorithms for solving NMF, NTF and CPD.
- Extensive empirical evidence for the effectiveness of HER.
- The benefits of HER:
  - it can accelerate any BCD algorithm;
  - it extrapolates the variable sequence without increasing the per-iteration computational cost of the algorithm;
  - the auxiliary extrapolation sequence it produces is always feasible.
- The main shortcomings of HER: no theoretical convergence guarantee; requires parameter tuning.
- Apart from HER, we provided efficient algorithms for solving minvol NMF and NuMF.
51 / 53
Summary of contributions: the application aspect
- Geography: hyperspectral unmixing in remote sensing.
- Music: audio blind source separation.
- Chemistry: chromatography-mass spectrometry.
52 / 53
The end
- PhD Thesis:
  Nonnegative Matrix and Tensor Factorizations: Models, Algorithms and Applications
  Available online: https://angms.science/doc/PhDThesis.pdf
  Errata: https://angms.science/doc/PhDThesisErrata.pdf
- Works during the PhD period
  (green = journal paper, black = conference paper)
  1. A. and Gillis, Volume regularized non-negative matrix factorizations, WHISPERS18
  2. A. and Gillis, Algorithms and comparisons of nonnegative matrix factorizations with volume regularization for hyperspectral unmixing, JSTAR, 19
  3. A. and Gillis, Accelerating nonnegative matrix factorization algorithms using extrapolation, Neural Computation, 19
  4. A., Cohen and Gillis, Accelerating approximate nonnegative canonical polyadic decomposition using extrapolation, GRETSI19
  5. Leplat, Gillis, Siebert and A., Blind separation of audio sources by nonnegative matrix factorization with a penalty on the volume of the dictionary (in French), GRETSI19
  6. Leplat, A. and Gillis, Minimum-volume rank-deficient nonnegative matrix factorizations, ICASSP19
  7. Leplat, Gillis and A., Blind Audio Source Separation with Minimum-Volume Beta-Divergence NMF, TSP, 20
  8. A., Cohen, Le and Gillis, Extrapolated Alternating Algorithms for Approximate Canonical Polyadic Decomposition, ICASSP20
  9. A., Cohen, Gillis and Le, Accelerating Block Coordinate Descent for Nonnegative Tensor Factorization, NLAA, under review, 20
  10. A., Gillis, Vandaele and De Sterck, Nonnegative Unimodal Matrix Factorization, submitted to ICASSP (NEW)
53 / 53