Singular Value Decomposition for High-dimensional High-order Data
Anru Zhang, Department of Statistics, University of Wisconsin-Madison
CSML @ Princeton University, May 14, 2018
Joint work with Dong Xia
Introduction
• Tensors, or high-order arrays, have attracted significant attention recently.
• e.g. an order-d tensor
  X ∈ R^{p1×···×pd},  X = (X_{i1···id}),  1 ≤ ik ≤ pk,  k = 1, ..., d.
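As a concrete illustration (a minimal numpy sketch, not from the slides; the helper name `unfold` is ours), an order-3 tensor and its mode-k matricizations, which will be used repeatedly below:

```python
import numpy as np

def unfold(X, mode):
    """Mode-k matricization: move axis `mode` to the front and flatten
    the remaining axes, giving a (p_mode, prod of other dims) matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

# An order-3 tensor X in R^{p1 x p2 x p3}
p1, p2, p3 = 4, 5, 6
X = np.random.randn(p1, p2, p3)

print(unfold(X, 0).shape)  # (4, 30)
print(unfold(X, 1).shape)  # (5, 24)
print(unfold(X, 2).shape)  # (6, 20)
```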
Anru Zhang (UW-Madison) Tensor SVD 2
Example
• Brain imaging
• Hyperspectral imaging
• Recommender system
• Higher order is fancier!
• Higher-order tensor problems are far more than extensions of matrices:
  – more structures;
  – high dimensionality;
  – computational difficulty.
In this talk, we focus on tensor SVD.
Tensor SVD
SVD and PCA
• Singular value decomposition (SVD) is one of the most important tools in multivariate analysis.
• Goal: find the underlying low-rank structure from the data matrix.
• Closely related to principal component analysis (PCA): find the one/multiple directions that explain most of the variance.
• Variations: sparse PCA, robust PCA, sparse SVD, kernel SVD, ...
Related Works
• Rank-1 tensor SVD: Richard and Montanari, 2014; Hopkins, Shi, Steurer, 2015; Perry, Wein, Bandeira, 2017; Anandkumar, Deng, Ge, Mobahi, 2017.
  Y = λ · u ⊗ v ⊗ w + Z,  Z iid∼ N(0, σ²).
• Methods:
  – power methods, sum-of-squares, approximate message passing, homotopy...
  – MLE, warm-start power iterations...
• Statistical and computational trade-offs & phase transition effects.
• The statistical framework for tensor SVD when r ≥ 2 is not well defined or solved.
Tensor SVD
• We propose a general framework for tensor SVD:
  Y = X + Z,
  where
  – Y ∈ R^{p1×p2×p3} is the observation;
  – Z is the noise;
  – X is a low-rank tensor.
• We wish to recover the high-dimensional low-rank structure X.
  → Unfortunately, there is no uniform definition of tensor rank.
Low-rankness for Tensors
• Canonical polyadic (CP) low-rankness is widely used in the literature.
  Definition:
  r_cp = min r  s.t.  X = Σ_{i=1}^r λi · ui ⊗ vi ⊗ wi.
• Disadvantages:
  – possibly r_cp > max{p1, p2, p3};
  – the set of CP rank-r tensors may not be closed;
    possible situation: the limit of a sequence of CP rank-2 tensors is of rank 3!
  – the ui's (vi's, wi's) are usually not orthogonal.
Picture Source: Guoxu Zhou’s website. http://www.bsp.brain.riken.jp/ zhougx/tensor.html
Tucker Low-rankness
• Let M1(X), M2(X), and M3(X) be the matricizations (unfoldings) of X.
• We assume r1 = rank(M1(X)), r2 = rank(M2(X)), r3 = rank(M3(X)), and denote
  Tucker-rank(X) = (r1, r2, r3).
• Note:
  Matrix: dim(row-span) = dim(column-span).
  Tensor: the parallel definitions r1, r2, r3 are not necessarily equal.
Picture Source: Tao, M., Zhou, F., Liu, Y. and Zhang, Z. (2015)
More General Assumption: Tucker Low-rankness
• Equivalent definition: Tucker decomposition
  X = S ×1 U1 ×2 U2 ×3 U3.
• The smallest such (r1, r2, r3) is exactly the Tucker rank of X.
Picture Source: Guoxu Zhou’s website. http://www.bsp.brain.riken.jp/ zhougx/tensor.html
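A small numpy sketch (helper names `unfold` and `mode_mult` are ours) of building a tensor from a Tucker decomposition X = S ×1 U1 ×2 U2 ×3 U3 and checking that the matricization ranks recover (r1, r2, r3):

```python
import numpy as np

def unfold(X, mode):
    # Mode-k matricization of an order-3 tensor
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_mult(X, U, mode):
    # Mode-k product X x_k U: multiplies U against the mode-k fibers
    Xk = np.moveaxis(X, mode, 0)
    out = np.tensordot(U, Xk, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
p, r = (10, 11, 12), (2, 3, 4)
S = rng.standard_normal(r)  # core tensor in R^{r1 x r2 x r3}
# Orthonormal factor matrices Uk in O_{pk, rk}
U = [np.linalg.qr(rng.standard_normal((p[k], r[k])))[0] for k in range(3)]

# X = S x_1 U1 x_2 U2 x_3 U3
X = mode_mult(mode_mult(mode_mult(S, U[0], 0), U[1], 1), U[2], 2)

# Tucker rank = the three matricization ranks
print([np.linalg.matrix_rank(unfold(X, k)) for k in range(3)])  # [2, 3, 4]
```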
Model
• Observations: Y ∈ R^{p1×p2×p3},
  Y = X + Z = S ×1 U1 ×2 U2 ×3 U3 + Z,
  Z iid∼ N(0, σ²),  Uk ∈ O_{pk,rk},  S ∈ R^{r1×r2×r3}.
• Goal: estimate U1, U2, U3, and the original tensor X.
Straightforward Idea 1: Higher order SVD (HOSVD)
• Since Uk spans the column space of Mk(X), let
  Ûk = SVD_{rk}(Mk(Y)),  k = 1, 2, 3,
  i.e. the leading rk left singular vectors of the mode-k matricization.
Note: SVD_r(·) denotes the first r left singular vectors of a given matrix.
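The one-unfolding-per-mode recipe above can be sketched in a few lines of numpy (helper names are ours, not from the slides):

```python
import numpy as np

def unfold(Y, mode):
    # Mode-k matricization
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def svd_r(M, r):
    # SVD_r(M): first r left singular vectors of M
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]

def hosvd(Y, ranks):
    # Higher-order SVD: one truncated SVD per mode of the unfolded tensor
    return [svd_r(unfold(Y, k), ranks[k]) for k in range(3)]

rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 9, 10))
U1, U2, U3 = hosvd(Y, (2, 2, 2))
print(U1.shape, np.allclose(U1.T @ U1, np.eye(2)))  # (8, 2) True
```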
Straightforward Idea 1: Higher order SVD (HOSVD)
(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. & Appl. 2000a)
• Advantage: easy to implement and analyze.
• Disadvantage: performs sub-optimally.
  Reason: simply unfolding the tensor fails to utilize the tensor structure!
Straightforward Idea 2: Maximum Likelihood Estimator
• Maximum likelihood estimator:
  (Û1^mle, Û2^mle, Û3^mle, Ŝ^mle) = argmin_{U1,U2,U3,S} ‖Y − S ×1 U1 ×2 U2 ×3 U3‖²_F.
• Equivalently, Û1^mle, Û2^mle, Û3^mle can be calculated via
  max ‖Y ×1 V1ᵀ ×2 V2ᵀ ×3 V3ᵀ‖²_F  subject to  V1 ∈ O_{p1,r1}, V2 ∈ O_{p2,r2}, V3 ∈ O_{p3,r3}.
• Advantage: achieves statistical optimality (will be shown later).
• Disadvantages:
  – non-convex, computationally intractable;
  – NP-hard to approximate even for r = 1 (Hillar and Lim, 2013).
Phase Transition in Tensor SVD
• The difficulty is driven by the signal-to-noise ratio (SNR):
  λ = min_{k=1,2,3} σ_{rk}(Mk(X)) = the least non-zero singular value of Mk(X), k = 1, 2, 3,
  σ = SD(Z) = noise level.
• Suppose p1 ≍ p2 ≍ p3 ≍ p. Three phases:
  λ/σ ≥ Cp^{3/4}  (strong SNR case),
  λ/σ < cp^{1/2}  (weak SNR case),
  p^{1/2} ≪ λ/σ ≪ p^{3/4}  (moderate SNR case).
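The signal strength λ is computable directly from the matricizations; a numpy sketch (helper names are ours), using a rank-(1,1,1) tensor X = u ⊗ v ⊗ w, whose every matricization has top singular value ‖u‖‖v‖‖w‖:

```python
import numpy as np

def unfold(X, mode):
    # Mode-k matricization
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def signal_strength(X, ranks):
    """lambda = min_k sigma_{r_k}(M_k(X)): the smallest of the r_k-th
    singular values across the three matricizations."""
    return min(np.linalg.svd(unfold(X, k), compute_uv=False)[ranks[k] - 1]
               for k in range(3))

# ||u|| = 5, ||v|| = 1, ||w|| = 2, so lambda = 5 * 1 * 2 = 10
u, v, w = np.array([3.0, 4.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])
X = np.einsum('i,j,k->ijk', u, v, w)
print(round(signal_strength(X, (1, 1, 1)), 6))  # 10.0
```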
Tensor SVD Strong SNR Case
Strong SNR Case: Methodology
• When λ/σ ≥ Cp^{3/4}, apply the higher-order orthogonal iteration (HOOI).
  (De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. & Appl. 2000b)
• (Step 1. Spectral initialization)
  Û_k^{(0)} = SVD_{rk}(Mk(Y)),  k = 1, 2, 3.
• (Step 2. Power iterations) Repeat: let t = t + 1 and calculate
  Û_1^{(t)} = SVD_{r1}(M1(Y ×2 (Û_2^{(t−1)})ᵀ ×3 (Û_3^{(t−1)})ᵀ)),
  Û_2^{(t)} = SVD_{r2}(M2(Y ×1 (Û_1^{(t)})ᵀ ×3 (Û_3^{(t−1)})ᵀ)),
  Û_3^{(t)} = SVD_{r3}(M3(Y ×1 (Û_1^{(t)})ᵀ ×2 (Û_2^{(t)})ᵀ)).
  Until t = t_max or convergence.
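The two steps above can be sketched in numpy (a minimal implementation under our helper names; the slides' stopping rule "t_max or convergence" is simplified here to a fixed iteration count):

```python
import numpy as np

def unfold(Y, mode):
    # Mode-k matricization
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def mode_mult(Y, M, mode):
    # Mode-k product Y x_k M
    return np.moveaxis(np.tensordot(M, np.moveaxis(Y, mode, 0), axes=(1, 0)), 0, mode)

def svd_r(M, r):
    # First r left singular vectors
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def hooi(Y, ranks, t_max=20):
    # Step 1: spectral initialization (HOSVD)
    U = [svd_r(unfold(Y, k), ranks[k]) for k in range(3)]
    # Step 2: power iterations -- project the other two modes, then truncated SVD
    for _ in range(t_max):
        U[0] = svd_r(unfold(mode_mult(mode_mult(Y, U[1].T, 1), U[2].T, 2), 0), ranks[0])
        U[1] = svd_r(unfold(mode_mult(mode_mult(Y, U[0].T, 0), U[2].T, 2), 1), ranks[1])
        U[2] = svd_r(unfold(mode_mult(mode_mult(Y, U[0].T, 0), U[1].T, 1), 2), ranks[2])
    return U

rng = np.random.default_rng(2)
Y = rng.standard_normal((8, 9, 10))
U1, U2, U3 = hooi(Y, (2, 2, 2))
print(U3.shape, np.allclose(U3.T @ U3, np.eye(2)))  # (10, 2) True
```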
Interpretation
1. Spectral initialization provides a “warm start.”2. Power iteration refines the initializations.
Given U(t−1)1 , U(t−1)
2 , U(t−1)3 , denoise Y via:
Y ×2 U(t−1)2 ×3 U(t−1)
3 .
I Mode-1 singular subspace is reserved;I Noise can be highly reduced.
Thus, we update
U(t)1 = SVDr1
(Mr1
(Y ×2 U(t−1)
2 ×3 U(t−1)3
)).
Higher-order orthogonal iteration (HOOI)
(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. & Appl. 2000b)
Strong SNR Case: Theoretical Analysis
Theorem (Upper Bound)
Suppose λ/σ > Cp^{3/4} and other regularity conditions hold. After at most O(log(p/λ) ∨ 1) iterations,
• (Recovery of U1, U2, U3)
  E min_{O∈O_{rk}} ‖Ûk − Uk O‖_F ≤ C √(pk rk) / (λ/σ),  k = 1, 2, 3;
• (Recovery of X)
  sup_{X∈F_{p,r}(λ)} E‖X̂ − X‖²_F ≤ C (p1r1 + p2r2 + p3r3) σ²,
  sup_{X∈F_{p,r}(λ)} E‖X̂ − X‖²_F / ‖X‖²_F ≤ C (p1 + p2 + p3) σ² / λ².
Key Tool
• Key tool: one-sided perturbation bound.• The matricizationsM1(Y) ∈ Rp1×(p2p3) are flat,
M1(Y) =M1(X) +M1(Z) ∈ Rp1×(p2p3)
M1(Y) is observed, rank(M1(X)) ≤ r1, M1(Z) iid∼ N(0, σ2).
• LetU1 = SVDr(M1(Y)), U1 = SVDr(M1(X)),
V1 = SVDr(M1(Y)>), V1 = SVDr(M1(X)>).
Naturally, the left singular subspace U1 ofM1(Y) is more“informative” than the right one V1.
One-sided Perturbation Analysis
• Traditional perturbation analysis were usually two-sided, e.g., Wedin’slemma (Wedin, 1972)
max{‖ sin Θ(U1,U1)‖F, ‖ sin Θ(V1,V1)‖F
}≤ ...
• One-sided perturbation bound (Cai and Z. 2018)
Y = X + Z, X,Y ,Z ∈ Rp1×(p2p3),
rank(X) = r, σr(X) = λ, Z iid∼ N(0, 1).
Theorem
E‖ sin Θ(U1,U1)‖2 �p1
λ2 +p1p2p3
λ4 ,
E‖ sin Θ(V1,V1)‖2 �p2p3
λ2 +p1p2p3
λ4 .
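The sin Θ distance used throughout can be computed from the singular values of U1ᵀÛ1, which are the cosines of the principal angles between the two subspaces; a numpy sketch (the function name is ours):

```python
import numpy as np

def sin_theta_fro(U, U_hat):
    """Frobenius sin-Theta distance between the column spaces of U and U_hat
    (both assumed to have orthonormal columns)."""
    s = np.linalg.svd(U.T @ U_hat, compute_uv=False)  # cosines of principal angles
    s = np.clip(s, -1.0, 1.0)
    return np.sqrt(np.sum(1.0 - s**2))                # ||sin Theta||_F

# Identical subspaces have distance 0; orthogonal r-dim ones have sqrt(r).
U = np.eye(4)[:, :2]
V = np.eye(4)[:, 2:]
print(sin_theta_fro(U, U))                 # 0.0
print(round(sin_theta_fro(U, V), 6))       # 1.414214
```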
Strong SNR Case: Lower Bound
Define the following class of low-rank tensors with signal strength λ.
Fp,r(λ) ={X ∈ Rp1×p2×p3 : rank(X) = (r1, r2, r3), σrk (Mk(X)) ≥ λ
}Theorem (Lower Bound)
(Recovery of U1,U2,U3)
infUk
supX∈Fp,r(λ)
E minO∈Or
∥∥∥Uk − UkO∥∥∥
F ≥ c√
pkrk
λ/σ, k = 1, 2, 3.
(Recovery of X)
infX
supX∈Fp,r(λ)
E∥∥∥X − X∥∥∥2
F ≥ c(p1r1 + p2r2 + p3r3)σ2,
infX
supX∈Fp,r(λ)
E‖X − X‖2F
‖X‖2F
≥c(p1 + p2 + p3)σ2
λ2 .
HOSVD vs. HOOI
  E min_{O∈O_{rk}} ‖Û_k^HOSVD − Uk O‖_F ≍ √(pk rk)/(λ/σ) + √(p1p2p3 rk)/(λ/σ)²;
  E min_{O∈O_{rk}} ‖Û_k^HOOI − Uk O‖_F ≍ √(pk rk)/(λ/σ).
• When λ/σ ≤ cp, HOOI significantly improves upon HOSVD.
• Analysis of rank-r tensor SVD is more difficult than that of rank-1 tensor SVD or rank-r matrix SVD.
  – Many concepts (e.g. singular values) are not well defined for tensors.
Tensor SVD Weak SNR Case
Weak SNR Case
Under the weak SNR case λ/σ < cp^{1/2}, U1, U2, U3, and X cannot be stably estimated in general.
Theorem
(Recovery of U1, U2, U3)
  inf_{Ûk} sup_{X∈F_{p,r}(λ)} E min_{O∈O_{rk}} rk^{−1/2} ‖Ûk − Uk O‖_F ≥ c,  k = 1, 2, 3.
(Recovery of X)
  inf_{X̂} sup_{X∈F_{p,r}(λ)} E‖X̂ − X‖²_F / ‖X‖²_F ≥ c.
Tensor SVD Moderate SNR Case
Moderate SNR Case: Statistical Optimality
• First, the MLE achieves statistical optimality.
Theorem (Performance of the MLE)
When λ/σ ≥ Cp^{1/2},
• (Recovery of U1, U2, U3)
  sup_{X∈F_{p,r}(λ)} E min_{O∈O_{rk}} ‖Û_k^mle − Uk O‖_F ≤ C √(pk rk) / (λ/σ),  k = 1, 2, 3;
• (Recovery of X)
  sup_{X∈F_{p,r}(λ)} E‖X̂^mle − X‖²_F ≤ C (p1r1 + p2r2 + p3r3) σ²,
  sup_{X∈F_{p,r}(λ)} E‖X̂^mle − X‖²_F / ‖X‖²_F ≤ C (p1 + p2 + p3) σ² / λ².
• However, the MLE is computationally intractable.
Tensor SVD Simulation Analysis
Simulation Analysis
• Consider random settings: λ = p^α, α ∈ [.4, .9], σ = 1.
[Figure: ℓ∞(Û) estimation error vs. α, comparing warm start and spectral start for p = 50, 80, 100.]
• Two phase transitions:
  – the computationally inefficient method performs well starting at λ/σ ≈ p^{1/2};
  – the computationally efficient HOOI performs well starting at λ/σ ≈ p^{3/4}.
Moderate SNR Case: Computational Optimality
Moreover, the following theorem shows the computational hardness for polynomial-time algorithms under moderate SNR.
Theorem
Assume the hypergraphic planted clique conjecture holds and λ/σ = O(p^{3(1−τ)/4}) for any τ > 0. Then for any polynomial-time estimators Û1, Û2, Û3, X̂,
(Recovery of U1, U2, U3)
  lim inf_{p→∞} sup_{X∈F_{p,r}(λ)} E‖sin Θ(Û_k^{(p)}, Uk)‖² ≥ c1,  k = 1, 2, 3;
(Recovery of X)
  lim inf_{p→∞} sup_{X∈F_{p,r}(λ)} E‖X̂^{(p)} − X‖²_F / ‖X‖²_F ≥ c1.
Remarks
• The analysis relies on the hypergraphic planted clique detection assumption.
• The result shows the hardness of tensor SVD in the moderate SNR case.
• More recently, Ben Arous, Mei, Montanari, Nica (2017) analyzed the landscape of the rank-1 spiked tensor model.
  – The MLE has exponentially many critical points.
Summary
Summary
Tensor SVD exhibits three phases:
• (Strong SNR) λ/σ ≥ Cp^{3/4}
  → there is an efficient algorithm to estimate U1, U2, U3, and X.
• (Weak SNR) λ/σ < cp^{1/2}
  → no algorithm can stably recover U1, U2, U3, or X.
• (Moderate SNR) p^{1/2} ≪ λ/σ ≪ p^{3/4}
  – the non-convex MLE stably recovers U1, U2, U3, and X;
  – possibly no polynomial-time algorithm performs stably.
Further Generalization to Order-d Tensors
• The results can be generalized to order-d tensors.
• Three phases:
  – (Strong SNR) λ/σ ≥ Cp^{d/4} → an efficient algorithm exists.
  – (Weak SNR) λ/σ < cp^{1/2} → no stable algorithm exists.
  – (Moderate SNR) p^{1/2} ≪ λ/σ ≪ p^{d/4}
    ◦ an inefficient algorithm exists;
    ◦ possibly no polynomial-time algorithm performs stably.
• Remark:
  – d = 2, i.e. matrix SVD: the gap between computation and statistics closes.
  – d ≥ 3: tensor SVD poses not only statistical but also computational challenges.
References
• Zhang, A. and Xia, D. (2018). Tensor SVD: Statistical and Computational Limits. IEEE Transactions on Information Theory, to appear.
• Cai, T. and Zhang, A. (2018). Rate-Optimal Perturbation Bounds for Singular Subspaces with Applications to High-Dimensional Statistics. Annals of Statistics, 46, 60-89.
• Zhang, A. and Han, R. (2017+). Optimal Denoising and Singular Value Decomposition for Sparse High-dimensional High-order Data. Submitted.