Non-Convex Relaxations for Rank Regularization
Carl Olsson
2019-05-01
Structure from Motion and Factorization
Affine camera model:

$$
X =
\underbrace{\begin{bmatrix} P_1 \\ P_2 \\ \vdots \end{bmatrix}}_{\text{camera matrices}}
\underbrace{\begin{bmatrix} X_1 & X_2 & \dots \end{bmatrix}}_{\text{3D points}}
$$
General Motion/Deformation
(Figure: frames of a moving, deforming object.)

Linear shape basis assumption: the point coordinates of every frame are a linear combination of a small number of basis shapes, so the stacked coordinate matrix (excerpt below) is low rank:

$$
\begin{bmatrix}
0.1581 & 0.4714 & -0.9782 & 2.0509 & 1.8610 & -2.4750 \\
-0.0366 & -0.0468 & 0.2511 & 0.0532 & 0.2687 & 0.5076 \\
0.5402 & -1.9804 & 0.4749 & -0.4343 & 2.0293 & 0.3569 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{bmatrix}
= \dots
$$
Rank and Factorization
$$
X = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}
= \underbrace{\begin{bmatrix} b_1 & b_2 & \dots & b_r \end{bmatrix}}_{B\;(m \times r)}
\underbrace{\begin{bmatrix}
c_{11} & c_{21} & \dots \\
c_{12} & c_{22} & \dots \\
\vdots & \vdots & \ddots \\
c_{1r} & c_{2r} & \dots
\end{bmatrix}}_{C^T\;(r \times n)},
\qquad \operatorname{rank}(X) = r.
$$

Factorization not unique: $X = BC^T = \underbrace{(BG)}_{\tilde B}\,\underbrace{(G^{-1}C^T)}_{\tilde C^T}$ for any invertible $G$.

DOF: $(m + n)r - r^2 \ll mn$.
Can reconstruct at most $mn - ((m + n)r - r^2)$ missing elements.
Small DOF is desirable! Incorporate as many constraints as possible. (A numeric illustration follows below.)
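As a quick numeric illustration of the factorization and its degrees of freedom, here is a minimal numpy sketch (the sizes m, n, r are arbitrary example values):

```python
import numpy as np

m, n, r = 30, 40, 3                  # example sizes
B = np.random.randn(m, r)            # basis, m x r
C = np.random.randn(n, r)            # coefficients, n x r
X = B @ C.T                          # rank-r m x n matrix

print(np.linalg.matrix_rank(X))      # 3
dof = (m + n) * r - r**2             # parameters of the factorization
print(dof, "<<", m * n)              # 201 << 1200
```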
Structure from Motion
Rigid reconstruction:
Non-rigid version:
Low Rank Approximation
Find the best rank-$r_0$ approximation of $X_0$:

$$\min_{\operatorname{rank}(X) = r_0} \|X - X_0\|_F^2$$

Eckart, Young (1936): closed-form solution via SVD:
If $X_0 = \sum_{i=1}^n \sigma_i(X_0)\, u_i v_i^T$ then $X = \sum_{i=1}^{r_0} \sigma_i(X_0)\, u_i v_i^T$.

Alternative formulation:

$$\min_X\; \mu \operatorname{rank}(X) + \|X - X_0\|_F^2$$

Eckart, Young:

$$
\sigma_i(X) =
\begin{cases}
\sigma_i(X_0) & \text{if } \sigma_i(X_0) \ge \sqrt{\mu} \\
0 & \text{otherwise}
\end{cases}
$$

Both closed forms only modify the singular values, as sketched below.
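A minimal numpy sketch of the two closed forms (function names are mine):

```python
import numpy as np

def best_rank_r0(X0, r0):
    # Eckart-Young: keep the r0 largest singular values.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U[:, :r0] * s[:r0]) @ Vt[:r0]

def rank_penalty_solution(X0, mu):
    # argmin_X  mu*rank(X) + ||X - X0||_F^2: hard threshold at sqrt(mu).
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.where(s >= np.sqrt(mu), s, 0.0)) @ Vt
```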
Low Rank Approximation
Generalizations:
$$\min_X\; g(\operatorname{rank}(X)) + \|AX - b\|^2 + C(X)$$
No closed form solution.
Non-convex.
Discontinuous.
Even local optimization can be difficult.
Goal: find "flexible, easy to optimize" relaxations.
The Nuclear Norm Approach
Recht, Fazel, Parrilo 2008. Replace $\operatorname{rank}(X)$ with the nuclear norm $\|X\|_* = \sum_{i=1}^n \sigma_i(X)$:

$$\min_X\; \mu\|X\|_* + \|AX - b\|^2$$

Convex.
Can be solved optimally.
Shrinking bias. Not good for SfM!
Closed-form solution to $\min_X \mu\|X\|_* + \|X - X_0\|_F^2$:
If $X_0 = \sum_{i=1}^n \sigma_i(X_0)\, u_i v_i^T$ then

$$X = \sum_{i=1}^n \underbrace{\max\Big(\sigma_i(X_0) - \tfrac{\mu}{2},\, 0\Big)}_{\text{soft thresholding}}\, u_i v_i^T.$$
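A matching numpy sketch of the soft-thresholding solution (function name mine). The shrinking bias is visible in the code: every surviving singular value is reduced by µ/2, unlike the hard thresholding above:

```python
import numpy as np

def nuclear_prox(X0, mu):
    # argmin_X  mu*||X||_* + ||X - X0||_F^2: soft threshold by mu/2.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.maximum(s - mu / 2.0, 0.0)) @ Vt
```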
Just a few prior works
Low rank recovery via the nuclear norm:
Fazel, Hindi, Boyd. A rank minimization heuristic with application to minimum order system approximation. 2001.
Candes, Recht. Exact matrix completion via convex optimization. 2009.
Candes, Li, Ma, Wright. Robust principal component analysis? 2011.

Non-convex approaches:
Mohan, Fazel. Iterative reweighted least squares for matrix rank minimization. 2010.
Gong, Zhang, Lu, Huang, Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. 2013.

Sparse signal recovery using the ℓ1 norm:
Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. 2006.
Candes, Romberg, Tao. Stable signal recovery from incomplete and inaccurate measurements. 2006.
Candes, Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? 2006.

Non-convex approaches:
Candes, Wakin, Boyd. Enhancing sparsity by reweighted ℓ1 minimization. 2008.
Our Approach
Replace $\mu \operatorname{rank}(X)$ with

$$R_\mu(\sigma(X)) = \sum_i \mu - \max(\sqrt{\mu} - \sigma_i(X),\, 0)^2,$$

$$\min_X\; R_\mu(\sigma(X)) + \|AX - b\|^2$$

$R_\mu$ continuous, but non-convex.
The global minimizer does not change if $\|A\| < 1$.
$R_\mu(\sigma(X)) + \|AX - b\|^2$ is a lower bound on $\mu \operatorname{rank}(X) + \|AX - b\|^2$.
$f_\mu^{**}(X) = R_\mu(\sigma(X)) + \|X - X_0\|_F^2$ is the convex envelope of $f_\mu(X) = \mu \operatorname{rank}(X) + \|X - X_0\|_F^2$.

Larsson, Olsson. Convex Low Rank Regularization. IJCV 2016.
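A minimal sketch of evaluating the relaxed penalty, with a sanity check of the lower-bound property (function name mine):

```python
import numpy as np

def R_mu(X, mu):
    # R_mu(sigma(X)) = sum_i  mu - max(sqrt(mu) - sigma_i(X), 0)^2
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(mu - np.maximum(np.sqrt(mu) - s, 0.0) ** 2)

# Each term is at most mu (and 0 for a zero singular value),
# so R_mu never exceeds mu * rank.
X, mu = np.random.randn(8, 5), 0.5
assert R_mu(X, mu) <= mu * np.linalg.matrix_rank(X) + 1e-9
```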
Shrinking Bias
1D versions:

$$\operatorname{rank}(X) = \sum_i |\sigma_i(X)|^0, \qquad \|X\|_* = \sum_i \sigma_i(X), \qquad R_\mu(\sigma(X)) = \sum_i \mu - \big[\sqrt{\mu} - \sigma_i(X)\big]_+^2$$

Singular value thresholding (figure: the thresholding curves of $\mu \operatorname{rank}(X) + \|X - X_0\|_F^2$, of $2\sqrt{\mu}\|X\|_* + \|X - X_0\|_F^2$ and of $R_\mu(\sigma(X)) + \|X - X_0\|_F^2$).
More General Framework
Computation of the convex envelopes

$$f_g^{**}(X) = R_g(\sigma(X)) + \|X - X_0\|_F^2$$

of

$$f_g(X) = g(\operatorname{rank}(X)) + \|X - X_0\|_F^2,$$

where $g(k) = \sum_{i=1}^k g_i$ and $0 \le g_1 \le g_2 \le \dots$ And proximal operators.

Another special case:

$$f_{r_0}(X) = \mathcal{I}(\operatorname{rank}(X) \le r_0) + \|X - X_0\|_F^2$$

Larsson, Olsson. Convex Low Rank Regularization. IJCV 2016.
Results
General Case

If

$$f_g(X) = g(\operatorname{rank}(X)) + \|X - X_0\|_F^2$$

then

$$f_g^{**}(X) = \max_{\sigma(Z)} \left( \sum_{i=1}^n \min\big(g_i,\, \sigma_i^2(Z)\big) - \|\sigma(Z) - \sigma(X)\|^2 \right) + \|X - X_0\|_F^2.$$

The maximization over $Z$ reduces to a 1D search (piecewise quadratic, concave objective function). Can be done in $O(n)$ time ($n$ = number of singular values).
Convexity of f∗∗µ

(Figure: plots of $\mu - [\sqrt{\mu} - \sigma]_+^2$, of $\mu - [\sqrt{\mu} - \sigma]_+^2 + \sigma^2$, and of $\mu - [\sqrt{\mu} - \sigma]_+^2 + (\sigma - \sigma_0)^2$ for $\sigma_0 = 0, 1, 2$, with axis marks at $\sqrt{\mu}$ and $2\mu$.)

Adding the quadratic makes each 1D term convex. If $\sigma_0 = 2$ the function will not try to make $\sigma = 0$! (A numeric check follows below.)
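A small numeric check of the no-shrinkage behaviour via grid minimization (µ = 1; a sketch):

```python
import numpy as np

mu = 1.0
sigma = np.linspace(0.0, 4.0, 100001)
relaxed = mu - np.maximum(np.sqrt(mu) - sigma, 0.0) ** 2

for s0 in [0.0, 1.0, 2.0]:
    obj = relaxed + (sigma - s0) ** 2
    print(s0, sigma[np.argmin(obj)])
# s0 = 0 -> minimizer 0 (suppressed); s0 = 1 = sqrt(mu) is the tie point
# (every sigma in [0, 1] is optimal); s0 = 2 -> minimizer 2 (no shrinkage).
```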
Interpretations: f∗∗r0

$$f_{r_0}(X) = \mathcal{I}(\operatorname{rank}(X) \le r_0) + \|X - X_0\|_F^2$$

(Figure: level set surfaces $\{X \mid R_{r_0}(X) = \alpha\}$ for $X = \operatorname{diag}(x_1, x_2, x_3)$ with $r_0 = 1$ (left) and $r_0 = 2$ (middle). Note that when $r_0 = 1$ the regularizer promotes solutions where only one of the $x_k$ is non-zero. For $r_0 = 2$ the regularizer instead favors solutions with two non-zero $x_k$. For comparison the level set of the nuclear norm is also included.)
Hankel Matrix Estimation
(Figure: a signal, its Hankel matrix, and the matrix with noise.)

$$\min_{H \in \mathcal{H}}\; \mathcal{I}(\operatorname{rank}(H) \le r_0) + \|H - X_0\|_F^2$$

(Figure: error $\|H - H(f)\|$ versus noise level $\sigma \in [0.1, 1]$ for SVD, Mean, $f_\mu^{**}$, $f_{r_0}^{**}$ and the nuclear norm, plus a second panel over $[0.8, 1]$ on the y-axis for Mean, $f_\mu^{**}$, $f_{r_0}^{**}$ and the nuclear norm.)
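For context, a minimal sketch of the Hankel setup: building the Hankel matrix of a signal and enforcing the two constraints by alternating projections (shown only as a naive baseline, not the talk's method; the experiments above use the f∗∗ relaxations). scipy.linalg.hankel is a standard helper:

```python
import numpy as np
from scipy.linalg import hankel

def hankel_of(signal, rows):
    # rows x (len(signal) - rows + 1) Hankel matrix of the signal
    return hankel(signal[:rows], signal[rows - 1:])

def truncate_rank(H, r0):
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U[:, :r0] * s[:r0]) @ Vt[:r0]

def project_hankel(H):
    # Average along anti-diagonals to restore Hankel structure.
    rows, cols = H.shape
    sig = np.array([H[::-1].diagonal(k).mean() for k in range(-rows + 1, cols)])
    return hankel(sig[:rows], sig[rows - 1:])

t = np.arange(64)
noisy = np.cos(0.3 * t) + 0.1 * np.random.randn(64)
H = hankel_of(noisy, 20)
for _ in range(50):                      # naive alternating projections
    H = project_hankel(truncate_rank(H, 2))
```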
Smooth Linear Shape Basis
$$f_N(S) = \|S - S_0\|_F^2 + \mu\|P(S)\|_* + \tau\,\mathrm{TV}(S)$$

$$f_R(S) = \|S - S_0\|_F^2 + R_\mu(P(S)) + \tau\,\mathrm{TV}(S)$$

(Figure: reconstruction error $\|S - S_{GT}\|_F$ versus noise level $\sigma \in [0.01, 0.1]$ for $R_\mu$ and the nuclear norm.)
RIP problems
Linear observations: $b = \mathcal{A}X_0 + \epsilon$, $\mathcal{A}: \mathbb{R}^{m \times n} \to \mathbb{R}^p$, $X_0$ low rank, $\epsilon$ noise.
Recover $X_0$ using rank penalties/constraints:

$$R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|^2 \quad \text{or} \quad R_{r_0}(\sigma(X)) + \|\mathcal{A}X - b\|^2$$

Restricted Isometry Property (RIP):

$$(1 - \delta_q)\|X\|_F^2 \le \|\mathcal{A}X\|^2 \le (1 + \delta_q)\|X\|_F^2, \qquad \operatorname{rank}(X) \le q$$

Olsson, Carlsson, Andersson, Larsson. Non-Convex Rank/Sparsity Regularization and Local Minima. ICCV 2017.
Olsson, Carlsson, Bylow. A Non-Convex Relaxation for Fixed-Rank Approximation. RSL-CV 2017.
Near Convex Rank/Sparsity Estimation
Intuition:
If RIP holds then $\|\mathcal{A}X\|^2$ behaves like $\|X\|_F^2$.
$R_\mu(\sigma(X)) + \|X - X_0\|_F^2 \approx R_\mu(\sigma(X)) + \|X\|_F^2 - 2\langle X, X_0 \rangle$ is convex.
What about $R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|^2 \approx R_\mu(\sigma(X)) + \|\mathcal{A}X\|^2 - 2\langle X, \mathcal{A}^*b \rangle$? Near convex?

(Figure: the 1D example $R_\mu(x) + (\tfrac{1}{\sqrt{2}}x - b)^2$ with $\mu = 1$, plotted for $b = 0,\ \tfrac{1}{\sqrt{2}},\ 1,\ \sqrt{2},\ 1.5$.)
Main Result (Rank Penalty)
Def. $F_\mu(X) := R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|^2$ and $Z := (I - \mathcal{A}^*\mathcal{A})X_s + \mathcal{A}^*b$.

$X_s$ stationary point of $F_\mu(X)$ $\;\Leftrightarrow\;$

$$X_s \in \arg\min_X\; R_\mu(\sigma(X)) + \|X - Z\|_F^2.$$

$\|X - Z\|_F^2$ is a local approximation of $\|\mathcal{A}X - b\|^2$ around $X_s$.
$X_s$ is obtained by thresholding the SVD of $Z$ (a sketch follows below).

Theorem
If $X_s$ is a stationary point of $F_\mu$, and the singular values of $Z$ fulfill $\sigma_i(Z) \notin [(1 - \delta_r)\sqrt{\mu},\ \tfrac{\sqrt{\mu}}{1 - \delta_r}]$, then for any other stationary point $X'_s$ we have $\operatorname{rank}(X'_s - X_s) > r$.
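A sketch of the stationarity condition (names mine; A and Aadj are assumed callables for the operator and its adjoint). With the unit quadratic coupling, the prox of Rµ reduces to hard thresholding of the singular values at √µ, with a tie exactly at √µ:

```python
import numpy as np

def R_mu_prox(Z, mu):
    # argmin_X  R_mu(sigma(X)) + ||X - Z||_F^2:
    # keep sigma_i(Z) if sigma_i(Z) > sqrt(mu), otherwise set it to 0
    # (at sigma_i(Z) = sqrt(mu) both choices are optimal).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.where(s > np.sqrt(mu), s, 0.0)) @ Vt

def is_stationary(Xs, A, Aadj, b, mu, tol=1e-8):
    # Z = (I - A*A)Xs + A*b; Xs must be a fixed point of the prox.
    Z = Xs - Aadj(A(Xs)) + Aadj(b)
    return np.linalg.norm(R_mu_prox(Z, mu) - Xs) < tol
```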
Main Result (Rank Constraint)
Def. $F_{r_0}(X) := R_{r_0}(\sigma(X)) + \|\mathcal{A}X - b\|^2$ and $Z := (I - \mathcal{A}^*\mathcal{A})X_s + \mathcal{A}^*b$.

$X_s$ stationary point of $F_{r_0}(X)$ $\;\Leftrightarrow\;$

$$X_s \in \arg\min_X\; R_{r_0}(\sigma(X)) + \|X - Z\|_F^2.$$

$\|X - Z\|_F^2$ is a local approximation of $\|\mathcal{A}X - b\|^2$ around $X_s$.
$X_s$ is obtained by thresholding the SVD of $Z$.

Theorem
If $X_s$ is a stationary point of $F_{r_0}$ with $\operatorname{rank}(X_s) = r_0$, and the singular values of $Z$ fulfill $\sigma_{r_0+1}(Z) < (1 - 2\delta_{2r_0})\sigma_{r_0}(Z)$, then any other stationary point $X'_s$ has $\operatorname{rank}(X'_s) > r_0$.
Experiments (Rank Penalty)
Low rank recovery for varying noise level (x-axis) and regularization strength (y-axis).
$b = \mathcal{A}X_0 + \epsilon$, $\epsilon_i \sim N(0, \sigma)$, where $\operatorname{rank}(X_0) = 10$.

(Figure grid: $2\sqrt{\mu}\|X\|_* + \|\mathcal{A}X - b\|_F^2$ vs. $R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|_F^2$ vs. verified optima; top row for a $202 \times 400$ operator ($\delta = 0.2$), bottom row for a $300 \times 400$ one ($\delta$ unknown).)
Experiments (Fixed Rank)
$\mu\|X\|_* + \|\mathcal{A}X - b\|_F^2$ vs. $R_r(\sigma(X)) + \|\mathcal{A}X - b\|_F^2$

(Figure, panels (a)-(c), curves for $R_r$ and $\|\cdot\|_*$:
(a) noise level (x-axis) vs. data fit $\|\mathcal{A}X - b\|$ (y-axis);
(b) fraction of instances verified to be globally optimal;
(c) same as (a).)

$X \in \mathbb{R}^{20 \times 20} \Rightarrow \operatorname{vec}(X) \in \mathbb{R}^{400}$.
(a) and (b) use a $400 \times 400$ operator $\mathcal{A}$ with $\delta = 0.2$ while (c) uses a $300 \times 400$ one (unknown $\delta$).
NRSfM
Reconstruct moving and deforming object from image projections.
CMU Motion capture sequences.
NRSfM (Dai et al. 2012)
$X_i, Y_i, Z_i$: x-, y- and z-coordinates in image $i$. $R_i$: $2 \times 3$ matrix encoding the orientation of camera $i$ (affine model).

$$
R = \begin{bmatrix}
R_1 & 0 & \dots & 0 \\
0 & R_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & R_F
\end{bmatrix},
\quad
X = \begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \\ \vdots \\ X_F \\ Y_F \\ Z_F \end{bmatrix}
\quad \text{and} \quad
X^\# = \begin{bmatrix}
X_1 & Y_1 & Z_1 \\
\vdots & \vdots & \vdots \\
X_F & Y_F & Z_F
\end{bmatrix}.
$$

Projections: $M \approx RX$.
Linear shape basis assumption: $\operatorname{rank}(X^\#) \le r$.

$$\min_{X^\#}\; \mathcal{I}(\operatorname{rank}(X^\#) \le r_0) + \|\mathcal{A}X^\# - M\|_F^2, \qquad \mathcal{A}: X^\# \mapsto RX.$$

(A bookkeeping sketch follows below.)
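A minimal numpy sketch of the bookkeeping: the block-diagonal camera matrix and the reshuffling between X and X# (sizes are example values):

```python
import numpy as np
from scipy.linalg import block_diag

F, npts = 4, 10                          # example: 4 frames, 10 points
R = block_diag(*[np.random.randn(2, 3) for _ in range(F)])   # 2F x 3F

X = np.random.randn(3 * F, npts)         # stacked [Xi; Yi; Zi] blocks
M = R @ X                                # 2F x npts image projections

# Reshuffle X (3F x n) into X# (F x 3n): block-row i becomes one row.
X_sharp = X.reshape(F, 3 * npts)         # row i = [Xi | Yi | Zi]
assert np.allclose(X_sharp.reshape(3 * F, npts), X)
```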
MOCAP Experiments
CMU Motion capture sequences:
Drink Pick-up
Stretch Yoga
MOCAP Experiments

$R_{r_0}(X^\#) + \|RX - M\|_F^2$ (blue) vs. $\mu\|X^\#\|_* + \|RX - M\|_F^2$ (orange):

(Figure: data fit $\|RX - M\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
(Figure: distance to ground truth $\|X - X_{gt}\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
MOCAP Experiments
RIP does not hold for $\mathcal{A}(X^\#) = RX$!
If $R_i N_i = 0$ with $N_i \in \mathbb{R}^{3 \times 1}$, then $R_i N_i C_i = 0$ for all $C_i \in \mathbb{R}^{1 \times m}$. Therefore $\mathcal{A}(N(C)) = 0$ for any matrix of the form

$$
N(C) = \begin{bmatrix}
n_{11}C_1 & n_{21}C_1 & n_{31}C_1 \\
n_{12}C_2 & n_{22}C_2 & n_{32}C_2 \\
\vdots & \vdots & \vdots \\
n_{1F}C_F & n_{2F}C_F & n_{3F}C_F
\end{bmatrix},
$$

where $n_{ij}$ are the elements of $N_i$.

$N(C)$ does not affect the projections.
If the row space of an optimal solution contains $N(C)$ (for some $C$) the solution is not unique.
MOCAP Experiments
$R_{r_0}(X^\#) + \|RX - M\|_F^2 + \|DX^\#\|_F^2$ (blue) vs. $\mu\|X^\#\|_* + \|RX - M\|_F^2 + \|DX^\#\|_F^2$ (orange):

(Figure: data fit $\|RX - M\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
(Figure: distance to ground truth $\|X - X_{gt}\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
Optimization via ADMM/Splitting
$R_\mu(X)$ is not differentiable in $X$.
$R_\mu(X) + \|\mathcal{A}X - b\|^2$ is (near) convex in $X$.
The proximal operator $\arg\min_X R_\mu(X) + \rho\|X - X_0\|_F^2$ is computable.
Can apply splitting schemes:

$$\mathcal{L}(X, Y, \Lambda) = R_\mu(X) + \rho\|X - Y + \Lambda\|_F^2 + \|\mathcal{A}Y - b\|^2 - \rho\|\Lambda\|_F^2.$$

Alternate (see the sketch below):

$$X_{t+1} = \arg\min_X\; R_\mu(X) + \rho\|X - Y_t + \Lambda_t\|_F^2,$$
$$Y_{t+1} = \arg\min_Y\; \rho\|X_{t+1} - Y + \Lambda_t\|_F^2 + \|\mathcal{A}Y - b\|^2,$$
$$\Lambda_{t+1} = \Lambda_t + X_{t+1} - Y_{t+1}.$$

First order method.
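A self-contained sketch of the splitting scheme for an explicit measurement matrix A acting on X.ravel() (row-major); names are mine, and ρ = 1 is assumed so that the X-update is the exact hard-thresholding prox:

```python
import numpy as np

def R_mu_prox(Z, mu):
    # argmin_X R_mu(sigma(X)) + ||X - Z||_F^2: hard threshold at sqrt(mu).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.where(s > np.sqrt(mu), s, 0.0)) @ Vt

def admm_rank(A, b, X_init, mu, rho=1.0, iters=200):
    m, n = X_init.shape
    X, Y, Lam = X_init, X_init.copy(), np.zeros_like(X_init)
    AtA = A.T @ A + rho * np.eye(m * n)       # Y-update normal equations
    Atb = A.T @ b
    for _ in range(iters):
        X = R_mu_prox(Y - Lam, mu)                            # X-update
        Y = np.linalg.solve(AtA, Atb + rho * (X + Lam).ravel()).reshape(m, n)
        Lam = Lam + X - Y                                     # dual update
    return X
```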
Quadratic Approximation
Reformulate into a differentiable objective. Bilinear parameterization:

$$\min_{B,C}\; R_\mu(B, C) + \|\mathcal{A}(BC^T) - b\|^2,$$

$R_\mu(B, C)$ is twice differentiable almost everywhere.
Optimize with 2nd order methods.
Introduces non-optimal stationary points.
Characterization of local minima under RIP and optimization with VarPro/Wiberg.

Valtonen Örnhag, Olsson, Heyden. Bilinear Parameterization for Differentiable Rank Regularization. arXiv 2018.
Bilinear Parameterization
Assumption: $R(X) = \sum_{i=1}^r f(\sigma_i(X))$, with $f$ concave and non-decreasing on $[0, \infty)$.

Theorem
$R(X) = \min_{BC^T = X} R(B, C)$, where

$$R(B, C) = \sum_{i=1}^k f\left(\frac{\|B_i\|^2 + \|C_i\|^2}{2}\right),$$

and $B_i, C_i$ are the columns of $B, C$. $R(B, C)$ is differentiable if $f(\sigma) = h(|\sigma|)$ where $h$ is differentiable.

(Figure: example penalties $f$: SCAD, Log, MCP, ETP, Geman.)
Bilinear Parameterization
If $X = U\Sigma V^T$, $B = U\sqrt{\Sigma}$ and $C = V\sqrt{\Sigma}$, then $B_i = \sqrt{\sigma_i}\,U_i$ and $C_i = \sqrt{\sigma_i}\,V_i$. Therefore

$$\frac{\|B_i\|^2 + \|C_i\|^2}{2} = \frac{\sigma_i\|U_i\|^2 + \sigma_i\|V_i\|^2}{2} = \sigma_i.$$

(A numeric check follows below.)
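A quick numeric check of this balanced factorization (a sketch):

```python
import numpy as np

X = np.random.randn(6, 4)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
B = U * np.sqrt(s)                  # B = U sqrt(Sigma)
C = Vt.T * np.sqrt(s)               # C = V sqrt(Sigma)

assert np.allclose(B @ C.T, X)
# Averaged column energies recover the singular values:
col_avg = 0.5 * ((B ** 2).sum(axis=0) + (C ** 2).sum(axis=0))
assert np.allclose(col_avg, s)
```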
Uniqueness of Low-Rank Minima

Over-parameterization with our relaxation:

Theorem
Let $(\bar B, \bar C) \in \mathbb{R}^{m \times 2k} \times \mathbb{R}^{n \times 2k}$ be a local minimizer of

$$R_\mu(B, C) + \|\mathcal{A}(BC^T) - b\|^2,$$

with $\operatorname{rank}(\bar B \bar C^T) < k$ and $R_\mu(\bar B, \bar C) = R_\mu(\bar B \bar C^T)$. If the singular values of $Z = (I - \mathcal{A}^*\mathcal{A})\bar B \bar C^T + \mathcal{A}^*b$ fulfill $\sigma_i(Z) \notin [(1 - \delta_{2k})\sqrt{\mu},\ \tfrac{\sqrt{\mu}}{1 - \delta_{2k}}]$ then

$$\bar B \bar C^T \in \arg\min_{\operatorname{rank}(X) \le k}\; R_\mu(X) + \|\mathcal{A}X - b\|^2.$$
VarPro/Wiberg
$$\min_{B,C}\; \|\mathcal{A}(BC^T) - b\|^2$$

Least squares problem in $C$ for fixed $B$ $\Rightarrow$ compute (closed form)

$$C^*(B) = \arg\min_C\; \|\mathcal{A}(BC^T) - b\|^2.$$

Linearize $\mathcal{A}(B\,C^*(B)^T) - b \approx L\,\delta B + \ell$ at $B_k$. Solve

$$\delta B_k = \arg\min\; \|L\,\delta B + \ell\|^2 + \lambda\|\delta B\|^2, \qquad B_{k+1} = B_k + \delta B_k.$$

Quadratic approximation. Rapid convergence. Similar to GN/LM. (A sketch of the inner step follows below.)
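A minimal sketch of the closed-form inner step, assuming an explicit measurement matrix Amat acting on the column-major vectorization of X (setup and names are mine); the outer LM step on B then follows by linearizing the residual as on the slide:

```python
import numpy as np

def C_star(Amat, b, B, n):
    # Closed-form C*(B): least squares over c = vec(C^T), using
    # vec(B C^T) = (I_n kron B) vec(C^T)   (column-major vec).
    m, r = B.shape
    M = Amat @ np.kron(np.eye(n), B)            # p x (n*r)
    c, *_ = np.linalg.lstsq(M, b, rcond=None)
    return c.reshape(r, n, order="F").T         # C is n x r
```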
Reweighted VarPro

$$\min_{B,C}\; \sum_i f\left(\frac{\|B_i\|^2 + \|C_i\|^2}{2}\right) + \|\mathcal{A}(BC^T) - b\|^2$$

Taylor: $f(x) \approx f(x_k) + f'(x_k)(x - x_k) = f'(x_k)x + \text{const}$, giving

$$\min_{B,C}\; \sum_i w_i^k \left(\frac{\|B_i\|^2 + \|C_i\|^2}{2}\right) + \|\mathcal{A}(BC^T) - b\|^2, \qquad w_i^k = f'\left(\frac{\|B_i^k\|^2 + \|C_i^k\|^2}{2}\right).$$

One iteration of regular VarPro, then recompute the weights.
Refactorize into $B^{k+1}, C^{k+1}$ using the SVD (see the sketch below).
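A sketch of the reweighting bookkeeping, using the Log penalty in the common form f(x) = log(1 + x) as an example (names mine):

```python
import numpy as np

def refactorize(B, C):
    # Balanced refactorization of B C^T via SVD: B = U sqrt(S), C = V sqrt(S).
    U, s, Vt = np.linalg.svd(B @ C.T, full_matrices=False)
    r = B.shape[1]
    return U[:, :r] * np.sqrt(s[:r]), Vt[:r].T * np.sqrt(s[:r])

def weights(B, C, f_prime):
    # w_i = f'((||B_i||^2 + ||C_i||^2) / 2), one weight per column.
    return f_prime(0.5 * ((B ** 2).sum(axis=0) + (C ** 2).sum(axis=0)))

B, C = refactorize(np.random.randn(10, 3), np.random.randn(8, 3))
w = weights(B, C, lambda x: 1.0 / (1.0 + x))    # Log penalty example
```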
SfM-results
VarPro vs. ADMM
MOCAP Results
Drink Pickup Stretch Yoga
The End
Convex Envelopes and Conjugate Functions

(Figure: a non-convex function $f$ over $[-1.4, 1.4]$, its affine minorants $yx - f^*(y)$, and the resulting convex envelope.)

$$f^*(y) = \max_x\; yx - f(x) \;\Rightarrow\; f^*(y) \ge yx - f(x) \;\Rightarrow\; f(x) \ge yx - f^*(y)$$

$$f^{**}(x) = \max_y\; xy - f^*(y)$$
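To connect this machinery to the relaxation used in the talk, here is a worked scalar instance (my own derivation of the 1D case with $X_0 = 0$; the matrix statement appears on the earlier "Our Approach" slide):

```latex
\[
f(x) = \mu\,[x \neq 0] + x^2
\quad\Rightarrow\quad
f^*(y) = \max_x\; xy - f(x) = \max\!\Big(0,\ \tfrac{y^2}{4} - \mu\Big),
\]
\[
f^{**}(x) = \max_y\; xy - f^*(y)
= \begin{cases}
2\sqrt{\mu}\,|x|, & |x| \le \sqrt{\mu},\\
x^2 + \mu,        & |x| \ge \sqrt{\mu},
\end{cases}
= \underbrace{\mu - \max(\sqrt{\mu} - |x|,\,0)^2}_{r_\mu(x)} + x^2,
\]
so the biconjugate is exactly the scalar version of $R_\mu$ plus the quadratic.
```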
Computing the Conjugate
$$f_g^*(X) = \max_Y\; \langle X, Y \rangle - f_g(Y) = \max_k\; -\sum_{i=1}^k g_i + \max_{\operatorname{rank}(Y) = k}\; \langle X, Y \rangle - \|Y - X_0\|_F^2$$

The inner maximization can be solved with the SVD (Eckart, Young) and completion of squares.
The Missing Data Problem
$$\|W \odot (X - M)\|_F^2$$

(Figure: red = visible element, blue = missing measurement.)
No known closed-form solution.
A Block Decomposition Approach
Solve the problem on sub-blocks with no missing data:

$$f(X) = \sum_{i=1}^K \mu_i \operatorname{rank}(P_i(X)) + \|P_i(X) - P_i(M)\|_F^2$$

Convex relaxation:

$$f(X) = \sum_{i=1}^K R_{\mu_i}(\sigma(P_i(X))) + \|P_i(X) - P_i(M)\|_F^2$$
A Block Decomposition Approach
$$X = \begin{bmatrix}
X_{11} & X_{12} & ? \\
X_{21} & X_{22} & X_{23} \\
? & X_{32} & X_{33}
\end{bmatrix}$$

Lemma
Let $X_1$ and $X_2$ be two given matrices with overlap matrix $X_{22}$, and let $r_1 = \operatorname{rank}(X_1)$ and $r_2 = \operatorname{rank}(X_2)$. Suppose that $\operatorname{rank}(X_{22}) = \min(r_1, r_2)$; then there exists a completion $X$ with $\operatorname{rank}(X) = \max(r_1, r_2)$. Additionally, if $\operatorname{rank}(X_{22}) = r_1 = r_2$ then $X$ is unique.
Results
(Figure panels: Observed, Our Solution, Nuclear Norm, Ground Truth.)