Non-Convex Relaxations for Rank Regularization
Carl Olsson
2019-05-01
Structure from Motion and Factorization
Affine camera model:

$$
X =
\underbrace{\begin{bmatrix} P_1 \\ P_2 \\ \vdots \end{bmatrix}}_{\text{camera matrices}}
\underbrace{\begin{bmatrix} X_1 & X_2 & \dots \end{bmatrix}}_{\text{3D points}}
$$
General Motion/Deformation
(Figure: frames of a moving, deforming object.)

Linear shape basis assumption: the point coordinates of every frame are a linear combination of a small number of basis shapes, so the stacked coordinate matrix (excerpt below) is low rank:

$$
\begin{bmatrix}
0.1581 & 0.4714 & -0.9782 & 2.0509 & 1.8610 & -2.4750 \\
-0.0366 & -0.0468 & 0.2511 & 0.0532 & 0.2687 & 0.5076 \\
0.5402 & -1.9804 & 0.4749 & -0.4343 & 2.0293 & 0.3569 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{bmatrix}
= \dots
$$
Rank and Factorization
$$
X = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}
= \underbrace{\begin{bmatrix} b_1 & b_2 & \dots & b_r \end{bmatrix}}_{B\;(m \times r)}
\underbrace{\begin{bmatrix}
c_{11} & c_{21} & \dots \\
c_{12} & c_{22} & \dots \\
\vdots & \vdots & \ddots \\
c_{1r} & c_{2r} & \dots
\end{bmatrix}}_{C^T\;(r \times n)},
\qquad \operatorname{rank}(X) = r.
$$

Factorization not unique: $X = BC^T = \underbrace{(BG)}_{\tilde B}\,\underbrace{(G^{-1}C^T)}_{\tilde C^T}$ for any invertible $G$.

DOF: $(m + n)r - r^2 \ll mn$.
Can reconstruct at most $mn - ((m + n)r - r^2)$ missing elements.
Small DOF is desirable! Incorporate as many constraints as possible. (A numeric illustration follows below.)
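As a quick numeric illustration of the factorization and its degrees of freedom, here is a minimal numpy sketch (the sizes m, n, r are arbitrary example values):

```python
import numpy as np

m, n, r = 30, 40, 3                  # example sizes
B = np.random.randn(m, r)            # basis, m x r
C = np.random.randn(n, r)            # coefficients, n x r
X = B @ C.T                          # rank-r m x n matrix

print(np.linalg.matrix_rank(X))      # 3
dof = (m + n) * r - r**2             # parameters of the factorization
print(dof, "<<", m * n)              # 201 << 1200
```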
Structure from Motion
Rigid reconstruction:
Non-rigid version:
Low Rank Approximation
Find the best rank-$r_0$ approximation of $X_0$:

$$\min_{\operatorname{rank}(X) = r_0} \|X - X_0\|_F^2$$

Eckart, Young (1936): closed-form solution via SVD:
If $X_0 = \sum_{i=1}^n \sigma_i(X_0)\, u_i v_i^T$ then $X = \sum_{i=1}^{r_0} \sigma_i(X_0)\, u_i v_i^T$.

Alternative formulation:

$$\min_X\; \mu \operatorname{rank}(X) + \|X - X_0\|_F^2$$

Eckart, Young:

$$
\sigma_i(X) =
\begin{cases}
\sigma_i(X_0) & \text{if } \sigma_i(X_0) \ge \sqrt{\mu} \\
0 & \text{otherwise}
\end{cases}
$$

Both closed forms only modify the singular values, as sketched below.
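A minimal numpy sketch of the two closed forms (function names are mine):

```python
import numpy as np

def best_rank_r0(X0, r0):
    # Eckart-Young: keep the r0 largest singular values.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U[:, :r0] * s[:r0]) @ Vt[:r0]

def rank_penalty_solution(X0, mu):
    # argmin_X  mu*rank(X) + ||X - X0||_F^2: hard threshold at sqrt(mu).
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.where(s >= np.sqrt(mu), s, 0.0)) @ Vt
```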
Low Rank Approximation
Generalizations:
$$\min_X\; g(\operatorname{rank}(X)) + \|AX - b\|^2 + C(X)$$
No closed form solution.
Non-convex.
Discontinuous.
Even local optimization can be difficult.
Goal: find "flexible, easy to optimize" relaxations.
The Nuclear Norm Approach
Recht, Fazel, Parrilo 2008. Replace $\operatorname{rank}(X)$ with the nuclear norm $\|X\|_* = \sum_{i=1}^n \sigma_i(X)$:

$$\min_X\; \mu\|X\|_* + \|AX - b\|^2$$

Convex.
Can be solved optimally.
Shrinking bias. Not good for SfM!
Closed-form solution to $\min_X \mu\|X\|_* + \|X - X_0\|_F^2$:
If $X_0 = \sum_{i=1}^n \sigma_i(X_0)\, u_i v_i^T$ then

$$X = \sum_{i=1}^n \underbrace{\max\Big(\sigma_i(X_0) - \tfrac{\mu}{2},\, 0\Big)}_{\text{soft thresholding}}\, u_i v_i^T.$$
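A matching numpy sketch of the soft-thresholding solution (function name mine). The shrinking bias is visible in the code: every surviving singular value is reduced by µ/2, unlike the hard thresholding above:

```python
import numpy as np

def nuclear_prox(X0, mu):
    # argmin_X  mu*||X||_* + ||X - X0||_F^2: soft threshold by mu/2.
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    return (U * np.maximum(s - mu / 2.0, 0.0)) @ Vt
```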
Just a few prior works
Low rank recovery via the nuclear norm:
Fazel, Hindi, Boyd. A rank minimization heuristic with application to minimum order system approximation. 2001.
Candes, Recht. Exact matrix completion via convex optimization. 2009.
Candes, Li, Ma, Wright. Robust principal component analysis? 2011.

Non-convex approaches:
Mohan, Fazel. Iterative reweighted least squares for matrix rank minimization. 2010.
Gong, Zhang, Lu, Huang, Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. 2013.

Sparse signal recovery using the ℓ1 norm:
Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. 2006.
Candes, Romberg, Tao. Stable signal recovery from incomplete and inaccurate measurements. 2006.
Candes, Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? 2006.

Non-convex approaches:
Candes, Wakin, Boyd. Enhancing sparsity by reweighted ℓ1 minimization. 2008.
Our Approach
Replace $\mu \operatorname{rank}(X)$ with

$$R_\mu(\sigma(X)) = \sum_i \mu - \max(\sqrt{\mu} - \sigma_i(X),\, 0)^2,$$

$$\min_X\; R_\mu(\sigma(X)) + \|AX - b\|^2$$

$R_\mu$ continuous, but non-convex.
The global minimizer does not change if $\|A\| < 1$.
$R_\mu(\sigma(X)) + \|AX - b\|^2$ is a lower bound on $\mu \operatorname{rank}(X) + \|AX - b\|^2$.
$f_\mu^{**}(X) = R_\mu(\sigma(X)) + \|X - X_0\|_F^2$ is the convex envelope of $f_\mu(X) = \mu \operatorname{rank}(X) + \|X - X_0\|_F^2$.

Larsson, Olsson. Convex Low Rank Regularization. IJCV 2016.
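A minimal sketch of evaluating the relaxed penalty, with a sanity check of the lower-bound property (function name mine):

```python
import numpy as np

def R_mu(X, mu):
    # R_mu(sigma(X)) = sum_i  mu - max(sqrt(mu) - sigma_i(X), 0)^2
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(mu - np.maximum(np.sqrt(mu) - s, 0.0) ** 2)

# Each term is at most mu (and 0 for a zero singular value),
# so R_mu never exceeds mu * rank.
X, mu = np.random.randn(8, 5), 0.5
assert R_mu(X, mu) <= mu * np.linalg.matrix_rank(X) + 1e-9
```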
Shrinking Bias
1D versions:

$$\operatorname{rank}(X) = \sum_i |\sigma_i(X)|^0, \qquad \|X\|_* = \sum_i \sigma_i(X), \qquad R_\mu(\sigma(X)) = \sum_i \mu - \big[\sqrt{\mu} - \sigma_i(X)\big]_+^2$$

Singular value thresholding (figure: the thresholding curves of $\mu \operatorname{rank}(X) + \|X - X_0\|_F^2$, of $2\sqrt{\mu}\|X\|_* + \|X - X_0\|_F^2$ and of $R_\mu(\sigma(X)) + \|X - X_0\|_F^2$).
More General Framework
Computation of the convex envelopes

$$f_g^{**}(X) = R_g(\sigma(X)) + \|X - X_0\|_F^2$$

of

$$f_g(X) = g(\operatorname{rank}(X)) + \|X - X_0\|_F^2,$$

where $g(k) = \sum_{i=1}^k g_i$ and $0 \le g_1 \le g_2 \le \dots$ And proximal operators.

Another special case:

$$f_{r_0}(X) = \mathcal{I}(\operatorname{rank}(X) \le r_0) + \|X - X_0\|_F^2$$

Larsson, Olsson. Convex Low Rank Regularization. IJCV 2016.
Results
General Case

If

$$f_g(X) = g(\operatorname{rank}(X)) + \|X - X_0\|_F^2$$

then

$$f_g^{**}(X) = \max_{\sigma(Z)} \left( \sum_{i=1}^n \min\big(g_i,\, \sigma_i^2(Z)\big) - \|\sigma(Z) - \sigma(X)\|^2 \right) + \|X - X_0\|_F^2.$$

The maximization over $Z$ reduces to a 1D search (piecewise quadratic, concave objective function). Can be done in $O(n)$ time ($n$ = number of singular values).
Convexity of f∗∗µ

(Figure: plots of $\mu - [\sqrt{\mu} - \sigma]_+^2$, of $\mu - [\sqrt{\mu} - \sigma]_+^2 + \sigma^2$, and of $\mu - [\sqrt{\mu} - \sigma]_+^2 + (\sigma - \sigma_0)^2$ for $\sigma_0 = 0, 1, 2$, with axis marks at $\sqrt{\mu}$ and $2\mu$.)

Adding the quadratic makes each 1D term convex. If $\sigma_0 = 2$ the function will not try to make $\sigma = 0$! (A numeric check follows below.)
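A small numeric check of the no-shrinkage behaviour via grid minimization (µ = 1; a sketch):

```python
import numpy as np

mu = 1.0
sigma = np.linspace(0.0, 4.0, 100001)
relaxed = mu - np.maximum(np.sqrt(mu) - sigma, 0.0) ** 2

for s0 in [0.0, 1.0, 2.0]:
    obj = relaxed + (sigma - s0) ** 2
    print(s0, sigma[np.argmin(obj)])
# s0 = 0 -> minimizer 0 (suppressed); s0 = 1 = sqrt(mu) is the tie point
# (every sigma in [0, 1] is optimal); s0 = 2 -> minimizer 2 (no shrinkage).
```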
Interpretations: f∗∗r0

$$f_{r_0}(X) = \mathcal{I}(\operatorname{rank}(X) \le r_0) + \|X - X_0\|_F^2$$

(Figure: level set surfaces $\{X \mid R_{r_0}(X) = \alpha\}$ for $X = \operatorname{diag}(x_1, x_2, x_3)$ with $r_0 = 1$ (left) and $r_0 = 2$ (middle). Note that when $r_0 = 1$ the regularizer promotes solutions where only one of the $x_k$ is non-zero. For $r_0 = 2$ the regularizer instead favors solutions with two non-zero $x_k$. For comparison the level set of the nuclear norm is also included.)
Hankel Matrix Estimation
(Figure: a signal, its Hankel matrix, and the matrix with noise.)

$$\min_{H \in \mathcal{H}}\; \mathcal{I}(\operatorname{rank}(H) \le r_0) + \|H - X_0\|_F^2$$

(Figure: error $\|H - H(f)\|$ versus noise level $\sigma \in [0.1, 1]$ for SVD, Mean, $f_\mu^{**}$, $f_{r_0}^{**}$ and the nuclear norm, plus a second panel over $[0.8, 1]$ on the y-axis for Mean, $f_\mu^{**}$, $f_{r_0}^{**}$ and the nuclear norm.)
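For context, a minimal sketch of the Hankel setup: building the Hankel matrix of a signal and enforcing the two constraints by alternating projections (shown only as a naive baseline, not the talk's method; the experiments above use the f∗∗ relaxations). scipy.linalg.hankel is a standard helper:

```python
import numpy as np
from scipy.linalg import hankel

def hankel_of(signal, rows):
    # rows x (len(signal) - rows + 1) Hankel matrix of the signal
    return hankel(signal[:rows], signal[rows - 1:])

def truncate_rank(H, r0):
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U[:, :r0] * s[:r0]) @ Vt[:r0]

def project_hankel(H):
    # Average along anti-diagonals to restore Hankel structure.
    rows, cols = H.shape
    sig = np.array([H[::-1].diagonal(k).mean() for k in range(-rows + 1, cols)])
    return hankel(sig[:rows], sig[rows - 1:])

t = np.arange(64)
noisy = np.cos(0.3 * t) + 0.1 * np.random.randn(64)
H = hankel_of(noisy, 20)
for _ in range(50):                      # naive alternating projections
    H = project_hankel(truncate_rank(H, 2))
```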
Smooth Linear Shape Basis
$$f_N(S) = \|S - S_0\|_F^2 + \mu\|P(S)\|_* + \tau\,\mathrm{TV}(S)$$

$$f_R(S) = \|S - S_0\|_F^2 + R_\mu(P(S)) + \tau\,\mathrm{TV}(S)$$

(Figure: reconstruction error $\|S - S_{GT}\|_F$ versus noise level $\sigma \in [0.01, 0.1]$ for $R_\mu$ and the nuclear norm.)
RIP problems
Linear observations: $b = \mathcal{A}X_0 + \epsilon$, $\mathcal{A}: \mathbb{R}^{m \times n} \to \mathbb{R}^p$, $X_0$ low rank, $\epsilon$ noise.
Recover $X_0$ using rank penalties/constraints:

$$R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|^2 \quad \text{or} \quad R_{r_0}(\sigma(X)) + \|\mathcal{A}X - b\|^2$$

Restricted Isometry Property (RIP):

$$(1 - \delta_q)\|X\|_F^2 \le \|\mathcal{A}X\|^2 \le (1 + \delta_q)\|X\|_F^2, \qquad \operatorname{rank}(X) \le q$$

Olsson, Carlsson, Andersson, Larsson. Non-Convex Rank/Sparsity Regularization and Local Minima. ICCV 2017.
Olsson, Carlsson, Bylow. A Non-Convex Relaxation for Fixed-Rank Approximation. RSL-CV 2017.
Near Convex Rank/Sparsity Estimation
Intuition:
If RIP holds then $\|\mathcal{A}X\|^2$ behaves like $\|X\|_F^2$.
$R_\mu(\sigma(X)) + \|X - X_0\|_F^2 \approx R_\mu(\sigma(X)) + \|X\|_F^2 - 2\langle X, X_0 \rangle$ is convex.
What about $R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|^2 \approx R_\mu(\sigma(X)) + \|\mathcal{A}X\|^2 - 2\langle X, \mathcal{A}^*b \rangle$? Near convex?

(Figure: the 1D example $R_\mu(x) + (\tfrac{1}{\sqrt{2}}x - b)^2$ with $\mu = 1$, plotted for $b = 0,\ \tfrac{1}{\sqrt{2}},\ 1,\ \sqrt{2},\ 1.5$.)
Main Result (Rank Penalty)
Def. $F_\mu(X) := R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|^2$ and $Z := (I - \mathcal{A}^*\mathcal{A})X_s + \mathcal{A}^*b$.

$X_s$ stationary point of $F_\mu(X)$ $\;\Leftrightarrow\;$

$$X_s \in \arg\min_X\; R_\mu(\sigma(X)) + \|X - Z\|_F^2.$$

$\|X - Z\|_F^2$ is a local approximation of $\|\mathcal{A}X - b\|^2$ around $X_s$.
$X_s$ is obtained by thresholding the SVD of $Z$ (a sketch follows below).

Theorem
If $X_s$ is a stationary point of $F_\mu$, and the singular values of $Z$ fulfill $\sigma_i(Z) \notin [(1 - \delta_r)\sqrt{\mu},\ \tfrac{\sqrt{\mu}}{1 - \delta_r}]$, then for any other stationary point $X'_s$ we have $\operatorname{rank}(X'_s - X_s) > r$.
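A sketch of the stationarity condition (names mine; A and Aadj are assumed callables for the operator and its adjoint). With the unit quadratic coupling, the prox of Rµ reduces to hard thresholding of the singular values at √µ, with a tie exactly at √µ:

```python
import numpy as np

def R_mu_prox(Z, mu):
    # argmin_X  R_mu(sigma(X)) + ||X - Z||_F^2:
    # keep sigma_i(Z) if sigma_i(Z) > sqrt(mu), otherwise set it to 0
    # (at sigma_i(Z) = sqrt(mu) both choices are optimal).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.where(s > np.sqrt(mu), s, 0.0)) @ Vt

def is_stationary(Xs, A, Aadj, b, mu, tol=1e-8):
    # Z = (I - A*A)Xs + A*b; Xs must be a fixed point of the prox.
    Z = Xs - Aadj(A(Xs)) + Aadj(b)
    return np.linalg.norm(R_mu_prox(Z, mu) - Xs) < tol
```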
Main Result (Rank Constraint)
Def. $F_{r_0}(X) := R_{r_0}(\sigma(X)) + \|\mathcal{A}X - b\|^2$ and $Z := (I - \mathcal{A}^*\mathcal{A})X_s + \mathcal{A}^*b$.

$X_s$ stationary point of $F_{r_0}(X)$ $\;\Leftrightarrow\;$

$$X_s \in \arg\min_X\; R_{r_0}(\sigma(X)) + \|X - Z\|_F^2.$$

$\|X - Z\|_F^2$ is a local approximation of $\|\mathcal{A}X - b\|^2$ around $X_s$.
$X_s$ is obtained by thresholding the SVD of $Z$.

Theorem
If $X_s$ is a stationary point of $F_{r_0}$ with $\operatorname{rank}(X_s) = r_0$, and the singular values of $Z$ fulfill $\sigma_{r_0+1}(Z) < (1 - 2\delta_{2r_0})\sigma_{r_0}(Z)$, then any other stationary point $X'_s$ has $\operatorname{rank}(X'_s) > r_0$.
Experiments (Rank Penalty)
Low rank recovery for varying noise level (x-axis) and regularization strength (y-axis).
$b = \mathcal{A}X_0 + \epsilon$, $\epsilon_i \sim N(0, \sigma)$, where $\operatorname{rank}(X_0) = 10$.

(Figure grid: $2\sqrt{\mu}\|X\|_* + \|\mathcal{A}X - b\|_F^2$ vs. $R_\mu(\sigma(X)) + \|\mathcal{A}X - b\|_F^2$ vs. verified optima; top row for a $202 \times 400$ operator ($\delta = 0.2$), bottom row for a $300 \times 400$ one ($\delta$ unknown).)
Experiments (Fixed Rank)
$\mu\|X\|_* + \|\mathcal{A}X - b\|_F^2$ vs. $R_r(\sigma(X)) + \|\mathcal{A}X - b\|_F^2$

(Figure, panels (a)-(c), curves for $R_r$ and $\|\cdot\|_*$:
(a) noise level (x-axis) vs. data fit $\|\mathcal{A}X - b\|$ (y-axis);
(b) fraction of instances verified to be globally optimal;
(c) same as (a).)

$X \in \mathbb{R}^{20 \times 20} \Rightarrow \operatorname{vec}(X) \in \mathbb{R}^{400}$.
(a) and (b) use a $400 \times 400$ operator $\mathcal{A}$ with $\delta = 0.2$ while (c) uses a $300 \times 400$ one (unknown $\delta$).
NRSfM
Reconstruct moving and deforming object from image projections.
CMU Motion capture sequences.
NRSfM (Dai et al. 2012)
$X_i, Y_i, Z_i$: x-, y- and z-coordinates in image $i$. $R_i$: $2 \times 3$ matrix encoding the orientation of camera $i$ (affine model).

$$
R = \begin{bmatrix}
R_1 & 0 & \dots & 0 \\
0 & R_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & R_F
\end{bmatrix},
\quad
X = \begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \\ \vdots \\ X_F \\ Y_F \\ Z_F \end{bmatrix}
\quad \text{and} \quad
X^\# = \begin{bmatrix}
X_1 & Y_1 & Z_1 \\
\vdots & \vdots & \vdots \\
X_F & Y_F & Z_F
\end{bmatrix}.
$$

Projections: $M \approx RX$.
Linear shape basis assumption: $\operatorname{rank}(X^\#) \le r$.

$$\min_{X^\#}\; \mathcal{I}(\operatorname{rank}(X^\#) \le r_0) + \|\mathcal{A}X^\# - M\|_F^2, \qquad \mathcal{A}: X^\# \mapsto RX.$$

(A bookkeeping sketch follows below.)
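A minimal numpy sketch of the bookkeeping: the block-diagonal camera matrix and the reshuffling between X and X# (sizes are example values):

```python
import numpy as np
from scipy.linalg import block_diag

F, npts = 4, 10                          # example: 4 frames, 10 points
R = block_diag(*[np.random.randn(2, 3) for _ in range(F)])   # 2F x 3F

X = np.random.randn(3 * F, npts)         # stacked [Xi; Yi; Zi] blocks
M = R @ X                                # 2F x npts image projections

# Reshuffle X (3F x n) into X# (F x 3n): block-row i becomes one row.
X_sharp = X.reshape(F, 3 * npts)         # row i = [Xi | Yi | Zi]
assert np.allclose(X_sharp.reshape(3 * F, npts), X)
```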
MOCAP Experiments
CMU Motion capture sequences:
Drink Pick-up
Stretch Yoga
MOCAP Experiments

$R_{r_0}(X^\#) + \|RX - M\|_F^2$ (blue) vs. $\mu\|X^\#\|_* + \|RX - M\|_F^2$ (orange):

(Figure: data fit $\|RX - M\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
(Figure: distance to ground truth $\|X - X_{gt}\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
MOCAP Experiments
RIP does not hold for $\mathcal{A}(X^\#) = RX$!
If $R_i N_i = 0$ with $N_i \in \mathbb{R}^{3 \times 1}$, then $R_i N_i C_i = 0$ for all $C_i \in \mathbb{R}^{1 \times m}$. Therefore $\mathcal{A}(N(C)) = 0$ for any matrix of the form

$$
N(C) = \begin{bmatrix}
n_{11}C_1 & n_{21}C_1 & n_{31}C_1 \\
n_{12}C_2 & n_{22}C_2 & n_{32}C_2 \\
\vdots & \vdots & \vdots \\
n_{1F}C_F & n_{2F}C_F & n_{3F}C_F
\end{bmatrix},
$$

where $n_{ij}$ are the elements of $N_i$.

$N(C)$ does not affect the projections.
If the row space of an optimal solution contains $N(C)$ (for some $C$) the solution is not unique.
MOCAP Experiments
$R_{r_0}(X^\#) + \|RX - M\|_F^2 + \|DX^\#\|_F^2$ (blue) vs. $\mu\|X^\#\|_* + \|RX - M\|_F^2 + \|DX^\#\|_F^2$ (orange):

(Figure: data fit $\|RX - M\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
(Figure: distance to ground truth $\|X - X_{gt}\|_F$ (y-axis) versus $\operatorname{rank}(X^\#)$ (x-axis).)
Optimization via ADMM/Splitting
$R_\mu(X)$ is not differentiable in $X$.
$R_\mu(X) + \|\mathcal{A}X - b\|^2$ is (near) convex in $X$.
The proximal operator $\arg\min_X R_\mu(X) + \rho\|X - X_0\|_F^2$ is computable.
Can apply splitting schemes:

$$\mathcal{L}(X, Y, \Lambda) = R_\mu(X) + \rho\|X - Y + \Lambda\|_F^2 + \|\mathcal{A}Y - b\|^2 - \rho\|\Lambda\|_F^2.$$

Alternate (see the sketch below):

$$X_{t+1} = \arg\min_X\; R_\mu(X) + \rho\|X - Y_t + \Lambda_t\|_F^2,$$
$$Y_{t+1} = \arg\min_Y\; \rho\|X_{t+1} - Y + \Lambda_t\|_F^2 + \|\mathcal{A}Y - b\|^2,$$
$$\Lambda_{t+1} = \Lambda_t + X_{t+1} - Y_{t+1}.$$

First order method.
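A self-contained sketch of the splitting scheme for an explicit measurement matrix A acting on X.ravel() (row-major); names are mine, and ρ = 1 is assumed so that the X-update is the exact hard-thresholding prox:

```python
import numpy as np

def R_mu_prox(Z, mu):
    # argmin_X R_mu(sigma(X)) + ||X - Z||_F^2: hard threshold at sqrt(mu).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.where(s > np.sqrt(mu), s, 0.0)) @ Vt

def admm_rank(A, b, X_init, mu, rho=1.0, iters=200):
    m, n = X_init.shape
    X, Y, Lam = X_init, X_init.copy(), np.zeros_like(X_init)
    AtA = A.T @ A + rho * np.eye(m * n)       # Y-update normal equations
    Atb = A.T @ b
    for _ in range(iters):
        X = R_mu_prox(Y - Lam, mu)                            # X-update
        Y = np.linalg.solve(AtA, Atb + rho * (X + Lam).ravel()).reshape(m, n)
        Lam = Lam + X - Y                                     # dual update
    return X
```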
Quadratic Approximation
Reformulate into a differentiable objective. Bilinear parameterization:

$$\min_{B,C}\; R_\mu(B, C) + \|\mathcal{A}(BC^T) - b\|^2,$$

$R_\mu(B, C)$ is twice differentiable almost everywhere.
Optimize with 2nd order methods.
Introduces non-optimal stationary points.
Characterization of local minima under RIP and optimization with VarPro/Wiberg.

Valtonen Örnhag, Olsson, Heyden. Bilinear Parameterization for Differentiable Rank Regularization. arXiv 2018.
Bilinear Parameterization
Assumption: $R(X) = \sum_{i=1}^r f(\sigma_i(X))$, with $f$ concave and non-decreasing on $[0, \infty)$.

Theorem
$R(X) = \min_{BC^T = X} R(B, C)$, where

$$R(B, C) = \sum_{i=1}^k f\left(\frac{\|B_i\|^2 + \|C_i\|^2}{2}\right),$$

and $B_i, C_i$ are the columns of $B, C$. $R(B, C)$ is differentiable if $f(\sigma) = h(|\sigma|)$ where $h$ is differentiable.

(Figure: example penalties $f$: SCAD, Log, MCP, ETP, Geman.)
Bilinear Parameterization
If $X = U\Sigma V^T$, $B = U\sqrt{\Sigma}$ and $C = V\sqrt{\Sigma}$, then $B_i = \sqrt{\sigma_i}\,U_i$ and $C_i = \sqrt{\sigma_i}\,V_i$. Therefore

$$\frac{\|B_i\|^2 + \|C_i\|^2}{2} = \frac{\sigma_i\|U_i\|^2 + \sigma_i\|V_i\|^2}{2} = \sigma_i.$$

(A numeric check follows below.)
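A quick numeric check of this balanced factorization (a sketch):

```python
import numpy as np

X = np.random.randn(6, 4)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
B = U * np.sqrt(s)                  # B = U sqrt(Sigma)
C = Vt.T * np.sqrt(s)               # C = V sqrt(Sigma)

assert np.allclose(B @ C.T, X)
# Averaged column energies recover the singular values:
col_avg = 0.5 * ((B ** 2).sum(axis=0) + (C ** 2).sum(axis=0))
assert np.allclose(col_avg, s)
```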
Uniqueness of Low-Rank Minima

Over-parameterization with our relaxation:

Theorem
Let $(\bar B, \bar C) \in \mathbb{R}^{m \times 2k} \times \mathbb{R}^{n \times 2k}$ be a local minimizer of

$$R_\mu(B, C) + \|\mathcal{A}(BC^T) - b\|^2,$$

with $\operatorname{rank}(\bar B \bar C^T) < k$ and $R_\mu(\bar B, \bar C) = R_\mu(\bar B \bar C^T)$. If the singular values of $Z = (I - \mathcal{A}^*\mathcal{A})\bar B \bar C^T + \mathcal{A}^*b$ fulfill $\sigma_i(Z) \notin [(1 - \delta_{2k})\sqrt{\mu},\ \tfrac{\sqrt{\mu}}{1 - \delta_{2k}}]$ then

$$\bar B \bar C^T \in \arg\min_{\operatorname{rank}(X) \le k}\; R_\mu(X) + \|\mathcal{A}X - b\|^2.$$
VarPro/Wiberg
$$\min_{B,C}\; \|\mathcal{A}(BC^T) - b\|^2$$

Least squares problem in $C$ for fixed $B$ $\Rightarrow$ compute (closed form)

$$C^*(B) = \arg\min_C\; \|\mathcal{A}(BC^T) - b\|^2.$$

Linearize $\mathcal{A}(B\,C^*(B)^T) - b \approx L\,\delta B + \ell$ at $B_k$. Solve

$$\delta B_k = \arg\min\; \|L\,\delta B + \ell\|^2 + \lambda\|\delta B\|^2, \qquad B_{k+1} = B_k + \delta B_k.$$

Quadratic approximation. Rapid convergence. Similar to GN/LM. (A sketch of the inner step follows below.)
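A minimal sketch of the closed-form inner step, assuming an explicit measurement matrix Amat acting on the column-major vectorization of X (setup and names are mine); the outer LM step on B then follows by linearizing the residual as on the slide:

```python
import numpy as np

def C_star(Amat, b, B, n):
    # Closed-form C*(B): least squares over c = vec(C^T), using
    # vec(B C^T) = (I_n kron B) vec(C^T)   (column-major vec).
    m, r = B.shape
    M = Amat @ np.kron(np.eye(n), B)            # p x (n*r)
    c, *_ = np.linalg.lstsq(M, b, rcond=None)
    return c.reshape(r, n, order="F").T         # C is n x r
```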
Reweighted VarPro

$$\min_{B,C}\; \sum_i f\left(\frac{\|B_i\|^2 + \|C_i\|^2}{2}\right) + \|\mathcal{A}(BC^T) - b\|^2$$

Taylor: $f(x) \approx f(x_k) + f'(x_k)(x - x_k) = f'(x_k)x + \text{const}$, giving

$$\min_{B,C}\; \sum_i w_i^k \left(\frac{\|B_i\|^2 + \|C_i\|^2}{2}\right) + \|\mathcal{A}(BC^T) - b\|^2, \qquad w_i^k = f'\left(\frac{\|B_i^k\|^2 + \|C_i^k\|^2}{2}\right).$$

One iteration of regular VarPro, then recompute the weights.
Refactorize into $B^{k+1}, C^{k+1}$ using the SVD (see the sketch below).
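A sketch of the reweighting bookkeeping, using the Log penalty in the common form f(x) = log(1 + x) as an example (names mine):

```python
import numpy as np

def refactorize(B, C):
    # Balanced refactorization of B C^T via SVD: B = U sqrt(S), C = V sqrt(S).
    U, s, Vt = np.linalg.svd(B @ C.T, full_matrices=False)
    r = B.shape[1]
    return U[:, :r] * np.sqrt(s[:r]), Vt[:r].T * np.sqrt(s[:r])

def weights(B, C, f_prime):
    # w_i = f'((||B_i||^2 + ||C_i||^2) / 2), one weight per column.
    return f_prime(0.5 * ((B ** 2).sum(axis=0) + (C ** 2).sum(axis=0)))

B, C = refactorize(np.random.randn(10, 3), np.random.randn(8, 3))
w = weights(B, C, lambda x: 1.0 / (1.0 + x))    # Log penalty example
```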
SfM-results
VarPro vs. ADMM
MOCAP Results
Drink Pickup Stretch Yoga
The End
Convex Envelopes and Conjugate Functions

(Figure: a non-convex function $f$ over $[-1.4, 1.4]$, its affine minorants $yx - f^*(y)$, and the resulting convex envelope.)

$$f^*(y) = \max_x\; yx - f(x) \;\Rightarrow\; f^*(y) \ge yx - f(x) \;\Rightarrow\; f(x) \ge yx - f^*(y)$$

$$f^{**}(x) = \max_y\; xy - f^*(y)$$
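To connect this machinery to the relaxation used in the talk, here is a worked scalar instance (my own derivation of the 1D case with $X_0 = 0$; the matrix statement appears on the earlier "Our Approach" slide):

```latex
\[
f(x) = \mu\,[x \neq 0] + x^2
\quad\Rightarrow\quad
f^*(y) = \max_x\; xy - f(x) = \max\!\Big(0,\ \tfrac{y^2}{4} - \mu\Big),
\]
\[
f^{**}(x) = \max_y\; xy - f^*(y)
= \begin{cases}
2\sqrt{\mu}\,|x|, & |x| \le \sqrt{\mu},\\
x^2 + \mu,        & |x| \ge \sqrt{\mu},
\end{cases}
= \underbrace{\mu - \max(\sqrt{\mu} - |x|,\,0)^2}_{r_\mu(x)} + x^2,
\]
so the biconjugate is exactly the scalar version of $R_\mu$ plus the quadratic.
```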
Computing the Conjugate
$$f_g^*(X) = \max_Y\; \langle X, Y \rangle - f_g(Y) = \max_k\; -\sum_{i=1}^k g_i + \max_{\operatorname{rank}(Y) = k}\; \langle X, Y \rangle - \|Y - X_0\|_F^2$$

The inner maximization can be solved with the SVD (Eckart, Young) and completion of squares.
The Missing Data Problem
$$\|W \odot (X - M)\|_F^2$$

(Figure: red = visible element, blue = missing measurement.)
No known closed-form solution.
A Block Decomposition Approach
Solve the problem on sub-blocks with no missing data:

$$f(X) = \sum_{i=1}^K \mu_i \operatorname{rank}(P_i(X)) + \|P_i(X) - P_i(M)\|_F^2$$

Convex relaxation:

$$f(X) = \sum_{i=1}^K R_{\mu_i}(\sigma(P_i(X))) + \|P_i(X) - P_i(M)\|_F^2$$
A Block Decomposition Approach
$$X = \begin{bmatrix}
X_{11} & X_{12} & ? \\
X_{21} & X_{22} & X_{23} \\
? & X_{32} & X_{33}
\end{bmatrix}$$

Lemma
Let $X_1$ and $X_2$ be two given matrices with overlap matrix $X_{22}$, and let $r_1 = \operatorname{rank}(X_1)$ and $r_2 = \operatorname{rank}(X_2)$. Suppose that $\operatorname{rank}(X_{22}) = \min(r_1, r_2)$; then there exists a completion $X$ with $\operatorname{rank}(X) = \max(r_1, r_2)$. Additionally, if $\operatorname{rank}(X_{22}) = r_1 = r_2$ then $X$ is unique.
Results
(Figure panels: Observed, Our Solution, Nuclear Norm, Ground Truth.)