


Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent

Qing Qu∗, Xiao Li†, Zhihui Zhu⋄

∗ Center for Data Science, New York University, † EE Department, Chinese University of Hong Kong, ⋄ MINDS, the Johns Hopkins University

Basic Task

Given multiple observations $y_i \in \mathbb{R}^n$ of the circulant convolution
$$ y_i = a \circledast x_i, \qquad 1 \le i \le p, $$
can we recover both the unknown kernel $a \in \mathbb{R}^n$ and the sparse signals $\{x_i\}_{i=1}^{p} \subset \mathbb{R}^n$ simultaneously?
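For concreteness, circulant convolution is just an FFT-domain product; below is a minimal NumPy sketch of the forward model (the variable names and sizes are illustrative, not taken from the paper).

```python
import numpy as np

def circ_conv(a, x):
    """Circulant convolution a ⊛ x computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(x)))

# One channel of the forward model y_i = a ⊛ x_i (illustrative sizes).
rng = np.random.default_rng(0)
n = 64
a = rng.standard_normal(n)                            # unknown kernel
x = rng.standard_normal(n) * (rng.random(n) < 0.1)    # sparse signal
y = circ_conv(a, x)                                   # observation
```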

Our Contribution

With random initialization, a vanilla Riemannian gradient descent (RGD) followed by a subgradient method converges exactly to the target solution at a linear rate.

Motivations in Imaging Science

• Computational Microscopy Imaging.

• Geophysics and Seismic Imaging.

• Neuroscience: calcium imaging, functional MRI.

• Image Deblurring.

Symmetric Solutions in MCS-BD

• Scaled and shifted copies of $(a, x_i)$ are also solutions to MCS-BD:
$$ y_i = \big(\alpha\, s_\ell[a]\big) \circledast \big(\tfrac{1}{\alpha}\, s_{-\ell}[x_i]\big). $$

- W.l.o.g., fix the scaling $\|a\| = 1$.
- Hope to recover $a$ up to signed shifts $\pm \{ s_\ell[a_0] \}_{\ell = -n+1}^{\,n-1}$.

Assumptions & Problem Formulation

• Assumptions.
- Sparse signals $x_i$: $x_i \overset{\text{i.i.d.}}{\sim}$ Bernoulli–Gaussian$(\theta)$, $\theta \in (0, 1)$;
- Invertible kernel $a$, i.e., its circulant matrix
$$ \underbrace{C_a}_{\text{invertible}} = F^{*} \operatorname{diag}(\hat a)\, F, \quad \text{or equivalently } |\hat a| > 0. $$
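A minimal NumPy sketch of these assumptions (the sizes `n`, `p` and sparsity level `theta` below are placeholder choices): Bernoulli–Gaussian signals are sampled entrywise, and invertibility of the kernel amounts to its DFT having no zero entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, theta = 128, 16, 0.1                      # placeholder sizes and sparsity level

# Bernoulli-Gaussian signals: each entry is nonzero with probability theta,
# and each nonzero entry is standard Gaussian.
X = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)

# Invertibility of a: C_a = F* diag(a_hat) F is invertible iff a_hat has no zeros.
a = rng.standard_normal(n)
a /= np.linalg.norm(a)                          # w.l.o.g. fix the scaling ||a|| = 1
assert np.min(np.abs(np.fft.fft(a))) > 1e-12, "kernel is numerically non-invertible"
```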

• Problem Formulation. Denote
$$ Y = \big[\, y_1 \ \ y_2 \ \cdots \ y_p \,\big], \qquad X = \big[\, x_1 \ \ x_2 \ \cdots \ x_p \,\big]. $$
- Let $h$ be the inverse kernel of $a$, i.e., $\hat h = \hat a^{\odot -1}$ (equivalently $a \circledast h = e_1$, the convolution identity), so that
$$ C_h \cdot Y = \underbrace{C_h \cdot C_a}_{=\, I} \cdot X = \underbrace{X}_{\text{sparse}}. $$

- Ideally, we want to solve the problem
$$ \min_{q}\ \frac{1}{np}\,\underbrace{\|C_q Y\|_0}_{\text{sparsity}} \;=\; \frac{1}{np}\sum_{i=1}^{p}\big\|C_{y_i} q\big\|_0, \quad \text{s.t. } \underbrace{q \neq 0}_{\text{prevent trivial solution}}, $$
to recover $\hat a = s_\ell\big[\alpha\, \hat q^{\odot -1}\big]$ up to the shift-scaling symmetry.
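To make the role of the inverse kernel concrete, here is an illustrative sketch (assuming, hypothetically, that the true kernel $a$ were known): forming $\hat h = \hat a^{\odot -1}$ and applying $C_h$ to the observations recovers the sparse signals exactly.

```python
import numpy as np

def apply_circulant(v, Y):
    """Apply C_v to every column of Y, i.e. circularly convolve v with each channel."""
    return np.real(np.fft.ifft(np.fft.fft(v)[:, None] * np.fft.fft(Y, axis=0), axis=0))

rng = np.random.default_rng(0)
n, p = 64, 8
a = rng.standard_normal(n)
X = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.1)
Y = apply_circulant(a, X)                        # Y = C_a X, i.e. y_i = a ⊛ x_i

# Inverse kernel: h_hat = 1 / a_hat, so that C_h C_a = I and C_h Y = X.
h = np.real(np.fft.ifft(1.0 / np.fft.fft(a)))
X_rec = apply_circulant(h, Y)
assert np.allclose(X_rec, X, atol=1e-8)
```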

• Nonconvex Relaxation. We consider
$$ \min_{q}\ \varphi(q) := \frac{1}{np}\sum_{i=1}^{p} \underbrace{H_\mu\big(C_{y_i} P q\big)}_{\text{smooth sparsity function}}, \quad \text{s.t. } \underbrace{q \in \mathbb{S}^{n-1}}_{\text{sphere constraint}}. $$

- $H_\mu(\cdot)$ is a smooth Huber loss promoting sparsity:
$$ H_\mu(Z) := \sum_{i=1}^{n}\sum_{j=1}^{p} h_\mu(Z_{ij}), \qquad h_\mu(z) := \begin{cases} |z|, & |z| \ge \mu, \\[4pt] \dfrac{z^2}{2\mu} + \dfrac{\mu}{2}, & |z| < \mu. \end{cases} $$
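A direct NumPy translation of $H_\mu$ and its gradient (a minimal sketch; `huber` and `huber_grad` are hypothetical helper names reused in the sketches further below):

```python
import numpy as np

def huber(Z, mu):
    """H_mu(Z): the smooth Huber loss summed over all entries of Z."""
    absZ = np.abs(Z)
    return np.where(absZ >= mu, absZ, Z ** 2 / (2 * mu) + mu / 2).sum()

def huber_grad(Z, mu):
    """Elementwise derivative h_mu'(z): sign(z) if |z| >= mu, else z / mu."""
    return np.where(np.abs(Z) >= mu, np.sign(Z), Z / mu)
```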

- $P$ is a preconditioning matrix:
$$ P = \Big( \frac{1}{\theta n p} \sum_{i=1}^{p} C_{y_i}^{\top} C_{y_i} \Big)^{-1/2} \;\approx\; \big( C_a^{\top} C_a \big)^{-1/2}. $$
- Preconditioning orthogonalizes the kernel $C_a$:
$$ C_{y_i} P \;=\; C_{x_i} \underbrace{C_a P}_{R} \;\approx\; C_{x_i} \underbrace{C_a \big( C_a^{\top} C_a \big)^{-1/2}}_{\text{orthogonal } Q}. $$
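Since each $C_{y_i}$ is circulant, the preconditioner is itself circulant, so $Pq$ can be applied in the Fourier domain without ever forming an $n \times n$ matrix. A sketch under that observation (the helper name `precondition` is ours, not the paper's):

```python
import numpy as np

def precondition(q, Y, theta):
    """Apply P q, using that (1/(theta*n*p)) * sum_i C_{y_i}^T C_{y_i} is circulant
    with Fourier symbol (1/(theta*n*p)) * sum_i |y_i_hat|^2."""
    n, p = Y.shape
    symbol = (np.abs(np.fft.fft(Y, axis=0)) ** 2).sum(axis=1) / (theta * n * p)
    return np.real(np.fft.ifft(np.fft.fft(q) / np.sqrt(symbol)))   # P acts as symbol^{-1/2} in Fourier
```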

Given $C_{y_i} P q \approx C_{x_i} Q q$, and supposing $Q = I$, the problem reduces to
$$ \min_{q}\ f(q) := \frac{1}{np}\sum_{i=1}^{p} H_\mu\big(C_{x_i} q\big), \quad \text{s.t. } q \in \mathbb{S}^{n-1}. $$
This implies that the signed standard basis vectors $\pm\{e_i\}_{i=1}^{n}$ are global solutions.

(Figure: optimization landscapes over the sphere for the (a, d) ℓ1-loss, (b, e) Huber-loss, and (c, f) ℓ4-loss objectives.)

Geometric Property

Study the optimization landscape over a union of sets
$$ \mathcal{S}^{i\pm}_{\xi} := \Big\{ q \in \mathbb{S}^{n-1} \;\Big|\; \frac{|q_i|}{\|q_{-i}\|_\infty} \ge \sqrt{1+\xi},\ q_i \gtrless 0 \Big\}, $$
for some $\xi \in (0, +\infty)$, where each set
- contains exactly one solution $\pm e_i$;
- excludes all saddle points;
- for some small $\xi = \frac{1}{5 \log n}$, a random initialization falls in one $\mathcal{S}^{i\pm}_{\xi}$ with probability $\ge 1/2$.

(Figure: the regions $\mathcal{S}^{i\pm}_{\xi}$ around $\pm e_1, \pm e_2, \pm e_3$ on the sphere, for $\xi = 0$ and $\xi = \frac{1}{5 \log n}$.)
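As a small illustrative check (a hypothetical helper, not part of the paper's algorithm), membership in one of these regions follows directly from the definition:

```python
import numpy as np

def region_index(q, xi):
    """Return i if q lies in S^{i±}_xi, i.e. |q_i| / ||q_{-i}||_inf >= sqrt(1 + xi); else None."""
    i = int(np.argmax(np.abs(q)))
    rest = np.delete(np.abs(q), i)
    return i if np.abs(q[i]) >= np.sqrt(1 + xi) * rest.max() else None
```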

• Regularity Condition. With $p \ge \Omega(\mathrm{poly}(n))$, w.h.p.
$$ \big\langle \operatorname{grad} f(q),\ q_i q - e_i \big\rangle \;\ge\; \alpha(q) \cdot \| q - e_i \| $$
holds for each $\mathcal{S}^{i+}_{\xi}$ ($1 \le i \le n$) with $\alpha > 0$, for all
$$ q \in \mathcal{S}^{i+}_{\xi} \cap \Big\{ q \in \mathbb{S}^{n-1} \;\Big|\; \sqrt{1 - q_i^2} \ge \mu \Big\}. $$

• Implicit Regularization. With $p \ge \Omega(\mathrm{poly}(n))$, w.h.p.
$$ \Big\langle \operatorname{grad} f(q),\ \frac{1}{q_j}\, e_j - \frac{1}{q_i}\, e_i \Big\rangle \;\ge\; c\,\frac{\theta(1-\theta)}{n}\,\frac{\xi}{1+\xi}, $$
for all $q \in \mathcal{S}^{i+}_{\xi}$ and any $q_j$ such that $j \ne i$ and $q_j^2 \ge \frac{1}{3} q_i^2$.

From Geometry to Optimization

• Random Initialization. Draw $q^{(0)} \sim \mathcal{U}(\mathbb{S}^{n-1})$, so that
$$ \mathbb{P}\Big( q^{(0)} \in \bigcup_{i=1}^{n} \mathcal{S}^{i\pm}_{\xi} \Big) \ \ge\ 1/2. $$
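A uniform draw from the sphere is simply a normalized Gaussian vector (minimal sketch):

```python
import numpy as np

def random_init(n, rng=None):
    """Draw q^(0) uniformly from the unit sphere S^{n-1} by normalizing a Gaussian vector."""
    rng = np.random.default_rng() if rng is None else rng
    q = rng.standard_normal(n)
    return q / np.linalg.norm(q)
```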

• Phase I: Riemannian Gradient Descent (RGD).
$$ q^{(k+1)} = \mathcal{P}_{\mathbb{S}^{n-1}}\Big( q^{(k)} - \tau \cdot \operatorname{grad} f\big(q^{(k)}\big) \Big), $$
with a small fixed step size $\tau$; the iterates stay in $\mathcal{S}^{i\pm}_{\xi}$ thanks to the implicit regularization. RGD produces a solution $q_\star$ with
$$ \big\| q_\star - q_{\mathrm{tgt}} \big\| \le \mathcal{O}(\mu) $$
at a linear rate, thanks to the regularity condition.
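A minimal sketch of Phase I, reusing the hypothetical helpers `precondition`, `apply_circulant`, `huber_grad`, and `random_init` sketched above; the step size, smoothing parameter, and iteration count are placeholder choices rather than the paper's settings.

```python
import numpy as np

def phase1_rgd(Y, theta, mu=1e-2, tau=0.1, iters=500, rng=None):
    """Phase I: Riemannian gradient descent for
       min_q (1/(n*p)) * sum_i H_mu(C_{y_i} P q)  s.t.  q on the unit sphere."""
    n, p = Y.shape
    q = random_init(n, rng)                                  # q^(0) ~ Uniform(S^{n-1})
    for _ in range(iters):
        Z = apply_circulant(precondition(q, Y, theta), Y)    # columns are C_{y_i} P q
        G = huber_grad(Z, mu)                                # elementwise Huber gradient
        # Euclidean gradient: (1/(n*p)) * P * sum_i C_{y_i}^T G[:, i]
        corr = np.real(np.fft.ifft(np.conj(np.fft.fft(Y, axis=0)) * np.fft.fft(G, axis=0), axis=0))
        egrad = precondition(corr.sum(axis=1), Y, theta) / (n * p)
        rgrad = egrad - (q @ egrad) * q                      # project onto the tangent space at q
        q = q - tau * rgrad
        q = q / np.linalg.norm(q)                            # retract back to the sphere
    return q
```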

• Phase II: Rounding. With $r = q_\star$, solve
$$ \min_{q}\ \zeta(q) := \frac{1}{np}\sum_{i=1}^{p} \big\| C_{y_i} P q \big\|_1, \quad \text{s.t. } \langle r, q \rangle = 1, $$
via projected subgradient descent
$$ q^{(k+1)} = q^{(k)} - \tau^{(k)} \cdot \mathcal{P}_{r^{\perp}}\, g^{(k)}, $$
with $\tau^{(k+1)} = \beta \tau^{(k)}$ and $\beta \in (0, 1)$. It converges linearly,
$$ \big\| q^{(k)} - q_{\mathrm{tgt}} \big\| \le \eta^{k}, \quad \eta \in (0, 1), $$
thanks to the local sharpness of $\zeta(q)$.
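A corresponding sketch of Phase II (again reusing the hypothetical helpers above; the initial step size, decay factor, and iteration count are placeholders):

```python
import numpy as np

def phase2_rounding(q_star, Y, theta, tau0=0.1, beta=0.9, iters=200):
    """Phase II: projected subgradient descent for
       min_q (1/(n*p)) * sum_i ||C_{y_i} P q||_1  s.t.  <r, q> = 1,  with r = q_star."""
    n, p = Y.shape
    r = q_star / np.linalg.norm(q_star)
    q, tau = r.copy(), tau0                                  # feasible start: <r, r> = 1
    for _ in range(iters):
        Z = apply_circulant(precondition(q, Y, theta), Y)    # columns are C_{y_i} P q
        S = np.sign(Z)                                       # subgradient of the l1 term
        corr = np.real(np.fft.ifft(np.conj(np.fft.fft(Y, axis=0)) * np.fft.fft(S, axis=0), axis=0))
        g = precondition(corr.sum(axis=1), Y, theta) / (n * p)   # subgradient of zeta at q
        q = q - tau * (g - (r @ g) * r)                      # step projected onto r-perp
        tau *= beta                                          # geometrically decaying step size
    return q
```

Chaining the two phases (Phase I from a random initialization, then Phase II started at its output) gives the overall pipeline described in this poster.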

Comparison with Literature

Experiments

• Algorithmic Convergence and recovery with varying θ

(Figure: (a) comparison of iterate convergence; (b) recovery probability with varying θ.)

• Phase Transition on (p, n)

(Figure: phase transition over (p, n) for (a) ℓ1-loss, (b) Huber-loss, (c) ℓ4-loss.)

• Experiments on STORM Imaging

(Figure: (a) observation, (b) ground truth, (c) Huber-loss recovery, (d) ℓ4-loss recovery; (e) kernel: ground truth, (f) kernel: Huber-loss, (g) kernel: ℓ4-loss.)

References

[1] Q. Qu, X. Li, and Z. Zhu, "A nonconvex approach for exact and efficient multichannel sparse blind deconvolution", NeurIPS, 2019.

[2] Y. Li and Y. Bresler, "Multichannel sparse blind deconvolution on the sphere", NeurIPS, 2018.

[3] L. Wang and Y. Chi, "Blind deconvolution from multiple sparse inputs", IEEE SPL, 2016.