arXiv:1802.04073v2 [cs.CV] 15 Mar 2018


Blind Image Deconvolution using Deep Generative Priors

Muhammad Asim∗, Fahad Shamshad∗, and Ali Ahmed∗

∗ The authors have contributed equally to this work.

Abstract. This paper proposes a new framework to regularize the ill-posed and non-linear blind image deconvolution problem by using deep generative priors. We employ two separate deep generative models: one trained to produce sharp images, the other trained to generate blur kernels from lower-dimensional parameters. The regularized problem is efficiently solved by a simple alternating gradient descent algorithm operating in the lower-dimensional latent space of each generative model. We show empirically that this yields excellent image deblurring results even under extravagantly large blurs and heavy noise. Our proposed method is in stark contrast to conventional end-to-end approaches, where a deep neural network is trained on blurred inputs and the corresponding sharp outputs while completely ignoring knowledge of the underlying forward map (the convolution operator) in image blurring.

Keywords: Blind image deblurring, generative adversarial networks, variational autoencoder, alternating gradient descent

1 Introduction

Blind image deblurring aims to recover the true image i and the blur kernel k from a blurry and possibly noisy observation y. For a uniform and spatially invariant blur, its mathematical formulation is

$$ y = i \otimes k + n, \tag{1} $$

where ⊗ is the convolution operator and n is additive Gaussian noise. In its full generality, the inverse problem (1) is severely ill-posed, as many different instances of i and k fit the observation y.

To resolve between multiple instances, priors are introduced on images and/or blur kernels in image deblurring algorithms. Priors assume some a priori model on the true image, the blur kernel, or both. Common priors include sparsity of the true image and/or blur kernel in some transform domain such as wavelets or curvelets, sparsity of image gradients [1,2], the ℓ0-regularized prior [3], internal patch recurrence [4], low-rank priors [5,6], and the hyper-Laplacian prior [7]. Although generic and extendable to multiple applications, these engineered models are not very effective, as many unrealistic images also fit the prior model.

In the last couple of years, advances in generative modeling of images have taken it well beyond the conventional prior models above. The introduction of such deep generative priors in the image deblurring inverse problem should enable far more effective regularization, yielding sharper and better-quality deblurred images. Our experiments in Figure 1 confirm this hypothesis and show that using deep generative priors in solving the inverse problem (1) produces promising deblurring results on standard datasets of face images and house numbers. The blurred faces are almost unrecognizable to the naked eye due to extravagant blurs, yet the algorithm deblurs the faces near perfectly with the assistance of generative priors. Interestingly, owing to the implicit constraints imposed by generative priors, any deviation of, for example, the deblurred faces from the original faces manifests itself not through loss of sharpness but through a slight change of facial expression or orientation. This form of information loss is mainly harmless in important applications such as face recognition from blurred images.

[Figure 1: image rows labeled Blurry, Deblurred, and Original.]

Fig. 1: Blind image deblurring using deep generative priors. Deblurred images (second row) recovered from the blurry ones (first row) using an alternating gradient descent algorithm with the help of two implicit pre-trained generative models: one trained on clean images, the other on blur kernels. Extremely blurred faces and numbers, not even recognizable to the naked eye, are recovered faithfully using this approach.

This paper shows with an empirical study that, given a noisy and blurry image y and the assumption that the true image i and blur kernel k belong to the ranges of their respective pre-trained generators, we can recover a visually appealing and sharp approximation of the true image using an alternating gradient descent algorithm. Specifically, the algorithm searches for a pair (i, k) in the ranges of the respective pre-trained generators of images and blur kernels that explains the blurred image y. Implicitly, this aggressively regularizes the alternating gradient descent algorithm to produce sharp and clean images as solutions. Since the range of each generative model can be traversed by a latent representation of much lower dimension than the ambient dimension of the images, this not only reduces the number of unknowns in the deblurring problem but also allows an efficient computation of gradients in this lower-dimensional space using back propagation through the generators. Our numerical experiments show that, in general, the deep generative priors yield better deblurring results than the classical image priors studied extensively in the literature.

It is instructive to compare our approach against standard end-to-end training of a deep neural network to map blurred images to the corresponding sharp ones [8,9,10,11]. The main drawback of the end-to-end framework is that it completely ignores knowledge of the forward map, that is, the governing equation (1) that explains the image blurring phenomenon. Consequently, the deblurring is more sensitive to blur kernels, images, and noise distributions in the test set that are not representative of the training data. In comparison, our approach explicitly takes the forward map (1) into account in the gradient descent algorithm to achieve robust results. Moreover, our approach does not require expensive retraining of every deep network involved when the problem specification changes partially, for example through alterations in the blur kernels or the noise model; in the former case, we only have to retrain the blur generative model, and in the latter case, we only need to modify the loss function of the descent algorithm.

We show experimentally that our approach is better than classical engineered priors and competitive against end-to-end learning models. We also show how end-to-end models fail under different noise levels and large blurs, while our approach is very robust.

1.1 Main Contributions

To the best of our knowledge, this is the first work that investigates the power of deep generative priors to regularize the ill-posed and non-linear inverse problem of blind image deblurring. We show that this regularized problem is efficiently solved by a simple alternating gradient descent algorithm. Our results show that, compared to the conventional engineered linear models, the deep generative priors enforce the image model much more aggressively, leading to visually pleasing and sharp deblurred images. The method produces excellent results even under the tough conditions of extravagantly large blurs and heavy noise.

The rest of the paper is organized as follows. In Section 2, we give an overview of the related work. We formulate the problem in Section 3, followed by our proposed alternating gradient descent algorithm in Section 4. Section 5 contains experimental results, followed by concluding remarks in Section 6.

2 Related Work

Neural-network-based implicit generative models such as Generative Adversarial Networks (GANs) [12] and Variational Autoencoders (VAEs) [13] have found much success in modelling complex data distributions, especially those of images. Owing to their power in modelling the distribution of the natural image manifold, they have been used extensively to solve ill-posed inverse problems such as compressed sensing [14], image super-resolution [15], and image inpainting [16].

Recently, GANs and VAEs have also been used for blind image deblurring. In [17], the authors jointly deblur and super-resolve low-resolution blurry text and face images by introducing a novel feature matching loss term during the GAN training process. They show that their proposed formulation favours clean high-resolution images over low-resolution blurry images. Similarly, a sparse-coding-based framework has been proposed in [18], consisting of both a VAE that learns a data prior by encoding its features into a compact form, and a GAN for discriminating clean and blurry image features. Their method works on both space-invariant and space-variant blurs and shows competitive performance when compared with recent image deblurring algorithms. Recently, the authors of [19] employed a variant of the GAN, the conditional GAN, for blind motion deblurring in an end-to-end framework and optimized it using a multi-component loss function consisting of content and adversarial losses. They also evaluate the quality of the images deblurred by their algorithm on an object detection task and show promising results. However, all these generative-model-based deblurring approaches are end-to-end and suffer from the same drawbacks as other deep learning techniques, i.e., they require retraining for different noise levels and different blurs.

Our approach is inspired by [14], which uses a generator pre-trained on natural images to solve the ill-posed compressed sensing problem. A similar approach has been used for the image inpainting problem [16]. We also note the work of [20] and [21], which use pre-trained generators to circumvent the issue of adversarial attacks on deep neural networks.

3 Problem Formulation

We assume the image $i \in \mathbb{R}^n$ and blur kernel $k \in \mathbb{R}^n$ in (1) are members of some structured classes $\mathcal{I}$ of images and $\mathcal{K}$ of blurs, respectively. As an example, $\mathcal{I}$ may be a set of celebrity faces or vehicle number plate images, and $\mathcal{K}$ may comprise natural motion and Gaussian blurs. A representative sample set from each of the classes $\mathcal{I}$ and $\mathcal{K}$ is employed to train a generative model for that class. We denote by the mappings $G_{\mathcal{I}} : \mathbb{R}^l \to \mathbb{R}^n$ and $G_{\mathcal{K}} : \mathbb{R}^m \to \mathbb{R}^n$ the generators for classes $\mathcal{I}$ and $\mathcal{K}$, respectively. Given low-dimensional inputs $z_i \in \mathbb{R}^l$ and $z_k \in \mathbb{R}^m$, the generators $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$ generate new samples $G_{\mathcal{I}}(z_i)$ and $G_{\mathcal{K}}(z_k)$ that are representative of the classes $\mathcal{I}$ and $\mathcal{K}$, respectively. These generators are fixed after training. To recover the clean image and blur kernel $(i, k)$ from the blurred image $y$ in (1), we propose minimizing the following objective function:

$$ (\hat{i}, \hat{k}) := \operatorname*{argmin}_{i \in \mathrm{Range}(G_{\mathcal{I}}),\; k \in \mathrm{Range}(G_{\mathcal{K}})} \| y - i \otimes k \|^2, \tag{2} $$

where $\mathrm{Range}(G_{\mathcal{I}})$ and $\mathrm{Range}(G_{\mathcal{K}})$ are the sets of all images and blurs that can be generated by $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$, respectively. In words, we want to find an image $i$ and a blur kernel $k$, each from the range of its respective generator, that best explain the forward model (convolution operator) (1). Ideally, the range of a generator comprises only samples drawn from the distribution of the image or blur class. Constraining the solution $(\hat{i}, \hat{k})$ to lie only in the generator ranges therefore implicitly reduces the solution ambiguities inherent to the ill-posed blind deconvolution problem and forces the solution to be members of the classes $\mathcal{I}$ and $\mathcal{K}$.

The minimization program in (2) can be equivalently formulated in the lower-dimensional latent representation space as follows:

$$ (\hat{z}_i, \hat{z}_k) = \operatorname*{argmin}_{z_i \in \mathbb{R}^l,\; z_k \in \mathbb{R}^m} \| y - G_{\mathcal{I}}(z_i) \otimes G_{\mathcal{K}}(z_k) \|^2. \tag{3} $$

This optimization program can be thought of as tweaking the latent representation vectors $z_i$ and $z_k$ (the inputs to the generators $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$, respectively) until these generators produce an image $i$ and a blur kernel $k$ whose convolution comes as close to $y$ as possible.

Fig. 2: Block diagram of the proposed approach. The latent vectors $z_i^t$ and $z_k^t$ are initialized by sampling from a standard Gaussian distribution and are updated by alternating gradient descent to minimize the loss. The optimal pair $(\hat{z}_i, \hat{z}_k)$, which minimizes the residual loss, is used to generate the image and blur estimates $(\hat{i}, \hat{k})$.

The optimization program in (3) is obviously non-convex owing to the bilinear convolution operator and the non-linear deep generative models. We resort to an alternating gradient descent algorithm to find a local minimum $(\hat{z}_i, \hat{z}_k)$. Importantly, the weights of the generators are always fixed, as they enter this algorithm as pre-trained models. At a given iteration, the algorithm fixes $z_i$ and takes a descent step in $z_k$, and vice versa. The gradient step in each variable involves a forward pass and a back propagation through the generators. Section 4.3 discusses the back propagation and gives explicit gradient forms for descent in each of $z_i$ and $z_k$ for this particular algorithm.

The estimated deblurred image and blur kernel are obtained by a forward pass of the solutions $\hat{z}_i$ and $\hat{z}_k$ through the generators $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$. Mathematically, $(\hat{i}, \hat{k}) = (G_{\mathcal{I}}(\hat{z}_i), G_{\mathcal{K}}(\hat{z}_k))$.

4 Image Deblurring Algorithm

Our approach requires pre-trained generative models $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$ for the classes $\mathcal{I}$ and $\mathcal{K}$, respectively. To this end, we train GANs and VAEs on the clean images and blur kernels. We briefly recap the training process below.


4.1 Training the Generative Models

A generative model¹ $G$ is trained either via adversarial learning [12] or via variational inference [13].

Generative adversarial networks (GANs) learn the distribution $p(i)$ of images in class $\mathcal{I}$ by playing an adversarial game. A discriminator network $D$ learns to differentiate between true images sampled from $p(i)$ and those produced by the generator network $G$, while $G$ tries to fool the discriminator. The cost function describing the game is given by

$$ \min_G \max_D \; \mathbb{E}_{p(i)} \log D(i) + \mathbb{E}_{p(z)} \log\bigl(1 - D(G(z))\bigr), $$

where $p(z)$ is the distribution of the latent random variables $z$, usually defined to be a known and simple distribution such as $p(z) = \mathcal{N}(0, I)$.
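As a small numerical illustration of the cost function above, the following sketch estimates both expectations by Monte Carlo in a toy one-dimensional setting. The `generator` and `discriminator` functions are hypothetical stand-ins chosen for illustration, not the networks used in this paper. With a discriminator that outputs 1/2 everywhere, the value equals −log 4, the value of the game at the equilibrium described in [12].

```python
import numpy as np

def gan_value(discriminator, generator, real_samples, latent_samples):
    """Monte Carlo estimate of E_p(i)[log D(i)] + E_p(z)[log(1 - D(G(z)))]."""
    d_real = discriminator(real_samples)
    d_fake = discriminator(generator(latent_samples))
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=1000)   # toy "images" drawn from p(i)
z = rng.standard_normal(1000)                      # latent samples from N(0, 1)

generator = lambda z: 2.0 + z                      # hypothetical generator matching p(i)
discriminator = lambda x: np.full_like(x, 0.5)     # discriminator that cannot tell real from fake

value = gan_value(discriminator, generator, real, z)
```

A discriminator that separates the two sample sets better than chance raises this value, which is what the inner maximization over $D$ seeks.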

Variational autoencoders (VAEs) learn the distribution $p(i)$ by maximizing a lower bound on the log likelihood:

$$ \log p(i) \geq \mathbb{E}_{q(z|i)} \log p(i|z) - \mathrm{KL}\bigl(q(z|i) \,\|\, p(z)\bigr), $$

where the second term on the right-hand side is the Kullback-Leibler divergence between the known distribution $p(z)$ and $q(z|i)$. The distribution $q(z|i)$ is a proxy for the unknown $p(z|i)$. Under a rich enough function model $q(z|i)$, the lower bound is expected to be tight. The functional forms of $q(z|i)$ and $p(i|z)$ are each modelled via a deep network. The right-hand side is maximized by tweaking the network parameters. The deep network $p(i|z)$ is the generative model that produces samples of $p(i)$ from the latent representation $z$.
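For intuition about the KL term, the sketch below assumes the common choice of a diagonal Gaussian $q(z|i) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ together with $p(z) = \mathcal{N}(0, I)$; the text above does not fix the form of $q$, so this is an assumption. Under it, the divergence has the closed form $\tfrac{1}{2}\sum_j (\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1)$. The function name is illustrative.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

kl_zero = kl_to_standard_normal(np.zeros(50), np.zeros(50))   # q coincides with the prior
kl_shift = kl_to_standard_normal(np.ones(50), np.zeros(50))   # mean shifted by 1 in each of 50 dims
```

The penalty vanishes only when $q(z|i)$ coincides with the prior, and it grows as the encoder output moves away from it.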

The generative model $G_{\mathcal{I}}$ for the face images is trained using adversarial learning, which gives visually better results. The generative models $G_{\mathcal{I}}$ for the other image datasets and $G_{\mathcal{K}}$ for the blur kernels are trained using the variational inference framework above.

4.2 Naive Deblurring

After the training described in the last section, the weights of the generative models $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$ are fixed for the rest of the algorithm. To deblur an image $y$, the simplest possible strategy is to find the image closest to $y$ in the range of the given generator $G_{\mathcal{I}}$ of clean images. Mathematically, this amounts to solving the following optimization program:

$$ \hat{z}_i = \operatorname*{argmin}_{z_i \in \mathbb{R}^l} \| y - G_{\mathcal{I}}(z_i) \|, \qquad \hat{i} = G_{\mathcal{I}}(\hat{z}_i). \tag{4} $$

Although the program is non-convex, a local minimum $\hat{z}_i$ can be found via gradient descent implemented using the back propagation algorithm. The recovered image $\hat{i}$ is obtained by a forward pass of $\hat{z}_i$ through the generative model $G_{\mathcal{I}}$. As expected, this approach fails to produce reasonable recovery results; see Figure 3. The reason is that this back-projection approach completely ignores the knowledge of the forward blur model in (1).

We now address this shortcoming by including the forward model and blur kernel in our proposed objective.
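The projection in (4) can be illustrated on a toy problem. The sketch below assumes a linear "generator" $G(z) = Az$, a hypothetical stand-in for a trained network, chosen so that the result can be compared with the closed-form least-squares solution. It minimizes the squared norm, which has the same minimizer as (4) and a smooth gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 32, 4                                    # ambient and latent dimensions (toy sizes)
A = rng.standard_normal((n, l))                 # hypothetical linear generator: G(z) = A @ z
G = lambda z: A @ z

y = rng.standard_normal(n)                      # stand-in for a blurred observation

# Gradient descent on f(z) = ||y - G(z)||^2, whose gradient is -2 A^T (y - A z).
step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
z = rng.standard_normal(l)                      # random initialization
for _ in range(2000):
    z = z - step * (-2.0 * A.T @ (y - G(z)))

i_hat = G(z)                                    # closest point to y in Range(G)
z_ls = np.linalg.lstsq(A, y, rcond=None)[0]     # closed-form minimizer, for comparison only
```

The result is simply the point of the generator range nearest to the blurred input; nothing in the objective refers to the blur, which is the shortcoming noted above.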

¹ The discussion in this section applies to both $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$; therefore, we omit the subscripts.


[Figure 3: image rows labeled Blurry, Range, and Original.]

Fig. 3: Naive deblurring. We gradually increase the blur size from left to right and demonstrate the failure of naive deblurring, which finds the closest image in the range of the image generator (last row) to the blurred image.

4.3 Alternating Gradient Descent via Back Propagation

We discovered in the previous section that simply finding a clean image close to the blurred one in the range of the clean image generator $G_{\mathcal{I}}$ is not good enough. A more natural and effective strategy is instead to find a pair consisting of a clean image and a blur kernel, in the ranges of $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$, respectively, whose convolution comes as close to the blurred image $y$ as possible. As outlined in Section 3, this amounts to solving the optimization program below:

$$ \operatorname*{argmin}_{z_i \in \mathbb{R}^l,\; z_k \in \mathbb{R}^m} \| y - G_{\mathcal{I}}(z_i) \otimes G_{\mathcal{K}}(z_k) \|^2, \tag{5} $$

where $\otimes$ is the convolution operator. To reduce clutter, we restrict ourselves to circular convolution for the discussion in this section. Define an $n \times n$ DFT matrix $F$ as

$$ F[\omega, t] = \frac{1}{\sqrt{n}}\, e^{-j 2\pi \omega t / n}, \qquad 1 \leq \omega, t \leq n, $$

where $F[\omega, t]$ denotes the $(\omega, t)$th entry of the Fourier matrix. Since the convolution operator reduces to multiplication in the Fourier domain, we can equivalently write the objective function above in the Fourier domain as

$$ \operatorname*{argmin}_{z_i \in \mathbb{R}^l,\; z_k \in \mathbb{R}^m} \| F y - \sqrt{n}\, F G_{\mathcal{I}}(z_i) \odot F G_{\mathcal{K}}(z_k) \|^2, \tag{6} $$

where the operator $\odot$ computes the pointwise product of the entries of the vectors. Finally, incorporating the fact that the latent representation vectors $z_i$ and $z_k$ can be assumed to come from standard Gaussian distributions in both the adversarial learning and the variational inference frameworks outlined in Section 4.1, we can further add $\ell_2$ penalty terms on the latent representations. This brings us to our final optimization program for image deblurring:

$$ \operatorname*{argmin}_{z_i \in \mathbb{R}^l,\; z_k \in \mathbb{R}^m} \| F y - \sqrt{n}\, F G_{\mathcal{I}}(z_i) \odot F G_{\mathcal{K}}(z_k) \|^2 + \gamma \| z_i \|^2 + \lambda \| z_k \|^2, \tag{7} $$

where $\lambda$ and $\gamma$ are free scalar parameters. For brevity, we denote the objective by

$$ L(z_i, z_k) = \| F y - \sqrt{n}\, F G_{\mathcal{I}}(z_i) \odot F G_{\mathcal{K}}(z_k) \|^2 + \gamma \| z_i \|^2 + \lambda \| z_k \|^2. $$
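The equivalence between the time-domain and Fourier-domain residuals can be checked numerically. The sketch below is illustrative only: it uses NumPy's zero-based unitary FFT (`norm="ortho"`) in place of the matrix $F$, random vectors in place of the generator outputs, and hypothetical helper names `circ_conv`, `udft`, and `objective`.

```python
import numpy as np

def circ_conv(i, k):
    """Circular convolution of two length-n real vectors, computed directly."""
    n = len(i)
    return np.array([sum(i[s] * k[(t - s) % n] for s in range(n)) for t in range(n)])

def udft(x):
    """Unitary DFT, i.e. the transform with the 1/sqrt(n) scaling."""
    return np.fft.fft(x, norm="ortho")

def objective(y, i, k, z_i, z_k, gamma, lam):
    """Objective (7), given generator outputs i = G_I(z_i) and k = G_K(z_k)."""
    n = len(y)
    residual = udft(y) - np.sqrt(n) * udft(i) * udft(k)
    return np.sum(np.abs(residual) ** 2) + gamma * z_i @ z_i + lam * z_k @ z_k

rng = np.random.default_rng(0)
n = 16
i, k = rng.standard_normal(n), rng.standard_normal(n)   # stand-ins for generator outputs
y = circ_conv(i, k) + 0.01 * rng.standard_normal(n)     # blurred, noisy observation

time_domain = np.sum((y - circ_conv(i, k)) ** 2)        # ||y - i (*) k||^2, as in (5)
fourier_domain = objective(y, i, k, np.zeros(1), np.zeros(1), 0.0, 0.0)   # as in (6)
```

The two values agree because the unitary DFT preserves the Euclidean norm, and the factor $\sqrt{n}$ compensates for the $1/\sqrt{n}$ scaling applied to each of the two transformed factors.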


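Algorithm 1 Alternating Gradient Descent
Input: $y$, $G_{\mathcal{I}}$, $G_{\mathcal{K}}$, and $\eta$
Output: Estimates $\hat{i}$ and $\hat{k}$
Initialize: $z_i^{(0)} \sim \mathcal{N}(0, I)$, $z_k^{(0)} \sim \mathcal{N}(0, I)$
for $t = 0, 1, 2, \ldots$ do
    $z_i^{(t+1)} \leftarrow z_i^{(t)} - \eta \nabla_{z_i} L(z_i^{(t)}, z_k^{(t)})$;
    $z_k^{(t+1)} \leftarrow z_k^{(t)} - \eta \nabla_{z_k} L(z_i^{(t)}, z_k^{(t)})$;
end for
$\hat{i} \leftarrow G_{\mathcal{I}}(\hat{z}_i)$, $\hat{k} \leftarrow G_{\mathcal{K}}(\hat{z}_k)$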

As pointed out earlier, the objective function above is non-convex due to the non-linearities introduced by the pointwise multiplication and the generative models; therefore, we resort to finding a local minimum by taking alternating gradient descent steps in the unknowns $z_i$ and $z_k$. We begin by initializing each of the vectors $z_i$ and $z_k$ with a sample from a standard Gaussian density. To mitigate the non-convexity of the above program, we use several random initializations of the latent variables and find a minimizer starting from each initial pair (we refer to these as random restarts). From these candidate solutions, we select as the final solution the one that best minimizes (6), whose value we call the residual loss. At a given step, we fix one of the two unknown vectors $z_i$, $z_k$ and take a gradient step in the other. Algorithm 1 formally introduces the proposed alternating gradient descent scheme.
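The scheme described above, including random restarts, can be sketched on a toy problem. Everything specific in this sketch is an assumption made for illustration: the generators are linear maps $A$ and $B$ rather than trained networks, the sizes and parameter values are arbitrary, each variable is updated using the latest value of the other, and the step-halving safeguard in `safeguarded_step` is an addition that is not part of Algorithm 1. The gradients are the ordinary real-variable gradients of (7) for this linear case.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, m = 16, 4, 3                               # toy signal and latent dimensions
A = rng.standard_normal((n, l)) / np.sqrt(n)     # hypothetical linear image generator G_I(z) = A @ z
B = rng.standard_normal((n, m)) / np.sqrt(n)     # hypothetical linear blur generator  G_K(z) = B @ z
gamma, lam, eta, steps, restarts = 0.01, 0.01, 0.05, 300, 5

F = lambda x: np.fft.fft(x, norm="ortho")        # unitary DFT
Fh = lambda x: np.fft.ifft(x, norm="ortho")      # its conjugate transpose

# Synthetic observation: circular blur of a generated image plus Gaussian noise.
i_true, k_true = A @ rng.standard_normal(l), B @ rng.standard_normal(m)
y = np.fft.ifft(np.fft.fft(i_true) * np.fft.fft(k_true)).real + 0.01 * rng.standard_normal(n)

def residual(z_i, z_k):
    return np.sqrt(n) * F(A @ z_i) * F(B @ z_k) - F(y)

def residual_loss(z_i, z_k):                     # objective (6)
    return np.sum(np.abs(residual(z_i, z_k)) ** 2)

def loss(z_i, z_k):                              # objective (7)
    return residual_loss(z_i, z_k) + gamma * z_i @ z_i + lam * z_k @ z_k

def grad_zi(z_i, z_k):
    r, b = residual(z_i, z_k), F(B @ z_k)
    return 2 * np.sqrt(n) * (A.T @ Fh(r * np.conj(b))).real + 2 * gamma * z_i

def grad_zk(z_i, z_k):
    r, a = residual(z_i, z_k), F(A @ z_i)
    return 2 * np.sqrt(n) * (B.T @ Fh(r * np.conj(a))).real + 2 * lam * z_k

def safeguarded_step(f, z, g, step):
    """Gradient step; the step is halved until f does not increase (safeguard, not in Algorithm 1)."""
    f0 = f(z)
    while step > 1e-12:
        candidate = z - step * g
        if f(candidate) <= f0:
            return candidate
        step *= 0.5
    return z

start_losses, final_losses, solutions = [], [], []
for _ in range(restarts):
    z_i, z_k = rng.standard_normal(l), rng.standard_normal(m)   # random initialization
    start_losses.append(loss(z_i, z_k))
    for t in range(steps):
        z_i = safeguarded_step(lambda z: loss(z, z_k), z_i, grad_zi(z_i, z_k), eta)  # z_k fixed
        z_k = safeguarded_step(lambda z: loss(z_i, z), z_k, grad_zk(z_i, z_k), eta)  # z_i fixed
    final_losses.append(loss(z_i, z_k))
    solutions.append((residual_loss(z_i, z_k), z_i, z_k))

# Keep the restart with the smallest residual loss (6) and map it through the generators.
_, z_i_hat, z_k_hat = min(solutions, key=lambda s: s[0])
i_hat, k_hat = A @ z_i_hat, B @ z_k_hat
```

With trained networks in place of $A$ and $B$, the two gradient functions would be replaced by back propagation through the generators. The selection by residual loss at the end mirrors the random-restart rule described above.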

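We now compute the gradient expressions² for each of the variables $z_i$ and $z_k$. Start by defining the residual $r_t$ at the $t$th iteration:

$$ r_t := F G_{\mathcal{K}}(z_k^t) \odot \sqrt{n}\, F G_{\mathcal{I}}(z_i^t) - F y. $$

Let $\dot{G}_{\mathcal{K}} = \frac{\partial}{\partial z_k} G_{\mathcal{K}}(z_k)$ and $\dot{G}_{\mathcal{I}} = \frac{\partial}{\partial z_i} G_{\mathcal{I}}(z_i)$. Then it is easy to see that

$$ \nabla_{z_k^t} L(z_i^t, z_k^t) = \dot{G}_{\mathcal{K}}^* F^* \bigl[ r_t \odot \sqrt{n}\, F G_{\mathcal{I}}(z_i^t) \bigr] + \lambda z_k^t, $$

$$ \nabla_{z_i^t} L(z_i^t, z_k^t) = \sqrt{n}\, \dot{G}_{\mathcal{I}}^* F^* \bigl[ r_t \odot F G_{\mathcal{K}}(z_k^t) \bigr] + \gamma z_i^t. $$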
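Gradient expressions of this kind can be checked numerically on a toy problem. The sketch below assumes linear generators $G_{\mathcal{I}}(z) = Az$ and $G_{\mathcal{K}}(z) = Bz$ (hypothetical stand-ins, so that the Jacobians are simply $A$ and $B$) and real latent vectors. For real variables, the ordinary gradient of the objective carries a factor of 2 and a complex conjugate on the Fourier factor that is held fixed; that form is a derivation for this toy case only, and it is compared here against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, m = 16, 4, 3
A = rng.standard_normal((n, l)) / np.sqrt(n)   # hypothetical linear image generator: G_I(z) = A @ z
B = rng.standard_normal((n, m)) / np.sqrt(n)   # hypothetical linear blur generator:  G_K(z) = B @ z
y = rng.standard_normal(n)
gamma, lam = 0.1, 0.1

F = lambda x: np.fft.fft(x, norm="ortho")      # unitary DFT
Fh = lambda x: np.fft.ifft(x, norm="ortho")    # its conjugate transpose

def loss(z_i, z_k):
    r = np.sqrt(n) * F(A @ z_i) * F(B @ z_k) - F(y)
    return np.sum(np.abs(r) ** 2) + gamma * z_i @ z_i + lam * z_k @ z_k

def grads(z_i, z_k):
    a, b = F(A @ z_i), F(B @ z_k)
    r = np.sqrt(n) * a * b - F(y)              # residual, same sign convention as r_t
    g_i = 2 * np.sqrt(n) * (A.T @ Fh(r * np.conj(b))).real + 2 * gamma * z_i
    g_k = 2 * np.sqrt(n) * (B.T @ Fh(r * np.conj(a))).real + 2 * lam * z_k
    return g_i, g_k

def numerical_grad(f, z, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(z)
    for j in range(len(z)):
        e = np.zeros_like(z)
        e[j] = eps
        g[j] = (f(z + e) - f(z - e)) / (2 * eps)
    return g

z_i, z_k = rng.standard_normal(l), rng.standard_normal(m)
g_i, g_k = grads(z_i, z_k)
g_i_num = numerical_grad(lambda z: loss(z, z_k), z_i)
g_k_num = numerical_grad(lambda z: loss(z_i, z), z_k)
```

Each analytic gradient applies the inverse transform to the residual weighted by the other factor and then the transposed Jacobian, which is the same structure as the two expressions above.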

For illustration, take the example of a two-layer generator $G_{\mathcal{I}}(z_i)$, which is simply

$$ G_{\mathcal{I}}(z_i) = \mathrm{relu}\bigl(W_2\, \mathrm{relu}(W_1 z_i)\bigr), $$

where $W_1 \in \mathbb{R}^{l_1 \times l}$ and $W_2 \in \mathbb{R}^{n \times l_1}$ are the weight matrices of the first and second layers, respectively. In this case $\dot{G}_{\mathcal{I}} = W_{2,z_i} W_{1,z_i}$, where $W_{1,z_i} = \mathrm{diag}(W_1 z_i > 0)\, W_1$ and $W_{2,z_i} = \mathrm{diag}(W_2 W_{1,z_i} z_i > 0)\, W_2$. From this expression, it is clear that the alternating gradient descent algorithm for blind image deblurring requires alternate back propagation through the generators $G_{\mathcal{I}}$ and $G_{\mathcal{K}}$, as illustrated in Figure 2. To update $z_i^{t-1}$, we fix $z_k^{t-1}$, compute the Fourier transform of a scaling of the residual vector $r_t$, and back propagate it through the generator $G_{\mathcal{I}}$. A similar update strategy is employed for $z_k^{t-1}$, keeping $z_i^t$ fixed.

² For a function $f(x)$ of a variable $x = u + \iota v$, the Wirtinger derivatives of $f(x)$ with respect to $x$ and $\bar{x}$ are defined as

$$ \frac{\partial f}{\partial x} = \frac{1}{2}\left(\frac{\partial f}{\partial u} - \iota \frac{\partial f}{\partial v}\right), \quad \text{and} \quad \frac{\partial f}{\partial \bar{x}} = \frac{1}{2}\left(\frac{\partial f}{\partial u} + \iota \frac{\partial f}{\partial v}\right). $$

Table 1: Architectures of the VAEs used for Blur, MNIST and SVHN. Here, conv(m, n, s) represents a convolutional layer with m filters of size n and stride s. Similarly, convT represents a transposed convolution layer. maxpool(n, m) represents a max pooling layer with stride m and pool size n. Finally, fc(m) represents a fully connected layer of size m. The decoder is designed to be the mirror reflection of the encoder in each case.

Model: Blur VAE
Encoder: conv(20, 2×2, 1) → relu → maxpool(2×2, 2) → conv(20, 2×2, 1) → relu → maxpool(2×2, 2) → fc(50), fc(50) → fc(50) → $z_k$
Decoder: $z_k$ → fc(720) → relu → reshape → upsample(2×2) → convT(20, 2×2, 1) → relu → upsample(2×2) → convT(20, 2×2, 1) → relu → convT(1, 2×2, 1) → sigmoid

Model: MNIST VAE
Encoder: conv(20, 3×3, 1) → relu → maxpool(2×2, 2) → conv(20, 3×3, 1) → relu → maxpool(2×2, 2) → fc(500) → fc(100), fc(100) → fc(100) → $z_i$
Decoder: $z_i$ → fc(500) → relu → reshape → upsample(2×2) → convT(20, 3×3, 1) → relu → upsample(2×2) → convT(20, 3×3, 1) → relu → convT(1, 3×3, 1) → sigmoid

Model: SVHN VAE
Encoder: conv(128, 2×2, 2) → batch-norm → relu → conv(256, 2×2, 2) → batch-norm → relu → conv(512, 2×2, 2) → batch-norm → relu → fc(100), fc(100) → fc(100) → $z_i$
Decoder: $z_i$ → fc(8192) → reshape → convT(512, 2×2, 2) → batch-norm → relu → convT(256, 2×2, 2) → batch-norm → relu → convT(128, 2×2, 2) → batch-norm → relu → conv(3, 1×1, 1) → sigmoid

5 Experiments and Results

In this section, we evaluate the performance of our proposed algorithm on three datasets: MNIST [22], SVHN [23], and CelebA [24]. We show its competitiveness against baseline deblurring approaches and demonstrate its robustness to extravagant blurs and noise.

5.1 Datasets and Generators Description

Image Datasets and Generators: The first dataset, MNIST, consists of 28 × 28 grayscale images of handwritten digits, with 60,000 training and 10,000 test examples. For MNIST, we trained a VAE with the architecture defined in Table 1, with latent dimension (size of $z_i$) 100, batch size 32, and learning rate 0.001, using the Adam optimizer with the remaining parameters set to their default values.

The second dataset, SVHN, consists of 32 × 32 × 3 real-world images of house numbers obtained from Google Street View images. We used the extra training set of SVHN, which contains 531,131 relatively less complicated images, of which 30,000 were held out as a test set. These images are rescaled to [0, 1] before training the VAE described in Table 1, with the size of $z_i$ equal to 100, batch size 1500, and learning rate $10^{-5}$, using the Adam optimizer.

The third dataset, CelebA, contains more than 200,000 RGB face images of size 218 × 178 × 3 of different celebrities. We center-crop the CelebA images to 64 × 64 × 3 and scale them to the range [−1, 1] before feeding them into a deep convolutional generative adversarial network (DCGAN) [25] for training with default parameters. The model is trained by updating $G_{\mathcal{I}}(z_i)$ twice and $G_{\mathcal{K}}(z_k)$ once in each cycle to avoid fast convergence.

Blur Datasets and Generators: We generate two blur datasets, A and B. Blur dataset A, modelled after camera shake blur, contains motion blurs of length 10 with angle varying uniformly between 0 and 360 degrees. Blur dataset B contains centered overlapping motion blurs (from dataset A) and Gaussian blurs. These Gaussian blurs have standard deviation varying uniformly between 0.5 and 1.5. For each dataset, we generate 80,000 blurs and split them into 60,000 training and 20,000 test blurs. We train two separate VAEs with the same architecture, one on each dataset. The model architecture of the blur VAEs is presented in Table 1. These VAEs are trained using the Adam optimizer with latent dimension (size of $z_k$) 50, batch size 5, and learning rate $10^{-5}$. After training, the decoder part is extracted as the desired generator $G_{\mathcal{K}}$.

Dataset   γ      λ      Steps (t)   Step size
MNIST     0.1    0.1    1500        $0.01\,e^{-t/250}$
SVHN      0.1    0.1    6000        $0.01\,e^{-t/1000}$
CelebA    0.01   0.01   10000       0.001

Table 2: Parameters of the proposed alternating gradient descent algorithm used in each experiment.

To show the robustness of the proposed approach, we also create a variant of dataset A, which we name Extravagant Blurs, whose blurs have length 30 instead of 10. For this dataset, we trained another VAE with the same architecture as before except for the last layer of the decoder, which is relu instead of sigmoid. These large blurs degrade the images to the point that they are not even recognizable. Recovering sharp images from these blurred images is a significant challenge.

5.2 Blind Image Deblurring Results

In this section, we evaluate the performance of our algorithm on noisy blurred images generated by convolving images $i$ and blurs $k$ from their respective test sets. We train the generative models on the training data of the respective datasets (VAEs for MNIST and SVHN, and a GAN for CelebA). All experiments are performed by adding 1%³ Gaussian noise to the blurry images unless stated otherwise. The parameters of our proposed alternating gradient descent algorithm are given in Table 2.

Since the deblurred image produced by our proposed algorithm is constrained to lie in the range of the generator $G_{\mathcal{I}}$, we introduce $i_{\mathrm{range}}$, the closest image in $\ell_2$ distance to $i \in \mathcal{I}$ lying in the range of the generator $G_{\mathcal{I}}$. This is found by using (4) with $y = i$.

Baseline Methods: We choose dark prior deblurring (DP) [26] and a convolutional neural network (CNN) [9] trained in an end-to-end setting for deblurring as baseline algorithms. DP makes a prior assumption on the sparsity of the dark channel (a prior based on the statistics of clean images) of the blurred image and has shown promising results. We use default parameters for all experiments. The CNN [9], on the other hand, demonstrates the power of end-to-end deep convolutional neural networks and shows that they significantly outperform traditional prior-based methods. We choose the CNN as a proxy for the latest deblurring methods, in which neural networks are trained end-to-end to deblur images directly. Through experiments, we highlight the weaknesses of these deblurring methods. We trained and fine-tuned this network on CelebA and SVHN, separately for each blur dataset, with 150,000 training images each and 1% Gaussian noise.

³ For an image scaled between 0 and 1, Gaussian noise of 1% translates to Gaussian noise with standard deviation σ = 0.01 and mean µ = 0.

[Figure 4: image rows labeled Original, Blurry, DP, CNN, Ours, and Range.]

Fig. 4: Image deblurring results on SVHN, CelebA and MNIST. Images from the test set are convolved with the corresponding blur kernels to produce blurry images. Qualitatively, our approach is better than Dark Prior (DP) and competitive against the CNN. It can be seen that the recovered image $\hat{i}$ is close to $i_{\mathrm{range}}$, which is the closest image in the generator range to the original image.


Methods      CelebA                      SVHN                        CelebA (Blur size 30)
             DP      CNN     Ours        DP      CNN     Ours        DP      CNN     Ours
PSNR (dB)    19.674  28.338  23.525      14.244  21.450  21.606      -       19.441  20.072
SSIM         0.781   0.895   0.822       0.387   0.869   0.856       -       0.771   0.784

Table 3: Quantitative comparison of the proposed approach with dark prior (DP) and CNN for the CelebA and SVHN datasets with blur test sets A and B. We randomly sample 40 images from the test sets of CelebA and SVHN; 20 are blurred with blur A and 20 with blur B. This table shows the average PSNR and SSIM on these images.

Results and Discussion:⁴ Qualitative and quantitative results of the proposed approach and the baseline algorithms are shown in Figure 4 and Table 3, respectively, for blur datasets A and B. We compare the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [28] of the proposed algorithm with these baseline methods. It can be seen that our approach is better than dark prior in terms of both PSNR and SSIM, and competitive against the end-to-end trained CNN. The close resemblance of the recovered images to the corresponding images in the generator's range shows that in our approach the generator is not just hallucinating details, but finds the closest image in its range to the true image.
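For reference, PSNR follows the standard definition $10 \log_{10}(\text{MAX}^2 / \text{MSE})$. The sketch below assumes images stored as flat lists of floats in [0, 1]; the function name `psnr` and the `max_value` default are choices made for this example, and it is not claimed to be the evaluation code behind Table 3. SSIM [28] is more involved and is not reproduced here.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With this definition, a uniform per-pixel error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and hence a PSNR of 20 dB.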

In Table 3, there is a significant performance gap in both PSNR and SSIM between our approach and CNN for CelebA, whereas for SVHN the proposed approach performs just as well as CNN. This can be attributed to the range of the generator $G_I$ for SVHN. Since SVHN images are relatively simple compared to CelebA, its generator almost entirely spans the set $I$. This allows the SVHN generator to recover almost exact images, giving excellent performance. Face images are much more complex, so they may not always lie in the range of their generator. This explains the greater error for CelebA compared to SVHN. To provide evidence for this claim, we investigate the sources of error.

Range Error Analysis: As explained in [14], three sources of error can exist: (i) the estimates $z_i$ and $z_k$ found by gradient descent may not be the optimal solutions; (ii) the true solution may lie far from the range of the generators; and (iii) error due to noise. Because of random restarts, the first error is reduced considerably, whereas the second error may dominate if the true image does not lie in or near the range of the generator.

Here, we analyse how this error affects the performance of our approach. For every tuple $(i_{\text{test}}, k_{\text{test}}, y_{\text{test}})$, where $i_{\text{test}}$ and $k_{\text{test}}$ are the image and blur from the test set and $y_{\text{test}}$ is the corresponding blurred image, we generate another tuple $(i_{\text{range}}, k_{\text{range}}, y_{\text{range}})$, where $i_{\text{range}}$ is the closest image to $i_{\text{test}}$ in Range($G_I$), $k_{\text{range}}$ is the closest kernel to $k_{\text{test}}$ in Range($G_K$), and $y_{\text{range}}$ is the corresponding blurred image. Using $y_{\text{test}}$ and $y_{\text{range}}$, we use our approach to recover estimates of the image and blur in each case. In the latter case, since the images already lie in the generator range, the second error will be eliminated, but

4 Although we formally analyse our algorithm on DCGAN, we show in Figure 1 that any pre-trained generative model that can generate high-resolution images, such as PG-GAN [27], can also be used with excellent qualitative results. This highlights the transferability of our algorithm: only $G_I$ needs to be changed for different image sets with possibly different resolutions, while the same blur generator is used for deblurring. We do not report a detailed analysis for PG-GAN due to computational limitations.


Quantity                      Definition                                               Value
Test reconstruction error     $e_{\text{test}} = \|i_{\text{test}} - \hat{i}\|$        0.000917
Range reconstruction error    $e_{\text{range}} = \|i_{\text{range}} - \hat{i}\|$      0.000334
Improvement ratio             $e_{\text{test}} / e_{\text{range}}$                     2.746

Table 4: Range error analysis: We compare the average (per pixel) test and range reconstruction errors for a total of 200 images, with 50 images from each image and blur dataset. The improvement ratio indicates that the reconstruction error is less than half on average when the image being deblurred lies in the range of the generator.

(a) Original (b) Blurry (c) CNN (d) Ours (e) Range

Fig. 5: Blind image deblurring with large blurs. Images from the CelebA test set along with the corresponding blur kernels are shown in (a). The corresponding blurry images (b) are recovered using CNN (c) and the proposed approach (d), with the closest image in the generator range shown in (e).

(a) 10% (b) 20% (c) 30% (d) 10% (e) 20% (f) 30%

Fig. 6: Noise analysis. We show our recovered images (second row) from noisy blurred images (first row) and compare our results with CNN (third row), varying the noise uniformly from 10% to 30%.

this may not be the case for test set images. By comparing the per-pixel reconstruction errors $\|i_{\text{test}} - \hat{i}\|$ and $\|i_{\text{range}} - \hat{i}\|$, where $\hat{i}$ is the recovered image in each case, we can quantify the performance improvement when images lie in the range of their respective generators. From Table 4, it can be seen that the reconstruction error improves by a factor of 2.7 if images lie in the range of their respective generators.
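The quantities in Table 4 can be sketched as follows. The helper names are hypothetical, images are assumed to be flat lists of floats, and the per-pixel error is taken here to be the squared $\ell_2$ distance divided by the number of pixels; the text does not state whether the norm is squared, so that choice is an assumption of this sketch.

```python
def per_pixel_error(target, recovered):
    """Squared l2 distance between two images, divided by the pixel count.

    Squaring the norm is an assumption; the paper only writes ||.||.
    """
    if len(target) != len(recovered):
        raise ValueError("images must have the same number of pixels")
    return sum((t - r) ** 2 for t, r in zip(target, recovered)) / len(target)


def improvement_ratio(e_test, e_range):
    """Ratio e_test / e_range of the averaged reconstruction errors."""
    return e_test / e_range


# Averaged errors as reported in Table 4.
ratio = improvement_ratio(0.000917, 0.000334)
print(round(ratio, 3))  # 2.746
```

Dividing the two reported averages reproduces the improvement ratio listed in Table 4, which is consistent with the ratio being taken between averaged errors.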

Robustness to Extravagant Blurs and Noise: Here, we show the effectiveness of the proposed algorithm on large blurs and high noise. To demonstrate this, we use images from CelebA and blurs from the Extravagant Blur test set. Dark Prior (DP) completely fails to estimate the true image from these blurry images, so we do not report its quantitative results in Table 3. We retrained our previous CNN model on these extravagantly blurred images and report quantitative results in Table 3. These results show that, in the regime of large blurs, our algorithm can outperform end-to-end trained models. These end-to-end trained models suffer on large blurs because they do not embed the forward blur model. Qualitatively, our results shown in Figure 5 are sharper and more visually appealing than the CNN results.

Our proposed algorithm performs surprisingly well in the high-noise regime, as shown qualitatively in Figure 6 and quantitatively in Table 5. For comparison, we used the CNN trained on CelebA and SVHN with 1% Gaussian noise to show the robustness of our


                     CelebA                                     SVHN
Noise (%)            10      20      30      40      50         10      20      30      40      50
CNN   PSNR (dB)      16.556  14.216  13.450  12.275  11.772     18.908  14.464  12.634  11.747  10.803
CNN   SSIM           0.621   0.407   0.334   0.220   0.174      0.864   0.624   0.4893  0.308   0.2205
Ours  PSNR (dB)      22.055  21.601  21.765  21.682  20.678     24.765  20.823  20.963  15.120  13.213
Ours  SSIM           0.850   0.832   0.846   0.821   0.815      0.967   0.930   0.935   0.778   0.733

Table 5: Performance of the proposed approach at different noise levels. Quantitative comparison with CNN in terms of PSNR and SSIM. Our proposed method outperforms CNN at all noise levels by a significant margin.

approach compared to end-to-end trained models. Quantitative results are shown in Table 5. Our algorithm shows impressive performance, outperforming CNN by roughly 5 to 8 dB in PSNR at noise levels of 10% to 30%. The CNN-based approach works well only at the noise level on which it was trained (1% Gaussian noise here) and fails at other noise levels, as shown in Figure 6, while our approach gives impressive results even at 50% noise (Table 5).
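The PSNR margin between the two methods can be read directly off Table 5. The short sketch below tabulates it per dataset and noise level; the values are copied from Table 5, while the dictionary layout and the function name `psnr_gaps` are choices made for this example.

```python
NOISE_LEVELS = [10, 20, 30, 40, 50]

# PSNR (dB) values copied from Table 5.
PSNR_TABLE = {
    "CelebA": {
        "CNN": [16.556, 14.216, 13.450, 12.275, 11.772],
        "Ours": [22.055, 21.601, 21.765, 21.682, 20.678],
    },
    "SVHN": {
        "CNN": [18.908, 14.464, 12.634, 11.747, 10.803],
        "Ours": [24.765, 20.823, 20.963, 15.120, 13.213],
    },
}


def psnr_gaps(dataset):
    """PSNR difference (Ours minus CNN) in dB at each noise level."""
    rows = PSNR_TABLE[dataset]
    return [round(ours - cnn, 3) for ours, cnn in zip(rows["Ours"], rows["CNN"])]


for name in PSNR_TABLE:
    for level, gap in zip(NOISE_LEVELS, psnr_gaps(name)):
        print(f"{name} {level}%: {gap:.3f} dB")
```

The gap is positive in every column of Table 5, and its size varies with both the dataset and the noise level.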

5.3 Limitations

Currently, our approach is limited by the range of the generators $G_I(z_i)$ and $G_K(z_k)$. Although GANs are capable of generating sharp and realistic-looking images, their generative capabilities are still limited. A well-trained generator with a much greater range can recover exact images. Similarly, gradient descent can get stuck in local minima, as shown in Figure 7, which demonstrates failure cases in which more random restarts can improve performance. Although we use 10 random restarts in all experiments, this value was found experimentally and is by no means optimal. Moreover, fine-tuning the parameters of the models and optimizers is also a challenge and needs further investigation.

(a) Range (b) 10 RR (c) 25 RR

Fig. 7: Failure case of image recovery on the large blur dataset. As is clear from the figure, increasing the number of random restarts (RR) can improve the reconstruction.
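The effect of random restarts on a non-convex objective can be illustrated with a toy example. The sketch below is not the deblurring objective: the one-dimensional loss, the step size, the iteration count, and the initialization interval are all assumptions chosen so that gradient descent has one global and one local minimum to fall into.

```python
import random


def loss(z):
    # Toy non-convex objective: global minimum near z = -1.30,
    # local minimum near z = 1.13 (stand-in for the real loss).
    return z ** 4 - 3.0 * z ** 2 + z


def grad(z):
    return 4.0 * z ** 3 - 6.0 * z + 1.0


def gradient_descent(z0, step=0.01, iters=500):
    """Plain gradient descent from the initial point z0."""
    z = z0
    for _ in range(iters):
        z -= step * grad(z)
    return z


def best_of_restarts(n_restarts, seed=0):
    """Run gradient descent from n_restarts random starts; keep the lowest loss."""
    rng = random.Random(seed)
    candidates = [gradient_descent(rng.uniform(-2.0, 2.0)) for _ in range(n_restarts)]
    return min(candidates, key=loss)
```

With a fixed seed, the first 10 starting points of a 25-restart run coincide with those of a 10-restart run, so the 25-restart result can never have a higher loss. Each extra restart is another chance to start in the basin of the global minimum, at proportionally higher computational cost.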

6 Conclusion and Future Work

We leverage the power of deep generative priors to regularize the ill-posed and non-linear inverse problem of blind image deblurring. We show that this regularized problem is


efficiently solved by a simple alternating gradient descent algorithm. Through extensive experiments, we show that our results outperform conventional engineered linear models and are competitive against end-to-end deep learning approaches. In contrast to these deep learning techniques, our approach does not require expensive re-training of neural networks for different noise levels and is also robust to extravagant blurs.

The initial success of deep generative priors in solving the blind image deblurring problem, highlighted in this paper, opens up exciting questions about applying a similar strategy to other related ill-posed, non-linear inverse problems, such as phase retrieval from magnitude-only measurements of images, a problem of great interest in the physics and biomedical imaging communities [29].

References

1. Chan, T.F., Wong, C.K.: Total variation blind deconvolution. IEEE Transactions on Image Processing 7(3) (1998) 370–375

2. Fergus, R., Singh, B., Hertzmann, A., Roweis, S.T., Freeman, W.T.: Removing camera shake from a single photograph. In: ACM Transactions on Graphics (TOG). Volume 25., ACM (2006) 787–794

3. Xu, L., Zheng, S., Jia, J.: Unnatural l0 sparse representation for natural image deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2013) 1107–1114

4. Michaeli, T., Irani, M.: Blind deblurring using internal patch recurrence. In: European Conference on Computer Vision, Springer (2014) 783–798

5. Ahmed, A., Recht, B., Romberg, J.: Blind deconvolution using convex programming. IEEE Transactions on Information Theory 60(3) (2014) 1711–1732

6. Ren, W., Cao, X., Pan, J., Guo, X., Zuo, W., Yang, M.H.: Image deblurring via enhanced low-rank prior. IEEE Transactions on Image Processing 25(7) (2016) 3426–3437

7. Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: Advances in Neural Information Processing Systems. (2009) 1033–1041

8. Schuler, C.J., Hirsch, M., Harmeling, S., Schölkopf, B.: Learning to deblur. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(7) (2016) 1439–1451

9. Hradiš, M., Kotera, J., Zemčík, P., Šroubek, F.: Convolutional neural networks for direct text deblurring. In: Proceedings of BMVC. Volume 10. (2015)

10. Chakrabarti, A.: A neural approach to blind motion deblurring. In: European Conference on Computer Vision, Springer (2016) 221–235

11. Svoboda, P., Hradiš, M., Maršík, L., Zemčík, P.: CNN for license plate motion deblurring. In: Image Processing (ICIP), 2016 IEEE International Conference on, IEEE (2016) 3832–3836

12. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. (2014) 2672–2680

13. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)

14. Bora, A., Jalal, A., Price, E., Dimakis, A.G.: Compressed sensing using generative models. arXiv preprint arXiv:1703.03208 (2017)

15. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint (2016)


16. Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) 5485–5493

17. Xu, X., Sun, D., Pan, J., Zhang, Y., Pfister, H., Yang, M.H.: Learning to super-resolve blurry face and text images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) 251–260

18. Nimisha, T., Singh, A.K., Rajagopalan, A.: Blur-invariant deep learning for blind-deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) 4752–4760

19. Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. arXiv preprint arXiv:1711.07064 (2017)

20. Samangouei, P., Kabkab, M., Chellappa, R.: Defense-gan: Protecting classifiers against adversarial attacks using generative models. (2018)

21. Ilyas, A., Jalal, A., Asteri, E., Daskalakis, C., Dimakis, A.G.: The robust manifold defense: Adversarial training using generative models. arXiv preprint arXiv:1712.09196 (2017)

22. LeCun, Y.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998)

23. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning. Volume 2011. (2011) 5

24. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision. (2015) 3730–3738

25. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. In: Advances in Neural Information Processing Systems. (2016) 2234–2242

26. Pan, J., Sun, D., Pfister, H., Yang, M.H.: Blind image deblurring using dark channel prior. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2016) 1628–1636

27. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)

28. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4) (2004) 600–612

29. Jaganathan, K., Eldar, Y.C., Hassibi, B.: Phase retrieval: An overview of recent developments. arXiv preprint arXiv:1510.07713 (2015)