end-to-end learning for image burst deblurringpatwie.com/pdf/wieschollek_accv2016_poster.pdf ·...

1
End-to-End Learning for Image Burst Deblurring P. Wieschollek 1,2 , B. Sch¨ olkopf 1 , H. P.A. Lensch 2 , M. Hirsch 1 1 Max Planck Institute for Intelligent Systems, T¨ ubingen, 2 University of T¨ ubingen In a nutshell: We describe a learning-based approach to multi-frame blind deconvolution using a neural network enabling information sharing across images. Our system is trained end-to-end using standard back- propagation on a set of artificially generated training examples, enabling competitive performance in multi-frame blind deconvolution, both with respect to quality and runtime. Multi-frame Blind Deblurring Problem Multi-frame blind deconvolution aims at recovering a sharp image from a series of blurry photos. burst of blurry input images sharp output Multiple photos of the same scene facilitate the reconstruction of a single sharp image Given a burst of observed images Y 1 ,Y 2 ,...,Y N ∈I capturing the same scene, our image model reads Y t = k t * X + ε t , for all t (1) where * denotes the convolution operator, k t blur kernel for observation Y t and ε t additive zero-mean Gaussian noise. Goal Predict a single sharp image ˆ X satisfying all constraints in (1). Network Architecture Our neural network mapping π (θ ) : I k p →I p , (y 1 ,y 2 ,...,y k ) 7ˆ x = π (θ ) (y 1 ,y 2 ,...,y k ). operates on overlapping image patches y t ∈I p using a sliding window approach. We directly minimize the L2-loss. The architecture π (θ ) (·) consists of several stages: 4096 4096 4096 4096 b × 65 2 y t 1024 2048 b × 33 2 ˜ x t (a) Frequency-Band-Analysis (b) Deconvolution Kernel Prediction b 1 b 2 b 3 b 4 b 1 b 2 b 3 b 4 mlp 1 mlp 1 mlp 1 mlp 1 deconvolution Our network is applied to all image patches from a burst in parallel. This allows us to share information between different photos of the same scene. a) Frequency Band Analysis Following the work of Chakrabarti [1] we separate the Fourier spectrum in 4 different bands by computing the discrete Fourier transform of the observed patch at three different sizes (17 × 17, 33 × 33, 65 × 65) using different sample sizes in b 1 ,b 2 ,b 3 . Hereby, band b 4 represents a low-pass band containing all coefficient max |z |≤ 4 from band b 3 . In addition, we allow each band separately to interact across all images in one burst to support early information sharing. b) Deconvolution Pairwise merging of the bands b 1 ,b 2 ,b 3 ,b 4 from the frequency band analysis (Step a) into b 0 1 ,b 0 2 ,b 0 3 ,b 0 4 is performed using fully connected layers with ReLU activation which entails a successive dimensionality reduction. The last layer directly produces a 4225 dimensional prediction of the filter coefficients of the deconvolution kernel, which is then applied to each input patch. c) Image Fusion Information fusion across entire bursts is done via a modified version of the recently proposed Fourier-Burst-Accumulation (FBA) [2] method. The vanilla FBA algorithm applies the following weighted sum to a Fourier transform ˆ α of the patch α: u α)= F -1 k X i =1 w i (ζ α i (ζ ) (x ), w i (ζ )= | ˆ α i (ζ )| p k j =1 ˆ α j (ζ ) p , (2) where w i denotes the contribution of frequency ζ of a patch α i . Contrary to [2] we propose to learn these parameters w i (ζ ) in a neural network layer by making use of information from all observed images within one burst. Value-Sharing In addition to support weight-sharing across all burst images, our architecture enables spread early interme- diate information across the entire burst, which essentially embeds a fully connected neural network for each coefficient (f ij ) t with weight sharing. mlp 1 v v We interpret each coefficient across one burst as a single vector. A transformed version (multiple 1 × 1 convolutions) of this excerpt will be placed at the same location in the output patch again. Results Comparison to state-of-the-art methods We compare the restored images with other state-of-the-art multi-image blind deconvolution algorithms such as the multi-channel blind deconvolution method of ˇ Sroubek et al. [4], the sparse-prior method of [5] and the FBA method proposed in [2]. random shot align& average ˇ Sroubek et al. Zhang et al. Delbracio et al. our approach Our trained neural network featuring the FBA-like averaging yields comparable if not superior results compared to previous approaches [4, 5, 2]. In direct comparison to the FBA results, our method is better in removing blur due to our additional prepended deconvolution module. Effect of our deconvolution module FBA and our algorithm are compared on bursts with increasing number of images of increasing quality. The individual images are sorted according to their PSNR starting with the most blurried images. The blurred input images are taken from [2]. 2 images 3 images 4 images 5 images 6 images all images FBA ours FBA ours Effect of end-to-end training One might ask, how our trained neural network compares to an approach that applies the methods of Chakrabarti [1] and Delbracio and Sapiro [2] subsequently, each in a separate step. Without end-to-end training ringing-artifacts (left) are clearly visible on the blue roof. They are significantly dampened (right) after training. Spatially-varying blur Our patch-based approach is able to handle spatially-varying blur. We took recorded camera trajectories and generated input images using the Efficient Filter Flow model of [5]. Our results (right) are consistently sharper than [2] (left) and demonstrate that our approach is also able to correct for spatially-varying blur. References [1] Ayan Chakrabarti. A Neural Approach to Blind Motion Deblurring., ECCV 2016. [2] Delbracio, Mauricio and Sapiro, Guillermo. Burst deblurring: Removing camera shake through fourier burst accumulation., CVPR 2015. [3] Tsung-Yi Lin et al. Microsoft COCO: Common Objects in Context. [4] ˇ Sroubek, Filip and Milanfar, Peyman. Robust multichannel blind deconvolution via fast alternating minimization, 2012. [5] Zhang, Haichao and Wipf, David and Zhang, Yanning. Multi-image blind deblurring using a coupled adaptive sparse prior, CVPR, 2013. [5] Hirsch, Michael and Sra, Suvrit and Sch¨ olkopf, Bernhard and Harmeling, Stefan. Efficient filter flow for space-variant multiframe blind deconvolution, 2010. goo.gl/GxxS8R

Upload: others

Post on 23-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: End-to-End Learning for Image Burst Deblurringpatwie.com/pdf/wieschollek_accv2016_poster.pdf · [2]Delbracio, Mauricio and Sapiro, Guillermo. Burst deblurring: Removing camera shake

End-to-End Learning for Image Burst DeblurringP. Wieschollek1,2, B. Scholkopf1, H. P.A. Lensch2, M. Hirsch1

1 Max Planck Institute for Intelligent Systems, Tubingen, 2 University of Tubingen

In a nutshell: We describe a learning-based approach to multi-frame blind deconvolution using a neuralnetwork enabling information sharing across images. Our system is trained end-to-end using standard back-propagation on a set of artificially generated training examples, enabling competitive performance in multi-frameblind deconvolution, both with respect to quality and runtime.

Multi-frame Blind Deblurring Problem

Multi-frame blind deconvolution aims at recovering a sharp image from a series of blurry photos.

burst of blurry input images sharp output

Multiple photos of the same scene facilitate the reconstruction of a single sharp image

Given a burst of observed images Y1, Y2, . . . , YN ∈ I capturing the same scene, our image model reads

Yt = kt ∗X + εt, for all t (1)

where ∗ denotes the convolution operator, kt blur kernel for observation Yt and εt additive zero-mean Gaussiannoise.

Goal Predict a single sharp image X satisfying all constraints in (1).

Network Architecture

Our neural network mapping

π(θ) : Ikp → Ip, (y1, y2, . . . , yk) 7→ x = π(θ)(y1, y2, . . . , yk).

operates on overlapping image patches yt ∈ Ip using a sliding window approach. We directly minimize the

L2-loss. The architecture π(θ)(·) consists of several stages:

4096409640964096

b× 652

yt

1024 2048

b× 332 xt

(a) Frequency-Band-Analysis (b) Deconvolution Kernel Prediction

b1

b2

b3

b4

b′1

b′2

b′3

b′4

mlp1

mlp1

mlp1

mlp1 deconvolution

Our network is applied to all image patches from a burst in parallel.

This allows us to share information between different photos of the same scene.

a) Frequency Band Analysis Following the work of Chakrabarti [1] we separate the Fourier spectrum in4 different bands by computing the discrete Fourier transform of the observed patch at three different sizes(17× 17, 33× 33, 65× 65) using different sample sizes in b1, b2, b3. Hereby, band b4 represents a low-passband containing all coefficient max |z | ≤ 4 from band b3.In addition, we allow each band separately to interact across all images in one burst to support early informationsharing.

b) Deconvolution Pairwise merging of the bands b1, b2, b3, b4 from the frequency band analysis (Step a)into b′1, b

′2, b′3, b′4 is performed using fully connected layers with ReLU activation which entails a successive

dimensionality reduction. The last layer directly produces a 4225 dimensional prediction of the filter coefficientsof the deconvolution kernel, which is then applied to each input patch.

c) Image Fusion Information fusion across entire bursts is done via a modified version of the recentlyproposed Fourier-Burst-Accumulation (FBA) [2] method. The vanilla FBA algorithm applies the followingweighted sum to a Fourier transform α of the patch α:

u(α) = F−1

k∑i=1

wi(ζ)αi(ζ)

(x), wi(ζ) =|αi(ζ)|p∑kj=1

∣∣αj(ζ)∣∣p , (2)

where wi denotes the contribution of frequency ζ of a patch αi . Contrary to [2] we propose to learn theseparameters wi(ζ) in a neural network layer by making use of information from all observed images within oneburst.

Value-Sharing

In addition to support weight-sharing across all burst images, our architecture enables spread early interme-diate information across the entire burst, which essentially embeds a fully connected neural network for eachcoefficient (fi j)t with weight sharing.

mlp1

v v′

We interpret each coefficient across one burst as a single vector. A transformed version

(multiple 1× 1 convolutions) of this excerpt will be placed at the same location in the output patch again.

Results

Comparison to state-of-the-art methods We compare the restored images with other state-of-the-artmulti-image blind deconvolution algorithms such as the multi-channel blind deconvolution method of Sroubeket al. [4], the sparse-prior method of [5] and the FBA method proposed in [2].

random shot align& average Sroubek et al. Zhang et al. Delbracio et al. our approach

Our trained neural network featuring the FBA-like averaging yields comparable if not superior results comparedto previous approaches [4, 5, 2]. In direct comparison to the FBA results, our method is better in removingblur due to our additional prepended deconvolution module.

Effect of our deconvolution module FBA and our algorithm are compared on bursts with increasingnumber of images of increasing quality. The individual images are sorted according to their PSNR starting withthe most blurried images. The blurred input images are taken from [2].

2 images 3 images 4 images 5 images 6 images all images

FBA

ours

FBA

ours

Effect of end-to-end training One might ask, how our trained neural network compares to an approachthat applies the methods of Chakrabarti [1] and Delbracio and Sapiro [2] subsequently, each in a separate step.Without end-to-end training ringing-artifacts (left) are clearly visible on the blue roof. They are significantlydampened (right) after training.

Spatially-varying blur Our patch-based approach is able to handle spatially-varying blur. We took recordedcamera trajectories and generated input images using the Efficient Filter Flow model of [5]. Our results(right) are consistently sharper than [2] (left) and demonstrate that our approach is also able to correct forspatially-varying blur.

References[1] Ayan Chakrabarti. A Neural Approach to Blind Motion Deblurring., ECCV 2016.

[2] Delbracio, Mauricio and Sapiro, Guillermo. Burst deblurring: Removing camera shake through fourier burst accumulation., CVPR 2015.

[3] Tsung-Yi Lin et al. Microsoft COCO: Common Objects in Context.

[4] Sroubek, Filip and Milanfar, Peyman. Robust multichannel blind deconvolution via fast alternating minimization, 2012.

[5] Zhang, Haichao and Wipf, David and Zhang, Yanning. Multi-image blind deblurring using a coupled adaptive sparse prior, CVPR, 2013.

[5] Hirsch, Michael and Sra, Suvrit and Scholkopf, Bernhard and Harmeling, Stefan. Efficient filter flow for space-variant multiframe blind deconvolution, 2010.

goo.gl/GxxS8R