OpenCL 2D Convolution Using Separable Filters - Box Filter


Upload: pi194043

Post on 14-Apr-2015

DESCRIPTION

OpenCL heterogeneous parallel program implementing 2D convolution using separable filters, with a comparison against CPU host code and 2D convolution methods, for image processing applications in OpenCV


OpenCL Image Convolution - Separable Filters

Pi19404

February 6, 2013

Contents

OpenCL Image Convolution - Separable Filters
0.1 Abstract
0.2 Separable Filters
0.3 Computation Required for Convolution Using Separable Kernel
0.4 Parallel Implementation
0.5 Comparison with CPU Implementations
0.6 Code
References


OpenCL Image Convolution - Separable Filters

0.1 Abstract

In an earlier article we looked at a parallel implementation of 2D image convolution and its application to a simple averaging box filter. In this article we look at the basics of separable filters and use them for image convolution with a heterogeneous parallel programming approach using OpenCL.

0.2 Separable Filters

Image convolution can be viewed as the output of a linear time-invariant (LTI) system whose impulse response is the convolution kernel.

Separable filters are a special case of general convolution.

A 2D kernel is called separable if it can be expressed as the outer product of two 1D vectors. It can then be decomposed into vertical and horizontal projections.
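As an illustration of the definition above, the following minimal C++ sketch builds a 2D kernel from two 1D vectors. The function names `outerProduct` and `boxKernel3x3` are hypothetical and chosen only for this example; they are not taken from the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a P x Q kernel (row-major) as the outer product of a column
// vector u (length P) and a row vector v (length Q): h[i][j] = u[i] * v[j].
std::vector<float> outerProduct(const std::vector<float>& u,
                                const std::vector<float>& v) {
    assert(!u.empty() && !v.empty());
    std::vector<float> h(u.size() * v.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j)
            h[i * v.size() + j] = u[i] * v[j];
    return h;
}

// A 3x3 averaging box kernel separates into two length-3 averaging
// vectors: every entry is (1/3) * (1/3) = 1/9.
std::vector<float> boxKernel3x3() {
    const std::vector<float> third(3, 1.0f / 3.0f);
    return outerProduct(third, third);
}
```

The box filter is therefore separable: its 3x3 kernel of ninths is the outer product of two vectors of thirds.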

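Let $f(x,y)$ be the input image and $h(x,y)$ the convolution kernel, and let $u$ be a column vector and $v$ a row vector such that

$$h = u * v$$

where $*$ denotes convolution (convolving a column vector with a row vector yields their outer product). Then

$$f * h = f * (u * v) = (u * v) * f = u * (v * f) \qquad (1)$$

That is, first convolve $f$ with $v$, then convolve the result with $u$. Since $u$ and $v$ are 1D vectors, convolving $f$ with $v$ amounts to convolving every row of $f$ with $v$, and convolving the result with $u$ amounts to convolving every column with $u$.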
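Equation (1) can be checked numerically. The sketch below is a plain CPU illustration, not the article's OpenCL code: it compares a direct 2D convolution with the kernel $h = u * v$ against the two-pass separable version. The names `convolve2D`, `convolveSeparable`, `pixelOrZero` and `maxAbsDiff` are hypothetical, and the zero-padded border handling is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // row-major, rows x cols

// Assumed border policy: coordinates outside the image read as 0.
static float pixelOrZero(const Image& f, int rows, int cols, int y, int x) {
    if (y < 0 || y >= rows || x < 0 || x >= cols) return 0.0f;
    return f[static_cast<std::size_t>(y) * cols + x];
}

// Direct 2D convolution with the P x Q kernel h[i][j] = u[i] * v[j].
Image convolve2D(const Image& f, int rows, int cols,
                 const std::vector<float>& u, const std::vector<float>& v) {
    assert(u.size() % 2 == 1 && v.size() % 2 == 1);
    const int P = static_cast<int>(u.size()), Q = static_cast<int>(v.size());
    Image out(f.size(), 0.0f);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < P; ++i)
                for (int j = 0; j < Q; ++j)
                    acc += u[i] * v[j] *
                           pixelOrZero(f, rows, cols, y - (i - P / 2), x - (j - Q / 2));
            out[static_cast<std::size_t>(y) * cols + x] = acc;
        }
    return out;
}

// Separable version: convolve every row with v, then every column with u.
Image convolveSeparable(const Image& f, int rows, int cols,
                        const std::vector<float>& u, const std::vector<float>& v) {
    assert(u.size() % 2 == 1 && v.size() % 2 == 1);
    const int P = static_cast<int>(u.size()), Q = static_cast<int>(v.size());
    Image tmp(f.size(), 0.0f), out(f.size(), 0.0f);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < Q; ++j)
                acc += v[j] * pixelOrZero(f, rows, cols, y, x - (j - Q / 2));
            tmp[static_cast<std::size_t>(y) * cols + x] = acc;
        }
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < P; ++i)
                acc += u[i] * pixelOrZero(tmp, rows, cols, y - (i - P / 2), x);
            out[static_cast<std::size_t>(y) * cols + x] = acc;
        }
    return out;
}

// Largest absolute per-pixel difference between two images of equal size.
float maxAbsDiff(const Image& a, const Image& b) {
    assert(a.size() == b.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}
```

For the same image and the same pair of vectors, the two functions should agree up to floating-point rounding, which is what equation (1) states.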


0.3 Computation Required for Convolution Using Separable Kernel

If an MxN image is convolved with a PxQ kernel, each output pixel requires PQ multiplications and additions. Hence the total number of operations required is MNPQ.

If we perform the convolution using a separable filter, we require $MN(P+Q)$ multiplications and additions.

Thus direct 2D convolution requires more computation than the separable method by a factor of $\frac{PQ}{P+Q}$.

For a 3x3 convolution kernel this gives a speedup by a factor of $9/6 = 1.5$.
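The counts above are easy to tabulate. The following small C++ sketch uses hypothetical helper names (`directOps`, `separableOps`, `speedupFactor`) and simply evaluates the formulas from this section.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-add count for direct 2D convolution of an M x N image
// with a P x Q kernel: M * N * P * Q.
std::uint64_t directOps(std::uint64_t M, std::uint64_t N,
                        std::uint64_t P, std::uint64_t Q) {
    return M * N * P * Q;
}

// Multiply-add count for the separable method: M * N * (P + Q).
std::uint64_t separableOps(std::uint64_t M, std::uint64_t N,
                           std::uint64_t P, std::uint64_t Q) {
    return M * N * (P + Q);
}

// Ratio of the two counts, PQ / (P + Q); independent of the image size.
double speedupFactor(std::uint64_t P, std::uint64_t Q) {
    assert(P + Q > 0);
    return static_cast<double>(P * Q) / static_cast<double>(P + Q);
}
```

For a 320x240 image and a 3x3 kernel this gives 691,200 operations for the direct method against 460,800 for the separable one. The ratio grows with kernel size; for a 9x9 kernel it is 81/18 = 4.5.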

0.4 Parallel Implementation

The parallel implementation consists of two parts.

In the first part, all the rows of the image are convolved with the 1D row vector. This is a 1D convolution operation.

A thread is launched for each row of the image. Each thread performs the 1D convolution of its image row with the 1D kernel and stores the result in global memory.

In the second part, all the columns of the image are convolved with the 1D column vector. This is again a 1D convolution operation: each thread reads a column vector and performs a 1D convolution. Before proceeding with the second part, we need to make sure that all the threads of the first part have completed their operation.

Thus the common code required is that of the 1D convolution operation.
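The per-row work described above can be sketched on the CPU as follows. This is not the article's kernel: `convolveRow` and `rowFilter` are hypothetical names, the image is assumed to be single-channel float data in row-major order, and samples outside the row are assumed to be zero. On a device, the row index would come from `get_global_id(0)` instead of a host loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by one hypothetical work-item: convolve image row `row`
// with the 1D kernel k and write the result to the same row of dst.
// Assumed border policy: samples outside the row are treated as 0.
void convolveRow(const std::vector<float>& src, std::vector<float>& dst,
                 int cols, int row, const std::vector<float>& k) {
    assert(k.size() % 2 == 1);
    const int K = static_cast<int>(k.size());
    const int r = K / 2;
    const std::size_t base = static_cast<std::size_t>(row) * cols;
    for (int x = 0; x < cols; ++x) {
        float acc = 0.0f;
        for (int j = 0; j < K; ++j) {
            const int xs = x - (j - r);  // source column for kernel tap j
            if (xs >= 0 && xs < cols) acc += k[j] * src[base + xs];
        }
        dst[base + x] = acc;
    }
}

// Host-side stand-in for launching one work-item per image row.
std::vector<float> rowFilter(const std::vector<float>& src, int rows, int cols,
                             const std::vector<float>& k) {
    std::vector<float> dst(src.size());
    for (int row = 0; row < rows; ++row)
        convolveRow(src, dst, cols, row, k);
    return dst;
}
```

Because every row is processed independently, the rows can run in any order or concurrently. Only the boundary between the row pass and the column pass needs synchronization.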

The issue with the column filter is that adjacent threads would access non-adjacent memory locations.

A simple trick is to allocate the global memory as a square array whose side is the largest image dimension.

While writing the output of the row filter, write it along the columns, i.e. store the results as the transposed matrix.

Performing the row convolution again on the resulting matrix, and again storing the result transposed, is then equivalent to column filtering; the second transpose restores the original orientation.

This way the memory reads are coalesced: adjacent threads in a thread block access adjacent global memory locations.
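The transposed-write trick can be illustrated with a CPU sketch. The names `rowFilterTransposed` and `separableViaTranspose` are hypothetical. As a simplification, the buffers here have exactly the transposed size rather than the square array of the largest dimension described above, and zero padding at the borders is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row filter that stores its output transposed: the filtered value of
// input pixel (y, x) is written to position (x, y) of a cols x rows buffer.
std::vector<float> rowFilterTransposed(const std::vector<float>& src,
                                       int rows, int cols,
                                       const std::vector<float>& k) {
    assert(k.size() % 2 == 1);
    const int K = static_cast<int>(k.size());
    const int r = K / 2;
    std::vector<float> dst(src.size());
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < K; ++j) {
                const int xs = x - (j - r);
                if (xs >= 0 && xs < cols)
                    acc += k[j] * src[static_cast<std::size_t>(y) * cols + xs];
            }
            dst[static_cast<std::size_t>(x) * rows + y] = acc;  // transposed store
        }
    return dst;
}

// Two passes of the same routine: rows with v, then rows of the transposed
// intermediate (the original columns) with u. The second transposed store
// brings the result back to rows x cols.
std::vector<float> separableViaTranspose(const std::vector<float>& src,
                                         int rows, int cols,
                                         const std::vector<float>& u,
                                         const std::vector<float>& v) {
    const std::vector<float> t = rowFilterTransposed(src, rows, cols, v);
    return rowFilterTransposed(t, cols, rows, u);
}
```

The same 1D routine serves both passes, and neither pass has to read along a column.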

In the present implementation each thread performs the computations for all three channels (dimensions) of each pixel.
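A minimal sketch of handling all three channels in one pass over a row is shown below. It assumes an interleaved pixel layout (three consecutive values per pixel, as in a typical OpenCV colour image), float data and zero padding at the borders; the name `convolveRow3` is hypothetical, and the article's actual data type and layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convolve one row of an interleaved 3-channel image with the 1D kernel k.
// Each pixel occupies 3 consecutive floats; the three channels are
// accumulated side by side. Samples outside the row are treated as 0.
void convolveRow3(const std::vector<float>& src, std::vector<float>& dst,
                  int cols, int row, const std::vector<float>& k) {
    assert(k.size() % 2 == 1);
    constexpr int C = 3;  // channels per pixel
    const int K = static_cast<int>(k.size());
    const int r = K / 2;
    const std::size_t base = static_cast<std::size_t>(row) * cols * C;
    for (int x = 0; x < cols; ++x) {
        float acc[C] = {0.0f, 0.0f, 0.0f};
        for (int j = 0; j < K; ++j) {
            const int xs = x - (j - r);
            if (xs < 0 || xs >= cols) continue;
            for (int c = 0; c < C; ++c)
                acc[c] += k[j] * src[base + static_cast<std::size_t>(xs) * C + c];
        }
        for (int c = 0; c < C; ++c)
            dst[base + static_cast<std::size_t>(x) * C + c] = acc[c];
    }
}
```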

In the present implementation local memory is not used. A later article will cover a local memory implementation, in which data is copied from global memory to local memory, each thread block performs its computations on a block of data, and the results are combined to obtain the desired row/column filtering.

0.5 Comparison with CPU implementations

The separable filter implementation was compared with the 2D convolution and the CPU separable implementations for a 320x240 image on an Intel(R) Core(TM) i3 CPU at 2.53GHz. The separable filter gave an 8x performance improvement over the CPU implementation of the box filter and a 4x improvement over the parallel 2D implementation of the box filter.

0.6 Code

The code consists of two parts: the host code and the device code. The host-side code uses the OpenCV APIs to read images from a video file and demonstrates calling the kernel code for the box filter using 2D convolution and the separable filter, as well as the host CPU implementation.

The code is available in the repository https://code.google.com/p/m19404/source/browse/OpenCL-Image-Processing/Convolution/


Bibliography

[1] A study of OpenCL image convolution optimization. url: http://www.evl.uic.edu/kreda/gpu/image-convolution/.

[2] Image Convolution Filter. url: http://lodev.org/cgtutor/filtering.html.

[3] NVidia CUDA Example. url: http://developer.download.nvidia.com/compute/cuda/4_2/rel/sdk/website/OpenCL/html/samples.html.

[4] OpenCL. url: http://www.khronos.org/opencl/.

[5] OpenCV color conversion. url: http://www.shervinemami.info/colorConversion.html.

[6] Steve on Image Processing Blog at Matlab Central. url: http://blogs.mathworks.com/steve/2006/10/04/separable-convolution/.

[7] The Scientist and Engineer's Guide to Digital Signal Processing. url: http://www.dspguide.com/ch24/3.htm.
