Post on 14-Jan-2016

Developing the Demosaicing Algorithm in GPGPU

Ping Xiang

Electrical Engineering and Computer Science

Outline

Background

Algorithm

Implementation

Experiment Results

Future Work

Background

Color Filter Array (CFA): a mosaic of color filters placed in front of the image sensor, so that each sensor pixel records only one color.

Background

A demosaicing algorithm reconstructs a full-color image from the data collected through the color filter array.

Algorithm

Bilinear interpolation:

The red value of a non-red pixel is computed as the average of the two or four adjacent red pixels, and similarly for blue and green.
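The rule above can be sketched in plain Python as a CPU reference. This is a minimal illustration, not the code behind the slides: it assumes an RGGB Bayer layout, an image of at least 2*2 pixels, and border pixels that average whichever same-color neighbors exist. The names `bayer_color` and `demosaic_bilinear` are hypothetical.

```python
def bayer_color(y, x):
    """Channel sampled at (y, x): 0=R, 1=G, 2=B (assumes an RGGB layout)."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def demosaic_bilinear(raw):
    """raw: list of rows of sensor values. Returns rows of (r, g, b) tuples."""
    h, w = len(raw), len(raw[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            pixel = []
            for c in range(3):
                if bayer_color(y, x) == c:
                    # This channel was measured directly at this pixel.
                    pixel.append(float(raw[y][x]))
                    continue
                # Average the in-bounds 3x3 neighbors that sampled channel c:
                # two or four of them in the interior, fewer at the border.
                vals = [raw[ny][nx]
                        for ny in range(max(0, y - 1), min(h, y + 2))
                        for nx in range(max(0, x - 1), min(w, x + 2))
                        if bayer_color(ny, nx) == c]
                pixel.append(sum(vals) / len(vals))
            row.append(tuple(pixel))
        out.append(row)
    return out
```

Every output pixel depends only on its own 3*3 neighborhood, which is why the algorithm maps naturally onto one GPU thread per pixel.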

Algorithm

For Green Channels

Algorithm

For Red or Blue Channels

Implementation

Optimization:

1. Vectorize the pixel data to be processed

2. Use shared memory to reduce the data transfer

Implementation

1. Vectorize the pixel data to be processed
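The slides do not show how the pixels were packed, so the following is only a sketch of one plausible layout: consecutive samples of a row are grouped into 4-wide vectors, mirroring the `float4`-style types available in both Brook+ and CUDA, so that each work item loads and processes four samples at once. The helper name `pack_vec4` and the zero padding are assumptions made for illustration.

```python
def pack_vec4(row, pad=0):
    """Group a row of samples into 4-wide tuples, padding the tail (assumed layout)."""
    padded = list(row) + [pad] * (-len(row) % 4)
    return [tuple(padded[i:i + 4]) for i in range(0, len(padded), 4)]
```

With this layout a row of `w` samples needs `ceil(w / 4)` vector loads instead of `w` scalar loads.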

Implementation

2. Use shared memory to reduce the data transfer
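A back-of-the-envelope count shows why staging data in shared memory helps. The sketch below assumes a 3*3 interpolation window (radius 1) and 16*16 thread blocks, and it ignores caches and border clamping; none of these parameters come from the slides, and the function names are hypothetical.

```python
def global_reads_naive(width, height, radius=1):
    """Each output pixel reads its full window straight from global memory."""
    window = (2 * radius + 1) ** 2
    return width * height * window


def global_reads_tiled(width, height, tile=16, radius=1):
    """Each block loads its tile plus a halo once, then reuses it on-chip."""
    tiles_x = -(-width // tile)   # ceiling division
    tiles_y = -(-height // tile)
    return tiles_x * tiles_y * (tile + 2 * radius) ** 2


if __name__ == "__main__":
    naive = global_reads_naive(1024, 1024)
    tiled = global_reads_tiled(1024, 1024)
    print(naive, tiled, round(naive / tiled, 2))
```

Under these assumptions a 1K*1K image needs about seven times fewer global-memory reads when each block stages its tile on-chip first.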

Experiment Results

Platform:

ATI Radeon™ HD 4870, Brook+ 1.4

Nvidia GeForce 8800 GTX, CUDA 2.1

Dual-Core AMD Opteron™ 2212, 2.0 GHz

Experiment Results

Performance comparison (execution time; units not stated on the slide)

Image size   CPU      CUDA     Brook+
128*128      0.002    0.002    0.0612
256*256      0.007    0.0032   0.0635
512*512      0.035    0.01     0.075
1K*1K        0.147    0.037    0.113
2K*2K        0.585    0.144    0.277
4K*4K        2.374    0.574    0.575

For small data sizes, the GPU is not always a good choice:

a. Memory transfer time dominates the kernel execution time

b. The computation is not complex enough to offset that overhead

Experiment Results

When the data size is small, CUDA performs better. When the image size increases to 4K*4K, Brook+ catches up with CUDA.
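To make the comparison concrete, the timings reported on the slides can be turned into speedups over the CPU. The numbers below are copied from the performance comparison; the function name `speedup_over_cpu` is only illustrative.

```python
sizes = ["128*128", "256*256", "512*512", "1K*1K", "2K*2K", "4K*4K"]
times = {
    "CPU": [0.002, 0.007, 0.035, 0.147, 0.585, 2.374],
    "CUDA": [0.002, 0.0032, 0.01, 0.037, 0.144, 0.574],
    "Brook+": [0.0612, 0.0635, 0.075, 0.113, 0.277, 0.575],
}


def speedup_over_cpu(platform):
    """CPU time divided by the platform's time, per image size."""
    return [cpu / gpu for cpu, gpu in zip(times["CPU"], times[platform])]


if __name__ == "__main__":
    for name in ("CUDA", "Brook+"):
        for size, s in zip(sizes, speedup_over_cpu(name)):
            print(f"{name:7s} {size:8s} {s:5.2f}x")
```

With these numbers Brook+ is slower than the CPU up to 512*512 and faster from 1K*1K onward, while both GPUs reach roughly a 4x speedup at 4K*4K.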

Experiment Results

Explanation?

GPU        Memory speed   Stream processing units
HD 4870    (not given)    800 (8*5*10*2)
GTX 8800   (not given)    128 (16*8)

Experiment Results

Shared Register Usage

[Chart: Unoptimized vs. Optimized on ATI Radeon 4870 (Brook+ 1.4), image sizes 128*128 to 4K*4K; y-axis range 0 to 0.8]

Read data into shared registers and reuse it wherever possible

Experiment Results

[Chart: Unoptimized vs. Optimized on ATI Radeon 3870 (Brook+ 1.3), image sizes 128*128 to 4K*4K; y-axis range 0 to 1.8]

Future Work

1. Shared memory usage for further optimization

2. Integrate the code with a proper interface to import image data and export pixel data

3. Report

References

1. Henrique S. Malvar, Li-wei He, and Ross Cutler, "High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images"

2. Alexey Lukin and Denis Kubasov, "An Improved Demosaicing Algorithm"

Questions?
