Developing the Demosaicing Algorithm in GPGPU

Ping Xiang

Electrical Engineering and Computer Science

Outline

Background

Algorithm

Implementation

Experiment Results

Future Work

Background

1. Color Filter Array. A mosaic of color filters in front of the image sensor

Background

A demosaicing algorithm reconstructs a full-color image from the data collected by the color filter array.

Algorithm

Bilinear interpolation:

The red value of a non-red pixel is computed as the average of the two or four adjacent red pixels, and similarly for blue and green.

Algorithm

For Green Channels

Algorithm

For Red or Blue Channels
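The bilinear rule described above can be sketched in plain Python. This is a minimal illustration, not the presented GPU kernel: it assumes an RGGB Bayer layout (the slides do not state which layout was used), averages only the neighbors that exist at the image border, and the names `bayer_color` and `demosaic_bilinear` are hypothetical.

```python
def bayer_color(y, x):
    """Channel (0=R, 1=G, 2=B) sampled at (y, x), assuming an RGGB layout."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def demosaic_bilinear(raw):
    """Return an h x w x 3 image reconstructed from a single-channel Bayer mosaic."""
    h, w = len(raw), len(raw[0])
    out = [[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for c in range(3):
                if bayer_color(y, x) == c:
                    # The sensor measured this channel here: keep the sample.
                    out[y][x][c] = float(raw[y][x])
                    continue
                # Otherwise average the same-channel samples in the 3x3 window.
                # On a Bayer grid these are exactly the two or four adjacent ones.
                total, count = 0.0, 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and bayer_color(ny, nx) == c:
                            total += raw[ny][nx]
                            count += 1
                out[y][x][c] = total / count if count else 0.0
    return out


raw = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
rgb = demosaic_bilinear(raw)
print(rgb[1][1])  # blue site: green from 4 edge neighbors, red from 4 diagonals
print(rgb[1][2])  # green site: red from 2 vertical, blue from 2 horizontal neighbors
```

Every output pixel depends only on its 3x3 neighborhood, which is what makes the algorithm a natural fit for one GPU thread per pixel.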

Implementation

Optimization:

1. Vectorize the pixel data to be processed

2. Use shared memory to reduce the data transfer

Implementation

1. Vectorize the pixel data to be processed
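The slides do not show the data layout that was used. One common reading of this step is to pack four neighboring samples into a single four-component value (similar to a `float4`), so that each work item handles four pixels at once. The sketch below only illustrates that packing idea on the CPU; `pack4`, `scale4`, and `unpack4` are hypothetical helper names, and the per-lane scaling is a stand-in for real per-pixel work.

```python
def pack4(row, pad=0.0):
    """Group a row of samples into 4-wide tuples, padding the tail with `pad`."""
    padded = list(row) + [pad] * (-len(row) % 4)
    return [tuple(padded[i:i + 4]) for i in range(0, len(padded), 4)]


def scale4(vec, gain):
    """Stand-in for one vector operation: the same arithmetic on all four lanes."""
    return tuple(v * gain for v in vec)


def unpack4(vectors, length):
    """Flatten 4-wide tuples back into a row of the original length."""
    return [v for vec in vectors for v in vec][:length]


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
vectors = pack4(row)                      # 2 work items instead of 6
scaled = unpack4([scale4(v, 2.0) for v in vectors], len(row))
print(len(vectors), scaled)
```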

Implementation

2. Use shared memory to reduce the data transfer
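To see why staging data in shared memory reduces transfers, a rough counting model helps. It assumes a 3x3 interpolation window, square thread blocks of `tile` x `tile` pixels, and a one-pixel halo loaded around each tile; none of these parameters are given in the slides, and `global_reads` is a hypothetical name.

```python
def global_reads(width, height, tile):
    """Return (naive, tiled) global-memory read counts for a 3x3 window.

    naive: every pixel fetches its nine neighbors straight from global memory.
    tiled: each block loads its tile plus a one-pixel halo once, then all
           threads in the block read from that on-chip copy.
    """
    naive = 9 * width * height
    tiles_x = -(-width // tile)    # ceiling division
    tiles_y = -(-height // tile)
    tiled = tiles_x * tiles_y * (tile + 2) ** 2
    return naive, tiled


naive, tiled = global_reads(16, 16, 8)
print(naive, tiled)  # 2304 400
```

Under these assumptions, larger tiles amortize the halo better, at the cost of more shared memory per block.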

Experiment Results

Platform:

ATI Radeon™ HD 4870, Brook+ 1.4

Nvidia GeForce 8800 GTX, CUDA 2.1

Dual-Core AMD Opteron™ 2212, 2.0 GHz

Experiment Results

Performance comparison

Image size  128*128  256*256  512*512  1K*1K  2K*2K  4K*4K
CPU         0.002    0.007    0.035    0.147  0.585  2.374
CUDA        0.002    0.0032   0.01     0.037  0.144  0.574
Brook+      0.0612   0.0635   0.075    0.113  0.277  0.575

For small data sizes, the GPU is not always a good choice:

a. Memory transfer time dominates the kernel execution time

b. The computation is not complex enough

Experiment Results

When the data size is small, CUDA performs better than Brook+. When the image size increases to 4K*4K, Brook+ catches up with CUDA (0.575 vs. 0.574).
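The observations above can be checked by turning the reported measurements into speedups over the CPU. The numbers are copied from the performance comparison; the `speedups` helper is only an illustrative name.

```python
sizes = ["128*128", "256*256", "512*512", "1K*1K", "2K*2K", "4K*4K"]
cpu = [0.002, 0.007, 0.035, 0.147, 0.585, 2.374]
cuda = [0.002, 0.0032, 0.01, 0.037, 0.144, 0.574]
brook = [0.0612, 0.0635, 0.075, 0.113, 0.277, 0.575]


def speedups(baseline, candidate):
    """Element-wise ratio baseline / candidate; values above 1 mean the candidate wins."""
    return [b / c for b, c in zip(baseline, candidate)]


cuda_speedup = speedups(cpu, cuda)
brook_speedup = speedups(cpu, brook)
for size, sc, sb in zip(sizes, cuda_speedup, brook_speedup):
    print(f"{size:>8}  CUDA x{sc:.2f}  Brook+ x{sb:.2f}")
```

Brook+ stays below 1 (slower than the CPU) up to 512*512 and only overtakes it from 1K*1K, while at 4K*4K both GPU versions are roughly four times faster than the CPU.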

Experiment Results

Explanation?

Memory speed

Stream processing units:

HD 4870: 800 (8*5*10*2)

8800 GTX: 128 (16*8)

Experiment Results

Shared Register Usage

Chart: ATI Radeon HD 4870 (Brook+ 1.4), unoptimized vs. optimized, image sizes 128*128 to 4K*4K

Read data into shared registers and reuse it where possible
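The reuse idea can be illustrated with a sliding 3x3 window: neighboring output pixels share two of their three input columns, so keeping those in local variables (standing in for registers) means only one new column is loaded per step. This is a CPU sketch under that assumption, not the Brook+ kernel; the function names are hypothetical and a plain 3x3 sum stands in for the interpolation.

```python
def window_sums_naive(img, y):
    """3x3 sums along interior row y, re-reading all nine values per pixel."""
    w = len(img[0])
    sums, loads = [], 0
    for x in range(1, w - 1):
        s = 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                s += img[y + dy][x + dx]
                loads += 1
        sums.append(s)
    return sums, loads


def window_sums_reuse(img, y):
    """Same sums, keeping two column sums in locals and loading one new column per step."""
    w = len(img[0])

    def column(x):
        return img[y - 1][x] + img[y][x] + img[y + 1][x]

    left, mid = column(0), column(1)   # initial fill: 6 loads
    loads = 6
    sums = []
    for x in range(1, w - 1):
        right = column(x + 1)          # only 3 new loads per output pixel
        loads += 3
        sums.append(left + mid + right)
        left, mid = mid, right         # shift the window, reusing loaded data
    return sums, loads


img = [[1, 2, 3, 4, 5, 6],
       [7, 8, 9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18]]
print(window_sums_naive(img, 1))
print(window_sums_reuse(img, 1))
```

Both versions produce the same sums; the second one issues half as many loads on this six-pixel row, and the saving approaches a factor of three on longer rows.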

Experiment Results

Chart: ATI Radeon 3870 (Brook+ 1.3), unoptimized vs. optimized, image sizes 128*128 to 4K*4K

Future Work

1. Shared memory usage for further optimization

2. Integrate the code with a proper interface to import image data and export pixel data

3. Report

Questions?
