Developing the Demosaicing Algorithm in GPGPU
Ping Xiang
Electrical Engineering and Computer Science
Outline
Background
Algorithm
Implementation
Experiment Results
Future Work
Background
Color filter array (CFA): a mosaic of color filters placed in front of the image sensor.
Background
A demosaicing algorithm reconstructs a full-color image from the data collected through the color filter array.
Algorithm
Bilinear interpolation:
The red value of a non-red pixel is computed as the average of the two or four adjacent red pixels. Blue and green values are computed the same way.
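The averaging rule above can be sketched in a few lines of plain Python. This is a minimal CPU illustration, not the GPU kernel from the talk. It assumes an RGGB Bayer layout (the slides do not say which layout was used), and the names `bayer_color` and `demosaic_bilinear` are hypothetical. At image borders it averages whichever same-color neighbors exist.

```python
def bayer_color(y, x):
    """Color sampled at (y, x), assuming an RGGB Bayer layout."""
    if y % 2 == 0 and x % 2 == 0:
        return "R"
    if y % 2 == 1 and x % 2 == 1:
        return "B"
    return "G"


def demosaic_bilinear(raw):
    """Return an H x W grid of (R, G, B) tuples from a single-channel mosaic."""
    height, width = len(raw), len(raw[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for channel in "RGB":
                if bayer_color(y, x) == channel:
                    # The sensor measured this channel directly.
                    pixel.append(float(raw[y][x]))
                    continue
                # Average the samples of this channel in the 3x3 neighborhood
                # (two or four of them for interior pixels).
                samples = [
                    raw[ny][nx]
                    for ny in range(max(y - 1, 0), min(y + 2, height))
                    for nx in range(max(x - 1, 0), min(x + 2, width))
                    if bayer_color(ny, nx) == channel
                ]
                pixel.append(sum(samples) / len(samples))
            row.append(tuple(pixel))
        out.append(row)
    return out
```

Every output pixel depends only on its 3x3 neighborhood, so each pixel can be computed independently, which is what makes the method a natural fit for a GPU.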
Algorithm
For Green Channels
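The filter shown on this slide did not survive in the transcript. As an illustration only, the sketch below assumes the slide showed the gradient-corrected linear filter from Malvar, He, and Cutler's "High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images": the bilinear green estimate plus half of the Laplacian of the pixel's own color. The function name `green_at_red_or_blue` is hypothetical, and the pixel must lie at least two pixels from every border.

```python
def green_at_red_or_blue(raw, y, x):
    """Estimate green at an interior red or blue site (y, x) of a Bayer mosaic.

    Assumed filter (Malvar-He-Cutler, green at a red/blue location):
        G = (4*center + 2*(four adjacent greens) - (four same-color samples
             two pixels away along the row and column)) / 8
    """
    center = raw[y][x]
    adjacent_green = raw[y - 1][x] + raw[y + 1][x] + raw[y][x - 1] + raw[y][x + 1]
    same_color_ring = raw[y - 2][x] + raw[y + 2][x] + raw[y][x - 2] + raw[y][x + 2]
    return (4 * center + 2 * adjacent_green - same_color_ring) / 8
```

On a flat image the correction term cancels and the result equals the plain bilinear average, so the filter only changes the estimate where the pixel's own color has local curvature.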
Algorithm
For Red or Blue Channels
Implementation
Optimizations:
1. Vectorize the pixel data to be processed
2. Use shared memory to reduce data transfer
Implementation
1. Vectorize the pixel data to be processed
Implementation
2. Use shared memory to reduce the data transfer
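A rough way to see why staging data in shared memory helps is to count global-memory reads per tile. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes a square tile of output pixels, a filter that reads a square window around each pixel (radius 1 for bilinear interpolation), and no caching in the naive case. The function names and the 16 x 16 tile size are hypothetical.

```python
def global_reads_naive(tile, radius=1):
    """Reads when every output pixel fetches its whole window from global memory."""
    window = 2 * radius + 1
    return tile * tile * window * window


def global_reads_tiled(tile, radius=1):
    """Reads when the tile plus its halo is loaded once into shared memory."""
    side = tile + 2 * radius
    return side * side


if __name__ == "__main__":
    tile = 16  # hypothetical tile size
    naive = global_reads_naive(tile)
    tiled = global_reads_tiled(tile)
    print(f"naive: {naive} reads, tiled: {tiled} reads, ratio: {naive / tiled:.2f}")
```

Under these assumptions the benefit grows with the filter radius, since a wider window means more overlap between neighboring pixels.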
Experiment Results
Platform:
ATI Radeon™ HD 4870, Brook+ 1.4
Nvidia GeForce 8800 GTX, CUDA 2.1
Dual-Core AMD Opteron™ 2212, 2.0 GHz
Experiment Results
Performance comparison

Image size    CPU      CUDA     Brook+
128*128       0.002    0.002    0.0612
256*256       0.007    0.0032   0.0635
512*512       0.035    0.01     0.075
1K*1K         0.147    0.037    0.113
2K*2K         0.585    0.144    0.277
4K*4K         2.374    0.574    0.575
For small data sizes, the GPU is not always a good choice:
a. Memory transfer time dominates the kernel execution time.
b. The computation is not complex enough to offset that overhead.
Experiment Results
For small data sizes, CUDA performs better. When the image size reaches 4K*4K, Brook+ catches up with CUDA.
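The same observation can be read off as ratios against the CPU. The sketch below uses the values from the performance comparison and assumes they are execution times, which the slide does not state explicitly (no units are given). A ratio below 1 means the GPU run was slower than the CPU run. The helper name `speedup_over_cpu` is hypothetical.

```python
SIZES = ["128*128", "256*256", "512*512", "1K*1K", "2K*2K", "4K*4K"]

# Values copied from the performance comparison; assumed to be execution times.
TIMES = {
    "CPU": [0.002, 0.007, 0.035, 0.147, 0.585, 2.374],
    "CUDA": [0.002, 0.0032, 0.01, 0.037, 0.144, 0.574],
    "Brook+": [0.0612, 0.0635, 0.075, 0.113, 0.277, 0.575],
}


def speedup_over_cpu(platform):
    """CPU value divided by the platform's value, one ratio per image size."""
    return [cpu / gpu for cpu, gpu in zip(TIMES["CPU"], TIMES[platform])]


if __name__ == "__main__":
    for platform in ("CUDA", "Brook+"):
        for size, ratio in zip(SIZES, speedup_over_cpu(platform)):
            print(f"{platform:7s} {size:8s} {ratio:6.2f}x")
```

Under that assumption, Brook+ stays below 1x up to 512*512 and both GPUs end slightly above 4x at 4K*4K.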
Experiment Results
Explanation?
Factors: memory speed, stream processing units
Stream processing units:
HD 4870: 800 (8*5*10*2)
GeForce 8800 GTX: 128 (16*8)
Experiment Results
Shared register usage: ATI Radeon HD 4870 (Brook+ 1.4)
[Chart: unoptimized vs. optimized, image sizes 128*128 to 4K*4K]
Read data into shared registers and reuse it wherever possible.
Experiment Results
ATI Radeon HD 3870 (Brook+ 1.3)
[Chart: unoptimized vs. optimized, image sizes 128*128 to 4K*4K]
Future Work
1. Shared memory usage for further optimization
2. Integrate the code with a proper interface for importing image data and exporting pixel data
3. Report
References
1. Henrique S. Malvar, Li-wei He, and Ross Cutler. High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images.
2. Alexey Lukin and Denis Kubasov. An Improved Demosaicing Algorithm.
Questions?