parallel image processing - purdue engineeringparallel image processing lei cao and yan wang...

Parallel Image Processing Lei Cao and Yan Wang

2013-04-19

Image Size: 4753 * 3168 (*3 RGB)

LOMO effect: boundaries are made darker; add contrast to the red channel.

Tilt-shift effect: certain regions are made out-of-focus by Gaussian filtering

Outline

Algorithm &Serial Code Profiling

OpenMP Parallelization

MPI Parallelization

Performance

Future Work

Algorithm

Tilt-shift LOMO

Serial Program Profiling

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Tested at Purdue’s Hansen cluster (Dell compute nodes with four 12-core AMD Opteron

6176 processors). Compiler: intel/12.0.084. R/W the image with the JPEG library

255 135 2

221 127 18

94 100 33 a1 a2 a3

a4 a5 a6

a7 a8 a9

( )* ( )( )

p i a ip center

Gaussian Filtering:

Center Weighted Average

4753 * 3168 (*3

Gaussian Filter:

255 135 2

221 127 18

94 100 33 a1 a2 a3

a4 a5 a6

a7 a8 a9

221 127 18

94 100 33

32 93 105

a1 a2 a3

a4 a5 a6

a7 a8 a9

94 100 33

32 93 105

19 18 17

a1 a2 a3

a4 a5 a6

a7 a8 a9

19 18 94 100 33

32 93 32 93 105

19 18 19 18 17

19 18 94 100 33

a1 a2 a3

a4 a5 a6

a7 a8 a9

Less computation Vs. More computation

Load Imbalance: dynamical scheduling in OpenMP

OMP Parallelization

Dynamics scheduling

nowait

Loop coalescing: transform 2D image matrix

to 1D array

Performance

Serial

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);

Serial compiler: intel/12.0.084

Parallel compiler: openmpi/1.4.4_intel-12.0.084; 8 cores are used.

44% 6%

8-core OpenMP

Dynamic Scheduling

Dynamic

Serial compiler: intel/12.0.084

44% 6%

Static

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

nowait

w/ nowait

44% 6%

w/o nowait

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Speedup

Original Dynamic w/o

nowait

Nowait w/o

dynamic

Dynamic

&nowait

+15% +2%

Speedup of OpenMP and MPI

0 5 10 15 20 250

No. of procs

OpenMP

OMP compiler: openmpi/1.4.4_intel-12.0.084;

MPI compiler: mpich2-intel/1.4.4_intel-12.0.084

MPI Vs. OpenMP

52% 6%

8-core MPI

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Serial compiler: intel/12.0.084; OpenMP compiler: openmpi/1.4.4_intel-12.0.084; MPI

compiler: mpich2/1.4.4_intel-12.0.0848

44% 6%

8-core OpenMP

MPI: creates a large array in each thread, which is inefficient.

Future Work

Other image sizes (smaller or larger)

Find other aspects to improve the speedup

Try different chunk size in dynamic

scheduling

Thanks!

parallel image processing - purdue engineeringparallel image processing lei cao and yan wang...

Documents

course syllabus cpsc 6176 enterprise web application...

digital image processing csc331 morphological image...

linear image processing and...

advanced ortho image processing igic 2010. 2 advanced ortho...

digital image processing - embl heidelberg - the … · ...

image processing point processing filters dithering image...

fundamentals of image processing of image processing 1. ......

image processing an experimental analysis of image ... ·...

computer graphics & image processing chapter # 9...

image processing in idl - geological & mining...

image processing - ch 05 - image...

image processing and analysis - image processing

computer graphics & image processing chapter # 6 color image...

image processing on gpu - kent state...

fundamentals of image processing -...

image processing - study...

chapter 9: morphological image processing digital image...

image processing: image...

image processing - iit...

multiresolution image...