Parallel Image Processing Lei Cao and Yan Wang
2013-04-19
Image size: 4753 × 3168 pixels (× 3 RGB channels)
LOMO effect: boundaries are made darker; add contrast to the red channel.
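A minimal sketch of the two LOMO steps named above, in plain Python. The slides do not give the formulas, so the contrast stretch around mid-gray, the quadratic falloff toward the corners, and the `strength` and `darkness` parameters are all assumptions chosen for illustration.

```python
import math

def red_contrast(value, strength=0.2):
    """Stretch a 0-255 red value away from mid-gray (128). `strength` is an assumed knob."""
    stretched = 128 + (value - 128) * (1 + strength)
    return max(0, min(255, round(stretched)))

def vignette_factor(x, y, width, height, darkness=0.6):
    """Brightness multiplier: 1.0 at the image center, 1 - darkness at the corners (assumed falloff)."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    corner = max(math.hypot(cx, cy), 1e-9)  # guard against a 1 x 1 image
    d = math.hypot(x - cx, y - cy) / corner  # 0 at the center, 1 at the corners
    return 1 - darkness * d * d

def lomo(image):
    """image: list of rows, each a list of (r, g, b) tuples. Returns a new image."""
    height, width = len(image), len(image[0])
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            f = vignette_factor(x, y, width, height)
            r = red_contrast(r)  # red channel only
            new_row.append(tuple(round(c * f) for c in (r, g, b)))
        out.append(new_row)
    return out
```

Every pixel is computed independently of its neighbors here, which is what makes both steps straightforward to parallelize.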
Tilt-shift effect: certain regions are made out-of-focus by Gaussian filtering
Outline
Algorithm & Serial Code Profiling
OpenMP Parallelization
MPI Parallelization
Performance
Future Work
Algorithm
Tilt-shift LOMO
Serial Program Profiling
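Share of serial run time by stage (pie chart):

| Stage | Share of time |
| --- | --- |
| Read Image | 4% |
| Tilt Shift | 71% |
| Red Tune | 8% |
| Vignette | 15% |
| Write Image | 2% |
| Others | 0% |

Red Tune and Vignette together make up the LOMO effect.

Tested on Purdue’s Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors). Compiler: intel/12.0.084. The image is read and written with the JPEG library.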
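The serial profile gives a rough ceiling on the speedup that parallelizing the image-processing stages can reach. The sketch below applies Amdahl's law under two assumptions that are not stated in the slides: the percentages map to the legend in the order listed (Read Image 4%, Tilt Shift 71%, Red Tune 8%, Vignette 15%, Write Image 2%, Others 0%), and only reading and writing the image stay serial.

```python
def amdahl_speedup(serial_fraction, workers):
    """Amdahl's law: speedup when only the non-serial fraction scales with `workers`."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Shares of serial run time (assumed mapping of the pie-chart labels to the legend order).
profile = {
    "Read Image": 0.04,
    "Tilt Shift": 0.71,
    "Red Tune": 0.08,
    "Vignette": 0.15,
    "Write Image": 0.02,
    "Others": 0.00,
}

# Assumption: file I/O stays serial, the three filter stages parallelize perfectly.
serial_stages = ("Read Image", "Write Image", "Others")
serial_fraction = sum(profile[stage] for stage in serial_stages)

for workers in (2, 4, 8, 24):
    print(workers, round(amdahl_speedup(serial_fraction, workers), 2))
```

With a 6% serial fraction the bound on 8 cores is about 5.6, so even a perfectly balanced parallel version cannot reach a speedup of 8 under these assumptions.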
Gaussian filtering: center-weighted average

Image: 4753 × 3168 pixels (× 3 RGB channels)

Pixel neighborhood p:
255 135 2
221 127 18
94 100 33

Gaussian filter a:
a1 a2 a3
a4 a5 a6
a7 a8 a9

$$p_{\text{center}} = \frac{\sum_i p(i)\, a(i)}{\sum_i a(i)}$$
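The center-weighted average can be sketched in a few lines of plain Python. The kernel radius, the value of `sigma`, and the choice to skip neighbors that fall outside the image are assumptions for illustration; the slides only show a 3 × 3 window with weights a1 to a9.

```python
import math

def gaussian_kernel(radius=1, sigma=1.0):
    """Build a (2*radius+1) x (2*radius+1) grid of unnormalized Gaussian weights a(i)."""
    return [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
             for dx in range(-radius, radius + 1)]
            for dy in range(-radius, radius + 1)]

def filter_pixel(channel, row, col, kernel):
    """Return sum(p(i) * a(i)) / sum(a(i)) over the window centered at (row, col).

    Neighbors outside the image are skipped (an assumed boundary rule).
    """
    radius = len(kernel) // 2
    height, width = len(channel), len(channel[0])
    weighted_sum = 0.0
    weight_total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = row + dy, col + dx
            if 0 <= r < height and 0 <= c < width:
                a = kernel[dy + radius][dx + radius]
                weighted_sum += channel[r][c] * a
                weight_total += a
    return weighted_sum / weight_total

# The 3 x 3 neighborhood shown on the slide, treated as one color channel.
patch = [[255, 135, 2], [221, 127, 18], [94, 100, 33]]
center = filter_pixel(patch, 1, 1, gaussian_kernel())
```

Dividing by the sum of the weights actually used keeps the result inside the range of the input values, including at the image border where part of the window is missing.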
[Figure, animated over four slides: the 3 × 3 filter window (weights a1 to a9) slides down the image one row at a time, covering pixel rows such as (221 127 18), (94 100 33), (32 93 105) and (19 18 17).]
Less computation vs. more computation
Load imbalance: dynamic scheduling in OpenMP
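To see why uneven per-row work favors dynamic scheduling, the following sketch simulates both policies in plain Python. It is a toy model, not OpenMP itself: the per-row costs are hypothetical (a cheap in-focus band in the middle, expensive blurred rows outside it), the row count of 3168 assumes that is the image height, and scheduling overhead is ignored.

```python
import heapq

def static_makespan(costs, workers):
    """Split rows into `workers` contiguous blocks, as a static schedule does.

    Returns the time of the slowest block.
    """
    n = len(costs)
    bounds = [n * w // workers for w in range(workers + 1)]
    return max(sum(costs[bounds[w]:bounds[w + 1]]) for w in range(workers))

def dynamic_makespan(costs, workers, chunk=1):
    """Hand the next chunk of rows to whichever worker becomes free first."""
    finish_times = [0.0] * workers  # min-heap of per-worker finish times
    heapq.heapify(finish_times)
    for start in range(0, len(costs), chunk):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + sum(costs[start:start + chunk]))
    return max(finish_times)

# Hypothetical costs: the middle third is in focus (cheap), the rest is blurred (expensive).
rows = 3168
costs = [1.0 if rows // 3 <= r < 2 * rows // 3 else 10.0 for r in range(rows)]

print("static :", static_makespan(costs, 8))
print("dynamic:", dynamic_makespan(costs, 8))
```

In this model the static split is limited by the workers that receive only expensive rows, while the dynamic policy stays close to the ideal of total cost divided by the number of workers. The `chunk` argument allows the same experiment with larger chunks, which trade balance for fewer scheduling decisions.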
OpenMP Parallelization
Dynamic scheduling
nowait
Loop coalescing: transform the 2D image matrix into a 1D array
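Loop coalescing replaces the nested row and column loops with a single loop over all pixels, so a scheduler has height × width iterations to distribute instead of only height. The index arithmetic is sketched below in plain Python; the function names are hypothetical and the row-major layout is an assumption.

```python
def flat_to_2d(k, width):
    """Map a flat index k back to (row, col) in a row-major image."""
    return k // width, k % width

def process_coalesced(image, op):
    """Apply `op` to every pixel using one loop over a 1D copy of the image."""
    height, width = len(image), len(image[0])
    flat = [value for row in image for value in row]  # 2D matrix -> 1D array
    for k in range(height * width):  # the single loop a scheduler would split
        flat[k] = op(flat[k])
    # Rebuild the 2D view from the flat array.
    return [flat[r * width:(r + 1) * width] for r in range(height)]

doubled = process_coalesced([[1, 2, 3], [4, 5, 6]], lambda v: v * 2)
```

When an operation needs pixel coordinates (for example a vignette that depends on the distance from the center), `flat_to_2d` recovers them from the flat index.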
Performance
Share of run time by stage (pie charts):

| Stage | Serial | 8-core OpenMP |
| --- | --- | --- |
| Read Image | 4% | 16% |
| Tilt Shift | 71% | 44% |
| Red Tune | 8% | 6% |
| Vignette | 15% | 26% |
| Write Image | 2% | 6% |
| Others | 0% | 2% |

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors). Serial compiler: intel/12.0.084. Parallel compiler: openmpi/1.4.4_intel-12.0.084; 8 cores are used.
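The shift in shares hints at the overall speedup. If reading the image takes the same absolute time in both runs (an assumption, as is the mapping of percentages to the legend order), then its share growing from 4% of the serial run to 16% of the 8-core OpenMP run implies that the total run time shrank by roughly that ratio. A small sketch of the arithmetic:

```python
def implied_speedup(serial_share, parallel_share):
    """Speedup implied by a stage whose absolute time is unchanged.

    t_stage = serial_share * T_serial = parallel_share * T_parallel,
    so T_serial / T_parallel = parallel_share / serial_share.
    """
    return parallel_share / serial_share

# Shares read off the pie charts (rounded to whole percent on the slide).
print("from Read Image :", implied_speedup(0.04, 0.16))
print("from Write Image:", implied_speedup(0.02, 0.06))
```

The two stages give estimates of about 4 and 3. Because the shares are rounded to whole percent, this is only a coarse consistency check, not a measurement.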
Dynamic Scheduling
Share of run time by stage on 8 cores (pie charts):

| Stage | Static | Dynamic |
| --- | --- | --- |
| Read Image | 16% | 17% |
| Tilt Shift | 44% | 39% |
| Red Tune | 6% | 7% |
| Vignette | 26% | 28% |
| Write Image | 6% | 7% |
| Others | 2% | 2% |
nowait
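Share of run time by stage (pie charts):

| Stage | w/o nowait | w/ nowait |
| --- | --- | --- |
| Read Image | 16% | 17% |
| Tilt Shift | 44% | 39% |
| Red Tune | 6% | 7% |
| Vignette | 26% | 28% |
| Write Image | 6% | 7% |
| Others | 2% | 2% |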
Speedup
[Bar chart on 8 cores, vertical axis from 0 to 4.5, series label "Time", four variants: Original, Dynamic w/o nowait, Nowait w/o dynamic, Dynamic & nowait. Bar annotations: +15%, +2%, +15%.]
Speedup of OpenMP and MPI
[Plot: speedup versus number of processes (both axes from 0 to 25) with three curves: Ideal, OpenMP, MPI.]
Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);
OMP compiler: openmpi/1.4.4_intel-12.0.084;
MPI compiler: mpich2-intel/1.4.4_intel-12.0.084
MPI Vs. OpenMP
Share of run time by stage on 8 cores (pie charts):

| Stage | 8-core MPI | 8-core OpenMP |
| --- | --- | --- |
| Read Image | 9% | 16% |
| Tilt Shift | 52% | 44% |
| Red Tune | 6% | 6% |
| Vignette | 25% | 26% |
| Write Image | 6% | 6% |
| Others | 2% | 2% |
MPI: creates a large array in each process, which is inefficient.
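The slides do not show how the array is laid out across processes. One common way to avoid a full-size array per process, offered here only as an illustration and not as the authors' method, is to give each rank a contiguous block of rows plus a few halo rows that the filter window needs from its neighbors. The index arithmetic is sketched below in plain Python, without any MPI calls; `row_block` and its parameters are hypothetical names.

```python
def row_block(rank, size, height, halo=1):
    """Rows owned by `rank` out of `size` ranks, and the padded range including halo rows.

    Returns ((first, last), (padded_first, padded_last)), all half-open ranges.
    A halo of 1 row on each side is enough for a 3 x 3 filter window.
    """
    first = height * rank // size
    last = height * (rank + 1) // size
    padded_first = max(0, first - halo)  # no halo above the top of the image
    padded_last = min(height, last + halo)  # no halo below the bottom
    return (first, last), (padded_first, padded_last)

# Example: 3168 rows (assumed image height) split over 8 ranks.
for rank in range(8):
    owned, padded = row_block(rank, 8, 3168)
    print(rank, owned, padded)
```

With this layout each rank stores about one eighth of the rows plus at most two halo rows, instead of the whole image.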
Future Work
Other image sizes (smaller or larger)
Find other aspects to improve the speedup
Try different chunk sizes in dynamic scheduling
Thanks!