![Page 1: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/1.jpg)
Parallelization of System Matrix generation code
Mahmoud Abdallah, Antall Fernandes
![Page 2: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/2.jpg)
SPECT System
![Page 3: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/3.jpg)
SPECT System
![Page 4: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/4.jpg)
Inverse Cone
![Page 5: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/5.jpg)
Back Projection
Ref figure: Tomographic Reconstruction of SPECT Data – Bill Amini, Magnus Björklund, Ron Dror, Anders Nygren
Filtered back projection combines back projection with a ramp filter.
It is still widely used because it is fast and easy to implement.
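A minimal sketch of the ramp-filter step, assuming the common ordering in which each 1-D projection is filtered before it is back projected, and assuming the standard discrete Ram-Lak kernel with unit sample spacing. The slides do not say which discretization was used, so the function names and the kernel choice here are illustrative only.

```python
import math

def ram_lak_kernel(half_width):
    """Discrete spatial-domain ramp (Ram-Lak) kernel for unit sample spacing."""
    kernel = []
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel.append(0.25)                          # centre tap
        elif n % 2 == 0:
            kernel.append(0.0)                           # even taps vanish
        else:
            kernel.append(-1.0 / (math.pi ** 2 * n ** 2))  # odd taps are negative
    return kernel

def filter_projection(projection, kernel):
    """Convolve one 1-D projection with the kernel (zero padding at the edges)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(projection)):
        acc = 0.0
        for k, h in enumerate(kernel):
            j = i - (k - half)                           # source sample index
            if 0 <= j < len(projection):
                acc += h * projection[j]
        out.append(acc)
    return out

# Filtering an impulse reproduces the kernel: a positive peak with negative side lobes.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
filtered = filter_projection(impulse, ram_lak_kernel(3))
```

The negative side lobes are what cancel the blur that plain back projection leaves around each point.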
![Page 6: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/6.jpg)
Maximum Likelihood-Expectation Maximization Algorithm
Iteratively reduces noise in the reconstruction
An iterative algorithm is used to solve the following linear problem:
FX = P
P – vector of projection data
X – voxelized image
F – projection matrix operator
Needs a large number of iterations to reconstruct an image
![Page 7: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/7.jpg)
EM Algorithm
The EM algorithm is given by
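A standard form of the ML-EM update, written with an index convention assumed here so that it matches the notes on this slide (i and k index voxels, j indexes detector bins, n is the iteration number, and F, X, P are as in FX = P):

```latex
X_i^{(n+1)} \;=\; \frac{X_i^{(n)}}{\sum_{j} F_{ji}}
\sum_{j} F_{ji}\,\frac{P_j}{\sum_{k} F_{jk}\,X_k^{(n)}}
```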
The summation over k is the projection operation
The summation over j is the back projection operation
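The two summations can be sketched as separate functions inside a small ML-EM loop. This is a serial illustration on a toy 3-bin, 2-voxel problem, not the presenters' code; the matrix `F`, the data `P`, and the function names are made up for the example.

```python
def forward_project(F, x):
    """Projection: for each bin j, sum over voxels k of F[j][k] * x[k]."""
    return [sum(f_jk * x_k for f_jk, x_k in zip(row, x)) for row in F]

def back_project(F, q):
    """Back projection: for each voxel i, sum over bins j of F[j][i] * q[j]."""
    n_vox = len(F[0])
    return [sum(F[j][i] * q[j] for j in range(len(F))) for i in range(n_vox)]

def mlem(F, P, n_iter):
    """Multiplicative ML-EM update, repeated n_iter times."""
    x = [1.0] * len(F[0])                      # uniform positive starting image
    sens = back_project(F, [1.0] * len(F))     # sensitivity: sum over j of F[j][i]
    for _ in range(n_iter):
        est = forward_project(F, x)            # estimated projections
        ratio = [p / e if e > 0 else 0.0 for p, e in zip(P, est)]
        corr = back_project(F, ratio)          # back-projected correction
        x = [x_i * c / s if s > 0 else 0.0 for x_i, c, s in zip(x, corr, sens)]
    return x

# Toy system: two voxels seen by three bins; the data are consistent with X = [2, 3].
F = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
P = [2.0, 3.0, 5.0]
x = mlem(F, P, 50)
```

Each iteration needs one projection and one back projection through the full system matrix, which is why both the iteration count and the cost of generating F matter.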
![Page 8: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/8.jpg)
System Matrix
Maps the image space to the data space
Takes detector geometry as input
Generates detector data for every bin for each angle (usually there are 72 angles/frames)
![Page 9: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/9.jpg)
System Matrix Algorithm
for each angle DO // number of angles = 72
  for each detector bin in U direction DO // bins: around 14
    for each detector bin in V direction DO // bins: around 64
      for each row in the inverse cone grid DO // <= 99
        for each column in the inverse cone grid DO // <= 99
          for each voxel intersected by the ray DO
            calculate point response
          end
        end
      end
    end
  end
end
Number of loop iterations (rays traced, upper bound) = 72 x 14 x 64 x 99 x 99 = 632,282,112
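The count can be checked with a few lines of arithmetic. The bounds below are the approximate values quoted in the loop comments; the variable names are only for illustration.

```python
import math

# Loop bounds from the slide: angles, U bins, V bins, cone-grid rows, cone-grid columns.
N_ANGLES, N_U, N_V, N_ROWS, N_COLS = 72, 14, 64, 99, 99

total_rays = math.prod((N_ANGLES, N_U, N_V, N_ROWS, N_COLS))
bins_per_angle = N_U * N_V                  # detector bins handled at one angle
rays_per_angle = total_rays // N_ANGLES     # rays traced at one angle

print(f"total rays:     {total_rays:,}")
print(f"bins per angle: {bins_per_angle:,}")
print(f"rays per angle: {rays_per_angle:,}")
```

The innermost voxel loop runs once per ray, so the real work is larger than this count by the average number of voxels each ray crosses.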
![Page 10: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/10.jpg)
System Matrix Parallelization
Observation: At each angle, the calculations for each bin are independent of those for the other bins.
Proposal: Parallelize all bin calculations within each angle.
E.g., use a GPU.
![Page 11: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/11.jpg)
System Matrix Parallelization on GPU
![Page 12: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/12.jpg)
Parallelized System Matrix Algorithm
Host Program:
for each angle DO
  run all kernels for all bins at the same time
end
GPU Kernel:
for each voxel intersected by the ray DO
  calculate attenuation and store it in SysMat
end
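The host/kernel split above can be emulated on a CPU to show its structure. In this sketch a thread pool stands in for the GPU, each task plays the role of one work-item handling one detector bin, and the per-bin computation is a hypothetical placeholder rather than a real ray trace. The names `kernel`, `host_program`, and `sys_mat` are illustrative and not taken from the original code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

N_U, N_V = 14, 64   # detector bins in U and V, as quoted for the serial loop

def kernel(angle, global_id):
    """Stand-in for one work-item: processes one detector bin at one angle."""
    u, v = divmod(global_id, N_V)            # recover (u, v) from the flat id
    # Placeholder for "for each voxel intersected by the ray: calculate attenuation".
    value = float(angle * N_U * N_V + global_id)
    return (angle, u, v), value

def host_program(n_angles, max_workers=8):
    """Serial loop over angles; all bins of one angle are dispatched together."""
    sys_mat = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for angle in range(n_angles):
            results = pool.map(kernel, repeat(angle), range(N_U * N_V))
            for key, value in results:       # gather results for this angle
                sys_mat[key] = value
    return sys_mat

sys_mat = host_program(n_angles=3)
```

Python threads give no real speed-up for this kind of work; the point is only that no bin reads another bin's result, so the 14 x 64 tasks of one angle can run in any order or all at once.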
![Page 13: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/13.jpg)
SIMD (Architecture of GPU)
From: Advanced Micro Devices (AMD), Inc., 2010, Introduction to OpenCL Programming
![Page 14: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/14.jpg)
OpenCL
Based on ISO C99 with some extensions & restrictions
Provides parallel computing using task-based and data-based parallelism
Architecture:
Host Program
Kernel
![Page 15: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/15.jpg)
Program Architecture
Host Program
Executes on the host system.
Sends kernels to execute on OpenCL™ devices using a command queue.
Kernels
Similar to a C function.
Executed on OpenCL™ devices (GPU).
![Page 16: Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes](https://reader035.vdocuments.us/reader035/viewer/2022062322/56649ec95503460f94bd6d04/html5/thumbnails/16.jpg)
Thank You