Carnegie Mellon
High Performance SAR Image Formation On Commodity Multicore Architectures
Daniel S. McFarlin, Franz Franchetti, Markus Püschel, José M. F. Moura
Challenge: High Performance Mapping of Algorithms to Highly Parallel Hardware
[Figure: the vector addition (1, 2, 4, 4) + (5, 1, 1, 3) = (6, 3, 5, 7) shown alongside a two-core processor with cache and memory]
Difficult for a general-purpose C toolchain
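One plausible way to map the figure's vector addition onto two cores is to give each core one contiguous chunk, so each core streams through its own region of memory. The sketch below is a hypothetical Python/NumPy illustration of that mapping problem (the name `parallel_add` and the contiguous-chunk policy are assumptions), not Spiral-generated code.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_add(a: np.ndarray, b: np.ndarray, num_cores: int = 2) -> np.ndarray:
    """Compute a + b with each worker handling one contiguous chunk."""
    if a.shape != b.shape:
        raise ValueError("vectors must have the same shape")
    out = np.empty_like(a)
    # num_cores nearly equal, contiguous index ranges [lo, hi).
    bounds = np.linspace(0, a.size, num_cores + 1).astype(int)

    def work(lo: int, hi: int) -> None:
        # NumPy ufuncs on numeric arrays typically release the GIL,
        # so the chunks can be processed concurrently.
        np.add(a[lo:hi], b[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        futures = [pool.submit(work, lo, hi)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


print(parallel_add(np.array([1, 2, 4, 4]), np.array([5, 1, 1, 3])))  # prints [6 3 5 7]
```

Even this trivial kernel already involves choices (chunk size, chunk layout, thread count) that a general-purpose compiler does not explore on its own.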
Spiral’s Automatically Generated PFA SAR Image Formation Code
[Figure: SAR image formation on Intel platforms, performance in Gflop/s for 16 and 100 megapixel images on 3.0 GHz Core 2 (65nm), 3.0 GHz Core 2 (45nm), 2.66 GHz Core i7, 3.0 GHz Core i7 (Virtual), and newer platforms; bar labels 44 and 43 Gflop/s]
Algorithm by J. Rudin (best paper award, HPEC 2007): 30 Gflop/s on Cell
Each implementation: vectorized, threaded, cache tuned, ~13 MB of code
Code was not written by a human
Spiral: A Domain Specific Program Generator
Transform (user specified) → fast algorithm in SPL (many choices) → ∑-SPL → C code
Optimization at all abstraction levels: parallelization, vectorization, loop optimizations, constant folding, scheduling, …
Iteration of this process to search for the fastest
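The "many choices" at the algorithm level come from the freedom to break a transform down recursively, and the search picks among them. Below is a minimal sketch of one search strategy, dynamic programming over Cooley-Tukey breakdowns n = m · k of a DFT. A real autotuner times generated code; here that measurement is replaced by a hypothetical analytic cost model (`base_cost` and the per-step cost formula are assumptions), so the plans it picks say nothing about real hardware.

```python
from functools import lru_cache


def base_cost(n: int) -> int:
    """Hypothetical cost of computing DFT_n directly (no further breakdown)."""
    return n * n


@lru_cache(maxsize=None)
def best_plan(n: int):
    """Return (cost, plan) for DFT_n under the hypothetical cost model.

    A plan is either an int (compute that DFT directly) or a pair
    (plan_m, plan_k) meaning one Cooley-Tukey step n = m * k.
    """
    best = (base_cost(n), n)
    for m in range(2, n):
        if n % m:
            continue
        k = n // m
        cost_m, plan_m = best_plan(m)
        cost_k, plan_k = best_plan(k)
        # k DFTs of size m, m DFTs of size k, plus n twiddle multiplications.
        cost = k * cost_m + m * cost_k + n
        if cost < best[0]:
            best = (cost, (plan_m, plan_k))
    return best


print(best_plan(64))
```

Dynamic programming assumes the best plan for a small DFT stays best when it is embedded in a larger one; with measured runtimes that assumption only holds approximately, which is one reason search strategies beyond plain DP exist.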
Spiral Formula Representation of SAR
Grid Compute → Range Interpolation → Azimuth Interpolation → 2D FFT
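The four stages can be sketched as a data flow on a 2D array of phase-history samples (pulses × samples). The sketch below assumes Grid Compute has already produced the sample coordinates and the uniform output grids, and it uses plain linear interpolation as a stand-in for the actual interpolation kernel; the function and parameter names are hypothetical.

```python
import numpy as np


def _interp(x_new, x_old, values):
    """Linear interpolation of complex samples (np.interp is real-only)."""
    return (np.interp(x_new, x_old, values.real)
            + 1j * np.interp(x_new, x_old, values.imag))


def pfa_image(phase_history, range_pos, az_pos, range_grid, az_grid):
    """Sketch of the PFA stages that follow Grid Compute.

    phase_history: (pulses, samples) complex data.
    range_pos:     (pulses, samples) range coordinate of each sample,
                   increasing along each row.
    az_pos:        (pulses, len(range_grid)) azimuth coordinate of each
                   range-resampled sample, increasing down each column.
    range_grid, az_grid: uniform output grids (hypothetical Grid Compute output).
    """
    pulses = phase_history.shape[0]
    # Range Interpolation: resample every pulse (row) onto the uniform range grid.
    rows = np.stack([_interp(range_grid, range_pos[p], phase_history[p])
                     for p in range(pulses)])
    # Azimuth Interpolation: resample every column onto the uniform azimuth grid.
    cols = np.stack([_interp(az_grid, az_pos[:, j], rows[:, j])
                     for j in range(rows.shape[1])], axis=1)
    # 2D FFT of the now rectangular, uniformly sampled grid.
    return np.fft.fft2(cols)
```

Each stage is a batch of independent 1D operations (per row, then per column, then row and column FFTs), which is the structure a generator can vectorize and thread.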
Parallelization through Rewriting
Targets: vectorization, threading, GPUs, Verilog for FPGAs
Rigorous, correct by construction
Overcomes compiler limitations
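The rewriting operates on Kronecker-product formulas. As an illustration of the kind of identity involved (the specific rewrite rules from the slide are not recoverable here), the sketch below checks the Cooley-Tukey factorization DFT_mn = (DFT_m ⊗ I_n) T^mn_n (I_m ⊗ DFT_n) L^mn_m numerically. The factor I_m ⊗ DFT_n is m independent DFTs on contiguous blocks, a natural fit for threads, while DFT_m ⊗ I_n applies one DFT to n interleaved vectors, a natural fit for SIMD. The permutation and twiddle conventions are spelled out in the code and are assumptions of this sketch.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """DFT_n with entries exp(-2*pi*i*k*l/n), matching np.fft.fft."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def stride_perm(m: int, n: int) -> np.ndarray:
    """Stride permutation L^{mn}_m: (L x)[i*n + j] = x[j*m + i]."""
    N = m * n
    P = np.zeros((N, N))
    for i in range(m):
        for j in range(n):
            P[i * n + j, j * m + i] = 1.0
    return P


def twiddles(m: int, n: int) -> np.ndarray:
    """Diagonal T^{mn}_n with entry exp(-2*pi*i*a*b/(mn)) at position a*n + b."""
    N = m * n
    a, b = np.divmod(np.arange(N), n)
    return np.diag(np.exp(-2j * np.pi * a * b / N))


def cooley_tukey(m: int, n: int) -> np.ndarray:
    """Right-to-left: permute, m DFT_n's, twiddle, then DFT_m across n vectors."""
    return (np.kron(dft_matrix(m), np.eye(n)) @ twiddles(m, n)
            @ np.kron(np.eye(m), dft_matrix(n)) @ stride_perm(m, n))


print(np.allclose(cooley_tukey(4, 3), dft_matrix(12)))  # prints True
```

Because every step is an exact matrix identity, a program assembled from such rewrites is correct by construction; only the choice among rewrites affects speed.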
Domain Specific FFT
1D FFT, 1D IFFT
Segmented interpolation: k segments of length r, with u-fold upsampling
A pruned FFT can reduce the operation count of the dominant interpolation step by up to 15%
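The slide names the parameters (k segments of length r, u-fold upsampling) but not the kernel. The sketch below assumes FFT-based interpolation: each segment is transformed with a length-r FFT, its spectrum is zero-padded to length u·r, and a length-u·r inverse FFT produces the upsampled segment. Pruning is illustrated by evaluating only the requested outputs from only the r nonzero spectral bins; this direct evaluation stands in for a true pruned FFT (which removes unneeded butterflies) and is not Spiral's algorithm. All names are hypothetical.

```python
import numpy as np


def _spectra(x, k, r):
    """FFT of each of the k length-r segments of x."""
    return np.fft.fft(np.asarray(x).reshape(k, r), axis=1)


def upsample_segments(x, k, r, u):
    """Full version: zero-padded inverse FFT of length u*r per segment."""
    X = _spectra(x, k, r)
    h = (r + 1) // 2  # bins 0..h-1 are kept at the front, the rest at the back
    Y = np.zeros((k, u * r), dtype=complex)
    Y[:, :h] = X[:, :h]
    Y[:, u * r - (r - h):] = X[:, h:]
    return u * np.fft.ifft(Y, axis=1)  # ifft divides by u*r; factor u restores scale


def upsample_segments_pruned(x, k, r, u, wanted):
    """Same values, computed only at output indices `wanted` of each segment."""
    X = _spectra(x, k, r)
    h = (r + 1) // 2
    # Signed frequencies of the r nonzero bins of the zero-padded spectrum.
    freqs = np.concatenate([np.arange(h), np.arange(h - r, 0)])
    E = np.exp(2j * np.pi * np.outer(freqs, np.asarray(wanted)) / (u * r))
    return (X @ E) / r  # cost grows with r * len(wanted), not u*r*log(u*r)
```

When only a subset of the u·r outputs per segment is needed, skipping the zero inputs and the unused outputs is where the savings come from.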
Performance Results
[Figure: SAR image formation on Intel platforms, runtime in seconds for 16 and 100 megapixel images on 3.0 GHz Core 2 (65nm), 3.0 GHz Core 2 (45nm), 2.66 GHz Core i7, 3.0 GHz Core i7 (Virtual), and newer platforms; bar labels 0.5 s and 3.33 s]
Performance Results
[Figure: SAR image formation on Intel platforms, percentage speedup of 2 MB pages over 4 KB pages for 16 and 100 megapixel images on 3.0 GHz Core 2 (65nm), 3.0 GHz Core 2 (45nm), 2.66 GHz Core i7, and newer platforms; bar labels 14.4% and 53.1%]
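The slide reports the speedup from 2 MB pages but not how the pages were obtained. The sketch below shows the page-count arithmetic behind the effect (fewer, larger pages need fewer TLB entries, which matters for large-stride passes such as column FFTs) and one way to request transparent huge pages on Linux from Python via `mmap.madvise` (Python 3.8+). The 8 bytes per pixel (single-precision complex) is an assumption, and the helper names are hypothetical.

```python
import mmap


def pages_needed(num_bytes: int, page_size: int) -> int:
    """Number of pages needed to map num_bytes (ceiling division)."""
    return -(-num_bytes // page_size)


def alloc_with_huge_page_hint(num_bytes: int) -> mmap.mmap:
    """Anonymous mapping; on Linux, ask the kernel for transparent huge pages."""
    buf = mmap.mmap(-1, num_bytes)
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # only defined on Linux builds
    if advice is not None and hasattr(buf, "madvise"):
        try:
            buf.madvise(advice)  # a hint only; the kernel may still use 4 KB pages
        except OSError:
            pass  # transparent huge pages unsupported or disabled
    return buf


image_bytes = 16 * 2**20 * 8  # 16 Mi pixels, assuming 8 bytes per pixel
print(pages_needed(image_bytes, 4096), pages_needed(image_bytes, 2 * 2**20))  # prints 32768 64
```

Under these assumptions, one 16-megapixel buffer spans 32768 small pages but only 64 large ones, so far more of it stays covered by the TLB.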
Conclusions and Future Work
Spiral-generated SAR image formation achieves performance comparable to hand-tuned code on the Cell
SAR code generation for unreleased platforms: AVX, Larrabee
DFT on Larrabee
Not actual data (NDA)