high performance sar image on commodity multicore

10
High Performance SAR Image Formation On Commodity Multicore Architectures Daniel S. McFarlin Franz Franchetti Markus Püschel José M. F. Moura

Upload: others

Post on 28-Apr-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

High Performance SAR Image Formation On Commodity Multicore Architectures

Daniel S. McFarlinFranz FranchettiMarkus Püschel

José M. F. Moura

Page 2: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Challenge: High Performance Mapping of Algorithms to Highly Parallel Hardware

1 2 4 4 5 1 1 3

6 3 5 7

+ + + +

core core

cache

memory

Difficult for General Purpose C Toolchain

Page 3: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Spiral’s Automatically GeneratedPFA SAR Image Formation Code

44 43

0

10

20

30

40

50

SAR Image Formation on Intel platformsperformance [Gflop/s]

3.0 GHz Core 2 (65nm)

3.0 GHz Core 2 (45nm)

2.66 GHz Core i7

3.0 GHz Core i7 (Virtual)

newerplatforms

16 Megapixels 100 Megapixels

Algorithm by J. Rudin (best paper award, HPEC 2007): 30 Gflop/s on Cell

Each implementation: vectorized, threaded, cache tuned, ~13 MB of code

Code was not written by a human

Page 4: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Spiral: A Domain Specific Program Generator

constant foldingscheduling……

Transformuser specified

C Code:

Fast algorithmin SPLmany choices

∑-SPL:

Iteration of this process

to search for the fastest

But that’s not all …

parallelizationvectorization

loop optimizations

constant foldingscheduling……

Optimization at allabstraction levels

Iteration of this process to search for the fastest

Page 5: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Spiral Formula Representation of SAR

Grid

Compute

Range

Interpolation

Azimuth

Interpolation2D FFT

Page 6: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Parallelization through RewritingVectorization:Threading:

GPUs: Verilog for FPGAs:

Rigorous, correct by construction

Overcomes compiler limitations

Page 7: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Domain Specific FFT

1D FFT1D IFFT

Segmented Interpolationk segments of length r, with u–fold upsamping

Pruned FFT can reduce dominant Interpolation opcount by up to 15%

Page 8: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Performance Results

0.5

3.33

0

2

4

6

8

SAR Image Formation on Intel platformsruntime [sec]

3.0 GHz Core 2 (65nm)

3.0 GHz Core 2 (45nm)

2.66 GHz Core i7

3.0 GHz Core i7 (Virtual)

newerplatforms

16 Megapixels 100 Megapixels

Page 9: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Performance Results

14.4

53.1

0

10

20

30

40

50

60

SAR Image Formation on Intel platformsPercentage speedup of 2 MB pages over 4K pages

3.0 GHz Core 2 (65nm)

3.0 GHz Core 2 (45nm)

2.66 GHz Core i7

newerplatforms

16 Megapixels 100 Megapixels

Page 10: High Performance SAR Image On Commodity Multicore

Carnegie Mellon

Conclusions and Future Work

Spiral generated SAR Image Formation performancecomparable to hand-tuned code on the Cell

SAR generation for non-released platforms AVX

Larrabee

DFT on Larrabee

`

`Not actual data (NDA)