implementing the fft on gpus tips & trickscomputer-graphics.se/multicore/pdf/2013/12a fft on...

23
1 LINKÖPING UNIVERSITY Linköping, 2013 IMPLEMENTING THE FFT ON GPUs TIPS & TRICKS Department of Electrical Engineering Mario Garrido Gálvez [email protected]

Upload: others

Post on 14-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

1

LINKÖPING UNIVERSITY

Linköping, 2013

IMPLEMENTING THE FFT ON GPUs

TIPS & TRICKS

Department of Electrical

Engineering

Mario Garrido Gálvez [email protected]

Page 2: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

2

MARIO GARRIDO

Associate Professor at ES, ISY

PhD in Electrical Engineering (Spain).

Research background:

Optimized implementation of signal processing algorithms.

Transforms (FFT, STFT,…), statistical operations

(regressions, median filter,…).

Data management (matrix transposition, interleavers,…).

Hardware designer (FPGAs, ASICs,…).

Page 3: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

3

A STORY ABOUT GPUs

Page 4: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

4

< 2011

Mainly FFTs on FPGAs.

Hundreds of papers in the topic since the 70’s.

Is not everything done???

OPTIMIZE

OPTIMIZE

OPTIMIZE

Page 5: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

5

DFT / FFT

Discrete Fourier Transform / Fast Fourier Transform.

The most widely used algorithm in signal processing

- Audio and Image Processing. - 3G, 4G.

- Medical applications: EEG, ECG. - ADSL.

Page 6: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

6

…,2011,…

FFT FPGA FFT FFT FPGA FPGA FFT FPGA

FPGA FFT FPGA FFT FFT FPGA FPGA FFT…

I should do something new!!

What about GPUs?…Shouldn’t it be the same…

Page 7: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

7

FPGA vs GPU

FPGA GPU

Altera Cyclone II NVIDIA Fermi

Page 8: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

8

…,2011,…

Started Master Thesis (Sreehari Ambuluri): FFTs on GPUs.

Read articles and a book on GPUs.

Asked Ingemar, Jens, Gabriel.

Page 9: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

9

…,2012,…

Finish the Master Thesis.

The work is good.

Why not to improve it and

publish a paper?

Asked Ingemar, Jens and

Gabriel for collaboration.

Page 10: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

10

…,2013,…

Page 11: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

11

…,2013 BEST PAPER AWARD

Page 12: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

12

NVIDIA

NVIDIA WE

Page 13: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

13

LEVEL OF ABSTRACTION

The compiler decides We decide

COMPILATION

COMPILATION

DESCRIPTION

OPTIMIZATION

&

DESCRIPTION

ALGORITHM ALGORITHM =

IMPLEMENTATION IMPLEMENTATION <

DE

SIG

N P

RO

CE

SS

High level of abstraction Low level of abstraction

Page 14: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

14

ABSTRACTION vs PERFORMANCE

z = 15 * x;

z = (x<<3)+(x<<2)+(x<<1) +x;

z = (x<<4)-x;

x z

15

x << 3

z

<< 2

<< 1

x << 4

z

LANGUAGE

DESCRIPTION HARDWARE

IMPLEMENTATION

Page 15: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

15

UNDERSTANDING GPUs

1.- The performance is related to the computation time. The lower

the computation time, the higher the performance. Try to simplify

the operations in the algorithm.

2.- Transactions to global memory very expensive. Try to avoid or

minimize. Try to use shared memory.

3.- Threads must be synchronized if we want to share information

among them. Unless they are in the same warp. Try to reduce the

number of synchronization points.

4.- We have to calculate the index of the data processed by each

thread. Try to minimize the number of index calculations.

5.- Threads process data in parallel and the synchronization is not

possible until all the threads have finished the calculations. Balance

the load among thread.

Page 16: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

FFT FLOW GRAPH (RADIX -2)

4 03

7

6

5

4

0

07

15

11

2

3

1

0

0

4

0

0

0

0

0

0

4

6 4

0

2

0 0

0

5

13

1

9

6

14

2

10

0

0

0

0

0

0

4

0

0

0 0

0

4

12

0

8

14

15

13

12

10

11

8

9

7

6

1

5

4

3

2

0

STAGE 1 STAGE 2 STAGE 3 STAGE 4

4

6

2

0

0

0

0

0

x[n]

n

X[k]

k

logrN

stages

butterflies

(radix-2)

rotations

Nj

e

2

N points

16

Page 17: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

1. SIMPIFY THE ALGORITHM

17

USE RADIX-22

Page 18: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

2. USE SHARED MEMORY

18

Page 19: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

3. REDUCE SYNC. POINTS 2-word group 4-word group

USE WORD GROUPS

19

Page 20: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

4. REDUCE INDEX CALCULATIONS

USE CONSTANT GEOMETRY

Conventional flow graph Constant Geometry

20

Page 21: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

5. BALANCE LOAD AMONG THREADS

USE SCHEDULING

Unbalanced scheduling Balanced scheduling

21

Page 22: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

CONCLUSIONS

22

Optimization:

- Depends on the details and the level of abstraction.

- Requires to understand in-depth what you are doing.

Teamwork makes a difference.

GPUs are fun.

Page 23: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18

23