Universität Hamburg

MIN-Fakultät, Fachbereich Informatik

Introduction to GPU Computing

Matthis Hauschild

Universität Hamburg, Fakultät für Mathematik, Informatik und Naturwissenschaften, Fachbereich Informatik

Technische Aspekte Multimodaler Systeme

December 4, 2014

M. Hauschild - Introduction to GPU Computing 1

Table of Contents

1. Architecture of a GPU

2. General-purpose computing on GPUs

3. Applications of GPGPU

4. Performance evaluation examples

Architecture of a GPU

What is a GPU

- Graphics processing unit
- Main GPU manufacturers:
  1. Intel
  2. AMD
  3. Nvidia
- Performance characteristics (based on the AMD Radeon R9 series, cf. [1]):
  - Manufacturing process: 28 nm
  - GPU clock speed: ∼ 1 GHz
  - Memory amount: 8 GiB GDDR5
  - Memory bandwidth: 640 GiB/s

Difference between GPU and CPU [3]

- CPU: optimized for single-thread execution
- GPU: optimized for executing the same operation on many data elements at once

Architecture of a GPU [4]

Based on the Nvidia Fermi architecture.

Summary of the Nvidia Fermi architecture:

- 16 Streaming Multiprocessors (SM)
- 32 CUDA cores per SM
- 16 × 32 = 512 CUDA cores ⇒ 512 FMA operations per clock
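
The arithmetic behind this summary can be written out as a small Python sketch. The SM and core counts come from the slide; the clock value `ASSUMED_CLOCK_GHZ` is a hypothetical placeholder chosen for illustration, not a figure from the talk or from a specific card. The factor of 2 reflects that one fused multiply-add (FMA) counts as a multiply plus an add.

```python
# Fermi figures taken from the summary above.
SM_COUNT = 16
CORES_PER_SM = 32

# Hypothetical clock, used only to illustrate the calculation.
ASSUMED_CLOCK_GHZ = 1.0


def cuda_cores(sm_count, cores_per_sm):
    """Total number of CUDA cores on the chip."""
    return sm_count * cores_per_sm


def peak_gflops(fma_per_clock, clock_ghz):
    """Theoretical peak in GFLOP/s: each FMA is two floating-point operations."""
    return fma_per_clock * 2 * clock_ghz


cores = cuda_cores(SM_COUNT, CORES_PER_SM)
print(cores)                                  # 512
print(peak_gflops(cores, ASSUMED_CLOCK_GHZ))  # 1024.0
```

This is a theoretical upper bound only; real kernels are usually limited by memory bandwidth or by how well the work maps onto the cores.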

⇒ It is great for generating graphics, but what else could be done with it?

General-purpose computing on GPUs

What is GPGPU [5]

- General-purpose computing on graphics processing units
- Using the GPU for non-graphical computations
  - Good for data parallelism
  - Bad for instruction parallelism
- First use in LU factorization
- Became popular in 2001 with matrix multiplication
- Started out using the graphics APIs DirectX and OpenGL

GPGPU Frameworks

- Brook – One of the earliest GPU frameworks, by Stanford University
- CUDA – Proprietary Nvidia-only framework
- OpenCL – Open-standard, vendor-neutral framework by the Khronos Group
- C++ AMP – Open C++ extension by Microsoft
- OpenACC – C, C++ and Fortran extension
- ArrayFire – Wrapper for CUDA, OpenCL, etc.

Applications of GPGPU

General applications of GPGPU

Again, GPGPU can only be superior to CPU computing if the same algorithm is applied to a lot of data (data parallelism). For example:

- k-nearest neighbor
- Fast Fourier Transform
- Segmentation
- Audio processing
- CT reconstruction
- Weather forecasting
- Cryptography
- Database operations
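
To make "the same algorithm applied to a lot of data" concrete, here is a minimal CPU-side Python sketch of the first example, k-nearest neighbor. It is not the code from the talk; the function names and the sample points are made up for illustration. The point is that every distance depends only on its own input point, so on a GPU each one could be computed by a separate work-item.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_nearest(points, query, k):
    """Return the indices of the k points closest to the query."""
    # Data-parallel step: each distance is independent of all the others.
    distances = [squared_distance(p, query) for p in points]
    # Selection step: pick the k smallest distances.
    order = sorted(range(len(points)), key=lambda i: distances[i])
    return order[:k]


# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(k_nearest(points, (0.0, 0.1), 2))  # [0, 3]
```

Only the distance step maps directly onto independent GPU threads; the selection step needs a parallel sort or reduction, which this sketch does not attempt.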

Applications of GPGPU in Robotics [2]

For example:

- Many image processing tasks in general
- Frame transformation
- Inverse kinematics calculation
- 3D pose estimation
- Point-set registration

Performance evaluation examples

Test 1

- Sobel operator on a real image using OpenCL
- Measurement of the achievable frames per second
- On GPU and CPU

Test 2

- Matrix multiplication of two square matrices using OpenCL
- Measurement of the time needed for the calculation
- On GPU and CPU
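
Both tests boil down to timing a repeated workload. The following Python sketch shows one way such a measurement could be set up. It is an assumption about the general approach, not the benchmark code used for the talk; `measure` and `dummy_workload` are hypothetical names, and the workload stands in for one Sobel frame or one matrix multiplication.

```python
import time


def measure(task, repetitions=10):
    """Run a zero-argument callable several times.

    Returns (seconds per run, runs per second).
    """
    start = time.perf_counter()
    for _ in range(repetitions):
        task()
    elapsed = time.perf_counter() - start
    seconds_per_run = elapsed / repetitions
    runs_per_second = repetitions / elapsed if elapsed > 0 else float("inf")
    return seconds_per_run, runs_per_second


def dummy_workload():
    """Placeholder for processing one frame or multiplying one matrix pair."""
    sum(i * i for i in range(10_000))


seconds, fps = measure(dummy_workload)
print(f"{seconds * 1e3:.3f} ms per run, {fps:.1f} runs per second")
```

For a GPU run, the timed region should also cover copying data to and from the device if the end-to-end frame rate is what matters.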

Performance evaluation examples - System characteristics

- My CPU:
  - Model: AMD Phenom II X4 965
  - Clock speed: 3400 MHz
  - Misc: 4 cores, SSE3
- My GPU:
  - Model: AMD Radeon HD 6950
  - Memory: 2048 MB
  - Core clock: 800 MHz
  - Memory clock: 1250 MHz
  - Memory bandwidth: 160 GB/s
- My RAM: 8 GB

Performance evaluation examples - Test 1

The Sobel operator:

3. $s = \sqrt{dx^2 + dy^2}$
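
Steps 1 and 2 of the slide are not preserved in this transcript. The Python sketch below assumes they compute the horizontal and vertical gradients `dx` and `dy` with the standard 3×3 Sobel kernels, and then applies step 3 per pixel. It is a plain CPU illustration, not the OpenCL kernel used in the test; leaving border pixels at zero is a simplification of this sketch.

```python
import math

# Standard 3x3 Sobel kernels (assumed; the slide's own kernels are not preserved).
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_pixel(image, x, y):
    """Gradient magnitude s = sqrt(dx^2 + dy^2) at an interior pixel (x, y)."""
    dx = dy = 0
    for j in range(3):
        for i in range(3):
            value = image[y + j - 1][x + i - 1]
            dx += KX[j][i] * value
            dy += KY[j][i] * value
    return math.sqrt(dx * dx + dy * dy)


def sobel(image):
    """Apply the Sobel operator to a 2D list; border pixels stay 0."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    # Every output pixel is independent, which is what makes this GPU-friendly.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            result[y][x] = sobel_pixel(image, x, y)
    return result


# Tiny test image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
print(sobel(image)[1])  # [0.0, 40.0, 40.0, 0.0]
```

In an OpenCL version, the body of `sobel_pixel` would roughly correspond to the work done by one work-item.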

Performance evaluation examples - Test 2

Matrix multiplication:

[Figure from http://www.mathematrix.de/wp-content/uploads/matrixmul2.png]
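
As a reminder of what is being timed, here is a minimal Python sketch of square matrix multiplication, written so that each output element is computed by its own function call. This mirrors how the work could be split across GPU threads, but it is only an illustration with hypothetical function names, not the OpenCL code from the test.

```python
def matmul_element(a, b, row, col):
    """One output element: dot product of a row of A with a column of B."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    # Each (row, col) pair is independent of all the others.
    return [[matmul_element(a, b, r, c) for c in range(n)] for r in range(n)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

For n × n matrices this performs n² independent dot products of length n, which is why the problem scales well with the number of cores.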

Thank you for your attention!

Matthis Hauschild

Universität Hamburg, Fakultät für Mathematik, Informatik und Naturwissenschaften, Fachbereich Informatik

Technische Aspekte Multimodaler Systeme

Bibliography

[1] AMD. AMD Radeon™ R9 Grafikkartenserie, 2014. http://www.amd.com/de-de/products/graphics/desktop/r9#.

[2] J. Bedkowski and A. Maslowski. GPGPU computation in mobile robot applications. Warsaw University of Technology, 2012.

[3] Nvidia. CUDA C Programming Guide, 2014. http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf.

[4] Nvidia. NVIDIA's Next Generation CUDA Compute Architecture: Fermi, 2014. http://www.nvidia.de/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf.

[5] Wikipedia. General-purpose computing on graphics processing units, 2014. http://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units.