Mark Harris Chief Technologist, GPU Computing UNC Ph.D. 2003 A BRIEF HISTORY OF GPGPU


Post on 13-Jul-2020


Mark Harris Chief Technologist, GPU Computing

UNC Ph.D. 2003

A BRIEF HISTORY OF GPGPU


A BRIEF HISTORY OF GPGPU


General-Purpose computation on Graphics Processing Units


THE FIRST GPGPU: IKONAS RDS-3000

  1978: Nick England & Mary Whitton founded Ikonas Graphics Systems

  Tim Van Hook wrote microcode for solid modeling, ray tracing (SIGGRAPH ’86)

  From a 1985 Video: “All computation is taking place in the Adage 3000 Display”

http://www.virhistory.com/ikonas/ikonas.html


UNC PIXEL PLANES AND PIXELFLOW

  Procedural textures on Pixel Planes 5 (Rhodes et al. 1992)

PixelFlow

  100,000+ 100MHz 8-bit processors

  Early real-time programmable shading (Olano/Lastra ‘98)

Kedem et al. (’98) used PixelFlow for Unix password cracking

Proceedings, 1992 Symposium on Interactive 3D Graphics, Cambridge, Massachusetts, 29 March - 1 April 1992 (in cooperation with ACM SIGGRAPH)


GEFORCE 1-3: THE DAWN OF GPGPU (’99-’01)

  GeForce 256: First “GPU”

  GeForce 3: First programmable GPU

  Vertex Shaders – programmable vertex transforms, 32-bit float

  Data-dependent, configurable texturing + register combiners

  Enabled early GPGPU results

  Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2

  Larsen & McAllister (2001): first GPU matrix multiplication (8-bit)

Rumpf & Strzodka (2001): first GPU PDEs (diffusion, image segmentation)

  NVIDIA SDK Game of Life, Shallow Water (Greg James, 2001)


PHYSICALLY BASED SIMULATION ON GEFORCE 3

  Approximate simulation of natural phenomena

 Boiling liquid,

 fluid convection,

 chemical reaction-diffusion

  Inaccurate due to low GPU precision

“Physically-Based Visual Simulation on Graphics Hardware”. Harris, Coombe, Scheuermann, and Lastra. Graphics Hardware 2002


NAMING A TREND

 Let’s name this thing that people are doing!

 I coined “GPGPU” and created home page November 2002

 home on the web to collect research / resources

  Interest grew quickly: launched GPGPU.org August 2003

“Application of graphics hardware to non-graphics applications”

“General computations on graphics hardware”

“Exploiting special-purpose hardware for alternative purposes”


GEFORCE FX (2003): FLOATING POINT PIXELS

  True Programmability enabled broader simulation research

  Ray Tracing (Purcell, 2002), Photon Maps (Purcell, 2003)

Radiosity (Carr et al., 2003 & Coombe et al., 2004)

  PDE solvers

  Red-black Gauss-Seidel (Harris et al., 2003)

  Conjugate gradient (Bolz et al. 2003, Krueger et al. 2003)

Multigrid (Goodnight et al. 2003)

  Physically-based simulation

  Fluid and cloud simulation: (Krueger et al. 2003, Harris et al. 2003)

  Cloth simulation (Green, 2003)

  Ice crystal formation (Kim and Lin, 2003)

  FFT (Moreland and Angel, 2003)

  High-level language: Brook for GPUs (Buck et al. 2004)


GPU CLOUD SIMULATION

  My Ph.D. Dissertation: visually realistic cloud simulation on GPUs

 2D & 3D Incompressible Navier-Stokes fluid

 Thermodynamics (latent heat, diffusion)

 Water condensation / evaporation

 Light scattering simulation for rendering

 Programmed in OpenGL with pixel shaders

“Real-Time Cloud Simulation and Rendering”. Mark Harris Ph.D. Dissertation U. of North Carolina. 2003


CUDA AND THE G80 GPU (2006)

 First GPU arch. and software platform designed for computing

 Dedicated computing mode – threads rather than pixels/vertices

 General, byte-addressable memory architecture

 First C/C++ language and compiler for GPUs

 CUDA C++ defines minimally extended subset of C++ with parallelism

 2007 began a massive surge in GPGPU development

 Not just graphics PhDs

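The "threads rather than pixels/vertices" point on the CUDA slide above can be illustrated without a GPU. The sketch below is a serial C++ emulation, not CUDA code: `ThreadIndex`, `saxpy_kernel`, and `launch_saxpy` are hypothetical names, and the nested loops stand in for threads that real hardware would run concurrently. What it shows is the programming model: the kernel describes the work of one thread, and each thread derives its array index from its block and thread coordinates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the coordinates a GPU thread would be given.
struct ThreadIndex {
    std::size_t block;       // which block this thread belongs to
    std::size_t thread;      // position of the thread inside its block
    std::size_t block_size;  // number of threads per block
};

// "Kernel": the work of ONE thread, computing one element of y = a * x + y.
void saxpy_kernel(ThreadIndex idx, float a,
                  const std::vector<float>& x, std::vector<float>& y) {
    std::size_t i = idx.block * idx.block_size + idx.thread;  // global index
    if (i < x.size() && i < y.size()) {  // threads past the end do nothing
        y[i] = a * x[i] + y[i];
    }
}

// Serial emulation of a kernel launch: visit every (block, thread) pair once.
// Assumption: block_size is chosen by the caller; zero is treated as a no-op.
void launch_saxpy(float a, const std::vector<float>& x, std::vector<float>& y,
                  std::size_t block_size) {
    if (block_size == 0) return;
    std::size_t blocks = (x.size() + block_size - 1) / block_size;  // round up
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < block_size; ++t) {
            saxpy_kernel(ThreadIndex{b, t, block_size}, a, x, y);
        }
    }
}
```

Rounding the block count up means some emulated threads fall past the end of the arrays, which is why the kernel carries its own bounds check.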

ACCELERATING DISCOVERIES

USING A SUPERCOMPUTER POWERED BY 3,000 TESLA PROCESSORS, UNIVERSITY OF ILLINOIS SCIENTISTS PERFORMED THE FIRST ALL-ATOM SIMULATION OF THE HIV VIRUS AND DISCOVERED THE CHEMICAL STRUCTURE OF ITS CAPSID — “THE PERFECT TARGET FOR FIGHTING THE INFECTION.” WITHOUT GPUS, THE SUPERCOMPUTER WOULD NEED TO BE 5X LARGER FOR SIMILAR PERFORMANCE.


FROM HPC TO ENTERPRISE DATACENTERS

Government, Supercomputing, Finance, Higher Ed, Oil & Gas, Consumer Web

Air Force Research Laboratory

Naval Research Laboratory

Tokyo Institute of Technology


MACHINE LEARNING USING DEEP NEURAL NETWORKS

Input Result

Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009

Visual Object Recognition Using Deep Convolutional Neural Networks Rob Fergus (New York University / Facebook) http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985


GPU-ACCELERATED DEEP LEARNING

START-UPS

Image Detection

Face Recognition

Gesture Recognition

Video Search & Analytics

Speech Recognition & Translation

Image and Video Understanding

Recommendation Engines

Indexing & Search


COMMON PROGRAMMING APPROACHES Across Heterogeneous Architectures

x86

Libraries

Programming Languages

Compiler Directives

AmgX cuBLAS


UNIFIED MEMORY

Dramatically Lower Developer Effort

Past Developer View

System Memory

GPU Memory

Developer View With Unified Memory

Unified Memory


PARALLELISM IN MAINSTREAM LANGUAGES

  Enable more programmers to write parallel software

  Give programmers the choice of language to use

  Parallel computing support in key languages

C


FUTURE C++: PARALLEL STL

  Complete set of parallel primitives: for_each, sort, reduce, scan, etc.

  ISO C++ committee voted unanimously to accept as official tech. specification working draft

N3960 Technical Specification Working Draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4352.html

Prototype: https://github.com/n3554/n3554

std::vector<int> vec = ...

// previous standard sequential loop
std::for_each(vec.begin(), vec.end(), f);

// explicitly sequential loop
std::for_each(std::seq, vec.begin(), vec.end(), f);

// permitting parallel execution
std::for_each(std::par, vec.begin(), vec.end(), f);
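The slide names reduce and scan among the parallel primitives but only shows for_each. As a rough illustration, the sketch below calls the forms of those primitives that were later standardized in C++17 (`std::reduce` and `std::inclusive_scan` in `<numeric>`). It deliberately uses the overloads without an execution policy so that it builds without a parallel backend; in C++17 the policies from the slide are spelled `std::execution::seq` and `std::execution::par` and would be passed as the first argument. The wrapper names `sum_with_reduce`, `running_totals`, and `sorted_copy` are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// reduce: combine all elements into one value. Unlike std::accumulate, the
// order of combination is unspecified, which is what allows parallel
// implementations to split the work.
int sum_with_reduce(const std::vector<int>& v) {
    return std::reduce(v.begin(), v.end(), 0);
}

// scan (inclusive): out[i] holds the sum of v[0] through v[i].
std::vector<int> running_totals(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::inclusive_scan(v.begin(), v.end(), out.begin());
    return out;
}

// sort: shown on a copy so the caller's data is left untouched.
std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

Because integer addition is associative and commutative, `sum_with_reduce` gives the same result regardless of evaluation order; with floating-point data a reordered reduction may differ slightly from a sequential sum.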


PARTING WORDS OF WISDOM

  Stand Up!

  “Keep it narrow and doable” – Fred Brooks

  “Write a little bit every day” – Fred Brooks

  “If you measure it, you can improve it” – Jen-Hsun Huang

  “But you have to measure the right thing!”


1999: ADVENT OF THE GPU

NVIDIA GeForce 256

Coined the term “Graphics Processing Unit”

“A single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second.”

Register Combiners

configurable multipass shading

Beginning of GPU programmability

[Diagram: register combiners pipeline. Texture Fetch feeds a Register Set (Texture 0, Texture 1, Fog Color/Factor, Specular Color, Spare 0). General Combiner 0 and General Combiner 1 each take 4 RGB inputs and 4 alpha inputs and produce 3 RGB outputs and 3 alpha outputs. The Final Combiner takes 6 RGB inputs and 1 alpha input and produces the Fragment Color.]


EARLY PC & WORKSTATION GRAPHICS

Rasterizer, texture unit, z-buffer, frame buffer

Fixed-point math, fixed-function interpolation / texturing

Lengyel et al. (1990) – Robot motion planning

Use rasterizer to fill Minkowski sum polygons

HP 9000 TurboSRX Workstation

Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2

Render cones – rasterizer and z-buffer compute Voronoi diagram
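The cone trick can be emulated on the CPU to show why it works: a cone centered on a site has a depth at each pixel equal to that pixel's distance from the site, so a z-buffer that keeps the smallest depth ends up storing the nearest site per pixel. The sketch below is an illustrative software version of that idea, not Hoff's implementation; the `Site` type, the function name `discrete_voronoi`, and the choice to sample at pixel centers are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical site type: a point in pixel coordinates.
struct Site {
    double x;
    double y;
};

// Returns, for each pixel of a width x height image (row-major), the index
// of the nearest site, or -1 if there are no sites.
std::vector<int> discrete_voronoi(std::size_t width, std::size_t height,
                                  const std::vector<Site>& sites) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> depth(width * height, inf);  // emulated z-buffer
    std::vector<int> owner(width * height, -1);      // emulated color buffer

    // "Draw" one cone per site: the cone's depth at a pixel is the distance
    // from the pixel center to the site.
    for (std::size_t s = 0; s < sites.size(); ++s) {
        for (std::size_t py = 0; py < height; ++py) {
            for (std::size_t px = 0; px < width; ++px) {
                double d = std::hypot(px + 0.5 - sites[s].x,
                                      py + 0.5 - sites[s].y);
                std::size_t idx = py * width + px;
                if (d < depth[idx]) {  // depth test: keep the nearest surface
                    depth[idx] = d;
                    owner[idx] = static_cast<int>(s);
                }
            }
        }
    }
    return owner;
}
```

On graphics hardware the inner two loops are what the rasterizer does when it fills a cone, and the `if` is the z-buffer comparison, which is how fixed-function hardware could compute the diagram without any programmable stage.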