evolution of the modern graphics architectures with a focus on gpus | turing100@persistent
DESCRIPTION
Sanjiv Satoor, Sr. Manager, NVIDIA talks about evolution of the modern graphics architectures with a focus on GPUs.TRANSCRIPT
Evolution of Graphics Architectures
with a focus on GPUs
Sanjiv Satoor
Senior Manager, NVIDIA
First Generation - Wireframe
Vertex: transform, clip, and project
Rasterization: lines only
Pixel: no pixels! calligraphic display
Dates: prior to 1987
Storage Tube Terminals
CRTs with analog charge “persistence”
Accumulate a detailed static image by writing points or
line segments
Erase the stored image to start a new one
Early Framebuffers
By the mid-1970’s one could afford framebuffers with a
few bits per pixel at modest resolution
“A Random Access Video Frame Buffer”,
Kajiya, Sutherland, Cheadle, 1975
Vector displays were still better for fine position detail
Framebuffers were used to emulate storage tube vector
terminals on a raster display
Second Generation – Shaded Solids
Vertex: lighting
Rasterization: filled polygons
Pixel: depth buffer, color blending
Dates: 1987 - 1992
Third Generation – Texture Mapping
Vertex: more, faster
Rasterization: more, faster
Pixel: texture filtering, antialiasing
Dates: 1992 - 2001
IRIS 3000 Graphics Cards
Geometry Engines & Rasterizer 4 bit / pixel Framebuffer
(2 instances)
1990’s
Desktop 3D workstations under $5000
Single-board, multi-chip graphics subsystems
Rise of 3D on the PC
40 company free-for-all until intense competition knocked out all but a
few players
Many were “decelerators”, and easy to beat
Single-chip GPUs
Interesting hardware experimentation
PCs would take over the workstation business
Interesting consoles
3DO, Nintendo, Sega, Sony
1998 1999 2000 2001 2002 2003 2004
DirectX 6
Multitexturing
Riva TNT
DirectX 8
SM 1.x
GeForce 3 Cg
DirectX 9
SM 2.0
GeForceFX
DirectX 9.0c
SM 3.0
GeForce 6
DirectX 5
Riva 128
DirectX 7
T&L TextureStageState
GeForce 256
Quake 3 Giants Halo Far Cry UE3 Half-Life
All images © their respective owners
Moving toward programmability
RIVA 128 3M xtors
GeForce 256 23M xtors
GeForce FX 250M xtors
GeForce 8800 681M xtors
GeForce 3 60M xtors
“Kepler” 7B xtors
1995 2000 2001 2006 2012
Fixed function Programmable shaders CUDA
2003
Evolution of GPUs
Copyright © NVIDIA Corporation 2006 Unreal © Epic
Per-Vertex Lighting No Lighting Per-Pixel Lighting
Lush, Rich Worlds Stunning Graphics Realism
Core of the Definitive Gaming Platform Incredible Physics Effects
Hellgate: London © 2005-2006 Flagship Studios, Inc. Licensed by NAMCO BANDAI Games America, Inc.
Crysis © 2006 Crytek / Electronic Arts
Full Spectrum Warrior: Ten Hammers © 2006 Pandemic Studios, LLC. All rights reserved. © 2006 THQ Inc. All rights reserved.
Tradition Fixed Function Graphics pipeline
T&L evolved to vertex shading
memory
interface
vertex
processing
triangle
setup
pixel
processing
raster
operations
Triangle, point, line setup
Flat shading, texturing eventually pixel shading
Blending, Z-buffering, Antialiasing
Wider and faster over the years
Processor per function
Migration of functionality to GPU hardware
GeForce3/DX8 Pixel Shading Pipeline
Programmable Shaders: GeForceFX (2002)
Vertex and fragment operations specified in small (macro) assembly language
User-specified mapping of input data to operations
Limited ability to use intermediate computed values to index input data (textures and vertex uniforms)
Input 2Input 1Input 0
OP
Temp 2Temp 1Temp 0
ADDR R0.xyz, eyePosition.xyzx, -f[TEX0].xyzx;
DP3R R0.w, R0.xyzx, R0.xyzx;
RSQR R0.w, R0.w;
MULR R0.xyz, R0.w, R0.xyzx;
ADDR R1.xyz, lightPosition.xyzx, -f[TEX0].xyzx;
DP3R R0.w, R1.xyzx, R1.xyzx;
RSQR R0.w, R0.w;
MADR R0.xyz, R0.w, R1.xyzx, R0.xyzx;
MULR R1.xyz, R0.w, R1.xyzx;
DP3R R0.w, R1.xyzx, f[TEX1].xyzx;
MAXR R0.w, R0.w, {0}.x;
GeForce 6 Architecture
Unified Hardware Shader Design
L2
FB
SP SP
L1
TF
Th
read
Pro
cesso
r
Vtx Thread Issue
Setup / Rstr / ZCull
Geom Thread Issue Pixel Thread Issue
Input Assembler
Host
SP SP
L1
TF
SP SP
L1
TF
SP SP
L1
TF
SP SP
L1
TF
SP SP
L1
TF
SP SP
L1
TF
SP SP
L1
TF
L2
FB
L2
FB
L2
FB
L2
FB
L2
FB
GeForce 8 Architecture Build the architecture around the processor
Millions of triangles Millions of pixels
Why are so many parallel operations needed?
Input triangle Tessellate Projection Rasterize Shade Transform vertices
Image plane
Camera
GPU = More computational horsepower and
bandwidth per watt
Few complex processors
Optimized for single-threaded performance
Many simple processors with minimal overhead
Slow single-threaded performance but massive overall throughput
GPU Architecture
Efficiency
Programmability
Performance
GPU Architecture:
Two Main Components
Streaming Multiprocessors (SMs)
Perform the actual computations
Each SM has its own:
Control units, registers, execution pipelines, caches
Global memory
Analogous to RAM in a CPU server
Accessible by both GPU and CPU
Currently up to 6 GB per GPU
Bandwidth currently up to 250 GB/s
DR
AM
I/F
G
iga
Th
rea
d
HO
ST
I/F
D
RA
M I
/F
DR
AM
I/F
DR
AM
I/F
DR
AM
I/F
DR
AM
I/F
L2
KEPLER
The Fastest, Most Efficient GPU Ever Built
Kepler GK110 Architecture
7.1B Transistors
14 SMX units
3.95 TFLOP FP32
1.31 TFLOP FP64
250 GB/sec
2688 cores
PCI Express Gen3
WORLD’S #1 SUPERCOMPUTER With a peak performance of 27 petaflops, the Titan supercomputer at Oak Ridge National Labs is the world’s fastest. 18,688 GPUs provide 90% of the machine’s computing power.
The Graphics pipeline
Vertex and fragment processing are programmable
The programmer can write programs that are executed for every vertex as well as for every fragment
This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications
host
interface
vertex
processing
triangle
setup
pixel
processing
memory
interface
Thank you