Evolution of the Programmable Graphics Pipeline
Patrick Cozzi
University of Pennsylvania
CIS 565 - Spring 2011
Administrivia
• Tip: google “cis 565”
• Slides posted before each class
• Tentative assignment dates on website
• 1st assignment handed out today
  • Write concisely
  • Due start of class, one week from today
• Google group in progress
• FYI: GDC Early Registration - 01/24
Survey Results
• 15/23 – graphics experience
• Most students have usable video cards
• Lerk – don’t be scared
• I want to be a Toys R Us kid too
Survey Results
• Class interests
  • Pure architecture
  • Game rendering
  • Physical simulations
  • Animation
  • Vision algorithms
  • Image/video processing
  • …
Course Roadmap
• Graphics Pipeline (GLSL)
• GPGPU (GLSL)
  • Briefly
• GPU Computing (CUDA, OpenCL)
• Choose your own adventure
  • Student Presentation
  • Final Project
• Goal: Prepare you for your presentation and project
Agenda
• Why program the GPU?
• Graphics Review
• Evolution of the Programmable Graphics Pipeline
  • Understand the past
Why Program the GPU?
Graph from: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf
Why Program the GPU?
• Compute
  • Intel Core i7 – 4 cores – 100 GFLOPS
  • NVIDIA GTX280 – 240 cores – 1 TFLOPS
• Memory Bandwidth
  • System Memory – 60 GB/s
  • NVIDIA GT200 – 150 GB/s
• Install Base
  • Over 200 million NVIDIA G80s shipped
Numbers from Programming Massively Parallel Processors.
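A quick back-of-the-envelope check of the figures above, as a minimal Python sketch. The variable names are illustrative, and the inputs are the peak numbers quoted on the slide, not measurements.

```python
# Peak figures quoted on the slide (from Programming Massively Parallel Processors).
cpu_gflops = 100.0          # Intel Core i7, 4 cores
gpu_gflops = 1000.0         # NVIDIA GTX280, 240 cores
cpu_bandwidth_gbs = 60.0    # system memory
gpu_bandwidth_gbs = 150.0   # NVIDIA GT200

# How far ahead the GPU is on each axis.
compute_ratio = gpu_gflops / cpu_gflops
bandwidth_ratio = gpu_bandwidth_gbs / cpu_bandwidth_gbs

# Peak throughput per core: each GPU core is slower, there are just many more.
cpu_gflops_per_core = cpu_gflops / 4
gpu_gflops_per_core = gpu_gflops / 240

print(f"compute: {compute_ratio:.1f}x, bandwidth: {bandwidth_ratio:.1f}x")
print(f"per core: CPU {cpu_gflops_per_core:.1f}, GPU {gpu_gflops_per_core:.2f} (peak GFLOPS)")
```

The compute gap (10x) is larger than the bandwidth gap (2.5x), which is one reason GPU code tends to be limited by memory traffic rather than arithmetic.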
NVIDIA GPU Evolution
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Graphics Review
• Modeling
• Rendering
• Animation
Graphics Review: Modeling
• Modeling
  • Polygons vs Triangles
    • How do you store a triangle mesh?
  • Implicit Surfaces
  • Height maps
  • …
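One common answer to "How do you store a triangle mesh?" is an indexed triangle list: each vertex is stored once, and each triangle is three indices into the vertex array. The sketch below is one possible layout, not the only one; the names `positions`, `indices`, `triangles`, and `triangle_area` are illustrative.

```python
# Two triangles forming a unit square, stored as an indexed triangle list.
# Each vertex position is stored once ...
positions = [
    (0.0, 0.0, 0.0),  # 0: bottom left
    (1.0, 0.0, 0.0),  # 1: bottom right
    (1.0, 1.0, 0.0),  # 2: top right
    (0.0, 1.0, 0.0),  # 3: top left
]
# ... and each triangle is three indices into the position list,
# listed in counterclockwise order.
indices = [
    0, 1, 2,
    0, 2, 3,
]


def triangles(positions, indices):
    """Yield each triangle as a tuple of three vertex positions."""
    for i in range(0, len(indices), 3):
        yield tuple(positions[j] for j in indices[i:i + 3])


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the length of the edge cross product."""
    ux, uy, uz = (b[k] - a[k] for k in range(3))
    vx, vy, vz = (c[k] - a[k] for k in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


total_area = sum(triangle_area(*t) for t in triangles(positions, indices))
print(total_area)
```

Without indices, the same square would need six vertices instead of four, because the two vertices on the shared diagonal would be duplicated.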
Triangles
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
Triangles
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com. Imagery from NASA Visible Earth: visibleearth.nasa.gov.
Implicit Surfaces
Images from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html
Height Maps
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
Graphics Review: Rendering
• Rendering
  • Goal: Assign color to pixels
• Two Parts
  • Visible surfaces
    • What is in front of what for a given view
  • Shading
    • Simulate the interaction of material and light to produce a pixel color
Rasterization
• What about ray tracing?
Visible Surfaces
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
Visible Surfaces
• Z-Buffer / Depth Buffer
• Fragment vs Pixel
Image courtesy of A K Peters, Ltd. www.virtualglobebook.com
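The z-buffer idea can be sketched in a few lines of Python. Several fragments may land on the same pixel; the depth test keeps only the closest one. This sketch assumes that a smaller depth value means closer to the viewer, and the buffer sizes, colors, and the helper `write_fragment` are made up for illustration.

```python
import math

WIDTH, HEIGHT = 4, 3

# The depth buffer starts at "infinitely far"; the color buffer at a background color.
depth_buffer = [[math.inf] * WIDTH for _ in range(HEIGHT)]
color_buffer = [["background"] * WIDTH for _ in range(HEIGHT)]


def write_fragment(x, y, depth, color):
    """Depth test: keep the fragment only if it is closer than what is stored."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False


# Three fragments land on the same pixel, in arbitrary draw order.
write_fragment(1, 1, 0.8, "red")    # passes: nothing was stored yet
write_fragment(1, 1, 0.3, "green")  # passes: closer than red
write_fragment(1, 1, 0.5, "blue")   # fails: behind green

print(color_buffer[1][1], depth_buffer[1][1])
```

This is also the fragment vs pixel distinction in miniature: three fragments were generated for pixel (1, 1), but only one of them ends up as the pixel's color.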
Shading
Images courtesy of A K Peters, Ltd. www.virtualglobebook.com
Shading
Image from GPU Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch14.html
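As one concrete instance of shading, the sketch below implements Lambertian (diffuse) shading in Python. It is only one simple lighting model, chosen here for illustration, and not necessarily the model used in the images above. The helper names and the sample material color are assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lambert(normal, to_light, material_color, light_color=(1.0, 1.0, 1.0)):
    """Diffuse color: material * light * max(N . L, 0)."""
    n = normalize(normal)
    l = normalize(to_light)
    intensity = max(dot(n, l), 0.0)  # surfaces facing away receive no light
    return tuple(m * c * intensity for m, c in zip(material_color, light_color))


material = (1.0, 0.5, 0.25)
facing = lambert((0, 0, 1), (0, 0, 1), material)   # light straight on
grazing = lambert((0, 0, 1), (1, 0, 1), material)  # light at 45 degrees
away = lambert((0, 0, 1), (0, 0, -1), material)    # light behind the surface

print(facing, grazing, away)
```

On a programmable GPU this computation would typically live in a vertex or fragment shader and run once per vertex or per fragment.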
Graphics Pipeline
Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer
Raster Operations:
• Scissor Test
• Stencil Test
• Depth Test
• Blending
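The Rasterization and Interpolation stage can be sketched on the CPU with edge functions and barycentric weights. This is a simplified Python model under stated assumptions: vertices are already in 2D screen space in counterclockwise order, one sample is taken at each pixel center, and there is no fill rule for edges shared between triangles. The function names are illustrative.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p); positive if p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, attr0, attr1, attr2, width, height):
    """Return {(x, y): interpolated attribute} for pixels whose centers are covered."""
    fragments = {}
    area = edge(v0, v1, v2)
    if area <= 0:
        return fragments  # degenerate or clockwise triangle: nothing to draw
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Barycentric weights: non-negative and summing to one inside.
                b0, b1, b2 = w0 / area, w1 / area, w2 / area
                fragments[(x, y)] = b0 * attr0 + b1 * attr1 + b2 * attr2
    return fragments


# A right triangle on a 4x4 grid; the attribute is 1 at the second vertex, 0 elsewhere.
frags = rasterize((0, 0), (4, 0), (0, 4), 0.0, 1.0, 0.0, width=4, height=4)
print(len(frags), frags[(0, 0)], frags[(3, 0)])
```

Each entry in `frags` corresponds to a fragment that would go on to shading and then to the raster operations listed above.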
Graphics Pipeline
Images courtesy of A K Peters, Ltd. http://www.realtimerendering.com/
Graphics Review: Animation
• Move the camera and/or agents, and re-render the scene
  • In less than 16.6 ms (60 fps)
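The frame budget follows directly from the target frame rate. The short Python sketch below works it out; the 1920x1080 resolution is an assumed example, not a figure from the slides.

```python
def frame_budget_ms(frames_per_second):
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def per_pixel_budget_ns(frames_per_second, width, height):
    """Average time available per pixel, in nanoseconds, if every pixel is drawn once."""
    return frame_budget_ms(frames_per_second) * 1e6 / (width * height)


print(f"{frame_budget_ms(60):.2f} ms per frame at 60 fps")
print(f"{frame_budget_ms(30):.2f} ms per frame at 30 fps")
print(f"{per_pixel_budget_ns(60, 1920, 1080):.2f} ns per pixel at 1920x1080, 60 fps")
```

A budget of a few nanoseconds per pixel is only reachable because many pixels are processed in parallel.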
Evolution of the Programmable Graphics Pipeline
• Pre GPU
• Fixed function GPU
• Programmable GPU
• Unified Shader Processors
Early 90s – Pre GPU
Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf
Why GPUs?
• Exploit Parallelism
  • Pipeline parallel
  • Data-parallel
  • CPU and GPU executing in parallel
• Hardware: texture filtering, MAD, etc.
Generation I: 3dfx Voodoo (1996)
Image from “7 years of Graphics”
• Did not do vertex transformations: these were done on the CPU
• Did do texture mapping and z-buffering
Pipeline: [CPU] Vertex Transforms → PCI → [GPU] Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer
Slide adapted from Suresh Venkatasubramanian and Joe Kider
Aside: Mario Kart 64
Image from: http://www.gamespot.com/users/my_shoe/
• High fragment load / low vertex load
Aside: Mario Kart Wii
• High fragment load / low vertex load?
Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/
Generation II: GeForce/Radeon 7500 (1998)
Slide from Suresh Venkatasubramanian and Joe Kider
• Main innovation: shifting the transformation and lighting calculations to the GPU
• Allowed multi-texturing: bump maps, light maps, and others
• Faster AGP bus instead of PCI
Pipeline: AGP → [GPU] Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer
Image from “7 years of Graphics”
Generation III: GeForce3/Radeon 8500 (2001)
Slide from Suresh Venkatasubramanian and Joe Kider
• For the first time, allowed a limited amount of programmability in the vertex pipeline
• Also allowed volume texturing and multi-sampling (for antialiasing)
Pipeline: AGP → [GPU] Vertex Transforms (small vertex shaders) → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer
Image from “7 years of Graphics”
Generation IV: Radeon 9700/GeForce FX (2002)
• This is the first generation of fully programmable graphics cards
• Different versions have different resource limits on fragment/vertex programs
Pipeline: AGP → Vertex Transforms (programmable vertex shader) → Primitive Assembly → Rasterization and Interpolation (programmable fragment processor, texture memory) → Raster Operations
Slide from Suresh Venkatasubramanian and Joe Kider
Image from “7 years of Graphics”
Generation IV.V: GeForce6/X800 (2004)
Slide adapted from Suresh Venkatasubramanian and Joe Kider
• Simultaneous rendering to multiple buffers
• True conditionals and loops
• PCIe bus
• Vertex texture fetch
Pipeline: PCIe → Vertex Transforms (programmable vertex shader, texture memory) → Primitive Assembly → Rasterization and Interpolation (programmable fragment processor, texture memory) → Raster Operations → Frame Buffer
NVIDIA NV40 Architecture
Image from GPU Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter30.html
• 6 vertex shader units
• 16 fragment shader units
• Vertex texture fetch
Generation V: GeForce8800/HD2900 (2006)
Slide adapted from Suresh Venkatasubramanian and Joe Kider
• Ground-up GPU redesign
• Support for Direct3D 10 / OpenGL 3
• Geometry Shaders
• Stream out / transform-feedback
• Unified shader processors
• Support for General GPU programming
Pipeline: PCIe → Input Assembler → Programmable Vertex Shader → Programmable Geometry Shader → Raster Operations → Programmable Pixel (Fragment) Shader → Output Merger
D3D 10 Pipeline
Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf
Geometry Shaders: Point Sprites
Geometry Shaders
Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf
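The point sprite use case can be modeled on the CPU: a geometry-shader-like stage takes one point and emits two triangles that form a screen-aligned quad. The Python sketch below only illustrates that one-primitive-in, several-primitives-out idea. A real geometry shader would be written in GLSL or HLSL and would typically emit a triangle strip; the function names and sprite size here are hypothetical.

```python
def expand_point_sprite(center, size):
    """Turn one 2D point into two triangles forming a screen-aligned quad."""
    cx, cy = center
    h = size / 2.0
    corners = [
        (cx - h, cy - h),  # 0: bottom left
        (cx + h, cy - h),  # 1: bottom right
        (cx + h, cy + h),  # 2: top right
        (cx - h, cy + h),  # 3: top left
    ]
    return [
        (corners[0], corners[1], corners[2]),
        (corners[0], corners[2], corners[3]),
    ]


def geometry_stage(points, size):
    """One primitive in, several primitives out, like a geometry shader."""
    triangles = []
    for p in points:
        triangles.extend(expand_point_sprite(p, size))
    return triangles


sprites = geometry_stage([(10.0, 10.0), (20.0, 5.0)], size=4.0)
print(len(sprites), sprites[0])
```

Doing this expansion on the GPU means the application only has to upload one vertex per sprite rather than four.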
NVIDIA G80 Architecture
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Why Unify Shader Processors?
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Unified Shader Processors
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
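A toy model of the load-balancing argument for unified shader processors, as a Python sketch. It is heavily idealized: it ignores scheduling overhead and dependencies between stages, and the unit counts and workloads are invented for illustration rather than taken from any real GPU.

```python
def frame_time_fixed(vertex_work, fragment_work, vertex_units, fragment_units):
    """Dedicated units: the slower stage sets the frame time, the other sits idle."""
    return max(vertex_work / vertex_units, fragment_work / fragment_units)


def frame_time_unified(vertex_work, fragment_work, total_units):
    """Unified units: any unit can take either kind of work."""
    return (vertex_work + fragment_work) / total_units


# Hypothetical workloads as (vertex work, fragment work) in arbitrary units.
workloads = {
    "fragment heavy": (10.0, 230.0),
    "vertex heavy": (200.0, 40.0),
}

results = {}
for name, (v, f) in workloads.items():
    fixed = frame_time_fixed(v, f, vertex_units=4, fragment_units=8)
    unified = frame_time_unified(v, f, total_units=12)
    results[name] = (fixed, unified)
    print(f"{name}: fixed {fixed:.2f}, unified {unified:.2f}")
```

With the same twelve units in total, the fixed split is only efficient when the workload happens to match the 4:8 ratio, while the unified pool takes the same time for both workloads in this model.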
Terminology
Shader Model   Direct3D   OpenGL   Video card example
2              9          2.x      NVIDIA GeForce 6800, ATI Radeon X800
3              10.x       3.x      NVIDIA GeForce 8800, ATI Radeon HD 2900
4              11.x       4.x      NVIDIA GeForce GTX 480, ATI Radeon HD 5870
Shader Capabilities
Table courtesy of A K Peters, Ltd. http://www.realtimerendering.com/
Evolution of the Programmable Graphics Pipeline
Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf
Evolution of the Programmable Graphics Pipeline
• Not covered today:
  • SM 5 / D3D 11 / GL 4
  • Tessellation shaders
    • *cough* student presentation *cough*
  • Later this semester: NVIDIA Fermi
    • Dual warp scheduler
    • Configurable L1 / shared memory
    • Double precision
    • …
New Tool: AMD System Monitor
• Released 01/04/2011
• http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx