![Page 1: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/1.jpg)
The Way of the GPU (based on GPGPU SIGGRAPH Course)
CS535 Fall 2014
Daniel G. Aliaga
Department of Computer Science Purdue University
![Page 2: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/2.jpg)
Computer Graphics Pipeline
Modeling Transformation
Lighting
Viewing Transformation
Clipping
Projection
Scan Conversion
Geometry
Image
Transform into 3D world coordinate system
Simulate illumination and reflectance
Transform into 3D camera coordinate system
Clip primitives outside camera’s view
Transform into 2D camera coordinate system
Draw pixels (incl. texturing, hidden surface…)
(this is really from 20 years ago…)
![Page 3: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/3.jpg)
Today, we have GPUs…
(GPU = graphical processing unit)
![Page 4: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/4.jpg)
Motivation: Computational Power
• Why are GPUs fast?
– Arithmetic intensity: the specialized nature of GPUs makes it easier to use additional transistors for computation not cache
– Economics: multi-billion dollar video game market is a pressure cooker that drives innovation
![Page 5: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/5.jpg)
Motivation: Flexible and Precise
• Modern GPUs are deeply programmable
– Programmable pixel, vertex, video engines
– Solidifying high-level language support
• Modern GPUs support high precision
– 32 bit floating point throughout the pipeline
– High enough for many (not all) applications
![Page 6: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/6.jpg)
The Problem: Difficult To Use
• GPUs designed for & driven by video games
– Programming model unusual
– Programming idioms tied to computer graphics
– Programming environment tightly constrained
• Underlying architectures are:
– Inherently parallel
– Rapidly evolving (even in basic feature set!)
– Largely secret
• Can’t simply “port” CPU code!
![Page 7: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/7.jpg)
Diagram of a Modern GPU
fast
memory
fast
memory
fast
memory
fast
memory
Input from CPU
Host interface
Geometry/Vertex processing
Triangle setup
Pixel processing
Memory Interface
![Page 8: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/8.jpg)
nVIDIA GPU
• GTX 680
– 1536 cores @ 1GHz (i.e., mini processors)
– 128.8 billions pixels/sec fill rate
– 966 Gflops in matrix multiplication
– 4096x2160 pixels
– 200W power
– 98C max GPU temp
– $500
![Page 9: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/9.jpg)
Modern GPU has more ALU’s
![Page 10: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/10.jpg)
GPU Pipeline: Transform
• Vertex/Geometry processor (multiple in parallel)
– Transform from “world space” to “image space”
– Compute per-primitive and per-vertex lighting
![Page 11: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/11.jpg)
GPU Pipeline: Rasterize (typically not programmable)
• Rasterizer
– Convert geometric rep. (vertex) to image rep. (fragment)
• Fragment = image fragment – Pixel + associated data: color, depth, stencil, etc.
– Interpolate per-vertex quantities across pixels
![Page 12: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/12.jpg)
GPU Pipeline: Shade
• Fragment processors (multiple in parallel)
– Compute a color for each pixel
– Optionally read colors from textures (images)
![Page 13: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/13.jpg)
GPU Programming Languages
• Many options!
– A while ago: “Renderman”
– cG (from NVIDIA)
– GLSL (GL shading Language)
– CUDA (more general that graphics)…
• Lets focus first on the concept, later on the language specifics…
![Page 14: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/14.jpg)
Mapping Parallel Computational Concepts to GPUs
![Page 15: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/15.jpg)
15
Mapping Parallel Computational Concepts to GPUs
• GPUs are designed for graphics – Highly parallel tasks
• GPUs process independent vertices & fragments – Temporary registers are zeroed
– No shared or static data
– No read-modify-write buffers
• Data-parallel processing – GPUs architecture is ALU-heavy
• Multiple vertex & pixel pipelines, multiple ALUs per pipe
– Hide memory latency (with more computation)
![Page 16: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/16.jpg)
16
Example: Simulation Grid
• Common GPGPU computation style – Textures represent computational grids = streams
• Many computations map to grids – Matrix algebra
– Image & Volume processing
– Physically-based simulation
– Global Illumination
• ray tracing, photon mapping, radiosity
• Non-grid streams can be mapped to grids
![Page 17: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/17.jpg)
17
Typical Stream Computation
• Grid Simulation algorithm
– Made up of steps
– Each step updates entire grid
– Must complete before next step can begin
• Grid is a stream, steps are kernels
– Kernel applied to each stream element
Cloud
simulation
algorithm
![Page 18: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/18.jpg)
18
e.g.: Scatter vs. Gather
• Grid communication
– Grid cells share information
![Page 19: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/19.jpg)
19
Vertex Processor
• Fully programmable (SIMD / MIMD)
• Processes 4-vectors (RGBA / XYZW)
• Capable of scatter but not gather
– Can change the location of current vertex
– Cannot read info from other vertices
– Can only read a small constant memory
• Latest GPUs: Vertex Texture Fetch
– Random access memory for vertices
– Gather (But not from the vertex stream itself)
![Page 20: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/20.jpg)
20
Fragment Processor
• Fully programmable (SIMD)
• Processes 4-component vectors (RGBA / XYZW)
• Random access memory read (textures)
• Capable of gather but not scatter – RAM read (texture fetch), but no RAM write
– Output address fixed to a specific pixel
• Typically more useful than vertex processor – More fragment pipelines than vertex pipelines
– Direct output (fragment processor is at end of pipeline)
![Page 21: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/21.jpg)
21
GPU Simulation Overview
• A Simulation:
– Its algorithm steps are fragment programs
• Called Computational kernels
– Current state is stored in textures
– Feedback via “render to texture”
• Question:
– How do we invoke computation?
![Page 22: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/22.jpg)
22
Invoking Computation
• Must invoke computation at each pixel
– Just draw geometry!
– Most common GPGPU invocation is a full-screen quad
• Other Useful Analogies
– Rasterization = Kernel Invocation
– Texture Coordinates = Computational Domain
– Vertex Coordinates = Computational Range
![Page 23: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/23.jpg)
23
Typical “Grid” Computation
• Initialize “view” (so that pixels:texels::1:1) glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, 1, 0, 1, 0, 1);
glViewport(0, 0, outTexResX, outTexResY);
• For each algorithm step: – Activate render-to-texture
– Setup input textures, fragment program
– Draw a full-screen quad (1x1)
![Page 24: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/24.jpg)
24
Example: N-Body Simulation • Brute force
• N = 8192 bodies
• N2 gravity computations
• 64M force comps. / frame
• ~25 flops per force
• 10.5 fps
• 17+ GFLOPs sustained in this example Nyland, Harris, Prins,
GP2 2004 poster
![Page 25: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/25.jpg)
25
Computing Gravitational Forces
• Each body attracts all other bodies
– N bodies, so N2 forces
• Draw into an NxN buffer
– Pixel (i,j) computes force between bodies i and j
– Very simple fragment program
• More than 2048 bodies is tricky
• Why?
![Page 26: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/26.jpg)
26
Computing Gravitational Forces N-body force Texture
force(i,j)
N i
N
0
j
i
j
Body Position Texture
F(i,j) = gMiMj / r(i,j)2,
r(i,j) = |pos(i) - pos(j)|
Force is proportional to the inverse square
of the distance between bodies
![Page 27: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/27.jpg)
27
Computing Gravitational Forces
float4 force(float2 ij : WPOS,
uniform sampler2D pos) : COLOR0
{
// Pos texture is 2D, not 1D, so we need to
// convert body index into 2D coords for pos tex
float4 iCoords = getBodyCoords(ij);
float4 iPosMass = texture2D(pos, iCoords.xy);
float4 jPosMass = texture2D(pos, iCoords.zw);
float3 dir = iPos.xyz - jPos.xyz;
float r2 = dot(dir, dir);
dir = normalize(dir);
return dir * g * iPosMass.w * jPosMass.w / r2;
}
![Page 28: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/28.jpg)
28
Computing Total Force • Have: array of (i, j) forces
• Need: total force on each particle i – Sum of each column of the force
array
• Can do all N columns in parallel
This is called a Parallel Reduction
force(i,j)
N-body force Texture
N i
N
0
![Page 29: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/29.jpg)
29
Parallel Reductions
1D parallel reduction:
sum N columns or rows in parallel
add two halves of texture together
repeatedly...
Until we’re left with a single row of texels
+ NxN
Nx(N/2) Nx(N/4)
Nx1
Requires log2N steps
![Page 30: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/30.jpg)
30
Update Positions and Velocities
• Now we have a 1-D array of total forces
– One per body
• Update Velocity
– u(i,t+dt) = u(i,t) + Ftotal(i) * dt
– Simple pixel shader reads previous velocity and force textures, creates new velocity texture
• Update Position
– x(i, t+dt) = x(i,t) + u(i,t) * dt
– Simple pixel shader reads previous position and velocity textures, creates new position texture
![Page 31: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/31.jpg)
Linear Algebra on GPUs
• Use linear algebra for VR, education, simulation, games and much more!
![Page 32: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/32.jpg)
Representation Vector representation
– 2D textures best we can do
• Per-fragment vs. per-vertex operations
• High texture memory bandwidth
• Read-write access, dependent fetches
1 N
1
N
![Page 33: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/33.jpg)
Representation (cont.) Dense Matrix representation
– treat a dense matrix as a set of column vectors
– again, store these vectors as 2D textures
Matrix
N
N
i N Vectors
... ... N
1 i N
N 2D-Textures
... ... 1 i N
![Page 34: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/34.jpg)
Representation (cont.) Banded Sparse Matrix representation
– treat a banded matrix as a set of diagonal vectors
N
N
Matrix i 2 Vectors
N
1 2
2 2D-Textures 1 2
![Page 35: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/35.jpg)
Representation (cont.) Banded Sparse Matrix representation
– combine opposing vectors to save space
N
N
Matrix i 2 Vectors
N
1 2
2 2D-Textures 1 2
N-i
![Page 36: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/36.jpg)
Operations • Vector-Vector Operations
– Reduced to 2D texture operations
– Coded in vertex/fragment shaders
• Example: Vector1 + Vector2 Vector3
return tex0 + tex1 Pass through
Vector 1 Vector 2
Vertex Shader Pixel Shader
Vector 3
Render To Texture Static quad
+
TexUnit 0 TexUnit 1
![Page 37: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/37.jpg)
Operations (cont.) • Vector-Vector Operations
– Reduce operation for scalar products
Reduce m x n region
in fragment shader
2 pass nd
... ...
st 1 pass
...
original Texture
...
![Page 38: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/38.jpg)
The “single float” on GPUs
Some operations generate single float values e.g. reduce
Read-back to main-mem is slow
Keep single floats on the GPU as 1x1 textures
...
![Page 39: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/39.jpg)
Operations (cont.) In depth example: Vector / Banded-Matrix Multiplication
A x b
=
![Page 40: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/40.jpg)
Example (cont.)
Vector / Banded-Matrix Multiplication
A x b
=
A b
![Page 41: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/41.jpg)
Example (cont.)
Compute the result in 2 Passes
Pass 1:
A
b
x
=
multiply
![Page 42: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/42.jpg)
Example (cont.)
Compute the result in 2 Passes
Pass 2:
A
b
x
=
multiply
b‘ shift
![Page 43: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/43.jpg)
Random sparse matrix representation
– Textures do not work
• Splitting yields highly fragmented textures
• Difficult to find optimal partitions
– Idea: encode only non-zero entries in vertex arrays
Representation (cont.)
N
N
Matrix
Vertex Array 1 Vertex Array 2
Result Vector
=
Pos Tex1-4 Pos Tex1-4 Tex0
Result Vector Matrix =
Pos Tex1-4 Pos Tex0
Row Index as Position
Values as TexCoord 0
Column as TexCoord 1-4
![Page 44: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/44.jpg)
Sorting
• Given an unordered list of elements, produce list ordered by key value
– Kernel: compare and swap
• GPUs constrained programming environment limits viable algorithms
– Based on “Bitonic merge sort” [Batcher 68]
– Periodic balanced sorting networks [Dowd 89]
![Page 45: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/45.jpg)
Bitonic Merge Sort Overview
• Repeatedly build bitonic lists and then sort them
– Bitonic list is two monotonic lists concatenated together, one increasing and one decreasing.
• List A: (3, 4, 7, 8) monotonically increasing
• List B: (6, 5, 2, 1) monotonically decreasing
• List AB: (3, 4, 7, 8, 6, 5, 2, 1) bitonic
![Page 46: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/46.jpg)
Bitonic Merge Sort
1
2
3
4
5
6
7
8
8x monotonic lists: (3) (7) (4) (8) (6) (2) (1) (5)
4x bitonic lists: (3,7) (4,8) (6,2) (1,5)
![Page 47: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/47.jpg)
Bitonic Merge Sort
1
2
3
4
5
6
7
8
Sort the bitonic lists
![Page 48: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/48.jpg)
Bitonic Merge Sort
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
4x monotonic lists: (3,7) (8,4) (2,6) (5,1)
2x bitonic lists: (3,7,8,4) (2,6,5,1)
![Page 49: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/49.jpg)
Bitonic Merge Sort
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic lists
![Page 50: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/50.jpg)
Bitonic Merge Sort
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic lists
![Page 51: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/51.jpg)
Bitonic Merge Sort
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic lists
![Page 52: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/52.jpg)
Bitonic Merge Sort
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
2x monotonic lists: (3,4,7,8) (6,5,2,1)
1x bitonic list: (3,4,7,8, 6,5,2,1)
![Page 53: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/53.jpg)
Bitonic Merge Sort
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic list
![Page 54: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/54.jpg)
Bitonic Merge Sort
3
2
4
1
7
5
8
6
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic list
![Page 55: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/55.jpg)
Bitonic Merge Sort
3
2
4
1
7
5
8
6
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic list
![Page 56: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/56.jpg)
Bitonic Merge Sort
2
3
1
4
7
5
8
6
3
2
4
1
7
5
8
6
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic list
![Page 57: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/57.jpg)
Bitonic Merge Sort
2
3
1
4
7
5
8
6
3
2
4
1
7
5
8
6
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Sort the bitonic list
![Page 58: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/58.jpg)
Bitonic Merge Sort
1
3
2
4
7
6
8
5
2
3
1
4
7
5
8
6
3
2
4
1
7
5
8
6
3
7
4
8
2
5
1
6
3
8
4
7
2
6
1
5
1
2
3
4
5
6
7
8
3
8
7
4
5
6
1
2
Done!
![Page 59: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/59.jpg)
Bitonic Merge Sort Summary
• Separate rendering pass for each set of swaps
– O(log2n) passes
– Each pass performs n compare/swaps
– Total compare/swaps: O(n log2n)
• Limitations of GPU cost us factor of log n over best CPU-based sorting algorithms
![Page 60: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/60.jpg)
60
Database and Streaming on GPUs
• Utilize graphics processors for fast computation of common database operations
– Conjunctive selections
– Aggregations
– Semi-linear queries
• Essential components
![Page 61: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/61.jpg)
61
Basic DB Operations
Basic SQL query
» Select A
» From T
» Where C
A= attributes or aggregations (SUM, COUNT, MAX etc)
T=relational table
C= Boolean Combination of Predicates (using operators AND, OR, NOT)
![Page 62: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/62.jpg)
62
Database Operations
• Predicates
– ai op constant or ai op aj
– op: <,>,<=,>=,!=, =, TRUE, FALSE
• Boolean combinations
– Conjunctive Normal Form (CNF)
• Aggregations
– COUNT, SUM, MAX, MEDIAN, AVG
![Page 63: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/63.jpg)
63
Predicate Evaluation
• ai op constant (d)
– Copy the attribute values ai into depth buffer
– Specify the comparison operation used in the depth test
– Draw a screen filling quad at depth d and perform the depth test
![Page 64: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/64.jpg)
64
Screen
P
If ( ai op d )pass fragment
Else
reject fragment
ai op d
d
![Page 65: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/65.jpg)
65
Predicate Evaluation
CPU implementation — Intel compiler 7.1 with SIMD optimizations
![Page 66: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/66.jpg)
66
Predicate Evaluation
• ai op aj – Equivalent to (ai – aj) op 0
• Semi-linear queries – Defined as linear combination of attribute values compared
against a constant
– Linear combination is computed as a dot product of two vectors
– Utilize the vector processing capabilities of GPUs
ii as
![Page 67: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/67.jpg)
67
Semi-linear Query
![Page 68: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/68.jpg)
68
Boolean Combination
• CNF: – (A1 AND A2 AND … AND Ak) where
Ai = (Bi1 OR Bi
2 OR … OR Bimi )
• Performed using stencil test recursively – C1 = (TRUE AND A1) = A1
– Ci = (A1 AND A2 AND … AND Ai) = (Ci-1 AND Ai)
• Different stencil values are used to code the outcome of Ci – Positive stencil values — pass predicate evaluation
– Zero — fail predicate evaluation
![Page 69: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/69.jpg)
69
A1 AND A2
A1
B21
B22
B23
A2 = (B21 OR B2
2 OR B23 )
![Page 70: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/70.jpg)
70
A1 AND A2
A1
Stencil value = 1
![Page 71: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/71.jpg)
71
A1 AND A2
A1
Stencil value = 0
Stencil value = 1
TRUE AND A1
![Page 72: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/72.jpg)
72
A1 AND A2
A1
Stencil = 0
Stencil = 1
B21
Stencil=2
B22
Stencil=2
B23
Stencil=2
![Page 73: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/73.jpg)
73
A1 AND A2
A1
Stencil = 0
Stencil = 1
B21
B22
B23
Stencil=2
Stencil=2
Stencil=2
![Page 74: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/74.jpg)
74
A1 AND A2
Stencil = 0
Stencil=2
A1 AND B21
Stencil = 2
A1 AND B22
Stencil=2
A1 AND B23
![Page 75: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/75.jpg)
75
Aggregations
• COUNT, MAX, MIN, SUM, AVG
– These can be efficiently implemented as well!
![Page 76: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/76.jpg)
76
Geometric Computations
• Well studied
– Computer graphics, computational geometry etc.
• Widely used in games, simulations, virtual reality applications
![Page 77: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/77.jpg)
77
GPUs: Geometric Computations
• Used for geometric applications
– Minkowski sums [Kim et al. 02]
– CSG rendering [Goldfeather et al. 89, Rossignac et al. 90]
– Voronoi computation [Hoff et al. 01, 02, Sud et al. 04]
– Isosurface computation [Pascucci 04]
– Map simplification [Mustafa et al. 01]
![Page 78: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/78.jpg)
78
Collision Detection Pointers:
• Overview
• Collision Detection: CULLIDE
• Inter- and Intra-Object Collision Detection: Quick-CULLIDE
• Reliable Collision Detection: FAR
• Analysis
![Page 79: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/79.jpg)
• so far: GPGPU limited to texture output
• new APIs allow geometry generation on GPU
Geometry processing on GPUs
![Page 80: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/80.jpg)
Examples
Fluid Simulation 3D Smoke & Fire
Particles
Grid displacement
Water Simulation 3D Water Surfaces
![Page 81: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/81.jpg)
Examples
Particles
Grid displacement
Point Decompression
Fluid Simulation 3D Smoke & Fire
Water Simulation 3D Water Surfaces
Point Compression
Point Rendering
![Page 82: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/82.jpg)
82
High Level Shading Languages
• Cg, HLSL, & OpenGL Shading Language
– Cg: • http://www.nvidia.com/cg
– HLSL: • http://msdn.microsoft.com/library/default.asp?url=/library/en-
us/directx9_c/directx/graphics/reference/highlevellanguageshaders.asp
– OpenGL Shading Language: • http://www.3dlabs.com/support/developer/ogl2/whitepapers/ind
ex.html
![Page 83: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/83.jpg)
83
GPGPU Languages
• Why do want them? – Make programming GPUs easier!
• Don’t need to know OpenGL, DirectX, or ATI/NV extensions
• Simplify common operations
• Focus on the algorithm, not on the implementation
![Page 84: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/84.jpg)
Compilers: CGC & FXC
• HLSL and Cg are syntactically almost identical – Exception: Cg 1.3 allows shader “interfaces”, unsized arrays
• Command line compilers – Microsoft’s FXC.exe
• Compiles to DirectX vertex and pixel shader assembly only
• fxc /Tps_2_0 myshader.hlsl
– NVIDIA’s CGC.exe
• Compiles to everything
• cgc -profile ps_2_0 myshader.cg
– Can generate very different assembly!
• Driver will recompile code
– Compliance may vary
![Page 85: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/85.jpg)
85
Babelshader
• Converts between DirectX pixel shaders and OpenGL shaders
• Allows OpenGL programs to use DirectX HLSL compilers to compile programs into ARB or fp30 assembly.
• Enables fair benchmarking competition between the HLSL compiler and the Cg compiler on the same platform with the same demo and driver.
http://graphics.stanford.edu/~danielrh/babelshader.html
Example Conversion Between Ps2.0 and ARB
![Page 86: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/86.jpg)
‘printf’ Debugging
• MOV suspect register to output
– Comment out anything else writing to output
– Scale and bias as needed
• Recompile
• Display/readback frame buffer
• Check values
• Repeat until error is (hopefully) found
![Page 87: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/87.jpg)
‘printf’ Debugging Examples
![Page 88: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/88.jpg)
‘printf’ Debugging Examples
![Page 89: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/89.jpg)
‘printf’ Debugging Examples
![Page 90: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/90.jpg)
‘printf’ Debugging Examples
![Page 91: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/91.jpg)
Ideal Fragment Program Debugger
• Automate ‘printf’ debugging
• Intuitive and easy to use interface
• Features over performance
– Debuggers don’t need to be fast • Should still be interactive
• Easy to add to existing apps
– And remove
• Touch as little GPU state as possible
• Report actual hardware values
– No software simulations!
![Page 92: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/92.jpg)
• Instruction-Level Parallelism
• Data-Level Parallelism
Data Parallel Computing
![Page 93: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/93.jpg)
frag2frame Smooth(vert2frag IN, uniform samplerRECT Source : texunit0, uniform samplerRECT Operator : texunit1,
uniform samplerRECT Boundary : texunit2, uniform float4 params)
{
frag2frame OUT;
float2 center = IN.TexCoord0.xy;
float4 U = f4texRECT(Source, center);
// Calculate Red-Black (odd-even) masks
float2 intpart;
float2 place = floor(1.0f - modf(round(center + float2(0.5f, 0.5f)) / 2.0f, intpart));
float2 mask = float2((1.0f-place.x) * (1.0f-place.y), place.x * place.y);
if (((mask.x + mask.y) && params.y) || (!(mask.x + mask.y) && !params.y))
{
float2 offset = float2(params.x*center.x - 0.5f*(params.x-1.0f), params.x*center.y - 0.5f*(params.x-1.0f));
...
float4 neighbor = float4(center.x - 1.0f, center.x + 1.0f, center.y - 1.0f, center.y + 1.0f);
float central = -2.0f*(O.x + O.y);
float poisson = ((params.x*params.x)*U.z + (-O.x * f1texRECT(Source, float2(neighbor.x, center.y)) +
-O.x * f1texRECT(Source, float2(neighbor.y, center.y)) +
-O.y * f1texRECT(Source, float2(center.x, neighbor.z)) +
-O.z * f1texRECT(Source, float2(center.x, neighbor.w)))) / O.w;
OUT.COL.x = poisson;
}
...
return OUT;
}
A really naïve shader
![Page 94: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/94.jpg)
frag2frame Smooth(vert2frag IN, uniform samplerRECT Source : texunit0, uniform samplerRECT Operator : texunit1,
uniform samplerRECT Boundary : texunit2, uniform float4 params)
{
frag2frame OUT;
float2 center = IN.TexCoord0.xy;
float4 U = f4texRECT(Source, center);
// Calculate Red-Black (odd-even) masks
float2 intpart;
float2 place = floor(1.0f - modf(round(center + float2(0.5f, 0.5f)) / 2.0f, intpart));
float2 mask = float2((1.0f-place.x) * (1.0f-place.y), place.x * place.y);
if (((mask.x + mask.y) && params.y) || (!(mask.x + mask.y) && !params.y))
{
float2 offset = float2(params.x*center.x - 0.5f*(params.x-1.0f), params.x*center.y - 0.5f*(params.x-1.0f));
...
float4 neighbor = float4(center.x - 1.0f, center.x + 1.0f, center.y - 1.0f, center.y + 1.0f);
float central = -2.0f*(O.x + O.y);
float poisson = ((params.x*params.x)*U.z + (-O.x * f1texRECT(Source, float2(neighbor.x, center.y)) +
-O.x * f1texRECT(Source, float2(neighbor.y, center.y)) +
-O.y * f1texRECT(Source, float2(center.x, neighbor.z)) +
-O.z * f1texRECT(Source, float2(center.x, neighbor.w)))) / O.w;
OUT.COL.x = poisson;
}
...
return OUT;
}
A really naïve shader
![Page 95: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/95.jpg)
float2 offset = float2(params.x*center.x - 0.5f*(params.x-1.0f),
params.x*center.y - 0.5f*(params.x-1.0f));
float4 neighbor = float4(center.x - 1.0f,
center.x + 1.0f,
center.y - 1.0f,
center.y + 1.0f);
Instruction-Level Parallelism
![Page 96: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/96.jpg)
float2 offset = center.xy - 0.5f;
offset = offset * params.xx + 0.5f; // MADR is cool too – one
// cycle, two flops
float4 neighbor = center.xxyy + float4(-1.0f,1.0f,-1.0f,1.0f);
Instruction-Level Parallelism
![Page 97: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/97.jpg)
Data-Level Parallelism
• Pack scalar data into RGBA in texture memory
![Page 98: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/98.jpg)
Computational Frequency
• Think of your CPU program and your vertex and fragment programs as different levels of nested looping.
...
foreach tri in triangles {
// run the vertex program on each vertex
v1 = process_vertex(tri.vertex1);
v2 = process_vertex(tri.vertex2);
v3 = process_vertex(tri.vertex2);
// assemble the vertices into a triangle
assembledtriangle = setup_tri(v1, v2, v3);
// rasterize the assembled triangle into [0..many] fragments
fragments = rasterize(assembledtriangle);
// run the fragment program on each fragment
foreach frag in fragments {
outbuffer[frag.position] = process_fragment(frag);
}
}
...
![Page 99: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/99.jpg)
Computational Frequency
• Branches
– Avoid these, especially in the inner loop – i.e., the fragment program.
![Page 100: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/100.jpg)
Computational Frequency
• Static branch resolution
– write several variants of each fragment program to handle boundary cases
– eliminates conditionals in the fragment program
– equivalent to avoiding CPU inner-loop branching
case 2: accounts for boundaries
case 1: no boundaries
![Page 101: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/101.jpg)
Computational Frequency
• Dynamic branching
– Use only when needed
• Good perf requires spatial coherence in branching
![Page 102: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/102.jpg)
Computational Frequency
• Precompute
• Precompute
• Precompute
![Page 103: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/103.jpg)
Computational Frequency
• Precompute texture coordinates
– Take advantage of under-utilized hardware
• vertex processor
• rasterizer
– Reduce instruction count at the per-fragment level
– Avoid lookups being treated as texture indirections
![Page 104: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/104.jpg)
104
Memory Hierarchy
• CPU and GPU Memory Hierarchy
Disk
CPU Main
Memory
GPU Video
Memory CPU Caches
CPU Registers GPU Caches
GPU Temporary
Registers
GPU Constant
Registers
![Page 105: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/105.jpg)
105
CPU Memory Model
• At any program point
– Allocate/free local or global memory
– Random memory access
• Registers – Read/write
• Local memory – Read/write to stack
• Global memory – Read/write to heap
• Disk – Read/write to disk
![Page 106: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/106.jpg)
106
GPU Memory Model
• Much more restricted memory access
– Allocate/free memory only before computation
– Limited memory access during computation (kernel)
• Registers – Read/write
• Local memory – Does not exist
• Global memory – Read-only during computation
– Write-only at end of computation (pre-computed address)
• Disk access – Does not exist
![Page 107: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/107.jpg)
GPU Memory Model
• Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture
Vertex
Buffer
Vertex
Processor Rasterizer
Fragment
Processor
Texture
Frame
Buffer(s)
VS 3.0 GPUs
![Page 108: The Way of the GPU - Purdue University · The Way of the GPU (based on GPGPU SIGGRAPH Course) CS535 Fall 2014 Daniel G. Aliaga Department of Computer Science Purdue University . Computer](https://reader030.vdocuments.us/reader030/viewer/2022040307/5ece22cb1de8721bff1f8f2b/html5/thumbnails/108.jpg)
108
Glift: http://graphics.cs.ucdavis.edu/~lefohn/work/glift/
• Goal – A template for simple creation and efficient use of random-
access GPU data structures for graphics and GPGPU programming
• Contributions – Abstraction for GPU data structures
– The template library