GPU - how can we use it?

Bartłomiej Filipek www.bfilipek.com


Post on 12-Nov-2014


DESCRIPTION

This presentation covers some general topics about current GPU architecture, how one can use it, how one can code on this...

TRANSCRIPT

Page 1: GPU - how can we use it?

Bartłomiej Filipek www.bfilipek.com

Page 2: GPU - how can we use it?

How does this work?

General architecture

Advice

Tools

This lecture does not cover the technical details of the GPU; it gives only the overview needed to understand current technologies and standards.

Page 3: GPU - how can we use it?

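[Diagram: on the CPU side, the application calls the 3D API (DirectX/OpenGL), which talks to the driver. Commands, textures, vertices, shaders and data travel over the bus to the GPU, where vertex processing is followed by fragment processing and the result lands in the framebuffer (in GPU memory) and on the display.]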

Page 4: GPU - how can we use it?

[Diagram: vertex processing runs on vertex units and fragment processing on pixel units, both connected to memory/textures.]

As we can see, earlier architectures mirrored the "fixed" vertex/fragment chain: all data was first processed in the vertex units and then passed on to the fragment (pixel) units.

Page 5: GPU - how can we use it?

SISD: Single Instruction, Single Data

The standard way: one instruction is executed on a single data item.

SIMD: Single Instruction, Multiple Data

One instruction is executed on several data items at once, for example on one 4D vector (128 bits).

MIMD: Multiple Instructions, Multiple Data

Parallel processing!

Page 6: GPU - how can we use it?

[Figure: unit usage for an effect that uses a lot of vertex processing and for an effect that uses a lot of fragment processing. With fixed task division, some of the vertex or fragment units stay unused. With dynamic task division, all units are used, split between vertex and fragment work as the effect requires.]

The numbers of vertex units and fragment units used to be fixed: we had N vertex processors and M fragment processors. Now we have a unified architecture: K units that can each process both vertices and fragments, with no difference between them.
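A rough way to see why the unified design helps is to compare frame times under both schemes. The sketch below is a deliberately simplified model, not a description of real hardware scheduling: it assumes every unit finishes one work item per time step, and the function names and workload numbers are made up for illustration.

```python
import math

def fixed_time(vertex_work, fragment_work, n_vertex_units, m_fragment_units):
    """Time steps when N units can only do vertex work and M only fragment work.

    Assumes both groups run concurrently, so the slower group sets the time.
    """
    vertex_steps = math.ceil(vertex_work / n_vertex_units)
    fragment_steps = math.ceil(fragment_work / m_fragment_units)
    return max(vertex_steps, fragment_steps)

def unified_time(vertex_work, fragment_work, k_units):
    """Time steps when all K units can take either kind of work."""
    return math.ceil((vertex_work + fragment_work) / k_units)

# Hypothetical GPU: 8 vertex + 24 fragment units versus 32 unified units.
# Vertex-heavy effect: the 8 vertex units become the bottleneck.
print(fixed_time(9600, 2400, 8, 24), unified_time(9600, 2400, 32))
# Fragment-heavy effect: the vertex units sit mostly idle.
print(fixed_time(800, 24000, 8, 24), unified_time(800, 24000, 32))
```

In both cases the unified pool finishes sooner because no unit is left idle while the other group is overloaded.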

Page 7: GPU - how can we use it?

[Diagram: a controller, an array of stream processors, and shared memory.]

As we can see, there are no separate vertex/fragment units. Instead there are stream processors that can handle both vertices and fragments, and even more.

Page 8: GPU - how can we use it?

Scalars… not Vectors!

A stream processor (SP) works on only one data item (a scalar) per instruction.

But we have a lot of SPs!

SPs give far greater flexibility.

GPGPU

SIMT – Single Instruction Multiple Threads

Page 9: GPU - how can we use it?

New architecture - NV

DX11, OpenCL

Multithreaded rendering

Rendering commands can be called from different threads

3 000 000 000 transistors! End of 2009? End of winter 2010? Never?

Double-precision calculations cost twice as much as float, not ten times as much as before!

Debugging: one can debug the GPU directly from Visual Studio

Page 10: GPU - how can we use it?

Unified Shader

Geometry Shader

Vertex Shader

Fragment Shader

CUDA, OpenCL, DirectX Compute, ATI Stream

Page 11: GPU - how can we use it?

General-purpose computing on graphics processing units

Kernels: code that will be executed on the GPU

Not only graphics but also:

Physics
▪ Fluids
▪ Collisions
▪ N-body simulations…

Financial

Speech/pattern recognition

Phenomena modelling: weather…

Neural nets

AI
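To make the idea of a kernel more concrete, here is a CPU-only emulation. It is only a model of the programming style: `saxpy_kernel` and the `launch` helper are hypothetical names, not part of CUDA, OpenCL or any other GPU API. The same function body runs once per thread index; on a real GPU those invocations would execute in parallel instead of in a loop.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Kernel body: each logical thread handles exactly one element."""
    out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, thread_count, *args):
    """Hypothetical launcher: runs the kernel once per thread id.

    A GPU would execute these invocations in parallel; this loop is sequential.
    """
    for thread_id in range(thread_count):
        kernel(thread_id, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

The kernel never loops over the data itself; the thread index selects the element, which is what lets the hardware spread the work over many stream processors.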

Page 12: GPU - how can we use it?

Use as few as possible:
▪ Calculations
▪ Huge textures: use mipmaps instead
▪ Interpolators
▪ Data
▪ Rendering state changes
▪ Dynamic vertex buffers
▪ Textures… maybe use texture atlases
▪ Texture fetches

Use more:
▪ Batches
▪ Triangle strips
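As an illustration of the texture atlas suggestion, the sketch below remaps a per-texture UV coordinate into a shared atlas, so several small textures can live in one texture and be drawn without switching textures in between. It assumes a uniform grid of equally sized tiles with no padding between them; `atlas_uv` and its parameters are hypothetical names.

```python
def atlas_uv(u, v, tile_x, tile_y, tiles_per_row, tiles_per_column):
    """Map a local (u, v) in [0, 1] to coordinates inside a grid atlas.

    (tile_x, tile_y) is the zero-based position of the sub-texture in the grid.
    """
    atlas_u = (tile_x + u) / tiles_per_row
    atlas_v = (tile_y + v) / tiles_per_column
    return atlas_u, atlas_v

# Centre of the tile in column 1, row 0 of a 2 x 2 atlas.
print(atlas_uv(0.5, 0.5, 1, 0, 2, 2))  # (0.75, 0.25)
```

A real atlas usually also needs padding around each tile to avoid bleeding when mipmaps or filtering are used; that is left out here.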

Page 13: GPU - how can we use it?

Use Maths

Reduce calculation on uniform vars!

Normalize

Uniform sphere:

p = sqrt(Rx^2 + Ry^2 + (Rz + 1)^2) = sqrt(Rx^2 + Ry^2 + Rz^2 + 2*Rz + 1)

The R vector is normalized, so Rx^2 + Ry^2 + Rz^2 = 1, and therefore:

p = sqrt(2 * (Rz + 1)) = 1.414 * sqrt(Rz + 1)
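The simplification can be checked numerically. The sketch below compares the full expression with the reduced one for a normalized R; the helper names and the sample vector are arbitrary choices for illustration.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def p_full(rx, ry, rz):
    """Original form: three multiplications, adds and a sqrt."""
    return math.sqrt(rx * rx + ry * ry + (rz + 1.0) ** 2)

def p_simplified(rz):
    """Reduced form, valid only when R has unit length."""
    return math.sqrt(2.0) * math.sqrt(rz + 1.0)

r = normalize((0.3, -0.5, 0.8))
print(p_full(*r), p_simplified(r[2]))
```

The two values agree up to floating-point rounding, and the reduced form needs only the z component of R.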

half4 main(float2 diffuse : TEXCOORD0,
           uniform sampler2D diffuseTex,
           uniform half4 g_OverbrightColor)
{
    // The constant factor 3.0 is applied again for every pixel.
    return tex2D(diffuseTex, diffuse) * g_OverbrightColor * 3.0;
}

dot(normalize(N), normalize(L)) uses two sqrts!

but:

(N/|N|) dot (L/|L|) = (N dot L) / (|N| * |L|) = (N dot L) / sqrt((N dot N) * (L dot L)) = (N dot L) * rsq((N dot N) * (L dot L))

Now we have only one sqrt: three dots are much cheaper than a sqrt
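The identity above can be verified on the CPU. This is a plain-Python sketch with made-up function names; it only shows that both forms give the same cosine, not how a shader compiler would schedule the instructions.

```python
import math

def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))

def cos_two_sqrts(n, l):
    """Normalize both vectors first: two square roots."""
    n_len = math.sqrt(dot(n, n))
    l_len = math.sqrt(dot(l, l))
    n_hat = [c / n_len for c in n]
    l_hat = [c / l_len for c in l]
    return dot(n_hat, l_hat)

def cos_one_sqrt(n, l):
    """Three dot products and a single square root."""
    return dot(n, l) / math.sqrt(dot(n, n) * dot(l, l))

n = (1.0, 2.0, 3.0)
l = (-2.0, 0.5, 4.0)
print(cos_two_sqrts(n, l), cos_one_sqrt(n, l))
```

Both functions assume non-zero vectors; a zero-length N or L would divide by zero in either version.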

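Expressions that depend only on uniforms, such as g_OverbrightColor * 3.0, are the same for every pixel: calculate them before they are sent to the GPU!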
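The overbright shader shown earlier multiplies by the constant 3.0 once per pixel. The sketch below models the alternative in Python: scale the uniform once on the CPU, then let the per-pixel function do a single multiply. The function names are hypothetical, and colors are modelled as plain tuples rather than real shader types.

```python
import math

def shade_naive(texel, overbright_color):
    """Per-pixel work as in the original shader: two multiplies per channel."""
    return tuple(t * c * 3.0 for t, c in zip(texel, overbright_color))

def prepare_uniform(overbright_color):
    """Done once on the CPU before the uniform is uploaded."""
    return tuple(c * 3.0 for c in overbright_color)

def shade_hoisted(texel, scaled_color):
    """Per-pixel work after hoisting: one multiply per channel."""
    return tuple(t * c for t, c in zip(texel, scaled_color))

texel = (0.5, 0.25, 1.0, 1.0)
color = (0.5, 0.5, 0.5, 1.0)
scaled = prepare_uniform(color)          # computed once, not per pixel
print(shade_naive(texel, color), shade_hoisted(texel, scaled))
```

The saving per pixel is small, but it is multiplied by every pixel the shader touches in every frame.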

Page 14: GPU - how can we use it?

Texture lookups:

~ 10 : 1 (ALU:Sampler)

Normalization cube map

A single "dot" is not worth a texture lookup…

But the calculation of NormalDistribution… YES!

Early Z-Test

Depth-only rendering first, then the full scene (rendered a second time)
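The benefit of a depth-only pass can be illustrated by counting how many fragments get shaded. The model below is a simplification with several assumptions: smaller depth means closer, the depth test rejects fragments before the fragment shader runs, the second pass uses an "equal" depth test, and the per-pixel depth lists are invented sample data.

```python
def shaded_without_prepass(fragments_per_pixel):
    """Count fragment-shader runs with a plain 'less than' depth test."""
    shaded = 0
    for depths in fragments_per_pixel:
        nearest = float("inf")
        for depth in depths:        # fragments arrive in submission order
            if depth < nearest:     # passes the depth test, so it is shaded
                nearest = depth
                shaded += 1
    return shaded

def shaded_with_prepass(fragments_per_pixel):
    """Pass 1 writes depth only; pass 2 shades fragments matching that depth."""
    shaded = 0
    for depths in fragments_per_pixel:
        if not depths:
            continue
        nearest = min(depths)       # result of the cheap depth-only pass
        shaded += sum(1 for depth in depths if depth == nearest)
    return shaded

# Three pixels; the first and last receive geometry in back-to-front order.
pixels = [[0.9, 0.5, 0.2], [0.3, 0.7], [0.8, 0.6, 0.4]]
print(shaded_without_prepass(pixels), shaded_with_prepass(pixels))  # 7 3
```

The extra pass costs vertex processing for the whole scene a second time, so it pays off mainly when fragment shaders are expensive and overdraw is high.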

Page 15: GPU - how can we use it?

Reduce the number of attributes: "pack" them where possible. float4 myData is better than:

▪ float3 myDataOne;
▪ float1 myDataTwo;

But do not pack in interpolators

Use as few scalars as possible

When vectors are packed no optimizations can be performed

What do you really need?

Normal, binormal, tangent… no! You need only two of them! Binormal = Normal cross Tangent
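Rebuilding the binormal from the other two vectors is a single cross product. A minimal sketch, with vectors as plain tuples and an axis-aligned example chosen for readability:

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

normal = (0.0, 0.0, 1.0)
tangent = (1.0, 0.0, 0.0)
binormal = cross(normal, tangent)
print(binormal)  # (0.0, 1.0, 0.0)
```

Depending on the handedness convention of the tangent space, the result may need to be negated; that choice is outside what the slide specifies.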

Page 16: GPU - how can we use it?

PerfKit
• For DirectX mostly
• Little support for OpenGL, via glExpert

PIX for Windows
• Shows everything! But only for Windows, DirectX…

Similar to PIX, but for OpenGL… $800 ;(

AMD GPU Perf

Page 17: GPU - how can we use it?

GLIntercept
• OpenGL
• free
• logs every call of an OpenGL command
• edit shaders in real time
• although it is a bit simple, it has a powerful impact on debugging…

Page 18: GPU - how can we use it?

GPU ShaderAnalyzer
• free, from AMD!
• GLSL/HLSL
• shows the number of asm instructions
• ALU, TEX instructions, etc.
• bottlenecks

Page 19: GPU - how can we use it?

FXComposer, by NVidia

RenderMonkey by AMD/ATI

ShaderDesigner by TyphoonLabs
