Status – Week 281. Victor Moya.

Post on 21-Dec-2015


Page 1: Status – Week 281 Victor Moya. Objectives Research in future GPUs for 3D graphics. Research in future GPUs for 3D graphics. Simulate current and future

Status – Week 281

Victor Moya

Page 2: Objectives

- Research in future GPUs for 3D graphics.
- Simulate current and future 3D graphics hardware.
- Finish (someday) the PhD ;).

Page 3: Problems

- Information.
- Choice of the simulation target:
  - Current GPUs.
  - Near future GPUs.
  - Absolutely new GPU designs.
- Future is hard to predict.
- But GPUs change very fast.
- Fierce competition between ATI and NVidia.
- Matrox and 3DLabs follow (3DLabs can rule the workstation market). SiS and VIA as OEM.

Page 4: Status

- Designing a hardware 3D graphics pipeline:
  - Command processors.
  - Vertex Shader.
  - Divide by w, Clip, Culling and Triangle Setup.
  - Rasterization.
  - Pixel shaders.
  - Antialiasing.
- Designing the simulator.

Page 5: 3D Graphics Pipeline

Page 6: Geometry

Vertex operations:
- (1) Transform coordinates and normal:
  - Model => World.
  - World => Eye.
- (2) Normalize the length of the normal.
- (3) Compute vertex lighting.
- (4) Transform texture coordinates.
- (5) Transform coordinates to clip coordinates (projection).
- (8) Divide coordinates by w.
- (9) Apply affine viewport transform (x, y, z).
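Steps (1), (5), (8) and (9) above can be sketched for a single vertex position as follows. This is only an illustration: the function names, the row-major 4x4 matrix layout, the combined model-view matrix and the OpenGL-style mapping of depth to [0, 1] are assumptions, not something the slides specify.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def transform_vertex(position, model_view, projection, viewport):
    """Take an (x, y, z) model-space position to window coordinates.

    viewport is (origin_x, origin_y, width, height); all names are hypothetical.
    """
    x, y, z = position
    eye = mat_vec(model_view, (x, y, z, 1.0))   # (1) model => world => eye
    cx, cy, cz, cw = mat_vec(projection, eye)   # (5) eye => clip coordinates
    ndc = (cx / cw, cy / cw, cz / cw)           # (8) divide by w
    vx, vy, width, height = viewport
    # (9) affine viewport transform: x, y from [-1, 1] to pixels, z to [0, 1]
    return (vx + (ndc[0] + 1.0) * 0.5 * width,
            vy + (ndc[1] + 1.0) * 0.5 * height,
            (ndc[2] + 1.0) * 0.5)
```

Clipping (7) normally happens between steps (5) and (8), which is why the sketch assumes the vertex has a non-zero w.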

Page 7: Geometry

Primitive operations:
- (6) Primitive assembly.
- (7) Clipping.
- (10) Backface cull: eliminate back-facing triangles.

Primitive generation: new pipeline stage (ATI TruForm).
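A minimal sketch of steps (6) and (10), assuming a plain triangle list, screen-space (x, y) vertices with y pointing up, and counter-clockwise front faces. The helper names are hypothetical.

```python
def assemble_triangles(vertices):
    """(6) Group a triangle list into 3-vertex primitives; leftover vertices are dropped."""
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]


def signed_area_2x(v0, v1, v2):
    """Twice the signed area of a screen-space triangle (positive = counter-clockwise)."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])


def is_back_facing(v0, v1, v2, front_is_ccw=True):
    """(10) A triangle is back-facing when its winding is opposite to the front winding."""
    area = signed_area_2x(v0, v1, v2)
    return area < 0 if front_is_ccw else area > 0
```

Zero-area triangles are not reported as back-facing here; a real pipeline would usually discard them separately.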

Page 8: Vertex Shader

- VS 1.0, 1.1 and 1.2 (current technology) for Direct3D 8 and 8.1.
- OpenGL extensions: ARB_vertex_program (finally in OpenGL v1.4), NV_vertex_program1_1 (NVidia), EXT_vertex_shader (ATI).
- No branching.
- Single cycle execution latency (?).
- Single issue: one instruction each cycle.
- Simple in-order pipeline (?).

Page 9: Vertex Shader

- 16 input registers (read only).
- 15 output registers (write only).
- 12 temporary registers (read/write).
- 96 constant registers (read only or read/write?).
- 128 instructions max.

Page 10: Vertex Shader

Opcode  Inputs (scalar or vector)  Output (vector or replicated scalar)  Operation
------  -------------------------  ------------------------------------  ----------------------------
ARL     s                          address register                      address register load
MOV     v                          v                                     move
MUL     v,v                        v                                     multiply
ADD     v,v                        v                                     add
MAD     v,v,v                      v                                     multiply and add
RCP     s                          ssss                                  reciprocal
RSQ     s                          ssss                                  reciprocal square root
DP3     v,v                        ssss                                  3-component dot product
DP4     v,v                        ssss                                  4-component dot product
DST     v,v                        v                                     distance vector
MIN     v,v                        v                                     minimum
MAX     v,v                        v                                     maximum
SLT     v,v                        v                                     set on less than
SGE     v,v                        v                                     set on greater than or equal
EXP     s                          v                                     exponential base 2
LOG     s                          v                                     logarithm base 2
LIT     v                          v                                     light coefficients
DPH     v,v                        ssss                                  homogeneous dot product
RCC     s                          ssss                                  reciprocal clamped
SUB     v,v                        v                                     subtract
ABS     v                          v                                     absolute value
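To make the table concrete, here is a small functional model of a subset of these opcodes, the kind of thing a simulator could start from. It is a sketch under assumptions: registers are plain 4-tuples in a dictionary, scalar instructions read the x component (real programs pick the component with a swizzle), RSQ is assumed to operate on the absolute value of its input, and swizzles, write masks and the remaining opcodes are left out. The names `OPS` and `run` are hypothetical.

```python
import math


def _splat(s):
    """Replicate a scalar result into all four components ("ssss" in the table)."""
    return (s, s, s, s)


def _dot(a, b, n):
    """Dot product over the first n components."""
    return sum(a[i] * b[i] for i in range(n))


# Each opcode maps 4-component source tuples to a 4-component result.
OPS = {
    "MOV": lambda a: a,
    "ABS": lambda a: tuple(abs(x) for x in a),
    "ADD": lambda a, b: tuple(x + y for x, y in zip(a, b)),
    "SUB": lambda a, b: tuple(x - y for x, y in zip(a, b)),
    "MUL": lambda a, b: tuple(x * y for x, y in zip(a, b)),
    "MAD": lambda a, b, c: tuple(x * y + z for x, y, z in zip(a, b, c)),
    "MIN": lambda a, b: tuple(min(x, y) for x, y in zip(a, b)),
    "MAX": lambda a, b: tuple(max(x, y) for x, y in zip(a, b)),
    "SLT": lambda a, b: tuple(1.0 if x < y else 0.0 for x, y in zip(a, b)),
    "SGE": lambda a, b: tuple(1.0 if x >= y else 0.0 for x, y in zip(a, b)),
    "DP3": lambda a, b: _splat(_dot(a, b, 3)),
    "DP4": lambda a, b: _splat(_dot(a, b, 4)),
    "DPH": lambda a, b: _splat(_dot(a, b, 3) + b[3]),
    # Scalar ops read .x here; Python raises on a zero input where hardware would not.
    "RCP": lambda a: _splat(1.0 / a[0]),
    "RSQ": lambda a: _splat(1.0 / math.sqrt(abs(a[0]))),  # abs() is an assumption
}


def run(program, regs):
    """Execute (opcode, dst, src, ...) tuples in order against a register dict."""
    for opcode, dst, *srcs in program:
        regs[dst] = OPS[opcode](*(regs[s] for s in srcs))
    return regs
```

For example, `run([("DP4", "R0", "v0", "c1")], regs)` writes the 4-component dot product of `v0` and `c1` into every component of `R0`, matching the "ssss" output column.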

Page 11: Clipping

- Clip geometry primitives with the view frustum (6 planes).
- Clip geometry primitives with the user clip planes.
- Techniques used:
  - Guard-Band Clipping.
  - Homogeneous rasterization avoids clipping in the geometry stage.
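The frustum test and the guard-band idea can be sketched with per-vertex outcodes. The sketch assumes OpenGL-style clip coordinates (a vertex is inside when -w <= x, y, z <= w) and a guard band that widens only the x and y limits by a factor `guard_band`; the function names and return strings are hypothetical, and user clip planes are not handled.

```python
def outcode(v, xy_scale=1.0):
    """Bit mask of the planes that the clip-space vertex (x, y, z, w) lies outside."""
    x, y, z, w = v
    code = 0
    if x < -xy_scale * w:
        code |= 1
    if x > xy_scale * w:
        code |= 2
    if y < -xy_scale * w:
        code |= 4
    if y > xy_scale * w:
        code |= 8
    if z < -w:
        code |= 16
    if z > w:
        code |= 32
    return code


def classify_triangle(verts, guard_band=1.0):
    """Return "reject", "accept" (no clipping needed) or "clip"."""
    view = [outcode(v) for v in verts]
    if view[0] & view[1] & view[2]:
        return "reject"  # all three vertices are outside the same frustum plane
    guard = [outcode(v, guard_band) for v in verts]
    if not (guard[0] | guard[1] | guard[2]):
        return "accept"  # fully inside the guard band: the rasterizer discards the excess
    return "clip"
```

With `guard_band=1.0` this is the usual trivial accept/reject test; a larger factor lets more triangles skip geometric clipping, provided the rasterizer can handle coordinates that far outside the viewport.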

Page 12: Guard-Band Clipping

Page 13: Homogeneous coordinates

“Triangle Scan Conversion using 2D Homogeneous Coordinates”, Olano and Greer.
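The core idea of that approach, as I understand it, is to derive the three edge functions directly from the (x, y, w) vertex coordinates by inverting the 3x3 vertex matrix, without dividing by w first. The sketch below illustrates that idea only; it is not the paper's full algorithm, and the function names are hypothetical.

```python
def edge_functions(v0, v1, v2):
    """Return three (a, b, c) edge functions for vertices given as (x, y, w).

    Edge function i evaluates to 1 at vertex i and 0 at the other two when fed
    homogeneous coordinates. Returns None for a singular (zero-area) triangle.
    """
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    # Cofactors of the vertex matrix [[x0, y0, w0], [x1, y1, w1], [x2, y2, w2]].
    c = [
        (y1 * w2 - w1 * y2, -(x1 * w2 - w1 * x2), x1 * y2 - y1 * x2),
        (-(y0 * w2 - w0 * y2), x0 * w2 - w0 * x2, -(x0 * y2 - y0 * x2)),
        (y0 * w1 - w0 * y1, -(x0 * w1 - w0 * x1), x0 * y1 - y0 * x1),
    ]
    det = x0 * c[0][0] + y0 * c[0][1] + w0 * c[0][2]
    if det == 0:
        return None
    return [tuple(k / det for k in row) for row in c]


def covers(edges, px, py):
    """True when screen point (px, py) is inside the triangle: all edge functions >= 0."""
    return all(a * px + b * py + c >= 0 for a, b, c in edges)
```

Because the edge functions are built from unprojected coordinates, they stay well defined when some vertices have w <= 0, which is what lets the geometry stage skip clipping.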

Page 14: Rasterization

- Setup (per-triangle).
- Sampling (triangle = {fragments}).
- Interpolation (interpolate colors and coordinates).

Page 15: Rasterization

- Converts primitives to fragments.
- Primitive: point, line, polygon, …
- Fragment: transient data structure
      short x, y;
      long depth;
      short r, g, b, a;
- Fragment selection.
- Parameter assignment (color, depth ...).
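Setup, fragment selection and parameter assignment can be sketched with a bounding-box rasterizer based on edge functions. This is one common software formulation, not necessarily what the hardware does. It assumes vertices given as (x, y, depth, rgba) in window coordinates, samples at pixel centres, uses linear (not perspective-correct) interpolation, and ignores fill rules for shared edges. `Fragment` mirrors the structure above but stores floats for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float
    rgba: tuple


def edge(a, b, px, py):
    """Signed area term of point (px, py) relative to the directed edge a -> b."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def rasterize(v0, v1, v2):
    """Convert one triangle into a list of fragments."""
    area = edge(v0, v1, v2[0], v2[1])  # setup: computed once per triangle
    if area == 0:
        return []
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    fragments = []
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(v1, v2, px, py) / area
            w1 = edge(v2, v0, px, py) / area
            w2 = edge(v0, v1, px, py) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # fragment selection: sample is outside the triangle
            # Parameter assignment: barycentric interpolation of depth and color.
            depth = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
            rgba = tuple(w0 * a + w1 * b + w2 * c
                         for a, b, c in zip(v0[3], v1[3], v2[3]))
            fragments.append(Fragment(x, y, depth, rgba))
    return fragments
```

Dividing the weights by the signed area makes the inside test independent of the triangle's winding.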

Page 16: Programmable Pipeline

Page 17: Vertex Program

Page 18: Vertex Program

Page 19: NV_vertex_program2

- ARL (new support for four-component A0 and A1 instead of just A0.x)
- ARR (similar to ARL, but rounds instead of truncating before storing the integer result in an address register)
- BRA, CAL, RET (branching instructions)
- COS, SIN (high-precision trigonometric functions)
- FLR, FRC (floor and fraction of floating-point values)
- EX2, LG2 (high-precision exponentiation and logarithm functions)
- ARA (adds pairs of components of an address register; useful for looping and other operations)
- SEQ, SFL, SGT, SLE, SNE, STR (“set on” instructions similar to SLT, SGE)
- SSG (“set sign” operation; generates a vector holding –1.0 for negative operand components, 0 for zero-value components, and +1.0 for positive components)
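A few of these instructions are simple enough to model in a couple of lines each. The sketch below covers ARL, ARR, FLR, FRC and SSG on scalars or 4-tuples. It assumes that ARL takes the floor of its input and that ARR rounds halves to even (which is what Python's `round` does); check both against the extension specification before relying on them. The function names are just lowercase opcode names.

```python
import math


def arl(s):
    """Address register load: assumed here to take the floor of the scalar."""
    return math.floor(s)


def arr(s):
    """Address register load with rounding (Python rounds halves to even)."""
    return round(s)


def flr(v):
    """Component-wise floor."""
    return tuple(float(math.floor(c)) for c in v)


def frc(v):
    """Component-wise fraction, always in [0, 1)."""
    return tuple(c - math.floor(c) for c in v)


def ssg(v):
    """Set sign: -1.0 for negative, 0.0 for zero, +1.0 for positive components."""
    return tuple(float((c > 0) - (c < 0)) for c in v)
```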

Page 20: NV_vertex_program2 Overview

1. Condition codes
2. Branching & subroutines
3. Even faster performance
4. Nineteen new instructions
5. New source modifiers
6. Clip plane support
7. More registers & instructions

Page 21: NV_vertex_program2 Resource Limits

- 256 vertex program parameters
  - Up from 96
- 16 temporary registers
  - Up from 12
- Two 4-component address registers
  - Up from one single-component address register
- 256 static instructions per program
  - Up from 128
- Given branching, 65536 dynamic instructions can execute before termination to avoid infinite loops
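In a simulator, the static and dynamic limits translate into a length check plus an execution counter. The sketch below uses a toy program representation with only `BRA`, `RET` and a generic non-branching instruction; that representation and the function name are assumptions made for illustration.

```python
MAX_STATIC_INSTRUCTIONS = 256
MAX_DYNAMIC_INSTRUCTIONS = 65536


def run_with_limit(program, max_dynamic=MAX_DYNAMIC_INSTRUCTIONS):
    """Run a toy program and return how many instructions were executed.

    program is a list of tuples: ("BRA", target_index), ("RET",) or any other
    opcode, which is treated as a non-branching instruction.
    """
    if len(program) > MAX_STATIC_INSTRUCTIONS:
        raise ValueError("program exceeds the static instruction limit")
    pc = 0
    executed = 0
    while pc < len(program) and executed < max_dynamic:
        op = program[pc]
        executed += 1
        if op[0] == "BRA":
            pc = op[1]        # unconditional branch
        elif op[0] == "RET":
            break             # end of the (top-level) program
        else:
            pc += 1
    return executed
```

An infinite loop such as `[("BRA", 0)]` stops after exactly 65536 executed instructions instead of hanging the simulation.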

Page 22: NV_vertex_program2 Source Modifiers

- Source operand absolute value
  - Example: MOV R0, |R1|;
- In addition to source negation & swizzling
  - Example: MAD R0, -|R1|.yzwy, |R2|, -R3.w;
- Swizzle, negate, & absolute value operations are “free” source modifiers
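The three modifiers compose in a fixed order on each source operand: select the component, take the absolute value, then negate. A minimal sketch follows; the function name and keyword arguments are hypothetical.

```python
def apply_source_modifiers(reg, swizzle="xyzw", absolute=False, negate=False):
    """Return the 4-component value an instruction sees for one source operand."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    if len(swizzle) == 1:
        swizzle = swizzle * 4  # a single component such as ".w" is replicated
    out = []
    for name in swizzle:
        c = reg[index[name]]
        if absolute:
            c = abs(c)
        if negate:
            c = -c
        out.append(c)
    return tuple(out)
```

For the operand `-|R1|.yzwy` with `R1 = (1, -2, 3, -4)`, the call `apply_source_modifiers(R1, "yzwy", absolute=True, negate=True)` yields `(-2, -3, -4, -2)`.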

Page 23: NV_vertex_program2 Condition Codes (1)

- Condition code state
  - 4-component register stores condition code values
  - Four possible values
    - LT – less than zero
    - EQ – equal to zero
    - GT – greater than zero
    - UN – unordered, for comparisons involving NaN
- Most instructions optionally update condition code state
  - Indicated with “C” suffix: DP4C, MOVC, etc.
  - “CC” pseudo-register used to just update condition codes

Page 24: NV_vertex_program2 Condition Codes (2)

- Optional condition code based destination masking
  - Example: MOV R1.xy(NE.z), R0;
  - Copy R0 components to R1’s X & Y components except when condition code’s Z component is EQ
- Condition code rules: EQ, equal; GE, greater or equal; GT, greater than; LE, less or equal; LT, less than; NE, not equal; FL, false; and TR, true
- Note that condition code masking rule can swizzle condition code components
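Condition codes and destination masking can be modelled as follows. The sketch assumes that NE also passes for unordered (UN) values and that the other ordered rules fail for UN. The `RULES` table, the function names and the argument order are all hypothetical.

```python
import math


def condition_code(value):
    """Classify a result component as LT, EQ, GT or UN (NaN)."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value > 0:
        return "GT"
    return "EQ"


# Which stored condition codes satisfy each rule (treatment of UN is an assumption).
RULES = {
    "EQ": {"EQ"},
    "GE": {"GT", "EQ"},
    "GT": {"GT"},
    "LE": {"LT", "EQ"},
    "LT": {"LT"},
    "NE": {"LT", "GT", "UN"},
    "FL": set(),
    "TR": {"LT", "EQ", "GT", "UN"},
}


def masked_move(dst, src, cc, write_mask="xyzw", rule="TR", cc_swizzle="xyzw"):
    """MOV with a write mask and a condition code test per destination component."""
    if len(cc_swizzle) == 1:
        cc_swizzle = cc_swizzle * 4  # e.g. "(NE.z)" tests CC.z for every component
    out = list(dst)
    for i, name in enumerate("xyzw"):
        tested = cc["xyzw".index(cc_swizzle[i])]
        if name in write_mask and tested in RULES[rule]:
            out[i] = src[i]
    return tuple(out)
```

With this model, `MOV R1.xy(NE.z), R0;` becomes `masked_move(R1, R0, cc, "xy", "NE", "z")`: the X and Y components are copied unless the Z condition code is EQ.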

Page 25: ATI R300. Vertex Shader.

Page 26: 3DLabs P10. Pipeline.

Page 27: Matrox Parhelia. Pipeline.