Page 1: Questions about GPUs

1

Questions about GPUs

AS-Temps réel – Bordeaux

Philippe Decaudin, Fabrice Neyret

Stéphane Guy, Sylvain Lefebvre

Page 2: Questions about GPUs

2

Overview

1. Suspicious behaviors of GPUs.

2. What could be improved quickly.

3. Our wishes.

Page 3: Questions about GPUs

3

1. Suspicious behaviors

• 3D Textures, MIP-mapping and anisotropy

• Clamp to border and MIP-mapping

• Deferred shading for conditions

Page 4: Questions about GPUs

4

3D textures and MIP-mapping

• Filtering of 3D textures is isotropic (same MIP-map level for u, v and w)

When texturing a slice of a volume with a 3D texture, the slope of the slice controls the blur:

sloped => blurry

With 2D textures, this problem is solved by anisotropic filtering, but:

- it is not available for 3D textures…

- no dsdw
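The effect can be illustrated with a minimal sketch of isotropic level selection. It assumes the usual scheme where a single level is derived from the longest axis of the pixel footprint in texel space; real hardware may approximate this, and the function name `isotropic_lod` is hypothetical.

```python
import math

def isotropic_lod(duvw_dx, duvw_dy, size):
    """Single MIP level shared by u, v and w (assumed selection rule).

    duvw_dx, duvw_dy: derivatives of the normalized texture coordinates
    for a one-pixel step in screen x and screen y.
    size: texture resolution along u, v and w, in texels.
    """
    # Scale the normalized derivatives to texel units.
    fx = [d * n for d, n in zip(duvw_dx, size)]
    fy = [d * n for d, n in zip(duvw_dy, size)]
    # The longest footprint axis drives the level for all three coordinates.
    rho = max(math.sqrt(sum(c * c for c in fx)),
              math.sqrt(sum(c * c for c in fy)))
    if rho <= 1.0:
        return 0.0  # magnification: finest level
    return math.log2(rho)

size = (64, 64, 64)
# Slice facing the viewer: one pixel covers one texel in both directions.
facing = isotropic_lod((1 / 64, 0.0, 0.0), (0.0, 1 / 64, 0.0), size)
# Sloped slice: a step in screen y crosses 4 texels along w.
sloped = isotropic_lod((1 / 64, 0.0, 0.0), (0.0, 0.0, 4 / 64), size)
```

In the sloped case the level rises for u as well, although the footprint along screen x is still one texel wide, which is the blur described above.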

Page 5: Questions about GPUs

5

Clamp_to_border and MIP-mapping

• Using non-square textures (2^a × 2^b, a ≠ b)

• CLAMP_TO_BORDER_ARB enabled

• MIP-map enabled

At MIP-map levels beyond log2 of the smaller dimension, the texture becomes 1D (one texel wide):

clamp doesn’t seem to work correctly…

Page 6: Questions about GPUs

6

Clamp_to_border and MIP-mapping

Test setup: a quad mapped with a white 256×64 texture

• (u,v) = (0,0) to (6,6)

• mapping clamped to [0,1]×[0,1]

• border color is black

• MIP-map level forced with GL_TEXTURE_MIN_LOD

[Figure: rendered quads, corners labelled (0,0) and (6,6)]

• Square texture (64×64): correct

• Texture 256×64: clamp problem (bug?)

– MIP-map level = 1 (256×64)

– MIP-map level = 8 (1×1)

Page 7: Questions about GPUs

7

Clamp_to_border and MIP-mapping

all MIP-map levels

128x32 64x16 32x8 16x4 8x2 4x1 2x1 1x1
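The level list above follows from halving each dimension and never going below one texel. A short sketch, with the hypothetical helper name `mip_chain`, reproduces it and shows where the texture degenerates to 1D:

```python
def mip_chain(width, height):
    """Return the (width, height) of every MIP-map level, finest first."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        # Each dimension is halved independently and clamped at 1 texel.
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels

chain = mip_chain(128, 32)
# Index (counting from 0) of the first level that is one texel wide.
first_1d = next(i for i, (w, h) in enumerate(chain) if min(w, h) == 1)
```

For a 128x32 texture the first one-texel-wide level is 4x1, at index 5 = log2(32) when counting from 0; the levels after it only shrink along the longer axis.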

Page 8: Questions about GPUs

8

Deferred shading for conditions

Custom shader:

• costly particular case for some pixels

• general case fast.

Theory:

1. fast, but a few pixels are wrong

Page 9: Questions about GPUs

9

Deferred shading for conditions

Theory:

1. fast, but a few pixels are wrong

2. flag wrong pixels in stencil

Page 10: Questions about GPUs

10

Deferred shading for conditions

Theory:

1. fast, but a few pixels are wrong

2. flag wrong pixels in stencil

3. use slow correction shader only where needed

Page 11: Questions about GPUs

11

Deferred shading for conditions

Theory:

1. fast, but a few pixels are wrong

2. flag wrong pixels in stencil

3. use slow correction shader only where needed

Practice: does not work!?

• costs the same as full rendering

• stencil test performed after the fragment program?
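The gap between theory and practice can be made concrete with a toy count of shader invocations. This is only a sketch of the hypothesis raised on the slide (stencil test applied after the fragment program); the function name and the cost model, which ignores the cost of the flagging pass, are assumptions.

```python
def count_shader_runs(wrong, stencil_before_shader):
    """Count fast and slow shader invocations for the three-pass scheme.

    wrong: list of booleans, True where the fast shader gives a wrong pixel.
    stencil_before_shader: True if fragments rejected by the stencil test
    are discarded before the fragment program runs.
    """
    n = len(wrong)
    fast_runs = n            # pass 1: fast shader on every pixel
    stencil = list(wrong)    # pass 2: flag wrong pixels in the stencil
    if stencil_before_shader:
        slow_runs = sum(stencil)   # pass 3: only flagged pixels are shaded
    else:
        slow_runs = n              # pass 3: all pixels shaded, then tested
    return fast_runs, slow_runs

wrong = [False] * 95 + [True] * 5
theory = count_shader_runs(wrong, stencil_before_shader=True)
practice = count_shader_runs(wrong, stencil_before_shader=False)
```

With 5% of the pixels flagged, the early test runs the slow shader 5 times, while the late test runs it 100 times, which matches the observation that the correction pass costs the same as a full rendering.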

Page 12: Questions about GPUs

12

2. What could be improved quickly

• Front to back rendering with alpha

• Fog and pre-multiplied alpha

• Scale / Bias not limited to [0,1]

• Tiled textures storage

Page 13: Questions about GPUs

13

Front to back rendering with alpha

• Theory: render front to back (to benefit from Z-test culling)

• Practice with alpha: the depth test must be disabled! Landscapes, billboards, volumes… become even more costly

• Suggested solution: glEnable(ALPHA_DEPTH_TEST)

doing regular blending, then

if (Ascreen > Athreshold) Zscreen := Zfrag
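A per-pixel simulation of the suggested behavior might look as follows. ALPHA_DEPTH_TEST is the proposal from the slide, not an existing OpenGL state; the sketch also assumes front-to-back "under" compositing of the accumulated color and opacity, which the slide does not spell out.

```python
def render_front_to_back(fragments, a_threshold):
    """Blend the fragments of one pixel, nearest first.

    fragments: (depth, color, alpha) tuples in any order.
    Returns (color, alpha, number of fragments actually blended).
    """
    color, alpha = 0.0, 0.0     # accumulated screen color and opacity
    z_screen = float("inf")     # depth buffer value for this pixel
    blended = 0
    for z_frag, c, a in sorted(fragments):   # tuples sort by depth first
        if z_frag >= z_screen:
            continue            # depth test: hidden behind an opaque layer
        # "Under" compositing: new fragment goes behind what is on screen.
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        blended += 1
        if alpha > a_threshold:
            z_screen = z_frag   # proposed rule: write depth once opaque enough
    return color, alpha, blended

layers = [(1.0, 1.0, 0.6), (2.0, 0.5, 0.6), (3.0, 0.2, 0.6), (4.0, 0.9, 0.6)]
culled = render_front_to_back(layers, a_threshold=0.8)
full = render_front_to_back(layers, a_threshold=1.0)
```

With a threshold of 0.8 the depth is written after the second layer, so the two layers behind it are rejected by the depth test; with a threshold of 1.0 the depth is never written and all four layers are blended.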

Page 14: Questions about GPUs

14

Fog and pre-multiplied alpha

• Premultiplied alpha textures (aR,aG,aB,a) to avoid interpolation artefacts.

• Blend mode is then set to glBlendFunc(GL_ONE,GL_ONE_MINUS_SRC_ALPHA)
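The interpolation artefact can be shown on a single pair of texels. The sketch below is illustrative only: the helper names are hypothetical and it interpolates halfway between an opaque red texel and a fully transparent green one, then composites over a black background.

```python
def lerp(p, q, t):
    """Linear interpolation between two tuples, as a bilinear filter would do."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(p, q))

def premultiply(rgba):
    r, g, b, a = rgba
    return (a * r, a * g, a * b, a)

def over_straight(src, dst):
    # Equivalent of glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
    r, g, b, a = src
    return tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), dst))

def over_premultiplied(src, dst):
    # Equivalent of glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)
    r, g, b, a = src
    return tuple(s + (1.0 - a) * d for s, d in zip((r, g, b), dst))

red = (1.0, 0.0, 0.0, 1.0)      # opaque red texel
hidden = (0.0, 1.0, 0.0, 0.0)   # fully transparent texel whose RGB is green
black = (0.0, 0.0, 0.0)

straight = over_straight(lerp(red, hidden, 0.5), black)
premult = over_premultiplied(lerp(premultiply(red), premultiply(hidden), 0.5), black)
```

With straight alpha the invisible green leaks into the filtered result, (0.25, 0.25, 0.0); with premultiplied texels the result is a clean half-intensity red, (0.5, 0.0, 0.0).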

Page 15: Questions about GPUs

15

Fog and pre-multiplied alpha

• Problem: not compatible with the standard fog equation (glEnable(GL_FOG))

Fog equation: $C'_{frag} = (1-f)\,C_{frag} + f\,C_{fog}$

In our case, $C_{frag} = a\,C$ (premultiplied alpha)

Blend equation: $C'_{dest} = C'_{frag} + (1-a)\,C_{dest}$

$C'_{dest} = (1-f)\,a\,C + f\,C_{fog} + (1-a)\,C_{dest}$ (the fog term $f\,C_{fog}$ is missing a factor ‘a’)

• Suggestion: if the source blend factor is GL_ONE, then

fog equation: $C'_{frag} = (1-f)\,C_{frag} + f\,a\,C_{fog}$
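A quick numeric check of the suggestion, on a single color channel. The reference result is taken to be straight-alpha rendering (fog applied to the unmultiplied color, then regular alpha blending); the function names and sample values are arbitrary assumptions.

```python
import math

def blend_premultiplied(c_frag, a, c_dest):
    # glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)
    return c_frag + (1.0 - a) * c_dest

def standard_fog(c_frag, f, c_fog):
    return (1.0 - f) * c_frag + f * c_fog

def premultiplied_fog(c_frag, a, f, c_fog):
    # Suggested variant: the fog color is weighted by the fragment alpha.
    return (1.0 - f) * c_frag + f * a * c_fog

c, a, f, c_fog, c_dest = 0.8, 0.5, 0.25, 0.4, 0.2

# Reference: fog the straight color, then blend with SRC_ALPHA.
reference = a * standard_fog(c, f, c_fog) + (1.0 - a) * c_dest
# Premultiplied fragment through the standard fog equation.
broken = blend_premultiplied(standard_fog(a * c, f, c_fog), a, c_dest)
# Premultiplied fragment through the suggested fog equation.
fixed = blend_premultiplied(premultiplied_fog(a * c, a, f, c_fog), a, c_dest)
```

Here the reference and the suggested equation both give 0.45, while the standard fog equation gives 0.5: the excess is f (1-a) C_fog, the fog contribution that should have been attenuated by the fragment's alpha.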

Page 16: Questions about GPUs

16

Scale / Bias not limited to [0,1]

• Theory: scale and bias to tune transparency, light, color…

• Practice: coefficients must lie in [0,1]! Applications which need a coefficient > 1 must use multiple passes (volume rendering, light maps…)

• Suggested solution: float operations in texture units for scaling.
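One way to picture the multipass workaround is to split the coefficient into per-pass coefficients that each fit in [0,1] and accumulate them. This is only an illustrative sketch assuming additive accumulation between passes; the helper name is hypothetical, and a fixed-point framebuffer would still clamp the accumulated result at 1.

```python
def split_into_passes(coef):
    """Split a non-negative coefficient into per-pass coefficients in [0, 1]."""
    passes = []
    remaining = coef
    while remaining > 0.0:
        step = min(1.0, remaining)   # each pass is limited to a coefficient <= 1
        passes.append(step)
        remaining -= step
    return passes

passes = split_into_passes(2.5)
texel = 0.2
# Accumulate one additive pass per coefficient.
accumulated = sum(p * texel for p in passes)
```

A scale of 2.5 needs three passes (1, 1 and 0.5) where a float operation in the texture unit would need one.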

Page 17: Questions about GPUs

17

Tiled textures storage

• Principle:

– Use a set of tiles to create a larger texture

[Figure label: fragment program]

Page 18: Questions about GPUs

18

Tiled textures storage

• Numerous new advanced usages of tiles

– Antialiased Parameterized Solid Texturing [Hart et al.], ACM Transactions on Graphics 2002

– Adaptive Texture Maps [Kraus and Ertl], Graphics Hardware 2002

– Pattern Based Procedural Textures [Lefebvre and Neyret], I3D 2003

– … and more!

Page 19: Questions about GPUs

19

Tiled textures storage

• Storage requirements

– Efficient selection from FP (fragment program)

– No waste of memory

– Ease filtering / interpolation of tiles

• per-tile border color

• per-tile MIP-mapping

Page 20: Questions about GPUs

20

Tiled textures storage

• Current solutions

– Store as separate textures

• limited bindings

• no real selection in FP

– Store in an atlas

• false neighborhood and no border color

• waste space if filtering on tiles

• limited to 4096x4096

– Store as a stack in a 3D Texture

• waste a lot of space (2^n)

• ill-defined filtering

Page 21: Questions about GPUs

21

Tiled textures storage

• Our proposal: Texture stack

– 3D Texture with no filtering on R

– per-tile border colors

– per-tile filtering

– easy selection from fragment program

[Figure: texture stack with axes s, t, r; level 0 - 32x32x4, level 1 - 16x16x4, level 2 - 4x4x4]
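The difference between a regular 3D texture and the proposed stack shows up in the MIP-map level sizes. The sketch below assumes that a regular 3D texture halves all three dimensions per level while the stack halves only s and t; the helper names, including the nearest-tile selection `tile_to_r`, are hypothetical.

```python
def mip_sizes_3d(w, h, d):
    """Regular 3D texture: every dimension is halved at each level."""
    levels = [(w, h, d)]
    while max(w, h, d) > 1:
        w, h, d = max(1, w // 2), max(1, h // 2), max(1, d // 2)
        levels.append((w, h, d))
    return levels

def mip_sizes_stack(w, h, tiles):
    """Proposed stack: only s and t are halved, the tile count is preserved."""
    levels = [(w, h, tiles)]
    while max(w, h) > 1:
        w, h = max(1, w // 2), max(1, h // 2)
        levels.append((w, h, tiles))
    return levels

def tile_to_r(tile, tiles):
    """r coordinate at the center of a tile, for unfiltered selection on R."""
    return (tile + 0.5) / tiles

regular = mip_sizes_3d(32, 32, 4)
stack = mip_sizes_stack(32, 32, 4)
```

In the regular 3D texture, level 1 is 16x16x2, so pairs of tiles are already averaged together; in the stack every level keeps its 4 separate tiles, which is what makes per-tile MIP-mapping well defined.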

Page 22: Questions about GPUs

22

3. Our wishes

• Having more coherent features

– programmable blending

– feedback buffer

– read access to all state variables (e.g. MIP-mapping)

– r/w stencil from FP

• Native support of tiled textures

• Fragment culling order (stencil, A, depth)

• Tools for procedural textures

Page 23: Questions about GPUs

23

Having more coherent features

• Programmable blending:

– multi-pass algorithms with intermediate results

– need complex blending

– current blending is very limited compared to the programmable pipeline

Page 24: Questions about GPUs

24

Having more coherent features

• Feedback buffer:

– lots of vertex attributes

– few can be read back

Page 25: Questions about GPUs

25

Having more coherent features

• Perspective correction

– rasterization in object space is not always desired

– need control over the variables interpolated by the rasterizer

glEnable/Disable(param, GL_PERSPECTIVE_CORRECTION)
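The two interpolation modes the request distinguishes can be sketched along one edge. The per-parameter switch shown on the slide is a proposal, not an existing OpenGL call; the function below is a hypothetical illustration using the standard perspective-correct formula (interpolating u/w and 1/w linearly in screen space).

```python
def interpolate(u0, u1, w0, w1, t, perspective_correct=True):
    """Interpolate an attribute between two vertices.

    t: interpolation parameter in screen space (0 at vertex 0, 1 at vertex 1).
    w0, w1: clip-space w of the two vertices.
    """
    if not perspective_correct:
        # Plain screen-space (affine) interpolation.
        return (1.0 - t) * u0 + t * u1
    # Interpolate u/w and 1/w linearly, then divide.
    num = (1.0 - t) * u0 / w0 + t * u1 / w1
    den = (1.0 - t) / w0 + t / w1
    return num / den

corrected = interpolate(0.0, 1.0, 1.0, 3.0, 0.5)
affine = interpolate(0.0, 1.0, 1.0, 3.0, 0.5, perspective_correct=False)
```

Halfway across the screen-space edge, the perspective-correct value is 0.25 while the affine value is 0.5; a per-variable switch would let a program choose either behavior for each interpolated attribute.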

Page 26: Questions about GPUs

26

Native support of tiled textures

• Difficulties

– Storage (borders, repeat, …)

– MIP-mapping

– Linear interpolation

Page 27: Questions about GPUs

27

Fragment culling order

• Early culling essential for performance

• Let the user specify the test order between:

– alpha test

– stencil

– depth

– discard (from FP)

Page 28: Questions about GPUs

28

Tools for procedural textures

• New texture types (envmap, indir., N, shaders, …), custom filtering

• MIP-mapping levels

– have to be inferred from ddx, ddy

– although they are already computed for standard textures; they need to be accessible (read)

• resolution specifiable by user
