optimization for directx9 graphics€¦ · optimization for directx9 graphics ashu rege. last year:...

29
Optimization for DirectX9 Optimization for DirectX9 Graphics Graphics Ashu Rege

Upload: others

Post on 23-Jun-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Optimization for DirectX9 Optimization for DirectX9 GraphicsGraphics

Ashu Rege

Page 2: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Last Year: Batch, Batch, Batch

• Moral of the story: Small batches BAD

• What is a “batch”– Every DrawIndexedPrimitive call is a batch– All render, texture, shader, ... state is same

Page 3: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Simple Test App

• Degenerate Triangles (no fill cost)• Post TnL Cache Vertices (no xform cost)• Static Data (minimal AGP overhead)• Fixed (~100 K) Tris/Frame• Vary Number of Batches

Page 4: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Last Year’s Graph Updated Measured Performance: Different Batch-Sizes

0.0%10.0%20.0%30.0%40.0%50.0%60.0%70.0%80.0%90.0%

100.0%

10 60 110

160

300

800

1300

triangles/batch

mill

ion

tria

ngle

s/s

3GHz Pentium 4; RADEON 9800 XT

3Ghz Pentium 4; NVIDIA GeForce FX 5950 Ultra

Axis scale changeAxis scale change

Page 5: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

This Year: Son Of A Batch

• What makes an app ‘batchy’?– Too many state changes

• What kinds of state changes?

• Techniques to reduce batches

Page 6: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

State Changes

• Analysis of some popular games

• Top State Changes:– Texture State– Vertex Shaders and Vertex Shader Constants– Pixel Shaders and Pixel Shader Constants

Page 7: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Do State Changes Really Matter?

• Cost of state changes• Comparison with no state changes• One state change:

– Factor of 4 drop in fps (on average)• Multiple state changes:

– Another factor of 2-5 drop

Page 8: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

How To Sort?

• Seems like an n-dimensional problem

• Should I sort by texture, pixel shader, vertex shader, ... what?

Page 9: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Texture v. Pixel Shader

Different

Textures

Different Pixel Shaders

Page 10: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Collapse One Of The Axes

Different

Textures

Different Pixel Shaders

Page 11: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Texture Atlases

(0,0) (0.5,0) (1,0)

Texture A Texture B(0,0) (1,0)

(1,1)(0,1)

(0,0)

(0,1)

(1,0)

(1,1)

Texture Atlas

(0,1) (0.5,1) (1,1)

Page 12: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Basic Idea

• Select batch-breaking textures• Pack into one or more texture atlases• Update the uv-coordinates of models

• Convert multiple DIP calls into one

Page 13: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

What About Mip-Maps?

• What happens to the lowest 1x1 level?– Smearing?

• Tool-chain should generate mip-maps before packing

• Use special purpose mip-map filters

Page 14: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

What About Lower Levels? 1 16x16 Sub-Texture

12 8x8 Sub-Textures

4x4 Level

2x2 Level

Smearing

Page 15: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Auto-Generation of Mip-Maps

• 2x2 Box filter can also work for power-of-2 textures– Both atlas and sub-textures in it are pow2– Textures should not cross pow2 lines

Page 16: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Proper Placement For Box Filter

‘4’ power-of-2 lines

32x32 Atlas

‘16’ power-of-2 line

‘8’ power-of-2 lines

A 16x16 sub-texture cannot cross any ‘16’ power-of-2 lines.

Page 17: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

What About Lower Levels? 1 16x16 Sub-Texture

12 8x8 Sub-Textures

4x4 Level

2x2 Level

Smearing

Page 18: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Possible Solutions

• Terminate mip chain to fit smallest sub-texture– Image Quality and Performance Issues

• Use only sub-textures of same size– May be inflexible

• But there’s good news...

Page 19: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Cannot Access Lower Levels• A triangle’s texture coordinates never

span across sub-textures

• Worst case: pixel-sized triangle spanning entire sub-texture

• Only “1-texel” level is accessed– Fill it with valid data

Page 20: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Cannot Access Lower Levels

Pixel Sized Quad

• DirectX raster rules make it unlikely for smaller quad (or tri) to generate pixel

Page 21: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Other Issues

• Address modes such as clamp?– Use ddx, ddy in pixel-shader to emulate modes

• Smearing due to filtering– Texels on border of sub-textures get smeared– Aniso can help: smaller footprint– Do re-mapping of texcoords in pixel shaders– Pad textures with border texels

Page 22: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

DirectX9 Instancing API• What is it?

– Single draw call to draw multiple instances of the same model

• Why should you care?– Avoid DIP calls and minimize batching overhead

• What do you need?– DirectX 9.0c– VS 3.0/PS 3.0 support

Page 23: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

When To Use Instancing

• Many Instance of Same Model– Forest of trees, particle systems, sprites

• Encode per-instance data in auxiliary stream– Colors, texture coordinates, per-instance consts

• Not as useful if batching overhead is low– Fixed overhead to instancing

Page 24: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

How Does It Work?• Vertex stream frequency divider API

• Primary stream is a single copy of the model data

• Secondary stream: per instance data– pointer is advanced for each rendered instance

Vertex Data

Per instance dataVS_3_0

0

1

Page 25: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Simple Instancing Example• 100 poly trees

– Stream 0 contains just the one tree model– Stream 1 contains model WVP transforms

• Possibly calculated per frame based on the instances in the view

– Vertex Shader is the same as normal, except you use the matrix from the vertex stream instead of the matrix from VS constants

• If you are drawing 10k trees that’s a lot of draw call savings!– You could manipulate the VB and pre-transform verts, but

it’s often tricky, and you are replicating a lot of data

Page 26: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Some Test ResultsInstancing versus Single DIP calls

0 500 1000 1500 2000 2500

Batch Size

FPS

InstancingNo Instancing

1 million diffuse shaded polys in each run

Page 27: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Test Summary• Big win for small batch sizes• Fixed overhead for instancing• Cross-over point changes depending on

CPU and GPU, engine overhead etc.

Page 28: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

More Information• White paper and tools soon for texture

atlases on www.nvidia.com/developer• “Profiling Your DirectX Application” in

NVIDIA sponsored session on Wed.

Page 29: Optimization for DirectX9 Graphics€¦ · Optimization for DirectX9 Graphics Ashu Rege. Last Year: Batch, Batch, Batch ... – Both atlas and sub-textures in it are pow2 – Textures

Questions?• Contact: [email protected]