id tech 5 challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-jp_id_tech_5_challenges.pdf · game...

37
Beyond Programmable Shading 1 id Tech 5 Challenges From Texture Virtualization to Massive Parallelization J.M.P. van Waveren senior programmer id Software

Upload: others

Post on 07-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 1

id Tech 5 Challenges

From Texture Virtualization toMassive Parallelization

J.M.P. van Waverensenior programmer

id Software

Page 2: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 2

Menu

• GPU virtual texturing, a couple of interesting issues

• How virtual texturing got us to a parallel job system

• Widespread use of the job system throughout the engine

• Getting the jobs back onto the (GP) GPU

Page 3: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 3

Virtual Texturing

• Unique, very large virtual textures key to id tech 5 rendering

• Full description beyond the scope of this talk

Page 4: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 4

Virtual Texturing

Page 5: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 5

Virtual Texturing

Page 6: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 6

Virtual Texturing

Texture Pyramid with Sparse Page Residency Physical Page Texture

Quad-tree of Sparse Texture Pyramid

Page 7: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 7

Virtual Texturing

Page 8: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 8

Virtual Texturing

Page 9: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 9

Virtual Texturing

Very Large = 128k x 128k texels (1024 pages on a side)

Page 10: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 10

Virtual Texturing

Very Large = 128k x 128k texels (1024 pages on a side)

Page 11: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 11

Virtual Texturing

Very Large = 128k x 128k texels (1024 pages on a side)

Page 12: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 12

Virtual Texturing

Very Large = 128k x 128k texels (1024 pages on a side)

Page 13: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 13

Virtual Texturing

A few interesting issues...

- Texture filtering

- Thrashing due to physical memory oversubscription

- LOD transitions under high latency

Page 14: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 14

Virtual Texturing - Filtering

• We tried no filtering at all

• We tried bilinear filtering without borders

• Bilinear filtering with border works well

• Trilinear filtering reasonably but still expensive

• Anisotropic filtering possible via TXD (texgrad)

– 4-texel border necessary (max aniso = 4)

– TEX with implicit derivs ok too (on some hardware)

Page 15: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 15

Virtual Texturing - Thrashing

• Sometimes you need more physical pages than you have

• With conventional virtual memory, you must thrash

• With virtual texturing, you can globally adjust feedback

LOD bias until working set fits

1024 Physical Pages 64 Physical Pages

8x8 pages32 x 32 pages

Page 16: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 16

Virtual Texturing – LOD Snap

• Latency between first need and availability can be high

– Especially if optical disk read required (>100 msec seek!)

• Visible snap happens when magnified texture changes LOD

• If we used trilinear filtering, blending in detail would be easy

• Instead continuously update physical pages with blended data

Page 17: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 17

Virtual Texturing – LOD Snap

upsample blend

• Upsample coarse page immediately

• Then blend in finer data when available

Page 18: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 18

Virtual Texturing - Management

• Analysis tells us what pages we need

• We fetch what we can

Fetch Page

• But this is a real-time app... so no blocking allowed

• Cache handles hits, schedules misses to load in background

• Resident pages managed independent of disk cache

• Physical pages organized as quad-tree per virtual texture

• Linked lists for free, LRU, and locked pages

Page 19: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 19

Virtual Texturing - Feedback

• Feedback Analysis– Gen ~breadth-first quad-tree order w/ priority

0

1

2

3

Feedback Buffer

Color Buffer

Page 20: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 20

Virtual Texturing - Transcode

8x8

four

4x4

DCT DXT

Transcode pipelined to block or row level to reduce memory profile.

• Transcode– diffuse, specular, bump and cover/alpha

– specular block scale stored in bump

• Typically 2-6kB input, 40kB output

• Unmap, Transcode, and Map all happen in parallel on

platforms that can directly write texture memory

Page 21: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 21

Virtual Texturing - Pipeline

• Compute intensive complex system with dependencies

that we want to run in parallel on all the different

platforms

Analysis

Fetch Page

Allocate

Transcode

Unmap

Map

Threads / Jobs ?

CPU Virtual Texture Pipeline

GPU

feedback

Page 22: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 22

Game Engine Situation Today

• Logical GPU Architecture Stable

– DX9 == nirvana for conventional hardware graphics

– programmable stages, fixed topology

• CPU Architectures all over the map

– Fast single core model definitely dead

– Homogenous / Symmetric processors (PC, XBox)

• big cores w/ cache, 1-2 hardware threads / core

• some have complicated out-of-order processing

– Heterogeneous processors (Cell)

• 1-2 big cores

• multiple small in-order cores w/ local memory & DMA controller

– Streaming processors / GPGPU (NVIDIA / AMD GPUs, Intel Larrabee)

• many cores

• CUDA / OpenCL

• Challenge: one engine to efficiently harness them all

Page 23: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 23

What's the big deal?

• id Tech 5 does a lot of processing

– Animation blending – ~2 msec

– Collision detection – ~4 msec

– Obstacle avoidance – ~4 msec

– Transparency sorting – ~2 msec

– Virtual texturing – ~8 msec

– Misc processing – ~4 msec

– Rendering – ~10 msec

– Audio – ~4 msec

• And at 60 Hz, not much time to do it – 16 msec

• Portable parallel software architecture is required

Page 24: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 24

What Software Architecture?

• OS thread factoring

– Good for small # of cores

– Not terribly invasive

– Complexity grows nonlinearly

– Load balancing tricky

– Not a good match for cell SPUs

• Small stand-alone job decomposition

– Quite invasive rewrite

– Very scalable

– Almost required by cell SPUs

– Good for heterogeneous processors

Page 25: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 25

Job Processing System

• Simplicity key to scalability– Job has well defined input and output

– Independent stateless, no stalls, always completes

– Jobs added to job lists

– Multiple job lists

– Job lists fully independent

– Simple synchronization of jobs within list through “signal” and

“synchronize” tokens

Phase 1a Jobs

Phase 2a Jobs

Signal

Sync

Phase 2b Jobs

Job

Job

Job

Pipelined Job List

Simple Job ListPhase 1b Jobs

Signal

Sync

Phase 1 Job

Sync

Phased Job List

Signal

Phase 1 Job

Phase 2 Job

Phase 2 Job

Page 26: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 26

Death by Synchronization

• Synchronization means waiting, waiting destroys

parallelism

• Architectural decision: Job processing given 1 frame of

latency to complete

– Results of jobs show up a frame late

– Requires some algorithm surgery

• e.g. foliage

– Rules out some algorithms

• e.g. screen-space binning of transparency sort

– But overall, not a bad compromise

A0

B0

A0

B1

B0

A1

A1

B1

00 1

1

Page 27: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 27

id Tech 5 Job Decomposition

• Major parts of of id Tech 5 processing factored into jobs– Collision detection

– Animation blend

– Obstacle avoidance

– Virtual texturing

– Transparency processing (foliage, particles)

– Cloth simulation

– Water surface simulation

– Detail model generation (rocks, pebbles etc.)

Page 28: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 28

Collision Detection

• Two phases

– Query (continuous collision detection CCD)

• Check sub-model collisions

– Merge

• Find the first collision or gather all contacts

• Player physics does not use delayed detection

– 16 msec extra delay in user feedback undesirable

Page 29: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 29

Animation Blend

• Animation graph or “web” describes valid transitions

• A stack is used to evaluate a blend tree

– Leaves are decoded source animations

– Parents are intermediate blend results

• Tree walking generates a command list for the stack

• Most blending happens in local space (parallel)

• Final phase moves everything to model space

Page 30: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 30

Obstacle Avoidance

• One job per character that wants to avoid obstacles

• Construction of job input comes from a scan of

Area Awareness System for potential obstacles and their

surroundings

Page 31: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 31

Transparency

• Transparency requires sorting and blending: expensive

– Must be handled separately

• Restrict to particle systems and foliage

• Limited buffer size

• Split into a number of jobs

– Foliage gather

– Foliage gen

– Particle gen

– Transparency sort and index gen

• Tricky to keep these jobs under

SPU limits

Page 32: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 32

Jobs on the (GP) GPU

• We are cautiously optimistic about the job model

– Anticipate CUDA, OpenCL, Larrabee support

• Easy to add additional job processing resources

– But this is new territory…

Page 33: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 33

Jobs on the (GP) GPU

– Not enough jobs to fill SIMD / SIMT lanes

– Code paths of different jobs diverge too much

– Jobs are useful as unit of work (latency tolerant &

small memory footprint)

– Data parallelism within jobs needs to be exploited

– Split jobs into many fine grained threads

– Data dependencies in input

– Convergence of output data

– Memory access of the fine grained threads is important

Page 34: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 34

Conclusions

• Virtual texturing + great artists = awesome environments

• id Tech 5 does a lot of work and has to exploit parallelism

• Cell forced us to re-factor engine into jobs

• Latency tolerant computational services model attractive

• Jobs are now running on a variety of processors

• Hopefully soon CUDA, OpenCL, Larrabee support

Page 35: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 35

Virtual Texturing

Page 36: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 36

Virtual Texturing

Page 37: id Tech 5 Challenges - vsb.czmrl.cs.vsb.cz/people/gaura/agu/05-JP_id_Tech_5_Challenges.pdf · Game Engine Situation Today •Logical GPU Architecture Stable –DX9 == nirvana for

Beyond Programmable Shading 37

Questions?

• Please complete the course evaluation at:

http://www.siggraph.org/courses_evaluation

• Chance to win a SIGGRAPH’09 mug!

One winner per course, notified by email in the evening.