Adding RTX acceleration to Iray with OptiX 7

Lutz Kettner, Director Advanced Rendering and Materials

July 30th, SIGGRAPH 2019



What is Iray?

Bring ray-tracing-based production / simulation-quality rendering to GPUs

New paradigm: push-button rendering (opens up new markets)

Production Rendering on CUDA

In production for more than 10 years

Plugins for 3ds Max, Maya, Rhino, SketchUp, …


SIMULATION QUALITY


iray legacy

ARTISTIC FREEDOM


How Does it Work?

To guarantee simulation quality and push-button rendering:

• Limit shortcuts and "good enough" hacks to a minimum

• Brute force (spectral) simulation

no intermediate filtering

scale over multiple GPUs and hosts even in interactive use


19 VCA × 8 Q6000 GPUs, GTC 2014


• Two-way path tracing from camera and (opt.) lights


• Use NVIDIA Material Definition Language (MDL)


• NVIDIA AI Denoiser to clean up remaining noise

99% physically based Path Tracing


Wavefront Architecture

From Megakernel to State Machine

Megakernel:

• Follows each path to completion

• One path at a time

• Single CUDA (mega-)kernel

State machine (wavefront):

• Small progress on each path per step

• Millions of active paths at a time

• Multiple smaller CUDA kernels (states), each specialized on a part of the simulation

• Global memory (AoSoA layout) to communicate between states
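The state-machine idea above can be illustrated with a minimal CPU-side sketch. It is not Iray's implementation: the two states, the hit probability, the bounce limit and all names (`Path`, `trace_kernel`, `shade_kernel`, `render_wavefront`) are hypothetical, and a plain Python list stands in for the AoSoA path state in global memory. Each outer step groups the active paths by state and runs one small "kernel" per state, so every path makes a little progress per step.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical states; the real state machine has many more, scene dependent.
TRACE, SHADE, DONE = "trace", "shade", "done"

@dataclass
class Path:
    state: str = TRACE
    bounces: int = 0
    radiance: float = 0.0

def trace_kernel(batch, rng):
    """Stand-in for ray tracing: a path either hits something or leaves the scene."""
    for p in batch:
        p.state = SHADE if rng.random() < 0.8 else DONE

def shade_kernel(batch, rng, max_bounces=4):
    """Stand-in for material evaluation; rng is unused, kept for a uniform signature."""
    for p in batch:
        p.radiance += 0.5 ** p.bounces
        p.bounces += 1
        p.state = TRACE if p.bounces < max_bounces else DONE

KERNELS = {TRACE: trace_kernel, SHADE: shade_kernel}

def render_wavefront(num_paths, seed=0):
    rng = random.Random(seed)
    paths = [Path() for _ in range(num_paths)]  # path state shared by all kernels
    launches = 0
    while True:
        # Group the active paths by state so each kernel sees a coherent batch.
        queues = defaultdict(list)
        for p in paths:
            if p.state != DONE:
                queues[p.state].append(p)
        if not queues:
            break
        for state, batch in queues.items():
            KERNELS[state](batch, rng)  # one small "kernel launch" per state
            launches += 1
    return paths, launches
```

Calling `render_wavefront(1000)` finishes all paths in a handful of launches, independent of the number of paths. In this toy version all paths advance in lockstep; with path regeneration the per-state queues would be mixed.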


Wavefront Architecture

23 specialized CUDA kernels (scene dependent)

• Ray tracing

to complete a path between camera and light

and connecting to lights on the way (next event estimation, NEE)


• Geometry / textured-light and environment importance sampling


~400,000 emissive triangles


• Material evaluation / importance sampling

• …

Iray State Machine
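To give a feel for geometry-light importance sampling with hundreds of thousands of emissive triangles, here is a minimal sketch of one common technique: picking a triangle with probability proportional to its emitted power via prefix sums and a binary search. The slides do not say that Iray does it this way, so treat this as a swapped-in textbook method. The per-triangle power list and the function names are hypothetical.

```python
import bisect
import itertools

def build_light_cdf(powers):
    """Prefix sums over per-triangle emitted power (hypothetical input)."""
    cdf = list(itertools.accumulate(powers))
    if not cdf or cdf[-1] <= 0.0:
        raise ValueError("need at least one light with positive power")
    return cdf

def sample_light(cdf, u):
    """Pick a triangle index with probability proportional to its power.

    u is a uniform random number in [0, 1).
    Returns (index, selection probability); the probability is needed
    to weight the light's contribution in next event estimation.
    """
    total = cdf[-1]
    i = bisect.bisect_right(cdf, u * total)
    i = min(i, len(cdf) - 1)  # guard against rounding at the upper end
    prev = cdf[i - 1] if i > 0 else 0.0
    return i, (cdf[i] - prev) / total
```

A lookup costs O(log n), so even several hundred thousand emissive triangles need only around 20 comparisons per sample. Triangles with zero power are never selected.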


Wavefront Architecture

Tail-megakernel to finish up the last handful of paths

State machine within a single kernel to reduce kernel launches

Iray State Machine

Techreport: The Iray Light Transport Simulation and Rendering System

https://arxiv.org/pdf/1705.01263.pdf


Adding RTX Support

Dec 2018: Start with RTX prototype

Feb 2019: Start using WIP OptiX 7 implementation

May 2019: Shipping!

From OptiX Prime to OptiX 7

First Iray RTX image


Introducing OptiX 7

Diagram labels: NVIDIA Driver; Microsoft DXR, NVIDIA VKRay, NVIDIA OptiX 1-6, NVIDIA OptiX 7; Multi GPU, NVLINK Memory Scaling, API Capture, Memory Management, Hierarchy builders, Schedulers, RTX Programming model, Language interface, Sustainable APIs


Iray on OptiX 7

All kernel variants that need to trace rays are now executed through OptiX 7

Path-/Light-Tracer main trace kernels

incl. subsurface scattering (SSS) code and shortcuts for state machine early outs


Path-/Light-Tracer shadow trace kernels

incl. a few shortcuts for state machine early outs


Rounded Corners


Light-Tracer lens connection


All other kernels stay on plain CUDA implementations / kernel launches

Wavefront Architecture


Iray on OptiX 7

Split up the Tail-megakernel into 2 new kernels

Trace rays + the remainder of the state machine

Majority of code in __raygen__

One single optixTrace() call, no branching, for best performance

(except for the tail-trace and rounded-corners kernels)

__closesthit__ directly fills the wavefront state, no payload communication

Compile time / pipeline setup: 7-10 s (0.1-0.2 s with a warm cache)

~21k lines of PTX

Wavefront Architecture


Iray on OptiX 7

2-level hierarchy to get full RT core performance

optionally: reduce instancing overhead by (partially) flattening instances


Use compaction

a slightly longer build time is not much of an issue for us

memory savings can be dramatic

No native OptiX 7 motion blur, in order to keep full RT core performance

as the sample rate per pixel is high and hierarchy updates are cheap,

brute-force sample transformations/materials and rebuild the scene every X iterations

Refitting of bottom level hierarchies for vertex deformed geometry

RTX Hierarchy Setup
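The brute-force motion blur described above can be sketched as follows. This is a simplified illustration, not Iray's code: it assumes translation-only motion, a uniformly sampled shutter time, and a list append as a stand-in for the hierarchy rebuild. The names (`lerp`, `render_with_sampled_motion`, `rebuild_every`) are hypothetical.

```python
import random

def lerp(a, b, t):
    """Linear interpolation between two 3D positions."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def render_with_sampled_motion(instances, iterations, rebuild_every, seed=0):
    """instances: list of (position at shutter open, position at shutter close).

    Every `rebuild_every` iterations one shutter time is sampled, all instance
    transforms are evaluated at that time, and the scene is rebuilt as a static
    snapshot. Returns the list of (iteration, time, positions) rebuilds.
    """
    rng = random.Random(seed)
    rebuilds = []
    positions = None
    for it in range(iterations):
        if it % rebuild_every == 0:
            t = rng.random()  # one shutter time per rebuild
            positions = [lerp(p0, p1, t) for p0, p1 in instances]
            rebuilds.append((it, t, positions))  # stand-in for a hierarchy rebuild
        # ... trace this iteration's samples against the static snapshot `positions`
    return rebuilds
```

Because many iterations are accumulated per pixel, averaging over the static snapshots converges towards a blurred image while every individual trace runs against a non-motion hierarchy. A smaller `rebuild_every` trades more rebuilds for smoother blur.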


General Issues

4 Ray Tracing implementations at work now (before: OptiX Prime)

Embree (CPU)

OptiX Prime (pre-Turing / need to support all CUDA 10 GPUs)

OptiX 7 / RT Cores (Turing)

OptiX 7 / RTX Software Traversal (Turing with no RT Cores)

Slightly different behavior in special cases (e.g. self-intersection) and in hierarchy construction / data implementation details

Precision / Performance / Memory Usage


General Issues

Triangle/Node intersection watertightness has some interesting implications

Origins very far away with directions pointing to the scene will intersect almost the whole scene, causing massive slowdowns

Iray: infinite ground plane / shadow catcher generates this frequently

Workaround: manually push all ray origins closer to the scene BBox

Performance
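One way to push a distant ray origin towards the scene bounding box is a standard ray/box slab test, sketched below. The slides only state the workaround, not how it is implemented, so the function name, the `margin` parameter and the handling of missed or interior rays are assumptions. The direction is assumed to be normalized so that the offset is a distance.

```python
import math

def push_origin_to_bbox(origin, direction, bbox_min, bbox_max, margin=1e-3):
    """Move a distant ray origin forward along the ray to just before the scene box.

    Returns (new_origin, t_offset); t_offset has to be added back to any hit
    distance reported for the moved ray. Origins inside the box, and rays that
    miss the box, are returned unchanged with an offset of 0.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, bbox_min, bbox_max):
        if d == 0.0:
            if o < lo or o > hi:
                return origin, 0.0  # parallel to this slab and outside it: miss
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    if t_near > t_far:
        return origin, 0.0  # ray misses the box
    t_offset = max(t_near - margin, 0.0)
    new_origin = tuple(o + d * t_offset for o, d in zip(origin, direction))
    return new_origin, t_offset
```

For example, an origin a million units above a unit-sized scene ends up about `margin` outside the top face of the box. A renderer could also skip tracing entirely for rays that miss the box; the sketch leaves that decision to the caller.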


Performance: 3.0x overall rendering speed-up


Performance: 2.8x overall rendering speed-up


Performance: 1.5x overall rendering speed-up

~101k instances flattened to ~50M triangles

66 s down to 61 s (4k, RTX 6000)


Performance: 1.05x overall rendering speed-up

Customers see full range from “no” to 5x overall rendering speed-up


Ray Tracing no Bottleneck Anymore

One still has to care about overall efficient rendering: not just tracing a ray as fast as possible, but generating valuable rays / samples

➢ Sample and eval large, layered material and texture node graphs

➢ Sample and eval a large number of geometric light sources

i.e. many instructions and a lot of memory accessed per traced ray

Otherwise many, many more rays/paths are needed to get a similar noise level

Need to balance generation time vs. sample quality vs. evaluation time vs. trace time

When Using RT Cores
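Why the overall speed-up stays well below the raw ray tracing speed-up can be illustrated with Amdahl's law: only the share of the frame time spent tracing rays gets faster. The sketch below is a back-of-the-envelope model, not a measurement. The trace-only speed-up passed in is a hypothetical value, since the slides report only overall speed-ups.

```python
def overall_speedup(trace_fraction, trace_speedup):
    """Amdahl's law: only the ray tracing share of the frame time is accelerated."""
    if not 0.0 <= trace_fraction <= 1.0 or trace_speedup <= 0.0:
        raise ValueError("trace_fraction must be in [0, 1], trace_speedup > 0")
    return 1.0 / ((1.0 - trace_fraction) + trace_fraction / trace_speedup)

def trace_fraction_for(overall, trace_speedup):
    """Inverse: which trace share would explain an observed overall speed-up?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / trace_speedup)
```

Under an assumed 10x trace-only speed-up, `trace_fraction_for(3.0, 10.0)` is about 0.74 and `trace_fraction_for(1.05, 10.0)` is about 0.05. In this model a scene that spent roughly three quarters of its time tracing gains 3x, while a scene dominated by material and light sampling barely benefits.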


Performance

Batch scheduling (e.g. long-running renderings / cloud) is efficient with the current Iray OptiX 7 implementation: it triggers almost no Tail-megakernel (paths are regenerated on the fly)

Interactive scheduling suffers from the split of the Tail-megakernel: kernel launch overhead is too high

Too much time spent in light importance sampling: traversal of geometry light & environment light hierarchies

Notes


Going Forward

Optimize / rethink importance sampling and material / texture evaluation pipelines to shift the work-per-sample ratio towards ray tracing again

Reduce material / texture complexity dynamically (LOD via MDL distiller)

Adaptive sampling

Over time: better scheduling performance / less overhead by basing the complete core on OptiX 7

RTX Specific Roadmap


Questions?

Acknowledgments

Carsten Wächter

Daniel Seibert

Enzo Catalano

Matthias Raab

More Information

Techreport: The Iray Light Transport Simulation and Rendering System

https://arxiv.org/pdf/1705.01263.pdf