GPU-Assisted Path Tracing
![Page 1: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/1.jpg)
GPU-Assisted Path Tracing
Matthias Boindl
Christian Machacek
Institute of Computer Graphics and Algorithms
Vienna University of Technology
![Page 2: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/2.jpg)
Motivation: Why Path Tracing?
Physically based
  Nature provides the reference image
Parallelizable
Sublinear in the number of objects
Conceptually simple
  Can lead to a clean implementation
But: a fast implementation on GPUs is not trivial
![Page 3: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/3.jpg)
Outline
Path tracing intro
  Main steps of the algorithm
Mapping the algorithm to the GPU
  How to organize code into kernels
  When to launch kernels
  How to pass data between kernels
Acceleration structures
  Focus on bounding volume hierarchies
![Page 4: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/4.jpg)
Path Tracing Intro
Like ray tracing, except it…
  …supports arbitrary BRDFs
  …is stochastic: at each bounce, the new direction is decided randomly
Convergence video
From Pharr, Humphreys: PBRT, 2nd ed. (2010)
![Page 5: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/5.jpg)
Path Tracing Pseudocode

    while image not converged
        r = new ray from eye through next pixel
        do
            i = closest intersection of r with scene
            if no i: break
            if i is on a light source: c = c + throughput * emission
            randomly pick new direction and create reflected ray r
            evaluate BRDF at i
            update throughput
        while path throughput high enough

From Pharr, Humphreys: PBRT, 2nd ed. (2010)
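The loop in the pseudocode can be tried out on a toy scene. The following is a minimal sketch, not the presenters' implementation: it assumes a grayscale scene that is the inside of a unit sphere whose wall both emits light and reflects diffusely, a fixed number of paths instead of a convergence test, and cosine-weighted sampling so that the throughput update reduces to a multiplication by the albedo. All function names are hypothetical.

```python
import math
import random


def intersect_enclosure(origin, direction):
    """Closest hit of a ray with the unit sphere, seen from inside.

    Returns (hit_point, inward_normal), or None when there is no hit.
    Only meant for origins inside or on the sphere (toy scene assumption).
    """
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    # Clamp tiny negative values caused by rounding for origins on the sphere.
    disc = max(0.0, b * b - c)
    t = -b + math.sqrt(disc)
    if t < 0.0:
        return None
    p = [o + t * d for o, d in zip(origin, direction)]
    length = math.sqrt(sum(x * x for x in p))
    p = [x / length for x in p]  # snap the hit point back onto the sphere
    return p, [-x for x in p]


def sample_cosine_hemisphere(normal, rng):
    """Cosine-weighted random direction around a unit normal."""
    while True:
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - z * z)
        # Uniform point on the unit sphere, offset by the normal.
        d = [normal[0] + s * math.cos(phi),
             normal[1] + s * math.sin(phi),
             normal[2] + z]
        length = math.sqrt(sum(x * x for x in d))
        if length > 1e-6:
            return [x / length for x in d]


def trace_path(origin, direction, albedo, emission, min_throughput, rng):
    """One path: the inner do/while loop of the pseudocode."""
    c = 0.0
    throughput = 1.0
    while True:
        hit = intersect_enclosure(origin, direction)
        if hit is None:
            break
        point, normal = hit
        # Every wall point is a light source in this toy scene.
        c += throughput * emission
        # Randomly pick a new direction and create the reflected ray.
        direction = sample_cosine_hemisphere(normal, rng)
        origin = point
        # Diffuse BRDF with cosine-weighted sampling: BRDF * cos / pdf = albedo.
        throughput *= albedo
        if throughput < min_throughput:
            break
    return c


def estimate_pixel(num_paths, albedo, emission, rng):
    """Outer loop: average a fixed number of paths (assumption: no convergence test)."""
    total = 0.0
    for _ in range(num_paths):
        # Stand-in for a camera ray: a random direction from the sphere center.
        direction = sample_cosine_hemisphere([0.0, 0.0, 1.0], rng)
        total += trace_path([0.0, 0.0, 0.0], direction, albedo, emission, 1e-6, rng)
    return total / num_paths
```

In this scene every path gathers emission * (1 + albedo + albedo^2 + …), so the estimate should approach emission / (1 - albedo), which makes the sketch easy to sanity-check.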
![Page 6: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/6.jpg)
Path Tracing Pseudocode
Execution time per part of the pseudocode on the previous slide:
  Ray cast: 56%
  Materials: 25%
  Logic: 15%
  New path: 4%
From Pharr, Humphreys: PBRT, 2nd ed. (2010)
![Page 7: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/7.jpg)
Megakernel Execution Divergence
From Bikker (2013)
![Page 8: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/8.jpg)
Solution: Wavefront Path Tracing
Separate, specialized kernels
Keep a pool of ~1 million paths alive
Work for next stage goes into kernel-specific, compact queues (= 4 MB index arrays)
https://mediatech.aalto.fi/~samuli/
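The queue idea can be illustrated on the CPU. The sketch below is a sequential stand-in, not GPU code: it assumes 32-bit path indices (which is what makes a queue for about 1 million paths come out at 4 MB), a per-path material id written by an earlier stage, and two hypothetical material kernels that only scale a grayscale throughput.

```python
POOL_SIZE = 1 << 20   # about 1 million paths kept alive
INDEX_BYTES = 4       # assumption: one 32-bit index per queue entry


def queue_size_bytes(pool_size=POOL_SIZE, index_bytes=INDEX_BYTES):
    """Memory for one queue that can hold every path in the pool."""
    return pool_size * index_bytes


def logic_stage(hit_material, num_materials):
    """Sort path indices into one compact queue per material kernel."""
    queues = [[] for _ in range(num_materials)]
    for path_index, material in enumerate(hit_material):
        queues[material].append(path_index)
    return queues


def run_material_stage(queues, kernels, throughput):
    """Run each specialized kernel only on the paths in its own queue."""
    for queue, kernel in zip(queues, kernels):
        for path_index in queue:
            kernel(throughput, path_index)


# Hypothetical material kernels, each handling a single material type.
def diffuse_kernel(throughput, i):
    throughput[i] *= 0.5


def mirror_kernel(throughput, i):
    throughput[i] *= 0.9
```

Because every kernel only sees paths that need it, all threads of a launch would follow the same code path, which is the point of splitting the megakernel.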
![Page 9: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/9.jpg)
Results
Performance
Execution times (ms per 1M path segments)
![Page 10: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/10.jpg)
Limitations and Possible Improvements
Higher memory requirements (+200 MB)
Kernel launch overhead
  Dynamic parallelism on GK110
  Use an outer scheduling kernel
  No CPU round trip
Launch independent stages side-by-side
  CUDA streams
  So kernels with little work don’t hog the GPU
![Page 11: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/11.jpg)
Acceleration Structures
Find nearest intersection in O(log N)
Space partitioning vs. object partitioning
Hybrid methods exist
![Page 12: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/12.jpg)
Performance
For interactive rendering, compromise between
  Traversal performance (build quality)
  Construction/update time
  Update or rebuild from scratch
Adapt to GPU environment
  Memory architecture
  Parallel execution
![Page 13: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/13.jpg)
State of the Art
Tero Karras and Timo Aila. 2013. Fast parallel construction of high-quality bounding volume hierarchies. In Proceedings of the 5th High-Performance Graphics Conference (HPG '13). ACM, New York, NY, USA, 89-99.
![Page 14: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/14.jpg)
Close the Performance Gap
![Page 15: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/15.jpg)
Basic Idea
Fast construction of simple BVH
  Generate leaf for each triangle
Reduce SAH cost by modifying tree
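To make "SAH cost" concrete, here is a small sketch that evaluates the surface area heuristic for a binary BVH. The `Node` layout is hypothetical, and the cost constants (1.2 per internal node, 1.0 per triangle) are assumptions chosen for illustration. The cost sums each node's surface area relative to the root, weighted by the node cost for internal nodes and by the triangle count for leaves.

```python
from dataclasses import dataclass
from typing import Optional


def surface_area(box):
    """Surface area of an axis-aligned box given as (min_corner, max_corner)."""
    lo, hi = box
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a, b):
    """Smallest box enclosing boxes a and b."""
    return (tuple(min(a[0][i], b[0][i]) for i in range(3)),
            tuple(max(a[1][i], b[1][i]) for i in range(3)))


@dataclass(eq=False)
class Node:
    box: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    num_triangles: int = 0


def leaf(box, num_triangles=1):
    return Node(box, num_triangles=num_triangles)


def inner(left, right):
    return Node(union(left.box, right.box), left, right)


def sah_cost(root, c_inner=1.2, c_tri=1.0):
    """SAH cost of the tree, with areas taken relative to the root."""
    root_area = surface_area(root.box)

    def visit(node):
        rel = surface_area(node.box) / root_area
        if node.left is None:
            return c_tri * rel * node.num_triangles
        return c_inner * rel + visit(node.left) + visit(node.right)

    return visit(root)


def unit_cube_at(x):
    return ((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0))


# Four leaves along the x axis: two close pairs that are far apart.
a, b, c, d = (leaf(unit_cube_at(x)) for x in (0.0, 1.0, 10.0, 11.0))
good_tree = inner(inner(a, b), inner(c, d))   # groups neighbours
bad_tree = inner(inner(a, c), inner(b, d))    # groups distant boxes
```

Both trees contain the same leaves, but `sah_cost(good_tree)` is lower because its internal nodes have much smaller boxes. Modifying the tree to reduce this number is what the optimization step aims for.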
![Page 16: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/16.jpg)
Treelets
Allow local tree modification
A, B, C, and F are leaves; D, E, and G are internal nodes
![Page 17: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/17.jpg)
Treelet Construction
Find root: parallel bottom-up traversal
  Start with leaves
  Use an atomic counter at each internal node, where two upward paths meet
  Ensures all children have been processed
Build treelet
  Add both children of the root
  Repeatedly expand the treelet leaf with the highest surface area
  Fixed size: 7 leaf nodes
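The "build treelet" step can be sketched sequentially as follows. This is a simplified illustration that assumes a hypothetical `Node` class with a bounding box and two optional children. It leaves out the parallel bottom-up traversal with atomic counters and only shows how a treelet of at most 7 leaves grows from a given root.

```python
from dataclasses import dataclass
from typing import Optional


def surface_area(box):
    lo, hi = box
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a, b):
    return (tuple(min(a[0][i], b[0][i]) for i in range(3)),
            tuple(max(a[1][i], b[1][i]) for i in range(3)))


@dataclass(eq=False)  # identity comparison, so list.remove targets one node
class Node:
    box: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(boxes):
    """Simple BVH for the example: split the box list in the middle."""
    if len(boxes) == 1:
        return Node(boxes[0])
    mid = len(boxes) // 2
    left, right = build(boxes[:mid]), build(boxes[mid:])
    return Node(union(left.box, right.box), left, right)


def form_treelet(root, num_leaves=7):
    """Return (internal_nodes, treelet_leaves) of the treelet rooted at `root`.

    `root` must be an internal node of the BVH.
    """
    internal = [root]
    leaves = [root.left, root.right]          # start with both children
    while len(leaves) < num_leaves:
        # Only treelet leaves that are internal BVH nodes can be expanded.
        candidates = [n for n in leaves if n.left is not None]
        if not candidates:
            break
        largest = max(candidates, key=lambda n: surface_area(n.box))
        leaves.remove(largest)
        internal.append(largest)
        leaves += [largest.left, largest.right]
    return internal, leaves


# Eight unit cubes in a row give a complete binary tree with 8 leaves.
bvh = build([((float(x), 0.0, 0.0), (x + 1.0, 1.0, 1.0)) for x in range(8)])
internal_nodes, treelet_leaves = form_treelet(bvh)
```

A treelet with n leaves always has n - 1 internal nodes, so a 7-leaf treelet touches 6 internal nodes that the later rearrangement step is free to reconnect.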
![Page 18: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/18.jpg)
Rearrange Treelet
Minimize the SAH cost of the subtree under the treelet root
  Naive implementation: test each permutation
  Better: dynamic programming
    Caching of best intermediate results
    Start with leaves, then pairs, then triplets, …
  Suboptimal subtree construction avoided
  Parallelizable as well
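The dynamic programming idea can be sketched with bitmasks over the treelet leaves. The code below is a simplified, sequential illustration with an assumed cost model: each internal node costs its surface area and treelet leaves cost nothing. A real implementation would carry the SAH cost of each leaf's subtree. For every subset of leaves, the best way to split it into two smaller subsets is cached, starting with single leaves and moving on to pairs, triplets, and so on.

```python
import math


def surface_area(box):
    lo, hi = box
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(a, b):
    return (tuple(min(a[0][i], b[0][i]) for i in range(3)),
            tuple(max(a[1][i], b[1][i]) for i in range(3)))


def optimal_treelet(leaf_boxes):
    """Return (best_cost, split), where split[s] is the best left subset of s."""
    n = len(leaf_boxes)
    full = (1 << n) - 1

    # Surface area of the union box of every subset of leaves.
    area = [0.0] * (full + 1)
    for s in range(1, full + 1):
        box = None
        for i in range(n):
            if (s >> i) & 1:
                box = leaf_boxes[i] if box is None else union(box, leaf_boxes[i])
        area[s] = surface_area(box)

    cost = [0.0] * (full + 1)   # single leaves keep cost 0 (assumption)
    split = [0] * (full + 1)
    # Process subsets by size: leaves, then pairs, then triplets, ...
    for s in sorted(range(1, full + 1), key=lambda m: bin(m).count("1")):
        if (s & (s - 1)) == 0:
            continue  # single leaf, nothing to split
        best, best_p = math.inf, 0
        p = (s - 1) & s
        while p:                       # all non-empty proper subsets of s
            candidate = cost[p] + cost[s ^ p]
            if candidate < best:
                best, best_p = candidate, p
            p = (p - 1) & s
        cost[s] = area[s] + best       # this node plus its best two subtrees
        split[s] = best_p
    return cost[full], split


# Four unit cubes along x: two close pairs that are far apart.
boxes = [((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0)) for x in (0.0, 1.0, 10.0, 11.0)]
best_cost, best_split = optimal_treelet(boxes)
```

For 7 leaves there are only 127 subsets, so the table stays tiny, and all subsets of the same size are independent of each other, which is what makes the step parallelizable.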
![Page 19: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/19.jpg)
Results
Gap closed
![Page 20: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/20.jpg)
Results
Speed/Quality tradeoff
![Page 21: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/21.jpg)
Conclusion
Use specialized kernels
  Lower execution divergence
  (Better use of instruction cache)
  (Fewer registers used simultaneously)
Construct acceleration structures quickly
  But not too quickly
![Page 22: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/22.jpg)
Thanks for your attention!
Institute of Computer Graphics and Algorithms
Vienna University of Technology
![Page 23: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/23.jpg)
Results
Speed/Quality tradeoff
![Page 24: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/24.jpg)
Logic Kernel
Does not need a queue, operates on all paths
If shadow ray was unblocked, add light contribution
Find material or light source the ray hits
  Place path into proper material queue
Russian roulette
If path terminated, accumulate to image
  Place path into new path queue
Sample light sources (aka next event estimation)
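The Russian roulette step can be sketched in a few lines. This is a minimal illustration with assumptions: a scalar (grayscale) throughput, a survival probability equal to the throughput clamped to a hypothetical maximum of 0.95, and `None` as the signal that the path was terminated. Dividing the surviving throughput by the survival probability keeps the estimate unbiased.

```python
import random


def russian_roulette(throughput, rng, max_survival=0.95):
    """Randomly terminate a path; return the new throughput or None."""
    survival = min(max_survival, throughput)
    if survival <= 0.0 or rng.random() >= survival:
        return None                 # path terminated
    return throughput / survival    # compensate for the terminated paths


def mean_after_roulette(throughput, trials, rng):
    """Average throughput over many trials, counting terminated paths as 0."""
    total = 0.0
    for _ in range(trials):
        survived = russian_roulette(throughput, rng)
        if survived is not None:
            total += survived
    return total / trials
```

Averaged over many trials, `mean_after_roulette(0.3, ...)` stays close to 0.3, even though most individual paths are cut off.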
![Page 25: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/25.jpg)
New Path Kernel
Generate a new image-space sample
Generate camera ray
  Place it into extension ray cast queue
Initialize path state
  Throughput
  Pixel position
  etc.
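A CPU-side sketch of these steps might look as follows. It assumes a pinhole camera at the origin looking down the negative z axis, a jittered sample inside the pixel, and a hypothetical `PathState` record. The field names and the queue representation (a plain list of path indices) are illustrative only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PathState:
    pixel: tuple          # pixel position the path contributes to
    origin: tuple
    direction: tuple
    throughput: float = 1.0
    radiance: float = 0.0
    depth: int = 0


def new_path(path_index, pixel, width, height, fov_deg, rng, extension_queue):
    """Start a fresh path for `pixel` and queue its camera ray."""
    px, py = pixel
    # New image-space sample: a random position inside the pixel, in [0, 1).
    sx = (px + rng.random()) / width
    sy = (py + rng.random()) / height
    # Camera ray through that sample (pinhole camera, looking along -z).
    half_height = math.tan(math.radians(fov_deg) / 2.0)
    half_width = half_height * width / height
    x = (2.0 * sx - 1.0) * half_width
    y = (1.0 - 2.0 * sy) * half_height
    length = math.sqrt(x * x + y * y + 1.0)
    direction = (x / length, y / length, -1.0 / length)
    # Hand the path to the extension ray cast stage.
    extension_queue.append(path_index)
    return PathState(pixel=pixel, origin=(0.0, 0.0, 0.0), direction=direction)
```

Calling `new_path(7, (320, 240), 640, 480, 60.0, random.Random(0), queue)` would return a state with throughput 1 and append index 7 to `queue`.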
![Page 26: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/26.jpg)
Material Kernels
Generate incoming direction
Evaluate light contribution based on light sample generated in the logic kernel
  We haven’t cast the shadow ray yet!
For MIS: p(light sample) from the BSDF
Discard BSDF stack
Queue
  extension ray
  (shadow ray)
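The MIS remark can be made concrete with the balance heuristic. The sketch below is an illustration under assumptions: scalar radiance, the balance heuristic as the weighting scheme (the slides do not say which heuristic is used), and hypothetical parameter names. The material kernel needs the BSDF's pdf for the light sample direction to compute the weight, and the result only counts later if the shadow ray turns out to be unblocked.

```python
def balance_heuristic(pdf_this, pdf_other):
    """MIS weight for a sample drawn with pdf_this when pdf_other could also produce it."""
    total = pdf_this + pdf_other
    return pdf_this / total if total > 0.0 else 0.0


def light_sample_contribution(throughput, bsdf_value, cos_theta,
                              emitted, pdf_light, pdf_bsdf):
    """Pending contribution of a light sample, before the shadow ray is cast.

    pdf_light: pdf with which the light sample was generated.
    pdf_bsdf:  pdf with which BSDF sampling would pick the same direction.
    """
    if pdf_light <= 0.0:
        return 0.0
    weight = balance_heuristic(pdf_light, pdf_bsdf)
    return throughput * weight * bsdf_value * cos_theta * emitted / pdf_light
```

The two weights for the same direction, `balance_heuristic(p_light, p_bsdf)` and `balance_heuristic(p_bsdf, p_light)`, add up to 1, so light sampling and BSDF sampling together do not count any light twice.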
![Page 27: GPU-Assisted Path Tracing](https://reader030.vdocuments.us/reader030/viewer/2022032612/56813366550346895d9a7c71/html5/thumbnails/27.jpg)
Ray Cast Kernels
Extension rays
  Find first intersection against scene geometry
  Store hit data into path state
Shadow rays
  Blocked or not?
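The difference between the two ray types can be shown with a brute-force sketch. It assumes a scene made of spheres and loops over all of them instead of traversing a BVH, and the function names are hypothetical. An extension ray needs the closest hit, while a shadow ray can stop at the first blocker it finds.

```python
import math


def ray_sphere(origin, direction, center, radius, t_min=1e-4):
    """Smallest hit distance t > t_min along a unit-length direction, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > t_min:
            return t
    return None


def closest_hit(origin, direction, spheres):
    """Extension ray: return (sphere_index, t) of the first intersection, or None."""
    best_t, best_index = math.inf, None
    for index, (center, radius) in enumerate(spheres):
        t = ray_sphere(origin, direction, center, radius)
        if t is not None and t < best_t:
            best_t, best_index = t, index
    return None if best_index is None else (best_index, best_t)


def occluded(origin, direction, spheres, max_t):
    """Shadow ray: is anything in the way before distance max_t?"""
    for center, radius in spheres:
        t = ray_sphere(origin, direction, center, radius)
        if t is not None and t < max_t:
            return True   # any blocker is enough, no need to find the closest
    return False


scene = [((0.0, 0.0, 5.0), 1.0), ((0.0, 0.0, 10.0), 1.0)]
hit = closest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
```

The early exit in `occluded` is why shadow rays are usually cheaper than extension rays, and why it can make sense to handle them separately.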