Ray Tracing and Photon Mapping on GPUs

Tim Purcell, Stanford / NVIDIA



Small Sampling of GI on GPUs

• Much more detail in the included papers

• Lots of other work on global illumination on GPUs in the literature
– The Ray Engine [Carr et al. 2002]
– GPU Algorithms for Radiosity and Subsurface Scattering [Carr et al. 2003]
– Radiosity on Graphics Hardware [Coombe et al. 2004]
– Lots and lots of shadow papers…

Radiosity

Radiosity on Graphics Hardware [Coombe et al. 2004]

Subsurface Scattering

GPU Algorithms for Radiosity and Subsurface Scattering [Carr et al. 2003]

Ray Tracing

[Figure: ray tracing diagram showing a camera, a point light, an occluder, specular and diffuse materials, and rays labeled P, R, T, and S]

Implementation Options

• GPU as a ray-triangle intersection engine [Carr et al. 2002]

– Rays and geometry streamed to the GPU
– Intersection calculation results read back
– Acceleration structure traversal done on host CPU

• GPU as a ray tracing engine [Purcell et al. 2002]

– Scene geometry and acceleration structure stored on the GPU
– GPU performs ray generation, acceleration structure traversal, intersection, and shading
– Host provides camera info

Streaming Ray Tracer

[Figure: streaming ray tracer. Kernels: Generate Eye Rays, Traverse Acceleration Structure, Intersect Triangles, Shade Hits and Generate Shading Rays. Data: Camera, Grid, Triangles, Materials.]
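The Traverse Acceleration Structure kernel walks each ray through the uniform grid one voxel at a time. A minimal CPU-side sketch in Python, assuming the incremental 3D-DDA traversal of Amanatides and Woo (a standard grid-walking technique, not necessarily the exact kernel used here), a grid anchored at the origin with cubic voxels of side `cell`, and a ray origin inside the grid. `traverse_grid` is a hypothetical name.

```python
import math

def traverse_grid(origin, direction, dims, cell):
    """Yield (i, j, k) indices of the voxels a ray passes through, in order.

    Assumes the grid spans [0, dims[a] * cell) on each axis a and that the
    ray origin lies inside the grid (3D-DDA after Amanatides and Woo).
    """
    voxel = [int(origin[a] // cell) for a in range(3)]
    step = [0, 0, 0]
    t_max = [math.inf] * 3    # ray parameter t at the next voxel boundary per axis
    t_delta = [math.inf] * 3  # t needed to cross one whole voxel per axis
    for a in range(3):
        d = direction[a]
        if d > 0:
            step[a] = 1
            t_max[a] = ((voxel[a] + 1) * cell - origin[a]) / d
            t_delta[a] = cell / d
        elif d < 0:
            step[a] = -1
            t_max[a] = (voxel[a] * cell - origin[a]) / d
            t_delta[a] = -cell / d
    while all(0 <= voxel[a] < dims[a] for a in range(3)):
        yield tuple(voxel)
        # Step along the axis whose next voxel boundary is closest.
        a = min(range(3), key=lambda axis: t_max[axis])
        if t_max[a] == math.inf:
            return  # zero direction vector: only the starting voxel is visited
        voxel[a] += step[a]
        t_max[a] += t_delta[a]

# Example: a ray along +x through a 4x4x4 grid of unit voxels visits
# (0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0).
print(list(traverse_grid((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), (4, 4, 4), 1.0)))
```

In a streaming design the per-ray traversal state (current voxel, `t_max`, `step`) would have to persist between passes, for example in textures; the Python generator hides that bookkeeping for readability.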

Techniques Used

• Data structure navigation
– Texture memory stores data structures
– Dependent texture fetches walk through data

• Flow control
– Kernel binding based on occlusion query results
– Efficient selective execution of kernels using early-z occlusion culling
– Flow-control difficulties disappearing with the newest graphics cards
  • Pixel Shader 3.0 (PS 3.0)

Texture Memory Organization

[Figure: three linked textures.
Uniform Grid (3D luminance texture): voxels vox0 … voxM, each pointing into the triangle list (values shown: 0, 3, 11, 38, …, 564).
Triangle List (1D luminance texture): triangle indices, with the entries for vox0, vox2, … marked (values shown: 0, 3, 1, 3, 7, 21, 216, …).
Triangles (3x 1D RGB textures): xyz positions of vertices v0, v1, v2 for tri0 … triN.]
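The three textures form a chain of dependent lookups: voxel, then its slice of the triangle list, then the vertex textures. A minimal Python sketch with flat lists standing in for textures. Assumptions not stated on the slide: voxels are addressed by a linear index, each voxel's entries run from its own offset up to the next voxel's offset (a CSR-style layout with one extra end entry), and `build_layout` and `triangles_in_voxel` are hypothetical names.

```python
def build_layout(voxel_tris, triangles):
    """Flatten per-voxel triangle lists into texture-like flat arrays.

    voxel_tris: one list of triangle ids per voxel (linear voxel index).
    triangles:  one (v0, v1, v2) tuple per triangle, each vertex an (x, y, z).
    """
    grid, tri_list = [], []
    for tris in voxel_tris:
        grid.append(len(tri_list))  # "uniform grid" texture: start offset
        tri_list.extend(tris)       # "triangle list" texture: triangle ids
    grid.append(len(tri_list))      # assumed end sentinel for the last voxel
    # Three vertex "textures", one per triangle corner, each storing xyz.
    v0 = [tri[0] for tri in triangles]
    v1 = [tri[1] for tri in triangles]
    v2 = [tri[2] for tri in triangles]
    return grid, tri_list, (v0, v1, v2)

def triangles_in_voxel(voxel, grid, tri_list, verts):
    """Dependent fetches: grid -> triangle list -> vertex textures."""
    v0, v1, v2 = verts
    start, end = grid[voxel], grid[voxel + 1]
    return [(t, v0[t], v1[t], v2[t]) for t in tri_list[start:end]]

tris = [((0, 0, 0), (1, 0, 0), (0, 1, 0)), ((0, 0, 1), (1, 0, 1), (0, 1, 1))]
grid, tri_list, tex = build_layout([[0, 1], [], [1]], tris)
print(grid, tri_list)                                # [0, 2, 2, 3] [0, 1, 1]
print(triangles_in_voxel(2, grid, tri_list, tex))    # triangle 1 with its vertices
```

A triangle that overlaps several voxels appears once per voxel in the triangle list but is stored only once in the vertex textures, which keeps the large per-vertex data compact.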

Efficient Selective Execution

• Rendering a giant screen-filling quad is not ideal
– Not all pixels need to process every rendering pass

• Proposed low-overhead early fragment kill
– Computation mask
– Controllable early-Z occlusion culling

• Trade computation for bandwidth
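The effect of a computation mask can be emulated on the CPU: each pixel carries a small state tag, and each pass runs its kernel only on pixels whose tag matches, instead of on every pixel of a screen-filling quad. On the hardware side, one possible arrangement (an assumption here) is to keep that tag in the depth buffer so that an early depth test rejects non-matching fragments. The state names, `run_pass`, `simulate`, and the toy kernels are hypothetical; the count returned by each pass plays the role an occlusion query would play in deciding whether more passes are needed.

```python
TRAVERSE, INTERSECT, SHADE, DONE = range(4)  # hypothetical per-pixel ray states

def run_pass(states, wanted, kernel):
    """Run `kernel` only on pixels whose state equals `wanted`.

    Skipping the other pixels is what an early fragment kill saves compared
    with running the kernel over a full-screen quad. Returns how many pixels
    executed, analogous to an occlusion query result.
    """
    active = 0
    for i, s in enumerate(states):
        if s == wanted:
            states[i] = kernel(i)
            active += 1
    return active

def simulate(steps_needed):
    """Toy ray tracer: pixel i needs steps_needed[i] traversal passes."""
    states = [TRAVERSE] * len(steps_needed)
    remaining = list(steps_needed)

    def traverse(i):
        remaining[i] -= 1
        return INTERSECT if remaining[i] <= 0 else TRAVERSE

    def intersect(i):
        return SHADE

    def shade(i):
        return DONE

    executed, iterations = 0, 0
    while True:
        iterations += 1
        count = (run_pass(states, TRAVERSE, traverse)
                 + run_pass(states, INTERSECT, intersect)
                 + run_pass(states, SHADE, shade))
        executed += count
        if count == 0:  # "occlusion query" reports no active pixels
            break
    return states, executed, iterations

print(simulate([1, 3]))
```

For `simulate([1, 3])` the masked version performs 8 kernel executions, whereas running all three kernels on both pixels for the three productive iterations would cost 18; the saving grows with the spread in per-pixel work.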

Original System Implementation

• ATI Radeon 9700 Pro (R300)

• ATI Fragment Program

Cornell Box – Ray Traced Shadows

Rendered using a Radeon 9700 Pro

Teapotahedron

Rendered using a Radeon 9700 Pro

Quake 3 – Ray Traced Shadows

Rendered using a Radeon 9700 Pro

Quake 3 – Ray Traced Shadows

Rendered using a Radeon 9700 Pro

Performance Results

• Radeon 9700 Pro
– 100M ray-triangle intersections/s
– 300K to 4.0M rays/s
– Between 3 and 12 fps at 256x256 pixels

• CPU implementation
– 20M intersections/s on an 800 MHz P3 [Wald et al. 2001]
– 800K to 7.1M rays/s on a 2.5 GHz P4 [Wald et al. 2003]

• With simple shading: 1.8M to 2.3M rays/s

Photon Mapping

Photon Mapping Algorithm Review

• Photon tracing
– Emission, scattering, and storage in a k-d tree
– Similar to ray tracing

• Rendering
– Ray tracing for direct illumination
– Photon map visualization
  • Indirect bounce

Computational Challenge for GPUs #1

• Constructing an irregular or sparse data structure

Computational Challenge for GPUs #2

• Adaptive nearest neighbor search

– Noise vs. blur
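The noise-versus-blur trade-off comes from the density estimate used to visualize the photon map. As background (this is Jensen's standard photon-map radiance estimate; the slides do not state the formula), the reflected radiance at a point $x$ is estimated from the $k$ nearest photons found within radius $r$, where $f_r$ is the BRDF, $\vec{\omega}_p$ the incoming direction of photon $p$, and $\Delta\Phi_p$ its power:

```latex
L_r(x, \vec{\omega}) \approx \frac{1}{\pi r^2} \sum_{p=1}^{k} f_r(x, \vec{\omega}_p, \vec{\omega}) \, \Delta\Phi_p(x, \vec{\omega}_p)
```

Averaging over too few photons (small $k$ or $r$) leaves the estimate noisy, while gathering from a large disc of area $\pi r^2$ smooths away real detail; an adaptive search tries to pick $r$ per query point.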


Scatter on the GPU

• Sort photons into grid cells
– Grid cell is sort key

• Two solutions
– Simulate scatter with fragment programs
  • Bitonic merge sort followed by binary search
  • Multiple rendering passes
– Vertex program with stencil buffer
  • Fixed number of photons per grid cell
  • Single rendering pass
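Both solutions can be sketched on the CPU. The first sorts (cell, photon id) keys with a bitonic merge network, whose compare-exchange pattern is fixed per stage and therefore maps onto one rendering pass per stage, then binary-searches the sorted keys for the start of each cell. The second gives every cell a fixed number of slots and drops photons that arrive after a cell is full, which mimics the effect (not the mechanics) of stencil routing. Padding to a power of two, the use of Python's `bisect` for the search, and the names `bitonic_sort`, `build_grid_by_sort`, and `build_grid_by_routing` are assumptions of this sketch.

```python
import bisect
import math

def bitonic_sort(keys):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(keys)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    size = 2
    while size <= n:
        stride = size // 2
        while stride > 0:
            # One data-parallel stage: every element compares with i ^ stride.
            for i in range(n):
                j = i ^ stride
                if j > i:
                    ascending = (i & size) == 0
                    if (ascending and a[i] > a[j]) or (not ascending and a[i] < a[j]):
                        a[i], a[j] = a[j], a[i]
            stride //= 2
        size *= 2
    return a

def build_grid_by_sort(photon_cells, num_cells):
    """Return (photon ids sorted by cell, start offset per cell plus end)."""
    n = 1
    while n < len(photon_cells):
        n *= 2
    keys = [(c, pid) for pid, c in enumerate(photon_cells)]
    keys += [(math.inf, -1)] * (n - len(keys))  # pad to a power of two
    keys = bitonic_sort(keys)[:len(photon_cells)]
    cells = [c for c, _ in keys]
    # Binary search for where each cell's run of photons begins.
    starts = [bisect.bisect_left(cells, c) for c in range(num_cells + 1)]
    return [pid for _, pid in keys], starts

def build_grid_by_routing(photon_cells, num_cells, slots):
    """Fixed `slots` entries per cell; photons beyond that are dropped."""
    table = [[None] * slots for _ in range(num_cells)]
    fill = [0] * num_cells
    for pid, c in enumerate(photon_cells):
        if fill[c] < slots:
            table[c][fill[c]] = pid
            fill[c] += 1
    return table

cells = [2, 0, 2, 1, 0, 2]  # grid cell of each photon
print(build_grid_by_sort(cells, 3))        # ([1, 4, 3, 0, 2, 5], [0, 2, 3, 6])
print(build_grid_by_routing(cells, 3, 2))  # [[1, 4], [3, None], [0, 2]]
```

The sort keeps every photon but needs O(log² n) passes; the fixed-slot table is built in one pass but loses photons in dense cells (photon 5 above), matching the single-pass versus multi-pass split on the slide.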

Adaptive Nearest Neighbor Search

• Iterative algorithm

• Accept or reject photons in cell visit order
– No priority queue!
– kNN-grid
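One way to accept or reject photons in cell visit order without a priority queue: visit grid cells in cube-shaped rings of increasing distance around the query cell, keep an unordered array of at most k accepted photons, and once the array is full let a closer photon replace the current farthest one. The search stops when no unvisited ring can hold anything closer, or at a maximum radius. This is a hedged sketch; the replacement rule, the ring visit order, and the names `build_cells`, `knn_grid_search`, `cell`, and `max_radius` are illustrative assumptions, not the kNN-grid exactly as published.

```python
import itertools
import math

def build_cells(photons, cell):
    """Bucket 3D photon positions into a dict keyed by integer cell index."""
    grid = {}
    for pid, p in enumerate(photons):
        key = tuple(int(math.floor(p[a] / cell)) for a in range(3))
        grid.setdefault(key, []).append(pid)
    return grid

def knn_grid_search(q, k, photons, grid, cell, max_radius):
    """Return ids of up to k photons nearest to q within max_radius, nearest first."""
    if k <= 0:
        return []
    r2_max = max_radius * max_radius
    center = tuple(int(math.floor(q[a] / cell)) for a in range(3))
    found = []  # unordered (dist2, pid) pairs: no priority queue
    for ring in range(int(math.ceil(max_radius / cell)) + 1):
        # Visit only the cells on the surface of the cube at Chebyshev distance `ring`.
        for off in itertools.product(range(-ring, ring + 1), repeat=3):
            if max(abs(o) for o in off) != ring:
                continue
            key = tuple(center[a] + off[a] for a in range(3))
            for pid in grid.get(key, ()):
                d2 = sum((photons[pid][a] - q[a]) ** 2 for a in range(3))
                if d2 > r2_max:
                    continue                      # reject: outside the search radius
                if len(found) < k:
                    found.append((d2, pid))       # accept while there is room
                else:
                    worst = max(range(k), key=lambda i: found[i])
                    if (d2, pid) < found[worst]:
                        found[worst] = (d2, pid)  # accept by evicting the farthest
        # Every unvisited cell is at least ring * cell away from q.
        bound = ring * cell
        if len(found) == k and max(found)[0] <= bound * bound:
            break
    return [pid for _, pid in sorted(found)]
```

Dense regions fill the k slots within the first ring or two and stop early, while sparse regions keep widening the search up to `max_radius`; that is the sense in which the effective radius adapts to photon density.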

Original System Implementation

• NVIDIA GeForce FX 5900 Ultra (NV35)

• Cg compiler 1.1

[Figure: photon mapping pipeline with stages TracePhotons, BuildPhotonMap, RayTraceScene, ComputeRadianceEstimate, Compute Lighting, and Render Image]

Glass Ball – Bitonic Sort

18s @ 512x384, 5K photons

Glass Ball – Stencil Routing

11s @ 512x384, 5K photons

Ring – Bitonic Sort

9s @ 512x384, 16K photons

Ring – Stencil Routing

8s @ 512x384, 16K photons

Cornell Box – Bitonic Sort

64s @ 512x512, 65K photons

Cornell Box – Stencil Routing

47s @ 512x512, 65K photons

Cornell Box – Increased Search Radius

Summary

• GPUs can perform global illumination calculations
– Lots of options for splitting computation between CPU and GPU

• Global illumination calculations require many techniques useful for GPGPU computations
– Data structure navigation
– Sort, search
– Data-dependent looping and branching