Using Graphics Processors for Real-Time Global Illumination
UK GPU Computing Conference 2011
Graham Hazel
What this talk is about
Plan for the talk
• Who we are and what we do
• Why it’s a GPGPU problem
• Technical details and results
• Videos
Who we are and what we do
Geomerics is a “middleware” company
• We licence our technology to game developers
• Our first product is Enlighten, a real-time radiosity solution
Real-time radiosity pipeline
Goal: compute light bounces in real time
• Scene geometry is fixed
• Light sources can move
• Small objects can move and are lit correctly but don’t bounce light
1. Sample the direct lighting
2. Compute one bounce
3. Resample the bounce as input and repeat
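The three steps above can be sketched on the CPU as a simple iteration. This is a minimal illustration, not Enlighten's implementation: the talk does not describe how light transport between surfaces is stored, so the precomputed `form_factors` matrix, the per-patch `albedo` values and the function names are all assumptions borrowed from textbook radiosity. Fixed scene geometry is what makes precomputing such a matrix plausible.

```python
def compute_bounce(incoming, form_factors, albedo):
    """Step 2: every patch gathers the light arriving from every other patch."""
    n = len(incoming)
    return [
        albedo[i] * sum(form_factors[i][j] * incoming[j] for j in range(n))
        for i in range(n)
    ]


def solve_radiosity(direct, form_factors, albedo, bounces):
    """Accumulate direct lighting plus the requested number of bounces."""
    total = list(direct)    # step 1: the sampled direct lighting
    current = list(direct)
    for _ in range(bounces):
        # step 3: the previous bounce becomes the input of the next one
        current = compute_bounce(current, form_factors, albedo)
        total = [t + c for t, c in zip(total, current)]
    return total


# Hypothetical toy scene with three patches; only patch 0 is lit directly.
form_factors = [
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 0.25],
    [0.25, 0.25, 0.0],
]
albedo = [0.5, 0.5, 0.5]
direct = [1.0, 0.0, 0.0]

one_bounce = solve_radiosity(direct, form_factors, albedo, bounces=1)
two_bounces = solve_radiosity(direct, form_factors, albedo, bounces=2)
```

With moving lights only `direct` changes from frame to frame, so in this sketch the matrix can be reused unchanged.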
A very brief history of console hardware
...?
Game console observations
• Custom hardware is the norm
• GPUs yield more FLOPS / $
• PC games already use GPGPU
Why is this a GPGPU problem?
• Fastest platform available today
• Future-proof?
Technical details
• How CUDA and DirectX share the GPU
• Optimisations
• Performance
How to share memory resources
• DirectX device must be specified when creating CUDA context
• DirectX resources can be registered for use by CUDA
  • cu[da]GraphicsD3D9RegisterResource
  • cu[da]GraphicsD3D10RegisterResource
  • cu[da]GraphicsD3D11RegisterResource
• Graphics resources can be mapped and unmapped using
  • cu[da]GraphicsMapResources
  • cu[da]GraphicsUnmapResources
• These are analogous to the DirectX calls
  • device->Lock
  • device->Unlock
How to share computational resources
• CUDA and rendering can’t run simultaneously
• Context switch between DirectX and CUDA is expensive
Timeshare the GPU
Timeline of one frame, from frame start to frame end:
1. Render shadow maps
2. Context switch
3. Compute radiosity using CUDA
4. Context switch
5. Render scene with radiosity results
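Because each switch between DirectX and CUDA is expensive, the frame layout above keeps all the CUDA work in one contiguous block. The sketch below simply counts API transitions in a frame schedule to make that point concrete. The helper name `count_context_switches` and the interleaved schedule are hypothetical and are not taken from the talk.

```python
def count_context_switches(schedule):
    """Count API transitions in an ordered list of (task, api) pairs."""
    apis = [api for _task, api in schedule]
    return sum(1 for prev, cur in zip(apis, apis[1:]) if prev != cur)


# The frame layout described in the talk: one block of CUDA work per frame.
timeshared_frame = [
    ("render shadow maps", "DirectX"),
    ("compute radiosity", "CUDA"),
    ("render scene with radiosity results", "DirectX"),
]

# Hypothetical alternative that splits the CUDA work into two pieces.
interleaved_frame = [
    ("render shadow maps", "DirectX"),
    ("compute input lighting", "CUDA"),
    ("render part of the scene", "DirectX"),
    ("solve radiosity", "CUDA"),
    ("render scene with radiosity results", "DirectX"),
]

timeshared_switches = count_context_switches(timeshared_frame)
interleaved_switches = count_context_switches(interleaved_frame)
```

Under these assumptions the timeshared frame pays for two switches while the interleaved one pays for four.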
Advantages of running on the GPU
• Read from graphics resources directly
• Write graphics resources directly
• Zero latency
Disadvantage of running on the GPU
• Taking time away from rendering
Optimisations
• Light the samples and sum the results in the same kernel
Compute Capability Confession
• The input lighting kernel is “too big” and spills registers to global memory
• This kills performance on pre-Fermi cards
Optimisations
Radiosity solver: usual CUDA techniques
• Pack threads into blocks
• Reorder precomputed data
• Use shared memory
Compute Capability Confession #2
• Radiosity solver has to gather a large amount of lighting data from memory
• L1/L2 cache key to performance
• Pre-Fermi cards only have texture cache
Results
• Test scene: “Arches” with high output resolution (~91,000 pixels)
• CPU
  • Intel Core i7
  • Hand-optimised vector intrinsics
  • Single thread: 33 ms
  • Whole CPU (8 cores): 4.1 ms
  • Doesn’t measure CPU-GPU copies
• GPU
  • NVIDIA GTX 580
  • 2.0 ms (+ 0.5 ms CPU)
  • Measures all of the required work
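The ratios implied by these timings can be worked out directly. The snippet below is only arithmetic on the quoted figures. It takes the approximate pixel count of 91,000 at face value, and it assumes the 0.5 ms of CPU time adds serially to the 2.0 ms of GPU time, which the talk does not state.

```python
PIXELS = 91_000              # "~91,000 pixels", so treat as approximate
CPU_SINGLE_THREAD_MS = 33.0
CPU_WHOLE_MS = 4.1
GPU_MS = 2.0
GPU_CPU_OVERHEAD_MS = 0.5    # the "+ 0.5 ms CPU" quoted next to the GPU time


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def pixels_per_second(pixels, milliseconds):
    """Output pixels updated per second at the given time per solve."""
    return pixels / (milliseconds / 1000.0)


# Scaling from one CPU thread to the whole CPU.
multicore_scaling = speedup(CPU_SINGLE_THREAD_MS, CPU_WHOLE_MS)

# GPU against the whole CPU, without and with the extra CPU time
# (assumed here to add serially).
gpu_vs_cpu = speedup(CPU_WHOLE_MS, GPU_MS)
gpu_vs_cpu_with_overhead = speedup(CPU_WHOLE_MS, GPU_MS + GPU_CPU_OVERHEAD_MS)

gpu_throughput = pixels_per_second(PIXELS, GPU_MS)
```

The CPU figure excludes CPU-GPU copies while the GPU figure covers all of the required work, so these ratios likely understate the GPU's advantage in a full frame.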
Videos
• Dockyard demo
• Current research
Thank you for listening!
Questions?
www.geomerics.com