advanced visual effects with - home - amd · 2013-10-25 · advanced visual effects with opengl ......
Advanced Visual Effects with OpenGL
10:00-11:00 Intro & Updates: Bill Licea-Kane, ATI Research
11:00-11:15 Coffee Break
11:15-12:15 What’s Next: Bill Licea-Kane, ATI Research; Michael Gold, NVIDIA
12:15-12:30 Morning Q/A
12:30-14:00 Lunch
14:00-14:45 Performance: Evan Hart, ATI Research
14:45-15:15 Tools: Jeff Kiel, NVIDIA; Derek Cornish, NVIDIA
15:15-16:00 Tools: Yaki Tebeka, Graphic Remedy; Avi Shapira, Graphic Remedy
16:00-16:15 Coffee Break
16:15-17:00 NVIDIA OpenGL: Simon Green, NVIDIA
17:00-17:45 GPGPU: Mark Harris, NVIDIA
17:45-18:00 Closing Q/A
OpenGL Performance Tuning
Back to the Basics
Evan Hart – ATI Research
Ehart@ati.com
Performance Roadmap
• Pipeline refresher
• Finding bottlenecks
• Ride the pipeline
• Hot topics
OpenGL Graphics Pipeline
Pipeline Continued
• Enables parallelism
  • Designed to thrive in a multiprocessor system
• Complicates measurement
  • Performance limited by slowest stage
  • Must isolate stages to find the limit
Bottleneck Identification
• Remove pipeline stages
  • ‘Useless’ API functions help
• Reduce workload on suspect stages
  • Walk up or down the pipeline
• Use performance counters
  • Direct hardware insight
Bottleneck Identification
1. Vary FB. FPS varies? Yes: FB limited. No: go to step 2.
2. Vary texture size/filtering. FPS varies? Yes: Texture limited. No: go to step 3.
3. Vary resolution. FPS varies? Yes: go to step 4. No: go to step 5.
4. Vary fragment instructions. FPS varies? Yes: Fragment limited. No: Raster/setup limited.
5. Vary vertex instructions. FPS varies? Yes: Vertex limited. No: go to step 6.
6. Vary vertex size/AGP rate. FPS varies? Yes: Transfer limited. No: CPU limited.
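The flowchart can be read as a small decision procedure. The sketch below is one possible encoding in C, assuming each experiment has already been run and you recorded whether the frame rate changed. It also assumes the reading of the chart in which sensitivity to resolution splits into fragment limited versus raster/setup limited. `FpsProbe`, its fields, and `classify_bottleneck` are hypothetical names for illustration, not part of any API.

```c
#include <stdbool.h>

/* Hypothetical record of the flowchart experiments: each flag is true
 * when changing that one factor made the frame rate change. */
typedef struct {
    bool fb;              /* varied framebuffer settings */
    bool texture;         /* varied texture size/filtering */
    bool resolution;      /* varied render resolution */
    bool fragment_instr;  /* varied fragment instructions */
    bool vertex_instr;    /* varied vertex instructions */
    bool vertex_size;     /* varied vertex size / AGP rate */
} FpsProbe;

typedef enum {
    LIMIT_FB,
    LIMIT_TEXTURE,
    LIMIT_FRAGMENT,
    LIMIT_RASTER_SETUP,
    LIMIT_VERTEX,
    LIMIT_TRANSFER,
    LIMIT_CPU
} Bottleneck;

/* Walk the chart from the back of the pipeline toward the CPU. */
Bottleneck classify_bottleneck(const FpsProbe *p)
{
    if (p->fb)
        return LIMIT_FB;
    if (p->texture)
        return LIMIT_TEXTURE;
    if (p->resolution)
        return p->fragment_instr ? LIMIT_FRAGMENT : LIMIT_RASTER_SETUP;
    if (p->vertex_instr)
        return LIMIT_VERTEX;
    if (p->vertex_size)
        return LIMIT_TRANSFER;
    return LIMIT_CPU; /* nothing on the GPU side moved the frame rate */
}
```

If none of the experiments changes the frame rate, the function falls through to the CPU-limited case, which matches the last "No" branch of the chart.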
System Performance
• CPU time spent on graphics
  • Largest problem in graphics performance
  • Problem grows over the lifecycle
    • Graphics scales more easily than CPU
  • Enhancing quality can create balance
    • Shaders and anti-aliasing
System Bottlenecks
• Hardest bottleneck to find
• Performance stays fixed
  • Increased graphics power does not help
  • Reduced loads do not help
• Performance counters can help
Software Bottlenecks
System Perf. Components
• Data Transmission
  • Vertex data, textures, etc.
  • Large isolated component
  • Often easiest gains
• State management
  • Too much state
  • State thrashing
Data Transmission
• Largest single chunks
  • Big data requires big efficiency
  • Poor performance on most of the data means poor performance
• Types of submissions
  • Geometry submission
  • Image submission
Geometry Submission
• Relative performance (best to worst)
  • Display lists
  • Vertex buffer objects
  • Vertex arrays, preferably ranged
  • Immediate mode (glBegin/glEnd)
  • glArrayElement
Display Lists
• Excellent method for static geometry
• Allows the driver to correct app mistakes
  • Merge small draw calls
  • Format data types to be hardware friendly
  • Reformat primitive types
• Fairly large software penalty for compile
VBOs
• Performance equivalent to display lists
  • Application must not make mistakes
• Supports both static and dynamic data
  • Cheaper to update than display lists
• Significantly more flexible than DLs
  • More control over memory usage
• May not be efficient for small draws
Vertex Arrays
• Extremely flexible
• Reasonably efficient method of data submission
  • Few calls, lots of data
• No data caching
  • Likely to run into CPU bottlenecks
Miserable Performers
• Immediate mode evils
  • Stream arbitrarily hard to parse
  • Potentially poor cache performance
  • Each call involves function pointer indirection
    • Like a virtual function for each attribute
• Further glArrayElement evils
  • Fools you into believing it is a vertex array
Image Transfer
• Avoid in critical paths
• Utilize methods that do not require memory management (sub-image)
• Match format as closely as possible
  • Use a hardware native format
• Avoid synchronization (glReadPixels)
  • Asynchronous behavior being developed
• Utilize GPU friendly memory when available
  • Pixel Buffer Object
General Transfer Tips
• Bigger is better
  • More efficient to transfer lots of data together
• Know the native formats
  • Avoid GLdouble (processing is in floats)
  • Avoid GL(u)int (indices are ok)
  • Avoid unnecessary conversions
  • Avoid odd sizes (24-bit color)
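One way to act on the "avoid odd sizes" tip is to pad 24-bit pixels to 32 bits once, at load time, rather than leaving the conversion to the driver on every upload. The sketch below assumes tightly packed RGB input and that a 4-byte RGBA layout suits the target hardware (the preferred component order varies by GPU). `rgb_to_rgba` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand tightly packed 24-bit RGB pixels into 32-bit RGBA pixels so
 * every pixel starts on a 4-byte boundary. The added alpha byte is set
 * to 255 (opaque). The caller provides a destination buffer of at
 * least 4 * pixel_count bytes. */
void rgb_to_rgba(const uint8_t *rgb, uint8_t *rgba, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];
        rgba[4 * i + 1] = rgb[3 * i + 1];
        rgba[4 * i + 2] = rgb[3 * i + 2];
        rgba[4 * i + 3] = 255;
    }
}
```

The padded buffer costs a third more memory, which is the trade being made for aligned, conversion-free transfers.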
State Management
• Too much state
  • Try to sort for efficient state transitions
  • Use shaders instead of fixed function
• State thrashing
  • Toggling state back and forth
  • Scene-graph centric problem
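Sorting for efficient state transitions can be as simple as ordering the frame's draw list by a state key before submitting it. The sketch below is a minimal illustration, assuming each draw is described by a shader id and a texture id. `DrawItem`, `sort_by_state`, and `count_state_changes` are hypothetical names, and which state is most expensive to change (and so should sort first) depends on the driver and hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one draw call's state. */
typedef struct {
    uint32_t shader;   /* assumed most expensive to change: primary key */
    uint32_t texture;  /* secondary key */
    uint32_t mesh;     /* payload, not part of the sort key */
} DrawItem;

static int compare_by_state(const void *a, const void *b)
{
    const DrawItem *x = (const DrawItem *)a;
    const DrawItem *y = (const DrawItem *)b;
    if (x->shader != y->shader)
        return x->shader < y->shader ? -1 : 1;
    if (x->texture != y->texture)
        return x->texture < y->texture ? -1 : 1;
    return 0;
}

/* Group draws that share state so each state is bound once per group. */
void sort_by_state(DrawItem *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_by_state);
}

/* Count the shader and texture binds needed to submit the list in order. */
size_t count_state_changes(const DrawItem *items, size_t count)
{
    size_t changes = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || items[i].shader != items[i - 1].shader)
            ++changes;
        if (i == 0 || items[i].texture != items[i - 1].texture)
            ++changes;
    }
    return changes;
}
```

A scene-graph traversal that emits objects in tree order tends to interleave materials. Sorting the flattened list first is one way to avoid that without changing the graph itself.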
Other State Evils
• Context switching
  • Expensive software operation
  • Use FBOs instead of Pbuffers
• glPushAttrib / glPopAttrib
  • Hits a lot of state at once
  • Use sparingly for compatibility with 3rd party code
State Thrashing Example
glEnableClientState( … );
glVertexPointer( … );
glEnable( GL_TEXTURE_GEN* );
glMaterial( … );
glDrawElements( … );
glDisable( GL_TEXTURE_GEN* );
glDisableClientState( … );

// Next object
glEnableClientState( … );
glVertexPointer( … );
glEnable( GL_TEXTURE_GEN* );
glMaterial( … );
glDrawElements( … );
glDisable( GL_TEXTURE_GEN* );
glDisableClientState( … );
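In the example, every object enables and then disables the same state, so two consecutive objects with identical needs still cost four toggles. One common remedy, sketched here under assumptions, is a shadow copy of the state on the application side that drops redundant requests. Each object then sets the state it wants and nothing is reset afterwards. `StateCache`, `cache_set`, and the `CAP_*` values are hypothetical. In a real renderer the marked line would issue the actual glEnable or glDisable call, and the cache must start in sync with the context.

```c
#include <stdbool.h>

/* Hypothetical capability slots tracked by the cache. */
enum { CAP_TEXTURE_GEN, CAP_LIGHTING, CAP_COUNT };

typedef struct {
    bool     enabled[CAP_COUNT]; /* last value sent to the driver */
    unsigned driver_calls;       /* how many requests were not redundant */
} StateCache;

/* Request a capability state. Returns true only when the value really
 * changes, which is the only time a driver call would be needed. */
bool cache_set(StateCache *c, int cap, bool on)
{
    if (c->enabled[cap] == on)
        return false;            /* redundant: skip the driver */
    c->enabled[cap] = on;
    ++c->driver_calls;           /* real glEnable/glDisable would go here */
    return true;
}
```

With this pattern, two consecutive objects that both want texture generation enabled produce one driver call instead of four.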
Vertex Performance
• Vertex fetch performance
  • How fast does the GPU get it?
• Vertex compute performance
  • How fast does it evaluate?
• Vertex efficiency
  • Is it wasting time?
Vertex Bottlenecks
Vertex Efficiency
• Indexed primitives
  • Utilize generalized post-transform vertex cache
  • Avoids fetch and compute costs
  • Ideal for maximal mesh efficiency
• Other vertex reuse
  • Strips, fans, and loops
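To see why indexed primitives help, it can be useful to estimate how many vertices a given index order actually sends through the vertex stage. The sketch below simulates a post-transform cache as a simple FIFO. The 16-entry size and the FIFO replacement policy are assumptions for illustration, since real cache sizes and policies vary by GPU. `count_cache_misses` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 16 /* assumed size; real hardware varies */

/* Count how many vertices must be fetched and transformed when an index
 * list is drawn through a FIFO post-transform cache of CACHE_SIZE
 * entries. Every miss is one full vertex fetch plus vertex compute. */
size_t count_cache_misses(const uint16_t *indices, size_t count)
{
    uint16_t cache[CACHE_SIZE];
    size_t used = 0;   /* number of valid entries */
    size_t next = 0;   /* slot that the next miss overwrites */
    size_t misses = 0;

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            cache[next] = indices[i];        /* evicts the oldest entry */
            next = (next + 1) % CACHE_SIZE;
            if (used < CACHE_SIZE)
                ++used;
            ++misses;
        }
    }
    return misses;
}
```

For a quad drawn as two indexed triangles (0, 1, 2, 2, 1, 3) the function counts four transformed vertices, where the same quad submitted as six unindexed vertices would transform six.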
Vertex Fetch Performance
• Minimize vertex size
  • Utilize byte/ubytes/shorts/ushorts
• Interleave vertex data
  • Single vertex fits in a cache line
• Maximize locality of reference
  • Indices 0, 1, 2 are faster than 3, 8, 13
• Pay attention to natural boundaries
  • Aim for 32 or 64 byte vertices
• Use ushorts for indices
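As an illustration of these tips, the sketch below lays out one interleaved vertex in exactly 32 bytes by using shorts for the normal and unsigned bytes for the color, and uses 16-bit indices. The attribute set and the names `Vertex` and `Index` are assumptions for the example, and the layout relies on a 4-byte `float` with 4-byte alignment, which holds on common desktop platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* One interleaved vertex, sized to a natural 32-byte boundary. */
typedef struct {
    float   position[3]; /* offset  0, 12 bytes */
    int16_t normal[4];   /* offset 12,  8 bytes: xyz as shorts, w is padding */
    uint8_t color[4];    /* offset 20,  4 bytes: RGBA as unsigned bytes */
    float   texcoord[2]; /* offset 24,  8 bytes */
} Vertex;

/* Compile-time size check: the array size is negative, and compilation
 * fails, if the struct is not exactly 32 bytes on this platform. */
typedef char vertex_is_32_bytes[(sizeof(Vertex) == 32) ? 1 : -1];

/* 16-bit indices address up to 65536 vertices per draw. */
typedef uint16_t Index;
```

The same attributes stored as floats throughout would take 48 bytes (12 floats), which no longer fits the 32-byte target.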
Vertex Compute Performance
• Turn off anything you don’t need
  • Avoid the universal shader
• Try custom shortcuts
  • If it is only a 2x2 matrix, use a mat2
• Send fewer vertices
  • Efficient app level culling is always desirable
Primitive Performance
• Rare to have problems here
• Possible issues
  • Clipping
  • Interpolator overload
  • Culling
    • Frustum, not back face
Fragment Performance
• Second most common bottleneck
• Often easy to address
  • Reduce total fragments
  • Reduce per-fragment cost
  • Turn on multisampling
• Contains two subcomponents
  • Texture performance
  • ALU performance
Fragment Performance
Texture Bottleneck
• Fragment pipe is starved reading textures
• Expensive filtering
  • Anisotropic
  • Trilinear
  • Deep formats (RGBA FP32)
• Texture cache abuse
  • Improper use of mipmaps
    • Negative LOD bias
    • No mipmaps
  • ‘Noisy’ dependent texture fetches
• Textures oversized
  • Use texture compression
  • Utilize smaller formats where appropriate
  • Fill ‘unused’ components (there is no 24-bit format)
• Trade off ALU instructions
ALU Bottleneck
• Too much computation
• Switch computations to textures
  • Transcendental functions (some hardware)
  • Normalize (some hardware)
  • Only if texture is not a bottleneck
  • Becoming less effective
• Utilize dynamic flow control when applicable
• Avoid universal shaders
Reducing Fragments
• Scissor to the area of interest
  • Scissor is essentially free
• Use occlusion culling
  • Render roughly front to back
• Early depth testing
  • Avoid discard, alpha test, and alpha to coverage
• Hierarchical depth testing
  • Use GL_LESS or GL_LEQUAL
  • Use reasonable projections
  • Clear the depth buffer
• Pre-fill the depth buffer with depth only pass
Backend Performance
• Many ops do not impact performance directly
  • Alpha test
  • Fog
  • Dithering
• Typically heavily memory limited
  • Blending
  • Depth read/write
  • Multisampling
Backend Performance
Optimizing the Backend
• Utilize blending sparingly
  • Collapse multiple passes into one
• Avoid unnecessary use of higher bit depths
• Ensure that occlusion culling operations can be used
• Clear color, depth, and stencil buffers
  • Can maximize compression
  • Clear together if possible
• Avoid accumulating unnecessary junk
  • Set unused alpha to identity value (0 or 1)
• Utilize write masks
  • If you don’t need it, don’t write it
  • Not applicable to single color channels
Thanks
• ATI ISV teams & 3DArg
• NVIDIA ISV team
• John Spitzer @ NVIDIA
  • Performance analysis flowchart
Questions?