droidcon2013 triangles gangolells_imagination

April 2013

It’s all about triangles! Understanding the GPU in your pocket to write better code

Introductions

Guillem Vinals Gangolells (guillem.vinalsgangolells@imgtec.com)

Developer Technology Engineer, PowerVR Graphics


Company overview

Leading silicon, software & cloud IP supplier

Multimedia: graphics; GPU compute; video; vision

Communications: demodulation; connectivity; sensors

Processors: applications CPUs; embedded MCUs

Cloud: device and user management; services

Targeting high volume, high growth markets

Top semis and OEMs for mobile, connected home consumer, automotive and more

Pure: our strategic product division

Digital radio, internet connected audio, home automation

Established technology powerhouse

Founded 1985; London FTSE 250 (IMG.L); ~1,500 employees

UK HQ; global operations

Comprehensive IP portfolio for SoCs & cloud connectivity

IP business pathfinder

Market maker/driver

A Crash Course in Graphics Architectures

Immediate Mode Renderer (IMR)

Buffers kept in system memory

High bandwidth use, power consumption & latency

Each triangle is processed to completion in submission order

Wastes processing time and thus power due to “overdraw”

‘Early-Z’ techniques help but are only as good as your geometry sorting
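The dependence of Early-Z on geometry sorting can be illustrated with a toy model. This is a minimal sketch, not a model of any real GPU: it assumes every layer is opaque and covers the whole of a small 4x4 screen, and the function name `shade_count` is made up for the example.

```python
def shade_count(layers, width=4, height=4):
    """Count fragment-shader runs for full-screen opaque layers.

    `layers` lists one depth per layer in submission order (smaller = closer).
    A fragment is shaded only if it passes the early depth test.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for z in layers:  # triangles are processed in submission order
        for y in range(height):
            for x in range(width):
                if z < depth[y][x]:  # early-Z passes, so the fragment is shaded
                    depth[y][x] = z
                    shaded += 1
    return shaded


print(shade_count([1, 2, 3]))  # front to back: 16
print(shade_count([3, 2, 1]))  # back to front: 48
```

With front-to-back submission only the 16 visible fragments are shaded. With back-to-front submission every layer passes the depth test, so all 48 fragments are shaded and two thirds of that work is overdraw.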

Concept: Tiling

Frame buffer sub-divided into Tiles

32x32 pixels per tile, for example

Varies by device

Geometry is sorted into affected tiles

Allows each tile to be processed independently

Small number of fragments per tile

Allows on-chip memory to be used
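Sorting geometry into tiles can be sketched as follows. This is a simplified illustration, not the hardware's algorithm: it assumes 32x32 pixel tiles, integer pixel coordinates, and a conservative bounding-box test, and the names `bin_triangle` and `build_tile_lists` are made up for the example.

```python
TILE = 32  # tile edge in pixels; 32x32 is only an example, it varies by device


def bin_triangle(tri, fb_width, fb_height, tile=TILE):
    """Return the set of (tile_x, tile_y) touched by a triangle's bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    # Clamp the bounding box to the frame buffer.
    min_x, max_x = max(min(xs), 0), min(max(xs), fb_width - 1)
    min_y, max_y = max(min(ys), 0), min(max(ys), fb_height - 1)
    if min_x > max_x or min_y > max_y:
        return set()  # entirely off screen
    return {
        (tx, ty)
        for ty in range(min_y // tile, max_y // tile + 1)
        for tx in range(min_x // tile, max_x // tile + 1)
    }


def build_tile_lists(triangles, fb_width, fb_height):
    """Map each tile to the indices of the triangles that may affect it."""
    tile_lists = {}
    for index, tri in enumerate(triangles):
        for t in bin_triangle(tri, fb_width, fb_height):
            tile_lists.setdefault(t, []).append(index)
    return tile_lists


triangles = [
    ((0, 0), (10, 0), (0, 10)),      # fits inside tile (0, 0)
    ((20, 20), (40, 20), (20, 40)),  # straddles four tiles
]
tile_lists = build_tile_lists(triangles, 64, 64)
print(tile_lists[(0, 0)])  # [0, 1]
print(tile_lists[(1, 1)])  # [1]
```

Each tile ends up with its own short list of triangles, which is what lets it be processed independently of the others.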

Tile Based Renderer (TBR)

Rasterizing performed per-tile

Allows the use of fast, on-chip, buffers

Each triangle is processed to completion in submission order

Wastes processing time and thus power due to “overdraw”

‘Early-Z’ techniques help but are only as good as your geometry sorting

Concept: Deferred Rendering

Fragments - Two stage process

Hidden Surface Removal (HSR)

Shading

HSR is pixel perfect

Only visible fragments pass, no ‘overdraw’

Only requires position data

Less bandwidth & processing, saves power

HSR is submission order independent

No need for applications to submit geometry front to back
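The two-stage idea can be sketched with a toy model. It assumes opaque layers that each cover the whole of a small 4x4 screen, and the function name `deferred_shade_count` is made up for the example; it illustrates the principle only, not how the hardware implements HSR.

```python
def deferred_shade_count(layers, width=4, height=4):
    """Resolve visibility from depth alone, then shade only what survives.

    `layers` lists one depth per opaque full-screen layer (smaller = closer).
    """
    depth = [[float("inf")] * width for _ in range(height)]
    visible = [[None] * width for _ in range(height)]
    # Stage 1 (HSR): uses position data only, nothing is shaded here.
    for layer_id, z in enumerate(layers):
        for y in range(height):
            for x in range(width):
                if z < depth[y][x]:
                    depth[y][x] = z
                    visible[y][x] = layer_id
    # Stage 2 (shading): exactly one fragment per covered pixel.
    return sum(1 for row in visible for layer_id in row if layer_id is not None)


print(deferred_shade_count([1, 2, 3]))  # front to back: 16
print(deferred_shade_count([3, 2, 1]))  # back to front: 16
```

The shaded fragment count is the same for either submission order, which is the point of HSR being submission order independent.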

Tile Based Deferred Renderer (TBDR) = PowerVR

Rasterizing performed per-tile

Allows the use of fast, on-chip, buffers

Hidden Surface Removal (HSR) reduces overdraw

Pixel perfect, and submission order independent, no geometry sorting needed

Optimised to only retrieve information required (*), saving even more bandwidth

Saves power and bandwidth

PowerVR Hardware Overview

Pipeline Summary: Geometry Processing

Pipeline Summary: Fragment Processing

Bandwidth Saving

Bandwidth usage is the biggest contributor to GPU power consumption

Saving bandwidth means staying ‘on chip’ as much as possible

It also means throwing away work you don’t need to do

PowerVR is designed from the ground up to do all of these

Unified Architecture

Pixel Back End (PBE)

Combines sub-samples for on-chip MSAA

MSAA Performed per-tile

Done using sub-sampling

Negligible impact on bandwidth

Each sub-sample benefits from HSR

Series5/5XT: 4x MSAA

Series6: 8x MSAA

Performs final format conversions

Up scaling, down scaling etc. (Internal True Colour)

Further Considerations

Micro Kernel

Specialised software running on the USSE (Series5) or its own core (Series6)

Allows the GPU and CPU to operate with minimal synchronisation

Improves performance by handling interrupts on the GPU

Competing solutions handle interrupts on CPU (in the driver)

Multicore

Near linear performance scaling

Small fixed overhead known at design time

Geometry processing load-balanced

Cores share the processing effort

Tiling enables parallel fragment processing

Any core can work on any tile when available

Each tile is self-contained

Multi-core logic is handled by the hardware

Completely transparent to the developer

Alpha Blending

Tiling GPUs don’t need to reach into system memory to perform an alpha blend

The colour buffer is on-chip

This means that alpha blending doesn’t cost you any additional bandwidth

It also means that alpha blending is fast…very fast

HSR will also save you some work by throwing away occluded blending work

Remember: Opaque, Alpha Test, Alpha Blend

Golden Rules

Common Bottlenecks (based on past observation)

Most Likely

CPU Usage

Bandwidth Usage

CPU/GPU Synchronisation

Fragment Shader Instructions

Geometry Upload

Texture Upload

Vertex Shader Instructions

Geometry Complexity

Least Likely

Warning!

Some of these rules may seem obvious to you…

…we still see them broken every day…

…if you know them, please bear with us

Golden Rule 1

Understand Your Target Device

No two devices are identical

Even when they look the same

Different SoCs will have different bottlenecks

Make sure you test against different chips

Make sure you understand the hardware

You don’t want your optimisation to make things worse

Clearly, you’re already doing this… you’re here

Golden Rule 2

Don’t Waste GPU Time

The Principle of “Good Enough”

Don’t waste polygons on unneeded detail

Textures should never be much larger than their size on screen

Why waste time loading a 1Kx1K texture if it’s never going to appear bigger than 128x128?

If the user won’t notice it, don’t waste time processing it
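One way to apply “Good Enough” to texture sizes is to pick the smallest texture that still covers the largest size it will reach on screen. The sketch below assumes square power-of-two textures, and `good_enough_size` is a hypothetical helper name, not part of any SDK.

```python
def good_enough_size(max_on_screen_pixels):
    """Smallest power-of-two edge length that covers the on-screen size."""
    size = 1
    while size < max_on_screen_pixels:
        size *= 2
    return size


print(good_enough_size(128))  # 128
print(good_enough_size(200))  # 256

# Texel count of a 1Kx1K texture compared with the 128x128 that is actually needed.
print((1024 * 1024) // (128 * 128))  # 64
```

For the 1Kx1K versus 128x128 case, the smaller texture has 64 times fewer texels to load and sample.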

Golden Rule 3

Promote Calculations up the Chain

Don’t do a calculation you don’t need to do

If you can do it once per scene, do it once per scene

If you can’t, try to do it per vertex

There are generally fewer vertices in a scene than fragments.

If you can, pre-bake

E.g. lighting

Remember, ‘Good Enough’
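A common instance of promoting a calculation is combining transformation matrices once per object on the CPU instead of applying them one after another for every vertex. The sketch below uses plain Python lists in place of shader code. The matrices, vertex data, and helper names are made up for the example, and integer values are used so the two results compare exactly.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


model = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]            # translation
view_projection = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]  # uniform scale
vertices = [[0, 0, 0, 1], [1, 1, 1, 1]]

# Unpromoted: two matrix-vector products for every vertex.
per_vertex = [transform(view_projection, transform(model, v)) for v in vertices]

# Promoted: one matrix-matrix product per object, then one product per vertex.
mvp = mat_mul(view_projection, model)
promoted = [transform(mvp, v) for v in vertices]

print(promoted)                # [[2, 4, 6, 1], [4, 6, 8, 1]]
print(per_vertex == promoted)  # True
```

The output is identical, but the per-vertex cost is halved, and the saving grows with the number of vertices.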

Golden Rule 4

Don’t Access an Active Render Target

Accessing a render target from the CPU is very bad for performance

If it’s not done properly it will synchronise the GPU and CPU… This is Bad™

Golden Rule 4 Cont.

Accessing Render Targets Safely

Use EGL_KHR_fence_sync

Use CPU side handles to GPU mapped memory to avoid blocking calls

E.g. GraphicBuffer (or gralloc) on Android

Golden Rule 5

Avoid Updating Active Assets

Assets may need to stay the same for multiple frames

We refer to this as an asset’s ‘Lifespan’

Changing a texture during its lifespan may cause ‘Ghosting’

Changing a buffer during its lifespan is blocking

This can be managed using circular buffers, similarly to render targets
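The circular buffer idea can be sketched as a ring of buffers used in round-robin order, so the buffer being written this frame is never the one submitted last frame. This is a minimal illustration only: `BufferRing` is a made-up name, `bytearray` stands in for a GPU buffer or texture, and the ring size of three is an assumption. A real ring must be at least as long as the asset's lifespan in frames.

```python
class BufferRing:
    """Hand out buffers in round-robin order."""

    def __init__(self, count=3, size=16):
        self.buffers = [bytearray(size) for _ in range(count)]
        self.next_index = 0

    def acquire(self):
        """Return (index, buffer) for this frame and advance the ring."""
        index = self.next_index
        self.next_index = (self.next_index + 1) % len(self.buffers)
        return index, self.buffers[index]


ring = BufferRing(count=3)
used = []
for frame in range(5):
    index, buf = ring.acquire()
    buf[0] = frame  # stand-in for uploading this frame's dynamic data
    used.append(index)

print(used)  # [0, 1, 2, 0, 1]
```

Buffer 0 is not written again until frame 3, by which point frames 1 and 2 have been submitted using the other two buffers.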

Golden Rule 6

Use VBOs and Indexed Geometry

VBOs benefit from driver level optimisations

Vertex Array Objects (VAOs) may be even better

Index your geometry

It makes your data smaller

It also benefits from driver level optimisations

Use static VBOs ideally, and consider the asset’s lifespan

Don’t use a VBO for dynamic data
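Indexing geometry means storing each distinct vertex once and referring to it by position. The sketch below converts a flat triangle list into a vertex list plus an index list. The function name `index_geometry` and the 2D quad data are made up for the example.

```python
def index_geometry(triangle_vertices):
    """Deduplicate a flat vertex list into (unique_vertices, indices)."""
    unique = []
    lookup = {}
    indices = []
    for v in triangle_vertices:
        if v not in lookup:  # first time this vertex is seen
            lookup[v] = len(unique)
            unique.append(v)
        indices.append(lookup[v])
    return unique, indices


# A quad drawn as two triangles: six vertices, two of them repeated.
quad = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1)]
vertices, indices = index_geometry(quad)

print(vertices)  # [(0, 0), (1, 0), (1, 1), (0, 1)]
print(indices)   # [0, 1, 2, 0, 2, 3]
```

Six vertices become four, plus six small indices. Real vertices also carry normals, texture coordinates and so on, so the saving per shared vertex is larger than in this example.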

Golden Rule 7

Batch Your Draw Calls

Group static objects, and draw once

Static objects are objects that are static relative to each other

Sort objects by render state

Emphasis on texture and program state changes

Try using texture atlases

Remember Golden Rule 5 if you’re going to update the contents
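Sorting by render state can be sketched by counting how often the program or texture changes across a list of draws. The draw records and names below are hypothetical. A real renderer would also have to respect ordering constraints, such as those for blended geometry, which this sketch ignores.

```python
def count_state_changes(draws):
    """Count program/texture switches when draws are issued in the given order."""
    changes = 0
    current = None
    for d in draws:
        state = (d["program"], d["texture"])
        if state != current:
            changes += 1
            current = state
    return changes


draws = [
    {"name": "crate_a", "program": "lit", "texture": "wood"},
    {"name": "sign", "program": "unlit", "texture": "atlas"},
    {"name": "crate_b", "program": "lit", "texture": "wood"},
    {"name": "label", "program": "unlit", "texture": "atlas"},
]

sorted_draws = sorted(draws, key=lambda d: (d["program"], d["texture"]))

print(count_state_changes(draws))         # 4
print(count_state_changes(sorted_draws))  # 2
```

After sorting, draws that share a program and texture sit next to each other, which also makes them candidates for merging into a single draw call.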

Golden Rule 8

Compress Your Textures

The lower the bitrate the less bandwidth consumed

Use PVRTC & PVRTC2, at 2 & 4bpp RGB/RGBA

Don’t confuse this with PNG or JPG which are decompressed in memory

Usually to 24bpp or 32bpp

PVRTC is read directly from the compressed form

It stays in memory at 2bpp or 4bpp

Use MIP-Mapping and remember ‘Good Enough’
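The effect of the bitrate on memory footprint follows directly from the bits per pixel. The sketch below compares the top MIP level of a 1024x1024 texture at the bitrates mentioned above. The texture size is an assumption, `level_bytes` is a made-up helper, and MIP levels and minimum block sizes are ignored.

```python
def level_bytes(width, height, bits_per_pixel):
    """Bytes needed for one texture level at the given bitrate."""
    return width * height * bits_per_pixel // 8


formats = {
    "32bpp (decompressed PNG/JPG)": 32,
    "PVRTC 4bpp": 4,
    "PVRTC 2bpp": 2,
}
sizes = {name: level_bytes(1024, 1024, bpp) for name, bpp in formats.items()}

for name, size in sizes.items():
    print(name, size)
# 32bpp (decompressed PNG/JPG) 4194304
# PVRTC 4bpp 524288
# PVRTC 2bpp 262144
```

The same image takes 8 times less memory at 4bpp and 16 times less at 2bpp, and every texture fetch moves correspondingly less data.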

Golden Rule 9

Alpha Test/Discard & Alpha Blend

Alpha Test removes advantages of ‘Early-Z’ techniques and HSR

Fragment visibility isn’t known until fragment shader is run

Prefer Alpha Blending, and render in the order Opaque, Alpha Test, Alpha Blend

Makes best use of HSR
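The Opaque, Alpha Test, Alpha Blend order can be sketched as a stable sort over the draw list. The object names and the `kind` labels are hypothetical. A stable sort is used so that objects keep their existing relative order within each group.

```python
ORDER = {"opaque": 0, "alpha_test": 1, "alpha_blend": 2}


def submission_order(draws):
    """Sort draws into opaque, then alpha tested, then alpha blended."""
    # sorted() is stable, so the order inside each group is preserved.
    return sorted(draws, key=lambda d: ORDER[d[1]])


draws = [
    ("window", "alpha_blend"),
    ("wall", "opaque"),
    ("fence", "alpha_test"),
    ("floor", "opaque"),
    ("smoke", "alpha_blend"),
]

print([name for name, kind in submission_order(draws)])
# ['wall', 'floor', 'fence', 'window', 'smoke']
```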

Golden Rule 10

Use ‘Clear’ and ‘DiscardFrameBuffer’

Calling ‘Clear’ ensures the previous render isn’t uploaded to the GPU

By default, the depth/stencil buffers are written to memory at the end of a render

Calling glDiscardFramebufferEXT(…) ensures these buffers aren’t written to system memory

Look for the ‘GL_EXT_discard_framebuffer’ extension

Do both if you can!

Questions?

Or drop us an email: devtech@imgtec.com

Download our PowerVR SDK: bit.ly/PVR_SDK

Also, you can download examples, tools and shell as an Android SDK add-on:

http://install.powervrinsider.com/androidsdk.xml
