© Copyright Khronos Group, 2004 - Page 1

The challenge of migration: desktop to handheld

Phil Atkin, Product Manager 3D Graphics

August 2004

Topics

Overview
• Definitions
  - What does ‘desktop’ mean?
  - What does ‘handheld’ mean?
• Challenges
  - Management of 3D resources
  - Management of CPU resources
• Case study
  - Realities of porting a desktop 3D framework to handheld
  - Demonstrations (Intel / Intrinsyc Carbonado)
  - Performance (PowerBook vs. Carbonado)
• Conclusions

Desktop vs. handheld systems

Desktop system
• CPU + GPU + 3D API
  - Powerful - 1GHz up to >3GHz CPU with SIMD floating-point
  - Big caches
  - Minimum ‘Free3D’ chipset
  - Maximum GeForce 6800 / Radeon X800
  - OpenGL 1.5 transitioning to OpenGL 2.0

Handheld system (PowerVR 3D)
• CPU + GPU + 3D API
  - CPU ranges from 100MHz to 500+MHz
  - Small caches
  - CPU may or may not have FP capability
  - Minimum MBX Lite no VGP - 1M tris, 100M pixels
  - Maximum MBX VGP - 4M tris, 350M pixels, free AA
  - OpenGL ES 1.0 transitioning to OpenGL ES 1.1

Handheld 3D

• Delivering accelerated handheld 3D is all about power management
• All chip vendors have access to similar process technologies
  - Leads to similar power / MHz
  - Leads to similar performance / mW
• All system vendors have access to similar battery technologies
  - Leads to similar ‘talk time / game-time’ per recharge
• Some architectures have clear power/performance advantages
  - Tile-based rendering, on-die framebuffers - minimize data passing between chips
• These factors lead to a relatively narrow spectrum of capabilities
• Low-end and high-end systems only differ by 3-4x
• Admittedly PowerVR sets a high baseline, but the generalization holds

Observations

Even low-end handheld 3D accelerators will offer excellent performance
• On par with 2nd / 3rd generation desktop accelerators
• Efficient API is in place and standardized
• Hence the path from the driver to the hardware is sorted - but …

What about the path from the application to the driver?
• How to structure application code to keep hardware busy?

Despite relatively narrow spectrum of 3D capabilities
• Potential for extremely large disparity between systems
• Floating point-less CPU, rasterizer-only 3D
• Very high performance CPU / FPU, vertex-programmable 3D

How to develop or port with such a spread of computational capabilities?

The challenge

Management of 3D capabilities is not the challenge
• The usual techniques learned in the desktop space can be used
• Resolution / triangle count / texture filtering / AA quality

Management of CPU resources is the challenge
• Lowering vertex counts to GPU will inherently lower CPU load
• But the problem is far bigger in scope than just this
• The data type float is essentially unavailable at the low end

Platform CPUs have such diverse capabilities - either
• Stratify in software, code explicitly to each market stratum
• Or code in a floating-point agnostic manner

The latter is achievable and allows a single code base across platforms

Why bother porting to an FPU-less platform?

Consider the following 3 likely classes of handheld device
• Class A
  - High-performance CPU, FPU, GPU with vertex processing
• Class B
  - High-performance CPU, GPU with vertex processing
• Class C
  - CPU, rasterizer
• Classes B and C will likely be smaller die, lower cost
• Will likely ship in higher volumes
• If so -
  - will offer more revenue opportunities for software vendors
  - yet platforms do not have floating-point capability
• But a Class A device may win out
• Software vendors must cover all the bases to guarantee success

Why not just make everything fixed point?

Because your desktop platform
• Will be faster in floating-point
• Does not have fixed-point OpenGL ES entrypoints!

If you really need
• The same code base to run on desktop and handheld
• High performance on all classes of handheld systems

You need to abstract out your numeric format
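
One way to abstract the numeric format is a small 'real number' class whose representation is chosen at build time. The sketch below is illustrative only: the names `Real`, `REAL_USE_FLOAT` and `lerp` are hypothetical and are not taken from the MSG source. It defaults to 16.16 fixed point and switches to float when the macro is defined.

```cpp
#include <cstdint>

#ifndef REAL_USE_FLOAT  // hypothetical switch: pass -DREAL_USE_FLOAT on FPU targets

// 16.16 fixed point: 16 integer bits and 16 fractional bits in a 32-bit word.
class Real {
    static constexpr int32_t kOne = 65536;  // raw value that represents 1.0
public:
    Real() : raw_(0) {}
    static Real fromInt(int v)     { return Real(static_cast<int32_t>(v) * kOne); }
    static Real fromFloat(float v) { return Real(static_cast<int32_t>(v * kOne)); }
    float toFloat() const { return static_cast<float>(raw_) / kOne; }

    friend Real operator+(Real a, Real b) { return Real(a.raw_ + b.raw_); }
    friend Real operator-(Real a, Real b) { return Real(a.raw_ - b.raw_); }
    // Multiply and divide widen to 64 bits so the intermediate cannot overflow.
    friend Real operator*(Real a, Real b) {
        return Real(static_cast<int32_t>(static_cast<int64_t>(a.raw_) * b.raw_ / kOne));
    }
    friend Real operator/(Real a, Real b) {
        return Real(static_cast<int32_t>(static_cast<int64_t>(a.raw_) * kOne / b.raw_));
    }
private:
    explicit Real(int32_t raw) : raw_(raw) {}
    int32_t raw_;
};

#else

// Same interface backed by float, for platforms where floating point is fast.
class Real {
public:
    Real() : v_(0.0f) {}
    static Real fromInt(int v)     { return Real(static_cast<float>(v)); }
    static Real fromFloat(float v) { return Real(v); }
    float toFloat() const { return v_; }

    friend Real operator+(Real a, Real b) { return Real(a.v_ + b.v_); }
    friend Real operator-(Real a, Real b) { return Real(a.v_ - b.v_); }
    friend Real operator*(Real a, Real b) { return Real(a.v_ * b.v_); }
    friend Real operator/(Real a, Real b) { return Real(a.v_ / b.v_); }
private:
    explicit Real(float v) : v_(v) {}
    float v_;
};

#endif

// Application code is written once against Real and never names float or int.
inline Real lerp(Real a, Real b, Real t) { return a + (b - a) * t; }
```

Because `lerp` only uses the operators, the same source compiles in either mode. Values outside roughly ±32768 do not fit the fixed-point variant, so operand ranges still have to be checked.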

Porting desktop software

• Debugging on a handheld is no fun
• Try to get as close as possible to the handheld codebase without leaving the desktop
• ‘Portify’ 3D code
  - Encapsulate 3D code for ease of management
  - Modify 3D code so it is OpenGL ES friendly
  - Modify it so it is fixed point friendly
• ‘Portify’ application code
  - Abstraction of floating point code - ‘real number’ type
  - Careful analysis of dynamic range of operands
  - Some operations simply must be performed in floating-point
  - Determine what they are, isolate them, do everything else fixed-point

And port to the handheld platform

This bit is easy if the last bit went well ...
• Take cross-compiler
• Turn on all the #ifdefs you prepared earlier
• Type ‘make’
• Or under Embedded Visual C++ hit F7

Case study - the Mobile Scene Graph

Framework for 3D applications
• Initial implementation - desktop
  - Interactive landscape, architecture and garden design review
  - Straightforward design
    - Classic app + cull + draw, frustum culling
    - C++, STL, polymorphic, RTTI
  - Target platform PowerBook G3 500MHz / OpenGL / glut
• Transitioned into
  - Desktop - interactive landscape, architecture and garden design review
  - Handheld - experimental testbed for OpenGL ES rendering
  - Target platforms
    - PowerBook G3 500MHz / OpenGL 1.4 / glut
    - Intel / Intrinsyc Carbonado / OpenGL ES 1.0 / egl
• Great opportunity to take on a port
  - Aiming for 100% application source code compatibility
  - Aiming to deliver highest possible performance on desktop and handheld

MSG Implementation details

• ‘MSGReal’
  - Build-time switchable float or OpenGL ES 16.16 fixed point
  - C++ operators provide +-*/ and common type conversions
  - Functions provide trig, sqrt / recipsqrt
  - All expensive operations implemented by piecewise quadratics
• Additional 4.12 ‘MSGShortFix’ type
  - Intermediate product fits into 32 bits, no double-length maths
  - Superbright unclamped colour accumulation
  - Reflection-mapping via quadratic approximation without overflow
• Only 2 internal functions use floating-point
  - Plane fitter for frustum construction
  - Matrix inverter
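
The 4.12 point can be illustrated with a short sketch. It assumes a signed 16-bit storage type with 12 fractional bits. The names `ShortFix`, `shortFixMul` and `litColour` are hypothetical, and the colour routine is only one reading of 'superbright unclamped colour accumulation', not the MSG implementation.

```cpp
#include <cstdint>

// Hypothetical 4.12 type: signed 16-bit storage, 12 fractional bits, range [-8, 8).
typedef int16_t ShortFix;
const int32_t kShortFixOne = 4096;  // raw value that represents 1.0

inline ShortFix shortFixFromFloat(float v) {
    return static_cast<ShortFix>(v * kShortFixOne);
}

inline float shortFixToFloat(ShortFix v) {
    return static_cast<float>(v) / kShortFixOne;
}

// A 16-bit by 16-bit product has magnitude at most 2^30, so it always fits in
// int32_t and no 64-bit ("double-length") arithmetic is needed.
// The caller must keep the true result inside [-8, 8).
inline ShortFix shortFixMul(ShortFix a, ShortFix b) {
    int32_t product = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<ShortFix>(product / kShortFixOne);
}

// Sums the contribution of several lights on one colour channel without
// clamping to 1.0; the integer bits leave headroom for sums up to just under 8.
inline ShortFix litColour(ShortFix material, const ShortFix* lights, int count) {
    int32_t sum = 0;
    for (int i = 0; i < count; ++i) {
        sum += shortFixMul(material, lights[i]);
    }
    return static_cast<ShortFix>(sum);
}
```

With a material value of 0.75 and three lights of intensity 1.0, 0.5 and 0.5, the accumulated channel is 1.5, which a format clamped to [0, 1] could not hold until the final clamp.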

Porting realities - timescales

Approximately 3 man-months of portification
• Difficult to measure accurately
• Coding was in progress as portification began

Approximately 20,000 lines of code
• Only 800 lines can see <gl/gl.h>
• Just 8 #ifdefs in this module
• i.e. the portification process is manageable

2 evening porting sessions
• Just 6 hours at the desk from ‘move code onto PC’ to ‘run on handheld’
• … and one evening should have been enough

Then performance tuning
• Anticipated >30Hz was only 15-20Hz
• Now tuned up to >30Hz with no change in geometric load

Porting realities - gotchas

Handheld specific
• Performance not linear with clock for a variety of reasons
  - e.g. caching behaviour, driver behaviour, architectural
• Limited container class and template support
• Some C++ operations will hurt more than you expect
  - Very slow RTTI
  - STL list operations sort(), push_back(), pop_front() proved surprisingly expensive

3D gotchas
• Unanticipated differences in behaviour
  - E.g. multiple strips from single pointer setup - multiple TnL on Carbonado
  - Would benefit from glMultiDrawElements
• Short tristrip performance
  - Would benefit from glMultiDrawElements!!
• Best performance - glDrawElements(GL_TRIANGLES)
• Fixed-point to integer conversion in OpenGL ES interface
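
The slides do not say how the strips were batched. One common approach, sketched here as an assumption, is to unroll many short strips into a single indexed triangle list so the whole mesh goes out in one draw call. The function names are hypothetical, and 16-bit indices are assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the triangles of one strip to an indexed triangle list.
// Odd-numbered triangles swap their first two indices so that every
// triangle keeps the winding order of the first one.
void appendStripAsTriangles(const std::vector<uint16_t>& strip,
                            std::vector<uint16_t>& triangles) {
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        uint16_t a = strip[i];
        uint16_t b = strip[i + 1];
        uint16_t c = strip[i + 2];
        if (a == b || b == c || a == c) {
            continue;  // degenerate triangle, nothing to draw
        }
        if (i % 2 == 0) {
            triangles.push_back(a);
            triangles.push_back(b);
        } else {
            triangles.push_back(b);
            triangles.push_back(a);
        }
        triangles.push_back(c);
    }
}

// Merges any number of strips that share one vertex array into one index list.
std::vector<uint16_t> mergeStrips(const std::vector<std::vector<uint16_t>>& strips) {
    std::vector<uint16_t> triangles;
    for (std::size_t s = 0; s < strips.size(); ++s) {
        appendStripAsTriangles(strips[s], triangles);
    }
    return triangles;
}
```

The merged list can then be submitted with a single `glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, indices)` call. The index buffer is larger than the strips it replaces, which is the price of the single call.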

Demonstrations

MSGRefMap - arithmetic performance test
• Single object, reflection mapped
  - Cull time virtually zero
  - Virtually all cycles spent in reflection-map code
  - This is fixed-point on all platforms
  - 16-bit skybox textures

MSGHurricane - frustum-culling test
• 2048 objects in hierarchical terrain
  - unlit, 8-bit luminance texture
• 7 animated aircraft
  - lit with 2 lights
  - 16-bit aircraft texture
  - 16-bit skybox textures

Performance

MSGRefMap
• PowerBook floating point
  - OpenGL renderer - 116 Hz
  - NULL renderer - 1360 Hz
• PowerBook fixed point
  - NULL renderer - 1620 Hz
• Carbonado fixed point
  - OpenGL ES renderer - 35.9 Hz
  - NULL renderer - 668.4 Hz
• Carbonado floating point
  - NULL renderer - 101.2 Hz

MSGHurricane
• PowerBook floating point
  - OpenGL renderer - 122 Hz
  - NULL renderer - 1890 Hz
• PowerBook fixed point
  - NULL renderer - 960 Hz
• Carbonado fixed point
  - OpenGL ES renderer - 34.6 Hz
  - NULL renderer - 271.5 Hz
• Carbonado floating point
  - NULL renderer - 46.25 Hz

• Fixed-point code averages 6x faster than FP emulation
  - Despite data structure traversal and other non-arithmetic code
  - Despite fixed point reflection-mapping code in floating point version
  - This is a fast CPU, yet it is too slow in FP emulation running MSGHurricane
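
The 6x figure is consistent with the Carbonado NULL renderer rates listed above, assuming it is the mean of the two fixed-point to floating-point ratios (the slides do not state how the average was formed):

```latex
\begin{align*}
\text{MSGRefMap:}    \quad & \frac{668.4\ \text{Hz}}{101.2\ \text{Hz}} \approx 6.60 \\
\text{MSGHurricane:} \quad & \frac{271.5\ \text{Hz}}{46.25\ \text{Hz}} \approx 5.87 \\
\text{Mean:}         \quad & \frac{6.60 + 5.87}{2} \approx 6.2
\end{align*}
```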

Last word on performance

The missing case -
• Floating point application code
• Fixed point framework / middleware
• Estimated by isolating application cycles on Carbonado
  - Time spent in application = 11% of frame time (NULL renderer)
• MSGHurricane
  - Fixed point frame time = 0.0037 sec
  - Floating point frame time = 0.021 sec
  - Mixed-mode frame = (89% * 0.0037) + (11% * 0.021) = 0.011 sec
  - Estimated 88Hz mixed-mode rate
• Within 33 ms budget
• But scale processor back to 150MHz and it becomes too slow again
• And this is just a demo - just splines, no physics, no gameplay
• Floating-point emulation is just too slow for even the simplest case

Conclusions

• The software migration process can be relatively painless
• Source code should be ‘portified’ - i.e. made
  - 3D API agnostic
    - Isolate and encapsulate your 3D API interactions
    - Structure desktop code to be OpenGL ES friendly
  - Floating point agnostic
    - Abstract out your real number format
    - At minimum in middleware layer
    - Ideally allow fixed-point from application down to hardware
• You can do all this from the safety of your workstation
  - No handheld platform debugging until project is mature
  - MSG ported to Carbonado in 2 evenings with just printf
• And if you get it right
  - It will just port and just work - but may require some tuning
  - Performance will be high across platforms
  - Resulting software will be highly portable and reusable