Transcript
Page 1: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

EECS 487: Interactive Computer Graphics

Lecture 21: Overview of Low-level Graphics API

Metal, Direct3D 12, Vulkan

Console Games

Why do games look and perform so much better on consoles than on PCs with equivalent specs?

•  consoles are closed platforms with long shelf lives, so programmers can program the hardware directly
•  doing away with the serialized call-preparation bottleneck allows for better utilization of multiple CPU cores

Motivations for low-level graphics APIs:

•  faster graphics from reduced API overhead
•  “close-to-metal,” direct control of the GPU

[Anandtech: Smith]

Low-Overhead, Low-Level API

Whence the high overhead of graphics APIs?

•  hardware abstractions hide underlying platform diversity, providing programming convenience and flexibility: graphics vs. system programming
•  “newbie-friendly” safety nets of error checking and state validation

Code gurus (the ones writing game engines and renderers) would rather have performance than hand-holding

[Anandtech: Smith]

Low-Overhead, Low-Level API

Why is this an issue now?

•  GPU performance is far outstripping CPU performance due to the massively parallel nature of graphics rendering: API overhead at the CPU is throttling GPU performance
•  serialized command assembly prior to issuing draw calls restricts utilization of multi-core CPUs
•  instancing and batching objects into a smaller number of draw calls can only help so much

Another advantage: easier porting of console games to PCs?

[Anandtech: Smith]


How to Improve Performance?

1. Command buffer
   a. reduced draw call overhead
   b. better command submission multi-threading
2. Baked-in states
   a. pipeline state objects
   b. resource binding
3. Pre-compiled shaders

Biggest Source of CPU Overhead

Assembly of the command stream prior to issuing a draw call, e.g., the gathering together of

•  line mode
•  polygon mode
•  flat or smooth shading
•  texture objects to use
•  which vertex array objects
•  which vertex buffer objects
•  setting vertex attribute pointers
•  arguments to draw calls

Done by the driver

[Anandtech: Smith]

Single-threaded Job Assembly

[Figure: CPU threads hand commands (cmd) to the driver, which assembles them into a command buffer and queue feeding the GPU front-end; utilizes at most 2 cores]

Problem: single-threaded job assembly by the CPU is often not fast enough to keep the GPU busy

[nvidia: Foley]

Command Buffer/List

Developers self-assemble the command stream into a command buffer (Vulkan) or command list (D3D12)

Each command buffer is self-contained, so multiple buffers can be assembled in parallel, each on its own thread/core, without extra concurrency work

Final submission of the command buffers via the command queue is still serial, but is highly efficient

[Microsoft: Sandy]

Metal                D3D12                  Vulkan
MTLCommandBuffer()   ID3D12CommandList()    VkCmdBuffer()
MTLCommandQueue()    ID3D12CommandQueue()   VkCmdQueue()
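The record-in-parallel, submit-serially pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: `CommandBuffer`, `CommandQueue`, and `record` are hypothetical stand-ins, not calls from Metal, D3D12, or Vulkan, and Python threads show the structure of the work rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

class CommandBuffer:
    """Hypothetical stand-in for a command buffer/list: a private list of commands."""
    def __init__(self, name):
        self.name = name
        self.commands = []

    def draw(self, mesh):
        self.commands.append(("draw", mesh))

class CommandQueue:
    """Hypothetical queue: submission is serial, in the order given."""
    def __init__(self):
        self.submitted = []

    def submit(self, buffers):
        for buf in buffers:
            self.submitted.extend(buf.commands)

def record(chunk_id, meshes):
    # Each worker fills its own buffer, so no locks are needed while recording.
    buf = CommandBuffer(f"cb{chunk_id}")
    for mesh in meshes:
        buf.draw(mesh)
    return buf

meshes = [f"mesh{i}" for i in range(8)]
chunks = [meshes[i:i + 2] for i in range(0, len(meshes), 2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in input order, so submission order is deterministic.
    buffers = list(pool.map(record, range(len(chunks)), chunks))

queue = CommandQueue()
queue.submit(buffers)  # the only serial step
```

Because every buffer is private to the thread that records it, the only shared object is the queue, and it is touched once, after all recording has finished.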


Multi-Threaded Job Assembly

[Figure: several CPU threads each assemble their own commands (cmd) in parallel; the assembled commands are placed on the queue feeding the GPU front-end]

[nvidia: Foley]

Command Buffer Re-Use

In Vulkan a command buffer can be re-used

•  a “top-level” command buffer can “call” second-level command buffers

In D3D12 a command list “recorded” as a bundle can be submitted once to the GPU but executed multiple times, with different resources, e.g., different textures (much like OpenGL’s retained mode display list)

Metal currently doesn’t support command buffer re-use

[nvidia: Foley; Microsoft: Sandy]
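A toy model may help show why re-use with different resources is cheap: the commands are recorded once against named slots, and only the slot contents change between executions. The `Bundle` class and `execute` function below are hypothetical illustrations, not the D3D12 bundle API or Vulkan secondary command buffers.

```python
class Bundle:
    """Hypothetical recorded command list that refers to resources by slot name."""
    def __init__(self):
        self.commands = []

    def record(self, op, slot):
        self.commands.append((op, slot))

def execute(bundle, bindings):
    """Replay the recorded commands, resolving each slot against the current bindings."""
    return [(op, bindings[slot]) for op, slot in bundle.commands]

# Record once.
bundle = Bundle()
bundle.record("bind_texture", "albedo")
bundle.record("draw", "mesh")

# Execute twice with different textures; nothing is re-recorded.
frame = []
frame += execute(bundle, {"albedo": "brick.png", "mesh": "wall"})
frame += execute(bundle, {"albedo": "moss.png", "mesh": "wall"})
```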

Direct3D 11 vs. Direct3D 12

3DMark – Multi-thread Scaling and 50% Better CPU Utilization

[Figure: CPU time per thread under Direct3D 11 and Direct3D 12; legend includes App Logic]

•  User Mode driver workload distributed across multiple threads/cores
•  single-threaded bottleneck reduced
•  Windows kernel time reduced
•  Direct3D driver time reduced
•  Kernel Mode driver time totally removed

[Microsoft: Sandy; Anandtech: Smith]

Imagination’s Gnome Horde

No instancing

Re-use command buffers for each tile:

300 tiles, 13,500 draws/frame, 30 fps, light CPU usage

Over 400,000 draw calls/sec, each with a different transformation, with many different materials, textures, blend modes, and shaders

[Imagination: Smith]
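The figures quoted for the Gnome Horde demo are mutually consistent, as a quick check shows. The per-tile number is an average and assumes the draws are spread evenly over the tiles, which the slide does not state.

```python
tiles = 300
draws_per_frame = 13_500
fps = 30

# Average draws per tile, assuming an even spread (an assumption, not from the slide).
draws_per_tile = draws_per_frame // tiles

# Total draw-call rate, matching the "over 400,000 draw calls/sec" claim.
draws_per_second = draws_per_frame * fps
```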


Fast Moving Camera

[Figure: app CPU usage and system CPU usage]

Command buffers need to be regenerated very frequently

•  single-threaded CPU bottleneck cannot feed the GPU fast enough ⇒ low FPS
•  with command buffer assembly distributed to multiple cores, the CPU could run at a lower frequency

[Imagination: Smith]

Power Efficiency

When the CPU and GPU have to share a power and thermal budget

•  lower CPU usage allows more of the power and thermal budget to go to the GPU
•  spreading the workload across more CPU cores allows each to run at a lower clock speed, further reducing power usage as compared to running a single thread at a high frequency (to feed the GPU)

[Intel: Lauritzen; Anandtech: Barrett & Smith]
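One way to see why several slow cores can use less power than one fast core is the textbook dynamic-power relation P = C·V²·f. The sketch below is not taken from the slides: it assumes that supply voltage scales linearly with frequency and that the workload parallelizes perfectly, both of which are idealizations.

```python
def dynamic_power(freq, cores=1, capacitance=1.0):
    """Relative dynamic power, assuming P = C * V^2 * f per core."""
    voltage = freq  # assumption: voltage scales linearly with frequency
    return cores * capacitance * voltage ** 2 * freq

# Same total throughput (cores * freq = 1.0) in both configurations.
single = dynamic_power(freq=1.0, cores=1)
spread = dynamic_power(freq=0.25, cores=4)
```

Under these assumptions the four-core configuration uses one sixteenth of the power of the single fast core; real savings are smaller because voltage cannot drop indefinitely and static power does not scale this way.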

50,000 Asteroids (draws/frame)

•  D3D11: CPU uses as much power as the GPU
•  D3D12: CPU uses 1/3 the power of the GPU, allowing more GPU processing; rendering sped up by over 70%
•  D3D12: or maintain the same rendering performance at 50% power usage

[Intel: Lauritzen]

Pipeline State Objects (PSOs)

Problem: draw-time validation of shader states delays hardware setup and reduces the number of draw calls per frame

Solution: bake (compile and validate) pipeline states into PSOs that are finalized on creation; switching PSOs has lower overhead than computing hardware state on the fly

[Microsoft: Sandy]


Pipeline State Objects

A PSO contains all static state for the entire 3D pipeline

•  shaders, vertex attribute formats, rasterization, color blend, depth stencil, etc.

Created outside of the performance-critical paths

A PSO can be cached for re-use, even saved to disk/cloud for re-use across app runs

[nvidia: Daniell; AMD]

What Doesn’t Go into a PSO?

Resource bindings

•  the actual vertex, index, constant buffers
•  textures, samplers, etc.

Fixed-function states that do not cause shader recompilation: viewport, color blend constants, polygon offset, scissor, stencil masks and refs, etc.

[nvidia: Foley]

[Figure: GPU state vector, showing the pipeline state object alongside binding tables for textures, buffers, and samplers]

Descriptor Tables and Pool/Heap

Problem: to use different resources, e.g., textures, an app must bind and rebind them to fixed and limited bind slots (descriptors) and issue multiple draw calls

[Microsoft: Sandy; nvidia: Foley]

[Figure: GPU state vector, showing the pipeline state object alongside binding tables for textures, buffers, and samplers]

Descriptor Table and Heap/Pool

Solution: pre-write multiple sets of descriptors to a descriptor heap; changing resources simply switches between descriptor sets already resident in GPU memory

[nvidia: Foley; Microsoft: Sandy]

[Figure: GPU state vector, showing the pipeline state object and a root table pointing to descriptor tables for textures, buffers, and samplers]
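The difference between rebinding slots and switching pre-written descriptor sets can be sketched as follows. `DescriptorHeap`, `write_set`, and `draw` are hypothetical names for illustration only; real descriptor heaps hold GPU addresses and views, not Python dictionaries.

```python
class DescriptorHeap:
    """Hypothetical heap: descriptor sets are written once, then selected by index."""
    def __init__(self):
        self.sets = []

    def write_set(self, **resources):
        self.sets.append(dict(resources))
        return len(self.sets) - 1  # handle used later to select this set

# Written ahead of time, outside the draw loop.
heap = DescriptorHeap()
brick = heap.write_set(texture="brick.png", sampler="linear")
moss = heap.write_set(texture="moss.png", sampler="linear")

def draw(heap, set_index, mesh):
    # Switching resources is just choosing a different pre-written set.
    bound = heap.sets[set_index]
    return (mesh, bound["texture"], bound["sampler"])

draws = [draw(heap, brick, "wall"), draw(heap, moss, "wall")]
```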


GPU Memory Management

With a high-level API, to pass data from the app to the GPU, the app first allocates a driver-managed buffer and copies the data into it before the data is passed to the shader ⇒ CPU overhead

With a low-level API, a developer simply maps the GPU memory address and writes to that memory location directly, with no intervening copy by the CPU

[Imagination: Smith]
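The two upload paths can be contrasted with ordinary host memory. In this sketch a `bytearray` is a hypothetical stand-in for a GPU-visible allocation, and `upload_via_driver` and `upload_mapped` are invented names; the point is only the extra staging copy in the first path.

```python
gpu_memory = bytearray(16)  # hypothetical stand-in for a GPU-visible allocation

def upload_via_driver(data):
    """High-level path: data is first copied into a driver-managed staging buffer."""
    staging = bytearray(data)              # extra CPU-side copy
    gpu_memory[:len(staging)] = staging    # second copy, into "GPU" memory

def upload_mapped(data):
    """Low-level path: map the memory and write into it directly."""
    mapped = memoryview(gpu_memory)        # a view of the same bytes, not a copy
    mapped[:len(data)] = data              # single write, straight to the destination

upload_via_driver(b"\x01\x02\x03\x04")
after_driver = bytes(gpu_memory[:4])
upload_mapped(b"\x05\x06\x07\x08")
after_mapped = bytes(gpu_memory[:4])
```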

Pre-Compiled Shader

Vulkan:

•  pre-compiles shaders into a common intermediate representation
•  provides some IP protection: developers can distribute shaders in a compiled intermediate representation instead of in source
•  pre-compiled shaders also speed up draw calls

Metal also pre-compiles shaders

[Anandtech: Smith]

Vulkan Shader Programming

[Figure: Vulkan shader toolchain. Labels: GLSL to SPIR-V compiler (open source); game engines; other languages, e.g., C++ Shading Language; other IR, e.g., LLVM (open source translator); Standard Portable Intermediate Representation: core in Vulkan]

[Khronos] and others

More Predictable Performance

Previously: app submits a draw call, maps a buffer, etc.

Driver might (GPU dependent):

•  compile shaders
•  insert synchronization fences into the GPU schedule
•  flush caches
•  allocate memory

With a low-level API all of the above must be done by the app itself, but driver performance across vendors becomes more predictable

[nvidia: Foley]


Why Vulkan is Not for Beginners

Must handle multi-threading and concurrency/synchronization

Must manage memory allocation and usage

These are optional in OpenGL, but mandatory in Vulkan

Summary of Features

Tech                     Metal   Direct3D 12   Vulkan
command buffer           ✔       ✔             ✔
pipeline state objects   ✔       ✔             ✔
descriptor table         ✗       ✔             ✔
tile-based render pass   ✔       ✗             ✔
multi-adapter            ✗       ✔             ✔?

Vulkan and D3D12:

•  both similar to Mantle to start with
•  Mantle supports multi-GPU
•  not as low level, in order to be cross-vendor and cross-platform

[nvidia: Foley; Anandtech: Smith]

Tile-based Architectures

“Mobile GPU” usually means “tile-based GPU”

•  most Android and all iOS devices use tile-based rendering
•  Vulkan and Metal have support for tile-based architecture, but not Direct3D 12
•  tiling reduces use of expensive off-chip memory bandwidth

[Google: Hall; Imagination: Sommefeldt]

Immediate-Mode Rendering

Fragment shading, including texture sampling, is performed even on fragments that will eventually fail the depth test

•  requires accessing off-chip memory
•  inefficient use of off-chip bandwidth

[Imagination: Sommefeldt; Merry]


Tile-based Architectures

Tile-based rendering splits the framebuffer up into tiles (e.g., 16×16 or 32×32 pixels) and sorts all triangles into tiles, using on-chip storage, before fragment shading

[Merry]
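The sorting step can be sketched as a binning pass over triangle bounding boxes. This is a simplified, conservative version written for illustration: `bin_triangles` is a hypothetical helper, triangles are given as integer pixel coordinates, and a triangle is assigned to every tile its bounding box overlaps even if the triangle itself misses a corner tile.

```python
TILE = 16  # tile size in pixels (the slide mentions 16x16 or 32x32)

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the framebuffer, then convert to tile coordinates.
        x0 = max(0, min(xs)) // tile
        x1 = min(width - 1, max(xs)) // tile
        y0 = max(0, min(ys)) // tile
        y1 = min(height - 1, max(ys)) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

triangles = [
    [(2, 2), (10, 3), (5, 12)],     # fits inside tile (0, 0)
    [(12, 4), (28, 6), (20, 14)],   # spans tiles (0, 0) and (1, 0)
]
bins = bin_triangles(triangles, width=32, height=32)
```

Once triangles are binned, each tile can be shaded on its own with its color and depth data held on chip, which is where the bandwidth saving comes from.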

Multi-Adapter Support

PCs can contain multiple graphics cards

Apps can enumerate graphics cards

•  can create a device abstraction for each

Some graphics cards have multiple GPUs

•  each with its own engines and memory

Apps should be able to assign work to any GPU on any graphics card

•  create queues on any engine and submit command buffers
•  allocate resources in memory associated with any GPU

[Microsoft: Boyd]

Multi-Adapter Support

Options:

•  Alternate-frame rendering (AFR): frame pacing becomes an issue if the GPUs are of different performance
•  Split-frame rendering
•  Work sharing of individual frames

D3D12 Explicit Multi-Adapter (EMA) mode allows exchange of multiple data types between GPUs, beyond just finished, rendered images

But transferring data over the PCIe bus is slow and has high latency!

[Anandtech: Smith]
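The frame-pacing problem with alternate-frame rendering can be seen in a small simulation. The model below is a deliberate simplification with made-up timings: each GPU renders its assigned frames back to back, frames are presented strictly in order, and `afr_present_times` and `intervals` are hypothetical helper names.

```python
def afr_present_times(render_ms, start_ms, frames):
    """Alternate frames across GPUs; return the time at which each frame is presented."""
    busy_until = list(start_ms)  # when each GPU becomes free
    present = []
    last = 0.0
    for frame in range(frames):
        gpu = frame % len(render_ms)
        busy_until[gpu] += render_ms[gpu]
        # Frames must be shown in order, so a frame waits for its predecessor.
        last = max(last, busy_until[gpu])
        present.append(last)
    return present

def intervals(times):
    """Gaps between consecutive presented frames."""
    return [b - a for a, b in zip(times, times[1:])]

# Two equal GPUs, the second started half a frame later: evenly paced output.
matched = intervals(afr_present_times([20.0, 20.0], [0.0, 10.0], 6))

# A fast and a slow GPU: frames arrive in bursts followed by long gaps.
mismatched = intervals(afr_present_times([10.0, 30.0], [0.0, 0.0], 6))
```

With matched GPUs every gap is 10 ms; with the mismatched pair the gaps alternate between 0 ms and up to 30 ms, which is perceived as stutter even though the average frame rate looks acceptable.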

Feature Sets/Levels

Hardware feature scoping

•  can be defined for different platforms or versions of the API
•  all features listed in a set must be supported
•  developers can develop against Feature Sets
•  features enabled at device creation time

[Microsoft: Sandy; Anandtech: Smith]

Apple Apple? Imagination? Khronos? Locked out?
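The all-or-nothing rule for feature sets, and enabling features at device creation, can be sketched with plain sets. The set names, their contents, and the `supported_sets` and `create_device` functions are all hypothetical and chosen only to illustrate the rule.

```python
FEATURE_SETS = {  # hypothetical names and contents, for illustration only
    "basic": {"command_buffers", "pipeline_state_objects"},
    "advanced": {"command_buffers", "pipeline_state_objects", "descriptor_tables"},
}

def supported_sets(device_features):
    """A set is supported only if every feature listed in it is available."""
    return [name for name, required in FEATURE_SETS.items()
            if required <= device_features]

def create_device(device_features, requested_set):
    """Features are enabled once, at device creation time."""
    if requested_set not in supported_sets(device_features):
        raise RuntimeError(f"feature set not supported: {requested_set}")
    return {"enabled": frozenset(FEATURE_SETS[requested_set])}

hardware = {"command_buffers", "pipeline_state_objects", "tile_render_pass"}
available = supported_sets(hardware)
device = create_device(hardware, "basic")
```

Developing against a named set means the app checks one thing at startup instead of probing for individual features throughout the renderer.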


References

Smith, R., “Microsoft Announces DirectX 12,” Anandtech, Mar. 24, 2014
Smith, R., “Understanding AMD Mantle,” Anandtech, Sep. 26, 2013
Smith, R., “Some Thoughts on Apple’s Metal API,” Anandtech, Jun. 3, 2014
Smith, R., “Khronos Announces Next Generation OpenGL Initiative,” Anandtech, Aug. 11, 2014
Chester, B., “Comparing OpenGL ES to Metal on iOS,” Anandtech, Jun. 15, 2015
Sandy, M., “DirectX 12,” DirectX Developer Blog, Mar. 20, 2014
Lauritzen, A., “DirectX 12 on Intel,” Aug. 11, 2014
Yeung, A., “DirectX 12 – Looking back at GDC 2015,” Mar. 9, 2015
Langley, B., “Windows 10 and DirectX 12 Released!,” DirectX Developer Blog, Jul. 29, 2015
Smith, R., “Microsoft Details Direct3D 11.3 & 12 New Rendering Features,” Anandtech, Sep. 18, 2014
Smith, R., “The DirectX 12 Performance Preview,” Anandtech, Feb. 6, 2015
Smith, R. and Cutress, I., “Exploring DirectX 12: 3DMark API Overhead Feature Test,” Anandtech, Mar. 27, 2015
Smith, R., “Next Generation OpenGL Becomes Vulkan,” Anandtech, Mar. 3, 2015
Smith, A., “Trying out the new Vulkan graphics API on PowerVR GPUs,” Imagination PowerVR Graphics Blog, Mar. 3, 2015
Smith, A., “Gnomes per second in Vulkan and OpenGL ES,” Imagination PowerVR Graphics Blog, Aug. 10, 2015
Foley, T., “Next-Generation Graphics APIs: Similarities and Differences,” ACM SIGGRAPH 2015
Sellers, G., “A Whirlwind Tour of Vulkan,” ACM SIGGRAPH 2015
Hall, J., “Vulkan on Android,” ACM SIGGRAPH 2015
Hall, J., “Using Next-Generation APIs on Mobile GPUs,” ACM SIGGRAPH 2015
Daniell, P., “Vulkan on NVIDIA GPUs,” ACM SIGGRAPH 2015
Boyd, C., “Direct3D 12,” ACM SIGGRAPH 2015
Yeung, A., “DirectX 12 Multiadapter,” DirectX Developer Blog, Apr. 30, 2015
Smith, R., “GeForce + Radeon: Previewing DirectX 12 Multi-Adapter,” Anandtech, Oct. 26, 2015
Merry, B., “Performance Tuning for Tile-Based Architecture,” OpenGL Insights, eds. Cozzi, P. and Riccio, C., 2012

