Post on 06-Jun-2020
© Copyright Khronos Group 2016 - Page 1
Vulkan GDC 2016 Event
The Vulkan Working Group

Schedule
• The Big Picture
  - 2:00 Where we are, and how we got here - Tom Olson (ARM)
  - 2:20 Vulkan design philosophy - Jeff Bolz (NVIDIA)
• The Deep Dive
  - 2:45 Vulkan memory allocation - Graham Sellers (AMD)
  - 3:15 Command Buffers and renderpass / subpass - Bill Licea-Kane (Qualcomm)
  - 3:45 Barriers and synchronization - Tobias Hector (Imagination)
  - 4:15 Swapchains unchained! - Alon Or-bach (Samsung)
  - 4:30 Vulkan layers - Karen Ghavam & team (LunarG)

Schedule, continued
• The Voice of Experience
  - 5:00 How we organize our engine for Vulkan - Dan Baker (Oxide)
  - 5:30 Performance lessons from porting Source 2 - Dan Ginsburg (Valve)
• Short Subjects
  - 6:00 Vulkan does Retro! - Hans-Kristian Arntzen (ARM)
  - 6:15 Porting Cinder to Vulkan - Hai Nguyen (Google)
  - 6:30 GFXBench 5 Aztec Ruins - Gergely Juhasz (Kishonti)
  - 6:45 Vulkan and OpenGL ES - Barthold Lichtenbelt (NVIDIA)
• Party!
  - 7:00 Food, drink, and demos

Logistics
• No breaks in the program!
  - Go for coffee etc. during Q&A time
  - Coffee, water available in the Mission Suite
  - Restrooms by the elevator – code 3636
• We WILL stay on schedule
• And don’t miss Vulkan at GDC:
  - D3D12 & Vulkan: Lessons Learned – Thursday 10am, Room 3016 West Hall
  - Practical Development for Vulkan – Thursday 12:45pm, Room 3009 West Hall
  - Vulkan and NVIDIA: The Essentials – Thursday 2pm, Room 3014 West Hall

Where we are, and how we got here
Tom Olson – Director of Graphics Research, ARM
Chair, Vulkan Working Group

Vulkan: the origin story
• The Force Awakens: October 2012
  - GL Common TSG formed to consider a ground-up redesign of OpenGL / ES
  - Brainstorming and design sketches
• A New Hope: June/July 2014
  - Effort rebooted as GL Next – becomes the top priority
  - Unprecedented participation from key ISVs
  - AMD donates Mantle as a starting point
• Renamed and disclosed at GDC 2015
• Public launch on February 16, 2016

Vulkan vision and goals at project launch
• An open-standard, cross-platform 3D+compute API for the modern era
  - Compatibility break with OpenGL
  - Start from first principles
• Goals
  - Clean, modern architecture
  - Multi-thread / multicore-friendly
  - Greatly reduced CPU overhead
  - Full support for both desktop and mobile GPU architectures
  - More predictable performance – no driver magic
  - Improved reliability and consistency between implementations
• We think we nailed these

Wait a minute…
• Clean?
  - I didn’t say Simple
  - It’s as clean as we were able to make it
• Improved reliability and consistency between implementations?
  - Will come with time
  - We’ve laid the foundations
  - Note: Only guaranteed for well-behaved applications

Where we are today
• Vulkan 1.0 launched one month ago today!
  - Specifications (API, SPIR-V, DF, extensions)
  - Man pages
  - GLSL to SPIR-V compiler
  - Basic SDK (loader, layers)
  - Conformance test
• All Khronos resources are open source
  - https://github.com/KhronosGroup/

Industry Support
• Conformant drivers
• 3rd party tools
  - LunarG SDK
  - IHV and ISV tools
• An actual game!
  - Croteam “The Talos Principle”
  - Beta Vulkan port

Platform Support
Vulkan is Cross Platform!
Any OpenGL ES 3.1/4.X GPU

The Move to Open Source
• Going well so far
  - Repo structure and workflow seem to work
  - CLA taking longer than expected
• Community response has been fantastic!
  - Header file improvements
  - Typos and spec clarifications
  - Spec errors
• Be careful what you wish for…
  - Processing the input takes time
  - Still tuning our processes
  - We are committed to making this work

Brought to you by…
• A whole industry, working together
  - GPU and SoC vendors
  - Game and middleware developers
  - Platform owners
  - Content providers

Brought to you by…
• A bunch of people who really care about graphics
[Photo: The Vulkan and SPIR-V working groups, Seattle, January 2016]

Vulkan Design Philosophy
Jeff Bolz (NVIDIA)
March 2016

Overview
• Explicit API
• Portability concerns
• Examples
• Multithreading
• Render passes and Multipass

Explicit API – what it is
• Providing information at the right time
• Predictable performance costs
  - creating pipelines, allocating memory, …
• No driver magic on-the-clock
  - remove guesswork and late decision-making
• Simpler drivers
• Better scheduling control over CPU and GPU work
• Maintains (and adds new) higher-level abstractions
[Figure: timeline(s) comparing an Explicit API (state definitions, API objects generation, API object usage) with a Legacy API (state context manipulation, implicit internal validation & objects).]

Explicit API – what it is not
• Low-Level == Thin layer over specific hardware implementation, little abstraction
  - Not possible given wide variety of hardware
• “Making everything the app’s problem”
• “Getting the driver out of the way”
• Solves a different problem than we were asked to
[Figure: three App / Driver / GPU stacks labelled Legacy API, Low-Level API (with separate IHV 0, IHV 1 and IHV 2 paths) and Vulkan API (with the driver's work split into "off the clock" and "on the clock").]

Portability - Write once, run anywhere
• Avoid mutually exclusive code paths
  - Quickly becomes a combinatorial nightmare
  - Compressed texture formats are a necessary exception
• Example: Fixed-function blending/vertex fetch, vs programmable
  - Fixed-function can be converted to programmable during pipeline creation
• Strong desire to avoid forking the ecosystem due to implementation details
• Caps bit hell, Feature Levels coming soon

Portability (Counter)Examples
• Vulkan’s Soul – but how portable?
• Some information is ignored on some implementations
[Figure: per-implementation comparison with columns TILERS, AMD and NVIDIA. The column assignment did not survive extraction; the entries, in extracted order, are: Ignore image layouts; bufferImageGranularity>1; 12 UBOs, 64KB each; State inheritance; By-region dependency; Load/Store Ops/renderArea; No state inheritance; Uses image layouts; bufferImageGranularity=1; UBOs/descriptor sets up to 4GB; State inheritance.]

Portability
• Validation is a critical part of the ecosystem
• Can’t just test on one implementation and call it good enough
• OpenGL tries to minimize undefined behavior, Vulkan doesn’t
• First line of defense is the validation layers
• Currently only verify that an application runs correctly on your HW
• vkjson project exports features/limits of an implementation, and imports them into validation
• Validates your app as if it were running on another implementation
Contribute!!
[Figure: Layered API stack: App, Layers (emitting Warnings/Errors), Driver, GPU.]

Multithreading
• All Vulkan commands/objects are Not Thread-Affine
  - Can execute on any thread, no TLS, no apartments
• Most Vulkan objects are Not Thread-Safe
• Each use of an object is classified as Externally Synchronized (externsync) or not
  - An externsync use allows no other concurrent use of the object
  - Simultaneous non-externsync uses are allowed (e.g. read-only accesses)
• 99% of the time, non-externsync means “no synchronization necessary”
• 1% of the time, it means “internally synchronized”

DescriptorPool and CommandPool
• Command buffers automatically allocate memory as you record into them
  - Without command pools, there’s a tension between allocating small chunks, which hammers on the allocator, and allocating large chunks, which wastes memory
• Allocating descriptor sets is a high-frequency operation, called as part of multithreaded command buffer recording
  - Even harder problem than command buffers – can’t just alloc large chunks
• Pools provide an explicit object to manage per-thread allocators

DescriptorPool and CommandPool
• Objects added to the API (almost) exclusively for multithreaded scalability
• Meant to be simple lock-free allocators, with app using one per thread
• Gives the app the tools it needs, doesn’t just make it the app’s problem
[Figure: one Thread owning a CommandPool and a DescriptorPool; a CommandBuffer of vkCmd calls draws from the CommandPool, and descriptor Sets come from the DescriptorPool.]

Pipeline Caches
• Pipeline cache
  - Object that captures reusable information during pipeline creation
• Prior art:
  - ARB_get_program_binary – single program – no pipeline state
  - ID3D12PipelineState::GetCachedBlob – single program + all pipeline state
  - Automatic driver disk caching – “first run” problem
• Can save and reuse compilation results across pipelines within a run
• API exposes driver/hardware IDs, so cache data can be shared across systems
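
As a rough illustration of how the driver/hardware IDs can be used, the sketch below checks a saved cache blob before handing it back to the driver. It assumes the version-one header layout described in the Vulkan 1.0 specification (header length, header version, vendor ID, device ID, then a 16-byte cache UUID, integers stored least significant byte first). The function name cache_blob_matches and its parameters are hypothetical, and the sketch deliberately avoids vulkan.h so that it builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed version-one header: 4 x uint32 followed by a 16-byte UUID. */
enum { CACHE_UUID_SIZE = 16, CACHE_HEADER_SIZE = 16 + CACHE_UUID_SIZE };

/* Read a 32-bit value stored least significant byte first. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if the blob's header names the given
   vendor, device and cache UUID, 0 otherwise. */
int cache_blob_matches(const uint8_t *blob, size_t size,
                       uint32_t vendor_id, uint32_t device_id,
                       const uint8_t uuid[CACHE_UUID_SIZE]) {
    assert(uuid != NULL);
    if (blob == NULL || size < CACHE_HEADER_SIZE) return 0;
    if (read_le32(blob) < CACHE_HEADER_SIZE) return 0;  /* header length */
    if (read_le32(blob + 4) != 1u) return 0;            /* header version one */
    if (read_le32(blob + 8) != vendor_id) return 0;
    if (read_le32(blob + 12) != device_id) return 0;
    return memcmp(blob + 16, uuid, CACHE_UUID_SIZE) == 0;
}
```

An application would typically compare against the IDs it reads from the physical device properties, and start with an empty cache when the check fails.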

Pipeline Caches
• Examples of possible reuse
  - Mix and match of shaders in a pipeline
  - Same shader with different pipeline state
  - Same shader with different specialization constants
  - Whatever makes sense on an implementation
[Figure: three pipeline creations against one pipeline cache. The cache starts empty, then holds VS0 FS0, then VS0 FS0 VS1 FS1; the lookups are labelled miss, miss, hit.]

Pipeline Caches and Threading
• A rare (the only?) instance of an object internally synchronized by design
• Obviously can’t expect app to take lock around vkCreate*Pipelines
• Driver can take a lock at a much finer granularity
  - when probing and adding data to the cache
• A single cache can be used in multiple threads

Render Passes
• Original goals and mechanisms
• Separate rendering commands from blit/copy commands
  - Resource updates flush a tiler and thus are very expensive
  - They’re expensive on immediate renderers as well
• LOAD/STORE/CLEAR/DISCARD operations on attachments
• AA resolve when the pass is complete
[Figure: two render passes in sequence, with memory updates and blits & copies placed between them.]

Multipass
• Evolved into supporting multiple “subpasses”
• A dependency graph (DAG) between subpasses:
  - Each node is a subpass – a list of attachments w/format info
  - Each edge is an execution/memory dependency between subpasses
  - Each edge indicates whether the dependency is tiler-friendly (BY_REGION)
[Figure: a Render Pass drawn as a graph of Subpass 0, Subpass 1 and Subpass 2 (nodes) joined by dependencies (edges), next to a Render Pass Instance (in cmdbuffer) running Subpass 0, Subpass 1, Subpass 2 in order.]
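
Multipass Tiling
[Figure: execution order over a 3x3 grid of tiles for two subpasses. With !BY_REGION, Subpass 0 runs on every tile first (slots 0 to 8), then Subpass 1 (slots 9 to 17). With BY_REGION, the two subpasses interleave per tile: Subpass 0 takes the even slots (0, 2, …, 16) and Subpass 1 the odd slots (1, 3, …, 17).]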
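
The numbering in the tiling figure can be reproduced with a small formula. The sketch below is only a model of the schedule drawn on the slide, not something the API exposes; the function name exec_slot and its parameters are hypothetical, and a real implementation is free to order work differently.

```c
#include <assert.h>

/* Slot in the execution order at which (subpass, tile) runs.
   byRegion == 0: every tile finishes a subpass before the next subpass starts.
   byRegion != 0: each tile runs all of its subpasses back to back. */
unsigned exec_slot(unsigned subpass, unsigned tile,
                   unsigned subpassCount, unsigned tileCount, int byRegion) {
    assert(subpass < subpassCount && tile < tileCount);
    return byRegion ? tile * subpassCount + subpass
                    : subpass * tileCount + tile;
}
```

With 9 tiles and 2 subpasses this gives slots 0 to 8 for Subpass 0 and 9 to 17 for Subpass 1 in the !BY_REGION case, and even/odd interleaving in the BY_REGION case, which is the property that lets a tiler keep a tile's data on chip between subpasses.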

Multipass
• Compiled before use
• Opportunity to do “register allocation” of on-chip resources
• Pipelines compiled against a (compatible) renderpass
• See GLES/EXT_shader_pixel_local_storage
[Figure: attachments 0 to 5 assigned to on-chip Registers across Subpass 0, Subpass 1 and Subpass 2, with spill, restore and resolve transfers to RAM.]

Conclusion
• Early concerns about being too hard to use / only viable for AAA developers
  - But in the end, the abstractions we have are quite usable, and the API evolved to be “simpler” than the first versions
• Higher-level abstractions allow for better hardware innovations, optimization opportunities, and implementation on a wide variety of hardware
• Not “Low-Level” – careful design decisions give apps the tools they need to improve performance
33 MARCH 2016 | COMBINATORIAL
VULKAN MEMORY ALLOCATION
GRAHAM SELLERS, AMD (@grahamsellers)

MANAGING MEMORY IN VULKAN
MEMORY IS IMPORTANT – MANAGE IT WELL
• Memory management topics
  - Denial
  - Anger
  - Bargaining
  - …

MANAGING MEMORY IN VULKAN
MEMORY IS IMPORTANT – MANAGE IT WELL
• Memory management topics
  - Pooled objects
  - Host memory allocation
  - Device memory allocation

POOLED OBJECTS
BATCHING ALLOCATIONS

POOLED OBJECTS
BATCH CREATION
• Some objects in Vulkan are allocated in pools
  - Command buffers
  - Descriptors
  - Queries
• Each of these types has an associated pool type
  - VkCommandPool
  - VkDescriptorPool
  - VkQueryPool

POOLED OBJECTS
THREAD SAFETY
• Pool objects are not thread safe
  - Create several pools
  - Assign a pool to a thread
  - That thread is free to allocate from the pool without locks
  - The application must ensure that a pool is not used from two threads at once
• Freeing the pool frees all objects allocated from it
  - No need to free each individual object in the pool

COMMAND POOLS
WHERE COMMAND BUFFERS COME FROM
• Command pools are used to allocate command buffers
• As command buffers grow, memory might be allocated
  - This memory comes from the pool
  - Application guarantees that no two threads build command buffers from the same pool at the same time
  - A single thread can build multiple command buffers from the same pool, though
    - High performance, single threaded allocator
    - Use a command pool per-thread

DESCRIPTOR POOLS
WHERE DESCRIPTOR SETS COME FROM
• Descriptor pools are used to allocate descriptor sets
• Descriptors are generally homogeneous arrays
  - It makes sense to manage them with specialized allocators
  - Descriptor pools are those allocators
  - Again, not thread safe
    - Use a dedicated thread to manage descriptor sets, or
    - Create a descriptor pool for each thread that might allocate descriptor sets
    - etc.

QUERY POOLS
WHERE QUERY OBJECTS COME FROM
• Query objects are generally small in memory
  - Often a single 64-bit integer or similar
• Managing them as individual objects would be painful
  - Not performant
  - Fragments memory
  - Hard to batch queries
• Query objects are therefore allocated from pools
  - Allows contiguous allocation
  - Makes batching results easy

HOST MEMORY
MANAGING SYSTEM MEMORY

MANAGING HOST MEMORY
ALL OUR MEMORY ARE BELONG TO YOU
• Vulkan allows your application to manage system memory
  - System memory allocated via callbacks
  - Most object creation functions take a VkAllocationCallbacks structure:

typedef struct VkAllocationCallbacks {
    void*                                   pUserData;
    PFN_vkAllocationFunction                pfnAllocation;
    PFN_vkReallocationFunction              pfnReallocation;
    PFN_vkFreeFunction                      pfnFree;
    PFN_vkInternalAllocationNotification    pfnInternalAllocation;
    PFN_vkInternalFreeNotification          pfnInternalFree;
} VkAllocationCallbacks;

MEMORY MANAGER HIERARCHY
STEPPING UP
• The instance and device have their own allocators
  - When you create the instance, driver will use the instance allocator
  - When you create the device, driver may:
    - Use the device level allocator, or
    - Use the instance level allocator
  - When you create an object, the driver may:
    - Use the object level allocator,
    - Use the device level allocator, or
    - Use the instance level allocator

HOST MEMORY ALLOCATOR
BASIC ALLOCATIONS
• Allocation callbacks:
  - Allocation, Reallocation, Free – Allocate, reallocate and free, respectively
    - pUserData: Whatever you want it to be
    - allocationScope: Scope or lifetime of the allocation
      - Command, object, cache, device or instance

typedef void* (VKAPI_PTR *PFN_vkAllocationFunction)(
    void*                      pUserData,
    size_t                     size,
    size_t                     alignment,
    VkSystemAllocationScope    allocationScope);
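
To make the alignment parameter concrete, here is a minimal sketch of an allocation/free pair with the same shape as the callback above. It is an illustration under stated assumptions, not a recommended allocator: the names sketch_alloc and sketch_free are hypothetical, allocationScope is reduced to an int so the code builds without vulkan.h, and malloc is used as the backing store only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate 'size' bytes aligned to 'alignment' (a power of two).
   The pointer returned by malloc is stored just below the aligned
   block so that sketch_free can recover it. */
void *sketch_alloc(void *pUserData, size_t size, size_t alignment,
                   int allocationScope) {
    (void)pUserData;
    (void)allocationScope;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (size == 0) return NULL;
    /* Keep the stored pointer itself suitably aligned. */
    if (alignment < sizeof(void *)) alignment = sizeof(void *);
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL) return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from sketch_alloc; NULL is ignored. */
void sketch_free(void *pUserData, void *pMemory) {
    (void)pUserData;
    if (pMemory != NULL) free(((void **)pMemory)[-1]);
}
```

A complete set of callbacks would also need a reallocation function that preserves the requested alignment.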

HOST MEMORY ALLOCATOR
HANDLING ALLOCATIONS
• Drivers will call through the allocator from your thread
  - Your allocator doesn’t have to be thread safe
  - However, the driver might walk up the hierarchy
  - It may call the device or instance allocator even if you give it an object allocator
  - Device and instance allocators should be thread safe

INTERNAL MEMORY ALLOCATIONS
SPECIAL CASES
• Sometimes the driver makes internal allocations
  - Cannot use the host allocator for some reason
    - Need to allocate memory for executable code?
    - Special alignment, caching or other platform-specific restrictions
• Driver will call notification functions
  - pfnInternalAllocation, pfnInternalFree
  - These are informational, optional
    - Hook them up in debug builds

FREEING HOST MEMORY
RETURNING MEMORY TO THE SYSTEM
• Object destruction functions take an allocator too
• Allocator passed to vkDestroy* must be compatible with that passed to vkCreate*
• Again, driver might not use your allocator
  - Objects internally allocated in pools?
  - Object still in use internally?
  - etc.

DEVICE MEMORY
MANAGING GPU MEMORY

DEVICE MEMORY
ALLOCATION OF MEMORY FOR RESOURCES
• Resources require GPU memory
• Resource and memory are separate objects
• Allocate and free memory with vkAllocateMemory and vkFreeMemory
• Bind memory to resource with:
  - vkBindImageMemory for images
  - vkBindBufferMemory for buffers
• Other objects may require GPU memory
  - Driver manages that for you

DEVICE MEMORY PROPERTIES
MEMORY TYPES AND HEAPS
• Memory is allocated from heaps
• Within each heap, there may be multiple “types” of memory
• Heap properties include its size and whether it is device local
• Type properties include caching and coherency options
• Determine memory properties with vkGetPhysicalDeviceMemoryProperties

typedef struct VkMemoryHeap {
    VkDeviceSize         size;
    VkMemoryHeapFlags    flags;
} VkMemoryHeap;

typedef struct VkMemoryType {
    VkMemoryPropertyFlags    propertyFlags;
    uint32_t                 heapIndex;
} VkMemoryType;

QUERYING OBJECT REQUIREMENTS
WHAT KIND OF MEMORY DOES A RESOURCE NEED?
• To determine the requirements of an image:

VKAPI_ATTR void VKAPI_CALL vkGetImageMemoryRequirements(
    VkDevice                 device,
    VkImage                  image,
    VkMemoryRequirements*    pMemoryRequirements);

• Returns information in VkMemoryRequirements:

typedef struct VkMemoryRequirements {
    VkDeviceSize    size;
    VkDeviceSize    alignment;
    uint32_t        memoryTypeBits;
} VkMemoryRequirements;

• Pick memory type by matching memoryTypeBits with reported types
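
The matching step can be sketched as a loop over the reported memory types: bit i of memoryTypeBits says whether type i is acceptable for the resource, and the application additionally filters on the property flags it wants. In the sketch below, SketchMemoryType is a local stand-in for VkMemoryType so the code builds without vulkan.h, and find_memory_type is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for VkMemoryType (same two fields). */
typedef struct SketchMemoryType {
    uint32_t propertyFlags;
    uint32_t heapIndex;
} SketchMemoryType;

/* Return the index of the first memory type that is allowed by
   memoryTypeBits and has every flag in 'required', or -1 if none. */
int find_memory_type(const SketchMemoryType *types, uint32_t typeCount,
                     uint32_t memoryTypeBits, uint32_t required) {
    assert(typeCount <= 32);
    for (uint32_t i = 0; i < typeCount; ++i) {
        int allowed = (memoryTypeBits >> i) & 1u;
        int suitable = (types[i].propertyFlags & required) == required;
        if (allowed && suitable) return (int)i;
    }
    return -1;
}
```

The returned index is what would go into memoryTypeIndex when allocating; a caller that gets -1 would usually retry with a less demanding set of required flags.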

ALLOCATE DEVICE MEMORY
GETTING MEMORY TO USE
• To allocate device memory, call vkAllocateMemory

VKAPI_ATTR VkResult VKAPI_CALL vkAllocateMemory(
    VkDevice                        device,
    const VkMemoryAllocateInfo*     pAllocateInfo,
    const VkAllocationCallbacks*    pAllocator,
    VkDeviceMemory*                 pMemory);

• Pass VkMemoryAllocateInfo

typedef struct VkMemoryAllocateInfo {
    VkStructureType    sType;
    const void*        pNext;
    VkDeviceSize       allocationSize;
    uint32_t           memoryTypeIndex;
} VkMemoryAllocateInfo;

• Allocates from the heap that memoryTypeIndex lives in

BIND MEMORY TO OBJECTS
ASSOCIATING MEMORY WITH AN OBJECT
• Once you have memory and an object, bind them:

VKAPI_ATTR VkResult VKAPI_CALL vkBindImageMemory(
    VkDevice          device,
    VkImage           image,
    VkDeviceMemory    memory,
    VkDeviceSize      memoryOffset);

• This binds memory to image starting from memoryOffset
  - Object consumes as much as it needs from memory object
  - Objects can overlap in memory – you need to manage hazards
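
Since overlapping bindings are legal, an application that wants to track hazards needs to know when two bound ranges share bytes. The following is a minimal sketch of that test; BoundRange and ranges_overlap are hypothetical names, and the size field is assumed to be the size reported in the resource's memory requirements.

```c
#include <assert.h>
#include <stdint.h>

/* One resource's binding inside a memory object: [offset, offset + size). */
typedef struct BoundRange {
    uint64_t offset;
    uint64_t size;
} BoundRange;

/* Returns 1 if the two half-open ranges share at least one byte. */
int ranges_overlap(BoundRange a, BoundRange b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Ranges that merely touch (one ends where the next begins) do not overlap under this definition.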

FREEING DEVICE MEMORY
RETURN WHAT YOU HAVE TAKEN
• To free device memory, call vkFreeMemory

VKAPI_ATTR void VKAPI_CALL vkFreeMemory(
    VkDevice                        device,
    VkDeviceMemory                  memory,
    const VkAllocationCallbacks*    pAllocator);

• It is your responsibility to resolve hazards
  - Do not free memory that is potentially in use by the device
  - This may mean a stall before freeing memory
  - Consider recycling allocations

BE SMART
NOT SILLY

USE OBJECT POOLS
THEY’RE PART OF THE API
• Object pools aggregate large numbers of similar objects
  - Used for homogeneous objects such as queries and descriptors
  - Used for high-frequency allocations – command buffers
• Pools are not thread safe
  - Use a pool per-thread
  - Implement your own mutex
• Freeing the pool frees all the objects in it
  - Make sure you’re done with them before you free them

MANAGE SYSTEM MEMORY
IF YOU WANT
• Using a system memory allocator is optional
  - Do it if you think you can do a better job than the driver
  - Drivers might not use your allocator the way you think they will
• If you do write a system memory allocator
  - Make sure you properly support alignment
  - Don’t just hook it up to malloc
    - Even “stupid drivers” do a better job than that
  - Don’t go poking around inside the allocations
    - There’s no secret sauce

MANAGE GPU MEMORY
BECAUSE YOU HAVE TO
• You need to write a memory allocator!
• Don’t map resources to memory objects 1:1
  - Each memory object is tracked by the driver and OS
  - Every submission handles memory access at allocation level
  - This is a major performance issue
• Make few, large memory objects and sub-allocate from them
  - Resources can share memory – you manage hazards
  - Pack resources together
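
One of the simplest ways to sub-allocate from a large memory object is a linear ("bump") allocator that hands out aligned offsets. The sketch below is only one possible design, with hypothetical names (SubAllocator, sub_alloc, sub_reset). It assumes alignments are powers of two, ignores device limits such as bufferImageGranularity, and can only release everything at once.

```c
#include <assert.h>
#include <stdint.h>

/* Tracks how much of one large memory object has been handed out. */
typedef struct SubAllocator {
    uint64_t capacity;  /* size of the large memory object */
    uint64_t head;      /* first unused byte */
} SubAllocator;

/* Returns an offset suitable for use as a bind offset, or UINT64_MAX
   when the request does not fit. */
uint64_t sub_alloc(SubAllocator *a, uint64_t size, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uint64_t offset = (a->head + alignment - 1) & ~(alignment - 1);
    if (offset > a->capacity || size > a->capacity - offset) return UINT64_MAX;
    a->head = offset + size;
    return offset;
}

/* Recycle the whole block; only safe once the device is done with it. */
void sub_reset(SubAllocator *a) {
    a->head = 0;
}
```

The size and alignment arguments would come from the resource's memory requirements, and the returned offset is what gets passed as memoryOffset when binding.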
Qualcomm Technologies, Inc.
Vulkan: Command Buffers & Render Pass / Subpass
Bill Licea-Kane

Overview
• Command Buffers
• Render Pass / Subpass

Command Buffers: Overview
• Command Pools
• Command Buffers
  – States
  – Primary Command Buffers
  – Secondary Command Buffers

Command Buffer Pools
[Figure: host memory divided into several command buffer pools, each containing command buffers.]

Command Pools – Creating / Destroying
• VkResult vkCreateCommandPool(
      VkDevice device,
      const VkCommandPoolCreateInfo* pCreateInfo,
      const VkAllocationCallbacks* pAllocator,
      VkCommandPool* pCommandPool);
  – pCreateInfo.flags
    – VK_COMMAND_POOL_CREATE_TRANSIENT_BIT
    – VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT
  – pCreateInfo.queueFamilyIndex
• void vkDestroyCommandPool(
      VkDevice device,
      VkCommandPool commandPool,
      const VkAllocationCallbacks* pAllocator);

Command Pools – Resetting
• VkResult vkResetCommandPool(
      VkDevice device,
      VkCommandPool commandPool,
      VkCommandPoolResetFlags flags);
  – flags
    – VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT

Command Buffers – States
• Initial
• Recording
• Executable
• (Pending Execution)
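
The four states can be read as a small state machine driven by the calls introduced on the following slides (begin, end, submit, reset). The sketch below is a simplified model for illustration only: the type and function names are hypothetical, it covers only the states listed above, and it leaves out details such as one-time-submit buffers or the implicit reset that beginning an already recorded buffer can perform.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CB_INITIAL, CB_RECORDING, CB_EXECUTABLE, CB_PENDING } CbState;
typedef enum { CB_BEGIN, CB_END, CB_SUBMIT, CB_COMPLETE, CB_RESET } CbEvent;

/* Apply one event. Returns 1 and updates *state when the transition is
   legal in this simplified model; returns 0 and leaves *state unchanged
   otherwise. */
int cb_transition(CbState *state, CbEvent event) {
    assert(state != NULL);
    CbState next;
    switch (event) {
    case CB_BEGIN:    if (*state != CB_INITIAL)    return 0; next = CB_RECORDING;  break;
    case CB_END:      if (*state != CB_RECORDING)  return 0; next = CB_EXECUTABLE; break;
    case CB_SUBMIT:   if (*state != CB_EXECUTABLE) return 0; next = CB_PENDING;    break;
    case CB_COMPLETE: if (*state != CB_PENDING)    return 0; next = CB_EXECUTABLE; break;
    case CB_RESET:    if (*state == CB_PENDING)    return 0; next = CB_INITIAL;    break;
    default:          return 0;
    }
    *state = next;
    return 1;
}
```

The one rule worth remembering from the model is that a buffer pending execution cannot be reset or re-recorded until the device has finished with it.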

Command Buffers – Allocating / Freeing
• VkResult vkAllocateCommandBuffers(
      VkDevice device,
      const VkCommandBufferAllocateInfo* pAllocateInfo,
      VkCommandBuffer* pCommandBuffers);
  – pAllocateInfo.commandPool
  – pAllocateInfo.level
    – VK_COMMAND_BUFFER_LEVEL_PRIMARY
    – VK_COMMAND_BUFFER_LEVEL_SECONDARY
  – pAllocateInfo.commandBufferCount
• void vkFreeCommandBuffers(
      VkDevice device,
      VkCommandPool commandPool,
      uint32_t commandBufferCount,
      const VkCommandBuffer* pCommandBuffers);

Command Buffers – Resetting
• VkResult vkResetCommandBuffer(
      VkCommandBuffer commandBuffer,
      VkCommandBufferResetFlags flags);
  – flags
    – VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT

Command Buffers – Recording
• VkResult vkBeginCommandBuffer(
      VkCommandBuffer commandBuffer,
      const VkCommandBufferBeginInfo* pBeginInfo);
  – pBeginInfo.flags
    – VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT
    – VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
    – VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT
  – pBeginInfo.pInheritanceInfo
    – pBeginInfo.pInheritanceInfo.renderPass
    – pBeginInfo.pInheritanceInfo.subpass
    – pBeginInfo.pInheritanceInfo.occlusionQueryEnable
    – pBeginInfo.pInheritanceInfo.queryFlags
    – pBeginInfo.pInheritanceInfo.pipelineStatistics
• VkResult vkEndCommandBuffer(
      VkCommandBuffer commandBuffer);

Command Buffers – Recording
• void vkCmd*

Command Buffers – Primary & Secondary Command Buffers
• Primary Command Buffers
  – Render Pass Commands
    – vkCmdBeginRenderPass
    – vkCmdNextSubpass
    – vkCmdEndRenderPass
  – Execute Commands
    – vkCmdExecuteCommands
  – Submission Commands
    – Can ONLY be submitted
    – Can NOT be executed
• Secondary Command Buffers
  – Render Pass Commands
    – NO
  – Execute Commands
    – NO
  – Submission Commands
    – Can ONLY be executed
    – Can NOT be submitted

Command Buffers – Submitting
• VkResult vkQueueSubmit(
      VkQueue queue,
      uint32_t submitCount,
      const VkSubmitInfo* pSubmits,
      VkFence fence);
  – pSubmits[].waitSemaphoreCount
  – pSubmits[].pWaitSemaphores
  – pSubmits[].pWaitDstStageMask
  – pSubmits[].commandBufferCount
  – pSubmits[].pCommandBuffers
  – pSubmits[].signalSemaphoreCount
  – pSubmits[].pSignalSemaphores

Command Buffers – Primary Command Buffer
[Figure, built up over nine slides: a primary command buffer recorded between vkBeginCommandBuffer and vkEndCommandBuffer. Bind, Set and PushConstants state is shown alongside the recorded commands: vkCmdBind*, vkCmdPushConstants, runs of vkCmdDispatch*, and vkCmdExecute calls that run secondary command buffers.]

Command Buffers – Secondary Command Buffer
[Figure, built up over two slides: a secondary command buffer recorded between vkBeginCommandBuffer and vkEndCommandBuffer. Bind, Set and PushConstants state is shown alongside the recorded commands: vkCmdBind*, vkCmdPushConstants and runs of vkCmdDispatch*.]

Render Pass – Creating / Destroying
• VkResult vkCreateRenderPass(
      VkDevice device,
      const VkRenderPassCreateInfo* pCreateInfo,
      const VkAllocationCallbacks* pAllocator,
      VkRenderPass* pRenderPass);
  – pCreateInfo.attachmentCount
  – pCreateInfo.pAttachments
  – pCreateInfo.subpassCount
  – pCreateInfo.pSubpasses
  – pCreateInfo.dependencyCount
  – pCreateInfo.pDependencies
• void vkDestroyRenderPass(
      VkDevice device,
      VkRenderPass renderPass,
      const VkAllocationCallbacks* pAllocator);

Render Pass – VkAttachmentDescription
• pCreateInfo.pAttachments[].flags
  – VK_ATTACHMENT_DESCRIPTION_MAY_ALIAS_BIT
• pCreateInfo.pAttachments[].format
• pCreateInfo.pAttachments[].samples

Render Pass – VkAttachmentDescription continued…
• pCreateInfo.pAttachments[].loadOp
  – VK_ATTACHMENT_LOAD_OP_LOAD
  – VK_ATTACHMENT_LOAD_OP_CLEAR
  – VK_ATTACHMENT_LOAD_OP_DONT_CARE
• pCreateInfo.pAttachments[].storeOp
  – VK_ATTACHMENT_STORE_OP_STORE
  – VK_ATTACHMENT_STORE_OP_DONT_CARE
• pCreateInfo.pAttachments[].stencilLoadOp
• pCreateInfo.pAttachments[].stencilStoreOp
• pCreateInfo.pAttachments[].initialLayout
• pCreateInfo.pAttachments[].finalLayout

Render Pass – VkSubpassDescription
• pCreateInfo.pSubpasses[].flags
• pCreateInfo.pSubpasses[].pipelineBindPoint
  – VK_PIPELINE_BIND_POINT_GRAPHICS
• pCreateInfo.pSubpasses[].inputAttachmentCount
• pCreateInfo.pSubpasses[].pInputAttachments
• pCreateInfo.pSubpasses[].colorAttachmentCount
• pCreateInfo.pSubpasses[].pColorAttachments
• pCreateInfo.pSubpasses[].pResolveAttachments
• pCreateInfo.pSubpasses[].pDepthStencilAttachment
• pCreateInfo.pSubpasses[].preserveAttachmentCount
• pCreateInfo.pSubpasses[].pPreserveAttachments

Render Pass – VkSubpassDependency
• pCreateInfo.pDependencies[].srcSubpass
• pCreateInfo.pDependencies[].dstSubpass
• pCreateInfo.pDependencies[].srcStageMask
• pCreateInfo.pDependencies[].dstStageMask
• pCreateInfo.pDependencies[].srcAccessMask
• pCreateInfo.pDependencies[].dstAccessMask
• pCreateInfo.pDependencies[].dependencyFlags
  – VK_DEPENDENCY_BY_REGION_BIT

Render Pass – Subpass Dependencies
Final = ( ( Surface * AmbientDiffuse ) + Specular ) Over Background
[Figure: dependency graph of five subpasses: Subpass 0 Background, Subpass 1 Surface, Subpass 2 AmbientDiffuse, Subpass 3 Specular, Subpass 4 Final.]
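
A dependency list for an example like this one can be written as plain (source, destination) pairs. The sketch below assumes, as one reading of the formula, that the Final subpass (4) consumes the output of each of the other four; the exact edges of the slide's graph are not recoverable from the transcript. SketchDependency and dependencies_are_forward are hypothetical names, and the struct mirrors only the srcSubpass and dstSubpass fields so that the code builds without vulkan.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Just the two subpass indices of a dependency. */
typedef struct SketchDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
} SketchDependency;

/* Returns 1 if every dependency names valid subpasses and never points
   from a later subpass back to an earlier one, which keeps the graph
   acyclic; returns 0 otherwise. */
int dependencies_are_forward(const SketchDependency *deps, size_t count,
                             uint32_t subpassCount) {
    assert(deps != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (deps[i].srcSubpass >= subpassCount) return 0;
        if (deps[i].dstSubpass >= subpassCount) return 0;
        if (deps[i].srcSubpass > deps[i].dstSubpass) return 0;
    }
    return 1;
}
```

Under the assumption above, the example would be described by the four pairs (0,4), (1,4), (2,4) and (3,4), each of which could also carry the by-region flag if the Final subpass only reads the same pixel it writes.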

Render Pass Commands
• void vkCmdBeginRenderPass(
      VkCommandBuffer commandBuffer,
      const VkRenderPassBeginInfo* pRenderPassBegin,
      VkSubpassContents contents);
• void vkCmdNextSubpass(
      VkCommandBuffer commandBuffer,
      VkSubpassContents contents);
• void vkCmdEndRenderPass(
      VkCommandBuffer commandBuffer);
  – contents
    – VK_SUBPASS_CONTENTS_INLINE
    – VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS
A95
Framebuffer
• VkResult vkCreateFramebuffer(
VkDevice device,
const VkFramebufferCreateInfo* pCreateInfo,
const VkAllocationCallbacks* pAllocator,
VkFramebuffer* pFramebuffer);
– pCreateInfo.renderPass
– pCreateInfo.attachmentCount
– pCreateInfo.pAttachments
– pCreateInfo.width
– pCreateInfo.height
– pCreateInfo.layers
Command Buffer Commands
• vkCmdBind* both
• vkCmdSet* both
• vkCmdDraw* inside
• vkCmdDispatch* outside
• vkCmdCopy* outside
• vkCmdUpdate/Fill outside
• vkCmdClear*Image outside
• vkCmdBlitImage outside
• vkCmdResolveImage outside
• vkCmdClearAttachments inside
• vkCmdSetEvent outside
• vkCmdResetEvent outside
• vkCmdWaitEvents both
• vkCmdPipelineBarrier both
• vkCmdResetQueryPool outside
• vkCmdCopyQueryPoolResults outside
• vkCmdBeginQuery both
• vkCmdEndQuery both
• vkCmdWriteTimestamp both
• vkCmdPushConstants both
• vkCmdBeginRenderPass outside
• vkCmdNextSubpass inside
• vkCmdEndRenderPass inside
• vkCmdExecuteCommands both
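The inside / outside / both list above can be read as a lookup table. The sketch below encodes it that way; the command names and scopes are taken from the slide, while the `CommandRule` type, the prefix-matching scheme, and `may_record` are hypothetical helpers for illustration, not part of Vulkan or its validation layers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum Scope { SCOPE_INSIDE, SCOPE_OUTSIDE, SCOPE_BOTH } Scope;

typedef struct CommandRule {
    const char *prefix; /* command name or family prefix from the slide */
    Scope       scope;  /* where it may be recorded relative to a render pass */
} CommandRule;

/* More specific prefixes are listed before the families they overlap with. */
static const CommandRule kRules[] = {
    { "vkCmdClearAttachments",     SCOPE_INSIDE  },
    { "vkCmdClear",                SCOPE_OUTSIDE }, /* vkCmdClear*Image */
    { "vkCmdSetEvent",             SCOPE_OUTSIDE },
    { "vkCmdSet",                  SCOPE_BOTH    },
    { "vkCmdBind",                 SCOPE_BOTH    },
    { "vkCmdDraw",                 SCOPE_INSIDE  },
    { "vkCmdDispatch",             SCOPE_OUTSIDE },
    { "vkCmdCopy",                 SCOPE_OUTSIDE }, /* incl. CopyQueryPoolResults */
    { "vkCmdUpdate",               SCOPE_OUTSIDE },
    { "vkCmdFill",                 SCOPE_OUTSIDE },
    { "vkCmdBlitImage",            SCOPE_OUTSIDE },
    { "vkCmdResolveImage",         SCOPE_OUTSIDE },
    { "vkCmdResetEvent",           SCOPE_OUTSIDE },
    { "vkCmdResetQueryPool",       SCOPE_OUTSIDE },
    { "vkCmdWaitEvents",           SCOPE_BOTH    },
    { "vkCmdPipelineBarrier",      SCOPE_BOTH    },
    { "vkCmdBeginQuery",           SCOPE_BOTH    },
    { "vkCmdEndQuery",             SCOPE_BOTH    },
    { "vkCmdWriteTimestamp",       SCOPE_BOTH    },
    { "vkCmdPushConstants",        SCOPE_BOTH    },
    { "vkCmdBeginRenderPass",      SCOPE_OUTSIDE },
    { "vkCmdNextSubpass",          SCOPE_INSIDE  },
    { "vkCmdEndRenderPass",        SCOPE_INSIDE  },
    { "vkCmdExecuteCommands",      SCOPE_BOTH    },
};

/* Returns true if `command` may be recorded in the given render pass state.
 * Commands that match no rule are rejected. */
static bool may_record(const char *command, bool inside_render_pass)
{
    assert(command != NULL);
    for (size_t i = 0; i < sizeof kRules / sizeof kRules[0]; ++i) {
        const CommandRule *rule = &kRules[i];
        if (strncmp(command, rule->prefix, strlen(rule->prefix)) == 0) {
            if (rule->scope == SCOPE_BOTH) {
                return true;
            }
            return (rule->scope == SCOPE_INSIDE) == inside_render_pass;
        }
    }
    return false;
}
```

A debug build of an engine could run a check like this before recording each command, although the Vulkan validation layers already report the same class of error.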
Putting it all together – Primary Command Buffer
[Diagram: timeline of a primary command buffer. Labels: Bind, Set, PushConstants, Render Pass, Sub Pass. Commands shown: vkBeginCommandBuffer, vkCmdBind*, vkCmdSet*, vkCmdPushConstants, vkCmdDispatch*, vkCmdBeginRenderPass, vkCmdDraw* (x4), vkCmdExecuteCommands (x2), vkCmdNextSubpass (x2), vkCmdEndRenderPass, vkEndCommandBuffer]
Putting it all together – Secondary Command Buffer within a Subpass
[Diagram: timeline of a secondary command buffer recorded within a subpass. Labels: Bind, Set, PushConstants, Render Pass, Sub Pass. Commands shown: vkBeginCommandBuffer, vkCmdBind* (x3), vkCmdSet*, vkCmdPushConstants (x3), vkCmdDraw* (x7), vkEndCommandBuffer]
Vulkan: Command Buffers & Render Pass / Subpass
Thank You!
Vulkan: Command Buffers & Render Pass / Subpass
BACKGROUND INFO
Definitions
• Render Pass
• Render Pass Instance
• Attachment Description
• Subpass
• Subpass Description
• Subpass Dependencies
• Subpass Dependency Chain
• Framebuffer
• Render Pass Compatibility
Definition – Render Pass
• A collection of:
– Attachment Descriptions
– Subpass Descriptions
– Subpass Dependencies
• Contains at least one subpass
Definition – Render Pass Instance
• The use of a Render Pass in a command buffer
Definition – Subpass
• Reads/Writes a subset of the attachments
Definition – Attachment Description
• The properties of an attachment:
– format
– sample count
– operations to perform on its contents prior to its first use inside a render pass instance (LoadOps)
– operations to perform on its contents after the last use inside a render pass instance (StoreOps)
Definition – Subpass Description
• Input Attachments
• Color Attachments
• Depth/Stencil Attachments
• Resolve Attachments
• Preserve Attachments
Definition – Subpass Dependencies
• Ordering restrictions between pairs of Subpasses
Definition – Subpass Dependency Chain
• A sequence of subpass dependencies in a render pass
Keeping your GPU fed without getting bitten
Tobias Hector
March 2016
Introduction
• You have delicious draw calls
- Yummy!
• Your GPU wants to eat them
- It’s really hungry
• Keep it fed at all times
- So it keeps making pixels
• Don’t want it biting your hand
- Look at those teeth!
Keeping it fed
• GPU needs a constant supply of food
- It doesn’t want to wait
• Certain tasks are tough to digest
- Provide multiple tasks to hide stalls
• Draw calls provide a variety of nutrition
- Vertex work, raster work, tessellation, vitamins A-K, etc.
Keeping it fed
[Diagram: System timeline with CPU and GPU rows, showing frames 0, 1 and 2]
[Diagram: GPU timeline with Vertex and Fragment rows, showing frames 0, 1 and 2]
Not getting bitten
• GPU eating from lots of different plates
- Don’t touch anything it’s using!
• It doesn’t want a mouthful of beef choc chip ice cream
- Don’t change data whilst it’s accessing a resource
• Hey I’m eating that!
- Don’t delete resources whilst the GPU is still using them
On to the serious bits…
Terminology
• Execution Dependency
- Tasks waiting on other tasks
- All synchronization expresses these
• Memory Barrier
- A transition of memory from one state to another
- Flush/invalidate caches
- Determination of access and visibility
• Memory Dependency
- Execution dependency involving a Memory Barrier
Synchronization Types
• 3 types of explicit synchronization in Vulkan
• Pipeline Barriers, Events and Subpass Dependencies
- Within a queue
- Explicit memory dependencies
• Semaphores
- Between Queues
• Fences
- Whole queue operations to CPU
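The three categories above amount to a mapping from "who waits on whom" to a kind of primitive. A minimal sketch of that mapping follows; the enum names and `sync_kind_for` are hypothetical, chosen only to restate the slide in code.

```c
#include <assert.h>

/* Who is waiting on whom (hypothetical names, following the slide). */
typedef enum WaitScope {
    WAIT_WITHIN_QUEUE,   /* later work on the same queue waits on earlier work */
    WAIT_BETWEEN_QUEUES, /* one queue waits on another queue */
    WAIT_QUEUE_TO_CPU    /* the host waits on whole queue operations */
} WaitScope;

typedef enum SyncKind {
    SYNC_BARRIER_EVENT_OR_SUBPASS_DEPENDENCY,
    SYNC_SEMAPHORE,
    SYNC_FENCE
} SyncKind;

/* Picks the category of primitive the slide associates with each scope. */
static SyncKind sync_kind_for(WaitScope scope)
{
    switch (scope) {
    case WAIT_WITHIN_QUEUE:   return SYNC_BARRIER_EVENT_OR_SUBPASS_DEPENDENCY;
    case WAIT_BETWEEN_QUEUES: return SYNC_SEMAPHORE;
    case WAIT_QUEUE_TO_CPU:   return SYNC_FENCE;
    }
    assert(!"unknown WaitScope");
    return SYNC_FENCE;
}
```

Only the first category carries explicit memory dependencies; the choice among barriers, events and subpass dependencies within it depends on the details covered on the following slides.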
Pipeline Barriers
• Pipeline Barriers
- Precise set of pipeline stages
- Memory Barriers to execute
- Single point in time
void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);
Events
• Events
- Same info as Pipeline Barriers
- Operate over a range
void vkCmdSetEvent(
    VkCommandBuffer commandBuffer,
    VkEvent event,
    VkPipelineStageFlags stageMask);
void vkCmdResetEvent(
    VkCommandBuffer commandBuffer,
    VkEvent event,
    VkPipelineStageFlags stageMask);
void vkCmdWaitEvents(
    VkCommandBuffer commandBuffer,
    uint32_t eventCount,
    const VkEvent* pEvents,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);
• CPU interaction
- No explicit CPU wait
- No Memory Barriers
• Warning!
- OS may apply a timeout
- Set events soon after submission
VkResult vkSetEvent(
    VkDevice device,
    VkEvent event);
VkResult vkResetEvent(
    VkDevice device,
    VkEvent event);
VkResult vkGetEventStatus(
    VkDevice device,
    VkEvent event);
Memory Barrier Types
• Global Memory Barrier
- All memory-backed resources
• Buffer Barrier
- For a single buffer range
• Image Barrier
- For a single image subresource
Global Memory Barriers
• Global Memory Barriers
- All memory used by source stages
- Effectively flushes entire caches
• Use when all resources transition
- Cheaper than one-by-one
- Don’t transition unnecessarily!
• User must define prior usage
- Driver not tracking for you
typedef struct VkMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
} VkMemoryBarrier;
Buffer Barriers
• Buffer Barriers
- A single buffer range
- Defines access stages
- Defines queue ownership
• User must define prior usage
- Driver not tracking for you
typedef struct VkBufferMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkBuffer buffer;
    VkDeviceSize offset;
    VkDeviceSize size;
} VkBufferMemoryBarrier;
Image Barriers
• Image Barriers
- A single image subresource
- Defines access stages
- Defines queue ownership
- Defines image layout
• User must define prior usage
- Driver not tracking for you
- For images, this includes prior layout
• Appropriate layouts allow compression
- GPU may use image compression
- Saves bandwidth
- Use GENERAL instead of switching frequently
typedef struct VkImageMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkImageLayout oldLayout;
    VkImageLayout newLayout;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkImage image;
    VkImageSubresourceRange subresourceRange;
} VkImageMemoryBarrier;
Subpass Dependencies
• Subpass dependencies
- Similar info to Pipeline Barriers
- Explicitly between two subpasses
• Memory barrier for attachments
- No other memory barriers
• Pixel local dependencies
- Same fragment/sample location
- Cheap for most implementations
- Use region dependency flag:
- VK_DEPENDENCY_BY_REGION_BIT
typedef struct VkSubpassDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkDependencyFlags dependencyFlags;
} VkSubpassDependency;
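As an illustration of a pixel local dependency, the sketch below fills in a VkSubpassDependency for one common case: a later subpass reads, as an input attachment in its fragment shader, the colour that an earlier subpass wrote. The choice of stages and access masks is an assumption made for the example, not something the slide prescribes, and `color_to_input_dependency` is a hypothetical helper. When the Vulkan header is not installed, the block falls back to local mirrors of the declarations so that it still compiles.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<vulkan/vulkan.h>)
#    include <vulkan/vulkan.h>
#    define SKETCH_HAVE_VULKAN 1
#  endif
#endif

#ifndef SKETCH_HAVE_VULKAN
/* Fallback mirrors of the vulkan.h declarations, used only when the real
 * header is unavailable. Prefer the real header in actual code. */
typedef uint32_t VkFlags;
typedef VkFlags  VkPipelineStageFlags;
typedef VkFlags  VkAccessFlags;
typedef VkFlags  VkDependencyFlags;
#define VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT         0x00000080u
#define VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT 0x00000400u
#define VK_ACCESS_INPUT_ATTACHMENT_READ_BIT           0x00000010u
#define VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT          0x00000100u
#define VK_DEPENDENCY_BY_REGION_BIT                   0x00000001u
typedef struct VkSubpassDependency {
    uint32_t             srcSubpass;
    uint32_t             dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags        srcAccessMask;
    VkAccessFlags        dstAccessMask;
    VkDependencyFlags    dependencyFlags;
} VkSubpassDependency;
#endif

/* Hypothetical helper: `consumer` reads, per pixel, what `producer` wrote
 * to a colour attachment. */
static VkSubpassDependency color_to_input_dependency(uint32_t producer,
                                                     uint32_t consumer)
{
    assert(producer < consumer); /* dependencies point to a later subpass */

    VkSubpassDependency dep;
    dep.srcSubpass      = producer;
    dep.dstSubpass      = consumer;
    dep.srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dep.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    dep.srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dep.dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
    /* Each fragment only needs the result at its own location. */
    dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
    return dep;
}
```

The returned struct would be placed in the `pDependencies` array of the render pass create info; the by-region flag is what tells the implementation that the dependency is pixel local.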
Subpass Dependencies
• Subpass self-dependencies
- Subpasses can wait on themselves
- A pipeline barrier in the subpass
• Forward progress only
- Can’t wait on later stages
- Must wait on earlier or same stage
• Pixel local only between fragments
- Must use flag:
- VK_DEPENDENCY_BY_REGION_BIT
typedef struct VkSubpassDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkDependencyFlags dependencyFlags;
} VkSubpassDependency;
void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);
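The two self-dependency rules above (forward progress only, and by-region between fragment work) can be expressed as a small check. The sketch below is only a model: `LogicalStage` is a hypothetical enum that lists a few graphics stages in logical pipeline order, which is not the same as the numeric order of Vulkan's stage flag bits, and `self_dependency_ok` is an illustrative helper rather than anything in the API.

```c
#include <assert.h>
#include <stdbool.h>

/* A few graphics stages in logical pipeline order (hypothetical enum; these
 * are not the VkPipelineStageFlagBits values). */
typedef enum LogicalStage {
    STAGE_VERTEX_INPUT,
    STAGE_VERTEX_SHADER,
    STAGE_EARLY_FRAGMENT_TESTS,
    STAGE_FRAGMENT_SHADER,
    STAGE_LATE_FRAGMENT_TESTS,
    STAGE_COLOR_ATTACHMENT_OUTPUT,
    STAGE_COUNT
} LogicalStage;

/* Stages that operate per fragment or sample location. */
static bool is_framebuffer_space(LogicalStage stage)
{
    return stage >= STAGE_EARLY_FRAGMENT_TESTS;
}

/* Checks a self-dependency in which `dst` waits on `src`:
 *  - forward progress only: `src` must be the same stage as `dst` or earlier;
 *  - if both stages are per-fragment, the by-region flag is required. */
static bool self_dependency_ok(LogicalStage src, LogicalStage dst,
                               bool by_region)
{
    assert(src < STAGE_COUNT && dst < STAGE_COUNT);
    if (src > dst) {
        return false; /* would wait on a later stage */
    }
    if (is_framebuffer_space(src) && is_framebuffer_space(dst) && !by_region) {
        return false; /* fragment-to-fragment must be pixel local */
    }
    return true;
}
```

For example, fragment shading waiting on vertex shading passes the check with or without the flag, while colour output waiting on colour output passes only when the by-region flag is set.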
Subpass Dependencies
• Subpass external dependencies
- Wait on external tasks
- vkCmdWaitEvents in the subpass
- Outside the current render pass
typedef struct VkSubpassDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkDependencyFlags dependencyFlags;
} VkSubpassDependency;
void vkCmdWaitEvents(
    VkCommandBuffer commandBuffer,
    uint32_t eventCount,
    const VkEvent* pEvents,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);
Semaphores
• Semaphores
- Used to synchronize queues
- Not necessary for single-queue
• Fairly coarse grain
- Per submission batch
- E.g. a set of command buffers
- Multiple per submit command
• Implicit memory guarantees
- Effects visible to future tasks on the same device
- Not guaranteed visible to host
typedef struct VkSubmitInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    const VkPipelineStageFlags* pWaitDstStageMask;
    uint32_t commandBufferCount;
    const VkCommandBuffer* pCommandBuffers;
    uint32_t signalSemaphoreCount;
    const VkSemaphore* pSignalSemaphores;
} VkSubmitInfo;
Fences
• Fences
- Used to synchronize queue to CPU
• Very coarse grain
- Per queue submit command
• Implicit memory guarantees
- Effects visible to future tasks on the same device
- Not guaranteed visible to host
• Useful for frame completion
- E.g. resource multi-buffering
- Still need image layout transitions!
VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence);
VkResult vkResetFences(
    VkDevice device,
    uint32_t fenceCount,
    const VkFence* pFences);
VkResult vkGetFenceStatus(
    VkDevice device,
    VkFence fence);
VkResult vkWaitForFences(
    VkDevice device,
    uint32_t fenceCount,
    const VkFence* pFences,
    VkBool32 waitAll,
    uint64_t timeout);
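The slide mentions resource multi-buffering without spelling out a scheme. One common arrangement, assumed here for illustration, keeps a fixed number of copies of the per-frame resources and reuses a copy only after the fence from the frame that last used it has signalled. The sketch below covers only the bookkeeping; `FRAMES_IN_FLIGHT`, `slot_for_frame` and `frame_to_wait_for` are hypothetical names, and the actual wait would be a vkWaitForFences call on the fence passed to vkQueueSubmit for that earlier frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of copies of the per-frame resources. */
#define FRAMES_IN_FLIGHT 2u

/* Which copy of the per-frame resources a given frame uses. */
static uint32_t slot_for_frame(uint64_t frame)
{
    return (uint32_t)(frame % FRAMES_IN_FLIGHT);
}

/* Before recording `frame`, the CPU must wait on the fence submitted with
 * the frame that last used the same slot. Returns false when the slot has
 * never been used, in which case no wait is needed. */
static bool frame_to_wait_for(uint64_t frame, uint64_t *out_frame)
{
    assert(out_frame != NULL);
    if (frame < FRAMES_IN_FLIGHT) {
        return false;
    }
    *out_frame = frame - FRAMES_IN_FLIGHT;
    return true;
}
```

With two frames in flight, frame 2 shares a slot with frame 0 and therefore waits on frame 0's fence, while frames 0 and 1 start without waiting.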
Wait Idle
• Two commands that ensure execution completes
• vkQueueWaitIdle
- Equivalent to fence signalling on a queue
- Plus wait for idle on the CPU
• vkDeviceWaitIdle
- Equivalent to vkQueueWaitIdle for all queues
• Useful primarily at shutdown
- Use it to quickly ensure all work is done
- Favour other synchronization at all other times
Programmer Guidelines
• Specify EXACTLY the right amount of synchronization
- Too much and you risk starving your GPU
- Miss any and your GPU will bite you
• Pay particular attention to the pipeline stages
- Fiddly but become intuitive as you use them
• Consider Image Layouts
- If your GPU can save bandwidth it will
• Different behaviour depending on implementation
- Test/Tune on every platform you can find!
• Keep your GPU fed without getting bitten!
Questions?