"update on khronos standards for vision and machine learning," a presentation from the...
TRANSCRIPT
© Copyright Khronos Group 2017 - Page 1
Update on Khronos Standards for
Vision and Machine Learning
December 2017
Neil Trevett | Khronos President
NVIDIA | VP Developer Ecosystem
[email protected] | @neilt3d | www.khronos.org
Khronos Mission
Software
Silicon
Khronos is an International Industry Consortium of over 100 companies creating
royalty-free, Open Standard APIs to enable software to access hardware
acceleration for 3D Graphics, Virtual and Augmented Reality, Parallel Computing,
Neural Networks and Vision Processing
Khronos Open Standards
Vision
sensor(s)
Tracking and
Positioning
Geometric scene
reconstruction
Semantic scene
understanding
(Neural Networks)
Generate Low Latency 3D
Content and Augmentations
Interact with sensor, haptic
and display devices
Download 3D
augmentation object
and scene data
VR/AR Application
Standards for Vision and Neural Networks
Neural Net Training Frameworks
Desktop and Cloud Hardware
Trained Networks
Vision and Neural Net Inferencing Runtime
Vision/AI Applications
Embedded/Mobile Vision/Inferencing Hardware
NNEF - Solving Neural Net Fragmentation
Before NNEF - NN Training and Inferencing Fragmentation
NN Authoring Framework 1, 2, 3 -> Inference Engine 1, 2, 3
Every Tool Needs an Exporter to Every Accelerator
With NNEF - NN Training and Inferencing Interoperability
NN Authoring Framework 1, 2, 3 -> NNEF -> Inference Engine 1, 2, 3
Optimization and processing tools
NNEF is a Cross-vendor Neural Net file format
Encapsulates network formal semantics, structure, data formats,
commonly-used operations (such as convolution, pooling, normalization, etc.)
NNEF Status and Roadmap
• NNEF V1.0 provisional release - targeting late December 2017
- Focus on passing trained frameworks to embedded inference engines
- Support ‘First cut’ range of network types
• NNEF Roadmap
- Track development of new network types
- Allow authoring and retraining (3rd party tools)
- Address a wider range of applications (outside vision apps)
- Increase the expressive power of the format
Khronos Open Source Projects: Baseline Importer, File Format Validator, Baseline Exporter
Working Group Participants
OpenVX
Wide range of vision hardware architectures
[Chart: Power Efficiency (X1, X10, X100) vs. Computation Flexibility for Multi-core CPU, GPU Compute, Vision DSPs, Dedicated Hardware]
OpenVX provides a high-level Graph-based abstraction -> Enables Graph-level optimizations!
Can be implemented on almost any hardware or processor! -> Portable, Efficient Vision Processing!
[Diagram: Applications and Middleware (Software Portability) sit above a Vision Processing Graph of Vision Nodes, which runs on Hardware: GPU, Vision Engines, DSP]
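Simple Edge Detector in OpenVX
#include <VX/vx.h>
// Context and graph: OpenVX objects are created from one of these
vx_context ctx = vxCreateContext();
vx_graph g = vxCreateGraph(ctx);
// Declare Input and Output Images (output size and U8 format are assumptions; the slide elides them)
vx_image input = vxCreateImage(ctx, 1920, 1080, VX_DF_IMAGE_U8);
vx_image output = vxCreateImage(ctx, 1920, 1080, VX_DF_IMAGE_U8);
// Declare Intermediate Images (virtual: the implementation chooses size and format)
vx_image horiz = vxCreateVirtualImage(g, 0, 0, VX_DF_IMAGE_VIRT);
vx_image vert = vxCreateVirtualImage(g, 0, 0, VX_DF_IMAGE_VIRT);
vx_image mag = vxCreateVirtualImage(g, 0, 0, VX_DF_IMAGE_VIRT);
// Construct the Graph topology (THRESH is a vx_threshold object created beforehand)
vxSobel3x3Node(g, input, horiz, vert);
vxMagnitudeNode(g, horiz, vert, mag);
vxThresholdNode(g, mag, THRESH, output);
// Compile the Graph
vx_status status = vxVerifyGraph(g);
// Execute the Graph
status = vxProcessGraph(g);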
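To make the data flow of this graph concrete, the following plain C sketch computes roughly what the three nodes compute (Sobel 3x3, gradient magnitude, threshold) in a single pass over an 8-bit grayscale buffer. It does not use OpenVX. The function name `edge_detect`, the zeroed border pixels, and the squared-magnitude comparison are assumptions made for illustration, not behavior taken from the specification.

```c
#include <stdint.h>

/* Illustrative only: Sobel 3x3 -> magnitude -> threshold on an 8-bit image.
 * Assumptions: row-major layout, thresh >= 0, border pixels written as 0. */
void edge_detect(const uint8_t *in, uint8_t *out, int w, int h, int thresh)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                out[y * w + x] = 0;          /* no full 3x3 neighborhood */
                continue;
            }
            const uint8_t *p = in + y * w + x;
            /* Horizontal and vertical Sobel responses */
            int gx = -p[-w - 1] + p[-w + 1]
                     - 2 * p[-1] + 2 * p[1]
                     - p[w - 1] + p[w + 1];
            int gy = -p[-w - 1] - 2 * p[-w] - p[-w + 1]
                     + p[w - 1] + 2 * p[w] + p[w + 1];
            /* Compare squared magnitude to avoid a square root */
            long mag2 = (long)gx * gx + (long)gy * gy;
            out[y * w + x] = (mag2 > (long)thresh * thresh) ? 255 : 0;
        }
    }
}
```

A caller would pass a `w * h` input buffer and an equally sized output buffer. An OpenVX implementation is free to schedule the same three steps very differently, which is the point of the graph abstraction.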
OpenVX Evolution
OpenVX 1.0 Spec released October 2014
Conformant Implementations
OpenVX 1.1 Spec released May 2016
Conformant Implementations
New Functionality
- Expanded Nodes Functionality
- Enhanced Graph Framework
AMD OpenVX Tools
- Open source, highly optimized for x86 CPU and OpenCL for GPU
- “Graph Optimizer”
- Scripting for rapid prototyping, without re-compiling, at production performance levels
http://gpuopen.com/compute-product/amd-openvx/
OpenVX 1.2 Spec released May 2017
Adopters Program November 2017
New Functionality
- Conditional node execution
- Feature detection
- Classification operators
- Expanded imaging operations
Extensions
- Neural Network Acceleration
- Graph Save and Restore
- 16-bit image operation
Safety Critical
- OpenVX 1.1 SC for safety-certifiable systems
OpenVX Roadmap under Discussion
- NNEF Import
- Programmable user kernels with accelerator offload
New OpenVX 1.2 Functions
• Feature detection: find features useful for object detection and recognition
- Histogram of gradients (HOG)
- Template matching
- Local binary patterns (LBP)
- Line finding
• Classification: detect and recognize objects in an image based on a set of features
- Import a classifier model trained offline
- Classify objects based on a set of input features
• Image Processing: transform an image
- Generalized nonlinear filter: Dilate, erode, median with arbitrary kernel shapes
- Non maximum suppression: Find local maximum values in an image
- Edge-preserving noise reduction
• Conditional execution & node predication
- Selectively execute portions of a graph based on a true/false predicate
• Many, many minor improvements
• New Extensions
- Import/export: compile a graph; save and run later
- 16-bit support: signed 16-bit image data
- Neural networks: Layers are represented as OpenVX nodes
[Diagram: condition A selects input B or C as output S]
If A then S ← B else S ← C
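The predicate rule on this slide can be sketched in plain C as follows. This is not the OpenVX API: `node_fn`, `run_predicated`, `node_double` and `node_negate` are hypothetical names, and each "node" is reduced to a function on an integer so that only the selected branch executes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a graph node: maps one input to one output. */
typedef int (*node_fn)(int input);

/* If condition then S <- if_true(input) else S <- if_false(input).
 * Only the selected branch is executed. */
int run_predicated(bool condition, node_fn if_true, node_fn if_false, int input)
{
    return condition ? if_true(input) : if_false(input);
}

/* Two illustrative nodes (assumptions, not OpenVX kernels). */
int node_double(int v) { return v * 2; }
int node_negate(int v) { return -v; }
```

For example, `run_predicated(true, node_double, node_negate, 21)` takes the B path and `run_predicated(false, node_double, node_negate, 21)` takes the C path.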
OpenVX 1.2 and Neural Net Extension
• Convolutional Neural Network topologies can be represented as OpenVX graphs
- Layers are represented as OpenVX nodes
- Layers connected by multi-dimensional tensor objects
- Layer types include convolution, activation, pooling, fully-connected, soft-max
- CNN nodes can be mixed with traditional vision nodes
• Import/Export Extension
- Efficient handling of network Weights/Biases or complete networks
• OpenVX will be able to import NNEF files into OpenVX Neural Nets
[Diagram: Native Camera Control, Vision Nodes, CNN Nodes, Downstream Application Processing]
An OpenVX graph mixing CNN nodes with traditional vision nodes
OpenVX SC - Safety Critical Vision Processing
• OpenVX SC 1.1 - based on OpenVX 1.1 main specification
- Enhanced determinism
- Specification identifies and numbers requirements
• MISRA C clean per Klocwork v10
• Divides functionality into “development” and “deployment” feature sets
- Adds requirement to support import/export extension
OpenVX SC Development Feature Set (Create Graph): Entire graph creation API
-> Verify -> Export -> Binary format (implementation dependent format) -> Import ->
OpenVX SC Deployment Feature Set (Execute Graph): No graph creation API
Safety Critical APIs
New Generation APIs for safety certifiable vision, graphics and compute
e.g. ISO 26262 and DO-178B/C
OpenGL ES 1.0 - 2003: Fixed function graphics
OpenGL ES 2.0 - 2007: Shader programmable pipeline
OpenGL SC 1.0 - 2005: Fixed function graphics subset
OpenGL SC 2.0 - April 2016: Shader programmable pipeline subset
Experience and Guidelines
OpenCL SC TSG Formed
Working on OpenCL SC 1.2
- Eliminate Undefined Behavior
- Eliminate Callback Functions
- Static Pool of Event Objects
OpenVX SC 1.1 Released May 2017
Restricted “deployment” implementation executes on the target hardware by reading the binary format and executing the pre-compiled graphs
Khronos SCAP ‘Safety Critical Advisory Panel’
Guidelines for designing APIs that ease system certification.
Open to Khronos members AND industry experts
OpenCL - Low-level Parallel Programming
• Low-level, explicit programming of heterogeneous parallel compute resources
- One code tree can be executed on CPUs, GPUs, DSPs and FPGAs …
• OpenCL C or C++ language to write kernel programs to execute on any compute device
- Platform Layer API - to query, select and initialize compute devices
- Runtime API - to build and execute kernel programs on multiple devices
• The programmer gets to control:
- What programs execute on what device
- Where data is stored in various speed and size memories in the system
- When programs are run, and what operations are dependent on earlier operations
Host (CPU): Runtime API loads and executes kernels across devices
Devices (CPU, GPU, DSP, FPGA): OpenCL Kernel Code compiled for devices
OpenCL 2.2 Released in May 2017
2011: OpenCL 1.2 - Becomes industry baseline for heterogeneous parallel computing
2013: OpenCL 2.0 - Enables new class of hardware: SVM, Generic Addresses, On-device dispatch
2015: OpenCL 2.1, SPIR-V 1.0 - SPIR-V in Core, Kernel Language Flexibility
2017: OpenCL 2.2, SPIR-V 1.2
OpenCL C++ Kernel Language
- Static subset of C++14
- Templates and Lambdas
SPIR-V 1.2
- OpenCL C++ support
Pipes
- Efficient device-scope communication between kernels
Code Generation Optimizations
- Specialization constants at SPIR-V compilation time
- Constructors and destructors of program scope global objects
- User callbacks can be set at program release time
https://www.khronos.org/opencl/
OpenCL Conformant Implementations
OpenCL 1.0 Specification: Dec08
OpenCL 1.1 Specification: Jun10
OpenCL 1.2 Specification: Nov11
OpenCL 2.0 Specification: Nov13
OpenCL 2.1 Specification: Nov15
[Chart: per-vendor timelines of conformant submissions (spec version | date), grouped into Desktop, Mobile, Embedded and FPGA]
Vendor timelines are first conformant submission for each spec generation
OpenCL as Language/Library Backend
- C++ based Neural network framework
- MulticoreWare open source project on Bitbucket
- Compiler directives for Fortran, C and C++
- Java language extensions for parallelism
- Language for image processing and computational photography
- Single Source C++ Programming for OpenCL
- Vision processing open source project
- Open source software library for machine learning
Hundreds of languages, frameworks and projects using OpenCL to access vendor-optimized, heterogeneous compute runtimes
Over 4,000 GitHub repositories using OpenCL: tools, applications, libraries, languages - up from 2,000 two years ago
SYCL Ecosystem
• Single-source heterogeneous programming using STANDARD C++
- Use C++ templates and lambda functions for host & device code
- Layered over OpenCL
• Fast and powerful path for bringing C++ apps and libraries to OpenCL
- C++ Kernel Fusion - better performance on complex software than hand-coding
- Halide, Eigen, Boost.Compute, SYCLBLAS, SYCL Eigen, SYCL TensorFlow, SYCL GTX
- triSYCL, ComputeCpp, VisionCpp, ComputeCpp SDK …
• More information at http://sycl.tech
SYCL 1.2.1 Publicly Released Today!
• Major update representing two and a half years of work by Khronos members
- Incorporates significant experience gained from three separate implementations
- Feedback from developers of machine learning frameworks such as TensorFlow
- TensorFlow now supports SYCL alongside the original CUDA accelerator back-end
- Layers over OpenCL 1.2
“This is a significant release update for SYCL with an enhanced
ecosystem that matches our intention to support machine learning
and align with modern C++17. SYCL continues to help us to drive
the C++ standard towards heterogeneous support. Our intention is
to move forward quickly with the SYCL roadmap to bring even more
emphasis to machine learning and Safety Critical support, as well as
continued alignment with future ISO C++,”
Michael Wong, SYCL working group chair
C++ Kernel Language: Low Level Control
‘GPGPU’-style separation of device-side kernel source code and host code
Single-source C++: Programmer Familiarity
Approach also taken by C++ AMP and OpenMP
Developer Choice
The development of the two specifications is aligned so code can be easily shared between the two approaches
OpenCL Roadmap
2011: OpenCL 1.2 - OpenCL C Kernel Language
2015: OpenCL 2.1 - SPIR-V in Core; SYCL 1.2 - C++11 Single source programming
2017: OpenCL 2.2 - C++ Kernel Language; SYCL 2.2 - C++14 Single source programming
Industry working to bring Heterogeneous compute to standard ISO C++
- C++17 Parallel STL hosted by Khronos
- Executors - for scheduling work
- “Managed pointers” or “channels” - for sharing data
OpenCL for DSPs
- Embedded imaging, vision and inferencing
- Flexible reduced precision
- Conformance without IEEE 32 Floating Point
- Explicit DMA
Single source C++ programming. Great for supporting C++ apps, libraries and frameworks
Help bring OpenCL-class compute to Vulkan for GPUs
Pervasive Vulkan
All Major GPU Companies shipping Vulkan Drivers for Desktop and Mobile Platforms
http://vulkan.gpuinfo.org/
Desktop, Mobile, Embedded and Console Platforms Supporting Vulkan: Desktop, Android 7.0+, Android TV, Embedded Linux, Nintendo Switch
Including phones and tablets from Google, Huawei, Samsung, Sony, Xiaomi - both premium and mid-range devices
VR Platforms: Oculus Rift, SteamVR, GearVR, Google Daydream
Game Engines
Market Demand for Universal 3D Portability
Game Engines, Native 3D Apps, Browser Engines
Vulkan Universally
Portable Subset
JavaScript and
WebAssembly Native bindings
for ‘nexgen WebGL’
Community Outreach at GDC 2017
Create a hybrid Portability API?
Feedback - AVOID CREATING A FOURTH API!!!
Would need new specification, CTS, Documentation.
Additional developer learning curve.
A whole new specification to name, brand, promote.
Would INCREASE industry fragmentation
Tools, Layers, API Libraries, Shader Translators
MAP Vulkan to Metal and DX12
Vulkan Portability TSG Process
API
Overlap
Analysis
Metal Shading
Language
HLSL
Vulkan Portability Deliverables
1. Vulkan Subset Diff Spec
2. Vulkan Subset Development Layer
3. Vulkan Subset API Library over DX12/Metal
4. SPIRV-Cross Translator
5. Vulkan Subset Conformance Tests
Expand/test existing
open source SPIRV-Cross Tool
Possible proposals for Vulkan extensions for
enhanced portability (and possibly Web
robustness) sent to Vulkan WG
Identify Vulkan
features not directly mappable
to DX12 and Metal
New Vulkan functionality may affect the
overlap analysis
Layers, APIs, Translators and Tests all to be
developed and released in open source
Open source project with similar goals https://github.com/gfx-rs/gfx
Vulkan on iOS and macOS https://moltengl.com/moltenvk/
Vulkan Evolution
Feb16: Vulkan 1.0
Explicit Access to GPU Acceleration
Vulkan Extensions
Maintenance updates plus additional functionality
- Explicit Building Blocks for VR
- Explicit Building Blocks for Homogeneous Multi-GPU
- Enhanced Windows System Integration
- Increased Shader Language Flexibility
- Enhanced Cross-Process and Cross-API Sharing
Vulkan Roadmap
Regular Vulkan Core Releases
- Integrates proven KHR extensions
Roadmap Discussions
Pushes forward the envelope of GPU Acceleration:
- Enhanced Compute and Language Flexibility
- Optimized Vision and Inferencing Acceleration
- Ray Tracing
Widened Platform Support
- Through Vulkan Portability Initiative
- Including Vulkan on macOS and iOS
Strengthened Ecosystem and SDK
- Enhanced developer and debugging tools
- Regression testing for SDK stability
- Enhanced Conformance Testing (API now has 198K test cases - up from 107K last year)
- Compiler robustness - including HLSL support
Clspv OpenCL C to Vulkan Compiler
• Experimental collaboration between Google, Codeplay, and Adobe
- Successfully tested on over 200K lines of Adobe OpenCL C production code
- Released in open source https://github.com/google/clspv
- Tracks top-of-tree LLVM and clang, not a fork
• Uses new Vulkan extensions to support OpenCL C compute operations
- VK_KHR_16bit_storage/SPV_KHR_16bit_storage
- VK_KHR_variable_pointers/SPV_KHR_variable_pointers
• Compiles OpenCL C’s programming model to Vulkan’s SPIR-V execution environment
- Proof-of-concept that OpenCL compute can be brought seamlessly to Vulkan
Vulkan Universal Portability and OpenCL-class compute in Vulkan -> Universally Portable Graphics and Advanced Compute
Layered Vision/Neural Net Ecosystem
Application
Programmable Vision Processors
Dedicated Vision Hardware
Implementers may use OpenCL or Vulkan to implement OpenVX nodes on programmable processors
Developers can then use OpenVX to easily connect those nodes into a graph
The OpenVX graph abstraction enables implementers to optimize execution across diverse hardware architectures for optimal power and performance
OpenVX enables the graph to be extended to include hardware architectures that don’t support programmable APIs
Vulkan Roadmap
Enhanced compute - vision and inferencing on any platform with GPU acceleration
OpenCL Roadmap
Flexible precision for widespread deployment on low-cost embedded processors
Khronos Cooperative Framework
Khronos Board: Strategy, budget and oversight
One vote per industry member
API Working Groups (Industry, Academic and Associate members)
Royalty-Free Specifications
Conformance Tests
Implementations, Tools, Documentation, Samples
Open and royalty-free. Often open sourced
Educator Guidelines, Courseware Materials
Adopters Programs
Adopters: Build conformant implementation and products
Developers: Develop applications using the APIs
Educators / Certifiers: Create Courses, Training and Certification
Advisory Panels: By invitation, no fee. Provide requirements and draft spec feedback
Industry and Community: Feedback and contributions on released materials
Membership Fees
- Academic: $1,000
- Small companies: $175/employee (min $3,500)
- Non-profits: $7,500
- Larger companies: $18,000
New Open Source Engagement Model
• Khronos is open sourcing specification sources,
conformance tests, tools
- Merge requests welcome from the community
(subject to review by OpenCL working group)
• Deeper Community Enablement
- Mix your own documentation!
- Contribute and fix conformance tests
- Fix the specification, headers, ICD etc.
- Contribute new features (carefully)
Source Materials for Specifications and Reference Documentation CONTRIBUTED Under Khronos IP Framework
(you won’t assert patents against conformant implementations, and license copyright for Khronos use)
Spec and Ref Language Source: Redistribution under CC-BY 4.0
- Spec and Ref Language Source and derivative materials. Re-mixable under CC-BY by the industry and community
- Community built documentation and tools
Spec Build System and Scripts: Contributions and Distribution under Apache 2.0
Conformance Test Suite Source: Contributions and Distribution under Apache 2.0
- Anyone can test any implementation at any time
Khronos builds and Ratifies Canonical Specification under Khronos IP Framework. No changes or re-hosting allowed
Khronos Adopters Program: Conformant Implementations can use trademark and are covered by Khronos IP Framework
Khronos Advisory Panels
Khronos Members (Working Group)
- Any company can join
- Membership Fee
- Sign NDA and IP Framework
Khronos Advisors (Advisory Panel)
- Invited industry experts
- $0 Cost
- Sign NDA and IP Framework
Working Group -> Advisory Panel: Specification drafts and invitations for requirements and feedback
Advisory Panel -> Working Group: Requirements and feedback on specification drafts
Shared Email list and Repository: Hosted by Khronos. Under Khronos NDA
Advisory Panels Active for Vulkan, OpenCL/SYCL and NNEF
Need for Camera Control API?
• Advanced control of ISP and camera subsystem - with cross-platform portability
- Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
- Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
- Cross sensor synch: e.g. synch of camera and MEMS sensors
- Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
- Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
Image Signal
Processor (ISP)
Defines control of Sensor, Color Filter Array
Lens, Flash, Focus, Aperture
Auto Exposure (AE)
Auto White Balance (AWB)
Auto Focus (AF)
Stream of Images for
Vision Processing
OpenKCAM standard is
currently on ice – do we
need to restart?
Please Get Involved!
• Khronos is creating cutting-edge royalty-free open standards
- For embedded vision and neural network acceleration
• Khronos standards are key to many emerging markets such as
3D rendering and advanced gaming, Augmented and Virtual Reality,
Embedded Vision and machine learning
- Advanced next generation capabilities for ALL platforms
• Khronos encourages your participation
- www.khronos.org
- @neilt3d
OpenVX - Graph-Level Abstraction
• OpenVX developers express a graph of image operations (‘Nodes’)
- Using a C API
• Nodes can be executed on any hardware or processor coded in any language
- Implementers can optimize under the high-level graph abstraction
• Graphs are the key to run-time power and performance optimizations
- E.g. Node fusion, tiled graph processing for cache efficiency etc.
Feature Extraction Example Graph
OpenVX Nodes: Color Conversion, Channel Extract, Image Pyramid, Optical Flow, Harris Track
Data objects: RGB Frame, YUV Frame, Gray Frame, Pyr(t), Array of Keypoints, Array of Features (Ftr(t-1))
Camera Input and Rendering Output connect to the OpenVX Graph
OpenVX Efficiency through Graphs
Memory Management: Reuse pre-allocated memory for multiple intermediate data -> Less allocation overhead, more memory for other applications
Kernel Fusion: Replace a sub-graph with a single faster node -> Better memory locality, less kernel launch overhead
Graph Scheduling: Split the graph execution across the whole system: CPU / GPU / dedicated HW -> Faster execution or lower power consumption
Data Tiling: Execute a sub-graph at tile granularity instead of image granularity -> Better use of data cache and local memory
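As a small illustration of the kernel fusion idea, the plain C sketch below runs a two-node sub-graph (scale, then threshold) first as two separate passes with an intermediate buffer, and then as one fused pass that produces the same output without the buffer. The node names, the `uint32_t` intermediate type and the 0/255 output convention are assumptions for illustration. This is not how any particular OpenVX implementation performs fusion.

```c
#include <stddef.h>
#include <stdint.h>

/* Unfused node 1: scale every pixel into an intermediate buffer. */
void scale_node(const uint8_t *in, uint32_t *tmp, size_t n, uint32_t gain)
{
    for (size_t i = 0; i < n; ++i)
        tmp[i] = (uint32_t)in[i] * gain;
}

/* Unfused node 2: threshold the intermediate buffer into the output. */
void threshold_node(const uint32_t *tmp, uint8_t *out, size_t n, uint32_t thresh)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (tmp[i] > thresh) ? 255 : 0;
}

/* Fused node: one pass over the data, no intermediate buffer. */
void scale_threshold_fused(const uint8_t *in, uint8_t *out, size_t n,
                           uint32_t gain, uint32_t thresh)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = ((uint32_t)in[i] * gain > thresh) ? 255 : 0;
}
```

The fused version reads each input once and never writes `tmp`, which is the memory-locality benefit the slide refers to. A graph compiler can make this substitution only because it sees the whole graph before execution.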
SPIR-V Ecosystem
SPIR-V
• Khronos defined cross-API IR
• Native graphics and parallel compute
• Easily parsed/extended 32-bit stream
• Data object/control flow retained for effective code generation/translation
Front-ends: glslang (GLSL, HLSL), OpenCL C Front-end, OpenCL C++ Front-end, third party kernel and shader languages
LLVM: LLVM to SPIR-V Bi-directional Translator
Khronos open source tools and translators
- SPIR-V Validator
- SPIR-V (Dis)Assembler
- SPIRV-Cross: GLSL, HLSL, MSL
- SPIRV-opt | SPIRV-remap
https://github.com/KhronosGroup/SPIRV-Tools
IHV Driver Runtimes
Additional Intermediate Forms
Khronos liaising with Clang/LLVM Community
E.g. discussing SPIR-V as supported Clang target
SPIR-V Optimizations
• Inlining (exhaustive)
• Store/Load Elimination
• Dead Code Elimination
• Dead Branch Elimination
• Common Uniform Elimination
Coming
• Loop Unrolling and Constant Folding
• Common Subexpression Elimination
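To show what "easily parsed 32-bit stream" means in practice, here is a minimal C sketch that reads the five-word SPIR-V module header and walks the instruction stream using each instruction's leading word count. The struct and function names are hypothetical, and the sketch assumes the words are already in host byte order. It performs no validation beyond the magic number and stream bounds.

```c
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC 0x07230203u

/* Hypothetical holder for the header fields this sketch reads. */
typedef struct {
    unsigned major;
    unsigned minor;
    uint32_t generator;
    uint32_t bound;
} spirv_header;

/* Header words: magic, version, generator, id bound, reserved schema.
 * Returns 0 on success, -1 if the stream is too short or the magic is wrong. */
int spirv_parse_header(const uint32_t *words, size_t count, spirv_header *out)
{
    if (count < 5 || words[0] != SPIRV_MAGIC)
        return -1;
    out->major = (words[1] >> 16) & 0xFFu;   /* version word: 0 | major | minor | 0 */
    out->minor = (words[1] >> 8) & 0xFFu;
    out->generator = words[2];
    out->bound = words[3];
    return 0;
}

/* Each instruction starts with a word holding (word_count << 16) | opcode,
 * so a parser can step through instructions without decoding them.
 * Returns the instruction count, or -1 on a malformed stream. */
long spirv_count_instructions(const uint32_t *words, size_t count)
{
    long n = 0;
    size_t i = 5;                            /* skip the header */
    while (i < count) {
        uint32_t wc = words[i] >> 16;
        if (wc == 0 || wc > count - i)
            return -1;
        i += wc;
        ++n;
    }
    return n;
}
```

Because every instruction carries its own length, a tool can skip instructions it does not recognize, which is one reason the format is straightforward to extend.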
Vulkan and New Generation 3D APIs
Only Apple
Only Windows 10
Cross Platform
Clean, modern architecture | Low overhead, explicit GPU access
Portable across desktop and mobile | Multi-thread / multi-core friendly
Efficient, low-latency, predictable performance
Vulkan Explicit GPU Control
Layered GPU Control: Application (single thread per context) -> High-level Driver Abstraction (context management, memory allocation, full GLSL compiler, error detection) -> GPU
- Complex drivers cause overhead and inconsistent behavior across vendors
- Always active error handling
- Full GLSL preprocessor and compiler in driver
- OpenGL vs. OpenGL ES
Explicit GPU Control: Application (memory allocation, thread management, synchronization, multi-threaded generation of command buffers) -> Thin Driver -> GPU
- Multiple Front-end Compilers: GLSL, HLSL etc.
- Loadable debug and validation layers
- SPIR-V pre-compiled shaders
- Resource management offloaded to app: low-overhead, low-latency driver
- Consistent behavior: no ‘fighting with driver heuristics’
- Validation and debug layers loaded only when needed
- SPIR-V intermediate language: shading language flexibility
- Multi-threaded command creation. Multiple graphics, command and DMA queues
- Unified API across all platforms with feature set flexibility
Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility
Vulkan = high performance and low latency 3D and GPU Compute. Ideal for VR/AR applications
OpenCL Ecosystem
Hardware Implementers: Desktop/Mobile/Embedded/FPGA
100s of applications using OpenCL acceleration: Rendering, visualization, video editing, simulation, image processing, vision and neural network inferencing
Single Source C++ Programming
Portable Kernel Intermediate Language
Core API and Language Specs
OpenCL 2.2 - Top to Bottom C++