MOTOROLA and the Stylized M Logo are trademarks or registered trademarks of Motorola Trademark Holdings, LLC. All other trademarks are the property of their respective owners. © 2010 Motorola Mobility, Inc. All rights reserved.
Hardware-accelerated computing and rendering with Node.JS (node-webcl, node-webgl, node-glfw, node-image)
Mikaël Bourges-Sévenier, Motorola Mobility September 6, 2012
Content
§ Motivation
  • General-purpose computing on GPU
  • Why Node.JS?
  • Notes on multi-threading
§ Overview of my Node.JS modules
  • Architecture
  • Installation
  • Examples / Demos
§ Understanding WebCL / OpenCL
  • OpenCL features
  • OpenCL model
  • WebCL API
  • "Hello World" code walkthrough
§ Perspectives
2012-09-06 © 2012 Motorola Mobility, Inc.
Motivation: General-Purpose Computing on GPU
§ More and more data to process
  • Signal & image processing
  • Data mining, pattern matching, statistics…
  • Machine intelligence
  • Financial analysis
  • Physics engines, ray-tracing…
§ CPUs tend to have up to 16 cores
  • General purpose
  • Launch a thread per hardware execution context
§ GPUs have 100s of cores
  • Persistent threads
  • Launch a workgroup per hardware execution context
  • Designed for data-parallel computations
  • Originally developed for 3D vector graphics
  • More transistors devoted to processing than to caching and control
[Figure: CPU vs. GPU chip layout (Control, ALU, Cache, DRAM). Source: David Luebke, The democratization of Parallel Computing, SC07]
Motivation: CPU-GPU Systems-on-a-Chip (SOCs)
AMD Trinity
Intel Ivy Bridge
Nvidia Kepler
Motivation: Why Node.JS?
§ V8 JavaScript engine, the same as in Chrome browsers; cross-platform
§ Fast prototyping
  • For app development
  • Great for testing features before adding them to browsers
§ Modular, easily extensible
  • Tons of great modules
  • Great for developing/testing/maintaining modular apps
  • Great for developing new features
§ For server-side applications (no GUI, or a web browser as GUI)
  • With lots of data to process (great for OpenCL)
§ For client-side development
  • Same JS code running on Node.JS and browsers
  • Faster than browsers (fewer layers)
Notes on multi-threading
§ JavaScript has no threading support: it is an event-based language
§ Node.JS is an implementation of the Reactor pattern
  • Long operations run in worker threads
§ For (large) data-intensive tasks that can be parallelized, a GPU offers more processing power than a CPU
[Figure: Node.JS event loop (single thread). Clients register callbacks; long operations are deferred to worker threads, which node.js handles internally; responses come back via callbacks and are sent to clients.]
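The Reactor idea on this slide can be sketched in plain JavaScript. This is only an illustration, not how Node.JS is implemented: the names Reactor, register, emit, and run are hypothetical, and the queue is drained synchronously so that the dispatch order is easy to follow.

```javascript
// Minimal reactor sketch (hypothetical names, not Node.JS internals):
// handlers are registered per event type, events are queued, and a
// single loop dispatches them one at a time.
function Reactor() {
  this.handlers = {};   // event type -> array of callbacks
  this.queue = [];      // pending events, processed in FIFO order
}

Reactor.prototype.register = function (type, callback) {
  (this.handlers[type] = this.handlers[type] || []).push(callback);
};

Reactor.prototype.emit = function (type, payload) {
  this.queue.push({ type: type, payload: payload });
};

// Single-threaded loop: each event is fully handled before the next one.
Reactor.prototype.run = function () {
  while (this.queue.length > 0) {
    var event = this.queue.shift();
    var callbacks = this.handlers[event.type] || [];
    for (var i = 0; i < callbacks.length; ++i) callbacks[i](event.payload);
  }
};

var log = [];
var reactor = new Reactor();
reactor.register('request', function (name) {
  log.push('request ' + name);
  // A handler never blocks: it queues follow-up work and returns.
  reactor.emit('response', name);
});
reactor.register('response', function (name) {
  log.push('response ' + name);
});
reactor.emit('request', 'a');
reactor.emit('request', 'b');
reactor.run();
console.log(log.join(', '));
```

Both requests are accepted before either response is produced, which is the property that lets a single thread serve many clients.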
OVERVIEW OF MY NODE.JS MODULES
My Node.JS modules
[Figure: module architecture. The App runs on node.js (V8) and uses the Node.JS modules node-webcl, node-webgl, node-glfw, and node-image. Native libraries and hardware APIs underneath: OpenCL, OpenGL, GLFW, AntTweakBar, FreeImage. Legend: native/hardware, Node.JS module, required dependency, optional dependency.]
node-webgl
- Started from the Blue Lava demo for WebOS: http://minimason.no.de/
- Emulates the WebGL 1.x API
- Uses desktop OpenGL
- Requires node-glfw
- Implements DOM document
- Implements DOM mouse & key events
- Implements HTML <Image>
node-glfw
- Wrapper around GLFW, a cross-platform library for opening a window, creating an OpenGL context, and managing input
- Relies on GLEW to get the right OpenGL extensions
- Over time, added AntTweakBar to get a simple menu system
node-image
- Wraps FreeImage
- Optimized native buffers for node-webcl and node-webgl
node-webcl
- Implements the WebCL API
- Implements WebCL - WebGL interop.
- Follows the WebCL spec on a weekly basis
Installation
§ Requires Node.JS >= 0.7.x (due to TypedArrays)
§ Mac OSX 10.7, Microsoft Windows 7, Ubuntu 10.10+
§ Make sure node-gyp is installed
§ Get the latest OpenCL 1.1+ / OpenGL 2+ drivers for your GPU
  • Intel OpenCL SDK, AMD APP SDK, Nvidia CUDA 4.x SDK
§ Install the latest native libraries in your library and include paths
  • GLFW http://www.glfw.org/
  • GLEW http://glew.sourceforge.net/
  • FreeImage http://freeimage.sourceforge.net/
  • AntTweakBar http://www.antisphere.com/Wiki/tools:anttweakbar
§ Install node-webcl or node-webgl
  • npm install node-webcl will also install node-webgl
  • npm install node-webgl will also install node-glfw
§ For examples, also install: npm install optimist
Usage
for node-webgl
    WebGL = require('node-webgl');
    Image = WebGL.Image;
    document = WebGL.document();
    window = document;
    canvas = document.createElement("my_canvas");
    gl = canvas.getContext("experimental-webgl");
for node-webcl
    WebCL = require('node-webcl');
Test your installation (node-webgl)
§ GL = Graphics Library
§ cd node-webgl
§ node examples/lightgl/shadowmap.js
§ node test/cube.js
[Screenshots: examples/lightgl/shadowmap.js; test/lesson08.js (node-image); test/cube.js (AntTweakBar)]
Test your installation (node-webcl)
§ CL = Computing Language
§ cd node-webcl
§ node examples/DeviceQuery.js
§ node examples/VectorAdd.js
[Screenshots: test/image.js (PBO); examples/sine.js (FBO); examples/apple/qjulia/qjulia.js (PBO, AntTweakBar)]
UNDERSTANDING WEBCL & OPENCL
What is WebCL?
§ WebCL brings parallel computing to the Web through a secure JavaScript binding to OpenCL 1.1 (2011)
  • Open standard, royalty-free
  • Platform independent
  • Device independent
  • Being standardized by Khronos
§ First public working draft: April 2012
  • http://www.khronos.org/webcl/
OpenCL overview
§ The OpenCL framework has 2 parts
  1. Host API: C-based, cross-platform, object-oriented
     • Commands to send/receive data and control execution on devices
  2. Kernels
     • Run on devices
     • Use a subset of C99 plus extensions
     • Vector extensions (<type>N)
     • No recursion, no function pointers
     • No dynamic memory (malloc, free…), no standard libc functions (memcpy…)
     • Kernels are akin to shaders in WebGL
§ Features
  • Well-defined numerical accuracy for both integers and floats
  • Rich set of built-in functions (as in GLSL, and more), but no random function
  • Close to the hardware: allows control over memory use and over thread scheduling
OpenCL Device Model
§ A host is connected to one or more compute devices
§ Compute device
  • A collection of one or more compute units (~ cores)
  • A compute unit is composed of one or more processing elements (~ threads)
  • Processing elements execute code as SIMD or SPMD
[Figure: a host (PC) connected to compute devices (GPU, CPU, DSP, FPGA…); each compute device contains compute units (cores), each made of processing elements (threads)]
examples/DeviceQuery.js
§ Queries parameters of all OpenCL devices attached to your computer
§ Example: on a MacBook Pro (early 2011), OSX 10.8
    Found 2 devices
    Parameter                        | Device 1 (CPU)                            | Device 2 (GPU)
    DEVICE_NAME                      | Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz | ATI Radeon HD 6750M
    DEVICE_VENDOR                    | Intel                                     | AMD
    DRIVER_VERSION                   | 1.1                                       | 1.0
    DEVICE_VERSION                   | OpenCL 1.2                                | OpenCL 1.1
    DEVICE_PROFILE                   | FULL_PROFILE                              | FULL_PROFILE
    DEVICE_OPENCL_C_VERSION          | OpenCL C 1.2                              | OpenCL C 1.1
    DEVICE_TYPE                      | cpu                                       | gpu
    DEVICE_MAX_COMPUTE_UNITS         | 8                                         | 6
    DEVICE_MAX_WORK_ITEM_DIMENSIONS  | 3                                         | 3
    DEVICE_MAX_WORK_ITEM_SIZES       | 1024 / 1 / 1                              | 1024 / 1024 / 1024
    DEVICE_MAX_WORK_GROUP_SIZE       | 1024                                      | 1024
    DEVICE_MAX_CLOCK_FREQUENCY       | 2200 MHz                                  | 600 MHz
DeviceQuery
§ CPU (Intel Core i7-2720QM)
  • 4 cores, hyperthreaded => 8 compute units
  • Up to 1024 threads in 1D, at 2.2 GHz
§ GPU (ATI Radeon HD 6750M)
  • 6 compute units
  • Up to 1024 threads but in 3 dimensions, at 600 MHz
OpenCL Execution Model
§ Kernel
  • Basic unit of executable code (~ DLL entry point)
  • Data-parallel or task-parallel
§ Program
  • Collection of kernels and functions called by kernels
  • Analogous to a dynamic library (DLL)
§ Command queue
  • Controls operations on OpenCL objects (memory transfers, kernel execution, synchronization)
  • Commands are queued in order
  • Execution is in-order or out-of-order
  • Applications may use multiple command queues per device
§ Work-item
  • An execution of a kernel by a processing element (~ thread)
§ Work-group
  • A collection of work-items that execute on a single compute unit (~ core)
[Figure: a context holding a GPU and a CPU, each fed by its own command queue]
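OpenCL Work-group 2D analogy
§ # work-items = # pixels
§ # work-groups = # tiles
§ Work-group size = tileW * tileH
§ All work-items in a work-group run together on one compute unit and can synchronize with each other
[Figure: an image split into tiles; "Global" labels the whole image, "Local" labels one tile]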
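The pixel/tile analogy can be checked with a little arithmetic. The sketch below is plain JavaScript on the host, with hypothetical image and tile sizes; the helper name locate is made up for this illustration and mirrors what get_group_id and get_local_id would report for a given global id.

```javascript
// Hypothetical image and tile sizes, chosen only for illustration.
var imageW = 64, imageH = 32;   // global work size: one work-item per pixel
var tileW = 16, tileH = 8;      // local work size: one work-group per tile

var numWorkItems = imageW * imageH;
var numWorkGroups = (imageW / tileW) * (imageH / tileH);
var workGroupSize = tileW * tileH;

// Map a pixel (global id) to its tile (group id) and its position
// inside that tile (local id).
function locate(gx, gy) {
  return {
    group: [Math.floor(gx / tileW), Math.floor(gy / tileH)],
    local: [gx % tileW, gy % tileH]
  };
}

var p = locate(37, 21);
console.log(numWorkItems, numWorkGroups, workGroupSize);
console.log(p.group, p.local);
```

In each dimension, global id = group id * tile size + local id, which is the relation a kernel relies on when it stages one tile in local memory.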
OpenCL Kernel
§ Defined on an N-dimensional computation domain
§ A kernel is executed at each point of the computation domain
    // In JavaScript
    function multiply(a, b, n) {
      var c = [];
      for (var i = 0; i < n; ++i)
        c[i] = a[i] * b[i];
      return c;
    }

    // In OpenCL C (based on C99)
    /**
     * @param a, b, c are buffers in global memory
     * @param n number of elements in a, b, and c
     */
    __kernel void multiply(__global const float *a,
                           __global const float *b,
                           __global float *c,
                           unsigned int n)
    {
      unsigned int tid = get_global_id(0); // thread number
      if (tid >= n) return;                // don't read or write past the end of the buffers
      c[tid] = a[tid] * b[tid];
    }
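To see how "executed at each point of the computation domain" relates to the loop version, here is a CPU-side emulation in plain JavaScript. It is only a sketch: launch1D is a hypothetical helper, and real work-items run in parallel on the device rather than in a sequential loop.

```javascript
// CPU-side emulation of a 1D launch (illustration only).
// The kernel body is called once per point of the domain.
function launch1D(globalSize, kernel) {
  for (var gid = 0; gid < globalSize; ++gid) {
    kernel(gid);   // gid plays the role of get_global_id(0)
  }
}

var n = 5;
var a = new Float32Array([1, 2, 3, 4, 5]);
var b = new Float32Array([2, 2, 2, 2, 2]);
var c = new Float32Array(n);

// Launch more work-items than elements (8 > 5); the guard makes the
// extra work-items return without touching the buffers.
launch1D(8, function multiply(tid) {
  if (tid >= n) return;
  c[tid] = a[tid] * b[tid];
});

console.log(Array.prototype.slice.call(c));
```

The loop over i in the JavaScript version disappears from the kernel: the runtime supplies the index, and the kernel only describes the work for one element.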
OpenCL Memory Model
§ On host
  • CPU RAM
§ On compute device
  • Global memory = GPU RAM
  • Constant memory = cached global memory
  • Texture memory = cached global memory optimized for streaming reads
  • Local memory = high-speed memory shared among work-items of a work-group (~ L1 cache)
  • Private memory = registers of a work-item, very fast memory
§ Memory management is explicit
  • App must move data host ➞ global ➞ local and back
[Figure: memory hierarchy. The host, with host memory, talks to the compute device through command queues and API calls. The compute device has global memory with constant and texture caches; workgroups 1..N each have local memory; work-items 1..M in each workgroup have private memory.]
WebCL API
§ Same OO model as OpenCL, with JS classes
§ WebCL is a global object
[Figure: WebCL class diagram, grouped into platform layer, compiler layer, and runtime layer. Classes shown: WebCL, WebCLPlatform, WebCLExtension, WebCLDevice, WebCLContext, WebCLProgram, WebCLKernel, CommandQueue, Event, Sampler, and WebCLMemoryObject {abstract} with WebCLBuffer and WebCLImage.]
“HELLO WORLD” CODE WALKTHROUGH
WebCL sequence (host side)
§ Create context
§ Compile kernels
§ Setup command queues
§ Setup kernel arguments
§ Execute commands
§ Read results
[Figure: flowchart of the host-side steps, grouped into platform layer, compiler, and runtime layer: Select platform; Select device; Create context; Load and compile kernels on devices; Create buffers to store data on devices; Create command queues for each device; Update kernel arguments; Send data to devices using their command queues; Send commands to devices using their command queues; Get data from devices using their command queues; Release resources]
WebCL sequence (host side) [1/6]
    // create the OpenCL context
    try {
      clContext = WebCL.createContext({
        deviceType: WebCL.DEVICE_TYPE_GPU
      });
    } catch (err) {
      throw "Error: Failed to create context! " + err;
    }
    var devices = clContext.getInfo(WebCL.CONTEXT_DEVICES);
    if (!devices) {
      throw "Error: Failed to retrieve compute devices for context!";
    }
    var clDevice = devices[0]; // device used in the following steps
WebCL sequence (host side) [2/6]
    <script id="multiply_script" type="x-webcl">
    __kernel void multiply(__global const float *a,
                           __global const float *b,
                           __global float *c,
                           unsigned int n)
    {
      unsigned int tid = get_global_id(0); // thread number
      if (tid >= n) return;                // don't read or write past the end of the buffers
      c[tid] = a[tid] * b[tid];
    }
    </script>

    // Create the compute program from the source buffer (text)
    clProgram = clContext.createProgram(getSource("multiply_script"));
    // Build the program executable
    try {
      clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
    } catch (err) {
      throw "Error: Failed to build program executable!\n"
            + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
    }
    clKernel = clProgram.createKernel("multiply");
WebCL sequence (host side) [3/6]
    var BUFFER_SIZE = 10;
    // The kernel takes float buffers, so the host arrays hold 32-bit floats
    var A = new Float32Array(BUFFER_SIZE);
    var B = new Float32Array(BUFFER_SIZE);

    // store data in A and B
    …

    var size = BUFFER_SIZE * Float32Array.BYTES_PER_ELEMENT; // size in bytes
    // Create device buffers for A and B (host contents are copied later with enqueueWriteBuffer)
    var aBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);
    var bBuffer = clContext.createBuffer(WebCL.MEM_READ_ONLY, size);

    // Create buffer for C to read results
    var cBuffer = clContext.createBuffer(WebCL.MEM_WRITE_ONLY, size);
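The kernel in this walkthrough declares float buffers, so the element type of the host arrays matters as much as their size in bytes. The following plain JavaScript sketch (no WebCL needed; variable names are illustrative) shows the byte-size computation and what happens when the same four bytes are read as an integer and as a float.

```javascript
var BUFFER_SIZE = 10;
var floats = new Float32Array(BUFFER_SIZE);
var sizeInBytes = BUFFER_SIZE * Float32Array.BYTES_PER_ELEMENT;

// Two views over the same 4 bytes.
var bytes = new ArrayBuffer(4);
var asUint = new Uint32Array(bytes);
var asFloat = new Float32Array(bytes);

asUint[0] = 3;             // integer bit pattern 0x00000003
var misread = asFloat[0];  // read back as a float: a tiny denormal, not 3

asFloat[0] = 3;            // IEEE 754 single-precision 3.0
var bits = asUint[0];      // bit pattern of 3.0

console.log(sizeInBytes, floats.byteLength);
console.log(misread, '0x' + bits.toString(16));
```

Passing integer data to a kernel that expects float would give the device exactly this kind of misread value, which is why the host array type should match the kernel signature.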
WebCL sequence (host side) [4/6]
    // Kernel signature, for reference:
    // __kernel void multiply(__global const float *a,
    //                        __global const float *b,
    //                        __global float *c,
    //                        unsigned int n);

    // Set kernel args
    clKernel.setArg(0, aBuffer);
    clKernel.setArg(1, bBuffer);
    clKernel.setArg(2, cBuffer);
    clKernel.setArg(3, BUFFER_SIZE, WebCL.type.UINT);

    // Create command queue
    clQueue = clContext.createCommandQueue(devices[0]);
    // Enqueue non-blocking writes of A and B to the device buffers
    clQueue.enqueueWriteBuffer(aBuffer, false, 0, size, A);
    clQueue.enqueueWriteBuffer(bBuffer, false, 0, size, B);
WebCL sequence (host side) [5/6]
    // Execute (enqueue) kernel
    clQueue.enqueueNDRangeKernel(clKernel,
      null,          // global work offset
      [BUFFER_SIZE], // global work size
      [2]);          // local work size
Note: Use local work size = [] or null (default) to let the driver choose the best values.
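When a local work size is given explicitly, OpenCL 1.1 requires the global work size to be evenly divisible by it (10 and 2 work here). A common workaround for other sizes is to round the global size up and let the kernel's bounds check discard the extra work-items. The helper below is a hypothetical sketch in plain JavaScript, not part of the WebCL API.

```javascript
// Round the global work size up to the next multiple of the local work size.
function roundUp(globalSize, localSize) {
  var remainder = globalSize % localSize;
  return remainder === 0 ? globalSize : globalSize + localSize - remainder;
}

var padded10by2 = roundUp(10, 2);     // already a multiple
var padded10by4 = roundUp(10, 4);     // needs 2 extra work-items
var padded1000by64 = roundUp(1000, 64);

console.log(padded10by2, padded10by4, padded1000by64);
```

The padded value would go in the global work size argument, while the kernel still receives the real element count n so that the "tid >= n" guard skips the padding.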
WebCL sequence (host side) [6/6]
    // get results and block while getting them
    var C = new Float32Array(BUFFER_SIZE); // float, to match the kernel's output buffer
    clQueue.enqueueReadBuffer(cBuffer,
      true,     // blocking call
      0, size, C);
Example: Matrix multiplication
§ “Hello World of CL”
§ C = A x B
§ N x N matrices
[Figure: matrices A and B and their product C]
Example: Matrix multiplication
§ Optimization
  • N x N matrices
  • C divided into m x m tiles
  • With
    - m = N / P
    - P = # threads per workgroup (16)
[Figure: matrices A and B and the tiled product C]
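The tiling idea can be illustrated on the CPU in plain JavaScript. This is a sketch of the blocking scheme only, not the kernel used in the talk: the function names and the tile edge T are assumptions for this example, and on a GPU each tile of C would be handled by one work-group with tiles of A and B staged in local memory.

```javascript
// Naive reference: C = A x B for N x N row-major matrices.
function matmul(A, B, N) {
  var C = new Float64Array(N * N);
  for (var i = 0; i < N; ++i)
    for (var j = 0; j < N; ++j) {
      var sum = 0;
      for (var k = 0; k < N; ++k) sum += A[i * N + k] * B[k * N + j];
      C[i * N + j] = sum;
    }
  return C;
}

// Tiled version: C is processed in T x T tiles, and each tile
// accumulates products of matching T x T tiles of A and B.
function matmulTiled(A, B, N, T) {
  var C = new Float64Array(N * N);
  for (var ti = 0; ti < N; ti += T)          // tile row of C
    for (var tj = 0; tj < N; tj += T)        // tile column of C
      for (var tk = 0; tk < N; tk += T)      // walk tiles of A and B
        for (var i = ti; i < Math.min(ti + T, N); ++i)
          for (var j = tj; j < Math.min(tj + T, N); ++j) {
            var sum = C[i * N + j];
            for (var k = tk; k < Math.min(tk + T, N); ++k)
              sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
          }
  return C;
}

// Small integer-valued test matrices so results compare exactly.
var N = 8;
var A = new Float64Array(N * N);
var B = new Float64Array(N * N);
for (var idx = 0; idx < N * N; ++idx) {
  A[idx] = idx % 7;
  B[idx] = (idx * 3) % 5;
}

var reference = matmul(A, B, N);
var tiled = matmulTiled(A, B, N, 4);
var maxDiff = 0;
for (var e = 0; e < N * N; ++e)
  maxDiff = Math.max(maxDiff, Math.abs(reference[e] - tiled[e]));
console.log('max difference:', maxDiff);
```

Both versions do the same arithmetic; the tiled one only changes the order of memory accesses so that a small block of A and B is reused many times before moving on.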
Example: Comparison with sequential
§ MacBook Pro (early 2011), OSX 10.8
  • CPU: Intel Core i7, 2.2 GHz, 4 cores
  • GPU: AMD Radeon HD 6750M, 1 GB, 480 SPU, 600 MHz, 576 GFLOPS
[Figure: speedup factor (y-axis, 0 to 250) against x-axis values 128, 256, 512, 1024, 2048, with series OpenMP, CL (CPU), CL (GPU), and CL (GPU opt)]
WEBCL – WEBGL INTEROP.
WebCL / WebGL interop
§ WebCL context created from WebGL context
§ Configure shared CL objects from GL counterparts
§ Sync GL and CL
  • Flush GL, acquire GL object
  • Execute CL
  • Release CL object, flush CL
§ Vertex arrays, textures, render-buffers can be shared with CL
[Figure: flowchart. Initialization: Initialize WebGL; Initialize WebCL; Configure shared CL-GL data. Rendering loop (per frame): Set kernel args; Enqueue commands; Execute kernels; Update scene; Render scene.]
WebCL / WebGL interop
    // Create WebGL context
    var gl = canvas.getContext("experimental-webgl");
    // Init GL
    …

    // create the OpenCL context
    try {
      clContext = WebCL.createContext({
        deviceType: WebCL.DEVICE_TYPE_GPU,
        shareGroup: gl
      });
    } catch (err) {
      throw "Error: Failed to create context! " + err;
    }
WebCL / WebGL interop (texture)
    // Create OpenGL texture object
    gl.activeTexture(gl.TEXTURE0);
    glTexture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, glTexture);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, TextureWidth, TextureHeight, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, null);
    gl.bindTexture(gl.TEXTURE_2D, null);

    // Create the compute program from the source buffer (text)
    clProgram = clContext.createProgram(getSource("multiply_script"));
    // Build the program executable
    try {
      clProgram.build(clDevice, '-cl-fast-relaxed-math -DDEBUG=1');
    } catch (err) {
      throw "Error: Failed to build program executable!\n"
            + clProgram.getBuildInfo(clDevice, WebCL.PROGRAM_BUILD_LOG);
    }
    clKernel = clProgram.createKernel("multiply");
Demo: GL Texture update with CL
§ Based on Evgeny Demidov's 2D ink droplet
  • WebGL ~26 fps
  • WebCL ~124 fps
WebCL / WebGL interop (vbo)
    // create buffer object
    glVBO = gl.createBuffer();
    gl.bindBuffer(gl.ARRAY_BUFFER, glVBO);

    // initialize buffer object
    var sizeInBytes = mesh_width * mesh_height * 4 * Float32Array.BYTES_PER_ELEMENT;
    gl.bufferData(gl.ARRAY_BUFFER, sizeInBytes, gl.DYNAMIC_DRAW);

    // create OpenCL buffer from GL VBO
    clVBO = clContext.createFromGLBuffer(WebCL.MEM_WRITE_ONLY, glVBO);

    // set kernel args values
    clKernel.setArg(0, clVBO);
    clKernel.setArg(1, mesh_width, WebCL.type.UINT);
    clKernel.setArg(2, mesh_height, WebCL.type.UINT);
Demo: VBO update with CL
WebCL/WebGL interop (host side)
    // Sync GL and acquire buffer from GL
    gl.flush();
    clQueue.enqueueAcquireGLObjects(clTexture);

    // Set global and local work sizes for kernel
    var local = null;
    var global = [ TextureWidth, TextureHeight ];

    try {
      clQueue.enqueueNDRangeKernel(clKernel, null, global, local);
    } catch (err) {
      throw "Failed to enqueue kernel! " + err;
    }

    // Release GL texture
    clQueue.enqueueReleaseGLObjects(clTexture);
    clQueue.flush();
Perspectives
§ WebCL and Node.JS are a match made in heaven
  • Node.JS can process lots of events
  • WebCL can process lots of data using many devices
§ WebCL enables GPGPU applications in Web browsers
  • Careful use of the architecture can lead to impressive speedups
  • With WebGL interoperability, rich graphics Web applications are now possible
§ DRAFT WebCL specification
  • Quite stable JavaScript host API
  • Focusing on more security and robustness
WebCL Open process and Resources
§ Khronos open process to engage the Web community
  • Public specification drafts, mailing lists, forums
  • http://www.khronos.org/webcl/
  • [email protected]
§ Nokia open source prototype for Firefox in May 2011 (LGPL)
  • http://webcl.nokiaresearch.com
§ Samsung open source prototype for WebKit in July 2011 (BSD)
  • http://code.google.com/p/webcl/
§ Motorola open source prototype for NodeJS in March 2012 (BSD)
  • https://github.com/Motorola-Mobility/node-webcl
  • All demos in this talk were made with node-webcl / node-webgl
Start learning Now!
§ OpenCL Programming Guide - The “Red Book” of OpenCL
  • http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi/dp/0321749642
§ OpenCL in Action
  • http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Computations/dp/1617290173/
§ Heterogeneous Computing with OpenCL
  • http://www.amazon.com/Heterogeneous-Computing-with-OpenCL-ebook/dp/B005JRHYUS
§ The OpenCL Programming Book
  • http://www.fixstars.com/en/opencl/book/
Thank you!