![Page 1: OpenCL Ryan Renna. Overview Introduction History Anatomy of OpenCL Execution Model Memory Model Implementation Applications The Future](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649cd75503460f9499f15b/html5/thumbnails/1.jpg)
OpenCL
Ryan Renna
Overview
Introduction
History
Anatomy of OpenCL
Execution Model
Memory Model
Implementation
Applications
The Future
Goals
Knowledge that is transferable to all APIs
Overview of concepts rather than API specific terminology
Avoid coding examples as much as possible
Introduction
What is OpenCL?
A language: the Open Computing Language. It is C-like and executes code across mixed platforms consisting of CPUs, GPUs and other processors.
An API: runs on the "host" to manipulate and control OpenCL objects and code. It treats devices as abstract processing units.
Why Use GPUs?
Modern GPUs are made up of many highly parallel processing units, often called "stream processors".
Many modern PCs have a dedicated GPU that sits idle during most day-to-day processing.
Putting that hardware to work is known as "General-Purpose Computation on Graphics Processing Units", or GPGPU.
The Stream Processor
Any device capable of stream processing, a paradigm related to SIMD.
Given a set of data (the stream), a series of functions (called kernel functions) is applied to each element.
On-chip memory is used to minimize external memory bandwidth.
Did you know: the Cell processor, developed by Toshiba, Sony and IBM, is a stream processor?
Streams
Most commonly 2D grids (textures).
Map well to matrix algebra, image processing, physics simulations, etc.
Did you know: the latest ATI card has 1600 individual stream processors?
Kernel Functions
Traditional sequential method:
for (int i = 0; i < 100 * 4; i++) {
    result[i] = source0[i] + source1[i];
}
The same process, using the kernel "vector_sum" on 100 elements of four values each:
for (int el = 0; el < 100; el++) {
    vector_sum(result[el], source0[el], source1[el]);
}
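To make the kernel idea concrete, here is a minimal sketch in plain C (not OpenCL code). The `vec4` type and the `run_vector_sum` driver are assumptions introduced for illustration; `vec4` stands in for a four-component element, and the loop in the driver stands in for the device, which could run every kernel call in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* One stream element: four floats (hypothetical stand-in for a 4-wide vector type). */
typedef struct { float v[4]; } vec4;

/* The kernel: works on exactly one element and touches nothing else. */
void vector_sum(vec4 *result, const vec4 *a, const vec4 *b) {
    for (int c = 0; c < 4; c++) {
        result->v[c] = a->v[c] + b->v[c];
    }
}

/* Applies the kernel to every element of the stream.
   Each iteration is independent, so a device may run them all at once. */
void run_vector_sum(vec4 *result, const vec4 *s0, const vec4 *s1, size_t n) {
    assert(result != NULL && s0 != NULL && s1 != NULL);
    for (size_t el = 0; el < n; el++) {
        vector_sum(&result[el], &s0[el], &s1[el]);
    }
}
```

Because no iteration reads another iteration's result, the order of the calls does not matter, which is what makes the work suitable for a stream processor.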
An “Open” Computing Language
Multiple CPU machines with multiple GPUs, all from different vendors, can work together.
History
GPGPU
General-Purpose Computation on Graphics Processing Units
The term was coined in 2002, with the rise of using GPUs for non-graphics applications.
Hardware-specific GPGPU APIs have been created:
CUDA (NVIDIA, 2007)
Close To Metal (ATI, 2006)
The next step
OpenCL:
Developed by Apple.
Apple collaborated with AMD, Intel, IBM and NVIDIA to refine the proposal.
Submitted to the Khronos Group.
The specification for OpenCL 1.0 was finished 5 months later.
You may remember me from such open standards as…
OpenGL: 2D and 3D graphics API
OpenAL: 3D audio API
OpenGL ES: OpenGL for embedded systems, used in most smartphones
COLLADA: XML-based schema for storing 3D assets
Anatomy of OpenCL
API – Platform Layer
Compute Device: a processor that executes data-parallel programs. Contains compute units.
Compute Unit: one or more processing elements. Example: a core of a CPU.
Queue: submits work to a compute device. Can be in-order or out-of-order.
Context: a collection of compute devices. Enables memory sharing across devices.
Host: container of contexts. Represents the computer itself.
Host Example
A host computer with one device group:
A dual-core CPU
A GPU with 8 stream processors
Host
  Context
    Queue 1: dual-core CPU
      Compute Unit (x2)
    Queue 2: GPU with 8 stream processors
      Compute Unit (x8)
API – Runtime Layer
Memory Objects
Buffers: blocks of memory, accessed as arrays, pointers or structs
Images: 2D or 3D images
Executable Objects
Kernel: a data-parallel function that is executed by a compute device
Program: a group of kernels and functions
Synchronization
Events
Caveat: each image can be read or written in a kernel, but not both.
Example Flow
Compile Code: a program with a collection of kernels is compiled into CPU and GPU binaries.
Create Data & Arguments: memory objects (buffers and images).
Send to Execution: the compute device's in-order or out-of-order queue.
Execution Model of OpenCL
N-D Space
The N-dimensional computation domain is called the N-D space. It defines the total number of elements of execution, that is, the Global Dimensions.
Each element of execution, representing an instance of a kernel, is called a work-item.
Work-items are grouped into local workgroups, whose size is defined by the Local Dimensions.
Work-Items
Across the global space, work-items in different workgroups run in parallel independently (no synchronization).
Within a workgroup, work-items can be synchronized and share workgroup memory.
Each work-item runs as its own thread. Thousands of lightweight threads can be running at a time, and are managed by the device.
Each work-item is assigned a unique global id and a local id within its workgroup, and each workgroup is assigned a workgroup id.
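The relationship between the three ids can be sketched in plain C. This is an illustration under stated assumptions, not OpenCL code: one dimension, ids starting at 0, no global offset, and every workgroup the same size. The names `work_item_ids`, `split_global_id` and `join_ids` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The workgroup a work-item belongs to, and its position inside that workgroup. */
typedef struct {
    size_t group_id;
    size_t local_id;
} work_item_ids;

/* Derives the workgroup id and local id from a global id (one dimension). */
work_item_ids split_global_id(size_t global_id, size_t local_size) {
    assert(local_size > 0);
    work_item_ids ids = { global_id / local_size, global_id % local_size };
    return ids;
}

/* The inverse: rebuilds the global id from the workgroup id and local id. */
size_t join_ids(work_item_ids ids, size_t local_size) {
    return ids.group_id * local_size + ids.local_id;
}
```

Inside a real kernel these values come from `get_global_id`, `get_group_id` and `get_local_id`; the sketch only shows how they relate to one another.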
Example – Image Filter
Executed on a 128 x 128 image, our Global Dimensions are 128, 128. We will have 16,384 work-items in total.
We can then define Local Dimensions of 32, 32, which divide the global space evenly into 4 x 4 workgroups.
Since workgroups are executed together, and work-items can only be synchronized within workgroups, picking your Global and Local Dimensions is problem specific.
Ids start at 0, so if we asked for the local id of the work-item with global id 33 along one dimension, we'd receive 1: it is the 2nd work-item of the 2nd workgroup.
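A minimal sketch in plain C (not OpenCL host code) of how a 2D global space splits into workgroups. The names `nd_layout` and `describe_2d_range` are hypothetical, and the 32 x 32 local size used in the note below is an assumption, picked because it divides 128 evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a 2D decomposition into workgroups. */
typedef struct {
    size_t groups_x;         /* workgroups along x */
    size_t groups_y;         /* workgroups along y */
    size_t total_items;      /* work-items in the whole global space */
    size_t items_per_group;  /* work-items in one workgroup */
} nd_layout;

/* Fills *out and returns 1 when each local dimension divides the matching
   global dimension evenly; returns 0 and leaves *out untouched otherwise. */
int describe_2d_range(size_t gx, size_t gy, size_t lx, size_t ly, nd_layout *out) {
    assert(out != NULL);
    if (lx == 0 || ly == 0 || gx % lx != 0 || gy % ly != 0) {
        return 0;
    }
    out->groups_x = gx / lx;
    out->groups_y = gy / ly;
    out->total_items = gx * gy;
    out->items_per_group = lx * ly;
    return 1;
}
```

Calling `describe_2d_range(128, 128, 32, 32, &layout)` reports 4 x 4 workgroups of 1,024 work-items each, 16,384 in total. A local size of 30 x 30 is rejected, because 30 does not divide 128.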
Memory Model of OpenCL
Memory Model
Private: per work-item
Local: shared within a workgroup
Global/Constant: not synchronized, per device
Host Memory
Host
  Host Memory
Compute Device
  Global / Constant Memory
  Compute Unit 1
    Local Memory
    Work Item (Private) .. Work Item (Private)
  Compute Unit 2
    Local Memory
    Work Item (Private) .. Work Item (Private)
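The three device-side levels can be mimicked in plain C. This is only a model under stated assumptions, not OpenCL code: the outer loop plays the role of the workgroups, the inner loop plays the work-items, and `LOCAL_SIZE` and `workgroup_sums` are hypothetical names. Each simulated workgroup sums its own slice of the input.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 4  /* hypothetical workgroup size */

/* global_in holds n_groups * LOCAL_SIZE values; global_out receives one sum per workgroup. */
void workgroup_sums(const float *global_in, float *global_out, size_t n_groups) {
    assert(global_in != NULL && global_out != NULL);
    for (size_t group = 0; group < n_groups; group++) {
        /* "Local" memory: one copy per workgroup, visible to all of its work-items. */
        float local_mem[LOCAL_SIZE];
        for (size_t lid = 0; lid < LOCAL_SIZE; lid++) {
            /* "Private" memory: visible to this work-item only. */
            float private_value = global_in[group * LOCAL_SIZE + lid];
            local_mem[lid] = private_value;
        }
        /* On a device, the work-items would synchronize here before reading local memory. */
        float sum = 0.0f;
        for (size_t lid = 0; lid < LOCAL_SIZE; lid++) {
            sum += local_mem[lid];
        }
        /* "Global" memory: the only level the host can read back. */
        global_out[group] = sum;
    }
}
```

No workgroup reads another workgroup's local array, which mirrors the rule that local memory is shared only within a workgroup.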
Intermission
Implementation
Identifying Parallelizable Routines
Key thoughts:
Work-items should be independent of each other.
Work-items in a workgroup can share data, but they execute together, so one must not depend on another's result without explicit synchronization. Separate workgroups cannot depend on each other's results at all.
Find tasks that are independent and highly repeated; pay attention to loops.
Transferring data over a PCI bus has overhead, so parallelization is only justified for large data sets or for ones that need a lot of mathematical computation.
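The transfer-overhead point can be turned into a rough break-even check. This is a back-of-the-envelope sketch, not a profiler: the function names are hypothetical, all timings and the bus bandwidth are numbers you would have to measure yourself, and the model ignores latency, kernel launch cost and any overlap of transfer with compute.

```c
#include <assert.h>

/* Seconds spent moving data to the device and back over the bus. */
double transfer_seconds(double bytes_in, double bytes_out, double bus_bytes_per_second) {
    assert(bus_bytes_per_second > 0.0);
    return (bytes_in + bytes_out) / bus_bytes_per_second;
}

/* Returns 1 when transfer time plus device compute time beats running on the host. */
int offload_pays_off(double host_seconds, double device_seconds,
                     double bytes_in, double bytes_out,
                     double bus_bytes_per_second) {
    double total = transfer_seconds(bytes_in, bytes_out, bus_bytes_per_second)
                 + device_seconds;
    return total < host_seconds;
}
```

With these inputs, a job that is only slightly faster on the device can still lose once the data has to cross the bus twice, which is why small data sets rarely justify the move.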
An Example – Class Average
Let's imagine we were writing an application that computed the class average.
There are two tasks we'd need to perform:
Compute the final grade for each student (parallelizable)
Obtain a class average by averaging the final grades (non-parallelizable)
Pseudo Code
Compute the final grade for each student:
foreach (student in class) {
    grades = student.getGrades();
    sum = 0;
    count = 0;
    foreach (grade in grades) {
        sum += grade;
        count++;
    }
    student.averageGrade = sum / count;
}
The body of the outer loop can be isolated into a kernel:
__kernel void calcGrade(__global const float* input, __global float* output)
{
    int i = get_global_id(0);
    // Do work on class[i]
}
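As a sketch of what "do work on class[i]" could look like, here is the class-average example in plain C rather than OpenCL. The flat row-major grade layout, the fixed number of grades per student, and the names `calc_grade` and `class_average` are all assumptions made for illustration. The per-student function plays the role of the kernel, and the loop that calls it stands in for the parallel launch.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-style function: computes the final grade of student i and nothing else.
   grades is row-major, with grades_per_student entries per student. */
void calc_grade(size_t i, const float *grades, size_t grades_per_student,
                float *final_grades) {
    assert(grades_per_student > 0);
    float sum = 0.0f;
    for (size_t g = 0; g < grades_per_student; g++) {
        sum += grades[i * grades_per_student + g];
    }
    final_grades[i] = sum / (float)grades_per_student;
}

/* Host-side flow: the parallelizable step, then the sequential averaging step. */
float class_average(const float *grades, size_t n_students,
                    size_t grades_per_student, float *final_grades) {
    assert(n_students > 0);
    /* Each call is independent, so a device could run one work-item per student. */
    for (size_t i = 0; i < n_students; i++) {
        calc_grade(i, grades, grades_per_student, final_grades);
    }
    /* Averaging needs every final grade, so it runs after the step above. */
    float total = 0.0f;
    for (size_t i = 0; i < n_students; i++) {
        total += final_grades[i];
    }
    return total / (float)n_students;
}
```

The index `i` is passed in explicitly here; in the kernel it would come from `get_global_id(0)`.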
Determining the Data Dimensions
First decide how to represent your problem; this will tell you the dimensionality of your Global and Local Dimensions.
Global dimensions are problem specific.
Local dimensions are algorithm specific.
Local dimensions must have the same number of dimensions as Global.
Local dimensions must divide the global space evenly.
Passing NULL as the workgroup size argument lets OpenCL pick a workgroup size for you, but you then cannot rely on a particular grouping when synchronizing work-items.
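When the problem size is not a multiple of the chosen local size, a common workaround is to pad the global size up to the next multiple and have the kernel skip the extra work-items (for example with an `if (i < n)` guard). The helper below is a minimal sketch of that rounding in plain C; the name `round_up_global` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is greater than or equal to problem_size. */
size_t round_up_global(size_t problem_size, size_t local_size) {
    assert(local_size > 0);
    size_t remainder = problem_size % local_size;
    if (remainder == 0) {
        return problem_size;  /* already divides evenly */
    }
    return problem_size + (local_size - remainder);
}
```

For 100 elements and a local size of 32 this gives a global size of 128, so the last 28 work-items have nothing to do and must return early.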
Execution Steps
An OpenCL calculation needs to perform 6 key steps:
Initialization
Allocate Resources
Create Programs/Kernels
Execution
Read the Result(s)
Clean Up
Warning! Code Ahead
Initialization
Store the kernel source in a string/char array:
const char* kernel_source =
    "__kernel void calcGrade(__global const float* input, __global float* output)\n"
    "{\n"
    "    int i = get_global_id(0);\n"
    "    // Do work on class[i]\n"
    "}\n";
Initialization
Select a device and create a context and command queue in which to run the calculation:
cl_int err;
cl_platform_id platform;
cl_device_id device;
cl_context context;
cl_command_queue cmd_queue;
// Pick the first platform, then the first GPU device on it
err = clGetPlatformIDs(1, &platform, NULL);
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
cmd_queue = clCreateCommandQueue(context, device, 0, &err);
Allocation
Allocate the memory/storage that will be used on the device and push the input data to it:
// Input buffer (read by the kernel) and output buffer (written by the kernel)
cl_mem ax_mem = clCreateBuffer(context, CL_MEM_READ_ONLY, atom_buffer_size, NULL, &err);
cl_mem val_mem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, grid_buffer_size, NULL, &err);
// Blocking write: copy the host array into the input buffer
err = clEnqueueWriteBuffer(cmd_queue, ax_mem, CL_TRUE, 0, atom_buffer_size, (void*)values, 0, NULL, NULL);
Program/Kernel Creation
Programs and kernels are read in from source and compiled to a binary:
cl_program program[1];
cl_kernel kernel[1];
program[0] = clCreateProgramWithSource(context, 1, (const char**)&kernel_source, NULL, &err);
// Build for all devices in the context, with no extra compiler options
err = clBuildProgram(program[0], 0, NULL, NULL, NULL, NULL);
kernel[0] = clCreateKernel(program[0], "calcGrade", &err);
Execution
Arguments to the kernel are set and the kernel is executed on all data:
size_t global_work_size[1], local_work_size[1];
global_work_size[0] = x;
local_work_size[0] = x / 2;  // x must be even so the local size divides the global size
// Kernel arguments are the buffer objects, not host pointers
err = clSetKernelArg(kernel[0], 0, sizeof(cl_mem), &ax_mem);
err = clSetKernelArg(kernel[0], 1, sizeof(cl_mem), &val_mem);
err = clEnqueueNDRangeKernel(cmd_queue, kernel[0], 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
Read the Result(s)
We read the results back to the host:
err = clEnqueueReadBuffer(cmd_queue, val_mem, CL_TRUE, 0, grid_buffer_size, val, 0, NULL, NULL);
Note: if we were working on images, the function clEnqueueReadImage() would be called instead.
Clean Up
Clean up memory and release all OpenCL objects. Each release decrements the object's reference count, and the object is freed once the count reaches zero.
clReleaseMemObject(ax_mem);
clReleaseMemObject(val_mem);
clReleaseKernel(kernel[0]);
clReleaseProgram(program[0]);
clReleaseCommandQueue(cmd_queue);
clReleaseContext(context);
Advanced Techniques
Instead of finding the first GPU, we could create a context out of all OpenCL devices, or dynamically choose the dimensions and devices that would perform best.
Debugging can be done more efficiently on the CPU than on a GPU, since printf calls can work inside a kernel there.
Applications
Raytracing
Weather forecasting, climate research
Physics simulations
Computational finance
Computer vision
Signal processing, speech processing
Cryptography / cryptanalysis
Neural networks
Database operations
…Many more!
The Future
OpenGL Interoperability
OpenCL + OpenGL
Efficient inter-API communication
OpenCL efficiently shares resources with OpenGL (doesn't copy)
OpenCL objects can be created from OpenGL objects
OpenGL 4.0 has been designed to align both standards so they work closely together
Example implementation:
Vertex and image data generated with OpenCL
Rendered with OpenGL
Post-processed with OpenCL kernels
Competitor
DirectCompute by Microsoft
Bundled with DirectX 11
Requires a DX10 or DX11 graphics card
Requires Windows Vista or 7
Close to OpenCL feature-wise
Internet Explorer 9 and Firefox 3.7 both use DirectX to speed up DOM tree rendering (Windows only)
Overview
With OpenCL:
Leverage CPUs, GPUs and other processors to accelerate parallel computation
Get dramatic speedups for computationally intensive applications
Write accelerated, portable code across different devices and architectures
Getting Started…
ATI Stream SDK
Support for OpenCL/OpenGL interoperability
Support for OpenCL/DirectX interoperability
http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx
CUDA Toolkit
http://developer.nvidia.com/object/cuda_3_0_downloads.html
OpenCL.NET
OpenCL wrapper for .NET languages
http://www.hoopoe-cloud.com/Solutions/OpenCL.NET/Default.aspx
The End? No… The Beginning