Productive OpenCL Programming: An Introduction to OpenCL Libraries, with ArrayFire COO Oded Green
● We make code run faster
  ○ Started in 2007 by Georgia Tech researchers
  ○ 1000s of paying customers
● We build an acceleration library
  ○ for really cool science, engineering, and finance applications
  ○ for mobile computing
Libraries are Great!
Eliminate Hidden Costs
Library Types
● Specialized GPU Libs
  ○ Targeted at a specific set of operators (functionality)
  ○ Optimized for specific systems
  ○ C-like interface
  ○ Raw pointer interface
● General GPU Libs
  ○ Manage GPU resources using containers
  ○ Applicable to a large set of applications and domains
  ○ Portable across multiple architectures
  ○ Higher level functions
  ○ C++ interface (supports templates)
Specialized GPU Libraries
● Fast Fourier Transforms
  ○ clFFT
● Random Number Generation
  ○ Random123
● Linear Algebra
  ○ clBLAS
  ○ MAGMA
● Signal and Image Processing
  ○ OpenCLIPP
Specialized GPU Libraries
● C Interface
  ○ Use pointers to reference data
● Memory management is the programmer's responsibility
● Mimic existing libraries
  ○ clBLAS ≈ BLAS
  ○ MAGMA ≈ BLAS + LAPACK
  ○ clFFT ≈ FFTW
● Simplifies GPU integration of specialized scientific libraries
  ○ Still requires setting up the GPU
clFFT
● 1D, 2D and 3D transforms
● CPU and GPU backends
● Supports
  ○ Real and complex data types
  ○ Single and double-precision
  ○ Execution of multiple transformations concurrently
Random123
● Counter-based RNG
● Passed SmallCrush, Crush and BigCrush tests
● Four RNG families
  ○ Threefry
  ○ Philox
  ○ AESNI
  ○ ARS
● Not suitable for cryptography
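To make the "counter-based" idea concrete, here is a minimal sketch in standard C++. It does not use Random123 and does not implement Threefry or Philox: the mixing step is the SplitMix64 finalizer, swapped in purely for illustration, and the names `mix64`, `counter_rng` and `to_unit_interval` are hypothetical. What it shows is the defining property: the output is a pure function of (key, counter), so any work-item can compute its own random value without shared state.

```cpp
#include <cstdint>

// Stateless 64-bit mixing function (the SplitMix64 finalizer).
// Stand-in for the Threefry/Philox rounds; NOT one of Random123's generators.
inline std::uint64_t mix64(std::uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical counter-based generator: the result depends only on
// (key, counter), so there is no state to carry between calls.
inline std::uint64_t counter_rng(std::uint64_t key, std::uint64_t counter) {
    return mix64(mix64(key) ^ counter);
}

// Map a 64-bit value to a double in [0, 1) using its top 53 bits.
inline double to_unit_interval(std::uint64_t x) {
    return static_cast<double>(x >> 11) * (1.0 / 9007199254740992.0); // 2^53
}
```

In a parallel setting, each thread would typically use its own index as the counter (for example `counter_rng(seed, thread_id)`), which is why this style of generator maps well to GPUs.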
Magma & clBLAS
● Implements many popular linear algebra routines
● Supports
  ○ Real and complex data types
  ○ Single and double-precision
OpenCLIPP
● Supports multiple image types
● Similar to Intel IPP
● Primitives
  ○ Arithmetic and logic
  ○ LUT
  ○ Morphology
  ○ Transform
  ○ Resize
  ○ Histogram
  ○ Many more…
● C and C++ interface
General-Purpose GPU Libraries
● Bolt
● OpenCV
● ArrayFire
Images taken from: http://wordlesstech.com/2012/10/12/leatherman-oht-multi-tool/
Bolt
● GPU library that resembles the C++ STL
  ○ STL-like data structures
  ○ Iterators
  ○ Fully interoperable with OpenCL
● Parallel vector operation methods
  ○ Reductions
  ○ Sorting
  ○ Prefix-Sum
● Customizable GPU kernels using functors
● Some functions only supported on AMD GPUs
Bolt - Data Structures
● Built around the device_vector
● Supports the same data types as C++
  ○ device_vector<float> data(2e6);
● Useful when performing multiple operations on a vector
● Can be passed into STL algorithms
  ○ Always interoperable
  ○ Data transfer will be costly
Bolt - Algorithms
● Uses a C++ STL-like interface
  ○ Pass the begin and end iterators
● Accepts functors, which allow you to run custom operations on OpenCL devices
● Multiple backends
  ○ OpenCL, C++AMP, and TBB
  ○ Not all algorithms implemented across all backends
● Works on vector and device_vector
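The calling pattern described above (a begin/end iterator pair plus a user functor) is the same one the C++ standard library uses. The sketch below shows that pattern with standard algorithms only; it does not include or call Bolt, and `Saxpy` and `saxpy_sort_reduce` are hypothetical names chosen for illustration. Per the slides, a Bolt version would keep the same shape while running the functor on an OpenCL device.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// User-defined functor: the custom operation applied to each pair of elements.
struct Saxpy {
    float a;
    float operator()(float x, float y) const { return a * x + y; }
};

// y <- a*x + y over the whole range, then sort descending and sum-reduce.
// Assumes x and y have the same length.
inline float saxpy_sort_reduce(float a, const std::vector<float>& x,
                               std::vector<float>& y) {
    std::transform(x.begin(), x.end(), y.begin(), y.begin(), Saxpy{a});
    std::sort(y.begin(), y.end(), std::greater<float>());
    return std::accumulate(y.begin(), y.end(), 0.0f);
}
```

With `x = {1, 2, 3}`, `y = {10, 20, 30}` and `a = 2`, `y` becomes `{36, 24, 12}` and the returned sum is 72.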
OpenCV
● Open source computer vision library
● C++ interface with many language wrappers
● Hundreds of CV functions
OpenCV ArrayFire Interop
● Helper Functions
  ○ https://github.com/arrayfire-community/arrayfire_opencv.git
Mat R;
Rodrigues(poses(Rect(0, 0, 1, 3)), R);
af::array af_R = mat_to_array(R);
ArrayFire - Data Structures
● Built around a flexible data structure named "array"
  ○ Lightweight wrapper around the data on the compute device
  ○ Manages the data and basic metadata such as size, type and dimensions
● You can transfer data into an array using constructors
● Column major
float hA[6] = {0, 1, 2, 3, 4, 5};
array A(2, 3, hA);
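Column-major storage means consecutive values in the host buffer fill one column before moving to the next. The following host-only sketch shows the index arithmetic. `ColMajor` is a hypothetical helper written for this illustration, not the ArrayFire `array` type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side view of a column-major matrix
// (illustration only; not the ArrayFire array type).
struct ColMajor {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    // Element (r, c) lives at offset r + c * rows.
    float at(std::size_t r, std::size_t c) const { return data[r + c * rows]; }

    // Sum of one column; its elements are contiguous in memory.
    float column_sum(std::size_t c) const {
        float s = 0.0f;
        for (std::size_t r = 0; r < rows; ++r) s += at(r, c);
        return s;
    }
};
```

For the 2 x 3 buffer `{0, 1, 2, 3, 4, 5}`, the columns are (0, 1), (2, 3) and (4, 5), so `at(0, 1)` is 2 and `at(1, 2)` is 5.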
ArrayFire - Indexing
#include <arrayfire.h>
#include <af/utils.h>
using namespace af;

void af_example()
{
    float f[8] = {1, 2, 4, 8, 16, 32, 64, 128};
    array a(2, 4, f); // 2 rows x 4 cols, filled column by column from f
    array sumSecondCol = sum(a(span, 1)); // reduce-sum over the second column (4 + 8)
    print(sumSecondCol); // 12
}
ArrayFire Example - swap R and B
Using ArrayFire:
array tmp = img(span,span,0); // save the R channel
img(span,span,0) = img(span,span,2); // R channel gets values of B
img(span,span,2) = tmp; // B channel gets value of R
Can also do it this way:
array swapped = join(2, img(span,span,2), // blue
                        img(span,span,1), // green
                        img(span,span,0)); // red
Or simply:
array swapped = img(span,span,seq(2,-1,0));
ArrayFire Functions
Using ArrayFire:
array img = loadimage("image.jpg", false); // load grayscale image from disk to device
array img_T = img.T(); // transpose
Original
Grayscale
Box filter blur
Gaussian blur
Image Negative
Erosion
ArrayFire
// erode an image, 8-neighbor connectivity
array mask8 = constant(1, 3, 3);
array img_out8 = erode(img_in, mask8);
// erode an image, 4-neighbor connectivity
const float h_mask4[] = { 0.0, 1.0, 0.0,
                          1.0, 1.0, 1.0,
                          0.0, 1.0, 0.0 };
array mask4 = array(3, 3, h_mask4);
array img_out4 = erode(img_in, mask4);
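For readers unfamiliar with erosion, the difference between the two masks can be seen in a small CPU sketch. This is not ArrayFire's implementation: `erode3x3` is a hypothetical helper, it handles only 3x3 masks, and it simply skips neighbors that fall outside the image, which is an assumption (libraries may pad borders differently). Each output pixel is the minimum of the input pixels selected by the nonzero mask entries.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Grayscale erosion sketch on a column-major image with a 3x3 column-major mask.
// Out-of-image neighbors are skipped (assumption for this illustration).
inline std::vector<float> erode3x3(const std::vector<float>& img,
                                   std::size_t rows, std::size_t cols,
                                   const float (&mask)[9]) {
    std::vector<float> out(img.size());
    const long nrows = static_cast<long>(rows);
    const long ncols = static_cast<long>(cols);
    for (long c = 0; c < ncols; ++c) {
        for (long r = 0; r < nrows; ++r) {
            float m = std::numeric_limits<float>::max();
            for (long dc = -1; dc <= 1; ++dc) {
                for (long dr = -1; dr <= 1; ++dr) {
                    // Skip positions the mask does not select.
                    if (mask[(dr + 1) + (dc + 1) * 3] == 0.0f) continue;
                    const long rr = r + dr;
                    const long cc = c + dc;
                    if (rr < 0 || cc < 0 || rr >= nrows || cc >= ncols) continue;
                    m = std::min(m, img[static_cast<std::size_t>(rr + cc * nrows)]);
                }
            }
            out[static_cast<std::size_t>(r + c * nrows)] = m;
        }
    }
    return out;
}
```

On a 3x3 image of ones with a single zero in the top-left corner, the all-ones (8-neighbor) mask pulls the zero into the center pixel, while the cross-shaped (4-neighbor) mask leaves the center at one because the corner is not a 4-neighbor of the center.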
Erosion
Filtering
ArrayFire
array R = convolve(img, ker); // 1, 2 and 3d convolution filter
array R = convolve(fcol, frow, img); // Separable convolution
array R = filter(img, ker); // 2d correlation filter
Histograms
ArrayFire
int nbins = 256;
array hist = histogram(img, nbins);
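As a reference for what a histogram call computes, here is a minimal serial sketch in standard C++. It is not ArrayFire code: `histogram_sketch`, its explicit `lo`/`hi` range parameters, and the choice to ignore out-of-range values are all assumptions made for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Count how many pixel values fall into each of nbins equal-width bins
// over [lo, hi]. Values outside the range are ignored; a value equal to hi
// is placed in the last bin.
inline std::vector<unsigned> histogram_sketch(const std::vector<float>& pixels,
                                              std::size_t nbins,
                                              float lo, float hi) {
    std::vector<unsigned> counts(nbins, 0u);
    if (nbins == 0 || hi <= lo) return counts;
    const float width = (hi - lo) / static_cast<float>(nbins);
    for (float p : pixels) {
        if (p < lo || p > hi) continue;
        std::size_t bin = static_cast<std::size_t>((p - lo) / width);
        if (bin >= nbins) bin = nbins - 1; // only reached when p == hi
        ++counts[bin];
    }
    return counts;
}
```

For 8-bit image data, calling it with `nbins = 256`, `lo = 0` and `hi = 256` gives one bin per intensity level.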
Transforms
ArrayFire
array half = resize(0.5, img);
array rot90 = rotate(img, af::Pi/2);
array warped = approx2(img, xLocations, yLocations);
Image smoothing
ArrayFire
array S = bilateral(I, sigma_r, sigma_c);
array M = meanshift(I, sigma_r, sigma_c, iter);
array R = medfilt(img, 3, 3);
// Gaussian blur
array gker = gaussiankernel(ncols, ncols);
array res = convolve(img, gker);
FFT
ArrayFire
array R1 = fft2(I); // 2d fft. check fft, fft3
array R2 = fft2(I, M, N); // fft2 with padding
array R3 = ifft2(fft2(I, M, N) * fft2(K, M, N)); // convolve using fft2
ArrayFire Capabilities
● Hundreds of parallel functions for multi-disciplinary work
  ○ Image processing
  ○ Machine learning
  ○ Graphics
  ○ Sets
● Support for multiple languages
  ○ C/C++, Fortran, Java and R
● Linux, Windows, Mac OS X
ArrayFire Capabilities
● OpenGL-based graphics
● JIT
  ○ Combine multiple operations into one kernel
● GFOR - data parallel loop
  ○ Allows concurrent execution over multiple data sets (for example images)
ArrayFire Functions
● Supports hundreds of parallel functions
  ○ Building blocks
    ■ Reductions
    ■ Scan
    ■ Set operations
    ■ Sorting
    ■ Statistics
    ■ Basic matrix manipulation
Images taken from:
http://technogems.blogspot.com/2011/06/sorting-included-files-by-importance.html
http://www.cmsoft.com.br/tutorialOpenCL/CLMatrixMultExplanationSubMatrixes.png
ArrayFire Functions
● Hundreds of highly-optimized parallel functions
  ○ Signal/image processing
    ■ Convolution
    ■ FFT
    ■ Histograms
    ■ Interpolation
    ■ Connected components
  ○ Linear Algebra
    ■ Matrix multiply
    ■ Linear system solving
    ■ Factorization
GFOR: What is it?
• Data-Parallel for loop, e.g.
Serial matrix-vector multiplications (3 kernel launches):
for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B;
Parallel matrix-vector multiplications (1 kernel launch):
gfor (array i, 3) C(span,span,i) = A(span,span,i) * B;
Example: Matrix Multiply
• Data-Parallel for loop, e.g.
for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B;
Serial matrix-vector multiplications (3 kernel launches)
[Diagram, iteration i = 1: C(,,1) = A(,,1) * B]
Example: Matrix Multiply
• Data-Parallel for loop, e.g.
for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B;
Serial matrix-vector multiplications (3 kernel launches)
[Diagram, iteration i = 1: C(,,1) = A(,,1) * B]
[Diagram, iteration i = 2: C(,,2) = A(,,2) * B]
Example: Matrix Multiply
• Data-Parallel for loop, e.g.
for (i = 0; i < 3; i++) C(span,span,i) = A(span,span,i) * B;
Serial matrix-vector multiplications (3 kernel launches)
[Diagram, iteration i = 1: C(,,1) = A(,,1) * B]
[Diagram, iteration i = 2: C(,,2) = A(,,2) * B]
[Diagram, iteration i = 3: C(,,3) = A(,,3) * B]
Example: Matrix Multiply
gfor (array i, 3) C(span,span,i) = A(span,span,i) * B;
Parallel matrix multiplications (1 kernel launch)
[Diagram, simultaneous iterations i = 1:3: C(,,1) = A(,,1) * B; C(,,2) = A(,,2) * B; C(,,3) = A(,,3) * B]
Example: Matrix Multiply
gfor (array i, 3) C(span,span,i) = A(span,span,i) * B;
Parallel matrix multiplications (1 kernel launch)
[Diagram, simultaneous iterations i = 1:3: C(,,1:3) = A(,,1:3) * B]
Think of GFOR as compiling 1 stacked kernel with all iterations.
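The "stacked" mental model can be checked on the CPU. The sketch below is only an illustration of why batching is valid, not how ArrayFire implements GFOR: it uses row-major storage for simplicity, and `matmul_sketch`, `multiply_per_slice` and `multiply_stacked` are hypothetical names. Multiplying each slice of A by B separately gives the same result as stacking the slices into one tall matrix and multiplying once.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>; // row-major storage, for this sketch only

// Plain matrix multiply: C (m x n) = A (m x k) * B (k x n).
inline Matrix matmul_sketch(const Matrix& A, const Matrix& B,
                            std::size_t m, std::size_t k, std::size_t n) {
    Matrix C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}

// Serial form: one multiply per slice, like the plain for loop on the slides.
inline Matrix multiply_per_slice(const Matrix& A, const Matrix& B,
                                 std::size_t slices,
                                 std::size_t m, std::size_t k, std::size_t n) {
    Matrix C;
    C.reserve(slices * m * n);
    for (std::size_t s = 0; s < slices; ++s) {
        const auto first = A.begin() + static_cast<std::ptrdiff_t>(s * m * k);
        const Matrix Ai(first, first + static_cast<std::ptrdiff_t>(m * k));
        const Matrix Ci = matmul_sketch(Ai, B, m, k, n);
        C.insert(C.end(), Ci.begin(), Ci.end());
    }
    return C;
}

// Stacked form: treat all slices as one tall (slices*m x k) matrix
// and multiply once.
inline Matrix multiply_stacked(const Matrix& A, const Matrix& B,
                               std::size_t slices,
                               std::size_t m, std::size_t k, std::size_t n) {
    return matmul_sketch(A, B, slices * m, k, n);
}
```

With three 1 x 2 slices `{1, 2}`, `{3, 4}`, `{5, 6}` and a 2 x 1 matrix B = `{10, 1}`, both forms return `{12, 34, 56}`. On a GPU the single stacked call is what avoids the repeated kernel launches.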
JIT Code Generation
● Run-time kernel generation
● Combines multiple element-wise operations into one kernel
● Reduces kernel launching overhead
● Intermediate data not allocated
● Improves cache performance
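The benefit of fusing element-wise operations can be sketched on the CPU, with each loop standing in for one kernel launch. This is an analogy only, not ArrayFire's JIT: `hypot_unfused` and `hypot_fused` are hypothetical names, and both assume `x` and `y` have the same length. The unfused version makes four passes and allocates three temporaries; the fused version makes one pass and allocates none.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused: each element-wise operation is its own pass with its own temporary.
inline std::vector<float> hypot_unfused(const std::vector<float>& x,
                                        const std::vector<float>& y) {
    const std::size_t n = x.size();
    std::vector<float> xx(n), yy(n), sum(n), out(n);
    for (std::size_t i = 0; i < n; ++i) xx[i] = x[i] * x[i];        // pass 1
    for (std::size_t i = 0; i < n; ++i) yy[i] = y[i] * y[i];        // pass 2
    for (std::size_t i = 0; i < n; ++i) sum[i] = xx[i] + yy[i];     // pass 3
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(sum[i]); // pass 4
    return out;
}

// Fused: a single pass with no intermediate arrays.
inline std::vector<float> hypot_fused(const std::vector<float>& x,
                                      const std::vector<float>& y) {
    const std::size_t n = x.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);
    return out;
}
```

For `x = {3, 0}` and `y = {4, 1}` both versions return `{5, 1}`. For general inputs the two may differ in the last bit on hardware where the compiler contracts the fused expression into a fused multiply-add.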
Success Stories
| Field | Application | Speedup |
| --- | --- | --- |
| Academia | Power Systems Simulations | 35x |
| Finance | Option Pricing | 52x |
| Government | Radar Image Formation | 45x |
| Life Sciences | Pathology Advances | > 100x |
| Manufacturing | Tomography of Vegetation | 10x |
| Media & Computer Vision | Digital Holography | 17x |
| Oil & Gas | Ground Water Simulations | > 20x |
Future capabilities
● We are interested in Big Data applications
● Create capabilities for
  ○ Streaming video
  ○ Large number of images
  ○ Machine learning
  ○ Data analysis
  ○ Dynamic data
● Faster rendering utilities for Big Data
Comments on Open Source
● https://github.com/arrayfire-community
Q & A
Speaker: Oded Green ([email protected])
Engineers: Umar Arshad ([email protected])
Pavan Yalamanchili ([email protected])
Sales: Scott Blakeslee ([email protected])
Look us up
www.ArrayFire.com
For language wrappers and examples: https://github.com/ArrayFire