© CY Lin, 2016, Columbia University. E6895 Advanced Big Data Analytics, Lecture 7
E6895 Advanced Big Data Analytics Lecture 7:
GPU and CUDA
Ching-Yung Lin, Ph.D.
Adjunct Professor, Dept. of Electrical Engineering and Computer Science
IBM Chief Scientist, Graph Computing Research
Reference Book
CUDA: Compute Unified Device Architecture
GPU
2001: NVIDIA’s GeForce 3 series was arguably the most important breakthrough in GPU technology up to that point
- the computing industry’s first chip to implement Microsoft’s then-new DirectX 8.0 standard;
- this standard required that compliant hardware contain both programmable vertex and programmable pixel shading stages
Early 2000s: The release of GPUs with programmable pipelines attracted many researchers to the possibility of using graphics hardware for more than OpenGL- or DirectX-based rendering.
- The GPUs of the early 2000s were designed to produce a color for every pixel on the screen using programmable arithmetic units known as pixel shaders.
- A pixel shader combines its (x, y) position on the screen with some additional information to compute a final color. The additional information could be input colors, texture coordinates, or other attributes.
CUDA
2006: GPU computing goes prime time
- Release of CUDA
- The CUDA Architecture included a unified shader pipeline, allowing each and every arithmetic logic unit (ALU) on the chip to be marshaled by a program intending to perform general-purpose computations.
Examples
• Medical Imaging
• Computational Fluid Dynamics
• Environmental Science
GPU on a MacBook
GT 750M:
- 2 × 192 CUDA cores
- maximum number of threads: 2 × 2048
Amazon AWS
GPU on iOS devices
iPhone models and their GPUs
iPhones have used PowerVR GPUs:
• A4 => SGX 535 (1.6 GFLOPS)
• A5 => SGX 543 MP2 (12.8 GFLOPS)
• A6 => SGX 543 MP3 (25.5 GFLOPS)
• A7 => G6430 (quad core) (230.4 GFLOPS)
• A8 => GX6450 (quad core) (332.8 GFLOPS)
GPU in Apple A8 SoC
A8 — iPhone 6 and iPhone 6 Plus
GPU: PowerVR Quad-core GX6450
4 Unified Shading Clusters (USCs)
# of ALUs: 32 (FP32) or 64 (FP16) per USC
GFLOPS: 166.4 (FP32)/ 332.8 (FP16) @ 650 MHz
Supports OpenCL 1.2
source: http://www.imgtec.com/powervr/series6xt.asp
manufactured by TSMC
GPU Programming in iPhone/iPad - Metal
Metal provides the lowest-overhead access to the GPU, enabling developers to maximize the graphics and compute potential of an iOS 8 app.*
Metal can be used for:
• Graphics processing (the role played by OpenGL)
• General data-parallel processing (the role played by OpenCL and CUDA)
*: https://developer.apple.com/metal/
Fundamental Metal Concepts
• Low-overhead interface
• Memory and resource management
• Integrated support for both graphics and compute operations
• Precompiled shaders
GPU Programming in iPhone/iPad - Metal
The programming flow is similar to CUDA:
• Copy data from the CPU to the GPU
• Compute on the GPU
• Send data back from the GPU to the CPU
Example: kernel code in Metal for the sigmoid function (source: http://memkite.com)
[Figure: Metal kernel listing with callouts marking the kernel code, the device memory arguments, and the thread_id used for data parallelization]
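The Metal listing is not preserved in this transcript. Since the slide notes that the flow is similar to CUDA, the same three steps can be sketched in CUDA C for a sigmoid kernel. This is a minimal sketch, not the Memkite code: the names sigmoid_kernel and sigmoid_on_gpu, the block size of 256, and the assumption n > 0 are illustrative choices.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stddef.h>

// One thread per element. The global thread index plays the role that
// thread_id plays in the Metal kernel.
__global__ void sigmoid_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 1.0f / (1.0f + expf(-in[i]));
    }
}

// Hypothetical host helper (assumes n > 0). Returns 0 on success, -1 on error.
int sigmoid_on_gpu(const float *h_in, float *h_out, int n) {
    float *d_in = NULL, *d_out = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    // 1. Copy data from the CPU to the GPU.
    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // 2. Compute on the GPU.
    if (err == cudaSuccess) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // round up
        sigmoid_kernel<<<blocks, threads>>>(d_in, d_out, n);
        err = cudaGetLastError();
    }

    // 3. Send data back from the GPU to the CPU. This copy waits for the kernel.
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

Calling sigmoid_on_gpu(in, out, n) from host code fills out with the element-wise sigmoid of in.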
CUDA Compiler
CUDA supports most Windows, Linux, and Mac OS compilers
For Linux:
• Red Hat
• OpenSUSE
• Ubuntu
• Fedora
CUDA C
Hello World!!
Host: CPU and its memory
Device: GPU and its memory
A Kernel Call
nvcc compiles the function kernel() for the device and feeds main() to the host compiler
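The slide's code is not preserved in this transcript. A minimal sketch of a kernel call looks roughly like the following. The launch is placed in a hypothetical helper named launch_empty_kernel (the slide puts it in main()) so that it can be called from any program.

```cuda
#include <cuda_runtime.h>

// __global__ marks a function that runs on the device and is launched
// from the host. nvcc compiles this function for the GPU.
__global__ void kernel(void) {
}

// Host code, which nvcc passes to the host compiler.
// Returns 0 on success, -1 on error.
int launch_empty_kernel(void) {
    kernel<<<1, 1>>>();   // launch configuration: 1 block of 1 thread
    if (cudaGetLastError() != cudaSuccess) return -1;
    // Kernel launches are asynchronous, so wait for completion.
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}
```

The triple angle brackets carry the launch configuration to the CUDA runtime. They are not arguments to the device function itself.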
Passing Parameters
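The code for this slide is missing from the transcript. As a hedged illustration of the idea, the sketch below passes two integers by value to a kernel and gets the result back through device memory. The names add and add_on_device are assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Scalars a and b are passed by value. c must point to device memory.
__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

// Hypothetical host helper. Returns 0 on success, -1 on error.
int add_on_device(int a, int b, int *result) {
    int *dev_c = NULL;
    if (cudaMalloc((void **)&dev_c, sizeof(int)) != cudaSuccess) return -1;

    add<<<1, 1>>>(a, b, dev_c);
    cudaError_t err = cudaGetLastError();

    // The host must not dereference dev_c. Copy the value back instead.
    if (err == cudaSuccess) {
        err = cudaMemcpy(result, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_c);
    return err == cudaSuccess ? 0 : -1;
}
```

Kernel parameters are passed like ordinary function arguments. The device must allocate memory for anything it returns to the host.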
Parallel Programming in CUDA C
Traditional C way
Executing on each of the two CPU cores
GPU way — I
GPU way — II
Blocks and Threads
2D hierarchy of blocks and threads
GPU way — III
GPU Blocks
GPU Threads — I
GPU Threads — II
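The listings for the block and thread slides are not preserved here. The sketch below shows one common way to combine the two levels: each thread derives a global index from blockIdx, blockDim, and threadIdx, then strides by the total number of launched threads so that vectors longer than the grid are still covered. The names add_vectors and add_vectors_on_gpu and the 128 by 128 launch shape are assumptions.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

__global__ void add_vectors(const int *a, const int *b, int *c, int n) {
    // Global index: position inside the block plus the block's offset.
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < n) {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;   // stride by total thread count
    }
}

// Hypothetical host helper. Returns 0 on success, -1 on error.
int add_vectors_on_gpu(const int *a, const int *b, int *c, int n) {
    size_t bytes = (size_t)n * sizeof(int);
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    add_vectors<<<128, 128>>>(dev_a, dev_b, dev_c, n);  // 128 blocks x 128 threads

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return err == cudaSuccess ? 0 : -1;
}
```

With a fixed launch shape, the stride loop keeps the kernel correct for any n, at the cost of some idle threads when n is small.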
CUDA on Mac OS X
Example: deviceQuery
Understand the hardware constraints via deviceQuery (included in the sample code of the CUDA Toolkit)
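The deviceQuery sample itself is much longer. A reduced sketch of the same idea, using the runtime calls cudaGetDeviceCount and cudaGetDeviceProperties, could look like this. The helper name print_device_limits and the choice of fields are assumptions.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Prints a few hardware limits for every CUDA device.
// Returns the number of devices, or -1 on error.
int print_device_limits(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Multiprocessors (SMs):   %d\n", prop.multiProcessorCount);
        printf("  Max threads per SM:      %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  Max block dimensions:    %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("  Max grid dimensions:     %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        printf("  Global memory (bytes):   %zu\n", prop.totalGlobalMem);
        printf("  Copy engines:            %d\n", prop.asyncEngineCount);
    }
    return count;
}
```

The product of the SM count and the maximum threads per SM gives the device-wide resident thread limit, and the copy engine count indicates how many transfers can overlap with computation.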
Example: Matrix Addition on CPU
Problem: Sum two matrices of size M by N: $C_{m \times n} = A_{m \times n} + B_{m \times n}$
In the traditional C/C++ implementation:
• A and B are the input matrices; N is the size of A and B.
• C is the output matrix.
• Matrices are stored in arrays in row-major order.
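The CPU listing is not preserved in this transcript. A minimal host-only sketch follows, written as plain C that also compiles as CUDA C. It assumes explicit column and row counts nx and ny, as on the GPU slides, instead of a single size parameter. The function name sumMatrixOnHost is illustrative.

```cuda
// Element-wise sum of two ny-by-nx matrices stored in row-major order.
// nx is the number of columns, ny is the number of rows (assumed names).
void sumMatrixOnHost(const float *A, const float *B, float *C, int nx, int ny) {
    for (int iy = 0; iy < ny; iy++) {          // rows
        for (int ix = 0; ix < nx; ix++) {      // columns
            int idx = iy * nx + ix;            // row-major offset
            C[idx] = A[idx] + B[idx];
        }
    }
}
```

The two nested loops are what the GPU versions replace with a grid of threads, one thread (or one column of work) per index.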
Example: Matrix Addition on GPU - 2D grid with 2D blocks
Problem: Sum two matrices of size M by N: $C_{m \times n} = A_{m \times n} + B_{m \times n}$
CUDA C implementation:
• matA and matB are the input matrices; nx is the column size and ny is the row size.
• matC is the output matrix.
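The kernel listing is missing from the transcript. Under the parameter names given above (matA, matB, matC, nx, ny), a 2D grid with 2D blocks might be sketched as follows. The kernel name, the helper sumMatrix2D2D, and the 16 by 16 block shape are assumptions.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Each thread handles one matrix element (ix, iy).
__global__ void sumMatrixOnGPU2D(const float *matA, const float *matB,
                                 float *matC, int nx, int ny) {
    int ix = threadIdx.x + blockIdx.x * blockDim.x;   // column
    int iy = threadIdx.y + blockIdx.y * blockDim.y;   // row
    if (ix < nx && iy < ny) {
        int idx = iy * nx + ix;                       // row-major offset
        matC[idx] = matA[idx] + matB[idx];
    }
}

// Hypothetical host helper. Returns 0 on success, -1 on error.
int sumMatrix2D2D(const float *hA, const float *hB, float *hC, int nx, int ny) {
    size_t bytes = (size_t)nx * ny * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                               // assumed block shape
    dim3 grid((nx + block.x - 1) / block.x,           // round up in x
              (ny + block.y - 1) / block.y);          // round up in y
    sumMatrixOnGPU2D<<<grid, block>>>(dA, dB, dC, nx, ny);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess ? 0 : -1;
}
```

The bounds check in the kernel matters because the grid is rounded up and usually covers more threads than matrix elements.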
Example: Matrix Addition on GPU - 2D grid with 2D blocks
Data access in a 2D grid with 2D blocks arrangement (in the figure, each green block is one thread block)
[Figure labels: blockDim.x, blockDim.y, threadIdx.x, threadIdx.y]
Example: Matrix Addition on GPU - 1D grid with 1D blocks
Data access in a 1D grid with 1D blocks arrangement (in the figure, each green block is one thread block)
[Figure labels: blockDim.x, threadIdx.x, iy, (ix, iy)]
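With a 1D grid of 1D blocks, threads only have an x index, so one plausible arrangement is for each thread to own a column ix and loop over the rows iy. The sketch below assumes that arrangement. The names sumMatrixOnGPU1D and sumMatrix1D1D and the block size of 32 are illustrative.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Each thread handles one column ix and walks down all ny rows.
__global__ void sumMatrixOnGPU1D(const float *matA, const float *matB,
                                 float *matC, int nx, int ny) {
    int ix = threadIdx.x + blockIdx.x * blockDim.x;
    if (ix < nx) {
        for (int iy = 0; iy < ny; iy++) {
            int idx = iy * nx + ix;
            matC[idx] = matA[idx] + matB[idx];
        }
    }
}

// Hypothetical host helper. Returns 0 on success, -1 on error.
int sumMatrix1D1D(const float *hA, const float *hB, float *hC, int nx, int ny) {
    size_t bytes = (size_t)nx * ny * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(32);                            // assumed block size
    dim3 grid((nx + block.x - 1) / block.x);   // enough blocks to cover nx columns
    sumMatrixOnGPU1D<<<grid, block>>>(dA, dB, dC, nx, ny);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess ? 0 : -1;
}
```

Compared with the 2D arrangement, this launches fewer threads and gives each one more work.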
Example: Matrix Addition on GPU - 2D grid with 1D blocks
Data access in a 2D grid with 1D blocks arrangement (in the figure, each green block is one thread block)
[Figure labels: blockDim.x, 1, threadIdx.x, (ix, iy)]
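In a 2D grid of 1D blocks, each block is one row high, so the row index can come directly from blockIdx.y. The following is a sketch under that assumption. The names sumMatrixOnGPUMix and sumMatrix2D1D and the block size of 32 are illustrative.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Blocks are blockDim.x wide and 1 high: ix comes from the thread and
// block x indices, iy is simply the block's y index.
__global__ void sumMatrixOnGPUMix(const float *matA, const float *matB,
                                  float *matC, int nx, int ny) {
    int ix = threadIdx.x + blockIdx.x * blockDim.x;
    int iy = blockIdx.y;
    if (ix < nx && iy < ny) {
        int idx = iy * nx + ix;
        matC[idx] = matA[idx] + matB[idx];
    }
}

// Hypothetical host helper. Returns 0 on success, -1 on error.
int sumMatrix2D1D(const float *hA, const float *hB, float *hC, int nx, int ny) {
    size_t bytes = (size_t)nx * ny * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(32);                                  // assumed block size
    dim3 grid((nx + block.x - 1) / block.x, ny);     // one grid row per matrix row
    sumMatrixOnGPUMix<<<grid, block>>>(dA, dB, dC, nx, ny);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess ? 0 : -1;
}
```

Because the grid's y dimension equals ny, this layout only works while ny stays within the device's maximum grid size in y.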
Example: Matrix Transpose on CPU
Problem: Transpose an M by N matrix into an N by M matrix: $B_{n \times m} = A_{m \times n}^{T}$
In the traditional C/C++ implementation:
• in is the input matrix; nx is the column size and ny is the row size.
• out is the output matrix.
• Matrices are stored in arrays in row-major order.
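The CPU listing is not preserved in this transcript. Using the parameter names above (in, out, nx, ny), a host-only sketch in plain C, which also compiles as CUDA C, could be written as follows. The function name transposeHost is an assumption.

```cuda
// Transposes a ny-by-nx row-major matrix "in" into the nx-by-ny
// row-major matrix "out".
void transposeHost(float *out, const float *in, int nx, int ny) {
    for (int iy = 0; iy < ny; iy++) {          // rows of in
        for (int ix = 0; ix < nx; ix++) {      // columns of in
            // Element (iy, ix) of in becomes element (ix, iy) of out.
            out[ix * ny + iy] = in[iy * nx + ix];
        }
    }
}
```

For example, with nx = 3 and ny = 2, the input {1, 2, 3, 4, 5, 6} becomes {1, 4, 2, 5, 3, 6}.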
Example: Matrix Transpose on GPU
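The GPU transpose listings are not preserved here. As one simple variant, assumed for illustration and not necessarily the version shown in the lecture, the sketch below reads the input by rows and writes the output by columns, one thread per element. The names transposeNaive and transposeOnGPU and the 16 by 16 block shape are illustrative.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Naive transpose: reads of "in" are contiguous along a row, writes to
// "out" are strided by ny.
__global__ void transposeNaive(float *out, const float *in, int nx, int ny) {
    int ix = threadIdx.x + blockIdx.x * blockDim.x;   // column of in
    int iy = threadIdx.y + blockIdx.y * blockDim.y;   // row of in
    if (ix < nx && iy < ny) {
        out[ix * ny + iy] = in[iy * nx + ix];
    }
}

// Hypothetical host helper. Returns 0 on success, -1 on error.
int transposeOnGPU(float *hOut, const float *hIn, int nx, int ny) {
    size_t bytes = (size_t)nx * ny * sizeof(float);
    float *dIn = NULL, *dOut = NULL;
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                               // assumed block shape
    dim3 grid((nx + block.x - 1) / block.x,
              (ny + block.y - 1) / block.y);
    transposeNaive<<<grid, block>>>(dOut, dIn, nx, ny);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : -1;
}
```

The strided writes are the usual performance concern with this form, which is why transpose is a common case study for memory access patterns.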
Example: Matrix Transpose on GPU
Example: Concurrent Processing
Handle data transfer and computation concurrently
• The NVIDIA GT 650M (laptop GPU) has one copy engine.
• The NVIDIA Tesla K40 (high-end GPU) has two copy engines.
The latency of data transfer can be hidden behind computation.
Example: handle two tasks, both of which are matrix multiplications.
• Each task copies two inputs to the GPU and copies one output from the GPU.
[Figure: execution timelines compared for "No concurrent processing" and "Concurrent processing"]
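The timeline figure is not preserved. One way to set up this kind of overlap is with CUDA streams: each task gets its own stream, and its copies and kernel are queued with cudaMemcpyAsync and a stream-specific launch. The sketch below assumes square n by n matrices and a naive kernel. The names matmul, ok, and two_matmuls_concurrently are illustrative.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Naive n-by-n matrix multiplication, one thread per output element.
__global__ void matmul(const float *A, const float *B, float *C, int n) {
    int col = threadIdx.x + blockIdx.x * blockDim.x;
    int row = threadIdx.y + blockIdx.y * blockDim.y;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++) sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

static int ok(cudaError_t e) { return e == cudaSuccess; }

// Hypothetical helper: task t computes hC[t] = hA[t] * hB[t] in stream t.
// Returns 0 on success, -1 on error.
int two_matmuls_concurrently(float *hA[2], float *hB[2], float *hC[2], int n) {
    size_t bytes = (size_t)n * n * sizeof(float);
    cudaStream_t stream[2] = {NULL, NULL};
    float *dA[2] = {NULL, NULL}, *dB[2] = {NULL, NULL}, *dC[2] = {NULL, NULL};
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    int good = 1;

    for (int t = 0; t < 2; t++) {
        good &= ok(cudaStreamCreate(&stream[t]));
        good &= ok(cudaMalloc((void **)&dA[t], bytes));
        good &= ok(cudaMalloc((void **)&dB[t], bytes));
        good &= ok(cudaMalloc((void **)&dC[t], bytes));
    }
    // Queue each task in its own stream: two copies in, kernel, one copy out.
    for (int t = 0; good && t < 2; t++) {
        good &= ok(cudaMemcpyAsync(dA[t], hA[t], bytes, cudaMemcpyHostToDevice, stream[t]));
        good &= ok(cudaMemcpyAsync(dB[t], hB[t], bytes, cudaMemcpyHostToDevice, stream[t]));
        matmul<<<grid, block, 0, stream[t]>>>(dA[t], dB[t], dC[t], n);
        good &= ok(cudaGetLastError());
        good &= ok(cudaMemcpyAsync(hC[t], dC[t], bytes, cudaMemcpyDeviceToHost, stream[t]));
    }
    good &= ok(cudaDeviceSynchronize());   // wait for both streams

    for (int t = 0; t < 2; t++) {
        cudaFree(dA[t]);
        cudaFree(dB[t]);
        cudaFree(dC[t]);
        cudaStreamDestroy(stream[t]);
    }
    return good ? 0 : -1;
}
```

The results are correct with ordinary host buffers, but copies can only overlap with kernels when the host memory is page-locked (for example, allocated with cudaMallocHost). How much overlap is achieved also depends on the number of copy engines.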
Reference
Professional CUDA C Programming
http://www.wrox.com/WileyCDA/WroxTitle/Professional-CUDA-C-Programming.productCd-1118739329,descCd-DOWNLOAD.html
Source code is available on the above website.
Homework #3 (due March 31st)
Choose 2 of the algorithms you used in your Homework #2. Convert them into GPU versions.
Show your code and performance measurements compared with your non-GPU version.
The more innovative or complex your algorithms are, the higher the score you will get on your Homework #3.