CUDA Development in Python Language
![Page 1: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/1.jpg)
CUDA Development using PyCUDA
Part 1
Prof. Mario A. Gazziro (Yah!)
Organization: Prof. André Carvalho
Support: Igor, Heitor, Pedro, Ruan and Andre
July 2012
![Page 2: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/2.jpg)
What does 1 TeraFlop/s mean?

| Computer | TeraFlop/s | Year of installation | Price (US$) | Institute |
| --- | --- | --- | --- | --- |
| PowerPC 970 | 3 | 2007 | 500,000 | USP (CCE) |
| GPU cluster Attilio * | 16 | 2009 | 100,000 | IFSC |
| BlueGene/L | 26 (49) | 2007 (2010) | Donation | IINN |
| TUPÃ | 258 | 2010 | 25,000,000 | INPE |

* The only parallel computer; all the others are serial ones.
![Page 3: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/3.jpg)
Installation – Step 1
DRIVER:
sudo /etc/init.d/gdm stop
<ALT+F1>
<log in as labredes, password 12345678>
chmod 777 devdriver_4.2_linux_64_295.41.run
sudo ./devdriver_4.2_linux_64_295.41.run
<accept everything>
sudo /etc/init.d/gdm start
![Page 4: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/4.jpg)
Installation – Step 2
TOOLKIT:
chmod 777 ./cudatoolkit_4.2.9_linux_64_ubuntu10.04.run
sudo ./cudatoolkit_4.2.9_linux_64_ubuntu10.04.run
<accept all options>
<append the text below to the end of the .bashrc file>
cd ~
gedit .bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LPATH=/usr/lib/nvidia-current:$LPATH
export LIBRARY_PATH=/usr/lib/nvidia-current:$LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/nvidia-current:/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
<save and exit>
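Once .bashrc has been edited and a new terminal opened, a quick check can confirm that the CUDA directories are visible. The sketch below is only an illustration: the helper name `missing_cuda_entries` is hypothetical (it is not part of the CUDA toolkit), and the directory list is copied from the export lines on this slide.

```python
import os

# Directories that the export lines on this slide are expected to add.
EXPECTED = {
    "PATH": ["/usr/local/cuda/bin"],
    "LD_LIBRARY_PATH": [
        "/usr/lib/nvidia-current",
        "/usr/local/cuda/lib64",
        "/usr/local/cuda/lib",
    ],
}

def missing_cuda_entries(env):
    """Return {variable: [expected directories not found]} for a mapping like os.environ."""
    missing = {}
    for var, dirs in EXPECTED.items():
        # Split the colon-separated value and ignore empty entries and trailing slashes.
        present = [p.rstrip("/") for p in env.get(var, "").split(":") if p]
        absent = [d for d in dirs if d not in present]
        if absent:
            missing[var] = absent
    return missing

if __name__ == "__main__":
    # An empty dictionary means every expected directory was found.
    print(missing_cuda_entries(os.environ) or "all CUDA entries found")
```

An empty result suggests the variables were picked up; any directory listed in the result points at the export line to re-check.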
![Page 5: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/5.jpg)
Installation – Step 3
SDK:
chmod 777 gpucomputingsdk_4.2.9_linux.run
<DO NOT USE SUDO!>
./gpucomputingsdk_4.2.9_linux.run
<accept all options>
<close all terminal windows so the environment variables are reloaded>
<test the compiler>
nvcc
<the following error message should appear>
nvcc fatal : No input files specified; use option --help for more information
![Page 6: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/6.jpg)
Installation – Step 4
cd ~/NVIDIA_GPU_Computing_SDK/
make
<wait>
cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/
ls
./SobelFilter
./particles
<click with mouse button 2 and select "move cursor mode">
<move the mouse>
./fluidsGL
<CLOSE ALL>
./particles
<CTRL+Z>
bg
./fluidsGL
<observe the concurrent execution of the kernels>
<observe how the GPU resources are shared>
![Page 7: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/7.jpg)
CUDA SDK sample applications
![Page 8: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/8.jpg)
CUDA SDK sample applications
![Page 9: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/9.jpg)
Installation – Step 5
<INSTALLING THE PYTHON DEPENDENCIES>
sudo apt-get install python-numpy
sudo apt-get install python-h5py
sudo apt-get install python-scipy
sudo apt-get install python-matplotlib
<test the graphical interface>
cd ~/"Área de Trabalho/CUDA"
python ./teste_grafico.py
HDF5 viewer:
chmod 777 hdfview_install_linux64.bin
./hdfview_install_xxx.bin
./hdfview
![Page 10: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/10.jpg)
Installation – Step 6
PyCUDA:
sudo apt-get install build-essential python-dev python-setuptools libboost-python-dev libboost-thread-dev -y
tar xzvf pycuda-2011.2.2.tar.gz
cd pycuda-2011.2.2/
./configure.py --cuda-root=/usr/local/cuda --cudadrv-lib-dir=/usr/lib --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib --boost-python-libname=boost_python-mt --boost-thread-libname=boost_thread-mt --no-use-shipped-boost
make -j 4
<delete siteconf.py in case of error!!!>
sudo env PATH=$PATH python setup.py install
<test PyCUDA>
cd ~/"Área de Trabalho/CUDA/pycuda-2011.2.2/examples"
python ./demo.py
<observe the manipulation of arrays and matrices>
![Page 11: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/11.jpg)
Part I: Overview
Definition:
Graphics Processing Units (GPUs) are graphics adapter cards that give programmers access to their internal hardware through an API (Application Programming Interface). Today there are even GPUs without any graphics output, built only to perform scientific calculations.
Introduced in 2006, the Compute Unified Device Architecture (CUDA) is a combined software and hardware architecture (available for NVIDIA G80 GPUs and above) that enables data-parallel general-purpose computing on graphics hardware. It offers a C-like programming API with some language extensions.
Key Points:
The architecture supports massively multithreaded applications and provides mechanisms for inter-thread communication and memory access.
![Page 12: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/12.jpg)
Why is this topic important?
Data-intensive problems challenge conventional computing architectures with demanding CPU, memory, and I/O requirements.
Emerging hardware technologies, such as the CUDA architecture, can significantly boost the performance of a wide range of applications by increasing compute cycles and bandwidth and by reducing latency.
![Page 13: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/13.jpg)
Where would I encounter this?
Gaming
Raytracing
3D Scanners
Computer Graphics
Number Crunching
Scientific Calculation
![Page 14: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/14.jpg)
CUDA vs Intel
NVIDIA GeForce 8800 GTX vs Intel Xeon E5335 2GHz, L2 cache 8MB
![Page 15: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/15.jpg)
Grid of thread blocks
The computational grid consists of a grid of thread blocks
Each thread executes the kernel
The application specifies the grid and block dimensions
The grid layouts can be 1-, 2- or 3-dimensional
The maximum sizes are limited by the GPU hardware (its compute capability)
Each block has a unique block ID
Each thread has a unique thread ID (within the block)
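The way block and thread IDs combine into a global index can be sketched on the CPU. The plain-Python snippet below is only an illustration of the indexing rule for a 1-dimensional launch (the function name `global_indices` and the sizes are assumptions chosen for the example); on the GPU the same value is computed per thread as `blockIdx.x * blockDim.x + threadIdx.x`.

```python
from itertools import product

def global_indices(grid_dim, block_dim):
    """Yield (block_id, thread_id, global_id) for every thread of a 1-D launch.

    Mirrors the CUDA expression blockIdx.x * blockDim.x + threadIdx.x.
    """
    for block_id, thread_id in product(range(grid_dim), range(block_dim)):
        yield block_id, thread_id, block_id * block_dim + thread_id

# Example launch: 3 blocks of 4 threads each (sizes chosen only for illustration).
for block_id, thread_id, gid in global_indices(grid_dim=3, block_dim=4):
    print(f"block {block_id}, thread {thread_id} -> global index {gid}")
```

The thread ID repeats in every block, while the global index is unique across the whole grid, which is why kernels usually address data through the combined value.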
![Page 16: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/16.jpg)
Elementwise Matrix Addition
![Page 17: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/17.jpg)
Elementwise Matrix Addition
The nested for-loops are replaced with an implicit grid
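To make the "implicit grid" idea concrete, here is a CPU-only sketch in plain Python. The `launch` function is a hypothetical, sequential stand-in for a real kernel launch, and `add_kernel` plays the role of the kernel body; the matrix and block sizes are arbitrary. On a GPU the four loops inside `launch` do not exist in user code: the hardware runs one thread per (block, thread) pair.

```python
def add_kernel(a, b, c, rows, cols, block_idx, thread_idx, block_dim):
    """Body executed by ONE thread: it handles a single matrix element."""
    col = block_idx[0] * block_dim[0] + thread_idx[0]
    row = block_idx[1] * block_dim[1] + thread_idx[1]
    if row < rows and col < cols:  # guard threads that fall outside the matrix
        c[row][col] = a[row][col] + b[row][col]

def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a kernel launch: visit every (block, thread) pair."""
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    kernel(*args, (bx, by), (tx, ty), block_dim)

rows, cols = 5, 7
a = [[r * cols + j for j in range(cols)] for r in range(rows)]
b = [[1] * cols for _ in range(rows)]
c = [[0] * cols for _ in range(rows)]

block_dim = (4, 4)
# Ceiling division so that the grid covers the whole matrix.
grid_dim = ((cols + block_dim[0] - 1) // block_dim[0],
            (rows + block_dim[1] - 1) // block_dim[1])
launch(add_kernel, grid_dim, block_dim, a, b, c, rows, cols)
print(c[0])
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads land outside the matrix and must do nothing.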
![Page 18: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/18.jpg)
Memory model
CUDA exposes all the different types of memory on the GPU:
![Page 19: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/19.jpg)
Part II: Classroom Exercises – HEAT (local)
Compile and run the heat.cu example
Command line: nvcc heat.cu -o heat -lglut
![Page 20: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/20.jpg)
Part II: Classroom Exercises – HEAT (remote)
Compile and run the heat.cu example on the HAL9k GPU server (IFSC/CIERMag)
cmd: ssh -X [email protected] -p 2236
What changes?
Why does this change happen?
What is the influence of network latency on the final result?
![Page 21: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/21.jpg)
Part II: Classroom Exercises – EXERC1
Type, compile and test the following code. What does this program do?
Command line: gcc exerc1.c -o exerc1
![Page 22: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/22.jpg)
Part II: Classroom Exercises – EXERC2
Type, compile and test the following code. What does this program do?
Command line: gcc exerc2.c -o exerc2
![Page 23: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/23.jpg)
Part II: Classroom Exercises – EXERC3
Type, compile and test the following code. What does this program do?
Command line: nvcc exerc3.cu -o exerc3
![Page 24: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/24.jpg)
Part II: Classroom Exercises – EXERC4
![Page 25: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/25.jpg)
Part II: Classroom Exercises – EXERC5
![Page 26: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/26.jpg)
Part II: Classroom Exercises – EXERC6 – part A
![Page 27: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/27.jpg)
Part II: Classroom Exercises – EXERC6 – part B
![Page 28: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/28.jpg)
Part III: Project
Case Study: Initial calculation for solving a sparse matrix in the method proposed by Professor Guilherme Sipahi, from IFSC
N=1001;
K(1:N) = rand(1,N);
g1(1:2*N) = rand(1,2*N);
k = 1.3;
tic;
for i=1:N
  for j=1:N
    M(i,j) = g1(N+i-j)*(K(i)+k)*(K(j)+k);
  end
end
Task: Design the CUDA kernel for this algorithm (using PyCUDA or C) and compare its speed-up against the gold-standard implementation provided by the professor.
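As a starting point for the task, the sketch below translates the double loop to 0-based plain Python and then recasts it as "one thread per matrix element". It is a CPU-only illustration, not a solution to the exercise: the function names, the flat thread-to-element mapping and the block size of 32 are assumptions, and a small N is used so that it runs quickly.

```python
import random

def reference(N, K, g1, k):
    """Direct translation of the MATLAB double loop (0-based indices)."""
    M = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            # MATLAB g1(N+i-j) with 1-based i, j becomes g1[N + i - j - 1] here.
            M[i][j] = g1[N + i - j - 1] * (K[i] + k) * (K[j] + k)
    return M

def element(i, j, N, K, g1, k):
    """Work of a single thread: one element of M, independent of all others."""
    return g1[N + i - j - 1] * (K[i] + k) * (K[j] + k)

def emulate_launch(N, K, g1, k, threads_per_block=32):
    """Sequential emulation of a 1-D launch with one thread per element."""
    M = [[0.0] * N for _ in range(N)]
    n_blocks = (N * N + threads_per_block - 1) // threads_per_block  # ceiling division
    for block in range(n_blocks):
        for thread in range(threads_per_block):
            gid = block * threads_per_block + thread  # global thread index
            if gid < N * N:                           # skip the padding threads
                i, j = divmod(gid, N)                 # row and column of this thread
                M[i][j] = element(i, j, N, K, g1, k)
    return M

random.seed(0)  # fixed seed so repeated runs use the same inputs
N = 8           # small size for illustration; the slide uses N = 1001
k = 1.3
K = [random.random() for _ in range(N)]
g1 = [random.random() for _ in range(2 * N)]
print(emulate_launch(N, K, g1, k) == reference(N, K, g1, k))
```

Because every element depends only on `g1`, `K` and its own (i, j), no communication between threads is needed, which is what makes this loop a good candidate for a CUDA kernel.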
![Page 29: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/29.jpg)
Part III: Project Best Solution (Mateus and Bié)
Case Study: Initial calculation for solving a sparse matrix in the method proposed by Professor Guilherme Sipahi, from IFSC
BLOCK(16, 2, 1)
GRID(1000/32, 1000)
370 µs with 16 CUDA cores
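The slide does not say how 1000/32 is rounded or how threads map to matrix elements. Under the assumptions of integer division and one matrix element per thread, the following plain-Python arithmetic shows how many threads such a launch configuration creates, and how a ceiling division would size the grid. The helper names are hypothetical.

```python
def coverage(n, block=(16, 2, 1), grid=(1000 // 32, 1000)):
    """Return (threads per block, total threads launched, elements in an n x n matrix)."""
    threads_per_block = block[0] * block[1] * block[2]
    total_threads = threads_per_block * grid[0] * grid[1]
    return threads_per_block, total_threads, n * n

def ceil_div(a, b):
    """Smallest number of blocks of size b needed to cover a items."""
    return (a + b - 1) // b

threads_per_block, total, needed = coverage(1000)
print(threads_per_block, total, needed)
print(ceil_div(1000, 32), ceil_div(1001, 32))
```

If the division truncates, the grid launches fewer threads than there are elements, so either each thread handles more than one element or the grid dimension is rounded up and the kernel guards against out-of-range indices.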
![Page 30: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/30.jpg)
Questions?
So long, and thanks for all the fish! See you tomorrow!
2nd day activities:
- Database integration (HDF5)
- Graphics visualization (Matplotlib)
- Thread syncs and thread fences
- Atomic operations and critical region control
![Page 31: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/31.jpg)
![Page 32: CUDA Development in Python Language](https://reader035.vdocuments.us/reader035/viewer/2022062419/558e1b571a28abbd5b8b460e/html5/thumbnails/32.jpg)