ssh glogin.ibex.kaust.edu · 2018-09-26 · module avail gpu software: modules dev core apps pgi...



Getting Started: GPU Login

● ssh glogin.ibex.kaust.edu.sa
● First login auto-generates keys & ssh config
  – ~/.ssh/config:

    # GPU login nodes
    Host glogin
        Hostname glogin.ibex.kaust.edu.sa
        User $USER
        IdentityFile ~/.ssh/ksl-internal
        StrictHostKeyChecking no
        ForwardX11 yes
        ForwardX11Trusted yes

https://www.hpc.kaust.edu.sa/ibex/new_user
https://www.hpc.kaust.edu.sa/ibex/faq
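A quick way to confirm what the `glogin` entry sets is to parse the config file. The Python sketch below is a simplified stand-in for OpenSSH's own parser and makes these assumptions:

- one `Keyword value` (or `Keyword=value`) pair per line;
- only full-line `#` comments count as comments;
- options before the first `Host` line are ignored;
- `Match`, `Include`, quoting and wildcards are not supported.

`parse_ssh_config` and the user name `alice` are hypothetical.

```python
import re
from typing import Dict, Optional


def parse_ssh_config(text: str) -> Dict[str, Dict[str, str]]:
    """Map each Host pattern to its options (simplified, hypothetical helper)."""
    hosts: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and full-line comments
        # Accept both "Keyword value" and "Keyword=value"
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword.lower() == "host":
            current = {}
            for pattern in value.split():  # one Host line may list several patterns
                hosts[pattern] = current
        elif current is not None:
            current.setdefault(keyword, value)  # keep the first value seen
    return hosts


SAMPLE = """\
# GPU login nodes
Host glogin
    Hostname glogin.ibex.kaust.edu.sa
    User alice
    ForwardX11 yes
"""
print(parse_ssh_config(SAMPLE)["glogin"]["Hostname"])  # glogin.ibex.kaust.edu.sa
```

Keeping the first value mirrors ssh_config's "first obtained value wins" rule for most keywords.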


GPU Software: Modules

● Modules
  – Customized to login node (GPU, Intel, AMD)
● glogin: /sw/csg/modulefiles/*
  – Improved GPU App Stack is here
  – Prefer default modules (/sw/csg/modulefiles/*)
    ● /cbrc/modules/* will be deprecated
● Make requests: [email protected]
● Stay connected: https://kaust-ibex.slack.com/
  – #announce, #general, #gpu


GPU Software: Modules

● Modules
    module avail
    module load module/version

https://www.hpc.kaust.edu.sa/ibex/app (Nvidia / show all)
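Pinning `module/version` keeps a job reproducible. A bare `module load name` generally picks whatever version the module system treats as the default. When a script has to choose a version itself, compare versions numerically rather than as strings: as strings, `pgi/17.4` sorts after `pgi/17.10`.

The Python sketch below handles dotted numeric versions only. The helper names and the version list are hypothetical.

```python
from typing import List, Tuple


def version_key(spec: str) -> Tuple[int, ...]:
    """Turn 'pgi/17.10' into (17, 10) so versions compare numerically."""
    _, _, version = spec.partition("/")
    return tuple(int(part) for part in version.split("."))


def newest(specs: List[str], name: str) -> str:
    """Highest-versioned 'name/version' entry for one module (hypothetical helper)."""
    candidates = [s for s in specs if s.partition("/")[0] == name]
    if not candidates:
        raise ValueError(f"no versions of {name!r} in the list")
    return max(candidates, key=version_key)


# Hypothetical list of module/version strings (check `module avail` for real ones)
available = ["gcc/6.4.0", "pgi/17.4", "pgi/17.10"]
print(newest(available, "pgi"))                           # pgi/17.10
print(max(s for s in available if s.startswith("pgi/")))  # pgi/17.4 (string order)
```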


GPU Software: Modules

module avail

DEV: pgi (OpenACC), gcc, intel, cmake, git, java, maven
CORE: NVIDIA (OpenGL / EGL), cuda, cudnn, nccl, openmpi
APPS: anaconda3, machine_learning, tensorflow, keras, torch, caffe*, caffe2, theano*, scipy, numpy, scikit-learn, etc.
      paraview, bcl2fastq2, cp2k, gromacs, lammps, mapd, namd, pyspark, relion, seismic_unix, sphire

* NVIDIA EGL supported; X11+GL support is missing...


GPU Jobs + Constraints

● sinfo --partition=batch --format="%n %f" | fgrep -v nogpu
● dgpu501-22-r  cpu_intel_e5_2670,gpu,...,tesla_k40m
  dgpu502-01-l  cpu_intel_e5_2670,gpu,...,tesla_k20m
  dgpu702-16    cpu_intel_e5_2699_v3,gpu,...,gtx1080ti
  dgpu703-01    cpu_intel_e5_2699_v3,gpu,...,p100
  dgpu703-25    cpu_intel_e5_2699_v3,gpu,...,p6000

https://www.hpc.kaust.edu.sa/ibex/job
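To see which GPU models the `sinfo` listing offers, the `%n %f` output can be grouped by feature. The Python sketch below assumes one node per line. It treats only the GPU feature names visible on the slide as models, held in `GPU_MODELS`, a hypothetical hand-maintained set.

The sample rows reuse node names from the slide with the elided features dropped. The `cn-example` row is invented to exercise the `nogpu` filter.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical: GPU feature names copied from the sample output on the slide
GPU_MODELS = {"tesla_k20m", "tesla_k40m", "gtx1080ti", "p100", "p6000"}


def gpu_nodes_by_model(sinfo_output: str) -> Dict[str, List[str]]:
    """Group node names by GPU feature from `sinfo --format="%n %f"` output."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in sinfo_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        node, features = fields
        feature_set = set(features.split(","))
        if "nogpu" in feature_set:
            continue  # same filter as `fgrep -v nogpu`
        # The header row (HOSTNAMES AVAIL_FEATURES) matches no model and adds nothing
        for model in sorted(feature_set & GPU_MODELS):
            groups[model].append(node)
    return dict(groups)


SAMPLE = """HOSTNAMES AVAIL_FEATURES
dgpu501-22-r cpu_intel_e5_2670,gpu,tesla_k40m
dgpu502-01-l cpu_intel_e5_2670,gpu,tesla_k20m
dgpu703-01 cpu_intel_e5_2699_v3,gpu,p100
cn-example cpu_intel_e5_2680,nogpu
"""
print(gpu_nodes_by_model(SAMPLE))
```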


GPU Jobs + Constraints

● srun --time=30:00 --mem=64GB --gres=gpu:p100:1 --pty bash -l
● sbatch --time=60:00 --mem=128GB --gres=gpu:2 --constraint="[p100|p6000]" runjob.sbat

https://www.hpc.kaust.edu.sa/ibex/job
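The two commands differ in how they pin hardware:

- `--gres=gpu:p100:1` names a GPU type directly.
- `--gres=gpu:2` combined with `--constraint="[p100|p6000]"` accepts either feature. The square brackets ask Slurm to give every allocated node the same one.

When jobs are submitted from a script, building the argument list programmatically avoids spacing mistakes like a missing space before `--gres`. `sbatch_args` below is a hypothetical helper. `shlex.join` (Python 3.8+) is used only to print the command; passing the list straight to `subprocess.run` needs no shell quoting.

```python
import shlex
from typing import List, Optional, Sequence


def sbatch_args(script: str, time: str, mem: str, gpus: int,
                gpu_type: Optional[str] = None,
                any_of: Sequence[str] = ()) -> List[str]:
    """Assemble an sbatch argument list (hypothetical helper)."""
    gres = f"gpu:{gpu_type}:{gpus}" if gpu_type else f"gpu:{gpus}"
    args = ["sbatch", f"--time={time}", f"--mem={mem}", f"--gres={gres}"]
    if any_of:
        # "[a|b]": all allocated nodes share one of the listed features
        args.append("--constraint=[" + "|".join(any_of) + "]")
    args.append(script)
    return args


cmd = sbatch_args("runjob.sbat", "60:00", "128GB", 2, any_of=["p100", "p6000"])
print(shlex.join(cmd))
# sbatch --time=60:00 --mem=128GB --gres=gpu:2 '--constraint=[p100|p6000]' runjob.sbat
```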


GPU Jobs + Constraints

● sbatch --time=60:00 runjob.sbat
● runjob.sbat:

    #!/bin/bash
    #SBATCH --job-name=gpujob
    #SBATCH --gres=gpu:gtx1080ti:4
    #SBATCH --constraint="[local_500G]"
    #SBATCH --mem=128GB
    #SBATCH --nodes=2 --ntasks-per-node=2

https://www.hpc.kaust.edu.sa/ibex/job
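The requests in runjob.sbat are per node: Slurm applies `--gres` and `--mem` to each allocated node, so with `--nodes=2` the job receives twice those amounts. The small Python sketch below spells out that arithmetic. The class is hypothetical and only multiplies the values from the script. How the GPUs are split between tasks is left to the application.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    nodes: int
    ntasks_per_node: int
    gpus_per_node: int    # from --gres=gpu[:type]:N, applied per node
    mem_gb_per_node: int  # --mem is also a per-node amount

    def total_tasks(self) -> int:
        return self.nodes * self.ntasks_per_node

    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    def gpus_per_task(self) -> float:
        return self.gpus_per_node / self.ntasks_per_node

    def total_mem_gb(self) -> int:
        return self.nodes * self.mem_gb_per_node


# Values from runjob.sbat: --nodes=2 --ntasks-per-node=2 --gres=gpu:...:4 --mem=128GB
job = JobRequest(nodes=2, ntasks_per_node=2, gpus_per_node=4, mem_gb_per_node=128)
print(job.total_tasks(), job.total_gpus(), job.gpus_per_task(), job.total_mem_gb())
# 4 8 2.0 256
```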


GPU Software: Modules & Compilers

● CMake
  – module load cmake
● C++
  – System default: GCC v4.8.5
  – module load gcc/6.4.0
  – module load pgi/17.10


GPU Software: Modules & Compilers

● CUDA
  – module load cuda
  – nvcc -std=c++11 -o example example.cu
● cuDNN
  – module load cudnn
  – nvcc -std=c++11 -o example example.cu -lcudnn


GPU Visualization Analytic Apps

● ParaView (HPC visualization / analytics)
  – module load paraview
  – https://wiki.vis.kaust.edu.sa/training/2017-18/advancedparaviewworkshop

● MapD (GPU Database)
  – Available for early-user testing...

https://wiki.vis.kaust.edu.sa/training


GPU Python Environments

● anaconda3
  – module load anaconda3
  – conda list
  – ipython
● Custom Python environments:
  – conda --help
  – https://conda.io/docs/_downloads/conda-cheatsheet.pdf
  – https://conda.io/docs/
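After `module load anaconda3` or activating a custom environment, it is worth checking which interpreter `python` and `ipython` will actually use. The sketch below reads `CONDA_DEFAULT_ENV` and `CONDA_PREFIX`, which `conda activate` exports.

Whether the anaconda3 module sets these variables too is site-specific. A result of `None` may simply mean the base install is on `PATH` without activation. `active_conda_env` is a hypothetical helper.

```python
import os
import sys
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str]) -> Optional[str]:
    """Name of the active conda environment, or None if none is detected."""
    name = environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    if name:
        return name
    prefix = environ.get("CONDA_PREFIX")     # path of the active environment
    return os.path.basename(prefix.rstrip("/")) if prefix else None


print("interpreter:", sys.executable)
print("prefix:", sys.prefix)
print("conda env:", active_conda_env(os.environ))
```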


GPU Machine Learning Apps

● machine_learning
  – module av machine_learning
    ● <year>.<num>-cudnn<ver>-cuda<ver>-py<ver>
  – module load machine_learning
  – conda list
  – Contains:
    ● TensorFlow, Keras, Caffe2, Torch, etc. + numpy, scipy, scikit-learn, pandas, matplotlib, etc.
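The `<year>.<num>-cudnn<ver>-cuda<ver>-py<ver>` scheme encodes the whole toolchain in the module version, so a job script can check it before loading. The Python sketch below uses a regular expression. The module name in the example is hypothetical; take real names from `module av machine_learning`.

```python
import re
from typing import Dict, Optional

# <year>.<num>-cudnn<ver>-cuda<ver>-py<ver>, optionally prefixed by the module name
ML_MODULE = re.compile(
    r"^(?:machine_learning/)?"
    r"(?P<release>\d{4}\.\d+)"
    r"-cudnn(?P<cudnn>[\d.]+)"
    r"-cuda(?P<cuda>[\d.]+)"
    r"-py(?P<python>[\d.]+)$"
)


def parse_ml_module(name: str) -> Optional[Dict[str, str]]:
    """Split a machine_learning module version into parts (None if no match)."""
    match = ML_MODULE.match(name)
    return match.groupdict() if match else None


# Hypothetical name that follows the scheme
print(parse_ml_module("machine_learning/2018.09-cudnn7.1-cuda9.2-py3.6"))
```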


GPU Machine Learning Apps

● tensorflow
  – module load tensorflow
  – ipython
      >>> import tensorflow as tf
  – python <model.py>
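Before `import tensorflow`, a batch job can check whether it actually received GPUs. On Slurm clusters configured for GPU GRES, Slurm sets `CUDA_VISIBLE_DEVICES` to the GPUs allocated with `--gres`. CUDA, and therefore TensorFlow, only sees those devices. `visible_gpus` below is a hypothetical helper in a minimal Python sketch.

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(environ: Mapping[str, str]) -> Optional[List[str]]:
    """GPU ids exposed via CUDA_VISIBLE_DEVICES.

    None means the variable is unset (CUDA would see every GPU on the node);
    an empty list means it is set but empty (CUDA sees no GPUs).
    """
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [dev.strip() for dev in value.split(",") if dev.strip()]


gpus = visible_gpus(os.environ)
if gpus is None:
    print("CUDA_VISIBLE_DEVICES unset: possibly not inside a GPU allocation")
elif not gpus:
    print("No GPUs visible; TensorFlow would run on the CPU")
else:
    print(f"{len(gpus)} GPU(s) visible: {', '.join(gpus)}")
```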


GPU Performance Tools

● General Information (not scalable)

– nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 0000:0D:00.0     Off |                  N/A |
| 37%   56C    P2   153W / 189W |    135MiB /  6081MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  On   | 0000:0E:00.0     Off |                  N/A |
| 31%   47C    P8    34W / 189W |      2MiB /  6082MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     72633      C   ../../build.cudnntraining.teneen/trainlenet   133MiB |
+-----------------------------------------------------------------------------+

KSL provides profiling training...
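Rather than screen-scraping the table above, nvidia-smi can emit CSV with `--query-gpu=...` and `--format=csv,noheader,nounits`. The Python sketch below builds that command and parses its output. The chosen field list is an assumption.

The sample text is hypothetical: the GPU names are placeholders, and the numbers come from the table above. Fields that a GPU reports as `[Not Supported]` would need extra handling before `int()`.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]
# On a GPU node: subprocess.run(QUERY_CMD, capture_output=True, text=True).stdout
QUERY_CMD = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
             "--format=csv,noheader,nounits"]


@dataclass
class GpuStatus:
    index: int
    name: str
    util_pct: int
    mem_used_mib: int
    mem_total_mib: int


def parse_query_gpu(text: str) -> List[GpuStatus]:
    """Parse CSV rows whose columns follow FIELDS."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [GpuStatus(int(r[0]), r[1], int(r[2]), int(r[3]), int(r[4]))
            for r in rows if r]  # skip blank lines


sample = "0, ExampleGPU, 86, 135, 6081\n1, ExampleGPU, 0, 2, 6082\n"
for gpu in parse_query_gpu(sample):
    print(f"GPU {gpu.index}: {gpu.util_pct}% util, "
          f"{gpu.mem_used_mib}/{gpu.mem_total_mib} MiB")
```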


GPU Performance Monitoring

● Modify Batch Script:

    #SBATCH ...

    # After the SBATCH section, but before running the main program:
    # redirect nvidia-smi logging into a *.log file.
    # nvidia-smi must run in the background; remember its PID.
    nvidia-smi dmon >> gpu-dmon.log &
    DMON_PID=$!

    # Run the primary GPU application here...
    # Don't run the primary application in the background.

    # After the primary GPU application finishes,
    # kill the nvidia-smi monitor to allow the batch job to terminate early.
    kill $DMON_PID

● View / Truncate logs

    tail -f gpu-dmon.log
    truncate --size=0 gpu-dmon.log

For Testing ONLY, NOT for Production
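The same background-monitor pattern can be expressed in Python when a job is driven by a Python launcher. This sketch is an alternative, not part of the original batch script. It keeps a handle on the monitor process and stops only that process; a blanket `pkill nvidia-smi` would also match any other `nvidia-smi` the user runs on the node.

`run_with_monitor` is a hypothetical helper. On a GPU node the monitor command would be `["nvidia-smi", "dmon"]`; the demo uses Python stand-ins so it runs anywhere.

```python
import subprocess
import sys
from typing import Sequence


def run_with_monitor(main_cmd: Sequence[str], monitor_cmd: Sequence[str],
                     log_path: str) -> int:
    """Run main_cmd in the foreground while monitor_cmd appends to log_path."""
    with open(log_path, "a") as log:
        # Background monitor, like `nvidia-smi dmon >> gpu-dmon.log &`
        monitor = subprocess.Popen(monitor_cmd, stdout=log,
                                   stderr=subprocess.STDOUT)
        try:
            return subprocess.run(main_cmd).returncode  # foreground application
        finally:
            monitor.terminate()  # SIGTERM on POSIX, like `kill $DMON_PID`
            try:
                monitor.wait(timeout=10)
            except subprocess.TimeoutExpired:
                monitor.kill()
                monitor.wait()


if __name__ == "__main__":
    ticker = [sys.executable, "-c",
              "import time\nwhile True:\n    print('tick', flush=True)\n    time.sleep(0.05)"]
    work = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    print("exit code:", run_with_monitor(work, ticker, "gpu-dmon.log"))
```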
