internet access - · 2017-03-26 · – nvidia® gpus in amazon aws cloud services train smarter...
TRANSCRIPT
1
INTERNET ACCESS
WiFi Access:
SSID: HPCSAUDI
Password: HPCSaudi@KAUST
Guest -> guest
Eduroam
2
NAVIGATING TO QWIKLABS
1. Navigate to: https://nvlabs.qwiklab.com
2. Login or create a new account
3
HPC SA DLI Workshop
Gunter Roeth [email protected]
Deep Learning Institute Approved Instructor
NVIDIA Corporation
4
AGENDA NVIDIA DLI
• 1:30 – 2:00 Workshop Overview
• 2:00 – 3:00 Getting Started with Deep Learning Lab
• 3:00 – 3:30 Break
• 3:30 – 5:00 Approaches to Object Detection using DIGITS Lab
5
NVIDIA DEEP LEARNING INSTITUTE
Training organizations and individuals to solve challenging problems using Deep Learning
On-site workshops and online courses presented by certified experts
Covering complete workflows for proven application use cases
Self-driving cars, recommendation engines, medical image classification, intelligent video analytics and more
www.nvidia.com/dli
Hands-on Training for Data Scientists and Software Engineers
6
TAKE THE SURVEY. GET LAB CREDITS.
Want more training from NVIDIA?
Here’s how to get it:
1. We’ll send you an email with a survey link
2. Complete and submit the survey
3. We’ll add credits to your Qwiklabs account
8
NVIDIA: “THE AI COMPUTING COMPANY”
Pioneered GPU Computing | Founded 1993 | $7B | 9,500 Employees
COMPUTER GRAPHICS | GPU COMPUTING | ARTIFICIAL INTELLIGENCE
9
THE BIG BANG IN MACHINE LEARNING
“ Google’s AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs… And it depends on these chips more than the larger tech universe realizes.”
DNN GPUBIG DATA
12
EXAMPLE APPLICATIONS OF DEEP LEARNING
13
PRACTICAL EXAMPLES OF DEEP LEARNING
Image Classification, Object Detection, Localization, Action Recognition
Speech Recognition, Speech Translation, Natural Language Processing
Breast Cancer Cell Mitosis Detection, Volumetric Brain Image Segmentation
Pedestrian Detection, Lane Detection, Traffic Sign Recognition
14
IMAGE CLASSIFICATION
Object
http://demo.caffe.berkeleyvision.org/
Open source demo code:
$CAFFE_ROOT/examples/web_demo
Scene
http://places.csail.mit.edu/
B. Zhou et al. NIPS 14
Style
http://demo.vislab.berkeleyvision.org/
Karayev et al. Recognizing Image Style. BMVC14
15
Expedites online shopping for 100 million users with real-time object detection and classification
__________________________________________
– NVIDIA® GPUs in Amazon AWS cloud services train smarter AI models and deliver more responsive user experiences with faster inferencing
– AI-powered visual search drives 50% higher user engagement
– Generates nearly 25% of social media-driven global sales
“The more people Pin, the better the technology will become.”
- Andrew Zhai, Software Engineer, Visual Discovery team, Pinterest
16
SEMANTIC SEGMENTATION
17
FULLY CONVOLUTIONAL SEGMENTATION
18
Deep Learning for Self Driving Cars
Multi-class detection (DriveNet)
OpenRoadNet LaneNet 3D Bounding Boxes
The DriveNet team builds perception networks for autonomous driving
19
Offers a faster internet search option to 1 billion mobile users with voice search
_________________________________________
– NVIDIA® GPUs power Deep Speech 2, the world’s first advanced speech recognition model to recognize English and Mandarin
– Delivers super-human accuracy
– GPUs deliver responsiveness not possible on CPU servers
“Deep learning has pretty much taken over speech recognition”
- Andrew Ng, Chief Scientist, Baidu Research
21
CNN + RNN
THE NEXT STEP: NATURAL LANGUAGE PROCESSING
22
Dynamic Memory Networks
MetaMind, now SalesforceIQ
https://arxiv.org/pdf/1603.01417v1.pdf
23
DEEP LEARNING Software
24
NVIDIA DEEP LEARNING SDK
High Performance GPU-Acceleration for Deep Learning
APPLICATIONS
COMPUTER VISION: Image Classification, Object Detection
SPEECH AND AUDIO: Voice Recognition, Translation
BEHAVIOR: Recommendation Engines, Sentiment Analysis
FRAMEWORKS
Mocha.jl
DEEP LEARNING SDK
DEEP LEARNING: cuDNN
MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
MULTI-GPU: NCCL
26
NVIDIA cuDNN
High performance building blocks for deep learning frameworks
Drop-in acceleration for widely used deep learning frameworks such as Caffe, CNTK, Tensorflow, Theano, Torch and others
Accelerates industry vetted deep learning algorithms, such as convolutions, LSTM, fully connected, and pooling layers
Fast deep learning training performance tuned for NVIDIA GPUs
Accelerating Deep Learning
developer.nvidia.com/cudnn
“NVIDIA has improved the speed of cuDNN with each
release while extending the interface to more
operations and devices at the same time”
— Evan Shelhamer, Lead Caffe Developer, UC Berkeley
[Chart: Deep Learning Training Performance, Caffe AlexNet. Y-axis: speed-up of images/sec vs K40 in 2013 (0x to 80x). Bars: K40, K80 + cuDNN1, M40 + cuDNN4, P100 + …]
AlexNet training throughput on CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 bar: 8x M40 GPUs in a node. P100: 8x P100 NVLink-enabled.
29
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Process Data | Configure DNN | Monitor Progress | Visualize Layers
Test Image
developer.nvidia.com/digits
32
NVIDIA TensorRT
High-performance deep learning inference for production deployment
developer.nvidia.com/tensorrt
EMBEDDED
Jetson TX1
DATA CENTER
Tesla P4
Tesla P40
AUTOMOTIVE
Drive PX2
[Chart: GoogLeNet, CPU-only vs Tesla P40 + TensorRT. Y-axis: images/second (0 to 7,000). X-axis: batch size (2, 8, 128). Series: CPU-Only; Tesla P40 + TensorRT (FP32); Tesla P40 + TensorRT (INT8). Up to 36x more images/sec.]
CPU: 1 socket E5-2690 v4 @2.6 GHz, HT on. GPU: 2 socket E5-2698 v3 @2.3 GHz, HT off, 1 P40 card in the box.
33
NVIDIA DEEPSTREAM SDK
Delivering Video Analytics at Scale
Hardware Decode | Preprocess | Inference | “Boy playing soccer”
Simple, high performance API for analyzing video
Decode H.264, HEVC, MPEG-2, MPEG-4, VP9
CUDA-optimized resize and scale
TensorRT
[Chart: Concurrent Video Streams Analyzed. Y-axis: concurrent video streams (0 to 100). Bars: 1x Tesla P4 Server + DeepStream SDK; 13x E5-2650 v4 Servers.]
720p30 decode | IntelCaffe using dual socket E5-2650 v4 CPU servers, Intel MKL 2017
Based on GoogLeNet optimized by Intel: https://github.com/intel/caffe/tree/master/models/mkl2017_googlenet_v2
34
DEEP LEARNING H/W
35
NVIDIA DGX-1
AI supercomputer-in-a-box
170 TFLOPS performance (half precision)
8x Tesla P100 16GB
NVLink Hybrid Cube Mesh
Optimized Deep Learning Software
Dual Xeon
512 GB DDR4 Memory
7 TB SSD Deep Learning Cache
Dual 10GbE, Quad IB 100Gb
3RU – 3200W
38
NVIDIA DGX-1 SOFTWARE
Optimized for Deep Learning Performance
Accelerated Deep Learning
cuDNN NCCL
cuSPARSE
cuBLAS cuFFT
Container Based Applications
NVIDIA Cloud Management
Digits DL Frameworks GPU Apps
Research & Develop
Deploy & Manage
Package & Test
39
TESLA P4
Maximum Efficiency for Scale-out Servers
40x Efficient vs CPU, 8x Efficient vs FPGA
P4
# of CUDA Cores: 2560
Peak Single Precision: 5.5 TeraFLOPS
Peak INT8: 22 TOPS
Low Precision: 4x 8-bit vector dot product with 32-bit accumulate
Video Engines: 1x decode engine, 2x encode engines
GDDR5 Memory: 8 GB @ 192 GB/s
Power: 50W & 75W
[Chart: AlexNet, images/sec/watt (0 to 200). Bars: CPU, FPGA, 1x M4 (FP32), 1x P4 (INT8).]
AlexNet, batch size = 128. CPU: Intel E5-2690v4 using Intel MKL 2017. FPGA is Arria10-115. 1x M4/P4 in node. P4 board power at 56W, P4 GPU power at 36W, M4 board power at 57W, M4 GPU power at 39W. Perf/W chart using GPU power.
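The low-precision entry in the spec list, a 4x 8-bit vector dot product with 32-bit accumulate, can be illustrated in plain Python. This is only a sketch of the arithmetic; the function name `dot4_int8_accumulate` and the sample values are hypothetical, and this is not the GPU instruction or any NVIDIA API.

```python
def dot4_int8_accumulate(a, b, acc=0):
    """Multiply four pairs of signed 8-bit values and add the sum to an accumulator."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two vectors of four elements")
    for value in list(a) + list(b):
        if not -128 <= value <= 127:
            raise ValueError("values must fit in a signed 8-bit integer")
    # Each product can reach 16384, far beyond the 8-bit range,
    # which is why the running sum is kept in a wider (32-bit) accumulator.
    return acc + sum(x * y for x, y in zip(a, b))


result = dot4_int8_accumulate([127, -128, 5, 0], [127, -128, 2, 9], acc=100)
print(result)
```

Packing four 8-bit multiplies into one operation is what lets the INT8 figure in the list be roughly four times the single-precision figure.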
40
TESLA P40
Highest Throughput for Scale-up Servers
4x Boost in Less than One Year
P40
# of CUDA Cores: 3840
Peak Single Precision: 12 TeraFLOPS
Peak INT8: 47 TOPS
Low Precision: 4x 8-bit vector dot product with 32-bit accumulate
Video Engines: 1x decode engine, 2x encode engines
GDDR5 Memory: 24 GB @ 346 GB/s
Power: 250W
[Chart: GoogLeNet and AlexNet, images/sec (0 to 100,000). Bars: 8x M40 (FP32), 8x P40 (INT8).]
GoogLeNet, AlexNet, batch size = 128, CPU: Dual Socket Intel E5-2697v4
41
TESLA DEEP LEARNING PLATFORM
TRAINING INFERENCING
DIGITS Training System
Deep Learning Frameworks
Tesla P100
DeepStream SDK
TensorRT
Tesla P40 & P4
43
DEEP LEARNING
44
GPUS IN ARTIFICIAL INTELLIGENCE
Replace hand-tuned parameters of the feature extraction steps (e.g. in voice and image recognition)
Deep learning is a subset of machine learning that refers to artificial neural networks that are composed of many layers.
Artificial neural networks are inspired by the human brain and need lots of training data (ideal for Big Data).
NVIDIA GPUs and cuDNN software are broadly adopted for machine learning.
[Diagram: nested sets. Machine Learning contains Neural Networks, which contains Deep Learning.]
45
Training
Input images: Tree, Cat, Dog
Forward Propagation: Machine Learning Software outputs “turtle”
Backward Propagation: Compute weight update to nudge from “turtle” towards “dog”
Repeat
Inference
Trained Model outputs “cat”
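The training and inference loop on this slide can be sketched with a single artificial neuron. This is a toy illustration, not the workshop's code: the data (logical OR standing in for labelled images), the learning rate, and the epoch count are all made-up assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy labelled data: inputs and the expected output (logical OR)
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5

for epoch in range(2000):  # "Repeat"
    for inputs, target in samples:
        # Forward propagation: compute the current prediction
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        prediction = sigmoid(z)
        # Backward propagation: the gradient of the cross-entropy loss
        # with respect to z is (prediction - target); nudge the weights against it
        error = prediction - target
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error


def infer(inputs):
    """Inference: apply the trained weights without updating them."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


predictions = [infer(inputs) for inputs, _ in samples]
print(predictions)
```

A deep network repeats the same two phases, only with many layers of weights instead of one neuron.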
46
Convolutional Networks Use Case
Yann LeCun et al, 1998
Local receptive field + weight sharing
“Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE 1998, http://yann.lecun.com/exdb/lenet/index.html
MNIST: 0.7% error rate
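The effect of a local receptive field with weight sharing can be made concrete by counting parameters. The sketch below compares a convolutional layer with a fully connected layer producing the same number of outputs. The 28 x 28 input matches the MNIST images used later in the lab; the 5 x 5 kernel and 20 filters are assumed values chosen for illustration.

```python
def conv_params(num_filters, kernel_h, kernel_w, in_channels=1):
    # Each filter has one shared set of kernel weights plus one bias
    return num_filters * (kernel_h * kernel_w * in_channels + 1)


def dense_params(num_inputs, num_outputs):
    # Every output unit has its own weight for every input, plus a bias
    return num_outputs * (num_inputs + 1)


image_h = image_w = 28
kernel = 5          # assumed kernel size
num_filters = 20    # assumed number of filters

# Output size when the kernel only visits positions where it fits entirely
out_h = image_h - kernel + 1
out_w = image_w - kernel + 1

shared = conv_params(num_filters, kernel, kernel)
unshared = dense_params(image_h * image_w, num_filters * out_h * out_w)
print(shared, unshared)
```

Under these assumptions the convolutional layer needs a few hundred parameters where the fully connected layer needs millions.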
47
CONVOLUTION
Center element of the kernel is placed over the source pixel. The source pixel is then replaced with a weighted sum of itself and nearby pixels.
[Figure: source pixels, convolution kernel, and new pixel value (destination pixel).]
Source pixels (7 x 7, as extracted):
0 0 0 0 0 0 0
0 1 1 1 0 0 0
0 1 2 2 1 1 1
0 1 2 2 2 1 1
0 1 2 2 2 1 1
0 0 1 1 1 1 1
0 0 0 0 0 0 0
Convolution kernel (3 x 3):
4 0 0
0 0 0
0 0 -4
New pixel value: -8
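The weighted sum described on this slide can be reproduced in a few lines. The sketch below uses the kernel and pixel values that appear on the slide; reading the 49 source values row by row into a 7 x 7 grid is an assumption about the figure layout, and the kernel is applied without flipping, as deep learning frameworks usually do.

```python
def apply_kernel(image, kernel, row, col):
    """Weighted sum of the pixel at (row, col) and its neighbours."""
    half = len(kernel) // 2
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += kernel[dr + half][dc + half] * image[row + dr][col + dc]
    return total


source = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 2, 2, 1, 1, 1],
    [0, 1, 2, 2, 2, 1, 1],
    [0, 1, 2, 2, 2, 1, 1],
    [0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
kernel = [
    [4, 0, 0],
    [0, 0, 0],
    [0, 0, -4],
]

# Center the kernel on the source pixel at row 1, column 1
new_pixel = apply_kernel(source, kernel, 1, 1)

# Sliding the kernel over every position where it fits gives the full output
output = [[apply_kernel(source, kernel, r, c) for c in range(1, 6)]
          for r in range(1, 6)]
print(new_pixel)
```

With the kernel centered at row 1, column 1, only the top-left and bottom-right neighbours have non-zero weights, so the sum is 4 x 0 + (-4) x 2 = -8, the new pixel value on the slide.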
48
CNN TERMINOLOGY
Filters consist of a series of weights (a.k.a. parameters)
Activation map
[Figure: source pixel grid, filter weights, and the resulting activation map, using the same values as the convolution figure.]
49
Image “Volvo XC90”
Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”, ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.
CONVOLUTIONAL NEURAL NETWORKS
50
GETTING STARTED WITH DEEP LEARNING
NVIDIA Deep Learning Institute Certified Instructor
NVIDIA Corporation
52
LAUNCHING THE LAB ENVIRONMENT
53
NAVIGATING TO QWIKLABS
1. Navigate to: https://nvlabs.qwiklab.com
2. Login or create a new account
54
55
ACCESSING LAB ENVIRONMENT
3. Select the event-specific In-Session Class in the upper left
4. Click the “Getting Started with Deep Learning” Class from the list
56
LAUNCHING THE LAB ENVIRONMENT
5. Click on the Select button to launch the lab environment
• After a short wait, lab connection information will be shown
• Please ask Lab Assistants for help!
57
LAUNCHING THE LAB ENVIRONMENT
6. Click on the Start Lab button
58
LAUNCHING THE LAB ENVIRONMENT
You should see that the lab environment is “launching” towards the upper-right corner
59
CONNECTING TO THE LAB ENVIRONMENT
7. Click on “here” to access your lab environment / Jupyter notebook
60
CONNECTION INSTRUCTIONS
Navigate to nvlabs.qwiklab.com, login or create a new account
Select the “HPC Saudi DL Workshop”
Find the lab called “Getting Started with Deep Learning”
Click Select, and finally click the green button
After the lab instance sets up, connection info will be shown, click ‘here’
Please ask Lab Assistants for help!
61
CONNECTING TO THE LAB ENVIRONMENT
You should see your “Getting Started With Deep Learning” Jupyter notebook
62
JUPYTER NOTEBOOK
1. Place your cursor in the code
2. Click the “run cell” button
3. Confirm you receive the same result
63
STARTING DIGITS
Instructions in the Jupyter notebook will link you to DIGITS
64
ACCESSING DIGITS
• You will be prompted to enter a username to access DIGITS
• You can enter any username
• Use lower case letters
66
CREATE DATASET IN DIGITS
• Dataset settings
• Image Type: Grayscale
• Image Size: 28 x 28
• Training Images: /home/ubuntu/data/train_small
• Select “Separate test images folder” checkbox
• Test Images: /home/ubuntu/data/test_small
• Dataset Name: MNIST Small
67
HANDWRITTEN DIGIT RECOGNITION
• MNIST data set of handwritten digits from Yann LeCun’s website
• All images are 28x28 grayscale
• Pixel values from 0 to 255
• 60K training examples / 10K test examples
• Input vector of size 784
• 28 * 28 = 784
• Output value is integer from 0-9
HELLO WORLD of machine learning?
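The step from a 28 x 28 image to an input vector of size 784 is a simple flattening. The sketch below uses a synthetic image rather than a real MNIST file, and the scaling of pixel values from 0..255 to 0..1 is an assumed preprocessing step that the slides do not specify.

```python
def flatten_and_scale(image):
    """Turn a 28x28 grid of 0..255 pixel values into a 784-element vector in [0, 1]."""
    if len(image) != 28 or any(len(row) != 28 for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel / 255.0 for row in image for pixel in row]


# Synthetic image: a white vertical stroke on a black background (made-up data)
image = [[255 if 12 <= col <= 15 else 0 for col in range(28)]
         for row in range(28)]

vector = flatten_and_scale(image)
print(len(vector))
```

A convolutional network such as LeNet keeps the 2D layout instead of flattening, but the number of input values is the same.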
68
CREATE MODEL
• Select the “MNIST small” dataset
• Set the number of “Training Epochs” to 10
• Set the framework to “Caffe”
• Set the model to “LeNet”
• Set the name of the model to “MNIST small”
• When training is done, use Classify One with:
/home/ubuntu/data/test_small/2/img_4415.png
69
EVALUATE THE MODEL
Loss function (Validation)
Loss function (Training)
Accuracy obtained from validation dataset
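The two quantities plotted during training, loss and accuracy, can be computed from a model's predicted class probabilities. The sketch below uses cross-entropy as the loss, which is an assumption about the configuration, and a made-up batch of three-class predictions.

```python
import math


def cross_entropy(probabilities, label):
    """Loss for one example: small when the true class gets high probability."""
    return -math.log(probabilities[label])


def evaluate(batch):
    """Return (mean loss, accuracy) for a list of (probabilities, label) pairs."""
    losses = [cross_entropy(probs, label) for probs, label in batch]
    correct = sum(1 for probs, label in batch
                  if probs.index(max(probs)) == label)
    return sum(losses) / len(batch), correct / len(batch)


# Made-up predictions: (class probabilities, true label)
batch = [
    ([0.7, 0.2, 0.1], 0),
    ([0.1, 0.8, 0.1], 1),
    ([0.3, 0.4, 0.3], 2),   # misclassified: the highest probability is class 1
    ([0.25, 0.25, 0.5], 2),
]

mean_loss, accuracy = evaluate(batch)
print(round(mean_loss, 4), accuracy)
```

Comparing the loss on training data with the loss on validation data is how the curves reveal overfitting: training loss keeps falling while validation loss stalls or rises.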
70
ADDITIONAL TECHNIQUES TO IMPROVE MODEL
• More training data
• Data augmentation
• Modify the network
71
ADDITIONAL TERMINOLOGY
• Hyperparameters – parameters specified before training begins
  • Can influence the speed at which learning takes place
  • Can impact the accuracy of the model
  • Examples: Learning rate, decay rate, batch size
• Epoch – complete pass through the training dataset
• Activation functions – identify active neurons
  • Examples: Sigmoid, Tanh, ReLU
• Pooling – Down-sampling technique
  • No parameters (weights) in pooling layer
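The activation functions and the pooling step named above are short enough to write out. This is a plain-Python sketch on a made-up 4 x 4 feature map; the 2 x 2 window with non-overlapping blocks is an assumed pooling configuration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


tanh = math.tanh


def relu(x):
    return max(0.0, x)


def max_pool_2x2(feature_map):
    """Down-sample by keeping the maximum of each non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[r]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled


feature_map = [
    [1, -2, 0, 3],
    [4, 0, -1, 2],
    [-3, -4, 5, 6],
    [0, -1, 7, -8],
]

pooled = max_pool_2x2(feature_map)                        # 4x4 becomes 2x2
activated = [[relu(v) for v in row] for row in pooled]    # negatives become 0
print(pooled)
```

Note that `max_pool_2x2` has nothing to learn: it contains no weights, matching the last bullet.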
72
LAB REVIEW
73
FIRST RESULTS
Small dataset (10 epochs)
• 96% accuracy achieved
• Training is done within one minute
SMALL DATASET
1 : 99.90 %
2 : 69.03 %
8 : 71.37 %
8 : 85.07 %
0 : 99.00 %
8 : 99.69 %
8 : 54.75 %
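Each line above pairs a predicted digit with a confidence percentage. A common way to obtain such a pair is to apply softmax to the network's raw output scores and report the top class. The sketch below assumes that approach; the scores are made up and do not correspond to any image from the lab.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(scores):
    """Return (class index, confidence in percent) for the most likely class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best] * 100.0


# Made-up raw scores for the ten digit classes 0..9
scores = [0.1, 0.3, 0.2, 0.0, 0.5, 0.1, 0.4, 0.2, 3.0, 0.6]
digit, confidence = top_prediction(scores)
print(f"{digit} : {confidence:.2f} %")
```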
74
FULL DATASET
6x larger dataset
• Dataset
  • Training Images: /home/ubuntu/data/train_full
  • Test Image: /home/ubuntu/data/test_full
  • Dataset Name: MNIST full
• Model
  • Clone “MNIST small”.
  • Give it the new name “MNIST full” and press the Create button
75
SMALL DATASET FULL DATASET
1 : 99.90 % 0 : 93.11 %
2 : 69.03 % 2 : 87.23 %
8 : 71.37 % 8 : 71.60 %
8 : 85.07 % 8 : 79.72 %
0 : 99.00 % 0 : 95.82 %
8 : 99.69 % 8 : 100.0 %
8 : 54.75 % 2 : 70.57 %
SECOND RESULTS
Full dataset (10 epochs)
• 99% accuracy achieved
• No improvements in recognizing real-world images
76
DATA AUGMENTATION
Adding Inverted Images
• Pixel(Inverted) = 255 – Pixel(original)
• White letter with black background
• Black letter with white background
• Training Images: /home/ubuntu/data/train_invert
• Test Image: /home/ubuntu/data/test_invert
• Dataset Name: MNIST invert
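The inversion rule, Pixel(Inverted) = 255 – Pixel(original), is a one-line transformation. The sketch below applies it to a tiny made-up grayscale image stored as nested lists; in the lab the inverted images are already provided in the train_invert and test_invert folders.

```python
def invert(image):
    """Swap light and dark: each 0..255 pixel value p becomes 255 - p."""
    return [[255 - pixel for pixel in row] for row in image]


# Tiny made-up grayscale image (white strokes on black would be mostly 0s and 255s)
original = [[0, 255, 128],
            [64, 0, 255]]

inverted = invert(original)

# Augmentation keeps the originals and adds the inverted copies
augmented = [original, inverted]
print(inverted)
```

Training on both versions lets the model see black digits on white as well as white digits on black, which is what real-world images tend to contain.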
77
SMALL DATASET FULL DATASET +INVERTED
1 : 99.90 % 0 : 93.11 % 1 : 90.84 %
2 : 69.03 % 2 : 87.23 % 2 : 89.44 %
8 : 71.37 % 8 : 71.60 % 3 : 100.0 %
8 : 85.07 % 8 : 79.72 % 4 : 100.0 %
0 : 99.00 % 0 : 95.82 % 7 : 82.84 %
8 : 99.69 % 8 : 100.0 % 8 : 100.0 %
8 : 54.75 % 2 : 70.57 % 2 : 96.27 %
DATA AUGMENTATION
Adding inverted images (10 epochs)
78
MODIFY THE NETWORK
Adding filters and ReLU layer

Add a ReLU layer after pool1:

layer {
  name: "pool1"
  type: "Pooling"
  ...
}
layer {
  name: "reluP1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}

Increase the number of filters in conv1 and conv2:

layer {
  name: "conv1"
  type: "Convolution"
  ...
  convolution_param {
    num_output: 75
    ...
layer {
  name: "conv2"
  type: "Convolution"
  ...
  convolution_param {
    num_output: 100
    ...
79
MODIFY THE NETWORK
Adding ReLU Layer
80
SMALL DATASET FULL DATASET +INVERTED ADDING LAYER
1 : 99.90 % 0 : 93.11 % 1 : 90.84 % 1 : 59.18 %
2 : 69.03 % 2 : 87.23 % 2 : 89.44 % 2 : 93.39 %
8 : 71.37 % 8 : 71.60 % 3 : 100.0 % 3 : 100.0 %
8 : 85.07 % 8 : 79.72 % 4 : 100.0 % 4 : 100.0 %
0 : 99.00 % 0 : 95.82 % 7 : 82.84 % 2 : 62.52 %
8 : 99.69 % 8 : 100.0 % 8 : 100.0 % 8 : 100.0 %
8 : 54.75 % 2 : 70.57 % 2 : 96.27 % 8 : 70.83 %
MODIFIED NETWORK
Adding filters and ReLU layer (10 epochs)
81
DEEP LEARNING FOR APPROACHES TO OBJECT DETECTION
NVIDIA Deep Learning Institute Certified Instructor
NVIDIA Corporation
82
83
COMPUTER VISION TASKS
Image Classification
Image Classification + Localization
Object Detection
Image Segmentation
(inspired by a slide used in cs231n lecture from Stanford University)
85
CNN USE CASE
Fully convolutional segmentation: pixel-level classification and segmentation
http://fcn.berkeleyvision.org
Long, Shelhamer, Darrell, Fully convolutional networks for semantic segmentation, CVPR 2015
86
ALEXNET ARCHITECTURE
Source: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
87
ALEXNET LAYERS
88
IMAGE CLASSIFICATION
AlexNet classification of an image of a cat from the PASCAL VOC dataset.
89
LIMITS OF IMAGE CLASSIFICATION
90
OBJECT DETECTION LAB - PART 1.1
Sliding window classifier
91
OBJECT DETECTION LAB - PART 1.2
Overlapping windows
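The difference between the two parts of the lab comes down to the stride of the window. The sketch below generates window coordinates only and leaves out the classifier; the image size, window size, and strides are hypothetical values chosen for illustration.

```python
def sliding_windows(image_w, image_h, window, stride):
    """Yield (left, top, right, bottom) boxes for a square window moved by stride."""
    for top in range(0, image_h - window + 1, stride):
        for left in range(0, image_w - window + 1, stride):
            yield (left, top, left + window, top + window)


# Part 1.1: stride equal to the window size, so windows do not overlap
non_overlapping = list(sliding_windows(512, 256, window=128, stride=128))

# Part 1.2: stride smaller than the window size, so neighbouring windows overlap
overlapping = list(sliding_windows(512, 256, window=128, stride=64))

print(len(non_overlapping), len(overlapping))
```

Each box would be cropped from the image and passed to the image classifier. Overlapping windows locate objects more precisely, at the cost of classifying more crops.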
92
IMAGE SEGMENTATION
93
IMAGE SEGMENTATION
Image segmentation (middle) vs. Instance-aware Image Segmentation (right). Images
from the PASCAL VOC dataset.
96
Merci
Thanks ☺
Gunter Roeth
NVIDIA Deep Learning Institute Approved Instructor
NVIDIA Corporation