![Page 2: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/2.jpg)
NVIDIA is the world’s leading AI platform
ONE ARCHITECTURE — CUDA
![Page 3: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/3.jpg)
GPU CPU
GPU: Perfect Companion for Accelerating Apps & A.I.
![Page 4: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/4.jpg)
AGENDA & TOPICS
• Intro to AI
• Deep Learning Intro
• NVIDIA’s DIGITS
• Autoencoding Enhancement
• TensorRT
![Page 5: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/5.jpg)
Intro to AI
![Page 6: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/6.jpg)
ARTIFICIAL NEURONS
From Stanford cs231n lecture notes
Biological neuron
Inputs: x1, x2, x3
Weights: w1, w2, w3
Output: y
$y = F(w_1 x_1 + w_2 x_2 + w_3 x_3)$
Artificial neuron
Weights (Wn) = parameters
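A minimal Python sketch of the artificial neuron formula above. The slide does not say which function F is, so the choice of ReLU here is an assumption, and the input and weight values are made up for illustration.

```python
def relu(z):
    """One common choice for the activation F (an assumption, not from the slide)."""
    return max(0.0, z)

def neuron(inputs, weights, activation=relu):
    """Return F(w1*x1 + w2*x2 + ...) for a single artificial neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

x = [1.0, 2.0, 3.0]      # inputs x1, x2, x3 (illustrative values)
w = [0.5, -0.25, 0.5]    # weights w1, w2, w3: the trainable parameters
y = neuron(x, w)
print(y)  # 1.5
```

Training a network means adjusting the values in `w`; the structure of the computation stays the same.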
![Page 7: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/7.jpg)
ARTIFICIAL NEURAL NETWORK
A collection of simple, trainable mathematical units that collectively learn complex functions
Input layer Output layer
Hidden layers
Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions
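To make the input layer, hidden layer, and output layer structure concrete, here is a small forward-pass sketch in plain Python. The layer sizes and the hand-picked weights are hypothetical; in a real network they would be learned from training data rather than written by hand.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass `inputs` through a list of (weights, biases, activation) layers."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
# With these weights the network computes |x1 - x2|.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),  # hidden layer
    ([[1.0, 1.0]], [0.0], identity),                 # output layer
]
print(forward([3.0, 1.0], layers))  # [2.0]
```

Even this tiny network represents a function (absolute difference) that no single linear unit can, which hints at why stacking simple units is useful.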
![Page 8: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/8.jpg)
DEEP NEURAL NETWORK (DNN)
Input Result
Application components:
• Task objective: e.g. identify face
• Training data: 10-100M images
• Network architecture: ~10s-100s of layers, 1B parameters
• Learning algorithm: ~30 Exaflops, 1-30 GPU days
Raw data Low-level features Mid-level features High-level features
![Page 9: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/9.jpg)
WHAT IS DEEP LEARNING?
![Page 10: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/10.jpg)
Accomplishing complex goals
![Page 11: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/11.jpg)
Difference in Workflow
Classic Machine Learning [ 1990 : now ]
Examples [ Regression and SVMs ]
Input → Hand-Designed Features → Model / Mapping → Output
Deep/End-to-End Learning [ 2012 : now ]
Example [ Conv Net ]
Input → Simple Features → Complex Features → Model / Mapping → Output
![Page 12: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/12.jpg)
Traditional Workflow
Classic Machine Learning [ 1990 : now ]
Examples [ Regression and SVMs ]
Input → Hand-Designed Features → Model / Mapping → Output
Challenge in Slack channel: How would you describe this image to someone (or something) blind?
Difficult: From its raw pixels.
Medium: From geometric primitives (lines, curves, colors).
Easy: Using any words that you may know.
![Page 13: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/13.jpg)
Deep Learning Workflow
Deep/End-to-End Learning [ 2012 : now ]
Example [ Conv Net ]
Input → Simple Features → Complex Features → Model / Mapping → Output
Experience: Trust Neural Network to learn features and model by providing inputs and outputs.
Key Skill: Experience (data) creation
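As a rough illustration of learning a mapping purely from inputs and outputs, the sketch below fits a single linear unit by gradient descent. It is deliberately tiny and is not a deep network: it learns only two parameters and no features. The data, learning rate, and epoch count are all assumptions chosen for the example.

```python
def train_linear(pairs, learning_rate=0.05, epochs=2000):
    """Fit y ~ w*x + b to (input, output) pairs by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# The "experience": example inputs paired with the outputs we want.
pairs = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
w, b = train_linear(pairs)
print(round(w, 3), round(b, 3))
```

Nothing in `train_linear` encodes the rule y = 2x + 1; it is recovered from the pairs alone, which is why the quality of the data matters so much.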
![Page 14: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/14.jpg)
NVIDIA’S DIGITS
![Page 15: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/15.jpg)
NVIDIA’S DIGITS
Interactive Deep Learning GPU Training System
• Simplifies common deep learning tasks such as:
• Managing data
• Designing and training neural networks on multi-GPU systems
• Monitoring performance in real time with advanced visualizations
• Completely interactive so data scientists can focus on designing and training networks rather than programming and debugging
• Open source
![Page 16: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/16.jpg)
NVIDIA’S DIGITS
Interactive Deep Learning GPU Training System
Process Data | Configure DNN | Monitor Progress | Visualization
![Page 17: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/17.jpg)
DIGITS - MODEL
Differences may exist between model tasks
Can anneal the learning rate
Define custom layers with Python
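Annealing the learning rate means lowering it as training progresses. The two schedules below are common generic examples, sketched in plain Python. They are not DIGITS code, and the function names and default values are assumptions for illustration.

```python
def step_decay(base_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the learning rate by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

def exponential_decay(base_lr, epoch, gamma=0.95):
    """Multiply the learning rate by `gamma` after every epoch."""
    return base_lr * gamma ** epoch

for epoch in (0, 9, 10, 20):
    print(epoch, step_decay(0.01, epoch), exponential_decay(0.01, epoch))
```

Step decay keeps the rate constant for a while and then drops it sharply; exponential decay shrinks it smoothly from the first epoch.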
![Page 18: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/18.jpg)
DIGITS – VISUALIZATION RESULTS
![Page 19: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/19.jpg)
ENHANCING IMAGES WITH AN AI AUTOENCODER
![Page 20: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/20.jpg)
A great candidate for Deep Learning!
![Page 21: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/21.jpg)
Training Set of images.
1 sample per pixel
• It requires pairs of noisy and noise-free images. The network will learn to remove the noise from the images.
• We can then deploy this trained model to any image we want to denoise (inference).
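One way to build the pairs of noisy and noise-free images is to corrupt clean images synthetically. The sketch below assumes additive Gaussian noise on grayscale images stored as nested lists of floats in [0, 1]; the slides do not specify the noise model, so treat that choice, and the helper names, as hypothetical.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of `image`, clamped back into [0, 1]."""
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]

def make_training_pairs(clean_images, sigma=0.1, seed=0):
    """Build (noisy input, clean target) pairs for training a denoiser."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    return [(add_gaussian_noise(img, sigma, rng), img) for img in clean_images]

clean = [[[0.5, 0.5], [0.5, 0.5]]]  # one tiny 2x2 "image"
pairs = make_training_pairs(clean)
noisy, target = pairs[0]
print(noisy)
print(target)
```

During training the network sees `noisy` as input and is scored against `target`; at inference time only a noisy image is available.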
![Page 22: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/22.jpg)
Deep Learning for Image Denoising
Training Data: Collect images → Add noise to training images
Training: Training on progression of images
Trained Neural Network: Trained network detects noise and reconstructs
Inferencing: Apply trained network to noisy images
![Page 23: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/23.jpg)
Learning about images (CNN)
Input Result
Application components:
• Task objective: e.g. identify face
• Training data: 10-100M images
• Network architecture: ~10s-100s of layers, 1B parameters
• Learning algorithm: ~30 Exaflops, 1-30 GPU days
Raw data Low-level features Mid-level features High-level features
![Page 24: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/24.jpg)
Autoencoder – in Action
![Page 25: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/25.jpg)
![Page 26: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/26.jpg)
10 VCAs 15 Minutes
1 VCA 2.5 hours
1 M6000 20 hours
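The relative speedups implied by the three timings above can be worked out directly. This is plain arithmetic on the slide's numbers, taking the single M6000 as the baseline; the slide itself does not state these ratios.

```python
# Timings from the slide, converted to minutes.
timings_minutes = {
    "1 M6000": 20 * 60,
    "1 VCA": 2.5 * 60,
    "10 VCAs": 15,
}

baseline = timings_minutes["1 M6000"]
speedups = {name: baseline / minutes for name, minutes in timings_minutes.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:g}x")
# 1 M6000: 1x
# 1 VCA: 8x
# 10 VCAs: 80x
```

Going from 1 VCA to 10 VCAs cuts the time from 150 minutes to 15, a further factor of 10.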
Apply Noise Apply Noise
![Page 27: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/27.jpg)
Provide image to autoencoder to enhance
![Page 28: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/28.jpg)
![Page 29: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/29.jpg)
TensorRT
SOFTWARE INFERENCING PERFORMANCE ENHANCEMENT
![Page 30: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/30.jpg)
NVIDIA DEEP LEARNING SOFTWARE PLATFORM
NVIDIA DEEP LEARNING SDK
TensorRT
Embedded
Automotive
Data center
TRAINING FRAMEWORK
Training Data
Training
Data Management
Model Assessment
Trained Neural Network
developer.nvidia.com/deep-learning-software
![Page 31: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/31.jpg)
NVIDIA TensorRT
High-performance deep learning inference for production deployment
developer.nvidia.com/tensorrt
High performance neural network inference engine for production deployment
Generate optimized and deployment-ready models for datacenter, embedded and automotive platforms
Deliver high-performance, low-latency inference demanded by real-time services
Deploy faster, more responsive and memory efficient deep learning applications with INT8 and FP16 optimized precision support
[Chart: Images/Second (0 to 7,000) vs. Batch Size (2, 8, 128) for CPU-Only, Tesla P40 + TensorRT (FP32), and Tesla P40 + TensorRT (INT8)]
Up to 36x More Images/sec
GoogLeNet, CPU-only vs Tesla P40 + TensorRT
CPU: 1 socket E5-2690 v4 @2.6 GHz, HT on
GPU: 2 socket E5-2698 v3 @2.3 GHz, HT off, 1 P40 card in the box
![Page 32: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/32.jpg)
TENSORRT
Networks Supported
• Image Classification (AlexNet, GoogLeNet, VGG, ResNet)
• Object Detection
• Segmentation
Not Yet Supported
• RNN/LSTM
• 3D convolutions
• Custom user layers
![Page 33: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/33.jpg)
TENSORRT
Layer Types Supported
• Convolution: Currently only 2D convolutions
• Activation: ReLU, tanh and sigmoid
• Pooling: max and average
• Scale: similar to Caffe Power layer (shift+scale*x)^p
• ElementWise: sum, product or max of two tensors
• LRN: cross-channel only
• Fully-connected: with or without bias
• SoftMax: cross-channel only
• Deconvolution
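Two of the layer types listed above are simple enough to write out. The sketch below implements the Scale formula (shift+scale*x)^p and the three ElementWise operations on flat Python lists. It illustrates the math only and is not TensorRT code; the function and parameter names are assumptions.

```python
def scale_layer(x, shift=0.0, scale=1.0, power=1.0):
    """Apply (shift + scale * x) ** power to every element."""
    return [(shift + scale * v) ** power for v in x]

def elementwise(a, b, op="sum"):
    """Combine two equally sized tensors element by element."""
    ops = {
        "sum": lambda u, v: u + v,
        "prod": lambda u, v: u * v,
        "max": max,
    }
    return [ops[op](u, v) for u, v in zip(a, b)]

print(scale_layer([1.0, 2.0], shift=1.0, scale=2.0, power=2.0))  # [9.0, 25.0]
print(elementwise([1.0, 5.0], [4.0, 2.0], op="max"))             # [4.0, 5.0]
```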
![Page 34: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/34.jpg)
TENSORRT
Workflow
Training Framework → NEURAL NETWORK → OPTIMIZATION USING TensorRT → PLAN → RUNTIME USING TensorRT
developer.nvidia.com/tensorrt
![Page 35: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/35.jpg)
TENSORRT
Optimizations
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Tuned for given batch size
TRAINED NEURAL NETWORK → OPTIMIZED INFERENCE RUNTIME
developer.nvidia.com/tensorrt
![Page 36: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/36.jpg)
GRAPH OPTIMIZATION
Unoptimized network
[Diagram: one module of the network between "input" and "next input", drawn as individual layers: six convolutions (one 3x3, one 5x5, four 1x1), each followed by its own bias and relu layer, together with a max pool layer and concat layers.]
![Page 37: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/37.jpg)
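GRAPH OPTIMIZATION
Vertical fusion
[Diagram: the same module after vertical fusion. Each convolution, bias, and relu sequence is merged into a single CBR layer, giving 1x1 CBR, 3x3 CBR, 5x5 CBR, and 1x1 CBR in one row and two 1x1 CBR layers in another, alongside max pool and concat.]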
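Vertical fusion can be illustrated with a toy 1x1 convolution in plain Python, reading CBR as convolution, bias, and ReLU. The unfused path runs three passes over the data and builds two intermediate tensors; the fused version does the same arithmetic in one pass. This is a conceptual sketch with made-up values, not how TensorRT implements its kernels.

```python
def conv1x1(pixels, weights):
    """1x1 convolution: a per-pixel matrix-vector product over channels."""
    return [[sum(w * c for w, c in zip(row, px)) for row in weights] for px in pixels]

def add_bias(pixels, bias):
    return [[v + b for v, b in zip(px, bias)] for px in pixels]

def relu(pixels):
    return [[max(0.0, v) for v in px] for px in pixels]

def cbr1x1(pixels, weights, bias):
    """Fused convolution + bias + ReLU: one pass, no intermediate tensors."""
    return [
        [max(0.0, sum(w * c for w, c in zip(row, px)) + b)
         for row, b in zip(weights, bias)]
        for px in pixels
    ]

pixels = [[1.0, 2.0], [-1.0, 0.5]]   # 2 pixels, 2 input channels each
weights = [[1.0, -1.0], [0.5, 0.5]]  # 2 output channels
bias = [0.25, -1.0]

unfused = relu(add_bias(conv1x1(pixels, weights), bias))
fused = cbr1x1(pixels, weights, bias)
print(unfused == fused)  # True
print(fused)             # [[0.0, 0.5], [0.0, 0.0]]
```

The results are identical; what changes is how many times the data is read and written, which on a GPU translates into fewer kernel launches and less memory traffic.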
![Page 38: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/38.jpg)
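GRAPH OPTIMIZATION
Horizontal fusion
[Diagram: the same module after horizontal fusion. The 1x1 CBR layers that read the same input are merged into a single wider 1x1 CBR layer, leaving 3x3 CBR, 5x5 CBR, and 1x1 CBR in one row and one 1x1 CBR in another, alongside max pool and concat.]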
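Horizontal fusion can be sketched the same way: sibling 1x1 layers that read the same input are stacked into one wider layer, and the output channels are sliced apart afterwards. The helper names and the example weights below are hypothetical, and CBR is read as convolution, bias, and ReLU.

```python
def cbr1x1(pixels, weights, bias):
    """Fused 1x1 convolution + bias + ReLU applied to every pixel."""
    return [
        [max(0.0, sum(w * c for w, c in zip(row, px)) + b)
         for row, b in zip(weights, bias)]
        for px in pixels
    ]

def fuse_horizontally(layers):
    """Stack the (weights, bias) of sibling layers into one wider layer."""
    fused_w = [row for weights, _ in layers for row in weights]
    fused_b = [b for _, bias in layers for b in bias]
    sizes = [len(bias) for _, bias in layers]
    return fused_w, fused_b, sizes

def split_channels(pixels, sizes):
    """Slice each pixel's channels back into one output per original layer."""
    outputs, start = [], 0
    for size in sizes:
        outputs.append([px[start:start + size] for px in pixels])
        start += size
    return outputs

pixels = [[1.0, 2.0], [-1.0, 0.5]]  # shared input: 2 pixels, 2 channels
siblings = [
    ([[1.0, -1.0]], [0.25]),                   # 1 output channel
    ([[0.5, 0.5], [2.0, 0.0]], [-1.0, 0.0]),   # 2 output channels
    ([[0.0, 1.0]], [0.0]),                     # 1 output channel
]

separate = [cbr1x1(pixels, w, b) for w, b in siblings]
fused_w, fused_b, sizes = fuse_horizontally(siblings)
fused = split_channels(cbr1x1(pixels, fused_w, fused_b), sizes)
print(separate == fused)  # True
```

Three small layer evaluations become one larger one, so the shared input is read once instead of three times.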
![Page 39: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/39.jpg)
INT8 PRECISION
New in TensorRT
PERFORMANCE: Up To 3x More Images/sec with INT8 Precision
[Chart: Images/Second (0 to 7000) vs. Batch Size (2, 4, 128), FP32 vs. INT8]
GoogLeNet, FP32 vs INT8 precision + TensorRT on Tesla P40 GPU, 2 Socket Haswell E5-2698 v3 @2.3 GHz with HT off
EFFICIENCY: Deploy 2x Larger Models with INT8 Precision
[Chart: Memory (MB) (0 to 1400) vs. Batch Size (2, 4, 128), FP32 vs. INT8]
ACCURACY: Deliver full accuracy with INT8 precision
[Chart: % Accuracy (0% to 100%), Top 1 Accuracy and Top 5 Accuracy, FP32 vs. INT8]
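To give a feel for what INT8 precision involves, the sketch below maps floating-point values to 8-bit integers with a single scale factor and then maps them back. It uses simple max-abs scaling, which is an assumption made for illustration and not a description of TensorRT's own calibration procedure.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]

values = [0.5, -1.27, 0.01, 1.0]  # illustrative weights or activations
quantized, scale = quantize_int8(values)
restored = dequantize(quantized, scale)
worst_error = max(abs(v - r) for v, r in zip(values, restored))
print(quantized, scale, worst_error)
```

Each restored value is within half a quantization step of the original, which is the kind of small error that lets a well-calibrated INT8 network keep accuracy close to FP32.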
![Page 40: NVIDIA FOR DEEP LEARNING - Center for Automotive Researchcargroup.org/wp-content/uploads/2018/02/Bill-Veenhuis... · NVIDIA’S DIGITS Interactive Deep Learning GPU Training System](https://reader035.vdocuments.us/reader035/viewer/2022062414/5f01d11a7e708231d4012e40/html5/thumbnails/40.jpg)
THANK YOU