deep learning with gpus - files.meetup.comfiles.meetup.com/4379272/accelerated-deep-learning... ·...
TRANSCRIPT
An NVIDIA Tech Talk @
Boston Imaging and Vision Meetup
Thursday, October 22nd, 2015
Accelerated Deep Learning With GPUs
2
PC DATA CENTER MOBILE
DESKTOP VIRTUALIZATION
AUTONOMOUS MACHINES
HPC & CLOUD SERVICE PROVIDERS GAMING DESIGN
The World Leader in Visual Computing
3
Academic Collaboration
https://developer.nvidia.com/academia
4
CUDA Zone
5
Accelerated Deep Learning with GPUs
6
• Data Science & Deep Learning Overview
• The Promise of Machine Learning
• What Makes Deep Learning Deep?
• Why Is Deep Learning Hot Now? AGENDA
7
Data science landscape (simplified)
• Regression • SVM • Recommendation systems
Data Analytics
Machine
Learning Graph Analytics SQL Query
Traditional
Methods
Deep Neural
Networks
8
How GPU Acceleration Works (highly simplified)
Application Code
+
GPU CPU 5% of Code
Compute-Intensive Functions
Rest of Sequential CPU Code
~ 80% of run-time
9
3 Drivers for Deep Learning
More Data Better Models Powerful GPU Accelerators
10
“Machine Learning” (ML) is in some sense a rebranding of AI.
The focus is now on more specific, often perceptual tasks, and there are many successes.
Today, some of the world’s largest internet companies, as well as the foremost research institutions, are using GPUs for machine learning.
GPU Accelerated Deep Learning CUDA for
Deep Learning
11
Neural Networks
GPUs
Inherently Parallel
Matrix Operations
FLOPS
Why are GPUs Useful for Deep Learning?
0 0 4
60
110 28%
26%
16%
12%
7%
2010 2011 2012 2013 2014
bird
frog
person
dog
chair
GPUs deliver —
Same or better prediction accuracy
Faster results
Smaller footprint
Lower power
12
Deep learning with COTS HPC systems
A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro
ICML 2013
GOOGLE DATACENTER
1,000 CPU Servers 2,000 CPUs • 16,000
cores
600 kWatts
$5,000,00
0
STANFORD AI LAB
3 GPU-Accelerated Servers 12 GPUs • 18,432 cores
4 kWatts
$33,000
Now You Can Build Google’s
$1M Artificial Brain on the Cheap
“ “ GPUs Make Deep Learning Accessible
13
NVIDIA & IBM Cloud Support
“NVIDIA is excited to announce we have teamed up with IBM Cloud to provide qualifying competitors with access to the SoftLayer cloud server infrastructure. The servers are outfitted with dual Intel Xeon E5-2690 CPUs, 128GB RAM, two 1TB SATA HDD/RAID 0, and two NVIDIA Tesla K80s, NVIDIA’s fastest dual-GPU accelerators.” NVIDIA Parallel For All Blog, August 13th, 2015
14
The Promise of Machine Learning ML Systems Extract Value From Big Data
350 millions images uploaded per day
2.5 Petabytes of customer data hourly
100 hours of video uploaded every minute
15
“Delving Deep into Rectifiers: Surpassing Human-Level
Performance on ImageNet Classification”
— Microsoft: 4.94%, Feb. 6, 2015
“Deep Image: Scaling up Image Recognition”
— Baidu: 5.98%, Jan. 13, 2015
“Batch Normalization: Accelerating Deep Network
Training by Reducing Internal Covariant Shift”
— Google: 4.82%, Feb. 11, 2015
IMAGENET
CHALLENGE
Accuracy %
2010 2014 2012 2011 2013
74%
84%
DNN
CV
72%
16
The Early Big Data Narrative
2.5 Exabytes of Web Data Created Daily 2.5 Petabytes of Customer Data Hourly
350 Million Images Uploaded a Day 100 Hours Video Uploaded Every Minute
How can we organize, analyze, understand, and
benefit from such a trove of data?
?
TRAINING DATA
The promise of machine learning
LIVE DATA PREDICTION ANALYSIS
MACHINE LEARNING
Walmart • 2.5 PB of customer transaction
data every hr
Facebook • 340 million photos uploaded
every day
Orange France • >1 billion call data records every
month
18
Image Classification, Object Detection, Localization
Facial Recognition Speech & Natural Language
Processing
Medical Imaging & Interpretation
Seismic Imaging & Interpretation Recommendation
Machine Learning Use Cases …machine learning is pervasive!
19
Traditional ML – Hand Tuned Features
Image Vision features Detection
Images/video
Audio sample Audio features Speaker ID
Audio
Text
Text Text features
Text classification, Machine
translation, Information
retrieval, ....
Slide courtesy of Andrew Ng, Stanford University
20
What is Deep Learning? Systems that learn to recognize objects which are important, without us telling the system explicitly what the object is ahead of time
Key components
Task
Features
Model
Learning Algorithm
21
Deep Learning Advantages
Don’t have to figure out the features ahead of time
Use same neural net approach for many different problems
Fault tolerant
Scales well
Simplicity & Scalability
Linear classifier
Regression
Support Vector Machine Bayesian
Decision Trees Association Rules
Clustering
22
Convolutional Neural Networks (CNNs)
Biologically inspired
Neuron only connected to a small region of neurons in layer below it called the receptive field A given layer can have many convolutional filters/kernels Each filter has the same weights across the whole layer Bottom layers are convolutional, top layers are fully connected
Generally trained via supervised learning Supervised Unsupervised Reinforcement …ideal system automatically switches modes…
23
Convolutional Networks Breakthrough
Y. LeCun et al. 1989-1998 : Handwritten digit reading
A. Krizhevsky, G. Hinton et al. 2012 : Imagenet classification winner
24
Training
Image Classification with DNNs
Cars Buses Trucks Motor cycles
Truck
Inference
25
Image Classification with DNNs Training
Typical training run
Pick a DNN design
Input 100 million training images spanning 1,000 categories
One week of computation
Test accuracy
If bad: modify DNN, fix training set or update training parameters
Cars Buses Trucks Motor cycles
26
What makes Deep Learning deep?
Input Result
Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009
Visual Object Recognition Using Deep Convolutional Neural Networks
Rob Fergus (New York University / Facebook) http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985
Networks can have ~10 layers 1B parameters 10M images ~30 Exaflops ~30 GPU days Human brain has trillions of parameters.
27
Tree
Cat
Dog
Deep Learning Framework
“turtle”
Forward Propagation
Compute weight update to nudge
from “turtle” towards “dog”
Backward Propagation
Trained Model
“dog”
Repeat
Training
Classification
28
CNNs dominate in perceptual tasks
Slide credit: Yann Lecun, Facebook & NYU
29
Why is Deep Learning Hot Now ?
30
ConvNet Benchmark comments from github
“…9 months ago, we were ~3x slower on Alex net and ~4x slower on overfeat. Training that took 3 weeks, takes 1 week (on the same 1-GPU metric). That is a huge fundamental speedup results in several man-hours saved in waiting for experiments to finish.”
“Pushing these boundaries so far, in such a short time-frame is quite something. There’s two sets of teams who have made this happen:
• NVIDIA, with their Maxwell cards that are as fast as #*%!
• Nervana Systems (Scott Gray and team) who have pushed the CUDA kernels to the limits of the GPUs with efficiencies > 95%”
“Rejigging the marks… #56” – Soumith Chintala, August 2015 Artificial Intelligence Research Engineer at Facebook
31
Why is Deep Learning Hot Now? Three Driving Factors…
Big Data Availability New ML Techniques Compute Density
350 millions images uploaded per day
2.5 Petabytes of customer data hourly
100 hours of video uploaded every minute
Deep Neural Networks GPUs
ML systems extract value from Big Data
32
Deep Learning Revolutionizing Medical Research
Detecting Mitosis in
Breast Cancer Cells
— IDSIA
Predicting the Toxicity
of New Drugs
— Johannes Kepler University
Understanding Gene Mutation
to Prevent Disease
— University of Toronto
33
ILSVRC12 winning model: “Supervision”
7 layers
5 convolutional layers + 2 fully-connected
ReLU, pooling, drop-out, response normalization
Implemented with Caffe
GPU Acceleration Training a Deep, Convolutional Neural Network
Batch Size Training Time
CPU Training Time
GPU GPU
Speed Up
64 images 64 s 7.5 s 8.5X
128 images 124 s 14.5 s 8.5X
256 images 257 s 28.5 s 9.0X
Dual 10-core Ivy Bridge CPUs
1 Tesla K40 GPU
CPU times utilized Intel MKL BLAS library
GPU acceleration from CUDA matrix libraries (cuBLAS)
34
GPU-Accelerated Deep Learning Frameworks
CAFFE TORCH THEANO CUDA-
CONVNET2 KALDI
Domain Deep Learning
Framework
Scientific Computing
Framework
Math Expression
Compiler
Deep Learning
Application
Speech Recognition
Toolkit
cuDNN R2 R2 R2 -- --
Multi-GPU In Progress Partial Partial (nnet2)
Multi-CPU (nnet2)
License BSD-2 GPL BSD Apache 2.0 Apache 2.0
Interface(s)
Text-based
definition files,
Python, MATLAB
Python, Lua,
MATLAB Python C++ C++, Shell scripts
Embedded (TK1)
http://developer.nvidia.com/deeplearning
35
36
Deep Learning Use Cases for Vision & Graphics
37
Street Number Detection
[Goodfellow 2014]
38
Object Classification
[Krizhevsky 2012]
39
Image Retrieval
[Krizhevsky 2012]
40
Pose Estimation
[Toshev, Szegedy 2014]
41
Object Detection
[Toshev, Szegedy 2014]
42
Object Detection
[Huval et al. 2015]
43
Face Recognition
[Taigman et al. 2014]
44
Action Recognition
[Simonyan et al. 2014]
Thank You!