deep learning with gpus - files.meetup.comfiles.meetup.com/4379272/accelerated-deep-learning... ·...

An NVIDIA Tech Talk @

Boston Imaging and Vision Meetup

Thursday, October 22nd, 2015

Accelerated Deep Learning With GPUs

2

PC DATA CENTER MOBILE

DESKTOP VIRTUALIZATION

AUTONOMOUS MACHINES

HPC & CLOUD SERVICE PROVIDERS GAMING DESIGN

The World Leader in Visual Computing

3

Academic Collaboration

https://developer.nvidia.com/academia



4

CUDA Zone

5

Accelerated Deep Learning with GPUs

6

• Data Science & Deep Learning Overview

• The Promise of Machine Learning

• What Makes Deep Learning Deep?

• Why Is Deep Learning Hot Now? AGENDA

7

Data science landscape (simplified)

• Regression • SVM • Recommendation systems

Data Analytics

Machine

Learning Graph Analytics SQL Query

Traditional

Methods

Deep Neural

Networks

8

How GPU Acceleration Works (highly simplified)

Application Code

+

GPU CPU 5% of Code

Compute-Intensive Functions

Rest of Sequential CPU Code

~ 80% of run-time

9

3 Drivers for Deep Learning

More Data Better Models Powerful GPU Accelerators

10

“Machine Learning” (ML) is in some sense a rebranding of AI.

The focus is now on more specific, often perceptual tasks, and there are many successes.

Today, some of the world’s largest internet companies, as well as the foremost research institutions, are using GPUs for machine learning.

GPU Accelerated Deep Learning CUDA for

Deep Learning

11

Neural Networks

GPUs

Inherently Parallel

Matrix Operations

FLOPS

Why are GPUs Useful for Deep Learning?

0 0 4

60

110 28%

26%

16%

12%

7%

2010 2011 2012 2013 2014

bird

frog

person

dog

chair

GPUs deliver —

Same or better prediction accuracy

Faster results

Smaller footprint

Lower power

12

Deep learning with COTS HPC systems

A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro

ICML 2013

GOOGLE DATACENTER

1,000 CPU Servers 2,000 CPUs • 16,000

cores

600 kWatts

$5,000,00

0

STANFORD AI LAB

3 GPU-Accelerated Servers 12 GPUs • 18,432 cores

4 kWatts

$33,000

Now You Can Build Google’s

$1M Artificial Brain on the Cheap

“ “ GPUs Make Deep Learning Accessible

13

NVIDIA & IBM Cloud Support

“NVIDIA is excited to announce we have teamed up with IBM Cloud to provide qualifying competitors with access to the SoftLayer cloud server infrastructure. The servers are outfitted with dual Intel Xeon E5-2690 CPUs, 128GB RAM, two 1TB SATA HDD/RAID 0, and two NVIDIA Tesla K80s, NVIDIA’s fastest dual-GPU accelerators.” NVIDIA Parallel For All Blog, August 13th, 2015

http://www.nvidia.com/object/tesla-servers.html

http://www.nvidia.com/object/tesla-servers.html

http://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/



14

The Promise of Machine Learning ML Systems Extract Value From Big Data

350 millions images uploaded per day

2.5 Petabytes of customer data hourly

100 hours of video uploaded every minute

15

“Delving Deep into Rectifiers: Surpassing Human-Level

Performance on ImageNet Classification”

— Microsoft: 4.94%, Feb. 6, 2015

“Deep Image: Scaling up Image Recognition”

— Baidu: 5.98%, Jan. 13, 2015

“Batch Normalization: Accelerating Deep Network

Training by Reducing Internal Covariant Shift”

— Google: 4.82%, Feb. 11, 2015

IMAGENET

CHALLENGE

Accuracy %

2010 2014 2012 2011 2013

74%

84%

DNN

CV

72%

16

The Early Big Data Narrative

2.5 Exabytes of Web Data Created Daily 2.5 Petabytes of Customer Data Hourly

350 Million Images Uploaded a Day 100 Hours Video Uploaded Every Minute

How can we organize, analyze, understand, and

benefit from such a trove of data?

?

TRAINING DATA

The promise of machine learning

LIVE DATA PREDICTION ANALYSIS

MACHINE LEARNING

Walmart • 2.5 PB of customer transaction

data every hr

Facebook • 340 million photos uploaded

every day

Orange France • >1 billion call data records every

month

18

Image Classification, Object Detection, Localization

Facial Recognition Speech & Natural Language

Processing

Medical Imaging & Interpretation

Seismic Imaging & Interpretation Recommendation

Machine Learning Use Cases …machine learning is pervasive!

19

Traditional ML – Hand Tuned Features

Image Vision features Detection

Images/video

Audio sample Audio features Speaker ID

Audio

Text

Text Text features

Text classification, Machine

translation, Information

retrieval, ....

Slide courtesy of Andrew Ng, Stanford University

20

What is Deep Learning? Systems that learn to recognize objects which are important, without us telling the system explicitly what the object is ahead of time

Key components

Task

Features

Model

Learning Algorithm

21

Deep Learning Advantages

Don’t have to figure out the features ahead of time

Use same neural net approach for many different problems

Fault tolerant

Scales well

Simplicity & Scalability

Linear classifier

Regression

Support Vector Machine Bayesian

Decision Trees Association Rules

Clustering

22

Convolutional Neural Networks (CNNs)

Biologically inspired

Neuron only connected to a small region of neurons in layer below it called the receptive field A given layer can have many convolutional filters/kernels Each filter has the same weights across the whole layer Bottom layers are convolutional, top layers are fully connected

Generally trained via supervised learning Supervised Unsupervised Reinforcement …ideal system automatically switches modes…

23

Convolutional Networks Breakthrough

Y. LeCun et al. 1989-1998 : Handwritten digit reading

A. Krizhevsky, G. Hinton et al. 2012 : Imagenet classification winner

24

Training

Image Classification with DNNs

Cars Buses Trucks Motor cycles

Truck

Inference

25

Image Classification with DNNs Training

Typical training run

Pick a DNN design

Input 100 million training images spanning 1,000 categories

One week of computation

Test accuracy

If bad: modify DNN, fix training set or update training parameters

Cars Buses Trucks Motor cycles

26

What makes Deep Learning deep?

Input Result

Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009

Visual Object Recognition Using Deep Convolutional Neural Networks

Rob Fergus (New York University / Facebook) http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985

Networks can have ~10 layers 1B parameters 10M images ~30 Exaflops ~30 GPU days Human brain has trillions of parameters.

27

Tree

Cat

Dog

Deep Learning Framework

“turtle”

Forward Propagation

Compute weight update to nudge

from “turtle” towards “dog”

Backward Propagation

Trained Model

“dog”

Repeat

Training

Classification

28

CNNs dominate in perceptual tasks

Slide credit: Yann Lecun, Facebook & NYU

29

Why is Deep Learning Hot Now ?

30

ConvNet Benchmark comments from github

“…9 months ago, we were ~3x slower on Alex net and ~4x slower on overfeat. Training that took 3 weeks, takes 1 week (on the same 1-GPU metric). That is a huge fundamental speedup results in several man-hours saved in waiting for experiments to finish.”

“Pushing these boundaries so far, in such a short time-frame is quite something. There’s two sets of teams who have made this happen:

• NVIDIA, with their Maxwell cards that are as fast as #*%!

• Nervana Systems (Scott Gray and team) who have pushed the CUDA kernels to the limits of the GPUs with efficiencies > 95%”

“Rejigging the marks… #56” – Soumith Chintala, August 2015 Artificial Intelligence Research Engineer at Facebook

https://www.linkedin.com/in/soumith

31

Why is Deep Learning Hot Now? Three Driving Factors…

Big Data Availability New ML Techniques Compute Density

350 millions images uploaded per day

2.5 Petabytes of customer data hourly

100 hours of video uploaded every minute

Deep Neural Networks GPUs

ML systems extract value from Big Data

32

Deep Learning Revolutionizing Medical Research

Detecting Mitosis in

Breast Cancer Cells

— IDSIA

Predicting the Toxicity

of New Drugs

— Johannes Kepler University

Understanding Gene Mutation

to Prevent Disease

— University of Toronto

33

ILSVRC12 winning model: “Supervision”

7 layers

5 convolutional layers + 2 fully-connected

ReLU, pooling, drop-out, response normalization

Implemented with Caffe

GPU Acceleration Training a Deep, Convolutional Neural Network

Batch Size Training Time

CPU Training Time

GPU GPU

Speed Up

64 images 64 s 7.5 s 8.5X

128 images 124 s 14.5 s 8.5X

256 images 257 s 28.5 s 9.0X

Dual 10-core Ivy Bridge CPUs

1 Tesla K40 GPU

CPU times utilized Intel MKL BLAS library

GPU acceleration from CUDA matrix libraries (cuBLAS)

34

GPU-Accelerated Deep Learning Frameworks

CAFFE TORCH THEANO CUDA-

CONVNET2 KALDI

Domain Deep Learning

Framework

Scientific Computing

Framework

Math Expression

Compiler

Deep Learning

Application

Speech Recognition

Toolkit

cuDNN R2 R2 R2 -- --

Multi-GPU In Progress Partial Partial (nnet2)

Multi-CPU (nnet2)

License BSD-2 GPL BSD Apache 2.0 Apache 2.0

Interface(s)

Text-based

definition files,

Python, MATLAB

Python, Lua,

MATLAB Python C++ C++, Shell scripts

Embedded (TK1)

http://developer.nvidia.com/deeplearning

http://developer.nvidia.com/deeplearning

36

Deep Learning Use Cases for Vision & Graphics

37

Street Number Detection

[Goodfellow 2014]

38

Object Classification

[Krizhevsky 2012]

39

Image Retrieval

[Krizhevsky 2012]

40

Pose Estimation

[Toshev, Szegedy 2014]

41

Object Detection

[Toshev, Szegedy 2014]

42

Object Detection

[Huval et al. 2015]

43

Face Recognition

[Taigman et al. 2014]

44

Action Recognition

[Simonyan et al. 2014]

Thank You!

deep learning with gpus - files.meetup.comfiles.meetup.com/4379272/accelerated-deep-learning... ·...

Documents