internet access - · 2017-03-26 · – nvidia® gpus in amazon aws cloud services train smarter...

1

INTERNET ACCESS

WiFi Access:

SSID: HPCSAUDI

Password: HPCSaudi@KAUST

Guest -> guest

Eduroam

2

NAVIGATING TO QWIKLABS

1. Navigate to: https://nvlabs.qwiklab.com

2. Log in or create a new account

3

HPC SA DLI Workshop

Gunter Roeth

Deep Learning Institute Approved Instructor

NVIDIA Corporation

4

AGENDA NVIDIA DLI

• 1:30 – 2:00 Workshop Overview

• 2:00 – 3:00 Getting Started with Deep Learning Lab

• 3:00 – 3:30 Break

• 3:30 – 5:00 Approaches to Object Detection using DIGITS Lab

5

NVIDIA DEEP LEARNING INSTITUTE

Training organizations and individuals to solve challenging problems using Deep Learning

On-site workshops and online courses presented by certified experts

Covering complete workflows for proven application use cases: self-driving cars, recommendation engines, medical image classification, intelligent video analytics and more

www.nvidia.com/dli

Hands-on Training for Data Scientists and Software Engineers


6

TAKE THE SURVEY. GET LAB CREDITS.

Want more training from NVIDIA?

Here’s how to get it:

1. We’ll send you an email with a survey link

2. Complete and submit the survey

3. We’ll add credits to your Qwiklabs account

8

NVIDIA: “THE AI COMPUTING COMPANY”

Pioneered GPU Computing | Founded 1993 | $7B | 9,500 Employees

COMPUTER GRAPHICS | GPU COMPUTING | ARTIFICIAL INTELLIGENCE

9

THE BIG BANG IN MACHINE LEARNING

“ Google’s AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs… And it depends on these chips more than the larger tech universe realizes.”

DNN | GPU | BIG DATA

12

EXAMPLE APPLICATIONS OF DEEP LEARNING

13

PRACTICAL EXAMPLES OF DEEP LEARNING

Image Classification, Object Detection, Localization, Action Recognition

Speech Recognition, Speech Translation, Natural Language Processing

Breast Cancer Cell Mitosis Detection, Volumetric Brain Image Segmentation

Pedestrian Detection, Lane Detection, Traffic Sign Recognition

14

IMAGE CLASSIFICATION

Object

http://demo.caffe.berkeleyvision.org/

Open source demo code:

$CAFFE_ROOT/examples/web_demo

Scene

http://places.csail.mit.edu/

B. Zhou et al. NIPS 14

Style

http://demo.vislab.berkeleyvision.org/

Karayev et al. Recognizing Image Style. BMVC14


15

Expedites online shopping for 100 million users with real-time object detection and classification

– NVIDIA® GPUs in Amazon AWS cloud services train smarter AI models and deliver more responsive user experiences with faster inferencing

– AI-powered visual search drives 50% higher user engagement

– Generates nearly 25% of social media-driven global sales

“The more people Pin, the better the technology will become.”

Andrew Zhai, Software Engineer, Visual Discovery team, Pinterest

16

SEMANTIC SEGMENTATION

17

FULLY CONVOLUTIONAL SEGMENTATION

18

Deep Learning for Self Driving Cars

Multi-class detection (DriveNet)

OpenRoadNet LaneNet 3D Bounding Boxes

The DriveNet team builds perception networks for autonomous driving

19

Offers a faster internet search option to 1 billion mobile users with voice search

– NVIDIA® GPUs power Deep Speech 2, the world’s first advanced speech recognition model to recognize English and Mandarin

– Delivers super-human accuracy

– GPUs deliver responsiveness not possible on CPU servers

“Deep learning has pretty much taken over speech recognition”

Andrew Ng, Chief Scientist, Baidu Research

21

CNN + RNN

THE NEXT STEP: NATURAL LANGUAGE PROCESSING

22

Dynamic Memory Networks

MetaMind (now SalesforceIQ)

https://arxiv.org/pdf/1603.01417v1.pdf

23

DEEP LEARNING Software

24

NVIDIA DEEP LEARNING SDK

High Performance GPU-Acceleration for Deep Learning

APPLICATIONS

COMPUTER VISION: Image Classification, Object Detection

SPEECH AND AUDIO: Voice Recognition, Translation

BEHAVIOR: Recommendation Engines, Sentiment Analysis

FRAMEWORKS

Mocha.jl

DEEP LEARNING SDK

DEEP LEARNING: cuDNN

MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT

MULTI-GPU: NCCL

26

NVIDIA cuDNN

High performance building blocks for deep learning frameworks

Drop-in acceleration for widely used deep learning frameworks such as Caffe, CNTK, Tensorflow, Theano, Torch and others

Accelerates industry vetted deep learning algorithms, such as convolutions, LSTM, fully connected, and pooling layers

Fast deep learning training performance tuned for NVIDIA GPUs

Accelerating Deep Learning

developer.nvidia.com/cudnn

“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time”

Evan Shelhamer, Lead Caffe Developer, UC Berkeley

[Chart: Deep Learning Training Performance, Caffe AlexNet. Y-axis: speed-up of images/sec vs. K40 in 2013 (0x to 80x). Bars: K40, K80 + cuDNN1, M40 + cuDNN4, P100 + …]

AlexNet training throughput on CPU: 1x E5-2680v3 12 Core 2.5GHz, 128GB System Memory, Ubuntu 14.04. M40 bar: 8x M40 GPUs in a node. P100: 8x P100 NVLink-enabled.

29

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Process Data | Configure DNN | Monitor Progress | Visualize Layers | Test Image

developer.nvidia.com/digits

32

NVIDIA TensorRT

High-performance deep learning inference for production deployment

developer.nvidia.com/tensorrt

EMBEDDED: Jetson TX1

DATA CENTER: Tesla P4, Tesla P40

AUTOMOTIVE: Drive PX2

[Chart: Up to 36x More Images/sec. GoogLeNet, CPU-only vs Tesla P40 + TensorRT. Y-axis: images/second (0 to 7,000). X-axis: batch size (2, 8, 128). Series: CPU-Only, Tesla P40 + TensorRT (FP32), Tesla P40 + TensorRT (INT8)]

CPU: 1 socket E5-2690 v4 @2.6 GHz, HT on. GPU: 2 socket E5-2698 v3 @2.3 GHz, HT off, 1 P40 card in the box.

33

NVIDIA DEEPSTREAM SDK

Delivering Video Analytics at Scale

Pipeline: Hardware Decode → Preprocess → Inference → “Boy playing soccer”

Simple, high performance API for analyzing video

Decode H.264, HEVC, MPEG-2, MPEG-4, VP9

CUDA-optimized resize and scale

TensorRT

[Chart: Concurrent Video Streams Analyzed. Y-axis: concurrent video streams (0 to 100). Bars: 1x Tesla P4 Server + DeepStream SDK; 13x E5-2650 v4 Servers]

720p30 decode | IntelCaffe using dual socket E5-2650 v4 CPU servers, Intel MKL 2017. Based on GoogLeNet optimized by Intel: https://github.com/intel/caffe/tree/master/models/mkl2017_googlenet_v2

34

DEEP LEARNING H/W

35

NVIDIA DGX-1

AI supercomputer-in-a-box

170 TFLOPS performance (half precision)

8x Tesla P100 16GB

NVLink Hybrid Cube Mesh

Optimized Deep Learning Software

Dual Xeon

512 GB DDR4 Memory

7 TB SSD Deep Learning Cache

Dual 10GbE, Quad IB 100Gb

3RU – 3200W

38

NVIDIA DGX-1 SOFTWARE

Optimized for Deep Learning Performance

Accelerated Deep Learning

cuDNN NCCL

cuSPARSE

cuBLAS cuFFT

Container Based Applications

NVIDIA Cloud Management

Digits DL Frameworks GPU Apps

Research & Develop

Deploy & Manage

Package & Test

39

TESLA P4

Maximum Efficiency for Scale-out Servers

40x Efficient vs CPU, 8x Efficient vs FPGA

P4 specifications:

# of CUDA Cores: 2560

Peak Single Precision: 5.5 TeraFLOPS

Peak INT8: 22 TOPS

Low Precision: 4x 8-bit vector dot product with 32-bit accumulate

Video Engines: 1x decode engine, 2x encode engines

GDDR5 Memory: 8 GB @ 192 GB/s

Power: 50 W & 75 W

[Chart: AlexNet, images/sec/watt (0 to 200). Bars: CPU, FPGA, 1x M4 (FP32), 1x P4 (INT8)]

AlexNet, batch size = 128, CPU: Intel E5-2690v4 using Intel MKL 2017, FPGA is Arria10-115. 1x M4/P4 in node, P4 board power at 56W, P4 GPU power at 36W, M4 board power at 57W, M4 GPU power at 39W. Perf/W chart using GPU power.
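The "4x 8-bit vector dot product with 32-bit accumulate" entry can be illustrated in plain Python. This is only a sketch of the arithmetic, not of the hardware instruction: the function name `dot4_int8` is hypothetical, and the wrap-around on overflow is an assumption, since the slide does not say whether the accumulator wraps or saturates.

```python
INT8_MIN, INT8_MAX = -128, 127


def dot4_int8(a, b, accumulator=0):
    """Dot product of two 4-element signed 8-bit vectors, added to a 32-bit accumulator."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two 4-element vectors")
    for value in list(a) + list(b):
        if not INT8_MIN <= value <= INT8_MAX:
            raise ValueError("value does not fit in a signed 8-bit integer")
    total = accumulator + sum(x * y for x, y in zip(a, b))
    # Wrap to the signed 32-bit range (assumption: overflow wraps rather than saturates).
    return (total + 2**31) % 2**32 - 2**31


# The largest possible result of one step no longer fits in 8 or 16 bits,
# which is why the products are accumulated at 32-bit width.
print(dot4_int8([127, 127, 127, 127], [127, 127, 127, 127]))  # 64516
print(dot4_int8([1, -2, 3, -4], [5, 6, 7, 8], accumulator=10))  # -8
```

Packing four 8-bit multiplies into one operation is what makes the peak INT8 figure roughly four times the peak single-precision figure in the table.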

40

TESLA P40

Highest Throughput for Scale-up Servers

4x Boost in Less than One Year

P40 specifications:

# of CUDA Cores: 3840

Peak Single Precision: 12 TeraFLOPS

Peak INT8: 47 TOPS

Low Precision: 4x 8-bit vector dot product with 32-bit accumulate

Video Engines: 1x decode engine, 2x encode engines

GDDR5 Memory: 24 GB @ 346 GB/s

Power: 250W

[Chart: images/sec (0 to 100,000) for GoogLeNet and AlexNet. Bars: 8x M40 (FP32), 8x P40 (INT8)]

GoogLeNet, AlexNet, batch size = 128, CPU: Dual Socket Intel E5-2697v4

41

TESLA DEEP LEARNING PLATFORM

TRAINING INFERENCING

DIGITS Training System

Deep Learning Frameworks

Tesla P100

DeepStream SDK

TensorRT

Tesla P40 & P4

43

DEEP LEARNING

44

GPUS IN ARTIFICIAL INTELLIGENCE

Replace hand-tuned parameters of the feature extraction steps (e.g. in voice and image recognition)

Deep learning is a subset of machine learning that refers to artificial neural networks that are composed of many layers.

Artificial neural networks are inspired by the human brain and need lots of training data (ideal for Big Data).

NVIDIA GPUs and cuDNN software are broadly adopted for machine learning.

[Diagram: nested sets, from outermost to innermost: Machine Learning, Neural Networks, Deep Learning]

45

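Training

Labeled examples (Tree, Cat, Dog) are fed to the machine learning software

Forward Propagation: the network outputs a prediction, e.g. “turtle”

Backward Propagation: compute the weight update to nudge the prediction from “turtle” towards “dog”

Repeat

Inference

The trained model classifies a new image, e.g. “cat”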
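The training and inference cycle on this slide can be sketched with a tiny softmax classifier in plain Python. Everything here is illustrative: the three-feature toy data, the learning rate and the function names are assumptions, and a real network would have many layers rather than one weight matrix.

```python
import math

LABELS = ["tree", "cat", "dog"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, features):
    """Forward propagation: one score per class, then softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def train_step(weights, features, target, learning_rate=0.5):
    """Backward propagation for softmax + cross-entropy, then one weight update."""
    probs = forward(weights, features)
    for k, row in enumerate(weights):
        error = probs[k] - (1.0 if k == target else 0.0)
        for j in range(len(row)):
            row[j] -= learning_rate * error * features[j]


def predict(weights, features):
    """Inference: return the label with the highest probability."""
    probs = forward(weights, features)
    return LABELS[probs.index(max(probs))]


# Hypothetical toy data: one 3-feature example per class.
data = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
weights = [[0.0, 0.0, 0.0] for _ in LABELS]

for _ in range(50):  # "Repeat"
    for features, target in data:
        train_step(weights, features, target)

print(predict(weights, [0.0, 0.0, 1.0]))  # dog
```

Each `train_step` nudges the weights so the correct label becomes more probable; after training, only `forward` is needed, which is why inference is much cheaper than training.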

46

Convolutional Networks Use Case

Yann LeCun et al, 1998

Local receptive field + weight sharing

“Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE 1998, http://yann.lecun.com/exdb/lenet/index.html

MNIST: 0.7% error rate


47

CONVOLUTION

[Figure: a convolution kernel applied to a grid of source pixels; labels: source pixel, convolution kernel, new pixel value (destination pixel)]

Center element of the kernel is placed over the source pixel. The source pixel is then replaced with a weighted sum of itself and nearby pixels.
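The weighted-sum operation described on this slide can be sketched in plain Python. The 4x4 image and 3x3 kernel below are hypothetical values chosen for illustration, and treating pixels outside the image as zero is an assumption. As in most deep learning frameworks, the kernel is applied without flipping (strictly speaking, a cross-correlation).

```python
def convolve_pixel(image, kernel, row, col):
    """Weighted sum of the pixel at (row, col) and its neighbours.

    The kernel centre is placed over the source pixel; neighbours that
    fall outside the image are treated as zero (an assumption).
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    total = 0
    for i in range(k_rows):
        for j in range(k_cols):
            r, c = row + i - r_off, col + j - c_off
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += image[r][c] * kernel[i][j]
    return total


def convolve(image, kernel):
    """Apply convolve_pixel at every position to build the output map."""
    return [
        [convolve_pixel(image, kernel, r, c) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Hypothetical image and kernel, for illustration only.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
]
kernel = [
    [0, 0, 0],
    [0, 4, 0],
    [0, 0, -4],
]

activation_map = convolve(image, kernel)
print(activation_map[1][1])  # 1*4 + 2*(-4) = -4
```

The same kernel weights are reused at every position, which is the weight sharing that keeps the parameter count of a convolutional layer small.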

48

CNN TERMINOLOGY

[Figure: source pixel grid and kernel; labels: source pixel, filter, activation map]

Filters consist of a series of weights (a.k.a. parameters).

49

Image “Volvo XC90”

Image source: “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”, ICML 2009 & Comm. ACM 2011. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng.

CONVOLUTIONAL NEURAL NETWORKS

50

GETTING STARTED WITH DEEP LEARNING

NVIDIA Deep Learning Institute Certified Instructor

NVIDIA Corporation

52

LAUNCHING THE LAB ENVIRONMENT

53

NAVIGATING TO QWIKLABS

1. Navigate to: https://nvlabs.qwiklab.com

2. Log in or create a new account

55

ACCESSING LAB ENVIRONMENT

3. Select the event-specific In-Session Class in the upper left

4. Click the “Getting Started with Deep Learning” Class from the list

56


5. Click on the Select button to launch the lab environment

• After a short wait, lab connection information will be shown

• Please ask Lab Assistants for help!

57


6. Click on the Start Lab button

58


You should see that the lab environment is “launching” towards the upper-right corner

59

CONNECTING TO THE LAB ENVIRONMENT

7. Click on “here” to access your lab environment / Jupyter notebook

60

CONNECTION INSTRUCTIONS

Navigate to nvlabs.qwiklab.com, login or create a new account

Select the “HPC Saudi DL Workshop”

Find the lab called “Getting Started with Deep Learning”

Click Select, and finally click the green button

After the lab instance sets up, connection info will be shown, click ‘here’

Please ask Lab Assistants for help!

61

CONNECTING TO THE LAB ENVIRONMENT

You should see your “Getting Started With Deep Learning” Jupyter notebook

62

JUPYTER NOTEBOOK

1. Place your cursor in the code

2. Click the “run cell” button

3. Confirm you receive the same result

63

STARTING DIGITS

Instructions in the Jupyter notebook will link you to DIGITS

64

ACCESSING DIGITS

• You will be prompted to enter a username to access DIGITS

• You can enter any username

• Use lowercase letters

66

CREATE DATASET IN DIGITS

• Dataset settings

• Image Type: Grayscale

• Image Size: 28 x 28

• Training Images: /home/ubuntu/data/train_small

• Select “Separate test images folder” checkbox

• Test Images: /home/ubuntu/data/test_small

• Dataset Name: MNIST Small

67

HANDWRITTEN DIGIT RECOGNITION

• MNIST data set of handwritten digits from Yann LeCun’s website

• All images are 28x28 grayscale

• Pixel values from 0 to 255

• 60K training examples / 10K test examples

• Input vector of size 784

• 28 * 28 = 784

• Output value is an integer from 0 to 9

HELLO WORLD of machine learning?
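The input layout described above (a 28x28 grayscale image becoming a 784-element vector) can be sketched as follows. The function name and the scaling of pixel values to the range 0 to 1 are assumptions made for illustration; the slide only states the 0 to 255 range and the vector size.

```python
def flatten_and_scale(image):
    """Flatten a 28x28 grid of 0-255 pixels into a 784-element vector in [0, 1]."""
    if len(image) != 28 or any(len(row) != 28 for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel / 255.0 for row in image for pixel in row]


# Hypothetical image: all black except one white pixel in the middle.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255

vector = flatten_and_scale(image)
print(len(vector))  # 784
```

Pixels are read row by row, so the pixel at row `r`, column `c` ends up at index `r * 28 + c` of the vector.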

68

CREATE MODEL

• Select the “MNIST small” dataset

• Set the number of “Training Epochs” to 10

• Set the framework to “Caffe”

• Set the model to “LeNet”

• Set the name of the model to “MNIST small”

• When training is done, use Classify One on: /home/ubuntu/data/test_small/2/img_4415.png

69

EVALUATE THE MODEL

[Figure: DIGITS training graph; labels: loss function (training), loss function (validation), accuracy obtained from validation dataset]

70

ADDITIONAL TECHNIQUES TO IMPROVE MODEL

• More training data

• Data augmentation

• Modify the network

71

ADDITIONAL TERMINOLOGY

• Hyperparameters – parameters specified before training begins

  • Can influence the speed at which learning takes place

  • Can impact the accuracy of the model

  • Examples: learning rate, decay rate, batch size

• Epoch – one complete pass through the training dataset

• Activation functions – identify active neurons

  • Examples: Sigmoid, Tanh, ReLU

• Pooling – down-sampling technique

  • No parameters (weights) in a pooling layer
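A minimal sketch of the three activation functions named above and of one common pooling variant. The slide does not say which pooling type is meant; 2x2 max pooling with non-overlapping blocks is assumed here, and the feature map values are made up.

```python
import math


def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Pass positive values through unchanged, clamp negatives to zero."""
    return max(0.0, x)


def max_pool_2x2(feature_map):
    """Down-sample by keeping the maximum of each non-overlapping 2x2 block.

    A trailing odd row or column is dropped. Note that there are no
    weights to learn here.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

print(relu(-3.0), relu(2.5))      # 0.0 2.5
print(sigmoid(0.0))               # 0.5
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```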

72

LAB REVIEW

73

FIRST RESULTS

Small dataset (10 epochs)

• 96% accuracy achieved

• Training is done within one minute

SMALL DATASET

1 : 99.90 %

2 : 69.03 %

8 : 71.37 %

8 : 85.07 %

0 : 99.00 %

8 : 99.69 %

8 : 54.75 %

74

FULL DATASET

6x larger dataset

• Dataset

  • Training Images: /home/ubuntu/data/train_full

  • Test Images: /home/ubuntu/data/test_full

  • Dataset Name: MNIST full

• Model

  • Clone “MNIST small”

  • Give it the new name “MNIST full” and push the Create button

75

SECOND RESULTS

Full dataset (10 epochs)

• 99% accuracy achieved

• No improvements in recognizing real-world images

SMALL DATASET | FULL DATASET
1 : 99.90 %   | 0 : 93.11 %
2 : 69.03 %   | 2 : 87.23 %
8 : 71.37 %   | 8 : 71.60 %
8 : 85.07 %   | 8 : 79.72 %
0 : 99.00 %   | 0 : 95.82 %
8 : 99.69 %   | 8 : 100.0 %
8 : 54.75 %   | 2 : 70.57 %

76

DATA AUGMENTATION

Adding Inverted Images

• Pixel(inverted) = 255 – Pixel(original)

• White letter with black background

• Black letter with white background

• Training Images: /home/ubuntu/data/train_invert

• Test Images: /home/ubuntu/data/test_invert

• Dataset Name: MNIST invert
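The inversion rule Pixel(inverted) = 255 – Pixel(original) can be sketched in a few lines of Python. The function names and the tiny 2x2 example are hypothetical; the lab itself ships the inverted images ready-made in the folders listed above.

```python
def invert(image):
    """Return a copy of a grayscale image with every pixel value inverted."""
    return [[255 - pixel for pixel in row] for row in image]


def augment_with_inverted(images):
    """Return the original images followed by their inverted copies."""
    return images + [invert(image) for image in images]


# Hypothetical 2x2 "image": white stroke (255) on a black background (0).
original = [[0, 255], [100, 30]]

print(invert(original))                        # [[255, 0], [155, 225]]
print(len(augment_with_inverted([original])))  # 2
```

Training on both versions lets the network see black-on-white digits as well as the white-on-black digits that make up the original dataset.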

77

DATA AUGMENTATION

Adding inverted images (10 epochs)

SMALL DATASET | FULL DATASET | +INVERTED
1 : 99.90 %   | 0 : 93.11 %  | 1 : 90.84 %
2 : 69.03 %   | 2 : 87.23 %  | 2 : 89.44 %
8 : 71.37 %   | 8 : 71.60 %  | 3 : 100.0 %
8 : 85.07 %   | 8 : 79.72 %  | 4 : 100.0 %
0 : 99.00 %   | 0 : 95.82 %  | 7 : 82.84 %
8 : 99.69 %   | 8 : 100.0 %  | 8 : 100.0 %
8 : 54.75 %   | 2 : 70.57 %  | 2 : 96.27 %

78

MODIFY THE NETWORK

Adding filters and ReLU layer

Add a ReLU layer after pool1:

layer {
  name: "pool1"
  type: "Pooling"
  …
}
layer {
  name: "reluP1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}

Increase the number of filters in conv1 and conv2:

layer {
  name: "conv1"
  type: "Convolution"
  ...
  convolution_param {
    num_output: 75
    ...
layer {
  name: "conv2"
  type: "Convolution"
  ...
  convolution_param {
    num_output: 100
    ...
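To get a feel for what raising `num_output` does, the sketch below counts the learnable parameters of the two convolution layers. It rests on assumptions that the slides do not state: 5x5 kernels, a single-channel (grayscale) input, and baseline filter counts of 20 and 50 as in the standard Caffe LeNet example. Only the values 75 and 100 come from the slide.

```python
def conv_params(kernel_size, in_channels, num_output):
    """Weights plus one bias per output filter for a square-kernel convolution."""
    return (kernel_size * kernel_size * in_channels + 1) * num_output


KERNEL = 5  # assumed 5x5 kernels (not stated on the slide)


def lenet_conv_params(conv1_out, conv2_out):
    """Parameter counts of conv1 and conv2 for the given filter counts."""
    conv1 = conv_params(KERNEL, 1, conv1_out)          # grayscale input: 1 channel
    conv2 = conv_params(KERNEL, conv1_out, conv2_out)  # conv2 sees conv1's outputs
    return conv1, conv2


baseline = lenet_conv_params(20, 50)   # assumed original filter counts
modified = lenet_conv_params(75, 100)  # num_output values from the slide

print(baseline)  # (520, 25050)
print(modified)  # (1950, 187600)
```

Under these assumptions conv2 grows the most, because its parameter count depends on both its own filter count and the number of channels coming out of conv1.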

79

MODIFY THE NETWORK

Adding ReLU Layer

80

MODIFIED NETWORK

Adding filters and ReLU layer (10 epochs)

SMALL DATASET | FULL DATASET | +INVERTED   | ADDING LAYER
1 : 99.90 %   | 0 : 93.11 %  | 1 : 90.84 % | 1 : 59.18 %
2 : 69.03 %   | 2 : 87.23 %  | 2 : 89.44 % | 2 : 93.39 %
8 : 71.37 %   | 8 : 71.60 %  | 3 : 100.0 % | 3 : 100.0 %
8 : 85.07 %   | 8 : 79.72 %  | 4 : 100.0 % | 4 : 100.0 %
0 : 99.00 %   | 0 : 95.82 %  | 7 : 82.84 % | 2 : 62.52 %
8 : 99.69 %   | 8 : 100.0 %  | 8 : 100.0 % | 8 : 100.0 %
8 : 54.75 %   | 2 : 70.57 %  | 2 : 96.27 % | 8 : 70.83 %

81

DEEP LEARNING FOR APPROACHES TO OBJECT DETECTION

NVIDIA Deep Learning Institute Certified Instructor

NVIDIA Corporation

83

COMPUTER VISION TASKS

Image Classification | Image Classification + Localization | Object Detection | Image Segmentation

(inspired by a slide used in the cs231n lecture from Stanford University)

85

CNN USE CASE

Fully convolutional pixel segmentation: pixel-level classification and segmentation

Long, Shelhamer, Darrell, “Fully convolutional networks for semantic segmentation”, CVPR 2015

http://fcn.berkeleyvision.org/

86

ALEXNET ARCHITECTURE

Source: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

87

ALEXNET LAYERS

88

IMAGE CLASSIFICATION

AlexNet classification of an image of a cat from the PASCAL VOC dataset.

89

LIMITS OF IMAGE CLASSIFICATION

90

OBJECT DETECTION LAB - PART 1.1

Sliding window classifier

91

OBJECT DETECTION LAB - PART 1.2

Overlapping windows
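The two lab parts differ only in how far the window moves between classifications. The sketch below generates the window coordinates; the image size, window size and stride values are hypothetical, and in the lab each window would then be passed to an image classifier.

```python
def sliding_windows(image_width, image_height, window, stride):
    """Yield (left, top, right, bottom) boxes for a square sliding window."""
    for top in range(0, image_height - window + 1, stride):
        for left in range(0, image_width - window + 1, stride):
            yield (left, top, left + window, top + window)


# Part 1.1: stride equal to the window size gives non-overlapping windows.
non_overlapping = list(sliding_windows(512, 512, window=256, stride=256))

# Part 1.2: a smaller stride makes neighbouring windows overlap.
overlapping = list(sliding_windows(512, 512, window=256, stride=128))

print(len(non_overlapping))  # 4
print(len(overlapping))      # 9
```

Halving the stride more than doubles the number of classifier calls, which is the main cost of the overlapping-window approach.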

92

IMAGE SEGMENTATION

93

IMAGE SEGMENTATION

Image segmentation (middle) vs. instance-aware image segmentation (right). Images from the PASCAL VOC dataset.


96

Merci / Thanks ☺

Gunter Roeth

NVIDIA Deep Learning Institute Approved Instructor

NVIDIA Corporation
