deploying deep neural networks to embedded gpus and cpus - matlab · deep learning with matlab this...

34
1 © 2015 The MathWorks, Inc. Deploying Deep Neural Networks to Embedded GPUs and CPUs Dr Rishu Gupta Senior Application Engineer

Upload: others

Post on 22-Mar-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

1© 2015 The MathWorks, Inc.

Deploying Deep Neural Networks

to Embedded GPUs and CPUs

Dr Rishu Gupta

Senior Application Engineer

Page 2: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

2

Deep Learning Workflow in MATLAB

Application

logic

Application

Design

Standalone

Deployment

Deep Neural Network

Design + Training

Trained

DNN

Page 3: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

3

Deep Neural Network Design and Training

▪ Design in MATLAB

▪ Manage large data sets

▪ Automate data labeling

▪ Easy access to models

▪ Training in MATLAB

▪ Acceleration with GPU’s

▪ Scale to clusters

Train in MATLAB

Model

importer

Trained

DNN

Transfer

learning

Reference

model

Page 4: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

4

Application Design

Pre-

processing

Post-

processing

Application

logic

Embedded

Multi-Platform Deep Learning Deployment

Page 5: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

5

Algorithm Design to Embedded Deployment Workflow

DNN Design &

Train

1

Desktop

GPU

Application

Design

2

Desktop

GPU

C++

Deployment

integration-test

3

Embedded GPU

C++

Real-time test4

High-level language

Deep learning framework

Large, complex software stack

Challenges

• Integrating multiple libraries and packages

• Verifying and maintaining multiple

implementations

• Algorithm & vendor lock-in

C/C++

Low-level APIs

Application-specific libraries

C/C++

Target-optimized libraries

Optimize for memory & speed

Page 6: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

6

Solution: Use MATLAB Coder & GPU Coder for

Deep Learning Deployment

Application

logic

GPU

Coder

NVIDIA

TensorRT &

cuDNN

Libraries

Intel

MKL-DNN

Library

MATLAB

Coder

Target Libraries

ARM

Compute

Library

Page 7: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

7

Solution: Use MATLAB Coder & GPU Coder for

Deep Learning Deployment

Application

logic

GPU

Coder

NVIDIA

TensorRT &

cuDNN

Libraries

Intel

MKL-DNN

Library

MATLAB

Coder

ARM

Compute

Library

Page 8: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

8

Musashi Seimitsu Industry Co.,Ltd.Detect Abnormalities in Automotive Parts

MATLAB use in project:

▪ Preprocessing of captured images

▪ Image annotation for training

▪ Deep learning based analysis

– Various transfer learning methods

(Combinations of CNN models, Classifiers)

– Estimation of defect area using Class Activation Map

(CAM)

– Abnormality/defect classification

▪ Deployment to NVIDIA Jetson using GPU CoderAutomated visual inspection of 1.3 million

bevel gear per month

Page 9: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

9

Deep Learning Deployment Workflows

Pre-

processing

Post-

processing

codegen

Portable target code

INTEGRATED APPLICATION DEPLOYMENT

cnncodegen

Portable target code

INFERENCE ENGINE DEPLOYMENT

Trained

DNN

Page 10: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

10

Workflow for Inference Engine Deployment

Steps for inference engine deployment

1. Generate the code for trained model>> cnncodegen(net, 'targetlib’, ‘arm-

compute’)

2. Copy the generated code onto target board

3. Build the code for the inference engine>> make –C ./codegen –f …mk

4. Use hand written main function to call inference

engine

5. Generate the exe and test the executable>> make –C ./ ……

cnncodegen

Portable target code

INFERENCE ENGINE DEPLOYMENT

Trained

DNN

Page 11: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

11

Deep Learning Inference Deployment

Application

logic

GPU

Coder

NVIDIA

TensorRT &

cuDNN

Libraries

Intel

MKL-DNN

Library

MATLAB

Coder

Target Libraries

ARM

Compute

Library

Page 12: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

12

Deep Learning Inference Deployment

Application

logic

GPU

Coder

NVIDIA

TensorRT &

cuDNN

Libraries

Intel

MKL-DNN

Library

MATLAB

Coder

Target Libraries

ARM

Compute

Library

Pedestrian Detection

Page 13: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

13

ARM

Compute

Library

Application

logic

GPU

Coder

NVIDIA

TensorRT &

cuDNN

Libraries

Intel

MKL-DNN

Library

MATLAB

Coder

Target Libraries

How is the

performance?

Page 14: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

14

Performance of Generated Code

▪ CNN inference (ResNet-50, VGG-16, Inception V3) on Titan V GPU

▪ CNN inference (ResNet-50) on Jetson TX2

▪ CNN inference (ResNet-50 , VGG-16, Inception V3) on Intel Xeon CPU

Page 15: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

15

Single Image Inference on Titan V using cuDNN

PyTorch (1.0.0)

MXNet (1.4.0)

GPU Coder (R2019a)

TensorFlow (1.13.0)

Intel® Xeon® CPU 3.6 GHz - NVIDIA libraries: CUDA10 - cuDNN 7 - Frameworks: TensorFlow 1.13.0, MXNet 1.4.0 PyTorch 1.0.0

Page 16: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

16

Single Image Inference on Jetson TX2

NVIDIA libraries: CUDA9 - cuDNN 7 – TensorRT 3.0.4 - Frameworks: TensorFlow 1.12.0

GPU Coder

+

TensorRT

TensorFlow

+

TensorRT

ResNet-50

Page 17: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

17

CPU Performance

MATLAB

TensorFlow

MXNet

MATLAB Coder

PyTorch

Intel® Xeon® CPU 3.6 GHz - Frameworks: TensorFlow 1.6.0, MXNet 1.2.1, PyTorch 0.3.1

CPU, Single Image Inference (Linux)

Page 18: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

18

Brief Summary

DNN libraries are great for inference, …

MATLAB Coder and GPU Coder generates code that takes advantage of:

NVIDIA® CUDA libraries, including TensorRT & cuDNN

Intel® Math Kernel Library for Deep Neural Networks

(MKL-DNN)

ARM® Compute libraries for mobile platforms

Page 19: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

19

Brief Summary

DNN libraries are great for inference, …

MATLAB Coder and GPU Coder generates code that takes advantage of:

NVIDIA® CUDA libraries, including TensorRT & cuDNN

Intel® Math Kernel Library for Deep Neural Networks

(MKL-DNN)

ARM® Compute libraries for mobile platforms

But, applications

require more than just

inference

Page 20: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

20

Deep Learning Workflows: Integrated Application Deployment

Pre-

processing

Post-

processing

codegen

Portable target code

Page 21: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

21

Lane

Detection

Strongest

Bounding

Box

Lane and Object Detection using YOLO v2

Post-

processing

Object

Detection

Workflow:

1) Test in MATLAB

2) Generate code and test on

desktop

3) Generate code and test on

Jetson AGX Xavier GPU

AlexNet-based YOLO v2

Page 22: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

22

Lane

Detection

Strongest

Bounding

Box

(1) Test in MATLAB

Post-

processing

Object

Detection

AlexNet-based YOLO v2

Page 23: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

23

Lane

Detection

Strongest

Bounding

Box

(2) Generate Code and Test on Desktop GPU

Post-

processing

Object

Detection

cuDNN/TensorRT optimized code

CUDA optimized code

AlexNet-based YOLO v2

Page 24: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

24

Lane

Detection

Strongest

Bounding

Box

(3) Generate Code and Test on Jetson AGX Xavier GPU

Post-

processing

Object

Detection

cuDNN/TensorRT optimized code

CUDA optimized code

AlexNet-based YOLO v2

Page 25: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

25

Lane

Detection

Strongest

Bounding

Box

Lane and Object Detection using YOLO v2

Post-

processing

Object

Detection

cuDNN/TensorRT optimized code

CUDA optimized code

AlexNet-based YOLO v2

Workflow:

1) Test in MATLAB

2) Generate code and test on

desktop

3) Generate code and test on

Jetson AGX Xavier GPU

Page 26: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

26

Accessing Hardware

Deploy Standalone

Application

Access Peripheral

from MATLAB

Processor-in-Loop

Verification

Page 27: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

27

Deploy to Target Hardware via Apps and Command Line

Page 28: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

28

PyTorch (1.0.0)

MXNet (1.4.0)

GPU Coder (R2019a)

TensorFlow (1.13.0)

Page 29: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

29

PyTorch (1.0.0)

MXNet (1.4.0)

GPU Coder (R2019a)

TensorFlow (1.13.0)

How does

MATLAB Coder and

GPU Coder

achieve these results?

Page 30: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

30

Coders Apply Various Optimizations

….

….

CUDA kernel

lowering

Traditional compiler

optimizations

MATLAB Library function mapping

Parallel loop creation

CUDA kernel creation

cudaMemcpy minimization

Shared memory mapping

CUDA code emission

Scalarization

Loop perfectization

Loop interchange

Loop fusion

Scalar replacement

Loop

optimizations

Page 31: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

31

Deep Learning Workflow in MATLAB

Deep Neural Network

Design + Training

Train in MATLAB

Model

importer

Trained

DNN

Transfer

learning

Reference

model

Application

Design

Application

logic

Standalone

Deployment

TensorRT and

cuDNN Libraries

MKL-DNN

Library

Coders

ARM Compute

Library

Page 32: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

32

Deep Learning with MATLAB

This two-day course provides a comprehensive introduction to practical

deep learning using MATLAB®.

Topics include:

▪ Importing image and sequence data

▪ Using convolutional neural networks for image classification,

regression, and object detection

▪ Using long short-term memory networks for sequence

classification and forecasting

▪ Modifying common network architectures

to solve custom problems

▪ Improving the performance of a network

by modifying training options

Page 33: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

33© 2015 The MathWorks, Inc.

Email: [email protected],

LinkedIn: https://www.linkedin.com/in/rishu-gupta-72148914/

Page 34: Deploying Deep Neural Networks to Embedded GPUs and CPUs - Matlab · Deep Learning with MATLAB This two-day course provides a comprehensive introduction to practical deep learning

34

▪ Scan this QR Code or log onto link below

(link also sent to your phone and email)

▪ http://bit.ly/expo19-feedback

▪ Enter the registration id number displayed

on your badge

▪ Provide feedback for this session

Please provide feedback for this block of sessions

Email: [email protected],

LinkedIn: https://www.linkedin.com/in/rishu-gupta-72148914/