HPC Advisory Council Singapore

October 7, 2014

Marc Hamilton, Vice President,

Solution Architecture and Engineering

VISUAL COMPUTING FOR CLOUD MOBILE

2

THREE TRENDS CONVERGING

Torrent of Data

[Chart: exabytes of unstructured data, 2010 to 2015. Source: IDC]

Deep Neural Networks

GPU Computing

3

MACHINE LEARNING

Branch of Artificial Intelligence

Computers that learn from data

[Example images with labels: person, car, helmet, motorcycle, bird, frog, person, dog, chair, person, hammer, flower pot, power drill]

4

DEEP LEARNING IN A LARGER CONTEXT

Data Science (“Big Data”)

Data Analysis: Machine Learning; Data Mining (e.g. Statistics)

Data Management: Distributed Storage (e.g. HDFS); Queries & Indexing (e.g. Map-D, GISFederal, SQream)

Machine Learning, by GPU value:

Strong GPU value: Deep Learning (Deep Neural Nets, Convolutional Neural Nets)

Some GPU value: SVM, K-Means Clustering

More research to prove GPU value: Recommender Systems, Collaborative Filtering, Regression, Bayesian Networks, Decision Trees, Random Forests, Semantic Analysis

5

GPUS FOR DEEP LEARNING

Image Recognition CHALLENGE

1.2M training images • 1000 object categories

[Chart: GPU usage for ILSVRC, 2010 to 2014. Axes: winning % error (0% to 30%) and % teams using GPUs (0% to 100%)]

6

NUS WINS IMAGENET 2014 CHALLENGE

7

MACHINE LEARNING USE CASES

Face Detection

Autonomous Driving

Image / Video Tagging

Speech Recognition

Product Recommendations

Object Recognition

Situational Awareness

…machine learning is pervasive

8

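EFFICIENT CONVOLUTIONS ON GPUS

Convolution as GEMM (matrix-matrix product) => Great on GPUs

Image:

A B C
D E F
G H I

Kernel α:

a b c
d e f
g h i

Unrolled image patches, one row per output position x,y ("-" marks a position outside the image):

- A B - D E - G H
A B C D E F G H I
B C - E F - H I -

Each row is multiplied by the flipped kernel α, written as a column:

i h g f e d c b a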
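The unrolling pictured on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the cuDNN or Caffe implementation: the function names `im2col`, `matmul`, and `conv2d_gemm` are hypothetical, the image has a single channel, the kernel size is odd, and positions outside the image are treated as zero.

```python
def im2col(image, k):
    """Unroll every k x k patch of a zero-padded 2D image into one row."""
    h, w = len(image), len(image[0])
    pad = k // 2  # assumes an odd kernel size
    rows = []
    for y in range(h):
        for x in range(w):
            patch = []
            for dy in range(-pad, pad + 1):
                for dx in range(-pad, pad + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < h and 0 <= xx < w
                    patch.append(image[yy][xx] if inside else 0)
            rows.append(patch)
    return rows


def matmul(a, b):
    """Naive matrix-matrix product (GEMM) of a (m x n) and b (n x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv2d_gemm(image, kernels):
    """Convolve one image with several k x k kernels using a single GEMM.

    Returns one output map (same size as the image) per kernel.
    """
    k = len(kernels[0])
    h, w = len(image), len(image[0])
    patches = im2col(image, k)                      # (h*w) x (k*k)
    # Flatten each kernel and reverse it (i h g ... a), as on the slide.
    flipped = [[v for row in kern for v in row][::-1] for kern in kernels]
    weights = [list(col) for col in zip(*flipped)]  # (k*k) x num_kernels
    out = matmul(patches, weights)                  # (h*w) x num_kernels
    return [[[out[y * w + x][n] for x in range(w)] for y in range(h)]
            for n in range(len(kernels))]
```

With the image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, row 3 of `im2col(image, 3)` is `[0, 1, 2, 0, 4, 5, 0, 7, 8]`, which corresponds to the slide's `- A B - D E - G H`. Stacking all kernels as columns of one weight matrix is what turns many small convolutions into one large matrix product, the shape of work that GPUs handle well.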

9

INTRODUCING NVIDIA CUDNN

Lets DNN researchers focus on DNNs

We provide expertly tuned computational components

Accelerate, don’t replace, existing popular DNN frameworks

Forward and backward convolution routines tuned for NVIDIA GPUs

Optimized for all future NVIDIA GPU generations

Arbitrary dimension ordering, striding, and subregions for 4d tensors means easy integration into any neural net implementation

Download: http://www.nvidia.com/cudnn

Contact: cudnn@nvidia.com
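The point about arbitrary dimension ordering, striding, and subregions for 4d tensors rests on strided addressing. The sketch below shows the general idea only and is not cuDNN's API: `packed_strides` and `offset` are hypothetical helper names, and the axis letters n, c, h, w (batch, channel, height, width) are an assumed naming convention.

```python
def packed_strides(dims, order):
    """Strides (in elements) for a densely packed 4D tensor.

    dims  -- sizes per axis, e.g. {"n": 2, "c": 3, "h": 4, "w": 5}
    order -- layout from outermost to innermost axis, e.g. "nchw" or "nhwc"
    """
    strides, step = {}, 1
    for axis in reversed(order):
        strides[axis] = step   # innermost axis gets stride 1
        step *= dims[axis]
    return strides


def offset(index, strides, base=0):
    """Flat buffer offset of the element at index = (n, c, h, w)."""
    n, c, h, w = index
    return (base + n * strides["n"] + c * strides["c"]
            + h * strides["h"] + w * strides["w"])


dims = {"n": 2, "c": 3, "h": 4, "w": 5}
nchw = packed_strides(dims, "nchw")   # {"w": 1, "h": 5, "c": 20, "n": 60}
nhwc = packed_strides(dims, "nhwc")   # {"c": 1, "w": 3, "h": 15, "n": 60}

# A subregion starting at (h=1, w=1) reuses the parent strides with a new base.
sub_base = offset((0, 0, 1, 1), nchw)
```

Because a routine that takes explicit strides never assumes a particular memory layout, the same convolution code can read a framework's existing buffers in place, whichever ordering that framework chose.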

10

USING CAFFE WITH CUDNN

Accelerate Caffe layer types by 1.2 – 3x

Example: AlexNet Layer 2 forward:

1.9x faster convolution, 2.7x faster pooling

Integrated into Caffe dev branch today! (targeting official release with Caffe 1.0)

Comparison against SOL (speed of light, the theoretical peak): ~50% headroom

(still trying to figure this out)

CPU could probably get within ~3x

Overall AlexNet training time: baseline Caffe compared to Caffe accelerated by cuDNN on K40

Caffe (CPU*): 1x

Caffe (GPU): 11x

Caffe (cuDNN): 14x

*CPU is 24 core E5-2697v2 @ 2.4GHz, Intel MKL 11.1.3
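The slide gives the speedups relative to the CPU baseline only. A small worked calculation, using just the three figures above, shows what they imply for cuDNN relative to the existing GPU path; the variable and function names are illustrative.

```python
# Overall AlexNet training speedups relative to the CPU baseline (from the slide).
speedup_vs_cpu = {
    "Caffe (CPU)": 1.0,
    "Caffe (GPU)": 11.0,
    "Caffe (cuDNN)": 14.0,
}


def relative_speedup(results, fast, slow):
    """How many times faster `fast` is than `slow`."""
    return results[fast] / results[slow]


cudnn_over_gpu = relative_speedup(speedup_vs_cpu, "Caffe (cuDNN)", "Caffe (GPU)")
print(f"cuDNN over baseline GPU Caffe: {cudnn_over_gpu:.2f}x")
```

The result, 14 / 11 or roughly 1.27x, sits at the low end of the 1.2 – 3x per-layer range quoted on the same slide, which is what one would expect if only some layer types are accelerated.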

11

Deep Learning with COTS HPC Systems

A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro

Stanford / NVIDIA • ICML 2013

STANFORD AI LAB

3 GPU-Accelerated Servers

12 GPUs • 18,432 cores

4 kWatts

$33,000

GOOGLE BRAIN

1,000 CPU Servers

2,000 CPUs • 16,000 cores

600 kWatts

$5,000,000

“Now You Can Build Google’s $1M Artificial Brain on the Cheap” -Wired
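The comparison on this slide reduces to a few ratios. The sketch below computes them from the slide's own figures, taking the Google Brain cost as $5,000,000; the dictionary keys and the `ratio` helper are illustrative names.

```python
# Figures as stated on the slide.
stanford = {"servers": 3, "gpus": 12, "cores": 18_432,
            "kilowatts": 4, "usd": 33_000}
google_brain = {"servers": 1_000, "cpus": 2_000, "cores": 16_000,
                "kilowatts": 600, "usd": 5_000_000}


def ratio(key):
    """Google Brain figure divided by the Stanford figure for one metric."""
    return google_brain[key] / stanford[key]


cores_per_gpu = stanford["cores"] // stanford["gpus"]          # 1536
cores_per_cpu = google_brain["cores"] // google_brain["cpus"]  # 8
power_ratio = ratio("kilowatts")   # 150.0
cost_ratio = ratio("usd")          # about 151.5
server_ratio = ratio("servers")    # about 333.3

print(f"power: {power_ratio:.0f}x, cost: {cost_ratio:.1f}x, "
      f"servers: {server_ratio:.0f}x")
```

On these numbers the GPU cluster uses about 150 times less power and costs about 150 times less. The core counts should not be compared directly, since a GPU core and a CPU core are very different units of hardware.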

12

Mobile - More Than Just Phones

13

UNIFIED ARCHITECTURE

TEGRA K1 – MOBILE SUPER CHIP

GPU ARCHITECTURE: Tesla, Fermi, Kepler, Maxwell

MOBILE ARCHITECTURE: Tegra 3, Tegra 4, Tegra K1

BREAKTHROUGH EXPERIENCES

TEGRA TK1

14

JETSON TK1 DEV KIT

1ST MOBILE SUPERCOMPUTER FOR EMBEDDED SYSTEMS

192 CUDA cores

326 GFLOPS

VisionWorks SDK

15

DIGITAL COCKPIT

EVOLUTION OF COMPUTING IN THE CAR

Tegra 4 Tegra 3 Tegra K1

Virtual Cockpit Autonomous Driving Infotainment

16

COMPUTER VISION ON CUDA

Feature Detection / Tracking ~30 GFLOPS @ 30 Hz

Object Recognition / Tracking ~180 GFLOPS @ 30 Hz

3D Scene Interpretation ~280 GFLOPS @ 30 Hz
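To put these workload figures in context, the sketch below compares them with the 326 GFLOPS quoted for the Jetson TK1 earlier in the deck. Pairing the two slides this way is an assumption made here for illustration, as is reading each "~N GFLOPS @ 30 Hz" figure as a sustained compute rate at 30 frames per second; all names are illustrative.

```python
TK1_PEAK_GFLOPS = 326   # Jetson TK1 figure from the dev kit slide
FRAME_RATE_HZ = 30

# Approximate compute demand per workload, in GFLOPS (from this slide).
workloads = {
    "Feature Detection / Tracking": 30,
    "Object Recognition / Tracking": 180,
    "3D Scene Interpretation": 280,
}


def gflop_per_frame(gflops, rate_hz=FRAME_RATE_HZ):
    """Work available per frame, in GFLOP, at the given frame rate."""
    return gflops / rate_hz


def peak_fraction(gflops, peak=TK1_PEAK_GFLOPS):
    """Share of the quoted peak that the workload would occupy."""
    return gflops / peak


for name, gflops in workloads.items():
    print(f"{name}: {gflop_per_frame(gflops):.1f} GFLOP per frame, "
          f"{peak_fraction(gflops):.0%} of quoted peak")
```

Under these assumptions the three workloads come to roughly 9%, 55%, and 86% of the quoted peak. Real applications rarely sustain peak throughput, so the heaviest workload would leave little margin on a single chip.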

17

NIGHT AND DAY DIFFERENCE

Without GPU • With GPU

HTTP://NVIDIA.COM/TRYGRID

18

Thank You
