Urs Köster presenting at RE-WORK DL Summit in Boston

Posted on 13-Apr-2017

Proprietary and confidential. Do not distribute.

Deep Learning at Scale

May 2016 Urs Köster, PhD

Nervana

MAKING MACHINES SMARTER.

About Nervana

• A platform for machine intelligence

• Enables deep learning at scale

• Optimized from algorithms to silicon

The Nervana Platform - a full-stack solution

neon deep learning framework

Nervana Cloud

Solutions

Images

Text

Tabular

Speech

Time series

Video

neon: Nervana Python deep learning library

• User-friendly, extensible, fast

• Support for many deep learning models

• Interface to Nervana Cloud

• Multiple backends

  • Nervana Engine

  • GPU (optimized assembler kernels)

  • CPU cluster

Open source (Apache 2.0) on github.com/nervanaSystems/neon

Nervana Cloud

web interface

command line

Deep learning as a core technology

‘Google Brain’ model, with DL at the core: Photos, Maps, Voice Search, Self-driving car, Ad Targeting, Machine Translation

Nervana Platform, with DL at the core: Image Classification, Object Localization, Video Indexing, Speech Recognition, Natural Language

Video recognition with 3D convolution

[Bar chart: Training Speed in epochs / hour (axis 0 to 1), neon vs. caffe]

Object Localization / Segmentation

CamVid Dataset: SegNet model

KITTI Dataset: Fast R-CNN model

Model                             neon (ms)   caffe (ms)   Speedup
Fast R-CNN (batch size=4)         360         670          1.8x
SegNet (batch size=4)             267         1455         5.4x
SegNet (4 GPUs, batch size=16)    348         --           *5.9x
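
The slide does not say how the Speedup column was computed. A minimal sketch, assuming it is the ratio of caffe time to neon time per iteration, reproduces the two single-GPU rows when the ratio is truncated to one decimal. The function names are illustrative, and the multi-GPU row is left out because it has no caffe timing and its asterisk is not explained in the transcript.

```python
import math

# Per-iteration times in milliseconds, copied from the table: (neon, caffe).
timings_ms = {
    "Fast R-CNN (batch size=4)": (360, 670),
    "SegNet (batch size=4)": (267, 1455),
}

def speedup(neon_ms, caffe_ms):
    """Assumed definition: how many times faster neon is than caffe."""
    return caffe_ms / neon_ms

def truncate_1dp(x):
    """Drop everything after the first decimal place (no rounding)."""
    return math.floor(x * 10) / 10

for name, (neon_ms, caffe_ms) in timings_ms.items():
    ratio = speedup(neon_ms, caffe_ms)
    print(f"{name}: {ratio:.2f}x (truncated: {truncate_1dp(ratio)}x)")
```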

Image Classification (Residual Network)

Speech to text

ImageNet ILSVRC Challenge

[Chart: Top-5 error rate (axis 0% to 30%) by year, 2010 to 2015. Legend: deep learning, human performance. Labeled entries: AlexNet, Clarifai, GoogleNet, ResNet]

Speeding up Deep Learning

• Same model, better performance:

  • Hardware improvements

  • Algorithmic improvements

[Bar chart: AlexNet ms / iteration (axis 0 to 600) for CPU, GTX580, TitanX and neon; one bar is labeled "15,000 ..."]

[Chart: Soumith's AlexNet Benchmark, ms (axis 0 to 500) at 4/2015, 8/2015 and 3/2016, neon vs. CuDNN]

[Chart: Soumith's GoogleNet Benchmark, ms (axis 0 to 500) at 4/2015, 8/2015 and 3/2016, neon vs. CuDNN]

Dennard scaling has ended

[Chart: learning speed vs. # of processors]

• Industry standard: communication overhead = performance ceiling

• Nervana: better communication fabric, near-linear scaling

[Chart legend: Transistors, Clock speed, Power, Perf / clock]
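
The ceiling described on this slide can be illustrated with a toy model. This is a sketch under stated assumptions, not Nervana's measurements or interconnect design: it assumes data-parallel training in which each step splits a fixed amount of compute across the workers and then averages gradients with a ring all-reduce, whose transfer time is roughly 2(n-1)/n times the gradient size divided by the link bandwidth. All constants are hypothetical.

```python
def step_time(n_workers, compute_s, grad_bytes, bandwidth_bytes_per_s):
    """Seconds per training step: compute shrinks with n, communication does not."""
    # Ring all-reduce cost model (assumption): 2 * (n - 1) / n * bytes / bandwidth.
    comm_s = 2 * (n_workers - 1) / n_workers * grad_bytes / bandwidth_bytes_per_s
    return compute_s / n_workers + comm_s

def speedup(n_workers, compute_s, grad_bytes, bandwidth_bytes_per_s):
    """Speedup relative to a single worker, which needs no communication."""
    single = step_time(1, compute_s, grad_bytes, bandwidth_bytes_per_s)
    return single / step_time(n_workers, compute_s, grad_bytes, bandwidth_bytes_per_s)

# Hypothetical workload: 1 s of compute per step, 100 MB of gradients.
COMPUTE_S = 1.0
GRAD_BYTES = 100e6
SLOW_LINK = 1e9     # 1 GB/s, hypothetical commodity interconnect
FAST_LINK = 100e9   # 100 GB/s, hypothetical high-bandwidth fabric

for n in (1, 2, 4, 8, 16, 32, 64):
    slow = speedup(n, COMPUTE_S, GRAD_BYTES, SLOW_LINK)
    fast = speedup(n, COMPUTE_S, GRAD_BYTES, FAST_LINK)
    print(f"{n:3d} workers: slow link {slow:5.2f}x, fast link {fast:5.2f}x")
```

With these numbers the slow link can never exceed a 5x speedup, because communication alone approaches 0.2 s per step, while the fast link stays close to linear at 8 workers.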

Nervana Engine (coming in 2017)

• Unprecedented computing power

• 10x speedup over current GPUs

• More memory on-chip

• High-Bandwidth Memory off-chip

• Six bi-directional high-bandwidth links for 3D torus interconnect

• 8 chips in a box, seamlessly scale to multiple chassis
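
Six links per chip is what a 3D torus needs: one link in each direction along each of three axes, with wraparound at the edges. The sketch below shows that addressing scheme in general terms. The function name and the 4 x 4 x 4 grid are hypothetical, since the slide does not give the dimensions of the topology or how chips are arranged within a box.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of a node in a 3D torus with wraparound."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            neighbor = list(coord)
            # The modulo wraps the edge of the grid back to the opposite side.
            neighbor[axis] = (neighbor[axis] + step) % dims[axis]
            neighbors.append(tuple(neighbor))
    return neighbors

# Hypothetical 4 x 4 x 4 grid; the corner node still has six distinct neighbors.
for node in torus_neighbors((0, 0, 0), (4, 4, 4)):
    print(node)
```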

Summary

• Deep learning is a new computational paradigm

• Learning and Inference on data

• neon with state-of-the-art GPU kernels

• Nervana Cloud with multi-GPU training

• Watch for Nervana Engine deep learning processor
