A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Webinars

Sunil Mallya

Solutions Architect, Deep Learning

A Deeper Dive into Apache MXNet on AWS

Agenda

• Apache MXNet introduction
• Distributed Deep Learning with AWS CloudFormation
• Deep Learning motivation and basics
• MXNet programming model overview
• Train our first neural network using MXNet

Deep Learning Applications

Deep learning significantly improves many applications across multiple domains: image understanding, speech recognition, natural language processing, and autonomy.

• Netflix – Recommendation engine
• FINRA – Anomaly detection, sequence matching
• TuSimple – Computer vision for autonomous driving
• Pinterest – Image recognition search
• Mapillary – Computer vision for crowdsourced maps

AI Customers on AWS

AI Services: Amazon Rekognition, Amazon Polly, Amazon Lex, more to come in 2017

AI Platform: Amazon Machine Learning, Amazon Elastic MapReduce, Spark & SparkML, more to come in 2017

AI Engines: Apache MXNet, TensorFlow, Caffe, Theano, Keras, Torch, CNTK

Hardware: P2, ECS, Lambda, EMR/Spark, Greengrass, FPGA, more to come in 2017

Democratizing Artificial Intelligence

Apache MXNet

• Programmable: simple syntax, multiple languages
• Portable: highly efficient models for mobile and IoT; a 1024-layer ResNet network is ~4GB
• High Performance: near-linear scaling across hundreds of GPUs; 88% efficiency on 256 GPUs

Webinars

Distributed Deep Learning

[Figure: speedup vs. number of GPUs (1 to 256) for Inception v3, ResNet, and AlexNet, compared with ideal linear scaling; 88% efficiency]

• CloudFormation with the Deep Learning AMI

• 16x p2.16xlarge instances, mounted on EFS

• Batch size: 32 for Inception and ResNet, 512 for AlexNet

• ImageNet: 1.2M images, 1K classes

• 152-layer ResNet: 5.4 days on 4x K80s (1.2 hours per epoch), 0.22 top-1 error
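The slide's numbers fit together with some simple arithmetic, sketched below. This example assumes the usual definition of scaling efficiency as measured speedup divided by ideal (linear) speedup. It also uses the published p2.16xlarge spec of 16 GPUs per instance. The derived values (total GPUs, implied speedup, epoch count) are worked arithmetic, not figures reported in the talk.

```python
# Figures quoted on the slide.
INSTANCES = 16
GPUS_PER_INSTANCE = 16      # a p2.16xlarge exposes 16 GPUs
EFFICIENCY_AT_256 = 0.88    # "88% efficiency on 256 GPUs"

def scaling_efficiency(speedup: float, n_gpus: int) -> float:
    """Assumed definition: measured speedup relative to ideal linear speedup."""
    return speedup / n_gpus

total_gpus = INSTANCES * GPUS_PER_INSTANCE

# Speedup over a single GPU implied by 88% efficiency at this scale.
implied_speedup = EFFICIENCY_AT_256 * total_gpus

# ResNet-152 on 4x K80s: 5.4 days total at 1.2 hours per epoch.
epochs = 5.4 * 24 / 1.2

print(total_gpus, round(implied_speedup, 2), round(epochs))
```

The 16 instances give the 256 GPUs on the chart's x-axis. At 88% efficiency, the cluster runs roughly 225 times faster than one GPU, not 256 times. The ResNet-152 figures work out to about 108 epochs.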

Scaling with MXNet

Distributed Training Setup with CloudFormation

https://github.com/awslabs/deeplearning-cfn

Webinars

Deep Learning basics

Biological Neuron

slide from http://cs231n.stanford.edu/

Artificial Neuron

[Diagram: artificial neuron with inputs, synaptic weights, and output]

• Input: vector of training data x

• Output: linear function of the inputs

• Nonlinearity: transforms the output into the desired range of values, e.g. for classification we need probabilities in [0, 1]

• Training: learn the weights w and bias b
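Taken together, these bullets describe a neuron that computes the linear function w · x + b and passes the result through a nonlinearity. The sketch below assumes a sigmoid as that nonlinearity, one common choice that maps any real number into (0, 1). The input, weight, and bias values are made up purely for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Nonlinearity: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """Single artificial neuron: linear function w . x + b, then a sigmoid."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(z)

# Hypothetical input vector, weights, and bias.
x = [1.0, 0.5]
w = [0.4, -0.2]
b = 0.1

y = neuron(x, w, b)   # linear part is 0.4 - 0.1 + 0.1 = 0.4
print(y)
```

Training, covered next, is the process of adjusting `w` and `b` so that `y` moves closer to the desired label.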

Deep Neural Network

[Diagram: input layer → hidden layers → output]

A common rule of thumb: the optimal size of a hidden layer (its number of neurons) usually lies between the size of the input layer and the size of the output layer.
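Stacking neurons into layers produces the network in the diagram. Below is a minimal forward-pass sketch in plain Python for a hypothetical 4-3-2 network: 4 inputs, one hidden layer of 3 neurons, and 2 outputs. It assumes sigmoid activations in every layer, and the weights are arbitrary illustrative values, not trained ones. The hidden size of 3 falls between the input and output sizes, consistent with the sizing guideline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, biases):
    """Fully connected layer: each row of `weights` holds one neuron's weights."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Feed the input through each (weights, biases) layer in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

# Hypothetical 4-3-2 network with arbitrary weights.
layers = [
    # hidden layer: 3 neurons, 4 weights each
    ([[0.1, 0.2, 0.3, 0.4],
      [0.5, -0.6, 0.7, -0.8],
      [0.0, 0.1, 0.0, 0.1]], [0.0, 0.1, -0.1]),
    # output layer: 2 neurons, 3 weights each
    ([[0.3, -0.2, 0.1],
      [-0.4, 0.5, 0.6]], [0.05, -0.05]),
]

out = forward([1.0, 0.0, 1.0, 0.0], layers)
print(out)
```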

The “Learning” in Deep Learning

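[Diagram: an input (01011) and its label X are fed to a network with weights such as 0.4, 0.3, 0.2, 0.9. When the network's output X1 != X, back propagation (gradient descent) adjusts the weights by a small amount (0.4 ± δ, 0.3 ± δ) to produce new weights, and the cycle repeats.]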
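The update loop in the diagram can be sketched in plain Python for the simplest possible network: a single sigmoid neuron trained with batch gradient descent on cross-entropy loss. With only one layer, back propagation reduces to a single application of the chain rule. The toy data set (logical OR), learning rate, and epoch count are illustrative assumptions. The starting weights 0.4 and 0.3 are borrowed from the diagram.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(data, w, b):
    """Mean cross-entropy between predictions and 0/1 labels."""
    total = 0.0
    for x, label in data:
        p = predict(x, w, b)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(data)

def train(data, w, b, lr=1.0, epochs=1000):
    """Batch gradient descent: move each weight a small step against its gradient."""
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, label in data:
            # For sigmoid + cross-entropy, dLoss/dz simplifies to (prediction - label).
            err = predict(x, w, b) - label
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data set: logical OR of two bits.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w0, b0 = [0.4, 0.3], 0.0

before = loss(data, w0, b0)
w, b = train(data, w0, b0)
after = loss(data, w, b)
print(before, after)
```

Deeper networks follow the same pattern. The chain rule carries `err` backward through every layer to produce one gradient per weight.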

Hidden Layer Visualization

Webinars

MXNet Programming Model

Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

PROS
• Straightforward and flexible.
• Takes advantage of native language features (loops, conditionals, debugger).
• Easy to tweak with Python code.
• E.g. NumPy, Matlab, Torch, …

CONS
• Hard to optimize

Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10)*2)

[Graph: A and B feed a multiply (×) node producing C; C and the constant 1 feed an add (+) node producing D]

PROS
• More chances for optimization, e.g. C can share memory with D because C is no longer needed once D has been computed
• Works across different languages
• E.g. TensorFlow, Theano, Caffe

CONS
• Less flexible
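The declarative pseudocode can be made concrete with a toy symbolic evaluator in plain Python. Building `C` and `D` only records graph nodes, and nothing is computed until the compiled function is called. Because the whole graph exists before any data arrives, a real framework can inspect it and plan optimizations such as reusing C's buffer for D. This sketch simply walks the graph. `Variable`, `Constant`, `Op`, and `compile_graph` are hypothetical names chosen to mirror the slide's pseudocode, not any library's API. Plain lists stand in for NumPy arrays.

```python
import operator

class Node:
    """A node in a symbolic expression graph; building it computes nothing."""
    def __mul__(self, other):
        return Op(operator.mul, self, _wrap(other))

    def __add__(self, other):
        return Op(operator.add, self, _wrap(other))

class Variable(Node):
    def __init__(self, name):
        self.name = name

class Constant(Node):
    def __init__(self, value):
        self.value = value

class Op(Node):
    def __init__(self, fn, lhs, rhs):
        self.fn, self.lhs, self.rhs = fn, lhs, rhs

def _wrap(x):
    return x if isinstance(x, Node) else Constant(x)

def _elementwise(fn, a, b):
    """Apply fn element-wise; a scalar operand is broadcast against a list."""
    if isinstance(a, list) and isinstance(b, list):
        return [fn(x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [fn(x, b) for x in a]
    if isinstance(b, list):
        return [fn(a, y) for y in b]
    return fn(a, b)

def compile_graph(output):
    """Turn a finished graph into a function of its named inputs."""
    def run(**inputs):
        def evaluate(node):
            if isinstance(node, Variable):
                return inputs[node.name]
            if isinstance(node, Constant):
                return node.value
            return _elementwise(node.fn, evaluate(node.lhs), evaluate(node.rhs))
        return evaluate(output)
    return run

A = Variable('A')
B = Variable('B')
C = B * A          # graph node only; no arithmetic yet
D = C + 1
f = compile_graph(D)
d = f(A=[1.0] * 10, B=[2.0] * 10)
print(d[:3])       # [3.0, 3.0, 3.0]
```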

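IMPERATIVE NDARRAY API

>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE SYMBOLIC EXECUTOR

>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
>>> mod = mx.mod.Module(net, data_names=['data'], label_names=['softmax_label'])
>>> mod.bind(data_shapes=[('data', (100, 50))], label_shapes=[('softmax_label', (100,))])
>>> mod.init_params()
>>> # c from the imperative example is the input; the all-zero labels are placeholders
>>> mod.forward(mx.io.DataBatch(data=[c], label=[mx.nd.zeros((100,))]), is_train=True)
>>> mod.backward()

An NDArray can be set as input to the graph: here the array c from the imperative API feeds the symbolic network.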

MXNet: Mixed programming paradigm

Webinars

Let's train our first model to classify handwritten digits

MXNet Overview

• Founded by: University of Washington and Carnegie Mellon University (~1.5 years old)
• Recently accepted into the Apache Incubator
• State-of-the-art model support: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM)
• Scalable: near-linear scaling means the fastest time to a trained model
• Multi-language: support for Scala, Python, R, etc., to leverage existing code and integrate easily with Spark
• Ecosystem: vibrant community from academia and industry

Open source project on GitHub | Apache 2.0 licensed

Application Examples | Python notebooks
• https://github.com/dmlc/mxnet-notebooks
• Basic concepts
  • NDArray - multi-dimensional array computation
  • Symbol - symbolic expression for neural networks
  • Module - neural network training and inference
• Applications
  • MNIST: recognize handwritten digits
  • Check out the distributed training results
  • Predict with pre-trained models
  • LSTMs for sequence learning
  • Recommender systems
  • Train a state-of-the-art computer vision model (CNN)
  • Lots more...

Call to Action

MXNet Resources:
• MXNet Blog Post | AWS Endorsement
• Read up on MXNet and learn more: mxnet.io
• MXNet GitHub Repo
• MXNet Recommender Systems Talk | Leo Dirac

Developer Resources:
• Deep Learning AMI | Amazon Linux
• Deep Learning AMI | Ubuntu (new)
• P2 Instance Information
• CloudFormation Template Instructions
• Deep Learning Benchmark
• MXNet on Lambda
• MXNet on ECS/Docker
• MXNet on Raspberry Pi | Wine Detector

Webinars

Thank You

[email protected]

a deeper dive into apache mxnet - march 2017 aws online tech talks

Technology