AI On the Edge - Cambridge Wireless · Cyrus M. Vahid, Principal Solutions Architect, Principal...


© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Cyrus M. Vahid, Principal Solutions Architect, Principal DeepLearning Solution Architect

AWS DeepLearning, cyrusmv@amazon.com

Oct 2017

AI On the Edge


Motivation


Training vs. Inference

• Training is performed in the cloud.

• Inference is performed everywhere.

• Efficiency of inference is indispensable to address:
  • Latency
  • Connectivity
  • Cost
  • Privacy/Security


Motivation

Large DNNs require huge amounts of memory, e.g. the AlexNet Caffemodel is over 200 MB and the VGG-16 Caffemodel is over 500 MB.

Complex computation makes apps power hungry.

Edge devices have low power and small memory capacity.

∴ To run models on the edge we need to compress them significantly.
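As a rough check on the sizes quoted above, a model's storage footprint is approximately its parameter count times the bytes used per weight. The sketch below assumes 32-bit float weights; the parameter counts (about 61 million for AlexNet, about 138 million for VGG-16) are commonly cited approximations and are not taken from the slides.

```python
# Back-of-the-envelope model size: parameters x bytes per weight.
# Parameter counts are approximate, commonly cited figures (an assumption,
# not from the slides); weights are assumed to be stored as 32-bit floats.
BYTES_PER_FLOAT32 = 4

approx_param_counts = {
    "AlexNet": 61_000_000,
    "VGG-16": 138_000_000,
}


def model_size_mb(num_params, bytes_per_weight=BYTES_PER_FLOAT32):
    """Return the approximate size in megabytes (10**6 bytes)."""
    return num_params * bytes_per_weight / 1e6


sizes_mb = {name: model_size_mb(n) for name, n in approx_param_counts.items()}

for name, size in sizes_mb.items():
    print(f"{name}: roughly {size:.0f} MB of float32 weights")
```

Under these assumptions both estimates land above the 200 MB and 500 MB figures on the slide, which is why the techniques that follow target either the number of weights or the bits spent per weight.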


Motivating Examples From Customers

• Industrial IoT (Out of Distribution/Anomaly Detection)


Motivating Examples From Customers

• Real Time Filtering (Neural Style Transfer)


Motivating Examples From Customers

• Building a Better Hearing Aid (Recurrent Acoustic Models)


Motivating Examples From Customers

• Security Robots (Object Detection and Recognition)


Autonomous Vehicles


Model Compression


Computational Efficiency

• The goal is to reduce floating point operations and the number of parameters:

Fast Fourier Transform

Most effective for larger kernels

$2N^2 \to 2N\ln(N)$

Winograd FFT

2.25 times reduction for $F(2 \times 2, 3 \times 3)$

Tensor Contraction Layer

$I = (x_{i,j})$, $F = (f_{i,j})$

$F = F_1 \otimes \dots \otimes F_d$

Separable Kernels

$O(w^d) \to O(w \times d)$

Very effective on CPU
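Two of the savings above can be checked numerically. The sketch below is illustrative only and is not the deck's implementation: it uses the standard one-dimensional Winograd minimal filter F(2, 3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6 (nesting it in two dimensions gives 16 instead of 36, the 2.25 times reduction for F(2×2, 3×3)), and it shows that a rank-1 (separable) 3×3 kernel can be applied as two 1-D passes. All function names are made up for the example.

```python
import numpy as np


def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly: 6 multiplications."""
    return np.array([d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
                     d[1] * g[0] + d[2] * g[1] + d[3] * g[2]])


def winograd_f23(d, g):
    """Same two outputs with 4 multiplications (filter sums can be precomputed)."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])


def conv2d_valid(image, kernel):
    """Naive 2-D cross-correlation, no padding, stride 1."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)

# Winograd F(2, 3): same result, fewer multiplications.
d, g = rng.standard_normal(4), rng.standard_normal(3)
winograd_matches = np.allclose(direct_f23(d, g), winograd_f23(d, g))
reduction_2d = (2 * 2 * 3 * 3) / (4 * 4)  # direct vs nested F(2x2, 3x3)

# Separable kernel: one 3x3 pass equals a 3x1 pass followed by a 1x3 pass.
image = rng.standard_normal((16, 16))
col = np.array([1.0, 2.0, 1.0])
row = np.array([-1.0, 0.0, 1.0])
full = conv2d_valid(image, np.outer(col, row))
two_pass = conv2d_valid(conv2d_valid(image, col[:, None]), row[None, :])
separable_matches = np.allclose(full, two_pass)

w, dims = 3, 2
print(winograd_matches, reduction_2d, separable_matches)
print(f"multiplications per output: {w ** dims} (full) vs {w * dims} (separable)")
```

The separable trick only applies when the kernel factors exactly (or approximately) into 1-D pieces, whereas the Winograd transform applies to any 3-tap filter.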


Model Compression: Pruning-Quantization-Encoding

arXiv:1510.00149v5


Model Compression: Pruning

• Pruning removes connections that contribute little to the computation of a network.

• After training, all weights smaller than a certain threshold are removed, and the model is retrained.

• A 9-13 times reduction in the number of parameters without loss of accuracy has been shown. [arXiv:1510.00149v5]
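The threshold-then-retrain procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the cited paper's code: the weight matrix is random, the threshold is chosen as the 90th percentile of absolute values, and the "gradient" in the retraining step is a random stand-in.

```python
import numpy as np


def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask


def masked_update(weights, grad, mask, lr=0.01):
    """One retraining step that keeps pruned connections at zero."""
    return (weights - lr * grad) * mask


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(256, 256))   # stand-in for a trained layer

threshold = np.quantile(np.abs(w), 0.9)     # assumption: keep the largest ~10%
pruned, mask = prune_by_magnitude(w, threshold)
kept_fraction = mask.mean()

fake_grad = rng.normal(size=w.shape)        # stand-in for a real gradient
retrained = masked_update(pruned, fake_grad, mask)

print(f"kept {kept_fraction:.1%} of weights, "
      f"about {1 / kept_fraction:.0f}x fewer parameters")
```

In practice the threshold and the number of prune/retrain rounds are tuned per layer; the mask is what lets retraining recover accuracy without reviving the removed connections.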


Model Compression: Quantization

• Quantization is about using fewer bits to express the same information.

• Weight sharing is one method of quantization; it uses centroids as shared weights.

[arXiv:1510.00149v5]: weight sharing through scalar quantization

Good for taking advantage of low-precision hardware acceleration
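Weight sharing with centroids can be illustrated with a small one-dimensional k-means over a layer's weights. The sketch below is an assumption-laden toy, not the cited implementation: the weights are random, 16 clusters are used so each weight is stored as a 4-bit index, and centroids are initialised linearly between the minimum and maximum weight.

```python
import numpy as np


def share_weights(weights, n_clusters=16, n_iter=20):
    """Cluster weights with 1-D k-means; return (codebook, index array)."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # Assign every weight to its nearest centroid.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids, idx.reshape(weights.shape).astype(np.uint8)


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)

codebook, indices = share_weights(w, n_clusters=16)
reconstructed = codebook[indices]           # shared weights looked up by index

original_bits = w.size * 32
compressed_bits = w.size * 4 + codebook.size * 32   # 4-bit indices + codebook
mean_abs_error = float(np.abs(reconstructed - w).mean())

print(f"compression ratio: {original_bits / compressed_bits:.1f}x, "
      f"mean abs error: {mean_abs_error:.4f}")
```

Only the small codebook holds real values; every layer weight becomes a short index into it, which is also what makes the result a good input for entropy coding.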


Model Compression: Huffman Coding

• A Huffman code is an optimal prefix code commonly used for lossless data compression.

• It uses variable-length code words to encode source symbols.

• More common symbols are represented with fewer bits.

[arXiv:1510.00149v5]

• Probability distribution of quantized weights and the sparse matrix index of the last fully connected layer in AlexNet.

• Most of the quantized weights are distributed around the two peaks; the sparse matrix index differences are rarely above 20.

• Experiments show that Huffman coding these non-uniformly distributed values saves 20%-30% of network storage.

BMXNet – Collaborators in the MXNet community brought this to binary weights: https://github.com/hpi-xnor/BMXNet
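To make the Huffman coding bullets concrete, the sketch below computes Huffman code lengths for a skewed stream of quantized-weight indices and compares the total against a fixed-width encoding. The symbol frequencies are invented for illustration and are not the AlexNet statistics from the cited paper.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:                     # degenerate case: one symbol
        return {next(iter(counts)): 1}
    # Heap entries: (total count, tie-breaker, symbols in this subtree).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                    # merged symbols go one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


# Hypothetical stream of 2-bit cluster indices with a skewed distribution.
stream = [0] * 50 + [1] * 25 + [2] * 15 + [3] * 10

lengths = huffman_code_lengths(stream)
huffman_bits = sum(lengths[s] for s in stream)
fixed_bits = len(stream) * 2                 # 4 symbols need 2 bits each

print(f"fixed width: {fixed_bits} bits, Huffman: {huffman_bits} bits")
```

The more peaked the distribution, the larger the saving; a uniform distribution over four symbols would gain nothing over the 2-bit fixed code.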


Reduced Architecture

SqueezeNet: AlexNet Accuracy with 50x Fewer Parameters

Good for devices with low RAM that can’t hold all weights for larger models concurrently in memory

Student/Teacher training
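One common form of student/teacher training has a small student network match the temperature-softened output distribution of a larger teacher. The sketch below shows only that loss term in NumPy; the logits are made up, the temperature of 4.0 is an arbitrary assumption, and the slides do not say which variant was used.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy of the student against the teacher's softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


# Hypothetical logits for a batch of two examples and three classes.
teacher = np.array([[6.0, 2.0, -1.0], [0.5, 4.0, 1.0]])
good_student = teacher.copy()
poor_student = np.array([[-1.0, 2.0, 6.0], [4.0, 0.5, 1.0]])

loss_good = distillation_loss(good_student, teacher)
loss_poor = distillation_loss(poor_student, teacher)
print(f"matching student: {loss_good:.3f}, mismatched student: {loss_poor:.3f}")
```

A higher temperature exposes more of the teacher's relative ranking of wrong classes, which is the extra signal the student learns from.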


Comparing Techniques

| | Winograd Convolutions | Separable Convolutions | Quantization | Tensor Contractions | Sparsity Exploitation | Weight Sharing |
|---|---|---|---|---|---|---|
| CPU Acceleration | + | ++ | = | ++ | + | + |
| GPU Acceleration | + | + | + | + | = | + |
| Model Size | = | = | - | - | - | - |
| Model Accuracy | = | - | - | - | - | - |
| Specialized Hardware Acceleration | + | + | ++ | + | + | + |


Edge Compute Models – AWS IoT

Key Functions
• Data Ingest
• Compressed Inference
• Full Inference / Trained Model Query
• Model Training

Deployment Models
• Cloud <-> Edge
• Cloud <-> Hub <-> Edge

Edge Analytics Trends: Reduce Latency, Reduce Transfer Costs


AWS Deep Learning Infrastructure Tools

P2 Instances: Up to 40K CUDA Cores

Deep Learning AMI: Preconfigured for Deep Learning (MXNet, TensorFlow, …)

CFM Template: Deep Learning Cluster


Apache MXNet

Most Open, Best On AWS
• Optimized for deep learning on AWS
• Accepted into the Apache Incubator


Amazon AI: Scaling With MXNet

[Chart: scaling curves for Inception v3, ResNet, and AlexNet against the ideal line; x-axis from 1 to 256 in powers of two; 88% efficiency]


Manage and Monitor Models on The Fly

• AWS
• Captured Data
• Upload Tagged Data
• Escalate to AI Service
• Escalate to Custom Model on P2
• Deploy and Manage Model


Local Learning Loop

• Poorly Classified Data
• Updated Model
• Fine Tune Model With Accurate Classification
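One way to picture the loop above is a confidence gate on the edge device: confident predictions are served locally, while poorly classified samples are escalated and queued so they can later be labelled accurately and used to fine-tune the model. The sketch below is purely hypothetical; `EdgeRouter`, its threshold of 0.8, and the queue are illustrative names and values, not an AWS API or the deck's design.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeRouter:
    """Hypothetical confidence gate for an on-device classifier."""
    confidence_threshold: float = 0.8          # assumption, tune per use case
    fine_tune_queue: list = field(default_factory=list)

    def handle(self, sample, probabilities):
        """Return (predicted label, route) for one sample."""
        label = max(range(len(probabilities)), key=probabilities.__getitem__)
        if probabilities[label] >= self.confidence_threshold:
            return label, "edge"
        # Poorly classified: keep the sample for labelling and fine-tuning.
        self.fine_tune_queue.append(sample)
        return label, "escalate"


router = EdgeRouter()
confident = router.handle("frame-001", [0.05, 0.92, 0.03])
uncertain = router.handle("frame-002", [0.40, 0.35, 0.25])

print(confident, uncertain, router.fine_tune_queue)
```

The queued samples are what a cloud-side training job would consume before an updated model is pushed back to the device.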


References

• arXiv:1510.00149v5: Deep Compression; Han, Mao, and Dally
• arXiv:1509.09308v2: Fast Algorithms for CNN; Lavin and Gray
• arXiv:1706.00439v1: Tensor Contraction Layers; Anima Anandkumar et al.
• arXiv:1606.09274v1: Compression of NMT via Pruning; See, Luong, and Manning
• http://cs231n.stanford.edu/reports/2016/pdfs/117_Report.pdf: Pruning Winograd and FFT based algorithms; Liu and Turakhia
• https://colfaxresearch.com/falcon-library/
• https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
• https://en.wikipedia.org/wiki/Fast_Fourier_transform
• https://arxiv.org/pdf/1611.06321.pdf: Learning the Number of Neurons in Deep Networks
• https://aclweb.org/anthology/D16-1139: Sequence Level Knowledge Distillation; Kim and Rush


Thank you!

Cyrus M. Vahid, cyrusmv@amazon.com