Designing Light-Weight Networks for Mobile Applications
Jianping Fan, Dept of Computer Science, UNC-Charlotte
Course Website: http://webpages.uncc.edu/jfan/itcs5152.html

Page 1

Designing Light-Weight Networks for Mobile Applications

Jianping Fan

Dept of Computer Science

UNC-Charlotte

Course Website: http://webpages.uncc.edu/jfan/itcs5152.html

Page 2

Outline

1. Background

2. SqueezeNet

3. MobileNet v1

4. MobileNet v2

5. ShuffleNet

6. Summary

7. References

2

Page 3

Page 4

Source: http://isca2016.eecs.umich.edu/wp-content/uploads/2016/07/4A-1.pdf

4

Page 5

Special Requirements from Mobile Applications

• Low Battery → small model size, cost-efficient inference methods, short inference time & cost

• Small Memory → small model size, fewer parameters, less data-hungry model updating

• Quick Response → short inference time

⇒ Small Model

Page 6

For Mobile Applications

Accuracy vs. Energy Cost & Model Size

Page 7

Why smaller models?

Page 8

Why smaller models?

Page 9

Why smaller models?

Page 10

Why smaller models?

Page 11

Why smaller models?

Page 12

Why smaller models?

12

Operation              Energy [pJ]   Relative Cost
32-bit int ADD         0.1           1
32-bit float ADD       0.9           9
32-bit Register File   1             10
32-bit int MULT        3.1           31
32-bit float MULT      3.7           37
32-bit SRAM Cache      5             50
32-bit DRAM Memory     640           6400

Source:

http://isca2016.eecs.umich.edu/wp-content/uploads/2016/07/4A-1.pdf
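To make these numbers concrete, the sketch below does a rough back-of-the-envelope energy estimate for a single convolution layer using the per-operation costs from the table above; the layer dimensions and the assumption that each weight is fetched exactly once are hypothetical, for illustration only.

```python
# Rough energy estimate for one convolution layer, using the per-operation
# costs from the table above (values in picojoules). The layer sizes below
# are hypothetical and only meant to illustrate the arithmetic.
FLOAT_ADD, FLOAT_MULT = 0.9, 3.7
SRAM_ACCESS, DRAM_ACCESS = 5.0, 640.0

def conv_energy_pj(macs, weights, use_dram=True):
    """Each MAC ~ one float multiply + one float add; each weight is fetched once."""
    compute = macs * (FLOAT_MULT + FLOAT_ADD)
    memory = weights * (DRAM_ACCESS if use_dram else SRAM_ACCESS)
    return compute + memory

# Hypothetical 3x3 conv: 64 -> 128 channels on a 56x56 feature map.
macs = 3 * 3 * 64 * 128 * 56 * 56
weights = 3 * 3 * 64 * 128
print(conv_energy_pj(macs, weights) / 1e6, "uJ with weights fetched from DRAM")
print(conv_energy_pj(macs, weights, use_dram=False) / 1e6, "uJ with weights in SRAM")
```

Even in this crude model, fewer parameters and fewer DRAM accesses translate directly into lower energy per inference.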

Page 13

13

Techniques for Creating Fast & Energy-Efficient DNNs

[Figure: Original Net Design (New Layer Types, Design Space Exploration), Model Compression, Knowledge Distillation (a teacher network guiding a student network through a loss on the training data), and Efficient Implementation.]

Page 14

Page 15

Page 16

Where does the computational cost come from?

Page 17

Where does the computational cost come from in deep networks?

• Kernel numbers for convolution

• Channel numbers for image inputs or feature maps

• Size of feature maps

Page 18

Kernel Reduction

While 1x1 filters cannot see outside of a 1-pixel radius, they retain the ability to combine and reorganize information across channels.

SqueezeNet (2016): we found that we could replace half the 3x3 filters with 1x1's without diminishing accuracy.

SqueezeNext (2018): eliminate most of the 3x3 filters; we use a mix of 1x1, 3x1, and 1x3 filters (and still retain accuracy).

[Figure: a 3x3 × numFilt filter bank vs. a 1x1 × numFilt filter bank]

REDUCING THE SIZE (HEIGHT AND WIDTH) OF FILTERS

Page 19

Kernel Reduction: decomposing larger filters into smaller ones

Page 20

Channel Reduction: REDUCING THE NUMBER of CHANNELS

Page 21

Depthwise Separable Convolution

Used in recent papers such as MobileNets and ResNeXt.

[Figure: 3x3 × numFilt filters applied per group of channels]

ALSO CALLED: "GROUP CONVOLUTIONS" or "CARDINALITY"

Page 22

Channel Reduction: REDUCING THE NUMBER of CHANNELS

Page 23

Feature Map Reduction: REDUCING THE SIZE of FEATURE MAPS

Page 24

Four Advantages of Light-Weight (Smaller) Networks:

(1) Smaller CNNs require less communication across servers during distributed training.

(2) Smaller CNNs require less bandwidth to export a new model from the cloud to a mobile device.

(3) Smaller CNNs are more feasible to deploy on FPGAs and other hardware with limited memory.

(4) Smaller CNNs result in less inference time and storage space.

Page 25

DNN Challenges in Training

Mobile inference engines (TFLite, Apple AI, Huawei NPU, Nervana) don't support full training due to energy inefficiency. How about using existing PIM architectures? DNN/CNN training requires (1) a highly parallel architecture, (2) high-precision computation, and (3) large data movement.

Mohsen Imani, Saransh Gupta, Yeseo Kim, Tajana Rosing: University of California San Diego

Page 26

Neural Networks: Feed Forward and Back Propagation

Feed Forward: unit j receives the inputs zi through the weights wij, forms the pre-activation aj = Σi wij zi, and applies the activation function g to obtain zj = g(aj); the derivative g′(aj) is kept for the backward pass.

Error Backward: δj = g′(aj) Σk wjk δk

Weight Update: wij ← wij − η δj zi
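As a sanity check on the two formulas above, here is a minimal NumPy sketch of one backward step for a single hidden layer; the sigmoid activation and the array shapes are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Minimal sketch of the error-backward and weight-update rules above,
# for one hidden layer with a sigmoid activation (shapes are made up).
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
z_i = rng.normal(size=4)            # inputs to the hidden layer
W_ij = rng.normal(size=(4, 3))      # weights input -> hidden
W_jk = rng.normal(size=(3, 2))      # weights hidden -> output
delta_k = rng.normal(size=2)        # errors already computed at the output
eta = 0.1                           # learning rate

a_j = z_i @ W_ij                    # feed forward: a_j = sum_i w_ij * z_i
g_prime = sigmoid(a_j) * (1 - sigmoid(a_j))

delta_j = g_prime * (W_jk @ delta_k)      # error backward: δj = g'(aj) Σk wjk δk
W_ij -= eta * np.outer(z_i, delta_j)      # weight update: wij ← wij − η δj zi
```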

Page 27

Vector-Matrix Multiplication

[Figure: the input vector (a1 a2 a3 a4) is copied row-parallel against the transposed weight matrix and multiplied element-wise; the products must then be summed, but the memory does not support row-level addition.]

Page 28

Page 29

Neural Network: Convolution Layer

How do we move convolution windows in memory? Writing in memory is too slow!

[Figure: the input (Z1 ... Z9) is convolved with the weight kernel (w1 ... w4); instead of moving the window, the weights are expanded and replicated with a shifter so that the convolution reduces to a multiplication step followed by addition steps.]

Page 30

Page 31

Neural Network: Back Propagation

Feed Forward: the inputs zi pass through the weight matrix Wij to produce aj and zj; the activation derivative g′(aj) is stored for the backward pass.

Error Backward: the output errors δk are propagated back through the weight matrix Wjk to give δj = g′(aj) Σk wjk δk.

Weight Update: wij ← wij − η δj zi, i.e., the weight Wij is changed by the amount η δj zi.

Page 32

Memory Layout: Back Propagation

[Figure: memory layout for back propagation. For each layer, memory holds copies of the transposed weights (WTjk, WTij), the values stored during feed forward (g′(aj), ηZj, g′(ai), ηZi), the error vectors (δk, δj, δi), and a PIM-reserved region; a switch alternates between the Error Backward step (propagating δk through the weight matrix Wjk) and the Weight Update step (updating Wij by ηδjZi), and δj is then used to update the next layer's weights.]

Page 33

Page 34

Approaches for Applying Deep Networks on Mobile Devices

a. Designing Light-Weight Deep Networks directly for mobile applications

b. Performing Model Compression over a heavy but accurate network

c. Knowledge Distillation from a heavy, large teacher network to a light-weight, small student network

d. Searching for a Light-Weight Network according to pre-defined constraints

Page 35

35

Techniques for Creating Fast & Energy-Efficient DNNs

[Figure: Original Net Design (New Layer Types, Design Space Exploration), Model Compression, Knowledge Distillation (a teacher network guiding a student network through a loss on the training data), and Efficient Implementation.]

Page 36

SqueezeNet

Page 37

Three issues determine the size of a neural network:

• Kernel numbers for convolution

• Channel numbers for image input or feature maps

• Size of feature maps

Page 38

SqueezeNet

Page 39

39

SqueezeNet: Key Ideas

• Strategy 1. Replace 3x3 filters with 1x1 filters
◦ Parameters per filter: (3x3 filter) = 9 * (1x1 filter)

• Strategy 2. Decrease the number of input channels to 3x3 filters by using squeeze layers
◦ Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter)

• Strategy 3. Down-sample late in the network so that convolution layers have large activation maps
◦ Size of activation maps is determined by the size of the input data and the choice of layers in which to down-sample in the CNN architecture

Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

Page 40

40

SqueezeNet

• A Fire module consists of:
◦ a squeeze convolution layer with s1x1 1x1 filters
◦ an expand layer with a mixture of e1x1 1x1 filters and e3x3 3x3 filters

A Fire module is comprised of: a squeeze convolution layer (which has only 1x1 filters), feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters.

Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.
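A minimal PyTorch sketch of a Fire module along these lines is shown below; the use of ReLU after every convolution and the example sizes (chosen to mirror the paper's fire2 layer) are assumptions for illustration, not the official implementation.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Squeeze with s1x1 1x1 filters, then expand with e1x1 1x1 and e3x3 3x3 filters."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Concatenate the 1x1 and 3x3 expand outputs along the channel axis.
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example sizes in the spirit of fire2: s1x1=16, e1x1=64, e3x3=64 on a 96-channel input.
y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55])
```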

Page 41

41

SqueezeNet

• Strategy 1. Replace 3x3 filters with 1x1 filters
◦ Parameters per filter: (3x3 filter) = 9 * (1x1 filter)
◦ How much can we replace 3x3 with 1x1? (e1x1 vs e3x3?)

• Strategy 2. Decrease the number of input channels to 3x3 filters
◦ Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter)
◦ Squeeze Layer: setting s1x1 < (e1x1 + e3x3) limits the # of input channels to the 3x3 filters
◦ How much can we limit s1x1?

A Fire module is comprised of: a squeeze convolution layer (which has only 1x1 filters), feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters.

Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

Page 42

42

SqueezeNet

• Strategy 3. Down-sample late in the network so that convolution layers have large activation maps
◦ Size of activation maps is determined by the size of the input data and the choice of layers in which to down-sample in the CNN architecture

The relatively late placement of pooling concentrates large activation maps in the later phases of the network, which helps preserve higher accuracy.

Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

Page 43

A Fire module is comprised of: a squeeze convolution layer (which has only 1x1 filters), feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters.

Page 44

Page 45

SqueezeNet Design Strategies

• Strategy 1. Replace 3x3 filters with 1x1 filters
– Parameters per filter: (3x3 filter) = 9 * (1x1 filter)

• Strategy 2. Decrease the number of input channels to 3x3 filters
– Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter)

• Strategy 3. Down-sample late in the network so that convolution layers have large activation maps
– Size of activation maps is determined by the size of the input data and the choice of layers in which to down-sample in the CNN architecture

45

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."

Page 46

Microarchitecture – Fire Module

• A Fire module consists of:
– a squeeze convolution layer with s1x1 1x1 filters
– an expand layer with a mixture of e1x1 1x1 filters and e3x3 3x3 filters

46

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."

Page 47

Microarchitecture – Fire Module

47

Squeeze Layer

Strategy 1. Replace 3x3 filters with 1x1 filters
Parameters per filter: (3x3 filter) = 9 * (1x1 filter)
How much can we replace 3x3 with 1x1? (e1x1 vs e3x3?)

Strategy 2. Decrease the number of input channels to 3x3 filters
Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter)
Setting s1x1 < (e1x1 + e3x3) limits the # of input channels to the 3x3 filters
How much can we limit s1x1?

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."

Page 48

Parameters in the Fire Module

• The # of expand filters: ei = ei,1x1 + ei,3x3
• The % of 3x3 filters in the expand layer: ei,3x3 = pct3x3 * ei
• The squeeze ratio (SR): si,1x1 = SR * ei

48

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."
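The sketch below shows how these hyper-parameters translate into a rough parameter count for one Fire module (biases ignored); the values SR = 0.125 and pct3x3 = 0.5 and the example channel sizes are illustrative defaults, not results from the paper.

```python
# Rough parameter count for one Fire module (ignoring biases), based on the
# hyper-parameters above: squeeze ratio SR and the fraction of 3x3 expand filters.
def fire_params(in_channels, e_i, SR=0.125, pct3x3=0.5):
    s1x1 = int(SR * e_i)                  # squeeze filters
    e3x3 = int(pct3x3 * e_i)              # 3x3 expand filters
    e1x1 = e_i - e3x3                     # 1x1 expand filters
    squeeze = in_channels * s1x1 * 1 * 1
    expand = s1x1 * e1x1 * 1 * 1 + s1x1 * e3x3 * 3 * 3
    return squeeze + expand

# Example: a Fire module with 128 input channels and e_i = 128 expand filters.
print(fire_params(128, 128))   # ~12k parameters vs ~147k for a plain 3x3 conv (128*128*9)
```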

Page 49

Macroarchitecture

49

• Strategy 3. Down-sample late in the network so that convolution layers have large activation maps
◦ Size of activation maps is determined by the size of the input data and the choice of layers in which to down-sample in the CNN architecture

The relatively late placement of pooling concentrates large activation maps in the later phases of the network, which helps preserve higher accuracy.

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."

Page 50

Macroarchitecture

50

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."

Page 51

Summary of SqueezeNet

• Compared with AlexNet, SqueezeNet achieves comparable accuracy with a parameter-size ratio of 1:50.

• If model compression is also performed, the parameter-size ratio compared with AlexNet becomes 1:510 (under 0.5 MB).

• Three strategies: (a) using 1x1 filters to replace 3x3 filters; (b) using a squeeze layer to reduce the number of channels; (c) performing down-sampling late in the network to obtain larger activation maps. All three are implemented in the Fire module.

Page 52

MobileNet V1

Page 53

Page 54

Three issues determine the size of a neural network:

• Kernel numbers for convolution

• Channel numbers for image input or feature maps

• Size of feature maps

Page 55

Page 56

Page 57

SqueezeNet

Page 58

SqueezeNet

MobileNet V1

Page 59

59

MobileNet v1

Key Idea: Depthwise Separable Convolution!

• The MobileNet model is based on depthwise separable convolutions, a form of factorized convolution that factors a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution.

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
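A minimal PyTorch sketch of this factorization is given below; following the paper's building block, BatchNorm and ReLU are assumed after both the depthwise and the pointwise convolution, and the channel sizes are illustrative.

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """Depthwise 3x3 conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        # Depthwise: groups=in_ch applies a separate 3x3 filter to each input channel.
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # Pointwise: 1x1 conv that mixes information across channels.
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable(32, 64)
print(block(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 112, 112])
```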

Page 60

60

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Page 61

61

MobileNet v1

Standard Convolution Operation

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Page 62

62

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

VGG, Inception-v3

• VGG – uses only 3x3 convolutions
◦ A stack of 3x3 conv layers has the same effective receptive field as a 5x5 or 7x7 conv layer
◦ Deeper means more non-linearities
◦ Fewer parameters: 2 × (3 × 3 × C) vs (5 × 5 × C)

• Inception-v3
◦ Factorization of filters

Page 63

63

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Why should we always consider all channels?

Page 64

64

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Standard Convolution

Page 65

65

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Depthwise Separable Convolution

• Depthwise Convolution + Pointwise Convolution (1x1 convolution)

Page 66

66

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Standard Convolution vs Depthwise Separable Convolution

Page 67

67

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Standard Convolution vs Depthwise Separable Convolution

• Standard convolutions have a computational cost of
◦ DK × DK × M × N × DF × DF

• Depthwise separable convolutions cost
◦ DK × DK × M × DF × DF + M × N × DF × DF

• Reduction in computation:
◦ 1/N + 1/DK²
◦ With 3x3 depthwise separable convolutions, we get a reduction of between 8 and 9 times

DK: width/height of the filters
DF: width/height of the feature maps
M: number of input channels
N: number of output channels (number of filters)
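A quick numerical check of the formulas above, for a hypothetical layer (3x3 kernel, 64 input channels, 128 output channels, 56x56 feature map):

```python
# Compare the multiply-add counts of a standard conv and a depthwise separable conv
# for a hypothetical layer: 3x3 kernel, M=64 -> N=128 channels, 56x56 feature map.
DK, DF, M, N = 3, 56, 64, 128

standard = DK * DK * M * N * DF * DF
separable = DK * DK * M * DF * DF + M * N * DF * DF

print(standard / separable)      # ~8.4x fewer operations
print(1 / (1 / N + 1 / DK**2))   # same ratio from the closed form 1/N + 1/DK^2
```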

Page 68

68

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Depthwise Separable Convolution

Page 69

69

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Model Structure

Page 70

70

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Width Multiplier & Resolution Multiplier

• Width Multiplier – Thinner Models
◦ For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN, where α has typical settings of 1, 0.75, 0.5 and 0.25

• Resolution Multiplier – Reduced Representation
◦ The second hyper-parameter for reducing the computational cost of a network is the resolution multiplier ρ
◦ 0 < ρ ≤ 1, typically set implicitly so that the input resolution of the network is 224, 192, 160 or 128 (ρ = 1, 0.857, 0.714, 0.571)

• Computational cost:
◦ DK × DK × αM × ρDF × ρDF + αM × αN × ρDF × ρDF
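Continuing the same hypothetical layer as before, the sketch below shows how α and ρ scale the cost of one depthwise separable layer:

```python
# Effect of the width multiplier (alpha) and resolution multiplier (rho) on the
# cost of one depthwise separable layer (same hypothetical sizes as before).
DK, DF, M, N = 3, 56, 64, 128

def separable_cost(alpha=1.0, rho=1.0):
    return (DK * DK * (alpha * M) * (rho * DF) ** 2
            + (alpha * M) * (alpha * N) * (rho * DF) ** 2)

base = separable_cost()
print(separable_cost(alpha=0.75) / base)             # ~0.57x the baseline cost
print(separable_cost(alpha=0.75, rho=0.714) / base)  # ~0.29x the baseline cost
```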

Page 71

71

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Width Multiplier & Resolution Multiplier

Page 72

72

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Experiments – Model Choices

Page 73

73

MobileNet v1

Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

Experiments – Results

Page 74

MobileNet V2

Page 75

Page 76

Page 77

77

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Key ideas

• MobileNet V2 still uses depthwise separable convolutions as efficient building blocks, but introduces two new features to the architecture:

• Strategy 1. Linear bottlenecks between the layers

• Strategy 2. Inverted residual blocks: shortcut connections between the bottlenecks

Page 78

78

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Linear Bottleneck

• Informally, for an input set of real images, we say that the set of layer activations forms a “manifold of interest” .

• It has long been assumed that manifolds of interest in neural networks can be embedded in low-dimensional subspaces.

Page 79

79

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Residual Blocks

• Residual blocks connect the beginning and end of a convolutional block with a shortcut connection. By adding these two states the network has the opportunity of accessing earlier activations that weren’t modified in the convolutional block.

• Wide → narrow (bottleneck) → wide approach

Page 80

80

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Inverted Residuals

• Inspired by the intuition that the bottlenecks actually contain all the necessary information, while an expansion layer acts merely as an implementation detail that accompanies a non-linear transformation of the tensor, the authors use shortcuts directly between the bottlenecks.

• Narrow → wide → narrow approach
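A minimal PyTorch sketch of an inverted residual block with a linear bottleneck along these lines; the expansion factor of 6 and the use of ReLU6 follow the paper's description, while the exact module layout is an illustrative assumption.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Narrow -> wide -> narrow block: 1x1 expand, 3x3 depthwise, 1x1 linear projection."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # Linear bottleneck: no non-linearity after the projection back to out_ch.
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        # Shortcut connects the narrow bottleneck tensors directly.
        return x + y if self.use_shortcut else y

y = InvertedResidual(32, 32)(torch.randn(1, 32, 56, 56))
print(y.shape)  # torch.Size([1, 32, 56, 56])
```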

Page 81

81

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Information Flow Interpretation

• The proposed convolutional block has a unique property that allows us to separate the network's expressiveness (encoded by the expansion layers) from its capacity (encoded by the bottleneck inputs).

Page 82

82

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

The Architecture of MobileNetV2

• The architecture of MobileNetV2 contains an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers described in Table 2 of the paper.

Page 83

83

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Memory Efficient Inference

• The amount of memory needed is simply the maximum total size of combined inputs and outputs across all operations.

• If we treat a bottleneck residual block as a single operation (and treat the inner convolution as a disposable tensor), the total amount of memory is dominated by the size of the bottleneck tensors, rather than by the size of the tensors internal to the bottleneck (which are much larger).

Page 84

84

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Comparison of Convolutional Blocks for Different Architectures

Page 85

85

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

The Max Number of Channels/Memory (in Kb)

Page 86

86

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

ImageNet Classification Results

Page 87

87

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

MobileNet v1 vs MobileNet v2

Page 88

88

MobileNet v2

Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2018: 4510-4520.

Object Detection & Semantic Segmentation Results

Object Detection Semantic Segmentation

Page 89

ShuffleNet

Page 90

Three issues determine the size of a neural network:

• Kernel numbers for convolution

• Channel numbers for image input or feature maps

• Size of feature maps

Page 91

SqueezeNet

MobileNet V1

Page 92

SqueezeNet

MobileNet V1

ShuffleNet

Page 93

93

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Key ideas

• Strategy 1. Use depthwise separable convolution

• Strategy 2. Grouped convolution on 1x1 convolution layers – pointwise group convolution

• Strategy 3. Channel shuffle operation after pointwise group convolution

Page 94

94

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Grouped Convolution

Page 95

95

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

1x1 Grouped Convolution with Channel Shuffling

• If multiple group convolutions are stacked together, there is one side effect (a): outputs from a certain channel are derived from only a small fraction of the input channels.

• If we allow each group convolution to obtain input data from different groups, the input and output channels become fully related.

Page 96

96

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Channel Shuffle Operation

• Suppose a convolutional layer has g groups and its output has g×n channels; we first reshape the output channel dimension into (g, n), then transpose it and flatten it back as the input of the next layer.

• The channel shuffle operation is also differentiable.
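A minimal PyTorch sketch of this reshape-transpose-flatten operation (assuming the usual N×C×H×W tensor layout):

```python
import torch

def channel_shuffle(x, groups):
    """Reshape channels into (groups, n), transpose, and flatten back."""
    batch, channels, height, width = x.shape
    n = channels // groups
    x = x.view(batch, groups, n, height, width)    # split the channel axis into (g, n)
    x = x.transpose(1, 2).contiguous()              # swap the group and per-group axes
    return x.view(batch, channels, height, width)   # flatten back to g*n channels

x = torch.arange(8.).view(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())
# channels 0..7 become [0, 4, 1, 5, 2, 6, 3, 7]: channels from different groups interleave
```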

Page 97

97

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

ShuffleNet Units

Page 98

98

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

ShuffleNet Units

• Starting from (a), replace the first 1x1 layer with a pointwise group convolution followed by a channel shuffle operation.

• ReLU is not applied to the 3x3 depthwise convolution.

• For the case where the ShuffleNet unit is applied with stride, simply make two modifications:
◦ Add 3x3 average pooling on the shortcut path
◦ Replace the element-wise addition with channel concatenation, which enlarges the channel dimension with little extra computation
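A minimal PyTorch sketch of the stride-1 unit described above; the group count, bottleneck ratio and channel sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ShuffleUnit(nn.Module):
    """Stride-1 ShuffleNet unit: 1x1 group conv + shuffle, 3x3 depthwise, 1x1 group conv, add."""
    def __init__(self, channels, groups=3, bottleneck_ratio=4):
        super().__init__()
        mid = channels // bottleneck_ratio
        self.groups = groups
        self.gconv1 = nn.Sequential(
            nn.Conv2d(channels, mid, 1, groups=groups, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.dwconv = nn.Sequential(  # note: no ReLU after the depthwise conv
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid))
        self.gconv2 = nn.Sequential(
            nn.Conv2d(mid, channels, 1, groups=groups, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        y = self.gconv1(x)
        # Channel shuffle: reshape channels to (groups, n), transpose, flatten back.
        b, c, h, w = y.shape
        y = y.view(b, self.groups, c // self.groups, h, w).transpose(1, 2).reshape(b, c, h, w)
        y = self.gconv2(self.dwconv(y))
        return torch.relu(x + y)  # element-wise addition on the shortcut (stride-1 case)

y = ShuffleUnit(240)(torch.randn(1, 240, 28, 28))
print(y.shape)  # torch.Size([1, 240, 28, 28])
```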

Page 99

99

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Complexity

• For example, the costs are compared for an input of size c×h×w and m bottleneck channels.

Page 100

100

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

ShuffleNet Architecture

Page 101

101

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Experimental Results

• It is clear that channel shuffle consistently boosts classification scores across different settings.

Page 102

102

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Experimental Results

Page 103

103

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Experimental Results

• The results show that, with similar accuracy, ShuffleNet is much more efficient than the other models.

Page 104

104

ShuffleNet

Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition. 2018: 6848-6856.

Experimental Results

• Due to memory access and other overheads, every 4x theoretical complexity reduction usually results in a ~2.6x actual speedup. Compared with AlexNet, ShuffleNet achieves a ~13x actual speedup (the theoretical speedup is 18x).

Page 105

Besides designing light-weight networks, other approaches to enable mobile applications include:

a. Network Compression, such as singular value decomposition (SVD), network pruning, quantization, binarization, ...

b. Knowledge Distillation from a heavy, large teacher network to a light-weight, small student network

c. Automatic Network Searching

Page 106

106

Techniques for Creating Fast & Energy-Efficient DNNs

[Figure: Original Net Design (New Layer Types, Design Space Exploration), Model Compression, Knowledge Distillation (a teacher network guiding a student network through a loss on the training data), Efficient Implementation, and Automatic Network Search.]

Page 107

Network Compression and Speedup

107

Summary

Model          Key techniques
SqueezeNet     1x1 filters, reduced input channels, down-sampling late in the network
MobileNet v1   Depthwise separable convolution (depthwise conv + pointwise conv), width multiplier, resolution multiplier
MobileNet v2   Linear bottlenecks, inverted residual blocks
ShuffleNet     Pointwise group convolution with channel shuffle

Page 108

Network Compression and Speedup

108

Summary

Model         Caffe  TensorFlow  Keras  PyTorch  Migration network                   Recommendation level
SqueezeNet    1★     4★          2★     2★       AlexNet, faster-rcnn                3★
MobileNet v1  4★     5★          3★     3★       MobileNet-SSD, MXNet, faster-rcnn   3★
MobileNet v2  3★     5★          2★     4★       MobileNetV2-SSDLite                 5★
ShuffleNet    5★     4★          2★     5★       ShuffleNet-SSD                      4★

Tip: ratings reflect the number of people using each combination.

Page 109

References

• Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.

• Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

• Sandler M, Howard A, Zhu M, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520.

• Zhang X, Zhou X, Lin M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.

• Han S, Mao H, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv preprint arXiv:1510.00149, 2015.

• Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1251-1258.

• http://slazebni.cs.illinois.edu/spring17/lec06_compression.pdf

• https://www.slideshare.net/JinwonLee9/mobilenet-pr044

• https://www.slideshare.net/JinwonLee9/pr108-mobilenetv2-inverted-residuals-and-linear-bottlenecks

• https://www.slideshare.net/JinwonLee9/shufflenet-pr054

109