



AMC: AutoML for Model Compression and Acceleration
Ji Lin¹* Yihui He²* Zhijian Liu¹ Hanrui Wang¹ Li-Jia Li³ Song Han¹
¹ Massachusetts Institute of Technology  ² Xi’an Jiaotong University  ³ Google  (* equal contributions)

[Poster sections: Abstract; AutoML Model Compression Engine; AMC Details. Result panels (figures/tables not captured in this transcript): Speedup MobileNet on ImageNet; Compressing ResNet-50 on ImageNet.]

Model compression is an important technique for efficient ML inference, but a human expert needs to find a good set of hyper-parameters (e.g., the compression ratio of each layer), which requires domain expertise and many trials and errors, and is usually time-consuming and sub-optimal.

Goal: Automate the compression pipeline and free up human labor. “Model compression by AI” is automated, faster, and achieves higher performance.

Reward Functions
• For resource-constrained compression (RCG), simply use R_err = -Error.
• For accuracy-guaranteed compression (AGC), consider both accuracy and resources (e.g., FLOPs): R_FLOPs = -Error · log(FLOPs). (Both rewards are sketched in code below.)
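A minimal sketch of the two reward functions above, assuming the error is a fraction in [0, 1] and a natural logarithm (the log base is not specified here); the function names and example numbers are illustrative, not from the AMC codebase.

```python
import math

def reward_resource_constrained(error: float) -> float:
    """R_err = -Error: the resource budget is enforced by the search itself."""
    return -error

def reward_accuracy_guaranteed(error: float, flops: float) -> float:
    """R_FLOPs = -Error * log(FLOPs): rewards both higher accuracy and fewer FLOPs."""
    return -error * math.log(flops)

# Example: 30% top-1 error at 280 MFLOPs (illustrative numbers).
print(reward_resource_constrained(0.30))        # -0.30
print(reward_accuracy_guaranteed(0.30, 280e6))  # about -5.8 with natural log
```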

References
He, Y., Lin, J., Liu, Z., Wang, H., Li, L.-J. and Han, S., 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800.

Novelty:
1. Learning-based compression > rule-based compression
2. Resource-constrained search
3. Continuous action space for fine-grained surgery
4. Fast exploration with few GPUs (1 GPU, 4 hours on ImageNet)


[Figure 1 diagram. Left: the AMC Engine takes the original NN and produces a compressed NN; model compression by human is labor-consuming and sub-optimal, while model compression by AI is automated, achieves a higher compression rate, and is faster. Right: a DDPG agent (actor-critic) receives a layer embedding s_t = [N, C, H, W, i, …], outputs an action a_t (a sparsity ratio, e.g., 30%, 50%, ?%), and the channel-pruning environment applies it to Layer t before moving on from Layer t-1 to Layer t+1; reward = -Error · log(FLOPs).]

Fig. 1. Overview of the AutoML for Model Compression (AMC) engine. Left: AMC replaces the human and makes model compression fully automated while performing better than humans. Right: Form AMC as a reinforcement learning problem. We process a pretrained network (e.g., MobileNet-V1) in a layer-by-layer manner. Our reinforcement learning agent (DDPG) receives the embedding s_t from a layer t and outputs a sparsity ratio a_t. After the layer is compressed with a_t, it moves to the next layer L_{t+1}. The accuracy of the pruned model with all layers compressed is evaluated. Finally, as a function of accuracy and FLOPs, the reward R is returned to the reinforcement learning agent.

…solved by brute-force methods. Reinforcement learning methods have been shown to have better sample efficiency than random exploration and to achieve better solutions. Therefore, we propose AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space and greatly improve the model compression quality. Figure 1 illustrates our AMC engine. When compressing a network, rather than relying on human experience or hand-crafted heuristics, our AMC engine automates this process and frees the compression pipeline from human labor.

We observe that the accuracy of the compressed model is very sensitive to the sparsity of each layer, requiring a fine-grained action space. Therefore, instead of searching over a discrete space, we come up with a continuous compression-ratio control strategy with a DDPG [4] agent that learns through trial and error: penalizing accuracy loss while encouraging model shrinking and speedup. The actor-critic structure also helps to reduce variance, facilitating more stable training. Specifically, our DDPG agent processes the network in a layer-wise manner. For each layer L_t, the agent receives a layer embedding s_t which encodes useful characteristics of this layer, and then it outputs a precise compression ratio a_t. After layer L_t is compressed with a_t, the agent moves to the next layer L_{t+1}. The validation accuracy of the pruned model with all layers compressed is evaluated without fine-tuning, which is an efficient proxy for the fine-tuned accuracy. This simple approximation improves the search time, since the model does not have to be retrained during the search, while still providing a high-quality search result. After the policy search is done, the best-explored model is fine-tuned to achieve the best performance.
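To make the layer-by-layer procedure above concrete, here is a minimal Python sketch of one search episode. Every callable passed in (embed, agent_act, prune_layer, evaluate, count_flops) is a hypothetical placeholder standing in for the DDPG agent, the channel-pruning environment, and the no-fine-tuning validation pass; this sketches only the control flow, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

def run_episode(layers: Sequence,                     # layers of the pretrained net
                embed: Callable,                      # (layer, t) -> state vector s_t
                agent_act: Callable[[list], float],   # s_t -> sparsity ratio a_t
                prune_layer: Callable,                # apply a_t to the layer
                evaluate: Callable[[], float],        # val accuracy, no fine-tuning
                count_flops: Callable[[], float]) -> float:
    trajectory: List[Tuple[list, float]] = []
    for t, layer in enumerate(layers):
        s_t = embed(layer, t)            # e.g. [N, C, H, W, layer index, ...]
        a_t = agent_act(s_t)             # continuous action in (0, 1)
        prune_layer(layer, a_t)          # channel-pruning environment
        trajectory.append((s_t, a_t))
    error = 1.0 - evaluate()             # cheap proxy: no retraining inside the loop
    reward = -error * math.log(count_flops())
    # `trajectory` and `reward` would then be used to update the DDPG agent.
    return reward
```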

DDPG Agent
• DDPG agent for a continuous action space (0-1); a minimal actor sketch follows below.
• Takes the state embedding of each layer as input and outputs a sparsity ratio.
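As a sketch of what the continuous action space can look like in code, below is a small actor network that maps a layer-state embedding to a sparsity ratio squashed into (0, 1) by a sigmoid. The embedding length and hidden sizes are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SparsityActor(nn.Module):
    """Maps a layer embedding s_t to a sparsity ratio a_t in (0, 1)."""
    def __init__(self, state_dim: int = 11, hidden: int = 300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # squash output into (0, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Usage: one 11-dimensional layer embedding -> one sparsity ratio.
actor = SparsityActor()
a_t = actor(torch.randn(1, 11))
```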

Compression Methods
• Fine-grained pruning for model-size compression.
• Coarse-grained/channel pruning for faster inference (a simple channel-pruning sketch follows below).
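A minimal sketch of coarse-grained (channel) pruning at a given sparsity ratio, keeping the output channels with the largest L1 norm. The L1 criterion is a simple stand-in for illustration; AMC's actual channel-pruning environment may use a different channel-selection and reconstruction procedure.

```python
import numpy as np

def prune_channels(weight: np.ndarray, sparsity: float) -> np.ndarray:
    """weight: (C_out, C_in, kH, kW). Returns weights of the kept output channels."""
    c_out = weight.shape[0]
    n_keep = max(1, int(round(c_out * (1.0 - sparsity))))
    scores = np.abs(weight).reshape(c_out, -1).sum(axis=1)  # L1 norm per output channel
    keep = np.sort(np.argsort(scores)[-n_keep:])            # indices of kept channels
    return weight[keep]

w = np.random.randn(64, 32, 3, 3)
print(prune_channels(w, sparsity=0.5).shape)  # (32, 32, 3, 3)
```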

Search Protocols
• RCG: reach a desired compression ratio while achieving the highest possible performance (a budget-clipping sketch follows below).
• AGC: fully preserve the original accuracy while keeping the model as small as possible.
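For the resource-constrained protocol, one way to guarantee the final model meets the budget is to clip the agent's proposed sparsity for each layer so that the budget remains reachable. The paper constrains the action space in a similar spirit, but the exact bookkeeping below (counting the remaining layers at full size) is an illustrative, conservative assumption.

```python
def clip_sparsity(a_t: float, flops_this_layer: float,
                  flops_remaining_layers: float,
                  flops_already_used: float,
                  flops_budget: float) -> float:
    """Return a sparsity >= a_t that keeps the FLOPs budget reachable."""
    # FLOPs this layer may keep if all remaining layers stayed uncompressed:
    allowed = flops_budget - flops_already_used - flops_remaining_layers
    keep_frac = min(1.0, max(0.0, allowed / flops_this_layer))
    return max(a_t, 1.0 - keep_frac)

# Example: layer has 100 MFLOPs, 300 MFLOPs remain after it, 50 MFLOPs already
# spent, budget is 400 MFLOPs -> at least 50% of this layer must be pruned.
print(clip_sparsity(0.2, 100e6, 300e6, 50e6, 400e6))  # 0.5
```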

Upcoming work
• Optimize for a given FPGA's constraints and advantages (Xilinx).