new nettailor: tuning the architecture, not just the weightsmorgado/nettailor/poster.pdf · 2019....
Post on 20-Oct-2020
3 Views
Preview:
TRANSCRIPT
-
NetTailor: Tuning the Architecture, Not Just the WeightsPedro Morgado, Nuno Vasconcelos
University of California, San Diego
NetTailorAbstract Evaluation
Real-world applications of object recognition often require the solution of multiple tasks
in a single platform. We propose a transfer learning procedure, denoted NetTailor, in
which layers of a pre-trained CNN are used as universal blocks that can be combined with
small task-specific layers to generate new networks. In this way, NetTailor can adapt the
network architecture, not just its weights, to the target task.
Experiments show that networks adapted to simpler tasks become significantly smaller
than those adapted to complex tasks. Due to the modular nature of the procedure, this
reduction is achieved without compromise of either parameter sharing across tasks, or
classification accuracy.
Complexity Constraints
Conditions for removing block 𝑩𝒊𝒋 Probability
Self exclusion: Small coefficient 𝛼 associated with the block. 𝑃 𝑅𝑠𝑒𝑙𝑓𝑖,𝑗
= 1 − 𝛼𝑖𝑗
Input exclusion: When all incoming paths are removed 𝑃 𝑅𝑖𝑛𝑝𝑢𝑡𝑖,𝑗
=ෑ
𝑘
1 − 𝛼𝐼𝑘
Output exclusion: When all departing paths are removed 𝑃 𝑅𝑜𝑢𝑡𝑝𝑢𝑡𝑖,𝑗
=ෑ
𝑘
1 − 𝛼𝑂𝑘
𝐸[𝒞] =
𝑖,𝑗
𝒞𝑖𝑗1 − 𝑃 𝑅𝑠𝑒𝑙𝑓
𝑖,𝑗∪ 𝑅𝑖𝑛𝑝𝑢𝑡
𝑖,𝑗∪ 𝑅𝑜𝑢𝑡𝑝𝑢𝑡
𝑖,𝑗Expected Complexity
High performance on all tasks
Models share parameters and computation to be supported in the same device
Models trained independently of tasks/datasets for distributed development.
Adaptive model complexity (Easier tasks→ simple model, Complex → complex)
Goals
HP
SP
DD
AC
Prior Work
Feature Extraction HP SP DD AC Universal classifier HP SP DD AC
Residual Adapters
⋯⋯Task 1 Task 2 Task 3 Task 4
Pre
trai
ne
d M
od
el
HP SP DD ACFinetuning
Pre
trai
ne
d M
od
el
Task 1 Task 2 Task 3 Task 4
⋯
HP SP DD AC
Idea
U
U
U
U
U
P
P
P
P
P
C
P
P
P
P
U
U
U
U
U
P
P
P
P
P
C
P
P
P
P
U
U
U
P
C
PP
P
U
U
U
U
U
C
1 2 3
Start with a pre-trained network that remains unchanged.
Augment the original network ( ) with a set of small task-specific blocks ( )
Find the smallest combination of layers and connections that solves the task.
1
2
3
U P
Learned architectures
SVHN
Flowers
VOC
Accuracy
# Params
FLOPs
# Univ Layers
Prunned Unprunned
Accuracy
# Params
FLOPs
# Univ Layers
Accuracy
# Params
FLOPs
# Univ Layers
#Univ. LayersFLOPs
# ParamsAccuracy
#Univ. LayersFLOPs
# ParamsAccuracy
#Univ. LayersFLOPs
# ParamsAccuracy
Complex tasks require larger models, simpler task require smaller models
Path SelectionU
U
U
U
U
P
P
P
P
P
C
P
P
P
P
𝑈𝑖
𝑃𝑖−1𝑖
𝒙𝑖−1
𝑃𝑖−2𝑖
𝒙𝑖−2
𝑃𝑖−3𝑖
𝒙𝑖−3𝛼𝑖−3𝑖
𝛼𝑖−2𝑖
𝛼𝑖−1𝑖
𝛼𝑖𝑖 +
𝒙𝑖
An attention coefficient 𝛼 ∈ 0,1 for each block
𝒙𝑖 = 𝛼𝑖𝑖𝑈𝑖 𝒙𝑖−1 +
𝑘
𝛼𝑘𝑖 𝑃𝑘
𝑖 𝒙𝑘
Blocks removed from inference path when 𝛼 is small.
Coefficients 𝛼 are parameterized by a softmaxdistribution to force network paths to compete
Teacher NetworkTeacher Student
𝒅
𝒅
𝒅
To preserve performance, a teacher network finetuned on the target task is used to supervise the student at each stage.
𝐿𝑡 𝒙 =
𝑙
𝒙𝑙𝑡 − 𝒙𝑙
2
Learning1st Stage: Tuning the architecture.
Student model optimized by SGD under the loss ℒ = 𝐿𝑐 𝒙, 𝑦 + 𝜆1𝐿𝑡 𝒙 + 𝜆2𝐸[𝒞].
2nd Stage: Pruning and retraining.Blocks with small 𝛼 are removed. Remaining parameters retrained with 𝜆2 = 0.
❖ Study different block pruning strategies
❖ Extend NetTailor path selection framework to tasks beyond recognition (e.g. detection, semantic segmentation, multi-task learning)
❖ Study the benefits of NetTailor to general neural architecture search
NetTailor on different datasets
0.00
5.00
ImageNet Flowers CIFAR100 Dped UCF101 SVHN DTD Omniglot Aircraft GTSRPar
amet
ers
[M]
Prunned Unprunned
0.00
0.20
0.40
0.60
ImageNet Flowers CIFAR100 Dped UCF101 DTD Aircraft SVHN Omniglot GTSR
FLO
PS
[B]
Compression rates up to 80% of parameters and 66% of FLOPs
NetTailor on different base networks
# Params (M)
Acc
ura
cy (
%)
Original
Pruned
SVHN
Flowers
VOC
ResNet50 pruned to the size of ResNet18 while preserving its
performance
Support
https://pedro-morgado.github.io/nettailor
Website
https://github.com/pedro-morgado/nettailor
Repository
Task 1 Task 2 Task 3 Task 4
⋯
Feature extractor jointly trained for all tasks
Task 1 Task 2 Task 3 Task 4
⋯
Initialize
Pre
trai
ne
d M
od
el Feature extractor
remains frozen after pre-training.
Future Work
top related