Adaptive Neural Trees · 2019-06-07
TRANSCRIPT
Adaptive Neural Trees
Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori
Poster #82
Two Paradigms of Machine Learning
Deep Neural Networks 『hierarchical representation of data』 · Decision Trees 『hierarchical clustering of data』
[Figure: ImageNet classifiers with CNNs (Zeiler and Fergus, ECCV 2014): low-level features (oriented edges & colours) → mid-level features (textures & patterns) → high-level features (object parts) → trainable classifier]
[Figure: super-resolution of dMR brain images with a DT (Alexander et al., NeuroImage 2017); tissue labels: water, grey matter, white matter]
Two Paradigms of Machine Learning

Deep Neural Networks 『hierarchical representation of data』
+ learn features of data
+ scalable learning with stochastic optimisation
- architectures are hand-designed
- heavy-weight inference, engaging every parameter of the model for each input

Decision Trees 『hierarchical clustering of data』
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input
- operate on hand-designed features
- limited expressivity with simple splitting functions
Joining the Paradigms: Adaptive Neural Trees
+ learn features of data
+ scalable learning with stochastic optimisation
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input
ANTs unify the two paradigms, 『hierarchical representation of data』 and 『hierarchical clustering of data』, and generalise previous work.
What are ANTs?
• ANTs consist of two key designs:
(1) DTs that use NNs in every path transformation and routing decision.
(2) DT-like architecture growth using SGD: a target node is either (a) split OR (b) deepened.
[Figure: ANT computation graph on input x; growth operations at a target node: (a) split OR (b) deepen]
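A minimal sketch of design (1), under assumed toy details not in the talk: every module is a single linear layer (transformers with ReLU, sigmoid routers, softmax solvers), weights are random rather than trained, and soft multi-path inference mixes the two children at each routing node. The `Node` class and its sizes are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class Node:
    """One toy ANT module: a transformer (small NN computing features),
    plus either a router (internal node) or a solver (leaf classifier)."""

    def __init__(self, d_in, d_out, n_classes=None):
        self.W_t = rng.normal(scale=0.1, size=(d_in, d_out))  # transformer
        self.W_r = rng.normal(scale=0.1, size=d_out)          # router
        self.W_s = (rng.normal(scale=0.1, size=(d_out, n_classes))
                    if n_classes is not None else None)       # solver
        self.left = None
        self.right = None

    def predict(self, x):
        h = relu(x @ self.W_t)                   # transformer: learned features
        if self.left is None:                    # leaf: solver emits class probs
            return softmax(h @ self.W_s)
        p_left = sigmoid(h @ self.W_r)[:, None]  # router: soft binary decision
        # multi-path (soft) inference: router-weighted mixture of children
        return (p_left * self.left.predict(h)
                + (1.0 - p_left) * self.right.predict(h))

# depth-1 ANT: one routing node, two leaf solvers (toy sizes, random weights)
root = Node(4, 8)
root.left = Node(8, 8, n_classes=3)
root.right = Node(8, 8, n_classes=3)

x = rng.normal(size=(5, 4))   # batch of 5 four-dimensional inputs
probs = root.predict(x)       # (5, 3): a distribution over 3 classes per input
```

Because each leaf outputs a softmax and the routers form convex weights over leaves, every row of `probs` is a valid class distribution, which is what makes the whole tree trainable end-to-end with SGD.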
Conditional Computation
[Figure: multi-path vs single-path inference for ANTs 1–3: errors on MNIST (%), CIFAR10 (%), SARCOS (mse) and number of parameters; model size drops under single-path inference]
• Single-path inference enables efficient inference without compromising accuracy.
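The efficiency claim can be made concrete with a toy sketch of single-path inference: harden each router to its more probable branch, so only the parameters along one root-to-leaf path are ever evaluated. All names and sizes here are illustrative assumptions, not the authors' code; the parameter counter just demonstrates why model size drops.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_node(d_in, d_out, n_classes=None):
    """Toy ANT node: transformer weights plus router (internal) or solver (leaf)."""
    return {
        "W_t": rng.normal(scale=0.1, size=(d_in, d_out)),      # transformer
        "W_r": rng.normal(scale=0.1, size=d_out),              # router
        "W_s": (rng.normal(scale=0.1, size=(d_out, n_classes))
                if n_classes is not None else None),           # solver
        "left": None, "right": None,
    }

def single_path_predict(node, x):
    """Greedy (hard) routing: at each internal node follow the more probable
    branch, so only parameters on one root-to-leaf path are evaluated."""
    params_used = 0
    while node["left"] is not None:
        x = relu(x @ node["W_t"])
        params_used += node["W_t"].size + node["W_r"].size
        go_left = sigmoid(x @ node["W_r"]) > 0.5
        node = node["left"] if go_left else node["right"]
    x = relu(x @ node["W_t"])
    params_used += node["W_t"].size + node["W_s"].size
    return x @ node["W_s"], params_used

# depth-1 toy tree: root router plus two leaf solvers
root = make_node(4, 8)
root["left"] = make_node(8, 8, n_classes=3)
root["right"] = make_node(8, 8, n_classes=3)

total_params = (root["W_t"].size + root["W_r"].size
                + 2 * (root["left"]["W_t"].size + root["left"]["W_s"].size))

logits, used = single_path_predict(root, rng.normal(size=4))
# `used` covers the root plus a single leaf, i.e. a fraction of `total_params`
```

In a deeper tree the saving compounds: a balanced binary tree with L leaves visits only one of L paths, so the fraction of parameters touched shrinks as the tree grows.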
Adaptive Model Complexity
Models are trained on subsets of 50, 250, 500, 2.5k, 5k, 25k, and 45k examples.
• ANTs can tune the architecture to the availability of training data.
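The adaptive behaviour follows from the growth procedure: candidates (split or deepen a node) are kept only while they improve validation performance. The sketch below is a deliberately crude stand-in; in the real method each candidate is trained with SGD and scored on a held-out set, whereas here `val_loss` is a made-up analytic function whose overfitting term merely illustrates why growth halts earlier on small datasets.

```python
def val_loss(tree_size, n_train):
    # hypothetical stand-in for validation loss after refining a candidate:
    # larger trees fit better (first term) but overfit small datasets (second)
    return 1.0 / (1 + tree_size) + tree_size / n_train

def grow(n_train, max_steps=100):
    """Greedy growth: keep enlarging the tree (split or deepen a leaf) while
    the candidate improves validation loss; stop at the first non-improvement."""
    size = 1
    loss = val_loss(size, n_train)
    for _ in range(max_steps):
        trial = val_loss(size + 1, n_train)   # candidate: split OR deepen
        if trial >= loss:                     # no improvement: stop growing
            break
        size, loss = size + 1, trial
    return size

small = grow(n_train=50)       # little data: growth halts early
large = grow(n_train=45_000)   # lots of data: a much larger tree
```

The stopping rule, not any explicit size penalty, is what matches model complexity to the dataset: the same procedure run on 50 vs 45k examples returns trees of very different sizes.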
Please come & see me at poster #82 for details!
Unsupervised Hierarchical Clustering