BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning
NeurIPS 2019
Andreas Kirsch*, Joost van Amersfoort*, Yarin Gal
OATML, University of Oxford
Email: {andreas.kirsch, joost.van.amersfoort, yarin}@cs.ox.ac.uk
Introduction to Active Learning
● A key problem in deep learning is data efficiency.
● It is expensive to obtain a large labelled dataset a priori.
● In Active Learning, we instead iteratively acquire labels for only the most informative data points.

Goal: achieve a target accuracy (e.g. 95%) with the fewest labelled data points.
Active Learning

[Diagram: the active-learning loop. Train the model on the labelled set, score the unlabelled set, acquire labels for the selected points, and repeat until the goal of 95% accuracy is reached.]
Acquisition Function
Instead of acquiring a random data point, we want to pick an informative point.
We acquire points using the acquisition function a, which takes as input a data point and the distribution over model parameters p(ω | D_train).
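The corresponding selection rule (the standard form of single-point Bayesian active learning; the symbols follow the slide's description):

```latex
\[
x^{*} = \operatorname*{arg\,max}_{x \in D_\text{pool}} \; a\big(x,\, p(\omega \mid D_\text{train})\big)
\]
```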
Uncertainty in Deterministic Networks
We have the following two approaches:
1. Variation ratio (one minus the max of the softmax output)
2. Entropy of the softmax output
Problem: uncertainty isn’t reliable. Standard deep models extrapolate arbitrarily!
Active Learning - Bayesian Approaches
Alternative: use a Bayesian Neural Network
● Dropout as the approximate inference method
● Scales well and is easy to implement and train
● Enables more reliable uncertainty quantification:
○ Entropy of the marginalised softmax output
○ Mutual information between the parameters and the true label (BALD)
BALD: “Bayesian Active Learning by Disagreement”
Intuitively:
1. The first term captures the overall uncertainty of the model.
2. The second term captures the uncertainty of a given draw of the model parameters.

The score is high when the model is uncertain overall (high marginal entropy) but each individual parameter draw is confident (low expected per-sample entropy), i.e. when the samples disagree.
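The two terms refer to the BALD acquisition function, the mutual information between the label and the model parameters (Houlsby et al., 2011):

```latex
\[
a_{\mathrm{BALD}}(x) \;=\; \mathrm{I}\big[y;\, \omega \mid x,\, D_\text{train}\big]
\;=\; \underbrace{\mathrm{H}\big[y \mid x,\, D_\text{train}\big]}_{\text{term 1}}
\;-\; \underbrace{\mathbb{E}_{p(\omega \mid D_\text{train})}\Big[\mathrm{H}\big[y \mid x,\, \omega\big]\Big]}_{\text{term 2}}
\]
```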
Problems in Practice
● Re-training a deep model is computationally expensive
● Mobilising an expert is time consuming
→ In practice: acquire a batch of data points by selecting the top k scorers
Problems in Practice
Naively acquiring the top k scoring points can lead to redundant acquisitions!
BALD vs BatchBALD
Lack of diversity: BALD performs worse than random acquisition on Repeated MNIST.
BatchBALD I - Introduction
→ Score batches of points jointly.

Joint entropy (first term): hard to compute.
Expectation over entropy (second term): easy to compute, since the labels are independent given the parameters.
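Written out, BatchBALD scores the batch as the mutual information between the joint of all batch labels and the parameters:

```latex
\[
a_{\mathrm{BatchBALD}}\big(\{x_1,\dots,x_b\}\big)
= \mathrm{I}\big[y_1,\dots,y_b;\, \omega \mid x_1,\dots,x_b,\, D_\text{train}\big]
= \mathrm{H}\big[y_1,\dots,y_b\big]
- \mathbb{E}_{p(\omega \mid D_\text{train})}\Big[\mathrm{H}\big[y_1,\dots,y_b \mid \omega\big]\Big]
\]
```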
[Venn diagrams, BALD vs BatchBALD: BALD sums the per-point mutual informations, so overlapping information between points (dark areas) is counted double; BatchBALD scores the batch jointly, so nothing is double-counted.]
BatchBALD II - MC estimator
We derive an MC estimator for the first term (implicitly conditioned on x):
● k: the number of samples drawn from the parameter distribution
● ŷ: a configuration of a batch of labels
● The number of configurations ŷ grows exponentially with the batch size.
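A sketch of the estimator consistent with the description above (the sum over all configurations ŷ is what grows exponentially with the batch size n):

```latex
\[
\mathrm{H}\big[y_{1:n} \mid x_{1:n}\big] \;\approx\;
-\sum_{\hat{y}_{1:n}} \Big(\tfrac{1}{k}\sum_{j=1}^{k} p(\hat{y}_{1:n} \mid \hat{\omega}_j)\Big)
\log \Big(\tfrac{1}{k}\sum_{j=1}^{k} p(\hat{y}_{1:n} \mid \hat{\omega}_j)\Big),
\qquad \hat{\omega}_j \sim p(\omega \mid D_\text{train})
\]
```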
BatchBALD III - Greedy Approximation
To avoid the combinatorial explosion, we grow the acquisition batch point by point.
Greedy Approximation and Submodularity
● We can compute the greedy joint entropy exactly for the first 4 points; afterwards we use the Monte Carlo estimator.
● We prove in the paper that the acquisition function is submodular, so the greedy approximation is guaranteed to achieve at least a 1 - 1/e fraction of the optimal batch score.
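The greedy loop can be sketched as follows. This is a simplified illustration, not the paper's implementation: it keeps the exact joint over label configurations (no configuration sampling), so it is only feasible for small batches and few classes; all names are my own.

```python
import numpy as np

def conditional_entropy(probs):
    """probs: [N, K, C] softmax outputs p(y | x_i, ω_j).
    Returns E_ω H[y | x, ω] per pool point, shape [N]."""
    return -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=-1), axis=-1)

def greedy_batchbald(probs, batch_size):
    """Greedy BatchBALD sketch over N pool points, K parameter
    samples, C classes. Returns the indices of the chosen batch."""
    N, K, C = probs.shape
    cond = conditional_entropy(probs)
    chosen = []
    # joint[m, j] holds p(ŷ_config_m | ω_j) for the chosen points;
    # the number of configurations m grows as C^|batch| (exact joint).
    joint = np.ones((1, K))
    for _ in range(batch_size):
        best_score, best_i = -np.inf, None
        for i in range(N):
            if i in chosen:
                continue
            # Extend every configuration with each class of candidate i.
            cand = (joint[:, None, :] * probs[i].T[None, :, :]).reshape(-1, K)
            p_joint = cand.mean(axis=1)  # marginalise over ω samples
            joint_entropy = -np.sum(p_joint * np.log(p_joint + 1e-12))
            # Mutual information of the extended batch with ω.
            score = joint_entropy - (sum(cond[j] for j in chosen) + cond[i])
            if score > best_score:
                best_score, best_i = score, i
        chosen.append(best_i)
        joint = (joint[:, None, :] * probs[best_i].T[None, :, :]).reshape(-1, K)
    return chosen
```

On a pool where points 0 and 1 are exact duplicates, the first pick scores them equally, but the second pick prefers a point whose predictions vary independently, which is exactly the diversity effect described above.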
Consistent dropout
Dropout uncertainty estimates have high variance. By using the same dropout masks across the whole unlabelled set, we reduce this variance.
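A minimal sketch of the idea on a toy two-layer MLP (random weights; all names and sizes are illustrative assumptions): the K hidden-layer masks are drawn once and reused for every pool point, so every input is scored under the same K parameter samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP: input 4 → hidden 8 → 3 classes (random weights for the sketch).
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
K, P = 10, 0.5  # K parameter samples, dropout rate

# Consistent dropout: draw the K masks ONCE, reuse them for every input
# (inverted-dropout scaling keeps the expected activation unchanged).
masks = (rng.random((K, 8)) > P) / (1 - P)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_softmax(X):
    """[N, 4] inputs → [N, K, 3] softmax samples p(y | x, ω_j),
    using the same K fixed masks for every pool point."""
    h = np.maximum(X @ W1, 0.0)      # [N, 8] hidden activations
    h = h[:, None, :] * masks[None]  # [N, K, 8] masked per sample
    return softmax(h @ W2)           # [N, K, 3]
```

Because the masks are shared, differences in the per-point scores reflect the data rather than fresh mask noise per point.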
Results MNIST
Results EMNIST
Diversity
Results CINIC-10 (CIFAR+ImageNet)
Conclusion
BatchBALD improves over BALD:
1. Performs as well with acquisition size 10 as BALD with acquisition size 1
2. Much lower total experiment time than BALD
3. Performs well on datasets with repetition
4. Performs well when the number of classes is high
Future Work
● Estimating the joint entropy with dropout samples is noisy.
● Training a deep model on a small dataset with dropout is difficult.
○ The uncertainty of a deterministic ResNet is (surprisingly) good in practice.
● In Active Learning, the labelled set does not follow the original data distribution.
○ This leads to difficulties with imbalanced datasets.
○ Random acquisition does not have this problem.
Thanks!
Joost van Amersfoort*, Andreas Kirsch*, Yarin Gal
Email: {andreas.kirsch, joost.van.amersfoort, yarin}@cs.ox.ac.uk
Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning." International Conference on Machine Learning, 2016.
Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep Bayesian Active Learning with Image Data." International Conference on Machine Learning, 2017.
Houlsby, Neil, et al. "Bayesian Active Learning for Classification and Preference Learning." arXiv preprint arXiv:1112.5745 (2011).