TRANSCRIPT
Part I: 1/27
Emerging topics in learning from
noisy and missing data
ACM Multimedia 2016
X. Alameda-Pineda T. Hospedales E. Ricci N. Sebe X. Wang
Part I: 2/27
In a nutshell
Goal
Introduce several methodologies for handling noisy, partially
annotated data
Focus on recent learning paradigms, such as low-rank
modeling [T1,T2], deep learning [T3] and domain adaptation [T4]
Discuss emerging application scenarios, e.g. in social signal
processing and human-centric visual content analysis [T5,T6]
Special emphasis
Matrix completion, zero-shot learning [T7], deep domain
adaptation
Related Tutorials
[T1] Sparse and Low-Rank Modeling for High-Dimensional Data Analysis (CVPR 2015)
[T2] Sparse and Low-Rank Representations in Computer Vision - Theory, Algorithms, and Applications
(ICCV 2013)
[T3] Deep Learning in Image and Video Understanding (ICME 2014)
[T4] Domain Adaptation and Transfer Learning (ECCV 2014)
[T5] Emotional and Social Signals for Multimedia Research (ACM MM 2015)
[T6] Human-centric images and videos analysis (ACM MM 2015)
[T7] Zero-shot learning (ECCV 2016)
Part I: 3/27
Tutorial Overview
Schedule:
Part I – 30 min:
Challenges in learning from noisy and missing data
Part II – 60 min [Xavi & Elisa]:
Matrix Completion: recent advances and emerging
applications
Part III – 45 min [Tim]:
Zero-shot Learning: from shallow to deep learning
without annotations
Part IV – 45 min [Xiaogang]:
Learning from noisy and missing data with deep
networks
Material will be posted online:
http://mhug.disi.unitn.it/tutorial-acmmm16/
Part I: 4/27
The Machine Learning Revolution
Jeremy Howard (CEO of Enlitic, founder of Kaggle),
“The wonderful and terrifying implications of computers that can learn”:
“The Machine Learning Revolution is
going to be very different from the
Industrial Revolution, because the
Machine Learning Revolution never
settles down. The better computers get at
intellectual activities, the more they can
build better computers to be better at
intellectual capabilities, so this is going to
be a kind of change that the world has
actually never experienced before…”
Part I: 5/27
What about data?
Computer Vision and Multimedia researchers are key players in
this revolution
Incredible progress made thanks to the availability of huge,
fully annotated datasets
Part I: 6/27
ImageNet Large Scale Visual
Recognition Challenge (ILSVRC)
“ILSVRC evaluates algorithms for object detection and image
classification at large scale... taking advantage of the quite
expensive labeling effort.”
[Figure: examples of images and annotations in ILSVRC]
Part I: 7/27
Tremendous progress in the last years
ILSVRC results over the years:

Year | Detection (Avg. Precision) | Classification (Error) | Localization (Error)
2013 | 0.23                       | 0.12                   | 0.30
2014 | 0.44                       | 0.07                   | 0.25
2015 | 0.62                       | 0.036                  | 0.09
2016 | 0.66                       | 0.03                   | 0.077
Part I: 8/27
Another success story
Autonomous driving & data, dataset examples:
KITTI Vision Benchmark (tasks: stereo, optical flow, visual
odometry, 3D object detection and 3D tracking)
KAIST Multispectral Pedestrian Dataset
Manual annotation
Part I: 9/27
Collecting data
Collecting large-scale, fully annotated datasets is a resource-
consuming, often unaffordable task
The Faces of Mechanical Turk: http://waxy.org/2008/11/the_faces_of_mechanical_turk/
Part I: 10/27
Collecting Data
Collecting datasets is an open problem by itself [Sorokin, CVPR-W08], [Hata, CSCW17], [Kolesnikov, BMVC16]
Challenges in data collection
Different labeling needs: e.g. Is there a dog in the
image? Where is the dog?
Annotation quality: How good is it? How to be
sure?
Price: How much do we need to pay?
How to compare different annotation methods? e.g.
in-house vs. MTurk
Part I: 11/27
Collecting ImageNet
Two-step process [Deng, CVPR09]
Step 1: Collect candidate images via the Internet. For each
synset*, the queries are the set of WordNet synonyms
Step 2: Rely on humans to verify candidate images
(Step 1) Accuracy of Internet image search results: 10%
(Step 2) Amazon Mechanical Turk (AMT) used for labeling
vision data
300 images: $0.02
14,197,122 images x 10 repetitions: 9460 dollars
(Step 2) Quality control system
Human users make mistakes (not all users follow the
instructions, user disagreement, subtle or confusing synsets)
*synset: a set of WordNet synonyms
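The cost figure above is simple arithmetic, sketched below with the numbers quoted on the slide (the slide rounds the result down slightly):

```python
# Back-of-the-envelope AMT labeling cost, using the figures quoted above.
images = 14_197_122        # ImageNet images
repetitions = 10           # each image verified ~10 times
batch_size = 300           # verifications per paid batch
price_per_batch = 0.02     # dollars per batch of 300 verifications

cost = images * repetitions / batch_size * price_per_batch
print(f"~{cost:.0f} dollars")  # ~9465 dollars
```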
Part I: 12/27
Let's move to the real world
ImageNet is great!
Essential training resource, useful for benchmarking, and open
(API, original images, features, object attributes)
But what about the real world?
ImageNet-based deep models can learn discriminative and
transferable features [Donahue, ICML14], [Glorot, ICML11],
[Yosinski, NIPS14]
Dataset bias is still there: the domain discrepancy can be
alleviated, but not removed
Part I: 13/27
A never-ending dilemma
Datasets should be:
large and varied enough so that learning
techniques can successfully exploit the variability
inherently present in real data
small enough so that they can be fully annotated
at a reasonable cost
Part I: 14/27
A never-ending dilemma
Datasets should be:
large and varied enough so that learning
techniques can successfully exploit the variability
inherently present in real data
small enough so that they can be fully annotated
at a reasonable cost
data-hungry
deep learning models
Part I: 15/27
Learning from noisy & missing data
Fundamental to deploy systems working in the wild
Robustness: How can we make a system robust to
corrupt inputs? How to handle noisy training/test data?
Adaptation: How can we make systems detect and
adapt to changes in their environment (e.g. low
overlap between train and test distributions)?
Part I: 16/27
Learning from noisy & missing data
When?
Few/no annotations available for the task and/or the
scenario of interest
Scene-specific detectors [Zeng, ECCV14]
[Figure: training data vs. test data]
Part I: 17/27
Learning from noisy & missing data
When?
Few/no data available for the task/scenario of interest
Many cats but...
... a blue Persian cat?
Part I: 18/27
Learning from noisy & missing data
When?
Annotated data are typically available in abundance for
traditional tasks but not for uncommon ones
Emotion recognition from paintings
[Alameda-Pineda, CVPR16]
Scary? What is more interesting?
Interestingness prediction
[Fu, TPAMI16]
Part I: 19/27
Learning from noisy & missing data
When?
Training data are derived from other sources or agents in
the environment
Head & Body pose estimation
[Alameda-Pineda, ACMMM15]
Part I: 20/27
Learning from noisy & missing data
When?
Some data can be hard to annotate (e.g. human-centric
annotations: perceptual attributes, image captioning)
[Misra, CVPR16] [Porzi, ACMMM15]
Safe?
Part I: 21/27
Traditional Research Directions
Learning from noisy and missing data: How? (Broad literature)
Partially annotated labels:
Semi-supervised & transductive learning. Traditionally: self-training,
generative models, multi-view learning, graph-based methods, etc.
Surveys: [Chapelle, 06], [Zhu, 08]
Noisy labels:
Taxonomy of methods: (i) label noise-robust methods, (ii) probabilistic
label noise-tolerant methods, (iii) data cleaning methods and (iv)
model-based label noise-tolerant methods. Survey: [Frenay, TNNL14]
Missing & noisy features/observations:
Problem formalization dates back to [Rubin, Biometrika76]. Popular
approaches are imputation methods and expectation maximization.
Surveys: [McKnight, 07], [Aste, PAA15]
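As a minimal illustration of the imputation approaches mentioned above (a toy sketch, not from the tutorial), missing feature values can be filled with the per-column means of the observed entries; EM and multiple imputation, covered in the cited surveys, refine this idea:

```python
import numpy as np

def mean_impute(X):
    """Replace NaN entries with the mean of the observed values in each column."""
    col_means = np.nanmean(X, axis=0)        # per-column mean, ignoring NaNs
    return np.where(np.isnan(X), col_means, X)

# Toy feature matrix with two missing entries
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
X_filled = mean_impute(X)  # fills with the column means [2.0, 6.0]
```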
Part I: 22/27
Recent Trends
Low-Rank Matrix Recovery and Completion via Convex
Optimization
Idea: recover low-rank matrices from incomplete or
corrupted observations.
Approach: formulate the problem as matrix rank minimization
and solve it efficiently via nuclear-norm minimization

"corrupted" matrix = low-rank matrix + sparse error matrix
Useful resources: http://perception.csl.illinois.edu/matrix-rank/
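A minimal sketch of the nuclear-norm idea (parameter values here are illustrative assumptions, not from the tutorial): iterative singular-value soft-thresholding, as in the SoftImpute algorithm, fills missing entries with a low-rank estimate while restoring the observed entries each round:

```python
import numpy as np

def soft_impute(M, observed, tau=0.1, n_iters=200):
    """Fill unobserved entries of M by iterating:
    (1) soft-threshold the singular values (nuclear-norm surrogate),
    (2) restore the observed entries."""
    X = np.where(observed, M, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Z = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrunk low-rank estimate
        X = np.where(observed, M, Z)              # keep observed entries fixed
    return X

# Toy example: a rank-1 matrix with one missing entry
M = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
observed = np.ones_like(M, dtype=bool)
observed[0, 2] = False
X = soft_impute(M, observed)  # X[0, 2] is recovered close to 3.0
```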
Part I: 23/27
Recent Trends
Transfer learning and domain adaptation
Idea: transfer information from the source to the target domain,
avoiding learning from scratch
The amount of annotated data needed for the new task is reduced
Examples:
One-shot Learning: a single labeled target sample.
Zero-shot Learning: neither target samples nor their labels are
available; source and target classes share textual information.
[Figure: traditional machine learning vs. transfer learning]
Survey on transfer learning: [Pan, TKDE10]
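The zero-shot setting above can be sketched with a toy attribute-based recognizer (all class names, attributes and data here are illustrative assumptions): learn a map from image features to class attribute vectors on the seen classes, then recognize an unseen class by nearest attribute vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy attribute vectors (rows = classes); "zebra" is never seen in training.
A = np.array([[1.0, 0.0, 1.0, 0.0],   # horse (seen)
              [0.0, 1.0, 0.0, 1.0],   # tiger (seen)
              [1.0, 1.0, 1.0, 1.0]])  # zebra (unseen)
W_true = rng.normal(size=(4, 8))      # hidden attribute-to-feature map

def features(cls, n):
    """Synthetic image features: class attributes pushed through W_true, plus noise."""
    return np.tile(A[cls] @ W_true, (n, 1)) + 0.05 * rng.normal(size=(n, 8))

# Least-squares map from features back to attribute space,
# trained only on the two seen classes
X_train = np.vstack([features(0, 20), features(1, 20)])
Y_train = np.vstack([np.tile(A[0], (20, 1)), np.tile(A[1], (20, 1))])
V, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# Classify a zebra image among all three classes by attribute similarity
x = features(2, 1)
scores = (x @ V) @ A.T / np.linalg.norm(A, axis=1)
print(["horse", "tiger", "zebra"][int(scores.argmax())])  # zebra
```

The unseen class is recognized because its attribute vector is shared side information, even though no zebra image appeared in training.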
Part I: 24/27
Recent Trends
Deep learning models for:
Deep domain adaptation
• Unsupervised Domain Adaptation: the target domain is fully
unlabeled. [Zeng, ECCV14], [Sun, AAAI16], [Long, NIPS16]
• Semi-supervised Domain Adaptation: the target has few labeled
samples. [Chopra, ICMLW13], [Tzeng, ICCV15], [Ganin, ICML15]*
• Heterogeneous Domain Adaptation: heterogeneous modalities.
[Hoffman, CVPR16], [Shu, ACMMM15]
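One of the cited approaches, CORAL [Sun, AAAI16], admits a very compact sketch (the features below are assumed toy data): align second-order statistics by whitening the source features and re-coloring them with the target covariance:

```python
import numpy as np

def coral(Xs, Xt, eps=1e-3):
    """Whiten source features, then re-color them with the target covariance."""
    def sqrtm(C, inv=False):
        # matrix square root (or inverse square root) of an SPD matrix
        w, V = np.linalg.eigh(C)
        w = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
        return (V * w) @ V.T
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)

rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 0.5])  # source features
Xt = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 1.0])  # target features
Xa = coral(Xs, Xt)  # cov(Xa) now approximately matches cov(Xt)
```

A classifier trained on the aligned source features then transfers better to the unlabeled target domain.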
Learning from noisy labels (explicitly modeling label noise)
• Noise at Random (NAR) assumption: the noise is conditionally
independent of the observation. [Mnih, ICML12], [Lee, ICMLW13],
[Sukhbaatar, ICLR15]
• Noise Not at Random (NNAR): observation-conditional noise.
[Reed, arXiv14], [Xiao, CVPR15], [Misra, CVPR16]
* some approaches also apply to unsupervised setting
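Under the NAR assumption the noise process reduces to a class-confusion (transition) matrix Q, independent of the image, which is the idea behind noise-layer approaches such as [Sukhbaatar, ICLR15]. A minimal sketch of the resulting loss (the matrices below are assumed toy values):

```python
import numpy as np

def noisy_label_loss(p_clean, noisy_labels, Q):
    """Cross-entropy on noisy labels under the NAR model: the clean
    class posterior p(y|x) is pushed through the transition matrix
    Q[i, j] = P(noisy label j | true label i) before scoring."""
    p_noisy = p_clean @ Q                    # predicted noisy-label distribution
    n = len(noisy_labels)
    return -np.mean(np.log(p_noisy[np.arange(n), noisy_labels]))

# Toy check: symmetric 20% label noise over two classes
Q = np.array([[0.8, 0.2],
              [0.2, 0.8]])
p_clean = np.array([[0.9, 0.1],
                    [0.1, 0.9]])
loss = noisy_label_loss(p_clean, np.array([0, 1]), Q)  # -log(0.74)
```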
Part I: 25/27
References
A. Sorokin and D. Forsyth, “Utility data annotation with Amazon Mechanical Turk,”
CVPR-W, 2008.
K. Hata, et al. “A Glimpse Far into the Future: Understanding Long-term Crowd Worker
Accuracy,” CSCW, 2017.
A. Kolesnikov, C. H. Lampert, "Improving Weakly-Supervised Object Localization By
Micro-Annotation", British Machine Vision Conference (BMVC), 2016.
J. Deng, et al. “ImageNet: A Large-Scale Hierarchical Image Database,” CVPR, 2009.
J. Donahue, et al. “Decaf: A deep convolutional activation feature for generic visual
recognition,” ICML, 2014.
X. Glorot, et al. “Domain adaptation for large-scale sentiment classification: A deep
learning approach,” ICML, 2011.
J. Yosinski, et al. “How transferable are features in deep neural networks?,” NIPS, 2014.
X. Zeng, et al. “Deep learning of scene-specific classifier for pedestrian detection,”
ECCV, 2014.
X. Alameda-Pineda, et al. “Recognizing Emotions from Abstract Paintings using Non-
Linear Matrix Completion,” IEEE CVPR, 2016.
Y. Fu, et al. “Robust Estimation of Subjective Visual Properties from Crowdsourced
Pairwise Labels,” IEEE TPAMI, 2016.
X. Alameda-Pineda, et al. “Analyzing Free-standing Conversational Groups: A Multimodal
Approach,” ACM Multimedia, 2015.
Part I: 26/27
References
L. Porzi, et al. “Predicting and understanding urban perception with convolutional neural
networks,” ACM Multimedia, 2015.
I. Misra, et al. “Seeing through the Human Reporting Bias: Visual Classifiers from Noisy
Human-Centric Labels,” CVPR, 2016.
O. Chapelle, et al. “Semi-supervised learning,” MIT Press, 2006.
X. Zhu, “Semi-Supervised Learning Literature Survey,” University of Wisconsin, 2008.
B. Frénay and M. Verleysen, “Classification in the Presence of Label Noise: a Survey,”
IEEE TNNL, 2014.
D. B. Rubin, “Inference and missing data,” Biometrika, 1976.
P. E. McKnight, et al. “Missing Data: A Gentle Introduction,” The Guilford Press, 2007.
M. Aste, et al. “Techniques for dealing with incomplete data: a tutorial and survey,” PAA,
2015.
S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE TKDE, 2010.
B. Sun, et al. “Return of Frustratingly Easy Domain Adaptation,” AAAI, 2016.
M. Long, et al. “Unsupervised Domain Adaptation with Residual Transfer Networks”,
NIPS16.
S. Chopra, et al. "DLID: Deep Learning for Domain Adaptation by Interpolating between
Domains", ICML Workshop on Challenges in Representation Learning, 2013.
E. Tzeng, et al. “Simultaneous Deep Transfer Across Domains and Tasks,” ICCV, 2015.
Part I: 27/27
References
Y. Ganin, et al. “Domain-adversarial training of neural networks,” ICML 2015.
J. Hoffman, et al. “Learning with Side Information through Modality Hallucination,” CVPR,
2016.
Shu, et al. “Weakly-Shared Deep Transfer Networks for Heterogeneous-Domain
Knowledge Propagation”, ACMMM 2015.
V. Mnih and G. E. Hinton, “Learning to label aerial images from noisy data,” ICML,
2012.
D.-H. Lee, “Pseudo-label: The simple and efficient semi-supervised learning method for
deep neural networks,” ICML-W, 2013.
S. Sukhbaatar, et al. “Training Convolutional Networks with Noisy Labels,” ICLR
Workshop track, 2015.
S. Reed, et al. “Training deep neural networks on noisy labels with bootstrapping,” arXiv,
2014.
T. Xiao, et al. "Learning from massive noisy labeled data for image classification,"
CVPR 2015.