TRANSCRIPT
Part I: 1/27
Emerging topics in learning from
noisy and missing data
ACM Multimedia 2016
X. Alameda-Pineda T. Hospedales E. Ricci N. Sebe X. Wang
Part I: 2/27
In a nutshell
Goal
Introduce several methodologies for handling noisy, partially
annotated data
Focus on recent learning paradigms, such as low-rank
modeling [T1,T2], deep learning [T3] and domain adaptation [T4]
Discuss emerging application scenarios, e.g. in social signal
processing and human-centric visual content analysis [T5,T6]
Special emphasis
Matrix completion, zero-shot learning [T7], deep domain
adaptation
Related Tutorials
[T1] Sparse and Low-Rank Modeling for High-Dimensional Data Analysis (CVPR 2015)
[T2] Sparse and Low-Rank Representations in Computer Vision - Theory, Algorithms, and Applications
(ICCV 2013)
[T3] Deep Learning in Image and Video Understanding (ICME 2014)
[T4] Domain Adaptation and Transfer Learning (ECCV 2014)
[T5] Emotional and Social Signals for Multimedia Research (ACM MM 2015)
[T6] Human-centric images and videos analysis (ACM MM 2015)
[T7] Zero-shot learning (ECCV 2016)
Part I: 3/27
Tutorial Overview
Schedule:
Part I – 30 min:
Challenges in learning from noisy and missing data
Part II – 60 min [Xavi & Elisa]:
Matrix Completion: recent advances and emerging
applications
Part III – 45 min [Tim]:
Zero-shot Learning: from shallow to deep learning
without annotations
Part IV – 45 min [Xiaogang]:
Learning from noisy and missing data with deep
networks
Material will be posted online:
http://mhug.disi.unitn.it/tutorial-acmmm16/
Part I: 4/27
The Machine Learning Revolution
Jeremy Howard (CEO of Enlitic, founder of Kaggle),
“The wonderful and terrifying implications of computers that can learn”:
“The Machine Learning Revolution is
going to be very different from the
Industrial Revolution, because the
Machine Learning Revolution never
settles down. The better computers get at
intellectual activities, the more they can
build better computers to be better at
intellectual capabilities, so this is going to
be a kind of change that the world has
actually never experienced before…”
Part I: 5/27
What about data?
Computer Vision and Multimedia researchers are key players in
this revolution
Incredible progress made thanks to the availability of huge,
fully annotated datasets
Part I: 6/27
ImageNet Large Scale Visual
Recognition Challenge (ILSVRC)
“ILSVRC evaluates algorithms for object detection and image
classification at large scale... taking advantage of the quite
expensive labeling effort.”
[Figure: examples of images and annotations in ILSVRC]
Part I: 7/27
Tremendous progress in the last years
ILSVRC results over the years:

Year | Detection (Avg. Precision) | Classification (Error) | Localization (Error)
2013 | 0.23                       | 0.12                   | 0.30
2014 | 0.44                       | 0.07                   | 0.25
2015 | 0.62                       | 0.036                  | 0.09
2016 | 0.66                       | 0.03                   | 0.077
Part I: 8/27
Another success story
Autonomous driving & data, dataset examples:
KITTI Vision Benchmark (tasks: stereo, optical flow, visual
odometry, 3D object detection and 3D tracking)
KAIST Multispectral Pedestrian Dataset
Manual annotation
Part I: 9/27
Collecting data
Collecting large-scale, fully annotated datasets is a resource-
consuming, often unaffordable task
The Faces of Mechanical Turk: http://waxy.org/2008/11/the_faces_of_mechanical_turk/
Part I: 10/27
Collecting Data
Collecting datasets is an open problem by itself [Sorokin, CVPR-W08], [Hata, CSCW17], [Kolesnikov, BMVC16]
Challenges in data collection
Different labeling needs: e.g. Is there a dog in the
image? Where is the dog?
Annotation quality: How good is it? How to be
sure?
Price: How much do we need to pay?
How to compare different annotation methods? e.g.
in-house vs. MTurk
Part I: 11/27
Collecting ImageNet
Two-step process [Deng, CVPR09]
Step 1: Collect candidate images via the Internet. For each
synset*, the queries are the set of WordNet synonyms
Step 2: Rely on humans to verify candidate images
(Step 1) Accuracy of Internet image search results: 10%
(Step 2) Amazon Mechanical Turk (AMT) used for labeling
vision data
300 images: $0.02
14,197,122 images x 10 repetitions: 9460 dollars
(Step 2) Quality control system
Human users make mistakes (not all users follow the
instructions, user disagreement, subtle or confusing synsets)
*synset: a set of WordNet synonyms
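The cost figure above is simple arithmetic, sketched below with the numbers quoted on the slide (the slide rounds the result down slightly):

```python
# Back-of-the-envelope AMT labeling cost, using the figures quoted above.
images = 14_197_122        # ImageNet images
repetitions = 10           # each image verified ~10 times
batch_size = 300           # verifications per paid batch
price_per_batch = 0.02     # dollars per batch of 300 verifications

cost = images * repetitions / batch_size * price_per_batch
print(f"~{cost:.0f} dollars")  # ~9465 dollars
```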
Part I: 12/27
Let's move to the real world
ImageNet is great!
Essential training resource, useful for benchmarking, and open
(API, original images, features, object attributes)
But what about the real world?
ImageNet-based deep models can learn discriminative and
transferable features [Donahue, ICML14], [Glorot, ICML11],
[Yosinski, NIPS14]
Dataset bias is still there: the domain discrepancy can be
alleviated, but not removed
Part I: 13/27
A never-ending dilemma
Datasets should be:
large and varied enough so that learning
techniques can successfully exploit the variability
inherently present in real data
small enough so that they can be fully annotated
at a reasonable cost
Part I: 14/27
A never-ending dilemma
Datasets should be:
large and varied enough so that learning
techniques can successfully exploit the variability
inherently present in real data
small enough so that they can be fully annotated
at a reasonable cost
data-hungry
deep learning models
Part I: 15/27
Learning from noisy & missing data
Fundamental to deploy systems working in the wild
Robustness: How can we make a system robust to
corrupt inputs? How to handle noisy training/test data?
Adaptation: How can we make systems detect and
adapt to changes in their environment (e.g. low
overlap between train and test distributions)?
Part I: 16/27
Learning from noisy & missing data
When?
Few/no annotations available for the task and/or the
scenario of interest
Scene-specific detectors [Zeng, ECCV14]
[Figure: training data vs. test data]
Part I: 17/27
Learning from noisy & missing data
When?
Few/no data available for the task/scenario of interest
Many cats but...
... a blue Persian cat?
Part I: 18/27
Learning from noisy & missing data
When?
Annotated data are typically available in abundance for
traditional tasks but not for uncommon ones
Emotion recognition from paintings
[Alameda-Pineda, CVPR16]
Scary? What is more interesting?
Interestingness prediction
[Fu, TPAMI16]
Part I: 19/27
Learning from noisy & missing data
When?
Training data are derived from other sources or agents in
the environment
Head & Body pose estimation
[Alameda-Pineda, ACMMM15]
Part I: 20/27
Learning from noisy & missing data
When?
Some data can be hard to annotate (e.g. human-centric
annotations: perceptual attributes, image captioning)
[Misra, CVPR16] [Porzi, ACMMM15]
Safe?
Part I: 21/27
Traditional Research Directions
Learning from noisy and missing data: How? (Broad literature)
Partially annotated labels:
Semi-supervised & transductive learning. Traditionally: self-training,
generative models, multi-view learning, graph-based methods, etc.
Surveys: [Chapelle, 06], [Zhu, 08]
Noisy labels:
Taxonomy of methods: (i) label noise-robust methods, (ii) probabilistic
label noise-tolerant methods, (iii) data cleaning methods and (iv)
model-based label noise-tolerant methods. Survey: [Frenay, TNNL14]
Missing & noisy features/observations:
Problem formalization dates back to [Rubin, Biometrika76]. Popular
approaches are imputation methods and expectation maximization.
Surveys: [McKnight, 07], [Aste, PAA15]
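As a minimal illustration of the imputation approaches mentioned above (a toy sketch, not from the tutorial), missing feature values can be filled with the per-column means of the observed entries; EM and multiple imputation, covered in the cited surveys, refine this idea:

```python
import numpy as np

def mean_impute(X):
    """Replace NaN entries with the mean of the observed values in each column."""
    col_means = np.nanmean(X, axis=0)        # per-column mean, ignoring NaNs
    return np.where(np.isnan(X), col_means, X)

# Toy feature matrix with two missing entries
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
X_filled = mean_impute(X)  # fills with the column means [2.0, 6.0]
```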
Part I: 22/27
Recent Trends
Low-Rank Matrix Recovery and Completion via Convex
Optimization
Idea: recover low-rank matrices from incomplete or
corrupted observations.
Approach: formulate the problem as matrix rank minimization
and solve it efficiently via nuclear-norm minimization

"corrupted" matrix = low-rank matrix + sparse error matrix
Useful resources: http://perception.csl.illinois.edu/matrix-rank/
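A minimal sketch of the nuclear-norm idea (parameter values here are illustrative assumptions, not from the tutorial): iterative singular-value soft-thresholding, as in the SoftImpute algorithm, fills missing entries with a low-rank estimate while restoring the observed entries each round:

```python
import numpy as np

def soft_impute(M, observed, tau=0.1, n_iters=200):
    """Fill unobserved entries of M by iterating:
    (1) soft-threshold the singular values (nuclear-norm surrogate),
    (2) restore the observed entries."""
    X = np.where(observed, M, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Z = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrunk low-rank estimate
        X = np.where(observed, M, Z)              # keep observed entries fixed
    return X

# Toy example: a rank-1 matrix with one missing entry
M = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
observed = np.ones_like(M, dtype=bool)
observed[0, 2] = False
X = soft_impute(M, observed)  # X[0, 2] is recovered close to 3.0
```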
Part I: 23/27
Recent Trends
Transfer learning and domain adaptation
Idea: transfer information from the source to the target domain,
avoiding learning from scratch
The amount of annotated data needed for the new task is reduced
Examples:
One-shot Learning: a single labeled target sample.
Zero-shot Learning: neither target samples nor their labels are
available; source and target classes share textual information.
[Figure: traditional machine learning vs. transfer learning]
Survey on transfer learning: [Pan, TKDE10]
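The zero-shot setting above can be sketched with a toy attribute-based recognizer (all class names, attributes and data here are illustrative assumptions): learn a map from image features to class attribute vectors on the seen classes, then recognize an unseen class by nearest attribute vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy attribute vectors (rows = classes); "zebra" is never seen in training.
A = np.array([[1.0, 0.0, 1.0, 0.0],   # horse (seen)
              [0.0, 1.0, 0.0, 1.0],   # tiger (seen)
              [1.0, 1.0, 1.0, 1.0]])  # zebra (unseen)
W_true = rng.normal(size=(4, 8))      # hidden attribute-to-feature map

def features(cls, n):
    """Synthetic image features: class attributes pushed through W_true, plus noise."""
    return np.tile(A[cls] @ W_true, (n, 1)) + 0.05 * rng.normal(size=(n, 8))

# Least-squares map from features back to attribute space,
# trained only on the two seen classes
X_train = np.vstack([features(0, 20), features(1, 20)])
Y_train = np.vstack([np.tile(A[0], (20, 1)), np.tile(A[1], (20, 1))])
V, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# Classify a zebra image among all three classes by attribute similarity
x = features(2, 1)
scores = (x @ V) @ A.T / np.linalg.norm(A, axis=1)
print(["horse", "tiger", "zebra"][int(scores.argmax())])  # zebra
```

The unseen class is recognized because its attribute vector is shared side information, even though no zebra image appeared in training.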
Part I: 24/27
Recent Trends
Deep learning models for:
Deep domain adaptation
• Unsupervised Domain Adaptation: the target domain is fully
unlabeled. [Zeng, ECCV14], [Sun, AAAI16], [Long, NIPS16]
• Semi-supervised Domain Adaptation: the target has few labeled
samples. [Chopra, ICMLW13], [Tzeng, ICCV15], [Ganin, ICML15]*
• Heterogeneous Domain Adaptation: heterogeneous modalities.
[Hoffman, CVPR16], [Shu, ACMMM15]
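One of the cited approaches, CORAL [Sun, AAAI16], admits a very compact sketch (the features below are assumed toy data): align second-order statistics by whitening the source features and re-coloring them with the target covariance:

```python
import numpy as np

def coral(Xs, Xt, eps=1e-3):
    """Whiten source features, then re-color them with the target covariance."""
    def sqrtm(C, inv=False):
        # matrix square root (or inverse square root) of an SPD matrix
        w, V = np.linalg.eigh(C)
        w = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
        return (V * w) @ V.T
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)

rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 0.5])  # source features
Xt = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 1.0])  # target features
Xa = coral(Xs, Xt)  # cov(Xa) now approximately matches cov(Xt)
```

A classifier trained on the aligned source features then transfers better to the unlabeled target domain.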
Learning from noisy labels (explicitly modeling label noise)
• Noise at Random (NAR) assumption: the noise is conditionally
independent of the observation. [Mnih, ICML12], [Lee, ICMLW13],
[Sukhbaatar, ICLR15]
• Noise Not at Random (NNAR): observation-conditional noise.
[Reed, arXiv14], [Xiao, CVPR15], [Misra, CVPR16]
* some approaches also apply to unsupervised setting
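Under the NAR assumption the noise process reduces to a class-confusion (transition) matrix Q, independent of the image, which is the idea behind noise-layer approaches such as [Sukhbaatar, ICLR15]. A minimal sketch of the resulting loss (the matrices below are assumed toy values):

```python
import numpy as np

def noisy_label_loss(p_clean, noisy_labels, Q):
    """Cross-entropy on noisy labels under the NAR model: the clean
    class posterior p(y|x) is pushed through the transition matrix
    Q[i, j] = P(noisy label j | true label i) before scoring."""
    p_noisy = p_clean @ Q                    # predicted noisy-label distribution
    n = len(noisy_labels)
    return -np.mean(np.log(p_noisy[np.arange(n), noisy_labels]))

# Toy check: symmetric 20% label noise over two classes
Q = np.array([[0.8, 0.2],
              [0.2, 0.8]])
p_clean = np.array([[0.9, 0.1],
                    [0.1, 0.9]])
loss = noisy_label_loss(p_clean, np.array([0, 1]), Q)  # -log(0.74)
```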
Part I: 25/27
References
A. Sorokin and D. Forsyth, “Utility data annotation with Amazon Mechanical Turk,”
CVPR-W, 2008.
K. Hata, et al. “A Glimpse Far into the Future: Understanding Long-term Crowd Worker
Accuracy,” CSCW, 2017.
A. Kolesnikov, C. H. Lampert, "Improving Weakly-Supervised Object Localization By
Micro-Annotation", British Machine Vision Conference (BMVC), 2016.
J. Deng, et al. “ImageNet: A Large-Scale Hierarchical Image Database,” CVPR, 2009.
J. Donahue, et al. “Decaf: A deep convolutional activation feature for generic visual
recognition,” ICML, 2014.
X. Glorot, et al. “Domain adaptation for large-scale sentiment classification: A deep
learning approach,” ICML, 2011.
J. Yosinski, et al. “How transferable are features in deep neural networks?,” NIPS, 2014.
X. Zeng, et al. “Deep learning of scene-specific classifier for pedestrian detection,”
ECCV, 2014.
X. Alameda-Pineda, et al. “Recognizing Emotions from Abstract Paintings using Non-
Linear Matrix Completion,” IEEE CVPR, 2016.
Y. Fu, et al. “Robust Estimation of Subjective Visual Properties from Crowdsourced
Pairwise Labels,” IEEE TPAMI, 2016.
X. Alameda-Pineda, et al. “Analyzing Free-standing Conversational Groups: A Multimodal
Approach,” ACM Multimedia, 2015.
Part I: 26/27
References
L. Porzi, et al. “Predicting and understanding urban perception with convolutional neural
networks,” ACM Multimedia, 2015.
I. Misra, et al. “Seeing through the Human Reporting Bias: Visual Classifiers from Noisy
Human-Centric Labels,” CVPR, 2016.
O. Chapelle, et al. “Semi-supervised learning,” MIT Press, 2006.
X. Zhu, “Semi-Supervised Learning Literature Survey,” University of Wisconsin, 2008.
B. Frénay and M. Verleysen, “Classification in the Presence of Label Noise: a Survey,”
IEEE TNNL, 2014.
D. B. Rubin, “Inference and missing data,” Biometrika, 1976.
P. E. McKnight, et al. “Missing Data: A Gentle Introduction,” The Guilford Press, 2007.
M. Aste, et al. “Techniques for dealing with incomplete data: a tutorial and survey,” PAA,
2015.
S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE TKDE, 2010.
B. Sun, et al. “Return of Frustratingly Easy Domain Adaptation,” AAAI, 2016.
M. Long, et al. “Unsupervised Domain Adaptation with Residual Transfer Networks”,
NIPS16.
S. Chopra, et al. "DLID: Deep Learning for Domain Adaptation by Interpolating between
Domains", ICML Workshop on Challenges in Representation Learning, 2013.
E. Tzeng, et al. “Simultaneous Deep Transfer Across Domains and Tasks,” ICCV, 2015.
Part I: 27/27
References
Y. Ganin, et al. “Domain-adversarial training of neural networks,” ICML 2015.
J. Hoffman, et al. “Learning with Side Information through Modality Hallucination,” CVPR,
2016.
Shu, et al. “Weakly-Shared Deep Transfer Networks for Heterogeneous-Domain
Knowledge Propagation”, ACMMM 2015.
V. Mnih and G. E. Hinton, “Learning to label aerial images from noisy data,” ICML,
2012.
D.-H. Lee, “Pseudo-label: The simple and efficient semi-supervised learning method for
deep neural networks,” ICML-W, 2013.
S. Sukhbaatar, et al. “Training Convolutional Networks with Noisy Labels,” ICLR
Workshop track, 2015.
S. Reed, et al. “Training deep neural networks on noisy labels with bootstrapping,” arXiv,
2014.
T. Xiao, et al. "Learning from massive noisy labeled data for image classification,"
CVPR 2015.