Generalized Zero-Shot Learning with Deep Calibration Network Shichen Liu , Mingsheng Long , Jianmin Wang , and Michael I.Jordan School of Software, Tsinghua University, China KLiss, MOE; BNRist; Research Center for Big Data, Tsinghua University, China University of California, Berkeley, Berkeley, USA Youngnam Kim Machine Learning Group Department of Computer Science and Engineering Pohang University of Science and Technology 2018-11-20


Page 1: Generalized Zero-Shot Learning with Deep Calibration Networkmlg.postech.ac.kr/~readinglist/slides/20181120.pdf · 2018-11-20 · Generalized Zero-Shot Learning with Deep Calibration

Generalized Zero-Shot Learning with Deep Calibration Network

Shichen Liu†, Mingsheng Long†, Jianmin Wang†, and Michael I. Jordan‡

†School of Software, Tsinghua University, China
†KLiss, MOE; BNRist; Research Center for Big Data, Tsinghua University, China

‡University of California, Berkeley, Berkeley, USA

Youngnam Kim

Machine Learning Group
Department of Computer Science and Engineering

Pohang University of Science and Technology

2018-11-20

Preliminary: Class semantic representation

- Class semantic representations carry information about each class, such as hand-labeled attribute vectors or text descriptions.

Figure: Attribute vectors on the AwA dataset (Xian et al., 2017)

Figure: Text description on the CUB dataset (Anonymous, 2018)

Preliminary: Zero-shot learning

- Seen class dataset $D_s = \{(x_i^{(s)}, y_i^{(s)})\}_{i=1}^{N_s}$, where the i-th example's label is $y_i^{(s)} \in \{1, \dots, C_s\}$, and semantic representations $S_s = \{s_i^{(s)}\}_{i=1}^{C_s}$

- Unseen class dataset $D_u = \{x_i^{(u)}\}_{i=1}^{N_u}$ and semantic representations $S_u = \{s_i^{(u)}\}_{i=1}^{C_u}$

- Here $N_s$ is the number of seen class examples, $C_s$ is the number of seen classes, $N_u$ is the number of unseen class examples and $C_u$ is the number of unseen classes.

- $S_s$ and $S_u$ are disjoint

Preliminary: Zero-shot learning

- Train a model $(\phi, \psi)$ using the seen class dataset $D_s$ and semantic representations $S_s$

- Define $f_c(x) = \mathrm{sim}(\phi(x), \psi(s_c^{(u)}))$

- Prediction: $y_i^{(u)} = \arg\max_c f_c(x_i^{(u)})$

- Some methods also use the unseen class semantic representations $S_u$ (Liu et al., 2018) or even the unseen class examples $D_u$ (Zhao et al., 2018)

Preliminary: Generalized zero-shot learning

- Standard zero-shot learning: predict a sample's label over the unseen classes only.

- Generalized zero-shot learning: predict a sample's label over both seen and unseen classes.

- For all semantic representations $S' = \{s_i\}_{i=1}^{C_s + C_u}$

- Define $f_c(x) = \mathrm{sim}(\phi(x), \psi(s_c))$

- Predict $y_i = \arg\max_c f_c(x_i)$
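The difference between the two prediction rules can be sketched in a few lines of NumPy. This is only an illustration: the score matrix, the split into seen and unseen class indices, and the helper name `predict` are all assumptions made for the example, not values or code from the paper.

```python
import numpy as np

def predict(scores, candidate_classes):
    """For each row of `scores`, return the candidate class with the highest score."""
    candidate_classes = np.asarray(candidate_classes)
    best = np.argmax(scores[:, candidate_classes], axis=1)
    return candidate_classes[best]

# Toy setup (assumed): classes 0-2 are seen, classes 3-4 are unseen.
seen = [0, 1, 2]
unseen = [3, 4]

# scores[i, c] stands in for f_c(x_i); the numbers are made up.
scores = np.array([
    [0.9, 0.1, 0.2, 0.7, 0.3],  # a seen class (0) has the highest score overall
    [0.2, 0.1, 0.3, 0.4, 0.8],
])

zsl_pred = predict(scores, unseen)          # standard ZSL: argmax over unseen classes only
gzsl_pred = predict(scores, seen + unseen)  # GZSL: argmax over seen and unseen classes

print(zsl_pred)   # [3 4]
print(gzsl_pred)  # [0 4]
```

For the first sample the restricted rule picks class 3, while the generalized rule picks seen class 0. If that sample really came from an unseen class, this is the kind of bias toward seen classes that makes the generalized setting harder.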

GZSL with deep calibration network: Motivation

- Deep learning models are likely to overfit to seen-class examples and to become overconfident on them (predicted probabilities close to 1)

- The model's predictions become uncertain when unseen classes are introduced at test time.

- Overconfidence on seen-class samples and uncertainty on unseen-class samples hurt zero-shot learning accuracy


GZSL with deep calibration network: Prediction function

- Embedding of a sample $x_i$: $\phi(x_i) \in \mathbb{R}^k$

- Embedding of a semantic representation $s_c$: $\psi(s_c) \in \mathbb{R}^k$

- Define $f_c(x_i) = \mathrm{sim}(\phi(x_i), \psi(s_c))$, where sim is a similarity measure such as the inner product or cosine similarity

- Prediction: $y_i = \arg\max_c f_c(x_i)$

- $\phi$ is a CNN (e.g. GoogLeNet-v2, ResNet-101) and $\psi$ is an MLP
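A minimal sketch of the scoring step, assuming the image embeddings are already computed. Random vectors stand in for the CNN output, and a small two-layer ReLU MLP with random weights stands in for ψ; the layer sizes, function names and random inputs are all hypothetical.

```python
import numpy as np

def inner_product(a, b):
    """Pairwise inner products between rows of a (n, k) and rows of b (C, k)."""
    return a @ b.T

def cosine_similarity(a, b, eps=1e-12):
    """Pairwise cosine similarity between rows of a (n, k) and rows of b (C, k)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def mlp_embed(s, w1, b1, w2, b2):
    """Stand-in for psi: a two-layer MLP with a ReLU hidden layer (assumed architecture)."""
    hidden = np.maximum(0.0, s @ w1 + b1)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
n, C, d_attr, k = 4, 5, 6, 3          # samples, classes, attribute dim, embedding dim

phi_x = rng.normal(size=(n, k))       # stand-in for CNN embeddings phi(x_i)
S = rng.normal(size=(C, d_attr))      # stand-in for class attribute vectors s_c
w1, b1 = rng.normal(size=(d_attr, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, k)), np.zeros(k)

psi_s = mlp_embed(S, w1, b1, w2, b2)       # (C, k) class embeddings
scores = cosine_similarity(phi_x, psi_s)   # scores[i, c] plays the role of f_c(x_i)
pred = scores.argmax(axis=1)               # y_i = argmax_c f_c(x_i)
```

Swapping `cosine_similarity` for `inner_product` changes only the scale of the scores: the inner product is unbounded, while cosine similarity lies in [-1, 1].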

GZSL with deep calibration network: Loss function

- Sample $x$'s class probability $q$ over seen classes, where $\tau$ is the temperature

$$q(y = c \mid x) = \frac{\exp(f_c(x)/\tau)}{\sum_{c'=1}^{C_s} \exp(f_{c'}(x)/\tau)} \tag{1}$$

- Let $p(y = c \mid x)$ be the ground truth class probability

- Cross entropy loss $L$

$$L = -\mathbb{E}_x\left[\mathbb{E}_{y \mid x \sim p}\left[\log q(y \mid x)\right]\right] \tag{2}$$

- Using $\tau > 1$ softens the softmax, which mitigates the overconfidence problem on seen-class samples
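A small numerical sketch of Eq. (1) and Eq. (2), assuming one-hot ground truth labels so that the inner expectation reduces to the log-probability of the true class. The scores, labels and function names are made up for illustration.

```python
import numpy as np

def softmax_with_temperature(scores, tau=1.0):
    """Eq. (1): softmax over classes of the scores divided by the temperature tau."""
    z = scores / tau
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(scores, labels, tau=1.0):
    """Eq. (2) with one-hot p: mean negative log-probability of the true class."""
    q = softmax_with_temperature(scores, tau)
    return -np.mean(np.log(q[np.arange(len(labels)), labels]))

# scores[i, c] stands in for f_c(x_i) over three seen classes (made-up values).
scores = np.array([[4.0, 1.0, 0.0],
                   [0.5, 3.0, 0.2]])
labels = np.array([0, 1])

q_sharp = softmax_with_temperature(scores, tau=1.0)
q_soft = softmax_with_temperature(scores, tau=4.0)

print(q_sharp.max(axis=1))  # top-class probabilities at tau = 1
print(q_soft.max(axis=1))   # smaller top-class probabilities at tau = 4
print(cross_entropy(scores, labels, tau=4.0))
```

Dividing the scores by a larger τ shrinks the gaps between them, so the resulting distribution is flatter and the top-class probability moves away from 1.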

GZSL with deep calibration network: Multi-class hinge loss

- Most zero-shot learning methods use the multi-class hinge loss, where $\Delta(y_i, c)$ is 0 when $y_i$ equals $c$ and 1 otherwise.

$$\sum_{i=1}^{N_s} \sum_{c=1}^{C_s} \max\left(0, \Delta(y_i, c) + f_c(x_i) - f_{y_i}(x_i)\right) \tag{3}$$

- If $f_{y_i}(x_i) - f_c(x_i) < \Delta(y_i, c)$, then the hinge term is active and minimizing it amounts to

$$\min_{\phi, \psi} \left[f_c(x_i) - f_{y_i}(x_i)\right] \tag{4}$$

- This paper shows that the cross entropy loss gives better zero-shot classification accuracy than the multi-class hinge loss.
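For comparison with the cross entropy loss, Eq. (3) can be written directly in NumPy. This is a sketch on made-up scores, not the training code of any of the cited methods; the function name is hypothetical.

```python
import numpy as np

def multiclass_hinge_loss(scores, labels):
    """Eq. (3): sum over samples and classes of max(0, Delta + f_c - f_y)."""
    n = scores.shape[0]
    true_scores = scores[np.arange(n), labels][:, None]   # f_{y_i}(x_i), shape (n, 1)
    delta = np.ones_like(scores)                          # Delta(y_i, c) = 1 for c != y_i
    delta[np.arange(n), labels] = 0.0                     # Delta(y_i, y_i) = 0
    margins = np.maximum(0.0, delta + scores - true_scores)
    return margins.sum()

# scores[i, c] stands in for f_c(x_i) (made-up values).
scores = np.array([[2.0, 0.5, 1.5],
                   [0.0, 1.0, 3.0]])
labels = np.array([0, 1])

loss = multiclass_hinge_loss(scores, labels)
print(loss)  # 3.5
```

Only classes that violate the margin contribute: in the first row class 2 adds 0.5, and in the second row class 2 adds 3.0. Every other term is clipped to zero, so those classes produce no gradient.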

GZSL with deep calibration network: Uncertainty calibration

- Sample $x$'s class probability $q_c^{(u)}$ over the unseen classes $S_u$; $f_c^{(u)}(x)$ is the similarity between the embedding of unseen class $c$ and the embedding of sample $x$

$$q_c^{(u)}(y = c \mid x) = \frac{\exp(f_c^{(u)}(x)/\tau)}{\sum_{c'=1}^{C_u} \exp(f_{c'}^{(u)}(x)/\tau)} \tag{5}$$

- Entropy loss $H$

$$H = -\mathbb{E}_x\left[\mathbb{E}_{y \mid x \sim q_c^{(u)}}\left[\log q_c^{(u)}(y \mid x)\right]\right] \tag{6}$$

- Total loss function of DCN

$$\min_{\phi, \psi} \; L + \lambda H + \gamma \Omega(\phi, \psi) \tag{7}$$
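The entropy term in Eq. (6) and the combination in Eq. (7) can be sketched as follows. The batch is assumed to hold the samples that Eq. (6) averages over, and the values of `lam` and `gamma` and the scalar `reg` standing in for Ω(φ, ψ) are hypothetical placeholders.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a stability shift."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def unseen_entropy(unseen_scores, tau=1.0):
    """Eq. (5) and (6): mean entropy of the distribution over unseen classes."""
    q = softmax(unseen_scores / tau)
    return -np.mean(np.sum(q * np.log(q + 1e-12), axis=1))

def total_loss(ce_loss, entropy, reg, lam=0.1, gamma=1e-4):
    """Eq. (7): L + lambda * H + gamma * Omega; lam and gamma are assumed values."""
    return ce_loss + lam * entropy + gamma * reg

# unseen_scores[i, c] stands in for f_c^{(u)}(x_i) over four unseen classes.
uniform = np.zeros((1, 4))                    # no preference among unseen classes
peaked = np.array([[10.0, 0.0, 0.0, 0.0]])    # strong preference for one unseen class

h_uniform = unseen_entropy(uniform)   # equals log(4), the maximum for four classes
h_peaked = unseen_entropy(peaked)     # close to zero

print(h_uniform, h_peaked)
print(total_loss(ce_loss=1.0, entropy=h_peaked, reg=2.0))
```

Minimizing H pushes each sample's distribution over unseen classes away from the uniform case and toward the peaked case, so the model commits to one unseen class rather than spreading probability evenly.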

GZSL with deep calibration network: Experiments

- Datasets

  - Animals with Attributes (AwA): coarse-grained and medium-scale.

  - Caltech-UCSD-Birds-200-2011 (CUB): fine-grained and medium-scale.

  - SUN Attribute (SUN): fine-grained and medium-scale.

  - Attribute Pascal and Yahoo (aPY): coarse-grained and small-scale.


GZSL with deep calibration network: Evaluation protocol

- Per-class classification accuracy

$$\mathrm{ACC}_C = \frac{1}{|C|} \sum_{c \in C} \frac{\#\,\text{correctly predicted samples in class } c}{\#\,\text{samples in class } c} \tag{8}$$

- Generalized zero-shot learning: harmonic mean of the seen and unseen accuracies

$$\mathrm{ACC}_H = \frac{2\,\mathrm{ACC}_{\text{unseen}} \times \mathrm{ACC}_{\text{seen}}}{\mathrm{ACC}_{\text{unseen}} + \mathrm{ACC}_{\text{seen}}} \tag{9}$$
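Both metrics are easy to compute from label arrays. The sketch below uses made-up labels and assumes that classes 0 and 1 are seen and class 2 is unseen. Skipping classes that have no test samples is a choice made for this example, not something the slides specify.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes):
    """Eq. (8): average over classes of the fraction of correctly predicted samples."""
    accs = []
    for c in classes:
        mask = y_true == c
        if mask.any():                      # assumption: skip classes with no samples
            accs.append(np.mean(y_pred[mask] == c))
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """Eq. (9): harmonic mean of the seen and unseen per-class accuracies."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Made-up test labels and predictions.
y_true = np.array([0, 0, 0, 1, 2, 2])
y_pred = np.array([0, 0, 1, 1, 0, 2])

acc_seen = per_class_accuracy(y_true, y_pred, classes=[0, 1])   # (2/3 + 1) / 2
acc_unseen = per_class_accuracy(y_true, y_pred, classes=[2])    # 1/2
acc_h = harmonic_mean(acc_seen, acc_unseen)

print(acc_seen, acc_unseen, acc_h)
```

The harmonic mean is dominated by the smaller of the two accuracies, so a model that does well on seen classes but poorly on unseen ones still scores low.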

GZSL with deep calibration network: Experimental results

- DCN w/o ET: DCN without entropy loss and temperature calibration

- DCN w/o E: DCN without entropy loss


GZSL with deep calibration network: Analysis

- Temperature calibration mitigates the overconfidence problem

Other zero-shot learning papers

- Stacked semantic-guided attention model for fine-grained zero-shot learning (Yu et al., 2018)

- Domain-invariant projection learning for zero-shot recognition (Zhao et al., 2018)

- Feature generating networks for zero-shot learning (Xian et al., 2018)

- Correction networks: meta-learning for zero-shot learning (Anonymous, 2018)

References

Anonymous. Correction networks: Meta-learning for zero-shot learning. https://openreview.net/forum?id=r1xurn0cKQ, 2018.

S. Liu, M. Long, J. Wang, and M. Jordan. Generalized zero-shot learning with deep calibration network. NIPS, 2018.

Y. Xian, B. Schiele, and Z. Akata. Zero-shot learning - the good, the bad and the ugly. arXiv preprint arXiv:1703.04394, 2017.

Y. Xian, T. Lorenz, B. Schiele, and Z. Akata. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.

Y. Yu, Z. Ji, Y. Fu, J. Guo, Y. Pang, and Z. Zhang. Stacked semantic-guided attention model for fine-grained zero-shot learning. arXiv preprint arXiv:1805.08113, 2018.

A. Zhao, M. Ding, J. Guan, Z. Lu, T. Xiang, and J.-R. Wen. Domain-invariant projection learning for zero-shot recognition. arXiv preprint arXiv:1810.08326, 2018.