Generalized Zero-Shot Learning with Deep Calibration Network

Shichen Liu†, Mingsheng Long†, Jianmin Wang†, and Michael I. Jordan‡

†School of Software, Tsinghua University, China
†KLiss, MOE; BNRist; Research Center for Big Data, Tsinghua University, China

‡University of California, Berkeley, Berkeley, USA

Youngnam Kim

Machine Learning Group, Department of Computer Science and Engineering

Pohang University of Science and Technology

2018-11-20

Preliminary: Class semantic representation

• A class semantic representation carries information about the class, such as a hand-labeled attribute vector or a text description.

Figure: Attribute vectors on the AwA dataset (Xian et al., 2017)

Figure: Text description on the CUB dataset (Anonymous, 2018)

Preliminary: Zero-shot learning

• Seen class dataset $D_s = \{(x_i^{(s)}, y_i^{(s)})\}_{i=1}^{N_s}$, where the $i$-th example's label is $y_i^{(s)} \in \{1, \dots, C_s\}$, and semantic representations $S_s = \{s_i^{(s)}\}_{i=1}^{C_s}$

• Unseen class dataset $D_u = \{x_i^{(u)}\}_{i=1}^{N_u}$ and semantic representations $S_u = \{s_i^{(u)}\}_{i=1}^{C_u}$

• Here $N_s$ is the number of seen class examples, $C_s$ is the number of seen classes, $N_u$ is the number of unseen class examples and $C_u$ is the number of unseen classes.

• $S_s$ and $S_u$ are disjoint

Preliminary: Zero-shot learning

• Train a model $(\phi, \psi)$ using the seen class dataset $D_s$ and the semantic representations $S_s$

• Define $f_c(x) = \mathrm{sim}(\phi(x), \psi(s_c^{(u)}))$

• Prediction: $y_i^{(u)} = \arg\max_c f_c(x_i^{(u)})$

• Some methods also use the unseen class semantic representations $S_u$ during training (Liu et al., 2018), or even the unseen class examples $D_u$ (Zhao et al., 2018)
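The prediction rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the embeddings φ(x) and ψ(s_c) are taken as already computed, cosine similarity is chosen as `sim`, and the function names and toy vectors are hypothetical.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(phi_x, class_embeddings):
    """Return the index c that maximizes f_c(x) = sim(phi(x), psi(s_c))."""
    scores = [cosine_sim(phi_x, psi_s) for psi_s in class_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical, already-embedded toy data (k = 3).
unseen_class_embeddings = [
    [1.0, 0.0, 0.0],  # psi(s_1^(u))
    [0.0, 1.0, 0.0],  # psi(s_2^(u))
]
phi_x = [0.2, 0.9, 0.1]  # phi(x^(u)) for one test sample

pred = predict(phi_x, unseen_class_embeddings)
print(pred)  # index of the best-matching unseen class
```

In a real system φ and ψ would be learned networks; here they are replaced by fixed vectors so that only the argmax step is shown.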

Preliminary: Generalized zero-shot learning

• Standard zero-shot learning: predict a sample's label over the unseen classes only.

• Generalized zero-shot learning: predict a sample's label over both seen and unseen classes.

• For all semantic representations $S' = \{s_i\}_{i=1}^{C_s + C_u}$

• Define $f_c(x) = \mathrm{sim}(\phi(x), \psi(s_c))$

• Predict $y_i = \arg\max_c f_c(x_i)$
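The difference between the two settings is only the candidate set of the argmax. The sketch below uses the inner product as `sim` and hypothetical toy embeddings; it shows a test sample that is assigned to the unseen class under the standard setting but to a seen class once seen classes are also candidates.

```python
def dot(u, v):
    """Inner product, used here as the similarity measure."""
    return sum(a * b for a, b in zip(u, v))

def argmax_over(phi_x, class_embeddings, candidates):
    """argmax_c f_c(x), restricted to the class indices in `candidates`."""
    return max(candidates, key=lambda c: dot(phi_x, class_embeddings[c]))

# Hypothetical embeddings psi(s_c) for all classes in S'.
# Indices 0..1 are seen classes (C_s = 2), index 2 is unseen (C_u = 1).
all_class_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.6],
]
seen, unseen = [0, 1], [2]

phi_x = [0.9, 0.5]  # toy embedding of a test sample from the unseen class

zsl_pred = argmax_over(phi_x, all_class_embeddings, unseen)          # unseen only
gzsl_pred = argmax_over(phi_x, all_class_embeddings, seen + unseen)  # seen + unseen
print(zsl_pred, gzsl_pred)
```

With these toy numbers the seen class 0 scores 0.9 and the unseen class scores 0.84, so the generalized prediction moves to a seen class.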

GZSL with deep calibration network: Motivation

• Deep learning models tend to overfit to seen class examples and become overconfident on them (predicted probabilities close to 1)

• The model's predictions become uncertain when unseen classes are introduced at test time.

• Overconfidence on seen class samples and uncertainty on unseen class samples both hurt zero-shot learning accuracy

GZSL with deep calibration network: Prediction function

• Embedding of a sample $x_i$: $\phi(x_i) \in \mathbb{R}^k$

• Embedding of a semantic representation $s_c$: $\psi(s_c) \in \mathbb{R}^k$

• Define $f_c(x_i) = \mathrm{sim}(\phi(x_i), \psi(s_c))$, where sim is a similarity measure such as the inner product or cosine similarity

• Prediction: $y_i = \arg\max_c f_c(x_i)$

• $\phi$ is a CNN (e.g. GoogLeNet-v2, ResNet-101) and $\psi$ is an MLP

GZSL with deep calibration network: Loss function

• Sample $x$'s class probability $q$ over the seen classes, where $\tau$ is the temperature:

$$q(y = c \mid x) = \frac{\exp(f_c(x)/\tau)}{\sum_{c'=1}^{C_s} \exp(f_{c'}(x)/\tau)} \tag{1}$$

• Let $p(y = c \mid x)$ be the ground-truth class probability

• Cross-entropy loss $L$:

$$L = -\mathbb{E}_x\left[\mathbb{E}_{y \mid x \sim p}\left[\log q(y \mid x)\right]\right] \tag{2}$$

• Use $\tau < 1$ to mitigate the overconfidence problem on seen class samples
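A minimal sketch of Eqs. (1) and (2) follows, assuming one-hot ground truth p and a small batch of precomputed scores f_c(x). The function names and numbers are hypothetical. Dividing by τ only rescales the scores: a larger τ flattens q and a smaller τ sharpens it, so the useful range of τ depends on the scale of the similarity measure.

```python
import math

def softmax_with_temperature(scores, tau):
    """q(y=c|x) as in Eq. (1): softmax of the scores divided by tau."""
    scaled = [s / tau for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_scores, labels, tau):
    """Batch estimate of Eq. (2), assuming one-hot ground truth p."""
    losses = []
    for scores, y in zip(batch_scores, labels):
        q = softmax_with_temperature(scores, tau)
        losses.append(-math.log(q[y]))
    return sum(losses) / len(losses)

# Toy scores f_c(x) over C_s = 3 seen classes for two samples.
batch_scores = [[0.9, 0.1, -0.3], [0.2, 0.8, 0.0]]
labels = [0, 1]

q_sharp = softmax_with_temperature(batch_scores[0], tau=0.1)
q_soft = softmax_with_temperature(batch_scores[0], tau=10.0)
loss = cross_entropy(batch_scores, labels, tau=1.0)
print(max(q_sharp), max(q_soft), loss)
```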

GZSL with deep calibration network: Multi-class hinge loss

• Most zero-shot learning methods use a multi-class hinge loss, where $\Delta(y_i, c)$ is 0 when $y_i = c$ and 1 otherwise:

$$\sum_{i=1}^{N_s} \sum_{c=1}^{C_s} \max\left(0, \Delta(y_i, c) + f_c(x_i) - f_{y_i}(x_i)\right) \tag{3}$$

• If $f_{y_i}(x_i) - f_c(x_i) < \Delta(y_i, c)$, then the corresponding term is active and minimizing it amounts to

$$\min_{\phi, \psi} \left[f_c(x_i) - f_{y_i}(x_i)\right] \tag{4}$$

• This paper shows that the cross-entropy loss yields better zero-shot classification accuracy than the multi-class hinge loss.
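For comparison, the multi-class hinge loss of Eq. (3) can be written out directly. The sketch assumes precomputed scores f_c(x_i) and uses hypothetical toy values; only terms whose margin is violated contribute.

```python
def multiclass_hinge(batch_scores, labels):
    """Sum over samples and classes of max(0, Delta(y, c) + f_c - f_y)."""
    total = 0.0
    for scores, y in zip(batch_scores, labels):
        for c, f_c in enumerate(scores):
            delta = 0.0 if c == y else 1.0  # Delta(y_i, c)
            total += max(0.0, delta + f_c - scores[y])
    return total

# Toy scores over C_s = 3 seen classes for two samples.
batch_scores = [[2.0, 0.5, 1.5], [0.0, 3.0, 0.0]]
labels = [0, 1]

hinge = multiclass_hinge(batch_scores, labels)
print(hinge)
```

In this toy batch only one term is active: for the first sample, class 2 scores 1.5 against 2.0 for the true class, which is inside the margin of 1. Once every margin is satisfied the loss is exactly zero and gives no further training signal, unlike the cross-entropy loss.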

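GZSL with deep calibration network: Uncertainty calibration

• Sample $x$'s class probability $q_c^{(u)}$ over the unseen classes $S_u$, where $f_c^{(u)}(x)$ is the similarity between the embedding of unseen class $c$ and the embedding of sample $x$:

$$q_c^{(u)}(y = c \mid x) = \frac{\exp(f_c^{(u)}(x)/\tau)}{\sum_{c'=1}^{C_u} \exp(f_{c'}^{(u)}(x)/\tau)} \tag{5}$$

• Entropy loss $H$:

$$H = -\mathbb{E}_x\left[\mathbb{E}_{y \mid x \sim q_c^{(u)}}\left[\log q_c^{(u)}(y \mid x)\right]\right] \tag{6}$$

• Total loss function of DCN:

$$\min_{\phi, \psi} \; L + \lambda H + \gamma \Omega(\phi, \psi) \tag{7}$$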
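The entropy term and the combined objective can be sketched as follows. Several things here are assumptions and not taken from the slides: the expectation over x is estimated on a batch of training samples, the regularizer Ω is passed in as a precomputed number, and the values of λ, γ and the scores are arbitrary toy choices.

```python
import math

def softmax_with_temperature(scores, tau):
    """q^(u)(y=c|x) as in Eq. (5), computed over unseen-class scores."""
    scaled = [s / tau for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_loss(batch_unseen_scores, tau):
    """Batch estimate of Eq. (6): average entropy of q^(u)(.|x)."""
    total = 0.0
    for scores in batch_unseen_scores:
        q = softmax_with_temperature(scores, tau)
        total += -sum(p * math.log(p) for p in q if p > 0.0)
    return total / len(batch_unseen_scores)

def total_loss(ce, h, reg, lam, gamma):
    """Eq. (7): L + lambda * H + gamma * Omega, with Omega given as `reg`."""
    return ce + lam * h + gamma * reg

# Toy scores f_c^(u)(x) over C_u = 3 unseen classes.
h_uncertain = entropy_loss([[1.0, 1.0, 1.0]], tau=1.0)  # uniform q^(u)
h_confident = entropy_loss([[5.0, 0.0, 0.0]], tau=1.0)  # peaked q^(u)

# Hypothetical values for the cross-entropy term, regularizer and weights.
objective = total_loss(ce=0.7, h=h_uncertain, reg=0.05, lam=0.1, gamma=0.01)
print(h_uncertain, h_confident, objective)
```

A uniform distribution over the three unseen classes gives the maximum entropy log 3, while a peaked one gives a much smaller value, so minimizing H favors confident assignments over the unseen classes.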

GZSL with deep calibration network: Experiments

• Datasets

  • Animals with Attributes (AwA): coarse-grained and medium-scale.

  • Caltech-UCSD-Birds-200-2011 (CUB): fine-grained and medium-scale.

  • SUN Attribute (SUN): fine-grained and medium-scale.

  • Attribute Pascal and Yahoo (aPY): coarse-grained and small-scale.

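GZSL with deep calibration network: Evaluation protocol

• Per-class classification accuracy:

$$ACC_C = \frac{1}{|C|} \sum_{c \in C} \frac{\#\,\text{correctly predicted samples in class } c}{\#\,\text{samples in class } c} \tag{8}$$

• Generalized zero-shot learning: harmonic mean of the seen and unseen accuracies:

$$ACC_H = \frac{2 \, ACC_{unseen} \times ACC_{seen}}{ACC_{unseen} + ACC_{seen}} \tag{9}$$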
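Both metrics are easy to compute from label lists. The sketch below averages accuracy per class and then combines the seen and unseen accuracies with their harmonic mean. The function names and toy labels are hypothetical.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Mean over classes of (correct in class c) / (samples in class c)."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        count[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / count[c] for c in count) / len(count)

def harmonic_mean(acc_unseen, acc_seen):
    """Harmonic mean of the unseen and seen per-class accuracies."""
    if acc_unseen + acc_seen == 0:
        return 0.0
    return 2 * acc_unseen * acc_seen / (acc_unseen + acc_seen)

# Toy test labels: classes 0 and 1 are seen, classes 2 and 3 are unseen.
seen_true, seen_pred = [0, 0, 0, 0, 1], [0, 0, 0, 0, 2]
unseen_true, unseen_pred = [2, 2, 3, 3], [2, 0, 3, 3]

acc_seen = per_class_accuracy(seen_true, seen_pred)
acc_unseen = per_class_accuracy(unseen_true, unseen_pred)
acc_h = harmonic_mean(acc_unseen, acc_seen)
print(acc_seen, acc_unseen, acc_h)
```

On the seen split, plain accuracy would be 4/5 because class 0 dominates, while the per-class average is 0.5. The harmonic mean stays low whenever either of the two accuracies is low, which penalizes models that trade unseen accuracy for seen accuracy.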

GZSL with deep calibration network: Experimental results

• DCN w/o ET: DCN without entropy loss and temperature calibration

• DCN w/o E: DCN without entropy loss

GZSL with deep calibration network: Analysis

• Temperature calibration mitigates the overconfidence problem

Other zero-shot learning papers

• Stacked semantic-guided attention model for fine-grained zero-shot learning (Yu et al., 2018)

• Domain-invariant projection learning for zero-shot recognition (Zhao et al., 2018)

• Feature generating networks for zero-shot learning (Xian et al., 2018)

• Correction networks: meta-learning for zero-shot learning (Anonymous, 2018)

References

Anonymous. Correction networks: Meta-learning for zero-shot learning. https://openreview.net/forum?id=r1xurn0cKQ, 2018.

S. Liu, M. Long, J. Wang, and M. Jordan. Generalized zero-shot learning with deep calibration network. NIPS, 2018.

Y. Xian, B. Schiele, and Z. Akata. Zero-shot learning - the good, the bad and the ugly. arXiv preprint arXiv:1703.04394, 2017.

Y. Xian, T. Lorenz, B. Schiele, and Z. Akata. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.

Y. Yu, Z. Ji, Y. Fu, J. Guo, Y. Pang, and Z. Zhang. Stacked semantic-guided attention model for fine-grained zero-shot learning. arXiv preprint arXiv:1805.08113, 2018.

A. Zhao, M. Ding, J. Guan, Z. Lu, T. Xiang, and J.-R. Wen. Domain-invariant projection learning for zero-shot recognition. arXiv preprint arXiv:1810.08326, 2018.
