Generalized Zero-Shot Learning with Deep Calibration Network
Shichen Liu†, Mingsheng Long†, Jianmin Wang†, and Michael I. Jordan‡
†School of Software, Tsinghua University, China
†KLiss, MOE; BNRist; Research Center for Big Data, Tsinghua University, China
‡University of California, Berkeley, Berkeley, USA
Youngnam Kim
Machine Learning Group
Department of Computer Science and Engineering
Pohang University of Science and Technology
2018-11-20
Preliminary: Class semantic representation
I A class semantic representation carries information about the class, such as hand-labeled attribute vectors or text descriptions.
Figure: Attribute vectors on the AwA dataset (Xian et al., 2017)
Figure: Text description on the CUB dataset (Anonymous, 2018)
Preliminary: Zero-shot learning
I Seen-class dataset $D_s = \{(x_i^{(s)}, y_i^{(s)})\}_{i=1}^{N_s}$, where the i-th example's label is $y_i^{(s)} \in \{1, \dots, C_s\}$, with semantic representations $S_s = \{s_i^{(s)}\}_{i=1}^{C_s}$
I Unseen-class dataset $D_u = \{x_i^{(u)}\}_{i=1}^{N_u}$, with semantic representations $S_u = \{s_i^{(u)}\}_{i=1}^{C_u}$
I Here $N_s$ is the number of seen-class examples, $C_s$ is the number of seen classes, $N_u$ is the number of unseen-class examples and $C_u$ is the number of unseen classes.
I $S_s$ and $S_u$ are disjoint
Preliminary: Zero-shot learning
I Train a model $(\phi, \psi)$ using the seen-class dataset $D_s$ and semantic representations $S_s$
I Define $f_c(x) = \mathrm{sim}(\phi(x), \psi(s_c^{(u)}))$
I Prediction: $y_i^{(u)} = \arg\max_c f_c(x_i^{(u)})$
I Some methods also use the unseen-class semantic representations $S_u$ (Liu et al., 2018) or even the unseen-class examples $D_u$ (Zhao et al., 2018) during training
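The prediction rule above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the implementation of any cited paper: the embeddings φ and ψ are replaced by fixed linear maps `W_x` and `W_s` (hypothetical names), and sim is assumed to be cosine similarity.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a (n, k) and rows of b (c, k)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def zsl_predict(x, s_unseen, W_x, W_s):
    """Return, for each row of x, the index of the most similar unseen class."""
    phi = x @ W_x                  # stand-in for phi(x), shape (n, k)
    psi = s_unseen @ W_s           # stand-in for psi(s_c^(u)), shape (C_u, k)
    scores = cosine_sim(phi, psi)  # f_c(x) for every sample/class pair
    return scores.argmax(axis=1)

# Toy data (assumed for illustration): identity maps and 3 unseen classes
W_x = np.eye(3)
W_s = np.eye(3)
s_unseen = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
x = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.2, 0.8]])
pred = zsl_predict(x, s_unseen, W_x, W_s)
```

In a real model `W_x` and `W_s` would be learned networks; only the argmax over unseen-class scores is the point here.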
Preliminary: Generalized zero-shot learning
I Standard zero-shot learning: predict a sample's label over unseen classes only.
I Generalized zero-shot learning: predict a sample's label over both seen and unseen classes.
I Use all semantic representations $S' = \{s_i\}_{i=1}^{C_s + C_u}$
I Define $f_c(x) = \mathrm{sim}(\phi(x), \psi(s_c))$
I Predict $y_i = \arg\max_c f_c(x_i)$
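The only change from standard zero-shot prediction is the label space. A minimal sketch, assuming the embeddings have already been computed and that sim is the inner product; the function name `gzsl_predict` and the toy vectors are illustrative, not from the paper.

```python
import numpy as np

def gzsl_predict(phi_x, psi_seen, psi_unseen):
    """Predict over the union of seen and unseen classes.

    Returned labels index the stacked class list: 0..C_s-1 are seen classes,
    C_s..C_s+C_u-1 are unseen classes.
    """
    psi_all = np.vstack([psi_seen, psi_unseen])  # (C_s + C_u, k)
    scores = phi_x @ psi_all.T                   # f_c(x_i), shape (n, C_s + C_u)
    return scores.argmax(axis=1)

psi_seen = np.array([[1.0, 0.0], [0.0, 1.0]])   # C_s = 2
psi_unseen = np.array([[-1.0, 0.0]])            # C_u = 1
phi_x = np.array([[0.2, 0.9], [-0.8, 0.1]])
labels = gzsl_predict(phi_x, psi_seen, psi_unseen)
is_unseen = labels >= len(psi_seen)             # which samples were assigned an unseen class
```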
GZSL with deep calibration network: Motivation
I Deep learning models tend to overfit the seen-class examples and become overconfident on them (predicted probabilities close to 1)
I The model's predictions become uncertain when unseen classes are introduced at test time.
I Overconfidence on seen-class samples and uncertainty on unseen-class samples both hurt zero-shot learning accuracy
GZSL with deep calibration network: Prediction function
I Embedding of a sample $x_i$: $\phi(x_i) \in \mathbb{R}^k$
I Embedding of a semantic representation $s_c$: $\psi(s_c) \in \mathbb{R}^k$
I Define $f_c(x_i) = \mathrm{sim}(\phi(x_i), \psi(s_c))$, where sim is a similarity measure such as the inner product or cosine similarity
I Prediction: $y_i = \arg\max_c f_c(x_i)$
I $\phi$ is a CNN (e.g. GoogLeNet-v2, ResNet-101) and $\psi$ is an MLP
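GZSL with deep calibration network: Loss function
I Class probability $q$ of a sample $x$ over the seen classes, where $\tau$ is the temperature:
$$q(y = c \mid x) = \frac{\exp(f_c(x)/\tau)}{\sum_{c'=1}^{C_s} \exp(f_{c'}(x)/\tau)} \tag{1}$$
I Let $p(y = c \mid x)$ be the ground-truth class probability
I Cross-entropy loss $\mathcal{L}$:
$$\mathcal{L} = -\mathbb{E}_x\Big[\mathbb{E}_{y \mid x \sim p}\big[\log q(y \mid x)\big]\Big] \tag{2}$$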
I Using $\tau > 1$ softens $q$ and mitigates the overconfidence problem on seen-class samples
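Eqs. (1) and (2) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the ground truth p is assumed to be one-hot, so the cross entropy reduces to the mean of -log q(y_i | x_i), and the scores are made-up numbers. Dividing the scores by a larger temperature flattens q.

```python
import numpy as np

def softmax_with_temperature(scores, tau=1.0):
    """Eq. (1): softmax of scores / tau along the class axis."""
    z = scores / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(scores, labels, tau=1.0):
    """Eq. (2) with one-hot ground truth p: mean of -log q(y_i | x_i)."""
    q = softmax_with_temperature(scores, tau)
    return -np.log(q[np.arange(len(labels)), labels]).mean()

scores = np.array([[4.0, 1.0, 0.0]])   # f_c(x) for one sample, 3 seen classes (toy values)
q_sharp = softmax_with_temperature(scores, tau=1.0)
q_soft = softmax_with_temperature(scores, tau=4.0)
loss = cross_entropy(scores, np.array([0]), tau=1.0)
```

Here `q_soft` assigns a lower maximum probability than `q_sharp`, which is the softening effect of a higher temperature.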
GZSL with deep calibration network: Multi-class hinge loss
I Most zero-shot learning methods use a multi-class hinge loss, where $\Delta(y_i, c)$ is 0 when $y_i = c$ and 1 otherwise:
$$\sum_{i=1}^{N_s} \sum_{c=1}^{C_s} \max\big(0, \Delta(y_i, c) + f_c(x_i) - f_{y_i}(x_i)\big) \tag{3}$$
I If $f_{y_i}(x_i) - f_c(x_i) < \Delta(y_i, c)$, then the hinge term is active and the objective for that pair reduces to
$$\min_{\phi, \psi} \big[ f_c(x_i) - f_{y_i}(x_i) \big] \tag{4}$$
I This paper shows that the cross-entropy loss gives better zero-shot classification accuracy than the multi-class hinge loss.
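For comparison with the cross-entropy loss, Eq. (3) can be written directly in NumPy. A minimal sketch with made-up scores; the function name `multiclass_hinge` is illustrative.

```python
import numpy as np

def multiclass_hinge(scores, labels):
    """Eq. (3): sum over samples and classes of max(0, Delta + f_c - f_y)."""
    n = len(labels)
    f_true = scores[np.arange(n), labels][:, None]  # f_{y_i}(x_i), shape (n, 1)
    delta = np.ones_like(scores)
    delta[np.arange(n), labels] = 0.0               # Delta(y_i, c) = 0 when c == y_i
    return np.maximum(0.0, delta + scores - f_true).sum()

scores = np.array([[2.0, 1.5, 0.0],    # true class 0: class 1 violates the margin
                   [0.0, 3.0, 1.0]])   # true class 1: all margins satisfied
labels = np.array([0, 1])
loss = multiclass_hinge(scores, labels)
```

Only the margin-violating pair in the first row contributes, with max(0, 1 + 1.5 - 2) = 0.5; pairs that already satisfy the margin contribute nothing.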
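GZSL with deep calibration network: Uncertainty calibration
I Class probability $q^{(u)}$ of a sample $x$ over the unseen classes $S_u$; $f_c^{(u)}(x)$ is the similarity between the embedding of unseen class $c$ and the embedding of sample $x$:
$$q^{(u)}(y = c \mid x) = \frac{\exp(f_c^{(u)}(x)/\tau)}{\sum_{c'=1}^{C_u} \exp(f_{c'}^{(u)}(x)/\tau)} \tag{5}$$
I Entropy loss $\mathcal{H}$:
$$\mathcal{H} = -\mathbb{E}_x\Big[\mathbb{E}_{y \mid x \sim q^{(u)}}\big[\log q^{(u)}(y \mid x)\big]\Big] \tag{6}$$
I Total loss function of DCN:
$$\min_{\phi, \psi} \; \mathcal{L} + \lambda \mathcal{H} + \gamma \Omega(\phi, \psi) \tag{7}$$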
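The entropy term and the combined objective can be sketched as below. This is a hedged illustration, not the authors' implementation: the slides do not define Ω, so it is assumed here to be a squared L2 norm of the parameters, and the default values of `lam` and `gamma` are arbitrary placeholders.

```python
import numpy as np

def softmax(scores, tau=1.0):
    """Temperature softmax along the class axis, as in Eq. (5)."""
    z = scores / tau
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy_loss(unseen_scores, tau=1.0):
    """Eq. (6): mean entropy of q^(u)(. | x) over a batch of samples."""
    q = softmax(unseen_scores, tau)
    return -(q * np.log(q + 1e-12)).sum(axis=1).mean()  # small constant guards log(0)

def total_loss(ce, ent, params, lam=0.1, gamma=1e-4):
    """Eq. (7), with Omega ASSUMED to be the squared L2 norm of the parameters."""
    omega = sum((p ** 2).sum() for p in params)
    return ce + lam * ent + gamma * omega

uniform = np.zeros((1, 4))                    # equal scores for 4 unseen classes
peaked = np.array([[10.0, 0.0, 0.0, 0.0]])    # one unseen class clearly preferred
h_uniform = entropy_loss(uniform)
h_peaked = entropy_loss(peaked)
```

A uniform distribution over four unseen classes has entropy log 4, the maximum, while the peaked case is close to zero; minimizing the entropy term therefore pushes the model toward the peaked case.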
GZSL with deep calibration network: Experiments
I Datasets
  I Animals with Attributes (AwA); coarse-grained and medium-scale.
  I Caltech-UCSD-Birds-200-2011 (CUB); fine-grained and medium-scale.
  I SUN Attribute (SUN); fine-grained and medium-scale.
  I Attribute Pascal and Yahoo (aPY); coarse-grained and small-scale.
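GZSL with deep calibration network: Evaluation protocol
I Per-class classification accuracy
$$ACC_C = \frac{1}{|C|} \sum_{c \in C} \frac{\#\,\text{correctly predicted samples in class } c}{\#\,\text{samples in class } c} \tag{8}$$
I Generalized zero-shot learning: harmonic mean of the seen and unseen accuracies
$$ACC_H = \frac{2\, ACC_{unseen} \times ACC_{seen}}{ACC_{unseen} + ACC_{seen}} \tag{9}$$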
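Both metrics are short to compute. A minimal sketch with made-up labels, in which classes 0 and 1 play the role of seen classes and class 2 the role of an unseen class; it assumes every listed class has at least one test sample.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes):
    """Eq. (8): average over classes of the within-class accuracy."""
    accs = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """Eq. (9): harmonic mean of the seen and unseen per-class accuracies."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

y_true = np.array([0, 0, 0, 0, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2])
acc_seen = per_class_accuracy(y_true, y_pred, classes=[0, 1])  # (3/4 + 1/1) / 2
acc_unseen = per_class_accuracy(y_true, y_pred, classes=[2])   # 1/2
acc_h = harmonic_mean(acc_seen, acc_unseen)
```

Averaging per class keeps the large class 0 from dominating the score, and the harmonic mean stays low unless both the seen and unseen accuracies are high.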
GZSL with deep calibration network: Experimental results
I DCN w/o ET: DCN without the entropy loss and temperature calibration
I DCN w/o E: DCN without the entropy loss
GZSL with deep calibration network: Analysis
I Temperature calibration mitigates the overconfidence problem
Other zero-shot learning papers
I Stacked semantic-guided attention model for fine-grained zero-shot learning (Yu et al., 2018)
I Domain-invariant projection learning for zero-shot recognition (Zhao et al., 2018)
I Feature generating networks for zero-shot learning (Xian et al., 2018)
I Correction networks: meta-learning for zero-shot learning (Anonymous, 2018)
References
Anonymous. Correction networks: Meta-learning for zero-shot learning. https://openreview.net/forum?id=r1xurn0cKQ, 2018.
S. Liu, M. Long, J. Wang, and M. Jordan. Generalized zero-shot learning with deep calibration network. NIPS, 2018.
Y. Xian, B. Schiele, and Z. Akata. Zero-shot learning - the good, the bad and the ugly. arXiv preprint arXiv:1703.04394, 2017.
Y. Xian, T. Lorenz, B. Schiele, and Z. Akata. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
Y. Yu, Z. Ji, Y. Fu, J. Guo, Y. Pang, and Z. Zhang. Stacked semantic-guided attention model for fine-grained zero-shot learning. arXiv preprint arXiv:1805.08113, 2018.
A. Zhao, M. Ding, J. Guan, Z. Lu, T. Xiang, and J.-R. Wen. Domain-invariant projection learning for zero-shot recognition. arXiv preprint arXiv:1810.08326, 2018.