Post on 11-Dec-2021

TRANSCRIPT
Proxy Anchor Loss for Deep Metric Learning
Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak
{tjddus9597, kdwon, mscho, suha.kwak}@postech.ac.kr
Metric Learning

How semantically similar or dissimilar are two images?
Metric: a function D(·,·) that quantifies a distance
[Figure: D(x, x⁺) < D(x, x⁻) for a semantically similar pair vs. a dissimilar pair]
Metric Learning: learning a metric from a set of data
Applications

Content-based image retrieval
Face verification/identification[1]

[1] FaceNet: A unified embedding for face recognition and clustering, CVPR 2015
Applications

Person re-identification[2]
Patch matching/stereo imaging[3]

[2] Beyond triplet loss: A deep quadruplet network for person re-identification, CVPR 2017
[3] Learning to compare image patches via convolutional neural networks, CVPR 2015
Deep Metric Learning

[Figure: an embedding network f maps images x to embedding vectors f(x) ∈ ℝ^d]

Learning a deep embedding network f so that semantically similar images are closely grouped together
Distance = semantic dissimilarity
The quality of the embedding space is mainly determined by the loss function used to train the network.
Examples of Metric Learning Losses

• Triplet rank loss[1]

ℓ_tri(a, p, n) = [D(f_a, f_p) − D(f_a, f_n) + δ]₊

Anchor a, positive p, negative n: the loss enforces D(f(x_a), f(x_p)) < D(f(x_a), f(x_n)).
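As a concrete illustration, the triplet rank loss above can be sketched in a few lines of plain Python. This is a toy sketch, not the paper's training code: Euclidean distance stands in for D, and the embeddings are hypothetical values.

```python
import math

def euclidean(u, v):
    # D(u, v): Euclidean distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(f_a, f_p, f_n, delta=0.1):
    # [D(f_a, f_p) - D(f_a, f_n) + delta]_+ : hinge on the distance gap
    return max(euclidean(f_a, f_p) - euclidean(f_a, f_n) + delta, 0.0)

# Toy case: the positive is already much closer than the negative,
# so the hinge is inactive and the loss is zero.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0]))  # 0.0
```

The hinge means a triplet contributes nothing once the margin δ is satisfied, which is why triplet mining matters in practice.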
Examples of Metric Learning Losses

• Proxy-NCA loss[6]

ℓ_proxyNCA(B) = − Σ_{x∈B} log( exp(−D(f_x, p⁺)) / Σ_{p⁻∈P⁻} exp(−D(f_x, p⁻)) )

[Figure: one proxy per class, e.g. "Robert Downey Jr" and "Chris Hemsworth"; each embedding is pulled toward its positive proxy p⁺ and pushed away from the negative proxies p⁻]

[6] No fuss distance metric learning using proxies, ICCV 2017
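A minimal plain-Python sketch of the Proxy-NCA formula above, under stated assumptions: Euclidean distance as D, one (here fixed rather than learned) proxy per class, and hypothetical toy vectors.

```python
import math

def dist(u, v):
    # Euclidean distance between embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def proxy_nca_loss(batch, labels, proxies):
    # -sum_{x in B} log( exp(-D(f_x, p+)) / sum_{p- in P-} exp(-D(f_x, p-)) )
    loss = 0.0
    for f_x, y in zip(batch, labels):
        pos = math.exp(-dist(f_x, proxies[y]))          # positive proxy p+
        neg = sum(math.exp(-dist(f_x, p))               # negative proxies p-
                  for c, p in proxies.items() if c != y)
        loss += -math.log(pos / neg)
    return loss

# Two classes with one proxy each; the sample sits near its own proxy,
# so the ratio inside the log is large and the loss term is low.
proxies = {0: [0.0, 0.0], 1: [2.0, 0.0]}
print(proxy_nca_loss([[0.1, 0.0]], [0], proxies))
```

Note that each data point interacts only with proxies, never with other data points; that is exactly the "impoverished data-to-proxy relations" limitation discussed below.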
Two Categories of Existing Losses

• Pair-based losses
  • (+) Exploit data-to-data relations, i.e. fine-grained relations between data
  • (−) Prohibitively high training complexity
• Examples
  • Contrastive loss[4]
    ℓ_ctr(i, j) = y_ij · D(f_i, f_j)² + (1 − y_ij) · [δ − D(f_i, f_j)]₊²
  • Triplet rank loss[1]
    ℓ_tri(a, p, n) = [D(f_a, f_p) − D(f_a, f_n) + δ]₊
  • N-pair loss[5]
    ℓ_NP(x, x⁺, x₁, …, x_{N−1}) = log(1 + Σ_{i=1}^{N−1} exp(D(f_x, f⁺) − D(f_x, f_i)))

[4] Learning a similarity metric discriminatively with application to face verification, CVPR 2005
[5] Improved deep metric learning with multi-class N-pair loss objective, NeurIPS 2016
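Sketches of the other two pair-based losses in the same plain-Python style, following the distance-based formulation on this slide (the original N-pair paper works with similarities instead); the inputs are hypothetical toy embeddings.

```python
import math

def dist(u, v):
    # Euclidean distance between embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(f_i, f_j, same, delta=1.0):
    # y_ij * D^2 pulls positive pairs together;
    # (1 - y_ij) * [delta - D]_+^2 pushes negative pairs past the margin.
    d = dist(f_i, f_j)
    return d ** 2 if same else max(delta - d, 0.0) ** 2

def n_pair_loss(f_x, f_pos, f_negs):
    # log(1 + sum_i exp(D(f_x, f_pos) - D(f_x, f_neg_i))):
    # small when the positive is closer than every negative.
    d_pos = dist(f_x, f_pos)
    return math.log(1.0 + sum(math.exp(d_pos - dist(f_x, f_n)) for f_n in f_negs))

print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=True))   # 0.25: pull closer
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False))  # 0.25: inside margin
```

Each loss term touches 2 (contrastive), 3 (triplet), or N (N-pair) data points, which is where the O(M²)–O(M³) training complexity in the later analysis comes from.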
Two Categories of Existing Losses

• Proxy-based losses
  • Proxy
    • A representative of a subset of the training data
    • Learned as part of the network parameters
  • Take each data point as an anchor and associate it with proxies
  • (+) Lower training complexity, faster convergence in general
  • (+) More robust against label noise and outliers
  • (−) Leverage only impoverished data-to-proxy relations
• Example: Proxy-NCA loss[6]

ℓ_proxyNCA(B) = − Σ_{x∈B} log( exp(−D(f_x, p⁺)) / Σ_{p⁻∈P⁻} exp(−D(f_x, p⁻)) )
Two Categories of Existing Losses

Pair-based losses (e.g. triplet rank loss, N-pair loss):
  + Data-to-data relations: rich and fine-grained
  − Demanding high training complexity

Proxy-based losses (e.g. Proxy-NCA loss):
  + Data-to-proxy relations: reduced training complexity
  − Impoverished information
Our Method

• A new proxy-based loss called Proxy Anchor loss
  • Keeps the advantages of both categories
  • Overcomes their limitations
• How it works
  • Uses a proxy as an anchor and associates it with all data in a batch
  • Fast convergence thanks to the use of proxies
  • Takes data-to-data relations into account by letting data points interact with each other during training
• Results
  • State-of-the-art performance
  • Fastest convergence (on the Cars-196 dataset)
Details of Proxy Anchor Loss

• Mathematical form and its interpretation

ℓ(B) = (1/|P⁺|) Σ_{p∈P⁺} log(1 + Σ_{x∈B_p⁺} exp(−α(s(f_x, p) − δ)))
     + (1/|P|) Σ_{p∈P} log(1 + Σ_{x∈B_p⁻} exp(α(s(f_x, p) + δ)))

Equivalently,

ℓ(B) = (1/|P⁺|) Σ_{p∈P⁺} SoftPlus(LSE_{x∈B_p⁺}(−α(s(f_x, p) − δ)))
     + (1/|P|) Σ_{p∈P} SoftPlus(LSE_{x∈B_p⁻}(α(s(f_x, p) + δ)))

P: all proxies; P⁺: proxies of classes present in the batch; B_p⁺ / B_p⁻: embeddings in batch B that are positive / negative for proxy p
s(·,·): cosine similarity
SoftPlus: a smooth approximation of ReLU
LSE (log-sum-exp): a smooth approximation of MAX
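The two-term loss above can be sketched directly in plain Python. This is a toy sketch of the formula rather than the authors' training code: cosine similarity as s, one proxy per class (as in the experiments later), and hypothetical batch, label, and proxy values.

```python
import math

def cosine(u, v):
    # s(u, v): cosine similarity between an embedding and a proxy
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def proxy_anchor_loss(batch, labels, proxies, alpha=32.0, delta=0.1):
    # P+ : proxies whose class appears in the batch; P: all proxies.
    # B_p+ / B_p- : batch embeddings that are positive / negative for proxy p.
    p_plus = set(labels)

    pos = 0.0
    for c in p_plus:   # pull each proxy toward its (hardest) positives
        s = [cosine(x, proxies[c]) for x, y in zip(batch, labels) if y == c]
        pos += math.log(1.0 + sum(math.exp(-alpha * (si - delta)) for si in s))

    neg = 0.0
    for c in proxies:  # push each proxy away from its (hardest) negatives
        s = [cosine(x, proxies[c]) for x, y in zip(batch, labels) if y != c]
        neg += math.log(1.0 + sum(math.exp(alpha * (si + delta)) for si in s))

    return pos / len(p_plus) + neg / len(proxies)

# Sanity check: correctly clustered embeddings yield a much lower loss
# than the same embeddings with their labels swapped.
proxies = {0: [1.0, 0.0], 1: [0.0, 1.0]}
batch = [[1.0, 0.01], [0.01, 1.0]]
print(proxy_anchor_loss(batch, [0, 1], proxies) <
      proxy_anchor_loss(batch, [1, 0], proxies))  # True
```

Because every batch embedding appears inside each proxy's log-sum-exp, the data points compete with one another for gradient, which is how the loss recovers data-to-data relations.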
Details of Proxy Anchor Loss

• Mathematical form and its interpretation (cont.)

ℓ(B) = (1/|P⁺|) Σ_{p∈P⁺} SoftPlus(LSE_{x∈B_p⁺}(−α(s(f_x, p) − δ))) + (1/|P|) Σ_{p∈P} SoftPlus(LSE_{x∈B_p⁻}(α(s(f_x, p) + δ)))

Regarding LSE as MAX: pull p and its hardest positive example together, and push p and its hardest negative example apart.
In practice, all embedding vectors in the batch are pulled/pushed, but with different strengths determined by their relative hardness.
Details of Proxy Anchor Loss

• Analysis of its gradients

∂ℓ(B)/∂s(f_x, p) =
  (1/|P⁺|) · (−α · h_p⁺(f_x)) / (1 + Σ_{x′∈B_p⁺} h_p⁺(f_{x′})),  ∀x ∈ B_p⁺
  (1/|P|)  · ( α · h_p⁻(f_x)) / (1 + Σ_{x′∈B_p⁻} h_p⁻(f_{x′})),  ∀x ∈ B_p⁻

where
  h_p⁺(f) = exp(−α(s(f, p) − δ)): positive hardness metric
  h_p⁻(f) = exp(α(s(f, p) + δ)): negative hardness metric

The gradient w.r.t. f_x is affected by the other examples in the batch (it becomes larger when f_x is harder than the others).
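The gradient expression can be verified numerically. The sketch below (plain Python, positive term of a single proxy so |P⁺| = 1, hypothetical similarity values) compares the analytic gradient with a central finite difference:

```python
import math

ALPHA, DELTA = 32.0, 0.1

def h_pos(s):
    # positive hardness metric: h_p+(f) = exp(-alpha * (s(f, p) - delta))
    return math.exp(-ALPHA * (s - DELTA))

def pos_term(sims):
    # positive part of the loss for one proxy: log(1 + sum_x h_p+(f_x))
    return math.log(1.0 + sum(h_pos(s) for s in sims))

def analytic_grad(sims, k):
    # d pos_term / d s_k = -alpha * h_p+(s_k) / (1 + sum_x' h_p+(s_x'))
    return -ALPHA * h_pos(sims[k]) / (1.0 + sum(h_pos(s) for s in sims))

def numeric_grad(sims, k, eps=1e-6):
    up = sims[:k] + [sims[k] + eps] + sims[k + 1:]
    dn = sims[:k] + [sims[k] - eps] + sims[k + 1:]
    return (pos_term(up) - pos_term(dn)) / (2.0 * eps)

sims = [0.3, 0.6, 0.9]  # similarities of three positives to their proxy
for k in range(len(sims)):
    assert abs(analytic_grad(sims, k) - numeric_grad(sims, k)) < 1e-4

# The first sample is the hardest (lowest similarity), so its gradient
# magnitude dominates the other two - the relative-hardness weighting.
assert abs(analytic_grad(sims, 0)) > abs(analytic_grad(sims, 1))
```

The denominator shared across the batch is what couples the examples: making one positive harder shrinks the gradient received by every other positive of that proxy.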
Comparison to Proxy-NCA

Positive examples:
  Proxy-NCA: uniform scale for all gradients
  Proxy Anchor: scales weighted by relative hardness
Negative examples:
  Proxy-NCA: pushes only a small number of data with uniform strength
  Proxy Anchor: pushes all data with consideration of their distribution
Complexity Analysis

Type  | Loss                | Training complexity
------|---------------------|--------------------
Proxy | Proxy Anchor        | O(MC)
Proxy | Proxy-NCA[6]        | O(MC)
Proxy | SoftTriple[8]       | O(MCK²)
Pair  | Contrastive[4]      | O(M²)
Pair  | Triplet[1]          | O(M³)
Pair  | N-pair[5]           | O(M³)
Pair  | Lifted Structure[7] | O(M³)

M: # of data, C: # of classes (C ≪ M), K: # of proxies per class

Proxy Anchor has the same complexity as Proxy-NCA, but converges faster and performs better since it considers the relative hardness of data.

[7] Deep metric learning via lifted structured feature embedding, CVPR 2016
[8] SoftTriple loss: Deep metric learning without triplet sampling, ICCV 2019
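The gap between the complexity classes in the table is easy to make concrete by counting loss terms for a hypothetical dataset size:

```python
from math import comb

# Hypothetical dataset: M data points, C classes (C << M).
M, C = 10_000, 100

triplets = comb(M, 3)   # triplet loss: one term per triple of points, O(M^3)
pairs = comb(M, 2)      # contrastive loss: one term per pair, O(M^2)
proxy_terms = M * C     # proxy-based losses: each point vs. each proxy, O(MC)

# Proxy-based term count is orders of magnitude below the pair-based counts.
assert proxy_terms < pairs < triplets
print(proxy_terms, pairs, triplets)  # 1000000 49995000 166616670000
```

In practice pair-based methods never enumerate all tuples, but the counts explain why they depend so heavily on sampling strategies while proxy-based losses do not.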
Experiments

• Evaluation on 4 image retrieval benchmarks
  • Caltech-UCSD Birds 200 (CUB-200-2011)
  • Cars-196
  • Stanford Online Products (SOP)
  • In-Shop Clothes Retrieval (In-Shop)
• Proxy setting: 1 proxy per class
• Image setting
  • Default: 224 × 224 (as in most previous work)
  • Larger: 256 × 256 (for comparison to HORDE[9])
• Hyper-parameters: α = 32, δ = 10⁻¹

[9] High-order regularizer for deep embeddings, ICCV 2019
Experiments

• Quantitative results on the SOP (left) and In-Shop (right)

Our method achieves state-of-the-art performance in almost all settings on all 4 benchmarks.
Experiments

• Impact of hyper-parameters

[Figures: accuracy vs. embedding dimension; accuracy vs. α and δ]

Performance is stable and high when the embedding dimension ≥ 128 and α ≥ 16.
Conclusion

• Contributions
  • A new proxy-based metric learning loss
  • State-of-the-art performance
  • Fastest convergence
• Future directions
  • Analysis of generalizability
  • Improving test-time efficiency
References
[1] FaceNet: A unified embedding for face recognition and clustering, CVPR 2015
[2] Beyond triplet loss: A deep quadruplet network for person re-identification, CVPR 2017
[3] Learning to compare image patches via convolutional neural networks, CVPR 2015
[4] Learning a similarity metric discriminatively with application to face verification, CVPR 2005
[5] Improved deep metric learning with multi-class N-pair loss objective, NeurIPS 2016
[6] No fuss distance metric learning using proxies, ICCV 2017
[7] Deep metric learning via lifted structured feature embedding, CVPR 2016
[8] SoftTriple loss: Deep metric learning without triplet sampling, ICCV 2019
[9] High-order regularizer for deep embeddings, ICCV 2019