
Large-Margin Softmax Loss for Convolutional Neural Networks

Weiyang Liu1*, Yandong Wen2*, Zhiding Yu3 and Meng Yang4

1Peking University 2South China University of Technology 3Carnegie Mellon University 4Shenzhen University *equal contribution

Introduction

Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can help avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.

From Softmax Loss to Large-Margin Softmax Loss

Standard softmax loss can be written as

$$L = \frac{1}{N} \sum_i L_i = \frac{1}{N} \sum_i -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right)$$

where $f_j$ denotes the $j$-th element of the class score vector $f$ and $y_i$ is the label of the $i$-th input $x_i$.
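As a point of reference, the standard softmax (cross-entropy) loss for a single sample can be sketched in a few lines of NumPy; the function name `softmax_loss` and the example scores are illustrative, not from the paper:

```python
import numpy as np

def softmax_loss(f, y):
    """Standard softmax (cross-entropy) loss for one sample.

    f : 1-D array of class scores; y : integer target label.
    """
    f = f - np.max(f)                       # stabilize the exponentials
    log_probs = f - np.log(np.sum(np.exp(f)))
    return -log_probs[y]

scores = np.array([2.0, 1.0, 0.1])
loss = softmax_loss(scores, 0)              # small loss: class 0 has the top score
```

The loss is small when the target class already has the largest score, which is exactly the behavior L-Softmax tightens by demanding a margin.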

Since $f_j$ is the inner product $W_j^T x_i = \|W_j\| \|x_i\| \cos(\theta_j)$, where $\theta_j$ is the angle between $W_j$ and $x_i$, the softmax loss is reformulated as

$$L_i = -\log\left( \frac{e^{\|W_{y_i}\| \|x_i\| \cos(\theta_{y_i})}}{\sum_j e^{\|W_j\| \|x_i\| \cos(\theta_j)}} \right)$$

The large-margin softmax (L-Softmax) loss is formulated as

$$L_i = -\log\left( \frac{e^{\|W_{y_i}\| \|x_i\| \psi(\theta_{y_i})}}{e^{\|W_{y_i}\| \|x_i\| \psi(\theta_{y_i})} + \sum_{j \neq y_i} e^{\|W_j\| \|x_i\| \cos(\theta_j)}} \right)$$

where

$$\psi(\theta) = (-1)^k \cos(m\theta) - 2k, \quad \theta \in \left[ \frac{k\pi}{m}, \frac{(k+1)\pi}{m} \right], \quad k \in [0, m-1]$$

and the integer $m \geq 1$ controls the desired angular margin ($m = 1$ recovers the standard softmax loss).
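The piecewise margin function $\psi(\theta)$ can be implemented directly from its definition; this minimal NumPy sketch (the helper name `psi` is ours) shows how larger $m$ suppresses the target-class logit at a fixed angle:

```python
import numpy as np

def psi(theta, m):
    """Piecewise margin function psi(theta) from the L-Softmax definition:
    psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / np.pi), m - 1)   # which piece theta falls in
    return ((-1) ** k) * np.cos(m * theta) - 2 * k

theta = np.pi / 3
# For m = 1, psi reduces to cos(theta); increasing m lowers psi(theta),
# so the network must shrink theta_{y_i} to keep the loss small.
values = [psi(theta, m) for m in (1, 2, 4)]
```

Note that $\psi$ is continuous and monotonically decreasing in $\theta$, which is what makes the resulting loss well-behaved under SGD.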

Intuition & Geometric Interpretation

The purpose of the L-Softmax loss is to learn discriminative features with a large angular margin. We train a CNN with the L-Softmax loss on the MNIST dataset. The deeply-learned features are visualized in the following figure.

One can observe that the features learned via the L-Softmax loss are indeed more discriminative than those learned via the standard softmax loss.
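Putting the pieces together, the full L-Softmax loss for one sample replaces only the target-class logit $\|W_{y_i}\| \|x_i\| \cos(\theta_{y_i})$ with $\|W_{y_i}\| \|x_i\| \psi(\theta_{y_i})$. The following is an illustrative NumPy sketch, not the authors' implementation; `lsoftmax_loss` and the toy inputs are our own names:

```python
import numpy as np

def lsoftmax_loss(W, x, y, m):
    """L-Softmax loss for a single sample (illustrative sketch).

    W : (num_classes, dim) weight matrix; x : (dim,) feature; y : label.
    Only the target logit gets the margin: cos(theta_y) -> psi(theta_y).
    """
    w_norms = np.linalg.norm(W, axis=1)
    x_norm = np.linalg.norm(x)
    cos = W @ x / (w_norms * x_norm)          # cos(theta_j) for every class
    logits = w_norms * x_norm * cos           # ordinary inner products W_j . x

    theta = np.arccos(np.clip(cos[y], -1.0, 1.0))
    k = min(int(theta * m / np.pi), m - 1)
    psi = ((-1) ** k) * np.cos(m * theta) - 2 * k
    logits[y] = w_norms[y] * x_norm * psi     # margin applied to target class only

    logits -= logits.max()                    # numerical stability
    return -(logits[y] - np.log(np.exp(logits).sum()))
```

With `m = 1` this reduces exactly to the standard softmax loss; for the same `(W, x, y)`, a larger `m` yields a larger loss, forcing training to enlarge the angular gap to the other classes.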

The geometric interpretation is given on the right. L-Softmax produces an angular decision margin between different classes because it imposes a more rigorous classification criterion than the standard softmax loss. The parameter m controls the desired decision margin. L-Softmax can be easily optimized using SGD and can be used in tandem with other regularization methods.

Experiments & Results

[Tables: Accuracy (%) on MNIST · Accuracy (%) on CIFAR10 & CIFAR10+ · Accuracy (%) on CIFAR100 · Verification accuracy (%) on LFW. Figure: confusion matrices on CIFAR10, CIFAR10+ and CIFAR100.]

We perform extensive experiments on visual classification and face verification tasks, achieving state-of-the-art results on the MNIST, CIFAR10, CIFAR100 and LFW public datasets. On all these datasets, we show that classification accuracy improves with larger m, i.e., when the desired decision margin is set larger.

The confusion matrices on CIFAR10, CIFAR10+ and CIFAR100 validate the discriminativeness of the deeply-learned features produced by our proposed L-Softmax loss.
