Fisher Vector Faces in the Wild - bmva.org

Karen Simonyan
Omkar M. Parkhi
Andrea Vedaldi
Andrew Zisserman

Visual Geometry Group, Department of Engineering Science, University of Oxford



Figure 1: Face Verification (example pairs labelled SAME and DIFFERENT). Given a pair of face images, the task is to verify if both depict the same person. This is a challenging task due to the variations in lighting, pose, facial expression, etc.

Overview. The face verification and recognition domain has mostly been dominated by carefully designed representations based on features computed around numerous facial landmarks [1, 3]. On the other hand, for image classification, image representations are capable of capturing discriminative image information without any domain-specific knowledge by using densely computed features, coupled with a non-linear encoding. For instance, the high-dimensional Fisher vector encoding [6] of SIFT features [5] achieves state-of-the-art performance on several image classification benchmarks [2].

In this paper, we address this discrepancy and make two contributions. First, we show that Fisher vector encoding of dense SIFT, an off-the-shelf image representation, achieves state-of-the-art face verification performance on the challenging “Labeled Faces in the Wild” (LFW) benchmark. Second, we show that the high-dimensional face representation using Fisher vector encoding is amenable to discriminative dimensionality reduction. The resulting compact descriptor has equal or better recognition accuracy and is very well suited to large-scale face recognition tasks.
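To make the role of the compact descriptor concrete, here is a minimal sketch of how a verification decision could be taken once a projection has been learnt. It assumes (this is not stated in the text above) that two faces are compared by the squared Euclidean distance between their projected Fisher vectors and a threshold. The names `project` and `is_same_person`, the toy matrix, and the threshold are all hypothetical.

```python
def project(matrix, vector):
    """Multiply a projection matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def is_same_person(matrix, fv_a, fv_b, threshold):
    """Compare two descriptors in the low-dimensional projected space.

    Assumption: the pair is declared 'same' when the squared Euclidean
    distance between the projected vectors is below the threshold.
    Returns (decision, squared distance).
    """
    a, b = project(matrix, fv_a), project(matrix, fv_b)
    dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return dist_sq < threshold, dist_sq

# Toy 2x4 projection and toy 4-D "Fisher vectors" (illustrative values only).
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5]]
face_a = [1.0, 0.0, 0.0, 0.0]
face_b = [0.8, 0.2, 0.0, 0.0]   # close to face_a after projection
face_c = [0.0, 0.0, 1.0, 1.0]   # far from face_a after projection

print(is_same_person(W, face_a, face_b, threshold=0.25))
print(is_same_person(W, face_a, face_c, threshold=0.25))
```

Because only the projected vectors are needed at comparison time, each face can be stored as a short vector, which is what makes such a descriptor attractive for large-scale use.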

The overview of our face descriptor computation pipeline is shown in Fig. 2. The learnt dimensionality reduction model is visualised in Fig. 3. It is able to automatically extract the discriminative regions of the face.

Results. In the unrestricted setting of LFW, we achieve 93.03 ± 1.05% face verification accuracy, closely matching 93.18 ± 1.07% obtained by the state-of-the-art method [3], based on high-dimensional LBP descriptors sampled around 27 face landmarks. It should be noted that the best result of [3] using SIFT is 91.77%. In the restricted setting of LFW, we achieve the verification accuracy of 87.47 ± 1.49%, setting the new state-of-the-art. This is better than the second best method of [4] by 3.4%.
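Figures of the form 93.03 ± 1.05% summarise accuracy over several test folds. As a minimal sketch, assuming the usual LFW convention of ten folds with the standard error of the mean as the ± term (the text above does not spell this out), the summary could be computed as follows. The per-fold accuracies are made up purely for illustration and are not the paper's numbers.

```python
import math

def mean_and_standard_error(fold_accuracies):
    """Return (mean, standard error of the mean) of per-fold accuracies."""
    n = len(fold_accuracies)
    if n < 2:
        raise ValueError("need at least two folds")
    mean = sum(fold_accuracies) / n
    # Unbiased sample variance (divide by n - 1).
    variance = sum((a - mean) ** 2 for a in fold_accuracies) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical per-fold accuracies in percent; NOT results from the paper.
folds = [92.0, 94.0, 93.0, 91.0, 95.0, 93.0, 92.0, 94.0, 93.0, 93.0]
mean, sem = mean_and_standard_error(folds)
print(f"{mean:.2f} ± {sem:.2f}%")
```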

The ROC curves of our method as well as the other methods are shown in Fig. 4.

[1] T. Berg and P. N. Belhumeur. Tom-vs-Pete classifiers and identity-preserving alignment for face verification. In Proc. BMVC, 2012.

[2] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.

[3] D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. In Proc. CVPR, 2013.

[4] H. Li, G. Hua, J. Brandt, and J. Yang. Probabilistic elastic matching for pose variant face verification. In Proc. CVPR, 2013.

[5] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

[6] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.

[Figure 2 pipeline stages: Facial landmark detection → Aligned and cropped face → Dense SIFT, GMM, and FV → Discriminative dim. reduction → Compact face representation]

Figure 2: Method overview. Given an image, we first run off-the-shelf face and facial landmark detectors to align the face to a canonical frame. Spatially-augmented SIFT features are then densely extracted from the face region and encoded into a high-dimensional Fisher vector face representation. Finally, discriminative dimensionality reduction is learnt on these features to compress them into a compact representation, while improving the discriminative ability.
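The encoding stage of this pipeline can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it follows the improved Fisher vector of [6] (gradients with respect to the means and variances of a diagonal-covariance GMM, then signed square root and L2 normalisation), uses toy 2-D vectors in place of SIFT, uses a hand-set GMM instead of a learnt one, and assumes spatial augmentation means appending patch coordinates scaled to [-0.5, 0.5]. The function names are hypothetical.

```python
import math

def augment_with_location(descriptor, x, y, width, height):
    """Append normalised patch coordinates (an assumed form of augmentation)."""
    return list(descriptor) + [x / width - 0.5, y / height - 0.5]

def fisher_vector(descriptors, weights, means, variances):
    """Encode descriptors with a diagonal GMM; returns 2*K*D values
    (all mean-gradient blocks first, then all variance-gradient blocks)."""
    n, k_count, dim = len(descriptors), len(weights), len(means[0])
    grad_mean = [[0.0] * dim for _ in range(k_count)]
    grad_var = [[0.0] * dim for _ in range(k_count)]
    for x in descriptors:
        # log(w_k * N(x | mu_k, diag(var_k))) for each component k.
        log_p = []
        for k in range(k_count):
            s = math.log(weights[k])
            for d in range(dim):
                s -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                            + (x[d] - means[k][d]) ** 2 / variances[k][d])
            log_p.append(s)
        # Soft assignments (posteriors), computed in a numerically stable way.
        top = max(log_p)
        unnorm = [math.exp(v - top) for v in log_p]
        total = sum(unnorm)
        for k in range(k_count):
            alpha = unnorm[k] / total
            for d in range(dim):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                grad_mean[k][d] += alpha * z
                grad_var[k][d] += alpha * (z * z - 1.0)
    fv = []
    for k in range(k_count):
        fv += [g / (n * math.sqrt(weights[k])) for g in grad_mean[k]]
    for k in range(k_count):
        fv += [g / (n * math.sqrt(2 * weights[k])) for g in grad_var[k]]
    # Improved Fisher vector: signed square root, then L2 normalisation.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv

# Toy data: 2-D "descriptors" at pixel positions in a 100x100 face crop.
patches = [([0.1, 0.2], 10, 20), ([0.9, 1.1], 80, 70),
           ([1.0, 0.8], 60, 90), ([-0.2, 0.1], 30, 10)]
features = [augment_with_location(d, x, y, 100, 100) for d, x, y in patches]
# Hand-set 2-component GMM over the 4-D augmented features (normally learnt).
weights = [0.5, 0.5]
means = [[0.0, 0.0, -0.3, -0.3], [1.0, 1.0, 0.2, 0.3]]
variances = [[1.0, 1.0, 0.1, 0.1], [1.0, 1.0, 0.1, 0.1]]
fv = fisher_vector(features, weights, means, variances)
print(len(fv))  # 2 * K * D = 2 * 2 * 4
```

The output length grows as twice the number of Gaussians times the feature dimension, which is why the representation is high-dimensional and benefits from the dimensionality reduction step that follows.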

Figure 3: Learnt model. A Fisher vector, coupled with discriminative dimensionality reduction, can automatically capture the discriminative parts of the face. (a): an aligned face image; (b): unsupervised GMM clusters densely span the face; (c): a close-up of a face part covered by the Gaussians; (d): 50 Gaussians of the GMM, corresponding to the learnt projection matrix columns with the highest energy; (e): 50 Gaussians corresponding to the learnt projection matrix columns with the lowest energy.
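The ranking behind panels (d) and (e) can be sketched as follows. This is an illustration under assumptions that the caption does not confirm: "energy" is taken to be the sum of squared entries of the projection matrix columns tied to one Gaussian, and the Fisher vector layout is assumed to be all mean blocks followed by all variance blocks. The function names and the toy matrix are hypothetical.

```python
def gaussian_energies(projection, num_gaussians, dim):
    """Sum of squared projection entries over the columns of each Gaussian.

    projection: list of rows, each of length 2 * num_gaussians * dim.
    Assumed layout: mean blocks for all Gaussians, then variance blocks.
    """
    offset = num_gaussians * dim  # index where the variance blocks start
    energies = []
    for k in range(num_gaussians):
        cols = list(range(k * dim, (k + 1) * dim))
        cols += [offset + c for c in cols]
        energies.append(sum(row[c] ** 2 for row in projection for c in cols))
    return energies

def rank_gaussians(projection, num_gaussians, dim):
    """Gaussian indices sorted from highest to lowest energy."""
    energies = gaussian_energies(projection, num_gaussians, dim)
    return sorted(range(num_gaussians), key=lambda k: energies[k], reverse=True)

# Toy projection: 2 output dimensions, 3 Gaussians, 1-D descriptors.
W = [[0.1, 2.0, 0.0, 0.1, 1.0, 0.0],
     [0.0, 1.0, 0.5, 0.2, 0.0, 0.0]]
ranking = rank_gaussians(W, num_gaussians=3, dim=1)
print(ranking)  # highest-energy Gaussian first
```

Taking the first and last 50 entries of such a ranking would give the two sets of Gaussians visualised in the figure.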

[Figure 4 plots. Left panel: "ROC Curves - Unrestricted Setting", legend: Our Method, high-dim LBP, CMD+SLBP, Face.com, CMD, LBP-PLDA, LDML-MKNN. Right panel: "ROC Curves - Restricted Setting", legend: Our Method, APEM-Fusion, V1-like(MKL). Axes in both panels: false positive rate (x, 0 to 0.6), true positive rate (y, 0.5 to 1).]

Figure 4: Comparison with the state-of-the-art. ROC curves of our method and the state-of-the-art techniques in LFW-unrestricted (left) and LFW-restricted (right) settings.
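For readers who want to reproduce curves of this kind, the following minimal sketch shows how ROC points can be obtained from pairwise distances and same/different labels by sweeping a decision threshold. The function name and the toy distances are hypothetical, and the convention that a pair is predicted "same" when its distance is at or below the threshold is an assumption.

```python
def roc_points(distances, same_labels):
    """Return (false positive rate, true positive rate) pairs.

    Each observed distance is tried as a threshold; a pair is predicted
    'same' when its distance is <= threshold.
    """
    positives = sum(1 for s in same_labels if s)
    negatives = len(same_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both same and different pairs")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(distances)):
        tp = sum(1 for d, s in zip(distances, same_labels)
                 if d <= threshold and s)
        fp = sum(1 for d, s in zip(distances, same_labels)
                 if d <= threshold and not s)
        points.append((fp / negatives, tp / positives))
    return points

# Toy pairwise distances and ground truth (True = same person).
distances = [0.2, 0.6, 0.5, 0.9]
labels = [True, True, False, False]
curve = roc_points(distances, labels)
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```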