
Face Recognition at-a-Distance using Texture, Dense- and Sparse-Stereo Reconstruction

Ham M. Rara, Asem A. Ali, Shireen Y. Elhabian, Thomas L. Starr, Aly A. Farag
University of Louisville

{hmrara01,amali003,syelha01,tlstar01,aafara01}@louisville.edu

Abstract

This paper introduces a framework for long-distance face recognition using dense and sparse stereo reconstruction, together with the texture of the facial region. Two methods are used to determine correspondences between the images of the stereo pair: (a) dense global stereo matching using maximum-a-posteriori Markov Random Field (MAP-MRF) algorithms and (b) Active Appearance Model (AAM) fitting of both images of the stereo pair, with the fitted AAM meshes serving as the sparse correspondences. Experiments are performed using combinations of different features extracted from the dense and sparse reconstructions, as well as facial texture. The cumulative match characteristic (CMC) curves generated using the proposed framework confirm the feasibility of the proposed work for long-distance recognition of human faces.

1. Introduction

Face recognition is a challenging task that has been an attractive research area for the past three decades [9]. The main theme of the solutions provided by different researchers involves detecting one or more faces in a given image, followed by facial feature extraction, the result of which can be used for recognition.

Recently, there has been interest in face recognition at-a-distance. Yao et al. [8] created a face video database acquired from long distances and at high magnifications, both indoors and outdoors, under uncontrolled surveillance conditions. They created a comprehensive processing algorithm to deal with image degradations related to long-distance image acquisition and were successful in improving recognition rates. Medioni et al. [4] presented an approach to identify non-cooperative individuals at a distance by inferring 3D shape from a sequence of images. We constructed our own passive stereo acquisition setup and an accompanying database in [6].

Figure 1. Illustration of captured images: (a) 3-meter indoor, (b) 15-meter indoor, (c) 30-meter outdoor, and (d) 50-meter outdoor.

In this paper, we used the same stereo setup to increase our database to a total of 61 subjects. In addition to the previous indoor ranges, we acquired samples at outdoor ranges of 30 and 50 meters (Fig. 1). Experiments are then performed using various combinations of different features extracted from the dense and sparse reconstructions, as well as facial texture.

The paper is organized as follows: Section 2 describes stereo-based reconstruction, Section 3 discusses the features for recognition, Section 4 provides experimental results and related discussion, and Section 5 concludes the paper.

2010 International Conference on Pattern Recognition. 1051-4651/10 $26.00 © 2010 IEEE. DOI 10.1109/ICPR.2010.304

2. Stereo-matching Based Reconstruction

Dense, Global Stereo Matching: The objective of the classical stereo problem is to find the pair of corresponding points p and q that result from the projection of the same scene point (X, Y, Z) onto the two images of the stereo pair. The stereo problem is formulated in a MAP-MRF framework. To handle illumination variation between the two images of a stereo pair, the data term of the energy function to be minimized is modified to use a normalized cross-correlation (NCC) similarity measure. Due to space constraints, further details are deferred to [1].
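The role of NCC in the data term can be illustrated with a minimal sketch. The exact energy function is given in [1], not here, so the mapping from NCC to a cost (`ncc_data_cost`) and the patch-based formulation are assumptions made purely for illustration. The point of the example is that NCC is unchanged by a gain and offset change in intensity, which is why it tolerates illumination differences between the two cameras.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-12):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()  # removing the mean cancels an additive intensity offset
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))  # dividing by the norms cancels a gain
    if denom < eps:
        return 0.0  # flat patch: correlation undefined, treat as uninformative
    return float((a @ b) / denom)

def ncc_data_cost(patch_a, patch_b):
    """Hypothetical data cost in [0, 1]; 0 means perfectly correlated patches."""
    return 0.5 * (1.0 - ncc(patch_a, patch_b))

# Toy patches: the right patch is the left one under a gain and offset change.
left = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
right = 2.0 * left + 15.0
cost = ncc_data_cost(left, right)  # close to 0 despite different intensities
```

A plain absolute or squared intensity difference would assign these two patches a large cost, even though they show the same structure.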

Sparse-Stereo Reconstruction: This approach uses the fitted AAM meshes of both images of the stereo pair as the sparse correspondences [6]. To facilitate a successful fitting process, the AAM mesh is initialized according to detected face landmarks (eyes, mouth center, and nose tip). More details can be found in [5][6].

3. Features for Face Recognition

For face recognition, various features are extracted from the dense and sparse reconstructions, as well as facial texture.

Dense Reconstruction: The reconstruction results for this section are essentially a dense collection of 3D points resembling a face. The method used to classify the dense reconstructions in this paper is the straightforward Procrustes approach [3] between the gallery and the probe.

The Procrustes distance between two shapes is a least-squares type of metric that requires one-to-one correspondence between shapes. After some preprocessing steps, the squared Procrustes distance between two shapes x_1 and x_2 is the sum of squared point distances:

$$P_d^2 = \| x_1 - x_2 \|^2 \tag{1}$$
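Eq. (1) can be sketched in a few lines. The text does not spell out the preprocessing steps, so this sketch assumes the common choice: remove translation by centering, remove scale by normalizing to unit Frobenius norm, and remove rotation with the SVD solution of the orthogonal Procrustes problem. The function names are hypothetical.

```python
import numpy as np

def normalize_shape(x):
    """Center a (n_points, dim) shape at the origin and scale it to unit norm."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x)

def procrustes_distance_sq(x1, x2):
    """Squared Procrustes distance between shapes with one-to-one correspondence."""
    a = normalize_shape(x1)
    b = normalize_shape(x2)
    # Orthogonal matrix r minimizing ||a - b @ r||_F (this may include a
    # reflection; a proper-rotation constraint would need a determinant check).
    u, _, vt = np.linalg.svd(b.T @ a)
    r = u @ vt
    return float(np.sum((a - b @ r) ** 2))  # Eq. (1) after alignment

# Toy check: a rotated, scaled, and shifted copy of a shape is at distance ~0.
rng = np.random.default_rng(0)
shape = rng.normal(size=(68, 3))
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = 2.5 * shape @ rot.T + np.array([1.0, -2.0, 3.0])
d_same = procrustes_distance_sq(shape, moved)
d_other = procrustes_distance_sq(shape, rng.normal(size=(68, 3)))
```

The same function works unchanged on (n_points, 2) arrays, which matches the 2D variant used for the sparse reconstructions.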

Sparse Reconstruction: The graph-cuts approach used in dense reconstruction of stereo pairs can be computationally expensive, especially with high-resolution images. Sparse-stereo reconstruction [6], using the fitted AAM mesh of the stereo pair as correspondences, provides a quick alternative to dense reconstruction. This paper uses the finding in [6] that 2D projections of the 3D sparse reconstructions can provide decent classification via a 2D version of the Procrustes distance.

Texture: To visualize the dense reconstruction results, a pixel from the left/right image of the stereo pair is commonly mapped to each resulting 3D vertex (see Fig. 3). These textures can be used to classify faces, in addition to the two earlier methods. The method used in this paper is the classical principal component analysis (PCA) approach [2].
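As a rough illustration of PCA-based texture matching, the sketch below builds a face space from vectorized gallery textures and identifies a probe by nearest neighbor in that space. The random toy data, the number of components, and the Euclidean nearest-neighbor rule are assumptions for illustration; the paper's actual preprocessing and matching details are not given in this section.

```python
import numpy as np

def fit_face_space(gallery, n_components):
    """gallery: (n_subjects, n_pixels) matrix of vectorized textures."""
    mean = gallery.mean(axis=0)
    # Rows of vt are the principal directions (eigenfaces) of the centered data.
    _, _, vt = np.linalg.svd(gallery - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(faces, mean, basis):
    """Coordinates of one face (1-D) or several faces (2-D) in the face space."""
    return (faces - mean) @ basis.T

def identify(probe, gallery_coeffs, mean, basis):
    """Index of the gallery entry nearest to the probe in the face space."""
    coeffs = project(probe, mean, basis)
    dists = np.linalg.norm(gallery_coeffs - coeffs, axis=1)
    return int(np.argmin(dists)), dists

# Toy data standing in for real textures: 5 subjects, 64 "pixels" each.
rng = np.random.default_rng(1)
gallery = rng.normal(size=(5, 64))
mean, basis = fit_face_space(gallery, n_components=4)
gallery_coeffs = project(gallery, mean, basis)
probe = gallery[3] + 0.01 * rng.normal(size=64)  # noisy view of subject 3
best, dists = identify(probe, gallery_coeffs, mean, basis)
```

With one image per subject, at most n_subjects - 1 components carry information, which is why the toy example keeps 4 of them.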

Figure 2. Distribution of distance (similarity) measures between authentic pairs.

Confidence Levels: For each of the classification approaches (dense, sparse, and texture), we assign a confidence value to the identification result. The first step is to create a probability density function from the set of all possible distance (similarity) measures between authentic pairs (i.e., image pairs that belong to the same subject). Fig. 2 illustrates a real distribution using the sparse-reconstruction approach. Since we are dealing with distance measures, lower values mean good matches; higher values indicate unsure matches. Therefore, given a distance d in Fig. 2, a reasonable measure of confidence is the area under the curve to the right of d. These confidence values are used as scores to be combined later, in Sec. 4, in a multi-classifier architecture.
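The confidence measure described above can be sketched with an empirical distribution. The text describes building a probability density function from authentic-pair distances; this sketch assumes the simplest estimator, the fraction of observed authentic distances larger than d, which equals the area to the right of d under the empirical distribution. The sample values are made up for illustration.

```python
import numpy as np

def confidence(d, authentic_distances):
    """Fraction of authentic-pair distances exceeding d (area to the right of d)."""
    a = np.asarray(authentic_distances, dtype=float)
    return float(np.mean(a > d))

# Hypothetical authentic-pair distances for one classifier.
authentic = [0.1, 0.2, 0.3, 0.4, 0.5]
c_low = confidence(0.05, authentic)   # smaller than every authentic distance
c_mid = confidence(0.25, authentic)   # three of the five samples lie to the right
c_high = confidence(0.60, authentic)  # larger than every authentic distance
```

A small distance therefore maps to a confidence near 1 and a large distance to a confidence near 0, which puts the three classifiers on a common scale before fusion.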

4. Experimental Results

Fig. 3 illustrates the 3D reconstruction results according to the methodology in [1]. The figure contains different views of the reconstruction with and without texture. The reconstructed face contains about 10,000 3D vertices.

Fig. 4 shows stereo reconstruction results of three subjects, visualized with the x-y, x-z, and y-z projections, after rigid alignment to one of the subjects. Notice that in the x-y projections, the similarity (or difference) of 2D shapes coming from the same (or different) subject is visually enhanced. In particular, Subject 1 (probe) is visually more similar to Subject 1 (gallery) than to Subject 2 (gallery) in the x-y projection. This similarity (or difference) is not obvious in the other projections. This is the main reason behind the use of x-y projections as features in Sec. 3 (Sparse Reconstruction). This phenomenon is validated in [5] with the much larger FRGC database to show that it is not a small-sample occurrence.

Experimental Setup: Our current database consists of 61 subjects, with a gallery at 3 meters and four different probe sets: 3-meter and 15-meter indoors, together with 30-meter and 50-meter outdoors. The 33-meter indoor probe set of [6] is replaced here with the 30-meter outdoor set for brevity of results. Fig. 1 illustrates the captured images (left image of the stereo pair) at different ranges.

Figure 3. Dense 3D reconstructions from the 15-meter range.

The features discussed in Sec. 3 are now used for recognition. No training is required for identification using dense- and sparse-stereo reconstructions, i.e., the Procrustes distance is computed directly between the probe and each gallery instance, and the pair with the smallest distance is chosen as the match. For recognition using texture, the face space is determined by the gallery of 61 subjects at the 3-meter range.
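The matching rule above (smallest distance wins) and the cumulative match characteristic curves used to report results can be sketched as follows. The sketch assumes a closed-set protocol in which probe i's true identity is gallery entry i; the distance matrix is a made-up toy example, not data from the paper.

```python
import numpy as np

def cmc_curve(dist):
    """CMC from dist[i, j] = distance between probe i and gallery j.

    Assumes probe i's true match is gallery entry i (closed-set identification).
    Entry k-1 of the result is the fraction of probes whose true match is
    among the k nearest gallery entries.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    true_d = dist[np.arange(n), np.arange(n)]
    # Rank of the true match = 1 + number of gallery entries strictly closer.
    ranks = 1 + np.sum(dist < true_d[:, None], axis=1)
    return np.array([np.mean(ranks <= k) for k in range(1, dist.shape[1] + 1)])

# Toy distances for 3 probes against a 3-subject gallery.
dist = np.array([[0.1, 0.5, 0.9],
                 [0.4, 0.3, 0.2],   # probe 1 is closer to gallery 2 than to 1
                 [0.7, 0.6, 0.1]])
rank1_match = np.argmin(dist, axis=1)  # smallest-distance decision per probe
cmc = cmc_curve(dist)
```

The first entry of `cmc` is the rank-1 recognition rate; the curve reaches 1 at the rank by which every probe's true identity has appeared in the candidate list.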

Three quick observations can be made from Figs. 5-8. Texture alone performs well at short ranges but is the worst at outdoor and farther distances, i.e., it achieves perfect recognition at the 3-meter range but is mediocre at the 30- and 50-meter outdoor ranges. This is expected since illumination variation can severely affect recognition.

Dense reconstruction generally has lower rank-1 recognition rates than its two counterparts. However, it performs better than texture in the outdoor cases (e.g., 30 m and 50 m). Notice that the dense reconstruction results show a marked improvement over the results in [5], which makes use of moments for recognition.

Sparse reconstruction results provide the middle ground between texture and dense reconstruction. Due to the quick computation of sparse-stereo reconstruction, it is an attractive alternative method for recognition.

Multiple-Classifier Architecture: The observation that dense (and sparse) reconstruction reaches perfect recognition at a lower rank than texture in outdoor conditions leads to the design of the multiple-classifier architecture in Fig. 9.

Figure 4. Sparse-stereo reconstruction results.

In this architecture, the top n (e.g., n = 7) candidates from the dense reconstruction classifier are submitted to both the sparse and texture classifiers. (An alternative would be to replace the dense classifier with the sparse one and submit the results to both the dense and texture classifiers. This architecture, along with other possibilities, will be considered in the future.) The final result uses the sum rule of decision fusion [7], weighted by the rank-1 accuracy of each method at a specific range. Notice that the output of the multi-classifier approach is superior to all others, for any distance range in Figs. 5-8.
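The two-stage scheme described above can be sketched as a shortlist followed by a weighted sum. Several details are assumptions here: the text does not state whether the dense score itself enters the final sum (this sketch includes it), the scores are taken to be per-subject confidence values in [0, 1], and the weights stand in for rank-1 accuracies but are invented numbers, not results from the paper.

```python
import numpy as np

def fuse(dense, sparse, texture, weights, n=7):
    """Shortlist the top-n subjects by dense confidence, then apply a weighted sum rule.

    dense, sparse, texture: per-subject confidence scores (higher is better).
    weights: one weight per classifier, e.g. its rank-1 accuracy at this range.
    Returns the index of the winning subject and the shortlist.
    """
    dense, sparse, texture = (np.asarray(s, dtype=float)
                              for s in (dense, sparse, texture))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the fused score stays in [0, 1]
    # Stage 1: indices of the n highest dense confidences.
    shortlist = np.argsort(-dense, kind="stable")[:n]
    # Stage 2: weighted sum of the three scores, restricted to the shortlist.
    fused = (w[0] * dense[shortlist]
             + w[1] * sparse[shortlist]
             + w[2] * texture[shortlist])
    return int(shortlist[np.argmax(fused)]), shortlist

# Toy scores for 4 subjects with a shortlist of 3 (all values hypothetical).
dense = [0.9, 0.8, 0.1, 0.7]
sparse = [0.2, 0.9, 0.99, 0.3]
texture = [0.3, 0.8, 0.99, 0.2]
winner, shortlist = fuse(dense, sparse, texture, weights=[0.5, 0.8, 0.9], n=3)
```

In the toy data, subject 2 has the highest sparse and texture scores but never reaches the second stage because the dense classifier ranks it last, which shows how the choice of first-stage classifier shapes the final decision.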

Figure 5. Cumulative match characteristic (CMC) curves for the 3-meter indoor probe set.

5. Conclusions and Future Work

We have studied the use of texture, sparse-stereo, and dense-stereo reconstructions in the context of long-distance face recognition. With our database of images, we have illustrated the effectiveness of relatively straightforward algorithms, especially when combined in a multi-classifier manner.

A continuing goal of this project is to further increase the database size and include images from as far as 100 meters under challenging imaging conditions. With that in mind, the straightforward PCA for texture recognition will be replaced by more elaborate algorithms.

Figure 6. CMC curves for the 15-meter indoor probe set.

Figure 7. CMC curves for the 30-meter outdoor probe set.

The authors are aware of more sophisticated techniques for pure 3D facial shape recognition. However, at this stage of the project, as long as the stereo reconstruction is done right and input images are devoid of expressions, this work has illustrated that simpler algorithms (e.g., Procrustes) can do the job. As more challenging scenarios are encountered later on, additional novel and existing approaches will be utilized.

Another future step of this project is to use a facial feature localization method that is more sophisticated than the current active appearance model (AAM) approach.

Figure 8. CMC curves for the 50-meter outdoor probe set.

Figure 9. Schematic diagram of the multi-classifier architecture.

References

[1] A. Ali, M. Miller, T. Starr, and A. Farag. Passive stereo-based 3D human face reconstruction at a distance. Technical report, CVIP Lab, Univ. of Louisville, Jan. 2010.

[2] P. N. Belhumeur, J. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):711–720, 1997.

[3] T. Cootes and C. Taylor. Statistical models of appearance for computer vision. Technical report, Univ. of Manchester, UK, Mar. 2004.

[4] G. Medioni, J. Choi, C.-H. Kuo, and D. Fidaleo. Identifying noncooperative subjects at a distance using face images and inferred three-dimensional face models. IEEE Trans. Sys. Man Cyber. Part A, 39(1):12–24, 2009.

[5] H. Rara, S. Elhabian, A. Ali, T. Gault, M. Miller, T. Starr, and A. Farag. A framework for long distance face recognition using dense- and sparse-stereo reconstruction. In ISVC '09: Proceedings of the 5th International Symposium on Advances in Visual Computing, pages 774–783, Berlin, Heidelberg, 2009. Springer-Verlag.

[6] H. Rara, S. Elhabian, A. Ali, M. Miller, T. Starr, and A. Farag. Face recognition at-a-distance based on sparse-stereo reconstruction. Computer Vision and Pattern Recognition Workshop, 0:27–32, 2009.

[7] A. Ross and A. Jain. Information fusion in biometrics. Pattern Recogn. Lett., 24(13):2115–2125, 2003.

[8] Y. Yao, B. R. Abidi, N. D. Kalka, N. A. Schmid, and M. A. Abidi. Improving long range and high magnification face recognition: Database acquisition, evaluation, and enhancement. Comput. Vis. Image Underst., 111(2):111–125, 2008.

[9] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Comput. Surv., 35(4):399–458, 2003.
