


An Exploration of Gender Identification Using Only the Periocular Region

Jameson Merkow, Brendan Jou, and Marios Savvides

Abstract— The periocular region, the region of the face surrounding the eyes, has gained increasing attention in biometrics in recent years. This region of the face is of particular interest when trying to identify a person whose face is partially occluded. We propose the novel idea of applying the information obtained from the periocular region to identify the gender of a person, which is a type of soft biometric recognition. We gradually narrow the region of interest of the face to explore the feasibility of using smaller, eye-centered regions for building a robust gender classifier around the periocular region alone. Our experimental results show that at least an 85% classification rate is still obtainable using only the periocular region with a database of 936 low-resolution images collected from the web.

I. INTRODUCTION

Classical methods in facial biometrics have thrived on considering the entire face as the region of interest (ROI). The performance of these methods suffers greatly when portions of the face are occluded. Recent strides have investigated narrowing the principal region for biometric identification to the area surrounding the eyes, referred to here as the periocular region [9]. Brown et al. [4] claim that in human perception, in descending order of importance, the brow and eyes, brow alone, eyes alone, jaw alone, chin alone, nose and mouth, and mouth alone carry information relevant for gender discrimination. The first facial features, the brow and eyes, precisely define the scope of the periocular region, and the study suggests that this region contains sufficient and primary information for reliably identifying gender. In this work, we explore the use of increasingly narrowed ROIs, from the full face to periocular, for algorithmically recognizing this widely-studied soft biometric.

Previous works in gender identification have been shown to perform with high accuracy (> 95%) given a database with controlled environments, pose, and focal length. Abdi et al. [1] compared raw image components from principal component analysis (PCA), using Radial Basis Function and perceptron networks, to identify discriminating attributes between the sexes. Applying PCA-based features to a database of 160 images, an accuracy of 91.8% was reported using perceptron networks. Moghaddam et al. [7] showed that, using the 1755 images from the Facial Recognition Technology (FERET) database, classification rates as high as 96.6% can be achieved with kernel-based support vector machines (SVM).

J. Merkow and B. Jou are with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. E-mail: {jmerkow, bjou}@andrew.cmu.edu

M. Savvides is a Professor with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. E-mail: [email protected]

II. IMPLEMENTATION

A. Database

Our working database consists of JPEG images retrieved from Flickr® [12] using a web crawler. Our source images vary considerably with regard to image resolution, pose, focal length, et cetera. As a result, we select a subset of the images to highlight our target ROIs, considering only images of faces with frontal and near-frontal poses (∼5 degrees), and where the faces contain minimal occlusions and pitch. We use a total of 936 images in this study, comprised of 468 male and 468 female.

(a) Male Subjects.

(b) Female Subjects.

Fig. 1. Database Subset.

978-1-4244-7580-3/10/$26.00 ©2010 IEEE


B. Regions of Interest

In order to isolate our ROIs, we first normalize rotation by orienting each face upright with respect to the eyes. The goal of this pre-processing stage is to make the images robust against rotation of the face.

Fig. 2. Image Normalization.

The rotation necessary to normalize each face's orientation was computed using seven manually labeled key points corresponding to landmarks on the face. Points 1 through 4 are placed at the four extrema at the corners of the eyes; the remaining points correspond to the tip of the nose, the center of the upper lip, and the chin, as shown in Figure 3(a). The necessary rotation angle can then be computed from the eye centroids, using the first four facial landmarks. In addition to rotation invariance, the focal length of the faces was also normalized by fixing the horizontal distance from eye to eye, using the outer extrema (points 1 and 4). This automatically compensates for variations in distance from the camera or zooming effects, a key component to performance when considering the periocular region.
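As a rough illustration of this alignment step, the rotation angle and scale factor can be computed from the four eye-corner landmarks as follows. This is our own minimal sketch; the function name, point ordering, and target eye distance are assumptions, not taken from the paper.

```python
import numpy as np

def eye_alignment(points, target_eye_dist=64.0):
    """Rotation angle and scale factor from the four eye-corner landmarks.

    points: sequence of four (x, y) pairs -- outer-left, inner-left,
    inner-right, outer-right eye corners (our assumed ordering).
    Returns (angle_deg, scale): rotating the image by -angle_deg levels
    the eye line, and scale fixes the outer corner-to-corner distance
    to target_eye_dist pixels.
    """
    pts = np.asarray(points, dtype=float)
    left_eye = pts[0:2].mean(axis=0)    # centroid of the left-eye corners
    right_eye = pts[2:4].mean(axis=0)   # centroid of the right-eye corners
    dx, dy = right_eye - left_eye
    angle_deg = np.degrees(np.arctan2(dy, dx))    # tilt of the eye line
    outer_dist = np.linalg.norm(pts[3] - pts[0])  # points 1 and 4
    scale = target_eye_dist / outer_dist
    return angle_deg, scale
```

Applying the returned rotation and scale to every face yields the rotation- and zoom-normalized images the crops are taken from.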

Following normalization, the images from the database were divided into training and test sets with a ratio of 3:1, yielding 702 images for training and 234 for testing. After splitting the database, each image was resized and cropped from its original to a specific ROI. Each image was cropped to four regions using the seven key facial landmark points. These ROIs, referred to as "Full" and "Regions 1–3," are depicted in Figure 3(b). In this work, we designate Region 3 as the periocular region.

Each ROI is defined by four bounds: left, right, upper, and lower. For consistency, we fix the left and right bounds of all the ROIs. These bounds were placed at the left and right perimeter of the face, respectively, with slight inclusion of the ear. The upper and lower bounds for the Full crop were derived from the vertical eye-to-chin distance, that is, from the eye line to the chin landmark (point 7). In Region 1, the upper bound is demarcated using the vertical eye-to-nose distance from the eye line, to remove information in the head or hair region. We then remove the mandible in Region 2 by moving the bottom bound to the nose landmark (point 5). In the periocular Region 3, the ROI is reduced to a box centered around the eye line, with 50% of the vertical eye-to-nose distance above and below the eye line, resulting in a tight periocular area.

(a) Seven Face Points. (b) Regions of Interest.

Fig. 3. Face Landmarks and Crops.
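The crop bounds described above can be sketched as follows, assuming an upright, aligned face with y increasing downward. The Full crop's upper bound is only loosely specified in the text, so the reading below (one eye-to-chin distance above the eye line) is our assumption, as are the helper's name and arguments.

```python
def roi_bounds(eye_y, nose_y, chin_y, left_x, right_x):
    """Bounding boxes (left, right, top, bottom) for the four ROIs.

    eye_y is the eye line, nose_y the nose tip (point 5), chin_y the
    chin (point 7); left_x/right_x are the shared horizontal bounds.
    """
    d_chin = chin_y - eye_y  # vertical eye-to-chin distance
    d_nose = nose_y - eye_y  # vertical eye-to-nose distance
    return {
        # Full: both vertical bounds derived from the eye-to-chin
        # distance; taking that distance above the eye line for the
        # top bound is our reading, not a rule stated in the paper.
        "Full": (left_x, right_x, eye_y - d_chin, chin_y),
        # Region 1: head/hair removed via the eye-to-nose distance.
        "Region1": (left_x, right_x, eye_y - d_nose, chin_y),
        # Region 2: mandible removed; bottom bound at the nose.
        "Region2": (left_x, right_x, eye_y - d_nose, nose_y),
        # Region 3 (periocular): box about the eye line, +/- 50% of
        # the vertical eye-to-nose distance.
        "Region3": (left_x, right_x, eye_y - 0.5 * d_nose, eye_y + 0.5 * d_nose),
    }
```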

After the region refinement, each crop was resized to the resolution given for its respective region in Table I. All images were resized using bilinear interpolation with no anti-aliasing, other pre-filtering, or dithering. The intent of applying these scaling methods lies in their computational simplicity and in demonstrating performance even when using low-quality images.

TABLE I
ROI AND IMAGE RESOLUTIONS

ROI        Resolution
Full       64 × 110
Region 1   64 × 68
Region 2   64 × 32
Region 3   64 × 18
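For illustration, plain bilinear interpolation without anti-aliasing or pre-filtering can be implemented directly. This sketch is ours, not the authors' code:

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Plain bilinear resize of a 2-D grayscale array.

    No anti-aliasing, pre-filtering, or dithering is applied: each
    output pixel is mapped back into the input grid and interpolated
    from its four surrounding pixels.
    """
    in_h, in_w = img.shape
    # sample positions of output pixels in input coordinates
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    img = img.astype(float)
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

A Full crop would be passed through this with `out_h=110, out_w=64`, and similarly for the other rows of Table I.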

C. Gender Classification

For feature extraction, normalized pixel gray-scale intensities of the image are used as a baseline feature. We also study the use of local binary patterns (LBPs) for classification. LBPs are a well-known distribution-based descriptor, often used in matching algorithms; their application here is to identify gender. In a 3 × 3 LBP transform, each pixel is compared with its eight adjacent pixels. If a neighboring pixel value is greater than the center value, the neighboring pixel is assigned 1; otherwise, 0 is assigned. Given the eight neighboring pixels, an 8-bit binary number can be formed and converted to decimal for a gray-scale value (0–255) [8].

Fig. 4. Local Binary Pattern.
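The 3 × 3 LBP transform described above can be sketched as follows. The strictly-greater-than comparison follows the text; the bit ordering (clockwise from the top-left neighbor) is our own convention, since the paper does not fix one.

```python
import numpy as np

def lbp_3x3(img):
    """3x3 LBP transform: each interior pixel maps to a value in 0..255.

    A neighbor strictly greater than the center contributes a 1-bit;
    the eight bits form the pixel's LBP code.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=int)
    centre = img[1:h - 1, 1:w - 1]
    # neighbor window offsets, clockwise from the top-left corner
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[dy:dy + h - 2, dx:dx + w - 2]
        out += (neigh > centre).astype(int) << bit
    return out
```

The resulting codes are used as the image's feature values in place of the raw gray-scale intensities.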

Gender identification is formulated as a binary classification problem, and popular pattern recognition techniques are applied to categorize subjects as either male or female.


(a) Male Subjects. (b) Female Subjects.

Fig. 5. Full ROI Subset.

We study this classification using three algorithms: linear discriminant analysis (LDA) with a rank-1 nearest neighbor (NN) classifier, principal component analysis (PCA) with a rank-1 NN classifier, and support vector machines (SVM). We compare these techniques using a 5-fold cross-validation (CV) scheme.

LDA is a well-known statistical method used in many high-dimensional pattern recognition problems. LDA projects data onto a hyperplane that minimizes the within-class scatter and maximizes the between-class scatter. The projection hyperplane is an important indicator of the most discriminant features of the data. In facial recognition, this plane is often referred to as a Fisherface and is used to visualize the regions that have the highest discriminating value. In this work, LDA is performed by first reducing dimensionality using PCA, to combat the singularity of the total within-class scatter matrix that results from the feature space dimensionality exceeding the number of available training samples [2]. The reduced feature space consists of the top 250 principal components before applying the LDA projections.
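A minimal sketch of this PCA-then-LDA pipeline for the two-class case is given below. It is our own illustration, not the authors' implementation; the small ridge term added for numerical stability is also an assumption.

```python
import numpy as np

def pca_lda_direction(X, y, k=250):
    """Two-class Fisher LDA computed after a PCA reduction to k components.

    X: (n_samples, n_features) row-vector data; y: 0/1 labels.
    Returns the discriminant direction mapped back to the original
    feature space (a "Fisherface" when reshaped to image dimensions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD; keep at most k components
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, Vt.shape[0])
    P = Vt[:k]                 # (k, n_features) principal-component basis
    Z = Xc @ P.T               # data in the reduced space
    m0 = Z[y == 0].mean(axis=0)
    m1 = Z[y == 1].mean(axis=0)
    # pooled within-class scatter in the reduced space
    Sw = sum((Zi - mi).T @ (Zi - mi)
             for Zi, mi in ((Z[y == 0], m0), (Z[y == 1], m1)))
    # Fisher direction; tiny ridge guards against a singular Sw
    w = np.linalg.solve(Sw + 1e-6 * np.eye(k), m1 - m0)
    return P.T @ w             # map back to pixel space
```

Projecting training samples onto the returned direction and assigning test samples by a rank-1 NN rule in that projected space mirrors the LDA-NN scheme described in the text.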

SVM is a powerful supervised learning algorithm, used extensively in binary classification problems. SVM improves on simple regression by finding the hyperplane that maximizes the distance (margin) from the closest elements (support vectors) to the hyperplane separating them. We use the popular implementation SVMlight by Joachims [6], which takes advantage of sparseness in the feature vectors to optimize computation. We utilize linear SVMs to perform gender classification.

III. EXPERIMENTAL RESULTS

Subsequent to the pre-processing stages on the data set, the resulting images for training and testing consist of grayscale images with slight blurring due to bilinear interpolation. A subset of these images, in Figures 5 and 6, shows the Full and Region 3 crops, respectively.

(a) Male Subjects.

(b) Female Subjects.

Fig. 6. Region 3 ROI Subset.

PCA with a rank-1 NN method is used as the baseline classification scheme and performs best, with 80% accuracy, using LBP features in Region 2. In nearly all other trials, however, this method performs with an accuracy rate of 72.5% to 75.1%. This performance is decidedly lower than the classification rates achieved by LDA and SVM.

With pixel-based features, LDA performs optimally in the Full region with an 82.9% accuracy rate, while SVM performs 3.9% better, with 86.8% accuracy. The optimal SVM performance is achieved by executing several CV trials, varying the regularization/slack term C between 0.1 and 1000 in each trial. The linear SVM performs optimally with a slack value of 50.
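The CV sweep over the slack term can be sketched as a generic harness. This is our illustration, with the classifier left pluggable (the paper itself used SVMlight); the fold-assignment scheme and grid values are assumptions beyond the stated 0.1–1000 range.

```python
import numpy as np

def cross_validate_C(X, y, fit_predict,
                     C_grid=(0.1, 1, 10, 50, 100, 1000), folds=5):
    """Mean k-fold CV accuracy for each slack value C.

    fit_predict(X_tr, y_tr, X_te, C) must return predicted labels for
    X_te after training on (X_tr, y_tr) with slack C. Returns a dict
    mapping each C to its mean accuracy across folds.
    """
    y = np.asarray(y)
    idx = np.arange(len(y))
    fold_ids = idx % folds  # simple deterministic fold assignment
    scores = {}
    for C in C_grid:
        accs = []
        for f in range(folds):
            tr, te = idx[fold_ids != f], idx[fold_ids == f]
            pred = fit_predict(X[tr], y[tr], X[te], C)
            accs.append(np.mean(pred == y[te]))
        scores[C] = float(np.mean(accs))
    return scores
```

The best C is then the argmax of the returned dictionary; on this data that sweep selected C = 50.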

When the ROI is reduced to Region 3, the performance of LDA falls by 2.9%, still achieving ∼80% accuracy. The accuracy of the SVM classifier, under the same circumstances, drops by only 2.1%, to 84.9%. The test accuracies are further supported by receiver operating characteristic (ROC) curves generated from the testing. The ROC curves show a distinct drop in accuracy from the Full to the Region 3 ROIs, but the SVM ROC curves show only a small drop.

Several intermediate ROIs exhibit peculiar behavior. In almost all trials, we note that Region 1 has a lower accuracy rate than Region 2; this result is counterintuitive, given that Region 1 includes the jaw and mouth. With the LDA-NN classifier, Region 1 and Region 2 perform with 77.1% and 83.3% accuracy, an increase of 6.2%. In our trials using SVM with Region 1 and Region 2, there is an increase of 1.7%, from 83.5% to 85.2%. This behavior is observed in CV performance, as well as in the ROC curves for trials involving both LDA and SVM.

LBP features boost accuracy for both LDA and SVM when used on the full face, increasing by 0.7% to 83.6% and 85.7%, respectively. However, this increase does not apply to all regions. When comparing the periocular Region 3 statistics, the performances of the pixel- and LBP-based features are approximately the same with LDA, with error rates of 20.5% and 20.7%, respectively. In addition, there is a disparity of 5.3% between the pixel-based feature accuracy of 84.9% and the LBP accuracy of 79.6% when using the linear SVM classifier on the periocular region. In Regions 2 and 3, the accuracy of the LBP descriptors is consistently at or below that obtained with the pixel-based features. This accuracy drop is also depicted in the ROC curves. The ROC on the full face with LBP is significantly better using an SVM classifier and comparably the same with an LDA-NN classifier. The other three ROC curves, corresponding to the respective ROIs, show drops in accuracy.

(a) Pixel. (b) LBP.

Fig. 7. Test Error with 5-fold CV.

To support the use of the periocular region in gender identification, we observe in both Fisherfaces, depicted in Figures 9(a) and 9(b), that the eyes, nose, and chin are the most discriminating features of the face with LDA as the metric. In particular, the eye region is highlighted as the most significant discriminating area for gender classification in the Fisherface for pixel-based features. In the LBP Fisherface, the discriminant areas are more uniform; this implies that with LBP-based features, the facial attributes for discriminating gender are more evenly distributed than with pixel-based features, which concentrate on the periocular region.

By examining Figure 10, we can hypothesize which area of the face has more discriminatory value for each classifier/descriptor combination. For example, observing the changes from Full to Region 1, the number of misclassifications increases slightly, likely due to the discriminatory value of hair as it pertains to gender. Moreover, when using LBP descriptors, we notice a sharp increase in male and female misclassifications after removing the region that includes the hair. This is likely due to the potentially rich texture present there.

(a) LDA Pixel. (b) LDA LBP.

(c) SVM Pixel. (d) SVM LBP.

Fig. 8. ROC Curves.

(a) Pixel Fisherface. (b) LBP Fisherface.

Fig. 9. Visualization of LDA Hyperplanes.

Additionally, we note that the number of misidentified males increases after the removal of the forehead and hair, as in Region 1. The misclassification of males then steadily decreases as the ROI is narrowed to the periocular region, whereas the number of false negatives increases for female identification. This trend suggests that the periocular region is better suited for identifying males than females.

In Figure 11, we show selected examples of misclassified images in Region 3. We observe that many of these pictures have variations in illumination and pose, and are rather difficult to classify, even for a human observer.

IV. CONCLUSIONS AND FUTURE WORK

A. Conclusions

Our results show that the periocular region is a viable region for performing gender classification. Although test accuracies drop for both LDA and SVM classifiers when comparing the full face to the periocular region, gender identification using only the periocular region is still possible. Considering only the periocular region, we note that SVM performs considerably better, by ∼5%, than the LDA rank-1 NN classifier when using pixel-based features. In addition, we find that LBP descriptors no longer boost performance in gender identification when using only the periocular region.

(a) LDA Pixel. (b) LDA LBP. (c) SVM Pixel. (d) SVM LBP.

Fig. 10. Number of Misclassifications.

The periocular region does, indeed, carry considerable discriminating information under LDA projections, visualized as Fisherfaces. The dominant regions in the full-face Fisherface are the eyes, chin, and lip regions, but they are clearly biased toward the ocular region. This supports our initial hypothesis that the brow and eyes carry valuable information for gender discrimination.

Gender identification using only the periocular region is reliable, even when testing is performed on images with a high degree of variability, including in illumination and focal length. This shows that, even under unpredictable environments or acquisition conditions, periocular gender classification performs only a small percentage lower than classification on more controlled data using full facial information. An accuracy rate of more than 80%, using only raw pixel-based features, clearly shows that the periocular region can be used to identify gender; with more advanced feature extraction methods, we expect that performance can be boosted to tighten the margin between the full face and the periocular region.

B. Future Work

Future work will involve applying other feature extraction techniques and their fusions, using tighter periocular regions, using automated and robust image alignment methods for online classification, and investigating the impact of illumination and cosmetics. We would also like to study the application of the periocular region to other soft biometric attributes, including ethnicity identification.

(a) Male Subjects.

(b) Female Subjects.

Fig. 11. Misclassified Examples in the Periocular (Region 3).

V. ACKNOWLEDGMENTS

The authors would like to acknowledge the support of all the staff in the Carnegie Mellon University CyLab Biometrics Center, in particular Arti Chhajta and Utsav Prabhu, for helping collect the database and organize the ground-truth landmarking.

REFERENCES

[1] H. Abdi, D. Valentin, B. Edelman, and A. J. O'Toole, "More about the difference between men and women: Evidence from linear neural networks and the principal component approach," 1995.

[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.

[3] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn, "Image understanding for iris biometrics: A survey," Computer Vision and Image Understanding, Tech. Rep., 2007.

[4] E. Brown and D. I. Perrett, "What gives a face its gender?" Perception, vol. 22, no. 7, pp. 829–840, Feb. 1993.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, Nov. 2000.

[6] T. Joachims, "SVMlight, Implementing Support Vector Machines in C," 1999. [Online]. Available: http://svmlight.joachims.org

[7] B. Moghaddam and M.-H. Yang, "Learning gender with support faces," Mitsubishi Electric Research Laboratories, Tech. Rep., Jan. 2002.

[8] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.

[9] U. Park, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum: A feasibility study," in BTAS '09: Proceedings of the 3rd IEEE International Conference on Biometrics: Theory, Applications and Systems. IEEE Press, 2009, pp. 153–158.

[10] M. Petrou and P. G. Sevilla, Image Processing: Dealing with Texture, 1st ed. Wiley-Interscience, Mar. 2006.

[11] Z. Sun, G. Bebis, X. Yuan, and S. J. Louis, “Genetic feature subsetselection for gender classification: A comparison study,” in Proceed-ings of the IEEE Workshop on Applications of Computer Vision, 2002,pp. 165–170.

[12] Yahoo! Inc., “Flickr,” 2010. [Online]. Available: http://www.flickr.com