
Lecture Notes in Computer Science 2688 Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Josef Kittler Mark S. Nixon (Eds.)
Audio- and Video-Based Biometric Person Authentication
4th International Conference, AVBPA 2003 Guildford, UK, June 9-11, 2003 Proceedings
Series Editors: G. Goos, J. Hartmanis, and J. van Leeuwen
Volume Editors
Josef Kittler University of Surrey Center for Vision, Speech and Signal Proc. Guildford, Surrey GU2 7XH, UK E-mail: [email protected]
Mark S. Nixon University of Southampton Department of Electronics and Computer Science Southampton, SO17 1BJ, UK E-mail: [email protected]
Cataloging-in-Publication Data applied for
A catalog record for this book is available from the Library of Congress
Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.
CR Subject Classification (1998): I.5, I.4, I.3, K.6.5, K.4.4, C.2.0
ISSN 0302-9743 ISBN 3-540-40302-7 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
Typesetting: Camera-ready by author, data conversion by DA-TeX Gerd Blumenstein Printed on acid-free paper SPIN 10927847 06/3142 5 4 3 2 1 0
Preface
This book collects the research work presented at the 4th International Conference on Audio- and Video-Based Biometric Person Authentication that took place at the University of Surrey, Guildford, UK, in June 2003. We were pleased to see a surge of interest in AVBPA. We received many more submissions than before and this reflects not just the good work put in by previous organizers and participants, but also the increasing world-wide interest in biometrics. With grateful thanks to our program committee, we had a fine program indeed.
The papers concentrate on major established biometrics such as face and speech, and we continue to see the emergence of gait as a new research focus, together with other innovative approaches including writer and palmprint identification. The face-recognition papers show advances not only in recognition techniques, but also in application capabilities and covariate analysis (now with the inclusion of time as a recognition factor), and even in synthesis to evaluate wider recognition capability. Fingerprint analysis now includes study of the effects of compression, and new ways for compression, together with refined study of holistic vs. minutiae and feature set selection, areas of interest to the biometrics community as a whole. The gait presentations focus on new approaches for temporal recognition together with analysis of performance capability and new approaches to improve generalization in performance.
The speech papers reflect the wide range of possible applications together with new uses of visual information. Interest in data fusion continues to increase. But it is not just the more established areas that were of interest at AVBPA 2003. As ever in this innovative technology, there are always new ways to recognize people, as reflected in papers on on-line writer identification and palm print analysis. Iris recognition is also represented, as are face and person extraction in video.
The growing industry in biometrics was reflected in presentations with a specific commercial interest: there are papers on smart cards, wireless devices, architectures, and implementation factors, all of considerable consequence in the deployment of biometric systems. A competition for the best face-authentication (verification) algorithms took place in conjunction with the conference, and the results are reported here.
The papers are complemented by invited presentations by Takeo Kanade (Carnegie Mellon University), Jerry Friedman (Stanford University), and Frederic Bimbot (INRIA). All in all, AVBPA continues to offer a snapshot of research in this area from leading institutions around the world. If these papers and this conference inspire new research in this fascinating area, then this conference can be deemed to be truly a success.
April 2003 Josef Kittler and Mark S. Nixon
Organization
AVBPA 2003 was organized by
– the Centre for Vision, Speech and Signal Processing, University of Surrey, UK, and
– TC-14 of IAPR (International Association for Pattern Recognition).
Executive Committee
Conference Co-chairs Josef Kittler and Mark S. Nixon University of Surrey and University of Southampton, UK
Local Organization Rachel Gartshore, University of Surrey
Program Committee
Samy Bengio (Switzerland) Josef Bigun (Sweden) Frederic Bimbot (France) Mats Blomberg (Sweden) Horst Bunke (Switzerland) Hyeran Byun (South Korea) Rama Chellappa (USA) Gerard Chollet (France) Timothy Cootes (UK) Larry Davis (USA) Farzin Deravi (UK) Sadaoki Furui (Japan) M. Dolores Garcia-Plaza (Spain) Dominique Genoud (Switzerland) Shaogang Gong (UK) Steve Gunn (UK) Bernd Heisele (USA) Anil Jain (USA) Kenneth Jonsson (Sweden) Seong-Whan Lee (South Korea) Stan Li (China) John Mason (UK) Jiri Matas (Czech Republic) Bruce Millar (Australia) Larry O’Gorman (USA) Sharath Pankanti (USA)
P. Jonathon Phillips (USA) Salil Prabhakar (USA) Nalini Ratha (USA) Marek Rejman-Greene (UK) Gael Richard (France) Massimo Tistarelli (Italy) Patrick Verlinde (Belgium) Juan Villanueva (Spain) Harry Wechsler (USA) Pong Yuen (Hong Kong)
Sponsoring Organizations
Table of Contents
Face I
Robust Face Recognition in the Presence of Clutter . . . . . . . . . . . . . . . . . . . . . . . . . 1 A.N. Rajagopalan, Rama Chellappa, and Nathan Koterba
An Image Preprocessing Algorithm for Illumination Invariant Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Ralph Gross and Vladimir Brajovic
Quad Phase Minimum Average Correlation Energy Filters for Reduced Memory Illumination Tolerant Face Authentication . . . . . . . . . . . .19 Marios Savvides and B.V.K. Vijaya Kumar
Component-Based Face Recognition with 3D Morphable Models . . . . . . . . . . . 27 Jennifer Huang, Bernd Heisele, and Volker Blanz
Face II
A Comparative Study of Automatic Face Verification Algorithms on the BANCA Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 M. Sadeghi, J. Kittler, A. Kostin, and K. Messer
Assessment of Time Dependency in Face Recognition: An Initial Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Patrick J. Flynn, Kevin W. Bowyer, and P. Jonathon Phillips
Constraint Shape Model Using Edge Constraint and Gabor Wavelet Based Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Baochang Zhang, Wen Gao, Shiguang Shan, and Wei Wang
Expression-Invariant 3D Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62 Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel
Speech
Automatic Estimation of a Priori Speaker Dependent Thresholds in Speaker Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Javier R. Saeta and Javier Hernando
A Bayesian Network Approach for Combining Pitch and Reliable Spectral Envelope Features for Robust Speaker Verification . . . 78 Mijail Arcienega and Andrzej Drygajlo
Searching through a Speech Memory for Text-Independent Speaker Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .95 Dijana Petrovska-Delacretaz, Asmaa El Hannani, and Gerard Chollet
Poster Session I
LUT-Based Adaboost for Gender Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Bo Wu, Haizhou Ai, and Chang Huang
Independent Component Analysis and Support Vector Machine for Face Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Gianluca Antonini, Vlad Popovici, and Jean-Philippe Thiran
Real-Time Emotion Recognition Using Biologically Inspired Models . . . . . . . 119 Keith Anderson and Peter W. McOwan
A Dual-Factor Authentication System Featuring Speaker Verification and Token Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Purdy Ho and John Armington
Wavelet-Based 2-Parameter Regularized Discriminant Analysis for Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Dao-Qing Dai and P.C. Yuen
Face Tracking and Recognition from Stereo Sequence . . . . . . . . . . . . . . . . . . . . . 145 Jian-Gang Wang, Ronda Venkateswarlu, and Eng Thiam Lim
Face Recognition System Using Accurate and Rapid Estimation of Facial Position and Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Takatsugu Hirayama, Yoshio Iwai, and Masahiko Yachida
Fingerprint Enhancement Using Oriented Diffusion Filter . . . . . . . . . . . . . . . . . 164 Jiangang Cheng, Jie Tian, Hong Chen, Qun Ren, and Xin Yang
Visual Analysis of the Use of Mixture Covariance Matrices in Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .172 Carlos E. Thomaz and Duncan F. Gillies
A Face Recognition System Based on Local Feature Analysis . . . . . . . . . . . . . .182 Stefano Arca, Paola Campadelli, and Raffaella Lanzarotti
Face Detection Using an SVM Trained in Eigenfaces Space . . . . . . . . . . . . . . . .190 Vlad Popovici and Jean-Philippe Thiran
Face Detection and Facial Component Extraction by Wavelet Decomposition and Support Vector Machines . . . . . . . . . . . . . . . . . 199 Dihua Xi and Seong-Whan Lee
U-NORM Likelihood Normalization in PIN-Based Speaker Verification Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 D. Garcia-Romero, J. Gonzalez-Rodriguez, J. Fierrez-Aguilar, and J. Ortega-Garcia
Facing Position Variability in Minutiae-Based Fingerprint Verification through Multiple References and Score Normalization Techniques . . . . . . . . . 214 D. Simon-Zorita, J. Ortega-Garcia, M. Sanchez-Asenjo, and J. Gonzalez-Rodriguez
Iris-Based Personal Authentication Using a Normalized Directional Energy Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 Chul-Hyun Park, Joon-Jae Lee, Mark J.T. Smith, and Kil-Houm Park
An HMM On-line Signature Verification Algorithm . . . . . . . . . . . . . . . . . . . . . . . 233 Daigo Muramatsu and Takashi Matsumoto
Automatic Pedestrian Detection and Tracking for Real-Time Video Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242 Hee-Deok Yang, Bong-Kee Sin, and Seong-Whan Lee
Visual Features Extracting & Selecting for Lipreading . . . . . . . . . . . . . . . . . . . . 251 Hong-xun Yao, Wen Gao, Wei Shan, and Ming-hui Xu
An Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition . . . . . . . . . . . . . . . . . . . . . . . . . .260 Simon Lucey
Feature Extraction Using a Chaincoded Contour Representation of Fingerprint Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 Venu Govindaraju, Zhixin Shi, and John Schneider
Hypotheses-Driven Affine Invariant Localization of Faces in Verification Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276 M. Hamouz, J. Kittler, J.K. Kamarainen, and H. Kalviainen
Shape Based People Detection for Visual Surveillance Systems . . . . . . . . . . . . 285 M. Leo, P. Spagnolo, G. Attolico, and A. Distante
Real-Time Implementation of Face Recognition Algorithms on DSP Chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 Seong-Whan Lee, Sang-Woong Lee, and Ho-Choul Jung
Robust Face-Tracking Using Skin Color and Facial Shape . . . . . . . . . . . . . . . . . 302 Hyung-Soo Lee, Daijin Kim, and Sang-Youn Lee
Fingerprint
Fusion of Statistical and Structural Fingerprint Classifiers . . . . . . . . . . . . . . . . 310 Gian Luca Marcialis, Fabio Roli, and Alessandra Serrau
Learning Features for Fingerprint Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 Xuejun Tan, Bir Bhanu, and Yingqiang Lin
Fingerprint Matching with Registration Pattern Inspection . . . . . . . . . . . . . . . 327 Hong Chen, Jie Tian, and Xin Yang
Biometric Template Selection: A Case Study in Fingerprints . . . . . . . . . . . . . . 335 Anil Jain, Umut Uludag, and Arun Ross
Orientation Scanning to Improve Lossless Compression of Fingerprint Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Johan Tharna, Kenneth Nilsson, and Josef Bigun
Image, Video Processing, Tracking
A Nonparametric Approach to Face Detection Using Ranklets . . . . . . . . . . . . 351 Fabrizio Smeraldi
Refining Face Tracking with Integral Projections . . . . . . . . . . . . . . . . . . . . . . . . . . 360 Ginés García Mateos
Glasses Removal from Facial Image Using Recursive PCA Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 Jeong-Seon Park, You Hwa Oh, Sang Chul Ahn, and Seong-Whan Lee
Synthesis of High-Resolution Facial Image Based on Top-Down Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Bon-Woo Hwang, Jeong-Seon Park, and Seong-Whan Lee
A Comparative Performance Analysis of JPEG 2000 vs. WSQ for Fingerprint Image Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Miguel A. Figueroa-Villanueva, Nalini K. Ratha, and Ruud M. Bolle
General
New Shielding Functions to Enhance Privacy and Prevent Misuse of Biometric Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Jean-Paul Linnartz and Pim Tuyls
The NIST HumanID Evaluation Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Ross J. Micheals, Patrick Grother, and P. Jonathon Phillips
Synthetic Eyes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .412 Behrooz Kamgar-Parsi, Behzad Kamgar-Parsi, and Anil K. Jain
Dental Biometrics: Human Identification Using Dental Radiographs . . . . . . . 429 Anil K. Jain, Hong Chen, and Silviu Minut
Effect of Window Size and Shift Period in Mel-Warped Cepstral Feature Extraction on GMM-Based Speaker Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438 C.C. Leung and Y.S. Moon
Discriminative Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Florent Perronnin and Jean-Luc Dugelay
Cross-Channel Histogram Equalisation for Colour Face Recognition . . . . . . . 454 Stephen King, Gui Yun Tian, David Taylor, and Steve Ward
Open World Face Recognition with Credibility and Confidence Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 Fayin Li and Harry Wechsler
Enhanced VQ-Based Algorithms for Speech Independent Speaker Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 Ningping Fan and Justinian Rosca
Fingerprint Fusion Based on Minutiae and Ridge for Enrollment . . . . . . . . . . 478 Dongjae Lee, Kyoungtaek Choi, Sanghoon Lee, and Jaihie Kim
Face Hallucination and Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486 Xiaogang Wang and Xiaoou Tang
Robust Features for Frontal Face Authentication in Difficult Image Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 Conrad Sanderson and Samy Bengio
Facial Recognition in Video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505 Dmitry O. Gorodnichy
Face Authentication Based on Multiple Profiles Extracted from Range Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 Yijun Wu, Gang Pan, and Zhaohui Wu
Eliminating Variation of Face Images Using Face Symmetry . . . . . . . . . . . . . . .523 Yan Zhang and Jufu Feng
Combining SVM Classifiers for Multiclass Problem: Its Application to Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531 Jaepil Ko and Hyeran Byun
A Bayesian MCMC On-line Signature Verification . . . . . . . . . . . . . . . . . . . . . . . . 540 Mitsuru Kondo, Daigo Muramatsu, Masahiro Sasaki, and Takashi Matsumoto
Illumination Normalization Using Logarithm Transforms for Face Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .549 Marios Savvides and B.V.K. Vijaya Kumar
Performance Evaluation of Face Recognition Algorithms on the Asian Face Database, KFDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 Bon-Woo Hwang, Hyeran Byun, Myoung-Cheol Roh, and Seong-Whan Lee
Automatic Gait Recognition via Fourier Descriptors of Deformable Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 Stuart D. Mowbray and Mark S. Nixon
A Study on Performance Evaluation of Fingerprint Sensors . . . . . . . . . . . . . . . 574 Hyosup Kang, Bongku Lee, Hakil Kim, Daecheol Shin, and Jaesung Kim
An Improved Fingerprint Indexing Algorithm Based on the Triplet Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584 Kyoungtaek Choi, Dongjae Lee, Sanghoon Lee, and Jaihie Kim
A Supervised Approach in Background Modelling for Visual Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592 P. Spagnolo, M. Leo, G. Attolico, and A. Distante
Human Recognition on Combining Kinematic and Stationary Features . . . . 600 Bir Bhanu and Ju Han
Architecture for Synchronous Multiparty Authentication Using Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609 Sunil J. Noronha, Chitra Dorai, Nalini K. Ratha, and Ruud M. Bolle
Boosting a Haar-Like Feature Set for Face Verification . . . . . . . . . . . . . . . . . . . . 617 Bernhard Froba, Sandra Stecher, and Christian Kublbeck
The BANCA Database and Evaluation Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 625 Enrique Bailly-Bailliere, Samy Bengio, Frederic Bimbot, Miroslav Hamouz, Josef Kittler, Johnny Mariethoz, Jiri Matas, Kieron Messer, Vlad Popovici, Fabienne Poree, Belen Ruiz, and Jean-Philippe Thiran
A Speaker Pruning Algorithm for Real-Time Speaker Identification . . . . . . . 639 Tomi Kinnunen, Evgeny Karpov, and Pasi Franti
“Poor Man” Vote with M-ary Classifiers. Application to Iris Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647 V. Vigneron, H. Maaref, and S. Lelandais
Handwriting, Signature, Palm
Personal Verification Using Palmprint and Hand Geometry Biometric . . . . . 668 Ajay Kumar, David C.M. Wong, Helen C. Shen, and Anil K. Jain
A Set of Novel Features for Writer Identification . . . . . . . . . . . . . . . . . . . . . . . . . . 679 Caroline Hertel and Horst Bunke
Combining Fingerprint and Hand-Geometry Verification Decisions . . . . . . . . 688 Kar-Ann Toh, Wei Xiong, Wei-Yun Yau, and Xudong Jiang
Iris Verification Using Correlation Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697 B.V.K. Vijaya Kumar, Chunyan Xie, and Jason Thornton
Gait
Gait Analysis for Human Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706 A. Kale, N. Cuntoor, B. Yegnanarayana, A.N. Rajagopalan, and R. Chellappa
Performance Analysis of Time-Distance Gait Parameters under Different Speeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .715 Rawesak Tanawongsuwan and Aaron Bobick
Novel Temporal Views of Moving Objects for Gait Biometrics . . . . . . . . . . . . .725 Stuart P. Prismall, Mark S. Nixon, and John N. Carter
Gait Shape Estimation for Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734 David Tolliver and Robert T. Collins
Fusion
Audio-Visual Speaker Identification Based on the Use of Dynamic Audio and Visual Features . . . . . . . . . . . . . . . . . 743 Niall Fox and Richard B. Reilly
Scalability Analysis of Audio-Visual Person Identity Verification . . . . . . . . . . 752 Jacek Czyz, Samy Bengio, Christine Marcel, and Luc Vandendorpe
A Bayesian Approach to Audio-Visual Speaker Identification . . . . . . . . . . . . . .761 Ara V. Nefian, Lu Hong Liang, Tieyan Fu, and Xiao Xing Liu
Multimodal Authentication Using Asynchronous HMMs . . . . . . . . . . . . . . . . . . 770 Samy Bengio
Theoretic Evidence k-Nearest Neighbourhood Classifiers in a Bimodal Biometric Verification System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778 Andrew Teoh Beng Jin, Salina Abdul Samad, and Aini Hussain
Poster Session III
Combined Face Detection/Recognition System for Smart Rooms . . . . . . . . . . 787 Jia Kui and Liyanage C. De Silva
Capabilities of Biometrics for Authentication in Wireless Devices . . . . . . . . . 796 Pauli Tikkanen, Seppo Puolitaival, and Ilkka Kansala
Combining Face and Iris Biometrics for Identity Verification . . . . . . . . . . . . . . 805 Yunhong Wang, Tieniu Tan, and Anil K. Jain
Experimental Results on Fusion of Multiple Fingerprint Matchers . . . . . . . . . 814 Gian Luca Marcialis and Fabio Roli
Predicting Large Population Data Cumulative Match Characteristic Performance from Small Population Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 821 Amos Y. Johnson, Jie Sun, and Aaron F. Bobick
A Comparative Evaluation of Fusion Strategies for Multimodal Biometric Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .830 J. Fierrez-Aguilar, J. Ortega-Garcia, D. Garcia-Romero, and J. Gonzalez-Rodriguez
Iris Feature Extraction Using Independent Component Analysis . . . . . . . . . . .838 Kwanghyuk Bae, Seungin Noh, and Jaihie Kim
BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities . . . . . . . . 845 Sonia Garcia-Salicetti, Charles Beumier, Gerard Chollet, Bernadette Dorizzi, Jean Leroux les Jardins, Jan Lunter, Yang Ni, and Dijana Petrovska-Delacretaz
Fingerprint Alignment Using Similarity Histogram . . . . . . . . . . . . . . . . . . . . . . . . 854 Tanghui Zhang, Jie Tian, Yuliang He, Jiangang Cheng, and Xin Yang
A Novel Method to Extract Features for Iris Recognition System . . . . . . . . . .862 Seung-In Noh, Kwanghyuk Bae, Yeunggyu Park, and Jaihie Kim
Resampling for Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869 Xiaoguang Lu and Anil K. Jain
Toward Person Authentication with Point Light Display Using Neural Network Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878 Sung-Bae Cho and Frank E. Pollick
Fingerprint Verification Using Correlation Filters . . . . . . . . . . . . . . . . . . . . . . . . . 886 Krithika Venkataramani and B.V.K. Vijaya Kumar
On the Correlation of Image Size to System Accuracy in Automatic Fingerprint Identification Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 895 J.K. Schneider, C.E. Richardson, F.W. Kiefer, and Venu Govindaraju
A JC-BioAPI Compliant Smart Card with Biometrics for Secure Access Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 903 Michael Osborne and Nalini K. Ratha
Comparison of MLP and GMM Classifiers for Face Verification on XM2VTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 911 Fabien Cardinaux, Conrad Sanderson, and Sebastien Marcel
Fast Frontal-View Face Detection Using a Multi-path Decision Tree . . . . . . . 921 Bernhard Froba and Andreas Ernst
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929 Simon Lucey and Tsuhan Chen
Face Recognition Vendor Test 2002 Performance Metrics . . . . . . . . . . . . . . . . . . 937 Patrick Grother, Ross J. Micheals, and P. Jonathon Phillips
Posed Face Image Synthesis Using Nonlinear Manifold Learning . . . . . . . . . . .946 Eunok Cho, Daijin Kim, and Sang-Youn Lee
Pose for Fusing Infrared and Visible-Spectrum Imagery . . . . . . . . . . . . . . . . . . . 955 Jian-Gang Wang and Ronda Venkateswarlu
AVBPA 2003 Face Authentication Contest
Face Verification Competition on the XM2VTS Database . . . . . . . . . . . . . . . . . 964 Kieron Messer, Josef Kittler, Mohammad Sadeghi, Sebastien Marcel, Christine Marcel, Samy Bengio, Fabien Cardinaux, C. Sanderson, Jacek Czyz, Luc Vandendorpe, Sanun Srisuk, Maria Petrou, Werasak Kurutach, Alexander Kadyrov, Roberto Paredes, B. Kepenekci, F.B. Tek, G.B. Akar, Farzin Deravi, and Nick Mavity
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .975
Robust Face Recognition in the Presence of Clutter
A.N. Rajagopalan1, Rama Chellappa2, and Nathan Koterba2
1 Indian Institute of Technology, Madras, India [email protected]
2 Center for Automation Research, University of Maryland, College Park, USA {rama,nathank}@cfar.umd.edu
Abstract. We propose a new method within the framework of principal component analysis to robustly recognize faces in the presence of clutter. The traditional eigenface recognition method performs poorly when confronted with the more general task of recognizing faces appearing against a background. It misses faces completely or throws up many false alarms. We argue in favor of learning the distribution of background patterns and show how this can be done for a given test image. An eigenbackground space is constructed and this space in conjunction with the eigenface space is used to impart robustness in the presence of background. A suitable classifier is derived to distinguish non-face patterns from faces. When tested on real images, the performance of the proposed method is found to be quite good.
1 Introduction
Two of the very successful and popular approaches to face recognition are Principal Components Analysis (PCA) [1] and Fisher's Linear Discriminant (FLD) [2]. Methods based on PCA and FLD work quite well provided the input test pattern is a face, i.e., the face image has already been cropped out of a scene. The problem of recognizing faces in still images with a cluttered background is more general and difficult as one doesn't know where a face pattern might appear in a given image. A good face recognition system should i) detect and recognize all the faces in a scene, and ii) not mis-classify background patterns as faces. Since faces are usually sparsely distributed in images, even a few false alarms will render the system ineffective. Also, the performance should not be too sensitive to any threshold selection. Some attempts to address this situation are discussed in [1, 3] where the use of distance from eigenface space (DFFS) and distance in eigenface space (DIFS) are suggested to detect and eliminate non-faces for robust face recognition in clutter. In this study, we show that DFFS and DIFS by themselves (in the absence of any information about the background) are not sufficient to discriminate against arbitrary background patterns. If the threshold is set high, traditional eigenface recognition (EFR) invariably ends up missing faces. If the threshold is lowered to capture faces, the technique incurs many false alarms.
One possible approach to handle clutter in still images is to use a good face detection module to find face patterns and then feed only these patterns as inputs to the traditional EFR scheme. In this paper, we propose a new methodology within the PCA framework to robustly recognize frontal faces in a given test image with background clutter. Towards this end, we construct an ‘eigenbackground space’ which represents the distribution of the background images corresponding to the given test image. The background is learnt ‘on the fly’ and provides a sound basis for eliminating false alarms. An appropriate pattern classifier is derived and the eigenbackground space together with the eigenface space is used to simultaneously detect and recognize faces. Results are given on several test images to validate the proposed method.
2 Eigenface Recognition in Clutter
In the EFR technique, when a face image is presented to the system, its weight vector is determined with respect to the eigenface space. In order to perform recognition, the difference error between this weight vector and the a priori stored mean weight vector corresponding to every person in the training set is computed. This error is also called the distance in face space (DIFS). That face class in the training set for which the DIFS is minimum is declared as the recognized face provided the difference error is less than an appropriately chosen threshold. The case of a still image containing face against background is much more complex and some attempts have been made to tackle it [1, 3]. In [1], the authors advocate the use of distance from face space (DFFS) to reject non-face patterns. The DFFS can be looked upon as the error in the reconstruction of a pattern. It has been pointed out in [1] that a threshold θDFFS could be chosen such that it defines the maximum allowable distance from the face space. If DFFS is greater than θDFFS , then the test pattern is classified as a non-face image. In a more recent work [3], DFFS together with DIFS has been suggested to improve performance. A test pattern is classified as a face and recognized provided its DFFS as well as DIFS values are less than suitably chosen thresholds θDFFS
and θDIFS, respectively. Although DFFS and DIFS have been suggested as possible candidates for
discriminating against background patterns, it is difficult to conceive that by learning just the face class we can segregate any arbitrary background pattern against which face patterns may appear. It may not always be possible to come up with threshold values that will result in no false alarms and yet can catch all the faces. To better illustrate this point, we show some examples in Fig. 1(a) where faces appear against background. Our training set contains faces of these individuals. The idea is to locate and recognize these individuals in the test images when they appear against clutter. The DFFS and DIFS values corresponding to every subimage pattern in these images were calculated and an attempt was made to recognize faces based on these values as suggested in [3]. It turns out that not only do we catch the face but also end up with many false alarms (see Fig. 1(b)) since information about the background is completely
ignored. It is interesting to note that some of the background patterns have been wrongly identified as one of the individuals in the training set. If the threshold values are made smaller to eliminate false alarms, we end up missing some of the faces. Thus, the performance of the EFR technique is quite sensitive to the threshold values chosen.
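To make the two distance measures concrete, the following sketch computes DFFS and DIFS for a single window pattern against a precomputed eigenface basis. It is a minimal illustration under our own naming conventions (face_mean, eigenfaces, class_means) and is not the authors' code; NumPy is assumed.

```python
import numpy as np

def difs_dffs(x, face_mean, eigenfaces, class_means):
    """Project a flattened window pattern x (length N) onto the eigenface
    space and return (DIFS per enrolled class, DFFS).

    face_mean   : mean face vector, shape (N,)
    eigenfaces  : orthonormal eigenface basis, shape (L, N)
    class_means : stored mean weight vector per person, shape (q, L)
    """
    w = eigenfaces @ (x - face_mean)               # weight vector in face space
    x_hat = face_mean + eigenfaces.T @ w           # reconstruction from L eigenfaces
    dffs = np.sum((x - x_hat) ** 2)                # distance from face space
    difs = np.sum((class_means - w) ** 2, axis=1)  # distance in face space, per class
    return difs, dffs
```

In traditional EFR the pattern would be accepted as the class minimizing difs only if dffs is below θDFFS and the minimum DIFS is below θDIFS; the point made above is that no single pair of thresholds separates faces from arbitrary clutter.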
3 Background Representation
If only the eigenface space is learnt, then background patterns with relatively small DFFS and DIFS values will pass for faces and this can result in an unacceptable number of false alarms. We argue in favor of learning the distribution of background images specific to a given scene. A locally learnt distribution can be expected to be more effective (than a universal background class learnt as in [4, 5], which is quite data intensive) for capturing the background characteristics of the given test image. By constructing the eigenbackground space for the given test image and comparing the proximity of an image pattern to this subspace versus the eigenface subspace, background patterns can be rejected.
3.1 The Eigenbackground Space
We now describe a simple but effective technique for constructing the ‘eigenbackground space’. It is assumed that faces are sparsely distributed in a given image, which is a reasonable assumption. Given a test image, the background is learnt ‘on the fly’ from the test image itself. Initially, the test image is scanned for those image patterns that are very unlikely to belong to the ‘face class’.
• A window pattern x in the test image is classified (positively) as a background pattern if its distance from the eigenface space is greater than a certain (high) threshold θb.
Note that we use DFFS to initially segregate only the most likely background patterns. Since the background usually constitutes a major portion of the test image, it is possible to obtain a sufficient number of samples for learning the ‘background class’ even if the threshold θb is chosen to be large for higher confidence. Since the number of background patterns is likely to be very large, these patterns are distributed into K clusters using simple K-means clustering so that K pattern centers are returned. The mean and covariance estimated from these clusters allow us to effectively extrapolate to other background patterns in the image (not picked up due to the high value of θb) as well.
The pattern centers which are much fewer in number as compared to the number of background patterns are then used as training images for learning the eigenbackground space. Although the pattern centers belong to different clusters, they are not totally uncorrelated with respect to one another and further dimensionality reduction is possible. The procedure that we follow is similar to that used to create the eigenface space. We first find the principal components of the background pattern centers or the eigenvectors of the covariance matrix Cb
of the set of background pattern centers. These eigenvectors can be thought of as a set of features which together characterize the variation among pattern centers of the background space. The subspace spanned by the eigenvectors corresponding to the largest K′ eigenvalues of the covariance matrix Cb is called the eigenbackground space. The significant eigenvectors of the matrix Cb, which we call ‘eigenbackground images’, form a basis for representing the background image patterns in the given test image.
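The construction just described can be summarized in a few lines. The sketch below is a hedged illustration of Section 3.1, not the authors' implementation: window patterns whose DFFS exceeds θb are collected as background samples, reduced to a smaller set of K-means centers, and a PCA over those centers yields the eigenbackground images. The helper names and parameters (extract_windows, dffs, theta_b, n_clusters, n_eigs) are our own assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_eigenbackground(test_image, extract_windows, dffs,
                          theta_b, n_clusters=600, n_eigs=100):
    """Learn the eigenbackground space 'on the fly' from a single test image.

    extract_windows : callable returning flattened window patterns, shape (n_windows, N)
    dffs            : callable returning the distance of one flattened pattern
                      from the eigenface space
    """
    windows = extract_windows(test_image)
    # keep only the patterns that are very unlikely to be faces
    bg = windows[np.array([dffs(w) for w in windows]) > theta_b]
    # K-means reduces the (large) set of background samples to representative centers;
    # this assumes at least n_clusters background samples were collected
    centers = KMeans(n_clusters=n_clusters, n_init=5).fit(bg).cluster_centers_
    mu_b = centers.mean(axis=0)
    # PCA over the centers: principal eigenvectors of their covariance matrix C_b
    _, _, vt = np.linalg.svd(centers - mu_b, full_matrices=False)
    eigen_bg = vt[:n_eigs]          # the 'eigenbackground images'
    return mu_b, eigen_bg
```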
4 The Classifier
Let the face class be denoted by ω1 and the background class be denoted by ω2. Assuming the conditional density function for the two classes to be Gaussian,

f(x|\omega_i) = \frac{1}{(2\pi)^{N/2}\,|C_i|^{1/2}} \exp\left[-\frac{1}{2}\, d_i(x)\right] \qquad (1)

where d_i(x) = (x − µ_i)^t C_i^{−1} (x − µ_i). Here, µ1 and µ2 are the means while C1 and C2 are the covariance matrices of the face and the background class, respectively. If the image pattern is of size M × M, then N = M². Diagonalization of C_i results in

d_i(x) = (x - \mu_i)^t \left(\Phi_i \Lambda_i^{-1} \Phi_i^t\right)(x - \mu_i) = y_i^t\, \Lambda_i^{-1}\, y_i

where Φ_i is a matrix containing the eigenvectors of C_i and is of the form [φ_{1i} φ_{2i} ... φ_{Ni}]. The weight vector y_i = Φ_i^t (x − µ_i) is obtained by projecting the mean-subtracted vector x onto the subspace spanned by the eigenvectors in Φ_i. Written in scalar form, d_i(x) becomes d_i(x) = \sum_{j=1}^{N} y_{ij}^2/\lambda_{ij}. Since d_1(x) is approximated using only L′ principal projections, we seek to formulate an estimator for d_1(x) as follows:

\hat{d}_1(x) = \sum_{j=1}^{L'} \frac{y_{1j}^2}{\lambda_{1j}} + \frac{1}{\rho_1}\,\varepsilon_1^2(x) \qquad (2)

where ε1²(x) is the reconstruction error in x with respect to the eigenface space. This is because ε1²(x) can be written as ε1²(x) = ||x − x̂_f||², where x̂_f is the estimate of x when projected onto the eigenface space. Because x̂_f is computed using only L′ principal projections in the eigenface space, we have

\varepsilon_1^2(x) = \left\| x - \Big(\mu_1 + \sum_{j=1}^{L'} y_{1j}\,\phi_{1j}\Big) \right\|^2 = \sum_{j=L'+1}^{N} y_{1j}^2 \qquad (3)

as the φ_{1j} are orthonormal. In a similar vein, since d_2(x) is approximated using only K′ principal projections,

\hat{d}_2(x) = \sum_{j=1}^{K'} \frac{y_{2j}^2}{\lambda_{2j}} + \frac{1}{\rho_2}\,\varepsilon_2^2(x)

where ε2²(x) is the reconstruction error in x with respect to the eigenbackground space and ε2²(x) = \sum_{j=K'+1}^{N} y_{2j}^2 = ||x − x̂_b||². Here, x̂_b is the estimate of x when projected onto the eigenbackground space.

From equations (1) and (2), the density estimate based on d̂_1(x) can be written as the product of two marginal and independent Gaussian densities in the face space F and its orthogonal complement F⊥, i.e.,

\hat{f}(x|\omega_1) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{j=1}^{L'} y_{1j}^2/\lambda_{1j}\right)}{(2\pi)^{L'/2}\prod_{j=1}^{L'}\lambda_{1j}^{1/2}}\right]\left[\frac{\exp\left(-\varepsilon_1^2(x)/2\rho_1\right)}{(2\pi\rho_1)^{(N-L')/2}}\right] = f_F(x|\omega_1)\cdot \hat{f}_{F^\perp}(x|\omega_1) \qquad (4)

Here, f_F(x|ω1) is the true marginal density in the face space while f̂_{F⊥}(x|ω1) is the estimated marginal density in F⊥.

Along similar lines, the density estimate for the background class can be expressed as

\hat{f}(x|\omega_2) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{j=1}^{K'} y_{2j}^2/\lambda_{2j}\right)}{(2\pi)^{K'/2}\prod_{j=1}^{K'}\lambda_{2j}^{1/2}}\right]\left[\frac{\exp\left(-\varepsilon_2^2(x)/2\rho_2\right)}{(2\pi\rho_2)^{(N-K')/2}}\right] = f_B(x|\omega_2)\cdot \hat{f}_{B^\perp}(x|\omega_2) \qquad (5)

Here, f_B(x|ω2) is the true marginal density in the background space while f̂_{B⊥}(x|ω2) is the estimated marginal density in B⊥.

The optimal values of ρ1 and ρ2 can be determined by minimizing the Kullback-Leibler distance [6] between the true density and its estimate. The resultant estimates can be shown to be

\rho_1 = \frac{1}{N-L'}\sum_{j=L'+1}^{N}\lambda_{1j}, \qquad \rho_2 = \frac{1}{N-K'}\sum_{j=K'+1}^{N}\lambda_{2j} \qquad (6)

Thus, once we select the L′-dimensional principal subspace F, the optimal density estimate f̂(x|ω1) has the form given by equation (4) where ρ1 is as given above. A similar argument applies to the background space also.

Assuming equal a priori probabilities, the classifier can be derived as

\log \hat{f}_{F^\perp}(x|\omega_1) - \log \hat{f}_{B^\perp}(x|\omega_2) = \frac{\varepsilon_2^2(x)}{2\rho_2} - \frac{\varepsilon_1^2(x)}{2\rho_1} + \frac{N-K'}{2}\log(2\pi\rho_2) - \frac{N-L'}{2}\log(2\pi\rho_1) \qquad (7)

When L′ = K′, i.e., when the number of eigenfaces and eigenbackground patterns are the same, and when ρ1 = ρ2, i.e., when the arithmetic mean of the eigenvalues in the orthogonal subspaces is the same, the above classifier interestingly simplifies to

\log \hat{f}_{F^\perp}(x|\omega_1) - \log \hat{f}_{B^\perp}(x|\omega_2) = \frac{1}{2\rho_1}\left[\varepsilon_2^2(x) - \varepsilon_1^2(x)\right] \qquad (8)
which is simply a function of the reconstruction error. Clearly, the face space would favour a better reconstruction of face patterns while the background space would favour the background patterns.
5 The Proposed Method
Once the eigenface space and the eigenbackground space are learnt, the test image is examined again, but now for the presence of faces at all points in the image. For each of the test window patterns, the classifier proposed in Section 4 is used to determine whether a pattern is a face or not. Ideally, one must use equation (7) but for computational simplicity we use equation (8) which is the difference in the reconstruction error. The classifier works quite well despite this simplification.
To express the operations mathematically, let the subimage pattern under consideration in the test image be denoted as x. The vector x is projected onto the eigenface space as well as the eigenbackground space to yield estimates of x as x̂_f and x̂_b, respectively. If

\|x - \hat{x}_f\|^2 < \|x - \hat{x}_b\|^2 \quad \text{and} \quad \|x - \hat{x}_f\|^2 < \theta_{DFFS} \qquad (9)

where θDFFS is an appropriately chosen threshold, then recognition is carried out based on its DIFS value. The weight vector W corresponding to pattern x in the eigenface space is compared (in the Euclidean sense) with the pre-stored mean weights of each of the face classes. The pattern x is recognized as belonging to the i-th person if

i = \arg\min_{j} \|W - m_j\|^2, \; j = 1, \ldots, q \quad \text{and} \quad \|W - m_i\|^2 < \theta_{DIFS} \qquad (10)

where q is the number of face classes or people in the database and θDIFS is a suitably chosen threshold.
In the above discussion, since a background pattern will be better approximated by the eigenbackground images than by the eigenface images, it is to be expected that ‖x − x̂_b‖² would be less than ‖x − x̂_f‖² for a background pattern x. On the other hand, if x is a face pattern, then it will be better represented by the eigenface space than the eigenbackground space. Thus, learning the eigenbackground space helps to reduce the false alarms considerably. Moreover, the threshold value can now be raised comfortably without generating false alarms because the reconstruction error of a background pattern would continue to remain a minimum with respect to the background space only. Knowledge of the background leads to improved performance (fewer misses as well as fewer false alarms) and reduces sensitivity to the choice of threshold values (properties that are highly desirable in a recognition scenario).
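Putting equations (8)-(10) together, the decision rule applied to one window pattern can be sketched as below. This is an illustrative rendering under our own naming conventions, not the authors' code; it uses the simplified comparison of reconstruction errors from equation (8) rather than the full classifier of equation (7).

```python
import numpy as np

def classify_window(x, mu_f, eigen_f, mu_b, eigen_b,
                    class_means, theta_dffs, theta_difs):
    """Detect and recognize a face in one flattened window pattern x.

    (mu_f, eigen_f) : mean and orthonormal basis of the eigenface space
    (mu_b, eigen_b) : mean and basis of the eigenbackground space
    class_means     : stored mean weight vectors, one per enrolled person
    Returns the index of the recognized person, or None.
    """
    w_f = eigen_f @ (x - mu_f)
    w_b = eigen_b @ (x - mu_b)
    err_f = np.sum((x - (mu_f + eigen_f.T @ w_f)) ** 2)   # eps_1^2(x)
    err_b = np.sum((x - (mu_b + eigen_b.T @ w_b)) ** 2)   # eps_2^2(x)
    # equation (9): the pattern must look more like a face than like background
    if err_f >= err_b or err_f >= theta_dffs:
        return None
    # equation (10): nearest stored mean weight vector, gated by theta_DIFS
    difs = np.sum((class_means - w_f) ** 2, axis=1)
    i = int(np.argmin(difs))
    return i if difs[i] < theta_difs else None
```

In the experiments of Section 6, θDFFS and θDIFS are set to the maximum DFFS and DIFS values observed over the training faces.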
6 Experimental Results
Because our experiment requires individuals in the test images (with background clutter) to be the same as the ones in the training set, we generated our own face database. The training set consisted of images of size 27×27 pixels of 50 subjects with 10 images per subject. The number of significant eigenfaces was found to be 50 for satisfactory recognition. For purpose of testing, we captured images in which subjects in the database appeared (with an approximately frontal pose) against different types of background. Some of the images were captured within the laboratory. For other types of clutter, we used big posters with different types of complex background. Pictures of the individuals in our database were then captured with these posters in the backdrop. We captured about 400 such test images each of size 120 × 120 pixels. If a face pattern is recognized by the system, a box is drawn at the corresponding location in the output image.
Thresholds θDFFS and θDIFS were chosen to be the maximum of all the DFFS and DIFS values, respectively, among the faces in the training set (which is a reasonable thing to do). The threshold values were kept the same for all the test images and for both the schemes as well. For the proposed scheme, the number of background pattern centers was chosen to be 600 while the number of eigenbackground images was chosen to be 100 and these were kept fixed for all the test images. The number of eigenbackground images was arrived at based on the accuracy of reconstruction of the background patterns.
Due to space constraints, only a few representative results are given here (see Fig. 1 and Fig. 2). The figures are quite self-explanatory. We observe that traditional EFR (which does not utilize background information) confuses too many background patterns (Fig. 1(b)) with faces in the training set. If θDFFS is decreased to reduce the false alarms, then it ends up missing many of the faces. On the other hand, the proposed scheme works quite well and recognizes faces with very few false alarms, if any. When tested on all 400 test images, the proposed method has a detection capability of 80% with no false alarms, and the recognition rate on these detected images is 78%. Most of the frontal faces are caught correctly. Even if θDFFS is increased to accommodate slightly difficult poses, we have observed that the results are unchanged for the proposed method. This can be attributed to the fact that the proximity of a background pattern continues to remain with respect to the background space despite changes in θDFFS.
7 Conclusions
In the literature, the eigenface technique has been demonstrated to be very useful for face recognition. However, when the scheme is directly extended to recognize faces in the presence of background clutter, its performance degrades as it cannot satisfactorily discriminate against non-face patterns. In this paper, we have presented a robust scheme for recognizing faces in still images of natural scenes. We argue in favor of constructing an eigenbackground space from the background images of a given scene. The background space, which is created ‘on the fly’ from the test image, is shown to be very useful in distinguishing non-face patterns. The scheme outperforms the traditional EFR technique and gives very good results with almost no false alarms, even on fairly complicated scenes.

Fig. 1. (a) Sample test images. Results for (b) traditional EFR, and (c) the proposed method. Note that traditional EFR has many false alarms

Fig. 2. Some representative results for the proposed method
References
[1] M. Turk and A. Pentland, “Eigenfaces for recognition”, J. Cognitive Neurosciences, vol. 3, pp. 71-86, 1991.
[2] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection”, IEEE Trans. Pattern Anal. and Machine Intell., vol. 19, pp. 711-720, 1997.
[3] B. Moghaddam and A. Pentland, “Probabilistic visual learning for object representation”, IEEE Trans. Pattern Anal. and Machine Intell., vol. 19, pp. 696-710, 1997.
[4] K. Sung and T. Poggio, “Example-based learning for view-based human face detection”, IEEE Trans. Pattern Anal. and Machine Intell., vol. 20, pp. 39-51, 1998.
[5] H.A. Rowley, S. Baluja, and T. Kanade, “Neural network-based face detection”, IEEE Trans. Pattern Anal. and Machine Intell., vol. 20, pp. 23-38, 1998.
[6] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1991.
An Image Preprocessing Algorithm for Illumination Invariant Face Recognition
Ralph Gross and Vladimir Brajovic
The Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
{rgross,brajovic}@cs.cmu.edu
Abstract. Face recognition algorithms have to deal with significant amounts of illumination variations between gallery and probe images. State-of-the-art commercial face recognition algorithms still struggle with this problem. We propose a new image preprocessing algorithm that compensates for illumination variations in images. From a single brightness image the algorithm first estimates the illumination field and then compensates for it to mostly recover the scene reflectance. Unlike previously proposed approaches for illumination compensation, our algorithm does not require any training steps, knowledge of 3D face models or reflective surface models. We apply the algorithm to face images prior to recognition. We demonstrate large performance improvements with several standard face recognition algorithms across multiple, publicly available face databases.
1 Introduction
Besides pose variation, illumination is the most significant factor affecting the appearance of faces. Ambient lighting changes greatly within and between days and among indoor and outdoor environments. Due to the 3D shape of the face, a direct lighting source can cast strong shadows that accentuate or diminish certain facial features. Evaluations of face recognition algorithms consistently show that state-of-the-art systems can not deal with large differences in illumination conditions between gallery and probe images [1, 2, 3]. In recent years many appearance-based algorithms have been proposed to deal with the problem [4, 5, 6, 7]. Belhumeur showed [5] that the set of images of an object in fixed pose but under varying illumination forms a convex cone in the space of images. The illumination cones of human faces can be approximated well by low-dimensional linear subspaces [8]. The linear subspaces are typically estimated from training data, requiring multiple images of the object under different illumination conditions. Alternatively, model-based approaches have been proposed to address the problem. Blanz et al. [9] fit a previously constructed morphable 3D model to single images. The algorithm works well across pose and illumination; however, the computational expense is very high.
In general, an image I(x, y) is regarded as the product I(x, y) = R(x, y) L(x, y), where R(x, y) is the reflectance and L(x, y) is the illuminance at each point
(x, y) [10]. Computing the reflectance and the illuminance fields from real images is, in general, an ill-posed problem. Therefore, various assumptions and simplifications about L, or R, or both are proposed in order to attempt to solve the problem. A common assumption is that L varies slowly while R can change abruptly. For example, homomorphic filtering [11] uses this assumption to extract R by high-pass filtering the logarithm of the image.
Closely related to homomorphic filtering is Land’s ”retinex” theory [12]. The retinex algorithm estimates the reflectance R as the ratio of the image I(x, y) and its low pass version that serves as estimate for L(x, y). At large discontinuities in I(x, y) ”halo” effects are often visible. Jobson [13] extended the algorithm by combining several low-pass copies of the logarithm of I(x, y) using different cut-off frequencies for each low-pass filter. This helps to reduce halos, but does not eliminate them entirely.
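To make the preceding related-work discussion concrete, the following is a minimal single-scale retinex sketch of our own (the function name and the parameters sigma and eps are our choices, not taken from [12] or [13]): the reflectance is estimated as the ratio of the image to a Gaussian-blurred version standing in for L(x, y), and working in the log domain turns the ratio into a difference. Jobson's multi-scale variant [13] would combine several such differences computed with different cut-off frequencies.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=15.0, eps=1e-6):
    """Estimate log-reflectance as log(I) minus the log of a low-pass
    (Gaussian-blurred) copy of I that serves as the illumination estimate."""
    log_i = np.log(img + eps)
    log_l = np.log(gaussian_filter(img, sigma) + eps)   # smooth illumination estimate
    return log_i - log_l                                # log-reflectance estimate
```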
In order to eliminate the notorious halo effect, Tumblin and Turk introduced the low curvature image simplifier (LCIS) hierarchical decomposition of an image [14]. Each component in this hierarchy is computed by solving a partial differential equation inspired by anisotropic diffusion [15]. At each hierarchical level the method segments the image into smooth (low-curvature) regions while stopping at sharp discontinuities. The algorithm is computationally intensive and requires manual selection of no less than 8 different parameters.
2 The Reflectance Perception Model
Our algorithm is motivated by two widely accepted assumptions about human vision: 1) human vision is mostly sensitive to scene reflectance and mostly insensitive to the illumination conditions, and 2) human vision responds to local changes in contrast rather than to global brightness levels. These two assumptions are closely related since local contrast is a function of reflectance.
Having these assumptions in mind, our goal is to find an estimate of L(x, y) such that, when it divides I(x, y), it produces R(x, y) in which the local contrast is appropriately enhanced. In this view R(x, y) takes the place of the perceived sensation, while I(x, y) takes the place of the input stimulus. L(x, y) is then called the perception gain which maps the input stimulus into the perceived sensation, that is:

I(x, y) \cdot \frac{1}{L(x, y)} = R(x, y) \qquad (1)

With this biological analogy, R is mostly the reflectance of the scene, and L is mostly the illumination field, but they may not be "correctly" separated in a strict physical sense. After all, humans perceive reflectance details in shadows as well as in bright regions, but they are also cognizant of the presence of shadows. From this point on, we may refer to R and L as reflectance and illuminance, but they are to be understood as the perceived sensation and the perception gain, respectively.
Fig. 1. (a) Compressive logarithmic mapping emphasizes changes at low stimulus levels and attenuates changes at high stimulus levels. (b) Discretization lattice for the PDE in Equation (5)

To derive our model, we turn to evidence gathered in experimental psychology. According to Weber's Law, the sensitivity threshold to a small intensity change increases proportionally to the signal level [16]. This law follows from experimentation on brightness perception that consists of exposing an observer to a uniform field of intensity I in which a disk is gradually increased in brightness by a quantity ΔI. The value of ΔI at which the observer perceives the existence of the disk against the background is called the brightness discrimination threshold. Weber noticed that ΔI/I is constant for a wide range of intensity values. Weber's law gives a theoretical justification for assuming a logarithmic mapping from input stimulus to perceived sensation (see Figure 1(a)).
Due to the logarithmic mapping, when the stimulus is weak, for example in deep shadows, small changes in the input stimulus elicit large changes in perceived sensation. When the stimulus is strong, small changes in the input stimulus are mapped to even smaller changes in perceived sensation. In fact, local variations in the input stimulus are mapped to variations of the perceived sensation with the gain 1/I, that is:

I(x, y) \cdot \frac{1}{I_\Psi(x, y)} = R(x, y), \qquad (x, y) \in \Psi \qquad (2)

where I_Ψ(x, y) is the stimulus level in a small neighborhood Ψ in the input image. By comparing Equations (1) and (2) we arrive at the model for the perception gain:

L(x, y) = I_\Psi(x, y) \doteq I(x, y) \qquad (3)

where the neighborhood stimulus level is by definition taken to be the stimulus at point (x, y). As seen in Equation (4), we regularize the problem by imposing a smoothness constraint on the solution for L(x, y). The smoothness constraint takes care of producing I_Ψ; therefore, the replacement by definition of I_Ψ by I
in Equation (3) is justified. We do not need to specify any particular region Ψ. The solution for L(x, y) is found by minimizing:

J(L) = \iint_{\Omega} \rho(x, y)\,\big(L(x, y) - I(x, y)\big)^2\, dx\, dy + \lambda \iint_{\Omega} \big(L_x^2 + L_y^2\big)\, dx\, dy \qquad (4)

where the first term drives the solution to follow the perception gain model, while the second term imposes a smoothness constraint. Here Ω refers to the image domain. The parameter λ controls the relative importance of the two terms. The space varying permeability weight ρ(x, y) controls the anisotropic nature of the smoothing constraint.
The Euler-Lagrange equation for this calculus of variations problem yields:

L(x, y) - \frac{\lambda}{\rho(x, y)}\big(L_{xx}(x, y) + L_{yy}(x, y)\big) = I(x, y) \qquad (5)

Discretized on a rectangular lattice, this linear partial differential equation becomes:

L_{i,j} + \frac{\lambda}{h^2}\left[\frac{L_{i,j}-L_{i,j-1}}{\rho_{i,j-\frac{1}{2}}} + \frac{L_{i,j}-L_{i,j+1}}{\rho_{i,j+\frac{1}{2}}} + \frac{L_{i,j}-L_{i-1,j}}{\rho_{i-\frac{1}{2},j}} + \frac{L_{i,j}-L_{i+1,j}}{\rho_{i+\frac{1}{2},j}}\right] = I_{i,j} \qquad (6)

where h is the pixel grid size and the value of each ρ is taken in the middle of the edge between the center pixel and each of the corresponding neighbors (see Figure 1(b)). In this formulation, ρ controls the anisotropic nature of the smoothing by modulating permeability between pixel neighbors. Equation (6) can be solved numerically using multigrid methods for boundary value problems [17]. Multigrid algorithms are fairly efficient, having complexity O(N), where N is the number of pixels [17]. Running our non-optimized code on a 2.4 GHz Pentium 4 produced execution times of 0.17 seconds for a 320×240-pixel image, and 0.76 seconds for a 640×480-pixel image.
The smoothness is penalized at every edge of the lattice by the weights ρ (see Figure 1(b)). As stated earlier, the weights should change proportionally with the strength of the discontinuities. We need a relative measure of local contrast that will equally "respect" boundaries in shadows and in bright regions. We therefore call again upon Weber's law and modulate the weights by the Weber contrast of the two neighboring pixels: ρ_{(a+b)/2} is the weight on the edge between two neighboring pixels a and b whose intensities are Ia and Ib.¹

¹ In our experiments, equally good performance can be obtained by using Michelson's contrast, (Ia − Ib)/(Ia + Ib).
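As a concrete reading of equations (4)-(6), the sketch below assembles the sparse linear system for L on the pixel lattice (with h = 1) and divides it out of the image to obtain a reflectance estimate R. It is a hedged illustration only: a generic sparse direct solver stands in for the multigrid solver described above, the weber() edge weight is our own assumed contrast measure, and all names (solve_illumination, lam, eps) are ours rather than the authors'.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def solve_illumination(img, lam=1.0, eps=1e-3):
    """Estimate the illumination field L of eq. (6) and return R = I / L.

    img : 2-D float array (grayscale image I)
    lam : smoothness weight lambda
    """
    h, w = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = lil_matrix((n, n))
    b = img.ravel().astype(float)

    def weber(a, c):
        # assumed Weber-like contrast between two neighboring intensities
        return abs(a - c) / (min(a, c) + eps)

    for i in range(h):
        for j in range(w):
            p = idx[i, j]
            A[p, p] = 1.0                      # the L_{i,j} term of eq. (6)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    rho = weber(img[i, j], img[ii, jj]) + eps
                    wgt = lam / rho            # large contrast -> small weight -> edge preserved
                    A[p, p] += wgt
                    A[p, idx[ii, jj]] -= wgt
    L = spsolve(A.tocsr(), b).reshape(h, w)
    return img / (L + eps)                     # reflectance estimate R, eq. (1)
```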
Fig. 2. Result of removing illumination variations with our algorithm for a set of images from the PIE database (top row: original PIE images, bottom row: processed PIE images)
3 Face Recognition across Illumination
3.1 Databases and Algorithms
We use images from two publicly available databases in our evaluation: CMU PIE database and Yale database. The CMU PIE database contains a total of 41,368 images taken from 68 individuals [18]. The subjects were imaged in the CMU 3D Room using a set of 13 synchronized high-quality color cameras and 21 flashes. For our experiments we use images from the more challenging illumination set which was captured without room lights (see Figure 2).
The Yale Face Database B [6] contains 5760 single light source images of 10 subjects each seen under 576 viewing conditions: 9 different poses and 64 illumination conditions. Figure 3 shows examples for original and processed images. The database is divided into different subsets according to the angle the light source direction forms with the camera's axis (12°, 25°, 50° and 77°).
We report recognition accuracies for two algorithms: Eigenfaces (Principal Component Analysis (PCA)) and FaceIt, a commercial face recognition system from Identix. Eigenfaces [19] is a standard benchmark for face recognition algorithms [1]. FaceIt was the top performer in the Facial Recognition Vendor Test 2000 [2]. For comparison we also include results for Eigenfaces on histogram-equalized and gamma-corrected images.
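For reference, the two comparison normalizations are standard operations; one possible implementation is sketched below. The assumption of 8-bit grayscale input and the particular value of gamma are ours, as the paper does not specify them.

```python
# Sketch of the two baseline normalizations used for comparison.
import numpy as np

def histogram_equalize(img):
    """Histogram equalization of an 8-bit grayscale image."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # map CDF to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def gamma_correct(img, gamma=0.5):
    """Gamma correction of an 8-bit grayscale image (gamma value assumed)."""
    x = img.astype(np.float64) / 255.0
    return (np.power(x, gamma) * 255).astype(np.uint8)
```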
3.2 Experiments
The application of our algorithm to the images of the CMU PIE and Yale databases results in accuracy improvements across all conditions and all algorithms. Figure 4 shows the accuracies of both PCA and FaceIt for all 13 poses of the PIE database. In each pose separately the algorithms use one illumination condition as gallery and all other illumination conditions as probe. The reported
Original Yale images
Processed Yale images
Fig. 3. Example images from the Yale Face Database B before and after pro- cessing with our algorithm
(Bar charts; legend: Original, SI, Histeq, Gamma; x-axis: gallery pose; y-axis: recognition accuracy in %)
(a) PCA (b) FaceIt
Fig. 4. Recognition accuracies on the PIE database. In each pose separately the algorithms use one illumination condition as gallery and all other illumination conditions as probe. Both PCA and FaceIt achieve better recognition accuracies on the images processed with our algorithm (SI) than on the originals. The gallery poses are sorted from right profile (22) through frontal (27) to left profile (34)
results are averages over the probe illumination conditions in each pose. The performance of PCA improves from 17.9% to 48.6% on average across all poses. The performance of FaceIt improves from 41.2% to 55%. On histogram-equalized and gamma-corrected images PCA achieves accuracies of 35.7% and 19.3%, respectively.
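The evaluation protocol sketched below illustrates how the reported per-pose numbers can be computed; the helper names (extract, match) stand in for the feature extraction and matching of PCA or FaceIt and are hypothetical.

```python
# Sketch of the gallery/probe protocol for one pose: one illumination
# condition serves as gallery, all others as probes, and the reported
# value is the accuracy averaged over the probe illumination conditions.
def evaluate_pose(images, gallery_illum, extract, match):
    """images: dict mapping (subject, illum) -> image, all from one fixed pose."""
    subjects = sorted({s for s, _ in images})
    probe_illums = sorted({i for _, i in images if i != gallery_illum})
    gallery = {s: extract(images[(s, gallery_illum)]) for s in subjects}
    per_illum = []
    for illum in probe_illums:
        correct = 0
        for s in subjects:
            probe = extract(images[(s, illum)])
            best = max(subjects, key=lambda g: match(probe, gallery[g]))
            correct += (best == s)
        per_illum.append(correct / len(subjects))
    # average accuracy over the probe illumination conditions
    return sum(per_illum) / len(per_illum)
```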
Figure 5 visualizes the recognition matrix for PCA on PIE for frontal pose. Each cell of the matrix shows the recognition rate for one specific gallery/probe illumination condition. It is evident that PCA performs better in wide regions of
(a) Original (b) Histogram equalized (c) Our algorithm
Fig. 5. Visualization of PCA recognition rates on PIE for frontal pose. Gallery illumination conditions are shown on the y-axis, probe illumination conditions on the x-axis, both spanning illumination conditions from the leftmost illumination source to the rightmost illumination source
(a) PCA (b) FaceIt
Fig. 6. Recognition accuracies on the Yale database. Both algorithms used images from Subset 1 as gallery and images from Subset 2, 3 and 4 as probe. Using images processed by our algorithm (SI) greatly improves accuracies for both PCA and FaceIt
the matrix for images processed with our algorithm. For comparison the recognition matrix for histogram-equalized images is shown as well.
We see similar improvements in recognition accuracies on the Yale database. In each case the algorithms used Subset 1 as gallery and Subsets 2, 3 and 4 as probe. Figure 6 shows the accuracies for PCA and FaceIt for Subsets 2, 3 and 4. For PCA the average accuracy improves from 59.3% to 93.7%. The accuracy of FaceIt improves from 75.3% to 85.7%. On histogram equalized and gamma corrected images PCA achieves accuracies of 71.7% and 59.7%, respectively.
4 Conclusion
We introduced a simple and automatic image-processing algorithm for compensation of illumination-induced variations in images. The algorithm computes an estimate of the illumination field and then compensates for it. At a high level, the algorithm mimics some aspects of human visual perception. If desired, the user may adjust a single parameter whose meaning is intuitive and simple to understand. The algorithm delivers large performance improvements for standard face recognition algorithms across multiple face databases.
Acknowledgements
The research described in this paper was supported in part by National Science Foundation grants IIS-0082364 and IIS-0102272 and by U.S. Office of Naval Research contract N00014-00-1-0915.
References
[1] Phillips, P., Moon, H., Rizvi, S., Rauss, P.: The FERET evaluation methodology for face-recognition algorithms. IEEE PAMI 22 (2000) 1090–1104
[2] Blackburn, D., Bone, M., Phillips, P.: Facial recognition vendor test 2000: evaluation report (2000)
[3] Gross, R., Shi, J., Cohn, J.: Quo vadis face recognition? In: Third Workshop on Empirical Evaluation Methods in Computer Vision (2001)
[4] Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE PAMI 19 (1997) 711–720
[5] Belhumeur, P., Kriegman, D.: What is the set of images of an object under all possible lighting conditions? Int. J. of Computer Vision 28 (1998) 245–260
[6] Georghiades, A., Kriegman, D., Belhumeur, P.: From few to many: Generative models for recognition under variable pose and illumination. IEEE PAMI (2001)
[7] Riklin-Raviv, T., Shashua, A.: The Quotient image: class-based re-rendering and recognition with varying illumination conditions. IEEE PAMI (2001)
[8] Georghiades, A., Kriegman, D., Belhumeur, P.: Illumination cones for recognition under variable lighting: Faces. In: Proc. IEEE Conf. on CVPR (1998)
[9] Blanz, V., Romdhani, S., Vetter, T.: Face identification across different poses and illumination with a 3D morphable model. In: IEEE Conf. on Automatic Face and Gesture Recognition (2002)
[10] Horn, B.: Robot Vision. MIT Press (1986)
[11] Stockham, T.: Image processing in the context of a visual model. Proceedings of the IEEE 60 (1972) 828–842
[12] Land, E., McCann, J.: Lightness and retinex theory. Journal of the Optical Society of America 61 (1971)
[13] Jobson, D., Rahman, Z., Woodell, G.: A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. on Image Processing 6 (1997)
[14] Tumblin, J., Turk, G.: LCIS: A boundary hierarchy for detail-preserving contrast reduction. In: ACM SIGGRAPH (1999)
[15] Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE PAMI 12 (1990) 629–639
[16] Wandell, B.: Foundations of Vision. Sunderland, MA: Sinauer (1995)
[17] Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C. Cambridge University Press (1992)
[18] Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression (PIE) database. In: IEEE Int. Conf. on Automatic Face and Gesture Recognition (2002)
[19] Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1991) 71–86
Quad Phase Minimum Average Correlation Energy Filters for Reduced Memory Illumination Tolerant Face
Authentication

Marios Savvides and B.V.K. Vijaya Kumar
Electrical and Computer Engineering Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
[email protected] [email protected]
Abstract. In this paper we propose reduced-memory biometric filters for performing distortion-tolerant face authentication. The focus of this research is on implementing authentication algorithms on small form-factor devices with limited memory and computational resources. We compare the full-complexity minimum average correlation energy filters for performing illumination-tolerant face authentication with our proposed quad phase minimum average correlation energy filters [1] utilizing a Four-Level correlator. The proposed scheme requires only 2 bits per frequency in the frequency domain, achieving a compression ratio of up to 32:1 for each biometric filter while still attaining very good verification performance (100% in some cases). The results we show are based on the illumination subsets of the CMU PIE database [2] with 65 people and 21 facial images per person.
1 Introduction
Biometric authentication systems are actively being researched for access control, and a growing interest is emerging in integrating these systems into small form-factor devices such as credit cards, PDAs, cell phones and other devices with limited memory and computational resources, with memory being the most costly resource in such systems.
Traditional correlation-filter-based methods have not been favored in pattern recognition, mainly because the filters employed were matched filters [3]. This meant that as many filters as training images were used, leading to a large amount of memory for storing these filters, and, more importantly, one would have to perform cross-correlation with each of the training images (or matched filters) for each test image. Clearly this is computationally expensive and requires large memory resources.
Fig. 1. Correlation schematic block diagram. A single correlation filter is synthesized from many training images and stored directly in the frequency domain. FFTs are used to perform cross-correlation quickly and the correlation output is examined for sharp peaks
Recent work using advanced correlation filter designs has shown success in performing face authentication in the presence of facial expressions [4][5]. Advanced correlation filters [6] such as the minimum average correlation energy (MACE) filters [1] synthesize a single filter template from a set of training images and produce sharp, distinct correlation peaks for the authentic class and no discernible peaks for impostor classes. MACE filters are well suited to applications where high discrimination is required. In authentication applications we are typically given only a small number of training images. These are used to synthesize a single MACE filter. This MACE filter will typically produce sharp, distinct peaks only for the class it has been trained on, and will automatically reject any other classes without any a priori information about the impostor classes. Previous work applying these types of filters for eye detection can be found in [7].
1.1 Minimum Average Correlation Energy Filters
Minimum Average Correlation Energy (MACE) [1] filters are synthesized in closed form by optimizing a criterion function that seeks to minimize the average correlation energy resulting from cross-correlations with the given training images while satisfying linear constraints that provide a specific peak value at the origin of the correlation plane for each training image. In doing so, the resulting correlation outputs from the training images resemble 2D delta-type outputs, i.e. sharp peaks at the origin with values close to zero elsewhere. The position of the detected peak also provides the location of the recognized object.
The MACE filter is given in the following closed form equation:
\mathbf{h} = \mathbf{D}^{-1} \mathbf{X} \left( \mathbf{X}^{+} \mathbf{D}^{-1} \mathbf{X} \right)^{-1} \mathbf{u} \qquad (1)
Assuming that we have N training images, X in Eq. (1) is an LxN matrix, where L is the total number of pixels of a single training image (L = d1 x d2). X contains the Fourier transforms of each of the N training images, lexicographically reordered and placed along each column. D is a diagonal matrix of dimension LxL containing the average power spectrum of the training images, lexicographically reordered and placed along its diagonal. u is a column vector with N elements, containing the corresponding desired peak values at the origin of the correlation plane of the
training images. The MACE filter is formulated directly in the frequency domain for efficiency. Note that + denotes complex conjugate transpose. Also, h is a column vector that needs to be lexicographically reordered to form the 2-D MACE filter. In terms of memory requirements, h is typically stored as complex values with 32 bits each for the real and imaginary parts. For example, for 64x64-pixel images a single MACE filter array stored in the frequency domain requires 2 x 32 x 64 x 64 bits, or about 32 KB, as shown in Fig. 1.
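A minimal sketch of Eq. (1) using FFTs is given below; the function names, the default choice of u = 1 for all training images, and the small regularizer added to D are our own and are only meant to illustrate the closed-form synthesis.

```python
# Sketch of MACE filter synthesis (Eq. (1)) and frequency-domain correlation.
import numpy as np

def synthesize_mace(train_images, u=None, eps=1e-8):
    """Synthesize a MACE filter from a list of equally sized 2-D training
    images; returns the 2-D filter in the frequency domain."""
    shape = train_images[0].shape
    N = len(train_images)
    # X: L x N matrix of lexicographically reordered 2-D FFTs
    X = np.stack([np.fft.fft2(img).ravel() for img in train_images], axis=1)
    d = np.mean(np.abs(X) ** 2, axis=1) + eps        # diagonal of D
    if u is None:
        u = np.ones(N)                               # desired peaks at the origin
    Dinv_X = X / d[:, None]                          # D^{-1} X
    A = X.conj().T @ Dinv_X                          # X^{+} D^{-1} X  (N x N)
    h = Dinv_X @ np.linalg.solve(A, u)               # Eq. (1)
    return h.reshape(shape)

def correlate(test_image, H):
    """Frequency-domain cross-correlation of a test image with filter H."""
    F = np.fft.fft2(test_image)
    return np.real(np.fft.ifft2(F * np.conj(H)))
```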
1.2 Peak-to-Sidelobe Ratio (PSR) Measure
The Peak-to-Sidelobe Ratio (PSR) is a metric used to test whether a test image belongs to the authentic class. First, the test image is cross-correlated with the synthesized MACE filter; then the resulting correlation output is searched for the peak correlation value. A rectangular region (we use 20x20 pixels) centered at the peak is extracted and used to compute the PSR as follows. A 5x5 rectangular region centered at the peak is masked out, and the remaining annular region, defined as the sidelobe region, is used to compute the mean and standard deviation of the sidelobes. The peak-to-sidelobe ratio is given as follows:
\mathrm{PSR} = \frac{\mathrm{peak} - \mathrm{mean}}{\sigma} \qquad (2)
The peak-to-sidelobe ratio measures the peak sharpness in a correlation output, which is exactly what the MACE filter tries to optimize; hence the larger the PSR, the more likely the test image belongs to the authentic class. It is also important to realize that the authentication decision is not based on a single projection but on many projections, which should produce a specific response in order to belong to the authentic class: the peak value should be large, and the neighboring correlation values, which correspond to projections of the MACE point spread function with shifted versions of the test image, should yield values close to zero. Another important property of the PSR metric is that it is invariant to any uniform scale change in illumination. This can easily be verified from Eq. (2), as multiplying the test image by any constant scale factor can be factored out of the peak, mean and standard deviation terms and cancels out.
Fig. 2. Peak-to-sidelobe ratio computation uses a 20x20 region of the correlation output centered at the peak
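The PSR computation described above can be sketched as follows; the window and mask sizes default to the 20x20 and 5x5 regions mentioned in the text, and the small constant added to the standard deviation is ours.

```python
# Sketch of the PSR computation of Eq. (2).
import numpy as np

def peak_to_sidelobe_ratio(corr, window=20, mask=5):
    """PSR of a 2-D correlation output: a window x window region around the
    peak, with the central mask x mask area excluded, forms the sidelobes."""
    pi, pj = np.unravel_index(np.argmax(corr), corr.shape)
    peak = corr[pi, pj]
    half = window // 2
    r0, c0 = max(pi - half, 0), max(pj - half, 0)
    region = corr[r0:pi + half, c0:pj + half]
    keep = np.ones(region.shape, dtype=bool)
    m = mask // 2
    ci, cj = pi - r0, pj - c0                     # peak position inside the window
    keep[max(ci - m, 0):ci + m + 1, max(cj - m, 0):cj + m + 1] = False
    sidelobes = region[keep]
    return (peak - sidelobes.mean()) / (sidelobes.std() + 1e-12)
```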
2 Quad Phase MACE Filters – Reduced Memory Representation
It is well known that in the Fourier domain, phase information is more important than magnitude for performing image reconstruction [8][9]. Since phase contains most of the intelligibility of an image, and can be used to retrieve the magnitude information, we propose to reduce the memory storage requirement of MACE filters by preserving and quantizing the phase of the filter to 4 levels. Hence the resulting filter will be named the Quad-Phase MACE filter, where each element in the filter array takes on ±1 for the real component and ±j for the imaginary component in the following manner:
\mathrm{Re}\{H_{QP}(u, v)\} = \begin{cases} +1 & \mathrm{Re}\{H(u, v)\} \ge 0 \\ -1 & \mathrm{Re}\{H(u, v)\} < 0 \end{cases}, \qquad \mathrm{Im}\{H_{QP}(u, v)\} = \begin{cases} +1 & \mathrm{Im}\{H(u, v)\} \ge 0 \\ -1 & \mathrm{Im}\{H(u, v)\} < 0 \end{cases} \qquad (3)
Essentially, 2 bits per frequency are needed to encode the 4 phase levels, namely π/4, 3π/4, 5π/4 and 7π/4. Details on partial-information filters can be found in [10].
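A sketch of the quantization of Eq. (3) is given below; the packing helper is our own illustration of how the two sign bits per frequency could be stored.

```python
# Sketch of the quad-phase quantization of Eq. (3).
import numpy as np

def quad_phase(H):
    """Quantize a complex frequency-domain array to the four phases
    pi/4, 3pi/4, 5pi/4, 7pi/4, represented as (+/-1) + (+/-1)j."""
    real_sign = np.where(H.real >= 0, 1.0, -1.0)
    imag_sign = np.where(H.imag >= 0, 1.0, -1.0)
    return real_sign + 1j * imag_sign

def pack_quad_phase(Hq):
    """Pack the two sign bits per frequency for 2-bit-per-frequency storage."""
    real_bits = (Hq.real > 0).astype(np.uint8)
    imag_bits = (Hq.imag > 0).astype(np.uint8)
    return np.packbits(real_bits), np.packbits(imag_bits)
```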
2.1 Four-Level Correlator – Reduced Complexity Correlation
The QP-MACE filter described in Eq. (3) has unit magnitude at all frequencies, encoding 4 phase levels. In order to produce sharp correlation outputs (that resemble delta-type outputs), the phase should cancel out when the QP-MACE filter is multiplied with the conjugate of the Fourier transform of the test image, so as to provide a large peak. The only way the phases will cancel out is if the Fourier transform of the test image is also phase-quantized in the same way, so that the phases cancel to produce a large peak at the origin. Therefore, in the described architecture we also propose to quantize the Fourier transform of the test images as in Eq. (3).
Fig. 3. Correlation outputs: (left) full-phase MACE filter (Peak = 1.00, PSR = 66); (right) Quad-Phase MACE filter using the Four-Level Correlator (Peak = 0.97, PSR = 48)
Fig. 4. Sample images of Person 2 from the Illumination subset of PIE database captured with no background lighting
This effectively results in a Four-Level correlator in the frequency domain, where multiplication involves only sign changes, thus partly reducing the computational complexity of the correlation block in the authentication process. Obtaining the quad-phase MACE filters and quad-phase Fourier transform arrays is achieved very simply: we do not need to implement the if...then branches shown in Eq. (3); we need only extract the sign bit from each element in the array.
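Putting the pieces together, a sketch of verification with the Four-Level correlator follows; it reuses quad_phase() and peak_to_sidelobe_ratio() from the sketches above, and the particular threshold value is illustrative rather than taken from the paper.

```python
# Sketch of verification with the Four-Level correlator: both the stored
# QP-MACE filter and the FFT of the test image are sign-quantized, so the
# frequency-domain product reduces to sign manipulations.
import numpy as np

def verify(test_image, Hq, threshold=12.0):
    """Return (accepted, psr) for a test image and a quad-phase MACE filter Hq."""
    Fq = quad_phase(np.fft.fft2(test_image))     # quantize the test spectrum too
    corr = np.real(np.fft.ifft2(Fq * np.conj(Hq)))
    psr = peak_to_sidelobe_ratio(corr)
    return psr > threshold, psr
```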
3 Experiments Using CMU PIE Database
For applications such as face authentication, we can assume that the user will be cooperative and that he/she will be willing to provide a suitable face pose in order to be verified. However, illumination conditions cannot be controlled, especially for outdoor authentication. Therefore our focus in this paper is to provide robust face authentication in the presence of illumination changes. To test our proposed method, we used the illumination subset of the CMU PIE database, containing 65 people, each with 21 images captured under varying illumination conditions. There are two sessions of this dataset, one captured with background lights on (easier dataset) and another captured with no background lights (harder dataset). The face images were extracted and normalized for scale using selected ground-truth feature points provided with the database. The resulting face images used in our experiments were of size 100x100 pixels.
We selected 3 training images from each person to build their filter. The images selected were those of extreme lighting variation, namely images 3, 7 and 16 shown in Fig. 4. The same image numbers were selected for every one of the 65 people, and a single MACE filter was synthesized for each person from those images using Eq. (1); similarly, a reduced-memory Quad-Phase MACE filter was also synthesized using Eq. (3). For each person's filter, we performed cross-correlation with the whole dataset (65*21 = 1365 images) to examine the resulting PSRs for images from that person and all the other impostor faces. This was repeated for all people (a total of 88,725 cross-correlations), for each of the two illumination datasets (with and without
background lighting). We have observed in these results that there is a clear margin of separation between the authentic class and all other impostors (shown as the bottom line plot depicting the maximum impostor PSR among all impostors) for all 65 people, yielding 100% verification performance for both the full-complexity MACE filters and the reduced-memory Quad-Phase MACE filters. Figure 5 shows a PSR comparison plot for both types of filters for Person 2 on the dataset that was captured with background lights on (note that this plot is representative of the comparison plots of the other people in the database). Since this is the easier illumination dataset, it is reasonable that the authentic PSRs are very high in comparison to Figure 6, which shows the comparison plot for the harder dataset captured with no lights on. The 3 distinct peaks shown are those that belong to the 3 training images (3, 7, 16) used to synthesize Person 2's filter. We observe that while there is a degradation in PSR performance using QP-MACE filters, this degradation is non-linear: the PSR degrades more for the very large PSR values of the full-complexity MACE filters (but still provides a large margin of separation from the impostor PSRs), whereas this is not the case for low PSR values resulting from the original full-complexity MACE filters. We see that for the impostor PSRs, which are in the 10 PSR range and below, QP-MACE achieves very similar performance to the full-phase MACE filters.
Another very important observation that was consistent throughout all 65 people is that the impostor PSRs are consistently below some threshold (e.g. 12 PSR). This observed upper bound holds irrespective of illumination or facial expression change, as reported in [4]. This property makes MACE-type correlation filters ideal for verification, as we can select a fixed global threshold above which the user gets authorized, irrespective of what type of distortion occurs, and even irrespective of the person to be authorized. In contrast, this property does not hold in other approaches such as traditional Eigenface or IPCA methods, whose residue or distance to face space is highly dependent on illumination changes.
Fig. 5. PSR plot for Person 2 comparing the performance of full-complexity MACE filters and the reduced-complexity Quad Phase MACE filter using the Four-Level Correlator on the easier illumination dataset that was captured with background lights on
Fig. 6. PSR plot for Person 2 comparing the performance of full-complexity MACE filters and the reduced-complexity Quad Phase MACE filter using the Four-Level Correlator on the harder illumination dataset that was captured with background lights off
Fig. 7. (left) PSF of Full-Phase MACE filter (right) PSF of QP-MACE filter
Examining the point spread functions (PSF) of the full-phase MACE filter and the QP-MACE filter shows that they are very similar, as shown in Fig. 7 for Person 2. Since the magnitude response of the QP-MACE filter is unity at all frequencies, it effectively acts as an all-pass filter. This explains why we are able to see more salient features (lower spatial frequency features) of the face, while in contrast the full-complexity MACE filter emphasizes higher spatial frequencies; hence we see only edge outlines of the mouth, nose, eyes and eyebrows. MACE filters work as well as they do in the presence of illumination variations because they emphasize higher spatial frequency features such as the outlines of the nose, eyes and mouth, their sizes, and the relative geometrical structure between these features on the face. The majority of illumination variations affect the lower spatial frequency content of images, and these frequencies are attenuated by the MACE filters; hence the output is largely unaffected. Shadows, for example, will introduce new features that have higher spatial frequency content; however, MACE filters look at the whole image and do not focus on any one single feature, so these types of filters provide a graceful degradation in performance as more distortions occur.
4 Conclusions
We have shown that our proposed Quad Phase MACE (QP-MACE) filters perform comparably to the full-complexity MACE filters, achieving 100% verification rates on both illumination datasets of the CMU PIE database using only 3 training images. These Quad-Phase MACE filters occupy only 2 bits per frequency (essentially 1 bit each for the real and imaginary components). Assuming that the full-phase MACE filter stores 32 bits each for the real and imaginary parts, occupying 64 bits per frequency, the proposed Quad-Phase MACE filters require only 2 bits per frequency, achieving a compression ratio of up to 32:1. A 64x64-pixel biometric filter will then require only 64 x 64 x 2 bits = 1 KB of memory for storage, making this scheme ideal for implementation on limited-memory devices.
This research is supported in part by SONY Corporation.
References
[1] A. Mahalanobis, B.V.K. Vijaya Kumar, and D. Casasent: Minimum average correlation energy filters. Appl. Opt. 26, pp. 3633-3640, 1987.
[2] T. Sim, S. Baker, and M. Bsat: The CMU Pose, Illumination, and Expression (PIE) Database of Human Faces. Tech. Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, January 2001.
[3] A. VanderLugt: Signal detection by complex spatial filtering. IEEE Trans. Inf. Theory 10, pp. 139-145, 1964.
[4] M. Savvides, B.V.K. Vijaya Kumar, and P. Khosla: Face verification using correlation filters. Proc. of Third IEEE Automatic Identification Advanced Technologies, Tarrytown, NY, pp. 56-61, 2002.
[5] B.V.K. Vijaya Kumar, M. Savvides, K. Venkataramani, and C. Xie: Spatial frequency domain image processing for biometric recognition. Proc. of Intl. Conf. on Image Processing (ICIP), Rochester, NY, 2002.
[6] B.V.K. Vijaya Kumar: Tutorial survey of composite filter designs for optical correlators. Applied Optics 31, 1992.
[7] R. Brunelli and T. Poggio: Template matching: matched spatial filters and beyond. Pattern Recognition, Vol. 30, No. 5, pp. 751-768, 1997.
[8] S. Unnikrishna Pillai and B. Elliott: Image reconstruction from one bit of phase information. Journal of Visual Communication and Image Representation, Vol. 1, No. 2, pp. 153-157, 1990.
[9] A. V. Oppenheim and J. S. Lim: The importance of phase in signals. Proc. IEEE 69, pp. 529-532, 1981.
[10] B. V. K. Vijaya Kumar: A Tutorial Review of Partial-Information Filter Designs for Optical Correlators. Asia-Pacific Engineering Journal (A), Vol. 2, No. 2, pp. 203-215, 1992.
Component-Based Face Recognition
Jennifer Huang1, Bernd Heisele1,2, and Volker Blanz3
1 Center for Biological and Computational Learning, M.I.T., Cambridge, MA, USA [email protected]
2 Honda Research Institute US, Boston, MA, USA [email protected]
3 Computer Graphics Group, Max-Planck-Institut, Saarbrücken, Germany [email protected]
Abstract. We present a novel approach to pose and illumination invariant face recognition that combines two recent advances in the computer vision field: component-based recognition and 3D morphable models. First, a 3D morphable model is used to generate 3D face models from three input images of each person in the training database. The 3D models are rendered under varying pose and illumination conditions to build a large set of synthetic images. These images are then used to train a component-based face recognition system. The resulting system achieved 90% accuracy on a database of 1200 real images of six people and significantly outperformed a comparable global face recognition system. The results show the potential of the combination of morphable models and component-based recognition towards pose and illumination invariant face recognition based on only three training images of each subject.
1 Introduction
The need for a robust, accurate, and easily trainable face recognition system becomes more pressing as real-world applications such as biometrics, law enforcement, and surveillance continue to develop. However, extrinsic imaging parameters such as pose, illumination and facial expression still cause much difficulty in accurate recognition. Recently, component-based approaches have shown promising results in various object detection and recognition tasks such as face detection [7, 4], person detection [5], and face recognition [2, 8, 6, 3].
In [3], we proposed a Support Vector Machine (SVM) based recognition system which decomposes the face into a set of components that are interconnected by a flexible geometrical model. Changes in the head pose mainly lead to changes in the positions of the facial components, which could be accounted for by the flexibility of the geometrical model. In our experiments, the component-based system consistently outperformed global face recognition systems in which classification was based on the whole face pattern. A major drawback of the system was the need for a large number of training images taken from different viewpoints
and under different lighting conditions. These images are often unavailable in real-world applications.
In this paper, the system is further developed through the addition of a 3D morphable face model to the training stage of the classifier. Based on only three images of a person's face, the morphable model allows the computation of a 3D face model using an analysis-by-synthesis method [1]. Once the 3D face models of all the subjects in the training database are computed, we generate arbitrary synthetic face images under varying pose and illumination to train the component-based recognition system.
The outline of the paper is as follows: Section 2 briefly explains the generation of 3D head models. Section 3 describes the component-based face detector trained from the synthetic images. Section 4 describes the component-based face recognizer, which was trained from the output of the component-based face detection unit. Section 5 presents the experiments on component-based and global face recognition. Finally, Section 6 summarizes results and ou