
Robust Ear Identification using Sparse Representation of

Local Texture Descriptors

Ajay Kumar, Tak-Shing T. Chan

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Abstract: Automated personal identification using localized ear images has a wide range of civilian and law-enforcement applications. This paper investigates a new approach to the more accurate ear recognition and verification problems using the sparse representation of local gray-level orientations. We exploit the computational simplicity of the localized Radon transform for robust ear shape representation and also investigate the effectiveness of local curvature encoding using a Hessian based feature representation. The ear representation problem is modeled as a sparse coding solution based on a multi-orientation Radon transform dictionary, whose solution is computed using a convex optimization approach. We also study the nonnegative formulation of this problem, to address the limitations of the l1-regularized optimization problem, in the sparse representation of localized ear features. The log-Gabor filter based approach and the localized Radon transform based feature representation have been used as baseline algorithms to ascertain the effectiveness of the proposed approach. We present experimental results from the publicly available UND and IITD ear databases which achieve significant improvement in performance, both for the recognition and the authentication problems, and confirm the usefulness of the proposed approach for more accurate ear identification.

1. Introduction

The identification of humans by fellow humans has been key to the fabric of our society and has matured with the evolution of mankind. We have been identifying humans from their voice, appearance or gait for thousands of years. However, the systematic approach to scientifically identify humans is believed to have begun in the 19th century, when Alphonse Bertillon introduced the idea of using a number of anthropometric measurements to identify criminals. Personal identification using unique physiological and behavioral characteristics is now increasingly employed in a variety of commercial and forensic applications. The face, fingerprint and iris have emerged as the three most popular biometric


modalities employed for the automated management of human identities. It is generally believed that there is no single universal or superior biometric modality; each modality has its unique applications, deployment advantages and imaging requirements. Iris recognition has been shown to offer higher accuracy but requires a constrained (near-infrared) imaging environment and suffers from a high failure-to-enroll rate [35]. Visually impaired persons cannot use iris and retina based biometric systems. Fingerprint recognition is widely employed in law-enforcement, border/immigration crossing and commercial applications. The NIST report [30] to the US Congress has stated that ~2% of the population (elderly, manual laborers, etc.) does not have usable fingerprints. Similar conclusions have also been reported in a recently released large-scale proof-of-concept study conducted by UIDAI [65], which estimates that ~1.9% of subjects cannot be reliably authenticated using their fingerprints. Fingerprint identification can also be rendered useless for the physically challenged population who do not have fingers. There are new challenges to fingerprint systems, as asylum seekers and criminals have been shown to successfully evade the deployed fingerprint identification technologies using surgically altered fingerprints [44]. Face recognition technologies also have several performance limitations and suffer from degradation due to varying makeup, pose, expression and aging. The enhanced research efforts on face recognition technologies in the last decade have significantly improved the accuracy of currently available face recognition systems. These technologies are now grappling with new challenges emerging from cosmetic/plastic surgery and face spoofing [45]-[46]. Therefore further research efforts are required to exploit the potential of other emerging biometric modalities which can also be conveniently and simultaneously acquired using advanced imaging sensors that are now widely available at lower cost.


Human ear images illustrate rich information embedded on a curved 3D surface, which has invited a lot of attention from forensic scientists. The morphology of the external ear is believed to be relatively stable over an acceptable period of time for biometrics and forensic applications. Several studies on the stability of the external ear, i.e., the auricle, have suggested that ear shape matures quite early while its expansion* continues but at a very slow rate. In this context, a study of age-related trends in 883 white Italian subjects (4-73 years) [62] suggests that ear length increases more than ear width. Meijerman et al. [61] have presented a detailed anthropometric study of the external ear from 1353 subjects which suggests that cartilage expansion, i.e., the difference between auricle expansion and ear lobe expansion, is greatest during early adulthood. These studies are quite useful for biometric identification and suggest that automated ear identification can benefit from template updates, whenever possible, both for relatively younger and for older individuals.

The ear images can also be simultaneously acquired during face imaging and employed to significantly improve the accuracy of face recognition. It is possible to use the ear and face as complementary pieces of information, especially in applications like surveillance, tracking, or continuous personal authentication, as the best head position for accurate ear recognition is not good for accurate face recognition. The key advantages associated with the use of 2-D ear images as a biometric modality include the relative stability of ear images under varying expressions, their relative immunity to privacy concerns, and the convenience of covertly acquiring ear images for surveillance applications. There has been steady growth in research interest to develop automated ear recognition technologies in the last decade. However, significant further efforts are required to improve ear detection,

segmentation and recognition capabilities to make a convincing case for their deployment in surveillance and other commercial applications.

* A study of 400 Japanese subjects aged 21-94 years [63] has suggested an annual increase in auricle length of 0.13 mm, while another study [54] of 206 subjects of various descent suggests an increase of 0.22 mm in the age group of 30-93 years.

Figure 1: Anatomy of external ear using a sample 2D sketch [56].

2. Related Prior Work

Automated personal identification using 2-D ear images has invited a lot of research effort in the biometrics literature. Gray-level ear images typically capture the anatomy of the external human ear (figure 1) [2], [56]. Iannarelli [1] manually attempted to classify human ear photographs into four categories, i.e., triangle, round, oval and rectangular, largely on the basis of the closed contour resulting from the shape of the helix ring and lobule of the ear. He is credited with developing a 12-point measurement scale, also referred to as the Iannarelli system, which was used to align and match ~7000 right ear images of different individuals. Commercial solutions for automated ear identification are not yet available (to the best of our knowledge) but can be highly useful in a variety of forensic and civilian applications. In this context, the US patent office has issued several patents on ear recognition methods and systems. The Sandia Corporation was issued the first US patent [59] that exploited acoustic properties of the ear for personal recognition. US patent no. 7826643 [60] describes a 3D ear recognition approach that generates an eigen-ear space from the enrollment images. Another US patent [58] describes ear identification by incorporating imaging capabilities into the


telephone, while a recent US patent [57] describes another feature extraction algorithm for 3D ear biometrics. The 3D imaging technologies currently employed in the literature use a 3D digitizer which can be bulky and also quite expensive. Therefore the focus of this work has been to exploit 2D ear images that can be conveniently acquired from a low-cost digital camera.

A variety of approaches have been explored in the literature to extract discriminant features from 2D ear images that can characterize the gray-level distribution in these images for accurate automated personal identification. These approaches can be broadly grouped into four categories based on the nature of the features extracted from the normalized ear images for matching: (i) structural approaches, (ii) subspace learning-based approaches, (iii) model-based approaches, and (iv) spectral approaches. Salient features of these approaches are outlined in Table 1, as a detailed description of these approaches is beyond the scope of this paper. A structural feature scheme generally uses well-defined geometrical features, such as the distance between the crus of helix and the ear lobe, to extract features that can describe the shape of the ear. Such approaches are quite simple to implement but often achieve limited performance, as it is quite challenging to robustly extract shape features from limited-resolution 2-D ear images and ensure sufficient discrimination in the characterization of shape features. In this context, it may be noted that the approach employed in [2], [8] uses a manual procedure to extract the structural features and ascertain the uniqueness of the ear shape. The subspace learning-based approaches use normalized ear images to construct subspaces which are built from the training data. Each unknown normalized ear image is then projected into such subspaces. The similarity of the resulting coefficients with those from the training images is used to ascertain the identity of the unknown


ear image. The subspace learning-based approaches using a global appearance-based representation (e.g., PCA [6]-[7], LDA [36]) are not robust in identifying and accommodating local image variations and are therefore not expected to match the superior performance that is often possible using nonlinear (e.g., MCPCA [4]) and local subspace feature (coefficient) matching approaches. The model-based approaches have attracted the least attention in the ear biometrics literature. Reference [3] described such an approach that develops a component-based model which is learned from the clustering of key points during the training phase.

Several applications of frequency and spatial-frequency domain features for the identification of normalized ear images have been reported in the literature.

Table 1: Classification of 2D ear identification methods into four categories

S. No.   Approach             Examples                             Reference
1        Structural           Voronoi diagram                      [2], [8]
                              Geometrical features                 [1], [21], [26]
                              Active Shape Model                   [36]
2        Subspace learning    PCA                                  [6]-[7]
                              LDA                                  [15]
                              ICA                                  [39]
                              MCPCA                                [4]
                              SIFT                                 [18], [34]
3        Model-based          Clustering of learned components     [3]
                              Multiple matcher model               [43]
                              Fractal-based encoding               [64]
4        Spectral             Fourier Descriptors                  [14]
                              1D Log-Gabor filter                  [17], [37]
                              Monogenic Log-Gabor                  [41]
                              LBP                                  [40], [42]
                              2D Gabor filter                      [37]
                              QuaternionicCode                     [41]

Such approaches can be categorized as spectral approaches and have emerged as the most popular body of work in the ear biometrics literature. The spectral approaches typically characterize the normalized ear images using a spectral-domain representation and then acquire the local phase or orientation information to generate the ear templates for matching.


The localization of the region of interest in the 2-D ear images, prior to the feature extraction process, can follow a manual or an automated approach. While the majority of work in the literature [4]-[6], [12], [14], [26] uses manually segmented ear images, there have been some promising efforts to employ automated segmentation [10], [16], [18], [37] and evaluate the ear recognition performance. The color distribution in 2-D ear images can also be exploited to improve the identification accuracy, and reference [5] has exploited such an approach using sequential forward selection of color spaces. The nearest neighbor (k-NN) classifier has the least complexity and therefore has been widely employed for feature classification, while there have been some interesting efforts to use neural-network classifiers in [12], [39]. The curved surface images of the human ear profile can provide more discriminant information for ear identification. Therefore several promising efforts that also exploit 3D range images for ear recognition have been reported [19], [23]-[24], [32] in the literature. The slow acquisition speed of 3D ear imaging devices (such as the Vivid 910 3D digitizer), their bulk, and their high cost limit their possible application for any commercial exploitation of ear biometrics technologies. This is possibly the reason that 2D ear images acquired from a conventional digital camera have attracted more attention in biometrics.

The use of different evaluation protocols, databases and numbers of subjects in the ear biometrics literature makes it very difficult to comparatively ascertain the performance. Reference [37] has recently attempted such a qualitative comparison, which suggests that spectral feature extraction approaches (e.g., log-Gabor filtering) that can exploit local phase characteristics are likely to achieve superior performance on some publicly available databases. A summary of prior work in the ear recognition literature suggests the need to look beyond the conventional phase information and identify new features which can be more


discriminative in the normalized ear images. In this context, the sparse representation of local ear shape orientations can be a promising alternative for robust feature representation and has not yet attracted the attention of researchers in the literature.

2.1 Our Work

This paper investigates a new approach for more accurate ear identification using visible-illumination 2-D ear images. Our feature extraction scheme exploits the sparse representation of finite Radon transform based local orientation information. The neighborhood relationship of gray levels in the normalized ear images is encoded as the dominant gray-level feature orientation in a local region using the localized Radon transform. Our efforts detailed in this paper to exploit the nonnegative formulation of the regularized optimization problem, to more effectively encode the sparse orientation representation, further illustrate promising improvement in accuracy for the ear identification problem. The localized Radon transform (LRT) is computationally economical and highly effective in identifying continuous line structures in 2D images, and has also been employed in finger knuckle identification [29] and palm biometrics [38]. The experimental results presented in this paper, from two publicly available databases, illustrate the superiority of the LRT based feature extraction and matching strategy over several other approaches presented in the literature for the ear verification problem. The Hessian based approach introduced in this paper can more effectively characterize the ear shape information by encoding local curvature (second-order derivative) information, i.e., the Hessian matrix, as can also be observed from our analysis in figure 3. Our experimental results presented in this paper suggest that the sparse representation of local gray-level orientations using LRT consistently outperforms the LRT based (without sparse encoding) and log-Gabor filter based approaches presented in the literature. It


may be noted that the log-Gabor filter based matching approach is used as the baseline in this work, as it has been shown to outperform [37] several other competing approaches on publicly available ear databases.

The development of any viable personal identification system for civilian applications using the ear biometric will also require automated segmentation of region-of-interest images from face profile images acquired in a contactless but relatively cooperative environment. This paper focuses on such a problem and employs gray-level ear images acquired using contactless ear imaging. The automated segmentation of the ear (region of interest) has been achieved using the approach based on morphological operators and Fourier descriptors detailed in [37]. The segmented ear images are first enhanced to normalize the influence of uneven illumination, noise and shadow. The resulting image is subjected to a new feature extraction approach using the sparse representation of local gray-level orientation information, using computationally attractive LRT operations, to generate the ear template for matching. We present extensive experimental results to ascertain the superiority of such a sparse coding solution, using a convex optimization approach, both for the ear verification and ear recognition problems.

The rest of this paper is organized as follows. We first detail the theoretical formulation of the feature extraction and matching strategies in section 3 using the sparse representation of local orientation features. This section also introduces the Hessian LRT based local orientation encoding and the nonnegative formulation of the optimization problem to avoid the loss of information. Section 4 details the experiments and results, both for the recognition and verification problems. A discussion of the key observations and analysis is presented in section 5, while section 6 summarizes the key conclusions from this paper.


3. Sparse Representation based Localized Feature Extraction

The local gray-level information in the normalized ear images often describes the ear shape and the appearance of the surface texture. One possible approach to effectively represent the distribution of such local texture information is to compute its spatial orientation across multiple scales. Such spatial orientation information can be acquired using convolution with the popular multiscale and multiorientation filters. However, the convolution with such filters, e.g., Gabor filters, ordinal filters, or second derivative of Gaussian filters, is computationally expensive. Therefore our approach is to construct an overcomplete dictionary using a set of binarized masks which are designed to recover the localized orientation

information from the normalized ear images. The elements of this dictionary are defined by D = [d_1, d_2, ..., d_k] with k > n, where n is the number of pixels in a mask. In this work, the elements of the dictionary D are introduced to estimate the spatial orientation of the dominant local gray-level appearance in one of the possible directions illustrated in figure 2. These elements are constructed from a set of points on a finite grid Z_q^2, where Z_q = {0, 1, ..., q - 1} with q a positive integer, and the corresponding lines are defined as in the following:

    L_\theta = \{(i, j) \in Z_q^2 : (j - \hat{j})\cos\theta = (i - \hat{i})\sin\theta\}, \ \theta \neq \pi/2; \qquad L_{\pi/2} = \{(i, j) \in Z_q^2 : i = \hat{i}\}        (1)

where theta in [0, pi) denotes the angle between the line L_theta and the positive x-axis, and L_theta is the line passing through the center (i_hat, j_hat) of Z_q^2 [29].

Figure 2: Estimating the dominant orientation of local gray levels in normalized ear images using binarized LRT masks in a 10 x 10 pixel region in the directions of 0, pi/6, 2pi/6, 3pi/6, 4pi/6 and 5pi/6, with a line width of 2 pixels.
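To make the construction concrete, the following is a minimal sketch (in Python/NumPy, not the authors' implementation) of how such binarized LRT masks could be generated for a q x q grid with line width w; the function name lrt_masks and the distance-to-line test are our own illustrative choices:

    import numpy as np

    def lrt_masks(q=10, w=2, n_orient=6):
        # Binarized line masks at n_orient equispaced angles in [0, pi), each
        # passing through the grid centre; a pixel belongs to a mask when its
        # perpendicular distance from the oriented centre line is at most w/2.
        ci = cj = (q - 1) / 2.0
        i, j = np.mgrid[0:q, 0:q]
        masks = []
        for t in range(n_orient):
            theta = t * np.pi / n_orient
            dist = np.abs((j - cj) * np.cos(theta) - (i - ci) * np.sin(theta))
            masks.append((dist <= w / 2.0).astype(float))
        return masks

    # Vectorized masks stacked as the columns of the dictionary D.
    D = np.column_stack([m.ravel() for m in lrt_masks()])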

The feature extraction approach investigated in this work is inspired by recent advances and exciting results in the sparse representation of biometric features [48]-[50]. We use the sparse representation to model a computationally efficient localized Radon transform (LRT) based dictionary which encodes the spatial orientation of the local gray-level relationships that constitute the localized ear shape and texture features. The sparse representation uses image patches which are uniformly sampled from the normalized (and enhanced) ear image, with adjacent patch centers separated by the size of the l x l LRT mask. The sparse representation, i.e., the coefficients alpha, of each such vectorized patch z centered at (x, y) requires the solution of the following l1-regularized optimization problem:

    \hat{\alpha} = \arg\min_{\alpha} \; \|z - D\alpha\|_2^2 + \lambda\,\|\alpha\|_1        (2)

The solution to the above optimization problem can be computed by well-studied convex optimization approaches which have been detailed in the literature [55]. We can use l1-magic [22], FOCUSS [13], or the fast iterative shrinkage-thresholding algorithm (FISTA) [25] to solve such an unconstrained optimization problem. In this work we employed FISTA as it is significantly faster. The Lipschitz constant, as defined in [25], is equal to twice the maximum eigenvalue of D^T D and can be regarded as a fixed value. The magnitude of lambda is empirically selected and fixed for all the experiments in this paper. All the negative alpha's are then clipped to zero and the resulting (clipped) coefficients from the same orientation of the LRT dictionary are summed to compute the representative orientation of the feature at (x, y). The index of the dominant orientation is binary encoded to extract the feature template. This process is repeated for every patch in the normalized ear image to extract the sparse representation of the LRT based localized orientation information. This algorithm can be summarized as follows:

Algorithm: Sparse Orientation Representation using Localized Radon Transform
Input: normalized ear image I
Output: feature template T
for each image patch z in I (centered at (x, y))
    1: Sparse Representation from LRT Dictionary
           \hat{\alpha} = \arg\min_{\alpha} \|z - D\alpha\|_2^2 + \lambda\|\alpha\|_1
    2: Clipping and Summation
           s(\theta) = \sum_{i:\ \mathrm{orientation}(d_i) = \theta} \max(\hat{\alpha}_i, 0)
    3: Feature Template using Dominant Orientation Estimation
           T(x, y) = \arg\max_{\theta} s(\theta)
end
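A compact sketch of this algorithm is given below (Python/NumPy). It assumes the dictionary D has its columns grouped by mask orientation and uses a plain FISTA loop for equation (2); it is an illustration of the steps above, not the authors' Matlab/C++ implementation, and the helper names are ours:

    import numpy as np

    def fista_l1(D, z, lam, n_iter=100):
        # FISTA for min_a ||z - D a||_2^2 + lam * ||a||_1 (Beck and Teboulle [25]).
        L = 2.0 * np.linalg.eigvalsh(D.T @ D).max()   # Lipschitz constant of the gradient
        a = np.zeros(D.shape[1]); y = a.copy(); t = 1.0
        for _ in range(n_iter):
            g = y - (2.0 / L) * (D.T @ (D @ y - z))                    # gradient step
            a_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft thresholding
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = a_new + ((t - 1.0) / t_new) * (a_new - a)              # momentum step
            a, t = a_new, t_new
        return a

    def dominant_orientation(D, patch, lam, n_orient=6):
        # Steps 1-3: sparse code the vectorized patch, clip negative coefficients,
        # sum them per orientation (columns of D grouped by orientation) and
        # return the index of the winning orientation.
        a = np.maximum(fista_l1(D, patch.ravel(), lam), 0.0)
        scores = a.reshape(n_orient, -1).sum(axis=1)
        return int(np.argmax(scores))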

The matching score between a registered feature template Q and an unknown feature template P can be computed as follows:

    S(P, Q) = \min_{x \in [-s, s],\ y \in [-s, s]} \ \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} \big( P(i, j) \oplus \bar{Q}(\bar{x}, \bar{y}) \big)        (3)

where m represents the width and n the height of the binary encoded feature templates, s denotes the maximum allowed shift, the registered feature template is represented as Q-bar with width and height expanded to m + 2s and n + 2s, and the XOR operator is the conventional exclusive-OR that generates unity when its two operands are different and zero otherwise, while

    \bar{x} = i + x + s, \qquad \bar{y} = j + y + s        (4)

    \bar{Q}(\bar{x}, \bar{y}) = \begin{cases} Q(\bar{x} - s, \bar{y} - s), & s < \bar{x} \le m + s \ \text{and} \ s < \bar{y} \le n + s \\ 0, & \text{otherwise} \end{cases}        (5)

    a \oplus b = \begin{cases} 1, & a \neq b \\ 0, & a = b \end{cases}        (6)

We divide the encoded template ear images, i.e., T(x, y), into disjoint sub-regions which are matched using equation (3). Thus our matching strategy [47] attempts to accommodate the influence of local image variations in the normalized ear images by matching the corresponding template sub-regions with a small amount of shifting. Therefore the employed approach is expected to be more robust against image variations in the normalized ear images.
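The following sketch (Python/NumPy) illustrates this shifted, XOR-based comparison for one template sub-region under our reconstruction of equation (3); the shift range s and the zero padding of the registered template are assumptions, and the sub-region splitting is omitted:

    import numpy as np

    def match_score(P, Q, s=2):
        # Minimum normalized Hamming (XOR) distance between binary templates P and Q
        # over translations of at most s pixels of the zero-padded registered template.
        m, n = P.shape
        Qbar = np.zeros((m + 2 * s, n + 2 * s), dtype=P.dtype)
        Qbar[s:s + m, s:s + n] = Q
        best = 1.0
        for dx in range(-s, s + 1):
            for dy in range(-s, s + 1):
                shifted = Qbar[s + dx:s + dx + m, s + dy:s + dy + n]
                best = min(best, float(np.mean(P != shifted)))
        return best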

3.1 Hessian LRT

We also attempted a more accurate estimation of the local texture orientations using the Hessian LRT. The proposed Hessian LRT is motivated firstly by the principle of least action in classical mechanics and secondly by the success of the Hessian phase algorithm explored in [47]. Similarly to the LRT, given a window of size l we create D rotationally equispaced paths intersecting at the center of the window, with line width w, and choose the path with the least squared curvature (also known as the curve of least energy) as the dominant orientation. Let I(x, y) represent an ear image. We rotate its Hessian by theta and project it along the rotated line. It may be noted that we are not using the Hessian eigenvalues here because the Radon transform uses Dirac's delta to remove all points except the rotated line; in other words, only the curvature along the line has meaning. Such rotated Hessians are computed as follows:

    I_{\theta\theta}(x, y) = \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix} \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}        (7)


We compute the energy of each equispaced path by separately summing the squared values of I_{theta theta} along each path/orientation and choose the orientation with the least energy. Such dominant orientations are encoded and matched in a similar manner as for the LRT discussed in the previous section. Figure 3 illustrates that the estimated gray-level neighborhood relationship, using the localized dominant orientation, can significantly differ among the four approaches considered in this work.

Figure 3: Normalized ear image samples from the (a) IITD v1 and (b) UND datasets with the typical dominant gray-level orientation estimated in a local region (blue block) using the four approaches considered in this work. The representative dominant orientation (in red) can differ among these four methods.
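A minimal sketch of this least-energy orientation selection is shown below (Python/NumPy), reusing the line masks sketched earlier; it uses simple finite differences for the Hessian entries, whereas the paper uses Farid's 7-tap differentiators (see section 5), so it is only illustrative:

    import numpy as np

    def hessian_lrt_orientation(patch, masks):
        # Directional second derivative I_tt = [c s] H [c s]^T for each rotation;
        # the energy is the sum of squared I_tt along the corresponding line mask,
        # and the orientation with the least energy (least squared curvature) wins.
        Ixx = np.gradient(np.gradient(patch, axis=1), axis=1)
        Iyy = np.gradient(np.gradient(patch, axis=0), axis=0)
        Ixy = np.gradient(np.gradient(patch, axis=0), axis=1)
        energies = []
        for t, mask in enumerate(masks):
            theta = t * np.pi / len(masks)
            c, s = np.cos(theta), np.sin(theta)
            Itt = c * c * Ixx + 2.0 * c * s * Ixy + s * s * Iyy
            energies.append(float(np.sum((Itt * mask) ** 2)))
        return int(np.argmin(energies))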

3.2 NNG

The sparse representation of the local orientation features for each normalized ear image patch is achieved by solving the regularized optimization problem depicted in equation (2) using FISTA [25] and then clipping all negative alpha's to zero. Therefore all the negative coefficients are effectively removed in such a formulation. Given this loss of information, we conjectured that it might be better to solve the nonnegative version of the problem directly:

    \hat{\alpha} = \arg\min_{\alpha \ge 0} \; \|z - D\alpha\|_2^2 + \lambda \sum_i \alpha_i .

As it turns out, this nonnegative version can be solved by an existing implementation called NNG [28], [31]. Henceforth, we


simply call such an approach NNG and also ascertain its effectiveness in the ear recognition experiments.
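As an illustration only (not the NNG implementation of [28], [31]), the nonnegative variant can be approached with a simple projected gradient loop in Python/NumPy; the solver and its parameters below are our own sketch:

    import numpy as np

    def nonneg_sparse_code(D, z, lam, n_iter=200):
        # Projected gradient sketch for  min_{a >= 0} ||z - D a||_2^2 + lam * sum(a).
        L = 2.0 * np.linalg.eigvalsh(D.T @ D).max()      # step size from the Lipschitz constant
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            grad = 2.0 * (D.T @ (D @ a - z)) + lam       # gradient of the objective for a >= 0
            a = np.maximum(a - grad / L, 0.0)            # gradient step + projection onto a >= 0
        return a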

4. Experiments and Results

The experimental results using the sparse representation of local orientation features, detailed in the previous section, are reported in this section. We performed extensive experiments with one and with two training image samples on two publicly available ear image datasets. The evaluation approach and the results are detailed in the following subsections.

4.1 Databases

We evaluated the proposed ear identification approach on two publicly available ear image databases. The first database is the UND dataset [6], from which images of 110 subjects are used in our experiments. The second dataset is the IITD ear image database [33], [37], which contains ear images from 125 subjects. The IITD database also provides a larger dataset of ear images from 221 subjects, formed by combining the 125-subject dataset with another dataset. The automated ear segmentation and enhancement approach is the same as detailed in [37] for both the UND and IITD datasets. The segmented ear images are further subjected to illumination normalization, and the resulting fixed-size grayscale images are employed for the feature extraction. Each of the two databases was also enlarged to include {0, 3, -3, 6, -6, 9, -9} degrees of rotated image samples (using bicubic interpolation) to accommodate rotational variations in the ear images. The images in figure 4 illustrate typical image samples from the two databases employed in our work.
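For illustration, such a rotation-based enlargement could be scripted as follows (Python with scipy.ndimage; order=3 spline interpolation is used here as a stand-in for the bicubic interpolation mentioned above, and the border handling is our assumption):

    from scipy.ndimage import rotate

    def enlarge_with_rotations(img, angles=(0, 3, -3, 6, -6, 9, -9)):
        # Add rotated copies of a normalized ear image to accommodate rotational variations.
        return [rotate(img, a, reshape=False, order=3, mode='nearest') for a in angles]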



Figure 4: Ear image samples from publicly available (a) IITD v1 and (b) UND dataset.

4.2 Verification Experiments

The comparative performance for the ear verification problem was first investigated using the feature extraction approaches discussed in section 3. In all the verification experiments we compare each subject to every other subject on a one-to-one basis using the corresponding matching criteria to ascertain the similarity distance. In order to ascertain the performance of the proposed approach using a minimum of one training image and also using two training images, we report experimental results from two test protocols. Test protocol A reports the average of three tests, where each of the first three images of every subject is used in turn as the test image while the remaining images of the respective subject are used as training images. Test protocol B (all-to-all) reports the average performance when every image of every subject in the dataset is used as the respective training image/sample. In each of the verification experiments, we report the average of the experimental results using the equal error rate (EER) and receiver operating characteristics (ROC). For the 125 subject IITD dataset, using protocol A with three tests, we generate 125×3 genuine comparisons and 124×125×3 imposter comparisons, and we generate 493 genuine comparisons and 124×493 imposter comparisons using protocol B. For the 221 subject IITD dataset, using protocol A with three tests, we generate 221×3 genuine comparisons and 220×221×3 imposter comparisons, while 793 genuine comparisons and 220×793 imposter comparisons are generated using protocol B.


Finally, for the UND dataset, we generate 110×3 genuine comparisons and 109×110×3 imposter comparisons using protocol A, and 433 genuine comparisons and 109×433 imposter comparisons using protocol B.
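For reference, the EER reported below can be computed from the genuine and imposter score sets as in the following sketch (Python/NumPy; it assumes, as with the matching distance of equation (3), that smaller scores indicate better matches):

    import numpy as np

    def equal_error_rate(genuine, imposter):
        # Sweep the decision threshold over all observed scores and return the point
        # where the false accept rate (FAR) and false reject rate (FRR) are closest.
        genuine, imposter = np.asarray(genuine), np.asarray(imposter)
        thresholds = np.unique(np.concatenate([genuine, imposter]))
        far = np.array([(imposter <= t).mean() for t in thresholds])
        frr = np.array([(genuine > t).mean() for t in thresholds])
        i = int(np.argmin(np.abs(far - frr)))
        return (far[i] + frr[i]) / 2.0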


Figure 5: The ROC curves from the verification experiments using protocol A on the (a) 125 subject, (b) 221 subject, and (c) UND databases.

The experimental results from the verification experiments are presented on three sets of ear data (125, 221 and UND), with the log-Gabor filter approach as the baseline. Specifically, the log-Gabor filters are used with bandwidths of 2.0310 octaves and wavelengths of 18 and


54 (these filter parameters are the same as used in [37] for the log-Gabor filters and generate the best performance). The verification results are summarized in Table 2 (best results in blue), while the corresponding receiver operating characteristics are shown in figures 5-6.

Table 2: Average equal error rate (%) from the verification experiments.

                      Protocol A                       Protocol B
              125       221       UND         125       221       UND
NNG         1.8688    2.1106    5.1473      1.6227    2.0185    5.3065
s-LRT       2.3817    2.1428    5.4518      2.0300    2.0168    5.0808
LRT         4.2677    3.0481    7.0684      3.8679    2.9070    7.0132
Hessian     5.2806    3.6285    9.8471      4.6416    3.6553    9.9731
LG          4.3183    3.1335    8.1818      4.0552    3.1274    7.3956


Figure 6: The ROC curves from the verification experiments using protocol B on the (a) 125 subject, (b) 221 subject, and (c) UND databases.


The experimental results from the verification experiments using the 125 subject ear dataset with protocol A suggest that the sparse representation approach using the nonnegative formulation (NNG) achieves the best performance (EER = 1.87%). For the 221 subject database, the best performance using protocol A is again achieved by the sparse representation using the nonnegative formulation (EER = 2.11%). It is worth noting that the performance from NNG and s-LRT is very close (ROCs in figures 5-6). However, this performance is significantly improved as compared to that achievable using the log-Gabor filter based approach. The best performance for the 125 subject dataset using protocol B is again achieved using NNG (EER = 1.62%). The corresponding best performing approaches for the 221 subject database are NNG (EER = 2.02%) and s-LRT (EER = 2.02%). For the UND dataset, NNG achieves the best results (EER = 5.15%) using protocol A, while s-LRT achieves the best results (EER = 5.08%) using protocol B. The experimental results for the verification experiments using the Hessian based approach illustrate competing performance in some cases (table 2); overall, however, it performed the poorest among the approaches considered in this work. It is worth noting that there has not been any effort in the literature to explore ear verification or recognition performance using LRT, Hessian LRT, or the sparse representation of orientation features using LRT or NNG (section 3.2). Our experimental results (tables 2-3) suggest that LRT, NNG and s-LRT consistently outperform the conventional log-Gabor filter based approach explored in the literature.

4.3 Recognition Experiments

In all the recognition experiments, we follow the same two protocols as discussed in section 4.2 on the same three sets of data. Test protocol A generates the average recognition accuracy from three tests, where each of the first three images of every subject is used in turn as the probe image


while the remaining respective images are used as gallery images. Test protocol B generates the average recognition performance when every image of every subject is used as the gallery while the other respective images of that subject are used as the probe images. The average rank-one recognition rate (R1RR) from all the experiments is reported in Table 3 (best results in blue). The cumulative match characteristic (CMC) curves from each of the recognition experiments are illustrated in figures 7-8.
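The rank-one recognition rate reported below can be computed from a probe-versus-gallery distance matrix as in this short sketch (Python/NumPy, illustrative only):

    import numpy as np

    def rank_one_rate(dist, probe_labels, gallery_labels):
        # A probe is correctly recognized when its nearest gallery template
        # (smallest matching distance) carries the same identity.
        nearest = np.asarray(gallery_labels)[np.argmin(dist, axis=1)]
        return float(np.mean(nearest == np.asarray(probe_labels)))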


Figure 7: The CMC curves from the recognition experiments using protocol A on the (a) 125 subject, (b) 221 subject, and (c) UND databases.



Figure 8: The CMC curves from the recognition experiments using protocol B on the (a) 125 subject, (b) 221 subject, and (c) UND databases.

The recognition experiments using protocol A and protocol B on all the datasets considered in this work illustrate that the sparse representation of LRT features (s-LRT) achieves the best performance, i.e., rank-one recognition accuracy, among all the approaches considered in this work. The experimental results shown in table 3 suggest that the sparse representation using NNG achieves performance competing with that of s-LRT, which is significantly better than that from the LRT, Hessian, and log-Gabor filter based approaches. It is interesting to note that


the LRT achieves the worst performance on the UND database using both protocol A and protocol B. However, the performance from LRT on the 221 subject database is better than that from Hessian using protocol A and also using protocol B. The best performing approach in the recognition experiments (s-LRT) is also robust against changes in the test protocol. Our experimental results for both the recognition and verification experiments suggest that the sparse representation of local features using LRT achieves significantly better performance than the baseline (log-Gabor and LRT) approaches considered in this work.

Table 3: Average rank-one recognition rate from the experiments.

                      Protocol A                        Protocol B
              125        221        UND        125        221        UND
NNG         96.8000    96.6817    91.5152    97.1602    96.9735    92.3788
s-LRT       97.0667    97.7376    91.5152    97.5659    97.8562    92.6097
LRT         94.4000    94.2685    84.8485    94.9290    94.4515    84.7575
Hessian     95.4667    93.8160    86.0606    95.9432    94.0731    85.9122
LG          95.7333    94.7210    87.8788    95.7404    94.8298    88.9145

5. Discussion

There has not been any prior effort to exploit the sparse representation of local gray-level information, or even the local Radon transform, for the ear identification problem, while the log-Gabor filter-based approach has shown its superiority [37] over the Gabor phase, force field transform, Gabor orientation and geometrical feature-based approaches on publicly available ear databases. We therefore employed the log-Gabor filter-based phase encoding approach and also the localized Radon transform based orientation encoding approach as the baseline approaches against which to evaluate the sparse representation based approach developed in this paper. The experimental results from the verification and recognition experiments have consistently suggested the superiority of the sparse representation of orientation features using LRT.


The goals of the verification and the recognition problems are different [51]. Therefore any specific feature extraction or matching approach that performs better for the verification problem may not necessarily be the best performing approach for the recognition problem. Our experimental results (table 2) from the verification experiments also suggest that LRT itself can achieve better performance than the log-Gabor filter based approach while achieving competing or similar performance (table 3) in the recognition experiments. In this context, it may be noted that the LRT is significantly attractive because of its computational simplicity [29]. Therefore, in the case of competing or similar performance from the log-Gabor and LRT based approaches, the LRT based approach is preferred primarily due to its computational simplicity.

The use of such an LRT based dictionary brings two key advantages for the sparse representation based ear identification. Firstly, it significantly reduces the computational complexity, and this advantage can be explained as follows. The computational complexity of spatial filters, e.g., Gabor filters, derivative of Gaussian filters, etc., can be significant as they require expensive pixel-based convolution operations. On the other hand, the LRT based feature extractor requires only simple summation operations, which significantly reduces the computational complexity during the feature extraction process. If we can reasonably assume† that the complexity of addition and multiplication is equivalent, then we can state that the LRT (with line width w) based sparse representation employed in this paper requires 2Rw times fewer operations (additions/multiplications) as compared to those approaches using spatial filters in the sparse representation literature. In our experiments we also noted that this reduction in complexity comes with superior performance.

† Exact computational complexity of multiplication is higher than that of addition.


Figure 9 below reproduces sample results from our experiments illustrating the superior performance relative to the case when a second derivative of Gaussian (s-DoG) dictionary is employed. These results were obtained using two versions of s-DoG: an initial size of 33×33 with (σx, σy) pairs of (2.4, 6.0), (2.7, 6.75), (3.0, 7.5), and a scaled-down version of size 22×22 with (σx, σy) pairs of (1.4, 4.0), (1.8, 4.5), (2.0, 5). The choice of the 22×22 window is informed by our success in utilizing the best among various window sizes. The experimental procedures and protocols are the same as in Section 4. The experimental results suggested significantly improved performance with the sparse representation using LRT as compared to that from s-DoG.


Figure 9: Comparative ROC‡ using s-DoG filters for (a) protocol A and (b) protocol B.

‡ All the experiments in this figure use three enlargements simply to reduce/conserve computations.

The second key advantage of the LRT based feature extraction strategy pursued in this paper is that it significantly reduces the template size. This reduction results from the nature


of the LRT operations, which have an associated down-sampling effect that reduces the dimension of the extracted features relative to the size of the normalized ear images. For example, if the size of the normalized ear images is M × N pixels, the template size will be (M/w) × (N/w). Therefore the template size reduces by a factor of w² relative to the size of the normalized/original ear image. It may be noted that in most cases the line-like features (representing a piecewise linear approximation of the curved ear lines) in the normalized ear images are more than one pixel wide, which guarantees significant computational savings in most cases. It may also be noted that the reduction in the template size not only reduces the storage requirements but also improves the matching time, i.e., the time required for authentication or identification, as the smaller templates can more efficiently generate the matching scores.

The sparse representation using NNG was expected to be better than s-LRT, but in our experiments it did not always outperform s-LRT (their ROC curves are quite similar). It is worth noting that for real variables the l1-minimization problem min ||α||_1 subject to Dα = z is equivalent to the nonnegative version of the same problem min Σ_i β_i subject to [D, −D]β = z, β ≥ 0; here the optimal α can be obtained as α = [I, −I]β and vice versa [11]. This means that FISTA is equivalent to NNG with a negated version of the dictionary appended to the original dictionary (modulo sign changes). This may explain why the performance using the s-LRT and NNG based approaches can be similar.
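A short derivation of this equivalence (a standard variable-splitting argument; the split variables β⁺, β⁻ below are our notation, not taken from [11]):

    \min_{\alpha} \|\alpha\|_1 \ \text{s.t.}\ D\alpha = z
    \quad\Longleftrightarrow\quad
    \min_{\beta \ge 0} \sum_i \beta_i \ \text{s.t.}\ [D,\,-D]\,\beta = z,

    \text{since } \alpha = \beta^{+} - \beta^{-},\ \beta^{\pm} \ge 0,\ \beta^{+}_i \beta^{-}_i = 0 \text{ at the optimum}
    \ \Rightarrow\ \|\alpha\|_1 = \sum_i (\beta^{+}_i + \beta^{-}_i),\quad
    D\alpha = [D,\,-D]\begin{bmatrix}\beta^{+}\\ \beta^{-}\end{bmatrix},\quad
    \text{and conversely } \alpha = [I,\,-I]\,\beta.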

Despite its elegant differential-geometric formulation to exploit the least squared curvature from ear images, the Hessian LRT based approach could not achieve competing performance in our experiments. There are some plausible explanations for this failure: first, the least-energy curve might not be a good model for ear shape lines; second, Hessians might not be robust to noise; and third, our proposal is equivalent to the trace transform with a


curvature functional, and it has often been conjectured that the trace transform fails to work [52] for anything other than the Radon functional. The sparse coefficients (equation 2) are computed with Matlab implementations, while the template matching scores (equation 3) are computed with a C++ implementation. We employed Farid's [53] 7-tap second-order differentiator (as implemented by Kovesi [20]) to compute the Hessians. Farid's differentiators are especially optimized for rotations, so we do not even need smoothing (e.g., the force field transform [10]) for good results.

Figure 10: Comparative ROC from quaternionic quadrature filters using protocol A.

We have recently developed the QuaternionicCode representation [41] for ear recognition using quadrature filters and achieved promising results. We have compared the approach proposed in this paper with those using such quadrature filters under the same protocols (all the parameters and settings are the same as illustrated in [41]) and consistently achieved superior results, both for the verification and recognition problems. Some sample results from the comparative experiments are reproduced in figure 10. It can also be observed from figure 9 that the sparse representation based approach developed in this paper can achieve significantly better performance than that in [37].


The experimental results illustrated in section 4 consistently suggest superior recognition results using the feature encoding based on the sparse representation of local pixel orientations (NNG and s-LRT) as compared to those using the conventional 1-D log-Gabor filter based approach, and these are therefore recommended for ear recognition and also for verification.

6. Conclusions

This paper has developed a new approach for personal identification using 2D ear images. This approach has been motivated by the recent advances in the sparse representation of biometric features using convex optimization. The ear shape in sparse image representations can be represented by large coefficients that locally represent some oriented ear shape structures. In order to efficiently compute these coefficients on prior structures, a Radon transform based dictionary of oriented directions is constructed. Our experimental results presented on two publicly available ear biometric image databases illustrate significant improvement in performance over the best or the baseline approaches (using log-Gabor, monogenic quadrature filter, QuaternionicCode or 2-D quadrature filter), both for the ear verification and recognition problems. One of the key advantages of selecting the Radon transform based dictionary for sparse representation is its computational simplicity, as it requires only simple summation operations. Another favorable factor for using the LRT based dictionary lies in the significantly reduced template size, as the LRT operations have an associated down-sampling effect which reduces the template size by a factor of w² (w being the line width of the LRT mask) as compared to those using Gabor filters, derivative of Gaussian (DoG) filters or other spatial filters.


The efforts detailed in this paper to exploit the sparse representation of local ear shape descriptors have illustrated superior performance for the automated ear identification problem. However, further work is required to ascertain the effectiveness of the sparse representation for the identification of partial ear images or of ear images which are occluded by hair, reflections, etc. The occlusion typically corrupts only a fraction of the segmented ear region pixels and is therefore sparse in the standard basis given by individual pixels. When the error has such a sparse representation, it can be handled uniformly, similarly to [49]. Therefore our further efforts will also extend the developed sparse representation framework to help compensate for the commonly occurring occlusions in the segmented ear images.

7. References

[1] A. Iannarelli, Ear Identification, Forensic Identification Series, Paramount Publishing Company, Fremont, California, 1989.

[2] M. Burge and W. Burger, “Ear Biometrics,” Biometrics: Personal Identification in networked society,

A. K. Jain, R.Bolle, S. Pankanti (Eds), pp. 273-286, 1998.

[3] B. Arbab-Zavar and M. S. Nixon, “On guided model-based analysis of ear biometrics,” Computer Vis. &

Image Understanding, pp. 487-502, vol. 115, 2011.

[4] Y. Xu, D. Zhang, J.-Y. Yang, “A feature extraction method for use with bimodal biometrics,” Pattern

Recognition, pp. 1106-1115, vol. 43, 2010.

[5] L. Nanni and A. Lumini, “Fusion of color spaces for ear authentication,” Pattern Recognition, vol. 42,

pp. 1906-1913, 2009.

[6] K. Chang, B. Victor, K. W. Bowyer, and S. Sarkar, “Comparison and combination of ear and face

images in appearance-based biometrics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 8, pp.

1160-1165, 2003.

[7] D. J. Hurley, M. S. Nixon, J. N. Carter, “Force field energy functionals for ear biometrics,” Computer

Vision and Image Understanding, vol. 98, no. 3, pp. 491- 512, 2005.

[8] M. Burge, W. Burger, “Ear biometrics in machine vision,” Proc. 21 Workshop of the Australian

Association for Pattern Recognition, 1997.


[9] B. Victor, K.W. Bowyer, S. Sarkar, “An Evaluation of face and ear Biometrics,” Proc. Intl. Conf. on

Pattern Recognition, pp. 429-432, 2002.

[10] D. J. Hurley, M. S. Nixon, and J. N. Carter, “Force field energy functionals for image feature

extraction,” Image and Vision Computing, vol.20, no. 5-6, pp. 311-318, 2002.

[11] M. Fornasier, Numerical methods for sparse recovery, De GRUYTER, 2010

http://www.ricam.oeaw.ac.at/people/page/fornasier/FornasierLinz.pdf

[12] B. Moreno, A. Sanchez, J.F. Velez, “On the use of outer ear images for personal identification in

security applications,” Proc. 33rd Annual Intl. Carnahan Conf., Madrid, pp. 469-476, 1999.

[13] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: a

reweighted minimum norm algorithm,” IEEE Trans. Signal Processing, vol. 45, pp. 600-616, Mar.

1997.

[14] A. F. Abate, M. Nappi, D. Riccio, and S. Ricciardi, “Ear recognition by means of a rotation invariant

descriptor,” Proc. ICPR 2006, Hong Kong, 2006.

[15] Z. Xiaoxun and J. Yunde, “Symmetrical null space for face and ear recognition,” Neurocomputing,

pp. 842-848, vol. 70, 2007.

[16] M. Abdel-Mottaleb and J. Zhou, “Human ear recognition from face profile images,” Proc. ICB 2006,

LNCS 3832, pp. 786-792, 2006.

[17] A. Kumar and D. Zhang, “Ear authentication using log-Gabor wavelets,” Proc. SPIE, vol. 6539, pp.

65390A, doi:10.1117/12.720244, 2007.

[18] J. D. Bustard and M. S. Nixon, “Towards unconstrained ear recognition from two dimensional

images,” IEEE Trans Sys. Man Cybern., Part C, vol. 40. no. 3, pp. 486-494, May 2010.

[19] P. Yan and K. W. Bowyer, “Biometric recognition using 3D ear shape,” IEEE Trans. Pattern Anal.

Mach. Intell., vol. 29, no. 8, pp. 1297-1308, 2007.

[20] P. Kovesi, DERIVATIVE7 - 7-tap 1st and 2nd discrete derivatives, 2010

http://www.csse.uwa.edu.au/~pk/research/matlabfns/Spatial/derivative7.m

[21] M. Choras, “Ear biometrics based on geometrical feature extraction,” Electronics Lett,. Comput. Vis.

& Image Analy., vol. 5, no. 3, pp. 84-95, 2005.

[22] E. Candès and J. Romberg, “l1-magic: Recovery of sparse signals via convex programming,”

Technical Report, California Institute of Technology, 2005.

http://users.ece.gatech.edu/~justin/l1magic/downloads/l1magic.pdf

[23] B. Bhanu, H. Chen, “Human ear recognition in 3D,” Proc. Multimodal User Authentication

workshop (MMUA), Santa Barbara, CA, pp. 91-98, 2003.

[24] H. Chen and B. Bhanu, “Human ear recognition in 3D”, IEEE Trans. Pattern Anal. Mach. Intell.,

vol. 29, no. 4, pp. 718-737, Apr. 2007.


[25] A. Beck and M. Teboulle, “A fast iterative shrinkage thresholding algorithm for linear inverse

problem,” SIAM Journal on Imaging Sciences, vol. 2, pp. 183–202, 2009.

[26] Z. Mu, L. Yuan, Z. Xu, D. Xi and S. Qi, “Shape and structural feature based ear recognition,”

Sinobiometrics 2004, LNCS 3338, pp. 663-670, 2004.

[27] L. Yuan, Z. Mu, and Z. Xu, “Using ear biometrics for personal recognition,” Proc. IWBRS 2005,

LNCS 3781, pp. 221-228, 2005.

[28] L. Breiman, “Better subset regression using nonnegative garrote,” Technometrics, vol. 37, pp. 373-

384, 1995.

[29] A. Kumar, and Y. Zhou, “Human identification using knucklecodes,” Proc. IEEE 3rd International

Conference on Biometrics: Theory, Applications, and Systems, BTAS '09, Washington D.C., pp. 1-6,

2009.

[30] NIST Report, Summary of NIST standards for biometric accuracy, tamper resistance and

interoperability, 2002. http://biometrics.nist.gov/cs_links/pact/NISTAPP_Nov02.pdf

[31] X. Liu, S. Yan, and H. Jin, “Projective nonnegative graph embedding,” IEEE Trans. Image

Processing, vol. 9, no. 5, pp. 1126-1137, May 2010.

[32] T. Theoharis, G. Passalis, G. Toderici, and I. Kakadiaris, “Unified 3d face and ear recognition using

wavelet on geometry images,” Pattern Recognition, vol. 41, pp. 796-804, 2008.

[33] IIT Delhi Ear Database Version 1, http://web.iitd.ac.in/~biometrics/Database_Ear.htm

[34] D. R. Kisku, H. Mehrotra, P. Gupta, and J. K. Sing, “SIFT-based ear recognition by fusion of

detected keypoints from color similarity slice regions,” CoRR abs/1003.5861, 2010.

[35] E. Mordini and D. Tzovaras (Eds), Second Generation Biometrics: The Ethical Legal and Social

Context, The International Library of Ethics and Technology 11, Springer, 2012, DOI 10.1007/978-

94-007-3892-8_3.

[36] L. Lu, X. Zhang, Y. Zhao, Y. Jia, “Ear recognition based on statistical shape model,” Proc. First Intl.

Conf. Innovative Computing, Information and control, ICICIC’06, pp. 1-4, 2006.

[37] A. Kumar and Chenye Wu, “Automated human identification using ear imaging,” Pattern

Recognition, vol. 45, pp. 956-968, Mar. 2012.

[38] W. Jia, D.-S. Huang, and D. Zhang, “Palmprint verification based on robust line orientation code,”

Pattern Recognition, vol. 41, no. 5, pp. 1504-1513, May, 2008.

[39] H. Zhang, Z. Mu, W. Qu, L. Liu, and C. Zhang, “A novel approach for ear recognition based on ICA

and RBF network,” Proc. 4th Intl. Conf. Machine Learning & Cybern., pp. 4511-4515, 2005.

[40] Y. Wang, Z. Mu, and H. Zheng, “Block-based and multiresolution methods for ear recognition using

wavelet transform and uniform local binary pattern,” Proc. ICPR’08, ICPR, 2008.


[41] T.-S. Chan and A. Kumar, “Reliable ear identification using 2D quadrature filters,” Pattern Recognition

Letters, 2012.

[42] M. I. S. Ibrahim, M. S. Nixon, and S. Mahmoodi, “The effect of time on ear biometrics,” Proc. IJCB

2011, Washington DC, Oct. 2011.

[43] L. Nanni and A. Lumini, “A multi-matcher model for ear recognition,” Pattern Recognition, vol. 28,

pp. 2219-2216, Dec. 2007.

[44] K. Singh, “Altered fingerprints,” Interpol Report, INTERPOL. France, 2008,

http://www.interpol.int/Public/Forensic/fingerprints/research/alteredfingerprints.pdf.

[45] Hong Kong airport security fooled by these hyper-real silicon masks. http://www.cnngo.com/hong-

kong/visit/hong-kong-airport-security-fooled-these-hyper-real-silicon-masks-743923 Accessed 27

Jan 2012.

[46] Fake biometric eye stamps: Three arrested at Dubai international airport,

http://innovya.com/2011/09/fake-biometric-eye-stamps-three-arrested-at-dia Accessed 28 January

2012

[47] Y. Zhou and A. Kumar, “Human identification using palm-vein images,” IEEE Trans. Info.

Forensics & Security, Dec. 2011.

[48] D. Xu, Y. Huang, Z. Zeng, and X. Xu, “Human gait recognition using patch distribution feature and

locality-constrained group sparse representation,” IEEE Trans. Image Processing, vol. 21, pp.316-

326, Jan. 2012.

[49] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse

representation,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 210-227, Feb. 2009.

[50] W. Zuo, Z. Lin, Z. Guo, and D. Zhang, “The multiscale competitive code via sparse representation

for palmprint verification,” Proc. CVPR 2010, pp. 2265-2272, Jun. 2010.

[51] R. M. Bolle, J. H. Connell, S. Pankanti, N. K. Ratha, and A. Senior, “The relation between ROC

curve and CMC, Proc. 4th IEEE Workshop on Automatic Identification Advanced Technologies, pp.

15-20, 2005.

[52] A. Kumar, Ch. Srikanth, “Online personal identification in night using multiple representation,”

Proc. ICPR 2008, pp. 1-4, Tampa, Florida, Dec. 2008

[53] H. Farid and E. Simoncelli, “Differentiation of discrete multi-dimensional signals,” IEEE Trans.

Image Processing, vol. 13, pp 496-508, Apr. 2004.

[54] J. A. Heathcote, “Why do old men have big ears?” BMJ, vol. 311, p. 1668, Dec. 1995.

[55] E. van den Berg and M. P. Friedlander, SPGL1: A solver for large-scale sparse reconstruction,

http://www.cs.ubc.ca/labs/scl/spgl1/, 2010


[56] Michael P. D'Alessandro, A digital library of anatomy information

http://www.anatomyatlases.org/firstaid/Otoscopy.shtml, 2012

[57] A. A. C. M. Kalkar and A. H. M. Akkermans, “Feature extraction algorithm for automatic ear

recognition,” US Patent No. 20080013794, Jan. 2008.

[58] M. W. J. Coughlan, A. Q. Forbes, C. Gannon, P. R. Michaelis, P. D. Runcle, and R. Warta, “Method

and apparatus for controlling access and presence information using ear biometrics,” US Patent No.

20090061819, Mar. 2009.

[59] A. M. Bouchard and G. C. Osbourn, “Systems and methods for biometric identification using the

acoustic properties of the ear canal,” US Patent No. 5,787,187, 1998.

[60] Z. J. Geng, “Three dimensional ear biometrics system and method,” U. S Patent No. 7826643, Nov.

2010.

[61] L. Meijerman, C. v. d. Lugt, and G. J. R. Matt, “Cross-sectional anthropometric study of the external

ear,” J. Forensic Sci., vol. 52, no. 2, pp. 286-293, Mar. 2008.

[62] C. Sforza, G. Grandi, M. Binelli, D. G. Tommasi, R. Rosati, and V. F. Ferrario, “Age- and Sex-

related changes in the normal human ear,” Forensic Science Intl., 187, 110e1-110e7, 2009.

[63] Y. Asai, M. Yoshimura, N. Nago, and T. Yamada, “Correlation of ear length with age in Japan,” BMJ, vol. 312, p. 582, Mar. 1996.

[64] M. D. Marsico, M. Nappi, and R. Daniel, “HERO: Human Ear Recognition against Occlusions,”

Proc. CVPR 2010, pp. 178-183, CVPR'W 2010, San Francisco, June 2010.

[65] Role of biometric technology in Aadhaar authentication, Authentication Accuracy Report, UIDAI,

27th March 2012.

http://uidai.gov.in/images/role_of_biometric_technology_in_aadhaar_authentication_020412.pdf