Real Time Face Recognition System

Real time face recognition system using Radon and DCT transform

Neha Rathore

Shivani Pandya

[EE586- Project Report]

Contents

Abstract
Introduction
System Description
Hardware
Software
Selection of Algorithms
Preprocessing
Downsampling
Normalization
Wide-Sense histogram equalization
Full Scale linear Scaling
Feature Selection
PCA
Linear Discriminant Analysis (LDA)
Elastic Bunch Graph Matching
Radon
Discrete Cosine Transform
Feature extraction using Radon transform and DCT
Classification
Euclidean Distance
K-Nearest Neighbor Classifier
Implementation and Results
Database Collection
Dimensionality reduction
Downsampling
Selection of downsample image size
Radon Transform
DCT Transform
Selection of number of coefficients
Selection of Classifier (ED results and KNN)
Selection of value of K
Challenges
Dimensionality
Processing time for Radon for different image sizes
Scatter plot of data
Limitations with Real Time Response
Displaying classification result
Imposter model
Quantitative Results
Conclusion
Future Work
References

Abstract

We propose a real time face recognition system suitable for small businesses and home security systems. The goal of this project is to build a system that works in real world situations, where the user is not constrained by lighting conditions, slight variations in pose, or in-plane rotation. The system is designed for good recognition rates and addresses common problems in face recognition such as illumination changes and rotation. For a system based purely on face recognition, it is particularly difficult to achieve good recognition rates without some form of clustering or linear separation of the data.

The system is a prototype real time face recognition system that uses the Radon transform and the 2D DCT for feature extraction and KNN for classification. It gives an acceptable performance of 86% on a small database. Performance is reported as recognition rates for various combinations of feature extraction techniques. The system shows a recognition rate as high as 98% on offline data consisting of 30 test images, 20 validation images and 10 training images.

Introduction

In today's age of technology, small-scale security systems are in high demand. Such systems should be non-obtrusive and require little user interaction. Facial recognition satisfies these needs because making the face visible requires no specific action from the user. At the same time, a system that can recognize a user in an uncontrolled environment becomes, in effect, a non-obtrusive means of recognition.

Some of the desirable features of such systems include ease of use, low error rates, low cost of implementation, portability and ease of integration. This report describes a prototype implementation of face recognition and verification algorithms in a stand-alone system using the TI TMS320C6713 floating point processor. The system captures an image sequence, finds the facial features in the images, and recognizes and identifies a person from a database of 20 people under indoor building lighting conditions.

For each person, images were collected in two sessions, possibly on different days, to capture as much variation in illumination and face pose as possible. For each person, 30 images per session were stored in the database.

One of the main challenges in a face recognition system is finding informative and discriminative information about each class of images. The 2D-DCT of a face image is very sensitive to pose variations. The most commonly used techniques, such as PCA and LDA, are computationally very expensive for a small hardware system. Face recognition based on Gaussian mixture models also suffers from singular covariance estimates, because the feature set is large compared with the small number of training prototype images.

Hence, for this system we used a combination of the Radon transform and the 2D DCT to obtain features that carry the low-frequency information crucial to face recognition. The Radon transform enhances low-frequency components, which are useful for face recognition, and this property is exploited to derive effective facial features. The data compaction property of the DCT then yields a lower-dimensional feature vector. The proposed technique computes Radon projections in different orientations and captures the directional features of the face images. The DCT applied to the Radon projections then provides frequency features. The technique is invariant to in-plane rotation (tilt) and robust to zero-mean white noise.

The system is also tested with 2D-DCT only (10 coefficients), 2D-DCT only (5 coefficients) and Radon + DCT (10 coefficients) features. Each feature set is evaluated with a simple Euclidean distance classifier and with a k-nearest neighbor classifier for different values of K.

System Description

Hardware:

The system is implemented on the TI TMS320C6713 floating point DSP together with the DSP STAR TFT LCD Video Daughtercard (VM3224K2) and a Color TeleCamera NCK41CV. The DSP board has 16 KB of internal RAM, 16 MB SDRAM and 512 MB external RAM. The system is designed as a standalone application and does not need a computer once the program has been loaded for the first time. The training feature set is computed offline on the board from the training image set and stored in SDRAM when the program is loaded.

Input from the camera: The camera delivers images at 30, 15 or 7 frames per second in 16-bit YUV format. It is a low resolution camera and does not perform well in low lighting, where it introduces a lot of noise that makes recognition very difficult. The system is programmed to achieve good performance even in moderate lighting, but the camera is still operated in good lighting conditions.

Software:

The face recognition algorithm is shown in the figure.

The system first captures an image of the person through the camera in 16-bit YUV format. The gray-level image is then extracted; it is simply the Y (luma) component of the camera image. This is converted to 8-bit format, giving an 8-bit gray-level input image.
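As a minimal sketch of the gray-level extraction step, the function below pulls the Y bytes out of a packed 16-bit YUV 4:2:2 frame. The report does not state the camera's byte order, so the name `yuv422_to_gray` and the `y_offset` parameter are hypothetical. `y_offset` is 0 for Y-U-Y-V order and 1 for U-Y-V-Y order. Plain Python `bytes` and lists stand in for the board's frame buffer.

```python
def yuv422_to_gray(frame, width, height, y_offset=0):
    """Extract the 8-bit luma (Y) plane from a packed YUV 4:2:2 frame.

    Assumed packing (not stated in the report): 2 bytes per pixel, with the
    Y byte at every other position starting at y_offset (0 = YUYV, 1 = UYVY).
    """
    expected = width * height * 2          # 16 bits per pixel
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    luma = frame[y_offset::2]              # one Y byte per pixel
    # Reshape the flat luma bytes into `height` rows of `width` gray levels.
    return [list(luma[r * width:(r + 1) * width]) for r in range(height)]
```

For a 2 x 2 YUYV frame `bytes([10, 128, 20, 128, 30, 128, 40, 128])`, the result is `[[10, 20], [30, 40]]`. The chroma bytes (128) are simply dropped.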

This image is then preprocessed to make it suitable for the recognition step. First, the image is downsampled from 128x128 to 64x64, and then it is normalized to compensate for illumination changes. The most common method, histogram equalization, introduces artificial gray levels, so we preferred contrast enhancement by linear scaling.

Once the image has been normalized, it is sent to the feature extraction step. There, the Radon transform of the image is first calculated for 32 rotation angles, giving a 32x44 image.

Figure: System Layout

A 2D-DCT of this image is then performed to capture the low-frequency components that are important for face recognition. During training, this process is applied to every image in the training set, and the resulting features are stored in a file. This feature file is then loaded into the system along with the program and used as the database.

In the test stage, this process produces a feature vector for the test image. The vector is compared with the n feature vectors of the training set by computing a distance to each of them. This is followed by k-nearest neighbor classification: the distances are sorted and the k closest neighbors are selected. The final classification is the class index that occurs most frequently among those k neighbors.

Selection of Algorithms

There are various algorithms for face recognition. They use either the eigenfaces approach, the geometric features approach or the appearance-based approach to extract features from the given image set.

Preprocessing:

Each feature selection method has its own limitations: it may be sensitive to illumination, sensitive to pose, or dependent on similarity with the training set. For this reason, before applying any feature extraction method we first normalize the image to reduce the effect of lighting and rotations. High-dimensional data also causes problems: long processing time, redundant information and a very large feature set, which leads to the “curse of dimensionality” during the classification stage. Hence, we downsample the image to reduce the number of dimensions.

Downsampling

As mentioned above, downsampling is an efficient way of reducing redundant information in an image. Without it, that information would produce feature values that contribute little to the final classification. Downsampling reduces the number of dimensions while retaining most of the image content. There are two typical ways of downsampling an image: bilinear interpolation and pixel averaging.

Bilinear interpolation gives a sharper image because each pixel value is reconstructed from the values of its neighbors, which accounts for variations across neighboring pixels. In image processing tasks, bilinear interpolation is usually the preferred method for image reconstruction.

In downsampling by pixel averaging, we sum the values of all pixels in a block and divide by the number of pixels, averaging over those n pixel values. As a result of the averaging, the sharp features of the image are lost and the image is blurred. Apart from this blurring, the only disadvantage of the method is the rounding error when the average of the pixels is not an integer. However, this error is very small and does not lead to a significant difference in the pixel values.

EXAMPLE: Downsampling (figure): original 128x128 image, pixel averaging, bilinear interpolation.

Edges in an image are represented by high-frequency components in the frequency domain. Smooth regions such as the cheeks, nose and forehead correspond to low frequencies. Pixel averaging acts as a low-pass filter: it suppresses edges while preserving these smooth regions. Since good face recognition depends on capturing the low-frequency components of the image, pixel averaging is the more suitable method in our case.
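A minimal sketch of pixel averaging follows. It assumes the image is a list of rows of 8-bit integers whose sides are divisible by the downsampling factor; `downsample_average` is a hypothetical name, not the report's code. Integer division truncates each block mean, which is the small rounding error discussed above.

```python
def downsample_average(img, factor=2):
    """Downsample a gray image by averaging non-overlapping factor x factor blocks.

    With factor=2 a 128x128 image becomes 64x64. Each block mean is truncated
    to an integer (the rounding error mentioned in the text).
    """
    rows, cols = len(img), len(img[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block_sum = sum(img[r + i][c + j] for i in range(factor) for j in range(factor))
            out_row.append(block_sum // (factor * factor))
        out.append(out_row)
    return out
```

For example, the 4x4 image `[[0, 2, 4, 6], [2, 4, 6, 8], [10, 10, 20, 20], [10, 10, 20, 20]]` becomes `[[2, 6], [10, 20]]`.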

Normalization

Normalizing the image is important to compensate for illumination changes.

HISTOGRAM EQUALIZATION: The most common method of normalizing an image is histogram equalization. It redistributes the gray levels so that the image attains a uniform gray-level distribution (pdf).

Wide-Sense histogram equalization

In this method we stretch the original histogram to cover the whole 0-255 range of gray levels. This technique does not guarantee an equal number of pixels at each gray level, but it gives a contrast-enhanced version of the input image. We use the following formula:

$$O_i = \text{Max. Intensity Level} \times \left( \frac{1}{\text{No. of Pixels}} \sum_{j=0}^{i} N_j \right)$$

Here $O_i$ is the new intensity level assigned to input gray level $i$, and $N_j$ is the number of pixels with gray level $j$. Max. Intensity Level is the maximum intensity level a pixel can take; for an 8-bit grayscale image it is 255. For an image of size $N \times N$, No. of Pixels is $N^2$. The expression in the bracket is the CDF value at input gray level $i$. This is how the new intensity levels are calculated from the old ones.
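The formula above can be sketched as a lookup table built from the running histogram sum. This is an illustrative version assuming 8-bit integer pixels in a list of rows, not the report's board code; `histogram_equalize` is a hypothetical name. Rounding to the nearest integer is done with integer arithmetic.

```python
def histogram_equalize(img, max_level=255):
    """Map each gray level i to O_i = max_level * CDF(i), rounded to the nearest integer.

    img is a list of rows of integers in [0, max_level].
    """
    flat = [p for row in img for p in row]
    total = len(flat)                       # No. of Pixels (N^2 for an N x N image)
    hist = [0] * (max_level + 1)            # N_j: number of pixels at level j
    for p in flat:
        hist[p] += 1
    lut, running = [], 0
    for count in hist:
        running += count                    # running = sum of N_j for j <= i
        lut.append((max_level * running + total // 2) // total)
    return [[lut[p] for p in row] for row in img]
```

For the 2x2 image `[[0, 1], [2, 3]]`, the cumulative counts are 1, 2, 3 and 4 out of 4 pixels, so the output is `[[64, 128], [191, 255]]`.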

LINEAR SCALING: The problem with histogram equalization is that it introduces artificial gray levels in different locations of the image, depending on the gray-level distribution. This is undesirable for our face recognition system. Hence, we normalize the image by linear scaling, which stretches the pdf of the image to cover the whole gray-scale range while keeping the variations in the image intact.

Full Scale linear Scaling

There are three common linear scaling methods. The first, Linear Image Scaling, maps the processed image linearly over its entire range. The second, Linear Image Scaling with Clipping, clips the extreme amplitude values of the processed image to maximum and minimum limits. The third, Absolute Value Scaling, uses an absolute value transformation to display an image that has negatively valued pixels. The second technique is often subjectively preferable, especially for images in which a relatively small number of pixels exceed the limits.

For our purpose, we implement Linear Image Scaling, which maps the occupied gray-level range of the input image linearly onto the full output range. The idea of linear scaling is illustrated below.

This process maps the histogram of the input image so that the histogram of the output image covers the entire [0, 255] range of gray levels. The main challenge is determining the mapping range. Low contrast images can result from poor illumination or a lack of dynamic range in the imaging sensor, and they have a very low dynamic range. The primary idea is therefore to increase the dynamic range of these images by stretching it linearly from low to high. The mapping is:

$$G = H(F) = G_{min} + \frac{G_{max} - G_{min}}{F_{max} - F_{min}}\,(F - F_{min})$$

Where,

$(F_{min}, F_{max})$ = minimum and maximum gray levels occupied in the input image

$(G_{min}, G_{max})$ = minimum and maximum gray levels desired in the output image.

In effect, this equation is a line of the form y = mx + c, where m is the slope and c is the intercept on the y axis. Here the slope is $m = (G_{max} - G_{min})/(F_{max} - F_{min})$ and the intercept is $c = G_{min} - m F_{min}$. Setting $G_{min} = 0$ and $G_{max} = 255$ covers the entire range of 8-bit images, hence the name full range linear scaling.
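A minimal sketch of full range linear scaling follows, assuming integer gray levels in a list of rows; `linear_scale` is a hypothetical name. It implements the unclipped mapping above. The guard for a completely flat image, where $F_{max} = F_{min}$ and the slope is undefined, is an added assumption.

```python
def linear_scale(img, g_min=0, g_max=255):
    """Full range linear scaling: G = g_min + (g_max - g_min) / (f_max - f_min) * (F - f_min)."""
    flat = [p for row in img for p in row]
    f_min, f_max = min(flat), max(flat)     # occupied input range
    if f_max == f_min:
        # A flat image has no range to stretch; map every pixel to g_min.
        return [[g_min for _ in row] for row in img]
    span_in, span_out = f_max - f_min, g_max - g_min
    return [[round(g_min + span_out * (p - f_min) / span_in) for p in row] for row in img]
```

An image occupying only levels 60 to 110, such as `[[60, 70], [80, 110]]`, is stretched to `[[0, 51], [102, 255]]`. The relative spacing between pixel values is preserved, and no new ordering of gray levels is introduced.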

EXAMPLE: NORMALIZATION (figure): original bright, dark and midtone images, followed by the results of histogram equalization and of linear scaling.

Analysis: The figure above shows image normalization by histogram equalization and by linear scaling. As mentioned before, histogram equalization introduces unwanted effects such as contouring, as well as unwanted gray levels. Linear scaling, on the other hand, enhances contrast while suppressing the effect of sudden bright light and compensating for poor lighting. Hence, linear scaling is the better choice for image normalization in our system.

Feature Selection

PCA:

PCA is one of the most successful techniques used in face recognition. Its purpose is to reduce the large dimensionality of the data space to the smaller intrinsic dimensionality of the feature space needed to describe the data. This works well when there is strong correlation between the observed variables. The main idea of using PCA for face recognition is to express the large 1D vector of pixels, constructed from a 2D face image, in terms of the compact principal components of the feature space. For a set of D-dimensional vectors $\{x_i\}_{i=1}^{n}$, PCA finds the M dominant eigenvectors of the sample covariance matrix:

$$C = \sum_{i} (x_i - \mu)(x_i - \mu)^T$$

Here $\mu$ is the sample mean, and each $v_j$ is an eigenvector of the covariance matrix $C$ with associated eigenvalue $\lambda_j$:

$$C v_j = \lambda_j v_j$$
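To make the eigenvector equation concrete, here is a small sketch that builds $C$ from a few samples and finds its dominant eigenvector $v$ and eigenvalue $\lambda$. It uses power iteration, an illustrative choice that is not part of the report; the report ultimately does not use PCA. `pca_dominant_direction` is a hypothetical name. Power iteration assumes the start vector is not orthogonal to the dominant eigenvector.

```python
import math


def pca_dominant_direction(samples, iterations=200):
    """Return (mu, v, lam): sample mean, dominant eigenvector of
    C = sum_i (x_i - mu)(x_i - mu)^T, and its eigenvalue, via power iteration."""
    n, d = len(samples), len(samples[0])
    mu = [sum(x[k] for x in samples) / n for k in range(d)]
    centered = [[x[k] - mu[k] for k in range(d)] for x in samples]
    # Unnormalized covariance (scatter) matrix C, d x d.
    cov = [[sum(v[a] * v[b] for v in centered) for b in range(d)] for a in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(t * t for t in nxt))
        if norm == 0:
            break  # all samples identical: no dominant direction
        vec = [t / norm for t in nxt]
    # Rayleigh quotient v^T C v gives the eigenvalue lambda for unit v.
    cv = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
    lam = sum(vec[a] * cv[a] for a in range(d))
    return mu, vec, lam
```

For the collinear samples `[[1, 2], [2, 4], [3, 6]]`, the mean is `[2.0, 4.0]` and $C = \begin{pmatrix}2 & 4\\ 4 & 8\end{pmatrix}$. The dominant eigenvector points along $(1, 2)/\sqrt{5}$ with $\lambda = 10$, so all of the variance lies in one principal component.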

Linear Discriminant Analysis (LDA)

LDA is closely related to PCA in that both look for linear combinations of variables that best explain the data. LDA, however, explicitly models the difference between the classes to make the class clusters more separable. Suppose that each of the C classes has a mean $\mu_i$ and the same covariance $\Sigma$. Then the between-class variability may be defined by the sample covariance of the class means:

$$\Sigma_b = \frac{1}{C}\sum_{i=1}^{C}(\mu_i - \mu)(\mu_i - \mu)^T$$

Here $\mu$ is the mean of the class means. The class separation in a direction $\vec{w}$ is then given by:

$$S = \frac{\vec{w}^T \Sigma_b \vec{w}}{\vec{w}^T \Sigma \vec{w}}$$

Linear Discriminant Analysis (LDA) finds the vectors in the underlying space that best discriminate among classes. For all samples of all classes, the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ are defined. The goal is to maximize $S_B$ while minimizing $S_W$, in other words, to maximize the ratio $\frac{\det(S_B)}{\det(S_W)}$.
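The class-separation ratio $S$ for a single direction $\vec{w}$ can be evaluated with a short sketch. Two assumptions apply: the shared covariance $\Sigma$ is estimated as the pooled within-class covariance, and `fisher_ratio` is a hypothetical name. This is an illustration of the formula, not code from the report, which does not implement LDA.

```python
def fisher_ratio(classes, w):
    """Class separation S = (w^T Sigma_b w) / (w^T Sigma w) along direction w.

    classes: list of classes, each a list of feature vectors (lists of numbers).
    Sigma_b: covariance of the class means with the 1/C factor from the text.
    Sigma: estimated as the pooled within-class covariance (an assumption).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    d, n_classes = len(w), len(classes)
    means = [[sum(x[k] for x in cls) / len(cls) for k in range(d)] for cls in classes]
    mu = [sum(m[k] for m in means) / n_classes for k in range(d)]   # mean of class means
    # w^T Sigma_b w = (1/C) * sum_i (w . (mu_i - mu))^2
    between = sum(dot(w, [m[k] - mu[k] for k in range(d)]) ** 2 for m in means) / n_classes
    # w^T Sigma w = average over all samples of (w . (x - mu_class))^2
    n_total = sum(len(cls) for cls in classes)
    within = sum(
        dot(w, [x[k] - m[k] for k in range(d)]) ** 2
        for cls, m in zip(classes, means)
        for x in cls
    ) / n_total
    if within == 0:
        return float("inf")  # no within-class spread along w
    return between / within
```

Consider two classes `[[0, 0], [1, 2]]` and `[[4, 0], [5, 2]]` that differ only along the first axis. The direction `[1, 0]` gives $S = 16$, while `[0, 1]` gives $S = 0$. LDA would therefore prefer the first direction.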

Elastic Bunch Graph Matching

The EBGM approach uses the structural information of a face. It reflects the fact that images of the same subject tend to translate, scale, rotate and deform in the image plane. The method uses a labeled graph: edges are labeled with distance information, and nodes are labeled with wavelet coefficients in jets. This model graph can then be used to generate an image graph. The model graph can be translated, scaled, rotated and deformed during the matching process, which makes the system robust to large variations in the images.

Radon

The Radon transform has been used to derive enhanced low-frequency components, which are useful in face recognition. The Radon transform of a two-dimensional function $f(x, y)$ is defined as:

$$R(r, \theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(r - x\cos\theta - y\sin\theta)\,dx\,dy, \qquad \theta \in [0, \pi]$$

Here $r$ is the distance from the center of the image and $\delta$ is the Dirac delta. Each value $R(r, \theta)$ is therefore the line integral of $f$ along the line at distance $r$ and orientation $\theta$.

The Radon transform is an efficient way of extracting frequency components in different directions. The line integrals of the face computed by the transform amplify low-frequency components, which are useful for face recognition. Choosing a suitable number of angles (out of the 0-179 degree orientations) gives very good dimensionality reduction. With enough angles, the image can be reconstructed from its projections, so little information is lost. The transform also handles in-plane rotation, which is a very important factor in real time recognition systems: rotating the face in the image plane only shifts its projections along the $\theta$ axis.

Discrete Cosine Transform

The 2D DCT is an efficient way of transforming an image so that its high- and low-frequency components are well separated. Because of the way the DCT arranges its coefficients, it is easy to select low- or high-frequency coefficients. The DCT of an $N_1 \times N_2$ image $x$ is calculated as follows:

$$X_{k_1,k_2} = \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x_{n_1,n_2}\cos\left[\frac{\pi}{N_1}\left(n_1+\frac{1}{2}\right)k_1\right]\cos\left[\frac{\pi}{N_2}\left(n_2+\frac{1}{2}\right)k_2\right]$$

The first DCT coefficient is the DC coefficient, and it is proportional to the average value of the image. It mainly carries illumination information and is therefore sometimes discarded to remove illumination changes. The DCT has an excellent energy compaction property and separates the image into low- and high-frequency regions. It also facilitates feature selection through zigzag scanning or other methods. An 8x8 block DCT captures the local frequency distribution and speeds up the overall computation.
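The DCT formula can be evaluated term by term, as in the sketch below. `dct2` is a hypothetical name, and this is not the report's optimized board code. The direct form costs $O(N_1^2 N_2^2)$ operations, which is acceptable for 8x8 blocks but is usually replaced by a fast, separable transform in real time code.

```python
import math


def dct2(block):
    """Unnormalized 2D DCT-II of a block (list of rows), evaluated directly:

    X[k1][k2] = sum over n1, n2 of x[n1][n2] * cos(pi/N1 * (n1 + 0.5) * k1)
                                             * cos(pi/N2 * (n2 + 0.5) * k2)
    """
    n1_len, n2_len = len(block), len(block[0])
    out = [[0.0] * n2_len for _ in range(n1_len)]
    for k1 in range(n1_len):
        for k2 in range(n2_len):
            total = 0.0
            for n1 in range(n1_len):
                c1 = math.cos(math.pi / n1_len * (n1 + 0.5) * k1)
                for n2 in range(n2_len):
                    c2 = math.cos(math.pi / n2_len * (n2 + 0.5) * k2)
                    total += block[n1][n2] * c1 * c2
            out[k1][k2] = total
    return out
```

This example illustrates the DC coefficient discussed above. For a constant 8x8 block of value 3, `dct2` gives $X_{0,0} = 64 \times 3 = 192$, which is $N_1 N_2$ times the block mean. Every other coefficient is zero up to floating point error.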

OUR APPROACH

We are implementing the face recognition system on a DSP board with limited computation and memory, which constrains our choice of algorithms. PCA and LDA are standard face recognition methods, but they require computing a covariance (correlation) matrix and its eigenvectors, which is expensive on such hardware. Hence, we decided to implement the Radon transform and the DCT as our feature extraction algorithms.

Feature extraction using Radon transform and DCT

The facial features derived in the proposed approach (RDCT) are the frequency components in different directions. The line integrals of the face image computed by the Radon transform amplify low-frequency components, which are useful in face recognition. The Radon space image for 0–179 orientations is shown in the figure. The DCT is used to derive frequency features from the Radon space, and the figure reveals the excellent energy compaction property of the DCT. The significant DCT coefficients (10) are concatenated to form the facial feature vector.

In our project, we used 32 Radon orientations ranging from 0 to 180 degrees in steps of 5 degrees. The image was divided into 8x8 blocks for the DCT, and 10 coefficients were chosen from each block. The number of DCT coefficients was chosen with the computational requirements, data separability and ease of classification in mind. There are many ways to choose the DCT coefficients of a block, for example based on the maximum magnitude, maximum energy or variance of the coefficients.

Published results also show that recognition rates depend on whether the blocks overlap when the 2D-DCT is calculated, and by how much. In our case, DCT coefficients of non-overlapping blocks of the image are computed and ordered using zigzag scanning. The main reason is to keep the system computationally inexpensive and to speed up recognition. The DCT coefficients extracted from each block are concatenated to obtain the feature vector.
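The feature extraction steps above (8x8 non-overlapping blocks, a DCT per block, zigzag ordering, and keeping 10 coefficients per block) can be sketched as follows. The function names are hypothetical, and the JPEG-style zigzag order and row-by-row tile scanning are assumptions. The sketch also assumes that tiles which do not fit completely at the right or bottom edge are skipped. Under that assumption, a 32x44 Radon image gives 4 x 5 = 20 blocks and 200 features, consistent with the 200-coefficient feature vector reported in the DCT Transform section.

```python
import math


def dct2(block):
    """Unnormalized 2D DCT-II of a square block (same formula as the DCT section)."""
    n = len(block)
    cos_table = [[math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)] for k in range(n)]
    return [[sum(block[a][b] * cos_table[k1][a] * cos_table[k2][b]
                 for a in range(n) for b in range(n))
             for k2 in range(n)]
            for k1 in range(n)]


def zigzag_indices(n):
    """(row, col) positions of an n x n block in JPEG-style zigzag order, low frequencies first."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col labels each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order


def rdct_features(radon_image, block=8, coeffs_per_block=10):
    """Concatenate the first `coeffs_per_block` zigzag DCT coefficients of every
    non-overlapping block x block tile, scanning tiles row by row.
    Tiles that do not fit completely at the right or bottom edge are skipped."""
    rows, cols = len(radon_image), len(radon_image[0])
    keep = zigzag_indices(block)[:coeffs_per_block]
    features = []
    for r0 in range(0, rows - block + 1, block):
        for c0 in range(0, cols - block + 1, block):
            tile = [row[c0:c0 + block] for row in radon_image[r0:r0 + block]]
            coeffs = dct2(tile)
            features.extend(coeffs[r][c] for r, c in keep)
    return features
```

Because the zigzag order starts at the top-left corner, the 10 kept coefficients of each block are its lowest-frequency ones, including the DC term.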

Classification

Euclidean Distance:

Euclidean distance is the straight-line distance between two points in one or more dimensions. For points $p$ and $q$ in n-dimensional space it is given by:

$$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2 + \cdots + (p_n - q_n)^2}$$

The classifier assigns the test sample to the class whose prototype is at the smallest distance. Each class prototype is defined as the mean of the feature vectors of all training prototypes of that class.
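A minimal sketch of the nearest-class-mean classifier described above follows. It assumes that training features are grouped in a dictionary mapping each class label to its feature vectors; the function names are hypothetical.

```python
import math


def euclidean(p, q):
    """d(p, q) = sqrt(sum_k (p_k - q_k)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def class_means(train):
    """train: dict label -> list of feature vectors; returns label -> mean vector (class prototype)."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in train.items()
    }


def nearest_mean_classify(x, means):
    """Assign x to the class whose prototype (mean vector) is closest in Euclidean distance."""
    return min(means, key=lambda label: euclidean(x, means[label]))
```

With `train = {"A": [[0, 0], [2, 0]], "B": [[10, 10], [12, 10]]}`, the prototypes are `[1.0, 0.0]` and `[11.0, 10.0]`. The sample `[2, 1]` is then classified as `"A"`.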

K-Nearest Neighbor Classifier:

The k-Nearest Neighbor (k-NN) method assumes that all instances correspond to points in n-dimensional space. The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. k-NN is an instance-based learning method: a test sample is classified by a majority vote of its k nearest neighbors.
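The k-NN step, as described in the system layout, can be sketched as follows: sort the distances to all training vectors, keep the k closest, and vote on their class labels. The function name and the tie-breaking rule are assumptions, since the report does not say how tied votes are resolved. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(x, train_vectors, train_labels, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean distance).

    Tie rule (assumed): among labels with the top vote count, return the one
    whose member appears first in distance order.
    """
    ranked = sorted(
        ((math.dist(x, v), label) for v, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    for _, label in nearest:               # walk outward from the closest neighbour
        if votes[label] == top:
            return label
```

Take three training points of class `"A"` near the origin and two of class `"B"` near (10, 10). A test point at (10, 9) with k = 3 has neighbours B, B, A, so it is labelled `"B"`.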

Implementation and Results

Database Collection:

The database was collected for 20 people, with a minimum of 30 images per person. Each subject was asked to fit his or her face inside the 128 X 128 box displayed on the LCD. No instructions were given about pose or distance from the camera. The only constraint was to provide a frontal image with no out-of-plane rotation, since the selected algorithm does not handle such variations. Subjects are likely to adopt the distance and pose they find most comfortable at both training and testing time. For this reason no further instructions were given, so that subjects could keep their natural pose during testing.

The database was collected in two sessions for 11 people to capture variation in pose, illumination and zoom factor. Each subject provided a frontal pose while the camera captured images at 15 frames per second and buffered them for 2 seconds. Once the program had collected 30 images, it saved them in the database with a unique file name for each image.

Figure: Sample Images from Database (training images and test images)

Dimensionality reduction

Downsampling

The original LCD frame size was 240x320, i.e. 76.8 KB of data per frame at one byte per pixel. Given the 16 KB internal RAM, processing one frame of this size would take about 4.8 seconds plus the processing time of the algorithms. Such long processing times are unacceptable in a real time system because they make the system very slow. Since we buffer the images and then process them for classification, an image of such large dimensions is impractical.

Hence, we chose a smaller image size of 128x128, i.e. 16.4 KB, which brings the processing time down to about 1 second per frame. However, a video-based recognition system uses more than one frame per classification, so 1 second per frame is still large. A 128x128 image displayed on the LCD provides enough resolution and size for users to see and adjust their face in front of the camera. Hence, 128x128 is a reasonable size for display and cropping.

Selection of downsample image size

The cropped image is buffered, and frames covering 2 seconds are saved, which lets the system capture variations in face pose. At 15 fps this gives 30 frames, for a total buffer size of about 491.5 KB. Processing all of these frames is time consuming for such a large dataset, so the dimensionality must be reduced further. A larger image contains a lot of redundant information, so reducing the image size does not harm classification as long as the face is still recognizable.

We initially chose an image size of 24x24, but this caused many difficulties because the image was too small and contained too little information. Since the system is based only on face recognition, we decided on a larger size of 64x64. This size captures most of the information in the face while keeping the feature vector small enough for the processor to handle without long delays.
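The buffer sizes quoted in this section can be checked with a few lines of arithmetic. Two assumptions are made here: 8-bit gray pixels (1 byte each), and the 16 KB internal RAM taken in decimal units, which reproduces the report's 4.8 ratio. The names are illustrative only.

```python
def frame_bytes(width, height, bytes_per_pixel=1):
    """Bytes needed for one frame; 8-bit gray images use 1 byte per pixel."""
    return width * height * bytes_per_pixel


INTERNAL_RAM_BYTES = 16_000            # "16 KB", decimal units (assumption)

lcd_frame = frame_bytes(320, 240)      # 76,800 bytes = 76.8 KB
crop_frame = frame_bytes(128, 128)     # 16,384 bytes, about 16.4 KB
two_second_buffer = 30 * crop_frame    # 15 fps for 2 s: 491,520 bytes, about 491.5 KB
downsampled = frame_bytes(64, 64)      # 4,096 bytes per downsampled frame

# Frame size relative to internal RAM; these match the 4.8 s and ~1 s estimates above.
print(lcd_frame / INTERNAL_RAM_BYTES)   # 4.8
print(crop_frame / INTERNAL_RAM_BYTES)  # 1.024
```

Downsampling to 64x64 shrinks each buffered frame by a factor of 4, from 16,384 to 4,096 bytes.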

Radon Transform

We learned that in-plane rotations can be handled well by the Radon transform. It projects the image at various orientations and amplifies the low-frequency components by taking line integrals along the columns for each orientation. Since low-frequency components are the most important for face recognition, the Radon transform is a good choice for reducing dimensions and compacting the information in the image. With 32 Radon orientations, we obtained a final image of size 32x44.

To prevent any clipping of data during rotation, the image was zero padded by 12 pixels on each side, making it 88x88. The padding size was chosen by considering the diagonal length of the 64x64 image, which is about 90 pixels. Since the Radon image is followed by the block DCT, we chose 88 as the width so that the image is divisible by 8 in each direction. This loses a couple of pixels relative to the full diagonal. However, a person's face is mostly located in the center of the frame, so this loss should not cause serious classification errors. The resulting Radon image is then sent to the DCT stage.
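The rotate-and-sum procedure described above can be sketched in pure Python. For each angle, every output pixel is mapped back into the input image with nearest-neighbour sampling, and each column of the rotated image is then summed. The function name, the rotation about the image center and the nearest-neighbour sampling are assumptions, not details taken from the report. This sketch keeps one value per column, so an 88x88 input gives 88 values per angle; it does not reproduce the report's 32x44 output size.

```python
import math


def radon(image, angles_deg):
    """Discrete Radon transform of a square, zero-padded image by rotate-and-sum.

    For each angle the image is rotated about its center (nearest-neighbour
    sampling; source pixels outside the image count as zero) and each column
    is summed. Returns one projection (list of n column sums) per angle.
    """
    n = len(image)
    center = (n - 1) / 2.0
    sinogram = []
    for deg in angles_deg:
        t = math.radians(deg)
        cos_t, sin_t = math.cos(t), math.sin(t)
        projection = [0.0] * n
        for r in range(n):
            for c in range(n):
                # Inverse-rotate output pixel (r, c) to find its source pixel.
                y, x = r - center, c - center
                src_c = round(cos_t * x - sin_t * y + center)
                src_r = round(sin_t * x + cos_t * y + center)
                if 0 <= src_r < n and 0 <= src_c < n:
                    projection[c] += image[src_r][src_c]
        sinogram.append(projection)
    return sinogram
```

At 0 degrees each projection value is simply a column sum, and at 90 degrees it becomes a row sum. For `[[1, 2, 0], [0, 3, 0], [4, 0, 6]]` this gives `[5, 5, 6]` and `[3, 3, 10]` respectively.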

DCT Transform

A 2D DCT of an image accumulates its low-frequency components in the top-left corner. A local, block-based DCT separates the frequency components of the image on a local basis. This lets us select coefficients from important blocks containing the eyes, lips, nose, etc. without any loss of data. A local block-based DCT has also been reported to capture frequency information better for face recognition. Hence, we decided to use a local 2D DCT with a block size of 8x8. We divide the Radon image into 8x8 blocks and calculate the DCT coefficients of each block. There are 64 coefficients per block, leading to 4096 coefficients in total.

LOW frequency vs. High frequency components: When the DCT is used for feature selection, we can choose either high-frequency or low-frequency components as features. Choosing high-frequency components means selecting the regions that contain edge information. Since edges represent the shape of the face and its features, this is sometimes considered a good approach because it follows the face very closely. Low-frequency components, on the other hand, represent an approximation of the image and can therefore be considered a source of classification errors.

However, choosing high-frequency components makes the system too dependent on the training set. It can lead to high classification errors if the subject does not present an image similar to his or her training images. Choosing the low-frequency components captures most of the energy of the image, which in turn can lead to better classification results. Hence, we chose low-frequency components as our feature set. Picking 10 coefficients from each block gives a feature vector of 200 coefficients, i.e. 0.8 KB per image (200 coefficients x 4 bytes). This is a dimensionality reduction of 98%.

Selection of number of coefficients

Number of Radon angles:

Selecting the number of Radon angles was an important decision. We wanted to cover the whole range of 0-179 degrees while keeping the computational cost, in terms of both time and processing, very low. We noticed that rotating a 128x128 image took 1 second per angle because of the small internal RAM, so in total each frame took about 32 seconds for the Radon transform. This hurdle was overcome when we reduced the image size to 64x64: the complete recognition process, including the 32 rotations of the Radon transform, then took only 8 seconds. Although 8 seconds is quite long for a real-time system, we decided to go with it and focus on optimizing the loops to reduce the time taken by the rotations. For the 32 angles, we took steps of 5 degrees to cover the 0-179 degree range of orientations. However, we learned later that a better approach could have been to take angles from 0 to 30 degrees in steps of two and then flip the results to obtain the corresponding orientations in the negative direction. For example, flipping the orientation of 10 degrees gives an orientation of -10 degrees, which is 170 degrees in effect. Also, in a practical situation a person would only rotate his face by up to about 30 degrees in each direction. Hence, this approach seemed very intuitive and would probably have given better results. However, it gives us 60 Radon angles, resulting in a final image of 60x44. This was not a significant gain in dimensionality reduction in our case, and hence we decided to keep the original 32 Radon angles.
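The rotate-and-sum form of the Radon transform used here (each angle is one rotation of the image followed by column sums) can be sketched as follows. This is a minimal illustration, assuming nearest-neighbour rotation about the image centre with zero fill; the helper names are hypothetical, and the cropping of each projection to 44 samples described earlier is not reproduced, so each projection here is simply as long as the image is wide.

```python
import math


def rotate_nearest(image, angle_deg):
    """Rotate a square image about its centre using nearest-neighbour
    sampling; destination pixels whose source lies outside are set to 0."""
    n = len(image)
    c = (n - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # inverse mapping: find the source pixel for destination (i, j)
            y, x = i - c, j - c
            src_i = round(cos_t * y - sin_t * x + c)
            src_j = round(sin_t * y + cos_t * x + c)
            if 0 <= src_i < n and 0 <= src_j < n:
                out[i][j] = image[src_i][src_j]
    return out


def radon(image, angles_deg):
    """One projection per angle: the column sums (line integrals) of the
    rotated image. Result has len(angles_deg) rows and n columns."""
    return [[sum(col) for col in zip(*rotate_nearest(image, a))]
            for a in angles_deg]
```

For example, `radon(img, [5 * k for k in range(32)])` returns 32 projections for a square image `img`, one per angle in 5-degree steps. Every angle costs a full pass over the image, which is why the rotation time scaled so strongly with image size on the board.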

Number of DCT coefficients:

Selecting the number of DCT coefficients per block was another important decision. Too many coefficients give a large feature vector, while too few give a small one that may not carry enough information. Since the images are gray-level images, the symbol set consists of the gray-level values 0-255. After feature extraction, these feature vectors are therefore very likely to lie very close to each other, so a correct selection of coefficients was necessary to differentiate between the data. Taking 16 coefficients per block gave a final feature vector of 1024 coefficients, resulting in a training set of 20 vectors of 1024 values each. This is a very large number, considering that we need to calculate the distance of the incoming vector from each vector in the training feature set. Hence, 16 coefficients per block was an expensive choice for our existing system. On the other hand, since the system recognizes people from the face alone, taking too few coefficients would also be impractical. To capture the low-frequency components properly, we take 10 DCT coefficients per block in a zigzag manner, so that the highest-magnitude coefficients are in our feature set. Coefficients with larger magnitude affect the classification rate more than coefficients with lower magnitude. Taking 10 DCT coefficients per block resulted in a feature vector of 200 values, which seemed a reasonable choice. Another choice was taking 5 DCT coefficients per block, which resulted in a feature vector of 100 values. This again is a reasonable number for our existing system.
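The zigzag selection of low-frequency coefficients can be sketched as below, assuming the standard JPEG-style zigzag ordering (the report does not specify the exact scan). The names `zigzag_indices` and `zigzag_select` are hypothetical.

```python
def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col (anti-diagonal)
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()                          # even diagonals run upwards
        order.extend(diag)
    return order


def zigzag_select(coeffs, k=10):
    """First k coefficients of a DCT block in zigzag (low-to-high frequency) order."""
    return [coeffs[i][j] for i, j in zigzag_indices(len(coeffs))[:k]]
```

With k=10 the positions kept are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), (2,1) and (3,0): the DC term plus the nine lowest-frequency AC terms, which for natural images usually carry the largest magnitudes.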

The system has also been tested with DCT-only classification. For this purpose, DCT coefficients are extracted from the original 64x64 normalized images, resulting in a feature vector of size 640 for 10 coefficients per block and 320 for 5 coefficients per block.

Selection of Classifier (ED results and KNN):

For this system we wanted a classifier that is computationally inexpensive yet yields a good classification rate. The Euclidean distance classifier is computationally very light, but it relies entirely on the single minimum distance, so small variations in the test images result in misclassification. For the Euclidean distance classifier, the recognition rate was 97% on the validation data but only 63% on the test data.
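For reference, a minimum-distance classifier of this kind can be sketched as follows, assuming the test vector is compared against every training vector and receives the label of the single closest one (the report does not state here whether class means or individual training vectors were used as prototypes). `min_distance_classify` is a hypothetical name.

```python
import math


def min_distance_classify(x, train_vectors, train_labels):
    """Label of the training vector with the smallest Euclidean distance to x."""
    best = min(range(len(train_vectors)),
               key=lambda i: math.dist(x, train_vectors[i]))
    return train_labels[best]
```

Because only the single closest vector matters, one atypical training image lying near the test vector is enough to flip the decision, which is the weakness described above.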

The k-nearest neighbor (k-NN) classifier assigns the most frequent class among the k training vectors closest to the test sample. So with k-NN, even if the test sample lies closest to a prototype of a wrong class, that prototype is outvoted when it occurs only once among the k neighbors, and the sample has a better chance of being classified correctly. k-NN is computationally heavier than the Euclidean distance classifier, but it avoids the singularity problem of a GMM. That problem arises because of the small database and the large number of features, while a number of features small enough to give a non-singular GMM fails to capture the details of the face.
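A minimal k-NN sketch is shown below, assuming Euclidean distance and a plain majority vote; ties in the vote go to the lower class index, matching the tie behaviour described in the report's RED/GREEN example. `knn_classify` is a hypothetical name and the toy data are illustrative only.

```python
import math
from collections import Counter


def knn_classify(x, train_vectors, train_labels, k):
    """Majority vote among the k training vectors closest to x (Euclidean).

    Ties in the vote are resolved in favour of the lower class index."""
    ranked = sorted(zip((math.dist(x, v) for v in train_vectors), train_labels))
    votes = Counter(label for _, label in ranked[:k])
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


# Example: a sample close to class 2 keeps its label for k=3,
# but is outvoted by the larger class 1 once k=5 covers every vector.
train = [[0.0], [1.0], [2.0], [10.0], [11.0]]
labels = [1, 1, 1, 2, 2]
print(knn_classify([10.4], train, labels, 3), knn_classify([10.4], train, labels, 5))  # 2 1
```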

Selection of value of K

A proper selection of the value of k is very important for a k-NN classifier. Too high a value of k gives noisy classification, whereas too low a value may end up considering only the very nearest neighbor, which may or may not give the correct classification. We noticed during experimentation that test samples that are very different from the training data generally produce larger distances than more similar data, and are consequently more likely to be misclassified. The following diagrams show the effect of k on classification.

FIGURE: K-NN classification for different values of the k radius, in increasing order.

Let's say the incoming vector belongs to class RED. If the vector resembles the training images, it may lie very close to the feature vectors of class RED. Taking a smaller radius (a smaller k) helps in this case: class RED has the highest frequency among the neighbors, and the unknown vector is classified as RED. Now we increase the radius, under the assumption that more vectors of the same class should now fall inside the circle. However, we see that the prototypes of class GREEN are equal in number to those of class RED. The classifier then breaks the tie on the basis of the class indices and classifies the unknown vector as RED or GREEN, whichever has the lower class index. If GREEN is chosen, this is clearly a wrong classification, even though the RED prototypes lie very close to the unknown vector. As we increase the radius further, the classification regions become more and more noisy and may produce more misclassifications than correct classifications.

In our experiments, we noticed that the classification was affected by the value of k in a similar way. We also sometimes noticed that the correct class was within the first 10 nearest neighbors of the input vector, but the classification was still incorrect. This was probably due to the reasons mentioned above.

Also, when the test image is very different from the training images, the incoming vector lies very far from the training vectors (as shown in the figure) and hence will always be misclassified. A large variation in the training set can help in this situation. However, in our case, restrictions on data availability prevented us from including many varied images in the training set. Hence, the system classifies well only if the user presents an image close to the training set.

The following is an example of test-set misclassification due to large values of K.

FIGURE: Classification output for the test files ./Database/ashwin/video_ashwin_10.raw and ./Database/ashwin/video_ashwin_23.raw (panel results: “Classified as 10” and “Classified as 13”).

The classification result here shows the classification of class 13 (“Ashwin”) for different values of K. Ashwin is misclassified as class 10 in spite of being among the top 4 neighbors: for k=15 the sample is misclassified as class 10, whereas for k=5 it is classified correctly.

Challenges

Dimensionality

As mentioned before, a large image or feature vector made the recognition process very slow. Also, since we buffer the images before processing them, very large images constantly led to buffer or stack overflows. When the stack overflowed, the LCD displayed garbage values. To solve this, we had to reduce the image size from the original 240x320 to 64x64 and flush the buffer after every classification result.
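Reducing the 240x320 8-bit frame to 64x64 cuts the per-image buffer from 76,800 bytes to 4,096 bytes. The sketch below illustrates one way to get there, assuming a centred 128x128 crop followed by 2x2 averaging; the report's own crop position and downsampling filter may differ, and `center_crop` and `downsample2` are hypothetical names.

```python
def center_crop(image, size):
    """Centred size x size crop of a frame given as a list of rows (assumption)."""
    rows, cols = len(image), len(image[0])
    top, left = (rows - size) // 2, (cols - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def downsample2(image):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks."""
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, len(image[0]) - 1, 2)]
            for r in range(0, len(image) - 1, 2)]


frame = [[0] * 320 for _ in range(240)]     # one 8-bit Y frame, 240x320
small = downsample2(center_crop(frame, 128))
print(len(small), len(small[0]))             # 64 64
```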

For very large feature vectors, we noticed that the data is noisier and more tightly scattered, so the data of different classes overlaps, resulting in a lot of misclassifications. This was true even for the case of 640 DCT coefficients. Hence, for the DCT-only case, we decided to select 5 DCT coefficients per block, resulting in a feature vector of 320 coefficients.

Small size of internal RAM and memory issues

FIGURE: Processing time for Radon for different image sizes.

Scatter plot of data:

FIGURE: Scatter plot of original feature set of radon and DCT, 200 coefficients and 10 images per class.

FIGURE: Scatter plot of mean image of radon and DCT of 10 training images and feature set of 200 coefficients per class.

The graphs above show the scatter plot for 10 images per class and the scatter plot of the mean image for each class. In the scatter plot of 10 images per class, the feature vectors overlap heavily. Since our symbol set consisted only of the gray-level values 0-255, the feature values lie close together, which contributes to misclassification. In the mean-image graph, however, the values are visible and close to being distinct, which in turn means a lower misclassification rate.

Median image

FIGURE: Scatter plot of median image of radon and DCT of 10 training images and feature set of 200 coefficients per class.

A general analysis of the median image is presented above. During experimentation we noticed that test samples were very often classified as class 11 (“Shivani”). The probable reason is that when we take the Euclidean distance to all the feature vectors of the original feature set, the values of class 11 lie very close to those of the other classes: its original feature vectors are themselves not very distinct and match many classes in the training set, which influences the final decision. As seen from the median graph, the median sufficiently separates the data for class 11, which was highly scattered in the mean image.

Means of radon+DCT

FIGURE: A comparative chart for set of one mean per class for radon and DCT of 10 training images.

Median of radon

FIGURE: A comparative chart for set of one median per class for radon and DCT of 10 training images.

FIGURE: comparative chart for set of one mean per class as compared to median for that class for radon and DCT of 10 training images.

For analysis purposes, we projected the data onto one dimension in terms of means and medians. If we use the Euclidean distance, the class most likely to be misclassified is the one whose mean value is very close to that of another class. The bar graphs above show the mean and median values; bars at the same level create confusion during classification because a test vector is at nearly the same distance from each of these feature vectors, so the misclassification rate would be high.

In most cases the mean and median values are similar for the respective classes. However, when the variation between training images is too high, the median is a better representation of the class, as it tends to fall among the most frequent values, whereas the mean is a synthetic middle value that is shifted by the variation within the training set.
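The difference between the two class representatives can be seen with a toy example, assuming each prototype is the element-wise mean or median of one class's training vectors (`class_prototypes` is a hypothetical helper and the numbers are illustrative only): a single outlying training vector drags the mean far away but barely moves the median.

```python
from statistics import mean, median


def class_prototypes(vectors):
    """Element-wise mean and median of one class's training feature vectors."""
    columns = list(zip(*vectors))  # one tuple per feature dimension
    return [mean(c) for c in columns], [median(c) for c in columns]


# Toy class with one outlying training vector in the first feature
means, medians = class_prototypes([[1, 10], [2, 11], [3, 12], [100, 13]])
print(means)    # [26.5, 11.5]
print(medians)  # [2.5, 11.5]
```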

FIGURE: comparative chart for set of 10 means per class for radon and DCT of 10 training images.

FIGURE: comparative chart for set of one mean, median, and median of 10 means per class for radon and DCT of 10 training images.

As mentioned above, the bars show either the median or the mean value. Classes at the same level are more likely to be misclassified than classes showing some difference in mean or median values. For each class we took the mean of the 10 training images, the median of the 10 images, and the median of 10 means. From the graph above, we notice that the median, as a 1D representation of the data, is a better metric for the Euclidean distance.

We also notice that for certain classes, such as class 11 (“shivani”), class 12 (“neha”) and class 18 (“pranav”), there is a significant difference between the mean and median images. The reason is that the training sets of these classes contain images with large variance. Since the variance is so high, the mean of the images increases, whereas the median remains unchanged. These classes are more likely to be misclassified if we take the distance from the mean image as the metric.

We also notice that Radon and DCT together cannot handle such variations in the image set. On careful observation, the images differ primarily in zoom and secondarily in illumination. The second condition (illumination) is handled by the algorithm, but the first (zoom) remains a problem.

Classification based on only DCT coefficients.

FIGURE: Scatter plot of 10 images per class for feature vector of 640 DCT coefficients for each image.

As mentioned earlier, the scatter plot of 640 DCT coefficients is too dense. We also see a number of coefficients at the same level, which implies less distinctness and more redundancy in the information provided by the DCT coefficients. This highly overlapping data severely degrades the classification rate.

FIGURE: comparative chart for set of one mean per class for DCT of 10 training images.

FIGURE: comparative chart for set of one mean per class for DCT of 10 training images.

FIGURE: comparative chart for set of one mean and one median per class for DCT of 10 training images.

We notice from the graph above that the mean values of the DCT training vectors are quite distinct, and that the median and mean values are similar even for classes whose training data is highly variable, such as those mentioned above. Since no two bars are at the same level, this scheme can provide better recognition rates, and both the mean and the median serve as sufficient metrics for distance calculations. We noticed that class 11 (“shivani”) was quite often classified as class 6 (“shengakai”) in real-time testing. The probable reason is the very small difference between their mean values, so that slight variations in the test image cause misclassification. This provides an intuitive explanation for the misclassification results.

FIGURE: comparative chart for set of one median per class for DCT of 10 training images.

FIGURE: comparative chart for set of one median per class for DCT of 10 training images.

Again, as mentioned above, the median of the images provides good distinction in the feature set. Intuitively, an incoming feature vector is more likely to fall close to the median values, which leads to correct classification, than close to slightly variant images away from the median.

FIGURE: comparative chart for set of one mean, one median and median of means per class for DCT and radon plus DCT of 10 training images.

RADON_DCT VS DCT

For analysis purposes, we plotted the mean of the Radon and DCT feature vectors, the median of the Radon and DCT feature vectors, the mean of the DCT feature vectors, the median of the DCT feature vectors, and the median of means for 10 images in each scenario. As expected, the feature values of the DCT are much higher than those of the Radon feature set. We also observe that the DCT feature set shows more distinct cluster values, thereby facilitating correct classification. The Radon transform, on the other hand, produced values lying very close together, making the system very sensitive to slight changes.

The reason for the smaller variation of the Radon transform could be that, for each of the 32 rotations, it sums the values of the columns as line integrals. This suppresses the variations in the image and brings the values close together. This might be a good technique for clustering images. However, in our case we need large variations in the feature set, as we are not implementing any discriminant algorithm. Radon transform and DCT in conjunction with a discriminant algorithm should provide excellent results: the data of each class would be closely scattered, making the in-class variation small, while the values of each class would be well separated from those of other classes, making the inter-class variation large. This is a desirable situation for any face recognition system.

Unfortunately, in our case, the large computation required for the covariance matrix made it impossible to implement any discriminant analysis method. Hence, in our case, DCT alone provides better recognition rates than the combination of Radon and DCT.

Flow diagram of the code:

VM322K2 video input (16-bit YUV) → CAM (8-bit Y, 240x320)

CAM (8-bit Y, 240x320) → CROP (128x128)

CROP (128x128) → buffer (3 sec) → Downsample (64x64)

Downsample (64x64) → Preprocessing (64x64)

Preprocessing (64x64) → Radon (32x44)

Radon (32x44) → Block (20)

Block (20) → 2D-DCT

2D-DCT → 10 coeff per block → feature_concat[200]

Send to kNN for classification.
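The vector lengths in the diagram and in the results follow from simple block counting, assuming only complete 8x8 blocks are used. The hypothetical helper below reproduces the 20-block, 200-value Radon+DCT vector and the 640- and 320-value DCT-only vectors.

```python
def feature_length(rows, cols, coeffs_per_block, block=8):
    """Return (number of complete blocks, feature-vector length).

    Assumes partial blocks at the right/bottom edges are discarded."""
    n_blocks = (rows // block) * (cols // block)
    return n_blocks, n_blocks * coeffs_per_block


print(feature_length(32, 44, 10))  # Radon+DCT: (20, 200)
print(feature_length(64, 64, 10))  # DCT only:  (64, 640)
print(feature_length(64, 64, 5))   # DCT only:  (64, 320)
```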

Limitations with REAL TIME RESPONSE

One of the major problems we encountered was that the image was not captured properly during real-time testing. The dataset gave excellent results in offline testing, while for the same pose the real-time classification was drastically worse. We learned that the algorithm was working fine, but the image was not being captured properly in the first stage itself. Since initially we took just one image for classification, we implemented the downsampling function inside the interrupt (while) loop: once a particular number of frames had been collected, the last frame stored in the crop array was processed and downsampled. However, we later learned that the final frame was most of the time corrupted or overwritten, because the interrupt was still enabled and running while we were performing the recognition task. Hence, the image used for classification was sometimes blurred or distorted due to interlacing effects, or to the interrupt halting suddenly while the camera was still storing the image. On the few occasions when the image happened to be captured properly, the classifier gave satisfactory results. To solve this problem, we went back to the old scheme of buffering images and then processing them. To avoid confusion or corrupted frames, we chose a middle frame for classification rather than the first or last frame. We also disabled the interrupt every time we entered the recognition step.

The second problem was the camera input. We noticed that the classification was also affected by the choice of camera. One of the camera units gave a low-resolution image that was noisy at the edges, while the other gave a high-resolution, clear image. Since we used the latter camera for collecting the database, we used the same camera in the test stage.

Displaying classification result

The classification result is displayed on the LCD as the name of the person being classified. For this, instead of storing the name messages as image files, we stored them as 84x320 2D arrays. This was a reasonable decision because, had we saved the messages as images, the program would eventually have had to read them and store them in arrays anyway, which would have been an extra overhead. This also saved us a considerable amount of loading time. Messages with initial user instructions were also saved in the system.

Imposter model.

Building an imposter model requires a large amount of data with large variations. Due to constraints and the lack of availability of such large databases, we leave the imposter model as future work, so the current recognition system is a “one out of k classes” classifier.

Quantitative Results

The quantitative results in terms of recognition rate are presented below:

Feature extraction method and classifier (test database)       Correct recognition rate
32 Radon angles + 10 DCT coefficients/block, k-NN (k=10)        86%
32 Radon angles + 10 DCT coefficients/block, E.D.               63%
5 DCT coefficients/block (320), k-NN (k=5)                       98%
5 DCT coefficients/block (320), k-NN (k=15)                      74%
10 DCT coefficients/block (640), k-NN (k=10)                     93%
16 DCT coefficients/block (1024), E.D.                           60%

We tested the system in real time as well as on the test dataset. The results above show the correct recognition rates for the various combinations of feature selection methods and classifiers. As expected, the Euclidean distance classifier gives a very low recognition rate for both the Radon- and DCT-based methods. For the method based on 16 DCT coefficients per block, the Euclidean distance gives recognition rates as low as 60%. This is expected, given the increased overlap of the data and the redundancy in the coefficient values. At the same time, selecting 5 DCT coefficients per block with a k-NN classifier with k=5 gives recognition rates as high as 98%. To see the effect of the value of k, we calculated the recognition rate for the same 5 DCT coefficients per block with a k-NN classifier using k=15 nearest neighbors. The recognition rate in this case dropped to 74%. During classification we noticed that, for most of the misclassified samples, the correct class was present among the first 5-7 neighbors. This explains the good recognition rate for k=5 and the low recognition rate for k=15. Hence, choosing an optimum value of k for the k-NN classifier is also a very important issue.

We also noticed that the recognition rates on the test database of 50 images were quite high, considering that the training set contained only 10 images. Also, as shown above, the training and test datasets are quite distinct and contain many pose variations. The algorithms give good recognition rates on the test database in offline on-board testing. However, achieving such high recognition rates in real-time recognition is still a problem because of pose variations relative to the training set.

Conclusion

The system is designed to work well with a database containing varied images of each subject. After observing the real-time classification, we believe that face detection is needed before feature extraction to account for the zoom factor. Including a discriminant analysis method should boost the real-time recognition rate significantly, as explained earlier. We also noticed that Radon + DCT is a good approach for energy compaction and for decreasing the in-class variance, but it does not contribute to increasing the inter-class variance and hence depends on a linear discriminant method to give a good recognition rate. In contrast, DCT alone provides a good inter-class variance but a large in-class variance, and is very sensitive to changes in pose and in the training set.

Future Work

We would like to implement a discriminant analysis method or a weighting function so that we can create a voting scheme over a set of images. We would also like to test other feature selection techniques, such as Haar wavelet transforms, the Walsh-Hadamard transform (WHT) and elastic bunch graph matching (EBGM).

