Equivalence of Some Common Linear Feature


TRANSCRIPT


  • Aim

    Recently, a number of empirical studies have compared the performance of PCA and ICA as feature extraction methods in appearance-based object recognition systems, with mixed and seemingly contradictory results.

    In this paper, the author briefly describes the connection between the two methods and argues that whitened PCA may yield results identical to ICA in some cases.

    Furthermore, he describes the specific situations in which ICA might significantly improve on PCA.

    The main goal of this short paper is to explain those seemingly contradictory results and to clarify under which circumstances ICA may outperform PCA or perform equally well.


  • PCA vs ICA

    PCA: focuses on uncorrelated, Gaussian components; uses second-order statistics; applies an orthogonal transformation.

    ICA: focuses on independent, non-Gaussian components; uses higher-order statistics; applies a (generally) non-orthogonal transformation.
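    This contrast can be illustrated with a minimal scikit-learn sketch (the toy data, mixing matrix and variable names below are illustrative assumptions, not taken from the paper):

```python
# Minimal contrast between PCA and ICA on a toy, non-Gaussian data set.
# Assumes numpy and scikit-learn are installed; the data is illustrative.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Two independent, uniformly distributed (sub-Gaussian) sources,
# mixed by a non-orthogonal matrix.
S = rng.uniform(-1, 1, size=(2000, 2))
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])
X = S @ A.T

# PCA: decorrelation based on second-order statistics; its directions
# (rows of components_) are orthonormal.
pca = PCA(n_components=2).fit(X)

# ICA: seeks statistically independent, non-Gaussian components using
# higher-order statistics; its directions are generally non-orthogonal.
ica = FastICA(n_components=2, random_state=0).fit(X)

print("PCA directions (orthonormal rows):\n", pca.components_)
print("ICA unmixing matrix (generally non-orthogonal rows):\n", ica.components_)
```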


  • The distance between two points is not the same in the uncorrelated data and the whitened data. However, the distance between two points is the same in the whitened data and the independent data, because the independent components are (approximately) a rotation of the whitened data.

    Fig. 1. Two artificial examples: (a) a sub-Gaussian data set (top row) and a super-Gaussian data set (bottom row), (b) both transformed by PCA, (c) by whitening, and (d) using the ICA model by means of FastICA. (e) shows the original data set with the ICA directions (a1, a2 and w1, w2) and the PCA directions (u1, u2). (a) Original. (b) Uncorrelated. (c) Whitened. (d) Independent. (e) All directions.
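    A small numeric check of this claim can be sketched as follows, assuming scikit-learn; the sub-Gaussian toy data mirrors the style of Fig. 1 but is not the paper's actual data:

```python
# Compare the distance between two fixed points in three representations:
# uncorrelated (PCA), whitened (PCA with whitening), and independent (FastICA).
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2000, 2))              # two independent sub-Gaussian sources
X = S @ np.array([[2.0, 1.0], [1.0, 1.5]]).T        # non-orthogonal mixing

X_uncorr = PCA(n_components=2).fit_transform(X)               # uncorrelated only
X_white  = PCA(n_components=2, whiten=True).fit_transform(X)  # whitened
X_indep  = FastICA(n_components=2, random_state=0).fit_transform(X)  # independent

i, j = 0, 1   # any two sample indices
d = lambda Z: np.linalg.norm(Z[i] - Z[j])

# The uncorrelated-only distance differs from the whitened one, while the
# whitened and independent distances agree up to a global scale factor and
# small numerical differences (FastICA is roughly a rotation of the whitened data).
print("uncorrelated:", d(X_uncorr))
print("whitened    :", d(X_white))
print("independent :", d(X_indep))
```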


  • Extended infomax

    In order to show that (for this simple artificial data set) the ICs obtained by Infomax are also, approximately, an orthonormal transformation of the whitened data, Fig. 2 shows the same artificial data sets as Fig. 1, but with the ICs obtained by the extended infomax algorithm.

    o NOTE: The claim does not quite hold for the algorithm of Comon. This is because his algorithm renormalizes the components so that the basis vectors (columns of A) have unit norm. Since this also changes the variance of the independent components (rows of S), the transform is in this case not an approximate rotation of the whitened data.
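    The effect of such a renormalization can be sketched numerically as below. This is only an illustration of the scaling issue, not Comon's actual algorithm; scikit-learn's FastICA and toy data are assumed.

```python
# Rescaling the columns of A to unit norm (and compensating in S so that the
# reconstruction X = S A^T + mean is unchanged) alters the per-component
# variance of S, so S is then no longer a rotation of the whitened data.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S_true = rng.laplace(size=(2000, 2))                 # super-Gaussian sources
X = S_true @ np.array([[2.0, 1.0], [1.0, 1.5]]).T    # non-orthogonal mixing

ica = FastICA(n_components=2, random_state=0)
S = ica.fit_transform(X)        # estimated independent components
A = ica.mixing_                 # estimated mixing matrix (basis vectors in its columns)

norms = np.linalg.norm(A, axis=0)
A_unit = A / norms              # basis vectors renormalized to unit norm
S_comp = S * norms              # compensating rescaling of the components

print("component variances before:", S.var(axis=0))        # equal across components
print("component variances after :", S_comp.var(axis=0))    # unequal -> no longer a rotation
print("max reconstruction error  :",
      np.abs(X - (S_comp @ A_unit.T + ica.mean_)).max())    # ~0: X is unchanged
```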


  • DIFFERENT ARCHITECTURES: INDEPENDENCE IN A OR IN S?

    There are two different architectures for the ICA decomposition X = AS:

    Architecture I: the basis images (columns of A) are required to be mutually independent.

    Architecture II: the coefficients (rows of S) are required to be mutually independent.

    In either case, the independent components, that is, the matrix that has been optimized for independence, are always an orthogonal transformation of the whitened data given by SVD, at least in the case of FastICA. (A small code sketch of both architectures follows below.)
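    As a rough illustration, one common way to realize the two architectures with FastICA is to run ICA on the transposed data matrix for independent basis images, and on the data matrix directly for independent coefficients. The synthetic "images", shapes and names below are illustrative assumptions, not the paper's code.

```python
# Toy illustration of the two ICA architectures on an (images x pixels) matrix X.
# Assumes scikit-learn; the synthetic "images" are only placeholders.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_images, n_pixels, k = 100, 256, 10
coeffs_true = rng.laplace(size=(n_images, k))        # non-Gaussian coefficients
basis_true = rng.normal(size=(k, n_pixels))          # toy basis images
X = coeffs_true @ basis_true                          # (100, 256) data matrix

# Architecture I: apply ICA across pixels, so the recovered components are
# basis images that are mutually independent (independence in A).
ica_I = FastICA(n_components=k, random_state=0)
basis_I = ica_I.fit_transform(X.T).T                  # (10, 256) independent basis images
coeffs_I = ica_I.mixing_                              # (100, 10) coefficients of each image

# Architecture II: apply ICA across images, so the recovered components are
# coefficients that are mutually independent (independence in S).
ica_II = FastICA(n_components=k, random_state=0)
coeffs_II = ica_II.fit_transform(X)                   # (100, 10) independent coefficients
basis_II = ica_II.mixing_.T                           # (10, 256) corresponding basis images

print(basis_I.shape, coeffs_I.shape, coeffs_II.shape, basis_II.shape)
```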


  • ROTATION INVARIANT CLASSIFIERS

    Most common classifiers are invariant to a rotation of the data space. If one does not have any pre-existing knowledge of the structure of the data space, it seems reasonable to build a classifier that does not care about data rotations.

    The author argues that the independent components of ICA are the result of a simple rotation of the whitened data. Hence, a rotation-invariant classifier will do no better and no worse when fed with the independent components than when fed with the whitened data. Thus, ICA gives you absolutely no advantage over whitening by SVD.

    If the classifier used is not rotation invariant, there may be advantages to performing the rotation that ICA gives.


  • SELECTING SUBSETS OF COMPONENTS

    The performance of ICA and PCA may differ when a subset of components is used for classification.

    In (a), the classes are perfectly separated using just the projection over the ICA direction w1, while in (b), the classes are better separated using the direction of the eigenvector u2.

    Feature selection is the process of removing features from the data set that are irrelevant with respect to the task to be performed.

    Therefore, in each example, a feature selection step may help to reduce the dimensionality and improve the classification. If no feature selection is carried out, ICA and whitened PCA perform exactly equally well on both data sets, provided a rotation-invariant classifier is used.
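    A minimal sketch of why the equivalence breaks down under component selection (assuming scikit-learn; toy data only): the single retained whitened-PCA component and the single retained ICA component are, in general, different one-dimensional projections of the data.

```python
# Keeping only one component: the retained whitened-PCA projection and the
# retained ICA projection are in general not rotations of each other.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2000, 2))               # independent sub-Gaussian sources
X = S @ np.array([[2.0, 1.0], [1.0, 1.5]]).T         # non-orthogonal mixing

pca_first = PCA(n_components=2, whiten=True).fit_transform(X)[:, 0]
ica_first = FastICA(n_components=2, random_state=0).fit_transform(X)[:, 0]

# A low correlation indicates the two retained features carry different
# information, so a classifier fed with one of them can behave very
# differently from a classifier fed with the other.
print("correlation between retained PCA and ICA feature:",
      np.corrcoef(pca_first, ica_first)[0, 1])
```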


  • Experimental Results


  • Limitations

    The test evaluation is done according to only one criterion, namely recognition rate. Other criteria, such as computational cost, may also apply.

    The database considered by the author for the experimental results is the OSL database. Experimental results for the JAFFE, FERRET and C-K databases may vary and are not covered in the paper. Thus, the paper does not state a generalization across databases.


  • Which is to be preferred?

    Although this comparative study does not consider the effects of registration errors and image pre-processing schemes, the results show there is no clear winner, because:

    All algorithms are unsupervised, and the results are highly dependent on the class distributions.

    The differences cannot be explained in terms of the underlying data distribution.


  • Suggestions

    When considering recognition tasks, comparisons should be made on the basis of:
    1. Nature of the task
    2. Architecture
    3. ICA algorithm
    4. Distance metric

    For example:
    1. Facial identity recognition: FastICA implementing ICA architecture II provides the best results, followed by PCA with the L1 or Mahalanobis distance metrics (ICA architecture I and PCA with the L2 or cosine metrics are poorer choices).
    2. Facial action recognition: the recommendation is reversed. Infomax implementing ICA architecture I provides the best results, followed by PCA with the L2 metric.

    (The distance metrics mentioned here are illustrated in the sketch below.)
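    For reference, the four distance metrics mentioned above can be computed as in the sketch below (SciPy is assumed; the feature vectors are toy placeholders, not the paper's features).

```python
# L1, L2, cosine and Mahalanobis distances between two toy feature vectors.
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 5))     # toy gallery of 5-D feature vectors
x, y = gallery[0], gallery[1]

# Inverse covariance of the gallery, needed for the Mahalanobis distance.
VI = np.linalg.inv(np.cov(gallery, rowvar=False))

print("L1 (city block):", distance.cityblock(x, y))
print("L2 (Euclidean) :", distance.euclidean(x, y))
print("cosine         :", distance.cosine(x, y))
print("Mahalanobis    :", distance.mahalanobis(x, y, VI))
```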


  • Author's conclusion

    In this paper, it has been shown how ICA and PCA are closely connected, and under which circumstances the equivalence fails:

    1. A feature selection process is carried out.
    2. A rotationally non-invariant classifier is used.
    3. A renormalization such as that used in Comon's algorithm is performed.
    4. Infomax is used and the data yields independent components which are not close to an orthogonal transform of the whitened data.


  • Thank You