Equivalence of Some Common Linear Feature
TRANSCRIPT
-
8/2/2019 Equivalence of Some Common Linear Feature
1/14
-
Aim

Recently, a number of empirical studies have compared the performance of PCA and ICA as feature extraction methods in appearance-based object recognition systems, with mixed and seemingly contradictory results.

In this paper, the author briefly describes the connection between the two methods and argues that whitened PCA may yield identical results to ICA in some cases.

Furthermore, he describes the specific situations in which ICA might significantly improve on PCA.

The main goal of this short paper is to explain those seemingly contradictory results and to clarify under which circumstances ICA may outperform PCA or perform equally.
-
PCA vs ICA

PCA:
Focus on uncorrelated and Gaussian components
Second-order statistics
Orthogonal transformation

ICA:
Focus on independent and non-Gaussian components
Higher-order statistics
Non-orthogonal transformation
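The PCA side of this contrast can be sketched numerically. A minimal NumPy illustration on toy data (the data set and all variable names are hypothetical, chosen only for the sketch): PCA uses only the covariance matrix, its transform is orthogonal, and the resulting components are uncorrelated but not necessarily independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy correlated data set (a hypothetical stand-in, not the paper's data).
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X -= X.mean(axis=0)

# PCA relies only on second-order statistics: the covariance matrix.
cov = X.T @ X / len(X)
eigvals, U = np.linalg.eigh(cov)

# The PCA transform is orthogonal ...
assert np.allclose(U.T @ U, np.eye(2))

# ... and the projected components are uncorrelated (diagonal covariance),
# though in general not independent.
Y = X @ U
cov_Y = Y.T @ Y / len(Y)
assert abs(cov_Y[0, 1]) < 1e-8
```

ICA would instead optimize a higher-order criterion (e.g. non-Gaussianity) and generally yields a non-orthogonal transform of the original data.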
-
The distance between two points is not the same in the uncorrelated data and the whitened data. However, the distance between two points is equal in the whitened data and the independent data.
Fig. 1. Two artificial examples: (a) a sub-Gaussian data set (top row) and a super-Gaussian data set (bottom row), (b) both transformed by PCA, (c) whitening, and (d) using the ICA model by means of FastICA. (e) shows the original data set with the ICA (a1, a2 and w1, w2) and PCA directions (u1, u2). (a) Original. (b) Uncorrelated. (c) Whitened. (d) Independent. (e) All directions.
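The distance claim above can be checked on toy data: whitening rescales the axes (so distances change between the uncorrelated and whitened data), while an orthonormal ICA step is a rotation of the whitened data (so distances are preserved). A minimal sketch, using an arbitrary rotation as a stand-in for the rotation FastICA would find:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data set (hypothetical stand-in for the paper's artificial sets).
X = rng.uniform(-1, 1, size=(200, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
X -= X.mean(axis=0)

# PCA rotation (uncorrelated data) and SVD whitening (identity covariance).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Y = X @ Vt.T               # uncorrelated (PCA-rotated) data
Z = U * np.sqrt(len(X))    # whitened data

# An orthonormal ICA step is a rotation of the whitened data; here an
# arbitrary rotation stands in for FastICA's final rotation.
t = 0.7
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
S = Z @ R                  # "independent" data in this sketch

def pdists(A):
    # All pairwise Euclidean distances.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

# Whitening rescales the axes: distances differ between the uncorrelated
# and the whitened data ...
assert not np.allclose(pdists(Y), pdists(Z))
# ... but the rotation preserves them: distances agree between the
# whitened and the independent data.
assert np.allclose(pdists(Z), pdists(S))
```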
-
Extended infomax

In order to show that (for this simple artificial data set) the ICs obtained by Infomax are also, approximately, an orthonormal transformation of the whitened data, Fig. 2 shows the same artificial data sets as Fig. 1, but with the ICs obtained by the Extended infomax algorithm.

NOTE: The claim does not quite hold for the algorithm of Comon. This is because his algorithm renormalizes the components so that the basis vectors (columns of A) have unit norm. Since this also changes the variance of the independent components (rows of S), the transform is in this case not an approximate rotation of the whitened data.
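The effect of such a renormalization can be sketched on toy data (the scaling factors below are hypothetical, chosen only for illustration, not Comon's actual output): a pure rotation keeps the components white, while rescaling the components destroys that property, so the overall transform is no longer a rotation of the whitened data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Exactly whitened toy data: sample covariance equal to the identity.
raw = rng.normal(size=(100, 2))
raw -= raw.mean(axis=0)
U, s, Vt = np.linalg.svd(raw, full_matrices=False)
Z = U * np.sqrt(len(raw) - 1)   # np.cov divides by n - 1
assert np.allclose(np.cov(Z.T), np.eye(2))

t = 0.5
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# An orthonormal ICA step keeps the components white ...
S_rot = Z @ R
assert np.allclose(np.cov(S_rot.T), np.eye(2))

# ... but renormalizing the basis vectors rescales the component variances,
# so the transform is no longer an approximate rotation of the whitened data.
D = np.diag([2.0, 0.5])         # hypothetical renormalization factors
S_renorm = Z @ R @ D
assert not np.allclose(np.cov(S_renorm.T), np.eye(2))
```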
-
DIFFERENT ARCHITECTURES: INDEPENDENCE IN A OR IN S?

There are two different architectures for the ICA decomposition X = AS:

Architecture I: we want the basis images (columns of A) to be mutually independent.
Architecture II: we want a decomposition where the coefficients (rows of S) are mutually independent.

In either case, the independent components, that is, the matrix that has been optimized for independence, are always an orthogonal transformation of the whitened data given by SVD, at least in the case of FastICA.
-
ROTATION-INVARIANT CLASSIFIERS

Most common classifiers are invariant to a rotation of the data space. If one does not have any pre-existing knowledge of the structure of the data space, it seems reasonable to build a classifier that does not care about data rotations.

The author argues that the independent components of ICA are the result of a simple rotation of the whitened data. Hence, a rotation-invariant classifier will do no better and no worse when fed with the independent components than when fed with the whitened data. Thus, ICA gives absolutely no advantage over whitening by SVD.

If the classifier used is rotationally non-invariant, there may be advantages to performing the rotation that ICA gives.
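Rotation invariance is easy to demonstrate with a minimal 1-nearest-neighbour sketch on toy data (all names and the data set are hypothetical): because Euclidean distances are unchanged by a rotation, the classifier's predictions are identical on the whitened data and on any rotation of it.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two toy classes in a whitened-like space.
train = np.vstack([rng.normal(size=(50, 2)) + [2.0, 0.0],
                   rng.normal(size=(50, 2)) - [2.0, 0.0]])
labels = np.array([0] * 50 + [1] * 50)
queries = rng.normal(size=(20, 2)) * 2.0

def nn_predict(tr, lab, q):
    # 1-nearest-neighbour with Euclidean distance: rotation invariant.
    d2 = ((q[:, None, :] - tr[None, :, :]) ** 2).sum(-1)
    return lab[d2.argmin(axis=1)]

t = 1.1
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

# Rotating the whole space (the step ICA adds on top of whitening, in the
# author's argument) leaves every prediction unchanged.
pred_plain = nn_predict(train, labels, queries)
pred_rot = nn_predict(train @ R, labels, queries @ R)
assert np.array_equal(pred_plain, pred_rot)
```

A classifier that treats coordinates asymmetrically (e.g. one using the L1 metric, or one that selects individual features) would not enjoy this guarantee, which is where the rotation ICA performs can matter.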
-
SELECTING SUBSETS OF COMPONENTS

The performance of ICA and PCA may differ when a subset of components is used for classification. In (a), the classes are perfectly separated using just the projection onto the ICA direction w1, while in (b), the classes are better separated using the direction of the eigenvector u2.

Feature selection is the process of removing features from the data set that are irrelevant with respect to the task that is to be performed. Therefore, in each example, a feature selection step may help to reduce the dimensionality and improve the classification. If no feature selection is carried out, ICA and whitened PCA perform exactly equally well on both data sets, provided a rotationally invariant classifier is used.
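The point about component subsets can be sketched with toy data: two classes separated along one direction (standing in for w1 in panel (a)) but not along the orthogonal one (standing in for u2). Keeping only one component then makes a large difference, even though the full two-dimensional representations are equivalent up to rotation. Both directions and the data set below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

w = np.array([1.0, 1.0]) / np.sqrt(2)    # stand-in for the ICA direction w1
u = np.array([1.0, -1.0]) / np.sqrt(2)   # orthogonal stand-in for u2

# Two toy classes separated along w only.
class0 = rng.normal(scale=0.3, size=(100, 2)) + 1.5 * w
class1 = rng.normal(scale=0.3, size=(100, 2)) - 1.5 * w

def separation(d):
    # Gap between the class means after projecting onto a single direction.
    return abs((class0 @ d).mean() - (class1 @ d).mean())

# Keeping only the projection onto w separates the classes well;
# keeping only the projection onto u discards nearly all class information.
assert separation(w) > 5 * separation(u)
```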
-
Experimental Results
-
Limitations

The evaluation is done according to only one criterion, i.e. the recognition rate. Other criteria, such as computational cost, may also apply.

The database considered by the author for the experimental results is the OSL database. Experimental results for the JAFFE, FERET, and C-K databases may vary and are not covered in the paper. Thus, the conclusions stated in the paper do not generalize beyond the databases used.
-
Which is to be preferred?

Though this comparative study does not consider the effects of registration errors and image pre-processing schemes, the results show that there is no clear winner, because:

All the algorithms are unsupervised, and
the results are highly dependent on the class distributions.
The differences cannot be explained in terms of the underlying data distribution.
-
Suggestions

When considering a recognition task, comparisons should be made on the basis of:
1. The nature of the task
2. The architecture
3. The ICA algorithm
4. The distance metric

For example:
1. Facial identity recognition: FastICA implementing ICA architecture II provides the best results, followed by PCA with the L1 or Mahalanobis distance metrics (ICA architecture I and PCA with the L2 or cosine metrics are poorer choices).
2. Facial action recognition: the recommendation is reversed. Infomax implementing ICA architecture I provides the best results, followed by PCA with the L2 metric.
-
Author's conclusion

In this paper, it has been shown how ICA and PCA are closely connected, and under which circumstances the equivalence fails:
1. A feature selection process is carried out.
2. A rotationally non-invariant classifier is used.
3. A renormalization such as that used in Comon's algorithm is performed.
4. Infomax is used and the data yield independent components which are not close to an orthogonal transform of the whitened data.
-
Thank You