Equivalence of Some Common Linear Feature


TRANSCRIPT


  • Aim

    Recently, a number of empirical studies have compared the performance of PCA and ICA as feature extraction methods in appearance-based object recognition systems, with mixed and seemingly contradictory results.

    In this paper, the author briefly describes the connection between the two methods and argues that whitened PCA may yield results identical to ICA in some cases.

    Furthermore, he describes the specific situations in which ICA might significantly improve on PCA.

    The main goal of this short paper is to explain those seemingly contradictory results and to clarify under which circumstances ICA may outperform PCA or perform equally well.


  • PCA vs ICA

    PCA: focuses on uncorrelated, Gaussian components; uses second-order statistics; applies an orthogonal transformation.

    ICA: focuses on independent, non-Gaussian components; uses higher-order statistics; applies a (generally) non-orthogonal transformation.
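    This contrast can be illustrated with a minimal scikit-learn sketch (the toy data, mixing matrix and variable names below are illustrative assumptions, not taken from the paper):

```python
# Minimal contrast between PCA and ICA on a toy, non-Gaussian data set.
# Assumes numpy and scikit-learn are installed; the data is illustrative.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Two independent, uniformly distributed (sub-Gaussian) sources,
# mixed by a non-orthogonal matrix.
S = rng.uniform(-1, 1, size=(2000, 2))
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])
X = S @ A.T

# PCA: decorrelation based on second-order statistics; its directions
# (rows of components_) are orthonormal.
pca = PCA(n_components=2).fit(X)

# ICA: seeks statistically independent, non-Gaussian components using
# higher-order statistics; its directions are generally non-orthogonal.
ica = FastICA(n_components=2, random_state=0).fit(X)

print("PCA directions (orthonormal rows):\n", pca.components_)
print("ICA unmixing matrix (generally non-orthogonal rows):\n", ica.components_)
```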


  • The distance between two points is not the same in the uncorrelated data and the whitened data. However, the distance between two points is the same in the whitened data and the independent data, because the independent components are (approximately) a rotation of the whitened data.

    Fig. 1. Two artificial examples: (a) a sub-Gaussian data set (top row) and a super-Gaussian data set (bottom row), (b) both transformed by PCA, (c) by whitening, and (d) using the ICA model by means of FastICA. (e) shows the original data set with the ICA directions (a1, a2 and w1, w2) and the PCA directions (u1, u2). (a) Original. (b) Uncorrelated. (c) Whitened. (d) Independent. (e) All directions.
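    A small numeric check of this claim can be sketched as follows, assuming scikit-learn; the sub-Gaussian toy data mirrors the style of Fig. 1 but is not the paper's actual data:

```python
# Compare the distance between two fixed points in three representations:
# uncorrelated (PCA), whitened (PCA with whitening), and independent (FastICA).
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2000, 2))              # two independent sub-Gaussian sources
X = S @ np.array([[2.0, 1.0], [1.0, 1.5]]).T        # non-orthogonal mixing

X_uncorr = PCA(n_components=2).fit_transform(X)               # uncorrelated only
X_white  = PCA(n_components=2, whiten=True).fit_transform(X)  # whitened
X_indep  = FastICA(n_components=2, random_state=0).fit_transform(X)  # independent

i, j = 0, 1   # any two sample indices
d = lambda Z: np.linalg.norm(Z[i] - Z[j])

# The uncorrelated-only distance differs from the whitened one, while the
# whitened and independent distances agree up to a global scale factor and
# small numerical differences (FastICA is roughly a rotation of the whitened data).
print("uncorrelated:", d(X_uncorr))
print("whitened    :", d(X_white))
print("independent :", d(X_indep))
```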


  • Extended infomax

    In order to show that (for this simple artificial data set) the ICs obtained by Infomax are also, approximately, an orthonormal transformation of the whitened data, Fig. 2 shows the same artificial data sets as Fig. 1, but with the ICs obtained by the extended infomax algorithm.

    o NOTE: The claim does not quite hold for the algorithm of Comon. This is because his algorithm renormalizes the components so that the basis vectors (columns of A) have unit norm. Since this also changes the variance of the independent components (rows of S), the transform is in this case not an approximate rotation of the whitened data.
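    The effect of such a renormalization can be sketched numerically as below. This is only an illustration of the scaling issue, not Comon's actual algorithm; scikit-learn's FastICA and toy data are assumed.

```python
# Rescaling the columns of A to unit norm (and compensating in S so that the
# reconstruction X = S A^T + mean is unchanged) alters the per-component
# variance of S, so S is then no longer a rotation of the whitened data.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S_true = rng.laplace(size=(2000, 2))                 # super-Gaussian sources
X = S_true @ np.array([[2.0, 1.0], [1.0, 1.5]]).T    # non-orthogonal mixing

ica = FastICA(n_components=2, random_state=0)
S = ica.fit_transform(X)        # estimated independent components
A = ica.mixing_                 # estimated mixing matrix (basis vectors in its columns)

norms = np.linalg.norm(A, axis=0)
A_unit = A / norms              # basis vectors renormalized to unit norm
S_comp = S * norms              # compensating rescaling of the components

print("component variances before:", S.var(axis=0))        # equal across components
print("component variances after :", S_comp.var(axis=0))    # unequal -> no longer a rotation
print("max reconstruction error  :",
      np.abs(X - (S_comp @ A_unit.T + ica.mean_)).max())    # ~0: X is unchanged
```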


  • DIFFERENT ARCHITECTURES: INDEPENDENCE IN A OR IN S?

    There are two different architectures for the ICA decomposition X = AS:

    Architecture I: the basis images (columns of A) are required to be mutually independent.

    Architecture II: the coefficients (rows of S) are required to be mutually independent.

    In either case, the independent components, that is, the matrix that has been optimized for independence, are always an orthogonal transformation of the whitened data given by SVD, at least in the case of FastICA. (A small code sketch of both architectures follows below.)
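    As a rough illustration, one common way to realize the two architectures with FastICA is to run ICA on the transposed data matrix for independent basis images, and on the data matrix directly for independent coefficients. The synthetic "images", shapes and names below are illustrative assumptions, not the paper's code.

```python
# Toy illustration of the two ICA architectures on an (images x pixels) matrix X.
# Assumes scikit-learn; the synthetic "images" are only placeholders.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_images, n_pixels, k = 100, 256, 10
coeffs_true = rng.laplace(size=(n_images, k))        # non-Gaussian coefficients
basis_true = rng.normal(size=(k, n_pixels))          # toy basis images
X = coeffs_true @ basis_true                          # (100, 256) data matrix

# Architecture I: apply ICA across pixels, so the recovered components are
# basis images that are mutually independent (independence in A).
ica_I = FastICA(n_components=k, random_state=0)
basis_I = ica_I.fit_transform(X.T).T                  # (10, 256) independent basis images
coeffs_I = ica_I.mixing_                              # (100, 10) coefficients of each image

# Architecture II: apply ICA across images, so the recovered components are
# coefficients that are mutually independent (independence in S).
ica_II = FastICA(n_components=k, random_state=0)
coeffs_II = ica_II.fit_transform(X)                   # (100, 10) independent coefficients
basis_II = ica_II.mixing_.T                           # (10, 256) corresponding basis images

print(basis_I.shape, coeffs_I.shape, coeffs_II.shape, basis_II.shape)
```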


  • ROTATION INVARIANT CLASSIFIERS

    Most common classifiers are invariant to a rotation of the data space. If one does not have any pre-existing knowledge of the structure of the data space, it seems reasonable to build a classifier that does not care about data rotations.

    The author argues that the independent components of ICA are the result of a simple rotation of the whitened data. Hence, a rotation-invariant classifier will do no better and no worse when fed with the independent components than when fed with the whitened data. Thus, ICA gives you absolutely no advantage over whitening by SVD.

    If the classifier used is not rotation invariant, there may be advantages to performing the rotation that ICA gives.


  • SELECTING SUBSETS OF COMPONENTS

    The performance of ICA and PCA may differ when a subset of components is used for classification.

    In (a), the classes are perfectly separated using just the projection over the ICA direction w1, while in (b), the classes are better separated using the direction of the eigenvector u2.

    Feature selection is the process of removing features from the data set that are irrelevant with respect to the task to be performed.

    Therefore, in each example, a feature selection step may help to reduce the dimensionality and improve the classification. If no feature selection is carried out, ICA and whitened PCA perform exactly equally well on both data sets, provided a rotation-invariant classifier is used.
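    A minimal sketch of why the equivalence breaks down under component selection (assuming scikit-learn; toy data only): the single retained whitened-PCA component and the single retained ICA component are, in general, different one-dimensional projections of the data.

```python
# Keeping only one component: the retained whitened-PCA projection and the
# retained ICA projection are in general not rotations of each other.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(2000, 2))               # independent sub-Gaussian sources
X = S @ np.array([[2.0, 1.0], [1.0, 1.5]]).T         # non-orthogonal mixing

pca_first = PCA(n_components=2, whiten=True).fit_transform(X)[:, 0]
ica_first = FastICA(n_components=2, random_state=0).fit_transform(X)[:, 0]

# A low correlation indicates the two retained features carry different
# information, so a classifier fed with one of them can behave very
# differently from a classifier fed with the other.
print("correlation between retained PCA and ICA feature:",
      np.corrcoef(pca_first, ica_first)[0, 1])
```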


  • Experimental Results


  • Limitations

    The test evaluation is done according to only one criterion, namely recognition rate. Other criteria, such as computational cost, may also apply.

    The database considered by the author for the experimental results is the OSL database. Experimental results for the JAFFE, FERRET and C-K databases may vary and are not covered in the paper. Thus, the paper does not state a generalization across databases.


  • Which is to be preferred?

    Although this comparative study does not consider the effects of registration errors and image pre-processing schemes, the results show there is no clear winner, because:

    All algorithms are unsupervised, and the results are highly dependent on the class distributions.

    The differences cannot be explained in terms of the underlying data distribution.


  • Suggestions

    When considering recognition tasks, comparisons should be made on the basis of:
    1. Nature of the task
    2. Architecture
    3. ICA algorithm
    4. Distance metric

    For example:
    1. Facial identity recognition: FastICA implementing ICA architecture II provides the best results, followed by PCA with the L1 or Mahalanobis distance metrics (ICA architecture I and PCA with the L2 or cosine metrics are poorer choices).
    2. Facial action recognition: the recommendation is reversed. Infomax implementing ICA architecture I provides the best results, followed by PCA with the L2 metric.

    (The distance metrics mentioned here are illustrated in the sketch below.)
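    For reference, the four distance metrics mentioned above can be computed as in the sketch below (SciPy is assumed; the feature vectors are toy placeholders, not the paper's features).

```python
# L1, L2, cosine and Mahalanobis distances between two toy feature vectors.
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 5))     # toy gallery of 5-D feature vectors
x, y = gallery[0], gallery[1]

# Inverse covariance of the gallery, needed for the Mahalanobis distance.
VI = np.linalg.inv(np.cov(gallery, rowvar=False))

print("L1 (city block):", distance.cityblock(x, y))
print("L2 (Euclidean) :", distance.euclidean(x, y))
print("cosine         :", distance.cosine(x, y))
print("Mahalanobis    :", distance.mahalanobis(x, y, VI))
```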


  • Author's conclusion

    In this paper, it has been shown how ICA and PCA are closely connected, and under which circumstances the equivalence fails:

    1. A feature selection process is carried out.
    2. A rotationally non-invariant classifier is used.
    3. A renormalization such as that used in Comon's algorithm is performed.
    4. Infomax is used and the data yields independent components which are not close to an orthogonal transform of the whitened data.


  • Thank You