random projections in dimensionality reduction
TRANSCRIPT
-
8/11/2019 Random Projections in Dimensionality Reduction
RANDOM PROJECTIONS IN DIMENSIONALITY REDUCTION
APPLICATIONS TO IMAGE AND TEXT DATA
Ella Bingham and Heikki Mannila
Angelo Cardoso
IST/UTL, November 2009
-
Outline
1. Dimensionality Reduction: Motivation
2. Methods for dimensionality reduction
   1. PCA
   2. DCT
   3. Random Projection
3. Results on Image Data
4. Results on Text Data
5. Conclusions
-
Dimensionality Reduction: Motivation
Many applications have high-dimensional data
Market basket analysis: wealth of alternative products
Text: large vocabulary
Image: large image window
We want to process the data
High dimensionality of the data restricts the choice of processing methods
Time needed to run the processing methods is too long
Memory requirements make it impossible to use some methods
-
Dimensionality Reduction: Motivation
We want to visualize high-dimensional data
Some features may be irrelevant
Some dimensions may be highly correlated with others, e.g. height and foot size
Intrinsic dimensionality may be smaller than the number of features
The data can be best described and understood by a smaller number of dimensions
-
Methods for dimensionality reduction
Main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space
A statistically optimal way is to project into a lower-dimensional orthogonal subspace that captures as much variation of the data as possible for the chosen k
The best (in terms of mean squared error) and most widely used way to do this is PCA
How to compare different methods?
Amount of distortion caused
Computational complexity
-
Principal Components Analysis (PCA): Intuition
Given an original space in 2-d
How can we represent the points in a k-dimensional space (k < d)?
-
Principal Components Analysis (PCA): Algorithm
Eigenvalues
A measure of how much data variance is explained by each eigenvector
Singular Value Decomposition (SVD)
Can be used to find the eigenvectors and eigenvalues of the covariance matrix
To project into the lower-dimensional space
Subtract the mean of X in each dimension, then multiply by the principal components (PCs)
To restore into the original space
Multiply the projection by the principal components and add the mean of X in each dimension
Algorithm
1. Create the N x d data matrix X, with one row vector xn per data point
2. Subtract the mean of each dimension in X
3. Compute the covariance matrix of X
4. Find the eigenvectors and eigenvalues of the covariance matrix
5. The PCs are the k eigenvectors with the largest eigenvalues
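The steps above can be sketched in NumPy; as the slide notes, the SVD of the centered data gives the eigenvectors of the covariance matrix directly, so the covariance matrix never has to be formed. This is a minimal illustration, not the presenters' code; the function names and toy data are ours.

```python
import numpy as np

def pca_project(X, k):
    """Project the N x d data matrix X onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                       # step 2: subtract the mean per dimension
    # SVD of the centered data: columns of Vt.T are the eigenvectors of the
    # covariance matrix, ordered by decreasing explained variance (steps 3-5)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    PCs = Vt[:k].T                      # the k directions of largest variance
    Y = Xc @ PCs                        # project into the k-dimensional space
    return Y, PCs, mean

def pca_restore(Y, PCs, mean):
    """Map a k-dimensional projection back to the original d-dimensional space."""
    return Y @ PCs.T + mean

# toy usage: 100 points in 5-d that actually lie in a 2-d subspace,
# so 2 principal components reconstruct them exactly
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
Y, PCs, mean = pca_project(X, k=2)
X_hat = pca_restore(Y, PCs, mean)
print(np.allclose(X, X_hat, atol=1e-8))  # True
```

The toy data has intrinsic dimensionality 2, so the k=2 reconstruction is exact — the situation the motivation slides describe.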
-
Random Projection (RP): Idea
PCA, even when calculated using SVD, is computationally expensive
Complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column and N the number of points
Idea: what if we randomly constructed the principal component vectors?
Johnson-Lindenstrauss lemma
If points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved
-
Random Projection (RP): Idea
Use a random matrix (R) in place of the principal components matrix
R is usually Gaussian distributed
Complexity is O(kcN)
The generated random matrix (R) is usually not orthogonal
Making R orthogonal is computationally expensive
However, we can rely on a result by Hecht-Nielsen:
In a high-dimensional space, there exists a much larger number of almost orthogonal than orthogonal directions
Thus vectors with random directions are close enough to orthogonal
Euclidean distances in the projected space can be scaled back to the original space by the factor sqrt(d/k)
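A minimal sketch of the idea: project with a matrix of random unit-length directions and rescale distances by sqrt(d/k). The dimensions below are illustrative (they roughly echo the image experiment later in the talk), and the data is synthetic Gaussian, not the presenters' data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, k = 100, 2500, 400

X = rng.normal(size=(N, d))            # data in the original d-dim space
R = rng.normal(size=(d, k))            # Gaussian entries, mean 0, std 1
R /= np.linalg.norm(R, axis=0)         # columns become random unit directions
Y = X @ R * np.sqrt(d / k)             # project, then rescale by sqrt(d/k)

# compare a few pairwise Euclidean distances before and after projection
orig = np.linalg.norm(X[0] - X[1:6], axis=1)
proj = np.linalg.norm(Y[0] - Y[1:6], axis=1)
ratio = proj / orig
print(ratio)                           # all close to 1
```

Note that R is never orthogonalized: per the Hecht-Nielsen observation above, its random columns are already nearly orthogonal in high dimension, which is why the distances survive.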
-
Random Projection: Simplified Random Projection (SRP)
The random matrix is usually Gaussian distributed (mean: 0, standard deviation: 1)
Achlioptas showed that a much simpler distribution can be used:
r_ij = sqrt(3) x (+1 with probability 1/6, 0 with probability 2/3, -1 with probability 1/6)
This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
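The Achlioptas distribution can be sketched as follows; this is an illustrative sketch with made-up sizes, not the presenters' code. The entries have mean 0 and variance 1, matching the Gaussian case, while two thirds of the matrix is zero.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 1000, 200

# Achlioptas' distribution:
# r_ij = sqrt(3) * (+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6)
R = np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])

# the sqrt(3) factor can be pulled outside the matrix product, so the
# inner loop reduces to additions and subtractions on a sparse matrix
X = rng.normal(size=(50, d))
Y = X @ R / np.sqrt(k)                 # unit-variance entries: scale by 1/sqrt(k)

orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(proj / orig)                     # close to 1: distances still preserved
```

This is where the roughly 100x saving in floating point operations reported later comes from: most multiplications vanish with the zero entries.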
-
Discrete Cosine Transform (DCT)
Widely used method for image compression
Optimal for the human eye
Distortions are introduced at the highest frequencies, which humans tend to neglect as noise
DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
This makes DCT orders of magnitude cheaper to compute
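To make "not data-dependent" concrete: the DCT basis is a fixed set of cosine vectors, computable without ever looking at the data. A minimal 1-D sketch (images use the same transform in 2-D; the signal here is synthetic and chosen so that truncation is exact):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis: row u is a cosine at frequency u."""
    u = np.arange(n)[:, None]           # frequency index
    x = np.arange(n)[None, :]           # sample index
    B = np.cos(np.pi * (x + 0.5) * u / n) * np.sqrt(2.0 / n)
    B[0] /= np.sqrt(2.0)                # the DC row gets the smaller scale
    return B

n = 64
B = dct_matrix(n)

x = np.arange(n)
# a signal built from two low-frequency cosines (multiples of basis rows 2 and 5)
signal = np.cos(np.pi * (x + 0.5) * 2 / n) + 0.5 * np.cos(np.pi * (x + 0.5) * 5 / n)

coeffs = B @ signal
k = 8
coeffs[k:] = 0.0                        # discard the high frequencies
restored = B.T @ coeffs                 # inverse transform (B is orthonormal)
print(np.allclose(signal, restored))    # True: all energy sits below frequency 8
```

Zeroing the high-frequency coefficients is exactly the compression step the slide describes: the discarded detail is what the eye tends to dismiss as noise.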
-
Results: Noiseless Images
-
Results: Noiseless Images
-
Results: Noiseless Images
Original space: 2500-d (100 image pairs with 50x50 pixels)
Error measurement
Average error on the Euclidean distance between 100 pairs of images in the original and reduced space
Amount of distortion
RP and SRP give accurate results for very small k (k > 10)
Distance scaling might be an explanation for the success
PCA gives accurate results for k > 600
In PCA such scaling is not straightforward
DCT still has a significant error even for k > 600
Computational complexity
The number of floating point operations for RP and SRP is on the order of 100 times less than for PCA
RP and SRP clearly outperform PCA and DCT at the smallest dimensions
-
Results: Noisy Images
Images were corrupted by salt-and-pepper impulse noise with probability 0.2
Error is computed in the high-dimensional noiseless space
RP, SRP, PCA and DCT perform quite similarly to the noiseless case
-
Results: Text Data
Data set
Newsgroups corpus: sci.crypt, sci.med, sci.space, soc.religion
Pre-processing
Term frequency vectors
Some common terms were removed, but no stemming was used
Document vectors normalized to unit length
Data was not made zero mean
Size
5000 terms
2262 newsgroup documents
Error measurement
100 pairs of documents were randomly selected and the error between their cosine before and after the dimensionality reduction was calculated
-
Results: Text Data
The cosine was used as the similarity measure since it is more common for this task
RP is not as accurate as SVD
The Johnson-Lindenstrauss result states that Euclidean distances are retained well under random projection, not the cosine
The RP error may be neglected in most applications
RP can be used on large document collections with less computational complexity than SVD
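The cosine experiment can be sketched on synthetic data; the sizes echo the slide's corpus (5000 terms, unit-length term-frequency vectors), but the vectors below are random, not real documents.

```python
import numpy as np

rng = np.random.default_rng(3)
n_docs, n_terms, k = 50, 5000, 500

# sparse nonnegative "term frequency" vectors, normalized to unit length
X = rng.random((n_docs, n_terms)) * (rng.random((n_docs, n_terms)) < 0.02)
X /= np.linalg.norm(X, axis=1, keepdims=True)

R = rng.normal(size=(n_terms, k)) / np.sqrt(k)   # Gaussian random projection
Y = X @ R

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

before = cosine(X[0], X[1])
after = cosine(Y[0], Y[1])
print(abs(before - after))   # small, but not exactly zero
```

As the slide notes, the residual error is expected: Johnson-Lindenstrauss guarantees the Euclidean distances, and the cosine is preserved only approximately through them.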
-
Conclusion
Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
RP preserves the similarities even if the data is projected into a moderate number of dimensions
RP is beneficial in applications where the distances of the original space are meaningful
RP is a good alternative to traditional dimensionality reduction methods, which are infeasible for high-dimensional data; RP does not suffer from the curse of dimensionality
-
Questions