random projections in dimensionality reduction
TRANSCRIPT
-
8/11/2019 Random Projections in Dimensionality Reduction
RANDOM PROJECTIONS IN DIMENSIONALITY REDUCTION
APPLICATIONS TO IMAGE AND TEXT DATA
Ella Bingham and Heikki Mannila
Angelo Cardoso
IST/UTL, November 2009
-
Outline
1. Dimensionality Reduction: Motivation
2. Methods for dimensionality reduction
   1. PCA
   2. DCT
   3. Random Projection
3. Results on Image Data
4. Results on Text Data
5. Conclusions
-
Dimensionality Reduction: Motivation
Many applications have high-dimensional data
Market basket analysis: wealth of alternative products
Text: large vocabulary
Image: large image window
We want to process the data
High dimensionality of the data restricts the choice of processing methods
Time needed to run the processing methods is too long
Memory requirements make it impossible to use some methods
-
Dimensionality Reduction: Motivation
We want to visualize high-dimensional data
Some features may be irrelevant
Some dimensions may be highly correlated with others, e.g. height and foot size
Intrinsic dimensionality may be smaller than the number of features
The data can be best described and understood by a smaller number of dimensions
-
Methods for dimensionality reduction
Main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space
A statistically optimal way is to project into a lower-dimensional orthogonal subspace that captures as much variation of the data as possible for the chosen k
The best (in terms of mean squared error) and most widely used way to do this is PCA
How to compare different methods?
Amount of distortion caused
Computational complexity
-
Principal Components Analysis (PCA): Intuition
Given an original space in 2-d
How can we represent the points in a k-dimensional space (k < d)?
-
Principal Components Analysis (PCA): Algorithm
Eigenvalues
A measure of how much data variance is explained by each eigenvector
Singular Value Decomposition (SVD)
Can be used to find the eigenvectors and eigenvalues of the covariance matrix
To project into the lower-dimensional space
Subtract the mean of X in each dimension, then multiply by the principal components (PCs)
To restore into the original space
Multiply the projection by the principal components and add the mean of X in each dimension
Algorithm
1. Create the N x d data matrix X, with one row vector xn per data point
2. Subtract the mean of each dimension in X
3. Compute the covariance matrix of X
4. Find the eigenvectors and eigenvalues of the covariance matrix
5. The PCs are the k eigenvectors with the largest eigenvalues
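The steps above can be sketched in NumPy; as the slide notes, the SVD of the centered data gives the eigenvectors of the covariance matrix directly, so the covariance matrix never has to be formed. This is a minimal illustration, not the presenters' code; the function names and toy data are ours.

```python
import numpy as np

def pca_project(X, k):
    """Project the N x d data matrix X onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                       # step 2: subtract the mean per dimension
    # SVD of the centered data: columns of Vt.T are the eigenvectors of the
    # covariance matrix, ordered by decreasing explained variance (steps 3-5)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    PCs = Vt[:k].T                      # the k directions of largest variance
    Y = Xc @ PCs                        # project into the k-dimensional space
    return Y, PCs, mean

def pca_restore(Y, PCs, mean):
    """Map a k-dimensional projection back to the original d-dimensional space."""
    return Y @ PCs.T + mean

# toy usage: 100 points in 5-d that actually lie in a 2-d subspace,
# so 2 principal components reconstruct them exactly
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
Y, PCs, mean = pca_project(X, k=2)
X_hat = pca_restore(Y, PCs, mean)
print(np.allclose(X, X_hat, atol=1e-8))  # True
```

The toy data has intrinsic dimensionality 2, so the k=2 reconstruction is exact — the situation the motivation slides describe.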
-
Random Projection (RP): Idea
PCA, even when calculated using SVD, is computationally expensive
Complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column and N the number of points
Idea: what if we randomly constructed the principal component vectors?
Johnson-Lindenstrauss lemma
If points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved
-
Random Projection (RP): Idea
Use a random matrix (R) in place of the principal components matrix
R is usually Gaussian distributed
Complexity is O(kcN)
The generated random matrix (R) is usually not orthogonal
Making R orthogonal is computationally expensive
However, we can rely on a result by Hecht-Nielsen:
In a high-dimensional space, there exists a much larger number of almost orthogonal than orthogonal directions
Thus vectors with random directions are close enough to orthogonal
Euclidean distances in the projected space can be scaled back to the original space by the factor sqrt(d/k)
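A minimal sketch of the idea: project with a matrix of random unit-length directions and rescale distances by sqrt(d/k). The dimensions below are illustrative (they roughly echo the image experiment later in the talk), and the data is synthetic Gaussian, not the presenters' data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, k = 100, 2500, 400

X = rng.normal(size=(N, d))            # data in the original d-dim space
R = rng.normal(size=(d, k))            # Gaussian entries, mean 0, std 1
R /= np.linalg.norm(R, axis=0)         # columns become random unit directions
Y = X @ R * np.sqrt(d / k)             # project, then rescale by sqrt(d/k)

# compare a few pairwise Euclidean distances before and after projection
orig = np.linalg.norm(X[0] - X[1:6], axis=1)
proj = np.linalg.norm(Y[0] - Y[1:6], axis=1)
ratio = proj / orig
print(ratio)                           # all close to 1
```

Note that R is never orthogonalized: per the Hecht-Nielsen observation above, its random columns are already nearly orthogonal in high dimension, which is why the distances survive.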
-
Random Projection: Simplified Random Projection (SRP)
The random matrix is usually Gaussian distributed (mean: 0, standard deviation: 1)
Achlioptas showed that a much simpler distribution can be used:
r_ij = sqrt(3) x (+1 with probability 1/6, 0 with probability 2/3, -1 with probability 1/6)
This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
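The Achlioptas distribution can be sketched as follows; this is an illustrative sketch with made-up sizes, not the presenters' code. The entries have mean 0 and variance 1, matching the Gaussian case, while two thirds of the matrix is zero.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 1000, 200

# Achlioptas' distribution:
# r_ij = sqrt(3) * (+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6)
R = np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])

# the sqrt(3) factor can be pulled outside the matrix product, so the
# inner loop reduces to additions and subtractions on a sparse matrix
X = rng.normal(size=(50, d))
Y = X @ R / np.sqrt(k)                 # unit-variance entries: scale by 1/sqrt(k)

orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(proj / orig)                     # close to 1: distances still preserved
```

This is where the roughly 100x saving in floating point operations reported later comes from: most multiplications vanish with the zero entries.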
-
Discrete Cosine Transform (DCT)
Widely used method for image compression
Optimal for the human eye
Distortions are introduced at the highest frequencies, which humans tend to neglect as noise
DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
This makes DCT orders of magnitude cheaper to compute
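To make "not data-dependent" concrete: the DCT basis is a fixed set of cosine vectors, computable without ever looking at the data. A minimal 1-D sketch (images use the same transform in 2-D; the signal here is synthetic and chosen so that truncation is exact):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis: row u is a cosine at frequency u."""
    u = np.arange(n)[:, None]           # frequency index
    x = np.arange(n)[None, :]           # sample index
    B = np.cos(np.pi * (x + 0.5) * u / n) * np.sqrt(2.0 / n)
    B[0] /= np.sqrt(2.0)                # the DC row gets the smaller scale
    return B

n = 64
B = dct_matrix(n)

x = np.arange(n)
# a signal built from two low-frequency cosines (multiples of basis rows 2 and 5)
signal = np.cos(np.pi * (x + 0.5) * 2 / n) + 0.5 * np.cos(np.pi * (x + 0.5) * 5 / n)

coeffs = B @ signal
k = 8
coeffs[k:] = 0.0                        # discard the high frequencies
restored = B.T @ coeffs                 # inverse transform (B is orthonormal)
print(np.allclose(signal, restored))    # True: all energy sits below frequency 8
```

Zeroing the high-frequency coefficients is exactly the compression step the slide describes: the discarded detail is what the eye tends to dismiss as noise.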
-
Results: Noiseless Images
-
Results: Noiseless Images
-
Results: Noiseless Images
Original space: 2500-d (100 image pairs with 50x50 pixels)
Error measurement
Average error on the Euclidean distance between 100 pairs of images in the original and reduced space
Amount of distortion
RP and SRP give accurate results for very small k (k > 10)
Distance scaling might be an explanation for the success
PCA gives accurate results for k > 600
In PCA such scaling is not straightforward
DCT still has a significant error even for k > 600
Computational complexity
The number of floating point operations for RP and SRP is on the order of 100 times less than for PCA
RP and SRP clearly outperform PCA and DCT at the smallest dimensions
-
Results: Noisy Images
Images were corrupted by salt-and-pepper impulse noise with probability 0.2
Error is computed in the high-dimensional noiseless space
RP, SRP, PCA and DCT perform quite similarly to the noiseless case
-
Results: Text Data
Data set
Newsgroups corpus: sci.crypt, sci.med, sci.space, soc.religion
Pre-processing
Term frequency vectors
Some common terms were removed, but no stemming was used
Document vectors normalized to unit length
Data was not made zero mean
Size
5000 terms
2262 newsgroup documents
Error measurement
100 pairs of documents were randomly selected and the error between their cosine before and after the dimensionality reduction was calculated
-
Results: Text Data
The cosine was used as the similarity measure since it is more common for this task
RP is not as accurate as SVD
The Johnson-Lindenstrauss result states that Euclidean distances are retained well under random projection, not the cosine
The RP error may be neglected in most applications
RP can be used on large document collections with less computational complexity than SVD
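The cosine experiment can be sketched on synthetic data; the sizes echo the slide's corpus (5000 terms, unit-length term-frequency vectors), but the vectors below are random, not real documents.

```python
import numpy as np

rng = np.random.default_rng(3)
n_docs, n_terms, k = 50, 5000, 500

# sparse nonnegative "term frequency" vectors, normalized to unit length
X = rng.random((n_docs, n_terms)) * (rng.random((n_docs, n_terms)) < 0.02)
X /= np.linalg.norm(X, axis=1, keepdims=True)

R = rng.normal(size=(n_terms, k)) / np.sqrt(k)   # Gaussian random projection
Y = X @ R

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

before = cosine(X[0], X[1])
after = cosine(Y[0], Y[1])
print(abs(before - after))   # small, but not exactly zero
```

As the slide notes, the residual error is expected: Johnson-Lindenstrauss guarantees the Euclidean distances, and the cosine is preserved only approximately through them.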
-
Conclusion
Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
RP preserves the similarities even if the data is projected into a moderate number of dimensions
RP is beneficial in applications where the distances of the original space are meaningful
RP is a good alternative to traditional dimensionality reduction methods, which are infeasible for high-dimensional data; RP does not suffer from the curse of dimensionality
-
Questions