
Page 1: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


ICS 278: Data Mining

Lecture 5: Low-Dimensional Representations of High-Dimensional Data

Page 2: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Today’s lecture

• Extend project proposal deadline to Monday, 8am: questions?

• Outline of today's lecture:
  – "orphan" slides from earlier lectures
  – Dimension reduction methods
    • Motivation
    • Variable selection methods
    • Linear projection techniques
    • Non-linear embedding methods

Page 3: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Notation Reminder

• n objects, each with p measurements

• Data vector for the ith object:  x(i) = ( x_1(i), x_2(i), …, x_p(i) )

• Data matrix X:
  – x_j(i) is the ith row, jth column
  – columns -> variables
  – rows -> data points

• Can define distances/similarities
  – between rows (data vectors i)
  – between columns (variables j)

Page 4: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Distance

• n objects with p measurements:

  x = ( x_1, x_2, …, x_p ),   y = ( y_1, y_2, …, y_p )

• Most common distance metric is Euclidean distance (a small sketch follows this slide):

  d_E(x, y) = \left[ \sum_{k=1}^{p} ( x_k - y_k )^2 \right]^{1/2}

• Makes sense in the case where the different measurements are commensurate; each variable measured in the same units.

• If the measurements are different, say length and weight, Euclidean distance is not necessarily meaningful.
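A minimal NumPy sketch of the Euclidean distance defined above; the example vectors are illustrative.

```python
import numpy as np

def euclidean_distance(x, y):
    """d_E(x, y) = sqrt( sum_k (x_k - y_k)^2 ) for two p-dimensional vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])
print(euclidean_distance(x, y))  # sqrt(1 + 4 + 0) ≈ 2.236
```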

Page 5: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Dependence among Variables

• Covariance and correlation measure linear dependence (distance between variables, not objects)

• Assume we have two variables or attributes X and Y and n objects taking on values x(1), …, x(n) and y(1), …, y(n). The sample covariance of X and Y is:

  Cov(X, Y) = \frac{1}{n} \sum_{i=1}^{n} ( x(i) - \bar{x} )( y(i) - \bar{y} )

• The covariance is a measure of how X and Y vary together.
  – it will be large and positive if large values of X are associated with large values of Y, and small values of X with small values of Y

Page 6: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Correlation coefficient

• Covariance depends on ranges of X and Y
• Standardize by dividing by standard deviation
• Linear correlation coefficient is defined as:

  \rho(X, Y) = \frac{ \sum_{i=1}^{n} ( x(i) - \bar{x} )( y(i) - \bar{y} ) }
                    { \left[ \sum_{i=1}^{n} ( x(i) - \bar{x} )^2 \right]^{1/2} \left[ \sum_{i=1}^{n} ( y(i) - \bar{y} )^2 \right]^{1/2} }
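A minimal NumPy sketch of the sample covariance and correlation formulas above; the two example variables are illustrative.

```python
import numpy as np

def sample_covariance(x, y):
    """Cov(X, Y) = (1/n) * sum_i (x(i) - xbar)(y(i) - ybar)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.mean((x - x.mean()) * (y - y.mean())))

def correlation(x, y):
    """Covariance standardized by the two standard deviations (lies in [-1, +1])."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    return float(num / den)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(sample_covariance(x, y), correlation(x, y))  # correlation is close to +1
```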

Page 7: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Sample Correlation Matrix

[Figure: sample correlation matrix for data on characteristics of Boston suburbs, shown on a color scale from -1 to +1; variables include business acreage, nitrous oxide, percentage of large residential lots, average # rooms, and median house value.]

Page 8: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Mahalanobis distance (between objects)

  d_{MH}(x, y) = \left[ ( x - y )^{T} \Sigma^{-1} ( x - y ) \right]^{1/2}

where (x - y) is the vector difference in p-dimensional space and \Sigma^{-1} is the inverse covariance matrix; the expression evaluates to a scalar distance.

Benefits:
1. It automatically accounts for the scaling of the coordinate axes
2. It corrects for correlation between the different features

Cost:
1. The covariance matrices can be hard to determine accurately
2. The memory and time requirements grow quadratically, O(p^2), rather than linearly with the number of features.
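A minimal NumPy sketch of the Mahalanobis distance above. The small data matrix is illustrative; in practice the covariance estimate can be ill-conditioned (the "hard to determine accurately" caveat), in which case the inverse may need regularizing.

```python
import numpy as np

def mahalanobis_distance(x, y, cov):
    """d_MH(x, y) = sqrt( (x - y)^T Sigma^{-1} (x - y) )."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    cov_inv = np.linalg.inv(cov)                  # Sigma^{-1}, the inverse covariance matrix
    return float(np.sqrt(diff @ cov_inv @ diff))  # scalar distance

# Estimate the covariance from a small 2-d data set (rows = data points)
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 6.2], [4.0, 7.9]])
Sigma = np.cov(X, rowvar=False)
print(mahalanobis_distance(X[0], X[3], Sigma))
```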

Page 9: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Example 1 of Mahalanobis distance

Covariance matrix is diagonal and isotropic

-> all dimensions have equal variance

-> MH distance reduces to Euclidean distance

Page 10: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Example 2 of Mahalanobis distance

Covariance matrix is diagonal but non-isotropic

-> dimensions do not have equal variance

-> MH distance reduces to weighted Euclidean distance with weights = inverse variance

Page 11: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Example 2 of Mahalanobis distance

Two outer blue points will have the same MH distance to the center blue point

Page 12: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Distances between Binary Vectors

• Matching coefficient

           j=1    j=0
  i=1      n11    n10
  i=0      n01    n00

  \frac{ n_{11} + n_{00} }{ n_{11} + n_{10} + n_{01} + n_{00} }

• Jaccard coefficient (e.g., for sparse vectors, non-symmetric)

  \frac{ n_{11} }{ n_{11} + n_{10} + n_{01} }

  (n01 = number of variables where item j = 1 and item i = 0, and similarly for the other counts)
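A minimal NumPy sketch of the matching and Jaccard coefficients for a pair of binary vectors; the example vectors are illustrative.

```python
import numpy as np

def matching_and_jaccard(x, y):
    """Matching and Jaccard coefficients for two binary vectors x (item i) and y (item j)."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    n11 = int(np.sum(x & y))
    n10 = int(np.sum(x & ~y))
    n01 = int(np.sum(~x & y))
    n00 = int(np.sum(~x & ~y))
    matching = (n11 + n00) / (n11 + n10 + n01 + n00)
    jaccard = n11 / (n11 + n10 + n01)   # undefined if both vectors are all zeros
    return matching, jaccard

x = np.array([1, 1, 0, 0, 1, 0, 0, 0])
y = np.array([1, 0, 0, 0, 1, 1, 0, 0])
print(matching_and_jaccard(x, y))  # (0.75, 0.5)
```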

Page 13: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Other distance metrics

• Categorical variables
  – Number of matches divided by number of dimensions

• Distances between strings of different lengths
  – e.g., "Patrick J. Smyth" and "Padhraic Smyth"
  – Edit distance (see the sketch after this list)

• Distances between images and waveforms
  – Shift-invariant, scale-invariant
  – i.e., d(x, y) = min_{a,b} || (ax + b) - y ||

• More generally, kernel methods
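A minimal sketch of one standard edit distance (Levenshtein distance, computed by dynamic programming), applied to the example strings above; the slide does not pin down which edit-distance variant or cost structure is intended.

```python
def edit_distance(s, t):
    """Levenshtein edit distance: minimum number of single-character
    insertions, deletions, and substitutions needed to turn s into t."""
    m, n = len(s), len(t)
    # dp[i][j] = edit distance between s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution (or match)
    return dp[m][n]

print(edit_distance("Patrick J. Smyth", "Padhraic Smyth"))
```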

Page 14: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Transforming Data

• Duality between form of the data and the model
  – Useful to bring data onto a "natural scale"
  – Some variables are very skewed, e.g., income

• Common transforms: square root, reciprocal, logarithm, raising to a power
  – Often very useful when dealing with skewed real-world data

• Logit: transforms p from (0, 1) to the real line:

  logit(p) = \log \frac{p}{1 - p}
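A small illustrative sketch of these transforms; the income figures are invented for illustration only.

```python
import numpy as np

# Skewed positive variable (illustrative income values)
income = np.array([12_000.0, 25_000.0, 40_000.0, 65_000.0, 250_000.0])
print(np.sqrt(income))       # square root
print(np.log(income))        # logarithm compresses the long right tail

# Logit: maps a proportion p in (0, 1) onto the whole real line
p = np.array([0.05, 0.5, 0.95])
print(np.log(p / (1 - p)))   # logit(p) ≈ [-2.94, 0.00, +2.94]
```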

Page 15: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Data Quality

• Individual measurements
  – Random noise in individual measurements
    • Variance (precision)
    • Bias
    • Random data entry errors
    • Noise in label assignment (e.g., class labels in medical data sets)
  – Systematic errors
    • E.g., all ages > 99 recorded as 99
    • More individuals aged 20, 30, 40, etc. than expected
  – Missing information
    • Missing at random
      – Questions on a questionnaire that people randomly forget to fill in
    • Missing systematically
      – Questions that people don't want to answer
      – Patients who are too ill for a certain test

Page 16: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Data Quality

• Collections of measurements
  – Ideal case = random sample from population of interest
  – Real case = often a biased sample of some sort
  – Key point: patterns or models built on the training data may only be valid on future data that comes from the same distribution

• Examples of non-randomly sampled data
  – Medical study where subjects are all students
  – Geographic dependencies
  – Temporal dependencies
  – Stratified samples
    • E.g., 50% healthy, 50% ill
  – Hidden systematic effects
    • E.g., market basket data the weekend of a large sale in the store
    • E.g., Web log data during finals week

Page 17: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Classifier technology and the illusion of progress

(abstract for workshop on State-of-the-Art in Supervised Classification, May 2006)

Professor David J. Hand, Imperial College, London

Supervised classification methods are widely used in data mining. Highly sophisticated methods have been developed, using the full power of recent advances in computation. However, these advances have largely taken place within the context of a classical paradigm, in which construction of the classification rule is based on a ‘design sample’ of data randomly sampled from unknown but well defined distributions of the classes. In this paper, I argue that this paradigm fails to take account of other sources of uncertainty in the classification problem, and that these other sources lead to uncertainties which often swamp those arising from the classical ones of estimation and prediction. Several examples of such sources are given, including imprecision in the definitions of the classes, sample selectivity bias, population drift, and use of inappropriate optimisation criteria when fitting the model. Furthermore, it is argued, there are both theoretical arguments and practical evidence supporting the assertion that the marginal gains of increasing classifier complexity can often be minimal. In brief, the advances in classification technology are typically much less than is often claimed.

Page 18: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Dimension Reduction Methods

(reading: 3.6 to 3.8 in the text)

Page 19: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Dimension Reduction methods

• Dimension reduction

– From p-dimensional x to d-dimensional x’ , d < p

• Techniques

– Variable selection:
  • use an algorithm to find individual variables in x that are relevant to the problem and discard the rest
  • stepwise logistic regression

– Linear projections
  • Project data to a lower-dimensional space
  • E.g., principal components

– Non-linear embedding
  • Use a non-linear mapping to "embed" data in a lower-dimensional space
  • E.g., multidimensional scaling

Page 20: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Dimension Reduction: why is it useful?

• In general, incurs loss of information about x
  – So why do this?

• If dimensionality p is very large (e.g., 1000's), representing the data in a lower-dimensional space may make learning more reliable
  – e.g., clustering example
    • 100-dimensional data
    • but cluster structure is only present in 2 of the dimensions, the others are just noise
    • if the other 98 dimensions are just noise (relative to cluster structure), then clusters will be much easier to discover if we just focus on the 2d space

• Dimension reduction can also provide interpretation/insight
  – e.g., for 2d visualization purposes

• Caveat: the 2-step approach of dimension reduction followed by a learning algorithm is in general suboptimal

Page 21: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Variable Selection Methods

• p variables, would like to use a smaller subset in our model
  – e.g., in classification, do kNN in d-space rather than p-space
  – e.g., for logistic regression, use d inputs rather than p

• Problem:
  – Number of subsets of p variables is O(2^p)
  – Exhaustive search is impossible except for very small p
  – Typically the search problem is NP-hard

• Common solution (a greedy sketch follows below):
  – Local systematic search (e.g., add/delete variables 1 at a time) to locally maximize a score function (i.e., hill-climbing)
  – e.g., add a variable, build new model, generate new score, etc.
  – Can often work well, but can get trapped in local maxima/minima
  – Can also be computationally intensive (depends on model)

• Note: some techniques such as decision tree predictors automatically perform dimension reduction as part of the learning algorithm.
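A minimal sketch of the add-one-variable-at-a-time local search described above (greedy forward selection). The score function here, cross-validated logistic-regression accuracy from scikit-learn, is just one illustrative choice, and the stopping rule is a simple "no improvement" check.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, max_vars=5):
    """Greedy forward selection: repeatedly add the single variable that
    most improves the cross-validated score; stop at a local maximum."""
    n, p = X.shape
    selected, best_score = [], -np.inf
    while len(selected) < max_vars:
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        scores = []
        for j in candidates:
            cols = selected + [j]
            model = LogisticRegression(max_iter=1000)
            scores.append(cross_val_score(model, X[:, cols], y, cv=5).mean())
        if max(scores) <= best_score:    # no improvement: trapped at a local maximum
            break
        best_score = max(scores)
        selected.append(candidates[int(np.argmax(scores))])
    return selected, best_score

# Illustrative use on synthetic data: only the first 3 of 10 columns carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)
print(forward_selection(X, y, max_vars=3))
```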

Page 22: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Linear Projections

x = p-dimensional vector of data measurements

Let a = weight vector, also dimension p

Assume aT a = 1 (i.e., unit norm)

a^T x = \sum_{j} a_j x_j

= projection of x onto vector a, gives distance of projected x along a

e.g., a^T = [1 0]        -> projection along 1st dimension
      a^T = [0 1]        -> projection along 2nd dimension
      a^T = [0.71, 0.71] -> projection along diagonal
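A minimal NumPy sketch of the three projections listed above; the data point is illustrative, and each weight vector is renormalized so that a^T a = 1.

```python
import numpy as np

x = np.array([3.0, 4.0])                   # a 2-d data point

for a in (np.array([1.0, 0.0]),            # 1st dimension
          np.array([0.0, 1.0]),            # 2nd dimension
          np.array([0.71, 0.71])):         # diagonal (approximately unit norm)
    a = a / np.linalg.norm(a)              # enforce a^T a = 1 exactly
    print(a, "->", float(a @ x))           # a^T x: distance of the projected x along a
```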

Page 23: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Example of projection from 2d to 1d

[Figure: 2d scatterplot with axes x1 and x2; an arrow shows the direction of the weight vector a onto which the data are projected.]

Page 24: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Projections to more than 1 dimension

• Multidimensional projections:
  – e.g., x is 4-dimensional

– a1T = [ 0.71 0.71 0 0 ]

– a2T = [ 0 0 0.71 0.71 ]

AT x -> coordinates of x in 2d space spanned by columns of A

-> linear transformation from 4d to 2d space

where A = [ a1 a2 ]

Page 25: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Principal Components Analysis (PCA)

X = p x n data matrix: columns = p-dim data vectors

Let a = weight vector, also dimension p

Assume aT a = 1 (i.e., unit norm)

aT X = projection of each column x onto vector a, = vector of distances of projected x vectors along a

PCA: find vector a such that var(a^T X) is maximized, i.e., find the linear projection with maximal variance

More generally:

A^T X = d x n data matrix with x vectors projected to d-dimensional space, where size(A) = p x d

PCA: find d orthogonal columns of A such that the variance in the d-dimensional projected space is maximized, d < p
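A minimal NumPy sketch of one standard way to compute principal components (eigendecomposition of the sample covariance matrix, equivalently the SVD of the centered data). This is not necessarily the exact derivation in the class notes or the text, and the data matrix here follows the n x p (rows = data points) convention, i.e., the transpose of the p x n matrix above.

```python
import numpy as np

def pca(X, d):
    """Top-d principal component directions (p x d matrix A with orthonormal
    columns) and the projected coordinates (n x d), for an n x p data matrix."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = (Xc.T @ Xc) / (len(Xc) - 1)        # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]    # keep the d largest-variance directions
    A = eigvecs[:, order]
    return A, Xc @ A

# Illustrative 2-d data with most of its variance along one direction
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [2.0, 0.5]])
A, Z = pca(X, d=1)
print(A.T, float(Z.var()))   # 1st PC direction and the (maximal) projected variance
```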

Page 26: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


PCA Example

[Figure: 2d scatterplot with axes x1 and x2; an arrow shows the direction of the 1st principal component vector (highest-variance projection).]

Page 27: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


PCA Example

[Figure: the same 2d scatterplot with axes x1 and x2; arrows show the direction of the 1st principal component vector (highest-variance projection) and the direction of the 2nd principal component vector.]

Page 28: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


How do we compute the principal components?

• See class notes

• See also page 78 in the text

Page 29: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Notes on PCA

• Basis representation and approximation error

• Scree diagrams

• Computational complexity of computing PCA
  – Equivalent to solving a set of linear equations, matrix inversion, singular value decomposition, etc.
  • Scales in general as O(np^2 + p^3)
  • Many numerical tricks possible, e.g., for sparse X matrices, for finding only the first k eigenvectors, etc.
  • In MATLAB can use eig.m or svd.m (also note sparse versions)

Page 30: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


More notes on PCA

• Links between PCA and multivariate Gaussian density

• Caveat: PCA can destroy information about differences between groups for clustering or classification

• PCA for other data types– Images, e.g., eigenfaces– Text, e.g., “latent semantic indexing” (LSI)

Page 31: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Basis images (eigenimages) of faces

Page 32: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


20 face images

Page 33: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


First 16 Eigenimages

Page 34: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


First 4 eigenimages

Page 35: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Reconstruction of First Image with 8 eigenimages

Original Image

Reconstructed Image

Page 36: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Reconstruction of first image with 8 eigenimages

Weights = -14.0 9.4 -1.1 -3.5 -9.8 -3.5 -0.6 0.6

Reconstructed image = weighted sum of 8 images on left

Page 37: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Reconstruction of 7th image with eigenimages

Original Image

Reconstructed Image

Page 38: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Reconstruction of 7th image with 8 eigenimages

Weights = -13.7 12.9 1.6 4.4 3.0 0.9 1.6 -6.3

Weights for Image 1 = -14.0 9.4 -1.1 -3.5 -9.8 -3.5 -0.6 0.6

Reconstructed image = weighted sum of 8 images on left

Page 39: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Reconstructing Image 6 with 16 eigenimages

Page 40: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Multidimensional Scaling (MDS)

• Say we have data in the form of an N x N matrix of dissimilarities
  – 0's on the diagonal
  – Symmetric
  – Could either be given data in this form, or create such a dissimilarity matrix from our data vectors

• Examples
  – Perceptual dissimilarity of N objects in cognitive science experiments
  – String-edit distance between N protein sequences

• MDS:
  – Find k-dimensional coordinates for each of the N objects such that Euclidean distances in "embedded" space match the set of dissimilarities as closely as possible

Page 41: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Multidimensional Scaling (MDS)

• MDS score function (“stress”)

• N points embedded in k dimensions -> Nk locations or parameters
  – To find the Nk locations?
    • Solve an optimization problem -> minimize the score function S

• Often used for visualization, e.g., k = 2, 3

  S = \frac{ \sum_{i,j} \left( d(i,j) - \delta(i,j) \right)^2 }{ \sum_{i,j} d(i,j)^2 }

  where \delta(i,j) are the original dissimilarities and d(i,j) are the Euclidean distances in the "embedded" k-dimensional space.
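A minimal NumPy sketch that evaluates the stress score above for a candidate embedding Y (one k-dimensional location per object); a full MDS routine would wrap this in the iterative optimization described on the next slide. The denominator follows the reconstruction of the formula given here, and the dissimilarity matrix is randomly generated purely for illustration.

```python
import numpy as np

def mds_stress(delta, Y):
    """Stress S for an N x N dissimilarity matrix delta and an N x k embedding Y."""
    diff = Y[:, None, :] - Y[None, :, :]          # pairwise differences in embedded space
    d = np.sqrt((diff ** 2).sum(axis=-1))         # Euclidean distances d(i, j)
    return float(((d - delta) ** 2).sum() / (d ** 2).sum())

# Illustrative check: a random 2-d embedding of 5 objects
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))
delta = np.abs(rng.normal(size=(5, 5)))
delta = (delta + delta.T) / 2                     # symmetric
np.fill_diagonal(delta, 0.0)                      # zeros on the diagonal
print(mds_stress(delta, Y))
```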

Page 42: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


MDS Optimization

• Optimization problem:
  – S is a function of Nk parameters
    • find the set of N k-dimensional positions that minimize S
    • Note: 3 parameters are redundant: location (2) and rotation (1)

• If original dissimilarities are Euclidean
  -> linear algebra solution (equivalent to principal components)

• Non-Euclidean (more typical)
  – Local iterative hill-climbing, e.g., move each point to decrease S, repeat
  – Non-trivial optimization, can have local minima, etc.
  – Initialization: either random or heuristic (e.g., by PCA)
  – Complexity is O(N^2 k) per iteration (iteration = move all points locally)
  – See Faloutsos and Lin (1995) for FastMap: O(Nk) approximation for large N

  S = \frac{ \sum_{i,j} \left( d(i,j) - \delta(i,j) \right)^2 }{ \sum_{i,j} d(i,j)^2 }

Page 43: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


MDS example: input distance data

           Chicago  Raleigh  Boston  Seattle   S.F.  Austin  Orlando
Chicago        0
Raleigh      641        0
Boston       851      608       0
Seattle     1733     2363    2488        0
S.F.        1855     2406    2696      684      0
Austin       972     1167    1691     1764   1495       0
Orlando      994      520    1105     2565   2458    1015        0
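A sketch of recovering a 2d map from the inter-city distance matrix above, using scikit-learn's metric MDS on the precomputed dissimilarities (a different optimizer than the simple hill-climbing described earlier, but the same objective of matching the given distances).

```python
import numpy as np
from sklearn.manifold import MDS

cities = ["Chicago", "Raleigh", "Boston", "Seattle", "S.F.", "Austin", "Orlando"]
D = np.array([
    [   0,  641,  851, 1733, 1855,  972,  994],
    [ 641,    0,  608, 2363, 2406, 1167,  520],
    [ 851,  608,    0, 2488, 2696, 1691, 1105],
    [1733, 2363, 2488,    0,  684, 1764, 2565],
    [1855, 2406, 2696,  684,    0, 1495, 2458],
    [ 972, 1167, 1691, 1764, 1495,    0, 1015],
    [ 994,  520, 1105, 2565, 2458, 1015,    0],
], dtype=float)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)                 # one 2-d location per city
for name, (x, y) in zip(cities, coords):
    print(f"{name:8s} {x:8.1f} {y:8.1f}")
```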

Page 44: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Page 45: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Result of MDS

Page 46: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


MDS for protein sequences

226 protein sequences of the Globin family (from Klock & Buhmann 1997).

[Figure: left panel, the similarity matrix (note cluster structure); right panel, the 2d MDS embedding.]

Page 47: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


MDS from human judgements of similarity

Page 48: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


MDS: Example data

Page 49: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


MDS: 2d embedding of face images

Page 50: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Page 51: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Other embedding techniques

• Many other algorithms for non-linear embedding

• Some of the better-known examples…
  – Self-organizing maps (Kohonen)
    • Neural-network-inspired algorithm for 2d embedding
  – ISOMAP, Local linear embedding
    • Find low-d coordinates that preserve local distances
    • Ignore global distances

Page 52: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Example of Local Linear Embedding (LLE) (Roweis and Saul, Science, 2000)

Note how points that are far away on the 3d manifold (e.g., red and blue) in "manifold distance" would be mapped as being close together by MDS or PCA, but are kept "far apart" by LLE. LLE emphasizes local relationships.

Page 53: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


LLE Algorithm

• N points in dimension p: wish to reduce to dimension d, d < p

• Step 1:
  – Select K nearest neighbors for each point in the training data
  – Represent each point X as a weighted linear combination of its K neighbor points
  – Find the best K weights for each of the X vectors (least-squares fitting)

• Step 2:
  – Fix the weights from Step 1
  – For each p-dim vector X, find a d-dimensional Y vector that is closest to its reconstructed approximation based on its d-dim neighbors and weights
  – Reduces to another linear algebra/eigenvalue problem, O(N^3) complexity
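Rather than re-implementing the two steps, a short sketch using scikit-learn's LocallyLinearEmbedding on a synthetic swiss-roll manifold, similar in spirit to the Roweis and Saul example; the parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# A 3-d "swiss roll" manifold to be unrolled into 2 dimensions
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# K nearest neighbors per point (Step 1); d = 2 output dimensions (Step 2)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)

print(Y.shape)                      # (1000, 2)
print(lle.reconstruction_error_)    # residual of the Step-2 eigenproblem
```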

Page 54: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Page 55: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


LLE applied to text data

Page 56: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


LLE applied to a set of face images

Page 57: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Local Linear Embedding example

Page 58: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


ISOMAP (Tenenbaum, de Silva, Langford, Science, 2000)

Similar to LLE in concept: preserves local distances

Computational strategy is different: measures distance in original space via geodesic paths (distance “on manifold”)

Algorithm involves finding shortest paths between points and then embedding
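A companion sketch using scikit-learn's Isomap on the same synthetic swiss roll; geodesic ("on manifold") distances are approximated by shortest paths on a K-nearest-neighbor graph before the embedding step, as described above. Parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Shortest paths on the 12-nearest-neighbor graph approximate geodesic distances,
# which are then embedded in 2 dimensions.
iso = Isomap(n_neighbors=12, n_components=2)
Y = iso.fit_transform(X)
print(Y.shape)   # (1000, 2)
```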

Page 59: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Examples of ISOMAP embeddings in 2d

Page 60: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


ISOMAP: morphing examples

Page 61: ICS 278: Data Mining Lecture 5: Low-Dimensional Representations of High-Dimensional Data


Summary on Dimension Reduction

• Can be used for defining a new set of (lower-dimensional) variables for modeling or for visualization/insight

• 3 general approaches
  – Variable selection (only select "relevant" variables)
  – Linear projections (e.g., PCA)
  – Non-linear projections (e.g., MDS, LLE, ISOMAP)
  – Can be used with text, images, etc. by representing such data as very high-dimensional vectors
  – MATLAB implementations of all of these techniques are available on the Web

• These techniques can be useful, but like any high-powered tool they are not a solution to everything
  – real-world data sets often do not produce the types of elegant 2d embeddings that one often sees in research papers on these topics.