
Page 1

April 8, 2020

CS53000-Spring 2020

Introduction to Scientific Visualization

Dimensionality Reduction for Visualization

Lecture 13

Page 2

Outline
• High-dimensional data
• Dimensionality reduction
• Manifold learning
• Topological data analysis

2

Page 3

Why High-Dimensional?
• Data samples with many attributes
  • high-throughput imaging, hyperspectral imaging, medical records, …
• PDE modeling of physical systems
  • example: Navier-Stokes equations

3

$$\rho \frac{Du}{Dt} = \rho \left( \frac{\partial u}{\partial t} + u \cdot \nabla u \right) = -\nabla p + \mu \nabla^2 u + \frac{1}{3}\mu \nabla (\nabla \cdot u) + \rho g$$

Page 4

Why High-Dimensional?
• Amalgamate individual properties into high-dimensional feature vectors
  • e.g., turn images into vectors

4

Page 5

Challenges
• “Curse of dimensionality”
  • Sampling becomes exponentially costly
  • Volume accumulates in the “corners” of the hypercube
  • Data processing becomes extremely expensive / intractable
• How to visualize?
  • Missing intuition for high-dimensional spatial domains

5

Page 6

6

$$\frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)\, 2^{n}}$$

Ratio of hypersphere volume / hypercube volume
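This ratio collapses quickly as the dimension grows, which is one face of the curse of dimensionality. A minimal Python check of the formula above (illustrative, not from the slides):

```python
from math import gamma, pi

def sphere_to_cube_ratio(n: int) -> float:
    """Volume of the unit n-ball divided by the volume of its bounding hypercube [-1, 1]^n."""
    return pi ** (n / 2) / (gamma(n / 2 + 1) * 2 ** n)

for n in (2, 3, 5, 10, 20):
    print(n, sphere_to_cube_ratio(n))
# 2 -> ~0.785, 3 -> ~0.524, 5 -> ~0.164, 10 -> ~0.0025, 20 -> ~2.5e-8
```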

Page 7

DIMENSION REDUCTION

7

Page 8

Principal Component Analysis
• Assume a point cloud
• Interpret these points as observations of a random variable
• The empirical mean (centroid) is
• The covariance matrix is given by

8

$$(x_i)_{i=1,\dots,n} \subset \mathbb{R}^k, \qquad X \in \mathbb{R}^k$$

$$c = \frac{1}{n} \sum_i x_i$$

$$A_{jl} = \frac{1}{n} \sum_i (x_{ij} - c_j)(x_{il} - c_l), \qquad A = X X^T$$

Page 9

PCA
• The eigenvectors of the covariance matrix form a data-dependent coordinate system

9

• The first m eigenvectors (in decreasing order of associated eigenvalues) span the m principal dimensions of the point cloud.
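A minimal NumPy sketch of this construction (illustrative, not from the slides): center the point cloud, form the covariance matrix, and project onto the first m eigenvectors.

```python
import numpy as np

def pca_project(points: np.ndarray, m: int) -> np.ndarray:
    """Project an (n, k) point cloud onto its first m principal directions."""
    c = points.mean(axis=0)                    # empirical mean (centroid)
    X = points - c                             # centered data, one row per sample
    A = X.T @ X / len(points)                  # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(A)       # eigenvalues in ascending order (A symmetric)
    order = np.argsort(eigvals)[::-1]          # sort eigenpairs in decreasing order
    principal = eigvecs[:, order[:m]]          # first m eigenvectors span the principal dimensions
    return X @ principal                       # m-dimensional coordinates of the points

# Example: a noisy 3-D point cloud that is essentially 2-D
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(500, 3))
print(pca_project(pts, 2).shape)  # (500, 2)
```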

Page 10

MDS (Multidimensional Scaling)
• Input: dissimilarity matrix $\Delta \in \mathbb{R}^{n \times n}$ with $\delta_{ij} = \delta_{ji}$, $\delta_{ii} = 0$ and $\delta_{ij} > 0$
• Goal: find $x_1, x_2, \dots, x_n \in \mathbb{R}^d$ such that
$$\forall (i,j) \quad \|x_i - x_j\| \approx \delta_{ij}$$

10

Page 11

MDS
Method:
• Center(*): $B = -\frac{1}{2} H \Delta H$ where $H = I - \frac{1}{n} \mathbf{1}\mathbf{1}^T$
• Spectral decomposition: $B = U \Lambda U^T$
• Clamp(*): $(\Lambda_+)_{ij} = \max(\Lambda_{ij}, 0)$
• Solution: first d columns of $X = U \Lambda_+^{1/2}$

This is a PCA!
(*): if $\Delta$ measures Euclidean distances, $B$ is positive semidefinite ($\lambda_i \geq 0$)

11
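A compact sketch of classical MDS as described above (illustrative, not from the slides); it expects a matrix of squared dissimilarities, which is how Isomap feeds it later on.

```python
import numpy as np

def classical_mds(delta_sq: np.ndarray, d: int) -> np.ndarray:
    """Classical MDS: embed an (n, n) matrix of squared dissimilarities into R^d."""
    n = delta_sq.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix H = I - (1/n) 1 1^T
    B = -0.5 * H @ delta_sq @ H                   # double centering
    eigvals, eigvecs = np.linalg.eigh(B)          # spectral decomposition (ascending order)
    order = np.argsort(eigvals)[::-1][:d]
    lam = np.clip(eigvals[order], 0.0, None)      # clamp negative eigenvalues to zero
    return eigvecs[:, order] * np.sqrt(lam)       # first d columns of U Lambda_+^(1/2)
```

Fed with the squared pairwise Euclidean distances of a point cloud, this recovers the PCA coordinates up to rotation and sign, which is the sense in which the slide says "This is a PCA!".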

Page 12

MANIFOLD LEARNING

12

Page 13

Manifold Learning
• Move away from the linear assumption
• Assume curved low-dimensional geometry
• Assume smoothness ➡ MANIFOLD
• Manifold learning

13

terminology from geometry and topology in order to crystallize this notion of dimensionality.

Definition 1. A homeomorphism is a continuous function whose inverse is also a continuous function.

Definition 2. A d-dimensional manifold M is a set that is locally homeomorphic with R^d. That is, for each x ∈ M, there is an open neighborhood around x, N_x, and a homeomorphism f : N_x → R^d. These neighborhoods are referred to as coordinate patches, and the map is referred to as a coordinate chart. The image of the coordinate charts is referred to as the parameter space.

Manifolds are terrifically well-studied in mathematics and the above definition is extremely general. We will be interested only in the case where M is a subset of R^D, where D is typically much larger than d. In other words, the manifold will lie in a high-dimensional space (R^D), but will be homeomorphic with a low-dimensional space (R^d, with d < D).

Additionally, all of the algorithms we will look at require some smoothness requirements that further constrain the class of manifolds considered.

Definition 3. A smooth (or differentiable) manifold is a manifold such that each coordinate chart is differentiable with a differentiable inverse (i.e., each coordinate chart is a diffeomorphism).

We will be using the term embedding in a number of places, though our usage will be somewhat loose and inconsistent. An embedding of a manifold M into R^D is a smooth homeomorphism from M to a subset of R^D. The algorithms discussed in this paper find embeddings of discrete point-sets, by which we simply mean a mapping of the point set into another space (typically lower-dimensional Euclidean space). An embedding of a dissimilarity matrix into R^d, as discussed in section 3.1.2, is a configuration of points whose interpoint distances match those given in the matrix.

With this background in mind, we proceed to the main topic of this paper.

2.3 Manifold Learning

We have discussed the importance of dimensionality reduction and provided some intuition concerning the inadequacy of PCA for data sets lying along some non-flat surface; we now turn to the manifold learning approach to dimensionality reduction.

We are given data x_1, x_2, . . . , x_n ∈ R^D and we wish to reduce the dimensionality of this data. PCA is appropriate if the data lies on a low-dimensional subspace of R^D. Instead, we assume only that the data lies on a d-dimensional manifold embedded into R^D, where d < D. Moreover, we assume that the manifold is given by a single coordinate chart.(1) We can now describe the problem formally.

Problem: Given points x_1, . . . , x_n ∈ R^D that lie on a d-dimensional manifold M that can be described by a single coordinate chart f : M → R^d, find y_1, . . . , y_n ∈ R^d, where y_i := f(x_i).

Solving this problem is referred to as manifold learning, since we are trying to "learn" a manifold from a set of points. A simple and oft-used example in the manifold learning literature is the swiss roll, a two-dimensional manifold embedded in R^3. Figure 3 shows the swiss roll and a "learned" two-dimensional embedding of the manifold found using Isomap, an algorithm discussed later. Notice that points nearby in the original data set neighbor one another in the two-dimensional data set; this effect occurs because the chart f between these two graphs is a homeomorphism (as are all charts).

Though the extent to which "real-world" data sets exhibit low-dimensional manifold structure is still being explored, a number of positive examples have been found. Perhaps the most notable occurrence is in video and image data sets. For example, suppose we have a collection of frames taken from a video of a person rotating his or her head. The dimensionality of the data set is equal to the number of pixels in a frame, which is generally very large. However, the images are actually governed by only a couple of degrees of freedom (such as the angle of rotation). Manifold learning techniques have been successfully applied to a number of similar video and image data sets [Ple03].

3 Algorithms

In this section, we survey a number of algorithms proposed for manifold learning. Our treatment goes roughly in chronological order: Isomap and LLE were the first algorithms developed, followed by Laplacian Eigenmaps.

(1) All smooth, compact manifolds can be described by a single coordinate chart, so this assumption is not overly restrictive.
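The swiss roll mentioned above is easy to sample; a minimal generator sketch (the constants are illustrative, following the usual convention of a spiral in the x-z plane swept along y):

```python
import numpy as np

def swiss_roll(n: int = 1000, noise: float = 0.0, seed: int = 0):
    """Sample a swiss roll: a 2-D manifold, parameterized by (t, h), embedded in R^3."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))   # angle along the spiral
    h = 21.0 * rng.random(n)                        # position along the roll axis
    X = np.column_stack((t * np.cos(t), h, t * np.sin(t)))
    X += noise * rng.normal(size=X.shape)
    return X, np.column_stack((t, h))               # points in R^3 and their 2-D chart coordinates
```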

Page 14

Isomap
• Form k-NN graph of input points
• Form dissimilarity matrix as square of approximated geodesic distance between points
• Compute MDS!

14

Tenenbaum et al., Science, 2000

Nearest neighbors

Page 15

Isomap
• Computing geodesic distance
  • standard graph problem in CS

15

Dijkstra's Algorithm: O(|E| + |V| log |V|)
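Putting the pieces together, a minimal Isomap sketch (illustrative, not from the slides): build the k-NN graph, run Dijkstra from every source to approximate geodesic distances, square them, and hand the result to the classical_mds helper sketched earlier. It assumes the neighborhood graph is connected.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def isomap(points: np.ndarray, k: int, d: int) -> np.ndarray:
    """Isomap sketch: k-NN graph -> graph shortest paths -> classical MDS."""
    n = len(points)
    dist = cdist(points, points)                        # pairwise Euclidean distances
    graph = np.full((n, n), np.inf)                     # inf marks "no edge"
    idx = np.argsort(dist, axis=1)[:, 1:k + 1]          # k nearest neighbors of each point
    rows = np.repeat(np.arange(n), k)
    graph[rows, idx.ravel()] = dist[rows, idx.ravel()]  # weighted k-NN graph
    geodesic = shortest_path(graph, method="D", directed=False)  # Dijkstra from every source
    return classical_mds(geodesic ** 2, d)              # squared geodesics into classical MDS (defined above)
```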

Page 16

Isomap
• Successful in computer vision problems

16

[Embedded excerpt from Tenenbaum, de Silva, Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction", Science 290, 2000: the three Isomap steps (build the neighborhood graph using a fixed radius ε or the K nearest neighbors, estimate geodesic distances by shortest paths in that graph, apply classical MDS to the matrix of graph distances, minimizing E = ||τ(D_G) − τ(D_Y)||_{L²}), and Fig. 1, showing a three-dimensional Isomap embedding of N = 698 face images (64 × 64 pixels, varying pose and lighting) and a two-dimensional embedding of N = 1000 handwritten "2"s from MNIST.]

Varying pose / Handwriting

[Embedded excerpt from the same Isomap paper: discussion (Isomap applies wherever nonlinear geometry complicates PCA or MDS and can be combined with linear extensions such as independent component analysis), the References and Notes, Table 1 summarizing the algorithm (1. construct the neighborhood graph, 2. compute all-pairs shortest paths, e.g. with Floyd's algorithm, 3. build the d-dimensional embedding from the top eigenvectors of τ(D_G)), and Fig. 4, showing that interpolations along straight lines in Isomap coordinate space act as approximately geodesic "morphs" of face, hand, and handwritten-digit images.]


Page 17

Locally Linear Embedding

Intuition
• a smooth manifold is locally close to linear

Method
• Characterize local linearity through a weight matrix $W$ such that $\|x_i - \sum_{j \in N(i)} W_{ij} x_j\|^2$ is minimized
• Find d-dimensional $y_i$'s that minimize $\sum_i \|y_i - \sum_j W_{ij} y_j\|^2$
• Solution is obtained from the d eigenvectors of $(I - W)^T (I - W)$ with the smallest non-zero eigenvalues

17

Roweis et al., Science, 2000
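A minimal LLE sketch along these lines (illustrative, not from the slides): solve a small regularized least-squares problem per point for the reconstruction weights, then take the bottom eigenvectors of $(I - W)^T (I - W)$, discarding the constant one.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle(points: np.ndarray, k: int, d: int, reg: float = 1e-3) -> np.ndarray:
    """Locally Linear Embedding sketch: local reconstruction weights, then bottom eigenvectors."""
    n = len(points)
    neighbors = np.argsort(cdist(points, points), axis=1)[:, 1:k + 1]
    W = np.zeros((n, n))
    for i in range(n):
        Z = points[neighbors[i]] - points[i]          # neighbors shifted so x_i is the origin
        G = Z @ Z.T                                   # local (k x k) Gram matrix
        G += reg * np.trace(G) * np.eye(k)            # regularization for stability when k > D
        w = np.linalg.solve(G, np.ones(k))            # minimize the reconstruction error ...
        W[i, neighbors[i]] = w / w.sum()              # ... subject to the sum-to-one constraint
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]                        # skip the constant eigenvector (eigenvalue ~ 0)
```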

Page 18

[Embedded excerpt from Roweis, Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding", Science 290, 2000: the constraints on the reconstruction weights (each point reconstructed only from its neighbors, rows summing to one), their invariance to rotations, rescalings, and translations, the embedding cost function of Eq. 2 minimized by a sparse N × N eigenvalue problem whose bottom d non-zero eigenvectors give the coordinates, Fig. 2 listing the three LLE steps (assign neighbors, compute weights, compute embedding vectors), and Fig. 3, face images mapped into the first two LLE coordinates.]

Page 19

Page 20

LLE

18

[Embedded first page of Roweis, Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding", Science 290, 2000: abstract (LLE is an unsupervised algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and maps them into a single global coordinate system, with optimizations that do not involve local minima), introduction, the reconstruction cost function of Eq. 1, and Fig. 1, the swiss-roll illustration of nonlinear dimensionality reduction, where PCA and classical MDS map faraway points on the manifold to nearby points in the plane.]

Page 21

Laplacian Eigenmaps
• The graph Laplacian of $W$ is $L = D - W$, with $D$ diagonal and $D_{ii} = \sum_j W_{ij}$
• Define $W$ as either a binary indicator of connectivity or through a heat kernel
• Compute the eigenvectors of the Laplacian associated with non-zero eigenvalues:
$$L f = \lambda D f$$
• Embed $x_i \mapsto (f_1(i), f_2(i), \dots, f_m(i))$

19

Belkin, Niyogi, Advances in Neural Information Processing Systems, 2001
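A minimal Laplacian Eigenmaps sketch (illustrative, not from the slides): heat-kernel weights on a symmetrized k-NN graph, then the generalized eigenproblem $Lf = \lambda D f$, keeping the eigenvectors with the smallest non-zero eigenvalues.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmaps(points: np.ndarray, k: int, m: int, t: float = 1.0) -> np.ndarray:
    """Laplacian Eigenmaps sketch: heat-kernel k-NN graph, then L f = lambda D f."""
    n = len(points)
    dist = cdist(points, points)
    W = np.zeros((n, n))
    idx = np.argsort(dist, axis=1)[:, 1:k + 1]
    for i in range(n):
        W[i, idx[i]] = np.exp(-dist[i, idx[i]] ** 2 / t)   # heat kernel weights
    W = np.maximum(W, W.T)                                 # symmetrize the graph
    D = np.diag(W.sum(axis=1))                             # degree matrix, D_ii = sum_j W_ij
    L = D - W                                              # graph Laplacian
    eigvals, eigvecs = eigh(L, D)                          # generalized problem, ascending eigenvalues
    return eigvecs[:, 1:m + 1]                             # drop the constant eigenvector (eigenvalue 0)
```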

Page 22

Laplacian Eigenmaps

20

[Figures from Belkin & Niyogi: the swiss roll and its two-dimensional Laplacian Eigenmaps representations for neighborhood sizes N = 5, 10, 15 and heat-kernel parameters t = 5.0, 25.0, ∞; two-dimensional embeddings of words from a text corpus, with modal/auxiliary verbs, prepositions, and common verbs forming separate clusters; and a two-dimensional embedding of speech fragments, with phoneme classes such as "sh", "aa", and "ao" grouped together.]

Page 23

Computation
• Spectral decomposition intractable for large problems
• Approximate methods can be used
  • Nyström method + column sampling

21
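As a rough illustration of the idea (not from the slides, and conventions for the scaling factors vary), the Nyström method approximates the leading eigenvectors of a large positive semidefinite affinity matrix from a subsample of its columns:

```python
import numpy as np

def nystrom_eigs(C: np.ndarray, landmark_idx: np.ndarray, top: int):
    """Nystrom sketch: C holds m sampled columns (shape (n, m)) of an n x n PSD affinity matrix,
    landmark_idx the indices of the sampled (landmark) points."""
    n, m = C.shape
    W = C[landmark_idx, :]                       # (m, m) affinity block between the landmarks
    eigvals, eigvecs = np.linalg.eigh(W)         # ascending order
    order = np.argsort(eigvals)[::-1][:top]      # keep the top eigenpairs (assumed > 0)
    lam, U = eigvals[order], eigvecs[:, order]
    U_ext = C @ U / lam                          # Nystrom extension of the eigenvectors to all n points
    return lam * (n / m), U_ext                  # rescaled eigenvalue estimates, extended eigenvectors
```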

Page 24

Topological Data Analysis
• Use algebraic topology to analyze high-dimensional data
  • Persistent homology
  • Contour tree
  • Morse-Smale complex

22


discrete data, combinatorial processing

Page 25

TDA
• Persistence diagram
• Filtered simplicial complex

23
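To give a flavor of what a filtered complex and a persistence diagram are computationally, here is a minimal sketch (not from the slides) of 0-dimensional persistent homology for a Vietoris-Rips filtration: every point is born at scale 0, and a connected component dies at the edge length that merges it into another. Higher-dimensional features require a full boundary-matrix reduction, as provided by libraries such as GUDHI or Ripser.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def h0_persistence(points: np.ndarray):
    """Birth/death pairs of connected components (H0) for a Vietoris-Rips filtration."""
    n = len(points)
    dist = squareform(pdist(points))
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))                     # union-find over the points

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path compression
            a = parent[a]
        return a

    pairs = []
    for r, i, j in edges:                       # sweep edges by increasing scale r
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                     # two components merge: one dies at scale r
            pairs.append((0.0, r))
    pairs.append((0.0, float("inf")))           # the surviving component never dies
    return pairs                                # plot as points (birth, death) in a persistence diagram
```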

Page 26

Persistent Homology

24

Page 27

Persistent Homology

24

Page 28

Persistent Homology

25

Page 29

Persistent Homology

25

Page 30

Persistent Homology

26