Unsupervised Learning
Dimensionality Reduction
Principal Component Analysis (PCA)
Applications
Machine Learning
Supervised learning: generate a function from training data ((input, output) pairs).
- regression: predict a continuous value as a function of the input
- classification: predict the class label of the input object
Unsupervised learning: fit a model to observations without a priori output.
- clustering: find natural groupings / patterns in the data
Unsupervised Learning
Input: unlabeled data samples {x(t)}t=1..m
Why study unlabeled data?
- Collecting labeled data can be costly
- Cluster first, label later
- Changing pattern characteristics
- Identify features that will be useful for categorization
- Exploratory data analysis
Dimensionality Reduction
- Reducing the number of random variables under consideration.
- A technique for simplifying a high-dimensional data set by reducing its dimension for analysis.
- Projection of high-dimensional data to a low-dimensional space that preserves the “important” characteristics of the data.
Principal Component Analysis (PCA)
An orthogonal linear transformation that maps the data into a new coordinate system such that:
- the greatest variance lies along the first axis,
- the second greatest variance along the second axis,
- etc.

Also known as:
- Karhunen-Loève Transform (KLT)
- Hotelling Transform
PCA (1)
Look for the unit vector w that maximizes the variance of the projected data items wᵀx:

$$ w^{*} = \arg\max_{\|w\|=1} \operatorname{Var}(w^{T}x) = \arg\max_{\|w\|=1} w^{T}\Sigma_{x}w $$

Solution: w is the dominant eigenvector of the covariance matrix Σx.
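To make this concrete, here is a minimal NumPy sketch (not from the slides; the toy data and all variable names are invented for illustration) that recovers w for a 2-D data set:

```python
import numpy as np

rng = np.random.default_rng(0)
# m = 500 samples of correlated 2-D data (illustrative).
x = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 1]], size=500)

x_centered = x - x.mean(axis=0)             # subtract the mean mu_x
sigma_x = np.cov(x_centered, rowvar=False)  # covariance matrix Sigma_x

# eigh handles symmetric matrices; eigenvalues come back in ascending order,
# so the dominant eigenvector is the last column.
eigvals, eigvecs = np.linalg.eigh(sigma_x)
w = eigvecs[:, -1]
print("dominant eigenvector w:", w)
print("variance of projection w^T x:", np.var(x_centered @ w))
```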
PCA vs. LDA

In brief: PCA picks directions of maximal variance and uses no class labels, while LDA (Linear Discriminant Analysis) uses the labels to pick directions that best separate the classes.
PCA (2)
What about projecting onto two vectors? Maximize:

$$ \operatorname{Var}(w_{1}^{T}x) + \operatorname{Var}(w_{2}^{T}x) = w_{1}^{T}\Sigma_{x}w_{1} + w_{2}^{T}\Sigma_{x}w_{2}, \qquad \|w_{1}\| = \|w_{2}\| = 1,\; w_{1}\perp w_{2} $$

w1 and w2 are the two dominant eigenvectors of Σx. This idea generalizes to any dimension k.
PCA (3)

Let X denote the n × m data matrix (each column is a centered vector x(t) − μx ∈ Rⁿ).

Definition: the principal directions (axes) of {x(t)}t=1..m are the eigenvectors of XXᵀ (which equals the covariance matrix up to a 1/m factor).

Let W = [w1, …, wk] hold the k leading principal axes; then the projection WᵀX maximizes the retained variance:

$$ \sum_{i=1}^{k} \operatorname{Var}(w_{i}^{T}x) = \operatorname{tr}\!\left(W^{T}\Sigma_{x}W\right) $$
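A sketch of this recipe in NumPy, assuming made-up shapes and data: form XXᵀ, take the k leading eigenvectors as the columns of W, and project with WᵀX.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 5, 200, 2                    # n features, m samples, keep k axes
X = rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)  # center each row: x(t) - mu_x

eigvals, eigvecs = np.linalg.eigh(X @ X.T)
W = eigvecs[:, ::-1][:, :k]            # k leading principal axes w_1..w_k
Y = W.T @ X                            # k x m reduced representation
print("fraction of variance captured:", eigvals[::-1][:k].sum() / eigvals.sum())
```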
PCA (4)

Let X denote the centered n × m data matrix as before. Consider the Singular Value Decomposition of X:

$$ X = W S V^{T} $$

where the columns of W are the left singular vectors and S is the diagonal matrix of singular values. The PCA transform projects X down into the reduced subspace spanned by only the first L singular vectors, W_L:

$$ Y = W_{L}^{T} X $$
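The same reduction via the SVD, again as an illustrative sketch rather than code from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, L = 5, 200, 2
X = rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)

W, S, Vt = np.linalg.svd(X, full_matrices=False)  # X = W @ diag(S) @ Vt
W_L = W[:, :L]                                    # first L left singular vectors
Y = W_L.T @ X                                     # reduced L x m representation

# Sanity check: the squared singular values are the eigenvalues of X X^T.
print(np.allclose(S**2, np.linalg.eigvalsh(X @ X.T)[::-1]))
```

Working from the SVD avoids ever forming XXᵀ explicitly, which is numerically preferable in practice.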
Computing the Principal Axes

Let A = XXᵀ.

Bad idea: compute all eigenvectors of A and keep the leading k.

Power method:
- Compute (v, λ), the leading eigenpair of A.
- Update (deflation): B = A − λvvᵀ.
- Repeat the above with A = B, until we have k vectors.
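A compact sketch of this loop (the fixed iteration count is an arbitrary choice, not from the slides):

```python
import numpy as np

def power_method(A, n_iters=1000):
    """Leading eigenpair (v, lam) of a symmetric matrix A."""
    v = np.random.default_rng(3).normal(size=A.shape[0])
    for _ in range(n_iters):
        v = A @ v                     # repeatedly apply A ...
        v /= np.linalg.norm(v)        # ... and renormalize
    lam = v @ A @ v                   # Rayleigh quotient of the unit vector v
    return v, lam

def top_k_eigenvectors(A, k):
    """k leading eigenvectors of A, peeled off one at a time."""
    vectors = []
    B = A.copy()
    for _ in range(k):
        v, lam = power_method(B)
        vectors.append(v)
        B = B - lam * np.outer(v, v)  # deflation: B = A - lam * v v^T
    return np.column_stack(vectors)
```

With A = X @ X.T this yields the k leading principal axes without a full eigendecomposition.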
Power method (continued)
If v1 is the leading eigenvector of A (with eigenvalue λ1), then it is an eigenvector of B with eigenvalue 0:

$$ Bv_{1} = Av_{1} - \lambda_{1}v_{1}(v_{1}^{T}v_{1}) = \lambda_{1}v_{1} - \lambda_{1}v_{1} = 0 $$

Any other eigenvector vk of A is also an eigenvector of B, with its eigenvalue λk unchanged (since v1ᵀvk = 0):

$$ Bv_{k} = Av_{k} - \lambda_{1}v_{1}(v_{1}^{T}v_{k}) = \lambda_{k}v_{k} $$
Alternative method

Let X be n × m, where m ≪ n. The matrix XXᵀ is very large (n × n), but XᵀX is much smaller (m × m)!

Idea: compute the eigenvectors of XᵀX instead!

Let (v, λ) be an eigenpair of XᵀX; then (Xv, λ) is an eigenpair of XXᵀ.

Proof:

$$ (XX^{T})(Xv) = X(X^{T}Xv) = X(\lambda v) = \lambda(Xv) $$
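A sketch of the trick with illustrative sizes (this is the shortcut familiar from eigenfaces-style computations): diagonalize the small m × m matrix XᵀX and map each eigenvector v up to Xv.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 10_000, 50                      # n >> m
X = rng.normal(size=(n, m))
X = X - X.mean(axis=1, keepdims=True)

lams, V = np.linalg.eigh(X.T @ X)      # small m x m eigenproblem
U = X @ V                              # eigenvectors of X X^T (unnormalized)
U /= np.linalg.norm(U, axis=0)         # rescale columns to unit length

# Check on the leading eigenpair, without ever forming the n x n matrix:
u, lam = U[:, -1], lams[-1]
print(np.allclose(X @ (X.T @ u), lam * u))
```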
Applications in Comp. Graphics
Bounding box computation
[Figure: a 2-D point set inside its axis-aligned bounding box, with extents minX, maxX and minY, maxY.]
Applications in Comp. Graphics
Bounding box computation
[Figure: the same point set with a tighter bounding box aligned to the principal axes x′ and y′.]
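A hedged sketch of how such a PCA-aligned box can be computed (point data and names are made up): rotate the points into the principal frame, take the axis-aligned box there, and rotate its corners back.

```python
import numpy as np

rng = np.random.default_rng(5)
pts = rng.normal(size=(300, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])

mu = pts.mean(axis=0)
_, W = np.linalg.eigh(np.cov(pts - mu, rowvar=False))  # principal axes x', y'

local = (pts - mu) @ W                 # coordinates in the principal frame
lo, hi = local.min(axis=0), local.max(axis=0)
corners_local = np.array([[lo[0], lo[1]], [hi[0], lo[1]],
                          [hi[0], hi[1]], [lo[0], hi[1]]])
corners = corners_local @ W.T + mu     # oriented box corners in world space
print(corners)
```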
Applications in Comp. Graphics
Normal Estimation in point clouds
[Figure: a local neighborhood of a point cloud with its estimated normal vector.]
Applications in Comp. Graphics
Normal Estimation in point clouds
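A sketch of the usual PCA recipe for this, with invented data: the normal of a small neighborhood is taken as the eigenvector of the local covariance with the smallest eigenvalue, i.e. the direction of least variance.

```python
import numpy as np

def estimate_normal(neighbors):
    """neighbors: (k, 3) array of points near the query point."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered             # 3 x 3 local covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    return eigvecs[:, 0]                    # smallest-variance direction

# Toy usage: points scattered near the plane z = 0 should give a normal
# close to +/- z.
rng = np.random.default_rng(6)
patch = rng.normal(size=(50, 3)) * [1.0, 1.0, 0.01]
print(estimate_normal(patch))
```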
Applications in Comp. Graphics
Morphable face models
http://gravis.cs.unibas.ch/Sigg99.html