Week 4, Lecture 7 - Dimensionality Reduction: PCA and NMF

Aaron Meyer

Created: 3/3/2020
Outline
- Administrative issues
- Decomposition methods
  - Factor analysis
  - Principal components analysis
  - Non-negative matrix factorization
Dealing with many variables
- So far we've largely concentrated on cases in which we have relatively large numbers of measurements for a few variables
  - This is frequently referred to as n > p
- Two other extremes are important
  - Many observations and many variables
  - Many variables but few observations (p > n)
Dealing with many variables
Usually when we're dealing with many variables, we don't have a great understanding of how they relate to each other

- E.g., if gene X is high, we can't be sure that gene Y will be too
- If we had these relationships, we could reduce the data
  - E.g., if we had a variable telling us it's 3 pm in Los Angeles, we don't need one to say it's daytime
Dimensionality Reduction
Generate a low-dimensional encoding of a high-dimensional space

Purposes:

- Data compression / visualization
- Robustness to noise and uncertainty
- Potentially easier to interpret

Bonus: Many of the other methods from the class can be applied after dimensionality reduction with little or no adjustment!
Matrix Factorization
Many (most?) dimensionality reduction methods involve matrix factorization

Basic idea: find two (or more) matrices whose product best approximates the original matrix

Low-rank approximation to the original N × M matrix:

X ≈ WHᵀ

where W is N × R, H is M × R, and R ≪ N.
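As a concrete sketch, such a rank-R approximation can be computed with the truncated SVD (one particular factorization; the random matrix and dimensions below are purely illustrative):

```python
import numpy as np

# Hypothetical data matrix: N = 6 observations, M = 4 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Truncating the SVD at rank R gives the best rank-R approximation
# in the least-squares sense (Eckart-Young theorem)
R = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :R] * s[:R]   # N x R "scores"
Ht = Vt[:R, :]         # R x M, i.e. H transposed
X_approx = W @ Ht

print(np.linalg.norm(X - X_approx))  # reconstruction error
```

The discarded singular values exactly account for the reconstruction error, which is one way to see what increasing R buys you.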
Matrix Factorization
Generalization of many methods (e.g., SVD, QR, CUR, Truncated SVD, etc.)
Aside - What should R be?
X ≈ WHᵀ

where W is N × R, H is M × R, and R ≪ N.
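There is no single right answer; a common heuristic is to look at how much variance each added component captures and pick the smallest R that explains most of it (the 90% threshold below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))

# Cumulative fraction of variance captured as rank increases,
# from the singular values of the centered matrix
_, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
var_explained = np.cumsum(s**2) / np.sum(s**2)

# Heuristic: smallest R capturing, say, 90% of the variance
R = int(np.searchsorted(var_explained, 0.90) + 1)
print(R, var_explained)
```

Plotting `var_explained` against rank (a "scree plot") and looking for an elbow is the graphical version of the same idea.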
Matrix Factorization

Matrix factorization is also compression
Figure: http://www.aaronschlegel.com/image-compression-principal-component-analysis/
Factor Analysis

Matrix factorization is also compression

Figure: http://www.aaronschlegel.com/image-compression-principal-component-analysis/
Examples from bioengineering
Process control

- Large bioreactor runs may be recorded in a database, along with a variety of measurements from those runs
- We may be interested in how those different runs varied, and how each factor relates to one another
- Plotting a compressed version of that data can indicate when an anomalous change is present
Examples from bioengineering
Mutational processes

- Anytime multiple contributory factors give rise to a phenomenon, matrix factorization can separate them out
- Will talk about this in greater detail

Cell heterogeneity

- Enormous interest in understanding how cells are similar or different
- The answer to this can vary in millions of different ways
- But cells often follow programs
Principal Components Analysis
An application of matrix factorization

- Each principal component (PC) is a linear combination of the original attributes / features
- The PCs are uncorrelated and ordered in terms of variance
- The kth PC is orthogonal to all previous PCs
- Reduces dimensionality while maintaining maximal variance
Principal Components Analysis
BOARD
Methods to calculate PCA
- Iterative computation
  - More robust with high numbers of variables
  - Slower to calculate
- NIPALS (non-linear iterative partial least squares)
  - Able to efficiently calculate a few PCs at once
  - Breaks down for high numbers of variables (large p)
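A sketch of the NIPALS idea for a single PC, alternating score and loading updates on the centered data (my own minimal implementation, not the exact lecture code):

```python
import numpy as np

def nipals_pc(X, n_iter=500, tol=1e-10):
    """Estimate the first principal component of X via NIPALS (a sketch)."""
    Xc = X - X.mean(axis=0)        # center the data
    t = Xc[:, 0].copy()            # initial guess for the scores
    for _ in range(n_iter):
        p = Xc.T @ t / (t @ t)     # loadings given current scores
        p /= np.linalg.norm(p)     # normalize the loading vector
        t_new = Xc @ p             # scores given current loadings
        if np.linalg.norm(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t, p                    # PC1 scores and loadings

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
scores, loadings = nipals_pc(X)
```

Subsequent PCs come from deflation: subtract the outer product of scores and loadings from the centered matrix and repeat, which is why only the components you need get computed.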
Practical Notes
PCA

- Implemented within sklearn.decomposition.PCA
- PCA.fit_transform(X) fits the model to X, then provides the data in principal component space
- PCA.components_ provides the "loadings matrix", or directions of maximum variance
- PCA.explained_variance_ provides the amount of variance explained by each component
PCA
```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

X = iris.data
y = iris.target
target_names = iris.target_names

pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# Print PC1 loadings (components_ rows are components, columns are features)
print(pca.components_[0, :])
# ...
```
PCA
```python
# ...
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# Print PC1 loadings (components_ rows are components, columns are features)
print(pca.components_[0, :])

# Print PC1 scores
print(X_r[:, 0])

# Percentage of variance explained by each component
print(pca.explained_variance_ratio_)
# [0.92461621 0.05301557]
```
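The scores in X_r can then be plotted to see how the classes separate; a minimal sketch of the kind of figure typically shown for the iris data (the output filename is my own choice):

```python
import matplotlib
matplotlib.use("Agg")              # non-interactive backend for scripting
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X_r = PCA(n_components=2).fit_transform(iris.data)

# One scatter series per iris species, in PC space
for i, name in enumerate(iris.target_names):
    plt.scatter(X_r[iris.target == i, 0], X_r[iris.target == i, 1], label=name)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.savefig("iris_pca.png")        # hypothetical output filename
```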
Non-negative matrix factorization
Like PCA, except the coefficients in the linear combination must be non-negative

- Forcing non-negative coefficients implies an additive combination of basis parts to reconstruct the whole
- Generally leads to zeros for factors that don't contribute
Non-negative matrix factorization
The answer you get will always depend on the error metric, starting point, and search method

BOARD
What is significant about this?
- The update rule is multiplicative instead of additive
- If the initial values for W and H are non-negative, then W and H can never become negative
  - This guarantees a non-negative factorization
- Will only converge to a local minimum of the reconstruction error
  - Therefore the starting point matters
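A minimal sketch of these multiplicative updates (Lee and Seung's rules for the Frobenius-norm cost), written here for the X ≈ WH convention with W as N × R and H as R × M; the eps guard and iteration count are my own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.abs(rng.normal(size=(20, 10)))   # non-negative data, N=20, M=10
R = 3

# Non-negative random starting point (the result depends on it!)
W = np.abs(rng.normal(size=(20, R)))
H = np.abs(rng.normal(size=(R, 10)))
eps = 1e-12                             # guards against division by zero

err_start = np.linalg.norm(X - W @ H)
for _ in range(200):
    # Multiplicative updates: each factor is scaled, never shifted,
    # so entries stay non-negative
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
err_end = np.linalg.norm(X - W @ H)

print(err_start, err_end)
```

Because each update only rescales entries, an entry that reaches zero stays zero, which is where the sparsity of NMF factors comes from.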
Non-negative matrix factorization
The answer you get will always depend on the error metric, starting point, and search method

- Another approach is to find the gradient across all the variables in the matrix
  - Called coordinate descent, and is usually faster
  - Not going to go through the implementation
  - Will also converge to a local minimum
NMF application: Netflix
Suppose Alice rates Inception 4 stars. What led to this rating?
NMF application: Netflix

480,000 users × 17,700 movies. Test data: the most recent ratings.

What do you think is done with missing values?
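One common answer: treat missing entries as unobserved rather than zero, and fit only the observed ratings. A gradient-descent sketch on a tiny made-up ratings matrix (all numbers, the learning rate, and the iteration count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
# Tiny hypothetical ratings matrix; 0 marks "not yet rated"
X = np.array([[5., 4., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
mask = X > 0                        # fit only the observed entries

R = 2
U = 0.1 * rng.random((4, R))        # user factors
V = 0.1 * rng.random((4, R))        # movie factors
lr = 0.01                           # learning rate (hand-tuned here)

for _ in range(5000):
    E = mask * (X - U @ V.T)        # residual on observed entries only
    U += lr * E @ V                 # gradient steps on the masked error
    V += lr * E.T @ U

pred = U @ V.T                      # predictions fill in the missing cells
print(np.round(pred, 2))
```

The masked residual is the key design choice: a zero in X never pulls the factors toward zero, so the model is free to predict any value for the unrated cells.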
NMF application: Netflix
- More generally, additional baseline predictors include:
  - A factor that allows Alice's rating to (linearly) depend on the (square root of the) number of days since her first rating. (For example, have you ever noticed that you become a harsher critic over time?)
  - A factor that allows Alice's rating to depend on the number of days since the movie's first rating by anyone. (If you're one of the first people to watch it, maybe it's because you're a huge fan and really excited to see it on DVD, so you'll tend to rate it higher.)
  - A factor that allows Alice's rating to depend on the number of people who have rated Inception. (Maybe Alice is a hipster who hates being part of the crowd.)
  - A factor that allows Alice's rating to depend on the movie's overall rating.
  - (Plus a bunch of others.)
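These baseline terms are usually written as additions to the factorization itself; schematically, following Koren's notation (the exact extra terms vary by model):

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u
```

where μ is the global mean rating, b_u and b_i are the user and item biases (which can be made time-dependent, as in the bullets above), and the q_i, p_u inner product is the latent-factor term.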
NMF application: Netflix
And, in fact, modeling these biases turned out to be fairly important: in their paper describing their final solution to the Netflix Prize, Bell and Koren write that:

> Of the numerous new algorithmic contributions, I would like to highlight one – those humble baseline predictors (or biases), which capture main effects in the data. While the literature mostly concentrates on the more sophisticated algorithmic aspects, we have learned that an accurate treatment of main effects is probably at least as significant as coming up with modeling breakthroughs.
NMF application: Mutational Processes in Cancer
Figure: Helleday et al, Nat Rev Gen, 2014
NMF application: Mutational Processes in Cancer
Figure: Alexandrov et al, Cell Rep, 2013
Practical Notes - NMF
- Implemented within sklearn.decomposition.NMF
  - n_components: number of components
  - init: how to initialize the search
  - solver: 'cd' for coordinate descent, or 'mu' for multiplicative update
  - l1_ratio: can regularize the fit
- Provides:
  - NMF.components_: the components × features matrix
  - Transformed data through NMF.fit_transform()
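A minimal usage sketch of this interface (the random non-negative data and parameter choices are illustrative only):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)
X = np.abs(rng.normal(size=(30, 8)))    # NMF requires non-negative input

model = NMF(n_components=3, init="nndsvda", solver="cd", max_iter=500)
W = model.fit_transform(X)              # samples x components (scores)
H = model.components_                   # components x features (loadings)

X_approx = W @ H                        # low-rank, all-additive reconstruction
print(np.linalg.norm(X - X_approx))
```

Note that, unlike PCA, refitting with a different init (or with solver='mu') can land in a different local minimum, so it is worth checking that the factors are stable across restarts.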
Summary
PCA

- Preserves the covariation within a dataset
- Therefore mostly preserves the axes of maximal variation
- Number of components can vary; in practice more than 2 or 3 are rarely helpful

NMF

- Explains the dataset through factoring into two non-negative matrices
- Much more stable and well-specified reconstruction when its assumptions are appropriate
- Excellent for separating out additive factors
Closing
As always, selection of the appropriate method depends upon the question being asked.