
Week 4, Lecture 7 - Dimensionality Reduction: PCA and NMF

Aaron Meyer


Outline

- Administrative Issues
- Decomposition methods
  - Factor analysis
  - Principal components analysis
  - Non-negative matrix factorization


Dealing with many variables

- So far we've largely concentrated on cases in which we have relatively large numbers of measurements for a few variables
  - This is frequently referred to as n > p
- Two other extremes are important:
  - Many observations and many variables
  - Many variables but few observations (p > n)


Dealing with many variables

Usually when we're dealing with many variables, we don't have a great understanding of how they relate to each other.
- E.g., if gene X is high, we can't be sure that gene Y will be too
- If we had these relationships, we could reduce the data
  - E.g., if we had variables to tell us it's 3 pm in Los Angeles, we don't need one to say it's daytime


Dimensionality Reduction

Generate a low-dimensional encoding of a high-dimensional space.

Purposes:
- Data compression / visualization
- Robustness to noise and uncertainty
- Potentially easier to interpret

Bonus: Many of the other methods from the class can be applied after dimensionality reduction with little or no adjustment!


Matrix Factorization

Many (most?) dimensionality reduction methods involve matrix factorization.

Basic idea: find two (or more) matrices whose product best approximates the original matrix.

Low-rank approximation to the original N × M matrix:

X ≈ WHᵀ

where W is N × R, H is M × R (so Hᵀ is R × M), and R ≪ N, M.
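As a hedged illustration (toy data, rank R chosen arbitrarily), a rank-R approximation can be built from the truncated SVD in NumPy:

import numpy as np

# Toy data matrix (N x M); any real dataset could be substituted here
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

R = 3  # chosen rank of the approximation

# Truncated SVD: keep only the R largest singular values / vectors
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :R] * s[:R]     # N x R
Ht = Vt[:R, :]           # R x M (this is H transposed)

X_approx = W @ Ht        # rank-R approximation of X
print(np.linalg.norm(X - X_approx) / np.linalg.norm(X))  # relative error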


Matrix Factorization

Generalization of many methods (e.g., SVD, QR, CUR, truncated SVD, etc.)


Aside - What should R be?

X ≈ WHᵀ

where W is N × R, H is M × R, and R ≪ N, M.
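One common heuristic, not from the slides, is to pick R where the cumulative explained variance levels off; a minimal sketch with placeholder data:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # placeholder data matrix

# Fit with all possible components, then inspect how much variance
# each additional component captures.
pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
print(cumvar)

# e.g., the smallest R capturing 90% of the variance
R = int(np.searchsorted(cumvar, 0.90) + 1)
print("Chosen R:", R)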


Matrix Factorization

Matrix factorization is also compression.

Figure: http://www.aaronschlegel.com/image-compression-principal-component-analysis/


Factor Analysis

Matrix factorization is also compression.

Figure: http://www.aaronschlegel.com/image-compression-principal-component-analysis/



Examples from bioengineering

Process control
- Large bioreactor runs may be recorded in a database, along with a variety of measurements from those runs
- We may be interested in how those different runs varied, and how each factor relates to the others
- Plotting a compressed version of that data can indicate when an anomalous change is present


Examples from bioengineering

Mutational processes
- Anytime multiple contributory factors give rise to a phenomenon, matrix factorization can separate them out
- Will talk about this in greater detail

Cell heterogeneity
- Enormous interest in understanding how cells are similar or different
- The answer can vary along millions of different dimensions
- But cells often follow programs


Principal Components Analysis

An application of matrix factorization
- Each principal component (PC) is a linear combination of the attributes / features; the PCs are mutually uncorrelated
- Ordered in terms of variance
- The kth PC is orthogonal to all previous PCs
- Reduces dimensionality while maintaining maximal variance


Principal Components Analysis

BOARD
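The derivation itself was done on the board; as a reference, here is a minimal sketch (variable names are placeholders) of computing PCA from the SVD of the centered data matrix:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # placeholder data: observations x features

# Center each feature; PCA operates on the covariance of centered data
Xc = X - X.mean(axis=0)

# SVD of the centered data: rows of Vt are the principal directions (loadings)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt                    # components x features
scores = Xc @ Vt.T               # observations projected onto the PCs (= U * s)
explained_variance = s**2 / (X.shape[0] - 1)

print(loadings[0])               # loadings of PC1
print(explained_variance / explained_variance.sum())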


Methods to calculate PCA

- Iterative computation
  - More robust with high numbers of variables
  - Slower to calculate
- NIPALS (non-linear iterative partial least squares; a sketch follows below)
  - Able to efficiently calculate a few PCs at once
  - Breaks down for high numbers of variables (large p)
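A minimal sketch of the NIPALS iteration for the first component (deflating X and repeating would give further PCs; variable names and tolerances are illustrative):

import numpy as np

def nipals_pc1(X, tol=1e-10, max_iter=500):
    """Estimate the first principal component of X by NIPALS."""
    Xc = X - X.mean(axis=0)          # center the data
    t = Xc[:, 0].copy()              # initialize scores with one column
    for _ in range(max_iter):
        p = Xc.T @ t / (t @ t)       # regress data on scores -> loadings
        p /= np.linalg.norm(p)       # normalize loadings
        t_new = Xc @ p               # project data on loadings -> scores
        if np.linalg.norm(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t, p                      # scores and loadings of PC1

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
scores, loadings = nipals_pc1(X)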


Practical Notes

PCA
- Implemented within sklearn.decomposition.PCA
- PCA.fit_transform(X) fits the model to X, then provides the data in principal component space
- PCA.components_ provides the "loadings matrix", or directions of maximum variance
- PCA.explained_variance_ provides the amount of variance explained by each component


PCA

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

X = iris.data
y = iris.target
target_names = iris.target_names

pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# Print PC1 loadings (components_ is n_components x n_features)
print(pca.components_[0, :])
# ...


PCA

# ...
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# Print PC1 loadings (first row of the components x features matrix)
print(pca.components_[0, :])

# Print PC1 scores
print(X_r[:, 0])

# Percentage of variance explained for each component
print(pca.explained_variance_ratio_)
# [ 0.92461621  0.05301557]


PCA
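The original slide shows a figure at this point; assuming it is the usual scatter of the iris samples in PC space, a minimal plotting sketch that would produce a comparable figure:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X_r = PCA(n_components=2).fit_transform(iris.data)

# Scatter of the samples in the space of the first two PCs, colored by species
for i, name in enumerate(iris.target_names):
    plt.scatter(X_r[iris.target == i, 0], X_r[iris.target == i, 1], label=name)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()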


Non-negative matrix factorization

Like PCA, except the coefficients in the linear combination must be non-negative
- Forcing non-negative coefficients implies an additive combination of basis parts to reconstruct the whole
- Generally leads to zeros for factors that don't contribute


Non-negative matrix factorization

The answer you get will always depend on the error metric, starting point, and search method.

BOARD


What is significant about this?

- The update rule is multiplicative instead of additive (sketched below)
- If the initial values for W and H are non-negative, then W and H can never become negative
  - This guarantees a non-negative factorization
- Will converge to a local optimum, not necessarily the global one
  - Therefore the starting point matters
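A minimal NumPy sketch of multiplicative updates for the Frobenius-norm objective (here H is R × M, i.e. the Hᵀ of the earlier notation; the data and iteration count are placeholders):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 20))            # non-negative data matrix (N x M)
R = 3

# Non-negative random initialization; the result depends on this start
W = rng.random((X.shape[0], R))
H = rng.random((R, X.shape[1]))

eps = 1e-12                          # avoid division by zero
for _ in range(500):
    # Multiplicative updates keep W and H non-negative at every step
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative error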


Non-negative matrix factorization

The answer you get will always depend on the error metric, starting point, and search method.
- Another approach is to follow the gradient across all the variables in the matrix
- Called coordinate descent, and is usually faster
- Not going to go through the implementation
- Will also converge to a local optimum


NMF application: Netflix

Suppose Alice rates Inception 4 stars. What led to this rating?


NMF application: Netflix

480,000 users × 17,700 movies. Test data: the most recent ratings.

What do you think is done with missing values?
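One common approach, sketched below under the assumption that only observed entries should drive the fit (this is a generic masked matrix factorization, not necessarily the exact Netflix Prize procedure):

import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(50, 40)).astype(float)   # toy user x movie matrix
mask = rng.random(ratings.shape) < 0.2                       # pretend only 20% observed

R, lr, lam = 5, 0.01, 0.1                                    # rank, step size, regularization
W = rng.random((ratings.shape[0], R))
H = rng.random((R, ratings.shape[1]))

# Gradient steps on the squared error over observed entries only;
# missing entries contribute nothing to the residual.
for _ in range(200):
    E = mask * (ratings - W @ H)     # residuals, zeroed where unobserved
    W += lr * (E @ H.T - lam * W)
    H += lr * (W.T @ E - lam * H)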


NMF application: Netflix

More generally, additional baseline predictors include:
- A factor that allows Alice's rating to (linearly) depend on the (square root of the) number of days since her first rating. (For example, have you ever noticed that you become a harsher critic over time?)
- A factor that allows Alice's rating to depend on the number of days since the movie's first rating by anyone. (If you're one of the first people to watch it, maybe it's because you're a huge fan and really excited to see it on DVD, so you'll tend to rate it higher.)
- A factor that allows Alice's rating to depend on the number of people who have rated Inception. (Maybe Alice is a hipster who hates being part of the crowd.)
- A factor that allows Alice's rating to depend on the movie's overall rating.
- (Plus a bunch of others; the simplest bias-only model is sketched below.)
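As a rough, hedged illustration of the simplest such bias model (global mean plus user and movie offsets; the data here are toy triples, not Netflix data):

import numpy as np

# Toy observed ratings as (user, movie, rating) triples
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0), (2, 1, 2.0)]
n_users, n_movies = 3, 2

mu = np.mean([r for _, _, r in ratings])        # global mean rating

# User bias: how far each user's average sits from the global mean
b_u = np.zeros(n_users)
for u in range(n_users):
    rs = [r for uu, _, r in ratings if uu == u]
    b_u[u] = np.mean(rs) - mu if rs else 0.0

# Movie bias: how far each movie's average sits from the global mean
b_i = np.zeros(n_movies)
for i in range(n_movies):
    rs = [r for _, ii, r in ratings if ii == i]
    b_i[i] = np.mean(rs) - mu if rs else 0.0

# Baseline prediction for user u, movie i: mu + b_u[u] + b_i[i]
print(mu + b_u[0] + b_i[1])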


NMF application: Netflix

And, in fact, modeling these biases turned out to be fairly important: in their paper describing their final solution to the Netflix Prize, Bell and Koren write:

Of the numerous new algorithmic contributions, I would like to highlight one – those humble baseline predictors (or biases), which capture main effects in the data. While the literature mostly concentrates on the more sophisticated algorithmic aspects, we have learned that an accurate treatment of main effects is probably at least as significant as coming up with modeling breakthroughs.



NMF application: Mutational Processes in Cancer

Figure: Helleday et al, Nat Rev Gen, 2014



NMF application: Mutational Processes in Cancer

Figure: Alexandrov et al, Cell Rep, 2013



Practical Notes - NMF

- Implemented within sklearn.decomposition.NMF (a usage sketch follows)
  - n_components: number of components
  - init: how to initialize the search
  - solver: 'cd' for coordinate descent, or 'mu' for multiplicative update
  - l1_ratio: can regularize the fit
- Provides:
  - NMF.components_: components x features matrix
  - Returns transformed data through NMF.fit_transform()
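A minimal usage sketch with scikit-learn (the data matrix below is a random non-negative placeholder; parameter choices are illustrative):

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 20))               # non-negative data matrix

nmf = NMF(n_components=3, init='nndsvda', solver='cd', max_iter=500, random_state=0)
W = nmf.fit_transform(X)                # samples x components scores
H = nmf.components_                     # components x features matrix

# Reconstruction error reported by sklearn (Frobenius norm by default)
print(nmf.reconstruction_err_)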


Summary

PCA
- Preserves the covariation within a dataset
- Therefore mostly preserves axes of maximal variation
- Number of components can vary; in practice more than 2 or 3 is rarely helpful

NMF
- Explains the dataset through factoring into two non-negative matrices
- Much more stable and well-specified reconstruction when its assumptions are appropriate
- Excellent for separating out additive factors


Closing

As always, selection of the appropriate method depends upon the question being asked.