![Page 1: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/1.jpg)
Detailed Look at PCA
Three Important (& Interesting) Viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review Linear Alg. and Multivar. Prob.
![Page 2: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/2.jpg)
Review of Linear Algebra (Cont.)
Better View of Relationship:
Singular Value Dec. Eigenvalue Dec.
• Start with data matrix:
• With SVD:
• Create square, symmetric matrix:
• Note that:
• Gives Eigenanalysis,
tVSUX
X
2& SDUB
nd
tXX
tttt USUUSVVSUXX 2
![Page 3: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/3.jpg)
Review of Multivariate Probability
Given a Random Vector,
A Center of the Distribution is the Mean Vector,
Note: Component-Wise Calc’n (Euclidean)
dX
X
X 1
dEX
EX
XE 1
![Page 4: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/4.jpg)
Review of Multivariate Probability
Given a Random Vector,
A Measure of Spread is the Covariance Matrix:
dd
d
XXX
XXX
X
var,cov
,covvar
)cov(
1
11
dX
X
X 1
![Page 5: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/5.jpg)
Review of Multivar. Prob. (Cont.)
Outer Product Representation:
,
Where:
n
idididid
didii
XXXXXX
XXXXXX
n 1 211
112
11
11ˆ
tn
i
tii XXXXXX
n~~
11ˆ
1
ndn XXXX
nX
11
1~
![Page 6: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/6.jpg)
PCA as an Optimization Problem
Find “Direction of Greatest Variability”:
![Page 7: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/7.jpg)
PCA as an Optimization Problem
Find “Direction of Greatest Variability”:
![Page 8: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/8.jpg)
PCA as Optimization (Cont.)Now since is an Orthonormal Basis Matrix,
and
So the Rotation Gives a
Distribution of the Energy of Over the
Eigen-Directions
And is Max’d (Over ),
by Putting all Energy in the “Largest
Direction”, i.e. ,
Where “Eigenvalues are Ordered”,
B
d
jjj vuvu
1
,
d
jj uvu
1
22,1
uv
uv
uB
d
t
,
,1
u
d
jjj
n
iiu uvnXXP
1
2
1
2,1 u
1vu
d 21
![Page 9: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/9.jpg)
PCA as Optimization (Cont.)
Notes:
• Solution is Unique when
• Else have Sol’ns in Subsp. Gen’d by 1st s
• Projecting onto Subspace to ,
Gives as Next Direction
• Continue Through ,…,
• Replace by to get Theoretical PCA
• Estimated by the Empirical Version
21
v
1v
2v
3v dv
![Page 10: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/10.jpg)
Connect Math to Graphics2-d Toy Example
Descriptor Space Object Space
Data Points (Curves) are columns of data matrix, X
![Page 11: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/11.jpg)
Connect Math to Graphics (Cont.)
2-d Toy ExampleDescriptor Space Object Space
PC1 Direction = η = Eigenvector (w/ biggest λ)
![Page 12: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/12.jpg)
Connect Math to Graphics (Cont.)
2-d Toy ExampleDescriptor Space Object Space
Centered Data PC1 Projection Residual Best 1-d Approx’s of Data
![Page 13: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/13.jpg)
Connect Math to Graphics (Cont.)
2-d Toy ExampleDescriptor Space Object Space
Centered Data PC2 Projection Residual Worst 1-d Approx’s of Data
![Page 14: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/14.jpg)
PCA Redistribution of EnergyConvenient Summary of Amount of Structure:
Total Sum of Squares
Physical Interpetation:Total Energy in Data
Insight comes from decomposition
Statistical Terminology:ANalysis Of VAriance (ANOVA)
n
iiX
1
2
![Page 15: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/15.jpg)
PCA Redist’n of Energy (Cont.)
Have already studied this decomposition (recall curve e.g.)• Variation (SS) due to Mean (% of total)• Variation (SS) of Mean Residuals (% of total)
![Page 16: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/16.jpg)
PCA Redist’n of Energy (Cont.)
j
Eigenvalues Provide Atoms of SS Decompos’n
Useful Plots are:• Power Spectrum: vs. • log Power Spectrum: vs. • Cumulative Power Spectrum: vs.
Note PCA Gives SS’s for Free (As Eigenval’s),
But Watch Factors of
d
jj
ttn
ii DtrDBBtrBDBtrXX
n 11
2)(
11
j
j
j jlog
j
jj
1''
1n
![Page 17: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/17.jpg)
PCA Redist’n of Energy (Cont.)
Note, have already considered some of these Useful Plots:• Power Spectrum• Cumulative Power Spectrum
![Page 18: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/18.jpg)
Connect Math to Graphics (Cont.)
2-d Toy ExampleDescriptor Space Object Space
Revisit SS Decomposition for PC1
![Page 19: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/19.jpg)
Connect Math to Graphics (Cont.)
2-d Toy ExampleDescriptor Space Object Space
Revisit SS Decomposition for PC1:PC1 has “most of var’n” = 93%Reflected by good approximation in Desc’r Space
![Page 20: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/20.jpg)
Connect Math to Graphics (Cont.)
2-d Toy ExampleDescriptor Space Object Space
Revisit SS Decomposition for PC1:PC2 has “only a little var’n” = 7%Reflected by poor approximation in Desc’or Space
![Page 21: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/21.jpg)
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
![Page 22: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/22.jpg)
Different Views of PCA2-d Toy Example
Descriptor Space Object Space
1. Max SS of Projected Data2. Min SS of Residuals3. Best Fit Line
![Page 23: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/23.jpg)
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
![Page 24: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/24.jpg)
Different Views of PCA2-d Toy Example
Feature Space Object Space
1. Max SS of Projected Data2. Min SS of Residuals3. Best Fit Line
![Page 25: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/25.jpg)
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
![Page 26: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/26.jpg)
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
3. “Best fit line” to data in “orthogonal sense”
(vs. regression of Y on X = vertical sense
& regression of X on Y = horizontal sense)
![Page 27: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/27.jpg)
Different Views of PCA
Toy Example Comparison of Fit Lines:
PC1
Regression of Y on X
Regression of X on Y
![Page 28: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/28.jpg)
Different Views of PCA
Normal
Data
ρ = 0.3
![Page 29: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/29.jpg)
Different Views of PCA
Projected
Residuals
![Page 30: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/30.jpg)
Different Views of PCA
Vertical
Residuals
(X predicts Y)
![Page 31: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/31.jpg)
Different Views of PCA
Horizontal
Residuals
(Y predicts X)
![Page 32: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/32.jpg)
Different Views of PCA
Projected
Residuals
(Balanced
Treatment)
![Page 33: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/33.jpg)
Different Views of PCA
Toy Example Comparison of Fit Lines:
PC1
Regression of Y on X
Regression of X on Y
Note: Big Difference Prediction Matters
![Page 34: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/34.jpg)
Different Views of PCA
Solves several optimization problems:
1. Direction to maximize SS of 1-d proj’d data
2. Direction to minimize SS of residuals
(same, by Pythagorean Theorem)
3. “Best fit line” to data in “orthogonal sense”
(vs. regression of Y on X = vertical sense
& regression of X on Y = horizontal sense)
Use one that makes sense…
![Page 35: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/35.jpg)
PCA Data RepresentationIdea: Expand Data Matrix
in Terms of Inner Prod’ts & Eigenvectors
![Page 36: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/36.jpg)
PCA Data RepresentationIdea: Expand Data Matrix
in Terms of Inner Prod’ts & EigenvectorsRecall Notation:
(Mean Centered Data)
ndn XXXX
nX
11
1~
![Page 37: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/37.jpg)
PCA Data RepresentationIdea: Expand Data Matrix
in Terms of Inner Prod’ts & EigenvectorsRecall Notation:
Spectral Representation (centered data):
ndn XXXX
nX
11
1~
d
j
tjjnd XX
1
~~
n11d
![Page 38: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/38.jpg)
PCA Data Represent’n (Cont.)
Now Using:
XnXX ~1
![Page 39: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/39.jpg)
PCA Data Represent’n (Cont.)
Now Using:
Spectral Representation (Raw Data):
XnXX ~1
d
jjj
d
j
tjjnd cXXnXX
11
~1
![Page 40: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/40.jpg)
PCA Data Represent’n (Cont.)
Now Using:
Spectral Representation (Raw Data):
Where:• Entries of are Loadings• Entries of are Scores
XnXX ~1
d
jjj
d
j
tjjnd cXXnXX
11
~1
n11dj
jc
![Page 41: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/41.jpg)
PCA Data Represent’n (Cont.)
Can Focus on Individual Data Vectors:
(Part of Above Full Matrix Rep’n)
d
jijji cXX
1
![Page 42: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/42.jpg)
PCA Data Represent’n (Cont.)
Can Focus on Individual Data Vectors:
(Part of Above Full Matrix Rep’n)
Terminology: are Called “PCs” and are also Called Scores
d
jijji cXX
1
ijc
![Page 43: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/43.jpg)
PCA Data Represent’n (Cont.)
More Terminology:
• Scores, are Coefficients in Spectral Representation:
d
jijji cXX
1
ijc
![Page 44: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/44.jpg)
PCA Data Represent’n (Cont.)
More Terminology:
• Scores, are Coefficients in Spectral Representation:
• Loadings are Entries of Eigenvectors:
d
jijji cXX
1
ijc
dj
j
j
1ij
![Page 45: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/45.jpg)
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
k
jijji cXX
1
![Page 46: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/46.jpg)
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
• Reconstruct Using Only Terms
(Assuming Decreasing Eigenvalues)
k
jijji cXX
1
dk
![Page 47: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/47.jpg)
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
• Reconstruct Using Only Terms
(Assuming Decreasing Eigenvalues)• Gives: Rank Approximation of Data
k
jijji cXX
1
dk
k
![Page 48: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/48.jpg)
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
• Reconstruct Using Only Terms
(Assuming Decreasing Eigenvalues)• Gives: Rank Approximation of Data• Key to PCA Dimension Reduction
k
jijji cXX
1
dk
k
![Page 49: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/49.jpg)
PCA Data Represent’n (Cont.)
Reduced Rank Representation:
• Reconstruct Using Only Terms
(Assuming Decreasing Eigenvalues)• Gives: Rank Approximation of Data• Key to PCA Dimension Reduction• And PCA for Data Compression (~ .jpeg)
k
jijji cXX
1
dk
k
![Page 50: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/50.jpg)
PCA Data Represent’n (Cont.)
Choice of in Reduced Rank Represent’n:k
![Page 51: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/51.jpg)
PCA Data Represent’n (Cont.)
Choice of in Reduced Rank Represent’n:• Generally Very Slippery Problem
k
![Page 52: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/52.jpg)
PCA Data Represent’n (Cont.)
Choice of in Reduced Rank Represent’n:• Generally Very Slippery Problem
Not Recommended:o Arbitrary Choiceo E.g. % Variation Explained 90%? 95%?
k
![Page 53: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/53.jpg)
PCA Data Represent’n (Cont.)
Choice of in Reduced Rank Represent’n:• Generally Very Slippery Problem• SCREE Plot (Kruskal 1964):
Find Knee in Power Spectrum
k
![Page 54: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/54.jpg)
PCA Data Represent’n (Cont.)
SCREE Plot Drawbacks:• What is a Knee?
![Page 55: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/55.jpg)
PCA Data Represent’n (Cont.)
SCREE Plot Drawbacks:• What is a Knee?• What if There are Several?
![Page 56: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/56.jpg)
PCA Data Represent’n (Cont.)
SCREE Plot Drawbacks:• What is a Knee?• What if There are Several?• Knees Depend on Scaling (Power? log?)
![Page 57: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/57.jpg)
PCA Data Represent’n (Cont.)
SCREE Plot Drawbacks:• What is a Knee?• What if There are Several?• Knees Depend on Scaling (Power? log?)
Personal Suggestions: Find Auxiliary Cutoffs (Inter-Rater Variation) Use the Full Range (ala Scale Space)
![Page 58: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/58.jpg)
PCA SimulationIdea: given
• Mean Vector
• Eigenvectors
• Eigenvalues
Simulate data from Corresponding Normal Distribution
k
,1
k ,,1
![Page 59: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/59.jpg)
PCA SimulationIdea: given
• Mean Vector
• Eigenvectors
• Eigenvalues
Simulate data from Corresponding Normal Distribution
Approach: Invert PCA Data Represent’n
where
k
jijjji ZX
1
1,0~ NZ ij
k
,1
k ,,1
![Page 60: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/60.jpg)
PCA & Graphical Displays
Small caution on PC directions & plotting:
• PCA directions (may) have sign flip
• Mathematically no difference
• Numerically caused artifact of round off
• Can have large graphical impact
![Page 61: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/61.jpg)
PCA & Graphical Displays
Toy Example (2 colored “clusters” in data)
![Page 62: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/62.jpg)
PCA & Graphical Displays
Toy Example (1 point moved)
![Page 63: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/63.jpg)
PCA & Graphical Displays
Toy Example (1 point moved)
Important
Point:
Constant
Axes
![Page 64: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/64.jpg)
PCA & Graphical Displays
Original Data (arbitrary PC flip)
![Page 65: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/65.jpg)
PCA & Graphical Displays
Point Moved Data (arbitrary PC flip)
Much
Harder
To See
Moving
Point
![Page 66: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/66.jpg)
PCA & Graphical Displays
How to “fix directions”?
Personal approach:
Use ± 1 flip that gives:
Max(Proj(Xi)) > |Min(Proj(Xi))|
(assumes 0 centered)
![Page 67: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/67.jpg)
Alternate PCA Computation
Issue: for HDLSS data (recall )
• May be Quite Large,
nd
dd
![Page 68: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/68.jpg)
Alternate PCA Computation
Issue: for HDLSS data (recall )
• May be Quite Large,
• Thus Slow to Work with, and to Compute
• What About a Shortcut?
nd
dd
![Page 69: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/69.jpg)
Alternate PCA Computation
Issue: for HDLSS data (recall )
• May be Quite Large,
• Thus Slow to Work with, and to Compute
• What About a Shortcut?
Approach: Singular Value Decomposition
(of (centered, scaled) Data Matrix )
nd
dd
X~
![Page 70: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/70.jpg)
Alternate PCA Computation
Singular Value Decomposition:tUSVX ~
![Page 71: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/71.jpg)
Alternate PCA Computation
Singular Value Decomposition:
Where:
is Unitary
is Unitary
is diag’l matrix of singular val’s
VU
tUSVX ~
S
dd
nn
![Page 72: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/72.jpg)
Alternate PCA Computation
Singular Value Decomposition:
Where:
is Unitary
is Unitary
is diag’l matrix of singular val’s
Assume: decreasing singular values
VU
tUSVX ~
S
dd
nn
01 kss
![Page 73: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/73.jpg)
Alternate PCA Computation
Singular Value Decomposition:
Recall Relation to Eigen-Analysis ofUSVX ~
ttttt UUSUSVUSVXXn 2))((~~ˆ1
![Page 74: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/74.jpg)
Alternate PCA Computation
Singular Value Decomposition:
Recall Relation to Eigen-Analysis of
Thus have Same Eigenvector Matrix
USVX ~
ttttt UUSUSVUSVXXn 2))((~~ˆ1
U
![Page 75: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/75.jpg)
Alternate PCA Computation
Singular Value Decomposition:
Recall Relation to Eigen-Analysis of
Thus have Same Eigenvector Matrix
And Eigenval’s are Squares of Singular Val’s
USVX ~
ttttt UUSUSVUSVXXn 2))((~~ˆ1
nisii ,,1,2
U
![Page 76: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/76.jpg)
Alternate PCA Computation
Singular Value Decomposition,
Computational Advantage (for Rank ):
Use Compact Form, only need to find
e-vec’s s-val’s scores
USVX ~
nrrrrd VSU ,,
r
![Page 77: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/77.jpg)
Alternate PCA Computation
Singular Value Decomposition,
Computational Advantage (for Rank ):
Use Compact Form, only need to find
e-vec’s s-val’s scores
Other Components not Useful
USVX ~
nrrrrd VSU ,,
r
![Page 78: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/78.jpg)
Alternate PCA Computation
Singular Value Decomposition,
Computational Advantage (for Rank ):
Use Compact Form, only need to find
e-vec’s s-val’s scores
Other Components not Useful
So can be much faster for
nd
USVX ~
nrrrrd VSU ,,
r
![Page 79: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/79.jpg)
Alternate PCA Computation
Another Variation: Dual PCA
Idea: Useful to View Data Matrix
nddnd
n
XX
XX
X
1
111
![Page 80: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/80.jpg)
Alternate PCA Computation
Another Variation: Dual PCA
Idea: Useful to View Data Matrix as
Col’ns as Data Objects
nddnd
n
XX
XX
X
1
111
![Page 81: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/81.jpg)
Alternate PCA Computation
Another Variation: Dual PCA
Idea: Useful to View Data Matrix as Both
Col’ns as Data Objects
&
Rows as Data Objectsnddnd
n
XX
XX
X
1
111
![Page 82: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/82.jpg)
Alternate PCA Computation
Aside on Row-Column Data Object Choice:
Col’ns as Data Objects
&
Rows as Data Objectsnddnd
n
XX
XX
X
1
111
![Page 83: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/83.jpg)
Alternate PCA Computation
Aside on Row-Column Data Object Choice:
Personal Choice: (~Matlab,Linear Algebra)
Col’ns as Data Objects
&
Rows as Data Objects
nddnd
n
XX
XX
X
1
111
![Page 84: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/84.jpg)
Alternate PCA Computation
Aside on Row-Column Data Object Choice:
Personal Choice: (~Matlab,Linear Algebra)
Another Choice: (~SAS,R, Linear Models)
Variables
Col’ns as Data Objects
&
Rows as Data Objects
nddnd
n
XX
XX
X
1
111
![Page 85: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/85.jpg)
Alternate PCA Computation
Aside on Row-Column Data Object Choice:
Personal Choice: (~Matlab,Linear Algebra)
Another Choice: (~SAS,R, Linear Models)
Careful When Discussing!
Col’ns as Data Objects
&
Rows as Data Objects
nddnd
n
XX
XX
X
1
111
![Page 86: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/86.jpg)
Alternate PCA Computation
Dual PCA Computation:
Same as above, but replace
with
XtX
![Page 87: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/87.jpg)
Alternate PCA Computation
Dual PCA Computation:
Same as above, but replace
with
So can almost replace
with
X
tXX~~ˆ
tX
XX tD
~~ˆ
![Page 88: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/88.jpg)
Alternate PCA Computation
Dual PCA Computation:
Same as above, but replace
with
So can almost replace
with
Then use SVD, , to get:
X
tXX~~ˆ
tUSVX ~
tttttttD VVSUSVVSUUSVUSVXX 2~~ˆ
tX
XX tD
~~ˆ
![Page 89: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/89.jpg)
Alternate PCA Computation
Dual PCA Computation:
Same as above, but replace
with
So can almost replace
with
Then use SVD, , to get:
Note: Same Eigenvalues
X
tXX~~ˆ
tUSVX ~
tttttttD VVSUSVVSUUSVUSVXX 2~~ˆ
tX
XX tD
~~ˆ
![Page 90: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/90.jpg)
Alternate PCA Computation
Appears to be cool symmetry:
Primal Dual
Loadings Scores
VU
![Page 91: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/91.jpg)
Alternate PCA Computation
Appears to be cool symmetry:
Primal Dual
Loadings Scores
But, there is a problem with the
means
and normalization …
VU
![Page 92: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/92.jpg)
Functional Data Analysis
Recall from 1st Class Meeting:
Spanish Mortality Data
![Page 93: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/93.jpg)
Functional Data Analysis
Interesting Data Set:• Mortality Data• For Spanish Males (thus can relate to history)• Each curve is a single year• x coordinate is age
Note: Choice made of Data Object(could also study age as curves,
x coordinate = time)
![Page 94: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/94.jpg)
Important Issue:
What are the Data Objects?
• Curves (years) : Mortality vs. Age
• Curves (Ages) : Mortality vs. Year
Note: Rows vs. Columns of Data Matrix
Functional Data Analysis
![Page 95: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/95.jpg)
Mortality Time Series
ImprovedColoring:
RainbowRepresentingYear:
Magenta = 1908
Red = 2002
![Page 96: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/96.jpg)
Mortality Time Series
Object Space View of ProjectionsOnto PC1Direction
Main Mode Of Variation:Constant Across Ages
![Page 97: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/97.jpg)
Mortality Time Series
Shows MajorImprovementOver Time
(medical technology,etc.)
And ChangeIn Age Rounding Blips
![Page 98: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/98.jpg)
Mortality Time Series
Object Space View of ProjectionsOnto PC2Direction
2nd Mode Of Variation:Difference Between 20-45 & Rest
![Page 99: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/99.jpg)
Mortality Time SeriesScores Plot
Feature(Point Cloud)Space View
ConnectingLines HighlightTime Order
Good View ofHistoricalEffects
![Page 100: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/100.jpg)
Demography Data• Dual PCA• Idea: Rows and Columns trade
places• Terminology: from optimization• Insights come from studying
“primal” & “dual” problems
![Page 101: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/101.jpg)
Primal / Dual PCAConsider
“Data Matrix”
dndid
jnjij
ni
xxx
xxx
xxx
1
1
1111
![Page 102: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/102.jpg)
Primal / Dual PCAConsider
“Data Matrix”
Primal Analysis:
Columns are data vectors
dndid
jnjij
ni
xxx
xxx
xxx
1
1
1111
![Page 103: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/103.jpg)
Primal / Dual PCAConsider
“Data Matrix”
Dual Analysis:
Rows are data vectors
dndid
jnjij
ni
xxx
xxx
xxx
1
1
1111
![Page 104: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/104.jpg)
Demography Data
Dual
PCA
![Page 105: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/105.jpg)
Demography Data
Color Code
(Ages)
![Page 106: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/106.jpg)
Demography Data
Older People
More At Risk
Of Dying
![Page 107: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/107.jpg)
Demography Data
Improvement
Over Time
On Average
![Page 108: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/108.jpg)
Demography Data
Improvement
Over Time
On Average
Except
Flu Pandemic
& Civil War
![Page 109: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/109.jpg)
Demography Data
PC1: Reflects
Mortality
Higher for
Older People
![Page 110: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/110.jpg)
Demography Data
PC2: Younger
People
Benefitted
Most From
Improved
Health
![Page 111: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/111.jpg)
Demography Data
PC3: How
To Interpret?
![Page 112: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/112.jpg)
Demography Data
Use:
Mean + Max
Mean - Min
![Page 113: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/113.jpg)
Demography Data
Use:
Mean + Max
Mean – Min
Flu Pandemic
Civil War
Auto Deaths
![Page 114: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/114.jpg)
Demography Data
PC3 Intepretation:
Mid-Age
Cohort
![Page 115: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/115.jpg)
Primal / Dual PCAWhich is “Better”?
Same Info,
Displayed
Differently
Here: Prefer
Primal
dndid
jnjij
ni
xxx
xxx
xxx
1
1
1111
![Page 116: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/116.jpg)
Primal / Dual PCAWhich is “Better”?
Same Info,
Displayed
Differently
Here: Prefer
Primal, As Indicated by Graphics Quality
dndid
jnjij
ni
xxx
xxx
xxx
1
1
1111
![Page 117: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/117.jpg)
Primal / Dual PCAWhich is
“Better”?
In General:• Either Can Be
Best• Try Both and
Choose• Or Use
“Natural Choice” of
Data Object
dndid
jnjij
ni
xxx
xxx
xxx
1
1
1111
![Page 118: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/118.jpg)
Primal / Dual PCAImportant Early Version:
BiPlot Display
Overlay Primal & Dual PCAs
Not Easy to Interpret
Gabriel, K. R. (1971)
![Page 119: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/119.jpg)
Return to Big PictureMain statistical goals of OODA:• Understanding population
structure– Low dim’al Projections, PCA …
• Classification (i. e. Discrimination)– Understanding 2+ populations
• Time Series of Data Objects– Chemical Spectra, Mortality Data
• “Vertical Integration” of Data Types
![Page 120: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/120.jpg)
Return to Big PictureMain statistical goals of OODA:• Understanding population
structure– Low dim’al Projections, PCA …
• Classification (i. e. Discrimination)– Understanding 2+ populations
• Time Series of Data Objects– Chemical Spectra, Mortality Data
• “Vertical Integration” of Data Types
![Page 121: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/121.jpg)
Classification - Discrimination
Background: Two Class (Binary) version:
Using “training data” from
Class +1 and Class -1
Develop a “rule” for
assigning new data to a Class
Canonical Example: Disease Diagnosis• New Patients are “Healthy” or “Ill”• Determine based on measurements
![Page 122: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/122.jpg)
Classification - Discrimination
Important Distinction: Classification vs. Clustering
![Page 123: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/123.jpg)
Classification - Discrimination
Important Distinction: Classification vs. Clustering
Classification:Class labels are known,
Goal: understand differences
![Page 124: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/124.jpg)
Classification - Discrimination
Important Distinction: Classification vs. Clustering
Classification:Class labels are known,
Goal: understand differencesClustering:
Goal: Find class labels (to be similar)
![Page 125: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/125.jpg)
Classification - Discrimination
Important Distinction: Classification vs. Clustering
Classification:Class labels are known,
Goal: understand differencesClustering:
Goal: Find class labels (to be similar)Both are about clumps of similar data,
but much different goals
![Page 126: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/126.jpg)
Classification - Discrimination
Important Distinction:
Classification vs. Clustering
Useful terminology:
Classification: supervised learning
Clustering: unsupervised learning
![Page 127: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/127.jpg)
Classification - Discrimination
Terminology:
For statisticians, these are synonyms
![Page 128: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/128.jpg)
Classification - Discrimination
Terminology:
For statisticians, these are synonyms
For biologists, classification means:
• Constructing taxonomies
• And sorting organisms into them
![Page 129: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/129.jpg)
Classification - Discrimination
Terminology:
For statisticians, these are synonyms
For biologists, classification means:
• Constructing taxonomies
• And sorting organisms into them
(maybe this is why discrimination
was used, until politically incorrect…)
![Page 130: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/130.jpg)
Classification (i.e. discrimination)
There are a number of:
• Approaches
• Philosophies
• Schools of Thought
Too often cast as:
Statistics vs. EE - CS
![Page 131: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/131.jpg)
Classification (i.e. discrimination)
EE – CS variations:
• Pattern Recognition
• Artificial Intelligence
• Neural Networks
• Data Mining
• Machine Learning
![Page 132: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/132.jpg)
Classification (i.e. discrimination)
Differing Viewpoints:
Statistics• Model Classes with Probability
Distribut’ns• Use to study class diff’s & find rules
![Page 133: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/133.jpg)
Classification (i.e. discrimination)
Differing Viewpoints:
Statistics• Model Classes with Probability
Distribut’ns• Use to study class diff’s & find rules
EE – CS• Data are just Sets of Numbers• Rules distinguish between these
![Page 134: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/134.jpg)
Classification (i.e. discrimination)
Differing Viewpoints:
Statistics• Model Classes with Probability
Distribut’ns• Use to study class diff’s & find rules
EE – CS• Data are just Sets of Numbers• Rules distinguish between these
Current thought: combine these
![Page 135: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/135.jpg)
Classification (i.e. discrimination)
Important Overview Reference:
Duda, Hart and Stork (2001)
• Too much about neural nets???• Pizer disagrees…• Update of Duda & Hart (1973)
![Page 136: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/136.jpg)
Classification (i.e. discrimination)
For a more classical statistical view:
McLachlan (2004).
• Gaussian Likelihood theory, etc.• Not well tuned to HDLSS data
![Page 137: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/137.jpg)
Classification BasicsPersonal Viewpoint: Point Clouds
![Page 138: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/138.jpg)
Classification Basics
Simple and Natural Approach:
Mean Difference
a.k.a.
Centroid Method
Find “skewer through two meatballs”
![Page 139: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/139.jpg)
Classification BasicsFor Simple Toy Example:
Project
On MD
& split
at center
![Page 140: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/140.jpg)
Classification BasicsWhy not use PCA?
Reasonable
Result?
Doesn’t use
class labels…
• Good?
• Bad?
![Page 141: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/141.jpg)
Classification BasicsHarder Example (slanted clouds):
![Page 142: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/142.jpg)
Classification BasicsPCA for slanted clouds:
PC1 terrible
PC2 better?
Still missesright dir’n
Doesn’t useClass Labels
![Page 143: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/143.jpg)
Classification BasicsMean Difference for slanted clouds:
A little better?
Still missesright dir’n
Want toaccount forcovariance
![Page 144: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/144.jpg)
Classification BasicsMean Difference & Covariance,Simplest Approach:
Rescale (standardize) coordinate axesi. e. replace (full) data matrix:
Then do Mean DifferenceCalled “Naïve Bayes Approach”
ddndd
n
ddnd
n
sxsx
sxsx
X
s
s
xx
xx
X
//
//
/10
0/1
1
111111
1
111
![Page 145: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/145.jpg)
Classification Basics
Naïve Bayes Reference:
Domingos & Pazzani (1997)
Most sensible contexts:• Non-comparable data• E.g. different units
![Page 146: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/146.jpg)
Classification BasicsProblem with Naïve Bayes:
Only adjusts Variances
Not Covariances
Doesn’t solvethis problem
![Page 147: Detailed Look at PCA Three Important (& Interesting) Viewpoints: 1. Mathematics 2. Numerics 3. Statistics 1 st : Review Linear Alg. and Multivar. Prob](https://reader030.vdocuments.us/reader030/viewer/2022032802/56649e0f5503460f94afa37c/html5/thumbnails/147.jpg)
Classification BasicsBetter Solution: Fisher Linear
Discrimination
Gets the
right dir’n
How does
it work?