![Page 1: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/1.jpg)
UNC, Stat & OR
Hailuoto Workshop
Object Oriented Data Analysis, II
J. S. Marron
Dept. of Statistics and Operations
Research, University of North Carolina
April 20, 2023
![Page 2: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/2.jpg)
HDLSS Classification (i.e. Discrimination)
Background: Two Class (Binary) version:
Using “training data” from Class +1, and from Class -1
Develop a “rule” for assigning new data to a Class
Canonical Example: Disease Diagnosis
New Patients are “Healthy” or “Ill”
Determined based on measurements
![Page 3: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/3.jpg)
HDLSS Classification (Cont.)
Ineffective Methods:
Fisher Linear Discrimination
Gaussian Likelihood Ratio
Less Useful Methods:
Nearest Neighbors
Neural Nets
(“black boxes”, no “directions” or intuition)
![Page 4: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/4.jpg)
HDLSS Classification (Cont.)
Currently Fashionable Methods:
Support Vector Machines
Tree-Based Approaches
New High Tech Method:
Distance Weighted Discrimination (DWD)
Specially designed for HDLSS data
Avoids “data piling” problem of SVM
Solves more suitable optimization problem
![Page 5: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/5.jpg)
HDLSS Classification (Cont.)
Currently Fashionable Methods:
Tree-Based Approaches
Support Vector Machines:
![Page 6: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/6.jpg)
HDLSS Classification (Cont.)
Comparison of Linear Methods (toy data):
Optimal Direction
Excellent, but need dir’n in dim = 50
Maximal Data Piling (J. Y. Ahn, D. Peña)
Great separation, but generalizability???
Support Vector Machine
More separation, gen’ity, but some data piling?
Distance Weighted Discrimination
Avoids data piling, good gen’ity, Gaussians?
$N_d(\mu, I)$, $\mu_1 = 2.2$, $n_1 = n_2 = 20$, $d = 50$
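The “data piling” seen in the Maximal Data Piling direction can be reproduced on toy data of the size quoted on this slide. What follows is a minimal sketch, not the code behind the figures. It assumes the two class means sit at +2.2 and -2.2 in the first coordinate (the slide gives only the value 2.2), and it assumes the MDP direction is the pseudo-inverse of the overall sample covariance applied to the difference of class means. The function names `toy_data` and `mdp_direction` are made up for illustration.

```python
import numpy as np

def toy_data(n_per_class=20, d=50, shift=2.2, seed=0):
    """Two spherical Gaussian classes that differ only in the first coordinate."""
    rng = np.random.default_rng(seed)
    x_plus = rng.standard_normal((n_per_class, d))
    x_minus = rng.standard_normal((n_per_class, d))
    x_plus[:, 0] += shift    # assumption: Class +1 mean is +shift
    x_minus[:, 0] -= shift   # assumption: Class -1 mean is -shift
    return x_plus, x_minus

def mdp_direction(x_plus, x_minus):
    """Pseudo-inverse of the overall covariance times the mean difference."""
    x_all = np.vstack([x_plus, x_minus])
    centered = x_all - x_all.mean(axis=0)
    cov = centered.T @ centered / (len(x_all) - 1)
    delta = x_plus.mean(axis=0) - x_minus.mean(axis=0)
    # Pseudo-inverse via the eigendecomposition, dropping the null space
    # (the covariance has rank at most n - 1 < d in the HDLSS setting).
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()
    w = evecs[:, keep] @ ((evecs[:, keep].T @ delta) / evals[keep])
    return w / np.linalg.norm(w)

x_plus, x_minus = toy_data()
w = mdp_direction(x_plus, x_minus)
proj_plus, proj_minus = x_plus @ w, x_minus @ w

# Within each class the projections collapse onto a single value.
print("spread within Class +1:", proj_plus.std())
print("spread within Class -1:", proj_minus.std())
print("gap between the piles: ", proj_plus.mean() - proj_minus.mean())
```

The within-class spread is zero up to rounding error, which is the “great separation” on the training data; the question mark about generalizability is that new points have no reason to land on the same two piles.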
![Page 7: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/7.jpg)
Distance Weighted Discrimination
Maximal Data Piling
![Page 8: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/8.jpg)
Distance Weighted Discrimination
Based on Optimization Problem:
$$\min_{w,b} \sum_{i=1}^{n} \frac{1}{r_i}$$
More precisely work in appropriate penalty for violations
Optimization Method (Michael Todd):
Second Order Cone Programming
Still Convex gen’tion of quadratic prog’ing
Fast greedy solution
Can use existing software
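For orientation, here is one way the objective $\sum_i 1/r_i$ with a penalty for violations is commonly written. This is a sketch assuming the standard formulation from the DWD literature, not a transcription of the slide: the labels $y_i \in \{+1, -1\}$, the data vectors $x_i$, the slack variables $\xi_i$, and the penalty weight $C$ are all symbols introduced here.

$$\min_{w,\,b,\,\xi} \; \sum_{i=1}^{n} \frac{1}{r_i} + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad r_i = y_i \left( w^\top x_i + b \right) + \xi_i \ge 0, \quad \xi_i \ge 0, \quad \|w\| \le 1$$

Under this reading, $r_i$ is the signed distance of point $i$ from the separating hyperplane, pushed to be non-negative by $\xi_i$ when the point falls on the wrong side. Every point contributes to the sum of reciprocals, with nearby points weighted most heavily.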
![Page 9: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/9.jpg)
Simulation Comparison
E.G. Above
Gaussians:
Wide array of dim’s
SVM Subst’ly worse
MD – Bayes Optimal
DWD close to MD
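A small simulation shows what the MD (mean difference) rule looks like on spherical Gaussian data of this kind. This is a minimal sketch under assumptions, not the simulation behind the plot: the class means are taken to be +2.2 and -2.2 in the first coordinate, the training size is 20 per class, the dimension is fixed at 50, and the test set size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test, shift = 50, 20, 2000, 2.2  # illustrative settings

def sample(n, sign):
    """Draw n points from N_d(mu, I), with mu = sign * shift in coordinate 1."""
    x = rng.standard_normal((n, d))
    x[:, 0] += sign * shift
    return x

train_plus, train_minus = sample(n_train, +1), sample(n_train, -1)

# MD rule: project onto the difference of the class means and
# cut at the midpoint between the two class means.
w = train_plus.mean(axis=0) - train_minus.mean(axis=0)
midpoint = (train_plus.mean(axis=0) + train_minus.mean(axis=0)) / 2

def classify(x):
    return np.where((x - midpoint) @ w > 0, 1, -1)

test_plus, test_minus = sample(n_test, +1), sample(n_test, -1)
errors = np.sum(classify(test_plus) != 1) + np.sum(classify(test_minus) != -1)
error_rate = errors / (2 * n_test)
print(f"MD test error rate: {error_rate:.3f}")
```

With equal spherical covariances and equal class sizes, the rule built from the true means is the Bayes rule, so the plug-in version above is the natural benchmark in this setting.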
![Page 10: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/10.jpg)
Simulation Comparison
E.G. Outlier Mixture:
Disaster for MD
SVM & DWD much more solid
Dir’ns are “robust”
SVM & DWD similar
![Page 11: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/11.jpg)
Simulation Comparison
E.G. Wobble Mixture:
Disaster for MD
SVM less good
DWD slightly better
Note: All methods come together for larger d ???
![Page 12: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/12.jpg)
DWD Bias Adjustment for Microarrays
Microarray data:
Simult. Measur’ts of “gene expression”
Intrinsically HDLSS
Dimension d ~ 1,000s – 10,000s
Sample Sizes n ~ 10s – 100s
My view: Each array is “point in cloud”
![Page 13: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/13.jpg)
DWD Batch and Source Adjustment
For Perou’s Stanford Breast Cancer Data
Analysis in Benito, et al (2004) Bioinformatics
https://genome.unc.edu/pubsup/dwd/
Adjust for Source Effects: different sources of mRNA
Adjust for Batch Effects: arrays fabricated at different times
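The adjustment step itself can be sketched in a few lines. This is a minimal sketch, assuming the adjustment shifts each batch along the separating direction until the projected batch means coincide. No DWD solver is used here: the difference of batch means stands in for the DWD direction, purely so that the example is self-contained. The names `adjust_along_direction` and `projected_gap`, and the simulated batch effect, are hypothetical.

```python
import numpy as np

def adjust_along_direction(x, batch, w):
    """Shift each batch along direction w so its projected mean becomes zero."""
    w = w / np.linalg.norm(w)
    adjusted = np.array(x, dtype=float)  # work on a copy
    for label in np.unique(batch):
        rows = batch == label
        projected_mean = (adjusted[rows] @ w).mean()
        adjusted[rows] -= projected_mean * w
    return adjusted

def projected_gap(x, batch, w):
    """Difference of the two batch means after projecting onto w."""
    w = w / np.linalg.norm(w)
    return (x[batch == 1] @ w).mean() - (x[batch == 0] @ w).mean()

# Hypothetical HDLSS data: 30 arrays, 100 genes, two batches of 15.
rng = np.random.default_rng(2)
x = rng.standard_normal((30, 100))
batch = np.repeat([0, 1], 15)
x[batch == 1, :5] += 3.0  # made-up batch effect on the first 5 genes

# Stand-in for the DWD direction (assumption): difference of batch means.
w = x[batch == 1].mean(axis=0) - x[batch == 0].mean(axis=0)

x_adj = adjust_along_direction(x, batch, w)
gap_before = projected_gap(x, batch, w)
gap_after = projected_gap(x_adj, batch, w)
print(f"gap along direction before: {gap_before:.3f}, after: {gap_after:.2e}")
```

Any unit direction can be passed as `w`, so a direction from a real DWD solver would slot in without changing the adjustment function.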
![Page 14: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/14.jpg)
DWD Adj: Raw Breast Cancer data
![Page 15: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/15.jpg)
DWD Adj: Source Colors
![Page 16: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/16.jpg)
DWD Adj: Batch Colors
![Page 17: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/17.jpg)
DWD Adj: Biological Class Colors
![Page 18: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/18.jpg)
DWD Adj: Biological Class Colors & Symbols
![Page 19: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/19.jpg)
DWD Adj: Biological Class Symbols
![Page 20: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/20.jpg)
DWD Adj: Source Colors
![Page 21: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/21.jpg)
DWD Adj: PC 1-2 & DWD direction
![Page 22: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/22.jpg)
DWD Adj: DWD Source Adjustment
![Page 23: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/23.jpg)
DWD Adj: Source Adj’d, PCA view
![Page 24: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/24.jpg)
DWD Adj: Source Adj’d, Class Colored
![Page 25: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/25.jpg)
DWD Adj: Source Adj’d, Batch Colored
![Page 26: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/26.jpg)
DWD Adj: Source Adj’d, 5 PCs
![Page 27: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/27.jpg)
DWD Adj: S. Adj’d, Batch 1,2 vs. 3 DWD
![Page 28: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/28.jpg)
DWD Adj: S. & B1,2 vs. 3 Adjusted
![Page 29: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/29.jpg)
DWD Adj: S. & B1,2 vs. 3 Adj’d, 5 PCs
![Page 30: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/30.jpg)
DWD Adj: S. & B Adj’d, B1 vs. 2 DWD
![Page 31: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/31.jpg)
DWD Adj: S. & B Adj’d, B1 vs. 2 Adj’d
![Page 32: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/32.jpg)
DWD Adj: S. & B Adj’d, 5 PC view
![Page 33: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/33.jpg)
DWD Adj: S. & B Adj’d, 4 PC view
![Page 34: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/34.jpg)
DWD Adj: S. & B Adj’d, Class Colors
![Page 35: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/35.jpg)
DWD Adj: S. & B Adj’d, Adj’d PCA
![Page 36: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/36.jpg)
DWD Bias Adjustment for Microarrays
Effective for Batch and Source Adj.
Also works for cross-platform Adj.
E.g. cDNA & Affy
Despite literature claiming contrary
“Gene by Gene” vs. “Multivariate” views
Funded as part of caBIG
“cancer Biomedical Informatics Grid”
“Data Combination Effort” of NCI
![Page 37: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/37.jpg)
Why not adjust by means?
DWD is complicated: value added?
Xuxin Liu example…
Key is sizes of biological subtypes
Differing ratio trips up mean
But DWD more robust
(although still not perfect)
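The way differing subtype ratios trip up mean adjustment can be seen in a two-dimensional toy case. This is a minimal sketch with made-up numbers, not the Xuxin Liu example: subtypes A and B sit at +2 and -2 on a “biological” axis, the batch effect lives on a separate axis, and the two batches contain the subtypes in 40:10 and 10:40 proportions. Only the mean-centering side is shown; no DWD fit is attempted here.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_batch(n_a, n_b, batch_shift, noise=0.1):
    """Hypothetical batch: axis 0 is biology, axis 1 carries the batch effect."""
    subtype = np.array(["A"] * n_a + ["B"] * n_b)
    x = noise * rng.standard_normal((n_a + n_b, 2))
    x[:, 0] += np.where(subtype == "A", 2.0, -2.0)
    x[:, 1] += batch_shift
    return x, subtype

x1, s1 = make_batch(40, 10, batch_shift=+3.0)  # batch 1: mostly subtype A
x2, s2 = make_batch(10, 40, batch_shift=-3.0)  # batch 2: mostly subtype B

# Mean adjustment: subtract each batch's own mean vector.
x1_adj = x1 - x1.mean(axis=0)
x2_adj = x2 - x2.mean(axis=0)

# The batch axis is fixed, but subtype A no longer lines up across batches.
batch_gap = x1_adj[:, 1].mean() - x2_adj[:, 1].mean()
gap_a = x1_adj[s1 == "A", 0].mean() - x2_adj[s2 == "A", 0].mean()
print(f"batch-axis gap after centering: {batch_gap:.2e}")
print(f"subtype A gap on the biological axis: {gap_a:.2f}")
```

Without noise, the batch means on the biological axis are 0.8·2 + 0.2·(-2) = 1.2 and -1.2, so centering pulls subtype A to 0.8 in one batch and 3.2 in the other: the unequal ratios turn a real biological difference into an artificial offset of 2.4.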
![Page 38: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/38.jpg)
Twiddle ratios of subtypes
![Page 39: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/39.jpg)
DWD in Face Recognition, I
Face Images as Data
(with M. Benito & D. Peña)
Registered using landmarks
Male – Female Difference?
Discrimination Rule?
![Page 40: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/40.jpg)
DWD in Face Recognition, II
DWD Direction
Good separation
Images “make sense”
Garbage at ends? (extrapolation effects?)
![Page 41: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/41.jpg)
DWD in Face Recognition, III
Interesting summary:
Jump between means
(in DWD direction)
Clear separation of Maleness vs. Femaleness
![Page 42: 1 UNC, Stat & OR Hailuoto Workshop Object Oriented Data Analysis, II J. S. Marron Dept. of Statistics and Operations Research, University of North Carolina](https://reader036.vdocuments.us/reader036/viewer/2022062517/56649f335503460f94c50b62/html5/thumbnails/42.jpg)
DWD in Face Recognition, IV
Current Work:
Focus on “drivers”:
(regions of interest)
Relation to Discr’n?
Which is “best”?
Lessons for human perception?