Isaac Newton Institute - Cambridge
DESCRIPTION
Isaac Newton Institute - Cambridge. Object Oriented Data Analysis. J. S. Marron, Dept. of Statistics and Operations Research, University of North Carolina. September 1, 2014. Personal Opinions on Mathematical Statistics. What is Mathematical Statistics? Validation of existing methods
TRANSCRIPT
![Page 1: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/1.jpg)
UNC, Stat & OR
Isaac Newton Institute - Cambridge
Object Oriented Data Analysis
J. S. Marron
Dept. of Statistics and Operations Research,
University of North Carolina
April 21, 2023
![Page 2: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/2.jpg)
Personal Opinions on Mathematical Statistics
What is Mathematical Statistics?
Validation of existing methods
Asymptotics (n → ∞) & Taylor expansion
Comparison of existing methods
(requires hard math, but really “accounting”???)
![Page 3: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/3.jpg)
Personal Opinions on Mathematical Statistics
What could Mathematical Statistics be?
Basis for invention of new methods
Complicated data → mathematical ideas
Do we value creativity?
Since we don’t do this, others do…
(where are the ₤₤₤s???)
![Page 4: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/4.jpg)
Personal Opinions on Mathematical Statistics
Since we don’t do this, others do…
Pattern Recognition
Artificial Intelligence
Neural Nets
Data Mining
Machine Learning
???
![Page 5: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/5.jpg)
Personal Opinions on Mathematical Statistics
Possible Litmus Test:
Creative Statistics
Clinical Trials Viewpoint:
Worst Imaginable Idea
Mathematical Statistics Viewpoint:
???
![Page 6: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/6.jpg)
Object Oriented Data Analysis, I
What is the “atom” of a statistical analysis?
1st Course: Numbers
Multivariate Analysis Course: Vectors
Functional Data Analysis: Curves
More generally: Data Objects
![Page 7: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/7.jpg)
Object Oriented Data Analysis, II
Examples:
Medical Image Analysis
Images as Data Objects?
Shape Representations as Objects
Micro-arrays
Just multivariate analysis?
![Page 8: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/8.jpg)
Object Oriented Data Analysis, III
Typical Goals:
Understanding population variation
Visualization
Principal Component Analysis +
Discrimination (a.k.a. Classification)
Time Series of Data Objects
![Page 9: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/9.jpg)
Object Oriented Data Analysis, IV
Major Statistical Challenge, I:
High Dimension Low Sample Size (HDLSS)
Dimension d >> sample size n
“Multivariate Analysis” nearly useless
Can’t “normalize the data”
Land of Opportunity for Statisticians
Need for “creative statisticians”
![Page 10: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/10.jpg)
Object Oriented Data Analysis, V
Major Statistical Challenge, II:
Data may live in non-Euclidean space
Lie Group / Symmetric Spaces (manifold data)
Trees/Graphs as data objects
Interesting Issues:
What is “the mean” (pop’n center)?
How do we quantify “pop’n variation”?
![Page 11: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/11.jpg)
Statistics in Image Analysis, I
First Generation Problems:
Denoising
Segmentation
Registration
(all about single images)
![Page 12: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/12.jpg)
Statistics in Image Analysis, II
Second Generation Problems:
Populations of Images
Understanding Population Variation
Discrimination (a.k.a. Classification)
Complex Data Structures (& Spaces)
HDLSS Statistics
![Page 13: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/13.jpg)
HDLSS Statistics in Imaging
Why HDLSS (High Dim, Low Sample Size)?
Complex 3-d Objects Hard to Represent
Often need d = 100’s of parameters
Complex 3-d Objects Costly to Segment
Often have n = 10’s cases
![Page 14: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/14.jpg)
Medical Imaging – A Challenging Example
Male Pelvis
Bladder – Prostate – Rectum
How do they move over time (days)?
Critical to Radiation Treatment (cancer)
Work with 3-d CT
Very Challenging to Segment
Find boundary of each object?
Represent each Object?
![Page 15: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/15.jpg)
Male Pelvis – Raw Data
One CT Slice (in 3d image)
Coccyx (Tail Bone)
Rectum
Prostate
![Page 16: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/16.jpg)
Male Pelvis – Raw Data
Prostate:
manual segmentation
Slice by slice
Reassembled
![Page 17: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/17.jpg)
Male Pelvis – Raw Data
Prostate:
Slices: Reassembled in 3d
How to represent?
Thanks: Ja-Yeon Jeong
![Page 18: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/18.jpg)
Object Representation
Landmarks (hard to find)
Boundary Rep’ns (no correspondence)
Medial representations
Find “skeleton”
Discretize as “atoms” called M-reps
![Page 19: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/19.jpg)
3-d m-reps
Bladder – Prostate – Rectum (multiple objects, J. Y. Jeong)
• Medial Atoms provide “skeleton”
• Implied Boundary from “spokes” → “surface”
![Page 20: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/20.jpg)
3-d m-reps
M-rep model fitting
• Easy, when starting from binary (blue)
• But very expensive (30 – 40 minutes technician’s time)
• Want automatic approach
• Challenging, because of poor contrast, noise, …
• Need to borrow information across training sample
• Use Bayes approach: prior & likelihood → posterior
• ~Conjugate Gaussians, but there are issues:
• Major HDLSS challenges
• Manifold aspect of data
![Page 21: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/21.jpg)
PCA for m-reps, I
Major issue: m-reps live in $\mathbb{R}^3 \times \mathbb{R}^+ \times SO(3) \times SO(2)$
(locations, radius and angles)
E.g. “average” of: $2^\circ, 3^\circ, 358^\circ, 359^\circ$ = ???
Natural Data Structure is:
Lie Groups ~ Symmetric spaces
(smooth, curved manifolds)
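The “average of angles” question can be made concrete with a minimal Python sketch. It assumes the slide's example angles are 2, 3, 358 and 359 degrees, and it uses the mean direction of unit vectors, which is only one of several possible notions of a mean on the circle and is not claimed to be the method used in the talk. The function name `circular_mean_deg` is illustrative.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees: average the unit vectors, then read off the angle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

angles = [2.0, 3.0, 358.0, 359.0]
arithmetic = sum(angles) / len(angles)   # treats angles as plain numbers
circular = circular_mean_deg(angles)     # respects the wrap-around at 360
print(arithmetic, circular)
```

The arithmetic mean points in nearly the opposite direction from all four angles, while the circular mean stays close to 0 degrees, which is the motivation for treating such data as living on a manifold.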
![Page 22: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/22.jpg)
PCA for m-reps, II
PCA on non-Euclidean spaces?
(i.e. on Lie Groups / Symmetric Spaces)
T. Fletcher: Principal Geodesic Analysis
Idea: replace “linear summary of data”
With “geodesic summary of data”…
![Page 23: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/23.jpg)
PGA for m-reps, Bladder-Prostate-Rectum
Bladder – Prostate – Rectum, 1 person, 17 days
PG 1 PG 2 PG 3
(analysis by Ja Yeon Jeong)
![Page 24: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/24.jpg)
![Page 25: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/25.jpg)
![Page 26: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/26.jpg)
HDLSS Classification (i.e. Discrimination)
Background: Two Class (Binary) version:
Using “training data” from Class +1, and from Class -1
Develop a “rule” for assigning new data to a Class
Canonical Example: Disease Diagnosis
New Patients are “Healthy” or “Ill”
Determined based on measurements
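As a toy illustration of what a “rule” built from two-class training data can look like, here is a minimal sketch of a mean-difference (nearest centroid) rule. This particular rule is an assumption chosen for simplicity and is not one of the methods advocated in the talk; the data values and the names `centroid` and `mean_difference_rule` are made up for the example.

```python
def centroid(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def mean_difference_rule(class_plus, class_minus):
    """Return a function assigning +1 or -1 by the side of the midpoint hyperplane."""
    m_plus, m_minus = centroid(class_plus), centroid(class_minus)
    w = [a - b for a, b in zip(m_plus, m_minus)]            # direction between class means
    mid = [(a + b) / 2.0 for a, b in zip(m_plus, m_minus)]  # point halfway between means

    def classify(x):
        score = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))
        return 1 if score >= 0 else -1

    return classify

healthy = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2]]   # Class +1 (hypothetical measurements)
ill = [[4.0, 5.0], [4.2, 4.8], [3.9, 5.3]]       # Class -1 (hypothetical measurements)
rule = mean_difference_rule(healthy, ill)
print(rule([1.2, 2.1]), rule([4.1, 5.0]))
```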
![Page 27: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/27.jpg)
HDLSS Classification (Cont.)
Ineffective Methods:
Fisher Linear Discrimination
Gaussian Likelihood Ratio
Less Useful Methods:
Nearest Neighbors
Neural Nets
(“black boxes”, no “directions” or intuition)
![Page 28: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/28.jpg)
HDLSS Classification (Cont.)
Currently Fashionable Methods:
Support Vector Machines
Trees Based Approaches
New High Tech Method
Distance Weighted Discrimination (DWD)
Specially designed for HDLSS data
Avoids “data piling” problem of SVM
Solves more suitable optimization problem
![Page 29: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/29.jpg)
HDLSS Classification (Cont.)
Currently Fashionable Methods:
Trees Based Approaches
Support Vector Machines:
![Page 30: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/30.jpg)
Distance Weighted Discrimination
Maximal Data Piling
![Page 31: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/31.jpg)
Distance Weighted Discrimination
Based on Optimization Problem:
$\min_{w,b} \sum_{i=1}^{n} \frac{1}{r_i}$
More precisely: work in appropriate penalty for violations
Optimization Method (Michael Todd):
Second Order Cone Programming
Still Convex generalization of quadratic programming
Fast greedy solution
Can use existing software
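To make the objective concrete, the sketch below evaluates the sum of reciprocal residuals for a candidate direction on toy data. It assumes that the residual is r_i = y_i (w·x_i + b) with w scaled to unit length, which matches the usual description of DWD but is not spelled out on the slide. It only evaluates the objective and does not solve the second order cone program; the penalty for violations is omitted. The function name `dwd_objective` and the data are illustrative.

```python
def dwd_objective(X, y, w, b):
    """Sum of 1/r_i, assuming r_i = y_i * (w . x_i + b) with w rescaled to unit length."""
    norm = sum(wi * wi for wi in w) ** 0.5
    total = 0.0
    for x_i, y_i in zip(X, y):
        r_i = y_i * (sum(wi * xi for wi, xi in zip(w, x_i)) / norm + b)
        if r_i <= 0:
            # The full method handles this case with a penalty term, omitted here.
            raise ValueError("point on the wrong side of the hyperplane")
        total += 1.0 / r_i
    return total

X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
along_axis = dwd_objective(X, y, [1.0, 0.0], 0.0)
diagonal = dwd_objective(X, y, [1.0, 1.0], 0.0)
print(along_axis, diagonal)
```

Because every point contributes to the sum, with nearby points weighted most heavily, the objective responds to the whole data set rather than only to the points closest to the boundary.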
![Page 32: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/32.jpg)
DWD Bias Adjustment for Microarrays
Microarray data:
Simult. Measur’ts of “gene expression”
Intrinsically HDLSS
Dimension d ~ 1,000s – 10,000s
Sample Sizes n ~ 10s – 100s
My view: Each array is “point in cloud”
![Page 33: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/33.jpg)
DWD Batch and Source Adjustment
For Perou’s Stanford Breast Cancer Data
Analysis in Benito, et al (2004) Bioinformatics
https://genome.unc.edu/pubsup/dwd/
Adjust for Source Effects
Different sources of mRNA
Adjust for Batch Effects
Arrays fabricated at different times
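The adjustment idea can be sketched as: find a direction separating two batches, project each batch onto it, and shift each batch along that direction so its projected mean is zero. In the minimal sketch below the mean-difference direction stands in for the DWD direction (an assumption made to keep the example self-contained), and the data and function names are hypothetical.

```python
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def adjust_two_batches(batch_a, batch_b):
    """Shift each batch along a separating direction so projected batch means coincide."""
    m_a, m_b = centroid(batch_a), centroid(batch_b)
    diff = [a - b for a, b in zip(m_a, m_b)]
    length = dot(diff, diff) ** 0.5
    u = [x / length for x in diff]   # stand-in for the DWD direction (assumption)

    def remove_shift(batch):
        mean_proj = sum(dot(row, u) for row in batch) / len(batch)
        return [[x - mean_proj * ui for x, ui in zip(row, u)] for row in batch]

    return remove_shift(batch_a), remove_shift(batch_b), u

batch_a = [[1.0, 0.0, 2.0], [2.0, 1.0, 3.0], [3.0, 0.0, 2.5]]
batch_b = [[6.0, 5.0, 2.0], [7.0, 6.0, 3.0], [8.0, 5.0, 2.5]]
adj_a, adj_b, u = adjust_two_batches(batch_a, batch_b)
gap_before = abs(dot(centroid(batch_a), u) - dot(centroid(batch_b), u))
gap_after = abs(dot(centroid(adj_a), u) - dot(centroid(adj_b), u))
print(gap_before, gap_after)
```

Only the component along the separating direction changes; coordinates orthogonal to it, where the biological signal is hoped to live, are left untouched.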
![Page 34: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/34.jpg)
DWD Adj: Raw Breast Cancer data
![Page 35: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/35.jpg)
DWD Adj: Source Colors
![Page 36: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/36.jpg)
DWD Adj: Batch Colors
![Page 37: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/37.jpg)
DWD Adj: Biological Class Colors
![Page 38: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/38.jpg)
DWD Adj: Biological Class Colors & Symbols
![Page 39: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/39.jpg)
DWD Adj: Biological Class Symbols
![Page 40: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/40.jpg)
DWD Adj: Source Colors
![Page 41: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/41.jpg)
DWD Adj: PC 1-2 & DWD direction
![Page 42: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/42.jpg)
DWD Adj: DWD Source Adjustment
![Page 43: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/43.jpg)
DWD Adj: Source Adj’d, PCA view
![Page 44: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/44.jpg)
DWD Adj: Source Adj’d, Class Colored
![Page 45: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/45.jpg)
DWD Adj: Source Adj’d, Batch Colored
![Page 46: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/46.jpg)
DWD Adj: Source Adj’d, 5 PCs
![Page 47: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/47.jpg)
DWD Adj: S. Adj’d, Batch 1,2 vs. 3 DWD
![Page 48: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/48.jpg)
DWD Adj: S. & B1,2 vs. 3 Adjusted
![Page 49: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/49.jpg)
DWD Adj: S. & B1,2 vs. 3 Adj’d, 5 PCs
![Page 50: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/50.jpg)
DWD Adj: S. & B Adj’d, B1 vs. 2 DWD
![Page 51: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/51.jpg)
DWD Adj: S. & B Adj’d, B1 vs. 2 Adj’d
![Page 52: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/52.jpg)
DWD Adj: S. & B Adj’d, 5 PC view
![Page 53: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/53.jpg)
DWD Adj: S. & B Adj’d, 4 PC view
![Page 54: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/54.jpg)
DWD Adj: S. & B Adj’d, Class Colors
![Page 55: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/55.jpg)
DWD Adj: S. & B Adj’d, Adj’d PCA
![Page 56: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/56.jpg)
DWD Bias Adjustment for Microarrays
Effective for Batch and Source Adj.
Also works for cross-platform Adj.
E.g. cDNA & Affy
Despite literature claiming contrary
“Gene by Gene” vs. “Multivariate” views
Funded as part of caBIG
“Cancer BioInformatics Grid”
“Data Combination Effort” of NCI
![Page 57: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/57.jpg)
Interesting Benchmark Data Set
NCI 60 Cell Lines
Interesting benchmark, since same cells
Data Web available:
http://discover.nci.nih.gov/datasetsNature2000.jsp
Both cDNA and Affymetrix Platforms
8 Major cancer subtypes
Use DWD now for visualization
![Page 58: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/58.jpg)
NCI 60: Views using DWD Dir’ns (focus on biology)
![Page 59: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/59.jpg)
DWD in Face Recognition, I
Face Images as Data
(with M. Benito & D. Peña)
Registered using landmarks
Male – Female Difference?
Discrimination Rule?
![Page 60: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/60.jpg)
DWD in Face Recognition, II
DWD Direction
Good separation
Images “make sense”
Garbage at ends?
(extrapolation effects?)
![Page 61: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/61.jpg)
Blood vessel tree data
Marron’s brain:
Segmented from MRA
Reconstruct trees in 3d
Rotate to view
![Page 62: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/62.jpg)
![Page 63: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/63.jpg)
![Page 64: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/64.jpg)
![Page 65: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/65.jpg)
![Page 66: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/66.jpg)
![Page 67: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/67.jpg)
Blood vessel tree data
Now look over many people (data objects)
Structure of population (understand variation?)
PCA in strongly non-Euclidean Space???
![Page 68: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/68.jpg)
Blood vessel tree data
Possible focus of analysis:
• Connectivity structure only (topology)
• Location, size, orientation of segments
• Structure within each vessel segment
![Page 69: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/69.jpg)
Blood vessel tree data
Present Focus:
Topology only
Already challenging
Later address others
Then add attributes
To tree nodes
And extend analysis
![Page 70: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/70.jpg)
Strongly Non-Euclidean Spaces
Statistics on Population of Tree-Structured Data Objects?
• Mean???
• Analog of PCA???
Strongly non-Euclidean, since:
• Space of trees not a linear space
• Not even approximately linear
(no tangent plane)
![Page 71: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/71.jpg)
Strongly Non-Euclidean Spaces
PCA on Tree Space?
Key Idea (Jim Ramsay):
• Replace 1-d subspace
that best approximates data
• By 1-d representation
that best approximates data
Wang and Marron (2007) define notion of
Treeline (in structure space)
![Page 72: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/72.jpg)
PCA for blood vessel tree data
Data Analytic Goals: Age, Gender
See these?
No…
![Page 73: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/73.jpg)
Preliminary Tree-Curve Results
First Correlation
Of Structure
To Age!
(Back Trees)
![Page 74: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/74.jpg)
HDLSS Asymptotics
Why study asymptotics?
![Page 75: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/75.jpg)
HDLSS Asymptotics
Why study asymptotics?
An interesting (naïve) quote:
“I don’t look at asymptotics, because
I don’t have an infinite sample size”
![Page 76: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/76.jpg)
HDLSS Asymptotics
Why study asymptotics?
An interesting (naïve) quote:
“I don’t look at asymptotics, because
I don’t have an infinite sample size”
Suggested perspective:
Asymptotics are a tool for finding simple
structure underlying complex entities
![Page 77: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/77.jpg)
HDLSS Asymptotics
Which asymptotics?
n → ∞ (classical, very widely done)
d → ∞ ???
Sensible?
Follow typical “sampling process”?
Say anything, as noise level increases???
![Page 78: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/78.jpg)
HDLSS Asymptotics
Which asymptotics?
n → ∞ & d → ∞
n >> d: a few results around
(still have classical info in data)
n ~ d: random matrices (Iain J., et al)
(nothing classically estimable)
HDLSS asymptotics: n fixed, d → ∞
![Page 79: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/79.jpg)
HDLSS Asymptotics
HDLSS asymptotics: n fixed, d → ∞
Follow typical “sampling process”?
![Page 80: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/80.jpg)
HDLSS Asymptotics
HDLSS asymptotics: n fixed, d → ∞
Follow typical “sampling process”?
Microarrays: # genes bounded
Proteomics, SNPs, …
A moot point, from perspective:
Asymptotics are a tool for finding simple structure underlying complex entities
![Page 81: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/81.jpg)
HDLSS Asymptotics
HDLSS asymptotics: n fixed, d → ∞
Say anything, as noise level increases???
![Page 82: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/82.jpg)
HDLSS Asymptotics
HDLSS asymptotics: n fixed, d → ∞
Say anything, as noise level increases???
Yes, there exists simple, perhaps surprising, underlying structure
![Page 83: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/83.jpg)
HDLSS Asymptotics: Simple Paradoxes, I
For $d$ dim’al “Standard Normal” dist’n: $Z = (Z_1, \ldots, Z_d)^T \sim N_d(0, I_d)$
Euclidean Distance to Origin (as $d \to \infty$): $\|Z\| = \sqrt{d} + O_p(1)$
- Data lie roughly on surface of sphere of radius $\sqrt{d}$
- Yet origin is point of “highest density”???
- Paradox resolved by:
“density w. r. t. Lebesgue Measure”
![Page 84: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/84.jpg)
HDLSS Asymptotics: Simple Paradoxes, II
For $d$ dim’al “Standard Normal” dist’n: $Z_1$ indep. of $Z_2 \sim N_d(0, I_d)$
Euclidean Dist. between $Z_1$ and $Z_2$ (as $d \to \infty$):
Distance tends to non-random constant: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$
Can extend to $Z_1, \ldots, Z_n$
Where do they all go???
(we can only perceive 3 dim’ns)
![Page 85: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/85.jpg)
HDLSS Asymptotics: Simple Paradoxes, III
For $d$ dim’al “Standard Normal” dist’n: $Z_1$ indep. of $Z_2 \sim N_d(0, I_d)$
High dim’al Angles (as $d \to \infty$): $\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$
- “Everything is orthogonal”???
- Where do they all go???
(again our perceptual limitations)
- Again 1st order structure is non-random
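The three paradoxes (length near √d, pairwise distance near √(2d), angle near 90°) can be checked with a small simulation. This is a minimal sketch using only the standard library; the dimension d = 10000, the seed, and the helper names are arbitrary choices for illustration.

```python
import math
import random

def standard_normal(d, rng):
    """One draw from the d-dimensional standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angle_deg(u, v):
    cosine = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

rng = random.Random(0)
d = 10000
z1, z2 = standard_normal(d, rng), standard_normal(d, rng)

length = norm(z1)                                   # compare with sqrt(d)
distance = norm([a - b for a, b in zip(z1, z2)])    # compare with sqrt(2d)
angle = angle_deg(z1, z2)                           # compare with 90 degrees
print(length / math.sqrt(d), distance / math.sqrt(2 * d), angle)
```

The two printed ratios should be close to 1 and the angle close to 90, with the deviations shrinking in relative terms as d grows.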
![Page 86: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/86.jpg)
HDLSS Asy’s: Geometrical Representation, I
Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Subspace Generated by Data
a. Hyperplane through 0, of dimension $n$
b. Points are “nearly equidistant to 0”, & dist $\sim \sqrt{d}$
c. Within plane, can “rotate towards $\sqrt{d} \times$ Unit Simplex”
d. All Gaussian data sets are “near Unit Simplex Vertices”!!!
“Randomness” appears only in rotation of simplex
With P. Hall & A. Neeman
![Page 87: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/87.jpg)
HDLSS Asy’s: Geometrical Representation, II
Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Hyperplane Generated by Data
a. $n - 1$ dimensional hyperplane
b. Points are pairwise equidistant, dist $\sim \sqrt{2d}$
c. Points lie at vertices of “regular $n$-hedron”
d. Again “randomness in data” is only in rotation
e. Surprisingly rigid structure in data?
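The “rigid structure” can be seen numerically: for a small Gaussian sample in very high dimension, all pairwise distances are nearly equal, so the points sit close to the vertices of a regular simplex. The sketch below assumes pairwise distances scale like √(2d); the values n = 5, d = 20000 and the seed are arbitrary illustrative choices.

```python
import itertools
import math
import random

rng = random.Random(1)
n, d = 5, 20000
data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Every pairwise distance, rescaled by the assumed limiting value sqrt(2d).
scaled = [distance(u, v) / math.sqrt(2 * d) for u, v in itertools.combinations(data, 2)]
print(min(scaled), max(scaled))
```

With all rescaled distances close to 1, the only remaining randomness is the orientation of the simplex, which is the point of item d on the slide.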
![Page 88: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/88.jpg)
HDLSS Asy’s: Geometrical Representation, III
Simulation View: shows “rigidity after rotation”
![Page 89: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/89.jpg)
HDLSS Asy’s: Geometrical Representation, III
Straightforward Generalizations:
non-Gaussian data: only need moments
non-independent: use “mixing conditions” (with P. Hall & A. Neeman)
Mild Eigenvalue condition on Theoretical Cov. (with J. Ahn, K. Muller & Y. Chi)
All based on simple “Laws of Large Numbers”
![Page 90: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/90.jpg)
HDLSS Asy’s: Geometrical Representation, IV
Explanation of Observed (Simulation) Behavior:
“everything similar for very high d”
2 popn’s are 2 simplices (i.e. regular n-hedrons)
All are same distance from the other class
i.e. everything is a support vector
i.e. all sensible directions show “data piling”
so “sensible methods are all nearly the same”
Including 1 - NN
![Page 91: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/91.jpg)
9191
UNC, Stat & OR
HDLSS Asy’s: Geometrical Representation, V
Further Consequences of Geometric Representation
1. Inefficiency of DWD for uneven sample size (motivates “weighted version”, work in progress)
2. DWD more “stable” than SVM (based on “deeper limiting distributions”) (reflects intuitive idea “feeling sampling variation”) (something like “mean vs. median”)
3. 1-NN rule inefficiency is quantified.
![Page 92: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/92.jpg)
2nd Paper on HDLSS Asymptotics
Ahn, Marron, Muller & Chi (2007) Biometrika
Assume 2nd Moments (and Gaussian)
Assume no eigenvalues too large in sense:
For $\varepsilon = \dfrac{\left(\sum_{j=1}^{d} \lambda_j\right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$ assume $\varepsilon^{-1} = o(d)$, i.e. $\varepsilon \gg 1/d$ (min possible)
(much weaker than previous mixing conditions…)
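To make the eigenvalue condition concrete, here is a small sketch that computes the sphericity-type quantity $\varepsilon = (\sum_j \lambda_j)^2 / (d \sum_j \lambda_j^2)$ for a few covariance spectra. The function name `sphericity` and the example spectra are my own illustrations; the reading of the condition as "$d\,\varepsilon \to \infty$" is an assumption based on the slide's "$\varepsilon \gg 1/d$".

```python
def sphericity(eigenvalues):
    """epsilon = (sum lambda_j)^2 / (d * sum lambda_j^2); lies between 1/d and 1."""
    d = len(eigenvalues)
    total = sum(eigenvalues)
    total_sq = sum(lam * lam for lam in eigenvalues)
    return total * total / (d * total_sq)


if __name__ == "__main__":
    d = 1000
    spectra = {
        "identity": [1.0] * d,                      # spherical case, epsilon = 1
        "one direction": [1.0] + [0.0] * (d - 1),   # all variance on one axis, epsilon = 1/d
        "mild spike": [d ** 0.5] + [1.0] * (d - 1), # hypothetical example spectrum
    }
    for name, lams in spectra.items():
        eps = sphericity(lams)
        print(f"{name:14s} epsilon = {eps:.4f}   d * epsilon = {d * eps:.1f}")
```

The two extreme spectra bracket the possible values: $\varepsilon = 1$ when all eigenvalues are equal, and $\varepsilon = 1/d$ (the "min possible") when a single eigenvalue carries all the variance.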
![Page 93: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/93.jpg)
HDLSS Math. Stat. of PCA, I
Consistency & Strong Inconsistency:
Spike Covariance Model (Johnstone & Paul)
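For Eigenvalues: $\lambda_{1,d} = d^{\alpha},\ \lambda_{2,d} = 1,\ \dots,\ \lambda_{d,d} = 1$
1st Eigenvector: $u_1$
How good are empirical versions, $\hat{\lambda}_{1,d}, \dots, \hat{\lambda}_{d,d}, \hat{u}_1$, as estimates?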
![Page 94: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/94.jpg)
HDLSS Math. Stat. of PCA, II
Consistency (big enough spike):
For $\alpha > 1$, $\mathrm{Angle}(u_1, \hat{u}_1) \to 0$
Strong Inconsistency (spike not big enough):
For $\alpha < 1$, $\mathrm{Angle}(u_1, \hat{u}_1) \to 90^{\circ}$
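The consistency versus strong inconsistency dichotomy can be explored with a small simulation. The sketch below assumes Gaussian data with covariance $\mathrm{diag}(d^{\alpha}, 1, \dots, 1)$, so that $u_1$ is the first coordinate axis, and treats the mean as known to be zero (no centering). Computing $\hat{u}_1$ by power iteration on the $n \times n$ dual Gram matrix is a computational shortcut of mine, not a method from the talk, and `top_eigvec_angle` is an illustrative name.

```python
import math
import random


def top_eigvec_angle(d, alpha, n=10, seed=0, iters=500):
    """Angle in degrees between u_1 = e_1 and the first sample eigenvector."""
    rng = random.Random(seed)
    spike_sd = math.sqrt(d ** alpha)
    # Rows are observations from N_d(0, diag(d^alpha, 1, ..., 1)).
    X = [[rng.gauss(0.0, spike_sd)] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
         for _ in range(n)]
    # n x n dual Gram matrix X X^T; it shares its nonzero eigenvalues with X^T X.
    G = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
    # Power iteration for the leading eigenvector of G (approximate when the gap is small).
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(g * vj for g, vj in zip(row, v)) for row in G]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Map back to feature space: u_hat is proportional to X^T v.
    u_hat = [sum(v[i] * X[i][k] for i in range(n)) for k in range(d)]
    u_norm = math.sqrt(sum(x * x for x in u_hat))
    cosine = min(1.0, abs(u_hat[0]) / u_norm)
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    for alpha in (0.2, 1.8):
        for d in (100, 1000, 10000):
            print(f"alpha = {alpha}  d = {d:5d}  angle = {top_eigvec_angle(d, alpha):.1f} degrees")
```

With $n$ fixed, the angle for an exponent above 1 should head toward 0 as $d$ grows, while for an exponent below 1 it should drift toward 90 degrees, though the second convergence can be slow for exponents close to 1.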
![Page 95: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/95.jpg)
HDLSS Math. Stat. of PCA, III
Consistency of eigenvalues?
Eigenvalues Inconsistent: $\dfrac{\hat{\lambda}_{1,d}}{\lambda_{1,d}} \xrightarrow{L} \dfrac{\chi_n^2}{n}$ as $d \to \infty$
But known distribution
Unless $n \to \infty$ as well
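A hedged numerical check of the eigenvalue statement: if the ratio of the first sample eigenvalue to the first population eigenvalue behaves like $\chi_n^2 / n$ for large $d$, then over repeated samples it should have mean near 1 and variance near $2/n$, and that variance should not vanish while $n$ stays fixed. The sketch assumes Gaussian data, a single spike $d^{\alpha}$ with $\alpha = 2$ (my choice, inside the consistency regime), and a known zero mean; `eigenvalue_ratio` and `simulate_ratios` are illustrative names.

```python
import math
import random
import statistics


def eigenvalue_ratio(d, alpha, n, rng, iters=50):
    """lambda_hat_1 / lambda_1 for one sample from N_d(0, diag(d^alpha, 1, ..., 1))."""
    lam1 = d ** alpha
    X = [[rng.gauss(0.0, math.sqrt(lam1))] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
         for _ in range(n)]
    # Dual matrix X X^T / n has the same top eigenvalue as the sample covariance X^T X / n.
    G = [[sum(a * b for a, b in zip(xi, xj)) / n for xj in X] for xi in X]
    # Power iteration; after the first step v is a unit vector, so `top` tracks ||G v||.
    v = [1.0] * n
    top = 0.0
    for _ in range(iters):
        w = [sum(g * vj for g, vj in zip(row, v)) for row in G]
        top = math.sqrt(sum(x * x for x in w))
        v = [x / top for x in w]
    return top / lam1


def simulate_ratios(d=500, alpha=2.0, n=5, reps=200, seed=0):
    """Repeat the experiment `reps` times with one seeded generator."""
    rng = random.Random(seed)
    return [eigenvalue_ratio(d, alpha, n, rng) for _ in range(reps)]


if __name__ == "__main__":
    n = 5
    ratios = simulate_ratios(n=n)
    print(f"mean of ratio:     {statistics.mean(ratios):.3f}  (chi^2_n / n has mean 1)")
    print(f"variance of ratio: {statistics.variance(ratios):.3f}  (chi^2_n / n has variance {2 / n:.3f})")
```

The point of the sketch is that the spread of the ratio is governed by $n$, not $d$: increasing $d$ alone does not make the sample eigenvalue settle on the population value.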
![Page 96: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/96.jpg)
HDLSS Work in Progress, II
Canonical Correlations: Myung Hee Lee
Results similar to those for PCA
Singular values inconsistent
But directions converge under a much milder spike assumption.
![Page 97: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/97.jpg)
HDLSS Work in Progress, III
Conditions for Geo. Rep’n & PCA Consist.:
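John Kent example: $X \sim \tfrac{1}{2} N_d(0, I_d) + \tfrac{1}{2} N_d(0, 100 \cdot I_d)$
Can only say: $\|X\| = O_p(d^{1/2})$, with $\|X\| \approx \begin{cases} d^{1/2} & \text{w.p. } 1/2 \\ 10\, d^{1/2} & \text{w.p. } 1/2 \end{cases}$
not deterministic
Conclude: need some flavor of mixing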
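The John Kent mixture is easy to simulate. The sketch below, assuming the mixture is an equal-weight blend of $N_d(0, I_d)$ and $N_d(0, 100 \cdot I_d)$, draws a few vectors and reports $\|X\| / \sqrt{d}$; the name `kent_scaled_norms` is illustrative.

```python
import math
import random


def kent_scaled_norms(d, n_draws, seed=0):
    """Draw from (1/2) N_d(0, I_d) + (1/2) N_d(0, 100 I_d); return ||X|| / sqrt(d) per draw."""
    rng = random.Random(seed)
    scaled = []
    for _ in range(n_draws):
        # One coin flip per vector selects the component, so all coordinates share a scale.
        sd = 1.0 if rng.random() < 0.5 else 10.0
        norm = math.sqrt(sum(rng.gauss(0.0, sd) ** 2 for _ in range(d)))
        scaled.append(norm / math.sqrt(d))
    return scaled


if __name__ == "__main__":
    for value in kent_scaled_norms(d=2000, n_draws=10):
        print(f"||X|| / sqrt(d) = {value:.3f}")
```

Each scaled norm should land near 1 or near 10, never in between, so the norm does not concentrate on a single deterministic value even though every coordinate has finite moments. The coordinates are uncorrelated but tied together through the shared scale, which is why some mixing-type condition is needed.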
![Page 98: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/98.jpg)
HDLSS Work in Progress, III
Conditions for Geo. Rep’n & PCA Consist.:
Conclude: need some flavor of mixing
Challenge: Classical mixing conditions require notion of time ordering
Not always clear, e.g. microarrays
![Page 99: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/99.jpg)
HDLSS Work in Progress, III
Conditions for Geo. Rep’n & PCA Consist.:
Sungkyu Jung Condition: $X_d \sim (0, \Sigma_d)$ where $\Sigma_d = U_d \Lambda_d U_d^t$
Define: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$
Assume: ∃ a permutation, $\pi_d$,
So that $\pi_d Z_d$ is ρ-mixing
![Page 100: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/100.jpg)
HDLSS Deep Open Problem
In PCA Consistency:
Strong Inconsistency - spike $\alpha < 1$
Consistency - spike $\alpha > 1$
What happens at boundary ($\alpha = 1$)???
![Page 101: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/101.jpg)
The Future of HDLSS Asymptotics?
1. Address your favorite statistical problem…
2. HDLSS versions of classical optimality results?
3. Contiguity Approach (~Random Matrices)
4. Rates of convergence?
5. Improved Discrimination Methods?
It is early days…
![Page 102: Isaac Newton Institute - Cambridge](https://reader035.vdocuments.us/reader035/viewer/2022062410/568151e5550346895dc01f63/html5/thumbnails/102.jpg)
Some Carry Away Lessons
Atoms of the Analysis: Object Oriented
Viewpoint: Object Space ↔ Feature Space
DWD is attractive for HDLSS classification
“Randomness” in HDLSS data is only in rotations
(Modulo rotation, have constant simplex shape)
How to put HDLSS asymptotics to work?