Support Vector Machines
![Page 1: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/1.jpg)
Support Vector MachinesGraphical View, using Toy Example:
![Page 2: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/2.jpg)
Distance Weighted Discrim’n
Graphical View, using Toy Example:
![Page 3: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/3.jpg)
HDLSS Discrim’n Simulations
Wobble Mixture:
![Page 4: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/4.jpg)
HDLSS Discrim’n Simulations
Wobble Mixture: 80% dim. 1, other dims 0; 20% dim. 1 ±0.1, rand dim ±100, others 0
• MD still very bad, driven by outliers
• SVM & DWD are both very robust
• SVM loses (affected by margin push)
• DWD slightly better (by w’ted influence)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
![Page 5: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/5.jpg)
Melanoma Data
• Study Differences Between (Malignant) Melanoma & (Benign) Nevi
• Use Image Features as Before (Recall from Transformation Discussion)
• Paper: Miedema et al (2012)
![Page 6: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/6.jpg)
March 17, 2010
Background / Introduction
Clinical diagnosis
![Page 7: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/7.jpg)
ROC Curve: Slide Cutoff To Trace Out Curve
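The slide's picture of sliding a cutoff to trace out the curve can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical scores and labels, not the melanoma features:

```python
import numpy as np

def roc_curve(scores, labels):
    """Trace out the ROC curve by sliding a cutoff across the scores:
    each cutoff gives one (false positive rate, true positive rate) point."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels == 1)              # true positives above each cutoff
    fps = np.cumsum(labels == 0)              # false positives above each cutoff
    tpr = np.concatenate(([0.0], tps / max(tps[-1], 1)))
    fpr = np.concatenate(([0.0], fps / max(fps[-1], 1)))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve, by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# A perfect scorer ranks every positive above every negative, so AUC = 1
fpr, tpr = roc_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Each distinct cutoff contributes one point of the curve, and the AUC values quoted on the later slides (0.93, 0.95) are the areas under such curves.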
![Page 8: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/8.jpg)
Melanoma Data: Subclass DWD Direction
![Page 9: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/9.jpg)
Melanoma Data: Full Data DWD Direction
![Page 10: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/10.jpg)
Melanoma Data: Full Data ROC Analysis, AUC = 0.93
![Page 11: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/11.jpg)
Melanoma Data: SubClass ROC Analysis, AUC = 0.95
Better, Makes Intuitive Sense
![Page 12: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/12.jpg)
Clustering
Idea: Given data $X_1, \ldots, X_n$
• Assign each object to a class
• Of similar objects
• Completely data driven
• I.e. assign labels to data
• “Unsupervised Learning”
Contrast to Classification (Discrimination)
• With predetermined classes
• “Supervised Learning”
![Page 13: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/13.jpg)
K-means Clustering
Clustering Goal:
• Given data $X_1, \ldots, X_n$
• Choose classes $C_1, \ldots, C_K$
• To minimize
$$ \mathrm{CI}(C_1, \ldots, C_K) = \frac{\sum_{j=1}^{K} \sum_{i \in C_j} \| X_i - \bar{X}_j \|^2}{\sum_{i=1}^{n} \| X_i - \bar{X} \|^2} $$
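In code, the cluster index is just the within-class sum of squares about the class means, divided by the total sum of squares about the overall mean. A minimal NumPy sketch, not tied to any particular k-means implementation:

```python
import numpy as np

def cluster_index(X, labels):
    """CI = within-class sum of squares about the class means,
    divided by the total sum of squares about the overall mean.
    Lies in [0, 1]; small values mean tight, well-separated classes."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return float(within / total)

# Two tight, well-separated classes give a CI near 0;
# putting everything in one class gives CI = 1
X = np.array([[0.0], [0.1], [10.0], [10.1]])
ci = cluster_index(X, np.array([0, 0, 1, 1]))
```

K-means clustering is then the search over class assignments that minimizes this quantity.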
![Page 14: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/14.jpg)
2-means Clustering
![Page 15: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/15.jpg)
2-means Clustering
Study CI, using simple 1-d examples
• Over changing Classes (moving b’dry)
• Multi-modal data interesting effects
– Local mins can be hard to find
– i.e. iterative procedures can “get stuck”
(even in 1 dimension, with K = 2)
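The "getting stuck" phenomenon is easy to reproduce with plain Lloyd iteration on toy 1-d data (the data and starting centers below are hypothetical, chosen only to exhibit two different fixed points):

```python
import numpy as np

def lloyd_1d(x, centers, iters=100):
    """Plain Lloyd (k-means) iteration in 1-d: assign each point to its
    nearest center, then move each center to its class mean."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == k].mean() for k in range(len(centers))])
    return labels

def ci_1d(x, labels):
    """Cluster index for 1-d data."""
    x = np.asarray(x, dtype=float)
    within = sum(((x[labels == k] - x[labels == k].mean()) ** 2).sum()
                 for k in np.unique(labels))
    return float(within / ((x - x.mean()) ** 2).sum())

# Tri-modal 1-d data, K = 2: two starting points converge to two
# different fixed points, and only one is the better clustering
x = [0.0, 0.2, 4.0, 4.2, 10.0, 10.2]
stuck = lloyd_1d(x, [0.0, 4.0])   # converges to {0, 0.2} vs. the rest
good = lloyd_1d(x, [4.0, 10.0])   # converges to {10, 10.2} split off
```

Both partitions are fixed points of the iteration, but they have different CI values, so the iterative procedure really can get stuck at a local minimum, even in 1 dimension with K = 2.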
![Page 16: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/16.jpg)
SWISS Score
Another Application of CI (Cluster Index)
Cabanski et al (2010)
![Page 17: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/17.jpg)
SWISS Score
Another Application of CI (Cluster Index)
Cabanski et al (2010)
Idea: Use CI in bioinformatics to “measure quality of data preprocessing”
![Page 18: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/18.jpg)
SWISS Score
Another Application of CI (Cluster Index)
Cabanski et al (2010)
Idea: Use CI in bioinformatics to “measure quality of data preprocessing”
Philosophy: Clusters Are Scientific Goal, So Want to Accentuate Them
![Page 19: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/19.jpg)
SWISS Score
Toy Examples (2-d): Which are “More Clustered?”
![Page 20: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/20.jpg)
SWISS Score
Toy Examples (2-d): Which are “More Clustered?”
![Page 21: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/21.jpg)
SWISS Score
SWISS = Standardized Within class Sum of Squares
$$ \mathrm{SWISS}(C_1, \ldots, C_K) = \mathrm{CI}(C_1, \ldots, C_K) = \frac{\sum_{j=1}^{K} \sum_{i \in C_j} \| X_i - \bar{X}_j \|^2}{\sum_{i=1}^{n} \| X_i - \bar{X} \|^2} $$
![Page 22: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/22.jpg)
SWISS Score
SWISS =
Standardized Within class Sum of Squares
![Page 23: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/23.jpg)
SWISS Score
SWISS =
Standardized Within class Sum of Squares
![Page 24: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/24.jpg)
SWISS Score
SWISS =
Standardized Within class Sum of Squares
![Page 25: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/25.jpg)
SWISS Score
Nice Graphical Introduction:
![Page 26: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/26.jpg)
SWISS Score
Nice Graphical Introduction:
![Page 27: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/27.jpg)
SWISS Score
Nice Graphical Introduction:
![Page 28: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/28.jpg)
SWISS Score
Revisit Toy Examples (2-d): Which are “More Clustered?”
![Page 29: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/29.jpg)
SWISS Score
Toy Examples (2-d): Which are “More Clustered?”
![Page 30: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/30.jpg)
SWISS Score
Toy Examples (2-d): Which are “More Clustered?”
![Page 31: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/31.jpg)
SWISS Score
Now Consider K > 2:
How well does it work?
![Page 32: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/32.jpg)
SWISS Score
K = 3, Toy Example:
![Page 33: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/33.jpg)
SWISS Score
K = 3, Toy Example:
![Page 34: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/34.jpg)
SWISS Score
K = 3, Toy Example:
![Page 35: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/35.jpg)
SWISS Score
K = 3, Toy Example:
But C Seems “Better Clustered”
![Page 36: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/36.jpg)
SWISS Score
K-Class SWISS:
Instead of using K-Class CI
Use Average of Pairwise SWISS Scores
![Page 37: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/37.jpg)
SWISS Score
K-Class SWISS:
Instead of using K-Class CI
Use Average of Pairwise SWISS Scores
(Preserves [0,1] Range)
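The K-class version described above can be sketched directly: compute the 2-class SWISS for every pair of classes and average, which keeps the score in [0, 1]. The toy data here are hypothetical, not the examples from the slides:

```python
import numpy as np
from itertools import combinations

def swiss(X, labels):
    """2-class SWISS: standardized within-class sum of squares (the CI)."""
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return float(within / ((X - X.mean(axis=0)) ** 2).sum())

def avg_pairwise_swiss(X, labels):
    """K-class SWISS: average of the 2-class SWISS over all class pairs."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    pair_scores = [swiss(X[(labels == a) | (labels == b)],
                         labels[(labels == a) | (labels == b)])
                   for a, b in combinations(np.unique(labels), 2)]
    return float(np.mean(pair_scores))

# Three tight, well-separated classes: every pairwise SWISS is near 0
X = np.array([[0, 0], [0.1, 0], [5, 0], [5.1, 0], [0, 5], [0.1, 5]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])
score = avg_pairwise_swiss(X, labels)
```

Because each pairwise score lies in [0, 1], so does their average, which is the point of preferring this over the K-class CI.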
![Page 38: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/38.jpg)
SWISS Score
Avg. Pairwise SWISS – Toy Examples
![Page 39: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/39.jpg)
SWISS Score
Additional Feature:
Ǝ Hypothesis Tests: $H_1: \mathrm{SWISS}_1 < 1$
$H_1: \mathrm{SWISS}_1 < \mathrm{SWISS}_2$
Permutation Based
See Cabanski et al (2010)
![Page 40: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/40.jpg)
Clustering
• A Very Large Area
• K-Means is Only One Approach
![Page 41: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/41.jpg)
Clustering
• A Very Large Area
• K-Means is Only One Approach
• Has its Drawbacks
(Many Toy Examples of This)
![Page 42: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/42.jpg)
Clustering
• A Very Large Area
• K-Means is Only One Approach
• Has its Drawbacks
(Many Toy Examples of This)
• Ǝ Many Other Approaches
![Page 43: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/43.jpg)
Clustering
Recall Important References (monographs):
• Hartigan (1975)
• Gersho and Gray (1992)
• Kaufman and Rousseeuw (2005)
See Also Wikipedia
![Page 44: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/44.jpg)
Clustering
• A Very Large Area
• K-Means is Only One Approach
• Has its Drawbacks
(Many Toy Examples of This)
• Ǝ Many Other Approaches
• Important (And Broad) Class
Hierarchical Clustering
![Page 45: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/45.jpg)
Hierarchical Clustering
Idea: Consider Either:
Bottom Up Aggregation: One by One Combine Data
Top Down Splitting: All Data in One Cluster & Split
Through Entire Data Set, to get Dendrogram
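Bottom-up aggregation can be sketched directly: start with singletons and repeatedly merge the closest pair of clusters, recording the height of each merge, which is exactly the information a dendrogram draws. Single linkage with the L2 metric is one choice among the many discussed below; in practice a library routine such as `scipy.cluster.hierarchy.linkage` would be used:

```python
import numpy as np

def single_linkage_merges(X):
    """Bottom-up aggregation: begin with each point as its own cluster and
    repeatedly merge the closest pair, recording the merge height.
    Single linkage (closest members) with the L2 metric; the resulting
    list of (cluster, cluster, height) merges is what a dendrogram shows."""
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for ai, a in enumerate(ids):
            for b in ids[ai + 1:]:
                # single linkage: distance between the closest members
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[a] += clusters.pop(b)   # aggregate: combine the two clusters
        merges.append((a, b, d))
    return merges

# Three 1-d points: the two close ones merge first, at a low height
merges = single_linkage_merges([[0.0], [0.1], [10.0]])
```

Top-down splitting produces the same kind of tree from the other end; either way the merge (or split) heights become the vertical axis of the dendrogram.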
![Page 46: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/46.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Thanks to US EPA: water.epa.gov
![Page 47: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/47.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Aggregate: Start With Individuals, Move Up
![Page 48: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/48.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Aggregate: Start With Individuals
![Page 49: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/49.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Aggregate: Start With Individuals, Move Up
![Page 50: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/50.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Aggregate: Start With Individuals, Move Up
![Page 51: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/51.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Aggregate: Start With Individuals, Move Up
![Page 52: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/52.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Aggregate: Start With Individuals, Move Up, End Up With All in 1 Cluster
![Page 53: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/53.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Split: Start With All in 1 Cluster
![Page 54: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/54.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Split: Start With All in 1 Cluster, Move Down
![Page 55: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/55.jpg)
Hierarchical Clustering
Aggregate or Split, to get Dendrogram
Split: Start With All in 1 Cluster, Move Down, End With Individuals
![Page 56: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/56.jpg)
Hierarchical Clustering
Vertical Axis in Dendrogram:
Based on:
Distance Metric
(Ǝ many, e.g. L2, L1, L∞, …)
Linkage Function
(Reflects Clustering Type)
![Page 57: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/57.jpg)
Hierarchical Clustering
Vertical Axis in Dendrogram:
Based on:
Distance Metric
(Ǝ many, e.g. L2, L1, L∞, …)
Linkage Function
(Reflects Clustering Type)
A Lot of “Art” Involved
![Page 58: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/58.jpg)
Hierarchical Clustering
Dendrogram Interpretation
Branch Length Reflects Cluster Strength
![Page 59: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/59.jpg)
SigClust
• Statistical Significance of Clusters
• in HDLSS Data
• When is a cluster “really there”?
![Page 60: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/60.jpg)
SigClust
• Statistical Significance of Clusters
• in HDLSS Data
• When is a cluster “really there”?
Liu et al (2007), Huang et al (2014)
![Page 61: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/61.jpg)
SigClust
Co-authors:
Andrew Nobel – UNC Statistics & OR
C. M. Perou – UNC Genetics
D. N. Hayes – UNC Oncology & Genetics
Yufeng Liu – UNC Statistics & OR
Hanwen Huang – U Georgia Biostatistics
![Page 62: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/62.jpg)
Common Microarray Analytic Approach: Clustering
From: Perou, Brown, Botstein (2000) Molecular Medicine Today
d = 1161 genes
Zoomed to “relevant” gene subsets
![Page 63: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/63.jpg)
Interesting Statistical Problem
For HDLSS data: When clusters seem to appear
E.g. found by clustering method
How do we know they are really there? (Question asked by Neil Hayes)
Define appropriate statistical significance?
Can we calculate it?
![Page 64: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/64.jpg)
First Approaches: Hypo Testing
Idea: See if clusters seem to be
Significantly Different Distributions
![Page 65: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/65.jpg)
First Approaches: Hypo Testing
Idea: See if clusters seem to be
Significantly Different Distributions
There Are Several Hypothesis Tests
Most Consistent with Visualization:
Direction, Projection, Permutation
![Page 66: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/66.jpg)
DiProPerm Hypothesis Test
Context: 2-sample means
H0: μ+1 = μ-1 vs. H1: μ+1 ≠ μ-1
(in High Dimensions)
Wei et al (2013)
![Page 67: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/67.jpg)
DiProPerm Hypothesis Test
Context: 2-sample means
H0: μ+1 = μ-1 vs. H1: μ+1 ≠ μ-1
Challenges: Distributional Assumptions, Parameter Estimation
![Page 68: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/68.jpg)
DiProPerm Hypothesis Test
Context: 2-sample means
H0: μ+1 = μ-1 vs. H1: μ+1 ≠ μ-1
Challenges: Distributional Assumptions, Parameter Estimation
HDLSS space is slippery
![Page 69: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/69.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
See Structure?
![Page 70: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/70.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
See Structure?
Careful, Only PC1-4
![Page 71: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/71.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
Structure Looks Real???
![Page 72: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/72.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
Actually Both Classes Are N(0,I), d = 1000
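This phenomenon is easy to reproduce: draw two "classes" from the same N(0, I) in d = 1000, point a data-driven direction at them, and the projections separate even though no real difference exists. The sketch below uses the mean-difference direction as a simple stand-in for the SVM or DWD directions used on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 20                       # HDLSS: dimension far exceeds sample size
plus = rng.standard_normal((n, d))    # both samples from N(0, I):
minus = rng.standard_normal((n, d))   # there is no true class difference

# Direction estimated from the data themselves (mean difference,
# a simple stand-in for the SVM / DWD direction)
w = plus.mean(axis=0) - minus.mean(axis=0)
w /= np.linalg.norm(w)

# Projected class means sit far apart: pure sampling variation,
# not evidence of real structure
gap = (plus @ w).mean() - (minus @ w).mean()
```

Because the direction was chosen using the labels, the projections of the two samples look cleanly separated, which is exactly why the visual impression alone cannot be trusted in HDLSS settings.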
![Page 73: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/73.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
Actually Both Classes Are N(0,I), d = 1000
![Page 74: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/74.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
Separation Is Natural Sampling Variation
![Page 75: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/75.jpg)
DiProPerm Hypothesis Test
Context: 2-sample means
H0: μ+1 = μ-1 vs. H1: μ+1 ≠ μ-1
Challenges: Distributional Assumptions, Parameter Estimation
HDLSS space is slippery
![Page 76: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/76.jpg)
DiProPerm Hypothesis Test
Context: 2-sample means
H0: μ+1 = μ-1 vs. H1: μ+1 ≠ μ-1
Challenges: Distributional Assumptions, Parameter Estimation
Suggested Approach:
Permutation test
![Page 77: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/77.jpg)
DiProPerm Hypothesis Test
Suggested Approach:
Find a DIrection (separating classes)
![Page 78: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/78.jpg)
DiProPerm Hypothesis Test
Suggested Approach:
Find a DIrection (separating classes)
PROject the data (reduces to 1 dim)
![Page 79: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/79.jpg)
DiProPerm Hypothesis Test
Suggested Approach:
Find a DIrection (separating classes)
PROject the data (reduces to 1 dim)
PERMute (class labels, to assess significance, with recomputed direction)
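The three steps above can be sketched as follows, using the mean-difference direction and the mean-difference summary statistic, the simplest of the choices listed on later slides (DWD or SVM would be the usual directions):

```python
import numpy as np

def diproperm(X1, X2, n_perm=200, seed=0):
    """DiProPerm sketch: find a DIrection separating the classes, PROject
    the data onto it, and PERMute the class labels (recomputing the
    direction each time) to get a null distribution for the 1-d summary.
    Here both the direction and the summary are mean differences."""
    def stat(a, b):
        w = a.mean(axis=0) - b.mean(axis=0)            # direction
        norm = np.linalg.norm(w)
        if norm == 0.0:
            return 0.0
        w = w / norm
        return float((a @ w).mean() - (b @ w).mean())  # 1-d summary

    rng = np.random.default_rng(seed)
    observed = stat(X1, X2)
    pooled = np.vstack([X1, X2])
    n1 = len(X1)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))             # permute class labels
        if stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)         # permutation p-value
```

A small p-value says the observed projected separation is larger than what label permutation alone produces, i.e. the 2-sample means genuinely differ.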
![Page 80: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/80.jpg)
DiProPerm Hypothesis Test
![Page 81: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/81.jpg)
DiProPerm Hypothesis Test
![Page 82: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/82.jpg)
DiProPerm Hypothesis Test
![Page 83: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/83.jpg)
DiProPerm Hypothesis Test
![Page 84: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/84.jpg)
DiProPerm Hypothesis Test
…
Repeat this 1,000 times
To get:
![Page 85: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/85.jpg)
DiProPerm Hypothesis Test
![Page 86: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/86.jpg)
DiProPerm Hypothesis Test
Toy 2-Class Example
p-value Not Significant
![Page 87: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/87.jpg)
DiProPerm Hypothesis Test
Real Data Example: Autism
Caudate Shape
(sub-cortical brain structure)
Shape summarized by 3-d locations of 1032 corresponding points
Autistic vs. Typically Developing
![Page 88: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/88.jpg)
DiProPerm Hypothesis Test
Finds Significant Difference
Despite Weak Visual Impression
Thanks to Josh Cates
![Page 89: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/89.jpg)
DiProPerm Hypothesis Test
Also Compare: Developmentally Delayed
No Significant Difference
But Strong Visual Impression
Thanks to Josh Cates
![Page 90: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/90.jpg)
DiProPerm Hypothesis Test
Two Examples
Which Is “More Distinct”?
Visually Better Separation? Thanks to Katie Hoadley
![Page 91: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/91.jpg)
DiProPerm Hypothesis Test
Two Examples
Which Is “More Distinct”?
Stronger Statistical Significance! Thanks to Katie Hoadley
![Page 92: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/92.jpg)
DiProPerm Hypothesis Test
Value of DiProPerm:
• Visual Impression is Easily Misleading (onto HDLSS projections, e.g. Maximal Data Piling)
• Really Need to Assess Significance
• DiProPerm used routinely (even for variable selection)
![Page 93: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/93.jpg)
DiProPerm Hypothesis Test
Choice of Direction:
• Distance Weighted Discrimination (DWD)
• Support Vector Machine (SVM)
• Mean Difference
• Maximal Data Piling
![Page 94: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/94.jpg)
DiProPerm Hypothesis Test
Choice of 1-d Summary Statistic:
• 2-sample t-stat
• Mean difference
• Median difference
• Area Under ROC Curve
![Page 95: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/95.jpg)
Interesting Statistical Problem
For HDLSS data: When clusters seem to appear
E.g. found by clustering method
How do we know they are really there? (Question asked by Neil Hayes)
Define appropriate statistical significance?
Can we calculate it?
![Page 96: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/96.jpg)
First Approaches: Hypo Testing
e.g. Direction, Projection, Permutation
Hypothesis test of: Significant difference between sub-populations
Effective and Accurate, i.e. Sensitive and Specific
There exist several such tests
But critical point is: What result implies about clusters
![Page 97: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/97.jpg)
Clarifying Simple Example
Why Population Difference Tests cannot indicate clustering
Andrew Nobel Observation: For Gaussian Data (Clearly 1 Cluster!)
Assign Extreme Labels (e.g. by clustering)
Subpopulations are signif’ly different
![Page 98: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/98.jpg)
Simple Gaussian Example
Clearly only 1 Cluster in this Example
But Extreme Relabelling looks different
Extreme T-stat strongly significant
Indicates 2 clusters in data
![Page 99: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/99.jpg)
Simple Gaussian Example
Results: Random relabelling T-stat is not significant
But extreme T-stat is strongly significant
This comes from clustering operation
Conclude sub-populations are different
![Page 100: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/100.jpg)
Simple Gaussian Example
Results: Random relabelling T-stat is not significant
But extreme T-stat is strongly significant
This comes from clustering operation
Conclude sub-populations are different
Now see that: Not the same as clusters really there
![Page 101: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/101.jpg)
Simple Gaussian Example
Results: Random relabelling T-stat is not significant
But extreme T-stat is strongly significant
This comes from clustering operation
Conclude sub-populations are different
Now see that: Not the same as clusters really there
Need a new approach to study clusters
![Page 102: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/102.jpg)
Statistical Significance of Clusters
Basis of SigClust Approach:
What defines: A Single Cluster?
A Gaussian distribution (Sarle & Kou 1993)
![Page 103: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/103.jpg)
Statistical Significance of Clusters
Basis of SigClust Approach:
What defines: A Single Cluster?
A Gaussian distribution (Sarle & Kou 1993)
So define SigClust test based on:
2-means cluster index (measure) as statistic
Gaussian null distribution
Currently compute by simulation
Possible to do this analytically???
![Page 104: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/104.jpg)
SigClust Statistic – 2-Means Cluster Index
Measure of non-Gaussianity: 2-means Cluster Index
Familiar Criterion from k-means Clustering
![Page 105: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/105.jpg)
SigClust Statistic – 2-Means Cluster Index
Measure of non-Gaussianity: 2-means Cluster Index
Familiar Criterion from k-means Clustering
Within Class Sum of Squared Distances to Class Means
Prefer to divide (normalize) by Overall Sum of Squared Distances to Mean
Puts on scale of proportions
![Page 106: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/106.jpg)
SigClust Statistic – 2-Means Cluster Index
Measure of non-Gaussianity: 2-means Cluster Index:
Class Index Sets $C_1, C_2$, Class Means $\bar{x}_k$
“Within Class Var’n” / “Total Var’n”:
$$ \mathrm{CI} = \frac{\sum_{k=1}^{2} \sum_{j \in C_k} \| x_j - \bar{x}_k \|^2}{\sum_{j=1}^{n} \| x_j - \bar{x} \|^2} $$
![Page 107: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/107.jpg)
SigClust Gaussian null distribut’n
Which Gaussian (for null)?
![Page 108: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/108.jpg)
SigClust Gaussian null distribut’n
Which Gaussian (for null)?
Standard (sphered) normal?
No, not realistic
Rejection not strong evidence for clustering
Could also get that from an aspherical Gaussian
![Page 109: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/109.jpg)
SigClust Gaussian null distribut’n
Which Gaussian (for null)?
Standard (sphered) normal?
No, not realistic
Rejection not strong evidence for clustering
Could also get that from an aspherical Gaussian
Need Gaussian more like data: Need Full model
Challenge: Parameter Estimation
Recall HDLSS Context
![Page 110: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/110.jpg)
SigClust Gaussian null distribut’n
Estimated Mean (of Gaussian dist’n)?
1st Key Idea: Can ignore this
![Page 111: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/111.jpg)
SigClust Gaussian null distribut’n
Estimated Mean (of Gaussian dist’n)?
1st Key Idea: Can ignore this
By appealing to shift invariance of CI:
When Data are (rigidly) shifted, CI remains the same
So enough to simulate with mean 0
![Page 112: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/112.jpg)
SigClust Gaussian null distribut’n
Estimated Mean (of Gaussian dist’n)?
1st Key Idea: Can ignore this
By appealing to shift invariance of CI:
When Data are (rigidly) shifted, CI remains the same
So enough to simulate with mean 0
Other uses of invariance ideas?
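The shift invariance is easy to check numerically. A small sketch with random toy data, where the labels come from an arbitrary split rather than from 2-means:

```python
import numpy as np

def cluster_index(X, labels):
    """Within-class SS about class means over total SS about overall mean."""
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return float(within / ((X - X.mean(axis=0)) ** 2).sum())

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
labels = (X[:, 0] > 0).astype(int)

shift = np.array([3.0, -7.0, 0.5, 100.0])     # an arbitrary rigid shift
ci_before = cluster_index(X, labels)
ci_after = cluster_index(X + shift, labels)   # unchanged: the means shift too
```

Every mean in the numerator and denominator shifts by the same vector, so the differences, and hence CI, are untouched; simulating from a mean-0 Gaussian loses nothing.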
![Page 113: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/113.jpg)
SigClust Gaussian null distribut’n
Challenge: how to estimate cov. matrix $\Sigma$?
![Page 114: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/114.jpg)
SigClust Gaussian null distribut’n
Challenge: how to estimate cov. matrix $\Sigma$?
Number of parameters: $\frac{d(d+1)}{2}$
E.g. Perou 500 data: Dimension $d = 9674$,
so $\frac{d(d+1)}{2} = 46{,}797{,}975$
But Sample Size $n = 533$
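The count on the slide is just the number of free entries in a symmetric d × d matrix:

```python
d = 9674                          # dimension of the Perou 500 data
n = 533                           # sample size
cov_params = d * (d + 1) // 2     # free entries of a symmetric d x d matrix
# cov_params = 46,797,975 -- vastly more parameters than observations
```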
![Page 115: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/115.jpg)
SigClust Gaussian null distribut’n
Challenge: how to estimate cov. matrix $\Sigma$?
Number of parameters: $\frac{d(d+1)}{2}$
E.g. Perou 500 data: Dimension $d = 9674$,
so $\frac{d(d+1)}{2} = 46{,}797{,}975$
But Sample Size $n = 533$
Impossible in HDLSS settings????
Way around this problem?
![Page 116: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/116.jpg)
SigClust Gaussian null distribut’n
2nd Key Idea: Mod Out Rotations
![Page 117: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/117.jpg)
SigClust Gaussian null distribut’n
2nd Key Idea: Mod Out Rotations
Replace full Cov. $\Sigma = M D M^t$ by diagonal matrix $D$
As done in PCA eigen-analysis
But then “not like data”???
![Page 118: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/118.jpg)
SigClust Gaussian null distribut’n
2nd Key Idea: Mod Out RotationsReplace full Cov. by diagonal matrixAs done in PCA eigen-analysis
But then “not like data”???OK, since k-means clustering (i.e. CI) is
rotation invariant
(assuming e.g. Euclidean Distance)
tMDM
![Page 119: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/119.jpg)
SigClust Gaussian null distribut’n
2nd Key Idea: Mod Out Rotations
Only need to estimate diagonal matrix
But still have HDLSS problems?
![Page 120: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/120.jpg)
SigClust Gaussian null distribut’n
2nd Key Idea: Mod Out Rotations
Only need to estimate diagonal matrix
But still have HDLSS problems?
E.g. Perou 500 data:
Dimension $d = 9674$
Sample Size $n = 533$
Still need to estimate $d = 9674$ param’s
![Page 121: Support Vector Machines](https://reader036.vdocuments.us/reader036/viewer/2022081603/56813625550346895d9d9b71/html5/thumbnails/121.jpg)
SigClust Gaussian null distribut’n
3rd Key Idea: Factor Analysis Model