Object Orie’d Data Analysis, Last Time
• Finished Q-Q Plots
– Assess variability with Q-Q Envelope Plot
• SigClust
– When is a cluster “really there”?
– Statistic: 2-means Cluster Index
– Gaussian null distribution
– Fit to data (for HDLSS data, using invariance)
– P-values by simulation
– Breast Cancer Data
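For reference, here are the two quantities behind the SigClust bullets. The notation is introduced here and is not taken from the slides. The 2-means Cluster Index is the within-cluster sum of squares divided by the total sum of squares, so small values indicate a strong split. The simulated p-value is the fraction of B null data sets, drawn from the fitted Gaussian, whose CI is at least as small as the observed one. How the Gaussian null is fit for HDLSS data is not shown.

```latex
\mathrm{CI} = \frac{\sum_{k=1}^{2} \sum_{i \in C_k} \lVert x_i - \bar{x}^{(k)} \rVert^2}
                   {\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2},
\qquad
\hat{p} = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\{ \mathrm{CI}^{*}_{b} \le \mathrm{CI}_{\mathrm{obs}} \}
```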
More on K-Means Clustering
Classical Algorithm (from MacQueen, 1967)
• Start with initial means
• Cluster: assign each data point to its closest mean
• Recompute the class means
• Repeat; stop when the assignment no longer changes
Demo from: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
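Below is a minimal Python sketch of the assign/recompute loop, for illustration only. It is the batch form of the iteration: all points are reassigned before the means are recomputed, which is often called Lloyd's algorithm. It is not a reproduction of the author's Matlab code, and the names `sq_dist`, `mean` and `kmeans` are hypothetical. Keeping the old center when a class becomes empty is an assumption that the slides do not address.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100):
    """Batch k-means from the given starting centers.

    Repeats: assign each point to its closest mean, then recompute the
    class means. Stops when the assignment no longer changes.
    """
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Cluster step: index of the closest current mean for each point.
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
            for x in data
        ]
        if new_labels == labels:
            break  # no change in assignment: converged
        labels = new_labels
        # Recompute step: each class mean (an empty class keeps its old center).
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 0.0)]))
    # → ([0, 0, 1, 1], [(0.0, 0.5), (10.0, 0.5)])
```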
More on K-Means Clustering
Raw Data
2 Starting Centers
More on K-Means Clustering
Assign Each Data Point To Nearest Center
Recompute Mean
Re-assign
More on K-Means Clustering
Recompute Mean
Re-assign Data Points To Nearest Center
More on K-Means Clustering
Recompute Mean
Re-assign Data Points To Nearest Center
More on K-Means Clustering
Recompute Mean
Final Assignment
More on K-Means Clustering
New Example: Raw Data
Deliberately Strange Starting Centers
More on K-Means Clustering
Assign Clusters To Given Means
Note poor clustering
More on K-Means Clustering
Recompute Mean
Re-assign
Shows Improvement
More on K-Means Clustering
Recompute Mean
Re-assign
Shows Improvement
Now very good
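The improvement from deliberately strange starting centers can be tracked numerically. This sketch records the cluster index after each assignment step. It uses hypothetical data (two small squares of four points each) with both starting centers placed inside the first square. Under batch k-means, the within-class sum of squares can never increase from one step to the next, so the recorded CI values are non-increasing. The helper names are hypothetical, and the batch loop stands in for the author's own code.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def cluster_index(data: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-class sum of squares divided by total sum of squares."""
    grand = mean(data)
    total = sum(sq_dist(x, grand) for x in data)
    within = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == k]
        m = mean(members)
        within += sum(sq_dist(x, m) for x in members)
    return within / total


def kmeans_trace(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100):
    """Batch k-means that records the CI after every assignment step."""
    centers = list(centers)
    labels: Optional[List[int]] = None
    trace: List[float] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
                      for x in data]
        if new_labels == labels:
            break  # assignment unchanged: converged
        labels = new_labels
        trace.append(cluster_index(data, labels))
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # an empty class keeps its old center
                centers[k] = mean(members)
    return labels, trace


if __name__ == "__main__":
    # Hypothetical data: two well-separated squares of 4 points each.
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
    # Deliberately strange starts: both centers inside the first square.
    labels, trace = kmeans_trace(data, [(0.0, 0.0), (0.5, 0.5)])
    print(labels, [round(c, 3) for c in trace])
    # → [0, 0, 0, 0, 1, 1, 1, 1] [0.802, 0.038]
```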
More on K-Means Clustering
Different Example
Best 2-means Cluster?
Local Minima?
More on K-Means Clustering
Assign
Recompute Mean
Re-assign
Note poor clustering
More on K-Means Clustering
Recompute Mean
Final Assignment
Stuck in Local Min
More on K-Means Clustering
Same Data
But slightly different starting points
Impact???
More on K-Means Clustering
Assign
Recompute Mean
Re-assign
Note poor clustering
More on K-Means Clustering
Recompute Mean
Final Assignment
Now get Global Min
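Here is a small worked check of the local-minimum behavior. It uses hypothetical data, the four corners of a 4 × 1 rectangle, rather than the data in the slides. Starting at the two left-hand corners, the loop settles on the top/bottom split (CI = 16/17). That split is a fixed point of assign/recompute, but it is not the best answer. Nudging one start to (0.9, 1) leads instead to the left/right split (CI = 1/17). The helpers repeat the hypothetical batch k-means sketch so that this block runs on its own.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))


def cluster_index(data: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-class sum of squares / total sum of squares."""
    grand = mean(data)
    total = sum(sq_dist(x, grand) for x in data)
    within = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == k]
        m = mean(members)
        within += sum(sq_dist(x, m) for x in members)
    return within / total


def kmeans(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[int]:
    """Batch k-means: assign to closest mean, recompute means, stop on no change."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


if __name__ == "__main__":
    # Hypothetical data: corners of a 4 x 1 rectangle.
    rect = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
    for start in ([(0.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (0.9, 1.0)]):
        labels = kmeans(rect, start)
        print(labels, round(cluster_index(rect, labels), 3))
    # → [0, 1, 0, 1] 0.941   (stuck in local min: top/bottom split)
    # → [0, 0, 1, 1] 0.059   (global min: left/right split)
```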
More on K-Means Clustering
??? Next time:
Redo the above using my own Matlab calculations.
That way I can show each step,
and get the right answers.
More on K-Means Clustering
Now explore starting values:
• Approach: randomly choose 2 data points as starting means
• Does this give stable solutions?
• Explore for different point configurations
• And try 100 random choices
• Do 2-d examples for easy visualization
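The restart experiment can be sketched as follows. Each start picks 2 distinct data points at random as the initial means, runs the assign/recompute loop to convergence, and records the final CI. Sorting the CIs makes distinct local minima show up as steps. The data set, the seed, and the function names are hypothetical, and the batch k-means helper stands in for the author's own Matlab code.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))


def cluster_index(data: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-class sum of squares / total sum of squares."""
    grand = mean(data)
    total = sum(sq_dist(x, grand) for x in data)
    within = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == k]
        m = mean(members)
        within += sum(sq_dist(x, m) for x in members)
    return within / total


def kmeans(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[int]:
    """Batch k-means: assign to closest mean, recompute means, stop on no change."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


def random_restarts(data: Sequence[Point], n_starts: int = 100, seed: int = 0) -> List[float]:
    """Final CI of each run, each run started at 2 randomly chosen data points."""
    rng = random.Random(seed)
    cis = []
    for _ in range(n_starts):
        i, j = rng.sample(range(len(data)), 2)  # two distinct data points
        labels = kmeans(data, [data[i], data[j]])
        cis.append(cluster_index(data, labels))
    return sorted(cis)  # sorted, so distinct local minima appear as steps


if __name__ == "__main__":
    # Hypothetical 2-d normal mixture: two groups of 50 points.
    g = random.Random(1)
    data = [(g.gauss(0, 1), g.gauss(0, 1)) for _ in range(50)]
    data += [(g.gauss(6, 1), g.gauss(0, 1)) for _ in range(50)]
    cis = random_restarts(data)
    print(len(cis), round(cis[0], 3), round(cis[-1], 3))
```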
More on K-Means Clustering
2 Clusters: Raw Data (Normal mixture)
More on K-Means Clustering
2 Clusters: Cluster Index, based on 100 Random Starts
More on K-Means Clustering
2 Clusters: Chosen Clustering
More on K-Means Clustering
2 Clusters Results
• All starts end up with good answer
• Answer is very good (CI = 0.03)
• No obvious local minima
More on K-Means Clustering
Stretched Gaussian: Raw Data
More on K-Means Clustering
Stretched Gaussian: CI, based on 100 Random Starts
More on K-Means Clustering
Stretched Gaussian: Chosen Clustering
More on K-Means Clustering
Stretched Gaussian Results
• All starts end up with same answer
• Answer is less good (CI = 0.35)
• No obvious local minima
More on K-Means Clustering
Standard Gaussian: Raw Data
More on K-Means Clustering
Standard Gaussian: CI, based on 100 Random Starts
More on K-Means Clustering
Standard Gaussian: Chosen Clustering
More on K-Means Clustering
Standard Gaussian Results
• All starts end up with same answer
• Answer even less good (CI = 0.62)
• No obvious local minima
• So still stable, despite poor CI
More on K-Means Clustering
4 Balanced Clusters: Raw Data (Normal mixture)
More on K-Means Clustering
4 Balanced Clusters: CI, based on 100 Random Starts
More on K-Means Clustering
4 Balanced Clusters: 100 Random Starts
• Many different solutions appear
• I.e. there are many local minima
• Sorting on CI (bottom) shows how many
• 2 seem smaller than others
• What are other local minima?
Understand with deeper visualization
More on K-Means Clustering
4 Balanced Clusters: Class Assignment Image Plot
More on K-Means Clustering
4 Balanced Clusters: Vertically Regroup (better view?)
More on K-Means Clustering
4 Balanced Clusters: Choose cases to “flip” – color cases
More on K-Means Clustering
4 Balanced Clusters: Choose cases to “flip” – color cases
More on K-Means Clustering
4 Balanced Clusters: “flip”, shows local min clusters
More on K-Means Clustering
4 Balanced Clusters: sort columns, for better visualization
More on K-Means Clustering
4 Balanced Clusters: CI, based on 100 Random Starts
More on K-Means Clustering
4 Balanced Clusters: Color according to local minima
More on K-Means Clustering
4 Balanced Clusters: Chosen Clustering, smallest CI
More on K-Means Clustering
4 Balanced Clusters: Chosen Clustering, 2nd smallest CI
More on K-Means Clustering
4 Balanced Clusters: Chosen Clustering, larger 3rd CI
More on K-Means Clustering
4 Balanced Clusters: Chosen Clustering, larger 4th CI
More on K-Means Clustering
4 Balanced Clusters: Chosen Clustering, larger 5th CI
More on K-Means Clustering
4 Balanced Clusters: Chosen Clustering, larger 6th CI
More on K-Means Clustering
4 Balanced Clusters Results
• Many Local Minima
• Two good ones appear often (2-2 splits)
• 4 worse ones (1-3 splits, less common)
• 1 with single strange point
• Overall very unstable
• Raises concern over starting values
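Class labels from different restarts are arbitrary: class 1 in one run may be class 2 in another. This is why the slides "flip" cases before comparing runs. One simple automatic alternative is to relabel each run in order of first appearance and then count the distinct partitions. The number of distinct partitions is one rough way to count local minima. This is offered as an assumption, not as the author's procedure, and `canonical` and `distinct_solutions` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def canonical(labels: Sequence[int]) -> Tuple[int, ...]:
    """Relabel classes in order of first appearance, so that clusterings that
    differ only by a permutation ("flip") of class labels compare equal."""
    mapping: Dict[int, int] = {}
    out: List[int] = []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = len(mapping)
        out.append(mapping[lab])
    return tuple(out)


def distinct_solutions(runs: Iterable[Sequence[int]]) -> Counter:
    """Count how often each distinct clustering (up to relabeling) occurs."""
    return Counter(canonical(r) for r in runs)


if __name__ == "__main__":
    # Hypothetical final assignments from 4 restarts on 4 points.
    runs = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
    for partition, count in distinct_solutions(runs).most_common():
        print(partition, count)
    # → (0, 0, 1, 1) 3
    # → (0, 1, 0, 1) 1
```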
More on K-Means Clustering
4 Unbalanced Clusters: Raw Data (try for stability)
More on K-Means Clustering
4 Unbalanced Clusters: CI, based on 100 Random Starts
More on K-Means Clustering
4 Unbalanced Clusters: Recolor by CI
More on K-Means Clustering
4 Unbalanced Clusters: Chosen Clustering, smallest CI
More on K-Means Clustering
4 Unbalanced Clusters: Chosen Clustering, 2nd smallest CI
More on K-Means Clustering
4 Unbalanced Clusters: Chosen Clustering, larger 3rd CI
More on K-Means Clustering
4 Unbalanced Clusters Results
• Fewer Local Minima (more stable)
• Two good ones appear often (2-2 splits)
• Single 1-3 split less common
• Previous instability caused by balance?
• Maybe stability OK after all?
More on K-Means Clustering
Data on Circle: Raw Data (maximal instability?)
More on K-Means Clustering
Data on Circle: CI, based on 100 Random Starts
More on K-Means Clustering
Data on Circle: Recolor by CI
More on K-Means Clustering
Data on Circle: Chosen Clustering, smallest CI
More on K-Means Clustering
Data on Circle: Chosen Clustering, 2nd smallest CI
More on K-Means Clustering
Data on Circle: Chosen Clustering, 3rd smallest CI
More on K-Means Clustering
Data on Circle Results
• Seems many local minima
• Several are the same?
• Could be a programming error?
• But clear this is an unstable example
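This may explain why several of the circle solutions look alike. Assume the points are equally spaced on a circle; this is an assumption, and the slide's data may be spaced differently. By symmetry, every rotation of a half-circle split then has exactly the same CI. Restarts can therefore land on many different assignments that are all equally good. The sketch below checks this numerically; `circle_points` and `half_split` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))


def cluster_index(data: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-class sum of squares / total sum of squares."""
    grand = mean(data)
    total = sum(sq_dist(x, grand) for x in data)
    within = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == k]
        m = mean(members)
        within += sum(sq_dist(x, m) for x in members)
    return within / total


def circle_points(n: int) -> List[Point]:
    """n equally spaced points on the unit circle (hypothetical data)."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]


def half_split(n: int, r: int) -> List[int]:
    """Class 1 = the n/2 consecutive points starting at index r; class 0 = the rest."""
    return [1 if (i - r) % n < n // 2 else 0 for i in range(n)]


if __name__ == "__main__":
    n = 20
    pts = circle_points(n)
    cis = [cluster_index(pts, half_split(n, r)) for r in range(n // 2)]
    # n/2 genuinely different partitions, but (up to rounding) a single CI value.
    print(len(cis), round(min(cis), 6), round(max(cis), 6))
```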
K-Means Clustering Caution
• This is all a personal view
• Others would present different aspects
• E.g. replace Euclidean dist. by others
• E.g. other types of clustering
• E.g. heat-map dendrogram views
…
SigClust Breast Cancer Data
K-means Clustering & Starting Values
Try 100 random Starts
For full data set: Study Final CIs
• Shows just two solutions
Study changes in data, with image view
• Shows little difference between these
Overall: Typical for clusters that can split
When the split is clear, it is easily found
SigClust Random Restarts, Full Data
SigClust Random Restarts, Full Data
SigClust Breast Cancer Data
For full Chuck Class (e.g. Luminal B): Study Final CIs
• Shows several solutions
Study changes in data, with image view
• Shows multiple, divergent minima
Overall: Typical for “terminal” clusters
When there is no clear split, many local optima appear
Could base test on number of local optima???
SigClust Random Restarts, Luminal B
SigClust Random Restarts, Luminal B
SigClust Breast Cancer Data
??? Next time: show many more of these
to better build this case.