Clustering and Regression
TRANSCRIPT
Source: luthuli.cs.uiuc.edu/~daf/courses/Probcourse (CS 498 Probability & Statistics, 2013)
Recap
Be cautious: data may not lie in a single blob; we often need to separate it into groups.
Clustering
CS 498 Probability & Statistics
Clustering methods
Zicheng Liao
What is clustering?
• “Grouping”
– A fundamental part of signal processing
• “Unsupervised classification”
• Assign the same label to data points that are close to each other
Why?
We live in a universe full of clusters
Two (types of) clustering methods
• Agglomerative/Divisive clustering
• K-means
Agglomerative/Divisive clustering
Agglomerative clustering (bottom up)
Hierarchical cluster tree
Divisive clustering (top down)
Algorithm
Agglomerative clustering: an example
• “Merge clusters bottom up to form a hierarchical cluster tree”
Animation from Georg Berber: www.mit.edu/~georg/papers/lecture6.ppt
Dendrogram
>> X = rand(6, 2);    % create 6 random points on a plane
>> Z = linkage(X);    % Z encodes a tree of hierarchical clusters
>> dendrogram(Z);     % visualize Z as a dendrogram
[Figure: dendrogram of the 6 points; the vertical axis is merge distance]
Distance measure
Inter-cluster distance
• Treat each data point as a single cluster
• Only need to define an inter-cluster distance
– The distance between one set of points and another set of points
• 3 popular inter-cluster distances:
– Single-link
– Complete-link
– Average-link
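The three inter-cluster distances can be sketched directly. This is a Python/NumPy sketch added for this transcript (the original slides use MATLAB); the function names are illustrative:

```python
import numpy as np

def pairwise_dists(A, B):
    # all Euclidean distances between points of cluster A and cluster B
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_link(A, B):
    return pairwise_dists(A, B).min()   # closest pair

def complete_link(A, B):
    return pairwise_dists(A, B).max()   # farthest pair

def average_link(A, B):
    return pairwise_dists(A, B).mean()  # average over all pairs

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])
print(single_link(A, B), complete_link(A, B), average_link(A, B))  # 3.0 6.0 4.5
```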
Single-link
• Minimum over all pairwise distances between points from the two clusters
• Tends to produce long, loose clusters
Complete-link
• Maximum over all pairwise distances between points from the two clusters
• Tends to produce tight clusters
Average-link
• Average of all pairwise distances between points from the two clusters
How many clusters are there?
• Intrinsically hard to know
• The dendrogram gives insight into it
• Choose a threshold to split the dendrogram into clusters
[Figure: dendrogram cut at a threshold; the vertical axis is distance]
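Cutting a single-link dendrogram at a threshold is equivalent to taking connected components of the graph that links points closer than the threshold. A minimal NumPy sketch (the helper name `cut_single_link` is hypothetical, not from the slides):

```python
import numpy as np

def cut_single_link(X, threshold):
    # clusters from cutting a single-link dendrogram at `threshold`:
    # connected components of the "closer than threshold" graph
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    adj = d < threshold
    labels = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack = [i]               # flood-fill one component
        labels[i] = cur
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(adj[j]):
                if labels[k] < 0:
                    labels[k] = cur
                    stack.append(k)
        cur += 1
    return labels

X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0]])
print(cut_single_link(X, 1.0))  # [0 0 1 1]: two clusters at this threshold
```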
An example
do_agglomerative.m
Divisive clustering
• “Recursively split a cluster into smaller clusters”
• It is hard to choose where to split: a combinatorial problem
• Can be easier when the data has special structure (e.g., a pixel grid)
K-means
• Partition the data into clusters such that:
– Clusters are tight (distance to the cluster center is small)
– Every data point is closer to its own cluster center than to all other cluster centers (a Voronoi diagram)
[figures excerpted from Wikipedia]
Formulation
• Minimize the sum of squared distances from each point to its cluster center:
  Σ_k Σ_{x ∈ cluster k} ‖x − c_k‖²   (c_k is the cluster center)
K-means algorithm
• Randomly initialize K cluster centers
• Repeat until convergence:
– Assign each point to the closest cluster center
– Update each cluster center to the mean of its assigned points
Illustration
1. Randomly initialize 3 cluster centers (circles)
2. Assign each point to the closest cluster center
3. Update the cluster centers
4. Repeat from step 2
[figures excerpted from Wikipedia]
Example
do_Kmeans.m (shows step-by-step updates and the effect of the cluster number)
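The do_Kmeans.m script itself is not reproduced here; the two alternating steps can be sketched in NumPy as follows (an assumed re-implementation, not the original script):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 1: assign each point to the closest cluster center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # centers have stabilized
            break
        centers = new_centers
    return labels, centers

# Two tight groups on a line: K-means recovers them from any initialization here
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
labels, centers = kmeans(X, 2)
print(labels)  # the two left points share one label, the two right points the other
```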
Discussion
• How do we choose the number of clusters? K = 2? K = 3? K = 5?
Discussion
• K-means converges to a local minimum, which can produce counterintuitive clusterings
[figures excerpted from Wikipedia]
Discussion
• Favors spherical clusters
• Gives poor results for long, loose, or stretched clusters
Input data (color indicates true labels) vs. K-means results
Discussion
• The cost is guaranteed to decrease at every step
– Assigning each point to its closest cluster center minimizes the cost for the current center configuration
– Choosing the mean of each cluster as the new center minimizes the squared distance for the current assignment
• Terminates in a finite number of steps (the cost strictly decreases and there are only finitely many partitions), though the worst-case number of iterations can be large
Summary
• Clustering groups “similar” data together
• A world full of clusters/patterns
• Two algorithms:
– Agglomerative/divisive clustering: hierarchical cluster tree
– K-means: vector quantization
CS 498 Probability & Statistics
Regression
Zicheng Liao
Example-I
• Predict the stock price at time t+1 from prices up to time t
[Figure: stock price vs. time]
Example-II • Fill in missing pixels in an image: inpainting
Example-III
• Discover relationships in data
– Amount of hormones by devices from 3 production lots
– Time in service for devices from 3 production lots
Example-III
• Discover relationships in data
Linear regression
• Model: y = xᵀβ + ξ, where ξ is zero-mean (Gaussian) noise
Parameter estimation
• MLE of the linear model with Gaussian noise = least squares
[Least squares: Carl F. Gauss, 1809]
Likelihood function
Parameter estimation
• Closed-form solution (the normal equation):
  β = (XᵀX)⁻¹ Xᵀy
Cost function: ‖y − Xβ‖²
(expensive to compute the matrix inverse in high dimensions)
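A sketch of the normal equation on synthetic data (NumPy, added for this transcript; `np.linalg.solve` avoids forming the explicit inverse, which is the expensive step noted above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.random(50)])   # design matrix with intercept
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(0.0, 0.01, 50)        # small Gaussian noise

# Normal equation: beta = (X^T X)^{-1} X^T y, solved as a linear system
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # close to [1, 2]
```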
Gradient descent
• http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=02.5-LinearRegressionI-GradientDescentForLinearRegression&speed=100 (Andrew Ng)
• The least-squares cost is convex, so with a suitable step size gradient descent converges to the global minimum
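A gradient-descent sketch for the same least-squares cost (the step size and iteration count are illustrative assumptions, not values from the lecture):

```python
import numpy as np

def gd_linreg(X, y, lr=0.1, n_iter=5000):
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ beta - y)  # gradient of the mean squared error
        beta -= lr * grad
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.random(50)])
y = X @ np.array([1.0, 2.0])    # noiseless data with known coefficients

beta = gd_linreg(X, y)
print(beta)  # converges toward [1, 2]
```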
Example
do_regression.m
Interpreting a regression
Interpreting a regression
• The residual has zero mean
• The residual has zero correlation with the explanatory variables
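Both properties can be checked numerically; when the model includes an intercept they hold exactly, up to floating-point error (a NumPy sketch, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
X = np.column_stack([np.ones(100), x])          # intercept + one explanatory variable
y = 1.0 + 3.0 * x + rng.normal(0.0, 0.5, 100)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta

print(residual.mean())                 # ~0: zero mean residual
print(np.corrcoef(residual, x)[0, 1])  # ~0: zero correlation with x
```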
Interpreting the residual
Interpreting the residual
How good is a fit?
How good is a fit?
• The R-squared measure
– The fraction of the variance of y explained by the regression: R² = 1 − Var(residual)/Var(y)
– Used in hypothesis tests for model selection
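R-squared is one minus the ratio of the residual sum of squares to the total sum of squares; a small sketch:

```python
import numpy as np

def r_squared(y, y_pred):
    # fraction of the variance of y explained by the prediction
    ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                      # 1.0: a perfect fit explains everything
print(r_squared(y, np.full(4, y.mean())))   # 0.0: predicting the mean explains nothing
```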
Regularized linear regression
• Cost: ‖y − Xβ‖² + λ‖β‖²
• Closed-form solution: β = (XᵀX + λI)⁻¹ Xᵀy
• Gradient descent also applies
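A sketch of the closed-form ridge solution (the function name `ridge` is hypothetical; λ = 0 recovers ordinary least squares, and a larger λ shrinks the coefficients toward zero):

```python
import numpy as np

def ridge(X, y, lam):
    # beta = (X^T X + lam * I)^{-1} X^T y, solved without an explicit inverse
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5])       # noiseless data with known coefficients

b0 = ridge(X, y, 0.0)      # lam = 0: ordinary least squares, recovers the truth
b1 = ridge(X, y, 100.0)    # large lam: coefficients shrink toward zero
print(b0)
print(np.linalg.norm(b1) < np.linalg.norm(b0))  # True
```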
Why regularization?
• Handles small eigenvalues
– Adding the regularizer λI avoids dividing by the near-zero eigenvalues of XᵀX
Why regularization?
• Avoids over-fitting
– Small parameters → simpler model → less prone to over-fitting
An over-fit model is hard to generalize to new data; smaller parameters give a simpler (better) model
L1 regularization (Lasso)
• Replace the squared (L2) penalty with an L1 penalty: ‖y − Xβ‖² + λ‖β‖₁
• Tends to produce sparse solutions (many coefficients exactly zero)
How does it work?
Summary