K-Means Clustering
TRANSCRIPT
8/13/2019
K-Means Clustering
CMPUT 615
Applications of Machine Learning in Image Analysis
-
K-means Overview
A clustering algorithm; an approximation to an NP-hard combinatorial optimization problem
It is unsupervised; K stands for the number of clusters and is a user input to the algorithm
From a set of data points or observations (all numerical), K-means attempts to classify them into K clusters
The algorithm is iterative in nature
-
K-means Details
x_1, \ldots, x_N are data points or vectors or observations
Each observation will be assigned to one and only one cluster
C(i) denotes the cluster number for the i-th observation
Dissimilarity measure: Euclidean distance metric
K-means minimizes the within-cluster point scatter:

W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} \| x_i - x_j \|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \| x_i - m_k \|^2

where
m_k is the mean vector of the k-th cluster
N_k is the number of observations in the k-th cluster
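The pairwise form of the within-cluster scatter equals the mean-centered form; a minimal numerical check (in Python rather than the Matlab used later in these slides; the 1-D data and the cluster assignment are made up for illustration):

```python
# Check that the pairwise form of W(C) equals the mean-centered form.
x = [1.0, 1.5, 2.0, 8.0, 9.0]   # toy 1-D observations
C = [0, 0, 0, 1, 1]             # C[i] = cluster of observation i (hand-picked)
K = 2

# Pairwise form: (1/2) sum_k sum_{C(i)=k} sum_{C(j)=k} ||x_i - x_j||^2
pairwise = 0.5 * sum((xi - xj) ** 2
                     for k in range(K)
                     for i, xi in enumerate(x) if C[i] == k
                     for j, xj in enumerate(x) if C[j] == k)

# Centered form: sum_k N_k sum_{C(i)=k} ||x_i - m_k||^2
centered = 0.0
for k in range(K):
    members = [xi for xi, c in zip(x, C) if c == k]
    m_k = sum(members) / len(members)          # cluster mean
    centered += len(members) * sum((xi - m_k) ** 2 for xi in members)

print(pairwise, centered)  # the two values agree
```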
-
K-means Algorithm
For a given assignment C, compute the cluster means m_k:

m_k = \frac{\sum_{i : C(i)=k} x_i}{N_k}, \quad k = 1, \ldots, K

For the current set of cluster means, assign each observation as:

C(i) = \arg\min_{1 \le k \le K} \| x_i - m_k \|^2, \quad i = 1, \ldots, N

Iterate the above two steps until convergence.
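The two-step iteration above can be sketched in a few lines. A minimal 1-D implementation (Python rather than the slides' Matlab; the data values and seed are made up for illustration):

```python
import random

def kmeans(points, K, iters=100, seed=0):
    """Lloyd's algorithm on 1-D data: alternate mean update and assignment."""
    rng = random.Random(seed)
    means = rng.sample(points, K)      # initialize means from the data
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest mean
        new_assign = [min(range(K), key=lambda k: (x - means[k]) ** 2)
                      for x in points]
        if new_assign == assign:
            break                      # converged: assignments are stable
        assign = new_assign
        # Mean step: recompute m_k from the current assignment
        for k in range(K):
            members = [x for x, c in zip(points, assign) if c == k]
            if members:
                means[k] = sum(members) / len(members)
    return means, assign

data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
means, assign = kmeans(data, 2)
print(sorted(round(m, 2) for m in means))  # two well-separated cluster means
```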
-
Image Segmentation Results
An image (I); the three-cluster image (J) on the gray values of I
Matlab code:
I = double(imread('...'));  % image filename elided in the original
J = reshape(kmeans(I(:),3),size(I));
Note that the K-means result is noisy
-
Summary
K-means converges, but it finds a local minimum of the cost function
Works only for numerical observations (for categorical and mixed observations, K-medoids is an alternative clustering method)
Fine tuning is required when applied to image segmentation, mostly because there is no imposed spatial coherency in the K-means algorithm
Often works as a starting point for sophisticated image segmentation algorithms
-
Otsu's Thresholding Method
Based on the clustering idea: find the threshold that minimizes the weighted within-cluster point scatter.
This turns out to be the same as maximizing the between-class scatter.
Operates directly on the gray-level histogram [e.g. 256 numbers, P(i)], so it is fast (once the histogram is computed).
(Otsu, 1979)
-
Finally, the individual class variances are:

\sigma_1^2(t) = \sum_{i=1}^{t} [i - \mu_1(t)]^2 \, \frac{P(i)}{q_1(t)}

\sigma_2^2(t) = \sum_{i=t+1}^{I} [i - \mu_2(t)]^2 \, \frac{P(i)}{q_2(t)}

where q_1(t), q_2(t) are the class probabilities and \mu_1(t), \mu_2(t) the class means of the two classes split at threshold t.

Now, we could actually stop here. All we need to do is just run through the full range of t values [1, 256] and pick the value that minimizes the within-class variance \sigma_w^2(t).

But the relationship between the within-class and between-class variances can be exploited to generate a recursion relation that permits a much faster calculation.
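The brute-force search described here, evaluating the within-class variance at every threshold and keeping the minimizer, can be sketched as follows (Python rather than Matlab; the 8-level histogram is made up for illustration):

```python
def within_class_variance(P, t):
    """sigma_w^2(t) = q1*sigma1^2 + q2*sigma2^2 for histogram P split at t.
    P[i] is the probability of gray level i+1 (levels are 1-based);
    class 1 is levels 1..t, class 2 is levels t+1..I."""
    I = len(P)
    q1, q2 = sum(P[:t]), sum(P[t:])
    if q1 == 0 or q2 == 0:
        return float('inf')            # degenerate split: one class is empty
    mu1 = sum((i + 1) * P[i] for i in range(t)) / q1
    mu2 = sum((i + 1) * P[i] for i in range(t, I)) / q2
    s1 = sum(((i + 1) - mu1) ** 2 * P[i] for i in range(t)) / q1
    s2 = sum(((i + 1) - mu2) ** 2 * P[i] for i in range(t, I)) / q2
    return q1 * s1 + q2 * s2

# Toy bimodal histogram over 8 gray levels (made-up counts)
hist = [4, 8, 5, 1, 1, 6, 9, 6]
P = [h / sum(hist) for h in hist]
best_t = min(range(1, len(P)), key=lambda t: within_class_variance(P, t))
print(best_t)  # threshold landing in the valley between the two modes
```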
-
Finally...

Initialization...

q_1(1) = P(1); \quad \mu_1(0) = 0

Recursion...

q_1(t+1) = q_1(t) + P(t+1)

\mu_1(t+1) = \frac{q_1(t)\,\mu_1(t) + (t+1)\,P(t+1)}{q_1(t+1)}

\mu_2(t+1) = \frac{\mu - q_1(t+1)\,\mu_1(t+1)}{1 - q_1(t+1)}

where \mu is the global mean of the histogram.
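A sketch of this recursion (Python rather than Matlab; the 4-level demo histogram is made up). Each threshold costs O(1) instead of re-summing the histogram:

```python
def otsu_recursive_stats(P):
    """q1(t) and mu1(t) via the O(1)-per-step recursion; mu2(t) from the
    global mean.  P[i] is the probability of gray level i+1 (1-based levels).
    Returns a list of (t, q1(t), mu1(t), mu2(t)) for t = 1 .. I-1."""
    I = len(P)
    mu = sum((i + 1) * P[i] for i in range(I))   # global mean of the histogram
    q1, mu1 = 0.0, 0.0                           # initialization: q1(0)=0, mu1(0)=0
    out = []
    for t in range(1, I):
        p = P[t - 1]                             # P(t): mass entering class 1
        q1_new = q1 + p                          # q1(t) = q1(t-1) + P(t)
        mu1 = (q1 * mu1 + t * p) / q1_new if q1_new > 0 else 0.0
        q1 = q1_new
        mu2 = (mu - q1 * mu1) / (1 - q1) if q1 < 1 else 0.0
        out.append((t, q1, mu1, mu2))
    return out

stats = otsu_recursive_stats([0.1, 0.4, 0.3, 0.2])  # made-up 4-level histogram
for t, q1, mu1, mu2 in stats:
    print(t, q1, mu1, mu2)
```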
-
After some algebra, we can express the total variance as...

\sigma^2 = \sigma_w^2(t) + q_1(t)\,[1 - q_1(t)]\,[\mu_1(t) - \mu_2(t)]^2

where the first term is the within-class variance (from before) and the second is the between-class variance \sigma_B^2(t).

Since the total is constant and independent of t, the effect of changing the threshold is merely to move the contributions of the two terms back and forth.

So, minimizing the within-class variance is the same as maximizing the between-class variance.

The nice thing about this is that we can compute the quantities in \sigma_B^2(t) recursively as we run through the range of t values.
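The decomposition of the total variance into within-class and between-class parts can be verified numerically; a minimal sketch (Python rather than Matlab; the histogram values are made up for illustration):

```python
def split_stats(P, t):
    """Total, within-class, and between-class variance of histogram P
    (P[i] = probability of gray level i+1) split at threshold t."""
    I = len(P)
    mu = sum((i + 1) * P[i] for i in range(I))
    var = sum(((i + 1) - mu) ** 2 * P[i] for i in range(I))       # sigma^2
    q1 = sum(P[:t])
    mu1 = sum((i + 1) * P[i] for i in range(t)) / q1
    mu2 = sum((i + 1) * P[i] for i in range(t, I)) / (1 - q1)
    s1 = sum(((i + 1) - mu1) ** 2 * P[i] for i in range(t)) / q1
    s2 = sum(((i + 1) - mu2) ** 2 * P[i] for i in range(t, I)) / (1 - q1)
    within = q1 * s1 + (1 - q1) * s2                              # sigma_w^2(t)
    between = q1 * (1 - q1) * (mu1 - mu2) ** 2                    # sigma_B^2(t)
    return var, within, between

P = [0.10, 0.20, 0.20, 0.05, 0.05, 0.15, 0.15, 0.10]  # made-up histogram
for t in range(1, len(P)):
    var, within, between = split_stats(P, t)
    # sigma^2 is independent of t: the two parts trade off against each other
    assert abs(var - (within + between)) < 1e-9
print("sigma^2 == sigma_w^2(t) + sigma_B^2(t) for every t")
```

Because the total is fixed, the t that minimizes `within` is exactly the t that maximizes `between`.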
-
Result of Otsu's Algorithm
An image; the binary image produced by Otsu's method
Gray-level histogram
Matlab code:
I = double(imread('...'));  % image filename elided in the original
I = (I-min(I(:)))/(max(I(:))-min(I(:)));
J = I>graythresh(I);