K-Means Clustering

Upload: lohith-nani-royal

Post on 04-Jun-2018


TRANSCRIPT

  • 8/13/2019 K-Means Clustering

    1/13

    K-Means Clustering

    CMPUT 615

    Applications of Machine Learning in Image Analysis

  • 2/13

    K-means Overview

    A clustering algorithm; an approximation to an NP-hard combinatorial
    optimization problem

    It is unsupervised; K stands for the number of clusters and is a
    user input to the algorithm

    From a set of data points or observations (all numerical), K-means
    attempts to classify them into K clusters

    The algorithm is iterative in nature

  • 3/13

    K-means Details

    x_1, …, x_N are data points or vectors or observations

    Each observation will be assigned to one and only one cluster

    C(i) denotes the cluster number for the i-th observation

    Dissimilarity measure: Euclidean distance metric

    K-means minimizes the within-cluster point scatter:

    $$W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} \| x_i - x_j \|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \| x_i - m_k \|^2$$

    where

    m_k is the mean vector of the k-th cluster

    N_k is the number of observations in the k-th cluster
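The two forms of the scatter W(C) above are equal; a quick NumPy check on toy data (the points and the fixed assignment C are made up purely for illustration):

```python
import numpy as np

# Hypothetical toy data: 12 two-dimensional points with a fixed assignment C
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
C = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])  # K = 3 clusters

# Pairwise form: (1/2) * sum_k sum_{C(i)=k} sum_{C(j)=k} ||x_i - x_j||^2
pairwise = 0.0
for k in range(3):
    Xk = X[C == k]
    diffs = Xk[:, None, :] - Xk[None, :, :]
    pairwise += 0.5 * np.sum(diffs ** 2)

# Centroid form: sum_k N_k * sum_{C(i)=k} ||x_i - m_k||^2
centroid = 0.0
for k in range(3):
    Xk = X[C == k]
    m_k = Xk.mean(axis=0)
    centroid += len(Xk) * np.sum((Xk - m_k) ** 2)

print(np.isclose(pairwise, centroid))
```

The identity holds for any assignment, which is why minimizing the pairwise scatter reduces to minimizing squared distances to cluster means.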

  • 4/13

    K-means Algorithm

    For a given assignment C, compute the cluster means m_k:

    $$m_k = \frac{\sum_{i:\,C(i)=k} x_i}{N_k}, \qquad k = 1, \ldots, K$$

    For the current set of cluster means, assign each observation as:

    $$C(i) = \arg\min_{1 \le k \le K} \| x_i - m_k \|^2, \qquad i = 1, \ldots, N$$

    Iterate the above two steps until convergence
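The two steps can be sketched in NumPy as follows; this is a minimal Lloyd's-algorithm sketch with made-up demo data, not the course's reference implementation:

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Lloyd's algorithm: alternate the mean step and the assignment step."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), K, replace=False)]   # initialize means at K data points
    C = None
    for _ in range(iters):
        # Assignment step: C(i) = argmin_k ||x_i - m_k||^2
        d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
        C_new = d2.argmin(axis=1)
        if C is not None and np.array_equal(C_new, C):
            break                                 # assignments stable -> converged
        C = C_new
        # Mean step: m_k = (sum of x_i with C(i) = k) / N_k
        for k in range(K):
            if np.any(C == k):
                m[k] = X[C == k].mean(axis=0)
    return C, m

# Demo on two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(10, 0.1, (5, 2))])
C, m = kmeans(X, 2)
```

Convergence here means the assignments stop changing; as the summary slide notes, this is only a local minimum of W(C) and depends on the initialization.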

  • 5/13

    Image Segmentation Results

    An image (I); three-cluster image (J) on the gray values of I

    Matlab code:

    I = double(imread('…'));
    J = reshape(kmeans(I(:),3),size(I));

    Note that the K-means result is noisy
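A rough Python/NumPy analogue of the Matlab snippet above, run on a synthetic gray-value array (the toy data and initial means are illustrative; a real image would be loaded with an image library):

```python
import numpy as np

# Synthetic "image": three gray-value populations, reshaped to 30x30
rng = np.random.default_rng(1)
I = np.concatenate([rng.normal(50, 5, 300),    # dark region
                    rng.normal(128, 5, 300),   # mid-gray region
                    rng.normal(200, 5, 300)])  # bright region
I = I.reshape(30, 30)

# K-means on the flattened gray values, K = 3 (cf. kmeans(I(:), 3) in Matlab)
m = np.array([0.0, 128.0, 255.0])              # initial means spread over the range
for _ in range(50):
    labels = np.abs(I.ravel()[:, None] - m[None, :]).argmin(axis=1)
    m = np.array([I.ravel()[labels == k].mean() for k in range(3)])

J = labels.reshape(I.shape)                    # cluster-index image, cf. reshape(...)
```

Because each pixel is clustered independently of its neighbors, label noise at region boundaries is expected, matching the slide's remark.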

  • 6/13

    Summary

    K-means converges, but it finds a local minimum of the cost function

    Works only for numerical observations (for categorical and mixed
    observations, K-medoids is a related clustering method)

    Fine tuning is required when applied to image segmentation, mostly
    because there is no imposed spatial coherency in the K-means algorithm

    Often works as a starting point for more sophisticated image
    segmentation algorithms

  • 7/13

    Otsu's Thresholding Method

    Based on the clustering idea: find the threshold that minimizes the
    weighted within-cluster point scatter

    This turns out to be the same as maximizing the between-class scatter

    Operates directly on the gray-level histogram [e.g. 256 numbers, P(i)],
    so it is fast (once the histogram is computed)

    (1979)

  • 8/13

  • 9/13
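The transcript for slides 8–9 is empty; in the standard Otsu derivation these slides introduce the class probabilities and class means at threshold t that the formulas below rely on. The definitions here are sketched from the standard derivation, not recovered from the transcript (I denotes the number of gray levels, e.g. 256):

```latex
q_1(t) = \sum_{i=1}^{t} P(i), \qquad
q_2(t) = \sum_{i=t+1}^{I} P(i) = 1 - q_1(t)

\mu_1(t) = \frac{1}{q_1(t)} \sum_{i=1}^{t} i\,P(i), \qquad
\mu_2(t) = \frac{1}{q_2(t)} \sum_{i=t+1}^{I} i\,P(i)
```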

  • 10/13

    Finally, the individual class variances are:

    $$\sigma_1^2(t) = \sum_{i=1}^{t} [\,i - \mu_1(t)\,]^2 \,\frac{P(i)}{q_1(t)}$$

    $$\sigma_2^2(t) = \sum_{i=t+1}^{I} [\,i - \mu_2(t)\,]^2 \,\frac{P(i)}{q_2(t)}$$

    Now, we could actually stop here. All we need to do is just run
    through the full range of t values [1, 256] and pick the value
    that minimizes $\sigma_w^2(t)$.

    But the relationship between the within-class and between-class
    variances can be exploited to generate a recursion relation that
    permits a much faster calculation.

  • 11/13

    Finally...

    Initialization:

    $$q_1(1) = P(1); \qquad \mu_1(0) = 0$$

    Recursion:

    $$q_1(t+1) = q_1(t) + P(t+1)$$

    $$\mu_1(t+1) = \frac{q_1(t)\,\mu_1(t) + (t+1)\,P(t+1)}{q_1(t+1)}$$

    $$\mu_2(t+1) = \frac{\mu - q_1(t+1)\,\mu_1(t+1)}{1 - q_1(t+1)}$$

    (where $\mu$ denotes the total mean of the histogram)
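The recursion can be checked numerically against the direct cumulative sums; the histogram here is random, purely for illustration:

```python
import numpy as np

# Random normalized gray-level histogram P(i), i = 1..256 (illustrative data)
rng = np.random.default_rng(2)
P = rng.random(256)
P /= P.sum()
i = np.arange(1, 257)

# Direct computation of q1(t) and mu1(t) for every threshold t
q1_direct = np.cumsum(P)
mu1_direct = np.cumsum(i * P) / q1_direct

# Recursive computation: q1(t+1) = q1(t) + P(t+1),
# mu1(t+1) = [q1(t) mu1(t) + (t+1) P(t+1)] / q1(t+1)
q1 = np.empty(256)
mu1 = np.empty(256)
q1[0] = P[0]                     # initialization: q1(1) = P(1), mu1(0) = 0
mu1[0] = 1.0 * P[0] / q1[0]
for t in range(1, 256):          # 0-based index t stands for threshold t+1
    q1[t] = q1[t - 1] + P[t]
    mu1[t] = (q1[t - 1] * mu1[t - 1] + (t + 1) * P[t]) / q1[t]

print(np.allclose(q1, q1_direct), np.allclose(mu1, mu1_direct))
```

The recursive pass touches each histogram bin once, which is the speedup the slide is after.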

  • 12/13

    After some algebra, we can express the total variance as...

    $$\sigma^2 = \sigma_w^2(t) + q_1(t)\,[1 - q_1(t)]\,[\mu_1(t) - \mu_2(t)]^2$$

    where the first term is the within-class variance from before, and
    the second is the between-class variance $\sigma_B^2(t)$.

    Since the total is constant and independent of t, the effect of
    changing the threshold is merely to move the contributions of the
    two terms back and forth.

    So, minimizing the within-class variance is the same as maximizing
    the between-class variance.

    The nice thing about this is that we can compute the quantities in
    $\sigma_B^2(t)$ recursively as we run through the range of t values.

  • 13/13

    Result of Otsu's Algorithm

    An image; the binary image by Otsu's method; the gray-level
    histogram (gray levels 0–255)

    Matlab code:

    I = double(imread('…'));
    I = (I-min(I(:)))/(max(I(:))-min(I(:)));
    J = I>graythresh(I);
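The whole method can be sketched in NumPy, assuming gray values normalized to [0, 1] as in the Matlab snippet; the function name and the synthetic bimodal data are illustrative, not part of the slides:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold maximizing the between-class variance
    sigma_B^2(t) = q1(t)[1 - q1(t)][mu1(t) - mu2(t)]^2 (cf. graythresh)."""
    hist, edges = np.histogram(gray, bins=256, range=(0.0, 1.0))
    P = hist / hist.sum()                  # normalized gray-level histogram
    i = np.arange(256)
    q1 = np.cumsum(P)                      # class-1 probability up to bin t
    mu = np.cumsum(i * P)                  # running first moment
    with np.errstate(divide="ignore", invalid="ignore"):
        mu1 = mu / q1
        mu2 = (mu[-1] - mu) / (1.0 - q1)
        sigma_b2 = q1 * (1.0 - q1) * (mu1 - mu2) ** 2
    t = np.nanargmax(sigma_b2)             # NaNs at q1 = 0 or 1 are skipped
    return edges[t + 1]

# Bimodal synthetic "image": dark pixels near 0.2, bright pixels near 0.8
rng = np.random.default_rng(3)
gray = np.clip(np.concatenate([rng.normal(0.2, 0.03, 500),
                               rng.normal(0.8, 0.03, 500)]), 0.0, 1.0)
thr = otsu_threshold(gray)
binary = gray > thr                        # cf. J = I > graythresh(I)
```

On well-separated bimodal data like this, the threshold lands in the valley between the two modes and the binarization separates the populations cleanly.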