K Means Clustering, Nearest Cluster and Gaussian Mixture
Presented by Kuei-Hsien, 2005.06.23
K Means Clustering
Clustering algorithms are used to find groups of "similar" data points among the input patterns.
K means clustering is an effective algorithm for extracting a given number of clusters of patterns from a training set.
Once done, the cluster locations can be used to classify data into distinct classes.
K Means Training Flow Chart
1. Initialize the cluster centers by randomly selecting the user-specified number of centers from the training set.
2. Classify the entire training set: for each pattern Xi in the training set, find the nearest cluster center C* and classify Xi as a member of C*.
3. For each cluster, recompute its center by finding the mean of the cluster:
M_k = (1/N_k) Σ_{j=1}^{N_k} X_{jk}
Where Mk is the new mean, Nk is the number of training patterns in cluster k, and Xjk is the j-th pattern belonging to cluster k.
4. Loop until the change in cluster means is less than the amount specified by the user.
If the number of cluster centers is less than the number specified, split each cluster center into two clusters by finding the input dimension with the highest deviation:
σ_i^2 = Σ_{j=1}^{N_k} (X_{ij} − M_i)^2
Where Xij is the i-th dimension of the j-th pattern in cluster k, Mi is the i-th dimension of the cluster center, and Nk is the number of training patterns in cluster k.
Store the k cluster centers.
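The splitting step can be sketched in plain Python. This is a minimal illustration, not the deck's exact procedure: the function name `split_center` and the perturbation size `eps` are assumptions, chosen only to show how the highest-deviation dimension drives the split.

```python
import math

def split_center(cluster_points, center, eps=0.05):
    """Split one cluster center into two along the input dimension
    with the highest deviation (sum of squared differences)."""
    dims = len(center)
    # Deviation of dimension i: sum_j (X_ij - M_i)^2
    dev = [sum((x[i] - center[i]) ** 2 for x in cluster_points)
           for i in range(dims)]
    i_max = dev.index(max(dev))           # dimension with highest deviation
    delta = eps * math.sqrt(dev[i_max])   # illustrative perturbation size
    lo, hi = list(center), list(center)
    lo[i_max] -= delta
    hi[i_max] += delta
    return lo, hi
```

The two new centers sit slightly apart along the most spread-out dimension, so the next classification pass can pull them toward the two halves of the original cluster.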
55
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly select k cluster center locations.
3. Each data point finds out which center it is closest to (thus each center "owns" a set of data points).
4. Each cluster finds the centroid of the points it owns:
M_k = (1/N_k) Σ_{j=1}^{N_k} X_{jk}
5. …and the center jumps there.
6. Repeat steps 3 to 5 until terminated.
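The steps above can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the function names, the tolerance `tol`, and the epoch cap are assumptions added for illustration, and the random initialization means different seeds give different clusterings.

```python
import math
import random

def dist(x, y):
    # Euclidean distance: sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(patterns, k, tol=1e-6, max_epochs=100, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(patterns, k)]   # step 2
    for _ in range(max_epochs):
        # Step 3: each pattern joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in patterns:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Steps 4-5: each center jumps to the centroid of its points.
        moved = 0.0
        for i, members in enumerate(clusters):
            if not members:
                continue  # keep an empty cluster's center in place
            centroid = [sum(col) / len(members) for col in zip(*members)]
            moved = max(moved, dist(centroid, centers[i]))
            centers[i] = centroid
        if moved < tol:   # step 6: stop once the means no longer change
            break
    return centers
```

On two well-separated groups of points, the centers settle on the two group means within a few epochs.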
Stop Condition - Splitting & Merging
Splitting: split a cluster when its SD is large and the change in SD is less than a specified value, using the per-dimension deviation:
σ_i^2 = Σ_{j=1}^{N_k} (X_{ij} − M_i)^2
Merging: merge clusters whose means are too close, as measured by the Euclidean distance:
Dist(X, Y) = sqrt(Σ_{i=1}^{m} (X_i − Y_i)^2)
and recompute the center of the merged cluster:
M_k = (1/N_k) Σ_{j=1}^{N_k} X_{jk}
Loop until the change of SD in all clusters is less than a value specified by the user, or until a specified number of epochs has been reached.
K Means Test Flow Chart
For each pattern X, associate X with the cluster center Y closest to X, using the Euclidean distance:
Dist(X, Y) = sqrt(Σ_{i=1}^{m} (X_i − Y_i)^2)

Stop Condition - Training of Parameters
Loop until the change in all cluster means is less than the amount specified by the user.
Commonly Tunable Parameters for K Means
Number of initial clusters: randomly chosen.
Number of cluster centers: 2 ~ √N.
Criteria for splitting and merging cluster centers: number of epochs, or percent of SD change.
Stop conditions: the stop condition for binary splitting is less important than that for the overall clustering.
K Means Clustering End
Nearest Cluster
The nearest-cluster architecture can be viewed as a condensed version of the K nearest neighbor architecture.
This architecture can often deliver performance close to that of KNN, while reducing computation time and memory requirements.
Nearest Cluster
The nearest-cluster architecture involves partitioning the training set into a few clusters, with a set of class-probability values stored for each cluster.
For a given cluster, these values estimate the posterior probabilities of all possible classes in the region of the input space in the vicinity of the cluster.
During classification, an input is associated with the nearest cluster, and the posterior probability estimates for that cluster are used to classify the input.
Nearest Cluster Training Flow Chart
1. Perform K means clustering on the data set.
2. For each cluster, generate a probability for each class according to:
P_jk = N_jk / N_k
Where Pjk is the probability of class j within cluster k, Njk is the number of class-j patterns belonging to cluster k, and Nk is the total number of patterns belonging to cluster k.
Nearest Cluster Test Flow Chart
1. For each input pattern X, find the nearest cluster Ck using the Euclidean distance:
Dist(X, Y) = sqrt(Σ_{i=1}^{m} (X_i − Y_i)^2)
Where Y is a cluster center and m is the number of dimensions in the input patterns.
2. Use the probabilities Pjk for all classes j stored with Ck, and classify pattern X into the class j with the highest probability.
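The nearest-cluster train and test steps can be sketched as follows, assuming cluster centers have already been obtained (e.g., from K means). The helper names `train_probs` and `classify` are illustrative, not from the slides.

```python
import math
from collections import Counter

def dist(x, y):
    # Euclidean distance between a pattern and a cluster center
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest(x, centers):
    # Index of the cluster center closest to x
    return min(range(len(centers)), key=lambda k: dist(x, centers[k]))

def train_probs(patterns, labels, centers):
    """P_jk = N_jk / N_k: per-cluster class probabilities."""
    counts = [Counter() for _ in centers]
    for x, j in zip(patterns, labels):
        counts[nearest(x, centers)][j] += 1
    return [{j: n / sum(c.values()) for j, n in c.items()} if c else {}
            for c in counts]

def classify(x, centers, probs):
    k = nearest(x, centers)                # nearest cluster C_k
    return max(probs[k], key=probs[k].get) # class with highest P_jk
```

Only the centers and one small probability table per cluster are kept at test time, which is the source of the memory savings over KNN.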
Nearest Cluster (example)
[Figure: patterns of Class 1 and Class 2 partitioned into five clusters]
In cluster 1: P(class1) = 100%, P(class2) = 0%
In cluster 2: P(class1) = 99.5%, P(class2) = 0.5%
In cluster 3: P(class1) = 50%, P(class2) = 50%
In cluster 4: P(class1) = 15%, P(class2) = 85%
In cluster 5: P(class1) = 65%, P(class2) = 35%
Nearest Cluster End
Gaussian Mixture
The Gaussian mixture architecture estimates a probability density function (PDF) for each class, and then performs classification based on Bayes' rule:
P(C_j | X) = P(X | C_j) P(C_j) / P(X)
Where P(X | Cj) is the PDF of class j evaluated at X, P(Cj) is the prior probability of class j, and P(X) is the overall PDF evaluated at X.
Gaussian Mixture
Unlike the unimodal Gaussian architecture, which assumes P(X | Cj) to take the form of a single Gaussian, the Gaussian mixture model estimates P(X | Cj) as a weighted average of multiple Gaussians:
P(X | C_j) = Σ_{k=1}^{N_c} w_k G_k
Where wk is the weight of the k-th Gaussian Gk, and the weights sum to one. One such PDF model is produced for each class.
Gaussian Mixture
Each Gaussian component is defined as:
G_k = (1 / ((2π)^{n/2} |V_k|^{1/2})) exp(−(1/2) (X − M_k)^T V_k^{-1} (X − M_k))
Where Mk is the mean of the Gaussian and Vk is its covariance matrix.
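The component density and the weighted mixture can be sketched in a few lines. As an assumption for simplicity, this sketch restricts V to a diagonal covariance matrix (so the determinant and inverse are products and reciprocals of the diagonal entries); a full-covariance version would need matrix inversion.

```python
import math

def gaussian_diag(x, mean, var):
    """G = (2*pi)^(-n/2) |V|^(-1/2) exp(-1/2 (x-M)^T V^-1 (x-M)),
    assuming V is diagonal; `var` holds its diagonal entries (> 0)."""
    n = len(x)
    det = 1.0   # product of diagonal entries = |V|
    quad = 0.0  # (x-M)^T V^-1 (x-M) for a diagonal V
    for xi, mi, vi in zip(x, mean, var):
        det *= vi
        quad += (xi - mi) ** 2 / vi
    return (2 * math.pi) ** (-n / 2) / math.sqrt(det) * math.exp(-0.5 * quad)

def mixture_pdf(x, weights, means, variances):
    # P(X | C_j) = sum_k w_k G_k, with the weights summing to one
    return sum(w * gaussian_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))
```

Evaluating a one-component, one-dimensional mixture at its mean recovers the familiar 1/sqrt(2π) peak of the standard normal.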
Gaussian Mixture
Free parameters of the Gaussian mixture model consist of the means and covariance matrices of the Gaussian components, and the weights indicating the contribution of each Gaussian to the approximation of P(X | Cj).
Composition of Gaussian Mixture
[Figure: Class 1 modeled as five weighted Gaussian components, G1,w1 through G5,w5]
P(C_j | X) = P(X | C_j) P(C_j) / P(X)
P(X | C_j) = Σ_{k=1}^{N_c} w_k G_k
G_k = (1 / ((2π)^{d/2} |V_k|^{1/2})) exp(−(1/2) (X − μ_k)^T V_k^{-1} (X − μ_k))
Variables: μ_k, V_k, w_k
We use the EM (estimate-maximize) algorithm to approximate these variables.
Gaussian Mixture
These parameters are tuned using a complex iterative procedure called the estimate-maximize (EM) algorithm, which aims at maximizing the likelihood of the training set under the estimated PDF.
The likelihood function L for each class j can be defined as:
L_j = Π_{i=1}^{N_train} P(X_i | C_j)
ln(L_j) = Σ_{i=1}^{N_train} ln(P(X_i | C_j))
Gaussian Mixture Training Flow Chart (1)
1. Initialize the Gaussian means μi, i = 1, …, G, using the K means clustering algorithm.
2. Initialize the covariance matrices Vi using the distance to the nearest cluster.
3. Initialize the weights πi = 1/G, so that all Gaussians are equally likely.
4. Present each pattern X of the training set and model each of the classes as a weighted sum of Gaussians:
p(X) = Σ_{i=1}^{G} π_i p(X | G_i)
Where G is the number of Gaussians, the πi are the weights, and
p(X | G_i) = (1 / ((2π)^{d/2} |V_i|^{1/2})) exp(−(1/2) (X − μ_i)^T V_i^{-1} (X − μ_i))
Where Vi is the covariance matrix.
Gaussian Mixture Training Flow Chart (2)
1. Compute the responsibility of each Gaussian i for each pattern Xp:
τ_ip = P(G_i | X_p) = π_i p(X_p | G_i) / Σ_{j=1}^{G} π_j p(X_p | G_j)
2. Iteratively update the weights, means, and covariances:
π_i(t+1) = (1/N_c) Σ_{p=1}^{N_c} τ_ip(t)
μ_i(t+1) = (1 / (N_c π_i(t+1))) Σ_{p=1}^{N_c} τ_ip(t) X_p
V_i(t+1) = (1 / (N_c π_i(t+1))) Σ_{p=1}^{N_c} τ_ip(t) (X_p − μ_i(t+1)) (X_p − μ_i(t+1))^T
Where t indexes the training epoch and Nc is the number of training patterns of the class.
Gaussian Mixture Training Flow Chart (3)
Recompute τip using the new weights, means, and covariances. Stop training if
Σ_{i,p} |τ_ip(t+1) − τ_ip(t)| < threshold
or if the number of epochs reaches the specified value. Otherwise, continue the iterative updates.
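The EM loop can be sketched for the one-dimensional case, where means and variances are scalars; a full implementation would use vectors and covariance matrices. All function names are illustrative, and the small variance floor is an assumption added to keep the sketch numerically safe.

```python
import math

def gauss(x, mu, var):
    # 1-D Gaussian density
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_1d(data, mus, variances, weights, epochs=50, threshold=1e-6):
    """EM for a 1-D Gaussian mixture: the E-step computes the
    responsibilities tau_ip, the M-step re-estimates the parameters."""
    n = len(data)
    tau_prev = None
    for _ in range(epochs):
        # E-step: tau_ip = pi_i p(x_p|G_i) / sum_j pi_j p(x_p|G_j)
        tau = []
        for x in data:
            comp = [w * gauss(x, m, v)
                    for w, m, v in zip(weights, mus, variances)]
            s = sum(comp)
            tau.append([c / s for c in comp])
        # M-step: update weights, means, variances
        for i in range(len(mus)):
            resp = sum(t[i] for t in tau)
            weights[i] = resp / n
            mus[i] = sum(t[i] * x for t, x in zip(tau, data)) / resp
            variances[i] = max(
                sum(t[i] * (x - mus[i]) ** 2 for t, x in zip(tau, data)) / resp,
                1e-9)  # floor keeps the variance positive (assumption)
        # Stop when the responsibilities no longer change
        if tau_prev is not None and sum(
                abs(a - b)
                for tp, t in zip(tau_prev, tau)
                for a, b in zip(tp, t)) < threshold:
            break
        tau_prev = tau
    return weights, mus, variances
```

On data drawn from two well-separated groups, the two components converge to the group means with roughly equal weights.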
Gaussian Mixture Test Flow Chart
Present each input pattern X and compute the confidence for each class j:
P(C_j) P(X | C_j)
Where P(C_j) = N_cj / N is the prior probability of class Cj, estimated by counting the training patterns of each class. Classify pattern X as the class with the highest confidence.
Gaussian Mixture End
Thanks for your attention!!