Fuzzy Clustering
Presented by: Omid Sayadi*
Supervisor: Dr. Bagheri
* PhD student, Biomedical Image and Signal Processing Lab (BiSIPL), Department of Electrical Engineering, Sharif University of Technology
Spring 2008, Sharif University of Technology
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
Introduction
• Cluster
A number of similar individuals that occur together: two or more consecutive features that span a specific subspace of a concept.
or
A collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.
(Example: scatter plot of cars by weight (kg) vs. top speed (km/h), showing three clusters: lorries, sport cars, and medium-market cars.)
Introduction (cont.)
• Clustering
The process of grouping a set of physical or abstract objects into classes of similar objects.
"Clustering is the art of finding groups in data." (Kaufman & Rousseeuw)
Cluster analysis is an important human activity:
• distinguishing objects in early childhood,
• learning a new object or understanding a new phenomenon (feature extraction and comparison).
Introduction (cont.)
• Motivation
• Discovering hidden patterns and structures,
• Condensing large sets of data into a small number of meaningful groups (clusters),
• Dealing with a manageable number of homogeneous groups instead of a vast number of single data objects,
• Data reduction and information compaction.
Introduction (cont.)
• Clustering vs. Classification
• Clustering: unsupervised learning
• No class labels defined.
• Classification: supervised learning
• Predefined (a priori known) class labels,
• Training set (labeled) and test set.
Clustering is unsupervised classification, where no classes are predefined (labeled).
Introduction (cont.)
• Similarity measures
• Clustering:
• Max intra-cluster similarity,
• Min inter-cluster similarity.
Introduction (cont.)
• Similarity measure functions
• Minkowski
• Chebyshev
• Hamming
• Euclidean
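All four of these measures belong to one family: Hamming is the Minkowski distance with p = 1 (for binary vectors), Euclidean is p = 2, and Chebyshev is the p → ∞ limit. A minimal NumPy sketch, with function names of my own choosing:

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p between feature vectors x and y."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def hamming(x, y):
    """p = 1: city-block distance (Hamming distance for binary vectors)."""
    return minkowski(x, y, 1)

def euclidean(x, y):
    """p = 2: ordinary Euclidean distance."""
    return minkowski(x, y, 2)

def chebyshev(x, y):
    """p -> infinity limit: maximum coordinate difference."""
    return np.max(np.abs(x - y))
```

The choice of measure shapes the clusters an algorithm can find; FCM below, for instance, assumes the Euclidean case.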
Introduction (cont.)
• Clustering Approaches
• Hierarchical algorithms
• Find successive clusters using previously established clusters.
• Partitioning algorithms
• Construct various partitions and then evaluate them.
• Determine all clusters at once.
• Model-based algorithms
• Grid-based algorithms
• Density-based algorithms
Introduction (cont.)
• Hierarchical Clustering
• Create a hierarchical decomposition of the data set using some criterion and a termination condition.
• Divisive (top-down)
• Agglomerative (bottom-up)
Introduction (cont.)
• Divisive vs. Agglomerative
(Comparison diagram.)
Introduction (cont.)
• Partitional Clustering
• Given a database of N objects, partition the objects into a pre-specified number of K clusters (Liu, 1968).
• The clusters are formed to optimize a similarity function (max intra-cluster similarity and min inter-cluster similarity).
• Popular partitioning algorithms:
• K-means
• EM (Expectation Maximization)
Number of clustering ways:
$M(N, K) = \frac{1}{K!} \sum_{i=0}^{K} (-1)^i \binom{K}{i} (K - i)^N$
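This count is the Stirling number of the second kind, and it grows explosively with N, which is why exhaustively evaluating all partitions is hopeless. A direct Python evaluation of the formula (helper name is mine):

```python
from math import comb, factorial

def num_partitions(N, K):
    """M(N, K) = (1/K!) * sum_{i=0}^{K} (-1)^i * C(K, i) * (K - i)^N:
    the number of ways to split N objects into K nonempty clusters."""
    total = sum((-1) ** i * comb(K, i) * (K - i) ** N for i in range(K + 1))
    return total // factorial(K)  # the alternating sum is divisible by K!
```

Already for N = 25 objects and K = 5 clusters the count exceeds 10^15.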
Introduction (cont.)
• Challenges
• Hierarchical algorithms
• The tree of clusters (dendrogram) needs a termination criterion to be satisfied → dendrogram cutting,
• Agglomerative or divisive choice,
• Irreversible splits and merges.
• Partitioning algorithms
• Pre-selection of the number of clusters (K).
Introduction (cont.)
• K-means algorithm
• Given the number of clusters (K), partition the objects (randomly) into K nonempty subsets.
• While new assignments occur, do:
• Compute seed points as the centroids (virtual mean points) of the clusters of the current partition.
• Assign each object to the cluster with the nearest seed point.
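A compact NumPy sketch of this loop; two simplifications are mine: Euclidean distance, and K distinct data points as initial seeds (a common variant of the random initial partition described above):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Alternate (1) nearest-seed assignment and (2) centroid
    recomputation until no new assignments occur."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # initial seeds
    labels = None
    for _ in range(max_iter):
        # assign each object to the cluster with the nearest seed point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                              # no new assignments: converged
        labels = new_labels
        # recompute seed points as the centroids (virtual mean points),
        # keeping the old seed if a cluster happens to become empty
        centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else centroids[k] for k in range(K)])
    return labels, centroids
```

Each object ends up in exactly one cluster, which is precisely the limitation the fuzzy methods below relax.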
Introduction (cont.)
• K-means example
(Three scatter plots over the unit square showing successive K-means iterations.)
Problem: equal distance to centroids!
Introduction (cont.)
• Taxonomy of Clustering Approaches (figure)
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
Problem Statement
• HCM (K-means) Formulation
• Set of data in the feature space: $X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$
• $C_i$: the $i$-th cluster
• $\bigcup_{i=1}^{c} C_i = U$ : all clusters together fill the whole universe $U$.
• $C_i \cap C_j = \emptyset$ for all $i \neq j$ : clusters do not overlap.
• $\emptyset \subset C_i \subset U$ for all $1 \le i \le c$ : a cluster is never empty and is smaller than the whole universe $U$.
• $2 \le c \le K$ : there must be at least 2 clusters in a c-partition and at most as many as the number of data points $K$.
Problem Statement (cont.)
• K-means Failures
• The objective function in classical clustering:
$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \Big( \sum_{k,\ \vec{u}_k \in C_i} \| \vec{u}_k - \vec{c}_i \|^2 \Big)$
(minimize the total sum of all distances).
• Each datum must be assigned to exactly one cluster.
• The problem of data points that are equally distant.
Problem Statement (cont.)
• Equi-distant data points
• Butterfly data points (Ruspini's butterfly, 1969)
(Two scatter plots over the unit square: equidistant data points, and the butterfly data set.)
Problem Statement (cont.)
• Towards Fuzzy Clustering
• We need to support uncertainty → each datum can belong to multiple clusters, with varying degrees of membership.
• The space is partitioned into overlapping groups.
(Plot over the unit square contrasting crisp clusters with fuzzy clusters.)
Problem Statement (cont.)
• Fuzzy C-Partition Formulation
• Set of data in the feature space: $X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$
• $C_i$: the $i$-th cluster
• $\bigcup_{i=1}^{c} C_i = U$ : all clusters together fill the whole universe $U$.
• $C_i \cap C_j = \emptyset$ or $\neq \emptyset$ for $i \neq j$ : clusters may overlap.
• $\emptyset \subset C_i \subset U$ for all $1 \le i \le c$ : a cluster is never empty and is smaller than the whole universe $U$.
• $2 \le c \le K$ : there must be at least 2 clusters in a c-partition and at most as many as the number of data points $K$.
Problem Statement (cont.)
• Fuzzy Clustering Types
• Hard clustering becomes fuzzy clustering by omitting the non-overlapping condition. Two families:
• Probabilistic fuzzy clustering (Bezdek, 1981),
• Possibilistic fuzzy clustering (Krishnapuram & Keller, 1993).
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering
• A constrained optimization.
• The membership degree of a datum $\vec{x}_j$ to the $i$-th cluster: $u_{ij} = \mu_{C_i}(\vec{x}_j) \in [0, 1]$
• Fuzzy label vector of each data point $\vec{x}_j$: $\vec{u}_j = (u_{1j}, \ldots, u_{cj})^T$
• Partition matrix: $U = [u_{ij}]_{c \times n} = (\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n)$
• Objective function: $J_f(X, U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \, d_{ij}^2$
• Constraints:
• $\sum_{j=1}^{n} u_{ij} > 0 \ \ \forall i \in \{1, \ldots, c\}$ (no empty cluster),
• $\sum_{i=1}^{c} u_{ij} = 1 \ \ \forall j \in \{1, \ldots, n\}$ (normalization constraint).
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• $m$ determines the "fuzziness" of the clustering:
$J_f(X, U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \, d_{ij}^2$
where $d_{ij}$ is the distance between datum $\vec{x}_j$ and cluster $i$, and $m \ge 1$ is the fuzzifier exponent (usually $m = 2$).
• $m \to 1$: more crisp clustering,
• $m \to \infty$: more fuzzy clustering.
(Example membership plots for m = 1.1 and m = 2.)
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• The cost function $J_f$ cannot be minimized directly, hence an alternating optimization (AO) scheme must be used.
• The iterative algorithm:
• First, the membership degrees are optimized for fixed cluster parameters: $U_t = j_U(C_{t-1}), \ t > 0$
• Then, the cluster parameters are optimized for fixed membership degrees: $C_t = j_C(U_t)$
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• Minimization result: the update formula for the membership degree is
$u_{ij}^{(t+1)} = \dfrac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}$
(the gravitation to cluster $i$ relative to the total gravitation).
• It depends not only on the distance of the datum $\vec{x}_j$ to cluster $i$, but also on the distances between this data point and the other clusters.
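Over a whole (c × n) distance matrix this update is essentially a one-liner; a sketch, assuming all distances are nonzero (in practice zero distances are clamped to a tiny value):

```python
import numpy as np

def update_memberships(d, m=2.0):
    """Probabilistic update: u_ij = d_ij^(-2/(m-1)) / sum_l d_lj^(-2/(m-1)).
    d is a (c, n) matrix of nonzero datum-to-cluster distances."""
    w = d ** (-2.0 / (m - 1.0))              # "gravitation" towards each cluster
    return w / w.sum(axis=0, keepdims=True)  # normalize per data point (column)
```

For example, with m = 2 and distances 1 and 3 to two clusters, the memberships come out 0.9 and 0.1: each column sums to 1, as the normalization constraint requires.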
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• What about the cluster prototypes (C)?
• They are algorithm dependent, i.e. they depend on:
• the describing parameters of the cluster (location, shape, size),
• the distance measure d.
• Problem: lack of typicality
• The normalization constraint causes the clusters to tend towards the outliers.
• No difference between x1 and x2 (membership 0.5 for both).
Problem Statement (cont.)
• Possibilistic Fuzzy Clustering
• Idea: drop the normalization condition of probabilistic fuzzy clustering.
• Remaining constraint: $\sum_{j=1}^{n} u_{ij} > 0 \ \ \forall i \in \{1, \ldots, c\}$ (no empty cluster).
• The cost function to be minimized:
$J_f(X, U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \, d_{ij}^2 + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - u_{ij})^m$
The second term is a penalty which forces the membership degrees away from zero.
Problem Statement (cont.)
• Possibilistic Fuzzy Clustering (cont.)
• $\eta_i > 0$: used to balance the contrary objectives expressed in the two terms of $J_f$.
• Minimization result:
$u_{ij} = \dfrac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}$
• Result: the membership degree of a datum $\vec{x}_j$ to cluster $i$ depends only on its distance to that cluster.
Problem Statement (cont.)
• More about $\eta_i$
• Let $m = 2$ in the previous update equation. If $\eta_i$ equals $d_{ij}^2$, then $u_{ij} = 0.5$. Hence $\eta_i$ determines the distance to cluster $i$ at which the membership degree equals 0.5.
• The permitted extension of the cluster can be controlled by this parameter.
• $\eta_i$ can be estimated by the fuzzy intra-class distance from a probabilistic fuzzy clustering model:
$\eta_i = \dfrac{\sum_{j=1}^{n} u_{ij}^m \, d_{ij}^2}{\sum_{j=1}^{n} u_{ij}^m}$
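Given the partition matrix and distances from a probabilistic run, this estimate is a single weighted average per cluster; a sketch (helper name is mine):

```python
import numpy as np

def estimate_eta(U, d, m=2.0):
    """Fuzzy intra-class distance per cluster:
    eta_i = sum_j u_ij^m d_ij^2 / sum_j u_ij^m, with U and d of shape (c, n)."""
    um = U ** m
    return (um * d ** 2).sum(axis=1) / um.sum(axis=1)
```

With uniform memberships this reduces to the plain mean of the squared distances, which matches the intuition of $\eta_i$ as a squared cluster "bandwidth".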
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
FC algorithms
• Major algorithms
• Fuzzy c-means (FCM)
• Possibilistic c-means (PCM)
• Gustafson-Kessel (GK)
• Assumptions:
• Input: data matrix ($X_{p \times n}$) and number of clusters ($c$).
• Output: cluster centers ($C$) and fuzzy partition matrix ($U$).
• Cluster centers are initialized randomly in all algorithms.
FC algorithms (cont.)
• FCM
• A probabilistic fuzzy clustering approach,
• Finds c spherical clusters → the cluster prototype is the cluster center (C),
• The found clusters are approximately the same size,
• Distance measure: Euclidean distance,
• According to the objective function $J_f$, the cluster prototype is updated as:
$\vec{c}_i^{\,(t+1)} = \dfrac{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m \vec{x}_j}{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m}$
FC algorithms (cont.)
• FCM algorithm
• Repeat until $\| U^{(t+1)} - U^{(t)} \| < \varepsilon$ or $\| C^{(t+1)} - C^{(t)} \| < \varepsilon$:
• Compute distances.
• Compute membership values (partition matrix):
$u_{ij}^{(t+1)} = \dfrac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}, \quad i = 1, \ldots, c, \ j = 1, \ldots, N$
• Compute cluster centers:
$\vec{c}_i^{\,(t+1)} = \dfrac{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m \vec{x}_j}{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m}, \quad i = 1, \ldots, c$
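Putting the two update steps together gives the whole algorithm in a few lines. A sketch under simplifications of mine: random data points as initial centers, zero distances clamped to avoid division by zero, and the stopping test applied to the partition matrix only:

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means: alternate the membership and center updates
    until the partition matrix changes by less than eps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    U = np.zeros((c, len(X)))
    for _ in range(max_iter):
        d = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # clamp to avoid division by 0
        w = d ** (-2.0 / (m - 1.0))
        U_new = w / w.sum(axis=0, keepdims=True)  # membership update
        um = U_new ** m
        centers = (um @ X) / um.sum(axis=1, keepdims=True)  # center update
        done = np.abs(U_new - U).max() < eps
        U = U_new
        if done:
            break
    return U, centers
```

On well-separated data the hard labels obtained by taking each column's maximum membership reproduce the K-means partition, while the memberships themselves stay graded near cluster boundaries.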
FC algorithms (cont.)
• FCM (cont.)
• The probabilistic FCM is widely used as an initializer for other clustering methods.
• It is a fast, reliable and stable method.
• In practice, FCM is not likely to get stuck in local minima.
• But it has problems:
• Lack of typicality,
• Sensitivity to outliers.
Solution → PCM
FC algorithms (cont.)
• PCM algorithm
• Repeat until $\| U^{(t+1)} - U^{(t)} \| < \varepsilon$ or $\| C^{(t+1)} - C^{(t)} \| < \varepsilon$:
• Compute distances.
• Compute membership values (partition matrix), different from FCM:
$u_{ij} = \dfrac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}, \quad i = 1, \ldots, c, \ j = 1, \ldots, N$
• Compute cluster centers, the same as FCM:
$\vec{c}_i^{\,(t+1)} = \dfrac{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m \vec{x}_j}{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m}, \quad i = 1, \ldots, c$
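Since the center update is shared with FCM, the only piece that changes in code is the membership step; a sketch (eta is the vector of cluster bandwidths $\eta_i$):

```python
import numpy as np

def pcm_memberships(d, eta, m=2.0):
    """Possibilistic update: u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))).
    d is (c, n); eta is a length-c vector. Columns need NOT sum to 1:
    each membership depends only on the distance to its own cluster."""
    return 1.0 / (1.0 + (d ** 2 / eta[:, None]) ** (1.0 / (m - 1.0)))
```

With m = 2 and $d_{ij}^2 = \eta_i$ the membership is exactly 0.5, matching the interpretation of $\eta_i$ given earlier.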
FC algorithms (cont.)
• FCM vs. PCM
• PCM solves the problems of FCM, but we face new problems: cluster coincidence and cluster repulsion.
(Comparison figure: FCM on the left, PCM on the right.)
FC algorithms (cont.)
• GK
• Problem of FCM and PCM: only spherical clusters.
• In GK, each cluster is characterized by its center and covariance matrix: $C_i = \{\vec{c}_i, \Sigma_i\}, \ i = 1, \ldots, c$.
• GK finds ellipsoidal clusters of approximately the same size.
• Clusters adapt themselves to the shape and location of the data, thanks to the covariance matrix.
• Cluster size can be controlled by $\det(\Sigma_i)$; usually $\det(\Sigma_i) = 1$.
FC algorithms (cont.)
• GK (cont.)
• The (scaled) Mahalanobis distance is used in GK:
$d^2(\vec{x}_j, C_i) = \det(\Sigma_i)^{1/p} \, (\vec{x}_j - \vec{c}_i)^T \Sigma_i^{-1} (\vec{x}_j - \vec{c}_i)$
• Each cluster has its own size and shape,
• The algorithm is locally adaptive,
• We need an update equation for the covariance matrix to minimize the objective function (either probabilistic or possibilistic):
$\Sigma_i^{(t+1)} = \dfrac{\sum_{j=1}^{n} u_{ij}^{(t+1)} \, (\vec{x}_j - \vec{c}_i^{\,(t+1)})(\vec{x}_j - \vec{c}_i^{\,(t+1)})^T}{\sum_{j=1}^{n} u_{ij}^{(t+1)}}$
FC algorithms (cont.)
• GK algorithm
• Repeat until $\| U^{(t+1)} - U^{(t)} \| < \varepsilon$ or $\| C^{(t+1)} - C^{(t)} \| < \varepsilon$:
• Compute distances.
• Compute membership values (partition matrix), probabilistic or possibilistic:
$u_{ij}^{(t+1)} = \dfrac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{c} d_{lj}^{-2/(m-1)}}$ or $u_{ij} = \dfrac{1}{1 + \left( d_{ij}^2 / \eta_i \right)^{1/(m-1)}}$
• Compute cluster centers and cluster covariance matrices:
$\vec{c}_i^{\,(t+1)} = \dfrac{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m \vec{x}_j}{\sum_{j=1}^{n} \big(u_{ij}^{(t+1)}\big)^m}, \qquad \Sigma_i^{(t+1)} = \dfrac{\sum_{j=1}^{n} u_{ij}^{(t+1)} \, (\vec{x}_j - \vec{c}_i^{\,(t+1)})(\vec{x}_j - \vec{c}_i^{\,(t+1)})^T}{\sum_{j=1}^{n} u_{ij}^{(t+1)}}$
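The two GK-specific pieces, the scaled Mahalanobis distance and the fuzzy covariance update, can be sketched as below. One assumption of mine: the covariance is weighted by the raw memberships as on the slide (many references raise them to the fuzzifier m), and is rescaled to enforce $\det(\Sigma_i) = 1$:

```python
import numpy as np

def gk_distances(X, centers, covs):
    """d_ij^2 = det(Sigma_i)^(1/p) (x_j - c_i)^T Sigma_i^{-1} (x_j - c_i),
    for X of shape (n, p). Returns a (c, n) matrix of squared distances."""
    p = X.shape[1]
    d2 = np.empty((len(centers), len(X)))
    for i, (c, S) in enumerate(zip(centers, covs)):
        diff = X - c
        d2[i] = (np.linalg.det(S) ** (1.0 / p)
                 * np.einsum('jk,kl,jl->j', diff, np.linalg.inv(S), diff))
    return d2

def gk_covariances(X, U, centers):
    """Membership-weighted covariance per cluster, rescaled so det = 1."""
    p = X.shape[1]
    covs = []
    for i, c in enumerate(centers):
        diff = X - c
        S = (U[i][:, None, None] * (diff[:, :, None] * diff[:, None, :])
             ).sum(axis=0) / U[i].sum()
        covs.append(S / np.linalg.det(S) ** (1.0 / p))  # enforce det(Sigma) = 1
    return covs
```

With $\Sigma_i$ equal to the identity, the GK distance reduces to the squared Euclidean distance, so FCM is recovered as a special case.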
FC algorithms (cont.)
• FCM vs. GK
(Comparison figure: FCM on the left, GK on the right.)
FC algorithms (cont.)
• Other non-point-prototype clustering models
• Shell clustering algorithms are used for segmentation and for the detection of special geometrical contours.
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
FC applications
• Typical Applications
• Fuzzy Inference Systems (FIS),
• Image processing,
• Pattern recognition,
• Machine learning,
• Data mining,
• Social network analysis,
• and more.
FC applications (cont.)
• FIS
• The fuzzy inference mechanism is summarized in a block diagram (figure).
• Q1: Where do the membership functions come from?
• Q2: How are the if-then rules extracted from data?
FC applications (cont.)
• FIS from Fuzzy Clustering
• Clustering data in:
• the input-output feature space,
• the input and output spaces separately,
• the output space (inducing clusters in the inputs).
• Obtain membership functions by:
• projection onto the variables,
• parametrization of the membership function.
• Extract one rule per cluster.
• Usually FCM + a Mamdani FIS is used.
FC applications (cont.)
• FIS from Fuzzy Clustering (cont.)
FC applications (cont.)
• FIS from Fuzzy Clustering (cont.)
FC applications (cont.)
• The same idea for Ruspini's butterfly
(Membership plots for m = 1.25 and m = 2.)
FC applications (cont.)
• Biomedical applications
• Tumor detection and extraction (cancer, mammography, ...).
• Image segmentation (MRI images, cephalic radiography, ...).
FC applications (cont.)
• Tumor detection
(Comparison figure: crisp, adaptive, and fuzzy methods.)
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
Conclusion
• In summary
• The ability to cluster data (concepts, perceptions, etc.) is an essential feature of human intelligence.
• The main idea of FC is to partition data into overlapping groups based on the similarity amongst patterns.
• The result of clustering is a set of clusters, cluster centers, and a matrix containing the membership degrees.
• FCM yields spherical clusters, but is confused by equally distant data objects.
• PCM does not have the normalization constraint, but suffers from cluster coincidence or cluster repulsion.
Conclusion (cont.)
• Summary (cont.)
• GK uses the covariance matrix, hence it yields ellipsoidal clusters.
• The algorithms incorporate a fuzziness exponent (the fuzzifier m) which determines how fuzzy the resulting partition is.
Conclusion (cont.)
• Summary (cont.)
• FCM is widely used as an initializer for other clustering methods.
• FC methods are widely used in fuzzy membership function generation to model fuzzy rule bases and inference systems.
• Compared to classical (crisp) clustering, FC methods show more efficiency in many applications.
Discussion
• Related issues
• Number of clusters (c):
• Yang Shanlin & Malay proved that c ≤ √n.
• Elbow criterion: define a validity measure and evaluate it for different numbers of clusters to find the optimum point (the elbow), where adding another cluster no longer adds sufficient information.
Discussion (cont.)
• Related issues (cont.)
Discussion (cont.)
• Shape of membership function
• Semantically, fuzzy sets are required to be convex, monotonic, and of limited support.
• Does PCM lead to convex membership functions?
• We should choose another cluster estimation to obtain proper clusters, with the flexibility to choose the support of the membership functions.
• Does FCM not satisfy the above conditions?
Discussion (cont.)
• Shape of membership function (cont.)
• A typical approach: triangular fuzzy membership functions.
(FCM example figure.)
References
• Journal papers:
• C. Döring, M.-J. Lesot, and R. Kruse, "Data Analysis with Fuzzy Clustering Methods", Computational Statistics & Data Analysis, 2006.
• A. Baraldi and P. Blonda, "A Survey of Fuzzy Clustering Algorithms for Pattern Recognition, Parts I and II", IEEE Trans. Systems, Man and Cybernetics, vol. 29, no. 6, 1999.
• J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
• A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, vol. 31, no. 3, 1999.
• Thesis:
• A. I. Shihab, Fuzzy Clustering Algorithms and Their Application to Medical Image Analysis, PhD thesis, University of London, 2000.