k-medoid clustering with genetic algorithm
Wei-Ming Chen, 2012.12.06
Outline
k-medoids clustering
Famous works
GCA: clustering with the aid of a genetic algorithm
Clustering genetic algorithm: also judges the number of clusters
Conclusion
What is k-medoid clustering?
Proposed in 1987 (L. Kaufman and P.J. Rousseeuw)
There are N points in the space
k points are chosen as centers (medoids)
The other points are classified into k groups
Which k points should be chosen to minimize the sum of the distances from each point to its medoid?
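The objective above can be written down directly. The following is a minimal sketch, assuming Euclidean distance and points stored as tuples; the name `total_cost` and the toy data are illustrative and not from the slides.

```python
import math

def total_cost(points, medoid_idx):
    """Sum, over all points, of the distance to the nearest chosen medoid."""
    return sum(
        min(math.dist(p, points[m]) for m in medoid_idx)
        for p in points
    )

# Toy data (assumed): two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(total_cost(points, [0, 2]))  # → 2.0 (one medoid per pair)
print(total_cost(points, [0, 1]))  # larger: both medoids sit in the same pair
```

Every candidate solution in the rest of the talk is a choice of `medoid_idx`, and this sum is what the algorithms try to reduce.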
Difficulty
NP-hard
Genetic algorithms can be applied
Partitioning Around Medoids (PAM)
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Group N data into k sets
In every generation, examine every pair (Oi, Oj), where Oi is a medoid and Oj is not; if replacing Oi by Oj would reduce the total distance, replace Oi by Oj
Computation time: O(k(N-k)^2) [one generation]
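The swap rule described above can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation from the book: it uses Euclidean distance, accepts the first improving swap it finds, and recomputes the full cost for every candidate pair for clarity, so it is slower than the bound quoted on the slide. The names `pam_generation` and `pam` are hypothetical.

```python
import math

def total_cost(points, medoids):
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam_generation(points, medoids):
    """One pass over all (medoid, non-medoid) pairs, accepting improving swaps."""
    medoids = list(medoids)
    best = total_cost(points, medoids)
    improved = False
    for pos in range(len(medoids)):          # Oi: position of a current medoid
        for j in range(len(points)):         # Oj: a non-medoid candidate
            if j in medoids:
                continue
            candidate = medoids[:pos] + [j] + medoids[pos + 1:]
            cost = total_cost(points, candidate)
            if cost < best:
                medoids, best, improved = candidate, cost, True
    return medoids, best, improved

def pam(points, medoids):
    """Repeat generations until no swap reduces the total distance."""
    improved = True
    while improved:
        medoids, cost, improved = pam_generation(points, medoids)
    return sorted(medoids), total_cost(points, medoids)

# Toy data (assumed): two groups of three points, bad starting medoids.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(pam(points, [1, 2]))  # → ([0, 3], 4.0)
```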
Clustering LARge Applications (CLARA)
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Reduces the calculation time
Only select s data from the original N data
s = 40 + 2k seems a good choice
Computation time: O(ks^2 + k(N-k)) [one generation]
Clustering Large Applications based upon RANdomized Search (CLARANS)
Ng, R., & Han, J. (1994). Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th international conference on very large databases, Santiago, Chile (pp. 144–155).
Do not try all pairs (Oi, Oj)
Try max(0.0125(k(N-k)), 250) different Oj for each Oi
Computation time: O(N^2) [one generation]
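As a small worked example of the bound quoted above, the sketch below evaluates it for two problem sizes. Rounding to an integer is an assumption made here, and the function name `max_neighbors` is illustrative.

```python
def max_neighbors(n_points, k):
    """Number of candidates tried, following max(0.0125 * k * (N - k), 250)."""
    return max(round(0.0125 * k * (n_points - k)), 250)

print(max_neighbors(1000, 15))   # → 250  (0.0125 * 15 * 985 is about 184.7, so the minimum of 250 applies)
print(max_neighbors(10000, 15))  # → 1872 (0.0125 * 15 * 9985 is about 1872.2)
```

For the N = 1000, k = 15 setting used later in the talk, the 250 lower limit is the active term.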
GCA
Lucasius, C. B., Dane, A. D., & Kateman, G. (1993). On k-medoid clustering of large data sets with the aid of a genetic algorithm: Background, feasibility and comparison. Analytica Chimica Acta, 282, 647–669.
Chromosome encoding
N data, clustered into k groups
Problem size = k (the number of groups)
Each location of the string is an integer (1~N) (a medoid)
Initialization
Each string in the population uniquely encodes a candidate solution of the target problem
The candidates are chosen randomly
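The encoding and random initialization can be sketched as below. This assumes that the k medoids inside one string are distinct and uses 1-based indices as on the slides; `random_chromosome`, `init_population`, and the population size of 5 are illustrative choices, not values from the paper.

```python
import random

def random_chromosome(n_points, k, rng):
    """k distinct medoid indices drawn from 1..N."""
    return rng.sample(range(1, n_points + 1), k)

def init_population(pop_size, n_points, k, seed=0):
    rng = random.Random(seed)
    return [random_chromosome(n_points, k, rng) for _ in range(pop_size)]

# N = 1000 and k = 15 match the experiment setting quoted later in the talk.
population = init_population(pop_size=5, n_points=1000, k=15)
print(len(population), len(population[0]))  # → 5 15
```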
Selection
Select the M worst individuals in the population and throw them out
Crossover
Select some individuals to reproduce M new individuals
Building-block-like crossover
Mutation
Crossover
For example, k = 3, p1 = 2 3 7, p2 = 4 8 2 (the subscript after each element marks its parent)
1. Mix p1 and p2: Q = 2_1 3_1 7_1 4_2 8_2 2_2
   Randomly scramble: Q = 4_2 2_2 2_1 8_2 7_1 3_1
2. Add new material (the first k elements may be changed): Q = 5 2_2 7 8_2 7_1 3_1
3. Randomly scramble again: Q = 2_2 7_1 7 3_1 5 8_2
4. The offspring are selected from the left or from the right: C1 = 2 7 3, C2 = 8 5 3
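The four steps can be sketched as follows. This is a rough illustration, not the operator as specified in the paper: the probability `p_new` of replacing an element, the rule that new material is drawn from medoids not already in Q, and the function names are all assumptions made here. With the parent subscripts dropped, the slide's final string reads Q = 2 7 7 3 5 8, and step 4 is reproduced exactly.

```python
import random

def take_distinct(seq, k):
    """First k distinct values encountered while scanning seq."""
    out = []
    for x in seq:
        if x not in out:
            out.append(x)
        if len(out) == k:
            break
    return out

def gca_crossover(p1, p2, n_points, rng, p_new=0.1):
    k = len(p1)
    q = list(p1) + list(p2)             # 1. mix both parents
    rng.shuffle(q)                      #    and scramble
    for i in range(k):                  # 2. new material in the first k slots
        if rng.random() < p_new:
            unused = [m for m in range(1, n_points + 1) if m not in q]
            q[i] = rng.choice(unused)
    rng.shuffle(q)                      # 3. scramble again
    c1 = take_distinct(q, k)            # 4. read k distinct medoids from the left
    c2 = take_distinct(reversed(q), k)  #    and from the right
    return c1, c2

# Step 4 on the slide's final string.
q = [2, 7, 7, 3, 5, 8]
print(take_distinct(q, 3))            # → [2, 7, 3]
print(take_distinct(reversed(q), 3))  # → [8, 5, 3]
```

Skipping duplicates while reading from either end is what keeps each child a valid set of k different medoids, even though Q may contain the same medoid twice.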
Experiment
Under the limit of NFE < 100000
N = 1000, k = 15
Experiment
GCA versus Random search
Experiment
GCA versus CLARA (k = 15)
Experiment
GCA versus CLARA (k = 50)
Paper’s conclusion
GCA can handle both large and small values of k
GCA outperforms CLARA, especially when k is large
GCA lends itself excellently to parallelization
GCA can be combined with CLARA to obtain a hybrid search system with better performance
Motivation
In some cases, we do not actually know the number of clusters
If we only know the upper limit?
Hruschka, E.R. and N.F.F. Ebecken. (2003). “A Genetic Algorithm for Cluster Analysis.” Intelligent Data Analysis 7, 15–25.
Fitness function
a(i): the average distance of individual i to the other individuals in the same cluster
d(i, C): the average distance of individual i to the individuals in a different cluster C
b(i): the smallest of d(i, C) over all other clusters C
Fitness function
Silhouette of individual i: $s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$
Silhouette fitness = the average of s(i) over all individuals
This value will be high when:
a(i) values are small
b(i) values are high
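A silhouette-style fitness can be sketched as below. The sketch assumes the standard silhouette definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all individuals, together with Euclidean distance and at least two clusters. Giving a singleton cluster s(i) = 0 is a convention assumed here, and the function name is illustrative.

```python
import math

def silhouette_fitness(points, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all individuals."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def avg_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        if len(clusters[lab]) == 1:
            continue  # assumed convention: a singleton contributes s(i) = 0
        a = avg_dist(i, clusters[lab])                       # own cluster
        b = min(avg_dist(i, members)                         # nearest other cluster
                for other, members in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (assumed): the natural grouping scores higher than a mixed one.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_fitness(points, [1, 1, 2, 2])
bad = silhouette_fitness(points, [1, 2, 1, 2])
print(good > bad)  # → True
```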
Chromosome encoding
N data, clustered into at most k groups
Problem size = N + 1
Each location of the string is an integer (1~k) (the cluster the individual belongs to); the last location is the number of clusters
Genotype1: 22345123453321454552 5
To avoid the following problem:
Genotype2: 2|2222|111113333344444 4
Genotype3: 4|4444|333335555511111 4
Child2: 2 4444 111113333344444 4
Child3: 4 2222 333335555511111 5
Consistent Algorithm: 11234512342215343441 5
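The relabeling shown in the last line is consistent with numbering clusters in order of first appearance. The sketch below assumes that reading; `make_consistent` and `from_string` are illustrative names. It reproduces the slide's result for Genotype1 and shows that Genotype2 and Genotype3, which describe the same partition, become identical.

```python
def make_consistent(genotype):
    """Relabel clusters 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    relabeled = []
    for label in genotype:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
        relabeled.append(mapping[label])
    return relabeled, len(mapping)

def from_string(s):
    return [int(ch) for ch in s]

labels, n_clusters = make_consistent(from_string("22345123453321454552"))
print("".join(map(str, labels)), n_clusters)  # → 11234512342215343441 5

g2, _ = make_consistent(from_string("22222111113333344444"))
g3, _ = make_consistent(from_string("44444333335555511111"))
print(g2 == g3)  # → True
```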
Initialization
Population size = 20
The first genotype represents two clusters, the second genotype represents three clusters, the third genotype represents four clusters, . . . , and the last one represents 21 clusters
Selection
roulette wheel selection
normalize to
Crossover
Uniform crossover does not work
Use the Grouping Genetic Algorithm (GGA), proposed by Falkenauer (1998)
First, two strings are selected
A: 1123245125432533424
B: 1212332124423221321
Randomly select groups to preserve in A (for example, groups 2 and 3)
Crossover
A: 1123245125432533424
B: 1212332124423221321
C: 0023200020032033020
Check the unchanged groups in B and place them in C
C: 0023200024432033020
Another child: formed by the groups in B (without those actually placed in C)
D: 1212332120023221321
Crossover
A: 1123245125432533424
B: 1212332124423221321
C: 0023200024432033020
Another child: formed by the groups in B (without those actually placed in C)
D: 1212332120023221321
Check the unchanged groups in A and place them in D
The other objects (whose alleles are zeros) are placed into the nearest cluster
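The group-based crossover on these slides can be sketched as below. This is an illustration under assumptions, not the operator as specified in the paper: a donor group counts as "unchanged" when all of its positions are still 0 in the child, a clashing label is bumped to the next free integer, and the final step (placing the remaining zeros into the nearest cluster) is left out because it needs the data. All function names are hypothetical.

```python
def group_members(genotype):
    """Map each non-zero label to the positions that carry it."""
    groups = {}
    for pos, label in enumerate(genotype):
        if label != 0:
            groups.setdefault(label, []).append(pos)
    return groups

def inherit(base, donor):
    """Copy into base every donor group whose positions are all unassigned (0)."""
    child = list(base)
    used = set(child) - {0}
    placed = []
    for label, members in group_members(donor).items():
        if all(child[pos] == 0 for pos in members):
            new_label = label
            while new_label in used:   # avoid clashing with a label already in the child
                new_label += 1
            used.add(new_label)
            for pos in members:
                child[pos] = new_label
            placed.append(label)
    return child, placed

def gga_crossover(a, b, keep_from_a):
    c = [x if x in keep_from_a else 0 for x in a]  # keep the chosen groups of A
    c, placed = inherit(c, b)                      # add B's untouched groups
    d = [0 if x in placed else x for x in b]       # B without the groups given to C
    d, _ = inherit(d, a)                           # add A's groups that still fit
    return c, d

def to_str(genotype):
    return "".join(map(str, genotype))

a = [int(ch) for ch in "1123245125432533424"]
b = [int(ch) for ch in "1212332124423221321"]
c, d = gga_crossover(a, b, keep_from_a={2, 3})
print(to_str(c))  # → 0023200024432033020
print(to_str(d))  # → 1212332120023221321
```

In this example only group 4 of B fits into C, and no group of A fits into D, which matches the strings shown on the slides.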
Mutation
Two ways of mutation
1. Randomly choose a group and place all of its objects into the remaining cluster that has the nearest centroid
2. Divide a randomly selected group into two new ones
Just change the genotypes in the smallest possible way
Experiment
4 test problems (N = 75, 200, 699, 150)
Experiment
Ruspini data (N = 75)
Paper’s conclusion
Does not need to know the number of groups
Successfully finds the answers to four different test problems
Only with a small population size
Conclusion
Genetic algorithms are an acceptable method for clustering problems
The crossover needs to be designed carefully
Maybe EDAs can be applied
Some theses? Or final projects!