objective of clustering - the university of edinburgh · fuzzy and soft k-means clustering...
Objective of clustering
• Discover structures and patterns in high-dimensional
data.
• Group data with similar patterns together.
• This reduces the complexity and facilitates
interpretation.
[Figure sequence: scatter plot with the expression level of gene A on one axis and the expression level of gene B on the other. Successive plots label two groups, "Malignant tumour" and "Benign tumour"; the final plot adds a point marked "?".]
[Figure: scatter plot with the expression level under starvation on one axis and the expression level under heat shock on the other. Two groups are labelled "Genes involved in pathway A" and "Genes involved in pathway B"; one point is marked "?".]
How shall we cluster the data?
Good clustering
Bad clustering
Within-group variance: sum of squared vector-centroid distances.
Between-group variance: sum of squared distances between centroids.
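The two quantities can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the function names and the four example points are hypothetical, and the between-group term is taken literally as a sum over all pairs of centroids (other conventions weight by cluster size).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def within_group_variance(clusters):
    """Sum of squared distances between each vector and its cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(squared_distance(p, c) for p in points)
    return total


def between_group_variance(clusters):
    """Sum of squared distances over all pairs of centroids (assumed convention)."""
    cents = [centroid(points) for points in clusters]
    return sum(
        squared_distance(cents[k], cents[l])
        for k in range(len(cents))
        for l in range(k + 1, len(cents))
    )


# Hypothetical data: the same four points grouped in two different ways.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
diffuse = [[(0.0, 0.0), (4.0, 0.0)], [(0.0, 1.0), (4.0, 1.0)]]

print(within_group_variance(tight), between_group_variance(tight))      # 1.0 16.0
print(within_group_variance(diffuse), between_group_variance(diffuse))  # 16.0 1.0
```

The first grouping is tight and well separated; the second is diffuse with centroids close together.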
Tight clusters: within-group variance small.
Diffuse clusters: within-group variance large.
Clusters far apart: between-group variance large.
Clusters close together: between-group variance small.
How to cluster the data
• Minimize the within-group variance
→ Tight clusters
• Maximize the between-group variance
→ Clusters well separated
• Problem NP-hard
→ Heuristic algorithms and approximations are needed.
K-means clustering
• Objective: Partition the data into a predefined number
of clusters, K.
• Method: Alternately update
– the cluster assignment of each data vector;
– the cluster centroids.
• Decide on the number of clusters, $K$.
• Start with a set of cluster centroids: $c_1, \ldots, c_K$.
• Iteration (until cluster assignments remain unchanged):
– For all data vectors $x_i$, $i = 1, \ldots, N$, and all centroids $c_k$, $k = 1, \ldots, K$: Compute the distance $d_{ik}$ between the data vector $x_i$ and the centroid $c_k$.
– Assign each data vector $x_i$ to the closest centroid $c_k$, that is, the one with minimal $d_{ik}$. Record the cluster membership in an indicator variable $\lambda_{ik}$, with $\lambda_{ik} = 1$ if $x_i \to c_k$ and $\lambda_{ik} = 0$ otherwise.
– Set each cluster centroid to the mean of its assigned cluster:
$$c_k = \frac{\sum_i \lambda_{ik} x_i}{\sum_i \lambda_{ik}}$$
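The iteration above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the names `kmeans`, `squared_distance` and `mean` are hypothetical, Euclidean distance is assumed (the slides leave the distance unspecified), and an empty cluster simply keeps its previous centroid.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; it has the same minimiser as the distance itself."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(data, centroids, max_iter=100):
    """Hard K-means: alternate assignment and centroid updates."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Distance and assignment steps: each vector goes to its closest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
            for x in data
        ]
        if new_assignment == assignment:  # assignments unchanged: stop
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned vectors.
        for k in range(len(centroids)):
            members = [x for x, a in zip(data, assignment) if a == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = mean(members)
    return assignment, centroids


# Hypothetical data and starting centroids.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
assignment, centroids = kmeans(data, [(1.0, 0.5), (3.0, 0.5)])
print(assignment)  # [0, 0, 1, 1]
print(centroids)   # [(0.0, 0.5), (4.0, 0.5)]
```

The list `assignment` plays the role of the indicator variable: `assignment[i] == k` corresponds to $\lambda_{ik} = 1$.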
A good example of K-means clustering
[Figure sequence: seven two-dimensional plots, both axes ranging from 0 to 1.]
A bad example of K-means clustering
[Figure sequence: three two-dimensional plots, both axes ranging from 0 to 1.]
Shortcoming of K-means clustering
• The algorithm can easily get stuck in suboptimal cluster formations.
• Use fuzzy or soft K-means.
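The first point can be illustrated with a small experiment: the same four points, clustered from two different starting positions. This is a hedged sketch; the data, the starting centroids and the helper names `sqdist` and `kmeans_cost` are hypothetical, and Euclidean distance is assumed.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(data, centroids, max_iter=100):
    """Run hard K-means from the given centroids and return the final
    within-group variance (sum of squared vector-centroid distances)."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: sqdist(x, centroids[k]))
            for x in data
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [x for x, a in zip(data, assignment) if a == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(sqdist(x, centroids[a]) for x, a in zip(data, assignment))


# Four corners of a wide rectangle; the natural clusters are left and right.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_cost = kmeans_cost(data, [(1.0, 0.5), (3.0, 0.5)])  # starts left/right
bad_cost = kmeans_cost(data, [(2.0, 0.0), (2.0, 1.0)])   # starts bottom/top

print(good_cost)  # 1.0
print(bad_cost)   # 16.0
```

From the second start the assignments never change after the first pass, so the algorithm stops at a partition whose within-group variance is sixteen times larger.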
Fuzzy and soft K-means clustering
• Objective: Soft or fuzzy partition of the data into a
predefined number of clusters, K.
– Each data vector may belong to more than one cluster,
according to its degree of membership.
– This is in contrast to K-means, where a data vector
either wholly belongs to a cluster or not.
• Method: Alternately update
– the membership grade for each data vector;
– the cluster centroids.
• Decide on the number of clusters, $K$.
• Start with a set of cluster centroids: $c_1, \ldots, c_K$.
• Iteration (until membership grades remain unchanged):
– For all data vectors $x_i$, $i = 1, \ldots, N$, and all centroids $c_k$, $k = 1, \ldots, K$: Compute the distance $d_{ik}$ between the data vector $x_i$ and the centroid $c_k$.
– Compute the membership grades $\lambda_{ik}$. Note: $\lambda_{ik} \geq 0$ indicates the amount of association of data vector $x_i$ with centroid $c_k$ and depends on the distance $d_{ik}$: if $d_{ik} < d_{ik'}$, then $\lambda_{ik} > \lambda_{ik'}$. The detailed functional form (omitted) differs between soft and fuzzy K-means.
– Recompute the cluster centroids:
$$c_k = \frac{\sum_i \lambda_{ik} x_i}{\sum_i \lambda_{ik}}$$
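The slides omit the functional form of the membership grades, so the sketch below fills it in with two common choices, both of which are assumptions rather than the lecture's definitions: for soft K-means, grades proportional to $\exp(-\beta d_{ik}^2)$ with a hypothetical stiffness parameter `beta`; for fuzzy K-means, the fuzzy c-means form with a hypothetical fuzzifier `m`. All function names are illustrative.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def soft_memberships(sq_dists, beta=2.0):
    """Assumed soft form: grades proportional to exp(-beta * d^2), normalised to sum to 1."""
    shift = min(sq_dists)  # subtracting the minimum avoids underflow in exp
    weights = [math.exp(-beta * (d - shift)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def fuzzy_memberships(sq_dists, m=2.0):
    """Assumed fuzzy c-means form: grade = u**m, with u proportional to d**(-2/(m-1))."""
    if min(sq_dists) == 0.0:  # vector coincides with a centroid
        hits = [d == 0.0 for d in sq_dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    # sq_dists holds d^2, so d**(-2/(m-1)) equals (d^2)**(-1/(m-1)).
    inv = [d ** (-1.0 / (m - 1.0)) for d in sq_dists]
    total = sum(inv)
    return [(v / total) ** m for v in inv]


def soft_kmeans(data, centroids, membership, max_iter=200, tol=1e-9):
    """Alternate membership-grade and centroid updates until the grades settle."""
    centroids = [tuple(c) for c in centroids]
    grades = None
    for _ in range(max_iter):
        new_grades = [
            membership([squared_distance(x, c) for c in centroids]) for x in data
        ]
        converged = grades is not None and max(
            abs(a - b) for ra, rb in zip(grades, new_grades) for a, b in zip(ra, rb)
        ) < tol
        grades = new_grades
        if converged:
            break
        # Centroid update: weighted mean with the grades as weights.
        for k in range(len(centroids)):
            weight = sum(row[k] for row in grades)
            if weight > 0.0:
                centroids[k] = tuple(
                    sum(row[k] * x[d] for row, x in zip(grades, data)) / weight
                    for d in range(len(data[0]))
                )
    return grades, centroids


# Hypothetical data and starting centroids.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
start = [(1.0, 0.5), (3.0, 0.5)]
soft_grades, soft_centroids = soft_kmeans(data, start, soft_memberships)
fuzzy_grades, fuzzy_centroids = soft_kmeans(data, start, fuzzy_memberships)
```

Both membership functions satisfy the monotonicity condition stated above: a smaller distance gives a larger grade. Only the soft form is normalised so that the grades of one data vector sum to 1.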
Two examples of soft K-means clustering
The posterior probability for a given data point is indicated by a colour
scale ranging from pure red (corresponding to a posterior probability
of 1.0 for the red component and 0.0 for the blue component) through
to pure blue.
[Figure sequence: six two-dimensional plots, both axes ranging from 0 to 1.]
Initialization for which K-means failed
[Figure sequence: seven two-dimensional plots, both axes ranging from 0 to 1.]
Agglomerative hierarchical clustering: UPGMA
(hierarchical average linkage clustering)
[Figure sequence: UPGMA on five data points labelled 1 to 5. Points 1 and 2 are merged into node 6, drawn at height d(1,2)/2; points 3 and 4 are then merged into node 7 at height d(3,4)/2.]
Distance between clusters: average of the individual distances.
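Written out, the average-linkage distance used by UPGMA can be stated as below. The symbols $A$ and $B$ (two clusters), $|A|$ and $|B|$ (their sizes) and $d(a, b)$ (the distance between two individual data vectors) are notation introduced here, not taken from the slides.

$$d(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$$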
[Figure sequence, continued: point 5 and node 7 are merged into node 8 at height d(5,7)/2; nodes 6 and 8 are finally merged into node 9, whose branch to node 8 has length d(6,8)/2 - d(5,7)/2.]
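The merging procedure can be sketched in plain Python as follows. This is a minimal, unoptimised illustration: the function name `upgma`, the tuple format of the merge list and the five one-dimensional positions are hypothetical, with the positions chosen only so that the merges follow the order 1+2, 3+4, 5+7, 6+8.

```python
def upgma(dist, n):
    """Average-linkage agglomerative clustering.

    dist maps frozenset({i, j}) to the distance between leaves i and j,
    with leaves numbered 1..n. Returns a list of merges
    (cluster_a, cluster_b, new_node_id, node_height), where the height
    of a new node is half the average distance between the merged clusters.
    """
    members = {i: [i] for i in range(1, n + 1)}

    def average_distance(a, b):
        pairs = [(p, q) for p in members[a] for q in members[b]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    merges = []
    next_id = n + 1
    while len(members) > 1:
        ids = sorted(members)
        # Pick the pair of current clusters with the smallest average distance.
        a, b = min(
            ((a, b) for i, a in enumerate(ids) for b in ids[i + 1:]),
            key=lambda pair: average_distance(*pair),
        )
        height = average_distance(a, b) / 2
        merges.append((a, b, next_id, height))
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


# Hypothetical positions of five points on a line.
pos = {1: 0, 2: 1, 3: 10, 4: 12, 5: 16}
dist = {
    frozenset((i, j)): abs(pos[i] - pos[j])
    for i in pos for j in pos if i < j
}
merges = upgma(dist, 5)
for a, b, node, height in merges:
    print(f"merge {a} and {b} into node {node} at height {height:.3f}")
```

Recomputing every average from the leaf distances keeps the sketch short at the cost of speed; practical implementations update a cluster distance matrix instead.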
[Figure sequence: gene expression data with axes labelled "Experiments" and "Genes".]
From Spellman et al., http://cellcycle-www.stanford.edu/
• Hierarchical clustering methods produce a tree or dendrogram → allows the biologist to visualize and interpret the data.
• No need to specify in advance how many clusters are appropriate → the tree provides a partition of the data for each number of clusters K.
• Partitions are obtained by cutting the tree at different levels.
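The idea that one tree yields a partition for every K can be sketched in a few lines. The following is a minimal illustration, not the implementation used for the figures: it assumes Euclidean distance and average linkage (in the spirit of UPGMA), and the function name `average_linkage` and the toy points are hypothetical.

```python
from itertools import combinations

def average_linkage(points):
    """Agglomerative clustering with average linkage (UPGMA-style).

    Returns a dict mapping K to the partition (list of index lists)
    present when K clusters remain, i.e. the tree cut at that level.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist(points[p], points[q])
                    for p in clusters[i] for q in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        # Record the partition at this level of the tree.
        partitions[len(clusters)] = sorted(clusters)
    return partitions

# Hypothetical 1-D toy data: two tight pairs and one outlier.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (30.0,)]
partitions = average_linkage(points)
for k in sorted(partitions):
    print(k, partitions[k])
```

Reading `partitions[k]` for different k corresponds to cutting the same tree at different heights; no clustering run has to be repeated.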
![Page 79: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/79.jpg)
[Figure: dendrogram over data points labelled 1 to 9, with branch heights d(1,2)/2, d(3,4)/2, d(5,7)/2 and d(6,8)/2 - d(5,7)/2]
![Page 80: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/80.jpg)
[Figure: dendrogram over data points labelled 1 to 9, with branch heights d(1,2)/2, d(3,4)/2, d(5,7)/2 and d(6,8)/2 - d(5,7)/2]
![Page 81: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/81.jpg)
Principal clustering paradigms
• Non-hierarchical
Cluster N vectors into K groups in an iterative process.
• Hierarchical
Hierarchy of nested clusters; each cluster typically consists of the union of two smaller sub-clusters.
![Page 82: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/82.jpg)
Hierarchical methods can be further subdivided
![Page 83: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/83.jpg)
Hierarchical methods can be further subdivided
• Bottom-up or agglomerative clustering:
Start with single-object clusters and recursively merge them into larger clusters.
![Page 84: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/84.jpg)
Hierarchical methods can be further subdivided
• Bottom-up or agglomerative clustering:
Start with single-object clusters and recursively merge them into larger clusters.
• Top down or divisive clustering:
Start with a cluster containing all data and recursively
divide it into smaller clusters.
![Page 85: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/85.jpg)
Overview of clustering methods
• Hierarchical, top-down or divisive: (no entry yet)
• Hierarchical, bottom-up or agglomerative: UPGMA
• Non-hierarchical: K-means, fuzzy/soft K-means
![Page 86: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/86.jpg)
Shortcoming of bottom-up agglomerative clustering
• Focus on local structures → loses some of the information
present in global patterns.
• Once a data vector has been assigned to a node,
it cannot be reassigned to another node later
when global patterns emerge.
![Page 87: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/87.jpg)
Shortcoming of bottom-up agglomerative clustering
• Focus on local structures → loses some of the information
present in global patterns.
• Once a data vector has been assigned to a node,
it cannot be reassigned to another node later
when global patterns emerge.
How can we devise a hierarchical top-down approach?
• Hierarchical, top-down or divisive: ?
• Hierarchical, bottom-up or agglomerative: UPGMA
• Non-hierarchical: K-means, fuzzy/soft K-means
![Page 88: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/88.jpg)
Divisive (top-down) hierarchical clustering: Binary tree-structured vector quantization (BTSVQ)
![Page 89: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/89.jpg)
Initially, all data belong to the same cluster.
![Page 90: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/90.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
![Page 91: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/91.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
![Page 92: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/92.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
Yes → Place both clusters on the stack.
![Page 93: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/93.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
Yes → Place both clusters on the stack.
No → Merge the two clusters and remove the resulting cluster from the stack.
![Page 94: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/94.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
Yes → Place both clusters on the stack.
No → Merge the two clusters and remove the resulting cluster from the stack.
→ Any remaining clusters on the stack?
![Page 95: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/95.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
Yes → Place both clusters on the stack.
No → Merge the two clusters and remove the resulting cluster from the stack.
→ Any remaining clusters on the stack?
Yes → Fetch the next cluster from the stack.
![Page 96: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/96.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
Yes → Place both clusters on the stack.
No → Merge the two clusters and remove the resulting cluster from the stack.
→ Any remaining clusters on the stack?
Yes → Fetch the next cluster from the stack.
No → End.
![Page 97: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/97.jpg)
Initially, all data belong to the same cluster.
→ Fetch one cluster from the stack. Split this cluster into two clusters using the fuzzy/soft K-means algorithm.
→ Compute the ratio between variance/within variance. Is this ratio larger than a (dynamically updated) threshold?
Yes → Place both clusters on the stack.
No → Merge the two clusters and remove the resulting cluster from the stack.
→ Any remaining clusters on the stack?
Yes → Fetch the next cluster from the stack.
No → End.
Adapted in simplified form from Szeto et al., Journal of Visual Languages and Computing 14, 341-362 (2003).
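The stack-based loop of the flowchart can be sketched as follows. This is a simplified illustration under stated assumptions, not the BTSVQ algorithm of Szeto et al.: a hard 2-means split on 1-D values stands in for fuzzy/soft K-means, the threshold is fixed rather than dynamically updated, and all function names and the toy data are hypothetical.

```python
def two_means(data, iters=20):
    """Hard 2-means on 1-D values (assumed stand-in for fuzzy/soft K-means).

    Requires at least two distinct values in `data`.
    """
    c0, c1 = min(data), max(data)  # deterministic initialisation
    for _ in range(iters):
        left = [x for x in data if abs(x - c0) <= abs(x - c1)]
        right = [x for x in data if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(left) / len(left), sum(right) / len(right)
    return left, right

def variance_ratio(left, right):
    """Between-cluster sum of squares divided by within-cluster sum of squares."""
    both = left + right
    mean = sum(both) / len(both)
    m0, m1 = sum(left) / len(left), sum(right) / len(right)
    between = len(left) * (m0 - mean) ** 2 + len(right) * (m1 - mean) ** 2
    within = (sum((x - m0) ** 2 for x in left)
              + sum((x - m1) ** 2 for x in right))
    return float("inf") if within == 0 else between / within

def divisive_clustering(data, threshold):
    stack = [list(data)]        # initially, all data belong to one cluster
    final = []
    while stack:                # any remaining clusters on the stack?
        cluster = stack.pop()   # fetch one cluster from the stack
        if len(set(cluster)) < 2:
            final.append(cluster)      # nothing left to split
            continue
        left, right = two_means(cluster)
        if variance_ratio(left, right) > threshold:
            stack.extend([left, right])  # place both clusters on the stack
        else:
            final.append(cluster)        # undo the split; cluster is final
    return sorted(sorted(c) for c in final)

# Hypothetical toy data with two well-separated groups.
data = [1.0, 2.0, 4.0, 50.0, 51.0, 53.0]
result = divisive_clustering(data, threshold=10.0)
print(result)
```

With this data the first split separates the two groups (ratio far above the threshold), while further splits inside each group are rejected, so the procedure stops with two clusters. The outcome depends strongly on the chosen threshold, which is why the original method updates it dynamically.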
![Page 98: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/98.jpg)
Overview of clustering methods
• Hierarchical, top-down or divisive: BTSVQ
• Hierarchical, bottom-up or agglomerative: UPGMA
• Non-hierarchical: K-means, fuzzy/soft K-means
![Page 99: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/99.jpg)
Pitfalls of clustering
![Page 100: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/100.jpg)
Pitfalls of clustering
• Clustering algorithms always produce clusters, even for
uniformly distributed data.
![Page 101: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/101.jpg)
Pitfalls of clustering
• Clustering algorithms always produce clusters, even for
uniformly distributed data.
• Difficult to test the null hypothesis of no clusters
(current research).
![Page 102: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/102.jpg)
Pitfalls of clustering
• Clustering algorithms always produce clusters, even for
uniformly distributed data.
• Difficult to test the null hypothesis of no clusters
(current research).
• Difficult to estimate the true number of clusters (current
research).
![Page 103: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/103.jpg)
Pitfalls of clustering
• Clustering algorithms always produce clusters, even for
uniformly distributed data.
• Difficult to test the null hypothesis of no clusters
(current research).
• Difficult to estimate the true number of clusters (current
research).
• Risk of artifacts.
• Use clustering only for hypothesis generation.
![Page 104: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/104.jpg)
Pitfalls of clustering
• Clustering algorithms always produce clusters, even for
uniformly distributed data.
• Difficult to test the null hypothesis of no clusters
(current research).
• Difficult to estimate the true number of clusters (current
research).
• Risk of artifacts.
• Use clustering only for hypothesis generation.
• Independent experimental verification required.
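The first pitfall is easy to reproduce. The sketch below runs a plain K-means (a hypothetical helper `kmeans_1d` with a simple deterministic initialisation, assumed here for illustration) on evenly spread 1-D values that contain no cluster structure at all; the algorithm still returns K groups.

```python
def kmeans_1d(values, k, iters=100):
    """Plain K-means on 1-D values; returns the list of clusters."""
    n = len(values)
    centers = [values[i * n // k] for i in range(k)]  # deterministic init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each centre; keep the old one if its cluster is empty.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return groups

uniform = [i / 99 for i in range(100)]  # evenly spread, no real clusters
clusters = kmeans_1d(uniform, k=3)
sizes = [len(g) for g in clusters]
print(sizes)
```

The three groups are merely a tiling of the interval, which is why a clustering result on its own should be treated as a hypothesis rather than as evidence of structure.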
![Page 105: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/105.jpg)
Deciding on the number of clusters: Gap statistic
Tibshirani, Walther, Hastie (2001), J. Royal Statistical Society B
Idea:
• Compute $E_K$ for randomized data.
• Compare this with $E_K$ from the real data.
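A minimal sketch of this idea follows. Several points are assumptions made for illustration: $E_K$ is taken to be the within-cluster sum of squares from plain K-means, randomization permutes each column of the data matrix independently, the result is averaged over a few shuffles, and the gap is the absolute difference of the two $E_K$ values (the original paper works with logarithms). All function names and the toy data are hypothetical.

```python
import random

def kmeans_cost(data, k, iters=50):
    """Run plain K-means and return the within-cluster sum of squares E_K."""
    n = len(data)
    centers = [data[i * n // k] for i in range(k)]  # simple deterministic init

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda j: sqdist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each centre; keep the old one if its cluster is empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return sum(min(sqdist(p, c) for c in centers) for p in data)

def shuffle_columns(data, rng):
    """Permute each column independently (assumed randomization scheme)."""
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))

def gap_curve(data, ks, n_shuffles=10, seed=0):
    rng = random.Random(seed)
    e_true = {k: kmeans_cost(data, k) for k in ks}
    e_rand = {k: 0.0 for k in ks}
    for _ in range(n_shuffles):
        shuffled = shuffle_columns(data, rng)
        for k in ks:
            e_rand[k] += kmeans_cost(shuffled, k) / n_shuffles
    gap = {k: abs(e_true[k] - e_rand[k]) for k in ks}
    return e_true, e_rand, gap

# Hypothetical toy data: two groups along the diagonal.
data = ([(0.1 * i, 0.1 * i) for i in range(10)]
        + [(10 + 0.1 * i, 10 + 0.1 * i) for i in range(10)])
e_true, e_rand, gap = gap_curve(data, ks=range(1, 5))
best_k = max(gap, key=gap.get)  # K with the largest gap
print(best_k)
```

For K = 1 the two curves coincide, because shuffling within columns leaves the total variance unchanged; the gap opens up only where the real data can be partitioned better than the randomized data.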
![Page 106: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/106.jpg)
Randomize data
Genes
Experiments
![Page 107: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/107.jpg)
Randomize data
Shuffle
Genes
Experiments
![Page 108: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/108.jpg)
Randomize data
Genes
Experiments
![Page 109: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/109.jpg)
Randomize data
Genes
Experiments
![Page 110: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/110.jpg)
Randomize data
Shuffle
Genes
Experiments
![Page 111: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/111.jpg)
Randomize data
Genes
Experiments
![Page 112: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/112.jpg)
Randomize data
Genes
Experiments
And so on . . .
![Page 113: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/113.jpg)
[Figure: $E_K$ versus number of clusters K (1 to 8), with one curve for the true data and one for the random data]
![Page 114: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/114.jpg)
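[Figure, two panels. Left: $E_K$ versus number of clusters K (1 to 8) for the true data and the random data. Right: Gap versus number of clusters K (1 to 8).]
$\mathrm{Gap} = \left| E_K(\text{true data}) - E_K(\text{randomized data}) \right|$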
![Page 115: Objective of clustering - The University of Edinburgh · Fuzzy and soft K-means clustering Objective: Soft or fuzzy partition of the data into a prede ned number of clusters, K. {](https://reader033.vdocuments.us/reader033/viewer/2022060421/5f1830fc900c682a68021c3d/html5/thumbnails/115.jpg)
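[Figure, two panels. Left: $E_K$ versus number of clusters K (1 to 8) for the true data and the random data. Right: Gap versus number of clusters K (1 to 8); the maximum of the Gap marks the maximal improvement over random data.]
$\mathrm{Gap} = \left| E_K(\text{true data}) - E_K(\text{randomized data}) \right|$
Adapted from Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer 2001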