![Page 1: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/1.jpg)
Hierarchical clustering
David M. Blei
COS424Princeton University
February 28, 2008
D. Blei Clustering 02 1 / 21
![Page 2: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/2.jpg)
Hierarchical clustering
• Hierarchical clustering is a widely used data analysis tool.
• The idea is to build a binary tree of the data that successivelymerges similar groups of points
• Visualizing this tree provides a useful summary of the data
D. Blei Clustering 02 2 / 21
![Page 3: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/3.jpg)
Hierarchical clustering
• Hierarchical clustering is a widely used data analysis tool.
• The idea is to build a binary tree of the data that successivelymerges similar groups of points
• Visualizing this tree provides a useful summary of the data
D. Blei Clustering 02 2 / 21
![Page 4: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/4.jpg)
Hierarchical clustering
• Hierarchical clustering is a widely used data analysis tool.
• The idea is to build a binary tree of the data that successivelymerges similar groups of points
• Visualizing this tree provides a useful summary of the data
D. Blei Clustering 02 2 / 21
![Page 5: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/5.jpg)
Hierarchical clusering vs. k-means
• Recall that k-means or k-medoids requires
• A number of clusters k• An initial assignment of data to clusters• A distance measure between data d(xn, xm)
• Hierarchical clustering only requires a measure of similarity betweengroups of data points.
D. Blei Clustering 02 3 / 21
![Page 6: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/6.jpg)
Hierarchical clusering vs. k-means
• Recall that k-means or k-medoids requires
• A number of clusters k
• An initial assignment of data to clusters• A distance measure between data d(xn, xm)
• Hierarchical clustering only requires a measure of similarity betweengroups of data points.
D. Blei Clustering 02 3 / 21
![Page 7: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/7.jpg)
Hierarchical clusering vs. k-means
• Recall that k-means or k-medoids requires
• A number of clusters k• An initial assignment of data to clusters
• A distance measure between data d(xn, xm)
• Hierarchical clustering only requires a measure of similarity betweengroups of data points.
D. Blei Clustering 02 3 / 21
![Page 8: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/8.jpg)
Hierarchical clusering vs. k-means
• Recall that k-means or k-medoids requires
• A number of clusters k• An initial assignment of data to clusters• A distance measure between data d(xn, xm)
• Hierarchical clustering only requires a measure of similarity betweengroups of data points.
D. Blei Clustering 02 3 / 21
![Page 9: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/9.jpg)
Hierarchical clusering vs. k-means
• Recall that k-means or k-medoids requires
• A number of clusters k• An initial assignment of data to clusters• A distance measure between data d(xn, xm)
• Hierarchical clustering only requires a measure of similarity betweengroups of data points.
D. Blei Clustering 02 3 / 21
![Page 10: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/10.jpg)
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single cluster
D. Blei Clustering 02 4 / 21
![Page 11: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/11.jpg)
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single cluster
D. Blei Clustering 02 4 / 21
![Page 12: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/12.jpg)
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
1 Place each data point into its own singleton group
2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single cluster
D. Blei Clustering 02 4 / 21
![Page 13: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/13.jpg)
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups
3 Until: all the data are merged into a single cluster
D. Blei Clustering 02 4 / 21
![Page 14: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/14.jpg)
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
1 Place each data point into its own singleton group2 Repeat: iteratively merge the two closest groups3 Until: all the data are merged into a single cluster
D. Blei Clustering 02 4 / 21
![Page 15: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/15.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
Data
D. Blei Clustering 02 5 / 21
![Page 16: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/16.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 001
V1
V2
D. Blei Clustering 02 5 / 21
![Page 17: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/17.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 002
V1
V2
D. Blei Clustering 02 5 / 21
![Page 18: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/18.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 003
V1
V2
D. Blei Clustering 02 5 / 21
![Page 19: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/19.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 004
V1
V2
D. Blei Clustering 02 5 / 21
![Page 20: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/20.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 005
V1
V2
D. Blei Clustering 02 5 / 21
![Page 21: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/21.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 006
V1
V2
D. Blei Clustering 02 5 / 21
![Page 22: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/22.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 007
V1
V2
D. Blei Clustering 02 5 / 21
![Page 23: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/23.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 008
V1
V2
D. Blei Clustering 02 5 / 21
![Page 24: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/24.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 009
V1
V2
D. Blei Clustering 02 5 / 21
![Page 25: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/25.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 010
V1
V2
D. Blei Clustering 02 5 / 21
![Page 26: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/26.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 011
V1
V2
D. Blei Clustering 02 5 / 21
![Page 27: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/27.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 012
V1
V2
D. Blei Clustering 02 5 / 21
![Page 28: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/28.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 013
V1
V2
D. Blei Clustering 02 5 / 21
![Page 29: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/29.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 014
V1
V2
D. Blei Clustering 02 5 / 21
![Page 30: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/30.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 015
V1
V2
D. Blei Clustering 02 5 / 21
![Page 31: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/31.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 016
V1
V2
D. Blei Clustering 02 5 / 21
![Page 32: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/32.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 017
V1
V2
D. Blei Clustering 02 5 / 21
![Page 33: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/33.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 018
V1
V2
D. Blei Clustering 02 5 / 21
![Page 34: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/34.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 019
V1
V2
D. Blei Clustering 02 5 / 21
![Page 35: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/35.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 020
V1
V2
D. Blei Clustering 02 5 / 21
![Page 36: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/36.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 021
V1
V2
D. Blei Clustering 02 5 / 21
![Page 37: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/37.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 022
V1
V2
D. Blei Clustering 02 5 / 21
![Page 38: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/38.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 023
V1
V2
D. Blei Clustering 02 5 / 21
![Page 39: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/39.jpg)
Example
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80
−20
020
4060
80
iteration 024
V1
V2
D. Blei Clustering 02 5 / 21
![Page 40: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/40.jpg)
Agglomerative clustering
• Each level of the resulting tree is a segmentation of the data
• The algorithm results in a sequence of groupings
• It is up to the user to choose a ”natural” clustering from thissequence
D. Blei Clustering 02 6 / 21
![Page 41: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/41.jpg)
Agglomerative clustering
• Each level of the resulting tree is a segmentation of the data
• The algorithm results in a sequence of groupings
• It is up to the user to choose a ”natural” clustering from thissequence
D. Blei Clustering 02 6 / 21
![Page 42: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/42.jpg)
Agglomerative clustering
• Each level of the resulting tree is a segmentation of the data
• The algorithm results in a sequence of groupings
• It is up to the user to choose a ”natural” clustering from thissequence
D. Blei Clustering 02 6 / 21
![Page 43: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/43.jpg)
Dendrogram
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasingwith the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity betweenthe two merged groups
• Provides an interpretable visualization of the algorithm and data
• Useful summarization tool, part of why hierarchical clustering ispopular
D. Blei Clustering 02 7 / 21
![Page 44: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/44.jpg)
Dendrogram
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasingwith the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity betweenthe two merged groups
• Provides an interpretable visualization of the algorithm and data
• Useful summarization tool, part of why hierarchical clustering ispopular
D. Blei Clustering 02 7 / 21
![Page 45: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/45.jpg)
Dendrogram
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasingwith the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity betweenthe two merged groups
• Provides an interpretable visualization of the algorithm and data
• Useful summarization tool, part of why hierarchical clustering ispopular
D. Blei Clustering 02 7 / 21
![Page 46: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/46.jpg)
Dendrogram
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasingwith the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity betweenthe two merged groups
• Provides an interpretable visualization of the algorithm and data
• Useful summarization tool, part of why hierarchical clustering ispopular
D. Blei Clustering 02 7 / 21
![Page 47: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/47.jpg)
Dendrogram
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasingwith the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity betweenthe two merged groups
• Provides an interpretable visualization of the algorithm and data
• Useful summarization tool, part of why hierarchical clustering ispopular
D. Blei Clustering 02 7 / 21
![Page 48: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/48.jpg)
Dendrogram of example data
2904
2432
2641
2489
2278
2905
2085
2959 27
4327
9723
1422
8224
2520
2414
5517
2316
2216
16 244
851
220
81 184 25
247
7
020
4060
8010
012
0
Cluster Dendrogram
hclust (*, "complete")dist(x)
Hei
ght
Groups that merge at high values relative to the merger values of theirsubgroups are candidates for natural clusters. (Tibshirani et al., 2001)
D. Blei Clustering 02 8 / 21
![Page 49: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/49.jpg)
Group similarity
• Given a distance measure between points, the user has many choicesfor how to define intergroup similarity.
• Three most popular choices
• Single-linkage: the similarity of the closest pair
dSL(G ,H) = mini∈G ,j∈H
di ,j
• Complete linkage: the similarity of the furthest pair
dCL(G ,H) = maxi∈G ,j∈H
di ,j
• Group average: the average similarity between groups
dGA =1
NGNH
∑i∈G
∑j∈H
di ,j
D. Blei Clustering 02 9 / 21
![Page 50: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/50.jpg)
Group similarity
• Given a distance measure between points, the user has many choicesfor how to define intergroup similarity.
• Three most popular choices
• Single-linkage: the similarity of the closest pair
dSL(G ,H) = mini∈G ,j∈H
di ,j
• Complete linkage: the similarity of the furthest pair
dCL(G ,H) = maxi∈G ,j∈H
di ,j
• Group average: the average similarity between groups
dGA =1
NGNH
∑i∈G
∑j∈H
di ,j
D. Blei Clustering 02 9 / 21
![Page 51: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/51.jpg)
Group similarity
• Given a distance measure between points, the user has many choicesfor how to define intergroup similarity.
• Three most popular choices
• Single-linkage: the similarity of the closest pair
dSL(G ,H) = mini∈G ,j∈H
di ,j
• Complete linkage: the similarity of the furthest pair
dCL(G ,H) = maxi∈G ,j∈H
di ,j
• Group average: the average similarity between groups
dGA =1
NGNH
∑i∈G
∑j∈H
di ,j
D. Blei Clustering 02 9 / 21
![Page 52: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/52.jpg)
Group similarity
• Given a distance measure between points, the user has many choicesfor how to define intergroup similarity.
• Three most popular choices
• Single-linkage: the similarity of the closest pair
dSL(G ,H) = mini∈G ,j∈H
di ,j
• Complete linkage: the similarity of the furthest pair
dCL(G ,H) = maxi∈G ,j∈H
di ,j
• Group average: the average similarity between groups
dGA =1
NGNH
∑i∈G
∑j∈H
di ,j
D. Blei Clustering 02 9 / 21
![Page 53: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/53.jpg)
Group similarity
• Given a distance measure between points, the user has many choicesfor how to define intergroup similarity.
• Three most popular choices
• Single-linkage: the similarity of the closest pair
dSL(G ,H) = mini∈G ,j∈H
di ,j
• Complete linkage: the similarity of the furthest pair
dCL(G ,H) = maxi∈G ,j∈H
di ,j
• Group average: the average similarity between groups
dGA =1
NGNH
∑i∈G
∑j∈H
di ,j
D. Blei Clustering 02 9 / 21
![Page 54: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/54.jpg)
Properties of intergroup similarity
• Single linkage can produce “chaining,” where a sequence of closeobservations in different groups cause early merges of those groups
• Complete linkage has the opposite problem. It might not mergeclose groups because of outlier members that are far apart.
• Group average represents a natural compromise, but depends on thescale of the similarities. Applying a monotone transformation to thesimilarities can change the results.
D. Blei Clustering 02 10 / 21
![Page 55: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/55.jpg)
Properties of intergroup similarity
• Single linkage can produce “chaining,” where a sequence of closeobservations in different groups cause early merges of those groups
• Complete linkage has the opposite problem. It might not mergeclose groups because of outlier members that are far apart.
• Group average represents a natural compromise, but depends on thescale of the similarities. Applying a monotone transformation to thesimilarities can change the results.
D. Blei Clustering 02 10 / 21
![Page 56: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/56.jpg)
Properties of intergroup similarity
• Single linkage can produce “chaining,” where a sequence of closeobservations in different groups cause early merges of those groups
• Complete linkage has the opposite problem. It might not mergeclose groups because of outlier members that are far apart.
• Group average represents a natural compromise, but depends on thescale of the similarities. Applying a monotone transformation to thesimilarities can change the results.
D. Blei Clustering 02 10 / 21
![Page 57: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/57.jpg)
Caveats
• Hierarchical clustering should be treated with caution.
• Different decisions about group similarities can lead to vastlydifferent dendrograms.
• The algorithm imposes a hierarchical structure on the data, evendata for which such structure is not appropriate.
D. Blei Clustering 02 11 / 21
![Page 58: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/58.jpg)
Caveats
• Hierarchical clustering should be treated with caution.
• Different decisions about group similarities can lead to vastlydifferent dendrograms.
• The algorithm imposes a hierarchical structure on the data, evendata for which such structure is not appropriate.
D. Blei Clustering 02 11 / 21
![Page 59: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/59.jpg)
Caveats
• Hierarchical clustering should be treated with caution.
• Different decisions about group similarities can lead to vastlydifferent dendrograms.
• The algorithm imposes a hierarchical structure on the data, evendata for which such structure is not appropriate.
D. Blei Clustering 02 11 / 21
![Page 60: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/60.jpg)
Examples
• “Repeated Observation of Breast Tumor Subtypes in IndependentGene Expression Data Sets” (Sorlie et al., 2003)
• Hierarchical clustering of gene expression data lead to new theories
• Later, theories tested in the lab.
D. Blei Clustering 02 12 / 21
![Page 61: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/61.jpg)
Examples
• “Repeated Observation of Breast Tumor Subtypes in IndependentGene Expression Data Sets” (Sorlie et al., 2003)
• Hierarchical clustering of gene expression data lead to new theories
• Later, theories tested in the lab.
D. Blei Clustering 02 12 / 21
![Page 62: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/62.jpg)
Examples
• “Repeated Observation of Breast Tumor Subtypes in IndependentGene Expression Data Sets” (Sorlie et al., 2003)
• Hierarchical clustering of gene expression data lead to new theories
• Later, theories tested in the lab.
D. Blei Clustering 02 12 / 21
![Page 63: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/63.jpg)
D. Blei Clustering 02 13 / 21
![Page 64: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/64.jpg)
Examples
• “The Balance of Roger de Piles” (Studdert-Kennedy and Davenport,1974)
• Roger de Piles rated 57 paintings along different dimensions.
• These authors cluster them using different methods, includinghierarchical clustering
• They discuss the different clusters. (They are art critics.)
D. Blei Clustering 02 14 / 21
![Page 65: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/65.jpg)
Examples
• “The Balance of Roger de Piles” (Studdert-Kennedy and Davenport,1974)
• Roger de Piles rated 57 paintings along different dimensions.
• These authors cluster them using different methods, includinghierarchical clustering
• They discuss the different clusters. (They are art critics.)
D. Blei Clustering 02 14 / 21
![Page 66: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/66.jpg)
Examples
• “The Balance of Roger de Piles” (Studdert-Kennedy and Davenport,1974)
• Roger de Piles rated 57 paintings along different dimensions.
• These authors cluster them using different methods, includinghierarchical clustering
• They discuss the different clusters. (They are art critics.)
D. Blei Clustering 02 14 / 21
![Page 67: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/67.jpg)
Examples
• “The Balance of Roger de Piles” (Studdert-Kennedy and Davenport,1974)
• Roger de Piles rated 57 paintings along different dimensions.
• These authors cluster them using different methods, includinghierarchical clustering
• They discuss the different clusters. (They are art critics.)
D. Blei Clustering 02 14 / 21
![Page 68: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/68.jpg)
Good: They are cautious. “The value of this analysis...will depend onany interesting speculation it may provoke.”
D. Blei Clustering 02 15 / 21
![Page 69: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/69.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments• entry scores• funding• evaluations
D. Blei Clustering 02 16 / 21
![Page 70: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/70.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments• entry scores• funding• evaluations
D. Blei Clustering 02 16 / 21
![Page 71: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/71.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments• entry scores• funding• evaluations
D. Blei Clustering 02 16 / 21
![Page 72: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/72.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments
• entry scores• funding• evaluations
D. Blei Clustering 02 16 / 21
![Page 73: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/73.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments• entry scores
• funding• evaluations
D. Blei Clustering 02 16 / 21
![Page 74: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/74.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments• entry scores• funding
• evaluations
D. Blei Clustering 02 16 / 21
![Page 75: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/75.jpg)
Examples
• “Similarity Grouping of Australian Universities” (Stanley andReynlds, 1994)
• Use hierarchical clustering on Austrailian universities
• Use features such as
• # of staff in different departments• entry scores• funding• evaluations
D. Blei Clustering 02 16 / 21
![Page 76: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/76.jpg)
One dendrogram
D. Blei Clustering 02 17 / 21
![Page 77: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/77.jpg)
Another dendrogram—notice that it’s different
D. Blei Clustering 02 18 / 21
![Page 78: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/78.jpg)
• Split values: They notice that there’s no kink and conclude thatthere is no cluster structure in Austrailian universities.
• Good: Cautious interpretation of clustering, analysis of clusteringbased on multiple subsets of the features.
• Bad: Their conclusions—we can’t cluster Australianuniversities—ignores all the algorithmic choices that were made.
D. Blei Clustering 02 19 / 21
![Page 79: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/79.jpg)
Examples
• “Comovement of International Equity Markets: A TaxonomicApproach” (Panton et al., 1976)
• Data: weekly rates of return for stocks in twelve countries
• Run agglometerative clustering year by year
• Interpret the structure and examine stability over different timeperiods
D. Blei Clustering 02 20 / 21
![Page 80: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/80.jpg)
Examples
• “Comovement of International Equity Markets: A TaxonomicApproach” (Panton et al., 1976)
• Data: weekly rates of return for stocks in twelve countries
• Run agglometerative clustering year by year
• Interpret the structure and examine stability over different timeperiods
D. Blei Clustering 02 20 / 21
![Page 81: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/81.jpg)
Examples
• “Comovement of International Equity Markets: A TaxonomicApproach” (Panton et al., 1976)
• Data: weekly rates of return for stocks in twelve countries
• Run agglometerative clustering year by year
• Interpret the structure and examine stability over different timeperiods
D. Blei Clustering 02 20 / 21
![Page 82: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/82.jpg)
Examples
• “Comovement of International Equity Markets: A TaxonomicApproach” (Panton et al., 1976)
• Data: weekly rates of return for stocks in twelve countries
• Run agglometerative clustering year by year
• Interpret the structure and examine stability over different timeperiods
D. Blei Clustering 02 20 / 21
![Page 83: David M. Blei...• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups • Provides an interpretable visualization of the algorithm and data •](https://reader033.vdocuments.us/reader033/viewer/2022041514/5e2a042317120708f119fbb0/html5/thumbnails/83.jpg)
Examples
Good: Cautious. “This study is only descriptive...A logical subsequentresearch area is to explain observed structural properties and the causesof structural change.”
D. Blei Clustering 02 21 / 21