Cutting the dendrogram through permutation tests

Dario Bruzzese, Domenico Vistocco
Compstat 2010

Department of Preventive Medical Sciences, University of Naples, Italy
Department of Economics, University of Cassino, Italy
La Carte
1 Motivation
2 The stairstep-like permutation procedure
    Notation
    The outline
3 Some results
    Real datasets
    Synthetic dataset
4 ToDo List

Motivation
Automatically determine the optimal cut-off level of a dendrogram
Explore partitions different from those allowed by a horizontal cut

The rep1HighNoise dataset
Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biology 2003, 4:R34.
n = 200, p = 20
[Figure: dendrogram with a horizontal cut, k = 3]
[Figure: dendrogram with an alternative cut, k = 3]

Notation
Let:
$n$ be the number of objects to classify;
$C^k_L$ and $C^k_R$ be the two classes merged at level $k$ ($k = 1, \dots, n-1$);
$h(C^k_L \cup C^k_R)$ be the height necessary to merge $C^k_L$ and $C^k_R$;
$h(C^k_j)$ be the height at which $C^k_j$ has been obtained ($j \in \{L, R\}$).
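The notation can be mirrored in a small data structure. The following is a minimal sketch, not the authors' code: the `Node` class, the `heights` helper and the toy heights are hypothetical names and values chosen for illustration, and leaves are assumed to sit at height 0.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """A dendrogram node (hypothetical structure); leaves have no children."""
    height: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: str = ""

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def heights(node: Node) -> Tuple[float, float, float]:
    """Return (h(C_L ∪ C_R), h(C_L), h(C_R)) for an internal node."""
    if node.is_leaf():
        raise ValueError("a leaf has no merged classes")
    return node.height, node.left.height, node.right.height


# Toy dendrogram over four objects; the heights are made up (assumption).
a, b, c, d = (Node(0.0, label=x) for x in "abcd")
ab = Node(1.0, a, b, "ab")
cd = Node(2.5, c, d, "cd")
root = Node(6.0, ab, cd, "abcd")

print(heights(root))
```

Here `root` plays the role of $C^1_L \cup C^1_R$, with `ab` and `cd` as $C^1_L$ and $C^1_R$.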
[Figure: dendrogram on which, for $k = 1, 2, 3$, the classes $C^k_L$ and $C^k_R$ and the heights $h(C^k_L \cup C^k_R)$, $h(C^k_L)$ and $h(C^k_R)$ are highlighted in turn.]

The algorithm - Pseudo Code
Input: A dataset and its related dendrogram
Output: A partition of the dataset

initialization:
    aggregationLevelsToVisit ← $h(C^1_L \cup C^1_R)$
    permClusters ← [ ]
    i ← 1
repeat
    if $C^i_L \equiv C^i_R$ then
        add $C^i_L \cup C^i_R$ to permClusters
    else
        add $h(C^i_L)$ and $h(C^i_R)$ to aggregationLevelsToVisit
        sort aggregationLevelsToVisit in descending order
    end
    remove the first element from aggregationLevelsToVisit
    i ← i + 1
until aggregationLevelsToVisit is empty
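The loop above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R implementation: `Node`, `merge` and `stairstep_cut` are hypothetical names, the list holds nodes rather than bare heights, the `same_cluster` callback stands in for the permutation test, and a leaf that is reached is assumed to become a singleton cluster (the pseudocode does not cover that case).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical dendrogram node: merge height, member labels, children."""
    height: float
    members: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge(left: Node, right: Node, height: float) -> Node:
    """Build the node obtained by merging two classes at the given height."""
    return Node(height, left.members + right.members, left, right)


def stairstep_cut(root: Node, same_cluster: Callable[[Node], bool]) -> List[List[str]]:
    to_visit = [root]                       # aggregationLevelsToVisit
    perm_clusters: List[List[str]] = []     # permClusters
    while to_visit:
        # Always visit the highest remaining aggregation level first.
        to_visit.sort(key=lambda nd: nd.height, reverse=True)
        node = to_visit.pop(0)
        if node.left is None or node.right is None:
            # Assumption: a leaf cannot be split, so it is a singleton cluster.
            perm_clusters.append(node.members)
        elif same_cluster(node):
            # H0 not rejected: keep C_L ∪ C_R as one cluster.
            perm_clusters.append(node.members)
        else:
            # H0 rejected: descend into both children.
            to_visit.extend([node.left, node.right])
    return perm_clusters


# Toy dendrogram; heights and test decisions are made up for illustration.
a, b, c, d, e = (Node(0.0, [x]) for x in "abcde")
ab = merge(a, b, 1.0)
abc = merge(ab, c, 3.0)
de = merge(d, e, 4.0)
root = merge(abc, de, 6.0)

accepted = {("a", "b"), ("d", "e")}   # nodes where H0 is assumed to hold
result = stairstep_cut(root, lambda nd: tuple(nd.members) in accepted)
print(result)
```

In this toy run the right branch is kept whole at height 4.0 while the left branch is split below height 3.0, a partition that no single horizontal cut produces.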
The algorithm - The outline

Initialization, i ← 0
aggregationLevelsToVisit: $h(C^1_L \cup C^1_R)$
permClusters: [ ]

Iteration i ← 1
level visited: $h(C^1_L \cup C^1_R)$
clusters to compare: $C^1_L$, $C^1_R$
$H_0 : C^1_L \equiv C^1_R$ ↦ reject
aggregationLevelsToVisit: $h(C^1_L \cup C^1_R)$, $h(C^1_R)$, $h(C^1_L)$
after removing the first element: $h(C^1_R)$, $h(C^1_L)$

Iteration i ← 2
level visited: $h(C^1_R)$
clusters to compare: $C^2_L$, $C^2_R$
$H_0 : C^2_L \equiv C^2_R$ ↦ reject
aggregationLevelsToVisit: $h(C^1_R)$, $h(C^1_L)$, $h(C^2_R)$, $h(C^2_L)$
after removing the first element: $h(C^1_L)$, $h(C^2_R)$, $h(C^2_L)$

Iteration i ← 3
level visited: $h(C^1_L)$
clusters to compare: $C^3_L$, $C^3_R$
$H_0 : C^3_L \equiv C^3_R$ ↦ reject
aggregationLevelsToVisit, after sorting and removing the first element: $h(C^3_R)$, $h(C^2_R)$, $h(C^2_L)$, $h(C^3_L)$

Iteration i ← 4
level visited: $h(C^3_R)$
clusters to compare: $C^4_L$, $C^4_R$
$H_0 : C^4_L \equiv C^4_R$ ↦ accept
permClusters: $C^4_L \cup C^4_R \Leftrightarrow C^3_R$

Iteration i ← 9
aggregationLevelsToVisit: empty
permClusters: $C^3_L$, $C^3_R$, $C^2_L$, $C^4_L$, $C^4_R$
The algorithm - The permutation Test
For each aggregation level $k$, a permutation test is designed to test the null hypothesis that the two groups $C^k_L$ and $C^k_R$ really belong to the same cluster, i.e.:

$H_0 : C^k_L \equiv C^k_R$

Under this null, mixing up (permuting) the statistical units of $C^k_L$ and $C^k_R$ should not alter the aggregation process that results in their merging.
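The mixing step can be sketched as pooling the units of the two groups, shuffling them and splitting them again. This is only an illustration: `permute_groups` is a hypothetical helper, and keeping the original group sizes after the shuffle is an assumption that the slides do not state.

```python
import random
from typing import List, Sequence, Tuple


def permute_groups(left: Sequence[str], right: Sequence[str],
                   rng: random.Random) -> Tuple[List[str], List[str]]:
    """Pool the units of C_L and C_R, shuffle, and split into two new groups.

    Assumption: the permuted groups keep the sizes of the original ones.
    """
    pooled = list(left) + list(right)
    rng.shuffle(pooled)
    return pooled[:len(left)], pooled[len(left):]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
m_left, m_right = permute_groups(["a", "b", "c"], ["d", "e"], rng)
print(m_left, m_right)
```

Each permuted pair would then be re-clustered to obtain the heights used in the test.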
For each $k$, the difference between $\max_{j \in \{L,R\}} h(C^k_j)$ and $\min_{j \in \{L,R\}} h(C^k_j)$ can be considered as the minimum cost necessary to merge the two classes.

The difference between $h(C^k_L \cup C^k_R)$ and $\max_{j \in \{L,R\}} h(C^k_j)$ can instead be considered as the cost actually incurred for merging $C^k_L$ and $C^k_R$.

The ratio between these two differences:

$$\mathrm{cost}\left(C^k_L \cup C^k_R\right) = \frac{\max_{j \in \{L,R\}} h(C^k_j) - \min_{j \in \{L,R\}} h(C^k_j)}{h(C^k_L \cup C^k_R) - \max_{j \in \{L,R\}} h(C^k_j)}$$

is thus a measure that characterizes the aggregation process resulting in the new class $C^k_L \cup C^k_R$.

[Figure: dendrogram annotated, for $k = 3$, with $\min_j h(C^3_j)$, $\max_j h(C^3_j)$ and $h(C^3_L \cup C^3_R)$.]
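The cost ratio can be written as a small function. This is a minimal sketch: `merge_cost` is a hypothetical name, the heights in the example are made up, and raising an error when the denominator is not positive is an assumption, since the slides do not say how ties in height are handled.

```python
def merge_cost(h_union: float, h_left: float, h_right: float) -> float:
    """cost(C_L ∪ C_R) = (max h(C_j) - min h(C_j)) / (h(C_L ∪ C_R) - max h(C_j))."""
    h_max = max(h_left, h_right)
    h_min = min(h_left, h_right)
    actual = h_union - h_max          # cost actually incurred for the merge
    if actual <= 0:
        # Assumption: a non-positive denominator is treated as an error.
        raise ValueError("h(C_L ∪ C_R) must exceed the heights of both classes")
    return (h_max - h_min) / actual   # minimum cost over actual cost


# Made-up heights: h(C_L ∪ C_R) = 6.0, h(C_L) = 1.0, h(C_R) = 2.5.
print(merge_cost(6.0, 1.0, 2.5))      # (2.5 - 1.0) / (6.0 - 2.5)
```

Two classes obtained at the same height give a cost of 0, whatever the height of their merge.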
[Figure: the classes $C^3_L$ and $C^3_R$, their permuted counterparts ${}^{m}C^3_L$ and ${}^{m}C^3_R$, and the heights $h({}^{m}C^3_L)$ and $h({}^{m}C^3_R)$.]

The ratio:

$$\mathrm{cost}\left({}^{m}C^k_L \cup {}^{m}C^k_R\right) = \frac{\max_{j \in \{L,R\}} h({}^{m}C^k_j) - \min_{j \in \{L,R\}} h({}^{m}C^k_j)}{h(C^k_L \cup C^k_R) - \max_{j \in \{L,R\}} h({}^{m}C^k_j)}$$

is thus a measure that characterizes the aggregation process resulting in the new (potential) class ${}^{m}C^k_L \cup {}^{m}C^k_R$.

Under $H_0$, the aggregation process resulting in the new cluster $C^k_L \cup C^k_R$ should be very similar to the one that potentially produces ${}^{m}C^k_L \cup {}^{m}C^k_R$; thus the two values $\mathrm{cost}({}^{m}C^k_L \cup {}^{m}C^k_R)$ and $\mathrm{cost}(C^k_L \cup C^k_R)$ should be close.

The permutation procedure is repeated $M$ times, and each time a new pair ${}^{m}C^k_L$, ${}^{m}C^k_R$ is obtained. The Monte Carlo p-value is thus computed as:

$$p = \frac{\#\left\{\mathrm{cost}\left({}^{m}C^k_L \cup {}^{m}C^k_R\right) \le \mathrm{cost}\left(C^k_L \cup C^k_R\right)\right\} + 1}{M + 1}$$
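The p-value formula translates directly into code. In this sketch `monte_carlo_p_value` is a hypothetical name and the cost values are made up; how the permuted costs are produced (re-clustering each permuted pair) is left outside the function.

```python
from typing import Sequence


def monte_carlo_p_value(observed_cost: float, permuted_costs: Sequence[float]) -> float:
    """p = (#{permuted cost <= observed cost} + 1) / (M + 1)."""
    m = len(permuted_costs)                                   # M permutations
    count = sum(1 for c in permuted_costs if c <= observed_cost)
    return (count + 1) / (m + 1)


# Made-up values: two of the four permuted costs do not exceed the observed one.
p = monte_carlo_p_value(0.4, [0.1, 0.5, 0.9, 0.3])
print(p)
```

The added 1 in numerator and denominator keeps the p-value strictly positive, so its smallest possible value is 1 / (M + 1).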
Some results - Real datasets
The yeast galactose dataset
Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner RE, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systemically perturbed metabolic network. Science 2001, 292:929-934.
n = 205, p = 80
% of misclassification = 1.5

The diabetes dataset
Banfield JD, Raftery AE: Model-based Gaussian and Non-Gaussian Clustering. Biometrics 1993, 49:803-821.
n = 145, p = 3
% of misclassification = 15.2
Some results - Synthetic dataset
Qiu W.-L., Joe H. (2009). clusterGeneration: random cluster generation (with specified degree of separation). R package version 1.2.7.

different number of clusters (k = 2, 3, 4, 5, 6, 7)
separation index = 0.01
different number of variables (p = 5, 10, 15)
100 replications for each combination of k and p
Some results - Synthetic dataset (p=5)
Some results - Synthetic dataset (p=10)
Some results - Synthetic dataset (p=15)
ToDo List
Statistical issues
    Introducing a penalty term in the permutation test step
    Quality measures of the obtained partition
    Multiple Testing Problem (???)

Computational issues
    profiling and optimizing the R code
    use of compiled code
    deploying a package