Intelligent Database Systems Lab
國立雲林科技大學National Yunlin University of Science and Technology
Advisor: Dr. Hsu
Presenter: Keng-Wei Chang
Authors: Yehuda Koren and David Harel
A Two-Way Visualization Method for Clustered Data
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Outline
Motivation
Objective
Introduction
Basic Notions
Computing the x-Coordinates
Computing the y-Coordinates
Results
Related Work
Conclusions
Personal Opinion
Motivation
A number of technological developments have led to an explosion of raw data that has to be analyzed
We are especially interested in two families of tools in this domain:
clustering algorithms and data visualization methods
Objective
In this paper, we integrate the two approaches:
hierarchical clustering, depicted as a dendrogram
low-dimensional embedding
Introduction
Clustering methods can be broadly classified as hierarchical or partitional
Introduction
Our main interest here is hierarchical clustering
The clustering hierarchy is often visualized as a dendrogram, a full binary tree
The dendrogram has a significant disadvantage: it does not provide an exploratory visual representation of the data itself
Another issue is that of cluster validity
Introduction
We are particularly interested in methods for achieving a low-dimensional embedding of the data:
principal component analysis (PCA)
multidimensional scaling (MDS)
force-directed placement
These methods overcome some limitations of the dendrogram, but they cannot utilize external clustering information
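To make the embedding side concrete, here is a minimal PCA sketch built on NumPy's SVD. The helper name `pca_embedding` and the random stand-in data are illustrative assumptions. Like any plain embedding, it never looks at cluster labels, which is the limitation noted above.

```python
import numpy as np

def pca_embedding(data, dim=2):
    """Project the rows of `data` (n x p) onto the top `dim` principal components."""
    centered = data - data.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T  # n x dim coordinates

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))  # stand-in data; no cluster labels are used
coords = pca_embedding(data)
print(coords.shape)  # (100, 2)
```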
Introduction
A demonstration of the relative merits of the two approaches:
a dendrogram vs. a low-dimensional embedding
Basic Notions
We are given data about n elements {1, …, n}
Relationships between pairs of elements are given by distances $d_{ij} \ge 0$ or by similarities $w_{ij} \ge 0$
A 2-dimensional embedding of the data is defined by two vectors $x, y \in \mathbb{R}^n$
The coordinates of element i are $(x_i, y_i)$
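The notation above takes either distances $d_{ij}$ or similarities $w_{ij}$ as input. The sketch below computes Euclidean distances from raw feature vectors and, purely as an assumption (the slides do not say how similarities are obtained), converts them with a Gaussian kernel $w_{ij} = e^{-d_{ij}^2/\sigma^2}$. The helper names and `sigma` are hypothetical.

```python
import numpy as np

def distance_matrix(data):
    """Euclidean distances d_ij between all pairs of rows of `data`."""
    diff = data[:, None, :] - data[None, :, :]  # n x n x p pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))

def similarity_matrix(d, sigma=1.0):
    """Assumed conversion w_ij = exp(-d_ij^2 / sigma^2), with w_ii set to 0."""
    w = np.exp(-(d ** 2) / sigma ** 2)
    np.fill_diagonal(w, 0.0)  # no self-similarity
    return w

points = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
d = distance_matrix(points)
w = similarity_matrix(d)
print(d[0, 1])  # 5.0
```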
Computing the x-Coordinates
The embedding must place each element exactly below its corresponding leaf in the dendrogram
This means that the x-coordinate of each element must equal the x-coordinate of its corresponding leaf in the dendrogram
We therefore face the problem of computing x-coordinates for the dendrogram leaves that preserve the relationships among the data as much as possible
Computing the x-Coordinates
After considering the existing methods, we opt for a twofold process:
First, find the best orientation of the dendrogram; this step determines the ordering of the leaves
Second, decide on the exact gaps between consecutive leaves in the ordering
Dendrogram orientation
A dendrogram with n leaves has $2^{n-1}$ different orientations, since each of its $n-1$ internal nodes can independently swap its two children
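The $2^{n-1}$ count can be checked by brute force. The sketch below assumes a hypothetical representation in which a leaf is an integer and an internal node is a `(left, right)` pair; flipping a node swaps the order of its two children's leaf blocks.

```python
from itertools import product

def orderings(tree):
    """All leaf orderings obtainable by flipping internal nodes of `tree`.

    Hypothetical representation: a leaf is an int, an internal node is a
    (left, right) pair."""
    if not isinstance(tree, tuple):
        return [[tree]]
    left, right = tree
    result = []
    for a, b in product(orderings(left), orderings(right)):
        result.append(a + b)  # this node keeps its orientation
        result.append(b + a)  # this node is flipped
    return result

tree = (((0, 1), 2), (3, 4))  # 5 leaves, 4 internal nodes
print(len(orderings(tree)))   # 16 == 2 ** (5 - 1)
```

Each flip yields a distinct ordering because the two children always hold disjoint, nonempty sets of leaves.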
Dendrogram orientation
One way of formally defining what should be considered a “good” ordering is to associate a cost function with the dendrogram, such that finding the best ordering is equivalent to optimizing this function
A natural cost function is that of the classical minimum linear arrangement problem, which seeks the ordering x that minimizes
$$\mathrm{LA}_{\mathrm{sim}}(x) \stackrel{def}{=} \sum_{i<j} w_{ij}\,|x_i - x_j|$$
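A direct transcription of $\mathrm{LA}_{\mathrm{sim}}$, plus an exhaustive search over all $n!$ permutations (the classical, unconstrained problem), is sketched below. The similarity matrix is a made-up toy example and `la_sim` is a hypothetical helper name; positions are 0-based, which does not change any difference $|x_i - x_j|$.

```python
from itertools import permutations

def la_sim(order, w):
    """LA_sim(x) = sum over pairs i < j of w[i][j] * |x_i - x_j|.

    `order` lists the elements 0..n-1 left to right; x_i is i's position."""
    pos = {elem: p for p, elem in enumerate(order)}
    n = len(order)
    return sum(w[i][j] * abs(pos[i] - pos[j])
               for i in range(n) for j in range(i + 1, n))

# Toy similarities: 0~2 and 1~3 are strong, 2~3 is weak
w = [[0, 0, 5, 0],
     [0, 0, 0, 5],
     [5, 0, 0, 1],
     [0, 5, 1, 0]]

# Classical (unconstrained) minimum linear arrangement: search all n! orders
best = min(permutations(range(4)), key=lambda o: la_sim(o, w))
print(best, la_sim(best, w))  # (0, 2, 3, 1) 11
```

The optimum places every similar pair side by side, so each nonzero weight is multiplied by a distance of 1.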
Dendrogram orientation
In our particular problem, we are also faced with an ordering task: finding a permutation of {1, …, n}
However, here we should not consider all possible permutations, but only those that agree with the dendrogram’s structure
This reduces the search space from n! permutations to $2^{n-1}$ orientations
Using dynamic programming, the best orientation can be found in running time that is exponential in the dendrogram’s height, not in its size
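Dendrogram orientation
We also introduce an additional form of the cost function for data given as distances; here the best ordering x is the one that maximizes
$$\mathrm{LA}_{\mathrm{dist}}(x) \stackrel{def}{=} \sum_{i<j} d_{ij}\,|x_i - x_j|$$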
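For distances the search direction flips: far-apart elements should end up far apart in the ordering. The sketch below brute-forces $\mathrm{LA}_{\mathrm{dist}}$ over only the dendrogram-consistent orderings. The nested-pair tree format, the helper names, and the toy 1-D data are assumptions for illustration.

```python
from itertools import product

def orderings(tree):
    """Leaf orderings of a dendrogram given as nested pairs (leaves are ints)."""
    if not isinstance(tree, tuple):
        return [[tree]]
    result = []
    for a, b in product(orderings(tree[0]), orderings(tree[1])):
        result += [a + b, b + a]  # keep or flip this internal node
    return result

def la_dist(order, d):
    """LA_dist(x) = sum over pairs i < j of d[i][j] * |x_i - x_j|."""
    pos = {elem: p for p, elem in enumerate(order)}
    n = len(order)
    return sum(d[i][j] * abs(pos[i] - pos[j])
               for i in range(n) for j in range(i + 1, n))

coords = [0, 1, 10, 11]  # toy 1-D data
d = [[abs(a - b) for b in coords] for a in coords]
tree = ((0, 1), (2, 3))
best = max(orderings(tree), key=lambda o: la_dist(o, d))
print(best, la_dist(best, d))  # [0, 1, 2, 3] 84
```

The winning orientation reproduces the left-to-right order of the original coordinates (its mirror image ties).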
Dendrogram orientation
Given an ordered dendrogram T and a node v:
Leaves(v): the set of leaves in the subtree rooted at v
Let x be the ordering of the leaves
Let S = Leaves(v), let L be the set of leaves to the left of S, and let R be the set of leaves to the right of S
If |L| = l and |S| = s, we have x(L) = {1, …, l}, x(S) = {l+1, …, l+s}, x(R) = {l+s+1, …, n}
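Dendrogram orientation
A key concept of the algorithm is the local arrangement cost, defined as:
$$\mathrm{LocalLA}_T(v) \stackrel{def}{=} \sum_{i,j \in S,\; i<j} w_{ij}\,|x_i - x_j| + \sum_{i \in S,\; j \in L} w_{ij}\,(x_i - l) + \sum_{i \in S,\; j \in R} w_{ij}\,(l + s - x_i) + \sum_{i \in L,\; j \in R} w_{ij}\, s$$
where |L| = l, |S| = s, x(L) = {1, …, l}, x(S) = {l+1, …, l+s}, and x(R) = {l+s+1, …, n}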
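As we read the formula, $\mathrm{LocalLA}_T(v)$ is the part of the total cost whose pair intervals fall inside the stretch $[l, l+s]$ occupied by $S$. Two consequences can be checked numerically with the sketch below: at the root (where L and R are empty) it equals $\mathrm{LA}_{\mathrm{sim}}$, and a node's cost is the sum of its two children's costs. The helper `local_la` is hypothetical and uses the slide's 1-based positions.

```python
def local_la(order, subtree_leaves, w):
    """LocalLA_T(v) for the node v whose leaves are `subtree_leaves`.

    `order` lists all leaves left to right; positions are 1-based as on the
    slide, and S = Leaves(v) must occupy a contiguous block of positions."""
    x = {leaf: p + 1 for p, leaf in enumerate(order)}
    S = set(subtree_leaves)
    l, s = min(x[i] for i in S) - 1, len(S)
    assert max(x[i] for i in S) == l + s, "S must be contiguous"
    L = [j for j in order if x[j] <= l]
    R = [j for j in order if x[j] > l + s]
    members = sorted(S)
    cost = 0
    for a, i in enumerate(members):  # pairs inside S
        for j in members[a + 1:]:
            cost += w[i][j] * abs(x[i] - x[j])
    for i in S:
        cost += sum(w[i][j] * (x[i] - l) for j in L)      # S-to-L pairs
        cost += sum(w[i][j] * (l + s - x[i]) for j in R)  # S-to-R pairs
    cost += s * sum(w[i][j] for i in L for j in R)        # L-to-R pairs
    return cost

w = [[0, 1, 0, 0],
     [1, 0, 10, 0],
     [0, 10, 0, 1],
     [0, 0, 1, 0]]
order = [0, 1, 2, 3]
print(local_la(order, order, w))  # 12, the root's cost equals LA_sim
print(local_la(order, [0, 1], w) + local_la(order, [2, 3], w))  # 12
```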
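Dendrogram orientation
Two additional related terms will be used:
$$\mathrm{LeftCut}_T(v) \stackrel{def}{=} \sum_{i \in S,\; j \in L} w_{ij} \qquad \mathrm{RightCut}_T(v) \stackrel{def}{=} \sum_{i \in S,\; j \in R} w_{ij}$$
Another term will be used in the algorithm:
$$\mathrm{InnerCut}(v) = \sum_{i \in \mathrm{Leaves}(v.\mathrm{left}),\; j \in \mathrm{Leaves}(v.\mathrm{right})} w_{ij}$$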
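The algorithm itself is not spelled out in the text here, so the following is not the paper's procedure (which is phrased in terms of LeftCut, RightCut and InnerCut). It is a sketch of one dynamic program consistent with the $\mathrm{LocalLA}_T$ definition. A node's local cost splits into its two children's local costs with L and R updated, and at a single leaf ($s = 1$) it reduces to the weight of the cut between L and everything else. Because a subproblem depends only on the set L, memoizing on (node, L) yields at most $2^{\mathrm{depth}}$ states per node, matching the claim that the running time is exponential in the height. The names `best_orientation`, `solve`, `cut` and the nested-pair tree format are hypothetical.

```python
from functools import lru_cache

def leaves(tree):
    """Leaf labels of a dendrogram given as nested pairs (leaves are ints)."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def best_orientation(tree, w):
    """Minimize LA_sim over the 2^(n-1) orientations of `tree`.

    Returns (cost, ordering). A node's subproblem depends only on the set L
    of leaves to its left (R is the rest), so results are memoized on (node, L)."""
    everything = frozenset(leaves(tree))

    def cut(L):
        # Total weight of pairs with one end in L and the other outside L
        return sum(w[a][b] for a in L for b in everything - L)

    @lru_cache(maxsize=None)
    def solve(node, L):
        if not isinstance(node, tuple):
            # The unit of length just left of this leaf is crossed exactly by
            # the pairs with one end in L and the other end outside L
            return cut(L), (node,)
        best = None
        for first, second in (node, node[::-1]):  # keep or flip this node
            c1, o1 = solve(first, L)
            c2, o2 = solve(second, L | frozenset(o1))
            if best is None or c1 + c2 < best[0]:
                best = (c1 + c2, o1 + o2)
        return best

    return solve(tree, frozenset())

w = [[0, 1, 0, 0],
     [1, 0, 10, 0],
     [0, 10, 0, 1],
     [0, 0, 1, 0]]
print(best_orientation(((0, 1), (2, 3)), w))  # (12, (0, 1, 2, 3))
```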
Determining coordinates of the leaves
Next, we compute the exact gap between each two consecutive leaves
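Determining coordinates of the leaves
A better approach is to take a weighted average over all influenced leaf pairs, i.e., all pairs of leaves lying on opposite sides of the gap
[Equation: gap(j, k) for consecutive leaves j and k, a weighted average, over the leaf pairs spanning the gap, of terms involving their distances d]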
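The exact weighting in the paper's gap formula is not recoverable here, so the sketch below uses an assumed scheme in the same spirit. A leaf pair at positions $i \le p < k$ spans $k - i$ gaps, so its per-gap share of the distance is $d/(k-i)$; the gap after position p is the average of those shares weighted by $1/(k-i)$, so nearby pairs count more. Cumulative sums of the gaps then give the leaves' x-coordinates. Both helper names are hypothetical.

```python
import numpy as np

def leaf_gaps(order, d):
    """Assumed weighted-average gap estimate between consecutive leaves.

    For the gap after position p, every leaf pair at positions i <= p < k
    contributes its per-gap share d / (k - i), with weight 1 / (k - i)."""
    n = len(order)
    gaps = np.zeros(n - 1)
    for p in range(n - 1):
        num = den = 0.0
        for i in range(p + 1):
            for k in range(p + 1, n):
                span = k - i
                num += (1.0 / span) * d[order[i]][order[k]] / span
                den += 1.0 / span
        gaps[p] = num / den
    return gaps

def leaf_x_coordinates(order, d):
    """Place leaves left to right: the first at 0, then add up the gaps."""
    positions = np.concatenate([[0.0], np.cumsum(leaf_gaps(order, d))])
    x = np.empty(len(order))
    x[list(order)] = positions  # element order[p] gets positions[p]
    return x

coords = [0.0, 1.0, 2.0, 3.0]  # evenly spaced toy data, leaves 0..3
d = [[abs(a - b) for b in coords] for a in coords]
print(leaf_x_coordinates([0, 1, 2, 3], d))  # [0. 1. 2. 3.]
```

On evenly spaced data every pair's per-gap share is identical, so any weighting recovers the original spacing; the weighting only matters when the shares disagree.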
Computing the y-Coordinates
Principal component analysis
Classical multidimensional scaling
Eigen-projection
Stress minimization
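As one illustration of the PCA option, the sketch below assumes (this is not stated on the slide) that y should add information not already carried by the fixed x-coordinates. It therefore finds the unit direction $a$ maximizing the variance of $y = X_c a$ subject to $y$ being uncorrelated with $x$. Since $x_c^\top X_c a = c^\top a$ with $c = X_c^\top x_c$, the constraint is $a \perp c$, handled by projecting the covariance onto the complement of $c$. The helper name `y_coordinates` and the stand-in x are hypothetical.

```python
import numpy as np

def y_coordinates(data, x):
    """Assumed PCA-style y: maximize var(Xc @ a) subject to corr(x, y) = 0."""
    Xc = data - data.mean(axis=0)
    xc = x - x.mean()
    c = Xc.T @ xc                      # xc . (Xc @ a) == c . a
    P = np.eye(Xc.shape[1])
    norm = np.linalg.norm(c)
    if norm > 0:
        P -= np.outer(c, c) / norm ** 2  # projector onto directions with c . a = 0
    cov = P @ (Xc.T @ Xc) @ P
    _, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return Xc @ vecs[:, -1]            # top feasible principal direction

rng = np.random.default_rng(1)
data = rng.normal(size=(60, 4))
x = data[:, 0] + 0.5 * data[:, 1]      # stand-in for the dendrogram-derived x
y = y_coordinates(data, x)
print(abs(np.corrcoef(x, y)[0, 1]) < 1e-8)  # True
```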
Results
Odors dataset: consists of 30 volatile odorous pure chemicals
Contains 262 elements; natural clusters: 30
We use UPGMA agglomerative clustering to construct the dendrogram
Results
Iris dataset: a classic example in discriminant analysis
Contains 150 elements; natural clusters: 3
Results
Gene expression data: CDC15-synchronized cell cycle
A much larger gene-expression dataset
Contains 6113 elements
Related Work
TreeView: a dendrogram drawn over a color-coded matrix
Discussion
The method successfully integrates two key methods in exploratory data analysis:
cluster analysis and low-dimensional embedding
It has two unique properties:
Guaranteed separation between any kind of given clusters
The ability to deal with a predefined hierarchical clustering
Personal Opinion
Advantages
─ Successfully integrates two methods: clustering and low-dimensional embedding
─ Provides more intuition when analyzing data
Applications
─ Clustering and analysis of real data
─ May solve the problem of lacking clustering information
Limitations
─ Cannot show the real shape of clusters