Intelligent Database Systems Lab
Presenter: JIAN-REN CHEN
Authors: DINO IENCO, RUGGERO G. PENSA, and ROSA MEO
2012, ACM TKDD
From Context to Distance: Learning Dissimilarity for Categorical Data Clustering
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Clustering data described by categorical attributes is a challenging task in data mining applications.
• It is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered.
Objectives
• We present a new methodology to compute a context-based distance between values of a categorical variable.
• We apply this technique to hierarchical clustering of categorical data.
Methodology - Framework
DILCA (DIstance Learning for Categorical Attributes)
1. Selection of a suitable context:
   (i) a parametric method
   (ii) a fully automatic one
2. Computation of the distance between any pair of values of a specific categorical attribute
Methodology - Context Selection
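A minimal sketch of how a parametric context selection could work. It assumes that the relevance of each candidate attribute to the target attribute is scored with symmetric uncertainty, and that the context keeps the attributes whose score is at least σ times the mean score. Both the scoring rule and the threshold rule are assumptions made for illustration, and the names `entropy`, `symmetric_uncertainty`, and `select_context` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(target, other):
    """Assumed relevance score: 2 * I(target; other) / (H(target) + H(other))."""
    h_target = entropy(target)
    h_other = entropy(other)
    if h_target + h_other == 0:
        return 0.0  # both attributes are constant, so nothing can be learned
    h_joint = entropy(list(zip(target, other)))
    mutual_information = h_target + h_other - h_joint
    return 2.0 * mutual_information / (h_target + h_other)


def select_context(columns, target_name, sigma=1.0):
    """Keep attributes whose score is >= sigma * mean score (assumed rule)."""
    target = columns[target_name]
    scores = {
        name: symmetric_uncertainty(target, col)
        for name, col in columns.items()
        if name != target_name
    }
    if not scores:
        return []
    threshold = sigma * sum(scores.values()) / len(scores)
    return [name for name, score in scores.items() if score >= threshold]


# Toy data: "shape" fully determines "color", "noise" is independent of it.
columns = {
    "color": ["r", "r", "b", "b"],
    "shape": ["s", "s", "c", "c"],
    "noise": ["x", "y", "x", "y"],
}
context = select_context(columns, "color", sigma=1.0)
```

On this toy data only `shape` is selected as context for `color`, since `noise` carries no information about it. A larger σ makes the selection stricter, a smaller one more permissive.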
Methodology - Distance Computation
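A minimal sketch of a context-based distance between two values of a categorical attribute. It assumes the distance compares the conditional probabilities of the two values given each value of each context attribute, and normalizes by the total number of context values. This formula is an assumption made for illustration, and the names `conditional_probs` and `value_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def conditional_probs(target, context_col):
    """Estimate P(target value | context value) from co-occurrence counts."""
    joint = Counter(zip(target, context_col))
    context_counts = Counter(context_col)
    return {
        (y, x): joint[(y, x)] / context_counts[x]
        for y in set(target)
        for x in context_counts
    }


def value_distance(target, context_cols, y_i, y_j):
    """Assumed distance: root of the mean squared difference between
    P(y_i | x) and P(y_j | x) over every value x of every context attribute."""
    total = 0.0
    n_context_values = 0
    for col in context_cols:
        probs = conditional_probs(target, col)
        distinct = set(col)
        for x in distinct:
            total += (probs[(y_i, x)] - probs[(y_j, x)]) ** 2
        n_context_values += len(distinct)
    return sqrt(total / n_context_values) if n_context_values else 0.0


# Toy data: "shape" separates the two colors perfectly, "noise" does not.
color = ["r", "r", "b", "b"]
shape = ["s", "s", "c", "c"]
noise = ["x", "y", "x", "y"]

d_informative = value_distance(color, [shape], "r", "b")    # 1.0
d_uninformative = value_distance(color, [noise], "r", "b")  # 0.0
```

Two values end up close when they co-occur with the context values in similar proportions, which is why the choice of context matters: an uninformative context makes every pair of values look identical.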
Experiments - Datasets
Experiments - Purity, NMI, ARI
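As a reminder of what the first of these measures captures, here is a minimal sketch of purity under its usual definition: each cluster is credited with its most frequent true class, and the credited counts are divided by the number of objects. The function name `purity` and the toy labels are hypothetical. NMI and ARI are not sketched here.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, class_labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    members = defaultdict(list)
    for cluster, cls in zip(cluster_labels, class_labels):
        members[cluster].append(cls)
    # For each cluster, count how many objects carry its most frequent class.
    majority_total = sum(
        Counter(classes).most_common(1)[0][1] for classes in members.values()
    )
    return majority_total / len(class_labels)


# Cluster 0 is pure; cluster 1 mixes classes "b" and "a".
score = purity([0, 0, 1, 1], ["a", "a", "b", "a"])  # 0.75
```

Purity rewards many small clusters (one object per cluster gives a score of 1), which is one reason to report it alongside NMI and ARI.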
Experiments - Impact of σ on DILCA_M
Experiments - Scalability
Conclusions
• DILCA is competitive with state-of-the-art approaches to categorical data clustering.
• DILCA is scalable and has a low impact on the overall computational time of a clustering task.
Comments
• Advantages
– scalable, with low computational time
• Applications
– a context-based distance between values of a categorical variable
– hierarchical clustering of categorical data