From Context to Distance-Learning Dissimilarity for Categorical Data Clustering


Page 1: From  Context to Distance-Learning Dissimilarity for Categorical Data Clustering

Intelligent Database Systems Lab

Presenter : JIAN-REN CHEN

Authors : DINO IENCO, RUGGERO G. PENSA, and ROSA MEO

2012, ACM TKDD

From Context to Distance-Learning Dissimilarity for Categorical Data Clustering

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• Clustering data described by categorical attributes is a challenging task in data mining applications.

• It is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered.

Objectives

• We present a new methodology to compute a context-based distance between values of a categorical variable.

  - Apply this technique to hierarchical clustering of categorical data.

Methodology - Framework

DILCA (DIstance Learning for Categorical Attributes)

1. Selection of a suitable context:
  (i) a parametric method
  (ii) a fully automatic one

2. Computation of the distance between any pair of values of a specific categorical attribute

Methodology - Context Selection
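The slides for this step survive only as titles, so the selection rule itself is not in the transcript. As a rough sketch, assuming the parametric method ranks every other attribute by its symmetric uncertainty with respect to the target attribute and keeps those scoring at least σ times the mean score, the idea could look like the following. The function names, the toy data, and the exact thresholding rule are assumptions for illustration, not taken from the slides.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(y, x):
    """SU(Y, X) = 2 * I(Y; X) / (H(Y) + H(X)), a score in [0, 1]."""
    h_y, h_x = entropy(y), entropy(x)
    if h_y + h_x == 0:
        return 0.0
    h_joint = entropy(list(zip(y, x)))
    mutual_info = h_y + h_x - h_joint
    return 2.0 * mutual_info / (h_y + h_x)


def select_context(data, target, sigma=1.0):
    """Hypothetical parametric selection: keep attributes whose SU with
    the target is at least sigma times the mean SU (assumed rule).

    data maps each attribute name to its column of values.
    """
    su = {name: symmetric_uncertainty(data[target], col)
          for name, col in data.items() if name != target}
    if not su:
        return []
    threshold = sigma * sum(su.values()) / len(su)
    return [name for name, score in su.items() if score >= threshold]


# Toy data (made up): "shape" determines "color", "noise" is independent.
toy = {
    "color": ["red", "red", "blue", "blue"],
    "shape": ["square", "square", "circle", "circle"],
    "noise": ["a", "b", "a", "b"],
}
context = select_context(toy, "color", sigma=1.0)
```

On this toy table only "shape" carries information about "color", so it is the only attribute kept in the context.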

Methodology - Distance Computation
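The formula on this slide is not preserved in the transcript. A minimal sketch, assuming the distance between two values of the target attribute is the Euclidean distance between their conditional probabilities given each value of the context attributes, normalized by the total number of context values, might look like this. The function name, the data layout, and the normalization are assumptions.

```python
from collections import Counter
from math import sqrt


def value_distances(data, target, context):
    """Distance between every pair of values of the target attribute.

    Assumed form: sqrt( sum_x (P(a | x) - P(b | x))^2 / number of x ),
    where x ranges over all values of all context attributes.
    data maps each attribute name to its column of values.
    """
    if not context:
        raise ValueError("context must contain at least one attribute")
    y_col = data[target]
    y_values = sorted(set(y_col))
    squared = {(a, b): 0.0 for a in y_values for b in y_values}
    n_context_values = 0
    for name in context:
        x_col = data[name]
        x_counts = Counter(x_col)
        joint = Counter(zip(y_col, x_col))
        n_context_values += len(x_counts)
        for x_val, n_x in x_counts.items():
            # Conditional probability of each target value given x_val.
            cond = {a: joint[(a, x_val)] / n_x for a in y_values}
            for a in y_values:
                for b in y_values:
                    squared[(a, b)] += (cond[a] - cond[b]) ** 2
    return {pair: sqrt(s / n_context_values) for pair, s in squared.items()}


# Toy data (made up): "shape" separates the colors, "noise" does not.
toy = {
    "color": ["red", "red", "blue", "blue"],
    "shape": ["square", "square", "circle", "circle"],
    "noise": ["a", "b", "a", "b"],
}
d_shape = value_distances(toy, "color", ["shape"])
d_noise = value_distances(toy, "color", ["noise"])
```

With "shape" as context the two colors never co-occur with the same shape, so they end up far apart; with "noise" as context their conditional probabilities are identical and the distance collapses to zero.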

Experiments - Datasets

Experiments - Purity, NMI, ARI
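The result tables for these slides are not in the transcript. As a reminder of what the first metric measures, here is a small sketch of purity, using its usual definition: each cluster is credited with its most frequent true class, and the credited objects are divided by the total. The labels below are made up. NMI and ARI are not reimplemented here; scikit-learn provides them as `normalized_mutual_info_score` and `adjusted_rand_score` in `sklearn.metrics`.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of objects that belong to the majority class of their cluster."""
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(true_labels)


# Made-up example: cluster 0 is pure, cluster 1 contains one stray "a".
true_labels = ["a", "a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 1, 1, 1, 1]
score = purity(true_labels, cluster_ids)
```

Here cluster 0 contributes 2 objects and cluster 1 contributes 3, so the purity is 5/6.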

Experiments - Impact of σ on DILCA_M

Experiments - Scalability

Conclusions

• DILCA is competitive with state-of-the-art categorical data clustering approaches.

• DILCA is scalable and has a low impact on the overall computational time of a clustering task.

Comments

• Advantages
  - scalable, with a low impact on computational time

• Applications
  - a context-based distance between values of a categorical variable
  - hierarchical clustering of categorical data
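
The slides do not show how the learned value distances feed the clustering step. One plausible sketch, assuming the distance between two objects is the square root of the sum of squared per-attribute value distances, and using a naive average-linkage agglomerative procedure chosen here only for brevity, is given below. The function names, the linkage criterion, and the toy distance table are all assumptions.

```python
from math import sqrt


def object_distance(row_a, row_b, value_dist):
    """Combine per-attribute value distances into one object distance.

    value_dist[i] maps a pair of values of attribute i to their
    learned distance (assumed to be precomputed).
    """
    return sqrt(sum(value_dist[i][(a, b)] ** 2
                    for i, (a, b) in enumerate(zip(row_a, row_b))))


def agglomerative(rows, value_dist, n_clusters):
    """Naive average-linkage clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(rows))]
    dist = [[object_distance(a, b, value_dist) for b in rows] for a in rows]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy input (made up): one attribute with two values at distance 1.
rows = [("x",), ("x",), ("y",), ("y",)]
value_dist = [{("x", "x"): 0.0, ("x", "y"): 1.0,
               ("y", "x"): 1.0, ("y", "y"): 0.0}]
result = agglomerative(rows, value_dist, n_clusters=2)
```

For real data sets, a library routine that accepts a precomputed distance matrix would be a more practical choice than this quadratic-memory, cubic-time loop.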