Intelligent Database Systems Lab. Presenter: CHANG, SHIH-JIE. Authors: Luca Cagliero, Paolo Garza. 2013. DKE. Improving classification models with taxonomy information

Intelligent Database Systems Lab

Presenter: CHANG, SHIH-JIE

Authors: Luca Cagliero, Paolo Garza

2013.DKE.

Improving classification models with taxonomy information

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• Many approaches to building accurate classifiers have been proposed, but integrating taxonomy information into the data used for classifier training has not been investigated so far.

Objectives

• This paper presents a general-purpose strategy that improves the accuracy of structured-data classifiers by enriching the data with knowledge provided by a taxonomy built over the data items.

Definition. Aggregation tree

Definition. Multiple-taxonomy

Let $T = \{t_1, \ldots, t_n\}$ be a set of attributes. A multiple-taxonomy $\Theta = \{AT_1, \ldots, AT_m\}$ is a forest of aggregation trees defined on the domains of the attributes in $T$.
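
The definition above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `AggregationTree`, the child-to-parent encoding, and every example value other than "Italy" and "Entrepreneur" are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AggregationTree:
    """Is-a hierarchy over the domain of one attribute (hypothetical encoding).

    `parent` maps each value to its direct generalization; roots have no entry.
    The mapping is assumed to be acyclic, as in any tree.
    """

    attribute: str
    parent: dict[str, str] = field(default_factory=dict)

    def ancestors(self, value: str) -> list[str]:
        """Return all generalizations of `value`, nearest first."""
        chain = []
        while value in self.parent:
            value = self.parent[value]
            chain.append(value)
        return chain


# Illustrative trees; the city names and "Self-employed" are made up.
location = AggregationTree(
    "Location", {"Turin": "Italy", "Rome": "Italy", "Italy": "Europe"}
)
user_category = AggregationTree(
    "User category", {"Entrepreneur": "Self-employed"}
)

# A multiple-taxonomy is a forest: here simply a list of aggregation trees.
theta = [location, user_category]

print(location.ancestors("Turin"))
```

Storing only child-to-parent links keeps the lookup of generalizations a simple upward walk, which is the only operation the later steps need.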

Methodology

Methodology – Multiple-taxonomy over data items in D

Two-step process:

(i) Generalized classification rule mining.
    Example: {(Location, Italy)} ⇒ {(User category, Entrepreneur)} (s = 50%, c = 100%)
    (1) An extended version of the training dataset is generated first.
    (2) An FP-tree-like representation of the extended dataset is generated. Only frequent items are included in the FP-tree.

(ii) Rule selection by means of lazy pruning.
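
Step (i) can be illustrated with a small sketch. It assumes that "extended" means each record gains the generalizations of its values taken from the taxonomy; the four records, the taxonomy, and all function names are hypothetical, chosen so that the example rule reaches s = 50% and c = 100%. No FP-tree is built here: the code only shows the dataset extension, the frequent-item filter, and the support/confidence computation.

```python
from collections import Counter


def extend_record(record, taxonomy):
    """Return the record's (attribute, value) items plus all generalizations."""
    items = set()
    for attr, value in record.items():
        items.add((attr, value))
        parent = taxonomy.get(attr, {})
        while value in parent:  # walk up the is-a hierarchy
            value = parent[value]
            items.add((attr, value))
    return frozenset(items)


def frequent_items(extended, min_support):
    """Keep only items whose relative frequency reaches min_support."""
    counts = Counter(item for items in extended for item in items)
    n = len(extended)
    return {item for item, c in counts.items() if c / n >= min_support}


def support_confidence(extended, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent."""
    both = sum(1 for it in extended if antecedent <= it and consequent <= it)
    ante = sum(1 for it in extended if antecedent <= it)
    return both / len(extended), (both / ante if ante else 0.0)


# Hypothetical training data and taxonomy (child -> parent per attribute).
records = [
    {"Location": "Turin", "User category": "Entrepreneur"},
    {"Location": "Rome", "User category": "Entrepreneur"},
    {"Location": "Paris", "User category": "Student"},
    {"Location": "Lyon", "User category": "Employee"},
]
taxonomy = {
    "Location": {"Turin": "Italy", "Rome": "Italy",
                 "Paris": "France", "Lyon": "France"},
}

extended = [extend_record(r, taxonomy) for r in records]
s, c = support_confidence(
    extended,
    frozenset({("Location", "Italy")}),
    frozenset({("User category", "Entrepreneur")}),
)
print(s, c)  # 0.5 1.0
```

In this toy dataset no single city reaches 50% support, while the generalized item (Location, Italy) does, which is why the generalized rule can be mined at all.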

Methodology – lazy pruning

(1) Rules that only misclassify training data are pruned.

(2) Rules that correctly classify at least one training record are grouped in the Level I rule set, while rules that remain unused during the training phase are kept in Level II.
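
The two pruning steps above can be sketched as follows. This is an illustration under assumptions, not the G−L3 code: it assumes the rules arrive already sorted by rank, that a record is removed from further consideration once a kept rule covers it, and that a rule is "unused" when it matches no remaining record. The names `Rule` and `lazy_prune` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # set of (attribute, value) items
    label: str             # predicted class


def lazy_prune(sorted_rules, training):
    """Split ranked rules into Level I and Level II; drop harmful rules.

    `training` is a list of (items, label) pairs, where `items` is a
    frozenset of (attribute, value) items.
    """
    remaining = list(training)
    level1, level2 = [], []
    for rule in sorted_rules:
        matched = [(x, y) for x, y in remaining if rule.antecedent <= x]
        if not matched:
            # Unused during training: kept as a spare rule.
            level2.append(rule)
        elif any(y == rule.label for _, y in matched):
            # Correctly classifies at least one record: keep in Level I
            # and remove the records it covers (assumed coverage policy).
            level1.append(rule)
            remaining = [(x, y) for x, y in remaining
                         if not rule.antecedent <= x]
        # Otherwise the rule only misclassifies and is discarded.
    return level1, level2


italy = frozenset({("Location", "Italy")})
france = frozenset({("Location", "France")})
training = [(italy, "Entrepreneur"), (france, "Student")]

r1 = Rule(italy, "Entrepreneur")   # correct on the first record
r2 = Rule(france, "Entrepreneur")  # only misclassifies
r3 = Rule(italy, "Student")        # nothing left to match after r1

level1, level2 = lazy_prune([r1, r2, r3], training)
print(level1, level2)
```

The pruning is "lazy" in the sense that unused rules are not thrown away: they are set aside in Level II for test cases that Level I cannot cover.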

Methodology – The G−L3 algorithm

Methodology

Methodology – G−L3 class prediction

When a new test case r_t has to be classified, G−L3 considers the sorted rule sets in Level I and Level II.

The top-ranked Level I rule matching r_t is used first. If none of the Level I rules match r_t, the top-ranked Level II rule matching r_t is considered.

If no rule in either level matches r_t, the default class label is assigned to r_t.
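
The prediction procedure above amounts to a two-level lookup with a fallback. A minimal sketch, assuming each rule is a pair (antecedent item set, class label) and that both levels are already sorted by rank; the function name `predict` and the example rules are hypothetical.

```python
def predict(case_items, level1, level2, default_label):
    """Classify a test case with the first matching rule, Level I first."""
    for rule_set in (level1, level2):
        for antecedent, label in rule_set:
            if antecedent <= case_items:  # every rule item occurs in the case
                return label
    return default_label  # no rule in either level matched


# Illustrative rule sets; items are (attribute, value) pairs.
level1 = [(frozenset({("Location", "Italy")}), "Entrepreneur")]
level2 = [(frozenset({("Location", "France")}), "Student")]

case_a = frozenset({("Location", "Turin"), ("Location", "Italy")})
case_b = frozenset({("Location", "Paris"), ("Location", "France")})
case_c = frozenset({("Location", "Tokyo")})

print(predict(case_a, level1, level2, "Employee"))  # Entrepreneur
print(predict(case_b, level1, level2, "Employee"))  # Student
print(predict(case_c, level1, level2, "Employee"))  # Employee
```

The test cases are assumed to be extended with their generalizations in the same way as the training data, so that generalized rules such as the one on (Location, Italy) can match them.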

Experiments – Dataset characteristics

Experiments – Accuracy comparison (baseline vs. extended)

Experiments – Accuracy comparison

Experiments –

Experiments –

Experiments –

Experiments – execution time comparison

Conclusions

– Taxonomy integration is shown to yield significant accuracy improvements.

Comments

• Advantages
  – More accurate.
• Applications
  – Classification, data mining.