Intelligent Database Systems Lab
Presenter: CHANG, SHIH-JIE
Authors: Luca Cagliero, Paolo Garza
Data & Knowledge Engineering (DKE), 2013
Improving classification models with taxonomy information
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Many approaches to building accurate classifiers have been proposed, but integrating taxonomy information into the data used for classifier training has not been investigated so far.
Objectives
• This paper presents a general-purpose strategy that improves the accuracy of structured-data classifiers by exploiting a taxonomy built over the data items.
Definition. Aggregation tree
Definition. Multiple-taxonomy
Let T = {t_1, …, t_n} be a set of attributes. A multiple-taxonomy Θ = {AT_1, …, AT_m} is a forest of aggregation trees defined on the domains of the attributes in T.
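The definition above can be illustrated with a minimal Python sketch. It assumes each aggregation tree is stored as a child-to-parent map over the domain of one attribute. The attribute names, the values, and the helper `ancestors` are hypothetical and are not taken from the paper.

```python
# Hypothetical multiple-taxonomy: one aggregation tree per attribute,
# each stored as a child -> parent map (root values have no entry).
MULTIPLE_TAXONOMY = {
    "Location": {
        "Turin": "Italy",
        "Rome": "Italy",
        "Paris": "France",
        "Italy": "Europe",
        "France": "Europe",
    },
    "Date": {
        "2012-05-01": "May 2012",
        "May 2012": "2012",
    },
}


def ancestors(taxonomy, attribute, value):
    """Return the generalizations of `value`, from the nearest one up to the root."""
    tree = taxonomy.get(attribute, {})
    result = []
    while value in tree:
        value = tree[value]
        result.append(value)
    return result


print(ancestors(MULTIPLE_TAXONOMY, "Location", "Turin"))  # ['Italy', 'Europe']
```

An attribute with no aggregation tree simply yields no generalizations, so the same lookup works for attributes the taxonomy does not cover.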
Methodology
Methodology – Multiple-taxonomy over data items in D
Two-step process:
(i) Generalized classification rule mining.
ex: {(Location, Italy)} ⇒ {(User category, Entrepreneur)} (s=50%, c=100%)
(1) An extended version of the training dataset is generated first.
(2) An FP-tree-like representation of the extended dataset is then generated. Only frequent items are included in the FP-tree.
(ii) Rule selection by means of lazy pruning.
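Step (i) can be sketched in Python as follows. This is only an illustration under stated assumptions: the taxonomy, the toy training records, and all function names are hypothetical, and a plain item counter stands in for the FP-tree-like structure. The toy records were chosen so that a generalized rule with antecedent (Location, Italy) and class Entrepreneur reaches s=50% and c=100%; they are not the paper's dataset.

```python
from collections import Counter

# Hypothetical taxonomy: attribute -> (child -> parent) map.
TAXONOMY = {"Location": {"Turin": "Italy", "Rome": "Italy", "Paris": "France"}}

# Toy training records as (items, class label). Not the paper's data.
TRAINING = [
    (frozenset({("Location", "Turin")}), "Entrepreneur"),
    (frozenset({("Location", "Rome")}), "Entrepreneur"),
    (frozenset({("Location", "Paris")}), "Student"),
    (frozenset({("Location", "Paris")}), "Entrepreneur"),
]


def extend_record(items):
    """Add every generalization of each item to the record."""
    extended = set(items)
    for attribute, value in items:
        tree = TAXONOMY.get(attribute, {})
        while value in tree:
            value = tree[value]
            extended.add((attribute, value))
    return frozenset(extended)


def frequent_items(records, min_support):
    """Keep the items whose relative support reaches min_support."""
    counts = Counter(item for items, _ in records for item in items)
    return {item for item, n in counts.items() if n / len(records) >= min_support}


def support_confidence(records, antecedent, label):
    """Support and confidence of the rule antecedent => label."""
    covered = [cls for items, cls in records if antecedent <= items]
    correct = sum(1 for cls in covered if cls == label)
    support = correct / len(records)
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


extended = [(extend_record(items), cls) for items, cls in TRAINING]
rule = frozenset({("Location", "Italy")})
print(support_confidence(extended, rule, "Entrepreneur"))  # (0.5, 1.0)
```

In the unextended toy data, neither (Location, Turin) nor (Location, Rome) reaches a 50% support threshold on its own, while the generalized item (Location, Italy) does. This is the kind of rule the extended dataset makes available.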
Methodology – Lazy pruning
(1) Rules that only misclassify training data are pruned.
(2) Rules that correctly classify at least one training record are grouped in the Level I rule set, while rules that remain unused during the training phase are kept in the Level II rule set.
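The two pruning steps can be sketched in Python as below. The sketch assumes a coverage pass over rules that are already ranked best-first, and it assumes that covered training records are removed only when the rule is kept. Both are assumptions made for illustration, as are the function name `lazy_prune` and the toy data; the paper's exact bookkeeping may differ.

```python
def lazy_prune(sorted_rules, training):
    """Split ranked rules into Level I and Level II, dropping harmful ones.

    sorted_rules: list of (antecedent, label), ranked best-first.
    training: list of (items, label), with items as a frozenset.
    """
    remaining = list(training)
    level1, level2 = [], []
    for antecedent, label in sorted_rules:
        matched = [rec for rec in remaining if antecedent <= rec[0]]
        right = sum(1 for _, cls in matched if cls == label)
        if right > 0:
            # Correctly classifies at least one record: Level I.
            level1.append((antecedent, label))
            # Assumption: covered records are removed only when the rule is kept.
            remaining = [rec for rec in remaining if not antecedent <= rec[0]]
        elif matched:
            # Matches training data but only misclassifies it: pruned.
            continue
        else:
            # Unused during the training phase: Level II.
            level2.append((antecedent, label))
    return level1, level2


# Toy data, purely illustrative.
training = [
    (frozenset({"a", "b"}), "X"),
    (frozenset({"a"}), "X"),
    (frozenset({"c"}), "Y"),
]
rules = [
    (frozenset({"a"}), "X"),
    (frozenset({"c"}), "X"),
    (frozenset({"b"}), "X"),
    (frozenset({"c"}), "Y"),
]
level1, level2 = lazy_prune(rules, training)
print(level1)
print(level2)
```

Here the rule on `c` predicting `X` is the only one discarded, because it matches a training record and gets it wrong. The rule on `b` is never exercised once the first rule has covered its records, so it is retained in Level II rather than thrown away.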
Methodology – The G−L3 algorithm
Methodology
Methodology – G−L3 class prediction
• When a new test case rt has to be classified, G−L3 considers the sorted rule sets in Level I and Level II.
• The top-ranked Level I rule matching rt is applied first. If none of the Level I rules match rt, the top-ranked Level II rule matching rt is considered.
• If none of the rules in the two rule sets match rt, the default class label is assigned to rt.
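The prediction procedure above can be sketched in Python as a two-level lookup. The function name `predict`, the rule representation (antecedent set plus label), and the toy rules are assumptions made for illustration and do not come from the paper.

```python
def predict(record, level1, level2, default_label):
    """Return the label of the first matching rule, trying Level I before Level II."""
    for rules in (level1, level2):
        for antecedent, label in rules:  # each rule set is ranked best-first
            if antecedent <= record:
                return label
    return default_label


# Toy rule sets, purely illustrative.
level1 = [(frozenset({"a"}), "X"), (frozenset({"c"}), "Y")]
level2 = [(frozenset({"b"}), "Z")]

print(predict(frozenset({"a", "b"}), level1, level2, "X"))  # X (Level I match)
print(predict(frozenset({"b"}), level1, level2, "X"))       # Z (Level II fallback)
print(predict(frozenset({"d"}), level1, level2, "X"))       # X (default class)
```

Level II is consulted only when Level I has no matching rule, so the rules kept by lazy pruning act as a fallback and never override a Level I rule.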
Experiments – Dataset characteristics
Experiments – Accuracy comparison (baseline vs. extended)
Experiments – Accuracy comparison
Experiments – Execution time comparison
Conclusions
• Taxonomy integration is shown to yield significant accuracy improvements.
Comments
• Advantages
– More accurate.
• Applications
– Classification, data mining.