Intelligent Database Systems Lab
National Yunlin University of Science and Technology (國立雲林科技大學)
Mining Favorable Facets
Presenter: Wei-Hao Huang
Authors: Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, Ke Wang
SIGKDD, 2008
N.Y.U.S.T.
I. M.
Outline
─ Motivation
─ Objectives
─ Methodology
─ Experiments
─ Conclusions
─ Comments
Motivation
─ Dominance and skyline analysis are important in multi-criteria decision-making applications.
─ Skyline analysis usually assumes a fixed order on attribute values, but different customers may have different preferences on nominal attributes.
─ This motivates the problem of finding favorable facets.
Objectives
─ Propose the minimal disqualifying condition (MDC), which summarizes favorable facets and is meaningful to the user.
─ Develop two algorithms:
  ─ Computing MDC On-the-fly (MDC-O)
  ─ A Materialization Method (MDC-M)
─ Use real and synthetic data sets to verify effectiveness and efficiency.
Methodology
─ Skyline analysis
─ Naïve Method
─ Minimal Disqualifying Conditions (MDC)
─ MDC On-the-fly (MDC-O)
─ A Materialization Method (MDC-M)
Skyline analysis
Naïve Method: Lattice Search
Minimal Disqualifying Conditions
Used to summarize favorable facets effectively.
R’ = {(T, M)}
R’’ = {(H, M)}
MDC(f) = {(T, M), (H, M)}
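The example above can be reproduced with a small Python sketch. It rests on several assumptions that the slide does not state: a pair (x, y) is read as "value x is preferred to value y", smaller numeric values are better, the template is empty, and the four points a, c, e and f are hypothetical data chosen so that f ends up with the two conditions shown. The names `condition_for` and `mdc` are illustrative and not taken from the paper.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# A point is (numeric values, nominal values). Smaller numeric values are
# assumed to be better; this convention is an assumption of the sketch.
Point = Tuple[Tuple[float, ...], Tuple[str, ...]]
# (nominal attribute index, preferred value, less-preferred value)
Pair = Tuple[int, str, str]
Condition = FrozenSet[Pair]


def condition_for(q: Point, p: Point) -> Optional[Condition]:
    """Preference pairs under which q dominates p, or None if q never can."""
    if any(a > b for a, b in zip(q[0], p[0])):
        return None  # q is worse than p on some numeric attribute
    pairs = frozenset(
        (i, u, v) for i, (u, v) in enumerate(zip(q[1], p[1])) if u != v
    )
    strictly_better = any(a < b for a, b in zip(q[0], p[0]))
    if not pairs and not strictly_better:
        return None  # q ties with p on every attribute
    return pairs


def mdc(p: Point, data: List[Point]) -> Set[Condition]:
    """Keep only the conditions that have no proper subset among the others."""
    conds = {c for c in (condition_for(q, p) for q in data) if c is not None}
    return {c for c in conds if not any(other < c for other in conds)}


# Hypothetical data: two numeric attributes and one nominal attribute.
a = ((1600, 1), ("T",))
c = ((3000, 1), ("H",))
e = ((2400, 3), ("M",))
f = ((3000, 2), ("M",))

result = mdc(f, [a, c, e, f])
for cond in sorted(result, key=sorted):
    print(sorted(cond))
```

Under these assumptions f is disqualified as soon as a user prefers T to M (then a dominates f) or H to M (then c dominates f), which matches the two single-pair conditions R’ and R’’ on the slide.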
MDC-O: Computing MDC On-the-fly
Input: point P, data set D, template R
Process: compute the MDC of P on the fly
Output: MDC(P)
MDC-M: A Materialization Method
Input: data set D, template R
Process: materialization
Output: SKY(R), MDC
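One way to read the materialization idea is sketched below in Python: the MDC of every point is computed once, and later skyline questions are answered by lookup. This is a simplified illustration, not the paper's algorithm. It assumes an empty template, smaller-is-better numeric attributes, a user preference given as an already transitively closed set of (attribute index, preferred, less preferred) pairs, and hypothetical data; `materialize` and `skyline` are made-up names.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Point = Tuple[Tuple[float, ...], Tuple[str, ...]]
Pair = Tuple[int, str, str]  # (nominal attribute index, preferred, less preferred)
Condition = FrozenSet[Pair]


def materialize(data: List[Point]) -> Dict[Point, Set[Condition]]:
    """Precompute the minimal disqualifying conditions of every point."""
    table: Dict[Point, Set[Condition]] = {}
    for p in data:
        conds: Set[Condition] = set()
        for q in data:
            if any(x > y for x, y in zip(q[0], p[0])):
                continue  # q is worse on a numeric attribute
            pairs = frozenset(
                (i, u, v) for i, (u, v) in enumerate(zip(q[1], p[1])) if u != v
            )
            # q must be strictly better somewhere: a nominal pair or a number.
            if pairs or any(x < y for x, y in zip(q[0], p[0])):
                conds.add(pairs)
        table[p] = {c for c in conds if not any(o < c for o in conds)}
    return table


def skyline(table: Dict[Point, Set[Condition]], preference: Set[Pair]) -> List[Point]:
    """A point stays in the skyline unless the preference contains one of its MDCs."""
    return [p for p, conds in table.items()
            if not any(c <= preference for c in conds)]


# Hypothetical data: two numeric attributes and one nominal attribute.
a = ((1600, 1), ("T",))
c = ((3000, 1), ("H",))
e = ((2400, 3), ("M",))
f = ((3000, 2), ("M",))
g = ((3600, 2), ("M",))  # dominated by f whatever the preference is

table = materialize([a, c, e, f, g])
print(skyline(table, set()))            # skyline with no preference: a, c, e, f
print(skyline(table, {(0, "T", "M")}))  # a user who prefers T to M: a, c
```

A point whose materialized MDC contains the empty condition, such as g here, can never be in the skyline, so only the remaining points need their conditions kept.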
Indexing for Speed-up
─ Use an R-tree index structure.
─ An R-tree can be built on the totally ordered attributes T.
─ To find the points that quasi-dominate p, a range search is conducted on the R-tree.
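The shape of that range search can be illustrated in Python. The sketch does not use an R-tree: a list sorted on the first attribute, searched with `bisect`, stands in for the index, and the class name `SortedIndex` is hypothetical. It assumes smaller values are better, so the points that quasi-dominate p are exactly those inside the box bounded above by p on every totally ordered attribute.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


class SortedIndex:
    """Hypothetical stand-in for an R-tree: points sorted on their first attribute."""

    def __init__(self, points: Sequence[Vec]) -> None:
        self.points: List[Vec] = sorted(points)
        self.keys: List[float] = [pt[0] for pt in self.points]

    def range_search(self, upper: Vec) -> List[Vec]:
        """Return all points pt with pt[i] <= upper[i] on every attribute."""
        end = bisect_right(self.keys, upper[0])  # prune on the first attribute
        return [pt for pt in self.points[:end]
                if all(x <= y for x, y in zip(pt, upper))]


# Hypothetical totally ordered attribute values of five points.
index = SortedIndex([(1600, 1), (2400, 3), (3000, 1), (3000, 2), (3600, 2)])
p = (3000, 2)
hits = index.range_search(p)
print(hits)
```

The result includes p itself, since p lies on the boundary of its own query box; a caller that wants only the other candidates has to filter it out.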
Experiments
Synthetic Data Set
─ Dimension (numeric attributes, nominal attributes)
─ Tuples
─ Template Size
─ Cardinality of Nominal Attributes
─ Zipfian Parameter
Real Data Set
─ Nursery
─ Automobile
Synthetic Data Set: Dimension (numeric attributes)
Numeric 3 3 3 3
Nominal 1 2 3 4
Synthetic Data Set: Dimension (nominal attributes)
Numeric 2 3 4 5
Nominal 1 1 1 1
Synthetic Data Set: Tuples
500k -> 1000k
Synthetic Data Set: Template Size
Synthetic Data Set: Cardinality of Nominal Attributes
Real Data Set
Nursery Data Set
─ There are 12,960 instances and 8 attributes.
─ The performance results are similar to those on the synthetic data sets.
Automobile Data Set
─ Computation times were negligibly small.
─ Car brands: Honda, Mitsubishi and Toyota.
Car brand name    MDC
Honda             Toyota < Honda
Mitsubishi        Honda < Mitsubishi or Toyota < Mitsubishi
Toyota            none
Conclusions
─ MDC is effective in summarizing the favorable facets.
─ The experimental results show that the proposed methods are effective and efficient.
─ Handling dynamic data and ordering is an interesting topic for future work.
Comments
Advantages
─ Addresses the problem of finding favorable facets, which has not been studied before.
─ Effectiveness and efficiency of the mining.
Applications
─ Information retrieval