National Yunlin University of Science and Technology. Exploiting data preparation to enhance mining and knowledge discovery. Advisor: Dr. Hsu. Graduate: Keng-Wei Chang. Authors: Balaji Rajagopalan, Mark W. Isken.
Intelligent Database Systems Lab
Exploiting data preparation to enhance mining and knowledge discovery
Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
Authors: Balaji Rajagopalan, Mark W. Isken
National Yunlin University of Science and Technology
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS-PART C: APPLICATIONS AND REVIEWS, VOL. 31, NO. 4, NOVEMBER 2001
Outline
- Motivation
- Objective
- Introduction
- Data Preparation
- Research Method
- Results
N.Y.U.S.T.
I.M.
Motivation
- Organizations want to use their data for mining and knowledge discovery.
- Organizational data is often not amenable to mining in its natural form.
Objective
- Show that data enhancement, by introducing new attributes along with judicious aggregation of existing attributes, results in higher quality knowledge discovery.
- Examine the differential impact on the performance of different mining algorithms.
Introduction
- The exponential growth of information results in a tremendous volume of data for knowledge workers.
- Knowledge management solution:
  - Knowledge repository
  - Knowledge sharing
  - Knowledge discovery
Data Preparation
- Present a framework based on prior research in knowledge discovery:
  - Data quality
  - Data characteristics
  - Data preparation
Research Method
- A data set from a large tertiary care hospital in the United States was used.
- A few topics:
A. Problem Domain
B. Data
C. Clustering Algorithms for Knowledge Discovery
D. Entropy-Based Metrics for Cluster Quality Assessment
E. Rule Extraction Metrics
Problem Domain
- The problem is the allocation of inpatient beds.
- More difficult: applying quantitative resource allocation to a manageable set of patient types.
  - Quantitative resource: the sequence of hospital units visited and the corresponding length of stay.
  - Patient type: a group of patients consuming a similar level of hospital resources.
Problem Domain
- We refer to this as the patient classification problem.
- Trade-off: too few vs. too many patient types.
- The key is to identify the set of patient types.
Data
- Inpatient obstetrical and gynecological (OB/GYN) patient flow.
- There are numerous fields:
  - Demographics
  - Physician information
  - ICD9-CM diagnostic and procedure codes
  - Diagnosis-related groups (DRGs)
Data
- Almost 500 DRGs are defined.
- DRGs in the range 353-384 are related to OB/GYN.
- These DRGs are grouped into five DRG types.
Clustering Algorithms for Knowledge Discovery
- K-means and Kohonen self-organizing maps
- Similarity: Euclidean distance function

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
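As a minimal sketch of how the Euclidean distance serves as the similarity function, the Python below computes the distance between two records and assigns a record to its nearest centroid, which is the assignment step of K-means. The function names and the two-variable sample points are hypothetical and only illustrate the idea; they are not taken from the study.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (K-means assignment step)."""
    return min(range(len(centroids)), key=lambda k: euclidean(point, centroids[k]))

# Hypothetical centroids and records, for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
print(euclidean((0, 0), (3, 4)))            # 5.0
print(nearest_centroid((2, 1), centroids))  # 0
```

Because the distance sums squared differences over all variables, variables on larger scales dominate unless the data is rescaled first.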
Entropy-Based Metrics for Cluster Quality Assessment
- Let $n_{ij}$ be the number of cases having a DRG type of $i$ in cluster $j$, and let $p_{ij} = n_{ij} / \sum_{l} n_{lj}$.
- Entropy: $E_j = \sum_{i} p_{ij} \log_2 (1 / p_{ij})$
- Weighted entropy: using cluster size, calculate a weighted average entropy measure for a cluster solution.
- Purity: $P_j = \max_{i} p_{ij}$
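The entropy and purity measures can be sketched in Python from a table of counts, where `counts[j][i]` is the number of cases of DRG type i in cluster j. The slide only says that cluster size is used for the weighted average, so weighting each cluster by its share of all cases is an assumption here, and the sample counts are made up for illustration.

```python
import math

def cluster_metrics(counts):
    """Per-cluster entropy and purity, plus a size-weighted average entropy.

    counts[j][i] is n_ij, the number of cases of DRG type i in cluster j.
    """
    entropies, purities, sizes = [], [], []
    for row in counts:
        size = sum(row)
        if size == 0:
            raise ValueError("empty cluster")
        # Types with zero cases contribute nothing to the entropy sum.
        probs = [n / size for n in row if n > 0]
        entropies.append(sum(p * math.log2(1 / p) for p in probs))
        purities.append(max(probs))
        sizes.append(size)
    total = sum(sizes)
    # Assumption: each cluster is weighted by its fraction of all cases.
    weighted = sum(s / total * e for s, e in zip(sizes, entropies))
    return entropies, purities, weighted

# Made-up counts: cluster 0 is pure, cluster 1 is an even mix of two types.
entropies, purities, weighted = cluster_metrics([[8, 0], [5, 5]])
print(entropies)  # [0.0, 1.0]
print(purities)   # [1.0, 0.5]
```

Lower entropy and higher purity both indicate clusters dominated by a single DRG type.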
Rule Extraction Metrics
- We expect most of the extracted rules to show a high degree of resonance with our domain knowledge.
Results
- Detail the data enhancements relevant to this study
A. Data Preparation: Basics
B. Mining and Knowledge Discovery
C. Differential Impact Based on Clustering Method
D. Usefulness of Knowledge Discovered
E. Limitations
F. Implications for Research and Practice
Data Preparation: Basics
- The data set included fields that represent the path and the associated lengths of stay along that path.
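One way to picture such path fields is as an ordered list of (unit, length of stay) pairs per visit. The Python sketch below aggregates a path into a unit sequence, a total length of stay, and a per-unit total. The representation, the function name `summarize_path`, and the unit labels are all hypothetical; the slides do not say how the study stored or aggregated these fields.

```python
from collections import defaultdict

def summarize_path(path):
    """Aggregate one visit's path.

    path: ordered list of (unit, length_of_stay_in_days) pairs.
    """
    los_by_unit = defaultdict(float)
    for unit, los in path:
        los_by_unit[unit] += los  # a unit may be visited more than once
    return {
        "sequence": [unit for unit, _ in path],
        "total_los": sum(los for _, los in path),
        "los_by_unit": dict(los_by_unit),
    }

# Hypothetical visit with placeholder unit labels.
visit = [("unit_A", 0.5), ("unit_B", 2.0), ("unit_A", 1.0)]
summary = summarize_path(visit)
print(summary["total_los"])    # 3.5
print(summary["los_by_unit"])  # {'unit_A': 1.5, 'unit_B': 2.0}
```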
Data Preparation: Basics
- Three data sets are considered in order to illustrate the impact of data preparation.
- ED1: eight numeric variables
Data Preparation: Basics
- ED2
  - Both DRG and CCS were designed to serve as aggregate measures of hospital resource consumption.
  - In addition to the ED1 variables, ED2 adds five nominal variables.
Data Preparation: Basics
- ED3
  - In addition to the ED2 variables, ED3 contains two binary variables:
    - whether or not the patient gave birth during the visit
    - whether or not the patient gave birth via C-section
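The three data sets are nested: ED1 has eight numeric variables, ED2 adds five nominal variables, and ED3 adds two binary variables. A minimal Python sketch of that nesting follows. Only the variable counts come from the slides; every field name and value below is a hypothetical placeholder.

```python
# Field names are hypothetical; only the counts (8, 5, 2) come from the slides.
NUMERIC = [f"num_{k}" for k in range(1, 9)]  # eight numeric variables (ED1)
NOMINAL = [f"nom_{k}" for k in range(1, 6)]  # five nominal variables added in ED2
BINARY = ["gave_birth", "c_section"]         # two binary variables added in ED3

FEATURE_SETS = {
    "ED1": NUMERIC,
    "ED2": NUMERIC + NOMINAL,
    "ED3": NUMERIC + NOMINAL + BINARY,
}

def project(record, data_set):
    """Keep only the fields of `record` that belong to the named data set."""
    return {field: record[field] for field in FEATURE_SETS[data_set]}

# A made-up record with placeholder values, for illustration only.
record = {name: 0.0 for name in NUMERIC}
record.update({name: "unknown" for name in NOMINAL})
record.update({"gave_birth": 1, "c_section": 0})

for name in ("ED1", "ED2", "ED3"):
    print(name, len(project(record, name)))  # 8, 13, and 15 fields
```

Running the same clustering method on each projection is what allows the effect of the added variables to be compared.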
Mining and Knowledge Discovery
Differential Impact Based on Clustering Method
Usefulness of Knowledge Discovered
Limitations
- The findings may not be exactly applicable in every case.
- Only two data mining algorithms were examined: K-means and Kohonen self-organizing maps.
- The study is illustrative, not exhaustive.
- Domain knowledge played a critical role in the data preparation process.
Implications for Research and Practice
- Provides empirical evidence demonstrating the impact of data preparation on mining and knowledge discovery.
- Engage in a comparative investigation of multiple algorithms.
Personal opinion …