GMDH-based feature ranking and selection for improved classification of medical data
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Author: R.E. Abdel-Aal
2005. BI.456-468
Intelligent Database Systems Lab
National Yunlin University of Science and Technology
Outline
Motivation, Objective, Method, Material, Results, Conclusions
Motivation
Accuracy is very important in classifiers used for medical applications.
Objective
Improve classification performance on medical data.
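Method
First stage: feature ranking (GMDH algorithm)
1. Representation: the m inputs (x1, x2, x3, x4 in the figure) are combined in pairs, giving m(m-1)/2 candidate nodes z1, ..., z_{m(m-1)/2} that each estimate the output y.
2. Selection and stopping: each node has a squared error r_1^2, r_2^2, ..., r_{m(m-1)/2}^2; rmin is the smallest squared error at the current iteration.
An increasing rmin means the model is becoming too complex:
1. Overfitting the estimation data
2. Performing poorly on the new selection data.
[Figure: squared error (rmin) versus iteration]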
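The representation and selection steps of the GMDH slide can be sketched for a single layer as follows. This is a minimal textbook-style sketch, not the author's implementation: the two-input quadratic node, the normalized squared-error criterion, the function names (`quad_design`, `gmdh_layer`) and the synthetic data are all assumptions made for illustration.

```python
import itertools
import numpy as np

def quad_design(a, b):
    """Design matrix of a two-input quadratic node (assumed node form)."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

def gmdh_layer(X_est, y_est, X_sel, y_sel, keep):
    """Fit one layer; return (the `keep` best nodes, r_min)."""
    nodes = []
    # Representation: one candidate node per input pair, m(m-1)/2 in total.
    for i, j in itertools.combinations(range(X_est.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(
            quad_design(X_est[:, i], X_est[:, j]), y_est, rcond=None)
        # Selection: score each node on the separate selection data.
        z_sel = quad_design(X_sel[:, i], X_sel[:, j]) @ coef
        r2 = np.sum((y_sel - z_sel) ** 2) / np.sum(y_sel ** 2)
        nodes.append((r2, i, j, coef))
    nodes.sort(key=lambda node: node[0])
    return nodes[:keep], nodes[0][0]

# Hypothetical data: 4 inputs, output depends only on inputs 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] * X[:, 2]
kept, r_min = gmdh_layer(X[:140], y[:140], X[140:], y[140:], keep=3)
print("best input pair:", (kept[0][1], kept[0][2]), "r_min:", r_min)
```

In a full GMDH run the kept node outputs would become the inputs of the next layer, and layers would stop being added once `r_min` starts to increase.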
Method
First stage: feature ranking (AIM abductive network)
1. Representation
2. Selection and stopping: avoid overfitting by using the CPM (complexity penalty multiplier) control
   1. CPM > 1: simpler models that are less accurate but generalize well.
   2. CPM < 1: more complex models that overfit the training data and reduce actual prediction performance.
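The effect of the CPM setting can be illustrated with a small sketch. It assumes the predicted squared error form usually quoted for AIM, PSE = FSE + CPM * (2K/N) * sigma_p^2, where FSE is the fitting squared error, K the number of model coefficients, N the number of training cases and sigma_p^2 a prior estimate of the error variance. That formula is not on the slides, and the two candidate models and their numbers are hypothetical.

```python
def predicted_squared_error(fse, k, n, sigma_p2, cpm=1.0):
    """PSE = FSE + CPM * (2K / N) * sigma_p^2 (assumed form, not from the slides)."""
    return fse + cpm * (2.0 * k / n) * sigma_p2

# Two hypothetical candidate models: the complex one fits the training data better.
candidates = {
    "simple (K=3)": {"fse": 0.050, "k": 3},
    "complex (K=20)": {"fse": 0.030, "k": 20},
}

def pick(cpm, n=100, sigma_p2=0.1):
    """Return the name of the candidate with the lowest PSE for a given CPM."""
    return min(
        candidates,
        key=lambda name: predicted_squared_error(
            candidates[name]["fse"], candidates[name]["k"], n, sigma_p2, cpm),
    )

print(pick(0.2))  # small CPM: the complexity penalty is weak
print(pick(2.0))  # large CPM: the complexity penalty dominates
```

With these numbers a small CPM favours the complex candidate and a large CPM favours the simple one, which matches the trade-off listed on the slide.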
Method
Second stage: feature selection
─ As k increases, performance on an evaluation dataset first improves and then starts to deteriorate because the model overfits the training data.
─ A compact m-feature subset can be obtained by taking the first m features starting from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the selected 6-feature subset is {2,6,7,8,1,5}.
─ The optimum subset of features is determined by repeatedly forming subsets of k features, starting from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the best subset is chosen from among {2,6,7,8,1,5}, {6,7,8,1,5,3}, …
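The second stage can be sketched as a loop over k that takes the top k ranked features and scores them on held-out evaluation data. This is only an illustration under stated assumptions: the nearest-centroid classifier stands in for the abductive network models used in the paper, the data are synthetic, the 70/30 split is assumed to mean 70% training and 30% evaluation, and the helper names (`top_m`, `select_k`) are hypothetical.

```python
import numpy as np

def top_m(ranking, m):
    """First m features from the top of the ranking list."""
    return list(ranking[:m])

def nearest_centroid_accuracy(X_tr, y_tr, X_ev, y_ev):
    """Stand-in classifier (an assumption, not the paper's model)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((X_ev[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] == y_ev))

def select_k(ranking, X_tr, y_tr, X_ev, y_ev):
    """Score the top-k subsets for k = 1..len(ranking); return (best_k, scores)."""
    scores = []
    for k in range(1, len(ranking) + 1):
        cols = [f - 1 for f in top_m(ranking, k)]  # features are numbered from 1
        scores.append(
            nearest_centroid_accuracy(X_tr[:, cols], y_tr, X_ev[:, cols], y_ev))
    # argmax returns the first maximum, so ties go to the smaller subset.
    return int(np.argmax(scores)) + 1, scores

ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]
six = top_m(ranking, 6)
print(six)  # [2, 6, 7, 8, 1, 5]

# Synthetic data: 300 cases, 9 features, only feature 2 carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
X = rng.normal(size=(300, 9))
X[:, 1] += 3.0 * y
best_k, scores = select_k(ranking, X[:210], y[:210], X[210:], y[210:])
print("best k:", best_k)
```

Plotting `scores` against k would give the kind of curve the slide describes, where evaluation performance rises and then falls as more features are added.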
Material
Two standard medical diagnosis datasets from the UCI Machine Learning Repository were used for this study.
─ Wisconsin breast cancer dataset
─ Cleveland heart disease dataset
[Figure: dataset split, 70% / 30%]
Results
The breast cancer data
─ Ranking for the feature set: {2,6,7,8,1,5,3,4,9}
[Figure: features selected and features ranked; labels 7, 5, 9]
Results
Rough set data analysis of the dataset
[Figure: two panels, each annotated "Overfitting" and "3%"]
Results
[Figure: two panels, each annotated "Standard error ↓" and "3%"; "AUC ↑"]
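The Results slides compare models by AUC and its standard error. As a rough sketch of how these two quantities can be computed, the block below uses the pairwise (rank-based) definition of AUC and the Hanley-McNeil approximation for its standard error. The slides do not say which formulas the paper used, so both choices are assumptions, and the scores are made up.

```python
import math

def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))

def auc_standard_error(a, n_pos, n_neg):
    """Hanley-McNeil approximation (assumed; not named on the slides)."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    var = (a * (1.0 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(var)

# Hypothetical classifier outputs for 3 positive and 3 negative cases.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
a = auc(pos, neg)                 # 8 of 9 pairs are ordered correctly
se = auc_standard_error(a, len(pos), len(neg))
print(round(a, 3), round(se, 3))
```

For a fixed AUC, the approximated standard error shrinks as the number of evaluation cases grows.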
Results
The heart disease data
─ Ranking for the feature set: {13,12,9,3,2,10,8,4,5,11,1,7,6}
[Figure: features selected and features ranked]
Results
[Figure: panels annotated "3%" and "6%"; "Overfitting"]
Results
[Figure: two panels, each annotated "AUC ↑"]
Requires fewer than half of the input features.
Models using the reduced feature set will be more efficient.
Conclusions
Improved implementation and performance of classifiers for medical screening and diagnosis.
Feature reduction is particularly useful with high-dimensional data characterized by a large number of features and relatively few training examples.
My opinion
Advantage: Preprocessing
Disadvantage:
Application: Clustering, Association Rule……