Medical Diagnosis via Genetic Programming
Project #2
Artificial Intelligence: Biointelligence
Computational Neuroscience
Connectionist Modeling of Cognitive Processes
© 2007 SNU CSE Biointelligence Lab

Project Purpose
Medical diagnosis: predict whether a breast cancer case is benign or malignant.
Human experts (M.D.) vs. machine (GP).
Data sets: the Wisconsin Diagnostic Breast Cancer (WDBC) data from the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/).
Two text files, one for the training data and one for the test data, can be downloaded from the course web page.
Wisconsin Diagnostic Breast Cancer Data Description
Number of patients: 569 (benign (0): 357, malignant (1): 212); training: 456, test: 113.
Features: 10 attributes × 3 kinds = 30 real-valued features, computed from the digitized images.
1) Attributes: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension
2) Kinds: mean value, standard error (SE), worst or largest value (LV)
| Patient | Mean of attributes | SE of attributes | LV of attributes | Class |
|---|---|---|---|---|
| Patient 1 | 10 real values | 10 real values | 10 real values | 0 or 1 |
| Patient 2 | 10 real values | 10 real values | 10 real values | 0 or 1 |
| … | … | … | … | … |
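The record layout above can be unpacked with a small parser. This is a minimal sketch under assumptions: each line of the course's text files is assumed to hold the 30 feature values followed by the class label, separated by whitespace and/or commas, and features x1..x30 are assumed to follow the table's column order (10 means, then 10 SEs, then 10 worst values). The actual file format may differ, and the names `parse_record`, `feature_name`, and `load` are hypothetical.

```python
# Hypothetical parser for one record laid out as in the table above.
# ASSUMPTION: 30 feature values, then the class label (0 or 1), separated
# by commas and/or whitespace; x1..x10 = means, x11..x20 = SEs, x21..x30 = worst.

ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
KINDS = ["mean", "se", "worst"]


def parse_record(line):
    """Return (features, label), where features is a list of 30 floats."""
    tokens = line.replace(",", " ").split()
    if len(tokens) != 31:
        raise ValueError(f"expected 31 values, got {len(tokens)}")
    features = [float(t) for t in tokens[:30]]
    label = int(float(tokens[30]))  # 0 = benign, 1 = malignant
    if label not in (0, 1):
        raise ValueError(f"unexpected class label {label}")
    return features, label


def feature_name(index):
    """Map a 0-based feature index (x1..x30 -> 0..29) to a readable name."""
    return f"{KINDS[index // 10]} {ATTRIBUTES[index % 10]}"


def load(path):
    """Load every non-empty line of a data file into (X, y)."""
    X, y = [], []
    with open(path) as f:
        for line in f:
            if line.strip():
                features, label = parse_record(line)
                X.append(features)
                y.append(label)
    return X, y
```

For example, `feature_name(0)` gives `"mean radius"` and `feature_name(29)` gives `"worst fractal dimension"`, which helps when reading which inputs the best GP tree actually uses.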
Evolving a Classifier
GP settings
- Functions: numerical operators {+, -, *, /, exp, log, sin, cos, sqrt, …}. Some operators must be protected against illegal operations (e.g., division by zero).
- Terminals: input features and constants {x1, x2, …, x30, R}, where R ∈ [a, b].
- Additional parameters: the threshold value for the decision, crossover and mutation rates, population size, and the maximum number of generations.
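The settings above can be illustrated with protected operators, tree evaluation, and a threshold decision. This is a minimal sketch, not a prescribed implementation: the fallback values (1.0 for division by zero, log|a|, sqrt|a|, clamped exp), the nested-tuple tree representation, and the default threshold of 0.0 are all assumptions, and `evaluate` and `classify` are hypothetical names.

```python
import math

# Protected primitives. The fallback values are common GP conventions,
# ASSUMED here; the project only says such operators must be protected.

def pdiv(a, b):
    """Protected division: returns 1.0 when |b| is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

def plog(a):
    """Protected log: log|a|, or 0.0 when a == 0."""
    return math.log(abs(a)) if a != 0 else 0.0

def psqrt(a):
    """Protected square root: sqrt|a|."""
    return math.sqrt(abs(a))

def pexp(a):
    """Protected exp: clamps the argument so the result cannot overflow."""
    return math.exp(min(a, 50.0))

def psin(a):
    """math.sin raises ValueError on infinities; return 0.0 instead."""
    return math.sin(a) if math.isfinite(a) else 0.0

def pcos(a):
    """math.cos raises ValueError on infinities; return 0.0 instead."""
    return math.cos(a) if math.isfinite(a) else 0.0

# Function set: name -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, pdiv),
    "exp": (1, pexp),
    "log": (1, plog),
    "sin": (1, psin),
    "cos": (1, pcos),
    "sqrt": (1, psqrt),
}

def evaluate(tree, x):
    """Evaluate a tree on one patient's 30-value feature list x.

    A tree is a float constant (R), a string "x1".."x30" (1-based, as on
    the slide), or a tuple (op, child, ...) whose op is a key of FUNCTIONS.
    """
    if isinstance(tree, (int, float)):
        return float(tree)
    if isinstance(tree, str):
        return x[int(tree[1:]) - 1]
    op, *children = tree
    arity, fn = FUNCTIONS[op]
    if len(children) != arity:
        raise ValueError(f"{op} expects {arity} argument(s)")
    return fn(*(evaluate(c, x) for c in children))

def classify(tree, x, threshold=0.0):
    """Predict 1 (malignant) if the output exceeds the threshold, else 0.

    A NaN output compares False, so it is classified as 0.
    """
    return 1 if evaluate(tree, x) > threshold else 0
```

For instance, the tree `("-", "x1", 15.0)` predicts malignant exactly when the mean radius exceeds 15. The threshold is one of the additional parameters listed above, so it is worth tuning alongside the rates and population size.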
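Fitness Function
- Maximization problem: classification accuracy, computed from the confusion matrix for the training data.
- Minimization problem: classification error, based on the number of incorrectly classified patients, q + r.

Confusion matrix (rows: predicted class; columns: true class):

| Predicted \ True | Positive | Negative |
|---|---|---|
| Positive | p | q |
| Negative | r | s |

$$\text{Accuracy} = \frac{p + s}{p + q + r + s}, \qquad \text{Error} = \frac{q + r}{p + q + r + s}$$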
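The two fitness measures above follow directly from the confusion-matrix counts. A minimal sketch, assuming class 1 (malignant) is the "positive" class and that predictions come as a list of 0/1 labels; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (p, q, r, s) laid out as in the table above.

    ASSUMPTION: class 1 (malignant) is treated as "positive".
    """
    p = q = r = s = 0
    for t, y in zip(y_true, y_pred):
        if y == positive and t == positive:
            p += 1  # predicted positive, truly positive
        elif y == positive:
            q += 1  # predicted positive, truly negative
        elif t == positive:
            r += 1  # predicted negative, truly positive
        else:
            s += 1  # predicted negative, truly negative
    return p, q, r, s


def accuracy(y_true, y_pred):
    """(p + s) / (p + q + r + s): fitness to maximize."""
    p, q, r, s = confusion_matrix(y_true, y_pred)
    return (p + s) / (p + q + r + s)


def error(y_true, y_pred):
    """(q + r) / (p + q + r + s): fitness to minimize."""
    p, q, r, s = confusion_matrix(y_true, y_pred)
    return (q + r) / (p + q + r + s)
```

Both functions raise `ZeroDivisionError` on empty input, which is acceptable for a sketch since the training set always has 456 patients.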
Bloat
Bloat = "survival of the fattest", i.e., the sizes of the trees in the population increase over time.
Many studies have been devoted to understanding why bloat occurs.
To reduce tree growth, we need countermeasures, e.g.:
- Prohibiting variation operators that would produce "too big" children → discard oversized children and perform crossover again.
- Parsimony pressure: a penalty for being oversized.
#N: number of nodes, #D: depth, α: constant

$$\text{Fitness} = \text{OriginalFitness} + \text{Penalty}(\#N, \#D) = \text{Error} + \alpha \, C(\#N, \#D)$$
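Both countermeasures can be sketched on nested-tuple trees. This is a minimal sketch under assumptions: the slides leave C(#N, #D) unspecified, so it defaults here to #N alone; the depth limit of 8, the retry cap of 10, the choice to return the first parent when every retry fails, and all function names are hypothetical choices.

```python
import random

# Trees: a terminal (str feature name or float constant) or a tuple
# (op, child, ...), matching a simple nested-tuple representation.

def size(tree):
    """#N: number of nodes."""
    if isinstance(tree, tuple):
        return 1 + sum(size(c) for c in tree[1:])
    return 1

def depth(tree):
    """#D: depth, counting a lone terminal as depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(c) for c in tree[1:])
    return 0

def penalized_fitness(err, tree, alpha=0.001, complexity=lambda n, d: n):
    """Error + alpha * C(#N, #D). ASSUMPTION: C defaults to #N."""
    return err + alpha * complexity(size(tree), depth(tree))

def subtree_paths(tree, path=()):
    """Yield the index path of every node (root path is ())."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random spot in a."""
    pa = rng.choice(list(subtree_paths(a)))
    pb = rng.choice(list(subtree_paths(b)))
    return replace_subtree(a, pa, get_subtree(b, pb))

def limited_crossover(a, b, rng, max_depth=8, max_tries=10):
    """Discard children deeper than max_depth and perform crossover again."""
    for _ in range(max_tries):
        child = crossover(a, b, rng)
        if depth(child) <= max_depth:
            return child
    return a  # give up after max_tries: keep the first parent unchanged
```

Raising α strengthens the parsimony pressure, which is the "effects of the penalty term" experiment listed on the next slide.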
Experiments
One problem: WDBC diagnosis.
Experimental setup
- Termination condition: maximum_generation. A GP run stops when the number of generations reaches a given limit.
- Various settings:
  - Effects of the penalty term: adjusting α
  - Different function sets: different models (e.g., polynomial vs. complex functions)
  - Selection methods and their parameters
  - Crossover and mutation rates
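The experimental knobs above (termination at maximum_generation, selection method, crossover and mutation rates) fit into one generational loop. This is a minimal sketch, not the required design: tournament selection, elitism, and the default parameter values are assumptions, and `run_gp` is a hypothetical name. The representation-specific parts (`init`, `fitness`, `crossover`, `mutate`) are passed in, so the same loop works with any tree encoding.

```python
import random

def run_gp(init, fitness, crossover, mutate, pop_size=100,
           max_generation=50, p_crossover=0.9, p_mutation=0.1,
           tournament_size=3, seed=None):
    """Minimal generational loop that minimizes fitness (e.g. error).

    ASSUMPTIONS: tournament selection and elitism of one individual.
    Returns (best_individual, curve), where curve[g] is the best fitness
    seen up to generation g (usable as a learning curve).
    """
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best, best_fit = None, float("inf")
    curve = []

    def tournament(scored):
        # Pick tournament_size individuals at random; the fittest wins.
        contestants = rng.sample(scored, tournament_size)
        return min(contestants, key=lambda pair: pair[1])[0]

    # Termination condition: stop after max_generation generations.
    for _generation in range(max_generation):
        scored = [(ind, fitness(ind)) for ind in population]
        for ind, fit in scored:
            if fit < best_fit:
                best, best_fit = ind, fit
        curve.append(best_fit)

        next_population = [best]  # elitism: carry over the best so far
        while len(next_population) < pop_size:
            child = tournament(scored)
            if rng.random() < p_crossover:
                child = crossover(child, tournament(scored), rng)
            if rng.random() < p_mutation:
                child = mutate(child, rng)
            next_population.append(child)
        population = next_population
    return best, curve
```

Running it at least 10 times per setting with different seeds gives the per-run numbers needed for the result table.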
Results
For each problem:
- Show the result table and write your own analysis. Use at least 10 runs for each setting.
- Present the optimal classifier (the best GP tree), and write the confusion matrix for the test data using the best tree.
- Draw learning curves for your experiments.
- Compare with the results of neural networks (optional).
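| Setting | Training Average | Training SD | Training Best | Training Worst | Test Average | Test SD | Test Best | Test Worst |
|---|---|---|---|---|---|---|---|---|
| Setting 1 | | | | | | | | |
| Setting 2 | | | | | | | | |
| Setting 3 | | | | | | | | |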
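Each row of the result table summarizes the runs of one setting. A minimal sketch, assuming the per-run values are error rates (so Best is the minimum and Worst the maximum; swap them if you report accuracy) and that SD means the sample standard deviation; `summarize` and `table_row` are hypothetical names.

```python
import statistics

def summarize(errors):
    """Average, SD, Best and Worst of per-run error rates (lower is better)."""
    if len(errors) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return {
        "Average": statistics.mean(errors),
        "SD": statistics.stdev(errors),  # sample SD; use pstdev for population SD
        "Best": min(errors),
        "Worst": max(errors),
    }

def table_row(name, train_errors, test_errors):
    """Format one row of the result table (training columns, then test)."""
    cells = [name]
    for errors in (train_errors, test_errors):
        s = summarize(errors)
        cells += [f"{s[k]:.4f}" for k in ("Average", "SD", "Best", "Worst")]
    return "| " + " | ".join(cells) + " |"
```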
[Figure: example learning curve, fitness (error) vs. generation]
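To draw a learning curve like the one above from several runs, the per-generation values can first be aggregated and exported for any plotting tool. A minimal sketch, assuming each run records its best-so-far error once per generation; `curve_table` and `to_csv` are hypothetical names, and the column names are arbitrary.

```python
import csv
import io

def curve_table(curves):
    """Per-generation mean/best/worst of best-so-far error across runs."""
    n = min(len(c) for c in curves)  # truncate to the shortest run
    rows = []
    for g in range(n):
        values = [c[g] for c in curves]
        rows.append((g, sum(values) / len(values), min(values), max(values)))
    return rows

def to_csv(rows):
    """Render the rows as CSV text (header plus one line per generation)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["generation", "mean_error", "best_error", "worst_error"])
    writer.writerows(rows)
    return buf.getvalue()
```

Plotting the mean with the best/worst band shows both the typical convergence and the run-to-run spread for one setting.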
References
- Source code: GP libraries (C, C++, Java, …), MATLAB toolbox
- Web sites: http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html, http://cs.gmu.edu/~eclab/projects/ecj/, http://www.geneticprogramming.com/GPpages/software.html, …
Pay Attention!
Due: May 17, 2007
Submission:
- Source code and executable file(s), with proper comments in the source code, via e-mail
- Report: hardcopy!! (submit to 301-419), including:
  - the running environment and the libraries (or packages) you used
  - results for many experiments with various parameter settings
  - analysis and explanation of the results in your own words