
Upload: jessie-rich

Post on 26-Dec-2015


Medical Diagnosis via Genetic Programming

Project #2

Artificial Intelligence: Biointelligence

Computational Neuroscience

Connectionist Modeling of Cognitive Processes


© 2007 SNU CSE Biointelligence Lab

Project Purpose

Medical Diagnosis
  To predict whether a breast cancer case is benign or malignant
  Human experts (M.D.) vs. machine (GP)

Data Sets
  Taken from the Wisconsin Diagnostic Breast Cancer (WDBC) data in the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/)
  Two text files, for training and test data respectively
  You can download them from the course web page.


Wisconsin Diagnostic Breast Cancer

Data Description
  Number of patients: 569
    Benign (0): 357, Malignant (1): 212
    Training: 456, Test: 113
  Features: 10 attributes × 3 kinds = 30 features
    Real-valued features are computed from the digitized images.
    1) Attributes: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension
    2) Kinds: mean value, standard error (SE), worst or largest value (LV)

             Mean of attributes   SE of attributes   LV of attributes   Class
Patient 1    10 real values       10 real values     10 real values     0 or 1
Patient 2    10 real values       10 real values     10 real values     0 or 1
…            …                    …                  …                  …
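
The 30 input columns come from crossing the 10 attributes with the 3 kinds. A minimal Python sketch of that layout follows. It assumes the columns are grouped kind by kind (mean, then SE, then worst), as the table suggests; the names `feature_names` and `terminal_to_name` and the x1..x30 numbering are illustrative, so check them against the actual data files.

```python
ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
KINDS = ["mean", "standard error", "worst"]


def feature_names():
    """Return 30 feature names, grouped kind by kind (assumed column order)."""
    return [f"{kind} {attr}" for kind in KINDS for attr in ATTRIBUTES]


names = feature_names()

# Map GP terminal labels x1..x30 to readable names (hypothetical numbering).
terminal_to_name = {f"x{i + 1}": name for i, name in enumerate(names)}
```

Such a mapping makes an evolved tree easier to read in the report, since a terminal like x1 can be printed as the feature it stands for.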


Evolving a Classifier

GP settings
  Functions
    Numerical operators {+, -, *, /, exp, log, sin, cos, sqrt, …}
    Some operators must be protected against illegal operations (e.g., division by zero or the logarithm of a non-positive number).
  Terminals
    Input features and constants {x1, x2, …, x30, R}, where R ∈ [a, b]
  Additional parameters
    Threshold value for the decision
    Crossover and mutation rates
    Population size and the maximum number of generations
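
One possible way to protect the operators and turn a tree output into a decision is sketched below in Python. The fallback values (1.0 for division by zero, 0.0 for log of zero), the clamp on exp, the nested-tuple tree encoding, and all function names are assumptions made for illustration, not requirements of the project.

```python
import math


def protected_div(a, b):
    """Division that returns 1.0 when the denominator is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0


def protected_log(a):
    """log|a|, or 0.0 when a is (near) zero."""
    return math.log(abs(a)) if abs(a) > 1e-9 else 0.0


def protected_sqrt(a):
    """Square root of |a|, so negative inputs are legal."""
    return math.sqrt(abs(a))


def protected_exp(a):
    """exp with a clamped argument to avoid OverflowError."""
    return math.exp(min(a, 50.0))


def protected_sin(a):
    """sin that returns 0.0 for infinite or NaN input."""
    return math.sin(a) if math.isfinite(a) else 0.0


def protected_cos(a):
    """cos that returns 0.0 for infinite or NaN input."""
    return math.cos(a) if math.isfinite(a) else 0.0


FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
    "exp": protected_exp,
    "log": protected_log,
    "sqrt": protected_sqrt,
    "sin": protected_sin,
    "cos": protected_cos,
}


def evaluate(tree, x):
    """Evaluate a tree given as nested tuples, e.g. ("+", "x1", 2.0).

    Terminals are either a numeric constant or a string "x1".."x30"
    that indexes the feature vector x (1-based).
    """
    if isinstance(tree, tuple):
        op, *args = tree
        return FUNCTIONS[op](*(evaluate(arg, x) for arg in args))
    if isinstance(tree, str):
        return x[int(tree[1:]) - 1]
    return float(tree)


def classify(tree, x, threshold=0.0):
    """Predict 1 (malignant) if the tree output exceeds the threshold, else 0."""
    return 1 if evaluate(tree, x) > threshold else 0
```

For example, `classify(("+", "x1", -2.0), x)` predicts 1 whenever the first feature is larger than 2. The threshold is one of the parameters to tune.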


Fitness Function

Confusion matrix for the training data

                        True
                        Positive   Negative
  Predict   Positive    p          q
            Negative    r          s

Maximization problem
  Classification accuracy

  \[ \text{Accuracy} = \frac{p + s}{p + q + r + s} \]

Minimization problem
  Classification error
  Number of incorrectly classified patients: q + r

  \[ \text{Error} = \frac{q + r}{p + q + r + s} \]
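
The two fitness definitions can be computed directly from the counts p, q, r, s. The Python sketch below assumes labels are coded 1 for positive (malignant) and 0 for negative (benign); the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred):
    """Return (p, q, r, s) using the layout of the confusion matrix.

    p: predicted positive, truly positive
    q: predicted positive, truly negative
    r: predicted negative, truly positive
    s: predicted negative, truly negative
    """
    p = q = r = s = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            p += 1
        elif pred == 1 and truth == 0:
            q += 1
        elif pred == 0 and truth == 1:
            r += 1
        else:
            s += 1
    return p, q, r, s


def accuracy(p, q, r, s):
    """Fraction of correctly classified patients (to be maximized)."""
    return (p + s) / (p + q + r + s)


def error(p, q, r, s):
    """Fraction of incorrectly classified patients (to be minimized)."""
    return (q + r) / (p + q + r + s)
```

Since accuracy and error always sum to 1, the maximization and minimization formulations rank classifiers identically.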


Bloat

Bloat = “survival of the fattest”, i.e., the tree sizes in the population increase over time.
There are many studies devoted to understanding why bloat occurs.
To reduce tree growth, we need countermeasures, e.g.:
  Prohibiting variation operators that would deliver “too big” children
    → discard big children and perform crossover again
  Parsimony pressure: penalty for being oversized

  \[ \text{Fitness} = \text{OriginalFitness} + \text{Penalty}(\#N, \#D) = \text{Error} + \alpha \cdot C(\#N, \#D) \]

  where \#N: number of nodes, \#D: depth, \alpha: constant
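
Both countermeasures can be sketched in a few lines of Python. The sketch assumes trees are nested tuples such as ("+", ("*", "x1", "x2"), 3.0), that a single terminal has depth 0, and that the penalty is simply the penalty weight `alpha` times the node count. The slides leave the exact form of the penalty open, so this is only one possible choice, and the function names are hypothetical.

```python
def count_nodes(tree):
    """Number of nodes (#N): function nodes plus terminals."""
    if isinstance(tree, tuple):
        return 1 + sum(count_nodes(child) for child in tree[1:])
    return 1


def depth(tree):
    """Depth (#D), with a single terminal counted as depth 0 (a convention)."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 0


def too_big(tree, max_nodes, max_depth):
    """First countermeasure: flag children that should be discarded."""
    return count_nodes(tree) > max_nodes or depth(tree) > max_depth


def penalized_fitness(error, tree, alpha):
    """Second countermeasure: error plus a size penalty (assumed alpha * #N)."""
    return error + alpha * count_nodes(tree)
```

With `alpha = 0` the penalty vanishes and fitness reduces to the plain classification error, which gives a baseline when studying the effect of the penalty term.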


Experiments

One problem
  WDBC Diagnostics

Various experimental setups
  Termination condition: maximum_generation
    A GP run is stopped when the number of generations reaches a given limit.
  Various settings
    Effects of the penalty term: adjusting α
    Different function sets: different models (e.g., polynomial vs. complex functions)
    Selection methods and their parameters
    Crossover and mutation rates
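
The termination condition and the tunable rates fit into a generational loop like the Python sketch below. Tournament selection, one-individual elitism, the default rates, and every function name here are assumptions chosen for illustration. The `crossover` and `mutate` callables are placeholders to be supplied by your own GP code or library.

```python
import random


def tournament(population, fitnesses, k, rng):
    """Return the best (lowest-fitness) of k randomly drawn individuals."""
    contestants = rng.sample(range(len(population)), k)
    winner = min(contestants, key=lambda i: fitnesses[i])
    return population[winner]


def run_gp(population, fitness_fn, crossover, mutate, max_generation,
           crossover_rate=0.9, mutation_rate=0.1, tournament_size=3, seed=0):
    """Minimize fitness_fn; stop when max_generation is reached.

    Returns the best individual of the final population and the
    best fitness recorded at each generation (the learning curve).
    """
    rng = random.Random(seed)
    history = []
    generation = 0
    while True:
        fitnesses = [fitness_fn(ind) for ind in population]
        best_i = min(range(len(population)), key=lambda i: fitnesses[i])
        history.append(fitnesses[best_i])
        if generation >= max_generation:  # termination condition
            return population[best_i], history
        # Elitism (an assumption): copy the current best unchanged.
        next_population = [population[best_i]]
        while len(next_population) < len(population):
            child = tournament(population, fitnesses, tournament_size, rng)
            if rng.random() < crossover_rate:
                mate = tournament(population, fitnesses, tournament_size, rng)
                child = crossover(child, mate, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            next_population.append(child)
        population = next_population
        generation += 1
```

Fixing `seed` per run and varying one argument at a time (for example `tournament_size` or `crossover_rate`) keeps the settings comparable across experiments.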


Results

For each problem
  Show the result table and write your own analysis.
    At least 10 runs for one setting
  Present the optimal classifier (the best GP tree).
    Write the confusion matrix for the test data by using the best tree.
  Draw learning curves of your experiments.
  Compare with the results of neural networks (optional).

             Training                        Test
             Average ± SD   Best   Worst     Average ± SD   Best   Worst
Setting 1
Setting 2
Setting 3
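
One row of the result table can be filled from the per-run results as in the Python sketch below. It assumes the recorded value is an error rate, so lower is better, and uses the sample standard deviation. If you report accuracy instead, swap `min` and `max`. The function name `summarize` is illustrative.

```python
import statistics


def summarize(errors):
    """Summarize the final errors of several runs of one setting."""
    return {
        "average": statistics.mean(errors),
        "sd": statistics.stdev(errors),  # sample SD; needs at least 2 runs
        "best": min(errors),             # lowest error
        "worst": max(errors),            # highest error
    }
```

Call it once on the training errors and once on the test errors of the (at least 10) runs of each setting.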


[Figure: example plot of Fitness (Error) versus Generation]


References

Source Codes
  GP libraries (C, C++, JAVA, …)
  MATLAB Tool box

Web sites
  http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
  http://cs.gmu.edu/~eclab/projects/ecj/
  http://www.geneticprogramming.com/GPpages/software.html
  …


Pay Attention!

Due: May 17, 2007

Submission
  Source code and executable file(s)
    Proper comments in the source code
    Via e-mail ([email protected])
  Report: Hardcopy!! (Submit to 301-419)
    Running environments and the libraries (or packages) you used
    Results of many experiments with various parameter settings
    Analysis and explanation of the results in your own way