![Page 1: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/1.jpg)
Selecting Informative Genes with Parallel Genetic Algorithms
Deodatta Bhoite
Prashant Jain
![Page 2: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/2.jpg)
Terminology
Genes
DNA, mRNA
Gene expression
Microarrays
![Page 3: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/3.jpg)
Microarray output
![Page 4: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/4.jpg)
Gene Selection
A large number of irrelevant genes introduces “biological noise”
Analysis of results can be simplified by selecting only relevant genes for study
Two categories of gene selection
– Filter approach selection
– Wrapper approach selection
![Page 5: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/5.jpg)
Gene Selection
![Page 6: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/6.jpg)
Classifier
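What is a classifier used for?
Mapping of label pairs <x_i, l_i> to {0, 1, ?}
Golub-Slonim classifier
$$\mathrm{class}(x) = \mathrm{sign}\left\{ \sum_{g \,\in\, \text{classifier}} \left[ \frac{\mu_g^1 - \mu_g^2}{\sigma_g^1 + \sigma_g^2} \right] \left[ x_g - \frac{\mu_g^1 + \mu_g^2}{2} \right] \right\}$$
Here \mu_g^1, \mu_g^2 and \sigma_g^1, \sigma_g^2 are the means and standard deviations of the expression of gene g in classes 1 and 2 of the training data, and x_g is the expression of gene g in the sample x.
Positive value = class 1, negative value = class 2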
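The weighted vote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_gene_stats` and `classify` are hypothetical, the population standard deviation is used (the slides do not say which estimator applies), and a vote of exactly zero is returned as `None` to stand in for the "?" outcome.

```python
from statistics import mean, pstdev

def fit_gene_stats(samples, labels):
    """Return per-gene (mu1, mu2, sigma1, sigma2) from training data.

    samples: list of expression vectors; labels: 1 or 2 for each sample.
    """
    stats = []
    for g in range(len(samples[0])):
        c1 = [s[g] for s, l in zip(samples, labels) if l == 1]
        c2 = [s[g] for s, l in zip(samples, labels) if l == 2]
        # Assumption: population standard deviation (pstdev).
        stats.append((mean(c1), mean(c2), pstdev(c1), pstdev(c2)))
    return stats

def classify(x, stats, genes):
    """Sum the weighted votes of the chosen genes and return 1, 2, or None."""
    total = 0.0
    for g in genes:
        mu1, mu2, s1, s2 = stats[g]
        if s1 + s2 == 0:
            continue  # assumption: skip genes with no spread in either class
        weight = (mu1 - mu2) / (s1 + s2)
        total += weight * (x[g] - (mu1 + mu2) / 2)
    if total > 0:
        return 1
    if total < 0:
        return 2
    return None  # undecided, the "?" case

# Toy data: gene 0 separates the classes, gene 1 does not.
train = [[1.0, 5.0], [2.0, 5.5], [8.0, 5.2], [9.0, 5.1]]
labels = [1, 1, 2, 2]
stats = fit_gene_stats(train, labels)
print(classify([1.0, 5.0], stats, genes=[0]))
print(classify([9.0, 5.0], stats, genes=[0]))
```

Only the genes listed in `genes` vote, which is what lets a gene-selection method plug its chosen subset into the classifier.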
![Page 7: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/7.jpg)
Ranking based gene selection methods
GS-correlation
Genes with most positive and negative correlation values are selected.
Tends to not select genes for which class values have large standard deviations with respect to training data (some of them may be most relevant and informative).
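As a rough sketch of ranking by GS-correlation, assuming the usual form P(g) = (μ1 − μ2) / (σ1 + σ2) computed per gene on the training data: the helper names below are hypothetical, the population standard deviation is an assumption, and genes with zero spread are given a score of 0.

```python
from statistics import mean, pstdev

def gs_correlation(values, labels):
    """Score one gene: (mu1 - mu2) / (sigma1 + sigma2)."""
    c1 = [v for v, l in zip(values, labels) if l == 1]
    c2 = [v for v, l in zip(values, labels) if l == 2]
    denom = pstdev(c1) + pstdev(c2)
    if denom == 0:
        return 0.0  # assumption: treat zero-spread genes as uninformative
    return (mean(c1) - mean(c2)) / denom

def select_top_genes(samples, labels, k):
    """Return (k most positive, k most negative) gene indices."""
    n_genes = len(samples[0])
    scored = sorted(
        (gs_correlation([s[g] for s in samples], labels), g)
        for g in range(n_genes)
    )
    negative = [g for _, g in scored[:k]]
    positive = [g for _, g in scored[-k:]]
    return positive, negative

# Gene 2 has a clear mean difference but a large spread, so it scores low.
samples = [[9, 1, 5], [8, 2, 1], [1, 9, 9], [2, 8, 4]]
labels = [1, 1, 2, 2]
print(select_top_genes(samples, labels, k=1))
```

In the toy data, gene 2 is passed over because its within-class standard deviations are large, which illustrates the limitation noted above.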
![Page 8: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/8.jpg)
Ranking with disorder
This method doesn’t use the actual expression levels.
Ng_I represents the set of indices that belong to class I and h(x) is the indicator function.
![Page 9: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/9.jpg)
Need for subset ranking
Individual ranking may not always result in selection of informative genes.
They ignore the relationships between genes by solely relying on individual scores.
Thus we need to explore subsets of genes to find the optimal subset for classification.
![Page 10: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/10.jpg)
Genetic Algorithm
What is a genetic algorithm?
– “Genetic Algorithms are defined as global optimization procedures that use an analogy of genetic evolution of biological organisms.”
– Genetic algorithms search for the best solution to a problem by following an evolutionary process.
![Page 11: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/11.jpg)
Basic Genetic Algorithm
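A minimal sketch of a generational genetic algorithm on bit strings, for illustration only. The function `run_ga`, the binary tournament selection, and the small default population are assumptions made for this sketch and are not taken from the slides; the default crossover and mutation probabilities simply echo the values reported later in the deck.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=30,
           p_crossover=0.6, p_mutation=0.001, seed=0):
    """Evolve bit strings of length n_bits (n_bits >= 2) to maximize fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best individual over
        while len(next_pop) < pop_size:
            # Assumption: binary tournament selection for both parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            child = a[:]
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = a[:cut] + b[cut:]
            # Bit-flip mutation, applied independently to each bit.
            child = [bit ^ 1 if rng.random() < p_mutation else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), best_history

# Toy problem: maximize the number of 1 bits.
best, history = run_ga(sum, n_bits=16)
print(best, history[-1])
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases from one generation to the next.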
![Page 12: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/12.jpg)
Parallel Genetic Algorithm
For large population sizes, running a G.A. on a single processor becomes computationally infeasible.
Hence the use of Parallel Genetic Algorithms.
![Page 13: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/13.jpg)
Parallel Genetic Algorithm
![Page 14: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/14.jpg)
Model and Encoding
Island Model: Each processor runs a G.A. on a subset of the population, with periodic migration of individuals between processors.
Fixed-Length Binary String Encoding: Each bit corresponds to one gene; its value is 1 if the gene is included in the subset, else 0.
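The encoding and a single migration step might look like the sketch below. The ring topology and the best-replaces-worst policy are assumptions chosen for illustration, since the slides state only that migration is periodic. The helper names `decode` and `migrate` are hypothetical, and the islands are plain lists here rather than separate processors.

```python
def decode(bits):
    """Map a fixed-length bit string to the indices of the selected genes."""
    return [i for i, b in enumerate(bits) if b == 1]

def migrate(islands, fitness):
    """Ring migration: the best of island i-1 replaces the worst of island i."""
    # Copy each island's best before any replacement takes place.
    bests = [max(pop, key=fitness)[:] for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = incoming

print(decode([1, 0, 1, 1]))

islands = [
    [[1, 1, 0], [0, 0, 0]],  # subpopulation held by processor 0
    [[0, 1, 0], [0, 0, 1]],  # subpopulation held by processor 1
]
migrate(islands, fitness=sum)
print(islands)
```

In a real parallel run, each island would evolve independently between migration steps and exchange individuals through message passing; that part is omitted here.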
![Page 15: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/15.jpg)
Fitness Evaluation
Two Different Criteria
– Classification Accuracy
– Size of the subset
fitness(x) = w1 * accuracy(x) + w2 * (1 – dimensionality(x))
Here,
– accuracy(x) = test accuracy of the classifier built with the gene subset represented by x
– dimensionality(x) ∈ [0,1] = the dimension of the subset
![Page 16: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/16.jpg)
Fitness Evaluation
– w1 = weight assigned to accuracy
– w2 = weight assigned to dimensionality
A subset with high classification accuracy and low dimension has high fitness.
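The fitness formula can be written out directly. In this sketch, `dimensionality` is taken to be the fraction of genes selected, which is an assumption (the slides say only that it lies in [0,1]). The weights 0.75 and 0.25 are hypothetical values chosen for the example, and the accuracy is passed in as a number rather than computed by a classifier.

```python
def dimensionality(bits):
    """Assumed definition: fraction of genes included in the subset."""
    return sum(bits) / len(bits)

def fitness(bits, accuracy, w1=0.75, w2=0.25):
    """fitness(x) = w1 * accuracy(x) + w2 * (1 - dimensionality(x))."""
    return w1 * accuracy + w2 * (1 - dimensionality(bits))

# Two subsets with the same accuracy: the smaller one scores higher.
small = [1, 0, 0, 0]
large = [1, 1, 1, 0]
print(fitness(small, accuracy=0.9))
print(fitness(large, accuracy=0.9))
```

With equal accuracy, the second term alone decides the ranking, so the weights control how much accuracy the search will trade for a more compact subset.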
![Page 17: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/17.jpg)
Data Sets Used
![Page 18: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/18.jpg)
Test Parameters
The tests were run on two processors.
The parameters of the G.A. in each processor were set as:
– Population Size: 1000
– Trials: 400000
– Crossover probability: 0.6
– Mutation probability: 0.001
![Page 19: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/19.jpg)
Test Parameters
– Selection Strategy: Elitist
– Migration Probability: 0.002
The crossover probability is set to a moderate level so that offspring differ from their parents while keeping the parents' good traits.
The mutation probability is kept low to avoid making the selection random.
The selection strategy is elitist, which ensures that the best individuals are kept and hence leads to more accurate subsets of genes.
![Page 20: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/20.jpg)
Results
![Page 21: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/21.jpg)
Results
Leukemia Data Set
– Subset with 29 genes found
– Classifies 36/38 training instances correctly
– Classifies 30/34 test instances correctly
Colon Data Set
– Subset with 30 genes found
– 92% accuracy on the training data set
![Page 22: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/22.jpg)
Results Comparison
The results are better than those of other algorithms such as G-S and NB, which have accuracies below 90% with gene numbers varying from 10 to 500.
![Page 23: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/23.jpg)
Average Performance Graphs
![Page 24: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/24.jpg)
Conclusion
The method does well at finding smaller gene subsets with better accuracies.
The fitness function needs to be more sophisticated than the simple one used now to ensure a compact final subset every time.
![Page 25: Selecting Informative Genes with Parallel Genetic Algorithms](https://reader036.vdocuments.us/reader036/viewer/2022081604/56816835550346895dddecef/html5/thumbnails/25.jpg)
Questions
Thank You.