ckannas phd thesis slides
TRANSCRIPT
Scientific Workflow Systems and Multi-Objective Evolutionary Algorithms for Life Sciences Informatics
Christos C. Kannas
Computer Science, University of Cyprus
6th June 2017
Table of Contents
1 Introduction
    Scientific Workflow Management Systems
    Self-Adaptive Multi-Objective Evolutionary Algorithms
    Virtual Screening & De Novo Molecular Design
2 Life Sciences Informatics platform
    About Life Sciences Informatics platform
    LiSIs Showcase
    LiSIs Showcase Discussion
3 Self-Adaptive Multi-Objective Evolutionary Algorithm
    About Self-Adaptive MOEA
    Self-Adaptive MOEA Showcases
    Self-Adaptive MOEA Showcases Discussion
4 Concluding Remarks
    Concluding Remarks - LiSIs platform
    Concluding Remarks - Self-Adaptive MOEA
5 Future Work
    Future Work - LiSIs platform
    Future Work - Self-Adaptive MOEA
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 1 / 130
Introduction
Scientific Workflow Management Systems
SWMSs Application Domains
Self-Adaptive Multi-Objective Evolutionary Algorithms
Multi-Objective Evolutionary Algorithms:
    Family of algorithms inspired by nature:
        Evolve a population
        Mutation and Crossover
        Select fittest individuals by Pareto ranking
    Handle 1 to 3 objectives
Self-Adaptive Techniques:
    Optimise search parameters:
        Population Size
        Mutation Rate
        Crossover Rate
        Generation Gap
        Scaling Window
    Optimise reproduction operators:
        Mutation Operator(s)
        Crossover Operator(s)
        Parent Selection Operator
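The "select fittest individuals by Pareto ranking" step can be illustrated with a minimal sketch. It assumes that all objectives are minimised and uses front-by-front ranking (rank 1 is the non-dominated set, rank 2 the set that becomes non-dominated once rank 1 is removed, and so on). This is one common ranking scheme and is not necessarily the exact variant used in the thesis; the function names and the example population are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (assumption: all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Rank points front by front: rank 1 is the non-dominated set of the
    whole population, rank 2 the non-dominated set of what is left, etc."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # A point is in the current front if no other remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical two-objective population (both objectives minimised).
population = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.95, 0.95)]
ranks = pareto_ranks(population)
```

In this toy population the first three points are mutually non-dominated, while the last two are dominated and fall into later fronts.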
Drug Discovery Process - Steps
Drug Discovery Process - Timeline
Life Sciences Informatics platform
Motivation & Objectives
Motivation
    Provide an easy-to-use web-based platform,
    Focused on Virtual Screening (VS) of natural products, and
    Aimed towards cancer chemoprevention researchers.
Objectives
    Design and develop a web-based Scientific Workflow Management System (SWMS),
    Provide tools for VS, and
    Evaluate it on use cases for identifying novel chemopreventive agents.
Scientific Workflow Management Systems for Virtual Screening
Applications            | Technology | Scientific Field(s)
Open Source
Taverna                 | Java       | Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music
Galaxy                  | Python     | Life Sciences, Bioinformatics
Knime                   | Java       | Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis
Commercial
InforSense/DiscoveryNet | -          | Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling
Pipeline Pilot          | -          | Biology, Chemistry, Material Science
Funding Support
The work has been partially supported through the EU-FP7 GRANATUM project, "A Social Collaborative Working Space Semantically Interlinking Biomedical Researchers, Knowledge and Data for the Design and Execution of In Silico Models and Experiments in Cancer Chemoprevention", contract number 270139.
Supported the research of the EU-FP7 Linked2Safety project, "A Next-Generation, Secure Linked Data Medical Information Space For Semantically-Interconnecting Electronic Health Records and Clinical Trials Systems Advancing Patients Safety In Clinical Research", contract number 288328.
Life Sciences Informatics platform
Life Sciences Informatics (LiSIs) is a web-based SWMS for VS [Kannas et al., 2015].
LiSIs is based on the Galaxy SWMS [Goecks et al., 2010], [Blankenberg et al., 2010], [Giardine et al., 2005].
LiSIs modules
LiSIs Showcase
LiSIs Showcase Information
LiSIs was (successfully) used for the discovery of promising agents with chemopreventive properties that are able to bind to Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β (ER-β).
Datasets:
    2414 compounds from Indofine,
    55 compounds characterized by Medina-Franco et al. [Medina-Franco et al., 2010], and
    21 known ER ligands retrieved from PubChem.
LiSIs Showcase Workflow
LiSIs Showcase Docking Results
(a) ER-α Docking Score (b) ER-β Docking Score
LiSIs Showcase Discussion
From the Indofine dataset (2414 compounds), based on natural-like criteria and docking results, we selected:
    18 potential ER ligands,
    These were further investigated in vitro with the ER binding assay described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005] with minor modifications,
    15 out of 18 compounds (83.3%) were experimentally confirmed active.
Self-Adaptive Multi-Objective Evolutionary Algorithm
Multi-Objective Algorithms for Molecular Design
Name        | MO Method    | Search Method                   | Remarks                            | Reference
EA-Inventor | Weighted     | Evolutionary Algorithm          | Ligand                             | [Feher et al., 2008]
GANDI       | Weighted     | Parallel Evolutionary Algorithm | Structure                          | [Dey and Caflisch, 2008]
FOG         | Weighted     | Evolutionary Algorithm          | Ligand                             | [Kutchukian et al., 2009]
MEGA        | Pareto based | Evolutionary Algorithm          | Ligand & Structure                 | [Nicolaou et al., 2009a]
PLD         | Pareto based | Evolutionary Algorithm          | ADME related properties            | [Ekins et al., 2010]
NovoFLAP    | Weighted     | Evolutionary Algorithm          | Ligand                             | [Damewood et al., 2010]
PhDD        | Weighted     | Workflow                        | Pharmacophore                      | [Huang et al., 2010]
DOGS        | Weighted     | Workflow                        | Ligand                             | [Hartenfeller et al., 2012]
LiGen       | Weighted     | Workflow                        | Ligand, Structure & Pharmacophore  | [Beccari et al., 2013]
MOARF       | Weighted     | Workflow                        | Ligand & Structure                 | [Firth et al., 2015]
Synopsis    | Pareto based | Evolutionary Algorithm          | Ligand & Structure                 | [Daeyaert and Deem, 2016]
Motivation & Objectives
Motivation
    Find suitable search parameters for an algorithm in a given problem, and
    Automate this process.
Objectives
    Design and develop an algorithm:
        To search for the fittest search parameters of MOEAs,
        To be problem agnostic, and
        Evaluate on our previously proposed eMEGA for molecular De Novo Design.
About Self-Adaptive MOEA
Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010],
Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998],
Optimise eMEGA parameters:
    Mutation Rate,
    Crossover Rate,
    Parent Selection Type,
    Population Diversity Type.
Objective fitness functions for the meta-level:
    The percentage of non-dominated solutions each eMEGA has per iteration,
    The percentage of unique solutions each eMEGA has per iteration,
    Pareto Front Hypervolume each eMEGA has per iteration.
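The three meta-level objectives listed above can be sketched as follows. This is an illustration only, under several assumptions that the slides do not state: the inner objectives are minimised and already normalised to [0, 1], solutions are identified by a string (for example a canonical SMILES), and the hypervolume is computed for two objectives against a hypothetical reference point. All function names are made up for this sketch.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (assumption: minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fraction(objs: List[Tuple[float, float]]) -> float:
    """Fraction of the population that no other member dominates."""
    front = [
        p for i, p in enumerate(objs)
        if not any(dominates(q, p) for j, q in enumerate(objs) if j != i)
    ]
    return len(front) / len(objs)


def unique_fraction(identifiers: Sequence[str]) -> float:
    """Fraction of distinct solutions, compared by identifier."""
    return len(set(identifiers)) / len(identifiers)


def hypervolume_2d(points: List[Tuple[float, float]],
                   reference: Tuple[float, float]) -> float:
    """Area dominated by a two-objective minimisation set, bounded by `reference`."""
    area = 0.0
    best_f2 = reference[1]
    for f1, f2 in sorted(set(points)):  # ascending in the first objective
        if f1 >= reference[0] or f2 >= best_f2:
            continue  # outside the reference box, or adds no new area
        area += (reference[0] - f1) * (best_f2 - f2)
        best_f2 = f2
    return area


# Hypothetical population of four solutions with two minimised objectives.
objs = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.6, 0.6)]
ids = ["CCO", "CCO", "CCN", "CCC"]  # two entries are the same molecule

nd = non_dominated_fraction(objs)        # 3 of 4 points are non-dominated
uq = unique_fraction(ids)                # 3 of 4 identifiers are distinct
hv = hypervolume_2d(objs, (1.0, 1.0))    # area against the (1, 1) corner
```

With objectives normalised to [0, 1] and the reference point at (1, 1), the hypervolume also stays within the 0 to 1.0 range.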
Self-Adaptive MOEA Pseudocode
Self-Adaptive MOEA Chromosome
Chromosomes Example
Objective Fitness Functions
Objective Fitness Function | Range   | Example
Non-dominated Solutions %  | 0 - 1.0 | 0.90
Unique Solutions %         | 0 - 1.0 | 0.88
Pareto Front Hypervolume   | 0 - 1.0 | 0.56
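Since the chromosome figure is not reproduced in this transcript, the following is only a hypothetical encoding of a meta-level chromosome: one gene per eMEGA parameter being optimised (mutation rate, crossover rate, parent selection type, population diversity type). The class name, the mutation operator and its `sigma` parameter are assumptions for illustration; the selection and diversity options are the values that appear in the results tables of this deck.

```python
import random
from dataclasses import dataclass, replace

SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class MetaChromosome:
    """Hypothetical meta-level chromosome: one gene per eMEGA setting."""
    mutation_rate: float
    crossover_rate: float
    selection_type: str
    diversity_type: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.mutation_rate <= 1.0:
            raise ValueError("mutation_rate must be in [0, 1]")
        if not 0.0 <= self.crossover_rate <= 1.0:
            raise ValueError("crossover_rate must be in [0, 1]")
        if self.selection_type not in SELECTION_TYPES:
            raise ValueError("unknown selection type")
        if self.diversity_type not in DIVERSITY_TYPES:
            raise ValueError("unknown diversity type")


def mutate(chrom: MetaChromosome, rng: random.Random, sigma: float = 0.05) -> MetaChromosome:
    """Change one randomly chosen gene: Gaussian step for rates (clamped to
    [0, 1]), random re-draw for categorical genes."""
    gene = rng.choice(("mutation_rate", "crossover_rate", "selection_type", "diversity_type"))
    if gene in ("mutation_rate", "crossover_rate"):
        value = getattr(chrom, gene) + rng.gauss(0.0, sigma)
        return replace(chrom, **{gene: min(1.0, max(0.0, value))})
    options = SELECTION_TYPES if gene == "selection_type" else DIVERSITY_TYPES
    return replace(chrom, **{gene: rng.choice(options)})


# Example values taken from the first row of the Maybridge results table.
example = MetaChromosome(0.029, 0.694, "roulette", "genotype")
mutated = mutate(example, random.Random(0))
```

Keeping the chromosome immutable means every mutation yields a new, validated individual, which makes it harder to end up with out-of-range settings.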
eMEGA Chromosome
Graph based, and
Information related to evolutionary design process.
Self-Adaptive MOEA Flowchart
Self-Adaptive MOEA Showcases
Validation of Self-Adaptive MOEA: About
Compare SAMOEA, eMEGA and MOARF [Firth et al., 2015].
Design molecules that have structural and chemical property similarity to the target molecule, Seliciclib.
Figure: Seliciclib (CYC202, R-roscovitine)
Validation of Self-Adaptive MOEA: Starting Datasets
Starting Molecules datasets:
    Maybridge’s Screening Library that contains 53953 molecules (Dataset 1),
    Asinex’s Elite Libraries that contains 104577 molecules (Dataset 2).
Validation of Self-Adaptive MOEA: Settings
eMEGA Settings (Dataset 1 and Dataset 2)
    Objectives: Structural Similarity, Chemical Descriptor Similarity
    Population: 500
    Iterations: 500
    Evolutionary Operations:
        Mutation Probability: 15%
        Crossover Probability: 80%
        Selection Type: Roulette
        Diversity Type: Genotype
SAMOEA Settings (Dataset 1 and Dataset 2)
    SAMOEA
        Objectives: Non Dominated Solutions Percentage, Unique Solutions Percentage
        Population: 20
        Iterations: 100
        Evolutionary Operations:
            Mutation Probability: 15%
            Crossover Probability: 80%
            Selection Type: Roulette
            Diversity Type: Phenotype
    eMEGA
        Objectives: Structural Similarity, Chemical Descriptor Similarity
        Population: 100
        Iterations: 1
        Evolutionary Operations: Defined during run time, based on SAMOEA’s chromosomes.
Validation of Self-Adaptive MOEA: Results
Validation of Self-Adaptive MOEA: Results - Search Settings (1)
SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
0.029                | 0.694                 | roulette       | genotype       | 0.9             | 0.986              | 1
0.175                | 0.818                 | roulette       | phenotype      | 0.914           | 0.961              | 1
0.172                | 0.818                 | tournament     | phenotype      | 0.934           | 0.9533             | 1
0.026                | 0.694                 | roulette       | phenotype      | 0.928           | 0.955              | 1
0.001                | 0.963                 | roulette       | phenotype      | 0.982           | 0.848              | 1
0.177                | 0.818                 | roulette       | phenotype      | 0.921           | 0.956              | 1
0.083                | 0.73                  | tournament     | phenotype      | 0.95            | 0.946              | 1
0.086                | 0.798                 | tournament     | genotype       | 0.976           | 0.928              | 1
0.172                | 0.818                 | best           | genotype       | 0.914           | 0.973              | 2
0.176                | 0.818                 | roulette       | genotype       | 0.9312          | 0.956              | 2
Note: The numbers for ’Non Dominated %’ and ’Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.
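Taking the note at face value, the listed values can be converted back to actual fractions, and the rank column can be spot-checked with a dominance test on the listed (minimised) values. The sketch below uses three rows copied from the Maybridge table; the helper names are hypothetical and the check covers only these rows.

```python
from typing import Sequence


def actual_fraction(listed: float) -> float:
    """Undo the '1 minus the actual %' transformation described in the note."""
    return round(1.0 - listed, 4)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b when smaller listed values are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# (listed Non Dominated %, listed Unique Solutions %, Rank) from rows 1, 2 and 9.
rows = [(0.9, 0.986, 1), (0.914, 0.961, 1), (0.914, 0.973, 2)]

converted = [(actual_fraction(nd), actual_fraction(uq), rank) for nd, uq, rank in rows]

# Row 9 ties row 2 on the first value and is worse on the second,
# which is consistent with it receiving rank 2 instead of rank 1.
row2_dominates_row9 = dominates(rows[1][:2], rows[2][:2])
row1_dominates_row2 = dominates(rows[0][:2], rows[1][:2])
```

Here a listed 0.9 corresponds to an actual non-dominated fraction of 0.1, and a listed 0.986 to an actual unique-solutions fraction of 0.014.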
Validation of Self-Adaptive MOEA: Results - Search Settings (2)
SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Mutation     Crossover    Selection   Diversity  Non Dominated  Unique Solutions  Rank
Probability  Probability  Type        Type       %              %
0.105        1.0          best        phenotype  0.988          0.931             1
0.139        0.963        tournament  phenotype  0.962          0.956             1
0.089        0.694        tournament  genotype   0.976          0.943             1
0.139        0.969        best        phenotype  0.96           0.96              1
0.108        0.69         tournament  genotype   0.955          0.962             1
0.1          1.0          best        phenotype  0.988          0.942             1
0.088        0.685        tournament  genotype   0.96           0.962             1
0.139        0.966        roulette    phenotype  0.965          0.948             1
0.089        0.709        tournament  genotype   0.964          0.957             2
Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
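The 'Selection Type' column takes the values roulette, tournament and best. The slides do not define these operators, so the following Python sketch shows one common textbook reading of each, operating on non-dominance ranks (lower is better, ranks start at 1). All function names, the inverse-rank weighting and the tournament size are assumptions, not eMEGA's actual implementation.

```python
import random
from typing import List, Sequence


def select_best(ranks: Sequence[int], k: int) -> List[int]:
    """Truncation selection: indices of the k best-ranked individuals."""
    return sorted(range(len(ranks)), key=lambda i: ranks[i])[:k]


def select_tournament(ranks: Sequence[int], k: int, rng: random.Random, size: int = 2) -> List[int]:
    """Run k tournaments; each draws `size` distinct individuals and keeps the best-ranked."""
    winners = []
    for _ in range(k):
        contestants = rng.sample(range(len(ranks)), size)
        winners.append(min(contestants, key=lambda i: ranks[i]))
    return winners


def select_roulette(ranks: Sequence[int], k: int, rng: random.Random) -> List[int]:
    """Fitness-proportionate selection with weight 1/rank (an assumed weighting)."""
    weights = [1.0 / r for r in ranks]
    return rng.choices(range(len(ranks)), weights=weights, k=k)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing settings across repeated experiments.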
Use Case 1: About
Design molecules that bind to ER-α based on:
  Structural similarity to Tamoxifen, and
  Structural dissimilarity to Ibuproxam.
(a) Tamoxifen. (b) Ibuproxam.
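As a rough illustration of how the two objectives could be scored, the Python sketch below uses Tanimoto similarity over sets of fingerprint bits and returns both objectives in minimisation form. The slides do not state which similarity measure or fingerprint was used, so Tanimoto, the function names and the bit sets in the usage note are all assumptions.

```python
from typing import FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets, in [0, 1]."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treated as 0.0 here, an arbitrary convention
    return len(a & b) / len(union)


def use_case_1_objectives(candidate: Fingerprint,
                          target: Fingerprint,
                          anti_target: Fingerprint) -> Tuple[float, float]:
    """Both values are to be minimised: distance to the target, similarity to the anti-target."""
    return (1.0 - tanimoto(candidate, target), tanimoto(candidate, anti_target))
```

For example, with placeholder bit sets (not real molecules), `use_case_1_objectives(frozenset({1, 2, 3}), frozenset({1, 2, 3}), frozenset({4, 5}))` evaluates to `(0.0, 0.0)`: identical to the target and sharing nothing with the anti-target.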
Use Case 1: Starting Dataset
Starting Molecules dataset:
  Molecules retrieved from ZINC15,
  Applied filters:
    Clean (Substances with "clean" reactivity),
    In-vitro (Substances reported or inferred active at 10 uM or better in direct binding assays) and
    Now (Immediate delivery, includes in-stock and agent).
  The collection contains 7035 molecules.
Use Case 1: Results - In objective space
Use Case 1: Results - Designed molecules
Use Case 1: Results - AutoDock Vina docking
Molecule Id               Docking Affinity (kcal/mol)
Tamoxifen                 -8.2
DnD 6 SP 20 4 X 13a       -7.9
DnD 31 SP 150 37 M 19     -7.9
DnD 8 SP 9 2 M 13         -7.8
DnD 4 SP 199 49 X 46b     -7.7
DnD 12 SP 75 18 M 13      -7.6
DnD 31 SP 6 1 M 16        -7.2
DnD 15 SP 168 41 M 0      -7.2
DnD 11 SP 74 18 M 4       -7.1
DnD 31 SP 193 48 X 76b    -6.9
DnD 1 SP 78 19 X 84a      -6.8
Use Case 1: Results - Self-Adaptive MOEA non dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto       Rank
Probability  Probability  Type        Type       %              Hypervolume
0.15777      0.80279      tournament  genotype   0.634          0.341        1
0.15613      0.88305      tournament  genotype   0.634          0.341        1
0.15627      0.88891      tournament  genotype   0.634          0.341        1
0.15688      0.88891      roulette    genotype   0.649          0.340        1
0.00552      0.94308      best        genotype   0.624          0.427        1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
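For readers unfamiliar with the 'Pareto Hypervolume' column, the following Python sketch computes the hypervolume of a two-objective front under minimisation: the area dominated by the front and bounded by a reference point. The slides do not say how the hypervolume was computed, normalised or oriented, so the reference point (1, 1) and the function name are assumptions.

```python
from typing import Iterable, Tuple


def hypervolume_2d(points: Iterable[Tuple[float, float]],
                   reference: Tuple[float, float] = (1.0, 1.0)) -> float:
    """Area dominated by `points` (both objectives minimised) up to `reference`."""
    ref_x, ref_y = reference
    # Keep only points that are strictly better than the reference in both objectives.
    inside = sorted(p for p in points if p[0] < ref_x and p[1] < ref_y)
    volume = 0.0
    best_y = ref_y
    for x, y in inside:
        # Sweeping left to right, a point adds area only if it improves on the best y so far.
        if y < best_y:
            volume += (ref_x - x) * (best_y - y)
            best_y = y
    return volume
```

Dominated points contribute nothing, so the function can be applied to a whole population rather than only to its first front.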
Use Case 3: About
Design molecules that bind to ER-α based on:
  Structural similarity to Raloxifene, and
  Chemical Properties similarity to Raloxifene.
Figure: Raloxifene.
Use Case 3: Starting Dataset
Starting Molecules dataset:
  Molecules retrieved from ZINC15,
  Applied filters:
    Clean (Substances with "clean" reactivity),
    In-vitro (Substances reported or inferred active at 10 uM or better in direct binding assays) and
    Now (Immediate delivery, includes in-stock and agent).
  The collection contains 7035 molecules.
Use Case 3: Results - In objective space
Use Case 3: Results - Designed molecules
Use Case 3: Results - AutoDock Vina docking
Molecule Id               Docking Affinity (kcal/mol)
DnD 31 SP 194 48 M 49     -8.2
DnD 34 SP 197 49 X 13a    -5.9
Raloxifene                -2.2 (-11.70 PubChem)
Use Case 3: Results - Self-Adaptive MOEA non dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto       Rank
Probability  Probability  Type        Type       %              Hypervolume
0.12927      0.98597      roulette    genotype   0.997          0.274        1
0.12897      0.98588      roulette    genotype   0.997          0.274        1
0.12933      0.98588      roulette    genotype   0.997          0.274        1
0.12946      0.98559      roulette    genotype   0.997          0.274        1
0.12928      0.98582      roulette    genotype   0.997          0.274        1
0.12897      0.98588      tournament  genotype   0.997          0.274        1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
Use Case 4: About
Design molecules that bind to Proteasome B5 based on:
  Structural similarity to Ixazomib, and
  Chemical Properties similarity to Ixazomib.
Figure: Ixazomib.
Use Case 4: Starting Dataset
Starting Molecules dataset:
  Molecules retrieved from ZINC15,
  Applied filters:
    Clean (Substances with "clean" reactivity),
    In-vitro (Substances reported or inferred active at 10 uM or better in direct binding assays) and
    Now (Immediate delivery, includes in-stock and agent).
  The collection contains 7035 molecules.
Use Case 4: Results - In objective space
Use Case 4: Results - Designed molecules
Use Case 4: Results - AutoDock 4 docking
Molecule Id               Docking Affinity (kcal/mol)
DnD 19 SP 196 48 X 59b    -7.19
DnD 49 SP 193 48 X 123b   -6.68
DnD 1 SP 196 48 X 67a     -6.08
Use Case 4: Results - Self-Adaptive MOEA non dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto       Rank
Probability  Probability  Type        Type       %              Hypervolume
0.09507      0.98194      tournament  phenotype  0.993          0.442        1
0.09507      0.9819       roulette    phenotype  0.991          0.442        1
0.09471      0.98178      roulette    genotype   0.997          0.426        1
0.09484      0.98183      roulette    phenotype  0.996          0.441        1
0.09277      0.98235      roulette    genotype   0.996          0.441        1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
Self-Adaptive MOEA Showcases Discussion
  SAMOEA proposed interesting solutions in all problems that it has been applied to,
  Further in-vitro investigation is required, and
  SAMOEA's proposed eMEGA settings differ based on problem and dataset (no silver bullet).
Concluding Remarks
Concluding Remarks - LiSIs platform
Features a Web based Virtual Screening platform, focused on Cancer Chemoprevention Research.
To be expanded in the future with tools featuring the algorithms from the MEGA framework.
A number of SWs were implemented for:
  preparing docking models,
  preparing predictive models,
  performing docking experiments,
  using predictive models to predict biochemical properties and behaviour, and
  performing VS workflows.
Concluding Remarks - Self-Adaptive MOEA (1)
Drawbacks:
  Needs a lot of time to terminate, and
  Very slow convergence.
Concluding Remarks - Self-Adaptive MOEA (2)
Advantages:
  Searches a larger space,
  Generates far more solutions per iteration,
  Proposes the fittest parameter sets that should be used from eMEGA for the given problem,
  Has been built to be adaptable,
  Uses objective fitness functions that can evaluate the effectiveness and the progression of any MOEA,
  Can be used on other problems,
  SAMOEA's chromosome can be expanded with additional search parameters, and
  Leverages multi-core parallelism (needs more memory).
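The multi-core point can be pictured as evaluating each SAMOEA chromosome (one eMEGA run per chromosome) in its own worker process. The Python sketch below is only an illustration of that pattern with the standard library; `evaluate_population` and its parameters are hypothetical names, and the slides do not describe how the MEGA framework actually distributes work.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Sequence, Type, TypeVar

C = TypeVar("C")  # chromosome type
F = TypeVar("F")  # fitness type


def evaluate_population(chromosomes: Sequence[C],
                        evaluate: Callable[[C], F],
                        workers: int = 4,
                        executor_cls: Type[Executor] = ProcessPoolExecutor) -> List[F]:
    """Evaluate every chromosome with `evaluate`, returning results in input order.

    With the default ProcessPoolExecutor, `evaluate` must be a picklable,
    module-level function, and each worker holds its own copy of the data,
    which is one reason a parallel run needs more memory.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(evaluate, chromosomes))
```

The default of four workers mirrors the four virtual CPUs of the virtual machine used for the experimental runs; passing `executor_cls=ThreadPoolExecutor` is a convenient way to try the function with a lambda or in an interactive session.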
Future Work
Future Work - LiSIs platform
Develop LiSIs 2.0:
  Based on latest Galaxy platform, and
  Redesign of tools to be compatible with Galaxy's ToolShed for easy deployment,
Update LiSIs with a feature to visualise intermediate results from various tools,
Expand LiSIs tools with tools featuring the MEGA line-up of algorithms and SAMOEA,
Explore resource management in SWMSs:
  Novel Multi-Objective Optimization SW design approaches,
  Novel Multi-Objective Optimization SW scheduling approaches.
Future Work - Self-Adaptive MOEA
  Optimise MEGA framework (memory management and parallelism),
  Implement self-adaptive technique for selecting genetic operators,
  Extend Self-Adaptive MOEA to use other MOEAs,
  Implement models for other problems, and
  Implement new objective functions.
List of Publications
Table of Contents
6 List of Publications
7 References
8 Backup Frames
  Validation of Self-Adaptive MOEA
  Use Case 1
  Use Case 2
  Use Case 3
  Use Case 4
List of Publications I
Book Chapters
C. A. Nicolaou and C. C. Kannas, “Molecular Library Design Using Multi-Objective Optimization Methods,” in Chemical Library Design, J. Z. Zhou, Ed. Humana Press, 2011, pp. 53–69.
Journals
C. Kannas et al., “LiSIs: An Online Scientific Workflow System for Virtual Screening,” Combinatorial Chemistry & High Throughput Screening, vol. 18, no. 3, pp. 281–295, Mar. 2015.
C. A. Nicolaou, C. Kannas, and E. Loizidou, “Multi-objective optimization methods in de novo drug design,” Mini Rev Med Chem, vol. 12, no. 10, pp. 979–987, Sep. 2012.
List of Publications II
C. Nicolaou, C. Kannas, and C. Pattichis, “Knowledge-driven multi-objective de novo drug design,” Chemistry Central Journal, vol. 3, p. P22, 2009.
Conferences
C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 30th IEEE International Symposium on Computer-Based Medical Systems, Thessaloniki, Greece, 22-24 June 2017, pp. 1-6.
P. Hasapis et al., “Molecular clustering via knowledge mining from biomedical scientific corpora,” in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), 2013, pp. 1-5.
List of Publications III
C. C. Kannas et al., “A workflow system for virtual screening in cancer chemoprevention,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 439–446.
K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S. Pattichis, and V. J. Promponas, “Open source workflow systems in life sciences informatics,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 552–558.
C. A. Nicolaou, C. Kannas, and C. S. Pattichis, “Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
List of Publications IV
C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, “A Parallel implementation of a Multi-objective Evolutionary Algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
Abstracts
C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju Island, Korea, 11-15 July 2017.
References
References I
Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G. (2013). LiGen: A High Performance Workflow for Chemistry Driven de Novo Design. Journal of Chemical Information and Modeling.
Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010). Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. In Current Protocols in Molecular Biology. John Wiley & Sons, Inc.
Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm for Efficient De Novo Design of Multi-functional Molecules. Molecular Informatics, pages n/a–n/a.
References II
Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010). NovoFLAP: A ligand-based de novo design approach for the generation of medicinally relevant ideas. Journal of Chemical Information and Modeling, 50(7):1296–1303.
Dey, F. and Caflisch, A. (2008). Fragment-based de novo ligand design by multiobjective evolutionary optimization. Journal of Chemical Information and Modeling, 48(3):679–690.
Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolving molecules using multi-objective optimization: applying to ADME/Tox. Drug Discovery Today, 15(11-12):451–460.
References III
Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders, J. (2008). The use of ligand-based de novo design for scaffold hopping and sidechain optimization: two case studies. Bioorganic & Medicinal Chemistry, 16(1):422–427.
Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015). MOARF, an Integrated Workflow for Multiobjective Optimization: Implementation, Synthesis, and Biological Evaluation. Journal of Chemical Information and Modeling.
Fonseca, C. and Fleming, P. (1998). Multiobjective optimization and multiple constraint handling with evolutionary algorithms. I. A unified formulation. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(1):26–37.
References IV
Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: A Platform for Interactive Large-Scale Genome Analysis. Genome Research, 15(10):1451–1455.
Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T. (2010). Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86.
Grefenstette, J. (1986). Optimization of Control Parameters for Genetic Algorithms. IEEE Transactions on Systems, Man and Cybernetics, 16(1):122–128.
References V
Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J. H. N. (2005). A novel microplate reader-based high-throughput assay for estrogen receptor binding. International Journal of Environmental Analytical Chemistry, 85(3):149–161.
Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F., Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012). DOGS: Reaction-Driven de novo Design of Bioactive Compounds. PLoS Comput Biol, 8(2):e1002380.
Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a new pharmacophore-based de novo design method of drug-like molecules combined with assessment of synthetic accessibility. Journal of Molecular Graphics and Modelling, 28(8):775–787.
References VI
Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva, C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D., Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser, C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs: An Online Scientific Workflow System for Virtual Screening. Combinatorial Chemistry & High Throughput Screening, 18(3):281–295.
Kramer, O. (2010). Evolutionary self-adaptation: a survey of operators and strategy parameters. Evolutionary Intelligence, 3(2):51–65.
References VII
Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space. Journal of Chemical Information and Modeling, 49(7):1630–1642.
Medina-Franco, J. L., Lopez-Vallejo, F., Kuck, D., and Lyko, F. (2010). Natural products as DNA methyltransferase inhibitors: a computer-aided discovery approach. Molecular Diversity, 15:293–304.
Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). De Novo Drug Design Using Multiobjective Evolutionary Graphs. Journal of Chemical Information and Modeling, 49(2):295–307.
References VIII
Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b). Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm. In 2009 9th International Conference on Information Technology and Applications in Biomedicine, pages 1–6, Larnaka, Cyprus. IEEE.
Backup Frames
Pareto Ranking
LiSIs Showcase - Known ER Ligands
A/A  Estrogen Ligand        Docking Score ER-α  Docking Score ER-β
1    Raloxifene             -11.70              -8.72
2    Lilly-117018           -11.53              -3.80
3    3-HydroxyTamoxifen     -11.02              N/A
4    Nafoxidine             -10.88              N/A
5    ICI-182780             -10.73              N/A
6    Pyrolidine             -10.04              N/A
7    Clomiphene A           -10.01              N/A
8    Nitrofinene Citrate    -9.87               N/A
9    ICI-164384             -9.82               -9.13
10   Moxestrol              -9.38               -9.77
11   Naringenine            -8.55               -7.80
12   Triphenylethylene      -8.50               N/A
13   Afema                  -8.15               -7.78
14   Danazol                -6.99               N/A
15   Ethamoxytriphetol      -6.67               N/A
16   4-HydroxyTamoxifen     -6.60               N/A
17   Dioxin                 -6.22               N/A
18   Estralutin             -5.86               -3.80
19   Cyclopentanone         -4.88               N/A
20   Miproxifene Phosphate  -4.48               N/A
21   EM-800                 N/A                 N/A
Note: The list was retrieved from PubChem and it includes compounds characterized as “estrogen ligands”. N/A: no binding affinity.
LiSIs Showcase - Natural-like Rule of 5 filter
GRANATUM Rule of 5 filter:
  1. MW between 160 and 700,
  2. HBD less or equal to 5,
  3. HBA less or equal to 10,
  4. TPSA less than 140, and
  5. cLogP between -0.4 and 5.6.
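The five rules translate directly into a predicate over precomputed descriptors. The Python sketch below is a minimal illustration: the `Descriptors` container and the function name are hypothetical, the descriptor values are assumed to come from a cheminformatics toolkit of your choice, and the "between" bounds are assumed to be inclusive, which the slide does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mw: float     # molecular weight
    hbd: int      # hydrogen bond donors
    hba: int      # hydrogen bond acceptors
    tpsa: float   # topological polar surface area
    clogp: float  # calculated logP


def passes_granatum_ro5(d: Descriptors) -> bool:
    """Apply the five thresholds listed on the slide; 'between' is treated as inclusive."""
    return (
        160 <= d.mw <= 700
        and d.hbd <= 5
        and d.hba <= 10
        and d.tpsa < 140
        and -0.4 <= d.clogp <= 5.6
    )
```

A molecule fails as soon as any single rule is violated; for instance a TPSA of exactly 140 is rejected because that rule is a strict inequality.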
eMEGA Settings
Table: eMEGA experimental design settings
Datasets:                 Dataset 1, Dataset 2
Objectives:               Structural Similarity, Chemical Descriptor Similarity
Population:               500
Iterations:               500
Evolutionary Operations:  Mutation Probability: 15%, Crossover Probability: 80%, Selection Type: Roulette, Diversity Type: Genotype
SAMOEA Settings
Table: SAMOEA experimental design settings
SAMOEA
    Dataset: Dataset 1, Dataset 2
    Objectives: Non Dominated Solutions Percentage; Unique Solutions Percentage
    Population: 20
    Iterations: 100
    Evolutionary Operations:
        Mutation Probability: 15%
        Crossover Probability: 80%
        Selection Type: Roulette
        Diversity Type: Phenotype
eMEGA
    Dataset: Dataset 1, Dataset 2
    Objectives: Structural Similarity; Chemical Descriptor Similarity
    Population: 100
    Iterations: 1
    Evolutionary Operations: Defined during run time, based on SAMOEA's chromosomes.
Virtual Machine Specifications
Table: Specifications of the virtual machine on which the experimental runs were performed
Linux Virtual Machine
    CPU: 4x Virtual CPU @ 2GHz
    RAM: 16GB
    OS: CentOS 6
eMEGA Maybridge Run 1
Figure: eMEGA Run 1 results for Maybridge dataset.
eMEGA Maybridge Run 2
Figure: eMEGA Run 2 results for Maybridge dataset.
eMEGA Maybridge Run 3
Figure: eMEGA Run 3 results for Maybridge dataset.
eMEGA Maybridge Run 4
Figure: eMEGA Run 4 results for Maybridge dataset.
eMEGA Maybridge Run 5
Figure: eMEGA Run 5 results for Maybridge dataset.
eMEGA Maybridge All Runs
Figure: eMEGA results for Maybridge dataset.
eMEGA Maybridge All Runs Top 10 Results (1)
Figure: eMEGA Top 10 results for Maybridge dataset.
eMEGA Maybridge All Runs Top 10 Results (2)
Figure: eMEGA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.
eMEGA Asinex Run 1
Figure: eMEGA Run 1 results for Asinex dataset.
eMEGA Asinex Run 2
Figure: eMEGA Run 2 results for Asinex dataset.
eMEGA Asinex Run 3
Figure: eMEGA Run 3 results for Asinex dataset.
eMEGA Asinex Run 4
Figure: eMEGA Run 4 results for Asinex dataset.
eMEGA Asinex Run 5
Figure: eMEGA Run 5 results for Asinex dataset.
eMEGA Asinex All Runs
Figure: eMEGA results for Asinex dataset.
eMEGA Asinex All Runs Top 10 Results (1)
Figure: eMEGA Top 10 results for Asinex dataset.
eMEGA Asinex All Runs Top 10 Results (2)
Figure: eMEGA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.
SAMOEA Maybridge Run 1
Figure: SAMOEA Run 1 results for Maybridge dataset.
SAMOEA Maybridge Run 2
Figure: SAMOEA Run 2 results for Maybridge dataset.
SAMOEA Maybridge Run 3
Figure: SAMOEA Run 3 results for Maybridge dataset.
SAMOEA Maybridge Run 4
Figure: SAMOEA Run 4 results for Maybridge dataset.
SAMOEA Maybridge Run 5
Figure: SAMOEA Run 5 results for Maybridge dataset.
SAMOEA Maybridge All Runs
Figure: SAMOEA results for Maybridge dataset.
SAMOEA Maybridge All Runs Top 10 Results (1)
Figure: SAMOEA Top 10 results for Maybridge dataset.
SAMOEA Maybridge All Runs Top 10 Results (2)
Figure: SAMOEA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.
SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Mutation     Crossover    Selection   Diversity   Non          Unique       Rank
Probability  Probability  Type        Type        Dominated %  Solutions %
0.029        0.694        roulette    genotype    0.9          0.986        1
0.175        0.818        roulette    phenotype   0.914        0.961        1
0.172        0.818        tournament  phenotype   0.934        0.9533       1
0.026        0.694        roulette    phenotype   0.928        0.955        1
0.001        0.963        roulette    phenotype   0.982        0.848        1
0.177        0.818        roulette    phenotype   0.921        0.956        1
0.083        0.73         tournament  phenotype   0.95         0.946        1
0.086        0.798        tournament  genotype    0.976        0.928        1
0.172        0.818        best        genotype    0.914        0.973        2
0.176        0.818        roulette    genotype    0.9312       0.956        2
Note: The values for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual percentage, so smaller values are better. 'Rank' is the non-dominance rank.
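The 'Rank' column of the Maybridge table can be reproduced from its two fitness columns with non-dominated sorting. The sketch below is a generic front-peeling implementation, not necessarily the one used inside SAMOEA. It minimizes both values, as the note describes. `dominates` and `pareto_ranks` are hypothetical helper names, and the input pairs are the (Non Dominated %, Unique Solutions %) values of the ten rows in table order.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points: Sequence[Point]) -> List[int]:
    """Give rank 1 to the non-dominated front, remove it, and repeat."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # Points not dominated by any other point still in play.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs from the Maybridge table.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955),
    (0.982, 0.848), (0.921, 0.956), (0.95, 0.946), (0.976, 0.928),
    (0.914, 0.973), (0.9312, 0.956),
]
print(pareto_ranks(maybridge))  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```

The last two rows receive rank 2 because each is dominated by a rank 1 row: (0.914, 0.973) by (0.914, 0.961), and (0.9312, 0.956) by (0.928, 0.955).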
SAMOEA Asinex Run 1
Figure: SAMOEA Run 1 results for Asinex dataset.
SAMOEA Asinex Run 2
Figure: SAMOEA Run 2 results for Asinex dataset.
SAMOEA Asinex Run 3
Figure: SAMOEA Run 3 results for Asinex dataset.
SAMOEA Asinex Run 4
Figure: SAMOEA Run 4 results for Asinex dataset.
SAMOEA Asinex All Runs
Figure: SAMOEA results for Asinex dataset.
SAMOEA Asinex All Runs Top 10 Results (1)
Figure: SAMOEA Top 10 results for Asinex dataset.
SAMOEA Asinex All Runs Top 10 Results (2)
Figure: SAMOEA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.
SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Mutation     Crossover    Selection   Diversity   Non          Unique       Rank
Probability  Probability  Type        Type        Dominated %  Solutions %
0.105        1.0          best        phenotype   0.988        0.931        1
0.139        0.963        tournament  phenotype   0.962        0.956        1
0.089        0.694        tournament  genotype    0.976        0.943        1
0.139        0.969        best        phenotype   0.96         0.96         1
0.108        0.69         tournament  genotype    0.955        0.962        1
0.1          1.0          best        phenotype   0.988        0.942        1
0.088        0.685        tournament  genotype    0.96         0.962        1
0.139        0.966        roulette    phenotype   0.965        0.948        1
0.089        0.709        tournament  genotype    0.964        0.957        2
Note: The values for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual percentage, so smaller values are better. 'Rank' is the non-dominance rank.
MOARF Results
Figure: MOARF’s results compared with Seliciclib.
Compare SAMOEA, eMEGA and MOARF
Figure: Compare all Top 10 results with MOARF's results and Seliciclib.
Discussion (1)
eMEGA and SAMOEA generate molecules that approximate Seliciclib,
Each dataset and algorithm yields a different common core with Seliciclib,
MOARF approximates Seliciclib better than eMEGA and SAMOEA:
    It generates molecules in a more chemically oriented way, with fewer stochastic operations,
    It starts from a core selected for the target and then attaches new fragments to it,
SAMOEA explores the space better than eMEGA and MOARF.
Discussion (2)
From the tables of SAMOEA proposed eMEGA settings we can see that different settings are favoured for each dataset.
Maybridge dataset:
    Mutation probability around 17%,
    Crossover probability around 80%,
    Selection type either roulette or tournament, and
    Diversity type: both options are valid.
Asinex dataset:
    Mutation probability around 10%,
    Crossover probability around 96%,
    Selection type either best or tournament, and
    Diversity type: both options are valid.
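The categorical observations for the Maybridge dataset can be checked by tallying the SelectionType and DiversityType columns of the SAMOEA Top 10 table for that dataset. This is only a small sketch: the ten rows are transcribed by hand from the table, and the variable names are illustrative.

```python
from collections import Counter

# (SelectionType, DiversityType) for the ten Maybridge rows, in table order.
maybridge_rows = [
    ("roulette", "genotype"), ("roulette", "phenotype"),
    ("tournament", "phenotype"), ("roulette", "phenotype"),
    ("roulette", "phenotype"), ("roulette", "phenotype"),
    ("tournament", "phenotype"), ("tournament", "genotype"),
    ("best", "genotype"), ("roulette", "genotype"),
]

selection_counts = Counter(sel for sel, _ in maybridge_rows)
diversity_counts = Counter(div for _, div in maybridge_rows)

print(selection_counts.most_common())  # [('roulette', 6), ('tournament', 3), ('best', 1)]
print(diversity_counts.most_common())  # [('phenotype', 6), ('genotype', 4)]
```

Roulette and tournament account for nine of the ten rows, and both diversity types appear, which is consistent with the Maybridge summary.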
Discussion (3)
The objective fitness scores for the proposed settings are very high, which means that the actual percentages are very low, below 5%. From this we can conclude the following:
    eMEGA instances generate a large number of identical solutions even though they have different configurations. We noticed the same in previous experiments comparing MEGA, eMEGA and MOGA [Nicolaou et al., 2009b], and
    The objective fitness functions we chose to use in SAMOEA compete with each other, which means that getting eMEGA instances to generate a high number of unique and non-dominated solutions (above 20%) proves to be a difficult task.
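To make the two SAMOEA objectives concrete, the sketch below computes the fraction of unique solutions and the fraction of non-dominated solutions for one eMEGA population, and reports each as 1 minus the fraction, following the note under the settings tables. The slides do not say how duplicates are detected or how the percentages are normalized, so several points are assumptions: duplicates share an identifier string (for example a canonical SMILES), non-dominance is evaluated over the unique solutions, both objective scores are minimized, and both fractions are relative to the population size. `samoea_objectives` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float]
Solution = Tuple[str, Scores]  # (identifier, objective scores)


def dominates(a: Scores, b: Scores) -> bool:
    """Minimization: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def samoea_objectives(population: List[Solution]) -> Tuple[float, float]:
    """Return (1 - non-dominated fraction, 1 - unique fraction).

    Smaller values are better. Assumes a non-empty population.
    """
    # Keep the first occurrence of each identifier (assumed duplicate test).
    unique: Dict[str, Scores] = {}
    for ident, scores in population:
        unique.setdefault(ident, scores)
    candidates = list(unique.values())
    non_dominated = [
        s for i, s in enumerate(candidates)
        if not any(dominates(t, s) for j, t in enumerate(candidates) if j != i)
    ]
    n = len(population)
    return 1 - len(non_dominated) / n, 1 - len(unique) / n


# Toy population of five solutions with one duplicate (illustrative values).
population = [
    ("mol_a", (0.2, 0.8)),
    ("mol_b", (0.5, 0.5)),
    ("mol_b", (0.5, 0.5)),
    ("mol_c", (0.6, 0.9)),
    ("mol_d", (0.9, 0.1)),
]
nd_fitness, unique_fitness = samoea_objectives(population)
print(round(nd_fitness, 2), round(unique_fitness, 2))  # 0.4 0.2
```

In this toy population four of five solutions are unique and three are non-dominated, so the reported fitness values are 0.2 and 0.4. Values near 0.95, as in the tables, would correspond to only about 5% unique or non-dominated solutions.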
Use Case 1: Docked designed molecules (1)
Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.
Use Case 1: Docked designed molecules (2)
Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.
Use Case 1: Docked designed molecules (3)
Figure: Designed molecule DnD 8 SP 9 2 M 13 docked to ER-α.
Use Case 1: Docked designed molecules (4)
Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.
Use Case 1: Docked designed molecules (5)
Figure: Designed molecule DnD 12 SP 75 18 M 13 docked to ER-α.
Use Case 1: Docked designed molecules (6)
Figure: Designed molecule DnD 31 SP 6 1 M 16 docked to ER-α.
Use Case 1: Docked designed molecules (7)
Figure: Designed molecule DnD 15 SP 168 41 M 0 docked to ER-α.
Use Case 1: Docked designed molecules (8)
Figure: Designed molecule DnD 11 SP 74 18 M 4 docked to ER-α.
Use Case 1: Docked designed molecules (9)
Figure: Designed molecule DnD 31 SP 193 48 X 76b docked to ER-α.
Use Case 1: Docked designed molecules (10)
Figure: Designed molecule DnD 1 SP 78 19 X 84a docked to ER-α.
Use Case 2: About
Design molecules that bind to ER-α based on:
    Structural similarity to Tamoxifen, and
    Chemical property similarity to Tamoxifen.
Figure: Tamoxifen.
Use Case 2: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
    Clean (substances with "clean" reactivity),
    In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
    Now (immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.
Use Case 2: Results - In objective space
Use Case 2: Results - Designed molecules
Use Case 2: Results - AutoDock Vina docking
Molecule Id               Docking Affinity (kcal/mol)
DnD 42 SP 194 48 X 96b    -10.1
DnD 17 SP 199 49 M 4      -10
DnD 33 SP 189 47 X 66b    -9.9
DnD 48 SP 193 48 M 5      -9.6
Tamoxifen                 -8.2
Use Case 2: Results - Self-Adaptive MOEA non dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity   Non          Pareto       Rank
Probability  Probability  Type        Type        Dominated %  Hypervolume
0.02707      0.97973      tournament  genotype    0.983        0.153        1
0.02758      0.97965      tournament  phenotype   0.988        0.152        1
Note: The values for 'Non Dominated %' are 1 minus the actual percentage, so smaller values are better. 'Rank' is the non-dominance rank.
Use Case 2: Docked designed molecules (1)
Figure: Designed molecule DnD 42 SP 194 48 X 96b docked to ER-α.
Use Case 2: Docked designed molecules (2)
Figure: Designed molecule DnD 17 SP 199 49 M 4 docked to ER-α.
Use Case 2: Docked designed molecules (3)
Figure: Designed molecule DnD 33 SP 189 47 X 66b docked to ER-α.
Use Case 2: Docked designed molecules (4)
Figure: Designed molecule DnD 48 SP 193 48 M 5 docked to ER-α.
Use Case 3: Docked designed molecules (1)
Figure: Designed molecule DnD 31 SP 194 48 M 49 docked to ER-α.
Use Case 3: Docked designed molecules (2)
Figure: Designed molecule DnD 34 SP 197 49 X 13a docked to ER-α.
Use Case 4: Docked designed molecules (1)
Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to Proteasome B5.
Use Case 4: Docked designed molecules (2)
Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to Proteasome B5.
Use Case 4: Docked designed molecules (3)
Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to Proteasome B5.