ckannas phd thesis slides

Post on 28-Jan-2018

Scientific Workflow Systems and Multi-Objective Evolutionary Algorithms for Life Sciences Informatics

Christos C. Kannas

Computer Science, University of Cyprus

6th June 2017

Table of Contents

1 Introduction
  Scientific Workflow Management Systems
  Self-Adaptive Multi-Objective Evolutionary Algorithms
  Virtual Screening & De Novo Molecular Design

2 Life Sciences Informatics platform
  About Life Sciences Informatics platform
  LiSIs Showcase
  LiSIs Showcase Discussion

3 Self-Adaptive Multi-Objective Evolutionary Algorithm
  About Self-Adaptive MOEA
  Self-Adaptive MOEA Showcases
  Self-Adaptive MOEA Showcases Discussion

4 Concluding Remarks
  Concluding Remarks - LiSIs platform
  Concluding Remarks - Self-Adaptive MOEA

5 Future Work
  Future Work - LiSIs platform
  Future Work - Self-Adaptive MOEA

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 1 / 130

Introduction

Scientific Workflow Management Systems

SWMSs Application Domains

Self-Adaptive Multi-Objective Evolutionary Algorithms

Multi-Objective Evolutionary Algorithms:
  Family of algorithms inspired by nature:
    Evolve a population
    Mutation and Crossover
    Select fittest individuals by Pareto ranking
  Handle 1 to 3 objectives

Self-Adaptive Techniques:
  Optimise search parameters:
    Population Size
    Mutation Rate
    Crossover Rate
    Generation Gap
    Scaling Window
  Optimise reproduction operators:
    Mutation Operator(s)
    Crossover Operator(s)
    Parent Selection Operator
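The loop summarised on this slide (evolve a population, apply mutation and crossover, select by Pareto ranking) can be sketched as below. This is a minimal illustration and not the thesis implementation: the toy problem (Schaffer's two-objective function), the blend crossover, the Gaussian mutation, the dominance-count ranking and every function name are assumptions chosen for brevity.

```python
import random


def objectives(x):
    # Toy two-objective problem (assumed for illustration): minimise both values.
    return (x * x, (x - 2.0) ** 2)


def dominates(a, b):
    # a dominates b if it is no worse in every objective and better in at least one.
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def pareto_rank(scores):
    # Dominance-count ranking: 1 + number of individuals that dominate this one.
    return [1 + sum(dominates(other, s) for other in scores) for s in scores]


def evolve(pop_size=20, generations=30, mutation_rate=0.15, crossover_rate=0.8, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            mum, dad = rng.sample(population, 2)
            # Crossover: blend the two parents with probability crossover_rate.
            child = (mum + dad) / 2.0 if rng.random() < crossover_rate else mum
            # Mutation: small Gaussian perturbation with probability mutation_rate.
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.5)
            children.append(child)
        # Selection: keep the best pop_size of parents plus children by Pareto rank.
        pool = population + children
        ranks = pareto_rank([objectives(x) for x in pool])
        order = sorted(range(len(pool)), key=lambda i: ranks[i])
        population = [pool[i] for i in order[:pop_size]]
    return population
```

The mutation and crossover rates passed in here are exactly the kind of search parameters that a self-adaptive technique would tune instead of leaving them fixed.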

Drug Discovery Process - Steps

Drug Discovery Process - Timeline

Life Sciences Informatics platform

Motivation & Objectives

Motivation
  Provide an easy to use web based platform,
  Focused on Virtual Screening (VS) of natural products, and
  Aimed towards cancer chemoprevention researchers.

Objectives
  Design and develop a web based Scientific Workflow Management System (SWMS),
  Provide tools for VS, and
  Evaluate it on use cases for identifying novel chemopreventive agents.

Scientific Workflow Management Systems for Virtual Screening

Application | Technology | Scientific Field(s)

Open Source
Taverna | Java | Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music
Galaxy | Python | Life Sciences, Bioinformatics
KNIME | Java | Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis

Commercial
InforSense/DiscoveryNet | (not listed) | Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling
Pipeline Pilot | (not listed) | Biology, Chemistry, Material Science

Funding Support

The work has been partially supported through the EU-FP7 GRANATUM project, "A Social Collaborative Working Space Semantically Interlinking Biomedical Researchers, Knowledge and Data for the Design and Execution of In Silico Models and Experiments in Cancer Chemoprevention", contract number 270139.
The work also supported the research of the EU-FP7 Linked2Safety project, "A Next-Generation, Secure Linked Data Medical Information Space For Semantically-Interconnecting Electronic Health Records and Clinical Trials Systems Advancing Patients Safety In Clinical Research", contract number 288328.

Life Sciences Informatics platform

Life Sciences Informatics (LiSIs) is a web based SWMS for VS [Kannas et al., 2015].
LiSIs is based on the Galaxy SWMS [Goecks et al., 2010], [Blankenberg et al., 2010], [Giardine et al., 2005].

LiSIs modules

LiSIs Showcase

LiSIs Showcase Information

LiSIs was successfully used for the discovery of promising agents with chemopreventive properties that are able to bind to Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β (ER-β).

Datasets:
  2414 compounds from Indofine,
  55 compounds characterized by Medina-Franco et al. [Medina-Franco et al., 2010], and
  21 known ER ligands retrieved from PubChem.

LiSIs Showcase Workflow

LiSIs Showcase Docking Results

(a) ER-α Docking Score (b) ER-β Docking Score

LiSIs Showcase Discussion

From the Indofine dataset (2414 compounds), based on natural-like criteria and docking results, we selected:
  18 potential ER ligands,
  which were further investigated in vitro with the ER binding assay described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005] with minor modifications;
  15 out of 18 compounds (83.3%) were experimentally confirmed active.
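The figures on this slide can be checked with a few lines of arithmetic. The variable names below are illustrative only; the counts are the ones quoted above.

```python
screened = 2414   # compounds in the Indofine dataset
selected = 18     # potential ER ligands selected by the workflow
confirmed = 15    # selected compounds confirmed active in vitro

hit_rate = confirmed / selected            # fraction of selected compounds that were active
selection_fraction = selected / screened   # fraction of the dataset taken forward to the assay

print(f"Hit rate among selected compounds: {hit_rate:.1%}")
print(f"Fraction of dataset selected: {selection_fraction:.2%}")
```

Fewer than 1% of the screened compounds were taken forward, and most of those were confirmed active.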

Self-Adaptive Multi-Objective Evolutionary Algorithm

Multi-Objective Algorithms for Molecular Design

Name | MO Method | Search Method | Remarks | Reference
EA-Inventor | Weighted | Evolutionary Algorithm | Ligand | [Feher et al., 2008]
GANDI | Weighted | Parallel Evolutionary Algorithm | Structure | [Dey and Caflisch, 2008]
FOG | Weighted | Evolutionary Algorithm | Ligand | [Kutchukian et al., 2009]
MEGA | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Nicolaou et al., 2009a]
PLD | Pareto based | Evolutionary Algorithm | ADME related properties | [Ekins et al., 2010]
NovoFLAP | Weighted | Evolutionary Algorithm | Ligand | [Damewood et al., 2010]
PhDD | Weighted | Workflow | Pharmacophore | [Huang et al., 2010]
DOGS | Weighted | Workflow | Ligand | [Hartenfeller et al., 2012]
LiGen | Weighted | Workflow | Ligand, Structure & Pharmacophore | [Beccari et al., 2013]
MOARF | Weighted | Workflow | Ligand & Structure | [Firth et al., 2015]
Synopsis | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Daeyaert and Deem, 2016]
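The "MO Method" column distinguishes weighted from Pareto based approaches. The sketch below illustrates the difference on two made-up candidates with two objectives to minimise. The scores, weights and function names are hypothetical and do not come from any of the listed tools.

```python
def weighted_sum(scores, weights):
    # Weighted method: collapse all objectives into a single number to minimise.
    return sum(w * s for w, s in zip(weights, scores))


def dominates(a, b):
    # Pareto method: a dominates b if it is no worse everywhere and better somewhere.
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


candidate_a = (0.2, 0.7)  # good on objective 1, poor on objective 2
candidate_b = (0.6, 0.3)  # poor on objective 1, good on objective 2

# The weighted ranking flips with the choice of weights.
prefers_a = weighted_sum(candidate_a, (0.8, 0.2)) < weighted_sum(candidate_b, (0.8, 0.2))
prefers_b = weighted_sum(candidate_b, (0.2, 0.8)) < weighted_sum(candidate_a, (0.2, 0.8))

# Under Pareto dominance neither candidate beats the other, so both are kept.
mutually_non_dominated = not dominates(candidate_a, candidate_b) and not dominates(candidate_b, candidate_a)
```

A weighted method therefore needs the trade-off fixed up front, while a Pareto based method returns the whole set of trade-offs.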

Motivation & Objectives

Motivation
  Find suitable search parameters for an algorithm in a given problem, and
  Automate this process.

Objectives
  Design and develop an algorithm:
    To search for the fittest search parameters of MOEAs,
    To be problem agnostic, and
    Evaluate on our previously proposed eMEGA for molecular De Novo Design.

About Self-Adaptive MOEA

Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010],
Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998],
Optimise eMEGA parameters:
  Mutation Rate,
  Crossover Rate,
  Parent Selection Type,
  Population Diversity Type.

Objective fitness functions for the meta-level:
  The percentage of non-dominated solutions each eMEGA has per iteration,
  The percentage of unique solutions each eMEGA has per iteration,
  Pareto Front Hypervolume each eMEGA has per iteration.
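The three meta-level objectives listed above can be sketched as follows for a population scored on two objectives that are both minimised. This is an illustrative reading of the slide, not the thesis code: the function names, the reference point and the restriction to two objectives are assumptions.

```python
def dominates(a, b):
    # a dominates b if it is no worse in every objective and better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    # Points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def non_dominated_fraction(points):
    # Share of the population that lies on the current Pareto front.
    return len(non_dominated(points)) / len(points)


def unique_fraction(solutions):
    # Share of distinct solutions, e.g. distinct molecule identifiers.
    return len(set(solutions)) / len(solutions)


def hypervolume_2d(points, reference):
    # Area dominated by the front and bounded by the reference point (two objectives).
    front = sorted(set(non_dominated(points)))
    area, previous_y = 0.0, reference[1]
    for x, y in front:
        area += (reference[0] - x) * (previous_y - y)
        previous_y = y
    return area


scores = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.9, 0.9)]
print(non_dominated_fraction(scores))                         # 3 of the 4 points are non-dominated
print(unique_fraction(["CCO", "CCO", "CCN", "c1ccccc1"]))     # 3 distinct strings out of 4
print(hypervolume_2d(scores, reference=(1.0, 1.0)))
```

With objectives normalised to the range 0 to 1 and the reference point at (1, 1), all three values fall between 0 and 1.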

Self-Adaptive MOEA Pseudocode

Self-Adaptive MOEA Chromosome

Chromosomes Example

Objective Fitness Functions

Objective Fitness Function | Range | Example
Non-dominated Solutions % | 0 - 1.0 | 0.90
Unique Solutions % | 0 - 1.0 | 0.88
Pareto Front Hypervolume | 0 - 1.0 | 0.56
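One possible in-memory form of a meta-level chromosome is sketched below. The four fields follow the eMEGA parameters named in this deck, and the option names are the ones that appear in its result tables. The class name, field names and validation rules are hypothetical.

```python
import random
from dataclasses import dataclass

SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class MetaChromosome:
    """One candidate set of eMEGA search parameters (hypothetical representation)."""
    mutation_rate: float
    crossover_rate: float
    selection_type: str
    diversity_type: str

    def __post_init__(self):
        # Reject values outside the ranges and option lists assumed above.
        if not 0.0 <= self.mutation_rate <= 1.0:
            raise ValueError("mutation_rate must be in [0, 1]")
        if not 0.0 <= self.crossover_rate <= 1.0:
            raise ValueError("crossover_rate must be in [0, 1]")
        if self.selection_type not in SELECTION_TYPES:
            raise ValueError("unknown selection type")
        if self.diversity_type not in DIVERSITY_TYPES:
            raise ValueError("unknown diversity type")


def random_chromosome(rng):
    # Draw each gene independently; a real initialisation scheme may differ.
    return MetaChromosome(
        mutation_rate=round(rng.random(), 3),
        crossover_rate=round(rng.random(), 3),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


example = MetaChromosome(0.175, 0.818, "roulette", "phenotype")
```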

eMEGA Chromosome

Graph based, and
Information related to evolutionary design process.

Self-Adaptive MOEA Flowchart

Self-Adaptive MOEA Showcases

Validation of Self-Adaptive MOEA: About

Compare SAMOEA, eMEGA and MOARF [Firth et al., 2015].
Design molecules that are similar to the target molecule, Seliciclib, in structure and in chemical properties.

Figure: Seliciclib (CYC202, R-roscovitine)

Validation of Self-Adaptive MOEA: Starting Datasets

Starting Molecules datasets:
  Maybridge's Screening Library, which contains 53953 molecules (Dataset 1),
  Asinex's Elite Libraries, which contain 104577 molecules (Dataset 2).

Validation of Self-Adaptive MOEA: Settings

eMEGA Settings

Dataset | Objectives | Population | Iterations | Evolutionary Operations
Dataset 1, Dataset 2 | Structural Similarity, Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%, Crossover Probability: 80%, Selection Type: Roulette, Diversity Type: Genotype

SAMOEA Settings

Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations
SAMOEA | Dataset 1, Dataset 2 | Non Dominated Solutions Percentage, Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%, Crossover Probability: 80%, Selection Type: Roulette, Diversity Type: Phenotype
eMEGA | Dataset 1, Dataset 2 | Structural Similarity, Chemical Descriptor Similarity | 100 | 1 | Defined during run time, based on SAMOEA's chromosomes.

Validation of Self-Adaptive MOEA: Results

Validation of Self-Adaptive MOEA: Results - Search Settings (1)

SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

Mutation     Crossover    Selection   Diversity  Non Dominated  Unique       Rank
Probability  Probability  Type        Type       %              Solutions %
0.029        0.694        roulette    genotype   0.9            0.986        1
0.175        0.818        roulette    phenotype  0.914          0.961        1
0.172        0.818        tournament  phenotype  0.934          0.9533       1
0.026        0.694        roulette    phenotype  0.928          0.955        1
0.001        0.963        roulette    phenotype  0.982          0.848        1
0.177        0.818        roulette    phenotype  0.921          0.956        1
0.083        0.73         tournament  phenotype  0.95           0.946        1
0.086        0.798        tournament  genotype   0.976          0.928        1
0.172        0.818        best        genotype   0.914          0.973        2
0.176        0.818        roulette    genotype   0.9312         0.956        2

Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
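To make the 'Rank' column concrete, here is a small Python sketch of non-dominance ranking for minimised objectives, applied to the two objective columns of the Maybridge table. The function names are illustrative, and the ranking is recomputed only among the ten listed rows, whereas SAMOEA would have ranked its whole population.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominance_ranks(points: Sequence[Point]) -> List[int]:
    """Give rank 1 to the non-dominated points, remove them, and repeat."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs from the Maybridge table, in row order.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955), (0.982, 0.848),
    (0.921, 0.956), (0.95, 0.946), (0.976, 0.928), (0.914, 0.973), (0.9312, 0.956),
]
ranks = non_dominance_ranks(maybridge)
```

Ranked among these ten rows alone, the first eight come out as rank 1 and the last two as rank 2, which agrees with the Rank column.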

Validation of Self-Adaptive MOEA: Results - Search Settings (2)

SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Mutation     Crossover    Selection   Diversity  Non Dominated  Unique       Rank
Probability  Probability  Type        Type       %              Solutions %
0.105        1.0          best        phenotype  0.988          0.931        1
0.139        0.963        tournament  phenotype  0.962          0.956        1
0.089        0.694        tournament  genotype   0.976          0.943        1
0.139        0.969        best        phenotype  0.96           0.96         1
0.108        0.69         tournament  genotype   0.955          0.962        1
0.1          1.0          best        phenotype  0.988          0.942        1
0.088        0.685        tournament  genotype   0.96           0.962        1
0.139        0.966        roulette    phenotype  0.965          0.948        1
0.089        0.709        tournament  genotype   0.964          0.957        2

Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
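The note says the two SAMOEA objectives are reported as 1 minus a percentage so that smaller is better. A minimal Python sketch of that transformation follows. The slides do not define how the percentages are computed, so the definition used here (distinct solutions divided by population size) and the function names are assumptions for illustration.

```python
from typing import Hashable, Sequence


def minimised_score(count: int, total: int) -> float:
    """Turn 'count out of total' into a value to minimise: 1 minus the fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1.0 - count / total


def unique_solutions_score(solutions: Sequence[Hashable]) -> float:
    """1 minus the fraction of distinct solutions (the identity test is assumed)."""
    return minimised_score(len(set(solutions)), len(solutions))


# Hypothetical population of 100 molecules written as SMILES strings,
# in which only four distinct structures occur.
population = ["CCO", "CCN", "CCC", "c1ccccc1"] * 25
score = unique_solutions_score(population)
```

Under this assumed definition a population made up entirely of distinct solutions scores 0, the best possible value.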

Use Case 1: About

Design molecules that bind to ER-α based on:

Structural similarity to Tamoxifen, and
Structural dissimilarity to Ibuproxam.

Figure: (a) Tamoxifen. (b) Ibuproxam.
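Structural similarity between molecules is commonly scored with the Tanimoto coefficient over fingerprint bits. The slide does not state which fingerprint or similarity measure was used, so the Python sketch below is an assumption for illustration only. The bit sets are made-up placeholders, not real fingerprints of Tamoxifen or Ibuproxam, and the function names are hypothetical.

```python
from typing import AbstractSet, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| of two fingerprint bit sets."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention.
    return len(a & b) / union if union else 1.0


def objectives(
    candidate: AbstractSet[int],
    similar_to: AbstractSet[int],
    dissimilar_to: AbstractSet[int],
) -> Tuple[float, float]:
    """Two scores to maximise: similarity to one reference, dissimilarity to the other."""
    return tanimoto(candidate, similar_to), 1.0 - tanimoto(candidate, dissimilar_to)


# Placeholder bit sets (hypothetical, not real molecular fingerprints).
reference_a = {1, 2, 3, 4, 5, 6}
reference_b = {5, 6, 7, 8}
candidate = {1, 2, 3, 4, 9}

sim, dissim = objectives(candidate, reference_a, reference_b)
```

Expressing dissimilarity as 1 minus similarity keeps both objectives on the same 0 to 1 scale.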

Use Case 1: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
    Clean (substances with "clean" reactivity),
    In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
    Now (immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.


Use Case 1: Results - In objective space

Use Case 1: Results - Designed molecules

Use Case 1: Results - AutoDock Vina docking

Molecule Id              Docking Affinity (kcal/mol)
Tamoxifen                -8.2
DnD 6 SP 20 4 X 13a      -7.9
DnD 31 SP 150 37 M 19    -7.9
DnD 8 SP 9 2 M 13        -7.8
DnD 4 SP 199 49 X 46b    -7.7
DnD 12 SP 75 18 M 13     -7.6
DnD 31 SP 6 1 M 16       -7.2
DnD 15 SP 168 41 M 0     -7.2
DnD 11 SP 74 18 M 4      -7.1
DnD 31 SP 193 48 X 76b   -6.9
DnD 1 SP 78 19 X 84a     -6.8

Use Case 1: Results - Self-Adaptive MOEA non dominated settings for eMEGA

Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto       Rank
Probability  Probability  Type        Type       %              Hypervolume
0.15777      0.80279      tournament  genotype   0.634          0.341        1
0.15613      0.88305      tournament  genotype   0.634          0.341        1
0.15627      0.88891      tournament  genotype   0.634          0.341        1
0.15688      0.88891      roulette    genotype   0.649          0.340        1
0.00552      0.94308      best        genotype   0.624          0.427        1

Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
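The 'Pareto Hypervolume' column refers to the hypervolume indicator. For two minimised objectives this is the area that a front dominates, bounded by a reference point. The slides do not state the reference point, the normalisation, or how the listed values were scaled, so the Python sketch below assumes a unit objective space with reference point (1, 1) and uses a hypothetical front rather than thesis data.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(front: Iterable[Point], reference: Point = (1.0, 1.0)) -> float:
    """Area dominated by `front` and bounded by `reference` (both objectives minimised)."""
    # Keep only points that are strictly better than the reference in both objectives.
    pts = sorted(p for p in front if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_y = reference[1]
    for x, y in pts:
        if y < best_y:  # dominated points add no area
            area += (reference[0] - x) * (best_y - y)
            best_y = y
    return area


# Hypothetical front in a unit objective space.
front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
area = hypervolume_2d(front)
```

Sorting by the first objective lets the area be accumulated as a stack of horizontal slabs, so dominated points can simply be skipped.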

Use Case 3: About

Design molecules that bind to ER-α based on:

Structural similarity to Raloxifene, and
Chemical Properties similarity to Raloxifene.

Figure: Raloxifene.

Use Case 3: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
    Clean (substances with "clean" reactivity),
    In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
    Now (immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.

Use Case 3: Results - In objective space

Use Case 3: Results - Designed molecules

Use Case 3: Results - AutoDock Vina docking

Molecule Id              Docking Affinity (kcal/mol)
DnD 31 SP 194 48 M 49    -8.2
DnD 34 SP 197 49 X 13a   -5.9
Raloxifene               -2.2 (-11.70 PubChem)

Use Case 3: Results - Self-Adaptive MOEA non dominated settings for eMEGA

Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto       Rank
Probability  Probability  Type        Type       %              Hypervolume
0.12927      0.98597      roulette    genotype   0.997          0.274        1
0.12897      0.98588      roulette    genotype   0.997          0.274        1
0.12933      0.98588      roulette    genotype   0.997          0.274        1
0.12946      0.98559      roulette    genotype   0.997          0.274        1
0.12928      0.98582      roulette    genotype   0.997          0.274        1
0.12897      0.98588      tournament  genotype   0.997          0.274        1

Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.

Use Case 4: About

Design molecules that bind to Proteasome B5 based on:

Structural similarity to Ixazomib, and
Chemical Properties similarity to Ixazomib.

Figure: Ixazomib.

Use Case 4: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
    Clean (substances with "clean" reactivity),
    In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
    Now (immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.

Use Case 4: Results - In objective space

Use Case 4: Results - Designed molecules

Use Case 4: Results - AutoDock 4 docking

Molecule Id               Docking Affinity (kcal/mol)
DnD 19 SP 196 48 X 59b    -7.19
DnD 49 SP 193 48 X 123b   -6.68
DnD 1 SP 196 48 X 67a     -6.08

Use Case 4: Results - Self-Adaptive MOEA non dominated settings for eMEGA

Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto       Rank
Probability  Probability  Type        Type       %              Hypervolume
0.09507      0.98194      tournament  phenotype  0.993          0.442        1
0.09507      0.9819       roulette    phenotype  0.991          0.442        1
0.09471      0.98178      roulette    genotype   0.997          0.426        1
0.09484      0.98183      roulette    phenotype  0.996          0.441        1
0.09277      0.98235      roulette    genotype   0.996          0.441        1

Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.

Self-Adaptive MOEA Showcases Discussion

SAMOEA proposed interesting solutions in all problems it has been applied to,
Further in-vitro investigation is required, and
SAMOEA's proposed eMEGA settings differ based on problem and dataset (no silver bullet).


Concluding Remarks


Concluding Remarks - LiSIs platform

Features a Web-based Virtual Screening platform focused on Cancer Chemoprevention Research.
To be expanded in the future with tools featuring the algorithms from the MEGA framework.
A number of scientific workflows (SWs) were implemented for:
    preparing docking models,
    preparing predictive models,
    performing docking experiments,
    using predictive models to predict biochemical properties and behaviour, and
    performing VS workflows.


Concluding Remarks - Self-Adaptive MOEA (1)

Drawbacks:

Needs a lot of time to terminate, and
Very slow convergence.


Concluding Remarks - Self-Adaptive MOEA (2)

Advantages:

Searches a larger space,
Generates far more solutions per iteration,
Proposes the fittest parameter sets that eMEGA should use for the given problem,
Has been built to be adaptable,
Uses objective fitness functions that can evaluate the effectiveness and the progression of any MOEA,
Can be used on other problems,
SAMOEA's chromosome can be expanded with additional search parameters, and
Leverages multi-core parallelism (needs more memory).


Future Work


Future Work - LiSIs platform

Develop LiSIs 2.0:
    Based on the latest Galaxy platform, and
    Redesign of tools to be compatible with Galaxy's ToolShed for easy deployment,
Update LiSIs with a feature to visualise intermediate results from various tools,
Expand LiSIs tools with tools featuring the MEGA line-up of algorithms and SAMOEA,
Explore resource management in SWMSs:
    Novel Multi-Objective Optimization SW design approaches,
    Novel Multi-Objective Optimization SW scheduling approaches.


Future Work - Self-Adaptive MOEA

Optimise MEGA framework (memory management and parallelism),
Implement self-adaptive technique for selecting genetic operators,
Extend Self-Adaptive MOEA to use other MOEAs,
Implement models for other problems, and
Implement new objective functions.


List of Publications

Table of Contents

6 List of Publications
7 References
8 Backup Frames
    Validation of Self-Adaptive MOEA
    Use Case 1
    Use Case 2
    Use Case 3
    Use Case 4

List of Publications

Book Chapters

C. A. Nicolaou and C. C. Kannas, "Molecular Library Design Using Multi-Objective Optimization Methods," in Chemical Library Design, J. Z. Zhou, Ed. Humana Press, 2011, pp. 53–69.

Journals

C. Kannas et al., "LiSIs: An Online Scientific Workflow System for Virtual Screening," Combinatorial Chemistry & High Throughput Screening, vol. 18, no. 3, pp. 281–295, Mar. 2015.

C. A. Nicolaou, C. Kannas, and E. Loizidou, "Multi-objective optimization methods in de novo drug design," Mini Rev Med Chem, vol. 12, no. 10, pp. 979–987, Sep. 2012.

C. Nicolaou, C. Kannas, and C. Pattichis, "Knowledge-driven multi-objective de novo drug design," Chemistry Central Journal, vol. 3, p. P22, 2009.

Conferences

C. C. Kannas and C. S. Pattichis, "Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design," in 30th IEEE International Symposium on Computer-Based Medical Systems, Thessaloniki, Greece, 22-24 June 2017, pp. 1-6.

P. Hasapis et al., "Molecular clustering via knowledge mining from biomedical scientific corpora," in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), 2013, pp. 1-5.

C. C. Kannas et al., "A workflow system for virtual screening in cancer chemoprevention," in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 439–446.

K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S. Pattichis, and V. J. Promponas, "Open source workflow systems in life sciences informatics," in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 552–558.

C. A. Nicolaou, C. Kannas, and C. S. Pattichis, "Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm," in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.

C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, "A Parallel implementation of a Multi-objective Evolutionary Algorithm," in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.

Abstracts

C. C. Kannas and C. S. Pattichis, "Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design," in 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju Island, Korea, 11-15 July 2017.

References

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 66 / 130

Table of Contents

6 List of Publications7 References8 Backup Frames

Validation of Self-Adaptive MOEAUse Case 1Use Case 2Use Case 3Use Case 4

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 66 / 130

References I

Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G.(2013). LiGen: A High Performance Workflow for ChemistryDriven de Novo Design. Journal of Chemical Information andModeling.

Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G.,Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010).Galaxy: A Web-Based Genome Analysis Tool forExperimentalists. In Current Protocols in Molecular Biology.John Wiley & Sons, Inc.

Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm forEfficient De Novo Design of Multi-functional Molecules.Molecular Informatics, pages n/a–n/a.

References II

Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010).NovoFLAP: A ligand-based de novo design approach for thegeneration of medicinally relevant ideas. Journal of ChemicalInformation and Modeling, 50(7):1296–1303.

Dey, F. and Caflisch, A. (2008). Fragment-based de novo liganddesign by multiobjective evolutionary optimization. Journal ofChemical Information and Modeling, 48(3):679–690.

Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolvingmolecules using multi-objective optimization: applying toADME/Tox. Drug Discovery Today, 15(11-12):451–460.

References III

Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders,J. (2008). The use of ligand-based de novo design for scaffoldhopping and sidechain optimization: two case studies. Bioorganic& Medicinal Chemistry, 16(1):422–427.

Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015).MOARF, an Integrated Workflow for MultiobjectiveOptimization: Implementation, Synthesis, and BiologicalEvaluation. Journal of Chemical Information and Modeling.

Fonseca, C. and Fleming, P. (1998). Multiobjective optimizationand multiple constraint handling with evolutionary algorithms. I.A unified formulation. IEEE Transactions on Systems, Man andCybernetics, Part A: Systems and Humans, 28(1):26–37.

References IV

Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski,L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J.,Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: APlatform for Interactive Large-Scale Genome Analysis. GenomeResearch, 15(10):1451–1455.

Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T.(2010). Galaxy: A comprehensive approach for supportingaccessible, reproducible, and transparent computational researchin the life sciences. Genome Biology, 11(8):R86.

Grefenstette, J. (1986). Optimization of Control Parameters forGenetic Algorithms. IEEE Transactions on Systems, Man andCybernetics, 16(1):122–128.

References V

Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J.H. N. (2005). A novel microplate reader-based high-throughputassay for estrogen receptor binding. International Journal ofEnvironmental Analytical Chemistry, 85(3):149–161.

Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F.,Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012).DOGS: Reaction-Driven de novo Design of BioactiveCompounds. PLoS Comput Biol, 8(2):e1002380.

Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a newpharmacophore-based de novo design method of drug-likemolecules combined with assessment of synthetic accessibility.Journal of Molecular Graphics and Modelling, 28(8):775–787.

References VI

Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva,C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D.,Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser,C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs:An Online Scientific Workflow System for Virtual Screening.Combinatorial Chemistry & High Throughput Screening,18(3):281 – 295.

Kramer, O. (2010). Evolutionary self-adaptation: a survey ofoperators and strategy parameters. Evolutionary Intelligence,3(2):51–65.

References VII

Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG:Fragment Optimized Growth algorithm for the de novogeneration of molecules occupying druglike chemical space.Journal of Chemical Information and Modeling, 49(7):1630–1642.

Medina-Franco, J. L., Lopez-Vallejo, F., Kuck, D., and Lyko, F.(2010). Natural products as DNA methyltransferase inhibitors: acomputer-aided discovery approach. Molecular Diversity,15:293–304.

Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). DeNovo Drug Design Using Multiobjective Evolutionary Graphs.Journal of Chemical Information and Modeling, 49(2):295–307.

References VIII

Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b).Optimal graph design using a knowledge-driven multi-objectiveevolutionary graph algorithm. In 2009 9th InternationalConference on Information Technology and Applications inBiomedicine, pages 1–6, Larnaka, Cyprus. IEEE.

Backup Frames


Pareto Ranking

LiSIs Showcase - Known ER Ligands

A/A  Estrogen Ligand        Docking Score ER-α  Docking Score ER-β
1    Raloxifene             -11.70              -8.72
2    Lilly-117018           -11.53              -3.80
3    3-HydroxyTamoxifen     -11.02              N/A
4    Nafoxidine             -10.88              N/A
5    ICI-182780             -10.73              N/A
6    Pyrolidine             -10.04              N/A
7    Clomiphene A           -10.01              N/A
8    Nitrofinene Citrate    -9.87               N/A
9    ICI-164384             -9.82               -9.13
10   Moxestrol              -9.38               -9.77
11   Naringenine            -8.55               -7.80
12   Triphenylethylene      -8.50               N/A
13   Afema                  -8.15               -7.78
14   Danazol                -6.99               N/A
15   Ethamoxytriphetol      -6.67               N/A
16   4-HydroxyTamoxifen     -6.60               N/A
17   Dioxin                 -6.22               N/A
18   Estralutin             -5.86               -3.80
19   Cyclopentanone         -4.88               N/A
20   Miproxifene Phosphate  -4.48               N/A
21   EM-800                 N/A                 N/A

Note: The list was retrieved from PubChem and it includes compounds characterized as “estrogen ligands”. N/A: no binding affinity.

LiSIs Showcase - Natural-like Rule of 5 filter

GRANATUM Rule of 5 filter:

1 MW between 160 and 700,
2 HBD less than or equal to 5,
3 HBA less than or equal to 10,
4 TPSA less than 140, and
5 cLogP between -0.4 and 5.6.
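A minimal sketch of how the five criteria above could be applied as a filter. The class and function names are hypothetical, the descriptor values are assumed to be precomputed by a cheminformatics toolkit, and treating "between" as inclusive is an assumption, since the slide does not say.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mw: float     # molecular weight
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors
    tpsa: float   # topological polar surface area
    clogp: float  # calculated logP


def passes_granatum_ro5(d: Descriptors) -> bool:
    """Return True when all five criteria listed on the slide hold.

    Assumption: the 'between' ranges include their end points.
    """
    return (
        160 <= d.mw <= 700
        and d.hbd <= 5
        and d.hba <= 10
        and d.tpsa < 140
        and -0.4 <= d.clogp <= 5.6
    )


# Illustrative (made-up) descriptor values, not taken from the thesis.
example = Descriptors(mw=350.0, hbd=2, hba=5, tpsa=80.0, clogp=3.1)
print(passes_granatum_ro5(example))
```

A molecule is rejected as soon as any single criterion fails, so the order of the checks does not change the result.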

eMEGA Settings

Table: eMEGA experimental design settings

Dataset: Dataset 1, Dataset 2
Objectives: Structural Similarity; Chemical Descriptor Similarity
Population: 500
Iterations: 500
Evolutionary Operations:
  Mutation Probability: 15%
  Crossover Probability: 80%
  Selection Type: Roulette
  Diversity Type: Genotype

SAMOEA Settings

Table: SAMOEA experimental design settings

SAMOEA
  Dataset: Dataset 1, Dataset 2
  Objectives: Non Dominated Solutions Percentage; Unique Solutions Percentage
  Population: 20
  Iterations: 100
  Evolutionary Operations:
    Mutation Probability: 15%
    Crossover Probability: 80%
    Selection Type: Roulette
    Diversity Type: Phenotype

eMEGA
  Dataset: Dataset 1, Dataset 2
  Objectives: Structural Similarity; Chemical Descriptor Similarity
  Population: 100
  Iterations: 1
  Evolutionary Operations: Defined during run time, based on SAMOEA’s chromosomes.
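The settings above say that each eMEGA instance takes its evolutionary operations from a SAMOEA chromosome. A minimal sketch of what such a chromosome could look like is given below. The class, function and field names are hypothetical, the uniform random initialisation is an assumption, and the option lists are the selection and diversity types that appear in these slides.

```python
import random
from dataclasses import dataclass

# Option values as they appear in the slides' settings tables.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class EmegaSettings:
    """One SAMOEA chromosome: a candidate configuration for eMEGA."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str


def random_settings(rng: random.Random) -> EmegaSettings:
    """Draw one configuration uniformly at random (an assumption)."""
    return EmegaSettings(
        mutation_probability=rng.random(),
        crossover_probability=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


def initial_population(size: int = 20, seed: int = 0) -> list:
    """Build a starting SAMOEA population; 20 matches the table above."""
    rng = random.Random(seed)
    return [random_settings(rng) for _ in range(size)]


population = initial_population()
print(len(population), population[0])
```

Each chromosome would then be scored by running eMEGA with those settings and measuring the two SAMOEA objectives; that evaluation step is not shown here.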

Virtual Machine Specifications

Table: Specifications of the virtual machine on which the experimental runs were performed

Linux Virtual Machine
CPU: 4x Virtual CPU @ 2GHz
RAM: 16GB
OS: CentOS 6

eMEGA Maybridge Run 1

Figure: eMEGA Run 1 results for Maybridge dataset.

eMEGA Maybridge Run 2

Figure: eMEGA Run 2 results for Maybridge dataset.

eMEGA Maybridge Run 3

Figure: eMEGA Run 3 results for Maybridge dataset.

eMEGA Maybridge Run 4

Figure: eMEGA Run 4 results for Maybridge dataset.

eMEGA Maybridge Run 5

Figure: eMEGA Run 5 results for Maybridge dataset.

eMEGA Maybridge All Runs

Figure: eMEGA results for Maybridge dataset.

eMEGA Maybridge All Runs Top 10 Results (1)

Figure: eMEGA Top 10 results for Maybridge dataset.

eMEGA Maybridge All Runs Top 10 Results (2)

Figure: eMEGA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

eMEGA Asinex Run 1

Figure: eMEGA Run 1 results for Asinex dataset.

eMEGA Asinex Run 2

Figure: eMEGA Run 2 results for Asinex dataset.

eMEGA Asinex Run 3

Figure: eMEGA Run 3 results for Asinex dataset.

eMEGA Asinex Run 4

Figure: eMEGA Run 4 results for Asinex dataset.

eMEGA Asinex Run 5

Figure: eMEGA Run 5 results for Asinex dataset.

eMEGA Asinex All Runs

Figure: eMEGA results for Asinex dataset.

eMEGA Asinex All Runs Top 10 Results (1)

Figure: eMEGA Top 10 results for Asinex dataset.

eMEGA Asinex All Runs Top 10 Results (2)

Figure: eMEGA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

SAMOEA Maybridge Run 1

Figure: SAMOEA Run 1 results for Maybridge dataset.

SAMOEA Maybridge Run 2

Figure: SAMOEA Run 2 results for Maybridge dataset.

SAMOEA Maybridge Run 3

Figure: SAMOEA Run 3 results for Maybridge dataset.

SAMOEA Maybridge Run 4

Figure: SAMOEA Run 4 results for Maybridge dataset.

SAMOEA Maybridge Run 5

Figure: SAMOEA Run 5 results for Maybridge dataset.

SAMOEA Maybridge All Runs

Figure: SAMOEA results for Maybridge dataset.

SAMOEA Maybridge All Runs Top 10 Results (1)

Figure: SAMOEA Top 10 results for Maybridge dataset.

SAMOEA Maybridge All Runs Top 10 Results (2)

Figure: SAMOEA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Unique Solutions %  Rank
0.029                 0.694                  roulette        genotype        0.9              0.986               1
0.175                 0.818                  roulette        phenotype       0.914            0.961               1
0.172                 0.818                  tournament      phenotype       0.934            0.9533              1
0.026                 0.694                  roulette        phenotype       0.928            0.955               1
0.001                 0.963                  roulette        phenotype       0.982            0.848               1
0.177                 0.818                  roulette        phenotype       0.921            0.956               1
0.083                 0.73                   tournament      phenotype       0.95             0.946               1
0.086                 0.798                  tournament      genotype        0.976            0.928               1
0.172                 0.818                  best            genotype        0.914            0.973               2
0.176                 0.818                  roulette        genotype        0.9312           0.956               2

Note: The numbers for ’Non Dominated %’ and ’Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.
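The non dominance rank in the note can be illustrated with a simple front-peeling procedure: the solutions that no other solution dominates get rank 1, they are removed, and the process repeats on the rest. This is a sketch under stated assumptions, not the thesis implementation. The function names are hypothetical, both objectives are treated as minimised (following the note), and the input is the pair of objective columns from the ten listed Maybridge rows only.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points):
    """Assign rank 1 to the non-dominated points, rank 2 to the next front, etc."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # A point belongs to the current front if nothing left dominates it.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs, in the order listed in the table.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955),
    (0.982, 0.848), (0.921, 0.956), (0.95, 0.946), (0.976, 0.928),
    (0.914, 0.973), (0.9312, 0.956),
]
print(pareto_ranks(maybridge))
```

The repeated scan is quadratic per front, which is fine for a handful of rows; a faster non-dominated sort would be the usual choice for large populations.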

SAMOEA Asinex Run 1

Figure: SAMOEA Run 1 results for Asinex dataset.

SAMOEA Asinex Run 2

Figure: SAMOEA Run 2 results for Asinex dataset.

SAMOEA Asinex Run 3

Figure: SAMOEA Run 3 results for Asinex dataset.

SAMOEA Asinex Run 4

Figure: SAMOEA Run 4 results for Asinex dataset.

SAMOEA Asinex All Runs

Figure: SAMOEA results for Asinex dataset.

SAMOEA Asinex All Runs Top 10 Results (1)

Figure: SAMOEA Top 10 results for Asinex dataset.

SAMOEA Asinex All Runs Top 10 Results (2)

Figure: SAMOEA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Unique Solutions %  Rank
0.105                 1.0                    best            phenotype       0.988            0.931               1
0.139                 0.963                  tournament      phenotype       0.962            0.956               1
0.089                 0.694                  tournament      genotype        0.976            0.943               1
0.139                 0.969                  best            phenotype       0.96             0.96                1
0.108                 0.69                   tournament      genotype        0.955            0.962               1
0.1                   1.0                    best            phenotype       0.988            0.942               1
0.088                 0.685                  tournament      genotype        0.96             0.962               1
0.139                 0.966                  roulette        phenotype       0.965            0.948               1
0.089                 0.709                  tournament      genotype        0.964            0.957               2

Note: The numbers for ’Non Dominated %’ and ’Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.

MOARF Results

Figure: MOARF’s results compared with Seliciclib.

Compare SAMOEA, eMEGA and MOARF

Figure: Comparison of all Top 10 results with MOARF’s results and Seliciclib.

Discussion (1)

eMEGA and SAMOEA generate molecules that approximate Seliciclib,
Datasets and algorithms have different common cores with Seliciclib,
MOARF approximates Seliciclib better than eMEGA and SAMOEA:
  It generates molecules in a more chemistry-oriented way, with fewer stochastic operations,
  It starts from a core selected for the target and then attaches new fragments to it,
SAMOEA explores the space better than eMEGA and MOARF.

Discussion (2)

From the SAMOEA proposed eMEGA settings tables we can see that different settings are favoured for each dataset.

Maybridge dataset:

Mutation probability around 17%,
Crossover probability around 80%,
Selection type either roulette or tournament, and
Diversity type: both options are valid.

Asinex dataset:

Mutation probability around 10%,
Crossover probability around 96%,
Selection type either best or tournament, and
Diversity type: both options are valid.

Discussion (3)

The objective fitness scores for the proposed settings are very high, which means that the actual percentage is really low, below 5%. From this we can conclude the following:

eMEGA instances generate a large number of identical solutions even though they have different configurations; we also noticed this in previous experiments comparing MEGA, eMEGA and MOGA [Nicolaou et al., 2009b], and
The objective fitness functions we chose to use in SAMOEA compete with each other, which means that having eMEGA instances generate a high number of unique and non dominated solutions (above 20%) proves to be a difficult task.

Use Case 1: Docked designed molecules (1)

Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.

Use Case 1: Docked designed molecules (2)

Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.

Use Case 1: Docked designed molecules (3)

Figure: Designed molecule DnD 8 SP 9 2 M 13 docked to ER-α.

Use Case 1: Docked designed molecules (4)

Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.

Use Case 1: Docked designed molecules (5)

Figure: Designed molecule DnD 12 SP 75 18 M 13 docked to ER-α.

Use Case 1: Docked designed molecules (6)

Figure: Designed molecule DnD 31 SP 6 1 M 16 docked to ER-α.

Use Case 1: Docked designed molecules (7)

Figure: Designed molecule DnD 15 SP 168 41 M 0 docked to ER-α.

Use Case 1: Docked designed molecules (8)

Figure: Designed molecule DnD 11 SP 74 18 M 4 docked to ER-α.

Use Case 1: Docked designed molecules (9)

Figure: Designed molecule DnD 31 SP 193 48 X 76b docked to ER-α.

Use Case 1: Docked designed molecules (10)

Figure: Designed molecule DnD 1 SP 78 19 X 84a docked to ER-α.

Use Case 2: About

Design molecules that bind to ER-α based on:

Structural similarity to Tamoxifen, and
Chemical properties similarity to Tamoxifen.

Figure: Tamoxifen.
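The slides do not state which similarity measure is used for the structural objective. As an illustration only, the sketch below uses the Tanimoto coefficient on sets of structural features, a common choice for comparing molecular fingerprints. The function name and the feature sets are hypothetical; a real run would derive the features from the molecules with a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets, in [0, 1]."""
    if not a and not b:
        # Two empty fingerprints are treated as identical (a convention).
        return 1.0
    return len(a & b) / len(a | b)


# Made-up feature identifiers standing in for fingerprint bits.
reference_features = {1, 2, 3, 5, 8}
candidate_features = {2, 3, 5, 13}

print(tanimoto(reference_features, candidate_features))
```

A value of 1 means the two feature sets are identical and 0 means they share nothing, so a design objective built on it would reward candidates whose score against the reference is high.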

Use Case 2: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
  Clean (substances with “clean” reactivity),
  In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
  Now (immediate delivery, includes in-stock and agent).

The collection contains 7035 molecules.

Use Case 2: Results - In objective space

Use Case 2: Results - Designed molecules

Use Case 2: Results - AutoDock Vina docking

Molecule Id             Docking Affinity (kcal/mol)
DnD 42 SP 194 48 X 96b  -10.1
DnD 17 SP 199 49 M 4    -10
DnD 33 SP 189 47 X 66b  -9.9
DnD 48 SP 193 48 M 5    -9.6
Tamoxifen               -8.2
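A short sketch of how the docking table can be read. AutoDock Vina reports predicted binding affinity in kcal/mol, where a more negative value indicates stronger predicted binding, so the designed molecules are compared against the Tamoxifen score as the reference. The variable names are hypothetical; the values are the ones listed in the table.

```python
# Docking affinities in kcal/mol, copied from the table above.
docking = {
    "DnD 42 SP 194 48 X 96b": -10.1,
    "DnD 17 SP 199 49 M 4": -10.0,
    "DnD 33 SP 189 47 X 66b": -9.9,
    "DnD 48 SP 193 48 M 5": -9.6,
    "Tamoxifen": -8.2,
}

reference = docking["Tamoxifen"]

# Keep the molecules that score lower (better) than the reference,
# ordered from the strongest predicted binder to the weakest.
better_than_reference = sorted(
    (name for name, score in docking.items() if score < reference),
    key=docking.get,
)

for name in better_than_reference:
    print(name, docking[name])
```

Docking scores are predictions, so an ordering like this is a way to prioritise candidates rather than a measurement of binding.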

Use Case 2: Results - Self-Adaptive MOEA non dominated settings for eMEGA

Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Pareto Hypervolume  Rank
0.02707               0.97973                tournament      genotype        0.983            0.153               1
0.02758               0.97965                tournament      phenotype       0.988            0.152               1

Note: The numbers for ’Non Dominated %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.
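The table uses a Pareto hypervolume as its second objective, but the slides do not give its exact definition, reference point or orientation. As a hedged illustration of the general idea, the sketch below computes the two-objective hypervolume for a minimisation problem: the area dominated by a set of points and bounded by a reference point. The function name and the example points are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimised).

    Points that are not strictly better than `ref` in both objectives
    contribute nothing.
    """
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in inside:
        # Sweeping by increasing x, only a point that lowers the best y so far
        # adds a new horizontal slab of area.
        if y < best_y:
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


# Made-up front on the unit square with the reference point at (1, 1).
front = [(0.25, 0.75), (0.75, 0.25)]
print(hypervolume_2d(front, ref=(1.0, 1.0)))
```

A larger area means the front covers more of the objective space, which is why hypervolume is often used as a single-number quality indicator for a set of non dominated solutions.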

Use Case 2: Docked designed molecules (1)

Figure: Designed molecule DnD 42 SP 194 48 X 96b docked to ER-α.

Use Case 2: Docked designed molecules (2)

Figure: Designed molecule DnD 17 SP 199 49 M 4 docked to ER-α.

Use Case 2: Docked designed molecules (3)

Figure: Designed molecule DnD 33 SP 189 47 X 66b docked to ER-α.

Use Case 2: Docked designed molecules (4)

Figure: Designed molecule DnD 48 SP 193 48 M 5 docked to ER-α.

Use Case 3: Docked designed molecules (1)

Figure: Designed molecule DnD 31 SP 194 48 M 49 docked to ER-α.

Use Case 3: Docked designed molecules (2)

Figure: Designed molecule DnD 34 SP 197 49 X 13a docked to ER-α.

Use Case 4: Docked designed molecules (1)

Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to Proteasome B5.

Use Case 4: Docked designed molecules (2)

Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to Proteasome B5.

Use Case 4: Docked designed molecules (3)

Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to Proteasome B5.
