ckannas phd thesis slides

Scientific Workflow Systems and Multi-Objective Evolutionary Algorithms for Life Sciences Informatics Christos C. Kannas Computer Science, University of Cyprus 6th June 2017


Page 1: CKannas PhD Thesis Slides

Scientific Workflow Systems and Multi-Objective Evolutionary Algorithms for Life Sciences Informatics

Christos C. Kannas

Computer Science, University of Cyprus

6th June 2017

Page 2: CKannas PhD Thesis Slides

Table of Contents

1 Introduction
    Scientific Workflow Management Systems
    Self-Adaptive Multi-Objective Evolutionary Algorithms
    Virtual Screening & De Novo Molecular Design

2 Life Sciences Informatics platform
    About Life Sciences Informatics platform
    LiSIs Showcase
    LiSIs Showcase Discussion

3 Self-Adaptive Multi-Objective Evolutionary Algorithm
    About Self-Adaptive MOEA
    Self-Adaptive MOEA Showcases
    Self-Adaptive MOEA Showcases Discussion

4 Concluding Remarks
    Concluding Remarks - LiSIs platform
    Concluding Remarks - Self-Adaptive MOEA

5 Future Work
    Future Work - LiSIs platform
    Future Work - Self-Adaptive MOEA

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 1 / 130

Page 3: CKannas PhD Thesis Slides

Introduction

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 2 / 130

Page 5: CKannas PhD Thesis Slides

Scientific Workflow Management Systems

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 3 / 130

Page 6: CKannas PhD Thesis Slides

SWMSs Application Domains

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 4 / 130

Page 7: CKannas PhD Thesis Slides

Self-Adaptive Multi-Objective Evolutionary Algorithms

Multi-Objective Evolutionary Algorithms:
    Family of algorithms inspired by nature:
        Evolve a population
        Mutation and Crossover
        Select fittest individuals by Pareto ranking
    Handle 1 to 3 objectives

Self-Adaptive Techniques:
    Optimise search parameters:
        Population Size
        Mutation Rate
        Crossover Rate
        Generation Gap
        Scaling Window
    Optimise reproduction operators:
        Mutation Operator(s)
        Crossover Operator(s)
        Parent Selection Operator
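Selection by Pareto ranking can be illustrated with a minimal sketch. It assumes all objectives are minimised and uses a dominance-count rank (1 plus the number of individuals that dominate a given one); the slides do not state which ranking variant is used, and the function names and sample scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_rank(scores: List[Sequence[float]]) -> List[int]:
    """Rank each individual as 1 + the number of individuals that dominate it."""
    return [1 + sum(dominates(other, s) for other in scores) for s in scores]


# Hypothetical population scored on two objectives (lower is better).
population = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (4.0, 4.0)]
ranks = pareto_rank(population)
```

Individuals with rank 1 form the non-dominated set; a selection operator would then favour lower ranks.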

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 5 / 130

Page 13: CKannas PhD Thesis Slides

Drug Discovery Process - Steps

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 6 / 130

Page 14: CKannas PhD Thesis Slides

Drug Discovery Process - Timeline

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 7 / 130

Page 15: CKannas PhD Thesis Slides

Life Sciences Informatics platform

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 8 / 130

Page 17: CKannas PhD Thesis Slides

Motivation & Objectives

Motivation
    Provide an easy-to-use web-based platform,
    Focused on Virtual Screening (VS) of natural products, and
    Aimed towards cancer chemoprevention researchers.

Objectives
    Design and develop a web-based Scientific Workflow Management System (SWMS),
    Provide tools for VS, and
    Evaluate it on use cases for identifying novel chemopreventive agents.

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 9 / 130

Page 23: CKannas PhD Thesis Slides

Scientific Workflow Management Systems for Virtual Screening

Applications              Technology   Scientific Field(s)

Open Source
Taverna                   Java         Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music
Galaxy                    Python       Life Sciences, Bioinformatics
Knime                     Java         Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis

Commercial
Inforsence/DiscoveryNet   -            Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling
Pipeline Pilot            -            Biology, Chemistry, Material Science

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 10 / 130

Page 24: CKannas PhD Thesis Slides

Funding Support

The work has been partially supported through the EU-FP7 GRANATUM project, "A Social Collaborative Working Space Semantically Interlinking Biomedical Researchers, Knowledge and data for the design and execution of In Silico Models and Experiments in Cancer Chemoprevention", contract number 270139.

Support the research of EU-FP7 Linked2Safety project, "A Next-Generation, Secure Linked Data Medical Information Space For Semantically-Interconnecting Electronic Health Records and Clinical Trials Systems Advancing Patients Safety In Clinical Research", contract number 288328.

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 11 / 130

Page 26: CKannas PhD Thesis Slides

Life Sciences Informatics platform

Life Sciences Informatics (LiSIs) is a web-based SWMS for VS [Kannas et al., 2015].

LiSIs is based on the Galaxy SWMS [Goecks et al., 2010], [Blankenberg et al., 2010], [Giardine et al., 2005].

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 12 / 130

Page 28: CKannas PhD Thesis Slides

LiSIs modules

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 13 / 130

Page 32: CKannas PhD Thesis Slides

LiSIs Showcase

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 14 / 130

Page 33: CKannas PhD Thesis Slides

LiSIs Showcase Information

LiSIs was (successfully) used for the discovery of promising agents with chemopreventive properties that are able to bind to Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β (ER-β).

Datasets:
    2414 compounds from Indofine,
    55 compounds characterized by Medina-Franco et al. [Medina-Franco et al., 2010], and
    21 known ER ligands retrieved from PubChem.

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 15 / 130

Page 38: CKannas PhD Thesis Slides

LiSIs Showcase Workflow

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 16 / 130

Page 39: CKannas PhD Thesis Slides

LiSIs Showcase Docking Results

(a) ER-α Docking Score (b) ER-β Docking Score

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 17 / 130

Page 40: CKannas PhD Thesis Slides

LiSIs Showcase Discussion

From the Indofine dataset (2414 compounds), based on natural-like criteria and docking results, we selected:

    18 potential ER ligands,
    These were further investigated in vitro with the ER binding assay described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005] with minor modifications,
    15 out of 18 compounds (83.3%) were experimentally confirmed active.
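The screening funnel on this slide can be checked with a few lines of arithmetic. The counts are taken from the slide; the variable names are illustrative only.

```python
screened = 2414   # Indofine compounds entering the workflow
selected = 18     # potential ER ligands chosen for in vitro testing
confirmed = 15    # compounds confirmed active in the binding assay

# Fraction of the library that was taken forward to the assay.
selection_fraction = selected / screened
# Fraction of the tested compounds that were confirmed active.
hit_rate = confirmed / selected

print(f"selected for testing: {selection_fraction:.2%}")
print(f"confirmed active:     {hit_rate:.1%}")
```

The second figure reproduces the 83.3% quoted on the slide; the first shows that under 1% of the library needed to be assayed.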

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 18 / 130

Page 43: CKannas PhD Thesis Slides

Self-Adaptive Multi-Objective Evolutionary Algorithm

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 19 / 130

Page 45: CKannas PhD Thesis Slides

Multi-Objective Algorithms for Molecular Design

Name          MO Method      Search Method                      Remarks                              Reference

EA-Inventor   Weighted       Evolutionary Algorithm             Ligand                               [Feher et al., 2008]
GANDI         Weighted       Parallel Evolutionary Algorithm    Structure                            [Dey and Caflisch, 2008]
FOG           Weighted       Evolutionary Algorithm             Ligand                               [Kutchukian et al., 2009]
MEGA          Pareto based   Evolutionary Algorithm             Ligand & Structure                   [Nicolaou et al., 2009a]
PLD           Pareto based   Evolutionary Algorithm             ADME related properties              [Ekins et al., 2010]
NovoFLAP      Weighted       Evolutionary Algorithm             Ligand                               [Damewood et al., 2010]
PhDD          Weighted       Workflow                           Pharmacophore                        [Huang et al., 2010]
DOGS          Weighted       Workflow                           Ligand                               [Hartenfeller et al., 2012]
LiGen         Weighted       Workflow                           Ligand, Structure & Pharmacophore    [Beccari et al., 2013]
MOARF         Weighted       Workflow                           Ligand & Structure                   [Firth et al., 2015]
Synopsis      Pareto based   Evolutionary Algorithm             Ligand & Structure                   [Daeyaert and Deem, 2016]
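The "Weighted" and "Pareto based" entries in the MO Method column differ in how several objectives are combined. The sketch below contrasts the two on made-up data; the candidate names, scores and weights are hypothetical, and both objectives are assumed to be maximised.

```python
from typing import Dict, List, Tuple

# Hypothetical candidates scored on two objectives to be maximised.
candidates: Dict[str, Tuple[float, float]] = {
    "A": (0.9, 0.2),
    "B": (0.6, 0.6),
    "C": (0.2, 0.9),
    "D": (0.5, 0.5),
}


def weighted_best(weights: Tuple[float, float]) -> str:
    """Weighted method: collapse the objectives into one score and keep the best candidate."""
    return max(candidates, key=lambda n: sum(w * s for w, s in zip(weights, candidates[n])))


def pareto_front() -> List[str]:
    """Pareto-based method: keep every candidate that no other candidate dominates."""
    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [name for name, score in candidates.items()
            if not any(dominates(other, score) for other in candidates.values())]
```

With a weighted sum the winner depends on the weights chosen up front, whereas the Pareto-based filter returns the whole set of trade-off solutions and leaves the choice to the user.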

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 20 / 130

Page 46: CKannas PhD Thesis Slides

Motivation & Objectives

Motivation
    Find suitable search parameters for an algorithm in a given problem, and
    Automate this process.

Objectives
    Design and develop an algorithm:
        To search for the fittest search parameters of MOEAs,
        To be problem agnostic, and
        Evaluate on our previously proposed eMEGA for molecular De Novo Design.

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 21 / 130

Page 51: CKannas PhD Thesis Slides

About Self-Adaptive MOEA

Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010],
Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998],
Optimise eMEGA parameters:
    Mutation Rate,
    Crossover Rate,
    Parent Selection Type,
    Population Diversity Type.
Objective fitness functions for the meta-level:
    The percentage of non-dominated solutions each eMEGA has per iteration,
    The percentage of unique solutions each eMEGA has per iteration,
    Pareto Front Hypervolume each eMEGA has per iteration.
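The three meta-level objectives can be sketched for a two-objective population as follows. This is an illustration under stated assumptions, not the thesis implementation: both objectives are maximised and lie in [0, 1], uniqueness is judged on the score pairs rather than on the molecules themselves, and the hypervolume is measured from a reference point at (0, 0). All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # two objective scores in [0, 1], both maximised


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in both objectives and differs from it."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def non_dominated(points: List[Point]) -> List[Point]:
    """Distinct points that no other point dominates."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


def non_dominated_fraction(points: List[Point]) -> float:
    front = set(non_dominated(points))
    return sum(p in front for p in points) / len(points)


def unique_fraction(points: List[Point]) -> float:
    return len(set(points)) / len(points)


def hypervolume_2d(points: List[Point]) -> float:
    """Area dominated by the front, measured from the reference point (0, 0)."""
    area, prev_y = 0.0, 0.0
    # Sweep the front from the largest first objective to the smallest;
    # on a non-dominated front the second objective then increases.
    for x, y in sorted(non_dominated(points), reverse=True):
        area += x * (y - prev_y)
        prev_y = y
    return area


# Hypothetical eMEGA population after one iteration.
pop = [(1.0, 0.25), (0.5, 0.5), (0.25, 1.0), (0.25, 0.25), (0.5, 0.5)]
meta_fitness = (non_dominated_fraction(pop), unique_fraction(pop), hypervolume_2d(pop))
```

Each of the three values falls in the 0 to 1.0 range, so a meta-level EA can treat them as comparable objectives when ranking parameter settings.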

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130

Page 63: CKannas PhD Thesis Slides

Self-Adaptive MOEA Pseudocode

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 23 / 130

Page 64: CKannas PhD Thesis Slides

Self-Adaptive MOEA Chromosome

Chromosomes Example

Objective Fitness Functions

Objective Fitness Function    Range      Example

Non-dominated Solutions %     0 - 1.0    0.90
Unique Solutions %            0 - 1.0    0.88
Pareto Front Hypervolume      0 - 1.0    0.56
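A meta-level chromosome of this kind can be sketched as a small record holding the four eMEGA parameters named earlier in the deck. The encoding below is an assumption for illustration: the class and function names are hypothetical, "Tournament" is an assumed alternative to the Roulette selection mentioned in the settings, and the mutation operator simply resamples one gene.

```python
import random
from dataclasses import dataclass, replace

SELECTION_TYPES = ("Roulette", "Tournament")   # "Tournament" is an assumed alternative
DIVERSITY_TYPES = ("Genotype", "Phenotype")


@dataclass(frozen=True)
class MetaChromosome:
    mutation_rate: float
    crossover_rate: float
    selection_type: str
    diversity_type: str


def random_chromosome(rng: random.Random) -> MetaChromosome:
    """Draw every gene at random from its allowed range or option list."""
    return MetaChromosome(
        mutation_rate=rng.random(),
        crossover_rate=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


def mutate(c: MetaChromosome, rng: random.Random) -> MetaChromosome:
    """Resample one randomly chosen gene, leaving the others unchanged."""
    gene = rng.choice(["mutation_rate", "crossover_rate", "selection_type", "diversity_type"])
    if gene in ("mutation_rate", "crossover_rate"):
        return replace(c, **{gene: rng.random()})
    options = SELECTION_TYPES if gene == "selection_type" else DIVERSITY_TYPES
    return replace(c, **{gene: rng.choice(options)})


rng = random.Random(0)
parent = random_chromosome(rng)
child = mutate(parent, rng)
```

Each chromosome configures one eMEGA run, and the fitness values in the table above are what that run reports back to the meta-level.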

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 24 / 130

Page 66: CKannas PhD Thesis Slides

eMEGA Chromosome

Graph based, and
Information related to evolutionary design process.

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 25 / 130

Page 69: CKannas PhD Thesis Slides

Self-Adaptive MOEA Flowchart

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 26 / 130

Page 73: CKannas PhD Thesis Slides

Self-Adaptive MOEA Showcases

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 27 / 130

Page 74: CKannas PhD Thesis Slides

Validation of Self-Adaptive MOEA: About

Compare SAMOEA, eMEGA and MOARF [Firth et al., 2015].

Design molecules that are similar in structure and in chemical properties to the target molecule, Seliciclib.

Figure: Seliciclib (CYC202, R-roscovitine)
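The slide does not say how structural similarity to the target is measured. One common choice in cheminformatics is the Tanimoto coefficient over molecular fingerprints, sketched here on hypothetical fingerprints given as sets of "on" bit indices; this is an assumed measure, not necessarily the one used in the thesis.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared features divided by features present in either set."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical fingerprints, not derived from real molecules.
target = {1, 4, 7, 9, 12, 15}
candidate = {1, 4, 7, 10, 12}
score = tanimoto(target, candidate)
```

A score of 1.0 means identical fingerprints and 0.0 means no shared features, so the value can be used directly as an objective to maximise.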

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 28 / 130

Page 75: CKannas PhD Thesis Slides

Validation of Self-Adaptive MOEA: Starting Datasets

Starting Molecules datasets:

    Maybridge’s Screening Library, which contains 53953 molecules (Dataset 1),
    Asinex’s Elite Libraries, which contain 104577 molecules (Dataset 2).

C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 29 / 130

Page 77: CKannas PhD Thesis Slides

Validation of Self-Adaptive MOEA: Settings

eMEGA Settings

Dataset | Objectives | Population | Iterations | Evolutionary Operations
Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype

SAMOEA Settings

Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations
SAMOEA | Dataset 1, Dataset 2 | Non Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype
eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined during run time. Based on SAMOEA’s chromosomes.
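Since the inner eMEGA runs take their evolutionary operations from SAMOEA’s chromosomes, one way to picture such a chromosome is as a small record of eMEGA settings. The sketch below is hypothetical: the class, field and function names are illustrative, and the mutation step size is an assumption. Only the four settings and their option names come from the slides.

```python
import random
from dataclasses import dataclass, replace

# Option names as they appear in the results tables of these slides.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class SettingsChromosome:
    """One candidate eMEGA configuration (hypothetical encoding)."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str


def mutate(chromosome: SettingsChromosome, rng: random.Random, step: float = 0.05) -> SettingsChromosome:
    """Return a copy with one randomly chosen setting perturbed."""
    gene = rng.choice(["mutation_probability", "crossover_probability",
                       "selection_type", "diversity_type"])
    if gene == "selection_type":
        return replace(chromosome, selection_type=rng.choice(SELECTION_TYPES))
    if gene == "diversity_type":
        return replace(chromosome, diversity_type=rng.choice(DIVERSITY_TYPES))
    # Numeric gene: nudge it and clamp to the valid probability range [0, 1].
    value = getattr(chromosome, gene) + rng.uniform(-step, step)
    return replace(chromosome, **{gene: min(1.0, max(0.0, value))})


rng = random.Random(0)
parent = SettingsChromosome(0.15, 0.80, "roulette", "genotype")
child = mutate(parent, rng)
print(parent)
print(child)
```

Each such chromosome would then be scored by running eMEGA with its settings and measuring the outer objectives listed in the SAMOEA row.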

Page 79: CKannas PhD Thesis Slides

Validation of Self-Adaptive MOEA: Results

Page 81: CKannas PhD Thesis Slides

Validation of Self-Adaptive MOEA: Results - Search Settings (1)

SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
0.029 | 0.694 | roulette | genotype | 0.9 | 0.986 | 1
0.175 | 0.818 | roulette | phenotype | 0.914 | 0.961 | 1
0.172 | 0.818 | tournament | phenotype | 0.934 | 0.9533 | 1
0.026 | 0.694 | roulette | phenotype | 0.928 | 0.955 | 1
0.001 | 0.963 | roulette | phenotype | 0.982 | 0.848 | 1
0.177 | 0.818 | roulette | phenotype | 0.921 | 0.956 | 1
0.083 | 0.73 | tournament | phenotype | 0.95 | 0.946 | 1
0.086 | 0.798 | tournament | genotype | 0.976 | 0.928 | 1
0.172 | 0.818 | best | genotype | 0.914 | 0.973 | 2
0.176 | 0.818 | roulette | genotype | 0.9312 | 0.956 | 2

Note: The numbers for ‘Non Dominated %’ and ‘Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ‘Rank’ is their non dominance rank.
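The slides do not say how the non dominance rank is computed. The sketch below assumes standard non-dominated sorting with both listed values minimised, and applies it to the ‘Non Dominated %’ and ‘Unique Solutions %’ pairs of the Maybridge table. The function names are illustrative.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in at least one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominance_ranks(points):
    """Rank 1 is the non-dominated front; remove it and repeat to get rank 2, 3, ..."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs, in the order listed for the Maybridge dataset.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955), (0.982, 0.848),
    (0.921, 0.956), (0.95, 0.946), (0.976, 0.928), (0.914, 0.973), (0.9312, 0.956),
]
print(non_dominance_ranks(maybridge))  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```

Under these assumptions the computed ranks match the ‘Rank’ column: the ninth setting is dominated by the second, and the tenth by the fourth and sixth.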

Page 82: CKannas PhD Thesis Slides

Validation of Self-Adaptive MOEA: Results - Search Settings (2)

SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
0.105 | 1.0 | best | phenotype | 0.988 | 0.931 | 1
0.139 | 0.963 | tournament | phenotype | 0.962 | 0.956 | 1
0.089 | 0.694 | tournament | genotype | 0.976 | 0.943 | 1
0.139 | 0.969 | best | phenotype | 0.96 | 0.96 | 1
0.108 | 0.69 | tournament | genotype | 0.955 | 0.962 | 1
0.1 | 1.0 | best | phenotype | 0.988 | 0.942 | 1
0.088 | 0.685 | tournament | genotype | 0.96 | 0.962 | 1
0.139 | 0.966 | roulette | phenotype | 0.965 | 0.948 | 1
0.089 | 0.709 | tournament | genotype | 0.964 | 0.957 | 2

Note: The numbers for ‘Non Dominated %’ and ‘Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ‘Rank’ is their non dominance rank.

Page 83: CKannas PhD Thesis Slides

Use Case 1: About

Design molecules that bind to ER-α based on:

Structural similarity to Tamoxifen, and
Structural dissimilarity to Ibuproxam.

(a) Tamoxifen. (b) Ibuproxam.

Page 84: CKannas PhD Thesis Slides

Use Case 1: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
  Clean (substances with “clean” reactivity),
  In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
  Now (immediate delivery, includes in-stock and agent).

The collection contains 7035 molecules.

Page 90: CKannas PhD Thesis Slides

Use Case 1: Results - In objective space

Page 91: CKannas PhD Thesis Slides

Use Case 1: Results - Designed molecules

Page 92: CKannas PhD Thesis Slides

Use Case 1: Results - AutoDock Vina docking

Molecule Id | Docking Affinity (kcal/mol)
Tamoxifen | -8.2
DnD 6 SP 20 4 X 13a | -7.9
DnD 31 SP 150 37 M 19 | -7.9
DnD 8 SP 9 2 M 13 | -7.8
DnD 4 SP 199 49 X 46b | -7.7
DnD 12 SP 75 18 M 13 | -7.6
DnD 31 SP 6 1 M 16 | -7.2
DnD 15 SP 168 41 M 0 | -7.2
DnD 11 SP 74 18 M 4 | -7.1
DnD 31 SP 193 48 X 76b | -6.9
DnD 1 SP 78 19 X 84a | -6.8
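To read the docking table against its reference compound, the short sketch below sorts the designed molecules by affinity and reports each gap to Tamoxifen. In AutoDock Vina a more negative score indicates stronger predicted binding. The variable names are illustrative and the values are copied from the table.

```python
REFERENCE = ("Tamoxifen", -8.2)

# Designed molecules and their docking affinities (kcal/mol), as listed above.
CANDIDATES = [
    ("DnD 6 SP 20 4 X 13a", -7.9),
    ("DnD 31 SP 150 37 M 19", -7.9),
    ("DnD 8 SP 9 2 M 13", -7.8),
    ("DnD 4 SP 199 49 X 46b", -7.7),
    ("DnD 12 SP 75 18 M 13", -7.6),
    ("DnD 31 SP 6 1 M 16", -7.2),
    ("DnD 15 SP 168 41 M 0", -7.2),
    ("DnD 11 SP 74 18 M 4", -7.1),
    ("DnD 31 SP 193 48 X 76b", -6.9),
    ("DnD 1 SP 78 19 X 84a", -6.8),
]

# Most negative (strongest predicted binding) first.
ranked = sorted(CANDIDATES, key=lambda row: row[1])

for name, affinity in ranked:
    gap = round(affinity - REFERENCE[1], 1)  # positive gap = weaker than the reference
    print(f"{name}: {affinity} kcal/mol ({gap:+.1f} vs {REFERENCE[0]})")
```

The two best designed molecules come within 0.3 kcal/mol of the reference.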

Page 93: CKannas PhD Thesis Slides

Use Case 1: Results - Self-Adaptive MOEA non-dominated settings for eMEGA

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
0.15777 | 0.80279 | tournament | genotype | 0.634 | 0.341 | 1
0.15613 | 0.88305 | tournament | genotype | 0.634 | 0.341 | 1
0.15627 | 0.88891 | tournament | genotype | 0.634 | 0.341 | 1
0.15688 | 0.88891 | roulette | genotype | 0.649 | 0.340 | 1
0.00552 | 0.94308 | best | genotype | 0.624 | 0.427 | 1

Note: The numbers for ‘Non Dominated %’ are 1 minus the actual %. The smaller the number listed here the better. ‘Rank’ is their non dominance rank.
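For readers unfamiliar with the hypervolume measure, the sketch below computes it for a two-objective front. The reference point (1.0, 1.0), the minimisation convention and the example front are assumptions; the slides do not specify how SAMOEA normalises or bounds its ‘Pareto Hypervolume’ value.

```python
def hypervolume_2d(front, reference=(1.0, 1.0)):
    """Area dominated by a 2-D front and bounded by the reference point (minimisation)."""
    # Only points strictly better than the reference in both objectives contribute.
    points = sorted(p for p in front if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_y = reference[1]
    for x, y in points:  # ascending in the first objective
        if y < best_y:   # dominated points add no new area
            area += (reference[0] - x) * (best_y - y)
            best_y = y
    return area


# Hypothetical front with both objectives scaled to [0, 1].
front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
print(hypervolume_2d(front))  # approximately 0.37
```

A front that pushes closer to the origin covers more of the unit square, which is why a larger hypervolume is usually read as a better front.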

Page 94: CKannas PhD Thesis Slides

Use Case 3: About

Design molecules that bind to ER-α based on:

Structural similarity to Raloxifene, and
Chemical Properties similarity to Raloxifene.

Figure: Raloxifene.

Page 95: CKannas PhD Thesis Slides

Use Case 3: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
  Clean (substances with “clean” reactivity),
  In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
  Now (immediate delivery, includes in-stock and agent).

The collection contains 7035 molecules.

Page 96: CKannas PhD Thesis Slides

Use Case 3: Results - In objective space

Page 97: CKannas PhD Thesis Slides

Use Case 3: Results - Designed molecules

Page 98: CKannas PhD Thesis Slides

Use Case 3: Results - AutoDock Vina docking

Molecule Id | Docking Affinity (kcal/mol)
DnD 31 SP 194 48 M 49 | -8.2
DnD 34 SP 197 49 X 13a | -5.9
Raloxifene | -2.2 (-11.70 PubChem)

Page 99: CKannas PhD Thesis Slides

Use Case 3: Results - Self-Adaptive MOEA non-dominated settings for eMEGA

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
0.12927 | 0.98597 | roulette | genotype | 0.997 | 0.274 | 1
0.12897 | 0.98588 | roulette | genotype | 0.997 | 0.274 | 1
0.12933 | 0.98588 | roulette | genotype | 0.997 | 0.274 | 1
0.12946 | 0.98559 | roulette | genotype | 0.997 | 0.274 | 1
0.12928 | 0.98582 | roulette | genotype | 0.997 | 0.274 | 1
0.12897 | 0.98588 | tournament | genotype | 0.997 | 0.274 | 1

Note: The numbers for ‘Non Dominated %’ are 1 minus the actual %. The smaller the number listed here the better. ‘Rank’ is their non dominance rank.

Page 100: CKannas PhD Thesis Slides

Use Case 4: About

Design molecules that bind to Proteasome B5 based on:

Structural similarity to Ixazomib, and
Chemical Properties similarity to Ixazomib.

Figure: Ixazomib.

Page 101: CKannas PhD Thesis Slides

Use Case 4: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
  Clean (substances with “clean” reactivity),
  In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
  Now (immediate delivery, includes in-stock and agent).

The collection contains 7035 molecules.

Page 102: CKannas PhD Thesis Slides

Use Case 4: Results - In objective space

Page 103: CKannas PhD Thesis Slides

Use Case 4: Results - Designed molecules

Page 104: CKannas PhD Thesis Slides

Use Case 4: Results - AutoDock 4 docking

Molecule Id | Docking Affinity (kcal/mol)
DnD 19 SP 196 48 X 59b | -7.19
DnD 49 SP 193 48 X 123b | -6.68
DnD 1 SP 196 48 X 67a | -6.08

Page 105: CKannas PhD Thesis Slides

Use Case 4: Results - Self-Adaptive MOEA non-dominated settings for eMEGA

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
0.09507 | 0.98194 | tournament | phenotype | 0.993 | 0.442 | 1
0.09507 | 0.9819 | roulette | phenotype | 0.991 | 0.442 | 1
0.09471 | 0.98178 | roulette | genotype | 0.997 | 0.426 | 1
0.09484 | 0.98183 | roulette | phenotype | 0.996 | 0.441 | 1
0.09277 | 0.98235 | roulette | genotype | 0.996 | 0.441 | 1

Note: The numbers for ‘Non Dominated %’ are 1 minus the actual %. The smaller the number listed here the better. ‘Rank’ is their non dominance rank.

Page 106: CKannas PhD Thesis Slides

Self-Adaptive MOEA Showcases Discussion

SAMOEA proposed interesting solutions in all the problems it has been applied to,
Further in-vitro investigation is required, and
SAMOEA’s proposed eMEGA settings differ depending on the problem and dataset (no silver bullet).

Page 109: CKannas PhD Thesis Slides

Concluding Remarks

Page 111: CKannas PhD Thesis Slides

Concluding Remarks - LiSIs platform

Features a Web-based Virtual Screening platform focused on Cancer Chemoprevention Research.
To be expanded in the future with tools featuring the algorithms from the MEGA framework.
A number of SWs were implemented for:
  preparing docking models,
  preparing predictive models,
  performing docking experiments,
  using predictive models to predict biochemical properties and behaviour, and
  performing VS workflows.

Page 114: CKannas PhD Thesis Slides

Concluding Remarks - Self-Adaptive MOEA (1)

Drawbacks:

Needs a lot of time to terminate, and
Very slow convergence.

Page 117: CKannas PhD Thesis Slides

Concluding Remarks - Self-Adaptive MOEA (2)

Advantages:

Searches a larger space,
Generates far more solutions per iteration,
Proposes the fittest parameter sets that eMEGA should use for the given problem,
Has been built to be adaptable,
Uses objective fitness functions that can evaluate the effectiveness and progression of any MOEA,
Can be used on other problems,
SAMOEA’s chromosome can be expanded with additional search parameters, and
Leverages multi-core parallelism (needs more memory).

Page 126: CKannas PhD Thesis Slides

Future Work

Page 128: CKannas PhD Thesis Slides

Future Work - LiSIs platform

Develop LiSIs 2.0:
  Based on the latest Galaxy platform, and
  Redesign of tools to be compatible with Galaxy’s ToolShed for easy deployment,
Update LiSIs with a feature to visualise intermediate results from various tools,
Expand LiSIs tools with tools featuring the MEGA line-up of algorithms and SAMOEA,
Explore resource management in SWMSs:
  Novel Multi-Objective Optimization SW design approaches,
  Novel Multi-Objective Optimization SW scheduling approaches.

Page 136: CKannas PhD Thesis Slides

Future Work - Self-Adaptive MOEA

Optimise MEGA framework (memory management and parallelism),
Implement self-adaptive technique for selecting genetic operators,
Extend Self-Adaptive MOEA to use other MOEAs,
Implement models for other problems, and
Implement new objective functions.

Page 142: CKannas PhD Thesis Slides

List of Publications

Page 143: CKannas PhD Thesis Slides

Table of Contents

6 List of Publications
7 References
8 Backup Frames
  Validation of Self-Adaptive MOEA
  Use Case 1
  Use Case 2
  Use Case 3
  Use Case 4

Page 144: CKannas PhD Thesis Slides

List of Publications I

Book Chapters

C. A. Nicolaou and C. C. Kannas, “Molecular Library Design Using Multi-Objective Optimization Methods,” in Chemical Library Design, J. Z. Zhou, Ed. Humana Press, 2011, pp. 53–69.

Journals

C. Kannas et al., “LiSIs: An Online Scientific Workflow System for Virtual Screening,” Combinatorial Chemistry & High Throughput Screening, vol. 18, no. 3, pp. 281–295, Mar. 2015.

C. A. Nicolaou, C. Kannas, and E. Loizidou, “Multi-objective optimization methods in de novo drug design,” Mini Rev Med Chem, vol. 12, no. 10, pp. 979–987, Sep. 2012.

Page 145: CKannas PhD Thesis Slides

List of Publications II

C. Nicolaou, C. Kannas, and C. Pattichis, “Knowledge-driven multi-objective de novo drug design,” Chemistry Central Journal, vol. 3, p. P22, 2009.

Conferences

C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 30th IEEE International Symposium on Computer-Based Medical Systems, Thessaloniki, Greece, 22-24 June 2017, pp. 1-6.

P. Hasapis et al., “Molecular clustering via knowledge mining from biomedical scientific corpora,” in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), 2013, pp. 1-5.

Page 146: CKannas PhD Thesis Slides

List of Publications III

C. C. Kannas et al., “A workflow system for virtual screening in cancer chemoprevention,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 439–446.

K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S. Pattichis, and V. J. Promponas, “Open source workflow systems in life sciences informatics,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 552–558.

C. A. Nicolaou, C. Kannas, and C. S. Pattichis, “Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.

Page 147: CKannas PhD Thesis Slides

List of Publications IV

C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, “A Parallel implementation of a Multi-objective Evolutionary Algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.

Abstracts

C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju Island, Korea, 11-15 July 2017.

Page 148: CKannas PhD Thesis Slides

References

Page 149: CKannas PhD Thesis Slides

Table of Contents

6 List of Publications
7 References
8 Backup Frames
  Validation of Self-Adaptive MOEA
  Use Case 1
  Use Case 2
  Use Case 3
  Use Case 4

Page 150: CKannas PhD Thesis Slides

References I

Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G. (2013). LiGen: A High Performance Workflow for Chemistry Driven de Novo Design. Journal of Chemical Information and Modeling.

Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010). Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. In Current Protocols in Molecular Biology. John Wiley & Sons, Inc.

Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm for Efficient De Novo Design of Multi-functional Molecules. Molecular Informatics, pages n/a–n/a.

Page 151: CKannas PhD Thesis Slides

References II

Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010). NovoFLAP: A ligand-based de novo design approach for the generation of medicinally relevant ideas. Journal of Chemical Information and Modeling, 50(7):1296–1303.

Dey, F. and Caflisch, A. (2008). Fragment-based de novo ligand design by multiobjective evolutionary optimization. Journal of Chemical Information and Modeling, 48(3):679–690.

Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolving molecules using multi-objective optimization: applying to ADME/Tox. Drug Discovery Today, 15(11-12):451–460.

Page 152: CKannas PhD Thesis Slides

References III

Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders, J. (2008). The use of ligand-based de novo design for scaffold hopping and sidechain optimization: two case studies. Bioorganic & Medicinal Chemistry, 16(1):422–427.

Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015). MOARF, an Integrated Workflow for Multiobjective Optimization: Implementation, Synthesis, and Biological Evaluation. Journal of Chemical Information and Modeling.

Fonseca, C. and Fleming, P. (1998). Multiobjective optimization and multiple constraint handling with evolutionary algorithms. I. A unified formulation. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(1):26–37.

Page 153: CKannas PhD Thesis Slides

References IV

Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: A Platform for Interactive Large-Scale Genome Analysis. Genome Research, 15(10):1451–1455.

Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T. (2010). Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86.

Grefenstette, J. (1986). Optimization of Control Parameters for Genetic Algorithms. IEEE Transactions on Systems, Man and Cybernetics, 16(1):122–128.

Page 154: CKannas PhD Thesis Slides

References V

Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J. H. N. (2005). A novel microplate reader-based high-throughput assay for estrogen receptor binding. International Journal of Environmental Analytical Chemistry, 85(3):149–161.

Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F., Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012). DOGS: Reaction-Driven de novo Design of Bioactive Compounds. PLoS Comput Biol, 8(2):e1002380.

Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a new pharmacophore-based de novo design method of drug-like molecules combined with assessment of synthetic accessibility. Journal of Molecular Graphics and Modelling, 28(8):775–787.

Page 155: CKannas PhD Thesis Slides

References VI

Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva, C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D., Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser, C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs: An Online Scientific Workflow System for Virtual Screening. Combinatorial Chemistry & High Throughput Screening, 18(3):281–295.

Kramer, O. (2010). Evolutionary self-adaptation: a survey of operators and strategy parameters. Evolutionary Intelligence, 3(2):51–65.

Page 156: CKannas PhD Thesis Slides

References VII

Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space. Journal of Chemical Information and Modeling, 49(7):1630–1642.

Medina-Franco, J. L., Lopez-Vallejo, F., Kuck, D., and Lyko, F. (2010). Natural products as DNA methyltransferase inhibitors: a computer-aided discovery approach. Molecular Diversity, 15:293–304.

Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). De Novo Drug Design Using Multiobjective Evolutionary Graphs. Journal of Chemical Information and Modeling, 49(2):295–307.

Page 157: CKannas PhD Thesis Slides

References VIII

Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b). Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm. In 2009 9th International Conference on Information Technology and Applications in Biomedicine, pages 1–6, Larnaka, Cyprus. IEEE.

Page 158: CKannas PhD Thesis Slides

Backup Frames

Page 160: CKannas PhD Thesis Slides

Pareto Ranking

Page 161: CKannas PhD Thesis Slides

LiSIs Showcase - Known ER Ligands

A/A | Estrogen Ligand | Docking Score ER-α | Docking Score ER-β
1 | Raloxifene | -11.70 | -8.72
2 | Lilly-117018 | -11.53 | -3.80
3 | 3-HydroxyTamoxifen | -11.02 | N/A
4 | Nafoxidine | -10.88 | N/A
5 | ICI-182780 | -10.73 | N/A
6 | Pyrolidine | -10.04 | N/A
7 | Clomiphene A | -10.01 | N/A
8 | Nitrofinene Citrate | -9.87 | N/A
9 | ICI-164384 | -9.82 | -9.13
10 | Moxestrol | -9.38 | -9.77
11 | Naringenine | -8.55 | -7.80
12 | Triphenylethylene | -8.50 | N/A
13 | Afema | -8.15 | -7.78
14 | Danazol | -6.99 | N/A
15 | Ethamoxytriphetol | -6.67 | N/A
16 | 4-HydroxyTamoxifen | -6.60 | N/A
17 | Dioxin | -6.22 | N/A
18 | Estralutin | -5.86 | -3.80
19 | Cyclopentanone | -4.88 | N/A
20 | Miproxifene Phosphate | -4.48 | N/A
21 | EM-800 | N/A | N/A

Note: The list was retrieved from PubChem and it includes compounds characterized as “estrogen ligands”. N/A: no binding affinity.

Page 162: CKannas PhD Thesis Slides

LiSIs Showcase - Natural-like Rule of 5 filter

GRANATUM Rule of 5 filter:

1 MW between 160 and 700,
2 HBD less than or equal to 5,
3 HBA less than or equal to 10,
4 TPSA less than 140, and
5 cLogP between -0.4 and 5.6.
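
The five thresholds above translate directly into a single predicate. The following is a minimal sketch, assuming the descriptors have already been computed by some cheminformatics toolkit and that "between" is inclusive at both ends; the `Descriptors` container and the function name are hypothetical and are not taken from LiSIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mw: float     # molecular weight
    hbd: int      # number of hydrogen-bond donors
    hba: int      # number of hydrogen-bond acceptors
    tpsa: float   # topological polar surface area
    clogp: float  # calculated logP


def passes_granatum_ro5(d: Descriptors) -> bool:
    """Return True only if all five thresholds from the slide are met.

    Assumption: the "between" ranges include their end points.
    """
    return (
        160 <= d.mw <= 700
        and d.hbd <= 5
        and d.hba <= 10
        and d.tpsa < 140
        and -0.4 <= d.clogp <= 5.6
    )
```

A molecule is rejected as soon as any one criterion fails, so the filter acts as a logical AND of the five rules.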

Page 163: CKannas PhD Thesis Slides

eMEGA Settings

Table: eMEGA experimental design settings

Dataset | Objectives | Population | Iterations | Evolutionary Operations
Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype

Page 164: CKannas PhD Thesis Slides

SAMOEA Settings

Table: SAMOEA experimental design settings

Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations
SAMOEA | Dataset 1, Dataset 2 | Non Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype
eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined during run time, based on SAMOEA’s chromosomes.
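
Since the eMEGA operators are defined at run time from SAMOEA's chromosomes, each chromosome can be pictured as one candidate eMEGA configuration. The sketch below is only an illustration under that assumption: the class, field and function names are hypothetical, and the option values are the ones that appear in the proposed-settings tables of these slides (roulette, tournament, best; genotype, phenotype). The actual SAMOEA encoding is not shown in the slides.

```python
import random
from dataclasses import dataclass

# Option values as they appear in the proposed-settings tables.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class EmegaSettings:
    """One SAMOEA chromosome viewed as an eMEGA configuration (hypothetical)."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str

    def __post_init__(self) -> None:
        # Reject configurations that eMEGA could not run with.
        if not 0.0 <= self.mutation_probability <= 1.0:
            raise ValueError("mutation_probability must be in [0, 1]")
        if not 0.0 <= self.crossover_probability <= 1.0:
            raise ValueError("crossover_probability must be in [0, 1]")
        if self.selection_type not in SELECTION_TYPES:
            raise ValueError(f"unknown selection type: {self.selection_type}")
        if self.diversity_type not in DIVERSITY_TYPES:
            raise ValueError(f"unknown diversity type: {self.diversity_type}")


def random_settings(rng: random.Random) -> EmegaSettings:
    """Draw one random configuration, e.g. to seed an initial population."""
    return EmegaSettings(
        mutation_probability=rng.random(),
        crossover_probability=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )
```

Under this reading, a SAMOEA population of 20 would correspond to 20 such configurations, each evaluated by running an eMEGA instance with it.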

Page 165: CKannas PhD Thesis Slides

Virtual Machine Specifications

Table: Specifications of the virtual machine on which the experimental runs were performed

Linux Virtual Machine
CPU | 4x Virtual CPU @ 2GHz
RAM | 16GB
OS | CentOS 6

Page 166: CKannas PhD Thesis Slides

eMEGA Maybridge Run 1

Figure: eMEGA Run 1 results for Maybridge dataset.

Page 167: CKannas PhD Thesis Slides

eMEGA Maybridge Run 2

Figure: eMEGA Run 2 results for Maybridge dataset.

Page 168: CKannas PhD Thesis Slides

eMEGA Maybridge Run 3

Figure: eMEGA Run 3 results for Maybridge dataset.

Page 169: CKannas PhD Thesis Slides

eMEGA Maybridge Run 4

Figure: eMEGA Run 4 results for Maybridge dataset.

Page 170: CKannas PhD Thesis Slides

eMEGA Maybridge Run 5

Figure: eMEGA Run 5 results for Maybridge dataset.

Page 171: CKannas PhD Thesis Slides

eMEGA Maybridge All Runs

Figure: eMEGA results for Maybridge dataset.

Page 172: CKannas PhD Thesis Slides

eMEGA Maybridge All Runs Top 10 Results (1)

Figure: eMEGA Top 10 results for Maybridge dataset.

Page 173: CKannas PhD Thesis Slides

eMEGA Maybridge All Runs Top 10 Results (2)

Figure: eMEGA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

Page 174: CKannas PhD Thesis Slides

eMEGA Asinex Run 1

Figure: eMEGA Run 1 results for Asinex dataset.

Page 175: CKannas PhD Thesis Slides

eMEGA Asinex Run 2

Figure: eMEGA Run 2 results for Asinex dataset.

Page 176: CKannas PhD Thesis Slides

eMEGA Asinex Run 3

Figure: eMEGA Run 3 results for Asinex dataset.

Page 177: CKannas PhD Thesis Slides

Results - eMEGA Asinex Run 4

Figure: eMEGA Run 4 results for Asinex dataset.

Page 178: CKannas PhD Thesis Slides

eMEGA Asinex Run 5

Figure: eMEGA Run 5 results for Asinex dataset.

Page 179: CKannas PhD Thesis Slides

eMEGA Asinex All Runs

Figure: eMEGA results for Asinex dataset.

Page 180: CKannas PhD Thesis Slides

eMEGA Asinex All Runs Top 10 Results (1)

Figure: eMEGA Top 10 results for Asinex dataset.

Page 181: CKannas PhD Thesis Slides

eMEGA Asinex All Runs Top 10 Results (2)

Figure: eMEGA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

Page 182: CKannas PhD Thesis Slides

SAMOEA Maybridge Run 1

Figure: SAMOEA Run 1 results for Maybridge dataset.

Page 183: CKannas PhD Thesis Slides

SAMOEA Maybridge Run 2

Figure: SAMOEA Run 2 results for Maybridge dataset.

Page 184: CKannas PhD Thesis Slides

SAMOEA Maybridge Run 3

Figure: SAMOEA Run 3 results for Maybridge dataset.

Page 185: CKannas PhD Thesis Slides

SAMOEA Maybridge Run 4

Figure: SAMOEA Run 4 results for Maybridge dataset.

Page 186: CKannas PhD Thesis Slides

SAMOEA Maybridge Run 5

Figure: SAMOEA Run 5 results for Maybridge dataset.

Page 187: CKannas PhD Thesis Slides

SAMOEA Maybridge All Runs

Figure: SAMOEA results for Maybridge dataset.

Page 188: CKannas PhD Thesis Slides

SAMOEA Maybridge All Runs Top 10 Results (1)

Figure: SAMOEA Top 10 results for Maybridge dataset.

Page 189: CKannas PhD Thesis Slides

SAMOEA Maybridge All Runs Top 10 Results (2)

Figure: SAMOEA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

Page 190: CKannas PhD Thesis Slides

SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
0.029 | 0.694 | roulette | genotype | 0.9 | 0.986 | 1
0.175 | 0.818 | roulette | phenotype | 0.914 | 0.961 | 1
0.172 | 0.818 | tournament | phenotype | 0.934 | 0.9533 | 1
0.026 | 0.694 | roulette | phenotype | 0.928 | 0.955 | 1
0.001 | 0.963 | roulette | phenotype | 0.982 | 0.848 | 1
0.177 | 0.818 | roulette | phenotype | 0.921 | 0.956 | 1
0.083 | 0.73 | tournament | phenotype | 0.95 | 0.946 | 1
0.086 | 0.798 | tournament | genotype | 0.976 | 0.928 | 1
0.172 | 0.818 | best | genotype | 0.914 | 0.973 | 2
0.176 | 0.818 | roulette | genotype | 0.9312 | 0.956 | 2

Note: The numbers for ’Non Dominated %’ and ’Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.
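
The non dominance rank in the note can be illustrated on the two listed objectives, both of which are minimised. The sketch below assumes the rank is assigned by repeatedly peeling off the non-dominated front; the slides do not state which ranking scheme SAMOEA uses, and the function names are hypothetical. The ten points are the (Non Dominated %, Unique Solutions %) pairs from the Maybridge table.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if it is no worse in every objective and better in one
    (all objectives are minimised)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def non_dominance_ranks(points: Sequence[Point]) -> List[int]:
    """Rank 1 is the non-dominated front; remove it and repeat for rank 2, etc."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) as listed, i.e. 1 minus the actual %.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955),
    (0.982, 0.848), (0.921, 0.956), (0.95, 0.946), (0.976, 0.928),
    (0.914, 0.973), (0.9312, 0.956),
]
ranks = non_dominance_ranks(maybridge)
```

Applied to these ten rows alone, the procedure gives rank 1 to the first eight settings and rank 2 to the last two, which agrees with the Rank column; the original ranks were presumably computed over the whole SAMOEA population rather than over the top 10 only.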

Page 191: CKannas PhD Thesis Slides

SAMOEA Asinex Run 1

Figure: SAMOEA Run 1 results for Asinex dataset.

Page 192: CKannas PhD Thesis Slides

SAMOEA Asinex Run 2

Figure: SAMOEA Run 2 results for Asinex dataset.

Page 193: CKannas PhD Thesis Slides

SAMOEA Asinex Run 3

Figure: SAMOEA Run 3 results for Asinex dataset.

Page 194: CKannas PhD Thesis Slides

SAMOEA Asinex Run 4

Figure: SAMOEA Run 4 results for Asinex dataset.

Page 195: CKannas PhD Thesis Slides

SAMOEA Asinex All Runs

Figure: SAMOEA results for Asinex dataset.

Page 196: CKannas PhD Thesis Slides

SAMOEA Asinex All Runs Top 10 Results (1)

Figure: SAMOEA Top 10 results for Asinex dataset.

Page 197: CKannas PhD Thesis Slides

SAMOEA Asinex All Runs Top 10 Results (2)

Figure: SAMOEA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

Page 198: CKannas PhD Thesis Slides

SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
0.105 | 1.0 | best | phenotype | 0.988 | 0.931 | 1
0.139 | 0.963 | tournament | phenotype | 0.962 | 0.956 | 1
0.089 | 0.694 | tournament | genotype | 0.976 | 0.943 | 1
0.139 | 0.969 | best | phenotype | 0.96 | 0.96 | 1
0.108 | 0.69 | tournament | genotype | 0.955 | 0.962 | 1
0.1 | 1.0 | best | phenotype | 0.988 | 0.942 | 1
0.088 | 0.685 | tournament | genotype | 0.96 | 0.962 | 1
0.139 | 0.966 | roulette | phenotype | 0.965 | 0.948 | 1
0.089 | 0.709 | tournament | genotype | 0.964 | 0.957 | 2

Note: The numbers for ’Non Dominated %’ and ’Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.

Page 199: CKannas PhD Thesis Slides

MOARF Results

Figure: MOARF’s results compared with Seliciclib.

Page 200: CKannas PhD Thesis Slides

Compare SAMOEA, eMEGA and MOARF

Figure: Comparison of all Top 10 results with MOARF’s results and Seliciclib.

Page 201: CKannas PhD Thesis Slides

Discussion (1)

eMEGA and SAMOEA generate molecules that approximate Seliciclib,
Each dataset and algorithm shares a different common core with Seliciclib,
MOARF approximates Seliciclib better than eMEGA and SAMOEA:
  It generates molecules in a more chemistry-oriented way, with fewer stochastic operations,
  It starts from a selected core for the target and then attaches new fragments to it,
SAMOEA explores the space better than eMEGA and MOARF.

Page 207: CKannas PhD Thesis Slides

Discussion (2)

From the tables of SAMOEA proposed eMEGA settings we can see that different settings are favoured for each dataset.

Maybridge dataset:
  Mutation probability around 17%,
  Crossover probability around 80%,
  Selection type either roulette or tournament, and
  Diversity type: both options are valid.

Asinex dataset:
  Mutation probability around 10%,
  Crossover probability around 96%,
  Selection type either best or tournament, and
  Diversity type: both options are valid.

Page 217: CKannas PhD Thesis Slides

Discussion (3)

The objective fitness scores for the proposed settings are very high, which means that the actual percentages are really low, below 5%. From this we can conclude the following:

eMEGA instances generate a large number of identical solutions despite having different configurations; this is something we also noticed in previous experiments comparing MEGA, eMEGA and MOGA [Nicolaou et al., 2009b], and
The objective fitness functions we chose to use in SAMOEA compete with each other, which means that having eMEGA instances generate a high number of unique and non-dominated solutions (above 20%) proves to be a difficult task.

Page 219: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (1)

Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.

Page 220: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (2)

Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.

Page 221: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (3)

Figure: Designed molecule DnD 8 SP 9 2 M 13 docked to ER-α.

Page 222: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (4)

Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.

Page 223: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (5)

Figure: Designed molecule DnD 12 SP 75 18 M 13 docked to ER-α.

Page 224: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (6)

Figure: Designed molecule DnD 31 SP 6 1 M 16 docked to ER-α.

Page 225: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (7)

Figure: Designed molecule DnD 15 SP 168 41 M 0 docked to ER-α.

Page 226: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (8)

Figure: Designed molecule DnD 11 SP 74 18 M 4 docked to ER-α.

Page 227: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (9)

Figure: Designed molecule DnD 31 SP 193 48 X 76b docked to ER-α.

Page 228: CKannas PhD Thesis Slides

Use Case 1: Docked designed molecules (10)

Figure: Designed molecule DnD 1 SP 78 19 X 84a docked to ER-α.

Page 229: CKannas PhD Thesis Slides

Use Case 2: About

Design molecules that bind to ER-α based on:

  Structural similarity to Tamoxifen, and
  Chemical Properties similarity to Tamoxifen.

Figure: Tamoxifen.
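
As an illustration of a structural similarity objective, the sketch below computes the Tanimoto coefficient between two sets of structural features (for example the "on" bits of a molecular fingerprint). This is one common choice and is an assumption here: the slides do not state which similarity measure or fingerprint was used in Use Case 2, and the function name is hypothetical.

```python
from typing import AbstractSet, Hashable


def tanimoto(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two feature sets.

    Returns a value in [0, 1]; 1.0 means the feature sets are identical.
    Two empty sets are treated as identical (a convention chosen here).
    """
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)
```

A designed molecule whose feature set shares more elements with that of the reference molecule scores closer to 1; a minimising optimiser would typically use 1 minus this value as the objective.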

Page 230: CKannas PhD Thesis Slides

Use Case 2: Starting Dataset

Starting Molecules dataset:

Molecules retrieved from ZINC15,
Applied filters:
  Clean (Substances with “clean” reactivity),
  In-vitro (Substances reported or inferred active at 10 uM or better in direct binding assays) and
  Now (Immediate delivery, includes in-stock and agent).

The collection contains 7035 molecules.

Page 231: CKannas PhD Thesis Slides

Use Case 2: Results - In objective space

Page 232: CKannas PhD Thesis Slides

Use Case 2: Results - Designed molecules

Page 233: CKannas PhD Thesis Slides

Use Case 2: Results - AutoDock Vina docking

Molecule Id | Docking Affinity (kcal/mol)
DnD 42 SP 194 48 X 96b | -10.1
DnD 17 SP 199 49 M 4 | -10
DnD 33 SP 189 47 X 66b | -9.9
DnD 48 SP 193 48 M 5 | -9.6
Tamoxifen | -8.2
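
To read the docking results relative to the reference compound, the affinities above can be compared with that of Tamoxifen. The short sketch below does this arithmetic; the values are copied from the table, the function name is hypothetical, and it relies on the AutoDock Vina convention that a more negative affinity indicates stronger predicted binding.

```python
from typing import Dict

REFERENCE = "Tamoxifen"

# Docking affinities in kcal/mol, as listed in the table.
affinities: Dict[str, float] = {
    "DnD 42 SP 194 48 X 96b": -10.1,
    "DnD 17 SP 199 49 M 4": -10.0,
    "DnD 33 SP 189 47 X 66b": -9.9,
    "DnD 48 SP 193 48 M 5": -9.6,
    "Tamoxifen": -8.2,
}


def gain_over_reference(scores: Dict[str, float], reference: str) -> Dict[str, float]:
    """Difference to the reference affinity; negative means stronger predicted binding."""
    ref = scores[reference]
    return {
        name: round(score - ref, 1)
        for name, score in scores.items()
        if name != reference
    }


gains = gain_over_reference(affinities, REFERENCE)
```

All four designed molecules come out with a negative difference, between 1.4 and 1.9 kcal/mol below Tamoxifen's score.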

Page 234: CKannas PhD Thesis Slides

Use Case 2: Results - Self-Adaptive MOEA non dominated settings for eMEGA

Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
0.02707 | 0.97973 | tournament | genotype | 0.983 | 0.153 | 1
0.02758 | 0.97965 | tournament | phenotype | 0.988 | 0.152 | 1

Note: The numbers for ’Non Dominated %’ are 1 minus the actual %. The smaller the number listed here the better. ’Rank’ is their non dominance rank.

Page 235: CKannas PhD Thesis Slides

Use Case 2: Docked designed molecules (1)

Figure: Designed molecule DnD 42 SP 194 48 X 96b docked to ER-α.

Page 236: CKannas PhD Thesis Slides

Use Case 2: Docked designed molecules (2)

Figure: Designed molecule DnD 17 SP 199 49 M 4 docked to ER-α.

Page 237: CKannas PhD Thesis Slides

Use Case 2: Docked designed molecules (3)

Figure: Designed molecule DnD 33 SP 189 47 X 66b docked to ER-α.

Page 238: CKannas PhD Thesis Slides

Use Case 2: Docked designed molecules (4)

Figure: Designed molecule DnD 48 SP 193 48 M 5 docked to ER-α.

Page 239: CKannas PhD Thesis Slides

Use Case 3: Docked designed molecules (1)

Figure: Designed molecule DnD 31 SP 194 48 M 49 docked to ER-α.

Page 240: CKannas PhD Thesis Slides

Use Case 3: Docked designed molecules (2)

Figure: Designed molecule DnD 34 SP 197 49 X 13a docked to ER-α.

Page 241: CKannas PhD Thesis Slides

Use Case 4: Docked designed molecules (1)

Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to Proteasome B5.

Page 242: CKannas PhD Thesis Slides

Use Case 4: Docked designed molecules (2)

Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to Proteasome B5.

Page 243: CKannas PhD Thesis Slides

Use Case 4: Docked designed molecules (3)

Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to Proteasome B5.