
Fiorella Ruggiu, Gilles Marcou, Alexandre Varnek & Dragos Horvath · 2017-06-27 · Laboratoire d’InfoChimie

Fiorella Ruggiu, Gilles Marcou, Alexandre Varnek & Dragos Horvath

Laboratoire d’InfoChimie

UMR 7177 CNRS – Université de Strasbourg

Institut de Chimie, 4, rue Blaise Pascal, 6700 Strasbourg, FR


Fragment Descriptors – a Gold Standard in Drug Design

O-C*C*C-N: 1, O-C*N*C-N: 1, …

Example: ISIDA Sequence and Augmented Atom counts

Pros: Open-ended, comprehensive & intuitive capture of structural information (atoms & bonds, scaffold-oriented)

Cons: Atom symbols are not informative of the actual chemical context of the atom. Context information is not lost, but dispersed…

Insensitive to the actual ionization status
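Sequence counts of this kind can be illustrated with a short path enumeration. This is a minimal sketch, not the ISIDA code: it assumes a molecule given as a list of atom symbols plus `(i, j, bond_symbol)` tuples, counts every simple path of `min_atoms` to `max_atoms` atoms once, and keys it by the lexicographically smaller of its two reading directions (an assumed canonical form). The function name `sequence_counts` is hypothetical.

```python
from collections import Counter


def sequence_counts(atoms, bonds, min_atoms=2, max_atoms=4):
    """Count linear fragments (simple paths) labelled with atom and bond symbols.

    atoms: atom labels, e.g. ["N", "C", "C", "O"]
    bonds: (i, j, symbol) tuples, e.g. "-" single, "=" double, "*" aromatic
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, b in bonds:
        neighbours[i].append((j, b))
        neighbours[j].append((i, b))

    def label(path, path_bonds):
        # Interleave atom and bond symbols, e.g. "O-C*C*C-N"
        parts = [atoms[path[0]]]
        for b, a in zip(path_bonds, path[1:]):
            parts.append(b + atoms[a])
        return "".join(parts)

    counts = Counter()

    def extend(path, path_bonds):
        n = len(path)
        # Every path is reached once from each end; keep one orientation only
        if n >= min_atoms and (n == 1 or path[0] < path[-1]):
            forward = label(path, path_bonds)
            backward = label(path[::-1], path_bonds[::-1])
            counts[min(forward, backward)] += 1
        if n < max_atoms:
            for nxt, b in neighbours[path[-1]]:
                if nxt not in path:  # simple paths: no atom visited twice
                    extend(path + [nxt], path_bonds + [b])

    for start in range(len(atoms)):
        extend([start], [])
    return counts


# Heavy atoms of 2-aminoethanol, N-C-C-O, fragments of 2 to 3 atoms
print(sequence_counts(["N", "C", "C", "O"],
                      [(0, 1, "-"), (1, 2, "-"), (2, 3, "-")], 2, 3))
```

Because each path is counted as a separate occurrence, a ring yields several identical labels (both ways around), which is how a count above 1 arises for a given sequence.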


Fuzzy pH-dependent Pharmacophore Triplets (FPT)

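[Figure: the pharmacophore triplets of a molecule, with their edge lengths, accumulated into a fixed-length vector of basis-triplet occupancies, e.g. 0 0 0 … 0 0 … +6 … +3 … 0 …]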

Di(m) = total occupancy of basis triplet i in molecule m.

Pros: Explicit labeling of groups by their physico-chemical nature, pH-sensitive, fuzzy (scaffold hopping)

Cons: Fixed format/sizes


Property-Labeled Fragments: Combining the best of two worlds

Open-ended enumeration of linear (Sequences) or branched (Augmented Atoms) fragments, while labeling atoms by their context-dependent physico-chemical properties:

Pharmacophore Type

Gasteiger Charge-based Topological Potential

logP contributions (in progress)

pH-dependence is achieved by fingerprinting each populated microspecies, then returning the population-weighted average fingerprint

Fuzziness supported by the use of “wildcard” atoms


Workflow

ChemAxon-Readable Compound Set → Type (ChemAxon API-based Java tool) → Annotated .sd file with typing info → FragType (Free Pascal program) → Fragment Counts


The Typing Tool

For each molecule:
    Submit to Standardizer
    Submit to pKa plugin
    For each µspecies of population level > 1%:
        Submit to pmapper → Pharmacophore Types (aRomatic, Hydrophobic, Acceptor, Donor, Negative, Positive)
        Submit to pmapper → Force Field Types (CVFF types)
        Submit to charge plugin → Gasteiger Charges → Electrostatic Potential Flags
    End µspecies loop
    Add population value & flag strings as new property fields of the molecule
    Write molecule with property fields
End loop

If an exception occurs, retry without TakeResonantStructures.


Annotated .sd file with typing info (excerpt, as written by Type and read by FragType):

....
M  END
> <POP1>
95

> <PHTYP1>
H;H;A;D;R;R;R;R;A/D;R;R

> <FFTYP1>
c3;c';o';n;cp;cp;cp;cp;oh;cp;cp

> <EPTYP1>
0;n/0;N;0;n/0;n/0;n/0;n/0;N;n/0;n/0

> <POP2>
5

> <PHTYP2>
H;H;A;D;R;R;R;R;A/N;R;R
....
$$$$
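Reading these fields back into per-atom types is straightforward. A minimal sketch, assuming standard SD data items (a `> <NAME>` header line, the value on the following line, a blank line to close the item), one numbered field set per microspecies (`POPk`, `PHTYPk`, `EPTYPk`, …), `;` between atoms and `/` between alternative types of one atom. The function names are hypothetical and the molfile block is omitted.

```python
import re

# A data-item header such as "> <POP1>"; the field name is captured
TAG = re.compile(r"^>.*<([^>]+)>")


def parse_sd_fields(record):
    """Return {field_name: value} for the data items of one SD record."""
    fields, name, lines = {}, None, []
    for line in record.splitlines():
        match = TAG.match(line)
        if match:
            name, lines = match.group(1), []
        elif name is not None:
            if line.strip() == "" or line.startswith("$$$$"):
                fields[name] = "\n".join(lines)  # blank line ends the item
                name = None
            else:
                lines.append(line)
    if name is not None:
        fields[name] = "\n".join(lines)
    return fields


def microspecies(fields, scheme="PHTYP"):
    """(population, per-atom type alternatives) for microspecies 1, 2, ...

    ';' separates atoms and '/' separates alternative types of one atom.
    """
    species, k = [], 1
    while f"POP{k}" in fields and f"{scheme}{k}" in fields:
        types = [t.split("/") for t in fields[f"{scheme}{k}"].split(";")]
        species.append((float(fields[f"POP{k}"]), types))
        k += 1
    return species


EXAMPLE = """\
M  END
> <POP1>
95

> <PHTYP1>
H;H;A;D;R;R;R;R;A/D;R;R

> <EPTYP1>
0;n/0;N;0;n/0;n/0;n/0;n/0;N;n/0;n/0

> <POP2>
5

> <PHTYP2>
H;H;A;D;R;R;R;R;A/N;R;R

$$$$
"""

species = microspecies(parse_sd_fields(EXAMPLE))
for population, types in species:
    print(population, ["/".join(t) for t in types])
```

Keeping the alternatives as lists (rather than the raw `A/D` string) makes it easy to expand every type combination later, when fragments are labeled.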


Topological electrostatic potentials

For each atom i, a potential V_i is computed from the partial charges q_j of the atoms j, their topological distances d_ij to atom i, and the “own field distance” d0 (0.4).

Depending on V_i, atom i is classified as N (strongly negative), n (negative), 0 (neutral), p (positive) or P (strongly positive); atoms in the transition zones between two classes receive both labels:

V < -0.32: N
-0.32 to -0.28: N/n
-0.28 to -0.12: n
-0.12 to -0.08: n/0
-0.08 to 0.08: 0
0.08 to 0.12: 0/p
0.12 to 0.28: p
0.28 to 0.32: p/P
V > 0.32: P
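The classification step alone can be sketched as a threshold lookup on the scale above. This assumes V_i has already been computed (the potential formula itself is not reproduced here) and that a value falling exactly on a boundary goes to the upper interval, which the slide does not specify. `ep_flag` and `ep_types` are hypothetical names.

```python
import bisect

# Class boundaries read off the V scale (ascending) and the labels between
# them; fuzzy labels such as "N/n" mark the transition zones.
EP_BOUNDS = [-0.32, -0.28, -0.12, -0.08, 0.08, 0.12, 0.28, 0.32]
EP_LABELS = ["N", "N/n", "n", "n/0", "0", "0/p", "p", "p/P", "P"]


def ep_flag(v):
    """Electrostatic potential flag for an atomic potential value v.

    Assumption: a value exactly on a boundary goes to the upper interval.
    """
    return EP_LABELS[bisect.bisect_right(EP_BOUNDS, v)]


def ep_types(v):
    """Split a fuzzy flag into the atom types it stands for, e.g. "n/0" -> ["n", "0"]."""
    return ep_flag(v).split("/")


for v in (-0.40, -0.30, -0.10, 0.00, 0.30):
    print(v, ep_flag(v), ep_types(v))
```

An atom flagged `n/0` then contributes to fragments both as `n` and as `0`, which is how the fuzziness propagates into the fingerprint.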


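Molecular Fingerprint

Microspecies-Specific Labeling of Fragments…

[Figure: the same molecule drawn as its two microspecies, populated at 95% and 5%; the atom typed A/D in the major microspecies is typed N in the minor one.]

Fragments of the 95% microspecies: A-R*R*R*R-D +95, D-R*R*R*R-D +95, A-R*R*R*R-D +95, D-R*R*R*R-D +95, A-R*R*A*R-D +95, D-R*R*A*R-D +95, …

Fragments of the 5% microspecies: N-R*R*R*R-D +5, N-R*R*A*R-D +5, …

Each µspecies increments the counters of the fragments it contains by its population level. Lower and upper fragment sizes are user-defined.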
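The population-weighted counting can be sketched as follows. Assumptions (not from the slides): the paths are enumerated beforehand and already restricted to the user-defined size range, every combination of alternative types on a path produces one labeled fragment, and a path and its reverse share the lexicographically smaller key (FragType's own canonical form may differ, e.g. `N-R*R*R*R-D` rather than `D-R*R*R*R-N`). With populations in percent, as on the slide (+95, +5), the counts are roughly 100 times the population-weighted average fingerprint. Function names are hypothetical.

```python
from collections import Counter
from itertools import product


def fragment_key(types, bonds):
    """Join one type per atom with bond symbols; a path and its reverse share one key."""
    def join(ts, bs):
        return ts[0] + "".join(b + t for b, t in zip(bs, ts[1:]))
    return min(join(types, bonds), join(types[::-1], bonds[::-1]))


def microspecies_fingerprint(species, paths):
    """Population-weighted fragment counts over all microspecies.

    species: (population_percent, atom_types) pairs; atom_types[i] lists the
             alternative types of atom i, e.g. ["A", "D"] for an "A/D" atom.
    paths:   (atom_indices, bond_symbols) pairs, already restricted to the
             lower/upper fragment sizes (hypothetical input format).
    """
    fp = Counter()
    for population, atom_types in species:
        for atoms, bonds in paths:
            # Each combination of alternative types yields one labelled
            # fragment, whose counter grows by the microspecies population.
            for combo in product(*(atom_types[a] for a in atoms)):
                fp[fragment_key(list(combo), list(bonds))] += population
    return fp


# Toy 3-atom path: atom 0 is A/D in the 95% species and N in the 5% species
species = [(95, [["A", "D"], ["R"], ["D"]]),
           (5, [["N"], ["R"], ["D"]])]
paths = [([0, 1, 2], ["-", "-"])]
print(microspecies_fingerprint(species, paths))
```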


Sequencing Options: (1) The Bond Information Toggle

With bond info: A-R*R*R*R-D, D-R*R*R*R-D, A-R*R*A*R-D, D-R*R*A*R-D, …

Without bond info: ARRRRD, DRRRRD, ARRARD, DRRARD

The user may decide to capture (-b flag) or ignore bond information.


Sequencing Options: (2) Wildcard atoms for Fuzziness Control

Strict typing: ARRRRD, DRRRRD

With the wildcard option (-w flag), non-terminal sequence atoms are also matched by the generic wildcard type “?”.

Wildcards allowed: ARRRRD, A?RRRD, AR?RRD, ARR?RD, ARRR?D, A??RRD, A?R?RD, A?RR?D, AR??RD, AR?R?D, ARR??D, AR???D, A??R?D, …, A????D, …, DRRRRD, D?RRRD, …

Pair counts (e.g. A4D) may be explicitly (and exclusively) generated by the fragmentor.
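The wildcard expansion amounts to replacing every subset of the non-terminal atoms by “?”, so a sequence with k inner atoms yields 2^k variants (16 for ARRRRD). A minimal sketch, assuming bond-free sequences with one character per atom; `wildcard_variants` is a hypothetical name. The fully wildcarded variant A????D keeps only the two terminal types and the path length, which appears to be the information the pair A4D encodes.

```python
from itertools import combinations


def wildcard_variants(sequence):
    """All variants of a bond-free sequence (one character per atom) in which
    any subset of the non-terminal atoms is replaced by the wildcard '?'."""
    atoms = list(sequence)
    inner = range(1, len(atoms) - 1)  # terminal atoms are never wildcarded
    variants = []
    for k in range(len(inner) + 1):
        for positions in combinations(inner, k):
            v = atoms[:]
            for p in positions:
                v[p] = "?"
            variants.append("".join(v))
    return variants


v = wildcard_variants("ARRRRD")
print(len(v), v[:6], v[-1])
```

Two different strict sequences (e.g. ARRRRD and ARRARD) share several wildcard variants, which is what makes the resulting counts fuzzy.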


Augmented Atoms…

Branched fragments, representing an atom and (a user-defined number of) its successive coordination spheres.

Strict typing with bond info (-b): D(-R(*R)*R)(-H(-H)=A), D(-R(*R)*A)(-H(-H)=A)

Strict typing, no bond info: D(R(R)R)(H(H)A), D(R(R)A)(H(H)A)

All but central and terminal atoms may be wildcards (-b -w): D(-R(*R)*R)(-H(-H)=A), D(-?(*R)*R)(-H(-H)=A), D(-?(*R)*A)(-H(-H)=A)

“Tree” descriptors have wildcards for all but central & terminal atoms: D(-?(*R)*A)(-?(-H)=A)…
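A branched fragment of this notation can be sketched as a recursive string builder. Assumptions (not from the slides): the children of an atom are its neighbors lying exactly one coordination sphere further from the central atom, every branch of the central atom is parenthesized while other atoms follow a SMILES-like “all but the last branch in parentheses” rule, and branches are sorted lexicographically as a canonical order. Because of that ordering, the output may list branches in a different order than the slide, e.g. D(-H(-H)=A)(-R(*R)*R). `augmented_atom` is a hypothetical name.

```python
from collections import deque


def augmented_atom(types, bonds, center, radius, use_bonds=True):
    """Branched fragment: the central atom plus `radius` coordination spheres.

    types: one type per atom; bonds: (i, j, symbol) tuples.
    """
    neighbours = {i: [] for i in range(len(types))}
    for i, j, b in bonds:
        neighbours[i].append((j, b))
        neighbours[j].append((i, b))

    # Topological distance of every atom from the central atom (BFS)
    dist = {center: 0}
    queue = deque([center])
    while queue:
        a = queue.popleft()
        for n, _ in neighbours[a]:
            if n not in dist:
                dist[n] = dist[a] + 1
                queue.append(n)

    def branch(atom, depth):
        # Children are neighbours lying in the next coordination sphere
        kids = sorted(
            (b if use_bonds else "") + branch(n, depth + 1)
            for n, b in neighbours[atom]
            if depth < radius and dist[n] == depth + 1
        )
        if depth == 0:  # central atom: every branch in parentheses
            return types[atom] + "".join(f"({k})" for k in kids)
        if not kids:
            return types[atom]
        # SMILES-like: all but the last branch in parentheses
        return types[atom] + "".join(f"({k})" for k in kids[:-1]) + kids[-1]

    return branch(center, 0)


# Typed amide fragment: D(0) bonded to ring atom R(1) and to H(4);
# R(1) has two aromatic neighbours R(2), R(3); H(4) bears H(5) and =A(6).
types = ["D", "R", "R", "R", "H", "H", "A"]
bonds = [(0, 1, "-"), (1, 2, "*"), (1, 3, "*"),
         (0, 4, "-"), (4, 5, "-"), (4, 6, "=")]
print(augmented_atom(types, bonds, 0, 2))
print(augmented_atom(types, bonds, 0, 2, use_bonds=False))
```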


FragType in a Nutshell

Uses atom typing schemes (symbols, pharmacophore types, electrostatic potential; others to follow), where one atom may represent several types.

Several atom typing schemes are allowed for a molecule (µspecies-specific, weighted by population level at the considered pH).

The ChemAxon API is a versatile atom typing tool.

Generates either sequences or augmented atoms, which may include or ignore bond information.

With the wildcard option, all the generic fragments ignoring one or more atom types are also counted:

Sequences ↔ Fuzzy wildcard sequences ↔ Topological Pairs

Augmented Atoms ↔ Fuzzy branched fragments ↔ Trees


(1) Neighborhood Behavior

Based on a core of 2500 molecules (all the ~200 actives, completed with randomly picked inactives) of a combinatorial library based on an Ugi synthesis.

Experimental screen against 5 proteases (Chymotrypsin, Factor Xa, Trypsin, Tryptase, Urokinase-type Plasminogen Activator).

For each active M (pIC50 ≥ 4.9) of each target T, the Local Ascertained Optimality was calculated around M, according to different descriptor spaces {descriptor D, dissimilarity metric S}.

[Figure: for each active M1, M2, M3, …, one optimality excess value X per descriptor-metric combination D1-S1, D1-S2, …, D1-Sn, D2-S1, D2-S2, …, Dk-Sn.]


Descriptor Dimension Descriptor Dimension

pairEP28 49 treeSY03 744

pairSY28 110 seqbPH25 751

seqSY25 123 aabPH02 784

pairPH28 169 seqwPH25 1311

aaSY02 249 treePH03 2201

seqEP25 268 seqEP37 2209

seqSY37 293 seqbEP25 2691

aabSY02 358 aaPH03 3409

seqbSY25 363 aabPH03 3716

seqwSY25 385 aaEP02 3785

seqPH25 443 aabEP02 6704

seqwEP25 566 treeEP03 6761

aaPH02 698 aaEP03 41667

Benchmarked descriptors: the “New” fragment descriptors listed above with their dimensions, and the following “Classical” ones:

Pharmacophore Pairs: CATS (Prof. G. Schneider et al.)

ChemAxon PF

Pharmacophore Triplets: pH-sensitive & rule-based

3D Pharmacophore descriptors: LIQUIDS (Prof. G. Schneider et al.)

SEL: subspace of relevant Tryptase QSAR descriptors

DPRED: predicted Tryptase pIC50


Dissimilarity Metrics

Six dissimilarity metrics were obtained by combining:

Two descriptor rescaling schemes: Z-transformation (Avg/Var rescaling) or no rescaling.

Three distance formulas: Euclidean, Dice, and binary block (FDIFF), computed from the descriptor values D_i(m) and D_i(M) of molecules m and M, e.g.:

$$S_{Eucl}(m,M) = \sqrt{\sum_i \left[D_i(m) - D_i(M)\right]^2}$$

$$S_{Dice}(m,M) = 1 - \frac{2\sum_i D_i(m)\,D_i(M)}{\sum_i D_i(m)^2 + \sum_i D_i(M)^2}$$
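The rescaling and two of the three distance formulas can be sketched in plain Python. This assumes the Z-transformation uses the mean and population standard deviation of each descriptor over the compound set, and uses the common continuous forms of the Euclidean distance and Dice dissimilarity; FDIFF is not reproduced. Function names are hypothetical.

```python
import math


def z_transform(rows):
    """Rescale every descriptor (column) to zero mean and unit variance over the set."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(c) / n for c in columns]
    # Constant descriptors would give a zero deviation; keep them unscaled
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def euclidean(dm, dM):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dm, dM)))


def dice_dissimilarity(dm, dM):
    """1 - 2*sum(a*b) / (sum(a^2) + sum(b^2)); 0 for two all-zero vectors."""
    num = 2.0 * sum(a * b for a, b in zip(dm, dM))
    den = sum(a * a for a in dm) + sum(b * b for b in dM)
    return 1.0 - num / den if den else 0.0


# Six metrics = {raw, z-transformed} x {Euclidean, Dice, FDIFF}; FDIFF omitted
X = [[1, 2, 0], [1, 0, 0], [3, 2, 4]]
Z = z_transform(X)
print(euclidean(X[0], X[1]), dice_dissimilarity(X[0], X[1]))
print(euclidean(Z[0], Z[1]), dice_dissimilarity(Z[0], Z[1]))
```

Rescaling changes which descriptors dominate a distance: without it, high-count fragments weigh most; after it, every descriptor contributes on a comparable scale.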


Formula One Descriptor Grand Prix

[Figure: for each active molecule M1, M2, M3, …, the excess values X obtained with every descriptor-metric combination (D1-S1, D1-S2, …, D1-Sn, D2-S1, D2-S2, …, Dk-Sn) are turned into descriptor ranks Rank(D1), Rank(D2), …, Rank(Dk).]

If X*(D1) > X*(D2), then D1 beats D2: Rank(D1) is better than Rank(D2).

One Active Molecule = One ‘Grand Prix’ race

Championship points per race: #1: 10 points, #2: 6 points, #3 to #6: 4, 3, 2 and 1 points, respectively.

One Target = One ‘Grand Prix’ Championship, listing for each descriptor D its championship points, championship rank and Local Optimality Scores.
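The championship bookkeeping can be sketched from the points scale above (10, 6, 4, 3, 2, 1 for places #1 to #6). The sketch assumes each race is given as a list of descriptor names ordered from best to worst and ignores ties; the function name and the race data below are hypothetical.

```python
from collections import defaultdict

POINTS = [10, 6, 4, 3, 2, 1]  # places #1 .. #6


def championship(races):
    """Sum championship points and medals over races (one race = one active).

    races: descriptor rankings, best first; ties are not handled.
    """
    points = defaultdict(int)
    medals = defaultdict(lambda: [0, 0, 0])  # gold, silver, bronze
    for ranking in races:
        for place, descriptor in enumerate(ranking[:len(POINTS)]):
            points[descriptor] += POINTS[place]
            if place < 3:
                medals[descriptor][place] += 1
    return dict(points), {d: tuple(m) for d, m in medals.items()}


# Made-up rankings for two actives (two 'races')
races = [["aaSY02", "seqSY25", "treeSY03", "seqSY37"],
         ["seqSY25", "aaSY02", "treePH03"]]
totals, podiums = championship(races)
print(sorted(totals.items(), key=lambda kv: -kv[1]))
print(podiums)
```

Descriptors placed #4 to #6 still collect points without medals, which is why a descriptor can hold more points than its gold/silver/bronze counts alone would give.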


Chymotrypsin Championship (12 actives, i.e. Grand Prix races)

RANK Descriptor #Gold #Silver #Bronze POINTS Avg. Opt

1 aaSY02 4 2 1 62 0.24

2 seqSY25 2 1 1 42 0.23

3 treeSY03 1 1 1 32 0.23

4 seqSY37 0 2 3 30 0.23

5 aabSY02 0 2 2 29 0.22

6 seqbSY25 1 2 0 27 0.21

7 seqwSY25 0 0 1 8 0.21

8 pairSY28 1 0 0 13 0.2

9 treePH03 2 1 0 26 0.18

10 aabPH02 0 0 1 7 0.17

11 aaPH02 0 0 0 3 0.16

12 aabPH03 0 0 0 1 0.16

13 aaPH03 0 0 0 0 0.16

14 seqbPH25 0 0 0 0 0.14

15 seqPH25 0 0 0 0 0.14


Factor Xa Championship (81 actives, i.e. Grand Prix races)

RANK Descriptor #Gold #Silver #Bronze POINTS Avg. Opt

1 DPRED 35 11 8 463 0.37

2 SEL 19 26 5 389 0.33

3 CATS-P1 10 7 11 212 0.29

4 FPT-nopK 1 3 6 104 0.29

5 FPT1 0 1 5 58 0.27

6 treeEP03 2 1 4 89 0.26

7 PF 0 3 4 76 0.26

8 aabPH03 0 2 6 51 0.25

9 seqbEP25 0 4 3 47 0.25

10 aaPH03 0 2 3 47 0.25

11 CATS-P2 0 4 1 43 0.25

12 CATS-R1 0 0 2 42 0.25

13 aabEP02 0 0 0 16 0.25

14 CATS-A1 0 3 2 49 0.24

15 treePH03 2 0 1 37 0.24


Trypsin Championship (3 actives, i.e. Grand Prix races)

RANK Descriptor #Gold #Silver #Bronze POINTS Avg. Opt

1 aaPH03 1 0 0 11 0.22

2 treePH03 0 1 1 10 0.22

3 aabPH03 0 0 1 4 0.22

4 treeSY03 0 1 0 10 0.21

5 SEL 1 0 0 10 0.2

6 aabEP02 1 0 0 10 0.19

7 aaEP03 0 1 0 8 0.19

8 treeEP03 0 0 1 5 0.19

9 CATS-P1 0 0 0 3 0.19

10 seqbEP25 0 0 0 2 0.19

11 aaSY02 0 0 0 2 0.19

12 aabPH02 0 0 0 0 0.19

13 aaPH02 0 0 0 0 0.19

14 aaEP02 0 0 0 3 0.18

15 CATS-P2 0 0 0 0 0.18


Tryptase Championship (100 actives, i.e. Grand Prix races)

RANK Descriptor #Gold #Silver #Bronze POINTS Avg. Opt

1 DPRED 63 3 1 656 0.35

2 treeSY03 2 16 10 196 0.26

3 aabSY02 0 16 12 188 0.26

4 aaSY02 2 5 15 170 0.26

5 aabEP02 1 8 5 100 0.25

6 treeEP03 2 5 6 97 0.25

7 pairPH28 0 5 6 83 0.25

8 seqwPH25 1 2 3 74 0.24

9 seqPH25 3 1 2 65 0.24

10 aaPH02 1 3 0 46 0.24

11 treePH03 4 0 0 43 0.24

12 aabPH02 0 1 2 31 0.24

13 SEL 0 2 5 46 0.23

14 seqbPH25 1 1 3 44 0.23

15 seqbSY25 1 0 2 39 0.23


UPA Championship (11 actives, i.e. Grand Prix races)

RANK Descriptor #Gold #Silver #Bronze POINTS Avg. Opt

1 SEL 3 0 0 32 0.28

2 CATS-P1 4 0 0 43 0.26

3 CATS-P2 0 3 2 26 0.26

4 CATS-A1 0 1 2 17 0.24

5 CATS-P3 0 0 0 8 0.24

6 FPT-nopK 2 1 0 30 0.23

7 treeEP03 1 0 0 15 0.23

8 CATS-P4 0 0 0 6 0.23

9 aabEP02 1 1 0 16 0.22

10 aaEP03 0 0 1 5 0.22

11 treePH03 0 0 0 2 0.22

12 CATS-R2 0 0 0 2 0.22

13 CATS-A2 0 0 0 2 0.22

14 treeSY03 0 1 0 9 0.21

15 CATS-R1 0 0 1 7 0.21


Overall Ranking…

Descriptor  Ranks with the five targets, from best to worst  Average Rank  Rank Std. Deviation

treeSY03 2 3 4 14 18 8.20 6.52

treeEP03 6 6 7 8 21 9.60 5.75

treePH03 2 9 11 11 15 9.60 4.27

aabEP02 5 6 9 13 19 10.40 5.12

SEL 1 2 5 13 37 11.60 13.38

aabPH03 3 8 12 17 19 11.80 5.84

aaPH03 1 10 13 18 20 12.40 6.71

aaSY02 1 4 11 21 32 13.80 11.41

aaEP03 7 10 16 25 26 16.80 7.68

aabSY02 3 5 16 31 31 17.20 12.11

aaEP02 14 17 17 19 20 17.40 2.06

seqbEP25 9 10 16 18 34 17.40 8.98

FPT-nopK 4 6 23 28 28 17.80 10.63

aabPH02 10 12 12 20 43 19.40 12.29

aaPH02 10 11 13 23 44 20.2 12.77
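The two summary columns can be reproduced from the five per-target ranks as the mean and the population standard deviation (divisor n) of each row; for treeSY03, for instance, ranks 2, 3, 4, 14 and 18 give 8.20 and 6.52. A short check, with a hypothetical helper name:

```python
import statistics


def rank_summary(ranks):
    """Average rank and population standard deviation (divisor n) across targets."""
    return statistics.mean(ranks), statistics.pstdev(ranks)


for name, ranks in {"treeSY03": [2, 3, 4, 14, 18],
                    "aaEP02": [14, 17, 17, 19, 20]}.items():
    avg, sd = rank_summary(ranks)
    print(f"{name} {avg:.2f} {sd:.2f}")
```

A low average with a low spread (e.g. aaEP02's 2.06) marks a descriptor that is consistently mid-table, whereas a large spread (e.g. SEL) flags one that excels on some targets and fails on others.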


(2) QSAR – External Validation

logP SQS linear consensus models:

Trained on 3225 molecules

Validated on 9677 compounds from the PhysProp database

Descriptor R²

aaSY02 0.8188

treePH03 0.8148

aaPH02 0.8109

treeSY03 0.7987

seqPH25 0.7891

seqSY37 0.7245

pairSY28 0.6981

pairPH28 0.6788

treeEP03 0.1169

aaEP02 0.0393

seqEP37 -0.1632

seqwEP25 -0.9547

pairEP28 -1.3567

seqEP25 -2.0424
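Negative R² values in an external validation are possible if R² is computed as the usual coefficient of determination on the validation set, 1 − SS_res/SS_tot: a model whose predictions are worse than simply predicting the mean of the validation set scores below zero. A sketch under that assumption (the slides do not state the definition used); the function name is hypothetical.

```python
def external_r2(observed, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of the validation set."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


print(external_r2([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # close to 1
print(external_r2([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # systematic offset: negative
```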


QSAR – External Validation

hERG categorical consensus models:

Trained on 562 molecules (courtesy T. Oprea, UNM)

Validated on 1889 PubChem molecules

Descriptor  Well-classified inactives: fraction, Nr. out of 1698  Well-classified actives: fraction, Nr. out of 191  Balanced Accuracy

aaSY02 0.65 1104 0.72 137 0.68

seqPH25 0.62 1058 0.73 140 0.68

treePH03 0.76 1288 0.59 113 0.68

aaPH03 0.68 1150 0.67 128 0.67

seqPH37 0.7 1192 0.63 121 0.67

seqSY25 0.64 1095 0.69 132 0.67

treeSY03 0.66 1125 0.67 128 0.67

aaPH02 0.69 1164 0.63 121 0.66

pairEP28 0.55 927 0.76 146 0.66

pairPH28 0.63 1078 0.68 130 0.66

seqSY37 0.69 1170 0.6 115 0.65

pairSY28 0.62 1053 0.66 127 0.64

seqEP37 0.88 1492 0.25 47 0.56

seqEP25 0.96 1624 0.08 15 0.52

treeEP03 0.95 1615 0.09 17 0.52

aaEP02 0.95 1613 0.04 7 0.49
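The Balanced Accuracy column is consistent with the mean of the two per-class fractions, e.g. for aaSY02: (1104/1698 + 137/191)/2 ≈ 0.68. It matters here because inactives outnumber actives roughly 9 to 1: aaEP02 classifies 95% of the inactives correctly yet reaches only 0.49. A short check, with a hypothetical helper name:

```python
def balanced_accuracy(correct_inactives, n_inactives, correct_actives, n_actives):
    """Mean of the per-class fractions of well-classified compounds."""
    return (correct_inactives / n_inactives + correct_actives / n_actives) / 2


print(round(balanced_accuracy(1104, 1698, 137, 191), 2))  # aaSY02 row
print(round(balanced_accuracy(1613, 1698, 7, 191), 2))    # aaEP02 row
```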


Conclusions

Pharmacophore-colored tree descriptors seem to be the most versatile ones:

they score well in both NB tests, against other colored fragments but also against other pharmacophore terms;

they also score well in the two QSAR studies, against other colored fragments.

Various symbol- and pharmacophore-colored augmented atoms, sequences and pairs were also quite successful in QSAR and reasonably steady in NB tests.

Electrostatic potential-colored descriptors failed in QSAR, but some were useful NB monitors. Why?


Molecular Similarity & Neighborhood Behavior…

• In chemoinformatics, molecular dissimilarity is a metric (distance) S(m,M) between the points m and M representing compounds in a descriptor space (DS).

• The concept of Neighborhood Behavior* (NB) in a DS is the quantitative (statistical) equivalent of the Similarity Principle:

– If the probability to pick a pair of compounds with similar activity levels increases with decreasing S(m,M), then this space and its metric are said to display significant NB with respect to the considered activity.

* Patterson, D.E., Cramer, R.D., Ferguson, A.M., Clark, R.D., Weinberger, L.E., Neighborhood Behavior: A Useful Concept for Validation of “Molecular Diversity” Descriptors, J. Med. Chem. 1996, 39, 3049-3059


The Similarity Principle

[Figure: molecule pairs (M,m) plotted by property dissimilarity against calculated structural dissimilarity S(m,M), and, for comparison, against some random ranking criterion for pairs (m,M). Structurally similar pairs are True Positives (TP) if their properties are similar, L(m,M) = |P(m) − P(M)| < l, and False Positives (FP) if their properties differ, L(m,M) ≥ l; structurally dissimilar pairs are Potentially (!) False Negatives (FN) if their properties are similar, and True Negatives (TN) otherwise.]


Unfortunately, there is no Absolute Similarity Scale, nor a “Quantum of Chemical Change”

Nr. Sequence Count in M1, M2, M3

1 HDH 1 1 0

2 DHA 1 1 0

3 HHD 1 1 0

4 HHA 1 1 1

5 HHH 4 4 4

6 RHH 2 2 2

7 RRH 4 6 4

8 RRR 6 6 6

9 RRA 2 2 2

10 RAH 2 2 2

11 HAH 1 1 1

12 HDHA 1 1 0

....

Pharmacophore Sequences


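The Optimality Index Ω

Pairs (M,m) are classified by their structural dissimilarity S(M,m), relative to a dissimilarity cutoff s, and by their activity (profile) difference L(M,m), relative to a threshold l:

S(M,m) ≤ s, L(M,m) ≤ l: True Positives (TP)

S(M,m) ≤ s, L(M,m) > l: False Positives (FP)

S(M,m) > s, L(M,m) ≤ l: False (?) Negatives (FN)

S(M,m) > s, L(M,m) > l: True Negatives (TN)

$$\Omega_S(s) = \frac{N^S_{FP}(s) + N^S_{FN}(s)}{\langle N_{FP}(s)\rangle_E + \langle N_{FN}(s)\rangle_E}$$

where N^S_FP(s) and N^S_FN(s) are the numbers of false positives and false negatives obtained with metric S at cutoff s, and ⟨N_FP(s)⟩_E, ⟨N_FN(s)⟩_E the numbers expected for a random selection of pairs. Ω_S(s) ≈ 1.0 means no better than random; lower values indicate better neighborhood behavior.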


Global & Local Optimality

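Global optimality, counted over all pairs*:

$$\Omega^G_S(s) = \frac{N^S_{FP}(s) + N^S_{FN}(s)}{\langle N_{FP}(s)\rangle_E + \langle N_{FN}(s)\rangle_E}$$

Local optimality around molecule M, counted over the pairs (m,M) for all m:

$$\Omega^M_S(s) = \frac{N^S_{FP}(s,M) + N^S_{FN}(s,M)}{\langle N_{FP}(s,M)\rangle_E + \langle N_{FN}(s,M)\rangle_E}$$

Local optimality corresponds to similarity-based Virtual Screening (VS) with query M (active molecule).

* For binding affinities, pairs of inactives should be ignored…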
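Both indices can be computed with one routine applied to different sets of pairs. A minimal sketch, assuming pairs with S ≤ s are the selected ones, L ≤ l means similar activity, and the expected counts are those of a random selection of the same number of pairs (unselected pairs × fraction of similar-activity pairs for FN, selected pairs × fraction of different-activity pairs for FP). The dictionary input format and function names are hypothetical; the remark about ignoring pairs of inactives would be applied as a filter before the call.

```python
def optimality(pairs, s, l):
    """Optimality index Omega(s) = (N_FP + N_FN) / (<N_FP>_E + <N_FN>_E).

    pairs: (S, L) tuples: structural dissimilarity and activity difference.
    Pairs with S <= s are selected; L <= l means similar activity.
    """
    n = len(pairs)
    n_selected = sum(1 for S, _ in pairs if S <= s)
    n_different = sum(1 for _, L in pairs if L > l)
    n_similar = n - n_different
    n_fp = sum(1 for S, L in pairs if S <= s and L > l)
    n_fn = sum(1 for S, L in pairs if S > s and L <= l)
    # Expected counts for a random selection of n_selected pairs
    expected_fp = n_selected * n_different / n
    expected_fn = (n - n_selected) * n_similar / n
    denominator = expected_fp + expected_fn
    return (n_fp + n_fn) / denominator if denominator else float("nan")


def local_pairs(pair_table, M):
    """(S, L) values of the pairs (m, M) involving the query molecule M.

    pair_table: {(a, b): (S, L)}, a hypothetical input format.
    """
    return [v for (a, b), v in pair_table.items() if M in (a, b)]


table = {("M", "m1"): (0.1, 0.0), ("M", "m2"): (0.2, 0.1),
         ("M", "m3"): (0.8, 2.0), ("M", "m4"): (0.9, 3.0),
         ("m1", "m2"): (0.3, 1.5)}
print(optimality(list(table.values()), 0.5, 0.5))     # global, all pairs
print(optimality(local_pairs(table, "M"), 0.5, 0.5))  # local, around M
```

In this toy table the query M ranks its neighbors perfectly (local Ω = 0), while the pair (m1, m2) is a false positive that raises the global Ω; the local view is what a virtual screening run started from M would experience.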


The Ascertained Optimality Excess X

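[Figure: Ω as a function of the fraction of compound pairs selected at cutoff s, for meaningful S values and for random S values. The Ascertained Optimality Excess X measures how far Ω stays below the values Ω_rand obtained with random S values, taking their variance Var(Ω_rand) into account.]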
