conclusions conclusions - missing values of the principal physico-chemical properties are predicted...

1
CONCLUSIONS CONCLUSIONS - Missing values of the principal physico-chemical properties are predicted by validated regressio using different kinds of molecular descriptors. - Ranking of POPs according to their atmospheric mobility tendency is carried out by means of two approaches: a) Principal Component Analysis of predicted physico-chemical data for long range-transport poten b) Multicriteria Decision Making, taking also into account the atmospheric half-life and the sorp atmospheric particles, for more general environmental behaviour. - POP classifications in 4 mobility classes allow the direct and simple assessment of POP environm behaviour (for both existing and new chemicals) from the molecular structure only. Paola Gramatica , Stefano Pozzi, Federica Consolaro and Roberto Todeschini § QSAR Research Unit, Dep. of Structural and Functional Biology, University of Insubria, via Dunant 3, 21100 Varese (Italy) § Milano Chemometric and QSAR Res. Group, Dep. of Environmental Sciences, University of Milano-Bicocca, via Emanueli 15, 20123 Milano (Italy) e-mail: [email protected] web-site: http://andromeda.varbio.unimi.it/~QSAR INTRODUCTION INTRODUCTION Persistent Organic Pollutants (POPs), particularly PAH, PCB, polychlorinated dibenzo-p-dioxins and some pesticides, are organic compounds that bioaccumulate and resist photolytic, chemical or biological degradation. The discovery that these compounds can move for thousands of kilometres from the point of release, by the so-called “grasshopper effect” (Figure 1), has shown their global distribution behaviour. The POPs capable of exhibiting this long-range transport potential are nowadays the focus of various national and international regulatory initiatives (UNECE, UNEP, etc.) and have been the object of the recent SETAC Pellston Workshop. POP environmental behaviour is clearly controlled by a variety of physical and chemical processes, that can be analyzed with the study of properties such as vapour pressure (vp), Henry’s law constant (H), various partition coefficients (Kow, Koc,...), water solubility (S), atmospheric half-life, etc. Unfortunately, for a large number of POPs, the experimental data for several properties remain unknown, thus regression models, according to the QSAR/QSPR strategies, need to be developed to predict the missing values. A rank of POPs, according to their atmospheric mobility is possible by multivariate approaches (PCA and Multicriteria Decision Making) based on these predicted data. A classification of POPs in mobility classes by few molecular descriptors of chemical structure will allow a fast screening of existing and new compoundsMOLECULAR DESCRIPTORS MOLECULAR DESCRIPTORS The QSAR/QSPR approach is applied to predict the values of several physico-chemical properties (lacking for a large number of POPs) by regression models using different kinds of molecular descriptors for the structural representation of the studied compounds. Molecular descriptors represent the way chemical information contained in the molecular structure is transformed and coded. Among the theoretical descriptors, the best known, obtained simply from the knowledge of the formula, are: molecular weight and count descriptors (1D- descriptors, i. e. counting of bonds, atoms of different kind, presence or counting of functional groups and fragments, etc.). Graph-invariant descriptors (2D-descriptors, including both topological and information indices), are obtained from the knowledge of the molecular topology. WHIM molecular descriptors [1] contain information about the whole 3D-molecular structure in terms of size, symmetry and atom distribution. All these indices are calculated [2] from the (x,y,z)-coordinates of a three-dimensional structure of a molecule, usually from a spatial conformation of minimum energy: 37 non-directional (or global) and 66 directional WHIM descriptors are obtained. A complete set of about two hundred molecular descriptors has been obtained. Being our representation of a chemical based on a number of molecular descriptors, an effective variable selection strategy GA-VSS (Genetic Algorithm - Variable Subset Selection) was applied to the whole set of descriptors in order to set out the most relevant variables in modelling POP properties by Ordinary Least Squares regression (OLS), maximising the predictive power (Q2 LOO) [3]. Models with good predictive performances (Q2 LOO = 78-96%) are obtained for all the physico- chemical properties and the atmospheric half-life, thus achieving reliable data for 87 compounds. [1] Todeschini R. and Gramatica P.; Quant.Struct.-Act.Relat. 1997, 16, 113-119 [2] Todeschini R. - WHIM-3D / QSAR - Software for the calculation of the WHIM descriptors. rel. 4.1 for Windows, Talete srl, Milan (Italy) 1996. Download: http://www.disat.unimi.it/chm. [3] Todeschini R. - Moby Digs - Software for Variable Subset Selection by Genetic Algorithms. Rel. 1.0 for Windows, Talete srl, Milan (Italy) 1997. PRINCIPAL COMPONENT ANALYSIS PRINCIPAL COMPONENT ANALYSIS The biplot of principal component analysis (Figure 2) for 87 POP (Tab. 1), described by the principal physico-chemical properties (boiling point, melting point, logKow, logKoc, Henry’s law constant, TSA, Vmol, water solubility, vapour pressure) and the atmospheric half-life, shows a distribution of compounds along the first component (PC1, EV = 70.4%) according to the global mobility categories assigned by Wania and Mackay[4]. Consequently, it is possible to use the PC1 score values to classify all the 87 POPs in one of the four classes of mobility (high, relatively high, relatively low and low mobility), defined by the marked cut-off. This PCA model doesn’t include the atmospheric half-life, because this property is represented only in the second component, thus it is a model capable to describe the relative long-range transport potential of POPs. [4] Wania F. and Mackay D., Environ. Sci. Technol., Vol. 30, NO. 9, 1996 BIPLOT PC1-PC2 (CUM EV = 88.8 %, PC1 = 70.4%) PC1 PC2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 -4 -3 -2 -1 0 1 2 3 4 -10 -8 -6 -4 -2 0 2 4 6 8 high mobility relat. high relat. low low mobility unknown I) High Mobility II) Relat. high Mobili III) Relat. low mobili IV) Low mobility Half-life Vp Solub. logKow logKoc mp bp Categories of Wania and Mackay Some of the variables represented in the first principal component Not represented in the first PC FIGURE 2 ID COMPOUND CLA CART KNN RDA ID COMPOUND CLA CART KNN RDA 1 indan 1 1 1 1 45 PC B -4 2 2 2 2 2 naphtalene 1 1 1 1 46 PC B -7 2 2 2 2 3 1-methylnaphthalene 1 1 1 2 47 PC B -9 2 2 2 2 4 2-methylnaphthalene 1 1 1 2 48 PC B -11 2 2 2 2 5 1.2-dimethylnaphthalene 2 2 2 2 49 PC B -12 2 2 2 2 6 1.3-dimethylnaphthalene 2 2 2 2 50 PC B -14 2 2 2 2 7 1.4-dimethylnaphthalene 2 2 2 2 51 PC B -15 2 2 2 2 8 1.5-dimethylnaphthalene 2 2 2 2 52 PC B -18 2 2 2 2 9 2.3-dimethylnaphthalene 2 2 2 2 53 PC B -26 2 2 2 2 10 2-6-dim ethylnaphthalene 2 2 2 2 54 PC B -28 2 2 2 2 11 1-ethylnaphthalene 2 2 2 2 55 PC B -29 2 2 2 2 12 2-ethylnaphthalene 2 2 2 2 56 PC B -30 2 2 2 2 13 1.4.5-trim ethylnaphthalene 2 2 2 2 57 PC B -33 2 2 2 2 14 acenaphthene 2 2 2 2 58 PC B -40 2 2 2 2 15 acenaphthylene 2 2 2 2 59 PC B -47 2 2 2 2 16 fluorene 2 2 2 2 60 PC B -52 2 2 2 2 17 1-m ethylfluorene 2 3 2 2 61 PC B -53 2 2 2 2 18 anthracene 3 2 2 2 62 PC B -54 2 2 2 2 19 2-methylanthracene 3 3 3 3 63 PC B -77 2 2 2 2 20 9-methylanthracene 3 3 3 3 64 PC B -86 2 3 2 2 21 9-10-dim ethylanthracene 3 3 3 3 65 PC B -87 2 2 2 2 22 phenanthrene 2 3 2 2 66 PC B -101 2 2 2 2 23 1-methylphenanthrene 3 3 3 3 67 PC B -104 2 2 2 2 24 fluoranthene 3 3 3 3 68 PC B -128 2 3 3 2 25 1-2-benzofluorene 3 3 3 3 69 PC B -153 3 3 2 2 26 2-3-benzofluorene 3 3 3 3 70 PC B -155 3 3 3 2 27 pyrene 3 3 3 3 71 PC B -171 2 3 3 3 28 chrysene 3 3 3 4 72 PC B -194 3 3 3 3 29 triphenylene 3 3 3 4 73 PC B -202 2 3 3 3 30 naphthacene 4 3 3 3 74 PC B -206 3 3 3 3 31 benz[a]anthracene 3 3 3 4 75 PC B -209 3 2 3 3 32 benzo[b]fluoranthene 4 4 4 4 76 a -HC H 1 1 1 1 33 benzo[j]fluoranthene 4 4 4 4 77 y-H CH 1 1 1 1 34 benzo[a]pyrene 4 4 4 4 78 p,p'-D D T 3 3 2 3 35 benzo[e]pyrene 4 4 4 4 79 p,p'-D D E 3 3 3 3 36 perylene 4 4 4 4 80 p,p'-D D D 3 3 3 3 37 7.12-dim ethylbenz[a]anthracen 4 4 4 4 81 chlordane 3 3 2 3 38 3-m ethylcholanthrene 4 4 4 4 82 dieldrin 3 3 2 3 39 benzo[g,h,i]perylene 4 4 4 4 83 2,3,7,8-tetraC l-dibenzo- p -dioxin 3 2 3 3 40 dibenz[a,h]anthracene 4 4 3 4 84 1,2,3,4,7,8-hexaC l-dibenzo- p -dioxin 3 3 3 3 41 biphenyl(P C B -0) 1 1 2 2 85 octaC l-dibenzo- p -dioxin 3 3 3 3 42 PC B -1 1 2 2 2 86 pentaC l-benzene 1 1 1 1 43 PC B -2 2 2 2 2 87 hexaC l-benzene 1 1 1 1 44 PC B -3 2 2 2 2 DESIRABILITY” OF POP DESIRABILITY” OF POP s s ACCORDING TO THEIR ACCORDING TO THEIR ATMOSPHERIC MOBILITY ATMOSPHERIC MOBILITY The main goal of this work is a suggestion of POP ranking according to their atmospheric mobility. In addition to the long-range transport potential, modelled by PC1, the compound half-life and their sorption on the atmospheric particles must be considered because of their influence on the atmospheric mobility. A chemometric strategy known as “Multicriteria Decision Making“, particularly the desirability functions, was used for these purposes. Less mobile POPs are considered in this work as more desirable. Thus we apply the following criteria: first principal component (PC1) score values as mobility index: optimum = low values logKoc as atmospheric particle sorption index: optimum = high values atmospheric half-life values: optimum = low values The desirability values for each compounds were calculated by a linear function. In figure 3 the desirability values were plotted as a function of the molecule ID; the compound mobility trend appears very similar to the real world distribution. TABLE 1 1 1 3 3 4 4 6 6 2 2 High High mobility mobility Relat. high Relat. high mobility mobility Relat. Low Relat. Low mobility mobility Low mobility Low mobility GRASSHOPPER EFFECT GRASSHOPPER EFFECT DDT DDT HCB HCB FIGURE 1 Selected molecular descriptors C = count descriptors T = topological descriptors W-DIR = directional WHIM descriptors W-ND = no directional WHIM descriptors NAT= number of atoms (C) NBO = number of bonds (C) CHI0 = Randic chi-0 (T) CHI1A = Randic chi-1 (average) (T) GSI = Gordon-Scatlebury index (T) BAL = Balaban index (T) IAC = index of atomic composition (T) IDDE = total index on equivalence of degrees (T) DELS = total electrotopological difference (T) ROUV = Rouvray index (T) MAXDP = maximum electrotop. difference (T) MW = molecular weight NCl = number of Cl (C) L1m = dimension along the first component with atomic mass weight (W-DIR) L2s = size along the second component with electrotopological weight (W-DIR) E1u, E3u = density along respectively the first and the third dimension with unit weight (W-DIR) P2u = shape along the second component with unit weight (W-DIR) P1v = shape along the first component with van der Waals volume weight (W-DIR) Ts, Tm = size (eigenvalue sum) with respectively atomic mass and electrotopological weight (W-ND) Av = size (cross-term eigenvalue sum) with van der Waals volume weight (W-ND) Vu = size (complete eigenvalue expression) with unit weight (W-ND) LDA LDA MOLECULAR MOLECULAR DESCRIPTORS: DESCRIPTORS: CHI0 IAC DELS ROUV CHI0 IAC DELS ROUV MAXDP MW NCl L1m E3u MAXDP MW NCl L1m E3u P1v L2s Ts Tm Vu Av P1v L2s Ts Tm Vu Av CART (162 DESCRIPTORS) CART (162 DESCRIPTORS) MR MR cv cv = 12.64 % = 12.64 % = 0.5 ; = 0.5 ; = 0.0 = 0.0 MR MR cv cv = 14.94 = 14.94 % % K = 3 K = 3 MR MR cv cv = 13.79 = 13.79 % % RDA RDA KNN KNN 4 2 3 4 2 3 1 2 3 1 a ssigne d cla ss GSI 25.50 NBO 12.50 CHI1A 0.46 BAL 1.71 NAT 31.00 P 2u 0.17 IDDE 2.84 BAL 1.49 E 1u 0.55 C lassificatio n T ree CLASSIFICATIONS CLASSIFICATIONS A POP classification according to their environmental behaviour was made by means of several classification methods (CART, K-NN and RDA). The a priori classes were obtained from the desirability values: high mobility (class 1) 0.33, relatively high (cla 2) = [0.33- 0.5], relatively low (cla 3) = [0.5- 0.67], low mobility (cla 4) > 0.67. All the classification methods give models with satisfactory prediction power (results below). The simplest model, and consequently the most directly applicable, is developed with CART (figure 4): the selected descriptors are mainly related to the molecular size. In Tab. 1 are reported the a priori classes (cla) and the predicted classes by each model for all POPs: most of the compounds have been assigned to the same class by all the applied classification methods, only compounds at the border of two contiguous classes have a different classification. Nevertheless, it must be noted that no compounds have been assigned to not- adjacent classes and the cut-off values are not strictly definable. CLASSIFICATION CLASSIFICATION NOMMR = 63.22 % NOMMR = 63.22 % FIGURE 4 5 5 ID Desirability 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 -1 19 39 59 79 Chemical class PAH PCB Pesticide Dioxin Benzene FIGURE 3 High High mobilit mobilit y y Low Low mobilit mobilit y y

Upload: brice-hoover

Post on 02-Jan-2016

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CONCLUSIONS CONCLUSIONS - Missing values of the principal physico-chemical properties are predicted by validated regression models by using different kinds

CONCLUSIONSCONCLUSIONS - Missing values of the principal physico-chemical properties are predicted by validated regression models by

using different kinds of molecular descriptors.

- Ranking of POPs according to their atmospheric mobility tendency is carried out by means of two multivariate

approaches:

a) Principal Component Analysis of predicted physico-chemical data for long range-transport potential;

b) Multicriteria Decision Making, taking also into account the atmospheric half-life and the sorption on the

atmospheric particles, for more general environmental behaviour.

- POP classifications in 4 mobility classes allow the direct and simple assessment of POP environmental

behaviour (for both existing and new chemicals) from the molecular structure only.

Paola Gramatica, Stefano Pozzi, Federica Consolaro and Roberto Todeschini§

QSAR Research Unit, Dep. of Structural and Functional Biology, University of Insubria, via Dunant 3, 21100 Varese (Italy) §Milano Chemometric and QSAR Res. Group, Dep. of Environmental Sciences, University of Milano-Bicocca, via Emanueli 15, 20123 Milano (Italy)

e-mail: [email protected] web-site: http://andromeda.varbio.unimi.it/~QSAR

INTRODUCTIONINTRODUCTION

Persistent Organic Pollutants (POPs), particularly PAH, PCB, polychlorinated dibenzo-p-dioxins and some pesticides, are

organic compounds that bioaccumulate and resist photolytic, chemical or biological degradation.

The discovery that these compounds can move for thousands of kilometres from the point of release, by the so-called

“grasshopper effect” (Figure 1), has shown their global distribution behaviour.

The POPs capable of exhibiting this long-range transport potential are nowadays the focus of various national and

international regulatory initiatives (UNECE, UNEP, etc.) and have been the object of the recent SETAC Pellston Workshop.

POP environmental behaviour is clearly controlled by a variety of physical and chemical processes, that can be analyzed

with the study of properties such as vapour pressure (vp), Henry’s law constant (H), various partition coefficients (Kow,

Koc,...), water solubility (S), atmospheric half-life, etc. Unfortunately, for a large number of POPs, the experimental data for

several properties remain unknown, thus regression models, according to the QSAR/QSPR strategies, need to be developed

to predict the missing values.

A rank of POPs, according to their atmospheric mobility is possible by multivariate approaches (PCA and Multicriteria

Decision Making) based on these predicted data.

A classification of POPs in mobility classes by few molecular descriptors of chemical structure will allow a fast screening

of existing and new compounds.

MOLECULAR DESCRIPTORSMOLECULAR DESCRIPTORS

The QSAR/QSPR approach is applied to predict the values of several physico-chemical properties (lacking for a large

number of POPs) by regression models using different kinds of molecular descriptors for the structural representation of

the studied compounds.

Molecular descriptors represent the way chemical information contained in the molecular structure is transformed and

coded. Among the theoretical descriptors, the best known, obtained simply from the knowledge of the formula, are:

molecular weight and count descriptors (1D-descriptors, i. e. counting of bonds, atoms of different kind, presence or

counting of functional groups and fragments, etc.). Graph-invariant descriptors (2D-descriptors, including both

topological and information indices), are obtained from the knowledge of the molecular topology. WHIM molecular

descriptors [1] contain information about the whole 3D-molecular structure in terms of size, symmetry and atom

distribution. All these indices are calculated [2] from the (x,y,z)-coordinates of a three-dimensional structure of a

molecule, usually from a spatial conformation of minimum energy: 37 non-directional (or global) and 66 directional WHIM

descriptors are obtained. A complete set of about two hundred molecular descriptors has been obtained.

Being our representation of a chemical based on a number of molecular descriptors, an effective variable selection

strategy GA-VSS (Genetic Algorithm - Variable Subset Selection) was applied to the whole set of descriptors in order to

set out the most relevant variables in modelling POP properties by Ordinary Least Squares regression (OLS), maximising

the predictive power (Q2 LOO) [3].

Models with good predictive performances (Q2 LOO = 78-96%) are obtained for all the physico-chemical properties and

the atmospheric half-life, thus achieving reliable data for 87 compounds.

[1] Todeschini R. and Gramatica P.; Quant.Struct.-Act.Relat. 1997, 16, 113-119

[2] Todeschini R. - WHIM-3D / QSAR - Software for the calculation of the WHIM descriptors. rel. 4.1 for Windows, Talete srl, Milan (Italy) 1996. Download: http://www.disat.unimi.it/chm.

[3] Todeschini R. - Moby Digs - Software for Variable Subset Selection by Genetic Algorithms. Rel. 1.0 for Windows, Talete srl, Milan (Italy) 1997.

PRINCIPAL COMPONENT ANALYSISPRINCIPAL COMPONENT ANALYSIS

The biplot of principal component analysis (Figure 2) for 87 POP (Tab. 1), described by the principal physico-chemical

properties (boiling point, melting point, logKow, logKoc, Henry’s law constant, TSA, Vmol, water solubility, vapour pressure)

and the atmospheric half-life, shows a distribution of compounds along the first component (PC1, EV = 70.4%) according to

the global mobility categories assigned by Wania and Mackay[4].

Consequently, it is possible to use the PC1 score values to classify all the 87 POPs in one of the four classes of mobility

(high, relatively high, relatively low and low mobility), defined by the marked cut-off.

This PCA model doesn’t include the atmospheric half-life, because this property is represented only in the second

component, thus it is a model capable to describe the relative long-range transport potential of POPs.

[4] Wania F. and Mackay D., Environ. Sci. Technol., Vol. 30, NO. 9, 1996

BIPLOT PC1-PC2 (CUM EV = 88.8 %, PC1 = 70.4%)

PC1

PC

2

1

2

345

6

789

10111213

14

15

16

17

18

19

20

21

2223

24

2526

27

28 29

30

313233

3435

36

37

38

3940

41

4243

44

4546

47

4849

50

51

5253

5455

56

5758

5960

61

62

63

6465

66

67

68

69

70

71

72

73

7475

76

77

78

79

80

81

8283

84

85

8687

-4

-3

-2

-1

0

1

2

3

4

-10 -8 -6 -4 -2 0 2 4 6 8

high mobilityrelat. highrelat. lowlowmobility unknown

I) High MobilityII) Relat. high MobilityIII) Relat. low mobilityIV) Low mobility

Half-life

Vp

Solub.

logKow

logKoc

mp

bp

Categories of

Wania and Mackay

Some of the variables represented in the first principal component

Not represented in the first PC

FIGURE 2

ID COMPOUND CLA CART KNN RDA ID COMPOUND CLA CART KNN RDA

1 indan 1 1 1 1 45 PCB-4 2 2 2 2

2 naphtalene 1 1 1 1 46 PCB-7 2 2 2 2

3 1-methylnaphthalene 1 1 1 2 47 PCB-9 2 2 2 2

4 2-methylnaphthalene 1 1 1 2 48 PCB-11 2 2 2 2

5 1.2-dimethylnaphthalene 2 2 2 2 49 PCB-12 2 2 2 2

6 1.3-dimethylnaphthalene 2 2 2 2 50 PCB-14 2 2 2 2

7 1.4-dimethylnaphthalene 2 2 2 2 51 PCB-15 2 2 2 2

8 1.5-dimethylnaphthalene 2 2 2 2 52 PCB-18 2 2 2 2

9 2.3-dimethylnaphthalene 2 2 2 2 53 PCB-26 2 2 2 2

10 2-6-dimethylnaphthalene 2 2 2 2 54 PCB-28 2 2 2 2

11 1-ethylnaphthalene 2 2 2 2 55 PCB-29 2 2 2 2

12 2-ethylnaphthalene 2 2 2 2 56 PCB-30 2 2 2 2

13 1.4.5-trimethylnaphthalene 2 2 2 2 57 PCB-33 2 2 2 2

14 acenaphthene 2 2 2 2 58 PCB-40 2 2 2 2

15 acenaphthylene 2 2 2 2 59 PCB-47 2 2 2 2

16 fluorene 2 2 2 2 60 PCB-52 2 2 2 2

17 1-methylfluorene 2 3 2 2 61 PCB-53 2 2 2 2

18 anthracene 3 2 2 2 62 PCB-54 2 2 2 2

19 2-methylanthracene 3 3 3 3 63 PCB-77 2 2 2 2

20 9-methylanthracene 3 3 3 3 64 PCB-86 2 3 2 2

21 9-10-dimethylanthracene 3 3 3 3 65 PCB-87 2 2 2 2

22 phenanthrene 2 3 2 2 66 PCB-101 2 2 2 2

23 1-methylphenanthrene 3 3 3 3 67 PCB-104 2 2 2 2

24 fluoranthene 3 3 3 3 68 PCB-128 2 3 3 2

25 1-2-benzofluorene 3 3 3 3 69 PCB-153 3 3 2 2

26 2-3-benzofluorene 3 3 3 3 70 PCB-155 3 3 3 2

27 pyrene 3 3 3 3 71 PCB-171 2 3 3 3

28 chrysene 3 3 3 4 72 PCB-194 3 3 3 3

29 triphenylene 3 3 3 4 73 PCB-202 2 3 3 3

30 naphthacene 4 3 3 3 74 PCB-206 3 3 3 3

31 benz[a]anthracene 3 3 3 4 75 PCB-209 3 2 3 3

32 benzo[b]fluoranthene 4 4 4 4 76 a -HCH 1 1 1 1

33 benzo[j]fluoranthene 4 4 4 4 77 y-HCH 1 1 1 1

34 benzo[a]pyrene 4 4 4 4 78 p,p'-DDT 3 3 2 3

35 benzo[e]pyrene 4 4 4 4 79 p,p'-DDE 3 3 3 3

36 perylene 4 4 4 4 80 p,p'-DDD 3 3 3 3

377.12-dimethylbenz[a]anthracene 4 4 4 4 81 chlordane 3 3 2 3

38 3-methylcholanthrene 4 4 4 4 82 dieldrin 3 3 2 3

39 benzo[g,h,i]perylene 4 4 4 4 83 2,3,7,8-tetraCl-dibenzo-p -dioxin 3 2 3 3

40 dibenz[a,h]anthracene 4 4 3 4 84 1,2,3,4,7,8-hexaCl-dibenzo-p -dioxin 3 3 3 3

41 biphenyl (PCB-0) 1 1 2 2 85 octaCl-dibenzo-p -dioxin 3 3 3 3

42 PCB-1 1 2 2 2 86 pentaCl-benzene 1 1 1 1

43 PCB-2 2 2 2 2 87 hexaCl-benzene 1 1 1 1

44 PCB-3 2 2 2 2

““DESIRABILITY” OF POPDESIRABILITY” OF POPs s ACCORDING TO THEIR ACCORDING TO THEIR ATMOSPHERIC MOBILITYATMOSPHERIC MOBILITY

The main goal of this work is a suggestion of POP ranking according to their atmospheric mobility.

In addition to the long-range transport potential, modelled by PC1, the compound half-life and their sorption on the

atmospheric particles must be considered because of their influence on the atmospheric mobility. A chemometric strategy

known as “Multicriteria Decision Making“, particularly the desirability functions, was used for these purposes.

Less mobile POPs are considered in this work as more desirable. Thus we apply the following criteria:• first principal component (PC1) score values as mobility index: optimum = low values • logKoc as atmospheric particle sorption index: optimum = high values• atmospheric half-life values: optimum = low values

The desirability values for each compounds were calculated by a linear function.

In figure 3 the desirability values were plotted as a function of the molecule ID; the compound mobility trend appears very

similar to the real world distribution.

TABLE 1

11

33 44

66

22

High mobilityHigh mobility

Relat. high mobilityRelat. high mobility

Relat. Low mobilityRelat. Low mobility

Low mobilityLow mobility

GRASSHOPPER EFFECTGRASSHOPPER EFFECT

DDTDDTHCBHCB

FIGURE 1

Selected molecular descriptors

C = count descriptorsT = topological descriptorsW-DIR = directional WHIM descriptorsW-ND = no directional WHIM descriptors

NAT= number of atoms (C)NBO = number of bonds (C) CHI0 = Randic chi-0 (T) CHI1A = Randic chi-1 (average) (T) GSI = Gordon-Scatlebury index (T)BAL = Balaban index (T) IAC = index of atomic composition (T)IDDE = total index on equivalence of degrees (T)DELS = total electrotopological difference (T)ROUV = Rouvray index (T)MAXDP = maximum electrotop. difference (T)MW = molecular weight NCl = number of Cl (C)L1m = dimension along the first component with atomic mass weight (W-DIR)L2s = size along the second component with electrotopological weight (W-DIR)E1u, E3u = density along respectively the first and the third dimension with unit weight (W-DIR)P2u = shape along the second component with unit weight (W-DIR)P1v = shape along the first component with van der Waals volume weight (W-DIR)Ts, Tm = size (eigenvalue sum) with respectively atomic mass and electrotopological weight (W-ND)Av = size (cross-term eigenvalue sum) with van der Waals volume weight (W-ND)Vu = size (complete eigenvalue expression) with unit weight (W-ND)

LDALDA MOLECULAR MOLECULAR DESCRIPTORS:DESCRIPTORS:

CHI0 IAC DELS ROUV CHI0 IAC DELS ROUV MAXDP MW NCl L1m E3u P1v MAXDP MW NCl L1m E3u P1v

L2s Ts Tm Vu AvL2s Ts Tm Vu Av

CART (162 DESCRIPTORS)CART (162 DESCRIPTORS)

MRMRcv cv = 12.64 %= 12.64 %

= 0.5 ; = 0.5 ; = 0.0 = 0.0

MRMRcvcv = 14.94 % = 14.94 %

K = 3K = 3

MRMRcvcv = 13.79 % = 13.79 %

RDARDA

KNNKNN

4234231231

assigned class

GSI 25.50

NBO 12.50

CHI1A 0.46

BAL 1.71

NAT 31.00

P2u 0.17

IDDE 2.84

BAL 1.49

E1u 0.55

Classification Tree

CLASSIFICATIONSCLASSIFICATIONS

A POP classification according to their environmental behaviour was made by means of several classification methods

(CART, K-NN and RDA). The a priori classes were obtained from the desirability values: high mobility (class 1) 0.33,

relatively high (cla 2) = [0.33- 0.5], relatively low (cla 3) = [0.5- 0.67], low mobility (cla 4) > 0.67.

All the classification methods give models with satisfactory prediction power (results below). The simplest model, and

consequently the most directly applicable, is developed with CART (figure 4): the selected descriptors are mainly related

to the molecular size.

In Tab. 1 are reported the a priori classes (cla) and the predicted classes by each model for all POPs: most of the

compounds have been assigned to the same class by all the applied classification methods, only compounds at the

border of two contiguous classes have a different classification. Nevertheless, it must be noted that no compounds have

been assigned to not-adjacent classes and the cut-off values are not strictly definable.

CLASSIFICATIONCLASSIFICATION

NOMMR = 63.22 %NOMMR = 63.22 %

FIGURE 4

55

ID

De

sir

ab

ilit

y

1

234

56789101112

13

1415

16

17

181920

21

22

2324

2526

27

2829

30

31

3233

3435

36

37

38

3940

4142

4344454647

484950

51

5253545556

575859

6061626364656667

68

697071

72

73

7475

7677

78

79

8081

8283

8485

8687

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

-1 19 39 59 79

Chemical classPAHPCBPesticideDioxinBenzene

FIGURE 3

High High mobilitymobility

Low Low mobilitymobility