INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA ANALYSIS METHODS Emma Peré-Trepat 1 and Romà Tauler 2 * 1 Dept. of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain 2 IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain * e-mail: [email protected]

Post on 31-Dec-2015


Page 1: Emma Peré-Trepat 1  and Romà Tauler  2 *

INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA ANALYSIS METHODS

Emma Peré-Trepat 1 and Romà Tauler 2 *

1 Dept. of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain

2 IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain

* e-mail: [email protected]

Page 2: Emma Peré-Trepat 1  and Romà Tauler  2 *

Outline:

• Introduction and motivations of this work

• Environmental data tables and chemometrics models and methods

• Example of application: metal contamination sources in fish, sediment and surface water river samples.

• Conclusions

Page 3: Emma Peré-Trepat 1  and Romà Tauler  2 *

Introduction and motivations of this work

• Pollution and toxic chemical compounds threaten the environment and human health, and call for urgent measures and actions

• Environmental monitoring studies produce huge amounts of multivariate data ordered in large data tables (data matrices)

• The bottleneck in the study of these environmental data tables is their analysis and interpretation

• These data tables need chemometric analysis (statistical and numerical analysis of multivariate chemical data)!

Page 4: Emma Peré-Trepat 1  and Romà Tauler  2 *

What kind of information can be obtained from chemometric analysis of environmental multivariate data tables?

1. Detection, identification, interpretation and resolution of the main sources of contamination

2. Distribution of these contamination sources in the environment: geographically, temporally, by environmental compartment (air, water, sediments, biota, ...), ...

3. Distinction between point and diffuse contamination sources

4. Quantitative apportionment of these sources ...

Page 5: Emma Peré-Trepat 1  and Romà Tauler  2 *

Introduction and motivations of this work

In this work different chemometric multiway data analysis methods are compared for the resolution of the environmental sources of 11 metal ions in 17 river samples of fish, sediment and water at the same site locations of Catalonia (NE, Spain).

• Two-way bilinear model based methods

    • MA-PCA: Matrix Augmentation Principal Component Analysis

    • MA-MCR-ALS: Matrix Augmentation Multivariate Curve Resolution Alternating Least Squares

• Three-way trilinear model based methods

    • PARAFAC

    • TUCKER3

    • MCR-ALS trilinear

    • MCR-ALS TUCKER3

Page 6: Emma Peré-Trepat 1  and Romà Tauler  2 *

Introduction and motivations of this work

Special attention will be paid to:

• Finding ways to compare results obtained using bilinear and trilinear models for three-way data: getting profiles in three modes from bilinear models of three-way data

• Adaptation of MCR-ALS to the fulfillment of PARAFAC and TUCKER3 trilinear models

• Reliability of solutions: calculation of boundaries of bands of feasible solutions

• Integration of Geostatistics and Chemometrics in the investigation of environmental data

Page 7: Emma Peré-Trepat 1  and Romà Tauler  2 *

Outline:

• Introduction and motivations of this work

• Environmental data tables and chemometrics models and methods

• Example of application: metal contamination sources in fish, sediment and river surface water samples.

• Conclusions

Page 8: Emma Peré-Trepat 1  and Romà Tauler  2 *

Environmental data tables (two-way data)

Data table or matrix X: I samples (rows) × J variables (columns)

Variables: concentrations of chemicals, physical properties, biological properties, other ...

Special entries: missing values ('m') and values below the limit of detection (<LOD)

[Figure: example data matrix X with a plot of its samples (rows) and a plot of its variables (columns)]

Page 9: Emma Peré-Trepat 1  and Romà Tauler  2 *

Environmental three-way data sets

Measured data usually consist of concentrations of different chemical compounds (variables) measured in different samples at different times/situations/conditions/compartments.

Data are ordered in a two-way or in a three-way data table according to their structure.

Three measurement modes of a 3-way data set:

• variables mode (concentrations of chemical compounds)

• samples mode

• times/situations/conditions/compartments mode

Page 10: Emma Peré-Trepat 1  and Romà Tauler  2 *

Chemometric models to describe environmental measurements

Models for what? Models for:

1. identification of contamination sources?

2. exploration of contamination sources?

3. interpretation of contamination sources?

4. resolution of environmental sources?

5. apportionment/quantitation of environmental sources?

6. ...

Page 11: Emma Peré-Trepat 1  and Romà Tauler  2 *

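Chemometric models to describe environmental measurements

Bilinear models for two-way data:

$$\mathbf{D} = \mathbf{X}\mathbf{Y}^{T} + \mathbf{E}, \qquad d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{jn} + e_{ij}$$

D is the (I × J) data matrix with elements $d_{ij}$

$d_{ij}$ is the concentration of chemical contaminant j in sample i

n = 1, ..., N are a reduced number of independent environmental sources

$x_{in}$ is the amount of source n in sample i

$y_{jn}$ is the amount of contaminant j in source n

$e_{ij}$ is the residual, the part of $d_{ij}$ not explained by the N sources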

Page 12: Emma Peré-Trepat 1  and Romà Tauler  2 *

Chemometric models to describe environmental measurements

Bilinear models for two-way data:

$$\mathbf{D}\,(I \times J) = \mathbf{X}\,(I \times N)\;\mathbf{Y}^{T}\,(N \times J) + \mathbf{E}\,(I \times J), \qquad N \ll I \text{ or } J$$

PCA

• X orthogonal, $\mathbf{Y}^{T}$ orthonormal

• $\mathbf{Y}^{T}$ in the direction of maximum variance

• Unique solutions but without physical meaning

• Identification and interpretation!

MCR-ALS

• X and $\mathbf{Y}^{T}$ non-negative

• X or $\mathbf{Y}^{T}$ normalization

• Other constraints (unimodality, local rank, ...)

• Non-unique solutions but with physical meaning

• Resolution and apportionment!
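The contrast between the two sets of constraints can be illustrated with a very small alternating least squares loop. This is only a sketch under stated assumptions: the function name `mcr_als_nonneg`, the synthetic data and the use of simple clipping to enforce non-negativity are illustrative choices, not the authors' implementation (MCR-ALS software normally uses proper non-negative least squares and convergence checks).

```python
import numpy as np

def mcr_als_nonneg(D, n_components, n_iter=200, seed=0):
    """Simplified MCR-ALS-style loop (hypothetical helper, not the authors' code).

    Alternates least-squares updates of X and Yt, clips negative values to
    zero, and normalises each loading profile (row of Yt) to unit length.
    """
    rng = np.random.default_rng(seed)
    Yt = rng.random((n_components, D.shape[1]))       # random non-negative start
    for _ in range(n_iter):
        # Update scores given loadings, then enforce non-negativity by clipping.
        X = np.clip(D @ np.linalg.pinv(Yt), 0.0, None)
        # Update loadings given scores, then enforce non-negativity by clipping.
        Yt = np.clip(np.linalg.pinv(X) @ D, 0.0, None)
        # Normalise loadings; rescale scores so that X @ Yt is unchanged.
        norms = np.linalg.norm(Yt, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        Yt = Yt / norms
        X = X * norms.T
    return X, Yt

# Synthetic non-negative rank-2 data (assumed sizes: 17 samples, 11 variables).
rng = np.random.default_rng(0)
X_true = rng.random((17, 2))
Yt_true = rng.random((2, 11))
D = X_true @ Yt_true

X, Yt = mcr_als_nonneg(D, n_components=2)
rel_error = np.linalg.norm(D - X @ Yt) / np.linalg.norm(D)
```

The resolved profiles are non-negative by construction, which is what makes them readable as source compositions and contributions; unlike PCA, nothing forces them to be orthogonal.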

Page 13: Emma Peré-Trepat 1  and Romà Tauler  2 *

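Chemometric models to describe environmental measurements

Extension of bilinear models for simultaneous analysis of multiple two-way data sets: matrix augmentation strategy

$$\mathbf{D}_{aug} = \begin{bmatrix}\mathbf{D}_1\\ \vdots\\ \mathbf{D}_K\end{bmatrix} = \begin{bmatrix}\mathbf{X}_1\\ \vdots\\ \mathbf{X}_K\end{bmatrix}\mathbf{Y}^{T} + \mathbf{E}_{aug} = \mathbf{X}_{aug}\mathbf{Y}^{T} + \mathbf{E}_{aug}$$

Each $\mathbf{D}_k$ is (I × J), each $\mathbf{X}_k$ is (I × N) and the common $\mathbf{Y}^{T}$ is (N × J)

PCA: orthogonality; maximum variance

MCR: non-negativity, natural constraints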
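A minimal sketch of the matrix augmentation strategy with a PCA-type decomposition follows. The sizes (17 sites, 11 metals, 3 compartments) are taken from the application described later; the random data, the variable names and the use of a truncated SVD to obtain the PCA solution are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2          # sites, metals, compartments, components

# Synthetic stand-in data: one (I x J) table per compartment.
D_list = [rng.random((I, J)) for _ in range(K)]

# Column-wise augmentation: the K tables are stacked on top of each other,
# so all of them share the same variables (columns).
D_aug = np.vstack(D_list)                       # shape (K*I, J)

# PCA-type bilinear decomposition D_aug ~ X_aug @ Yt via truncated SVD.
U, s, Vt = np.linalg.svd(D_aug, full_matrices=False)
X_aug = U[:, :N] * s[:N]                        # augmented scores, (K*I, N)
Yt = Vt[:N, :]                                  # common loadings, orthonormal rows

# One block of scores X_k per original table D_k.
X_blocks = np.split(X_aug, K, axis=0)           # K arrays of shape (I, N)
E_aug = D_aug - X_aug @ Yt                      # residuals
```

Every table gets its own score block `X_blocks[k]`, while the loadings `Yt` are shared by all tables; this is the bilinear model applied to the augmented matrix.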

Page 14: Emma Peré-Trepat 1  and Romà Tauler  2 *

Environmental data sets

Page 15: Emma Peré-Trepat 1  and Romà Tauler  2 *

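Chemometric models to describe environmental measurements

Trilinear models for three-way data:

$$\mathbf{D}_k = \mathbf{X}\mathbf{Z}_k\mathbf{Y}^{T} + \mathbf{E}_k, \qquad d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{jn}\, z_{kn} + e_{ijk}$$

with i = 1, ..., I; j = 1, ..., J; k = 1, ..., K, where $\mathbf{D}_k$ is the k-th slice of the data cube and $\mathbf{Z}_k$ is the diagonal matrix holding $z_{k1}, ..., z_{kN}$

$d_{ijk}$ is the concentration of chemical contaminant j in sample i at time (condition) k

n = 1, ..., N are a reduced number of independent environmental sources

$x_{in}$ is the amount of source n in sample i

$y_{jn}$ is the amount of contaminant j in source n

$z_{kn}$ is the contribution of source n to compartment k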

Page 16: Emma Peré-Trepat 1  and Romà Tauler  2 *

Three-way data models

The data cube D (I × J × K) is described by one loading matrix per mode:

• X-mode (samples): X, I × Ni

• Y-mode (variables): Y, J × Nj

• Z-mode (conditions): Z, K × Nk

Page 17: Emma Peré-Trepat 1  and Romà Tauler  2 *

PARAFAC (trilinear model)

$$\mathbf{D}_k = \mathbf{X}\mathbf{Z}_k\mathbf{Y}^{T} + \mathbf{E}_k, \qquad d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{jn}\, z_{kn} + e_{ijk}$$

• The same number of components in the three modes: Ni = Nj = Nk = N

• No interactions between components

• Different slices $\mathbf{D}_k$ are decomposed into bilinear profiles having the same shape!

Page 18: Emma Peré-Trepat 1  and Romà Tauler  2 *

Tucker3 models

$$d_{ijk} = \sum_{n_i=1}^{N_i}\sum_{n_j=1}^{N_j}\sum_{n_k=1}^{N_k} g_{n_i n_j n_k}\, x_{i n_i}\, y_{j n_j}\, z_{k n_k} + e_{ijk}$$

D is decomposed into X, $\mathbf{Y}^{T}$, Z and the core array G

• Different number of components in the different modes (Ni, Nj, Nk)

• Interaction between components in different modes is possible

• In PARAFAC Ni = Nj = Nk = N and the core array G is a superdiagonal identity cube
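The relation between the two models can be checked numerically. The sketch below (function names, sizes and random loadings are assumptions for illustration) rebuilds a data cube from Tucker3 loadings and a core array, and shows that a superdiagonal identity core gives exactly the PARAFAC reconstruction.

```python
import numpy as np

def tucker3_reconstruct(G, X, Y, Z):
    # d_ijk = sum over p, q, r of g_pqr * x_ip * y_jq * z_kr
    return np.einsum("pqr,ip,jq,kr->ijk", G, X, Y, Z)

def parafac_reconstruct(X, Y, Z):
    # d_ijk = sum over n of x_in * y_jn * z_kn
    return np.einsum("in,jn,kn->ijk", X, Y, Z)

rng = np.random.default_rng(1)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))

# Superdiagonal identity core: g_nnn = 1, every other element is 0.
G = np.zeros((N, N, N))
G[np.arange(N), np.arange(N), np.arange(N)] = 1.0

D_tucker = tucker3_reconstruct(G, X, Y, Z)
D_parafac = parafac_reconstruct(X, Y, Z)

# Slice-wise PARAFAC form: D_k = X diag(z_k) Y^T
k = 1
D_k = X @ np.diag(Z[k]) @ Y.T
```

Off-superdiagonal core elements are what allow a component in one mode to interact with a different component in another mode; setting them to zero removes those interactions.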

Page 19: Emma Peré-Trepat 1  and Romà Tauler  2 *

Guidelines for method selection (resolution purposes)

Journal of Chemometrics, 2001, 15, 749-771

[Chart axes: deviations from trilinearity (mild, medium, strong) and array size (small, medium, large). Methods placed on the chart: PARAFAC, PARAFAC2, TUCKER, MCR, PCA, SVD, ...]

Page 20: Emma Peré-Trepat 1  and Romà Tauler  2 *

METHODOLOGY

INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)

Page 21: Emma Peré-Trepat 1  and Romà Tauler  2 *

Outline:

• Introduction and motivations of this work

• Environmental data tables

• Chemometrics bilinear and trilinear models and methods

• Example of application: metal contamination sources in fish, sediment and river surface water samples.

• Conclusions

Page 22: Emma Peré-Trepat 1  and Romà Tauler  2 *

METAL CONTAMINATION SOURCES IN SEDIMENTS, FISH AND WATERS FROM CATALONIA RIVERS USING MULTIWAY DATA ANALYSIS METHODS

Emma Peré-Trepat (UB), Mónica Flo, Montserrat Muñoz, Antoni Ginebreda (ACA), Marta Terrado, Romà Tauler (CSIC)

17 river sampling sites, 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), 3 environmental compartments: fish (barb, 'bagra comuna', bleak, carp and trout), sediment and water samples

Sampling sites:

1. RIU MUGA, Castelló d'Empúries (J052)

2. RIU FLUVIÀ, Besalú (J022)

3. RIU FLUVIÀ, L'Armentera (J011)

4. RIU TER, Manlleu (J034)

5. RIU TERRI, Sant Julià de Ramis (J028)

6. RIU TER, Colomers (J112)

7. RIU TORDERA, Fogars de Tordera (J062)

8. RIU CONGOST, La Garriga (J037)

9. RIU LLOBREGAT, El Pont de Vilomara (J031)

10. RIU CARDENER, Castellgalí (J002)

11. RIU LLOBREGAT, Abrera (J084)

12. RIU LLOBREGAT, Martorell (J005)

13. RIU LLOBREGAT, Sant Joan Despí (J049)

14. RIU FOIX, Castellet (J008)

15. RIU FRANCOLÍ, La Masó (J059)

16. RIU EBRE, Flix (J056)

17. RIU SEGRE, Térmens (J207)

[Figure: map of Catalonia with the 17 numbered sites; map labels: Mediterranean Sea, Pyrenees, Barcelona, France]

Page 23: Emma Peré-Trepat 1  and Romà Tauler  2 *

Missing data ('m')

• Unknown values produce empty holes in data matrices

• When they are few and evenly distributed, they may be estimated by PCA imputation (or another method)

Below LOD values (<LOD)

• This is a common problem in environmental data tables

• If most of the values are below LOD, data matrices are sparse

• For calculations, it is better either to use the experimental values or to set them to LOD/2, instead of to zero or to LOD
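Both treatments can be sketched in a few lines. The helpers below are hypothetical: the encoding of below-LOD entries as a boolean mask, the encoding of missing values as NaN, the per-column LOD vector and the iterative truncated-SVD scheme for PCA imputation are assumptions made for this illustration, not a description of the procedure used in the study.

```python
import numpy as np

def substitute_below_lod(D, below_mask, lod):
    """Set entries flagged as below LOD to LOD/2 (lod: one value per column)."""
    D = np.array(D, dtype=float)                       # work on a copy
    half_lod = np.broadcast_to(np.asarray(lod, dtype=float) / 2.0, D.shape)
    D[below_mask] = half_lod[below_mask]
    return D

def impute_missing_pca(D, n_components=2, n_iter=100):
    """Fill NaN entries by iterating a truncated-SVD (PCA-type) reconstruction."""
    D = np.array(D, dtype=float)                       # work on a copy
    missing = np.isnan(D)
    # Start from the column means of the observed values.
    col_means = np.nanmean(D, axis=0)
    D[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        D_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        D[missing] = D_hat[missing]                    # only missing cells change
    return D

# Below-LOD example: zeros mark values reported as <LOD.
D_raw = np.array([[0.0, 5.0], [2.0, 0.0]])
below = D_raw == 0.0
D_lod = substitute_below_lod(D_raw, below, lod=[0.4, 1.0])

# Missing-value example: one hole in an otherwise rank-1 table.
D_miss = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, np.nan],
                   [3.0, 6.0, 9.0],
                   [4.0, 8.0, 12.0]])
D_filled = impute_missing_pca(D_miss, n_components=1)
```

Observed values are never modified by the imputation; only the cells that were missing are updated at each iteration.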

Page 24: Emma Peré-Trepat 1  and Romà Tauler  2 *

Preliminary data description: use of descriptive statistics

1) Individual sample plots

2) Individual variable plots

3) Descriptive statistics (Excel Statistics)

4) Histograms/box plots

5) Binary correlation between variables

6) ...

[Figure: box plot per column number (columns 1 to 50), annotated with lower whisker, lower quartile, median, upper quartile, upper whisker and outliers]

Page 25: Emma Peré-Trepat 1  and Romà Tauler  2 *

Effect of different data pre-treatments: sediment samples

[Figure: four panels (raw, mean-centred, auto-scaled, scaled) over the 11 metals As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn]

Mo is eliminated

Page 26: Emma Peré-Trepat 1  and Romà Tauler  2 *

Data pretreatment

– No mean-centering was applied, to allow an improved physical interpretation of factors (application of non-negativity constraints instead of orthogonality constraints) and the comparison of results with MCR-ALS methods

– Two scaling possibilities:

    • First data matrix augmentation, then column scaling to equal variance (each column element divided by the column standard deviation)

    • First column scaling of each data matrix separately, then data matrix augmentation

– Variables with nearly no changes and values equal or close to their limit of detection were excluded from scaling and divided by 20 instead (to avoid overweighting them)
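The two scaling orders can be compared directly. In this sketch the helper `scale_columns`, its `downweight` argument and the synthetic tables are assumptions; only the operations themselves (division by the column standard deviation without mean-centering, division by 20 for the excluded variables) come from the text above.

```python
import numpy as np

def scale_columns(D, downweight=()):
    """Divide each column by its standard deviation, without mean-centering.

    Columns listed in `downweight` are divided by 20 instead of being scaled.
    """
    D = np.asarray(D, dtype=float)
    divisor = D.std(axis=0, ddof=1)
    divisor[divisor == 0] = 1.0            # guard against constant columns
    if len(downweight) > 0:
        divisor[list(downweight)] = 20.0
    return D / divisor

rng = np.random.default_rng(2)
# Three synthetic (17 x 11) tables with different overall magnitudes.
tables = [rng.random((17, 11)) * (k + 1) for k in range(3)]

# Option 1: augment first, then scale the columns of the augmented matrix.
D_aug_then_scaled = scale_columns(np.vstack(tables))

# Option 2: scale each table separately, then augment.
D_scaled_then_aug = np.vstack([scale_columns(T) for T in tables])

# Down-weighting column 2 of the first table instead of scaling it.
D_down = scale_columns(tables[0], downweight=[2])
```

Option 1 keeps the differences in magnitude between the tables for a given variable, whereas option 2 gives every variable unit variance within each table; since no mean is subtracted, non-negative data stay non-negative in both cases.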

Page 27: Emma Peré-Trepat 1  and Romà Tauler  2 *

Description of scaled data: metal distribution in the three compartments

[Figure: mean ± std of scaled concentrations of the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) in water, sediments and fish; x-axis: metals (variables)]

Cd, Co and Ld in water were not scaled; only downweighted

Page 28: Emma Peré-Trepat 1  and Romà Tauler  2 *

Description of scaled data: different sites in the three compartments

[Figure: scaled values for water, sediment and fish at the 17 sample sites; x-axis: sample sites, labelled by river name (Muga, Fluvià, Ter, Terri, Tordera, Congost, Llobregat, Cardener, Foix, Francolí, Ebre, Segre)]

Page 29: Emma Peré-Trepat 1  and Romà Tauler  2 *

Unit variance scaled concentrations boxplot

[Figure: three box-plot panels (fish, sediment, water); x-axis: the 11 metals As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn; y-axis: values]

Page 30: Emma Peré-Trepat 1  and Romà Tauler  2 *

THREE-WAY DATA ARRAY MATRICIZING or MATRIX AUGMENTATION

The data cube (sites × contaminants × compartments: fish, sediment, water) is augmented in three directions.

SVD of augmented data matrices in the three directions

AUGMENTATION direction   column-wise (variables)   row-wise (samples)   tube-wise (type)
s1                       40.2619                   43.2553              41.3302
s2                       16.7504                   9.2823               19.4850
s3                       9.4963                    8.5312               14.3739

How many components are needed to explain each mode?

[Figure: singular values for the three augmentation directions; annotation: 2nd component]
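A comparison of this kind can be sketched as follows. The helper `unfold`, the axis order (sites, metals, compartments) and the random cube are assumptions for illustration; the singular values of an unfolded matrix do not depend on whether the shared mode is placed on the rows or on the columns, so the sketch always puts it on the rows.

```python
import numpy as np

def unfold(D, mode):
    """Matricize a 3-way array with the chosen mode on the rows."""
    return np.moveaxis(D, mode, 0).reshape(D.shape[mode], -1)

rng = np.random.default_rng(3)
D = rng.random((17, 11, 3))        # assumed axis order: sites x metals x compartments

# Shared mode of each augmentation direction:
#   column-wise -> variables (axis 1), row-wise -> samples (axis 0),
#   tube-wise   -> compartment type (axis 2)
directions = {"column-wise": 1, "row-wise": 0, "tube-wise": 2}

singular_values = {
    name: np.linalg.svd(unfold(D, mode), compute_uv=False)
    for name, mode in directions.items()
}

for name, s in singular_values.items():
    print(name, np.round(s[:3], 4))
```

The number of singular values that stand clearly above the rest in each direction is a rough indication of how many components that mode needs; the tube-wise direction can never show more than three, because there are only three compartments.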

Page 31: Emma Peré-Trepat 1  and Romà Tauler  2 *

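Bilinear modelling of three-way data (matrix augmentation or matricizing, stretching, unfolding)

MA-PCA, MA-MCR-ALS

The data cube (sites × metals × compartments) is unfolded into the augmented data matrix D_aug: the fish (F), sediment (S) and water (W) site blocks are stacked over the common contaminants (metals) mode.

D_aug (augmented data matrix) is decomposed into X_aug (augmented scores matrix, with one F, S and W block per component) and Y (loadings).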

Page 32: Emma Peré-Trepat 1  and Romà Tauler  2 *

Explained variances using bilinear models (profiles in two modes)

$$R^2 = 1 - \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} e_{i,j}^{2}}{\sum_{i=1}^{I}\sum_{j=1}^{J} d_{i,j}^{2}}, \qquad e_{i,j} = d_{i,j} - \hat{d}_{i,j}$$

where $d_{i,j}$ is the experimental value in the augmented data matrix for metal j and sample i, and $\hat{d}_{i,j}$ is the corresponding value calculated using PCA or MCR-ALS bilinear models with N components:

$$d_{i,j} = \sum_{n=1}^{N} x_{i,n}\, y_{j,n} + e_{i,j}$$
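The explained variance can be computed with a small helper. The function name and the toy numbers are assumptions; the expression is the R² defined above, multiplied by 100 on the assumption that the %R2 values reported in the result tables are percentages of this quantity. Because it only uses sums over all elements, the same helper also works for a three-way data cube.

```python
import numpy as np

def explained_variance(D, D_hat):
    """Percentage of the sum of squares of D reproduced by the model D_hat."""
    D = np.asarray(D, dtype=float)
    E = D - np.asarray(D_hat, dtype=float)     # residuals e = d - d_hat
    return 100.0 * (1.0 - np.sum(E ** 2) / np.sum(D ** 2))

# Toy example: the model reproduces one of the two non-zero entries.
D = np.array([[3.0, 0.0], [0.0, 4.0]])
D_hat = np.array([[3.0, 0.0], [0.0, 0.0]])
r2 = explained_variance(D, D_hat)

# Contribution of a single component n: use only its outer product
# x_n y_n^T as D_hat (hypothetical profiles shown here).
x_n = np.array([3.0, 0.0])
y_n = np.array([1.0, 0.0])
r2_component = explained_variance(D, np.outer(x_n, y_n))
```

When each component is evaluated on its own like this, the individual values only add up to the total if the profiles are orthogonal, as in PCA. That would be consistent with the MCR-ALS tables in this deck, where the two component values sum to more than the total, although this reading is an assumption.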

Page 33: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-PCA of scaled data without scores refolding

%R2 (2-WAY)   1st component   2nd component   Total
MA-PCA        67.3            13.2            80.5

[Figure: loadings of the two components over the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores over the samples (x-axis 0 to 50); annotations: water samples; sediment and fish samples; As, Ba, Cu, Zn: water soluble metal ions]

Page 34: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS of scaled data with non-negativity (nn) and without scores refolding

%R2 (2-WAY)   1st component   2nd component   Total
MA-MCR-ALS    48.2            42.8            80.5
MA-PCA        67.3            13.2            80.5

[Figure: loadings of the two components over the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores over the samples (x-axis 0 to 50); annotations: water samples; sediment and fish samples; As, Ba, Cu, Zn]

More easily interpretable!!!

Page 35: Emma Peré-Trepat 1  and Romà Tauler  2 *

Calculation of the boundaries of feasible band solutions (Journal of Chemometrics, 2001, 15, 627-646)

[Figure: maximum (max) and minimum (min) boundaries of the feasible bands for the two MCR-ALS components: augmented scores (x-axis 0 to 50) and metal loadings (x-axis 1 to 11)]

Nearly no rotation ambiguities are present in non-negative environmental profiles calculated by MCR-ALS (very different from spectroscopy!)

Page 36: Emma Peré-Trepat 1  and Romà Tauler  2 *

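Bilinear modelling of three-way data (matrix augmentation or matricizing, stretching, unfolding)

Scores refolding strategy!!! (applied only to final augmented scores)

PCA or MCR-ALS decomposes the augmented data matrix D (F, S and W site blocks over the common contaminants mode) into augmented scores X_aug and loadings Y.

Loadings recalculation in two modes from augmented scores: the augmented score blocks of each component (1, 2, 3 for the first component; 4, 5, 6 for the second) are refolded into a sites × compartments (F, S, W) matrix, and SVD of this matrix gives the sites profile (x_i, x_ii) and the compartments profile (z_i, z_ii).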
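The refolding step can be sketched as below. The helper name, the stacking order of the augmented scores (all sites of the first compartment, then the second, and so on) and the sign convention are assumptions made for this illustration.

```python
import numpy as np

def refold_scores(X_aug, n_sites, n_compartments):
    """Recover site profiles X and compartment profiles Z from augmented scores.

    Each column of X_aug is refolded into a (sites x compartments) matrix and
    approximated by the first component of its SVD.
    """
    n_comp = X_aug.shape[1]
    X = np.zeros((n_sites, n_comp))
    Z = np.zeros((n_compartments, n_comp))
    for n in range(n_comp):
        # Column n holds K stacked blocks of length I; refold to (I x K).
        M = X_aug[:, n].reshape(n_compartments, n_sites).T
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        x, z = U[:, 0] * s[0], Vt[0, :]
        # The SVD sign is arbitrary: keep the compartment profile positive.
        if z.sum() < 0:
            x, z = -x, -z
        X[:, n], Z[:, n] = x, z
    return X, Z

# Synthetic augmented scores that follow the trilinear model exactly.
rng = np.random.default_rng(4)
x_true = rng.random((17, 2))
z_true = np.array([[0.2, 0.9], [0.5, 0.3], [1.0, 0.1]])     # 3 compartments
X_aug = np.column_stack([np.kron(z_true[:, n], x_true[:, n]) for n in range(2)])

X, Z = refold_scores(X_aug, n_sites=17, n_compartments=3)
```

Together with the loadings Y this gives profiles in three modes, so the bilinear results can be compared with PARAFAC or Tucker3. When the refolded matrix is not exactly rank one, the first SVD component only approximates it, which is why the explained variance can drop after refolding.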

Page 37: Emma Peré-Trepat 1  and Romà Tauler  2 *

Explained variances using trilinear models (profiles in three modes)

$$R^2 = 1 - \frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} e_{i,j,k}^{2}}{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} d_{i,j,k}^{2}}, \qquad e_{i,j,k} = d_{i,j,k} - \hat{d}_{i,j,k}$$

where $d_{i,j,k}$ is the experimental value in the data cube for metal j, sample i and environmental compartment k, and $\hat{d}_{i,j,k}$ is the corresponding value calculated using PARAFAC, Tucker3, or PCA or MCR-ALS of augmented data matrices after recovery of loadings in three modes (either from scores refolding or from the application of constraints):

$$\hat{d}_{i,j,k} = \sum_{n=1}^{N} x_{i,n}\, y_{j,n}\, z_{k,n}$$

$$\hat{d}_{i,j,k} = \sum_{n_i=1}^{N_i}\sum_{n_j=1}^{N_j}\sum_{n_k=1}^{N_k} g_{n_i,n_j,n_k}\, x_{i,n_i}\, y_{j,n_j}\, z_{k,n_k}$$

Page 38: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-PCA of scaled data with nn and scores refolding

%R2 (3-WAY)          1st component   2nd component   Total
MA-PCA + refolding   64.7            11.7            76.4
MA-PCA               67.3            13.2            80.5

[Figure: profiles of the two components in three modes: sample sites (1 to 17), compartments (F, S, W) and metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn)]

Little differences in samples mode!!!

Page 39: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS of scaled data with scores refolding

%R2 (3-WAY)              1st component   2nd component   Total
MA-MCR-ALS + refolding   47.0            40.7            76.9
MA-MCR-ALS               48.2            42.8            80.5

[Figure: profiles of the two components in three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

Page 40: Emma Peré-Trepat 1  and Romà Tauler  2 *

Trilinear modelling of three-way data

PARAFAC decomposes the data cube D (sites × metals × compartments F, S, W) into three loading matrices: X (sites), Y (metals) and Z (compartments).

Page 41: Emma Peré-Trepat 1  and Romà Tauler  2 *

PARAFAC of scaled data

%R2 (3-WAY)         1st component   2nd component   Total
PARAFAC             43.4            36.2            77.4
MA-PCA (bilinear)   67.3            13.2            80.5

[Figure: profiles of the two components in three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

Page 42: Emma Peré-Trepat 1  and Romà Tauler  2 *

[Figure: core consistency plot. Title: Core consistency 99.9395% (yellow target); x-axis: core elements (green should be zero / red non-zero); y-axis: core size]

Page 43: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS trilinear constraint

TRILINEARITY CONSTRAINT (ALS iteration step)

MCR-ALS decomposes the augmented data matrix D (F, S and W site blocks over the common contaminants mode) into augmented scores X_aug and loadings Y. Every augmented score profile wanted to follow the trilinear model is refolded:

1. Selection of species profile (augmented score blocks 1, 2, 3)

2. Folding into a sites × compartments (F, S, W) matrix

3. SVD

4. Rebuilding augmented scores (blocks 1', 2', 3')

5. Substitution of species profile

Loadings recalculation in two modes from augmented scores

This constraint is applied at each step of the ALS optimization, independently and individually for each component
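One application of the constraint inside an ALS iteration could look like the sketch below. The function name, the `components` argument and the stacking order of the augmented scores are assumptions; the sequence of operations follows the steps listed above (selection, folding, SVD, rebuilding, substitution).

```python
import numpy as np

def apply_trilinearity(X_aug, n_sites, n_compartments, components=None):
    """Force selected columns of the augmented scores to follow a trilinear model."""
    X_new = np.array(X_aug, dtype=float)                      # work on a copy
    if components is None:
        components = range(X_new.shape[1])
    for n in components:
        # Selection and folding: (K*I,) column -> (I x K) matrix.
        M = X_new[:, n].reshape(n_compartments, n_sites).T
        # SVD: keep only the first component (rank-one approximation).
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M1 = s[0] * np.outer(U[:, 0], Vt[0, :])
        # Rebuilding and substitution of the augmented score profile.
        X_new[:, n] = M1.T.reshape(-1)
    return X_new

rng = np.random.default_rng(5)
x = rng.random(17)
z = np.array([0.2, 0.5, 1.0])
col_trilinear = np.kron(z, x)            # already follows the trilinear model
col_free = rng.random(51)                # does not follow it
X_aug = np.column_stack([col_trilinear, col_free])

X_constrained = apply_trilinearity(X_aug, n_sites=17, n_compartments=3)
X_partial = apply_trilinearity(X_aug, 17, 3, components=[0])
```

Because the constraint can be switched on for some components and left off for others, the same loop can produce models that are intermediate between purely bilinear and purely trilinear.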

Page 44: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS of scaled data with nn, trilinearity (without scores refolding)

%R2 (2-WAY)                 1st component   2nd component   Total
MA-MCR-ALS nn + trilinear   44.3            42.9            76.8
MA-MCR-ALS nn               48.2            42.8            80.5

[Figure: loadings of the two components over the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores over the samples (x-axis 0 to 50)]

Page 45: Emma Peré-Trepat 1  and Romà Tauler  2 *

Calculation of the boundaries of feasible band solutions (Journal of Chemometrics, 2001, 15, 627-646)

[Figure: boundaries of the feasible bands for the two trilinear MCR-ALS components: augmented scores (x-axis 0 to 50) and metal loadings (x-axis 1 to 11)]

No rotation ambiguities are present in trilinear non-negative environmental profiles calculated by MCR-ALS (very different from spectroscopy!)

Page 46: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS of scaled data with nn, trilinearity and with scores refolding

%R2 (3-WAY)                 1st component   2nd component   Total
MA-MCR-ALS nn + trilinear   44.3            42.9            76.8
PARAFAC nn                  43.4            36.2            77.4

[Figure: profiles of the two components in three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

Page 47: Emma Peré-Trepat 1  and Romà Tauler  2 *

Comparison PARAFAC vs MCR-ALS (trilinearity)

[Figure: two sets of panels, one per method, each showing the two components in three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

Page 48: Emma Peré-Trepat 1  and Romà Tauler  2 *

Tucker3 modelling of three-way data

TUCKER3 model (1,2,2): the data cube D (sites × metals × compartments F, S, W) is decomposed into the loading matrices X (sites), Y (metals) and Z (compartments) and the core array G.

Page 49: Emma Peré-Trepat 1  and Romà Tauler  2 *

Tucker models with non-negativity constraints

Explained variances (%) for each TUCKER3 model studied.

TUCKER3 model   Sum of Squares (%)
[1,1,1]         64.7
[1,1,2]         64.7
[1,1,3]         64.7
[1,2,1]         64.7
[1,2,2]         76.1
[1,2,3]         76.1
[1,3,1]         64.7
[1,3,2]         76.1
[1,3,3]         80.3
[2,1,1]         64.7
[2,1,2]         66.3
[2,1,3]         66.3
[2,2,1]         66.9
[2,2,2]         77.3
[2,2,3]         78.1
[2,3,1]         66.9
[2,3,2]         78.4
[2,3,3]         82.4
[3,1,1]         64.7
[3,1,2]         66.3
[3,1,3]         67.3
[3,2,1]         66.9
[3,2,2]         77.9
[3,2,3]         79.3
[3,3,1]         68.4
[3,3,2]         79.8
[3,3,3]         83.6

Parsimonious model: [1 2 2]

[Figure: explained variance (%) per model, with the models [1 2 2], [1 2 3], [1 3 3], [2 2 2], [2 2 3], [2 3 3], [3 2 3] and [3 3 3] labelled]

Page 50: Emma Peré-Trepat 1  and Romà Tauler  2 *

Tucker3 of scaled data

%R2 (3-WAY)               1st component   2nd component   Total
TUCKER3 (model [1 2 2])   50.7            35.3            76.1
PARAFAC (model [2 2 2])   43.4            36.2            77.4

[Figure: Tucker3 profiles in the sample sites mode (1 to 17), the metals mode (1 to 11) and the compartments mode (1 to 3)]

Page 51: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS Tucker3 constraint

Tucker3 CONSTRAINT (ALS iteration step)

MCR-ALS decomposes the augmented data matrix D (F, S and W site blocks over the common metals mode) into augmented scores X_aug and loadings Y.

Folding: interacting augmented scores (blocks 1 to 6) are folded together into a sites × compartments (F, S, W) arrangement; after SVD the augmented scores are rebuilt (blocks 1' to 6').

Loadings recalculation in two modes from augmented scores

This constraint is applied at each step of the ALS optimization, independently and individually for each component

Page 52: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS of scaled data with nn, Tucker3 (without scores refolding)

%R2 (2-WAY)                               1st component   2nd component   Total
MA-MCR-ALS nn + Tucker3 (model [1 2 2])   45.2            41.4            75.8
MA-MCR-ALS nn + PARAFAC (model [2 2 2])   44.3            42.9            76.8

[Figure: loadings of the two components over the 11 metals (x-axis 1 to 11) and augmented scores over the samples (x-axis 0 to 50)]

Page 53: Emma Peré-Trepat 1  and Romà Tauler  2 *

MA-MCR-ALS of scaled data with nn, Tucker3 and with scores refolding

%R2 (3-WAY)                               1st component   2nd component   Total
MA-MCR-ALS nn + Tucker3 (model [1 2 2])   45.2            41.4            75.8
Tucker3 (model [1 2 2])                   50.7            35.3            76.1

[Figure: profiles in three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

Page 54: Emma Peré-Trepat 1  and Romà Tauler  2 *

Summary of Results

CHEMOMETRIC METHOD                                    %R2 (3-WAY)            %R2 (2-WAY)
                                                      1st    2nd    Total    1st    2nd    Total
MA-PCA (scale)                                        64.7   11.7   76.4     67.3   13.2   80.5
PARAFAC (non-negativity)                              43.4   36.2   77.4     -      -      -
TUCKER3 (non-negativity)                              50.7   35.3   76.1     -      -      -
MA-MCR-ALS (non-negativity)                           47.0   40.7   76.9     48.2   42.8   80.5
MA-MCR-ALS (non-negativity and trilinearity)          44.3   42.9   76.8     -      -      -
MA-MCR-ALS (non-negativity and Tucker restrictions)   45.2   41.4   75.8     -      -      -

Page 55: Emma Peré-Trepat 1  and Romà Tauler  2 *

J052 Riu Muga Castelló d'Empúries

J022 Riu Fluvià Besalú

J011 Riu Fluvià Armentera, l'

J034 Riu Ter Manlleu

J028 Riu Terri Sant Julià de Ramis

J112 Riu Ter Colomers

J062 Riu Tordera Fogars de Tordera

J037 Riu Congost Garriga, la

J031 Riu Llobregat Pont de Vilomara i Rocafort, el

J002 Riu Cardener Castellgalí

J084 Riu Llobregat Abrera

J005 Riu Llobregat Martorell

J049 Riu Llobregat Sant Joan Despí

J008 Riu Foix Castellet i la Gornal

J059 Riu Francolí Masó, la

J056 Riu Ebre Flix

E207 Riu Segre Térmens

INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)

[Figure: GIS maps of the sampling sites listed above; panel labels: (67.3%) and (13.2%)]

Page 58: Emma Peré-Trepat 1  and Romà Tauler  2 *

Outline:

• Introduction and motivations of this work

• Environmental data tables

• Chemometrics bilinear and trilinear models and methods

• Example of application: metal contamination sources in fish, sediment and river surface water samples.

• Conclusions

Page 59: Emma Peré-Trepat 1  and Romà Tauler  2 *

Conclusions

Chemometric methods allow resolution of environmental sources of chemical contaminants.

However, we should be aware of how each method displays the information, because the mathematical properties of the methods differ (e.g. orthogonality vs non-negativity, bilinearity vs trilinearity, number of components, ...).

The interpretation and resolution of environmental sources is not easy, because contamination sources in the real world are correlated and because of experimental data limitations (environmental sources should show variation in the investigated data set).

Bilinear PCA and MCR-ALS can be used to study multiway data sets and compared with multiway methods (like PARAFAC and Tucker3) if appropriate scores refolding is performed.

Bilinear non-negative MCR-ALS solutions may provide a good approximation of the real sources, because non-negative environmental profiles have little rotation ambiguity.

Page 60: Emma Peré-Trepat 1  and Romà Tauler  2 *

Conclusions

PARAFAC and Tucker3 may provide simpler models, and they are especially useful for trilinear data or when the number of components differs between modes.

Intermediate situations between pure bilinear and pure trilinear models can be easily implemented in MCR-ALS.

Bilinear based models are more flexible than trilinear based models to resolve 'true' sources of data variation.

Different numbers of components and interactions between components in different modes (constraint under development) can be considered in mixed bilinear-trilinear-Tucker MA-MCR models.

For an optimal RESOLUTION, the model should be in accordance with the 'true' data structure.

Integration of Chemometrics-GIS results may facilitate geographical and temporal interpretation of contamination sources and their correlation with land uses, population and industrial activities.

Page 61: Emma Peré-Trepat 1  and Romà Tauler  2 *

Acknowledgements

• The Catalan Water Agency is acknowledged for its financial support and for providing the experimental data sets

• Research grant Project MCYT, Nr. BQU2003-00191, Spain