6. Other issues, Quimiometria Teórica e Aplicada (Theoretical and Applied Chemometrics), Instituto de Química - UNICAMP

Upload: loraine-georgina-boone

Post on 28-Dec-2015


Page 1

6. Other issues

Quimiometria Teórica e Aplicada

Instituto de Química - UNICAMP

Page 2

How many components to use?

• Use the ‘unfolding trick’, i.e. look at the rank of each mode (the rank of the array unfolded along that mode).
– does not have a strict statistical basis, but generally works well!

• Use the core-consistency diagnostic (PARAFAC).
– also seems to work well in practice

• Split-half analysis.

• Does the algorithm converge without problems?

• Use full cross-validation.
– the N-way Toolbox now has a routine for this
– can be slow!

• Look at loadings and residuals.

• Use chemical knowledge.
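The first two criteria can be tried on a small synthetic example. The sketch below is written in Python with NumPy (an assumption; the N-way Toolbox itself is MATLAB). It builds a noise-free rank-2 trilinear array, counts the rank of each mode's unfolding, and computes the core consistency, i.e. how close the least-squares Tucker3 core for the PARAFAC loadings is to the superdiagonal identity core PARAFAC assumes. The helper names `unfold` and `core_consistency` are hypothetical, and the exact generating loadings stand in for a fitted PARAFAC model, which would normally supply `A`, `B` and `C`.

```python
import numpy as np


def unfold(X, mode):
    """Matricize a three-way array: rows = chosen mode, columns = the other two modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def core_consistency(X, A, B, C):
    """Core consistency (%) of a PARAFAC model with loadings A, B, C.

    The least-squares Tucker3 core for fixed loadings, G = X x1 A+ x2 B+ x3 C+,
    is compared with the superdiagonal identity core T assumed by PARAFAC.
    """
    G = np.einsum("ijk,pi,qj,rk->pqr", X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    F = A.shape[1]
    T = np.zeros((F, F, F))
    T[np.arange(F), np.arange(F), np.arange(F)] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


# Synthetic noise-free PARAFAC data with F = 2 components (hypothetical sizes)
rng = np.random.default_rng(0)
I, J, K, F = 10, 8, 6, 2
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
X = np.einsum("if,jf,kf->ijk", A, B, C)

# 'Unfolding trick': the rank of each mode = number of non-negligible singular values
mode_ranks = [int(np.linalg.matrix_rank(unfold(X, m))) for m in range(3)]
print("mode ranks:", mode_ranks)

cc = core_consistency(X, A, B, C)
print(f"core consistency: {cc:.1f} %")
```

With real, noisy data the usual reading is that a core consistency close to 100 % supports the chosen number of components, while a sharp drop when one more component is added points to over-fitting.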

Page 3

Preprocessing: centering (1)

• We are often interested in the differences between objects, not in their absolute values.
– building calibration models: differences between samples

• Mean-centering removes offsets from the data
– removes constant background effects

– can help to linearize data, i.e.

Page 4

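Preprocessing: centering (2)

• When performing a calibration, it is most common to remove the mean value from each column:

[Figure: two-way array X (object × variable) with the column mean $\bar{x}_{j}$ of variable j; three-way array X (object × primary variable × secondary variable) with the column mean $\bar{x}_{jk}$]

Two-way: $x^{*}_{ij} = x_{ij} - \bar{x}_{j}$

Three-way: $x^{*}_{ijk} = x_{ijk} - \bar{x}_{jk}$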
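A minimal NumPy sketch of the three-way centering (an illustration with random data, not code from the course): subtracting the column mean across the object mode is the same as column-centering the I × JK unfolded matrix, and for a trilinear array it only centers the object loadings, so the multilinear structure survives.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, F = 12, 7, 5, 2
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
X = np.einsum("if,jf,kf->ijk", A, B, C)           # trilinear (PARAFAC) array

# Three-way centering across objects: x*_ijk = x_ijk - mean over i of x_ijk
Xc = X - X.mean(axis=0, keepdims=True)

# Same operation as column-centering the I x JK unfolded matrix
X_unf = X.reshape(I, J * K)
X_unf_c = X_unf - X_unf.mean(axis=0)
same_as_unfolded = np.allclose(Xc.reshape(I, J * K), X_unf_c)

# Centering across objects only centers A, so the trilinear structure is kept
Xc_model = np.einsum("if,jf,kf->ijk", A - A.mean(axis=0), B, C)
structure_kept = np.allclose(Xc, Xc_model)
print(same_as_unfolded, structure_kept)
```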

Page 5

Preprocessing: scaling (1)

• Sometimes we want to analyse variables measured in different units
– chemical engineering: temperatures, pressures, flow rates
– QSAR: ionization constants, Hammett constants, dipole moments

• These variables should be scaled so that each of them has an equal chance to appear in the model.

Page 6

Preprocessing: scaling (2)

• For two-way arrays (object × variable), it is common to divide by the standard deviation after mean-centering the data (‘autoscaling’):

[Figure: two-way array X (object × variable) with the standard deviation $s_{j}$ of variable j; three-way array X (object × primary variable × secondary variable) with the standard deviation $s_{jk}$ of column jk]

Two-way: $x^{*}_{ij} = x_{ij} / s_{j}$

Three-way: $x^{*}_{ijk} = x_{ijk} / s_{jk}$

Autoscaling can destroy multilinear structure!
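The warning can be checked numerically. The sketch below (Python/NumPy, with a hypothetical helper `unfold` and random data) takes a centred rank-2 trilinear array, applies two-way autoscaling to its I × JK unfolding, i.e. divides every column jk by its own standard deviation $s_{jk}$, and compares the rank of each mode's unfolding before and after. Before scaling all three mode ranks equal the number of components; afterwards the ranks in the variable modes generally grow, because $s_{jk}$ does not split into a factor depending only on j times a factor depending only on k.

```python
import numpy as np


def unfold(X, mode):
    """Matricize: rows = chosen mode, columns = the other two modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


rng = np.random.default_rng(2)
I, J, K, F = 15, 6, 5, 2
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
X = np.einsum("if,jf,kf->ijk", A, B, C)
Xc = X - X.mean(axis=0, keepdims=True)        # mean-centre across objects

# Two-way autoscaling of the I x JK unfolded array: one s_jk per column jk
s_jk = Xc.std(axis=0, ddof=1)                 # shape (J, K)
X_auto = Xc / s_jk

mode_ranks = {
    name: [int(np.linalg.matrix_rank(unfold(Y, m))) for m in range(3)]
    for name, Y in [("centred", Xc), ("autoscaled", X_auto)]
}
print(mode_ranks)
```

For an ordinary object × variable matrix the same `Xc / Xc.std(axis=0, ddof=1)` step is harmless; the problem only appears when the "columns" are really pairs (j, k) of a three-way array.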

Page 7

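Preprocessing: scaling (3)

[Figure: three-way array X (object × process variable × time) with one slab $\mathbf{X}_{j}$ highlighted]

$x^{*}_{ijk} = x_{ijk} / s_{j}$, where $s_{j}$ is computed from all elements of the slab $\mathbf{X}_{j}$

Slab scaling maintains the multilinear structure!

[Figure: three-way array X (object × process variable 1 × process variable 2) with slabs $\mathbf{X}_{j}$ and $\mathbf{X}_{k}$ highlighted]

Double slab scaling may also be useful (ITERATIVE):

$x^{*}_{ijk} = x_{ijk} / s_{j}$

$x^{**}_{ijk} = x^{*}_{ijk} / s_{k}$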
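A minimal sketch of slab scaling and its iterative double-slab variant, in Python/NumPy with hypothetical helper names. Assumptions: the data are centred across objects first, so the root-mean-square of a slab equals its standard deviation, and that value is used as $s_j$ (or $s_k$). Scaling the k-slabs changes the scales of the j-slabs, so the double version alternates the two steps until every slab in both modes has unit scale. Each step only rescales the rows of one loading matrix, so the mode ranks stay at the number of components.

```python
import numpy as np


def unfold(X, mode):
    """Matricize: rows = chosen mode, columns = the other two modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def slab_scale(X, mode):
    """Divide each slab of `mode` by its root-mean-square (one scale per slab)."""
    other = tuple(m for m in range(X.ndim) if m != mode)
    s = np.sqrt(np.mean(X ** 2, axis=other, keepdims=True))
    return X / s


def double_slab_scale(X, tol=1e-10, max_iter=1000):
    """Alternate slab scaling along axes 1 (j) and 2 (k) until the j-scales settle at 1."""
    Y = X.copy()
    for _ in range(max_iter):
        Y = slab_scale(slab_scale(Y, 1), 2)   # x*_ijk = x_ijk/s_j, then x**_ijk = x*_ijk/s_k
        rms_j = np.sqrt(np.mean(Y ** 2, axis=(0, 2)))
        if np.max(np.abs(rms_j - 1.0)) < tol:
            break
    return Y


rng = np.random.default_rng(3)
I, J, K, F = 15, 6, 5, 2
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))
X = np.einsum("if,jf,kf->ijk", A, B, C)
Xc = X - X.mean(axis=0, keepdims=True)

Xs = slab_scale(Xc, 1)            # single slab scaling in the variable mode
Xss = double_slab_scale(Xc)       # iterative double slab scaling

for name, Y in [("slab", Xs), ("double slab", Xss)]:
    print(name, [int(np.linalg.matrix_rank(unfold(Y, m))) for m in range(3)])
```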

Page 8

Tucker models

• Tucker1: $\mathbf{X} = \mathbf{A}\mathbf{G} + \mathbf{E}$
– Tucker1 = PCA

• Tucker2: $\mathbf{X} = \mathbf{G}(\mathbf{C} \otimes \mathbf{B})^{\mathrm{T}} + \mathbf{E}$
– $\mathbf{G}$ ($I \times R_2 \times R_3$)
– very rarely used

• Tucker3: $\mathbf{X} = \mathbf{A}\mathbf{G}(\mathbf{C} \otimes \mathbf{B})^{\mathrm{T}} + \mathbf{E}$
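The three Tucker models describe the same array at different levels of compression. The NumPy sketch below (random loadings and core, all sizes arbitrary) builds a Tucker3 array elementwise and checks it against the matrix forms, assuming the I × JK unfolding $[\mathbf{X}_1 \, \mathbf{X}_2 \cdots \mathbf{X}_K]$ in which the second index runs fastest, so the Kronecker product is $\mathbf{C} \otimes \mathbf{B}$. Written this way, Tucker3 is $\mathbf{A}\mathbf{G}(\mathbf{C} \otimes \mathbf{B})^{\mathrm{T}}$, Tucker2 is $\mathbf{G}(\mathbf{C} \otimes \mathbf{B})^{\mathrm{T}}$ with an $I \times R_2 \times R_3$ core (A absorbed into the core), and Tucker1 is $\mathbf{A}\mathbf{G}$ with an $R_1 \times J \times K$ core (B and C absorbed), i.e. a PCA of the unfolded array.

```python
import numpy as np


def unfold1(T):
    """I x JK unfolding [T_1 T_2 ... T_K]: the second index runs fastest."""
    return T.transpose(0, 2, 1).reshape(T.shape[0], -1)


rng = np.random.default_rng(4)
I, J, K = 8, 6, 5
R1, R2, R3 = 3, 2, 2                      # components in modes 1, 2, 3
A = rng.random((I, R1))
B = rng.random((J, R2))
C = rng.random((K, R3))
G3 = rng.random((R1, R2, R3))             # Tucker3 core

# Elementwise Tucker3: x_ijk = sum_pqr a_ip b_jq c_kr g_pqr
X = np.einsum("ip,jq,kr,pqr->ijk", A, B, C, G3)
X1 = unfold1(X)

# Tucker3: X = A G (C kron B)^T
tucker3 = A @ unfold1(G3) @ np.kron(C, B).T
# Tucker2: X = G (C kron B)^T with an I x R2 x R3 core
G2 = np.einsum("ip,pqr->iqr", A, G3)
tucker2 = unfold1(G2) @ np.kron(C, B).T
# Tucker1: X = A G with an R1 x J x K core, i.e. PCA of the unfolded array
G1 = np.einsum("jq,kr,pqr->pjk", B, C, G3)
tucker1 = A @ unfold1(G1)

print(np.allclose(X1, tucker3), np.allclose(X1, tucker2), np.allclose(X1, tucker1))
```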

Page 9

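PARAFAC2

[Figure: three-way array with modes object (I), wavelength (J) and time (K); the time profiles are shifted between objects (time shift)]

In PARAFAC2, only the cross-product matrix $\mathbf{X}_i\mathbf{X}_i^{\mathrm{T}}$ ($J \times J$) of each object slab $\mathbf{X}_i$ ($J \times K$) is modelled. This works if the correlation structure of the time profiles is the same in all objects, which is approximately the case when the profiles are shifted in time but keep their shape.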
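Why the cross-product helps: in the NumPy sketch below (made-up Gaussian elution profiles, spectra and concentrations) each object slab is written as $\mathbf{X}_i = \mathbf{B}\mathbf{D}_i\mathbf{A}_i^{\mathrm{T}}$, with spectra B (J × R), a diagonal concentration matrix $\mathbf{D}_i$ and time profiles $\mathbf{A}_i$ (K × R) shifted differently for each object. The shift is an idealised circular shift with `np.roll`, chosen because it leaves $\mathbf{A}_i^{\mathrm{T}}\mathbf{A}_i$ exactly unchanged; with real, non-circular shifts this holds only approximately, as long as the peaks stay inside the time window. The slabs differ, but every cross-product follows $\mathbf{B}\mathbf{D}_i\mathbf{\Phi}\mathbf{D}_i\mathbf{B}^{\mathrm{T}}$ with one common $\mathbf{\Phi}$, the structure PARAFAC2 fits.

```python
import numpy as np

rng = np.random.default_rng(5)
J, K, R = 4, 30, 2                        # wavelengths, time points, components
t = np.arange(K)
# Made-up Gaussian elution profiles (K x R) of an unshifted reference object
profiles = np.stack([np.exp(-0.5 * ((t - 10) / 2.0) ** 2),
                     np.exp(-0.5 * ((t - 16) / 3.0) ** 2)], axis=1)
B = rng.random((J, R))                    # spectra, common to all objects
Phi = profiles.T @ profiles               # R x R cross-product of the time profiles


def slab(conc, shift):
    """J x K slab X_i = B D_i A_i^T with the time profiles shifted by `shift` points."""
    A_i = np.roll(profiles, shift, axis=0)    # circular shift: A_i^T A_i == Phi
    return B @ np.diag(conc) @ A_i.T


# The shift changes the slab itself ...
print("slabs equal:", np.allclose(slab([0.3, 0.8], 0), slab([0.3, 0.8], 4)))

# ... but not the structure of X_i X_i^T
for conc, shift in [([1.0, 0.5], 0), ([0.3, 0.8], 4)]:
    Xi = slab(conc, shift)
    D_i = np.diag(conc)
    ok = np.allclose(Xi @ Xi.T, B @ D_i @ Phi @ D_i @ B.T)
    print(f"shift {shift}: X_i X_i^T = B D_i Phi D_i B^T -> {ok}")
```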

Page 10

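Missing data

• Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA, etc.) when some of the data are missing:

$\mathbf{X} = [\mathbf{X}^{*} \; \mathbf{X}^{\#}]$ ($\mathbf{X}^{*}$: known, $\mathbf{X}^{\#}$: missing)

• 0. Initialize $\mathbf{X}^{\#}$

• 1. Estimate the model, $\mathbf{X}_n = \hat{\mathbf{X}}_n + \mathbf{E}_n$ (maximization)

• 2. Replace the missing values with the model values, $\mathbf{X}^{\#}_n = \hat{\mathbf{X}}^{\#}_n$ (expectation)

• 3. Repeat until convergence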
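The EM loop can be sketched for the simplest case, a PCA model of a two-way matrix; PARAFAC, Tucker or PLS would replace the SVD step with their own fitting routine. This minimal NumPy sketch rests on stated assumptions: missing values are marked as NaN, the missing part is initialised with column means, the model is a mean-centred rank-`n_comp` PCA computed by SVD, and convergence is judged by the change in the fit to the known part. The function name `em_pca` is hypothetical.

```python
import numpy as np


def em_pca(X, n_comp, n_iter=500, tol=1e-10):
    """Fill NaNs in X by EM: alternate a rank-n_comp PCA fit and re-imputation."""
    missing = np.isnan(X)
    Xf = np.where(missing, np.nanmean(X, axis=0), X)   # step 0: initialise with column means
    prev = np.inf
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        Xhat = mu + (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]   # step 1: fit the model
        Xf[missing] = Xhat[missing]                              # step 2: replace missing values
        sse = np.sum((X[~missing] - Xhat[~missing]) ** 2)        # fit to the known part
        if abs(prev - sse) < tol * max(sse, 1.0):                # step 3: until convergence
            break
        prev = sse
    return Xf


rng = np.random.default_rng(6)
X_true = rng.random((30, 2)) @ rng.random((2, 8))   # exact rank-2 data
mask = rng.random(X_true.shape) < 0.1               # roughly 10 % missing
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_init = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)
X_em = em_pca(X_obs, n_comp=2)
err_init = np.sqrt(np.mean((X_init - X_true)[mask] ** 2))
err_em = np.sqrt(np.mean((X_em - X_true)[mask] ** 2))
print(f"RMS error on missing values: mean imputation {err_init:.4f}, EM-PCA {err_em:.2e}")
```

Only the missing cells are ever overwritten, so the known values stay exactly as measured.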

Page 11

Thank you very much for your attention!