new data analysis in gcxgc - tecnometrix · 2014. 7. 2. · welcome to the magic world of...

56
Data analysis in GCxGC Data analysis in GCxGC Van ‘t Hoff Institute for Molecular Sciences Van ‘t Hoff Institute for Molecular Sciences University of Amsterdam University of Amsterdam Gabriel Vivó Truyols Analytical-chemistry group Van ‘t Hoff Institute for Molecular Sciences University of Amsterdam [email protected]

Upload: others

Post on 17-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Data analysis in GCxGCData analysis in GCxGC

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Gabriel Vivó TruyolsAnalytical-chemistry group

Van ‘t Hoff Institute for Molecular SciencesUniversity of Amsterdam

[email protected]

Page 2: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

IntroductionFrom 1D chromatography to 2D chromatography: what does change?

GCxGC, Riva - 2014

2 4 6 8 10 12 14 16 18Time, min

0x100

4x106

8x106

1x107

Inte

nsity

Page 3: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

The complexity of the data changesInstruments can be classified according to the order of the tensor of data used to represent a single experiment:

Zero-order instruments Produce Zero-order tensor

(e.g. a number) Example pH - meter

First-order instruments Produce First-order tensor

(e.g. a vector) Example UV-VIS spectrometer

Second-order instruments Produce Second-order tensor

(e.g. a matrix) Example LC-MS, GCxGC

Third-order instrument Produce Third-order tensor

(e.g. a “cube” of data) Example GCxGC-MS

nth-order instrument exist, but they are rare

The changes from 1D to 2D

GCxGC, Riva - 2014

Page 4: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

The concept of “peak vicinity” changes

Two-dimensional

1tR2 t R

One-dimensional

tR

Peak of interest

Peak of interest

Neighbors Neighbors

S. Peters, G. Vivó-Truyols, P. Marriott, P.J. Schoenmakers, J. Chromatogr. A. 1146 (2007), 232-241.

The changes from 1D to 2D

GCxGC, Riva - 2014

Page 5: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

The concept of “peak resolution” is different

A

B

One-dimensional

tR

Two-dimensional

1tR2 t R

f

g

Valley-to-peak ratio(between A and B) g

fP gfP

AB?

Valley-to-peak ratio(between A and B)?

S. Peters, G. Vivó-Truyols, P. Marriott, P.J. Schoenmakers, J. Chromatogr. A. 1146 (2007), 232-241.GCxGC, Riva - 2014

The changes from 1D to 2D

Page 6: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

The concept of “peak” changes

One-dimensional

tR

Two-dimensional

The changes from 1D to 2D

GCxGC, Riva - 2014

Page 7: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Do the main steps in data processing change?

Ste

p 2

Pre-process

Ste

p 3

MeasureS

tep

1View

• Base-line correction

• Noise filtering• Spike filtering• Alignment• … etc.

• Peak detection / integration

• Calibration• Deconvolution• Pattern

recognition

The changes from 1D to 2D

GCxGC, Riva - 2014

Page 8: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Do the main steps in data processing change?

Ste

p 2

Pre-process

Ste

p 3

MeasureS

tep

1View

• Base-line correction

• Noise filtering• Spike filtering• Alignment• … etc.

• Peak detection / integration

• Calibration• Deconvolution• Pattern

recognition• Class separation

The changes from 1D to 2D

GCxGC, Riva - 2014

Basically, the steps are the same, but the algorithms used for each step may be different.

Univariate

Multivariate

• Folding• Phasing

Page 9: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

First step: visualizationFirst step: visualization

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

GCxGC, Riva - 2014

Ste

p 2

Pre-process

Ste

p 3

Measure

Ste

p 1

View

• Base-line correction

• Noise filtering• Spike filtering• Alignment• … etc.

• Peak detection / integration

• Calibration• Deconvolution• Pattern

recognition• Class separation

• Folding• Phasing

Page 10: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

VisualisationRaw data in 2D chromatography

0 10 20 30Time, min

-101234

Abso

rban

ce, A

U

Chromatogram of Glycine preparation (254nm)(data courtesy of University of Valencia)

Two-dimensional chromatographyGCxGC chromatogram of diesel (FID detector)First dimension: non-polar; Second dimension: polar

GCxGC, Riva - 2014

Page 11: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

“Folding” the chromatogram Visualisation

GCxGC, Riva - 2014

Mod time (8 seconds)

27002710

27202730

2740

02

46

80

2000

4000

6000

8000

10000

1tR, s2tR, s

FID

sig

nal

Page 12: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

“Folding” the chromatogram Visualisation

27002720

2740

02

46

80

5000

10000

1tR, s2tR, s

FID

sig

nal

2700 2710 2720 2730 27400

1

2

3

4

5

6

7

8

1tR, s

2 t R, s

0

2000

4000

6000

8000

10000

Mod time (8 seconds)

Page 13: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

“Folding” the chromatogram: phasing Visualisation

Mod time (8 seconds)

27002720

2740

46

8100

5000

10000

1tR, s2tR, s

FID

sig

nal

2700 2710 2720 2730 2740

3

4

5

6

7

8

9

10

1tR, s

2 t R, s

0

2000

4000

6000

8000

10000

Phase = 0.3

Page 14: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

2700 2710 2720 2730 27404

5

6

7

8

9

10

11

12

1tR, s

2 t R, s

0

2000

4000

6000

8000

10000

27002720

2740

46

810

120

5000

10000

1tR, s2tR, s

FID

sig

nal

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

“Folding” the chromatogram: phasing Visualisation

Mod time (8 seconds)

Phase = 0.5

Page 15: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

2700 2710 2720 2730 27406

7

8

9

10

11

12

13

14

1tR, s

2 t R, s

0

2000

4000

6000

8000

10000

27002720

2740

68

1012

140

5000

10000

1tR, s2tR, s

FID

sig

nal

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

“Folding” the chromatogram: phasing Visualisation

Mod time (8 seconds)

Phase = 0.75

Page 16: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Cylindrical coordinates. An alternative way to represent the data.

Visualisation

J.J.A.M. Weusten, E.P.P.A. Derks , J.H.M. Mommers, S. van der Wal, Anal. Chim. Acta 726 (2012), 9

1tR

=2tR

GCxGC, Riva - 2014

Page 17: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Interpolation Visualisation

27002720

2740

46

8100

5000

10000

1tR, s2tR, s

FID

sig

nal

GCxGC, Riva - 2014

Page 18: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

+ =

Welcome to the magic world of chemometrics!

Interpolation Visualisation

GCxGC, Riva - 2014

Page 19: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

“Folding” the chromatogram: final result Visualisation

Page 20: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Conclusions Visualisation

• Visualising is simple, and gives a lot of information.

• Folding (one-dimensional) data into (2D) image introduces discontinuities in the edges. Other visualization methods (cylindrical coordinates) possibloe.

• Phasing can be of great help.

• Careful with “cosmetic” effects!

GCxGC, Riva - 2014

Page 21: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Second step: Pre-processingSecond step: Pre-processing

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

GCxGC, Riva - 2014

Ste

p 2

Pre-process

Ste

p 3

Measure

Ste

p 1

View

• Base-line correction

• Noise filtering• Spike filtering• Alignment• … etc.

• Peak detection / integration

• Calibration• Deconvolution• Pattern

recognition• Class separation

• Folding• Phasing

Page 22: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingTypical problems: base-line drifts and noise

0 10 20 30Time, min

-101234

Abso

rban

ce, A

U

0 10 20 30Time, min

-0.15-0.1

-0.050

0.050.1

Abso

rban

ce, A

U

Base-line drifts Noise

20 20.2 20.4 20.6 20.8 21Time, min

-0.04-0.02

00.020.040.06

Abso

rban

ce, A

U

GCxGC, Riva - 2014

Page 23: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

0 10 20 30Time, min

-0.2

-0.1

0

0.1

0.2

Abso

rban

ce, A

U

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingBase-line drifts.

0 10 20 30Time, min

-0.15-0.1

-0.050

0.050.1

Abso

rban

ce, A

U

Original chromatogram

Corrected chromatogr.

Fitted base-line correction

GCxGC, Riva - 2014

Weighted least squares fitting

Page 24: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processing

GCxGC, Riva - 2014

Originalchromatogram

Base-line drifts.

Weighted least squares fitting

Fitted base-line correction

Correctedchromatogram

Page 25: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processing

GCxGC, Riva - 2014

Originalchromatogram

Fitted base-line correction

Correctedchromatogram

Other (more sophisticated) options:

- Use splines- Base-line correction coupled to peak detection- Fourier-transform based approaches- Wavelet-based approaches

Base-line drifts.

Weighted least squares fitting

Page 26: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processing

GCxGC, Riva - 2014 S.E. Reichenbach, M. Ni, D. Zhang, E.B. Ledford Jr., J. Chromatogr. A, 985 (2003) 47 - 56

Base-line is reached at the (half) upper part

Base-line is reached at the (half) bottom part

Base-line drifts.

Page 27: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processing

GCxGC, Riva - 2014 S.E. Reichenbach, M. Ni, D. Zhang, E.B. Ledford Jr., J. Chromatogr. A, 985 (2003) 47 - 56

Consider the positions with the smallest values in each half

Estimate local background parameters using neighboring

values

Interpolate the main background trend and

substract it

Base-line drifts.

Page 28: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingNoise removal. Smoothing and derivatives.

Savitzky-Golay filter is the most common method

Two parameters should be optmisized

• Window size• Polynomial degree

These parameters govern how much correlated noise is

removed

• Large window sizes and low polynomial degree

Too much noise is removed (chromatograms appear deformed)

• Small window sizes and large polynomial degrees Too much noise remains

GCxGC, Riva - 2014

Page 29: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingNoise removal. Smoothing and derivatives.

Window size = 11Polynomial = 2

Window size = 41Polynomial = 2

Window size = 251Polynomial = 2

Originalchromatogram

Correctedchromatogram

G. Vivó-Truyols, P.J. Schoenmakers, "Automatic selection of optimal Savitzky-Golay smoothing”, Anal. Chem. 78 (2006) 4598-4608.GCxGC, Riva - 2014

FID

sig

nal

Page 30: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingNoise removal. Spikes.

A good way of removing spikes consists of passing a median filter(before the Savitsky-Golay filter)

Savitzky-Golay filter

Median filterBase-line correction

Original data

Parameter to tune: window

size

Parameters to tune: window size and

polynomial degreeOptimising three parameters

GCxGC, Riva - 2014

Page 31: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingAlignment.

Two types of alignment

Between-chromatogram alignment

Between-modulation alignment

Alignment is not always necessary, depending on the final objective of the analysis

Rarely doneUsing 2D

techniques(folded data)

Using 1D techniques

(unfolded data)

Page 32: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

15 different (but related) chromatograms

13.5 13.6 13.7 13.8Time, min

Alignment

Pre-processingAlignment. COW (1D)

Page 33: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pre-processingAlignment. Using (truly) 2D algorithms

S. Castillo, I. Mattila, J. Miettinen, M. Orešič, T.Hyötyläinen, Anal. Chem. 83 ( 2011) 3058–3067

Score alignment in GCxGC-MS

D. Zhang, X. Huang, F.E. Regnier,M. Zhang, Anal. Chem., 80 (2008) 2664–2671

COW-adapted GCxGC-MS (using

single channel)

Page 34: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Conclusions Pre-processing

• Pre-processing methods are almost the same: one-dimensional = two-dimensional. Normally done in the (pre-folded) raw data.

• Every case needs a particular solution (it always exists, but some care should be taken!)

GCxGC, Riva - 2014

Page 35: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Third step: measureThird step: measure

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

GCxGC, Riva - 2014

Ste

p 2

Pre-process

Ste

p 3

Measure

Ste

p 1

View

• Base-line correction

• Noise filtering• Spike filtering• Alignment• … etc.

• Peak detection / integration

• Calibration• Deconvolution• Pattern

recognition• Class separation

• Folding• Phasing

Page 36: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Peak detection in one step: the watershed algorithm Peak detection

Most common software programs use the watershed algorithm to detect peaks in 2D chromatography:

J. De Bock et al., doi 10.1007/11558484

1 2 3 4

GCxGC, Riva - 2014

Page 37: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

??? ?Single catchmentbasin?

G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063

Peak detection in one step: the watershed algorithm

GCxGC, Riva - 2014

Peak detection

Page 38: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

The first problem using the watershed algorithm.

Peak detection in one step: the watershed algorithm

GCxGC, Riva - 2014

Peak detection

Page 39: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Critical variability in second-dimension retention time:

2

2 1

pq

tP critfail

22 1

pq

tP critfail

mp

2

1

m

p

2

1

1mq 1mq

q=0.5 (eight cuts per peak)

q=1 (four cuts per peak)

q=2 (two cuts per peak)

q=4 (1 cut per peak)0 0.04 0.08 0.12 0.16 0.2

tR,crit, min

0

0.2

0.4

0.6

0.8

1

Pfa

il

p=5p=20

p=2.5

p=10

Current instruments

Current chromatographic practice

G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063

Peak detectionin one step: the watershed algorithm

GCxGC, Riva - 2014

Peak detection

Page 40: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

GCxGC, Riva - 2014

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

S. Peters, G. Vivó-Truyols, P.J. Marriott and P.J. Schoenmakers, J. Chromatogr. A 1156 (2007) 14.E.J.C. van der Klift, G. Vivó-Truyols, F.W. Claassen, F.L. van Holthoon, T.A. van Beek, J. Chromatogr. A, 1178 (2008) 43.

Time, arbitrary units (AU)

Ste

p

1 Detect peaks as in one-dimensional chromatography

Use information from derivatives (pre-processing

step)

Peak detectioni in two steps. Peak detection

Page 41: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Ste

p

2 Merge peaks that belong to the same compound according to 2nd-dimension retention time differences

50100

150

0

5

100

500

1000

1500

4 4.5 5 5.5 60

500

1000

1500

2tR, AU

T: Tolerance criterion

70 90 110 1304.5

5

5.5

1tR, AU

2 t R, A

U

GCxGC, Riva - 2014

Peak detectioni in two steps. Peak detection

Page 42: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Ste

p

2 Merge peaks that belong to the same compound according to 2nd-dimension retention time differences

50100

150

0

5

100

500

1000

1500

70 90 110 1304.5

5

5.5

1tR, AU

2 t R, A

U

4 4.5 5 5.5 60

500

1000

1500

2tR, AU

D

D>T ?

GCxGC, Riva - 2014

Peak detectioni in two steps. Peak detection

Page 43: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Ste

p

3 Check unimodality

50100

150

0

5

100

500

1000

1500

4 4.5 5 5.5 60

500

1000

1500

2tR, AU

70 90 110 1304.5

5

5.5

1tR, AU

2 t R, A

U

GCxGC, Riva - 2014

Peak detectioni in two steps. Peak detection

Page 44: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Conclusions Peak detection

• Two methods available: (inverse) watershed, and two-step peak detection process.

• Peak detection seems to be still a subject of discussion.

GCxGC, Riva - 2014

Page 45: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Deconvolution mehtods. Deconvolution

GCxGC, Riva - 2014

Deconvolution methods (always with MS)

Using 1D (unfolded) data

Use truly (folded) 2D data

• ALS or rank annihilation methods• Does not need between-

modulation alignment• Similar to AMDIS

• PARAFAC (needs between-modulation alignment)

• PARAFAC2 (more robust against between-modulation alignment)

Main problem: determine the number of components behind the peak cluster

Page 46: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Deconvolution mehtods. Deconvolution

GCxGC, Riva - 2014A.E. Sinha, J.L. Hope, B.J. Prazen, C.G. Fraga, E.J. Nilsson, R.E.

Synovec, J. Chromatogr. A, 1056 (2004) 145 - 154

An example of PARAFAC

Page 47: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Third step: measureThird step: measure

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

GCxGC, Riva - 2014

Ste

p 2

Pre-process

Ste

p 3

Measure

Ste

p 1

View

• Base-line correction

• Noise filtering• Spike filtering• Alignment• … etc.

• Peak detection / integration

• Calibration• Deconvolution• Pattern

recognition• Class separation

• Folding• Phasing

Page 48: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pattern recognition as a variable reductionS

tep

1 Obtain chromatogram(s)

Ste

p

2 Peak detection

Ste

p

3 Pattern recognition

Ste

p

4 Unknown sample characterized

HPLC-MS/MS GCxGC-MS

GC-MS

Base-line correction Alignment

Peak detection

Digital variables

Chemicalvariables

Process/biologicalvariables

From digital variables (chromatogram) to process variables

Raw data

Features

Healthy/sick

Pattern recognition

GCxGC, Riva - 2014

Page 49: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Information

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Is peak detection necessary?

Chromatographic data

Using raw data directly

Ana

lysi

s of

flam

e ac

cele

rant

s us

ing

GC

xGC

Source ASource B

… for example:

Pattern recognition

GCxGC, Riva - 2014

Page 50: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pattern recognition as a variable reductionS

tep

1 Obtain chromatogram(s)

Ste

p

2 Peak detection

Ste

p

3 Pattern recognition

Ste

p

4 Unknown sample characterized

HPLC-MS/MS GCxGC-MS

GC-MS

Base-line correction Alignment

Peak detection

Digital variables

Chemicalvariables

Process/biologicalvariables

From digital variables (chromatogram) to process variables

Raw data

Features

Healthy/sick

Pattern recognition

GCxGC, Riva - 2014

Page 51: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pattern recognition in GCxGC. The options Pattern recognition

GCxGC, Riva - 2014

Pattern recognition in GCxGC

Using raw data Using peak tables

• Alignment is critical• Less chance to miss important

compounds• Normally done with the unfolded

(raw) data, but not always (e.g. N-PLS)

• Alignment not important, but peak tracking is essential (normally MS should be present)

• Chance to miss important compounds (close to the S/N)

• (Truly) 1D method

Page 52: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pattern recognition in GCxGC. Supervised methods. Pattern recognition

GCxGC, Riva - 2014

Any method will be prone to overfitting

In supervised pattern recognition of GCxGC, a tremendous reduction of variables is performed (form millions to a few tens/hundreds)

Any variable pre-reduction (e.g. using Fisher ratios) should be done within a cross-validation loop

Otherwise the results will be optimistic (a method that seems to work, when in fact it only works for that data)

Page 53: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pattern recognition in GCxGC. Example of a wrong strategy Pattern recognition

GCxGC, Riva - 2014

Ste

p 2Variable

selection: Fisher ratio on the raw

data Ste

p 3Supervised

pattern recognition: PLS-DA to

separate sick from healthyS

tep

1Obtain GCxGCchromatograms for sick (50) and

healthy (50) Ste

p 4Consider the

coefficients from PLS-DA as indicators of

potential metabolites

Objective: discovering metabolites responsible for cancer tumor

??? ?Aren’t you Overfitting? No, I’ve been cross-

validating the PLS-DA

… but the variable pre-selection has been done with the full data set!!

Page 54: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Pattern recognition in GCxGC. Example of a wrong strategy Pattern recognition

GCxGC, Riva - 2014

Ste

p 2Variable

selection: Fisher ratio on the raw

data Ste

p 3Supervised

pattern recognition: PLS-DA to

separate sick from healthyS

tep

1Obtain GCxGCchromatograms for sick (50) and

healthy (50) Ste

p 4Consider the

coefficients from PLS-DA as indicators of

potential metabolites

Objective: discovering metabolites responsible for cancer tumor

??? ?Aren’t you Overfitting? No, I’ve been cross-

validating the variable seletion and the PLS-DA

correct

Page 55: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Orig

inal

dat

a se

t

Ste

p 2Withdraw

subset 1, fit the model with the rest, and test

the model with subset 1 S

tep

3

Do the same for the

other k sectionsS

tep

1Divide the original data

set in k subsets

(randomly) Ste

p 4

Sum up the error of the model in all

validation sets

1

2

k

Cal

ibra

tion

Val

Cal

ibra

tion

Val

Cal

Test the model here

Test the model here

Cal

Val

Cal

Pattern recognition in GCxGC. Example of a correct strategy Pattern recognition

“model” = “variable pre-selection (Fisher ratio) + PLS-DA”

Page 56: New Data analysis in GCxGC - Tecnometrix · 2014. 7. 2. · Welcome to the magic world of chemometrics! Interpolation Visualisation GCxGC, Riva - 2014. Van ‘t Hoff Institute for

Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam

Conclusions Multivariate methods

• Deconvolution: normally done with the unfolded data (less problems with between-modulation alignment)

• Deconvolution: problem to establish the number of compounds (normally done in a manual way)

• Two ways for pattern recognition: with raw data (normally preferred) or with peak table.

• Careful with validation of supervised pattern recognition. Variable pre-selection should be included in the validation loop.

GCxGC, Riva - 2014