BIOSYST-MeBioS www.biw.kuleuven.be The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius

The Potential of FDA for Chemometrics

Introduction to FDA

Introduction to Chemometrics

Using FDA in chemometrics

For prediction

For Analysis Of Variance

Conclusions

What is Functional Data Analysis?

Developed by Ramsay & Silverman (1997)

Analyse Data

By approximating it

Using some kind of functional basis

Mainly for longitudinal data

High correlation between neighbouring datapoints

Why use FDA?

Data as single entity <-> individual observations

Make a function of your data

Derivatives

Reduce the amount of data

Noise -> smoothing

Impose some known properties on the data

Monotonicity, non-negativeness, smoothness, ...

Basis Functions?

Polynomials: 1, t, t², t³, ...

Fourier: 1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt), ...

Splines

Wavelets

Depends on your data
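As a rough illustration of how the first two bases are evaluated, the sketch below computes the polynomial and Fourier basis functions at a single point and combines them with coefficients. The function names and the choice ω = 2π are illustrative assumptions, not taken from the slides.

```python
import math

def polynomial_basis(t, degree):
    """Values of 1, t, t^2, ..., t^degree at the point t."""
    return [t ** p for p in range(degree + 1)]

def fourier_basis(t, omega, n_harmonics):
    """Values of 1, sin(wt), cos(wt), ..., sin(Kwt), cos(Kwt) at the point t."""
    values = [1.0]
    for k in range(1, n_harmonics + 1):
        values.append(math.sin(k * omega * t))
        values.append(math.cos(k * omega * t))
    return values

def evaluate(coefficients, basis_values):
    """A function is represented as a weighted sum of basis functions."""
    return sum(c * b for c, b in zip(coefficients, basis_values))

print(polynomial_basis(2.0, 3))                 # [1.0, 2.0, 4.0, 8.0]
print(fourier_basis(0.0, 2 * math.pi, 2))       # [1.0, 0.0, 1.0, 0.0, 1.0]
print(evaluate([1.0, 0.5, 0.0, 0.25], polynomial_basis(2.0, 3)))  # 4.0
```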

Chemometrics

Measure optical properties of material

Transmission or reflection of light

At a large number of wavelengths

Use these properties to predict something else

Why Chemometrics?

Fast

Cheap

Non-destructive

Environment-friendly

Classical methods

Ignore correlation between neighbouring wavelengths

FDA in chemometrics

NIR spectra

Absorption peaks

Width and height

Basis: B-splines

~ shape of absorption peaks

Preserve the vicinity constraint

Spline Functions

Piecewise polynomials of order m, joined at the knots

Fast evaluation

Continuity of derivatives up to order m-2 at the L interior knots

Degrees of freedom: L + m

Flexible
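The degrees-of-freedom count can be checked with a small sketch of the Cox-de Boor recursion. It assumes the usual construction in which each boundary knot is repeated m times; the function name, the interval [0, 1], and the knot positions are illustrative choices only.

```python
def bspline_basis(x, knots, order):
    """Values of all B-spline basis functions of the given order at the point x."""
    n_intervals = len(knots) - 1
    # Order 1: indicator function of each knot interval.
    values = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(n_intervals)]
    if x == knots[-1]:
        # Close the last non-empty interval on the right so the endpoint is covered.
        for j in range(n_intervals - 1, -1, -1):
            if knots[j] < knots[j + 1]:
                values[j] = 1.0
                break
    # Raise the order step by step (Cox-de Boor recursion).
    for k in range(2, order + 1):
        new_values = []
        for j in range(len(knots) - k):
            left_den = knots[j + k - 1] - knots[j]
            right_den = knots[j + k] - knots[j + 1]
            left = (x - knots[j]) / left_den * values[j] if left_den > 0 else 0.0
            right = (knots[j + k] - x) / right_den * values[j + 1] if right_den > 0 else 0.0
            new_values.append(left + right)
        values = new_values
    return values

order = 4                                  # cubic splines
interior = [0.25, 0.5, 0.75]               # L = 3 interior knots
knots = [0.0] * order + interior + [1.0] * order

values = bspline_basis(0.3, knots, order)
print(len(values))                         # L + m = 3 + 4 = 7 basis functions
```

Each basis function is non-zero on at most m adjacent knot intervals, which is what keeps evaluation fast and ties every coefficient to a neighbourhood of wavelengths.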

Constructing a spline basis

Order

What to use the model for

Mostly cubic splines (order 4)

Number and position of knots

Use enough

Look at the data

Beware of overfitting

Position of knots

More variation -> more knots

[Figure: two panels showing a spectrum (x-axis 0 to 2000, y-axis "values" from 1 to 5), one fitted with 54 equally spaced knots and one with 54 tuned knots]

B-spline approximation

FDA for prediction

Functional regression models

P-Spline Regression (Marx and Eilers)

Non-Parametric Functional Data Analysis (Ferraty and Vieu)

Functional Regression Models

Project spectra to spline basis

Apply Multivariate Linear Regression to the spline coefficients

Great reduction in system complexity

Natural shape of absorption peaks is used
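The two steps above can be sketched with NumPy. Everything in the example is an assumption made for illustration: the spectra are synthetic and noise-free (one Gaussian absorption peak on a flat baseline), the knots are equally spaced, and the helper name `bspline_design` is hypothetical.

```python
import numpy as np

def bspline_design(x, knots, order):
    """B-spline design matrix (len(x) x n_basis) via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    last = np.nonzero(t[:-1] < t[1:])[0][-1]     # last non-empty knot interval
    B[x == t[-1], last] = 1.0                    # include the right endpoint
    for k in range(2, order + 1):
        new = np.zeros((x.size, t.size - k))
        for j in range(t.size - k):
            d1 = t[j + k - 1] - t[j]
            d2 = t[j + k] - t[j + 1]
            if d1 > 0:
                new[:, j] += (x - t[j]) / d1 * B[:, j]
            if d2 > 0:
                new[:, j] += (t[j + k] - x) / d2 * B[:, j + 1]
        B = new
    return B

rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 200)
order = 4
interior = np.linspace(0.0, 1.0, 12)[1:-1]       # 10 interior knots
knots = np.concatenate([[0.0] * order, interior, [1.0] * order])
Phi = bspline_design(wavelengths, knots, order)  # 200 x 14

# Synthetic spectra: peak height is the property to predict.
height = rng.uniform(0.5, 2.0, size=60)
baseline = rng.uniform(0.0, 0.3, size=60)
peak = np.exp(-0.5 * ((wavelengths - 0.4) / 0.08) ** 2)
spectra = height[:, None] * peak[None, :] + baseline[:, None]
y = height

# Step 1: project every spectrum onto the spline basis (least squares).
coef = np.linalg.lstsq(Phi, spectra.T, rcond=None)[0].T   # 60 x 14

# Step 2: multivariate linear regression on the spline coefficients.
A = np.column_stack([np.ones(len(coef)), coef])
train, test = slice(0, 40), slice(40, 60)
# rcond drops directions without variation in this noise-free toy data.
w = np.linalg.lstsq(A[train], y[train], rcond=1e-10)[0]
rmsep = np.sqrt(np.mean((A[test] @ w - y[test]) ** 2))
print(coef.shape, rmsep)
```

In this toy setting each spectrum shrinks from 200 wavelengths to 14 coefficients before the regression is fitted, which is the reduction in system complexity the slide refers to.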

Functional Regression Models: case study

420 samples of hog manure

Reflectance spectra

Total nitrogen (TN) and dry matter (DM) content

PLS and Functional Regression applied

Functional Regression: case study (cont'd)

Functional Regression: case study: results

Dry matter content

            FDA       PLS       # B-splines   # latent variables
Dataset 1   10.4069   10.3282   22            6
Dataset 2    9.9084   10.565    20            6
Dataset 3   10.4921   10.4857   22            6
Dataset 4   10.4533   10.3236   22            6
Dataset 5    9.1203   10.6019   23            6

Total nitrogen content

            FDA       PLS       # B-splines   # latent variables
Dataset 1   1.1922    1.2603    25            6
Dataset 2   1.1582    1.1826    25            6
Dataset 3   1.1806    1.2325    25            6
Dataset 4   1.253     1.2852    25            6
Dataset 5   1.1562    1.2664    25            6

P-Spline Regression (PSR)

By Marx and Eilers

Construct $\beta$ with B-splines: $\beta = B\alpha$

Use roughness parameter $\lambda$ on $D\alpha$

Minimize $S = \|y - XB\alpha\|^2 + \lambda \|D\alpha\|^2$

Full spectra are used for regression
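A minimal sketch of the penalised fit, assuming the objective has the usual P-spline form S = |y - XBα|² + λ|Dα|² with D a difference matrix acting on the coefficients α. The data are random placeholders, the basis uses linear (order 2) B-splines for brevity, and `psr_fit` is a hypothetical helper name. The values 7 coefficients and λ = 0.001 are borrowed from the case study only as example settings.

```python
import numpy as np

def psr_fit(X, B, y, lam, d=2):
    """Minimise |y - X B a|^2 + lam * |D_d a|^2 over the coefficients a."""
    U = X @ B                                       # spectra projected on the basis
    D = np.diff(np.eye(B.shape[1]), n=d, axis=0)    # d-th order difference matrix
    alpha = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ y)
    return alpha, D

rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_coef = 30, 50, 7

# Linear B-splines ("hat" functions) on equally spaced knots.
x = np.linspace(0.0, 1.0, n_wavelengths)
centers = np.linspace(0.0, 1.0, n_coef)
h = centers[1] - centers[0]
B = np.maximum(0.0, 1.0 - np.abs(x[:, None] - centers[None, :]) / h)   # 50 x 7

# Placeholder spectra and response generated from a smooth coefficient curve.
X = rng.normal(size=(n_samples, n_wavelengths))
alpha_true = np.sin(np.linspace(0.0, np.pi, n_coef))
y = X @ B @ alpha_true + 0.1 * rng.normal(size=n_samples)

alpha_small, D = psr_fit(X, B, y, lam=0.001)
alpha_large, _ = psr_fit(X, B, y, lam=1000.0)
rough_small = float(np.sum((D @ alpha_small) ** 2))
rough_large = float(np.sum((D @ alpha_large) ** 2))
print(rough_small, rough_large)     # a larger lambda gives smoother coefficients

beta = B @ alpha_small              # regression coefficient for every wavelength
```

The regression still uses every wavelength of the spectrum; only the coefficient curve β is restricted to the spline space and smoothed by the penalty.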

P-Spline Regression: case study

121 samples of seed pills

y is % humidity

PLS: RMSEP = 1.19

PSR: RMSEP = 1.115

# B-spline coefficients = 7

λ = 0.001
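For reference, RMSEP is the root mean squared error of prediction on samples not used for calibration. The sketch below uses made-up numbers for the worked example and then compares the two RMSEP values quoted above; the function name is an illustrative choice.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up reference values and predictions (not data from the case study).
example = rmsep([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 18.0])
print(example)                       # sqrt(6 / 4), about 1.22

# Relative improvement of PSR over PLS for the values quoted on the slide.
improvement = (1.19 - 1.115) / 1.19
print(round(100 * improvement, 1))   # about 6.3 percent
```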

Non-Parametric Functional Data Analysis

By F. Ferraty and P. Vieu

No regression model is involved

Prediction by applying local kernel functions in function space

No good results so far

FDA in Anova setting: FANOVA

ANOVA:

“Study the relation between a response variable and one or more explanatory variables”

$x_{ig}(t) = \mu(t) + \alpha_g(t) + \epsilon_{ig}(t)$

$\mu(t)$ is the overall mean

$\alpha_g(t)$ are the effects of belonging to a group g

$\epsilon_{ig}(t)$ are the residuals

FANOVA: theory

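Constraint: $\sum_g \alpha_g(t) = 0, \; \forall t \in [t_1, t_m]$

Introduce $Z$ and $\beta = [\mu, \alpha_1, \ldots, \alpha_g]^T$ so that $x(t) = Z\beta(t) + \epsilon(t)$

Introduce functional aspect: $x(t) = C\phi(t)$, $\beta(t) = B\phi(t)$

Constraint: introduce $Z^*$, $C^*$, $x^*$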

FANOVA: goal and solution

Goal: estimate $B$ from $C$

Solution: $\hat{B} = (Z^{*T} Z^*)^{-1} Z^{*T} C^*$
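The least-squares solution on the augmented matrices can be checked numerically. The sketch below assumes a balanced one-way layout, a random placeholder coefficient matrix C, and the common device of appending one row to Z and one zero row to C to encode the sum-to-zero constraint on the group effects; none of these specifics come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
G, n_rep, K = 3, 4, 5                 # groups, replicates per group, basis coefficients
groups = np.repeat(np.arange(G), n_rep)

# Placeholder spline coefficients of the spectra, one row per spectrum.
C = rng.normal(size=(G * n_rep, K)) + groups[:, None]

# Design matrix Z: a column of ones for the mean plus one indicator column per group.
Z = np.column_stack([np.ones(G * n_rep), groups[:, None] == np.arange(G)])

# Augmented matrices: the extra row asks the group effects to sum to zero.
Z_star = np.vstack([Z, np.r_[0.0, np.ones(G)]])
C_star = np.vstack([C, np.zeros(K)])

B_hat = np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ C_star)   # (G + 1) x K

print(B_hat[1:].sum(axis=0))          # group effects sum to (numerically) zero
```

Row 0 of `B_hat` holds the coefficients of the mean function and the remaining rows those of the group effects; multiplying by the basis functions gives the estimated functions themselves.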

FANOVA: significance testing

Locally: $MSE(t) = \frac{1}{df(error)} \sum_{i,g} [x_{ig}(t) - Z_{ig}\hat{\beta}(t)]^2$

Globally: $M = \sup_t \, Contrast(t) / (MSE(t) \, C)$
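A small sketch of the pointwise mean squared error, assuming a one-way layout in which the fitted value of each spectrum is its group mean and df(error) is the number of spectra minus the number of groups. The function name and the tiny data set are illustrative only.

```python
def pointwise_mse(groups):
    """MSE at every wavelength; groups is a list of groups of equal-length spectra."""
    n_points = len(groups[0][0])
    n_total = sum(len(spectra) for spectra in groups)
    df_error = n_total - len(groups)
    sse = [0.0] * n_points
    for spectra in groups:
        for k in range(n_points):
            mean_k = sum(s[k] for s in spectra) / len(spectra)   # fitted value
            sse[k] += sum((s[k] - mean_k) ** 2 for s in spectra)
    return [value / df_error for value in sse]

# Two groups, two spectra each, two wavelengths (made-up numbers).
group_a = [[1.0, 2.0], [3.0, 2.0]]
group_b = [[4.0, 5.0], [6.0, 9.0]]
print(pointwise_mse([group_a, group_b]))   # [2.0, 4.0]
```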

FANOVA: case study

Spectra of manure

4 types of animals: dairy, beef, calf, hog

3 ambient temperatures: 4°C, 12°C, 20°C

3 sample temperatures: 4°C, 12°C, 20°C

9 replicates

=> 324 samples

Model: $I_{ijkl}(t) = \mu(t) + T_i(t) + A_j(t) + S_k(t) + \epsilon_{ijkl}(t)$

$i \in [1,4], \; j \in [1,3], \; k \in [1,3], \; l \in [1,9]$
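The experimental layout can be enumerated to confirm the sample count and to show what one row of a main-effects design matrix might look like. The column order and the helper name are assumptions for illustration, and sum-to-zero constraints on each factor would still have to be added separately.

```python
from itertools import product

ANIMALS = ["dairy", "beef", "calf", "hog"]
TEMPERATURES = [4, 12, 20]           # degrees Celsius, ambient and sample
REPLICATES = range(1, 10)

runs = list(product(ANIMALS, TEMPERATURES, TEMPERATURES, REPLICATES))
print(len(runs))                     # 4 * 3 * 3 * 9 = 324

def design_row(animal, ambient_t, sample_t):
    """Intercept plus indicator columns for the three main effects."""
    row = [1.0]
    row += [1.0 if animal == a else 0.0 for a in ANIMALS]
    row += [1.0 if ambient_t == t else 0.0 for t in TEMPERATURES]
    row += [1.0 if sample_t == t else 0.0 for t in TEMPERATURES]
    return row

Z = [design_row(animal, ambient_t, sample_t) for animal, ambient_t, sample_t, _ in runs]
print(len(Z[0]))                     # 1 + 4 + 3 + 3 = 11 columns
```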

FANOVA: case study (cont'd)

FANOVA: case study (cont'd)

Conclusions

Splines are a good basis for fitting spectral data

Using FDA, it is possible to include the vicinity constraint in prediction models in chemometrics

FANOVA is a good tool to explore the variance in spectral data