Near Infrared Spectroscopy and Chemometrics: a Marriage Made in Heaven
Tom Fearn
UCL, [email protected]
Introduction
• Quantitative near infrared spectroscopy needs chemometrics, but both sides of this partnership have benefited from the interaction:
– NIR has provided chemometrics with a success story,
– NIR applications have inspired new chemometric methodology and have supplied a market for the development and sale of chemometric software.
• Over the last 30 years the two methodologies have prospered together.
Near infrared spectra of 40 biscuit doughs
Spectra with the mean subtracted
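A minimal sketch of the mean subtraction behind this plot, assuming the spectra are held as rows of a NumPy array. The data here are simulated stand-ins (40 spectra on an assumed 700-point wavelength grid), not the biscuit dough measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in: 40 spectra on an assumed 700-point wavelength grid.
wavelengths = np.linspace(1100, 2498, 700)
baseline = np.exp(-((wavelengths - 1900) / 400.0) ** 2)
spectra = baseline + 0.01 * rng.standard_normal((40, 700))

# Average over samples at each wavelength, then subtract from every spectrum.
mean_spectrum = spectra.mean(axis=0)
centred = spectra - mean_spectrum  # broadcasting applies it row by row
```

Plotting `centred` against `wavelengths` shows the sample-to-sample variation that is hidden when the raw spectra all look alike.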
Early history (1970s and 80s) - MLR
• Sometime in the 1970s, Karl Norris started using multiple linear regression to make quantitative NIR calibrations.
• This led directly to a generation of filter instruments, calibrated by MLR, with many successful applications in food and agriculture.
• MLR combined with variable (i.e. wavelength) selection was the routine method for scanning instruments.
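One simple way to combine MLR with wavelength selection is greedy forward selection, sketched below. This is an illustrative assumption, not necessarily the procedure used on early instruments; the function name `forward_select` and the simulated data are hypothetical.

```python
import numpy as np

def forward_select(X, y, n_select):
    """Greedy forward selection: at each step add the column of X that
    gives the smallest residual sum of squares in an MLR with intercept."""
    n = X.shape[0]
    chosen = []
    for _ in range(n_select):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ coef) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return chosen

# Simulated example: y depends on columns 7 and 30 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 50))
y = 2.0 * X[:, 7] - 1.5 * X[:, 30] + 0.05 * rng.standard_normal(40)

selected = forward_select(X, y, n_select=2)
```

Each step refits the regression once per candidate wavelength, which gives a feel for why exhaustive or stepwise searches were slow on the computers of the time.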
Five wavelengths for biscuit composition
MLR and variable selection - comments
• The regression can be ill-conditioned, because of high correlations between the predictors.
• In the early days wavelength selection was hampered by slow computers. It went out of fashion when methods like PLS appeared. However:
– It has enjoyed a revival: faster computers, stochastic search, genetic algorithms, etc.
– If you want to build a cheap and fast instrument, it is useful to reduce the measurement to a few wavelengths.
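The ill-conditioning can be seen in the condition number of the predictor matrix. A small sketch with simulated data: five hypothetical wavelengths that share almost all of their variation, compared with five uncorrelated predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Shared variation across samples (a stand-in for, e.g., a scatter effect).
common = rng.standard_normal(n)

# Five highly correlated predictors: the common signal plus a little noise.
X_corr = common[:, None] + 0.01 * rng.standard_normal((n, 5))

# Five uncorrelated predictors for comparison.
X_indep = rng.standard_normal((n, 5))

# Ratio of largest to smallest singular value; large means ill-conditioned.
cond_corr = np.linalg.cond(X_corr)
cond_indep = np.linalg.cond(X_indep)
```

A large condition number means small changes in the data can produce large changes in the fitted MLR coefficients.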
Consolidation (1980s - …) – PLS and PCR
• In the early 1980s the first commercial scanning instruments appeared.
• This prompted a wave of research investigating new applications, especially in food & agriculture. Calibration methods that used the full spectrum soon began to be employed:
– Principal component regression (PCR), Cowe and McNicol in Scotland,
– Partial least squares regression (PLSR), Wold, Martens even further north.
• These have become the standard approaches.
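A minimal sketch of PCR, assuming the usual formulation: centre the spectra, take the leading principal components from an SVD, and regress y on the scores. The function name `pcr_fit` and the simulated rank-3 data are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression; returns intercept b0 and
    a coefficient vector b with one entry per wavelength."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                # loadings, p x k
    T = Xc @ V                             # scores, n x k (orthogonal columns)
    # T'T is diagonal with entries s**2, so the regression is a division.
    gamma = (T.T @ yc) / s[:n_components] ** 2
    b = V @ gamma                          # back to the wavelength scale
    b0 = y_mean - x_mean @ b
    return b0, b

# Simulated spectra with three underlying factors plus a little noise.
rng = np.random.default_rng(2)
scores = rng.standard_normal((40, 3))
loadings = rng.standard_normal((3, 100))
X = scores @ loadings + 0.01 * rng.standard_normal((40, 100))
y = scores @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(40)

b0, b = pcr_fit(X, y, n_components=3)
rmse = np.sqrt(np.mean((y - (b0 + X @ b)) ** 2))
```

In practice the number of components would be chosen by cross-validation or a separate validation set, not fixed in advance as it is here.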
Coefficients for PCR(7) and PLSR(5) for fat
PCR and PLSR - comments
• Although they involve more complex maths, they are actually easier to implement (especially semi-automatically) than MLR with variable selection.
• In this application there are sound reasons for believing in low-dimensional bases for the spectra, so using factor-type methods makes sense.
• There is lots of scope for informative plots and (eventually) there is software that draws them.
• The plots allow (over-?) interpretation of the basis of a calibration.
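To illustrate the point about implementation, here is a sketch of PLS regression for a single response (PLS1) using a NIPALS-style deflation. It is one common formulation and is not taken from any particular software package; `pls1_fit` and the simulated data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by successive deflation; returns intercept b0 and
    a coefficient vector b with one entry per wavelength."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                         # scores for this factor
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        qk = yk @ t / tt                   # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - qk * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # b = W (P'W)^{-1} q
    b0 = y_mean - x_mean @ b
    return b0, b

# Simulated spectra with three underlying factors plus a little noise.
rng = np.random.default_rng(3)
scores = rng.standard_normal((40, 3))
loadings = rng.standard_normal((3, 100))
X = scores @ loadings + 0.01 * rng.standard_normal((40, 100))
y = scores @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(40)

pls_b0, pls_b = pls1_fit(X, y, n_components=3)
pls_rmse = np.sqrt(np.mean((y - (pls_b0 + X @ pls_b)) ** 2))
```

The columns of `W` and `P` inside the function are the weights and loadings that the interpretive plots are usually drawn from.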
Loadings for PCR and PLSR for fat
Getting cleverer (?) – ANN and SVM
• More recent developments have focussed on the use of nonlinear calibration methods borrowed from Computer Science.
• First came artificial neural networks (ANN)
– Very fashionable for a while
– A lot of work using small training sets and inadequate validation caused some disillusion
– A bit more of a black box than PLS
– Still being used in some major applications, with large training sets
. . . ANN and SVM, continued
• Then came support vector machines (SVM)
– Still the height of fashion
– Not being abused so badly (lessons learned?)
• Then came ??
– Look and see what the machine learning community is doing now. NIR will be doing it in a few years.
• Advice: try PLSR or PCR first!
Current challenges
• Large training sets – is it better to use local linear or global nonlinear calibration methods?
• Imaging spectroscopy
– How to deal with a data cube of 80k spectra for one sample?
– How to calibrate when you don’t know what each pixel is?
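One common first step with a data cube, sketched here under assumptions, is to unfold it so that each pixel spectrum becomes a row, apply an existing linear calibration to every row, and fold the predictions back into an image. The cube size and the coefficients `b0`, `b` are hypothetical, and the sketch does not address the harder question of how to obtain a calibration in the first place.

```python
import numpy as np

rng = np.random.default_rng(4)

# Small hypothetical cube: rows x cols pixels, n_wl wavelengths per pixel.
rows, cols, n_wl = 20, 25, 50
cube = rng.random((rows, cols, n_wl))

# Hypothetical linear calibration (intercept and one coefficient per wavelength).
b0 = 0.5
b = rng.standard_normal(n_wl)

# Unfold: one row per pixel spectrum.
pixels = cube.reshape(rows * cols, n_wl)

# Predict for every pixel, then fold back into an image.
pred = b0 + pixels @ b
pred_map = pred.reshape(rows, cols)
```

The same unfolding lets PCA, PCR or PLSR be run on all pixels at once, at the cost of ignoring the spatial arrangement.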
Thanks for listening.
Questions?