Interactive Series Baseline Correction Algorithm
Andrey Bogomolov (a), Willem Windig (b), Susan M. Geer (c), Debra B. Blondell (c), and Mark J. Robbins (c)
(a) ACD/Labs, Russian Chemometrics Society, Moscow, Russia; (b) Eigenvector Research Inc., Rochester, NY, USA; (c) Eastman Kodak Company, Rochester, NY, USA
Baseline (Background) Problem
Baseline is an “eternal” issue in analytical data processing.
“Baseline” or “background”? There is no clear distinction: “baseline” is associated with a smooth line reflecting a “physical” interference, whereas “background” tends to be used in a more general sense to designate ANY unwanted signal, including noise and chemical components.
We prefer the term “baseline” because smoothness of the background signal is the main assumption of the proposed correction algorithm.
Classical Approach to the Baseline Correction Problem
Classical baseline correction algorithms for single curves are almost exhaustively elaborated in the literature.
A baseline to be subtracted is fitted by a linear (polynomial) function to the nodes that belong to signal-free regions.
The nodes can be detected automatically by the software or placed manually by the user.
These methods are well suited to semi-automatic processing, in which software-generated results are revised by a human expert.
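As a minimal illustration of this classical single-curve approach, the Python/NumPy sketch below fits a polynomial through user-supplied nodes in signal-free regions and subtracts it. The function name `polynomial_baseline`, the node indices, and the synthetic peak are hypothetical choices for illustration, not part of the original work.

```python
import numpy as np


def polynomial_baseline(x, y, node_idx, degree=1):
    """Fit a polynomial baseline through signal-free nodes and subtract it.

    x, y     : 1-D arrays holding one curve (e.g., one spectrum)
    node_idx : indices of points assumed to lie in signal-free regions
    degree   : polynomial degree (1 gives a straight-line baseline)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    node_idx = np.asarray(node_idx)
    # Least-squares polynomial fit that uses only the baseline nodes
    coeffs = np.polyfit(x[node_idx], y[node_idx], degree)
    baseline = np.polyval(coeffs, x)
    return y - baseline, baseline


# Synthetic example: one Gaussian peak on a sloping baseline
x = np.linspace(0.0, 100.0, 501)
peak = np.exp(-0.5 * ((x - 50.0) / 4.0) ** 2)
y = peak + 0.02 * x + 1.0
# Nodes placed "by hand" in the peak-free regions at both ends
nodes = np.r_[0:50, 450:501]
corrected, baseline = polynomial_baseline(x, y, nodes, degree=1)
```

Here the nodes carry practically no peak signal, so the fitted straight line reproduces the sloping baseline and `corrected` recovers the peak.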
Serial (Batch) Methods
The development of two-dimensional spectroscopy and hyphenated techniques demanded new methods applicable to data matrices.
Early works in this direction applied automated baseline correction algorithms to every individual curve in a matrix dataset.
The main problem with this approach is that it neglects internal (inter-spectral) correlations.
Instead of the expected rank reduction, it may introduce additional variance into the dataset.
It is a “black-box” routine that is difficult to control.
Multivariate Background Correction
Multivariate data analysis has had a revolutionary impact on the baseline problem in general.
The paradigm shift from hard (knowledge-driven) to soft or self-modeling (data-driven) approaches has opened new horizons and introduced new concepts.
PLS (partial least squares) regression provides the means to address the background in a calibration context without subtracting it.
OSC (orthogonal signal correction) by S. Wold turns the problem inside out, eliminating from the data (X) the variance that is irrelevant for calibration (orthogonal to Y).
A number of other excellent algorithms…
Our Objectives
Researchers typically concentrate on developing fully automated background correction methods.
Statement: the fuzzy character of the baseline problem in general casts doubt on the feasibility of automated (expert-free) baseline correction routines.
In contrast, we present an alternative approach that aims to maximize the means of control available to a human operator through simplicity, visualization, and an interactive stepwise algorithm.
The Method
The method is applied to a series of curves (e.g., spectra or chromatograms) and consists of two distinct steps.
First, a prototype baseline is constructed from linear segments by selecting a set of nodes. To aid in node selection, the mean values are calculated to represent the entire series:
$$\bar{d}_j = \frac{1}{c}\sum_{i=1}^{c} d_{ij}$$
where $d_{ij}$ is the value of the $i$-th curve at the $j$-th point and $c$ is the number of curves in the series.
Second, the prototype baseline is used to construct the individual baselines to be subtracted from the series curves, by adjusting the nodes vertically to each curve being corrected.
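The two steps can be sketched in Python/NumPy as follows. This is an illustrative reading of the description above, not the authors' implementation: the name `series_baseline_correction`, the optional `half_width` averaging of node heights, and the use of `np.interp` for the linear segments are assumptions. The x values must be increasing, and `np.interp` holds the end values constant outside the outermost nodes, so nodes are best placed at both ends of the range.

```python
import numpy as np


def series_baseline_correction(x, D, node_idx, half_width=0):
    """Two-step series baseline correction (illustrative sketch).

    x          : 1-D increasing array of abscissa values
    D          : 2-D array, one curve per row (c curves x n points)
    node_idx   : indices of nodes chosen on the mean curve (step 1)
    half_width : hypothetical parameter; node heights are averaged over
                 +/- half_width neighbouring points (0 = exact value)
    """
    x = np.asarray(x, dtype=float)
    D = np.asarray(D, dtype=float)
    node_idx = np.sort(np.asarray(node_idx))
    n = D.shape[1]

    # Step 1: mean curve d_bar_j = (1/c) * sum_i d_ij, used for node placement
    mean_curve = D.mean(axis=0)
    # Prototype baseline: linear segments through the nodes on the mean curve
    prototype = np.interp(x, x[node_idx], mean_curve[node_idx])

    # Step 2: keep node positions, move node heights vertically to each curve
    corrected = np.empty_like(D)
    baselines = np.empty_like(D)
    for i, curve in enumerate(D):
        heights = np.array([
            curve[max(j - half_width, 0):min(j + half_width + 1, n)].mean()
            for j in node_idx
        ])
        baselines[i] = np.interp(x, x[node_idx], heights)
        corrected[i] = curve - baselines[i]
    return corrected, baselines, mean_curve, prototype


# Synthetic series: one peak, different amounts and different linear drifts
x = np.linspace(0.0, 10.0, 201)
peak = np.exp(-0.5 * ((x - 5.0) / 0.5) ** 2)
amounts = np.array([0.5, 1.0, 2.0])
drift = np.array([0.1, 0.3, -0.2])
D = amounts[:, None] * peak + drift[:, None] * x + 1.0
nodes = [0, 40, 160, 200]  # peak-free points picked on the mean curve
corrected, baselines, mean_curve, prototype = series_baseline_correction(x, D, nodes)
```

In this sketch the node positions are shared by all curves and only their heights change from curve to curve, so a single node set chosen on the mean curve serves the whole series.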
HPLC/DAD: Sample Data
[Figure panels: Calculating the mean; Selecting nodes; Subtracting the baseline; Raw; Corrected; 2nd Derivative for Node Selection]
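In the HPLC/DAD example, the second derivative of the mean curve serves as an aid for node selection. One plausible way to turn that aid into suggested nodes is sketched below: points where the second derivative stays close to zero over a small window are flagged as candidates, which the operator can then accept or reject. The function `candidate_nodes` and its parameters `frac` and `w` are hypothetical; on noisy data the mean curve would normally need smoothing first (for example with a Savitzky-Golay filter).

```python
import numpy as np


def candidate_nodes(x, mean_curve, frac=0.01, w=2):
    """Suggest baseline-node candidates from the 2nd derivative of the mean curve.

    A point is flagged when |d2| stays below frac * max|d2| over the whole
    window of +/- w points, i.e. the curve is locally close to a straight line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(mean_curve, dtype=float)
    # Numerical second derivative with respect to the actual x coordinates
    d2 = np.gradient(np.gradient(y, x), x)
    flat = np.abs(d2) <= frac * np.max(np.abs(d2))
    # Require the whole +/- w window to be flat; edge padding keeps endpoints eligible
    padded = np.pad(flat.astype(int), w, mode="edge")
    counts = np.convolve(padded, np.ones(2 * w + 1, dtype=int), mode="valid")
    return np.flatnonzero(counts == 2 * w + 1)


# Synthetic mean curve: one peak on a sloping baseline
x = np.linspace(0.0, 10.0, 201)
mean_curve = np.exp(-0.5 * ((x - 5.0) / 0.5) ** 2) + 0.05 * x + 0.2
nodes = candidate_nodes(x, mean_curve)
```

Checking single points alone would also flag the inflection points of each peak, where the second derivative crosses zero; requiring a short run of flat points excludes them.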
Baseline Correction for Curve Resolution
Baseline correction is an application-specific preprocessing technique.
The present baseline correction algorithm has been developed to improve the performance of the SIMPLISMA (SIMPLe-to-use Interactive Self-modeling Mixture Analysis) curve resolution technique.
The algorithm has been used at Eastman Kodak Company for over 10 years for routine analysis of TGA/IR data, which represent a challenging case for curve resolution: many components, a high degree of overlap, and an intense background signal.
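To illustrate why the baseline matters for SIMPLISMA, the sketch below computes the purity spectrum SIMPLISMA uses to select the first pure variable, σ_j / (μ_j + α), where μ_j and σ_j are the mean and standard deviation of variable j over all spectra and α is an offset expressed as a percentage of the largest μ_j. The 3 % offset, the function name `simplisma_purity`, and the synthetic two-component data are illustrative assumptions. A constant baseline leaves σ_j unchanged but increases μ_j, so every purity value drops; a baseline that drifts from spectrum to spectrum changes σ_j as well.

```python
import numpy as np


def simplisma_purity(D, offset=3.0):
    """Purity spectrum for choosing the first SIMPLISMA pure variable.

    D      : 2-D array, one mixture spectrum per row
    offset : offset in percent of the largest variable mean (assumed value)
    """
    D = np.asarray(D, dtype=float)
    mu = D.mean(axis=0)                    # mean of each variable over the spectra
    sigma = D.std(axis=0)                  # standard deviation of each variable
    alpha = offset / 100.0 * mu.max()      # offset that damps low-intensity variables
    return sigma / (mu + alpha)


# Two components with separated bands and opposite concentration trends
x = np.linspace(0.0, 10.0, 201)
s1 = np.exp(-0.5 * ((x - 3.0) / 0.4) ** 2)
s2 = np.exp(-0.5 * ((x - 7.0) / 0.4) ** 2)
c1 = np.linspace(1.0, 0.1, 12)
c2 = np.linspace(0.1, 1.0, 12)
D = np.outer(c1, s1) + np.outer(c2, s2)

p_clean = simplisma_purity(D)
p_offset = simplisma_purity(D + 0.5)  # same data on a constant baseline
```

On the clean data the purity maximum falls at one of the band centres; with the added baseline the whole purity spectrum is lower.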
TGA/IR Sample Data
Reprinted with permission from Eastman Kodak Company, 2005
Baseline Nature in TGA/IR
The most common reasons for TGA/IR baseline drift are temperature fluctuations over time, instrument drift, material scattering, impurities, inappropriate background, etc.
In the present dataset, the drift has miscellaneous causes.
The spectral domain is more suitable for series baseline correction because of its narrow peaks and explicit baseline regions.
Raw spectral series
TGA/IR: Baseline Correction
[Figure panels: Calculating the mean; “Snapping” the baseline; Subtracting; Raw; Corrected]
Reprinted with permission from Eastman Kodak Company, 2005
TGA/IR: Corrected Data Map
Reprinted with permission from Eastman Kodak Company, 2005
TGA/IR: SIMPLISMA Curve Resolution
Reprinted with permission from Eastman Kodak Company, 2005
IR Library Identification
Reprinted with permission from Eastman Kodak Company, 2005
Conclusions
A new interactive approach to the baseline correction problem has been proposed.
It allows traditional automated single-scan baseline correction routines to be adapted to matrix data, or manual correction to be performed on matrix data, as if the data were a single curve.
Advantages of the method include “transparency” of the process and extensive means for operator interaction.
The method has passed long-term testing in an industrial laboratory and has been integrated into a professional software package.
Despite its simplicity, the algorithm successfully eliminates baselines, even in complex cases such as TGA/IR data.
Acknowledgements
We thank Antony Williams for his friendly support and Michel Hachey for his help and valuable ideas.