
A Primer on Multivariate Calibration

Edward V. Thomas
Sandia National Laboratories

Calibration methods allow one to relate instrumental measurements to analytes of interest in industrial, environmental, and biological materials.

For centuries the practice of calibration has been widespread throughout science and engineering. The modern application of calibration procedures is very diverse. For example, calibration methods have been used in conjunction with ultrasonic measurements to predict the gestational age of human fetuses (1), and calibrated bicycle wheels have been used to measure marathon courses (2).

Within analytical chemistry and related areas, the field of calibration has evolved into a discipline of its own. In analytical chemistry, calibration is the procedure that relates instrumental measurements to an analyte of interest. In this context, calibration is one of the key steps associated with the analyses of many industrial, environmental, and biological materials. Increased capabilities resulting from advances in instrumentation and computing have stimulated the development of numerous calibration methods. These new methods have helped to broaden the use of analytical techniques (especially those that are spectroscopic in nature) for increasingly difficult problems.

In the simplest situations, models such as y = a + x·b have been used to express the relationship between a single measurement (y) from an instrument (e.g., absorbance of a dilute solution at a single wavelength) and the level (x) of the analyte of interest. Typically, instrumental measurements are obtained from specimens in which the amount (or level) of the analyte has been determined by some independent and inherently accurate assay (e.g., wet chemistry). Together, the instrumental measurements and results from the independent assays are used to construct a model (e.g., estimate a and b) that relates the analyte level to the instrumental measurements. This model is then used to predict the analyte levels associated with future samples based solely on the instrumental measurements.

In the past, data acquisition and analysis were often time-consuming, tedious activities in analytical laboratories. The advent of high-speed digital computers has greatly increased data acquisition and analysis capabilities and has provided the analytical chemist with opportunities to use many measurements (perhaps hundreds) for calibrating an instrument (e.g., absorbances at multiple wavelengths). To take advantage of this technology, however, new methods (i.e., multivariate calibration methods) were needed for analyzing and modeling the experimental data. The purpose of this Report is to introduce several evolving multivariate calibration methods and to present some important issues regarding their use.

Univariate calibration

To understand the evolution of multivariate calibration methods, it is useful to review univariate calibration methods and their limitations. In general, these methods involve the use of a single measurement from an instrument such as a spectrometer for the determination of an analyte. This indirect measurement can have significant advantages over gravimetric or other direct measurements. Foremost among these advantages is the reduction in sample preparation (e.g., chemical separation) that is often required with the use of direct methods. Thus, indirect methods, which can be rapid and inexpensive, have replaced a number of direct methods.

The role of calibration in these analyses is embodied in a two-step procedure: calibration and prediction. In the calibration step, indirect instrumental measurements are obtained from specimens in which the amount of the analyte of interest has been determined by an inherently accurate independent assay. The set of instrumental measurements and results from the independent assays, collectively referred to as the calibration set or training set, is used to construct a model that relates the amount of analyte to the instrumental measurements.

For example, in determining Sb concentration by atomic absorption spectroscopy (AAS), the absorbances of a number of solutions (with known concentrations of Sb) are measured at a strong absorbing line of elemental Sb (e.g., 217.6 nm). A model relating absorbance and Sb concentration is generated. In this case, model development is straightforward because Beer's law can be applied. In other situations, the model may be more complex and lack a straightforward theoretical basis. In general, this step is the most time-consuming and expensive part of the overall calibration procedure because it involves the preparation of reference samples and modeling.

Next, the indirect instrumental measurement of a new specimen (in combination with the model developed in the calibration step) is used to predict its associated analyte level. This prediction step is illustrated in Figure 1, which shows Sb determination by AAS. Usually, this step is repeated many times with new specimens using the model developed in the calibration step.

Even in the simplest case of univariate calibration, when there is a linear relationship between the analyte level (x) and the instrumental measurement (y), modeling can be done in different ways. In one approach, often referred to as the classical method, the implied statistical model is

y_i = b_1 · x_i + e_i   (1)

where x_i and y_i are the analyte level and instrument measurement associated with the ith of n specimens in the calibration set. The measurement error associated with y_i is represented by e_i. To simplify this discussion, an intercept is not included in Equation 1. In the calibration step, the model parameter b_1 is usually estimated by least-squares regression of the instrument measurements on the reference values associated with the specimens composing the calibration set. The estimate of b_1 can be expressed as b̂_1 = (x^T x)^{-1} x^T y, where x = (x_1, x_2, ..., x_n)^T and y = (y_1, y_2, ..., y_n)^T. In this article, the "hat" symbol over a quantity is used to denote an estimate (or prediction) of that quantity. The predicted analyte level associated with a new specimen is x̂ = y*/b̂_1, where y* is the observed measurement associated with the new specimen.

Figure 1. Prediction of the Sb concentration of a new specimen. The calibration model (solid line, derived from the calibration set [dots]) relating the absorbance at 217.6 nm to the Sb concentration and the absorbance of the new specimen are used for prediction.

In another approach, often referred to as the inverse method, the implied statisti­cal model is

x_i = b_2 · y_i + e_i   (2)

where e_i is assumed to be the measurement error associated with the reference value x_i. In the calibration step, the model parameter b_2 is estimated by least-squares regression of the reference values on the instrument measurements (i.e., b̂_2 = (y^T y)^{-1} y^T x). In the prediction step, x̂ = b̂_2 · y*. In general, predictions obtained by the classical and inverse methods will be different. However, in many cases, these differences will not be important. In the literature, there has been an ongoing debate about which method is preferred (3, 4). When calibrating with a single measurement, the inverse method may be preferred if the instrumental measurements are precise (e.g., as in near-IR spectroscopy).
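In modern terms, a minimal numerical sketch of the two approaches follows; the analyte levels, slope, and noise level are hypothetical values chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 5.0, 20)              # reference analyte levels (ppm, simulated)
    y = 0.012 * x + rng.normal(0.0, 5e-4, 20)  # simulated absorbances with measurement noise

    b1 = (x @ y) / (x @ x)   # classical: least-squares slope of y on x, through the origin
    b2 = (y @ x) / (y @ y)   # inverse: least-squares slope of x on y, through the origin

    y_new = 0.030            # measurement on a new specimen
    print("classical prediction:", y_new / b1)  # x-hat = y*/b1-hat
    print("inverse prediction:  ", b2 * y_new)  # x-hat = b2-hat * y*

With precise measurements, the two predictions nearly coincide, consistent with the point that the differences are often unimportant.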

The point of this discussion isn't to recommend one method over another; it is to show that, even in this relatively simple situation, different approaches exist. However, the breadth of the applicability of these univariate methods is limited.

For example, let us reconsider the determination of Sb concentration by AAS. Suppose the specimens to be analyzed contain Pb. It is well known that Pb has a strongly absorbing spectral line at 217.0 nm, which is quite close to the primary Sb line at 217.6 nm (5). There are important ramifications of this fact. If an analyst fails to recognize the presence of Pb, the application of univariate calibration using the 217.6-nm line can result in inaccurate predictions for Sb because of the additional absorbance attributable to Pb (see Figure 2). If Pb is recognized as a possible interference, the usual approach is to move to a less intense spectral line for Sb (e.g., 231.2 nm); however, one can expect a poorer detection limit and degraded analytical precision.

Figure 2. Effect of the presence of Pb on predicting Sb concentration.

The preceding example exposes a fundamental weakness of univariate methods: In the absence of an effective method for separating the analyte from the interferences, there is a need for measurements that are highly selective for the analyte of interest. Without a selective measurement, univariate methods may produce unreliable predictions. Furthermore, on the basis of a single measurement it is impossible to detect the presence of unknown interferences or to know when predictions are unreliable.

To apply univariate methods successfully, an analyst will often need a great deal of specific knowledge about the chemical system to be analyzed. Furthermore, unless the system is relatively simple (or unless the analyst has substantial knowledge of the subject matter), a selective measurement with an appropriate level of sensitivity will usually be hard to find.

On the positive side, univariate calibration can often be used with reasonable success in applications where selective measurements can be found (as in AAS) or when the analyte can be effectively separated from interferences. In such cases, the simplicity of a univariate method offers a significant advantage. Even in these ideal settings, however, care is needed to maintain the reliability of predictions based on univariate calibration.

Given the data-rich environment in modern laboratories, numerous selective measurements might be available for analysis. To use univariate methods, it is necessary to specify a single measurement or condense the multiple measurements to a single entity, such as peak area. For example, suppose IR absorption spectroscopy is chosen to determine trace levels of water vapor by using the spectral region displayed in Figure 3. Furthermore, suppose that the water vapor is the only absorbing species in the optical path in this spectral region. A common univariate approach would be to select the single wavelength that exhibits the strongest signal for the analyte, in this case at about 2595 nm. This procedure might form the basis for a usable prediction model. However, as we shall see in the next section, the use of measurements from many wavelengths (in conjunction with multivariate calibration methods) can provide more precise predictions.

In the event that selective measurements are not available (which is frequently the case), univariate methods will not be reliable. For example, in the analysis of multicomponent biological materials by near-IR spectroscopy, the spectral responses of the components frequently overlap, and selective measurements for the analyte of interest are unavailable. Modern methods need to be reliable, rapid, and precise, even for difficult applications such as quality control, process monitoring, environmental monitoring, and medical diagnosis, which are important in the chemical, pharmaceutical, oil, microelectronics, and medical industries.

The nature of these applications, which frequently requires in situ, noninvasive, or nondestructive analyses, precludes the use of tedious sample preparation to obtain highly selective measurements or separation. The result is that the materials to be analyzed by analytical instruments are often quite complex and involve a very large number of chemical components, some of which may be unknown. The combination of complex materials and the need for rapid, reliable, accurate, and precise determinations has motivated researchers to develop and use multivariate calibration methods.

Multivariate calibration

In the agricultural and food industries, multivariate calibration methods are used with spectral data to determine protein in wheat (6), water in meat (7), and fat in fish (7). In the manufacturing industries, multivariate calibration methods are often used in process monitoring applications, including the fabrication of semiconductor devices (8). In medical applications, significant developments are being made in producing reagentless and noninvasive instruments for analyzing blood components (9-11). Additional examples have been reviewed by Brown et al. (12).

As is true with univariate calibration, multivariate calibration consists of the calibration step and the prediction step. In the calibration step, multiple instrumental measurements are obtained from numerous specimens. These measurements could be the absorbances of each specimen at each of a number of wavelengths. As with univariate calibration, the level of the analyte in each specimen is determined by independent assay.

By using multiple measurements it is sometimes possible to determine multiple components of interest simultaneously. Often, however, there is only a single analyte of interest. The multivariate instrumental measurements and results from the independent assays form the calibration set and are used to model the level of the analyte.

In the prediction step, the model and the multivariate instrumental measurements of new specimens are used to predict the analyte levels associated with the new specimens. Often, predicted analyte values for each new specimen (x̂) are obtained by evaluating a particular linear combination of the available instrumental measurements (y_1, y_2, ..., y_q); that is,

x̂ = a_0 + a_1·y_1 + a_2·y_2 + ... + a_q·y_q   (3)

Individual calibration methods differ in the values of the coefficients (a_0, a_1, ..., a_q) used to form x̂. More fundamentally, the method used (in the calibration step) to obtain the coefficients in Equation 3 distinguishes the different calibration methods. In some methods (e.g., multiple linear regression, MLR), the number of instrumental measurements (q) that can be used on the right-hand side of Equation 3 is constrained to be no greater than the number of specimens in the calibration set (n). In other methods (e.g., principal components regression, PCR, and partial least squares, PLS), the number of instrumental measurements is unrestricted.

The evolution of multivariate calibration methods has been motivated by the continuing desire to solve increasingly difficult problems. Many advances have occurred in conjunction with the use of spectral methods. Therefore, although the methods can be applied to areas outside spectroscopy, it is convenient to describe them in this context.

Methods such as PCR and PLS, in which an unlimited number of measurements may be used, are often referred to as full-spectrum methods. In the rest of this section, some of the more common calibration methods (all producing predictions from linear combinations of instrumental measurements, as in Equation 3) will be described and illustrated by spectral data analysis. Strengths and weaknesses of each method will be discussed, and the breadth of application for each will be emphasized.

Classical least-squares method. CLS is based on an explicit causal (or hard) model that relates the instrumental measurements to the level of the analyte of interest (and often the levels of interfering components) via a well-understood mechanism (e.g., Beer's law). In the simplest case, CLS uses a linear model (an extension of Equation 1) that relates a single analyte to q instrumental measurements that are selective for that analyte. For example, for the ith specimen,

y_ij = b_j · x_i + e_ij,  for j = 1, 2, ..., q   (4)

where e_ij is the measurement error associated with the jth measurement on the ith specimen, y_ij. As in Equation 1, note that an intercept term is not included in this model for the sake of simplicity. The appropriate method for estimating the model parameters, b = (b_1, b_2, ..., b_q)^T, in the calibration step depends somewhat on the nature of the measurement errors. In the prediction step, a linear combination of the measurements from a new specimen in conjunction with the estimated model parameters, b̂ = (b̂_1, b̂_2, ..., b̂_q)^T, is used to predict the analyte level. If the errors across measurements are independent, it is appropriate to express the predicted analyte level as

x̂ = a_1·y_1 + a_2·y_2 + ... + a_q·y_q   (5)

where a_j = b̂_j / (b̂^T b̂). In essence, the predicted value provided by Equation 5 is the least-squares estimate of the slope of the relationship, through the origin, among the various (b̂_j, y_j) pairs. That is, the predicted value is obtained by regressing the y_j's on the b̂_j's.

Figure 3. Absorbance spectra of (a) water vapor at 1 ppm concentration (estimated) and (b) a new specimen.

To illustrate this graphically, let's revisit the example associated with Figure 3a. The spectrum displayed in Figure 3a represents the estimated absorbance spectrum of water in the gas phase at 1 ppm concentration (b̂_1, b̂_2, ..., b̂_q) that was obtained from the calibration step. Suppose we wish to predict the water concentration of a new specimen that exhibits the absorbance spectrum (y_1, y_2, ..., y_q) illustrated in Figure 3b. The prediction of the water concentration of this new specimen can be visualized by plotting the (b̂_j, y_j) pairs (Figure 4). The predicted value of the water concentration of this new specimen is x̂ = 0.457 ppm, or the estimated slope of the (b̂_j, y_j) relationship.
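A minimal sketch of this prediction step, assuming the estimated pure-component spectrum b̂ (at 1 ppm) and the new spectrum are available as arrays; the Gaussian band and noise level below are simulated stand-ins for the spectra of Figure 3.

    import numpy as np

    rng = np.random.default_rng(1)
    q = 200
    b_hat = np.exp(-0.5 * ((np.arange(q) - 100) / 15.0) ** 2)  # estimated spectrum at 1 ppm (simulated)
    y_new = 0.457 * b_hat + rng.normal(0.0, 0.01, q)           # spectrum of the new specimen (simulated)

    # Equation 5: the prediction is the slope, through the origin, of y on b-hat
    x_hat = (b_hat @ y_new) / (b_hat @ b_hat)
    print("predicted water concentration (ppm):", x_hat)       # close to 0.457

Because b̂ corresponds to a concentration of 1 ppm, the fitted slope is directly in ppm.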

The reliability and precision of this prediction can be assessed by examining the nature and magnitude of the scatter in the relationship among the (b̂_j, y_j) pairs. In this case, the strong linear relationship among the (b̂_j, y_j) pairs and the constant level of scatter throughout the range of the b̂_j's indicate a reliable prediction. If many of the (b̂_j, y_j) pairs had deviated significantly from the typical relationship among the pairs, one would suspect that an unaccounted interfering species or some other unexpected phenomenon influenced the spectrum of the new specimen. Hence, the selectivity of the measurements and the reliability of the prediction would be questioned.

In the case of univariate calibration, there would be only one (b̂_j, y_j) pair and hence no ability to discover unusual behavior. Thus, one important advantage of using multivariate versus univariate methods is the ability to assess the reliability of predictions and identify outliers.

For our purposes, an outlier is a specimen (a member of the calibration set or a new specimen) that exhibits some form of discordance with the bulk of the specimens in the calibration set. In the calibration set, an outlier specimen could result from an unusually large error in a reference determination. During prediction, a new specimen would be considered an outlier if it contained a chemical component (which affects the instrumental measurements) that was not present in the specimens composing the calibration set.

The consequences of failing to detect an outlier differ, depending on whether the outlier is in the calibration or the prediction set. When outliers are present in the calibration set, the result will likely be a poor model. The performance of such models during prediction will often be adversely affected. When a prediction set specimen is an outlier, the predicted analyte value for that specimen may differ significantly from the true unknown analyte value. Thus, it is very important to identify outliers.

For many multivariate methods, a number of diagnostics are available (13). Furthermore, the use of multivariate, rather than univariate, methods offers the potential for significantly improving the precision of the predicted analyte values. The example depicted in Figure 3a demonstrates this potential. If, for example, the whole spectral region in Figure 3a is used to model the water concentration, the precision of the predictions generated by using Equations 4 and 5 can be expected to be about four times better than if measurements at a single wavelength (2595 nm) are used with the classical univariate method. That is, the standard deviation of repeated determinations will be about four times smaller for the multivariate method. However, this gain in efficiency will be realized in practice only if the precision of the reference method is sufficiently good.

Figure 4. Absorbance of the new specimen (y axis) versus the estimated absorbance of water vapor at 1 ppm concentration by wavelength (x axis). Each point represents a (b̂_j, y_j) pair.

The CLS method can be further generalized to include multiple components, among them other analytes or interferences (14). In this case the underlying model given in Equation 4 is expanded to account for the effects of other analytes or interferences. The primary advantage of this more general approach is that it does not require selective measurements. However, the successful use of this or any other method based on an explicit model depends on complete knowledge of the chemical system to be analyzed. For example, all interferences must be known; furthermore, with regard to each specimen in the calibration set, the levels of all analytes and interferences must be known.

These requirements greatly restrict the applicability of any method (such as CLS) that is based on an explicit causal model, especially for the analysis of complex materials. However, if an explicit causal model can be developed, a method such as CLS can be quite valuable; it may provide a reasonable basis for extrapolation and understanding of the uncertainty in the predicted values of the analyte (15).
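A sketch of the generalized approach in matrix form, assuming a Beer's-law model Y ≈ C·K, where Y holds the calibration spectra, C the known levels of all components, and K their pure-component spectra; all data here are simulated.

    import numpy as np

    rng = np.random.default_rng(2)
    n, q = 15, 120
    grid = np.arange(q)
    K_true = np.vstack([np.exp(-0.5 * ((grid - c) / 12.0) ** 2) for c in (40, 70)])
    C = rng.uniform(0.5, 2.0, (n, 2))                 # known levels: analyte plus one interference
    Y = C @ K_true + rng.normal(0.0, 0.005, (n, q))   # calibration spectra

    # Calibration step: estimate the pure-component spectra K from C and Y
    K_hat, *_ = np.linalg.lstsq(C, Y, rcond=None)

    # Prediction step: fit the new spectrum as a combination of the estimated spectra
    y_new = np.array([1.3, 0.8]) @ K_true + rng.normal(0.0, 0.005, q)
    c_pred, *_ = np.linalg.lstsq(K_hat.T, y_new, rcond=None)
    print("predicted component levels:", c_pred)      # close to [1.3, 0.8]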

Multiple linear regression. In many applications, analysts lack the knowledge required to use an explicit causal model and instead use methods involving empirically determined (or soft) models that are driven by correlation rather than causation. Empirical models often relate the characteristic of interest to simple functions (e.g., low-order polynomials) of the instrumental measurements. In calibration problems in analytical chemistry, particularly spectroscopy, the simple functions often consist of linear combinations of instrumental measurements. For example, calibration methods relating the concentration of an analyte to a linear combination of measurements (e.g., absorbance) from several wavelengths were introduced to develop models for a broader range of conditions. Specifically, such methods were designed to be used when the spectral features of the analyte overlap with features of other, perhaps unknown components in the material to be analyzed. These methods, which are based on a generalization of the univariate inverse model (see Equation 2), are of the form

x_i = b_0 + b_1·y_i1 + b_2·y_i2 + ... + b_q·y_iq + e_i   (6)

where y_ij is the jth measurement associated with the ith specimen.

Various methods exist for estimating the model parameters (b_0, b_1, ..., b_q) using data from the calibration set. One method uses MLR (16). The resulting parameter estimates (b̂_0, b̂_1, ..., b̂_q) are used to predict the analyte levels of new specimens using Equation 3, with a_j = b̂_j. Unlike the generalized CLS method, MLR and other soft modeling methods do not explicitly require knowledge of the levels of interferences and other analytes for specimens in the calibration set. A judicious choice of instrumental measurements can compensate for interferences.
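A minimal sketch of the inverse model in Equation 6 with q = 2 simulated wavelengths, where the second wavelength responds only to an interferent whose levels are unknown to the analyst; all coefficients are hypothetical.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 30
    x = rng.uniform(0.0, 10.0, n)   # analyte levels (reference values)
    z = rng.uniform(0.0, 5.0, n)    # levels of an interferent, unknown to the analyst
    Y = np.column_stack([
        0.9 * x + 0.7 * z + rng.normal(0.0, 0.05, n),  # wavelength 1: analyte + interferent
        0.8 * z + rng.normal(0.0, 0.05, n),            # wavelength 2: interferent only
    ])

    # MLR: regress x on the measurements, with an intercept (Equation 6)
    A = np.column_stack([np.ones(n), Y])
    b_hat, *_ = np.linalg.lstsq(A, x, rcond=None)

    y_new = np.array([0.9 * 4.0 + 0.7 * 2.5, 0.8 * 2.5])  # noiseless specimen with x = 4.0
    print("predicted analyte level:", b_hat[0] + y_new @ b_hat[1:])

The regression learns to subtract the interferent's contribution at the first wavelength by using the second, the same logic as the oximetry example that follows.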

For example, transmission (or reflectance) spectroscopy involving two wavelengths in the vis/near-IR region is used to noninvasively monitor the level of oxygen saturation of arterial blood in a surgical or critical care environment (17). One wavelength (in the red portion of the spectrum) is sensitive to the pulsatile blood volume and oxygen saturation of the blood. A second wavelength (in the near-IR region), which is insensitive to the level of oxygen saturation, provides a measure of the pulsatile blood volume. From the total signal associated with the first wavelength, the contribution of the interfering phenomenon (blood volume) can be effectively removed by using information obtained from the second wavelength. The resulting signal can provide a useful measure of oxygen saturation.

The number of measurements (q) that can be used with MLR is often severely restricted; it is usually in the range of 2-10, depending on the complexity of the materials being analyzed. The strong correlation among the instrumental measurements can introduce instability among the resulting model parameter estimates. Because MLR cannot use a large number of correlated measurements, selecting the appropriate set of instrumental measurements is important. If available, as in the case of the noninvasive oximeter, specific knowledge of the way in which the analyte and interfering components in the sample material affect the instrumental measurements can be used to select instrumental measurements. In the absence of specific information, it is necessary to use empirical selection methods.

PLS, PCR, and related methods. Recently, soft-model-based methods (including PLS and PCR), in which a very large number of instrumental measurements are used simultaneously, have been successfully applied to analytical chemistry, particularly spectroscopy. Stone and Brooks (18) showed that PLS and PCR belong to a general class of statistical methods referred to collectively as continuum regression. In general, the assumed model for all such methods is of the form

x_i = b_0 + b_1·t_i1 + b_2·t_i2 + ... + b_h·t_ih + e_i   (7)

where t_ik is the kth score associated with the ith sample. Each score consists of a linear combination of the original measurements; that is,

t_ik = γ_k1·y_i1 + γ_k2·y_i2 + ... + γ_kq·y_iq   (8)

Data from the calibration set are used to obtain the set of coefficients {γ_kj}; the estimated model parameters (b̂_0, b̂_1, ..., b̂_h); and the model size, which is given by the meta-parameter h and is usually considerably smaller than the number of measurements, q. In general, the coefficients {γ_kj} are obtained in a manner such that the vectors (t_1k, t_2k, ..., t_nk) and (t_1m, t_2m, ..., t_nm), for k ≠ m, are orthogonal. Thus, the resulting parameter estimates {b̂_0, b̂_1, ..., b̂_h} are stable. During the prediction step, the predicted analyte value for a new specimen is given by

x̂ = b̂_0 + b̂_1·t̂_1 + b̂_2·t̂_2 + ... + b̂_h·t̂_h   (9)

where t̂_k = γ_k1·y_1 + γ_k2·y_2 + ... + γ_kq·y_q. Thus, x̂ is simply a linear combination of the instrumental measurements associated with the new specimen (i.e., x̂ = a_0 + a_1·y_1 + a_2·y_2 + ... + a_q·y_q).

What differentiates these methods from one another is the set of coefficients {a_j} that is used and the way in which the {a_j} are obtained. One important difference between PLS and PCR is the manner in which the {γ_kj} coefficients are obtained. In PCR, the instrumental measurements of the calibration set are used exclusively to obtain the {γ_kj} coefficients; in PLS, the analyte values of the calibration set are used as well.

Although it is beyond the scope of this article to explain precisely how the {γ_kj} and {a_j} coefficients are obtained for these methods, detailed information is available (13, 14, 19, 20). In addition, References 21 and 22 provide comparisons of competing calibration methods. In these and other comparative studies, the prediction performances of PCR and PLS were generally found to be quite similar over a broad range of conditions. Rather than delve into how these methods differ, we will focus on common issues that are critical to the successful use of soft-model-based approaches such as PLS and PCR.
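A brief sketch contrasting the two methods on simulated spectra, using scikit-learn's PLSRegression and a PCA-plus-regression pipeline as PCR; the data, and the choice of h = 3 factors, are purely illustrative.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(4)
    n, q, h = 40, 300, 3
    S = rng.normal(0.0, 1.0, (h, q))              # spectra of three varying components (simulated)
    C = rng.uniform(0.0, 1.0, (n, h))             # component levels; column 0 is the analyte
    Y = C @ S + rng.normal(0.0, 0.02, (n, q))     # calibration spectra
    x = C[:, 0]                                   # reference analyte values

    pls = PLSRegression(n_components=h).fit(Y, x)
    pcr = make_pipeline(PCA(n_components=h), LinearRegression()).fit(Y, x)

    y_new = C[:1] @ S                             # a noiseless "new" spectrum with known x
    print("PLS:", pls.predict(y_new).ravel(), "PCR:", pcr.predict(y_new), "truth:", x[0])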

Data pretreatment

Before methods such as PLS and PCR are used, a certain amount of data pretreatment (or transformation) is often performed. The most common and simplest type is centering the data. This operation is usually performed on both the analyte values and the instrumental measurements, taken one at a time. That is, the centered analyte values are given by x*_i = x_i − x̄, where

x̄ = (1/n) Σ_{i=1}^{n} x_i

and the centered values for the jth measurement are given by y*_ij = y_ij − ȳ_j, where ȳ_j is the average of the jth measurement over the calibration set. This operation can help make subsequent computations less sensitive to round-off and overflow problems. In addition, this operation generally reduces the size of the resulting model

x*_i = b_1·t*_i1 + b_2·t*_i2 + ... + b_h·t*_ih + e_i   (10)

where

t*_ik = γ_k1·y*_i1 + γ_k2·y*_i2 + ... + γ_kq·y*_iq   (11)

by one factor. During the prediction step, the measurements associated with a new specimen, {y_j}, are similarly translated by the amounts {ȳ_j} obtained from the calibration set.

Less frequently, the instrumental measurements are differentially weighted. Each centered instrumental measurement is multiplied by a measurement-dependent weight, w_j (i.e., y'_ij = y*_ij · w_j). The centered and weighted instrumental measurements (y'_ij) are then used as the basis for constructing the calibration model.

The purpose of using nonuniform weighting is to modify the relative influence of each measurement on the resulting model. The influence of the jth measurement is raised by increasing the magnitude of its weight, w_j. Sometimes weighting is performed such that the standard deviations of the centered and weighted measurements are identical (autoscaling); that is, the standard deviation of (y'_1j, y'_2j, ..., y'_nj) is 1 for j = 1, 2, ..., q.
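A small sketch of centering and autoscaling applied to a calibration matrix (the shapes are hypothetical); in practice the column means and weights are saved so that new spectra can be translated by the same amounts.

    import numpy as np

    rng = np.random.default_rng(5)
    Y = rng.normal(0.0, 1.0, (40, 300)) * rng.uniform(0.1, 2.0, 300)  # simulated calibration spectra
    x = rng.uniform(0.0, 10.0, 40)                                    # reference analyte values

    y_bar, x_bar = Y.mean(axis=0), x.mean()       # statistics saved for the prediction step
    Y_centered = Y - y_bar                        # centering
    x_centered = x - x_bar

    w = 1.0 / Y_centered.std(axis=0, ddof=1)      # autoscaling weights: unit standard deviation
    Y_auto = Y_centered * w

    print(Y_auto.std(axis=0, ddof=1)[:5])         # each column now has standard deviation 1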

Although centering promotes numerical stability during the model-building stage, differential weighting can drastically alter the form and performance of the resulting model. For example, if the weights of the least informative measurements were unilaterally increased, one would expect the model performance to suffer. On the other hand, if the weights of the most informative measurements were unilaterally increased, one would expect performance to improve.

The key to using weighting successfully is the ability to identify informative measurements. Without knowledge of the relative information content of the various measurements, weighting is akin to "shooting in the dark."

To a certain extent, differential weighting reduces to variable selection. That is, for measurements that are not selected, the associated weights are set to zero. Because full-spectrum methods (such as PCR and PLS) can use many wavelengths, the prevailing belief among spectroscopists seems to be that measurement (wavelength) selection is unnecessary; thus, they often use all available wavelengths within some broad range. However, in many applications, measurements from many spectral wavelengths are noninformative or are difficult to incorporate in a model because of nonlinearities. Whereas to some degree full-spectrum methods are able to accommodate nonlinearities, the inclusion of noninformative (or difficult) spectral measurements in a model can seriously degrade performance.

For many difficult problems, wavelength selection can greatly improve the performance of full-spectrum methods. Furthermore, in applications outside the laboratory (e.g., determination of components in an in situ setting), physical and economic considerations associated with the measurement apparatus may restrict the number of wavelengths (or measurements) that can be used. Thus, wavelength selection is very important, even when applying methods capable of using a very large number of measurements.


Currently, few empirical procedures for wavelength selection are appropriate for use with full-spectrum methods such as PLS. Most procedures (e.g., stepwise regression) are associated with calibration methods (e.g., MLR) that are capable of using relatively few wavelengths. However, Frank and Friedman (22) showed that stepwise MLR does not seem to perform as well as PLS or PCR with all measurements.

In general, wavelength selection procedures that can be used with full-spectrum methods (e.g., the correlation plot) search for individual wavelengths that empirically exhibit good selectivity, sensitivity, and linearity for the analyte of interest over the training set (23, 24). In order for these methods to be useful, wavelengths specific to the analyte of interest with good S/N are needed. However, the required wavelength specificity is not usually available in difficult applications (e.g., analysis of complex biological materials). Procedures such as the correlation plot, which consider only the relationships between individual wavelengths and the analyte of interest, are ill-equipped for such applications. This has provided the motivation to develop methods that select instrumental measurements on the basis of the collective relationship between candidate measurements and the analyte of interest (25).
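A correlation plot is straightforward to compute: correlate each wavelength's measurements with the reference analyte values across the training set. A sketch with simulated data; the informative band and the 0.9 threshold are arbitrary illustrations.

    import numpy as np

    rng = np.random.default_rng(6)
    n, q = 40, 300
    x = rng.uniform(0.0, 10.0, n)          # reference analyte values
    Y = rng.normal(0.0, 1.0, (n, q))       # mostly noninformative wavelengths
    Y[:, 140:160] += x[:, None]            # a band that responds to the analyte

    # Pearson correlation of each wavelength with the analyte over the training set
    Yc, xc = Y - Y.mean(axis=0), x - x.mean()
    r = (Yc * xc[:, None]).sum(axis=0) / (
        np.sqrt((Yc ** 2).sum(axis=0)) * np.sqrt((xc ** 2).sum()))

    print("wavelength indices with |r| > 0.9:", np.flatnonzero(np.abs(r) > 0.9))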

A number of other procedures exist for data pretreatment, primarily to linearize the relationships between the analyte level and the various instrumental measurements. This is important because of the inherently linear nature of the commonly used multivariate calibration methods. For example, in spectroscopy, optical transmission data are usually converted to absorbance before analysis. In this setting, this is a natural transformation given the underlying linear relationship (through Beer's law) between analyte concentration and absorbance.

Other pretreatment methods rely on the ordered nature of the instrumental measurements (e.g., a spectrum). In near-IR spectroscopy, instrumental measurements, which are first converted to reflectance, are often further transformed by procedures such as smoothing and differencing (derivatives). Smoothing reduces the effects of high-frequency noise throughout an ordered set of instrumental measurements such as a spectrum. It can be effective if the signal present in the instrumental measurements has a smooth (or low-frequency) nature.

Differencing the ordered measurements mitigates problems associated with baseline shifts and overlapping features. Another technique often used in near-IR reflectance spectroscopy is multiplicative signal correction (26), which handles problems introduced by strong scattering effects. The performance of the multivariate calibration methods described earlier can be strongly influenced by data pretreatment.
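A sketch of these transformations, using SciPy's Savitzky-Golay filter for smoothing and derivatives, plus a basic multiplicative signal correction against the mean spectrum; the window and polynomial settings are illustrative choices, not prescriptions.

    import numpy as np
    from scipy.signal import savgol_filter

    rng = np.random.default_rng(7)
    Y = rng.normal(0.0, 0.01, (40, 300)) + np.sin(np.linspace(0, 3, 300))  # simulated spectra

    Y_smooth = savgol_filter(Y, window_length=11, polyorder=2, axis=1)           # smoothing
    Y_deriv = savgol_filter(Y, window_length=11, polyorder=2, deriv=1, axis=1)   # first derivative

    # Basic multiplicative signal correction: regress each spectrum on the mean spectrum
    m = Y.mean(axis=0)
    Y_msc = np.empty_like(Y)
    for i, spec in enumerate(Y):
        slope, intercept = np.polyfit(m, spec, 1)
        Y_msc[i] = (spec - intercept) / slope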

Cross-validation, model size, and model validation

Cross-validation is a general statistical method that can be used to obtain an objective assessment of the magnitude of prediction errors resulting from the use of an empirically based model or rule in complex situations (27). The objectivity of the assessment is obtained by comparing predictions with known analyte values for specimens that are not used in developing the prediction model. In complex situations, it is impossible or inappropriate to use traditional methods of model assessment. In the context of multivariate calibration, cross-validation is used to help identify the optimal size (h_opt) for soft-model-based methods such as PLS and PCR. In addition, cross-validation can provide a preliminary assessment of the prediction errors that are to be expected when using the developed model (of optimal size) with instrumental measurements obtained from new specimens.

The cross-validated assessment of the performance of a specific calibration model (fixed method/model size) is based on a very simple concept. First, data from the calibration set are partitioned into a number of mutually exclusive subsets (S_1, S_2, ..., S_F), with the ith subset (S_i) containing the reference values and instrumental measurements associated with n_i specimens. Next, F different models are constructed, each using the prescribed method/model size with all except one of the F available data subsets. The ith model, M_{-i}, is constructed by using all data subsets except S_i. In turn, each model is used to predict the analyte of interest for specimens whose data were not used in its construction (i.e., M_{-i} is used to predict the specimens in S_i). In a sense, this procedure, which can be computing-intensive, simulates the prediction of new specimens. A comparison of predictions obtained in this way with the known reference analyte values provides an objective assessment of the errors associated with predicting the analyte values of new specimens.

Partitioning the calibration set into the various data subsets should be done carefully. Typically, the calibration set is partitioned into subsets of size one (i.e., leave-one-out-at-a-time cross-validation). However, difficulty arises when replicate sets of instrumental measurements are obtained from individual specimens. Many practitioners use leave-one-out-at-a-time cross-validation in this situation. Unfortunately, what is left out one at a time is usually a single set of instrumental measurements. In this case, the cross-validated predictions associated with specimens with replicate instrumental measurements will be influenced by the replicate measurements (from the same specimen) used to construct M_{-i}.

Such use of cross-validation does not simulate the prediction of new samples. The likely result is an optimistic assessment of prediction errors. A more realistic assessment of prediction errors would be obtained if the calibration set were partitioned into subsets in which all replicate measurements from a single specimen are included in the same subset.
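One way to implement this specimen-wise partitioning is scikit-learn's LeaveOneGroupOut (or GroupKFold), with a group label per specimen so that replicate measurements stay together; a sketch with simulated replicates follows.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(8)
    n_spec, reps, q = 15, 3, 100
    x_spec = rng.uniform(0.0, 10.0, n_spec)
    groups = np.repeat(np.arange(n_spec), reps)      # specimen label for each measurement
    x = x_spec[groups]                               # reference value repeated per replicate
    Y = np.outer(x, rng.normal(0.0, 1.0, q)) + rng.normal(0.0, 0.5, (n_spec * reps, q))

    # Leave out all replicates of one specimen at a time (the M_{-i} models)
    errors = []
    for train, test in LeaveOneGroupOut().split(Y, x, groups):
        model = PLSRegression(n_components=2).fit(Y[train], x[train])
        errors.append(model.predict(Y[test]).ravel() - x[test])
    rmscv = np.sqrt(np.mean(np.concatenate(errors) ** 2))
    print("RMSCV:", rmscv)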

To select the optimal model size (h_opt), the cross-validation procedure is performed using various values of the meta-parameter, h. For each value of h, an appropriate measure of model performance is obtained. A commonly used measure of performance is the root mean squared prediction error based on cross-validation,

RMSCV(h) = [ (1/n) Σ_{i=1}^{n} (x_i − x̂_i[M_{-i}(h)])² ]^{1/2}   (12)

where x̂_i[M_{-i}(h)] represents the predicted value of the ith specimen using a model of size h, which was developed without using S_i. Sometimes, to establish a baseline performance metric, RMSCV(0) is computed. For this purpose, x̂_i[M_{-i}(0)] is defined as the average analyte level in the set of all specimens with the ith specimen removed. Thus, RMSCV(0) provides a measure of how well we would predict on the basis of the average analyte level in the calibration set rather than instrumental measurements.

Often, practitioners choose h_opt as the value of h that yields the minimum value of the RMSCV. The shape associated with RMSCV(h) in Figure 5 is quite common. When h < h_opt, the prediction errors are largely a consequence of systematic effects (e.g., interferences) that are unaccounted for. When h > h_opt, the prediction errors are primarily attributable to the modeling of noise artifacts (overfitting). Usually, if the model provides a high degree of predictability (as in the case illustrated by Figure 5), the errors caused by overfitting are relatively small compared with those associated with systematic effects not accounted for.
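Continuing the cross-validation sketch above (it reuses Y, x, and groups from that block), h_opt can be chosen by evaluating RMSCV over a range of candidate sizes; the range 1-10 is an arbitrary illustration.

    # Assumes Y, x, groups, LeaveOneGroupOut, and PLSRegression from the previous sketch
    def rmscv(h):
        errs = []
        for train, test in LeaveOneGroupOut().split(Y, x, groups):
            m = PLSRegression(n_components=h).fit(Y[train], x[train])
            errs.append(m.predict(Y[test]).ravel() - x[test])
        return np.sqrt(np.mean(np.concatenate(errs) ** 2))

    curve = {h: rmscv(h) for h in range(1, 11)}
    h_opt = min(curve, key=curve.get)
    print("h_opt:", h_opt, "RMSCV(h_opt):", curve[h_opt])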

Figure 5. Determination of optimal PLS model size, h_opt. The model relates near-IR spectroscopic measurements to urea concentration (mg/dL) in multicomponent aqueous solutions.

At this point, an optimal (or near-optimal) model size has been selected. RMSCV(h_opt) can be used as a rough estimate of the root mean squared prediction error associated with using the selected model with new specimens. This estimate may be somewhat optimistic, given the nature of the model selection process, in which many possible models were under consideration. A more realistic assessment of the magnitude of prediction errors can be obtained by using an external data set for model validation. An ideal strategy for model selection and validation would be to separate the original calibration set into two subsets: one for model selection (i.e., determination of h_opt) and one strictly for model validation. Use of this strategy would guarantee that model validation is independent of model selection.

Pitfalls

The primary difficulty associated with using empirically determined models is that they are based on correlation rather than causation. Construction of these models involves finding measurements or combinations of measurements that simply correlate well with the analyte level throughout the calibration set. However, correlation does not imply causation (i.e., a cause-and-effect relationship between the analyte level and the instrumental measurements).

Suppose we find, by empirical means, that a certain instrumental measurement correlates well with the analyte level throughout the calibration set. Does this mean that the analyte level affects that particular instrumental measurement? Not necessarily. Consider Figure 6, which displays the hypothetical relationship between the reference analyte level and the order of measurement (run order) for specimens in the calibration set. Because of the strong relationship between analyte level and run order, it is difficult to separate their effects on the instrumental measurements. Thus, the effects of analyte level and run order are said to be confounded. In this case, simple instrument instability could generate a strong misleading correlation between analyte level and an instrumental measurement. Fortunately, a useful countermeasure for this type of confounding exists: randomization of the run order with respect to analyte level.

Figure 6. Relationship between the reference analyte level and run order of the calibration experiment.

Often, however, more subtle confounding patterns exist. For instance, in a multicomponent system, the analyte level may be correlated with the levels of other components or with a physical phenomenon such as temperature. In such situations it may be difficult to establish whether, in fact, the model is specific to the analyte of interest. In a tightly controlled laboratory study, where the sample specimens can be formulated by the experimenter, it is possible to design the calibration set (with respect to component concentrations) so that the levels of different components are uncorrelated. However, this countermeasure does not work when an empirical model is being used to predict analyte levels associated with, for example, complex industrial or environmental specimens. In such situations, one rarely has complete knowledge of the components involved, not to mention the physical and chemical interactions among components.

The validity of empirically based models depends heavily on how well the calibration set represents the new specimens in the prediction set. All phenomena (with a chemical, physical, or other basis) that vary in the prediction set and influence the instrumental measurements must also vary in the calibration set, over ranges that span the levels of the phenomena occurring in the prediction set. Sometimes the complete prediction set is at hand before the calibration takes place. In such cases, the calibration set can be obtained directly by sampling the prediction set (28). Usually, however, the complete prediction set is not available at the time of calibration, and an unusual (or unaccounted for) phenomenon may be associated with some of the prediction specimens. Outlier detection methods represent only a limited countermeasure against such difficulties; valid predictions for these problematic specimens cannot be obtained.

Sources of prediction errors

Ultimately, the efficacy of an empirical calibration model depends on how well it predicts the analyte level of new specimens that are completely external to the development of the model. If the reference values associated with m new specimens (or specimens from an external data set) are available, a useful measure of model performance is given by the standard error of prediction,

SEP = [ (1/m) Σ_{i=1}^{m} (x̂_i − x_i)² ]^{1/2}   (13)

If these m new specimens comprise a random sample from the prediction set spanning the range of potential analyte values (and interferences), the SEP can provide a good measure of how well, on average, the calibration model performs. Often, however, the performance of the calibration model varies, depending on the analyte level.

For example, consider Figure 7a, where the standard deviation of the prediction errors (e_i = x̂_i − x_i) increases as the analyte value deviates from the average analyte value found in the training set, x̄. In this case, although the precision of predictions depends on the analyte value, the accuracy is maintained over the range of analyte values. That is, for a particular analyte level, the average prediction error is about zero. The behavior with respect to precision is neither unexpected nor abnormal; the model is often better described in the vicinity of x̄ than in the extremes.

On the other hand, sometimes there is a systematic bias associated with prediction errors that depends on the analyte level (Figure 7b). When the analyte values are less than x̄, the prediction errors are generally positive. Conversely, when analyte values are greater than x̄, the prediction errors are generally negative.

This pattern is indicative of a defective model in which the apparently good predictions in the vicinity of x̄ are attributable primarily to the centering operation that is usually performed during preprocessing in PLS and PCR. That is, predictions based on Equation 10 effectively reduce to x̂_i = x̄ + noise if the estimated model coefficients, {b̂_k}, are spurious. Spurious model coefficients are obtained if noninformative instrumental measurements are used to construct a model. Thus, one should be wary of models that produce the systematic pattern of prediction errors shown in Figure 7b, regardless of whether the predictions are based on cross-validation or a true external validation set.

Figure 7. Relationship between the predicted and reference analyte levels. (a) In the normal relationship, the average reference analyte level, x̄, is 11 (arbitrary units). The precision of the predicted values depends on the reference analyte level and is best in the vicinity of x̄. (b) In the abnormal relationship, the precision and accuracy of the predicted values depend on the reference analyte level.

Several other factors affect the accuracy and precision of predictions, notably the inherent accuracy and precision of the reference method used. If the reference method produces erroneous analyte values that are consistently low or high, the resulting predictions will reflect that bias. Imprecise (but accurate) reference values will also inflate the magnitude of prediction errors, but in a nonsystematic way. Furthermore, errors in determining the reference values will affect the ability to assess the magnitude of prediction errors. The assessed magnitude of prediction errors can never be less than the magnitude of the reference errors. Thus, it is very important to minimize the errors in the reference analyte values that are used to construct an empirical model.

Other sources of prediction error are related to the repeatability, stability, and reproducibility of the instrumental measurements. Repeatability relates to the ability of the instrument to generate consistent measurements of a specimen using some fixed conditions (without removing the specimen from the instrument) over a relatively short period of time (perhaps seconds or minutes). Stability is similar to repeatability, but it involves a somewhat longer time period (perhaps hours or days). Reproducibility refers to the consistency of instrumental measurements during a small change in conditions, as might occur from multiple insertions of a specimen into an instrument.

Further classification of instrumental measurement errors is possible for cases in which the multiple instrumental errors are ordered (e.g., by wavelength). It is also possible to decompose instrumental variation into features that are slowly varying (low frequency) and quickly varying (high frequency). Often the focus is on only the high-frequency error component, and usually only in the context of repeatability. This is unfortunate because multivariate methods that are capable of using many measurements are somewhat adept at reducing the effects of high-frequency errors. Practitioners should work to identify and eliminate sources of slowly varying error features.

Other sources of prediction error may be unrelated to the reference method or the analytical instrument. Modeling nonlinear behavior with inherently linear methods can result in model inadequacy. Some researchers thus have adapted multivariate calibration methods to accommodate nonlinearities (29, 30).

The ability to adequately sample and measure specimens in difficult environments can significantly affect the performance of calibration methods. In a laboratory it might be possible to control some of the factors that adversely affect model performance. However, many emerging analytical methods (e.g., noninvasive medical analyses and in situ analyses of industrial and environmental materials) are intended for use outside the traditional laboratory environment, where it might not be possible to control such factors. The ability to overcome these obstacles will largely influence the success or failure of a calibration method in a given application. Thus, practitioners must strive to identify and eliminate the dominant sources of prediction error.

Summary

This article has provided a basic introduction to multivariate calibration methods, with an emphasis on identifying issues that are critical to their effective use. In the future, as increasingly difficult problems arise, these methods will continue to evolve. Regardless of the direction of the evolutionary process, the resulting methods will need to be used carefully, with recognition of the issues presented in this article.

I thank Steven Brown, Bob Easterling, Ries Robinson, and Brian Stallard for their advice on this manuscript. Brian Stallard provided the water vapor spectra.

References

(1) Oman, S. D.; Wax, Y. Biometrics 1984, 40, 947-60.
(2) Smith, R. L.; Corbett, M. Applied Statistics 1987, 36, 283-95.
(3) Krutchkoff, R. G. Technometrics 1967, 9, 425-39.
(4) Williams, E. J. Technometrics 1969, 11, 189-92.
(5) Occupational Safety and Health Administration Salt Lake Technical Center: Metal and Metalloid Particulate in Workplace Atmospheres (Atomic Absorption) (USDOL/OSHA Method No. ID-121). In OSHA Analytical Methods Manual, Part 2, 2nd ed.
(6) Fearn, T. Applied Statistics 1983, 32, 73-79.
(7) Oman, S. D.; Naes, T.; Zube, A. J. Chemom. 1993, 7, 195-212.
(8) Haaland, D. M. Anal. Chem. 1988, 60, 1208-17.
(9) Small, G. A.; Arnold, M. A.; Marquardt, L. A. Anal. Chem. 1993, 65, 3279-89.
(10) Bhandare, P.; Mendelson, Y.; Peura, R. A.; Janatsch, G.; Kruse-Jarres, J. D.; Marbach, R.; Heise, H. M. Appl. Spectrosc. 1993, 47, 1214-21.
(11) Robinson, M. R.; Eaton, R. P.; Haaland, D. M.; Koepp, G. W.; Thomas, E. V.; Stallard, B. R.; Robinson, P. L. Clin. Chem. 1992, 38, 1618-22.
(12) Brown, S. D.; Bear, R. S.; Blank, T. B. Anal. Chem. 1992, 64, 22R-49R.
(13) Martens, H.; Naes, T. Multivariate Calibration; Wiley: Chichester, England, 1989.
(14) Haaland, D. M.; Thomas, E. V. Anal. Chem. 1988, 60, 1193-1202.
(15) Thomas, E. V. Technometrics 1991, 33, 405-14.
(16) Brown, C. W.; Lynch, P. F.; Obremski, R. J.; Lavery, D. S. Anal. Chem. 1982, 54, 1472-79.
(17) Mendelson, Y. Ph.D. Dissertation, Case Western Reserve University, Cleveland, OH, 1983.
(18) Stone, M.; Brooks, R. J. J. Royal Statistical Soc. Series B 1990, 52, 237-69.
(19) Hoskuldsson, A. J. Chemom. 1988, 2, 211-28.
(20) Helland, I. S. Scandinavian J. Statistics 1990, 17, 97-114.
(21) Thomas, E. V.; Haaland, D. M. Anal. Chem. 1990, 62, 1091-99.
(22) Frank, I. E.; Friedman, J. H. Technometrics 1993, 35, 109-48.
(23) Hruschka, W. R. In Near-Infrared Technology in the Agricultural and Food Industries; Williams, P.; Norris, K., Eds.; American Association of Cereal Chemists, Inc.: St. Paul, MN, 1987; pp. 35-55.
(24) Brown, P. J. J. Chemom. 1992, 6, 151-61.
(25) Li, T.; Lucasius, C. B.; Kateman, G. Anal. Chim. Acta 1992, 268, 123-34.
(26) Isaksson, T.; Naes, T. Appl. Spectrosc. 1988, 42, 1273-84.
(27) Stone, M. J. Royal Statistical Soc. Series B 1974, 36, 111-33.
(28) Naes, T.; Isaksson, T. Appl. Spectrosc. 1989, 43, 328-35.
(29) Hoskuldsson, A. J. Chemom. 1992, 6, 307-34.
(30) Sekulic, S.; Seasholtz, M. B.; Kowalski, B. R.; Lee, S. E.; Holt, B. R. Anal. Chem. 1993, 65, 835A-845A.

Edward V. Thomas is a statistician in the Statistics and Human Factors Department of Sandia National Laboratories, Albuquerque, NM 87185-0829. His B.A. degree in chemistry and M.A. degree and Ph.D. in statistics were all awarded by the University of New Mexico. His research interests include calibration methods with a focus on their application to problems in analytical chemistry.
