Partial Least Squares Regression (PLSR)
• Partial least squares (PLS) is a method for constructing predictive models when the predictors are many and highly collinear.
• Note that the emphasis is on predicting the responses and not necessarily on trying to understand the underlying relationship between the variables.
• When prediction is the goal and there is no practical need to limit the number of measured factors, PLS can be a useful tool.
• PLS was developed in the 1960s by Herman Wold as an econometric technique, but some of its most avid proponents (including Wold's son Svante) are chemical engineers and chemometricians.
• Partial least squares regression (PLSR) is a multivariate data analytical technique designed to handle intercorrelated regressors.
• It is based on Herman Wold's general PLS principle, in which complicated, multivariate systems analysis problems are solved by a sequence of simple least squares regressions.
How Does PLS Work?
• In principle, MLR can be used with very many predictors.
• However, if the number of predictors gets too large (for example, greater than the number of observations), you are likely to get a model that fits the sampled data perfectly but that will fail to predict new data well.
• This phenomenon is called over-fitting.
• In such cases, although there are many manifest predictors, there may be only a few underlying or latent factors that account for most of the variation in the response.
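The over-fitting problem can be demonstrated numerically: with more predictors than observations, least squares fits even pure noise perfectly yet predicts new data poorly. A minimal NumPy sketch (hypothetical data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 15, 40                       # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # pure noise: nothing real to predict

# The minimum-norm least squares solution fits the sample exactly...
beta = np.linalg.pinv(X) @ y
train_resid = np.linalg.norm(y - X @ beta)

# ...but fails on new data drawn from the same distribution.
X_new = rng.normal(size=(n, p))
y_new = rng.normal(size=n)
test_resid = np.linalg.norm(y_new - X_new @ beta)
print(round(train_resid, 8), round(test_resid, 3))
```

The training residual is essentially zero while the test residual is large: the model has memorized noise, which is exactly the situation latent-factor methods such as PLS are designed to avoid.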
• The general idea of PLS is to try to extract these latent factors, accounting for as much of the manifest predictor variation as possible while modeling the responses well.
• For this reason, the acronym PLS has also been taken to mean "projection to latent structure."
• The overall goal is to use the predictors to predict the responses in the population.
• This is achieved indirectly by extracting latent variables T and U from sampled factors and responses, respectively.
• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.
• This procedure actually covers various techniques, depending on which source of variation is considered most crucial.
• PCR is based on the spectral decomposition of XᵀX, where X is the matrix of predictor values;
• PLS is based on the singular value decomposition of XᵀY.
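The two decompositions can be written out directly in NumPy (a sketch on random, purely illustrative data): the PCR directions come from XᵀX alone and ignore Y entirely, while the PLS directions come from the cross-product XᵀY and therefore couple the two blocks.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
Y = rng.normal(size=(20, 2))

# PCR direction: leading eigenvector of X^T X (unsupervised, ignores Y).
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
pcr_dir = eigvecs[:, -1]            # eigenvector of the largest eigenvalue

# PLS direction: leading left singular vector of X^T Y (couples X and Y).
U, s, Vt = np.linalg.svd(X.T @ Y)
pls_dir = U[:, 0]

print(pcr_dir.shape, pls_dir.shape)
```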
• If the number of extracted factors is greater than or equal to the rank of the sample factor space, then PLS is equivalent to MLR.
• An important feature of the method is that usually far fewer factors are required.
• One approach to extracting the optimum number of factors is to construct the PLS model for a given number of factors on one set of data and then to test it on another, choosing the number of extracted factors for which the total prediction error is minimized.
• Alternatively, van der Voet (1994) suggests choosing the smallest number of extracted factors whose residuals are not significantly greater than those of the model with minimum error.
• If no convenient test set is available, then each observation can be used in turn as a test set; this is known as cross-validation.
• PLSR is a bilinear regression method that extracts a small number of factors, ta, a = 1, 2, …, A, that are linear combinations of the K X-variables, and uses these factors as regressors for y.
• What is special about PLSR compared to principal component regression (PCR) is that the y-variable is used actively in determining how the regression factors ta are computed from X.
• Each PLSR factor ta is defined so that it describes as much as possible of the covariance between X and y remaining after the previous a-1 factors have been estimated and subtracted.
• The purpose of using PLSR in multivariate calibration is to obtain good insight and good predictive ability at the same time.
• In classical stepwise multiple linear regression (SMLR) the collinearity is handled by picking out a small subset of individual, distinctly different X variables from all the available X variables.
• This reduced subset is used as regressors for y, leaving the other X variables unused.
• The estimated factors are often defined to be orthogonal to one another.
• The model for regressions on estimated latent variables can be summarized as follows:
T = w(X)
X = p(T) + E
y = q(T) + f
y = q(w(X)) + f = b(X) + f
• In practice, the model parameters have to be estimated from empirical data.
• Since the regression is intended for later prediction of y from X, the factor scores T are generally defined as functions of X: T = w(X).
• The major difference between calibration methods is how T is estimated.
• For instance, in PCR it is estimated as a series of eigenvector spectra of the centred cross-product matrix (X − 1x̄ᵀ)ᵀ(X − 1x̄ᵀ), etc.
• In PLSR w() is defined as a sequence of X versus y covariances.
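This covariance definition is easy to verify: on mean-centred data, the first PLS weight vector is simply the normalised vector of X-versus-y covariances, Xᵀy. A NumPy sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + 0.1 * rng.normal(size=30)

# Centre the data, as PLSR works on mean-centred X and y.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS weight vector: the normalised X-versus-y covariance vector.
w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)
t1 = Xc @ w1                        # first score vector t1 = Xc w1
print(np.round(w1, 3))
```

The resulting score t1 is, by construction, the linear combination of the X-variables with maximal covariance with y.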
PLS-Regression (PLS-R): A Powerful Alternative to PCR
• It is possible to obtain the same prediction results as PCR, but based on a smaller number of components, by allowing the y-data structure to intervene directly in the X-decomposition.
• This is done by condensing the two-stage PCR process into just one: PLS-R (Partial Least Squares Regression).
• Usually the term used is just PLS, which has also been interpreted to signify Projection to Latent Structures.
• PLS claims to do the same job as PCR, only with fewer bilinear components.
PLS(X, Y): Initial Comparison with PCA(X), PCA(Y)
• Compared with PCR, PLS uses the y-data structure, the y-variance, directly as a guiding hand in decomposing the X-matrix, so that the outcome constitutes an optimal regression, precisely in the strict prediction-validation sense.
• A first approximation to understanding how the PLS approach works (though not entirely correct) is simply to view it as two simultaneous PCA analyses: PCA of X and PCA of Y.
• The equivalent PCA equations are shown below.
• Note how the score and loading complements in X are called T and P respectively (X also has an alternative W-loading in addition to the familiar P-loading), while these are called U and Q respectively for the Y-space.
X = TPᵀ + E
Y = UQᵀ + F
• However PLS does not really perform two independent PCA-analyses on the two spaces.
• On the contrary, PLS actively connects the X- and Y-spaces by specifying the u-score vector(s) to act as the starting points for (actually instead of) the t-score vectors in the X-space decomposition.
w = loading weights; p = X-loadings; q = Y-loadings
• Thus the starting proxy-t1 is actually u1 in the PLS-R method, thereby letting the Y-data structure directly guide the otherwise much more “PCA-like” decomposition of X.
• u1 is later substituted by t1 at the relevant stage of the PLS algorithm in which the Y-space is decomposed.
• The crucial point is that it is the u1 (reflecting the Y-space structure) that first influences the X-decomposition leading to calculation of the X-loadings, but these are now termed “w” (for “loading-weights”).
• Then the X-space t-vectors are calculated, formally in a “standard” PCA fashion, but necessarily based on this newly calculated w-vector.
• This t-vector is now immediately used as the starting proxy-u1 vector, i.e. instead of u1, exactly as described above but with the X- and Y-spaces interchanged.
• By this means, the X-data structure also influences the “PCA (Y)-like” decomposition.
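The score-interchange described above can be written as a minimal NIPALS-style sketch for a single component, on mean-centred data. This is a simplified illustration (one component, no deflation loop), not a full production implementation:

```python
import numpy as np

def pls_one_component(X, Y, n_iter=100, tol=1e-10):
    """One PLS component via the NIPALS score interchange (sketch)."""
    u = Y[:, [0]]                       # start proxy-t from a Y-score column
    for _ in range(n_iter):
        w = X.T @ u                     # loading weights from u (Y-guided)
        w /= np.linalg.norm(w)
        t = X @ w                       # X-scores from the new weights
        q = Y.T @ t / (t.T @ t)         # Y-loadings regressed on t
        u_new = Y @ q / (q.T @ q)       # updated Y-scores (t acts as proxy-u)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    p = X.T @ t / (t.T @ t)             # X-loadings, used for deflation
    return w, t, p, q, u

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 5)); X -= X.mean(axis=0)
Y = rng.normal(size=(30, 2)); Y -= Y.mean(axis=0)
w, t, p, q, u = pls_one_component(X, Y)
print(t.shape, w.shape)
```

Subsequent components would be obtained by deflating X (and optionally Y) with the rank-one contributions t pᵀ and t qᵀ, then repeating.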
B = W(PᵀW)⁻¹Qᵀ
• Thus, what might at first sight appear as two sets of independent PCA decompositions is in fact based on these interchanged score vectors.
• In this way we have achieved the goal of modeling the X- and Y-space interdependently. PLS actively reduces the influence of large X-variations which do not correlate with Y.
• PCR is based on the spectral decomposition of XᵀX, where X is the matrix of variables, and PLS is based on the singular value decomposition of XᵀY.
• Alternative overview of PLS (indirect modeling) states that the overall goal is to use the variables to predict the responses in the population.
• This is achieved indirectly by extracting latent variables T and U from sampled variables and responses, respectively.
• The extracted factors T (also referred to as X-scores) are used to predict the Y-scores U, and then the predicted Y-scores are used to construct predictions for the responses.
Interpretation of PLS Models
• In principle PLS models are interpreted in much the same way as PCA and PCR models.
• Plotting the X- and the Y-loadings in the same plot allows you to study the inter-variable relationship, now also including the relationship between the X- and Y-variables.
• Since PLS focuses on Y, the Y-relevant information is usually captured already in the early components.
• There are however situations where the variation related to Y is very subtle, so many components will be necessary to explain enough of Y.
Loadings (p) and Loading Weights (w)
• The P-loadings are very much like the well-known PCA loadings; they express the relationship between the raw data matrix X and its scores, T (in PLS these may be called PLS scores).
• These loadings may be interpreted in the same way as in PCA or PCR, as long as one is aware that the scores have been calculated by PLS.
• In many PLS applications P and W are quite similar. This means that the dominant structures in X “happen” to be directed more or less along the same directions as those with maximum correlation to Y.
• The loading weights, W, however, represent the effective loadings directly connected to building the sought-for regression relationship between X and Y.
• In PLS there is also a set of Y-loadings, Q, which are the regression coefficients from the Y-variables onto the scores, U.
• Q and W may be used to interpret relationships between the X- and Y-variables, and to interpret the patterns in the score plots related to these loadings.
Loading Plot of Non-Spectral Variables
Loading Plot of Spectral Variables
• The fact that both P and W are important, however, is clear from the construction of the formal regression equation Y = XB from any specific PLS solution with A components.
• This B-matrix is calculated from:
B = W(PᵀW)⁻¹Qᵀ
This B-matrix is often used for practical (numerical) prediction purposes.
When to Use Which Method?
• The PLS approach is easy to understand conceptually and is often preferred because it is direct and effective.
• PLS is said to produce results, which are easier to interpret because they are less complex (using fewer components).
• Often PCR may give prediction errors as low as those of PLS, but almost invariably by using more PCs to do the job.
• PLS2 is a natural method to start with when there are many Y-variables.
• You quickly get an overview of the basic patterns and see if there is significant correlation between the Y-variables.
• PLS2 may actually in a few cases even give better results if Y is collinear, because it utilises all the available information in Y.
• The drawback is that you may need different numbers of PCs for the different Y-variables, which you must keep in mind during interpretation and prediction.
Exercise: Interpretation of PLS (Jam)