PLS: Partial Least Squares
A Standard Tool for Multivariate Regression
Regression:
Modeling the dependent variable(s) Y, e.g. a chemical property or a biological activity,
by predictor variables X, e.g. chemical composition or (coded) chemical structure.
Traditional method: MLR (multiple linear regression)
Works if the X-variables are:
few (# X-variables < # samples)
uncorrelated (X has full rank)
noise free (when some correlation exists)
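The first two conditions can be checked numerically. Below is a minimal NumPy sketch on random, purely illustrative data (the sizes and the 1e-4 perturbation are arbitrary assumptions). With more X-variables than samples, X’X is singular. With two nearly collinear X-variables, X’X is so ill-conditioned that the MLR solution (X’X)^-1 X’y becomes numerically unstable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: more X-variables than samples (arbitrary illustrative sizes)
n_samples, n_vars = 10, 25
X_wide = rng.normal(size=(n_samples, n_vars))
rank_wide = np.linalg.matrix_rank(X_wide.T @ X_wide)   # X'X is 25 x 25
print(f"X'X is {n_vars}x{n_vars} but has rank {rank_wide}")

# Case 2: two nearly collinear X-variables
x1 = rng.normal(size=50)
x2 = x1 + 1e-4 * rng.normal(size=50)                   # almost a copy of x1
X_coll = np.column_stack([x1, x2])
cond_coll = np.linalg.cond(X_coll.T @ X_coll)          # condition number of X'X
print(f"condition number of X'X: {cond_coll:.1e}")
```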
Instruments such as spectrometers, chromatographs and sensor arrays produce data that are:
numerous
correlated
noisy
incomplete
But! X, the "independent variables", are correlated predictors.
PLSR models:
1. the relation between two matrices, X and Y, by a linear multivariate regression
2. the structure of both X and Y
This gives richer results than MLR.
PLSR is a generalization of MLR.
PLSR is able to analyze data with:
Noise
Collinearity (Highly Correlated Data)
Numerous X-variables (> # samples)
Incompleteness in both X and Y
History
Herman Wold (1975):
Modeling of chains of matrices by NIPALS (Nonlinear Iterative PArtial Least Squares):
each step is a regression between a variable matrix and one parameter vector, with the other parameter vector held fixed.
Svante Wold & H. Martens (1980):
Completion and modification of the two-block (X, Y) PLS, the simplest form.
Herman Wold (~2000):
PLS as Projection to Latent Structures, a more descriptive interpretation.
A QSPR example:
One Y-variable: a chemical property, the free energy of unfolding of a protein (DDGTS).
Seven X-variables: a quantitative description of the variation in chemical structure among the 19 different amino acids in position 49 of the protein. The X-variables are highly correlated.

Data (X-variables: PIE, PIF, DGR, SAC, MR, Lam, Vol; Y-variable: DDGTS):

 No    PIE    PIF    DGR     SAC      MR    Lam     Vol  DDGTS
  1   0.23   0.31  -0.55   254.2   2.126  -0.02    82.2    8.5
  2  -0.48  -0.60   0.51   303.6   2.994  -1.24   112.3    8.2
  3  -0.61  -0.77   1.20   287.9   2.994  -1.08   103.7    8.5
  4   0.45   1.54  -1.40   282.9   2.933  -0.11    99.1   11.0
  5  -0.11  -0.22   0.29   335.0   3.458  -1.19   127.5    6.3
  6  -0.51  -0.64   0.76   311.6   3.243  -1.43   120.5    8.8
  7   0.00   0.00   0.00   224.9   1.662   0.03    65.0    7.1
  8   0.15   0.13  -0.25   337.2   3.856  -1.06   140.6   10.1
  9   1.20   1.80  -2.10   322.6   3.350   0.04   131.7   16.8
 10   1.28   1.70  -2.00   324.0   3.518   0.12   131.5   15.0
 11  -0.77  -0.99   0.78   336.6   2.933  -2.26   144.3    7.9
 12   0.90   1.23  -1.60   336.3   3.860  -0.33   132.3   13.3
 13   1.56   1.79  -2.60   366.1   4.638  -0.05   155.8   11.2
 14   0.38   0.49  -1.50   288.5   2.876  -0.31   106.7    8.2
 15   0.00  -0.04   0.09   266.7   2.279  -0.40    88.5    7.4
 16   0.17   0.26  -0.58   283.9   2.743  -0.53   105.3    8.8
 17   1.85   2.25  -2.70   401.8   5.755  -0.31   185.9    9.9
 18   0.89   0.96  -1.70   377.8   4.791  -0.84   162.7    8.8
 19   0.71   1.22  -1.60   295.1   3.054  -0.13   115.6   12.0
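To work with the example data, the table can be entered as NumPy arrays. This sketch assumes the column-to-variable assignment shown in the table. It uses an arbitrary |r| > 0.8 cut-off to list strongly correlated pairs of X-variables, which makes the "highly correlated" remark concrete.

```python
import numpy as np

X_names = ["PIE", "PIF", "DGR", "SAC", "MR", "Lam", "Vol"]
# Rows: amino acids 1-19; columns: the seven X-variables, in the order of X_names
X = np.array([
    [ 0.23,  0.31, -0.55, 254.2, 2.126, -0.02,  82.2],
    [-0.48, -0.60,  0.51, 303.6, 2.994, -1.24, 112.3],
    [-0.61, -0.77,  1.20, 287.9, 2.994, -1.08, 103.7],
    [ 0.45,  1.54, -1.40, 282.9, 2.933, -0.11,  99.1],
    [-0.11, -0.22,  0.29, 335.0, 3.458, -1.19, 127.5],
    [-0.51, -0.64,  0.76, 311.6, 3.243, -1.43, 120.5],
    [ 0.00,  0.00,  0.00, 224.9, 1.662,  0.03,  65.0],
    [ 0.15,  0.13, -0.25, 337.2, 3.856, -1.06, 140.6],
    [ 1.20,  1.80, -2.10, 322.6, 3.350,  0.04, 131.7],
    [ 1.28,  1.70, -2.00, 324.0, 3.518,  0.12, 131.5],
    [-0.77, -0.99,  0.78, 336.6, 2.933, -2.26, 144.3],
    [ 0.90,  1.23, -1.60, 336.3, 3.860, -0.33, 132.3],
    [ 1.56,  1.79, -2.60, 366.1, 4.638, -0.05, 155.8],
    [ 0.38,  0.49, -1.50, 288.5, 2.876, -0.31, 106.7],
    [ 0.00, -0.04,  0.09, 266.7, 2.279, -0.40,  88.5],
    [ 0.17,  0.26, -0.58, 283.9, 2.743, -0.53, 105.3],
    [ 1.85,  2.25, -2.70, 401.8, 5.755, -0.31, 185.9],
    [ 0.89,  0.96, -1.70, 377.8, 4.791, -0.84, 162.7],
    [ 0.71,  1.22, -1.60, 295.1, 3.054, -0.13, 115.6],
])
# Y-variable DDGTS (free energy of unfolding), one value per amino acid
y = np.array([8.5, 8.2, 8.5, 11.0, 6.3, 8.8, 7.1, 10.1, 16.8, 15.0,
              7.9, 13.3, 11.2, 8.2, 7.4, 8.8, 9.9, 8.8, 12.0])

R = np.corrcoef(X, rowvar=False)        # 7 x 7 correlation matrix of the X-variables
for i in range(len(X_names)):
    for j in range(i + 1, len(X_names)):
        if abs(R[i, j]) > 0.8:          # arbitrary cut-off for "highly correlated"
            print(f"{X_names[i]:>3} vs {X_names[j]:<3}  r = {R[i, j]:+.2f}")
```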
Transformation (here log10) toward a symmetrical distribution:
x          12.5    4235     0.2     546  100584
log10 x   1.097   3.627  -0.699   2.737   5.002
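The transformed values are base-10 logarithms of the raw values. A quick NumPy check confirms this to within rounding of the third decimal. Note that the log transform assumes strictly positive data.

```python
import numpy as np

x = np.array([12.5, 4235, 0.2, 546, 100584])   # skewed raw values (must be > 0)
x_log = np.log10(x)                            # range shrinks to roughly -0.7 ... 5.0
print(np.round(x_log, 3))
```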
Scaling:
With prior knowledge: give more weight to the more informative X-variables.
With no knowledge about the importance of the variables: autoscaling.
Autoscaling:
1. Scale to unit variance (x_i / SD).
2. Center (x_i - x̄).
Autoscaling gives the same weight to all X-variables and is numerically more stable.
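Here is a minimal autoscaling sketch in NumPy. It assumes the sample standard deviation (ddof=1) and no constant columns, and the helper name autoscale is illustrative. The function also returns the mean and SD of the calibration data so that new samples can be scaled the same way.

```python
import numpy as np

def autoscale(X):
    """Center each column (x_i - mean) and scale it to unit variance (/ SD)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)          # sample SD; assumes no constant column
    return (X - mean) / sd, mean, sd

# SAC and MR for amino acids 1, 2, 3 and 7 of the example table
X_small = np.array([[254.2, 2.126],
                    [303.6, 2.994],
                    [287.9, 2.994],
                    [224.9, 1.662]])
X_auto, x_mean, x_sd = autoscale(X_small)

# New samples are scaled with the calibration mean and SD, not their own
x_new_auto = (np.array([[322.6, 3.350]]) - x_mean) / x_sd
```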
Base of the PLSR model (usually linear):
A few "new" variables, the X-scores t_a (a = 1, 2, …, A), which are:
orthogonal, and
linear combinations of the X-variables: T = X W* (W*: weights)
The X-scores T are:
modelers of X: X = T P’ + E (P: X-loadings)
predictors of Y: Y = T Q’ + F (Q: Y-loadings)
Substituting T = X W*:
Y = X W* Q’ + F = X B + F, with the PLS-regression coefficients B = W* Q’.
Modelers of X:
By stepwise subtraction of each component (t_a p_a’) from X:
X = T P’ + E, i.e. X - T P’ = E
X_{a-1} - t_a p_a’ = X_a (the residual after subtraction of the a-th component)
Estimation of T:
X_0 = t_1 p_1’ + t_2 p_2’ + t_3 p_3’ + … + t_A p_A’ + E
t_1 = X_0 w_1        X_1 = X_0 - t_1 p_1’
t_2 = X_1 w_2        X_2 = X_1 - t_2 p_2’
t_3 = X_2 w_3        …
t_a = X_{a-1} w_a    X_a = X_{a-1} - t_a p_a’
…
X_A = X_{A-1} - t_A p_A’ = E
Stepwise "deflation" of the X-matrix.
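The deflation loop can be sketched in NumPy under a few assumptions: a single centered y, and PLS1-style weights w_a proportional to X’_{a-1} y. The slides leave the choice of w_a to the NIPALS steps, and the function name deflate_x is hypothetical. The sketch checks that X_0 = T P’ + E and that the scores come out orthogonal.

```python
import numpy as np

def deflate_x(X0, y, A):
    """Extract A components by stepwise deflation of X (PLS1-style weights)."""
    Xa = X0.copy()
    T, P = [], []
    for _ in range(A):
        w = Xa.T @ y                    # weight direction X'_{a-1} y
        w /= np.linalg.norm(w)          # unit length: w'w = 1
        t = Xa @ w                      # t_a = X_{a-1} w_a
        p = Xa.T @ t / (t @ t)          # p_a = X'_{a-1} t_a / (t_a' t_a)
        Xa = Xa - np.outer(t, p)        # X_a = X_{a-1} - t_a p_a'
        T.append(t)
        P.append(p)
    return np.column_stack(T), np.column_stack(P), Xa   # T, P and E = X_A

rng = np.random.default_rng(1)
X0 = rng.normal(size=(30, 6))
X0 -= X0.mean(axis=0)                   # centered X_0
y = X0 @ rng.normal(size=6) + 0.1 * rng.normal(size=30)
y -= y.mean()
T, P, E = deflate_x(X0, y, A=3)
```

Because each p_a is the least-squares regression of X_{a-1} on t_a, the residual X_a is orthogonal to t_a. Every later score is computed from such a residual, which is why the t's are mutually orthogonal.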
Geometrical interpretation:
The t’s (X-scores) are modelers of X and predictors of Y.
PLS-2 or PLS-1 for a multivariable Y?
Run a PCA of Y to find the rank of Y (# PCs):
If # PCs << # Y-variables: one PLS-2 model, with all Y-variables in a single model.
If # PCs ≈ # Y-variables: separate PLS-1 models, one y at a time.
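One way to make the PCA-of-Y rule concrete is a NumPy sketch that counts the principal components needed to reach a chosen fraction of the variance in autoscaled Y. The 99 % threshold, the "fewer than half the Y-variables" decision cut-off and the helper y_rank are illustrative assumptions, not values from the slides.

```python
import numpy as np

def y_rank(Y, threshold=0.99):
    """# principal components needed to explain `threshold` of autoscaled Y's variance."""
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    s = np.linalg.svd(Yc, compute_uv=False)          # singular values of autoscaled Y
    explained = np.cumsum(s**2) / np.sum(s**2)       # cumulative explained variance
    return int(np.searchsorted(explained, threshold)) + 1

rng = np.random.default_rng(3)
z = rng.normal(size=50)
Y_corr = np.outer(z, [1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.normal(size=(50, 4))  # one factor
Y_indep = rng.normal(size=(50, 4))                                          # unrelated columns

for name, Y in [("Y_corr", Y_corr), ("Y_indep", Y_indep)]:
    k = y_rank(Y)
    advice = "one PLS-2 model" if k < Y.shape[1] / 2 else "separate PLS-1 models"
    print(f"{name}: {k} PCs for {Y.shape[1]} Y-variables -> {advice}")
```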
Number of PLS components!!
Too few: underfitting
Too many: overfitting
If proper: GOOD prediction ability
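Cross validation:
The samples (rows of X and Y) are divided into a calibration set and a prediction set. The model is fitted on the calibration set and used to predict the prediction set, and this is repeated until every sample has been predicted once.
PRESS: Predictive REsidual Sum of Squares, PRESS = Σ (y - ŷ)², summed over all predicted samples (and Y-variables).
Different # components in the model give different PRESS values.
The model with the proper # components is the model with the minimum PRESS value.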
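Below is a compact sketch of cross validation with PRESS. It assumes a single y, mean-centering only (no scaling) and five contiguous folds. pls1_fit, pls1_predict and press_curve are hypothetical helpers written for this illustration; PLS1 is fitted by NIPALS with b = W (P’W)^-1 q. The data are synthetic, with two underlying latent variables.

```python
import numpy as np

def pls1_fit(X, y, A):
    """PLS1 by NIPALS on mean-centered data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(A):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)              # X-weights, w'w = 1
        t = Xa @ w                          # X-scores
        tt = t @ t
        p = Xa.T @ t / tt                   # X-loadings
        qa = ya @ t / tt                    # Y-loading (a scalar for one y)
        Xa = Xa - np.outer(t, p)            # deflate X
        ya = ya - qa * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)     # b = W (P'W)^-1 q
    return b, x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def press_curve(X, y, max_A, n_folds=5):
    """PRESS for 1..max_A components, using contiguous cross-validation folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    press = np.zeros(max_A)
    for A in range(1, max_A + 1):
        for pred_idx in folds:                                  # prediction set
            calib_idx = np.setdiff1d(np.arange(n), pred_idx)    # calibration set
            model = pls1_fit(X[calib_idx], y[calib_idx], A)
            resid = y[pred_idx] - pls1_predict(model, X[pred_idx])
            press[A - 1] += resid @ resid
    return press

rng = np.random.default_rng(2)
latent = rng.normal(size=(40, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(40, 8))
y = latent @ np.array([1.5, -1.0]) + 0.1 * rng.normal(size=40)

press = press_curve(X, y, max_A=6)
best_A = int(np.argmin(press)) + 1          # model with the minimum PRESS
print(np.round(press, 3), "-> best # components:", best_A)
```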
PLS Algorithm
NIPALS (Nonlinear Iterative PArtial Least Squares): common and simple
Preprocessing: transformation, scaling and centering of X and Y
Initially:
X = T P’ + E
Y = U Q’ + F (for prediction, Y = T Q’ + F)
Both T and P are unknown.
Base (alternating regressions):
T = X P (P fixed)
P = X’ T (T fixed)
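The "base" idea is to alternate two least-squares regressions until the scores stop changing, which is what NIPALS does for a single principal component. The following is a minimal NumPy sketch; the starting column, the tolerance and the function name nipals_pc are assumptions. On centered data its loading vector should agree, up to sign, with the first right singular vector from an SVD.

```python
import numpy as np

def nipals_pc(X, tol=1e-10, max_iter=500):
    """One principal component by alternating regressions t = X p and p = X' t."""
    t = X[:, np.argmax(X.var(axis=0))].copy()   # start: column with the largest variance
    for _ in range(max_iter):
        p = X.T @ t / (t @ t)                   # loadings for fixed t
        p /= np.linalg.norm(p)                  # normalize so that p'p = 1
        t_new = X @ p                           # scores for fixed p
        if np.linalg.norm(t_new - t) / np.linalg.norm(t_new) < tol:
            return t_new, p
        t = t_new
    return t, p

rng = np.random.default_rng(4)
X = 3 * np.outer(rng.normal(size=20), rng.normal(size=5)) + 0.1 * rng.normal(size=(20, 5))
X -= X.mean(axis=0)                             # centered data
t, p = nipals_pc(X)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
print("matches first SVD loading:", np.allclose(np.abs(p), np.abs(Vt[0]), atol=1e-6))
```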
Utilizing the X-model:
Having X (X_0, or X_1, …, or X_{a-1}) and Y (Y_0, or Y_1, …, or Y_{a-1}), where X_0 and Y_0 are 1. autoscaled and 2. not deflated.
For a = 1 to A:
A. Get u (a temporary Y-score): take one of the Y columns, to be used in place of the X-score.
B. Calculate w_a (X-weights), used as temporary X-loadings in X_{a-1} = u_a w_a’ + E:
w_a = X’_{a-1} u_a / (u_a’ u_a)
Normalize so that w_a’ w_a = 1.
C. Calculate t_a (X-scores), from X_{a-1} = t_a w_a’ + E:
t_a = X_{a-1} w_a
These scores serve for both X and Y:
X_{a-1} = t_a p_a’ + E
Y_{a-1} = t_a q_a’ + F
D. Calculate p_a (X-loadings) and q_a (Y-loadings):
p_a = X’_{a-1} t_a / (t_a’ t_a)
q_a = Y’_{a-1} t_a / (t_a’ t_a)
E. Test the suitability of u_a by calculating t_a again:
(u_a)new = Y_{a-1} q_a / (q_a’ q_a)
w_a = X’_{a-1} (u_a)new / ((u_a)new’ (u_a)new), normalized as in B
(t_a)new = X_{a-1} w_a
Then perform a convergence test on it:
||(t_a)new - t_a|| / ||(t_a)new|| < 10^-7
F. If there is no convergence: go to B, using (u_a)new.
G. If there is convergence: calculate the new X and Y for the next cycle:
X_a = X_{a-1} - t_a p_a’
Y_a = Y_{a-1} - t_a q_a’
Next a: set a = a + 1 and go to A (a new starting u from Y_a).
H. Last step (when a = A):
PLS-regression coefficients B = W (P’W)^-1 Q’
Y = X B + B0 (B0: intercept)
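Steps A to H can be combined into a short NumPy sketch of PLS2 by NIPALS. It makes three assumptions: X and Y are already autoscaled (so B0 = 0), the starting u is the Y column with the largest variance, and nipals_pls2 is a hypothetical name. It is an illustration, not a reference implementation. The check X B = T Q’ confirms that B = W (P’W)^-1 Q’ reproduces the fitted Y directly from X.

```python
import numpy as np

def nipals_pls2(X, Y, A, tol=1e-7, max_iter=500):
    """PLS2 by NIPALS (steps A-H); X and Y are assumed to be autoscaled."""
    n, m = X.shape
    k = Y.shape[1]
    Xa, Ya = X.copy(), Y.copy()
    T, U = np.zeros((n, A)), np.zeros((n, A))
    W, P = np.zeros((m, A)), np.zeros((m, A))
    Q = np.zeros((k, A))
    for a in range(A):
        u = Ya[:, np.argmax(Ya.var(axis=0))].copy()    # A: a Y column as starting u
        t = np.zeros(n)
        for _ in range(max_iter):
            w = Xa.T @ u / (u @ u)                     # B: X-weights
            w /= np.linalg.norm(w)                     #    make w'w = 1
            t_new = Xa @ w                             # C: X-scores
            q = Ya.T @ t_new / (t_new @ t_new)         # D: Y-loadings
            u = Ya @ q / (q @ q)                       # E: new Y-scores
            converged = np.linalg.norm(t_new - t) / np.linalg.norm(t_new) < tol
            t = t_new
            if converged:                              # F: otherwise repeat from B
                break
        p = Xa.T @ t / (t @ t)                         # D: X-loadings
        Xa = Xa - np.outer(t, p)                       # G: deflate X ...
        Ya = Ya - np.outer(t, q)                       #    ... and Y
        T[:, a], U[:, a], W[:, a], P[:, a], Q[:, a] = t, u, w, p, q
    B = W @ np.linalg.solve(P.T @ W, Q.T)              # H: B = W (P'W)^-1 Q'
    return B, T, U, W, P, Q

def autoscale(M):
    """Column-wise centering and scaling to unit variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

rng = np.random.default_rng(5)
latent = rng.normal(size=(25, 2))                      # two hidden factors
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(25, 6))
Y = latent @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(25, 3))
Xs, Ys = autoscale(X), autoscale(Y)

B, T, U, W, P, Q = nipals_pls2(Xs, Ys, A=2)
Y_fit = Xs @ B                                         # fitted Y in autoscaled units
```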
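Summary
Component 1: from X_0 and Y_0 compute the weights w_1, the scores t_1 and u_1, and the loadings p_1 and q_1; then deflate:
X_1 = X_0 - t_1 p_1’
Y_1 = Y_0 - t_1 q_1’
Component 2: from X_1 and Y_1 compute w_2, t_2, u_2, p_2 and q_2; then deflate:
X_2 = X_1 - t_2 p_2’
Y_2 = Y_1 - t_2 q_2’
…
Component a: from X_{a-1} and Y_{a-1} compute w_a, t_a, u_a, p_a and q_a; then deflate:
X_a = X_{a-1} - t_a p_a’ (= E when a = A)
Y_a = Y_{a-1} - t_a q_a’ (= F when a = A)
Result: the score matrices T and U, the weight matrix W and the loading matrices P and Q for X and Y, plus A (the number of components) and the residuals E and F.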