2. The PARAFAC model
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
Example: fluorescence data (1)
[Figure: fluorescence landscape of one sample. Axes: excitation wavelength (nm), emission wavelength (nm), intensity.]
Each fluorescence spectrum is a matrix of emission vs excitation wavelengths: $X_i$ (201 × 61)
Example: fluorescence data (2)
• Each spectrum is a linear sum of three components: tryptophan, phenylalanine and tyrosine.
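$$X_i = a_{i1} b_1 c_1^T + a_{i2} b_2 c_2^T + a_{i3} b_3 c_3^T + E_i$$
where $a_{i1}$ is the concentration of tryptophan in sample $i$, $b_1$ is the emission spectrum of pure tryptophan and $c_1$ is the excitation spectrum of pure tryptophan.
[Figure: $X_i$ drawn as the sum of three outer products $a_{ir} b_r c_r^T$ plus $E_i$.]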
Example: fluorescence data (3)
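• Five samples were measured and stacked to give a three-way array: X (5 × 201 × 61).
[Figure: the slabs $X_1, \dots, X_5$ stacked into X (5 samples × 201 emission wavelengths × 61 excitation wavelengths), drawn as the sum of three terms, one per component ($a_1, b_1, c_1$; $a_2, b_2, c_2$; $a_3, b_3, c_3$), plus E. The vector $a_1$ holds the concentration of tryptophan in each sample.]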
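As a minimal sketch of how such an array follows from the trilinear model, the NumPy code below stacks five synthetic samples. The Gaussian band shapes, their peak positions and the random concentrations are made-up placeholders, not the measured tryptophan, tyrosine and phenylalanine profiles; only the array sizes and the wavelength ranges (read off the plot axes) come from the slides.

```python
import numpy as np

def gaussian(axis, centre, width):
    """Hypothetical band shape used as a stand-in for a pure spectrum."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

emission = np.linspace(250, 450, 201)    # 201 emission wavelengths (nm)
excitation = np.linspace(240, 300, 61)   # 61 excitation wavelengths (nm)

# Columns of B and C: one made-up emission / excitation profile per component.
B = np.column_stack([gaussian(emission, c, 20.0) for c in (350.0, 305.0, 285.0)])
C = np.column_stack([gaussian(excitation, c, 8.0) for c in (280.0, 275.0, 258.0)])

rng = np.random.default_rng(0)
A = rng.uniform(0.0, 3.0, size=(5, 3))   # made-up concentrations for 5 samples

# Slab by slab: X_i = a_i1 b_1 c_1^T + a_i2 b_2 c_2^T + a_i3 b_3 c_3^T (E_i = 0 here).
X = np.stack([sum(A[i, r] * np.outer(B[:, r], C[:, r]) for r in range(3))
              for i in range(5)])

# The same array in one call: x_ijk = sum_r a_ir * b_jr * c_kr.
X_einsum = np.einsum('ir,jr,kr->ijk', A, B, C)
```

Both constructions give the same 5 × 201 × 61 array; the `einsum` form is only a more compact way of writing the sum over components.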
Example: fluorescence data (4)
• If we are given a set of fluorescence spectra, X, how can we determine:
– How many chemical species are present?
– Which chemical species are present? What are their pure excitation and emission spectra?
i.e. self-modelling curve resolution (SMCR)
– What is the concentration of each species in each sample?
i.e. (second-order) calibration
• Answer: use the PARAFAC model!
The PARAFAC model (1)
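[Figure: the array X (I × J × K) drawn as the loadings A, B and C plus the residual array E, and equivalently as a sum of R triads $(a_1, b_1, c_1), (a_2, b_2, c_2), \dots, (a_R, b_R, c_R)$ plus E.]
$$x_{ijk} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr} + e_{ijk}$$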
The PARAFAC model (2)
• Loadings
– A (I × R) describes variation in the first mode.
– B (J × R) describes variation in the second mode.
– C (K × R) describes variation in the third mode.
• Residuals
– E (I × J × K) are the model residuals.
[Figure: X (I × J × K) drawn as the loadings A, B and C plus the residual array E.]
Example: fluorescence data (5)
• Loadings
– A (5 × 3) describes the component concentrations.
– B (201 × 3) describes the pure component emission spectra.
– C (61 × 3) describes the pure component excitation spectra.
• Residuals
– E (5 × 201 × 61) describes instrument noise.
[Figure: X (5 samples × 201 emission wavelengths × 61 excitation wavelengths) drawn as the loadings A, B and C plus the residual array E.]
Example: fluorescence data (6)
• A 3-component PARAFAC model describes 99.94% of X.
[Figure, two panels: the columns of B (201 × 3) plotted against emission wavelength (nm), and the columns of C (61 × 3) plotted against excitation wavelength (nm). In both panels the three curves are labelled tryptophan, tyrosine and phenylalanine.]
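The slides do not say how the 99.94% figure is computed. A common convention, assumed here, is the percentage of the sum of squares of X that is reproduced by the model, i.e. $100 \, (1 - \|E\|^2 / \|X\|^2)$. The function name `explained_variation` and the small random arrays are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def explained_variation(X, A, B, C):
    """Percent of the sum of squares of X captured by the trilinear model (assumed definition)."""
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)   # model part: sum of R triads
    E = X - X_hat                                 # residual array
    return 100.0 * (1.0 - np.sum(E ** 2) / np.sum(X ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((8, 3))
C = rng.standard_normal((6, 3))

X_exact = np.einsum('ir,jr,kr->ijk', A, B, C)
X_noisy = X_exact + 0.05 * rng.standard_normal(X_exact.shape)

fit_exact = explained_variation(X_exact, A, B, C)   # no residuals left
fit_noisy = explained_variation(X_noisy, A, B, C)   # noise lowers the percentage
```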
Example: fluorescence data (7)
• The A-loadings describe the relative amounts of species 1 (tryptophan), 2 (tyrosine) and 3 (phenylalanine) in each sample:
• In order to know the absolute amounts, it is necessary to use a standard of known concentrations, i.e. sample 5.
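A (5 × 3)

| Sample | Tryptophan | Tyrosine | Phenylalanine |
|---|---|---|---|
| 1 | 2.7867 | -0.0135 | -0.0042 |
| 2 | 0.0147 | 2.0803 | 0.0006 |
| 3 | 0.0492 | 0.0234 | 1.8358 |
| 4 | 1.6140 | 0.8378 | 0.7990 |
| 5 | 0.9179 | 0.6949 | 0.6945 |

Concentrations (ppm)

| Sample | Tryptophan | Tyrosine | Phenylalanine |
|---|---|---|---|
| 1 | 2.6685 | -0.0853 | -1.8151 |
| 2 | 0.0141 | 13.172 | 0.2714 |
| 3 | 0.0471 | 0.1484 | 785.09 |
| 4 | 1.5455 | 5.3045 | 341.68 |
| 5 (standard) | 0.8790 | 4.4000 | 297.00 |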
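The step from relative to absolute amounts can be sketched as a column-wise rescaling: each column of A is multiplied by the ratio between the known concentration in the standard (sample 5) and that sample's loading. The column order (tryptophan, tyrosine, phenylalanine) is read from the slide; treating the conversion as a single scale factor per component is an assumption about how the concentration values were obtained.

```python
import numpy as np

# A-loadings from the slide: rows are samples 1-5, columns are
# tryptophan, tyrosine, phenylalanine.
A = np.array([
    [2.7867, -0.0135, -0.0042],
    [0.0147,  2.0803,  0.0006],
    [0.0492,  0.0234,  1.8358],
    [1.6140,  0.8378,  0.7990],
    [0.9179,  0.6949,  0.6945],
])

# Known concentrations (ppm) of the standard, sample 5.
standard_ppm = np.array([0.8790, 4.4000, 297.00])

# One scale factor per component (assumed): ppm per unit of loading.
scale = standard_ppm / A[4]

# Estimated concentrations (ppm) of every component in every sample.
concentrations = A * scale
```

Small differences from the tabulated concentrations are expected for the near-zero entries, because the loadings on the slide are rounded to four decimals.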
The PARAFAC formula
• Data array
– X (I × J × K) is matricized into $X^{(I \times JK)}$ (I × JK)
$$X^{(I \times JK)} = A (C \odot B)^T + E^{(I \times JK)}$$
where $\odot$ is the Khatri-Rao matrix product.
• Loadings
– A (I × R) describes variation in the first mode
– B (J × R) describes variation in the second mode
– C (K × R) describes variation in the third mode
• Residuals
– E (I × J × K) is matricized into $E^{(I \times JK)}$ (I × JK)
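The Khatri-Rao product can be illustrated on a tiny example. The sketch below assumes the usual definition (the column-wise Kronecker product) and an unfolding in which the second-mode index runs fastest along the columns; the helper name `khatri_rao` and the small matrices are made up for illustration.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: row k*J + j holds C[k, r] * B[j, r]."""
    K, R = C.shape
    J = B.shape[0]
    return np.einsum('kr,jr->kjr', C, B).reshape(K * J, R)

C = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # K = 2, R = 2
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])          # J = 3, R = 2
Z = khatri_rao(C, B)                # shape (K*J, R) = (6, 2)

# Each column of Z is the Kronecker product of the matching columns of C and B.
columns_match = all(np.allclose(Z[:, r], np.kron(C[:, r], B[:, r]))
                    for r in range(2))

# Noise-free check of the matricized formula for a small made-up A (I = 2).
A = np.array([[1.0, -1.0],
              [0.5,  2.0]])
X = np.einsum('ir,jr,kr->ijk', A, B, C)            # x_ijk = sum_r a_ir b_jr c_kr
X_unfolded = X.transpose(0, 2, 1).reshape(2, 6)    # columns ordered as k*J + j
formula_holds = np.allclose(X_unfolded, A @ Z.T)
```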
PCA vs PARAFAC
| PCA | PARAFAC |
|---|---|
| Bilinear model: $X = AB^T + E$ | Trilinear model: $X^{(I \times JK)} = A(C \odot B)^T + E^{(I \times JK)}$ |
| Components are calculated sequentially in order of importance. | Components are calculated simultaneously in random order. |
| Solution has rotational freedom. | Solution is unique (i.e. not possible to rotate factors without losing fit). |
| Orthogonal, i.e. $B^T B = I$ | Not (usually) orthogonal. |
Rotational freedom
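• The bilinear model $X = AB^T + E$ contains rotational freedom. There are many sets of loadings (and scores) which give exactly the same residuals, E. For any invertible matrix $R$:
$$
\begin{aligned}
X &= AB^T + E \\
  &= A R R^{-1} B^T + E \\
  &= A^* B^{*T} + E \qquad (A^* = AR,\ B^{*T} = R^{-1} B^T)
\end{aligned}
$$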
• This model is not unique – there are many different sets of loadings which give the same % fit.
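A quick numerical check of this argument, as a sketch with made-up random loadings and an arbitrary invertible matrix `R` (any other invertible choice would do):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))      # made-up scores
B = rng.standard_normal((4, 2))      # made-up loadings
X = A @ B.T                          # model part of X (E left out)

R = np.array([[2.0, 1.0],
              [0.5, 1.5]])           # arbitrary invertible matrix (det = 2.5)

A_star = A @ R                       # A* = A R
B_star_T = np.linalg.inv(R) @ B.T    # B*^T = R^-1 B^T

same_fit = np.allclose(A_star @ B_star_T, X)      # identical model part, so identical E
loadings_differ = not np.allclose(A_star, A)      # but different loadings
```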
PARAFAC solution is unique
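• The trilinear model $X^{(I \times JK)} = A(C \odot B)^T + E$ is said to be unique, because it is not possible to rotate the loadings without changing the residuals, E:
$$
\begin{aligned}
X^{(I \times JK)} &= A (C \odot B)^T + E \\
  &= A R R^{-1} (C \odot B)^T + E \\
  &= A^* (C^* \odot B^*)^T + E^*
\end{aligned}
$$
In general $R^{-1}(C \odot B)^T$ cannot be written as $(C^* \odot B^*)^T$ for any loadings $B^*$ and $C^*$, so the rotated loadings $A^* = AR$ can only be fitted with different residuals, $E^* \neq E$.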
• This is why PARAFAC is able to find the correct fluorescence profiles – because the unique solution is close to the true solution.
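One way to see numerically why a rotation breaks the trilinear structure is sketched below. It relies on the fact that a Khatri-Rao column, reshaped to K × J, is the rank-one matrix $c_r b_r^T$; after mixing with $R^{-1}$ the reshaped rows are no longer rank one. The random loadings, the matrix `Rot` and the helper names are made up for illustration.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: row k*J + j holds C[k, r] * B[j, r]."""
    K, R = C.shape
    J = B.shape[0]
    return np.einsum('kr,jr->kjr', C, B).reshape(K * J, R)

def second_singular_values(Zt, K, J):
    """Second singular value of each row of Zt after reshaping it to K x J."""
    return np.array([np.linalg.svd(row.reshape(K, J), compute_uv=False)[1]
                     for row in Zt])

rng = np.random.default_rng(1)
I, J, K, R = 5, 7, 6, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

Zt = khatri_rao(C, B).T              # (C kr B)^T, shape (R, K*J)
X = A @ Zt                           # noise-free unfolded array

Rot = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.5, 0.0, 1.0]])    # arbitrary invertible matrix (det = 1.125)
A_star = A @ Rot
Zt_star = np.linalg.inv(Rot) @ Zt    # the only partner that keeps E unchanged

same_fit = np.allclose(A_star @ Zt_star, X)
original_rank_one = second_singular_values(Zt, K, J).max()       # about 0
rotated_rank_one = second_singular_values(Zt_star, K, J).min()   # clearly above 0
```

Because the rows of `Zt_star` are not rank one when reshaped, they cannot be produced by any pair of loading matrices B* and C*, which is the content of the uniqueness claim for this example.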
Spot the difference!
[Figure, two panels: PCA loadings and PARAFAC loadings, plotted on the same horizontal axis (0 to 250).]
Alternating least squares (ALS)
• How to estimate the PCA model $X = AB^T + E$?
• Step 0 - Initialize B
• Step 1 - Estimate A using least squares:
$$\min_A \|X - AB^T\|^2 \;\Rightarrow\; A = XB(B^TB)^{-1}$$
• Step 2 - Estimate B using least squares:
$$\min_B \|X^T - BA^T\|^2 \;\Rightarrow\; B = X^TA(A^TA)^{-1}$$
• Step 3 - Check for convergence - if not, go to Step 1.
Each update must reduce the sum-of-squares, $\|E\|^2$.
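The ALS steps for the bilinear model can be sketched in NumPy as follows. The function name `bilinear_als`, the stopping rule (relative decrease of the sum of squares below `tol`) and the synthetic test data are assumptions made for illustration; the two update formulas are the least-squares solutions of Steps 1 and 2.

```python
import numpy as np

def bilinear_als(X, R, n_iter=200, tol=1e-12, seed=0):
    """Fit X = A B^T + E by alternating least squares (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[1], R))          # Step 0: initialize B
    sse = []                                          # ||E||^2 after each sweep
    for _ in range(n_iter):
        A = X @ B @ np.linalg.inv(B.T @ B)            # Step 1: A = X B (B^T B)^-1
        B = X.T @ A @ np.linalg.inv(A.T @ A)          # Step 2: B = X^T A (A^T A)^-1
        sse.append(float(np.sum((X - A @ B.T) ** 2)))
        # Step 3: stop when the sum of squares no longer decreases noticeably.
        if len(sse) > 1 and sse[-2] - sse[-1] <= tol * sse[-2]:
            break
    return A, B, sse

# Made-up data: a rank-2 matrix plus a little noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
X = X + 0.01 * rng.standard_normal(X.shape)

A, B, sse = bilinear_als(X, R=2)
```

Since each step solves its least-squares problem exactly, the recorded values in `sse` should not increase from one sweep to the next. For this well-separated example the final value should agree with the best rank-2 fit given by a truncated SVD, although the loadings themselves are only determined up to a rotation.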
Three different unfoldings – the formula is symmetric
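$$X^{(I \times JK)} = A(C \odot B)^T + E^{(I \times JK)}$$
or
$$X^{(J \times KI)} = B(A \odot C)^T + E^{(J \times KI)}$$
or
$$X^{(K \times IJ)} = C(B \odot A)^T + E^{(K \times IJ)}$$
[Figure: the array X unfolded in three ways, giving $X^{(I \times JK)}$, $X^{(J \times KI)}$ and $X^{(K \times IJ)}$.]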
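The three unfoldings can be checked numerically on a small noise-free array. This is a sketch with made-up random loadings; the column ordering of each unfolding (which index runs fastest) is an assumption chosen to match the order of the factors in each Khatri-Rao product.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: row p*nQ + q holds P[p, r] * Q[q, r]."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, P.shape[1])

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 5, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X = np.einsum('ir,jr,kr->ijk', A, B, C)             # noise-free trilinear array

X_i_jk = X.transpose(0, 2, 1).reshape(I, K * J)     # columns ordered as (k, j)
X_j_ki = X.transpose(1, 0, 2).reshape(J, I * K)     # columns ordered as (i, k)
X_k_ij = X.transpose(2, 1, 0).reshape(K, J * I)     # columns ordered as (j, i)

checks = [
    np.allclose(X_i_jk, A @ khatri_rao(C, B).T),    # first-mode unfolding
    np.allclose(X_j_ki, B @ khatri_rao(A, C).T),    # second-mode unfolding
    np.allclose(X_k_ij, C @ khatri_rao(B, A).T),    # third-mode unfolding
]
```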
How is the PARAFAC model calculated?
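• How to estimate the model $X^{(I \times JK)} = A(C \odot B)^T + E$?
• Step 0 - Initialize B & C
• Step 1 - Estimate A, with $Z = C \odot B$:
$$\min_A \|X^{(JK \times I)} - ZA^T\|^2$$
• Step 2 - Estimate B in same way, with $Z = A \odot C$:
$$\min_B \|X^{(KI \times J)} - ZB^T\|^2$$
• Step 3 - Estimate C in same way, with $Z = B \odot A$:
$$\min_C \|X^{(IJ \times K)} - ZC^T\|^2$$
• Step 4: Check for convergence. If not, go to Step 1.
Here $X^{(JK \times I)}$, $X^{(KI \times J)}$ and $X^{(IJ \times K)}$ are the transposes of the unfoldings $X^{(I \times JK)}$, $X^{(J \times KI)}$ and $X^{(K \times IJ)}$.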
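A minimal NumPy sketch of these ALS steps is given below. The function name `parafac_als`, the random initialization, the stopping rule and the synthetic data are assumptions for illustration, and `np.linalg.lstsq` is used to solve each least-squares subproblem instead of an explicit formula. It is not a replacement for a tested PARAFAC implementation.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: row p*nQ + q holds P[p, r] * Q[q, r]."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, P.shape[1])

def parafac_als(X, R, n_iter=500, tol=1e-10, seed=0):
    """Fit a three-way PARAFAC model by alternating least squares (sketch)."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, R))                   # Step 0: initialize B and C
    C = rng.standard_normal((K, R))
    # Transposed unfoldings; the row order matches the Khatri-Rao products below.
    X_jk_i = X.transpose(2, 1, 0).reshape(K * J, I)   # rows ordered as (k, j)
    X_ki_j = X.transpose(0, 2, 1).reshape(I * K, J)   # rows ordered as (i, k)
    X_ij_k = X.transpose(1, 0, 2).reshape(J * I, K)   # rows ordered as (j, i)
    sse = []
    for _ in range(n_iter):
        # Step 1: min_A ||X_jk_i - Z A^T||^2 with Z = khatri_rao(C, B)
        A = np.linalg.lstsq(khatri_rao(C, B), X_jk_i, rcond=None)[0].T
        # Step 2: min_B ||X_ki_j - Z B^T||^2 with Z = khatri_rao(A, C)
        B = np.linalg.lstsq(khatri_rao(A, C), X_ki_j, rcond=None)[0].T
        # Step 3: min_C ||X_ij_k - Z C^T||^2 with Z = khatri_rao(B, A)
        C = np.linalg.lstsq(khatri_rao(B, A), X_ij_k, rcond=None)[0].T
        resid = X_ij_k - khatri_rao(B, A) @ C.T
        sse.append(float(np.sum(resid ** 2)))
        # Step 4: stop when the sum of squares no longer decreases noticeably.
        if len(sse) > 1 and sse[-2] - sse[-1] <= tol * sse[-2]:
            break
    return A, B, C, sse

# Made-up data: three components plus a little noise.
rng = np.random.default_rng(42)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (5, 8, 6))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0) + 0.01 * rng.standard_normal((5, 8, 6))

A, B, C, sse = parafac_als(X, R=3)
```

The list `sse` records the sum of squared residuals after each full sweep, so it can be inspected to confirm that the fit does not get worse between iterations.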
Good initialization is sometimes important
Initialization methods
– random numbers (do this ten times and compare models)
– use another method to give rough estimate (e.g. DTLD, MCR)
– use sensible guesses (e.g. elution profiles are Gaussian)
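[Figure: response surface of $\|E\|^2$. Starting from one initialization of B & C, ALS reaches the good solution; starting from another initialization, B* & C*, ALS ends in a local minimum.]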
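The first suggestion, random initialization repeated ten times, can be sketched as below. The compact fitting routine `fit_parafac`, the fixed number of sweeps and the synthetic data are hypothetical; the point is only the pattern of running several starts and comparing the final sums of squares.

```python
import numpy as np

def kr(P, Q):
    """Khatri-Rao (column-wise Kronecker) product."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, P.shape[1])

def fit_parafac(X, R, seed, n_iter=100):
    """Compact ALS sketch; returns the loadings and the final sum of squared residuals."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)                 # a different random start per seed
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X_a = X.transpose(2, 1, 0).reshape(K * J, I)
    X_b = X.transpose(0, 2, 1).reshape(I * K, J)
    X_c = X.transpose(1, 0, 2).reshape(J * I, K)
    for _ in range(n_iter):
        A = np.linalg.lstsq(kr(C, B), X_a, rcond=None)[0].T
        B = np.linalg.lstsq(kr(A, C), X_b, rcond=None)[0].T
        C = np.linalg.lstsq(kr(B, A), X_c, rcond=None)[0].T
    sse = float(np.sum((X_c - kr(B, A) @ C.T) ** 2))
    return (A, B, C), sse

# Made-up three-component data with a little noise.
rng = np.random.default_rng(42)
true_loadings = [rng.standard_normal((n, 3)) for n in (5, 8, 6)]
X = np.einsum('ir,jr,kr->ijk', *true_loadings) + 0.01 * rng.standard_normal((5, 8, 6))

# Ten random starts; keep the model with the smallest sum of squares.
runs = [fit_parafac(X, R=3, seed=s) for s in range(10)]
sse_values = np.array([sse for _, sse in runs])
best_loadings, best_sse = runs[int(np.argmin(sse_values))]
```

If most starts end at nearly the same value in `sse_values`, that agreement is some evidence that the best run is not a local minimum; a few clearly larger values would point to starts that got stuck.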
Conclusions (1)
• The PARAFAC model decomposes a three-way array into three sets of loadings – one for each ‘mode’. Each set of loadings describes the variation in that mode, e.g. differences in concentration, changes in time, spectral profiles etc.
• PARAFAC components are calculated together and have no particular order. PARAFAC components are not orthogonal and cannot be rotated.
• PARAFAC can be used for curve resolution and for calibration.
Conclusions (2)
• Some data sets have a chemical structure which is particularly suitable for the PARAFAC model, e.g. fluorescence spectroscopy.
• The PARAFAC model can also be used for four-way, five-way, N-way etc. data by simply using more sets of loadings.
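As a sketch of what "more sets of loadings" means for four-way data, the code below builds a noise-free four-way array from four made-up loading matrices and checks the corresponding matricized form, which nests one more Khatri-Rao product. The fourth loading matrix `D`, the sizes and the column ordering of the unfolding are assumptions for illustration.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: row p*nQ + q holds P[p, r] * Q[q, r]."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, P.shape[1])

rng = np.random.default_rng(0)
I, J, K, L, R = 3, 4, 5, 2, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
D = rng.standard_normal((L, R))      # loadings for the additional fourth mode

# x_ijkl = sum_r a_ir * b_jr * c_kr * d_lr
X4 = np.einsum('ir,jr,kr,lr->ijkl', A, B, C, D)

# First-mode unfolding with columns ordered as (l, k, j).
X4_unfolded = X4.transpose(0, 3, 2, 1).reshape(I, L * K * J)
Z = khatri_rao(D, khatri_rao(C, B))  # nested Khatri-Rao product, shape (L*K*J, R)
formula_holds = np.allclose(X4_unfolded, A @ Z.T)
```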