Exploring metabolomic data from designed experiments using ANOVA Multiblock Orthogonal Partial Least Squares
Julien Boccard
School of Pharmaceutical Sciences
University of Geneva, University of Lausanne
Experimental design in metabolomics
Several factors can be evaluated simultaneously
e.g. dose, time, genotype
Many experimental setups generate multivariate data
e.g. spectra, omics, multi-component responses
Each factor has different levels
e.g. quantitative, qualitative, ordinal
Full factorial designs allow the systematic evaluation of
- main effects
- interactions between factors
• MANOVA is not able to handle underdetermined systems (k>n)
• PCA mixes the different sources of variation
ANOVA and multivariate data analysis
How to account for the study design and covariances between variables?
Total variation = Between-group variation + Within-group variation
ASCA
ANOVA-PCA
ANOVA-PLS
ANOVA-TP
AComDim
$X = X_{\mu} + X_{\alpha} + X_{\beta} + X_{\alpha\beta} + X_{Res}$
Existing strategies associate ANOVA decomposition of the
experimental matrix with projection methods
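The decomposition these strategies start from can be sketched as follows for a balanced two-factor design. This is a minimal NumPy illustration, not the author's implementation: the function name `anova_decompose`, the random data and the 2 x 3 layout with 6 replicates per cell are assumptions made for the example.

```python
import numpy as np

def anova_decompose(X, a, b):
    """Split X (n x k) into grand-mean, A, B, AB and residual matrices.

    Assumes a balanced full factorial design with factor labels a and b.
    """
    X = np.asarray(X, dtype=float)
    a, b = np.asarray(a), np.asarray(b)
    X_mu = np.tile(X.mean(axis=0), (X.shape[0], 1))   # grand mean, repeated per row
    Xc = X - X_mu
    X_a, X_b, X_ab = (np.zeros_like(X) for _ in range(3))
    for lev in np.unique(a):                          # main effect A: level means
        X_a[a == lev] = Xc[a == lev].mean(axis=0)
    for lev in np.unique(b):                          # main effect B: level means
        X_b[b == lev] = Xc[b == lev].mean(axis=0)
    rest = Xc - X_a - X_b
    for la in np.unique(a):                           # interaction: cell means of the rest
        for lb in np.unique(b):
            cell = (a == la) & (b == lb)
            X_ab[cell] = rest[cell].mean(axis=0)
    X_res = rest - X_ab                               # within-cell residuals
    return X_mu, X_a, X_b, X_ab, X_res

# Illustrative data only: 2 x 3 design, 6 replicates per cell, 5 variables
rng = np.random.default_rng(0)
a = np.repeat([0, 1], 18)
b = np.tile(np.repeat([0, 1, 2], 6), 2)
X = rng.normal(size=(36, 5))
X_mu, X_a, X_b, X_ab, X_res = anova_decompose(X, a, b)
```

The five matrices add back up to X, and in a balanced design the effect matrices are orthogonal to the residuals.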
ANOVA Multiblock OPLS workflow
Experimental matrix X (n x k)
ANOVA decomposition: $X = X_A + X_B + X_{AB} + X_{Res}$ (each submatrix n x k)
SVD of each pure effect submatrix ($X_A$, $X_B$, $X_{AB}$)
For each main effect and interaction term:
Extraction of levels barycentres from pure effect
submatrices by Singular Value Decomposition
ASCA
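The SVD step can be illustrated on a synthetic pure effect submatrix. The sketch below assumes a hypothetical 3-level factor with 4 replicates per level; all names and data are illustrative. Because every row of a pure effect submatrix equals its level mean, the SVD scores take one distinct value per level, which is what the slide calls the level barycentres.

```python
import numpy as np

rng = np.random.default_rng(1)
levels = np.repeat([0, 1, 2], 4)            # hypothetical 3-level factor, 4 replicates each
level_means = rng.normal(size=(3, 10))
level_means -= level_means.mean(axis=0)     # effects sum to zero across levels
X_effect = level_means[levels]              # pure effect submatrix (12 x 10)

U, s, Vt = np.linalg.svd(X_effect, full_matrices=False)
rank = int(np.sum(s > 1e-10 * s[0]))        # at most (number of levels - 1)
scores = U[:, :rank] * s[:rank]             # identical rows within a level: the barycentres
loadings = Vt[:rank].T                      # variable contributions to each component
```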
ANOVA Multiblock OPLS workflow
Experimental submatrices: $X_A + X_{Res}$, $X_B + X_{Res}$, $X_{AB} + X_{Res}$, $X_{Res}$
For each main effect and interaction term:
Computation of experimental submatrices
Pure effect submatrices + Residuals
Hypothesis:
The structure of a significant effect will emerge from noise
ANOVA-PCA ANOVA-TP AComDim
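The hypothesis can be illustrated with a toy experimental submatrix, in the spirit of ANOVA-PCA. This is a sketch under stated assumptions: a hypothetical 2-level factor, a deliberately large synthetic effect and Gaussian noise. When the effect is strong relative to the residuals, the first principal component of "effect + residuals" separates the levels.

```python
import numpy as np

rng = np.random.default_rng(2)
groups = np.repeat([0, 1], 10)                 # hypothetical 2-level factor, 10 replicates
effect = np.where(groups[:, None] == 0, -2.5, 2.5) * np.ones((20, 20))
noise = rng.normal(size=(20, 20))

# Residuals: remove group means from the noise so they carry no effect structure
X_res = noise.copy()
for g in (0, 1):
    X_res[groups == g] -= noise[groups == g].mean(axis=0)

X_block = effect + X_res                       # experimental submatrix: effect + residuals

U, s, Vt = np.linalg.svd(X_block, full_matrices=False)
pc1 = U[:, 0] * s[0]                           # PC1 scores of the submatrix
```

With a weak or absent effect, PC1 would instead follow the residual structure and the levels would overlap.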
ANOVA Multiblock OPLS workflow
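$X = t_{p\alpha} p_{p\alpha}^T + t_{p\beta} p_{p\beta}^T + t_{p\alpha\beta} p_{p\alpha\beta}^T + t_o p_o^T + E$
$Y = t_{p\alpha} q_{p\alpha}^T + t_{p\beta} q_{p\beta}^T + t_{p\alpha\beta} q_{p\alpha\beta}^T + F$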
Joint analysis of the submatrices by multiblock OPLS
Prediction of level barycentres (Y) based on experimental submatrices
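One common way to analyse several blocks jointly in kernel-based multiblock models is to sum scale-normalised linear kernels, so that no block dominates through sheer size. The slides do not state how the blocks are combined, so the sketch below is an assumption, and the function name `fuse_blocks` and the random stand-in blocks are hypothetical.

```python
import numpy as np

def fuse_blocks(blocks):
    """Return Frobenius-normalised linear kernels and their sum."""
    kernels = {}
    for name, Xi in blocks.items():
        K = Xi @ Xi.T                               # n x n association matrix of the block
        kernels[name] = K / np.linalg.norm(K, "fro")
    return kernels, sum(kernels.values())

# Random stand-ins for the four experimental submatrices (illustration only)
rng = np.random.default_rng(3)
n, k = 12, 30
blocks = {name: rng.normal(size=(n, k)) for name in ("A+Res", "B+Res", "AB+Res", "Res")}
kernels, K_sum = fuse_blocks(blocks)
```

A supervised model would then be fitted on `K_sum` against Y; that step is not shown here.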
AMOPLS model outputs
Supervised multiblock OPLS model
Joint decomposition based on predictive/orthogonal component(s)
Balance between block saliences
Assign each latent structure to a factor
Assess statistical relevance by comparison with ANOVA residuals
Permutation tests (effect-to-residuals ratio)
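The logic of a permutation test on an effect-to-residuals ratio can be sketched for a single factor. Here the ratio is computed from ANOVA sums of squares as a simplification; how the ratio is derived from the AMOPLS model itself is not specified in the slides. Function names, data and the number of permutations are assumptions.

```python
import numpy as np

def effect_to_residuals(X, labels):
    """Ratio of effect to residual sum of squares for a one-factor layout."""
    Xc = X - X.mean(axis=0)
    X_eff = np.zeros_like(Xc)
    for lev in np.unique(labels):
        X_eff[labels == lev] = Xc[labels == lev].mean(axis=0)
    return np.sum(X_eff ** 2) / np.sum((Xc - X_eff) ** 2)

def permutation_p_value(X, labels, n_perm=199, seed=0):
    """Empirical p-value: share of label permutations with a ratio >= the observed one."""
    rng = np.random.default_rng(seed)
    observed = effect_to_residuals(X, labels)
    null = np.array([effect_to_residuals(X, rng.permutation(labels))
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Synthetic data with a strong 2-level effect on every variable
rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 10)) + 3.0 * labels[:, None]
p_value = permutation_p_value(X, labels)
```

Shuffling the labels breaks the link between design and data, so the permuted ratios describe what the ratio looks like when there is no effect.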
Figure: block saliences (λp1, λp2, λo1)
Observations scores (tp1, tp2, tp3, tp4, to1, to2)
Check for sample groupings
Variables loadings (pp1, pp2, pp3, pp4)
Highlight relevant biomarkers
Paraquat Neurotoxicity - Dataset
Dataset
UHPLC-TOF/MS metabolic profiles of 3D aggregating rat brain cells
36 observations x 1’397 variables (m/z @ RetTime)
2-factor design
(i) Maturation (2 levels): Immature / Mature
(ii) Paraquat (3 levels): Control / Paraquat 0.5 µM / Paraquat 1 µM
Figure: design grid, Maturation (Immature, Mature) x Paraquat (Control, Paraquat 0.5 µM, Paraquat 1 µM)
Paraquat Neurotoxicity - PCA
Figure: PCA score plot, t1 (32.5%) vs. t2 (10.3%), with six groups: Immature Ctrl, Immature 0.5 µM, Immature 1 µM, Mature Ctrl, Mature 0.5 µM, Mature 1 µM
The principal components are not easily interpretable
The factors under study affect the metabolic profiles
The groups are not well-separated
The effects of Maturation and Paraquat are mixed up
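PCA is blind to the experimental design: its components maximise total variance, whatever the source. The minimal sketch below computes PCA by SVD; the function name `pca_svd` is hypothetical, and the random matrix is only a stand-in with the same number of observations as the dataset.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA by SVD of the column-centred matrix; no design information is used."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # share of total variance per component
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained

rng = np.random.default_rng(5)
X = rng.normal(size=(36, 50))                    # random stand-in, 36 observations
scores, loadings, explained = pca_svd(X)
```

Nothing in this computation refers to Maturation or Paraquat, which is why a single component can carry a mixture of both effects.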
Figure: contribution of each ANOVA submatrix (Maturation, Paraquat, Interaction, Residuals) to components to and tp3
Paraquat Neurotoxicity - AMOPLS
4-term ANOVA decomposition:
Maturation 25.8%
Paraquat 10.6%
Interaction 4.5%
Residuals 59.1%
6-component AMOPLS model
5 predictive (2 main effects + 1 interaction) + 1 orthogonal
Goodness of fit
R2: 0.903
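Assuming the four percentages of the ANOVA decomposition are shares of the total sum of squares of the centred data (the slides do not define them explicitly), each one can be written as a ratio of squared Frobenius norms. In a balanced design the submatrices are mutually orthogonal, so the shares add up to 100%, which the reported values do.

```latex
\[
\%\,\mathrm{SS}_{i} = 100 \times \frac{\lVert X_{i} \rVert_F^{2}}{\lVert X - X_{\mu} \rVert_F^{2}},
\qquad i \in \{\text{Maturation},\ \text{Paraquat},\ \text{Interaction},\ \text{Residuals}\}
\]
\[
25.8 + 10.6 + 4.5 + 59.1 = 100.0
\]
```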
Figure: contribution of each ANOVA submatrix (Maturation, Paraquat, Interaction, Residuals) to components tp1, tp2, tp4 and tp5
tp1: Maturation
tp2: Paraquat
tp3: Interaction
tp4: Interaction
tp5: Paraquat
to1: Residuals
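Assigning a component to a term amounts to asking which block contributes most to it. The sketch below assumes the contribution of a block is measured by the quadratic form t^T K t on its kernel, normalised over blocks. The slides do not give the formula, so this is an illustration only, with hypothetical names and two toy rank-one blocks.

```python
import numpy as np

def block_contributions(t, kernels):
    """Relative contribution of each block kernel to the score vector t."""
    raw = np.array([t @ K @ t for K in kernels.values()])
    return dict(zip(kernels, raw / raw.sum()))

# Two hypothetical rank-one blocks built from orthonormal directions
u1 = np.array([1.0, -1.0, 1.0, -1.0]) / 2.0
u2 = np.array([1.0, 1.0, -1.0, -1.0]) / 2.0
kernels = {"Maturation": np.outer(u1, u1), "Paraquat": np.outer(u2, u2)}

t = 0.9 * u1 + np.sqrt(1 - 0.9 ** 2) * u2      # score vector mostly aligned with u1
contrib = block_contributions(t, kernels)
assigned = max(contrib, key=contrib.get)       # block with the largest contribution
```

In this toy case the component is assigned to the first block, which accounts for 81% of the contributions.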
Permutation tests
Global model (p<5%)
Effect-to-residuals ratio
Maturation (p<1%)
Paraquat (p<1%)
Interaction (p>5%)
Paraquat Neurotoxicity - Maturation effect
tp1: Scores related to Maturation
Figure: tp1 scores (Immature vs. Mature) and pp1 loadings
Maturation has a strong
impact on metabolic profiles
Immature Mature
Acylcarnitines, Tyrosine dipeptides
Neurotensin, Hydroxyvitamin D
Mature cells exhibit more elaborate phenotypes
Shift from tissue growth to neuronal cell differentiation
Mature Immature
Gangliosides (GA1, GM2), Arachidonic acid metabolism, Glutamate dipeptides, Precursors of serotonin, Sphingomyelin
Paraquat Neurotoxicity - Treatment effect
tp2: Scores related to Paraquat
Figure: score plot of tp2 (Paraquat) vs. to; groups: Control, 0.5 µM, 1 µM
pp2: Dihydroquinone QH2, Peroxidation, Ubiquinone-1, Prostaglandin metabolites, Catecholamine precursors, Glutamate pathway, Prostamides
Intoxicated cells show altered
metabolic profiles
Shift from neurotransmission
to oxidative stress
tp5: Paraquat treatment induces a dose-related response
Conclusions
Supervised analysis of ANOVA submatrices
Joint analysis of the effects
Easy interpretation
Contribution of each ANOVA submatrix
Objective evaluation of the effects
AMOPLS combines ANOVA decomposition of the sources of
variation, OPLS interpretability and multiblock modelling
Broad field of application
(biomarker discovery, method development, etc.)
Acknowledgements
Prof. Serge Rudaz
Dr. Fabienne Jeanneret
Dr. David Tonoli
Prof. Florianne Tschudi-Monnet
Dr. Jenny Sandström von Tobel