Combining ASCA and mixed models to analyse high dimensional designed data
M. Martin¹, P. De Tullio², B. Govaerts¹
¹ Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Université catholique de Louvain (UCL), Belgium
² Laboratoire de Chimie Pharmaceutique, Université de Liège (ULg), Liège 1, Belgium
January 30, 2018
Introduction ASCA-related methods MiMoHD3 In summary Acknowledgments References
High Dimensional Experimental Design data
Challenging data analysis
Application to life science data:
- From -omics sciences: genomics, transcriptomics, metabolomics
- High dimensional: multivariate; often m variables > n samples
- Biological variability, instrumental noise/artifacts
- Multicollinearity between variables
Nontrivial Design Of Experiments (DOE):
- e.g. longitudinal, multi-centre, cross-over studies, etc.
- Presence of random factors (day, lab variation, ...)
Table of contents
High Dimensional designed data analysis
ASCA methodology
Extension to mixed models: description & application
Methods needed to analyse HD designed data
Multivariate projection methods
- PCA, ICA, (O)PLS, ...
- Disadvantages:
  - Limited to simple experimental designs (e.g. comparison of 2 groups)
  - Little statistical modelling and few tests
Statistical regression methods
- Depends on the response dimension:
  - y (1 × m): linear or logistic regression, ANOVA, mixed models
  - Y (n × m) with m < n: MANOVA, multivariate GLM
- But often m ≫ n ⇒ need to combine dimension reduction and statistical modelling.
ASCA methodology
⇒ Combine dimension reduction methods and statistical modelling [Jansen et al., 2005]:
ANOVA-Simultaneous Component Analysis (ASCA)
Example: crossed ANOVA II model:
$y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$
STEP 1: Parallel ANOVA decomposition of Y
Y (n × m) is decomposed into effect matrices:
$Y = M_0 + M_A + M_B + M_{AB} + E$
STEP 2: Visualisation of the (Residual-Augmented) effect matrices
- ASCA: $M_f = T_f P_f'$
- APCA: $M_f + E = T_f P_f'$
STEP 3: Effect importance quantification based on Frobenius norms $\|M_f\|^2$
STEP 4: Global measure of effect significance based on permutation tests
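Steps 1 to 3 can be sketched in a few lines for a balanced crossed two-factor design. This is a minimal illustration, not the authors' implementation: the function name `asca_two_factor`, the level-mean estimation of the effects and the percentage scale of the importance measure are assumptions made here, and Step 4 (permutation tests) is left out.

```python
import numpy as np

def asca_two_factor(Y, a, b):
    """Balanced crossed two-factor ASCA sketch (Steps 1 to 3)."""
    Y = np.asarray(Y, dtype=float)
    a = np.asarray(a)
    b = np.asarray(b)
    grand = Y.mean(axis=0)  # one row of M_0

    # Step 1: parallel ANOVA, one decomposition per column, using level means
    # (valid for balanced designs only).
    MA = np.zeros_like(Y)
    MB = np.zeros_like(Y)
    MAB = np.zeros_like(Y)
    for i in np.unique(a):
        MA[a == i] = Y[a == i].mean(axis=0) - grand
    for j in np.unique(b):
        MB[b == j] = Y[b == j].mean(axis=0) - grand
    for i in np.unique(a):
        for j in np.unique(b):
            cell = (a == i) & (b == j)
            MAB[cell] = Y[cell].mean(axis=0) - grand - MA[cell][0] - MB[cell][0]
    E = Y - grand - MA - MB - MAB
    effects = {"A": MA, "B": MB, "AB": MAB, "E": E}

    # Step 2: PCA of each effect matrix via SVD, M_f = T_f P_f'.
    pca = {}
    for name in ("A", "B", "AB"):
        U, s, Vt = np.linalg.svd(effects[name], full_matrices=False)
        pca[name] = (U * s, Vt.T)  # scores T_f, loadings P_f

    # Step 3: squared Frobenius norm of each matrix as a percentage of the
    # centred total sum of squares.
    total = np.sum((Y - grand) ** 2)
    importance = {k: 100 * np.sum(M ** 2) / total for k, M in effects.items()}
    return effects, pca, importance
```

For APCA, the SVD in Step 2 would be applied to `effects[name] + effects["E"]` instead of `effects[name]`.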
Limitations with classic ASCA and new developments
1. Balanced ANOVA designs with fixed factors
⇒ Generalisation to unbalanced data with fixed and random factors
2. Parallel ANOVA does not take into account the correlation between the responses
⇒ Prior PCA reduction & back-transformations
3. Implementing permutation tests is challenging for advanced DOE [Anderson and Braak, 2003]
⇒ Use of alternative test strategies: Likelihood ratio tests & bootstrap
APCA+ for unbalanced designs (Thiel et al. [2017], Guisset et al. [Submitted])
Mixed Models for High Dimensional Designed Data (MiMoHD3)
The Metabiose repeatability datasets
Context: urine and serum ¹H-NMR spectral data from control and endometriosis patients
Statistical perspective of spectral reproducibility/repeatability and quality control: not yet well studied in metabolomics
Unbalanced design for the Serum dataset
3 factors:
- 1 fixed: groups (G; 2)
- 2 random: patients (P; 7/group) & repetitions over weeks (W; 3)
Main research questions:
- Compare the groups
- Quantify the variability of the repetitions and the patients
- Test the significance of these random/fixed effects
[Figure: Experimental design]
Step 0: Prior PCA dimension reduction (1)
Step 0: Prior PCA dimension reduction (2)
Transform the highly correlated response matrix into a reduced number of orthogonal components with almost no information loss
PCA dimension reduction on the response matrix: $Y = T_C P_C' + F$
Keep the first C = 11 Principal Components (PC), with $\sum_{C} \mathrm{var}(\mathrm{PC}) \geq 99\%$
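The reduction step can be sketched as follows. The function name `pca_reduce`, the column-centring and the SVD-based computation are assumptions made for this illustration; the 99% threshold is the one quoted above.

```python
import numpy as np

def pca_reduce(Y, var_threshold=0.99):
    """Keep the first C principal components whose cumulative variance reaches the threshold."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                  # column-centre the response matrix
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)      # variance share of each PC
    cumulative = np.cumsum(explained)
    # First index at which the cumulative share reaches the threshold.
    C = int(np.searchsorted(cumulative, var_threshold)) + 1
    C = min(C, len(s))                       # guard against rounding at threshold = 1
    T_C = U[:, :C] * s[:C]                   # scores, n x C
    P_C = Vt[:C].T                           # loadings, m x C
    return T_C, P_C, C, explained
```

The residual matrix F is then `Yc - T_C @ P_C.T`, and the columns of `T_C` are the responses $t_c$ modelled in the next step.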
[Figure: Scores plots - Serum, PC1 (66.17%) vs PC2 (15.52%). One panel coloured by Group (Endo, Control), one panel coloured by Patient (210E, 211E, 214E, 219C, 221C, 223E, 226C, 241E, 248C, 254E, 258C, 261C, 265E, 266C).]
Step 1: Parallel mixed models on $T_{(n \times C)}$
General framework for mixed models
Fixed + random factors vs linear regression (only 1 source of random variation)
The mixed model for one response tc can be written as:
Variance components = $\sigma^2_P + \sigma^2_R + \sigma^2$
Drop the hypothesis of independence between the samples ⇒ model advanced designs
Coding: Sum coding for fixed and dummy coding for random effects
Typical applications: Multicentre study, Multilevel data, Repeated data, etc.
Parameters are estimated with the Restricted Maximum Likelihood (REML) method
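For reference, a generic form of such a model, consistent with the three variance components listed above, could be written as below. The design matrices $X$, $Z_P$, $Z_R$ and the random-effect vectors $u_{P,c}$, $u_{R,c}$ are notation assumed here, not taken from the slide.

```latex
t_c = X\beta_c + Z_P u_{P,c} + Z_R u_{R,c} + \epsilon_c, \qquad
u_{P,c} \sim N(0, \sigma^2_{P,c} I), \quad
u_{R,c} \sim N(0, \sigma^2_{R,c} I), \quad
\epsilon_c \sim N(0, \sigma^2_c I)
```

Under this sketch, $X\beta_c$ carries the fixed group effect, while the patient and repetition effects enter only through their variances.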
Step 2: PCA decomposition of (RA) effect matrices
PCA on fixed/random pure effect matrices
[Figure: Scores and loadings plots of the pure effect matrices - Serum. Scores: Group effect, PC1 (100%) vs PC2 (0%); Repetition effect, PC1 (97.61%) vs PC2 (2.39%); Patient effect, PC1 (75.88%) vs PC2 (18.2%). Loadings: Group effect (PC1); Repetition effect (PC1, PC2); Patient effect (PC1, PC2).]
PCA on fixed/random Residual-Augmented effect matrices
In APCA: E is added to $M_f$
But which variance components should be added for mixed models?
Solution: RA effect matrices based on the ANOVA F-tests (Expected Mean Squares ratio)
- RA Group effect: $M_G + M_P + E$
- RA Patient effect: $M_P + E$
- RA Repetition effect: $M_R + E$
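The augmentation rule listed above can be sketched as a lookup table followed by a PCA. The names `RA_TERMS` and `ra_pca` are hypothetical, and the effect matrices are assumed to be already estimated and column-centred, so no extra centring is applied.

```python
import numpy as np

# Terms summed to build each residual-augmented (RA) matrix, as listed above.
RA_TERMS = {"G": ("G", "P", "E"), "P": ("P", "E"), "R": ("R", "E")}

def ra_pca(effects, name):
    """PCA (via SVD) of the RA matrix for one effect.

    effects: dict mapping "G", "P", "R", "E" to n x C matrices.
    """
    M_ra = sum(effects[k] for k in RA_TERMS[name])
    U, s, Vt = np.linalg.svd(M_ra, full_matrices=False)
    explained = 100 * s ** 2 / np.sum(s ** 2)  # percentage per component
    return U * s, Vt.T, explained              # scores, loadings, variance shares
```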
[Figure: Scores plots of the RA effect matrices - Serum. RA Group effect, PC1 (66.86%) vs PC2 (15.67%), coloured by Group (Endo, Control); RA Repetition effect, PC1 (58.33%) vs PC2 (22.02%), coloured by Repetition (1, 2, 3); RA Patient effect, PC1 (61.58%) vs PC2 (18.17%), coloured by Patient (210E, 211E, 214E, 219C, 221C, 223E, 226C, 241E, 248C, 254E, 258C, 261C, 265E, 266C).]
Steps 3 & 4: Global measure of factor importance/significance
Step 3: Quantification of effect importance
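For each response $t_c$, $c = 1, \ldots, C$:
- Random effects: variance components $\sigma^2_{P,c}$, $\sigma^2_{R,c}$, $\sigma^2_c$
- Fixed effects [Nakagawa and Schielzeth, 2013]: $\sigma^2_{G,c} = \mathrm{var}(\beta_{G,c} x_G)$
For all $t_c$ responses:
- Total variance: $\sigma^2_{tot} = \sum_{c=1}^{C} (\sigma^2_{G,c} + \sigma^2_{P,c} + \sigma^2_{R,c} + \sigma^2_c)$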
[Figure: Variance components percentage per medium (Serum, Urine) for Group, Patient, Repetition and Residuals]

Effect importance percentage:

             Serum   Urine
Group        14.32    2.4
Patient      64.03   91.48
Repetition    0.6     0
Residuals    21.05    6.12
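The aggregation behind such percentages can be sketched as follows. The function names and the numbers in the example are hypothetical, and the use of the population variance for the fixed effect is an assumption (the sample variance would be an equally reasonable choice).

```python
from statistics import pvariance

def fixed_effect_variance(beta, x):
    """Variance over the samples of the fixed part beta * x (population variance assumed)."""
    return pvariance([beta * xi for xi in x])

def importance_percentages(components):
    """components: one dict per response t_c, mapping source name -> variance."""
    totals = {}
    for comp in components:
        for source, value in comp.items():
            totals[source] = totals.get(source, 0.0) + value
    grand_total = sum(totals.values())  # sigma^2_tot
    return {source: 100 * v / grand_total for source, v in totals.items()}

# Hypothetical values for C = 2 responses, with sum coding (+1 / -1) for the group.
x_group = [1, 1, -1, -1]
components = [
    {"Group": fixed_effect_variance(0.5, x_group), "Patient": 1.0,
     "Repetition": 0.0, "Residuals": 0.25},
    {"Group": fixed_effect_variance(0.0, x_group), "Patient": 0.5,
     "Repetition": 0.0, "Residuals": 0.0},
]
percentages = importance_percentages(components)
```

Summing over the responses before dividing means that components with larger variance weigh more in the global percentages.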
Step 4: Global measure of effect significance
Likelihood Ratio Test (LRT)
- Test the significance of a fixed/random effect in a (mixed) linear model
- Compare the likelihoods L between nested models
- LRT statistic: $2[\log(L_{full}) - \log(L_{null})] \sim_{H_0} \chi^2_{df}$
Global LRT (GLRT)
- The test statistic for a fixed/random effect matrix
- GLRT statistic: $2[\sum_{c=1}^{C} (\log(L_{full,c}) - \log(L_{null,c}))] \sim_{H_0} \chi^2_{C \times df}$
Assess the significance of an effect based on:
- The $\chi^2$ distribution with known df (fixed effects)
- Bootstrap simulations (fixed/random effects)
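The two ways of assessing the GLRT can be sketched with the standard library alone. The function names are hypothetical, the log-likelihoods are assumed to come from already fitted full and null models, and the bootstrap p-value convention used here (share of simulated statistics at least as large as the observed one) is one common choice rather than necessarily the authors'.

```python
import math

def glrt(loglik_full, loglik_null):
    """Global LRT: twice the summed log-likelihood differences over the C responses."""
    return 2.0 * sum(f - n for f, n in zip(loglik_full, loglik_null))

def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and apply
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    q = math.erfc(math.sqrt(y))
    a = 0.5
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def bootstrap_p_value(observed, simulated):
    """Share of statistics simulated under H0 that are >= the observed statistic."""
    return sum(s >= observed for s in simulated) / len(simulated)
```

With C responses and df parameters tested per response, the chi-square reference would be evaluated as `chi2_sf(glrt_value, C * df)`.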
Histograms of bootstrapped GLRT (Nsim = 2000)
[Figure: histograms of the bootstrapped GLRT with kernel density, chi2 distribution and its 0.95 quantile]
- Fixed Group effect - Serum: true GLRT = 8.93, chi2 distribution with df = 11, p-val = 0.72
- Fixed Group effect - Urine: true GLRT = 15.61, chi2 distribution with df = 12, p-val = 0.365
Histograms of bootstrapped GLRT (Nsim = 2000)
[Figure: histograms of the bootstrapped GLRT with kernel density]
- Random Patient effect - Serum: true GLRT = 112.44, p-val = 0 (***)
- Random Repetition effect - Serum: true GLRT = 2.01, p-val = 0.84
- Random Patient effect - Urine: true GLRT = 850.15, p-val = 0 (***)
- Random Repetition effect - Urine: true GLRT = 0, p-val = 1
In summary
MiMoHD3, a combination of mixed models & multivariate projection methods
Innovative extension and generalisation in the ASCA(+) framework
Enables the modelling of unbalanced designs with random factors
Takes into account the correlation between the response variables
Global test of effect significance
Quantification and comparison of the mixed variability sources
Targeted applications: repeatability/reproducibility studies & longitudinal data
Acknowledgments
References
Marti Anderson and Cajo Ter Braak. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation, 73(2):85–113, 2003. doi: 10.1080/00949650215733. URL http://dx.doi.org/10.1080/00949650215733.
Séverine Guisset, Bernadette Govaerts, and Manon Martin. Comparison of PARAFASCA, AComDim, and AMOPLS approaches in the multivariate GLM modelling of multi-factorial designs. Chemometrics and Intelligent Laboratory Systems, Submitted.
Jeroen J. Jansen, Huub C. J. Hoefsloot, Jan van der Greef, Marieke E. Timmerman, Johan A. Westerhuis, and Age K. Smilde. ASCA: analysis of multivariate data obtained from an experimental design. Journal of Chemometrics, 19(9):469–481, 2005. ISSN 1099-128X. doi: 10.1002/cem.952. URL http://dx.doi.org/10.1002/cem.952.
Shinichi Nakagawa and Holger Schielzeth. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2):133–142, 2013. ISSN 2041-210X. doi: 10.1111/j.2041-210x.2012.00261.x. URL http://dx.doi.org/10.1111/j.2041-210x.2012.00261.x.
Michel Thiel, Baptiste Féraud, and Bernadette Govaerts. ASCA+ and APCA+: Extensions of ASCA and APCA in the analysis of unbalanced multifactorial designs. Journal of Chemometrics, 31(6):e2895, 2017. ISSN 1099-128X. doi: 10.1002/cem.2895. URL http://dx.doi.org/10.1002/cem.2895.