Jianguo (Jeff) Xia
Dr. David Wishart Lab
University of Alberta, Canada
The Fifth International Conference of the Metabolomics Society
Outline
I. Overview of procedures for metabolomic studies
II. Introduction to different data processing & statistical methods
III. MetaboAnalyst – a web service for metabolomic data processing, analysis and annotation
IV. Conclusions & future directions
Data collection
Biological Samples → Spectra
Data processing
Raw Spectra → Data Matrix
Data analysis
Extract important features/patterns
Data interpretation
Mainly a manual process
Requires domain expert knowledge
Tools are emerging:
• Comprehensive metabolite databases
• Network visualization
• Pathway analysis
Features/patterns → biological knowledge
Data processing (I)
Purposes:
• To convert different metabolomic data into data matrices suitable for a variety of statistical analyses
• Quality control: to check for inconsistencies, to deal with missing values, to remove noise
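To make the missing-value step concrete, here is a minimal sketch, assuming a samples x features matrix with NaN marking missing entries. The helper name `impute_missing` and the 50% cut-off are hypothetical; replacing the remaining gaps with half of each feature's smallest positive value is one common convention, not necessarily the one used by the tool described in this talk.

```python
import numpy as np

def impute_missing(X, max_missing_frac=0.5):
    """Filter and impute a samples x features matrix (NaN = missing).

    Hypothetical helper: features missing in more than `max_missing_frac`
    of the samples are dropped; any remaining NaN is replaced by half of
    the smallest observed positive value of that feature.
    """
    X = np.asarray(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_missing_frac
    X = X[:, keep].copy()                    # copy so the input is untouched
    for j in range(X.shape[1]):
        col = X[:, j]                        # view into X: edits write through
        observed = col[~np.isnan(col) & (col > 0)]
        fill = observed.min() / 2 if observed.size else 0.0
        col[np.isnan(col)] = fill
    return X, keep
```

The returned `keep` mask records which feature columns survived filtering, so feature names can be subset to match.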
Data processing (II)
A data matrix in which rows represent samples and columns represent features (concentrations/intensities/areas)
Data normalization
Purposes:
• To remove systematic variation between experimental conditions that is unrelated to the biological differences (e.g. dilution, mass): sample normalization (row-wise)
• To bring the variances of all features close to equal: feature normalization (column-wise)
Sample normalization
• By sum or total peak area
• By a reference compound (e.g. creatinine, internal standard)
• By a reference sample, a.k.a. "probabilistic quotient normalization" (Dieterle F, et al. Anal. Chem. 2006)
• By dry mass, volume, etc.
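The row-wise options above can be sketched as follows, assuming a samples x features matrix of strictly positive values. The function names are hypothetical. The probabilistic quotient version follows the idea in Dieterle et al. (median sample as reference, median feature quotient as the per-sample dilution factor); their recommended preliminary integral normalization is left to the caller here.

```python
import numpy as np

def normalize_by_sum(X, target=1.0):
    """Scale each sample (row) so that its features sum to `target`."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True) * target

def normalize_by_reference_feature(X, ref_index):
    """Divide each sample by one reference feature, e.g. creatinine."""
    X = np.asarray(X, dtype=float)
    return X / X[:, [ref_index]]

def pqn(X, reference=None):
    """Probabilistic quotient normalization (sketch; assumes positive values).

    1. Reference sample: the feature-wise median over all samples, unless given.
    2. Quotients of each sample's features against the reference.
    3. The median quotient per sample estimates its dilution factor.
    4. Each sample is divided by its dilution factor.
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.median(X, axis=0)
    quotients = X / reference
    dilution = np.median(quotients, axis=1, keepdims=True)
    return X / dilution
```

For instance, three urine-like profiles that differ only by a dilution factor collapse onto the same row after `pqn`, while genuine changes in a few metabolites barely move the median quotient.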
Feature normalization
• Log transformation
• Scaling
-- van den Berg RA, et al. BMC Genomics (2006) 7:142
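A minimal sketch of column-wise feature normalization, assuming a samples x features matrix. The scaling variants (autoscaling, Pareto, range) are the textbook definitions discussed by van den Berg et al.; the function names and the `offset` parameter (to avoid taking the log of zero) are hypothetical.

```python
import numpy as np

def log_transform(X, base=10.0, offset=0.0):
    """Element-wise log; `offset` guards against zeros (assumed non-negative data)."""
    X = np.asarray(X, dtype=float)
    return np.log(X + offset) / np.log(base)

def autoscale(X):
    """Mean-center each feature and divide by its standard deviation (unit variance)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-center and divide by the square root of the standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def range_scale(X):
    """Mean-center and divide by the range (max - min) of each feature."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

Autoscaling gives every feature equal weight, which can inflate noisy low-abundance features; Pareto scaling is a compromise that keeps part of the original magnitude differences.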
Data Analysis
Volcano plot
Arranges features along two dimensions: statistical significance (p-values from t-tests) and biological change (fold changes)
The assumption is that features with both statistical and biological significance are more likely to be true positives.
Widely used in microarray and proteomics data analysis
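The two volcano-plot axes can be computed as below, assuming two groups given as samples x features arrays of positive values. The function name and the default cut-offs (2-fold change, p < 0.1) are hypothetical choices for illustration; the t-test is `scipy.stats.ttest_ind` with its default equal-variance assumption.

```python
import numpy as np
from scipy import stats

def volcano(group_a, group_b, fc_threshold=2.0, p_threshold=0.1):
    """Return log2 fold change, -log10 p-value and a 'significant' mask per feature."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    log2_fc = np.log2(a.mean(axis=0) / b.mean(axis=0))      # biological axis (x)
    _, p = stats.ttest_ind(a, b, axis=0)                     # two-sample t-test per feature
    neg_log10_p = -np.log10(p)                               # statistical axis (y)
    significant = (np.abs(log2_fc) >= np.log2(fc_threshold)) & (p < p_threshold)
    return log2_fc, neg_log10_p, significant
```

Plotting `log2_fc` against `neg_log10_p` gives the familiar volcano shape; the `significant` mask picks out the upper-left and upper-right corners.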
PLS-DA
De facto standard for chemometric analysis
A supervised method that uses a multiple linear regression technique to find the direction of maximum covariance between a data set (X) and the class membership (Y)
Extracted features are in the form of latent variables (LVs)
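For two classes, PLS-DA reduces to PLS1 regression on a 0/1 class code. The sketch below uses the NIPALS algorithm, one standard way of fitting PLS; it is a minimal illustration, not the implementation of any particular tool. The function names and the 0.5 decision cut-off are assumptions.

```python
import numpy as np

def pls_da_fit(X, y, n_components=2):
    """Minimal two-class PLS-DA via NIPALS PLS1; y is coded 0/1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered residuals, deflated below
    W, P, T, q = [], [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                        # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                           # latent variable (scores)
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        qa = (yr @ t) / tt                   # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - qa * t                     # deflate y
        W.append(w)
        P.append(p)
        T.append(t)
        q.append(qa)
    W, P, T, q = np.array(W).T, np.array(P).T, np.array(T).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients on centered X
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "T": T, "q": q, "B": B}

def pls_da_predict(model, X, cutoff=0.5):
    """Predict class 1 when the regressed class code exceeds `cutoff`."""
    y_hat = (np.asarray(X, dtype=float) - model["x_mean"]) @ model["B"] + model["y_mean"]
    return (y_hat > cutoff).astype(int)
```

The columns of `T` are the latent variables (mutually orthogonal by construction), which are what a PLS-DA scores plot displays.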
PLS-DA for feature selection
• Variable importance in projection (VIP) score: a weighted sum of squares of the PLS loadings. The weights are based on the amount of explained Y-variance in each dimension.
• An alternative importance measure based on the weighted sum of PLS-regression coefficients. The weights are a function of the reduction of the sums of squares across the number of PLS components.
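The VIP score is commonly written as below (standard textbook form; the notation is assumed here: p features, A components, PLS weight vectors w_a, which the slide loosely calls loadings, scores t_a and Y-loadings q_a). SS_a is the Y-variance explained by component a, and the second line shows why "VIP > 1" is a popular cut-off: the squared VIP scores average to exactly 1.

```latex
\mathrm{VIP}_j = \sqrt{\, p \,
  \frac{\sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}
       {\sum_{a=1}^{A} \mathrm{SS}_a} },
\qquad \mathrm{SS}_a = q_a^2 \, \mathbf{t}_a^{\top} \mathbf{t}_a

\sum_{j=1}^{p} \mathrm{VIP}_j^2
  = p \, \frac{\sum_{a=1}^{A} \mathrm{SS}_a \sum_{j=1}^{p} \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}
               {\sum_{a=1}^{A} \mathrm{SS}_a}
  = p
```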
Overfitting problem
PLS-DA tends to overfit data: it will try to separate classes even when there is no real difference between them! (Westerhuis, J.A., et al. (2008) Assessment of PLSDA cross validation. Metabolomics, 4, 81-89.)
Requires more rigorous validation, for example, using permutations to test the significance of class separation
Permutation tests
1) Use the same data set with its class labels reassigned randomly.
2) Build a new model and measure its performance (B/W, the ratio of the between-class to the within-class sum of squares).
3) Repeat many times to estimate the distribution of the performance measure (which does not necessarily follow a normal distribution).
4) Compare the performance using the original labels with the performance based on the randomly labeled data.
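The four steps above can be sketched as follows. As an assumption for brevity, the "model" is only the first PLS component (its weight vector is proportional to X^T y for centered data), and performance is the B/W ratio of the resulting scores; the function names and `n_perm` default are hypothetical. The empirical p-value counts how often a random labeling separates the classes at least as well as the real one.

```python
import numpy as np

def bw_ratio(X, y):
    """B/W of the first PLS component scores (between- / within-class sum of squares)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                    # first PLS weight: direction of max covariance
    w /= np.linalg.norm(w)
    t = Xc @ w                       # scores on the first latent variable
    grand = t.mean()
    between = within = 0.0
    for label in np.unique(y):
        tg = t[y == label]
        between += tg.size * (tg.mean() - grand) ** 2
        within += ((tg - tg.mean()) ** 2).sum()
    return between / within

def permutation_test(X, y, n_perm=1000, seed=0):
    """Steps 1-4: refit on randomly relabeled data and compare with the real labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = bw_ratio(X, y)
    null = np.array([bw_ratio(X, rng.permutation(y)) for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)   # empirical p-value
    return observed, null, p_value
```

With many more features than samples, even the permuted labels produce B/W values well above zero, which is exactly the overfitting that the permutation distribution is meant to expose.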
Multiple-testing problem
A p-value appropriate to a single-test situation is inappropriate for presenting evidence for a set of changed features.
Adjusting p-values:
• Bonferroni correction
• Holm step-down procedure
Using the false discovery rate (FDR):
• A percentage indicating the expected proportion of false positives among all features predicted to be significant
• More powerful, suitable for multiple testing
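The adjustments named above can be sketched in plain Python; each function returns adjusted p-values in the input order. The slide does not name a specific FDR procedure, so the Benjamini-Hochberg step-up adjustment, the most common choice, is assumed here.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down: the k-th smallest p-value (0-based) is multiplied by (m - k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)   # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

def benjamini_hochberg(pvals):
    """FDR (BH step-up): p * m / rank, made monotone from the largest p-value down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                             # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

On the same input, the BH values are never larger than the Holm values, which in turn never exceed Bonferroni; this is the "more powerful" ordering the slide refers to.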
Significance Analysis of Microarray (and Metabolomics)
A well-established method widely used for identification of differentially expressed genes in microarray experiments
Uses moderated t-tests to compute a statistic dj for each gene j, which measures the strength of the relationship between gene expression (X) and a response variable (Y).
Uses non-parametric statistics, through repeated permutations of the data, to determine whether the expression of any gene is significantly related to the response.
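A simplified two-class sketch of the SAM idea: the moderated statistic d_j divides the mean difference by the feature's pooled standard error plus a fudge factor s0, and permutations of the class labels estimate how many features would pass a threshold by chance. Two simplifications are assumed and named here: s0 is set to the median standard error (real SAM tunes it), and features are called on a plain |d| cut-off rather than SAM's delta rule on ordered statistics. The function names are hypothetical.

```python
import numpy as np

def sam_statistic(X, y, s0=None):
    """Moderated t-like statistic d_j = (mean_1 - mean_0) / (s_j + s0); y coded 0/1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 1], X[y == 0]
    n1, n0 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(axis=0, ddof=1)
                  + (n0 - 1) * b.var(axis=0, ddof=1)) / (n1 + n0 - 2)
    s = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n0))   # per-feature standard error
    if s0 is None:
        s0 = np.median(s)        # ASSUMPTION: simple fudge factor, not SAM's tuning
    d = (a.mean(axis=0) - b.mean(axis=0)) / (s + s0)
    return d, s0

def sam_call(X, y, threshold, n_perm=100, seed=0):
    """Call features with |d| >= threshold and estimate the FDR by permutation."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    d_obs, s0 = sam_statistic(X, y)
    called = np.abs(d_obs) >= threshold
    false_counts = []
    for _ in range(n_perm):
        d_perm, _ = sam_statistic(X, rng.permutation(y), s0=s0)   # s0 held fixed
        false_counts.append(np.sum(np.abs(d_perm) >= threshold))
    expected_false = np.median(false_counts)
    fdr = min(1.0, expected_false / max(int(called.sum()), 1))
    return d_obs, called, fdr
```

The fudge factor s0 keeps features with very small variance from producing huge d values, which is the "moderated" part of the statistic.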
Clustering
Unsupervised learning
Good for data overview
Uses some sort of distance measure to group samples
• PCA
• Heatmap & dendrogram
• SOM & K-means
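Two of the listed methods can be sketched with plain NumPy: PCA via the singular value decomposition of the mean-centered matrix, and k-means with Euclidean distance, here run on the PCA scores. The function names are hypothetical, and the farthest-point initialization is a simplified stand-in for k-means++.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD of the centered matrix: scores, loadings, explained variance ratio."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, Vt[:n_components].T, explained[:n_components]

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with a farthest-point initialization (simplified k-means++)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])          # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                # assign to nearest center
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):     # converged
            break
        centers = new_centers
    return labels, centers
```

Clustering on the first few PCA scores rather than on all features is a common way to reduce noise before grouping samples; a heatmap with dendrogram would instead use hierarchical clustering on the full matrix.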
Classification
Supervised learning
Many traditional multivariate statistical methods are not suitable for high-dimensional data, particularly data with a small sample size and a large number of features
New or improved methods developed in the past decades for microarray data analysis:
• Support vector machine (SVM)
• Random Forests
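As a minimal sketch of the first listed method, assuming a linear kernel: the code trains a soft-margin linear SVM with Pegasos-style stochastic sub-gradient descent. This training algorithm is swapped in for brevity; practical tools usually rely on dedicated solvers and often non-linear kernels. Labels are coded -1/+1, a constant column supplies the bias term, and the names `train_linear_svm`, `predict_linear_svm`, `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_iter=5000, seed=0):
    """Pegasos-style SGD for the hinge-loss objective lam/2 * ||w||^2 + mean hinge."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])  # bias column
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(Xb))             # one random training sample
        eta = 1.0 / (lam * t)                 # decreasing step size
        margin = y[i] * (Xb[i] @ w)           # evaluated before the update
        w *= (1.0 - eta * lam)                # gradient of the regularizer
        if margin < 1:                        # hinge loss active: push towards y_i
            w += eta * y[i] * Xb[i]
    return w

def predict_linear_svm(w, X):
    """Sign of the decision function; returns -1/+1 labels."""
    Xb = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

With few samples and many features, accuracy should be judged by cross-validation or on held-out samples rather than on the training data, for the same overfitting reasons discussed for PLS-DA.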
Microarray data analysis pipeline
A proposed pipeline for metabolomics studies
Bijlsma S, et al. Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation. Anal. Chem. (2006) 78:567–574
MetaboAnalyst
-- A web service for high-throughput metabolomic data processing, analysis and annotation
-- Implements all the methods mentioned above through user-friendly web interfaces
-- www.metaboanalyst.ca
MetaboAnalyst workflow (figure):
• Input data: metabolite concentrations; MS / NMR peak lists; MS / NMR spectra bins; GC/LC-MS raw spectra
• Data processing: peak detection; retention time correction; peak alignment; baseline filtering; data integrity check; missing value imputation
• Data normalization: row-wise normalization (4); column-wise normalization (4)
• Data analysis: univariate analysis (3); dimension reduction (2); feature selection (2); cluster analysis (4); classification (2)
• Data annotation: peak searching (3); pathway mapping
• Download: processed data; PDF report; images
Implementation features
Some usage statistics
Over 1,200 visits since publication (~15 / day)
Current status
Challenges & future directions
• Unbiased and comprehensive survey of the metabolome
  NMR is only able to detect the more abundant compound species (> 1 µmol)
  MS is usually optimized to detect compounds of certain classes
• Systematic classification of compounds (ontology)
• More efficient pathway analysis & visualization
Acknowledgement
• Dr. David Wishart
• Dr. Nick Psychogios
• Nelson Young
Alberta Ingenuity Fund (AIF)
The Human Metabolome Project (HMP)
University of Alberta