Genomica Funzionale · Genomica II, Lezione V... · Integration of metabolomics with other...
Genomica Funzionale (Functional Genomics)
10-14 February 2014
Topics covered
- Metabolic engineering;
- Concept of metabolomics;
- Metabolomic platforms (LC-MS, GC-MS, NMR, ICP-MS, etc.);
- Set-up of a metabolomic protocol and database;
- Applications in the plant and food science field;
- Bioinformatics applied to metabolomic data.
Metabolic engineering of plant volatiles (aromas)
2007
Metabolic engineering of plant volatiles (defense)
Metabolomics in Association Mapping Studies
Glucosinolate pathway
Jansen et al, 12
Mass Spectrometry Imaging
Technique used in mass spectrometry to visualize the spatial distribution of, e.g., compounds, biomarkers, metabolites, peptides or proteins by their molecular masses.
- SIMS
- MALDI
- DESI
Integration of metabolomics with other ‘omics’ fields
• Integrating genomics and metabolomics for engineering plant metabolic pathways - Kirsi-Marja Oksman-Caldentey and Kazuki Saito (2005);
• Proteomic and metabolomic analysis of cardioprotection: Interplay between protein kinase C epsilon and delta in regulating glucose metabolism of murine hearts;
• Plant studies (2005) integrating transcriptomics, proteomics and metabolomics in an effort to enhance the production efficiency of grapes under stress conditions.
How to better investigate metabolically engineered plant products?
2007
In vivo studies
Phenomic data
Fluxomics
SYSTEMS BIOLOGY
Transcriptomic, metabolomic and phenomic profiling
1. Transcriptional profiling: potato oligo array, 42,150 probes
2. Metabolic profiling: GC-ToF-MS and LC-MS
3. Phenotyping: Instron, penetrometer, etc.
Data analysis (transcriptome + metabolome + phenome): mapping software, heatmap clustering, correlation/network biology
Mapping of transcript/metabolite data
MapMan representation of 5,000 gene + metabolite data points in 2 transgenic lines
Metabolome alterations in "Golden" potatoes
[Figure: metabolite classes marked as increased (+) or decreased (-): Krebs cycle, fatty acids, carotenoids, tocopherols, AA, sugars, Arom AA, Org. acids, phytosterols]
Principal Component Analysis (PCA)
• Unsupervised
• Multivariate analysis based on projection methods
• Main tool used in chemometrics
• Extracts and displays the systematic variation in the data
• Each Principal Component (PC) is a linear combination of the original data parameters
• Each successive PC explains the maximum amount of variance possible, not accounted for by the previous PCs
• PCs are orthogonal to each other
• Conversion of the original data leads to two matrices, known as scores and loadings
• The scores (T) represent a low-dimensional plane that closely approximates X. They are linear combinations of the original variables. Each point represents a single sample spectrum.
• A loading plot/scatter plot (P) shows the influence (weight) of the individual X-variables in the model. Each point represents a different spectral intensity.
• The part of X that is not explained by the model forms the residuals (E)
• $X = TP^T + E = t_1 p_1^T + t_2 p_2^T + \dots + E$
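The scores/loadings/residuals decomposition can be illustrated with a minimal sketch. It assumes NumPy, mean-centred data and a singular value decomposition; the function name `pca` and the random data matrix are hypothetical and not taken from the slides.

```python
import numpy as np

def pca(X, n_components):
    """Split X (samples x variables) into scores T, loadings P and residuals E."""
    Xc = X - X.mean(axis=0)                       # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # scores: one row per sample
    P = Vt[:n_components].T                       # loadings: orthonormal columns
    E = Xc - T @ P.T                              # residuals: what the model misses
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per PC
    return T, P, E, explained[:n_components]

# Hypothetical data: 12 samples x 6 metabolite intensities
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))
T, P, E, explained = pca(X, n_components=2)
```

With this sketch the centred data equal `T @ P.T + E`, the columns of `P` are orthogonal, and each successive entry of `explained` is no larger than the previous one.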
Metabolomic Microarray
Principal Component Analysis
Urbanczyk-Wochniak et al., 03
Soft Independent Modeling of Class Analogy (SIMCA)
• Supervised learning method based on PCA
• Construct a separate PCA model for each known class of observations
• The PCA models are used to assign a class to observations of unknown class origin
• Boundaries defined by the 95% class interval
• Recommended for the one-class case, or for classification when no interpretation is needed
CLASS SPECIFIC STUDIES
• One-class problem: only disease observations define a class; control samples are too heterogeneous, for example, due to other variations caused by diseases, gender, age, diet, lifestyle, etc.
• Two-class problem: disease and control observations define two separate classes
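The idea of one PCA model per class can be sketched as follows. This is a simplified illustration, not a full SIMCA implementation: it assigns a sample to the class whose PCA plane it lies closest to and omits the 95% class boundary. The function names and the simulated "control" and "disease" data are hypothetical.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a PCA model (mean and loadings) to the samples of one class."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def residual_distance(x, model):
    """Distance from sample x to the PCA plane of a class model."""
    mean, P = model
    xc = x - mean
    return np.linalg.norm(xc - P @ (P.T @ xc))

def classify(x, models):
    """Assign x to the class with the smallest residual distance."""
    distances = {label: residual_distance(x, m) for label, m in models.items()}
    return min(distances, key=distances.get), distances

# Hypothetical two-class data: 20 samples x 4 variables per class
rng = np.random.default_rng(1)
control = rng.normal(scale=0.1, size=(20, 4))
control[:, 0] += rng.normal(scale=3.0, size=20)   # control varies along variable 0
disease = 5.0 + rng.normal(scale=0.1, size=(20, 4))
disease[:, 1] += rng.normal(scale=3.0, size=20)   # disease varies along variable 1

models = {
    "control": fit_class_model(control, n_components=1),
    "disease": fit_class_model(disease, n_components=1),
}
label, distances = classify(np.array([2.0, 0.0, 0.0, 0.0]), models)
```

In the one-class case only one model would be fitted, and a sample would be accepted or rejected by comparing its residual distance with a class boundary.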
Partial Least Squares Discriminant Analysis (PLS-DA)
• Supervised learning method.
• Recommended for two-class cases instead of using SIMCA.
• Same principles as PCA, but in PLS a second piece of information is used, namely the labeled set of class identities.
• Two data tables are considered, namely X (input data from samples) and Y (containing qualitative values, such as class membership or treatment of samples)
• The quantitative relationship between the two tables is sought.
• $X = TP^T + E$
• $Y = TC^T + F$
• The PLS algorithm maximizes the covariance between the X variables and the Y variables
• PLS models are negatively affected by systematic variation in the X matrix that is not related to the Y matrix (not part of the joint correlation structure between X and Y).
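The covariance-maximizing step can be shown for a single response and a single component. This is a minimal sketch assuming NumPy and mean-centred data; the function name, the 0/1 class coding and the simulated data are assumptions made for illustration.

```python
import numpy as np

def pls_first_component(X, y):
    """First PLS component for a single response y (e.g. class labels coded 0/1)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                  # direction of maximal covariance with y
    w /= np.linalg.norm(w)         # unit-length weight vector
    t = Xc @ w                     # scores
    p = Xc.T @ t / (t @ t)         # X loadings
    c = (yc @ t) / (t @ t)         # y loading
    return w, t, p, c

# Hypothetical data: 20 samples x 5 variables, two classes of 10 samples
rng = np.random.default_rng(2)
y = np.repeat([0.0, 1.0], 10)
X = rng.normal(size=(20, 5))
X[:, 0] += 2.0 * y                 # variable 0 carries the class difference
w, t, p, c = pls_first_component(X, y)
```

Among all unit-length weight vectors, `w` gives the scores with the largest covariance with the class labels, which is the property stated in the bullets above.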
OPLS
• The OPLS method is a recent modification of the PLS method that helps overcome these pitfalls
• The main idea is to separate the systematic variation in X into two parts, one linearly related to Y and one unrelated (orthogonal).
• Comprises two modeled variations, the Y-predictive ($T_p P_p^T$) and the Y-orthogonal ($T_o P_o^T$) components.
• Only the Y-predictive variation is used for modeling Y.
• $X = T_p P_p^T + T_o P_o^T + E$
• $Y = T_p C_p^T + F$
• E and F are the residual matrices of X and Y
• OPLS-DA compared to PLS-DA
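The split into Y-predictive and Y-orthogonal variation can be sketched for one response and one orthogonal component. The steps follow a common single-response formulation of OPLS as an assumption; they are not taken from the slides, and the function name and simulated data are hypothetical.

```python
import numpy as np

def opls_one_orthogonal(X, y):
    """Remove one Y-orthogonal component from X for a single response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)               # predictive weight vector
    t = Xc @ w                           # predictive scores
    p = Xc.T @ t / (t @ t)               # predictive loadings
    w_o = p - (w @ p) * w                # part of p that is orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = Xc @ w_o                       # Y-orthogonal scores
    p_o = Xc.T @ t_o / (t_o @ t_o)       # Y-orthogonal loadings
    X_filtered = Xc - np.outer(t_o, p_o) # X with the orthogonal part removed
    return t_o, p_o, X_filtered

# Hypothetical data: 20 samples x 5 variables, two classes of 10 samples
rng = np.random.default_rng(4)
y = np.repeat([0.0, 1.0], 10)
X = rng.normal(size=(20, 5))
X[:, 0] += 2.0 * y                       # class-related variation
X[:, 1] += rng.normal(scale=3.0, size=20)  # strong variation unrelated to y
t_o, p_o, X_filtered = opls_one_orthogonal(X, y)
```

The orthogonal scores `t_o` have zero covariance with the centred response, so removing `t_o p_o^T` leaves the covariance between X and Y unchanged.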
Hierarchical clustering
Method of cluster analysis which seeks to build a hierarchy of clusters.
"Local" clustering
"Global" clustering
Diretto et al., 10
Correlation coefficients
Pairwise correlation analysis
Heat map, clustering, network
How (and how much) do the data correlate?
Measures of dependence
Ascorbate/CONSTANS
Lysine/WRKY6
Sucrose/Sucrose transporters
4-Aminobutyric acid/Glutamate decarboxylase
Expected correlations
Unintended correlations
Urbanczyk-Wochniak et al., 03
Pairwise correlation analysis
Correlation matrix: the matrix of Pearson product-moment correlation coefficients between each pair of random variables in the random vector X
Carrari et al., 06
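A correlation matrix of this kind can be computed directly from a samples-by-variables table. The sketch below assumes NumPy; the function name `pearson_matrix` and the random data are hypothetical.

```python
import numpy as np

def pearson_matrix(data):
    """Pearson correlation matrix for data arranged as samples x variables."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (data.shape[0] - 1)   # covariance matrix
    std = np.sqrt(np.diag(cov))                         # standard deviations
    return cov / np.outer(std, std)                     # scale to correlations

# Hypothetical data: 10 samples x 4 transcript/metabolite levels
rng = np.random.default_rng(5)
data = rng.normal(size=(10, 4))
rho = pearson_matrix(data)
```

The result is symmetric with ones on the diagonal and agrees with `np.corrcoef(data, rowvar=False)`.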
Interaction Network
VirtualPlant
Libourel and Shachar-Hill, 08
Correlation Networks ("local" biology)
Correlation matrix (lower triangle; columns follow the same order as the rows: CrtI PSY1 PSY2 PDS ZDS CrtISO LCY-b LCY-e CHY1 CHY2 CYP97A CYP97C ZEP NXS Lutein Zea Anthera Viola Neo)
CrtI 1
PSY1 -0.984 1
PSY2 -0.98 0.932 1
PDS 0.994 -0.997 -0.955 1
ZDS 0.194 -0.361 0 0.293 1
CrtISO 0.941 -0.868 -0.988 0.901 -0.148 1
LCY-b -0.9 0.962 0.799 -0.94 -0.6 -0.701 1
LCY-e -0.339 0.498 0.151 -0.434 -0.988 -0.002 0.714 1
CHY1 0.688 -0.552 -0.816 0.61 -0.577 0.892 -0.305 0.447 1
CHY2 -0.28 0.109 0.461 -0.18 0.887 -0.587 -0.164 -0.807 -0.888 1
CYP97A 0.973 -0.998 -0.911 0.992 0.411 0.839 -0.975 -0.544 0.505 -0.054 1
CYP97C 0.982 -0.935 -0.999 0.958 0.008 0.987 -0.804 -0.159 0.811 -0.453 0.914 1
ZEP -0.252 0.08 0.435 -0.151 0.9 -0.564 -0.193 -0.824 -0.875 0.999 -0.025 -0.427 1
NXS 0.187 -0.014 -0.374 0.085 -0.927 0.508 0.257 0.859 0.841 -0.995 -0.04 0.366 -0.997 1
Lutein 0.683 -0.799 -0.528 0.754 0.848 0.396 -0.932 -0.918 -0.058 0.509 0.831 0.536 0.534 -0.588 1
Zea 0.188 -0.356 0.005 0.288 0.999 -0.154 -0.596 -0.987 -0.582 0.889 0.406 0.003 0.902 -0.929 0.845 1
Anthera 0.899 -0.81 -0.967 0.85 -0.253 0.994 -0.621 0.104 0.935 -0.67 0.777 0.965 -0.649 0.597 0.296 -0.258 1
Viola 0.999 -0.983 -0.981 0.994 0.189 0.942 -0.898 -0.335 0.692 -0.284 0.972 0.983 -0.256 0.192 0.679 0.183 0.901 1
Neo 0.983 -0.999 -0.93 0.997 0.366 0.865 -0.964 -0.503 0.547 -0.103 0.998 0.933 -0.074 0.008 0.803 0.361 0.807 0.982 1
ns = node strength = Σ|ρ| of a node / n
NS = network strength = Σ(ns) / n
n = number of nodes
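The two definitions can be worked through on a small example. The sketch below uses the CrtI, PSY1, PSY2 and PDS correlations from the matrix as a four-node network. One point is an assumption: the slide does not say whether a node's self-correlation (ρ = 1) enters the sum, and it is excluded here.

```python
import numpy as np

names = ["CrtI", "PSY1", "PSY2", "PDS"]
rho = np.array([
    [ 1.000, -0.984, -0.980,  0.994],
    [-0.984,  1.000,  0.932, -0.997],
    [-0.980,  0.932,  1.000, -0.955],
    [ 0.994, -0.997, -0.955,  1.000],
])

def node_strengths(rho):
    """ns per node: sum of |rho| to the other nodes, divided by n."""
    n = rho.shape[0]
    abs_rho = np.abs(rho)              # np.abs returns a new array
    np.fill_diagonal(abs_rho, 0.0)     # assumption: drop the self-correlation
    return abs_rho.sum(axis=1) / n

ns = node_strengths(rho)               # one value per node
NS = ns.sum() / len(ns)                # network strength: average of ns
```

For CrtI this gives ns = (0.984 + 0.980 + 0.994) / 4 = 0.7395.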
Network correlation file
pP-I: n=19; NS=0.62
pP-BI: n=20; NS=0.79
pP-YBI: n=24; NS=0.79
Transgenes + carotenoid genes + carotenoids
Legend: negative correlation, positive correlation; gene, carotenoid, transgene
Only correlations |ρ| > 0.6 are shown
Labels: positive hub, negative hub
Correlation network for fishing candidates…
• Node size according to ns
• Only correlations |ρ| > 0.65 are shown
• Edge width according to |ρ|
Legend: gene, metabolite; negative correlation, positive correlation
ns = node strength = AVG |ρ|
n = number of nodes
NS = network strength = AVG ns
Lycopene, β-carotene, total carotenoids
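Turning a correlation matrix into a network comes down to keeping the pairs above a threshold and recording each edge's sign and width. The sketch below is one hypothetical way to do that, using four entries (CrtI, ZDS, LCY-e, Zea) from the correlation matrix and the 0.65 cut-off; the function names and the use of the edge count as a simple hub indicator are assumptions.

```python
import numpy as np

names = ["CrtI", "ZDS", "LCY-e", "Zea"]
rho = np.array([
    [ 1.000,  0.194, -0.339,  0.188],
    [ 0.194,  1.000, -0.988,  0.999],
    [-0.339, -0.988,  1.000, -0.987],
    [ 0.188,  0.999, -0.987,  1.000],
])

def build_edges(names, rho, threshold):
    """Keep pairs with |rho| above the threshold; store sign and width."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = float(rho[i, j])
            if abs(r) > threshold:
                sign = "positive" if r > 0 else "negative"
                edges.append((names[i], names[j], sign, abs(r)))
    return edges

def degrees(names, edges):
    """Number of retained edges per node, a simple indicator of hubs."""
    counts = {name: 0 for name in names}
    for a, b, _, _ in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

edges = build_edges(names, rho, threshold=0.65)
degree = degrees(names, edges)
```

With this cut-off CrtI has no retained edges, while ZDS, LCY-e and Zea form a tightly connected triangle with one positive and two negative edges.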
Correlation network of carotenoids + 100 volatiles (I)
Correlation matrix (same gene and carotenoid matrix as shown for the "local" correlation networks)
Correlation network file
Node classes: Carotenoids, Carotenoid-vol., Terpenoid-vol., Lipid-vol., Aminoacid-vol.
Legend: negative correlation, positive correlation; up-regulation, down-regulation
Only correlations |ρ| > 0.85 are shown
• Node size according to ns
• Node shape according to the metabolic class
Network strength NS = 0.78
Correlation Networks ("global" biology)
Significant modules in a correlation network…
Node classes: Carotenoids, Carotenoid-vol., Terpenoid-vol., Lipid-vol., Aminoacid-vol.
Clusters: rank 1, rank 2, rank 3
Correlation network analysis of the main regulatory "hubs" in "Golden" fruits
• Node size according to ns
• Edge width according to |ρ|
• Only correlations |ρ| > 0.90 are shown
Legend: gene, metabolite, phenotype, enzyme; negative correlation, positive correlation; up-regulation, down-regulation
Labelled nodes: ethylene, ABA, lycopene, β-carotene
ns = node strength = AVG |ρ|
n = number of nodes
NS = network strength = AVG ns
n = 176; NS = 0.89
Network Reconstruction of Cell Metabolism
Leucine and fatty acid metabolism in A. thaliana
Conclusions: Data Integration
- Systems biology data integration increases our knowledge of all the modifications affecting global metabolism
- Bioinformatic/statistical tools can focus attention on the "major players" involved in a biological process
- The era of identifying master nodes has started, but overcoming metabolic bottlenecks is still far off…
Rational design of future crops is still a far away, but possible, DREAM…
THANK YOU and GOOD LUCK!!!!
Contact: [email protected]