genomica)funzionale) genomica ii- lezione v... · integration of metabolomics with other...

Genomica Funzionale

10-14 February 2014

Topics covered

-  Metabolic Engineering;

-  Concept of Metabolomics;

-  Metabolomic platforms (LC-MS, GC-MS, NMR, ICP-MS etc);

-  Set up of a metabolomic protocol and database;

-  Applications in plant/food science field;

-  Bioinformatics applied to Metabolomic data.

Metabolic engineering of plant volatiles (aromas)

2007

Metabolic engineering of plant volatiles (defense)

Metabolomics in Association Mapping Studies

Glucosinolate pathway

Jansen et al, 12

Mass Spectrometry Imaging

Technique used in mass spectrometry to visualize the spatial distribution of e.g. compounds, biomarkers, metabolites, peptides or proteins by their molecular masses.

-  SIMS
-  MALDI
-  DESI

Integration of metabolomics with other ‘omics’ fields

•  Integrating genomics and metabolomics for engineering plant metabolic pathways - Kirsi-Marja Oksman-Caldentey and Kazuki Saito (2005);

•  Proteomic and metabolomic analysis of cardioprotection: Interplay between protein kinase C epsilon and delta in regulating glucose metabolism of murine hearts;

•  Plant studies (2005) to integrate transcriptomics, proteomics and metabolomics in an effort to enhance production efficiency under stressful conditions of grapes.

How to better investigate metabolically engineered plant products?

2007

In vivo studies

Phenomic data

Fluxomics

SYSTEMS BIOLOGY

Transcriptomic, metabolomic and phenomic profiling

1. Transcriptional Profiling: potato oligo array, 42,150 probes

2. Metabolic Profiling: GC-ToF-MS and LC-MS

3. Phenotyping: Instron, Penetrometer etc

Data analysis (Transcriptome + Metabolome + Phenome): mapping software, heatmap clustering, correlation/network biology

Mapping of transcript/metabolite data

MapMan representation of 5,000 gene + metabolite data points in 2 transgenic lines

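Metabolome alterations in "Golden" potatoes

[Figure: metabolite classes increased (+) or decreased (-) around the Krebs cycle. Labels: fatty acids, carotenoids, tocopherols, amino acids (AA), sugars, aromatic amino acids (Arom AA), organic acids, phytosterols]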

Principal Component Analysis (PCA)

•  Unsupervised
•  Multivariate analysis based on projection methods
•  Main tool used in chemometrics
•  Extracts and displays the systematic variation in the data
•  Each Principal Component (PC) is a linear combination of the original data parameters
•  Each successive PC explains the maximum amount of variance possible, not accounted for by the previous PCs
•  PCs are orthogonal to each other
•  Conversion of the original data leads to two matrices, known as scores and loadings
•  The scores (T) represent a low-dimensional plane that closely approximates X. They are linear combinations of the original variables. Each point in a scores plot represents a single sample spectrum.
•  A loadings plot (P) shows the influence (weight) of the individual X-variables in the model. Each point represents a different spectral intensity.
•  The part of X that is not explained by the model forms the residuals (E)
•  $X = TP^{T} + E = t_1 p_1^{T} + t_2 p_2^{T} + \dots + E$
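The decomposition into scores, loadings and residuals can be sketched with a singular value decomposition. This is a minimal illustration, assuming NumPy is available and that columns are mean-centred first; the function name `pca` and the toy matrix are hypothetical and not part of the lecture.

```python
import numpy as np

def pca(X, n_components=2):
    """Return scores T, loadings P, residuals E and explained variance,
    so that the mean-centred data equal T @ P.T + E."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centre each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores: one row per sample
    P = Vt[:n_components].T                      # loadings: one row per variable
    E = Xc - T @ P.T                             # part of X not explained by the model
    explained = (s ** 2) / np.sum(s ** 2)        # fraction of variance per PC
    return T, P, E, explained[:n_components]

# Hypothetical toy data: 5 samples (rows) x 3 spectral intensities (columns)
X = np.array([[2.0, 4.1, 1.0],
              [3.0, 6.2, 0.9],
              [4.0, 7.9, 1.2],
              [5.0, 10.1, 1.1],
              [6.0, 12.0, 0.8]])
T, P, E, explained = pca(X, n_components=2)
```

Plotting the columns of `T` against each other gives a scores plot (one point per sample), and plotting the columns of `P` gives a loadings plot (one point per variable).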

Metabolomic Microarray

Principal Component Analysis

Urbanczyk-Wochniak et al., 03

Soft Independent Modeling of Class Analogy (SIMCA)

•  Supervised learning method based on PCA

•  Constructs a separate PCA model for each known class of observations

•  The PCA models are used to assign a class to observations of unknown class origin

•  Class boundaries defined by the 95% confidence interval

•  Recommended for use in the one-class case, or for classification if no interpretation is needed

CLASS SPECIFIC STUDIES

•  One-class problem: Only disease observations define a class; control samples are too heterogeneous, for example, due to other variations caused by diseases, gender, age, diet, lifestyle, etc.

•  Two-class problem: Disease and control observations define two separate classes
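The per-class modelling idea can be sketched as follows. This is a simplified SIMCA-style illustration, not a full implementation: it assumes NumPy, fits one PCA model per class, and uses the empirical 95th percentile of the training residual distances as a stand-in for the statistical class boundary. All function names and the toy data are hypothetical.

```python
import numpy as np

def residual_distance(X, mean, P):
    """Distance of each sample to the PCA subspace of one class model."""
    Xc = np.atleast_2d(X) - mean
    E = Xc - Xc @ P @ P.T                  # residual after projection on the loadings
    return np.sqrt((E ** 2).sum(axis=1))

def fit_class_model(X, n_components=1):
    """Fit a separate PCA model for one class of observations."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components].T
    # Assumption: empirical 95th percentile replaces the usual statistical limit
    limit = np.percentile(residual_distance(X, mean, P), 95)
    return {"mean": mean, "P": P, "limit": limit}

def classify(x, models):
    """Return every class whose boundary contains x (possibly none)."""
    return [name for name, m in models.items()
            if residual_distance(x, m["mean"], m["P"])[0] <= m["limit"]]

# Hypothetical two-variable training data for two classes
disease = [[0, 0.1], [1, 0.9], [2, 2.1], [3, 2.9], [4, 4.1], [5, 4.9]]
control = [[0, 10.1], [1, 8.9], [2, 8.1], [3, 6.9], [4, 6.1], [5, 4.9]]
models = {"disease": fit_class_model(disease), "control": fit_class_model(control)}

inside = classify([2.0, 2.05], models)     # close to the disease model
outside = classify([2.0, 6.0], models)     # far from both models
```

In the one-class case only the `"disease"` model would be fitted, and a sample falling outside its boundary is simply reported as not belonging to the class.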

Partial Least Squares Discriminant Analysis (PLS-DA)

•  Supervised learning method.
•  Recommended for two-class cases instead of using SIMCA.
•  Same principles as PCA. But in PLS, a second piece of information is used, namely, the labeled set of class identities.
•  Two data tables are considered, namely X (input data from samples) and Y (containing qualitative values, such as class belonging, treatment of samples)
•  The quantitative relationship between the two tables is sought.
•  $X = TP^{T} + E$
•  $Y = TC^{T} + F$
•  E and F are the residual matrices of X and Y
•  The PLS algorithm maximizes the covariance between the X variables and the Y variables
•  PLS models are negatively affected by systematic variation in the X matrix that is not related to the Y matrix (not part of the joint correlation structure between X and Y).
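The two model equations can be illustrated with a small NIPALS-style PLS for a single class-label column. This is a sketch under stated assumptions: NumPy is available, Y is one 0/1 column, and the function name `pls_nipals` and the toy data are hypothetical.

```python
import numpy as np

def pls_nipals(X, y, n_components=1):
    """PLS for one response column: Xc = T P^T + E and yc = T C^T + F."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    E = X - X.mean(axis=0)                 # centred X, deflated in place of residuals
    F = y - y.mean(axis=0)                 # centred Y
    T, P, C = [], [], []
    for _ in range(n_components):
        w = E.T @ F                        # direction of maximal covariance with Y
        w /= np.linalg.norm(w)
        t = E @ w                          # scores
        tt = (t.T @ t).item()
        p = E.T @ t / tt                   # X loadings
        c = F.T @ t / tt                   # Y loadings
        E = E - t @ p.T                    # remove the modelled part from X
        F = F - t @ c.T                    # remove the modelled part from Y
        T.append(t); P.append(p); C.append(c)
    return np.hstack(T), np.hstack(P), np.hstack(C), E, F

# Hypothetical toy data: 6 samples x 3 variables, classes 0 and 1
X = np.array([[1.0, 0.2, 3.0], [1.2, 0.1, 3.1], [0.9, 0.3, 2.9],
              [3.0, 0.2, 1.0], [3.1, 0.1, 1.2], [2.9, 0.3, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])
T, P, C, E, F = pls_nipals(X, y, n_components=1)

# Discriminant step: predicted Y above 0.5 is assigned to class 1
y_hat = y.mean() + (T @ C.T).ravel()
predicted_class = (y_hat > 0.5).astype(int)
```

For real use, an established implementation (for example a PLS regression routine from a statistics library) with cross-validation would be preferable to this hand-written loop.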

OPLS

•  The OPLS method is a recent modification of the PLS method to help overcome its pitfalls
•  Main idea: separate the systematic variation in X into two parts, one linearly related to Y and one unrelated (orthogonal).
•  Comprises two modeled variations, the Y-predictive ($T_p P_p^{T}$) and the Y-orthogonal ($T_o P_o^{T}$) components.
•  Only the Y-predictive variation is used for modeling of Y.
•  $X = T_p P_p^{T} + T_o P_o^{T} + E$
•  $Y = T_p C_p^{T} + F$
•  E and F are the residual matrices of X and Y
•  OPLS-DA compared to PLS-DA
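The split into Y-predictive and Y-orthogonal variation can be sketched for the simplest case: one response column and one orthogonal component. This is an illustrative sketch assuming NumPy; the function name `opls_one_orthogonal` and the toy data (a class signal plus an unrelated factor `o`) are hypothetical.

```python
import numpy as np

def opls_one_orthogonal(X, y):
    """Remove one Y-orthogonal component, then fit one Y-predictive component."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean(axis=0)

    w = Xc.T @ yc                          # Y-related weight vector
    w /= np.linalg.norm(w)
    t = Xc @ w
    p = Xc.T @ t / (t.T @ t)

    w_o = p - (w.T @ p) * w                # part of the loading not aligned with w
    w_o /= np.linalg.norm(w_o)
    t_o = Xc @ w_o                         # Y-orthogonal scores
    p_o = Xc.T @ t_o / (t_o.T @ t_o)       # Y-orthogonal loadings
    X_filtered = Xc - t_o @ p_o.T          # X with the orthogonal variation removed

    w_p = X_filtered.T @ yc
    w_p /= np.linalg.norm(w_p)
    t_p = X_filtered @ w_p                 # Y-predictive scores
    p_p = X_filtered.T @ t_p / (t_p.T @ t_p)
    c_p = yc.T @ t_p / (t_p.T @ t_p)
    E = X_filtered - t_p @ p_p.T           # residuals of X
    F = yc - t_p @ c_p.T                   # residuals of Y
    return {"t_p": t_p, "p_p": p_p, "c_p": c_p,
            "t_o": t_o, "p_o": p_o, "E": E, "F": F}

# Hypothetical toy data: class signal y plus a factor o unrelated to the classes
y = np.array([0, 0, 0, 1, 1, 1.0])
o = np.array([1, -1, 0, 1, -1, 0.0])
X = np.column_stack([2 * y + o, 0.5 * y + 2 * o, y])
model = opls_one_orthogonal(X, y)
```

The orthogonal scores `t_o` carry variation in X that has no covariance with Y, which is why removing them can make the predictive component easier to interpret.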

Hierarchical clustering

Method of cluster analysis which seeks to build a hierarchy of clusters.

"Local" Clustering

"Global" Clustering

Diretto et al., 10
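Building a hierarchy of clusters can be sketched with a small agglomerative procedure. The example below is a minimal illustration using average linkage and the distance 1 - ρ; both choices are assumptions, as the slides do not state which linkage or distance was used. The four correlation values are taken from the carotenoid correlation matrix shown later in these slides.

```python
from itertools import combinations

def average_linkage(a, b, dist):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

def hierarchical_clustering(dist):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # pick the two closest clusters and merge them
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], dist))
        d = average_linkage(a, b, dist)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

names = ["PSY1", "PDS", "CYP97A", "Neo"]
corr = [[1.0, -0.997, -0.998, -0.999],
        [-0.997, 1.0, 0.992, 0.997],
        [-0.998, 0.992, 1.0, 0.998],
        [-0.999, 0.997, 0.998, 1.0]]
dist = [[1 - r for r in row] for row in corr]   # assumption: distance = 1 - rho
merges = hierarchical_clustering(dist)
```

With this distance, PDS, CYP97A and Neo merge early, while the negatively correlated PSY1 joins last. Using 1 - |ρ| instead would group strongly anti-correlated variables together.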

Correlation coefficients

Pairwise correlation analysis

Heat-map, clustering, network

How (and how much) do data correlate?

Measures of dependence

Ascorbate/CONSTANS; Lysine/WRKY6

Sucrose/Sucrose Transporters; 4-Aminobutyric Acid/Glutamate Decarboxylase

Expected correlations

Unintended correlations

Urbanczyk-Wochniak et al., 03

Pairwise correlation analysis

Correlation matrix

Carrari et al., 06

The matrix of Pearson product-moment correlation coefficients between each of the random variables in the random vector X
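A correlation matrix of this kind can be computed directly from the measured profiles. The following is a minimal pure-Python sketch; the variable names and the profile values are hypothetical and only illustrate the calculation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(profiles):
    """Return variable names and the full symmetric matrix of pairwise correlations."""
    names = list(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical profiles: one value per sample (e.g. per line or time point)
profiles = {
    "gene_A": [1.0, 2.0, 3.0, 4.0],
    "metabolite_B": [2.0, 4.0, 6.0, 8.5],
    "metabolite_C": [4.0, 3.0, 2.0, 1.0],
}
names, R = correlation_matrix(profiles)
```

Here `gene_A` and `metabolite_C` are perfectly anti-correlated, while `gene_A` and `metabolite_B` are strongly positively correlated. With only a handful of samples, such coefficients should be read with caution.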

Interaction Network

VirtualPlant

Libourel and Shachar-Hill, 08

Correlation Networks ("local" biology)

        CrtI PSY1 PSY2 PDS ZDS CrtISO LCY-b LCY-e CHY1 CHY2 CYP97A CYP97C ZEP NXS Lutein Zea Anthera Viola Neo
CrtI    1
PSY1    -0.984 1
PSY2    -0.98 0.932 1
PDS     0.994 -0.997 -0.955 1
ZDS     0.194 -0.361 0 0.293 1
CrtISO  0.941 -0.868 -0.988 0.901 -0.148 1
LCY-b   -0.9 0.962 0.799 -0.94 -0.6 -0.701 1
LCY-e   -0.339 0.498 0.151 -0.434 -0.988 -0.002 0.714 1
CHY1    0.688 -0.552 -0.816 0.61 -0.577 0.892 -0.305 0.447 1
CHY2    -0.28 0.109 0.461 -0.18 0.887 -0.587 -0.164 -0.807 -0.888 1
CYP97A  0.973 -0.998 -0.911 0.992 0.411 0.839 -0.975 -0.544 0.505 -0.054 1
CYP97C  0.982 -0.935 -0.999 0.958 0.008 0.987 -0.804 -0.159 0.811 -0.453 0.914 1
ZEP     -0.252 0.08 0.435 -0.151 0.9 -0.564 -0.193 -0.824 -0.875 0.999 -0.025 -0.427 1
NXS     0.187 -0.014 -0.374 0.085 -0.927 0.508 0.257 0.859 0.841 -0.995 -0.04 0.366 -0.997 1
Lutein  0.683 -0.799 -0.528 0.754 0.848 0.396 -0.932 -0.918 -0.058 0.509 0.831 0.536 0.534 -0.588 1
Zea     0.188 -0.356 0.005 0.288 0.999 -0.154 -0.596 -0.987 -0.582 0.889 0.406 0.003 0.902 -0.929 0.845 1
Anthera 0.899 -0.81 -0.967 0.85 -0.253 0.994 -0.621 0.104 0.935 -0.67 0.777 0.965 -0.649 0.597 0.296 -0.258 1
Viola   0.999 -0.983 -0.981 0.994 0.189 0.942 -0.898 -0.335 0.692 -0.284 0.972 0.983 -0.256 0.192 0.679 0.183 0.901 1
Neo     0.983 -0.999 -0.93 0.997 0.366 0.865 -0.964 -0.503 0.547 -0.103 0.998 0.933 -0.074 0.008 0.803 0.361 0.807 0.982 1

Correlation Matrix

ns = node strength = Σ|ρ| of a node / n
NS = network strength = Σ(ns) / n
n = number of nodes
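Node strength and network strength can be computed from the correlation matrix as sketched below. One assumption is made loudly here: each node's |ρ| values are averaged over its correlations with the other nodes (the self-correlation of 1 is excluded), which may differ slightly from the exact normalisation used for the slides. The three values are taken from the matrix above.

```python
def node_strengths(names, corr):
    """ns per node: mean absolute correlation with all other nodes (assumption:
    the diagonal self-correlation is excluded from the average)."""
    strengths = {}
    for i, name in enumerate(names):
        others = [abs(corr[i][j]) for j in range(len(names)) if j != i]
        strengths[name] = sum(others) / len(others)
    return strengths

def network_strength(strengths):
    """NS: mean of the node strengths over all n nodes."""
    return sum(strengths.values()) / len(strengths)

# Values from the correlation matrix: rho(CrtI, ZDS), rho(CrtI, Lutein), rho(ZDS, Lutein)
names = ["CrtI", "ZDS", "Lutein"]
corr = [[1.0, 0.194, 0.683],
        [0.194, 1.0, 0.848],
        [0.683, 0.848, 1.0]]
ns = node_strengths(names, corr)
NS = network_strength(ns)
```

In this three-node subset Lutein has the highest node strength, so it would be drawn as the largest node.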

Network Correlation file

pP-I: n=19, NS=0.62
pP-BI: n=20, NS=0.79
pP-YBI: n=24, NS=0.79

Transgenes + Carotenoid genes + Carotenoids

[Figure legend: edges, negative or positive correlation; nodes, gene, carotenoid or transgene; positive and negative hubs are labelled]

Only correlations |ρ| > 0.6 are shown

Correlation Network for fishing candidates…

•  Node size according to ns
•  Only correlations |ρ| > 0.65 are shown
•  Edge width according to |ρ|
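The filtering rules above can be sketched as a small edge-list builder: keep only pairs with |ρ| above the threshold, record the sign, and use |ρ| as the edge width. The function names are hypothetical; the correlation values are taken from the carotenoid matrix shown earlier.

```python
def build_network(correlations, threshold=0.65):
    """Keep edges with |rho| above the threshold; width is |rho|, sign is its direction."""
    edges = []
    for (a, b), rho in correlations.items():
        if abs(rho) > threshold:
            edges.append({
                "source": a,
                "target": b,
                "sign": "positive" if rho > 0 else "negative",
                "width": abs(rho),
            })
    return edges

def node_degrees(edges):
    """Number of retained edges per node; high-degree nodes are candidate hubs."""
    degrees = {}
    for edge in edges:
        for node in (edge["source"], edge["target"]):
            degrees[node] = degrees.get(node, 0) + 1
    return degrees

correlations = {
    ("CrtI", "PSY1"): -0.984,
    ("CrtI", "ZDS"): 0.194,
    ("CrtI", "Lutein"): 0.683,
    ("PSY1", "ZDS"): -0.361,
    ("PSY1", "Lutein"): -0.799,
    ("ZDS", "Lutein"): 0.848,
}
edges = build_network(correlations, threshold=0.65)
degrees = node_degrees(edges)
```

The resulting edge list can be exported to a network viewer; raising the threshold (0.85 or 0.90, as in later slides) leaves fewer, stronger edges.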

[Figure legend: nodes, gene or metabolite; edges, negative or positive correlation; highlighted metabolites: lycopene, β-carotene, total carotenoids]

ns = node strength = AVG|ρ|
NS = network strength = AVG(ns)
n = number of nodes

Correlation Network of carotenoids + 100 volatiles (I)

ns = node strength = AVG|ρ|
NS = network strength = AVG(ns)
n = number of nodes

Correlation Network file

[Figure legend: metabolic classes: carotenoids, carotenoid-vol., terpenoid-vol., lipid-vol., amino acid-vol. (vol. = volatiles); edges, negative or positive correlation; regulation: up or down]

•  Node size according to ns
•  Node shape according to the metabolic class
•  Only correlations |ρ| > 0.85 are shown

Network strength NS = 0.78

Correlation Networks ("global" biology)

[Figure legend: metabolic classes: carotenoids, carotenoid-vol., terpenoid-vol., lipid-vol., amino acid-vol.]

Significant modules in a correlation network: rank 1, rank 2 and rank 3 clusters

Correlation network analysis of the main regulatory "hubs" in "Golden" fruits

•  Node size according to ns
•  Edge width according to |ρ|
•  Only correlations |ρ| > 0.90 are shown

[Figure legend: edges, negative or positive correlation; regulation: up or down; node types: gene, metabolite, phenotype, enzyme; labelled nodes: ethylene, ABA, lycopene, β-carotene]

ns = node strength = AVG|ρ|
NS = network strength = AVG(ns)
n = number of nodes

n = 176, NS = 0.89

Network Reconstruction of Cell Metabolism

Leucine and fatty acid metabolism in A. thaliana

Conclusions: Data Integration

-  Systems Biology data integration increases knowledge of all the modifications affecting global metabolism
-  Bioinformatic/statistical tools can point attention to the "major players" involved in a biological process
-  The era of identifying Master Nodes has started, but overcoming metabolic bottlenecks is still far off…

Rational Design of future crops is still a far away, but possible, DREAM…

THANK YOU and GOOD LUCK!!!!

Contact: [email protected]
