Lecture 7: Introduction to signaling pathways; Reverse engineering of biological networks; Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS; Self-organizing mapping (SOM)

Upload: valentine-harrison

Post on 11-Jan-2016


Page 1: Lecture7  Introduction to signaling pathways  Reverse Engineering of biological networks  Metabolomics approach for determining growth-specific metabolites

Lecture 7

• Introduction to signaling pathways
• Reverse engineering of biological networks
• Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
• Self-organizing mapping (SOM)

Introduction to signaling pathways

• Signaling networks involve the transduction of a "signal", usually from the outside to the inside of the cell.
• At the molecular level, signaling involves the same types of processes as metabolism: production and degradation of substances, molecular modifications (mainly phosphorylation, but also methylation and acetylation), and activation or inhibition of reactions.
• However, signaling pathways serve to process or transfer information, whereas metabolism mainly provides mass transfer.

Introduction to signaling pathways

Signal transduction often involves:

• The binding of a ligand to an extracellular receptor

• The subsequent phosphorylation of an intracellular enzyme

• Amplification and transfer of the signal

• The resultant change in cellular function, e.g. an increase or decrease in the expression of a gene

Signaling paradigm

Usually a signaling network has three principal parts:

• Events around the membrane
• Reactions that link sub-membrane events to the nucleus
• Events that lead to transcription

Source: Systems Biology in Practice by E. Klipp et al.

Schematic representation of receptor activation

Source: Systems Biology in Practice by E. Klipp et al.

Steroids

• A receptor does not always sit at the membrane; the steroid receptors are an example.
• Sterol lipids include hormones such as cortisol, estrogen, testosterone and calcitriol.
• These steroids simply cross the membrane of the target cell and then bind the intracellular receptor, which results in the release of the inhibitory molecule from the receptor.
• The receptor then traverses the nuclear membrane and binds to its site on the DNA to trigger transcription of the target gene.

Source: Systems Biology by Bernhard O. Palsson

G-protein signaling

• G-protein-coupled receptors (GPCRs) are important components of signal transduction networks.
• This class of receptors comprises 5% of the genes in C. elegans.
• The G-protein complex consists of three subunits (α, β and γ) and in its inactive state is bound to guanosine diphosphate (GDP).
• When a ligand binds to the GPCR, the G-protein exchanges its GDP for a guanosine triphosphate (GTP).
• This exchange leads to the dissociation of the G-protein from the receptor and to its split into a βγ complex and a GTP-bound α subunit, which is the active state that initiates downstream processes.

Source: Systems Biology by Bernhard O. Palsson

G-protein signaling model

Source: Systems Biology in Practice by E. Klipp et al.

G-protein signaling model

Time course of G-protein activation. The total number of molecules is 10000. The concentration of GDP-bound Gα is low for the whole period because it rapidly forms a complex with the heterodimer Gβγ.

Source: Systems Biology in Practice by E. Klipp et al.

The JAK-STAT network

• A cell surface receptor often dimerizes upon binding to a cytokine.
• The monomeric form of the receptor is associated with a kinase called JAK.
• When the receptor dimerizes, the JAKs phosphorylate themselves and the receptor, which yields the active state of the receptor.
• The active complex phosphorylates the STAT (signal transducer and activator of transcription) molecules.
• The STAT molecules then dimerize, move to the nucleus and trigger transcription.

The JAK-STAT signaling system is an important two-step process that is involved in multiple cellular functions, including cell growth and the inflammatory response.

Source: Systems Biology in Practice by E. Klipp et al.

Schematic representation of the MAP kinase cascade. An upstream signal causes phosphorylation of the MAPKKK. The phosphorylated MAPKKK in turn phosphorylates the protein at the next level. Dephosphorylation is assumed to occur continuously by phosphatases or autodephosphorylation.

Source: Systems Biology in Practice by E. Klipp et al.
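The cascade logic described in the caption can be sketched numerically. The following is a minimal illustration, not the model from the cited book: it assumes a single phosphorylation step per tier, first-order dephosphorylation, and hypothetical rate constants (`k_phos`, `k_dephos`), and integrates the equations with forward Euler.

```python
def simulate_cascade(signal=1.0, k_phos=1.0, k_dephos=0.5, dt=0.001, t_end=50.0):
    """Integrate a simplified three-tier kinase cascade with forward Euler.

    p[0], p[1], p[2] are the phosphorylated fractions of MAPKKK, MAPKK and MAPK.
    Each tier is phosphorylated by the active tier above it (the top tier by the
    upstream signal) and is dephosphorylated continuously at a first-order rate.
    All rate constants are hypothetical.
    """
    p = [0.0, 0.0, 0.0]
    for _ in range(int(t_end / dt)):
        upstream = [signal, p[0], p[1]]
        dp = [k_phos * upstream[i] * (1.0 - p[i]) - k_dephos * p[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
    return p


on = simulate_cascade(signal=1.0)   # cascade with an upstream signal
off = simulate_cascade(signal=0.0)  # no signal: nothing gets phosphorylated
print("with signal:", [round(v, 3) for v in on])
print("without signal:", off)
```

With these assumed constants the signal propagates down all three tiers, and switching the signal off leaves every tier unphosphorylated because dephosphorylation is always active.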

Signaling pathways in baker's yeast

The HOG pathway is activated by osmotic shock, the pheromone pathway by pheromones from cells of the opposite mating type, and the pseudohyphal growth pathway by starvation conditions. A MAP kinase cascade is a particular part of many signaling pathways; in this figure its components are indicated by a bold border.

Source: Systems Biology in Practice by E. Klipp et al.

Reverse engineering of biological networks

The task of reverse engineering a genetic network is the reconstruction of the interactions among biological entities (genes, proteins, metabolites, etc.) in a qualitative way from experimental data, using algorithms that weight the nature of the possible interactions with numerical values.

In forward modeling, a network is constructed from known interactions, and its topological and other properties are subsequently analyzed.

In reverse engineering, the network is estimated from experimental data and is then used for further predictions.

Reverse engineering of gene regulatory network

By clustering gene expression data, we can determine co-expressed genes.

Co-expressed genes might have similar regulatory characteristics, but clustering alone gives no information about the nature of the regulation.

Here we discuss a reverse engineering method for estimating regulatory relations between genes based on gene expression data, taken from the following paper:

M. K. Stephen Yeung, Jesper Tegnér, and James J. Collins. Reverse engineering gene networks using singular value decomposition and robust regression. Proc. Natl. Acad. Sci. USA 99:6163-6168

It is assumed that the dynamics, i.e. the rate of change of a gene product's abundance, is a function of the abundances of the genes in the network.

For all N genes, the system of equations is:

$$\dot{X}_i = f_i(X_1, X_2, \ldots, X_N), \quad i = 1, \ldots, N$$

In vector notation:

$$\dot{X} = f(X)$$

where $f(X)$ is a vector-valued function.

Under the linear assumption, i.e. that $\dot{X}_i$ depends linearly on the $X_j$, we can write

$$\dot{X}_i = \sum_{j=1}^{N} A_{ij} X_j, \quad i = 1, \ldots, N$$

Here $A_{ij}$ is the coupling parameter that represents the influence of $X_j$ on the expression rate of $X_i$. In other words, the matrix $A$ represents a network of the regulatory relations among the genes.

The target of reverse engineering is to determine $A$. Solving for $A$ requires a large number of measurements of $\dot{X}$ and $X$.

Measurement of $\dot{X}$ is difficult, and hence it is estimated in one of several ways.

First, if time-series data can be obtained, $\dot{X}$ can be approximated from the profiles of the expression values at fixed time intervals.

Alternatively, a cellular system at steady state can be perturbed by external stimulation, and $\dot{X}$ can then be determined by comparing gene expression in the perturbed cell population with that in the unperturbed reference population.
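The time-series route for estimating the rate of change can be illustrated with a forward difference between consecutive sampling times. This is only a sketch of one possible approximation; the function name `estimate_rates` and the expression values are hypothetical.

```python
def estimate_rates(times, expression):
    """Approximate dX/dt by forward differences between consecutive time points.

    times: list of sampling times
    expression: one row per gene, one measured value per sampling time
    Returns one row per gene with len(times) - 1 rate estimates.
    """
    rates = []
    for row in expression:
        rates.append([
            (row[k + 1] - row[k]) / (times[k + 1] - times[k])
            for k in range(len(times) - 1)
        ])
    return rates


times = [0.0, 10.0, 20.0, 30.0]
expression = [
    [1.0, 2.0, 4.0, 4.0],  # hypothetical gene 1
    [5.0, 4.0, 2.0, 1.0],  # hypothetical gene 2
]
x_dot = estimate_rates(times, expression)
print(x_dot)
```

Each row of `x_dot` then plays the role of one row of the rate matrix used in the linear model.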

Now, if by either method we can produce the matrices $\dot{X}_{N \times M}$ and $X_{N \times M}$ (N genes, M measurements), we can write

$$\dot{X}_{N \times M} = A_{N \times N} X_{N \times M}$$

or, if external perturbation is used,

$$\dot{X}_{N \times M} = A_{N \times N} X_{N \times M} + B_{N \times M}$$

Here $B_{N \times M}$ is the matrix representing the effect of the perturbation.

The goal of reverse engineering is to use the measured data $B$, $X$ and $\dot{X}$ to deduce $A$, i.e. the connectivity matrix of the regulatory relations among the genes.

By taking the transpose, the system can be rewritten as

$$(X^T)_{M \times N} (A^T)_{N \times N} = (\dot{X}^T)_{M \times N} - (B^T)_{M \times N}$$

$A$ is the unknown. If $M = N$ and $X$ has full rank, we can simply invert the matrix $X$ to find $A$. However, typically $M \ll N$, mainly because of the high cost of perturbations and measurements. We therefore have an underdetermined problem: the number of linearly independent equations is smaller than the number of unknown variables, so there is no unique solution. One way to get around this is to use SVD to decompose $X^T$ into

$$(X^T)_{M \times N} = U_{M \times N} W_{N \times N} V^T_{N \times N}$$

where $U$ and $V$ are each orthogonal, which means

$$U^T U = V^T V = V V^T = I$$

with $I$ being the identity matrix, and $W$ is diagonal:

$$W = \mathrm{diag}(w_1, w_2, \ldots, w_N)$$

Without loss of generality, we may assume that all nonzero elements $w_k$ are listed at the end, i.e. $w_1 = w_2 = \ldots = w_L = 0$ and $w_{L+1}, w_{L+2}, \ldots, w_N \neq 0$, where $L := \dim(\ker(X^T))$. Then one particular solution for $A$ is

$$A_0 = (\dot{X} - B)\, U\, \mathrm{diag}(1/w_j)\, V^T$$

where $1/w_j$ is taken to be zero whenever $w_j = 0$.

The general solution is given by the affine space

$$A = (\dot{X} - B)\, U\, \mathrm{diag}(1/w_j)\, V^T + C V^T \qquad (3)$$

with $C = (c_{ij})_{N \times N}$, where $c_{ij}$ is zero if $j > L$ and is otherwise an arbitrary scalar coefficient. This family of solutions in Eq. 3 represents all the possible networks that are consistent with the microarray data. Among these solutions, the particular solution $A_0$ is the one with the smallest L2 norm. The question now is which of the solutions of Eq. 3 is the best.
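The SVD construction can be checked on a toy problem. The sketch below assumes NumPy is available and uses a hypothetical 5-gene network with 3 measurements; all numbers are made up for illustration. Note that `numpy.linalg.svd` returns singular values in descending order, so the null-space vectors appear as the last rows of `Vt` rather than first as in the ordering used on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 3                       # N genes, M measurements (M < N: underdetermined)

A_true = np.zeros((N, N))         # hypothetical sparse network
A_true[0, 1] = 1.0
A_true[2, 0] = -0.5
A_true[4, 3] = 2.0

X = rng.normal(size=(N, M))       # hypothetical expression data
B = np.zeros((N, M))              # no external perturbation in this toy case
X_dot = A_true @ X + B            # rates generated by the linear model

# SVD of X^T (M x N): U is M x M, w has M entries, Vt is N x N.
U, w, Vt = np.linalg.svd(X.T, full_matrices=True)
tol = 1e-10
rank = int(np.sum(w > tol))
inv_w = np.array([1.0 / s if s > tol else 0.0 for s in w])  # 1/w_j, zero if w_j = 0

# Particular (minimum L2 norm) solution A0 = (X_dot - B) U diag(1/w) V^T.
A0 = (X_dot - B) @ U @ np.diag(inv_w) @ Vt[: len(w), :]

# Rows of Vt beyond the rank span the null space: each row v satisfies v @ X = 0.
null_rows = Vt[rank:, :]
C = rng.normal(size=(N, N - rank))   # arbitrary coefficients
A_other = A0 + C @ null_rows         # another member of the solution family

print("A0 fits the data:     ", np.allclose(A0 @ X + B, X_dot))
print("A_other fits the data:", np.allclose(A_other @ X + B, X_dot))
print("norms:", np.linalg.norm(A0), np.linalg.norm(A_other))
```

Both matrices reproduce the data exactly, which is why an additional criterion is needed to choose among them.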

In such cases, we may rely on insights provided by earlier works on gene regulatory networks and bioinformatics databases, which suggest that naturally occurring gene networks are sparse, i.e. each gene generally interacts with only a small percentage of all the genes in the entire genome. Imposing sparseness on the family of solutions given by Eq. 3 means that we need to choose the coefficients $c_{ij}$ to maximize the number of zero entries in $A$. This is a nontrivial problem.

The task is equivalent to the problem of finding the exact-fit plane in robust statistics, where we try to fit a hyperplane to a set of points containing a few outliers. The authors chose L1 regression, whose figure of merit is the minimization of the sum of the absolute values of the errors, for its efficiency.

In short, this method of reverse engineering can produce multiple solutions (gene networks) that are consistent with given microarray data. The paper argues that the sparsest of them is the best solution and uses L1 regression to find it.
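The idea of picking the sparsest member of the solution family by L1 minimization can be shown on a deliberately small case. The sketch assumes a single row of A and a one-dimensional null space, so the family is `a0 + c * v` with one free scalar `c`; the values of `a0` and `v` are hypothetical. In the setting of the paper there are many free coefficients per row, and a proper L1 regression solver is needed instead of this scan.

```python
def sparsest_row_l1(a0, v):
    """Minimize sum_j |a0[j] + c * v[j]| over the scalar c.

    The objective is convex and piecewise linear in c, so a minimum lies at a
    breakpoint c = -a0[j] / v[j]; this toy version simply tries every breakpoint.
    """
    candidates = [-a / b for a, b in zip(a0, v) if b != 0.0]
    if not candidates:
        return list(a0)

    def l1(c):
        return sum(abs(a + c * b) for a, b in zip(a0, v))

    best_c = min(candidates, key=l1)
    return [a + best_c * b for a, b in zip(a0, v)]


a0 = [-2.0 / 3.0, 4.0 / 3.0, -2.0 / 3.0]  # hypothetical minimum L2 norm row (dense)
v = [1.0, 1.0, 1.0]                       # hypothetical null-space direction
row = sparsest_row_l1(a0, v)
print(row)
```

Here the dense minimum-norm row is moved along the null-space direction until two of its three entries vanish, which is the kind of sparse network the method favors.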


Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS

[1] Metabolomics

[Figure: Interpretation of the metabolome. Labels: tissue samples; MS; experimental information (molecular weight and formula, fragmentation pattern); metabolite information (Metabolite 1 to Metabolite 6); species and metabolites; species-metabolite relation DB.]

Data processing from FT-MS data acquisition of a time-series experiment to assessment of cellular conditions (E. coli)

[Figure: workflow with five panels.
(a) Metabolite quantities for time-series experiments (growth curve: OD600 on a log scale from 0.1 to 10 versus time from 0 to 800 min, sampling points T1 to T8)
(b) Data preprocessing and constructing the data matrix (time points by m/z)
(c) Classification of ions into metabolite-derivative groups (M, M+1, M/2)
(d) Annotation of ions as metabolites (columns: detected m/z, theoretical m/z, molecular formula, exact mass, error, candidate, species)
(e) Assessment of cellular condition by metabolite composition]

(b) Data matrix

[Figure: data matrix with one row per time point (time 1 to time 8) and one column per metabolite (metab. 1 to metab. 200), built from ion intensities over time; example m/z values: 719.4869, 722.505, 747.5112.]

Software is provided by T. Nishioka (Kyoto Univ./Keio Univ.)

(c) Classification of ions into metabolite-derivative groups (DPClus)

[Figure: correlation network for individual ions. Node and cluster labels: 1-1, 1-2, 1-3, 1-4,5, 1-6, 2-1, 2-2, 2-3, 3 to 11; PG1 to PG10; M-1 to M-17.]

The intensity ratio between the monoisotopic peak (M) and the isotope peak (M+1) indicates the number of carbons in the molecular formula.
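The link between the M+1/M intensity ratio and the carbon count can be sketched as follows. This is an approximation, not the formula from the slide (which did not survive extraction): it assumes that the M+1 peak comes only from 13C at a natural abundance of about 1.07% and ignores the small contributions of 2H, 15N and 17O. The function name and the intensities are hypothetical.

```python
C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C


def estimate_carbon_count(intensity_m, intensity_m1):
    """Estimate the number of carbons from the M and M+1 peak intensities.

    For n carbons, I(M+1) / I(M) is approximately n * p / (1 - p),
    where p is the 13C abundance.
    """
    ratio = intensity_m1 / intensity_m
    per_carbon = C13_ABUNDANCE / (1.0 - C13_ABUNDANCE)
    return round(ratio / per_carbon)


# Hypothetical intensities: an M+1 peak at 17.3% of the monoisotopic peak.
n_carbons = estimate_carbon_count(1000.0, 173.0)
print(n_carbons)
```

Such an estimate narrows down the candidate molecular formulas before database lookup.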

(d) Annotation of ions as metabolites using KNApSAcK DB

| Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species |
|---|---|---|---|---|---|---|
| 72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli |
| 143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli |
| 253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius |
| 253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius |
| 281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid | Escherichia coli |
| | | C18H34O2 | 282.2559 | 0.0042 | cis-11-Octadecanoic acid | Lactobacillus plantarum |
| | | C18H34O2 | 282.2559 | 0.0042 | omega-Cycloheptylundecanoic acid | Alicyclobacillus acidocaldarius |
| 297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius |
| 297.2467 | 298.2540 | C18H34O3 | 298.2508 | 0.0032 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius |
| 297.2516 | 298.2589 | C18H34O3 | 298.2508 | 0.0081 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius |
| 321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP | Escherichia coli K12 |
| 346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP | Escherichia coli |
| | | C10H14N5O7P | 347.0631 | 0.0012 | 3'-AMP | Escherichia coli |
| | | C10H14N5O7P | 347.0631 | 0.0012 | dGMP | Escherichia coli |
| 401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP | Escherichia coli |
| 402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP | Escherichia coli |
| 426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate | Escherichia coli |
| | | C10H15N5O10P2 | 427.0294 | 0.0016 | ADP | Escherichia coli |
| | | C10H15N5O10P2 | 427.0294 | 0.0016 | dGDP | Escherichia coli |
| 454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2 | Actinomadura spiralis MI178-34F18 |
| | | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18C2 | Actinomadura spiralis MI178-34F18 |
| 458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B | Streptomyces sp. strain RK-16 |
| 495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A | Streptomyces murayamaensis sp. nov. |
| | | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin C | Streptomyces murayamaensis sp. nov. |
| 505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP, dGTP | Escherichia coli |
| 547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose | Escherichia coli |
| 565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose | Escherichia coli |
| | | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-galactose | Escherichia coli |
| 606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine | Escherichia coli |
| | | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-glucosamine | Escherichia coli |
| 618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-manno-heptopyranose | Escherichia coli |
| 662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli |
| 664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli |
| 741.4729 | 742.4801 | C32H62N12O8 | 742.4814 | 0.0012 | Argimicin A | Sphingomonas sp. |
| 786.4712 | 787.4785 | C41H65N5O10 | 787.4731 | 0.0054 | BE 32030B | Nocardia sp. A32030 |
| 853.3166 | 854.3239 | C41H46N10O9S | 854.3170 | 0.0069 | Argyrin G | Archangium gephyra Ar 8082 |
| | | C45H56Cl2N2O10 | 854.3312 | 0.0073 | Decatromicin B | Actinomadura sp. MK73-NF4 |
| | | C39H50N8O12S | 854.3269 | 0.0030 | Napsamycin C | Streptomyces sp. HIL Y-82,11372 |
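The annotation step can be sketched as a mass lookup. In the table, the theoretical m/z exceeds the detected m/z by about 1.0073, which suggests (this is an inference, not stated in the source) that the ions are singly charged and deprotonated, so that adding one proton mass gives the neutral mass. The candidate list below copies four rows from the table; the function name and the tolerance are hypothetical, and a real lookup would query the species-metabolite database instead.

```python
PROTON_MASS = 1.00728  # approximate mass of a proton in Da

# (name, exact monoisotopic mass) pairs copied from the annotation table
CANDIDATES = [
    ("Glyoxylic acid", 74.0004),
    ("Octanoic acid", 144.1150),
    ("NAD", 663.1091),
    ("NADH", 665.1248),
]


def annotate(detected_mz, tolerance=0.01):
    """Return (name, error) pairs for candidates within `tolerance` Da.

    Assumes a singly charged, deprotonated ion, so the neutral mass is the
    detected m/z plus one proton mass. Results are sorted by error.
    """
    neutral = detected_mz + PROTON_MASS
    hits = [
        (name, abs(neutral - mass))
        for name, mass in CANDIDATES
        if abs(neutral - mass) <= tolerance
    ]
    return sorted(hits, key=lambda hit: hit[1])


hits_nad = annotate(662.1037)
hits_none = annotate(500.0)
print(hits_nad)
print(hits_none)
```

Several candidates can share one molecular formula (for example AMP, 3'-AMP and dGMP in the table), so a mass match alone does not identify a metabolite uniquely.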

(e) Estimation of cell condition based on a function of the composition of metabolites

PLS (partial least squares regression model): extracts important combinations of metabolites. N (biological conditions) << M (metabolites).

[Figure: PLS relates the data matrix X (N = 8 measurement points by M = 220 metabolites) to the response Y (N = 8, K = 1), i.e. the cell condition. Inset: growth curve, OD600 on a log scale from 0.1 to 10 versus time from 0 to 800 min, with sampling points T1 to T8.]

Y (cell density) = a1 x1 + … + aj xj + … + aM xM

xj: the quantity of the jth metabolite

(e) Assessment of cellular condition by metabolite composition: detection of stage-specific metabolites (PLS model of OD600 to metabolite intensities)

y (OD600 cell density) = a1 x1 + … + aj xj + … + aM xM

xj: the quantity of the jth metabolite
aj > 0: stationary-phase-dominant metabolites
aj < 0: exponential-phase-dominant metabolites

[Figure: metabolites ranked by PLS coefficient aj (axis from -0.15 to 0.1), from exponential-phase dominant to stationary-phase dominant. Labeled metabolites, in order of appearance: UDP-glucose, UDP-galactose; NAD; Parasperone A; UDP-N-acetyl-D-glucosamine, UDP-N-acetyl-D-mannosamine; ADP, Adenosine 3',5'-bisphosphate, dGDP; UDP; omega-Cycloheptyl-alpha-hydroxyundecanoate; Octanoic acid; dTMP, dGMP, 3'-AMP; NADH; Argyrin G; dTDP; ATP, dGTP; Lenthionine; omega-Cycloheptylnonanoate; dTDP-6-deoxy-L-mannose; omega-Cycloheptylundecanoate, cis-11-Octadecanoic acid; ADP-(D,L)-glycero-D-manno-heptose; Glyoxylate; omega-Cycloheptyl-alpha-hydroxyundecanoate. Red: E. coli metabolites; black: other bacterial metabolites. Further annotations: MS/MS analyses of PG1, 3, 5, 7, 9 and of PG2, 4, 6, 8, 10; 120 metabolites; 80 metabolites.]
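The sign convention for the coefficients aj can be illustrated with a one-component PLS regression for a single response (PLS1). This is a minimal sketch in plain Python, not the software used in the lecture; the number of components used there is not stated, and the data below are hypothetical. For one component, the regression coefficients reduce to the weight vector scaled by the regression of y on the scores.

```python
def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return the coefficients a_j.

    X: list of N rows (measurement points), each with M metabolite quantities
    y: list of N responses (e.g. cell density)
    Both X and y are mean-centered before fitting.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [value - y_mean for value in y]

    # Weight vector: direction in metabolite space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sum(value * value for value in w) ** 0.5
    w = [value / norm for value in w]

    # Scores t = Xc w, then regress y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    q = sum(yc[i] * t[i] for i in range(n)) / sum(value * value for value in t)
    return [q * value for value in w]


# Hypothetical data: 4 time points, 3 metabolites.
density = [0.1, 0.5, 2.0, 4.0]
quantities = [
    [1.0, 9.0, 5.0],
    [2.0, 7.0, 5.0],
    [6.0, 3.0, 5.0],
    [9.0, 1.0, 5.0],
]
coefficients = pls1_one_component(quantities, density)
print(coefficients)
```

In this toy data set the first metabolite accumulates as the density rises and receives a positive coefficient, the second is depleted and receives a negative one, and the constant third metabolite receives zero.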

10 phosphatidylglycerols (PGs) detected by MS/MS spectra

(b) Relation of mass differences among the marker molecules PG1 to PG10

| PG | Fatty acid composition |
|---|---|
| PG1 | 32:1 (16:0, 16:1) |
| PG2 | 33:0 (16:0, c17:0) |
| PG3 | 34:1 (16:0, 18:1) |
| PG4 | 34:5 (16:0, c19:0) |
| PG5 | 30:1 (14:0, 16:1) |
| PG6 | 31:0 (14:0, c17:0) |
| PG7 | 34:2 (16:1, 18:1) |
| PG8 | 35:1 (16:1, c19:1) |
| PG9 | 36:2 (18:1, 18:1) |
| PG10 | 37:1 (18:1, c19:0) |

[Figure: network of mass differences among PG1 to PG10, grouped into Cluster 1 and Cluster 2. Edge labels: Δ(CH2)2, CFA and US. Observed mass differences: 28.0281, 14.0170, 14.0187, 14.0110, 14.0181, 28.0315, 28.0298, 28.0237, 2.0138, 2.0051, 28.0330, 28.0314, 14.0197.]

Cyclopropane formation of PGs occurs in the transition from exponential to stationary phase.

[Figure: unsaturated PGs (exponential phase) are converted to cyclopropanated PGs (stationary phase).]
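The observed mass differences can be compared with theoretical values computed from monoisotopic atomic masses. The sketch below assumes that cyclopropanation adds one CH2 unit to a double bond, that the Δ(CH2)2 edges correspond to a chain-length difference of C2H4, and that differences near 2 Da correspond to H2 (one double bond); the last assignment is an interpretation of the figure labels rather than something the source states. The function names are hypothetical.

```python
MASS_C = 12.0       # monoisotopic mass of 12C in Da (exact by definition)
MASS_H = 1.007825   # approximate monoisotopic mass of 1H in Da


def formula_mass(n_carbon, n_hydrogen):
    """Monoisotopic mass of a CnHm fragment."""
    return n_carbon * MASS_C + n_hydrogen * MASS_H


REFERENCE = {
    "CH2": formula_mass(1, 2),      # cyclopropane ring formation
    "(CH2)2": formula_mass(2, 4),   # two-carbon chain-length difference
    "H2": formula_mass(0, 2),       # one double bond more or less
}


def nearest_label(mass_difference):
    """Assign an observed mass difference to the closest reference fragment."""
    return min(REFERENCE, key=lambda name: abs(REFERENCE[name] - mass_difference))


observed = [28.0281, 14.0170, 2.0138]  # three of the values from the figure
labels = [nearest_label(value) for value in observed]
print({name: round(mass, 4) for name, mass in REFERENCE.items()})
print(labels)
```

The theoretical values (about 14.0157, 28.0313 and 2.0157 Da) lie within a few mDa of the observed differences.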


Self organizing Maps


Time-series data

When we measure time-series microarrays, the gene expression profiles are represented by a matrix:

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1T} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2T} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{iT} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{D1} & x_{D2} & \cdots & x_{Dj} & \cdots & x_{DT}
\end{pmatrix}
$$

Rows: Gene 1, Gene 2, ..., Gene i, ..., Gene D. Columns: Stage 1, 2, ..., j, ..., T.

T: number of time-series microarray experiments; D: number of genes in a microarray.

[Figure: growth curve (log scale from 0.01 to 10) versus time with sampling stages 1, 2, ..., j, ..., T; each column of the matrix is the expression profile at one stage.]

Multivariate analysis: SOM makes it possible to examine the expression similarity of genes and the stage similarity simultaneously, and thereby to identify states and state transitions.

BL-SOM is available at http://kanaya.aist-nara.ac.jp/SOM/

SOM was developed by Prof. Teuvo Kohonen in the early 1980s.

Multi-dimensional data (input vectors) are mapped onto a two-dimensional array of nodes.

In the original SOM, the output depends on the input order of the vectors.

To remove this problem, Prof. Kanaya developed the batch-learning SOM (BL-SOM).

[1] Initial model vectors are determined based on PCA of the data.

[2] The learning process of BL-SOM makes the output independent of the order of the input vectors.
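Why a batch update removes the dependence on input order can be seen in a small sketch. This is not the BL-SOM implementation: it uses a one-dimensional lattice, a deterministic initialization along the diagonal of the data range as a stand-in for the PCA-based initialization, and a plain neighborhood-mean update. All names and data are hypothetical.

```python
def batch_som(data, n_nodes=4, epochs=10):
    """Train a 1-D lattice of model vectors with batch updates.

    In every epoch all inputs are assigned to their best-matching node using the
    same frozen model vectors, and each node is then replaced by the mean of the
    inputs assigned to it or to its lattice neighbors. Because the update only
    uses sums over the whole data set, the input order does not matter.
    """
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Deterministic initialization (stand-in for a PCA-based one); n_nodes >= 2.
    nodes = [[lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
             for i in range(n_nodes)]
    for epoch in range(epochs):
        radius = 1 if epoch < epochs // 2 else 0  # neighborhood shrinks over time
        bmus = [min(range(n_nodes),
                    key=lambda i: sum((x[d] - nodes[i][d]) ** 2 for d in range(dim)))
                for x in data]
        new_nodes = []
        for i in range(n_nodes):
            members = [x for x, b in zip(data, bmus) if abs(b - i) <= radius]
            if members:
                new_nodes.append([sum(x[d] for x in members) / len(members)
                                  for d in range(dim)])
            else:
                new_nodes.append(nodes[i])  # keep a node that attracted no inputs
        nodes = new_nodes
    return nodes


profiles = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0],
            [2.0, 2.1], [2.1, 1.9], [3.0, 3.0], [2.9, 3.1]]
nodes_forward = batch_som(profiles)
nodes_reversed = batch_som(list(reversed(profiles)))
print(nodes_forward)
```

Training on the reversed input list gives the same lattice up to floating-point rounding, whereas an online update that adjusts the model vectors after every single input would in general depend on the order.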

SOM Algorithm

[Figures: four slides presenting the SOM algorithm.]

Source: "Clustering Challenges in Biological Networks", edited by S. Butenko et al.

Self-organizing mapping (summary)

[Figure: each gene i is a point (xi1, xi2, ..., xiT) in the T-dimensional expression space with axes X1, X2, ..., XT, taken from the rows of the D by T expression matrix. T: number of different time-series microarray experiments.]

[1] Detection method for transition points in gene expression and metabolite quantity based on the batch-learning self-organizing map (BL-SOM)

[2] Diversity of metabolites in species: species-metabolite relation database

Self-organizing mapping (summary)

• Arrangement of lattice points in the multi-dimensional expression space: lattice points are optimized to reflect the data distribution.
• Gene classification: genes are classified into the nearest lattice points.

[Figure: gene i as a point (xi1, xi2, ..., xiT) in the expression space with axes X1, X2, ..., XT.]

Self-organizing mapping (summary)

SOM is a non-linear projection of the multi-dimensional expression profiles of genes. The original dimension is conserved in individual lattice points. Several types of information are stored in a SOM:

• Arrangement of lattice points in the multi-dimensional expression space: lattice points are optimized to reflect the data distribution.
• Gene classification: genes with similar expression profiles are clustered into identical or neighboring lattice points.
• Feature mapping: in the k-th condition (k = 1, 2, …, T), lattice points containing only highly expressed genes (Xk > Th.(k)) are colored red, and lattice points containing only lowly expressed genes (Xk < -Th.(k)) are colored blue.

[Figure: example feature maps for X1 (Time 1), X2 (Time 2), X3 (Time 3), ..., XT (Time T), which allow visual comparison among the stages of the time-series data.]

SOM for time-series expression profile

Estimation of transition points: Bacillus subtilis in LB medium (Data: Kazuo Kobayashi, Naotake Ogasawara (NAIST))

[Figure: cell density (OD600, log scale from 0.001 to 10) versus time (0 to 1000 min) with sampling stages 1 to 8, together with log(probability density) for each stage (axis from -2000 to 0; color scale from low to high probability).]

A state transition point is observed between stages 3 and 4.

Integrated analysis of gene expression profile and metabolite quantity data of Arabidopsis thaliana (sulfur deficiency vs. control; data provided by the K. Saito and M. Hirai group (PSC))

Nakamura et al. (2004)

[Figure axis: ppm (error rate)]

3. Species-metabolite relation database

Accurate molecular weights; candidate metabolites corresponding to the accurate molecular weights

[Figure: feature maps for genes and metabolites (m/z) in root and leaf, showing the state transition. Lattice points with large differences between 12 h and 24 h are colored blue (decreased) or red (increased).]

Download sites of BL-SOM

• Riken: http://prime.psc.riken.jp/
• NAIST: http://kanaya.naist.jp/SOM/

Application of BL-SOM to "-omics"

Genome
• Kanaya et al., Gene, 276, 89-99 (2001)
• Abe et al., Genome Res., 13, 693-702 (2003)
• Abe et al., J. Earth Simulator, 6, 17-23 (2003)
• Abe et al., DNA Res., 12, 281-290 (2005)

Transcriptome
• Hasegawa et al., Plant Methods, 2:5, 1-18 (2006)

Metabolome
• Kim et al., J. Exp. Botany, 58, 415-424 (2007)
• Fukusaki et al., J. Biosci. Bioeng., 100, 347-354 (2005)

Transcriptome and metabolome
• Hirai, M. Y., M. Klein, et al., J. Biol. Chem., 280, 25590-5 (2005)
• Hirai, M. Y., M. Yano, et al., Proc. Natl. Acad. Sci. USA, 101, 10205-10 (2004)
• Morioka, R., et al., BMC Bioinformatics, 8, 343 (2007)
• Yano et al., J. Comput. Aided Chem., 7, 125-136 (2007)
……

Summary of bioinformatics tools developed in our laboratory: http://kanaya.naist.jp/~skanaya/Web/JTop.html

• Metabolomics: MS data processing
• Transcriptome and metabolomics profiling: estimation of transition points
• Species-metabolite DB
• Transcriptomics: statistics, profiling, …
• Network analysis: PPI

All software and databases are freely accessible via the Web.

Introduction to self-organizing mapping software and to the software package Expander

http://acgt.cs.tau.ac.il/expander/