Today’s topics
•General discussion on systems biology
•Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
•Self-organizing map (SOM)
What is systems biology?
Each lab or group has its own definition of systems biology, because the field requires understanding and integrating different branches of science and different levels of omics information, and individual labs and groups work in different areas.
Theoretical target: understanding life as a system.
Practical targets: serving humanity by developing new-generation medical tests, drugs, foods, fuels, materials, sensors, logic gates, ...
Bioinformatics
[Figure: omics layers. Genome (genes a to m on the two DNA strands, 5' to 3'); transcriptome; proteome and interactome (proteins A to M grouped into function units, with activation (+) and repression (-)); metabolome measured by FT-MS (metabolic pathway linking Metabolites 1 to 6).]
Integration of omics to define elements (genome, mRNAs, proteins, metabolites)
Understanding an organism as a system (systems biology)
Understanding species-species relations (survival strategy)
Metabolomics: comprehensive and global analysis of diverse metabolites produced in cells and organisms
[Figure: plant omics (medicinal herb, prescription: metabolomics, proteome, interactome, transcriptome) and human omics (metabolomics, proteome, interactome, transcriptome) are connected with physiological activity and with therapeutic usage through traditional and modern knowledge of medicinal plants. Plant systems biology and human systems biology together form plant-human interacted systems biology.]
Modelling can be extended to plant-human interaction.
Okada, T., Afendi, F. M., Amin, M., Takahashi, H., Nakamura, K., Kanaya, S., Current Computer Aided Drug Design, 179-196, 10, (2010)
(1) Comprehensive understanding of each layer
[Figure: the plant-human omics diagram, with each layer represented by a data matrix X = (x_ij), i = 1, ..., N, j = 1, ..., M.]
Methods: principal component analysis, BL-SOM, DPClus, ...
(2) Relation between layers: mathematical modeling
Methods: partial least squares (PLS), multiple regression analysis, discriminant analysis
Model: y = f(X), where y = (y_1, y_2, ..., y_N) is the therapeutic usage, physiological activity, etc., and X is the herb composition or the metabolites in herbs.
Plant-human interaction
(1, 2) Multivariate analysis of metabolomics and transcriptomics data
Methods: partial least squares modeling, principal component analysis, BL-SOM (batch-learning self-organizing map), DPClus (network clustering), ...
(3) Knowledge systematization of the interaction between humans and plants: database
This situation can be extended to plant-human interaction.
(4) Systems biology for plant-human interaction
[1] Responsibility for synergistic activity
[2] Reduction of side effects in medication for complex diseases with multifactorial causes
[3] Metabolites in plants interact with multiple target proteins in humans and regulate gene expression, leading to dynamic state changes in the human metabolome and physiological activity.
Metabolomics approach for determining growth-specific metabolites based on FT-ICR-MS
[1] Metabolomics
[Figure: metabolic pathway linking Metabolites 1 to 6 through proteins B to L.]
Interpretation of the metabolome
[Figure: MS data (molecular weight and formula, fragmentation pattern) and experimental information (species, tissue, samples) give metabolite information, which is interpreted through the species-metabolite relation DB.]
Data processing: from FT-MS data acquisition in a time-series experiment to assessment of cellular conditions
[Figure: E. coli growth curve, OD600 (log scale, 0.1 to 10) vs. time (0 to 800 min), sampled at T1 to T8.]
(a) Metabolite quantities for time-series experiments
(b) Data preprocessing and construction of the data matrix (time points × m/z)
(c) Classification of ions into metabolite-derivative groups (M, M+1, M/2)
(d) Annotation of ions as metabolites
(e) Assessment of cellular condition by metabolite composition
(b) Data matrix
[Figure: mass spectra along the time axis, with example peaks at m/z 719.4869, 722.505 and 747.5112.]
Data matrix X = (x_tj): rows are measurement times (time 1, time 2, ..., time 8) and columns are metabolite ions (metab. 1, ..., metab. 200); x_tj is the quantity of the j-th ion at time t.
Software is provided by T. Nishioka (Kyoto Univ./Keio Univ.).
(c) Classification of ions into metabolite-derivative groups (DPClus)
[Figure: correlation network of individual ions clustered by DPClus; node labels include 1-1 to 1-6, 2-1 to 2-3, 3 to 11, PG1 to PG10 and M-1 to M-17.]
Correlation network for individual ions.
The intensity ratio between the monoisotopic ion (M) and the isotopic ion (M+1) indicates the number of carbons in the molecular formula.
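The relation between the M and M+1 intensities and the carbon count can be sketched as follows. This is a minimal illustration, not the procedure used in the lecture: it assumes that only 13C (natural abundance about 1.07 %) contributes to the M+1 peak, so that I(M+1)/I(M) is approximately n × p/(1 - p) for n carbons. The function names are hypothetical.

```python
# Assumption: natural abundance of 13C is about 1.07 %.
C13_ABUNDANCE = 0.0107
# Expected (M+1)/M ratio contributed by a single carbon atom.
RATIO_PER_CARBON = C13_ABUNDANCE / (1.0 - C13_ABUNDANCE)


def expected_ratio(n_carbons):
    """Expected I(M+1)/I(M) for a molecule with n_carbons carbons (13C only)."""
    return n_carbons * RATIO_PER_CARBON


def estimate_carbon_count(intensity_m, intensity_m1):
    """Estimate the number of carbons from the M and M+1 peak intensities.

    Contributions of 2H, 15N, 17O, 33S, ... to the M+1 peak are ignored.
    """
    if intensity_m <= 0:
        raise ValueError("monoisotopic intensity must be positive")
    ratio = intensity_m1 / intensity_m
    return round(ratio / RATIO_PER_CARBON)


# Example: an ion whose M+1 peak is about 10.8 % of the M peak.
print(estimate_carbon_count(1000.0, 108.0))
```

In practice the estimate is only a constraint on candidate formulas, because other elements also add to the M+1 peak.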
(d) Annotation of ions as metabolites using KNApSAcK DB
Detected m/z | Theoretical m/z | Molecular formula | Exact mass | Error | Candidate | Species
72.9878 | 73.9951 | C2H2O3 | 74.0004 | 0.0053 | Glyoxylic acid | Escherichia coli
143.1080 | 144.1153 | C8H16O2 | 144.1150 | 0.0003 | Octanoic acid | Escherichia coli
253.2137 | 254.2210 | C16H30O2 | 254.2246 | 0.0036 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
253.2185 | 254.2258 | C16H30O2 | 254.2246 | 0.0012 | omega-Cycloheptanenonanoic acid | Alicyclobacillus acidocaldarius
281.2444 | 282.2516 | C18H34O2 | 282.2559 | 0.0042 | Oleic acid | Escherichia coli
 |  | C18H34O2 | 282.2559 | 0.0042 | cis-11-Octadecanoic acid | Lactobacillus plantarum
 |  | C18H34O2 | 282.2559 | 0.0042 | omega-Cycloheptylundecanoic acid | Alicyclobacillus acidocaldarius
297.2410 | 298.2482 | C18H34O3 | 298.2508 | 0.0026 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2467 | 298.2540 | C18H34O3 | 298.2508 | 0.0032 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
297.2516 | 298.2589 | C18H34O3 | 298.2508 | 0.0081 | alpha-Cycloheptaneundecanoic acid | Alicyclobacillus acidocaldarius
321.0506 | 322.0579 | C10H15N2O8P | 322.0566 | 0.0013 | dTMP | Escherichia coli K12
346.0570 | 347.0643 | C10H14N5O7P | 347.0631 | 0.0012 | AMP | Escherichia coli
 |  | C10H14N5O7P | 347.0631 | 0.0012 | 3'-AMP | Escherichia coli
 |  | C10H14N5O7P | 347.0631 | 0.0012 | dGMP | Escherichia coli
401.0168 | 402.0241 | C10H16N2O11P2 | 402.0229 | 0.0012 | dTDP | Escherichia coli
402.9962 | 404.0035 | C9H14N2O12P2 | 404.0022 | 0.0013 | UDP | Escherichia coli
426.0237 | 427.0310 | C10H15N5O10P2 | 427.0294 | 0.0016 | Adenosine 3',5'-bisphosphate | Escherichia coli
 |  | C10H15N5O10P2 | 427.0294 | 0.0016 | ADP | Escherichia coli
 |  | C10H15N5O10P2 | 427.0294 | 0.0016 | dGDP | Escherichia coli
454.0391 | 455.0464 | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18A2 | Actinomadura spiralis MI178-34F18
 |  | C20H19Cl2NO7 | 455.0539 | 0.0075 | Antibiotic MI 178-34F18C2 | Actinomadura spiralis MI178-34F18
458.1112 | 459.1185 | C15H22N7O8P | 459.1267 | 0.0083 | Phosmidosine B | Streptomyces sp. strain RK-16
495.1039 | 496.1112 | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin A | Streptomyces murayamaensis sp. nov.
 |  | C24H20N2O10 | 496.1118 | 0.0006 | Kinamycin C | Streptomyces murayamaensis sp. nov.
505.9908 | 506.9981 | C10H16N5O13P3 | 506.9957 | 0.0023 | ATP, dGTP | Escherichia coli
547.0756 | 548.0829 | C16H26N2O15P2 | 548.0808 | 0.0020 | dTDP-L-rhamnose | Escherichia coli
565.0503 | 566.0576 | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-glucose | Escherichia coli
 |  | C15H24N2O17P2 | 566.0550 | 0.0025 | UDP-D-galactose | Escherichia coli
606.0775 | 607.0848 | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-mannosamine | Escherichia coli
 |  | C17H27N3O17P2 | 607.0816 | 0.0032 | UDP-N-acetyl-D-glucosamine | Escherichia coli
618.0897 | 619.0970 | C17H27N5O16P2 | 619.0928 | 0.0042 | ADP-L-glycero-beta-D-manno-heptopyranose | Escherichia coli
662.1037 | 663.1109 | C21H27N7O14P2 | 663.1091 | 0.0018 | NAD | Escherichia coli
664.1095 | 665.1168 | C21H29N7O14P2 | 665.1248 | 0.0080 | NADH | Escherichia coli
741.4729 | 742.4801 | C32H62N12O8 | 742.4814 | 0.0012 | Argimicin A | Sphingomonas sp.
786.4712 | 787.4785 | C41H65N5O10 | 787.4731 | 0.0054 | BE 32030B | Nocardia sp. A32030
853.3166 | 854.3239 | C41H46N10O9S | 854.3170 | 0.0069 | Argyrin G | Archangium gephyra Ar 8082
 |  | C45H56Cl2N2O10 | 854.3312 | 0.0073 | Decatromicin B | Actinomadura sp. MK73-NF4
 |  | C39H50N8O12S | 854.3269 | 0.0030 | Napsamycin C | Streptomyces sp. HIL Y-82,11372
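The annotation step can be illustrated with a small mass-matching sketch. In the table, each theoretical m/z equals the detected m/z plus about 1.0073, which is consistent with deprotonated ions; the code below assumes that reading. The four-entry candidate list stands in for a species-metabolite database, and the function name and the 0.01 tolerance are hypothetical choices, not values stated on the slides.

```python
PROTON_MASS = 1.00728  # Da; assumed offset between detected and neutral mass

# Tiny stand-in for a species-metabolite database (exact masses from the table).
CANDIDATES = [
    ("Glyoxylic acid", "C2H2O3", 74.0004),
    ("Octanoic acid", "C8H16O2", 144.1150),
    ("NAD", "C21H27N7O14P2", 663.1091),
    ("NADH", "C21H29N7O14P2", 665.1248),
]


def annotate(detected_mz, tolerance=0.01, candidates=CANDIDATES):
    """Return candidates whose exact mass lies within `tolerance` of the
    neutral mass inferred from a detected m/z, best match first."""
    neutral_mass = detected_mz + PROTON_MASS
    hits = []
    for name, formula, exact_mass in candidates:
        error = abs(neutral_mass - exact_mass)
        if error <= tolerance:
            hits.append((name, formula, round(error, 4)))
    return sorted(hits, key=lambda hit: hit[2])


for mz in (72.9878, 664.1095, 500.0):
    print(mz, annotate(mz))
```

Several candidates can share one formula (for example AMP, 3'-AMP and dGMP), so mass matching alone narrows the list but does not always identify a single metabolite.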
[Figure: PLS model relating the data matrix X (N = 8 measurement points × M = 220 metabolites) to the responses Y (N = 8 × K = 1).]
PLS (partial least squares regression model): extracts important combinations of metabolites when N (biological conditions) << M (metabolites).
(e) Estimation of cell condition as a function of the composition of metabolites
Y (cell density) = a_1 x_1 + ... + a_j x_j + ... + a_M x_M
x_j: the quantity of the j-th metabolite
[Figure: growth curve, OD600 (log scale, 0.1 to 10) vs. time (0 to 800 min), with measurement points T1 to T8.]
(e) Assessment of cellular condition by metabolite composition: detection of stage-specific metabolites
(PLS model of OD600 on metabolite intensities)
y (OD600, cell density) = a_1 x_1 + ... + a_j x_j + ... + a_M x_M
x_j: the quantity of the j-th metabolite
a_j > 0: stationary-phase dominant metabolites
a_j < 0: exponential-phase dominant metabolites
[Figure: regression coefficients a_j (axis from -0.15 to 0.1), ranging from exponential-phase dominant to stationary-phase dominant. Red: E. coli metabolites; black: other bacterial metabolites.]
Metabolites labeled in the figure:
•UDP-glucose, UDP-galactose
•NAD
•Parasperone A
•UDP-N-acetyl-D-glucosamine, UDP-N-acetyl-D-mannosamine
•ADP, Adenosine 3',5'-bisphosphate, dGDP
•UDP
•omega-Cycloheptyl-alpha-hydroxyundecanoate
•Octanoic acid
•dTMP, dGMP, 3'-AMP
•NADH
•Argyrin G
•dTDP
•ATP, dGTP
•Lenthionine
•omega-Cycloheptylnonanoate
•dTDP-6-deoxy-L-mannose
•omega-Cycloheptylundecanoate, cis-11-Octadecanoic acid
•ADP-(D,L)-glycero-D-manno-heptose
•Glyoxylate
•omega-Cycloheptyl-alpha-hydroxyundecanoate
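How the sign of a_j separates the two groups of metabolites can be sketched with a one-component PLS1 model written in plain Python. This is a simplified illustration, not the software used for the slides: it uses a single latent component, centered data, and a hypothetical toy matrix of 4 time points × 3 metabolites.

```python
def pls1_one_component(X, y):
    """Regression coefficients a_j of a one-component PLS1 model.

    X: list of N rows (samples) with M metabolite quantities each.
    y: list of N responses (e.g. OD600). Data are mean-centered internally.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight vector: direction in metabolite space with maximal covariance to y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of every sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return [w_j * q for w_j in w]


# Hypothetical toy data: metabolite 0 rises with cell density,
# metabolite 1 falls, metabolite 2 stays roughly constant.
od600 = [0.1, 0.5, 2.0, 4.0]
quantities = [
    [1.0, 9.0, 5.0],
    [2.0, 7.0, 5.1],
    [6.0, 3.0, 4.9],
    [9.0, 1.0, 5.0],
]
coefficients = pls1_one_component(quantities, od600)
for j, a_j in enumerate(coefficients):
    label = "stationary-phase dominant" if a_j > 0 else "exponential-phase dominant"
    print(f"metabolite {j}: a_j = {a_j:+.4f} ({label})")
```

With one component the coefficient of each metabolite has the sign of its covariance with the response, which is why metabolites that accumulate as the culture reaches high density receive a_j > 0.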
10 phosphatidylglycerols (PGs) detected by MS/MS spectra
[Figure: two groups of metabolites (120 metabolites and 80 metabolites); MS/MS analyses identify the marker molecules PG1, 3, 5, 7, 9 and PG2, 4, 6, 8, 10.]
(b) Relation of mass differences among the marker molecules PG1 to PG10
•PG5: 30:1 (14:0, 16:1)
•PG1: 32:1 (16:0, 16:1)
•PG3: 34:1 (16:0, 18:1)
•PG6: 31:0 (14:0, c17:0)
•PG2: 33:0 (16:0, c17:0)
•PG4: 34:5 (16:0, c19:0)
•PG7: 34:2 (16:1, 18:1)
•PG9: 36:2 (18:1, 18:1)
•PG8: 35:1 (16:1, c19:1)
•PG10: 37:1 (18:1, c19:0)
[Figure: network of PG1 to PG10 in two clusters (Cluster 1, Cluster 2). Edges are labeled CFA, US or ∆(CH2)2, with observed mass differences 28.0281, 14.0170, 14.0187, 14.0110, 14.0181, 28.0315, 28.0298, 28.0237, 2.0138, 2.0051, 28.0330, 28.0314 and 14.0197. Chemical structures of a PG before and after cyclopropane formation are also shown.]
Cyclopropane formation of PGs occurs in the transition from exponential to stationary phase.
Exponential phase: unsaturated PGs
Stationary phase: cyclopropanated PGs
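The observed mass differences between PGs can be compared with the exact masses of small structural units. The sketch below assumes one reading of the edge labels: a difference near 14.016 corresponds to CH2 (as added when a double bond is converted to a cyclopropane ring), near 28.031 to (CH2)2, and near 2.016 to H2 (one degree of unsaturation). The function name and the 0.01 tolerance are hypothetical.

```python
# Monoisotopic atomic masses in Da.
MASS_C = 12.0
MASS_H = 1.007825

# Exact masses of the structural units assumed to explain the differences.
KNOWN_DIFFERENCES = {
    "H2": 2 * MASS_H,                    # about 2.0157
    "CH2": MASS_C + 2 * MASS_H,          # about 14.0157
    "(CH2)2": 2 * (MASS_C + 2 * MASS_H)  # about 28.0313
}


def classify_difference(delta, tolerance=0.01):
    """Return the structural unit closest to `delta`, or None when no unit
    lies within `tolerance`."""
    best_label, best_error = None, tolerance
    for label, mass in KNOWN_DIFFERENCES.items():
        error = abs(delta - mass)
        if error <= best_error:
            best_label, best_error = label, error
    return best_label


for delta in (28.0281, 14.0170, 2.0138):
    print(delta, classify_difference(delta))
```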
Self-organizing maps
Time-series data
[Figure: growth curve (log scale, 0.01 to 10) vs. time, with sampling stages 1, 2, ..., j, ..., T.]
Expression profiles: data matrix X = (x_ij), with rows Gene 1, Gene 2, ..., Gene i, ..., Gene D and columns Stage 1, 2, ..., j, ..., T.
When we measure a time series with microarrays, the gene expression profiles are represented by a matrix. SOM makes it possible to examine gene similarity and stage similarity simultaneously.
T: number of time-series microarray experiments
D: number of genes in a microarray
[Figure: the same data matrix. Comparing columns gives stage similarity (states and state transitions); comparing rows gives expression similarity.]
Multivariate analysis with SOM: expression similarity of genes and stage similarity are examined simultaneously.
BL-SOM is available at http://kanaya.aist-nara.ac.jp/SOM/
SOM was developed by Prof. Teuvo Kohonen in the early 1980s.
Multi-dimensional data (input vectors) are mapped onto a two-dimensional array of nodes.
In the original SOM, the output depends on the input order of the vectors.
To remove this problem, Prof. Kanaya developed BL-SOM (batch-learning SOM):
[1] Initial model vectors are determined based on PCA of the data.
[2] The learning process of BL-SOM makes the output independent of the order of the input vectors.
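The two ideas, PCA-based initialization and batch updating, can be sketched in a few lines. This is a simplified illustration and not the published BL-SOM update rule: it uses a one-dimensional lattice instead of a two-dimensional one, places the initial nodes along the first principal axis only (found by power iteration), and uses a crude two-stage neighbourhood radius. All function names and parameters are hypothetical.

```python
def first_principal_axis(data, iterations=100):
    """Return (mean, unit first principal axis, std. dev. of the scores)."""
    n, dim = len(data), len(data[0])
    mean = [sum(v[d] for v in data) / n for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in data]
    axis = [1.0] * dim
    for _ in range(iterations):  # power iteration on the scatter matrix
        scores = [sum(c[d] * axis[d] for d in range(dim)) for c in centered]
        axis = [sum(scores[i] * centered[i][d] for i in range(n)) for d in range(dim)]
        norm = sum(a * a for a in axis) ** 0.5
        axis = [a / norm for a in axis]
    scores = [sum(c[d] * axis[d] for d in range(dim)) for c in centered]
    spread = (sum(s * s for s in scores) / n) ** 0.5
    return mean, axis, spread


def batch_som_1d(data, n_nodes=4, epochs=10, radius=1):
    """Batch-learning SOM on a 1-D lattice, initialized along the first PC."""
    mean, axis, spread = first_principal_axis(data)
    dim = len(mean)
    # [1] Initial model vectors from PCA: evenly spaced over +/- 2 std. dev.
    nodes = []
    for k in range(n_nodes):
        position = -2.0 + 4.0 * k / (n_nodes - 1)
        nodes.append([mean[d] + position * spread * axis[d] for d in range(dim)])
    for epoch in range(epochs):
        r = radius if epoch < epochs // 2 else 0  # shrink the neighbourhood
        # Assign every vector to its nearest node before any node is moved.
        winners = []
        for v in data:
            dists = [sum((v[d] - node[d]) ** 2 for d in range(dim)) for node in nodes]
            winners.append(dists.index(min(dists)))
        # [2] Move each node to the mean of all vectors won by nodes within
        # radius r; the mean does not depend on the order of the input.
        new_nodes = []
        for k in range(n_nodes):
            members = [v for v, w in zip(data, winners) if abs(w - k) <= r]
            if members:
                new_nodes.append(
                    [sum(v[d] for v in members) / len(members) for d in range(dim)]
                )
            else:
                new_nodes.append(nodes[k])
        nodes = new_nodes
    return nodes


# Hypothetical toy profiles (two conditions per "gene").
profiles = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (1.2, 0.9),
            (2.0, 2.1), (2.2, 1.9), (3.0, 3.0), (3.1, 3.2)]
print(batch_som_1d(profiles))
```

Because both the initialization and every update are computed from the whole data set at once, presenting the vectors in a different order gives the same map up to floating-point rounding, which is the property the list above describes.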
SOM Algorithm
[Figures: step-by-step illustration of the SOM algorithm.]
Source: “Clustering Challenges in Biological Networks”, edited by S. Butenko et al.
Self-organizing Mapping (Summary)
[Figure: gene i plotted as the point (x_i1, x_i2, ..., x_iT) in the expression space with axes X1, X2, ..., XT.]
Data matrix X = (x_ij): rows are Gene 1, Gene 2, ..., Gene i, ..., Gene D; columns are the T different time-series microarray experiments.
[1] Detection method for transition points in gene expression and metabolite quantity based on the batch-learning self-organizing map (BL-SOM)
[2] Diversity of metabolites in species: species-metabolite relation database
SOM is a non-linear projection of the multi-dimensional expression profiles of genes. The original dimension is conserved in individual lattice points. Several types of information are stored in a SOM:
•Arrangement of lattice points in multi-dimensional expression space: lattice points are optimized to reflect the data distribution.
•Gene classification: genes are classified into the nearest lattice points, so genes with similar expression profiles are clustered into identical or nearby lattice points.
•Feature mapping: in the k-th condition (k = 1, 2, ..., T), lattice points containing only highly expressed genes (Xk > Th.(k)) are colored red, and lattice points containing only lowly expressed genes (Xk < -Th.(k)) are colored blue.
[Figure: example feature maps for X1 (Time 1), X2 (Time 2), X3 (Time 3), ..., XT (Time T).]
The feature maps allow visual comparison among the stages of time-series data.
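The feature-mapping rule can be written as a short function. This is a sketch under stated assumptions: the genes have already been classified into lattice points, `node_members` is a hypothetical dictionary from a lattice-point id to the expression profiles of its genes, and a single threshold Th.(k) is passed in for the chosen condition k.

```python
def feature_map_colors(node_members, condition, threshold):
    """Color every lattice point for one condition k.

    A lattice point is "red" when all of its genes satisfy x_k > threshold,
    "blue" when all satisfy x_k < -threshold, and "none" otherwise
    (mixed, intermediate or empty lattice points).
    """
    colors = {}
    for node, profiles in node_members.items():
        values = [profile[condition] for profile in profiles]
        if values and all(v > threshold for v in values):
            colors[node] = "red"
        elif values and all(v < -threshold for v in values):
            colors[node] = "blue"
        else:
            colors[node] = "none"
    return colors


# Hypothetical 2 x 2 lattice with expression profiles over T = 2 conditions.
members = {
    (0, 0): [[1.5, -0.2], [2.0, 0.1]],
    (0, 1): [[-1.2, 0.0]],
    (1, 0): [[1.5, 0.0], [-1.5, 0.0]],
    (1, 1): [],
}
print(feature_map_colors(members, condition=0, threshold=1.0))
```

Calling the function once per condition k = 0, ..., T-1 gives one colored map per stage, which is what makes the stages visually comparable.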
SOM for time-series expression profile
Estimation of transition points; Bacillus subtilis (LB medium) (Data: Kazuo Kobayashi, Naotake Ogasawara (NAIST))
[Figure: cell density (OD600, log scale from 0.001 to 10) vs. time (0 to 1000 min) in LB medium, with stages 1 to 8, and log(probability density) for the stages (axis from 0 to -2000, low to high probability).]
A state transition point is observed between stages 3 and 4.
Integrated analysis of gene expression profiles and metabolite quantity data of Arabidopsis thaliana (sulfur def./cont.; data provided by the K. Saito and M. Hirai group (PSC))
Nakamura et al. (2004)
[Figure: feature maps for genes and for metabolites (m/z) in root and leaf, showing a state transition. Lattice points with large differences between 12 and 24 h are colored: blue, decreased; red, increased.]
Accurate molecular weights (error rate in ppm) are matched to candidate metabolites through the species-metabolite relation database.
Download sites of BL-SOM
RIKEN: http://prime.psc.riken.jp/
NAIST: http://kanaya.naist.jp/SOM/
Application of BL-SOM to “-omics”
Genome
•Kanaya et al., Gene, 276, 89-99 (2001)
•Abe et al., Genome Res., 13, 693-702 (2003)
•Abe et al., J. Earth Simulator, 6, 17-23 (2003)
•Abe et al., DNA Res., 12, 281-290 (2005)
Transcriptome
•Hasegawa et al., Plant Methods, 2:5, 1-18 (2006)
Metabolome
•Kim et al., J. Exp. Botany, 58, 415-424 (2007)
•Fukusaki et al., J. Biosci. Bioeng., 100, 347-354 (2005)
Transcriptome and Metabolome
•Hirai, M. Y., M. Klein, et al., J. Biol. Chem., 280, 25590-5 (2005)
•Hirai, M. Y., M. Yano, et al., Proc. Natl. Acad. Sci. USA, 101, 10205-10 (2004)
•Morioka, R., et al., BMC Bioinformatics, 8, 343 (2007)
•Yano et al., J. Comput. Aided Chem., 7, 125-136 (2007)
...
Some other popular clustering/classification algorithms:
K-means clustering
Support vector machines
Summary of bioinformatics tools developed in our laboratory: http://kanaya.naist.jp/~skanaya/Web/JTop.html
•Metabolomics: MS data processing
•Transcriptome and metabolomics profiling: estimation of transition points
•Species-metabolite DB
•Transcriptomics: statistics, profiling, ...
•Network analysis: PPI
All software and databases are freely accessible via the Web.
Some websites where we can find different types of data and links to other databases:
•www.geneontology.org
•www.genome.ad.jp/kegg
•www.ncbi.nlm.nih.gov
•www.ebi.ac.uk/databases
•http://www.ebi.ac.uk/uniprot/
•http://www.yeastgenome.org/
•http://mips.helmholtz-muenchen.de/proj/ppi/
•http://www.ebi.ac.uk/trembl
•http://dip.doe-mbi.ucla.edu/dip/Main.cgi
•www.ensembl.org