SymBioSys
K.U.Leuven Center for Systems Biology
Topics to be addressed
International trend
Project concept
Project structure
3 problems and 3 cases
Computational methodology leads to user-friendly tools and real biological impact
Strategic importance internationally
Strategic importance K.U.Leuven
Coherence of the consortium
Systems biology
Biostatistics
Genetics
Sequence analysis
Expression analysis
Personalized medicine
Nutraceuticals
Post-genomic drug development (new targets, toxicogenomics)
GMOs
Systems biology
Biological question & model
High-throughput technology
Computers & databases
Mathematical models
The Human Genome Project has catalyzed striking paradigm changes in
biology - biology is an information science. [...] Systems biology will play a
central role in the 21st century; there is a need for global (high
throughput) tools of genomics, proteomics, and cell biology to decipher
biological information; and computer science and applied math will play a
commanding role in converting biological information into knowledge.
Leroy Hood, Institute for Systems Biology, Seattle, WA, 2002
Center of Excellence
Become a world-leading bioinformatics center for systems biology
Bioinformatics & microarrays
Three topics of excellence
Gene prioritization by integrative genomics
Graphical models of regulatory motifs and modules
Inference of regulatory networks
We will achieve this goal through
Further build-up of existing expertise
Symbiosis between computational and biological partners
Concrete cases for real biological relevance
Diverse cases for generic applicability in biology
Systems biology
[Figure labels: genes, modules, networks; probabilistic models; integrative genomics, regulatory modules, cellular networks; two cases]
Project concept
[Figure labels: probabilistic models; integrative genomics, regulatory modules, cellular networks; cases: genetical genomics, endocrinology, Salmonella genomics]
Research concept & consortium
[Figure, built up over successive slides with the same labels as above; steps added one per slide: biological problem, experiment design, biological data, data analysis, biological validation, improved method, new biology]
[Final build adds the labels: DME-VIB; Prometa; KUL & DME-VIB; World; Peripheral groups & visibility; Yeast (CMPG & Bio)]
Project structure
WP1. Candidate genes
WP2. Regulatory modules
WP3. Cellular networks
Cases: human genetics; glucose regulation; VitD modes of action; Salmonella systems biology
Data analysis: primary analysis; gene prioritization; motif analysis; network inference
Data generation: cDNA/Affy; CGH; ChIP-chip; proteomics; metabolomics
Project structure (SysBio -> 3 partners)
Genetical genomics
Endocrinology
Salmonella genomics
WP1. Candidate gene prioritization
Human genetics identifies key genes in monogenic and multifactorial diseases
[Figure labels: high-throughput genomics; statistics & data mining; candidate genes (?); numbered steps 1-5]
Algorithms: statistical analysis; gene prioritization; module analysis
Technologies: CGH; cDNA/Affy
WP2. Module discovery
[Figure labels: ACTC, MYLA, MYL1, MYOG, MYF6, CHRM2, MEF2, MYOD, SRF]
Algorithms: statistical analysis; gene prioritization; motif analysis; Bayesian networks
Technologies: cDNA/Affy; CGH; ChIP; proteomics; metabolomics
VitD affects bone and calcium homeostasis and has potent anti-proliferative effects
[Figure: chemical structure with OH groups; numbered steps 1-5]
Cells/tissues treated with 1,25-(OH)2D3
Identification of signalling cascades and transcription factors important for the effects of 1,25-(OH)2D3
Validation of transcription factor (TF) binding to detected motifs
mRNA expression analysis in pancreatic beta cells: finding mechanisms of diabetes
Mouse models for a common human disease
mRNA expression profiles of normal & diabetic beta cells
Discovery of new modules for post-transcriptional gene regulation
Generation of antibodies
Functional analysis of beta cells
[Figure: signal log ratio of mRNA in beta cells versus other tissues (non-beta cells, brain, pituitary, lung, kidney, fat, liver, muscle); color scale from <-2.5 to >2.5; numbered steps 1-5]
Algorithms: statistical analysis; gene prioritization; motif analysis
Technologies: Affymetrix Gene System
Microarray data
ChIP-chip data
Sequence data
[Figure: ChIP-chip workflow: library of strains, each with a tagged regulator; chromatin IP to enrich promoters bound by regulator in vivo; microarray to identify promoters bound by regulator in vivo]
Network inference: REMODISCOVERY
Combinatorial algorithm
Functional class: p-value (seed profile)
10 CELL CYCLE AND DNA PROCESSING: 0; 10.03 cell cycle: 2.7e-5; 10.01 DNA processing: 1.3e-4; 42.04 cytoskeleton: 4.2e-3
40 CELL FATE: 5.2e-4; 40.01 cell growth / morphogenesis: 2.6e-3; 43 CELL TYPE DIFFERENTIATION: 5.2e-3; 43.01 fungal/microorganismic cell type differentiation: 5.2e-3; 34.11 cellular sensing and response: 5.3e-3; 01.05.01 C-compound and carbohydrate utilization: 6.8e-3; 10.03.04.03 chromosome condensation: 9.4e-3
43 CELL TYPE DIFFERENTIATION: 3.6e-3; 43.01 fungal/microorganismic cell type differentiation: 3.6e-3; 10.03.03 cytokinesis (cell division) / septum formation: 4.8e-3
32.01 stress response: 3.2e-3; 10.03 cell cycle: 8.7e-3
WP3. Network inference
Salmonella is a powerful model for systems biology (illustration size)
Algorithms: statistical analysis; gene prioritization; module analysis; network inference
Technologies: cDNA/Affy; CGH; ChIP; proteomics; metabolomics
[Figure: ChIP-chip workflow: library of strains, each with a tagged regulator; chromatin IP to enrich promoters bound by regulator in vivo; microarray to identify promoters bound by regulator in vivo]
[Figure: three matrices with one row per gene (Gene 1 ... Gene n): a 0/1 matrix with columns TF1 ... TFm, a 0/1 matrix with columns M1 ... Mp, and a matrix with columns E1 ... Ex]
Heterogeneous data
Preprocessing
Motif compendium
Inferred network
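A minimal sketch of how the three gene-indexed matrices on this slide (regulators TF1 ... TFm, motifs M1 ... Mp, experiments E1 ... Ex) could be combined into one module. It is not the algorithm used in the project: the seed-based selection, the correlation threshold, and all names (`seed_module`, `min_corr`, the toy genes) are assumptions for illustration. Each 0/1 matrix row is stored as the set of columns that hold a 1, and expression profiles are assumed to have non-zero variance.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def seed_module(seed, expression, regulators, motifs, min_corr=0.8):
    """Genes co-expressed with the seed, plus regulators and motifs shared by all of them."""
    genes = [g for g in expression
             if pearson(expression[seed], expression[g]) >= min_corr]
    shared_tfs = set.intersection(*(regulators[g] for g in genes))
    shared_motifs = set.intersection(*(motifs[g] for g in genes))
    return genes, shared_tfs, shared_motifs

# Hypothetical toy data: rows of the expression, regulator and motif matrices.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [1.1, 2.1, 2.9, 4.2],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [0.9, 2.2, 3.1, 3.9],
}
regulators = {"gene1": {"TF1", "TF2"}, "gene2": {"TF1"},
              "gene3": {"TF3"}, "gene4": {"TF1", "TF2"}}
motifs = {"gene1": {"M1"}, "gene2": {"M1", "M2"},
          "gene3": {"M3"}, "gene4": {"M1"}}

genes, shared_tfs, shared_motifs = seed_module("gene1", expression, regulators, motifs)
print(genes, shared_tfs, shared_motifs)
```

Requiring agreement across all three data types is the design point: a gene enters the module only if its expression follows the seed, and a regulator or motif is reported only if every module gene carries it.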
Toucan 2
CGHGate
Endeavour
Real biological impact
Screenshots of titles of papers demonstrating a real biological impact of bioinformatics methods?
Bioi@SCD growth
Turnover since 1998
[Chart: turnover per funding channel, 1998-2009 (IWT, FWO, EU, DWTC, BOF); vertical axis from 0 to 1,400,000]
CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue
Dept. of Mol. Microbiology
• J. Thevelein
CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven
Lab of Clin. & Evolut. Virology
• A. Vandamme
Dept. of Transgene Tech. & Gene Therapy
• P. Carmeliet
CME-UZ
• JJ. Cassiman (CME-KUL)
• J. Vermeesch
Intensive Care
• G. Van Den Berghe
Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman
[Figure: links between groups, each labeled "Paper"]
Lab of Functional Biology
• J. Winderickx
LEGENDO
• C. Mathieu
CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue
Lab of Clin. & Evolut. Virology
• A. Vandamme
QuantPsy
• I. Van Mechelen
Mol. Cell Biology
BioChemistry
• F. Schuit
BioStat
• G. Verbeke
Dept. of Mol. Microbiology
• J. Thevelein
Dept. of Transgene Tech. & Gene Therapy
• P. Carmeliet
CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven
CME-UZ
• JJ. Cassiman
• J. Vermeesch
Intensive Care
• G. Van Den Berghe
Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman
[Figure: seven "CoE" labels]
European bioinformatics landscape
Integration bioinformatics & stats
Algorithmic methodologies
Three topics of excellence
Bioinformatics & microarrays
1. Gene prioritization by integrative genomics
2. Graphical models of regulatory motifs and modules
3. Bayesian networks for prokaryotic systems biology
(1) Genomic data fusion
After an experiment, many sources of information are available to select the best candidates for modeling and validation
Probabilistic methods can optimize the prioritization
[Figure labels: known genes related to a disease or pathway; candidate genes (locus, screening); multiple data sources (sequence, expression, function)]
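A minimal sketch of prioritizing candidate genes across several data sources. The slide does not say how rankings are fused, so averaging rank ratios is a simple stand-in, not the method behind Endeavour. The function names, the scores and the gene names are hypothetical: each score is assumed to measure how similar a candidate is to the known genes in one data source.

```python
from statistics import mean

def rank_candidates(scores):
    """Map each gene to its rank ratio: 1/n for the best score, 1 for the worst."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}

def prioritize(per_source_scores):
    """Fuse per-source rankings by averaging rank ratios (lower is better)."""
    ranked = [rank_candidates(s) for s in per_source_scores.values()]
    genes = set.intersection(*(set(r) for r in ranked))
    fused = {g: mean(r[g] for r in ranked) for g in genes}
    # Sort by fused rank, breaking ties by gene name for reproducibility.
    return sorted(fused, key=lambda g: (fused[g], g))

# Hypothetical similarity of each candidate to the known genes, per data source.
scores = {
    "sequence":   {"geneA": 0.9, "geneB": 0.2, "geneC": 0.5},
    "expression": {"geneA": 0.7, "geneB": 0.1, "geneC": 0.8},
    "function":   {"geneA": 0.6, "geneB": 0.4, "geneC": 0.3},
}
ranking = prioritize(scores)
print(ranking)  # ['geneA', 'geneC', 'geneB']
```

Working on ranks rather than raw scores keeps sources with different scales comparable, which is the main reason to fuse at the ranking level.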
Endeavour [Methodological impact]
http://www.esat.kuleuven.ac.be/endeavour
(2) Regulatory modules [what is a module? What is transcript. regulation?]
© Davidson EH et al. Science. 2002 Mar 1;295(5560):1669-78.
Gibbs motif finding
Initialization: sequences, random motif matrix
Iteration: sequence scoring, alignment update, motif instances, motif matrix
Termination: convergence of the alignment and of the motif matrix
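The initialization and iteration steps above can be sketched as a basic Gibbs sampler with one motif instance per sequence. This is a simplified illustration, not MotifSampler: it assumes uppercase ACGT sequences at least `width` long, uses no background model, and stops after a fixed number of iterations instead of testing convergence. All names and parameters are hypothetical.

```python
import random

ALPHABET = "ACGT"

def motif_matrix(seqs, starts, width, skip=None, pseudo=1.0):
    """Per-column base frequencies of the current motif instances, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[s + j]] += 1
    return [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]

def gibbs_motif(seqs, width, iterations=200, seed=0):
    rng = random.Random(seed)
    # Initialization: one random motif position per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Motif matrix estimated from all other sequences.
            pwm = motif_matrix(seqs, starts, width, skip=i)
            # Sequence scoring: likelihood of every window under the matrix.
            weights = []
            for s in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= pwm[j][seq[s + j]]
                weights.append(w)
            # Alignment update: sample a new position in proportion to its score.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, motif_matrix(seqs, starts, width)

# Hypothetical toy sequences.
seqs = ["TTGACGTCAT", "GGTGACGTCA", "ATGACGTCGG", "CCCTGACGTC"]
starts, pwm = gibbs_motif(seqs, width=6)
print(starts)
```

Sampling the new position instead of always taking the best-scoring window is what lets the procedure escape a poor random start.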
MotifSampler & TOUCAN
(3) Network inference
Reconstruction of the regulatory network underlying the phenotypic behavior
High-throughput data
Benchmarking network inference methods
Network simulation: realistic network structures, realistic network dynamics, simulated networks
Network inference: graphical models, system identification, inferred networks
[Equation for the network dynamics not recoverable from the extraction; symbols visible: v, v_max, A, K, s]
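A minimal sketch of the benchmarking idea: because the simulated network is known, the inferred edges can be scored against it. Precision and recall are one common choice and are an assumption here, since the slide does not name a scoring measure. The edge lists are hypothetical.

```python
def benchmark(true_edges, inferred_edges):
    """Score inferred regulator->target edges against the simulated (true) network."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    tp = len(true_edges & inferred_edges)  # correctly inferred edges
    precision = tp / len(inferred_edges) if inferred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    return precision, recall

# Hypothetical simulated network and the output of an inference method.
true_net = {("TF1", "gene1"), ("TF1", "gene2"), ("TF2", "gene3"), ("TF3", "gene4")}
inferred = {("TF1", "gene1"), ("TF2", "gene3"), ("TF2", "gene4")}

precision, recall = benchmark(true_net, inferred)
print(precision, recall)
```

Here two of the three inferred edges are correct and two of the four true edges are found, so precision is 2/3 and recall is 1/2.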
Workpackages
WP1: Candidate genes
Preliminary data analysis
Microarrays (xM1.1), generic
CGH microarrays (gWP1), genetical genomics
Dealing with noise (xM2.1)
Knowledge mining (gWP2) & combined modeling of different data sets (xM2.3), genetical genomics; generic -> WP3: Salmonella
Software & databases (xM1.4)
Workpackages
WP2: Regulatory modules
Motif and module discovery (xM1.2)
Expression profiling in vitD and analogs pathways (xM3.1, xM3.2)
Beta cell regulation: transcriptional regulation, post-transcriptional regulation
Genetic modules: multiple genome scans and gene modifiers?
Software & databases (xM1.4)
WP3: Cellular networks
Network inference (xM1.3)
Salmonella high-throughput technologies (xM4.1)
Salmonella high-throughput data and analysis (xM4.2)
VitD pathway modeling? Glucose sensing?
Detection of dependence relations (xM2.2)
Software & databases (xM1.4)
Bioi@SCD growth
Personnel since 1998
[Chart: personnel 1998-2005 by category (PhD, Postdoc, ZAP); vertical axis from 0 to 25; horizontal axis quarterly from Jul-98 to Jan-05]
Bioi@SCD growth
Publications since 1998
[Chart: number of publications per year, 1999-2005, by type (books, conference, journal); vertical axis from 0 to 20]
Bioi@SCD growth
5 successful PhDs
Gert Thijs (June 2003): Probabilistic methods to search for regulatory elements in sets of coregulated genes
Frank De Smet (May 2004): Microarrays: algorithms for knowledge discovery in oncology and molecular biology
Stein Aerts (May 2004): Computational discovery of cis-regulatory modules in animal genomes
Geert Fannes (June 2004): Bayesian learning with expert knowledge: Transforming informative priors between Bayesian networks and multilayer perceptrons
Patrick Glenisson (June 2004): Integrating scientific literature with large scale gene expression analysis
Bioi@SCD growth
Software portal http://www.esat.kuleuven.ac.be/~dna/Bioi/
Number of users per month
[Chart: monthly users from Nov-00 to Feb-04; vertical axis from 0 to 1400]
Toucan 2
Endeavour
CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue
Dept. of Mol. Microbiology
• J. Thevelein
CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven
Intensive Care
• G. Van Den Berghe
Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman
[Figure labels on the links: IDO, BOF PostDoc; GBOU, PhD; Project, PhD, PostDoc]
CAGE
Flemish biotech companies
[Map, labeled 2005; cities shown: Bruges, Kortrijk, Ghent, Antwerp, Brussels, Leuven, Turnhout, Geel, Hasselt, Mechelen]
Bruges: Genencor International
Ghent: Ablynx, AlgoNomics, Applied Maths, Bayer BioScience, Bioin4matrix, BioMARIC, CropDesign, deVGen, Innogenetics, Maize Technologies Int'l, Methexis Genomics, Xcellentis, Yakult, Peakadilly
Antwerp: DCI-labs, Flen Pharma, Histogenex, Memo Bead Technologies
Turnhout: DiaMed EuroGen, Janssen Pharmaceutica
Geel: Barrier Therapeutics, Genzyme Flanders, Maia Scientific
Mechelen: Bio-Art, CryoSave, Galapagos Genomics, Tibotec, Virco
Brussels: Beta-cell, Dentech, EggCentris, R.E.D. Laboratories
Leuven: 4AZA Bioscience, Diatos, Neurogenetics, PharmaDM, reMynd, RNA-TEC, Thromb-X, Tigenix, Vivactis
Project structure – budget (750 KEuro?)
Candidate genes, PI:
Regulatory modules, PI:
Cellular networks, PI:
Cases: genetical genomics; endocrinology; Salmonella genomics
Algorithmic research: statistical analysis; gene prioritization; motif analysis; Bayesian networks
Data generation: cDNA/Affy; CGH; ChIP-chip; proteomics; metabolomics
Staff: Postdoc 1, Postdoc 2, Postdoc 3; PhD 1, PhD 2, PhD 3, PhD 4; Techn 1, Techn 2, Techn 3
Miscellaneous
First citations with "bioinformatics"
Trends Biotechnol 1993
Ann N Y Acad Sci 1993
Network reconstruction based on heterogeneous data
Microarray data
ChIP-chip data
Sequence data
[Figure: ChIP-chip workflow: library of strains, each with a tagged regulator; chromatin IP to enrich promoters bound by regulator in vivo; microarray to identify promoters bound by regulator in vivo]
Preprocessing
Network inference
Benchmarking network inference methodologies
Network structures based on real biological networks
Realistic network dynamics
Simulated networks
[Equation for the network dynamics not recoverable from the extraction; symbols visible: v, v_max, A, K, s]
Columns: R, M, Functional class: p-value, Seed profile

Module 1
R: Mbp1, Swi6, Swi4, Stb1
M: M_18 (Mbp1), M_12 (Mbp1), M_11 (Swi4), M_67 (Swi4)
10 CELL CYCLE AND DNA PROCESSING: 0
10.03 cell cycle: 2.7e-5
10.01 DNA processing: 1.3e-4
42.04 cytoskeleton: 4.2e-3

Module 2
R: Swi4, Mbp1, Swi6, FKH2
M: M_18 (Mbp1), M_12 (Mbp1), M_11 (Swi4), M_8 (Mcm)
40 CELL FATE: 5.2e-4
40.01 cell growth / morphogenesis: 2.6e-3
43 CELL TYPE DIFFERENTIATION: 5.2e-3
43.01 fungal/microorganismic cell type differentiation: 5.2e-3
34.11 cellular sensing and response: 5.3e-3
01.05.01 C-compound and carbohydrate utilization: 6.8e-3
10.03.04.03 chromosome condensation: 9.4e-3

Module 3
R: NDD1, FKH2, Mcm1
M: M_8 (Mcm), M_30 (Mcm)
43 CELL TYPE DIFFERENTIATION: 3.6e-3
43.01 fungal/microorganismic cell type differentiation: 3.6e-3
10.03.03 cytokinesis (cell division) / septum formation: 4.8e-3

Module 4
R: Swi5 (Ace2)
M: M_8 (Mcm)
32.01 stress response: 3.2e-3
10.03 cell cycle: 8.7e-3
Now: the molecular pipeline
Powerful high-throughput technologies enable genome-wide screening (sequencing, microarrays, etc.)
Some genes are selected (arbitrarily) for validation
After a long validation, the best-known genes are integrated into a biological model (building predictive models on a limited set of genes is not the subject of the project)
Screen -> Validate -> Model
Future: the systems genomics pipeline
By integrating computation tightly with biological experiments, promising genes are selected and integrated into computational models to retain only the best candidates for validation
There is a continuous interchange between the different levels of analysis
Screen -> Select -> Model -> Validate