integration of diverse large-scale datasets
Lars Juhl Jensen
promoter analysis
Jensen et al., Bioinformatics, 2000
DNA structure
genome visualization
Pedersen et al., Journal of Molecular Biology, 2000
microarray normalization
Workman et al., Genome Biology, 2002
protein function prediction
STRING
integrate diverse evidence
functional interactions
Bork et al., Current Opinion in Structural Biology, 2005
179 proteomes
evolution
statistics
(the original sin)
prokaryotes
genomic context methods
gene fusion
gene neighborhood
phylogenetic profiles
Cell
Cellulosomes
Cellulose
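A minimal sketch of the phylogenetic-profile idea listed above: genes whose orthologs are present and absent in the same genomes are candidates for a functional link. The gene names, genome names, and the choice of Jaccard similarity are illustrative assumptions, not the scoring scheme used in the talk.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence profiles (sets of genome names)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical toy data: the genomes in which each gene has an ortholog.
profiles = {
    "geneA": {"genome1", "genome2", "genome4"},
    "geneB": {"genome1", "genome2", "genome4"},
    "geneC": {"genome3"},
}


def most_similar_pairs(profiles):
    """Return (similarity, gene1, gene2) tuples, most similar first."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            pairs.append((jaccard(profiles[g1], profiles[g2]), g1, g2))
    return sorted(pairs, reverse=True)


ranked_pairs = most_similar_pairs(profiles)
```

Here geneA and geneB share an identical profile and rank first; a real analysis would also need to correct for the phylogenetic relatedness of the genomes.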
eukaryotes
integrate diverse datasets
Jensen et al., Drug Discovery Today: Targets, 2004
curated knowledge
MIPS (Munich Information Center for Protein Sequences)
KEGG (Kyoto Encyclopedia of Genes and Genomes)
STKE (Signal Transduction Knowledge Environment)
Reactome
literature mining
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
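A rough sketch of co-mentioning: count how often two gene names appear in the same abstract. The name list, the toy abstracts, and the simple whitespace tokenization are assumptions for illustration; real text mining has to deal with synonyms and ambiguous names.

```python
from collections import Counter
from itertools import combinations


def comention_counts(abstracts, names):
    """Count, for each pair of names, the abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        # Naive tokenization: strip commas and full stops, split on whitespace.
        tokens = set(text.replace(",", " ").replace(".", " ").split())
        present = sorted(n for n in names if n in tokens)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical toy corpus; only the first sentence is taken from the slides.
abstracts = [
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1.",
    "CYC1 and HAP1 were both studied.",
    "The GAL4 gene was mentioned alone.",
]
names = {"CYC1", "CYC7", "HAP1", "GAL4"}

counts = comention_counts(abstracts, names)
```

Pairs are stored in alphabetical order, so `counts[("CYC1", "HAP1")]` gives the number of abstracts that mention both names.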
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
primary experimental data
microarray expression data
GEO (Gene Expression Omnibus)
physical protein interactions
BIND (Biomolecular Interaction Network Database)
MINT (Molecular Interactions Database)
GRID (General Repository for Interaction Datasets)
DIP (Database of Interacting Proteins)
HPRD (Human Protein Reference Database)
problems
many sources
(different gene identifiers)
many types of evidence
questionable quality
not directly comparable
spread over many species
huge synonym lists
calculate raw quality scores
calibrate vs. gold standard
KEGG (Kyoto Encyclopedia of Genes and Genomes)
von Mering et al., Nucleic Acids Research, 2005
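One way to picture calibration against a gold standard: rank the predicted pairs by raw score, split them into bins, and record the fraction of each bin confirmed by the gold standard. This is a simplified sketch; the bin layout, the toy pairs, and treating every pair outside the gold set as a negative are assumptions, not the procedure described in the cited paper.

```python
def calibrate(scored_pairs, gold_positive, n_bins=2):
    """Return (mean raw score, fraction in gold standard) for each bin.

    scored_pairs: list of (pair, raw_score) tuples.
    gold_positive: set of pairs regarded as true (e.g. same pathway).
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    size = max(1, len(ranked) // n_bins)
    bins = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        mean_score = sum(score for _, score in chunk) / len(chunk)
        bins.append((mean_score, hits / len(chunk)))
    return bins


# Hypothetical toy data: four scored pairs, three of them in the gold standard.
scored = [(("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "d"), 0.2), (("c", "d"), 0.1)]
gold = {("a", "b"), ("a", "c"), ("b", "d")}

calibration = calibrate(scored, gold)
```

The resulting table maps raw scores from any evidence type onto one comparable scale, the fraction of gold-standard pairs, which is what makes scores from different sources combinable.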
transfer based on orthology
combine all evidence
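A common way to combine calibrated scores from independent evidence types is a noisy-OR: the combined probability is one minus the probability that every source is wrong. The sketch below assumes the scores are probabilities and that the sources are independent; the slides do not spell out the exact formula, so treat it as an illustration only.

```python
from math import prod


def combine_scores(scores):
    """Noisy-OR combination of probabilities from independent evidence."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores for one protein pair from three evidence types.
combined = combine_scores([0.5, 0.5, 0.0])
```

Two moderately reliable sources (0.5 each) give a combined score of 0.75, so weak but concordant evidence adds up, and a source with score 0 leaves the result unchanged.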
Bork et al., Current Opinion in Structural Biology, 2005
cell cycle
qualitative modeling
Chen et al., Molecular Biology of the Cell, 2004
synchronized cell culture
microarray time series
periodically expressed genes
S. cerevisiae
Cho et al.
Spellman et al.
numerous analysis methods
Cho et al.
Spellman et al.
Zhao et al.
Johansson et al.
Luan and Li
Lu et al.
Ahdesmäki et al.
Willbrand et al.
no benchmarking
de Lichtenberg et al., Bioinformatics, 2005
reproducibility
de Lichtenberg et al., Bioinformatics, 2005
regulation vs. periodicity
de Lichtenberg et al., Bioinformatics, 2005
list of 600 periodic genes
S. pombe
several expression studies
reproducibility
Marguerat et al., Yeast, 2006
name inconsistencies
Marguerat et al., Yeast, 2006
different analysis methods
no benchmarking
Marguerat et al., Yeast, 2006
too many genes suggested
Marguerat et al., Yeast, 2006
averaging better than voting
Marguerat et al., Yeast, 2006
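To make the contrast between averaging and voting concrete, the sketch below combines per-study gene rankings in both ways. The rankings, the thresholds, and the use of mean rank are assumptions for illustration; the slides do not say which quantity was averaged.

```python
def average_rank(rankings):
    """Order genes by mean rank across studies (1 = most periodic)."""
    genes = set.intersection(*(set(r) for r in rankings))
    return sorted(
        genes,
        key=lambda g: (sum(r[g] for r in rankings) / len(rankings), g),
    )


def vote(rankings, top_n, min_votes):
    """Keep genes ranked within top_n by at least min_votes studies."""
    votes = {}
    for r in rankings:
        for gene, rank in r.items():
            if rank <= top_n:
                votes[gene] = votes.get(gene, 0) + 1
    return sorted(g for g, v in votes.items() if v >= min_votes)


# Hypothetical rankings of four genes in three expression studies.
studies = [
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 2, "b": 1, "c": 4, "d": 3},
    {"a": 3, "b": 4, "c": 1, "d": 2},
]

by_average = average_rank(studies)
by_vote = vote(studies, top_n=2, min_votes=2)
```

Averaging uses every study's full ranking and yields a graded list, whereas voting discards everything except whether a gene passed each study's cutoff.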
S. cerevisiae
list of 600 periodic genes
protein interaction data
von Mering et al., Nucleic Acids Research, 2005
de Lichtenberg et al., Science, 2005
dynamic proteins
static proteins
de Lichtenberg et al., Science, 2005
reproduces what is known
de Lichtenberg et al., Science, 2005
many detailed predictions
de Lichtenberg et al., Science, 2005
global trends
dynamic proteins
de Lichtenberg et al., Science, 2005
static proteins
de Lichtenberg et al., Science, 2005
just-in-time assembly
de Lichtenberg et al., Science, 2005
coordinated regulation
periodically expressed genes
Cdc28p substrates
PEST degradation signals
the human interactome
yeast two-hybrid
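(Venn diagram of Stelzl et al., Rual et al., and small-scale studies; counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465)
(Venn diagram of Stelzl et al., Rual et al., and small-scale studies; counts as extracted: 32, 0, 3, 4, 18, 4, 23)
(figure: small-scale studies, Stelzl et al., Rual et al.; counts as extracted: 62, 8, 39)
(figure; counts as extracted: 852, 17, 473, 432, 69, 260)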
3.5% and 21% sensitivity
in a couple of years
the human interactome
100% = 1/5?
the yeast interactome
five years ago
yeast two-hybrid
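(Venn diagram of Uetz et al., Ito et al., and small-scale studies; counts as extracted: 1150, 117, 117, 72, 4053, 118, 4469)
(Venn diagram of Uetz et al., Ito et al., and small-scale studies; counts as extracted: 162, 53, 34, 72, 180, 29, 338)
(figure: small-scale studies, Uetz et al., Ito et al.; counts as extracted: 511, 189, 616)
(figure; counts as extracted: 439, 178, 759, 897, 190, 1347)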
19% and 12% sensitivity
the challenge
how to get from here …
(Venn diagram of Stelzl et al., Rual et al., and small-scale studies; counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465)
… to there …
de Lichtenberg et al., Science, 2005
Acknowledgments
• The STRING team (EMBL)
  – Christian von Mering
  – Berend Snel
  – Martijn Huynen
  – Sean Hooper
  – Mathilde Foglierini
  – Julien Lagarde
  – Peer Bork
• Literature mining project (EML Research)
  – Jasmin Saric
  – Rossitza Ouzounova
  – Isabel Rojas
• Cell cycle studies (CBS)
  – Ulrik de Lichtenberg
  – Thomas Skøt Jensen
  – Søren Brunak
• S. pombe cell cycle (Sanger)
  – Samuel Marguerat
  – Jürg Bähler
• Inspiration for presentation
  – Lawrence Lessig
  – Dick Clarence Hardt
  – Anders Gorm Pedersen
Thank you!