Using networks to derive function
Systems Biology Workshop, Technical University of Denmark, Lyngby, Denmark, May 14-15, 2009
Lars Juhl Jensen
STRING
Jensen, Kuhn et al., Nucleic Acids Research, 2009
functional associations
Frishman et al., Modern Genome Annotation, 2009
common basis
630 genomes
model organism databases
Ensembl
RefSeq
genomic context methods
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
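A minimal sketch of the phylogenetic profile idea: genes that are present and absent in the same genomes are candidates for a functional association. The Jaccard similarity, the function name `profile_similarity`, and the toy profiles are assumptions for illustration; the slides do not say how profiles are actually scored.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is a list of 0/1 flags, one per genome, in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    # Genomes in which both genes are present.
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Genomes in which at least one of the two genes is present.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical presence (1) / absence (0) of three genes across six genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 1],
}

similar = profile_similarity(profiles["geneA"], profiles["geneB"])
dissimilar = profile_similarity(profiles["geneA"], profiles["geneC"])
```

Here `geneA` and `geneB` share an identical profile and would be flagged as associated, while `geneA` and `geneC` never co-occur.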
primary experimental data
protein interactions
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
BIND: Biomolecular Interaction Network Database
BioGRID: General Repository for Interaction Datasets
DIP: Database of Interacting Proteins
IntAct
MINT: Molecular Interactions Database
HPRD: Human Protein Reference Database
PDB: Protein Data Bank
inferred associations
gene coexpression
GEO: Gene Expression Omnibus
expression compendia
curated knowledge
complexes
MIPS: Munich Information Center for Protein Sequences
Gene Ontology
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
KEGG: Kyoto Encyclopedia of Genes and Genomes
MetaCyc
Reactome
PID: NCI-Nature Pathway Interaction Database
literature mining
MEDLINE
SGD: Saccharomyces Genome Database
The Interactive Fly
OMIM: Online Mendelian Inheritance in Man
co-mentioning
statistical methods
NLP: Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
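The simpler co-mentioning approach listed earlier can be sketched as counting how often two names appear in the same text. This is an illustrative sketch only: it assumes each abstract has already been reduced to the gene names it mentions, and the function name `comention_counts` and the toy documents are hypothetical. A real pipeline would add a statistical model on top of the raw counts.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents):
    """Count how many documents mention each unordered pair of names."""
    counts = Counter()
    for names in documents:
        # Deduplicate within a document and sort so each pair has one key.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


# Hypothetical abstracts, each reduced to the gene names it mentions.
documents = [
    ["CYC1", "HAP1"],
    ["CYC1", "CYC7", "HAP1"],
    ["GAL4"],
]

counts = comention_counts(documents)
```

Pairs that are co-mentioned more often than expected from their individual frequencies are the ones a statistical method would score highly.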
easy in theory …
… but not in practice
many data types
not comparable
variable quality
many sources
different file formats
different gene identifiers
partially redundant
spread over 630 genomes
quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
benchmarking
von Mering et al., Nucleic Acids Research, 2005
orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
COG mode
von Mering et al., Nucleic Acids Research, 2005
protein mode
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
Frishman et al., Modern Genome Annotation, 2009
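One way to picture combining evidence is sketched below. It assumes each evidence type yields a benchmarked score between 0 and 1 that can be read as a probability, and that the evidence types are independent. Both assumptions, along with the function name `combine_scores` and the example values, are illustrative; the slides do not spell out the exact formula used.

```python
from math import prod


def combine_scores(scores):
    """Combine per-evidence scores as 1 - product(1 - s_i).

    Assumes each score is a probability in [0, 1] and that the
    evidence types are independent of one another.
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
    # Probability that at least one line of evidence is correct.
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores for one protein pair from three evidence types.
evidence = {"neighborhood": 0.5, "coexpression": 0.5, "textmining": 0.0}
combined = combine_scores(evidence.values())
```

Under these assumptions, several weak but independent lines of evidence reinforce each other, and the combined score is never lower than the strongest single score.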
Acknowledgments
Christian von Mering
Michael Kuhn
Manuel Stark
Samuel Chaffron
Philippe Julien
Tobias Doerks
Jan Korbel
Berend Snel
Martijn Huynen
Peer Bork