Integration of diverse large-scale datasets

Upload: lars-juhl-jensen

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Integration of diverse large-scale datasets

Integration of diverse large-scale datasets

Page 2: Integration of diverse large-scale datasets

Lars Juhl Jensen

Page 3: Integration of diverse large-scale datasets
Page 4: Integration of diverse large-scale datasets
Page 5: Integration of diverse large-scale datasets
Page 6: Integration of diverse large-scale datasets

promoter analysis

Page 7: Integration of diverse large-scale datasets

Jensen et al., Bioinformatics, 2000

Page 8: Integration of diverse large-scale datasets

DNA structure

Page 9: Integration of diverse large-scale datasets

genome visualization

Page 10: Integration of diverse large-scale datasets

Pedersen et al., Journal of Molecular Biology, 2000

Page 11: Integration of diverse large-scale datasets

microarray normalization

Page 12: Integration of diverse large-scale datasets

Workman et al., Genome Biology, 2002

Page 13: Integration of diverse large-scale datasets

protein function prediction

Page 14: Integration of diverse large-scale datasets
Page 15: Integration of diverse large-scale datasets
Page 16: Integration of diverse large-scale datasets
Page 17: Integration of diverse large-scale datasets
Page 18: Integration of diverse large-scale datasets

STRING

Page 19: Integration of diverse large-scale datasets
Page 20: Integration of diverse large-scale datasets

integrate diverse evidence

Page 21: Integration of diverse large-scale datasets

functional interactions

Page 22: Integration of diverse large-scale datasets

Bork et al., Current Opinion in Structural Biology, 2005

Page 23: Integration of diverse large-scale datasets

179 proteomes

Page 24: Integration of diverse large-scale datasets

evolution

Page 25: Integration of diverse large-scale datasets
Page 26: Integration of diverse large-scale datasets
Page 27: Integration of diverse large-scale datasets

statistics

Page 28: Integration of diverse large-scale datasets

(the original sin)

Page 29: Integration of diverse large-scale datasets

prokaryotes

Page 30: Integration of diverse large-scale datasets

genomic context methods

Page 31: Integration of diverse large-scale datasets

gene fusion

Page 32: Integration of diverse large-scale datasets
Page 33: Integration of diverse large-scale datasets

gene neighborhood

Page 34: Integration of diverse large-scale datasets
Page 35: Integration of diverse large-scale datasets

phylogenetic profiles

Page 36: Integration of diverse large-scale datasets
Page 37: Integration of diverse large-scale datasets
Page 38: Integration of diverse large-scale datasets
Page 39: Integration of diverse large-scale datasets
Page 40: Integration of diverse large-scale datasets

Cell

Cellulosomes

Cellulose

Page 41: Integration of diverse large-scale datasets

eukaryotes

Page 42: Integration of diverse large-scale datasets

integrate diverse datasets

Page 43: Integration of diverse large-scale datasets

Jensen et al., Drug Discovery Today: Targets, 2004

Page 44: Integration of diverse large-scale datasets

curated knowledge

Page 45: Integration of diverse large-scale datasets

MIPS (Munich Information Center for Protein Sequences)

Page 46: Integration of diverse large-scale datasets

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Page 47: Integration of diverse large-scale datasets

STKE (Signal Transduction Knowledge Environment)

Page 48: Integration of diverse large-scale datasets

Reactome

Page 49: Integration of diverse large-scale datasets

literature mining

Page 50: Integration of diverse large-scale datasets

MEDLINE

Page 51: Integration of diverse large-scale datasets

SGD (Saccharomyces Genome Database)

Page 52: Integration of diverse large-scale datasets

The Interactive Fly

Page 53: Integration of diverse large-scale datasets

OMIM (Online Mendelian Inheritance in Man)

Page 54: Integration of diverse large-scale datasets

co-mentioning
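
As a rough illustration of co-mentioning, the sketch below counts how many abstracts mention two gene names together. The abstracts, the name list, and the function name `count_comentions` are invented for this example; real literature mining would also need synonym lists and proper name recognition.

```python
from collections import Counter
from itertools import combinations
import re

def count_comentions(abstracts, names):
    """Count, for each pair of names, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        # Crude tokenization: uppercase alphanumeric runs.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        present = sorted(n for n in names if n.upper() in tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Toy abstracts, invented for this example (not real MEDLINE records).
abstracts = [
    "Expression of CYC1 and CYC7 is controlled by HAP1.",
    "HAP1 binds upstream of CYC1.",
    "GAL4 activates galactose metabolism.",
]
counts = count_comentions(abstracts, ["CYC1", "CYC7", "HAP1", "GAL4"])
```

Pairs that co-occur in more abstracts than expected by chance are candidates for a functional link; the raw count alone says nothing about the type of relation.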

Page 55: Integration of diverse large-scale datasets

NLP (Natural Language Processing)

Page 56: Integration of diverse large-scale datasets

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxgene The GAL4 gene]

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Page 57: Integration of diverse large-scale datasets
Page 58: Integration of diverse large-scale datasets

primary experimental data

Page 59: Integration of diverse large-scale datasets

microarray expression data

Page 60: Integration of diverse large-scale datasets

GEO (Gene Expression Omnibus)

Page 61: Integration of diverse large-scale datasets

physical protein interactions

Page 62: Integration of diverse large-scale datasets

BIND (Biomolecular Interaction Network Database)

Page 63: Integration of diverse large-scale datasets

MINT (Molecular Interactions Database)

Page 64: Integration of diverse large-scale datasets

GRID (General Repository for Interaction Datasets)

Page 65: Integration of diverse large-scale datasets

DIP (Database of Interacting Proteins)

Page 66: Integration of diverse large-scale datasets

HPRD (Human Protein Reference Database)

Page 67: Integration of diverse large-scale datasets

problems

Page 68: Integration of diverse large-scale datasets

many sources

Page 69: Integration of diverse large-scale datasets

(different gene identifiers)

Page 70: Integration of diverse large-scale datasets

many types of evidence

Page 71: Integration of diverse large-scale datasets

questionable quality

Page 72: Integration of diverse large-scale datasets

not directly comparable

Page 73: Integration of diverse large-scale datasets

spread over many species

Page 74: Integration of diverse large-scale datasets

huge synonym lists

Page 75: Integration of diverse large-scale datasets

calculate raw quality scores

Page 76: Integration of diverse large-scale datasets

calibrate vs. gold standard

Page 77: Integration of diverse large-scale datasets

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Page 78: Integration of diverse large-scale datasets

von Mering et al., Nucleic Acids Research, 2005

Page 79: Integration of diverse large-scale datasets

transfer based on orthology

Page 80: Integration of diverse large-scale datasets

combine all evidence
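
One common way to combine calibrated confidence scores from several evidence types is to treat them as independent and compute the probability that at least one of them is right. The sketch below assumes that scheme; the slides do not spell out the formula, and the function name `combine_scores` and the example values are made up.

```python
from math import prod

def combine_scores(scores):
    """Combine per-evidence confidence scores, assuming independent evidence."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
    # Probability that not every line of evidence is wrong.
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical scores for one protein pair from three evidence types.
combined = combine_scores([0.5, 0.4, 0.2])
```

Under this assumption the combined score is never lower than the best single score, so several weak lines of evidence can add up to a confident link.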

Page 81: Integration of diverse large-scale datasets

Bork et al., Current Opinion in Structural Biology, 2005

Page 82: Integration of diverse large-scale datasets

cell cycle

Page 83: Integration of diverse large-scale datasets

qualitative modeling

Page 84: Integration of diverse large-scale datasets
Page 85: Integration of diverse large-scale datasets

Chen et al., Molecular Biology of the Cell, 2004

Page 86: Integration of diverse large-scale datasets

Chen et al., Molecular Biology of the Cell, 2004

Page 87: Integration of diverse large-scale datasets

synchronized cell culture

Page 88: Integration of diverse large-scale datasets
Page 89: Integration of diverse large-scale datasets

microarray time series

Page 90: Integration of diverse large-scale datasets
Page 91: Integration of diverse large-scale datasets

periodically expressed genes

Page 92: Integration of diverse large-scale datasets
Page 93: Integration of diverse large-scale datasets

S. cerevisiae

Page 94: Integration of diverse large-scale datasets

Cho et al.

Page 95: Integration of diverse large-scale datasets

Spellman et al.

Page 96: Integration of diverse large-scale datasets

numerous analysis methods

Page 97: Integration of diverse large-scale datasets

Cho et al.

Page 98: Integration of diverse large-scale datasets

Spellman et al.

Page 99: Integration of diverse large-scale datasets

Zhao et al.

Page 100: Integration of diverse large-scale datasets

Johansson et al.

Page 101: Integration of diverse large-scale datasets

Luan and Li

Page 102: Integration of diverse large-scale datasets

Lu et al.

Page 103: Integration of diverse large-scale datasets

Ahdesmäki et al.

Page 104: Integration of diverse large-scale datasets

Willbrand et al.

Page 105: Integration of diverse large-scale datasets

no benchmarking

Page 106: Integration of diverse large-scale datasets

de Lichtenberg et al., Bioinformatics, 2005

Page 107: Integration of diverse large-scale datasets

reproducibility

Page 108: Integration of diverse large-scale datasets

de Lichtenberg et al., Bioinformatics, 2005

Page 109: Integration of diverse large-scale datasets

regulation vs. periodicity

Page 110: Integration of diverse large-scale datasets

de Lichtenberg et al., Bioinformatics, 2005

Page 111: Integration of diverse large-scale datasets

list of 600 periodic genes

Page 112: Integration of diverse large-scale datasets

S. pombe

Page 113: Integration of diverse large-scale datasets

several expression studies

Page 114: Integration of diverse large-scale datasets

reproducibility

Page 115: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 116: Integration of diverse large-scale datasets

name inconsistencies

Page 117: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 118: Integration of diverse large-scale datasets

different analysis methods

Page 119: Integration of diverse large-scale datasets

no benchmarking

Page 120: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 121: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 122: Integration of diverse large-scale datasets

too many genes suggested

Page 123: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 124: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 125: Integration of diverse large-scale datasets

averaging better than voting
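
A toy contrast between the two ways of merging per-study results. Voting keeps a gene if it passes a threshold in enough studies; averaging ranks genes by their mean score. The gene names, scores, threshold, and function names are hypothetical and are not taken from the studies cited here.

```python
def vote(study_scores, threshold, min_votes):
    """Genes that pass the threshold in at least min_votes studies."""
    genes = set().union(*study_scores)
    return {
        g for g in genes
        if sum(s.get(g, 0.0) >= threshold for s in study_scores) >= min_votes
    }

def average_rank(study_scores):
    """Genes sorted by mean score across studies (a missing gene counts as 0)."""
    genes = set().union(*study_scores)
    mean = {
        g: sum(s.get(g, 0.0) for s in study_scores) / len(study_scores)
        for g in genes
    }
    return sorted(genes, key=lambda g: (-mean[g], g))

# Invented periodicity scores from three studies.
studies = [
    {"geneA": 0.9, "geneB": 0.45, "geneC": 0.1},
    {"geneA": 0.8, "geneB": 0.48, "geneC": 0.6},
    {"geneA": 0.7, "geneB": 0.49, "geneC": 0.0},
]
voted = vote(studies, threshold=0.5, min_votes=2)
ranked = average_rank(studies)
```

In this toy data geneB narrowly misses the threshold in every study, so voting gives it no support at all, while averaging still ranks it above geneC, which passed the threshold once but scored poorly elsewhere.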

Page 126: Integration of diverse large-scale datasets

Marguerat et al., Yeast, 2006

Page 127: Integration of diverse large-scale datasets

S. cerevisiae

Page 128: Integration of diverse large-scale datasets

list of 600 periodic genes

Page 129: Integration of diverse large-scale datasets

protein interaction data

Page 130: Integration of diverse large-scale datasets
Page 131: Integration of diverse large-scale datasets

von Mering et al., Nucleic Acids Research, 2005

Page 132: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 133: Integration of diverse large-scale datasets

dynamic proteins

Page 134: Integration of diverse large-scale datasets

static proteins

Page 135: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 136: Integration of diverse large-scale datasets

reproduces what is known

Page 137: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 138: Integration of diverse large-scale datasets

many detailed predictions

Page 139: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 140: Integration of diverse large-scale datasets

global trends

Page 141: Integration of diverse large-scale datasets

dynamic proteins

Page 142: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 143: Integration of diverse large-scale datasets

static proteins

Page 144: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 145: Integration of diverse large-scale datasets

just-in-time assembly

Page 146: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 147: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 148: Integration of diverse large-scale datasets

coordinated regulation

Page 149: Integration of diverse large-scale datasets

periodically expressed genes

Page 150: Integration of diverse large-scale datasets

Cdc28p substrates

Page 151: Integration of diverse large-scale datasets

PEST degradation signals

Page 152: Integration of diverse large-scale datasets

the human interactome

Page 153: Integration of diverse large-scale datasets

yeast two-hybrid

Page 154: Integration of diverse large-scale datasets

[Figure comparing Stelzl et al., Rual et al., and small-scale studies; values as extracted: 1936, 13, 4, 4, 1385, 65, 18465]

Page 155: Integration of diverse large-scale datasets

[Figure comparing Stelzl et al., Rual et al., and small-scale studies; values as extracted: 32, 0, 3, 4, 18, 4, 23]

Page 156: Integration of diverse large-scale datasets

[Figure comparing small-scale studies, Stelzl et al., and Rual et al.; values as extracted: 62, 8, 39; 852, 17, 473, 432, 69, 260]

Page 157: Integration of diverse large-scale datasets

3.5% and 21% sensitivity

Page 158: Integration of diverse large-scale datasets

in a couple of years

Page 159: Integration of diverse large-scale datasets

the human interactome

Page 160: Integration of diverse large-scale datasets

100% = 1/5?

Page 161: Integration of diverse large-scale datasets

the yeast interactome

Page 162: Integration of diverse large-scale datasets

five years ago

Page 163: Integration of diverse large-scale datasets

yeast two-hybrid

Page 164: Integration of diverse large-scale datasets

[Figure comparing Uetz et al., Ito et al., and small-scale studies; values as extracted: 1150, 117, 117, 72, 4053, 118, 4469]

Page 165: Integration of diverse large-scale datasets

[Figure comparing Uetz et al., Ito et al., and small-scale studies; values as extracted: 162, 53, 34, 72, 180, 29, 338]

Page 166: Integration of diverse large-scale datasets

[Figure comparing small-scale studies, Uetz et al., and Ito et al.; values as extracted: 511, 189, 616; 439, 178, 759, 897, 190, 1347]

Page 167: Integration of diverse large-scale datasets

19% and 12% sensitivity

Page 168: Integration of diverse large-scale datasets

the challenge

Page 169: Integration of diverse large-scale datasets

how to get from here …

Page 170: Integration of diverse large-scale datasets

[Figure comparing Stelzl et al., Rual et al., and small-scale studies; values as extracted: 1936, 13, 4, 4, 1385, 65, 18465]

Page 171: Integration of diverse large-scale datasets

… to there …

Page 172: Integration of diverse large-scale datasets

de Lichtenberg et al., Science, 2005

Page 173: Integration of diverse large-scale datasets

Acknowledgments

• The STRING team (EMBL)
– Christian von Mering
– Berend Snel
– Martijn Huynen
– Sean Hooper
– Mathilde Foglierini
– Julien Lagarde
– Peer Bork

• Literature mining project (EML Research)
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas

• Cell cycle studies (CBS)
– Ulrik de Lichtenberg
– Thomas Skøt Jensen
– Søren Brunak

• S. pombe cell cycle (Sanger)
– Samuel Marguerat
– Jürg Bähler

• Inspiration for presentation
– Lawrence Lessig
– Dick Clarence Hardt
– Anders Gorm Pedersen

Page 174: Integration of diverse large-scale datasets

Thank you!