STRING Large-scale data and text mining Lars Juhl Jensen

Post on 04-Jan-2016

Page 1: STRING Large-scale data and text mining

STRING

Large-scale data and text mining

Lars Juhl Jensen

Page 2: STRING Large-scale data and text mining

association networks

Page 3: STRING Large-scale data and text mining

guilt by association

Page 4: STRING Large-scale data and text mining
Page 5: STRING Large-scale data and text mining

biological systems

Page 6: STRING Large-scale data and text mining

protein networks

Page 7: STRING Large-scale data and text mining

STRING

Page 8: STRING Large-scale data and text mining

1100+ genomes

Page 9: STRING Large-scale data and text mining

computational predictions

Page 10: STRING Large-scale data and text mining

gene fusion

Page 11: STRING Large-scale data and text mining

Korbel et al., Nature Biotechnology, 2004

Page 12: STRING Large-scale data and text mining

gene neighborhood

Page 13: STRING Large-scale data and text mining

Korbel et al., Nature Biotechnology, 2004

Page 14: STRING Large-scale data and text mining

phylogenetic profiles

Page 15: STRING Large-scale data and text mining

Korbel et al., Nature Biotechnology, 2004
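
The idea behind phylogenetic profiles can be sketched in a few lines: each gene is described by the set of genomes in which it occurs, and genes with similar presence/absence patterns are candidate functional partners. The gene names, the toy profiles, and the choice of Jaccard similarity are illustrative assumptions, not the scoring scheme STRING itself uses.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two profiles given as sets of genome names."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical toy data: gene -> genomes in which an ortholog is present.
profiles = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g3", "g5"},
}


def rank_partners(query, profiles):
    """Rank all other genes by profile similarity to the query gene."""
    scores = [
        (other, jaccard(profiles[query], profile))
        for other, profile in profiles.items()
        if other != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Calling `rank_partners("geneA", profiles)` puts geneB first, because the two genes occur in exactly the same genomes.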

Page 16: STRING Large-scale data and text mining

experimental data

Page 17: STRING Large-scale data and text mining

gene coexpression
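
A minimal sketch of how coexpression is commonly quantified: the Pearson correlation between the expression profiles of two genes across the same set of samples. The function name and the input layout (one list of expression values per gene) are assumptions made for illustration; the measure actually used in STRING may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / math.sqrt(var_x * var_y)
```

Gene pairs whose correlation is consistently high across many experiments would then be treated as candidate associations.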

Page 18: STRING Large-scale data and text mining
Page 19: STRING Large-scale data and text mining

protein interactions

Page 20: STRING Large-scale data and text mining

Jensen & Bork, Science, 2008

Page 21: STRING Large-scale data and text mining

a real example

Page 22: STRING Large-scale data and text mining
Page 23: STRING Large-scale data and text mining
Page 24: STRING Large-scale data and text mining
Page 25: STRING Large-scale data and text mining

[Figure labels: Cell, Cellulosomes, Cellulose]

Page 26: STRING Large-scale data and text mining

curated knowledge

Page 27: STRING Large-scale data and text mining

complexes

Page 28: STRING Large-scale data and text mining

pathways

Page 29: STRING Large-scale data and text mining

Letunic & Bork, Trends in Biochemical Sciences, 2008

Page 30: STRING Large-scale data and text mining

many databases

Page 31: STRING Large-scale data and text mining

different formats

Page 32: STRING Large-scale data and text mining

different identifiers

Page 33: STRING Large-scale data and text mining

variable quality

Page 34: STRING Large-scale data and text mining

not comparable

Page 35: STRING Large-scale data and text mining

not same species

Page 36: STRING Large-scale data and text mining

hard work

Page 37: STRING Large-scale data and text mining

(Ph.D. students)

Page 38: STRING Large-scale data and text mining

common identifiers

Page 39: STRING Large-scale data and text mining

quality scores

Page 40: STRING Large-scale data and text mining

von Mering et al., Nucleic Acids Research, 2005

Page 41: STRING Large-scale data and text mining

score calibration

Page 42: STRING Large-scale data and text mining

von Mering et al., Nucleic Acids Research, 2005
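
Score calibration can be sketched as follows: raw scores from one evidence channel are binned, and for each bin the fraction of pairs that appear in a gold standard of known functional partners is reported. That fraction is comparable across channels even when the raw scores are not. The function name, the bin edges, and the toy data are hypothetical; in practice a curve would typically be fitted through the binned values rather than using them directly.

```python
import bisect


def calibrate(scored_pairs, gold_standard, bin_edges):
    """Return, per bin, the fraction of pairs that are in the gold standard.

    scored_pairs: iterable of (pair, raw_score), pair being a frozenset of two names.
    bin_edges: ascending lower bounds of the bins.
    Bins without any pairs yield None.
    """
    totals = [0] * len(bin_edges)
    hits = [0] * len(bin_edges)
    for pair, score in scored_pairs:
        idx = bisect.bisect_right(bin_edges, score) - 1
        if idx < 0:
            continue  # score lies below the lowest bin
        totals[idx] += 1
        if pair in gold_standard:
            hits[idx] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical toy data.
scored = [
    (frozenset({"A", "B"}), 0.9),
    (frozenset({"A", "C"}), 0.8),
    (frozenset({"B", "C"}), 0.2),
    (frozenset({"C", "D"}), 0.1),
]
gold = {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"C", "D"})}
```

With bins starting at 0.0 and 0.5, `calibrate(scored, gold, [0.0, 0.5])` reports that half of the low-scoring pairs and all of the high-scoring pairs are supported by the gold standard.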

Page 43: STRING Large-scale data and text mining

homology-based transfer

Page 44: STRING Large-scale data and text mining

Franceschini et al., Nucleic Acids Research, 2013

Page 45: STRING Large-scale data and text mining

missing most of the data

Page 46: STRING Large-scale data and text mining

text mining

Page 47: STRING Large-scale data and text mining

>10 km

Page 48: STRING Large-scale data and text mining

too much to read

Page 49: STRING Large-scale data and text mining

computer

Page 50: STRING Large-scale data and text mining

comprehensive lexicon

Page 51: STRING Large-scale data and text mining

CDC2

Page 52: STRING Large-scale data and text mining

cyclin dependent kinase 1

Page 53: STRING Large-scale data and text mining

expansion rules

Page 54: STRING Large-scale data and text mining

hCdc2

Page 55: STRING Large-scale data and text mining

CDC2

Page 56: STRING Large-scale data and text mining

flexible matching

Page 57: STRING Large-scale data and text mining

cyclin-dependent kinase 1

Page 58: STRING Large-scale data and text mining

cyclin dependent kinase 1

Page 59: STRING Large-scale data and text mining

“black list”

Page 60: STRING Large-scale data and text mining

SDS
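
The four ingredients above (lexicon, expansion rules, flexible matching, black list) can be combined in a small dictionary-based tagger. This is a sketch under stated assumptions: the three-entry lexicon, the identifier CDK1 as the canonical name, the single "leading h" expansion rule, and the function names are all illustrative, not the actual tagger behind STRING.

```python
import re

# Hypothetical mini-lexicon: synonym -> canonical identifier.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",  # a gene symbol, but in text usually the detergent
}
BLACKLIST = {"SDS"}  # names that mostly produce false positives


def normalize(name):
    """Flexible matching: ignore case and treat hyphens like spaces."""
    return re.sub(r"[\s-]+", " ", name).strip().lower()


NORMALIZED = {normalize(k): v for k, v in LEXICON.items() if k not in BLACKLIST}


def candidates(mention):
    """Expansion rule: also try the name without a species prefix such as 'h'."""
    yield mention
    if re.match(r"h[A-Z]", mention):
        yield mention[1:]


def tag(text, max_words=4):
    """Return (mention, identifier) pairs, preferring the longest match."""
    words = re.findall(r"[A-Za-z0-9-]+", text)
    found = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            mention = " ".join(words[i:i + n])
            hit = next(
                (NORMALIZED[normalize(c)] for c in candidates(mention)
                 if normalize(c) in NORMALIZED),
                None,
            )
            if hit:
                found.append((mention, hit))
                i += n
                break
        else:
            i += 1  # no name starts at this word
    return found
```

For the sentence "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel." the tagger resolves both hCdc2 and the hyphenated long form to the same identifier and skips SDS because it is blacklisted.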

Page 61: STRING Large-scale data and text mining

co-mentioning

Page 62: STRING Large-scale data and text mining

counting

Page 63: STRING Large-scale data and text mining

within documents

Page 64: STRING Large-scale data and text mining

within paragraphs

Page 65: STRING Large-scale data and text mining

within sentences
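
Co-mention counting at the three levels can be sketched as a weighted count: a pair of names scores once for sharing a document, again for sharing a paragraph, and again for sharing a sentence. The weights and the nested-list input format are hypothetical choices for illustration, and the input is assumed to be already tagged, so that each sentence is a set of recognized names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
WEIGHTS = {"document": 1, "paragraph": 2, "sentence": 4}


def comention_scores(documents, weights=WEIGHTS):
    """documents: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity names."""
    scores = Counter()
    for doc in documents:
        doc_entities = set()
        for paragraph in doc:
            par_entities = set()
            for sentence in paragraph:
                for pair in combinations(sorted(sentence), 2):
                    scores[pair] += weights["sentence"]
                par_entities |= sentence
            for pair in combinations(sorted(par_entities), 2):
                scores[pair] += weights["paragraph"]
            doc_entities |= par_entities
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += weights["document"]
    return scores


# Hypothetical toy document: two paragraphs, the first with two sentences.
doc = [[{"CYC1", "HAP1"}, {"CYC7"}], [{"CDC2"}]]
```

In the toy document, CYC1 and HAP1 share a sentence and therefore collect all three weights, whereas CDC2 only shares the document with the other names.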

Page 66: STRING Large-scale data and text mining

natural language processing

Page 67: STRING Large-scale data and text mining

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Page 68: STRING Large-scale data and text mining

text corpus

Page 69: STRING Large-scale data and text mining

~2 million full-text articles

Page 70: STRING Large-scale data and text mining

~22 million abstracts

Page 71: STRING Large-scale data and text mining

Exercise 1

Go to http://string-db.org

Query for Mycobacterium tuberculosis H37Rv adhD (Rv3086)

Change between different views

Check evidence for adhD–lipR link

Extend network to 50 interactors
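
The same query can in principle be scripted instead of clicked. The sketch below only builds a request URL and does not contact the server. It assumes the STRING REST API layout `https://string-db.org/api/<format>/<method>`, the `interaction_partners` method with `identifiers`, `species` and `limit` parameters, and NCBI taxonomy ID 83332 for M. tuberculosis H37Rv; all of these should be checked against the current STRING API documentation before use.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed base URL


def partners_url(identifier, species, limit=50, output_format="tsv"):
    """Build a URL asking for up to `limit` interaction partners of one protein."""
    params = {"identifiers": identifier, "species": species, "limit": limit}
    return f"{STRING_API}/{output_format}/interaction_partners?{urlencode(params)}"


# adhD (Rv3086); 83332 is assumed to be the taxonomy ID of strain H37Rv.
url = partners_url("Rv3086", 83332)
```

The resulting URL can be opened in a browser or fetched with any HTTP client to obtain the partner list as tab-separated text.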

Page 72: STRING Large-scale data and text mining
Page 73: STRING Large-scale data and text mining
Page 74: STRING Large-scale data and text mining

Exercise 2

Go to the paper PMC2995261

Extract the protein names in Table 1

Create a STRING network of them

Change to “advanced” mode

Analyze for clusters and enrichment
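
To make the enrichment step less of a black box, here is a minimal sketch of the over-representation test that such analyses are commonly based on: given a protein list, how surprising is it that at least k of its members carry a particular annotation? The function and parameter names are hypothetical, and the statistic and multiple-testing correction applied by STRING itself may differ.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Hypergeometric upper tail P(X >= k).

    N: proteins in the background, K: background proteins with the annotation,
    n: proteins in the query list, k: query proteins with the annotation.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

A small p-value means the annotation occurs in the list more often than expected by chance; when many annotation terms are tested, the p-values need to be corrected for multiple testing.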

Page 75: STRING Large-scale data and text mining

multi-page tables

Page 76: STRING Large-scale data and text mining

related resources

Page 77: STRING Large-scale data and text mining

general approach

Page 78: STRING Large-scale data and text mining

curated knowledge

Page 79: STRING Large-scale data and text mining

experimental data

Page 80: STRING Large-scale data and text mining

text mining

Page 81: STRING Large-scale data and text mining

computational predictions

Page 82: STRING Large-scale data and text mining

common identifiers

Page 83: STRING Large-scale data and text mining

quality scores

Page 84: STRING Large-scale data and text mining

score calibration

Page 85: STRING Large-scale data and text mining

visualization

Page 86: STRING Large-scale data and text mining

protein networks

Page 87: STRING Large-scale data and text mining

string-db.org

Page 88: STRING Large-scale data and text mining

chemical networks

Page 89: STRING Large-scale data and text mining

stitch-db.org

Page 90: STRING Large-scale data and text mining

subcellular localization

Page 91: STRING Large-scale data and text mining

compartments.jensenlab.org

Page 92: STRING Large-scale data and text mining

tissue expression

Page 93: STRING Large-scale data and text mining

tissues.jensenlab.org

Page 94: STRING Large-scale data and text mining

disease associations

Page 95: STRING Large-scale data and text mining

Work on own data

string-db.org

stitch-db.org

compartments.jensenlab.org

tissues.jensenlab.org

diseases.jensenlab.org