An Exploratory Method to Reconstruct Pathways Cory Tobin

Post on 20-Dec-2015


An Exploratory Method to Reconstruct Pathways

Cory Tobin

Collaborators

Dr. Matteo Pellegrini

Shawn Cokus

@ UCLA

Outline

• Purpose

• Methods

• Sample Data

• Possible Uses

• Final Remarks

Purpose

Reconstruct signal transduction pathways & protein complexes using protein-protein interactions reported on the web

Materials

• Python

• Yahoo! Search API

• PostgreSQL

• Django Web Framework

Methods

• Construct high likelihood / low noise queries

• Ex: “Jak2 phosphorylates Stat5”

• Query Yahoo! for every permutation of 2 proteins in a given species

• Use high likelihood joining words…

Joining Words

• Phosphorylates

• Methylates

• Acetylates

• Activates

• Deactivates

• Binds to

• Inhibits

• Dephosphorylates

• Glycosylates

• Ubiquitinates

• Interacts with

Full Query

“Jak2 acetylates OR phosphorylates OR methylates OR binds to OR interacts with Stat5”
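A query of this shape can be assembled mechanically from a protein pair and the list of joining words. The following is a minimal sketch, not the code used in the project: the function name `build_pair_query` and the constant `JOINING_WORDS` are hypothetical, the list holds only the five joining words shown in the example query, and the exact quoting and OR syntax the search engine expects is an assumption.

```python
# Joining words taken from the example query on this slide (subset of the full list).
JOINING_WORDS = [
    "acetylates",
    "phosphorylates",
    "methylates",
    "binds to",
    "interacts with",
]


def build_pair_query(first, second, joining_words=JOINING_WORDS):
    """Return a quoted search phrase for one ordered protein pair."""
    alternatives = " OR ".join(joining_words)
    return f'"{first} {alternatives} {second}"'


print(build_pair_query("Jak2", "Stat5"))
# "Jak2 acetylates OR phosphorylates OR methylates OR binds to OR interacts with Stat5"
```

Because the pair is ordered, “Jak2 ... Stat5” and “Stat5 ... Jak2” are separate queries, which is why the method speaks of permutations rather than combinations.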

Hindrance

• Doing pair-wise queries for all N proteins in an organism requires N*N queries

• E. coli has >4000 genes (16,000,000 queries)

• Yahoo! allows 5,000 queries per day per computer
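The arithmetic behind this hindrance can be checked with a few lines. This is only a back-of-the-envelope sketch: the function name `query_budget` is hypothetical, the gene count is rounded down to 4000, and the daily limit is the per-computer figure quoted above.

```python
import math


def query_budget(n_proteins, daily_limit=5000):
    """Return (total pair-wise queries, computer-days needed at the daily limit)."""
    total = n_proteins * n_proteins
    computer_days = math.ceil(total / daily_limit)
    return total, computer_days


total, computer_days = query_budget(4000)
print(total)          # 16000000
print(computer_days)  # 3200
```

At 4000 proteins the job needs 3,200 computer-days, so it takes either thousands of machines for one day or a single machine for several years.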

Possible Solutions

Recruit 4k computers and finish in a day

OR

Find a better method

Better Method

• Only specify the first symbol

• Iterate through the results and keep only those in which the word following the joining word is a valid symbol

Full Query

“Jak2 acetylates OR phosphorylates OR methylates OR binds to OR interacts with”
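With the second protein left open, each query costs one request per protein instead of one per pair, and the partner has to be read out of the returned text. The sketch below illustrates that filtering step under stated assumptions: the search results are already available as plain-text snippets (no search API call is shown), and the names `extract_partners`, `JOINING_WORDS`, and the sample snippets and symbol set are hypothetical.

```python
import re

# Joining words from the example query (subset of the full list).
JOINING_WORDS = ["acetylates", "phosphorylates", "methylates", "binds to", "interacts with"]


def extract_partners(first, snippets, valid_symbols, joining_words=JOINING_WORDS):
    """Return (joining word, partner symbol) pairs found in result snippets."""
    # Map lower-cased symbols back to their canonical spelling.
    canonical = {symbol.lower(): symbol for symbol in valid_symbols}
    alternation = "|".join(re.escape(word) for word in joining_words)
    # first symbol, then a joining word, then exactly one candidate word.
    pattern = re.compile(
        rf"\b{re.escape(first)}\s+({alternation})\s+([A-Za-z0-9-]+)",
        re.IGNORECASE,
    )
    hits = []
    for snippet in snippets:
        for word, candidate in pattern.findall(snippet):
            symbol = canonical.get(candidate.lower())
            if symbol is not None:  # keep only words that are valid symbols
                hits.append((word.lower(), symbol))
    return hits


snippets = [
    "We show that Jak2 phosphorylates Stat5 in vitro.",
    "JAK2 binds to the receptor.",
]
print(extract_partners("Jak2", snippets, {"Jak2", "Stat5"}))
# [('phosphorylates', 'Stat5')]
```

The second snippet is discarded because “the” is not in this symbol set; the matching is deliberately case insensitive so that “JAK2” and “Jak2” are treated alike.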

Another Hindrance

• The symbol “thE” (and others like it)

• Searches need to be case insensitive to account for “p53” and “P53”

• As a result, the word “the” is recognized as the protein “thE”

Solution

• Use a list of stop words

• Very common, non-interesting words

• If the name appears in that list of stop words, just forget about that protein altogether

http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words
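The stop-word filter amounts to dropping every symbol whose lower-cased form is a common English word. This is a minimal sketch: the name `drop_stop_word_symbols` is hypothetical, and `STOP_WORDS` here is a tiny illustrative subset rather than the contents of the list linked above.

```python
# Illustrative subset only; a real run would load the full stop word list.
STOP_WORDS = {"the", "a", "of", "and", "was"}


def drop_stop_word_symbols(symbols, stop_words=STOP_WORDS):
    """Discard symbols that collide with a stop word when case is ignored."""
    return [symbol for symbol in symbols if symbol.lower() not in stop_words]


print(drop_stop_word_symbols(["thE", "p53", "Jak2", "Stat5"]))
# ['p53', 'Jak2', 'Stat5']
```

The trade-off is that any real interaction involving a discarded protein is lost, which the method accepts in exchange for less noise.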

Methods (cont.)

• After we have this data in a database...

• Create a web interface to the data so others can search for protein interactions (Shwe)
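One way to picture the stored data is a single table of (source, joining word, target) rows that a search page can query by protein name. The sketch below uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere; the project itself lists PostgreSQL and Django, and the table layout, column names, and function names here are assumptions rather than the actual schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical interaction table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interaction ("
        " source TEXT NOT NULL,"
        " joining_word TEXT NOT NULL,"
        " target TEXT NOT NULL)"
    )
    return conn


def add_interaction(conn, source, joining_word, target):
    """Store one extracted interaction."""
    conn.execute(
        "INSERT INTO interaction VALUES (?, ?, ?)",
        (source, joining_word, target),
    )


def interactions_for(conn, symbol):
    """Return rows where the symbol appears on either side, ignoring case."""
    cursor = conn.execute(
        "SELECT source, joining_word, target FROM interaction"
        " WHERE lower(source) = lower(?) OR lower(target) = lower(?)"
        " ORDER BY source, target",
        (symbol, symbol),
    )
    return cursor.fetchall()


conn = make_db()
add_interaction(conn, "Jak2", "phosphorylates", "Stat5")
add_interaction(conn, "ProteinA", "binds to", "ProteinB")  # placeholder names
print(interactions_for(conn, "stat5"))
# [('Jak2', 'phosphorylates', 'Stat5')]
```

A web search form would only need to pass the user's protein name to a lookup like `interactions_for` and render the returned rows.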

Data

KEGG - Yeast MAPK

Our Data

http://www.genome.jp/dbget-bin/show_pathway?sce04010+YGR040W

Data (cont.)

KEGG - Yeast Cell Cycle

http://www.genome.jp/dbget-bin/get_pathway?org_name=sce&mapno=04110

Our Data

Data (cont.)

KEGG - Yeast 26S Proteasome

Our Data

http://www.genome.jp/dbget-bin/show_pathway?sce03050+YER012W

Possible Uses

• General reference for protein interactions

• Curate other databases

Final Remarks

• Only works well for detecting signal pathways and protein complexes

• Not metabolic pathways

• It is possible to get high quality, interesting data without much noise or complex text analysis algorithms

References

• Kyoto Encyclopedia of Genes and Genomes http://www.genome.jp/kegg/

• Cytoscape Network Visualization http://www.cytoscape.org/

• Yahoo! Developer Network http://developer.yahoo.com/

Acknowledgements

• Dr. Matteo Pellegrini

• Everyone in the lab

• SoCalBSI

• NIH / NSF