GO based data analysis

GO based data analysis Iowa State Workshop 11 June 2009

Page 1: GO based data analysis

GO based data analysis

Iowa State Workshop

11 June 2009

Page 2: GO based data analysis

All tools and materials from this workshop are available online at the AgBase database Educational Resources link.

For continuing support and assistance please contact:

[email protected]

This workshop is supported by USDA CSREES grant number MISV-329140.

Page 3: GO based data analysis

AgBase protein annotation process

Protein identifiers or FASTA format

GORetriever

Annotated Proteins

GOanna

Proteins with no annotations

GOSlimViewer

Page 4: GO based data analysis

Hypothesis generating

Gene Ontology enrichment analysis

Identifies GO terms that are statistically over- or underrepresented (Fisher’s exact test) in a set of genes
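As a rough illustration of the test named above, the over-representation p-value for one GO term can be computed from four counts. This is a minimal sketch, not the method of any particular tool listed in this workshop; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """One-sided Fisher's exact test (upper tail of the hypergeometric distribution).

    Returns the probability of seeing at least `study_hits` genes annotated
    to a GO term in a study set of `study_size` genes, when `population_hits`
    of the `population_size` background genes carry that term.
    """
    max_hits = min(study_size, population_hits)
    total = comb(population_size, study_size)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 60 study genes carry the term,
# against 100 of 10000 genes in the background.
p_over = enrichment_p_value(8, 60, 100, 10000)
print(f"p (over-representation) = {p_over:.3g}")
```

Under-representation would use the lower tail instead (summing from 0 up to the observed count), and testing many GO terms at once normally calls for a multiple-testing correction.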

Annotation Clustering

Group similar annotations based on the hypothesis that they should have similar gene members
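The clustering idea can be sketched by comparing the gene members of each annotation term. The example below assumes a Jaccard overlap and a simple greedy grouping; the similarity measure, the threshold, and the term and gene names are all illustrative assumptions, and real tools may use other measures.

```python
def jaccard(a, b):
    """Fraction of genes shared by two terms (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_to_genes, threshold=0.5):
    """Greedy grouping: a term joins the first cluster that already holds
    a term whose gene set overlaps with it at or above `threshold`."""
    clusters = []
    for term, genes in term_to_genes.items():
        for cluster in clusters:
            if any(jaccard(genes, term_to_genes[other]) >= threshold
                   for other in cluster):
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Hypothetical annotation terms and gene members.
term_to_genes = {
    "term A": {"g1", "g2", "g3", "g4"},
    "term B": {"g1", "g2", "g3", "g5"},
    "term C": {"g6", "g7", "g8"},
}
clusters = cluster_terms(term_to_genes)
print(clusters)
```

Terms A and B share three of five genes (overlap 0.6), so they fall into one group, while term C stays on its own.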

Page 5: GO based data analysis

Some resources

DAVID: http://david.abcc.ncifcrf.gov/
GOStat: http://gostat.wehi.edu.au/
EasyGO: http://bioinformatics.cau.edu.cn/easygo/
AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment (does not use IEA)
Onto-Express & OE2GO: http://vortex.cs.wayne.edu/projects.htm
GOEAST: http://omicslab.genetics.ac.cn/GOEAST
http://www.geneontology.org/GO.tools.shtml

Comparison of enrichment analysis tools: Nucleic Acids Research, 2009, Vol. 37, No. 1, 1–13 (Tool_Comparison_09.pdf)

DAVID and EasyGO analysis included: DAVID&EasyGo.ppt

Page 6: GO based data analysis

Database for Annotation, Visualization and Integrated Discovery

Page 7: GO based data analysis
Page 8: GO based data analysis
Page 9: GO based data analysis

http://vortex.cs.wayne.edu/ontoexpress

Onto-Express analysis instructions are available in onto-express.ppt

Page 10: GO based data analysis

Species represented in Onto-Express

Page 11: GO based data analysis

For uploading your own annotations use OE2GO

Page 12: GO based data analysis

Comparison

Onto-Express, EasyGO, GOstat and DAVID
Test set: 60 randomly selected chicken genes
Used AgBase GO annotations as baseline annotations

Vandenberg et al (BMC Bioinformatics, in review)

Page 13: GO based data analysis
Page 14: GO based data analysis

Networks & Pathways

Iowa State Workshop

11 June 2009

Page 15: GO based data analysis

Multiple data analysis platforms

Proteomics

Transcriptomics

ESTs

LIST

Page 16: GO based data analysis

Our original aim… …understand biological phenomena…

Bits and pieces of information
Do not have the full picture
How do we get back to BIOLOGY in this digital information landscape?

Page 17: GO based data analysis

What do we know about biological systems…

biological systems are dynamic, not static
how molecules interact is key to understanding complex systems

Francis Crick, 1958

Page 18: GO based data analysis

Types of interactions

protein (enzyme) – metabolite (ligand): metabolic pathways
protein – protein: cell signaling pathways, protein complexes
protein – gene: genetic networks

Page 19: GO based data analysis

Sod1 Mus musculus

STRING Database

http://string.embl.de/

Page 20: GO based data analysis
Page 21: GO based data analysis

PLoS Computational Biology March 2007, Volume 3 e42

Database: URL/FTP

DIP: http://dip.doe-mbi.ucla.edu
BIND: http://bind.ca
MPact/MIPS: http://mips.gsf.de/services/ppi
STRING: http://string.embl.de
MINT: http://mint.bio.uniroma2.it/mint
IntAct: http://www.ebi.ac.uk/intact
BioGRID: http://www.thebiogrid.org
HPRD: http://www.hprd.org
ProtCom: http://www.ces.clemson.edu/compbio/ProtCom
3did, Interprets: http://gatealoy.pcb.ub.es/3did/
Pibase, Modbase: http://alto.compbio.ucsf.edu/pibase
CBM: ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
SCOPPI: http://www.scoppi.org/
iPfam: http://www.sanger.ac.uk/Software/Pfam/iPfam
InterDom: http://interdom.lit.org.sg
DIMA: http://mips.gsf.de/genre/proj/dima/index.html
Prolinks: http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/
Predictome: http://predictome.bu.edu/

Page 22: GO based data analysis

Pathways & Networks

A network is a collection of interactions

Pathways are a subset of networks
Network of interacting proteins that carry out biological functions such as metabolism and signal transduction

All pathways are networks of interactions

NOT ALL NETWORKS ARE PATHWAYS

Page 23: GO based data analysis

Biological Networks

Networks often represented as graphs
Nodes represent proteins or genes that code for proteins
Edges represent the functional links between nodes (e.g. regulation)
Small changes in a graph’s topology/architecture can result in the emergence of novel properties
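The graph view described above can be sketched with a plain adjacency structure. This is only an illustration with hypothetical protein names (P1 to P4) and helper names of our own choosing; it shows nodes, edges, node degree, and how dropping a single edge changes the connectivity of the network.

```python
def build_network(edges, nodes=()):
    """Undirected graph as {node: set of neighbours}."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def connected_components(adjacency):
    """Groups of nodes that can reach each other through edges."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical interactions between four proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P4")]
network = build_network(edges)
degree = {node: len(neighbours) for node, neighbours in network.items()}
hub = max(degree, key=degree.get)  # the most connected node

# Removing one edge leaves P2 isolated: a small change in topology.
reduced = build_network([e for e in edges if e != ("P1", "P2")], nodes=network)
print(hub, len(connected_components(network)), len(connected_components(reduced)))
```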

Page 24: GO based data analysis

Yeast Protein-Protein Interaction Map

H. Jeong et al., Nature 411, 2001

Page 25: GO based data analysis

Some resources

KEGG: http://www.genome.jp/kegg/pathway.html
BioCyc: http://www.biocyc.org/
Reactome: http://www.reactome.org/
GenMAPP: http://www.genmapp.org/
BioCarta: http://www.biocarta.com/

Pathguide – the pathway resource list: http://www.pathguide.org/

Page 26: GO based data analysis
Page 27: GO based data analysis

Pathguide Statistics

Gallus gallus is missing

Page 28: GO based data analysis

Reactome

Page 29: GO based data analysis

What is feasible with my specific dataset?

Page 30: GO based data analysis

Systems Biology Workflow

Nanduri & McCarthy, CAB Reviews, 2008

Page 31: GO based data analysis

Systems Biology Workflow

For a given species of interest, what type of data is available?

Page 32: GO based data analysis

Retrieval of interaction datasets

Evaluate PPI resources such as Predictome and Prolinks for the existence of the species of interest
If unavailable, find orthologous proteins in related species that have interactions!
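The ortholog-based fallback can be sketched as a simple mapping step: an interaction known in a related species is carried over only when both partners have an ortholog in the species of interest. The function name, the identifiers, and the ortholog table below are hypothetical, and transferred interactions are predictions that still need evaluation.

```python
def transfer_interactions(known_interactions, orthologs):
    """Predict interactions in the species of interest.

    known_interactions: pairs of proteins from a related species.
    orthologs: {related-species protein: protein in the species of interest}.
    """
    predicted = set()
    for a, b in known_interactions:
        if a in orthologs and b in orthologs:
            # Sort so that (x, y) and (y, x) count as the same interaction.
            predicted.add(tuple(sorted((orthologs[a], orthologs[b]))))
    return predicted


# Hypothetical data: three interactions in a related species,
# with orthologs known for only two of the proteins.
related_interactions = [("relA", "relB"), ("relB", "relC"), ("relB", "relA")]
ortholog_table = {"relA": "targetA", "relB": "targetB"}
predicted = transfer_interactions(related_interactions, ortholog_table)
print(predicted)
```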

Page 33: GO based data analysis

I have interactions, what next?

Evaluate the quality of interactions, i.e. the type of method used for identification… What exactly are these methods?

Page 34: GO based data analysis

STRING Database

Page 35: GO based data analysis

PPI Identification

Experimental
Gene coexpression
TAP assays
Yeast two hybrid (Y2H)
Protein arrays

Computational
Sequence coevolution
Phylogenetic profile
Gene cluster
Rosetta stone method
Text mining

PLoS Computational Biology March 2007, Volume 3 e42
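One way to act on the identification method when judging interaction quality is to record which methods support each protein pair and keep the pairs with at least one experimental method. The sketch below assumes a simple record layout of (protein, protein, method); the method labels, protein names, and function names are hypothetical and not taken from any specific database.

```python
from collections import defaultdict

# Assumed labels for experimental methods named on the slide.
EXPERIMENTAL = {"TAP assay", "yeast two hybrid", "protein array"}


def evidence_by_pair(records):
    """Collect the set of identification methods reported for each pair."""
    evidence = defaultdict(set)
    for a, b, method in records:
        evidence[frozenset((a, b))].add(method)
    return evidence


def experimentally_supported(records):
    """Pairs backed by at least one experimental method."""
    return {
        pair
        for pair, methods in evidence_by_pair(records).items()
        if methods & EXPERIMENTAL
    }


# Hypothetical interaction records.
records = [
    ("P1", "P2", "yeast two hybrid"),
    ("P2", "P1", "text mining"),
    ("P1", "P3", "text mining"),
    ("P3", "P4", "phylogenetic profile"),
]
supported = experimentally_supported(records)
print(supported)
```

Counting the number of independent methods per pair, rather than applying a yes/no filter, is another reasonable choice when ranking interactions.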

Page 36: GO based data analysis

PPI database comparisons

Proteins: Structure, Function and Bioinformatics 63:490-500 2006

Page 37: GO based data analysis

I have interactions, what next?

Evaluate the quality of interactions, i.e. the type of method used for identification… What exactly are these methods?

Visualize these interactions as a network and analyze… What are the available tools?