GO-based data analysis
Iowa State Workshop
11 June 2009
All tools and materials from this workshop are available online at the AgBase database Educational Resources link.
For continuing support and assistance please contact:
This workshop is supported by USDA CSREES grant number MISV-329140.
AgBase protein annotation process
Protein identifiers or FASTA format
GORetriever
Annotated Proteins
GOanna
Proteins with no annotations
GOSlimViewer
Hypothesis generating
Gene Ontology enrichment analysis
GO terms that are statistically over- or underrepresented (Fisher’s exact test) in a set of genes
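The enrichment test above can be sketched for a single GO term as follows. This is a minimal illustration using only the Python standard library, not the implementation used by any of the tools in this workshop. The function names and the example counts are assumptions made for illustration, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_pmf(k, M, n, N):
    """P(exactly k annotated genes) when N genes are drawn from a
    background of M genes, n of which carry the GO term."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)

def fisher_enrichment(k, N, n, M):
    """One-sided Fisher's exact test p-values for one GO term.

    k: genes in the study set annotated to the term
    N: size of the study set
    n: genes in the background annotated to the term
    M: size of the background
    Returns (p_over, p_under).
    """
    lowest = max(0, N - (M - n))   # smallest possible overlap
    highest = min(n, N)            # largest possible overlap
    p_over = sum(hypergeom_pmf(i, M, n, N) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, M, n, N) for i in range(lowest, k + 1))
    return p_over, p_under

# Hypothetical counts: 8 of 60 study genes carry the term,
# against 40 of 1000 genes in the background.
p_over, p_under = fisher_enrichment(k=8, N=60, n=40, M=1000)
print(f"overrepresented p = {p_over:.4g}, underrepresented p = {p_under:.4g}")
```

In practice one such test is run per GO term, so the resulting p-values normally need a multiple-testing correction before interpretation.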
Annotation Clustering
Group similar annotations based on the hypothesis that they should have similar gene members
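One way to picture annotation clustering is to group terms whose gene sets overlap strongly. The sketch below uses the Jaccard index and a simple single-linkage grouping. Both are stand-ins chosen for brevity and are not the method of any particular tool; the term labels, gene names and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Share of genes common to both sets, relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_terms(term_to_genes, threshold=0.5):
    """Single-linkage grouping: a term joins (and merges) every existing
    cluster that holds a term with gene-set overlap >= threshold."""
    clusters = []
    for term, genes in term_to_genes.items():
        linked = [
            c for c in clusters
            if any(jaccard(genes, term_to_genes[t]) >= threshold for t in c)
        ]
        merged = {term}
        for c in linked:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters

# Hypothetical annotations: term_A and term_B share two of four genes.
annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g2", "g3", "g4"},
    "term_C": {"g9"},
}
clusters = cluster_terms(annotations)
print(clusters)
```

With these inputs, term_A and term_B end up in one group and term_C stays on its own.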
Some resources
DAVID: http://david.abcc.ncifcrf.gov/
GOStat: http://gostat.wehi.edu.au/
EasyGO: http://bioinformatics.cau.edu.cn/easygo/
AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment (does not use IEA)
Onto-Express & OE2GO: http://vortex.cs.wayne.edu/projects.htm
GOEAST: http://omicslab.genetics.ac.cn/GOEAST
http://www.geneontology.org/GO.tools.shtml
Comparison of enrichment analysis tools: Nucleic Acids Research, 2009, Vol. 37, No. 1, 1–13 (Tool_Comparison_09.pdf)
DAVID and EasyGO analysis is included in DAVID&EasyGo.ppt
Database for Annotation, Visualization and Integrated Discovery
http://vortex.cs.wayne.edu/ontoexpress
Onto-Express analysis instructions are available in onto-express.ppt
Species represented in Onto-Express
For uploading your own annotations use OE2GO
Comparison
Onto-Express, EasyGO, GOstat and DAVID
Test set: 60 randomly selected chicken genes
Used AgBase GO annotations as baseline annotations
Vandenberg et al (BMC Bioinformatics, in review)
Networks & Pathways
Multiple data analysis platforms
Proteomics
Transcriptomics
ESTs
LIST
Our original aim…. …understand biological phenomena….
Bits and pieces of information
Do not have the full picture
How do we get back to BIOLOGY in this digital information landscape?
What do we know about biological systems…
biological systems are dynamic, not static
how molecules interact is key to understanding complex systems
Francis Crick, 1958
Types of interactions
protein (enzyme) – metabolite (ligand): metabolic pathways
protein – protein: cell signaling pathways, protein complexes
protein – gene: genetic networks
Sod1 Mus musculus
STRING Database
http://string.embl.de/
PLoS Computational Biology March 2007, Volume 3 e42
Database: URL/FTP
DIP: http://dip.doe-mbi.ucla.edu
BIND: http://bind.ca
MPact/MIPS: http://mips.gsf.de/services/ppi
STRING: http://string.embl.de
MINT: http://mint.bio.uniroma2.it/mint
IntAct: http://www.ebi.ac.uk/intact
BioGRID: http://www.thebiogrid.org
HPRD: http://www.hprd.org
ProtCom: http://www.ces.clemson.edu/compbio/ProtCom
3did, Interprets: http://gatealoy.pcb.ub.es/3did/
Pibase, Modbase: http://alto.compbio.ucsf.edu/pibase
CBM: ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
SCOPPI: http://www.scoppi.org/
iPfam: http://www.sanger.ac.uk/Software/Pfam/iPfam
InterDom: http://interdom.lit.org.sg
DIMA: http://mips.gsf.de/genre/proj/dima/index.html
Prolinks: http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/
Predictome: http://predictome.bu.edu/
Pathways & Networks
A network is a collection of interactions
Pathways are a subset of networks
Network of interacting proteins that carry out biological functions such as metabolism and signal transduction
All pathways are networks of interactions
NOT ALL NETWORKS ARE PATHWAYS
Biological Networks
Networks are often represented as graphs
Nodes represent proteins or genes that code for proteins
Edges represent the functional links between nodes (e.g. regulation)
Small changes in a graph’s topology/architecture can result in the emergence of novel properties
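The graph view described above can be sketched with an adjacency mapping: each node (protein) maps to the set of nodes it shares an edge with. The protein labels P1 to P4 and the edge list are hypothetical, and the edges are treated as undirected for simplicity, which would not suit directed links such as regulation.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as {node: set of neighbouring nodes}."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degrees(graph):
    """Number of interaction partners per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

# Hypothetical interactions between four proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P4")]
graph = build_graph(edges)
node_degrees = degrees(graph)

# The most connected node is a candidate "hub" of the network.
hub = max(node_degrees, key=node_degrees.get)
print(node_degrees, "hub:", hub)
```

Removing the single edge between P1 and P2 in this toy network would disconnect P2 entirely, a small example of how a minor change in topology alters the network.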
Yeast Protein-Protein Interaction Map
H. Jeong et al., Nature 411, 2001
KEGG: http://www.genome.jp/kegg/pathway.html
BioCyc: http://www.biocyc.org/
Reactome: http://www.reactome.org/
GenMAPP: http://www.genmapp.org/
BioCarta: http://www.biocarta.com/
Pathguide – the pathway resource list http://www.pathguide.org/
Some resources
Gallus gallus is missing
Pathguide Statistics
Reactome
What is feasible with my specific dataset?
Systems Biology Workflow
Nanduri & McCarthy CAB reviews, 2008
Systems Biology Workflow
For a given species of interest, what type of data is available?
Retrieval of interaction datasets
Evaluate PPI resources such as Predictome and Prolinks for the presence of the species of interest
If unavailable, find orthologous proteins in related species that have interactions!
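The ortholog-based fallback can be sketched as a simple transfer of interactions from a related species onto the species of interest. This is only an outline under stated assumptions: the identifiers are hypothetical, the ortholog table is assumed to come from a separate orthology resource, and the transferred pairs are predictions that still need to be evaluated.

```python
def transfer_interactions(interactions, orthologs):
    """Map interacting pairs from a reference species onto a target species.

    interactions: iterable of (protein_a, protein_b) in the reference species
    orthologs: dict mapping a reference protein to a list of target proteins
    Returns a set of predicted target-species pairs (order-independent).
    """
    predicted = set()
    for a, b in interactions:
        # A pair is transferred only when both partners have orthologs.
        for target_a in orthologs.get(a, []):
            for target_b in orthologs.get(b, []):
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted

# Hypothetical reference-species interactions and ortholog table.
reference_ppi = [("ref1", "ref2"), ("ref2", "ref3")]
ortholog_table = {"ref1": ["tgt1"], "ref2": ["tgt2"]}

predicted_ppi = transfer_interactions(reference_ppi, ortholog_table)
print(predicted_ppi)
```

Here only the ref1/ref2 interaction is transferred, because ref3 has no ortholog in the target species.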
I have interactions, what next?
Evaluate the quality of the interactions, i.e. the type of method used for identification… what exactly are these methods?
STRING Database
PPI Identification
Experimental: TAP assays, yeast two hybrid (Y2H), protein arrays
Computational: gene coexpression, sequence coevolution, phylogenetic profile, gene cluster, Rosetta stone method, text mining
PLoS Computational Biology March 2007, Volume 3 e42
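Evaluating interactions by identification method can be sketched as a simple partition: pairs supported by at least one experimental method versus pairs supported only by computational prediction. The record layout, protein labels and method strings below are assumptions for illustration; real PPI databases each use their own evidence vocabulary.

```python
# Method names treated as experimental evidence (assumed spelling).
EXPERIMENTAL = {"TAP assay", "yeast two hybrid", "protein array"}

def split_by_evidence(records):
    """Partition interaction records by the kind of supporting evidence.

    records: iterable of (protein_a, protein_b, list_of_methods)
    Returns (experimentally_supported, computational_only) lists of pairs.
    """
    experimental, computational_only = [], []
    for a, b, methods in records:
        if EXPERIMENTAL & set(methods):
            experimental.append((a, b))
        else:
            computational_only.append((a, b))
    return experimental, computational_only

# Hypothetical interaction records.
records = [
    ("P1", "P2", ["yeast two hybrid", "text mining"]),
    ("P1", "P3", ["gene coexpression"]),
    ("P2", "P4", ["TAP assay"]),
]
supported, predicted_only = split_by_evidence(records)
print(supported, predicted_only)
```

A split like this is a starting point for deciding which interactions to carry into network visualization.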
PPI database comparisons
Proteins: Structure, Function and Bioinformatics 63:490-500 2006
I have interactions, what next?
Evaluate the quality of the interactions, i.e. the type of method used for identification… what exactly are these methods?
Visualize these interactions as a network and analyze…
what are the available tools?