Divining Systems Biology Knowledge from High-throughput Experiments Using EGAN
Jesse Paquette
ISMB 2010
Biostatistics and Computational Biology Core
Helen Diller Family Comprehensive Cancer Center
University of California, San Francisco (AKA BCBC HDFCCC UCSF)
High-throughput experiments
• This talk applies to
  – Expression microarrays
  – aCGH
  – SNP/CNV arrays
  – MS/MS Proteomics
  – DNA methylation
  – ChIP-Seq
  – RNA-Seq
  – In-silico experiments
• If parts of the output can be mapped to gene IDs
  – You can use EGAN
What do you hope to accomplish?
Collect data
Process data
Differential analysis
Publish!
Clusters and/or gene lists
New testable hypotheses
Produce insight about the underlying biology
New grants!
New papers!
Drug targets!
Leverage organic intelligence
Clusters and/or gene lists
New testable hypotheses
Produce insight about the underlying biology
Summarize
Visualize
Contextualize
Producing insight from clusters and gene lists
• Summarize: find enriched pathways (and other gene sets)
  – Hypergeometric over-representation
    • DAVID
  – Global trends
    • GSEA
• Visualize: gene relationships in a graph
  – Protein-protein interactions
    • Cytoscape
  – Network module discovery
    • Ingenuity IPA
  – Literature co-occurrence
    • PubGene
• Contextualize: pertinent literature
  – PubMed
  – Google
  – iHOP
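Hypergeometric over-representation, listed under Summarize, can be sketched in a few lines. This is a minimal illustration of the standard one-sided test, not EGAN's or DAVID's actual code; the function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Return P(X >= overlap) for a gene list drawn without replacement.

    universe_size: number of genes that could have appeared in the list
    set_size:      number of those genes that belong to the gene set
    list_size:     number of genes in the experimental gene list
    overlap:       number of list genes that are also in the gene set
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers only: 3 of 3 list genes fall in a 3-gene set, universe of 10.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

A small p-value suggests the gene set is over-represented in the list; in practice the result would still need multiple-testing correction across all gene sets examined.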
EGAN: Exploratory Gene Association Networks
• Methods: state-of-the-art analysis of clusters and gene lists
  – Hypergeometric enrichment of gene sets
  – Global statistical trends of gene sets
  – Hypergraph visualization (via Cytoscape libraries)
  – Literature identification
  – Network module discovery
• User Interface: responds quickly to new queries from the biologist
  – Sandbox-style functionality
  – Dynamic adjustment of p-value cutoffs
  – Point-and-click interface
  – All data in-memory for immediate access
  – Links to external websites
• Modular: integrates as a flexible plug-and-play cog
  – All data is customizable
  – Proprietary data can be restricted to the client location
  – Java runs on almost every OS (PC, Mac, Linux)
  – Can be configured and launched from a different application (e.g. GenePattern)
  – Analyses can be scripted for automation
Gene sets
• A gene set is a set of semantically related genes
  – e.g. Wnt signaling pathway
• EGAN contains a database of gene sets
  – > 100k gene sets by default
    • KEGG, Reactome, NCI-Nature, Gene Ontology, MeSH, Conserved Domain, Cytoband, miRNA targets
  – You can easily add your own
    • Simple file format
    • Download from MSigDB (Broad Institute)
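The slides do not spell out EGAN's own file format, so the sketch below instead reads the tab-delimited GMT layout that MSigDB distributes (set name, description, then gene symbols). The function name and the example line are illustrative assumptions, not EGAN code or real MSigDB content.

```python
def parse_gmt(lines):
    """Parse GMT-style lines into a dict of gene-set name -> set of genes.

    Each line is expected to be: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Illustrative line only; a real file would be read with open(path).
example_lines = ["WNT_SIGNALING\tillustrative description\tWNT1\tCTNNB1\tAPC\n"]
gene_sets = parse_gmt(example_lines)
```

Keeping each gene set as a Python set makes the overlap with an experimental gene list a single intersection, which is the count an enrichment test needs.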
Gene-gene relationships
• EGAN also contains
  – Protein-protein interactions (PPI)
  – Literature co-occurrence
  – Chromosomal adjacency
  – Kinase-target relationships
• Other possibilities
  – Sequence homology
  – Expression correlation
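One way to picture these relationship types is as typed edges in an undirected graph, from which only the edges linking genes in the current list are kept. This is a rough sketch under that assumption; the data structure, function names, and placeholder gene names are hypothetical and do not describe EGAN's internals.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph: gene -> neighbor -> set of relationship types."""
    graph = defaultdict(lambda: defaultdict(set))
    for gene_a, gene_b, kind in edges:
        graph[gene_a][gene_b].add(kind)
        graph[gene_b][gene_a].add(kind)
    return graph


def edges_within(graph, genes):
    """Return sorted (gene_a, gene_b, kind) edges whose endpoints are both in genes."""
    genes = set(genes)
    found = set()
    for gene_a in genes:
        for gene_b, kinds in graph.get(gene_a, {}).items():
            # gene_a < gene_b reports each undirected edge once.
            if gene_b in genes and gene_a < gene_b:
                for kind in kinds:
                    found.add((gene_a, gene_b, kind))
    return sorted(found)


# Placeholder genes and edge types, for illustration only.
toy_edges = [
    ("GENE_A", "GENE_B", "PPI"),
    ("GENE_A", "GENE_B", "literature"),
    ("GENE_B", "GENE_C", "literature"),
]
toy_graph = build_graph(toy_edges)
shared = edges_within(toy_graph, ["GENE_A", "GENE_B"])
```

Storing the relationship type on each edge lets several kinds of evidence (PPI, literature co-occurrence, and so on) coexist between the same pair of genes.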
Example with microarray and aCGH results
• Mirzoeva et al. (2009) Cancer Research
  – UCSF-LBL collaboration
  – Analysis of breast cancer cell lines
    • Basal vs. luminal
• Discoveries in this presentation
  – miRNA regulator of subtype (mir-200)
  – Annexin (ANXA1) as potential regulator of ER, glucocorticoid and EGFR signaling
Where to find EGAN
• Website
  – http://akt.ucsf.edu/EGAN/
• 2010 paper in Bioinformatics
  – http://www.ncbi.nlm.nih.gov/pubmed/19933825
Acknowledgements
• BCBC HDFCCC UCSF
  – Taku Tokuyasu
  – Adam Olshen
  – Ritu Roy
  – Ajay Jain
• LBNL
  – Debopriya Das
  – Joe Gray
• Funding
  – UCSF Cancer Center Support Grant
• UCSF
  – Early adopters
    • Ingrid Revet
    • Antoine Snijders
    • Stephan Gysin
    • Sook Wah Yee
    • Joachim Silber
  – Cytoscape gurus
    • David Quigley
    • Scooter Morris
  – OTM
    • David Eramian
    • Ha Nguyen
  – Laura van ’t Veer
  – Donna Albertson
  – Graeme Hodgson