a software tool for analyzing genome-scale data in the context of biological pathways and

Post on 02-Feb-2016

A Software Tool for Analyzing Genome-Scale Data in the Context of Biological Pathways and the Gene Ontology

J. David Gladstone Institute of Cardiovascular Disease, UCSF

GenMAPP

Overview

• Intro to GenMAPP

- GenMAPP analysis example

• Advanced features

Analyzing Large-Scale Data in the Context of Biological Pathways

• Which genes are expressed in my dataset?

• What biological processes are important in my data model?

• New insight into underlying biology

Analyzing Large-Scale Data in the Context of Biological Pathways

• View data in the context of known biology

• Rather than seeing which individual genes are changed, pathway analysis emphasizes processes that are changed

• Biologists are familiar with pathways, so it is a natural way of sharing data

Cardiomyopathy: Downregulated genes

Fatty Acid Degradation Pathway

Cardiomyopathy Data on Fatty Acid Degradation Pathway

GenMAPP: Gene Map Annotator and Pathway Profiler

• Visualize gene expression and other genomic data on biological pathways and other groupings of genes

• Global analysis identifies significantly changed processes and functional groups

www.GenMAPP.org

GenMAPP

• Developed in the Conklin lab at Gladstone as an internal tool for dealing with microarray data

• Approximately 12,000 registered users to date

• 100% Free!!

• Used in 150 - 200 publications

• Open source, code available at SourceForge.net

• Current version for Windows only (coded in Visual Basic)

Time Course Data on Cell Cycle Pathway

SNPs with Predicted Effects

http://alto.compbio.ucsf.edu/LS-SNP/

SNPs that Predispose to Myocardial Infarction

Tobin et al, European Heart Journal 2004

• 547 acute MI cases; 505 controls

• 58 SNPs in 35 genes

=> SNPs in 5 different genes showed statistical association with MI

Study spans 19 pathways

=> 4 of 5 hits are on a single pathway

SNPs and Myocardial Infarction

Tobin et al, European Heart Journal 2004

SNP Data in GenMAPP

• Visualization: distribution of SNPs per gene

• Prioritization: mapping SNP annotations onto pathways

• Analysis: interpreting SNP data in the context of biological pathways

Future directions: high-resolution visualization of individual SNPs with the ability to overlay data

MAPPFinder

Global comparison of changes in dataset to changes expected by chance

Inputs: Experimental Data, Gene Ontology terms, GenMAPP Pathways

Output: Pathways and GO terms with significant changes

Originally developed as a separate application by Scott Doniger*

* Doniger et al. Genome Biology 4(1):R7
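The comparison to chance can be illustrated with a small sketch. It assumes the z-score form described for MAPPFinder by Doniger et al., based on the hypergeometric distribution. The function and argument names are illustrative and are not part of GenMAPP.

```python
import math

def mappfinder_z(N, R, n, r):
    """z-score for one pathway or GO term (assumed hypergeometric form).

    N: genes measured in the experiment
    R: genes meeting the user's criterion
    n: genes measured that belong to the pathway or GO term
    r: genes in the pathway or GO term that meet the criterion
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0  # degenerate case: no variation is possible
    return (r - expected) / math.sqrt(variance)

# Hypothetical counts: 5 of 10 pathway genes changed, 10 of 100 genes changed overall.
z = mappfinder_z(N=100, R=10, n=10, r=5)
print(round(z, 2))
```

A positive z-score means more genes in the term meet the criterion than chance alone would predict. A z-score near zero means the term behaves like the dataset as a whole.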

MAPPFinder Browser

GenMAPP Relationship Schema

Pathway MAPP

User Dataset (GEX)

Criterion: Blue; Gene ID: 1415904_at; Gene ID System: Affymetrix

Gene Name: Lpl; Gene ID: 16956; Gene ID System: EntrezGene
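The schema relates a dataset ID to a pathway gene through a gene database. A minimal sketch of that lookup, using the two example records from the slide, is below. The dictionaries, the `color_for` helper and the "White" default are hypothetical stand-ins, not GenMAPP's actual tables. The sketch also assumes that the gene database relates the Affymetrix probe set to the Entrez Gene ID.

```python
# User dataset (GEX): (gene ID, ID system) -> criterion color that the row met.
expression_dataset = {("1415904_at", "Affymetrix"): "Blue"}

# Gene database relation: pathway gene ID -> related IDs in other systems (assumed link).
gene_relations = {("16956", "EntrezGene"): [("1415904_at", "Affymetrix")]}

# Pathway MAPP: gene objects drawn on the pathway.
mapp_genes = [{"name": "Lpl", "gene_id": "16956", "system": "EntrezGene"}]

def color_for(gene, relations, dataset, default="White"):
    """Return the color of a MAPP gene box, following ID relations if needed."""
    key = (gene["gene_id"], gene["system"])
    if key in dataset:  # the dataset uses the same ID system as the MAPP
        return dataset[key]
    for related in relations.get(key, []):  # otherwise translate through the gene database
        if related in dataset:
            return dataset[related]
    return default  # gene not found in the dataset

for gene in mapp_genes:
    print(gene["name"], color_for(gene, gene_relations, expression_dataset))
```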

GenMAPP Supported Species

Fruit fly, Human, Mouse, Rat, Worm, Yeast, Zebrafish, Chicken, Dog, Cow

By request: Chimp, Frog, Fugu (F. rubripes), Honey bee, Mosquito, Pufferfish (T. nigroviridis)

GenMAPP Supported Gene IDs

Annotations: InterPro, EMBL, OMIM, Pfam, Gene Ontology

Species-specific: MGI, RGD, SGD, WormBase, ZFIN, HUGO, FlyBase

Gene IDs: Affymetrix, Entrez Gene, RefSeq (protein only), UniGene, UniProt, Ensembl, PDB

Available MAPP Archives

Download all MAPPs through the Downloader in GenMAPP

Contributed MAPPs: Hand-curated pathways created at GenMAPP.org or submitted by GenMAPP users. >70 MAPPs for human, mouse and rat.

Inferred MAPPs: Inferred from human contributed MAPPs, using homology information from HomoloGene and Ensembl.

Tissue-Specific MAPPs (human and mouse only): Based on the analysis of two microarray datasets generated by the Genomics Institute of the Novartis Research Foundation.

GO Sample MAPPs: A partial collection of GO terms formatted as GenMAPP MAPP files, each containing between 100 and 300 genes. GO MAPPs are formatted as lists of genes and do not contain any graphics other than the gene object and the label.

SGD metabolic MAPPs (yeast only): Derived from the yeast pathways at SGD.

KEGG converted MAPPs: Converted from the Pathway Resource at the Kyoto Encyclopedia of Genes and Genomes.

http://www.genmapp.org/featured_mapps.html

Input Data

• Data in spreadsheet summary format

• NO raw data

• Data should include metrics that you want to use as cutoffs: avg signal, ratio, fold, signal quality, p-value, cluster ID, other statistics

• Include ALL genes measured in experiment, DO NOT pre-filter

• Choose optimal primary gene ID

• Custom annotation can be useful (Database includes standard annotation)

Example: Group Comparison Experiment

• Fold changes between groups

• p-value associated with fold

• Average signal per group
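The group-comparison metrics can be computed per gene before import. The sketch below assumes positive, linear-scale signal values and a signed fold convention in which a halving is reported as -2. The gene IDs and values are made up. The p-value column is left out because it would come from a statistics package of your choice.

```python
import math
from statistics import mean

def summarize(gene_id, control, treated):
    """Average signal per group plus ratio and signed fold (treated vs. control)."""
    avg_control, avg_treated = mean(control), mean(treated)
    ratio = avg_treated / avg_control
    fold = ratio if ratio >= 1 else -1 / ratio  # assumed convention: 0.5 -> -2.0
    return {
        "gene_id": gene_id,
        "avg_control": avg_control,
        "avg_treated": avg_treated,
        "ratio": ratio,
        "log2_ratio": math.log2(ratio),
        "fold": fold,
    }

# Hypothetical signals; every measured gene is kept, changed or not.
measurements = {
    "gene_1": ([100.0, 120.0, 110.0], [230.0, 210.0, 220.0]),
    "gene_2": ([400.0, 380.0, 420.0], [200.0, 190.0, 210.0]),
    "gene_3": ([50.0, 55.0, 45.0], [50.0, 52.0, 48.0]),
}
table = [summarize(g, c, t) for g, (c, t) in measurements.items()]
for row in table:
    print(row["gene_id"], round(row["fold"], 2))
```

Keeping unchanged genes such as `gene_3` in the table matters because a global analysis needs the full set of measured genes as its background.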

GenMAPP Workflow

Input: Pre-Processed Formatted Data (with statistics, metrics)

• Import Data: Expression Dataset Manager

• Set Color Criteria: Expression Dataset Manager

• Display Data on Pathways: Drafting Board

• Create/Edit/Convert Pathways: Drafting Board, MAPPBuilder, Converter

• Gene Ontology analysis: MAPPFinder

• Export Pathways to the Web: MAPPSets

Example: Analysis of Complex Time-Course Data

Challenges:

• How to represent your data in an intuitive manner

• How to analyze patterns rather than specific comparisons

Approach:

• Set up hypotheses to test

• Attach global statistics (e.g. ANOVA) and pattern recognition

• Efficiently import data into GenMAPP

• Visualize cluster and time-point data (GenMAPP 2.1-NEW)

• Global analysis of pathway/ontologies (MAPPFinder)

• Export results to the web/for publication

Set Up Hypotheses to Test

Build a MAPP to Test a Hypothesis

• Use literature and previous knowledge about the model you are studying to build a list of candidate genes or a pathway.

Step 1):

• Collect a list of gene IDs

• Import them using the MAPPBuilder function

• Organize into a biological pathway along with predictions of expected changes.

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16

Import List of Genes in MAPPBuilder

Gene Layout on the Drafting Board

Dataset: Mouse Uterine Pregnancy Time-Course

Experiment Design:

• Analyzed 7 time-points (3-8 replicates):

  - Non-pregnant mice

  - 14.5, 16.5 and 17.5 days post fertilization

  - 18.5 days (term pregnancy)

  - 6 hours and 24 hours postpartum

• Hybridized to mouse 11k Affymetrix arrays

Analysis:

• Normalized and adjusted expression (gcrma, R)

• Performed a global F-test (multtest, R)

• Hierarchical and partitioned clustering (hopach, R)

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16

HOPACH Clustering: Hierarchical Ordered Partitioning and Collapsing Hybrid

1. Use global F-test to filter probeset list down to 3500 entries.

2. Cluster fold changes for each replicate compared to non-pregnant baseline mean.

3. Take the top-level cluster (left) and re-associate with expression data.
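Steps 1 and 2 can be sketched in plain Python. This is an illustration only: the study used the multtest and hopach packages in R, whereas the sketch ranks probesets by a hand-computed one-way ANOVA F statistic and assumes log2-scale expression values, so that a fold change is a difference from the baseline mean. The probe names and values are hypothetical.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic across time-point groups of replicates."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def top_probesets(data, keep):
    """Step 1: keep the `keep` probesets with the largest F statistic."""
    ranked = sorted(data, key=lambda p: f_statistic(list(data[p].values())), reverse=True)
    return ranked[:keep]

def folds_vs_baseline(replicates, baseline):
    """Step 2: log2 fold change of each replicate against the baseline mean."""
    base = mean(baseline)
    return [x - base for x in replicates]

# Hypothetical log2 values; "NP" is the non-pregnant baseline group.
data = {
    "probe_a": {"NP": [1.0, 2.0, 3.0], "d14.5": [2.0, 3.0, 4.0], "d18.5": [5.0, 6.0, 7.0]},
    "probe_b": {"NP": [4.0, 5.0, 6.0], "d14.5": [4.0, 5.0, 6.0], "d18.5": [4.0, 5.0, 6.0]},
}
kept = top_probesets(data, keep=1)
folds = {
    tp: folds_vs_baseline(values, data["probe_a"]["NP"])
    for tp, values in data["probe_a"].items()
    if tp != "NP"
}
print(kept, folds)
```

The per-replicate fold vectors are what a clustering method would then group. The clustering step itself is not reproduced here.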

GenMAPP Input

Import File Design:

• Include all probe data (not just filtered)

• Include the following columns of data:

  - Multtest p-values

  - HOPACH clusters

  - Average group expression values

  - Fold changes (all relevant pairwise comparisons)

  - Gene Database system code

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
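An import table with these columns might be assembled as tab-delimited text, as in the sketch below. The column names, the "X" system code for Affymetrix IDs and all values are assumptions for illustration and are not taken from the study. Check the GenMAPP documentation for the exact header and codes that the Expression Dataset Manager expects.

```python
import csv
import io

# Hypothetical column layout: ID and system code first, then statistics.
COLUMNS = ["ID", "SystemCode", "multtest_p", "hopach_cluster",
           "avg_NP", "avg_d18.5", "fold_d18.5_vs_NP"]

def write_import_table(rows, handle):
    """Write one header line and one tab-delimited line per probe."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Placeholder rows, including a probe that did not change (no pre-filtering).
example_rows = [
    {"ID": "1415904_at", "SystemCode": "X", "multtest_p": 0.001, "hopach_cluster": 3,
     "avg_NP": 8.2, "avg_d18.5": 6.9, "fold_d18.5_vs_NP": -1.3},
    {"ID": "probe_unchanged", "SystemCode": "X", "multtest_p": 0.74, "hopach_cluster": "",
     "avg_NP": 5.0, "avg_d18.5": 5.1, "fold_d18.5_vs_NP": 0.1},
]

buffer = io.StringIO()
write_import_table(example_rows, buffer)
print(buffer.getvalue())
```

Writing to a file instead only requires passing an open text handle, for example `open("import.txt", "w", newline="")`, in place of the `StringIO` buffer.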

GenMAPP Expression Dataset Manager

Import Text File into GenMAPP

• Tell GenMAPP which columns have non-numeric data.

Establishing Rules for Coloring Gene Boxes:

• Design criteria that capture any patterns you want to see.

• Here we want:

  - Fold change gradients for up- and down-regulated genes in time-point comparisons (Color Sets)

  - Different colors assigned to each HOPACH cluster

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
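The coloring rules can be thought of as ordered criteria, where the first criterion a gene meets decides its color. The sketch below assumes that evaluation order. The thresholds, colors, field names and the grey default are hypothetical and do not reproduce the Color Sets used in the study.

```python
def first_matching_color(row, color_set, default="grey"):
    """Return the color of the first criterion that the data row meets."""
    for _label, color, meets in color_set:
        if meets(row):
            return color
    return default  # no criterion met

# Color Set for one time-point comparison (hypothetical cutoffs).
fold_color_set = [
    ("up, significant", "red", lambda r: r["fold"] >= 2 and r["p"] < 0.05),
    ("down, significant", "blue", lambda r: r["fold"] <= -2 and r["p"] < 0.05),
]

# Color Set that assigns one color per HOPACH cluster.
cluster_color_set = [
    ("cluster 1", "orange", lambda r: r["cluster"] == 1),
    ("cluster 2", "green", lambda r: r["cluster"] == 2),
]

row = {"fold": 3.1, "p": 0.01, "cluster": 2}
print(first_matching_color(row, fold_color_set))     # color for the fold comparison
print(first_matching_color(row, cluster_color_set))  # color for the cluster view
```

Keeping the fold-based and cluster-based rules in separate Color Sets lets the same pathway be viewed one comparison at a time or by cluster membership.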

Viewing Time-Course Data on MAPPs

Method 1)

• View criteria one at a time on pathways of interest.

Single Color Set View

Method 2)

• View clusters directly on pathway.

Single Color Set View

Method 3)

• View all criteria of interest simultaneously.

Multiple Color Set View

Advanced Features

• Customizing a Gene Database / Creating a Gene Database for a non-supported species

  => Implement GenMAPP for a novel model species

• Create your own pathway MAPPs

  => Implement GenMAPP for a novel model species

  => Author novel pathways based on your discoveries

• High-throughput export of browsable HTML pathway archive

  => For interactive web display of data on pathway archive

International Gene Trap Consortium

GenMAPP team

The GenMAPP program can be downloaded at www.GenMAPP.org

Questions?

genmapp@gladstone.ucsf.edu, khanspers@gladstone.ucsf.edu

Bruce Conklin, Alex Pico, Alex Zambon, Karen Vranizan, Nathan Salomonis, Kam Dahlquist

http://groups.google.com/group/GenMAPP
