GenePattern

Overview for MAGE-TAB Workshop

Ted Liefeld

January 24, 2007


GenePattern: a platform for integrative genomics

Client User Interfaces: Desktop, Programming, Web

Pipeline Environment: [pipeline diagram: all_aml_train and all_aml_test datasets; Preprocess, Class Neighbors, Weighted Voting Cross-Val, SOM Clustering, and Weighted Voting Train/Test modules; SOM Cluster Viewer, Marker Selection Viewer, and Prediction Results Viewer; Golub and Slonim et al. 1999]

Module Repository: KNN, SVM, SOM, GSEA, NMF, PCA

Module Integrator


Features

Automatic Module Integration
• Add new modules without writing code
• Supports any command-line callable code (language independent)

Multiple user interfaces
• Desktop client
• Web client
• Programmatic interfaces to Java, MATLAB, R

Local and Distributed Computing
• Laptop
• Client/Server
• Compute farm
• Public server (1/2008)

Interoperability
• caBIG
• caArray
• caGrid
• geWorkbench
• Cytoscape

Analytic Reproducibility
• Easy, rapid sharing of methodologies via pipelines
• Versioning using Life Sciences Identifier (LSID)
• Executable history of all sessions
• Automatic pipeline generation from result files
• Executable research documents
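To make the LSID versioning bullet concrete: an LSID is a URN of the form urn:lsid:authority:namespace:object[:revision], so two versions of one module differ only in the trailing revision. The sketch below is illustrative only; the helper names and the example identifiers are assumptions, not entries from the GenePattern repository.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no revision


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def same_module(a: Lsid, b: Lsid) -> bool:
    """True when two LSIDs name the same object, ignoring the revision."""
    return a[:3] == b[:3]


# Hypothetical identifiers, for illustration only.
v1 = parse_lsid("urn:lsid:example.org:module.analysis:00001:1")
v2 = parse_lsid("urn:lsid:example.org:module.analysis:00001:2")
```

Here same_module(v1, v2) holds while v1.revision and v2.revision differ, which is the property a pipeline needs in order to pin an exact module version.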

Comprehensive Module Repository
• ~90 modules: analysis, visualization, pipelines
• Expression, proteomic, sequence, variation (SNP), and whole genome association data
• Construction of context-sensitive, flexible analytic workflows
• Module suites


Module Integrator

• Add modules and visualizers without writing code
• Share custom analysis tasks
• Integrate your own or “third-party” tools easily
• Add tools to a common repository
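Integrating a tool without writing code comes down to describing its command line and its parameters. A minimal sketch of that idea follows, assuming a template with <name> placeholders; the template syntax, the tool name, and the parameter names are hypothetical and are not GenePattern's actual module description format.

```python
import re
import shlex
from typing import Dict, List

PLACEHOLDER = re.compile(r"<([A-Za-z0-9_.]+)>")


def build_command(template: str, params: Dict[str, str]) -> List[str]:
    """Expand <name> placeholders and return an argv list for subprocess.run."""

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter: {name}")
        return params[name]

    # Split the template first so a value containing spaces stays one argument.
    return [PLACEHOLDER.sub(substitute, token) for token in shlex.split(template)]


# Hypothetical tool and parameter names, for illustration only.
template = "perl normalize.pl --in <input.file> --out <output.file> --scale <scale>"
argv = build_command(template, {
    "input.file": "all aml train.gct",
    "output.file": "normalized.gct",
    "scale": "2.0",
})
```

Because the result is an argument list rather than a shell string, it can be passed to subprocess.run without quoting concerns, whatever language the wrapped tool is written in.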


GenePattern as a Visualization & Analysis Engine

http://www.broad.mit.edu/mmgp

[Diagram labels: Portal; Run GenePattern Analyses; GenePattern; LSF Worker Nodes; GenePattern SNPViewer visualizer (running as applet)]


Using MAGE-ML today


MAGE-TAB use tomorrow

Ideally
• Be able to automatically find raw/derived bioassay data when parsing MAGE-TAB files
• Use MAGE-TAB like our native (tab-delimited) data formats, GCT and RES, in (almost) any GenePattern analysis module
• Not require user interaction to specify Assays or quantitation types
• ? MGED Ontology terms for common data transform protocols (e.g. RMA, MAS5) in addition to free text

Sub-optimal but still good
• Have an interactive viewer to convert from MAGE-TAB to a native format (e.g. MAGE-ML import viewer)
• Human interaction required…
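For readers unfamiliar with the native formats mentioned above: a GCT file is a tab-delimited matrix with a #1.2 version line, a line giving the row and column counts, a header line beginning with Name and Description, and then one row per feature. The sketch below writes that layout from an in-memory matrix. The probe and sample names are made up, and pulling the matrix out of a MAGE-TAB data file is deliberately left out, since that is the open question on this slide.

```python
import io
from typing import Dict, List, TextIO


def write_gct(out: TextIO, samples: List[str],
              rows: Dict[str, List[float]],
              descriptions: Dict[str, str]) -> None:
    """Write an expression matrix in GCT 1.2 layout."""
    out.write("#1.2\n")
    out.write(f"{len(rows)}\t{len(samples)}\n")
    out.write("\t".join(["Name", "Description"] + samples) + "\n")
    for name, values in rows.items():
        if len(values) != len(samples):
            raise ValueError(f"{name}: expected {len(samples)} values")
        # Use a placeholder when no description is known for a row.
        fields = [name, descriptions.get(name, "na")] + [str(v) for v in values]
        out.write("\t".join(fields) + "\n")


# Made-up probe and sample names, for illustration only.
buffer = io.StringIO()
write_gct(buffer,
          samples=["sample_1", "sample_2"],
          rows={"probe_a": [10.5, 12.0], "probe_b": [3.0, 4.25]},
          descriptions={"probe_a": "example gene A"})
gct_text = buffer.getvalue()
```

A converter along these lines would still need to know which assays and which quantitation type to export, which is why the slide asks for that choice to be discoverable without user interaction.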


More MAGE-TAB thoughts

Define structure/format for keeping multiple MAGE-TAB files together
• IDF, ADF, SDRF, raw data files -> package together as ZIP? tgz?
• Sub directories in the zip? (defined)

Does MAGE-TAB support multiple arrays in one file?
• Useful, and MAGE-ML allows this now (but I don’t like it for automated processing)
• E.g. E-GEOD-995.mageml.tgz from ArrayExpress
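One possible answer to the packaging question, sketched with Python's standard zipfile module. The layout (metadata documents at the top level, everything else under data/) and the .idf.txt, .sdrf.txt, and .adf.txt suffixes are assumptions made for illustration; the slide leaves the actual structure undefined.

```python
import io
import zipfile
from typing import Dict, List


def package_magetab(files: Dict[str, bytes]) -> bytes:
    """Bundle MAGE-TAB documents and data files into one ZIP archive."""
    # Hypothetical layout: metadata at the top level, raw data under data/.
    metadata_suffixes = (".idf.txt", ".sdrf.txt", ".adf.txt")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in sorted(files.items()):
            folder = "" if name.lower().endswith(metadata_suffixes) else "data/"
            archive.writestr(folder + name, content)
    return buffer.getvalue()


def list_package(blob: bytes) -> List[str]:
    """Return the member names of a packaged archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.namelist()


# Made-up file names and contents, for illustration only.
blob = package_magetab({
    "experiment.idf.txt": b"Investigation Title\tExample\n",
    "experiment.sdrf.txt": b"Source Name\tArray Data File\nsample_1\tsample_1.CEL\n",
    "sample_1.CEL": b"raw data placeholder",
})
names = list_package(blob)
```

A fixed layout like this is what would let an analysis module locate the SDRF and the data files it references without asking the user.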


More MAGE-TAB thoughts

Persistent identifiers
• For protocols, samples etc
• Allow use of SDRF, data matrix (e.g. in GP with persistent references to external entities)
• Array details, experiment design, etc

Question? Should we consider MAGE-TAB DAG to record data processing pipelines (provenance - HLA)?
• e.g. a protocol for each module execution added to MAGE-TAB file outputs
• File growth issues…
• Record all analysis for a publication
• Add additional SDRF file at each step
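The provenance question can be pictured as a small graph problem: each module execution is a step with input files and one output file, and the history of any result is the chain of steps behind it. The sketch below is one way to model that, not a proposal from the talk. The Step record and the file names are hypothetical; the module names echo ones shown earlier in the deck.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    protocol: str            # module (protocol) that was executed
    inputs: Tuple[str, ...]  # files consumed
    output: str              # file produced


def lineage(target: str, steps: List[Step]) -> List[str]:
    """Return the protocols that led to `target`, earliest first."""
    produced_by: Dict[str, Step] = {s.output: s for s in steps}
    ordered: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        step = produced_by.get(name)
        if step is None or name in seen:  # raw input, or already handled
            return
        seen.add(name)
        for parent in step.inputs:
            visit(parent)
        ordered.append(step.protocol)

    visit(target)
    return ordered


# Made-up file names, for illustration only.
steps = [
    Step("Preprocess", ("raw.gct",), "preprocessed.gct"),
    Step("SOM Clustering", ("preprocessed.gct",), "clusters.txt"),
    Step("Weighted Voting", ("preprocessed.gct",), "predictions.txt"),
]
history = lineage("clusters.txt", steps)
```

Only the steps on the path to the requested file are reported, which hints at one way to limit the file growth the slide worries about: record every step once and derive per-result histories on demand.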


Release Information

Initially released in March, 2004

Current version 3.0, released April 2007; 3.1 due Feb 08

Currently 5900+ users, 500+ organizations, ~90 countries

Availability

Freely available

Windows, Mac OS, and Unix platforms

Resources

http://www.genepattern.org

User workshops, documentation, email help desk, online user forum

Reich et al. (2006) Nature Genetics

GenePattern is a winner of the 2005 BioIT World Best Practices Award

Collaborations

caBIG

MAGNet NCBC

NCIBI NCBC