A Factor Graph Model for Minimal Gene Set Enrichment Analysis

Diana Uskat

Computational Biology - Gene Center Munich


24.03.2010 Diana Uskat - Gene Center Munich 2

Problem outline:

• Single-gene analysis of microarray experiments entails a large multiple testing problem

• Even after appropriate multiple testing correction, the result is usually a long list of differentially expressed genes

• Such a list is difficult to interpret by hand

Possible improvement: Gene set enrichment analysis

1. Group genes into different biologically meaningful categories (Gene Ontology, KEGG Pathways, Transcription factor targets)

2. Use a statistical method to find the categories that are enriched for differentially expressed genes

Motivation

[Figure: cutout of the Gene Ontology; graph from Ontologizer by S. Bauer, J. Gagneur, P. N. Robinson (NAR 2010)]
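Step 2 of gene set enrichment analysis is often done with a one-sided hypergeometric test per category. The sketch below assumes that test (a common choice in term-for-term tools, not necessarily the one used in this talk); the function name `hypergeom_enrichment_p` and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array (population size)
    K: genes annotated to the category
    n: differentially expressed genes (study set size)
    k: differentially expressed genes annotated to the category
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the exact hypergeometric probabilities of k, k+1, ..., upper hits.
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / total

# Hypothetical example: 1000 genes, a category of 20 genes,
# 50 differentially expressed genes of which 8 fall into the category.
p = hypergeom_enrichment_p(N=1000, K=20, n=50, k=8)
print(f"{p:.2e}")
```

Running this test once per category is exactly what produces the multiple testing problem discussed next.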


Established Methods:

• GSEA (Subramanian, Tamayo)

• topGO (Alexa)

• globaltest (Goeman, Mansmann)

• GOstats (Falcon, Gentleman)

Drawbacks:

• There are often thousands of overlapping categories, and genes can belong to multiple categories, which creates a new, difficult multiple testing problem

• Group testing often returns a large number of significant categories, which makes it difficult to identify the biologically relevant ones

Motivation

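To illustrate the multiple testing step that the established methods need, here is a minimal sketch of Benjamini-Hochberg adjusted p-values over a list of category p-values. This is the standard false discovery rate procedure, shown only to make the problem concrete; it is not part of the minimal model, and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four overlapping categories.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Because the categories overlap, their tests are correlated, and a long list of significant categories can remain even after this correction.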


Minimal Gene Set Enrichment

Idea (Bauer, Gagneur et al., Nucleic Acids Research 2010)

• Search for a sparse explanation, i.e. a minimal number of categories that explain the data (sufficiently well)

• Use a simple probabilistic graphical model relating categories and genes, and perform Bayesian inference on the marginal posterior of each category

[Figure: two bipartite graphs linking categories T1, T2, T3 to genes E1, E2, E3; an edge means, e.g., "gene E3 is element of category T3", and coloured nodes are "on". Panels: "Correct explanation" and "Correct minimal explanation".]
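The idea of a minimal explanation can be illustrated in a noise-free toy setting: find the smallest sets of categories whose genes together are exactly the active genes. The sketch below does this by exhaustive search, which is exponential in the number of categories and ignores noise; the probabilistic model replaces it with posterior inference. The function name and the category memberships are hypothetical, since the edges of the figure are not recoverable.

```python
from itertools import combinations

def minimal_explanations(categories, active_genes):
    """Return all smallest sets of categories whose union is exactly the active genes."""
    if not active_genes:
        return [set()]
    names = sorted(categories)
    for size in range(1, len(names) + 1):
        hits = [set(combo) for combo in combinations(names, size)
                if set().union(*(categories[c] for c in combo)) == active_genes]
        if hits:
            return hits  # the first size with any hit is the minimal size
    return []  # no exact explanation exists

# Hypothetical memberships ("gene E3 is element of category T3", ...).
categories = {
    "T1": {"E1", "E2"},
    "T2": {"E1", "E2", "E3"},
    "T3": {"E3"},
}
print(minimal_explanations(categories, {"E1", "E2", "E3"}))  # [{'T2'}]
```

Here {T1, T3} is also a correct explanation, but only {T2} is minimal.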


Minimal Gene Set Enrichment

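[Figure: three-layer network with categories T1, T2, T3, genes E1, E2, E3 and observations (data) D1, D2, D3]

The model

A Bayesian network factorization of the full posterior (posterior ∝ likelihood × prior):

$$\Pr(T, E \mid D) \propto \Pr(D \mid E)\,\Pr(E \mid T)\,\Pr(T)$$

Main trick: use a prior favoring sparse solutions.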
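For a handful of categories, the factorized posterior can be evaluated exactly by enumerating all category states. The sketch below assumes a deterministic gene layer (E_j is on if and only if some category containing gene j is on), binary observations with hypothetical false-positive and false-negative rates `alpha` and `beta`, and an independent prior with probability `p` per category; these modelling choices, the function name and all parameter values are illustrative assumptions.

```python
from itertools import product

def posterior_marginals(categories, genes, observed, alpha, beta, p):
    """Exact Pr(T_i = 1 | D) for every category by enumerating all states of T.

    Illustrative assumptions (not taken from the talk):
      Pr(T):     independent, each category on with probability p
      Pr(E | T): gene j is on iff some category containing it is on
      Pr(D | E): false-positive rate alpha, false-negative rate beta
    Because E is a deterministic function of T, summing over E is trivial.
    """
    names = sorted(categories)
    weights = {}
    for state in product((0, 1), repeat=len(names)):
        prior = 1.0
        for t in state:
            prior *= p if t else 1 - p
        on = [categories[c] for c, t in zip(names, state) if t]
        active = set().union(*on)
        likelihood = 1.0
        for g in genes:
            if g in active:
                likelihood *= (1 - beta) if g in observed else beta
            else:
                likelihood *= alpha if g in observed else 1 - alpha
        weights[state] = prior * likelihood  # unnormalized Pr(T, E | D)
    z = sum(weights.values())
    return {c: sum(w for s, w in weights.items() if s[i]) / z
            for i, c in enumerate(names)}

categories = {"T1": {"E1", "E2"}, "T2": {"E1", "E2", "E3"}, "T3": {"E3"}}
marginals = posterior_marginals(categories, ["E1", "E2", "E3"],
                                observed={"E1", "E2", "E3"},
                                alpha=0.1, beta=0.1, p=0.2)
print({c: round(m, 3) for c, m in marginals.items()})
# {'T1': 0.347, 'T2': 0.805, 'T3': 0.296}
```

With p = 0.2, the single category T2 receives the highest marginal even though T1 and T3 together explain the same genes: the sparse prior penalizes every extra active category.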


Factor Graphs

[Figure: factor graph with category variables T1, T2, T3, gene variables E1, E2, E3 and observation nodes D1, D2, D3]

• Graphical model (Kschischang et al., IEEE Trans. Inf. Theory, 2001)

• Bipartite graph with factor nodes and variable nodes

• Each factor node encodes a function of its neighbouring variables

• Efficient computation of marginal distributions with the sum-product algorithm (exact if the factor graph is a tree)

Our method: Factor Graphs
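Since exactness of the sum-product algorithm depends on the factor graph being a tree, it can help to check this before running it. A minimal sketch, assuming the bipartite graph is given as a list of (factor, variable) edges; the function name and the node names in the example are hypothetical. It uses union-find: an acyclic graph with one edge fewer than nodes is connected, hence a tree.

```python
def is_tree(variables, factors, edges):
    """Check whether a bipartite factor graph is a tree.

    variables, factors: iterables of node names (assumed disjoint)
    edges: (factor, variable) pairs, one per factor-variable adjacency
    """
    nodes = list(variables) + list(factors)
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for f, v in edges:
        rf, rv = find(f), find(v)
        if rf == rv:
            return False  # this edge closes a cycle
        parent[rf] = rv
    # Acyclic with |nodes| - 1 edges means connected, i.e. a tree.
    return len(edges) == len(nodes) - 1

# One gene E1 with two parent categories: a tree.
print(is_tree(["T1", "T2", "E1"], ["g1", "f1"],
              [("g1", "T1"), ("g1", "T2"), ("g1", "E1"), ("f1", "E1")]))  # True
# A second gene sharing both parents creates the cycle T1-g1-T2-g2-T1.
print(is_tree(["T1", "T2", "E1", "E2"], ["g1", "g2", "f1"],
              [("g1", "T1"), ("g1", "T2"), ("g1", "E1"), ("f1", "E1"),
               ("g2", "T1"), ("g2", "T2"), ("g2", "E2")]))  # False
```

Overlapping categories quickly produce such cycles, which is why real gene set graphs are usually not trees.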


Factor Graphs

[Figure: the same graph with factor nodes f1, f2, f3 attached to the gene variables E1, E2, E3]

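$$\Pr(T, E \mid D) \propto \prod_{j \in J} f_j(E_j)\,\prod_{j \in J} g_j\big(E_j, T_{\mathrm{ne}(j)}\big)\,f_T(T)$$

where $T_{\mathrm{ne}(j)}$ denotes the categories adjacent to gene $E_j$.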


The likelihood factors $f_j(E_j) = \Pr(D_j \mid E_j)$ are given by the dataset.
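The talk only states that Pr(D|E) comes from the dataset. A minimal sketch of one factor f_j as a two-entry table, assuming binary observations D_j with a hypothetical false-positive rate `alpha` and false-negative rate `beta` (both the observation model and the names are assumptions):

```python
def likelihood_factor(d_j, alpha, beta):
    """Table of f_j(e) = Pr(D_j = d_j | E_j = e) for e in {0, 1}.

    alpha: assumed false-positive rate, Pr(D_j = 1 | E_j = 0)
    beta:  assumed false-negative rate, Pr(D_j = 0 | E_j = 1)
    """
    if d_j == 1:
        return {0: alpha, 1: 1 - beta}  # gene observed "on"
    return {0: 1 - alpha, 1: beta}      # gene observed "off"

print(likelihood_factor(1, alpha=0.05, beta=0.25))  # {0: 0.05, 1: 0.75}
print(likelihood_factor(0, alpha=0.05, beta=0.25))
```

Once D_j is fixed by the data, f_j depends only on E_j, so it is a factor with a single neighbouring variable.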


Factor Graphs

[Figure: the same graph with additional factor nodes g1 to g6 between the gene and category variables]


The factors $g_j$ encode that $E_j$ is only active if at least one of its parent categories is active.
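A minimal sketch of g_j as a deterministic table. The slide only states that a gene is active only if at least one parent is active; this sketch additionally assumes the converse (a gene in an active category is always active), which makes g_j a logical OR. The function name is hypothetical.

```python
from itertools import product

def or_factor(e_j, parent_states):
    """g_j(E_j, T_ne(j)): 1.0 if E_j equals the OR of its parent categories, else 0.0."""
    return 1.0 if e_j == int(any(parent_states)) else 0.0

# Tabulate g_j for a gene with two parent categories.
for t1, t2 in product((0, 1), repeat=2):
    print(t1, t2, {e: or_factor(e, (t1, t2)) for e in (0, 1)})
```

A softer variant would replace the 0/1 entries with probabilities; the deterministic form keeps the example small.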


Factor Graphs

[Figure: the same graph with an additional prior factor f_T on the category variables]


The prior factor favors sparse solutions:

$$f_T(T) = \prod_{j=1}^{N} p^{T_j}\,(1-p)^{1-T_j}, \qquad 0 < p < 0.5$$
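To see why p < 0.5 favors sparse solutions: each additional active category multiplies f_T by p / (1 - p) < 1. A minimal sketch of log f_T for a binary vector T; the function name and the example values are hypothetical, and rejecting p outside (0, 0.5) is a design choice of this sketch.

```python
import math

def log_prior(t, p):
    """log f_T(T) = sum_j [T_j log p + (1 - T_j) log(1 - p)] for binary T."""
    if not 0 < p < 0.5:
        raise ValueError("the sparsity prior assumes 0 < p < 0.5")
    return sum(math.log(p) if t_j else math.log(1 - p) for t_j in t)

# One active category versus two: the sparser state is 4 times more likely a priori.
print(log_prior([0, 1, 0], 0.2) - log_prior([1, 0, 1], 0.2))  # ≈ 1.386 (= log 4)
```

Working in log space avoids underflow when N, the number of categories, is large.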


Estimation Methods for Factor Graphs


Computation of the posterior for T, E:

• Message-passing algorithm: sum-product algorithm

• Stops at the correct result after one round if the graph has a tree structure

• No guarantees if the graph has cycles (e.g., oscillation may occur); however, it works well in practice

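Principle:

• Start in the leaf nodes

• Message propagation:

– variable to factor node: product of the messages from all other neighbouring factors

– factor to variable node: sum, over the factor's other variables, of the factor times their incoming messages

• Termination: compute the marginal distribution of each variable node as the normalized product of its incoming messages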
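The propagation rules can be sketched for binary variables on a tree-shaped factor graph. In this minimal sketch, messages are computed recursively (which reaches the leaves first) and cached; factors are plain Python functions, and the example splits f_T into one factor per category (allowed because f_T is a product) so the graph stays a tree. The function name, factor names and parameter values are hypothetical, and the code does not handle cycles, where loopy propagation would need iterative updates instead.

```python
from functools import lru_cache
from itertools import product
from math import prod

def sum_product_marginals(variables, factors):
    """Exact marginals Pr(v = 1) on a tree-shaped factor graph via sum-product.

    variables: names of binary variables (values 0 and 1)
    factors:   dict factor name -> (scope tuple of variable names, function)
    Assumes the graph has no cycles; a cycle would make the recursion fail.
    """
    nbrs = {v: [f for f, (scope, _) in factors.items() if v in scope]
            for v in variables}

    @lru_cache(maxsize=None)
    def var_to_factor(v, f):
        # Variable -> factor: product of messages from all other neighbouring factors.
        msg = (1.0, 1.0)
        for g in nbrs[v]:
            if g != f:
                m = factor_to_var(g, v)
                msg = (msg[0] * m[0], msg[1] * m[1])
        return msg

    @lru_cache(maxsize=None)
    def factor_to_var(f, v):
        # Factor -> variable: sum over the other variables of factor * incoming messages.
        scope, fn = factors[f]
        others = [u for u in scope if u != v]
        incoming = {u: var_to_factor(u, f) for u in others}
        msg = [0.0, 0.0]
        for x in (0, 1):
            for vals in product((0, 1), repeat=len(others)):
                assign = dict(zip(others, vals))
                assign[v] = x
                weight = prod(incoming[u][assign[u]] for u in others)
                msg[x] += fn(*(assign[u] for u in scope)) * weight
        return tuple(msg)

    marginals = {}
    for v in variables:
        belief = [1.0, 1.0]
        for f in nbrs[v]:
            m = factor_to_var(f, v)
            belief = [belief[0] * m[0], belief[1] * m[1]]
        marginals[v] = belief[1] / (belief[0] + belief[1])  # normalize
    return marginals

p, alpha, beta = 0.2, 0.1, 0.1  # hypothetical parameters
factors = {
    "prior_T1": (("T1",), lambda t: p if t else 1 - p),
    "prior_T2": (("T2",), lambda t: p if t else 1 - p),
    "g1": (("E1", "T1", "T2"), lambda e, t1, t2: 1.0 if e == int(t1 or t2) else 0.0),
    "f1": (("E1",), lambda e: 1 - beta if e else alpha),  # D1 observed "on"
}
print(sum_product_marginals(["T1", "T2", "E1"], factors))
```

On a tree, the result matches brute-force summation over all joint states, at a cost that grows with factor sizes rather than with the total number of variables.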


Application: Yeast Salt Stress

• Categories: transcription factors (with their targets) instead of GO categories

• Given:

– List of transcription factors with their corresponding target genes

– List of genes (with their p-values) from a yeast salt stress experiment

• Question: Which transcription factors are active during salt stress?

• Task: Find a set of transcription factors that are most likely to be active

[Figure: bipartite graph linking transcription factors TF1, TF2 to genes g1 to g5; an edge means, e.g., "g2 is target of TF2"]
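A minimal sketch of turning the two inputs into model data: the TF-target pairs give each category (a TF's targets) and each gene's parents, and the gene list gives the observations D. The pairs and p-values are hypothetical, and the 0.05 cutoff for calling a gene observed "on" is an assumption; the talk does not say how p-values enter Pr(D|E).

```python
from collections import defaultdict

def build_tf_graph(pairs):
    """Return (targets, regulators) for the bipartite TF-target graph."""
    targets = defaultdict(set)     # TF -> its target genes (the "category")
    regulators = defaultdict(set)  # gene -> TFs that bind it (its parents)
    for tf, gene in pairs:
        targets[tf].add(gene)
        regulators[gene].add(tf)
    return dict(targets), dict(regulators)

# Hypothetical edges in the style of "g2 is target of TF2".
pairs = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g2"), ("TF2", "g3"), ("TF2", "g4")]
targets, regulators = build_tf_graph(pairs)
print(sorted(regulators["g2"]))  # ['TF1', 'TF2']

# Hypothetical p-values and cutoff for turning the gene list into binary observations D.
pvalues = {"g1": 0.001, "g2": 0.03, "g3": 0.4, "g4": 0.7, "g5": 0.02}
observed_on = {g for g, pv in pvalues.items() if pv < 0.05}
print(sorted(observed_on))  # ['g1', 'g2', 'g5']
```

A gene such as g5 without any regulator can only be explained as a false positive under this kind of model.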


Results

~2,000 genes

118 transcription factors

Graph obtained from a re-analysis of the Harbison et al. TF binding data (Nature, 2004) by MacIsaac et al. (BMC Bioinformatics, 2006)


[Figure legend: previously known transcription factors involved in salt stress (Capaldi et al., Nat. Genet. 2008; Wu and Chen, Bioinform. Biol. Insights 2009); differentially phosphorylated transcription factors (Soufi et al., Mol. BioSyst. 2009)]

Transcription factors labelled in the figure: YML081W, DAL81, STB4, HSF1, UME6, SNT2, RGT1, MET28, MSN2, GAL4, SKO1


Summary and Outlook

• Lists of (meaningful) gene sets are better than lists of genes

• Searching for biologically meaningful explanations requires a new minimal model (MGSE) for gene set enrichment analysis

• We use factor graphs for posterior estimation

• Wide application to GO analysis, TF-target analysis and pathway enrichment

• To do: scalability and speed


Acknowledgments

Gene Center Munich:

Achim Tresch, Theresa Niederberger, Björn Schwalb, Sebastian Dümcke

Collaborating Partners:

Gene Center Munich:

Patrick Cramer, Christian Miller, Daniel Schulz, Dietmar Martin, Andreas Mayer

EMBL Heidelberg:

Julien Gagneur (talk Nov. 2009, working group conference of the GMDS "AG Statistische Methoden in der Bioinformatik", Munich)