a modified graphical gaussian model approach for genetic...

32
A modified graphical gaussian model approach for genetic regulatory networks Anja Wille, SfS Reverse Engineering Project, ETHZ

Upload: others

Post on 17-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

A modified graphical gaussian modelapproach for genetic regulatory

networks

Anja Wille, SfSReverse Engineering Project, ETHZ

Page 2: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

biological scenarios

Introduction

modified GGMs

graphical gaussian models(GGMs)

Page 3: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Biological scenarios

DXPS2DXPS1DXPS3 DXPSx

DXR

MCT

CMK

MECPS

IPPI1

GGPPS5 GGPPS6 GGPPS2GGPPS10 GGPPS11

HMGS

AACT2

HMGR2HMGR1

MK

MPDC2

IPPI2

FPPS2

GPPS(P) GPPS(Q)

FPPS1

GPPS

AACT1

Mitochondria

CytosolChloroplast

GGPPS3 GGPPS4GGPPS7 GGPPS1GGPPS9 GGPPS12GGPPS8

GPPR

CarotenoidsChlorophylls

PS

PhytosterolsSesquiterpenes

DPPS

Isoprenoid pathways

Page 4: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Genetic regulation

39 g

enes

118 observations

… signals …

graphical model to estimateconditional dependencebetween genes

Page 5: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Graphical Gaussian Models

• undirected

• random variables follow a multivariate normal distribution

• log likelihood:

for a sample of N observations y1,…,yN withsample mean , sample covariance matrix Sand precision matrix

y-1S=W

l(m,W) = -N2

(qln(2p ) + ln | S | +tr(WS) + (y - m)'W(y - m))

• partial correlation coefficient wij|rest for gene pair ij

Page 6: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Bootstrapping and pairwise correlation

Page 7: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Application to isoprenoid pathways

DXPS1DXPS3

DXR

MCT

CMK

MECPS

IPPI1

GGPPS5 GGPPS6 GGPPS2GGPPS10

HMGS

AACT2

HMGR1

MK

MPDC2

FPPS2FPPS1

GPPS

AACT1

Mitochondria

CytosolChloroplast

GGPPS3 GGPPS4GGPPS7 GGPPS1GGPPS9 GGPPS12GGPPS8

GPPR

CarotenoidsChlorophylls

PS

PhytosterolsSesquiterpenes

Isoprenoid pathways

HMGR2

DXPSx

GPPS(P) GPPS(Q) DPPS

DXPS2

IPPI2

GGPPS11

Page 8: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Problems

• matrix inversion rank-sensitive

• genes ↑ fi exponentially increasing number of models

fi difficult to interpret

• how to interpret high partial correlation accompanied with low pairwise correlation

• efficient procedure for attaching new genes needed

Page 9: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Outline for modified GGM approach

For each pair of genes i,j:

• take pairwise correlation into account

• fit GGMs with gene triples i,j,k for all remaining genes k to study the partial correlation

• combine GGMs and pairwise correlation for inference on edge ij

• 3 methods that differ with respect to statistical framework and computational costs

Page 10: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Frequentist approach

j

i

Focus on genepair ij• pij|k is p-value from deviance test wij|k ≠ 0 versus wij|k = 0

k

j

i

k

Page 11: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Frequentist approach (cont’d)

j

i

1) Form pij,max = max (pij|k for all genes k ≠ i,j)2) Adjust pij,max according to Bonferroni-Holm or FDR3) If the adjusted value pij,max <0.05, draw edge between i and j

k

j

i

k

Page 12: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Application to isoprenoid pathway

DXPS2DXPS1DXPS3 DXPSx

DXR

MCT

CMK

MECPS

GGPPS5 GGPPS6 GGPPS2GGPPS10

HMGS

AACT2

HMGR2HMGR1

MK

MPDC2

IPPI2

FPPS1

GPPS

AACT1

Mitochondria

CytosolChloroplast

GGPPS3 GGPPS4GGPPS7 GGPPS1GGPPS9 GGPPS12GGPPS8

GPPR

CarotenoidsChlorophylls

PS

PhytosterolsSesquiterpenes

Isoprenoid pathways

IPPI1

GGPPS11

FPPS2

GPPS(P) DPPSGPPS(Q)

Page 13: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Likelihood approach with parameters q

Estimate q = {q ij for all i,j} in a maximum likelihood approach

j

i

kparameterqij foredge ij

L(q) = L(q | g)P(g)gŒGÂ

Page 14: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

EM algorithm

j

i

kparameterqij foredge ij

Q(q |q t ) = L(q | g)P(g |q t ,y)gŒGÂ to be maximized

qik

qjk

qijt +1 =

qikt gik (1-qik

t )1-gik q jkt gjk (1-q jk

t )1-gjk ⋅ L(mg,Wg)g|gij =1Â

qikt gik (1-qik

t )1-gik q jkt gjk (1-q jk

t )1-gjk ⋅ L(mg,Wg)gÂk≠ i,j

Page 15: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Simplification

qijt +1 = qik

t q jkt ⋅ P(w ij|k > 0 | y) + (1-qik

t q jkt ) ⋅ P(s ij > 0 | y)

k≠ i,j’

qijt +1 =

qikt gik (1-qik

t )1-gik q jkt gjk (1-q jk

t )1-gjk ⋅ L(mg,Wg)g|gij =1Â

qikt gik (1-qik

t )1-gik q jkt gjk (1-q jk

t )1-gjk ⋅ L(mg,Wg)gÂk≠ i,j

j

i

kqij

qik

qjk

not all GGMswith i,j,kconsidered

Page 16: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Distribution of qij

Page 17: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Conclusions

• GGM of gene triples used to look whether correlation between two genes can be “explained” by a third one

• frequentist approach simple, can be applied to many genes

• approach with q-parameters requires iteration, tested for up to70 genes

• a large set of additional genes can be attached to constructednetwork

Page 18: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Modeling at two levels

j

i

k

attach additional genes• possibly 1000s• which one “explain” edges?

genetic network• small number <100• model edges

Page 19: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Attaching additional genes

For additional genes k• include in computation of qij but keep qik and qjk fixed• qij decreases• count how often k decreases qij, validate

j

i

k

qjk fixed

Page 20: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Attaching genes from other pathways

without additional genes with additional genes

Page 21: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Attaching genes from other pathways

both pathways

carotenoid*

tocopherol*

calvin cyclechlorophyll*

abscisic acid*

onecarbonpoolphytosterol*

chloroplast

carotenoid*

onecarbonpoolchlorophyll*

calvin cycle

cytoplasm

tocopherol*

phytosterol*

downstream pathways marked by *

Page 22: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Attaching genes from other pathways

Page 23: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Hierarchical clustering

Page 24: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Conclusions

Modified GGMs

• model dependence between genes

• combine GGMs and pairwise correlation for inference on edge ij

• different statistical design, computational cost

• additional genes can be fitted into the model

• similarities in expression patterns between groups of genescan be identified (also verified in yeast data)

Page 25: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Comparison of different methods

pairwise correlation

GGM

0.68

0.96

0.60

0.800.81

0.56

0.49

0.470.43

Modified GGM1

Modified GGM2

Modified GGM3

Page 26: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Consistent results

DXPS2DXPS1DXPS3 DXPSx

DXR

MCT

CMK

MECPS

IPPI1

GGPPS5 GGPPS6 GGPPS2GGPPS10 GGPPS11

HMGS

AACT2

HMGR2HMGR1

MK

MPDC2

IPPI2

FPPS2

GPPS(P)

FPPS1

GPPS

AACT1

Mitochondria

CytosolChloroplast

GGPPS3 GGPPS4GGPPS7 GGPPS1GGPPS9 GGPPS12GGPPS8

GPPR

CarotenoidsChlorophylls

PS

PhytosterolsSesquiterpenes

GPPS(Q) DPPS

Page 27: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Acknowledgments

• Peter Buehlmann

• Stefan Bleuler, Amela Prelic, Eckhard Zitzler

• Philip Zimmermann, Lars Hennig, Willi Gruissem

Page 28: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Galactose pathway in yeast

Page 29: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Galactose pathway in yeast

Page 30: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Network for galactose pathway

Page 31: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

Network for galactose pathway

Genes that attached to network

GCY1FAR1YDR010CYEL057CYPL066WYOR121CMLF3

YLL058WYJL212CPCL10LYS1YBR139WYCR059C

Page 32: A modified graphical gaussian model approach for genetic …compdiag.molgen.mpg.de/docs/talk_12_01_04_wille.pdf · MLF3 YLL058W YJL212C PCL10 LYS1 YBR139W YCR059C † L(q)= L(q|g)P(g)

L(q) = L(q | g)P(g)gŒGÂ

Q(q |q t ) = L(q | g)P(g |q t ,y)gŒGÂ

qijt +1 = qik

t q jkt ⋅ P(w ij|k > 0 | y) + (1-qik

t q jkt ) ⋅ P(s ij > 0 | y)

k≠ i,j’

qijt +1 =

qikt gik (1-qik

t )1-gik q jkt gjk (1-q jk

t )1-gjk ⋅ L(mg,Wg)g|gij =1Â

qikt gik (1-qik

t )1-gik q jkt gjk (1-q jk

t )1-gjk ⋅ L(mg,Wg)gÂk≠ i,j

l(m,W) = -N2

(qln(2p ) + ln | S | +tr(WS) + (y - m)'W(y - m))