Bayesian Networks as framework for data integration · 2013. 7. 23.

Page 1

Bayesian Networks as framework for data integration

Jun Zhu, Ph. D.

Department of Genomics and Genetic Sciences

Icahn Institute of Genomics and Multiscale Biology

Icahn Medical School at Mount Sinai

New York, NY

@IcahnInstitute UCLA workshop, July, 2013---Jun Zhu, Ph. D.

Page 2

What are Bayesian networks?

Association vs Causality

From Stephen Friend

Page 3

A simple biological question: are there causal/reactive relationships?

Page 4

A Bayesian network approach:

Best model

Page 5

A Bayesian network approach:

[Figure: best models over nodes A, B, and C, and their Markov equivalent models]

Page 6

A Bayesian network ≠ a causal structure

Markov equivalent models

[Figure: three Markov equivalent models over nodes A, B, and C]

Shared conditional independence: $B \perp C \mid A$
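The point that Markov equivalent models cannot be distinguished from observational data alone can be checked numerically. The following is a sketch under assumptions that the slide does not state: linear-Gaussian nodes and a BIC score. It fits three DAGs that share the independence B ⊥ C | A (the fork B ← A → C and the two chains through A), plus one DAG that does not (the collider B → A ← C), to data simulated from the fork.

```python
import numpy as np

def node_loglik(data, child, parents):
    """Maximum Gaussian log-likelihood of one node regressed on its parents."""
    n = data.shape[0]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    y = data[:, child]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)  # MLE of the residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = X.shape[1] + 1  # regression coefficients + residual variance
    return loglik, n_params

def bic_score(data, dag):
    """BIC of a DAG given as {node: [parent, ...]}; higher is better here."""
    n = data.shape[0]
    score = 0.0
    for child, parents in dag.items():
        loglik, k = node_loglik(data, child, parents)
        score += loglik - 0.5 * k * np.log(n)
    return score

A, B, C = 0, 1, 2  # column indices in the data matrix
dags = {
    "fork B<-A->C": {A: [], B: [A], C: [A]},
    "chain B->A->C": {B: [], A: [B], C: [A]},
    "chain C->A->B": {C: [], A: [C], B: [A]},
    "collider B->A<-C": {B: [], C: [], A: [B, C]},
}

# Simulated data from the fork B <- A -> C (hypothetical effect sizes).
rng = np.random.default_rng(1)
n = 2000
a = rng.normal(size=n)
data = np.column_stack([a, a + rng.normal(size=n), a + rng.normal(size=n)])
scores = {name: bic_score(data, dag) for name, dag in dags.items()}
for name, s in scores.items():
    print(f"{name:18s} {s:10.2f}")
```

The three equivalent DAGs tie up to floating-point error. The score alone therefore cannot orient their edges. Only the collider, which implies a different set of independencies, scores differently.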

Page 7

[Figure: a diabetes-resistant and a diabetes-susceptible F0 strain are crossed (X) to produce F1 and then F2 mice]

Animal model: mouse F2 intercrosses

Bayesian network: how to break Markov equivalence?

Page 8

[Figure: tissues (liver, brain, muscle, white adipose); genotyping; constructing a genetic map; scanning QTLs for clinical traits and molecular profiling data; network reconstruction]

General data flow for genetic crosses
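The "scanning QTLs" step can be illustrated with the simplest kind of QTL scan, single-marker regression, which produces a LOD score for each genotyped marker. This is a sketch, not the pipeline used in the talk. It assumes additive genotype coding (0/1/2 copies of one parental allele in an F2), unlinked markers, and a normally distributed trait. Interval mapping and permutation-based thresholds are omitted.

```python
import numpy as np

def single_marker_lod(genotypes, trait):
    """LOD score for each marker from single-marker linear regression.

    genotypes: (n_samples, n_markers) array coded 0/1/2 (additive coding, an assumption)
    trait:     (n_samples,) clinical or molecular (e.g. expression) trait
    """
    n = len(trait)
    rss_null = np.sum((trait - trait.mean()) ** 2)  # intercept-only model
    lods = np.empty(genotypes.shape[1])
    for j in range(genotypes.shape[1]):
        X = np.column_stack([np.ones(n), genotypes[:, j]])
        beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
        rss_alt = np.sum((trait - X @ beta) ** 2)
        # Gaussian errors: LOD = log10 likelihood ratio of marker model vs. null
        lods[j] = 0.5 * n * np.log10(rss_null / rss_alt)
    return lods

# Hypothetical F2-like data: 200 animals, 50 unlinked markers,
# and a trait driven by marker 10.
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.5, size=(200, 50)).astype(float)
trait = 1.0 * geno[:, 10] + rng.normal(size=200)
lods = single_marker_lod(geno, trait)
print(int(np.argmax(lods)), round(float(lods.max()), 1))
```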

Page 9

Causal inference: genetics

Perturbations with a causal anchor: natural variation in a segregating population provides the same type of causal anchor.

[Figure: Central Dogma of Biology. Variation in DNA (e.g., alleles AACAGTT and AACGGTT of gene X) leads to variation in mRNA (high expression, alternative splicing, codon change, etc. for one allele; low expression, no alternative splicing, no codon change, etc. for the other). Variation in mRNA leads to variation in protein, which in turn can lead to disease.]

Schadt et al. Nature Genetics (2005)
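The causal-anchor idea can be made concrete with a likelihood-based comparison of three models for a locus L and two traits R and C, in the spirit of the model selection in Schadt et al. (2005): causal (L → R → C), reactive (L → C → R), and independent (L → R and L → C). The following is a simplified sketch, not the published procedure. It assumes linear-Gaussian conditionals, additive 0/1/2 genotype coding, and BIC as the selection criterion.

```python
import numpy as np

def fit_loglik(y, x):
    """Max Gaussian log-likelihood of y regressed on x (with intercept), and its parameter count."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1), X.shape[1] + 1

def causal_call(locus, r, c):
    """Return the best of 'causal' / 'reactive' / 'independent' and the BIC per model.

    Lower BIC wins. P(L) is common to all three models, so only the
    conditional terms are scored.
    """
    n = len(locus)
    models = {
        "causal": [fit_loglik(r, locus), fit_loglik(c, r)],           # L -> R -> C
        "reactive": [fit_loglik(c, locus), fit_loglik(r, c)],         # L -> C -> R
        "independent": [fit_loglik(r, locus), fit_loglik(c, locus)],  # L -> R, L -> C
    }
    bic = {name: sum(-2 * ll + k * np.log(n) for ll, k in terms)
           for name, terms in models.items()}
    return min(bic, key=bic.get), bic

# Hypothetical simulation in which R truly drives C.
rng = np.random.default_rng(0)
n = 5000
locus = rng.binomial(2, 0.5, size=n).astype(float)
r = locus + rng.normal(size=n)
c = r + rng.normal(size=n)
call, bic = causal_call(locus, r, c)
print(call)
```

All three models have the same number of parameters here, so BIC reduces to comparing maximized likelihoods. The independent model as written ignores any residual R-C correlation. That is one of the simplifications in this sketch.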

Page 10

A Bayesian network approach:

[Figure: best models and their Markov equivalent models]

Page 11

Structure priors based on causality

▶ Estimate confidence of causality

– Bootstrap the samples 200 times

– Fractions of causal, reactive, and independent calls

▶ The pair is independent

▶ The pair is causal/reactive

Zhu et al., PLoS CompBio, 2007
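The bootstrap step on this slide (resample 200 times and record the fraction of causal, reactive, and independent calls) can be sketched as shown below. The per-pair caller is a compact stand-in, not the published test. It assumes linear-Gaussian models and additive genotype coding. The sketch does not reproduce how the fractions are converted into structure-prior values for the independent and causal/reactive cases.

```python
import numpy as np

def resid_var(y, x):
    """MLE residual variance of y regressed on x with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

def call_pair(locus, r, c):
    """Stand-in causal/reactive/independent call for one gene pair anchored at a locus.

    All three linear-Gaussian models have the same parameter count, so the
    best maximized likelihood is the smallest product of residual variances.
    """
    scores = {
        "causal": resid_var(r, locus) * resid_var(c, r),            # L -> R -> C
        "reactive": resid_var(c, locus) * resid_var(r, c),          # L -> C -> R
        "independent": resid_var(r, locus) * resid_var(c, locus),   # L -> R, L -> C
    }
    return min(scores, key=scores.get)

def bootstrap_call_fractions(locus, r, c, n_boot=200, seed=0):
    """Fraction of bootstrap resamples that yield each call."""
    rng = np.random.default_rng(seed)
    n = len(locus)
    counts = {"causal": 0, "reactive": 0, "independent": 0}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        counts[call_pair(locus[idx], r[idx], c[idx])] += 1
    return {call: k / n_boot for call, k in counts.items()}

# Hypothetical simulation in which R truly drives C.
rng = np.random.default_rng(1)
n = 3000
locus = rng.binomial(2, 0.5, size=n).astype(float)
r = locus + rng.normal(size=n)
c = r + rng.normal(size=n)
fractions = bootstrap_call_fractions(locus, r, c)
print(fractions)
```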

Page 12

Bayesian network: integrating genetic data

• Gives Bayesian networks a sense of causality

• How much improvement is achieved by integrating genetic data?

Page 13

Bayesian Network: a simulation study

Zhu et al., PLoS CompBio, 2007

Page 14

Bayesian network: genetic information is critical when sample size is small

The largest improvement in recall occurs with smaller sample sizes

Zhu et al., PLoS CompBio, 2007

Page 15

Bayesian network: integrating genetic data

[Figure: genetic loci L1, L2, …, Ln-1, Ln, Lj and genes G1, G2, …, Gn-1, Gn, Gj, connected by cis-regulation, trans-regulation, and transcriptional regulation]

Page 16

[Figure: recall and precision for weak and strong signals, each at 300 and 900 samples]
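Recall and precision in this simulation compare the edges of a reconstructed network with the edges of the known simulated network. Here is a minimal sketch that assumes edges are (parent, child) pairs and counts an edge as a true positive only when its direction is correct. The slide does not state how reversed edges were scored.

```python
def edge_precision_recall(true_edges, predicted_edges):
    """Precision and recall of directed edges against a reference network."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    tp = len(true_set & pred_set)  # a reversed edge does not count as a match
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    return precision, recall

truth = [("A", "B"), ("B", "C"), ("C", "D")]
inferred = [("A", "B"), ("C", "B"), ("C", "D"), ("D", "E")]
print(edge_precision_recall(truth, inferred))  # (0.5, 0.6666666666666666)
```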

Page 17

Bayesian network: why do samples matter?

Page 18

Bayesian network: integrating genetics

Experimental Hsd11b1 signature: mice treated with an Hsd1 inhibitor

Prediction of Hsd1 signatures based on BxD data:

• Correlation to Hsd1: 10% of the predicted signature overlaps with the experimental one

• BN without genetics: 20% of the predicted signature overlaps with the experimental one

• BN with genetics: 52% of the predicted signature overlaps with the experimental one

Zhu J et al, Cytogenet Genome Res. (2004)

Page 19

A framework for data integration

[Figure: probabilistic graphical models combining high-throughput data (microarray, proteomic, and metabolomic data; genomics; genetics) with knowledge (Medline, Biocarta/Biopathway, biologists) through a database and a GUI for hypothesis generation and testing]

Page 20

Bayesian network: PPI

Zhu J et al, Nature Genetics, 2008

Page 21

Bayesian network: PPI

Zhu J et al, Nature Genetics, 2008

[Figure: 3-cliques and 4-cliques in a protein-protein interaction network, and a clique community (partial clique)]
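Clique communities like these, where overlapping k-cliques are merged into a "partial clique," can be computed with the clique percolation method. Two k-cliques are adjacent if they share k - 1 nodes, and a community is the union of the cliques in one connected component. The sketch below enumerates k-cliques by brute force, so it is only practical for small graphs. It is an assumption that the talk used exactly this definition. The example graph is hypothetical.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques(adj, k):
    """All k-node subsets whose nodes are pairwise connected (brute force)."""
    nodes = sorted(adj)
    return [frozenset(c) for c in combinations(nodes, k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def clique_communities(adj, k=3):
    """Union k-cliques that share k - 1 nodes (clique percolation)."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]

# Triangles ABC and BCD share edge BC; triangle EFG hangs off C via a single edge.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("C", "E"), ("E", "F"), ("F", "G"), ("E", "G")]
adj = build_adj(edges)
print(sorted(sorted(c) for c in clique_communities(adj, k=3)))
# [['A', 'B', 'C', 'D'], ['E', 'F', 'G']]
```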

Page 22

Bayesian network: PPI

Zhu J et al, Nature Genetics, 2008

Page 23

Bayesian network: Transcription Factors

[Figure: a transcription factor (TF) and genes B, C, D, and E]

Is the TF functional?

Are genes B, C, D, and E correlated?

Page 24

Bayesian network: Transcription Factors

Introducing scale-free priors for TFs or protein complexes

$$p(T \rightarrow g) \propto w(T), \qquad w(T) = \sum_{g_i \in R} \log\left(\frac{r(T, g_i)}{r_{\text{cutoff}}}\right)$$

Zhu J et al, Nature Genetics, 2008

Page 26

Zhu J et al, Nature Genetics, 2008

Integration improves network quality

BN                                KO data   GO terms   TF data
w/o any priors                    125       55         26
w/ genetics priors                139       59         34
w/ genetics, TF, and PPI priors   152       66         52

Page 27

Zhu J et al, Nature Genetics, 2008

[Figure: LEU2 and GCN4; ILV6 and GCN4]

LEU2 KO gives rise to a small expression signature

• LEU2 KO signature enriched (p ~ 10^-18)

• GCN4 downregulated in the small LEU2 KO signature

ILV6 KO gives rise to a large expression signature

• ILV6 KO signature enriched (p ~ 10^-52)

• GCN4 upregulated in the large ILV6 KO signature

Prospective validation is the gold standard
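Enrichment p-values like the ones on this slide are often computed with a one-sided hypergeometric (Fisher) test. The test measures the overlap between a knockout signature and the genes the network predicts to lie downstream of the knocked-out gene. The slide does not say which test was used, so treat this as an assumed, illustrative choice. The gene counts in the usage line are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_signature, n_predicted, n_overlap):
    """P(overlap >= n_overlap) when n_predicted genes are drawn at random
    from n_total genes, n_signature of which belong to the KO signature."""
    upper = min(n_signature, n_predicted)
    num = sum(comb(n_signature, i) * comb(n_total - n_signature, n_predicted - i)
              for i in range(n_overlap, upper + 1))
    # Exact integer arithmetic; Python's int / int division is correctly rounded.
    return num / comb(n_total, n_predicted)

# Hypothetical counts: 6000 genes, a 300-gene KO signature,
# 200 network-predicted downstream genes, 60 of them in the signature.
print(hypergeom_enrichment_p(6000, 300, 200, 60))
```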


Page 28

How does LEU2 affect LEU3 activity?

[Figure: LEU2 and LEU3, with mRNA expression of LEU2 and of genes with LEU3 binding sites]

Surrogate marker for Leu3p activity
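A simple way to turn "genes with LEU3 binding sites" into a surrogate marker of Leu3p activity is to average the standardized expression of those genes in each sample. That score can then be related to LEU2 expression across samples. This construction is an assumption for illustration, since the slide does not specify the score. Apart from LEU2, the gene names and all data below are hypothetical.

```python
import numpy as np

def target_activity_score(expr, genes, targets):
    """Per-sample mean z-score of a TF's target genes.

    expr:    (n_genes, n_samples) expression matrix
    genes:   row labels of expr
    targets: genes carrying the TF's binding site
    """
    rows = [genes.index(g) for g in targets if g in genes]
    sub = expr[rows]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    return z.mean(axis=0)

# Hypothetical data: five LEU3-site targets (T1..T5) that track LEU2, plus one unrelated gene.
rng = np.random.default_rng(0)
n_samples = 100
leu2 = rng.normal(size=n_samples)
targets = [f"T{i}" for i in range(1, 6)]
genes = ["LEU2"] + targets + ["OTHER"]
expr = np.vstack([leu2]
                 + [leu2 + 0.5 * rng.normal(size=n_samples) for _ in targets]
                 + [rng.normal(size=n_samples)])
score = target_activity_score(expr, genes, targets)
r = np.corrcoef(score, expr[genes.index("LEU2")])[0, 1]
print(round(float(r), 2))
```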

Page 29

A framework for building causal networks

[Figure: probabilistic graphical models combining high-throughput data (microarray, proteomic, and metabolomic data; genomics; genetics) with knowledge (Medline, Biocarta/Biopathway, biologists) through a database and a GUI for hypothesis generation and testing]

Page 30

Zhu et al, PLoS Biology, 2012

[Figure: yeast segregants in synthetic complete medium, logarithmic growth; gene expression, metabolites, and segregant genotypes are combined with public databases (protein-protein interactions, transcription factor binding sites, protein-metabolite interactions) in a Bayesian network]

Page 31

Zhu et al, PLoS Biology, 2012

Metabolite abundance is under genetic control

Page 32

KEGG biochemical pathways

$$p(e, m) = e^{-d(e, m)}$$

Zhu et al, PLoS Biology, 2012

Page 33

LEU2 mRNA is causal to 2-isopropylmalate

KEGG pathway

Zhu et al, PLoS Biology, 2012

Page 34

[Figure: LEU2 and genes with a LEU3 binding site, with metabolomic data]

Page 35

LEU3 regulation

• The activity of Leu3p is positively regulated by alpha-isopropylmalate (IPM), the product of the first step in leucine biosynthesis

Sze JY, et al. (1992) In vitro transcriptional activation by a metabolic intermediate: activation by Leu3 depends on alpha-isopropylmalate. Science 258(5085):1143-5

• The degree of activation by Leu3p depends on Leu3p concentration, and LEU3 gene expression has been shown to be regulated by general amino acid control, which is mediated by the GCN4 transcription factor

Zhou K, et al. (1987) Structure of yeast regulatory gene LEU3 and evidence that LEU3 itself is under general amino acid control. Nucleic Acids Res 15(13):5261-73

Page 36

2-isopropylmalate: mechanism of causal regulator LEU2

LEU2 genotype → LEU2 activity → 2-isopropylmalate → LEU3 activity → transcriptional response for genes with LEU3 binding sites

Page 37

Zhu et al, PLoS Biology, 2012

Consistent with KEGG pathway

Page 38

What else can you learn from integrating metabolomic data?

[Table: metabolite QTLs, causal candidates, protein degradation, and KO metabolite signature size]

Page 39

Zhu et al, PLoS Biology, 2012

Page 40

Zhu et al, PLoS Biology, 2012

Is the transcriptional effect real?

Page 41

Zhu et al, PLoS Biology, 2012

PHM7 KO affects many metabolites

Page 42

Integration of CNV blocks into Bayesian networks

Network-based model selection

[Figure: comparison with random genes]

Tran et al. BMC Sys. Biol. 2011

Page 43

Acknowledgements

Sage Bionetworks

Stephen Friend et al.

Mount Sinai

Eric Schadt

Bin Zhang

Zhidong Tu

Decode

Valur Emilsson

U Washington

Roger Bumgarner

UCLA

Jake Lusis

Xia Yang, et al

Berkeley

Rachel Brem

Princeton

Leonid Kruglyak

Harvard

Jun Liu

Merck

Qiuwei Xu

Ethan Xu

Theretha Zhang

Fred Hutchinson

Paddison lab

MD Anderson

Hanash lab

U Wisconsin

Alan Attie

Mark Keller, et al

Mount Sinai

Powell lab

Oh Lab

Casaccia

Page 44

Acknowledgements

• Icahn Institute of Genomics and Multiscale Biology, Icahn Medical School at Mount Sinai

• Canary Foundation

• Prostate Cancer Foundation

• NIH

• NCI
