
InCoB 2007
Aug. 29, 2007

Motif-directed Network Component Analysis for Regulatory Network Inference

Chen Wang, Lily Chen, Yue Wang, (Jason) Jianhua Xuan*
Virginia Tech, USA

Po Zhao, Eric Hoffman
Children’s National Medical Center, USA

Robert Clarke
Georgetown University Medical Center, USA

Outline

• Background & Motivation
• Proposed Approach
  – Motif-directed network component analysis (mNCA)
  – Stability analysis
• Experimental Results
  – Muscle regeneration
• Conclusion & Discussion

Background & Motivation

• High-throughput biological data (e.g., microarray and proteomic data) provide a great opportunity to study genome systems.
  – Identify gene modules, interactions and pathways.

• Gene regulatory network modeling
  – Clustering or biclustering
  – Decomposition

• The whole gene population is regulated by a few key transcription factors (TFs).

• TFs and their interactions can form a skeleton of the regulatory networks.

Background

• However, decomposition methods that rely on microarray data alone often yield results that are difficult to interpret biologically.
  – Independent Component Analysis (ICA)
  – Non-negative Matrix Factorization (NMF)

• Network Component Analysis (NCA)
  – An integrative approach
  – Microarray gene expression data
  – Protein binding data (i.e., ChIP-on-chip data): network connections (topology)
    • Available in the yeast model system

Motivation

• Limitations of NCA:
  – ChIP-on-chip data are often not available for species like mouse and human;
  – When different data sources are integrated, consistency is often not guaranteed;
  – ChIP-on-chip data come from biological experiments and might contain false positives, leading to incorrect network inference.

• Proposed solution - motif-directed network component analysis (mNCA)
  – Motif information derived from DNA sequence provides the initial network topology.
  – Because motif information also contains false positives, stability analysis procedures are developed to combat the inconsistency between motif information and microarray data.

Motivation - Pathway Building

• Emery-Dreifuss Muscular Dystrophy (EDMD)

Bakay, M, et al., Brain (129), 2006

Network Component Analysis (NCA)

[Figure: NCA network diagram with the labels TF, Connection, and mRNA]

Mathematical Formulation of NCA

• A linear model:

$E_{N \times M} = A_{N \times L} \, T_{L \times M}$

CHRNG = a1 MYOD1 + a2 MYOG

A: the connection strengths
T: transcription factor activities (TFAs)

• Criterion to infer TFAs and regulation relationships according to both expression and topology:

$\min_{A,\,T} \; \| E_{N \times M} - A_{N \times L} \, T_{L \times M} \|^2 \quad \text{s.t.} \quad A \in Z_0$
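The criterion above can be sketched numerically. The slides do not say which solver was used, so the alternating least-squares scheme below is an assumption, as are the function name `nca_als` and the boolean `mask` standing in for the topology constraint $Z_0$ (True where a TF-gene connection is allowed).

```python
import numpy as np

def nca_als(E, mask, n_iter=100, seed=0):
    """Fit E ~= A @ T by alternating least squares, keeping A zero outside `mask`.

    E    : (N, M) expression matrix (genes x samples)
    mask : (N, L) boolean topology; True where a TF-gene connection is allowed
    Hypothetical helper for illustration, not the authors' implementation.
    """
    E = np.asarray(E, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(mask.shape) * mask  # random start that respects the topology
    for _ in range(n_iter):
        # T-step: with A fixed, solve min_T ||E - A T||^2
        T = np.linalg.lstsq(A, E, rcond=None)[0]
        # A-step: with T fixed, each gene is a small regression on its allowed TFs only
        for g in range(E.shape[0]):
            tfs = np.flatnonzero(mask[g])
            if tfs.size:
                A[g, tfs] = np.linalg.lstsq(T[tfs].T, E[g], rcond=None)[0]
    return A, T
```

Each step minimizes the same squared error over one block of variables, so the residual never increases. Note that A and T are only determined up to a rescaling of each TF (multiplying a column of A by a constant and dividing the matching row of T by it leaves A T unchanged).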

Illustration of NCA

[Figure: the decomposition E = A T for a gene, with panels labeled "Microarray data" (E), "Regulation strength" (A), and "Transcription Factor Activities (TFAs)" (T)]

mNCA - Motif Information

• Transcription Factors (TFs)
  – Proteins that bind to the promoter regions of genes
  – Activate or inhibit gene expression

• Motif (DNA sequence motif)
  – Common pattern in binding sites for a TF
  – Short sequences (5-25 bp)
  – Up to 1000 bp (or farther) from the gene
  – Inexactly repeating patterns

[Figure: binding sites for a TF in Gene 1 through Gene 5]

Motif Representation

• Consensus sequence
  MyoD (M00001): SRACAGGTGKYG

• Position Weight Matrices (PWMs)
  MyoD (M00001): [Figure: PWM]

• Sequence Logo
  – Graphical depiction of a profile
  – Conservation of elements in a motif
  MyoD (M00001): [Figure: sequence logo]
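To make the consensus notation concrete, the sketch below expands the IUPAC nucleotide codes in SRACAGGTGKYG (S = C/G, R = A/G, K = G/T, Y = C/T) into a regular expression and scans a sequence for exact consensus matches. This is a deliberate simplification: it is not the PWM-based scoring used in the talk, it scans the forward strand only, and the function name `find_sites` and the test sequence are made up for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where `sequence` matches the IUPAC consensus."""
    core = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead reports overlapping matches as well
    return [m.start() for m in re.finditer(f"(?={core})", sequence.upper())]

# Illustrative sequence, not taken from the data set
print(find_sites("SRACAGGTGKYG", "TTTGAACAGGTGGCGAAA"))  # -> [3]
```

A PWM generalizes this by assigning a weight to every base at every position, so near-matches receive a graded score instead of being rejected outright.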

Motif Identification

• Input:
  – Promoter region of a gene g (2000 bp upstream)
  – Muscle-specific binding site s

• Match™ search algorithm
  – Minimize false positives
  [Kel, A.E., et al., Nucleic Acids Res, 2003. 31(13): p. 3576-9.]

• Output:
  – Initial connection strength (motif score) $A^0_{gs}$
  – $A^0_{gs}$: average of the matrix similarity and core similarity scores

Stability Analysis for mNCA

• The information sources:
  – mRNA microarray data (specific but noisy)
  – Motif information (general, with false positives)

• The questions we want to answer:
  – Which TFs play a relevant role in the experiment?
  – Which genes are regulated by a particular TF (its downstream targets)?

• Stability analysis: when small perturbations are applied,
  – A bad TFA estimate tends to be altered easily, or even destroyed;
  – A good TFA estimate tends to keep its activity pattern throughout the perturbation.

Testing Stability by Perturbations

• Method 1: Thresholding the motif score
  – A TF-gene connection is deleted if its motif score is below a cut-off threshold. Setting different cut-off thresholds changes the number of connections and hence the network topology.

• Method 2: Deleting/inserting connections
  – A small percentage (e.g., 10%) of TF-gene connections is altered randomly, either by deleting existing connections or by inserting new ones.
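The two perturbation methods can be sketched as operations on a gene-by-TF matrix. The names `threshold_topology` and `randomly_alter_topology` are hypothetical, and the slides do not specify how alterations are split between deletions and insertions; the even random split per TF used here is an assumption.

```python
import numpy as np

def threshold_topology(scores, cutoff):
    """Method 1: keep a TF-gene connection only if its motif score reaches the cutoff."""
    return np.asarray(scores) >= cutoff

def randomly_alter_topology(mask, fraction=0.10, rng=None):
    """Method 2: for each TF (column), alter a number of entries equal to
    `fraction` of its existing connections, by deletion or insertion.

    Assumption: each alteration is a deletion or an insertion with equal probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(mask, dtype=bool)
    new_mask = mask.copy()
    for tf in range(mask.shape[1]):
        present = np.flatnonzero(mask[:, tf])
        absent = np.flatnonzero(~mask[:, tf])
        n_alter = int(round(fraction * present.size))
        n_delete = min(int(rng.binomial(n_alter, 0.5)), present.size)
        n_insert = min(n_alter - n_delete, absent.size)
        if n_delete:
            new_mask[rng.choice(present, n_delete, replace=False), tf] = False
        if n_insert:
            new_mask[rng.choice(absent, n_insert, replace=False), tf] = True
    return new_mask
```

Each perturbed topology would then be passed to the NCA fit again, giving one TFA estimate per perturbation to compare against the others.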


Understanding of Stability Analysis

• Obtain the confidence measure of an estimate by comparing it before and after perturbation:

[Figure: a TFA estimate, its counterpart after perturbation, and their comparison]
  – e.g., absolute correlation coefficient 0.92: highly confident
  – e.g., absolute correlation coefficient 0.52: less confident

Stability Measurement

• Stability measurements from perturbations:

$\{\, |\,\mathrm{correlation}[\mathrm{TFA}_j(i), \mathrm{TFA}_j(k)]\,| \,\}_{i \neq k}$: stability measurements of the j-th TFA, where $\mathrm{TFA}_j(i)$ is the estimate obtained under the i-th perturbation

[Figure: boxplot of the stability measurements, marking the 25% quantile, the median, and the 75% quantile]
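The stability measurement can be computed as below. This is a minimal sketch that assumes the TFA estimates from all perturbation runs are stacked in one array of shape (runs, TFs, time points); the names `stability_measurements` and `boxplot_summary` are hypothetical.

```python
from itertools import combinations

import numpy as np

def stability_measurements(tfa_runs, j):
    """Absolute correlation between every pair of perturbation runs for the j-th TFA.

    tfa_runs : array of shape (n_runs, n_tfs, n_timepoints)
    """
    profiles = np.asarray(tfa_runs, dtype=float)[:, j, :]
    return np.array([
        abs(np.corrcoef(profiles[i], profiles[k])[0, 1])
        for i, k in combinations(range(len(profiles)), 2)
    ])

def boxplot_summary(values):
    """The three quantiles drawn in the boxplot."""
    q25, median, q75 = np.percentile(values, [25, 50, 75])
    return {"q25": q25, "median": median, "q75": q75}
```

Taking the absolute value makes the measure insensitive to sign flips and rescaling of a TFA profile between runs, so only the shape of the activity pattern is compared.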

Experimental Results

• Dataset description: Staged skeletal muscle degeneration/regeneration was induced by injection of cardiotoxin (CTX). Over a time range of up to 40 days, 27 time points were sampled, each with duplicate mice.

The time-course microarray data set was acquired with Affymetrix’s Murine Genome U74v2 Set in an expression profiling study at Children’s National Medical Center (CNMC). We obtained expression measurements of 7570 probesets in each sample.

[Figure: sampling timeline from day 0.5 to day 40 (days 1, 2, 3, 4, 5, …, 10, 11, 12, 13, 14, 16, 20, 30)]

Muscle Related TFs

• 24 muscle-related TF binding sites from TRANSFAC:

YY1, Tal-1alpha:E47, NF-Y, alpha-CP1, Sp1, Hand1:E47, MEF-2, USF, USF2, Tal-1beta:E47, Ebox, myogenin, E2A, NKX25, Nkx2-5, TATA, TBX5, MyoD, SRF, TBP, GATA-4, GATA, E47, E12

Muscle Related TFs

• Some muscle-related TF binding sites from TRANSFAC:

[Figure: examples of muscle-related TF binding sites]

Stability Analysis (Method I)

• Thresholding the motif score:
  – The motif-score threshold was varied so that the number of connections changed gradually between 12,000 and 18,000, which results in more than 30% topology alterations.

[Figure: stability measurement (y-axis, 0 to 1) against transcription factor index (x-axis, 1 to 24). TFs in index order: Tal-1alpha:E47, YY1, NF-Y, MyoD, TBX5, TATA, Nkx2-5, NKX25, E2A, myogenin, Ebox, Tal-1beta:E47, USF2, USF, MEF-2, Hand1:E47, Sp1, alpha-CP1, SRF, TBP, GATA-4, GATA, E47, E12. Annotated: YY1, myogenin, MyoD.]
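As a rough check on the "more than 30%" figure, the change in connection count can be expressed relative to either end of the range. The slides do not state which base was used, and counting only the net change in the number of connections is itself an assumption about how the alteration was measured.

```latex
\[
\frac{18{,}000 - 12{,}000}{18{,}000} \approx 33\%,
\qquad
\frac{18{,}000 - 12{,}000}{12{,}000} = 50\%
\]
```

Under either reading the alteration exceeds 30%.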

Stability Analysis (Method II)

• Deleting or inserting connections:
  – For each transcription factor, 10% of connections were altered randomly regardless of the motif score, by deleting existing connections or inserting new connections, to test the stability of TFA estimates.

[Figure: stability measurement (y-axis, 0 to 1) against transcription factor index (x-axis, 1 to 24). TFs in index order: Tal-1alpha:E47, YY1, NF-Y, MyoD, TBX5, TATA, Nkx2-5, NKX25, E2A, myogenin, Ebox, Tal-1beta:E47, USF2, USF, MEF-2, Hand1:E47, Sp1, alpha-CP1, SRF, TBP, GATA-4, GATA, E47, E12. Annotated: YY1, myogenin, MyoD.]

Stable TFA Estimates

• The most stable TFA - YY1:
  – Observed expression shows almost no change;
  – Estimated TFA is related to muscle regeneration.

[Figure: YY1’s gene expression (probe id: 98767_at); log expression ratio vs. time (days)]
[Figure: estimated YY1’s TFA; log TFA ratio vs. time (days)]

YY1’s TFA Estimate

• The difference between YY1’s mRNA level and protein level is supported by biological experiments.
  Walowitz, JL, et al., “Proteolytic Regulation of the Zinc Finger Transcription Factor YY1, a Repressor of Muscle-restricted Gene Expression,” J Biol Chem, Vol. 273, Issue 12, 6656-6661, March 20, 1998.

[Figure: YY1 expression level and YY1 protein level]

YY1 – A Repressor in Muscle Regeneration

• Underlying regulation mechanism:

[Figure: pathway diagram with Calpain II, YY1, and YY1 targets]
[Figure: Calpain II’s gene expression (probe id: 101040_at); log expression ratio vs. time (days)]
[Figure: estimated YY1’s TFA; log TFA ratio vs. time (days)]

Stable TFA estimates

• Some other stable TFAs - myogenin & MyoD

[Figure: for MyoD (probe id: 102986_at) and myogenin (probe id: 103053_at), expression (log expression ratio vs. time in days) and estimated TFA (log TFA ratio vs. time in days)]

Identifying TF’s Downstream Targets

• Stability analysis:
  – Similarly, we can test the stability of the regulation strength A under small perturbations, and thereby rank the most likely targets of a specific TF.

• Ranking downstream targets by frequency count (confidence measure):
  – Perform multiple independent perturbations, each deleting connections with some probability.
  – Count how many times a TF-gene connection falls in the top rank group (defined by a preset threshold), based on its regulation strength A.
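The frequency-count ranking can be sketched as follows. The function names are hypothetical, and ranking by the magnitude of the regulation strength (so that strong repression counts as much as strong activation) is an assumption; the slides say only that the ranking is based on A.

```python
import numpy as np

def delete_connections(mask, p, rng):
    """Delete each existing TF-gene connection independently with probability p."""
    mask = np.asarray(mask, dtype=bool)
    return mask & (rng.random(mask.shape) >= p)

def target_frequency(strength_runs, top_k=100):
    """Count, per gene, how many runs place it among the top_k targets of one TF.

    strength_runs : array (n_runs, n_genes), one TF's column of A from each perturbation run
    Assumption: genes are ranked by absolute regulation strength.
    """
    strengths = np.abs(np.asarray(strength_runs, dtype=float))
    counts = np.zeros(strengths.shape[1], dtype=int)
    for run in strengths:
        top = np.argsort(-run)[:top_k]
        top = top[run[top] > 0]  # a deleted connection (strength 0) is never counted
        counts[top] += 1
    return counts
```

Sorting the genes by `counts` in descending order gives the ranked list of likely targets; a gene that stays in the top group across most perturbations is a more confident target than one that appears only occasionally.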

Stability Analysis of MyoD’s Targets

• MyoD’s downstream targets ranking:
  – 1000 independent perturbations are carried out.
  – Each connection is deleted with a probability (e.g., 0.3).
  – The top ranking threshold is set to 100 in this case: if a gene’s regulation strength by MyoD is in the top 100 in a perturbation, the gene is counted once.

[Figure: frequency count (y-axis, 0 to 700) against sorted downstream targets' index (x-axis, 0 to 500)]

MyoD’s Downstream Targets

• MyoD’s downstream genes from Ingenuity Pathway Analysis:

Top 100 genes: 16 genes directly related to MyoD, and several key muscle regeneration TFs: MYC, MYOG, and MEF2C

YY1’s Downstream Targets

• YY1’s downstream genes from Ingenuity Pathway Analysis:

Conclusions

• A new computational approach, motif-directed network component analysis (mNCA), has been developed to integrate motif information and microarray data for regulatory network inference.
  – Motif information has been utilized to derive the initial topology information for mNCA.
  – With the awareness of many false positives in motif information, stability analysis procedures have been developed to extract stable TFAs and TFs’ downstream targets.

• The experimental results have demonstrated that mNCA can help reveal key regulators in muscle regeneration.

Future Work – New Hypothesis & Validation

• Integrative approaches to pathway building

[Figure: pathway diagram with Calpain II, YY1, MyoD, c-Myc, PAX2, DYS, …, and myogenin, and the gene groups MYL4, TNNC1, MYBPH, …, DES and CYBB, MCM5, …, RRM1. Legend: interaction from database and knowledge; interaction derived from computational methods.]

Acknowledgement

• NIH Grants:
  – NS2925-13A, CA 096483 & CA109872

• DoD/CDMRP Grant:
  – BC030280


Thank you very much!
