xianghong jasmine zhou molecular and computational biology university of southern california

67
Functions, Networks, and Phenotypes by Integrative Genomics Analysis Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California IMS, NUS, Singapore, July 18, 2007

Upload: derora

Post on 09-Feb-2016

21 views

Category:

Documents


3 download

DESCRIPTION

Functions, Networks, and Phenotypes by Integrative Genomics Analysis. Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California. IMS, NUS, Singapore, July 18, 2007. 2001 Human 30~100K genes. 1995 Bacteria - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Functions, Networks, and Phenotypes by Integrative

Genomics Analysis

Xianghong Jasmine ZhouMolecular and Computational Biology

University of Southern California

IMS, NUS, Singapore, July 18, 2007

Page 2: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

DNA

RNA

Protein

Gene Expression Datasets: the Transcriptome•Oligonucleotide Array

•cDNA Array

Protein abundance measurement (Mass Spec)

Protein interactions (yeast 2-hybrid system, protein arrays)

Protein complexes (Mass Spec)

Information FlowCellular systems

1995 Bacteria~1.6K genes

1997Eukaryote~6K genes

1998Animal~20K genes

2001Human30~100K genes

Objective:$1000 human genome

Page 3: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Rapid accumulation of microarray data in public repositories

• NCBI Gene Expression Omnibus

• EBI Array Express

137231 experiments

55228 experiments

The public microarray data increases by 3 folds per year

Page 4: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Multiple Microarray Technology Platforms

MicroarrayPlatforms

Page 5: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

--------------------

--------------------

--------------------

--------------------

a

b

d

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

Recurrent Patterns

a

b

c

d

e

f

g

h

i

j

k

a

b

d

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

Coexpression Networks

a

b

c

d

e

f

g

h

i

j

k

Datasets

Graph-based Approach for the Integrative Microarray Analysis

gene

experiments

e

g

h

i

Annotation

a

b

d

f

Functional Annotation

Gene Ontology

e

g

h

i

Transcriptional Annotation

TF

Page 6: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Frequent Subgraph Mining Problem is hard!

Problem formulation: Given n graphs, identify subgraphs which occur in at least m graphs (m n)

Our graphs are massive! (>10,000 nodes and >1 million edges)

The traditional pattern growth approach (expand frequent subgraph of k edges to k+1 edges) would not work, since the time and memory requirements increase exponentially with increasing size of patterns and increasing number of

networks.

Page 7: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Novel Algorithms to identify diverse frequent network patterns

• CoDense (Hu et al. ISMB 2005)

– identify frequent coherent dense subgraphs across many massive graphs

• Network Biclustering (Huang et al, ISMB 2007)

– identify frequent subgraphs across many massive graphs

• Network Modules (NeMo) (Yan et al. ISMB 2007)

– identify frequent dense vertex sets across many massive graphs

Page 8: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

CODENSE: identify frequent coherent dense subgraphs across

massive graphs

Hu et al, ISMB 2005

Page 9: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Identify frequent co-expression clusters across multiple microarray data sets

c1 c2… cm

g1 .1 .2… .2g2 .4 .3… .4…

c1 c2… cm

g1 .8 .6… .2g2 .2 .3… .4…

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

...a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

ef

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

...a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

...

Page 10: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

The common pattern growth approach

Find a frequent subgraph of k edges, and expand it to k+1 edge to check occurrence frequency

– Koyuturk M., Grama A. & Szpankowski W. An efficient algorithm for detecting frequent subgraphs in biological networks. ISMB 2004

– Yan, Zhou, and Han. Mining Closed Relational Graphs with Connectivity Constraints. ICDE 2005

Page 11: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

The time and memory requirements increase exponentially with increasing size of patterns and increasing number of networks. The number of frequent dense subgraphs is explosive when there are very large frequent dense subgraphs, e.g., subgraphs with hundreds of edges.

Problem of the Pattern-growth approach

Page 12: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Problem of the Pattern-growth approach

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

Pattern Expansionk k+1

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

Page 13: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Our solutionWe develop a novel algorithm, called CODENSE, to mine frequent coherent dense subgraphs. The target subgraphs have three characteristics:(1) All edges occur in >= k graphs (frequency)(2) All edges should exhibit correlated occurrences in

the given graph set. (coherency)(3) The subgraph is dense, where density d is higher

than a threshold and d=2m/(n(n-1)) (density)m: #edges, n: #nodes

Page 14: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

CODENSE: Mine coherent dense subgraph

a

b

d

e

g

h

i

c

f

summary graph Ĝ

fa

b

d

e

g

h

i

c

G1

f

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

g

h

i

G3G2

G6G5G4

(1) (1) Builds a summary graph by eliminating infrequent edgesBuilds a summary graph by eliminating infrequent edges

Page 15: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

(2) Identify dense subgraphs of the summary graph(2) Identify dense subgraphs of the summary graph

a

bd

e

g

h

i

cf

summary graph Ĝ

e

g

h

i

cf

Sub(Ĝ)

Step 2

MODES

Observation: If a frequent subgraph is dense, it must be a dense subgraph in the summary graph. However, the reverse conclusion is not true.

CODENSE: Mine coherent dense subgraph

Page 16: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

(3) (3) Construct the edge occurrence profiles for each Construct the edge occurrence profiles for each dense summary subgraphdense summary subgraph

e

g

h

i

cf

Sub(Ĝ)

Step 3

…………………

111000e-f

011100c-i

111000c-h

111010c-f

101100c-e

G6G5G4G3G2G1E

edge occurrence profiles

CODENSE: Mine coherent dense subgraph

Page 17: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

…………………

111000e-f

011100c-i

111000c-h

111010c-f

111100c-e

G6G5G4G3G2G1E

edge occurrence profiles

Step 4

c-f

c-h

c-e

e-h

e-f

f-h

c-i

e-i

e-g g-i

h-i

second-order graph S

g-hf-i

(4) (4) builds a second-order graph for each dense summary builds a second-order graph for each dense summary subgraphsubgraph

CODENSE: Mine coherent dense subgraph

Page 18: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

c-f

c-h

c-e

e-h

e-f

f-h

c-i

e-i

e-g g-i

h-i

second-order graph S

g-hf-i

Step 4

c-f

c-h

c-e

e-h

e-f

f-h

e-i

e-g g-i

h-i

Sub(S)

g-h

(5) Identify dense subgraphs of the second-order graph(5) Identify dense subgraphs of the second-order graph

Observation: if a subgraph is coherent (its edges show high correlation in their occurrences across a graph set), then its 2nd-order graph must be dense.

CODENSE: Mine coherent dense subgraph

Page 19: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

(6) Identify the coherent dense subgraphs(6) Identify the coherent dense subgraphs

c-f

c-h

c-e

e-h

e-f

f-h

e-i

e-g g-i

h-i

Sub(S)

g-h

Step 5

c

e

fh

e

g

h

i

Sub(G)

CODENSE: Mine coherent dense subgraph

Page 20: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Our solutionThe identified subgraphs by definition satisfy the three criteria:(1) All edges occur in >= k graphs (frequency)(2) All edges should exhibit correlated occurrences in

the given graph set. (coherency)(3) The subgraph is dense, where density d is higher

than a threshold and d=2m/(n(n-1)) (density)m: #edges, n: #nodes

Page 21: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

…………………

111000e-f

011100c-i

111000c-h

111010c-f

111100c-e

G6G5G4G3G2G1E

edge occurrence profiles

c

e

fh

e

g

h

i Step 4Step 5

Sub(G)

a

bd

e

g

h

i

cf

a

bc

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

c

d

e

f

g

h

i

a

b

de

f

g

h

i

c a

b

c

de

f

g

h

i

a

b

c

de

f

g

h

i

G1 G3G2

G6G5G4

c-f

c-h

c-e

e-h

e-f

f-h

c-i

e-i

e-g g-i

h-i

second-order graph S

g-hf-i

Step 1

Step 3

summary graph Ĝ

e

g

h

i

cf

Sub(Ĝ)

Step 2

c-f

c-h

c-e

e-h

e-f

f-h

e-i

e-g g-i

h-i

Sub(S)

g-h

Step 6

MODESAdd/Cut

MODESRestore G and MODES

CODENSE: Mine coherent dense subgraph

Page 22: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

CODENSEThe design of CODENSE can solve the scalability issue. Instead of mining each biological network individually, CODENSE compresses the networks into two meta-graphs (the summary graph and the second-order graph) and performs clustering in these two graphs only. Thus, CODENSE can handle any large number of networks.

Page 23: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

V

g

j

h

i

g

f

e

a

b

c

d

h

i

j

g

f

e

a

b

c

d

h

i

j

V

h

i

f

e

a

b

c

d

h

i

f

e

h

i

Step 1 Step 2

Step 3Step 4

G Sub(G)

Sub(G’)

G’

HCS’ condense

HCS’

restoreHCS’

MODES: Mine overlapped dense subgraph

Hu et al. ISMB 2005

Page 24: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

c1 c2… cm

g1 .1 .2… .2g2 .4 .3… .4…

c1 c2… cm

g1 .8 .6… .2g2 .2 .3… .4…

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

ef

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

bd

e

f

g

h

i

j

k

c

Applying CoDense to 39 yeast microarray data sets

Page 25: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Functional annotation

Annotation

Page 26: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Functional Annotation (Validation)

Method: leave-one-out approach - masking a known gene to be unknown, and assign its function based on the other genes in the subgraph pattern.

Functional categories: 166 functional categories at GO level at least 6

Results: 448 predictions with accuracy of 50%

Page 27: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Functional Annotation (Prediction)

We made functional predictions for 169 genes, covering a wide range of functional categories, e.g. amino acid biosynthesis, ATP biosynthesis, ribosome biogenesis, vitamin biosynthesis, etc. A significant number of our predictions can be supported by literature.

Page 28: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

However…• How about frequent non-dense graphs?

– Many biological modules may form paths

• How about subgraphs which are coherent across only a subset of the graphs?– Not all modules are activated across all

conditions, and genes may form modules with diff. other genes under diff. conditions

Page 29: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Network Biclustering:Identify frequent subgraphs across

massive graphs

Huang et al, ISMB 2007

Page 30: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Using 65 human co-expression network as an illustration example

• 65 co-expression networks generated from 65 microarray data sets

• each graph contains 8297 genes, and 1%-10% edges of a complete graph

Page 31: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Basically, it is a biclustering problem

edges

graphs

Page 32: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Network Biclustering

• Objective function

cmncf

'

c’: number of 1 in the bicluster c: number of 1 in the whole matrix mn: size of the bicluster: regularization factor

However, the matrix is very large with millions of edges …We will first identify robust seed to narrow down the search space

1 0 1 0 0 … 1

0 1 0 0 1 … 0

0 1 1 1 1 … 0

0 1 0 1 1 … 1

0 1 1 0 1 … 0

1 1 1 1 1 … 0

0 0 1 0 0 … 1

graphs

edges

Page 33: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Identify Bicluster seed

The property of relation graphs: edge labels are unique.

Hence, each graph can be treated as a collection of items

Thus, Frequent subgraph Mining can be modeled as frequent item set mining

Problem: current frequent item set mining algorithms can only efficiently mine across many small item sets In our problem, we have 65 very large item set…

We use a trick….

Page 34: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

…………………

101011e5

011100e4

111111e3

101011e2

101011e1

G6G5G4G3G2G1E

…………………

111000…

011100...

111000…

111010…

111100...

G65G64G63G62G61G60...

Edge occurrence profiles:

common edges common edges common edges

{e1, e10, e56, e100, e1000,…} {e4, e12, e33, e56, e890,…} …. {e99, e220, e1545, e2629,…}

{G1, G3, G5, G6, G7,…} {G2, G3, G5, G7, G8,…} …. {G8, G9, G15, G26, G29}

Graph set with more than 5 membersand with > 1000 common edgesFrequent pattern tree

Identify Bicluster seed

Very time consuming! It takes more than 2 weeks on 40 Pentium IV nodesHuang et al. ISMB 2007

Page 35: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Expanding the Biclusters

…………………

101001e5

111111e4

111111e3

011111e2

111011e1

G6G5G4G3G2G1E

…………………

1110001

011100.0

1110000

1110101

1111000

G65G64G63G62G61…G7

Simulated Annealing

…………………

101001e5

111111e4

111111e3

011111e2

111011e1

G6G5G4G3G2G1E

…………………

1110001

011100.0

1110000

1110101

1111000

G65G64G63G62G61…G7

Identify connected components

Page 36: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Systematic identification of functional modules in human genome

• We identified 143,400 network modules with recurrence >= 5. They vary in size from 4 to 180.

• 77.0% of the patterns are functionally homogenous (GO hyper-geometric P-value less than 0.01)

• The figure shows that the functional homogeneity of modules increase with their recurrences.

Page 37: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Results (I): Examples of highly recurring modules

Recurrence 20Defense against oxygen and nitrogen species

Recurrence 18Involved in spermatogenesis

Page 38: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Loosely connected network patterns with high recurrence can

represent functional modules

Page 39: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Functional annotation

Annotation

We made functional predictions for 779 known and 116 unknown genes by random forest classification with 71% accuracy.

Variables for random forest classification: functional enrichment P-value network topology scorenetwork connectivity pattern recurrence numbersaverage node degree unknown gene ratioNetwork size

Page 40: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Network Modules (NeMo)Identify frequent dense vertex sets

across many massive graphsm ic ro array

...

c o exp res s io ngrap h

...

(neighb o r as s o c iatio n)s um m ary grap h

c lus tering refinem ent

s tep 1 s tep 2 s tep 3 s tep 4

...

p artitio ning

Yan et al. ISMB 2007

Page 41: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

105 human microarray data sets

NeMo

6477 recurrent coexpression clusters(density > 0.7 and support > 10)

Validation based on ChIp-chip data (9176 target genes for 20 TFs)

Validation based on human-mouse Conserved Transfac prediction (7720 target genes for 407 TFs)

15.4% homogenous clusters(vs. 0.2% by randomization test)

12.5% homogenous clusters(vs. 3.3% by randomization test)

Page 42: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Percentage of potential transcription modules validated by ChIP-Chip data increase with cluster density and recurrence

Page 43: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Microarray Data

Expression data

Phenotype Concepts (e.g. diseases, perturbations, tissues )in Unified Medical Language System (UMLS)

Phenotype information

Page 44: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Classifying microarray data based on phenotype

Adenocarcinoma Arthritis Asthma Glaucoma … HIV

For example, the current NCBI GEO database contains >60 cancer datasets, among which 11 leukemia datasets.

Page 45: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Identify phenotype-specific functional or transcriptional modules

• Unsupervised approach

Frequent pattern mining

Module 1, Module 2, Module 3, … Module kModule 1

Cancer Cancer Cancer Cancer

Page 46: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

105 microarray data sets

NeMo

6477 recurrent coexpression clusters(density > 0.7 and support > 10)

A total of 459 of the clusters are statistically significant with a hypergeometric P-value <0.01 in a specific phenotype:

e.g. malignant neoplasms, cardiovascular diseases or nervous system disorders.

Page 47: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

An example

5 out of the 9 support datasets are leukemia datasets (P-value 0.0039). It is potentially regulated by E2F4, and majority genes are involved in cell cycle and DNA repair.

Page 48: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Identify phenotype-specific functional or transcriptional modules

• Supervised approachAdenocarcinoma Arthritis Asthma Glaucoma … HIV Non-Adenocarcinoma

Functional and transcriptional modules which are active ONLY in Adenocarcinoma

Related data sets

Page 49: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

A case study: Identify Network Modules Characterizing Cancer

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

ef

g

h

i

j

k

a

b

d

e

f

g

h

i

j

k

c

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

c1 c2… cm

g1 .9 .4… .1g2 .7 .3… .5…

32 Cancer Datasets

25 Non-cancer Datasets

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k

a

b

c

d

e

f

g

h

i

j

k…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

c1 c2… cm

g1 .2 .5… .8g2 .7 .1… .3…

Page 50: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Examples of identified modules

Cell cycle Moduleacross all cancer datasets

Cell adhesionacross all solid tumor datasets

PDGF-signalingin breast cancer datasets

Page 51: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Reconstruct transcriptional cascades by second-order analysis

Zhou et al. Nature Biotech 2005

Page 52: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Frequently occurring tight clusters

Page 53: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Frequently occurring tight clustersTranscription Factors

Page 54: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 1

Page 55: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 2

Page 56: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 3

Page 57: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 4

Page 58: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Co-occurrence of tight clusters

Coexpression network constructed with the dataset 5

Page 59: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Coexpression Networks

Transcription Factors Set 1

Transcription Factor Set 2

Cooperativity

Page 60: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Relevance Networks

Page 61: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Three types of transcription cascades

TF1

TF2

TF3

Type I

Type II

gene 2

gene 1

gene 3

gene 4

gene 5

TF1

TF2 gene 2

gene 1

gene 3gene 4

gene 5

Type III

TF1

TF2

gene 2

gene 1

gene 3

gene 4

gene 5

Transcription Regulation

Protein Interaction

Page 62: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Applying to 39 yeast microarray data sets

• We identified 60 transcription modules. Among them, we found 34 pairs that showed high 2nd-order correlation. A significant portion (29%, p-value<10-5 by Monte Carlo simulation ) of those modules pairs are participants in transcription cascades: 2 pairs in Type I, 8 pairs in Type II, and 3 pairs in type III cascades. In fact, these transcription cascades inter-connect into a partial cellular regulatory network.

Page 63: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

CDC45, RAD27, RNR1, SPT21, YPL267W

YBL113C YRF1-1 YRF1-7

SWI5

YEL077C YRF1-3 YRF1-7

ALK1 BUD4 CDC20 CDC5 CLB2 HST3 SWI5 YIL158W YJL051W YOR315W

MSN4

NDD1

YBL113C YIL177C YJL225C YLR464W YML133C YRF1-1 YRF1-3 YRF1-4 YRF1-5 YRF1-6 YRF1-7

BUD19 RPL13A RPL13B RPL16A RPL24B RPL28 RPL2B RPL33B RPL39 RPL42B RPS16B RPS18A RPS21B RPS22A RPS23B RPS4B RPS6A

MET4

GCN4

MET10 MET14 MET28 SUL2

ALD5 ARG1 ARG2 ARG3 ARG4 ARG5,6 ARO1 ARO3 ARO4 ASN1 BNA1 CPA2 DED81 ECM40 HIS1 HIS4 HOM3 LEU4 LEU9 LYS20 MET22 ORT1 PCL5 TEA1 TRP2 YBR043C YDR341C YHM1 YHR162W YJL200C YJR111C

Leu3

BAT1 ILV2 LEU9

SWI4

MBP1

CLB6 RNR1 SPT21 YPL267W

CDC21 CDC45 CLB5 CLB6 CLN1 GIN4 IRR1 MCD1 MSH6 PDS5 RAD27 RNR1 SPO16 SPT21 SWE1 TOF1 YBR070C

HGH1 LOC1 NOC3 YDR152W YHL013C

YBL113C YRF1-1 YRF1-3 YRF1-7

SWI6

RCS1

GAT3

YAP5

PDR1

GAL4

RGM1

GAS1HTA1HTB1

Regulation of transcription modules by transcription factors, based on ChIP-chip data and supported by recurrent expression clusters

Regulation between transcription factors, based on ChIP-chip data and supported by 2nd-order expression correlation

Protein interactions between two transcription factors, based on experimental data and supported by 2nd-order expression correlation

Two transcription modules with high 2nd-order expression correlations

Page 64: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Integrative Array Analyzer (iArray): a software package for cross-platform and cross-species microarray analysis

Bioinformatics. 22(13):1665-7

Page 65: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Gene Aging Nexus (GAN): An Integrated Genomics Data Mining Platform for Aging

Nucleic Acids Res. 2006 Nov

Page 66: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Acknowledgement

Second-order TF cascade• Ming-Chih Kao (U. Michigan)• Haiyan Huang (UC Berkeley)• Wing H. Wong (Stanford)

CoDense• Haiyan Hu

NeMo• Xifeng Yan (IBM)• Mike Mehan

NetBiclustering• Haifeng Li• Yu Huang

Cancer Network Module• Min Xu

Funding agencies: NIGMS, NCI, NSF, Seaver Foundation, Sloan Foundation, Zumberg Foundation

Consultation• Jiawei Han (UIUC)• Michael Waterman

Page 67: Xianghong Jasmine Zhou Molecular and Computational Biology University of Southern California

Thank you!