eccb

Upload: phuongdao1

Post on 23-Jun-2015

TRANSCRIPT

Page 1: Eccb

Guideline Introduction Methods Experimental Results

Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering

Presenters: Phuong Dao¹, Alexander Schönhuth²

¹School of Computing Science, Simon Fraser University
²Algorithmic Computational Biology Group, CWI, Netherlands

Page 2: Eccb

Guideline

• Introduction: Personalized Medicine, Biomarker Discovery

• Methods: Motivations, Our Approach

• Experimental Results: Data, Classifier Performance, Markers

Page 3: Eccb

Introduction: Personalized Medicine

• Exact determination of disease status based on patient genetics/genomics

• Goal: Specific, individual choice of treatment

• Necessary: Reliable disease markers

Page 5: Eccb

Biomarker Discovery

• Single gene markers: each gene is ranked according to its ability to distinguish samples of different classes

• Multigenic markers: each subset S of genes is ranked according to the ability of all genes in S, taken together, to distinguish samples of different classes
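Here is a minimal Python sketch of single-gene ranking. It scores each gene by the absolute Welch t-statistic between case and control samples and sorts the genes best first. The slides do not say which per-gene score is used, so the t-statistic is an assumption. The function names `welch_t` and `rank_single_genes` and the toy data are also hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch's t-statistic comparing two lists of expression values."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    if se == 0:
        # No spread within either group: perfectly separated if the means differ.
        diff = mean(case) - mean(control)
        return 0.0 if diff == 0 else copysign(inf, diff)
    return (mean(case) - mean(control)) / se


def rank_single_genes(expr, case_idx, control_idx):
    """Rank genes by |t|, most discriminative first.

    expr maps a gene name to its expression values, one per sample;
    case_idx and control_idx list the sample positions of each class.
    """
    scores = {}
    for gene, values in expr.items():
        case = [values[i] for i in case_idx]
        control = [values[i] for i in control_idx]
        scores[gene] = abs(welch_t(case, control))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three cases (positions 0-2) and three controls (3-5).
expr = {
    "Gene 1": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],
    "Gene 2": [2.0, 2.2, 1.9, 2.1, 1.8, 2.0],
}
print(rank_single_genes(expr, [0, 1, 2], [3, 4, 5]))  # ['Gene 1', 'Gene 2']
```

A multigenic method would instead score a whole gene set. One option is to apply the same statistic to an aggregate profile of the set.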

Page 6: Eccb

Single Gene Markers

[Figure: two expression heatmaps of Genes 1 to 6 across Cases 1 to 3 and Controls 1 to 3, with the genes grouped as "Differentially Expressed" and "Non-Differentially Expressed".]

Page 7: Eccb

Multigenic Markers: Subnetwork Markers

[Chuang et al., Mol.Sys.Biol. (2007)]:

• Predicting progression of breast cancer

• Subnetwork markers are connected subnetworks whose aggregate expression profiles correlate most strongly with the sample labels

• Greedy heuristics search for optimal subnetwork markers

Page 8: Eccb

Multigenic Markers: Subnetwork Markers

[Chowdhury et al., PSB 2010]:

• Predicting colon cancer subtypes

• Each marker is a small connected subnetwork N such that the genes in N cover all disease samples (gene g covers sample s if g is differentially expressed in s)

• Greedy heuristics search for the smallest subnetwork markers
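The coverage rule above can be sketched in Python. `covers_all` checks the stated condition directly. `greedy_cover` is a hypothetical greedy heuristic and not necessarily the one Chowdhury et al. use. It grows a connected gene set from a seed gene by repeatedly adding the neighboring gene that covers the most still-uncovered disease samples. All function names and both input structures are assumptions made for illustration: an adjacency dict, and a map from each gene to the samples in which it is differentially expressed.

```python
def covered_samples(genes, de_samples):
    """Samples covered by at least one gene (g covers s if g is DE in s)."""
    covered = set()
    for g in genes:
        covered |= de_samples.get(g, set())
    return covered


def covers_all(genes, de_samples, disease_samples):
    """True if the genes jointly cover every disease sample."""
    return disease_samples <= covered_samples(genes, de_samples)


def greedy_cover(seed, adj, de_samples, disease_samples):
    """Grow a connected gene set from `seed` until it covers all disease samples.

    Returns the gene set, or None if no neighboring gene adds coverage.
    """
    marker = {seed}
    covered = covered_samples(marker, de_samples) & disease_samples

    def gain_of(gene):
        # Disease samples this gene would newly cover.
        return (de_samples.get(gene, set()) & disease_samples) - covered

    while covered != disease_samples:
        # Only genes adjacent to the current set, so the set stays connected.
        frontier = sorted({n for g in marker for n in adj.get(g, ()) if n not in marker})
        if not frontier:
            return None
        # max() returns the first maximal element, so ties go to the smallest name.
        best = max(frontier, key=lambda n: len(gain_of(n)))
        gain = gain_of(best)
        if not gain:
            return None
        marker.add(best)
        covered |= gain
    return marker


# Hypothetical path network A - B - C; each gene is DE in one disease sample.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
de = {"A": {1}, "B": {2}, "C": {3}}
print(sorted(greedy_cover("A", adj, de, {1, 2, 3})))  # ['A', 'B', 'C']
```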

Page 9: Eccb

Motivations: Heterogeneity of Cancer Genomes

• Cancer genomes evolve (many cells in one patient have different genomes)

• No two cancer cells from two different patients are the same

[Hampton et al., Genome Research (2009)]

Page 10: Eccb

Motivations: Proximity of Disease-Related Genes in the PPI Network

[Goh et al., PNAS (2007)]:

• The protein products of genes related to the same disease tend to interact with one another

• Genes related to a disease have coherent functions with respect to the Gene Ontology hierarchy

Page 11: Eccb

Our Approach

Each of our subnetwork markers:

• includes genes that interact with one another more than expected (densely connected subnetworks)

• contains genes that are differentially expressed in a fraction of all cancer tissue samples (partially differential expression)

Page 12: Eccb

Methods

Page 13: Eccb

Densely Connected Subnetworks: Properties

Let G = (V, E) be a network with edge weights $w_e$, $e \in E$.

• The density θ(G) of G is

$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}}$$

where $\binom{|V|}{2}$ is the number of possible edges in G.

• G is called α-dense if θ(G) ≥ α.

• An α-dense, connected network G is called α-densely connected.
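The definitions translate directly into code. In this minimal Python sketch, edge weights are stored in a dict keyed by `frozenset({u, v})`, a representation chosen here for illustration. A network with fewer than two nodes is assigned density 1.0. This is an assumed convention, because the formula is undefined in that case. The function names and the four-node edge layout in the example are hypothetical. The example reuses the four weights 0.8, 0.9, 0.6 and 0.4 from the talk's toy example, which give density 0.45.

```python
from collections import deque
from itertools import combinations
from math import comb


def density(nodes, weights):
    """theta(G): total edge weight divided by the C(|V|, 2) possible edges."""
    nodes = sorted(nodes)
    if len(nodes) < 2:
        return 1.0  # assumed convention: the formula is undefined for |V| < 2
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / comb(len(nodes), 2)


def is_connected(nodes, weights):
    """Breadth-first search restricted to edges between the given nodes."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in nodes - seen:
            if frozenset((u, v)) in weights:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def is_alpha_densely_connected(nodes, weights, alpha):
    """alpha-dense (theta(G) >= alpha) and connected."""
    return is_connected(nodes, weights) and density(nodes, weights) >= alpha


# Hypothetical 4-cycle A-B-D-C-A.
w = {frozenset("AB"): 0.8, frozenset("BD"): 0.9, frozenset("AC"): 0.6, frozenset("CD"): 0.4}
print(f"{density('ABCD', w):.2f}")                  # 0.45
print(is_alpha_densely_connected("ABCD", w, 0.5))  # False
print(is_alpha_densely_connected("ABD", w, 0.5))   # True
```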

Page 14: Eccb

Partially Differential Expression

[Figure: toy example with samples S1 to S3 and genes G1 to G7. A binary matrix marks whether each gene is differentially expressed (1) or not (0) in each sample. Next to it is a PPI network with confidence-weighted edges, in which two subnetworks are highlighted: one on genes G1 to G4 and one on genes G4 to G7.]

Compute all densely connected subnetworks whose genes are all differentially expressed in a common subset of at least k patients (here: k = 2).
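Here is a sketch of the partial-differential-expression half of this requirement. It assumes the binarized data arrive as a map from each gene to the set of patients in which that gene is differentially expressed. How expression values are binarized is not specified here. A gene set qualifies if at least k patients have all of its genes differentially expressed. The density and connectivity conditions would be checked separately. The function names and toy data are hypothetical.

```python
def supporting_patients(genes, de_patients):
    """Patients in which every gene of `genes` is differentially expressed."""
    genes = list(genes)
    if not genes:
        return set()
    common = set(de_patients[genes[0]])
    for g in genes[1:]:
        common &= de_patients[g]
    return common


def meets_support(genes, de_patients, k):
    """True if the genes share at least k patients with differential expression."""
    return len(supporting_patients(genes, de_patients)) >= k


# Hypothetical binarized data for genes G1 to G3 and patients S1 to S3.
de = {"G1": {"S1", "S2"}, "G2": {"S1", "S2", "S3"}, "G3": {"S2"}}
print(sorted(supporting_patients(["G1", "G2"], de)))  # ['S1', 'S2']
print(meets_support(["G1", "G3"], de, k=2))           # False
```

Adding a gene can only shrink the set of supporting patients. The condition is therefore anti-monotone: if a gene set fails it, every superset fails too, and a search can prune such sets early.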

Page 15: Eccb

Density-Constrained Biclustering: Search Strategy

Theorem: Let α ≥ 0.5. Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.

[Figure: a network on nodes A, B, C and D with edge weights 0.8, 0.9, 0.6 and 0.4, together with its subnetworks, each labelled "wDCB", "maximal wDCB", "Not Connected" or "Not Dense". The full network has density (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45 and is therefore not dense.]

Figure: Toy example for computation of densely connected subnetworks, density threshold α = 0.5.
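The theorem suggests a level-wise search. For α ≥ 0.5, every α-densely connected subnetwork of size n can be reached by adding one neighboring node to an α-densely connected subnetwork of size n − 1. Growing candidates one neighbor at a time and discarding those that fall below α therefore loses no solutions. Below is a minimal Python sketch of that idea. It is not the authors' implementation, and several parts are assumptions:

- the frozenset-keyed edge dict;
- the optional support filter, which requires genes to share at least k patients with differential expression and, being anti-monotone, is also safe to prune on;
- the density-1.0 convention for single nodes;
- all function names.

```python
from itertools import combinations
from math import comb


def density(nodes, weights):
    """Total edge weight divided by the number of possible edges."""
    if len(nodes) < 2:
        return 1.0  # assumed convention for single nodes
    pairs = combinations(sorted(nodes), 2)
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / comb(len(nodes), 2)


def enumerate_dcb(nodes, weights, alpha, de_patients=None, k=0):
    """All alpha-densely connected node sets of size >= 2 (optionally with support >= k).

    Sets are grown level by level; each added node is a neighbor of the
    current set, so every generated set is connected by construction.
    """
    adj = {v: set() for v in nodes}
    for edge in weights:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)

    def supported(s):
        if de_patients is None:
            return True
        return len(set.intersection(*(de_patients[g] for g in s))) >= k

    level = {frozenset([v]) for v in nodes if supported({v})}
    found = set()
    while level:
        next_level = set()
        for s in level:
            for v in set().union(*(adj[u] for u in s)) - s:
                t = s | {v}
                if t not in next_level and density(t, weights) >= alpha and supported(t):
                    next_level.add(t)
        found |= next_level
        level = next_level
    return found


def maximal(sets):
    """Keep only the sets that are not strictly contained in another set."""
    return {s for s in sets if not any(s < t for t in sets)}


# Hypothetical edge layout on A, B, C, D using the toy weights.
w = {frozenset("AB"): 0.8, frozenset("BD"): 0.9, frozenset("AC"): 0.6, frozenset("CD"): 0.4}
found = enumerate_dcb("ABCD", w, alpha=0.5)
print(sorted("".join(sorted(s)) for s in found))           # ['AB', 'ABD', 'AC', 'BD']
print(sorted("".join(sorted(s)) for s in maximal(found)))  # ['ABD', 'AC']
```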

Page 16: Eccb

Classifier Construction

1. Rank density-constrained biclusters according to the significance of their density

2. Keep only high-ranked subnetworks with little overlap

3. Feature space dimension = number of markers

4. SVM classification

[Figure: Average Gene Expression Profile. The expression values of Genes 1 to 7 (1.25, 1.5, 1.0, 1.25, 0.5, 0.0, 0.25) are averaged over the genes of each marker. This gives 1.25 for Marker 1 (G1 to G4) and 0.5 for Marker 2 (G4 to G7).]
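Steps 2 and 3 of the construction, together with the averaging in the figure, can be sketched in Python. The slide does not define "little overlap", so the Jaccard-index threshold `max_overlap` is an assumption. The sketch also assumes the markers arrive already ranked (step 1). Each sample then gets one feature per kept marker: the mean expression of that marker's genes. The function names are hypothetical. The gene values are the ones read from the toy figure, and the second candidate marker is a made-up near-duplicate. The SVM step is omitted here; the feature vectors could be passed to any SVM implementation, such as scikit-learn's `sklearn.svm.SVC`.

```python
def select_markers(ranked_markers, max_overlap=0.5):
    """Greedily keep markers (best first) whose Jaccard overlap with every
    already-kept marker is at most max_overlap."""
    kept = []
    for m in ranked_markers:
        if all(len(m & k) / len(m | k) <= max_overlap for k in kept):
            kept.append(m)
    return kept


def marker_features(sample_expr, markers):
    """One feature per marker: the average expression of its genes in this sample."""
    return [sum(sample_expr[g] for g in m) / len(m) for m in markers]


# Gene values from the toy figure; markers ranked best first.
sample = {"G1": 1.25, "G2": 1.5, "G3": 1.0, "G4": 1.25, "G5": 0.5, "G6": 0.0, "G7": 0.25}
ranked = [
    frozenset({"G1", "G2", "G3", "G4"}),
    frozenset({"G1", "G2", "G3"}),  # hypothetical near-duplicate: overlap 0.75, dropped
    frozenset({"G4", "G5", "G6", "G7"}),
]
markers = select_markers(ranked)
print(marker_features(sample, markers))  # [1.25, 0.5]
```

Because there is one feature per kept marker, the feature space dimension equals the number of markers, as in step 3.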

Page 17: Eccb

Experimental Results

Page 18: Eccb

Network Data

Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]

• Edges reflect physical protein-protein interactions

• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)

• Scoring system based on KEGG pathways

Page 19: Eccb

Gene Expression Data

Three experiments on colon cancer

• GSE8671, 32 patients / tissue pairs

• GSE10950, 24 patients / tissue pairs

• GSE6988, 123 samples across several cancer subtypes

One experiment on breast cancer

• GSE3494, 251 patients with different mutation statuses (wild type vs. mutant)

Page 20: Eccb

GSE8671 → GSE10950

[Plot: AUC (y-axis, 0.75 to 1) against the number of subnetworks/genes (x-axis, 0 to 50) for GSE8671 → GSE10950, comparing SGM, GMI, NetCover and wDCB.]

Page 21: Eccb

GSE8671 → GSE6988: Colon Cancer

[Plot: AUC (y-axis, 0.6 to 1) against the number of subnetworks/genes (x-axis, 0 to 50) for GSE8671 → GSE6988, comparing SGM, GMI, NetCover and wDCB.]

Page 23: Eccb

GSE3494: Breast Cancer

Page 24: Eccb

Subnetwork Marker Statistics

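GSE8671 subnetworks:

| Method | # | ER-50 | Avg AUC (GSE6988) | Avg AUC (GSE10950) |
|---|---|---|---|---|
| GMI | 806 | 0.38 | 0.86 | 0.95 |
| NC | 923 | 0.12 | 0.87 | 0.99 |
| wDCB | 282 | 0.76 | 0.91 | 1.00 |

GSE10950 subnetworks:

| Method | # | ER-50 | Avg AUC (GSE6988) | Avg AUC (GSE8671) |
|---|---|---|---|---|
| GMI | 755 | 0.34 | 0.84 | 0.99 |
| NC | N/A | N/A | 0.86 | N/A |
| wDCB | 216 | 0.74 | 0.91 | 1.00 |

GMI = Greedy Mutual Information (Chuang et al.)

NC = NetCover (Chowdhury et al.)

wDCB = weighted Density-Constrained Biclustering

# = total number of subnetworks computed

ER-50 = enrichment rate of the top-50 markers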

Page 25: Eccb

Top Marker (GSE8671)

• DNA replication initiation

• DNA metabolic process

• TP53, BRCA1: tumor suppressor genes

• Minichromosome maintenance (MCM) complex

• Protein kinase CDC7 phosphorylates MCM2

Page 26: Eccb

Top Marker (GSE10950)

• Nucleotide excision repair

• DNA clamp (PCNA) loader activity

• Polymorphisms in WRN associated with colon cancer

• DNMT1: methyltransferase that silences cell growth repressors

Page 27: Eccb

Future Work

1. Compare subnetwork signatures of different cancers, or of subtypes of a particular cancer

2. Extend the interaction network with, for example, ncRNA-protein interactions

3. Design new methods that work with real-valued (continuous) phenotype variables

Page 28: Eccb

Thank you for your attention!