eccb
TRANSCRIPT
Guideline Introduction Methods Experimental Results
Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering
Presenters: Phuong Dao (1), Alexander Schonhuth (2)
(1) School of Computing Science, Simon Fraser University
(2) Algorithmic Computational Biology Group, CWI, Netherlands
Guideline
• Introduction: Personalized Medicine, Biomarker Discovery
• Methods: Motivations, Our approach
• Experimental Results: Data, Classifier Performance, Markers
Introduction: Personalized Medicine
• Exact determination of disease status based on patient genetics/genomics
• Goal: Specific, individual choice of treatment
• Necessary: Reliable disease markers
Biomarker Discovery
• Single gene markers: Each gene is ranked according to its ability to distinguish samples of different classes
• Multigenic markers: Each subset S of genes is ranked according to the aggregate ability of all genes in S to distinguish samples of different classes
Single Gene Markers
Figure: Expression of Genes 1-6 across Cases 1-3 and Controls 1-3, shown first unsorted and then grouped into differentially expressed and non-differentially expressed genes.
Multigenic Markers: Subnetwork Markers
[Chuang et al., Mol. Syst. Biol. (2007)]:
• Predicting progression of breast cancer
• Subnetwork markers are connected subnetworks whose aggregate expression profiles correlate most strongly with the sample labels
• Greedy heuristics to search for optimal subnetwork markers
Multigenic Markers: Subnetwork Markers
[Chowdhury et al., PSB 2010]:
• Predicting colon cancer subtypes
• Each marker is a small connected subnetwork N such that the genes in N cover all disease samples (gene g covers sample s if g is differentially expressed in s)
• Greedy heuristics to search for the smallest subnetwork markers
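The covering idea can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the NetCover algorithm of Chowdhury et al.: the names `greedy_cover`, `adj`, and `covers` are hypothetical, the search starts from a given seed gene, and each step adds the neighbouring gene that covers the most still-uncovered disease samples.

```python
def greedy_cover(seed, adj, covers, samples):
    """Grow a connected gene set from `seed` until it covers all `samples`.

    adj:    gene -> set of neighbouring genes in the interaction network
    covers: gene -> set of samples in which the gene is differentially expressed
    Returns the chosen genes in order, or None if the greedy walk gets stuck.
    (Hypothetical helper for illustration; not the published heuristic.)
    """
    chosen = [seed]
    covered = set(covers.get(seed, ())) & samples
    while covered != samples:
        # Candidate genes: neighbours of the current subnetwork (keeps it connected).
        frontier = set().union(*(adj.get(g, set()) for g in chosen)) - set(chosen)

        def gain(g):
            return len((covers.get(g, set()) & samples) - covered)

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(frontier), key=gain, default=None)
        if best is None or gain(best) == 0:
            return None  # no neighbour covers a new sample
        chosen.append(best)
        covered |= covers[best] & samples
    return chosen


# Made-up toy data: four genes, four disease samples.
adj = {"g1": {"g2", "g4"}, "g2": {"g1", "g3"}, "g3": {"g2"}, "g4": {"g1"}}
covers = {"g1": {"s1"}, "g2": {"s2", "s3"}, "g3": {"s4"}, "g4": {"s2"}}
marker = greedy_cover("g1", adj, covers, {"s1", "s2", "s3", "s4"})
print(marker)
```

Restricting candidates to neighbours of the current gene set is what keeps the marker a connected subnetwork; a fuller heuristic would also try several seeds and keep the smallest result.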
Motivations: Heterogeneity of Cancer Genomes
• Cancer genomes evolve (many cells in one patient have different genomes)
• No two cancer cells of two different patients are the same
[Hampton et al., Genome Research (2009)]
Motivations: Proximity of Disease-Related Genes in the PPI Network
[Goh et al., PNAS (2007)]:
• The protein products of genes related to the same disease tend to interact with one another
• Genes related to a disease have coherent functions with respect to the Gene Ontology hierarchy
Our Approach
Each of our subnetwork markers:
• includes genes that interact with one another more than expected (densely connected subnetworks)
• contains genes that are differentially expressed in a fraction of all the samples from cancer tissues (partially differential expression)
Methods
Densely Connected Subnetworks: Properties
Let $G = (V, E)$ be a network with edge weights $w_e$, $e \in E$.
• The density $\theta(G)$ of $G$ is
$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}},$$
where $\binom{|V|}{2}$ is the number of possible edges in $G$.
• $G$ is called $\alpha$-dense if $\theta(G) \geq \alpha$.
• An $\alpha$-dense, connected network $G$ is called $\alpha$-densely connected.
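A minimal sketch of these definitions, assuming the network is stored as a dictionary from unordered gene pairs to confidence weights. The function names and the example weights are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations


def density(nodes, weights):
    """theta(G): sum of edge weights divided by the number of possible edges."""
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(weights.get(frozenset(p), 0.0) for p in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)


def is_connected(nodes, weights):
    """Depth-first search restricted to the edges between `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if frozenset((u, v)) in weights:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def is_alpha_densely_connected(nodes, weights, alpha):
    return is_connected(nodes, weights) and density(nodes, weights) >= alpha


# Hypothetical weights, chosen only to exercise the definitions.
w = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G2", "G3")): 0.75,
    frozenset(("G3", "G4")): 0.5,
}
print(density({"G1", "G2", "G3"}, w))                          # about 0.55
print(is_alpha_densely_connected({"G1", "G2", "G3"}, w, 0.5))
```

Missing edges contribute weight 0, so a sparse gene set is penalised through the denominator even when its existing edges are high-confidence.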
Partially Differential Expression
Figure: Confidence-weighted interaction network with two highlighted subnetworks, on genes G1-G4 and G4-G7, each shown with its binary differential-expression matrix over samples S1-S3.
Compute all densely connected subnetworks whose genes are differentially expressed in a subset of patients of size at least k (here: k = 2).
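The sample constraint can be sketched as a set intersection. The sketch assumes the usual bicluster reading, namely that every gene of the subnetwork must be differentially expressed in each of the at least k shared samples. The helper names and the data below are hypothetical and do not reproduce the matrices on the slide.

```python
def shared_samples(genes, diff_expr):
    """Samples in which every gene of `genes` is differentially expressed.

    diff_expr: gene -> set of samples where the gene is differentially expressed.
    """
    sample_sets = [diff_expr[g] for g in genes]
    return set.intersection(*sample_sets) if sample_sets else set()


def has_support(genes, diff_expr, k):
    """True if the genes are jointly differentially expressed in at least k samples."""
    return len(shared_samples(genes, diff_expr)) >= k


# Hypothetical binary differential-expression data for four genes, three samples.
diff_expr = {
    "G1": {"S1", "S2", "S3"},
    "G2": {"S1", "S2"},
    "G3": {"S2", "S3"},
    "G4": {"S1", "S2", "S3"},
}
print(has_support(["G1", "G2", "G4"], diff_expr, k=2))  # shared samples: S1, S2
print(has_support(["G1", "G2", "G3"], diff_expr, k=2))  # shared samples: S2 only
```

Adding a gene can only shrink the shared sample set, so this check prunes a candidate subnetwork and everything that would be grown from it.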
Density Constrained Biclustering: Search Strategy
Theorem: Let α ≥ 0.5. Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.
Figure: Toy example for the computation of densely connected subnetworks, density threshold α = 0.5. Candidate subnetworks of a four-node network on A, B, C, D (edge weights 0.8, 0.9, 0.6, 0.4) are labelled "Not Connected", "Not Dense", "wDCB", or "maximal wDCB". The full network has density (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45 and is therefore not dense.
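The theorem suggests a level-wise search: every α-densely connected subnetwork of size n can be reached by adding one neighbouring node to an α-densely connected subnetwork of size n − 1. The sketch below illustrates that idea only. It is not the authors' implementation, it omits the differential-expression constraint, and the assignment of the four toy weights to edges is an assumption read off the garbled figure.

```python
from itertools import combinations


def density(nodes, weights):
    """Sum of edge weights inside `nodes` divided by the number of possible edges."""
    n = len(nodes)
    total = sum(weights.get(frozenset(p), 0.0) for p in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)


def dense_connected_subnetworks(weights, alpha):
    """Level-wise enumeration of alpha-densely connected node sets (size >= 2)."""
    adj = {}
    for edge in weights:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Size 2: a single edge has density equal to its weight.
    level = {edge for edge, w in weights.items() if w >= alpha}
    found = set(level)
    while level:
        next_level = set()
        for sub in level:
            # Only neighbours of `sub` are tried, so every candidate stays connected.
            frontier = set().union(*(adj[u] for u in sub)) - sub
            for v in frontier:
                cand = sub | {v}
                if density(cand, weights) >= alpha:
                    next_level.add(cand)
        found |= next_level
        level = next_level
    return found


def maximal(found):
    """Keep only the sets that are not strictly contained in another found set."""
    return [s for s in found if not any(s < t for t in found)]


# Toy network; the edge assignment is an assumption (see the lead-in).
toy = {
    frozenset("AB"): 0.4,
    frozenset("AC"): 0.6,
    frozenset("AD"): 0.9,
    frozenset("BD"): 0.8,
}
found = dense_connected_subnetworks(toy, alpha=0.5)
print(sorted("".join(sorted(s)) for s in maximal(found)))
```

Because candidates are only ever grown from subnetworks that already passed the density test, the search never visits the bulk of the node subsets; for α < 0.5 the theorem no longer applies and this pruning could miss valid subnetworks.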
Classifier Construction
1. Rank density constrained biclusters according to density significance
2. Keep only high-ranked subnetworks with little overlap
3. Feature space dimension = number of markers
4. SVM classification
Figure: Average gene expression profile. For one sample with expression values Gene 1 = 1.25, Gene 2 = 1.5, Gene 3 = 1.0, Gene 4 = 1.25, Gene 5 = 0.5, Gene 6 = 0.0, Gene 7 = 0.25, Marker 1 (G1, G2, G3, G4) takes the average 1.25 and Marker 2 (G4, G5, G6, G7) takes the average 0.5.
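Step 3 of the classifier construction can be sketched as follows: each sample is mapped to one feature per marker, the average expression of the marker's genes. The function name is hypothetical, and the numbers are the per-gene values and marker memberships as they appear to be shown on the slide. The resulting vectors would then be passed to an SVM, which is omitted here.

```python
def marker_features(sample_expr, markers):
    """One feature per marker: the mean expression of the marker's genes.

    sample_expr: gene -> expression value for a single sample
    markers:     list of gene lists, one per subnetwork marker
    """
    return [sum(sample_expr[g] for g in genes) / len(genes) for genes in markers]


sample_expr = {
    "G1": 1.25, "G2": 1.5, "G3": 1.0, "G4": 1.25,
    "G5": 0.5, "G6": 0.0, "G7": 0.25,
}
markers = [["G1", "G2", "G3", "G4"], ["G4", "G5", "G6", "G7"]]
features = marker_features(sample_expr, markers)
print(features)  # [1.25, 0.5]
```

A gene shared by two markers (here G4) contributes to both features, which is one reason the pipeline keeps only markers with little overlap.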
Experimental Results
Network Data
Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]
• Edges reflect physical protein-protein interactions
• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)
• Scoring system based on KEGG pathways
Figure: Example PPI network with confidence-scored edges.
Gene Expression Data
Three experiments on colon cancer
• GSE8671, 32 patients / tissue pairs
• GSE10950, 24 patients / tissue pairs
• GSE6988, 123 samples across several cancer subtypes
One experiment on breast cancer
• GSE3494, 251 patients with different mutation status (wild-type vs. mutant)
GSE 8671 → GSE 10950
Figure: AUC (0.75 to 1) against the number of subnetworks/genes used (0 to 50), GSE8671 >> GSE10950. Curves: SGM, GMI, NETCOVER, wDCB.
GSE 8671 → GSE 6988 - Colon Cancer
Figure: AUC (0.6 to 1) against the number of subnetworks/genes used (0 to 50), GSE8671 >> GSE6988. Curves: SGM, GMI, NETCOVER, wDCB.
GSE 3494 - Breast Cancer
Subnetwork Marker Statistics
8671 Subnetworks
Method | # | ER-50 | Avg AUC (6988) | Avg AUC (10950)
GMI | 806 | 0.38 | 0.86 | 0.95
NC | 923 | 0.12 | 0.87 | 0.99
wDCB | 282 | 0.76 | 0.91 | 1.00

10950 Subnetworks
Method | # | ER-50 | Avg AUC (6988) | Avg AUC (8671)
GMI | 755 | 0.34 | 0.84 | 0.99
NC | N/A | N/A | 0.86 | N/A
wDCB | 216 | 0.74 | 0.91 | 1.00

GMI = Greedy Mutual Information (Chuang et al.)
NC = NetCover (Chowdhury et al.)
wDCB = weighted Density Constrained Biclustering
# = total number of subnetworks computed
ER-50 = enrichment rate of the top-50 markers
Top Marker 8671
• DNA replication initiation
• DNA metabolic process
• TP53, BRCA1: tumor suppressor genes
• Minichromosome maintenance (MCM) complex
• Protein kinase CDC7 phosphorylates MCM2
Top Marker 10950
• Nucleotide excision
• DNA clamp (PCNA) loader activity
• Polymorphisms in WRN ↔ colon cancer
• DNMT1: methyltransferase, silences cell growth repressors
Future Works
1. Compare subnetwork signatures of different cancers or of subtypes of a particular cancer
2. Extend the interaction network with, for example, ncRNA-protein interactions
3. Redesign the methods to work with real-valued, continuous phenotype variables
Thanks for the attention!