2016 Presentation at the University of Hawaii Cancer Center

Post on 19-Feb-2017

It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.

- Attributed to Mark Twain

Everybody knows there are 4 subtypes of HGSC.

Everybody

… but Greg.

Tothill et al. Clinical Cancer Research. 2008

One hundred and seventy one tumors consistently segregated into one of the six k-means clusters. Most of the remaining tumors (80 of 114) could be further assigned to one of the molecular subsets by performing class prediction.

171 clustered cleanly

80 could be assigned

34 ???

12-40% unclear
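The 12-40% range appears to follow from the Tothill counts quoted above. A minimal sketch of that arithmetic, assuming the 171 cleanly clustered and 114 remaining tumors make up the whole cohort, and that the two bounds correspond to the 34 never-assigned tumors and to all 114 tumors that did not cluster cleanly (the variable names are illustrative):

```python
# Counts quoted from Tothill et al. 2008 on the slide above.
clean = 171                   # tumors that segregated consistently
remaining = 114               # tumors that did not
assigned_by_prediction = 80   # of the remaining, assigned by class prediction

unassigned = remaining - assigned_by_prediction   # the "34 ???" on the slide
total = clean + remaining                         # assumed cohort size: 285

lower = unassigned / total    # only the never-assigned tumors count as unclear
upper = remaining / total     # every tumor that did not cluster cleanly counts

print(f"{lower:.0%} to {upper:.0%}")  # prints "12% to 40%"
```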

The Cancer Genome Atlas, Nature. 2011

The silhouette width was computed to filter out expression profiles that were included in a subclass, but that were not robust representatives of the subclass. This resulted in the removal of 51 of 135 samples of the Differentiated subclass; 12 of 107 samples of the Immunoreactive subclass; 0 of 109 samples of the Mesenchymal subclass; and 13 of 138 samples of the Proliferative subclass.

Verhaak et al. JCI. 2013
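The filtering step in the quote can be sketched as follows. This is an illustration on made-up 2-D points, not the published analysis: the toy data, the Euclidean distance, and the cutoff of zero are all assumptions (the quote does not state the threshold). The last lines only total the removal counts quoted above.

```python
from math import dist

def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each point.

    Assumes at least two distinct cluster labels.
    """
    widths = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster
        own = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths

# Toy data: the fourth point is labeled "A" but sits next to cluster "B".
points = [(0, 0), (0, 1), (1, 0), (9, 9), (10, 10), (10, 11)]
labels = ["A", "A", "A", "A", "B", "B"]

widths = silhouette_widths(points, labels)
keep = [w > 0 for w in widths]  # assumed cutoff: drop non-positive widths
print(keep)

# Removal counts quoted on the slide: (removed, subclass size).
removed = {
    "Differentiated": (51, 135),
    "Immunoreactive": (12, 107),
    "Mesenchymal": (0, 109),
    "Proliferative": (13, 138),
}
total_removed = sum(r for r, _ in removed.values())
total_samples = sum(n for _, n in removed.values())
print(f"{total_removed} of {total_samples} removed "
      f"({total_removed / total_samples:.1%})")
```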

What’s the deal with HGSC subtypes?

Casey Greene

Assistant Professor, Systems Pharmacology and Translational Therapeutics

Unified Bioinformatics Pipeline

curatedOvarianData

Dataset Inclusion Criteria

Remove • <130 tumors • Custom array technology

Datasets: TCGA, Tothill, Yoshihara, Bonome, Mayo*

*Our group deposited 528 samples to GEO (GSE74357)

Sample Inclusion Criteria

Keep • Histology • High Grade

Gene Selection Criteria

Keep • 1500 MAD • Union

Analyses: Clustering, SAM, Overrepresented Pathways, Survival, Match Clusters
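The gene selection criterion on this slide ("1500 MAD", "Union") can be read as: rank genes by median absolute deviation within each dataset, keep the top 1500 from each, and take the union. A minimal sketch under that reading, with toy data and hypothetical function names; any further restriction, such as keeping only genes measured in every dataset, is not shown here.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)

def top_mad_genes(expression, n_top):
    """expression maps gene -> list of values across samples in one dataset."""
    ranked = sorted(expression, key=lambda g: mad(expression[g]), reverse=True)
    return set(ranked[:n_top])

def union_of_top_mad(datasets, n_top=1500):
    """Union of the n_top most variable genes from each dataset."""
    selected = set()
    for expression in datasets.values():
        selected |= top_mad_genes(expression, n_top)
    return selected

# Toy example with n_top=1 so the result is easy to check by hand.
datasets = {
    "study_a": {"G1": [1, 5, 9, 13], "G2": [4, 4, 4, 5], "G3": [2, 3, 4, 5]},
    "study_b": {"G1": [7, 7, 7, 7], "G2": [1, 2, 8, 9], "G3": [3, 3, 4, 4]},
}
genes = union_of_top_mad(datasets, n_top=1)
print(sorted(genes))  # prints ['G1', 'G2']
```

Taking the union rather than the intersection keeps a gene that is highly variable in only one study, which matters when the populations differ.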

Are HGSC subtypes consistent?

Cluster Comparison

Results are consistent across clustering algorithms.

Cross-population Comparison

Are HGSC subtypes consistent across populations?

Syn-clusters: Consistent across method, study, and population

Concordance with other publications

TCGA Tothill Konecny

Unified Pipeline

Syn-cluster 1?

Syn-cluster 2?

Syn-cluster 3?

Mesenchymal-like

Proliferative-like

Immunoreactive/Differentiated-like

Why didn’t TCGA (2011) find this?

Why didn’t Konecny (2014) find this?

What about TCGA’s re-analysis of Tothill?

What if you re-analyze Tothill without LMP samples?

Cross-population analysis of high-grade serous ovarian cancer reveals only two robust subtypes. bioRxiv: http://dx.doi.org/10.1101/030239 github: http://github.com/greenelab/hgsc_subtypes

Research is to see what everybody else has seen and to think what nobody else has thought.

- Albert Szent-Györgyi

Image by J.W. McGuire/NIH

Image from You Don’t Know Jack. Vol 3.

Classes of Algorithms

Data

What patterns exist in the data?

Unsupervised algorithms

What fits this pattern in the data?

Supervised algorithms

Early-Mid 2000s Mid 2000s-Present & Mid 2010s -

If you showed 16,000 computers 10 million images from YouTube, what would they see?

Le et al. 2012

Analysis with Denoising Autoencoders of Gene Expression (ADAGE)

Tan et al. Pac Sym Bio 2015; Tan et al. mSystems 2016.
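For readers unfamiliar with denoising autoencoders, here is a minimal forward-pass sketch in plain Python. It is not the ADAGE implementation: the masking noise, sigmoid units, tied weights, toy sizes, and corruption level are all assumptions made for illustration, and training (adjusting the weights to lower the loss) is omitted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, level, rng):
    """Masking noise: set each input to 0 with probability `level`."""
    return [0.0 if rng.random() < level else v for v in x]

def encode(x, W, b):
    """Hidden node activities: h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def decode(h, W, c):
    """Reconstruction with tied weights (the transpose of W is reused)."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i])
            for i in range(len(c))]

def cross_entropy(x, x_rec):
    """Reconstruction loss for inputs scaled to [0, 1]."""
    eps = 1e-12
    return -sum(v * math.log(r + eps) + (1 - v) * math.log(1 - r + eps)
                for v, r in zip(x, x_rec))

rng = random.Random(0)
n_genes, n_nodes = 6, 3  # toy sizes, far smaller than a real compendium
W = [[rng.uniform(-0.5, 0.5) for _ in range(n_genes)] for _ in range(n_nodes)]
b = [0.0] * n_nodes
c = [0.0] * n_genes

x = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]   # one sample, expression scaled to [0, 1]
noisy = corrupt(x, 0.1, rng)         # the network only sees the noisy input
h = encode(noisy, W, b)              # node activities (the learned features)
x_rec = decode(h, W, c)
loss = cross_entropy(x, x_rec)       # compared against the clean input
print(len(h), len(x_rec), loss > 0)
```

The defining choice is in the last step: the loss compares the reconstruction with the uncorrupted sample, so the nodes have to capture structure shared across genes rather than copy the input.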

ADAGE Identifies Genes’ Pathways

Assign Pathway

… and produces useful networks

The Transcription Factor Anr Controls P. aeruginosa Response to Low O2

[Figure labels: Low O2, Anr, CF Lung Epithelium]

Node42 reflects Anr Activity

[Figure: heatmaps of Node42 (Anr activity) in E-GEOD-17179, E-GEOD-17296, E-GEOD-52445, and E-GEOD-33160. Sample labels: wt, Δanr, Δdnr, ΔanrΔroxSR; EXP and STAT; O2. Panels A, B, and C; panel labels: Microarray, RNAseq PAO1, RNAseq J215.]

New Experiment Validates Node 42’s Low-O2 Signature

CF lung epithelial cells (Jack Hammond)

[Figure: Node42 - Anr Activity heatmaps, repeated from the previous slide. Panel labels: Microarray, RNAseq PAO1, RNAseq J215.]

ADAGE complements PCA/ICA

[Figure: heatmaps comparing Node42 with PC4, PC7, IC14, and IC49 in E-GEOD-17179, E-GEOD-17296, E-GEOD-33160, and E-GEOD-52445, and in the Anr-Microarray and Anr-RNAseq (PAO1, J215) experiments. Sample labels: wt, Δanr, Δdnr, ΔanrΔroxSR; EXP and STAT; O2.]

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

Thompson, Tan, Greene. PeerJ. Jeff Thompson

ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions. bioRxiv: http://dx.doi.org/10.1101/030650 mSystems. 2016.

LeCun, Bengio, and Hinton. Nature 2015.

How do we move from this to mechanisms?

What “pathways” did my experiment affect?

ADAGE-based Pathway Analysis of Transcriptomic Changes

ADAGE of Cancer Biopsies.

Hyperactive RAS

TP53

FOXM1

TCGA & METABRIC

Greene et al. Journal of Cellular Physiology. 2014

Molecular Subtype Features

[Figure: Accuracy (0 to 1) by Subtype for METABRIC Discovery, METABRIC Test, and TCGA Evaluation. Nodes shown: Node5, Node6, Node29, Node30, Node42, Node66. Subtypes: Basal, Her2-enriched, Luminal A, Luminal B, Normal-like.]

TCGA, 2012; Tan et al. 2015, PSB

But what about survival?

- Usually some portion of the audience

[Figures: survival curves. x-axis: Months (0 to 150); y-axis: Disease-specific survival probability (0.0 to 1.0).]

ER status. ER pos: 1494(342); ER neg: 434(161). Logrank p = 8.8e-09

Survival-associated Feature

LumA Subtype. LumA: 709(110); non-LumA: 1263(396). Logrank p = 2.5e-17

Tumor Grade. Grade 3: 950(305); Grade 1,2: 933(186). Logrank p = 3.3e-08

Node5. Node5 active: 912(153); Node5 inactive: 1060(353). Logrank p = 2.1e-20

Pathway Analysis of Survival Feature (Node 5)

Pathway FDR q-value

FOXM1 transcription factor network <1×10^-4

Aurora B signaling 4.93×10^-4

Aurora A signaling 0.001

PLK1 signaling 0.003

Integrin-linked kinase signaling 0.068

C-MYB transcription factor network 0.074

Cell cycle

Luminal Subtype

Tumor Progression

Pan-Cancer Analysis (Node 196 of 300)

supermarketnews.com

Mine Your Own Business

Greene and Troyanskaya, Nucleic Acids Research. 2011

Wong*, Park*, Greene* et al., Nucleic Acids Research. 2012

Greene,* Wong,* Krishnan,* et al. Nature Genetics. 2015.

Zelaya and Greene, In Preparation

ADAGE Webserver coming soon! http://www.greenelab.com/webservers

When you’re caught in the data deluge…

… don’t grab an umbrella…

… get a bucket.

Greene Lab: Jie Tan (Grad Student), Gregory Way (Grad Student), Brett Beaulieu-Jones (Grad Student), Sammy Klasfeld (Rotation Student), René Zelaya (Programmer), Matt Huyck (Programmer), Dongbo Hu (Programmer), Kathy Chen (Undergrad), Mulin Xiong (Undergrad), Tim Chang (Undergrad)

Collaborators: Jen Doherty & James Rudd; Deb Hogan & Jack Hammond

Data: All investigators who publicly release their gene expression data.

Images: Artists who release their work under a Creative Commons license.

Funding: G&B Moore Investigator in Data-Driven Discovery; National Science Foundation; Cystic Fibrosis Foundation; American Cancer Society

Find us online: http://www.greenelab.com Twitter: @GreeneScientist
