* these authors contributed equally to this work † … · 2020. 9. 17. · * these authors...

43
TITLE: Diversity Across the Pancreatic Ductal Adenocarcinoma Disease Spectrum Revealed by Network- Anchored Functional Genomics AUTHORS: Johnathon L. Rose 1,2, *, Sanjana Srinivasan 1,2, *, Wantong Yao 3 , Sahil Seth 2,4 , Michael Peoples 4 , Annette Machado 4 , Chieh-Yuan Li 1,2 , I-Lin Ho 1,2 , Jaewon J. Lee 3,5,6 , Paola A. Guerrero 3,5 , Eiru Kim 7 , Mustafa Syed 8 , Joseph R. Daniele 4 , Angela Deem 9 , Michael Kim 6 , Christopher A. Bristow 4 , Eugene J. Koay 8 , Giannicola Genovese 10 , Andrea Viale 1 , Timothy P. Heffernan 4 , Anirban Maitra 3,5 , Traver Hart 7,11 , Alessandro Carugo 4,, and Giulio F. Draetta 1,AFFILIATIONS: 1. Department of Genomic Medicine, UT MD Anderson Cancer Center, Houston, TX 77030, USA 2. The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences; The University of Texas Health Science Center at Houston, Houston, TX 77030, USA 3. Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 4. Translational Research to AdvanCe Therapeutics and Innovation in ONcology (TRACTION), University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 5. Sheikh Ahmed Center for Pancreatic Cancer Research, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 6. Department of Surgical Oncology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 7. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 8. Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 9. Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 10. Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 11. Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA * These authors contributed equally to this work † Correspondence: [email protected] , [email protected] ABSTRACT: Cancers are highly complex ecosystems composed of molecularly distinct sub-populations of tumor cells, each exhibiting a unique spectrum of genetic features and phenotypes, and embedded within a complex organ context. To substantially improve clinical outcomes, there is a need to comprehensively define (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint this version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034 doi: bioRxiv preprint

Upload: others

Post on 24-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

TITLE:

Diversity Across the Pancreatic Ductal Adenocarcinoma Disease Spectrum Revealed by Network-

Anchored Functional Genomics

AUTHORS:

Johnathon L. Rose1,2,*, Sanjana Srinivasan1,2,*, Wantong Yao3, Sahil Seth2,4, Michael Peoples4, Annette

Machado4, Chieh-Yuan Li1,2, I-Lin Ho1,2, Jaewon J. Lee3,5,6, Paola A. Guerrero3,5, Eiru Kim7, Mustafa

Syed8, Joseph R. Daniele4, Angela Deem9, Michael Kim6, Christopher A. Bristow4, Eugene J. Koay8,

Giannicola Genovese10, Andrea Viale1, Timothy P. Heffernan4, Anirban Maitra3,5, Traver Hart7,11,

Alessandro Carugo4,†, and Giulio F. Draetta1,†

AFFILIATIONS:

1. Department of Genomic Medicine, UT MD Anderson Cancer Center, Houston, TX 77030, USA 2. The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical

Sciences; The University of Texas Health Science Center at Houston, Houston, TX 77030, USA 3. Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center,

Houston, TX 77030, USA 4. Translational Research to AdvanCe Therapeutics and Innovation in ONcology (TRACTION),

University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA 5. Sheikh Ahmed Center for Pancreatic Cancer Research, University of Texas MD Anderson Cancer

Center, Houston, TX 77030, USA 6. Department of Surgical Oncology, University of Texas MD Anderson Cancer Center, Houston, TX

77030, USA 7. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson

Cancer Center, Houston, TX 77030, USA 8. Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston,

TX 77030, USA 9. Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston,

TX 77030, USA 10. Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer

Center, Houston, TX 77030, USA 11. Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX

77030, USA

* These authors contributed equally to this work

† Correspondence: [email protected], [email protected]

ABSTRACT:

Cancers are highly complex ecosystems composed of molecularly distinct sub-populations of tumor cells,

each exhibiting a unique spectrum of genetic features and phenotypes, and embedded within a complex

organ context. To substantially improve clinical outcomes, there is a need to comprehensively define

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 2: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

inter- and intra-tumor phenotypic diversity, as well as to understand the genetic dependencies that

underlie discrete molecular subpopulations. To this end, we integrated CRISPR-based co-dependency

annotations with a tissue-specific co-expression network developed from patient-derived models to

establish CoDEX, a framework to quantitatively associate gene-cluster patterns with genetic

vulnerabilities in pancreatic ductal adenocarcinoma (PDAC). Using CoDEX, we defined multiple

prominent anticorrelated gene-cluster signatures and specific pathway dependencies, both across

genetically distinct PDAC models and intratumorally at the single-cell level. Of these, one differential

signature recapitulated the characteristics of classical and basal-like PDAC molecular subtypes on a

continuous scale. Anchoring genetic dependencies identified through functional genomics within the

gene-cluster signature defined fundamental vulnerabilities associated with transcriptomic signatures of

PDAC subtypes. Subtype-associated dependencies were validated by feature-barcoded CRISPR

knockout of prioritized basal-like-associated genetic vulnerabilities (SMAD4, ILK, and ZEB1) followed by

scRNAseq in multiple PDAC models. Silencing of these genes resulted in a significant and directional

clonal shift toward the classical-like signature of more indolent tumors. These results validate CoDEX as a

novel, quantitative approach to identify specific genetic dependencies within defined molecular contexts

that may guide clinical positioning of targeted therapeutics. (Word count: 231)

INTRODUCTION:

Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer-related death in the

United States, with a 5-year survival rate of 8% and a median survival of <11 months1,2. Despite continual

attempts to manage this disease with targeted drugs and immunotherapy, a vast majority of PDAC tumors

are recalcitrant to therapeutic interventions. This is partially due to the dominance of KRAS gain-of-

function mutations (90% of tumors), and frequent loss-of-function genetic alterations in the well-known

epithelial tumor suppressors, TP53 (64%), SMAD4 (23%), and CDKN2A (17%), which make this disease

resistant to RTK pathways inhibitors, inducers of apoptosis, and other drugs. Beyond these dominant

genetic alterations, mutation-based diversity in PDAC consists of other mutations present at a

significantly lower frequency (<10%)3,4. This low mutation load is likely one factor that limits the efficacy of

immunotherapy, as checkpoint inhibitor monotherapies have shown little success in unselected patients5.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 3: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

The fundamental functional characteristics, or “hallmarks”, of tumor cells, predict that interfering with

functions such as cell division, cell motility, cell energetics, etc., should profoundly inhibit tumor growth

and progression. Yet, redundancy in essential pathways and adaptation often result in selective

enrichment of drug-resistant tumor cells and disease progression and relapse. Recent efforts in PDAC

molecular subtyping have better characterized disease heterogeneity using transcriptomic signatures

associated with clinical features, which can be used to define multiple PDAC subtypes6-9. Of these, the

most widely accepted classification distinguishes between “classical” and “basal-like” tumor subtypes7.

While these subtypes possess prognostic relevance, with basal-like tumors exhibiting the poorer

prognosis, their utility to predict patient response to specific targeted therapeutics has not yet been

realized. Moreover, clonal and sub-clonal evolution as well as therapeutic intervention can result in

molecular signatures that change the original subtype classification, highlighting the relevance of

intratumoral molecular and functional heterogeneity that underlies high-level subtype groupings10,11.

We have developed a systematic approach to correlate the status of pathway activation in a tumor with its

response to genetic suppression of individual gene functions. To associate genetic drivers with clinically

predictive transcriptomic signatures, we developed CoDEX, a dedicated PDAC co-expression network to

anchor dependencies within annotated cluster signatures. CoDEX represents the integration of CRISPR-

based co-dependency annotation with a tumor-specific co-expression network that we derived from a

curated cohort of patient-derived xenografts (PDXs), and it allows us to quantitatively delineate context-

specific vulnerabilities within high-resolution maps of transcriptomic diversity.

To develop CoDEX, we first refined and annotated the PDX PDAC co-expression network into 31

biologically defined gene clusters. Then, using prominent anticorrelated cluster signatures derived from

the co-expression network, we defined PDAC classical and basal-like subtypes on a continuous scale

and quantified the transitory nature of the transcriptional signature underlying these molecular subtypes,

which defined a group that exhibited a “quasi-basal” signature. We captured the subtype-associated

signatures, observed in PDX models, across TCGA tumor samples, and also at the single-cell level in

sequenced PDX-derived cell lines and human PDAC primary tumor samples. Finally, leveraging our

previously optimized in vivo genetic interference screening platform12,13, we used a customized CRISPR

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 4: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

library to expand on interacting nodes identified through a protein-protein interaction network, which

identified a spectrum of interconnected dependencies characteristic of each PDX model tested.

CoDEX is a novel approach to establish context specificity of individual gene targets within PDAC

molecular subtypes. This bears translational significance for several reasons. First, gene centrality serves

as a quantitative method to prioritize cluster-representative genes, with centrality and direct co-expression

jointly defining ideal biomarkers for subtype-specific dependencies. Second, by anchoring dependencies

identified through CRISPR screening within the co-expression network, we are able to uncover anti-

correlative cluster signatures that inform on the molecular context underlying each molecular subtype.

Fundamental dependencies associated with the basal-like subtype were validated intratumorally using

feature barcoding and single-cell RNA sequencing (scRNAseq), which directly supported the functional

relevance of specific genetic perturbations both at the level of sub-clonal composition and network-

defined transcriptomic signatures. Third, CoDEX annotation of bulk tumors uncovers an opportunity to

selectively eradicate dominant sub-populations and explore transcriptomic heterogeneity as a metric to

characterize tumor response to perturbation, as opposed to tumor size alone.

In sum, this work describes the application of a novel, quantitative methodology to characterize and

stratify genetic targets on transcriptomic signature patterns, which represents an important advancement

toward the development of subtype-specific targeted therapies in PDAC.

RESULTS:

Defining Transcriptomic Diversity in PDAC through the Construction and Annotation of a PDX-

based Co-Expression Network

To assess the diversity of transcriptomic signatures among PDAC tumors, we selected early passage

patient-derived PDAC xenografts, which maintain the cellular heterogeneity of tumor lesions while

reducing the contribution of the stromal components prevalent in these tumors14. We performed whole-

transcriptome sequencing on a set of 48 PDAC PDX tumors curated at MD Anderson (Figure 1A). Within

this cohort, we recognized a level of transcriptional diversity consistent with the previously defined Moffitt

classification status and distribution of classical and basal-like pancreatic adenocarcinoma (Extended

Data Figure 1A). In addition, we conducted whole-exome sequencing, which confirmed mutation

frequencies comparable to those reported by the TCGA4 (Figure 1B).

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 5: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

To characterize our cohort, we calculated global pairwise Pearson correlations of genes with variable

expression to quantify concordant patterns of transcriptomic diversity across our models. These

correlations were further pruned to prioritize biologically relevant gene pairs15, and from the resulting

103,000 correlations of 7,828 genes, we established a PDAC co-expression network (Figure 1A and

Extended Data Figure 1B). Using InfoMap16, a community detection tool, we divided the co-expression

network into 31 clusters, which were further genome ontology (GO) annotated (Figure 1C). We measured

PDAC diversity by applying a dimensional reduction approach on our 31 defined clusters. Specifically, we

quantified the mean expression of each cluster, or centroid score, on a tumor-by-tumor basis, for all 48

PDX models in the PDAC cohort (Figure 1D). To determine if the mutational background of a model was

significantly associated with cluster enrichment across the PDX cohort, we applied UNCOVER17, a

method to identify complementary patterns of mutation enrichment across groups.

We observed general patterns of anti-correlative clusters across the PDAC PDX cohort that were

reflected in cluster positioning and subsequent cross-cluster connectivity. The most significant

anticorrelated signatures were identified in clusters predominantly localized to adjacent ends of the force-

directed layout (Figure 1D - F), and the non-overlap of these adjacent anti-correlated cluster trends

implicates multiple distinct molecular signaling contexts that represent the immense diversity across the

PDAC disease spectrum. The top anti-correlative signatures were quantified between two opposing

clusters: Cluster 1 vs. Cluster 23, respectively enriched for lipid metabolism vs. cell development; and

Cluster 2 vs. Cluster 13, respectively enriched for Golgi-vesicle transport vs. nuclear-transcribed mRNA

catabolism (Figure 1E - F). Additionally, the PDAC co-expression network also highlighted GO annotated

clusters critical for proliferating tumor cells; specifically, Cluster 15, cell cycle, and Cluster 16,

mitochondrial transport and organization (Figure 1C and Extended Data Figure 1D). These findings reveal

that, along with 31 unique gene clusters, two distinct anti-correlated cluster signatures contribute to PDAC

tumor diversity and have the potential to provide context for tumor cell-intrinsic vulnerabilities.

Interestingly, we identified a limited cluster-mutation association between ACVR2A, RREB1, and MARK2

mutations and tumors with significant enrichment in Cluster 21 (Extended Data Figure 1C). PDAC-

associated loss-of-function mutations in RREB1, which encodes a zinc finger transcription factor that

binds to RAS-responsive elements, have been reported in the TCGA PDAC cohort4. RREB1 is a positive

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 6: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

regulator of the zinc transporter, ZIP3, with loss-of-function playing a potential role in limiting zinc uptake

and shielding developing tumors from the cytotoxic effects of high cellular zinc concentrations18. RREB1

has also been described as a KRAS-regulated SMAD co-factor involved in driving the expression of

epithelial-to-mesenchymal (EMT) transcription factors19. While the significance of mutations in the serine-

threonine kinases, MARK2 and ACVR2A, is relatively poorly understood in PDAC, dysregulation in these

genes could implicate the regulation of epithelial polarity and downstream SMAD-associated signaling,

respectively20-22. Associating co-expressed and annotated gene clusters with these less frequent

mutations in PDAC could aid in illuminating the molecular signaling, and potential therapeutic avenues,

underlying these genomic alterations.

In vitro and in vivo screening of essential signaling pathways in PDAC

We selected four PDXs out of those used to establish the co-expression network based on the presence

of mutations representative of PDAC (Extended Figure 2B and 2C), and we employed early passage cell

lines (PDX lines) of these models for genetic screening. We applied a stepwise custom functional

genomics platform for screening Moffitt-defined classical (PATC69) and basal-like (PATC124, PATC53,

and PATC153) PDX lines in parallel in vivo and in vitro (Figure 2A and Extended Data Figure 2A). Small,

customized lentiviral libraries were used to ensure maintenance of library complexity in vivo, based on

tumor-initiating cell frequency assessment12.

To perform in vivo and in vitro RNAi screens, we designed a surface proteins-targeting library (Figure 2A

and Extended Table 1A) based on evidence of differential expression in pancreatic tumors compared to

matched-normal tissue, copy number versus RNA expression correlation trends from the TCGA PDAC

dataset, and SILAC screening for mutant Kras dependency (Extended Data Table 1A)4,23-27. We used

redundant shRNA activity (RSA) to statistically score and identify candidate protein hits whose targeting

shRNA were selectively depleted in the screens (Extended Data Figure 2D - G). Through these initial

RNAi screens, we functionally annotated a wide range of PDAC-associated dependencies at the cell

surface and uncovered oncogenic signaling diversity across the PDX lines (Extended Data Figure 2).

We subsequently built a custom sgRNA library that re-annotated and expanded upon the RNAi-identified

protein surface dependencies (See Methods) across all of our PDX models (Figure 2A, Extended Data

Figure 2H). Using a CRISPR-based approach, the library incorporated sgRNAs targeting highly

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 7: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

connected proteins (see Methods) defined by a stringent STRING protein-protein interaction (PPI)

network (Figure 2A and Extended Data Table 1B)28 used to evaluate oncogenic signaling redundancies

and interpret co-dependencies. Three out of four available PDX lines (PATC69, PATC124 and PATC53)

were compatible with the CRISPR-based screening technology in vivo (Extended Data Figure 2B). We

conducted each in vitro CRISPR screen as a time course, with a matched in vivo endpoint (Extended

Data Figure 2A).

Data from the sgRNA screens were not amenable to analysis using current analytical frameworks (e.g.

MaGeCK29, JACKS30, CERES31, and BAGEL32,33), which are designed for genome-wide screens and are

not tailored for the small training sets of our custom library. To address this, we adapted the BAGEL

framework32,33 to create Low-Fat BAGEL, which is optimized to analyze small targeted sgRNA libraries

that are needed for in vivo screening or in other experimental settings where complexity may be a

limitation. The difference in performance between BAGEL and Low-Fat BAGEL in dealing with outliers

(Extended Data Figure 3A) is exemplified in the Bayes Factors (BF) (Extended Data Figure 3B and

Extended Data Figure 3C) and Precision-Recall curves for a particular screen in PATC69 PDX lines

(Extended Data Figure 3D). In comparison to other genome-wide analytical methods, Low-Fat BAGEL

analyses demonstrate better performance and a more accurate classification of essential and

nonessential genes in our screens (Extended Data Figure 3E). To ensure quality control, complexity

coverage for RNAi and CRISPR libraries was confirmed for each in vitro and in vivo screen (Extended

Data 4A - C). Control separation was confirmed in RNAi screens, and fold change separation and

precision-recall of the 50 essential and 50 non-essential control populations were confirmed for each

CRISPR screen using Low-Fat BAGEL (See Methods, Extended Data Figure 4D - I).

Quantile-normalized BFs (BF > 1) were directly compared to uncover essential genes32 represented in the

three models in vitro and in vivo (Figure 2B and Extended Data Figure 5A). Dependencies among the

three PDX lines, in both in vitro and in vivo conditions, were highly diverse. Ribosome-associated RACK1,

RPL30, and the MYC proto-oncogene were the only shared vulnerabilities identified in all in vivo and in

vitro screening contexts (Figure 2B and Extended Data Figure 5A). Quantile-normalized BFs also

highlighted varying degrees of essentiality for KRAS, with PATC124 growth exhibiting less dependence

on KRAS compared to PATC69 and PATC53 PDX lines (Figure 2C - E and Extended Data Figure 5B -

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 8: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

D), which suggests that oncogenic RTK buffering may contribute to signaling complexity even in the

context of mutated KRAS.

CoDEX platform identifies EMT-associated dependencies along prominent PDAC molecular

signatures

Next, we integrated quantile-normalized BFs with the STRING PPI network used to build the sgRNA

library to generate dependency networks for each model. Dependency networks were merged across all

three PDX lines based on overlapping essentiality thresholds (BF >1) for in vivo and in vitro contexts and

displayed using a force-directed layout to visualize connectivity between established and novel gene

targets (Figure 2C - E and Extended Data Figure 5B - D). The in vivo PATC53 dependency network

highlighted multiple unique groups of interconnected vulnerabilities when compared to PATC69 and

PATC124 dependency networks, despite PATC53 and PATC124 both being classified as basal-like,

based on the Moffitt signature (Extended Data Figure 2B). Notably, interconnected vulnerabilities in

PATC53 were associated with epithelial-to-mesenchymal transition (EMT) (e.g. SNAI1, ZEB1, SMAD4,

MAPK11 and MAPK14), integrin signaling (e.g. ILK, ITGB1, ITGB2 and NPNT), heparan sulfate

proteoglycan regulation (e.g. SDC3 and EXT1), cell junction regulation (e.g. CTTN, TJP1 and TJP2), and

intracellular signaling kinases (e.g. BCAR1, NCK1, CRKL and CRK) (Figure 2F - H, Extended Data Figure

5E - G).

To anchor these vulnerability trends within the landscape of PDAC diversity, we then integrated the

CRISPR screen-defined dependencies within our co-expression network. The integration displayed a

clear shift in the PATC53-associated dependency spectrum along the Cluster 1 -to- Cluster 23 axis, with

many of the dependencies identified in the CRISPR screen localized within, or adjacent to, Cluster 23

(Figure 1D, Figure 2F - H, Extended Data Figure 5E - G). Specifically, MAPK11 and FAM171A2 were

localized within Cluster 23 itself; NKAIN4 in Cluster 25; and SMAD4, ZEB1 and SDC3 in Cluster 31

(Figure 1D, Figure 2F - H, Extended Data Figure 5E - G). The network localization of functionally

annotated and PATC53-specific ZEB1, an EMT-associated transcription factor, and SMAD4, an EMT

facilitator and oncogenic driver in advanced PDAC, implicated a connection between the Cluster 1-to-23

(C1vC23) axis, and its associated dependencies, with PDAC epithelial-to-mesenchymal

transdifferentiation34,35.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 9: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Prominent anticorrelated cluster signatures outline a continuous classical-to-basal differential

signature

The distinct gene clusters we identified in the PDAC co-expression network provide a means to deeply

characterize the diversity of any PDAC model by considering correlations in disease-specific cluster

enrichment patterns, a parallel strategy to current approaches that use consensus clustering to assign a

tumor subtype based on refined PDAC-specific gene sets. Thus, this refined co-expression network can

comprehensively quantify global transcriptomic signaling trends on a tumor-by-tumor basis. Upon

observing the clear dependency shift along Cluster 23 in the basal-like PATC53 model (Figure 2H), we

noted a strong anti-correlative trend between Cluster 23 and Cluster 1 centroid scores among the entire

PDAC PDX cohort (Figure 1E). The Cluster 1 centroid scores across our PDX cohort showed low

variance among Moffitt-defined classical models, but a wider range and significant depletion of Cluster 1

gene expression (p = 3.67 x 10-6) was observed in basal-like models (Figure 3A). Interestingly, Cluster 1

unbiasedly localized the entire set of 21 classical signature genes from the Moffitt classification. Together,

these findings suggested that we could leverage the anti-correlated C1vC23 axis as a classical to basal-

like differential signature. Indeed, by applying K means clustering on only the Cluster 1 and Cluster 23

centroid scores, we separated the PDX cohort into three groups (Figure 3B): 1) enrichment in Cluster 1

represented Moffitt-defined classical models, 2) enrichment in Cluster 23 represented Moffitt-defined

basal-like models, and 3) models with partial enrichment of both Clusters 1 and 23 (Figures 3C - D),

which we termed “quasi-basal”. Thus, our C1vC23 signature provides a continuous transcriptomic

signature that expands on the original binary Moffitt classification and uncovers a transitionary quasi-

basal phenotype of PDAC with molecular signatures falling along a continuum.

To gain further insight into the molecular signatures driving PDAC subtypes, we conducted gene-set

enrichment analysis to compare CoDEX-defined classical vs. basal-like models within our PDX cohort.

This highlighted that EMT signaling was significantly enriched among PDX models that fell into the

CoDEX definition of basal-like, whereas this gene set was depleted in classical models (Figure 3E).

Enrichment of an EMT signature in basal-like PDX models supports the hypothesis that the basal-like

subtype is strongly associated with tumors where a majority of tumor cells has at least partly undergone

transdifferentiation towards a mesenchymal phenotype. Consistently, analysis of clinical and histological

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 10: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

data (See Methods) for the PDX cohort demonstrated that basal-like tumors were uniformly associated

with poor differentiation status and distant metastatic recurrence (Figure 3B). Next, using high-epithelial-

content PDAC patient data from TCGA (30% - 80% epithelial cells), we confirmed the presence of the

network-derived subtyping, again identifying a quasi-basal continuum. Further, we recapitulated the

association between tumor histology and the network-derived subtype in the TCGA dataset, wherein

more poorly differentiated tumors were classified as basal-like, with strong enrichment in Cluster 23 gene

expression (Figure 3F). By quantitatively characterizing a quasi-basal population, the CoDEX platform

enables a more precise definition of the clinically relevant basal-like tumor cohort while also expanding on

associated molecular dependencies that may be considered for therapeutic intervention7. Moreover,

CoDEX defines a broader, cluster-level characterization of the range of molecular signaling contributing to

diversity in PDAC, which can be used to granularly ascertain pathways that may be therapeutically

targeted to exert anti-tumor effects across this classical, quasi-basal and basal-like tumor spectrum.

Single-cell transcriptomic profiles of tumors define a cell-intrinsic clonal signature of PDAC

subtypes

To investigate whether the quasi-basal signature identified in bulk tumor populations represents a

quantifiable cell state or a mean signature derived from competing subcellular populations, we conducted

single-cell RNAseq (scRNAseq) on the PATC69 (quasi-basal), PATC124 (quasi-basal), and PATC53

(basal-like) PDX lines. The co-expression network was used to identify cells expressing more than 30% of

any cluster, and a centroid score for that cluster was calculated using genes with expression highly

correlated to the cluster enrichment (r > 0.4). By analyzing scRNAseq data in the context of the co-

expression network, we circumvented the technical issue of signal dropout by prioritizing large gene

clusters to represent transcriptional diversity rather than single gene expression. Thus, the co-expression

network serves as an additional resource for disease-specific single cell analysis. This approach

successfully confirmed the presence of a quasi-basal C1vC23 signature as a quantifiable state in

individual cells within in each PDX model (Figure 4A - D), confirming that the bulk readout represented an

average of the intratumoral spectrum of classical, quasi-basal, and basal-like subclonal populations,

rather than competing classical and basal-like signatures.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 11: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

To evaluate the utility of the network-defined tumor subtypes in a clinical setting, in which small numbers

of cells and stromal components may influence molecular analyses, we evaluated seven patient core

needle biopsies (CNBs; four primary tumors, as well as one each of liver, lung, and vaginal metastases)

using the network-aided scRNAseq analysis36 (Figure 4E, Extended Data Figure 6A). We first defined the

distribution of cell types amongst our samples (Figure 4F) using previously annotated marker genes

(Extended Data Figure 6B) and characterized cluster representation across the tumor microenvironment

by quantifying the mean percentage of genes of each cluster for each cell type. This identified the

epithelial component of the tumor as the primary contributor to the co-expression network cluster

signatures (Figure 4G). Applying the same cluster centroid normalization method described above, the

majority of cells that met our quality control cutoff were epithelial cells, with very little representation from

two primary tumor samples (Primary 1 and Primary 2) that contained little to no epithelial content

(Extended Data Figure 6A). Finally, to determine where each cell type was represented on the classical to

basal-like continuum, we assessed the C1vC23 cluster differential. We found that the C1vC23

classification is largely present within epithelial cells, with Uniform Manifold Approximation and Projection

(UMAP) clusters of fibroblasts and endothelial cells only representing a minority of misclassified C1vC23

signatures within the representative multiregional tumor microenvironment (Figure 4H). This finding

confirms a lack of sample purity bias in the characterization of C1vC23 signatures in the TCGA PDAC

samples (outlined in Figure 3F) and supports the feasibility of applying network-based cluster

characterization to bulk clinical samples, including CNBs.

CoDEX platform prioritizes subtype-associated biomarkers

Our PDAC-specific co-expression network allows us to leverage cluster centrality to associate cluster-

representative biomarkers to signatures, and the CRISPR screens uncovered first-degree nodes of

specific dependencies that can also serve as significantly co-expressed biomarkers. By jointly applying

centrality and defining hit-specific nodes to prioritize biomarkers, we capture both broader, disease-

specific cluster patterns as well as the expression of individual genetic vulnerabilities of potential interest.

Accordingly, we used multiplex immunofluorescence (IF) to independently recapitulate the subtype-

defining C1vC23 transcriptomic signature at the protein level. This method provides an avenue to

quantitatively prioritize potentially useful target-associated biomarkers while also serving as a protein-

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 12: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

level validation of this transcriptomic signature. As a proof of concept, we first determined closeness

centrality for all genes in both Cluster 1 and Cluster 23 (Figure 5A). We then defined the first-degree

nodes of two in vivo targets, DNM2 and MAPK11, which localized to Cluster 1 and 23, respectively

(Figure 5B - C). Among the first-degree nodes, we then prioritized genes in each gene set based on

closeness centrality and antibody availability, which resulted in selection of VSIG1 and VIM as

representative biomarkers of Cluster 1 and 23, respectively (Figure 5D). IF analysis and image-based

quantification of VSIG1 and VIM, when combined with double staining for HLA, successfully recapitulated

the expected C1vC23 signature at the protein level in PDX models with the representative classical

(VSIG+), quasi-basal (VIM-/VSIG-), and basal-like (VIM+) transcriptomic signatures (Figure 5E, F). These

findings demonstrate the utility of jointly leveraging cluster-centrality and defined gene targets to

quantitatively prioritize protein biomarkers associated with relevant transcriptional profiles. Moreover,

these biomarkers provide protein-level validation of transcriptomic cluster trends and serve as accessible

alternatives for clinical characterization.

CoDEX-informed genetic dependencies functionally validate intratumoral classical, quasi-basal,

and basal-like molecular signatures

By integrating CRISPR screen-defined co-dependencies with the co-expression network, the CoDEX

platform can identify both common and unique dependencies within the larger context of PDAC diversity.

To determine the functional relevance of this approach, we evaluated the effect of perturbing gene targets

associated with the basal-like subtype that were prioritized through our CoDEX analysis in varied tumor

contexts (Figures 2F - H, Figure 6A). Based on our in vitro CRISPR screening results, we selected

sgRNA sequences targeting SMAD4, ZEB1, and ILK, as well as the non-essential gene, ABCG8, as a

negative control (Figure 6B, Extended Data Figure 6A). CoDEX-informed SMAD4 and ZEB1, both

localized within the network, were targeted for C1vC23 signature validation. ILK, a CRISPR-defined

dependency in PATC53 not present in the co-expression network, was selected to determine whether

knockout of this potential EMT regulator would also have the capacity to influence the C1vC23 signature

differential37-39. In addition, we applied a feature barcoding strategy whereby a complement sequence

was incorporated into the 3’ end of the sgRNA sequences, enabling scRNAseq sample multiplexing and

quantification of the C1vC23 signature shift relative to the sgABCG8 negative control distribution (Figure

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 13: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

6B). Individual sgRNAs derived from the CRISPR library were transduced into quasi-basal PATC69 and

basal-like PATC53 cells (Extended Figure 7A). Cells were cultured cells in vitro and collected at the

earliest point when separation of essential versus non-essential genes was observed in the original

CRISPR screen (Day 20) (Figure 6B and Extended Data Figure 4D - H). Sanger sequencing was used to

analyze the indel frequency of each sgRNA (Extended Data Figure 7B - K), and colony growth was

tracked for each sgRNA to confirm selective growth inhibition in the basal-like PATC53. (Figure 6B).

Multiplexed scRNAseq was conducted on 10,000 cells total (2,500 cells per sgRNA) for each PDX line

model.

PATC53 UMAPs revealed a clear separation between the ABCG8 negative control knockout cells and

populations with perturbations in combined Cluster 23 (SMAD4 and ZEB1) and basal-like (ILK)-

associated genes, whereas no separation was observed between test genes and the negative control in

PATC69 cells (Figure 6C, E). Also, as expected, the PATC69 population transduced with negative control

sgABCG8 contained C1vC23-defined classical and quasi-basal cells, while the basal-like PATC53

population contained quasi-basal and basal-like cells (Extended Data Figure 8C - D). To test the C1vC23

signature distribution shift relative to the reference distribution defined by the ABCG8-null negative

control, C1vC23 density plots were generated for each sgRNA knockout cell line (Figure 6G - H), and

Kolmogorov’s D statistic was used. Deletion of ILK, SMAD4, and ZEB1 each resulted in a significant

C1vC23 shift towards the Cluster 1-enriched classical signature in both PATC69 and PATC53 (Figure 6G

- H). In parallel with scRNAseq, transduced populations were seeded in vitro immediately following

selection to compare relative growth phenotypes for both PDX lines (Figure 6B). Coinciding with the

signature shift towards Cluster 1 at the single-cell level, perturbation of each of the three basal-like-

associated genes inhibited bulk population growth in PATC53 relative to sgABCG8-knockout controls,

whereas no growth phenotype was observed in the classical PATC69 (Figure 6D, F and Extended Data

Figure 8A, B).

In summary, we provide functional validation of the C1vC23 signature based on concordant signature

shifting, in both quasi-basal PATC69 and basal-like PATC53 cell populations, following genetic

perturbation of basal-like associated targets. Importantly, these findings provide evidence for our CoDEX

platform to uncover the intratumoral context of vulnerabilities localized within and adjacent to Cluster 23,

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 14: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

by quantifying the targeted depletion of C1vC23-defined basal-like cell populations. This represents a

novel approach to leverage network-informed signaling perturbation to influence heterogeneity in

pancreatic tumors.

DISCUSSION:

PDAC characterization and subtyping centered on the categorical Moffitt classical and basal-like

molecular subtypes are able to stratify patients based on clinical outcome, but have not yet been able to

guide precision medicine7. We developed CoDEX as an unbiased platform to quantify the molecular

signatures that contribute to PDAC diversity and provide signaling context to functionally annotated

genetic dependencies. By applying the CoDEX platform to our cohort of PDAC PDX models and patient

samples, we identified a differential signature derived from anti-correlated Clusters 1 and 23, C1vC23,

which demonstrated a classical-basal continuum corresponding with EMT. GO annotations underlying the

identified clusters provided novel insight into the oncogenic signaling characteristics of the Moffitt-defined

classical and basal-like subtypes. Cluster 23, which defined basal-like subtypes, was characterized by

gene signatures associated with cell projection organization, cell motility, and general cell development.

Alternatively, Cluster 1, which defined classical tumors, exhibited a transcriptomic bias towards lipid

metabolism, hormone regulation, and other xenobiotic metabolism signaling genes. Lipid metabolism and

lipid droplets have previously been observed to play important roles in generating the fatty acids required

for pancreatic cancer cell proliferation40. Thus, linking de novo lipid synthesis to the classical tumor

spectrum provides key insight into potential context-specific metabolic dependencies in lower histological

grade PDAC40.

Application of the C1vC23 signature yielded a notable refinement in the Cluster 23-enriched basal-like

cohort that was defined by a significant enrichment in mesenchymal-associated clinical features and

genetic dependencies. This basal-like refinement was due to the quantification of a near-zero differential

in the C1vC23 signature at the bulk tumor and single-cell levels, which we defined as quasi-basal.

Importantly, this quasi-basal single-cell population was significantly represented within both bulk classical

and basal-like tumors, expanding on the Toronto classification approach that demonstrated co-

occurrence of classical and basal-like subtype heterogeneity at the single-cell level10,41.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 15: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

In addition to quantifying the transcriptomic diversity within PDAC, the CoDEX platform integrates

prioritized in vivo functional genomics on our PDX samples that results in a clinically relevant, context-

forward approach to contextualize the nature of genetic dependencies. By using PDXs, we overcome the

lack of Moffit-defined classical cell lines available for PDAC research7. scRNAseq analysis of

phenotypically and genetically diverse PDX models recapitulated findings from bulk tumors, revealing

vast intratumoral heterogeneity, and a cluster-defined classical to basal-like clonal spectrum with a

discrete quasi-basal signature in individual cells. Furthermore, scRNAseq of patient samples

recapitulated the C1vC23 intratumoral spectrum and revealed that the majority of the co-expression

network signatures was intrinsic to tumor cells.

We validated our C1vC23 signature by directly quantifying the effect of informed target perturbation on

C1vC23 subclonal heterogeneity through feature-barcoded scRNAseq. Multiplexed CRISPR deletion of

basal-like associated and cluster 23-anchored targets, ILK, SMAD4, and ZEB1 in knockout cells

quantified a “subclonal shift” toward the classical signature. This change in molecular dominance,

observed in both basal-like and quasi-basal populations, functionally demonstrated context-specific gene

essentiality as a function of disease evolution. Furthermore, knockout of established mesenchymal

drivers, such as SMAD4 and ZEB1, served to validate the C1vC23 signature. Consistent with previous

studies, our findings support a dual function of SMAD4 in PDAC, and the CoDEX-defined C1vC23

signature puts the tumor suppressor and tumor promotor roles of SMAD4 within distinct molecular

contexts35,42.

Our integrated CoDEX approach introduces multiple avenues for hypothesis generation within the context

of tumor heterogeneity. For example, anchoring of relatively less studied targets (e.g., FAM171A2 and

NKAIN4) to Cluster 23 implicates context-specific roles in mesenchymal cell proliferation and

maintenance. Although our results here focused on selective gene dependencies within highly aggressive

basal-like sub-clones, functional maps developed through the CoDEX pipeline also unraveled Cluster 1-

associated vulnerabilities, such as DNM2, with potential to eradicate classical clonal subpopulations.

Thus, while the quasi-basal population exhibits a near-zero differential in the C1vC23 signature,

associating other network-derived cluster signatures (e.g. C2vC13) may be useful to identify selective

genetic dependencies and further stratify this heterogeneous sub-population. This context-forward

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 16: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

approach to stratify dependencies provides a novel avenue to expand on current PDAC subtypes and to

define gene targets localized within additional cluster patterns underlying PDAC heterogeneity.

Notably, the subclonal shift induced by ILK knockout highlights the fact that additional metrics at the

protein-level are needed to more comprehensively quantify PDAC diversity, and underscores the need to

integrate protein status with network cluster trends to understand how molecular signaling is altered or

repositioned following major transcriptomic rewiring. Future integration of the co-expression network with

partially recapitulated or intact immune and stromal components may also elevate functional

characterization from the cell-intrinsic to tumor-intrinsic setting.

The use of CoDEX as an integrated approach to deconvolute tumor heterogeneity into relatively more

homogenous and approachable cluster-defined subclonal populations may, ideally, help to direct

treatment options to more significantly and comprehensively impact pancreatic tumors. This strategy may

also inform on the implementation of novel or existing drugs that may mitigate tumor heterogeneity, but

have only a negligible effect on tumor volume and viability. To our knowledge, CoDEX serves as the first

platform to define and leverage a comprehensive set of transcriptional signatures as a foundation to

inform context-specific dependency.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 17: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

METHODS:

PDX models and Sequencing (RNA and Whole Exome)

A total of 49 models were utilized in this paper. PDAC PDX models were obtained from the labs of Dr.

Michael Kim (Department of Surgical Oncology, MD Anderson Cancer Center) and Dr. Scott Lowe

(Memorial Sloan Kettering Cancer Center)43,44. PDXs were propagated and maintained in NOD scid

gamma (NSG) mice carrying NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (Jackson Labs).

PDX Sequencing

Whole exome library preparation and sequencing

Whole exome sequencing (WES) libraries were prepared using the Agilent SureSelect XT library

preparation kit in accordance with the manufacturer’s instructions. Briefly, DNA was sheared using a

Covaris LE220. DNA fragments were end-repaired, adenylated, ligated to Illumina sequencing adapters,

and amplified by PCR. Exome capture was performed using the Agilent SureSelect XT v4 51Mb capture

probe set and captured exome libraries were enriched by PCR. Final libraries were quantified using the

KAPA Library Quantification Kit (KAPA Biosystems), Qubit Fluorometer (Life Technologies) and Agilent

2100 BioAnalyzer and were sequenced on an Illumina HiSeq2500 sequencer using 2 x 125bp cycles.

Base calling and filtering were performed using current Illumina software and adapters were trimmed

using Trim Galore [55]. Sequences were aligned to both NCBI genome human build 37 and mouse build

38 using Burrows-Wheeler Aligner45; identified mouse reads were removed from the original FASTQs and

then the files were realigned again to NCBI build 37 using BWA. Picard was used to remove duplicate

reads (http://picard.sourceforge.net); base quality scores were recalibrated using GATK46. Assessment of

reads that do not align fully to the reference genome was performed, locally realigning around indels to

identify putative insertions or deletions in the region. Variants were called using GATK HaplotypeCaller,

which generates a single-sample GVCF. To improve variant call accuracy, multiple single-sample GVCF

files were jointly genotyped using GATK GenotypeGVCFs, which generates a multi-sample VCF. Variant

Quality Score Recalibration (VQSR) was performed on the multi-sample VCF, which adds quality metrics

to each variant that can be used in downstream variant filtering.

RNA library preparation and sequencing

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 18: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

RNA sequencing libraries were prepared using the Illumina TruSeq Stranded mRNA sample preparation

kit in accordance with the manufacturer’s instructions. Briefly, 100ng of total RNA was used for

purification and fragmentation of mRNA. Following conversion of mRNA to cDNA, DNA was adenylated,

ligated to Illumina sequencing adapters, and amplified by PCR (using 10 cycles). Final libraries were

quantified using the KAPA Library Quantification Kit (KAPA Biosystems), Qubit Fluorometer (Life

Technologies) and Agilent 2100 BioAnalyzer and were sequenced on an Illumina HiSeq2500 sequencer

(v4 chemistry) using 2 x 50bp cycles. Base calling and filtering were performed using current Illumina

software. Reads were aligned to a joint index of NCBI genome human build 37 and mouse build 38 with

STAR aligner47. Reads that map uniquely and unambiguously to the mouse genome were removed from

the FASTQ files and then the files (containing unmapped reads and reads mapped at least once to the

human reference) were remapped to GRCh37 using STAR aligner and Gencode 19 annotation. Gene

expression quantification was performed with featureCounts (http://bioinf.wehi.edu.au/featureCounts/).

Genes with raw read counts present in less than 20% of samples were removed from further analysis,

and log normalized counts were generated on the 14175 filtered genes using DESeq48.

Moffitt Classification

Moffitt classification of Classical and Basal was assigned to the 49 PDX models using log normalized and

scaled counts of 21 classical and 25 basal genes for PDX PDAC model classification described by

consensus clustering into two groups using the R package ConsensusClusterPlus49. The clusters were

manually assigned as classical and basal based on high expression of each group of genes.

Construction of the PDAC co-expression network

Gene-wise median absolute deviation across samples from the normalized read counts. 8505 genes with

median absolute deviation exceeding the 40th percentile was filtered to consider highly variable genes. A

comprehensive network of all 8505 was generated by calculating spearman correlation across all gene

pairs. In order to prune the network to prioritize biologically relevant gene pairs, a Bayesian framework

developed by Yang, et al, Log Likelihood Score (lls), was applied15. This paper curated a “positive gold

standard” list of biologically relevant gene pairs and “negative gold standard” gene pairs with no known

functional annotations. The likelihood of correlation between any given gene-pair being functionally

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 19: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

relevant is calculated comparing to the negative and positive gold standard gene pairs. A log likelihood

score of 2.5 was used to cut off gene pairs to be included in the network.

Clustering and GO pathway annotation

Infomap16, a community detection tool for large networks, was used to cluster the network. All gene pairs

were inputted into Infomap, and three hierarchical tiers of clusters were produced. In order to assign

clusters, if the third tier contained more than 50 genes, it was assigned to an individual cluster. In the

event that the third tier contained less than 50 genes, it was folded into the second tier, all of which

formed a cluster. This was the case with cluster 29-31, which are larger encompassing clusters. For the

resulting 31 clusters, the R package goseq50 was used to conduct hyper-enrichment analysis of Gene

Ontology Biological Processes pathways on each cluster. The R package “revigo”51 was used to prioritize

and visualize GO pathways to represent their hierarchical class.

Mutation analysis

The “oncoplot” feature within the R package maftools52 was utilized to visualize the mutational spectrum

across the genes identified as relevantly mutated by the PDAC TCGA paper. To identify complementary

patterns of mutations across clusters, we used a method called UNCOVER17 using a filtered set of “high

impact” mutations and the cluster enrichment centroid scores per model. The filtered set of mutations was

generated by identifying canonical mutations with gnomAD53 allele frequency less than 1%, “moderate”

and “high” impact, and limited to non-intronic/non-coding and synonymous mutations.

Centroid scores

To generate a cluster level enrichment score, we calculated a centroid score per cluster by taking the

mean log-normalized expression of all genes in each cluster for each sample. A comprehensive overview

of the 31 centroid scores across the PDX models was generated using the R package

ComplexHeatmap49. We further computed a Pearson correlation across all 31 clusters and identified

strongly anticorrelated signature between Clusters 1 and 23.

Network Display

Cytoscape (version 3.7.2) was leveraged for visualization of both the PDAC co-expression network and

STRING-anchored co-dependency networks54,55. The PDAC co-expression network was visualized using

a Prefuse Force-Directed Layout, with node color displayed based on Random Walk defined clusters and

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 20: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

node size representative of Betweenness Centrality. For visualization purposes, only nodes with 12 or

more edges are represented, and edges are not displayed in representative co-expression network

figures. For STRING-anchored co-dependency networks, a Perfuse Force-Directed Layout is also

applied, with quantile-normalized Bayes Factors represented as a blue -to- red color distribution. For each

in vivo and in vitro condition, networks were constructed for each independent PDX line and then merged

for comparison based on overlapping nodes. All STRING-defined edges were maintained for in vivo and

in vitro network merging.

Cell Culture

PDX cell lines were seeded in treated tissue-culture plates (Corning) in DMEM/F12 medium (Gibco)

supplemented with 10% Fetal Bovine Serum (Gibco), Penicillin (50 units/mL) and Streptomycin (50

µg/mL) (ThermoFischer Scientific). Phosphate Buffered Saline (PBS) was utilized prior to trypsinization

and for general cell washing purposes (Thermo Fisher Scientific). Cells were regularly trypsonized

(0.25%, Trypsin-EDTA, Gibco) prior to reaching 70% - 80% confluence, and maintained on 10 cm and 15

cm treated tissue-culture dishes (Corning). Viable cells were counted using a Cellometer mini and 0.2%

Trypan Blue staining (Nexcelom).

Design and Construction of Custom RNAi Library

The original RNAi custom library, constituted by 2,653 shRNAs targeting PDAC-prioritized proteins

associated to the extracellular face of the plasma membrane, was constructed using chip-based

oligonucleotide synthesis and cloned into a pRSI17cb-U6-sh-13kCB18-HTS6-UbiC-TagGFP2-2A-Puro

lentiviral vector (Cellecta) as a pool. PDAC-prioritized surface proteins (including proteins with evidence of

mislocalization) were selected based on tumor-specific overexpression compared to matched-normal

tissue, copy number vs. RNA expression correlation, and SILAC screening for KRAS dependency26.

Significant tumor-specific overexpression and copy number vs RNA expression were determined for the

full transcriptome, and subsequently refined using an internally curated list of select extracellular proteins

containing transmembrane domains, GPI-anchored proteins and proteins with evidence of membrane

mislocalization. Tumor-specific overexpression vs. matched normal tissue was defined based on two

matched-normal bulk tumor microarray datasets (FC ≥ 1.5, q < 0.005, ArrayExpress: E-GEOD-15471,

EG-GEOD-28735), and further refined for tumor-specific expression (FC ≥ 1.5) by using a microarray

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 21: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

dataset composed of micro-dissected samples (ArrayExpress: E-MEXP-1121/950)23-25,56,57. Positive

Spearman correlations (rho ≥ 0.35, q < 0.005) of copy number vs. RNA expression were calculated based

on available TCGA level 3 data derived from the 07/15/2014 dataset58. False discovery rate for bulk tumor

microarray datasets and for copy number vs. RNA expression correlations was calculated using the

“qvalue” Bioconductor package59. Genes derived from published data characterizing KRAS dependent

surface protein localization, conducted across three iKRAS p53L/+ mouse60 tumor-derived cell lines, were

also incorporated into the library based on previous work26. The shRNAs targeted 241 genes, with 10

shRNAs/gene. Targeting sequences were designed using a proprietary algorithm (Cellecta). The oligo

corresponding to each shRNA was synthesized with a unique molecular barcode (18 nucleotides) for

measuring representation by NGS. Negative controls consisted of shRNAs targeting Luciferase, and

positive controls consisted of shRNAs targeting RPL30 and PSMA1. In addition, we incorporated shRNAs

targeting KRAS into the library to serve as additional controls for the PDAC PDX lines.

RNAi Screening in vivo and in vitro

Using the custom barcodes lentiviral shRNA library, PDX lines (PATC69, PATC124, PATC53 and

PATC153) were transduced in vitro using 8 µg/mL Polybrene (Sigma-Aldrich). Libraries were transduced

at 1000X coverage and multiplicity of infection (MOI) of 0.3. Media was replaced after 14 hours, and at

48 hours, and MOI was confirmed by checking GFP percentage with flow-cytometry 48 hours post-

transduction. Immediately following MOI confirmation with flow cytometry, 2 µg/mL Puromycin (Thermo

Fisher) was added to each transduced cell population for 72 hours. Immediately following Puromycin

selection, a Reference population was collected (1000X) and stored at -80 ºC. Remaining cells were split

and moved into in vitro and in vivo settings. Three independent in vitro screens were seeded (1000X) into

15 cm treated plates (Corning). For in vivo implantation, cells were combined into 1:1 PBS/Growth-Factor

Reduced Matrigel (Corning), and injected orthotopically at 1000X coverage per mouse. Immunodeficient

NOD SCID mice were leveraged for in vivo screening. For in vitro screening populations, cell populations

were collected at 10, 20 and 30 days post Reference collection. For in vivo screening, the entire pancreas

and tumor of each mouse was collected at day 30. For each condition, cells were lysed using SDS and

DNA was sheared using sterile 23 gauge 1 inch needles (Becton Dickinson). Tumor DNA was isolated

using Phenol:Chloroform extraction and Ethanol precipitation. A Nested PCR strategy was utilized to

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 22: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

amplify and prepare barcode populations for next-generation sequencing (Cellecta). Redundant siRNA

activity (RSA) analysis was applied to rank targets for each PDX model and screen setting61.

Design and construction of custom CRISPR Library

The custom CRISPR-Cas9 sgRNA library was constituted by 3,367 sgRNAs and designed to both

incorporate and expand upon the original RNAi-defined targets. The library was constructed using chip-

based oligonucleotide synthesis and cloned into a pRSG16-U6-sg-HTS6C-UbiC-TagRFP-2A-Puro

lentiviral vector (Cellecta) as a pool. RSA was used to rank essential genes from shRNA screens. Genes

were curated based on RSA < 0.05 and FC < -2 in at least one model (N=100). Curated gene targets

were further annotated by incorporating neighbors with a PPI score ≥ 0.80 (STRING, version 10), and

TPM (transcripts per million) > 2 in that model47. In addition, 50 non-essential genes and 50 essential

genes were added to have a final set of 654 genes32.

CRISPR Cas9 Screening in vivo and in vitro

Using the custom barcodes lentiviral sgRNA library, PDX lines (PATC69, PATC124, PATC53 and

PATC153) containing the lentiCas9-blast vector (addgene, plasmid #52962) were transduced in vitro

using 8 µg/mL Polybrene (Sigma-Aldrich). PDX lines transduced with Cas9 were constantly kept at 10

µg/mL blasticidin (Thermo Fischer Scientific). Libraries were transduced at 1000X coverage and a

multiplicity of infection (MOI) of 0.3. Media was replaced after 14 hours, and at 48 hours, and MOI was

confirmed by checking RFP percentage with flow-cytometry 48 hours post-transduction. Immediately

following flow cytometry, 2 ug/mL puromycin (Thermo Fisher) was added to each transduced cell

population for 72 hours. Immediately following puromycin selection, a Reference population was collected

(1000X) and stored at -80 °C. Cells were allowed to culture for an additional 12 days in order to allow for

some initial sgRNA-directed cutting prior to starting the in vivo screen, to reduce potential noise derived

from disproportional cell doubling following orthotopic implantation. A secondary Reference (1000X), at

the time of injection, was then collected. Remaining cells were split and moved into in vitro and in vivo

settings. Three independent in vitro screens were seeded (1000X) into 15 cm treated plates (Corning).

For in vivo implantation, cells were combined into 1:1 PBS/Growth-Factor Reduced Matrigel (Corning),

and injected orthotopically at 1000X coverage per mouse. Immunodeficient NSG mice were leveraged for

in vivo screening. For in vitro screening populations, cell populations were collected at 10, 20 and 30 days

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 23: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

post injection and secondary Reference collection. For in vivo screening, the entire pancreas and tumor

of each mouse was collected at day 30 post injection. DNA extraction and barcode library preparation

were conducted in similar fashion to the RNAi screens. For each condition, cells were lysed using SDS

and DNA was sheared using sterile 23 gauge 1 inch needles (Becton Dickinson). DNA was isolated using

Phenol:Chloroform extraction and Ethanol precipitation. Nested PCR was utilized to amplify and prepare

barcode populations for NGS (Cellecta).

Low Fat BAGEL

Log2 fold-change was calculated on a guide level by comparing each time point to the reference time

point for each model. Screen quality and efficient drop out was assessed by comparing the log density

ratio log2 fold-change of core essential versus non-essential guides.

Low-Fat BAGEL is an adapted framework of the BAGELv233 algorithm to more accurately analyze small

screens with limited training sets. The BAGEL and BAGELv2 algorithms are previously described

implementations of a Bayesian model selection algorithm, and in short, calculate a “Bayes Factor”, a log

likelihood of gene essentiality trained on gold-standard core essential and non-essential genes32.

�� �Pr�� | ��� ����

Pr�� | ������ �����

� Pr�� |�, ��� ���� Pr�� | ��� ���� ��

� Pr�� |�, ������ ���� Pr�� | ������ ���� ��

In Low-Fat BAGEL, a random bootstrap of core essential and non-essential guides is used as the training

set to generate log density curves of the essential and non-essential fold-change distribution. This

distribution is used to generate a linear regression model (k) of the fold-change to control for the influence

of outliers. Over 100 permutations of the training set, Low-Fat BAGEL calculates the BF of all guides in

the library, by comparing the log2 fold-change of each guide (D) to the distribution of fold-changes of the

essential and non-essential genes. The final BF each of individual guide is computed as the mean across

100 iterations, and a gene level BF was calculated as the sum of all guides for that gene.

The R package ‘ROCR’ was used to generate Precision-Recall curves of the essential and non-essential

gene level BF62. The F measure of each screen using different analytical methods was calculated as

described here32.

Anticorrelated cluster signatures and “Quasi-basal” group

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 24: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

A test was performed to quantify the difference in Cluster 1 enrichment across Classical and Basal

models. The centroid scores for Cluster 1 and 23 were used to categorize the PDX models into three

clusters using K-means clustering. K=3 was chosen as the optimal number of clusters based on the

majority consensus of two out of three methods between the elbow criterion in within-cluster sum of

squares, BIC calculated by the R package “mclust”63 and “NbClust”64 which calculates the optimal k

across 30 methods. The R package “factoextra”65 was used to visualize the results. The group of models

with Cluster 1 enrichment and Cluster 23 depletion were termed “classical”, with the inverse group termed

“basal”, and the third group with roughly equal centroid scores in Cluster 1 and 23 termed “quasi-basal”.

Differential Expression between Classical and Basal Models

After categorizing models, the ones deemed classical and basal using the CEN methodology were used

to conduct differential expression analysis. DESeq48 was used to identify significantly differentially

expressed genes, and genes were ranked by t score. GSEA was conducted using the “GSEA pre-ranked”

feature66. We compared differences in histology and site of recurrence obtained from clinical data

associated with each PDX model. Chi square tests were conducted using the CEN subtypes and Moffitt

subtypes to evaluate if subtype level differences were statistically significant.

Clinical correlations with signature

Clinical data from patients corresponding to the PDAC PDX cohort was provided by the lab of Dr. Eugene

Koay67. Differentiation status on 45 models was categorized as “Poor” and “Moderate” and histology

status on 35 models was categorized as “Locoregional” if the records indicated as “regional” or

“locoregional”, and “Distant” if site was outside the pancreas. Chi square tests were used to evaluate

differences between the cluster-defined Classical, Quasi-basal and Basal groups as well as the Moffitt

binary classification.

TCGA data

TCGA gene expression data and corresponding clinical data was downloaded from GDC, and processed

as previously described4. Raw read counts were log2 transformed and normalized using DESeq48

similarly to the PDXs described above. We filtered the dataset to “high purity” samples annotated in the

clinical data, where we obtained the Moffitt classification data. Cluster centroids were calculated as

described above with the PDXs and a k means of 3 was applied to the Cluster 1 and 23 centroid scores

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 25: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

to categorize tumors into Classical, Quasi-basal and Basal. Tumor grade and differentiation status were

obtained from the clinical data and Chi square tests were conducted to assess subtype level associations

with the C1vC23 signature and Moffitt groups.

Single Cell Data for PDX-derived Cell Lines.

Seurat version 3.168 was used to analyze all single cell analysis. Each of the lines, PATC124, PATC53

and PATC69 were analyzed separately. PATC53 contained two replicates, which were merged for

analysis. For all PDX cell lines, single cells with a minimum of 350 expressed genes and less than 10%

mitochondrial reads were retained. Genes expressed in less than 3 cells and mitochondrial genes were

removed from further analysis. The data was log normalized, transformed using the “vst” function with top

2000 variant genes. The total RNA count, cell cycle score and mitochondrial reads were regressed out.

For PATC124, principal-component analysis and uniform manifold approximation and projection (UMAP)

with the first 15 dimensions was performed, followed by identifying clusters using a resolution of 0.15. For

PATC69, principal-component analysis and uniform manifold approximation and projection (UMAP) with

the first 15 dimensions was performed, followed by identifying clusters using a resolution of 0.10. PATC53

contained two replicates, which were integrated as one dataset following normalization and variant

stabilization. Total RNA count, cell cycle score and mitochondrial reads were regressed out of the

integrated dataset, and PCA and UMAP on first 15 dimensions was performed, further identifying clusters

using a resolution of 0.10.

Core Needle Biopsy Single Cell Analysis

Seven CNB samples were used – four primary tumors, one liver, lung and vaginal metastases sample

each36. All seven samples were filtered to have a minimum of 350 expressed genes and less than 25%

mitochondrial reads were retained. Similar to the PDX analysis, genes expressed in less than 3 cells and

mitochondrial genes were removed from further analysis. Each sample was log normalized, transformed

using the “vst” function with top 2000 variant genes. The seven samples were then integrated, following

which, the total RNA count, cell cycle score and mitochondrial reads were regressed out. Principal-

component analysis and UMAP with the first 20 dimensions was performed, followed by identifying

clusters using a resolution of 0.20. Cell types associated with clusters were identified using established

stromal markers10,69 and epithelial cell markers specific to liver and pancreas36,70,71.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 26: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Cluster Signatures for Single Cells

Network-based normalization of cluster signatures amongst single cells was achieved by first identifying a

subset of genes whose expression was strongly correlated with its own cluster assignment (r > 0.4). For

the PDX models, using the subset of highly correlated genes per cluster, if more than 30% of the cluster

was captured per single cell, we calculated a centroid score by taking the mean of normalized UMI count.

For patient CNB samples, centroid was calculated for all single cells if more than 20% of the cluster was

captured. The percentage of cluster genes expressed within each cell type was calculated by taking the

mean number of genes with UMI count above 0 per cluster across all the single cells within each cell

type. Single cells were grouped into classical, quasibasal and basal based on the C1vC23 differential

signature, classified as basal if the differential signature was < -0.1, quasibasal if they are -0.1 to 0.1 and

classical if they were > 0.1.

Biomarker Prioritization

Cluster 1 and 23 specific networks were generated using Cytoscape 3.7.2 55, where we calculated the

closeness centrality of each gene to its respective cluster. Gene to cluster correlation was calculated by

computing the Pearson correlation between each gene and its corresponding cluster score across our 48

PDX models. First degree nodes (FDN) were identified as directly connected nodes to CRISPR

associated vulnerabilities DNM2 and MAPK11.

The R packages ggplot272 and ComplexHeatmap52 were used to plot the comparison between the gene

to cluster correlation and closeness centrality and the gene expression heatmap of first-degree nodes,

respectively.

Multiplex IHC-IF staining and data analysis

Formalin fixed and paraffin embedded (FFPE) PATX samples were sectioned into 3 µm thick sections

and placed on positive charged slides. Sections were deparaffinised by baking at 60 ºC for 1 hour, then

rehydrated by serial passage thorough xylene and graded alcohol. All sections were subjected to an initial

heat-induced epitope retrieval (HIER) in 10 mM citrate buffer with 0.05% Tween20, pH 6.0, at 95 ºC for 15

minutes using a BioGenex EZ retrieval microwave. Subsequent HIER for Opal development was done

using fresh citrate buffer at 95 ºC for 10 minutes. All sections were initially blocked for endogenous

peroxidase using Bloxal (Vector Labs SP6000). After and before each primary incubation, sections were

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 27: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

blocked using 2.5% serum (Vector Labs S1012). Opal, indirect, and direct immunofluorescence methods

were used. First node protein targets were developed using Opal methods, VSIG1 (Thermofisher MAB

4818, at 1/2000), and Vimentin (Cell Signaling Technologies 5741, at 1/1600). Ki67 (Cell Signaling

Technologies 9129, at 1/400) was developed indirectly using anti-rabbit secondary, Alexa 680

(Thermofisher A32802, 1/500). Lastly, HLA conjugated to Alexa 647 (Abcam 199837, at 1/1000) was

used to aid in tissue segmentation. Sections were then counterstained with DAPI.

Slides were imaged using Vectra 3.0 Automated Quantitative Pathology Imaging System (Akoya

Biosciences). Image processing and analysis was performed using inForm Software v2.4 (Akoya

Biosciences). For a subset of images from each PATX, the following was performed: Images were

unmixed and autofluorescence was removed. Then, tissue was segmented as tumor, stroma or other

based on training regions and pattern recognition of DAPI and HLA stain. This was followed by cell

segmentation using DAPI and HLA to segment nuclei, cytosol, and membrane. Phenotyping was

performed for each marker individually by selecting representative positives for algorithm training and

allowing the software to select the rest. Batch analysis of all images was performed using the

segmentation and phenotyping algorithm described above. At this threshold of detection, VSIG+/VIM+

cells represented a negligible population.

Spatial and data analysis was performed using phenoptrReports (Akoya Biosciences), an R script

package. Briefly, all single cell phenotype data was merged, aggregated and consolidated for each

marker. Consolidated data was analyzed based on the phenotypes of interest. Using the XY coordinates

of each cell, spatial relationships between cell types was visualized using the phenotrReports GUI. All

data was graphed using GraphPad Prism version 8.0.0 for Windows, GraphPad Software, San Diego,

California USA, www.graphpad.com.

Feature Barcoding Vector

The feature barcode vector (LentiCRISPR-E-10xcs1) was built using the pLentiCRISPR-v2 (addgene:

52961) as the base vector. All of the molecular modifications were performed by Epoch Life Sciences

(Missouri City, Tx). The pLentiCRISPR-v2 was modified to an optimized sgRNA scaffold73 that included

the 3’ 10x capture sequence 1:

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 28: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

(cgtttCagagctaTCGTGgaaaCAGCAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcGCT

TTAAGGCCGGTCCTAGCAAtttttt); a ccdb bacterial expression cassette between the BsmBI restriction

sites was introduced to reduce background during sgRNA cloning and library generation; and an N-

terminal Flag-sv40 NLS was added to Sp. Cas9-nucleoplasmin NLS-P2A-Puro.

Preparation of Feature Barcoded sgRNA Knockout Populations

Lentiviral transductions of four separate feature barcode sgRNA vectors (targeting ABCG8, ILK, SMAD4

and ZEB1), were conducted on separate cell populations for both the PATC69 and PATC53 PDX lines.

Lentivirus was concentrated through ultra-centrifugation, resuspended in 200 µL of PBS, and stored at -

80 °C until use. For each condition, 1x106 cells were transduced in 10 cm treated plates (Corning) using 8

µg/mL Polybrene (Sigma-Aldrich). Media was replaced after a 16-hour incubation, and each cell

population was washed with PBS and then placed under puromycin selection for 72 hours. Following

selection, all conditions were cultured for 22 days (or 10 days post “CRISPR Screen Injection point”) to

match the in vitro CRISPR screen control-separation profile at day 10 (Extended Data Figure 2A). Prior to

library preparation for scRNAseq, knockout populations were combined in equal proportion for each PDX

line. Library preparation was conducted on a total of 10,000 cells per PDX line, resulting in an

approximate coverage of 2500 cells per condition.

sgRNA Phenotype Confirmation and Confirmation of Site-Specific Cutting

Utilizing the same feature barcoded populations prepared for scRNAseq, 1500 cells/well were seeded in

triplicate in 12 well tissue culture plates (Corning) immediately following 72 hours of 2 µg/mL Puromycin

selection. Cells were then cultured for a minimum of 10 doublings. Individual plates were then stained

with 0.5% crystal violet (in 25% methanol) for 2 hours. Plates were washed in water, dried overnight and

then digitally scanned. After digitally scanning the plates, crystal violet was dissolved in equal volumes of

1% SDS, and 200 µL of each sample was moved into 96-well plates to measure absorbance at 570 nm.

Relative growth was quantified based on the internal sgABCG8 negative control. All data was graphed

using GraphPad Prism v 8.0.

Puromycin selected cells were collected for Sanger sequencing to confirm sgRNA induced indel formation

relative to non-infected populations. Cell pellets for each sgRNA across both PATC69 and PATC53, 1x106

cells each, were isolated at Day 12 and Day 40 for each PDX line (sgRPS27A indels representative at

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 29: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Day 12, all other sgRNAs at Day 40). Cell pellets were centrifuged, washed once with PBS, and frozen at

-80 °C. All Sanger sequenced regions were normalized against respective non-transduced PDX line

populations, 1x106 cells/pellet. Primers for each cut site were developed to allow for 400 - 800 bp

products, and primer sites were run on 2% agarose gels and extracted following amplification. Site-

specific sequencing primers were utilized for Sanger sequencing, and indels distributions were calculated

using the Synthego ICE Analysis Tool74.

Feature Barcoding Analysis

Feature barcoded scRNAseq data were analyzed using Seurat3.168. Each cell line was individually

evaluated. Cells expressing more than 350 genes and less than 25% mitochondrial reads were retained

and subsequently log normalized, variant stabilized, and total RNA count, mitochondrial reads and cell

cycle were regressed out. All guide-level samples for each PDX line were merged using the Seurat

Anchor Cell feature to provide a direct point of comparison for PATC69 and PATC53 perturbations. A

total of 10,113 PATC69 (3449 ABCG8, 1962 ILK, 2662 SMAD4, and 2040 ZEB1 knockout cells) and

11,439 PATC53 (3414 ABCG8, 2952 ILK, 2819 SMAD4, and 2254 ZEB1 knockout cells) cells were

retained and analyzed for processing. Principal-component analysis and UMAP with the first 20

dimensions was performed, with clustering performed at 0.15 resolution for. Cluster centroids were

calculated using the method described above for the patient CNB samples.

ACKNOWLEDGMENTS

We thank the Advanced Technology Genomics Core (ATGS), the UTMDACC Flow Cytometry and

Cellular Imaging Core Facility, the UTMDACC Department of Veterinary Medicine, all funded by a Cancer

Center Support Grant (P30 CA016672). We thank David Aten and Jordan Pietz at the UTMDACC

Medical Graphics and Photography Department funded by Cancer Center Support Grant (P30

CA016672). We thank the UTMDACC Science Park NGS Facility, funded by CPRIT Core Facility

Support Grants (RP120348 and RP170002). We thank Sisi Gao for her contributions editing the

manuscript. We thank our colleagues at the Institute of Applied Cancer Science (IACS), Justin Huang,

Stephanie Schmidt, and Vandhana Ramamoorthy, for technical assistance and suggestions. We thank

David Pollock at ATGS for technical assistance and advice. G.F.D. was supported by the Sewell Family

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 30: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Endowed Chairmanship in Genomic Medicine, NIH/NCI P01 CA117969 12, and the UT MD Anderson

Cancer Moon Shots Program. S.S was supported by the CPRIT Research Training Grant (RP170067).

J.R. and I.H. received support from the Paula-Altman Goldstein Discovery Fellowship. J.R. also received

support from P01 CA117969 12. W.Y. is supported by the PanCAN-AACR Pathway to Leadership Grant

(16-70-25-YAO) and the Pancreatic Cancer Action Network Translational Research Grant (19-65-YAO).

TH is a CPRIT Scholar in Cancer Research, and is supported by NIGMS grant R35GM130119 and MD

Anderson Cancer Center Support Grant P30 CA016672. TH is a consultant for Repare Therapeutics.

AUTHOR CONTRIBUTIONS

J.L.R., S. Srinivasan., A.C. and G.F.D. conceived and designed the study. J.L.R., W.Y., M.P., A.

Machado. performed the experiments. S. Srinivasan, S. Seth, E.K. and C.A.B analyzed the data. PDX

generation was guided by the lab of M.K. Single-cell RNAseq data of PDXs were provided by C.Y.L., I.H.

and A.V. Single-cell RNAseq data from patients were provided by J.J.L., P.A.G., and A. Maitra. Clinical

annotations for PDXs were provided by M.S. and E.J.K. Guidance on experimental and computational

aspects of the study was provided by W.Y., J.R.D., C.A.B., G.G., A.V., T.P.H., A. Maitra, and T.H. The

project was led and supervised by A.C. and G.F.D. The manuscript was written by J.L.R., S. Srinivasan,

A.C. and G.F.D. and edited by A.K.D., A. Maitra and T.H. All authors approved the final manuscript.

COMPETING INTERESTS

The authors declare no competing interests.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 31: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

REFERENCES

1. Siegel, R.L., Miller, K.D. & Jemal, A. Cancer statistics, 2019. CA Cancer J Clin 69, 7-34 (2019). 2. Society, A.C. Cancer Facts & Figures 2020. in Atlanta: American Cancer Society (2020). 3. Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global

genomic analyses. Science 321, 1801-6 (2008). 4. Cancer Genome Atlas Research Network. Integrated Genomic Characterization of Pancreatic

Ductal Adenocarcinoma. Cancer Cell 32, 185-203.e13 (2017). 5. Hilmi, M., Bartholin, L. & Neuzillet, C. Immune therapies in pancreatic ductal adenocarcinoma:

Where are we now? World J Gastroenterol 24, 2137-2151 (2018). 6. Collisson, E.A. et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses

to therapy. Nat Med 17, 500-3 (2011). 7. Moffitt, R.A. et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of

pancreatic ductal adenocarcinoma. Nat Genet 47, 1168-78 (2015). 8. Bailey, P. et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 531,

47-52 (2016). 9. Puleo, F. et al. Stratification of Pancreatic Ductal Adenocarcinomas Based on Tumor and

Microenvironment Features. Gastroenterology 155, 1999-2013.e3 (2018). 10. Chan-Seng-Yue, M. et al. Transcription phenotypes of pancreatic cancer are driven by genomic

events during tumor evolution. Nat Genet 52, 231-240 (2020). 11. Simeonov, K.P. et al. Single-cell lineage and transcriptome reconstruction of metastatic cancer

reveals selection of aggressive hybrid EMT states. bioRxiv, 2020.08.11.245787 (2020). 12. Carugo, A. et al. In Vivo Functional Platform Targeting Patient-Derived Xenografts Identifies

WDR5-Myc Association as a Critical Determinant of Pancreatic Cancer. Cell Rep 16, 133-147 (2016).

13. Manguso, R.T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413-418 (2017).

14. Jung, J. et al. Generation and molecular characterization of pancreatic cancer patient-derived xenografts reveals their heterologous nature. Oncotarget 7, 62533-62546 (2016).

15. Yang, S. et al. COEXPEDIA: exploring biomedical hypotheses via co-expressions associated with medical subject headings (MeSH). Nucleic Acids Res 45, D389-d396 (2017).

16. Edler, D., Eriksson, A. & Rosvall, M. The MapEquation software package, available online at http://www.mapequation.org.

17. Sarto Basso, R., Hochbaum, D.S. & Vandin, F. Efficient algorithms to discover alterations with complementary functional association in cancer. PLoS Comput Biol 15, e1006802 (2019).

18. Franklin, R.B., Zou, J. & Costello, L.C. The cytotoxic role of RREB1, ZIP3 zinc transporter, and zinc in human pancreatic adenocarcinoma. Cancer Biol Ther 15, 1431-7 (2014).

19. Su, J. et al. TGF-β orchestrates fibrogenic and developmental EMTs via the RAS effector RREB1. Nature 577, 566-571 (2020).

20. Deacu, E. et al. Activin type II receptor restoration in ACVR2-deficient colon cancer cells induces transforming growth factor-beta response pathway genes. Cancer Res 64, 7690-6 (2004).

21. Biernat, J. et al. Protein kinase MARK/PAR-1 is required for neurite outgrowth and establishment of neuronal polarity. Mol Biol Cell 13, 4013-28 (2002).

22. Cohen, D., Rodriguez-Boulan, E. & Müsch, A. Par-1 promotes a hepatic mode of apical protein trafficking in MDCK cells. Proceedings of the National Academy of Sciences of the United States of America 101, 13792-13797 (2004).

23. Grützmann, R. et al. Gene expression profiling of microdissected pancreatic ductal carcinomas using high-density DNA microarrays. Neoplasia 6, 611-22 (2004).

24. Zhang, G. et al. DPEP1 inhibits tumor cell invasiveness, enhances chemosensitivity and predicts clinical outcome in pancreatic ductal adenocarcinoma. PLoS One 7, e31507 (2012).

25. Badea, L., Herlea, V., Dima, S.O., Dumitrascu, T. & Popescu, I. Combined gene expression analysis of whole-tissue and microdissected pancreatic ductal adenocarcinoma identifies genes specifically overexpressed in tumor epithelia. Hepato-gastroenterology 55, 2016-2027 (2008).

26. Yao, W. et al. Syndecan 1 is a critical mediator of macropinocytosis in pancreatic cancer. Nature 568, 410-414 (2019).

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 32: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

27. Commisso, C. et al. Macropinocytosis of protein is an amino acid supply route in Ras-transformed cells. Nature 497, 633-637 (2013).

28. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 45, D362-d368 (2017).

29. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15, 554 (2014).

30. Allen, F. et al. JACKS: joint analysis of CRISPR/Cas9 knockout screens. Genome Res 29, 464-471 (2019).

31. Meyers, R.M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet 49, 1779-1784 (2017).

32. Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).

33. Kim, E. & Hart, T. Improved analysis of CRISPR fitness screens and reduced off-target effects with the BAGEL2 gene essentiality classifier. bioRxiv, 2020.05.30.125526 (2020).

34. Aiello, N.M. et al. EMT Subtype Influences Epithelial Plasticity and Mode of Cell Migration. Dev Cell 45, 681-695.e4 (2018).

35. Bardeesy, N. et al. Smad4 is dispensable for normal pancreas development yet critical in progression and tumor biology of pancreas cancer. Genes Dev 20, 3130-46 (2006).

36. Lee, J.J. et al. Elucidation of tumor-stromal heterogeneity and the ligand-receptor interactome by single cell transcriptomics in real-world pancreatic cancer biopsies. bioRxiv, 2020.07.28.225813 (2020).

37. Li, Y. et al. Inhibition of integrin-linked kinase attenuates renal interstitial fibrosis. J Am Soc Nephrol 20, 1907-18 (2009).

38. Sawai, H. et al. Integrin-linked kinase activity is associated with interleukin-1 alpha-induced progressive behavior of pancreatic cancer and poor patient survival. Oncogene 25, 3237-46 (2006).

39. Serrano, I., McDonald, P.C., Lock, F.E. & Dedhar, S. Role of the integrin-linked kinase (ILK)/Rictor complex in TGFβ-1-induced epithelial-mesenchymal transition (EMT). Oncogene 32, 50-60 (2013).

40. Sunami, Y., Rebelo, A. & Kleeff, J. Lipid Metabolism and Lipid Droplets in Pancreatic Cancer and Stellate Cells. Cancers (Basel) 10(2017).

41. Hayashi, A. et al. A unifying paradigm for transcriptional heterogeneity and squamous features in pancreatic ductal adenocarcinoma. Nature Cancer 1, 59-74 (2020).

42. Malkoski, S.P. & Wang, X.J. Two sides of the story? Smad4 loss in pancreatic cancer versus head-and-neck cancer. FEBS Lett 586, 1984-92 (2012).

43. Kim, M.P. et al. Generation of orthotopic and heterotopic human pancreatic cancer xenografts in immunodeficient mice. Nat Protoc 4, 1670-80 (2009).

44. Ivanics, T. et al. Patient-derived xenograft cryopreservation and reanimation outcomes are dependent on cryoprotectant type. Lab Invest 98, 947-956 (2018).

45. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).

46. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491-8 (2011).

47. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013). 48. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol

11, R106 (2010). 49. Wilkerson, M.D. & Hayes, D.N. ConsensusClusterPlus: a class discovery tool with confidence

assessments and item tracking. Bioinformatics 26, 1572-3 (2010). 50. Young, M.D., Wakefield, M.J., Smyth, G.K. & Oshlack, A. Gene ontology analysis for RNA-seq:

accounting for selection bias. Genome Biol 11, R14 (2010). 51. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of

gene ontology terms. PLoS One 6, e21800 (2011). 52. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in

multidimensional genomic data. Bioinformatics 32, 2847-9 (2016). 53. Karczewski, K.J. et al. The mutational constraint spectrum quantified from variation in 141,456

humans. Nature 581, 434-443 (2020).

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 33: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

54. Szklarczyk, D. et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43, D447-52 (2015).

55. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504 (2003).

56. Rückert, F. et al. Co-expression of KLK6 and KLK10 as prognostic factors for survival in pancreatic ductal adenocarcinoma. British journal of cancer 99, 1484-1492 (2008).

57. Pilarsky, C. et al. Activation of Wnt signalling in stroma from pancreatic cancer identified by gene expression profiling. Journal of cellular and molecular medicine 12, 2823-2835 (2008).

58. HARVARD, B.I.o.M.a. Broad Institute TCGA Genome Data Analysis Center (2014): Firehose VERSION run. .

59. Storey, J.D., Bass, A.J., Dabney, A. & Robinson, D. qvalue: Q-value estimation for false discovery rate control. R package version 2.20.0, http://github.com/jdstorey/qvalue. (2020).

60. Ying, H. et al. Oncogenic Kras maintains pancreatic tumors through regulation of anabolic glucose metabolism. Cell 149, 656-70 (2012).

61. König, R. et al. A probability-based approach for the analysis of large-scale RNAi screens. Nat Methods 4, 847-9 (2007).

62. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940-3941 (2005).

63. L., S., M., F., T.B., M. & A.E., R. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8, 289–317 (2016).

64. Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. 2014 61, 36 (2014).

65. Kassambara, A. & Mundt, F. Package “factoextra.” R Top. Doc. (2017). 66. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for

interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-50 (2005). 67. Koay, E.J. et al. A Visually Apparent and Quantifiable CT Imaging Feature Identifies Biophysical

Subtypes of Pancreatic Ductal Adenocarcinoma. Clinical Cancer Research 24, 5883-5894 (2018). 68. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. & Regev, A. Spatial reconstruction of single-cell

gene expression data. Nat Biotechnol 33, 495-502 (2015). 69. Muraro, Mauro J. et al. A Single-Cell Transcriptome Atlas of the Human Pancreas. Cell Systems

3, 385-394.e3 (2016). 70. Peng, J. et al. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant

progression in pancreatic ductal adenocarcinoma. Cell Res 29, 725-738 (2019). 71. MacParland, S.A. et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic

macrophage populations. Nature Communications 9, 4383 (2018). 72. Wickham, H. ggplot2: Elegant Graphics for Data Analysis, (Springer-Verlag, New York, 2016). 73. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency.

Genome Biology 16, 280 (2015). 74. Hsiau, T. et al. Inference of CRISPR Edits from Sanger Trace Data. bioRxiv, 251082 (2018).

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 34: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

FIGURE LEGENDS

Figure 1: Defining signaling diversity in PDAC through the construction and annotation of a PDX-

based co-expression network. (a) Overview outlining the development of the PDAC co-expression

network from a diverse and representative PDAC patient-derived xenograft (PDX) cohort of 48 models,

and subsequent integration of mutation data for cluster enrichment analysis. (b) Oncoplot of common

PDAC-associated mutations in the 48-model PDX cohort displays mutation frequencies similar to

previous publications. (c) Force-directed layout of the PCEN. Gene clusters, defined by Gene Ontology

(GO), visually highlight nodes with a minimum of 12 edges. (d) Heat map depicting Pearson correlations

between cluster centroid scores across 48 PDX models. Starred correlations represent an adjusted P

value < 0.05. (e) Force-directed layout of the PCEN (top) and matching GO hierarchical treemaps

(bottom) of prominent anti-correlative cluster centroid trends across the PDAC co-expression network

highlighting Cluster 1 (Lipid metabolism) vs. Cluster 23 (Cell development). (f) Force-directed layout of

the PCEN (top) and matching GO hierarchical treemaps (bottom) of prominent anti-correlative cluster

centroid trends across PDAC co-expression network highlighting Cluster 2 (Golgi vesicle transport) vs.

Cluster 13 (mRNA catabolism).

Figure 2. Informed CRISPR characterization of PDX models identifies co-expression network

cluster-associated functional diversity within the PDAC cohort. (a) Overview of our prioritized in vivo

functional genomics platform, whereby a custom PDAC-prioritized RNAi library was used to screen a

functional set of proteins largely localized at the cell surface. Functionalized surface proteins were then

expanded upon using a protein-protein interaction (PPI) network, which was further characterized using a

custom CRISPR library and integrated into the PCEN. (b) Venn diagram of functional targets derived from

in vivo orthotopic CRISPR screening of PDX lines using a quantile-normalized Bayes factor (BF) > 1 (c -

e) CRISPR screening results from PATC69 (c), PATC124 (d), and PATC53 (e) PDX lines overlaid onto a

merged PPI force-directed diagram. The BF of each gene indicates the degree of vulnerability of the

gene, with BF > 1 indicating an essential gene. (f - h) PCEN anchoring of in vivo functional

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 35: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

dependencies, represented as quantile-normalized Bayes Factors, to inform on cluster-associated

vulnerability context in PATC69 (f) PATC124 (g), and PATC53 (h) PDX lines.

Figure 3: Prominent anti-correlated CoDEX clusters recapitulate a granular spectrum of

“classical”, “quasi-basal”, and “basal-like” cells inter-tumorally. (a) [left panel] Overlay of the

normalized expression score of each gene over the PDAC co-expression network of CoDEX defined

classical, quasi-basal and basal models [middle panel]. Pie chart comparison of clinical histology and

recurrence of patients associated with PDX models across subtypes. P values are derived from Chi

Square test. [right panel] K means clustering reveals optimal k=3 using the Cluster 1 and Cluster 23

centroid scores across the PDX models paired with Consensus Clustering (k=2) into Moffitt’s classical

and basal subtypes (b) Box plot of average Cluster 1 centroid score across Moffitt-defined classical and

basal-like models. Box-whisker plots show median�±�first and third quartiles. P�values are derived from

t–test (n�=�48 PDX tumors). (c) PCA demonstrating the subtypes derived from the PDAC co-

expression network with the quasi-basal group driving the PCs compared to the binary Moffitt

Classification. (d) GSEA conducted on PDAC co-expression network-defined classical and basal-like

models identifies EMT as a top pathway enriched in basal-like models FDR = 0.000. (e) [left panel] Pie

chart comparison of tumor differentiation status in high-epithelial-content TCGA models comparing PDAC

co-expression network-derived classifications and Moffitt classifications (n=76 tumors). [right panel] Heat

map of C1vC23 anti-correlated cluster signature differential, matching Moffitt classification and tumor

grade on TCGA tumor samples (samples with ≥ 30% epithelial content).

Figure 4: Intra-tumoral characterization of anti-correlated CoDEX clusters recapitulate a granular

and tumor-localized “classical”, “quasi-basal”, and “basal-like” spectrum. (a - c) UMAP of single-

cell sequenced low-passage (a) PATC69 (7,857 cells), (b) PATC124 (9,482 cells) and (c) PATC53

(14,791 cells) cell lines with an overlay of a PDAC co-expression network-normalized C1v23 signature

differential with respective percentages of cells corresponding to each subtype [right panels]. Pie chart

with distribution of three classifications (percentage) (d) Density histogram of the C1v23 signature

differential distributions of PATC124, PATC53, and PATC69 PDX lines, with more positive cluster

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 36: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

differential indicating enrichment in Cluster 1 and more negative cluster differential indicating Cluster 23

enrichment. (e) Diagram of the locations and numbers of isolated patient core needle biopsy (CNB)

samples used for single-cell RNA sequencing analysis. (g) Combined UMAP outlining the tumor

microenvironment components of multiple single-cell sequenced primary (n = 4) and metastatic (n = 3)

CNB samples from PDAC patients. (g) The mean percentage of cluster representation for each cluster in

the PDAC co-expression network from the cellular constituents of the tumor microenvironment. Each dot

per cell type represents individual clusters. Box plot center: mean; box: quartiles 1–3; whiskers: quartiles

1–3 ± 1.5 × IQR. (h) Combined CNB UMAP with an overlay of a PDAC co-expression network-normalized

C1vC23 signature represented as the difference between Cluster 1 and Cluster 23 expressions.

Figure 5: CoDEX platform prioritizes potential biomarkers by integrating first-degree nodes of

anchored dependencies and cluster centrality. (a) Force-directed layout of prominent anti-correlative

cluster centroid trends across the PDAC co-expression network highlighting Cluster 1 (Lipid metabolism)

vs. Cluster 23 (Cell development) along with PATC69 in vivo specific vulnerability, DNM2, and PATC53

specific in vivo vulnerability, MAPK11, from our CRISPR screen analysis. Force-directed network outlining

first-degree nodes of DNM2. (b) Dot plot of closeness centrality of each gene within Cluster 1 and Cluster

23 calculated using Cytoscape compared with gene-to-cluster correlation calculated as the Pearson

correlation between single-gene expression and Cluster 1 and 23 centroids using the PDX cohort (n=48)

(c) Force-directed layout of first-degree nodes of Cluster 23 localized MAPK11 generated from

Cytoscape, or direct gene-wise correlations within the co-expression network with color of each gene

representing its respective cluster assignment (d) Force-directed layout of first-degree nodes of Cluster 1

localized DNM2 generated from Cytoscape (e) Heat map of the normalized expression the first-degree

node of DNM2 and MAPK11 in the PDX models (n=48), displayed along the C1vC23 signature-

determined classification on the left and gene-wise closeness centrality on the top annotation (f) IHC-IF

for VSIG1 vs. VIM, in the context of HLA, DAPI and Ki67, in representative slices of classical (PATX90),

quasi-basal (PATX124) and basal-like (PATX53) PDXs (g) Quantification of VIM-positive, VSIG1-positive,

and double-negative signatures in HLA-positive tumor cells.

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 37: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Figure 6. Functional validation of CoDEX-informed intra-tumoral context dependency for C1vC23-

associated genetic targets. (a) Force-directed layout of prominent anti-correlative cluster centroid trends

across the PDAC co-expression network highlighting Cluster 1 (Lipid metabolism) vs. Cluster 23 (Cell

development) along with PATC53 in vivo specific vulnerabilities from our CRISPR screen analysis with

size of the gene representing the quantile-normalized BF (b) Overview of the feature barcoding strategy

for intratumoral tracking of PDAC co-expression network signatures following sgRNA knockouts.

Targeting sgRNAs for ABCG8 (as a negative control), ILK, SMAD4, and ZEB1 were selected and based

on CRIPSR screening results. PATC69 and PATC53 cell populations were individually transduced in

vitro, selected with puromycin, and parallel assays were also prepared to confirm knockout phenotypes

through colony growth, and sgRNA cutting through Sanger sequencing. Each knockout cell population

was cultured in vitro for 22 days (CRISPR screening time-point day 10), and then combined scRNAseq

library preparation. A total of 10,000 cells per PDX line were sequenced, 2,500 per condition. (c) UMAP

of the PATC69 PDX line with defined ABCG8, ILK, SMAD4, and ZEB1 knockout populations (10113 total

cells). (d) Normalized viability of PATC69 cells following knockout of ABCG8, SMAD4, ZEB1, ILK, or

RPS27A with sgRNA. **p < 0.05. (e) UMAP of the PATC53 PDX line with defined ABCG8, ILK, SMAD4,

and ZEB1 knockout populations (11,439 total cells). (f) Normalized viability of PATC53 cells following

knockout of ABCG8, SMAD4, ZEB1, ILK, or RPS27A with sgRNA. *p < 0.05; **p < 0.01. (g) Comparative

PATC69 density plots of C1vC23 signature following knockout with sgILK, sgSMAD4, sgZEB1 and

sgABCG8 negative control calculated as the differential between the Cluster 1 and 23 centroid score, with

more positive differential indicating cluster 1 enrichment and vice versa. P values and D statistic derived

from Kolmogorov Smirnov test (h) Comparative PATC53 density plots of C1vC23 signature in PATC53

cells following knockout of ILK, SMAD4, ZEB1, or ABCG8 (negative control).

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 38: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Figure 1: Defining Signaling Diversity in PDAC through the Construction and Annotationof a PDX-based Co-Expression Network

FE

DC

BA

REVIGO Gene Ontology treemap

cellular modifiedamino acid

metabolic process

bloodcirculation

digestion

digestivesystemprocesslymphocyte

differentiation

drugmetabolicprocess

acidsecretion

alcoholmetabolicprocess

aniontransport

cellularlipid

catabolicprocess

glycerolipidbiosynthetic

process

glycosylation

icosanoidmetabolicprocess

iontransport

lipidmetabolicprocess

long−chainfatty acidmetabolicprocess

neutrallipid

metabolicprocess

organicacid

metabolicprocess

organic acidtransport

proteinglycosylation

smallmolecule

biosyntheticprocess

steroidmetabolicprocess

terpenoidmetabolicprocess

transmembranetransport

responseto

carbohydrateresponse to

xenobiotic stimulus

ammoniumion

metabolism

carbohydratemetabolism

cellactivation

digestion

drugcatabolism

drugmetabolism

lipidmetabolism organic

hydroxycompoundmetabolism

regulationof

hormonelevels

xenobioticmetabolism

REVIGO Gene Ontology treemap

actinfilament−based

process

celldevelopment

cellmotility

cellrecognition

cell−cellsignaling

extracellularmatrix

organization

regulationof actin

filament−basedprocess

regulationof

anatomicalstructure

size

regulationof cell

projectionorganization

regulationof

cellularcomponent

size

reproductivestructure

development

reproductivesystem

developmenttissue

development

behavior

biologicaladhesion

celladhesion

celldevelopment

collagenmetabolism

growth

locomotion

locomotorybehavior

neuralprecursor

cellproliferation

REVIGO Gene Ontology treemap

Golgi vesicletransport

REVIGO Gene Ontology treemap

posttranscriptionalregulationof gene

expression

regulationof cellular

amidemetabolicprocess

amidetransport

peptidetransport

macromolecularcomplexsubunit

organization

organelleassembly

nuclear−transcribedmRNA

catabolism,nonsense−mediated

decay

proteinlocalization

toendoplasmic

reticulum

ribosomebiogenesis

Pearson

1-1

Pearson

1-1

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 39: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

In vivo orthotopic CRISPR screens in cluster-informed PDXs

PPI-based functional PDAC CRISPR Library

-

Bb C

A

Cluster-defined PDAC subtype associated dependencies

Cluster #2

Cluster #1

Cluster #23

Cluster #13

D

C

B

A

Cluster #1

Cluster #23

A B

C D E

in vivo

RACK1RPS27AMYCRPL30GRB2MTORYKT6CDC27

RACK1RPS27AMYCRPL30GRB2MTORYKT6CDC27

PATC53 PATC124

PATC69

CRISPR Screens

KRAS

RPS27AMAD2L2

CCND1

DNM2

SKP2

ATP1A1

PRKDC

SNAP23

CDC42

CTNNB1

RPS27A

MAD2L2

SKP2SNAP23

EZRDOK1

MAPK11PRKDC

DVL1PRKCZ

ATP1A3

TLE2

PXN

TLN1

CRKL

ZEB1KRAS

RPS27A

ITGB1

TLN1

CCND1

ATP1A1

NKAIN4

NPNT

CCL5FAM171A2

SHH

LRP1CYBA

SDC3

MAPK11

SMAD4

SNAI1

CDC42

NCK1

SKP2

CA2

ITGB2

MFNG

Gene: 1 - 4.9Gene: 5 - 9.9

Gene: 10 - 14.9

Gene: 15 - 25

Quantile-Normalized Bayes Factors

F

Figure 2: Informed CRISPR chatacterization of PDX models identifies co-expressionnetwork cluster-associated functional diversity within the PDAC cohort

PATC53PATC69 PATC124

PATC69 PATC124 PATC53G H

32

Quantile-Normalized Bayes Factors

-24

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 40: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

B

Figure 3: Prominent anti-correlating clusters recapitulate a granular spectrum of “classical”, “quasi-basal”, and “basal-like” cells intertumorally

ClassicalQuasi-basalBasal-like

CEN

_Sub

type

Mof

fitt

TCGA.FB.AAPU.01TCGA.HZ.A8P1.01TCGA.US.A776.01TCGA.S4.A8RM.01TCGA.US.A77G.01TCGA.IB.A5SP.01TCGA.YY.A8LH.01TCGA.US.A779.01TCGA.FB.AAQ0.01TCGA.IB.7644.01TCGA.2J.AAB1.01TCGA.FB.A78T.01TCGA.FB.AAPQ.01TCGA.2L.AAQA.01TCGA.IB.AAUU.01TCGA.HV.AA8X.01TCGA.HV.A7OL.01TCGA.FB.AAQ6.01TCGA.FB.AAQ3.01TCGA.Q3.AA2A.01TCGA.2L.AAQJ.01TCGA.OE.A75W.01TCGA.F2.6879.01TCGA.2L.AAQL.01TCGA.PZ.A5RE.01TCGA.US.A77E.01TCGA.HV.A5A4.01TCGA.HZ.7919.01TCGA.2J.AABH.01TCGA.S4.A8RP.01TCGA.HZ.A9TJ.01TCGA.IB.7886.01TCGA.F2.A8YN.01TCGA.LB.A7SX.01TCGA.HZ.A49I.01TCGA.HZ.8317.01TCGA.3A.A9J0.01TCGA.S4.A8RO.01TCGA.2J.AABE.01TCGA.HZ.A8P0.01TCGA.3E.AAAZ.01TCGA.3A.A9IZ.01TCGA.2J.AABA.01TCGA.IB.7652.01TCGA.XD.AAUL.01TCGA.IB.8127.01TCGA.LB.A8F3.01TCGA.3A.A9IH.01TCGA.M8.A5N4.01TCGA.HV.A5A3.01TCGA.FB.AAPZ.01TCGA.IB.A7M4.01TCGA.IB.AAUN.01TCGA.HZ.7922.01TCGA.HZ.A77O.01TCGA.HV.A5A6.01TCGA.HZ.8636.01TCGA.IB.AAUO.01TCGA.3A.A9I5.01TCGA.3A.A9IC.01TCGA.2L.AAQE.01TCGA.FB.AAQ1.01TCGA.3A.A9IU.01TCGA.IB.A6UF.01TCGA.FB.A545.01TCGA.2J.AABU.01TCGA.3A.A9IB.01TCGA.IB.A7LX.01TCGA.FB.AAQ2.01TCGA.IB.7890.01TCGA.FB.AAPS.01TCGA.H6.8124.01TCGA.2J.AAB6.01TCGA.IB.A5SS.01TCGA.HZ.8005.01TCGA.2J.AABI.01

Clu

ster

1

Clu

ster

23 0

0.2

0.4

0.6

0.8

Purity

Cluster Enrichment

−0.200.20.40.6

PoorModerateWell

Total=45

Total=31

Total=25

Total=36

Total=15

Moffitt Classification Anticorrelating Cluster Signature

Tumor Differentiation Status

Total=17

Total=8

ModeratePoor

Total=20

Clinical Histology

Regional & LocoregionalDistant

Recurrence Type

Total=7

Total=12

Total=16

p value = 0.00217 p value = 0.05896

PDX Expression Profile

PATX090

PATX124

PATX148

CEN

_Sub

type

Mof

fitt

PATX090PATX122PATX097PATX100PATX079PATX032PATX112PATX172PATX039PATX144PATX141PATX034PATX147PATX110PATX140PATX045PATX126PATX077PATX066PATX081PATX108PATX046PATX118PATX102PATX076PATX137PATX136PATX069PATX104PATX124PATX050PATX059PATX142PATX060PATX056PATX133PATX106PATX107PATX179PATX043PATX092PATX055PATX064PATX053PATX155PATX070PATX153PATX178PATX148

Clu

ster

1

Clu

ster

23

Cluster Enrichment

−2−1012

Quasi-basalClassical

Basal-like

Normalized Gene Expression

-6.34 6.34

Classic

al

Basal-

like

-2

-1

0

1

Moffitt Subtypes (PDX Cohort)

Mod

ule

Cen

troid

Cluster 1 - Module Centroid ComparisonClassical vs. Basal-like PDXs

p = 3.67e-06

C

D

E

A

FBasal-likeClassical

Basal-like

ClassicalQuasi-basal

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 41: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

PATC6942.82 %

49.39 %

7.79 %

9.45 %

54.61 %

35.95 %

PATC124

10.46 %

64.89 %

24.65 %

A

Figure 4: Intratumoral characterization of anti-correlating clusters recapitulate a granular and tumor-localized “classical”, “quasi-basal”, and “basal-like” spectrum

ClassicalQuasi-basalBasal-like

CEN

Clus

ter %

Exp

ress

ed

ClassicalQuasi-basalBasal-like

ClassicalQuasi-basalBasal-like

Male Female

PATC53

B

C

D

E

F

G

H

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 42: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Figure 5: CoDEX methodology prioritizes potential biomarkers by integrating first-degree nodes of anchored dependencies and cluster centrality

Whole Slide

20X

E

DNM2

*

*

**

**

MAPK11

*

*

PATX90 PATX124 PATX148

C F

D

G

B

Gene: 1 - 4.9Gene: 5 - 9.9

Gene: 10 - 14.9

Gene: 15 - 25

Quantile-Normalized Bayes FactorsPATC69

PATC53

DNM2

MAPK11

Pearson-1 1

A

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint

Page 43: * These authors contributed equally to this work † … · 2020. 9. 17. · * These authors contributed equally to this work † Correspondence: acarugo@mdanderson.org, gdraetta@mdanderson.org

Figure 6: Functional validation of CoDEX-informed intratumoral context dependency for C1vC23 associated genetic targets

PATC69 PATC53

PATC69 PATC53

C E

G H

ABCG8

SMAD4ZEB1

ILK

RPS27A

0.00.20.40.60.81.01.21.41.6

PATC69

sgRNA

Viab

ility

(Nor

mal

ized

)

ns

ns

ns

**

ABCG8

SMAD4ZEB1

ILK

RPS27A

0.00.20.40.60.81.01.21.41.6

PATC53

sgRNA

Viab

ility

(Nor

mal

ized

)

**

**

***

FD

A B

Gene: 1 - 4.9Gene: 5 - 9.9

Gene: 10 - 14.9

Gene: 15 - 25

Quantile-Normalized Bayes Factors

KS D Value: 0.328 p value: 2.2x10-16

KS D Value: 0.101 p value: 6.0x10-4

KS D Value: 0.252 p value: 2.2x10-16

KS D Value: 0.281 p value: 2.2x10-16

KS D Value: 0.265 p value: 2.2x10-16

KS D Value: 0.397 p value: 2.2x10-16

Pearson-1 1

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint