www.sciencemag.org/content/357/6352/eaan2507/suppl/DC1

Supplementary Materials for

A pathology atlas of the human cancer transcriptome

Mathias Uhlen,* Cheng Zhang, Sunjae Lee, Evelina Sjöstedt, Linn Fagerberg, Gholamreza Bidkhori, Rui Benfeitas, Muhammad Arif, Zhengtao Liu, Fredrik Edfors,

Kemal Sanli, Kalle von Feilitzen, Per Oksvold, Emma Lundberg, Sophia Hober, Peter Nilsson, Johanna Mattsson, Jochen M. Schwenk, Hans Brunnström, Bengt Glimelius, Tobias Sjöblom, Per-Henrik Edqvist, Dijana Djureinovic, Patrick Micke, Cecilia Lindskog, Adil Mardinoglu,

Fredrik Ponten *Corresponding author. Email: [email protected]

Published 18 August 2017, Science 357, eaan2507 (2017) DOI: 10.1126/science.aan2507

This PDF file includes: Materials and Methods

Figs. S1 to S14

Captions for tables S1 to S21

References

Other supplementary material for this manuscript includes the following: Tables S1 to S21 (Excel format)

Materials and Methods

Sample preparation

Samples of normal and cancer tissues used for protein and mRNA expression analysis, as

described previously (6), were obtained from the Department of Pathology, Uppsala University

Hospital, Uppsala, Sweden as part of the sample collection governed by the Uppsala Biobank

(http://www.uppsalabiobank.uu.se/en/). All human tissue samples used in the present study were

anonymized in accordance with approval and advisory report from the Uppsala Ethical Review

Board (Reference # 2002-577, 2005-338, 2007-159 and 2012-532 (protein) and # 2011-473 and

2012-532 (RNA)).

Cancer patient samples used for mRNA expression and survival analysis were collected

from The Cancer Genome Atlas (TCGA) project from the initial release of Genomic Data

Commons (GDC) on June 6, 2016, and information regarding sex, age and other clinical

information can be found at https://gdc-portal.nci.nih.gov/. Only samples with both clinical info

and transcriptomic data available at that time point were used in this study.

The lung cancer cohort consists of 345 patients who were consecutively operated on at Uppsala University Hospital between 2006 and 2010, as published previously (31). RNA from fresh-frozen tissue was available for 199 of these patients and was used for RNA-seq analysis as previously described (30).

The colorectal cancer cohort is based on U-CAN (http://www.u-can.uu.se/?languageId=1), an infrastructure programme for biobanking, and includes 828 patients with tumor tissue from colorectal cancers in a TMA format and an associated clinical database. All patients were operated on at Uppsala University Hospital between 2010 and 2016. For a selected subset of these patients (n=60), where frozen tumor tissue showed a high fraction of tumor cells, RNA was extracted and used for RNA sequencing based on the same methodologies as for the lung cancer tissue described above.

The hepatocellular carcinoma cell line Hep G2 was obtained from DSMZ, Braunschweig,

Germany (42).

Protein profiling (tissue microarrays and immunohistochemistry)

Candidates for protein profiling in lung and colorectal cancer were selected based on

prognostic association in the TCGA data, availability of antibodies already analyzed by the

Human Protein Atlas project, supportive antibody validation and distinct differentially expressed

staining pattern among the 12 cancer patients available on the Human Protein Atlas. Generation

of tissue microarrays (TMAs), immunohistochemical staining and slide scanning were performed

as previously described (43). Briefly, formalin-fixed, paraffin-embedded (FFPE) tissue samples

were collected from the pathology archives based on hematoxylin and eosin (HE)-stained tissue

sections showing a representative normal histology for each tissue type. Representative cores (1

mm in diameter) were sampled from the FFPE blocks and assembled into TMAs. TMA blocks

were cut into 4-μm-thick sections using waterfall microtomes (Microm HM 355S, Thermo

Fisher Scientific, Fremont, CA, USA), dried at RT overnight and baked at 50°C for 12-24 hours

prior to immunohistochemical staining. Automated immunohistochemistry was performed using

Autostainer 480® instruments (Lab Vision, Fremont, CA, USA). For details on antibodies, see

Table S18. High-resolution digital images were obtained by slide scanning using Scanscope XT

(Aperio, Vista, CA, USA). The images of immunohistochemically stained TMA sections were

evaluated and scored manually using a four-graded scale for staining intensity (negative, weak, moderate or strong) and a six-graded scale for fraction of positive cells (0-1%, 2-10%, 11-25%, 26-50%, 51-75% or >75%).

Transcript profiling (RNA-seq)

Tissue samples were embedded in Optimal Cutting Temperature (O.C.T.) compound and

stored at –80°C. HE-stained frozen sections (4 µm) were prepared from each sample using a

cryostat and the CryoJane® Tape-Transfer System (Instrumedics, St. Louis, MO, USA). Each

slide was examined by a pathologist to ensure sampling of representative normal tissue. Three

sections (10 µm) were cut from each frozen tissue block and collected in a tube for subsequent

RNA extraction. The tissue was homogenized mechanically using a 3-mm steel grinding ball

(VWR). Total RNA was extracted from the cell lines and tissue samples using the RNeasy Mini

Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The extracted RNA

samples were analyzed using either an Experion automated electrophoresis system (Bio-Rad

Laboratories, Hercules, CA, USA) with the standard sensitivity RNA chip or an Agilent 2100

Bioanalyzer system (Agilent Technologies, Palo Alto, CA, USA) with the RNA 6000 Nano

Labchip Kit. Only samples of high-quality RNA (RNA Integrity Number ≥7.5) were used in the

following mRNA sample preparation for sequencing.

Processing of RNA-seq data

RNA sequencing data for 162 samples from 37 tissues and organs from the Human Protein

Atlas, 9,666 samples from 33 cancer types from TCGA, 198 samples from a lung cancer cohort

from a previous study (30) and 59 colorectal cancer samples in the UCAN cohort were

processed/reprocessed using the same pipeline as GDC. In brief, the processed reads were

mapped to the human genome (GRCh38) using STAR v2.4.2a (44). To obtain quantification

scores for all human genes and transcripts across all samples, raw counts were calculated using

HTSeq v0.6.1p1 (45) and then converted to FPKM (fragments per kilobase of exon per million

mapped reads). Gencode annotation v22 was used in HTSeq, and 19,571 protein-coding genes

overlapped with the Human Protein Atlas. The average FPKM value for all individual samples

for each tissue was used to estimate gene expression levels. A cut-off value of 1 FPKM was used

as a detection limit across all tissues.
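
The conversion from raw counts to FPKM and the 1 FPKM detection limit described above can be illustrated with a minimal sketch. The function names are hypothetical, the library size is taken as the total number of mapped fragments (the GDC pipeline may define the denominator differently), and treating a value of exactly 1 FPKM as detected is an assumption.

```python
def fpkm(count, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


def tissue_level(sample_fpkms):
    """Average the per-sample FPKM values of one gene within one tissue."""
    return sum(sample_fpkms) / len(sample_fpkms)


def is_detected(mean_fpkm, cutoff=1.0):
    """Apply the 1 FPKM detection limit (boundary handling is an assumption)."""
    return mean_fpkm >= cutoff
```

For example, 500 fragments on a 2,000 bp gene in a library of 25 million mapped fragments correspond to 10 FPKM.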

RNA-based classification of genes

Each of the 19,571 genes with mapped RNA-seq data was classified into one of six categories for normal tissues and cancers based on the FPKM levels in 37 normal tissues and 33 cancer types, respectively: (1) Not detected: FPKM <1 in all tissues/cancers; (2) Enriched: at least a 5-fold higher FPKM level in one tissue/cancer than in all other tissues/cancers; (3) Group enriched: at least a 5-fold higher average FPKM value in a group of 2-7 tissues/cancers than in all other tissues/cancers; (4) Expressed in all: detected in all 37/33 tissues/cancers with FPKM >1; (5) Tissue enhanced: at least a 5-fold higher FPKM level in one tissue/cancer than the average value of all 37/33 tissues/cancers; and (6) Mixed: the remaining genes detected in 1-36/1-32 tissues/cancers with FPKM >1 that did not fit the above categories.
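
The six categories can be sketched as a single decision function. This is an illustration only: the function name is hypothetical, the rules are assumed to be tested in the order in which they are listed, the group-enriched rule is assumed to compare the mean of the top k tissues with the highest remaining tissue, and values exactly at a threshold are treated as passing it.

```python
def classify_gene(fpkm, detect=1.0, fold=5.0, max_group=7):
    """Assign one expression category to a gene.

    fpkm: list of average FPKM values, one per tissue (or cancer type).
    Rules are checked in the order listed in the text (an assumption).
    """
    values = sorted(fpkm, reverse=True)
    n = len(values)
    if values[0] < detect:
        return "not detected"
    # Enriched: the top tissue is at least `fold` times every other tissue.
    if n > 1 and values[0] >= fold * values[1]:
        return "enriched"
    # Group enriched: the mean of the top k tissues (k = 2..max_group) is at
    # least `fold` times the highest remaining tissue.
    for k in range(2, min(max_group, n - 1) + 1):
        group, rest = values[:k], values[k:]
        if group[-1] >= detect and sum(group) / k >= fold * rest[0]:
            return "group enriched"
    if values[-1] >= detect:
        return "expressed in all"
    # Enhanced: the top tissue is at least `fold` times the overall mean.
    if values[0] >= fold * (sum(values) / n):
        return "enhanced"
    return "mixed"
```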

Differential expression analysis

The significantly down-regulated tissue enriched genes in liver cancer were identified by

differential expression analysis using DESeq2 (46). The raw counts for 10 normal and 365

cancer samples were used as input for DESeq2.

Survival analysis

Based on the FPKM value of each gene, we classified the patients into two groups and

examined their prognoses. In the analysis, we excluded genes with low expression, i.e., those

with a median expression among samples less than FPKM 1. The prognosis of each group of

patients was examined by Kaplan-Meier survival estimators, and the survival outcomes of the

two groups were compared by log-rank tests. To choose the FPKM cut-off that separated the patients most significantly, every FPKM value from the 20th to the 80th percentile was used to group the patients, the difference in survival outcomes between the groups was examined, and the value yielding the lowest log-rank P value was selected.

Additionally, a previously published method, available as the R package ‘maxstat’ (13), for normalization of the optimally selected expression cut-off was employed to evaluate the stability of the results.
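
The cut-off scan described in this section can be sketched in plain Python. This is not the code used in the study: the function names are hypothetical, the log-rank test is a textbook two-group implementation, patients with FPKM above the cut-off form the high-expression group (an assumption), and the percentile interpolation is one of several common conventions. The ‘maxstat’ correction is not reproduced here.

```python
import math


def logrank(times, events, in_high):
    """Two-group log-rank test.

    Returns (p_value, observed_high, expected_high), where the counts
    refer to events in the high-expression group.
    """
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        dead = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dead)
        observed += sum(1 for i in dead if in_high[i])
        expected += d * n_high / n
        if n > 1:
            variance += d * n_high * (n - n_high) * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 1.0, observed, expected
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2)), observed, expected


def percentile(sorted_values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def best_cutoff(fpkm, times, events, low_q=20, high_q=80):
    """Scan FPKM values between two percentiles; keep the lowest P value."""
    ordered = sorted(fpkm)
    lo, hi = percentile(ordered, low_q), percentile(ordered, high_q)
    best = None
    for cut in sorted({v for v in fpkm if lo <= v <= hi}):
        in_high = [v > cut for v in fpkm]
        p, obs, exp = logrank(times, events, in_high)
        if best is None or p < best[1]:
            best = (cut, p, obs, exp)
    return best  # (cut-off, P value, observed events, expected events)
```

Because the lowest P value is selected among many candidate cut-offs, it is optimistically biased, which is why a correction such as the one implemented in ‘maxstat’ is useful as a stability check.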

Defining favorable and unfavorable prognostic genes

Genes with log-rank P values less than 0.001 were defined as prognostic genes. In addition, if the group of patients with high expression of a selected prognostic gene had more observed events than expected events, the gene was defined as an unfavorable prognostic gene; otherwise, it was defined as a favorable prognostic gene. When the statistical method by Hothorn and Lausen was used for sensitivity analysis, prognostic genes were defined as genes with a maximal P value less than 0.01. When hazard ratio (HR) was used for sensitivity analysis, genes for which the high-expression group of patients had an HR of more than 1.2 were defined as unfavorable prognostic genes, and genes for which the low-expression group of patients had an HR of more than 1.2 were defined as favorable prognostic genes.
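
The primary definition (log-rank P value below 0.001, direction given by observed versus expected events in the high-expression group) reduces to a short rule. The function name and return labels below are hypothetical and serve only as an illustration.

```python
def classify_prognostic(p_value, observed_high, expected_high, alpha=0.001):
    """Label a gene from its log-rank result in the high-expression group."""
    if p_value >= alpha:
        return "not prognostic"
    # More events than expected among high expressers: worse outcome.
    return "unfavorable" if observed_high > expected_high else "favorable"
```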

Survival analysis based on a panel of genes with the most prognostic expression

After examining the most prognostic genes for each cancer with their best FPKM cut-offs, we selected the five most significant favorable genes and the five most significant adverse genes as “panel” genes. For each patient, we examined whether each favorable gene was expressed above its best cut-off and each adverse gene was expressed below its best cut-off. If more than 80% of the panel genes met one of these two conditions, we predicted that the patient would be in the better survival group; otherwise, we predicted that the patient would be in the poor survival group. Next, we compared the survival outcomes for these two patient groups using log-rank tests.
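
The panel rule can be sketched as follows. The function names and data layout are hypothetical. The threshold is written as "at least 80%" so that 8 of 10 genes qualifies, in line with the caption of Fig. S13; the text of this section says "more than 80%", so the boundary handling is an assumption.

```python
def favorable_signs(expression, panel):
    """Count panel genes showing a favorable sign in one patient.

    expression: {gene: FPKM}
    panel: {gene: (cutoff, kind)} with kind "favorable" or "adverse".
    """
    count = 0
    for gene, (cutoff, kind) in panel.items():
        value = expression[gene]
        if kind == "favorable" and value > cutoff:
            count += 1
        elif kind == "adverse" and value < cutoff:
            count += 1
    return count


def predict_group(expression, panel, min_fraction=0.8):
    """Predict 'better' or 'poor' survival from the share of favorable signs."""
    hits = favorable_signs(expression, panel)
    return "better" if hits >= min_fraction * len(panel) else "poor"
```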

Survival analysis of lung and colorectal cancer validation cohorts

The protein expression scores, based on staining intensity (score 1-4) and fraction of stained cells (score 1-6), were multiplied in order to generate a protein level score between one and 24.

This score was used for subsequent survival analysis using a best separation cut-off.

Gene ontology analysis

Enriched gene ontology terms in sets of enriched genes were determined using DAVID

Bioinformatics Resource v 6.8 (47). Only the biological gene ontology term ‘GOTERM_BP_5’

was used to obtain reliable and interpretable enriched terms.

Visualization of enriched GO terms

The enriched GO terms were visualized in a network plot using Cytoscape (version 3.2.1)

with the external package EnrichmentMap (48). An FDR of 0.05 was used as a threshold for the

selection of enriched GO terms. The overlap coefficient 0.8 and combined constant 0.8 were

selected for similarity cut-offs.

Generality and directionality in the bubble plot

All GO terms that were over-represented by favorable or unfavorable prognostic genes for

at least one cancer were visualized in the plot. The bubbles are located based on two parameters

of the corresponding GO terms defined herein as generality (y-axis) and directionality (x-axis),

calculated as follows:

\[ \mathrm{Generality} = \sum_{i=1}^{n} \left( N_{\mathrm{fav},i} + N_{\mathrm{unf},i} \right) \]

\[ \mathrm{Directionality} = \sum_{i=1}^{n} N_{\mathrm{fav},i} - \sum_{i=1}^{n} N_{\mathrm{unf},i} \]

where i indexes the cancer types and n is the total number of cancer types. Nfav,i and Nunf,i are binary variables that equal 1 if the GO term is over-represented among the favorable and unfavorable prognostic genes, respectively, of the i-th cancer, and 0 otherwise.
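
As a worked illustration of the two bubble-plot coordinates, the sketch below counts, for one GO term, the cancers in which it is over-represented among favorable and unfavorable prognostic genes. The function name, the dictionary layout and the example cancer and term names are hypothetical.

```python
def generality_directionality(fav_terms_by_cancer, unf_terms_by_cancer, term):
    """Return (generality, directionality) for one GO term.

    Each argument maps a cancer type to the set of GO terms that are
    over-represented among its favorable (or unfavorable) prognostic genes.
    """
    n_fav = sum(1 for terms in fav_terms_by_cancer.values() if term in terms)
    n_unf = sum(1 for terms in unf_terms_by_cancer.values() if term in terms)
    return n_fav + n_unf, n_fav - n_unf
```

A term that is over-represented among unfavorable genes in three cancers and among favorable genes in one cancer has a generality of 4 and a directionality of -2.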

Generation of cancer-specific co-expression networks

For each cancer transcriptome, we first removed genes with low expression by disregarding the genes whose mean expression fell in the bottom 25% (mean values shown in Table S11). We then calculated Pearson’s correlation coefficients between the expression levels of all pairs of the remaining genes, selected the gene pairs with the top 1% of correlation values in each cancer type and used them to construct cancer-specific co-expression networks. Co-expression networks are available at http://inetmodels.com.
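
The network construction can be sketched on a small scale in plain Python. The function names are hypothetical, and "top 1%" is interpreted here as the highest signed (positive) correlation values, which is an assumption; a real analysis over thousands of genes would use a vectorized implementation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, low_fraction=0.25, top_fraction=0.01):
    """expr: {gene: [FPKM per sample]}. Returns kept (gene, gene, r) edges."""
    means = {g: sum(v) / len(v) for g, v in expr.items()}
    ranked = sorted(expr, key=means.get)
    # Drop the genes whose mean expression is in the bottom fraction.
    kept = ranked[int(len(ranked) * low_fraction):]
    scored = [(pearson(expr[a], expr[b]), a, b) for a, b in combinations(kept, 2)]
    scored.sort(reverse=True)
    n_top = max(1, int(len(scored) * top_fraction))
    return [(a, b, r) for r, a, b in scored[:n_top]]
```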

Hallmark gene list

The hallmark genes used in this study were collected based on related biological functions

in the MSigDB and KEGG databases, and the detailed list is included in Table S19.

Analysis of co-expression networks

In each co-expression network, co-expression clusters, i.e., groups of highly co-expressed genes, were identified using a modularity-based community detection algorithm based on random walks, as implemented in the cluster_walktrap function of the R igraph package (49, 50). Here, we excluded small co-expression clusters with fewer than five genes from further analysis.

When visualizing co-expression networks by their co-expression clusters (Figure 5C), interactions among co-expression clusters were identified based on an interaction score of two clusters, A and B (IAB), which was defined by the expected co-expression links (EAB) and the observed co-expression links (OAB) between the clusters, as described below:

Cluster interactions were called when $I_{AB} > 1$,

\[ I_{AB} = \frac{O_{AB} - E_{AB}}{E_{AB}} \tag{1} \]

\[ E_{AB} = \sum_{a \in A} \sum_{b \in B} \frac{k_a k_b}{2N} \tag{2} \]

where a and b respectively indicate a node of cluster A and a node of cluster B, ka and kb

respectively indicate the degree of connectivity of node a and b, and N indicates the number of

all network edges.
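
Equations 1 and 2 can be evaluated directly from an edge list, as in the following sketch. The function name and input format are hypothetical, the two clusters are assumed to be disjoint, and returning 0 when no links are expected is an assumption made to avoid a division by zero.

```python
def interaction_score(edges, cluster_a, cluster_b):
    """Interaction score I_AB between two disjoint clusters.

    edges: iterable of (gene, gene) pairs of the whole network.
    cluster_a, cluster_b: sets of gene names.
    """
    edges = list(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n_edges = len(edges)
    # O_AB: links that connect a node of A with a node of B.
    observed = sum(
        1 for u, v in edges
        if (u in cluster_a and v in cluster_b) or (u in cluster_b and v in cluster_a)
    )
    # E_AB: sum over node pairs of k_a * k_b / (2N).
    expected = sum(
        degree.get(a, 0) * degree.get(b, 0) for a in cluster_a for b in cluster_b
    ) / (2 * n_edges)
    if expected == 0:
        return 0.0
    return (observed - expected) / expected
```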

Next, we examined co-expression clusters enriched in genes associated with hallmarks of

cancer using hypergeometric tests (Figure 5A-C). For the examination, we selected genes that

were associated with hallmarks of cancer from the MSigDB and KEGG pathway (16, 20).

Likewise, we examined co-expression clusters that were enriched in favorable or adverse genes

by hypergeometric tests (Figure 5C).
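
The hypergeometric enrichment test used here can be written with exact integer arithmetic. The sketch below is an illustration with hypothetical names: it returns the upper-tail probability of seeing at least the observed overlap between a cluster and a gene set drawn from a common gene universe. The choice of universe (for example, all genes in the network) is an assumption that the text does not specify.

```python
from math import comb


def hypergeom_pvalue(overlap, cluster_size, set_size, universe):
    """P(X >= overlap) when cluster_size genes are drawn from `universe`
    genes, of which set_size belong to the gene set of interest."""
    upper = min(cluster_size, set_size)
    total = comb(universe, cluster_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, with a universe of 10 genes, a gene set of 4 and a cluster of 3, the probability of an overlap of at least 2 is (36 + 4) / 120 = 1/3.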

Reconstruction of personalized genome-scale metabolic models (GSMMs)

Personalized models were reconstructed based on the RNA-seq data and a previously

developed task-driven model reconstruction (tINIT) algorithm (26). The tINIT algorithm

employs defined metabolic tasks for imposing constraints on the functionality of the

reconstructed models. In this context, only cell growth was defined as required for tumor cells.

This metabolic task was used as an input to the tINIT algorithm so that the reconstructed personalized models were capable of growth while remaining consistent with the RNA-seq data. A generic GSMM for human cancer (Table S20) was used as the reference model for the tINIT algorithm. A time limit of 10 h was set, and as a result, 6,753 personalized models were reconstructed (Table S21). The personalized GSMMs are available at

https://www.ebi.ac.uk/biomodels (51) with the accession numbers MODEL1707110000-

MODEL1707116752.

Metabolic pathway enrichment analysis

The genes related to a specific metabolic pathway were defined according to the Gene-

Reaction relationship from the generic GSMM for human cancer. A gene set was regarded as

enriched in a specific metabolic pathway if it significantly (Padj <0.05) overlapped with the

genes related to the metabolic pathway using the hypergeometric test.

Fig. S1 Global expression pattern of protein-coding genes in human tissues and cancers.

Heat map showing the pairwise correlation between all 37 normal tissues and 33 TCGA cancers

based on transcript expression levels of 19,571 genes. The average FPKM values for each gene

and tissue/cancer were used in the analysis.

Fig. S2 Classification of protein-coding genes in human tissues and cancers. The number of

protein-coding genes classified in each expression category based on the transcript expression

level of 33 cancers from TCGA (A) and 37 normal tissues (B).

Fig. S3 GO term enrichment analysis of cancer-specific house-keeping genes. Network

visualization of enriched GO terms, in which the node sizes indicate the number of genes in the

corresponding GO terms, and edge widths indicate the number of genes shared between the two

linked GO terms.

Fig. S4 PCA plot showing the similarities in expression of 19,571 protein-coding genes

among 21 subcancer types. The short names of the subcancer types follow the naming of

TCGA which are provided in Table S4.

Fig. S5 Overall survival analysis for 17 cancer types. Kaplan-Meier plots showing the overall

survival rates of patients from each of the 17 cancer types. We showed overall survival rates

(solid line) with 95% confidence intervals (dashed lines for upper and lower bounds). Here we

found prostate cancer and testis cancer had the best 3-year survival rates and glioma and

pancreatic cancer had the lowest 3-year survival rates.

Fig. S6 Comparison between maximally selected log-rank P values used in our study and another method described by Hothorn and Lausen for 17 cancer types. Scatter plots show the correlation between the expression cut-offs for stratifying patients (left) and log scale P values (right) between the method used in this paper and the method described by Hothorn and Lausen (13). The two alternative statistical methods showed highly similar results.

Fig. S7 Bubble plots showing the common enriched GO terms among the 17 Human Pathology Atlas cancer types based on prognostic genes defined by alternative P value or HR cut-offs.

Fig. S8 Overlapping of hallmark genes with prognostic genes of cancers. Bar plot showing

the fraction of hallmark genes that overlap with prognostic genes for all and each of the 17

cancers.

Fig. S9. Co-expression network analyses with prognostic genes selected based on alternative parameters. (A) Overlap of hallmark genes with prognostic genes of cancers. (B) Network plot showing co-expression clusters of lung cancer, overlapped with prognostic genes. The gray, yellow and red colors of the nodes indicate that the cluster was significantly enriched with hallmark genes, prognostic genes or both, respectively. (C) Bar plot showing the fraction of prognostic genes that are themselves hallmark genes (red), co-expressed in “hallmark” gene clusters (pink), or not co-expressed with “hallmark” genes (gold).

Fig. S10 Co-expression cluster analysis for 17 cancers. Network plots for the co-expression

clusters for 17 major cancers. All nodes indicate gene co-expression clusters, and edges indicate

significant co-expression links connected between clusters. Gray, yellow and red nodes indicate

clusters that are significantly enriched with hallmark genes, prognostic genes or both,

respectively.

Fig. S11 Summary of model statistics for personalized GSMMs from 17 cancer types. Box

plot showing the number of reactions, metabolites and genes for personalized GSMMs from 17

different cancer types.

Fig. S12 Metabolic functions of non-toxic genes that are essential for tumor growth and

conserved in 17 cancer types. Circos plot showing the 32 conserved genes that are essential for

tumor growth and their corresponding metabolic functions.

Fig. S13 Validation of panel genes for lung cancer. Kaplan-Meier plot for patient groups stratified by the panel genes in an independent lung cancer cohort, showing high statistical significance (log-rank P = 0.0154). Exactly the same expression cut-offs as in the discovery cohort were used for each gene, and each patient in the group marked as ‘good’ has 8 out of the 10 panel genes showing a favorable sign (high expression of favorable genes or low expression of unfavorable genes).

Fig. S14 Validation of selected genes with a prognostic effect in colorectal cancer.

Kaplan-Meier plots for RNA level separation from the TCGA cohort, the HPA cohort and

protein level separation are shown in the first, second and third columns, respectively. The 3-

year recurrence was used as event for the HPA cohort because of the short follow-up time. The

log-rank P values are shown in the lower left corner of each Kaplan-Meier plot. High and low

protein staining are shown in the fourth and fifth columns. Protein expression levels of the

targets in all Human Pathology Atlas cancers are shown in the last column.

Table S1. Summary of 33 TCGA cancer types.

Table S2. Categories of protein-coding genes in normal tissues and cancers.

Table S3. GO term enrichment analysis for cancer-specific house-keeping genes from DAVID.

Table S4. Summary of the 17 major cancer types examined in this study.

Table S5. The number of prognostic genes for 17 major cancer types.

Table S6. Expression cut-off for the best stratification and results of the survival analysis for all protein-

coding genes in 17 major cancer types.

Table S7. Prognostic genes and their log-rank P values involved in prognostic panels of the Big 5 cancers

shown in Figure 2A.

Table S8. Summary of all prognostic genes and the respective cancer types for which they are prognostic

markers.

Table S9. Enriched GO terms for each cancer type with prognostic genes defined by two different log rank

P value cutoffs and HR cutoff.

Table S10. Summary of unfavorable prognostic cell cycle genes and the respective cancer types for which

they are prognostic markers.

Table S11. Hypergeometric P values of the overlap between favorable prognostic genes for each cancer

and genes with elevated expression in their supposed tissues of origin.

Table S12. Statistical features of cancer-specific co-expression networks for 17 cancer types. All the

networks are normalized and of the same size with 14,293 genes and 1,021,378 co-expressed gene pairs for

fair comparison.

Table S13. Summary of genes involved in the co-expression cluster of lung cancer in Figure 5B.

Table S14. Statistical summary of cancer-specific co-expression networks in cancer.

Table S15. Statistical summary of genome-scale metabolic models for all patients.

Table S16. Summary of metabolic pathways associated with the essential genes in 17 cancers.

Table S17. Short names for all 17 cancer types.

Table S18. Antibodies used for protein profiling of the selected genes.

Table S19. Terms and full gene list for hallmark of cancer.

Table S20. Reference GSMM for reconstruction of personalized GSMMs.

Table S21. Complete list of patient IDs and corresponding cancer types for reconstructed GSMMs.

References

1. D. J. Brennan, D. P. O’Connor, E. Rexhepaj, F. Ponten, W. M. Gallagher, Antibody-based proteomics: Fast-tracking molecular diagnostics in oncology. Nat. Rev. Cancer 10, 605–617 (2010). doi:10.1038/nrc2902

2. E. Björnson, B. Mukhopadhyay, A. Asplund, N. Pristovsek, R. Cinar, S. Romeo, M. Uhlen, G. Kunos, J. Nielsen, A. Mardinoglu, Stratification of hepatocellular carcinoma patients based on acetate utilization. Cell Rep. 13, 2014–2026 (2015). doi:10.1016/j.celrep.2015.10.045

3. A. Mardinoglu, J. Nielsen, New paradigms for metabolic modeling of human cells. Curr. Opin. Biotechnol. 34, 91–97 (2015). doi:10.1016/j.copbio.2014.12.013

4. S. Lee, A. Mardinoglu, C. Zhang, D. Lee, J. Nielsen, Dysregulated signaling hubs of liver lipid metabolism reveal hepatocellular carcinoma pathogenesis. Nucleic Acids Res. 44, 5529–5539 (2016). doi:10.1093/nar/gkw462

5. The Cancer Genome Atlas (TCGA) Research Network, J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, J. M. Stuart, The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013). doi:10.1038/ng.2764

6. M. Uhlén, L. Fagerberg, B. M. Hallström, C. Lindskog, P. Oksvold, A. Mardinoglu, Å. Sivertsson, C. Kampf, E. Sjöstedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A.-K. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P.-H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen, F. Pontén, Tissue-based map of the human proteome. Science 347, 1260419 (2015). doi:10.1126/science.1260419

7. J. Lonsdale, J. Thomas, M. Salvatore, R. Phillips, E. Lo, S. Shad, R. Hasz, G. Walters, F. Garcia, N. Young, B. Foster, M. Moser, E. Karasik, B. Gillard, K. Ramsey, S. Sullivan, J. Bridge, H. Magazine, J. Syron, J. Fleming, L. Siminoff, H. Traino, M. Mosavel, L. Barker, S. Jewell, D. Rohrer, D. Maxim, D. Filkins, P. Harbach, E. Cortadillo, B. Berghuis, L. Turner, E. Hudson, K. Feenstra, L. Sobin, J. Robb, P. Branton, G. Korzeniewski, C. Shive, D. Tabor, L. Qi, K. Groch, S. Nampally, S. Buia, A. Zimmerman, A. Smith, R. Burges, K. Robinson, K. Valentino, D. Bradbury, M. Cosentino, N. Diaz-Mayoral, M. Kennedy, T. Engel, P. Williams, K. Erickson, K. Ardlie, W. Winckler, G. Getz, D. DeLuca, D. MacArthur, M. Kellis, A. Thomson, T. Young, E. Gelfand, M. Donovan, Y. Meng, G. Grant, D. Mash, Y. Marcus, M. Basile, J. Liu, J. Zhu, Z. Tu, N. J. Cox, D. L. Nicolae, E. R. Gamazon, H. K. Im, A. Konkashbaev, J. Pritchard, M. Stevens, T. Flutre, X. Wen, E. T. Dermitzakis, T. Lappalainen, R. Guigo, J. Monlong, M. Sammeth, D. Koller, A. Battle, S. Mostafavi, M. McCarthy, M. Rivas, J. Maller, I.
Rusyn, A. Nobel, F. Wright, A. Shabalin, M. Feolo, N. Sharopova, A. Sturcke, J. Paschal, J. M. Anderson, E. L. Wilder, L. K. Derr, E. D. Green, J. P. Struewing, G. Temple, S. Volpi, J. T. Boyer, E. J. Thomson, M. S. Guyer, C. Ng, A. Abdallah, D. Colantuoni, T. R. Insel, S. E. Koester, A. R. Little, P. K. Bender, T. Lehner, Y. Yao, C. C. Compton, J. B. Vaught, S. Sawyer, N. C. Lockhart, J. Demchok, H. F. Moore, The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013). doi:10.1038/ng.2653

8. L. Collado-Torres, A. Nellore, K. Kammers, S. E. Ellis, M. A. Taub, K. D. Hansen, A. E. Jaffe, B. Langmead, J. T. Leek, Reproducible RNA-seq analysis using recount2. Nat. Biotechnol. 35, 319–321 (2017). doi:10.1038/nbt.3838

9. L. Peng, X. W. Bian, D. K. Li, C. Xu, G. M. Wang, Q. Y. Xia, Q. Xiong, Large-scale RNA-Seq transcriptome analysis of 4043 cancers and 548 normal tissue controls across 12 TCGA cancer types. Sci. Rep. 5, 13413 (2015). doi:10.1038/srep13413

10. F. Edfors, F. Danielsson, B. M. Hallström, L. Käll, E. Lundberg, F. Pontén, B. Forsström, M. Uhlén, Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 12, 883 (2016). doi:10.15252/msb.20167144 Medline

11. D. Hanahan, R. A. Weinberg, Hallmarks of cancer: The next generation. Cell 144, 646–674 (2011). doi:10.1016/j.cell.2011.02.013 Medline

12. C. Kandoth, M. D. McLellan, F. Vandin, K. Ye, B. Niu, C. Lu, M. Xie, Q. Zhang, J. F. McMichael, M. A. Wyczalkowski, M. D. M. Leiserson, C. A. Miller, J. S. Welch, M. J. Walter, M. C. Wendl, T. J. Ley, R. K. Wilson, B. J. Raphael, L. Ding, Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013). doi:10.1038/nature12634 Medline

13. T. Hothorn, B. Lausen, On the exact distribution of maximally selected rank statistics. Comput. Stat. Data Anal. 43, 121–137 (2003). doi:10.1016/S0167-9473(02)00225-6

14. B. Hjelm, D. J. Brennan, N. Zendehrokh, J. Eberhard, B. Nodin, A. Gaber, F. Pontén, H. Johannesson, K. Smaragdi, C. Frantz, S. Hober, L. B. Johnson, S. Påhlman, K. Jirström, M. Uhlen, High nuclear RBM3 expression is associated with an improved prognosis in colorectal cancer. Proteomics Clin. Appl. 5, 624–635 (2011). doi:10.1002/prca.201100020 Medline

15. C. J. Creighton, M. Morgan, P. H. Gunaratne, D. A. Wheeler, R. A. Gibbs, A. Gordon Robertson, A. Chu, R. Beroukhim, K. Cibulskis, S. Signoretti, F. Vandin, H.-T. Wu, B. J. Raphael, R. G. W. Verhaak, P. Tamboli, W. Torres-Garcia, R. Akbani, J. N. Weinstein, V. Reuter, J. J. Hsieh, A. Rose Brannon, A. Ari Hakimi, A. Jacobsen, G. Ciriello, B. Reva, C. J. Ricketts, W. Marston Linehan, J. M. Stuart, W. Kimryn Rathmell, H. Shen, P. W. Laird, D. Muzny, C. Davis, M. Morgan, L. Xi, K. Chang, N. Kakkar, L. R. Treviño, S. Benton, J. G. Reid, D. Morton, H. Doddapaneni, Y. Han, L. Lewis, H. Dinh, C. Kovar,
Y. Zhu, J. Santibanez, M. Wang, W. Hale, D. Kalra, C. J. Creighton, D. A. Wheeler, R. A. Gibbs, G. Getz, K. Cibulskis, M. S. Lawrence, C. Sougnez, S. L. Carter, A. Sivachenko, L. Lichtenstein, C. Stewart, D. Voet, S. Fisher, S. B. Gabriel, E. Lander, R. Beroukhim, S. E. Schumacher, B. Tabak, G. Saksena, R. C. Onofrio, S. L. Carter, A. D. Cherniack, J. Gentry, K. Ardlie, C. Sougnez, G. Getz, S. B. Gabriel, M. Meyerson, A. Gordon Robertson, A. Chu, H.-J. E. Chun, A. J. Mungall, P. Sipahimalani, D. Stoll, A. Ally, M. Balasundaram, Y. S. N. Butterfield, R. Carlsen, C. Carter, E. Chuah, R. J. N. Coope, N. Dhalla, S. Gorski, R. Guin, C. Hirst, M. Hirst, R. A. Holt, C. Lebovitz, D. Lee, H. I. Li, M. Mayo, R. A. Moore, E. Pleasance, P. Plettner, J. E. Schein, A. Shafiei, J. R. Slobodan, A. Tam, N. Thiessen, R. J. Varhol, N. Wye, Y. Zhao, I. Birol, S. J. M. Jones, M. A. Marra, J. T. Auman, D. Tan, C. D. Jones, K. A. Hoadley, P. A. Mieczkowski, L. E. Mose, S. R. Jefferys, M. D. Topal, C. Liquori, Y. J. Turman, Y. Shi, S. Waring, E. Buda, J. Walsh, J. Wu, T. Bodenheimer, A. P. Hoyle, J. V. Simons, M. G. Soloway, S. Balu, J. S. Parker, D. Neil Hayes, C. M. Perou, R. Kucherlapati, P. Park, H. Shen, T. Triche Jr., D. J. Weisenberger, P. H. Lai, M. S. Bootwalla, D. T. Maglinte, S. Mahurkar, B. P. Berman, D. J. Van Den Berg, L. Cope, S. B. Baylin, P. W. Laird, C. J. Creighton, D. A. Wheeler, G. Getz, M. S. Noble, D. DiCara, H. Zhang, J. Cho, D. I. Heiman, N. Gehlenborg, D. Voet, W. Mallard, P. Lin, S. Frazer, P. Stojanov, Y. Liu, L. Zhou, J. Kim, M. S. Lawrence, L. Chin, F. Vandin, H.-T. Wu, B. J. Raphael, C. Benz, C. Yau, S. M. Reynolds, I. Shmulevich, R. G. W. Verhaak, W. Torres-Garcia, R. Vegesna, H. Kim, W. Zhang, D. Cogdell, E. Jonasch, Z. Ding, Y. Lu, R. Akbani, N. Zhang, A. K. Unruh, T. D. Casasent, C. Wakefield, D. Tsavachidou, L. Chin, G. B. Mills, J. N. Weinstein, A. Jacobsen, A. Rose Brannon, G. Ciriello, N. Schultz, A. Ari Hakimi, B. Reva, Y. Antipin, J. Gao, E. 
Cerami, B. Gross, B. Arman Aksoy, R. Sinha, N. Weinhold, S. Onur Sumer, B. S. Taylor, R. Shen, I. Ostrovnaya, J. J. Hsieh, M. F. Berger, M. Ladanyi, C. Sander, S. S. Fei, A. Stout, P. T. Spellman, D. L. Rubin, T. T. Liu, J. M. Stuart, S. Ng, E. O. Paull, D. Carlin, T. Goldstein, P. Waltman, K. Ellrott, J. Zhu, D. Haussler, P. H. Gunaratne, W. Xiao, C. Shelton, J. Gardner, R. Penny, M. Sherman, D. Mallery, S. Morris, J. Paulauskis, K. Burnett, T. Shelton, S. Signoretti, W. G. Kaelin, T. Choueiri, M. B. Atkins, R. Penny, K. Burnett, D. Mallery, E. Curley, S. Tickoo, V. Reuter, W. Kimryn Rathmell, L. Thorne, L. Boice, M. Huang, J. C. Fisher, W. Marston Linehan, C. D. Vocke, J. Peterson, R. Worrell, M. J. Merino, L. S. Schmidt, P. Tamboli, B. A. Czerniak, K. D. Aldape, C. G. Wood, J. Boyd, J. E. Weaver, M. V. Iacocca, N. Petrelli, G. Witkin, J. Brown, C. Czerwinski, L. Huelsenbeck-Dill, B. Rabeno, J. Myers, C. Morrison, J. Bergsten, J. Eckman, J. Harr, C. Smith, K. Tucker, L. Anne Zach, W. Bshara, C. Gaudioso, C. Morrison, R. Dhir, J. Maranchie, J. Nelson, A. Parwani, O. Potapova, K. Fedosenko, J. C. Cheville, R. Houston Thompson, S. Signoretti, W. G. Kaelin, M. B. Atkins, S. Tickoo, V. Reuter, W. Marston Linehan, C. D. Vocke, J. Peterson, M. J. Merino, L. S. Schmidt, P. Tamboli, J. M. Mosquera, M. A. Rubin, M. L. Blute, W. Kimryn Rathmell, T. Pihl, M. Jensen, R. Sfeir, A. Kahn, A. Chu, P. Kothiyal, E. Snyder,
J. Pontius, B. Ayala, M. Backus, J. Walton, J. Baboud, D. Berton, M. Nicholls, D. Srinivasan, R. Raman, S. Girshik, P. Kigonya, S. Alonso, R. Sanbhadti, S. Barletta, D. Pot, M. Sheth, J. A. Demchok, T. Davidsen, Z. Wang, L. Yang, R. W. Tarnuzzer, J. Zhang, G. Eley, M. L. Ferguson, K. R. Mills Shaw, M. S. Guyer, B. A. Ozenberger, H. J. Sofia, Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013). doi:10.1038/nature12222 Medline

16. A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, J. P. Mesirov, Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005). doi:10.1073/pnas.0506580102 Medline

17. H. A. Edmondson, P. E. Steiner, Primary carcinoma of the liver: A study of 100 cases among 48,900 necropsies. Cancer 7, 462–503 (1954). doi:10.1002/1097-0142(195405)7:3<462:AID-CNCR2820070308>3.0.CO;2-E Medline

18. T. M. Pawlik, A. L. Gleisner, R. A. Anders, L. Assumpcao, W. Maley, M. A. Choti, Preoperative assessment of hepatocellular carcinoma tumor grade using needle biopsy: Implications for transplant eligibility. Ann. Surg. 245, 435–442 (2007). doi:10.1097/01.sla.0000250420.73854.ad Medline

19. A. J. Simpson, O. L. Caballero, A. Jungbluth, Y. T. Chen, L. J. Old, Cancer/testis antigens, gametogenesis and cancer. Nat. Rev. Cancer 5, 615–625 (2005). doi:10.1038/nrc1669 Medline

20. M. Kanehisa, M. Furumichi, M. Tanabe, Y. Sato, K. Morishima, KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017). doi:10.1093/nar/gkw1092 Medline

21. T. I. Zack, S. E. Schumacher, S. L. Carter, A. D. Cherniack, G. Saksena, B. Tabak, M. S. Lawrence, C. Z. Zhang, J. Wala, C. H. Mermel, C. Sougnez, S. B. Gabriel, B. Hernandez, H. Shen, P. W. Laird, G. Getz, M. Meyerson, R. Beroukhim, Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013). doi:10.1038/ng.2760 Medline

22. N. N. Pavlova, C. B. Thompson, The emerging hallmarks of cancer metabolism. Cell Metab. 23, 27–47 (2016). doi:10.1016/j.cmet.2015.12.006 Medline

23. M. G. Vander Heiden, R. J. DeBerardinis, Understanding the intersections between metabolism and cancer biology. Cell 168, 657–669 (2017). doi:10.1016/j.cell.2016.12.039 Medline

24. P. Ghaffari, A. Mardinoglu, J. Nielsen, Cancer metabolism: A modeling perspective. Front. Physiol. 6, 382 (2015). doi:10.3389/fphys.2015.00382 Medline

25. A. Mardinoglu, R. Agren, C. Kampf, A. Asplund, M. Uhlen, J. Nielsen, Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat. Commun. 5, 3083 (2014). doi:10.1038/ncomms4083 Medline

26. R. Agren, A. Mardinoglu, A. Asplund, C. Kampf, M. Uhlen, J. Nielsen, Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling. Mol. Syst. Biol. 10, 721 (2014). doi:10.1002/msb.145122 Medline

27. A. Mardinoglu, E. Bjornson, C. Zhang, M. Klevstig, S. Söderlund, M. Ståhlman, M. Adiels, A. Hakkarainen, N. Lundbom, M. Kilicarslan, B. M. Hallström, J. Lundbom, B. Vergès, P. H. R. Barrett, G. F. Watts, M. J. Serlie, J. Nielsen, M. Uhlén, U. Smith, H.-U. Marschall, M.-R. Taskinen, J. Boren, Personal model-assisted identification of NAD(+) and glutathione metabolism as intervention target in NAFLD. Mol. Syst. Biol. 13, 916 (2017). doi:10.15252/msb.20167422 Medline

28. L. Jerby-Arnon, N. Pfetzer, Y. Y. Waldman, L. McGarry, D. James, E. Shanks, B. Seashore-Ludlow, A. Weinstock, T. Geiger, P. A. Clemons, E. Gottlieb, E. Ruppin, Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell 158, 1199–1209 (2014). doi:10.1016/j.cell.2014.07.027 Medline

29. C. Zhang, Q. Hua, Applications of genome-scale metabolic models in biotechnology and systems medicine. Front. Physiol. 6, 413 (2016). doi:10.3389/fphys.2015.00413 Medline

30. D. Djureinovic, B. M. Hallström, M. Horie, J. S. M. Mattsson, L. La Fleur, L. Fagerberg, H. Brunnström, C. Lindskog, K. Madjar, J. Rahnenführer, S. Ekman, E. Ståhle, H. Koyi, E. Brandén, K. Edlund, J. G. Hengstler, M. Lambe, A. Saito, J. Botling, F. Pontén, M. Uhlén, P. Micke, Profiling cancer testis antigens in non-small-cell lung cancer. JCI Insight 1, e86837 (2016). doi:10.1172/jci.insight.86837 Medline

31. P. Micke, J. S. M. Mattsson, D. Djureinovic, B. Nodin, K. Jirström, L. Tran, P. Jönsson, M. Planck, J. Botling, H. Brunnström, The impact of the Fourth Edition of the WHO Classification of Lung Tumours on histological classification of resected pulmonary NSCCs. J. Thorac. Oncol. 11, 862–872 (2016). doi:10.1016/j.jtho.2016.01.020 Medline

32. T. Tanaka, G. Kutomi, T. Kajiwara, K. Kukita, V. Kochin, T. Kanaseki, T. Tsukahara, Y. Hirohashi, T. Torigoe, Y. Okamoto, K. Hirata, N. Sato, Y. Tamura, Cancer-associated oxidoreductase ERO1-α drives the production of VEGF via oxidative protein folding and regulating the mRNA level. Br. J. Cancer 114, 1227–1234 (2016). doi:10.1038/bjc.2016.105 Medline

33. K. Katono, Y. Sato, S.-X. Jiang, M. Kobayashi, K. Saito, R. Nagashio, S. Ryuge, Y. Satoh, M. Saegusa, N. Masuda, Clinicopathological significance of S100A10 expression in lung adenocarcinomas. Asian Pac. J. Cancer Prev. 17, 289–294 (2016). doi:10.7314/APJCP.2016.17.1.289 Medline

34. K. Saito, M. Kobayashi, R. Nagashio, S. Ryuge, K. Katono, H. Nakashima, B. Tsuchiya, S.-X. Jiang, M. Saegusa, Y. Satoh, N. Masuda, Y. Sato, S100A16 is a prognostic marker for lung adenocarcinomas. Asian Pac. J. Cancer Prev. 16, 7039–7044 (2015). doi:10.7314/APJCP.2015.16.16.7039 Medline

35. F. Penault-Llorca, N. Radosevic-Robin, Ki67 assessment in breast cancer: An update. Pathology 49, 166–171 (2017). doi:10.1016/j.pathol.2016.11.006 Medline

36. J. N. Jakobsen, J. B. Sørensen, Clinical impact of Ki-67 labeling index in non-small cell lung cancer. Lung Cancer 79, 1–7 (2013). doi:10.1016/j.lungcan.2012.10.008 Medline

37. M. Younes, R. W. Brown, M. Stephenson, M. Gondo, P. T. Cagle, Overexpression of Glut1 and Glut3 in stage I nonsmall cell lung carcinoma is associated with poor survival. Cancer 80, 1046–1051 (1997). doi:10.1002/(SICI)1097-0142(19970915)80:6<1046:AID-CNCR6>3.0.CO;2-7 Medline

38. C. K. Jung, J. H. Jung, G. S. Park, A. Lee, C. S. Kang, K. Y. Lee, Expression of transforming acidic coiled-coil containing protein 3 is a novel independent prognostic marker in non-small cell lung cancer. Pathol. Int. 56, 503–509 (2006). doi:10.1111/j.1440-1827.2006.01998.x Medline

39. K. Magnusson, G. Gremel, L. Rydén, V. Pontén, M. Uhlén, A. Dimberg, K. Jirström, F. Pontén, ANLN is a prognostic biomarker independent of Ki-67 and essential for cell cycle progression in primary breast cancer. BMC Cancer 16, 904 (2016). doi:10.1186/s12885-016-2923-8 Medline

40. C. Suzuki, Y. Daigo, N. Ishikawa, T. Kato, S. Hayama, T. Ito, E. Tsuchiya, Y. Nakamura, ANLN plays a critical role in human lung carcinogenesis through the activation of RHOA and by involvement in the phosphoinositide 3-kinase/AKT pathway. Cancer Res. 65, 11314–11325 (2005). doi:10.1158/0008-5472.CAN-05-1507 Medline

41. D. Aran, M. Sirota, A. J. Butte, Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015). doi:10.1038/ncomms9971 Medline

42. D. P. Aden, A. Fogel, S. Plotkin, I. Damjanov, B. B. Knowles, Controlled synthesis of HBsAg in a differentiated human liver carcinoma-derived cell line. Nature 282, 615–616 (1979). doi:10.1038/282615a0 Medline

43. C. Kampf, I. Olsson, U. Ryberg, E. Sjöstedt, F. Pontén, Production of tissue microarrays, immunohistochemistry staining and digitalization within the human protein atlas. J. Vis. Exp. 3620, 3620 (2012). doi:10.3791/3620 Medline

44. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). doi:10.1093/bioinformatics/bts635 Medline

45. S. Anders, P. T. Pyl, W. Huber, HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015). doi:10.1093/bioinformatics/btu638 Medline

46. M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). doi:10.1186/s13059-014-0550-8 Medline

47. D. W. Huang, B. T. Sherman, Q. Tan, J. Kir, D. Liu, D. Bryant, Y. Guo, R. Stephens, M. W. Baseler, H. C. Lane, R. A. Lempicki, DAVID Bioinformatics Resources: Expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35 (suppl. 2), W169–W175 (2007). doi:10.1093/nar/gkm415 Medline

48. D. Merico, R. Isserlin, O. Stueker, A. Emili, G. D. Bader, Enrichment map: A network-based method for gene-set enrichment visualization and interpretation. PLOS ONE 5, e13984 (2010). doi:10.1371/journal.pone.0013984 Medline

49. G. Csardi, T. Nepusz, The igraph software package for complex network research. Int. J. Complex Syst. 1695, 1–9 (2006).

50. P. Pons, M. Latapy, in International Symposium on Computer and Information Sciences (Springer, 2005), pp. 284–293.

51. V. Chelliah, N. Juty, I. Ajmera, R. Ali, M. Dumousseau, M. Glont, M. Hucka, G. Jalowicki, S. Keating, V. Knight-Schrijver, A. Lloret-Villas, K. N. Natarajan, J.-B. Pettit, N. Rodriguez, M. Schubert, S. M. Wimalaratne, Y. Zhao, H. Hermjakob, N. Le Novère, C. Laibe, BioModels: Ten-year anniversary. Nucleic Acids Res. 43, D542–D548 (2015). doi:10.1093/nar/gku1181