identification and characterization of essential genes in...

44
Reports The systematic identification of essential genes in microor- ganisms has provided critical insights into the molecular basis of many biological processes (1). Similar studies in human cells have been hindered by the lack of suitable tools. Moreover, little is known about how the set of cell essential genes differs across cell types and genotypes. Dif- ferentially essential genes are likely to encode tissue-specific modulators of key cellular processes and important targets for cancer therapies. Here, we used two independent ap- proaches for inactivating genes at the DNA level to define the cell-essential genes of the human genome. The first approach employs the CRISPR/Cas9-based gene editing system, which has emerged as a powerful tool to engineer the genomes of cultured cells and whole organisms (2, 3). We and others have shown that lentiviral (single- guide RNA) sgRNA libraries can enable pooled loss-of- function screens and have used the technology to uncover medi- ators of drug resistance and pathogen toxicity (46). To sys- tematically identify cell-essential genes, we constructed a novel library, which was optimized for high cleavage activity, and per- formed a proliferation-based screen in the near-haploid hu- man KBM7 chronic myelogenous leukemia (CML) cell line (Fig. 1 and supplementary text S1). The unusual karyotype of these cells also allows for an in- dependent method of genetic screening. In this approach, null mutants are generated at ran- dom through retroviral gene- trap mutagenesis, selected for a phenotype, and monitored by sequencing the viral integration sites to pinpoint the causal genes (7). Positive selection-based screens using this method have identified genes underlying pro- cesses such as epigenetic silenc- ing and viral infection (79). We extended this technique by de- veloping a strategy for negative selection and conducted a screen for cell-essential genes (Fig. 1 and supplementary text S2). For both methods, we com- puted a score for each gene that reflects the fitness cost imposed by inactivation of the gene. We defined the CRISPR score (CS) as the average log2 fold-change in the abundance of all sgRNAs targeting a given gene, with replicate experiments showing a high degree of reproduci- bility (r = 0.90) (Fig. 2A, fig. S1A, and table S2). Of the 18,166 genes targeted by the library, 1,878 scored as essential for optimal proliferation in our screen, although this precise number depends on the cutoff chosen (Fig. 2A and tables S2 and S3). Overall, this fraction represents ~10% of genes within our dataset or roughly 9.2% of the entire genome (many of the genes not targeted by our library encode olfac- tory receptors that are unlikely to be cell-essential). Gene products that act in a non-cell-autonomous manner are not expected to score as essential in this pooled setting (fig. S1B). We defined the gene-trap score (GTS) as the fraction of insertions in a given gene occurring in the inactivating ori- entation. Because the accuracy of this score depends on the Identification and characterization of essential genes in the human genome Tim Wang, 1,2,3,4 Kıvanç Birsoy, 1,2,3,4 * Nicholas W. Hughes, 3 Kevin M. Krupczak, 2,3,4 Yorick Post, 2,3,4 Jenny J. Wei, 1,2 Eric S. Lander, 1,3,5 †‡ David M. Sabatini 1,2,3,4,6 †‡ 1 Department of Biology, Massachusetts Institute of Technology, Cambridge MA 02139, USA. 2 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA. 3 Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA. 4 David H. Koch Institute for Integrative Cancer Research at MIT, Cambridge, MA 02139, USA. 5 Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA. 6 Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. *Present address: Laboratory of Metabolic Regulation and Genetics, The Rockefeller University, New York, NY 10065, USA. †These authors contributed equally to this work. ‡Corresponding author. E-mail: [email protected] (E.S.L.); [email protected] (D.M.S.) Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. / sciencemag.org/content/early/recent / 15 October 2015 / Page 1 / 10.1126/science.aac7041 on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from on October 16, 2015 www.sciencemag.org Downloaded from

Upload: others

Post on 18-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Reports

The systematic identification of essential genes in microor-ganisms has provided critical insights into the molecular basis of many biological processes (1). Similar studies in human cells have been hindered by the lack of suitable tools. Moreover, little is known about how the set of cell essential genes differs across cell types and genotypes. Dif-ferentially essential genes are likely to encode tissue-specific modulators of key cellular processes and important targets for cancer therapies. Here, we used two independent ap-proaches for inactivating genes at the DNA level to define the cell-essential genes of the human genome.

The first approach employs the CRISPR/Cas9-based gene editing system, which has emerged as a powerful tool to engineer the genomes of cultured cells and whole organisms (2, 3). We and others have shown that lentiviral (single-guide RNA) sgRNA libraries can enable pooled loss-of-

function screens and have used the technology to uncover medi-ators of drug resistance and pathogen toxicity (4–6). To sys-tematically identify cell-essential genes, we constructed a novel library, which was optimized for high cleavage activity, and per-formed a proliferation-based screen in the near-haploid hu-man KBM7 chronic myelogenous leukemia (CML) cell line (Fig. 1 and supplementary text S1).

The unusual karyotype of these cells also allows for an in-dependent method of genetic screening. In this approach, null mutants are generated at ran-dom through retroviral gene-trap mutagenesis, selected for a phenotype, and monitored by sequencing the viral integration sites to pinpoint the causal genes (7). Positive selection-based screens using this method have identified genes underlying pro-cesses such as epigenetic silenc-ing and viral infection (7–9). We extended this technique by de-veloping a strategy for negative selection and conducted a screen for cell-essential genes (Fig. 1 and supplementary text S2).

For both methods, we com-puted a score for each gene that reflects the fitness cost imposed by inactivation of the gene. We defined the CRISPR score (CS) as the average log2 fold-change in

the abundance of all sgRNAs targeting a given gene, with replicate experiments showing a high degree of reproduci-bility (r = 0.90) (Fig. 2A, fig. S1A, and table S2). Of the 18,166 genes targeted by the library, 1,878 scored as essential for optimal proliferation in our screen, although this precise number depends on the cutoff chosen (Fig. 2A and tables S2 and S3). Overall, this fraction represents ~10% of genes within our dataset or roughly 9.2% of the entire genome (many of the genes not targeted by our library encode olfac-tory receptors that are unlikely to be cell-essential). Gene products that act in a non-cell-autonomous manner are not expected to score as essential in this pooled setting (fig. S1B).

We defined the gene-trap score (GTS) as the fraction of insertions in a given gene occurring in the inactivating ori-entation. Because the accuracy of this score depends on the

Identification and characterization of essential genes in the human genome Tim Wang,1,2,3,4 Kıvanç Birsoy,1,2,3,4* Nicholas W. Hughes,3 Kevin M. Krupczak,2,3,4 Yorick Post,2,3,4 Jenny J. Wei,1,2 Eric S. Lander,1,3,5†‡ David M. Sabatini1,2,3,4,6†‡ 1Department of Biology, Massachusetts Institute of Technology, Cambridge MA 02139, USA. 2Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA. 3Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA. 4David H. Koch Institute for Integrative Cancer Research at MIT, Cambridge, MA 02139, USA. 5Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA. 6Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. *Present address: Laboratory of Metabolic Regulation and Genetics, The Rockefeller University, New York, NY 10065, USA.

†These authors contributed equally to this work.

‡Corresponding author. E-mail: [email protected] (E.S.L.); [email protected] (D.M.S.)

Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells.

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 1 / 10.1126/science.aac7041

on

Oct

ober

16,

201

5w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

o

n O

ctob

er 1

6, 2

015

ww

w.s

cien

cem

ag.o

rgD

ownl

oade

d fro

m

on

Oct

ober

16,

201

5w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

o

n O

ctob

er 1

6, 2

015

ww

w.s

cien

cem

ag.o

rgD

ownl

oade

d fro

m

on

Oct

ober

16,

201

5w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

o

n O

ctob

er 1

6, 2

015

ww

w.s

cien

cem

ag.o

rgD

ownl

oade

d fro

m

on

Oct

ober

16,

201

5w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

o

n O

ctob

er 1

6, 2

015

ww

w.s

cien

cem

ag.o

rgD

ownl

oade

d fro

m

on

Oct

ober

16,

201

5w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

o

n O

ctob

er 1

6, 2

015

ww

w.s

cien

cem

ag.o

rgD

ownl

oade

d fro

m

Page 2: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

depth of insertional coverage, we set a requirement on the minimum number (n = 65) of anti-sense inserts in a gene needed for inclusion in our analysis by measuring the con-cordance between replicate experiments (Fig. 2B; fig. S1, C and D; table S4; and supplementary text S2). For the 7,370 genes on the haploid chromosomes that exceeded this threshold, the GTS was well-correlated with the CS and with results from a co-published study employing a similar gene-trap approach (r = 0.68) (Fig. 2C; fig. S1, E and F; and sup-plementary text S3). The strong correspondence between the overlapping sets of cell-essential genes defined by the two methods provides support for the accuracy of the CRISPR scores for the full set of 18,166 genes.

The two methods differed with respect to the diploid chromosome 8. Whereas the gene-trap screen failed to de-tect any cell-essential genes on this chromosome, the CRISPR screen uncovered a similar proportion of cell-essential genes on all the autosomes (Fig. 2, A and C). These observations indicate that (i) the vast majority of cell-essential genes are haplosufficient and (ii) biallelic inactiva-tion occurs at high frequency in our CRISPR screen (4).

To assess the accuracy of our scores with other measures of gene essentiality, we relied on functional profiling exper-iments conducted in yeast S. cerevisiae as a benchmark (1, 10). Specifically, we ranked genes common to all datasets by their scores in each dataset –CRISPR, gene trap, scores from similar loss-of-function RNA interference (RNAi) screens (11), and, as a naïve proxy for gene essentiality, gene expres-sion levels determined by RNA-seq– and compared these rankings with the essentiality of yeast homologs. The CRISPR and gene-trap methods had significantly stronger correlations with the yeast results than did the RNAi screens or gene expression, which performed similarly to each other (both methods: P < 10−4, permutation test) (Fig. 2D). Based on additional comparisons with yeast gene es-sentiality, we also found that (i) our new optimized sgRNA library gave better results than those from screens using older unoptimized libraries (4, 5) and (ii) the coverage of this library (~10 constructs per gene) approaches saturation, as evidenced by down-sampling (decreasing the coverage by randomly eliminating subsets of data) (fig. S2, A and B). Together, our results suggest that scores from the CRISPR and gene-trap screens both provide accurate measures of the cell-essentiality of human genes.

Essential genes should be under strong purifying selec-tion and should thus show greater evolutionary constraint than non-essential genes (12). Consistent with this expecta-tion, the essential genes found in our screens were more broadly retained across species, showed higher levels of conservation between closely related species, and contain fewer inactivating polymorphisms within the human spe-cies, as compared to their dispensable counterparts (Fig. 2, E and G). Essential genes also tend to have higher expres-sion and encode proteins that engage in more protein-protein interactions (13–15). These patterns were also ob-

served in our CRISPR dataset (Fig. 2, H and I). In S. cerevisiae, genes with paralogous copies in the ge-

nome show a lower degree of essentiality –due presumably to at least partial functional overlap (16). Surprisingly, meta-analysis of knockout mouse collections has suggested that there is no such correlation in mammals (17, 18). However, others have challenged this interpretation because the genes analyzed were far from a random sample (19). Using the results from our genome-wide screens, we revisited this question and observed that genes with paralogs are indeed less likely to be essential, consistent with the idea that pa-ralogs can provide functional redundancy at the cellular level (Fig. 2J).

To examine the functions of the cell-essential genes we used gene set enrichment analysis (GSEA) and found strong enrichment for many fundamental biological processes, such as DNA replication, RNA transcription and mRNA translation (fig. S3A) (20). Whereas most of the genes could be assigned to such well-defined pathways, no function has been ascribed to about 330 of the cell-essential genes (18%) (Fig. 3A). For this set of uncharacterized genes, an analysis of the domains within their encoded gene products and comparisons to proteomic datasets from organellar purifica-tions revealed substantial enrichment in proteins found in the nucleolus and those containing domains associated with RNA processing (fig. S3, B and C) (21).

We characterized three such genes, C16orf80, C3orf17, and C9orf114, whose mRNA expression patterns across the cancer cell line encyclopedia (CCLE) were correlated with that of genes involved in RNA processing (Fig. 3B). We vali-dated the essentiality of these genes in short-term prolifera-tion assays and detected localization of their products to the nucleus (C16orf80) or nucleolus (C3orf17 and C9orf114) (Fig. 3, C and D) (22). Additionally, mass spectrometric analyses of anti-FLAG-immunoprecipitates prepared from KBM7 cells expressing FLAG-tagged C16orf80, C3orf17, and C9orf114 revealed interactions with multiple subunits of the spliceosome, ribonuclease (RNAse) P/MRP, and H/ACA small nucleolar ribonucleoprotein (snoRNP) complexes, re-spectively (Fig. 3E). These results implicate C16orf80 in splicing, consistent with its association with mRNAs; C3orf17 in rRNA/tRNA processing; and C9orf114 in RNA modification (23). More broadly, our results indicate that the molecular components of many critical cellular process-es, especially RNA processing, have yet to be fully defined in mammalian cells.

To determine how the set of essential genes differs among cell lines, we screened another CML cell line (K562) and two Burkitt’s lymphoma cell lines (Raji and Jiyoye) us-ing the CRISPR system (tables S2 and S3). Overall, the sets of essential genes in the four cell lines showed a high degree of overlap (Fig. 4A). Importantly, out of these four cell lines, the KBM7 CRISPR results showed the highest correlation with the KBM7 gene-trap dataset, suggesting that the few differences observed are likely to be biologically meaningful

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 2 / 10.1126/science.aac7041

Page 3: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

(fig. S4A). We focused first on genes found to be essential in only

one of the four cell lines. The Raji, Jiyoye, and KBM7 cell lines had 6, 7, and 19 such genes, respectively (fig. S4B and table S5). One example was DDX3Y, which resides in the non-pseudoautosomal region of the Y chromosome and was required only in Raji cells (Fig. 4B). Its X-linked paralog, DDX3X, was essential in KBM7 and K562 cells (Fig. 4E). Both genes encode DEAD-box helicases that likely have sim-ilar cellular functions (24). Thus, the dependence on one paralog might reflect functional absence of the other pa-ralog. Indeed, DNA sequencing of DDX3X in Raji cells re-vealed hemizygous mutations in the 5′-splice site of intron 8 that resulted in the production of a truncated mRNA tran-script (Fig. 4, C and D). Conversely, DDX3Y was not ex-pressed in KBM7 cells and was not present in K562 cells, which are of female origin (Fig. 4E). Introduction of wild-type DDX3X cDNA into Raji cells fully rescued the prolifera-tion defect resulting from DDX3Y loss, indicating that the paralogous genes are essential and functionally overlapping (Fig. 4F). Essential paralogous gene pairs, involved in glu-cose metabolism (HK1/2 and SLC2A1/3) and cell-cycle regu-lation (CDK4/6), were also observed in the Jiyoye line (fig, S4C and supplementary text S4). Vulnerabilities due to the loss of a paralogous partner may serve as targets for highly personalized anti-tumor therapies (25).

In some cases, cell line-specific essentiality of paralogous genes did not reflect differential expression. For example, the transcription factors GATA1 and GATA2 are expressed in both K562 and KBM7 cells, but the first is specifically essen-tial in K562 cells and the second in KBM7 cells (fig. S4D). These master regulators are known to promote proliferation and survival during distinct developmental stages in the hematopoietic lineage – GATA1 is required for the survival of erythroid progenitors and GATA2 for the maintenance and proliferation of immature hematopoietic progenitors (26). These two cell types likely correspond to the cells-of-origin of the two CML lines (27, 28). We also identified simi-lar instances of genes required for cell line-specific func-tions such as NF-κB pathway regulation and homology-directed DNA repair in Raji and KBM7 cells, respectively (fig. S4, E and F, and supplementary text S5).

Whereas the other three cell lines showed only a few cell line-specific essential genes, the K562 cell line was an outli-er, with 63 such genes (fig. S4B). Oddly, these genes showed no discernible common functions and many were not even expressed in the K562 cell line. (Additionally, a few encoded secreted factors whose loss would not be expected to be le-thal in a pooled screen.) This mystery was resolved when we examined the chromosomal location of the genes: strikingly, the majority (39 of the 63 genes) reside near 9q34 and 22q11. These two regions are translocated to produce the BCR-ABL oncogene and are present in a high-copy tandem amplification in K562 cells (fig. S5, A and B) (29). Notably, all 61 contiguous genes within the amplicon on 22q11 scored

as essential, suggesting that sgRNA-mediated cleavage in this repeated region induces cytotoxicity in a manner unre-lated to the function of the target genes themselves (Fig. 4G and fig. S5F). Indeed, two sgRNAs targeting non-genic sites within the amplicon were toxic to K562, but not KBM7 cells. They increased the abundance of phosphorylated histone H2AX (γH2AX), a marker of DNA damage, and induced erythroid differentiation, which occurs upon DNA damage in this cell line (Fig. 4, H to J, and fig. S5C) (30). We ob-tained similar results in another cell line, HEL, which con-tains a highly amplified region surrounding the JAK2 tyrosine kinase (fig. S5, D and E). Together, these findings indicate that lethality upon Cas9-mediated cutting may also reflect chromosome structure and therefore should be eval-uated in light of copy-number information.

Finally, we looked for consistent differences in essential genes between the two CML and two Burkitt’s lymphoma lines. Such genes might represent attractive targets for anti-neoplastic therapies as their inhibition is less likely to be broadly cytotoxic. Overall, we identified 33 genes that were specifically essential in the CML lines and 15 genes in the Burkitt’s lymphoma lines (Fig. 4K and table S6). As a con-trol, permuted comparisons – that is, a set containing of one CML and one Burkitt’s line vs. the complementary sets – showed roughly half as many “set-specific” essential genes (fig. S6, A to C).

In the CML lines, the top two genes were BCR and ABL1, consistent with the known essentiality of the BCR-ABL translocation product and the therapeutic effect of BCR-ABL inhibitors such as imatinib (31). Additional members of the BCR-ABL signaling pathway, son of sevenless homolog 1 (SOS1), growth factor receptor-bound protein (GRB2), and GRB2-associated binding protein 2 (GAB2) scored strongly as well (ranked 3, 4, and 7, respectively). Network analysis of the other top hits also uncovered several genes encoding assembly factors for the electron transport chain, as well as enzymes involved in folate-mediated one-carbon metabo-lism. These results suggest additional potential targets for CML therapy (Table S6).

In the B cell-derived Burkitt’s lymphoma cell lines, the top genes included three B cell lineage transcription factors early B cell factor 1 (EBF1), POU class 2 associating factor 1 (POU2AF1), and paired box 5 (PAX5) (ranked 3, 6, and 8, respectively). Each of these genes is the target of recurrent translocations in lymphoma (32–34). Enhancers of the cor-responding three gene loci all show a high level of bromo-domain containing 4 (BRD4) occupancy in Ly1 cells, a related diffuse large B cell lymphoma cell line, suggesting bromodomain inhibitors such as JQ1 as potential treatments (35). Other selectively essential genes included MEF2B, a transcriptional activator of BCL6, B cell lymphoma 6, and CCND3, cyclin D3, both of which are frequently mutated and implicated in the pathogenesis of various lymphomas (36). Intriguingly, the top two hits, CHM and RPP25L, do not appear to have specific roles in B cells; rather, their dif-

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 3 / 10.1126/science.aac7041

Page 4: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

ferential essentiality is likely explained by the lack of ex-pression of their paralogs, CHML and RPP25, in both of the Burkitt’s lymphoma cell lines studied (fig. S6D).

In conclusion, we used two complementary and con-cordant approaches, CRISPR and gene-trap, to define the cell-essential genes in the human genome. Although the gene-trap method is suitable only for loss-of-function screening in rare haploid cell lines, the CRISPR method is broadly applicable. Extending our analysis across different cell lines and tumor types, we developed a framework to assess differential gene essentiality and identify potential drivers of the malignant state. The method can be readily applied to more cell lines per cancer type to eliminate idio-syncrasies particular to a given cell line and to more cancer types to systematically uncover tumor-specific liabilities that might be exploited for targeted therapies.

REFERENCES AND NOTES 1. G. Giaever, A. M. Chu, L. Ni, C. Connelly, L. Riles, S. Véronneau, S. Dow, A. Lucau-

Danila, K. Anderson, B. André, A. P. Arkin, A. Astromoff, M. El-Bakkoury, R. Bangham, R. Benito, S. Brachat, S. Campanaro, M. Curtiss, K. Davis, A. Deutschbauer, K. D. Entian, P. Flaherty, F. Foury, D. J. Garfinkel, M. Gerstein, D. Gotte, U. Güldener, J. H. Hegemann, S. Hempel, Z. Herman, D. F. Jaramillo, D. E. Kelly, S. L. Kelly, P. Kötter, D. LaBonte, D. C. Lamb, N. Lan, H. Liang, H. Liao, L. Liu, C. Luo, M. Lussier, R. Mao, P. Menard, S. L. Ooi, J. L. Revuelta, C. J. Roberts, M. Rose, P. Ross-Macdonald, B. Scherens, G. Schimmack, B. Shafer, D. D. Shoemaker, S. Sookhai-Mahadeo, R. K. Storms, J. N. Strathern, G. Valle, M. Voet, G. Volckaert, C. Y. Wang, T. R. Ward, J. Wilhelmy, E. A. Winzeler, Y. Yang, G. Yen, E. Youngman, K. Yu, H. Bussey, J. D. Boeke, M. Snyder, P. Philippsen, R. W. Davis, M. Johnston, Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002). Medline doi:10.1038/nature00935

2. L. Cong, F. A. Ran, D. Cox, S. Lin, R. Barretto, N. Habib, P. D. Hsu, X. Wu, W. Jiang, L. A. Marraffini, F. Zhang, Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013). Medline doi:10.1126/science.1231143

3. H. Wang, H. Yang, C. S. Shivalila, M. M. Dawlaty, A. W. Cheng, F. Zhang, R. Jaenisch, One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910–918 (2013). Medline doi:10.1016/j.cell.2013.04.025

4. T. Wang, J. J. Wei, D. M. Sabatini, E. S. Lander, Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014). Medline doi:10.1126/science.1246981

5. O. Shalem, N. E. Sanjana, E. Hartenian, X. Shi, D. A. Scott, T. S. Mikkelsen, D. Heckl, B. L. Ebert, D. E. Root, J. G. Doench, F. Zhang, Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014). Medline doi:10.1126/science.1247005

6. H. Koike-Yusa, Y. Li, E. P. Tan, M. C. Velasco-Herrera, K. Yusa, Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014). Medline doi:10.1038/nbt.2800

7. J. E. Carette, C. P. Guimaraes, M. Varadarajan, A. S. Park, I. Wuethrich, A. Godarova, M. Kotecki, B. H. Cochran, E. Spooner, H. L. Ploegh, T. R. Brummelkamp, Haploid genetic screens in human cells identify host factors used by pathogens. Science 326, 1231–1235 (2009). Medline doi:10.1126/science.1178955

8. J. E. Carette, M. Raaben, A. C. Wong, A. S. Herbert, G. Obernosterer, N. Mulherkar, A. I. Kuehne, P. J. Kranzusch, A. M. Griffin, G. Ruthel, P. Dal Cin, J. M. Dye, S. P. Whelan, K. Chandran, T. R. Brummelkamp, Ebola virus entry requires the cholesterol transporter Niemann-Pick C1. Nature 477, 340–343 (2011). Medline doi:10.1038/nature10348

9. I. A. Tchasovnikarova, R. T. Timms, N. J. Matheson, K. Wals, R. Antrobus, B. Göttgens, G. Dougan, M. A. Dawson, P. J. Lehner, Epigenetic silencing by the HUSH complex mediates position-effect variegation in human cells. Science 348, 1481–1485 (2015). Medline doi:10.1126/science.aaa7227

10. J. P. Kastenmayer, L. Ni, A. Chu, L. E. Kitchen, W. C. Au, H. Yang, C. D. Carter, D. Wheeler, R. W. Davis, J. D. Boeke, M. A. Snyder, M. A. Basrai, Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 16, 365–373 (2006). Medline doi:10.1101/gr.4355406

11. G. S. Cowley et al., Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Scientific Data 10.1038/sdata.2014.35 (2014).

12. A. C. Wilson, S. S. Carlson, T. J. White, Biochemical evolution. Annu. Rev. Biochem. 46, 573–639 (1977). Medline doi:10.1146/annurev.bi.46.070177.003041

13. Y. Ishihama, T. Schmidt, J. Rappsilber, M. Mann, F. U. Hartl, M. J. Kerner, D. Frishman, Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 9, 102 (2008). Medline doi:10.1186/1471-2164-9-102

14. H. Jeong, S. P. Mason, A. L. Barabási, Z. N. Oltvai, Lethality and centrality in protein networks. Nature 411, 41–42 (2001). Medline doi:10.1038/35075138

15. T. Hart, K. R. Brown, F. Sircoulomb, R. Rottapel, J. Moffat, Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Sys. Biol. 10.15252/msb.20145216 (2014).

16. Z. Gu, L. M. Steinmetz, X. Gu, C. Scharfe, R. W. Davis, W. H. Li, Role of duplicate genes in genetic robustness against null mutations. Nature 421, 63–66 (2003). Medline doi:10.1038/nature01198

17. H. Liang, W.-H. Li, Gene essentiality, gene duplicability and protein connectivity in human and mouse. Trends Genet. 23, 375–378 (2007). Medline doi:10.1016/j.tig.2007.04.005

18. B.-Y. Liao, J. Zhang, Mouse duplicate genes are as essential as singletons. Trends Genet. 23, 378–381 (2007). Medline doi:10.1016/j.tig.2007.05.006

19. T. Makino, K. Hokamp, A. McLysaght, The complex relationship of gene duplication and essentiality. Trends Genet. 25, 152–155 (2009). Medline doi:10.1016/j.tig.2009.03.001

20. A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, J. P. Mesirov, Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005). Medline doi:10.1073/pnas.0506580102

21. Y. Ahmad, F.-M. Boisvert, E. Lundberg, M. Uhlen, A. I. Lamond, Systematic analysis of protein pools, isoforms, and modifications affecting turnover and subcellular localization. Mol. Cell. Proteomics 11, 013680 (2012). 10.1074/mcp.M111.013680 Medline doi:10.1074/mcp.M111.013680

22. J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehár, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jané-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi Jr., M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill, W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, L. A. Garraway, The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). Medline doi:10.1038/nature11003

23. A. G. Baltz, M. Munschauer, B. Schwanhäusser, A. Vasile, Y. Murakawa, M. Schueler, N. Youngs, D. Penfold-Brown, K. Drew, M. Milek, E. Wyler, R. Bonneau, M. Selbach, C. Dieterich, M. Landthaler, The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol. Cell 46, 674–690 (2012). Medline doi:10.1016/j.molcel.2012.05.021

24. T. Sekiguchi, H. Iida, J. Fukumura, T. Nishimoto, Human DDX3Y, the Y-encoded isoform of RNA helicase DDX3, rescues a hamster temperature-sensitive ET24 mutant cell line with a DDX3X mutation. Exp. Cell Res. 300, 213–222 (2004). Medline doi:10.1016/j.yexcr.2004.07.005

25. F. L. Muller, S. Colla, E. Aquilanti, V. E. Manzo, G. Genovese, J. Lee, D. Eisenson, R. Narurkar, P. Deng, L. Nezi, M. A. Lee, B. Hu, J. Hu, E. Sahin, D. Ong, E. Fletcher-Sananikone, D. Ho, L. Kwong, C. Brennan, Y. A. Wang, L. Chin, R. A. DePinho, Passenger deletions generate therapeutic vulnerabilities in cancer. Nature 488, 337–342 (2012). Medline doi:10.1038/nature11331

26. K. Ohneda, M. Yamamoto, Roles of hematopoietic transcription factors GATA-1 and GATA-2 in the development of red blood cell lineage. Acta Haematol. 108, 237–245 (2002). Medline doi:10.1159/000065660

27. L. C. Andersson, K. Nilsson, C. G. Gahmberg, K562—a human erythroleukemic

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 4 / 10.1126/science.aac7041

Page 5: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

cell line. Int. J. Cancer 23, 143–147 (1979). Medline doi:10.1002/ijc.2910230202 28. B. S. Andersson et al., KBM-7, a human myeloid leukemia cell line with double

Philadelphia chromosomes lacking normal c-ABL and BCR transcripts. Leukemia 9, 2100–2108 (1995). Medline

29. S. Q. Wu, K. V. Voelkerding, L. Sabatini, X. R. Chen, J. Huang, L. F. Meisner, Extensive amplification of bcr/abl fusion genes clustered on three marker chromosomes in human leukemic cell line K-562. Leukemia 9, 858–862 (1995). Medline

30. A. Constantinou, K. Kiguchi, E. Huberman, Induction of differentiation and DNA strand breakage in human HL-60 and K-562 leukemia cells by genistein. Cancer Res. 50, 2618–2624 (1990). Medline

31. B. Scappini, S. Gatto, F. Onida, C. Ricci, V. Divoky, W. G. Wierda, M. Andreeff, L. Dong, K. Hayes, S. Verstovsek, H. M. Kantarjian, M. Beran, Changes associated with the development of resistance to imatinib (STI571) in two leukemia cell lines expressing p210 Bcr/Abl protein. Cancer 100, 1459–1471 (2004). Medline doi:10.1002/cncr.20131

32. S. Iida, P. H. Rao, P. Nallasivam, H. Hibshoosh, M. Butler, D. C. Louie, V. Dyomin, H. Ohno, R. S. Chaganti, R. Dalla-Favera, The t(9;14)(p13;q32) chromosomal translocation associated with lymphoplasmacytoid lymphoma involves the PAX-5 gene. Blood 88, 4110–4117 (1996). Medline

33. H. Bouamar, S. Abbas, A. P. Lin, L. Wang, D. Jiang, K. N. Holder, M. C. Kinney, S. Hunicke-Smith, R. C. Aguiar, A capture-sequencing strategy identifies IRF8, EBF1, and APRIL as novel IGH fusion partners in B-cell lymphoma. Blood 122, 726–733 (2013). Medline doi:10.1182/blood-2013-04-495804

34. S. Galiègue-Zouitina, S. Quief, M. P. Hildebrand, C. Denis, G. Lecocq, M. Collyn-d’Hooghe, C. Bastard, M. Yuille, M. J. Dyer, J. P. Kerckaert, Fusion of the LAZ3/BCL6 and BOB1/OBF1 genes by t(3; 11) (q27; q23) chromosomal translocation. C. R. Acad. Sci. III 318, 1125–1131 (1995). Medline

35. B. Chapuy, M. R. McKeown, C. Y. Lin, S. Monti, M. G. Roemer, J. Qi, P. B. Rahl, H. H. Sun, K. T. Yeda, J. G. Doench, E. Reichert, A. L. Kung, S. J. Rodig, R. A. Young, M. A. Shipp, J. E. Bradner, Discovery and characterization of super-enhancer-associated dependencies in diffuse large B cell lymphoma. Cancer Cell 24, 777–790 (2013). Medline doi:10.1016/j.ccr.2013.11.003

36. R. D. Morin, M. Mendez-Lago, A. J. Mungall, R. Goya, K. L. Mungall, R. D. Corbett, N. A. Johnson, T. M. Severson, R. Chiu, M. Field, S. Jackman, M. Krzywinski, D. W. Scott, D. L. Trinh, J. Tamura-Wells, S. Li, M. R. Firme, S. Rogic, M. Griffith, S. Chan, O. Yakovenko, I. M. Meyer, E. Y. Zhao, D. Smailus, M. Moksa, S. Chittaranjan, L. Rimsza, A. Brooks-Wilson, J. J. Spinelli, S. Ben-Neriah, B. Meissner, B. Woolcock, M. Boyle, H. McDonald, A. Tam, Y. Zhao, A. Delaney, T. Zeng, K. Tse, Y. Butterfield, I. Birol, R. Holt, J. Schein, D. E. Horsman, R. Moore, S. J. Jones, J. M. Connors, M. Hirst, R. D. Gascoyne, M. A. Marra, Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476, 298–303 (2011). Medline doi:10.1038/nature10351

37. J. M. Engreitz, A. Pandya-Jones, P. McDonel, A. Shishkin, K. Sirokman, C. Surka, S. Kadri, J. Xing, A. Goren, E. S. Lander, K. Plath, M. Guttman, The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973 (2013). Medline doi:10.1126/science.1237973

38. D. Kim, G. Pertea, C. Trapnell, H. Pimentel, R. Kelley, S. L. Salzberg, TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013). Medline doi:10.1186/gb-2013-14-4-r36

39. C. Trapnell, B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, L. Pachter, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010). Medline doi:10.1038/nbt.1621

40. F. Cunningham, M. R. Amode, D. Barrell, K. Beal, K. Billis, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fitzgerald, L. Gil, C. G. Girón, L. Gordon, T. Hourlier, S. E. Hunt, S. H. Janacek, N. Johnson, T. Juettemann, A. K. Kähäri, S. Keenan, F. J. Martin, T. Maurel, W. McLaren, D. N. Murphy, R. Nag, B. Overduin, A. Parker, M. Patricio, E. Perry, M. Pignatelli, H. S. Riat, D. Sheppard, K. Taylor, A. Thormann, A. Vullo, S. P. Wilder, A. Zadissa, B. L. Aken, E. Birney, J. Harrow, R. Kinsella, M. Muffato, M. Ruffier, S. M. Searle, G. Spudich, S. J. Trevanion, A. Yates, D. R. Zerbino, P. Flicek, Ensembl 2015. Nucleic Acids Res. 43 (D1), D662–D669 (2015). Medline doi:10.1093/nar/gku1010

41. L. Y. Geer, A. Marchler-Bauer, R. C. Geer, L. Han, J. He, S. He, C. Liu, W. Shi, S. H. Bryant, The NCBI BioSystems database. Nucleic Acids Res. 38 (Database),

D492–D496 (2010). Medline doi:10.1093/nar/gkp858 42. P. Flicek, M. R. Amode, D. Barrell, K. Beal, K. Billis, S. Brent, D. Carvalho-Silva, P.

Clapham, G. Coates, S. Fitzgerald, L. Gil, C. G. Girón, L. Gordon, T. Hourlier, S. Hunt, N. Johnson, T. Juettemann, A. K. Kähäri, S. Keenan, E. Kulesha, F. J. Martin, T. Maurel, W. M. McLaren, D. N. Murphy, R. Nag, B. Overduin, M. Pignatelli, B. Pritchard, E. Pritchard, H. S. Riat, M. Ruffier, D. Sheppard, K. Taylor, A. Thormann, S. J. Trevanion, A. Vullo, S. P. Wilder, M. Wilson, A. Zadissa, B. L. Aken, E. Birney, F. Cunningham, J. Harrow, J. Herrero, T. J. Hubbard, R. Kinsella, M. Muffato, A. Parker, G. Spudich, A. Yates, D. R. Zerbino, S. M. Searle, Ensembl 2014. Nucleic Acids Res. 42 (D1), D749–D755 (2014). Medline doi:10.1093/nar/gkt1196

43. J. A. Tennessen, A. W. Bigham, T. D. O’Connor, W. Fu, E. E. Kenny, S. Gravel, S. McGee, R. Do, X. Liu, G. Jun, H. M. Kang, D. Jordan, S. M. Leal, S. Gabriel, M. J. Rieder, G. Abecasis, D. Altshuler, D. A. Nickerson, E. Boerwinkle, S. Sunyaev, C. D. Bustamante, M. J. Bamshad, J. M. AkeyBroad GOSeattle GONHLBI Exome Sequencing Project, Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012). Medline doi:10.1126/science.1219240

44. C. Stark, B. J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, M. Tyers, BioGRID: A general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006). Medline doi:10.1093/nar/gkj109

45. H. Li, A. Coghlan, J. Ruan, L. J. Coin, J. K. Hériché, L. Osmotherly, R. Li, T. Liu, Z. Zhang, L. Bolund, G. K. Wong, W. Zheng, P. Dehal, J. Wang, R. Durbin, TreeFam: A curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 34, D572–D580 (2006). Medline doi:10.1093/nar/gkj118

46. J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehár, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jané-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi Jr., M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill, W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, L. A. Garraway, The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). Medline doi:10.1038/nature11003

47. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009). Medline doi:10.1038/nprot.2008.211

48. P. Mertins, N. D. Udeshi, K. R. Clauser, D. R. Mani, J. Patel, S. E. Ong, J. D. Jaffe, S. A. Carr, iTRAQ labeling is superior to mTRAQ for quantitative global proteomics and phosphoproteomics. Mol. Cell. Proteomics 11, 014423 (2012). 10.1074/mcp.M111.014423 Medline doi:10.1074/mcp.M111.014423

49. S. Schwartz, M. R. Mumbach, M. Jovanovic, T. Wang, K. Maciag, G. G. Bushkin, P. Mertins, D. Ter-Ovanesyan, N. Habib, D. Cacchiarelli, N. E. Sanjana, E. Freinkman, M. E. Pacold, R. Satija, T. S. Mikkelsen, N. Hacohen, F. Zhang, S. A. Carr, E. S. Lander, A. Regev, Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5′ sites. Cell Reports 8, 284–296 (2014). Medline doi:10.1016/j.celrep.2014.05.048

50. A. G. Uren, H. Mikkers, J. Kool, L. van der Weyden, A. H. Lund, C. H. Wilson, R. Rance, J. Jonkers, M. van Lohuizen, A. Berns, D. J. Adams, A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites. Nat. Protoc. 4, 789–798 (2009). Medline doi:10.1038/nprot.2009.64

51. T. Brady, Y. N. Lee, K. Ronen, N. Malani, C. C. Berry, P. D. Bieniasz, F. D. Bushman, Integration target site selection by a resurrected human endogenous retrovirus. Genes Dev. 23, 633–642 (2009). Medline doi:10.1101/gad.1762309

ACKNOWLEDGMENTS

We thank T. Mikkelsen for assistance with oligonucleotide synthesis; Z. Tsun for assistance with figures; C. Hartigan, G. Guzman, M. Schenone, and S. Carr for mass spectrometric analysis; J. Down and J. Chen for reagents for hemoglobin staining. This work was supported by the National Institutes of Health (CA103866) (D.M.S.), the National Human Genome Research Institute (2U54HG003067-10) (E.S.L.), an award from the National Science Foundation

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 5 / 10.1126/science.aac7041

Page 6: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

(T.W.), an award from the MIT Whitaker Health Sciences Fund (T.W.). D.M.S. is an investigator of the Howard Hughes Medical Institute. T.W., D.M.S., and E.S.L. are inventors on a patent application for functional genomics using the CRISPR-Cas system and T.W. and D.M.S. are in the process of forming a company using this technology. The sgRNA plasmid library and other plasmids described here have been deposited in Addgene.

SUPPLEMENTARY MATERIALS www.sciencemag.org/cgi/content/full/science.aac7041/DC1 Materials and Methods Supplementary Text S1 to S5 Figs. S1 to S6 Tables S1 to S6 References (37–51) 17 October 2014; accepted 1 October 2015 Published online 15 October 2015 10.1126/science.aac7401

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 6 / 10.1126/science.aac7041

Page 7: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. 1. Two approaches for genetic screening in human cells. (Top) CRISPR/Cas9. Cells are transduced with a genome-wide sgRNA lentiviral library. Gene inactivation via Cas9-mediated genomic cleavage is directed by the 20-bp sequence at the 5′ end of the sgRNA. Cells bearing sgRNAs targeting essential genes are depleted in the final population. (Bottom) Gene-trap. KBM7 cells are transduced with a gene-trap retrovirus which integrates in an inactivating or “harmless” orientation at random genomic loci. Essential genes contain fewer insertions in the inactivating orientation. Sample data for two neighboring genes RPL14, encoding an essential ribosomal protein, and ZNF619, encoding a dispensable zinc finger protein, are displayed. For CRISPR/Cas9, sgRNAs are plotted according to their target position along each gene with the height of each bar indicating the level of depletion. Boxes indicate individual exons. For gene-trap, the intronic insertion sites in each gene are plotted according to their orientation and genomic position. The height of each point is randomized.

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 7 / 10.1126/science.aac7041

Page 8: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. 2. Identification and characterization of human cell-essential genes. (A) CRISPR scores (CS) of all genes in the KBM7 cells. Similar proportions of cell-essential genes were identified on all autosomes. (B) KBM7 gene-trap score (GTS) distributions. No low GTS genes were detected on the diploid chromosome 8. (C) CS and GTS of overlapping genes. (D) Yeast homolog essentiality prediction analysis. (E) Broader retention of essential genes across species. (F) Higher sequence conservation of essential genes. (G) Genes with deleterious stop-gain variants are less likely to be essential. (H) Greater connectivity of proteins encoded by essential genes. (I) Higher mRNA transcript levels of essential genes. (J) Genes with paralogs are less likely to be essential. ***P < 0.001 from Kolmogorov-Smirnov test.

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 8 / 10.1126/science.aac7041

Page 9: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. 3. Functional characterization of novel cell-essential genes. (A) Of the 1,878 cell-essential genes identified in the KBM7 cell line, 330 genes had no known function. (B) Genes co-expressed with C16orf80, C3orf17, and C9orf114 across CCLE cell lines were associated with RNA processing. Parentheses denote the number of genes in the set. (C) Proliferation of KBM7 cells transduced with sgRNAs targeting C16orf80, C3orf17, and C9orf114, or a non-targeting control. Error bars denote SD (n = 4). (D) C16orf80 localized to the nucleus and C3orf17 and C9orf114 to the nucleolus. Fibrillarin-RFP was used as a nucleolar marker. (E) Multiple subunits of the spliceosome, RNAse P/MRP and H/ACA ribonucleoprotein complexes interact with C16orf80, C3orf17, and C9orf114, respectively.

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 9 / 10.1126/science.aac7041

Page 10: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. 4. Comparisons of gene essentiality across four cell lines. (A) Heatmap of CS across four cell lines sorted by average CS. (B) CS of genes residing in the non-pseudoautosomal region of chromosome Y for male cell lines (Raji, Jiyoye and KBM7). (C) Sanger sequencing of DDX3X in Raji cells reveals mutations in the 5′ splice site of intron 8. (D) Splice-site mutations in DDX3X result in a 69-bp truncation of the mRNA. PCR primers spanning exons 8 and 9 of DDX3X (denoted by green arrows) were used to amplify cDNA from each line. (E) Essentiality of DDX3X and DDX3Y is determined by the expression and mutation status of their paralogs. (F) Proliferation of GFP- and DDX3X-expressing Raji cells transduced with sgRNAs targeting DDX3Y or an AAVS1-targeting control. Error bars denote SD (n = 4). (G) Analysis of genes on chromosome 22q reveals apparent ‘essentiality’ of 61 contiguous genes in K562 residing in a region of high-copy tandem amplification. (H) Proliferation of K562 and KBM7 cells transduced with sgRNAs targeting a non-genic region within the BCR-ABL amplicon in K562 cells or a non-targeting control. Error bars denote SD (n = 4). (I) γH2AX (phospho-S139 H2AX) immunoblot analysis of K562 transduced with sgRNAs as in (H). S6K1 was used as a loading control. (J) Cleavage within amplified region induced erythroid differentiation in K562 cells as assessed by 3,3′-dimethoxybenzidine hemoglobin staining. (K) Comparison of gene essentiality between the two cancer types reveals oncogenic drivers and lineage specifiers. Genes were ranked by the difference between the average CS of each cancer type.

/ sciencemag.org/content/early/recent / 15 October 2015 / Page 10 / 10.1126/science.aac7041

Page 11: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

www.sciencemag.org/cgi/content/full/science.aac7041/DC1

Supplementary Material for

Identification and characterization of essential genes in the human genome

Tim Wang, Kıvanç Birsoy, Nicholas W. Hughes, Kevin M. Krupczak, Yorick Post, Jenny

J. Wei, Eric S. Lander,* David M. Sabatini*

*Corresponding author. E-mail: [email protected] (E.S.L.); [email protected] (D.M.S.).

Published 15 October 2015 on Science Express DOI: 10.1126/science.aac7041

This PDF file includes:

Materials and Methods Supplementary Text Figs. S1 to S6 Tables S5 and S6 Full Reference List

Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/content/science.aac7041/DC1)

Tables S1 to S4

Page 12: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Methods and Materials Cell lines and vectors Materials were obtained from the following sources: Jiyoye and Raji cells from Dr. Robert Weinberg (Whitehead Institute for Biomedical Research), K562 cells from ATCC, KBM7 cells from Dr. Thijn Brummelkamp (Netherlands Cancer Institute), HEL cells from the Cancer Cell Line Encyclopedia (Broad Institute), pEGFP-C1-Fibrillarin and lentiCRISPR-v1 from Addgene, pGT-GFP, pGT+1-GFP, and pGT+2-GFP were kindly provided by Dr. Thijn Brummelkamp (Netherlands Cancer Institute) and pMXs-IRES-Bsd vector was purchased from Cell Biolabs, Inc. The identities of all cell lines used in this study were authenticated by STR profiling. Cell culture All cells were cultured in IMDM (Life Technologies) and supplemented with 20% Inactivated Fetal Calf Serum (Sigma), 5 mM glutamine, and penicillin/streptomycin. Vector construction The GFP and FLAG-tagged C16orf80, C3orf17, and C9orf114 expression vectors were constructed by cloning, via Gibson assembly, cDNA inserts generated by PCR from a KBM7 cDNA library into versions of the pMXs-IRES-Bsd vector containing a C-terminal GFP or FLAG tag. For sgDDX3Y rescue experiments, GFP and DDX3X expression vectors were constructed by cloning, via Gibson assembly, cDNA inserts generated by PCR from a GFP-tagged pMXs-IRES-Bsd vector (for GFP) and a KBM7 cDNA library into the pMXs-IRES-Bsd vector (for DDX3X). For live-cell imaging, the EGFP cassette in pEGFP-C1-Fibrillarin was replaced with turboRFP. Genome-wide lentiviral sgRNA library construction Oligonucleotides were synthesized on the CustomArray 90K arrays (CustomArray Inc.) as two separate sub-pools. PCR was performed to incorporate overhangs compatible for Gibson Assembly (NEB) into lentiCRISPR-v1 linearized with BsmBI (primer sequences provided below). Gibson Assembly reaction products were transformed into E. cloni 10G SUPREME electrocompetent cells (Lucigen). To preserve the diversity of the library, at least 20-fold coverage in library representation was recovered in each transformation and grown in liquid culture for 16-18 hours. PCR primers for library amplification F-GGCTTTATATATCTTGTGGAAAGGACGAAACACCG R-CTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC Virus production and transduction Virus was produced by co-transfecting the transfer vector of interest with VSV-G envelope plasmid and Delta-Vpr (lentiviral) or Gag-Pol (retroviral) packaging plasmids into HEK-293T cells using XTremeGene 9 transfection reagent (Roche). Media was changed 24 hours after transfection and the virus-containing supernatant was collected 72 hours after transfection and passed through a 0.45 μm filter to eliminate cells. Target cells in 6-well tissue culture plates were infected in media containing 8 μg/mL of polybrene by centrifugation at 2,220 RPM for 45

2

Page 13: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

minutes. 24 hours after infection, virus was removed and cells were selected with the appropriate antibiotics. RNA sequencing Transcriptomic analysis was performed using the strand-specific RNA sequencing protocol described previously (37). Briefly, total RNA was extracted from KBM7, K562, Jiyoye and Raji cells using the RNeasy Mini kit (Qiagen). 5 μg of polyA-selected RNA was fragmented and dephosphorylated after which an ssRNA adapter was then ligated. Reverse transcription (RT) was performed using a primer complementary to the RNA adapter after which a DNA adapter was ligated onto the 3’ end of the resulting cDNA product. The library was then PCR amplified, cleaned, quantified using a TapeStation (Agilent) and sequenced on a HiSeq 2500 (Illumina). All primer sequences for this protocol can be found in (37). Reads were aligned against the human genome version hg19 using Tophat version 2.0.13 (38). Parameters used were ‘--transcriptome-only --no-novel-juncs’ and ‘–transcriptome-index’ for which we used RefSeq transcript models. Transcript abundances were quantified using Cufflinks version 2.2.1 (39). Comparative essentiality testing To compare human gene essentiality with yeast gene essentiality, as assessed by knockout viability (1, 10), 1-to-1 human-yeast homologs mappings were obtained from the Ensembl Gene release 79 database (40). To obtain gene scores for the RNAi dataset, we averaged hairpin scores across 216 cell lines and used the mean hairpin score as the gene score (11). For FPKM (Fragments Per Kilobase of exon per Million fragments mapped) values from the RNA sequencing experiment in KBM7 cells were used. Human genes common to the gene-trap, CRISPR, RNAi and KBM7 RNA-seq datasets were used in the essentiality analysis to control for any biases in mapping. Each dataset was ranked by their respective scores and used to predict the essentiality of yeast homologs. The sensitivity and specificity of these predictions were analyzed using receiver operator characteristic curves. Similar analyses were performed to assess the coverage of the library through down-sampling the number of sgRNAs per gene and the performance of the optimized library compared to previous studies using unoptimized libraries. Features of essential genes To understand the relationship between gene essentiality and various gene properties, CS was intersected with several datasets described in detail below. To visualize the data, genes were ranked by CS and median-binned into approximately 200 bins. To assess phyletic retention across species, the number of homologs was determined for each gene using the Homologene database release 68 (41). To assess sequence divergence between closely related species, the dN/dS ratio was calculated for all 1-1 mouse-human homologs identified in Ensembl Gene relase 79 (42). To assess the evolutionary constraint on human genes, essentiality scores were compared between genes with and without stop-gain alleles with an observed frequency above 0.05% and an average sample depth > 30x using data from the Exome Variant Server 6500 (43). To assess protein network connectivity the number of protein-protein interactions identified in the Biogrid database release 3.3.124 was used (44). To assess gene expression, RNA-sequencing was performed on the KBM7 cell lines and the FPKM values were calculated and used for the analysis. To assess genetic redundancy, essentiality scores were compared between genes with

3

Page 14: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

and without recognizable paralogs in the TreeFam database release 9 (45). Finally, genes of unknown function were defined as those that were not annotated in the Reactome database and contained no unique (i.e. gene-specific) Entrez Gene GeneRIF. Gene set enrichment analysis (GSEA) To identify pathways over- or underrepresented for essential genes, GSEA was performed using the KBM7 CS as a ranked gene list. To identify pathways that were highly expressed in Raji cells, GSEA was performed comparing the FPKM values from RNA-sequencing experiments performed in Raji cells with those from the other three lines. Both analyses were performed using 1,000 permutations and the C2 curated gene sets and the C5 GO gene sets. Western blotting Cells were lysed directly in Laemmeli sample buffer, sonicated, separated on a NuPAGE Novex 16% Tris-Glycine gel, and transferred to a polyvinylidene difluoride membrane (Millipore). Immunoblots were processed according to standard procedures, using primary antibodies directed to S6K1 (CST) and gamma-H2AX (Ser139), clone JBW301 (Millipore) and analyzed using enhanced chemiluminescence with HRP-conjugated anti-mouse and anti-rabbit secondary antibodies (Santa Cruz Biotechnology). Short-term proliferation assay Individual sgRNA constructs targeting C16orf80, C3orf17, C9orf114, DDX3Y, the non-genic region of the BCR-ABL amplicon in K562 cells, and the non-genic region of the JAK2 amplicon in HEL cells were cloned into lentiCRISPR-v1 (sequences provided below). Lentivirus was produced and target cells were transduced and selected as described above for the screens. For the sgDDX3Y experiments, target Raji cells stably expressing GFP and DDX3X were first generated via retroviral transduction. ATP-based measurements of cellular proliferation were performed by plating 2,000 cells per well, biologically replicated six times, in 96-well plates. After 1-5 hours (for the initial time point), 50 µl of Cell Titer-Glo reagent (Promega) were added to each well, mixed for 5 minutes, after which the luminescence was measured on the SpectraMax M5 Luminometer (Molecular Devices). At the indicated times, the same procedure was performed. For all samples, after removal of the highest and lowest outliers of the six measurements, the fold change in luminescence relative to the initial sample was computed. sgC16orf80-1: CGATGCTGTAGAGGATGGAG sgC16orf80-2: TGTCTGAGAAGTAAACCCGT sgC3orf17-1: GTGTGAGAATCCCTAAGGCG sgC3orf17-2: GGGCCAAATGGGGTTTGTGG sgC9orf114-1: GCGGCAGAGAAGGAGGACCG sgC9orf114-2: CAGGCGGGCTCACCTCCGTG sgDDX3Y-1: GCAGTTTAGCGATATTGACA sgDDX3Y-2: TCTTGTTGGGGCTAAAACCA

4

Page 15: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

sgAmplicon-1 (BCR-ABL): GGATGACAGATGATGGATGG sgAmplicon-2 (BCR-ABL): GGCTCCCTTCAAGTGGGATG sgAmplicon-1 (JAK2): GGTTTAATGGAAGAGAAGGG sgAmplicon-2 (JAK2): GAGGCATATTCTTCTCCTGG sgAAVS1: GGGGCCACTAGGGACAGGAT (AAVS1-targeting control) sgCTRL: GGATACTTCTTCGAACGTTT (non-targeting control) Co-expression analysis C16orf80, C3orf17, and C9orf114 expression levels across all CCLE cell lines, as assessed by microarray analysis, was compared with gene expression levels across the CCLE of all other genes in a pairwise fashion to obtain a Pearson correlation coefficient for each gene as a measure of the degree of co-expression (46). The top 200 ranked genes that were not on the same chromosome as the analyzed gene (to filter out highly co-expressing neighboring genes) were then analyzed via DAVID to identify functional categories that were associated with genes concordantly expressed with C16orf80, C3orf17, and C9orf114 (47). Live-cell imaging 100,000 HEK-293T cells stably expressing C-terminal GFP-tagged C16orf80, C3orf17, and C9orf114, generated via retroviral transduction, were seeded on fibronectin-coated glass-bottom 35 mm dishes (MatTek Corporation). The next day, cells were transfected using XtremeGene9 transfection reagent (Roche) with 1.2μg of pTURBORFP-C1-Fibrillarin. 30 minutes before imaging, cells were stained with Hoechst 33342 (Invitrogen) and imaged on a spinning disk confocal microscope (Perkin Elmer) with a 488 nm and a 568 nm laser through a 63X objective. Immunoprecipitation and mass spectrometry data analysis KBM7 lines stably expressing C-terminal FLAG-tagged C16orf80, C3orf17, and C9orf114 were generated by viral transduction and FLAG-immunoprecipitated for mass spectrometric analysis. Briefly, 1 billion C16orf80-FLAG, C3orf17-FLAG, C9orf114-FLAG, and wild-type KBM7 cells were pelleted and rinsed twice with ice-cold PBS and lysed in 1mL lysis buffer (1% NP40, 50mM Tris-HCl pH 7.5 150mM NaCl, 1 tablet of EDTA-free protease inhibitor (Roche; per 25 ml buffer)). The soluble fractions of the cell lysates were isolated by centrifugation at 13,000 rpm in a microcentrifuge for 10 minutes. The Anti-FLAG-M2 magnetic beads were washed with lysis buffer three times. Subsequently, 50 µl was added to cleared cell lysates and incubated with rotation overnight at 4°C. Next, the beads were washed twice with lysis buffer and twice in wash buffer (50mM Tris-HCl pH 7.5 150mM NaCl, 1 tablet of EDTA-free protease inhibitor (Roche; per 25 ml buffer)). The resulting samples were processed and analyzed by mass spectrometry using iTRAQ as previously described (48). We assessed enrichment as previously described (49). Briefly, for each peptide, a log2 fold change was calculated between its intensity in the ORF-expressing sample compared to the wild-type control pull-down. The peptides were mapped to the Uniprot database and, for each gene, the median fold-change value was used. Enrichment scores were then calculated by subtracting

5

Page 16: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

the median of the entire distribution of the log2 transformed values across all genes to center the fold change distribution. Characterization of DDX3X Genomic DNA was extracted from Jiyoye and Raji cells and amplified using primers directed against the exon-intron 8 boundary (primer sequences provided below). PCR amplicons were purified and analyzed by Sanger sequencing (Eton Bioscience Inc.). cDNA libraries were prepared from RNA from Raji, Jiyoye, KBM7 and K562 cells and amplified with primers targeting exons 8 and 9 of DDX3X (primer sequences provided below). PCR amplicons were then analyzed by gel electrophoresis. Genomic DNA primers for DDX3X F-CTTGCGTGGTTTATGGTGGTG R-GCCCATCCTAGTTGACTGTCC cDNA primers for DDX3X F-GGTATTAGCACCAACGAGAGAGT R-GCCAACTCTTCCTACAGCCAA

6

Page 17: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Supplementary Text Note S1. CRISPR/Cas9-based screen. sgRNA Library Design The optimized sgRNA library was designed with the same specificity requirements as previously described (4). Briefly, candidate sgRNAs were first filtered for potential off-target matches (for duplicated or highly homologous genes and gene families with less than 5 members, this requirement was relaxed to allow for sgRNAs that targeted multiple homologs). In contrast to previous collections, the sgRNAs in this library were designed for high cleavage activity. Efficient sgRNAs are critical for essentiality screens because an sgRNA can only reliably assess essentiality if it cleaves and inactivates its gene target in a large proportion of the cells expressing it. To identify rules governing target cleavage efficiency, a support-vector-machine classifier was constructed using the target sequences (encoded by 80 binary features for positions 1-20 and nucleotides A, C, G, and T) to predict depletion scores of ribosomal protein-targeting sgRNAs from a previous pooled proliferation screen described in (4). As these sgRNAs are all expected to be essential, differences in their depletion levels reflected differences in cleavage efficiency. Using the SVM classifier, new sgRNA candidate sequences were ranked in order of their predicted cleavage efficacy and the best 4-10 (mode=10) candidates for each gene were selected for synthesis. In total, we constructed a novel library, containing 178,896 sgRNAs targeting 18,166 protein-coding genes in the human consensus CDS (CCDS) and 1,004 non-targeting control sgRNAs Screen procedure The Cas9-expressing sgRNA library lentivirus was produced in HEK-293T cells as described above. In all screens, 240 million target cells were transduced with the viral pool to achieve an average 1000-fold coverage of the library after selection. After 72 hours, cells were selected with puromycin and an initial pool of 80 million cells was harvested for genomic DNA extraction. The remaining cells were passaged every 3 days, and after 14 doublings, 80 million cells were harvested for genomic DNA extraction. Screen deconvolution sgRNA inserts were PCR amplified from 50-75 million genome equivalents of DNA from each initial and final sample, achieving an average coverage of ~275-400x of the sgRNA library. The resultant PCR products were purified and sequenced on a HiSeq 2500 (Illumina) (primer sequences provided below) to monitor the change in the abundance of each sgRNA between the initial and final cell populations. sgRNAs targeting essential genes are expected to be depleted from the population, while those targeting dispensable genes should be maintained. The results for two neighboring genes, RPL14, an essential ribosomal protein gene, and ZNF619, a dispensable gene encoding a zinc finger protein, illustrate the expected pattern (Fig. 1). Primer sequences for sgRNA quantification F-AATGATACGGCGACCACCGAGATCTAGAATACTGCCATTTGTCTCAAG R-CAAGCAGAAGACGGCATACGAGATCnnnnnnTTTCTTGGGTAGTTTGCAGTTTT (nnnnn denotes the sample barcode)

7

Page 18: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Illumina sequencing primer CGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC Illumina indexing primer TTTCAAGTTACGGTAAGCATATGATAGTCCATTTTAAAACATAATTTTAAAACTGCAAACTACCCAAGAAA Data analysis Sequencing reads were aligned to the sgRNA library and the abundance of each sgRNA was calculated. The sgRNA counts from the initial populations of all four cells lines were combined to generate an initial reference set. sgRNAs with less than 400 counts in this set were removed from downstream analyses. The log2 fold change in abundance of each sgRNA was calculated for final population samples for each of the cell lines after adding a count of one as a pseudocount. Gene-based CRISPR scores (CS) were defined as the average log2 fold change of all sgRNAs targeting a given gene and calculated for all screens. The CS reported for the KBM7 cell line was the average of two independent replicate experiments. To identify genes essential for optimal proliferation under standard media conditions, the log2 fold change distribution for all sgRNAs targeting a given gene was compared with the entire distribution using a Kolmogorov-Smirnov test using the ks_2samp function from the scipy.stats Python library. The resulting p-values were corrected using the Benjamini-Hochberg procedure. Genes with a CS < -0.1 and corrected p< 0.05 in the KBM7 cell line were defined as cell-essential in downstream analyses. To identify cell line-specific essential genes, the CS distribution of each line was mean-normalized to zero. For each gene in each line, the CS in the given line was subtracted by the minimum CS in the other three lines to define a cell line-specific essentiality score (negative values indicate cell line specificity). For each line, genes with a differential score less than -1.5 (~4 standard deviations from the mean score) whose minimum CS in the other three lines was greater than -1 were defined as cell line-specific genes. To identify cancer type-specific essential genes, the log2 sgRNA fold change distribution of each line was mean-normalized to zero. For each sgRNA, a differential essentiality score was then defined as the average log2 fold change in the two CML lines subtracted by the average log2 fold change in the two Burkitt’s lymphoma lines (with positive and negative value representing Burkitt’s lymphoma- and CML-specific essentiality, respectively). The differential essentiality score distribution for all sgRNAs targeting a given gene was compared with the entire distribution using a Kolmogorov-Smirnov test using the ks_2samp function from the scipy.stats Python library. The resulting p-values were corrected using the Benjamini-Hochberg procedure. Candidate genes with corrected p < 0.05 were then filtered by additional CS-based criteria after mean-normalization of the CS: (i) the CS must perfectly segregate between cell lines of the two cancer types (ii) CS must be less than -1 in both lines of the given cancer type (iii) the gene must not be essential (CS less than -1) in either cell line of the other cancer type (iv) the average difference in CS between cell lines of the two cancer types must be greater than 1. Genes fulfilling all of these criteria were designated as cancer-type specific. Identical criteria were applied to identify “set-selective” genes for the permuted sample groupings.

8

Page 19: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Note S2. Gene-trap-based screen. Screen procedure Gene-trap retrovirus was produced in HEK-293T cells as previously described (7). 100 million KBM7 cells were infected and, after 3 days, an initial pool of 100 million cells was harvested for genomic DNA extraction. The remaining cells were passaged every 3 days, and after 14 population doublings, 100 million cells were harvested for genomic DNA extraction. Screen deconvolution To date, haploid genetic screens have been limited to probing phenotypes amenable to positive selection. In positive selection screens, candidate genes are enriched for disruptive insertions and can be readily detected in the surviving mutant population by using inverse PCR. However, this protocol is unsuited for negative selection screening, in which inserts in the genes of interest are underrepresented. Identification of these genes requires a highly accurate and efficient method for measuring the presence of insertion sites in all genes. Toward this end, we developed such a protocol using ‘splinkerette’-based PCR, which enables efficient amplification of DNA fragments with only one known end, coupled with massively parallel sequencing to map genomic regions proximal to each gene-trap insertion sites (50). Briefly, 100 million genome equivalents of DNA were digested with NlaIII and/or MseI to produce sticky ends to which double stranded ‘splinkerette’ adapters were ligated (adapter sequences provided below). Two rounds of nested PCR were performed to generate an insert junction library. We sequenced the resultant library on a HiSeq 2000 (Illumina) (primer sequences provided below) and, after aligning the reads to the reference genome, tallied the number and orientation of unique integration events in each gene. Essential genes are expected to contain fewer inactivating insertions (i.e., in the ‘sense’ orientation) relative to the number of ‘harmless’ inserts (i.e., in the ‘anti-sense’ orientation), whereas non-essential genes are expected to show no such bias. The results for the neighboring genes RPL14 and ZNF619 show the expected pattern (Fig. 1). Splinkerette adapters MseI adapters F-CGCGAACAACGCTAACGACGCGAACGACAGC R-TAGCTGTCGTTCGCGTCGTAAAAAAACTTTTTTT NlaIII adapters F-CGCGAACAACGCTAACGACGCGAACGACAGCCATG R-GCTGTCGTTCGCGTCGTAAAAAAACTTTTTTT Primer sequences for insertion quantification Outer primers F-CGAGTCCACGATTCGGATGCAA R-CGCGAACAACGCTAACGACG Inner primers F-AATGATACGGCGACCACCGAGATCTACACGAAACATCTGATGGTTCTCTAGCTTGCC R-CAAGCAGAAGACGGCATACGAGATCnnnnnnCACTTACCGCTAACGACGCGAACGACAG (nnnnn denotes the sample barcode)

9

Page 20: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Illumina sequencing primer CTAGCTTGCCAAACCTACAGGTGGGGTCTTTCA Illumina indexing primer GCTGTCGTTCGCGTCGTTAGCGGTAAGTG Data analysis Reads were de-duplicated (as only unique insertions were counted) then mapped to the reference human genome and intersected with RefSeq transcript models. For each transcript, the number of unique sense and anti-sense insertions in all intronic regions was tallied. For genes with multiple transcript models, the fraction of inactivating inserts was calculated for each transcript and the transcript with the minimum value was defined as the gene-trap score (GTS). The accuracy of the GTS for a given gene depends on the total number of insertions observed. Therefore, we determined a minimum number (n) of anti-sense inserts in a gene required for inclusion in downstream analyses by assessing the concordance between replicate experiments. For this test, genes shared between replicates 1 and 2 were compared. For increasing n, the correlation between gene-trap replicates increased, reaching a plateau at ~0.89 at n=65, which we used as a cutoff for subsequent GTS analyses (Fig. S1C-D). In the final dataset, we combined insertional data from 3 independent replicates of the final population at the transcript level, calculated the GTS for each gene, and removed all genes with less than n=65 anti-sense insertions.

10

Page 21: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Note S3. Analysis of cell-essential genes in the KBM7 gene-trap screen. The gene-trap method allows for a principled estimate of the number of genes essential for optimal proliferation by comparing the GTS distributions between the initial and final populations. An excess of low GTS genes on the haploid chromosomes (all except 8) in the final population indicated that approximately 1,142 of the 6,694 genes with adequate insertional data (n=65 anti-sense inserts) are cell-essential (Fig. 2B). Notably, the proportion of genes that scored as cell-essential by the gene-trap method (17%) is an overestimate of the proportion of essential genes in the genome as a whole. The retrovirus utilized in these screens exhibits a substantial preference for integrating into regions of active transcription (51). As a result, silent or lowly-expressed genes (that are unlikely to be essential) are not targeted and therefore excluded from analysis. We also compared our results with those from a co-published study by Blomen et al. in which a similar proliferation-based screen was performed in KBM7 cells using gene-trap mutagenesis. Despite two major methodological differences between our experiments (namely, the use of different splice acceptor sequences in the gene-trap vectors and an additional purification for haploid cells prior to mutagenesis implemented by Blomen et al.), the results were highly concordant. For the overlap analysis, the set of genes on the haploid chromosomes where n=65 in both datasets was considered (6,285 genes). For this common set of genes, the GTS between the two studies was well-correlated (r=0.81) (Fig. S1E). Furthermore, 1,039 of the top 1,250 genes, as ranked by the GTS ratio score, from both data sets (83%) were overlapping (Fig. S1F). The high level of agreement reached between these two experiments performed in different labs with different subclones of KBM7 cells using different reagents (in addition to the aforementioned differences in experimental protocols) provides a strong demonstration of the robustness of this method and the validity of our results.

11

Page 22: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Note S4. Paralogous gene expression may underlie Jiyoye-specific essential genes. We identified several additional instances of cell line-specific essentiality that could be attributed to paralogous gene pairs. For example, the cyclin-dependent kinase CDK6 was specifically essential in Jiyoye cells, in which its paralog, CDK4, is specifically not expressed (Fig. S4C). Similar patterns were observed in this cell line for two other pairs of paralogous genes, HK1/HK2 and SLC2A1/SLC2A3 (also known as GLUT1/GLUT3), which are both involved in glucose metabolism (Fig. S4C). To assess the essentiality of functions supplied by a set of paralogous genes, it may be useful to design libraries containing multiple sgRNAs to simultaneously inactivate all members of the set.

12

Page 23: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Note S5. Cell line-specific essential functions in Raji and KBM7 cells. In Raji cells, two of the three subunits comprising the heterotrimeric IκB kinase complex, a positive regulator of the NF-κB pathway, scored strongly (with the third subunit nearing our selectivity threshold) indicating the importance of this pathway in this cell line (Fig. S4E). Consistent with this hypothesis, we found that Raji cells show a distinctive gene-expression signature of NF-κB pathway activation that may underlie their unique dependence (Fig. S4F). In KBM7 cells, we found several sets of cell line-specific essential genes that encode physically interacting or functionally related gene products, including RAD51B/RAD51D/FANCM/RTEL1, MCL1/BCL2, RUNX1/CBFB, LIPT2/LIAS, and SEPHS2/SEPSECS; the biological bases for the selective essentiality of these gene sets remain to be defined, although it is tempting to speculate whether the importance of the various DNA-repair components is related to the unique haploid karyotype of this line (Table S5).

13

Page 24: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Figures

14

Page 25: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. S1. Replicate screening experiments are well correlated. (A) CS from replicate CRISPR/Cas9 screens in KBM7 cells. (B) Gene-set enrichment analysis (GSEA). Chemokines are underrepresented among essential genes in the pooled screens. (C) Correlation between gene-trap replicates at various cutoffs for the required minimum number of anti-sense inserts observed in a gene, n. (D) GTS from replicate gene-trap screens for n=65. (E) Correlation between full gene-trap dataset from our study and co-published study for all genes on the haploid chromosomes where n=65 in both datasets (6,285 genes). (F) Overlap analysis. The set of genes in (E) was ranked by the GTS in both datasets and the proportion of overlapping genes between the top X genes of both datasets, for all X between 1 and 6,285, was determined. Of the top 1,250 genes in both datasets, 1,039 genes (83%) were overlapping, indicating the high concordance between these two studies.

15

Page 26: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. S2. sgRNA library characterization. (A) Gene scores from genes common to all three studies were used to predict the essentiality of yeast homologs. Receiver operator characteristic (ROC) curve analysis revealed a substantial improvement with the optimized library. (B) Area under the curve (AUC) values from ROC analysis using gene scores from down-sampling experiments. Error bars denote SD. (n=25 samplings).

16

Page 27: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

17

Page 28: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. S3. Characteristics of unannotated essential genes. (A) Cell-essential genes are involved in fundamental biological processes. GSEA was performed on genes ranked by CS. (B) GO term analysis. Interpro domains were ranked by their fold-enrichment within the unannotated essential genes compared to all genes and mapped to GO terms. A K-S test was performed for each GO term to identify terms over-represented for high-ranking (i.e. unannotated essential gene-enriched) domains. (C) Comparisons with the nucleolar proteomic dataset from (21) revealed a substantial enrichment of nucleolar gene products encoded by the uncharacterized, cell-essential genes as compared to the rest of the genome. Parentheses denote the fraction of nucleolar genes.

18

Page 29: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

19

Page 30: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. S4. Cell line-specific essentiality. (A) Correlation of GTS from screens conducted in KBM7 cells with CS from screens conducted in all four cells lines. (B) For each line, genes are ranked by the difference between mean-normalized CS in the line and the minimum, mean-normalized CS of the other three lines. Arrows along the distribution for K562 indicate genes residing the high-copy tandem amplification. (C) Specific gene essentiality of CDK4, HK2, and SLC2A1 in Jiyoye cells due to absent/low expression of paralogs, CDK6, HK1, and SLC2A3. (D) Differential requirements for GATA1 and GATA in K562 and KBM7 cells, respectively. (E) Raji-specific essentiality of CHUK and IKBKB (IKBKG approaches selectively threshold) which form the heterotrimeric IκB kinase complex. (F) GSEA analysis reveals NF-κB pathway activation specifically in Raji cells.

20

Page 31: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. S5. Cytotoxicity induced by Cas9-mediated cleavage within highly amplified regions. (A) Out of all cell lines in the CCLE, K562 cells are the most amplified at the BCR locus, as assessed by DNA microarray copy number analysis. Only the top 250 cell lines are displayed for clarity. (B) Similar analysis as in (A) for the ABL1 locus. (C) Two sgRNAs targeting non-genic sites within the BCR-ABL amplicon induce erythroid differentiation, as assessed by hemoglobin production. (D) Similar analysis as in (A) for the JAK2 locus in HEL cells. (E) Two sgRNAs targeting non-genic sites within the JAK2 amplicon exhibit toxicity in HEL but not K562 cells. Error bars denote SD (n=4). (F) Model of Cas9-mediated cleavage in a prototypical region of high-copy tandem amplification.

21

Page 32: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Fig. S6. Cancer-type specific essentiality. (A) Differential gene essentiality of cell lines paired by cancer type or paired randomly. (B) Quantile-quantile plot of cancer type-specific essentiality versus permuted set-specific essentiality. (C) Greater number of cancer type-specific essential genes versus permuted set-specific essential genes. (D) CHM and RPP25L essentiality in Raji and Jiyoye cells is likely due to the lack of expression of paralogs, CHML and RPP25, respectively.

22

Page 33: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Table Captions Table S1 (separate file) Annotations for the genome-wide sgRNA library containing spacer sequences and target gene information. Table S2 (separate file) Raw sgRNA counts from initial and final KBM7, K562, Raji, and Jiyoye cell populations. Table S3 (separate file) CRISPR scores (CS) and K-S test p-values adjusted for multiple hypotheses testing from screens in KBM7, K562, Raji, and Jiyoye cells. Values for KBM7 replicates are averaged. Table S4 (separate file) Gene-trap scores (GTS) from the gene-trap screen in KBM7 cells. Only genes with at least 65 or more anti-sense insertions were analyzed.

23

Page 34: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Tables

Rank Gene KBM7 K562 Jiyoye Raji Scoring cell line Comment

1 KIF18A -2.44 0.36 0.13 -0.15 KBM7 2 MCL1 -2.56 -0.52 -0.11 -0.44 KBM7 BCL2/MCL1 pair 3 ERG -1.89 0.41 0.01 0.21 KBM7 4 CBFB -2.21 -0.42 0.21 0.64 KBM7 CBFB/RUNX1 pair 5 GATA2 -2.32 0.00 -0.34 -0.53 KBM7 HSC/CMP master regulator 6 RBM10 -1.41 0.72 0.34 0.60 KBM7 7 SEPSECS -2.08 0.31 -0.39 0.19 KBM7 SEPSECS/SEPHS2 pair 8 RUNX1 -2.30 -0.63 -0.61 0.22 KBM7 CBFB/RUNX1 pair

9 RAD51B -1.54 0.36 0.29 0.13 KBM7 RAD51B/RAD51D/ RTEL1/FANCM set

10 USP17L21 -2.11 -0.05 -0.31 -0.45 KBM7

11 RAD51D -2.55 -0.86 -0.86 -0.90 KBM7 RAD51B/RAD51D/ RTEL1/FANCM set

12 TRAF2 -1.35 0.28 0.53 0.25 KBM7 13 LIAS -2.17 -0.57 -0.27 -0.44 KBM7 LIAS/LIPT2 pair 14 STAT5B -2.25 -0.66 0.27 0.19 KBM7 BCR-ABL pathway

15 RTEL1 -2.43 -0.83 -0.29 -0.69 KBM7 RAD51B/RAD51D/ RTEL1/FANCM set

16 BCL2 -1.51 0.86 0.48 0.08 KBM7 BCL2/MCL1 pair

17 FANCM -1.43 0.08 0.19 0.19 KBM7 RAD51B/RAD51D/ RTEL1/FANCM set

18 SEPHS2 -1.90 -0.40 -0.24 -0.20 KBM7 SEPSECS/SEPHS2 pair 19 LIPT2 -1.59 -0.09 0.09 0.09 KBM7 LIAS/LIPT2 pair 1 SMCHD1 0.09 -4.82 0.15 0.65 K562 2 MAPK1 -0.46 -4.74 0.25 -0.29 K562 In amplicon 3 PELO -0.62 -4.79 -0.30 -0.57 K562 4 FIBCD1 0.10 -3.72 0.28 0.27 K562 In amplicon 5 AHCYL1 -0.51 -4.25 -0.29 -0.50 K562 6 QRFP 0.01 -3.68 0.61 0.40 K562 In amplicon 7 SDF2L1 0.39 -3.28 0.43 0.46 K562 In amplicon 8 GATA1 -0.06 -3.73 0.17 0.16 K562 MEP master regulator 9 RAB36 -0.11 -3.77 0.37 0.16 K562 In amplicon

10 GNAZ 0.29 -3.24 0.20 0.55 K562 In amplicon 11 HIC2 0.20 -3.79 -0.14 -0.53 K562 In amplicon 12 IGLL5 0.47 -2.76 0.35 0.55 K562 In amplicon 13 GGTLC2 -0.33 -3.73 -0.65 -0.51 K562 In amplicon 14 ARVCF 0.02 -3.18 0.17 -0.19 K562 In amplicon 15 PPM1F -0.30 -3.52 -0.13 -0.57 K562 In amplicon 16 YPEL1 -0.02 -2.93 -0.04 0.00 K562 In amplicon 17 REXO2 -0.88 -3.78 -0.69 0.07 K562 18 LAMC3 -0.04 -2.90 0.13 0.14 K562 In amplicon

24

Page 35: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

19 ZDHHC8 0.03 -3.04 -0.08 -0.19 K562 In amplicon 20 FLVCR1 0.12 -3.19 0.10 -0.35 K562 21 KLHL22 -0.06 -2.85 0.04 -0.31 K562 In amplicon 22 PRAME 0.13 -2.31 0.18 0.07 K562 In amplicon 23 TOP3B 0.15 -2.45 0.13 -0.07 K562 In amplicon 24 TXNRD2 0.15 -2.44 -0.12 -0.02 K562 In amplicon 25 ZNF280B 0.50 -2.07 0.18 0.17 K562 In amplicon 26 AIF1L 0.19 -2.05 0.26 0.29 K562 In amplicon 27 TRMT2A -0.02 -2.50 -0.27 -0.16 K562 In amplicon 28 FAM155A 0.54 -1.91 0.31 0.65 K562 29 ZNF280A 0.31 -1.91 0.31 0.29 K562 In amplicon 30 CBFA2T3 0.27 -2.17 0.02 0.12 K562 31 HIRA -0.23 -2.78 -0.62 -0.22 K562 In amplicon 32 DGCR2 0.45 -1.80 0.32 0.35 K562 In amplicon 33 RTDR1 0.23 -2.03 0.04 0.07 K562 In amplicon 34 P2RX6 0.25 -1.81 0.27 0.36 K562 In amplicon 35 CRKL 0.21 -1.93 0.12 0.15 K562 In amplicon 36 SLC7A4 0.41 -1.61 0.43 0.39 K562 In amplicon 37 FAM78A 0.17 -1.78 0.25 0.27 K562 In amplicon 38 TRIP12 0.22 -1.71 0.43 0.64 K562 39 TSSK2 0.31 -2.02 -0.04 -0.10 K562 In amplicon 40 EPS8L3 -0.43 -2.34 -0.41 -0.32 K562 41 SIPA1 -0.45 -2.30 -0.42 -0.29 K562 42 C22orf29 -0.51 -2.40 -0.49 -0.58 K562 In amplicon 43 L3MBTL2 0.37 -1.49 0.32 0.65 K562 44 AIFM3 -0.21 -2.30 -0.55 -0.19 K562 In amplicon 45 VPREB1 0.04 -1.92 -0.04 -0.20 K562 In amplicon 46 YDJC 0.46 -1.88 0.24 -0.17 K562 In amplicon 47 EIF2AK4 0.09 -1.73 0.13 -0.03 K562 48 RTN4R 0.30 -1.49 0.21 0.22 K562 In amplicon 49 CNNM4 -0.31 -2.01 -0.30 -0.20 K562 50 SSBP3 -0.46 -2.13 -0.19 -0.25 K562 51 DGCR6L -0.07 -2.26 0.07 -0.59 K562 In amplicon 52 PTPN1 0.64 -1.86 0.04 -0.21 K562 53 CLTCL1 0.27 -1.36 0.36 0.40 K562 In amplicon 54 ARL1 0.34 -2.37 0.39 -0.74 K562 55 CCT8L2 0.27 -1.48 0.45 0.14 K562 In amplicon 56 MED16 0.29 -2.07 -0.03 -0.45 K562 57 FRMPD2 0.07 -1.54 0.61 0.50 K562 58 CRAMP1L 0.47 -1.65 0.45 -0.04 K562 59 COMT 0.22 -1.57 0.16 0.03 K562 In amplicon 60 TMEM63A 0.13 -1.60 0.64 0.00 K562 61 HSP90AB1 -0.10 -1.68 0.27 -0.09 K562 62 TBC1D31 0.14 -1.43 0.27 0.42 K562 63 SREBF1 -0.65 -2.19 0.21 -0.22 K562

25

Page 36: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

1 CDK4 -0.27 0.08 -3.20 -0.19 Jiyoye Paralog expression 2 GOT1 0.45 0.17 -1.90 0.24 Jiyoye 3 HK2 -0.48 -0.75 -2.82 -0.61 Jiyoye Paralog expression 4 GPI -0.66 -0.65 -2.63 0.00 Jiyoye 5 DERL1 0.02 -0.14 -1.91 -0.28 Jiyoye 6 SUCLG1 -0.13 -0.44 -2.03 0.28 Jiyoye 7 SLC2A1 -0.16 -0.34 -1.88 0.34 Jiyoye Paralog expression 1 DDX3Y 0.55 1.11 0.52 -1.37 Raji Paralog mutation 2 IKBKB -0.27 -0.55 0.09 -2.41 Raji NF-κB pathway 3 CLCN3 0.33 1.03 0.55 -1.52 Raji 4 ACSL1 0.58 0.56 0.23 -1.56 Raji 5 SH3GL1 0.05 0.58 -0.48 -2.21 Raji 6 CHUK 0.38 0.24 0.26 -1.41 Raji NF-κB pathway

Table S5. Cell-line specific hits. CRISPR scores for cell line-specific hits from all four cell lines. CRISPR scores are mean-normalized.

26

Page 37: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

Rank Gene KBM7 K562 Jiyoye Raji

Average differ-ence

Adjusted p-value

Scoring cancer type Comment

1 ABL1 -3.98 -4.80 0.29 -0.10 -4.48 1.2E-05 CML BCR-ABL pathway

2 BCR -3.05 -5.30 -0.29 -0.32 -3.88 6.3E-05 CML BCR-ABL pathway

3 SOS1 -2.26 -3.47 0.50 0.36 -3.30 8.5E-05 CML BCR-ABL pathway

4 GRB2 -1.71 -3.36 0.71 0.34 -3.06 1.1E-02 CML BCR-ABL pathway

5 LMO2 -2.18 -2.83 0.29 0.25 -2.78 5.5E-05 CML

6 ATIC -2.96 -3.82 -0.43 -0.87 -2.74 5.5E-05 CML One-carbon metabolism

7 GAB2 -1.54 -3.35 0.24 -0.20 -2.47 6.5E-05 CML BCR-ABL pathway

8 FKBPL -2.34 -2.37 -0.21 -0.10 -2.20 5.6E-05 CML 9 ATP1A1 -2.70 -2.57 -0.29 -0.72 -2.13 4.2E-04 CML

10 MTHFD1 -2.03 -3.25 -0.44 -0.63 -2.11 1.2E-04 CML One-carbon metabolism

11 PIK3C3 -2.52 -1.71 -0.02 -0.15 -2.03 2.6E-04 CML 12 OBFC1 -1.33 -1.86 0.35 0.18 -1.86 2.1E-03 CML 13 HSD17B12 -1.50 -3.07 -0.15 -0.86 -1.78 4.9E-04 CML 14 PGM3 -1.48 -1.39 0.20 0.35 -1.71 1.1E-02 CML 15 NSMCE1 -2.34 -1.65 -0.20 -0.46 -1.67 1.0E-03 CML 16 PMVK -2.23 -1.68 -0.18 -0.44 -1.65 1.0E-02 CML

17 SHMT2 -1.82 -1.03 0.28 0.14 -1.64 6.3E-05 CML One-carbon metabolism

18 NRF1 -1.46 -3.13 -0.86 -0.74 -1.50 3.8E-02 CML 19 MYBL2 -1.58 -1.71 -0.16 -0.14 -1.50 3.4E-02 CML

20 MINOS1 -2.40 -1.61 -0.33 -0.71 -1.48 1.3E-02 CML ETC assembly factor

21 WRB -1.32 -1.29 -0.12 0.36 -1.42 8.6E-03 CML 22 CREBBP -1.85 -1.18 -0.30 -0.05 -1.34 4.9E-02 CML 23 BAG6 -1.18 -1.75 -0.32 0.06 -1.34 1.1E-03 CML 24 ZNHIT3 -1.01 -1.70 -0.12 0.03 -1.31 2.7E-02 CML

25 SCO2 -2.04 -1.88 -0.64 -0.73 -1.28 1.6E-03 CML ETC assembly factor

26 CENPW -1.52 -1.87 -0.43 -0.41 -1.27 3.0E-02 CML 27 UBE2L3 -1.38 -2.81 -0.96 -0.71 -1.26 4.3E-03 CML 28 THG1L -1.06 -2.59 -0.64 -0.51 -1.26 1.8E-02 CML 29 COASY -2.16 -2.18 -0.89 -0.95 -1.25 7.9E-03 CML 30 FASTKD5 -1.73 -1.67 -0.62 -0.30 -1.24 5.0E-02 CML

27

Page 38: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

31 NAE1 -1.06 -1.80 -0.72 0.29 -1.21 1.0E-02 CML

32 PET117 -2.09 -1.29 -0.86 -0.31 -1.10 2.0E-02 CML ETC assembly factor

33 BCS1L -1.61 -1.55 -0.52 -0.50 -1.07 4.1E-02 CML ETC assembly factor

1 RPP25L 0.42 0.72 -2.75 -1.66 2.78 1.9E-04 BL Paralog not expressed

2 CHM 0.50 0.94 -1.94 -2.14 2.76 1.2E-05 BL Paralog not expressed

3 EBF1 0.39 0.22 -1.40 -2.76 2.38 6.7E-04 BL B-cell transcription factor

4 MEF2B -0.06 -0.62 -2.43 -2.27 2.01 5.5E-05 BL Mutated in lymphoma

5 MEF2BNB-MEF2B -0.06 -0.62 -2.43 -2.27 2.01 5.5E-05 BL

6 POU2AF1 0.59 0.32 -1.01 -2.04 1.98 5.5E-05 BL B-cell transcription factor

7 CCND3 0.08 0.05 -2.40 -1.41 1.97 1.7E-02 BL Mutated in lymphoma

8 PAX5 0.18 0.55 -1.49 -1.28 1.75 1.1E-03 BL B-cell transcription factor

9 LOC100287177 -0.35 0.87 -1.33 -1.50 1.68 3.5E-04 BL

10 PPIAL4D -0.57 -0.49 -1.91 -1.83 1.34 7.0E-05 BL 11 STAG2 -0.07 -0.13 -1.48 -1.41 1.34 1.1E-03 BL 12 NBPF24 -0.90 0.11 -1.74 -1.61 1.28 1.2E-03 BL 13 TTC7A -0.02 -0.48 -1.58 -1.40 1.24 6.7E-03 BL 14 LOC147646 -0.14 0.19 -1.02 -1.27 1.17 2.7E-02 BL 15 GTF2E2 -0.48 -0.94 -1.18 -2.25 1.00 1.3E-02 BL

Table S6. Cancer type-specific hits. CRISPR scores for cancer type-specific hits from all four cell lines and average differences between cancer types. CRISPR scores are mean-normalized.

28

Page 39: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

References and Notes 1. G. Giaever, A. M. Chu, L. Ni, C. Connelly, L. Riles, S. Véronneau, S. Dow, A. Lucau-Danila,

K. Anderson, B. André, A. P. Arkin, A. Astromoff, M. El-Bakkoury, R. Bangham, R. Benito, S. Brachat, S. Campanaro, M. Curtiss, K. Davis, A. Deutschbauer, K. D. Entian, P. Flaherty, F. Foury, D. J. Garfinkel, M. Gerstein, D. Gotte, U. Güldener, J. H. Hegemann, S. Hempel, Z. Herman, D. F. Jaramillo, D. E. Kelly, S. L. Kelly, P. Kötter, D. LaBonte, D. C. Lamb, N. Lan, H. Liang, H. Liao, L. Liu, C. Luo, M. Lussier, R. Mao, P. Menard, S. L. Ooi, J. L. Revuelta, C. J. Roberts, M. Rose, P. Ross-Macdonald, B. Scherens, G. Schimmack, B. Shafer, D. D. Shoemaker, S. Sookhai-Mahadeo, R. K. Storms, J. N. Strathern, G. Valle, M. Voet, G. Volckaert, C. Y. Wang, T. R. Ward, J. Wilhelmy, E. A. Winzeler, Y. Yang, G. Yen, E. Youngman, K. Yu, H. Bussey, J. D. Boeke, M. Snyder, P. Philippsen, R. W. Davis, M. Johnston, Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002). Medline doi:10.1038/nature00935

2. L. Cong, F. A. Ran, D. Cox, S. Lin, R. Barretto, N. Habib, P. D. Hsu, X. Wu, W. Jiang, L. A. Marraffini, F. Zhang, Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013). Medline doi:10.1126/science.1231143

3. H. Wang, H. Yang, C. S. Shivalila, M. M. Dawlaty, A. W. Cheng, F. Zhang, R. Jaenisch, One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910–918 (2013). Medline doi:10.1016/j.cell.2013.04.025

4. T. Wang, J. J. Wei, D. M. Sabatini, E. S. Lander, Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014). Medline doi:10.1126/science.1246981

5. O. Shalem, N. E. Sanjana, E. Hartenian, X. Shi, D. A. Scott, T. S. Mikkelsen, D. Heckl, B. L. Ebert, D. E. Root, J. G. Doench, F. Zhang, Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014). Medline doi:10.1126/science.1247005

6. H. Koike-Yusa, Y. Li, E. P. Tan, M. C. Velasco-Herrera, K. Yusa, Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014). Medline doi:10.1038/nbt.2800

7. J. E. Carette, C. P. Guimaraes, M. Varadarajan, A. S. Park, I. Wuethrich, A. Godarova, M. Kotecki, B. H. Cochran, E. Spooner, H. L. Ploegh, T. R. Brummelkamp, Haploid genetic screens in human cells identify host factors used by pathogens. Science 326, 1231–1235 (2009). Medline doi:10.1126/science.1178955

8. J. E. Carette, M. Raaben, A. C. Wong, A. S. Herbert, G. Obernosterer, N. Mulherkar, A. I. Kuehne, P. J. Kranzusch, A. M. Griffin, G. Ruthel, P. Dal Cin, J. M. Dye, S. P. Whelan, K. Chandran, T. R. Brummelkamp, Ebola virus entry requires the cholesterol transporter Niemann-Pick C1. Nature 477, 340–343 (2011). Medline doi:10.1038/nature10348

9. I. A. Tchasovnikarova, R. T. Timms, N. J. Matheson, K. Wals, R. Antrobus, B. Göttgens, G. Dougan, M. A. Dawson, P. J. Lehner, Epigenetic silencing by the HUSH complex mediates position-effect variegation in human cells. Science 348, 1481–1485 (2015). Medline doi:10.1126/science.aaa7227

Page 40: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

10. J. P. Kastenmayer, L. Ni, A. Chu, L. E. Kitchen, W. C. Au, H. Yang, C. D. Carter, D. Wheeler, R. W. Davis, J. D. Boeke, M. A. Snyder, M. A. Basrai, Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 16, 365–373 (2006). Medline doi:10.1101/gr.4355406

11. G. S. Cowley et al., Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Scientific Data 10.1038/sdata.2014.35 (2014).

12. A. C. Wilson, S. S. Carlson, T. J. White, Biochemical evolution. Annu. Rev. Biochem. 46, 573–639 (1977). Medline doi:10.1146/annurev.bi.46.070177.003041

13. Y. Ishihama, T. Schmidt, J. Rappsilber, M. Mann, F. U. Hartl, M. J. Kerner, D. Frishman, Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 9, 102 (2008). Medline doi:10.1186/1471-2164-9-102

14. H. Jeong, S. P. Mason, A. L. Barabási, Z. N. Oltvai, Lethality and centrality in protein networks. Nature 411, 41–42 (2001). Medline doi:10.1038/35075138

15. T. Hart, K. R. Brown, F. Sircoulomb, R. Rottapel, J. Moffat, Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Sys. Biol. 10.15252/msb.20145216 (2014).

16. Z. Gu, L. M. Steinmetz, X. Gu, C. Scharfe, R. W. Davis, W. H. Li, Role of duplicate genes in genetic robustness against null mutations. Nature 421, 63–66 (2003). Medline doi:10.1038/nature01198

17. H. Liang, W.-H. Li, Gene essentiality, gene duplicability and protein connectivity in human and mouse. Trends Genet. 23, 375–378 (2007). Medline doi:10.1016/j.tig.2007.04.005

18. B.-Y. Liao, J. Zhang, Mouse duplicate genes are as essential as singletons. Trends Genet. 23, 378–381 (2007). Medline doi:10.1016/j.tig.2007.05.006

19. T. Makino, K. Hokamp, A. McLysaght, The complex relationship of gene duplication and essentiality. Trends Genet. 25, 152–155 (2009). Medline doi:10.1016/j.tig.2009.03.001

20. A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, J. P. Mesirov, Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550 (2005). Medline doi:10.1073/pnas.0506580102

21. Y. Ahmad, F.-M. Boisvert, E. Lundberg, M. Uhlen, A. I. Lamond, Systematic analysis of protein pools, isoforms, and modifications affecting turnover and subcellular localization. Mol. Cell. Proteomics 11, 013680 (2012). 10.1074/mcp.M111.013680 Medline doi:10.1074/mcp.M111.013680

22. J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehár, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jané-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi Jr., M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill,

Page 41: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, L. A. Garraway, The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). Medline doi:10.1038/nature11003

23. A. G. Baltz, M. Munschauer, B. Schwanhäusser, A. Vasile, Y. Murakawa, M. Schueler, N. Youngs, D. Penfold-Brown, K. Drew, M. Milek, E. Wyler, R. Bonneau, M. Selbach, C. Dieterich, M. Landthaler, The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol. Cell 46, 674–690 (2012). Medline doi:10.1016/j.molcel.2012.05.021

24. T. Sekiguchi, H. Iida, J. Fukumura, T. Nishimoto, Human DDX3Y, the Y-encoded isoform of RNA helicase DDX3, rescues a hamster temperature-sensitive ET24 mutant cell line with a DDX3X mutation. Exp. Cell Res. 300, 213–222 (2004). Medline doi:10.1016/j.yexcr.2004.07.005

25. F. L. Muller, S. Colla, E. Aquilanti, V. E. Manzo, G. Genovese, J. Lee, D. Eisenson, R. Narurkar, P. Deng, L. Nezi, M. A. Lee, B. Hu, J. Hu, E. Sahin, D. Ong, E. Fletcher-Sananikone, D. Ho, L. Kwong, C. Brennan, Y. A. Wang, L. Chin, R. A. DePinho, Passenger deletions generate therapeutic vulnerabilities in cancer. Nature 488, 337–342 (2012). Medline doi:10.1038/nature11331

26. K. Ohneda, M. Yamamoto, Roles of hematopoietic transcription factors GATA-1 and GATA-2 in the development of red blood cell lineage. Acta Haematol. 108, 237–245 (2002). Medline doi:10.1159/000065660

27. L. C. Andersson, K. Nilsson, C. G. Gahmberg, K562—a human erythroleukemic cell line. Int. J. Cancer 23, 143–147 (1979). Medline doi:10.1002/ijc.2910230202

28. B. S. Andersson et al., KBM-7, a human myeloid leukemia cell line with double Philadelphia chromosomes lacking normal c-ABL and BCR transcripts. Leukemia 9, 2100–2108 (1995). Medline

29. S. Q. Wu, K. V. Voelkerding, L. Sabatini, X. R. Chen, J. Huang, L. F. Meisner, Extensive amplification of bcr/abl fusion genes clustered on three marker chromosomes in human leukemic cell line K-562. Leukemia 9, 858–862 (1995). Medline

30. A. Constantinou, K. Kiguchi, E. Huberman, Induction of differentiation and DNA strand breakage in human HL-60 and K-562 leukemia cells by genistein. Cancer Res. 50, 2618–2624 (1990). Medline

31. B. Scappini, S. Gatto, F. Onida, C. Ricci, V. Divoky, W. G. Wierda, M. Andreeff, L. Dong, K. Hayes, S. Verstovsek, H. M. Kantarjian, M. Beran, Changes associated with the development of resistance to imatinib (STI571) in two leukemia cell lines expressing p210 Bcr/Abl protein. Cancer 100, 1459–1471 (2004). Medline doi:10.1002/cncr.20131

32. S. Iida, P. H. Rao, P. Nallasivam, H. Hibshoosh, M. Butler, D. C. Louie, V. Dyomin, H. Ohno, R. S. Chaganti, R. Dalla-Favera, The t(9;14)(p13;q32) chromosomal translocation associated with lymphoplasmacytoid lymphoma involves the PAX-5 gene. Blood 88, 4110–4117 (1996). Medline

Page 42: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

33. H. Bouamar, S. Abbas, A. P. Lin, L. Wang, D. Jiang, K. N. Holder, M. C. Kinney, S. Hunicke-Smith, R. C. Aguiar, A capture-sequencing strategy identifies IRF8, EBF1, and APRIL as novel IGH fusion partners in B-cell lymphoma. Blood 122, 726–733 (2013). Medline doi:10.1182/blood-2013-04-495804

34. S. Galiègue-Zouitina, S. Quief, M. P. Hildebrand, C. Denis, G. Lecocq, M. Collyn-d’Hooghe, C. Bastard, M. Yuille, M. J. Dyer, J. P. Kerckaert, Fusion of the LAZ3/BCL6 and BOB1/OBF1 genes by t(3; 11) (q27; q23) chromosomal translocation. C. R. Acad. Sci. III 318, 1125–1131 (1995). Medline

35. B. Chapuy, M. R. McKeown, C. Y. Lin, S. Monti, M. G. Roemer, J. Qi, P. B. Rahl, H. H. Sun, K. T. Yeda, J. G. Doench, E. Reichert, A. L. Kung, S. J. Rodig, R. A. Young, M. A. Shipp, J. E. Bradner, Discovery and characterization of super-enhancer-associated dependencies in diffuse large B cell lymphoma. Cancer Cell 24, 777–790 (2013). Medline doi:10.1016/j.ccr.2013.11.003

36. R. D. Morin, M. Mendez-Lago, A. J. Mungall, R. Goya, K. L. Mungall, R. D. Corbett, N. A. Johnson, T. M. Severson, R. Chiu, M. Field, S. Jackman, M. Krzywinski, D. W. Scott, D. L. Trinh, J. Tamura-Wells, S. Li, M. R. Firme, S. Rogic, M. Griffith, S. Chan, O. Yakovenko, I. M. Meyer, E. Y. Zhao, D. Smailus, M. Moksa, S. Chittaranjan, L. Rimsza, A. Brooks-Wilson, J. J. Spinelli, S. Ben-Neriah, B. Meissner, B. Woolcock, M. Boyle, H. McDonald, A. Tam, Y. Zhao, A. Delaney, T. Zeng, K. Tse, Y. Butterfield, I. Birol, R. Holt, J. Schein, D. E. Horsman, R. Moore, S. J. Jones, J. M. Connors, M. Hirst, R. D. Gascoyne, M. A. Marra, Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476, 298–303 (2011). Medline doi:10.1038/nature10351

37. J. M. Engreitz, A. Pandya-Jones, P. McDonel, A. Shishkin, K. Sirokman, C. Surka, S. Kadri, J. Xing, A. Goren, E. S. Lander, K. Plath, M. Guttman, The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973 (2013). Medline doi:10.1126/science.1237973

38. D. Kim, G. Pertea, C. Trapnell, H. Pimentel, R. Kelley, S. L. Salzberg, TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013). Medline doi:10.1186/gb-2013-14-4-r36

39. C. Trapnell, B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, L. Pachter, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010). Medline doi:10.1038/nbt.1621

40. F. Cunningham, M. R. Amode, D. Barrell, K. Beal, K. Billis, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fitzgerald, L. Gil, C. G. Girón, L. Gordon, T. Hourlier, S. E. Hunt, S. H. Janacek, N. Johnson, T. Juettemann, A. K. Kähäri, S. Keenan, F. J. Martin, T. Maurel, W. McLaren, D. N. Murphy, R. Nag, B. Overduin, A. Parker, M. Patricio, E. Perry, M. Pignatelli, H. S. Riat, D. Sheppard, K. Taylor, A. Thormann, A. Vullo, S. P. Wilder, A. Zadissa, B. L. Aken, E. Birney, J. Harrow, R. Kinsella, M. Muffato, M. Ruffier, S. M. Searle, G. Spudich, S. J. Trevanion, A. Yates, D. R. Zerbino, P. Flicek, Ensembl 2015. Nucleic Acids Res. 43 (D1), D662–D669 (2015). Medline doi:10.1093/nar/gku1010

Page 43: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

41. L. Y. Geer, A. Marchler-Bauer, R. C. Geer, L. Han, J. He, S. He, C. Liu, W. Shi, S. H. Bryant, The NCBI BioSystems database. Nucleic Acids Res. 38 (Database), D492–D496 (2010). Medline doi:10.1093/nar/gkp858

42. P. Flicek, M. R. Amode, D. Barrell, K. Beal, K. Billis, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fitzgerald, L. Gil, C. G. Girón, L. Gordon, T. Hourlier, S. Hunt, N. Johnson, T. Juettemann, A. K. Kähäri, S. Keenan, E. Kulesha, F. J. Martin, T. Maurel, W. M. McLaren, D. N. Murphy, R. Nag, B. Overduin, M. Pignatelli, B. Pritchard, E. Pritchard, H. S. Riat, M. Ruffier, D. Sheppard, K. Taylor, A. Thormann, S. J. Trevanion, A. Vullo, S. P. Wilder, M. Wilson, A. Zadissa, B. L. Aken, E. Birney, F. Cunningham, J. Harrow, J. Herrero, T. J. Hubbard, R. Kinsella, M. Muffato, A. Parker, G. Spudich, A. Yates, D. R. Zerbino, S. M. Searle, Ensembl 2014. Nucleic Acids Res. 42 (D1), D749–D755 (2014). Medline doi:10.1093/nar/gkt1196

43. J. A. Tennessen, A. W. Bigham, T. D. O’Connor, W. Fu, E. E. Kenny, S. Gravel, S. McGee, R. Do, X. Liu, G. Jun, H. M. Kang, D. Jordan, S. M. Leal, S. Gabriel, M. J. Rieder, G. Abecasis, D. Altshuler, D. A. Nickerson, E. Boerwinkle, S. Sunyaev, C. D. Bustamante, M. J. Bamshad, J. M. AkeyBroad GOSeattle GONHLBI Exome Sequencing Project, Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012). Medline doi:10.1126/science.1219240

44. C. Stark, B. J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, M. Tyers, BioGRID: A general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006). Medline doi:10.1093/nar/gkj109

45. H. Li, A. Coghlan, J. Ruan, L. J. Coin, J. K. Hériché, L. Osmotherly, R. Li, T. Liu, Z. Zhang, L. Bolund, G. K. Wong, W. Zheng, P. Dehal, J. Wang, R. Durbin, TreeFam: A curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 34, D572–D580 (2006). Medline doi:10.1093/nar/gkj118

46. J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehár, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jané-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi Jr., M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill, W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, L. A. Garraway, The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). Medline doi:10.1038/nature11003

47. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009). Medline doi:10.1038/nprot.2008.211

48. P. Mertins, N. D. Udeshi, K. R. Clauser, D. R. Mani, J. Patel, S. E. Ong, J. D. Jaffe, S. A. Carr, iTRAQ labeling is superior to mTRAQ for quantitative global proteomics and phosphoproteomics. Mol. Cell. Proteomics 11, 014423 (2012). 10.1074/mcp.M111.014423 Medline doi:10.1074/mcp.M111.014423

Page 44: Identification and characterization of essential genes in ...sabatinilab.wi.mit.edu/pubs/2013/Wang-science.aac7041.pdf · the cell-essential genes of the human genome. The first approach

49. S. Schwartz, M. R. Mumbach, M. Jovanovic, T. Wang, K. Maciag, G. G. Bushkin, P. Mertins, D. Ter-Ovanesyan, N. Habib, D. Cacchiarelli, N. E. Sanjana, E. Freinkman, M. E. Pacold, R. Satija, T. S. Mikkelsen, N. Hacohen, F. Zhang, S. A. Carr, E. S. Lander, A. Regev, Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5′ sites. Cell Reports 8, 284–296 (2014). Medline doi:10.1016/j.celrep.2014.05.048

50. A. G. Uren, H. Mikkers, J. Kool, L. van der Weyden, A. H. Lund, C. H. Wilson, R. Rance, J. Jonkers, M. van Lohuizen, A. Berns, D. J. Adams, A high-throughput splinkerette-PCR method for the isolation and sequencing of retroviral insertion sites. Nat. Protoc. 4, 789–798 (2009). Medline doi:10.1038/nprot.2009.64

51. T. Brady, Y. N. Lee, K. Ronen, N. Malani, C. C. Berry, P. D. Bieniasz, F. D. Bushman, Integration target site selection by a resurrected human endogenous retrovirus. Genes Dev. 23, 633–642 (2009). Medline doi:10.1101/gad.1762309