

Comprehensive analysis of the chromatin landscape in Drosophila melanogaster

Peter V. Kharchenko1,2, Artyom A. Alekseyenko3,4, Yuri B. Schwartz5, Aki Minoda6, Nicole C. Riddle7, Jason

Ernst8,9, Peter J. Sabo10, Erica Larschan3,4,11, Andrey A. Gorchakov3,4, Tingting Gu7, Daniela Linder-Basso5,

Annette Plachetka3,4, Gregory Shanower5, Michael Y. Tolstorukov1,2, Lovelace J. Luquette1, Ruibin Xi1,

Youngsook L. Jung1,3, Richard Park1,12, Eric P. Bishop1,12, Theresa P. Canfield10, Richard Sandstrom10, Robert

E. Thurman10, David M. MacAlpine13, John Stamatoyannopoulos10,14, Manolis Kellis8,9, Sarah C. R. Elgin7,

Mitzi I. Kuroda3,4, Vincenzo Pirrotta5, Gary Karpen6*, Peter J. Park1,2,3*

* co-corresponding authors

1 Center for Biomedical Informatics, Harvard Medical School, and Informatics Program, Children's Hospital, Boston, MA, USA
2 Children’s Hospital Informatics Program, Boston, MA, USA
3 Division of Genetics, Department of Medicine, Brigham & Women’s Hospital, Boston, MA, USA
4 Department of Genetics, Harvard Medical School, Boston, MA, USA
5 Department of Molecular Biology & Biochemistry, Rutgers University, Piscataway, NJ, USA
6 Department of Molecular and Cell Biology, University of California at Berkeley, and Department of Genome Dynamics, Lawrence Berkeley National Lab, Berkeley, CA, USA
7 Department of Biology, Washington University in St. Louis, St. Louis, MO, USA
8 MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
9 Broad Institute of MIT and Harvard, Cambridge, MA, USA
10 Department of Genome Sciences, University of Washington, Seattle, WA, USA
11 Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, USA
12 Graduate Program in Bioinformatics, Boston University, Boston, MA, USA
13 Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC, USA
14 Department of Medicine, University of Washington, Seattle, WA, USA


Summary

Chromatin, the composite of DNA and its associated proteins, occurs in various conformations in living cells

and is essential for cell differentiation, gene regulation and other key cellular processes. We present a genome-

wide map of the chromatin landscape for Drosophila melanogaster, based on the distributions of 18 histone

modifications and 9 combinatorial patterns identified by computational analysis. Integrative analysis with other

genome-wide mapping data (non-histone chromatin proteins, DNaseI hypersensitive sites, GRO-seq engaged

polymerase, short/long RNA products) reveals distinct properties of chromosomes, genes, regulatory elements,

and other functional domains. This analysis identifies distinct chromatin signatures among active genes that are

correlated with differences in gene length, exonic structure, regulatory function, and genomic context. It also

reveals a diversity of chromatin signatures among Polycomb targets, including a subset with paused RNA

polymerase. This systematic profiling and integrative analysis of chromatin signatures provides important

insights into the differential packaging of functional elements, and will serve as a valuable resource for future

experimental investigations of genome structure and function.


The NIH model organism Encyclopedia of DNA Elements (modENCODE) project is focused on using the D.

melanogaster and C. elegans model systems to build a foundation for understanding genome function by

providing the community with a comprehensive map of the distributions of chromatin components, transcription

factors, transcripts, small RNAs, and origins of replication1. Drosophila has been used as a model system for

over a century to study mechanisms of chromosome structure and function, gene regulation, development, and

evolution. The availability of nearly complete, high quality euchromatic and heterochromatic sequence

assemblies for D. melanogaster2 and other Drosophila species3, extensive community annotation of genes and

other functional elements4, and a vast repertoire of experimental manipulations for elucidating genome

functions highlight the value of performing comprehensive epigenomic studies in Drosophila.

The packaging of DNA into chromatin occurs in a variety of conformations that impact the transfer of

information from genome sequence to genic, chromosomal, cellular and organismal functions. Thus, genome-

wide profiling of chromatin components (post-translational histone modifications and non-histone proteins)

provides an information-rich annotation for the potential functions of the underlying DNA sequences. Analyses

of these chromatin landscapes have identified simple patterns associated with specific DNA elements, such as

promoters, enhancers, and transcription start and end sites, as well as the active or silent status of genes and

large domains5. Here, we build upon this work by painting a comprehensive picture of the chromatin landscape

in a model eukaryotic genome. We define combinatorial chromatin ‘states’ at different levels of organization,

from individual regulatory units to the chromosome level, and provide evidence for the association of individual

states with regulation of genome functions.

Combinatorial chromatin states

We performed chromatin immunoprecipitation (ChIP)-array analysis on numerous histone modifications and

chromosomal proteins (Table 1), using antibodies tested for cross-reactivity with non-target proteins or

modifications6 (Methods and Supp. Figure S1). Here, we describe results and analyses for the two cell lines


with the most comprehensive data sets, S2-DRSC (S2) and ML-DmBG3-c2 (BG3), derived from late male

embryonic tissues (stages 16-17) and the central nervous system of male third instar larvae, respectively (data

from other cell lines and animal stages are available from http://www.modencode.org). Genome-wide analysis

reveals groups of correlated factors, including those associated with heterochromatic regions7, Polycomb-

mediated repression8, and active transcription9 (Supp. Figure S2), similar to those observed in other

organisms10. This clustering suggests that specific histone modifications work together to achieve a distinct

chromatin “state”.

We utilized a machine-learning approach to identify the prevalent combinatorial patterns of 18 histone

modifications across the genome. Our model captures the overall complexity of chromatin patterns observed in

S2 and BG3 cell lines with nine combinatorial states (Figure 1a, Methods). The model associates each genomic

location with a particular state, generating a chromatin-centric annotation of the genome (Figure 1b). We

examined each state for enrichments in non-histone proteins (e.g. RNA Pol II, HP1a; Figure 1a, Supp. Figure

S3), gene elements (e.g. TSS, gene body), genome coverage, and for distributions across the karyotype (Figure

1b, Supp. Figure S4) and finer-scale levels (examples in Figure 1c-e).
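
The idea of assigning every genomic location to one of a small number of combinatorial states can be illustrated with a toy example. The algorithm is not specified in this section (see Methods), so the sketch below swaps in plain k-means on per-bin log2 enrichment vectors as a stand-in; unlike a model that accounts for neighbouring bins, it treats every bin independently. The function names, the three marks chosen, and all enrichment values are hypothetical.

```python
import math

def assign(profile, centroids):
    """Return the index of the centroid closest to one bin's enrichment vector."""
    dists = [math.dist(profile, c) for c in centroids]
    return dists.index(min(dists))

def kmeans(profiles, centroids, n_iter=20):
    """Plain k-means: alternate bin assignment and centroid update."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        labels = [assign(p, centroids) for p in profiles]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the old centroid if a state has no bins
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy log2 enrichment per genomic bin; columns: H3K4me3, H3K36me3, H3K27me3.
# All values are made up for illustration, not taken from the data.
bins = [
    (2.1, 0.2, -0.8),   # TSS-like pattern
    (1.8, 0.4, -1.0),
    (0.1, 1.9, -0.7),   # elongation-like pattern
    (0.3, 2.2, -0.9),
    (-0.6, -0.5, 2.0),  # Polycomb-like pattern
    (-0.4, -0.7, 1.7),
]
initial = [bins[0], bins[2], bins[4]]  # one seed per expected pattern (assumption)
labels, centroids = kmeans(bins, initial)
print(labels)
```

The resulting label per bin plays the role of the chromatin-centric annotation; the centroid of each label is the analogue of one row of enrichments and depletions in a state table.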

The majority of the chromatin patterns in the 9 state model are associated with transcriptionally active genes.

The promoter and transcription start site (TSS)-proximal regions are identified by state 1 (Figure 1; red),

marked by prominent enrichment in H3K4me3/me2 and H3K9ac. The transcriptional elongation signature

associated with H3K36me3 enrichment11 is captured by state 2 (purple), which preferentially occurs over

exonic regions of transcribed genes. State 3 is typically found within intronic regions, and is distinguished by

high enrichment in H3K27ac, H3K4me1, and H3K18ac (brown). Exhibiting relatively low levels of H3K4me3,

state 3 resembles the chromatin signatures of mammalian enhancers12. A related chromatin signature is captured

by state 4 (coral), distinguished by enrichment of H3K36me1, accompanied by H3K18ac or H3K4me1, but

lacking H3K27ac. State 4 is biased towards intronic regions in expressed genes; however, this signature is also

found in intergenic regions, and overall shows only a slight preference for genic sequence (Figure 1a). The


number of genes associated with different chromatin states, as well as the spatial placement of different states

within genes is shown in Supp. Figure S5.

Several aspects of large-scale genome organization are revealed by the karyotype view of chromatin annotation

(Figure 1b). Chromosome X (from male cells) is strikingly enriched for state 5 (green), which has high levels of

H4K16ac and some enrichment in H3K36me3 and other marks associated with “elongation” state 2. The

pericentromeric heterochromatin domains and chromosome 4 are characterized by high levels of

H3K9me2/me3 7, as well as depletion of H3K23ac and other marks associated with activation (state 7, dark

blue). Interestingly, the border between euchromatin and heterochromatin defined by these states varies among

different cell lines and fly tissues13. Finally, the model also distinguishes another set of heterochromatin-like

regions, which contain moderate levels of H3K9me2 and me3 (state 8, light blue, Figure 1e). This state

occupies extensive domains positioned predominantly within the euchromatic arms on the autosomes in BG3

cells, and in chromosome X in both cell lines13.

Further aspects of chromatin organization are visualized by folding the chromosome into a square using a

Hilbert curve pattern (Figure 2a)14, which maintains the spatial proximity of nearby elements. Thus, local

patches of corresponding colors reveal the sizes and relative positions of domains associated with particular

chromatin states (Figure 2b,c; Supp. Figures S6,S7; interactive online browser15). For instance, specks of TSS-

proximal regions (state 1, red) are typically positioned within larger blocks of transcriptional elongation marks

(state 2, purple), which are in turn encompassed by extensive patches of H3K36me1-enriched domains (state 4,

coral) and variable-sized blocks of enhancer-like signatures (state 3, brown). The clusters of open chromatin

formed by these gene-centric patterns are separated by extensive transcriptionally-silent domains (state 9, light

gray; state 8, light-blue) and regions of Polycomb-mediated repression (state 6, dark gray). Finally, the presence

of red specks (state 1 – active TSS regions) within the extensive dark blue regions (state 7) illustrates sparse

active genes within pericentric heterochromatin. The factors that may delimit regions associated with domain

boundaries are of interest, but were not identified by our analysis (Supp. Figure S8). While some aspects of


large-scale organization are apparent from the present chromatin annotation (Supp. Figures S9,S10), we have

developed a method to characterize chromatin organization at multiple spatial scales, depending on the genome

properties being investigated. For example, we observe that chromatin patterns most accurately reflect the

replication timing of the S2 genome at scales of ~170kb (Supp. Section 1), consistent with earlier size estimates

of chromatin domains influencing replication timing16 and suggesting that multiple replication origins are

coordinately regulated by the local chromatin environment (each replicon ~28-50kb17).
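
The Hilbert-curve folding described above can be sketched in a few lines. The code below uses the standard iterative index-to-coordinate conversion for a Hilbert curve and places a linear track of per-bin state labels onto a square grid. The function names, the grid orientation, the bin count and the toy state track are assumptions for illustration; the published figures may use a different orientation or bin size.

```python
def d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:  # reflect the sub-square before rotating it
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate by swapping axes
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def fold(states, order):
    """Place a linear track of state labels onto the folded square grid."""
    n = 1 << order
    if len(states) > n * n:
        raise ValueError("track is longer than the grid can hold")
    grid = [[None] * n for _ in range(n)]
    for d, state in enumerate(states):
        x, y = d2xy(order, d)
        grid[y][x] = state
    return grid

# Toy track of 16 consecutive bins labelled with state numbers (made up).
track = [9, 9, 4, 4, 2, 1, 2, 4, 4, 3, 3, 4, 9, 9, 6, 6]
for row in fold(track, order=2):
    print(row)
```

Because consecutive positions along the curve always land in adjacent grid cells, a contiguous domain on the chromosome appears as a connected patch. The converse does not hold: two cells that touch in the grid can be far apart on the chromosome.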

To examine combinatorial patterns not captured by the simplified 9 state model, we have also generated and

annotated a 30-state combinatorial model that utilizes presence/absence probabilities of individual marks18

(Supp. Figure S11). The increased number of states identifies finer variations that are potentially biologically

significant. For instance, the 30-state model identifies an additional chromatin signature corresponding to

transcriptional elongation in heterochromatic regions13. A comparison of the two models is provided in Supp.

Figure S11.

Chromatin signature reveals a class of genes enriched for regulatory function

Active genes generally display enrichments or depletions of individual marks at specific gene regions (Figure

3a). To more thoroughly examine the relationship between chromatin marks, gene organization, and gene

activity, we clustered genes based on the spatial pattern of chromatin signatures at promoters and gene bodies

(Supp. Figure G1). We observe distinct subclasses of active gene signatures that correlate with expression

magnitude (also see Supp. Section 2), gene structure, and genomic context (e.g. heterochromatic genes

combining H3K9me2/me3 with some active marks). One class of expressed genes (cluster 2, Supp. Figure G1;

131 genes in S2, 202 in BG3) was of particular interest because it is enriched for H3K36me1 and for

longer-than-average genes with developmental and other regulatory functions (Supp. Table G1).


To further examine the spatial enrichment patterns associated with larger transcribed genes, we clustered long

(≥4kb) expressed autosomal genes based on contiguous blocks of enrichment for each chromatin mark (first

panel, Figure 3b; 1055 genes). This analysis revealed that genes with large 5’-end introns (green subtree, Figure

3b; 552 genes) show extensive H3K27ac and H3K18ac enrichment, broader H3K9ac domains, and blocks of

H3K36me1 enrichment, corresponding to chromatin state 3 (from the 9 state model) within large intronic

regions (Figure 3b, last column, dark brown color). In contrast, genes with more uniformly distributed coding

regions (red subtree, Figure 3b) lack most state 3 marks, and H3K9ac enrichment is restricted to the 2kb

downstream of the TSS (state 1; Figure 3b, last column, red color). These differences are not explained by

variation in histone density (Supp. Figure G2). Overall, the presence or absence of state 3 is the most common

difference in the chromatin composition of expressed genes of length 1kb and longer (Supp. Figure G3), and the

acquisition of state 3 consistently correlates with a reduced fraction of coding sequence in the gene body,

mainly associated with the presence of a long first intron.
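
The two structural quantities referred to above, the fraction of the gene body that is exonic and the length of the first intron, can be computed directly from exon coordinates. The sketch below is a minimal illustration under stated assumptions: exons are given as half-open (start, end) intervals on the plus strand, ordered 5' to 3', and the exonic fraction is used as a simple proxy for coding fraction (UTRs are not separated out). The function name and both toy genes are hypothetical.

```python
def gene_structure(exons):
    """Summarize a gene model given as sorted, half-open (start, end) exon intervals."""
    gene_len = exons[-1][1] - exons[0][0]
    exonic = sum(end - start for start, end in exons)
    # The first intron lies between the first and second exon, if there is one.
    first_intron = exons[1][0] - exons[0][1] if len(exons) > 1 else 0
    return {
        "length": gene_len,
        "exonic_fraction": exonic / gene_len,
        "first_intron": first_intron,
    }

# Two made-up gene models: one compact, one with a long 5'-end intron.
compact = gene_structure([(0, 1500), (1600, 3000), (3100, 4500)])
long_intron = gene_structure([(0, 300), (12300, 13000), (13200, 14000)])
print(compact)
print(long_intron)
```

In this toy case the compact gene is about 96% exonic with a 100 bp first intron, whereas the second gene is about 13% exonic with a 12 kb first intron, the kind of contrast that the presence or absence of state 3 tracks in the text.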

The set of large-intron genes (green subtree, Figure 3b) is enriched for developmental and regulatory functions

(Supp. Table G2); these genes are often controlled by complex regulatory regions and multiple promoters.

Indeed, we observe that the state 3 domains that characterize the introns of such genes are enriched for binding

of chromosomal proteins and other factors with known roles in gene regulation. These include Nipped-B 19

(Figure 3b), a cohesin-complex loading protein previously associated with transcriptionally active regions19,20,

and the histone methyltransferase ASH1 (Supp. Figure G4), which is thought to act primarily at H3K4 but has also been reported to

methylate H3K9, H3K36 and H4K20 21.

State 3 domains are highly enriched for specific chromatin remodeling factors (SPT16 and dMi-2; Supp. Figure

G5, G6), whereas state 1 regions around active TSSs are preferentially bound by NURF301 and MRG15. ISWI

is enriched in both states 1 and 3 (Supp. Figure G6, G7). State 3 also exhibits the highest level of nucleosome

turnover22, and shows notably higher enrichment of the H3.3 histone variant23 than either the TSS- or

elongation-associated states 1 and 2 (Supp. Figure G5,G6). Consistent with earlier analysis of cohesin-bound


regions24, state 3 sequences tend to replicate early in S phase, and show abundance of early replicating origins

(Supp. Figure G8). A likely regulatory role for state 3 is further supported by enrichment for a known enhancer

binding protein (dCBP/p300 25) in these regions in adult flies, as is the case for enhancers validated in transgene

constructs26 (Supp. Figure G9).

Multiple modes of regulation in Polycomb domains

In Drosophila, loci repressed by Polycomb group (PcG) proteins are embedded in broad H3K27me3 domains

that are regulated by Polycomb Response Elements (PREs) bound by E(Z), PSC, and dRING (Figure 1d)27,28.

We find that regions of H3K4me1 enrichment surround all PREs, 90% of which also display narrower peaks of

H3K4me2 enrichment (Supp. Figure P1). While this pattern is reminiscent of transcriptionally active promoter

regions, PREs lack H3K4me3 (Supp. Figure P1), suggesting that a different mechanism of H3K4 methylation is

employed, perhaps involving dependence on the Trithorax (TRX) H3K4 HMTase found at all PREs28.

To gain insight into the distinct chromatin states associated with PcG target genes that range from fully

repressed to fully active28, we analyzed the chromatin and transcriptional signatures of TSSs in Polycomb-

bound domains (Figure 4a, Supp. Figure P2). In addition to fully repressed TSSs (cluster 1, Figure 4a), we

identify TSSs within domains that correspond to the “balanced” state28 (cluster 2, Figure 4a), distinguished by

coexistence of Polycomb with active marks (including ASH1) and production of low level full-length mRNA

transcripts (e.g. Su(z)2 domain, Figure 1d). TSSs in clusters 3 and 4 are distinguished by presence of adjacent

PREs (Figure 4a), and are associated with the H3K4me1/2 marks (see above).

Surprisingly, we find that approximately 53% of the PRE-proximal TSSs bind RNA Pol II and produce short

RNA transcripts (cluster 3, Figure 4a,c), a pattern recently linked to stalling of engaged RNA Pol II 29. Using

the global run-on sequencing (GRO-seq) assay to accurately assess locations, abundance and orientation of

engaged RNA polymerases30, we observe that cluster 3 TSSs produce short transcripts of 400-600nt in the sense

orientation only. The level of RNA Pol II at these TSSs is low compared to most transcriptionally active genes


(compare to cluster 2), but the level of GRO+ short RNAs is similar to that found at active genes (Figure 4d);

thus, the majority of transcription starts in cluster 3 fail to elongate. Genes without a TSS-proximal PRE

generally lack short transcript signatures (e.g. clusters 1 and 5 in Figure 4a; see Supp. Figure P2 for exceptions).

Interestingly, genes with TSS-proximal PREs (clusters 3,4) are enriched for regulatory and developmental

functions, even more than other genes within Polycomb domains (see Supp. Tables P1, P2). Importantly,

engaged polymerases and transcripts are not a general feature of PREs; TSS-distal PREs typically lack short

RNA and GRO-seq signals (Figure 4b, e), although they are also enriched in H3K4me1/me2. The striking link

between TSS-proximal PREs and the production of short RNAs suggests a potential mechanism for control of

developmental regulatory genes in PC domains, whereby the same features that recruit H3K4 methyl marks at

PREs may also facilitate RNA Pol II recruitment to TSSs.

The paused genes identified here resemble the “bivalent” genes in mammalian cells, which are similarly linked

to transcriptional pausing of key regulatory and developmental genes31. However, the mammalian “bivalent

state” is characterized by the simultaneous presence of PcG proteins, H3K27me3 and H3K4me3. This

combination is absent from the paused promoters described here (cluster 3, Figure 4a). Drosophila loci enriched

for both H3K27me3 and H3K4me3 are rare and found only within the ASH1-associated “balanced” state, which

is associated with low level, productive elongation28,32 (cluster 2). Thus, paused regulatory genes within

Drosophila Polycomb domains display a different chromatin signature from mammalian cells, which may be

linked to the presence of TSS-proximal PREs.

DNaseI hypersensitivity identifies putative enhancers exhibiting bi-directional transcription

We utilized a DNase I hypersensitivity assay33 to examine the distributions of putative regulatory regions and

their relationships with chromatin states. DHS mapping broadly identifies sites with low nucleosome density

and/or regions bound by non-histone proteins34. Short-read sequencing35 identified 8616 high-magnitude DNase

I hypersensitive sites (DHSs) in S2 cells and 6354 in BG3 cells (and a comparable number of low-magnitude


DHSs, Supp. Figure Y1; see Methods). Of the high magnitude DHSs, 43% are positioned within 250bp of

TSSs, and 64% within 2kb (Figure 5b). 98% of these TSSs are transcriptionally active (Supp. Figure Y2),

consistent with earlier observations that the promoters of expressed genes exhibit nucleosome-free regions23.

Thus, the chromatin context of the TSS-proximal DHSs is dominated by the features expected of an active TSS,

including bound RNA polymerase II, and enrichments for H3K4me3, H2B-ubiquitination and other state 1

marks extending in the direction of transcription (Figure 5a, Supp. Figure Y3).
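
The TSS-proximal versus TSS-distal split used here amounts to a nearest-TSS distance lookup. A minimal sketch follows, assuming each DHS is reduced to a single coordinate (for example its centre), that all positions lie on one chromosome, and that strand is ignored. The function names and the toy coordinates are hypothetical; only the 250bp and 2kb thresholds come from the text.

```python
from bisect import bisect_left

def nearest_tss_distance(pos, tss_sorted):
    """Distance from pos to the closest TSS in a sorted, non-empty list."""
    i = bisect_left(tss_sorted, pos)
    # Only the TSS just before and just after pos can be the nearest one.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    return min(abs(pos - t) for t in candidates)

def classify(pos, tss_sorted, proximal=250, window=2000):
    """Assign a DHS position to one of three disjoint distance classes."""
    d = nearest_tss_distance(pos, tss_sorted)
    if d <= proximal:
        return "within 250bp"
    if d <= window:
        return "within 2kb"
    return "TSS-distal"

# Made-up TSS and DHS coordinates on a single chromosome.
tss = sorted([1000, 5000, 20000])
dhs = [1100, 6500, 12000, 19900]
for pos in dhs:
    print(pos, classify(pos, tss))
```

Note that the classes returned here are disjoint, whereas the percentages quoted in the text are cumulative: the fraction within 2kb includes the sites within 250bp.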

Of the 36% TSS-distal DHSs (>2kb away), most (60%) are positioned within annotated expressed genes (5’

UTRs and intronic regions, Supp. Figure Y2). These gene-body DHSs are distinguished from TSS-proximal

DHSs by the paucity of H3K4me3, higher levels of both H3K4me1 and H3K36me1 (clusters 4, 5, Figure 5a,

Supp. Figure Y4), and enrichments for H3K18ac, H3K27ac and other marks linked to chromatin state 3 (Figure

1a). An additional 20% of the TSS-distal DHSs are positioned outside of annotated genes, but show signatures

associated with active transcription start sites or elongating gene bodies, suggesting the presence of new

alternative promoters or unannotated genes (Supp. Figure Y5, Y6). The remaining 20% of TSS-distal DHSs

appear intergenic (6% of all DHSs); these sites are typically enriched for H3K4me1, but lack other active marks

(cluster 6, Figure 5a).

Most DHS positions (82%) fall into the TSS-proximal state 1 or the intron-biased state 3 (Figure 5c). State 3

lacks H3K4me3 and is enriched for H3K4me1/H3K27ac/H3K18ac, similar to mammalian enhancer

elements12,36. Further, we find that many state 3 DHS positions bind regulatory proteins: GAGA factor binds to

49% of the DHS sites in S2 cells, and developmental transcription factors bind to 44% of the DHS sites in

embryos37. TSS-distal DHSs in Drosophila exhibit low-level bi-directional transcripts (Figure 5a, shortRNA

panels), analogous to the enhancer RNAs (eRNAs) characterized in mice38. Strand-specific GRO-seq profiles

confirm the presence of antisense transcripts at intragenic DHS sites (Figure 5f, Supp. Figures Y7, Y8), whose

levels are an order of magnitude lower than sense strand transcripts from these expressed genes, as observed for

murine enhancers38. A similar magnitude of bi-directional transcription is also observed at intergenic DHSs


(Figure 5f, Supp. Figure Y4). Together, these results demonstrate that eRNA-like transcripts are a common

feature of TSS-distal DHSs in Drosophila, a feature that is conserved with mammals.

The association of DHSs with chromatin states 1 and 3 (Figure 5d) is maintained across the genome, even in

chromosome 4 and pericentromeric heterochromatin, where they are infrequently found (Supp. Figure Y9). This

suggests that these chromatin states and associated remodeling factors (e.g. ISWI, SPT16) provide the context

necessary for non-histone chromosomal protein binding at DHSs, or are the consequence of such binding

events. To investigate the relationship between chromatin states and the presence of DHSs, we analyzed a high-

confidence set of loci that exhibit a DHS in only one of the two examined cell lines (Supp. Figure Y10).

Surprisingly, although most DHSs are positioned in state 1 regions (Figure 5c), 91% of the cell type-specific

DHSs are found within state 3 domains (14-fold increase compared to state 1 DHSs; Supp. Table Y1, Figure

5e). Comparison with DHSs in an additional cell type (Kc167, Supp. Figure Y11) confirms that state 3 is highly

enriched for DHSs that display plasticity between cell types. In the absence of DHSs, the altered loci maintain

chromatin state 3 in 23% of the cases, indicating that the presence of state 3 is not always dependent on the

DHS. More frequently, the altered loci transition to an open chromatin state 4 (43% of the cases, Figure 5e) that

lacks many histone modifications and chromatin remodelers (dMi-2, SPT16, Supp. Figure G5,G6) characteristic

of the enhancer-like state 3. Less common are transitions to the Polycomb state 6 (7%) or background state 9

(17%) that typically coincide with gene silencing. Most of the genes associated with loci that maintain state 3 or

transition to open chromatin state 4 remain transcriptionally active (Supp. Figure Y12). These observations,

combined with previously described findings, support an enhancer-like function for state 3 DHSs, but also

suggest a more subtle regulatory role than simple linkage to the presence or absence of gene expression.
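
The state transitions reported for cell type-specific DHSs are, in essence, a tally over paired annotations of the same locus in two cell lines. The sketch below shows one way such a tally could be computed; the function name and the toy pairs are hypothetical and do not reproduce the data.

```python
from collections import Counter

def transition_fractions(pairs, from_state=3):
    """Fraction of loci in from_state (DHS present) ending in each state (DHS absent).

    pairs: iterable of (state where the DHS is present,
                        state at the same locus in the cell line lacking the DHS).
    """
    counts = Counter(after for before, after in pairs if before == from_state)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Made-up paired annotations for five loci; four carry state 3 where the DHS is present.
pairs = [(3, 4), (3, 4), (3, 3), (3, 9), (1, 1)]
fractions = transition_fractions(pairs)
print(fractions)
```

A locus that keeps the same state in both cell lines appears as a self-transition (here 3 to 3), which corresponds to the cases in which state 3 is maintained without the DHS.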

Chromatin-based elucidation of genome organization and gene function

The genomic chromatin state annotation and the discovery of refined chromatin signatures for chromosomes,

domains and subsets of regulatory genes demonstrate the utility of a systematic, genome-wide profiling of an

organism that is already understood in considerable detail. Clearly, the definition and functional annotation of


chromatin patterns will be enhanced by incorporation of data for different types of components. Five ‘colors’ of

chromatin were recently identified in Kc167 cells using chromosomal protein maps39. Comparison with the 9

state model indicates correlation of some features, as well as considerable differences between the resulting

states and classes of functional elements (Supp. Figure Y13), suggesting that further integration of such data in

the same cell type may allow additional functional features to be resolved. Our results illustrate the utility of

integrating multiple data types (histone marks, non-histone proteins, chromatin accessibility, short RNAs, and

transcriptional activity) for comprehensive characterization of functional chromatin states.

An important, repeated theme is that chromatin state analysis identifies unexpected distinctions between subsets

of genes. One key finding is the identification of classes of active genes with distinct chromatin patterns.

Besides the differences linked to genomic context (e.g., male X chromosome, heterochromatin), the main

source of variability is the presence of the acetylation-rich state 3 (Figure 6). State 3 is enriched in long intronic

regions, distinguishing an important class of genes with extensive 5’-end introns that frequently encode proteins

with regulatory and developmental functions. Several lines of evidence suggest that the specific positions within

the intronic regions marked by state 3 are important for gene regulation. State 3 regions show specific

associations with known chromatin remodelers (SPT16, dMi-2 and ISWI), gene regulatory proteins (e.g. GAF,

dCBP/p300), as well as higher rates of nucleosome turnover than any other chromatin state, including the active

TSS state 1. Furthermore, state 3 regions show the highest rates of transcription-dependent deposition of the

H3.3 variant, well above that observed in the elongation-associated state 2. State 3 genes are also bound by

cohesin complex proteins, which may preferentially target remodeled chromatin fibers that lack higher-order

packaging19, and promote looping interactions with promoter regions20.

The high density of DHSs within these regions also underscores the likely regulatory role of state 3 chromatin.

Strikingly, state 3 regions contain a number and density of DHSs comparable to those of the active TSS state 1, but

appear to account for most of the DHS plasticity observed between the S2, BG3, and other cell types. The

combinations of histone marks distinguishing state 3 from state 1 DHS positions (H3K4me1/me3 disparity,


presence of H3K27ac, H3K18ac) are signatures of mammalian enhancers12, which also show high variability

between cell types36. Furthermore, many of the state 3 DHSs bind regulatory proteins, and exhibit low levels of

short, non-coding bidirectional transcripts reminiscent of eRNAs identified in mice38. The majority of the

TSS-distal DHSs are found within the bodies of transcribed genes, most likely a result of the compactness

of the Drosophila genome; hence the chromatin profiles surrounding these sites are not as isolated as in

mammals. Together, these findings suggest that state 3 regions most likely mark sites containing enhancers or

other regulatory elements.

The presence of gene subclasses with different chromatin signatures is not limited to transcriptionally active

genes. Genes within Polycomb domains also display several distinct functional states. A small subset of PcG

target genes is maintained in a ‘balanced state’ marked by binding of ASH1 and other proteins, and full-length

expression at low-to-moderate levels28. Other genes within PC domains do not produce full-length transcripts;

however, we demonstrate chromatin signatures that distinguish a class of genes with PREs immediately

adjacent to TSS positions, a substantial fraction of which (53%) are maintained in a transcriptionally paused

state. The repertoire of chromatin signatures observed at TSSs of PcG target genes (Figure 4a) may represent

stages in a progression from fully repressed to fully activated. Alternatively, distinct signatures might mark

subsets of regulatory genes that require either long-term repression or the ability to reverse functional states,

depending on environmental or developmental cues.

In summary, we note that comprehensive analysis of chromatin signatures has enormous potential for

annotating functional elements in both well-studied and new genomes, such as promoters, enhancers, gene

bodies and large domains. Going forward, our systematic characterization of the epigenomic and transcriptional

properties of Drosophila cells should spur in-depth experimental analyses of the relationship between chromatin

states and genome functions, ranging from whole chromosomes down to individual regulatory elements and

circuits.


Figure Legends

Figure 1. Chromatin annotation of the Drosophila melanogaster genome.

a. A 9 state model of prevalent chromatin states found in S2 cells. Each chromatin state (row) is defined by a

combinatorial pattern of enrichment (red) or depletion (blue) in specific chromatin marks (first panel, columns).

For instance, state 1 is distinguished by enrichment in H3K4me2/me3 and H3K9ac, which is typical of

transcription start sites (TSS) in expressed genes. The enrichments/depletions are shown using a log2 scale,

normalized relative to chromatin input (see Supp. Figure S3 for BG3 data and for normalization relative to

histone density). The second panel shows average enrichment of chromosomal proteins for each state. The third

panel shows fold over/under-representation of genic and TSS-proximal (±1kb) regions relative to the entire tiled

genome. The fold enrichment of intronic regions is relative to genic regions associated with each state.

b. A genome-wide karyotype view of the domains defined by the 9 state model in S2 cells. The centromeres are

shown as open circles, and the dashed lines span gaps in the genome assembly. This illustrates several

prominent chromatin organization features (color code in a), including the extent of pericentromeric

heterochromatin (dark blue), and the unique H4K16ac-driven signature of the dosage-compensated male X

chromosome (green). See Supp. Figure S4 for the BG3 genome.

c-e. Examples of chromatin annotation at specific loci. c. illustrates two distinct chromatin signatures of

transcriptionally active genes: one (left) is associated with extensive enrichment in acetylation and

monomethylation marks of states 3 and 4, while the smaller gene (right) is limited to only states 1 and 2 that

recapitulate well-established TSS and elongation signatures (note: small patches of state 7 in CG13185 illustrate

increased H3K9me2 seen at some expressed genes in S2 13). d. illustrates a locus containing two Polycomb-

associated domains (dashed boxes). The domain on the left contains the transcriptionally repressed vg gene,

which also shows some H3K4me1 enrichment centered on PSC binding sites (PRE). The domain on the right

contains the Su(z)2 gene and illustrates a rare example of a balanced state in which low-level activation of PC-

bound genes by ASH1 and Trithorax proteins leads to the presence of chromatin state 3 signatures.

e. (BG3 cells) An example of a large state 8 domain located within euchromatic sequence, which is enriched for

chromatin marks typically associated with heterochromatic regions, but at lower levels than in pericentromeric

heterochromatin (state 7).

Figure 2. Visualization using compact folding illustrates the spatial organization of chromatin

annotations.

a. To illustrate the scales and relative locations of domains associated with different chromatin states, the

chromosome is folded using a geometric pattern (Hilbert space-filling curve) that maintains spatial proximity of

nearby regions. The schematic illustration of the first four folding steps is shown, with blue dots designating 5’

and 3’ ends of the sequence. While this compact curve is optimal for preserving proximity relationships, some

distal sites can appear next to each other along the fold axis (green dots).
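As an illustration of the folding described in a., the sketch below maps a position along a chromosome to a pixel on a Hilbert curve using the standard iterative index-to-coordinate construction. The function names, the grid size and the chromosome length are hypothetical choices made for this example and are not taken from the browser implementation15.

```python
def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current quadrant so consecutive indices stay adjacent.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def genomic_position_to_pixel(position_bp, chrom_length_bp, side):
    """Assumed mapping: scale a base-pair position to a curve index, then fold."""
    d = position_bp * side * side // chrom_length_bp
    return hilbert_d2xy(side, d)
```

Consecutive indices always land on neighboring pixels, which is why a contiguous domain appears as a compact patch; the converse does not hold, so neighboring pixels can still correspond to distal sites.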

b. Chromosome 3L in S2 cells. A domain of a given chromatin state appears as a patch of uniform color of

corresponding size. Note that some of the patches may combine distal chromosomal locations together and

would not represent coherent functional units; thin black lines (finer scale not shown) are used to separate

regions that are distant on the chromosome. Prominent features, such as the large pericentromeric domain

(lower right, dark blue) or blocks of Polycomb-mediated repression (dark gray), are easily distinguished. The

folded view also illustrates chromatin organization features that cannot be easily discerned from a linear view.

For instance, transcription start sites of expressed genes (state 1, red) appear as small specks typically

surrounded by elongation linked state 2 (purple), commonly embedded within larger regions marked by

H3K36me1-driven state 4 (light brown), which also contains patches of intron-associated state 3 (dark brown).

Such open chromatin regions appear as clustered blocks on the chromosome, separated by extensive domains of

state 9 (light-gray), which is largely devoid of transcription. The folded views can be browsed alongside the

linear annotations and other relevant data online15.

c. Other chromosomes. All autosomes show overall organizational patterns similar to chromosome 3L (see

Supp. Figure S6 for full-size view). Note that the chromosome 3R assembly does not reach the pericentromeric

heterochromatin. The chromatin state of the male X chromosome is strikingly different, with abundance of the

dosage-compensation driven state 5 (green) and increased low-level heterochromatic signatures (state 8, light

blue). See Supp. Figure S7 for BG3.

Figure 3. Chromatin patterns associated with transcriptionally active genes.

a. Location and extent of chromatin features relative to boundaries of expressed genes (over 1kb in size) in BG3

cells. The color intensity indicates the relative frequency with which enrichment/depletion of a given mark

occurs within the gene (normalized independently for each mark). Regions of enrichment/depletion for many

marks, such as H3K9ac or H3K27ac, are localized within TSS-proximal regions. Other marks appear

downstream of TSSs (H3K4me1, H3K36me1), extending further within the gene body. H2B-ubiquitination

typically covers the exact extent of the transcript uniformly.

b. Regions enriched for ‘active’ chromatin marks in long transcribed genes. The plot shows the extent of

regions enriched for various active marks for long (>4kb), transcriptionally active genes on BG3 autosomes.

Each row represents a gene. The first column illustrates coding exons within each gene using the same spatial

scaling; the last column shows chromatin state annotation. The clustering of the genes according to the spatial

patterns of chromatin marks separates genes with a high fraction of coding sequence (red subtree, bottom) from

genes containing long intronic sequences (green subtrees, top). The latter are associated with H3K18ac,

frequent H3K36me1, and extensive H3K9ac and H3K27ac domains. These histone marks correspond to the

presence of chromatin state 3 (last column) within the large intronic regions. The presence of these histone

marks is also associated with binding of specific chromosomal proteins, such as the tight coupling of H3K18ac

enrichment and binding of a Cohesin loading protein Nipped-B. A subset of such genes is also bound by the

ASH1 histone methyltransferase. Both of these complexes are strongly associated with the chromatin state 3

genome-wide (see Supp. Figure G2 for extended analysis, including renormalization by histone density).

Figure 4. Chromatin and transcriptional signatures of TSSs within domains of Polycomb-mediated

repression.

a. Distinct classes of TSSs in S2 Polycomb domains. Each row represents a TSS. Clusters 1-5 illustrate distinct

TSS states (see Supp. Figure P2 for a complete set of clusters): cluster 1 shows fully repressed TSSs with the

expected pattern of PC and H3K27me3 enrichment; cluster 2 shows 21 TSSs found within ASH1 domains,

which are maintained in a “balanced” state and are transcribed at moderate levels. Clusters 3 and 4 distinguish

TSSs located in the immediate proximity of Polycomb response elements (PREs), showing the symmetric

H3K4me1/me2 enrichment typical of all PREs (Supp. Figure P1). Many such TSSs (cluster 3, 42 TSSs)

produce short, non-polyadenylated transcripts along the sense strand (GRO+/shortRNA+ columns), indicating

that these genes are maintained in a paused polymerase state.

b. PRE positions distant from annotated TSSs. TSS-distal PREs exhibit enrichment for H3K4me1/me2, but

unlike TSS-proximal PREs, are not associated with GRO or shortRNA signatures.

c. An example of a Polycomb domain on chromosome 2L containing several PREs (dashed vertical lines).

While all PREs are associated with H3K4me1/me2 enrichment, only the PRE located immediately near a TSS

(middle) shows shortRNA and GRO-seq signals indicating paused polymerase activity.

d. The magnitude of shortRNA peaks at PRE-proximal TSSs is similar to that found at expressed genes. The

plot compares distributions of shortRNA peaks of different TSSs using cumulative distribution functions. The

plot demonstrates that the distribution of shortRNA peak magnitudes found at the PRE-proximal TSSs (red)

closely matches those seen at expressed genes (green), which are much higher than levels associated with silent

genes (blue) or random genomic positions (gray).
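The cumulative distribution comparison in d. can be sketched as an empirical CDF evaluated for each class of sites. This is a minimal illustration; the function name is hypothetical and the input is assumed to be a list of log10-transformed shortRNA peak magnitudes.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function giving the fraction of values at or below x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf
```

Evaluating the returned function on a common grid of levels for each site class (PRE-proximal TSSs, expressed genes, silent genes, random positions) gives curves of the kind compared in the plot.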

e. Transcriptional signals are a feature specific to TSS-proximal PREs. The distribution of shortRNA peak

magnitudes shows that a substantial fraction (44%) of TSS-proximal (<1kb) PREs are associated with

shortRNA signals while such signals are generally not observed for TSS-distal PREs.

Figure 5. Chromatin signatures of regulatory elements identified by DNaseI hypersensitivity.

a. Representative classes of high-magnitude DNaseI hypersensitive sites (DHSs) compared to the combinatorial

chromatin signatures in S2 cells. TSS-proximal DHSs show chromatin signatures expected of proximal

promoter regions in expressed genes: high H3K4me3 enrichment and RNA Pol II signal extending in the

direction of transcription (see Supp. Figure Y3 for a complete set of clusters). By contrast, TSS-distal DHSs are

associated with high H3K4me1 and low H3K4me3 levels. Most TSS-distal DHSs that are found within the

bodies of expressed genes are associated with chromatin state 3 (high H3K18ac, H3K27ac; Figure 1a) and vary

in the magnitude of enrichment for some of these marks (H2B-ubiquitination levels in cluster 4 vs. cluster 5, see

Supp. Figure Y4 for a complete set of clusters). A cluster of rare intergenic DHSs (cluster 6) is associated with

localized peaks of H3K4me2 (see Supp. Figure Y6).

b. DHS positions relative to annotated gene structures in S2 and BG3 cells. The majority of the DHSs (64%) are

found in TSS-proximal (≤2kb) regions, including 43% found within 250bp of TSSs, 10% within 2kb proximal

promoter regions, and a large fraction of the DHS positioned within the 5’ UTRs. Most DHS positions outside

the TSS-proximal regions (>2kb away) are found within gene bodies, with only 12% of all DHS located in the

intergenic regions (including 3’-proximal positions located within the 2kb of the 3’ gene ends). Many such

positions, however, show transcriptional signatures consistent with the presence of currently unannotated genes

(Supp. Figure Y5).

c. Distribution of DHS positions among chromatin states. In both S2 and BG3 cell lines, the vast majority of

DHS positions are found within the TSS-proximal state 1 or enhancer-like state 3 regions.

d. States 1 and 3 exhibit the highest density of DHSs. On average, one DHS is found per 2.2kb of sequence

associated with state 1, and one per 3.7kb of state 3. The density is markedly lower in all other chromatin states.

e. Cell line-specific DHS differences are positioned predominantly within the enhancer-like state 3. The matrix

shows the chromatin state of loci containing DHSs in one cell line (x-axis) and the state of the same locus in the

other cell line where the DHS is absent (y-axis). While the majority of DHSs are distributed between states 1

and 3, most (91%) of the DHSs that differ between the cell lines are found in the enhancer-like state 3. When

DHSs are not present, the loci most commonly transition to an open chromatin state 4 (43%), or maintain

enhancer-like state 3 (23%). In both scenarios, most of the associated gene loci remain transcriptionally active

(see Supp. Figure Y12).

f. Low-magnitude non-coding RNA transcripts are associated with DHS positions. The presence of transcription

at the TSS-distal DHSs (see shortRNA in a.) is confirmed by the GRO-seq assay. The average GRO-seq

profiles illustrate the bi-directional transcription originating from the DHS positions. The top plot shows local

increase in the antisense GRO-seq signal for the DHSs located within transcribed genes; positive strand GRO-

seq signal is shown for genes transcribed along the negative strand, and negative strand GRO-seq signal is

shown for genes transcribed along the positive strand (see Supp. Figure Y7); dashed lines show median levels.

The DHS-associated signals are an order of magnitude lower than those observed for mRNA elongation. The

intergenic DHS positions (bottom plot) also show bi-directional GRO-seq signal of comparable magnitude (see

Supp. Figure Y5).

Figure 6. Spatial arrangements of chromatin states associated with active transcription.

a. A graphic representation of chromatin patterns associated with different types of expressed genes. Unlike

short or exon-rich expressed genes (right panel) which are primarily associated with TSS-proximal chromatin

state 1 and an elongation state 2, expressed genes with long intronic regions typically contain one or more

regions of enhancer-like state 3. The state 3 regions are bound by distinct chromatin remodeling enzymes

(SPT16, dMi-2), and show higher nucleosome turnover and H3.3 incorporation rates than either TSS-proximal

state 1 or elongation state 2 regions. Such long-intron genes are enriched for regulatory functions, are typically

found within more extensive domains formed by binding of Cohesin complexes, and are often also bound by

ASH1 histone methyltransferase. The state 3 regions also contain the majority of DHSs that vary between the

cell lines.

b. Network of transitions between chromatin states along expressed genes. The observed frequency of

transitions from one state to another along the body of expressed genes is shown using arrows of different

thickness. The most frequent transition is from active TSS state 1 directly into the elongation state 2. In long

genes, however, state 1 can be followed by a sequence of enhancer-like state 3 and open chromatin state 4,

eventually transitioning to the elongation state 2. While the 3’ end of expressed genes is typically marked by

state 2, proximity of neighboring promoters makes transition from state 2 to state 1 relatively common within

the genome.
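The transition frequencies in b. can be tallied from a per-bin state path along each gene. The sketch below is one way to do this, assuming the path is given as a list of state numbers ordered 5' to 3'; the function name is hypothetical.

```python
from collections import Counter


def count_state_transitions(state_path):
    """Count transitions between different consecutive states along a gene."""
    counts = Counter()
    for current, following in zip(state_path, state_path[1:]):
        if current != following:
            counts[(current, following)] += 1
    return counts
```

Summing the counters over all expressed genes gives edge weights of the kind that arrow thickness represents in the network.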

Table 1. List of examined experimental measurements.

The table lists histone marks, chromosomal proteins and sequencing data analyzed in the manuscript. See Supp.

Table S1 for the antibody information.

Histone marks: H3K36me3, H4K16ac, H3K79me1, H2B-ubiq, H3K4me2, H3K4me3, H3K9ac, H3K79me2, H3K27ac, H3K23ac, H3K36me1, H3K4me1, H3K18ac, H3K27me3, H1, H4, H3K9me2, H3K9me3, H4K5ac*, H4ac Tetra*, H4K8ac*

Chromosomal proteins: PSC, E(Z), PC, ASH1, RNA Pol II, HP1a, HP1c, Nipped-B+, ISWI, NURF301*, SPT16*, dMi-2*, SU(VAR)3-9, MRG15, dRING, CHROMATOR, GAF

Sequencing data: RNA-seq, GRO-seq*, DNaseI-seq, short-RNA#

* measured only in S2 cells; + BG3 cell data from Misulovin et al.19; # S2 data from Nechaev et al.29.

Methods Summary

Histone modification and chromosomal protein antibodies were characterized for cross-reactivity with non-

target proteins or modifications using Westerns to nuclear extracts, peptide slot/dot blots, mass spectrometry,

immunofluorescence staining, and RNAi depletion6. ChIP-chip was performed in duplicate on chromatin

extracts from cells as described previously27, and IP’d DNA was amplified using whole genome amplification,

then hybridized to Affymetrix Drosophila Tiling 2.0R Arrays. Digital DNaseI-seq assays were performed as

described previously40, and Global Run-On library (GRO-seq) data was generated as described in Core et al.30.

Short RNA data was generated by Nechaev et al.29, and RNA-seq data was generated by Graveley et al.41.

The chromatin state models were generated using the distributions of 18 different histone marks, using only

data sets that passed strict statistical criteria for replicate consistency. The 9 state model utilized average

enrichment levels in 200bp bins, based on unsmoothed M values. Contiguous regions of enrichment for

individual marks were identified with a three-state hidden Markov model (HMM) (corresponding to enriched, neutral,

and depleted profiles). DHS positions were determined as read density peaks, significantly enriched relative to

the genomic DNA control. Clustering of chromatin signatures around TSSs, PREs, and DHSs was determined

using the PAM algorithm, with each profile summarized by bin averages spanning the ±2kb region. Further details about data processing and

computational analyses are described in the online supplementary information.

Methods

Growth conditions The ML-DmBG3-c2 cells were obtained from DGRC and are described at https://dgrc.cgb.indiana.edu/; the S2-DRSC cells were from DRSC (http://www.flyrnai.org/). All cell lines were grown to a density of ~5 × 10^6 cells/ml in Schneider's media (Gibco) supplemented with 10% FCS (HyClone). 10 µg/ml insulin was added to the ML-DmBG3-c2 media.

Antibodies The antibodies used are listed in Supplemental Table S1. Commercial antibodies against modified histones were tested by Western blot for the lack of cross-reactivity with the corresponding recombinant histone produced in E. coli and with non-histone proteins of embryonic nuclear extract. The antibody specificity was further assayed by Western dot/slot blot against a panel of synthetic modified histone peptides. Antibodies that showed 50% or more of their total activity directed against non-histone embryonic proteins, or less than 5-fold higher affinity for the corresponding histone peptide, were not used in our ChIP experiments. The specificity of antibodies against chromosomal proteins was tested by Western blots with nuclear extracts prepared from mutant flies or S2 cells subjected to RNAi knockdown. RNAi was done as described by Clemens et al.42. An antibody was considered specific if it recognized a major band of expected mobility that was absent in the sample prepared from mutant flies or diminished 2-fold or more upon RNAi knockdown. For chromatin proteins for which only one antibody was available, we performed validation by comparing its genomic distribution with the published distribution of a different component of the same protein complex or with published genomic distributions generated with a different antibody. When no published data were available, the distribution of a chromosomal protein was mapped with two antibodies generated against different epitopes to ensure accuracy (see Supp. Figure G7).

ChIP and microarray hybridization The preparation of crosslinked chromatin from cultured cells was performed as described in Schwartz et al.27 with the following modifications. Prior to ultrasound shearing the cells were permeabilized with 1% SDS and the shearing was done in TE-PMSF (0.1% SDS, 10mM Tris-HCl pH8.0, 1mM EDTA pH8.0, 1mM PMSF) using a Bioruptor (Diagenode) (2 x 10 min, 1 x 5 min; 30sec on, 30 sec off; high power setting). ChIP was performed as described in Schwartz et al.27 and IP’d DNA was amplified using the whole genome amplification kit (WGA2, Sigma) according to the manufacturer’s instructions, except that the chemical fragmentation step was omitted. The amplified material was labeled and hybridized to Drosophila Tiling Arrays v2.0 (Affymetrix) according to Schwartz et al.27.

Processing of ChIP data At least two independent biological replicates were assessed for each ChIP profile. The log2 intensity ratios (M values) were calculated for each replicate. The profiles were smoothed using local regression (lowess) with

500bp bandwidth, and the genome-wide mean was subtracted. The regions of significant enrichment were determined as clusters of at least 1kb in length, with gaps no more than 100bp where M value exceeds a statistically significant (0.1% FDR) enrichment threshold. The set of biological replicates was deemed consistent if the enriched regions from individual experiments had a 75% reciprocal overlap, or if at least 80% of the top 40% of the regions identified in each experiment were identified in the other replicate (before comparison the replicates were size-equalized by increasing the significance threshold for a replicate with more enriched sequence). The data from individual replicates were then combined using local regression smoothing, and used for all of the presented analysis, unless noted otherwise.
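The region-calling rule above (clusters of at least 1kb, with gaps of no more than 100bp between probes exceeding the threshold) can be sketched as follows. This is one reading of the rule, not the authors' code: the function name is hypothetical, the threshold is assumed to be supplied by a separate FDR calculation, and region length is measured between the first and last above-threshold probe positions.

```python
def enriched_regions(positions, m_values, threshold, min_len=1000, max_gap=100):
    """Merge above-threshold probes into regions and keep those >= min_len.

    positions: sorted probe coordinates (bp); m_values: smoothed M values.
    """
    regions = []
    start = prev = None
    for pos, m in zip(positions, m_values):
        if m <= threshold:
            continue
        if start is None:
            start = prev = pos
        elif pos - prev <= max_gap:
            prev = pos  # extend the current cluster across a small gap
        else:
            if prev - start >= min_len:
                regions.append((start, prev))
            start = prev = pos
    if start is not None and prev - start >= min_len:
        regions.append((start, prev))
    return regions
```

A single sub-threshold probe inside an otherwise enriched stretch does not split the region as long as the flanking enriched probes are within 100bp of each other.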

DNaseI hypersensitivity Digital DNaseI-seq assays were performed as described previously40. The sequenced reads were aligned to the dm3 genome assembly, recording only uniquely mappable reads. To detect DNase I hypersensitive sites, hotspot positions were identified based on a 300bp scanning window statistic (Poisson model relative to 50kb background density, Z-score threshold of 2), and peaks of read density were selected within the hotspots using randomization-based thresholding at 0.1% FDR. The set of high-magnitude DHSs analyzed in the manuscript (except for Supp. Figure Y1) was then determined as the subset of all identified peaks that show statistically significant enrichment over the normalized genomic DNA read density profile (using a 300bp window centered around the peak, binomial model, with Z-score threshold of 3). This method controls for copy number variation and sequencing/mapping biases; however, it may also reduce the sensitivity of DHS detection. In the DHS chromatin profile clustering analysis (Figure 5a, relevant supplementary figures), DHSs found within 1kb of another DHS were excluded if their enrichment magnitude (relative to genomic background) was lower (to avoid showing the same region more than once).
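A scanning-window statistic of the kind described for hotspot detection can be sketched as below. The exact form of the statistic is not given in the text, so this example assumes a Z-score under a Poisson model whose expected count is the 50kb background count scaled to the 300bp window; the function names are hypothetical.

```python
import math


def window_z_score(window_count, background_count,
                   window_bp=300, background_bp=50_000):
    """Z-score of a window read count under an assumed Poisson background."""
    expected = background_count * window_bp / background_bp
    if expected <= 0:
        return float("inf") if window_count > 0 else 0.0
    # For a Poisson variable the variance equals the mean.
    return (window_count - expected) / math.sqrt(expected)


def is_hotspot(window_count, background_count, z_threshold=2.0):
    return window_z_score(window_count, background_count) > z_threshold
```

For example, 16 reads in a 300bp window against 1,500 reads in the surrounding 50kb corresponds to an expected count of 9 and a Z-score of 7/3, which passes the threshold of 2.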

RNA sequencing The preparation of RNA-seq libraries and sequencing is described in Graveley et al.41. The sequenced reads were aligned to the dm3 genome assembly and annotated exon junctions, recording only uniquely mappable reads. The RPKM (reads per kilobase of exonic sequence per million reads mapped) was estimated for each exon. The total transcriptional output of each annotated gene was estimated based on the maximum of all exons within the gene. The presented analysis uses $\log_{10}(\mathrm{RPKM} + 1)$ values unless otherwise noted.
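The expression measure described above can be written out as a short calculation. This is a minimal sketch; the function names and the (read count, exon length) input layout are assumptions made for illustration.

```python
import math


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exonic sequence per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def gene_expression(exons, total_mapped_reads):
    """log10(RPKM + 1) of the highest-RPKM exon in the gene.

    exons: iterable of (read_count, exon_length_bp) pairs.
    """
    best = max(rpkm(count, length, total_mapped_reads)
               for count, length in exons)
    return math.log10(best + 1)
```

With 10 million mapped reads, an exon of 1kb covered by 500 reads has an RPKM of 50, and a gene whose other exons score lower is reported as log10(51).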

GRO sequencing Global Run-On library was prepared from S2 cells and sequenced as described by Core et al.30. The obtained reads were aligned to the dm3 genome assembly, recording only uniquely mappable reads. The smoothed profiles of reads mapping to each strand were calculated using Gaussian smoothing ($\sigma = 100$bp). The presented analysis uses $\log_{10}(d + 1)$, where $d$ is the smoothed density value.

Short RNA data processing The short RNA data for S2 cells was generated by Nechaev et al.29, and was aligned and processed in the same way as the GRO-seq data.
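The strand-specific smoothing and log transform can be sketched in plain Python as follows. The kernel truncation at three standard deviations and the normalization of the kernel to unit sum are assumptions of this example, as are the function names; the input is assumed to be a list of per-base read counts for one strand.

```python
import math


def gaussian_kernel(sigma, truncate=3.0):
    """Discrete Gaussian weights on [-truncate*sigma, truncate*sigma], sum 1."""
    radius = int(truncate * sigma)
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_density(counts, sigma=100, truncate=3.0):
    """Spread each read count over neighboring positions with a Gaussian."""
    kernel = gaussian_kernel(sigma, truncate)
    radius = len(kernel) // 2
    n = len(counts)
    out = [0.0] * n
    for i, c in enumerate(counts):
        if c == 0:
            continue
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                out[j] += c * w
    return out


def log_density(counts, sigma=100):
    """log10(d + 1) of the smoothed density d at each position."""
    return [math.log10(d + 1) for d in smooth_density(counts, sigma)]
```

Running the same functions separately on plus-strand and minus-strand counts gives the two profiles used for the GRO-seq and short RNA analyses.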

Chromatin state models To derive a nine-state joint chromatin state model for S2 and BG3 cells (Figure 1a), the genome was first divided into 200bp bins, and the average enrichment level was calculated within each bin based on unsmoothed M values taking into account individual replicates, using all histone enrichment profiles and PC to discount the genome-wide difference in S2 H3K27me3 profiles. The bin-average values of each mark were shifted by the genome-wide mean, scaled by the genome-wide variance, and quantile-normalized between the two cell lines. The HMM with multivariate normal emission distributions was then fitted with the Baum-Welch algorithm using data from both cell types, starting from 30 seeding configurations determined with K-means clustering. States with minor intensity variations (Euclidean distance of mean emission values < 0.15) were merged. Larger models (up to 30 states) were examined, and the final number of states was chosen for optimal interpretability.
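The state-merging step can be illustrated with a small sketch that groups states whose mean emission vectors lie within a Euclidean distance of 0.15. How chains of near-identical states were handled is not stated in the text; this example assumes transitive (single-linkage) grouping, and the function name is hypothetical.

```python
import math


def merge_similar_states(mean_vectors, max_dist=0.15):
    """Group state indices whose mean emission vectors are closer than max_dist."""
    n = len(mean_vectors)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(mean_vectors[i], mean_vectors[j]) < max_dist:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group would correspond to one state of the reduced model.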

An extensive discrete chromatin state model (Supp. Figure S11) was calculated as described in Ernst et al.18. The model was trained using a 200bp grid with binary calls (enriched/not enriched). The binary calls were made based on a 5% FDR threshold determined from 10 genome-wide randomizations for each mark. For H1, H4 and H3K23ac, regions of significant depletion rather than enrichment were called.

Regions of enrichment for individual marks (Figure 3) To determine contiguous regions of enrichment for individual marks, a three-state HMM was used, with states corresponding to enriched, neutral, and depleted profiles (normally-distributed emission parameters: $\mu = [-0.5, 0, 0.5]$, $\sigma^2 = 0.3$). The enriched regions were determined from the Viterbi path. The HMM segmentation was applied to unsmoothed M value data taking into account individual biological replicates.
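A three-state segmentation with these emission parameters can be sketched with a log-space Viterbi decoder. The transition probabilities are not given in the text, so the self-transition probability of 0.98 and the uniform initial distribution below are assumptions, as are the function names; the example also treats a single M value track rather than combining replicates.

```python
import math

MEANS = (-0.5, 0.0, 0.5)  # state 0 depleted, 1 neutral, 2 enriched
VARIANCE = 0.3


def log_normal(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def viterbi_path(values, stay=0.98):
    """Most likely state sequence for a non-empty list of M values."""
    k = len(MEANS)
    switch = (1 - stay) / (k - 1)  # assumed: equal chance of either other state
    log_trans = [[math.log(stay if i == j else switch) for j in range(k)]
                 for i in range(k)]
    score = [math.log(1 / k) + log_normal(values[0], m, VARIANCE)
             for m in MEANS]
    back = []
    for x in values[1:]:
        new_score, pointers = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][j]
                             + log_normal(x, MEANS[j], VARIANCE))
        score = new_score
        back.append(pointers)
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Contiguous runs of state 2 in the returned path would be reported as enriched regions. With a high self-transition probability, an isolated high value does not open a region, whereas a sustained run of high values does.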

Classification of enrichment profiles (Figures 4,5) Clustering of chromatin signatures around TSSs (Figure 4a), PREs (Figure 4b), and DHSs (Figure 5a, relevant supplements) was performed using the PAM (partitioning around medoids) algorithm. For clustering, each profile was summarized with average values within bins spanning ±2kb regions. 100bp bins were used for the central ±500bp region, 300bp bins outside.
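The binning scheme used to summarize each profile before clustering can be sketched as follows: ten 100bp bins across the central ±500bp and five 300bp bins on each side out to ±2kb, for 20 bins in total. The function names are hypothetical, positions are assumed to be given relative to the anchor site (TSS, PRE or DHS), and the clustering step itself is not shown.

```python
from bisect import bisect_right


def profile_bin_edges(flank=2000, core=500, core_bin=100, outer_bin=300):
    """Bin edges (bp, relative to the anchor) for the +/-2kb summary."""
    left = list(range(-flank, -core, outer_bin))
    middle = list(range(-core, core, core_bin))
    right = list(range(core, flank + 1, outer_bin))
    return left + middle + right


def summarize_profile(rel_positions, values, edges):
    """Average the values falling in each bin; None marks an empty bin."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for pos, val in zip(rel_positions, values):
        if edges[0] <= pos < edges[-1]:
            i = bisect_right(edges, pos) - 1
            sums[i] += val
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Concatenating the 20-value summaries of all marks for one site gives the feature vector that a medoid-based clustering routine would take as input.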

References

1 Celniker, S. E. et al., Unlocking the secrets of the genome. Nature 459 (7249), 927 (2009).

2 Adams, M. D. et al., The genome sequence of Drosophila melanogaster. Science 287 (5461), 2185 (2000); Hoskins, R. A. et al., Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science 316 (5831), 1625 (2007).

3 Clark, A. G. et al., Evolution of genes and genomes on the Drosophila phylogeny. Nature 450 (7167), 203 (2007).

4 Tweedie, S. et al., FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res 37 (Database issue), D555 (2009).

5 Felsenfeld, G. and Groudine, M., Controlling the double helix. Nature 421 (6921), 448 (2003); Mendenhall, E. M. and Bernstein, B. E., Chromatin state maps: new technologies, new insights. Curr Opin Genet Dev 18 (2), 109 (2008).

6 Egelhofer, T. A. et al., An assessment of histone-modification antibody quality. Nat Struct Mol Biol (in press).

7 Eissenberg, J. C. and Reuter, G., Cellular mechanism for targeting heterochromatin formation in Drosophila. Int Rev Cell Mol Biol 273, 1 (2009).

8 Schwartz, Y. B. and Pirrotta, V., Polycomb complexes and epigenetic states. Curr Opin Cell Biol 20 (3), 266 (2008).

9 Li, B., Carey, M., and Workman, J. L., The role of chromatin during transcription. Cell 128 (4), 707 (2007).

10 Liu, C. L. et al., PLoS Biol 3, e328 (2005); Barski, A. et al., Cell 129, 823 (2007).

11 Carrozza, M. J. et al., Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell 123 (4), 581 (2005).

12 Heintzman, N. D. et al., Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39 (3), 311 (2007).

13 Riddle, N. C. et al., Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin. (submitted).

14 Anders, S., Visualization of genomic data with the Hilbert curve. Bioinformatics 25 (10), 1231 (2009).

15 Drosophila chromatin browser, Available at http://compbio.med.harvard.edu/flychromatin, (2010).

16 MacAlpine, D. M., Rodriguez, H. K., and Bell, S. P., Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18 (24), 3094 (2004).

17 Blumenthal, A. B., Kriegstein, H. J., and Hogness, D. S., The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb Symp Quant Biol 38, 205 (1974).

18 Ernst, J. and Kellis, M., Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol 28 (8), 817 (2010).

19 Misulovin, Z. et al., Association of cohesin and Nipped-B with transcriptionally active regions of the Drosophila melanogaster genome. Chromosoma 117 (1), 89 (2008).

20 Kagey, M. H. et al., Mediator and cohesin connect gene expression and chromatin architecture. Nature 467 (7314), 430 (2010).

21 Beisel, C. et al., Histone methylation by the Drosophila epigenetic transcriptional regulator Ash1. Nature 419 (6909), 857 (2002); Tanaka, Y. et al., Trithorax-group protein ASH1 methylates histone H3 lysine 36. Gene 397 (1-2), 161 (2007).

22 Deal, R. B., Henikoff, J. G., and Henikoff, S., Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science 328 (5982), 1161 (2010).

23 Henikoff, S. et al., Genome-wide profiling of salt fractions maps physical properties of chromatin. Genome Res 19 (3), 460 (2009).

24 MacAlpine, H. K. et al., Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20 (2), 201 (2010).

25 Tie, F. et al., CBP-mediated acetylation of histone H3 lysine 27 antagonizes Drosophila Polycomb silencing. Development 136 (18), 3131 (2009); Visel, A. et al., ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457 (7231), 854 (2009).

26 Zinzen, R. P. et al., Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462 (7269), 65 (2009).

27 Schwartz, Y. B. et al., Genome-wide analysis of Polycomb targets in Drosophila melanogaster. Nat Genet 38 (6), 700 (2006).

28 Schwartz, Y. B. et al., Alternative epigenetic chromatin states of Polycomb target genes. PLoS Genet 6 (1), e1000805 (2010).

29 Nechaev, S. et al., Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327 (5963), 335 (2010).

30 Core, L. J., Waterfall, J. J., and Lis, J. T., Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322 (5909), 1845 (2008).

31 Bernstein, B. E. et al., A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125 (2), 315 (2006); Kanhere, A. et al., Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex-2. Mol Cell 38 (5), 675 (2010).

32 Schuettengruber, B. et al., Functional anatomy of polycomb and trithorax chromatin landscapes in Drosophila embryos. PLoS Biol 7 (1), e13 (2009); Gan, Q. et al., Monovalent and unpoised status of most genes in undifferentiated cell-enriched Drosophila testis. Genome Biol 11 (4), R42 (2010).

33 Wu, C., The 5' ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature 286 (5776), 854 (1980); Wu, C. et al., The chromatin structure of specific genes: I. Evidence for higher order domains of defined DNA sequence. Cell 16 (4), 797 (1979).

34 Elgin, S. C., The formation and function of DNase I hypersensitive sites in the process of gene activation. J Biol Chem 263 (36), 19259 (1988); Jin, C. et al., H3.3/H2A.Z double variant-containing nucleosomes mark 'nucleosome-free regions' of active promoters and other regulatory regions. Nat Genet 41 (8), 941 (2009).

35 Hesselberth, J. R. et al., Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6 (4), 283 (2009).

36 Heintzman, N. D. et al., Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459 (7243), 108 (2009).

37 MacArthur, S. et al., Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol 10 (7), R80 (2009).

38 Kim, T. K. et al., Widespread transcription at neuronal activity-regulated enhancers. Nature 465 (7295), 182 (2010).

39 Filion, G. J. et al., Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143 (2), 212 (2010).

40 Sekimata, M. et al., CCCTC-binding factor and the transcription factor T-bet orchestrate T helper 1 cell-specific structure and function at the interferon-gamma locus. Immunity 31 (4), 551 (2009).

41 Graveley, B., et al., Celniker, S. E., Characterization of transcriptional activity in Drosophila melanogaster. (submitted).

42 Clemens, J. C. et al., Use of double-stranded RNA interference in Drosophila cell lines to dissect signal transduction pathways. Proc Natl Acad Sci U S A 97 (12), 6499 (2000).

[Figures 1-6: images not reproduced here; see the figure legends above. Labels recoverable from the figure panels are listed below.]

Figure 1. Panels a-e. Color scale: log2 enrichment (-2 to 2). Chromatin states 1-9.

Figure 2. Panels a-c. Annotated features on chromosome 3L: pericentromeric heterochromatin, cluster of small expressed genes, PcG domains, heterochromatin-like domain, open chromatin domain.

Figure 3. Panels a-b. Color scale: depleted to enriched.

Figure 4. Panels a-e. Axes: log10(shortRNA read density) against fraction of sites below this level; log10(shortRNA peak) against scaled density.

Figure 5. Panels a-f. Color scale: log2 enrichment (0 to 2). DHS distribution among chromatin states: state 1 (45%), state 3 (37%), state 4 (6%), state 9 (5%), state 6 (3%).

Figure 6. Panels a-b. Labels: long expressed gene, short gene.