Comprehensive analysis of the chromatin landscape in Drosophila melanogaster
Peter V. Kharchenko1,2, Artyom A. Alekseyenko3,4, Yuri B. Schwartz5, Aki Minoda6, Nicole C. Riddle7, Jason
Ernst8,9, Peter J. Sabo10, Erica Larschan3,4,11, Andrey A. Gorchakov3,4, Tingting Gu7, Daniela Linder-Basso5,
Annette Plachetka3,4, Gregory Shanower5, Michael Y. Tolstorukov1,2, Lovelace J. Luquette1, Ruibin Xi1,
Youngsook L. Jung1,3, Richard Park1,12, Eric P. Bishop1,12, Theresa P. Canfield10, Richard Sandstrom10, Robert
E. Thurman10, David M. MacAlpine13, John Stamatoyannopoulos10,14, Manolis Kellis8,9, Sarah C. R. Elgin7,
Mitzi I. Kuroda3,4, Vincenzo Pirrotta5, Gary Karpen6*, Peter J. Park1,2,3*
* co-corresponding authors
1 Center for Biomedical Informatics, Harvard Medical School, and Informatics Program, Children's Hospital, Boston, MA, USA
2 Children’s Hospital Informatics Program, Boston, MA, USA
3 Division of Genetics, Department of Medicine, Brigham & Women’s Hospital, Boston, MA, USA
4 Department of Genetics, Harvard Medical School, Boston, MA, USA
5 Department of Molecular Biology & Biochemistry, Rutgers University, Piscataway, NJ, USA
6 Department of Molecular and Cell Biology, University of California at Berkeley, and Department of Genome Dynamics, Lawrence Berkeley National Lab, Berkeley, CA, USA
7 Department of Biology, Washington University in St. Louis, St. Louis, MO, USA
8 MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
9 Broad Institute of MIT and Harvard, Cambridge, MA, USA
10 Department of Genome Sciences, University of Washington, Seattle, WA, USA
11 Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, USA
12 Graduate Program in Bioinformatics, Boston University, Boston, MA, USA
13 Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC, USA
14 Department of Medicine, University of Washington, Seattle, WA, USA
Summary
Chromatin, the composite of DNA and its associated proteins, occurs in various conformations in living cells
and is essential for cell differentiation, gene regulation and other key cellular processes. We present a genome-
wide map of the chromatin landscape for Drosophila melanogaster, based on the distributions of 18 histone
modifications and 9 combinatorial patterns identified by computational analysis. Integrative analysis with other
genome-wide mapping data (non-histone chromatin proteins, DNaseI hypersensitive sites, GRO-seq engaged
polymerase, short/long RNA products) reveals distinct properties of chromosomes, genes, regulatory elements,
and other functional domains. This analysis identifies distinct chromatin signatures among active genes that are
correlated with differences in gene length, exonic structure, regulatory function, and genomic context. It also
reveals a diversity of chromatin signatures among Polycomb targets, including a subset with paused RNA
polymerase. This systematic profiling and integrative analysis of chromatin signatures provides important
insights into the differential packaging of functional elements, and will serve as a valuable resource for future
experimental investigations of genome structure and function.
The NIH model organism Encyclopedia of DNA Elements (modENCODE) project is focused on using the D.
melanogaster and C. elegans model systems to build a foundation for understanding genome function by
providing the community with a comprehensive map of the distributions of chromatin components, transcription
factors, transcripts, small RNAs, and origins of replication1. Drosophila has been used as a model system for
over a century to study mechanisms of chromosome structure and function, gene regulation, development, and
evolution. The availability of nearly complete, high quality euchromatic and heterochromatic sequence
assemblies for D. melanogaster2 and other Drosophila species3, extensive community annotation of genes and
other functional elements4, and a vast repertoire of experimental manipulations for elucidating genome
functions highlight the value of performing comprehensive epigenomic studies in Drosophila.
The packaging of DNA into chromatin occurs in a variety of conformations that impact the transfer of
information from genome sequence to genic, chromosomal, cellular and organismal functions. Thus, genome-
wide profiling of chromatin components (post-translational histone modifications and non-histone proteins)
provides an information-rich annotation for the potential functions of the underlying DNA sequences. Analyses
of these chromatin landscapes have identified simple patterns associated with specific DNA elements, such as
promoters, enhancers, and transcription start and end sites, as well as the active or silent status of genes and
large domains5. Here, we build upon this work by painting a comprehensive picture of the chromatin landscape
in a model eukaryotic genome. We define combinatorial chromatin ‘states’ at different levels of organization,
from individual regulatory units to the chromosome level, and provide evidence for the association of individual
states with regulation of genome functions.
Combinatorial chromatin states
We performed chromatin immunoprecipitation (ChIP)-array analysis on numerous histone modifications and
chromosomal proteins (Table 1), using antibodies tested for cross-reactivity with non-target proteins or
modifications6 (Methods and Supp. Figure S1). Here, we describe results and analyses for the two cell lines
with the most comprehensive data sets, S2-DRSC (S2) and ML-DmBG3-c2 (BG3), derived from late male
embryonic tissues (stages 16-17) and the central nervous system of male third instar larvae, respectively (data
from other cell lines and animal stages are available from http://www.modencode.org). Genome-wide analysis
reveals groups of correlated factors, including those associated with heterochromatic regions7, Polycomb-
mediated repression8, and active transcription9 (Supp. Figure S2), similar to those observed in other
organisms10. This clustering suggests that specific histone modifications work together to achieve a distinct
chromatin “state”.
We utilized a machine-learning approach to identify the prevalent combinatorial patterns of 18 histone
modifications across the genome. Our model captures the overall complexity of chromatin patterns observed in
S2 and BG3 cell lines with nine combinatorial states (Figure 1a, Methods). The model associates each genomic
location with a particular state, generating a chromatin-centric annotation of the genome (Figure 1b). We
examined each state for enrichments in non-histone proteins (e.g. RNA Pol II, HP1a; Figure 1a, Supp. Figure
S3), gene elements (e.g. TSS, gene body), genome coverage, and for distributions across the karyotype (Figure
1b, Supp. Figure S4) and finer-scale levels (examples in Figure 1c-e).
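The per-state enrichment of gene elements can be illustrated with a minimal sketch. It assumes the genome has been cut into equal-sized bins, each carrying one state call and a flag for whether it overlaps a feature such as a TSS-proximal region; the function name, bin layout, and toy data are hypothetical and are not taken from the Methods. The score is the log2 ratio of the feature's frequency within a state to its frequency across all bins.

```python
import math

def log2_enrichment(state_calls, feature_flags, state):
    """log2 of (feature frequency among bins in `state`) over
    (feature frequency among all bins). Returns NaN when undefined."""
    n_total = len(state_calls)
    n_feature = sum(feature_flags)
    in_state = [f for s, f in zip(state_calls, feature_flags) if s == state]
    if not in_state or n_feature == 0 or sum(in_state) == 0:
        return float("nan")
    frac_state = sum(in_state) / len(in_state)
    frac_genome = n_feature / n_total
    return math.log2(frac_state / frac_genome)

# Toy data (hypothetical): 8 bins with a state call and a TSS-proximal flag.
states = [1, 1, 2, 2, 9, 9, 9, 9]
tss_proximal = [True, True, False, False, False, False, False, True]

for s in (1, 2, 9):
    print(s, log2_enrichment(states, tss_proximal, s))
```

In this toy example state 1 scores positive (over-represented at TSS-proximal bins), state 9 scores negative, and state 2 is undefined because none of its bins carry the feature.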
The majority of the chromatin patterns in the 9 state model are associated with transcriptionally active genes.
The promoter and transcription start site (TSS)-proximal regions are identified by state 1 (Figure 1; red),
marked by prominent enrichment in H3K4me3/me2 and H3K9ac. The transcriptional elongation signature
associated with H3K36me3 enrichment11 is captured by state 2 (purple), which preferentially occurs over
exonic regions of transcribed genes. State 3 is typically found within intronic regions, and is distinguished by
high enrichment in H3K27ac, H3K4me1, and H3K18ac (brown). Exhibiting relatively low levels of H3K4me3,
state 3 resembles the chromatin signatures of mammalian enhancers12. A related chromatin signature is captured
by state 4 (coral), distinguished by enrichment of H3K36me1, accompanied by H3K18ac or H3K4me1, but
lacking H3K27ac. State 4 is biased towards intronic regions in expressed genes; however, this signature is also
found in intergenic regions, and overall shows only a slight preference for genic sequence (Figure 1a). The
number of genes associated with different chromatin states, as well as the spatial placement of different states
within genes is shown in Supp. Figure S5.
Several aspects of large-scale genome organization are revealed by the karyotype view of chromatin annotation
(Figure 1b). Chromosome X (from male cells) is strikingly enriched for state 5 (green), which has high levels of
H4K16ac and some enrichment in H3K36me3 and other marks associated with “elongation” state 2. The
pericentromeric heterochromatin domains and chromosome 4 are characterized by high levels of
H3K9me2/me37, as well as depletion of H3K23ac and other marks associated with activation (state 7, dark
blue). Interestingly, the border between euchromatin and heterochromatin defined by these states varies among
different cell lines and fly tissues13. Finally, the model also distinguishes another set of heterochromatin-like
regions, which contain moderate levels of H3K9me2 and me3 (state 8, light blue, Figure 1e). This state
occupies extensive domains positioned predominantly within the euchromatic arms on the autosomes in BG3
cells, and in chromosome X in both cell lines13.
Further aspects of chromatin organization are visualized by folding the chromosome into a square using a
Hilbert curve pattern (Figure 2a)14, which maintains the spatial proximity of the nearby elements. Thus, local
patches of corresponding colors reveal the sizes and relative positions of domains associated with particular
chromatin states (Figure 2b,c; Supp. Figures S6,S7; interactive online browser15). For instance, specks of TSS-
proximal regions (state 1, red) are typically positioned within larger blocks of transcriptional elongation marks
(state 2, purple), which are in turn encompassed by extensive patches of H3K36me1-enriched domains (state 4,
coral) and variable-sized blocks of enhancer-like signatures (state 3, brown). The clusters of open chromatin
formed by these gene-centric patterns are separated by extensive transcriptionally-silent domains (state 9, light
gray; state 8, light-blue) and regions of Polycomb-mediated repression (state 6, dark gray). Finally, the presence
of red specks (state 1 – active TSS regions) within the extensive dark blue regions (state 7) illustrates sparse
active genes within pericentric heterochromatin. The factors that may delimit regions associated with domain
boundaries are of interest, but were not identified by our analysis (Supp. Figure S8). While some aspects of
large-scale organization are apparent from the present chromatin annotation (Supp. Figures S9,S10), we have
developed a method to characterize chromatin organization at multiple spatial scales, depending on the genome
properties being investigated. For example, we observe that chromatin patterns most accurately reflect the
replication timing of the S2 genome at scales of ~170kb (Supp. Section 1), consistent with earlier size estimates
of chromatin domains influencing replication timing16 and suggesting that multiple replication origins are
coordinately regulated by the local chromatin environment (each replicon ~28-50kb17).
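As a rough check on the scale argument, and assuming purely for illustration that a ~170kb domain is tiled end to end by replicons of the quoted sizes, the number of replicons per domain would be approximately:

```latex
\[
\frac{170\ \text{kb}}{50\ \text{kb}} = 3.4
\qquad\text{to}\qquad
\frac{170\ \text{kb}}{28\ \text{kb}} \approx 6.1
\]
```

That is, on the order of three to six replicons per domain, which is what "multiple replication origins" implies here.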
To examine combinatorial patterns not captured by the simplified 9 state model, we have also generated and
annotated a 30-state combinatorial model that utilizes presence/absence probabilities of individual marks18
(Supp. Figure S11). The increased number of states identifies finer variations that are potentially biologically
significant. For instance, the 30-state model identifies an additional chromatin signature corresponding to
transcriptional elongation in heterochromatic regions13. A comparison of the two models is provided in Supp.
Figure S11.
Chromatin signature reveals a class of genes enriched for regulatory function
Active genes generally display enrichments or depletions of individual marks at specific gene regions (Figure
3a). To more thoroughly examine the relationship between chromatin marks, gene organization, and gene
activity, we clustered genes based on the spatial pattern of chromatin signatures at promoters and gene bodies
(Supp. Figure G1). We observe distinct subclasses of active gene signatures that correlate with expression
magnitude (also see Supp. Section 2), gene structure, and genomic context (e.g. heterochromatic genes
combining H3K9me2/me3 with some active marks). One class of expressed genes (cluster 2, Supp. Figure G1;
131 genes in S2, 202 in BG3) was of particular interest because it is enriched for H3K36me1 and for genes that are
longer than average and have developmental and other regulatory functions (Supp. Table G1).
To further examine the spatial enrichment patterns associated with larger transcribed genes, we clustered long
(≥4kb) expressed autosomal genes based on contiguous blocks of enrichment for each chromatin mark (first
panel, Figure 3b; 1055 genes). This analysis revealed that genes with large 5’-end introns (green subtree, Figure
3b; 552 genes) show extensive H3K27ac and H3K18ac enrichment, broader H3K9ac domains, and blocks of
H3K36me1 enrichment, corresponding to chromatin state 3 (from the 9 state model) within large intronic
regions (Figure 3b, last column, dark brown color). In contrast, genes with more uniformly distributed coding
regions (red subtree, Figure 3b) lack most state 3 marks, and H3K9ac enrichment is restricted to the 2kb
downstream of the TSS (state 1; Figure 3b, last column, red color). These differences are not explained by
variation in histone density (Supp. Figure G2). Overall, the presence or absence of state 3 is the most common
difference in the chromatin composition of expressed genes of length 1kb and longer (Supp. Figure G3), and the
acquisition of state 3 consistently correlates with a reduced fraction of coding sequence in the gene body,
mainly associated with the presence of a long first intron.
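The two gene-structure quantities discussed above, the fraction of coding sequence in the gene body and the length of the first intron, can be computed from exon coordinates. The sketch below is illustrative only: the function name, the half-open coordinate convention, and the toy gene are assumptions, and only the 4kb length threshold comes from the text.

```python
def gene_structure(tx_start, tx_end, exons, coding_blocks, strand="+"):
    """Return (gene_length, coding_fraction, first_intron_length).

    exons and coding_blocks are lists of half-open (start, end) intervals
    in genomic coordinates; strand decides which intron is the 5'-most one.
    """
    gene_len = tx_end - tx_start
    coding = sum(end - start for start, end in coding_blocks)
    exons = sorted(exons)
    if len(exons) < 2:
        first_intron = 0  # single-exon gene has no intron
    elif strand == "+":
        first_intron = exons[1][0] - exons[0][1]
    else:
        first_intron = exons[-1][0] - exons[-2][1]
    return gene_len, coding / gene_len, first_intron

# Toy gene (hypothetical coordinates): 10kb long with a 6kb first intron.
exons = [(0, 200), (6200, 7000), (9000, 10000)]
coding_blocks = [(6300, 7000), (9000, 9800)]
length, coding_fraction, first_intron = gene_structure(0, 10000, exons, coding_blocks)
is_long = length >= 4000  # the >=4kb cutoff used for the clustering above
print(length, coding_fraction, first_intron, is_long)
```

A gene like this toy example, with a long first intron and a low coding fraction, is the kind the text associates with state 3.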
The set of large-intron genes (green subtree, Figure 3b) is enriched for developmental and regulatory functions
(Supp. Table G2); these genes are often controlled by complex regulatory regions and multiple promoters.
Indeed, we observe that the state 3 domains that characterize the introns of such genes are enriched for binding
of chromosomal proteins and other factors with known roles in gene regulation. These include Nipped-B19
(Figure 3b), a cohesin-complex loading protein previously associated with transcriptionally active regions19,20,
and the histone methyltransferase ASH1 (Supp. Figure G4), which is thought to act primarily at H3K4 but has also been reported to
methylate H3K9, H3K36 and H4K2021.
State 3 domains are highly enriched for specific chromatin remodeling factors (SPT16 and dMi-2; Supp. Figure
G5, G6), whereas state 1 regions around active TSSs are preferentially bound by NURF301 and MRG15. ISWI
is enriched in both states 1 and 3 (Supp. Figure G6, G7). State 3 also exhibits the highest level of nucleosome
turnover22, and shows notably higher enrichment of the H3.3 histone variant23 than either the TSS- or
elongation-associated states 1 and 2 (Supp. Figure G5,G6). Consistent with earlier analysis of cohesin-bound
regions24, state 3 sequences tend to replicate early in G1 phase, and show abundance of early replicating origins
(Supp. Figure G8). A likely regulatory role for state 3 is further supported by enrichment for a known enhancer
binding protein (dCBP/p300 25) in these regions in adult flies, as is the case for enhancers validated in transgene
constructs26 (Supp Figure G9).
Multiple modes of regulation in Polycomb domains
In Drosophila, loci repressed by Polycomb group (PcG) proteins are embedded in broad H3K27me3 domains
that are regulated by Polycomb Response Elements (PREs) bound by E(Z), PSC, and dRING (Figure 1d)27,28.
We find that regions of H3K4me1 enrichment surround all PREs, 90% of which also display narrower peaks of
H3K4me2 enrichment (Supp. Figure P1). While this pattern is reminiscent of transcriptionally active promoter
regions, PREs lack H3K4me3 (Supp. Figure P1), suggesting that a different mechanism of H3K4 methylation is
employed, perhaps involving dependence on the Trithorax (TRX) H3K4 HMTase found at all PREs28.
To gain insight into the distinct chromatin states associated with PcG target genes that range from fully
repressed to fully active28, we analyzed the chromatin and transcriptional signatures of TSSs in Polycomb-
bound domains (Figure 4a, Supp. Figure P2). In addition to fully repressed TSSs (cluster 1, Figure 4a), we
identify TSSs within domains that correspond to the “balanced” state28 (cluster 2, Figure 4a), distinguished by
coexistence of Polycomb with active marks (including ASH1) and production of low level full-length mRNA
transcripts (e.g. Su(z)2 domain, Figure 1d). TSSs in clusters 3 and 4 are distinguished by presence of adjacent
PREs (Figure 4a), and are associated with the H3K4me1/2 marks (see above).
Surprisingly, we find that approximately 53% of the PRE-proximal TSSs bind RNA Pol II and produce short
RNA transcripts (cluster 3, Figure 4a,c), a pattern recently linked to stalling of engaged RNA Pol II 29. Using
the global run-on sequencing (GRO-seq) assay to accurately assess locations, abundance and orientation of
engaged RNA polymerases30, we observe that cluster 3 TSSs produce short transcripts of 400-600nt in the sense
orientation only. The level of RNA Pol II at these TSSs is low compared to most transcriptionally active genes
(compare to cluster 2), but the level of GRO+ short RNAs is similar to that found at active genes (Figure 4d);
thus, the majority of transcription starts in cluster 3 fail to elongate. Genes without a TSS-proximal PRE
generally lack short transcript signatures (e.g. clusters 1 and 5 in Figure 4a; see Supp. Figure P2 for exceptions).
Interestingly, genes with TSS-proximal PREs (clusters 3,4) are enriched for regulatory and developmental
functions, even more than other genes within Polycomb domains (see Supp Tables P1, P2). Importantly,
engaged polymerases and transcripts are not a general feature of PREs; TSS-distal PREs typically lack short
RNA and GRO-seq signals (Figure 4b, e), although they are also enriched in H3K4me1/me2. The striking link
between TSS-proximal PREs and the production of short RNAs suggests a potential mechanism for control of
developmental regulatory genes in PC domains, whereby the same features that recruit H3K4 methyl marks at
PREs may also facilitate RNA Pol II recruitment to TSSs.
The paused genes identified here resemble the “bivalent” genes in mammalian cells, which are similarly linked
to transcriptional pausing of key regulatory and developmental genes31. However, the mammalian “bivalent
state” is characterized by the simultaneous presence of PcG proteins, H3K27me3 and H3K4me3. This
combination is absent from the paused promoters described here (cluster 3, Figure 4a). Drosophila loci enriched
for both H3K27me3 and H3K4me3 are rare and found only within the ASH1-associated “balanced” state, which
is associated with low level, productive elongation28,32 (cluster 2). Thus, paused regulatory genes within
Drosophila Polycomb domains display a chromatin signature distinct from that seen in mammalian cells, a difference that may be
linked to the presence of TSS-proximal PREs.
DNaseI hypersensitivity identifies putative enhancers exhibiting bi-directional transcription
We utilized a DNase I hypersensitivity assay33 to examine the distributions of putative regulatory regions and
their relationships with chromatin states. DHS mapping broadly identifies sites with low nucleosome density
and/or regions bound by non-histone proteins34. Short-read sequencing35 identified 8616 high-magnitude DNase
I hypersensitive sites (DHSs) in S2 cells and 6354 in BG3 cells (and a comparable number of low-magnitude
DHSs, Supp. Figure Y1; see Methods). Of the high magnitude DHSs, 43% are positioned within 250bp of
TSSs, and 64% within 2kb (Figure 5b). 98% of these TSSs are transcriptionally active (Supp. Figure Y2),
consistent with earlier observations that the promoters of expressed genes exhibit nucleosome-free regions23.
Thus, the chromatin context of the TSS-proximal DHSs is dominated by the features expected of an active TSS,
including bound RNA polymerase II, and enrichments for H3K4me3, H2B-ubiquitination and other state 1
marks extending in the direction of transcription (Figure 5a, Supp. Figure Y3).
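The TSS-proximal versus TSS-distal partition used above can be sketched as a nearest-TSS lookup. This is an illustration under stated assumptions, not the pipeline used for the reported percentages: positions are single coordinates on one chromosome, the function names and toy coordinates are hypothetical, and the choice to treat the 250bp and 2kb thresholds as inclusive is ours.

```python
from bisect import bisect_left

def nearest_tss_distance(pos, sorted_tss):
    """Distance in bp from pos to the closest entry of a non-empty sorted TSS list."""
    i = bisect_left(sorted_tss, pos)
    neighbours = sorted_tss[max(i - 1, 0): i + 1]  # TSS on either side of pos
    return min(abs(pos - t) for t in neighbours)

def classify_dhs(pos, sorted_tss):
    """Label a DHS by its distance to the nearest TSS."""
    d = nearest_tss_distance(pos, sorted_tss)
    if d <= 250:
        return "within_250bp"
    if d <= 2000:
        return "within_2kb"
    return "distal"

# Toy coordinates (hypothetical), all on one chromosome.
tss = [1000, 5000, 20000]
dhs = [1100, 6500, 12000, 20010]
labels = [classify_dhs(p, tss) for p in dhs]
print(labels)
```

Note that the 64% figure in the text is cumulative, so in this labelling it corresponds to the "within_250bp" and "within_2kb" classes together.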
Of the 36% of DHSs that are TSS-distal (>2kb away), most (60%) are positioned within annotated expressed genes (5’
UTRs and intronic regions, Supp. Figure Y2). These gene-body DHSs are distinguished from TSS-proximal
DHSs by the paucity of H3K4me3, higher levels of both H3K4me1 and H3K36me1 (clusters 4, 5, Figure 5a,
Supp. Figure Y4), and enrichments for H3K18ac, H3K27ac and other marks linked to chromatin state 3 (Figure
1a). An additional 20% of the TSS-distal DHSs are positioned outside of annotated genes, but show signatures
associated with active transcription start sites or elongating gene bodies, suggesting the presence of new
alternative promoters or unannotated genes (Supp. Figure Y5, Y6). The remaining 20% of TSS-distal DHSs
appear intergenic (6% of all DHSs); these sites are typically enriched for H3K4me1, but lack other active marks
(cluster 6, Figure 5a).
Most DHS positions (82%) fall into the TSS-proximal state 1 or the intron-biased state 3 (Figure 5c). State 3
lacks H3K4me3 and is enriched for H3K4me1/H3K27ac/H3K18ac, similar to mammalian enhancer
elements12,36. Further, we find that many state 3 DHS positions bind regulatory proteins: GAGA factor binds to
49% of the DHS sites in S2 cells, and developmental transcription factors bind to 44% of the DHS sites in
embryos37. TSS-distal DHSs in Drosophila exhibit low-level bi-directional transcripts (Figure 5a, shortRNA
panels), analogous to the enhancer RNAs (eRNAs) characterized in mice38. Strand-specific GRO-seq profiles
confirm the presence of antisense transcripts at intragenic DHS sites (Figure 5f, Supp. Figures Y7, Y8), whose
levels are an order of magnitude lower than sense strand transcripts from these expressed genes, as observed for
murine enhancers38. A similar magnitude of bi-directional transcription is also observed at intergenic DHSs
(Figure 5f, Supp. Figure Y4). Together, these results demonstrate that eRNA-like transcripts are a common
feature of TSS-distal DHSs in Drosophila, a feature that is conserved with mammals.
The association of DHSs with chromatin states 1 and 3 (Figure 5d) is maintained across the genome, even in
chromosome 4 and pericentromeric heterochromatin, where they are infrequently found (Supp. Figure Y9). This
suggests that these chromatin states and associated remodeling factors (e.g. ISWI, SPT16) provide the context
necessary for non-histone chromosomal protein binding at DHSs, or are the consequence of such binding
events. To investigate the relationship between chromatin states and the presence of DHSs, we analyzed a high-
confidence set of loci that exhibit a DHS in only one of the two examined cell lines (Supp. Figure Y10).
Surprisingly, although most DHSs are positioned in state 1 regions (Figure 5c), 91% of the cell type-specific
DHSs are found within state 3 domains (14-fold increase compared to state 1 DHSs; Supp. Table Y1, Figure
5e). Comparison with DHSs in an additional cell type (Kc167, Supp. Figure Y11) confirms that state 3 is highly
enriched for DHSs that display plasticity between cell types. In the absence of DHSs, the altered loci maintain
chromatin state 3 in 23% of the cases, indicating that the presence of state 3 is not always dependent on the
DHS. More frequently, the altered loci transition to an open chromatin state 4 (43% of the cases, Figure 5e) that
lacks many histone modifications and chromatin remodelers (dMi-2, SPT16, Supp. Figure G5,G6) characteristic
of the enhancer-like state 3. Less common are transitions to the Polycomb state 6 (7%) or background state 9
(17%) that typically coincide with gene silencing. Most of the genes associated with loci that maintain state 3 or
transition to open chromatin state 4 remain transcriptionally active (Supp. Figure Y12). These observations,
combined with previously described findings, support an enhancer-like function for state 3 DHSs, but also
suggest a more subtle regulatory role than simple linkage to the presence or absence of gene expression.
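The state-transition comparison described above amounts to tabulating, for loci that carry a DHS in only one cell line, which chromatin state the same locus has in the cell line lacking the DHS. A minimal sketch follows, assuming each locus has already been assigned one state per cell line; the function name and toy state calls are hypothetical and do not reproduce the reported percentages.

```python
from collections import Counter

def fate_fractions(state_with_dhs, state_without_dhs, from_state):
    """For loci in `from_state` where the DHS is present, return the fraction
    found in each state in the cell line where the DHS is absent."""
    fates = Counter(
        b for a, b in zip(state_with_dhs, state_without_dhs) if a == from_state
    )
    total = sum(fates.values())
    return {state: n / total for state, n in fates.items()} if total else {}

# Toy state calls (hypothetical) for five cell type-specific DHS loci.
with_dhs = [3, 3, 3, 3, 1]
without_dhs = [3, 4, 4, 9, 1]
fates = fate_fractions(with_dhs, without_dhs, from_state=3)
print(fates)
```

Here the entry for state 3 is the fraction of loci that keep the enhancer-like state without the DHS, and the entry for state 4 is the fraction that move to the H3K36me1-enriched state.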
Chromatin-based elucidation of genome organization and gene function
The genomic chromatin state annotation and the discovery of refined chromatin signatures for chromosomes,
domains and subsets of regulatory genes demonstrate the utility of a systematic, genome-wide profiling of an
organism that is already understood in considerable detail. Clearly, the definition and functional annotation of
chromatin patterns will be enhanced by incorporation of data for different types of components. Five ‘colors’ of
chromatin were recently identified in Kc167 cells using chromosomal protein maps39. Comparison with the 9
state model indicates correlation of some features, as well as considerable differences between the resulting
states and classes of functional elements (Supp. Figure Y13), suggesting that further integration of such data in
the same cell type may allow resolution of additional functional features. Our results illustrate the utility of
integrating multiple data types (histone marks, non-histone proteins, chromatin accessibility, short RNAs, and
transcriptional activity) for comprehensive characterization of functional chromatin states.
An important, repeated theme is that chromatin state analysis identifies unexpected distinctions between subsets
of genes. One key finding is the identification of classes of active genes with distinct chromatin patterns.
Besides the differences linked to genomic context (e.g., male X chromosome, heterochromatin), the main
source of variability is the presence of the acetylation-rich state 3 (Figure 6). State 3 is enriched in long intronic
regions, distinguishing an important class of genes with extensive 5’-end introns that frequently encode proteins
with regulatory and developmental functions. Several lines of evidence suggest that the specific positions within
the intronic regions marked by state 3 are important for gene regulation. State 3 regions show specific
associations with known chromatin remodelers (SPT16, dMi-2 and ISWI), gene regulatory proteins (e.g. GAF,
dCBP/p300), as well as higher rates of nucleosome turnover than any other chromatin state, including the active
TSS state 1. Furthermore, state 3 regions show the highest rates of transcription-dependent deposition of the
H3.3 variant, well above that observed in the elongation-associated state 2. State 3 genes are also bound by
cohesin complex proteins, which may preferentially target remodeled chromatin fibers that lack higher-order
packaging19, and promote looping interactions with promoter regions20.
The high density of DHSs within these regions also underscores the likely regulatory role of state 3 chromatin.
Strikingly, state 3 regions contain a number and density of DHSs comparable to those of the active TSS state 1, but
appear to account for most of the DHS plasticity observed between the S2, BG3, and other cell types. The
combinations of histone marks distinguishing state 3 from state 1 DHS positions (H3K4me1/me3 disparity,
presence of H3K27ac, H3K18ac) are signatures of mammalian enhancers12, which also show high variability
between cell types36. Furthermore, many of the state 3 DHSs bind regulatory proteins, and exhibit low levels of
short, non-coding bidirectional transcripts reminiscent of eRNAs identified in mice38. The majority of the TSS-
distal DHSs are found within the bodies of transcribed genes, most likely a result of the compactness
of the Drosophila genome; hence the chromatin profiles surrounding these sites are not as isolated as in
mammals. Together, these findings suggest that state 3 regions most likely mark sites containing enhancers or
other regulatory elements.
The presence of gene subclasses with different chromatin signatures is not limited to transcriptionally active
genes. Genes within Polycomb domains also display several distinct functional states. A small subset of PcG
target genes is maintained in a ‘balanced state’ marked by binding of ASH1 and other proteins, and full-length
expression at low-to-moderate levels28. Other genes within PC domains do not produce full-length transcripts;
however, we demonstrate chromatin signatures that distinguish a class of genes with PREs immediately
adjacent to TSS positions, a substantial fraction of which (53%) are maintained in a transcriptionally paused
state. The repertoire of chromatin signatures observed at TSSs of PcG target genes (Figure 4a) may represent
stages in a progression from fully repressed to fully activated. Alternatively, distinct signatures might mark
subsets of regulatory genes that require either long-term repression or the ability to reverse functional states,
depending on environmental or developmental cues.
In summary, we note that comprehensive analysis of chromatin signatures has enormous potential for
annotating functional elements in both well-studied and new genomes, such as promoters, enhancers, gene
bodies and large domains. Going forward, our systematic characterization of the epigenomic and transcriptional
properties of Drosophila cells should spur in-depth experimental analyses of the relationship between chromatin
states and genome functions, ranging from whole chromosomes down to individual regulatory elements and
circuits.
Figure Legends
Figure 1. Chromatin annotation of the Drosophila melanogaster genome.
a. A 9 state model of prevalent chromatin states found in S2 cells. Each chromatin state (row) is defined by a
combinatorial pattern of enrichment (red) or depletion (blue) in specific chromatin marks (first panel, columns).
For instance, state 1 is distinguished by enrichment in H3K4me2/me3 and H3K9ac, which is typical of
transcription start sites (TSS) in expressed genes. The enrichments/depletions are shown using a log2 scale,
normalized relative to chromatin input (see Supp. Figure S3 for BG3 data and for normalization relative to
histone density). The second panel shows average enrichment of chromosomal proteins for each state. The third
panel shows fold over/under-representation of genic and TSS-proximal (±1kb) regions relative to the entire tiled
genome. The fold enrichment of intronic regions is relative to genic regions associated with each state.
b. A genome-wide karyotype view of the domains defined by the 9 state model in S2 cells. The centromeres are
shown as open circles, and the dashed lines span gaps in the genome assembly. This illustrates several
prominent chromatin organization features (color code in a), including the extent of pericentromeric
heterochromatin (dark blue), and the unique H4K16ac-driven signature of the dosage-compensated male X
chromosome (green). See Supp. Figure S4 for the BG3 genome.
c-e. Examples of chromatin annotation at specific loci. c. illustrates two distinct chromatin signatures of
transcriptionally active genes: one (left) is associated with extensive enrichment in acetylation and
monomethylation marks of states 3 and 4, while the smaller gene (right) is limited to only states 1 and 2 that
recapitulate well-established TSS and elongation signatures (note: small patches of state 7 in CG13185 illustrate
increased H3K9me2 seen at some expressed genes in S213). d. illustrates a locus containing two Polycomb-
associated domains (dashed boxes). The domain on the left contains the transcriptionally repressed vg gene,
which also shows some H3K4me1 enrichment centered on PSC binding sites (PRE). The domain on the right
contains the Su(z)2 gene and illustrates a rare example of a balanced state in which low-level activation of PC-
bound genes by ASH1 and Trithorax proteins leads to the presence of chromatin state 3 signatures.
e. (BG3 cells) An example of a large state 8 domain located within euchromatic sequence, which is enriched for
chromatin marks typically associated with heterochromatic regions, but at lower levels than in pericentromeric
heterochromatin (state 7).
Figure 2. Visualization using compact folding illustrates the spatial organization of chromatin
annotations.
a. To illustrate the scales and relative locations of domains associated with different chromatin states, the
chromosome is folded using a geometric pattern (Hilbert space-filling curve) that maintains spatial proximity of
nearby regions. The schematic illustration of the first four folding steps is shown, with blue dots designating 5’
and 3’ ends of the sequence. While this compact curve is optimal for preserving proximity relationships, some
distal sites can appear next to each other along the fold axis (green dots).
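The folding in panel a can be illustrated with the standard index-to-coordinate mapping for a Hilbert curve. The following is a minimal sketch, not the code used to draw the figure; the function name `hilbert_d2xy` and the idea of assigning one genomic bin to each curve position are assumptions made for illustration.

```python
from typing import Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve of the given order to grid
    coordinates (x, y) on a 2**order by 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so consecutive pieces connect.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Hypothetical usage: place 64 consecutive genomic bins on an 8 x 8 grid.
pixels = [hilbert_d2xy(3, bin_index) for bin_index in range(64)]
```

Consecutive bins always land on neighboring grid cells, which is why a contiguous domain appears as a compact patch. The converse does not hold: neighboring cells may correspond to distant bins, as noted for the green dots.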
b. Chromosome 3L in S2 cells. A domain of a given chromatin state appears as a patch of uniform color of
corresponding size. Note that some of the patches may combine distal chromosomal locations together and
would not represent coherent functional units; thin black lines (finer scale not shown) are used to separate
regions that are distant on the chromosome. Prominent features, such as the large pericentromeric domain
(lower right, dark blue) or blocks of Polycomb-mediated repression (dark gray), are easily distinguished. The
folded view also illustrates chromatin organization features that cannot be easily discerned from a linear view.
For instance, transcription start sites of expressed genes (state 1, red) appear as small specks typically
surrounded by elongation linked state 2 (purple), commonly embedded within larger regions marked by
H3K36me1-driven state 4 (light brown), which also contains patches of intron-associated state 3 (dark brown).
Such open chromatin regions appear as clustered blocks on the chromosome, separated by extensive domains of
state 9 (light-gray), which is largely devoid of transcription. The folded views can be browsed alongside the
linear annotations and other relevant data online15.
c. Other chromosomes. All autosomes show overall organizational patterns similar to chromosome 3L (see
Supp. Figure S6 for full-size view). Note that the chromosome 3R assembly does not reach the pericentromeric
heterochromatin. The chromatin state of the male X chromosome is strikingly different, with abundance of the
dosage-compensation driven state 5 (green) and increased low-level heterochromatic signatures (state 8, light
blue). See Supp. Figure S7 for BG3.
Figure 3. Chromatin patterns associated with transcriptionally active genes.
a. Location and extent of chromatin features relative to boundaries of expressed genes (over 1kb in size) in BG3
cells. The color intensity indicates the relative frequency with which enrichment/depletion of a given mark
occurs within the gene (normalized independently for each mark). Regions of enrichment/depletion for many
marks, such as H3K9ac or H3K27ac, are localized within TSS-proximal regions. Other marks appear
downstream of TSSs (H3K4me1, H3K36me1), extending further within the gene body. H2B-ubiquitination
typically covers the exact extent of the transcript uniformly.
b. Regions enriched for ‘active’ chromatin marks in long transcribed genes. The plot shows the extent of
regions enriched for various active marks for long (>4kb), transcriptionally active genes on BG3 autosomes.
Each row represents a gene. The first column illustrates coding exons within each gene using the same spatial
scaling; the last column shows chromatin state annotation. The clustering of the genes according to the spatial
patterns of chromatin marks separates genes with a high fraction of coding sequence (red subtree, bottom) from
genes containing long intronic sequences (green subtrees, top). The latter are associated with H3K18ac,
frequent H3K36me1, and extensive H3K9ac and H3K27ac domains. These histone marks correspond to the
presence of chromatin state 3 (last column) within the large intronic regions. The presence of these histone
marks is also associated with binding of specific chromosomal proteins, such as the tight coupling of H3K18ac
enrichment and binding of a Cohesin loading protein Nipped-B. A subset of such genes is also bound by the
ASH1 histone methyltransferase. Both of these complexes are strongly associated with the chromatin state 3
genome-wide (see Supp. Figure G2 for extended analysis, including renormalization by histone density).
Figure 4. Chromatin and transcriptional signatures of TSSs within domains of Polycomb-mediated
repression.
a. Distinct classes of TSSs in S2 Polycomb domains. Each row represents a TSS. Clusters 1-5 illustrate distinct
TSS states (see Supp. Figure P2 for a complete set of clusters): cluster 1 shows fully repressed TSSs with the
expected pattern of PC and H3K27me3 enrichment; cluster 2 shows 21 TSSs found within ASH1 domains,
which are maintained in a “balanced” state and are transcribed at moderate levels. Clusters 3 and 4 distinguish
TSSs located in the immediate proximity of Polycomb response elements (PREs), showing the symmetric
H3K4me1/me2 enrichment typical of all PREs (Supp. Figure P1). Many such TSSs (cluster 3, 42 TSSs)
produce short, non-polyadenylated transcripts along the sense strand (GRO+/shortRNA+ columns), indicating
that these genes are maintained in a paused polymerase state.
b. PRE positions distant from annotated TSSs. TSS-distal PREs exhibit enrichment for H3K4me1/me2, but
unlike TSS-proximal PREs, are not associated with GRO or shortRNA signatures.
c. An example of a Polycomb domain on chromosome 2L containing several PREs (dashed vertical lines).
While all PREs are associated with H3K4me1/me2 enrichment, only the PRE located immediately near a TSS
(middle) shows shortRNA and GRO-seq signals indicating paused polymerase activity.
d. The magnitude of shortRNA peaks at PRE-proximal TSSs is similar to that found at expressed genes. The
plot compares distributions of shortRNA peaks of different TSSs using cumulative distribution functions. The
plot demonstrates that the distribution of shortRNA peak magnitudes found at the PRE-proximal TSSs (red)
closely matches those seen at expressed genes (green), which are much higher than levels associated with silent
genes (blue) or random genomic positions (gray).
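The cumulative distribution comparison in panel d can be sketched as follows. This is an illustrative example only: the function name `ecdf` and the input values are hypothetical, and the actual shortRNA peak magnitudes are not reproduced here.

```python
from bisect import bisect_right
from typing import List, Sequence


def ecdf(values: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Return, for each grid point, the fraction of values at or below it."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]


# Hypothetical log10 peak magnitudes for two TSS classes.
silent = [0.1, 0.2, 0.4, 0.5]
pre_proximal = [1.5, 2.0, 2.5, 3.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
silent_cdf = ecdf(silent, grid)
pre_cdf = ecdf(pre_proximal, grid)
```

A class with higher peak magnitudes has a curve shifted to the right, so at a given level a smaller fraction of its sites falls below that level.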
e. Transcriptional signals are a feature specific to TSS-proximal PREs. The distribution of shortRNA peak
magnitudes shows that a substantial fraction (44%) of TSS-proximal (<1kb) PREs are associated with
shortRNA signals while such signals are generally not observed for TSS-distal PREs.
Figure 5. Chromatin signatures of regulatory elements identified by DNaseI hypersensitivity.
a. Representative classes of high-magnitude DNaseI hypersensitive sites (DHSs) compared to the combinatorial
chromatin signatures in S2 cells. TSS-proximal DHSs show chromatin signatures expected of proximal
promoter regions in expressed genes: high H3K4me3 enrichment and RNA Pol II signal extending in the
direction of transcription (see Supp. Figure Y3 for a complete set of clusters). By contrast, TSS-distal DHSs are
associated with high H3K4me1 and low H3K4me3 levels. Most TSS-distal DHSs that are found within the
bodies of expressed genes are associated with chromatin state 3 (high H3K18ac, H3K27ac; Figure 1a) and vary
in the magnitude of enrichment for some of these marks (H2B-ubiquitination levels in cluster 4 vs. cluster 5, see
Supp. Figure Y4 for a complete set of clusters). A cluster of rare intergenic DHSs (cluster 6) is associated with
localized peaks of H3K4me2 (see Supp. Figure Y6).
b. DHS positions relative to annotated gene structures in S2 and BG3 cells. The majority of the DHSs (64%) are
found in TSS-proximal (≤2kb) regions, including 43% found within 250bp of TSSs, 10% within 2kb proximal
promoter regions, and a large fraction of the DHS positioned within the 5’ UTRs. Most DHS positions outside
the TSS-proximal regions (>2kb away) are found within gene bodies, with only 12% of all DHS located in the
intergenic regions (including 3’-proximal positions located within the 2kb of the 3’ gene ends). Many such
positions, however, show transcriptional signatures consistent with the presence of currently unannotated genes
(Supp. Figure Y5).
c. Distribution of DHS positions among chromatin states. In both S2 and BG3 cell lines, the vast majority of
DHS positions are found within the TSS-proximal state 1 or enhancer-like state 3 regions.
d. States 1 and 3 exhibit the highest density of DHSs. On average, one DHS is found per 2.2kb of sequence
associated with state 1, and one per 3.7kb of state 3. The density is markedly lower in all other chromatin states.
e. Cell line-specific DHS differences are positioned predominantly within the enhancer-like state 3. The matrix
shows the chromatin state of loci containing DHSs in one cell line (x-axis) and the state of the same locus in the
other cell line where the DHS is absent (y-axis). While the majority of DHSs are distributed between states 1
and 3, most (91%) of the DHSs that differ between the cell lines are found in the enhancer-like state 3. When
DHSs are not present, the loci most commonly transition to an open chromatin state 4 (43%), or maintain
enhancer-like state 3 (23%). In both scenarios, most of the associated gene loci remain transcriptionally active
(see Supp. Figure Y12).
f. Low-magnitude non-coding RNA transcripts are associated with DHS positions. The presence of transcription
at the TSS-distal DHSs (see shortRNA in a.) is confirmed by the GRO-seq assay. The average GRO-seq
profiles illustrate the bi-directional transcription originating from the DHS positions. The top plot shows local
increase in the antisense GRO-seq signal for the DHSs located within transcribed genes; positive strand GRO-
seq signal is shown for genes transcribed along the negative strand, and negative strand GRO-seq signal is
shown for genes transcribed along the positive strand (see Supp. Figure Y7); dashed lines show median levels.
The DHS-associated signals are an order of magnitude lower than those observed for mRNA elongation. The
intergenic DHS positions (bottom plot) also show bi-directional GRO-seq signal of comparable magnitude (see
Supp. Figure Y5).
Figure 6. Spatial arrangements of chromatin states associated with active transcription.
a. A graphic representation of chromatin patterns associated with different types of expressed genes. Unlike
short or exon-rich expressed genes (right panel) which are primarily associated with TSS-proximal chromatin
state 1 and an elongation state 2, expressed genes with long intronic regions typically contain one or more
regions of enhancer-like state 3. The state 3 regions are bound by distinct chromatin remodeling enzymes
(SPT16, dMi-2), and show higher nucleosome turnover and H3.3 incorporation rates than either TSS-proximal
state 1 or elongation state 2 regions. Such long-intron genes are enriched for regulatory functions, are typically
found within more extensive domains formed by binding of Cohesin complexes, and are often also bound by
ASH1 histone methyltransferase. The state 3 regions also contain the majority of DHSs that vary between the
cell lines.
b. Network of transitions between chromatin states along expressed genes. The observed frequency of
transitions from one state to another along the body of expressed genes is shown using arrows of different
thickness. The most frequent transition is from active TSS state 1 directly into the elongation state 2. In long
genes, however, state 1 can be followed by a sequence of enhancer-like state 3 and open chromatin state 4,
eventually transitioning to the elongation state 2. While the 3’ end of expressed genes is typically marked by
state 2, proximity of neighboring promoters makes transition from state 2 to state 1 relatively common within
the genome.
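The transition frequencies summarized in panel b can be tallied along the following lines. This is a sketch under stated assumptions: each gene is represented as a 5' to 3' list of per-bin state labels, only changes of state are counted, and the function name and example genes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def transition_frequencies(genes: Iterable[List[int]]) -> Dict[Tuple[int, int], float]:
    """Count state changes between adjacent bins along each gene and
    return each (from_state, to_state) pair as a fraction of all changes."""
    counts: Counter = Counter()
    for states in genes:
        for a, b in zip(states, states[1:]):
            if a != b:  # ignore bins that stay in the same state
                counts[(a, b)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}


# Hypothetical genes: a short gene (1 -> 2) and a long gene (1 -> 3 -> 4 -> 2).
example = [[1, 1, 2, 2], [1, 3, 3, 4, 4, 2]]
freqs = transition_frequencies(example)
```

The resulting fractions could be mapped to arrow thickness in a network drawing.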
Table 1. List of examined experimental measurements.
The table lists histone marks, chromosomal proteins and sequencing data analyzed in the manuscript. See Supp.
Table S1 for the antibody information.
Histone marks: H3K36me3, H4K16ac, H3K79me1, H2B-ubiq, H3K4me2, H3K4me3, H3K9ac, H3K79me2, H3K27ac, H3K23ac, H3K36me1, H3K4me1, H3K18ac, H3K27me3, H1, H4, H3K9me2, H3K9me3, H4K5ac*, H4ac Tetra*, H4K8ac*
Chromosomal proteins: PSC, (EZ), PC, ASH1, RNA Pol II, HP1a, HP1c, Nipped-B+, ISWI, NURF301*, SPT16*, dMi-2*, SU(VAR)3-9, MRG15, dRING, CHROMATOR, GAF
Sequencing data: RNA-seq, GRO-seq*, DNaseI-seq, short-RNA#
*measured only in S2 cells; +BG3 cell data from Misulovin et al.19; #S2 data from Nechaev et al.29.
Methods Summary
Histone modification and chromosomal protein antibodies were characterized for cross-reactivity with non-
target proteins or modifications using Westerns to nuclear extracts, peptide slot/dot blots, mass spectrometry,
immunofluorescence staining, and RNAi depletion6. ChIP-chip was performed in duplicate on chromatin
extracts from cells as described previously27, and IP’d DNA was amplified using whole genome amplification,
then hybridized to Affymetrix Drosophila Tiling 2.0R Arrays. Digital DNaseI-seq assays were performed as
described previously40, and Global Run-On library (GRO-seq) data was generated as described in Core et al.30.
Short RNA data was generated by Nechaev et al.29, and RNA-seq data was generated by Graveley et al.41.
The chromatin state models were generated using the distributions of 18 different histone marks, using only
data sets that passed strict statistical criteria for replicate consistency. The 9 state model utilized average
enrichment levels in 200bp bins, based on unsmoothed M values. Contiguous regions of enrichment for
individual marks were determined using a three-state hidden Markov model (HMM) (corresponding to enriched, neutral,
and depleted profiles). DHS positions were determined as read density peaks, significantly enriched relative to
the genomic DNA control. Clustering of chromatin signatures around TSSs, PREs, and DHSs was determined
using the PAM algorithm, and average values utilized ±2kb bins. Further details about data processing and
computational analyses are described in the online supplementary information.
Methods
Growth conditions The ML-DmBG3-c2 cells were obtained from DGRC and are described at https://dgrc.cgb.indiana.edu/; the S2-DRSC cells were from DRSC (http://www.flyrnai.org/). All cell lines were grown to a density of ~5×10^6 cells/ml in Schneider's media (Gibco) supplemented with 10% FCS (HyClone). 10 µg/ml insulin was added to the ML-DmBG3-c2 media.
Antibodies The antibodies used are listed in Supplemental Table S1. Commercial antibodies against modified histones were tested by Western blot for the lack of cross-reactivity with the corresponding recombinant histone produced in E. coli and with non-histone proteins of embryonic nuclear extract. The antibody specificity was further assayed by Western dot/slot blot against a panel of synthetic modified histone peptides. Antibodies that showed 50% or more of their total activity directed against non-histone embryonic proteins, or less than 5-fold higher affinity to the corresponding histone peptide, were not used in our ChIP experiments. The specificity of antibodies against chromosomal proteins was tested by Western blots with nuclear extracts prepared from mutant flies or S2 cells subjected to RNAi knockdown. RNAi was done as described by Clemens et al.42. An antibody was considered specific if it recognized a major band of expected mobility that was absent in the sample prepared from mutant flies or diminished 2-fold or more upon RNAi knockdown. For the chromatin proteins for which only one antibody was available, we performed validation by comparing its genomic distribution with the published distribution of a different component of the same protein complex or with published genomic distributions generated with a different antibody. When no published data was available, the distribution of a chromosomal protein was mapped with two antibodies generated against different epitopes to ensure accuracy (see Supp. Figure G7).
ChIP and microarray hybridization The preparation of crosslinked chromatin from cultured cells was performed as described in Schwartz et al.27 with the following modifications. Prior to ultrasound shearing the cells were permeabilized with 1% SDS and the shearing was done in TE-PMSF (0.1% SDS, 10mM Tris-HCl pH8.0, 1mM EDTA pH8.0, 1mM PMSF) using a Bioruptor (Diagenode) (2 x 10 min, 1 x 5 min; 30sec on, 30 sec off; high power setting). ChIP was performed as described in Schwartz et al.27 and IP’d DNA was amplified using the whole genome amplification kit (WGA2, Sigma) according to the manufacturer’s instructions, except that the chemical fragmentation step was omitted. The amplified material was labeled and hybridized to Drosophila Tiling Arrays v2.0 (Affymetrix) according to Schwartz et al.27.
Processing of ChIP data At least two independent biological replicates were assessed for each ChIP profile. The log2 intensity ratios (M values) were calculated for each replicate. The profiles were smoothed using local regression (lowess) with
500bp bandwidth, and the genome-wide mean was subtracted. The regions of significant enrichment were determined as clusters of at least 1kb in length, with gaps no more than 100bp where M value exceeds a statistically significant (0.1% FDR) enrichment threshold. The set of biological replicates was deemed consistent if the enriched regions from individual experiments had a 75% reciprocal overlap, or if at least 80% of the top 40% of the regions identified in each experiment were identified in the other replicate (before comparison the replicates were size-equalized by increasing the significance threshold for a replicate with more enriched sequence). The data from individual replicates were then combined using local regression smoothing, and used for all of the presented analysis, unless noted otherwise.
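The region-calling step described above can be sketched as follows. This is one reading of the text, not the authors' implementation: probes whose M value exceeds the threshold are chained together when adjacent above-threshold probes are no more than 100bp apart, and chains spanning at least 1kb are reported. The function name, the input layout and the example threshold are assumptions.

```python
from typing import List, Sequence, Tuple


def enriched_regions(
    positions: Sequence[int],
    m_values: Sequence[float],
    threshold: float,
    min_length: int = 1000,
    max_gap: int = 100,
) -> List[Tuple[int, int]]:
    """Cluster above-threshold probes (sorted by position) into regions.

    Adjacent above-threshold probes are merged when they are at most
    max_gap bp apart; regions shorter than min_length bp are discarded.
    """
    regions: List[Tuple[int, int]] = []
    start = end = None
    for pos, m in zip(positions, m_values):
        if m <= threshold:
            continue
        if start is None:
            start = end = pos
        elif pos - end <= max_gap:
            end = pos
        else:
            if end - start >= min_length:
                regions.append((start, end))
            start = end = pos
    if start is not None and end - start >= min_length:
        regions.append((start, end))
    return regions


# Hypothetical probes every 50bp: one long enriched block and one short one.
probe_pos = list(range(0, 3000, 50))
probe_m = [1.0 if p <= 1200 or 1450 <= p <= 1900 else 0.0 for p in probe_pos]
called = enriched_regions(probe_pos, probe_m, threshold=0.5)
```

In the text the threshold corresponds to a 0.1% FDR cutoff on the M value; how that cutoff is estimated is not covered by this sketch.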
DNaseI hypersensitivity Digital DNaseI-seq assays were performed as described previously40. The sequenced reads were aligned to the dm3 genome assembly, recording only uniquely mappable reads. To detect DNase I hypersensitive sites, hotspot positions were identified based on a 300bp scanning window statistic (Poisson model relative to 50kb background density, Z-score threshold of 2), and peaks of read density were selected within the hotspots using randomization-based thresholding at 0.1% FDR. The set of high-magnitude DHSs analyzed in the manuscript (except for Supp. Figure Y1) was then determined as a subset of all identified peaks that show statistically significant enrichment over the normalized genomic DNA read density profile (using a 300bp window centered around the peak, binomial model, with Z-score threshold of 3). This method controls for copy number variation and sequencing/mapping biases; however, it may also reduce the sensitivity of DHS detection. In the DHS chromatin profile clustering analysis (Figure 5a, relevant supplementary figures), DHSs found within 1kb of another DHS were excluded if their enrichment magnitude (relative to genomic background) was lower (to avoid showing the same region more than once).
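The scanning-window statistic can be illustrated with a simple Z-score. The exact form of the statistic is not given in the text, so the sketch below assumes a normal approximation to the Poisson model, with the expected count in a 300bp window scaled from the read count in the surrounding 50kb. The function name and the read counts are hypothetical.

```python
import math


def window_z_score(
    window_reads: int,
    background_reads: int,
    window_bp: int = 300,
    background_bp: int = 50_000,
) -> float:
    """Z-score of a window read count under a Poisson background model.

    The expected count is the background density scaled to the window
    size; for a Poisson variable the variance equals the mean.
    """
    expected = background_reads * window_bp / background_bp
    if expected <= 0:
        return float("inf") if window_reads > 0 else 0.0
    return (window_reads - expected) / math.sqrt(expected)


# Hypothetical counts: 5000 reads in the 50kb background give an
# expectation of 30 reads per 300bp window.
z = window_z_score(window_reads=41, background_reads=5000)
is_hotspot = z > 2  # hotspot threshold quoted in the text
```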
RNA sequencing The preparation of RNA-seq libraries and sequencing is described in Graveley et al.41. The sequenced reads were aligned to the dm3 genome assembly and annotated exon junctions, recording only uniquely mappable reads. The RPKM (reads per kilobase of exonic sequence per million reads mapped) was estimated for each exon. The total transcriptional output of each annotated gene was estimated based on the maximum of all exons within the gene. The presented analysis uses $\log_{10}(\mathrm{RPKM} + 1)$ values unless otherwise noted.
GRO sequencing Global Run-On library was prepared from S2 cells and sequenced as described by Core et al.30. The obtained reads were aligned to the dm3 genome assembly, recording only uniquely mappable reads. The smoothed profiles of reads mapping to each strand were calculated using Gaussian smoothing ($\sigma = 100$bp). The presented analysis uses $\log_{10}(d + 1)$, where $d$ is the smoothed density value.
Short RNA data processing The short RNA data for S2 cells was generated by Nechaev et al.29, and was aligned and processed in the same way as the GRO-seq data.
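The expression and density transformations described above can be sketched as follows. This is a minimal illustration, not the processing pipeline used for the paper: the function names are hypothetical, and the truncation of the Gaussian kernel at three standard deviations and its renormalization at sequence edges are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def rpkm(read_count: float, exon_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of exonic sequence per million mapped reads."""
    return read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)


def gene_expression(exons: Sequence[Tuple[float, float]], total_mapped_reads: float) -> float:
    """log10(RPKM + 1) of the highest-RPKM exon; exons are (reads, length_bp)."""
    best = max(rpkm(count, length, total_mapped_reads) for count, length in exons)
    return math.log10(best + 1)


def gaussian_smooth(values: Sequence[float], sigma_bp: float = 100.0, bin_bp: int = 1) -> List[float]:
    """Gaussian-weighted moving average of a binned read-density profile."""
    radius = int(3 * sigma_bp / bin_bp)  # assumed truncation at 3 sigma
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k * bin_bp / sigma_bp) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                num += w * values[j]
                den += w  # renormalize near the edges
        smoothed.append(num / den)
    return smoothed


# Hypothetical gene with two exons and 10 million mapped reads.
expr = gene_expression([(100, 2000), (990, 1000)], 10_000_000)
# Hypothetical strand-specific density in 50bp bins, then log10(d + 1).
profile = [math.log10(d + 1) for d in gaussian_smooth([2.0] * 50, bin_bp=50)]
```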
Chromatin state models To derive a nine-state joint chromatin state model for S2 and BG3 cells (Figure 1a), the genome was first divided into 200bp bins, and the average enrichment level was calculated within each bin based on unsmoothed M values taking into account individual replicates, using all histone enrichment profiles and PC to discount the genome-wide difference in S2 H3K27me3 profiles. The bin-average values of each mark were shifted by the genome-wide mean, scaled by the genome-wide variance, and quantile-normalized between the two cell types. The HMM with multivariate normal emission distributions was then fitted with the Baum-Welch algorithm using data from both cell types, starting from 30 seeding configurations determined with K-means clustering. States with minor intensity variations (Euclidean distance of mean emission values < 0.15) were merged. Larger models (up to 30 states) were examined, and the final number of states was chosen for optimal interpretability.
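Two of the preprocessing steps described above, bin averaging and quantile normalization between the two cell types, can be sketched as follows. This is an illustration under simplifying assumptions: a single mark, equal numbers of bins in both cell types, arbitrary handling of ties, and hypothetical function names. The mean shift, the scaling step and the HMM fitting itself are not shown.

```python
from typing import Dict, List, Sequence, Tuple


def bin_average(positions: Sequence[int], m_values: Sequence[float], bin_bp: int = 200) -> Dict[int, float]:
    """Average probe M values within fixed-width genomic bins."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for pos, m in zip(positions, m_values):
        b = pos // bin_bp
        sums[b] = sums.get(b, 0.0) + m
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def quantile_normalize(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Give two equal-length profiles the same value distribution by
    replacing each value with the mean of the two values at its rank."""
    reference = [(x + y) / 2 for x, y in zip(sorted(a), sorted(b))]

    def remap(v: Sequence[float]) -> List[float]:
        order = sorted(range(len(v)), key=v.__getitem__)
        out = [0.0] * len(v)
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        return out

    return remap(a), remap(b)


# Hypothetical bin averages of one mark in the two cell types.
s2_bins = [1.0, 3.0, 2.0]
bg3_bins = [10.0, 30.0, 20.0]
s2_norm, bg3_norm = quantile_normalize(s2_bins, bg3_bins)
```

After normalization the two profiles share one distribution of values while each keeps its own ordering of bins.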
An extensive discrete chromatin state model (Supp. Figure S11) was calculated as described in Ernst et al.18. The model was trained using a 200bp grid with binary calls (enriched/not enriched). The binary calls were made based on a 5% FDR threshold determined from 10 genome-wide randomizations for each mark. For H1, H4 and H3K23ac, regions of significant depletion rather than enrichment were called.
Regions of enrichment for individual marks (Figure 3) To determine contiguous regions of enrichment for individual marks, a three-state HMM model was used, with states corresponding to enriched, neutral, and depleted profiles (normally-distributed emission parameters: $\mu = [-0.5,\ 0,\ 0.5]$, $\sigma^2 = 0.3$). The enriched regions were determined from the Viterbi path. The HMM segmentation was applied to unsmoothed M value data taking into account individual biological replicates.
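The segmentation step can be illustrated with a small Viterbi decoder that uses the emission parameters quoted above. The transition probabilities are not given in the text, so the self-transition probability `stay` and the uniform initial distribution below are assumptions, as is the function name; the handling of multiple replicates is not shown.

```python
import math
from typing import List, Sequence

MEANS = (-0.5, 0.0, 0.5)  # depleted, neutral, enriched (from the text)
VARIANCE = 0.3            # shared emission variance (from the text)


def viterbi(obs: Sequence[float], stay: float = 0.99) -> List[int]:
    """Most likely state path (0=depleted, 1=neutral, 2=enriched)."""
    if not obs:
        return []
    n = len(MEANS)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n - 1))

    def log_emit(x: float, k: int) -> float:
        return -0.5 * math.log(2 * math.pi * VARIANCE) - (x - MEANS[k]) ** 2 / (2 * VARIANCE)

    score = [math.log(1.0 / n) + log_emit(obs[0], k) for k in range(n)]
    back: List[List[int]] = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for k in range(n):
            cands = [score[j] + (log_stay if j == k else log_switch) for j in range(n)]
            best = max(range(n), key=cands.__getitem__)
            new_score.append(cands[best] + log_emit(x, k))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical M values: a depleted stretch followed by an enriched one.
path = viterbi([-1.0] * 10 + [1.0] * 10)
```

Contiguous runs of state 2 in the decoded path would correspond to the enriched regions.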
Classification of enrichment profiles (Figures 4,5) Clustering of chromatin signatures around TSSs (Figure 4a), PREs (Figure 4b), and DHSs (Figure 5a, relevant supplements) was determined using the PAM (partitioning around medoids) algorithm. For clustering, each profile was summarized with average values within bins spanning ±2kb regions: 100bp bins were used for the central ±500bp region, and 300bp bins outside it.
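The binning scheme used to summarize each profile before clustering can be sketched as follows. The function names and the representation of a profile as parallel lists of offsets and values are assumptions; the PAM clustering itself is not shown.

```python
from typing import List, Sequence, Tuple


def profile_bins(
    inner_half: int = 500,
    inner_bin: int = 100,
    outer_half: int = 2000,
    outer_bin: int = 300,
) -> List[Tuple[int, int]]:
    """Half-open bins [lo, hi) around a site at offset 0: fine bins in
    the central region, coarser bins in the flanks."""
    left = list(range(-outer_half, -inner_half, outer_bin))
    centre = list(range(-inner_half, inner_half, inner_bin))
    right = list(range(inner_half, outer_half, outer_bin))
    edges = left + centre + right + [outer_half]
    return list(zip(edges, edges[1:]))


def summarize(offsets: Sequence[int], values: Sequence[float], bins: Sequence[Tuple[int, int]]) -> List[float]:
    """Average the profile values falling into each bin (NaN if empty)."""
    summary = []
    for lo, hi in bins:
        inside = [v for o, v in zip(offsets, values) if lo <= o < hi]
        summary.append(sum(inside) / len(inside) if inside else float("nan"))
    return summary


bins = profile_bins()  # 5 + 10 + 5 = 20 bins spanning -2kb to +2kb
# Hypothetical flat profile sampled every 50bp around a site.
vector = summarize(list(range(-2000, 2000, 50)), [1.0] * 80, bins)
```

Each site then contributes one fixed-length vector per mark, and the concatenated vectors can be passed to a clustering routine.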
References
1 Celniker, S. E. et al., Unlocking the secrets of the genome. Nature 459 (7249), 927 (2009).
2 Adams, M. D. et al., The genome sequence of Drosophila melanogaster. Science 287 (5461), 2185 (2000); Hoskins, R. A. et al., Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science 316 (5831), 1625 (2007).
3 Clark, A. G. et al., Evolution of genes and genomes on the Drosophila phylogeny. Nature 450 (7167), 203 (2007).
4 Tweedie, S. et al., FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res 37 (Database issue), D555 (2009).
5 Felsenfeld, G. and Groudine, M., Controlling the double helix. Nature 421 (6921), 448 (2003); Mendenhall, E. M. and Bernstein, B. E., Chromatin state maps: new technologies, new insights. Curr Opin Genet Dev 18 (2), 109 (2008).
6 Egelhofer, T. A., Minoda, A., Klugman, S., Lee, K. et al., An assessment of histone-modification antibody quality. Nat Struct Mol Biol (in press).
7 Eissenberg, J. C. and Reuter, G., Cellular mechanism for targeting heterochromatin formation in Drosophila. Int Rev Cell Mol Biol 273, 1 (2009).
8 Schwartz, Yuri B and Pirrotta, Vincenzo, Polycomb complexes and epigenetic states. Current Opinion in Cell Biology 20 (3), 266 (2008).
9 Li, B., Carey, M., and Workman, J. L., The role of chromatin during transcription. Cell 128 (4), 707 (2007).
10 Liu, C. L. et al., in PLoS Biol (2005), Vol. 3, pp. e328; Barski, A. et al., in Cell (2007), Vol. 129, pp. 823.
11 Carrozza, Michael J et al., Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell 123 (4), 581 (2005).
12 Heintzman, N. D. et al., Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39 (3), 311 (2007).
13 Nicole C. Riddle, Aki Minoda, Peter V. Kharchenko, Artyom A. Alekseyenko, Yuri B. Schwartz, Michael Y. Tolstorukov, Andrey A. Gorchakov, Cameron Kennedy, Daniela Linder-Basso, Jacob D. Jaffe, Gregory Shanower, Mitzi I. Kuroda, Vincenzo Pirrotta, Peter J. Park, Sarah C. R. Elgin, Gary H. Karpen, Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin. (submitted).
14 Anders, S., Visualization of genomic data with the Hilbert curve. Bioinformatics 25 (10), 1231 (2009).
15 Drosophila chromatin browser, Available at http://compbio.med.harvard.edu/flychromatin, (2010).
16 MacAlpine, D. M., Rodriguez, H. K., and Bell, S. P., Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18 (24), 3094 (2004).
17 Blumenthal, A. B., Kriegstein, H. J., and Hogness, D. S., The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb Symp Quant Biol 38, 205 (1974).
18 Ernst, J. and Kellis, M., Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol 28 (8), 817 (2010).
19 Misulovin, Z. et al., Association of cohesin and Nipped-B with transcriptionally active regions of the Drosophila melanogaster genome. Chromosoma 117 (1), 89 (2008).
20 Kagey, M. H. et al., Mediator and cohesin connect gene expression and chromatin architecture. Nature 467 (7314), 430 (2010).
21 Beisel, C. et al., Histone methylation by the Drosophila epigenetic transcriptional regulator Ash1. Nature 419 (6909), 857 (2002); Tanaka, Y. et al., Trithorax-group protein ASH1 methylates histone H3 lysine 36. Gene 397 (1-2), 161 (2007).
22 Deal, R. B., Henikoff, J. G., and Henikoff, S., Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science 328 (5982), 1161 (2010).
23 Henikoff, S. et al., Genome-wide profiling of salt fractions maps physical properties of chromatin. Genome Res 19 (3), 460 (2009).
24 MacAlpine, H. K. et al., Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20 (2), 201 (2010).
25 Tie, Feng et al., CBP-mediated acetylation of histone H3 lysine 27 antagonizes Drosophila Polycomb silencing. Development 136 (18), 3131 (2009); Visel, Axel et al., ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457 (7231), 854 (2009).
26 Zinzen, R. P. et al., Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462 (7269), 65 (2009).
27 Schwartz, Y. B. et al., Genome-wide analysis of Polycomb targets in Drosophila melanogaster. Nat Genet 38 (6), 700 (2006).
28 Schwartz, Yuri B. et al., Alternative Epigenetic Chromatin States of Polycomb Target Genes. PLoS Genet 6 (1), e1000805 (2010).
29 Nechaev, S. et al., Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327 (5963), 335 (2010).
30 Core, L. J., Waterfall, J. J., and Lis, J. T., Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322 (5909), 1845 (2008).
31 Bernstein, B. E. et al., A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125 (2), 315 (2006); Kanhere, A. et al., Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex-2. Mol Cell 38 (5), 675 (2010).
32 Schuettengruber, B. et al., Functional anatomy of polycomb and trithorax chromatin landscapes in Drosophila embryos. PLoS Biol 7 (1), e13 (2009); Gan, Q. et al., Monovalent and unpoised status of most genes in undifferentiated cell-enriched Drosophila testis. Genome Biol 11 (4), R42 (2010).
33 Wu, C., The 5' ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature 286 (5776), 854 (1980); Wu, C. et al., The chromatin structure of specific genes: I. Evidence for higher order domains of defined DNA sequence. Cell 16 (4), 797 (1979).
34 Elgin, S. C., The formation and function of DNase I hypersensitive sites in the process of gene activation. J Biol Chem 263 (36), 19259 (1988); Jin, C. et al., H3.3/H2A.Z double variant-containing nucleosomes mark 'nucleosome-free regions' of active promoters and other regulatory regions. Nat Genet 41 (8), 941 (2009).
35 Hesselberth, J. R. et al., Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6 (4), 283 (2009).
36 Heintzman, N. D. et al., Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459 (7243), 108 (2009).
37 MacArthur, S. et al., Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol 10 (7), R80 (2009).
38 Kim, T. K. et al., Widespread transcription at neuronal activity-regulated enhancers. Nature 465 (7295), 182 (2010).
39 Filion, G. J. et al., Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143 (2), 212 (2010).
40 Sekimata, M. et al., CCCTC-binding factor and the transcription factor T-bet orchestrate T helper 1 cell-specific structure and function at the interferon-gamma locus. Immunity 31 (4), 551 (2009).
41 Graveley, B. et al., Celniker, S. E., Characterization of transcriptional activity in Drosophila melanogaster. (submitted).
42 Clemens, J. C. et al., Use of double-stranded RNA interference in Drosophila cell lines to dissect signal transduction pathways. Proc Natl Acad Sci U S A 97 (12), 6499 (2000).
[Figure 1: panels a-e; chromatin state heatmaps (log2 enrichment scale), genome-wide karyotype view, and locus examples on chr2R and chr3L.]
[Figure 2: panels a-c; Hilbert-curve folded views of chromatin states for chromosome 3L and the other chromosomes.]
[Figure 3: panels a-b; enrichment and depletion of chromatin marks relative to gene boundaries, with coding sequence and chromatin state columns.]
[Figure 4: panels a-e; TSS and PRE clusters within Polycomb domains; panel d axes: log10(shortRNA read density) vs. fraction of sites below this level; panel e axes: log10(shortRNA peak) vs. scaled density.]
[Figure 5: panels a-f; DHS clusters (log2 enrichment), DHS positions relative to gene structures, distribution of DHSs among chromatin states, and GRO-seq profiles for DHSs within expressed genes and intergenic DHSs.]
[Figure 6: panels a-b; schematic of chromatin states along long and short expressed genes, and network of state transitions.]