BAR ILAN UNIVERSITY
FROM PROMOTERS OF DROSOPHILA HOX GENES
TO THE HUMAN GENOME:
IDENTIFICATION, CHARACTERIZATION AND
POTENTIAL FUNCTION OF A HUMAN DPE MOTIF
Yehuda M. Danino
Submitted in partial fulfillment of the requirements for the Master's
Degree in the Faculty of Life Sciences, Bar-Ilan University
Ramat Gan, Israel 2015
This work was carried out under the supervision of Dr. Tamar Juven-Gershon,
from the Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan
University
Acknowledgments
A complex research project like this is never performed without multiple
collaborations and many contributions of different people.
I would like to extend my appreciation especially to the following.
First and foremost, I would like to express my sincere gratitude to my
advisor Dr. Tamar Juven-Gershon, who has supported me throughout my
'Psagot' course, my thesis and even before, with her patience, immense
knowledge, valuable advice and excellent guidance, which allowed me to work
in my own way and to express all my thoughts. I appreciate her enthusiasm for
ideas and for science in general, her continuous professional and personal
support, her endless patience, her striving for excellence and her humanity.
Her guidance helped me throughout the research and writing of this thesis.
With her admirable leadership, I was able to make the best of my abilities.
One simply could not wish for a better or friendlier supervisor.
Special thanks are reserved for Dr. Diana Ideses, our lab manager, for her
kindness, encouragement and insightful comments. Her help and friendship are
invaluable to me.
In my daily work I have been blessed with a friendly, cheerful and very
helpful group of fellow lab mates: Yonathan Zehavi, Avital Ovadia-Shochat,
Adi Kedmi, Miri Pinhasov, Anna Sloutskin and the former lab members Julia
Sharabany and Ola Kuznetsova. Moreover, I would like to thank the B.Sc.
students who worked with me in our lab, Sivan Shakartzi, Noy Elimelech, Gal
Nuta and Chen Katz; their help was very important.
In particular, I would like to thank Hila Shir-Shapira for working closely
with me during this M.Sc. period, through moments of success, gladness and
crying, and for her immediate support. A significant part of my knowledge is
due to her. Sometimes, little words say big things. Furthermore, I would like
to especially thank Dan Even for the many valuable discussions, support and
help. His readiness to help, even late at night, and his friendship will
remain with me forever.
I would also like to thank Prof. Sergey Chesnokov and his wife Lina, Dr.
Julia Starkova, Amitay Drummer, Dr. Julia Zeitlinger and Wanqing Shao for
their collaboration and their part in my thesis. Moreover, I would like to thank
Dr. Tirza Doniger, Dr. Haim Wachtel and Dr. Ofir Hakim for their help and their
smart advice. I wish everyone the best in their future research and work.
I am also grateful to my friends: Einav Cohen, Oriel Davidi, Daniel Rika,
Telem Nahum, Snir Dadon, Elor Lict and Dvir Manjem. Their encouragement
and understanding were a source of support to me.
Nevertheless, none of this would have been possible without the love and
patience of my family. My immediate family has been a constant source of
love, concern, support and strength all these years.
Finally, last but not least, I would like to thank God for every single
moment of my life, and of challenge and success, during this scientific
journey to understand, if only a little, its majesty and greatness in order
to reveal its honor in reality.
Table of Contents
Abstract
1. Introduction
1.1. The complexity of the transcription initiation landscape and the core promoter region
1.2. The focused core promoter elements
1.2.1. The TATA box
1.2.2. The Inr
1.2.3. The DPE
1.3. The Homeotic proteins: evolutionary conservation, structure and function in development
1.4. The network of HOX and CDX proteins and their role in blood cancers
2. Research Importance:
3. Research Goals
4. Results
4.1. The integrated approaches of the research
4.2. Part I: Promoters of human Cdx/Hox genes contain functional DPE motifs
4.2.1. The 'Individual examination' project
4.2.2. The PCP project
4.2.3. The 'Individual examination' project: further analysis
4.3. Part II: Identification of a SNP in the +1 position of the Hoxb6 and its potential implications in health and disease
4.4. Part III: Whole genome analysis
4.4.1. Identification of human promoters that contain Drosophila DPE sequence motifs
4.4.2. ElemeNT: a core promoter Elements Navigation Tool
4.5. Part IV: Identification of binding sites of the human TAF6 and TAF9, subunits of the TFIID in human promoters
5. Discussion
In search of functional human DPE motifs: analysis of human Cdx/Hox promoters and computational whole genome analysis
Identification of a SNP in the +1 position of the Hoxb6 and its potential implications in health and disease
Identification of binding sites of the human TAF6 and TAF9, subunits of the TFIID in human promoters
6. Materials and Methods
7. Reference
8. Publications during the M.Sc. period
9. Appendixes
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Appendix 5
Appendix 6
Appendix 7
Appendix 8
Appendix 9
Abstract (Hebrew)
Abstract
Accurate gene expression is pivotal for determining the distinct identities
and dedicated functions of different cells and tissues in a multicellular
organism. This multistep program is regulated by mechanisms that comprise
diverse molecular circuits with multiple dedicated components. One process
underlying accurate gene expression is transcription. Transcription of
protein-coding genes by RNA polymerase II (Pol II) uses the DNA sequence as a
template for synthesizing mRNA molecules, which are in turn translated into
proteins. Transcription initiation is one of the first and central regulatory
points underlying the expression of protein-coding genes and distinct
non-coding RNAs. It occurs following the recruitment of Pol II to the core
promoter region by the basal transcription machinery, during assembly of the
preinitiation complex (PIC), through protein-protein and protein-DNA
interactions.
The core promoter is generally defined as the minimal DNA sequence that
directs accurate initiation of transcription. This region encompasses the
transcription start site (TSS), typically referred to as the +1 position, and
spans approximately 80 bp (from -40 to +40 relative to the TSS). The core
promoter contains short functional DNA sequences, termed core promoter
elements (or motifs), which confer structural and functional properties on
the core promoter. The downstream core promoter element (DPE) is one of these
elements. The DPE has an important role in the expression of regulated genes
that are associated with embryonic development, including the caudal gene,
which regulates the Hox genes, and the Hox genes themselves, which specify the
identity of the segments in the developing embryo. The function of the DPE has
mainly been characterized in the fruit fly, Drosophila melanogaster. Although
this element was discovered about twenty years ago, to date a functional DPE
has been identified in only two human promoters (irf-1 and calm2).
In this study, we tried to identify the DPE motif within promoters of human
genes in general, and within the promoters of the Hox genes in particular,
using several computational and experimental methods. Most of the research
approaches were based on the features of the DPE, as defined in Drosophila,
and on the evolutionary conservation between Drosophila and humans,
particularly in the Hox genes, which are highly conserved in metazoans.
Remarkably, we demonstrated that several promoters of human Hox genes contain
a functional DPE motif. However, most of the abovementioned analyses, which
were based on the definition of the Drosophila DPE, were unsuccessful in
identifying novel human DPE-containing promoters. It is plausible that,
despite significant evolutionary conservation, these organisms are
evolutionarily distant and the human DPE may differ from the Drosophila DPE.
In order to identify and characterize the equivalent of the DPE motif in the
human genome, independently of the Drosophila DPE definition, we generated
stable cell lines for performing advanced chromatin immunoprecipitation (ChIP)
assays. The underlying assumption is that the homologous human element is
bound by TAF6 and TAF9, which are found in the PIC. This assumption is based
on the finding that human TAF6 and TAF9 have previously been shown to bind
the human DPE-containing irf-1 promoter.
Furthermore, following sequencing of several Hox and Cdx promoter regions
from dozens of patients with different blood cancers, we suggest that
overexpression of the Hoxb6 gene, a common feature of blood cancers, may
result from a single nucleotide polymorphism (SNP). To summarize, this study
highlights the importance of understanding core promoter composition in
health and disease by focusing on the DPE motif. In addition, this work
demonstrates that the DPE motif is functional in humans. However, it remains
to be determined whether a functionally equivalent human motif has different
sequence and positional constraints.
1. Introduction
1.1. The complexity of the transcription initiation landscape and the core
promoter region
Appropriate temporal and spatial gene expression is a highly complex
process underlying the fate and function of different cells and tissues. The
regulation of this process is composed of multiple levels and orchestrated
molecular events [1-3]. A central event in the regulation of eukaryotic gene
expression is the initiation of transcription. The initiation of transcription of
protein-coding genes and distinct non-coding RNAs occurs following the
recruitment of RNA polymerase II (Pol II) to the core promoter region by the
basal transcription machinery [4].
The core promoter is generally defined as the minimal DNA sequence that
directs accurate initiation of transcription. The core promoter sequence
encompasses the transcription start site (TSS), typically referred to as the +1
position [5, 6].
In the past, it was assumed that the core promoter is a generic entity that
functions in a universal manner. Today, however, the emerging view is that the
unique properties of a given promoter are a function of its architecture and
of its core promoter motif composition [5-8]. The core promoter, which is
referred to as “the gateway to transcription”, is a central component in the
initiation of transcription [9, 10]. Research in the past decade has enhanced
our understanding of the fundamental roles that the core promoter plays in
the initiation of transcription, as well as in the regulation of additional
aspects of gene expression. Two modes of transcription initiation have been
noted in metazoans: focused and dispersed [8] (Figure 1).
Focused (also termed “sharp peak”) promoters contain a single predominant
TSS or a few TSSs within a narrow region of several nucleotides [7]. Focused
promoters span approximately from -40 to +40 relative to the TSS (referred
to as the +1 position). Focused transcription initiation is associated with
spatiotemporally regulated, tissue-specific genes [11] and with canonical core
promoter elements that have a positional bias, such as the TATA box,
Initiator, MTE and DPE [12].
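A short sketch helps pin down the coordinate convention used throughout this work: +1 is the TSS, there is no position 0, and the focused core promoter runs from -40 to +40. The Python below is illustrative only and was not part of the analyses reported here. `core_window` and `pos_to_index` are hypothetical helpers. The sketch assumes a plus-strand sequence in which the 0-based index of the TSS base is already known.

```python
UPSTREAM = 40    # window starts at -40
DOWNSTREAM = 40  # window ends at +40


def core_window(genome: str, tss_index: int) -> str:
    """Hypothetical helper: return the 80-nt focused core promoter
    (-40..-1, +1..+40) around a 0-based TSS index on the plus strand."""
    start = tss_index - UPSTREAM    # the base at -40
    end = tss_index + DOWNSTREAM    # exclusive end, so the last base is +40
    if start < 0 or end > len(genome):
        raise ValueError("window runs off the sequence")
    return genome[start:end]


def pos_to_index(pos: int) -> int:
    """Hypothetical helper: map a promoter coordinate to a 0-based index
    in the window. Promoter numbering skips 0 (... -2, -1, +1, +2 ...)."""
    if pos == 0 or not -UPSTREAM <= pos <= DOWNSTREAM:
        raise ValueError("position must be in -40..-1 or +1..+40")
    return pos + UPSTREAM if pos < 0 else pos + UPSTREAM - 1
```

Because there is no position 0, +1 maps to index 40 and -1 to index 39. Every downstream coordinate is therefore shifted by one relative to naive subtraction.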
Dispersed (also termed “broad”) promoters contain multiple weak start sites
that spread over 50 to 100 nucleotides in the promoter region ([7, 8] and refs
therein). Dispersed transcription initiation is associated with constitutive
or housekeeping genes. Vertebrate dispersed promoters often contain CpG
islands and Sp1 and NF-Y sites [6, 7, 13], whereas Drosophila dispersed core
promoters often contain elements that have weaker positional biases (as
compared with those of focused promoters), such as Ohler 1, the DNA
replication-related element (DRE), Ohler 6 and Ohler 7 [12, 14]. Although the
focused promoter architecture exists in all organisms and is the predominant
initiation mode in simpler organisms, the dispersed mode is more common in
higher eukaryotes [7, 11].
Figure 1. Three main shapes of core promoters (dispersed, focused and mixed),
based on their distribution of TSSs. Small arrows represent weak TSSs, whereas
a large arrow represents a single strong TSS. The estimated length of each
promoter type is shown at left and its name at right.
From a teleological standpoint, the association of sharp TSSs with regulated
genes and of broad TSS patterns with constitutively expressed genes is rather
intuitive. Precise control of gene expression would be easier to achieve from
focused TSSs than from the dispersed promoters of housekeeping genes, which
can be transcribed constitutively, with minimal variation in expression, by
using multiple start sites [7]. In addition to the “focused vs. dispersed”
classification mentioned above, a third promoter type, the mixed promoter, has
been described. This promoter type exhibits a dispersed initiation pattern
with a single strong transcription start site [6, 15] (Figure 1). From this
point on, this work refers only to the focused core promoter landscape.
1.2. The focused core promoter elements
Classic biochemical studies performed over 30 years ago using the TATA
box-containing adenovirus major late promoter identified the general
transcription factors (GTFs) as accessory factors for accurate Pol II
transcription initiation [16, 17]. The GTFs were named TFIIA, TFIIB, TFIID,
TFIIE, TFIIF and TFIIH, based on the protein fractions in which they were
purified (reviewed in [4]). These components, together with Pol II, were
necessary and sufficient for basal transcription of the adenovirus major late
promoter. They assemble into the preinitiation complex (PIC) through
protein-protein interactions and by mediating core promoter recognition
(Figure 2).
The basal transcription machinery recruits Pol II to the core promoter, which
directs the initiation of transcription [4, 6, 7, 18-20]. The Pol II core
promoter is composed of short DNA sequences that are referred to as core
promoter elements or motifs. Most core promoter motifs serve as binding sites
for components of the basal transcription machinery, in particular TFIID and
TFIIB. Notably, TFIID is composed of the TATA box-binding protein (TBP) and
TBP-associated factors (TAFs) [4, 21, 22] (Figure 3). Nevertheless, there are
no universal core promoter elements, and diverse core promoter compositions
have been reported [6, 23]. The vast majority of core promoter motifs have
been identified in the focused core promoter region. Currently known core
promoter motifs include the human and Drosophila Inr (Initiator), TATA box,
BREu and BREd (upstream and downstream TFIIB recognition elements), human TCT
and Drosophila TCT, MTE (motif ten element), DPE (downstream promoter
element), Bridge, DCE (downstream core element), XCPE1 (X core promoter
element 1) and XCPE2 (X core promoter element 2) [6, 18, 24] (Figure 4).
Figure 2. Schematic representation of the architecture of the preinitiation
complex at the promoter of a Pol II-transcribed gene. The PIC is illustrated
with all the GTFs and Pol II. The arrow represents the TSS, which is located
within the core promoter.
These DNA motifs and their combinations contribute to the architecture and
function of the core promoter [25]. Although many elements have been
discovered so far, it is likely that additional elements will be discovered
in the future.
Figure 3. Schematic illustration of TFIIB and the holo-TFIID multicomplex.
TFIIB and TFIID are the central basal transcription factors that bind the core
promoter region. TFIIB, TBP and TAF1/2 bind the BREs, TATA box and Inr
elements, respectively. TAF6/9 bind the MTE and DPE elements. The other TAFs,
which are not known to bind DNA, are shown generically.
The core promoter elements were characterized by computational and
experimental methods. The consensus sequence of each of the elements is
defined by the IUPAC code for nucleotides. Here, only three main core
promoter elements will be discussed: TATA box, Inr and DPE.
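The consensus sequences below are written in IUPAC code, so a small sketch may show how such a consensus becomes a searchable pattern. This Python is an illustrative assumption. It is not the method or tool used in this work. The names `consensus_to_regex` and `find_motif` are hypothetical.

```python
import re

# IUPAC nucleotide codes (NC-IUB 1984) mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus such as 'TATAWAAR' into a DNA regex."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Single bases stay literal; ambiguity codes become character classes.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def find_motif(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = consensus_to_regex(consensus)
    seq = seq.upper()
    return [i for i in range(len(seq) - len(consensus) + 1)
            if pattern.match(seq, i)]
```

The scan calls `match` at every offset rather than using `finditer`, so overlapping occurrences are all reported.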
1.2.1. The TATA box
The TATA box is the first core promoter motif to have been identified [26].
Although the TATA box was previously considered to be a universal element, it
is presently estimated that only 8%-30% of metazoan core promoters
[11, 18, 27-29] and 20%-46% of yeast promoters [20, 30, 31] are
TATA-dependent. The TATA box motif is also present in plants [32, 33]. The
TATA box is bound by the TBP subunit of TFIID ([5, 6, 23] and refs therein).
Both the TATA box element and TBP are conserved from archaebacteria to humans
[7, 34]. The consensus sequence of the TATA box is TATAWAAR, where the 5' T
is usually located at -30 or -31 relative to the TSS in metazoans (or at -120
to -40 in yeast). A wide range of sequences can functionally replace the
yeast TATA box for in vivo transcriptional activity [35].
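The metazoan positional constraint can be sketched as a check on an 80-nt core promoter window (-40..-1, +1..+40, with no position 0). This Python is a hypothetical illustration, not the procedure used in this work. `has_positioned_tata` is an invented name. Because the text says the 5' T is only *usually* at -30 or -31, a strict two-position test like this one is a simplification.

```python
import re

# TATAWAAR with the IUPAC codes expanded: W = A/T, R = A/G.
TATA = re.compile(r"TATA[AT]AA[AG]")


def has_positioned_tata(window: str) -> bool:
    """Hypothetical check on an 80-nt window covering -40..-1 and +1..+40.
    Returns True if TATAWAAR starts with its 5' T at -31 or -30."""
    if len(window) != 80:
        raise ValueError("expected an 80-nt (-40..+40) window")
    window = window.upper()
    for pos in (-31, -30):
        idx = pos + 40  # upstream coordinates: -40 -> index 0, -1 -> index 39
        if TATA.match(window, idx):
            return True
    return False
```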
Figure 4. Schematic illustration of the majority of the known core promoter
elements in the focused promoter area (-40 to +40 relative to the TSS). The
diagram is roughly to scale and each motif is colored differently. The arrow
represents the TSS, the +1 position.
1.2.2. The Inr
Early studies from the Chambon lab described the existence of a putative
element at the TSS [36], and the function of the initiator (Inr) as a
transcriptional element that encompasses the +1 TSS was articulated by Smale
and Baltimore [37]. The Inr is probably the most prevalent core promoter motif
in focused core promoters [27, 38, 39]. It is mainly bound by
the TAF1 and TAF2 subunits of TFIID [40-42]. The mammalian Inr consensus
sequence is YYANWYY (IUPAC nomenclature) [43], and the Drosophila
consensus is TCAKTY with A designated as the +1 [42, 44]. Inr-like
sequences were also identified in Saccharomyces cerevisiae [45].
Computational analyses of promoters suggest that the Inr consensus is only YR
(-1, +1 positions) in humans [8, 11, 46] or TCAGTY in Drosophila [27, 38].
The A nucleotide (or the R in the YR consensus) is generally designated as the
+1 position, even when transcription does not initiate at this specific
nucleotide. This convention is critical, because functional downstream
elements depend strictly on the presence of an Inr and on precise spacing
from it [6, 7, 10].
1.2.3. The DPE
The downstream core promoter element (DPE) is located at +28 to +33 relative
to the A+1 of the initiator, with a functional range set of 'DSWYVY', and it
is recognized by TAF6 and TAF9 [47-49]. In addition to this functional range
set, the guanine at +24 was experimentally found to contribute to DPE function
[49]. The DPE is associated with, and enriched in, developmental gene networks
[8, 50-52], and it is conserved from Drosophila to humans [48, 53]. The DPE
was found to depend strictly on the presence of a functional initiator and on
precise spacing from it. Additionally, one central feature of the DPE is its
enrichment in TATA-less promoters. However, co-occurrence of putative TATA,
Inr and DPE motifs was observed in a small fraction of Drosophila genes
[7, 29].
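The strict spacing described above can be expressed as a positional check anchored on the Inr A+1. The Python is a hypothetical sketch, not the analysis pipeline of this work. It tests the DSWYVY range set at +28 to +33 and separately reports whether a G sits at +24. It assumes a contiguous plus-strand sequence and a known 0-based index of the A+1. Because there is no position 0, +n lies at index `a_plus1 + n - 1`. `dpe_check` is an invented name.

```python
import re

# DPE functional range set DSWYVY with IUPAC codes expanded:
# D = A/G/T, S = C/G, W = A/T, Y = C/T, V = A/C/G.
DPE = re.compile(r"[AGT][CG][AT][CT][ACG][CT]")


def dpe_check(seq: str, a_plus1: int) -> dict[str, bool]:
    """seq: promoter sequence; a_plus1: 0-based index of the Inr A (+1).
    Downstream position +n lies at index a_plus1 + n - 1."""
    seq = seq.upper()
    start = a_plus1 + 28 - 1  # index of +28
    end = a_plus1 + 33        # one past +33
    if end > len(seq):
        raise ValueError("sequence does not reach +33")
    return {
        "dpe": DPE.fullmatch(seq, start, end) is not None,
        "g_at_plus24": seq[a_plus1 + 24 - 1] == "G",
    }
```

A shift of even one nucleotide moves the six-base window off the +28 to +33 register and changes the result. This mirrors the spacing dependence described in the text.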
1.3. The Homeotic proteins: evolutionary conservation, structure and
function in development
The Homeotic proteins (HOX), which are encoded by the Homeobox genes, are
helix-turn-helix transcription factors (TFs) that were first identified in
1978 in the fruit fly, Drosophila melanogaster [54]. These TFs contain a
conserved sequence-specific DNA-binding domain of 60 amino acids, termed the
homeodomain (encoded by the homeobox), which was discovered after their
identification [55]. The HOX proteins are the most extensively studied family
of homeodomain-containing proteins. Based on this conserved domain, it is now
known that homeodomain-containing proteins are conserved among eukaryotes
[56, 57] (Figure 5). Another group of homeodomain-containing proteins is
encoded by the ParaHox genes, the paralogues of the Hox genes. The Drosophila
Caudal protein and the vertebrate CDX (Caudal-type homeobox) proteins are
encoded by ParaHox genes [58].
The HOX and ParaHox proteins are known as master regulators of development
and differentiation [57, 59-62]. The genes encoding the Drosophila HOX
proteins are found in two proximal gene clusters, Ant-C and BX-C
[54, 57, 63]. This family contains eight Hox genes, which are expressed in
the developing embryo in a collinear manner, i.e., in an order that mirrors
their genomic organization. Additionally, in a wide variety of animals,
ranging from C. elegans to mice, mutations in the Hox genes result in
morphological defects, including the transformation of one body region into
another. Owing to these 'homeotic transformations', the Hox genes are also
called 'homeotic genes' [54]. The vertebrate CDX proteins and the Drosophila
Caudal protein regulate Hox gene expression [50, 64]. The name 'caudal'
originates from its expression in the posterior part of the embryo. Caudal is
expressed both maternally and zygotically [65-67]. It influences the
anterior-posterior axis and regulates additional genes apart from the Hox
genes [50, 68, 69].
Figure 5. Evolutionary development of Hox genes among eukaryotic species. The
phylogenetic tree represents the duplications and divergence of the Hox genes
from the first homeobox gene to humans. The colored boxes represent the Hox
genes in each species, and the same color represents the same origin of the
genes [57].
In Drosophila, the Hox genes are expressed at a relatively late stage of
embryogenesis. The development of the Drosophila body plan is a multistep
process that includes the sequential expression of maternal, gap, pair-rule
and segment polarity genes, which gradually produce the body segments and
then determine their polarity. After these stages, the Hox genes specify the
identity of the segments ([50, 62, 70] and refs therein).
The HOX proteins function as heterodimers or as components of larger protein
complexes that regulate multiple cellular processes (including adhesion, cell
cycle, cell death and cell movement), thereby controlling animal development
and morphology via transcriptional activation or repression of many target
genes [59, 71].
Focusing on humans, the caudal-related and the Hox genes were duplicated and
diverged during evolution (Figure 6). Consequently, there are 39 Hox genes in
the human genome, organized in four clusters (A, B, C and D) on different
chromosomes [57]. In addition, there are three Cdx genes in the human genome,
namely Cdx1, Cdx2 and Cdx4. Like Caudal, the CDX proteins regulate the
expression of the vertebrate Hox genes [56, 58, 64, 72, 73]. The conservation
of the Hox genes is reflected not only in their DNA sequence, protein homology
and function, but also in the collinearity between their genomic arrangement
and their distinct, often overlapping, expression domains [56, 57, 59].
Figure 6. Evolutionary conservation of Hox genes between Drosophila and
humans. The two Drosophila Hox clusters, Ant-C and BX-C, and the four human
Hox clusters (A, B, C and D) are schematically represented. The colored boxes
represent the Hox genes in each species; genes of the same color in
Drosophila and humans are derived from the same original gene by duplication
and divergence. In the fly and human embryos, each color indicates the
anterior-posterior expression domain of the corresponding Hox gene. Both the
fly and the human embryo are aligned from left to right [57].
From the transcription regulation perspective, it was found that the Caudal
protein regulates the expression of Drosophila Hox genes by activation of
their DPE-dependent promoters [50] (Figure 7). Different core promoter
elements have previously been shown to contribute to enhancer-promoter
compatibility [74].
1.4. The network of HOX and CDX proteins and their role in blood
cancers
Since their discovery, it has been revealed that the Hox genes also play a
role in normal adult tissues. Along with that, mutations in several Hox genes
have been found to cause pathological processes in humans, ranging from
morphological disorders to cancers [75]. Molecular pathways that underlie
malignancies, including the CDX-HOX pathway, have been shown to regulate
embryogenesis [56]. Specifically, the Hox and Cdx genes normally function in
adults in the self-renewal of hematopoietic precursors and in hematopoiesis.
However, Cdx and Hox genes are also involved in the development of blood
cancers [58, 73, 76, 77]. Indeed, abnormal expression patterns of Hox and Cdx
genes have been reported in diverse blood cancers [78-80] (Figure 8).
Normally, the expression of the Hox and Cdx genes decreases during the
differentiation of hematopoietic cells [76]. In the malignant state, however,
multiple Hox and Cdx genes are aberrantly expressed, especially genes from the
A, B and C Hox clusters, which are inappropriately activated by the CDX
proteins [58, 76, 77, 80, 81].
Figure 7. Schematic illustration of preferential activation by Caudal. In
Drosophila, Caudal (green oval), a master regulator of Hox genes,
preferentially activates functional DPE-dependent genes over TATA-containing
genes. PIC, preinitiation complex (orange cloud). Arrows pointing away from
Caudal represent activation, and the bold arrow represents preferential
activation. The smaller, right-angled arrows at the core promoter regions
represent TSSs.
Figure 8. Hox and Cdx expression levels in normal and malignant hematopoiesis.
A. Normal hematopoiesis. Cdx1 and Cdx2 are not expressed, but Cdx4 is
expressed. Its expression levels, as well as the Hox expression levels,
diminish during differentiation.
B. Malignant hematopoiesis. The Hox genes are deregulated: ectopic expression
of Cdx2 in progenitor cells induces leukemic transformation, and
overexpression of Hox genes in hematopoietic cells is observed at all
differentiation stages. The expression of Cdx4 is diminished and Cdx1 is not
expressed [58].
Furthermore, as a result of chromosomal translocation events, Cdx2 or Hox
genes are fused with other genes, and the resulting fusions promote
oncogenesis. Thus, fusion proteins that include CDX2 or HOX proteins are
common in cancers, particularly in specific blood cancers [77, 82, 83]. Hence,
a body of evidence links Hox and Cdx genes to different blood cancers and
suggests that these genes and their deregulated expression could be either
the cause or the consequence of cancers [56].
2. Research Importance:
First, this research aims to deepen the understanding of the regulation of
Pol II transcription via the core promoter region, in eukaryotes generally and
in humans in particular. This can be achieved by characterizing promoters and
their function through functional analysis of putative core promoter
elements.
Second, the importance of this research stems from its novel approach of
comparing promoter sequences in Drosophila melanogaster with promoter
sequences in humans (Part I). Hence, I examine, and rely on, evolutionary
conservation in the promoter region. This point of view is not common, as
most conservation studies examine the protein-coding regions of genes rather
than the regulatory regions. I have also taken this approach in another part
of this work, in which I address substitutions, such as single nucleotide
polymorphisms (SNPs) or mutations, in the core promoter sequence. To date,
most SNP studies have examined effects in protein-coding regions, since these
changes could drive amino acid substitutions and thus change the function of
the encoded protein.
Moreover, in addition to providing basic insights into transcription
regulation, this research may be relevant at the clinical level.
If a SNP in a specific Hox promoter is found at a higher frequency in patients
with a particular type of blood cancer than in the healthy population, such a
SNP could serve as a diagnostic marker for the disease in the future.
Last but not least, this study analyzes the DPE motif in the human genome. Since the DPE motif was first discovered in 1996, only two human genes have been reported to contain a functional DPE in their promoters [48, 53]. Consequently, researchers in the transcription community who use the Drosophila DPE consensus to search for DPE-containing human genes believe that the DPE may be fly-specific [8]. My research takes on the great challenge of redefining the DPE element in the human genome using approaches not previously applied to this question, such as chromatin immunoprecipitation with nucleotide resolution through exonuclease, unique barcode and single ligation (ChIP-nexus). This novel technique maps transcription factor binding footprints genome-wide, in vivo, at nucleotide resolution. The new methodology is therefore expected to enable the identification and subsequent characterization of human downstream core promoter elements independently of the Drosophila DPE consensus.
3. Research Goals
1. To identify and characterize functional DPE-containing genes among
human genes, and in particular among human Hox/Cdx genes, based
on our knowledge of the Drosophila DPE motif.
2. To identify and characterize a downstream element in human
promoters, using an unbiased approach (which is independent of the
Drosophila melanogaster DPE motif) and a newly developed
Chromatin Immunoprecipitation (ChIP) methodology.
3. To analyze the involvement of the promoters of the human Cdx and
Hox genes in the AML and ALL blood cancer types.
4. Results
4.1. The integrated approaches of the research
In this study, several independent but integrated strategies were used to
discover, identify (both computationally and experimentally) and functionally
characterize the DPE motif in the human genome and examine its association
with different blood cancers.
The integrated strategies employed were:
a. Individual examination
b. The PCP project
c. The clinical approach
d. Whole genome analysis
e. Advanced ChIP-seq method named ChIP-nexus.
All these strategies except the last one are based on guidelines intended to increase the likelihood that the putative DPE sequences identified in the human genome are functional. There are three guidelines (Figure 9):
First, as a key criterion, the putative DPE motifs in the human genome must match the functional range set of the Drosophila DPE motif and, because DPE function is absolutely dependent on the presence of an Inr, must be accompanied by an Inr. Accordingly, the putative Inr has to match at least 3 out of 7 positions of the mammalian Inr consensus ('YYANWYY'), with an obligatory 'C' or 'T' at position -1 and 'A' at position +1 ('CA' or 'TA'). Additionally, the DPE must be located at a precise distance from the Inr (positions +28 to +33 relative to the A+1 of the Inr) and must match at least 4 out of 6 positions of the functional range set ('DSWYVY'), with an obligatory 'G' or 'C' at position +29 [47-49]. It was previously observed that a 'G' or a 'C' at this position (+29) is critical for the functionality of the DPE motif in Drosophila [49].
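The sequence criterion above can be expressed as a short scanning routine. The sketch below assumes a single promoter string read on one strand and IUPAC ambiguity codes for the consensus sequences, and tests every A as a putative +1. The function names, the hypothetical test sequence and the one-strand restriction are illustrative assumptions, not the procedure or software used in this study.

```python
# IUPAC nucleotide ambiguity codes used by the consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

INR = "YYANWYY"  # mammalian Inr consensus, positions -2..+5
DPE = "DSWYVY"   # Drosophila DPE functional range set, positions +28..+33


def matches(window: str, consensus: str) -> int:
    """Count positions at which `window` agrees with an IUPAC consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def is_inr_dpe(seq: str, a1: int, min_inr: int = 3, min_dpe: int = 4) -> bool:
    """Test the A at index `a1` as the +1 of an Inr-DPE combination.

    There is no position 0: position +p is at index a1 + p - 1 and
    position -p is at index a1 - p.
    """
    if a1 < 2 or a1 + 33 > len(seq):
        return False                      # Inr or DPE would run off the sequence
    if seq[a1] != "A" or seq[a1 - 1] not in "CT":
        return False                      # obligatory CA/TA at -1/+1
    inr = seq[a1 - 2:a1 + 5]              # positions -2..+5
    dpe = seq[a1 + 27:a1 + 33]            # positions +28..+33
    if dpe[1] not in "GC":
        return False                      # obligatory G/C at +29
    return matches(inr, INR) >= min_inr and matches(dpe, DPE) >= min_dpe


def scan(seq: str) -> list[int]:
    """Indices of every A that passes the criterion (given strand only)."""
    seq = seq.upper()
    return [i for i, base in enumerate(seq) if base == "A" and is_inr_dpe(seq, i)]


if __name__ == "__main__":
    # Hypothetical sequence: Inr 'TCAGTCT' (A+1 at index 4), a 22-nt spacer
    # (+6..+27), then 'AGACGT' at +28..+33.
    promoter = "GG" + "TCAGTCT" + "G" * 22 + "AGACGT" + "GGGG"
    print(scan(promoter))                              # [4]
    print(scan(promoter.replace("AGACGT", "CTCATG")))  # [] (mutant DPE)
```

With these definitions, the obligatory -1/+1 bases plus the unconstrained N at +2 already give three matches, so in this sketch the 3-of-7 Inr threshold is met whenever the obligatory check passes.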
The second guideline is whether EST clones and/or tags exist for the candidate gene, and whether they initiate at the position of the A+1 of the putative Inr in the identified Inr-DPE combination (or within one nucleotide upstream or downstream of it). These clones and tags mark experimentally defined TSSs. Information on ESTs and tags was obtained from the Genome Browser [84, 85] and from dbTSS (database of transcriptional start sites in humans) [86], respectively.
The third and last guideline is that the A+1 of the Inr in a putative Inr-DPE combination should be close to the Genome Browser/RefSeq-annotated TSS ('known TSS').
These criteria enable the selection of the most likely DPE-containing
candidate genes in the human genome.
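The two TSS-based guidelines can be sketched as simple coordinate checks. This sketch assumes that the A+1 of a candidate, the 5' ends of EST clones/tags and the RefSeq TSS are all given as genomic coordinates on the same strand and assembly. The one-nucleotide tolerance comes from the second guideline; the text does not quantify "close" for the third guideline, so `max_distance` is a user-chosen, hypothetical parameter (the value 50 below is only an example).

```python
def tag_supported(a1: int, tag_starts: list[int], tolerance: int = 1) -> bool:
    """Guideline 2: some EST clone or TSS tag starts at A+1 or within
    `tolerance` nucleotides of it (the text allows one nucleotide)."""
    return any(abs(start - a1) <= tolerance for start in tag_starts)


def near_known_tss(a1: int, known_tss: int, max_distance: int) -> bool:
    """Guideline 3: A+1 lies close to the RefSeq-annotated TSS.
    'Close' is not quantified in the text, so max_distance must be chosen."""
    return abs(a1 - known_tss) <= max_distance


def passes_tss_guidelines(a1: int, tag_starts: list[int],
                          known_tss: int, max_distance: int) -> bool:
    """True only if a candidate satisfies both TSS-based guidelines."""
    return tag_supported(a1, tag_starts) and near_known_tss(a1, known_tss, max_distance)


if __name__ == "__main__":
    # Hypothetical same-strand genomic coordinates.
    print(passes_tss_guidelines(1000, [999, 1010], 1020, max_distance=50))  # True
    print(passes_tss_guidelines(1000, [1003], 1020, max_distance=50))       # False
```

Keeping the two checks as separate functions makes it easy to report which guideline a rejected candidate failed.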
Figure 9. Diagram of the three basic guidelines used to increase the likelihood that the putative DPE motifs identified in the human genome are functional. Putative combinations of Inr and DPE at human promoters were defined as the best candidates for functional DPE-containing promoters according to three guidelines: a match to the Drosophila DPE definition, the existence of EST clones and tags that initiate at the A+1 of the putative combination, and the proximity of the RefSeq known TSS to that A+1.
In general, the searches of the human genome underlying most of the research approaches used in this study are based on known information from Drosophila.
Likewise, the analysis of human Hox genes using the Drosophila DPE definition is based on three experimental findings, which are the starting point of this study:
a. The majority of the Drosophila Hox genes are DPE dependent [50].
b. To date, two DPE-dependent human genes, irf1 [48] and calm2 [53], have been published. Their DPE motifs conform to the definition of the Drosophila DPE.
c. The Hox genes are evolutionarily conserved from simple organisms to
humans [57].
Hence, in this study, I focus on the human Hox genes a) because of their
importance in development and cancers and b) as a test case for the
identification and characterization of human DPE-dependent genes.
4.2. Part I: Promoters of human Cdx/Hox genes contain functional DPE
motifs
In order to identify functional DPE sequences that are pivotal for transcription in the core promoter regions of the human Cdx and Hox genes, two parallel lines of work were developed: an experimental project, referred to as 'Individual examination', and a computational approach, termed 'The PCP project'. Both projects were based on searching for putative combinations of Inr and DPE in the human Cdx genes and in the clusters of the human Hox genes, defined by the criteria mentioned above.
4.2.1. The 'Individual examination' project
In the 'Individual examination' project, Hila Shir-Shapira, a Ph.D. student in our lab, and I cloned selected minimal promoters (-40 to +40 or -10 to +40 relative to the RefSeq +1 position) of the human Cdx and Hox genes upstream of the firefly luciferase (FL) reporter gene in a modified pGL3 Basic plasmid (referred to in this work as pGL3). These plasmids were transiently transfected into HEK-293 (human embryonic kidney) cells by the calcium-phosphate method (see 'Materials and Methods'). Two additional plasmids were co-transfected: a TK Renilla luciferase (RL)-containing plasmid, as a normalization control, and the pBlueScript vector, to bring the DNA to the required total amount. Generally, we compared wild-type (wt) constructs, which contain putative combinations of Inr and DPE, with mutant (mD) constructs, which contain combinations of Inr and a mutant DPE. Cells were harvested 48 hours after transfection, and cellular extracts were subjected to dual-luciferase analysis. Thus, the FL activity allowed us to indirectly examine the activity of the core promoters and their dependency on specific sequence motifs.
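The normalization behind the dual-luciferase readout can be sketched as follows. This is a minimal sketch, assuming raw FL and RL readings per well, averaging of the FL/RL ratio over the technical replicates of each transfection, and expression of mutant activity relative to the wt construct (wt = 1), with mean and SEM taken over independent experiments. The function names and the data layout are hypothetical, and the readings in the demo are invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_activity(fl: list[float], rl: list[float]) -> float:
    """Mean FL/RL ratio over the technical replicates of one construct."""
    return mean([f / r for f, r in zip(fl, rl)])


def fold_change_vs_wt(experiments: list[dict]) -> tuple[float, float]:
    """Mean and SEM of mutant activity relative to wt (wt set to 1).

    Each experiment is one independent transfection, given as
    {"wt": (fl_list, rl_list), "mD": (fl_list, rl_list)}.
    """
    folds = [normalized_activity(*e["mD"]) / normalized_activity(*e["wt"])
             for e in experiments]
    sem = stdev(folds) / sqrt(len(folds)) if len(folds) > 1 else 0.0
    return mean(folds), sem


if __name__ == "__main__":
    # Invented readings: three experiments, each in triplicate.
    experiments = [
        {"wt": ([100.0] * 3, [10.0] * 3), "mD": ([fl] * 3, [10.0] * 3)}
        for fl in (50.0, 60.0, 40.0)
    ]
    m, s = fold_change_vs_wt(experiments)
    print(round(m, 3), round(s, 3))  # 0.5 0.058
```

Because the ratio divides by RL, a higher RL reading alone lowers the normalized value even when FL is unchanged, so raw FL values are worth inspecting alongside the ratios.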
We initially examined the minimal promoters of the human Cdx1 and Cdx2 genes (Figure 10).
The Cdx2 minimal promoter contains an Inr-DPE combination in which the DPE matches 4 out of 6 positions of the DPE definition, and it lacks a putative TATA box sequence at the appropriate location. Upon mutation of the putative Cdx2 DPE (mutating nucleotides 'AGAGG' to 'CTCATG'), there is a ~50% reduction in reporter activity compared with the wt-driven reporter (Figure 10A). Thus, basal transcription of the human Cdx2 gene is dependent on the DPE motif.
Figure 10. Transcription from the human Cdx1 promoter is dependent on both a TATA box and a DPE motif, whereas transcription from the Cdx2 promoter is dependent on the DPE motif. The bar graphs summarize three independent dual-luciferase experiments (each performed in triplicate). Mutated minimal promoter-driven constructs, carrying a mutant DPE (mD), a mutant TATA box (mT) or a mutant TATA box and DPE (mTmD), were compared to wild-type (wt) minimal promoter-driven constructs. A. Fold activation of the human Cdx2 minimal promoter variants. B. Fold activation of the human Cdx1 minimal promoter variants. The error bars represent the standard error of the mean (SEM).
In contrast to the human Cdx2 and most Hox genes (see below), the human Cdx1 promoter contains a TATA box motif in addition to a putative Inr-DPE combination. Both the putative TATA box and the putative DPE sequences fully match their consensus sequences, whereas the Inr matches only 4 out of 7 positions of the mammalian Inr consensus. Nevertheless, the A+1 of the Inr in this triple-element combination (TATA box, Inr and DPE) coincides with the RefSeq +1 position of the gene (Appendix 1). A significant reduction of about 40% was observed only when both the TATA box and the DPE were mutated (mTmD; mutating nucleotides 'TATAAAAG' to 'ACGGACGT' and 'GCTCGT' to 'CTCATA' for the TATA box and DPE, respectively). Neither the mutated TATA box (mT)-driven nor the mutated DPE (mD)-driven construct displayed significantly reduced reporter activity (Figure 10B). Thus, basal transcription of the human Cdx1 gene is not exclusively dependent on the DPE motif, but this motif contributes to Cdx1 transcription to some extent. Notably, sequence motifs that match the Drosophila DPE definition influence the basal transcription levels of both the human Cdx1 and Cdx2 genes.
Surprisingly, a year after we obtained these results, the RefSeq annotation of the major TSSs of human genes was updated, indicating a different, more distal TSS for the Cdx2 gene. Moreover, analysis of the relationships and mutual contributions of the TATA box, Inr and DPE is beyond the scope of this study, which focuses on the function of the DPE alone and its influence on transcription. We therefore decided to discontinue the analysis of the Cdx1 and Cdx2 promoters at this point.
Subsequently, the core promoters of all 39 Hox genes were manually
analyzed. As a preliminary step, the presence of TATA box, Inr and DPE
sequences in the appropriate positions was examined in the -200 to +200
region (relative to the RefSeq TSSs) (Table 1). In this analysis, we used the abovementioned criteria, as well as additional parameters, such as Pol II occupancy, nucleosome positioning, histone modifications and CAGE patterns, to select promising candidate Inr-DPE combinations.
Core promoter element   Human Hox gene cluster  Total
                        A     B     C     D
DPE only                9     6     4     7     26
TATA only               1     3     2     0     6
TATA+DPE                1     3     1     1     6
No TATA or DPE          0     0     2     3     5
Total                   11    12    9     11    43
Table 1. Summary of the occurrence of TATA box and DPE (associated with Inr-like sequences) around the TSSs of the human Hox transcripts. Sequences of about 400nt around the known TSSs of all the Hox genes (200nt on each side), including both the major TSS and alternative TSSs of each gene, were analyzed for the presence of a TATA box, Inr-DPE combinations, or a TATA box, Inr and DPE together, which may regulate transcription of the Hox genes. 'TATA+DPE' and 'DPE only' denote a human Inr and DPE combination with a strict spacing of 27bp between the two elements, with or without a TATA box, respectively. 'No TATA or DPE' means that there is no TATA box or Inr-DPE combination at all, or none at a reasonable distance (less than 50bp) from the known TSS.
Overall, twenty-six putative combinations of Inr and DPE were identified among all four Hox gene clusters. The best candidate promoters, on which we decided to focus, were Hoxa1, Hoxa2, Hoxa9, Hoxa11, Hoxb3, Hoxb9, Hoxc6, Hoxc8, Hoxd3, Hoxd9 and Hoxd10 (Figure 11, see Appendix 1).
Figure 11. Multiple human Hox genes contain the Drosophila DPE sequence motif, but only a subset of these genes is transcriptionally dependent on this DNA element. The bar graphs summarize three independent dual-luciferase experiments (each performed in triplicate) comparing minimal core promoter variants. Mutant DPE-containing minimal promoter-driven constructs (mD) were compared to wild-type (wt) minimal promoter-driven constructs. A. Fold activation of wt and mD pairs of Hox genes whose activities were either unaffected or enhanced by the mutation. B. Fold activation of wt and mD pairs of Hox genes whose activities were reduced by the mutation. The error bars represent the standard error of the mean (SEM).
As shown in Figure 11A, four Hox candidates (Hoxb3, Hoxc6, Hoxd9 and Hoxd10) were not affected by mutation of the DPE, indicating that the identified DPE sequence is not significantly important for their transcriptional regulation. However, the other tested candidates were affected. While the luciferase activity of two Hox genes (Hoxa9 and Hoxd3) appears to be enhanced by the DPE mutation (Figure 11A), the luciferase activity of five other Hox genes (Hoxa1, Hoxa2, Hoxa11, Hoxb9 and Hoxc8) is reduced as a result of the mutation, suggesting that the DPE is important for their transcription (Figure 11B). In each Hox promoter, the nucleotides at positions +28 to +34 were mutated to 'CTCATGT', except for Hoxb9, in which only positions +28 to +33 were mutated (to 'CTCATG'). The enhanced activities of Hoxa9 and Hoxd3 following the DPE mutation may be explained by a potential involvement of the DPE motif in Pol II pausing [87, 88]. If so, then upon mutation of the DPE, Pol II may shift to a productive-elongation state, which might result in higher luciferase activity (see Discussion for further details). To conclude, using the Drosophila definition of a DPE motif, we identified five Hox genes that contain a functional DPE.
4.2.2. The PCP project
The PCP project is based on the 'Determinacy Analysis Chain' (DAC) software, developed by the mathematician Prof. Sergey Chesnokov. This computational approach, carried out in collaboration with Prof. Chesnokov, was used as a complementary strategy to the 'Individual examination' approach in order to identify Inr and DPE combinations within the human Hox gene clusters in a high-throughput manner. This is done by searching for perfect and imperfect matches to the nucleotides found at specific positions within the promoter. Nucleotides are defined as critical for DPE function based on experimental findings obtained with the Drosophila Hox promoters.
Accordingly, the working assumption of this project was that sequences in the human genome containing Inr-DPE combinations (defined based on the Drosophila DPE) should be associated with active promoters of human Hox genes. The analysis was based on the seven Drosophila DPE-containing Hox promoters: Scr, Dfd, lab, Antp-p1, Antp-p2, pb and Abd-B. Since the Drosophila abd-A and Ubx promoters do not have functional DPE motifs [50], these Hox promoters were not used in the reference set.
'Irreducible genetic markers' (IGms) were generated after alignment of these seven minimal Drosophila Hox promoters (-10 to +40 relative to the TSS). IGms are distinct sets of positions that contain, by definition, 13 irreplaceable nucleotides, which are part of the critical nucleotides necessary for the transcriptional function of each Hox promoter. This number, 13, is the number of irreplaceable nucleotides that the DAC software requires for the creation of IGms. IGms were constructed based on the Drosophila DPE functional range set and the mammalian Inr consensus, and were searched for within the four human Hox clusters. Then, for every IGm matched in the human genome, the additional matched positions of the mammalian Inr and Drosophila DPE motif sequences were appended. These sequences were named 'Putative core promoters' (PCPs). PCPs identified by the IGms were considered 'good' PCPs (see below) if they were likely to contain a functional DPE and to serve as active promoters.
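The IGm search can be sketched as exact matching of a fixed set of (position, nucleotide) pairs. In this sketch an IGm is a dictionary from promoter position (relative to +1, with no position 0) to the required base; the Dfd entry is copied from the Neck-based IGm listed in Table 3. The data structure, the function names and the two-strand scan are assumptions for illustration and do not reproduce the DAC software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 13 irreplaceable positions of the Dfd IGm (Inr, Neck and DPE; Table 3).
DFD_IGM = {-1: "C", 1: "A", 3: "A",
           17: "T", 18: "C", 19: "G", 24: "G", 27: "C",
           28: "G", 29: "G", 30: "T", 31: "T", 32: "C"}


def revcomp(seq: str) -> str:
    """Reverse complement; non-ACGT characters (e.g. N) are kept as-is."""
    return seq.translate(COMPLEMENT)[::-1]


def to_index(a1: int, pos: int) -> int:
    """Index of promoter position `pos` when +1 sits at index a1 (no position 0)."""
    return a1 + pos - 1 if pos > 0 else a1 + pos


def igm_hits(seq: str, igm: dict[int, str]) -> list[int]:
    """Indices taken as +1 at which every IGm position carries its required base."""
    first, last = min(igm), max(igm)
    hits = []
    for a1 in range(len(seq)):
        if to_index(a1, first) < 0 or to_index(a1, last) >= len(seq):
            continue  # the IGm would run off the end of the sequence
        if all(seq[to_index(a1, p)] == base for p, base in igm.items()):
            hits.append(a1)
    return hits


def igm_hits_both_strands(seq: str, igm: dict[int, str]) -> list[tuple[str, int]]:
    """Hits on both strands, reported as (strand, index of +1 in `seq`)."""
    plus = [("+", i) for i in igm_hits(seq, igm)]
    minus = [("-", len(seq) - 1 - i) for i in igm_hits(revcomp(seq), igm)]
    return plus + minus
```

Minus-strand hits are found by scanning the reverse complement and mapping each +1 back to its index in the original sequence, which is one way of covering both strands (-/+) as in the PCP search.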
At first, the irreplaceable positions composing the different IGms included only the positions of the Inr and DPE motifs. A stronger indication of possible biological significance was given if, in addition to these positions, the nucleotides at positions +17, +19 and +24 were T, G and G, respectively. These nucleotides were previously shown to be overrepresented at these positions in Drosophila DPE-containing promoters. Moreover, a G at position +24 was also experimentally shown to contribute to DPE function in Drosophila [49]. Every PCP was scored to evaluate how 'good' it is: the PCP received one point if the positions of the Inr and DPE motifs matched their consensus and functional range set, respectively, and half a point if position +24 matched. As a result, a total of 1524 'good' PCPs were identified on both strands (-/+) across all four human Hox clusters (see Appendix 2). This number of PCPs exceeds the expected number more than tenfold. We next filtered out PCPs that co-occupied nucleosomal sites, as promoter regions are generally nucleosome-free and PCPs found at nucleosome-occupied sites are unlikely to be active promoters. The filtering was done with the help of Dr. Tirza Doniger, using data from two complementary resources from the Genome Browser: 'Nucleosome Positioning' and 'DNaseI Hypersensitivity Sites'. Following this filtration, ~400 PCPs remained. Although the vast majority of the results were eliminated by the filtration, the number of remaining PCPs was still higher than expected.
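The scoring rule and the chromatin-based filtering described above can be sketched together. The score follows the text (one point for matching Inr and DPE positions, half a point for a G at +24). For the filter, the sketch represents PCPs, positioned nucleosomes and DNaseI hypersensitive sites as half-open genomic intervals and assumes that a PCP is kept when it avoids every nucleosome and overlaps at least one hypersensitive site; how the two Genome Browser tracks were actually combined is not specified in the text, so this combination rule and all names are hypothetical.

```python
Interval = tuple[int, int]  # half-open [start, end) genomic interval


def pcp_score(inr_and_dpe_match: bool, base_at_plus24: str) -> float:
    """Score from the text: one point when the Inr and DPE positions match
    their consensus/functional range set, plus half a point for a G at +24."""
    score = 1.0 if inr_and_dpe_match else 0.0
    if base_at_plus24 == "G":
        score += 0.5
    return score


def overlaps_any(region: Interval, intervals: list[Interval]) -> bool:
    """True if the half-open region overlaps at least one interval."""
    start, end = region
    return any(s < end and start < e for s, e in intervals)


def filter_pcps(pcps: list[Interval], nucleosomes: list[Interval],
                dhs_sites: list[Interval]) -> list[Interval]:
    """Keep PCPs that avoid every positioned nucleosome and fall within a
    DNaseI hypersensitive site (an assumed way of combining the two tracks)."""
    return [p for p in pcps
            if not overlaps_any(p, nucleosomes) and overlaps_any(p, dhs_sites)]


if __name__ == "__main__":
    # Hypothetical coordinates.
    pcps = [(100, 150), (300, 350), (500, 550)]
    print(filter_pcps(pcps, [(280, 427)], [(90, 200), (480, 600)]))
    # [(100, 150), (500, 550)]
```

The linear overlap test is adequate for a few hundred PCPs; for genome-scale tracks an interval tree or a sorted sweep would be the usual choice.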
Thus, as a second step, we evaluated the PCP approach by comparing the human Hox gene clusters to the human Histone gene cluster, which is similar in length (see Appendix 3). The rationale for this comparison is that, in contrast to the Hox genes, the promoters of the core Histone genes (h2a, h2b, h3 and h4), at least in Drosophila, contain TATA box elements [50, 89, 90]. Unfortunately, this analysis indicated that the numbers of PCPs were similar in the different clusters (Hox versus core Histones). This finding, as well as other data from our lab, led us to suspect that additional nucleotides located at distinct positions between the Inr and DPE might be important for promoter activity. Notably, this region (which we termed "the Neck") has already been shown to contain one conserved motif that functions together with the Inr and DPE, namely, the MTE [91].
Hence, these nucleotides and their positions may also be conserved between Drosophila and humans. We therefore generated new sets of IGms, which contain the Inr and DPE as well as the nucleotides in between, and searched for new PCPs in the human Hox and Histone gene clusters (Figure 12).
Figure 12. A schematic representation of the workflow of the 'PCP project'. A. The first step of the PCP analysis within the human Hox and Histone gene clusters was based on IGms generated only from the Inr and DPE positions (IGms-Area-0) of the seven Drosophila DPE-containing Hox genes. B. The revised analysis involved the generation of IGms (IGms-Area-1) based on the Inr, the DPE and the nucleotides in the region between them (the 'Neck') of the Drosophila DPE-containing Hox genes.
Fifty-four sequences of Drosophila TATA-less, DPE-containing core promoters were analyzed with the bioinformatics tool 'WebLOGO' in order to estimate the conserved positions in the "Neck" region (Table 2). This set of sequences was composed of experimentally validated core promoter sequences from our lab [51, 92] and core promoter sequences analyzed by the Kadonaga lab and published on the 'Drosophila Core Promoter Database' (DCPD) website (http://labs.biology.ucsd.edu/Kadonaga/DCPD.htm) [49]. Following this new definition, a new set of 13 positions was chosen for each of the seven Drosophila DPE-containing Hox genes and used as its IGms for searching for PCPs in the human Hox and Histone gene clusters (Table 3, see Appendix 4).
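The conservation estimate that a sequence logo visualizes can be sketched as per-position base frequencies and information content. This sketch assumes that all sequences are aligned so that index 0 is the +1 position, a uniform background (a maximum of 2 bits per position) and no small-sample correction. It is not the WebLOGO implementation, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_profile(aligned: list[str], position: int) -> tuple[dict[str, float], float]:
    """Base frequencies and information content (bits) at one promoter position.

    Assumes each sequence starts at +1, so position p (p > 0) is at index p - 1,
    and a uniform background (maximum 2 bits). No small-sample correction.
    """
    column = [seq[position - 1] for seq in aligned]
    counts = Counter(column)
    n = len(column)
    freqs = {base: counts[base] / n for base in "ACGT"}
    entropy = -sum(f * log2(f) for f in freqs.values() if f > 0)
    return freqs, 2.0 - entropy


def rank_positions(aligned: list[str], positions: list[int]) -> list[tuple[int, float]]:
    """Positions sorted from most to least conserved (highest information first)."""
    scored = [(p, column_profile(aligned, p)[1]) for p in positions]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Applied to sequences spanning at least +1 to +27, `rank_positions(aligned, list(range(17, 28)))` would order the Neck positions by conservation, one possible basis for grouping them by conservation level.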
Table 2. The conserved positions and bases in the 'Neck' region between the Inr and the DPE of the 54 Drosophila DPE-containing genes. These positions and nucleotides were conserved among the set of 54 Drosophila DPE-containing genes and may be important for selecting DPE-dependent genes in humans. The table is divided into four groups (Group1-4) according to the conservation level of the bases at these positions.
Group     Group4             Group3      Group2      Group1
Position  +20   +25   +24    +19   +18   +27         +17
Base      S     A/T   G      G/A   C/G   A/C         T/C
The chosen positions
Gene    Inr                  Neck                                 DPE
Dfd     C-1, A+1, A+3        T+17, C+18, G+19, G+24, C+27         G+28, G+29, T+30, T+31, C+32
AntpP1  T-2, A+1, T+3        T+17, A+19, A+20, T+25, A+27         A+28, C+29, A+30, T+31, C+32
Abd-B   T-2, C-1, A+1        T+17, C+18, G+19, G+24, T+25, C+27   G+28, G+29, T+30, T+31
AntpP2  T-2, C-1, A+1        C+18, G+19, T+20, A+25               A+28, G+29, A+30, C+31, G+32, T+33
lab     T-2, C-1, A+1        C+18, G+19, G+24, A+27               G+28, C+29, A+30, C+31, G+32, T+33
pb      T-2, C-1, A+1        T+17, G+19, G+24, A+25               G+28, G+29, T+30, T+31, G+32, T+33
Scr     C-1, A+1, T+3, C+4   T+17, C+18, G+19                     G+28, C+29, A+30, C+31, G+32, T+33
The results are organized in seven tables, each summarizing the PCP results for one Drosophila Hox promoter (Figure 13). Using this approach, we identified hundreds of PCPs in the human Histone and Hox gene clusters and dozens of 'good' PCPs, which were divided into three groups:
Type I (Hox) - the PCPs that were only found in the human Hox gene
clusters (marked in green). These PCPs were generated from the set of IGms
that were unique to the Hox gene clusters.
Type II (HoxHist) - the PCPs that were found in both the Histone and Hox
clusters (marked in light orange). These PCPs were generated from sets of
IGms that were present in both the human Hox and Histone gene clusters.
Type III (Hist) - the PCPs that were only found in the Histone gene cluster
(marked in red). These PCPs were generated from the set of IGms that were
unique to the Histone gene clusters.
Table 3. The new composition of the IGms based on the seven Drosophila Hox genes, including the Inr, Neck and DPE regions. The table presents the nucleotides and positions that compose the IGms, including the new area, the region between the Inr and the DPE ('Neck'). For each Drosophila Hox gene, the best positions were chosen (see main text and Table 2 for further details).
This analysis (Figure 13) led to two main findings. First, whereas the
previous analysis resulted in similar numbers of PCPs in the genomic regions
of both the human Hox and Histone genes, the vast majority of the PCPs
identified by this analysis were within the human Hox gene clusters ('PCP All',
Figure 13). Moreover, the number of the 'good' PCPs, which were identified in
the genomic area of the human Histone genes, was negligible in comparison
to the number of the 'good' PCPs that were identified in the human Hox gene
clusters.
Second, in contrast to the previous results, the number of PCPs was reduced
from hundreds of 'good' PCPs to only dozens of 'good' PCPs ('PCP Good',
Figure 13). In the previous analysis that did not include the Neck region,
hundreds of PCPs were identified (1524 'good' PCPs in total, see above).
Having dozens of PCPs seems more “reasonable” as there are only 39
human Hox genes.
The new set of 'good' PCP sequences was further filtered by Dr. Tirza Doniger using the 'Nucleosome Positioning' and 'DNaseI Hypersensitivity Sites' data from the UCSC genome browser. Following this step, only 52 'good' PCPs remained as potential candidates for functional combinations of Inr and DPE in the human Hox gene clusters. To visualize the genomic location of the PCPs, we uploaded this final set of PCPs to the UCSC genome browser as a personal custom track (Appendix 5).
Figure 13. Summary tables of the distribution of PCPs within the human Hox gene clusters versus the Histone gene clusters following the analysis that included the 'Neck' positions. Each table contains the number of PCPs identified in the human Hox and Histone gene clusters, separately (Hox or Hist) or overlapping (HoxHist), that were generated by IGms from a single Drosophila Hox promoter. The three groups, Hox, HoxHist and Hist, are marked in green, light orange and red, respectively. The percentages of the different types of PCPs out of all the PCPs identified are shown at the top of each table, and the percentages of the 'good' PCPs are indicated at the bottom. The name of the Drosophila Hox gene from which the PCPs were generated is indicated on the left (A-G). PCP-All: the sum of 'good' and 'not good' PCPs identified. 'PCP Good' (dark blue) indicates the number of PCPs that, based on their sequence, may be functional candidates.
Unfortunately, even though the results seemed promising, the 'good' PCPs did not overlap with, or even lie within 50bp of, the known Hox TSSs. Moreover, their orientation did not always fit the directionality of the closest known TSSs.
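The two checks that the PCPs failed, distance to the nearest known TSS and agreement of orientation, can be sketched with a binary search over sorted TSS coordinates. This sketch assumes that known TSSs are available as a non-empty list of (position, strand) pairs sorted by position and that PCP coordinates use the same assembly; the 50bp cutoff follows the text, while the function name and data layout are hypothetical.

```python
from bisect import bisect_left


def check_against_known_tss(pcp_pos: int, pcp_strand: str,
                            known_tss: list[tuple[int, str]],
                            max_distance: int = 50) -> tuple[bool, bool]:
    """Compare a PCP with its nearest annotated TSS.

    known_tss must be non-empty and sorted by coordinate; each entry is
    (position, strand). Returns (closer than max_distance, same orientation).
    """
    positions = [pos for pos, _ in known_tss]
    i = bisect_left(positions, pcp_pos)
    neighbours = known_tss[max(0, i - 1): i + 1]   # closest TSS on each side
    tss_pos, tss_strand = min(neighbours, key=lambda t: abs(t[0] - pcp_pos))
    return abs(tss_pos - pcp_pos) < max_distance, tss_strand == pcp_strand


if __name__ == "__main__":
    # Hypothetical TSS annotations.
    tss = [(1000, "+"), (5000, "-")]
    print(check_against_known_tss(1020, "+", tss))  # (True, True)
    print(check_against_known_tss(4990, "+", tss))  # (True, False)
```

Only the two annotated TSSs flanking the PCP need to be compared, so the lookup stays logarithmic in the number of annotations.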
Taken together, the “PCP approach” of identifying putative DPE-containing
human Hox genes, based on the Drosophila DPE motif was unsuccessful and
did not result in the discovery of DPE-containing promoters within the human
Hox gene clusters.
4.2.3. The 'Individual examination' project: further analysis
Based on the individual examination of human Hox genes, five genes contain a functional Drosophila DPE motif, namely, Hoxa1, Hoxa2, Hoxa11, Hoxb9 and Hoxc8 (Figure 11B). The next step was to experimentally define their TSSs using primer extension assays on RNA purified from cells transfected with the minimal promoter constructs. However, the levels of the transcripts from the transfected minimal promoters were very low, and we could not detect the extension products.
To overcome this, we designed new reporter constructs, each containing a larger genomic fragment that encompasses the natural core promoter as well as ~1000bp of upstream sequence. We assumed that this region would contain binding sites for activators, especially CDX2, so that activators expressed in the cells could bind these regions and enhance transcription from the promoters of these five human Hox genes. After computational analysis of putative transcription factor binding sites in these regions (using the 'JASPAR' tool), we constructed new plasmids containing ~1000bp upstream of these minimal promoters by PCR with specific primers on human genomic DNA (for the wt constructs). We then used site-directed mutagenesis to construct the mDPE version of each promoter. For simplicity, we termed these plasmids 'Enhancer-Promoter' constructs, even though it is very likely that additional cis-regulatory modules, not necessarily proximal to these promoters, contribute to their activities. Notably, mutagenesis of the Hoxc8 construct was technically challenging (perhaps due to its high GC content), and we have only recently managed to generate it. To test the activities of these constructs, HEK 293 cells were transfected with either the wt or the mD 'Enhancer-Promoter' construct of each of the four human Hox genes (Hoxa1, Hoxa2, Hoxa11 and Hoxb9). Cells were harvested 48h after transfection, and the extracts were assayed for luciferase activity (Figure 14).
[Figure 14 graph: relative luciferase activity (y-axis, 0 to 2) of the wt and mD 'Enhancer-Promoter' constructs of Hoxa1, Hoxa2, Hoxa11 and Hoxb9.]
Figure 14. The 'Enhancer-Promoter' constructs of human Hox promoters that contain the Drosophila DPE motif indicate that their activity is dependent on the DPE motif. The bar graph summarizes three independent dual-luciferase experiments (each performed in triplicate). Mutant DPE-containing enhancer-promoter-driven constructs (mD) were compared to wild-type (wt) enhancer-promoter-driven constructs. The error bars represent the standard error of the mean (SEM).
Surprisingly, in contrast to the results obtained with the minimal promoter constructs (Figure 11B), the mD human Hoxa1 Enhancer-Promoter construct showed higher reporter activity than the wt construct (Figure 14). Enhanced luciferase activity can be explained by RNA polymerase II pausing in the promoter-proximal region of genes. As mentioned above, Drosophila promoters with paused Pol II are enriched for the DPE (perhaps due to its high GC content [87]). Notably, the relative activity of the other constructs (Hoxa2, Hoxa11 and Hoxb9) was reduced when the DPE motif was mutated. Nevertheless, the relative luciferase activity of the human Hoxa2 enhancer-promoter plasmid pair (wt and mD) does not reflect the activity of the firefly luciferase (FL). Rather, owing to higher RL levels in the Hoxa2 mD-transfected cells than in the Hoxa2 wt-transfected cells, normalization of the dual-luciferase activity, performed by dividing the FL values by the RL values, yields reduced levels for mD, although the FL levels of the wt- and mD-transfected Hoxa2 cells were similar. Hence, the only two genes that convincingly demonstrated a Drosophila DPE-dependent activity were human Hoxa11 and human Hoxb9.
Following these findings, we purified total RNA from transfected cells in order to perform a primer extension assay (performed by Hila Shir-Shapira, a Ph.D. student in the lab). This assay allows in vivo validation of the levels of the reporter gene transcripts under the regulation of the different enhancer-promoter regions (wt versus mD) of human Hoxa11 and human Hoxb9. Moreover, this assay reveals the TSSs of the transcripts. We were unable to detect a signal with total RNA and therefore decided to purify poly(A)+ RNA from the transfected cells. Primer extension experiments using poly(A)+ RNA are currently being performed, and the preliminary results are promising.
4.3. Part II: Identification of a SNP at the +1 position of the Hoxb6 gene and its potential implications in health and disease
On the one hand, it is known that, similarly to Cdx2 [78], certain human Hox genes are aberrantly expressed in several solid tumors [56, 93], and especially in different types of blood cancers [56, 58, 76, 77, 79, 80, 94]. On the other hand, based on evolutionary conservation, we hypothesized that functional DPE motifs exist in the human genome, particularly in the Hox gene clusters, whose basal transcription is regulated by this motif in Drosophila. Thus, one possible explanation for the abnormal expression of Hox genes in cancers is that it may be caused by genetic substitutions (mutations or single nucleotide polymorphisms; SNPs) in their core promoters or core promoter elements. This hypothesis is supported by evidence of many nucleotide polymorphisms in multiple TATA box elements that have been associated with multiple pathologies in humans [95]. To examine this possibility, we searched for substitutions in the promoter regions (-200 to +200 relative to the known TSS) of the human Cdx2, Hoxa9, Hoxa10, Hoxb6 and Hoxb7 genes in samples obtained from blood cancer patients. The promoter regions of these genes were sequenced by Dr. Julia Starkova from the 'Childhood Leukaemia Investigation Prague' (CLIP) institute in Prague, who has access to patient samples. These genes were chosen from among all the human Hox and Cdx genes because their expression in leukemia is aberrant, suggesting that they are associated with the development or appearance of cancers [80]. To distinguish between mutations and SNPs, we first mapped all known SNPs (based on the NCBI SNP database, dbSNP) within 700bp upstream and 700bp downstream of the known TSSs of these genes.
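Mapping known SNPs into a ±700bp window requires converting genomic coordinates into promoter coordinates, taking the gene's strand into account. This is a minimal sketch, assuming that dbSNP positions and the TSS are given in the same assembly coordinates and following the convention used throughout this work (the TSS is +1 and there is no position 0). The SNP identifiers in the test are placeholders, and the function names are hypothetical.

```python
def promoter_coordinate(genomic_pos: int, tss: int, strand: str) -> int:
    """Convert a genomic coordinate into a promoter coordinate: the TSS is +1,
    there is no position 0, and upstream positions are negative. For
    minus-strand genes, upstream lies at higher genomic coordinates."""
    offset = genomic_pos - tss if strand == "+" else tss - genomic_pos
    return offset + 1 if offset >= 0 else offset


def snps_near_tss(snps: list[tuple[str, int]], tss: int, strand: str,
                  flank: int = 700) -> list[tuple[str, int]]:
    """Known SNPs (id, genomic position) lying within `flank` bp upstream or
    downstream of the TSS, reported with their promoter coordinate."""
    hits = []
    for snp_id, genomic_pos in snps:
        rel = promoter_coordinate(genomic_pos, tss, strand)
        if -flank <= rel <= flank:
            hits.append((snp_id, rel))
    return hits
```

A SNP located exactly at the annotated TSS, such as the Hoxb6 +1 SNP described below, would be reported with promoter coordinate 1 on either strand.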
Further, for each of the genes, the sequences of dozens of patients were examined and compared to the human genome sequence (from the genome browser; 'the reference') (Table 4). We analyzed sequences of samples obtained from T-cell Acute Lymphoblastic Leukemia (T-ALL) and Acute Myeloid Leukemia (AML) patients.
Although we did not detect any mutations in the sequenced core promoter regions, several SNPs were identified. Strikingly, we detected a SNP at the known +1 position (RefSeq) of the Hoxb6 gene (dbSNP ID rs56805315), in which the more prevalent cytosine (C) is substituted by a thymine (T). Further analysis of the Hoxb6 core promoter sequence led to several findings (Figure 15):
Gene     T-ALL   AML
Cdx2     60      35
Hoxa9    20      33
Hoxa10   50      34
Hoxb6    71      31
Hoxb7    69      33
Table 4. The distribution of the patients' samples whose promoter regions were sequenced (number of samples per cancer type). The promoter regions (±200bp relative to the known TSS) of five human genes, Cdx2, Hoxa9, Hoxa10, Hoxb6 and Hoxb7, were sequenced from samples of blood cancer patients (T-ALL or AML). The sequencing was done in order to search for substitutions (mutations or SNPs) in these sequences.
a. There is an adenine (A) at the +2 position, which could be defined as the +1 position. The preceding 'C' would then be the -1 position (an A is typically designated as +1 even if transcription initiates within a few nucleotides of it).
b. These two nucleotides ('cA') are part of a sequence that partially matches the mammalian Inr consensus (4 out of 7 positions). At the appropriate spacing of 28 nucleotides downstream of this putative +1, there is a sequence that partially matches the functional range set of the Drosophila DPE motif (4 out of 6 positions) (Figure 15, top).
c. Three nucleotides downstream of the abovementioned 'A', there is another A. This DNA sequence (highlighted in yellow in Figure 15, bottom) may serve as the +1 of a motif that fully matches the Inr consensus (7 out of 7 positions). As with the first putative Inr, there is a sequence that partially matches the DPE motif (4 out of 6 positions) and is located at the appropriate distance from this putative Inr motif (Figure 15, bottom).
Figure 15. Combinations of putative Inr and DPE motifs, which are found in the core promoter
sequence of the Hoxb6 gene. The combination containing the RefSeq TSS is shown on top and the
alternative combination, which is close to the known TSS and contains a sequence that perfectly
matches the Inr consensus, is shown on the bottom. The SNP (C or T-containing alleles) is colored in
grey; matching positions to the Inr consensus or to the functional range set of the Drosophila DPE are
colored in yellow or purple, respectively. The start and end positions (-5 to +40) are relative to the
RefSeq TSS.
As shown in Figure 15 and as mentioned above, the Inr of the first combination of Inr and DPE (Figure 15, top) contains the known TSS of Hoxb6 and would thus be expected to be optimal for transcription. However, this Inr does not fully match the consensus. In contrast, the Inr of the second putative combination of Inr and DPE (Figure 15, bottom) matches the consensus perfectly. Moreover, the expression levels of Hoxb6 are known to be higher in leukemic cells than in normal hematopoietic cells [80, 96, 97]. It is therefore possible that this SNP affects, directly or indirectly, the high expression of Hoxb6, especially in blood cancers. I thus suggest the following potential mechanism:
Normally, the annotated Inr drives transcription that results in appropriate expression levels of the Hoxb6 gene. This Inr matches the consensus at only 4 positions, but these include the most important ones: 'C' at -1 and 'A' at +1 ('cA') (Figure 15, top). However, if the 'C' is replaced by a 'T' due to certain environmental influences or genetic factors, the Inr contains a 'tA' instead of a 'cA'. Although both 'C' and 'T' at position -1 match the Inr consensus, 'C' is more common than 'T'. The substitution may therefore weaken this Inr and reduce transcription from this TSS. In such a situation, an alternative TSS might be preferred: the basal transcription machinery might recognize the “second Inr” sequence, which matches the Inr consensus perfectly (Figure 15, bottom). This alternative TSS is likely to be stronger than the known TSS with a 'T' at -1. Consequently, usage of the “second Inr” might result in higher transcription levels of Hoxb6, a feature commonly associated with blood cancers.
To examine this hypothesis, minimal promoter variants of the Hoxb6 gene (-10 to +40 relative to the known TSS; wt or SNP) were cloned upstream of the firefly luciferase reporter gene. These constructs were transfected into HEK-293 cells along with the Renilla reporter plasmid for transfection-efficiency normalization. Using dual luciferase assays, we compared the activity of the SNP promoter, which contains the rare allele ('T'), to that of the wt promoter, which contains the common allele ('C'). As shown in Figure 16, the minimal promoter containing the rare allele 'T' (SNP) is 8.5-fold more active than the wt variant. This preliminary finding supports the above hypothesis. However, additional experiments are needed to strengthen it, such as primer extension and tumorigenicity-related functional assays that use the promoter in the context of the whole gene.
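The normalization and fold-change calculation behind comparisons such as the one in Figure 16 can be sketched as follows. This is a minimal Python illustration, not the analysis actually used in this work. It makes three assumptions: each firefly reading is divided by the Renilla reading from the same well, replicate wells are averaged within an experiment, and the SEM is taken across independent experiments. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Firefly/Renilla ratio for each replicate well (assumed normalization)."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_reference(test_wells, ref_wells):
    """Fold change of a test construct (e.g. SNP) over the reference (e.g. wt)
    within one experiment. Each argument is a (firefly, renilla) pair of lists."""
    test = mean(normalized_activity(*test_wells))
    ref = mean(normalized_activity(*ref_wells))
    return test / ref


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return m, sem


def summarize(experiments):
    """experiments: list of (test_wells, ref_wells), one per independent experiment."""
    folds = [fold_over_reference(test, ref) for test, ref in experiments]
    return mean_and_sem(folds)
```

In this sketch, a fold change is computed within each independent experiment before the fold changes are averaged. This keeps day-to-day differences in absolute luciferase counts from dominating the SEM.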
Figure 16. Substitution of cytosine by thymine at the known TSS of the human Hoxb6 gene enhances the expression of the reporter gene cloned downstream of it. The bar graph summarizes three independent dual luciferase experiments (each performed in triplicate) that compare minimal core promoter variants: wt ('C' at the known +1 position) or SNP ('T' at the known +1 position). Error bars represent the standard error of the mean (SEM).
Additionally, we wished to estimate the clinical effect of the SNP and examine whether its presence correlates with the severity of blood cancer. Dr. Starkova obtained data from Affymetrix (which manufactures SNP arrays) indicating that 3.5% of 170 healthy individuals are heterozygous for the rare allele 'T'. Furthermore, according to the 1000 Genomes Project data, 4.9% of 120 healthy individuals are heterozygous for the rare allele. Hoxb6 promoters were sequenced (see above, Table 4) from 102 T-ALL and AML patients in total. The results indicate a similar frequency of the rare allele among these 102 patients (~5%), with AML patients displaying a higher frequency than T-ALL patients. However, this number of patient samples is not sufficient for statistical analysis. Many more samples should be analyzed to determine whether the frequency of the rare allele is higher in patients than in the healthy population. We have analyzed all the samples available to Dr. Starkova, and obtaining samples from additional patients is expected to be a lengthy process. In addition, this SNP is rare and is barely represented in the databases currently available to us. Thus, we decided to suspend further analysis of this SNP until more samples from Dr. Starkova and more databases of patients' samples become available. Notably, we have recently identified a sequence in the Hoxb6 promoter that partially matches the TATA box element. According to the RefSeq annotation, it is located 22bp upstream of the TSS. Even though this is not the typical position of a TATA box relative to the TSS, it may still be functional. Nevertheless, in light of the progress of my other projects, we have decided to advance them and discontinue the work on the regulation of Hoxb6 expression for the time being.
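Once enough samples are genotyped, the carrier frequency in patients could be compared with that in healthy individuals using a one-sided Fisher's exact test. This comparison was not performed in this work. The sketch below uses only the Python standard library. The choice of test, the function name and the 2x2 layout of carrier counts are illustrative assumptions.

```python
from math import comb


def fisher_one_sided_greater(a, b, c, d):
    """One-sided Fisher's exact test (hypothetical helper) for the 2x2 table

                  carriers   non-carriers
        patients      a           b
        controls      c           d

    Returns P(X >= a) under the hypergeometric null of no association, i.e. the
    probability of observing at least `a` carriers among patients by chance.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    n_patients = a + b
    n_carriers = a + c
    total = a + b + c + d
    upper = min(n_carriers, n_patients)
    # Sum the hypergeometric numerators in exact integer arithmetic, then divide once.
    numerator = sum(
        comb(n_carriers, x) * comb(total - n_carriers, n_patients - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, n_patients)
```

For example, with hypothetical counts of 3 carriers among 3 patients and 0 carriers among 3 controls, the function returns 1/20 = 0.05. This shows how small groups limit the attainable significance.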
4.4. Part III: Whole genome analysis
4.4.1. Identification of human promoters that contain Drosophila DPE sequence motifs
Following the identification of the DPE motif in the human Hox gene clusters
as a test case, our search was expanded to other human candidates using
computational and experimental whole genome analysis. Whereas a few
candidates were identified in the human genome by the 'individual
examination' strategy, this approach is not systematic.
This project was performed in collaboration with Hila Shir-Shapira and Anna Sloutskin, two Ph.D. students in our lab, and Amitay Drummer from Sol Efroni's lab at Bar-Ilan University.
Towards this systematic strategy, a computational tool termed 'hDPEsearcher' was developed. The MATLAB-based hDPEsearcher code was written by Amitay Drummer and optimized by Anna Sloutskin. The software searches the human genome for putative combinations of Inr and DPE, defined by the abovementioned key sequence criteria (see pages 16-18). The detected combinations are then compared with the known RefSeq TSSs of all the annotated human protein-coding transcripts. The proximity to the TSS and the Inr and DPE match scores are considered indicative of a potential biological function of the detected DPE motif. EST clones and tags are not automatically taken into consideration by this software.
The hDPEsearcher analyzes each strand of each chromosome separately,
and its algorithm can be described by the following steps:
1) Search for DPE sequences, where each position matching the DPE receives a score of 1.
2) If the DPE score is at least 4 out of 6, search for a mammalian initiator (starting at its position -2). The initiator must be positioned so that the second position of the detected DPE ('G' or 'C') lies precisely at +29 relative to the A+1 of the initiator.
3) Calculate the mammalian Inr score.
4) An Inr and DPE combination is considered a 'hit' only if their combined score is at least 8; i.e., if the DPE is highly conserved, the corresponding Inr can be less conserved, and vice versa.
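The four steps above can be sketched in Python as follows. This is not the MATLAB hDPEsearcher code. It is a simplified re-implementation under stated assumptions:

- The mammalian Inr consensus is YYANWYY (positions -2 to +5).
- The six-position Drosophila DPE functional range set is DSWYVY (+28 to +33). Its second position, S, is 'G' or 'C', as stated in step 2.
- Both motifs are written in IUPAC codes, and each matching position scores one point.

Because promoter numbering has no position 0, +28 lies 27 nucleotides after A+1 in a 0-based string. All names are hypothetical.

```python
# IUPAC nucleotide codes -> bases they match
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
INR = "YYANWYY"    # assumed mammalian Inr consensus, positions -2..+5
DPE = "DSWYVY"     # assumed Drosophila DPE functional range set, +28..+33
INR_A_INDEX = 2    # 0-based index of A(+1) within INR
DPE_SHIFT = 27     # +28 is 27 nt after +1 (promoter numbering has no 0)


def match_score(seq, start, motif):
    """Number of positions in seq[start:start+len(motif)] matching the motif."""
    window = seq[start:start + len(motif)]
    if start < 0 or len(window) < len(motif):
        return 0
    return sum(base in IUPAC[code] for base, code in zip(window, motif))


def find_inr_dpe(seq, min_dpe=4, min_total=8):
    """Return (A+1 index, Inr score, DPE score) for each hit on one strand."""
    seq = seq.upper()
    hits = []
    for d in range(len(seq) - len(DPE) + 1):
        dpe = match_score(seq, d, DPE)          # step 1: score the DPE
        if dpe < min_dpe:                       # step 2: require >= 4 of 6
            continue
        a_plus1 = d - DPE_SHIFT                 # puts DPE position 2 at +29
        inr_start = a_plus1 - INR_A_INDEX       # Inr starts at -2
        if inr_start < 0:
            continue
        inr = match_score(seq, inr_start, INR)  # step 3: score the Inr
        if inr + dpe >= min_total:              # step 4: combined score >= 8
            hits.append((a_plus1, inr, dpe))
    return hits


def revcomp(seq):
    """Reverse complement (N is kept as N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def scan_both_strands(seq):
    """Scan each strand separately; minus-strand indices refer to the reverse complement."""
    return {"+": find_inr_dpe(seq), "-": find_inr_dpe(revcomp(seq))}
```

Minus-strand hits are reported in reverse-complement coordinates. They would need to be converted back to chromosome coordinates before being compared with TSS annotations.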
In addition to the chromosomal coordinates of the putative Inr and DPE
combinations, the software plots the combinations found against the genomic
coordinates for each chromosome. These graphs describe a rather uniform
distribution of putative Inr and DPE combinations along the chromosome, with
some regions containing peaks of Inr and DPE combinations frequencies.
However, no significant correlation between the peaks and potential biological
function was discovered.
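The per-chromosome plots described above amount to counting hits in windows along each chromosome. The following is a minimal Python sketch. It assumes fixed-size, non-overlapping windows and flags "peak" windows by a z-score cutoff. The window size, the cutoff and the function names are illustrative assumptions; the thesis does not state how peaks were defined.

```python
from collections import Counter
from statistics import mean, pstdev


def hit_density(positions, chrom_length, bin_size=1_000_000):
    """Count hits per non-overlapping window of bin_size nt along one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(p // bin_size for p in positions if 0 <= p < chrom_length)
    return [counts.get(i, 0) for i in range(n_bins)]


def peak_bins(counts, z=3.0):
    """Indices of windows whose count is at least z standard deviations above the mean."""
    m, s = mean(counts), pstdev(counts)
    if s == 0:
        return []
    return [i for i, c in enumerate(counts) if (c - m) / s >= z]
```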
The list of all human TSS locations was extracted from the UCSC Table Browser with the help of Dr. Tirza Doniger. Alternative splice variants starting at the same position were considered a single TSS, denoted by its official gene name. Putative Inr and DPE combinations found within ±5bp of known TSSs were retained for further analysis.
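The collapsing of alternative transcripts and the ±5bp proximity filter can be sketched as below. This is a minimal Python illustration with three assumptions. Each record has the layout (chromosome, strand, TSS coordinate, gene name). Each hit is represented by its A+1 coordinate. Hits and TSSs use the same coordinate convention. The function names are hypothetical.

```python
def unique_tss(transcripts):
    """Collapse transcripts sharing chromosome, strand and TSS into one TSS.

    transcripts: iterable of (chrom, strand, tss, gene_name); the first gene
    name seen for a given TSS is kept.
    """
    tss = {}
    for chrom, strand, pos, gene in transcripts:
        tss.setdefault((chrom, strand, pos), gene)
    return tss


def hits_near_tss(hits, tss, window=5):
    """Keep hits (chrom, strand, a_plus1) lying within +/- window nt of a known TSS.

    Returns (chrom, strand, a_plus1, gene, offset) with offset = TSS - a_plus1.
    """
    near = []
    for chrom, strand, pos in hits:
        for offset in range(-window, window + 1):
            gene = tss.get((chrom, strand, pos + offset))
            if gene is not None:
                near.append((chrom, strand, pos, gene, offset))
    return near
```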
Furthermore, the computationally identified putative Inr and DPE combinations were manually filtered. We checked whether these combinations are supported by EST clones and tags (using the genome browser and dbTSS, respectively). We also checked whether they coincide with promoters of genes associated with specific biological processes, such as development, cell cycle, proliferation, apoptosis and differentiation. Following this selection, 11 human genes were considered the most promising candidates for experimental validation.
The genes are: p21, tp53inp2, ccnd1, proS1, twist2, snail1, cdc25a, cdc25b, cdc34, Hoxb6 and Hoxd13 (see Appendix 6). For each of these genes, wt and mutant DPE (mD) versions of the minimal promoter (-10 to +40, relative to the A+1 position of the detected combination) were cloned into the firefly luciferase pGL3 vector. The resulting plasmids were co-transfected into HEK-293 cells along with the TK Renilla luciferase reporter plasmid for transfection-efficiency normalization. The relative activity of the mD version, compared to the wt, was examined using dual luciferase assays. All experiments were performed at least twice, and each experiment was done in triplicate.
Contrary to our expectations, no major differences between the transcriptional activity of the wt and the mD minimal promoters were observed for any of the analyzed promoters (Figure 17). Two genes showed an apparent reduction of expression upon mutation of the DPE (i.e., ccnd1 and proS1). A close examination of these genes revealed a difference in the activity of the normalizing Renilla luciferase (RL) between the wt- and mD-transfected cells. This apparent reduction therefore seems to reflect a normalization difference rather than DPE function, and the general conclusion appears to apply to all the analyzed promoters.
Figure 17. Multiple human promoters contain matching Drosophila DPE sequences; however, these motifs are not functional. The bar graph illustrates dual luciferase experiments that compare minimal promoter variants. Mutant DPE-containing promoter-driven constructs (mD) were compared to wild-type (wt) promoter-driven constructs. Overall, there is no substantial difference in expression between the wt and the mD versions of the minimal core promoters. Error bars represent standard deviations.
n=4 for p21, ccnd1, twist, and Hoxd13. n=3 for tp53inp2, proS1, cdc25a, cdc25b, and cdc34. n=2 for snail and Hoxb6.
These findings could be explained by experimental design issues and by
the evolutionary distance between Drosophila and humans (see discussion).
It is noteworthy that two additional human Hox genes, Hoxb6 and Hoxd13, were examined in this project (this Hoxb6 transcript has a different TSS from the Hoxb6 TSS mentioned above). In the 'individual examination' project, these candidates had not been considered promising enough to have a functional DPE (see Table 1 and its related text). These findings and this computational project emphasized the need for a bioinformatics tool that automatically detects core promoter elements.
4.4.2. ElemeNT- a core promoter Elements Navigation Tool
Indeed, no available resource allows the identification of all the specific core promoter elements and their potential combinations within a given sequence. Hence, almost every annotation of core promoter elements in a sequence of interest described above was performed manually. To automate this process and reduce the time needed to scan dozens of sequences manually, Anna Sloutskin and I have developed the 'core promoter Elements Navigation Tool' (ElemeNT).
A paper describing this work is under review (see Appendix 7).
Briefly, ElemeNT is a web-based, interactive tool (implemented in Perl) for rapid and convenient detection of core promoter elements and their combinations within any given sequence.
It is accessible at http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description (password-protected until publication; username: GershonLab, password: TJGL2014). ElemeNT searches the input sequences for the presence of core promoter elements specified by the user. The elements are represented by position weight matrices (PWMs), which are constructed from experimentally validated, biologically functional sequences.
The elements that can be searched for are: mammalian Initiator, Drosophila Initiator, TATA box, MTE, DPE, Bridge, BREu, BREd, human TCT, Drosophila TCT, XCPE1 and XCPE2. The MTE, DPE and Bridge motifs are only scored at the precise location relative to each detected mammalian or Drosophila Initiator, based on their known strict spacing requirements. The scores are normalized to be between 0 and 1, which makes the results more intuitive to interpret. For each element, the user should specify a threshold between 0 and 1, which determines whether the element is considered present at a given position. Default threshold values were determined empirically for each element, based on known functional sequence elements, and are provided.
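The thesis does not give ElemeNT's exact scoring formula. A common way to obtain PWM scores between 0 and 1 is min-max normalization of a log-odds score, and the Python sketch below illustrates that idea. It is not the Perl implementation. The pseudocount, the uniform background of 0.25, the skipping of windows with ambiguous bases and all function names are assumptions for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Build a log2-odds PWM from per-position base counts (one dict per position)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def normalized_score(pwm, site):
    """Min-max normalize the PWM score of `site` to [0, 1] (1 = best possible site)."""
    raw = sum(column[b] for column, b in zip(pwm, site))
    worst = sum(min(column.values()) for column in pwm)
    best = sum(max(column.values()) for column in pwm)
    return (raw - worst) / (best - worst) if best > worst else 1.0


def scan(pwm, seq, threshold):
    """Return (start, score) for every window whose normalized score >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set(BASES):  # skip windows containing ambiguous bases such as N
            continue
        score = normalized_score(pwm, window)
        if score >= threshold:
            hits.append((i, score))
    return hits
```

Under this normalization, a user-defined threshold between 0 and 1 has the same meaning for every element regardless of motif length. This is one reason such a scale is convenient for element-specific default thresholds.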
The output of the program contains the analyzed sequence, a color display of
some possible core promoter elements combinations found, and a table
containing each of the detected elements alongside its position, PWM and
consensus match scores. A sample output of the ElemeNT program is
depicted in Figure 18.
In addition to automating the annotation of core promoter elements, ElemeNT uses PWM data, rather than consensus sequences, to score the putative motifs. PWMs better reflect the biological significance of the nucleotide distribution at each position, which is hard to account for when annotating sequences manually. Notably, for some elements the PWM differs from the defined consensus, reflecting differences in how the sequences were analyzed by different labs.
Figure 18. A sample output of the ElemeNT program. ElemeNT has detected a TATA box flanked by both a BREu and a BREd element, Drosophila and mammalian initiator elements, and MTE, DPE and Bridge elements. The sample input sequence (top), the combinations of elements identified in it (middle) and a table of the detected elements (bottom) are shown. The two possible combinations result from a sequence match to both the Drosophila and mammalian initiators, owing to the partial sequence redundancy of the two elements. The table displays all the elements identified within the sample input sequence, their location, and their PWM and consensus match scores. Note the message displayed for the TATA box, indicating the presence of mammalian and Drosophila initiators, as well as BREu and BREd, at optimal distances for transcriptional synergy.
4.5 Part IV: Identification of binding sites of human TAF6 and TAF9, subunits of TFIID, in human promoters
Until this point in my study, our strategies for identifying functional DPE motifs in human promoters relied on the strict spacing and sequence requirements of the DPE motif as characterized in Drosophila [47-49, 91]. Publications have demonstrated the transcriptional dependency of two human genes on an element that matches the definition of the Drosophila DPE [48, 53]. Nevertheless, using these strict features, we barely succeeded in identifying other functional DPE-containing human genes. However, one aspect had not been taken into consideration: the proteins that bind this element. Originally, the Drosophila DPE was defined as the recognition and binding site of the TFIID subunits TAF6 and TAF9 [47, 48]. More recently, footprinting assays and structural studies of the human TFIID multi-protein complex (hTFIID) were performed on the synthetic super core promoter. They demonstrated that hTFIID binds to the TATA box, Inr, MTE and DPE sequences contained in this promoter [25, 98].
Hence, as an alternative approach towards the identification of human DPE-containing promoters, we decided to identify the binding sites of human TAF6 and human TAF9 in human promoters. In contrast to the previous strategies, this strategy is independent of the spacing requirement, location and sequence of the Drosophila DPE. To this end, human stable cell lines were generated using the Flp-In™ system (Figure 19), in order to subsequently perform a high-resolution Chromatin Immuno-Precipitation (ChIP) assay. The Flp-In system allows single-copy integration of a plasmid of interest at a single, defined genomic site (see for example [99]).
Figure 19. Illustration of the Flp-In™ system. HEK-293 Flp-In cells are co-transfected with the pcDNA5/FRT vector, which contains a gene of interest (here, the tagged human taf6 or human taf9 genes), and the pOG44 plasmid (a Flp-recombinase expression plasmid). Flp recombinase catalyzes homologous recombination between the FRT site in the pcDNA5/FRT vector and the FRT site located at a specific, known locus in the genome of the cells. Transfected cells express the desired gene and become Hygromycin B resistant and Zeocin sensitive. This integration disrupts the ORF of the lacZ gene that has been integrated in the genome of the HEK-293 Flp-In parental cells.
Four constructs were generated in order to create four different stable cell lines, each expressing one of the four tagged-TAF protein versions. Each TAF cDNA (TAF6 or TAF9) was cloned into the pcDNA5/FRT vector with two short tandem protein tags, FLAG and HA. These tags were cloned in frame immediately upstream or downstream of the TAF coding sequence (N- or C-terminal, respectively). Tagging was required because ChIP-grade antibodies against the TAFs were unavailable to us.
The four plasmids are:
a. TAF6 with a C-term. FLAG-HA tag.
b. TAF6 with an N-term. FLAG-HA tag.
c. TAF9 with a C-term. FLAG-HA tag.
d. TAF9 with an N-term. FLAG-HA tag.
Each of the above plasmids was co-transfected with the pOG44 plasmid into cultured HEK-293 Flp-In cells. The pOG44 plasmid expresses the Flp recombinase enzyme, which catalyzes homologous recombination between the two FRT sites. One FRT site is located just downstream of the tagged TAFs in the pcDNA5/FRT vector. The second FRT site is located within the ORF of the lacZ gene that is integrated in the HEK-293 Flp-In genome. Consequently, the tagged TAFs should be integrated specifically through recombination between these two FRT sites (Figure 19). Starting one day post-transfection, cells were grown in Hygromycin-containing medium to select for stably integrated clones.
To examine whether the tagged-TAF-containing plasmids were properly integrated into the dedicated site in the genome, three distinct tests were performed. The first test was whether the cells express β-galactosidase, which is encoded by the lacZ gene. When β-galactosidase hydrolyzes its chromogenic substrate X-gal, the cells turn blue. However, because the integration site is located within the lacZ gene, insertion of the tagged-TAF pcDNA5/FRT plasmid should disrupt the lacZ ORF. Indeed, the stable cell lines with a successful integration event for each tagged-TAF variant remained mostly white when assayed for X-gal activity (Figure 20).
We next used PCR to test for the presence of two FRT sites in the stable
cell lines. Genomic DNA of each of these stable cell lines was extracted and
used as a template for PCR with specific primers for the FRT sites (Figure 21).
Each amplified PCR product was gel-purified and sequenced.
Figure 20. TAF6 and TAF9 are stably integrated in the stable cell lines, as determined by the
X-gal assay. Photos of three different areas of a well of each of the four Hygromycin-resistant stable
cell lines taken following three weeks of antibiotic selection and after fixation and incubation with an
X-gal-containing solution.
(A-C) TAF6 with a C-term. FLAG-HA; (D-F) TAF6 with an N-term. FLAG-HA; (G-I) TAF9 with a C-
term. FLAG-HA; (J-L) TAF9 with an N-term. FLAG-HA. (M) A positive control of untransfected
parental HEK-293 flp-in cells.
White cells indicate successful integration, whereas blue cells indicate unsuccessful integration.
Figure 21. Two FRT sites are present around the integration site in the stable cell lines. To verify successful integration into the HEK-293 Flp-In cells, genomic DNA was PCR-amplified with specific primers for each of the FRT sites. A. Extracted genomic DNA from each of the
stable cell lines was run on a 1% agarose gel. As expected, the genomic DNA appeared at the top of
the gel. M, 1Kb DNA Marker. B. PCR using SV40 promoter- and Hygromycin gene-specific primers
to detect the FRT site that is located upstream of the tagged-TAFs-pcDNA5/FRT integrated
plasmids. The expected product size is marked by a white arrow. M, 1Kb Marker. C. PCR using BGH
poly A sequence- and lacZ gene-specific primers for the FRT site that is located downstream of the
tagged-TAFs-pcDNA5/FRT integrated plasmids. The expected product size is marked by a white
arrow. M1, 1Kb DNA Marker. M2, 100 bp DNA Marker.
N.T, No Template control. P.C, Positive-Control - a known sample that was previously tested. N.C,
Negative-Control - untransfected parental HEK-293 flp-in cells.
We next examined whether the four stable cell lines express the tagged TAFs by Western blotting (WB). Whole-cell extracts were analyzed with anti-FLAG, anti-HA, anti-TAF6 or anti-TAF9 antibodies (Figure 22). TAF6 was detected in all four stable cell lines. Overexpressed TAF6 was detected using anti-FLAG, anti-HA or anti-TAF6 antibodies (Figure 22A, B). Notably, we could detect endogenous TAF6 in the TAF9-overexpressing cell lines (Figure 22B). TAF9 was barely detectable by WB using the abovementioned antibodies (Figure 22A, C).
Figure 22. The stable cell lines express tagged TAF6 and TAF9 protein variants.
A. Western blot (WB) analysis using anti-HA (left panel) or anti-FLAG (right panel) antibodies. The tagged-TAF6 versions are detected.
B. WB analysis using anti-TAF6 antibodies. TAF6 versions are overexpressed in extracts of the tagged-TAF6-containing cell lines. TAF6 is detected at ~75kDa.
C. WB analysis using anti-HA antibodies against protein lysates from tagged-TAF9 stable cell lines. A ~33kDa band (as expected) is detected in N-terminally tagged TAF9 cells, and a similar but very weak band is detected in the C-terminally tagged TAF9 cells.
C' and N' represent the position of the tandem tags (FLAG and HA) relative to the TAF6 or TAF9 proteins. M, protein marker (kDa).
To further examine the expression of TAF9, we immunoprecipitated cell extracts using anti-FLAG beads and subjected the precipitates to WB analysis using anti-FLAG antibodies.
Unfortunately, we could not detect TAF9 following immunoprecipitation (IP) (data not shown). The difficulty in detecting TAF9 (both tagged and endogenous) may be explained by the biology of TAF9 and its functions that are independent of basal transcription (see discussion). Nevertheless, we next focused on the ChIP experiments.
I next developed a ChIP-seq protocol for HEK-293 Flp-In cells by combining different published protocols, and optimized it experimentally (data not shown). In addition, we initiated a collaboration with Dr. Julia Zeitlinger's lab (Stowers Institute for Medical Research).
This collaboration allows us to use a novel, as yet unpublished ChIP method with higher resolution than traditional methods such as ChIP-seq. This method, termed ChIP-nexus, was developed in the Zeitlinger lab to detect transcription factor binding sites in vivo with nucleotide resolution.
We shipped the four stable cell lines to the Zeitlinger lab, where ChIP-nexus as well as ChIP-seq experiments are currently being performed on them using anti-FLAG and anti-HA antibodies.
5. Discussion
In search of functional human DPE motifs: analysis of human Cdx/Hox promoters and computational whole genome analysis
Individual examination of the promoters of human Cdx and Hox genes was performed. Dual luciferase experiments on both minimal promoter and enhancer-promoter constructs demonstrated that several Hox promoters contain functional DPE motifs that contribute to transcription from these promoters. Similar results were obtained when the TATA box and DPE sequences of the Cdx1 promoter or the DPE sequence of the Cdx2 promoter were mutated. These observations can be explained by experimental evidence that the TATA box, Inr and DPE serve as binding sites for different components of the TFIID multi-protein complex, a major PIC component [20, 47-50]. Hence, upon mutation of these motifs, the basal transcription machinery presumably binds the promoter only through a sub-optimal Inr motif. This leads to an unstable PIC architecture, which in turn results in reduced transcription.
Additionally, sequence analysis of the core promoter regions of all the human Hox transcripts indicated that the vast majority of human Hox promoters contain DPE sequence motifs but not TATA box sequences.
These functional and in silico findings support the notion that the Hox genes, despite duplication and divergence events during evolution, are largely conserved. The conservation of the Hox genes was previously shown only at the protein level, by comparing the amino acid composition of the homeodomains of different genes. Our analyses reveal conservation at the DNA level and, more importantly, in the promoters of these genes.
Based on these findings, several consequences may be suggested. First,
synchronized and cooperative regulation of the expression of Hox genes in
development and differentiation could be achieved by a shared transcriptional
mechanism driven by specific master regulators (e.g. CDX protein family).
Second, as a complementary view, it suggests that similarly to the Drosophila
Hox genes [50], human Hox genes may be preferentially regulated through
specific core promoter elements, such as the DPE.
However, we have also shown that DPE-sequence-containing promoters of other Hox genes, as well as of other human genes, did not contain functional DPE motifs. Technically, the constructs used in the dual luciferase assays contained only the 50-nucleotide minimal promoter. They may thus fail to reveal the actual transcriptional difference between wt and mutant DPE constructs if this difference is minor. Moreover, this assay measures transcriptional activity indirectly, by quantifying the enzymatic activity of the reporter protein produced from the gene under the control of these promoters. Hence, it detects only the products of transcripts that were correctly translated into active enzyme, rather than all the transcripts produced from the reporter gene. Therefore, the measurements could be affected by luciferase mRNA processing and stability and by the degradation of unfolded firefly luciferase proteins.
The 'PCP project' defined Inr and DPE combinations in the human Hox clusters computationally, based on the sequences of Drosophila Hox promoters. Unfortunately, we did not identify any new functional DPE-containing human genes using this approach. Notably, the TSSs of the human genome annotated in the genome browser are known to be not always accurate, as seen for the updated TSS of Cdx2. Furthermore, recent studies revealed transcription initiation in enhancers [100-104], and recent work from the Lis lab indicated a shared architecture between promoters and enhancers [105, 106]. Thus, some of the final PCPs may theoretically encompass real TSSs. Importantly, these results emphasize that transcription, even at the basal level, is regulated by multiple factors in addition to sequence composition. Notably, in contrast to the Hox genes, the putative DPE-containing promoters of the genes analyzed in the 'whole genome' project are less conserved from Drosophila to humans.
Hence, despite the identification of a few human genes that conform to the Drosophila DPE definition, Drosophila and humans are evolutionarily distant. The precise functional Drosophila DPE sequence might have evolved over time into a modified set of nucleotides in humans. Consequently, the strict requirements for a functional initiator and precise spacing present in Drosophila might be altered. These conclusions motivated the development of an assay that would be independent of the Drosophila DPE definition and its sequence features (see Part IV of the Results and the discussion below).
Identification of a SNP in the +1 position of the Hoxb6 gene and its potential implications in health and disease
Screening of dozens of core promoter sequences of T-ALL and AML patients revealed an interesting SNP in the human Hoxb6 gene. This SNP (cytosine to thymine) is located at the known +1 position of the Hoxb6 transcript. In dual luciferase experiments, it enhanced reporter activity approximately 8.5-fold compared to the common allele. It can be speculated that this substitution contributes to the overexpression of Hoxb6, a frequent phenotype in the blood cancers mentioned above. We have proposed a model of how this SNP could promote the utilization of an alternative TSS with a potentially stronger Inr and DPE combination. Notably, there are many examples in which aberrant expression of genes that regulate embryogenesis has been implicated in carcinogenesis [56]. The DPE element has been associated with developmental genes and developmental regulators. Thus, it could be speculated that not only this SNP and others, but also the DPE itself, might have a role in cancer development. If so, along with previous data regarding TATA box sequence aberrations and their correlation with diseases [95], this part of my thesis strengthens the case for clinical implications of core promoter elements.
Identification of the binding sites of human TAF6 and TAF9, subunits of
TFIID, in human promoters
In order to characterize the DPE motif in human promoters independently of the
strict requirements of the Drosophila DPE motif, stable cell lines expressing
tagged human TAF6 or human TAF9 were generated in the last stage of this work.
To validate the integration and expression of the TAFs, three tests were
performed. While the X-gal and integration tests were successful for all four
cell lines, the protein expression data indicated high TAF6 overexpression but
mostly undetectable tagged and endogenous TAF9 expression (Figure 22). Several
explanations may account for our inability to detect TAF9. First, the half-life
of TAF9 is shorter than that of TAF6. Thus, it may be difficult to detect TAF9
by WB, although TAF9 was previously detected by WB [107]. Moreover, the N- or
C-terminal FLAG-HA tags may affect the stability of TAF9 more than that of
TAF6, because TAF9 is the smaller of the two proteins.
Second, TAF9 was shown to contribute to the tumor suppressor activity of p53
through protein-protein interactions [108, 109]. Thus, a possible (though not
highly likely) explanation for the barely detectable TAF9 in the stable cell
lines is that TAF9 is degraded in HEK-293 flp-in cells, which have an abnormal
karyotype and were previously shown to generate tumors in nude mice [110].
Third, recent ChIP-chip experiments aimed at identifying hTFIID complexes at
active promoters in human embryonic stem cells (hESCs) revealed non-canonical
hTFIID complexes composed of only six TAFs, which contain TAF6 but not TAF9
[111]. Therefore, endogenous TAF9 may not be expressed in some human cells of
embryonic origin. This, however, does not account for our inability to detect
the ectopic TAF9, which is driven by a CMV enhancer-promoter.
Nevertheless, all four stable cell lines have been used for ChIP-nexus
experiments, which allow the identification of the DNA binding sites of the
ChIPed TAFs at single-nucleotide resolution. With this method, it will be
possible, for the first time, to characterize de novo a core promoter element
(a human DPE) that is the functional equivalent of the Drosophila DPE motif,
in a manner that is independent of the original DPE.
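In ChIP-nexus, as in ChIP-exo, an exonuclease trims the immunoprecipitated DNA up to the cross-linked protein, so the 5' ends of the sequenced reads pile up at the borders of the protein's footprint. The first analysis step, tallying read 5' ends per strand at each nucleotide, can be sketched in Python as below. The read tuple format (0-based leftmost start, aligned length, strand) is a hypothetical simplification, not the output format of any particular aligner or of the ChIP-nexus pipeline itself.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical read representation: (0-based leftmost start, aligned length, strand)
Read = Tuple[int, int, str]


def five_prime_end_counts(reads: Iterable[Read]) -> Tuple[Counter, Counter]:
    """Count read 5' ends at each position, separately for each strand.

    On the + strand the 5' end is the leftmost aligned base; on the
    - strand it is the rightmost aligned base (start + length - 1).
    """
    plus, minus = Counter(), Counter()
    for start, length, strand in reads:
        if strand == "+":
            plus[start] += 1
        elif strand == "-":
            minus[start + length - 1] += 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return plus, minus


# Hypothetical reads: two + strand reads stopping at 100, one - strand read ending at 185.
reads = [(100, 36, "+"), (100, 30, "+"), (150, 36, "-")]
plus, minus = five_prime_end_counts(reads)
print(dict(plus), dict(minus))  # {100: 2} {185: 1}
```

Peaks of 5'-end counts on the two strands bracket the protected footprint; a real analysis would also remove PCR duplicates and aggregate signal around candidate TSSs, which this sketch omits.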
To conclude, in this thesis, which is composed of five related but
independent projects, we aimed to identify and characterize a DPE motif in
the human genome. Based on previous knowledge of the original DPE motif, which
has been extensively studied in Drosophila, on computational and experimental
evidence obtained using the human Hox genes as a test case, and finally on the
ChIP-nexus approach, the hDPE can be regarded as a human core promoter element
with potential links to gene expression in blood cancers.
6. Materials and Methods
Plasmids received
1. TK Renilla luciferase- This plasmid contains the Renilla luciferase gene
that is controlled by the Thymidine Kinase promoter. This plasmid was
received from the lab of Prof. Yaron Shav-Tal.
2. pBlueScript- A commercial plasmid (Stratagene).
3. pGL3 modified basic- This is the commercial pGL3 basic plasmid (Promega)
with a different multiple cloning site. 'Basic' means that this plasmid lacks
enhancer and promoter sequences.
4. pOG44- A commercial plasmid (Life Technologies).
Construction of Plasmids
1. Cdx1-pGL3- the core promoter versions (-40 to +40 relative to the TSS) of
Cdx1 (wt, mTATA, mDPE, mTATAmDPE) were cloned into the pGL3 modified basic
vector using a “Drop-In” procedure. To generate these plasmids, the Cdx1
promoter was divided into two parts that were ligated to each other and to
the vector simultaneously (a 'three-way ligation'). The designed primers
(ordered from IDT) included a KpnI-compatible sticky end at the 5' end of the
upstream part, an SpeI-compatible sticky end at the 3' end of the downstream
part and complementary cohesive ends between the two parts. Constructs were
verified by DNA sequencing (Hy Labs).
2. Cdx2/Hox/other promoters-pGL3- the minimal promoter versions (-10 to +40
relative to the TSS; wt and mDPE, or wt and SNP for Hoxb6 only) of the genes
Cdx2, Hoxa1, Hoxa2, Hoxa9, Hoxa11, Hoxb3, Hoxb6, Hoxc6, Hoxc8, Hoxd3, Hoxd9,
Hoxd10, Hoxd13, twist2, snail1, cdc25a, cdc25b, cdc34, proS1, ccnd1, p21 and
tp53inp2 were cloned into the pGL3 modified basic vector using a “Drop-In”
procedure (primers were ordered from IDT). The minimal promoter versions of
the Hoxa6 and Hoxb9 genes were excised with the PstI and XbaI restriction
enzymes from pUC119 plasmids containing these minimal promoters (previously
constructed by Dr. Juven-Gershon) and purified. Constructs were verified by
DNA sequencing (Hy Labs).
3. enhancer-promoter wt Hox-pGL3- these constructs, which include the
promoter regions of five Hox genes (Hoxa1, Hoxa2, Hoxa11, Hoxb9 and Hoxc8)
together with up to ~1 kb of upstream sequence, were generated by PCR on
genomic DNA purified from cultured HEK-293 cells, using gene-specific primers.
The PCR primers for each Hox gene are listed below (cloned positions are
noted):
Hoxa1 (from -987 to +145 relative to the known TSS)
- Forward primer: 5'- CTCCTACCCCTAAAAATCCGGCGGTC -3'
- Reverse primer: 5'- ACTGCTAAGTATGGGGTATTCCAGGAAGGA -3'
Hoxa2 (from -531 to +160 relative to the known TSS)
- Forward primer: 5'- CTTTCTCCATCTCTCAAACTCTCTCTTCTTC -3'
- Reverse primer: 5'- CGCTGCTAGGGTGTTTTTTTTCTAATTCAC -3'
Hoxa11 (from -800 to +98 relative to the known TSS)
- Forward primer: 5'- GATCCCGGGTAAGACGAAGGCCCT -3'
- Reverse primer: 5'- CAGGGACCACGCTCATCAAAATCCATT -3'
Hoxb9 (from -986 to +69 relative to the known TSS)
- Forward primer: 5'- GTGGCCTTAACCCTTTCTCCTATTTAGCTCCCTCATCAG -3'
- Reverse primer: 5'- CACCCCCTGCTCAACTTCTCAGCCAACAAAGTA -3'
Hoxc8 (from -963 to +141 relative to the known TSS)
- Forward primer: 5'- CCAGCTAGAAACCAGGGACACACAGCT -3'
- Reverse primer: 5'- CTCACGAGTACCCCGCCCAGTACC -3'
PCR products were first cloned into the pJET 2.1 blunt vector and then
transferred into the pGL3 vector using restriction enzymes.
Construction was verified by DNA sequencing (Hy Labs).
4. Enhancer-promoter mDPE Hox-pGL3- constructs were generated by
site-directed mutagenesis following Stratagene's QuikChange protocol.
Complementary primers (IDT) contained the mismatched nucleotides that mutate
the DPE motif, flanked by the surrounding template sequence. The
enhancer-promoter wt Hox-pGL3 plasmids served as templates for the
mutagenesis reaction. Following the mutagenesis PCR, the reactions were
incubated with DpnI, which cleaves only methylated DNA, to digest the
bacterially derived template plasmids while leaving the newly synthesized,
unmethylated mutant DNA intact. Sequence-verified fragments encompassing the
mutated nucleotides were then subcloned into their corresponding locations in
the wt vectors.
5. FLAG-HA tagged TAF6/9 versions-pcDNA5/FRT- this set of plasmids
contains four types of inserts: a) TAF6 with a C-term. FLAG-HA tag, b)
TAF6 with an N-term. FLAG-HA tag, c) TAF9 with a C-term. FLAG-HA tag and
d) TAF9 with an N-term. FLAG-HA tag. To preserve the correct open reading
frame spanning both the tag and the TAF coding sequences, these plasmids were
constructed in two steps. First, I cloned the FLAG-HA encoding sequences,
with an added methionine codon (N-term. tags) or stop codon (C-term. tags),
into the pcDNA5/FRT vector by the “Drop-In” procedure. These inserts are
flanked by NheI and KpnI sites (N-term. tags) or by KpnI and XhoI sites
(C-term. tags) (for the tag sequences and related details, see Appendix 8).
Next, I performed PCR on pET vectors containing taf6 or taf9 (received from
Prof. Rivka Dikstein, Weizmann Institute, Rehovot), with primers that lack a
methionine codon (for N-term. tags) or a stop codon (for C-term. tags). The
primers are:
- forward TAF9 C-term. tags: 5'- GCTAGCATGGAGTCTGGCAAGACG -3'
- reverse TAF9 C-term. tags: 5'- GGTACCCAGATTATCATAGTCATCATCATCATCAT -3'
- forward TAF9 N-term. tags: 5'- GGTACCGAGTCTGGCAAGACGGCTT -3'
- reverse TAF9 N-term. tags: 5'- CTCGAGTTACAGATTATCATAGTCATCATCATCATCATCGTC -3'
- forward TAF6 C-term. tags: 5'- GCTAGCATGGCTGAGGAGAAGAAGCTGAAGCTTAGC -3'
- reverse TAF6 C-term. tags: 5'- GGTACCCGGAGCAGGCTGAGGGGA -3'
- forward TAF6 N-term. tags: 5'- GGTACCGCTGAGGAGAAGAAGCTGAAGCTT -3'
- reverse TAF6 N-term. tags: 5'- CTCGAGTCACGGAGCAGGCTGAGG -3'
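GCTAGC - NheI site, GGTACC - KpnI site and CTCGAG - XhoI site. In the reverse
primers for N-term. tags, the stop codons appear as their reverse complements
(TTA and TCA) immediately after the XhoI site.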
The PCR products were cloned into the corresponding FLAG-HA-tagged
pcDNA5/FRT vectors using the abovementioned restriction enzymes. The plasmids
are schematically illustrated in Appendix 8(B-E).
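The primer design rules in items 4 and 5 can be checked mechanically. The Python sketch below, whose helper names are all hypothetical, (i) confirms that each TAF primer listed above begins with its intended restriction site and that the ATG or stop codon is present or absent as described, and (ii) builds a QuikChange-style pair of fully complementary mutagenic primers around a mutation. The 15-nt flank length and the toy template used in the checks are illustrative assumptions, not the primers actually used for the mDPE constructs.

```python
from typing import Optional, Tuple

SITES = {"NheI": "GCTAGC", "KpnI": "GGTACC", "XhoI": "CTCGAG"}
STOPS = {"TAA", "TAG", "TGA"}
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMP)[::-1]


def leading_site(primer: str) -> Optional[str]:
    """Name of the enzyme whose 6-bp site starts the primer, if any."""
    for enzyme, site in SITES.items():
        if primer.startswith(site):
            return enzyme
    return None


def codon_after_site(primer: str) -> str:
    """The three bases immediately 3' of the 6-bp restriction site."""
    return primer[6:9]


def encodes_stop(reverse_primer: str) -> bool:
    """A reverse primer carries a stop codon as its reverse complement."""
    return revcomp(codon_after_site(reverse_primer)) in STOPS


# Primer sequences as listed in the Methods.
PRIMERS = {
    "TAF9 C-term fwd": "GCTAGCATGGAGTCTGGCAAGACG",
    "TAF9 C-term rev": "GGTACCCAGATTATCATAGTCATCATCATCATCAT",
    "TAF9 N-term fwd": "GGTACCGAGTCTGGCAAGACGGCTT",
    "TAF9 N-term rev": "CTCGAGTTACAGATTATCATAGTCATCATCATCATCATCGTC",
    "TAF6 C-term fwd": "GCTAGCATGGCTGAGGAGAAGAAGCTGAAGCTTAGC",
    "TAF6 C-term rev": "GGTACCCGGAGCAGGCTGAGGGGA",
    "TAF6 N-term fwd": "GGTACCGCTGAGGAGAAGAAGCTGAAGCTT",
    "TAF6 N-term rev": "CTCGAGTCACGGAGCAGGCTGAGG",
}


def quikchange_pair(template: str, pos: int, new: str, flank: int = 15) -> Tuple[str, str]:
    """Mutagenic forward primer (new bases centred between template flanks)
    and its fully complementary reverse primer; pos is 0-based."""
    if pos < flank or pos + len(new) + flank > len(template):
        raise ValueError("not enough template sequence for the flanks")
    fwd = template[pos - flank:pos] + new + template[pos + len(new):pos + len(new) + flank]
    return fwd, revcomp(fwd)


for name, seq in PRIMERS.items():
    if name.endswith("fwd"):
        note = "ATG" if codon_after_site(seq) == "ATG" else "no ATG"
    else:
        note = "stop" if encodes_stop(seq) else "no stop"
    print(f"{name}: {leading_site(seq)}, {note}")
```

Because QuikChange primers are fully complementary, the reverse primer is simply the reverse complement of the mutagenic forward primer, which is why one `revcomp` helper serves both checks.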
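Because TSS-relative numbering has no position 0 (+1 is the first transcribed base and -1 the base immediately upstream), a region that spans the TSS is one base shorter than the simple "end minus start plus one". The small Python helper below (a hypothetical name) applies this rule to the cloned positions listed above. It assumes that the noted positions are the exact ends of each amplicon.

```python
def region_length(start: int, end: int) -> int:
    """Length (bp) of the inclusive region start..end in TSS-relative
    coordinates, where position 0 does not exist."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in TSS-relative numbering")
    if start > end:
        raise ValueError("start must be upstream of end")
    length = end - start + 1
    if start < 0 < end:  # the region crosses the -1/+1 boundary, skip the missing 0
        length -= 1
    return length


# Cloned positions as listed in the Methods.
CONSTRUCTS = {
    "Hoxa1": (-987, 145),
    "Hoxa2": (-531, 160),
    "Hoxa11": (-800, 98),
    "Hoxb9": (-986, 69),
    "Hoxc8": (-963, 141),
    "minimal promoter": (-10, 40),
    "Cdx1 core promoter": (-40, 40),
}

for name, (start, end) in CONSTRUCTS.items():
    print(f"{name}: {start} to +{end} -> {region_length(start, end)} bp")
```

This gives 1132, 691, 898, 1055 and 1104 bp for the five Hox enhancer-promoter fragments, 50 bp for the minimal promoters and 80 bp for the Cdx1 core promoter.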
Cultured cells
Human embryonic kidney (HEK)-293 cells and HEK-293 flp-in cells were
cultured in high-glucose DMEM supplemented with 10% FBS, 1% L-glutamine, 1%
Pen-Strep and 0.2% amphotericin (Biological Industries) at 37°C with 5% CO2.
HEK-293 flp-in TAF6 and TAF9 stable cells were grown in medium containing
85 µg/ml hygromycin. The HEK-293 cells were received from Prof. Ronit Sarid
and Dr. Jeremy Don, and the parental HEK-293 flp-in cell line was received
from Prof. Yaron Shav-Tal.
Transfections
1. Transient transfections- One day prior to transfection, 0.8 × 10⁶ HEK-293
cells were plated in each 60 mm dish to obtain 60%-80% confluence on the
following day. Cells were transfected by calcium phosphate with 3 µg total
DNA per dish (see details in the dual luciferase section below). Just prior
to transfection, the medium was replaced with medium containing 0.1%
chloroquine, which prevents lysosomal degradation of the DNA and improves
transfection efficiency. After six to eight hours, the medium was replaced
with fresh medium without chloroquine, and forty-eight hours later the cells
were harvested for reporter activity assays or for RNA purification.
2. Stable transfections- HEK-293 flp-in cells, which contain an FRT site in
their genome, were co-transfected with each of the tagged taf6/9 plasmids and
with the pOG44 Flp recombinase expression plasmid at a ratio of 1:9,
respectively (this ratio in favor of pOG44 is used to increase the chances of
Flp-mediated, site-specific recombination into the FRT site rather than
random integration into the genome). One day prior to co-transfection,
2-3 × 10⁶ cells were plated in 100 mm plates to obtain 60%-80% confluence on
the following day. Cells were transfected using calcium phosphate with 10 µg
DNA in total (1 µg of the respective taf6/9 expression vector and 9 µg of
pOG44 plasmid) per plate. Just prior to transfection, the medium was replaced
with medium containing 0.1% chloroquine. After six to eight hours, the medium
was replaced with fresh medium without chloroquine. Starting one day after
transfection and during the next three weeks, the growth medium was
supplemented with 85 µg/ml hygromycin B. Hygromycin-resistant colonies were
transferred to larger plates and finally to 75 cm² flasks. To verify genomic
integration, cells were tested for loss of β-gal expression: colonies in
which the majority of cells were white in the X-gal test were taken for
further analyses. Genomic integration was also tested by PCR. Cell extracts
were analyzed by western blotting to detect the expression of the tagged
TAFs in the stable cell lines.
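The DNA amounts in both protocols follow simple bookkeeping: in the transient transfections the reporter and TK Renilla plasmids are topped up with pBlueScript to a fixed total (see the dual luciferase section), and in the stable transfections the total is divided 1:9 between the tagged-TAF plasmid and pOG44. A minimal Python sketch with hypothetical helper names:

```python
from typing import Dict, Iterable


def filler_ug(total_ug: float, plasmids_ug: Iterable[float]) -> float:
    """Amount of carrier plasmid (e.g. pBlueScript) needed to reach total_ug."""
    filler = total_ug - sum(plasmids_ug)
    if filler < 0:
        raise ValueError("plasmids already exceed the total")
    return round(filler, 3)  # round away floating-point noise


def split_by_ratio(total_ug: float, ratio: Dict[str, float]) -> Dict[str, float]:
    """Divide total_ug among plasmids in proportion to ratio."""
    parts = sum(ratio.values())
    return {name: total_ug * share / parts for name, share in ratio.items()}


# Transient (per 60 mm dish): 2.5 ug pGL3 reporter + 0.1 ug TK Renilla, 3 ug in total.
print(filler_ug(3.0, [2.5, 0.1]))  # 0.4
# Stable (per 100 mm plate): tagged TAF plasmid : pOG44 = 1 : 9, 10 ug in total.
print(split_by_ratio(10.0, {"tagged TAF": 1, "pOG44": 9}))
```

Keeping the total DNA constant with an inert carrier plasmid means that differences in reporter output reflect the promoter construct rather than the amount of DNA delivered.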
X-gal staining (lacZ assay)
The four stable cell clones, as well as the parental HEK-293 flp-in cells (as
a negative control), were grown in 24-well plates for 24 h. The cells were
then washed with 1× PBS, fixed with 3.7% formaldehyde for 5 minutes, washed
again with 1× PBS and incubated in X-gal solution (containing 5 mM K3Fe(CN)6,
5 mM K4Fe(CN)6, 2 mM MgCl2 and 1 mg/ml X-gal) at 37°C for at least three hours.
PCR amplification: detection of FRT sites in the stable cell lines
Genomic DNA was extracted from the four stable cell clones, as well as from
the parental HEK-293 flp-in cells (as a negative control), using the Archive
Pure DNA Cell/Tissue kit (5 PRIME). The genomic DNA was used as a template for
two separate PCRs. The first PCR was performed with a forward SV40 promoter
primer and a reverse hygromycin-resistance gene primer (Hygro). The second PCR
was performed with a forward BGH polyA primer and a reverse lacZ-Zeocin gene
primer (lacZ). PCR products were run on an agarose gel, and the amplified DNA
was extracted using the NucleoSpin PCR Clean-up Gel Extraction kit
(Macherey-Nagel). The amplified DNA was sequenced using the same primers as in
the PCR reactions. The primers are:
- SV40 promoter (forward primer): 5'- CCAGTTCCGCCCATTCTCC -3'
- Hygro (reverse primer): 5'- CTGTTATGCGGCCATTGTCC -3'
- BGH polyA (forward primer): 5'- CGAGTCTAGAGGGCCCGTTTAAAC -3'
- lacZ (reverse primer): 5'- GTAACCGTGCATCTGCCAGTTTG -3'
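As a quick sanity check on the four junction primers above, GC content and an approximate melting temperature can be computed. The Python sketch below uses the basic GC-content formula Tm = 64.9 + 41 × (nG + nC - 16.4) / N, a rough approximation for oligonucleotides longer than about 13 nt that ignores salt and primer concentration. It is an illustrative assumption, not necessarily how these primers were designed or how their annealing temperatures were chosen.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Approximate Tm (deg C) from the basic GC-content formula."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


# Junction PCR primers as listed in the Methods.
JUNCTION_PRIMERS = {
    "SV40 promoter (fwd)": "CCAGTTCCGCCCATTCTCC",
    "Hygro (rev)": "CTGTTATGCGGCCATTGTCC",
    "BGH polyA (fwd)": "CGAGTCTAGAGGGCCCGTTTAAAC",
    "lacZ (rev)": "GTAACCGTGCATCTGCCAGTTTG",
}

for name, seq in JUNCTION_PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{basic_tm(seq):.1f} C")
```

Primer pairs with similar approximate Tm values can share one annealing temperature; nearest-neighbor calculators give more accurate numbers than this formula.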
Western-blotting
Protein extracts from the stable cell lines expressing the TAF6- or
TAF9-FLAG-HA versions were separated by SDS-polyacrylamide gel
electrophoresis (12%-15% gels) and transferred to nitrocellulose membranes
(GE Healthcare). For western blotting, the membranes were incubated with
different primary antibodies: αFLAG-M2 (Sigma), αHA11.1 (Covance), αTAF6 or
αTAF9 (from Laszlo Tora's lab). Washes after the primary antibodies were
performed with PBS-T containing 5% milk. The membranes were then incubated
with the secondary antibody, HRP-conjugated goat anti-mouse. Washes after the
secondary antibody were performed with PBS-T (without milk). TAF6 and TAF9
proteins were detected using the EZ-ECL kit (Biological Industries).
Dual luciferase analysis
Each transient transfection was done in triplicate. The total amount of DNA
per dish was 3 µg, consisting of 2.5 µg of the pGL3 reporter construct,
0.1 µg of TK Renilla (RL) and 0.4 µg of the pBlueScript vector to bring the
total to 3 µg. Cells were harvested 48 hours post-transfection, and extracts
were analyzed for both firefly and Renilla luciferase activities using the
Synergy instrument (BioTek). To normalize for variations in transfection
efficiency, the firefly luciferase value for each plate was divided by the
Renilla luciferase value.
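The normalization described above, dividing firefly by Renilla luciferase for each plate and then comparing constructs, can be written as a short Python sketch. The triplicate readings below are invented for illustration, and expressing a test construct (for example, mDPE versus wt, or the Hoxb6 SNP versus the frequent allele) as a percentage of its reference is an assumed reporting convention; the helper names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def normalized(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Firefly/Renilla ratio per plate (corrects for transfection efficiency)."""
    if len(firefly) != len(renilla):
        raise ValueError("need one Renilla reading per firefly reading")
    return [f / r for f, r in zip(firefly, renilla)]


def percent_of_reference(test_ff, test_rl, ref_ff, ref_rl) -> float:
    """Mean normalized activity of the test construct as % of the reference."""
    return 100 * mean(normalized(test_ff, test_rl)) / mean(normalized(ref_ff, ref_rl))


# Invented triplicate readings (arbitrary light units).
ref_ff, ref_rl = [1000, 1100, 950], [200, 210, 190]
test_ff, test_rl = [4000, 4200, 3900], [205, 200, 195]

ratios = normalized(test_ff, test_rl)
print(f"test: mean {mean(ratios):.2f}, SD {stdev(ratios):.2f}")
print(f"{percent_of_reference(test_ff, test_rl, ref_ff, ref_rl):.0f}% of reference")
```

Normalizing each plate before averaging keeps a single poorly transfected dish from skewing the comparison between constructs.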
RNA extraction
Transfected HEK-293 cells were washed with 1× PBS and harvested for RNA
extraction using the Trizol reagent (Invitrogen) or the PerfectPure RNA
Cell & Tissue kit (5 PRIME). RNA concentrations were measured with a NanoDrop
spectrophotometer, and RNA was run on agarose gels to assess its quality. The
RNA was used as a template for primer extension analysis.
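Spectrophotometric RNA quantification rests on the standard conversion that an A260 of 1.0 (1 cm path) corresponds to about 40 µg/ml of RNA, i.e. 40 ng/µl, with an A260/A280 ratio near 2.0 commonly taken to indicate clean RNA. The Python sketch below, using hypothetical readings and helper names, converts absorbance to concentration and computes the volume needed for the 15-30 µg of total RNA used per primer extension reaction.

```python
def rna_ng_per_ul(a260: float, dilution: float = 1.0) -> float:
    """RNA concentration (ng/ul) from A260, using 40 ng/ul per A260 unit (1 cm path)."""
    return a260 * 40.0 * dilution


def volume_ul_for(target_ug: float, conc_ng_per_ul: float) -> float:
    """Volume (ul) of RNA solution that contains target_ug micrograms."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug * 1000.0 / conc_ng_per_ul


# Hypothetical readings, already normalized to a 1 cm path.
a260, a280 = 25.0, 12.4
conc = rna_ng_per_ul(a260)  # 1000.0 ng/ul
print(f"{conc:.0f} ng/ul, A260/A280 = {a260 / a280:.2f}")
for ug in (15, 30):  # total RNA per primer extension reaction
    print(f"{ug} ug total RNA -> {volume_ul_for(ug, conc):.1f} ul")
```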
Primer extension analysis
Total RNA extracted from transfected HEK-293 cells was used as a template for
cDNA synthesis by MMLV Reverse Transcriptase (Promega) using a specific
32P end-labeled primer that is complementary to the firefly luciferase gene.
The primer extension reaction was done on 15-30 µg of total RNA. To
accurately identify the transcription start sites, 2 µg of each transfected
plasmid was sequenced with the same primer used for the primer extension
reaction, using the Sequenase Version 2.0 DNA sequencing kit (USB). The
samples were run on an 8% polyacrylamide-urea DNA sequencing gel. The gel was
dried at 80°C under vacuum and exposed overnight to a PhosphorImager screen
(GE Healthcare).
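In practice the start site is read directly off the sequencing ladder run alongside the primer extension products, but the same mapping follows from simple arithmetic: the labeled primer is antisense, so its 5' end marks the most downstream base copied, and the cDNA extends back to the RNA 5' end. The Python sketch below, with hypothetical coordinates and helper names, converts a product length into a plasmid coordinate and then into a position relative to the expected +1, again with no position 0.

```python
def tss_coordinate(primer_5p_coord: int, product_length: int) -> int:
    """Sense-strand coordinate of the RNA 5' end (TSS).

    The antisense primer's 5' end lies at primer_5p_coord, and the cDNA runs
    from there back to the TSS, so product_length = primer_5p_coord - TSS + 1.
    """
    return primer_5p_coord - product_length + 1


def relative_to_plus1(coord: int, expected_plus1: int) -> int:
    """Position of coord relative to the expected +1, skipping position 0."""
    offset = coord - expected_plus1
    return offset + 1 if offset >= 0 else offset


# Hypothetical: primer 5' end at plasmid position 250, expected +1 at position 150.
for length in (101, 103, 98):
    tss = tss_coordinate(250, length)
    print(f"{length}-nt product -> TSS at {tss}, position {relative_to_plus1(tss, 150)}")
```

A shift of a few nucleotides in product length therefore translates directly into an equally sized shift of the mapped start site, which is what allows alternative TSS usage (as proposed for the Hoxb6 SNP) to be detected on the gel.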
Computational software and Bioinformatic tools
1. DAC software- The 'Determinacy Analysis Chain' software uses a search
algorithm that treats text (i.e., a DNA sequence) as a sequential string of
letters. This tool finds the positions in a sequence that match the search
query (using the combinations of bases contained in the different IGms).
2. hDPEsearcher software- This MATLAB-based tool was developed by Amitay
Drummer, Anna Sloutskin and me. Its script contains predetermined instructions
for searching for Inr and DPE combinations in the human genome (for more
details, see section 4.4.1 above).
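The DAC software and hDPEsearcher are the tools actually used in this work, and their code is not reproduced here. As an illustration only, the Python sketch below shows one common way to perform this kind of search: degenerate IUPAC motifs are translated into regular expressions, all (possibly overlapping) match positions are reported, and an Inr is accepted only when a DPE lies at a fixed downstream spacing. The consensus sequences and spacing (Drosophila Inr TCAKTY with the A at +1, DPE RGWYV at +28 to +32) are taken here as the Drosophila definitions and are assumptions of the sketch, as is the search of a single strand; all helper names are hypothetical.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif: str):
    """Compile a degenerate IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of all matches on the given strand, overlaps included."""
    lookahead = re.compile(f"(?={motif_regex(motif).pattern})")
    return [m.start() for m in lookahead.finditer(seq.upper())]


INR, INR_A_INDEX = "TCAKTY", 2  # the A of the Inr is position +1
DPE, DPE_FIRST = "RGWYV", 28    # the DPE occupies +28..+32


def inr_dpe_plus1(seq: str) -> List[int]:
    """0-based index of +1 for every Inr followed by a DPE at +28..+32."""
    seq = seq.upper()
    dpe = motif_regex(DPE)
    hits = []
    for inr_start in find_motif(seq, INR):
        plus1 = inr_start + INR_A_INDEX
        dpe_start = plus1 + DPE_FIRST - 1  # +28 lies 27 bases after +1
        if dpe.fullmatch(seq, dpe_start, dpe_start + len(DPE)):
            hits.append(plus1)
    return hits


# Toy sequence: Inr at index 2 (A at index 4) and a DPE (AGACG) at index 31.
promoter = "GG" + "TCAGTT" + "C" * 23 + "AGACG" + "GG"
print(find_motif(promoter, INR), inr_dpe_plus1(promoter))  # [2] [4]
```

Requiring the DPE at an exact offset from the Inr mirrors the strict Drosophila spacing discussed above; relaxing `DPE_FIRST` to a window would be one way to explore the altered human requirements.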
7. References
1. Splinter, E. and W. de Laat, The complex transcription regulatory landscape
of our genome: control in three dimensions. EMBO J, 2011. 30(21): p. 4345-55.
2. Dong, X., et al., Modeling gene expression using chromatin features in
various cellular contexts. Genome Biol, 2012. 13(9): p. R53.
3. Shandilya, J. and S.G. Roberts, The transcription cycle in eukaryotes: from
productive initiation to RNA polymerase II recycling. Biochim Biophys Acta, 2012.
1819(5): p. 391-400.
4. Thomas, M.C. and C.M. Chiang, The general transcription machinery and
general cofactors. Crit Rev Biochem Mol Biol, 2006. 41(3): p. 105-78.
5. Butler, J.E. and J.T. Kadonaga, The RNA polymerase II core promoter: a key
component in the regulation of gene expression. Genes Dev, 2002. 16(20): p. 2583-92.
6. Kadonaga, J.T., Perspectives on the RNA polymerase II core promoter. Wiley
Interdiscip Rev Dev Biol, 2012. 1(1): p. 40-51.
7. Juven-Gershon, T. and J.T. Kadonaga, Regulation of gene expression via the
core promoter and the basal transcriptional machinery. Dev Biol, 2010. 339(2): p.
225-9.
8. Lenhard, B., A. Sandelin, and P. Carninci, Metazoan promoters: emerging
characteristics and insights into transcriptional regulation. Nat Rev Genet, 2012.
13(4): p. 233-45.
9. Heintzman, N.D. and B. Ren, The gateway to transcription: identifying,
characterizing and understanding promoters in the eukaryotic genome. Cell Mol Life
Sci, 2007. 64(4): p. 386-400.
10. Juven-Gershon, T., et al., The RNA polymerase II core promoter - the
gateway to transcription. Curr Opin Cell Biol, 2008. 20(3): p. 253-9.
11. Carninci, P., et al., Genome-wide analysis of mammalian promoter
architecture and evolution. Nat Genet, 2006. 38(6): p. 626-35.
12. Rach, E.A., et al., Motif composition, conservation and condition-specificity of
single and alternative transcription start sites in the Drosophila genome. Genome
Biol, 2009. 10(7): p. R73.
13. Bajic, V.B., et al., Mice and men: their promoter properties. PLoS Genet,
2006. 2(4): p. e54.
14. Hoskins, R.A., et al., Genome-wide analysis of promoter architecture in
Drosophila melanogaster. Genome Res, 2011. 21(2): p. 182-92.
15. Stamatoyannopoulos, J.A., Illuminating eukaryotic transcription start sites.
Nat Methods, 2010. 7(7): p. 501-3.
16. Matsui, T., et al., Multiple factors required for accurate initiation of
transcription by purified RNA polymerase II. J Biol Chem, 1980. 255(24): p. 11992-6.
17. Samuels, M., A. Fire, and P.A. Sharp, Separation and characterization of
factors mediating accurate transcription by RNA polymerase II. J Biol Chem, 1982.
257(23): p. 14419-27.
18. Dikstein, R., The unexpected traits associated with core promoter elements.
Transcription, 2011. 2(5): p. 201-6.
19. Kadonaga, J.T., The DPE, a core promoter element for transcription by RNA
polymerase II. Exp Mol Med, 2002. 34(4): p. 259-64.
20. Smale, S.T. and J.T. Kadonaga, The RNA polymerase II core promoter. Annu
Rev Biochem, 2003. 72: p. 449-79.
21. Muller, F. and L. Tora, The multicoloured world of promoter recognition
complexes. EMBO J, 2004. 23(1): p. 2-8.
22. Tora, L., A unified nomenclature for TATA box binding protein (TBP)-
associated factors (TAFs) involved in RNA polymerase II transcription. Genes Dev,
2002. 16(6): p. 673-5.
23. Muller, F. and L. Tora, Chromatin and DNA sequences in defining promoters
for transcription initiation. Biochim Biophys Acta, 2014. 1839(3): p. 118-28.
24. Anish, R., et al., Characterization of transcription from TATA-less promoters:
identification of a new core promoter element XCPE2 and analysis of factor
requirements. PLoS One, 2009. 4(4): p. e5103.
25. Juven-Gershon, T., S. Cheng, and J.T. Kadonaga, Rational design of a super
core promoter that enhances gene expression. Nat Methods, 2006. 3(11): p. 917-22.
26. Goldberg, M.L., PhD thesis. Stanford University, 1979.
27. Ohler, U., et al., Computational analysis of core promoters in the Drosophila
genome. Genome Biol, 2002. 3(12): p. RESEARCH0087.
28. Kim, T.H., et al., A high-resolution map of active promoters in the human
genome. Nature, 2005. 436(7052): p. 876-80.
29. Gershenzon, N.I. and I.P. Ioshikhes, Synergy of human Pol II core promoter
elements revealed by statistical sequence analysis. Bioinformatics, 2005. 21(8): p.
1295-300.
30. Mencia, M., et al., Activator-specific recruitment of TFIID and regulation of
ribosomal protein genes in yeast. Mol Cell, 2002. 9(4): p. 823-33.
31. Basehoar, A.D., S.J. Zanton, and B.F. Pugh, Identification and distinct
regulation of yeast TATA box-containing genes. Cell, 2004. 116(5): p. 699-709.
32. Molina, C. and E. Grotewold, Genome wide analysis of Arabidopsis core
promoters. BMC Genomics, 2005. 6: p. 25.
33. Yamamoto, Y.Y., et al., Differentiation of core promoter architecture between
plants and mammals revealed by LDSS analysis. Nucleic Acids Res, 2007. 35(18): p.
6219-26.
34. Reeve, J.N., Archaeal chromatin and transcription. Mol Microbiol, 2003. 48(3):
p. 587-98.
35. Singer, V.L., C.R. Wobbe, and K. Struhl, A wide variety of DNA sequences
can functionally replace a yeast TATA element for transcriptional activation. Genes
Dev, 1990. 4(4): p. 636-45.
36. Corden, J., et al., Promoter sequences of eukaryotic protein-coding genes.
Science, 1980. 209(4463): p. 1406-14.
37. Smale, S.T. and D. Baltimore, The "initiator" as a transcription control
element. Cell, 1989. 57(1): p. 103-13.
38. FitzGerald, P.C., et al., Comparative genomics of Drosophila and human core
promoters. Genome Biol, 2006. 7(7): p. R53.
39. Gershenzon, N.I., E.N. Trifonov, and I.P. Ioshikhes, The features of
Drosophila core promoters revealed by statistical analysis. BMC Genomics, 2006. 7:
p. 161.
40. Kaufmann, J. and S.T. Smale, Direct recognition of initiator elements by a
component of the transcription factor IID complex. Genes Dev, 1994. 8(7): p. 821-9.
41. Verrijzer, C.P., et al., Binding of TAFs to core elements directs promoter
selectivity by RNA polymerase II. Cell, 1995. 81(7): p. 1115-25.
42. Chalkley, G.E. and C.P. Verrijzer, DNA binding site selection by RNA
polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO
J, 1999. 18(17): p. 4835-45.
43. Javahery, R., et al., DNA sequence requirements for transcriptional initiator
activity in mammalian cells. Mol Cell Biol, 1994. 14(1): p. 116-27.
44. Purnell, B.A., P.A. Emanuel, and D.S. Gilmour, TFIID sequence recognition of
the initiator and sequences farther downstream in Drosophila class II genes. Genes
Dev, 1994. 8(7): p. 830-42.
45. Yang, C., et al., Prevalence of the initiator over the TATA box in human and
yeast genes and identification of DNA motifs enriched in human TATA-less core
promoters. Gene, 2007. 389(1): p. 52-65.
46. Frith, M.C., et al., A code for transcription initiation in mammalian genomes.
Genome Res, 2008. 18(1): p. 1-12.
47. Burke, T.W. and J.T. Kadonaga, Drosophila TFIID binds to a conserved
downstream basal promoter element that is present in many TATA-box-deficient
promoters. Genes Dev, 1996. 10(6): p. 711-24.
48. Burke, T.W. and J.T. Kadonaga, The downstream core promoter element,
DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of
Drosophila. Genes Dev, 1997. 11(22): p. 3020-31.
49. Kutach, A.K. and J.T. Kadonaga, The downstream promoter element DPE
appears to be as widely used as the TATA box in Drosophila core promoters. Mol
Cell Biol, 2000. 20(13): p. 4754-64.
50. Juven-Gershon, T., J.Y. Hsu, and J.T. Kadonaga, Caudal, a key
developmental regulator, is a DPE-specific transcriptional factor. Genes Dev, 2008.
22(20): p. 2823-30.
51. Zehavi, Y., et al., Core promoter functions in the regulation of gene
expression of Drosophila dorsal target genes. J Biol Chem, 2014. 289(17): p. 11993-
2004.
52. Zehavi, Y., et al., The core promoter composition establishes a new
dimension in developmental gene networks. Nucleus, 2014. 5(4).
53. Duttke, S.H., RNA polymerase III accurately initiates transcription from RNA
polymerase II promoters in vitro. J Biol Chem, 2014. 289(29): p. 20396-404.
54. Lewis, E.B., A gene complex controlling segmentation in Drosophila. Nature,
1978. 276(5688): p. 565-70.
55. McGinnis, W., et al., A conserved DNA sequence in homoeotic genes of the
Drosophila Antennapedia and bithorax complexes. Nature, 1984. 308(5958): p. 428-
33.
56. Abate-Shen, C., Deregulated homeobox gene expression in cancer: cause or
consequence? Nat Rev Cancer, 2002. 2(10): p. 777-85.
57. Lappin, T.R., et al., HOX genes: seductive science, mysterious mechanisms.
Ulster Med J, 2006. 75(1): p. 23-31.
58. Rawat, V.P., R.K. Humphries, and C. Buske, Beyond Hox: the role of
ParaHox genes in normal and malignant hematopoiesis. Blood, 2012. 120(3): p. 519-27.
59. Pearson, J.C., D. Lemons, and W. McGinnis, Modulating Hox gene functions
during animal body patterning. Nat Rev Genet, 2005. 6(12): p. 893-904.
60. McIntyre, D.C., et al., Hox patterning of the vertebrate rib cage. Development,
2007. 134(16): p. 2981-9.
61. Wellik, D.M., Hox patterning of the vertebrate axial skeleton. Dev Dyn, 2007.
236(9): p. 2454-63.
62. Mallo, M. and C.R. Alonso, The regulation of Hox gene expression during
animal development. Development, 2013. 140(19): p. 3951-63.
63. Montavon, T. and D. Duboule, Chromatin organization and global regulation
of Hox gene clusters. Philos Trans R Soc Lond B Biol Sci, 2013. 368(1620): p.
20120367.
64. Deschamps, J. and J. van Nes, Developmental regulation of the Hox genes
during axial morphogenesis in the mouse. Development, 2005. 132(13): p. 2931-42.
65. Mlodzik, M., A. Fjose, and W.J. Gehring, Isolation of caudal, a Drosophila
homeo box-containing gene with maternal expression, whose transcripts form a
concentration gradient at the pre-blastoderm stage. EMBO J, 1985. 4(11): p. 2961-9.
66. Mlodzik, M. and W.J. Gehring, Expression of the caudal gene in the germ line
of Drosophila: formation of an RNA and protein gradient during early embryogenesis.
Cell, 1987. 48(3): p. 465-78.
67. Mlodzik, M., G. Gibson, and W.J. Gehring, Effects of ectopic expression of
caudal during Drosophila development. Development, 1990. 109(2): p. 271-7.
68. Levine, M., et al., Expression of the homeo box gene family in Drosophila.
Cold Spring Harb Symp Quant Biol, 1985. 50: p. 209-22.
69. Macdonald, P.M. and G. Struhl, A molecular gradient in early Drosophila
embryos and its role in specifying the body pattern. Nature, 1986. 324(6097): p. 537-
45.
70. Sanson, B., Generating patterns from fields of cells. Examples from
Drosophila segmentation. EMBO Rep, 2001. 2(12): p. 1083-8.
71. Svingen, T. and K.F. Tonissen, Hox transcription factors and their elusive
mammalian gene targets. Heredity (Edinb), 2006. 97(2): p. 88-96.
72. van den Akker, E., et al., Cdx1 and Cdx2 have overlapping functions in
anteroposterior patterning and posterior axis elongation. Development, 2002. 129(9):
p. 2181-93.
73. Lengerke, C., et al., BMP and Wnt specify hematopoietic fate by activation of
the Cdx-Hox pathway. Cell Stem Cell, 2008. 2(1): p. 72-82.
74. Butler, J.E. and J.T. Kadonaga, Enhancer-promoter specificity mediated by
DPE or TATA core promoter motifs. Genes Dev, 2001. 15(19): p. 2515-9.
75. Quinonez, S.C. and J.W. Innis, Human HOX gene disorders. Mol Genet
Metab, 2014. 111(1): p. 4-15.
76. Argiropoulos, B. and R.K. Humphries, Hox genes in hematopoiesis and
leukemogenesis. Oncogene, 2007. 26(47): p. 6766-76.
77. Frohling, S., et al., HOX gene regulation in acute myeloid leukemia: CDX
marks the spot? Cell Cycle, 2007. 6(18): p. 2241-5.
78. Scholl, C., et al., The homeobox gene CDX2 is aberrantly expressed in most
cases of acute myeloid leukemia and promotes leukemogenesis. J Clin Invest, 2007.
117(4): p. 1037-48.
79. Andreeff, M., et al., HOX expression patterns identify a common signature for
favorable AML. Leukemia, 2008. 22(11): p. 2041-7.
80. Starkova, J., et al., HOX gene expression in phenotypic and genotypic
subgroups and low HOXA gene expression as an adverse prognostic factor in
pediatric ALL. Pediatr Blood Cancer, 2010. 55(6): p. 1072-82.
81. Lengerke, C. and G.Q. Daley, Caudal genes in blood development and
leukemia. Ann N Y Acad Sci, 2012. 1266: p. 47-54.
82. Passegue, E., et al., Normal and leukemic hematopoiesis: are leukemias a
stem cell disorder or a reacquisition of stem cell characteristics? Proc Natl Acad Sci
U S A, 2003. 100 Suppl 1: p. 11842-9.
83. Eklund, E., The role of Hox proteins in leukemogenesis: insights into key
regulatory events in hematopoiesis. Crit Rev Oncog, 2011. 16(1-2): p. 65-76.
84. Zweig, A.S., et al., UCSC genome browser tutorial. Genomics, 2008. 92(2): p.
75-84.
85. Rosenbloom, K.R., et al., The UCSC Genome Browser database: 2015
update. Nucleic Acids Res, 2014.
86. Yamashita, R., et al., DBTSS: DataBase of Transcriptional Start Sites
progress report in 2012. Nucleic Acids Res, 2012. 40(Database issue): p. D150-4.
87. Hendrix, D.A., et al., Promoter elements associated with RNA Pol II stalling in
the Drosophila embryo. Proc Natl Acad Sci U S A, 2008. 105(22): p. 7762-7.
88. Nechaev, S. and K. Adelman, Pol II waiting in the starting gates: Regulating
the transition from transcription initiation into productive elongation. Biochim Biophys
Acta, 2011. 1809(1): p. 34-45.
89. Goldberg, M.L., PhD thesis. Stanford University, 1979.
90. Guglielmi, B., N. La Rochelle, and R. Tjian, Gene-specific transcriptional
mechanisms at the histone gene cluster revealed by single-cell imaging. Mol Cell,
2013. 51(4): p. 480-92.
91. Lim, C.Y., et al., The MTE, a new core promoter element for transcription by
RNA polymerase II. Genes Dev, 2004. 18(13): p. 1606-17.
92. Kedmi, A., et al., Drosophila TRF2 is a preferential core promoter regulator.
Genes Dev, 2014. 28(19): p. 2163-74.
93. Bhatlekar, S., J.Z. Fields, and B.M. Boman, HOX genes and their role in the
development of human cancers. J Mol Med (Berl), 2014. 92(8): p. 811-23.
94. Drabkin, H.A., et al., Quantitative HOX expression in chromosomally defined
subsets of acute myelogenous leukemia. Leukemia, 2002. 16(2): p. 186-95.
95. Savinkova, L.K., et al., TATA box polymorphisms in human gene promoters
and associated hereditary pathologies. Biochemistry (Mosc), 2009. 74(2): p. 117-29.
96. Giampaolo, A., et al., Expression pattern of HOXB6 homeobox gene in
myelomonocytic differentiation and acute myeloid leukemia. Leukemia, 2002. 16(7):
p. 1293-301.
97. Fischbach, N.A., et al., HOXB6 overexpression in murine bone marrow
immortalizes a myelomonocytic precursor in vitro and causes hematopoietic stem cell
expansion and acute myeloid leukemia in vivo. Blood, 2005. 105(4): p. 1456-66.
98. Cianfrocco, M.A., et al., Human TFIID binds to core promoter DNA in a
reorganized structural state. Cell, 2013. 152(1-2): p. 120-31.
99. Yunger, S., et al., Quantifying the transcriptional output of single alleles in
single living mammalian cells. Nat Protoc, 2013. 8(2): p. 393-408.
100. Kim, T.K., et al., Widespread transcription at neuronal activity-regulated
enhancers. Nature, 2010. 465(7295): p. 182-7.
101. Lai, F. and R. Shiekhattar, Enhancer RNAs: the new molecules of
transcription. Curr Opin Genet Dev, 2014. 25: p. 38-42.
102. Lam, M.T., et al., Enhancer RNAs and regulated transcriptional programs.
Trends Biochem Sci, 2014. 39(4): p. 170-82.
103. Li, W., M.T. Lam, and D. Notani, Enhancer RNAs. Cell Cycle, 2014. 13(20): p.
3151-2.
104. Andersson, R., et al., An atlas of active enhancers across human cell types
and tissues. Nature, 2014. 507(7493): p. 455-61.
105. Core, L.J., et al., Analysis of nascent RNA identifies a unified architecture of
initiation regions at mammalian promoters and enhancers. Nat Genet, 2014. 46(12):
p. 1311-20.
106. Weingarten-Gabbay, S. and E. Segal, A shared architecture for promoters
and enhancers. Nat Genet, 2014. 46(12): p. 1253-4.
107. Frontini, M., et al., TAF9b (formerly TAF9L) is a bona fide TAF that has
unique and overlapping roles with TAF9. Mol Cell Biol, 2005. 25(11): p. 4638-49.
108. Lu, H., et al., The regulation of p53-mediated transcription and the roles of
hTAFII31 and mdm-2. Harvey Lect, 1994. 90: p. 81-93.
109. Lu, H. and A.J. Levine, Human TAFII31 protein is a transcriptional coactivator
of the p53 protein. Proc Natl Acad Sci U S A, 1995. 92(11): p. 5154-8.
110. Shen, C., et al., The tumorigenicity diversification in human embryonic kidney
293 cell line cultured in vitro. Biologicals, 2008. 36(4): p. 263-8.
111. Maston, G.A., et al., Non-canonical TAF complexes regulate active promoters
in human embryonic stem cells. Elife, 2012. 1: p. e00068.
8. Publications during the M.Sc. period
Sloutskin A., Danino Y.M., Zehavi Y., Orenstein Y., Doniger T., Shamir R.,
and Juven-Gershon T., ElemeNT: A Computational Tool for Detecting Core
Promoter Elements, PLoS One, under review.
Danino Y.M., Even D., Ideses D., and Juven-Gershon T., The core promoter:
at the heart of gene expression, BBA - Gene Regulatory Mechanisms, (invited
review) under review.
Safra M., Fickentscher R., Levi-ferber M., Danino Y.M., Haviv-Chesner A.,
Hansen M., Juven-Gershon T., Weiss M., and Henis-Korenblit S. (2014) The
FOXO transcription factor DAF-16 bypasses ire-1 requirement to promote
endoplasmic reticulum homeostasis, Cell Metabolism, 20(5):870-881.
9. Appendixes
Appendix 1
The minimal promoter sequences of the human Cdx and human Hox genes.
The table contains the name of the gene, its symbol and the sequence from -10
to +40 (for Cdx1, from -40 to +40) relative to the +1 position that we defined
(in bold). Capital letters represent the nucleotides that are transcribed as
reported by RefSeq in the genome browser. Positions matching the TATA box,
mammalian Inr and DPE are color-coded.
Gene  Gene Symbol  Sequence (5' to 3')
Cdx1 uc0031rq.3 ccggagctataaaaggcctgggtggggcgggcgcggcggcAGGACAGCCGAGTTCAGGTGAGCGGTTGCTCGTCGTCGGG
Cdx2 uc001urv.4 ccgcctctgcagcctagtgggaaggaggtGGGAGGAAAGAAGGAAGAAAG
Hoxa1 uc003syd.3 cATTCATATCATTTTTCTTCTCCGGCCCCATGGAGGAAGTGAGAAAGTTG
Hoxa2 uc003syh.3 TGAATTCAATAGTTTAATAGTAGCGCGGTCCCCATACGGCTGTAATCAGT
Hoxa9 uc003syt.3 TGAAATCTGCAGTTTCATAATTTCCGTGGGTCGGGCCGGGCGGGCCAGGC
Hoxa11 uc003syx.3 ccaaatttctacttcacggatccgCTTCAAAGAGGCAGCTGCAGTGGAGA
Hoxb3 uc010wlm.2 accgcgcagtATATTTCACATTCTCCAGAATGTTAAGTGACACTTTAACT
Hoxb9 uc002inx.3 ttgaccaatcATTTTGCAAGGAGAGCTGAGACGGGCTGCTCCACTGTACT
Hoxc6 uc001sev.3 tgactttgtcaTTTTGTCTGTCCTGGATTGGAGCCGTCCCTATAACCATC
Hoxc8 uc001ser.3 gGCCGAGCTCAGCACCGAGGCGCCCCCCAACCTGCCCAGCCCCCAGCCCA
Hoxd3 uc002ukp.3 tcGCCTCCACAGATATCAAAAGAAACCTGAAGAGCCTACAAAAAAAAAAG
Hoxd9 uc010zex.2 ccgCGCGACCAATGGTGGAGGCTGCAGCCTGCGAACTAGTCGGTGGCTCG
Hoxd10 uc002ukf.1 ATGTTTTCCTAGAGATGTCAGCCTACAAAGGACACAATCTCTCTTCTTCA
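Because core promoter numbering skips 0 (-1 is followed directly by +1), reading a given position out of the sequences above needs a small offset correction. The following is a minimal sketch, assuming each listed string starts at -10 (or at -40 for Cdx1) and runs without gaps to +40; the function names are hypothetical helpers, not part of any tool described here.

```python
def promoter_index(pos: int, start: int = -10) -> int:
    """Map a core promoter coordinate to a 0-based index in a listed sequence.

    Core promoter numbering has no position 0 (-1 is followed by +1).
    `start` is the coordinate of the first listed nucleotide (assumed:
    -10 for most genes in the table, -40 for Cdx1).
    """
    if pos == 0 or start == 0:
        raise ValueError("core promoter numbering has no position 0")
    if pos < start:
        raise ValueError("position lies upstream of the listed sequence")
    offset = pos - start
    if start < 0 < pos:
        offset -= 1  # skip the non-existent position 0
    return offset


def nucleotide_at(seq: str, pos: int, start: int = -10) -> str:
    """Return the nucleotide at coordinate `pos`, ignoring case."""
    return seq[promoter_index(pos, start)].upper()


def window(seq: str, first: int, last: int, start: int = -10) -> str:
    """Return the nucleotides from coordinate `first` to `last`, inclusive."""
    return seq[promoter_index(first, start):promoter_index(last, start) + 1].upper()


hoxa9 = "TGAAATCTGCAGTTTCATAATTTCCGTGGGTCGGGCCGGGCGGGCCAGGC"
cdx1 = ("ccggagctataaaaggcctgggtggggcgggcgcggcg"
        "gcAGGACAGCCGAGTTCAGGTGAGCGGTTGCTCGTCGTCGGG")

print(len(hoxa9), len(cdx1))             # 50 80
print(nucleotide_at(cdx1, 1, start=-40))  # nucleotide at +1 of Cdx1
print(window(hoxa9, 28, 33))              # the +28 to +33 window of Hoxa9
```

The same mapping applies to the -10 to +40 sequences in Appendix 6.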
Appendix 2
Representation of the 1524 'good' PCPs in the four human Hox gene clusters.
A. Hoxa cluster.
B. Hoxb cluster.
C. Hoxc cluster.
D. Hoxd cluster.
Each cluster panel shows the 'good' PCPs that were found on both strands (+/-).
Appendix 3
A table comparing the human Hox and Histone gene clusters. It lists the
cluster name, chromosomal location, the length (bp) of the genomic region
that contains the Hox or Histone genes, and the average length of the Hox
gene clusters (bp). Notably, the Hox and Histone clusters are similar in size.
Cluster name   Chromosome   Positions             Length (bp)
Hoxa           Chr7         27132000-27240000     108000
Hoxb           Chr17        46606807-46858272     251466
Hoxc           Chr12        54332000-54450000     118001
Hoxd           Chr2         176957000-177058134   104134
Histones       Chr6         26104094-26285993     181900
Average length (bp): Hox clusters 145400; Histone cluster 181900
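The Hox average in Appendix 3 is the arithmetic mean of the four listed Hox cluster lengths. The short calculation below, a sketch that uses only the lengths given in the table, reproduces that mean and compares it with the single Histone cluster.

```python
# Cluster lengths (bp) as listed in Appendix 3.
hox_lengths = {"Hoxa": 108000, "Hoxb": 251466, "Hoxc": 118001, "Hoxd": 104134}
histone_length = 181900

# Arithmetic mean of the four Hox cluster lengths.
average_hox = sum(hox_lengths.values()) / len(hox_lengths)
# How much larger the Histone cluster is than the average Hox cluster.
ratio = histone_length / average_hox

print(f"Average Hox cluster length: {average_hox:.2f} bp")  # 145400.25 bp
print(f"Histone cluster / average Hox cluster: {ratio:.2f}")
```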
Appendix 4
The minimal promoter sequences of the seven Drosophila Hox genes that
were used as the origin sequences for the generation of IGms (and thus, the
PCPs) in the human Hox gene clusters. The color code for matching positions
to Drosophila Inr, 'Neck' positions and DPE is shown.
[Figure legend (colored marks): Inr; 'Neck' positions +17, +18, +19, +20, +24, +25 and +27; DPE; Inr+Neck+DPE.]
The positions chosen for the IGms of each Drosophila Hox promoter are colored
in dark blue above the sequence (see Table 3). The number of IGms generated
from each promoter sequence is indicated above the colored name of the gene.
Appendix 5
A screenshot of the 52 'good' PCPs, located within the four human Hox gene
clusters, uploaded to the genome browser. Each PCP is represented as a
vertical black line (under the 'good PCPs' title) and its name is indicated to
its right. The name is composed of the name of the Drosophila gene from
which the PCP originated and a serial number.
A. Hoxa cluster.
B. Hoxb cluster.
C. Hoxc cluster.
D. Hoxd cluster.
Appendix 6
The minimal promoter sequences of the human gene candidates obtained
from the hDPEsearcher software. The table contains the name of the gene, its
symbol and the sequence from -10 to +40 relative to the +1 position that we
defined (in bold). Capital letters represent the nucleotides that are
transcribed as reported by RefSeq in the genome browser. Matching positions
are color-coded as in Appendix 4.
Gene  Gene Symbol  Sequence (5' to 3')
p21 uc021yzb.1 aacatgtcccAACATGTTGAGCTCTGGCATAGAAGAGGCTGGTGGCTATT
tp53inp2 uc002xau.1 gcggccgcacAGACTCAAAGCCCCGCGGGCGAGCTCAGCAGCCCGGAGCG
ccnd1 uc010hoo.3 cagtaacgtcACACGGACTACAGGGGAGTTTTGTTGAAGTTGCAAAGTCc
ProS1 uc010hoo.3 tgtttccttcAGTTTTGTCAAAGCAACAGGCTTCACAAGTCCTGGTTAGG
twist2 uc021vyw.2 cagcccagctAGAGTTTCCAAAAAAGTTAGAATAACTTCCTCTCCCGGAG
snail1 uc002xuz.3 tgctgcattcATTGCGCCGCGGCACGGCCTAGCGAGTGGTTCTTCTGCGC
cdc25a uc003csh.1 CAGCGAAGACAGCGTGAGCCTGGGCCGTTGCCTCGAGGCTCTCGCCCGGC
cdc25b uc002wjn.3 gctgctgctcagcGCAGCCAGTCGCGGAGGCGGGGAGGCTGCGCGGTCAG
cdc34 uc010hoo.3 cggccaaggcAAGCGCCGGTGGGGCGGCGGCGCCAGAGCTGCTGGAGCGC
Hoxb6 uc010dbh.1 cctggtggttaTAATGCAGCATTCTTTTGGACACCACACCTAGGTCGGAG
Hoxd13 uc002ukf.1 cgagcgaaccagaGAGAAAGGAGAGGAGGGAGGAGGCGCGCCGCGCCATG
Appendix 7
A paper describing the ElemeNT and CORE resources, currently under
review.
ElemeNT: A Computational Tool for Detecting Core Promoter Elements
Anna Sloutskin1, Yehuda M. Danino1, Yonathan Zehavi1, Yaron Orenstein2, Tirza
Doniger1, Ron Shamir2 and Tamar Juven-Gershon1*
1The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University,
Ramat Gan 5290002, Israel
2Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801,
Israel.
The authors declare that there are no potential conflicts of interest.
Corresponding author
Email: [email protected] (T.J-G)
Dear Reviewer,
The resources described in this manuscript will be publicly available upon
acceptance of the paper. However, to secure them until publication, they are
password-protected. Please use the general user details provided below to access
the database and software described.
To protect the identity of the reviewer when accessing the resources, you may use a
proxy server that conceals your IP address. Such a proxy server can be found
through a Google search.
The resources are available at:
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources
Username: GershonLab
Password: TJGL2014
Abstract
Core promoter elements play a pivotal role in the transcriptional output, yet their
detection within sequences of interest is largely performed manually. Here, we
present two contributions to the curation and detection of core promoter elements
within given sequences. First, CORE is a collection of TATA-box, initiator and
downstream core promoter element (DPE) sequences among RefSeq-defined
Drosophila melanogaster transcription start sites. Second, the Elements Navigation
Tool (ElemeNT) is a convenient web-based, interactive tool for the prediction and
display of putative core promoter elements and their biologically-relevant
combinations. These resources, accessible at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources, facilitate the
identification of core promoter elements as active contributors to gene expression.
1. Introduction
The uniqueness of each cell, as well as the differences between cell types in
multicellular organisms, is largely achieved by distinct transcriptional programs. The
regulation of transcription initiation is a complex process that is primarily based on
the direct interactions between transcription factors and DNA. Transcription initiation
occurs at the core promoter, the region bound by RNA Polymerase II (RNAPII), which
is often referred to as the 'gateway to transcription' [1-6]. Although it was
previously believed that the core promoter is a universal component that works by a
similar mechanism for all protein-coding genes, it is now established that core
promoters differ in their architecture and function [5-9]. Moreover, distinct core
promoter compositions were demonstrated to result in different transcriptional outputs
[10-14].
Transcription initiation is generally thought to occur in either a focused or a dispersed
manner, and multiple combinations of these modes have been detected [6,7]. Promoters
that exhibit a dispersed initiation pattern typically contain multiple weak transcription
start sites (TSSs) within a 50 to 100 bp region and are associated with CpG islands.
In vertebrates, dispersed transcription initiation appears to account for the majority of
protein-coding genes and is believed to direct the transcription of constitutively-
expressed genes.
Focused promoters contain a single predominant TSS, or several TSSs clustered within
a few nucleotides, and are highly correlated with tightly regulated gene expression [6]. The
focused core promoter typically spans the region from -40 to +40 relative to the first
transcribed nucleotide, which is usually termed “the +1 position”. The focused core
promoter area encompasses distinct DNA sequence motifs, termed core promoter
elements or motifs. These elements are recognized by the basal transcription
machinery to recruit RNAPII and form the preinitiation complex [15-17]. The TFIID
multi-subunit complex is a key basal transcription factor that recognizes the core
promoter in the process of transcription initiation [15-18]. A distinct set of TFIID
subunits, namely TATA box-binding protein (TBP) and TBP-associated factors
(TAFs), recognizes specific core promoter sequences [4-6,15,19-22]. Table 1 and
Figure 1 provide a summary of the characteristics of the known core promoter
elements. Remarkably, the MTE, DPE and Bridge elements are strictly dependent on
the presence of a functional initiator at a precisely defined spacing, and are
typically enriched in TATA-less promoters [4-6,19,20,22-24].
An important aspect of core promoter elements is their synergistic nature. Although
the presence of a specific core promoter element is usually sufficient to influence
transcription, different combinations of core promoter elements exist, with some
shown to act in concert and hence, affect the potency of the transcriptional outcome
[10,25]. It is therefore important to consider all the elements present within the same
promoter in order to assess its transcriptional strength.
Manual annotation of experimentally-validated Drosophila promoters for the presence
of TATA-box, Initiator and DPE was previously described [23]. This analysis includes
205 promoters, whose TSSs were empirically determined. This mapping of core
promoter elements has facilitated the discovery that the Drosophila Hox gene
network is regulated via the DPE [26]. A more comprehensive analysis of the whole
Drosophila transcriptome revealed that DPE-containing genes are conserved and
highly prevalent among the target genes of Dorsal, a key regulator of dorsal-ventral
axis formation [11]. These examples demonstrate that the comprehensive annotation
of core promoter elements in each transcript can greatly advance the understanding
of gene expression regulation.
Prediction of promoter elements that affect the transcriptional output, in the absence
of experimental validation, is a difficult task. Although high-throughput transcription
data, such as cap analysis gene expression (CAGE) [27] and genomic run-on assay
followed by deep sequencing (GRO-seq) [28] exist, the RefSeq annotation is still
considered the “gold standard” for TSS annotation (see Discussion).
The majority of currently available promoter prediction programs search for over-
represented motifs in a given set of promoter sequences (based on annotated TSSs),
rather than known core promoter elements [29-31]. Most of these programs utilize
other features, such as transcription factor binding sites, physical properties of the
DNA, DNA accessibility, RNA polymerase II occupancy and various epigenetic
markers [31-37]. However, even available programs that aim to identify core
promoter elements, such as McPromoter [38] and Eukaryotic Core Promoter
Predictor (YAPP, http://www.bioinformatics.org/yapp/cgi-bin/yapp.cgi), rarely
consider the strict spacing required by the Inr-dependent elements, namely, DPE,
MTE and Bridge.
The selection of promoters that comprise the data set used to predict core promoter
elements based on position weight matrices (PWMs) is of pivotal importance, as
subtle variations in the sequences may generate completely different PWMs [33].
Motif finding algorithms, such as XXmotif, can be used to accurately construct a
PWM for over-represented motifs within a given set of sequences [39,40].
Unfortunately, even a perfect model that is based only on sequence features cannot
fully account for the observed transcriptional activity, as most of the sequence
motifs are short and redundant, and can thus be found in many transcriptionally
inactive regions of the genome [33]. Using experimentally-validated sequences rather
than over-represented motifs can greatly enhance the strength of the prediction
program, but cannot fully guarantee the accuracy of the prediction. Currently, the
experimental readout of transcription strength and start sites resulting from mutated
promoter sequences is not performed on a high-throughput scale; hence, the
currently available experimental results are prone to bias. Moreover, the known
biologically functional sequences may slightly differ from the determined consensus,
and therefore the detection of candidate core promoter elements cannot be easily
performed using currently available resources.
2. Methods
2.1 Availability
CORE and ElemeNT are accessible at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources. Each resource is
described on a separate description page. For ElemeNT, both the source files (Perl
programming language) and the PWMs used can be downloaded at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description
2.2 CORE annotation guidelines
Each adenosine located between -10 and +10 relative to the RefSeq TSS was
examined as a potential A+1 and was assigned a score based on the number of
nucleotides matching the consensus Drosophila initiator sequence (Table 1). Only a
match of at least 4 out of 6 nucleotides was considered for further analysis.
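The initiator annotation rule can be sketched as follows. This is an illustration, not the CORE code: it assumes the Drosophila Inr consensus TCAKTY with the A+1 at its third position (Table 1 is not reproduced in this section), counts IUPAC-compatible matches, and keeps candidates with at least 4 of 6 matches. The function names are hypothetical.

```python
# IUPAC nucleotide codes used in core promoter consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumed Drosophila Inr consensus; the A+1 is at index 2 of the motif.
DROSOPHILA_INR = "TCAKTY"
A_INDEX = 2


def consensus_matches(kmer: str, consensus: str) -> int:
    """Count positions where `kmer` is compatible with the IUPAC `consensus`."""
    return sum(base in IUPAC[code] for base, code in zip(kmer.upper(), consensus))


def candidate_initiators(seq: str, tss_index: int, min_match: int = 4):
    """Score every A between -10 and +10 of the annotated TSS as a potential A+1.

    `tss_index` is the 0-based index of the RefSeq +1 in `seq`. Returns
    (coordinate, matches) pairs with at least `min_match` of 6 matches;
    coordinates follow promoter numbering, which has no position 0.
    """
    hits = []
    for coord in (c for c in range(-10, 11) if c != 0):
        a_idx = tss_index + (coord - 1 if coord > 0 else coord)
        start = a_idx - A_INDEX
        end = start + len(DROSOPHILA_INR)
        # Skip motifs that run off the sequence or lack an A at the A+1.
        if start < 0 or end > len(seq) or seq[a_idx].upper() != "A":
            continue
        matches = consensus_matches(seq[start:end], DROSOPHILA_INR)
        if matches >= min_match:
            hits.append((coord, matches))
    return hits


seq = "G" * 10 + "TCAGTC" + "G" * 20
print(candidate_initiators(seq, tss_index=12))  # [(1, 6)]
print(candidate_initiators(seq, tss_index=14))  # [(-2, 6)]
```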
DPE motifs were scored for each putative initiator position by examining the
sequence located precisely at +28 to +33 relative to the A+1 of the
corresponding initiator, based on a match to the DPE functional range set (DSWYVY;
an experimentally defined broad DPE consensus [23], presented in Table 1). The
presence of TATA-box motifs was determined by searching for a 4-nucleotide TATA
sequence match in the region between -45 and -19 relative to the RefSeq +1
position. This loose criterion was used in order to avoid missing functional TATA box-
containing promoters that do not match the 8-nucleotide-long consensus
(TATAWAAR).
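The DPE and TATA-box rules above can be sketched in the same way. This is a minimal illustration under stated assumptions: the DPE window is scored against DSWYVY relative to a given A+1 index, and a TATA call requires the literal string TATA to lie entirely inside -45 to -19 of the RefSeq +1. The helper names are hypothetical and the example sequence is synthetic.

```python
# IUPAC nucleotide codes (only the ones needed for DSWYVY plus plain bases).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "D": "AGT", "S": "CG", "W": "AT", "Y": "CT", "V": "ACG"}

DPE_RANGE = "DSWYVY"  # functional range set quoted in the text


def to_index(plus_one: int, coord: int) -> int:
    """0-based index of promoter coordinate `coord` (numbering has no 0)."""
    return plus_one + (coord - 1 if coord > 0 else coord)


def dpe_matches(seq: str, a1_index: int):
    """Count matches of the +28..+33 window (relative to A+1) to DSWYVY.

    Returns None when the window runs past the end of the sequence.
    """
    start, end = to_index(a1_index, 28), to_index(a1_index, 33) + 1
    if start < 0 or end > len(seq):
        return None
    return sum(b in IUPAC[c] for b, c in zip(seq[start:end].upper(), DPE_RANGE))


def has_tata(seq: str, tss_index: int) -> bool:
    """Loose TATA call: 'TATA' fully inside -45..-19 relative to the RefSeq +1."""
    lo, hi = to_index(tss_index, -45), to_index(tss_index, -19)
    if hi < 0:
        return False
    return "TATA" in seq[max(lo, 0):hi + 1].upper()


# Synthetic promoter: TATA upstream, A+1 at index 50, DPE-like AGACGT at +28..+33.
seq = "C" * 20 + "TATAAA" + "C" * 24 + "A" + "C" * 26 + "AGACGT" + "C" * 10
print(dpe_matches(seq, 50), has_tata(seq, 50))
```

In this sketch the A+1 and the RefSeq +1 coincide; in CORE the DPE is scored relative to each putative initiator, whereas the TATA search is relative to the RefSeq +1.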
2.3 The ElemeNT algorithm
For each core promoter element, the user should specify a threshold between 0 and
1 for the presence of the element at a position. Default threshold values were
empirically determined for each element, based on known functional sequence
elements.
For a PWM matrix P with k columns, the PWM score is calculated for each sub-
sequence of length k (k-mer) in the sequences, by multiplying the appropriate values
of the PWM for each consecutive position, as follows:
$$\mathrm{PWM\_SCORE}(S_{i+1:i+k}, P) = \prod_{j=1}^{k} P'(j, S_{i+j}),$$
where $S_{i+1:i+k}$ is a k-mer starting at position $i+1$ in sequence $S$ and $P'(j, x)$
is the probability for nucleotide $x$ at position $j$ in $P$, normalized so that for a
given $j$, $\max_x P'(j, x) = 1$. The role of this normalization is to guarantee that the
final PWM score for every element is between 0 and 1, irrespective of the PWM's
parameters. Each sub-sequence with a score exceeding the specified threshold is
termed a 'hit'. The score is calculated for $0 \le i \le n - k$, where $n$ is the length of the
input sequence $S$, and hits are displayed in a list sorted in descending score order
for each element. Consensus match scores, which are the number of base matches
of the hit to the motif's consensus, are also reported for each hit (Table 1).
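The scoring rule above can be sketched in a few lines of Python. ElemeNT itself is written in Perl and its matrices are downloadable, so the toy matrix and the function names here are hypothetical; the sketch only illustrates the column normalization, the product over positions, and the threshold and sorting steps.

```python
from math import prod


def normalize_columns(pwm):
    """Divide each column by its largest probability, giving P' (column max = 1)."""
    return [{base: p / max(col.values()) for base, p in col.items()} for col in pwm]


def pwm_scores(seq, pwm):
    """Return (i, score) for every k-mer starting at 0-based index i, 0 <= i <= n - k."""
    norm = normalize_columns(pwm)
    k, seq = len(norm), seq.upper()
    return [
        # Bases missing from a column (e.g. 'N') contribute a factor of 0.
        (i, prod(col.get(base, 0.0) for col, base in zip(norm, seq[i:i + k])))
        for i in range(len(seq) - k + 1)
    ]


def find_hits(seq, pwm, threshold):
    """Keep k-mers scoring above `threshold`, sorted by descending score."""
    hits = [(i, s) for i, s in pwm_scores(seq, pwm) if s > threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical 3-column PWM (probability per base and position), not an ElemeNT matrix.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.6, "T": 0.2},
    {"A": 0.1, "C": 0.2, "G": 0.2, "T": 0.5},
]
print(find_hits("CAGTAGG", toy_pwm, threshold=0.3))  # [(1, 1.0), (4, 0.4)]
```

Because every normalized column has a maximum of 1, the best possible k-mer scores exactly 1, which is what makes a single threshold between 0 and 1 meaningful across elements of different lengths.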
3. Results
3.1 The CORE database
We constructed CORE, a database of all RefSeq-defined Drosophila melanogaster
transcripts, annotated for the presence of TATA-box, Drosophila initiator and
downstream core promoter element (DPE) (File S1). All Drosophila transcripts
initiating at the same nucleotide were treated as a single TSS. For a given TSS, an
initiator score was calculated for each position from -50 to +50 relative to +1 of the
RefSeq TSS. Two putative initiators were determined for each RefSeq TSS, with first
priority given to the Inr located closer to the annotated TSS. DPE scores were
calculated for each of the determined initiators, and the presence of a TATA box was
assigned. The annotation guidelines are detailed in section 2.2. Furthermore, the
frequencies of the following elements among the Drosophila transcripts were
summarized: TATA-box, Drosophila initiator and DPE motifs. In addition to a
comprehensive analysis of the core promoter composition of Drosophila transcripts,
CORE provides clues (based on the core promoter composition) with regard to an
optimal TSS. Notably, none of the available resources, including CORE, allows the
identification of most current core promoter elements and their potential combinations
within a given sequence.
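Two bookkeeping steps of the CORE construction, merging transcripts that share a start nucleotide and keeping two putative initiators per TSS, can be sketched as below. This is an illustration under assumptions: the tuple layout and transcript IDs are hypothetical, the second initiator is taken to be the next closest one, and ties in distance are broken by the higher match score.

```python
from collections import defaultdict


def collapse_tss(transcripts):
    """Group transcript IDs that initiate at the same nucleotide.

    `transcripts` holds (transcript_id, chromosome, strand, tss) tuples.
    """
    groups = defaultdict(list)
    for tx_id, chrom, strand, tss in transcripts:
        groups[(chrom, strand, tss)].append(tx_id)
    return dict(groups)


def distance_from_plus_one(coord: int) -> int:
    """Nucleotides between a promoter coordinate and +1 (numbering skips 0)."""
    return coord - 1 if coord > 0 else -coord


def pick_two_initiators(candidates):
    """Keep two candidate Inrs, the one closest to the annotated +1 first.

    `candidates` holds (coordinate, match_score) pairs. Assumption: equal
    distances are broken in favor of the higher score.
    """
    ranked = sorted(candidates, key=lambda c: (distance_from_plus_one(c[0]), -c[1]))
    return ranked[:2]


# Hypothetical transcripts: tx1 and tx2 share a start nucleotide, tx3 is on the other strand.
tss_groups = collapse_tss([
    ("tx1", "chr2L", "+", 1000),
    ("tx2", "chr2L", "+", 1000),
    ("tx3", "chr2L", "-", 1000),
])
print(tss_groups)
print(pick_two_initiators([(5, 4), (-2, 5), (2, 6), (8, 6)]))
```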
3.2 The Elements Navigation Tool
In order to facilitate the joint identification of the vast majority of core promoter
elements and their biologically-relevant combinations within a sequence, we
developed the Elements Navigation Tool (ElemeNT). ElemeNT is a web-based,
interactive tool for rapid and convenient detection of core promoter elements and
their combinations within any given sequence. Core promoter elements have been
shown to function at a specific distance from the TSS and to affect transcription (e.g.
as examined by mutational analysis). ElemeNT searches the input sequences for the
presence of core promoter elements that are precisely located relative to the TSS, as
specified by the user (Figure 2). The elements are represented by PWMs, which are
constructed based on the validated biologically functional sequences (File S2, Table
1). Notably, for some elements, the PWMs differ from the defined consensus
sequences, reflecting differences in the data sources used to generate these models.
The elements that can be searched for are: Mammalian initiator, Drosophila initiator,
TATA box, MTE, DPE, Bridge, BREu, BREd, Human TCT, Drosophila TCT, XCPE1
and XCPE2 (Table 1, Figure 1). Notably, the MTE, DPE and Bridge motifs are only
scored at the precise location relative to each detected mammalian/Drosophila
initiator, based on the known strict spacing requirement that is crucial for these
elements to be functional. The scores are normalized to the scale of 0 to 1, to allow
more interpretable results. The ElemeNT algorithm is described in section 2.3.
The output of the program contains the analyzed sequence, a color display of
possible combinations of core promoter elements found within it, and a table
containing each of the detected elements alongside its position, PWM and consensus
match scores (Figure 3). Suggested combinations of core promoter elements are
displayed in order to indicate potential synergism between elements that may inspire
further exploration. The combinations considered are: 1) the mammalian/Drosophila
initiator and either the MTE, DPE or Bridge motifs, 2) the TATA box and the
mammalian/Drosophila initiator, and 3) the TATA box and either BREu or BREd
(Figure 3A).
In the output table, the elements are ordered by their type and then sorted by PWM
scores (Figure 3B). The MTE, DPE and Bridge motifs, which are strictly dependent
on the presence of a functional initiator [4-6,19,20,22,24], are displayed immediately
below the corresponding initiator. For TATA box motifs, a message is displayed if the
specific TATA-box is located 26 to 40bp upstream of the A+1 of an initiator. In
addition, a message is displayed if a BREu or BREd is located in close proximity to
the specific TATA-box [41-43].
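The combination rules and the TATA-initiator spacing message described above can be sketched as follows. This is a simplified illustration, not ElemeNT's implementation: it works on the set of element names found, whereas ElemeNT pairs MTE, DPE and Bridge with the specific initiator they were scored from, and it assumes the 26 to 40 bp spacing is measured from the first T of the TATA box to the A+1. All names are hypothetical.

```python
INITIATORS = {"Mammalian Inr", "Drosophila Inr"}
DOWNSTREAM = {"MTE", "DPE", "Bridge"}
BRES = {"BREu", "BREd"}


def possible_combinations(found):
    """List the element pairs flagged as possible combinations."""
    found = set(found)
    combos = []
    for inr in sorted(found & INITIATORS):
        combos += [(inr, d) for d in sorted(found & DOWNSTREAM)]  # rule 1
        if "TATA box" in found:
            combos.append(("TATA box", inr))                       # rule 2
    if "TATA box" in found:
        combos += [("TATA box", b) for b in sorted(found & BRES)]  # rule 3
    return combos


def tata_message(tata_start: int, a1_index: int) -> bool:
    """True if the TATA box starts 26 to 40 bp upstream of the A+1 (0-based indices)."""
    return 26 <= a1_index - tata_start <= 40


print(possible_combinations({"Drosophila Inr", "DPE", "TATA box", "BREu"}))
print(tata_message(tata_start=10, a1_index=40))
```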
To assess the performance of the ElemeNT tool, a set of experimentally-validated
core promoter sequences was analyzed by the tool. The analysis of the Drosophila
Inr is presented as an example (Figure S1). Importantly, ElemeNT detected most of
the biologically functional Drosophila initiator motifs in the dataset at cutoff
values around 0.01. As expected, lower thresholds detected a greater number of
correct hits, but also yielded a higher false positive ratio. False negatives (missed
motifs) were scored as well. The threshold values of 0.005-0.01 correlated strongly
with the scores obtained for sequence variants of previously validated motifs [23].
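The threshold trade-off described above can be illustrated by counting true positives, false positives and false negatives at several cutoffs. The scores below are hypothetical and are not data from this study; a site counts as a hit when its score exceeds the threshold, as in the ElemeNT algorithm.

```python
def confusion_counts(sites, threshold):
    """Count true positives, false positives and false negatives at a threshold.

    `sites` holds (score, validated) pairs, where `validated` marks a known
    functional motif. Scores above the threshold count as hits.
    """
    tp = sum(1 for score, ok in sites if score > threshold and ok)
    fp = sum(1 for score, ok in sites if score > threshold and not ok)
    fn = sum(1 for score, ok in sites if score <= threshold and ok)
    return tp, fp, fn


# Hypothetical scored sites, not data from the study.
sites = [(0.2, True), (0.02, True), (0.008, True),
         (0.012, False), (0.003, False), (0.0005, True)]

for t in (0.01, 0.005, 0.001):
    tp, fp, fn = confusion_counts(sites, t)
    print(f"threshold={t}: TP={tp} FP={fp} FN={fn}")
```

Lowering the cutoff in this toy set recovers more validated motifs but also admits more non-validated ones, which is the pattern reported for the Drosophila Inr.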
This dataset cannot be used to compare the performance of ElemeNT with other
programs, such as YAPP, as YAPP searches only for the mammalian Inr and not for
the Drosophila Inr. The definition of the mammalian Inr is looser than that of the
Drosophila Inr, and no individually-validated set of mammalian TSSs was available.
Taken together, the CORE database and the ElemeNT program provide new,
improved tools to assess the presence of core promoter elements within a given DNA
sequence.
4. Discussion
Core promoter elements, located in the immediate vicinity of the TSSs, were
demonstrated to have a great effect on the transcriptional output [6,7]. The majority
of core promoter elements were identified as DNA sequences that are recognized by
components of the preinitiation complex [19,41,42,44,45]. In addition,
overrepresented motifs were discovered in the region around the annotated TSSs
[46-48]. Some of these motifs affected the transcriptional outcome [24] and some
were bound by transcription-regulating proteins [49].
The determination of actual TSSs, which influence the motifs discovered in their
vicinity, is a critical factor in the prediction of core promoter elements. The
comprehensive determination of TSS provided by RefSeq is based on the rigorous
alignment of reads from high-quality RNA [50]. However, the TSS of the same gene
can vary across the developmental stages, tissues, and time points sampled, which
poses a great challenge for integrating the data provided by different studies.
Both the CORE database and the ElemeNT tool will benefit from the wealth of rapidly
evolving novel high-throughput techniques to identify features and sequences that
might affect transcription; these include PEAT [51], CAGE [27], FAIRE-seq [52],
ChIP-seq [53], and GRO-seq [28]. The above techniques are applied by major
projects and consortia, which are aimed at dissecting the rules governing
transcriptional regulation, including ENCODE [54], modENCODE [55], and
FANTOM5 [56], as well as other genome-wide studies [57,58]. These different
strategies complement each other and together introduce a much more complex view
of RNA transcription initiation than previously anticipated [59].
Furthermore, core promoter elements are associated with focused, rather than
dispersed, transcription [6], while the classification of promoters into these classes is
largely lacking. Since the CORE database uses RefSeq's annotation of 5' ends, it
should be revisited in the future, when new standardized data for transcription start
sites become available. Insights gained during the integration of additional data, e.g.
CAGE [60-62] and GRO-seq [59,63], will be of utmost importance for re-defining
transcription start sites. Moreover, this will enable the re-evaluation of current tools.
The overall distribution of TATA box, Inr and DPE motifs among the Drosophila
transcripts might consequently change.
The uniqueness of the ElemeNT program, as compared to other promoter-prediction
software, is its major focus on biologically-functional core promoter elements,
manifested by two major concepts that lie at the foundation of the ElemeNT
algorithm. The first is the exclusive use of experimentally validated core promoter
motifs, rather than overrepresented motifs, to construct the PWMs used. The use of
an experimentally-determined individual TSSs set is, however, limited due to possible
statistical bias.
The second is the obligatory presence of an initiator, and the strict spacing for the
downstream promoter elements MTE, DPE and Bridge. Both the presence of a
functional initiator and the strict spacing are crucial for the functionality of the
downstream elements, and are frequently omitted by other core promoter elements
prediction programs available [29,31,34,37,38]. Moreover, the identification of
combinations of elements, which were experimentally demonstrated to result in
synergistic effects [10,24,25], may spark new research directions. Despite the fact
that the presence of potential core promoter elements, or any combination of them,
may not necessarily imply that the elements are functional, their presence might
indicate that the specific genomic locus is transcriptionally active. However, in
contrast to most of the available promoter prediction programs, ElemeNT is not
designed to produce or analyze genome-scale data, but is rather intended to
narrow down a given region of interest, considering the currently available,
experimentally-validated information about core promoter motifs themselves. The
redundancy of the core promoter motifs leads to the identification of sequences that
perfectly match functionally-verified sequences, yet are not functional. Based on
experience with transcription factor binding motifs [64], sorting out only the
functionally-relevant hits might prove to be a difficult task. Future modifications of the
algorithm used to annotate core promoter elements will be based on new insights
and a better understanding of transcription regulation, obtained by the
abovementioned techniques and consortia.
Importantly, the ElemeNT program can assist in the analysis of sequences from
organisms whose TSSs have not yet been comprehensively defined. For example,
both the TATA box and the BRE motifs are conserved from archaebacteria to
humans [65], and many organisms whose transcriptomes have not been annotated
are likely to contain such core promoter elements.
To conclude, we anticipate that the ElemeNT tool, along with the CORE database,
will make the search for specific core promoter elements and their combinations
within Drosophila transcripts or any sequence of interest, accessible to scientists and
help in elucidating the major role core promoter elements play in gene expression.
Acknowledgments
We thank Marina Socol, Boris Komraz and Dr. Eli Sloutskin for invaluable assistance
in ElemeNT development and web execution. We thank Gal Nuta for assisting with
optimization of ElemeNT parameters. We thank Dr. Diana Ideses, Dan Even, Adi
Kedmi, Hila Shir-Shapira and Gal Nuta for critical reading of the manuscript.
Funding Statement
This research was supported by grants from the Israel Science Foundation to T.J-G
(no. 798/10) and R.S (no. 317/13) and the European Union Seventh Framework
Programme (Marie Curie International Reintegration Grant) to T.J-G (no. 256491).
Y.O was supported by the Edmond J. Safra Center for Bioinformatics at Tel-Aviv
University and the Israeli Center for Research Excellence (I-CORE), Gene
Regulation in Complex Human Disease, center 41/11.
References
1. Smale ST (2001) Core promoters: active contributors to combinatorial gene
regulation. Genes & Development 15: 2503-2508.
2. Smale ST, Kadonaga JT (2003) The RNA polymerase II core promoter. Annual
Review of Biochemistry 72: 449-479.
3. Heintzman ND, Ren B (2007) The gateway to transcription: identifying,
characterizing and understanding promoters in the eukaryotic genome. Cellular and
Molecular Life Sciences 64: 386-400.
4. Juven-Gershon T, Hsu J-Y, Theisen JWM, Kadonaga JT (2008) The RNA
polymerase II core promoter - the gateway to transcription. Current Opinion in Cell
Biology 20: 253-259.
5. Juven-Gershon T, Kadonaga JT (2010) Regulation of gene expression via the core
promoter and the basal transcriptional machinery. Dev Biol 339: 225-229.
6. Kadonaga JT (2012) Perspectives on the RNA polymerase II core promoter. Wiley
Interdiscip Rev Dev Biol 1: 40-51.
7. Lenhard B, Sandelin A, Carninci P (2012) Metazoan promoters: emerging
characteristics and insights into transcriptional regulation. Nature Reviews Genetics
13: 233-245.
8. Muller F, Demeny MA, Tora L (2007) New problems in RNA polymerase II
transcription initiation: matching the diversity of core promoters with a variety of
promoter recognition factors. J Biol Chem 282: 14685-14689.
9. Muller F, Tora L (2014) Chromatin and DNA sequences in defining promoters for
transcription initiation. Biochim Biophys Acta 1839: 118-128.
10. Juven-Gershon T, Cheng S, Kadonaga JT (2006) Rational design of a super core
promoter that enhances gene expression. Nature Methods 3: 917-922.
11. Zehavi Y, Kuznetsov O, Ovadia-Shochat A, Juven-Gershon T (2014) Core
promoter functions in the regulation of gene expression of Drosophila dorsal target
genes. J Biol Chem 289: 11993-12004.
12. Zehavi Y, Sloutskin A, Kuznetsov O, Juven-Gershon T (2014) The core promoter
composition establishes a new dimension in developmental gene networks. Nucleus
5.
13. Butler JE, Kadonaga JT (2001) Enhancer-promoter specificity mediated by DPE
or TATA core promoter motifs. Genes Dev 15: 2515-2519.
14. Dikstein R (2011) The unexpected traits associated with core promoter elements.
Transcription 2: 201-206.
15. Thomas MC, Chiang CM (2006) The general transcription machinery and general
cofactors. Critical Reviews in Biochemistry and Molecular Biology 41: 105-178.
16. He Y, Fang J, Taatjes DJ, Nogales E (2013) Structural visualization of key steps
in human transcription initiation. Nature 495: 481-486.
17. Grunberg S, Hahn S (2013) Structural insights into transcription initiation by RNA
polymerase II. Trends Biochem Sci 38: 603-611.
18. Cianfrocco MA, Kassavetis GA, Grob P, Fang J, Juven-Gershon T, et al. (2013)
Human TFIID binds to core promoter DNA in a reorganized structural state. Cell 152:
120-131.
19. Burke TW, Kadonaga JT (1996) Drosophila TFIID binds to a conserved
downstream basal promoter element that is present in many TATA-box-deficient
promoters. Genes & Development 10: 711-724.
20. Burke TW, Kadonaga JT (1997) The downstream core promoter element, DPE, is
conserved from Drosophila to humans and is recognized by TAF(II)60 of Drosophila.
Genes & Development 11: 3020-3031.
21. Wu CH, Madabusi L, Nishioka H, Emanuel P, Sypes M, et al. (2001) Analysis of
core promoter sequences located downstream from the TATA element in the hsp70
promoter from Drosophila melanogaster. Mol Cell Biol 21: 1593-1602.
22. Theisen JW, Lim CY, Kadonaga JT (2010) Three key subregions contribute to
the function of the downstream RNA polymerase II core promoter. Mol Cell Biol 30:
3471-3479.
23. Kutach AK, Kadonaga JT (2000) The downstream promoter element DPE
appears to be as widely used as the TATA box in Drosophila core promoters. Mol
Cell Biol 20: 4754-4764.
24. Lim CY, Santoso B, Boulay T, Dong E, Ohler U, et al. (2004) The MTE, a new
core promoter element for transcription by RNA polymerase II. Genes Dev 18:
1606-1617.
25. Gershenzon NI, Ioshikhes IP (2005) Synergy of human Pol II core promoter
elements revealed by statistical sequence analysis. Bioinformatics 21: 1295-1300.
26. Juven-Gershon T, Hsu J-Y, Kadonaga JT (2008) Caudal, a key developmental
regulator, is a DPE-specific transcriptional factor. Genes Dev 22: 2823-2830.
27. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, et al. (2003) Cap analysis
gene expression for high-throughput analysis of transcriptional starting point and
identification of promoter usage. Proc Natl Acad Sci U S A 100: 15776-15781.
28. Core LJ, Waterfall JJ, Lis JT (2008) Nascent RNA sequencing reveals
widespread pausing and divergent initiation at human promoters. Science 322: 1845-
1848.
29. Bajic VB, Tan SL, Suzuki Y, Sugano S (2004) Promoter prediction analysis on
the whole human genome. Nat Biotechnol 22: 1467-1473.
30. Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, et al. (2008) A code for
transcription initiation in mammalian genomes. Genome Res 18: 1-12.
31. Narlikar L, Ovcharenko I (2009) Identifying regulatory elements in eukaryotic
genomes. Brief Funct Genomic Proteomic 8: 215-230.
32. Ohler U, Niemann H (2001) Identification and analysis of eukaryotic promoters:
recent computational approaches. Trends Genet 17: 56-60.
33. Pedersen AG, Baldi P, Chauvin Y, Brunak S (1999) The biology of eukaryotic
promoter prediction--a review. Comput Chem 23: 191-207.
34. Rach EA, Winter DR, Benjamin AM, Corcoran DL, Ni T, et al. (2011)
Transcription initiation patterns indicate divergent strategies for gene regulation at the
chromatin level. PLoS Genet 7: e1001274.
35. Duran E, Djebali S, Gonzalez S, Flores O, Mercader JM, et al. (2013) Unravelling
the hidden DNA structural/physical code provides novel insights on promoter
location. Nucleic Acids Res 41: 7220-7230.
36. Abeel T, Saeys Y, Rouze P, Van de Peer Y (2008) ProSOM: core promoter
prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics
24: i24-31.
37. Datta S, Mukhopadhyay S (2013) A composite method based on formal grammar
and DNA structural features in detecting human polymerase II promoter region. PLoS
One 8: e54843.
38. Ohler U (2006) Identification of core promoter modules in Drosophila and their
application in accurate transcription start site prediction. Nucleic Acids Res 34: 5943-
5950.
39. Hartmann H, Guthohrlein EW, Siebert M, Luehr S, Soding J (2013) P-value-
based regulatory motif discovery using positional weight matrices. Genome Res 23:
181-194.
40. Luehr S, Hartmann H, Soding J (2012) The XXmotif web server for eXhaustive,
weight matriX-based motif discovery in nucleotide sequences. Nucleic Acids Res 40:
W104-109.
41. Deng W, Roberts SG (2005) A core promoter element downstream of the TATA
box that is recognized by TFIIB. Genes Dev 19: 2418-2423.
42. Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH (1998) New core
promoter element in RNA polymerase II-dependent transcription: sequence-specific
DNA binding by transcription factor IIB. Genes Dev 12: 34-44.
43. Deng W, Roberts SG (2007) TFIIB and the regulation of transcription by RNA
polymerase II. Chromosoma 116: 417-429.
44. Chalkley GE, Verrijzer CP (1999) DNA binding site selection by RNA polymerase
II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO J 18: 4835-
4845.
45. Tokusumi Y, Ma Y, Song X, Jacobson RH, Takada S (2007) The new core
promoter element XCPE1 (X Core Promoter Element 1) directs activator-, mediator-,
and TATA-binding protein-dependent but TFIID-independent RNA polymerase II
transcription from TATA-less promoters. Mol Cell Biol 27: 1844-1858.
46. FitzGerald PC, Sturgill D, Shyakhtenko A, Oliver B, Vinson C (2006) Comparative
genomics of Drosophila and human core promoters. Genome Biol 7: R53.
47. Ohler U, Liao GC, Niemann H, Rubin GM (2002) Computational analysis of core
promoters in the Drosophila genome. Genome Biol 3: RESEARCH0087.
48. Xi H, Yu Y, Fu Y, Foley J, Halees A, et al. (2007) Analysis of overrepresented
motifs in human core promoters reveals dual regulatory roles of YY1. Genome Res
17: 798-806.
49. Li J, Gilmour DS (2013) Distinct mechanisms of transcriptional pausing
orchestrated by GAGA factor and M1BP, a novel transcription factor. EMBO J 32:
1829-1841.
50. Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a
curated non-redundant sequence database of genomes, transcripts and proteins.
Nucleic Acids Res 33: D501-504.
51. Ni T, Corcoran DL, Rach EA, Song S, Spana EP, et al. (2010) A paired-end
sequencing strategy to map the complex landscape of transcription initiation. Nat
Methods 7: 521-527.
52. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-
Assisted Isolation of Regulatory Elements) isolates active regulatory elements from
human chromatin. Genome Res 17: 877-885.
53. Furey TS (2012) ChIP-seq and beyond: new and improved methodologies to
detect and characterize protein-DNA interactions. Nat Rev Genet 13: 840-852.
54. ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia Of DNA Elements)
Project. Science 306: 636-640.
55. Washington NL, Stinson EO, Perry MD, Ruzanov P, Contrino S, et al. (2011) The
modENCODE Data Coordination Center: lessons in harvesting comprehensive
experimental details. Database (Oxford) 2011: bar023.
56. Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, et al. (2014) A promoter-
level mammalian expression atlas. Nature 507: 462-470.
57. Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, et al. (2007)
Mammalian RNA polymerase II core promoters: insights from genome-wide studies.
Nat Rev Genet 8: 424-436.
58. Lenhard B, Sandelin A, Carninci P (2012) Metazoan promoters: emerging
characteristics and insights into transcriptional regulation. Nat Rev Genet 13: 233-
245.
59. Core LJ, Martins AL, Danko CG, Waters CT, Siepel A, et al. (2014) Analysis of
nascent RNA identifies a unified architecture of initiation regions at mammalian
promoters and enhancers. Nat Genet 46: 1311-1320.
60. FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H,
et al. (2014) A promoter-level mammalian expression atlas. Nature 507: 462-470.
61. Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, et al. (2011)
Genome-wide analysis of promoter architecture in Drosophila melanogaster.
Genome Res 21: 182-192.
62. Nechaev S, Fargo DC, dos Santos G, Liu L, Gao Y, et al. (2010) Global analysis
of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in
Drosophila. Science 327: 335-338.
63. Saunders A, Core LJ, Sutcliffe C, Lis JT, Ashe HL (2013) Extensive polymerase
pausing during Drosophila axis patterning enables high-level and pliable
transcription. Genes Dev 27: 1146-1158.
64. Shlyueva D, Stampfel G, Stark A (2014) Transcriptional enhancers: from
properties to genome-wide predictions. Nat Rev Genet 15: 272-286.
65. Reeve JN (2003) Archaeal chromatin and transcription. Mol Microbiol 48: 587-
598.
66. Smale ST, Baltimore D (1989) The "initiator" as a transcription control element.
Cell 57: 103-113.
67. Goldberg ML (1979) Sequence analysis of Drosophila histone genes. Ph.D.
Thesis.
68. Parry TJ, Theisen JWM, Hsu J-Y, Wang Y-L, Corcoran DL, et al. (2010) The TCT
motif, a key component of an RNA polymerase II transcription system for the
translational machinery. Genes Dev 24: 2013-2018.
69. Anish R, Hossain MB, Jacobson RH, Takada S (2009) Characterization of
transcription from TATA-less promoters: identification of a new core promoter
element XCPE2 and analysis of factor requirements. PLoS One 4: e5103.
70. Lewis BA, Kim TK, Orkin SH (2000) A downstream element in the human beta-
globin promoter: evidence of extended sequence-specific transcription factor IID
contacts. Proc Natl Acad Sci U S A 97: 7172-7177.
71. Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, et al. (2005)
Functional characterization of core promoter elements: the downstream core element
is recognized by TAF1. Mol Cell Biol 25: 9674-9686.
Figure legends
Figure 1. Schematic representation of the major core promoter elements. The core
promoter region (-40 to +40 relative to the TSS) is illustrated. The diagram is
roughly to scale, and each element is colored according to its color in the output
table (see Figure 3B).
Figure 2. Flow diagram of the ElemeNT process. The flowchart shows the input,
processing and output steps of the ElemeNT program. The input consists of a set of
sequences and the elements to search for, with their corresponding thresholds.
ElemeNT identifies hits for each element and considers their possible combinations.
The output includes the combinations of core promoter elements and a table listing
all the identified elements, their locations, PWM scores and consensus match scores.
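The hit-calling step in this flowchart can be illustrated with a minimal Python sketch. It slides a position weight matrix along a sequence and keeps the windows whose normalized score reaches a user-set threshold. The multiplicative score, the min-max normalization, the toy three-position matrix and all function names are assumptions made for illustration. This is not ElemeNT's source code, and ElemeNT's actual PWM and consensus scores may be computed differently.

```python
from typing import Dict, List, Tuple

# Hypothetical PWM layout: one dict of base probabilities per motif position.
PWM = List[Dict[str, float]]


def raw_score(pwm: PWM, window: str) -> float:
    """Product of the per-position probabilities of the window's bases (A/C/G/T only)."""
    score = 1.0
    for column, base in zip(pwm, window):
        score *= column[base]
    return score


def normalized_score(pwm: PWM, window: str) -> float:
    """Rescale the raw score to [0, 1] between the worst and best possible windows."""
    best = worst = 1.0
    for column in pwm:
        best *= max(column.values())
        worst *= min(column.values())
    return (raw_score(pwm, window) - worst) / (best - worst)


def scan(pwm: PWM, sequence: str, threshold: float) -> List[Tuple[int, float]]:
    """Report (0-based start, normalized score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = normalized_score(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 3-position matrix favouring "TCA" (illustrative values only).
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
print(scan(toy_pwm, "GGTCAGG", threshold=0.9))  # [(2, 1.0)]
```

Normalizing to the best and worst attainable scores puts every element on the same 0 to 1 scale. This is one way a single per-element threshold could be applied across motifs of different lengths.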
Figure 3. A sample output of the ElemeNT program. (A) A screenshot of the sample
input sequence and the combinations of elements identified in it. ElemeNT detected
a TATA box flanked by both a BREu element and a BREd element, Drosophila and
mammalian initiator elements, and MTE, DPE and Bridge elements. The two possible
combinations result from a sequence match to both the Drosophila and the mammalian
initiators, owing to the partial sequence redundancy of the two elements. (B) The
table displaying all the elements identified within the sample input sequence, their
locations, and their PWM and consensus match scores. Note the message displayed for
the TATA box, indicating the presence of mammalian and Drosophila initiators, as
well as BREu and BREd, at optimal distances for transcriptional synergy.
Figure S1. Evaluation of ElemeNT's discovery rates. False positive (red) and false
negative (blue) hit ratios for the Drosophila initiator motif were scored as a
function of the threshold used. The false positive ratio was calculated as the
number of false positive matches among all potential matches. The false negative
ratio was assigned based on the discovery rate of true hits. The x-axis is presented
on a logarithmic scale. The analysis included 43 sequences, each 50 bp long, that
were experimentally found to be functional.
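The two ratios plotted in Figure S1 can be sketched as shown below. The sketch assumes that each potential match is labeled as a true (experimentally supported) site or not. It reads "discovery rate of true hits" as the fraction of true sites scoring at or above the threshold, so the false negative ratio is one minus that fraction. The data layout, the function name and the toy values are hypothetical and do not reproduce the 43-sequence analysis.

```python
from typing import List, Tuple


def error_ratios(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (false_positive_ratio, false_negative_ratio) at one threshold.

    scored holds one (score, is_true_site) pair per potential match; it is
    assumed to be non-empty and to contain at least one true site.
    """
    called = [is_true for score, is_true in scored if score >= threshold]
    false_pos = called.count(False)
    true_found = called.count(True)
    true_total = sum(1 for _, is_true in scored if is_true)
    fp_ratio = false_pos / len(scored)       # false positives among all potential matches
    fn_ratio = 1 - true_found / true_total   # share of true sites missed
    return fp_ratio, fn_ratio


# Toy data: three supported sites (True) and two decoys (False).
toy = [(0.9, True), (0.6, True), (0.3, False), (0.05, False), (0.02, True)]
for exponent in (-3, -2, -1):                # log-spaced thresholds, as on a log x-axis
    t = 10.0 ** exponent
    print(t, error_ratios(toy, t))
```

Raising the threshold trades false positives for false negatives. Sweeping it over a logarithmic range exposes that trade-off in the same way as the red and blue curves.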
File S1. The CORE Database. This database was created in order to identify putative
TATA box, initiator (Inr) and DPE elements in the core promoter regions of different
Drosophila melanogaster TSSs, to calculate the frequencies of these core promoter
elements among the transcripts, and to provide clues (based on core promoter
composition) with regard to an optimal TSS. The RefSeq sheet contains all the
annotated Drosophila transcripts, from -50 to +50 relative to the +1 of the RefSeq
TSS. All the transcripts initiating at the same nucleotide are treated as a single TSS.
The Inr position sheet contains the determination of the optimal initiator +1 position.
To enable a more comprehensive analysis, a 'Second best initiator' search was
performed using less stringent criteria. The DPE & TATA sheet calculates the score of
the downstream core promoter element (DPE) and the presence of TATA-box motifs for
each distinct Drosophila TSS. This file, along with its documentation, is available at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/core-description
File S2. Position weight matrices representing the core promoter elements. This file
contains the position weight matrices (PWMs) of the different core promoter
elements, i.e., the nucleotide distribution at each position of each element. Each
core promoter element appears in a separate sheet. The source of the promoter
sequences used to calculate each PWM is indicated, as is the Laplace smoothing
performed to avoid zero values. All indicated positions are relative to the TSS.
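The following sketch illustrates the Laplace smoothing mentioned above. It builds a PWM from aligned sites by adding a pseudocount of 1 to every base count before normalizing, so that no position receives a zero probability. The function name, the equal-length input format and the four example sites are hypothetical. The PWMs in File S2 were derived from the promoter sets indicated there.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position base frequencies with Laplace (add-pseudocount) smoothing.

    sites must be aligned, equal-length A/C/G/T strings (e.g. anchored on the TSS).
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned and of equal length")
    # Every base gets one extra pseudocount, so the denominator grows by 4 x pseudocount.
    denominator = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        pwm.append({base: (counts[base] + pseudocount) / denominator for base in BASES})
    return pwm


# Four hypothetical Inr-like sites; without smoothing, absent bases would get 0.
pwm = build_pwm(["TCAGTT", "TCAGTC", "TCATTT", "GCAGTT"])
print(pwm[0])  # {'A': 0.125, 'C': 0.125, 'G': 0.25, 'T': 0.5}
```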
Table 1. The known core promoter elements.
Name | Position (relative to the TSS) | Consensus (in IUPAC characters) | References
Mammalian initiator | -2 to +4 | YYANWYY | [66]
Drosophila initiator | -2 to +5 | TCAKTY |
TATA box | -30/-31 to -23/-24 | TATAWAAR | [6,67]
BREu | Immediately upstream of the TATA box | SSRCGCC | [42]
BREd | Immediately downstream of the TATA box | RTDKKKK | [41]
DPE (Inr dependent) | +28 to +33 | DSWYVY (functional range set) | [19,20,23]
MTE (Inr dependent) | +18 to +29 | CSARCSSAAC | [24]
Bridge (Inr dependent) | Part I: +18 to +22; Part II: +30 to +33 | Part I: CGANC; Part II: WYGT | [22]
Drosophila TCT | -2 to +6 | YYCTTTYY | [68]
Human TCT | -1 to +6 | YCTYTYY | [68]
XCPE1 | -8 to +2 | DSGYGGRASM | [45]
XCPE2 | -9 to +2 | VCYCRTTRCMY | [69]
DCE | +6 to +11, +16 to +21, +30 to +34 | Necessary motifs: CTTC, CTGT, AGC | [70,71]
Motif 1 | Just upstream of the TSS, but can be found up to -300 | YGGTCACACTR | [47,49]
The table includes the position (relative to the TSS, +1), motif logo, IUPAC
consensus sequence and references for each element.
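The consensus column uses IUPAC ambiguity codes. A consensus match therefore amounts to checking, position by position, whether the observed base is allowed by the code. A minimal Python sketch of such a matcher follows. The helper names are hypothetical, and the matcher reports exact consensus matches only, without the PWM-based scoring described in the figure legends.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus: str) -> str:
    """Turn e.g. 'TATAWAAR' into a regex with one character class per position."""
    return "".join(f"[{IUPAC[code]}]" for code in consensus)


def find_consensus(consensus: str, sequence: str) -> List[int]:
    """0-based start positions of all exact consensus matches (overlaps allowed)."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return [match.start() for match in pattern.finditer(sequence.upper())]


print(find_consensus("TATAWAAR", "GCTATAAAAGGC"))  # [2]
print(find_consensus("TCAKTY", "ggtcagttcc"))       # [2]
```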
Figure 1:
Figure 2:
Figure 3:
Appendix 8
General schematic plasmid maps of the plasmid variants containing tagged TAFs.
A. The sequences of the FLAG and HA epitope tags. B. TAF6 with a C-terminal
FLAG-HA tag. C. TAF6 with an N-terminal FLAG-HA tag. D. TAF9 with a C-terminal
FLAG-HA tag. E. TAF9 with an N-terminal FLAG-HA tag.
Appendix 9
An invited review article about the core promoter, currently under review.
The core promoter: at the heart of gene expression
Yehuda M. Danino1, Dan Even1, Diana Ideses1 and Tamar Juven-Gershon1*
1The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan
University, Ramat Gan 5290002, Israel
Running title: The core promoter: a central player in gene expression
Key words: core promoter; RNA Pol II transcription; core promoter
elements/motifs; enhancer-promoter specificity; core promoter preferential
activation; gene expression
The authors declare that there are no potential conflicts of interest.
* To whom correspondence should be addressed. Tel: +972-3-531-8244; Fax:
+972-3-738-4058; Email: [email protected]
ABSTRACT
The identities of different cells and tissues in multicellular organisms are
determined by tightly controlled transcriptional programs that enable accurate
gene expression. The mechanisms that regulate gene expression comprise
diverse molecular circuits with multiple dedicated components. The RNA
polymerase II (Pol II) core promoter lies at the center of this
spatiotemporally orchestrated molecular machine. Here, we discuss
transcription initiation, diversity in core promoter composition, interactions of
the basal transcription machinery with the core promoter, enhancer-promoter
specificity, core promoter-preferential activation, enhancer RNAs, Pol II
pausing, transcription termination, Pol II recycling and translation. We further
discuss recent findings indicating that promoters and enhancers share similar
features and, contrary to previous assumptions, may not differ substantially
from each other. Taken together, we review a broad spectrum of studies that
highlight the importance of the core promoter and its pivotal role in the
regulation of metazoan gene expression, and we suggest future research
directions and challenges.
Introduction
Appropriate temporal and spatial gene expression is a highly complex process
underlying the fate and function of different cells and tissues. The regulation
of this process involves multiple levels of orchestrated molecular events
[1-3]. A central event in the regulation of eukaryotic gene expression is
the initiation of transcription. The initiation of transcription of protein-coding
genes and distinct non-coding RNAs occurs following the recruitment of RNA
polymerase II (Pol II) to the core promoter region by the basal transcription
machinery [4].
The core promoter is generally defined as the minimal DNA sequence
that directs accurate initiation of transcription. The core promoter sequence
encompasses the transcription start site (TSS), typically referred to as the +1
position [5, 6]. Examination of the distribution of TSSs reveals that there are
multiple modes of transcription initiation (Fig. 1A). Distinct molecular players
can open the chromatin structure at the core promoter region and thus
facilitate initiation of transcription. Interestingly, active promoters are
associated with specific chromatin signatures. These include nucleosome-
depleted regions (NDR) or reduced nucleosome occupancy over the
promoters, DNaseI hypersensitive sites (DHS) and the enrichment of specific
histone modifications, such as di- and tri-methylation of H3K4 and acetylation
of H3K4 and H3K27 (Fig. 1B) [7, 8]. In the past, it was assumed that the core
promoter is a generic entity that functions in a universal manner. Today,
however, it is increasingly accepted that the unique properties of a given
promoter are a function of its architecture and its core promoter motif
composition (Fig. 1C and D) [5, 6, 9, 10].
The core promoter, which is often referred to as “the gateway to
transcription”, is a central component in the initiation of transcription [11, 12].
Research in the past decade has enhanced our understanding of the
fundamental roles that the core promoter plays in the initiation of transcription,
as well as in the regulation of additional aspects of gene expression. Insights
have been gained from studies of specific genes and gene networks [12-14],
from genome-wide studies [10, 15] utilizing methodologies such as PEAT
[16], 5' RACE [17], CAGE [18], FAIRE-seq [19], ChIP-seq [20], GRO-seq [21]
and RNA-seq [22], and from key projects and consortia (e.g. modENCODE [23],
ENCODE [24] and FANTOM5 [25]) that developed following the
implementation of some of the above methods. Accordingly, core promoters
can be studied at different resolutions: from genomic architecture,
transcription co-regulators and sequence-specific transcription factors (Fig.
2A), through basal transcription factors (Fig. 2B and C), to DNA sequence
motifs (Fig. 2C). Importantly, the different experimental strategies complement
each other and together provide an elaborate view of core promoters. Here,
we review the current state of knowledge relevant to the contribution of the
core promoter to multiple aspects of gene expression, and discuss future
directions and challenges in the field.
1. Diversity in the transcription initiation landscape
1.1. Multiple modes of transcription initiation
The core promoter is best known for its role in directing proper transcription
initiation at the TSS. Several years ago, two modes of transcription initiation,
focused and dispersed, were noted in metazoans (Fig. 1A) [6, 10]. Focused
(also termed “sharp peak”) promoters contain a single predominant TSS or a
few TSSs within a narrow region of several nucleotides [9]. Focused
promoters span approximately -40 to +40 nucleotides relative to the TSS
(the +1 position). Focused transcription initiation is associated with
spatiotemporally regulated, tissue-specific genes [26] and with canonical core
promoter elements that have a positional bias, such as the TATA box,
Initiator, MTE and DPE [27] (Fig. 1C).
Dispersed (also termed “broad”) promoters contain multiple weak start
sites that spread over 50 to 100 nucleotides at the promoter region ([9, 10]
and refs therein). Dispersed transcription initiation is associated with
constitutive or housekeeping genes. Vertebrate dispersed promoters often
contain CpG islands and Sp1 and NF-Y sites [6, 9, 28], whereas Drosophila
dispersed promoters often contain elements that have weaker positional
biases (as compared to focused promoters) but frequently co-occur in a
specific order and orientation: Ohler 1, DNA replication element (DRE), Ohler 6
and Ohler 7 [27, 29] (Fig. 1D). Although the focused promoter architecture
exists in all organisms and is the predominant initiation mode in simpler
organisms, the dispersed mode is more common in higher eukaryotes [9, 26].
For example, over 70% of vertebrate promoters are dispersed [28, 30-32].
From a teleological standpoint, the association of sharp TSSs with regulated
genes and of broad TSS patterns with constitutively expressed genes is
rather intuitive. Precise control of gene expression would be easier to
achieve from focused TSSs, whereas the use of multiple start sites in the
dispersed promoters of housekeeping genes would support constitutive
transcription with minimal variation in expression levels [9].
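The focused/dispersed distinction can be made concrete by summarizing a promoter's TSS tag counts (for example, CAGE reads per position) as the width of the narrowest window that contains most of the tags. In the sketch below, the 80% tag fraction and the 10-nt cutoff for a focused promoter are illustrative assumptions rather than thresholds taken from the cited studies. The function names are hypothetical.

```python
from typing import Dict


def core_width(tss_counts: Dict[int, int], fraction: float = 0.8) -> int:
    """Width (nt) of the narrowest window holding >= `fraction` of all TSS tags.

    tss_counts maps a genomic position to the number of tags (e.g. CAGE reads)
    starting there; positions with zero tags are ignored.
    """
    positions = sorted(p for p, c in tss_counts.items() if c > 0)
    target = fraction * sum(tss_counts[p] for p in positions)
    best = None
    left = 0
    window = 0
    for right in range(len(positions)):
        window += tss_counts[positions[right]]
        # Drop tags from the left while the window still reaches the target.
        while window - tss_counts[positions[left]] >= target:
            window -= tss_counts[positions[left]]
            left += 1
        if window >= target:
            width = positions[right] - positions[left] + 1
            best = width if best is None else min(best, width)
    return best


def classify(tss_counts: Dict[int, int], max_focused_width: int = 10) -> str:
    """Label a promoter 'focused' or 'dispersed' (the 10-nt cutoff is illustrative)."""
    return "focused" if core_width(tss_counts) <= max_focused_width else "dispersed"


sharp = {-1: 5, 0: 90, 1: 5}          # one predominant start site
broad = {p: 1 for p in range(80)}      # weak sites spread over 80 nt
print(classify(sharp), core_width(sharp))  # focused 1
print(classify(broad), core_width(broad))  # dispersed 64
```

A single width-based cutoff cannot capture the intermediate "broad with peak" shape discussed in the following subsection. This is one reason a two-way classification can be too coarse.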
1.2. Focused versus Dispersed initiation patterns - recent studies, new
insights
Despite the abovementioned distinction between the two modes of
transcription initiation, classification of transcription initiation landscapes is not
so straightforward. Functional experiments and genome-wide studies using
advanced technologies imply that there are multiple ways to classify
promoters. Thus, the boundaries between these two major types of promoters
are sometimes unclear [6, 33]. In addition to the focused and dispersed
types, another promoter type, the mixed promoter (also termed “broad
with peak”; [16]), was revealed. This promoter type exhibits a dispersed
initiation pattern with a single strong transcription start site [6, 34] (Fig. 1A).
Several studies classified mammalian promoters using alternative criteria
[26, 28, 32]. The Ren lab classified active promoters based on genome-wide
ChIP experiments for TFIID and Pol II, as well as H3Ac and H3K4me,
regardless of focused or dispersed initiation patterns [32]. Bajic et al. [28]
defined four promoter types based on the distribution of dinucleotides over the
promoter regions, CpG islands and TATA boxes. Moreover, Carninci et al. [26]
classified promoters into four groups based on CAGE analysis: single peak,
broad shape peak, bimodal/multimodal peak and broad with dominant peak.
These studies also challenge the “focused vs. dispersed” classification, as
some mouse and human promoters contain both CpG islands and TATA
boxes. A recent comprehensive review [10], which
compared genome-wide studies in humans and Drosophila, presented an
alternative classification into three major promoter types, termed Type I,
Type II and Type III. Type I promoters contain TATA boxes and focused TSSs,
lack CpG islands and are associated with tissue-specific expression in adult
tissues. Type II promoters contain CpG islands and dispersed TSSs. In
mammals, type II promoters lack TATA boxes, and in Drosophila they contain
DRE, Ohler 1 or Ohler 6 motifs. Genes belonging to this group are broadly
expressed throughout the organism's life. Type III promoters are associated
with developmentally regulated genes; in Drosophila they contain
combinations of Initiator and DPE motifs, and in mammals they contain large
CpG islands.
Taken together, the transcriptional initiation landscape is more complex than
the simple classification of two types of promoters.
1.3. Bidirectional and divergent transcription
Another manifestation of the complexity of transcription initiation is the
phenomenon of bidirectional transcription. Bidirectional transcription, which
consists of two closely spaced transcription initiation events (less than 1 kb
apart) that give rise to head-to-head Pol II transcripts in sense and antisense
orientations, was originally defined for adjacent, head-to-head oriented pairs of
protein-coding genes [35]. The relatively short region that contains the
oppositely oriented initiation sites and separates these genes is often
called a “bidirectional promoter” [36]. Experimental and computational studies
have characterized many features of bidirectional promoters. It has been
shown that 10%-22% of mammalian genes are organized in this manner
[37]. Moreover, bidirectionality was shown to be controlled in a cell-type-
specific manner, and these pairs of genes are coordinately regulated ([37] and
refs therein). Hence, bidirectional promoters might have evolved to facilitate
the regulation of transcription of different genes at the same time, and might
consist of two separate, yet dependent, core promoters. Additionally, a
computational analysis supports an evolutionary role for bidirectional
promoters in the emergence of novel species-specific transcripts [38].
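The operational definition above (head-to-head transcripts whose initiation sites lie less than 1 kb apart) can be written as a simple check over annotated gene pairs. In this sketch, the `Gene` record and the `bidirectional_pairs` function are hypothetical. Each gene is reduced to a single annotated TSS, and the 1 kb limit is taken from the text. Genome-wide studies would also need to handle alternative TSSs and overlapping genes.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # coordinate of the annotated transcription start site


def bidirectional_pairs(genes: List[Gene], max_distance: int = 1000) -> List[Tuple[str, str]]:
    """Head-to-head gene pairs on opposite strands whose TSSs lie < max_distance apart."""
    minus_genes = [g for g in genes if g.strand == "-"]
    pairs = []
    for plus_gene in (g for g in genes if g.strand == "+"):
        for minus_gene in minus_genes:
            if minus_gene.chrom != plus_gene.chrom:
                continue
            # A minus-strand gene is transcribed towards lower coordinates, so in a
            # head-to-head arrangement its TSS sits at or left of the plus-strand TSS.
            gap = plus_gene.tss - minus_gene.tss
            if 0 <= gap < max_distance:
                pairs.append((minus_gene.name, plus_gene.name))
    return pairs


genes = [
    Gene("geneA", "chr1", "-", 10_000),
    Gene("geneB", "chr1", "+", 10_400),  # 400 bp from geneA, head-to-head
    Gene("geneC", "chr1", "+", 50_000),
    Gene("geneD", "chr1", "-", 60_000),  # tail-to-tail with geneC, not counted
]
print(bidirectional_pairs(genes))  # [('geneA', 'geneB')]
```

Checking the sign of the gap, not just its size, is what separates head-to-head pairs from convergent (tail-to-tail) pairs that happen to lie close together.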
Bioinformatic analysis of the distribution of common core promoter elements
(BREu, TATA box, Inr and DPE) and CpG islands at bidirectional versus
unidirectional promoters demonstrated that while the BREu is enriched at
bidirectional promoters, the Inr and DPE elements are similarly detected at
both promoter types [39]. The TATA box is rare in general, but is enriched in
bidirectional promoters of histone genes. Moreover, CpG islands and Sp1
binding sites were shown to be common features of most bidirectional
promoters, more so than of unidirectional promoters [40]. Other studies
focused on overrepresented binding sites of different transcription factors
and, in some cases, on their influence on the expression of the two opposite
genes regulated by a bidirectional promoter [37, 41].
Interestingly, another manifestation of bidirectional transcription
involving non-coding RNAs (ncRNAs) was recently characterized. Multiple
classes of ncRNAs were identified in different organisms (reviewed in [42]).
One of these classes is promoter-associated ncRNAs. Over the years,
classes of promoter-associated non-coding transcripts were discovered in
bacteria, yeast, Drosophila, mouse, human and plants ([42-44] and refs
therein). Four studies, published back-to-back in 2008, described new classes
of promoter-associated ncRNAs in humans and mice [21, 45-48]. These
ncRNAs were generally divided into two classes, termed TSS-associated
RNAs (TSSa-RNAs) [47] and promoter upstream transcripts (PROMPTs) [46]
or upstream antisense RNAs (uaRNAs) [49], which share many features.
They are short, present at low abundance and are associated with CpG
islands and active-promoter-related histone marks (H3K4me3, H3ac), but not
with elongation-related histone marks (H3K36me3, H3K79me3).
Non-coding antisense RNAs derived from bidirectional promoters have
very short half-lives and are barely detectable. Two recent studies have
shown that an asymmetric distribution of polyadenylation signals and U1
snRNP-binding sites surrounding TSSs controls transcript stability [49-51].
Notably, bidirectional initiation is also a feature of enhancer RNAs (eRNA; see
section 7) [52, 53].
The Lis lab has demonstrated that nearly 80% of active genes have
bidirectional promoters, suggesting that bidirectional initiation is a general
feature of mammalian genomes [21, 54]. Hence, these divergent ncRNAs
may be regarded as markers for active promoters of protein-coding genes [21,
45-47, 55]. Duttke et al. recently analyzed transcription from human
promoters in HeLa cells and classified promoters into three types:
unidirectional promoters, divergent promoters (containing an annotated gene
in the forward direction and no annotated gene in the reverse direction) and
bidirectional promoters (containing annotated genes in both directions) [56].
Surprisingly, they discovered that about half of human active promoters are
intrinsically unidirectional. Moreover, the divergent transcripts result from their
own reverse-oriented core promoters. Using DNaseI accessibility, they
determined that unidirectional promoters are depleted at the edges of open
chromatin. The authors suggest that divergent transcription is not an inherent
property of the transcription process, but a consequence of the presence of
both forward- and reverse-directed promoters. This suggestion is in line with
the two occupancy peaks observed for both TBP and Pol II by the Lis lab
[54]. The Lis lab observed tight spacing (an estimated 110 bp) between the
forward- and reverse-directed promoters [54], whereas the Ohler and Kadonaga
labs observed variable, but larger, spacing between the two [56]. It
remains to be determined whether the difference between these findings
reflects the different cell lines used or the different analysis methodologies.
Despite the impressive discoveries related to bidirectional transcription
in the last few years (which highlight the complexity of gene expression), the
functional role of short non-coding antisense RNAs still remains elusive.
Hereafter, we refer only to the comprehensively studied focused and
dispersed core promoter types.
2. Core promoter elements: the combinatorial code of precise
transcription initiation
The Pol II core promoter is composed of short DNA sequences that are
referred to as core promoter elements or motifs. The majority of core promoter
motifs serve as binding sites for components of the basal transcription
machinery, in particular TFIID, which is composed of TATA box-binding
protein (TBP) and TBP-associated factors (TAFs), and TFIIB [4, 57, 58].
The basal transcription machinery recruits Pol II to the core promoter,
which directs the initiation of transcription [4, 6, 9, 59-61]. Nevertheless, there
are no universal core promoter elements, and diverse core promoter
compositions have been reported [6, 62]. In this section, we will briefly discuss
the majority of core promoter elements (schematically depicted in Fig. 1C and
D), which have been analyzed in Drosophila and mammals, with particular
emphasis on their variety and the relations between them.
2.1. The precisely-positioned core promoter elements are common in
the focused promoters
Early studies from the Chambon lab described the existence of a putative
element at the TSS [63]. The function of the initiator (Inr) as a transcriptional
element that encompasses the +1 TSS was articulated by Smale and
Baltimore [64]. The Inr is probably the most prevalent core promoter motif in
focused core promoters [65-67]. It is mainly bound by the TAF1 and TAF2
subunits of TFIID [68-70]. The mammalian Inr consensus sequence is
YYA+1NWYY (IUPAC nomenclature) [71], and the Drosophila consensus is
TCA+1KTY [70, 72]. Inr-like sequences were also identified in Saccharomyces
cerevisiae [73]. Computational analyses of promoters suggest that the Inr
consensus is only YR (-1, +1 positions) in humans [10, 26, 74] or TCA+1GTY
for Drosophila [65, 67]. The A nucleotide (or R in the YR consensus) is
generally designated as the +1 position, even when transcription does not
initiate at this specific nucleotide. This convention is critical, because
functional downstream elements are strictly dependent on the presence of an
Inr and on their precise spacing from it [6, 9, 12].
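Because downstream elements are defined relative to the Inr A+1, a downstream motif search can only be anchored once the +1 position is fixed. The sketch below finds Drosophila Inr matches (TCAKTY) and then tests the DPE consensus DSWYVY at +28 to +33, the position listed for the DPE in Table 1 above. Note that the numbering has no position 0. The sketch checks exact consensus matches only, and the function and constant names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC codes needed for the two consensus sequences used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "Y": "CT",
         "D": "AGT", "S": "CG", "W": "AT", "V": "ACG"}

INR = "TCAKTY"   # Drosophila Inr; the A (third base) is +1
DPE = "DSWYVY"   # DPE consensus, expected at +28 to +33
DPE_FIRST = 28


def to_regex(consensus: str) -> str:
    return "".join(f"[{IUPAC[code]}]" for code in consensus)


def inr_dpe_calls(sequence: str) -> List[Tuple[int, bool]]:
    """For each Inr match: (0-based index of A+1, whether a DPE sits at +28..+33)."""
    seq = sequence.upper()
    dpe = re.compile(to_regex(DPE))
    calls = []
    for match in re.finditer(f"(?={to_regex(INR)})", seq):
        plus_one = match.start() + 2  # T(-2) C(-1) A(+1)
        # There is no position 0, so position +k is at index plus_one + k - 1.
        start = plus_one + DPE_FIRST - 1
        window = seq[start:start + len(DPE)]
        calls.append((plus_one, dpe.fullmatch(window) is not None))
    return calls


with_dpe = "GGTCAGTT" + "G" * 23 + "AGACGT"   # AGACGT fits DSWYVY at +28
without_dpe = "GGTCAGTT" + "G" * 23 + "CCCCCC"
print(inr_dpe_calls(with_dpe))     # [(4, True)]
print(inr_dpe_calls(without_dpe))  # [(4, False)]
```

Shifting the DPE window by even one nucleotide changes the call. This mirrors the strict spacing dependence on the Inr described above.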
Notably, a strict version of the mammalian initiator (sINR), which is
present in 1.5% of human genes and enriched in TATA-less promoters of
specific functional categories, was defined as CCA+1TYTT, with conserved
sequences flanking the motif [75]. The sINR motif functions in cooperation
with Sp1 and can replace the conventional Inr, but not vice versa. Like the
canonical Inr element, sINR is bound by TAF1, and its function depends on
TAF1 [75]. The YY1 transcription factor binds sINR, but this binding is dispensable
for sINR function [75].
In addition to these versions of the Inr, a few other core promoter elements
that encompass the transcription start site have been identified. The polypyrimidine
initiator motif (TCT), which was originally identified in mouse, is conserved
from Drosophila to humans [13, 76-78]. The TCT has a consensus sequence
of YYC+1TTTYY in Drosophila and YC+1TYTYY in humans, in which C is the
+1 TSS. Although the Inr consensus resembles the TCT consensus, the TCT
motif cannot substitute for an Inr to initiate transcription [13]. The TCT
overlaps with a motif that was previously identified in humans, termed the
5'-terminal oligopyrimidine tract (5'-TOP) (reviewed in [79]), which is functionally
distinct from it [13]. Both the TCT and the 5'-TOP elements are enriched in, and
functional for, the transcription of ribosomal protein genes and of genes encoding
proteins involved in the regulation of translation [13, 76].
Two additional core promoter motifs that are located around TSSs
were originally identified in the hepatitis B virus X gene promoter, which
contains two TSSs. The X gene core promoter element 1 (XCPE1) drives Pol
II transcription from the first TSS of the X gene promoter as well as from other
human promoters, when accompanied by co-activator sites. XCPE1 is found
in ~1% of the human genes (particularly TATA-less genes) and its consensus
sequence DSGYGGRASM spans positions -8 to +2 relative to the TSS [80].
Unlike XCPE1, the X gene core promoter element 2 (XCPE2) is sufficient to
drive Pol II transcription by itself. The XCPE2 directs transcription from the
second TSS of the X gene mRNA, but it also drives transcription from
additional human promoters, in a TAF-free manner. Its consensus sequence
VCYCRTTRCMY spans positions -9 to +2 relative to the TSS [81].
There are core promoter elements that are located upstream of the
TSS. The TATA box motif is the first core promoter motif to be identified [82].
Although the TATA box was previously considered to be a universal element,
it is presently estimated that only 8%-30% of metazoan core promoters [26,
32, 59, 67, 83] and 20%-46% of yeast promoters [61, 84, 85] are
TATA-dependent. The TATA box motif is also present in plants [86, 87];
however, the majority of Arabidopsis promoters are TATA-less [88]. The TATA box is bound
by the TBP subunit of TFIID ([5, 6, 62] and refs therein). Both the TATA box
element and the TBP are conserved from archaebacteria to humans [9, 89].
The consensus sequence of the TATA box is TATAWAAR, where the 5' T is
usually located at -30 or -31 relative to the TSS in metazoans (or at -120 to
-40 in yeast). A wide range of sequences can functionally replace the yeast
TATA box for in vivo transcriptional activity [90].
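The fixed upstream location of the TATA box can be checked in the same spirit. Promoter coordinates have no position 0 (-1 is the nucleotide immediately upstream of +1), so in a 0-based Python string an upstream position p maps to index `tss_index + p`. The following minimal sketch, with hypothetical names, tests only the metazoan -30/-31 placement of the 5' T described above and ignores the broader -120 to -40 window of yeast.

```python
# Minimal sketch (hypothetical helpers): is a TATAWAAR match anchored at -30 or -31?
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(site: str, consensus: str) -> bool:
    """True if every base of `site` is allowed by the IUPAC `consensus`."""
    return len(site) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), consensus)
    )


def tata_at_metazoan_position(seq: str, tss_index: int, consensus: str = "TATAWAAR",
                              starts: tuple[int, ...] = (-30, -31)) -> bool:
    """True if the consensus begins (5' T) at one of the upstream positions in `starts`.

    Promoter coordinates skip 0, so upstream position p (< 0) is index tss_index + p.
    """
    for position in starts:
        i = tss_index + position
        if i >= 0 and matches(seq[i:i + len(consensus)], consensus):
            return True
    return False


# 5' T at index 0 and TSS at index 30, i.e. the T sits at position -30.
promoter = "TATAAAAG" + "C" * 22 + "A" + "C" * 10
print(tata_at_metazoan_position(promoter, tss_index=30))
print(tata_at_metazoan_position(promoter, tss_index=35))
```

With the TSS at index 30 the check succeeds; moving the assumed TSS to index 35 places the TATA-like sequence at -35, outside the tested positions, so the check fails.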
The TFIIB recognition elements (BREs), which are bound by the basal
transcription factor TFIIB, are located immediately upstream (BREu) or
downstream (BREd) of the TATA box [91-93]. TFIIB contacts these two elements by two
independent DNA-recognition motifs within its core domain [92]. The
consensus of the upstream BRE (BREu) is SSRCGCC [93], and the
consensus of the downstream BRE (BREd) is RTDKKKK [91]. The TFIIB and
the BRE elements are conserved from archaebacteria to humans [6, 92]. Both
BREu and BREd act in conjunction with the TATA box [6, 9]. A bioinformatics
analysis using the EPD database showed that 25% of the eukaryotic core
promoters contain a potential BREu [83]. Surprisingly, this study revealed that
the BREu is more prevalent in TATA-less promoters (28.1%) than in
TATA-containing promoters (11.8%). Both elements exert positive as well as
negative effects on basal transcription and on activated transcription in a
manner that is context-dependent [91, 93-95].
In addition to the abovementioned upstream elements there are core
promoter elements that are located downstream of the TSS. The downstream
core promoter element (DPE), which was discovered as a TFIID recognition
site that is downstream of the Inr, is precisely located at +28 to +33 relative to
the A+1 of the Inr, with a functional range set of DSWYVY [96-98]. In addition
to this functional range set, the guanine at +24 was shown to contribute to
DPE function [98]. The DPE is prevalent in developmental gene networks [10,
14, 95, 99]. Importantly, a recent study provides in vivo evidence that
expression driven by the homeotic Antennapedia P2 promoter during
Drosophila embryogenesis is dependent on the DPE [99]. The motif ten
element (MTE) was identified as an overrepresented core promoter
sequence, which is located immediately upstream of the DPE, encompassing
positions +18 to +29 relative to the A+1 of the Inr [67]. As positions +28 to +29
overlap the DPE, the MTE consensus sequence was defined for positions +18
to +27 (CSARCSSAAC) [100]. Although the majority of the MTE-containing
promoters contain a DPE, the MTE motif functions independently of the DPE
[100, 101]. Both the MTE and DPE serve as recognition sites for TFIID and
appear to be in close proximity to TAF6 and TAF9 [97, 101]. Using
single-nucleotide substitution analysis, the MTE and DPE together were found to
consist of three functional sub-regions: positions 18-22, 27-29 and 30-33
downstream of the A+1 of the Inr. The bridge configuration, which includes the
first and the third functional sub-regions (bridge I, positions 18-22 with favored
nucleotides CSARC; bridge II, positions 30-33 with favored nucleotides
WYVY), was shown to be a naturally rare but functional core promoter
element [101]. Both the MTE and DPE are conserved from Drosophila to
humans [6, 96, 97, 100-104]. The MTE, DPE and Bridge motifs are
exclusively dependent on the presence of a functional Inr, and are enriched in
TATA-less promoters. However, co-occurrence of putative TATA, Inr and DPE
motifs was observed in a small fraction of Drosophila genes [14, 83].
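Because the MTE, DPE and bridge are defined by their distance from the A+1 of the Inr, the spacing requirement can be expressed as a lookup at fixed offsets. The Python sketch below is a hypothetical illustration, not the analysis used in this work: it converts promoter coordinates (which skip 0) into string indices and then tests the MTE (CSARCSSAAC, +18 to +27), DPE (DSWYVY, +28 to +33) and bridge (CSARC at +18 to +22 plus WYVY at +30 to +33) consensus sequences. It ignores the contribution of the G at +24 and any graded preferences within each consensus.

```python
# Minimal sketch (hypothetical helpers): downstream elements at fixed spacing from the Inr A+1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(site: str, consensus: str) -> bool:
    """True if every base of `site` is allowed by the IUPAC `consensus`."""
    return len(site) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), consensus)
    )


def index_of(plus1_index: int, position: int) -> int:
    """Convert a promoter coordinate (no position 0) into a 0-based string index."""
    if position == 0:
        raise ValueError("promoter coordinates have no position 0")
    return plus1_index + position - 1 if position > 0 else plus1_index + position


def site_at(seq: str, plus1_index: int, start: int, consensus: str) -> bool:
    """True if `consensus` matches `seq` beginning at promoter position `start`."""
    i = index_of(plus1_index, start)
    return i >= 0 and matches(seq[i:i + len(consensus)], consensus)


def downstream_elements(seq: str, plus1_index: int) -> dict[str, bool]:
    """Check MTE, DPE and bridge, all measured from the Inr A+1 at `plus1_index`."""
    return {
        "MTE": site_at(seq, plus1_index, 18, "CSARCSSAAC"),
        "DPE": site_at(seq, plus1_index, 28, "DSWYVY"),
        "bridge": site_at(seq, plus1_index, 18, "CSARC")
        and site_at(seq, plus1_index, 30, "WYVY"),
    }


# A+1 at index 0, an MTE-like sequence at +18..+27 and a DPE-like sequence at +28..+33.
promoter = "A" + "T" * 16 + "CGAGCGGAAC" + "GGACGT"
print(downstream_elements(promoter, 0))
```

Prepending a single nucleotide to the same sequence, which shifts the downstream sequence by one position relative to the Inr, makes all three checks return False; this mirrors the strict spacing dependence of these elements.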
An additional downstream element was identified and characterized in
the human adult β-globin promoter. This element, termed downstream core
element (DCE), was detected by scanning mutagenesis of the +10 to +45
region of the promoter. The DCE is composed of three sub-elements, located at
positions +6 to +11 (necessary motif CTTC), +16 to +21 (necessary motif
CTGT), and +30 to +34 (necessary motif AGC) relative to the TSS. The DCE
is distinct from the MTE, DPE and Bridge downstream elements, as the DCE
is recognized and bound by TAF1 [105] and not by TAF6 or TAF9 [97, 101].
Moreover, unlike the DPE, the DCE is frequently found in TATA box-containing
promoters [105, 106].
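The three DCE sub-elements are defined relative to the TSS rather than to an Inr A+1. A minimal Python sketch is shown below; it assumes, as a simplification, that each sub-element is present when its necessary core motif occurs anywhere within the stated window, and the function and dictionary names are hypothetical.

```python
# Minimal sketch (hypothetical helper): check the three DCE sub-element windows,
# measured from the TSS (+1) at `tss_index`. Window bounds are inclusive.
DCE_WINDOWS = {
    "+6..+11": (6, 11, "CTTC"),
    "+16..+21": (16, 21, "CTGT"),
    "+30..+34": (30, 34, "AGC"),
}


def dce_subelements(seq: str, tss_index: int) -> dict[str, bool]:
    """Report whether each necessary core motif lies inside its window."""
    result = {}
    for name, (first, last, core) in DCE_WINDOWS.items():
        # Downstream position p (> 0) sits at 0-based index tss_index + p - 1.
        window = seq[tss_index + first - 1 : tss_index + last]
        result[name] = core in window
    return result


# TSS at index 0; each core motif is placed inside its window.
dce_like = "GGGGG" + "ACTTCA" + "GGGG" + "ACTGTA" + "GGGGGGGG" + "AAGCA"
print(dce_subelements(dce_like, 0))
```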
2.2. Core promoter elements with weak positional biases in dispersed promoters
Even though the vast majority of core promoter elements are precisely located
in focused promoters, there are still a few variably located motifs that were
also identified in dispersed promoters. These variably located elements, like
some of the precisely located elements discussed above, are associated with
specific gene groups.
As mentioned, sequence motifs such as the DNA replication-related
element (DRE) and Ohler motifs 1, 6 and 7 were detected by a
computational analysis as commonly present in dispersed promoters of
Drosophila genes with maternally inherited transcripts [27]. The consensus
sequences of the DRE, Ohler 1, 6 and 7 motifs are WATCGATW,
YGGTCACACTR, KTYRGTATWTTT and KNNCAKCNCTRNY, respectively
[67]. The DRE is a target of the DNA replication-related element binding factor
(DREF). DREF, which was discovered in Drosophila and was later found to
have orthologues in many other species (including humans), is involved in
transcriptional regulation of proliferation-related genes [107]. Motif 1 binding
protein (M1BP) was recently identified, and genes enriched for Motif 1 and
M1BP have been implicated in cytoskeletal organization, the mitotic cell cycle
and metabolism [108].
2.3. The interplay between core promoter elements
Given that there are no universal core promoter elements and that
core promoter elements are an important feature of the regulation of gene
expression, many studies have examined combinations of core promoter
elements, such as the Inr, TATA box, BREu, BREd, MTE and DPE, and their
effects on transcriptional output. For example, the BRE elements were
originally characterized as functional elements in conjunction with the TATA box.
In this context, both the BREu and the BREd either increase or decrease the
levels of basal transcription [91, 93, 94, 109]. Notably, the addition of a BREu
element to a core promoter of a Caudal target gene has a differential effect on
transcription in a TATA box- or DPE- context [95]. The TATA box and the Inr
cooperate, in certain cases, as synergistic elements [110]. An antagonistic
relationship was demonstrated between TBP, which activates TATA-dependent
transcription and inhibits DPE-dependent transcription, and NC2 and Mot1, which
activate DPE-dependent transcription by inhibiting the function of TBP [111].
The functionality of the DPE, MTE and Bridge elements is, by
definition, dependent on their precise location relative to the Inr [96, 97, 100,
101]. Synergy was observed between the MTE and DPE, as well as between
the MTE and TATA box [100]. Based on these relationships, a synthetic core
promoter, termed super core promoter (SCP), containing a TATA box, Inr,
MTE and DPE was designed. Remarkably, the SCP is stronger than any of
the natural core promoters examined [112].
Collectively, these findings indicate that the levels of gene expression
can be modulated by the core promoter composition. Such modulation is
achieved directly through the effect of specific combinations of core promoter
elements on the architecture of the basal transcription machinery, which provides an
additional level of transcriptional regulation. The core promoter may have
diversified during evolution so that its elements can act together in different
ways, depending on the context and organism. Hence, simple categorization of
core promoters may overlook the complexity of gene expression.
3. Functional and structural insights regarding the role of the core promoter in the assembly of the Pol II transcription machinery
In this section, we describe the assembly of the basal transcription
machinery components (primarily based on the analysis of TATA-dependent
promoters) and their distinct roles in specific cellular contexts.
3.1. Terminology change: from “general” to “basal” transcription machinery
Classic biochemical studies performed over 30 years ago using the TATA
box-containing adenovirus major late promoter identified the general
transcription factors (GTFs) as accessory factors for accurate Pol II
transcription initiation [113, 114]. The GTFs were named TFIIA, TFIIB, TFIID,
TFIIE, TFIIF and TFIIH, based on the protein fractions in which they were purified
(reviewed in [4]). These components, together with Pol II, were necessary
and sufficient for basal transcription of the adenovirus major late promoter.
They assemble into the preinitiation complex (PIC) by protein-protein
interactions and by mediating core promoter recognition (Fig. 2B).
In the past, it was generally accepted that the PIC composition of GTFs
does not vary between promoters with different core promoter architecture,
and the PIC is nucleated by the binding of the TBP subunit of TFIID, which
binds the TATA box [115] (reviewed in [4, 30]). Traditionally, this simple model
has been considered “general”. However, due to the diversity in core promoter
composition and the realization that the known GTFs are insufficient to
transcribe DPE-containing promoters [116], it has been suggested that the GTFs do
not function in a “general” manner and that PICs of different compositions exist.
Indeed, the non-ubiquitous expression pattern of certain TAFs implies that they
cannot be PIC components in every cell type [57]. Moreover, many studies
have demonstrated variability in PIC formation, particularly through the molecular
flexibility of TFIID composition. Hence, GTFs should be referred to as “basal”
rather than “general” transcription factors (also discussed in [57, 117-119]).
3.2. Compatibility between PIC components, related factors and core promoter elements
Undoubtedly, the diverse assembly of the basal transcription factors, as well
as the diversity of core promoter elements, is a complex subject, both
structurally and functionally. Owing to this diversity, the PIC,
which is pivotal for core promoter recognition ([57, 117, 120] and refs therein),
can assemble at core promoters with varying compositions and regulate Pol II
transcription in different cells and organisms. In agreement with this,
requirements for a “match” between the PIC and the core promoter have been
observed in recent years.
This compatibility has mainly been reflected in studies addressing the
flexibility and modularity of TFIID subunits and the entire TFIID complex. Early
footprinting assays detected differential TFIID protection patterns with respect
to the presence of a TATA box and BRE in mammalian promoters [121, 122],
and a DPE in Drosophila [97]. These studies and others [123] have
demonstrated the important roles of TAFs in the assembly of the PIC, and
hence, in the transcription process. As mentioned earlier, sub-modules of
TFIID bind specific core promoter elements, e.g. TBP binds the TATA box,
TAF1/TAF2 bind the Inr, TAF1 binds the DCE and TAF6/9 bind the DPE and
the MTE (Fig. 2C) [68-70, 96, 97, 100, 103, 105]. It is noteworthy that
TAF4/TAF12 and TAF4b/TAF12 sub-complexes can also bind core promoters
[103], and are necessary for transcription of a sub-group of genes, which are
mostly associated with TATA box and Inr motifs [124]. Interestingly, TAF1
contains two distinct enzymatic activities, an acetyltransferase activity and a kinase
activity, which are important for regulating distinct, non-overlapping gene
sets in vivo [125], suggesting that different functional modules of the PIC
contribute to transcription of different target genes.
While TBP and TAF1 were initially considered the nucleating subunits
of holo-TFIID assembly [126], Wright et al. [127] discovered that Drosophila
TAF4 preferentially nucleates TFIID in TATA-less, DPE-containing promoters.
This study also uncovered a stable core sub-complex, composed of TAF5 and
the histone fold domain (HFD)-containing TAF4, TAF6, TAF9 and TAF12.
This core sub-complex is associated with the peripheral subunits TAF1, TAF2,
TAF11 and TBP. These core TAFs are incorporated into TFIID in two copies,
and are organized in five heterodimer pairs with other HFD-containing TAFs
(TAF3-TAF10, TAF6-TAF9, TAF4-TAF12, TAF8-TAF10 and TAF11-TAF13)
([120] and refs therein). Recent structural analysis of human TFIID
demonstrated that these core TAFs exhibit two-fold symmetry [128].
Interestingly, incorporation of the TAF8-TAF10 pair breaks the symmetry and
allows the entry of the single copy TAFs and TBP into the structure, resulting
in an asymmetric holo-TFIID that can nucleate the PIC.
Several TBP-free complexes have been characterized [123, 129, 130]. One of
them, the TBP-free TAF-containing complex (TFTC), is capable of replacing the
canonical TFIID at both TATA-less and TATA-containing promoters in vitro [123]. The
assembly of TAF-less TBP-containing complexes (such as TBP-TFIIA-containing
complexes) at specific core promoters, which was somewhat surprising, has also
been observed [131-133]. A TAF-free TBP-containing PIC is important for
transcription from the HIV-1 LTR promoter [132]. Interestingly, a distinctive TBP-TAF
complex, lacking TAF1, TAF4 and TAF10, is involved in transcription of the U2
snRNA gene [134].
These findings add to a growing body of evidence implying that distinct
core promoters are differentially recognized by PICs that either contain TBP or
are devoid of it. Notably, TBP activates TATA-dependent transcription and
represses DPE-dependent transcription, whereas Mot1 and NC2 block TBP
function and thus repress TATA-dependent transcription and activate
DPE-dependent transcription [111, 135]. Interestingly, Deng et al. [136]
demonstrated that NC2 acts positively at promoters that lack functional BREs,
while TFIIA recruitment, which is dependent on the presence of BREs,
reduces transcriptional activity. The association of BRE elements with TATA
boxes further supports these findings [83, 93]. Interestingly, the architectural
DNA-binding protein HMGA1 has been shown to interact with the Mediator
and activate transcription of mammalian promoters containing both a TATA
box and an Inr [137].
Remarkably, the Nogales lab used electron microscopy to visualize
human TFIID with promoter DNA, and discovered that TFIID exists in two
structurally distinct conformations (termed canonical and rearranged) [138].
The transition between the two states is modulated by TFIIA, and the
presence of TFIIA and promoter DNA facilitates the formation of the
rearranged conformation [138]. Human TFIID is composed of three main
structural lobes (termed lobe A, B and C) [138, 139]. Using the super core
promoter DNA [112], lobe C was shown to interact with downstream elements
(DPE and MTE), while lobe A interacts with the Inr and TATA box.
Three TBP-related factors (TRF1, TRF2 and TRF3) have been
discovered in the animal kingdom based on their homology to the C-terminal
core domain of TBP, which is essential for interaction with the TATA box
(reviewed in [117-119, 140-142]). Unlike TRF1 and TRF3 (the latter also termed
TBP2 or TBPL2), TRF2 (also termed TLP, TLF, TRP and TBPL1) is unable to
recognize the TATA box, as the TATA-interacting Phe residues of TBP are
not conserved in TRF2 [143-145]. Drosophila TRF2 selectively regulates the
TATA-less histone H1 promoter, whereas TBP regulates the TATA-containing
core histone genes [133, 146]. The Kadonaga lab has recently discovered
that TRF2, and not TBP, regulates transcription of ribosomal protein genes
that lack a TATA box and contain functional TCT motifs [147]. Kedmi et al.
[148] discovered that TRF2 preferentially functions as a core promoter
regulator of DPE-containing promoters. These findings and others have
highlighted the involvement of TRF2 in the regulation of diverse biological
processes driven by distinct core promoter compositions (reviewed in [119]).
Taken together, promoter recognition by multiple TAFs, TRFs, and TBP-free or
TBP-containing complexes underscores a key regulatory role for core
promoters in transcription initiation, and may provide an explanation for
evolutionary changes affecting the PIC-promoter interface [149].
3.3. Different basal transcription factors promote distinct biological processes
The diversity in the components of the PIC, especially in TFIID subunits,
establishes distinct protein complexes that drive transcription of specific sets
of genes (e.g. with cell type- or tissue-specific functions) (reviewed in [150]).
The Wassarman lab has shown that Drosophila TAF1 affects multiple
developmental events in vivo [151], and that Drosophila TAF6 is broadly
required for cell growth and cell fate specification [152]. Moreover, Drosophila
TAF4 and TAF6 were shown to be required for transcription of the snail and
twist Dorsal-target genes in vivo [153]. Human TAF8 was implicated in
differentiation of cultured 3T3-L1 preadipocytes to adipocytes [154].
Interestingly, the Drosophila TAF10 homologues TAF10 and TAF10b are
differentially expressed during Drosophila embryogenesis [155]. Mouse TAF10
was later shown to be required in the inner cell mass, but not in the
trophoblast, during early mouse embryogenesis [156]. Remarkably, conditional
knockout of mouse TAF10 in embryonic and adult liver resulted in the
dissociation of TFIID into individual components [157]. Based on these
findings, it was suggested that TFIID is not required for the maintenance of
ongoing transcription of hepatic genes but is rather involved in the mechanism
of postnatal silencing of hepatic genes [157]. Additional studies reveal an
important role for distinct TFIID complexes in regulating pluripotency of
embryonic stem cells [158, 159].
Multiple TAF paralogues have been implicated in different biological
processes. A retroposed homologue of human TAF1 (TAF1L) and TAF7L are
expressed during male germ cell differentiation [160, 161]. As in
humans, TAF7L in mice is required for spermatogenesis in cooperation with
TRF2 [161-163]. TAF7L was recently demonstrated to be an important
regulator of white- as well as brown- adipose tissue differentiation [164, 165].
TAF4b was originally identified as a cell-type-specific TAF in a human B
lymphocyte cell line [166]. Using knockout mice, TAF4b was shown to be
important for ovarian development and spermatogenesis [167-170].
Remarkably, mouse TAF9L was recently shown to regulate neuronal gene
expression in vivo [171]. Interestingly, tissue-specific TAF homologues of
Drosophila TAF4 (no hitter), TAF5 (cannonball), TAF6 (meiosis 1 arrest),
TAF8 (spermatocyte arrest) and TAF12 (ryan express) collaborate to control a
testis-specific transcriptional program [172].
TBP paralogues are involved in distinct biological processes, such as
embryonic development, differentiation and morphogenesis (reviewed in [117,
119, 141, 173]). TRF2 regulates a subset of genes that differ from
TBP-regulated genes. TRF2 is essential for embryonic development of C. elegans,
Drosophila, zebrafish and Xenopus [117, 119, 141, 173]. It is highly
conserved in evolution and is present in all bilaterian organisms [143]. Since
bilaterian organisms contain three germ layers (endoderm, mesoderm and
ectoderm) and more ancient animals only contain two germ layers (endoderm
and ectoderm), it is tempting to speculate that TRF2 may be important for
mesoderm formation. This suggestion is further supported by the fact that the
DPE motif is prevalent among Drosophila genes that are involved in
embryonic development [14, 95]. Mouse TRF2, unlike C. elegans, Drosophila,
zebrafish and Xenopus TRF2, is not required for embryonic development but
is essential for spermiogenesis [174, 175]. A separate study demonstrated
that the cleavage of the TFIIA-αβ precursor (into the α and β subunits of TFIIA) is
necessary for activation of spermiogenic TRF2 target genes [176]. Drosophila
trf2 is also required for the response to the steroid hormone ecdysone during
Drosophila metamorphosis [177]. Hence, TRF2 drives multiple transcriptional
programs [119].
Zebrafish TRF3 is important for initiation of hematopoiesis during
embryonic development [178, 179]; however, both zebrafish and Xenopus
TRF3 are mainly expressed in oocytes and are essential for embryogenesis
[180, 181]. Mouse TRF3, which is exclusively expressed in oocytes, is
essential for the differentiation of female germ cells but not for embryonic
development [182].
These fascinating findings underscore the motivation to investigate the
regulation of gene expression at the core promoter level. It is possible that there are
core promoter motifs that have not yet been discovered, which might be bound by
other PIC components. Thus, the analysis of novel core promoter elements in
multiple organisms is likely to shed light on mechanistic aspects of transcriptional
regulation.
4. Enhancer-promoter connectivity
Zooming out from the basal transcription resolution uncovers another facet of
regulation of gene expression, namely, enhancer-promoter interactions that
regulate the activation of specific genes in a precise spatio-temporal manner.
Enhancers contain DNA binding sites for sequence-specific transcription
factors that, in turn, recruit co-activators and co-repressors and determine the
overall activity of the enhancers (reviewed in [183-190]). Enhancers were originally
sought as cis-regulatory elements that stimulate transcription
from the nearest promoter, irrespective of their orientation. Enhancer-promoter
pairs commonly communicate through enhancer looping, which
physically brings these regulatory elements into proximity through the recruitment
of multiple proteins (activators, co-activators, Mediator, cohesin and the PIC).
Studies in recent years, employing genome-wide methodologies such as
chromosome conformation capture (3C), its derivatives (4C, 5C, Hi-C) and
ChIA-PET, have led to the discovery of both intrachromosomal and
interchromosomal physical contacts with promoters. While multiple
enhancers can interact with multiple promoters, specificity has been
observed. The mechanisms that determine enhancer–promoter specificity are
still poorly understood, but they are thought to include biochemical
compatibility, constraints imposed by the three-dimensional architecture of
chromosomes, insulator elements, and effects of local chromatin environment
[190].
In the last twenty years, the compatibility of enhancer-promoter
interactions has mostly been studied in Drosophila. One of the early studies
analyzing the compatibility between enhancer-promoter pairs examined the
expression of the neighboring gooseberry (gsb) and gooseberry neuro (gsbn)
genes [191]. Swapping experiments revealed that although both enhancers
(GsbE and GsbnE) are located between the two TSSs of the two genes (and
thus cross-activation could potentially occur), the GsbE could only activate the
gsb promoter, while the GsbnE could only activate the gsbn promoter.
Another study showed compatibility between the decapentaplegic (dpp)
promoter and its enhancer, which only activates the dpp gene, but not other
genes that are located closer to it [192]. Erythroid-specific long-range
interactions have been observed in vivo between the active murine β-globin
gene and the locus control region (LCR) [193]. These long-range interactions
of the β-globin gene were not observed in non-expressing brain cells.
High-throughput imaging of thousands of transparent transgenic zebrafish embryos
(which were injected with about two hundred combinations of enhancer-core
promoter pairs driving the expression of the GFP reporter gene),
demonstrated the specificity of individual enhancer-promoter interactions and
underscored the importance of the core promoter sequence in these
interactions [194]. Taken together, these results demonstrate distinct
compatibilities of enhancers to their cognate promoters and the importance of
the core promoters in the regulation of enhancer-promoter interactions.
While a few studies in Drosophila demonstrated the involvement of
proximal-promoter elements in enhancer specificity [195, 196], there are
multiple examples of enhancer-promoter communications that are affected by
specific core promoter elements. Promoter competition experiments revealed
that both the AE1 enhancer from the Drosophila Antennapedia gene complex
and the IAB5 enhancer from the Bithorax gene complex preferentially activate
TATA-containing promoters when challenged with linked TATA-less
promoters [197]. Nevertheless, both enhancers were able to activate
transcription from a TATA-less promoter in reporters that lacked a linked
TATA-containing promoter [197]. Enhancer-promoter specificity was first
demonstrated in transgenic Drosophila sister lines that contain a DPE- or a
TATA-dependent reporter gene at precisely the same genomic position
relative to the enhancer [198]. Remarkably, this study identified enhancers
that can discriminate between core promoters that are dependent on a TATA
or a DPE motif. Furthermore, Caudal, a sequence-specific transcription factor and a
key regulator of the Drosophila HOX gene network, activates transcription
with a preference for a DPE motif relative to the TATA-box [95]. More
recently, Zehavi et al. [14] analyzed the Drosophila dorsal-ventral
developmental gene network that is regulated by the sequence-specific
transcription factor Dorsal, and discovered that the majority of Dorsal target
genes contain DPE sequence motifs. The DPE motif is functional in multiple
Dorsal target genes, as mutation of the DPE leads to a loss of transcriptional
activity. Moreover, the analysis of hybrid enhancer-promoter constructs of
Dorsal targets reveals that the core promoter plays a pivotal role in the
transcriptional output [99].
High-throughput analyses of enhancers in diverse biological systems
have led to a wealth of information with regard to long-range
enhancer-promoter interactions and three-dimensional chromatin landscapes. We
highlight several remarkable findings below. First, most of the
enhancer-promoter interaction loops of regulated genes are distal and do not
target the nearest promoter, as was originally assumed [199-201]. Second, enhancer
looping enables cooperative regulation of genes of the same biological
process by organizing them in physical proximity [199, 201]. This may indicate
a similar core promoter composition among these gene networks or gene
clusters (as previously described for the Hox and dorsal-ventral
developmental gene regulatory networks [14, 95]).
A recently developed genome-wide screen termed STARR-seq
(self-transcribing active regulatory region sequencing) identified thousands of
enhancers that could activate transcription of a synthetic promoter containing
four core promoter elements in a single promoter - the TATA-box, Inr, MTE
and DPE motifs [202]. Notably, enhancers near ribosomal protein genes were
under-represented among the enhancers identified in this study, which could
be due to the fact that the majority of ribosomal protein gene promoters are
regulated via the TCT core promoter element [13, 190, 202].
Remarkably, both the Furlong lab, analyzing enhancer three-dimensional
contacts during Drosophila embryogenesis, and the Ren lab,
analyzing long-range chromatin interactions in human cells, discovered that
the majority of enhancer interactions remain unchanged during marked
developmental transitions or upon gene induction, respectively
[199, 203]. These “on-hold” enhancer-promoter connections may prepare
the cell for rapid activation of transcription. The Furlong lab discovered that
the pre-existing loops are associated with paused Pol II and proposed a
model where through transcription factor–enhancer occupancy, an enhancer
loops towards the promoter and polymerase is recruited, but paused in the
majority of cases (Pol II pausing is discussed below). They suggest that the
subsequent recruitment of transcription factor(s) or additional enhancers at
preformed enhancer-promoter interaction hubs could trigger activation by
releasing Pol II pausing [203]. Notably, enhancer–promoter interactions
analyzed in these studies involve active promoters, with high enrichment for
H3K27ac and H3K4me3, and active enhancers, defined by H3K27ac, Pol II
and H3K79me3, indicating similarities in 3D regulatory principles from flies to
humans [199, 200, 203].
Strikingly, the Stark lab has recently demonstrated that distinct sets of
enhancers activate transcription with core promoter specificity using two types
of Drosophila cultured cells [204]. They used the core promoter of a ribosomal
protein gene driven by the TCT motif, as a representative of housekeeping
promoters, and a synthetic promoter (derived from the even skipped
promoter), which contains four core promoter elements in a single promoter -
the TATA-box, Inr, MTE and DPE motifs, as a representative of
developmental promoters. Thousands of enhancers exhibit a marked
specificity to one of the two core promoters - the housekeeping promoter or
the developmental promoter. Interestingly, TSSs next to housekeeping
enhancers were enriched in Ohler motifs 1, 5, 6 and 7 (consistent with the
ubiquitous expression and housekeeping functions of these genes), whereas
TSSs next to developmental enhancers were enriched in TATA box, Inr, MTE
and DPE motifs (which are associated with cell-type-specific gene
expression).
Taken together, these observations strengthen the concept that the
core promoter composition is not only a pivotal component in basal
transcription and initiation, but also an active regulator of transcription that is
instrumental for activating developmental and housekeeping gene regulatory
programs via sequence-encoded enhancer-promoter specificity.
5. Transcription initiation, Pol II recycling and steps in between: the crosstalk between the core promoter and other modules in the transcription cycle
Apart from transcription initiation, the Pol II-driven transcription cycle includes
additional steps: elongation and termination. These steps contain at least
eight transition points at which transcription is regulated by multiple dedicated
factors, and each can be rate limiting (reviewed in [205, 206]). Moreover,
maturation of mRNA precursors occurs co-transcriptionally [207]. Below, we
briefly describe these highly regulated steps with a focus on the direct or
indirect role of the core promoter.
5.1. Timing and synchrony - Pol II pausing and productive elongation
Early elongation, which follows proper transcription initiation and precedes
productive elongation, comprises two sequential steps: promoter escape and
promoter-proximal pausing of Pol II. Pol II pausing is a highly regulated step,
which is characterized by accumulation of Pol II, typically at 20-60 nucleotides
downstream of the TSS (reviewed in [206, 208, 209]). The transition from
initiation to early elongation is regulated by multiple factors and
phosphorylation events of the heptad repeats within the C-terminal domain
(CTD) of the largest subunit of Pol II. The CTD is mostly unphosphorylated
when Pol II is recruited to the promoter. Serine 5 (Ser5) of the CTD is then
phosphorylated by TFIIH, which causes destabilization of the interaction
between Pol II and other PIC components and thus, permits promoter escape
and early elongation. Following Ser5 phosphorylation, association of the DRB
sensitivity-inducing factor (DSIF) and negative elongation factor (NELF)
complexes with the phosphorylated Pol II leads to pausing in the
promoter-proximal region [210]. Next, the positive transcription elongation
factor b (P-TEFb) complex phosphorylates Ser2 of the Ser5-phosphorylated
CTD, as well as DSIF and NELF. These post-translational modifications result
in productive elongation (reviewed in [206, 208, 209]).
Pol II pausing was originally identified in Drosophila heat-shock and
human c-myc genes [211-214]. Although Pol II pausing was originally
considered to be restricted to a few specific genes, it now appears to be a
common step in the transcription of many genes, from C. elegans [215] to
humans, and is generally prevalent in metazoans [21, 216-220]. Specifically,
multiple genome-wide assays and in vitro and in vivo studies, mostly in
Drosophila, showed that Pol II pausing has a role in regulating metazoan
developmental control genes and genes that respond to
environmental stimuli ([221] and refs therein, [215]). Thus, Pol II pausing
contributes to developmental dynamics, along with designated transcription
initiation programs [222, 223]. It was previously argued that Pol II pausing
primes genes for rapid and synchronous induction. Recent studies,
however, suggest that paused Pol II is not absolutely required for rapid gene
induction, as genes in which Pol II is not paused can be induced as quickly
as paused genes, and even to higher levels ([209, 221] and refs
therein). Promoters regulated by pausing possess a distinct chromatin
architecture that may facilitate the plasticity of gene expression in response to
signaling events [209]. Notably, paused Pol II complexes were recently shown
to be more stable than originally considered, and thus, pausing may serve as
a time-window to integrate regulatory signals [224]. There are two known
sequence-specific transcription factors that regulate pausing: the GAGA factor
(GAF) [211, 212, 218, 225] and the more recently identified M1BP factor
[108].
Pausing allows synchronous expression of developmentally regulated genes
following their induction during embryogenesis [221, 226-229]. Differences in
synchronicity are most likely due to core promoter composition, as suggested
by promoter-swapping experiments [227] and by the relationship between
Pol II pausing and core promoter sequence during
Drosophila development [226, 230].
The positive elongation factor P-TEFb controls NFκB target genes
driven by TATA-containing promoters, whereas the negative elongation factor
DSIF controls weak TATA and TATA-less genes [231]. Interestingly,
Drosophila TATA-dependent promoters are associated with a low degree of
pausing [226, 230], suggesting that the TATA box may prevent Pol II pausing and
promote P-TEFb activity, leading to more productive elongation [231].
Remarkably, the Levine lab has shown that at least one fourth of
paused Drosophila promoters contain a shared sequence motif, the “pause
button” (PB), whose consensus (KCGRWCG) [232] is similar to that of the
DPE (DSWYVY) [9]. The PB motif is typically located between +25 and +35
(somewhat overlapping the DPE, although its position relative to the TSS is
more broadly distributed). Over one-fifth of paused
Drosophila promoters are enriched for the DPE, MTE and PB core promoter
motifs, all of which are located close to the pause site [232]. Notably, 75% of
the genes in the dorsal-ventral network were identified as paused genes
[232]. Over two thirds of Dorsal target genes contain a DPE motif [14]. These
correlations, together with the facts that PB and DPE are GC-rich and share
the 'GGWC' sub-consensus, and that both motifs overlap with the paused Pol
II (see above), suggest that the DPE, unlike the TATA box, may
contribute to Pol II pausing. The Adelman lab later found that both the
DPE and PB precisely align with the peak of Pol II pausing [219].
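The statement that PB and DPE share a 'GGWC' sub-consensus can be checked mechanically with IUPAC degeneracy codes. The Python sketch below is only an illustration: the helper names `codes_overlap` and `compatible_offsets` are hypothetical, and "compatible" is taken here to mean that, at every aligned position, the two codes can stand for at least one common base.

```python
# Standard IUPAC nucleotide codes and the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def codes_overlap(a: str, b: str) -> bool:
    """True if two IUPAC codes can stand for at least one common base."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def compatible_offsets(consensus: str, sub: str) -> list[int]:
    """0-based offsets at which `sub` aligns with `consensus` without contradiction."""
    return [
        i for i in range(len(consensus) - len(sub) + 1)
        if all(codes_overlap(c, s) for c, s in zip(consensus[i:], sub))
    ]


PB = "KCGRWCG"   # pause button consensus [232]
DPE = "DSWYVY"   # DPE consensus [9]

for name, motif in (("PB", PB), ("DPE", DPE)):
    print(name, compatible_offsets(motif, "GGWC"))
# PB [2]
# DPE [0]
```

In this sketch GGWC fits the PB consensus at its third position (the GRWC segment) and the DPE consensus at its first position (the DSWY segment). A shared degenerate sub-consensus says nothing by itself about shared binding partners or function.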
In addition, a recent study indicates that Pol II pausing close to the TSS
is correlated with focused initiation, whereas pausing at dispersed
promoters occurs more distally and over a wider region [221, 233].
Moreover, it seems that, in contrast to dispersed promoters, Pol II pausing at
focused promoters does not depend on nucleosome regulation. When the
core promoter elements are not located at their optimal positions, or do not
match the consensus sequence, pausing appears to be weaker and located
further downstream (+60 to +80) than its typical location. Thus, initiation modes and
core promoter architecture affect the strength and location of pausing [233].
It is well known that enhancers have a major effect on the activity and
synchrony of gene expression in development. Remarkably, Lagha et al. [227]
used a promoter-swapping strategy and advanced imaging methods and
discovered that promoters of key developmental genes play a pivotal role in
pausing, which in turn determines the “time to synchrony”, the time it takes to
achieve coordinated gene expression in over 50% of the nuclei in the
developing Drosophila embryo. The authors demonstrated that replacing
paused promoters (e.g. tup), which show rapid and synchronous activity, with
non-paused promoters (e.g. pnr) results in slow and stochastic activation
of gene expression. Moreover, elements associated with pausing (e.g. GAGA)
influence the timing and synchrony of gene expression. Synchronous
activation is essential for proper mesoderm invagination in the developing
Drosophila embryo. The authors provide evidence for a positive correlation
between pausing, synchrony and gene expression levels, which are necessary for
morphogenesis. Hence, it is the promoter, and not the enhancer, that
determines the levels of paused Pol II and the synchrony of gene activation
[227, 228].
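The “time to synchrony” metric can be illustrated as a simple computation over per-nucleus activation times. This sketch is an assumption, not the analysis pipeline of Lagha et al.: each nucleus contributes one onset time, nuclei that never activate are recorded as `None`, and the metric is taken as the earliest time at which strictly more than half of all nuclei are active. The function name and the onset values are hypothetical and are not data from [227].

```python
from typing import Optional, Sequence


def time_to_synchrony(onsets: Sequence[Optional[float]]) -> Optional[float]:
    """Earliest time at which more than 50% of nuclei have become active.

    `onsets` holds one activation time per nucleus; None marks a nucleus
    that never activated during the observation window. Returns None if
    the 50% threshold is never crossed.
    """
    needed = len(onsets) // 2 + 1          # strictly more than half of all nuclei
    active = sorted(t for t in onsets if t is not None)
    if len(active) < needed:
        return None
    return active[needed - 1]              # onset of the nucleus that crosses the threshold


# Hypothetical onset times (minutes) for 8 nuclei per promoter type.
paused_like = [3, 3, 4, 4, 4, 5, 5, 6]
non_paused_like = [4, 6, 9, 11, 14, None, None, 18]

print(time_to_synchrony(paused_like))      # 4
print(time_to_synchrony(non_paused_like))  # 14
```

A promoter that activates nuclei rapidly and coherently crosses the threshold early, whereas stochastic activation delays the crossing or, if too many nuclei stay silent, prevents it within the observation window.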
To summarize, these studies provide evidence regarding different
aspects of regulation of Pol II pausing via the core promoter. However,
additional biochemical studies are needed to elucidate the mechanisms
underlying pausing.
5.2. Termination, polyadenylation and recycling of Pol II - back to
square one
The promoter and terminator modules define the boundaries of the
transcribed region of protein-coding genes. Transcription termination involves
dephosphorylation of the Pol II CTD, dissociation of Pol II at the 3' end and
cleavage of the pre-mRNA. Furthermore, this highly regulated event is
coupled with 3'-end polyadenylation processing [234]. Numerous factors,
organized in multi-subunit protein complexes, and several RNA elements
mediate the termination/polyadenylation processes, including two central
complexes: cleavage and polyadenylation specificity factor (CPSF) and cleavage
stimulation factor (CstF) [235, 236]. Although several factors are shared, the
termination mechanism of metazoan replication-dependent core histone
genes, which are not polyadenylated, differs from that of polyadenylated
genes (reviewed in [235, 237, 238]).
There are mutual links between transcription initiation and termination/
polyadenylation. Although many studies in this area were performed in yeast,
we focus here on metazoan transcriptional termination. The
CPSF complex was first immunoprecipitated and co-purified with holo-TFIID
from nuclear extracts of human cell lines almost twenty years ago [239]. The
authors showed that CPSF is recruited to the core promoter by TFIID, then
dissociates from TFIID and remains associated with the elongating Pol II and,
subsequently, with the poly(A) site. Specifically, the CPSF-160
subunit interacts mainly with TAF5, TAF7 and TAF12, but not with TAF1,
TAF10 and TAF15, and minimally, if at all, with TBP. Overexpression of TBP
reduced polyadenylation of transcripts initiated from a TATA-containing
promoter, whereas both polyadenylated and non-polyadenylated
transcripts initiated from a TATA-less promoter were unaffected [58, 239].
Furthermore, recruitment of CstF to the core promoter by TFIIB during
PIC assembly was also demonstrated ([240] and refs therein). Thus, subunits
of the main termination factors CPSF and CstF are brought to the PIC and
transferred to Pol II, which eventually leads to transcription termination.
Moreover, components of the core histone termination machinery were also
found associated with histone promoters ([235] and refs therein).
Conversely, it was previously observed that the termination/polyadenylation
machinery influences PIC assembly and the efficiency of transcription
re-initiation through Pol II recycling ([241] and refs therein). These transcription
initiation-termination/polyadenylation connections are mediated by two
different chromatin and genomic mechanisms: gene looping from 3'-end
processing sites to core promoters, which brings both modules into spatial
and physical proximity, and compartmentalization of genes into “gene
factories” [3, 235, 242]. It is noteworthy that these connections and couplings
are conserved throughout eukaryotes. In this regard, it is possible that the PIC
assemblies and 3'-associated machineries of the core histone genes are
particularly specialized, as compared to other protein-encoding genes [133,
235].
In a recent paper, Oktaba et al. [243] demonstrated that promoters
are involved in the regulation of alternative cleavage and polyadenylation. The
nuclear RNA-binding protein embryonic lethal abnormal visual system (ELAV)
is known to inhibit canonical polyadenylation at the 3' UTRs of
genes, which causes Pol II read-through and 3' UTR extension during the
development of the nervous system in Drosophila and vertebrates. The
authors provide evidence that ELAV-mediated 3' UTR extension depends
on the promoter and on Pol II pausing in the developing Drosophila nervous
system [243]. Using double-labeling assays and promoter-swapping
experiments, they showed that only reporter constructs driven by
promoters of genes known to be extended in vivo produced extended
transcripts in transgenic Drosophila embryos. Ectopic expression of ELAV in
non-neural tissues induced 3' UTR extension. Moreover, sequence
analysis of 252 neural-specific transcripts with 3' UTR extensions revealed
enrichment of the GAGA motif and of paused Pol II. Indeed, reduced 3' UTR
extension levels were observed in Drosophila embryos mutant for the
GAGA-binding protein Trithorax-like (Trl). ChIP-seq analysis revealed
enrichment of ELAV in promoter regions of extended genes, as well as in
3' UTRs and introns. These results suggest that ELAV is selectively recruited
to the 3' UTRs of extended genes through paused Pol II promoters, perhaps
via looping between the promoters and the termination regions. Taken
together, the above studies strengthen the link between transcription
initiation and termination and the pivotal role of the promoter in this linkage.
6. Is the dogma really composed of sequential steps? – the
transcription-translation linkage
Traditionally, eukaryotic translation has been defined as a separate process
that is independent of transcription. However, the translation machinery
depends on mRNA-maturation processing, such as the m7G cap structure at
the 5' end of the mRNA and its associated protein complexes [244]. These
complexes recruit the small ribosomal subunit, which then reaches the start
codon, AUG, by scanning the 5' UTR (reviewed in [245]). A common element for
translation initiation is the Kozak element (RCCAUGG), which contains the
AUG [246, 247]. In addition to this well-defined translational initiator, a
distinct element, the Translation Initiator of Short 5' UTR (TISU), was
recently identified. Remarkably, this element is important for both
transcription and translation initiation of a specific set of genes [248]. The
TISU is found in 4.5% of mammalian protein-coding genes, with the consensus
sequence SAASATGGCGGC and an invariant 'ATG' core located between +5 and
+30, most often around +10, relative to the TSS [59, 248, 249].
This core promoter element is enriched in TATA-less promoters of genes
mostly involved in cellular functions such as protein metabolism and RNA
processing. As a transcriptional element, it was shown to be necessary for
transcription, and its function is mediated, at least in part, by YY1 [246, 248].
As a translational element, it was defined as an optimized translation initiator
for protein-coding genes with a very short 5' UTR (median of 12 nt) that
mediates translation in a cap-dependent but scanning-independent
manner, unlike the Kozak sequence [246, 249]. The 5'-TOP, a
mammalian pyrimidine-tract regulatory element, was previously characterized
as both a transcriptional and a translational element [76, 77, 250, 251]. It was
identified as a core promoter motif that serves as a transcriptional "initiator"
in many protein-biogenesis genes, and its translational activity is critical
under stress conditions. The translational control element (TCE) [252], another
transcription/translation element, was previously shown to regulate translation
in Drosophila testes [253]. Katzenberger et al. [254] recently showed that the
overlapping transcriptional motifs testis element 1 (TE1) and testis element 2
(TE2), which are overrepresented in testis-specific core promoters, are
together (as the TE1/2 motif) identical to the original TCE. Thus, the TCE also
functions as a transcriptional element. The TCE is found as a transcriptional
element in 45% of Drosophila testis-specific genes that are driven by focused
promoters. Its consensus sequence is CTCAAAATTT, and it is enriched in the
-5 to +25 region, but without a precise location relative to the TSS [254].
Hence, these three core promoter motifs play pivotal roles in both
transcription and translation of distinct sets of genes. Moreover, correlations
between the TATA box and different features of genes (e.g. gene length) have
been observed [255]. This co-regulation raises questions
regarding the interplay between transcription and translation, such as: Do
downstream core promoter elements affect the translation of these genes?
Given that 5' UTRs are short in some organisms, are these
elements evolutionarily conserved? Indeed, a recent study revealed general
associations and co-occurrence between translational and transcriptional
regulatory trends and features, including core promoter composition [256].
Taken together, these findings suggest that the core promoter region serves,
at least in part, as a central intersection for coordinating transcription and translation.
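The positional rules given above for the TISU (an 'ATG' core between +5 and +30) and the TCE (enrichment between -5 and +25) can be expressed as a small window-constrained motif scan. The Python sketch below rests on several assumptions: it uses the conventional promoter numbering in which the TSS is +1 and there is no position 0; the TISU window is applied to the A of its ATG; and, as a simplification, the TCE window is applied to the first base of the motif. The function names and the example sequence are hypothetical, and a consensus match alone does not imply function.

```python
# IUPAC nucleotide codes needed for the consensus sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "N": "ACGT"}

TISU = "SAASATGGCGGC"  # consensus from [59, 248, 249]
TCE = "CTCAAAATTT"     # consensus from [254]


def rel_pos(index: int, tss_index: int) -> int:
    """Convert a 0-based string index to a TSS-relative coordinate (+1 = TSS, no 0)."""
    d = index - tss_index
    return d + 1 if d >= 0 else d


def iupac_match(kmer: str, motif: str) -> bool:
    """True if every base in `kmer` is allowed by the IUPAC code at that motif position."""
    return all(base in IUPAC[code] for base, code in zip(kmer, motif))


def motif_hits(seq: str, tss_index: int, motif: str,
               window: tuple[int, int], anchor: int = 0) -> list[int]:
    """TSS-relative positions of motif[anchor] for matches whose anchor lies in `window`."""
    lo, hi = window
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        if iupac_match(seq[i:i + len(motif)], motif):
            pos = rel_pos(i + anchor, tss_index)
            if lo <= pos <= hi:
                hits.append(pos)
    return hits


def scan(seq: str, tss_index: int) -> dict:
    """Report TISU (position of its ATG) and TCE hits inside their stated windows."""
    return {
        "TISU_ATG": motif_hits(seq, tss_index, TISU, (5, 30), anchor=TISU.index("ATG")),
        "TCE": motif_hits(seq, tss_index, TCE, (-5, 25)),
    }


# Hypothetical sequence: 10 nt upstream of the TSS, then a TISU match and a TCE.
example = "GCGCGCGCGC" + "AGTC" + "GAACATGGCGGC" + "GG" + "CTCAAAATTT" + "GGGGG"
print(scan(example, tss_index=10))  # {'TISU_ATG': [9], 'TCE': [19]}
```

Skipping position 0 matters for windows that straddle the TSS, such as the TCE range: the base immediately upstream of the TSS is -1, so a naive subtraction would shift every upstream coordinate by one.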
7. Discussion and future perspectives
In this review, we discussed diverse aspects of regulation of gene expression,
particularly in metazoans, with an emphasis on the core promoter. We
highlighted the complexity of the core promoter architecture. Furthermore, we
presented its intricate connections and its pivotal influences on different steps
of gene expression: initiation, elongation, termination, polyadenylation and,
finally, translation (Fig. 3). Moreover, we would like to raise a few issues that are
directly related to the core promoter but were not mentioned above.
First, in addition to the diversity of core promoter elements and the
relationships between them, nucleotide polymorphisms in the core promoter
affect its activity, including its binding by PIC components. Multiple lines
of evidence point to polymorphisms in many human promoters,
particularly in the TATA box sequence. These TATA box substitutions can
affect TBP binding and core promoter activity, and are associated with human
diseases ([257], reviewed in [258]). It is likely that, as with the TATA box,
polymorphisms exist in other core promoter elements and may be clinically
relevant.
Second, the enhancer-promoter interactome seems to be a much more
complex landscape than previously considered. Consistent with this,
promoter-promoter interactions have recently been found [259]. These
interactions resemble enhancer-promoter interactions, in that one promoter
can act as an enhancer of another. Hence, more complex hierarchies of
direct and indirect interactions between enhancers and promoters could
hypothetically arise (e.g. an enhancer-promoter-promoter hub).
Moreover, an additional enhancer-associated regulatory aspect is the
discovery of enhancer-derived RNAs (eRNAs). This class of
ncRNAs was discovered only a few years ago in humans [260]. eRNAs are
short-lived, 5'-capped transcripts produced from enhancer regions. Their
expression correlates with histone marks of active enhancers (H3K4me1
and H3K27ac), and the regions that produce them are enriched for
transcription factors, co-activators (such as p300/CBP), basal transcription
factors and Ser5-phosphorylated Pol II. eRNAs are preferentially found at
enhancers that contact their target promoters through enhancer looping, and
it has been suggested that these transcripts play a role in generating or
maintaining enhancer-promoter loops and in facilitating the recruitment of
sequence-specific transcription factors and chromatin remodeling or chromatin
modifying complexes to the targeted promoters [52]. Additionally, eRNAs are
associated with several signaling pathways ([52, 53] and refs therein).
Although eRNAs are being extensively investigated, including by
high-throughput methods [261], little is known about their core promoter
composition and TSS architecture [54]. Hence, one of the future goals should
be an in-depth investigation of the core promoter architectures of eRNAs and
their transcriptional machineries.
Indeed, in agreement with the current knowledge that many active
mammalian promoters are bidirectional [21, 56], a recent study
revealed shared architectures of bidirectional initiation at
promoters and active enhancers [54]. On the one hand, similar
profiles of transcription factor binding, nucleosome positioning and histone
marks, and similar frequencies of sequence motifs such as the TATA box,
BREs and Inr (YR only), were present in both promoters and transcribed
enhancers. On the other hand, these modules differ in the stability of the
transcripts that they synthesize in each direction: promoters give rise to
stable transcripts in the sense direction, whereas promoter upstream
antisense RNAs and enhancer RNAs are rapidly degraded [54]. This unifying
architecture of TSSs [262], along with recent findings (e.g. promoter-promoter
interactions), challenges the traditional classification of promoters and
enhancers (see also [263]). It is noteworthy that Core et al. [54] indicated
that although there are distinct pause modes, which include proximal focused
pausing and distal dispersed pausing (see also [233]), the distance between
bidirectional TSS pairs and the TFIIB peaks is not affected. This
high-resolution analysis of nascent RNAs might also imply that the high
frequency of dispersed mammalian core promoters observed previously
represents multiple independent initiation sites acting as enhancers for
neighboring promoters [54]. Thus, dispersed mammalian promoters might be
less abundant than originally perceived. Taken together, the growing body of
evidence indicates that the core promoter lies at the heart of gene expression.
Acknowledgments
We thank Ron Even for graphic design assistance. We thank Jim Kadonaga,
Uwe Ohler, Sascha Duttke, Anna Sloutskin, Hila Shir-Shapira and Racheli
Harshish for critical reading of the manuscript. Core promoter-related
research in the Juven-Gershon lab is supported by grants from the Israel
Science Foundation (no. 798/10), the European Union Seventh Framework
Programme (Marie Curie International Reintegration Grant; no. 256491), the
United States-Israel Binational Science Foundation (no. 2009428; joint with
James T. Kadonaga) and the German-Israeli Foundation for Scientific
Research and Development (no. I-1220-363.13/2012; joint with Eileen E.M.
Furlong).
References
[1] E. Splinter, W. de Laat, The complex transcription regulatory landscape of our
genome: control in three dimensions, EMBO J, 30 (2011) 4345-4355.
[2] X. Dong, M.C. Greven, A. Kundaje, S. Djebali, J.B. Brown, C. Cheng, T.R.
Gingeras, M. Gerstein, R. Guigo, E. Birney, Z. Weng, Modeling gene expression
using chromatin features in various cellular contexts, Genome Biol, 13 (2012) R53.
[3] J. Shandilya, S.G. Roberts, The transcription cycle in eukaryotes: from productive
initiation to RNA polymerase II recycling, Biochim Biophys Acta, 1819 (2012) 391-
400.
[4] M.C. Thomas, C.M. Chiang, The general transcription machinery and general
cofactors, Crit Rev Biochem Mol Biol, 41 (2006) 105-178.
[5] J.E. Butler, J.T. Kadonaga, The RNA polymerase II core promoter: a key
component in the regulation of gene expression, Genes Dev, 16 (2002) 2583-2592.
[6] J.T. Kadonaga, Perspectives on the RNA polymerase II core promoter, Wiley
Interdiscip Rev Dev Biol, 1 (2012) 40-51.
[7] B. Li, M. Carey, J.L. Workman, The role of chromatin during transcription, Cell,
128 (2007) 707-719.
[8] E. Valen, A. Sandelin, Genomic and chromatin signals underlying transcription
start-site selection, Trends Genet, 27 (2011) 475-485.
[9] T. Juven-Gershon, J.T. Kadonaga, Regulation of gene expression via the core
promoter and the basal transcriptional machinery, Dev Biol, 339 (2010) 225-229.
[10] B. Lenhard, A. Sandelin, P. Carninci, Metazoan promoters: emerging
characteristics and insights into transcriptional regulation, Nat Rev Genet, 13 (2012)
233-245.
[11] N.D. Heintzman, B. Ren, The gateway to transcription: identifying, characterizing
and understanding promoters in the eukaryotic genome, Cell Mol Life Sci, 64 (2007)
386-400.
[12] T. Juven-Gershon, J.Y. Hsu, J.W. Theisen, J.T. Kadonaga, The RNA
polymerase II core promoter - the gateway to transcription, Curr Opin Cell Biol, 20
(2008) 253-259.
[13] T.J. Parry, J.W. Theisen, J.Y. Hsu, Y.L. Wang, D.L. Corcoran, M. Eustice, U.
Ohler, J.T. Kadonaga, The TCT motif, a key component of an RNA polymerase II
transcription system for the translational machinery, Genes Dev, 24 (2010) 2013-
2018.
[14] Y. Zehavi, O. Kuznetsov, A. Ovadia-Shochat, T. Juven-Gershon, Core promoter
functions in the regulation of gene expression of Drosophila dorsal target genes,
J Biol Chem, 289 (2014) 11993-12004.
[15] A. Sandelin, P. Carninci, B. Lenhard, J. Ponjavic, Y. Hayashizaki, D.A. Hume,
Mammalian RNA polymerase II core promoters: insights from genome-wide studies,
Nat Rev Genet, 8 (2007) 424-436.
[16] T. Ni, D.L. Corcoran, E.A. Rach, S. Song, E.P. Spana, Y. Gao, U. Ohler, J. Zhu,
A paired-end sequencing strategy to map the complex landscape of transcription
initiation, Nat Methods, 7 (2010) 521-527.
[17] M.A. Frohman, M.K. Dush, G.R. Martin, Rapid production of full-length cDNAs
from rare transcripts: amplification using a single gene-specific oligonucleotide
primer, Proc Natl Acad Sci U S A, 85 (1988) 8998-9002.
[18] T. Shiraki, S. Kondo, S. Katayama, K. Waki, T. Kasukawa, H. Kawaji, R.
Kodzius, A. Watahiki, M. Nakamura, T. Arakawa, S. Fukuda, D. Sasaki, A.
Podhajska, M. Harbers, J. Kawai, P. Carninci, Y. Hayashizaki, Cap analysis gene
expression for high-throughput analysis of transcriptional starting point and
identification of promoter usage, Proc Natl Acad Sci U S A, 100 (2003) 15776-15781.
[19] P.G. Giresi, J. Kim, R.M. McDaniell, V.R. Iyer, J.D. Lieb, FAIRE (Formaldehyde-
Assisted Isolation of Regulatory Elements) isolates active regulatory elements from
human chromatin, Genome Res, 17 (2007) 877-885.
[20] T.S. Furey, ChIP-seq and beyond: new and improved methodologies to detect
and characterize protein-DNA interactions, Nat Rev Genet, 13 (2012) 840-852.
[21] L.J. Core, J.J. Waterfall, J.T. Lis, Nascent RNA sequencing reveals widespread
pausing and divergent initiation at human promoters, Science, 322 (2008) 1845-
1848.
[22] Z. Wang, M. Gerstein, M. Snyder, RNA-Seq: a revolutionary tool for
transcriptomics, Nat Rev Genet, 10 (2009) 57-63.
[23] N.L. Washington, E.O. Stinson, M.D. Perry, P. Ruzanov, S. Contrino, R. Smith,
Z. Zha, R. Lyne, A. Carr, P. Lloyd, E. Kephart, S.J. McKay, G. Micklem, L.D. Stein,
S.E. Lewis, The modENCODE Data Coordination Center: lessons in harvesting
comprehensive experimental details, Database (Oxford), 2011 (2011) bar023.
[24] The ENCODE (ENCyclopedia Of DNA Elements) Project, Science, 306 (2004)
636-640.
[25] A.R. Forrest, H. Kawaji, M. Rehli, J.K. Baillie, M.J. de Hoon, T. Lassmann, M.
Itoh, K.M. Summers, H. Suzuki, C.O. Daub, J. Kawai, P. Heutink, W. Hide, T.C.
Freeman, B. Lenhard, V.B. Bajic, M.S. Taylor, V.J. Makeev, A. Sandelin, D.A. Hume,
P. Carninci, Y. Hayashizaki, A promoter-level mammalian expression atlas, Nature,
507 (2014) 462-470.
[26] P. Carninci, A. Sandelin, B. Lenhard, S. Katayama, K. Shimokawa, J. Ponjavic,
C.A. Semple, M.S. Taylor, P.G. Engstrom, M.C. Frith, A.R. Forrest, W.B. Alkema,
S.L. Tan, C. Plessy, R. Kodzius, T. Ravasi, T. Kasukawa, S. Fukuda, M. Kanamori-
Katayama, Y. Kitazume, H. Kawaji, C. Kai, M. Nakamura, H. Konno, K. Nakano, S.
Mottagui-Tabar, P. Arner, A. Chesi, S. Gustincich, F. Persichetti, H. Suzuki, S.M.
Grimmond, C.A. Wells, V. Orlando, C. Wahlestedt, E.T. Liu, M. Harbers, J. Kawai,
V.B. Bajic, D.A. Hume, Y. Hayashizaki, Genome-wide analysis of mammalian
promoter architecture and evolution, Nat Genet, 38 (2006) 626-635.
[27] E.A. Rach, H.Y. Yuan, W.H. Majoros, P. Tomancak, U. Ohler, Motif composition,
conservation and condition-specificity of single and alternative transcription start sites
in the Drosophila genome, Genome Biol, 10 (2009) R73.
[28] V.B. Bajic, S.L. Tan, A. Christoffels, C. Schonbach, L. Lipovich, L. Yang, O.
Hofmann, A. Kruger, W. Hide, C. Kai, J. Kawai, D.A. Hume, P. Carninci, Y.
Hayashizaki, Mice and men: their promoter properties, PLoS Genet, 2 (2006) e54.
[29] R.A. Hoskins, J.M. Landolin, J.B. Brown, J.E. Sandler, H. Takahashi, T.
Lassmann, C. Yu, B.W. Booth, D. Zhang, K.H. Wan, L. Yang, N. Boley, J. Andrews,
T.C. Kaufman, B.R. Graveley, P.J. Bickel, P. Carninci, J.W. Carlson, S.E. Celniker,
Genome-wide analysis of promoter architecture in Drosophila melanogaster,
Genome Res, 21 (2011) 182-192.
[30] M. Baumann, J. Pontiller, W. Ernst, Structure and basal transcription complex of
RNA polymerase II core promoters in the mammalian genome: an overview, Mol
Biotechnol, 45 (2010) 241-247.
[31] S.J. Cooper, N.D. Trinklein, E.D. Anton, L. Nguyen, R.M. Myers, Comprehensive
analysis of transcriptional promoter structure and function in 1% of the human
genome, Genome Res, 16 (2006) 1-10.
[32] T.H. Kim, L.O. Barrera, M. Zheng, C. Qu, M.A. Singer, T.A. Richmond, Y. Wu,
R.D. Green, B. Ren, A high-resolution map of active promoters in the human
genome, Nature, 436 (2005) 876-880.
[33] M.C. Frith, Explaining the correlations among properties of mammalian
promoters, Nucleic Acids Res, 42 (2014) 4823-4832.
[34] J.A. Stamatoyannopoulos, Illuminating eukaryotic transcription start sites, Nat
Methods, 7 (2010) 501-503.
[35] N. Adachi, M.R. Lieber, Bidirectional gene organization: a common architectural
feature of the human genome, Cell, 109 (2002) 807-809.
[36] J.C. Ame, V. Schreiber, V. Fraulob, P. Dolle, G. de Murcia, C.P. Niedergang, A
bidirectional promoter connects the poly(ADP-ribose) polymerase 2 (PARP-2) gene
to the gene for RNase P RNA. Structure and expression of the mouse PARP-2 gene,
J Biol Chem, 276 (2001) 11092-11099.
[37] A.S. Orekhova, P.M. Rubtsov, Bidirectional promoters in the transcription of
mammalian genomes, Biochemistry (Mosc), 78 (2013) 335-341.
[38] V. Gotea, H.M. Petrykowska, L. Elnitski, Bidirectional promoters as important
drivers for the emergence of species-specific transcripts, PLoS One, 8 (2013) e57323.
[39] M.Q. Yang, L.L. Elnitski, Diversity of core promoter elements comprising human
bidirectional promoters, BMC Genomics, 9 Suppl 2 (2008) S3.
[40] P.G. Engstrom, H. Suzuki, N. Ninomiya, A. Akalin, L. Sessa, G. Lavorgna, A.
Brozzi, L. Luzi, S.L. Tan, L. Yang, G. Kunarso, E.L. Ng, S. Batalov, C. Wahlestedt, C.
Kai, J. Kawai, P. Carninci, Y. Hayashizaki, C. Wells, V.B. Bajic, V. Orlando, J.F. Reid,
B. Lenhard, L. Lipovich, Complex Loci in human and mouse genomes, PLoS Genet,
2 (2006) e47.
[41] G. Wang, K. Qi, Y. Zhao, Y. Li, L. Juan, M. Teng, L. Li, Y. Liu, Y. Wang,
Identification of regulatory regions of bidirectional genes in cervical cancer, BMC
Med Genomics, 6 Suppl 1 (2013) S5.
[42] M.U. Kaikkonen, M.T. Lam, C.K. Glass, Non-coding RNAs as regulators of gene
expression and epigenetics, Cardiovasc Res, 90 (2011) 430-440.
[43] P. Kapranov, J. Cheng, S. Dike, D.A. Nix, R. Duttagupta, A.T. Willingham, P.F.
Stadler, J. Hertel, J. Hackermuller, I.L. Hofacker, I. Bell, E. Cheung, J. Drenkow, E.
Dumais, S. Patel, G. Helt, M. Ganesh, S. Ghosh, A. Piccolboni, V. Sementchenko, H.
Tammana, T.R. Gingeras, RNA maps reveal new RNA classes and a possible
function for pervasive transcription, Science, 316 (2007) 1484-1488.
[44] W. Wei, V. Pelechano, A.I. Jarvelin, L.M. Steinmetz, Functional consequences of
bidirectional promoters, Trends Genet, 27 (2011) 267-276.
[45] Y. He, B. Vogelstein, V.E. Velculescu, N. Papadopoulos, K.W. Kinzler, The
antisense transcriptomes of human cells, Science, 322 (2008) 1855-1857.
[46] P. Preker, J. Nielsen, S. Kammler, S. Lykke-Andersen, M.S. Christensen, C.K.
Mapendano, M.H. Schierup, T.H. Jensen, RNA exosome depletion reveals
transcription upstream of active human promoters, Science, 322 (2008) 1851-1854.
[47] A.C. Seila, J.M. Calabrese, S.S. Levine, G.W. Yeo, P.B. Rahl, R.A. Flynn, R.A.
Young, P.A. Sharp, Divergent transcription from active promoters, Science, 322
(2008) 1849-1851.
[48] S. Buratowski, Transcription. Gene expression--where to start?, Science, 322
(2008) 1804-1805.
[49] P. Richard, J.L. Manley, How bidirectional becomes unidirectional, Nat Struct
Mol Biol, 20 (2013) 1022-1024.
[50] A.E. Almada, X. Wu, A.J. Kriz, C.B. Burge, P.A. Sharp, Promoter directionality is
controlled by U1 snRNP and polyadenylation signals, Nature, 499 (2013) 360-363.
[51] E. Ntini, A.I. Jarvelin, J. Bornholdt, Y. Chen, M. Boyd, M. Jorgensen, R.
Andersson, I. Hoof, A. Schein, P.R. Andersen, P.K. Andersen, P. Preker, E. Valen, X.
Zhao, V. Pelechano, L.M. Steinmetz, A. Sandelin, T.H. Jensen, Polyadenylation site-
induced decay of upstream transcripts enforces promoter directionality, Nat Struct
Mol Biol, 20 (2013) 923-928.
[52] F. Lai, R. Shiekhattar, Enhancer RNAs: the new molecules of transcription,
Curr Opin Genet Dev, 25 (2014) 38-42.
[53] M.T. Lam, W. Li, M.G. Rosenfeld, C.K. Glass, Enhancer RNAs and regulated
transcriptional programs, Trends Biochem Sci, 39 (2014) 170-182.
[54] L.J. Core, A.L. Martins, C.G. Danko, C.T. Waters, A. Siepel, J.T. Lis, Analysis of
nascent RNA identifies a unified architecture of initiation regions at mammalian
promoters and enhancers, Nat Genet, 46 (2014) 1311-1320.
[55] M. Uesaka, O. Nishimura, Y. Go, K. Nakashima, K. Agata, T. Imamura,
Bidirectional promoters are the major source of gene activation-associated non-
coding RNAs in mammals, BMC genomics, 15 (2014) 35.
[56] S.H. Duttke, S.A. Lacadie, M.M. Ibrahim, C.K. Glass, D.L. Corcoran, C. Benner,
S. Heinz, J.T. Kadonaga, U. Ohler, Human Promoters Are Intrinsically Directional,
Molecular cell, (2015).
[57] F. Muller, L. Tora, The multicoloured world of promoter recognition complexes,
EMBO J, 23 (2004) 2-8.
[58] L. Tora, A unified nomenclature for TATA box binding protein (TBP)-associated
factors (TAFs) involved in RNA polymerase II transcription, Genes Dev, 16 (2002)
673-675.
[59] R. Dikstein, The unexpected traits associated with core promoter elements,
Transcription, 2 (2011) 201-206.
[60] J.T. Kadonaga, The DPE, a core promoter element for transcription by RNA
polymerase II, Exp Mol Med, 34 (2002) 259-264.
[61] S.T. Smale, J.T. Kadonaga, The RNA polymerase II core promoter, Annu Rev
Biochem, 72 (2003) 449-479.
[62] F. Muller, L. Tora, Chromatin and DNA sequences in defining promoters for
transcription initiation, Biochim Biophys Acta, 1839 (2014) 118-128.
[63] J. Corden, B. Wasylyk, A. Buchwalder, P. Sassone-Corsi, C. Kedinger, P.
Chambon, Promoter sequences of eukaryotic protein-coding genes, Science, 209
(1980) 1406-1414.
[64] S.T. Smale, D. Baltimore, The "initiator" as a transcription control element, Cell,
57 (1989) 103-113.
[65] P.C. FitzGerald, D. Sturgill, A. Shyakhtenko, B. Oliver, C. Vinson, Comparative
genomics of Drosophila and human core promoters, Genome Biol, 7 (2006) R53.
[66] N.I. Gershenzon, E.N. Trifonov, I.P. Ioshikhes, The features of Drosophila core
promoters revealed by statistical analysis, BMC genomics, 7 (2006) 161.
[67] U. Ohler, G.C. Liao, H. Niemann, G.M. Rubin, Computational analysis of core
promoters in the Drosophila genome, Genome Biol, 3 (2002) RESEARCH0087.
[68] J. Kaufmann, S.T. Smale, Direct recognition of initiator elements by a component
of the transcription factor IID complex, Genes Dev, 8 (1994) 821-829.
[69] C.P. Verrijzer, J.L. Chen, K. Yokomori, R. Tjian, Binding of TAFs to core
elements directs promoter selectivity by RNA polymerase II, Cell, 81 (1995) 1115-
1125.
[70] G.E. Chalkley, C.P. Verrijzer, DNA binding site selection by RNA polymerase II
TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator, EMBO J, 18 (1999)
4835-4845.
[71] R. Javahery, A. Khachi, K. Lo, B. Zenzie-Gregory, S.T. Smale, DNA sequence
requirements for transcriptional initiator activity in mammalian cells, Mol Cell Biol, 14
(1994) 116-127.
[72] B.A. Purnell, P.A. Emanuel, D.S. Gilmour, TFIID sequence recognition of the
initiator and sequences farther downstream in Drosophila class II genes, Genes Dev,
8 (1994) 830-842.
[73] C. Yang, E. Bolotin, T. Jiang, F.M. Sladek, E. Martinez, Prevalence of the
initiator over the TATA box in human and yeast genes and identification of DNA
motifs enriched in human TATA-less core promoters, Gene, 389 (2007) 52-65.
[74] M.C. Frith, E. Valen, A. Krogh, Y. Hayashizaki, P. Carninci, A. Sandelin, A code
for transcription initiation in mammalian genomes, Genome Res, 18 (2008) 1-12.
[75] G. Yarden, R. Elfakess, K. Gazit, R. Dikstein, Characterization of sINR, a strict
version of the Initiator core promoter element, Nucleic Acids Res, 37 (2009) 4234-
4246.
[76] N. Hariharan, R.P. Perry, Functional dissection of a mouse ribosomal protein
promoter: significance of the polypyrimidine initiator and an element in the TATA-box
region, Proc Natl Acad Sci U S A, 87 (1990) 1526-1530.
[77] A. Shibui-Nihei, Y. Ohmori, K. Yoshida, J. Imai, I. Oosuga, M. Iidaka, Y. Suzuki,
J. Mizushima-Sugano, K. Yoshitomo-Nakagawa, S. Sugano, The 5' terminal
oligopyrimidine tract of human elongation factor 1A-1 gene functions as a
transcriptional initiator and produces a variable number of Us at the transcriptional
level, Gene, 311 (2003) 137-145.
[78] R.P. Perry, The architecture of mammalian ribosomal protein promoters, BMC
Evol Biol, 5 (2005) 15.
[79] T.L. Hamilton, M. Stoneley, K.A. Spriggs, M. Bushell, TOPs and their regulation,
Biochem Soc Trans, 34 (2006) 12-16.
[80] Y. Tokusumi, Y. Ma, X. Song, R.H. Jacobson, S. Takada, The new core
promoter element XCPE1 (X Core Promoter Element 1) directs activator-, mediator-,
and TATA-binding protein-dependent but TFIID-independent RNA polymerase II
transcription from TATA-less promoters, Mol Cell Biol, 27 (2007) 1844-1858.
[81] R. Anish, M.B. Hossain, R.H. Jacobson, S. Takada, Characterization of
transcription from TATA-less promoters: identification of a new core promoter
element XCPE2 and analysis of factor requirements, PloS one, 4 (2009) e5103.
[82] M.L. Goldberg, Ph.D. thesis, Stanford University, 1979.
[83] N.I. Gershenzon, I.P. Ioshikhes, Synergy of human Pol II core promoter
elements revealed by statistical sequence analysis, Bioinformatics, 21 (2005) 1295-
1300.
[84] M. Mencia, Z. Moqtaderi, J.V. Geisberg, L. Kuras, K. Struhl, Activator-specific
recruitment of TFIID and regulation of ribosomal protein genes in yeast, Molecular
cell, 9 (2002) 823-833.
[85] A.D. Basehoar, S.J. Zanton, B.F. Pugh, Identification and distinct regulation of
yeast TATA box-containing genes, Cell, 116 (2004) 699-709.
[86] C. Molina, E. Grotewold, Genome wide analysis of Arabidopsis core promoters,
BMC genomics, 6 (2005) 25.
[87] Y.Y. Yamamoto, H. Ichida, T. Abe, Y. Suzuki, S. Sugano, J. Obokata,
Differentiation of core promoter architecture between plants and mammals revealed
by LDSS analysis, Nucleic Acids Res, 35 (2007) 6219-6226.
[88] T. Morton, J. Petricka, D.L. Corcoran, S. Li, C.M. Winter, A. Carda, P.N. Benfey,
U. Ohler, M. Megraw, Paired-end analysis of transcription start sites in Arabidopsis
reveals plant-specific promoter signatures, The Plant cell, 26 (2014) 2746-2760.
[89] J.N. Reeve, Archaeal chromatin and transcription, Molecular microbiology, 48
(2003) 587-598.
[90] V.L. Singer, C.R. Wobbe, K. Struhl, A wide variety of DNA sequences can
functionally replace a yeast TATA element for transcriptional activation, Genes Dev,
4 (1990) 636-645.
[91] W. Deng, S.G. Roberts, A core promoter element downstream of the TATA box
that is recognized by TFIIB, Genes Dev, 19 (2005) 2418-2423.
[92] W. Deng, S.G. Roberts, TFIIB and the regulation of transcription by RNA
polymerase II, Chromosoma, 116 (2007) 417-429.
[93] T. Lagrange, A.N. Kapanidis, H. Tang, D. Reinberg, R.H. Ebright, New core
promoter element in RNA polymerase II-dependent transcription: sequence-specific
DNA binding by transcription factor IIB, Genes Dev, 12 (1998) 34-44.
[94] R. Evans, J.A. Fairley, S.G. Roberts, Activator-mediated disruption of sequence-
specific DNA contacts by the general transcription factor TFIIB, Genes Dev, 15
(2001) 2945-2949.
[95] T. Juven-Gershon, J.Y. Hsu, J.T. Kadonaga, Caudal, a key developmental
regulator, is a DPE-specific transcriptional factor, Genes Dev, 22 (2008) 2823-2830.
[96] T.W. Burke, J.T. Kadonaga, Drosophila TFIID binds to a conserved downstream
basal promoter element that is present in many TATA-box-deficient promoters,
Genes Dev, 10 (1996) 711-724.
[97] T.W. Burke, J.T. Kadonaga, The downstream core promoter element, DPE, is
conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila,
Genes Dev, 11 (1997) 3020-3031.
[98] A.K. Kutach, J.T. Kadonaga, The downstream promoter element DPE appears to
be as widely used as the TATA box in Drosophila core promoters, Mol Cell Biol, 20
(2000) 4754-4764.
[99] Y. Zehavi, A. Sloutskin, O. Kuznetsov, T. Juven-Gershon, The core promoter
composition establishes a new dimension in developmental gene networks, Nucleus,
5 (2014).
[100] C.Y. Lim, B. Santoso, T. Boulay, E. Dong, U. Ohler, J.T. Kadonaga, The MTE,
a new core promoter element for transcription by RNA polymerase II, Genes Dev, 18
(2004) 1606-1617.
[101] J.W. Theisen, C.Y. Lim, J.T. Kadonaga, Three key subregions contribute to the
function of the downstream RNA polymerase II core promoter, Mol Cell Biol, 30
(2010) 3471-3479.
[102] T. Zhou, C.M. Chiang, The intronless and TATA-less human TAF(II)55 gene
contains a functional initiator and a downstream promoter element, The Journal of
biological chemistry, 276 (2001) 25503-25511.
[103] H. Shao, M. Revach, S. Moshonov, Y. Tzuman, K. Gazit, S. Albeck, T. Unger,
R. Dikstein, Core promoter binding by histone-like TAF complexes, Mol Cell Biol, 25
(2005) 206-219.
[104] S.H. Duttke, RNA polymerase III accurately initiates transcription from RNA
polymerase II promoters in vitro, The Journal of biological chemistry, 289 (2014)
20396-20404.
[105] D.H. Lee, N. Gershenzon, M. Gupta, I.P. Ioshikhes, D. Reinberg, B.A. Lewis,
Functional characterization of core promoter elements: the downstream core element
is recognized by TAF1, Mol Cell Biol, 25 (2005) 9674-9686.
[106] B.A. Lewis, T.K. Kim, S.H. Orkin, A downstream element in the human beta-
globin promoter: evidence of extended sequence-specific transcription factor IID
contacts, Proc Natl Acad Sci U S A, 97 (2000) 7172-7177.
[107] A. Matsukage, F. Hirose, M.A. Yoo, M. Yamaguchi, The DRE/DREF
transcriptional regulatory system: a master key for cell proliferation, Biochim Biophys
Acta, 1779 (2008) 81-89.
[108] J. Li, D.S. Gilmour, Distinct mechanisms of transcriptional pausing orchestrated
by GAGA factor and M1BP, a novel transcription factor, EMBO J, 32 (2013) 1829-
1841.
[109] Z. Chen, J.L. Manley, Core promoter elements and TAFs contribute to the
diversity of transcriptional activation in vertebrates, Mol Cell Biol, 23 (2003) 7350-
7362.
[110] E. Martinez, H. Ge, Y. Tao, C.X. Yuan, V. Palhan, R.G. Roeder, Novel
cofactors and TFIIA mediate functional core promoter selectivity by the human
TAFII150-containing TFIID complex, Mol Cell Biol, 18 (1998) 6571-6583.
[111] J.Y. Hsu, T. Juven-Gershon, M.T. Marr, 2nd, K.J. Wright, R. Tjian, J.T.
Kadonaga, TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-
dependent versus TATA-dependent transcription, Genes Dev, 22 (2008) 2353-2358.
[112] T. Juven-Gershon, S. Cheng, J.T. Kadonaga, Rational design of a super core
promoter that enhances gene expression, Nat Methods, 3 (2006) 917-922.
[113] T. Matsui, J. Segall, P.A. Weil, R.G. Roeder, Multiple factors required for
accurate initiation of transcription by purified RNA polymerase II, The Journal of
biological chemistry, 255 (1980) 11992-11996.
[114] M. Samuels, A. Fire, P.A. Sharp, Separation and characterization of factors
mediating accurate transcription by RNA polymerase II, The Journal of biological
chemistry, 257 (1982) 14419-14427.
[115] Y. He, J. Fang, D.J. Taatjes, E. Nogales, Structural visualization of key steps in
human transcription initiation, Nature, 495 (2013) 481-486.
[116] B.A. Lewis, R.J. Sims, 3rd, W.S. Lane, D. Reinberg, Functional characterization
of core promoter elements: DPE-specific transcription requires the protein kinase
CK2 and the PC4 coactivator, Molecular cell, 18 (2005) 471-481.
[117] F. Muller, M.A. Demeny, L. Tora, New problems in RNA polymerase II
transcription initiation: matching the diversity of core promoters with a variety of
promoter recognition factors, The Journal of biological chemistry, 282 (2007) 14685-
14689.
[118] T.W. Sikorski, S. Buratowski, The basal initiation machinery: beyond the
general transcription factors, Current opinion in cell biology, 21 (2009) 344-351.
[119] Y. Zehavi, A. Kedmi, D. Ideses, T. Juven-Gershon, TRF2: TRansForming the
view of general transcription factors, Transcription, (2015).
[120] G. Papai, P.A. Weil, P. Schultz, New insights into the function of transcription
factor TFIID from recent structural studies, Current opinion in genetics &
development, 21 (2011) 219-224.
[121] N. Nakajima, M. Horikoshi, R.G. Roeder, Factors involved in specific
transcription by mammalian RNA polymerase II: purification, genetic specificity, and
TATA box-promoter interactions of TFIID, Mol Cell Biol, 8 (1988) 4028-4040.
[122] C.M. Chiang, H. Ge, Z. Wang, A. Hoffmann, R.G. Roeder, Unique TATA-
binding protein-containing complexes and cofactors involved in transcription by RNA
polymerases II and III, EMBO J, 12 (1993) 2749-2762.
[123] E. Wieczorek, M. Brand, X. Jacq, L. Tora, Function of TAF(II)-containing
complex without TBP in transcription by RNA polymerase II, Nature, 393 (1998) 187-
191.
[124] K. Gazit, S. Moshonov, R. Elfakess, M. Sharon, G. Mengus, I. Davidson, R.
Dikstein, TAF4/4b x TAF12 displays a unique mode of DNA binding and is required
for core promoter function of a subset of genes, The Journal of biological chemistry,
284 (2009) 26286-26296.
[125] T. O'Brien, R. Tjian, Different functional domains of TAFII250 modulate
expression of distinct subsets of mammalian genes, Proc Natl Acad Sci U S A, 97
(2000) 2456-2461.
[126] R.O. Weinzierl, B.D. Dynlacht, R. Tjian, Largest subunit of Drosophila
transcription factor IID directs assembly of a complex containing TBP and a
coactivator, Nature, 362 (1993) 511-517.
[127] K.J. Wright, M.T. Marr, 2nd, R. Tjian, TAF4 nucleates a core subcomplex of
TFIID and mediates activated transcription from a TATA-less promoter, Proc Natl
Acad Sci U S A, 103 (2006) 12347-12352.
[128] C. Bieniossek, G. Papai, C. Schaffitzel, F. Garzoni, M. Chaillet, E. Scheer, P.
Papadopoulos, L. Tora, P. Schultz, I. Berger, The architecture of human general
transcription factor TFIID core complex, Nature, 493 (2013) 699-702.
[129] M.A. Demeny, E. Soutoglou, Z. Nagy, E. Scheer, A. Janoshazi, M. Richardot,
M. Argentini, P. Kessler, L. Tora, Identification of a small TAF complex and its role in
the assembly of TAF-containing complexes, PloS one, 2 (2007) e316.
[130] J. Bonnet, C.Y. Wang, T. Baptista, S.D. Vincent, W.C. Hsiao, M. Stierle, C.F.
Kao, L. Tora, D. Devys, The SAGA coactivator complex acts on the whole
transcribed genome and is required for RNA polymerase II transcription, Genes Dev,
28 (2014) 1999-2012.
[131] D.J. Mitsiou, H.G. Stunnenberg, TAC, a TBP-sans-TAFs complex containing
the unprocessed TFIIAalphabeta precursor and the TFIIAgamma subunit, Molecular
cell, 6 (2000) 527-537.
[132] T. Raha, S.W. Cheng, M.R. Green, HIV-1 Tat stimulates transcription complex
assembly through recruitment of TBP in the absence of TAFs, PLoS biology, 3
(2005) e44.
[133] B. Guglielmi, N. La Rochelle, R. Tjian, Gene-specific transcriptional
mechanisms at the histone gene cluster revealed by single-cell imaging, Molecular
cell, 51 (2013) 480-492.
[134] J. Zaborowska, A. Taylor, S. Murphy, A novel TBP-TAF complex on RNA
polymerase II-transcribed snRNA genes, Transcription, 3 (2012) 92-104.
[135] F.J. van Werven, H. van Bakel, H.A. van Teeffelen, A.F. Altelaar, M.G.
Koerkamp, A.J. Heck, F.C. Holstege, H.T. Timmers, Cooperative action of NC2 and
Mot1p to regulate TATA-binding protein function across the genome, Genes Dev, 22
(2008) 2359-2369.
[136] W. Deng, B. Malecova, T. Oelgeschlager, S.G. Roberts, TFIIB recognition
elements control the TFIIA-NC2 axis in transcriptional regulation, Mol Cell Biol, 29
(2009) 1389-1400.
[137] M. Xu, P. Sharma, S. Pan, S. Malik, R.G. Roeder, E. Martinez, Core promoter-
selective function of HMGA1 and Mediator in Initiator-dependent transcription, Genes
Dev, 25 (2011) 2513-2524.
[138] M.A. Cianfrocco, G.A. Kassavetis, P. Grob, J. Fang, T. Juven-Gershon, J.T.
Kadonaga, E. Nogales, Human TFIID binds to core promoter DNA in a reorganized
structural state, Cell, 152 (2013) 120-131.
[139] M.A. Cianfrocco, E. Nogales, Regulatory interplay between TFIID's
conformational transitions and its modular interaction with core promoter DNA,
Transcription, 4 (2013) 120-126.
[140] W. Akhtar, G.J. Veenstra, TBP-related factors: a paradigm of diversity in
transcription initiation, Cell & bioscience, 1 (2011) 23.
[141] F. Muller, A. Zaucker, L. Tora, Developmental regulation of transcription
initiation: more than just changing the actors, Current opinion in genetics &
development, 20 (2010) 533-540.
[142] J.H. Reina, N. Hernandez, On a roll for new TRF targets, Genes Dev, 21 (2007)
2855-2860.
[143] S.H. Duttke, R.F. Doolittle, Y.L. Wang, J.T. Kadonaga, TRF2 and the evolution
of the bilateria, Genes Dev, 28 (2014) 2071-2076.
[144] P.A. Moore, J. Ozer, M. Salunek, G. Jan, D. Zerby, S. Campbell, P.M.
Lieberman, A human TATA binding protein-related protein with altered DNA binding
specificity inhibits transcription from multiple promoters and activators, Mol Cell Biol,
19 (1999) 7610-7620.
[145] M.D. Rabenstein, S. Zhou, J.T. Lis, R. Tjian, TATA box-binding protein (TBP)-
related factor 2 (TRF2), a third member of the TBP family, Proc Natl Acad Sci U S A,
96 (1999) 4791-4796.
[146] Y. Isogai, S. Keles, M. Prestel, A. Hochheimer, R. Tjian, Transcription of
histone gene cluster by differential core-promoter factors, Genes Dev, 21 (2007)
2936-2949.
[147] Y.L. Wang, S.H. Duttke, K. Chen, J. Johnston, G.A. Kassavetis, J. Zeitlinger,
J.T. Kadonaga, TRF2, but not TBP, mediates the transcription of ribosomal protein
genes, Genes Dev, 28 (2014) 1550-1555.
[148] A. Kedmi, Y. Zehavi, Y. Glick, Y. Orenstein, D. Ideses, C. Wachtel, T. Doniger,
H. Waldman Ben-Asher, N. Muster, J. Thompson, S. Anderson, D. Avrahami, J.R.
Yates, 3rd, R. Shamir, D. Gerber, T. Juven-Gershon, Drosophila TRF2 is a
preferential core promoter regulator, Genes Dev, 28 (2014) 2163-2174.
[149] S.H. Duttke, Evolution and diversification of the basal transcription machinery,
Trends in biochemical sciences, (2015).
[150] J.A. Goodrich, R. Tjian, Unexpected roles for core promoter recognition factors
in cell-type-specific transcription and gene regulation, Nat Rev Genet, 11 (2010) 549-
558.
[151] D.A. Wassarman, N. Aoyagi, L.A. Pile, E.M. Schlag, TAF250 is required for
multiple developmental events in Drosophila, Proc Natl Acad Sci U S A, 97 (2000)
1154-1159.
[152] N. Aoyagi, D.A. Wassarman, Developmental and transcriptional consequences
of mutations in Drosophila TAF(II)60, Mol Cell Biol, 21 (2001) 6808-6819.
[153] J. Zhou, J. Zwicker, P. Szymanski, M. Levine, R. Tjian, TAFII mutations disrupt
Dorsal activation in the Drosophila embryo, Proc Natl Acad Sci U S A, 95 (1998)
13483-13488.
[154] M. Guermah, K. Ge, C.M. Chiang, R.G. Roeder, The TBN protein, which is
essential for early embryonic mouse development, is an inducible TAFII implicated in
adipogenesis, Molecular cell, 12 (2003) 991-1001.
[155] S. Georgieva, D.B. Kirschner, T. Jagla, E. Nabirochkina, S. Hanke, H.
Schenkel, C. de Lorenzo, P. Sinha, K. Jagla, B. Mechler, L. Tora, Two novel
Drosophila TAF(II)s have homology with human TAF(II)30 and are differentially
regulated during development, Mol Cell Biol, 20 (2000) 1639-1648.
[156] W.S. Mohan, Jr., E. Scheer, O. Wendling, D. Metzger, L. Tora, TAF10
(TAF(II)30) is necessary for TFIID stability and early embryogenesis in mice, Mol Cell
Biol, 23 (2003) 4307-4318.
[157] A. Tatarakis, T. Margaritis, C.P. Martinez-Jimenez, A. Kouskouti, W.S. Mohan,
2nd, A. Haroniti, D. Kafetzopoulos, L. Tora, I. Talianidis, Dominant and redundant
functions of TFIID involved in the regulation of hepatic genes, Molecular cell, 31
(2008) 531-543.
[158] W.W. Pijnappel, D. Esch, M.P. Baltissen, G. Wu, N. Mischerikow, A.J.
Bergsma, E. van der Wal, D.W. Han, H. Bruch, S. Moritz, P. Lijnzaad, A.F. Altelaar,
K. Sameith, H. Zaehres, A.J. Heck, F.C. Holstege, H.R. Scholer, H.T. Timmers, A
central role for TFIID in the pluripotent transcription circuitry, Nature, 495 (2013) 516-
519.
[159] G.A. Maston, L.J. Zhu, L. Chamberlain, L. Lin, M. Fang, M.R. Green, Non-
canonical TAF complexes regulate active promoters in human embryonic stem cells,
eLife, 1 (2012) e00068.
[160] P.J. Wang, D.C. Page, Functional substitution for TAF(II)250 by a retroposed
homolog that is expressed in human spermatogenesis, Human molecular genetics,
11 (2002) 2341-2346.
[161] J.C. Pointud, G. Mengus, S. Brancorsini, L. Monaco, M. Parvinen, P. Sassone-
Corsi, I. Davidson, The intracellular localisation of TAF7L, a paralogue of
transcription factor TFIID subunit TAF7, is developmentally regulated during male
germ-cell differentiation, Journal of cell science, 116 (2003) 1847-1858.
[162] Y. Cheng, M.G. Buffone, M. Kouadio, M. Goodheart, D.C. Page, G.L. Gerton, I.
Davidson, P.J. Wang, Abnormal sperm in mice lacking the Taf7l gene, Mol Cell Biol,
27 (2007) 2582-2589.
[163] H. Zhou, I. Grubisic, K. Zheng, Y. He, P.J. Wang, T. Kaplan, R. Tjian, Taf7l
cooperates with Trf2 to regulate spermiogenesis, Proc Natl Acad Sci U S A, 110
(2013) 16886-16891.
[164] H. Zhou, T. Kaplan, Y. Li, I. Grubisic, Z. Zhang, P.J. Wang, M.B. Eisen, R.
Tjian, Dual functions of TAF7L in adipocyte differentiation, eLife, 2 (2013) e00170.
[165] H. Zhou, B. Wan, I. Grubisic, T. Kaplan, R. Tjian, TAF7L modulates brown
adipose tissue formation, eLife, 3 (2014).
[166] R. Dikstein, S. Zhou, R. Tjian, Human TAFII 105 is a cell type-specific TFIID
subunit related to hTAFII130, Cell, 87 (1996) 137-146.
[167] A.E. Falender, R.N. Freiman, K.G. Geles, K.C. Lo, K. Hwang, D.J. Lamb, P.L.
Morris, R. Tjian, J.S. Richards, Maintenance of spermatogenesis requires TAF4b, a
gonad-specific subunit of TFIID, Genes Dev, 19 (2005) 794-803.
[168] A.E. Falender, M. Shimada, Y.K. Lo, J.S. Richards, TAF4b, a TBP associated
factor, is required for oocyte development and function, Dev Biol, 288 (2005) 405-
419.
[169] R.N. Freiman, S.R. Albright, S. Zheng, W.C. Sha, R.E. Hammer, R. Tjian,
Requirement of tissue-selective TBP-associated factor TAFII105 in ovarian
development, Science, 293 (2001) 2084-2087.
[170] K.J. Grive, K.A. Seymour, R. Mehta, R.N. Freiman, TAF4b promotes mouse
primordial follicle assembly and oocyte survival, Dev Biol, 392 (2014) 42-51.
[171] F.J. Herrera, T. Yamaguchi, H. Roelink, R. Tjian, Core promoter factor TAF9B
regulates neuronal gene expression, eLife, 3 (2014) e02559.
[172] M. Hiller, X. Chen, M.J. Pringle, M. Suchorolski, Y. Sancak, S. Viswanathan, B.
Bolival, T.Y. Lin, S. Marino, M.T. Fuller, Testis-specific TAF homologs collaborate to
control a tissue-specific transcription program, Development, 131 (2004) 5297-5308.
[173] U. Ohler, D.A. Wassarman, Promoting developmental transcription,
Development, 137 (2010) 15-26.
[174] I. Martianov, G.M. Fimia, A. Dierich, M. Parvinen, P. Sassone-Corsi, I.
Davidson, Late arrest of spermiogenesis and germ cell apoptosis in mice lacking the
TBP-like TLF/TRF2 gene, Molecular cell, 7 (2001) 509-515.
[175] D. Zhang, T.L. Penttila, P.L. Morris, M. Teichmann, R.G. Roeder,
Spermiogenesis deficiency in mice lacking the Trf2 gene, Science, 292 (2001) 1153-
1155.
[176] T. Oyama, S. Sasagawa, S. Takeda, R.A. Hess, P.M. Lieberman, E.H. Cheng,
J.J. Hsieh, Cleavage of TFIIA by Taspase1 activates TRF2-specified mammalian
male germ cell programs, Developmental cell, 27 (2013) 188-200.
[177] A. Bashirullah, G. Lam, V.P. Yin, C.S. Thummel, dTrf2 is required for
transcriptional and developmental responses to ecdysone during Drosophila
metamorphosis, Developmental dynamics : an official publication of the American
Association of Anatomists, 236 (2007) 3173-3179.
[178] D.O. Hart, T. Raha, N.D. Lawson, M.R. Green, Initiation of zebrafish
haematopoiesis by the TATA-box-binding protein-related factor Trf3, Nature, 450
(2007) 1082-1085.
[179] D.O. Hart, M.K. Santra, T. Raha, M.R. Green, Selective interaction between
Trf3 and Taf3 required for early development and hematopoiesis, Developmental
dynamics : an official publication of the American Association of Anatomists, 238
(2009) 2540-2549.
[180] R. Bartfai, C. Balduf, T. Hilton, Y. Rathmann, Y. Hadzhiev, L. Tora, L. Orban, F.
Muller, TBP2, a vertebrate-specific member of the TBP family, is required in
embryonic development of zebrafish, Current biology : CB, 14 (2004) 593-598.
[181] Z. Jallow, U.G. Jacobi, D.L. Weeks, I.B. Dawid, G.J. Veenstra, Specialized and
redundant roles of TBP and a vertebrate-specific TBP paralog in embryonic gene
regulation in Xenopus, Proc Natl Acad Sci U S A, 101 (2004) 13525-13530.
[182] E. Gazdag, A. Santenard, C. Ziegler-Birling, G. Altobelli, O. Poch, L. Tora, M.E.
Torres-Padilla, TBP2 is essential for germ cell development by regulating
transcription and chromatin condensation in the oocyte, Genes Dev, 23 (2009) 2210-
2223.
[183] M. Bulger, M. Groudine, Functional and mechanistic diversity of distal
transcription enhancers, Cell, 144 (2011) 327-339.
[184] M. Levine, Transcriptional enhancers in animal development and evolution,
Current biology : CB, 20 (2010) R754-763.
[185] M. Levine, C. Cattoglio, R. Tjian, Looping back to leap forward: transcription
enters a new era, Cell, 157 (2014) 13-25.
[186] J. Marsman, J.A. Horsfield, Long distance relationships: enhancer-promoter
communication and dynamic gene transcription, Biochim Biophys Acta, 1819 (2012)
1217-1227.
[187] C.T. Ong, V.G. Corces, Enhancer function: new insights into the regulation of
tissue-specific gene expression, Nat Rev Genet, 12 (2011) 283-293.
[188] D. Shlyueva, G. Stampfel, A. Stark, Transcriptional enhancers: from properties
to genome-wide predictions, Nat Rev Genet, 15 (2014) 272-286.
[189] F. Spitz, E.E. Furlong, Transcription factors: from enhancer binding to
developmental control, Nat Rev Genet, 13 (2012) 613-626.
[190] J. van Arensbergen, B. van Steensel, H.J. Bussemaker, In search of the
determinants of enhancer-promoter interaction specificity, Trends in cell biology,
(2014).
[191] X. Li, M. Noll, Compatibility between enhancers and promoters determines the
transcriptional specificity of gooseberry and gooseberry neuro in the Drosophila
embryo, EMBO J, 13 (1994) 400-406.
[192] C. Merli, D.E. Bergstrom, J.A. Cygan, R.K. Blackman, Promoter specificity
mediates the independent regulation of neighboring genes, Genes Dev, 10 (1996)
1260-1270.
[193] B. Tolhuis, R.J. Palstra, E. Splinter, F. Grosveld, W. de Laat, Looping and
interaction between hypersensitive sites in the active beta-globin locus, Molecular
cell, 10 (2002) 1453-1465.
[194] J. Gehrig, M. Reischl, E. Kalmar, M. Ferg, Y. Hadzhiev, A. Zaucker, C. Song, S.
Schindler, U. Liebel, F. Muller, Automated high-throughput mapping of promoter-
enhancer interactions in zebrafish embryos, Nat Methods, 6 (2009) 911-916.
[195] V.C. Calhoun, A. Stathopoulos, M. Levine, Promoter-proximal tethering
elements regulate enhancer-promoter specificity in the Drosophila Antennapedia
complex, Proc Natl Acad Sci U S A, 99 (2002) 9243-9247.
[196] O.S. Akbari, E. Bae, H. Johnsen, A. Villaluz, D. Wong, R.A. Drewell, A novel
promoter-tethering element regulates enhancer-driven gene expression at the
bithorax complex in the Drosophila embryo, Development, 135 (2008) 123-131.
[197] S. Ohtsuki, M. Levine, H.N. Cai, Different core promoters possess distinct
regulatory activities in the Drosophila embryo, Genes Dev, 12 (1998) 547-556.
[198] J.E. Butler, J.T. Kadonaga, Enhancer-promoter specificity mediated by DPE or
TATA core promoter motifs, Genes Dev, 15 (2001) 2515-2519.
[199] F. Jin, Y. Li, J.R. Dixon, S. Selvaraj, Z. Ye, A.Y. Lee, C.A. Yen, A.D. Schmitt,
C.A. Espinoza, B. Ren, A high-resolution map of the three-dimensional chromatin
interactome in human cells, Nature, 503 (2013) 290-294.
[200] A. Sanyal, B.R. Lajoie, G. Jain, J. Dekker, The long-range interaction
landscape of gene promoters, Nature, 489 (2012) 109-113.
[201] Y. Zhang, C.H. Wong, R.Y. Birnbaum, G. Li, R. Favaro, C.Y. Ngan, J. Lim, E.
Tai, H.M. Poh, E. Wong, F.H. Mulawadi, W.K. Sung, S. Nicolis, N. Ahituv, Y. Ruan,
C.L. Wei, Chromatin connectivity maps reveal dynamic promoter-enhancer long-
range associations, Nature, 504 (2013) 306-310.
[202] C.D. Arnold, D. Gerlach, C. Stelzer, L.M. Boryn, M. Rath, A. Stark, Genome-
wide quantitative enhancer activity maps identified by STARR-seq, Science, 339
(2013) 1074-1077.
[203] Y. Ghavi-Helm, F.A. Klein, T. Pakozdi, L. Ciglar, D. Noordermeer, W. Huber,
E.E. Furlong, Enhancer loops appear stable during development and are associated
with paused polymerase, Nature, 512 (2014) 96-100.
[204] M.A. Zabidi, C.D. Arnold, K. Schernhuber, M. Pagani, M. Rath, O. Frank, A.
Stark, Enhancer--core-promoter specificity separates developmental and
housekeeping gene regulation, Nature, (2014).
[205] N.J. Fuda, M.B. Ardehali, J.T. Lis, Defining mechanisms that regulate RNA
polymerase II transcription in vivo, Nature, 461 (2009) 186-192.
[206] S. Nechaev, K. Adelman, Pol II waiting in the starting gates: Regulating the
transition from transcription initiation into productive elongation, Biochim Biophys
Acta, 1809 (2011) 34-45.
[207] D.L. Bentley, Coupling mRNA processing with transcription in time and space,
Nat Rev Genet, 15 (2014) 163-175.
[208] K. Adelman, J.T. Lis, Promoter-proximal pausing of RNA polymerase II:
emerging roles in metazoans, Nat Rev Genet, 13 (2012) 720-731.
[209] D.A. Gilchrist, K. Adelman, Coupling polymerase pausing and chromatin
landscapes for precise regulation of transcription, Biochim Biophys Acta, 1819 (2012)
700-706.
[210] Y. Yamaguchi, H. Shibata, H. Handa, Transcription elongation factors DSIF and
NELF: promoter-proximal pausing and beyond, Biochim Biophys Acta, 1829 (2013)
98-104.
[211] D.S. Gilmour, J.T. Lis, RNA polymerase II interacts with the promoter region of
the noninduced hsp70 gene in Drosophila melanogaster cells, Mol Cell Biol, 6 (1986)
3984-3989.
[212] E.B. Rasmussen, J.T. Lis, In vivo transcriptional pausing and cap formation on
three Drosophila heat shock genes, Proc Natl Acad Sci U S A, 90 (1993) 7923-7927.
[213] D.L. Bentley, M. Groudine, A block to elongation is largely responsible for
decreased transcription of c-myc in differentiated HL60 cells, Nature, 321 (1986) 702-
706.
[214] A. Krumm, T. Meulia, M. Brunvand, M. Groudine, The block to transcriptional
elongation within the human c-myc gene is determined in the promoter-proximal
region, Genes Dev, 6 (1992) 2201-2213.
[215] C.S. Maxwell, W.S. Kruesi, L.J. Core, N. Kurhanewicz, C.T. Waters, C.L.
Lewarch, I. Antoshechkin, J.T. Lis, B.J. Meyer, L.R. Baugh, Pol II docking and
pausing at growth and stress genes in C. elegans, Cell reports, 6 (2014) 455-466.
[216] G.W. Muse, D.A. Gilchrist, S. Nechaev, R. Shah, J.S. Parker, S.F. Grissom, J.
Zeitlinger, K. Adelman, RNA polymerase is poised for activation across the genome,
Nat Genet, 39 (2007) 1507-1511.
[217] J. Zeitlinger, A. Stark, M. Kellis, J.W. Hong, S. Nechaev, K. Adelman, M.
Levine, R.A. Young, RNA polymerase stalling at developmental control genes in the
Drosophila melanogaster embryo, Nat Genet, 39 (2007) 1512-1516.
[218] C. Lee, X. Li, A. Hechmer, M. Eisen, M.D. Biggin, B.J. Venters, C. Jiang, J. Li,
B.F. Pugh, D.S. Gilmour, NELF and GAGA factor are linked to promoter-proximal
pausing at many genes in Drosophila, Mol Cell Biol, 28 (2008) 3290-3300.
[219] S. Nechaev, D.C. Fargo, G. dos Santos, L. Liu, Y. Gao, K. Adelman, Global
analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of
Pol II in Drosophila, Science, 327 (2010) 335-338.
[220] M. Quinodoz, C. Gobet, F. Naef, K.B. Gustafson, Characteristic bimodal
profiles of RNA polymerase II at thousands of active mammalian promoters, Genome
Biol, 15 (2014) R85.
[221] B. Gaertner, J. Zeitlinger, RNA polymerase II pausing during development,
Development, 141 (2014) 1179-1183.
[222] C. Nepal, Y. Hadzhiev, C. Previti, V. Haberle, N. Li, H. Takahashi, A.M. Suzuki,
Y. Sheng, R.F. Abdelhamid, S. Anand, J. Gehrig, A. Akalin, C.E. Kockx, A.A. van der
Sloot, W.F. van Ijcken, O. Armant, S. Rastegar, C. Watson, U. Strahle, E. Stupka, P.
Carninci, B. Lenhard, F. Muller, Dynamic regulation of the transcription initiation
landscape at single nucleotide resolution during vertebrate embryogenesis, Genome
Res, 23 (2013) 1938-1950.
[223] V. Haberle, N. Li, Y. Hadzhiev, C. Plessy, C. Previti, C. Nepal, J. Gehrig, X.
Dong, A. Akalin, A.M. Suzuki, W.F. van Ijcken, O. Armant, M. Ferg, U. Strahle, P. Carninci,
F. Muller, B. Lenhard, Two independent transcription initiation codes overlap on
vertebrate core promoters, Nature, 507 (2014) 381-385.
[224] T. Henriques, D.A. Gilchrist, S. Nechaev, M. Bern, G.W. Muse, A. Burkholder,
D.C. Fargo, K. Adelman, Stable pausing by RNA polymerase II provides an
opportunity to target and integrate regulatory signals, Molecular Cell, 52 (2013) 517-528.
[225] J. Li, Y. Liu, H.S. Rhee, S.K. Ghosh, L. Bai, B.F. Pugh, D.S. Gilmour, Kinetic
competition between elongation rate and binding of NELF controls promoter-proximal
pausing, Molecular Cell, 50 (2013) 711-722.
[226] B. Gaertner, J. Johnston, K. Chen, N. Wallaschek, A. Paulson, A.S. Garruss, K.
Gaudenz, B. De Kumar, R. Krumlauf, J. Zeitlinger, Poised RNA polymerase II
changes over developmental time and prepares genes for future expression, Cell
Reports, 2 (2012) 1670-1683.
[227] M. Lagha, J.P. Bothma, E. Esposito, S. Ng, L. Stefanik, C. Tsui, J. Johnston, K.
Chen, D.S. Gilmour, J. Zeitlinger, M.S. Levine, Paused Pol II coordinates tissue
morphogenesis in the Drosophila embryo, Cell, 153 (2013) 976-987.
[228] A. Saunders, H.L. Ashe, Taking a pause to reflect on morphogenesis, Cell, 153
(2013) 941-943.
[229] A. Saunders, L.J. Core, C. Sutcliffe, J.T. Lis, H.L. Ashe, Extensive polymerase
pausing during Drosophila axis patterning enables high-level and pliable
transcription, Genes Dev, 27 (2013) 1146-1158.
[230] K. Chen, J. Johnston, W. Shao, S. Meier, C. Staber, J. Zeitlinger, A global
change in RNA polymerase II pausing during the Drosophila midblastula transition,
eLife, 2 (2013) e00861.
[231] L. Amir-Zilberstein, E. Ainbinder, L. Toube, Y. Yamaguchi, H. Handa, R.
Dikstein, Differential regulation of NF-kappaB by elongation factors is determined by
core promoter type, Mol Cell Biol, 27 (2007) 5246-5259.
[232] D.A. Hendrix, J.W. Hong, J. Zeitlinger, D.S. Rokhsar, M.S. Levine, Promoter
elements associated with RNA Pol II stalling in the Drosophila embryo, Proc Natl
Acad Sci U S A, 105 (2008) 7762-7767.
[233] H. Kwak, N.J. Fuda, L.J. Core, J.T. Lis, Precise maps of RNA polymerase
reveal how promoters direct initiation and pausing, Science, 339 (2013) 950-953.
[234] N.J. Proudfoot, Ending the message: poly(A) signals then and now, Genes
Dev, 25 (2011) 1770-1782.
[235] P.K. Andersen, T.H. Jensen, S. Lykke-Andersen, Making ends meet:
coordination between RNA 3'-end processing and transcription initiation, Wiley
Interdisciplinary Reviews: RNA, 4 (2013) 233-246.
[236] D.C. Di Giammartino, J.L. Manley, New links between mRNA polyadenylation
and diverse nuclear pathways, Molecules and Cells, 37 (2014) 644-649.
[237] O. Calvo, J.L. Manley, Strange bedfellows: polyadenylation factors at the
promoter, Genes Dev, 17 (2003) 1321-1327.
[238] K. Xiang, L. Tong, J.L. Manley, Delineating the structural blueprint of the pre-
mRNA 3'-end processing machinery, Mol Cell Biol, 34 (2014) 1894-1910.
[239] J.C. Dantonel, K.G. Murthy, J.L. Manley, L. Tora, Transcription factor TFIID
recruits factor CPSF for formation of 3' end of mRNA, Nature, 389 (1997) 399-402.
[240] Y. Wang, J.A. Fairley, S.G. Roberts, Phosphorylation of TFIIB links
transcription initiation and termination, Current Biology, 20 (2010) 548-553.
[241] C.K. Mapendano, S. Lykke-Andersen, J. Kjems, E. Bertrand, T.H. Jensen,
Crosstalk between mRNA 3' end processing and transcription initiation, Molecular
Cell, 40 (2010) 410-422.
[242] S. Lykke-Andersen, C.K. Mapendano, T.H. Jensen, An ending is a new
beginning: transcription termination supports re-initiation, Cell Cycle, 10 (2011) 863-865.
[243] K. Oktaba, W. Zhang, T.S. Lotz, D.J. Jun, S.B. Lemke, S.P. Ng, E. Esposito, M.
Levine, V. Hilgers, ELAV Links Paused Pol II to Alternative Polyadenylation in the
Drosophila Nervous System, Molecular Cell, 57 (2015) 341-348.
[244] T. Gonatopoulos-Pournatzis, V.H. Cowling, Cap-binding complex (CBC),
Biochem J, 457 (2014) 231-242.
[245] R.J. Jackson, C.U. Hellen, T.V. Pestova, The mechanism of eukaryotic
translation initiation and principles of its regulation, Nature Reviews Molecular Cell
Biology, 11 (2010) 113-127.
[246] R. Dikstein, Transcription and translation in a package deal: the TISU
paradigm, Gene, 491 (2012) 1-4.
[247] M. Kozak, Initiation of translation in prokaryotes and eukaryotes, Gene, 234
(1999) 187-208.
[248] R. Elfakess, R. Dikstein, A translation initiation element specific to mRNAs with
very short 5'UTR that also regulates transcription, PLoS One, 3 (2008) e3094.
[249] R. Elfakess, H. Sinvani, O. Haimov, Y. Svitkin, N. Sonenberg, R. Dikstein,
Unique translation initiation of mRNAs-containing TISU element, Nucleic Acids Res,
39 (2011) 7598-7609.
[250] D. Avni, S. Shama, F. Loreni, O. Meyuhas, Vertebrate mRNAs with a 5'-
terminal pyrimidine tract are candidates for translational repression in quiescent cells:
characterization of the translational cis-regulatory element, Mol Cell Biol, 14 (1994)
3822-3833.
[251] O. Meyuhas, Synthesis of the translational apparatus is regulated at the
translational level, European Journal of Biochemistry, 267 (2000) 6321-6330.
[252] M. Schafer, R. Kuhn, F. Bosse, U. Schafer, A conserved element in the leader
mediates post-meiotic translation as well as cytoplasmic polyadenylation of a
Drosophila spermatocyte mRNA, EMBO J, 9 (1990) 4519-4525.
[253] E. Kempe, B. Muhs, M. Schafer, Gene regulation in Drosophila
spermatogenesis: analysis of protein binding at the translational control element
TCE, Dev Genet, 14 (1993) 449-459.
[254] R.J. Katzenberger, E.A. Rach, A.K. Anderson, U. Ohler, D.A. Wassarman, The
Drosophila Translational Control Element (TCE) is required for high-level
transcription of many genes that are specifically expressed in testes, PLoS One, 7
(2012) e45009.
[255] S. Moshonov, R. Elfakess, M. Golan-Mashiach, H. Sinvani, R. Dikstein, Links
between core promoter and basic gene features influence gene expression, BMC
Genomics, 9 (2008) 92.
[256] A. Tamarkin-Ben-Harush, E. Schechtman, R. Dikstein, Co-occurrence of
transcription and translation gene regulatory features underlies coordinated mRNA
and protein synthesis, BMC Genomics, 15 (2014) 688.
[257] L. Savinkova, I. Drachkova, T. Arshinova, P. Ponomarenko, M. Ponomarenko,
N. Kolchanov, An experimental verification of the predicted effects of promoter TATA-
box polymorphisms associated with human diseases on interactions between the
TATA boxes and TATA-binding protein, PLoS One, 8 (2013) e54626.
[258] L.K. Savinkova, M.P. Ponomarenko, P.M. Ponomarenko, I.A. Drachkova, M.V.
Lysova, T.V. Arshinova, N.A. Kolchanov, TATA box polymorphisms in human gene
promoters and associated hereditary pathologies, Biochemistry (Moscow), 74
(2009) 117-129.
[259] G. Li, X. Ruan, R.K. Auerbach, K.S. Sandhu, M. Zheng, P. Wang, H.M. Poh, Y.
Goh, J. Lim, J. Zhang, H.S. Sim, S.Q. Peh, F.H. Mulawadi, C.T. Ong, Y.L. Orlov, S.
Hong, Z. Zhang, S. Landt, D. Raha, G. Euskirchen, C.L. Wei, W. Ge, H. Wang, C.
Davis, K.I. Fisher-Aylor, A. Mortazavi, M. Gerstein, T. Gingeras, B. Wold, Y. Sun,
M.J. Fullwood, E. Cheung, E. Liu, W.K. Sung, M. Snyder, Y. Ruan, Extensive
promoter-centered chromatin interactions provide a topological basis for transcription
regulation, Cell, 148 (2012) 84-98.
[260] T.K. Kim, M. Hemberg, J.M. Gray, A.M. Costa, D.M. Bear, J. Wu, D.A. Harmin,
M. Laptewicz, K. Barbara-Haley, S. Kuersten, E. Markenscoff-Papadimitriou, D. Kuhl,
H. Bito, P.F. Worley, G. Kreiman, M.E. Greenberg, Widespread transcription at
neuronal activity-regulated enhancers, Nature, 465 (2010) 182-187.
[261] R. Andersson, C. Gebhard, I. Miguel-Escalada, I. Hoof, J. Bornholdt, M. Boyd,
Y. Chen, X. Zhao, C. Schmidl, T. Suzuki, E. Ntini, E. Arner, E. Valen, K. Li, L.
Schwarzfischer, D. Glatz, J. Raithel, B. Lilje, N. Rapin, F.O. Bagger, M. Jorgensen,
P.R. Andersen, N. Bertin, O. Rackham, A.M. Burroughs, J.K. Baillie, Y. Ishizu, Y.
Shimizu, E. Furuhata, S. Maeda, Y. Negishi, C.J. Mungall, T.F. Meehan, T.
Lassmann, M. Itoh, H. Kawaji, N. Kondo, J. Kawai, A. Lennartsson, C.O. Daub, P.
Heutink, D.A. Hume, T.H. Jensen, H. Suzuki, Y. Hayashizaki, F. Muller, FANTOM
Consortium, A.R. Forrest, P. Carninci, M. Rehli, A. Sandelin, An atlas of active
enhancers across human cell types and tissues, Nature, 507 (2014) 455-461.
[262] S. Weingarten-Gabbay, E. Segal, A shared architecture for promoters and
enhancers, Nat Genet, 46 (2014) 1253-1254.
[263] R. Andersson, Promoter or enhancer, what's the difference? Deconstruction of
established distinctions and presentation of a unifying model, BioEssays, (2014).
Figure legends
Fig. 1. General features of the core promoter region. A. The three main
core promoter types based on the distribution of TSSs, including focused,
dispersed and mixed promoters. Small arrows represent weak TSSs,
whereas a large arrow represents a single strong TSS. B. Chromatin features
of active core promoters include distinct histone post-translational modifications
and nucleosome depletion. Associated histone marks are depicted:
H3K4me2/me3 (orange), H3K4ac (gray), H3K27ac (light blue). A DNase I
hypersensitive site (DHS)/nucleosome-depleted region (NDR) pattern, ranging
from nucleosome-free (light) to nucleosome-occupied regions (dark), is
illustrated below. C. Schematic illustration of the most common core
promoter elements found in focused promoters. The diagram is roughly to
scale. D. Schematic illustration of the known factors and sequence motifs that
are associated with dispersed promoters.
Fig. 2. The core promoter can be studied from different angles in
multiple resolutions. A. Zooming in on global genomic interactions in the
nucleus, one can study long-range interactions, such as those between
enhancers and promoters, by analyzing chromatin looping, cohesin function,
interactions of transcription factors (TFs) with co-activators and cis-regulatory
modules, and interactions of preinitiation complex (PIC) components with
their target promoters. B. Zooming in on the basal transcription machinery,
one can study the assembly and composition of the PIC at different Pol II
promoters and the 3D structure of different PIC components. C. Zooming
in on the DNA-binding PIC components (TFIIB and TFIID), one can focus on
the alternative protein components at different Pol II promoters, on the core
promoter composition of specialized transcription programs, and on the
interactions of different PIC components with specific core promoter elements.
Fig. 3. Schematic model depicting the pivotal role of the core promoter
module in diverse molecular events and stages of gene expression. The
core promoter is important for (clockwise): basal transcription initiation and
PIC-core promoter compatibility, and thus for PIC formation; enhancer-
promoter compatibility (which is schematically represented by the preferential
activation of DPE-dependent promoters by Caudal); promoter-proximal Pol II
pausing; termination/polyadenylation and Pol II recycling; and translation, via
core promoter elements that play a role in both transcription and translation.
Please see the main text for detailed explanations.
Figure 1:
Figure 2:
Figure 3:
Bar-Ilan University
From Promoters of Drosophila HOX Genes to the Human Genome:
Identification, Characterization and Potential Function of the Human DPE Element
Yehuda-Matan Danino
This work is submitted in partial fulfillment of the requirements for the Master's
degree in the Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University
Ramat Gan, Israel, 5775 (2015)
This work was carried out under the supervision of Dr. Tamar Juven-Gershon, from
the Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University.
Abstract
Proper gene expression is essential for the existence and function of every cell
in the whole organism. Gene expression is a multistep process regulated by many
factors. At its basis lies transcription, in which the enzyme RNA polymerase II
(Pol II) uses a DNA template to transcribe an mRNA molecule that is subsequently
translated into protein. One of the first and most critical control points in the
expression of protein-coding genes is the regulation of accurate transcription
initiation, which occurs at the promoter. For Pol II to transcribe RNA molecules,
it is recruited to the "core promoter" by the basal transcription machinery,
forming the preinitiation complex (PIC) through protein-protein and protein-DNA
interactions.
The core promoter is the minimal DNA segment that directs accurate transcription
initiation. This segment contains the transcription start site (TSS; defined as +1)
and is typically about 80 bp long (from -40 to +40 relative to the TSS). The core
promoter also contains short functional DNA sequences, termed core promoter
elements (or motifs), which serve as docking sites for PIC components. One of
these elements is the downstream core promoter element (DPE). Among other
things, the DPE motif was found to play an important role in the expression of
genes involved in embryonic development, such as the Hox genes, which determine
segment identity in the developing embryo, and the caudal gene, which regulates
them. The DPE was discovered in the fruit fly Drosophila melanogaster and was
also found to be functional in two human genes. Although this element was
identified about 20 years ago, no additional human genes regulated by it have
been identified to date.
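The positional conventions in the paragraph above (TSS at +1, no position 0, core promoter spanning roughly -40 to +40) can be illustrated with a short Python sketch. It assumes the Drosophila DPE consensus RGWYV located at +28 to +32 relative to the TSS, which is not restated in this section. The function names and the synthetic test sequence are hypothetical. A consensus match is only a sequence criterion: as the abstract notes, DPE function must be tested experimentally, for example by mutating the motif in reporter assays.

```python
# Minimal sketch: promoter coordinates and a DPE consensus check.
# Assumptions (not from this section): DPE consensus RGWYV at +28..+32,
# as commonly described for Drosophila. All names are hypothetical.

# IUPAC codes used by the assumed consensus
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "W": "AT", "V": "ACG"}

DPE_CONSENSUS = "RGWYV"       # assumed Drosophila DPE consensus
DPE_START, DPE_END = 28, 32   # assumed position relative to the TSS (+1)


def index_of(position: int, tss_index: int) -> int:
    """Map a promoter coordinate (TSS = +1, no position 0) to a 0-based index."""
    if position == 0:
        raise ValueError("promoter coordinates have no position 0")
    return tss_index + position - 1 if position > 0 else tss_index + position


def window(seq: str, tss_index: int, start: int, end: int) -> str:
    """Return the bases from coordinate `start` to `end`, inclusive."""
    i, j = index_of(start, tss_index), index_of(end, tss_index)
    if i < 0 or j >= len(seq) or i > j:
        raise ValueError("window lies outside the sequence")
    return seq[i:j + 1]


def matches(site: str, consensus: str) -> bool:
    """True if every base in `site` is allowed by the IUPAC code at that position."""
    return len(site) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), consensus))


def has_dpe(seq: str, tss_index: int) -> bool:
    """Exact consensus match at +28..+32; says nothing about function."""
    return matches(window(seq, tss_index, DPE_START, DPE_END), DPE_CONSENSUS)


if __name__ == "__main__":
    # Synthetic 80-nt sequence covering -40..+40 (hypothetical, not a real promoter)
    seq = "C" * 40 + "A" + "T" * 26 + "AGACG" + "T" * 8
    tss = 40                                  # 0-based index of the +1 base
    print(len(window(seq, tss, -40, 40)))     # 80
    print(has_dpe(seq, tss))                  # True
    print(has_dpe(seq.replace("AGACG", "ATACG"), tss))  # False
```

Keeping the coordinate conversion in a single helper avoids the off-by-one error that arises because promoter numbering skips 0. Because of that skip, the -40 to +40 window is exactly 80 nt.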
In this work, we used several experimental and computational approaches in an
attempt to characterize the DPE element in human genes in general, and in human
Hox genes in particular. Most approaches were based on the features of the DPE
element as found in Drosophila. They also relied on evolutionary conservation
between Drosophila and human, especially among the Hox genes, which are highly
conserved in multicellular organisms. Using promoter sequence comparisons and
luciferase reporter assays, we showed that a small number of human Hox gene
promoters contain a functional DPE element that meets the original requirements
as characterized in Drosophila. In these assays, mutation of the sequence matching
the DPE consensus reduced transcription from the tested promoter. Most of the
above analyses, which were based on the criteria of the Drosophila DPE element,
yielded no positive results. It is reasonable to assume that despite considerable
evolutionary conservation, the corresponding elements in human and Drosophila
differ to some extent, as these two organisms are evolutionarily distant. To
identify and characterize a human element homologous to the DPE independently of
the fly DPE sequence, we generated stable cell lines for advanced chromatin
immunoprecipitation experiments. The only assumption, based on previous studies,
is that the human element is bound by TAF6 and TAF9, which are components of the
PIC.
Furthermore, we sequenced the promoter regions of several Hox and Cdx genes from
patients with various blood cancers and examined their activity. Based on these
results, we propose a mechanism for the overexpression of the Hoxb6 gene, which is
characteristic of several blood cancers. We propose that a single nucleotide
polymorphism (SNP; a change of one nucleotide) at the TSS of this gene alters its
expression.
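In summary, this work highlights the importance of understanding core promoter
composition in health and disease, with a focus on the DPE element. In addition,
this work shows that the DPE element is functional in humans. Future work is
expected to characterize an analogous human element, which may differ in its
consensus sequence and position.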