BAR ILAN UNIVERSITY

FROM PROMOTERS OF DROSOPHILA HOX GENES

TO THE HUMAN GENOME:

IDENTIFICATION, CHARACTERIZATION AND

POTENTIAL FUNCTION OF A HUMAN DPE MOTIF

Yehuda M. Danino

Submitted in partial fulfillment of the requirements for the Master's

Degree in the Faculty of Life Sciences, Bar-Ilan University

Ramat Gan, Israel 2015

This work was carried out under the supervision of Dr. Tamar Juven-Gershon,

from the Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan

University

Acknowledgments

A complex research project like this is never performed without multiple

collaborations and many contributions of different people.

I would like to extend my appreciation especially to the following.

First and foremost, I would like to express my sincere gratitude to my

advisor Dr. Tamar Juven-Gershon, who has supported me throughout my

'Psagot' course, my thesis and even before, with her patience, immense

knowledge, valuable advice and excellent guidance, which allowed me

to work in my own way and to voice all my thoughts. I appreciate her

enthusiasm for ideas and science in general, her continuous professional

and personal support, her endless patience, her striving for excellence and

her humanity. Her guidance helped me throughout the research and writing

of this thesis. With her admirable leadership, I was able to make the best of

my abilities. One simply could not wish for a better or friendlier supervisor.

Special thanks go to Dr. Diana Ideses, our lab manager, for her

kindness, encouragement and insightful comments. Her help and friendship

are invaluable to me.

In my daily work I have been blessed with a friendly, cheerful and very

helpful group of fellow lab mates: Yonathan Zehavi, Avital Ovadia-Shochat,

Adi Kedmi, Miri Pinhasov, Anna Sloutskin and the former lab

members Julia Sharabany and Ola Kuznetsova. Moreover, I would like to

thank the B.Sc. students who worked with me in our lab: Sivan Shakartzi, Noy

Elimelech, Gal Nuta and Chen Katz. Their help was very important.

In particular, I would like to thank Hila Shir-Shapira for working

closely with me during this M.Sc. period, through moments of success, gladness

and tears, and for her immediate support. A significant part of my knowledge is due

to her. Sometimes, little words say big things. Furthermore, I would like to

especially thank Dan Even for the many valuable discussions, support and

help. His readiness to help, even late at night, and his friendship will remain

with me forever.

I would also like to thank Prof. Sergey Chesnokov and his wife Lina, Dr.

Julia Starkova, Amitay Drummer, Dr. Julia Zeitlinger and Wanqing Shao for

their collaboration and their part in my thesis. Moreover, I would like to thank

Dr. Tirza Doniger, Dr. Haim Wachtel and Dr. Ofir Hakim for their help and their

wise advice. I wish everyone the best in their future research and work.

I am also grateful to my friends: Einav Cohen, Oriel Davidi, Daniel Rika,

Telem Nahum, Snir Dadon, Elor Lict and Dvir Manjem. Their encouragement

and understanding were a source of support to me.

Nevertheless, none of this would have been possible without the love and

patience of my family. My immediate family has been a constant source of

love, concern, support and strength all these years.

Finally, last but not least, I would like to thank God for every single moment of

my life and of challenge and success during this scientific journey to

understand, if only a little, its majesty and greatness in order to reveal its

honor in the reality.

Table of Contents

Abstract

1. Introduction

1.1. The complexity of the transcription initiation landscape and the core promoter region

1.2. The focused core promoter elements

1.2.1. The TATA box

1.2.2. The Inr

1.2.3. The DPE

1.3. The Homeotic proteins: evolutionary conservation, structure and function in development

1.4. The network of HOX and CDX proteins and their role in blood cancers

2. Research Importance

3. Research Goals

4. Results

4.1. The integrated approaches of the research

4.2. Part I: Promoters of human Cdx/Hox genes contain functional DPE motifs

4.2.1. The 'Individual examination' project

4.2.2. The PCP project

4.2.3. The 'Individual examination' project - further analysis

4.3. Part II: Identification of a SNP in the +1 position of the Hoxb6 and its potential implications in health and disease

4.4. Part III: Whole genome analysis

4.4.1. Identification of human promoters that contain Drosophila DPE sequence motifs

4.4.2. ElemeNT - a core promoter Elements Navigation Tool

4.5. Part IV: Identification of binding sites of the human TAF6 and TAF9, subunits of the TFIID in human promoters

5. Discussion

In search of functional human DPE motifs – analysis of human Cdx/Hox promoters and computational whole genome analysis

Identification of a SNP in the +1 position of the Hoxb6 and its potential implications in health and disease

Identification of binding sites of the human TAF6 and TAF9, subunits of the TFIID in human promoters

6. Materials and Methods

7. References

8. Publications during the M.Sc. period

9. Appendixes

Appendix 1

Appendix 2

Appendix 3

Appendix 4

Appendix 5

Appendix 6

Appendix 7

Appendix 8

Appendix 9

Hebrew abstract (תקציר)

Abstract

Accurate gene expression is pivotal for determining the distinct identities

and dedicated functions of different cells and tissues in the multicellular

organism. This multistep program is regulated by mechanisms that comprise

diverse multiplayer molecular circuits of multiple dedicated components.

One process underlying accurate gene expression is transcription.

Transcription of protein-coding genes by RNA polymerase II (Pol II) uses the

DNA sequence as a template for transcribing mRNA molecules, which in turn

would be translated into proteins. Transcription initiation is one of the first and

central regulation points underlying the expression of protein-coding genes

and distinct non-coding RNAs. It occurs following the recruitment of Pol II to

the core promoter region by the basal transcription machinery, during the

preinitiation complex (PIC) assembly, through protein-protein and protein-

DNA interactions.

The core promoter is generally defined as the minimal DNA sequence that

directs accurate initiation of transcription. This region encompasses the

transcription start site (TSS), typically referred to as the +1 position, and its

length is approximately 80 bp (from -40 to +40, relative to the TSS). Moreover,

the core promoter sequence contains short functional DNA sequences, which

are termed core promoter elements (or motifs), which confer structural and

functional properties to the core promoter. The downstream core promoter

element (DPE) is one of these elements. The DPE element has an important

role in the expression of regulated genes that are associated with embryonic

development, including the caudal gene that regulates the Hox genes and the

Hox genes themselves, which specify the identity of the segments in the

developing embryo. The function of the DPE was mainly characterized in the

fruit fly, Drosophila melanogaster. Even though this element was discovered

about twenty years ago, to date, a functional DPE has only been identified in

two human promoters (irf-1 and calm2).

In this study, we tried to identify the DPE motif within promoters of human

genes in general, and within the promoters of the Hox genes in particular,

using several computational and experimental methods. Most of the

research approaches were based on the features of the DPE, as defined in

Drosophila, and on the evolutionary conservation between Drosophila and

humans, particularly in the Hox genes, which are highly conserved in

metazoans. Remarkably, we demonstrated that several promoters of human

Hox genes contain a functional DPE motif. However, most of the

abovementioned analyses, which were based on the definition of the

Drosophila DPE, were unsuccessful in identifying novel human DPE-containing

promoters. It is plausible that, despite significant evolutionary

conservation, these organisms are evolutionarily distant and the human DPE

may differ from the Drosophila DPE. In order to identify and characterize the

equivalent of the DPE motif in the human genome, independent of the

Drosophila DPE definition, we generated stable cell lines for performing

advanced chromatin immunoprecipitation (ChIP) assays. The assumption is

that the homologous element in humans is bound by TAF6 and TAF9, which

are found in the PIC. The assumption is based on the finding that human

TAF6 and TAF9 have previously been shown to bind the human DPE-

containing irf-1 promoter.

Furthermore, following sequencing of several Hox and Cdx promoter

regions from dozens of patients with different blood cancers, we suggest that

overexpression of the Hoxb6 gene, a common feature of blood cancers, may

result from a single nucleotide polymorphism (SNP). To summarize, this study

highlights the importance of understanding the core promoter composition in

health and disease by focusing on the DPE motif. In addition, this work

demonstrates that the DPE motif is functional in humans. However, it remains

to be determined whether a functionally equivalent motif in humans may have

different sequence and position constraints.

1. Introduction

1.1. The complexity of the transcription initiation landscape and the core

promoter region

Appropriate temporal and spatial gene expression is a highly complex

process underlying the fate and function of different cells and tissues. The

regulation of this process is composed of multiple levels and orchestrated

molecular events [1-3]. A central event in the regulation of eukaryotic gene

expression is the initiation of transcription. The initiation of transcription of

protein-coding genes and distinct non-coding RNAs occurs following the

recruitment of RNA polymerase II (Pol II) to the core promoter region by the

basal transcription machinery [4].

The core promoter is generally defined as the minimal DNA sequence that

directs accurate initiation of transcription. The core promoter sequence

encompasses the transcription start site (TSS), typically referred to as the +1

position [5, 6].

In the past, it was assumed that the core promoter is a generic entity that

functions in a universal manner. Nowadays, however, the growing consensus

is that the unique properties of a given promoter are a function of its

architecture and core promoter motif composition [5-8]. The core promoter,

which is referred to as “the gateway to transcription”, is a central component

in the initiation of transcription [9, 10]. Research in the past decade has

enhanced our understanding of the fundamental roles that the core promoter

plays in the initiation of transcription, as well as in the regulation of additional

aspects of gene expression. Two modes of transcription initiation were noted

in metazoans: focused and dispersed [8] (Figure 1).

Focused (also termed “sharp peak”) promoters contain a single

predominant TSS or a few TSSs within a narrow region of several nucleotides

[7]. Focused promoters span approximately -40 to +40

nucleotides relative to the TSS (referred to as the +1 position). Focused

transcription initiation is associated with spatiotemporally regulated,

tissue-specific genes [11] and with canonical core promoter elements, which have a

positional bias, such as the TATA box, Initiator, MTE and DPE [12].

Dispersed (also termed “broad”) promoters contain multiple weak start

sites spread over 50 to 100 nucleotides in the promoter region ([7, 8] and

refs therein). Dispersed transcription initiation is associated with constitutive

or housekeeping genes. Vertebrate dispersed promoters often contain CpG

islands and Sp1 and NF-Y sites [6, 7, 13] whereas Drosophila core promoters

often contain elements that have weaker positional biases (as compared to

the focused promoters), such as Ohler 1, the DNA replication-related element (DRE),

Ohler 6 and Ohler 7 [12, 14]. Although the focused promoter architecture

exists in all organisms and is the predominant initiation mode in simpler

organisms, the dispersed mode is more common in higher eukaryotes [7, 11].

Figure 1. Three main shapes of core promoters including dispersed, focused and

mixed promoters, based on their distribution of TSSs. Small arrows represent weak

TSSs, whereas the large arrow represents a single strong TSS. The estimated length of each

promoter type is shown on the left and its name on the right.

From a teleological standpoint, the

associations of sharp TSSs with regulated genes and of broad TSS patterns

with constitutively expressed genes are rather intuitive. It would be easier to

achieve more precise control of gene expression from focused TSSs, whereas

the dispersed promoters of housekeeping genes could be transcribed

constitutively, with minimal variation in expression, by using multiple start

sites [7]. In addition to the “focused vs.

dispersed” classification mentioned above, an additional promoter type, the

mixed promoter, has been described. This promoter type exhibits a

dispersed initiation pattern with a single strong transcription start site [6, 15]

(Figure 1). Nevertheless, from this point on, this work refers only

to the focused core promoter landscape.

1.2. The focused core promoter elements

Classic biochemical studies performed over 30 years ago using the TATA

box-containing adenovirus major late promoter identified the general

transcription factors (GTFs) as accessory factors for accurate Pol II

transcription initiation [16, 17]. The GTFs were named TFIIA, TFIIB, TFIID,

TFIIE, TFIIF and TFIIH, based on the protein fractions in which they were purified

(reviewed in [4]). These components, together with Pol II, were necessary

and sufficient for basal transcription of the adenovirus major late promoter.

They assemble into the preinitiation complex (PIC) by protein-protein

interactions and by mediating core promoter recognition (Figure 2).

The basal transcription machinery recruits Pol II to the core promoter that

directs the initiation of transcription [4, 6, 7, 18-20]. The Pol II core promoter is

composed of short DNA sequences that are referred to as core promoter

elements or motifs. The majority of core promoter motifs serve as binding

sites for components of the basal transcription machinery, in particular TFIID

and TFIIB. Notably, TFIID is composed of TATA box-binding protein (TBP)

and TBP-associated factors (TAFs) [4, 21, 22] (Figure 3). Nevertheless,

there are no universal core promoter elements, and diverse core promoter

compositions have been reported [6, 23]. The vast majority of the core

promoter motifs have been identified in the focused core promoter region.

The currently known core promoter motifs include the human and

Drosophila Inr (initiator), TATA box, BREu and BREd (upstream and downstream TFIIB

recognition elements), human TCT and Drosophila TCT, MTE (motif ten

element), DPE (downstream core promoter element), Bridge, DCE (downstream

core element), XCPE1 (X core promoter element 1) and XCPE2 (X core

promoter element 2) [6, 18, 24] (Figure 4).

Figure 2. Schematic representation of the architecture of the preinitiation complex at the

promoter of a Pol II-transcribed gene. The PIC is illustrated with all the GTFs and Pol

II. The arrow represents the TSS, which is located within the core promoter.

These DNA motifs and their combinations contribute to the architecture and

function of the core promoter [25]. Although many elements have been

discovered so far, it is likely that additional elements will be discovered

in the future.

Figure 3. Schematic illustration of TFIIB and the holo-TFIID complex. TFIIB and TFIID

are the central basal transcription factors that bind the core promoter region. TFIIB, TBP

and TAF1/2 bind the BREs, TATA box and Inr elements, respectively. TAF6/9 bind the MTE

and DPE elements. The other TAFs, which are not known to bind DNA, are

shown collectively.

The core promoter elements were characterized by computational and

experimental methods. The consensus sequence of each of the elements is

defined by the IUPAC code for nucleotides. Here, only three main core

promoter elements will be discussed: TATA box, Inr and DPE.

1.2.1. The TATA box

The TATA box motif was the first core promoter motif to be identified [26].

Although the TATA box was previously considered to be a universal element,

it is presently estimated that only 8%-30% of metazoan core promoters [11,

18, 27-29] and 20%-46% of yeast promoters [20, 30, 31] are TATA-dependent.

The TATA box motif is also present in plants [32, 33]. The TATA box is bound

by the TBP subunit of TFIID ([5, 6, 23] and refs therein). Both the TATA box

element and the TBP are conserved from archaebacteria to humans [7, 34].

The consensus sequence of the TATA box is TATAWAAR, where the 5' T is

usually located at -30 or -31 relative to the TSS in metazoans (or at -120 to

-40 in yeast). A wide range of sequences can functionally replace the yeast

TATA box for in vivo transcriptional activity [35].
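
The consensus notation above can be made concrete with a short Python sketch. It expands IUPAC codes into a regular expression and reports where the 5' T of each TATAWAAR match falls relative to the TSS. The function names, the `tss_index` parameter, the coordinate handling and the example sequence are illustrative assumptions and are not part of the methods used in this work.

```python
import re

# Standard IUPAC nucleotide codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus (e.g. TATAWAAR) into a regex string."""
    return "".join(IUPAC[code] for code in consensus.upper())

def relative_position(index, tss_index):
    """Convert a 0-based index to a TSS-relative position (+1 = TSS, no 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset

def find_tata_boxes(seq, tss_index, consensus="TATAWAAR"):
    """Return the TSS-relative position of the 5' T of every match."""
    # The lookahead keeps overlapping matches.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [relative_position(m.start(), tss_index)
            for m in pattern.finditer(seq.upper())]

# Hypothetical sequence: a TATA box placed so that its 5' T lies at -30.
example = "G" * 10 + "TATAAAAG" + "C" * 22 + "A" + "C" * 20
print(find_tata_boxes(example, tss_index=40))
```

A plain regular expression is used here only because the consensus is short and degenerate; a position weight matrix would be the usual choice when partial matches need to be scored.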

1.2.2. The Inr

Early studies from the Chambon lab described the existence of a putative

element at the TSS [36], and the function of the initiator (Inr) as a

transcriptional element that encompasses the +1 TSS was articulated by

Smale and Baltimore [37]. The Inr is probably the most prevalent core

promoter motif in focused core promoters [27, 38, 39]. It is mainly bound by

the TAF1 and TAF2 subunits of TFIID [40-42]. The mammalian Inr consensus

sequence is YYANWYY (IUPAC nomenclature) [43], and the Drosophila

consensus is TCAKTY, with A designated as the +1 position [42, 44]. Inr-like

sequences were also identified in Saccharomyces cerevisiae [45].

Computational analyses of promoters suggest that the Inr consensus is only YR

(-1, +1 positions) in humans [8, 11, 46] or TCAGTY in Drosophila [27, 38].

The A nucleotide (or R in the YR consensus) is generally designated as the

+1 position, even when transcription does not initiate at this specific

nucleotide. This convention is critical, because functional

downstream elements are completely dependent on the presence of an Inr

and the precise spacing from it [6, 7, 10].

Figure 4. Schematic illustration of the majority of the known core promoter elements in the

focused promoter area (-40 to +40 relative to the TSS). The diagram is roughly to scale

and every motif is colored differently. The arrow represents the TSS, the +1 position.
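
As an illustration of the Inr consensus sequences described in this section, the following Python sketch locates candidate Inr matches and returns the index of the nucleotide that would be designated A+1. The names `INR_CONSENSUS`, `A_OFFSET` and `find_inr`, as well as the example sequence, are hypothetical; this is a minimal sketch, not the procedure used in this thesis.

```python
import re

# Standard IUPAC nucleotide codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Consensus sequences quoted in the text.
INR_CONSENSUS = {"mammalian": "YYANWYY", "drosophila": "TCAKTY"}

# In both consensus sequences the A (+1) is the third base (0-based offset 2).
A_OFFSET = 2

def find_inr(seq, organism):
    """Return the 0-based index of the A+1 for every Inr consensus match."""
    regex = "".join(IUPAC[code] for code in INR_CONSENSUS[organism])
    pattern = re.compile("(?=" + regex + ")")  # lookahead: overlapping hits
    return [m.start() + A_OFFSET for m in pattern.finditer(seq.upper())]

# Hypothetical sequence containing TCAGTC, which fits both consensus forms.
example = "CCTCAGTCTT"
print(find_inr(example, "mammalian"), find_inr(example, "drosophila"))
```

Returning the A+1 index rather than the match start keeps downstream coordinates anchored to the +1 position, which matters for any element whose function depends on spacing from the Inr.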

1.2.3 The DPE

The downstream core promoter element (DPE) is located at +28 to +33

relative to the initiator's A+1, with a functional range set of 'DSWYVY', and it

is recognized by TAF6 and TAF9 [47-49]. In addition to this functional range

set, a guanine at +24 was found experimentally to contribute to DPE

function [49]. The DPE is associated with and enriched in developmental gene

networks [8, 50-52], and it is conserved from Drosophila to humans [48, 53].

This element strictly depends on the presence of

a functional initiator and on precise spacing from it. Additionally, one

central feature of the DPE is its enrichment in TATA-less promoters. However,

co-occurrence of putative TATA, Inr and DPE was observed in a small fraction

of Drosophila genes [7, 29].
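
The positional definition above translates directly into index arithmetic: with the A+1 at 0-based index `a`, position +p sits at index `a + p - 1`, so the DPE window (+28 to +33) is `seq[a + 27 : a + 33]` and the guanine at +24 is `seq[a + 23]`. The Python sketch below applies this under stated assumptions: it uses the Drosophila Inr consensus TCAKTY and the DSWYVY functional range set quoted in the text, and the function names and example sequences are hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches(consensus, fragment):
    """True if fragment has the consensus length and fits every IUPAC code."""
    return len(fragment) == len(consensus) and all(
        base in IUPAC[code] for code, base in zip(consensus, fragment.upper())
    )

def check_dpe(seq, a_index, inr="TCAKTY", dpe="DSWYVY"):
    """Check Inr, DPE (+28 to +33) and G at +24 relative to the A+1."""
    # The A+1 is the third base of the Inr consensus.
    inr_ok = a_index >= 2 and matches(inr, seq[a_index - 2:a_index - 2 + len(inr)])
    dpe_start = a_index + 27           # index of position +28
    dpe_ok = matches(dpe, seq[dpe_start:dpe_start + len(dpe)])
    g_index = a_index + 23             # index of position +24
    g24 = g_index < len(seq) and seq[g_index].upper() == "G"
    return {"inr": inr_ok, "dpe": dpe_ok, "g_at_plus24": g24}

# Hypothetical promoter fragment: Inr at the start, G at +24, DPE at +28.
example = "TCAGTC" + "C" * 19 + "G" + "C" * 3 + "AGACGT" + "CC"
# Same fragment with the DPE window replaced by a non-matching sequence.
mutant = example[:29] + "CCCCCC" + example[35:]

print(check_dpe(example, a_index=2))
print(check_dpe(mutant, a_index=2))
```

Because the check is anchored to a fixed offset from the A+1, shifting the motif by even one nucleotide makes it fail, which mirrors the strict spacing requirement described above.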

1.3. The Homeotic proteins: evolutionary conservation, structure and

function in development

The Homeotic proteins (HOX), which are encoded by the Homeobox genes,

are Helix-Turn-Helix transcription factors (TFs) that were first identified in

1978 in the fruit fly, Drosophila melanogaster [54]. These TFs contain a

conserved sequence-specific DNA-binding domain of 60 amino acids, termed

homeodomain or homeobox, which was discovered after their identification

[55]. The HOX proteins are the most extensively studied family among

the homeodomain-containing proteins. Nowadays, based on this conserved

domain, it is known that the HOX proteins are conserved among all the

eukaryotes [56, 57] (Figure 5). Another group of homeodomain-containing

proteins is encoded by the ParaHox genes, the paralogues of the Hox genes.

The Drosophila Caudal protein and the vertebrate CDX (Caudal-type

homeobox) proteins belong to the ParaHox family [58].

The HOX and the ParaHox proteins are known as master regulators in

development and differentiation [57, 59-62]. The genes that encode the

Drosophila HOX proteins are found in two proximal gene clusters: the Ant-C

and BX-C clusters [54, 57, 63]. This family of genes contains eight Hox genes

that are expressed in the developing embryo in a collinear manner, similar to

their genomic organization. Additionally, in a wide variety of animals, ranging

from C. elegans to mice, mutations and disorders in the Hox genes result in

morphological defects including transformation of one body region into

another. Thus, owing to these 'homeotic transformations', the Hox genes

are also called 'homeotic genes' [54]. The vertebrate CDX proteins and the

Drosophila Caudal protein regulate Hox gene expression [50, 64]. The name

'caudal' originated from its expression in the posterior part of the embryo.

Caudal is expressed both maternally and zygotically [65-67]. It

influences the anterior-posterior axis and regulates additional genes apart

from the Hox genes [50, 68, 69].

Figure 5. Evolutionary development of Hox genes among eukaryotic species. The

phylogenetic tree represents the duplications and divergence of the Hox genes from the first

Homeobox gene to humans. The colored boxes represent the Hox genes in each of the

species, and the same color represents the same origin of the genes [57].

In Drosophila, the Hox genes are expressed at a relatively late stage during

embryogenesis. The development of the Drosophila body plan is a multi-step

process that includes the sequential expression of maternal, gap, pair-rule

and segment polarity genes, which gradually produce the body segments and

then determine their polarity. After these stages, the role of the Hox genes is

to specify the identity of the segments ([50, 62, 70] and refs therein).

The HOX proteins function as heterodimers or as components of larger

protein complexes that regulate multiple cellular processes (including adhesion,

cell cycle, cell death and cell movement) and thereby control animal development and

morphology via transcriptional activation or repression of many target genes

[59, 71].

In humans, the caudal-related and the Hox genes were

duplicated and diverged during evolution (Figure 6). Consequently,

there are 39 Hox genes, which are organized in four clusters (clusters: A, B, C

and D) on different chromosomes in the human genome [57]. Moreover, as

mentioned above, there are three Cdx genes in the human genome, namely

Cdx1, Cdx2 and Cdx4. Similar to Caudal, the CDX proteins regulate the

expression of the vertebrate Hox genes [56, 58, 64, 72, 73]. The conservation

of the Hox genes is not only reflected by the DNA sequence, protein

homology and function, but also by their collinear genomic arrangement

relative to their distinct, often overlapping, expression domains [56, 57, 59].

Figure 6. Evolutionary conservation of Hox

genes between Drosophila and humans. Two

Hox clusters, the Ant-C and the BX-C clusters, in

Drosophila and four clusters of human Hox genes

(A, B, C and D) are schematically represented. The

colored boxes represent the Hox genes in each of

the species, and the same color represents the

duplication and divergence of the same

original gene in Drosophila and humans. Each color

in the fly and human embryos indicates the

anterior-posterior expression domain of each Hox gene.

Both the fly and the human embryo are aligned

from left to right [57].

From the transcription regulation perspective, it was found that the Caudal

protein regulates the expression of Drosophila Hox genes by activation of

their DPE-dependent promoters [50] (Figure 7). Different core promoter

elements have previously been shown to contribute to enhancer-promoter

compatibility [74].

1.4. The network of HOX and CDX proteins and their role in blood

cancers

Since their discovery, it has been revealed that the Hox genes also play a

role in normal adult tissues. Along with that, mutations in several Hox genes

have been found to cause pathological processes in humans, which range

from morphological disorders to cancers [75]. Molecular pathways, including

the CDX-HOX pathway, which underlie malignancies, have been shown to

regulate embryogenesis [56]. Specifically, the Hox and Cdx genes normally

function in adults in self-renewal of hematopoietic precursors and in the

hematopoiesis process. However, Cdx and Hox genes are also involved in the

development of blood cancers [58, 73, 76, 77]. Indeed, abnormal expression

patterns of Hox and Cdx genes have been reported in diverse blood cancers

[78-80] (Figure 8). Normally, the expression of the Hox and Cdx genes

decreases during the differentiation of hematopoietic cells [76]. Nevertheless,

in the malignant state there is aberrant expression of multiple Hox and Cdx

genes, especially genes from the A, B and C Hox clusters, which are

inappropriately activated by the CDX proteins [58, 76, 77, 80, 81].

Figure 7. Schematic illustration of preferential activation by Caudal. In Drosophila, Caudal

(green oval), a master regulator of Hox genes, preferentially activates functional DPE-dependent

genes as compared to TATA-containing genes. PIC, preinitiation complex (orange cloud). Arrows

pointing away from Caudal represent activation and the bold arrow represents preferential

activation. The smaller, 90-degree arrows at the core promoter regions represent TSSs.

Figure 8. The Hox and Cdx expression levels in normal and malignant hematopoiesis.

A. Normal hematopoiesis. Cdx1 and Cdx2 are not expressed but Cdx4 is expressed. Its

expression levels as well as the Hox expression levels are diminished during differentiation.

B. Malignant hematopoiesis. The Hox genes are deregulated: ectopic expression of Cdx2 in

progenitor cells induces leukemic transformation and overexpression of Hox genes in

hematopoietic cells is observed during all the differentiation stages. The expression of Cdx4 is

diminished and Cdx1 is not expressed [58].

Furthermore, as a result of chromosomal translocation events, Cdx2 or Hox

genes are fused with other genes, which promote oncogenesis. Thus, fusion

proteins that include CDX2 or HOX proteins are a common feature of

cancers and, in particular, of specific blood cancers [77, 82, 83]. Hence, there is

a body of evidence demonstrating a link between Hox and Cdx genes and

different blood cancers that suggests that these genes and their deregulated

expression could be the cause or the consequence of cancers [56].

2. Research Importance:

First, this research aims to deepen the understanding of regulation of Pol II

transcription via the core promoter region, generally, in eukaryotes and, in

particular, in humans. This can be achieved through the characterization of

promoters and their function through the functional analysis of putative core

promoter elements.

Second, the importance of this research stems from its novel approach of

comparing promoter sequences in Drosophila melanogaster with

promoter sequences in humans (Part I). Hence, I examine and rely

on the evolutionary conservation in the promoter region. This point of view is

not common, as most conservation studies examine the protein-coding

regions of genes and not the regulatory regions. I have also taken this

approach in another part of this work, in which I examine substitutions, such

as single nucleotide polymorphisms (SNPs) or mutations, in the core promoter

sequence. To date, most SNP studies have examined

their effect in protein-coding regions, since these changes could drive

substitutions in amino acids and thus change the function of the protein

encoded by this gene.

Moreover, in addition to the basic insights about transcription regulation,

this research is potentially relevant at the clinical level.

If a SNP in a specific Hox promoter is found at a higher frequency in patients

with a particular type of blood cancer, as compared to the healthy population,

such a SNP could serve as a diagnostic marker for the disease in the future.

Last but not least, this study analyzes the DPE motif in the human genome.

Since 1996, when the DPE motif was first discovered, only two human genes


have been published as genes that contain a functional DPE in their

promoters [48, 53]. Thus, researchers in the transcription community who use

the Drosophila DPE consensus to search for human genes that contain a

DPE believe that the DPE may be fly-specific [8]. My research takes on the

great challenge of redefining the DPE element in the human genome using

approaches that were not used before to explore this issue, such as Chromatin

Immunoprecipitation with nucleotide resolution through exonuclease unique

barcode and single ligation (ChIP-nexus). It is a novel technique that maps

transcription factor binding footprints genome-wide in vivo at nucleotide

resolution. Thus, the new methodology will enable the identification and

subsequent characterization of human downstream core promoter elements,

in a manner that is independent of the Drosophila DPE consensus.


3. Research Goals

1. To identify and characterize functional DPE-containing genes among

human genes and in particular among human Hox/Cdx genes, based

on our knowledge of the Drosophila DPE motif

2. To identify and characterize a downstream element in human

promoters, using an unbiased approach (which is independent of the

Drosophila melanogaster DPE motif) and a newly developed

Chromatin Immunoprecipitation (ChIP) methodology.

3. To analyze the involvement of the promoters of the human Cdx and

Hox genes in the AML and ALL blood cancer types.


4. Results

4.1. The integrated approaches of the research

In this study, several independent but integrated strategies were used to

discover, identify (both computationally and experimentally) and functionally

characterize the DPE motif in the human genome and examine its association

with different blood cancers.

The integrated strategies employed were:

a. Individual examination

b. The PCP project

c. The clinical approach

d. Whole genome analysis

e. Advanced ChIP-seq method named ChIP-nexus.

All these strategies, apart from the last one, are based on several guidelines

in order to increase the likelihood that the putative DPE sequences, identified

in the human genome, would be functional. There are three guidelines (Figure

9):

First, as a key criterion, the putative DPE motifs in the human genome must

match the functional range set of the Drosophila DPE motif and be absolutely

dependent on the presence of an Inr. Thus, the mammalian Inr has to contain

at least 3 out of 7 matches to the Inr consensus ('YYANWYY') with an

obligatory match of the 'cA' or 'tA' at positions -1, +1, respectively.

Additionally, the DPE has to be located at a precise distance from the Inr

(the DPE must be positioned at +28 to +33 relative to the A+1 of the Inr) and

must match the functional range set ('DSWYVY') in at least 4 out of 6

positions, with an obligatory match of 'G' or 'C' at position +29 [47-49].


It was previously observed that having a 'G' or a 'C' at this position (+29) is

critical for the functionality of the DPE motif in Drosophila [49].
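The first guideline can be illustrated with a minimal sketch, which is not the procedure used in this work but a restatement of the stated rules in code. It assumes that the Inr consensus 'YYANWYY' spans positions -2 to +5 with the A at +1, that there is no position 0, and that IUPAC ambiguity codes define the allowed bases; the function names and the test sequence are hypothetical.

```python
# IUPAC ambiguity codes used by the Inr and DPE definitions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

INR_CONSENSUS = "YYANWYY"  # assumed to span -2..+5, with A at +1
DPE_RANGE_SET = "DSWYVY"   # spans +28..+33


def count_matches(window, consensus):
    """Count positions where the base is allowed by the consensus code."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def is_inr_dpe(seq, a_idx):
    """Test a putative Inr-DPE combination.

    a_idx is the 0-based index of the candidate A+1 in seq.
    """
    seq = seq.upper()
    inr_start = a_idx - 2    # position -2
    dpe_start = a_idx + 27   # position +28
    if inr_start < 0 or dpe_start + 6 > len(seq):
        return False
    inr = seq[inr_start:inr_start + 7]
    dpe = seq[dpe_start:dpe_start + 6]
    # Obligatory 'cA' or 'tA' at positions -1 and +1.
    if inr[1] not in "CT" or inr[2] != "A":
        return False
    if count_matches(inr, INR_CONSENSUS) < 3:
        return False
    # Obligatory G or C at position +29.
    if dpe[1] not in "CG":
        return False
    return count_matches(dpe, DPE_RANGE_SET) >= 4


def scan_inr_dpe(seq):
    """Return every 0-based index that qualifies as the A+1 of a combination."""
    return [i for i in range(len(seq)) if is_inr_dpe(seq, i)]
```

Applied to a promoter fragment, `scan_inr_dpe` lists candidate A+1 positions, which could then be checked against the EST/tag and RefSeq criteria described next.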

The second criterion is whether there are EST clones and/or tags for the

candidate gene and whether the EST clones and/or tags initiate at a position

that matches the A+1 of the putative Inr in the identified Inr-DPE combination

(or within one nucleotide upstream or downstream). These clones and tags

mark experimentally defined TSSs. The information of ESTs and tags is found

in the Genome Browser [84, 85] and the dbTSS (database of transcriptional

start sites in humans) [86], respectively.

The third guideline is that the A+1 of the Inr in a putative

Inr-DPE combination is close to the Genome Browser/RefSeq annotated TSS

('known TSS').

These criteria enable the selection of the most likely DPE-containing

candidate genes in the human genome.

Figure 9. Diagram of the three basic guidelines

in order to increase the likelihood that the

putative DPE motifs identified in the human

genome are functional. Putative combinations of

Inr and DPE at human promoters were defined as

the best candidates for functional DPE-containing

promoters according to four guidelines:

match to the Drosophila DPE definition, existence

of EST clones, tags and known TSS by the

RefSeq and their proximity to the A+1 of the

putative combination.


In general, all the search modes in the human genome underlying the

majority of the research approaches used in this study are based on the

known information from Drosophila.

Likewise, the analysis of human Hox genes using the Drosophila DPE

definition is based on three experimental findings, which are the

starting point of this study:

a. The majority of the Drosophila Hox genes are DPE dependent [50].

b. To date, the identification of two DPE-dependent human genes, irf1

[48] and calm2 [53] has been published. Their DPE motifs are

characterized by the definition of the Drosophila DPE.

c. The Hox genes are evolutionarily conserved from simple organisms to

humans [57].

Hence, in this study, I focus on the human Hox genes a) because of their

importance in development and cancers and b) as a test case for the

identification and characterization of human DPE-dependent genes.

4.2. Part I: Promoters of human Cdx/Hox genes contain functional DPE

motifs

In order to examine and identify functional DPE sequences at the core

promoter regions of the human Cdx and Hox genes that are pivotal for

transcription, two parallel work channels were developed: An experimental

project, which will be referred to as 'Individual examination', and a

computational approach that was termed 'The PCP project'. These two


projects were based on searching for putative combinations of Inr and DPE in

the human Cdx genes and in the clusters of the human Hox genes. The

combinations were defined by the criteria mentioned above.

4.2.1. The 'Individual examination' project

In the 'Individual examination' project, Hila Shir-Shapira, a Ph.D. student in

our lab, and I together cloned selected minimal promoters (-40 to +40 or -10

to +40 relative to the +1 position of the RefSeq) of the human Cdx and Hox

genes, upstream of the firefly Luciferase (FL) reporter gene in the pGL3 Basic

modified plasmid (in this work, this vector is referred to as pGL3). These

plasmids were transiently transfected into HEK-293 (human embryonic kidney)

cells by the calcium-phosphate method (see 'Materials and Methods'). In

addition, two more plasmids were transfected. The first one was the TK

Renilla Luciferase (RL) containing plasmid, as a normalization control. The

second plasmid was the pBlueScript vector, in order to supplement the DNA

amount to the required total DNA amount. Generally, we compared

wild-type (wt) constructs, which contain putative combinations of Inr and DPE,

and mutant (mD) constructs, which contain combinations of Inr and mutant

DPE. Cells were harvested 48 hours following the transfections and cellular

extracts were subjected to dual-luciferase analysis. Thus, through the FL activity,

we could indirectly examine the activity of the core promoters and their

dependency on specific sequence motifs.
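The normalization underlying the dual-luciferase readout can be sketched as follows. This is a minimal illustration, assuming that each replicate well yields one FL and one RL reading, that FL is divided by RL per well, and that mutant values are expressed relative to the mean normalized wt value; the function names are hypothetical and the exact calculation used for the figures is not specified here.

```python
from math import sqrt
from statistics import mean, stdev


def normalized(fl_values, rl_values):
    """Divide each firefly (FL) reading by its matching Renilla (RL) reading."""
    return [fl / rl for fl, rl in zip(fl_values, rl_values)]


def relative_activity(wt_fl, wt_rl, mut_fl, mut_rl):
    """Return (mean, SEM) of mutant activity relative to the wt mean.

    Each argument is a list of replicate readings (at least two per construct).
    """
    wt_mean = mean(normalized(wt_fl, wt_rl))
    relative = [value / wt_mean for value in normalized(mut_fl, mut_rl)]
    sem = stdev(relative) / sqrt(len(relative))
    return mean(relative), sem
```

With this convention the wt construct has a relative activity of 1, so a value of about 0.5 for an mD construct corresponds to a reduction of about 50%.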

We initially examined the minimal promoters of the human Cdx1 and Cdx2

genes (Figure 10).

The Cdx2 minimal promoter contains a combination of an Inr and DPE, which

matches in 4 out of 6 positions to the DPE definition, and does not contain a


putative TATA box sequence in the appropriate location. Upon mutation of the

putative Cdx2 DPE (mutating nucleotides 'AGAGG' to 'CTCATG'), there is a

~50% reduction in reporter activity as compared to the activity of a wt-driven

reporter (Figure 10A). Thus, the basal transcription of the human Cdx2 gene

is dependent on the DPE motif.

Figure 10. Transcription from the human Cdx1 promoter is dependent on both a TATA

box and a DPE motif whereas transcription from the Cdx2 promoter is dependent on

the DPE motif. The bar graphs illustrate a summary of three independent dual luciferase

experiments (each performed in triplicate). Mutated minimal promoter-driven constructs:

mutant DPE (mD), mutant TATA box (mT) or mutant TATA box and DPE (mTmD) were

compared to wild-type (wt) minimal promoter-driven constructs. A. Fold activation of the

human Cdx2 minimal promoter variants. B. Fold activation of the human Cdx1 minimal

promoter variants. The error bars represent the standard error of the mean (SEM).

In contrast to the human Cdx2 and the majority of Hox genes (see below),

the human Cdx1 promoter contains a TATA box motif in addition to a putative

Inr-DPE combination. Both the putative TATA box and the putative DPE

sequences fully match their consensus. The Inr motif only matches in 4 out of

7 positions to the mammalian Inr consensus. Nevertheless, the A+1 of the Inr,

in this triple element combination (TATA box, Inr and DPE), coincides with the

RefSeq +1 position of the gene (Appendix 1). As can be seen, only when both

the TATA box and DPE were mutated (mTmD, mutating nucleotides


'TATAAAAG' to 'ACGGACGT' and 'GCTCGT' to 'CTCATA' for TATA box and

DPE, respectively), a significant reduction of about 40% was observed.

However, neither the mutated TATA box (mT)-driven nor the mutated DPE

(mD)-driven constructs displayed significantly reduced reporter activities

(Figure 10B). Thus, the basal transcription of the human Cdx1 gene is not

exclusively dependent on the DPE motif, but this motif contributes to Cdx1

transcription at a certain level. Notably, the existence of sequence motifs that

match the Drosophila DPE definition influences basal transcription levels of

both human Cdx1 and Cdx2 genes.

Surprisingly, a year after obtaining these results, there was an update to the

RefSeq annotation of the major TSSs of human genes, which indicated a

different and more distal TSS of the Cdx2 gene. Moreover, the analysis of the

relationships among the TATA box, Inr and DPE and of their mutual contributions is beyond the

scope of this study, as we want to focus on the function of the DPE only and

its influence on transcription. Thus, we decided to discontinue the analysis of

the Cdx1 and Cdx2 promoters at this point.

Subsequently, the core promoters of all 39 Hox genes were manually

analyzed. As a preliminary step, the presence of TATA box, Inr and DPE

sequences in the appropriate positions was examined in the -200 to +200

region (relative to the TSSs of RefSeq) (Table 1). In this analysis, we used the

abovementioned criteria, as well as additional parameters such as: Pol II

occupancy, nucleosome positioning, histone modifications and CAGE

patterns, to select promising candidates of Inr and DPE combinations.


Core promoter element    Human Hox gene cluster    Total
                         A     B     C     D
DPE only                 9     6     4     7       26
TATA only                1     3     2     0       6
TATA+DPE                 1     3     1     1       6
No TATA or DPE           0     0     2     3       5
Total                    11    12    9     11      43

Table 1. A summary table of the appearance of TATA box and DPE motifs (associated with

Inr-like sequences) around the TSSs of the human Hox transcripts. Sequences of

about 400nt around the known TSSs of all the Hox genes (200nt from each side), both the

major TSS and alternative TSSs of each gene, were analyzed for the presence of TATA box,

combinations of Inr and DPE or TATA, Inr and DPE together, which may regulate

transcription of the Hox genes. 'TATA+DPE' or 'DPE only' means a human Inr and DPE

combination with strict spacing of 27bp between these two elements with or without TATA box

existence, respectively. 'No TATA or DPE' means that there is no TATA or Inr and DPE

combination at all, or none within a reasonable distance (less than 50bp) from the known TSS.

Overall, twenty-six putative combinations of Inr and DPE were identified

among all four Hox gene clusters. The best candidate promoters, on which we

decided to focus, were: Hoxa1, Hoxa2, Hoxa9, Hoxa11, Hoxb3, Hoxb9,

Hoxc6, Hoxc8, Hoxd3, Hoxd9 and Hoxd10 (Figure 11, see Appendix 1).


Figure 11. Multiple human Hox genes contain the Drosophila DPE sequence motif, but

only a subset of these genes is transcriptionally dependent on this DNA element. The

bar graphs depict summaries of three independent dual luciferase experiments (performed in

triplicate) that compare minimal core promoter variants. Mutant DPE-containing

minimal promoter-driven constructs (mD) were compared to wild-type (wt) minimal promoter-

driven constructs. A. Fold activation of wt and mD pairs of Hox genes whose activities were

either not affected or were enhanced by the mutation. B. Fold activation of wt and mD pairs of

Hox genes whose activities were reduced by the mutation. The error bars represent the

standard error of the mean (SEM).

As shown in Figure 11, four Hox candidates (Hoxb3, Hoxc6, Hoxd9, and

Hoxd10) are not affected by the mutation of the DPE, which means that the

identified DPE sequence is not significantly important for their transcription

regulation (Figure 11A). However, the other tested candidates were affected.

While the luciferase activity of two Hox genes (Hoxa9, Hoxd3) seems to be


enhanced by the DPE mutation (Figure 11A), the luciferase activity of five

more Hox genes (Hoxa1, Hoxa2, Hoxa11, Hoxb9 and Hoxc8) is reduced as a

result of the mutation, suggesting that the DPE is important for their

transcription (Figure 11B). The nucleotides at positions +28 to +34 of each Hox

gene were mutated to 'CTCATGT', except for Hoxb9, in which only positions

+28 to +33 were mutated, to 'CTCATG'. The enhancement of the activities of

Hoxa9 and Hoxd3 following the DPE mutation can be explained by the

potential involvement of the DPE motif in Pol II pausing [87, 88]. If so, then

upon mutation of the DPE, Pol II may shift to a productive elongation state,

which might result in higher luciferase activity levels (see Discussion for

further details). To conclude, using the Drosophila definition of a DPE motif,

we have identified five Hox genes that contain a functional DPE.

4.2.2. The PCP project

The PCP project is based on the 'Determinacy Analysis Chain' (DAC)

software that was developed by the mathematician Prof. Sergey Chesnokov.

This computational approach, which was done in collaboration with Prof.

Chesnokov, was used as a complementary strategy to the 'Individual

examination' approach to facilitate the identification of Inr and DPE

combinations within the human Hox gene clusters in a high-throughput

manner. This is done by searching for perfect and imperfect matches of

nucleotides that are found at specific positions within the promoter.

Nucleotides are defined as critical for DPE function based on experimental

findings using the Drosophila Hox promoters.


In agreement with that, the working assumption in this project was that

sequences in the human genome that contain Inr and DPE combinations (defined

based on the Drosophila DPE) should be associated with active promoters of

human Hox genes. The analysis was based on the seven Drosophila DPE-

containing Hox promoters: Scr, Dfd, lab, Antp-p1, Antp-p2, pb and Abd-B.

Since the Drosophila abd-A and Ubx promoters do not have functional DPE

motifs [50], these Hox promoters were not used in the reference-set.

Generation of 'Irreducible genetic markers' (IGms) was performed after

alignment of these seven minimal Drosophila Hox promoters (-10 to +40

relative to the TSS). IGms are distinct sets of positions that contain, by definition,

13 irreplaceable nucleotides, which are part of the critical nucleotides that are

necessary for the transcriptional function of each Hox promoter. This number,

13, is the number of irreplaceable nucleotides that must be used for the

creation of IGms by the DAC software. IGms were searched for within the four

human Hox clusters and were constructed based on the Drosophila DPE functional

range set and the mammalian Inr consensus. Following that, for every IGm

that was matched in the human genome, the additional matched positions of

the mammalian Inr and Drosophila DPE motif sequences were appended.

These sequences were named 'Putative core promoters' (PCPs). The

PCPs, which were identified by the IGms, were considered to be 'good' PCPs

(see below) if they were likely to contain a functional DPE and serve as active

promoters.

At first, the irreplaceable positions, which compose the different IGms, only

included the positions of the Inr and the DPE motifs. A stronger indication for

a possible biological significance to these sequences was given if, in addition


to these positions, nucleotides in positions +17, +19 and +24 were T, G and

G, respectively. These nucleotides were previously shown to be

overrepresented in these positions in Drosophila DPE-containing promoters.

Moreover, the nucleotide G at position +24 was also experimentally shown to

contribute to DPE function in Drosophila [49]. Every single PCP was scored in

order to evaluate how 'good' it is. The PCP received one point if the positions

of the Inr and DPE motifs matched their consensus and functional range set,

respectively. The PCP received half a point if position +24 matched. As a

result, a total of 1524 'good' PCPs for all four human Hox clusters were

identified on both strands (-/+) (see Appendix 2). This number of PCPs is

more than ten times higher than expected. We next filtered out PCPs that

co-occupied nucleosomal sites, as promoter regions are generally

nucleosome free. It is unlikely that PCPs that are found at nucleosome-

occupied sites would be active promoters. The filtering was done, with the

help of Dr. Tirza Doniger, using data from two complementary resources from the

genome browser: 'Nucleosome Positioning' and 'DNaseI Hypersensitivity

Sites'. Following this filtration, ~ 400 PCPs remained. Although the vast

majority of the results were eliminated by the filtration, the number of the

remaining PCPs is still higher than expected.
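The logic of the nucleosome-based filtration can be sketched as a simple interval-overlap test. This is only an illustration under stated assumptions: PCPs and nucleosome-occupied regions are represented as hypothetical (chromosome, start, end) tuples with half-open coordinates, and a PCP is discarded if it overlaps any occupied region. It does not reproduce the actual filtering, which combined the 'Nucleosome Positioning' and 'DNaseI Hypersensitivity Sites' data.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_pcps(pcps, occupied_regions):
    """Keep only PCPs that do not overlap any nucleosome-occupied region.

    Both arguments are lists of (chrom, start, end) tuples (hypothetical format).
    """
    by_chrom = {}
    for chrom, start, end in occupied_regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in pcps:
        regions = by_chrom.get(chrom, [])
        if not any(overlaps(start, end, r_start, r_end) for r_start, r_end in regions):
            kept.append((chrom, start, end))
    return kept
```

For cluster-sized regions a linear scan per PCP is sufficient; a sorted or indexed interval structure would only be needed for genome-wide sets.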

Thus, as a second step, we evaluated the PCP approach by comparing the

human Hox gene clusters to the human Histone gene cluster. The different

clusters are similar in length (see Appendix 3). The reason for this comparison

is that in contrast to the Hox genes, the promoters of the core Histones (h2a,

h2b, h3 and h4), at least in Drosophila, contain TATA box elements [50, 89,

90]. Unfortunately, this analysis indicated that the number of PCPs was


similar among the different clusters (Hox versus core Histones). This finding,

as well as other data from our lab, led us to suspect that additional

nucleotides that are located at distinct positions between the Inr and DPE

might be important for promoter activity. Notably, this region (which we

termed, “the Neck”), has already been shown to contain one conserved motif

that functions with the Inr and DPE, namely, the MTE [91].

Hence, these nucleotides and their positions may also be conserved among

Drosophila and humans. We therefore generated new sets of IGms, which

contain the Inr and DPE as well as nucleotides in between, and searched for

new PCPs at the human Hox and Histone gene clusters (Figure 12).

Figure 12. A schematic representation of the workflow of the 'PCP project'. A. The first step of the PCPs

analysis within the human Hox and Histone gene clusters was based on IGms that were generated only

from the Inr and the DPE positions (IGms-Area-0) of the seven Drosophila DPE-containing Hox genes. B. The

revised analysis involved the generation of IGms (IGms-Area-1) based on the Inr, DPE and nucleotides in the

region between them (the 'Neck') of the Drosophila DPE-containing Hox genes.


Fifty-four sequences of Drosophila TATA-less, DPE-containing core

promoters were analyzed by the bioinformatics tool 'WebLOGO', in order to

estimate the conserved positions in the “Neck” region (Table 2). This set of

sequences was composed of experimentally validated core promoter

sequences from our lab [51, 92] and core promoter sequences analyzed by

the Kadonaga Lab and published in the 'Drosophila Core Promoter Database'

(DCPD) website (http://labs.biology.ucsd.edu/Kadonaga/DCPD.htm) [49].

Specifically, following this new definition, there is a new set of 13 positions for

each of the seven Drosophila DPE-containing Hox genes, which is used as its

IGms for searching for PCPs in the human Hox and Histone gene clusters

(Table 3, see Appendix 4).

Table 2. The conserved positions and bases in the 'Neck' region between the Inr and the DPE

of the 54 Drosophila DPE-containing genes. These positions and nucleotides were conserved

among the set of the 54 Drosophila DPE-containing genes and may be important in selecting the

DPE-dependent genes in humans. The table is divided into four groups (Group1-4) according to the

conservation level of the bases in these positions.

Group4 Group3 Group2 Group1

Position   +20   +25   +24   +19   +18   +27   +17
Base       S     A/T   G     G/A   C/G   A/C   T/C

The chosen positions

Promoter  Inr                  Neck                                DPE
Dfd       C-1, A+1, A+3        T+17, C+18, G+19, G+24, C+27        G+28, G+29, T+30, T+31, C+32
AntpP1    T-2, A+1, T+3        T+17, A+19, A+20, T+25, A+27        A+28, C+29, A+30, T+31, C+32
Abd-B     T-2, C-1, A+1        T+17, C+18, G+19, G+24, T+25, C+27  G+28, G+29, T+30, T+31
AntpP2    T-2, C-1, A+1        C+18, G+19, T+20, A+25              A+28, G+29, A+30, C+31, G+32, T+33
lab       T-2, C-1, A+1        C+18, G+19, G+24, A+27              G+28, C+29, A+30, C+31, G+32, T+33
pb        T-2, C-1, A+1        T+17, G+19, G+24, A+25              G+28, G+29, T+30, T+31, G+32, T+33
Scr       C-1, A+1, T+3, C+4   T+17, C+18, G+19                    G+28, C+29, A+30, C+31, G+32, T+33



The results are organized in seven different tables. Each table summarizes

the PCP results for each Drosophila Hox promoter (Figure 13). Using this

approach we identified hundreds of PCPs among human Histone and Hox

gene clusters and dozens of 'good' PCPs, which were divided into three

groups:

Type I (Hox) - the PCPs that were only found in the human Hox gene

clusters (marked in green). These PCPs were generated from the set of IGms

that were unique to the Hox gene clusters.

Type II (HoxHist) - the PCPs that were found in both the Histone and Hox

clusters (marked in light orange). These PCPs were generated from sets of

IGms that were present in both the human Hox and Histone gene clusters.

Type III (Hist) - the PCPs that were only found in the Histone gene cluster

(marked in red). These PCPs were generated from the set of IGms that were

unique to the Histone gene clusters.

Table 3. The new composition of the IGms based on the seven Drosophila Hox genes including

the Inr, Neck and DPE regions. The table presents the nucleotides and their positions that compose

the IGms including the new area, the region between the Inr and the DPE ('Neck'). For each

Drosophila Hox gene, the best positions were chosen (see main text and Table 2 for further details).
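The way a 13-position IGm can be matched against a genomic sequence on both strands is sketched below. This is not the DAC software; it is a minimal illustration that assumes exact matching of all 13 nucleotides, uses the Dfd positions listed in the table of chosen positions, and treats positions as relative to the A+1 with no position 0. The function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Dfd positions (Inr, Neck and DPE), keyed by position relative to A+1.
DFD_IGM = {-1: "C", 1: "A", 3: "A", 17: "T", 18: "C", 19: "G", 24: "G",
           27: "C", 28: "G", 29: "G", 30: "T", 31: "T", 32: "C"}


def offset(pos):
    """Convert a promoter position (no position 0) to an offset from A+1."""
    return pos - 1 if pos > 0 else pos


def matches_at(seq, a_idx, igm):
    """True if every IGm nucleotide is present when A+1 is placed at a_idx."""
    for pos, base in igm.items():
        i = a_idx + offset(pos)
        if i < 0 or i >= len(seq) or seq[i] != base:
            return False
    return True


def find_igm_hits(seq, igm):
    """Return (forward-strand index of A+1, strand) for every exact match."""
    seq = seq.upper()
    revcomp = seq.translate(COMPLEMENT)[::-1]
    hits = [(i, "+") for i in range(len(seq)) if matches_at(seq, i, igm)]
    hits += [(len(seq) - 1 - j, "-")
             for j in range(len(revcomp)) if matches_at(revcomp, j, igm)]
    return hits
```

Each hit would correspond to one candidate PCP, to which the additional matched Inr and DPE positions could then be appended as described in the main text.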


This analysis (Figure 13) led to two main findings. First, whereas the

previous analysis resulted in similar numbers of PCPs in the genomic regions

of both the human Hox and Histone genes, the vast majority of the PCPs

identified by this analysis were within the human Hox gene clusters ('PCP All',

Figure 13). Moreover, the number of the 'good' PCPs, which were identified in

the genomic area of the human Histone genes, was negligible in comparison

to the number of the 'good' PCPs that were identified in the human Hox gene

clusters.

Second, in contrast to the previous results, the number of PCPs was reduced

from hundreds of 'good' PCPs to only dozens of 'good' PCPs ('PCP Good',

Figure 13). In the previous analysis that did not include the Neck region,

hundreds of PCPs were identified (1524 'good' PCPs in total, see above).

Having dozens of PCPs seems more “reasonable” as there are only 39

human Hox genes.

Figure 13. Summary tables containing the distribution of PCP within the human Hox gene clusters

versus the Histone gene clusters following the analysis that included the 'Neck' positions. Every

single table contains the number of the PCPs, which were identified in the human Hox and Histone gene

clusters, separately (Hox or Hist) or overlapping (HoxHist), and were generated by IGms from a single

Drosophila Hox promoter. The three groups: Hox, HoxHist and Hist are marked in green, light orange and

red, respectively. At the top of each table there are the percentages of the different types of PCPs out of all

the PCPs that were identified in total. The percentages of the 'good' PCPs are indicated at the bottom of

each table. The names of the Drosophila Hox gene, which the PCPs were originally generated from, are

indicated on the left. (A-G). PCP-All: the sum of 'good' and 'not good' PCPs that were identified. 'PCP

Good' (dark blue) indicates the number of PCPs that, based on their sequence, may be functional

candidates.

The new set of 'good' PCP sequences was further filtered by Dr. Tirza

Doniger using the 'Nucleosome Positioning' and 'DNaseI Hypersensitivity

Sites' data from the UCSC genome browser. Following this step, only 52

'good' PCPs remained as potential candidates for functional combinations of

Inr and DPE in the human Hox gene clusters. To visualize the genomic

location of the PCPs, we uploaded this final set of PCPs to the UCSC genome

browser through the generation of a personal custom track (Appendix 5).

Unfortunately, even though the results seemed promising, the 'good' PCPs

did not overlap with (or were not even within <50bp from) the known Hox

TSSs. Moreover, their orientation did not always fit the directionality of the

closest known TSSs.
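The two checks described above, proximity to a known TSS and agreement of orientation, can be expressed as a short sketch. It assumes that each PCP is summarized by the coordinate of its putative A+1 and its strand, and that known TSSs on the same chromosome are given as hypothetical (position, strand) pairs; the 50bp threshold follows the text.

```python
def near_known_tss(pcp_pos, pcp_strand, known_tss, max_dist=50):
    """True if the closest known TSS is less than max_dist bp from the PCP
    and lies on the same strand.

    known_tss is a list of (position, strand) pairs on the same chromosome.
    """
    closest = None
    for tss_pos, tss_strand in known_tss:
        distance = abs(pcp_pos - tss_pos)
        if closest is None or distance < closest[0]:
            closest = (distance, tss_strand)
    if closest is None:
        return False
    return closest[0] < max_dist and closest[1] == pcp_strand
```

A PCP that fails either condition would be regarded as unlikely to mark an annotated Hox promoter.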

Taken together, the “PCP approach” of identifying putative DPE-containing

human Hox genes, based on the Drosophila DPE motif, was unsuccessful and

did not result in the discovery of DPE-containing promoters within the human

Hox gene clusters.

4.2.3. The 'Individual examination' project - further analysis

Based on the individual examination of human Hox genes, there are five

genes that contain a functional Drosophila DPE motif, namely, Hoxa1,

Hoxa2, Hoxa11, Hoxb9 and Hoxc8 (Figure 11B). The next step was to

experimentally define their TSS using primer extension assays on RNA

purified from cells that were transfected with minimal promoter constructs.

However, the levels of the transcripts from the transfected minimal promoters

were very low and we could not detect the extension products.

To overcome this, we designed new reporter constructs, each containing a

larger genomic fragment that encompasses the natural core promoter, as well

as ~ 1000bp of upstream sequence. Our assumption was that this region

would contain binding sites of activators, especially CDX2, and thus the


binding of activators that are expressed in the cells to these regions could

potentially enhance the transcription from the promoters of these five human

Hox genes. After computational analyses (using the 'JASPAR' tool) of putative

transcription factor binding sites in these regions, we constructed new

plasmids, which contain ~1000bp upstream of these minimal promoters, using

PCR with specific primers on human genomic DNA (for wt constructs).

Following the construction of the wt constructs, we used site-directed

mutagenesis to construct the mDPE version of each promoter. For simplicity,

we termed these plasmids 'Enhancer-Promoter' constructs, even though it is

very likely that additional cis-regulatory modules, not necessarily proximal to

these promoters, contribute to their activities. It is noteworthy that the

mutagenesis of the Hoxc8 construct was technically challenging (perhaps due

to its high GC content) and we have only recently managed to generate it. To

test the activities of these constructs, HEK 293 cells were transfected with

either the wt or mD 'Enhancer-Promoter' construct of each of the four human

Hox genes (Hoxa1, Hoxa2, Hoxa11 and Hoxb9). Cells were harvested

48 h following transfection and the extracts were assayed for luciferase

activity (Figure 14).

[Figure 14 bar graph. Y-axis: Relative Luciferase Activity (0 to 2); x-axis: Hoxa1, Hoxa2, Hoxa11, Hoxb9; legend: wt.]

Figure 14. The ‘Enhancer-Promoter’ constructs of human Hox promoters that contain

the Drosophila DPE motif indicate that their activity is dependent on the DPE motif.

The bar graph illustrates a summary of three independent dual luciferase experiments

(performed in triplicate). Mutant DPE-containing enhancer-promoter-driven constructs (mD)

were compared to wild-type (wt) enhancer-promoter-driven constructs. The error bars

represent the standard error of the mean (SEM).

Surprisingly, in contrast to the results of the minimal promoter constructs

(Figure 11B), the mD human Hoxa1 Enhancer-Promoter construct showed

higher reporter activity than the wt construct (Figure 14). Enhanced

luciferase activity can be explained by RNA polymerase II pausing at the

promoter-proximal region of genes. As mentioned above, Drosophila promoters

with paused Pol II are enriched for the presence of the DPE (perhaps due to

its high GC content [87]). Notably, the relative activity of the other constructs

(Hoxa2, Hoxa11 and Hoxb9) was reduced when the DPE motif was mutated.

Nevertheless, the relative luciferase activity of the human Hoxa2 enhancer-

promoter-containing plasmid pair (wt and mD) does not reflect the activity

of the firefly luciferase (FL). Rather, owing to higher RL levels in the Hoxa2 mD

transfected cells relative to the RL levels in the Hoxa2 wt transfected cells, the

normalization of the dual luciferase activity, which is performed by division of

the FL by the RL values, results in reduced levels of mD. The FL levels of

both wt and mD Hoxa2 transfected cells were similar. Hence, the only two

genes that convincingly demonstrated a Drosophila DPE-dependent activity

were the human Hoxa11 and human Hoxb9.
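The normalization effect described for Hoxa2 can be made concrete with a small numerical illustration. The readings below are hypothetical values chosen only to show the arithmetic, not measurements from this study: FL is identical in wt and mD cells, while RL is twofold higher in mD cells.

```python
def fl_over_rl(fl, rl):
    """Normalize a firefly (FL) reading by its Renilla (RL) reading."""
    return fl / rl


# Hypothetical readings: equal FL, but RL is twofold higher in mD cells.
wt_ratio = fl_over_rl(fl=1000.0, rl=100.0)
md_ratio = fl_over_rl(fl=1000.0, rl=200.0)

# Relative mD activity after normalization to wt.
relative_md = md_ratio / wt_ratio
print(relative_md)  # prints 0.5 although FL itself did not change
```

In such a case the apparent reduction reflects the denominator (RL) rather than the promoter-driven FL signal, which is why the Hoxa2 result is not interpreted as DPE dependence.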

Following these findings, we purified total RNA from transfected cells in

order to perform a primer extension assay (performed by Hila Shir-Shapira, a Ph.D.

student in the lab). This assay allows the in-vivo validation of the levels of the


transcripts of the reporter gene under the regulation of the different enhancer-

promoter regions (wt versus mD) of human Hoxa11 and human Hoxb9.

Moreover, this assay reveals the TSSs of the transcripts. We were unable

to detect a signal with total RNA and decided to purify poly A+ RNA from the

transfected cells. Primer extension experiments using poly A+ RNA are

currently being performed and the preliminary results are promising.

4.3. Part II: Identification of a SNP in the +1 position of the Hoxb6 and

its potential implications in health and disease

On one hand, it is known that, similarly to Cdx2 [78], certain human Hox

genes are aberrantly expressed in several solid tumors [56, 93] and, especially, in

different types of blood cancers [56, 58, 76, 77, 79, 80, 94]. On the other

hand, based on evolutionary conservation, we hypothesized that functional

DPE motifs exist in the human genome, particularly in the Hox gene clusters

whose basal transcription is regulated by this motif in Drosophila. Thus, one

possible explanation for the abnormal expression of Hox genes in cancers is

that it may be caused by genetic substitutions (mutations or single nucleotide

polymorphisms; SNPs) in their core promoters or in core promoter elements.

This hypothesis is supported by evidence of many nucleotide polymorphisms

in multiple TATA box elements, which have been associated with multiple

pathologies in humans [95]. In order to examine this suggestion, we searched

for substitutions in the promoter regions (-200 to +200 relative to the known

TSS) of the human Cdx2, Hoxa9, Hoxa10, Hoxb6 and Hoxb7 genes in

samples obtained from blood cancer patients. The promoter regions of these


genes were sequenced by Dr. Julia Starkova from the 'Childhood Leukaemia

Investigation Prague' (CLIP) institute in Prague, who has access to

samples from patients. The above genes were chosen from all the human

Hox and Cdx genes, since their expression in leukemia is aberrant and it has been

suggested that these genes are associated with the development or the

appearance of cancers [80]. In order to distinguish between mutations and

SNPs, we first mapped all the known SNPs (based on the SNP database,

dbSNP of NCBI), in the range of 700bp upstream and 700bp downstream,

relative to the known TSSs of these genes.

Further, for each of the genes, the

sequences of dozens of patients were

examined and compared to the human

genome sequence (from the genome

browser; 'the reference') (Table 4).

We analyzed sequences of samples

obtained from T-cell Acute Lymphoblastic

Leukemia (T-ALL) and Acute Myeloid

Leukemia (AML) patients.

Although we did not detect any mutations

in the sequenced core promoter regions we

examined, several SNPs were identified. Strikingly, we detected a SNP in the

known +1 position (RefSeq) of the Hoxb6 gene (SNP ID in the dbSNP is

rs56805315). In this SNP, Cytosine (C), which is more prevalent, is

substituted by a Thymine (T). Further analysis of the Hoxb6 core promoter

sequence led to several findings (Figure 15):

Gene      T-ALL   AML
Cdx2      60      35
Hoxa9     20      33
Hoxa10    50      34
Hoxb6     71      31
Hoxb7     69      33

Table 4. The distribution of the patients' samples, whose promoter regions were sequenced, by gene and cancer type. The promoter regions (±200bp relative to the known TSS) of five human genes: Cdx2, Hoxa9, Hoxa10, Hoxb6 and Hoxb7, were sequenced from samples of blood cancer patients (T-ALL or AML). The sequencing was done in order to search for substitutions (mutations or SNPs) in these sequences.


a. There is an Adenine (A) at the +2 position, which could be defined as

the +1 position and then the preceding 'C' could be defined as the -1

position (A is typically designated as the +1 even if transcription

initiates within a few nucleotides from it).

b. These two nucleotides constitute a sequence that partially matches the

mammalian Inr consensus (4 out of 7 positions). At the appropriate

spacing of 28 nucleotides downstream of this putative +1, there is a

sequence that partially matches the functional range set of the

Drosophila DPE motif (4 out of 6 positions) (Figure 15, top).

c. Three nucleotides downstream of the abovementioned 'A', there is

another A. This DNA sequence (highlighted in yellow in Figure 15,

bottom) may serve as a +1 of a motif that fully matches the Inr

consensus (7 out of 7). Similarly to the first putative Inr, there is a

sequence that partially matches the DPE motif (4 out of 6) and is located at

the appropriate distance from the putative Inr motif (Figure 15, bottom).

Figure 15. Combinations of putative Inr and DPE motifs, which are found in the core promoter

sequence of the Hoxb6 gene. The combination containing the RefSeq TSS is shown on top and the

alternative combination, which is close to the known TSS and contains a sequence that perfectly

matches the Inr consensus, is shown on the bottom. The SNP (C or T-containing alleles) is colored in

grey; matching positions to the Inr consensus or to the functional range set of the Drosophila DPE are

colored in yellow or purple, respectively. The start and end positions (-5 to +40) are relative to the

RefSeq TSS.


As shown in Figure 15 and as mentioned above, although the Inr of the first

combination of Inr and DPE (Figure 15, top) contains the known TSS of

Hoxb6 and is thus expected to be optimal for transcription, this Inr does not

fully match the consensus. However, the Inr of the second putative

combination of Inr and DPE (Figure 15, bottom) matches the consensus

perfectly. Moreover, it is known that the expression levels of this gene, Hoxb6,

are higher in leukemic cells compared to normal hematopoietic cells [80, 96,

97]. Therefore, it is possible that the high expression of Hoxb6, especially in

blood cancers, is affected, directly or indirectly, by this SNP. Hence, I

propose the following potential mechanism:

Normally, the presence of the canonical Inr, which contains only 4

matching positions, including the most important ones ('C' at -1 and 'A' at +1;

'cA') (Figure 15, top), enables transcription that leads to

appropriate expression levels of the Hoxb6 gene in the cells. However, if the

'C' is replaced by a 'T' due to certain environmental influences or genetic

factors, the strength of the Inr, which now contains a 'tA' instead of a 'cA', is

reduced and thus the transcription levels from this TSS are also reduced. In

general, although both 'C' and 'T' at position -1 match the Inr consensus, 'C' is

more common than 'T'. In such a situation, an alternative TSS might be

preferred. Hence, the basal transcription machinery might recognize the

“second Inr” sequence, which matches the Inr consensus perfectly (Figure 15,

bottom). It is therefore likely that the alternative TSS would be stronger than the

known TSS with a 'T' at -1. Consequently, this alternative usage of a “second

Inr” might result in higher transcription levels of the Hoxb6, which is commonly

associated with blood cancers.


In order to examine this hypothesis, the minimal promoter variants from -10

to +40 relative to the known TSS (wt or SNP) of the Hoxb6 gene were cloned

upstream of the firefly luciferase reporter gene and transfected into HEK-293

cells along with the Renilla reporter plasmid for transfection-efficiency

normalization. The activity of the SNP promoter, which contains the rare allele

('T'), was compared to the activity of the wt promoter, which contains the

common allele ('C'), using dual luciferase assays. As can be seen in Figure

16, the minimal promoter containing the rare allele 'T' (below; SNP) is 8.5-fold

more active than the wt variant. This preliminary finding supports the above

hypothesis but additional experiments, such as primer extension and

tumorigenicity-related functional experiments that use the promoter in the

context of the whole gene, need to be done to strengthen it.
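
The fold activity quoted above follows from the dual luciferase normalization (firefly luciferase, FL, divided by Renilla luciferase, RL). The arithmetic can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, and the readings are invented numbers chosen for the example, not the measured values.

```python
from math import sqrt
from statistics import mean, stdev


def normalize(fl_readings, rl_readings):
    """Divide each firefly (FL) reading by its matching Renilla (RL) reading."""
    return [fl / rl for fl, rl in zip(fl_readings, rl_readings)]


def fold_activity(variant_ratios, wt_ratios):
    """Mean normalized activity of a variant relative to the wt promoter."""
    return mean(variant_ratios) / mean(wt_ratios)


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical triplicate readings (arbitrary luminescence units).
wt = normalize([1000.0, 1100.0, 900.0], [500.0, 550.0, 450.0])
snp = normalize([8500.0, 9350.0, 7650.0], [500.0, 550.0, 450.0])
print(fold_activity(snp, wt))
```

Because RL is the denominator, a difference in RL between wt and variant wells changes the ratio even when FL is unchanged, which is why the raw FL and RL values are worth inspecting alongside the normalized result.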

Figure 16. Substitution of Cytosine to Thymine in the known TSS of the human Hoxb6 gene

enhances the expression levels of the reporter gene that is cloned downstream of it. The bar

graph illustrates a summary of three independent dual luciferase experiments (performed in

triplicate) that compare the minimal core promoter variants: wt ('C' at the known +1 position) or

SNP ('T' at the known +1 position). The error bars represent the standard error of the mean (SEM).


Additionally, in order to estimate the clinical effect of the SNP and

examine whether its presence is correlated with the severity of

blood cancer, Dr. Starkova obtained data from Affymetrix (which manufactures

SNP arrays) indicating that, among 170 healthy individuals, 3.5% are

heterozygous for the rare allele 'T'. Furthermore, according to the 1000 Genomes Project

data, among 120 healthy individuals, 4.9% are heterozygous for the rare

allele. Hoxb6 promoters were sequenced (see above, Table 4) from

102 T-ALL and AML patients in total. The results indicate similar frequencies

of the rare allele among the 102 patients (~ 5%). AML patients displayed a

higher frequency of the rare allele, as compared to T-ALL patients. Obviously,

the number of our patients' samples is insufficient for performing statistical

analyses. Many more samples should be analyzed in order to determine

whether the frequency of the rare allele is higher in patients compared to the

healthy population. We have analyzed the samples that Dr. Starkova has, and

obtaining samples from additional patients is expected to be a lengthy

task. In addition, this SNP is rare, and is barely found in databases

that are currently available to us. Thus, we decided to suspend further analysis

of this SNP until more samples from Dr. Starkova and more databases of

patients' samples become available to us. Notably, we have recently identified

a sequence in the Hoxb6 promoter that partially matches the TATA box

element and is located 22bp upstream of the TSS, according to the RefSeq.

Even though it is not the typical position for a TATA box sequence relative to

the TSS, it may still be functional. Nevertheless, in light of the progress of my

other projects, we have decided to advance them and discontinue the work on

the regulation of the Hoxb6 expression for the time being.


4.4. Part III: Whole genome analysis

4.4.1. Identification of human promoters that contain Drosophila DPE

sequence motifs

Following the identification of the DPE motif in the human Hox gene clusters

as a test case, our search was expanded to other human candidates using

computational and experimental whole genome analysis. Whereas a few

candidates were identified in the human genome by the 'individual

examination' strategy, this approach is not systematic.

This project was performed in collaboration with Hila Shir-Shapira and Anna

Sloutskin, two Ph.D. students in our lab, and Amitay Drummer from Sol

Efroni's lab at Bar-Ilan University.

To enable a systematic strategy, a computational tool, termed

'hDPEsearcher', was developed. The Matlab-based hDPEsearcher code was

developed by Amitay Drummer and was optimized by Anna Sloutskin. The

software searches for putative combinations of Inr and DPE within the human

genome. The combinations were defined by the abovementioned key

sequence criteria (see pages 16-18). Moreover, the detected combinations

are compared with the known RefSeq TSSs of all the annotated human

protein-coding transcripts. The proximity to the TSS and the Inr and DPE

match scores are considered to be indicative of a potential biological function

of the detected DPE motif. EST clones and tags are not automatically taken

into consideration in this software.


The hDPEsearcher analyzes each strand of each chromosome separately,

and its algorithm can be described by the following steps:

1) Search for DPE sequences, where each matching position receives a score of 1.

2) If the DPE score is at least 4 out of 6, search for a mammalian initiator

(starting at position -2) positioned such that the second position of the

detected DPE ('G' or 'C') is located precisely at +29 relative to the A+1

of the initiator.

3) Calculate the mammalian Inr score.

4) An Inr and DPE combination is considered to be a 'hit' only if the

combined score for both is at least 8, i.e., if the DPE is highly

conserved the corresponding Inr can be less conserved and vice versa.
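
The four steps above can be sketched in Python for a single strand. This is an illustration rather than the Matlab hDPEsearcher code itself, and it rests on stated assumptions: the consensus strings (mammalian Inr YYANWYY spanning -2 to +5, and the Drosophila DPE functional range set DSWYVY spanning +28 to +33) stand in for the sequence criteria defined earlier in the thesis, all names are hypothetical, and an 'A' at +1 is scored like any other position rather than strictly required.

```python
# IUPAC nucleotide codes needed by the two consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "W": "AT", "S": "CG",
    "D": "AGT", "V": "ACG", "N": "ACGT",
}

INR_CONSENSUS = "YYANWYY"  # assumed mammalian Inr, positions -2 to +5
DPE_CONSENSUS = "DSWYVY"   # assumed Drosophila DPE range set, +28 to +33
DPE_OFFSET = 27            # index distance from the A+1 to position +28


def match_score(window, consensus):
    """Count the positions of `window` that agree with the IUPAC consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def reverse_complement(seq):
    """Return the opposite strand, so that each strand can be scanned separately."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hits(seq, min_dpe=4, min_total=8):
    """Return (A+1 index, Inr score, DPE score) for each Inr and DPE combination."""
    seq = seq.upper()
    hits = []
    for dpe_start in range(len(seq) - len(DPE_CONSENSUS) + 1):
        # Steps 1-2: score the DPE and keep only matches of at least 4 out of 6.
        dpe = seq[dpe_start:dpe_start + len(DPE_CONSENSUS)]
        dpe_score = match_score(dpe, DPE_CONSENSUS)
        if dpe_score < min_dpe:
            continue
        # The DPE starts at +28, so the A+1 lies 27 bases upstream of it.
        a_plus_1 = dpe_start - DPE_OFFSET
        inr_start = a_plus_1 - 2  # the Inr window starts at position -2
        if inr_start < 0:
            continue
        # Step 3: score the Inr at the single permitted location.
        inr = seq[inr_start:inr_start + len(INR_CONSENSUS)]
        inr_score = match_score(inr, INR_CONSENSUS)
        # Step 4: a hit requires a combined score of at least 8.
        if inr_score + dpe_score >= min_total:
            hits.append((a_plus_1, inr_score, dpe_score))
    return hits
```

Calling `find_hits(reverse_complement(seq))` scans the opposite strand; the indices returned then refer to the reverse-complemented sequence and would need converting back to genomic coordinates.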

In addition to the chromosomal coordinates of the putative Inr and DPE

combinations, the software plots the combinations found against the genomic

coordinates for each chromosome. These graphs describe a rather uniform

distribution of putative Inr and DPE combinations along the chromosome, with

some regions containing peaks in the frequency of Inr and DPE combinations.

However, no significant correlation between the peaks and potential biological

function was discovered.

The complete list of human TSS locations was extracted from the UCSC

table browser, with the help of Dr. Tirza Doniger. Alternative splice variants

starting at the same position were considered a single TSS, denoted by the

official gene name. Putative Inr and DPE combinations found within ±5bp of

known TSSs were considered for further analysis.
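
The two filtering rules just described (collapsing splice variants that share a start position, and keeping only combinations within ±5bp of a known TSS) can be sketched as below. This is a hedged illustration: the function names and the tuple layout are hypothetical, positions are assumed to lie on one chromosome and one strand, and each hit is represented by the coordinate of its putative A+1.

```python
from bisect import bisect_left


def unique_tss(transcripts):
    """Collapse transcripts sharing (chromosome, strand, start, gene) into one TSS."""
    return sorted(set(transcripts))


def hits_near_tss(hit_positions, tss_positions, window=5):
    """Keep the hits whose A+1 lies within `window` bp of any known TSS."""
    tss_sorted = sorted(tss_positions)
    kept = []
    for hit in hit_positions:
        # First TSS at or beyond the lower edge of the allowed window.
        i = bisect_left(tss_sorted, hit - window)
        if i < len(tss_sorted) and tss_sorted[i] <= hit + window:
            kept.append(hit)
    return kept


# Made-up coordinates and gene labels, for illustration only.
transcripts = [
    ("chr1", "+", 103, "GENE_A"),
    ("chr1", "+", 103, "GENE_A"),  # splice variant with the same start
    ("chr1", "+", 300, "GENE_B"),
]
tss = [pos for _, _, pos, _ in unique_tss(transcripts)]
print(hits_near_tss([100, 200, 305, 306], tss))
```

Sorting the TSS list once and using a binary search keeps the lookup fast when the same TSS table is queried for every hit on a chromosome.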

Furthermore, the putative Inr and DPE combinations that were computationally

identified were manually filtered. Specifically, we checked whether there are EST


clones and tags for these combinations (using the genome browser and dbTSS,

respectively) that coincide with promoters of genes that are associated with

specific biological processes, such as development, cell-cycle, proliferation,

apoptosis and differentiation. Following this selection, 11 human genes were

considered to be the most promising candidates for experimental validation.


The genes are: p21, tp53inp2, ccnd1, proS1, twist2, snail1, cdc25a, cdc25b,

cdc34, Hoxb6 and Hoxd13 (see Appendix 6). For each of these genes, wt and

mutant DPE (mD) versions of the minimal promoter (-10 to +40, relative to the A

+1 position of our combination) were cloned into the firefly luciferase pGL3

vector. The generated plasmids were co-transfected into HEK-293 cells, along

with the TK Renilla luciferase reporter plasmid for transfection-efficiency

normalization. The relative activation of the mD version, compared to the wt, was

examined using dual luciferase assays. All the experiments were performed at

least twice and each experiment was done in triplicate.

In contrast to our expectations, no major differences between the

transcriptional activity of the wt and the mD minimal promoters were observed

for any of the analyzed promoters (Figure 17). A close examination of the genes

giving an apparent reduction of expression upon mutation of the DPE (i.e. ccnd1

and proS1) reveals a difference in the activity levels of normalizing RL between

the wt and mD transfected cells, and therefore, the general conclusions seem to

be applicable to all the analyzed promoters.


Figure 17. Multiple human promoters contain matching Drosophila DPE sequences;

however, these motifs are not functional. The bar graph illustrates dual luciferase

experiments that compare between minimal promoter variants. Mutant DPE-containing

promoter-driven constructs (mD) were compared to wild-type (wt) promoter-driven constructs.

Overall, there is no substantial difference in expression between the wt and the mD versions

of the minimal core promoters. Error bars represent standard deviations.

n=4 for p21, ccnd1, twist, and Hoxd13. n=3 for tp53inp2, proS1, cdc25a, cdc25b, and cdc34. n=2 for snail and Hoxb6.

These findings could be explained by experimental design issues and by

the evolutionary distance between Drosophila and humans (see discussion).

It is noteworthy that two more human Hox genes, Hoxb6 and Hoxd13 (this

Hoxb6 transcript has a different TSS as compared to the Hoxb6 TSS that was

mentioned above) were examined in this project, although these candidates

were previously not considered promising enough to have a functional DPE in

the 'Individual examination' project (see Table 1 and its related text). These

findings and this computational project emphasized the need for a

bioinformatics tool that detects core promoter elements in order to facilitate

the identification of core promoter elements.


4.4.2. ElemeNT- a core promoter Elements Navigation Tool

Indeed, there is no available resource allowing the identification of all the

specific core promoter elements and their potential combinations within a

given sequence. Hence, almost every annotation of core promoter elements

in a sequence of interest described above was individually performed. To

automate this process and alleviate the time burden associated with manual

scanning of dozens of sequences at once, Anna Sloutskin and I have

developed the 'core promoter Elements Navigation Tool' (ElemeNT).

A paper describing this work is under review (see Appendix 7).

Briefly, ElemeNT is a web-based, interactive tool (implemented in Perl) for

rapid and convenient detection of core promoter elements and their

combinations within any given sequence.

It is accessible at http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description

(password-protected until publication; username: GershonLab,

password: TJGL2014). ElemeNT searches the input sequences for the

presence of certain core promoter elements specified by the user. The

elements are represented by position weight matrices (PWMs), which are

constructed based on experimentally validated, biologically functional

sequences.

The elements that can be searched for are: mammalian Initiator, Drosophila

Initiator, TATA box, MTE, DPE, Bridge, BREu, BREd, Human TCT,

Drosophila TCT, XCPE1 and XCPE2. The MTE, DPE and Bridge motifs are

only calculated at the precise location relative to each detected

mammalian/Drosophila Initiator, based on the known strict spacing

requirement. The scores are normalized to be between 0 and 1, generating


more intuitively interpretable results. For each element, the user should

specify a threshold between 0 and 1, which determines whether the element

is present or not at a position. Default threshold values were empirically

determined for each element, based on known functional sequence elements,

and are provided.
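
The idea of scoring a window against a PWM, normalizing the score to lie between 0 and 1, and applying a threshold can be sketched as follows. This is not the Perl implementation of ElemeNT: the normalization used here (the product of the position probabilities divided by the best attainable product) is an assumption made for illustration, the function names are hypothetical, and the two-position matrix is a toy example rather than a real core promoter element.

```python
def pwm_score(window, pwm):
    """Score `window` against `pwm`, normalized so a perfect match scores 1."""
    score = 1.0
    best = 1.0
    for base, column in zip(window, pwm):
        score *= column.get(base, 0.0)  # probability of the observed base
        best *= max(column.values())    # probability of the preferred base
    return score / best


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring at least `threshold`."""
    width = len(pwm)
    results = []
    for i in range(len(seq) - width + 1):
        s = pwm_score(seq[i:i + width], pwm)
        if s >= threshold:
            results.append((i, s))
    return results


# Toy two-position matrix (invented probabilities, one dict per position).
toy_pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print(scan("ACT", toy_pwm, threshold=0.9))
```

For elements with a strict spacing requirement (MTE, DPE and Bridge), the same scoring would be applied only to the single window at the required distance from each detected initiator, rather than to every window in the sequence.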

The output of the program contains the analyzed sequence, a color display of

some possible core promoter elements combinations found, and a table

containing each of the detected elements alongside its position, PWM and

consensus match scores. A sample output of the ElemeNT program is

depicted in Figure 18.

In addition to the automation of core promoter elements annotation, the

ElemeNT program uses PWM data, rather than consensus sequence, to

score the putative motifs. The use of the PWM enables a better reflection of

the biological significance of the different nucleotides' distribution at a specific

position, which is hard to account for by manual annotation of sequences.

Notably, for some elements, the PWM differs from the defined consensus,

reflecting differences in the analyses of the sequences by different labs.


Figure 18. A sample output of the ElemeNT program. ElemeNT has detected a TATA box

flanked by both a BREu element and a BREd element, Drosophila and mammalian initiator

elements and MTE, DPE and Bridge elements. Sample input sequence (top), the

combinations of elements identified in it (middle) and a table of the detected elements

(bottom) are shown. The two possible combinations result from a sequence match to both the

Drosophila and mammalian initiators, due to the partial sequence redundancy of the two

elements. The table displays all the elements identified within the sample input sequence,

their location, PWM and consensus match scores. Note the message displayed for the TATA-

box, indicating the presence of mammalian and Drosophila initiator, as well as BREu and

BREd, at optimal distances for transcriptional synergy.


4.5. Part IV: Identification of binding sites of the human TAF6 and

TAF9, subunits of the TFIID in human promoters

Until this point in my study, in order to identify functional DPE motifs in human

promoters, our strategies relied on the strict conditions of spacing and sequence

of the DPE motif as characterized in Drosophila [47-49, 91]. Although

publications have demonstrated the transcriptional dependency of two human

genes on an element that matches the definition of Drosophila DPE [48, 53],

using these strict features, we barely succeeded in identifying other functional DPE-

containing human genes. Nevertheless, one detail was not taken into

consideration: the proteins that bind this element. Originally, the

Drosophila DPE was defined as the recognition and binding site of the TFIID

subunits, TAF6 and TAF9 [47, 48]. More recently, footprinting assays and

structural studies of the human TFIID multi-complex (hTFIID) demonstrated that

the hTFIID binds to the TATA box, Inr, MTE and DPE sequences, which are

contained in the synthetic super core promoter [25, 98].

Hence, as an alternative approach towards the identification of human DPE-

containing promoters, we decided to identify the binding sites of human TAF6

and human TAF9 in human promoters. In contrast to the previous strategies, this

strategy is independent of the spacing requirement, location and sequence of

the Drosophila DPE. To this end, human stable cell lines were generated by the

Flp-In™ system (Figure 19), in order to subsequently perform a high-resolution

Chromatin Immuno-Precipitation (ChIP) assay. The Flp-In system allows a single-

allele, single-site integration event of a plasmid of interest (see for example

[99]).


Figure 19. Illustration of the Flp-In™ system. HEK-293 Flp-In cells are co-transfected with

the pcDNA5/FRT vector, which contains a gene of interest (i.e. the tagged human taf6 or

human taf9 genes), and the pOG44 plasmid (a Flp-recombinase expression plasmid). Flp-

recombinase catalyzes homologous recombination between the FRT site in the pcDNA5/FRT

vector and the FRT site, which is located at a specific known locus in the genome of the cells.

Transfected cells express the desired gene and become Hygromycin B resistant and Zeocin

sensitive. This integration breaks the ORF of the lacZ gene that has been integrated in the

genome of the HEK-293 Flp-In parental cells.

Four constructs were generated in order to create four different stable cell

lines expressing one of the four tagged-TAF protein versions. Each of the TAF

cDNAs (TAF6 or TAF9) was cloned into the pcDNA5/FRT vector with two short

tandem protein-tags, FLAG and HA, which were cloned in frame immediately

upstream or downstream (N- or C- terminal, respectively) of the TAFs. Tagging

was required because ChIP-grade antibodies against the TAFs were unavailable

to us.


The four plasmids are:

a. TAF6 with a C-term. FLAG-HA tag.

b. TAF6 with an N-term. FLAG-HA tag.

c. TAF9 with a C-term. FLAG-HA tag.

d. TAF9 with an N-term. FLAG-HA tag.

Each of the above plasmids was co-transfected with the pOG44 plasmid into

HEK-293 flp-in cultured cells. The pOG44 plasmid expresses the flp-

recombinase enzyme, which catalyzes homologous recombination between the

two FRT sites. One FRT site is located just downstream of the tagged-TAFs in

the pcDNA5/FRT vector. The second FRT site is located within the ORF of the

lacZ gene that is integrated in the HEK-293 flp-in genome. Consequently, the

tagged-TAFs should be integrated specifically between these two FRT sites

(Figure 19). Starting one day post transfections, cells were grown in

Hygromycin-containing medium to select for stably integrated clones.

In order to examine the proper integration of the tagged-TAFs-containing

plasmids into the dedicated site in the genome, three distinct tests were

performed. The first test was whether the cells express β-galactosidase, which is

encoded by the lacZ gene. Generally, when the β-galactosidase catalyzes

hydrolysis of its chromogenic substrate X-gal, the cells are colored in blue.

However, because the integration site is located within the lacZ gene, the

insertion of the tagged-TAF-pcDNA5/FRT plasmid version should disrupt the

ORF of lacZ. Indeed, stable cell lines with a successful integration event for each

tagged-TAF variant remained mostly white in the X-gal assay

(Figure 20).


[Figure 20 image. Panels A-C: TAF6 with a C-term. FLAG-HA tag; panels D-F: TAF6 with an N-term. FLAG-HA tag; panels G-I: TAF9 with a C-term. FLAG-HA tag; panels J-L: TAF9 with an N-term. FLAG-HA tag; panel M.]


We next used PCR to test for the presence of two FRT sites in the stable

cell lines. Genomic DNA of each of these stable cell lines was extracted and

used as a template for PCR with specific primers for the FRT sites (Figure 21).

Each amplified PCR product was gel-purified and sequenced.

Figure 20. TAF6 and TAF9 are stably integrated in the stable cell lines, as determined by the

X-gal assay. Photos of three different areas of a well of each of the four Hygromycin-resistant stable

cell lines taken following three weeks of antibiotic selection and after fixation and incubation with an

X-gal-containing solution.

(A-C) TAF6 with a C-term. FLAG-HA; (D-F) TAF6 with an N-term. FLAG-HA; (G-I) TAF9 with a C-

term. FLAG-HA; (J-L) TAF9 with an N-term. FLAG-HA. (M) A positive control of untransfected

parental HEK-293 flp-in cells.

White cells indicate successful integration, whereas blue cells indicate unsuccessful integration.

Figure 21. Two FRT sites are present around the integration site in the stable cell lines. In

order to verify successful integration into the HEK-293 flp-in cells, genomic DNA was PCR-amplified

with specific primers for each one of the FRT sites. A. Extracted genomic DNA from each of the

stable cell lines was run on a 1% agarose gel. As expected, the genomic DNA appeared at the top of

the gel. M, 1Kb DNA Marker. B. PCR using SV40 promoter- and Hygromycin gene-specific primers

to detect the FRT site that is located upstream of the tagged-TAFs-pcDNA5/FRT integrated

plasmids. The expected product size is marked by a white arrow. M, 1Kb Marker. C. PCR using BGH

poly A sequence- and lacZ gene-specific primers for the FRT site that is located downstream of the

tagged-TAFs-pcDNA5/FRT integrated plasmids. The expected product size is marked by a white

arrow. M1, 1Kb DNA Marker. M2, 100 bp DNA Marker.

N.T, No Template control. P.C, Positive-Control - a known sample that was previously tested. N.C,

Negative-Control - untransfected parental HEK-293 flp-in cells.


We next examined whether the four stable cell lines express each of the

tagged-TAFs by Western-blotting (WB). Whole cell extracts were probed with anti-FLAG,

anti-HA, anti-TAF6 or anti-TAF9 antibodies (Figure 22). TAF6 was detected in all

the four stable cell lines. Overexpressed TAF6 was detected using either anti-

FLAG, anti-HA or anti-TAF6 antibodies (Figure 22A, B). Notably, we could detect

endogenous TAF6 in the TAF9-overexpressing cell lines (Figure 22B). TAF9

was barely detectable by WB using the abovementioned antibodies (Figure 22A,

C).

Figure 22. The stable cell lines express tagged-

TAF6 and TAF9 protein variations.

A. Western-blot (WB) analysis using anti-HA (left

panel) or with anti-FLAG (right panel) antibodies.

The tagged-TAF6 versions are detected.

B. WB analysis using anti-TAF6 antibodies. TAF6

versions are over-expressed in extracts of the

tagged-TAF6 containing cell lines.

TAF6 is detected at ~75kDa.

C. WB analysis using anti-HA antibodies against

protein lysates from tagged-TAF9 stable cell lines. A

~33kDa band (as expected) is detected in N-

terminally tagged TAF9 cells and a similar, but very

weak band, is detected in the C-terminally tagged

TAF9 cells.

C' and N' represent the position of the tandem

tags (FLAG and HA) relative to the TAF6 or TAF9

proteins. M, protein marker (kDa).


To further examine the expression of TAF9, we immuno-precipitated cell

extracts using anti-FLAG beads and subjected the precipitates to WB analysis using anti-FLAG

antibodies.

Unfortunately, we could not detect TAF9 following immunoprecipitation (IP)

(data not shown). The difficulty in detecting TAF9 proteins (both the tagged and

endogenous TAF9) may be explained by the biology of TAF9 and its function

that is independent of basal transcription (see discussion). Nevertheless, we

next focused on the ChIP experiments.

I next generated a ChIP-seq protocol for HEK-293 flp-in cells by combining

different published protocols. Experimental optimization of the ChIP-seq protocol

was done (data not shown). Nonetheless, we initiated a collaboration with Dr.

Julia Zeitlinger's lab (Stowers Institute for Medical Research).

This collaboration allows us to use a novel, as yet unpublished, ChIP method with

a higher resolution than traditional methods such as ChIP-seq. This method,

termed ChIP-nexus, was developed in the Zeitlinger lab to detect transcription

factor binding sites in vivo with nucleotide resolution.

We shipped the four stable cell lines to the Zeitlinger lab and they are

currently performing ChIP-nexus, as well as ChIP-seq experiments on them,

using anti-FLAG and anti-HA antibodies.


5. Discussion

In search of functional human DPE motifs – analysis of human

Cdx/Hox promoters and computational whole genome analysis

Individual examination of the promoters of human Cdx and Hox genes was

performed. Dual luciferase experiments on both minimal promoter and

enhancer-promoter constructs demonstrated that several Hox promoters

contain functional DPE motifs that contribute to transcription from these

promoters. Moreover, similar results were obtained when the TATA box and

DPE sequences of Cdx1 or DPE sequence of Cdx2 promoters were mutated.

These observations can be explained by experimental evidence that the TATA

box, Inr and DPE serve as binding sites for different components of the TFIID

multi-complex, a major PIC component [20, 47-50]. Hence, upon mutation of

these motifs, the basal transcription machinery binds the promoter only through

a sub-optimal Inr motif, which leads to an unstable architecture of the PIC, which in

turn, results in reduced transcription.

Additionally, sequence analysis of the core promoter regions of all the human

Hox transcripts indicated that the vast majority of the human Hox promoters

contain DPE sequence motifs but not TATA box sequences.

These functional and in silico findings support the notion that the Hox genes,

despite duplication and divergence events during evolution, are mostly

conserved. In addition to the known conservation of the Hox genes, which was

previously shown only at the protein level by comparisons of the amino acid

composition of the homeodomains of different genes, our analyses demonstrate conservation

at the DNA level, and more importantly, in the promoters of these genes.


Based on these findings, several implications may be suggested. First,

synchronized and cooperative regulation of the expression of Hox genes in

development and differentiation could be achieved by a shared transcriptional

mechanism driven by specific master regulators (e.g. CDX protein family).

Second, as a complementary view, it suggests that similarly to the Drosophila

Hox genes [50], human Hox genes may be preferentially regulated through

specific core promoter elements, such as the DPE.

However, we have shown that promoters of other Hox as well as other human

genes, which contain the Drosophila DPE sequence, did not contain functional

DPE motifs. Technically, the constructs used in the dual luciferase assays only

contained the minimal promoter comprising 50 nucleotides, and may thus not be

able to present the actual transcriptional difference between wt and mutant DPE

constructs, if it is minor. Moreover, this assay measures the transcriptional

activity indirectly, by quantifying the enzymatic activity of the reporter protein that

is produced from the gene under the control of these promoters. Hence, it only

detects products of transcripts that were correctly translated into the active

enzyme, but not all the transcripts that were transcribed from the reporter gene.

Therefore, the measurements could be affected by the luciferase mRNAs

processing and stability and by the degradation of unfolded firefly luciferase

proteins.

The 'PCP project' defined Inr and DPE combinations in the human Hox

clusters computationally, based on the sequences of Drosophila Hox promoters.

Unfortunately, we did not identify any new functional DPE-containing human


genes using this approach. Notably, it is known that the TSSs in the human

genome, which are annotated in the genome browser, are not always accurate, as seen

for the updated TSS of Cdx2. Furthermore, recent studies revealed transcription

initiation in enhancers [100-104] and a recent paper from the Lis lab indicated a

shared architecture between promoters and enhancers [105, 106]. Thus, some

of the final PCPs may theoretically encompass real TSSs. Importantly, these

results emphasize that transcription, even at the basal level, is regulated by

multiple factors in addition to sequence composition. Notably, in contrast to the

Hox genes, the putative DPE-containing promoters of the genes that were

analyzed in the 'whole genome' project are less conserved from Drosophila to

humans.

Hence, despite the identification of a few human genes that conform to the

Drosophila DPE definition, Drosophila and humans are still evolutionarily distant,

and the precise functional Drosophila DPE sequence might have evolved over

time to represent a modified set of nucleotides in humans. Subsequently, the

strict requirements of functional initiator and precise spacing present in

Drosophila, might be altered. These conclusions were the motivation to develop

an assay that would be independent of the Drosophila DPE definition and its

sequence features (see Part IV of the Results and discussion below).


Identification of a SNP in the +1 position of the Hoxb6 and its potential

implications in health and disease

Screening of dozens of core promoter sequences of T-ALL and AML patients,

revealed an interesting SNP in the human Hoxb6 gene. Based on dual luciferase

experiments, this SNP (cytosine to thymine), which is located in the known +1

position of the Hoxb6 transcript, enhanced the reporter activity about 850% as

compared to the frequent allele. It can be speculated that this substitution

contributes to the overexpression of Hoxb6, a frequent phenotype in the blood

cancers mentioned above. We have proposed a model of how this SNP can

contribute to the utilization of an alternative TSS with a potentially stronger Inr

and DPE combination. Notably, there are many examples in which the aberrant

expression of genes that regulate embryogenesis has been implicated in

carcinogenesis [56]. The DPE element has been associated with developmental

genes and development regulators. Thus, it could be speculated that not only

this SNP and others have a role in cancer development, but the DPE might as

well. If so, along with previous data regarding TATA box sequence aberrations

and their correlation with diseases [95], this part of my thesis strengthens the

clinical implications of the core promoter elements.


Identification of binding sites of the human TAF6 and TAF9, subunits of

the TFIID in human promoters

In order to characterize the DPE motif in human promoters, independently of

the strict requirements of the Drosophila DPE motif, tagged human TAF6 and

human TAF9 stable cell lines were generated in the last stage of this work. To

validate the integration and expression of the TAFs, three tests were performed.

While the X-gal and the integration tests were successful for all four cell lines,

the protein expression data indicated high TAF6 overexpression, but mostly

undetectable tagged and endogenous TAF9 expression (Figure 22). There are

several explanations that may account for our inability to detect TAF9. First, the

half-life of TAF9 is shorter than the half-life of TAF6. Thus, it may be difficult to

detect TAF9 proteins by WB, although TAF9 was previously detected by WB

[107]. Moreover, the FLAG-HA tags at the N- or C-terminus may also influence

the stability of TAF9 protein more than the stability of TAF6 protein because the

TAF9 protein is smaller than TAF6 protein.

Second, it was shown that TAF9 protein contributes to the tumor suppression

activity of p53 through protein-protein interactions [108, 109]. Thus, a plausible

(yet not highly likely) explanation for the poor detection of TAF9 in the stable

cell lines is that TAF9 is degraded in HEK-293 flp-in cells, which have an abnormal

karyotype and were previously shown to generate tumors in nude mice [110].

Moreover, to identify the hTFIID complexes at active promoters, recent ChIP-

chip experiments were performed on human embryonic stem cells (hESCs) and

revealed non-canonical hTFIID complexes, which are only composed of six

TAFs, containing TAF6 but not TAF9 [111]. Therefore, the endogenous TAF9

may not be expressed in some human embryonic-origin cells. This, however,


does not account for the fact that we were unable to detect the ectopic TAF9

driven by a CMV enhancer-promoter.

Nevertheless, all four stable cell lines have been used for ChIP-nexus

experiments that allow the identification of the DNA binding sites of the ChIPed

TAFs at single nucleotide resolution. With this method, it will be possible, for the

first time, to characterize de novo a core promoter element (a human DPE) that

is the functional equivalent of the Drosophila DPE motif, in a manner that is

independent of the original DPE.

To conclude, in this thesis, which is composed of five related, but

independent projects, we aimed to identify and characterize a DPE motif in

the human genome. Based on previous knowledge about the original DPE

motif that has been extensively studied in Drosophila, through computational

and experimental evidence using the human Hox genes as a test case and

finally by the ChIP-nexus, the hDPE can be regarded as a human core

promoter element with potential links to gene expression in blood cancers.


6. Materials and Methods

Plasmids received

1. TK Renilla luciferase- This plasmid contains the Renilla luciferase gene

that is controlled by the Thymidine Kinase promoter. This plasmid was

received from the lab of Prof. Yaron Shav-Tal.

2. pBlueScript- A commercial plasmid (Stratagene).

3. pGL3 modified basic- This is the commercial plasmid of pGL3 basic

(Promega) but with a different multiple cloning site. 'Basic' means that

enhancer and promoter do not exist in this plasmid.

4. pOG44- A commercial plasmid (Life Technologies).

Construction of Plasmids

1. Cdx1-pGL3- cloning the core promoters (-40 to +40 relative to the TSS)

versions of Cdx1 (wt, mTATA, mDPE, mTATAmDPE) into the pGL3

modified basic vector using a “Drop-In” procedure. In order to generate

this plasmid, the Cdx1 promoter was divided into two parts that would

later be ligated to each other and to the vector at the same time ('three

way ligation' manner). The designed primers (ordered from IDT)

include a KpnI compatible sticky site at the 5' end of the upstream part,

an SpeI compatible sticky site at the 3' end of the downstream part and

cohesive sticky ends between the two parts. Construction was verified

by DNA sequencing (Hy Labs).

2. Cdx2/Hox/other promoters-pGL3- cloning the minimal promoter (-10 to

+40 relative to the TSS) versions (wt and mDPE or wt and SNP for

Hoxb6 only) of the genes: Cdx2, Hoxa1, Hoxa2, Hoxa9, Hoxa11,

Hoxb3, Hoxb6, Hoxc6, Hoxc8, Hoxd3, Hoxd9, Hoxd10, Hoxd13, twist2,


snail1, cdc25a, cdc25b, cdc34, proS1, ccnd1, p21 and tp53inp2 into

the pGL3 modified basic vector using a “Drop-In” procedure (primers

were ordered from IDT). The minimal promoter versions of Hoxa6 and

Hoxb9 genes were cut and purified with PstI and XbaI restriction

enzymes from pUC119 plasmids that contained these minimal

promoters (previously constructed by Dr. Juven-Gershon).

Construction was verified by DNA sequencing (Hy Labs).

3. Enhancer-promoter wt Hox-pGL3- generation of constructs of this type,

which include the promoter region of five different Hox genes (Hoxa1,

Hoxa2, Hoxa11, Hoxb9 and Hoxc8) with ~1kb upstream region, was

done using PCR on genomic DNA that was purified from HEK-293

cultured cells with gene-specific primers. The PCR primers for each

Hox gene are listed below (cloned positions are noted):

Hoxa1 (from -987 to +145 relative to the known TSS)

- Forward primer: 5'- CTCCTACCCCTAAAAATCCGGCGGTC -3'

- Reverse primer: 5'- ACTGCTAAGTATGGGGTATTCCAGGAAGGA -3'


- Forward primer: 5'- CTTTCTCCATCTCTCAAACTCTCTCTTCTTC -3'

- Reverse primer: 5'- CGCTGCTAGGGTGTTTTTTTTCTAATTCAC -3'


- Forward primer: 5'- GATCCCGGGTAAGACGAAGGCCCT -3'

- Reverse primer: 5'- CAGGGACCACGCTCATCAAAATCCATT -3'


Hoxb9 (from -986 to +69 relative to the known TSS)

- Forward primer: 5'- GTGGCCTTAACCCTTTCTCCTATTTAGCTCCCTCATCAG -3'

- Reverse primer: 5'- CACCCCCTGCTCAACTTCTCAGCCAACAAAGTA -3'

Hoxc8 (from -963 to +141 relative to the known TSS)

- Forward primer: 5'- CCAGCTAGAAACCAGGGACACACAGCT -3'

- Reverse primer: 5'- CTCACGAGTACCCCGCCCAGTACC -3'

PCR products were first cloned into the pJET 2.1 blunt vector and then

transferred into the pGL3 vector using restriction enzymes.

Construction was verified by DNA sequencing (Hy Labs).

4. Enhancer-promoter mDPE Hox-pGL3- Constructs were generated by

site-directed mutagenesis following Stratagene's QuikChange

protocol. Complementary primers (IDT) included nucleotide

mismatches to the DPE motif and additional sequences that surround

them. The enhancer-promoter wt Hox -pGL3 plasmids served as

templates for the mutagenesis reaction. Following the mutagenesis

PCR, tubes were incubated with DpnI to digest the template plasmids.

DNA sequence-verified fragments that encompass the mutated

nucleotides were sub-cloned into their corresponding locations in the

wt vectors.

5. FLAG-HA tagged TAF6/9 versions-pcDNA5/FRT- this set of plasmids

contains four types of inserts: a) TAF6 with a C-term. FLAG-HA tag, b)

TAF6 with an N-term. FLAG-HA tag, c) TAF9 with a C-term. FLAG-HA

tag and d) TAF9 with an N-term. FLAG-HA tag. In order to preserve the

correct open-reading frame that includes the tags- and the TAFs-types,

construction of these plasmids was done in two steps. First, I cloned


the FLAG-HA encoding sequences with an addition of Methionine

codon or stop codon into the pcDNA5/FRT vector by the “Drop-In”

procedure. These inserts are flanked by NheI and KpnI or KpnI and

XhoI sites (For sequences of the tags and related details, see

Appendix 8). Next, I set up PCR reactions on pET-taf6/taf9-containing

vectors (received from Prof. Rivka Dikstein, Weizmann Institute,

Rehovot), with primers that lack a Methionine codon (for N-term. tags)

or lack a stop codon (for C-term. tags). The primers are:

- forward TAF9 C-term. tags: 5'- GCTAGCATGGAGTCTGGCAAGACG -3'

- reverse TAF9 C-term. tags: 5'- GGTACCCAGATTATCATAGTCATCATCATCATCAT -3'

- forward TAF9 N-term. tags: 5'- GGTACCGAGTCTGGCAAGACGGCTT -3'

- reverse TAF9 N-term. tags: 5'- CTCGAGTTACAGATTATCATAGTCATCATCATCATCATCGTC -3'

- forward TAF6 C-term. tags: 5'- GCTAGCATGGCTGAGGAGAAGAAGCTGAAGCTTAGC -3'

- reverse TAF6 C-term. tags: 5'- GGTACCCGGAGCAGGCTGAGGGGA -3'

- forward TAF6 N-term. tags: 5'- GGTACCGCTGAGGAGAAGAAGCTGAAGCTT -3'

- reverse TAF6 N-term. tags: 5'- CTCGAGTCACGGAGCAGGCTGAGG -3'

GCTAGC- NheI site, GGTACC-KpnI site and CTCGAG – XhoI site. Highlighted in Red are

stop codons.

The PCR products were cloned into the pcDNA5/FRT with FLAG-HA

tags using the abovementioned restriction enzymes. The plasmids are

schematically illustrated in Appendix 8(B-E).
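The design of the TAF6/TAF9 primers listed above can be sanity-checked with a short script. The following is a minimal sketch in Python, not part of the original workflow: it reads the first six nucleotides of each primer as the restriction site and the next three as the first codon contributed by the primer (reverse-complemented for reverse primers). The dictionary keys, the helper names and the assumption that the codon immediately follows the site are illustrative only.

```python
# Primer sequences (5' to 3') copied from the list above.
# The dictionary keys are hypothetical labels used only in this sketch.
PRIMERS = {
    "TAF9_C_fwd": "GCTAGCATGGAGTCTGGCAAGACG",
    "TAF9_C_rev": "GGTACCCAGATTATCATAGTCATCATCATCATCAT",
    "TAF9_N_fwd": "GGTACCGAGTCTGGCAAGACGGCTT",
    "TAF9_N_rev": "CTCGAGTTACAGATTATCATAGTCATCATCATCATCATCGTC",
    "TAF6_C_fwd": "GCTAGCATGGCTGAGGAGAAGAAGCTGAAGCTTAGC",
    "TAF6_C_rev": "GGTACCCGGAGCAGGCTGAGGGGA",
    "TAF6_N_fwd": "GGTACCGCTGAGGAGAAGAAGCTGAAGCTT",
    "TAF6_N_rev": "CTCGAGTCACGGAGCAGGCTGAGG",
}

SITES = {"GCTAGC": "NheI", "GGTACC": "KpnI", "CTCGAG": "XhoI"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_primers(primers):
    """Map each primer name to (restriction site, coding-strand codon next to the site)."""
    report = {}
    for name, seq in primers.items():
        site = SITES.get(seq[:6], "unknown")
        codon = seq[6:9]
        if name.endswith("_rev"):
            # Reverse primers anneal to the coding strand, so the codon is
            # read on the coding strand after reverse-complementing.
            codon = reverse_complement(codon)
        report[name] = (site, codon)
    return report


if __name__ == "__main__":
    for name, (site, codon) in check_primers(PRIMERS).items():
        if codon == "ATG":
            note = "start codon"
        elif codon in STOP_CODONS:
            note = "stop codon"
        else:
            note = "no start/stop codon"
        print(f"{name}: {site}, {codon} ({note})")
```

Under these assumptions, the forward primers for the C-terminal tags carry an ATG directly after the NheI site, the reverse primers for the N-terminal tags carry a stop codon directly after the XhoI site, and the remaining primers carry neither, consistent with the description above.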


Cultured cells

Both Human Embryonic Kidney (HEK) -293 cells and HEK-293 flp-in cells

were cultured in DMEM with high glucose supplemented with 10% FBS, 1%

L-Glutamine, 1% Pen-Strep, 0.2% Amphotericin (Biological Industries) at 37°C

with 5% CO2. HEK-293 flp-in TAF6 and TAF9 stable cells were grown in

media containing 85 µg/ml of Hygromycin. The HEK-293 cells were received

from Prof. Ronit Sarid and Dr. Jeremy Don, and the parental HEK-293-flp-in

cell line was received from Prof. Yaron Shav-Tal.

Transfections

1. Transient transfections- One day prior to transfections, 0.8 × 10^6 HEK-

293 cells were plated in each 60mm dish to obtain 60%-80%

confluence on the following day. Cells were transfected by calcium

phosphate with 3 µg total DNA per dish (see details in the dual

luciferase section below). Just prior to transfections, the medium was

replaced with medium containing 0.1% Chloroquine, which prevents

lysosomal degradation and improves the transfection efficiency. After

six to eight hours, the medium was replaced with a new medium

without Chloroquine, and after forty-eight hours the cells were

harvested for reporter activity assays or for RNA purification.

2. Stable transfections- HEK293-flp-in cells, which contain FRT site in

their genome, were co-transfected with each of the taf6/9 tagged

plasmids, and with the pOG44 Flp recombinase expression plasmid at a

ratio of 1:9, respectively (this ratio in favor of the pOG44 plasmid is

used to increase the chances of Flp-mediated recombination into the


FRT site, rather than random integration in the genome). One day prior

to co-transfections, 2-3 × 10^6 cells were plated in 100mm plates to

obtain 60%-80% confluence on the following day. Cells were

transfected using calcium phosphate with 10 µg DNA in total (1 µg of

each of the different taf 6/9 expression vectors and 9 µg of pOG44

plasmid) per plate. Just prior to transfections, the medium was

replaced with medium containing 0.1% Chloroquine. After six to eight

hours, the medium was replaced with a new medium without

Chloroquine. Starting one day after the transfections and during the

next three weeks, growth medium was supplemented with 85 µg/ml of

Hygromycin B. Hygromycin-resistant colonies were transferred to

larger plates and finally to 75 cm2 flasks. To verify genomic integration,

cells were tested for lack of β-gal expression. Colonies in which the

majority of cells were white in the X-gal test were taken for further

analyses. Genomic integration was also tested by PCR. Cell extracts

were analyzed by western blotting in order to detect the expression of

the tagged-TAFs in the stable cell lines.

X-gal staining (lacZ assay)

The four stable cell clones, as well as the parental HEK-293 flp-in cells (as

a negative control) were grown in 24 well plates for 24 h. After 24 h, the

cells were washed in PBS x1, fixed with 3.7% formaldehyde for 5 minutes,

washed again with PBS x1 and incubated in X-gal solution (containing

5mM K3Fe(CN)6, 5mM K4Fe(CN)6, 2mg MgCl2 and 1mg/ml X-gal) at

37°C for at least three hours.


PCR amplification: detection of FRT sites in the stable cell lines

Genomic DNA was extracted from the four stable cell clones as well as from

the parental HEK-293 flp-in cells (as a negative control) using the Archive

Pure DNA Cell/Tissue kit (5 PRIME). The genomic DNA was used as a

template for two separate PCRs. The first PCR reaction was done with a

forward SV40 promoter primer and a reverse Hygromycin-resistance gene

primer (Hygro). The second PCR reaction was done with a forward BGH

polyA primer and a reverse lacZ-Zeocin gene primer (lacZ). PCR products

were run on an agarose gel and the amplified DNA was extracted using the

Nucleo Spin PCR clean-up Gel extraction kit (Macherey Nagel). The amplified

DNA was sequenced using the same primers of the PCR reactions. The

primers are:

- SV40 promoter (forward primer): 5'- CCAGTTCCGCCCATTCTCC -3'

- Hygro (reverse primer): 5'- CTGTTATGCGGCCATTGTCC -3'

- BGH polyA (forward primer): 5'- CGAGTCTAGAGGGCCCGTTTAAAC -3'

- lacZ (reverse primer): 5'- GTAACCGTGCATCTGCCAGTTTG -3'

Western-blotting

Protein extracts from the stable cell lines, which express TAF6- or TAF9-

FLAG-HA versions were run and separated by electrophoresis in

SDS-polyacrylamide gels (concentration of 12%-15%) and transferred to

nitrocellulose membrane (GE Healthcare). For western blotting, the

membranes were incubated with different primary antibodies: αFLAG-M2

(Sigma), αHA11.1 (Covance), αTAF6 or αTAF9 (from Laszlo Tora's lab).

Washes of the primary antibodies were done with 5% milk PBS-T. The


membranes were then reacted with the secondary antibodies, goat-anti-

mouse-HRP. Washes of the secondary antibodies were done with PBS-T

(without 5% milk). The detection of the TAF6 and TAF9 proteins was done by

the EZ-ECL Kit (Biological Industries).

Dual luciferase analysis

Each transient transfection was done in triplicate. The total amount of DNA

per dish was 3 µg. This amount of DNA consisted of: 2.5 µg of the pGL3

reporter constructs, 0.1 µg of TK Renilla (RL) and 0.4 µg of the pBlueScript

vector for completion to 3 µg total DNA. Cells were harvested 48 hours post

transfection and extracts were analyzed for both firefly and Renilla luciferase

activities using the Synergy instrument (BioTek). To normalize for variations in

transfection efficiency, firefly luciferase values for each plate were divided by

the Renilla luciferase values.
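The normalization described above can be illustrated with a short calculation. This is a minimal sketch in Python under stated assumptions: the readings are made-up numbers, not data from this work, and the function names, the use of the mean and standard deviation of the triplicate, and the expression of the mutant as a fraction of the wt are illustrative choices rather than the analysis actually performed.

```python
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Divide each firefly reading by the Renilla reading of the same dish."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per dish")
    return [f / r for f, r in zip(firefly, renilla)]


def summarize(firefly, renilla):
    """Return (mean, standard deviation) of the normalized replicate values."""
    ratios = normalized_activity(firefly, renilla)
    return mean(ratios), stdev(ratios)


def relative_activity(wt_mean, mutant_mean):
    """Express the mutant construct as a fraction of the wt construct."""
    return mutant_mean / wt_mean


if __name__ == "__main__":
    # Hypothetical triplicate readings (arbitrary light units).
    wt_firefly, wt_renilla = [1200.0, 1100.0, 1300.0], [400.0, 400.0, 400.0]
    mdpe_firefly, mdpe_renilla = [300.0, 350.0, 250.0], [200.0, 200.0, 200.0]

    wt_mean, wt_sd = summarize(wt_firefly, wt_renilla)
    mdpe_mean, mdpe_sd = summarize(mdpe_firefly, mdpe_renilla)
    print(f"wt: {wt_mean:.2f} +/- {wt_sd:.2f}")
    print(f"mDPE: {mdpe_mean:.2f} +/- {mdpe_sd:.2f}")
    print(f"mDPE relative to wt: {relative_activity(wt_mean, mdpe_mean):.2f}")
```

Dividing per dish, before averaging, keeps each firefly value paired with the Renilla value from the same transfection, so dish-to-dish differences in transfection efficiency cancel out.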

RNA extraction

Transfected HEK-293 cells were washed with PBSx1 and were harvested for

RNA production using the Trizol reagent (Invitrogen) or the PerfectPure RNA

Cell & Tissue kit (5 PRIME). The levels of extracted RNA were tested by a

nanodrop spectrophotometer. RNA was also run on agarose gels to assess its

quality. The RNA was used as a template for primer extension analysis.

Primer extension analysis

Total RNA, which was extracted from transfected HEK 293 cells, was used as

a template for cDNA synthesis by the MMLV Reverse Transcriptase


(Promega) using a specific 32P end-labeled primer. In order to accurately

identify the transcripts' start sites, 2 µg of each transfected plasmid was

sequenced with the same primer used for the primer extension reaction, by

the Sequenase Version 2.0 DNA sequencing kit (USB). The samples were run

on an 8% polyacrylamide-urea DNA sequencing gel. The gel was dried at

80°C under vacuum and exposed overnight to a PhosphorImager screen (GE

Healthcare). The primer extension reaction was done on 15-30 µg of total

RNA. The labeled primer is complementary to the firefly luciferase gene.

Computational software and Bioinformatic tools

1. DAC software- The 'Determinacy Analysis Chain' Software is a

program that uses a search algorithm that refers to texts (i.e. DNA

sequence) as a sequential string of letters. This tool finds the positions

in a sequence that match the search (using the combinations of bases

that are contained in the different IGms).

2. hDPEsearcher Software- This tool is a MATLAB-based software, which

was developed by Amitay Drummer, Anna Sloutskin and me. The script of

this software contains pre-determined instructions for searching Inr and

DPE combinations in the human genome (for more details, see above:

section 4.4.1.).
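The kind of search that both tools perform, scanning a DNA string for an Inr and a DPE at a fixed spacing, can be sketched as follows. This is a minimal illustration in Python and not the DAC or hDPEsearcher code: the function names are hypothetical, and the motif strings (`TCAKTY` for the Inr, `RGWYV` for the DPE starting at +28) are commonly cited Drosophila consensus sequences used here only as placeholder assumptions, not the search criteria defined in this thesis.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Convert a degenerate IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_inr_dpe(seq, inr="TCAKTY", dpe="RGWYV", inr_plus1_offset=2, dpe_start=28):
    """Return 0-based indices of candidate +1 positions with a correctly spaced DPE.

    inr_plus1_offset is the index of the +1 nucleotide inside the Inr motif;
    dpe_start is the position of the first DPE nucleotide relative to +1.
    Both defaults are assumptions made for this sketch.
    """
    seq = seq.upper()
    # A lookahead lets overlapping Inr matches be reported as well.
    inr_re = re.compile("(?=%s)" % iupac_to_regex(inr))
    dpe_re = re.compile(iupac_to_regex(dpe))
    hits = []
    for match in inr_re.finditer(seq):
        plus1 = match.start() + inr_plus1_offset
        dpe_index = plus1 + dpe_start - 1  # +1 itself counts as position 1
        if dpe_re.match(seq, dpe_index):
            hits.append(plus1)
    return hits


if __name__ == "__main__":
    # Synthetic test sequence: Inr at index 2, DPE placed at +28 relative to the A.
    wt = "GG" + "TCAGTC" + "G" * 23 + "AGACG" + "GG"
    mutant = "GG" + "TCAGTC" + "G" * 23 + "CTCAT" + "GG"
    print(find_inr_dpe(wt))
    print(find_inr_dpe(mutant))
```

Requiring the two motifs at a fixed distance, rather than searching for each independently, reflects the strict spacing dependence described for the Drosophila DPE; relaxing `dpe_start` to a range would be one way to explore altered spacing.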


7. References

1. Splinter, E. and W. de Laat, The complex transcription regulatory landscape

of our genome: control in three dimensions. EMBO J, 2011. 30(21): p. 4345-55.

2. Dong, X., et al., Modeling gene expression using chromatin features in

various cellular contexts. Genome Biol, 2012. 13(9): p. R53.

3. Shandilya, J. and S.G. Roberts, The transcription cycle in eukaryotes: from

productive initiation to RNA polymerase II recycling. Biochim Biophys Acta, 2012.

1819(5): p. 391-400.

4. Thomas, M.C. and C.M. Chiang, The general transcription machinery and

general cofactors. Crit Rev Biochem Mol Biol, 2006. 41(3): p. 105-78.

5. Butler, J.E. and J.T. Kadonaga, The RNA polymerase II core promoter: a key

component in the regulation of gene expression. Genes Dev, 2002. 16(20): p. 2583-92.

6. Kadonaga, J.T., Perspectives on the RNA polymerase II core promoter. Wiley

Interdiscip Rev Dev Biol, 2012. 1(1): p. 40-51.

7. Juven-Gershon, T. and J.T. Kadonaga, Regulation of gene expression via the

core promoter and the basal transcriptional machinery. Dev Biol, 2010. 339(2): p.

225-9.

8. Lenhard, B., A. Sandelin, and P. Carninci, Metazoan promoters: emerging

characteristics and insights into transcriptional regulation. Nat Rev Genet, 2012.

13(4): p. 233-45.

9. Heintzman, N.D. and B. Ren, The gateway to transcription: identifying,

characterizing and understanding promoters in the eukaryotic genome. Cell Mol Life

Sci, 2007. 64(4): p. 386-400.

10. Juven-Gershon, T., et al., The RNA polymerase II core promoter - the

gateway to transcription. Curr Opin Cell Biol, 2008. 20(3): p. 253-9.

11. Carninci, P., et al., Genome-wide analysis of mammalian promoter

architecture and evolution. Nat Genet, 2006. 38(6): p. 626-35.

12. Rach, E.A., et al., Motif composition, conservation and condition-specificity of

single and alternative transcription start sites in the Drosophila genome. Genome

Biol, 2009. 10(7): p. R73.

13. Bajic, V.B., et al., Mice and men: their promoter properties. PLoS Genet,

2006. 2(4): p. e54.

14. Hoskins, R.A., et al., Genome-wide analysis of promoter architecture in

Drosophila melanogaster. Genome Res, 2011. 21(2): p. 182-92.


15. Stamatoyannopoulos, J.A., Illuminating eukaryotic transcription start sites.

Nat Methods, 2010. 7(7): p. 501-3.

16. Matsui, T., et al., Multiple factors required for accurate initiation of

transcription by purified RNA polymerase II. J Biol Chem, 1980. 255(24): p. 11992-6.

17. Samuels, M., A. Fire, and P.A. Sharp, Separation and characterization of

factors mediating accurate transcription by RNA polymerase II. J Biol Chem, 1982.

257(23): p. 14419-27.

18. Dikstein, R., The unexpected traits associated with core promoter elements.

Transcription, 2011. 2(5): p. 201-6.

19. Kadonaga, J.T., The DPE, a core promoter element for transcription by RNA

polymerase II. Exp Mol Med, 2002. 34(4): p. 259-64.

20. Smale, S.T. and J.T. Kadonaga, The RNA polymerase II core promoter. Annu

Rev Biochem, 2003. 72: p. 449-79.

21. Muller, F. and L. Tora, The multicoloured world of promoter recognition

complexes. EMBO J, 2004. 23(1): p. 2-8.

22. Tora, L., A unified nomenclature for TATA box binding protein (TBP)-

associated factors (TAFs) involved in RNA polymerase II transcription. Genes Dev,

2002. 16(6): p. 673-5.

23. Muller, F. and L. Tora, Chromatin and DNA sequences in defining promoters

for transcription initiation. Biochim Biophys Acta, 2014. 1839(3): p. 118-28.

24. Anish, R., et al., Characterization of transcription from TATA-less promoters:

identification of a new core promoter element XCPE2 and analysis of factor

requirements. PLoS One, 2009. 4(4): p. e5103.

25. Juven-Gershon, T., S. Cheng, and J.T. Kadonaga, Rational design of a super

core promoter that enhances gene expression. Nat Methods, 2006. 3(11): p. 917-22.

26. Goldberg, M.L., Ph.D. thesis, in Stanford University 1979.

27. Ohler, U., et al., Computational analysis of core promoters in the Drosophila

genome. Genome Biol, 2002. 3(12): p. RESEARCH0087.

28. Kim, T.H., et al., A high-resolution map of active promoters in the human

genome. Nature, 2005. 436(7052): p. 876-80.

29. Gershenzon, N.I. and I.P. Ioshikhes, Synergy of human Pol II core promoter

elements revealed by statistical sequence analysis. Bioinformatics, 2005. 21(8): p.

1295-300.

30. Mencia, M., et al., Activator-specific recruitment of TFIID and regulation of

ribosomal protein genes in yeast. Mol Cell, 2002. 9(4): p. 823-33.

31. Basehoar, A.D., S.J. Zanton, and B.F. Pugh, Identification and distinct

regulation of yeast TATA box-containing genes. Cell, 2004. 116(5): p. 699-709.


32. Molina, C. and E. Grotewold, Genome wide analysis of Arabidopsis core

promoters. BMC Genomics, 2005. 6: p. 25.

33. Yamamoto, Y.Y., et al., Differentiation of core promoter architecture between

plants and mammals revealed by LDSS analysis. Nucleic Acids Res, 2007. 35(18): p. 6219-26.

34. Reeve, J.N., Archaeal chromatin and transcription. Mol Microbiol, 2003. 48(3):

p. 587-98.

35. Singer, V.L., C.R. Wobbe, and K. Struhl, A wide variety of DNA sequences

can functionally replace a yeast TATA element for transcriptional activation. Genes

Dev, 1990. 4(4): p. 636-45.

36. Corden, J., et al., Promoter sequences of eukaryotic protein-coding genes.

Science, 1980. 209(4463): p. 1406-14.

37. Smale, S.T. and D. Baltimore, The "initiator" as a transcription control

element. Cell, 1989. 57(1): p. 103-13.

38. FitzGerald, P.C., et al., Comparative genomics of Drosophila and human core

promoters. Genome Biol, 2006. 7(7): p. R53.

39. Gershenzon, N.I., E.N. Trifonov, and I.P. Ioshikhes, The features of

Drosophila core promoters revealed by statistical analysis. BMC Genomics, 2006. 7:

p. 161.

40. Kaufmann, J. and S.T. Smale, Direct recognition of initiator elements by a

component of the transcription factor IID complex. Genes Dev, 1994. 8(7): p. 821-9.

41. Verrijzer, C.P., et al., Binding of TAFs to core elements directs promoter

selectivity by RNA polymerase II. Cell, 1995. 81(7): p. 1115-25.

42. Chalkley, G.E. and C.P. Verrijzer, DNA binding site selection by RNA

polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO

J, 1999. 18(17): p. 4835-45.

43. Javahery, R., et al., DNA sequence requirements for transcriptional initiator

activity in mammalian cells. Mol Cell Biol, 1994. 14(1): p. 116-27.

44. Purnell, B.A., P.A. Emanuel, and D.S. Gilmour, TFIID sequence recognition of

the initiator and sequences farther downstream in Drosophila class II genes. Genes

Dev, 1994. 8(7): p. 830-42.

45. Yang, C., et al., Prevalence of the initiator over the TATA box in human and

yeast genes and identification of DNA motifs enriched in human TATA-less core

promoters. Gene, 2007. 389(1): p. 52-65.

46. Frith, M.C., et al., A code for transcription initiation in mammalian genomes.

Genome Res, 2008. 18(1): p. 1-12.


47. Burke, T.W. and J.T. Kadonaga, Drosophila TFIID binds to a conserved

downstream basal promoter element that is present in many TATA-box-deficient

promoters. Genes Dev, 1996. 10(6): p. 711-24.

48. Burke, T.W. and J.T. Kadonaga, The downstream core promoter element,

DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of

Drosophila. Genes Dev, 1997. 11(22): p. 3020-31.

49. Kutach, A.K. and J.T. Kadonaga, The downstream promoter element DPE

appears to be as widely used as the TATA box in Drosophila core promoters. Mol

Cell Biol, 2000. 20(13): p. 4754-64.

50. Juven-Gershon, T., J.Y. Hsu, and J.T. Kadonaga, Caudal, a key

developmental regulator, is a DPE-specific transcriptional factor. Genes Dev, 2008.

22(20): p. 2823-30.

51. Zehavi, Y., et al., Core promoter functions in the regulation of gene

expression of Drosophila dorsal target genes. J Biol Chem, 2014. 289(17): p. 11993-

2004.

52. Zehavi, Y., et al., The core promoter composition establishes a new

dimension in developmental gene networks. Nucleus, 2014. 5(4).

53. Duttke, S.H., RNA polymerase III accurately initiates transcription from RNA

polymerase II promoters in vitro. J Biol Chem, 2014. 289(29): p. 20396-404.

54. Lewis, E.B., A gene complex controlling segmentation in Drosophila. Nature,

1978. 276(5688): p. 565-70.

55. McGinnis, W., et al., A conserved DNA sequence in homoeotic genes of the

Drosophila Antennapedia and bithorax complexes. Nature, 1984. 308(5958): p. 428-

33.

56. Abate-Shen, C., Deregulated homeobox gene expression in cancer: cause or

consequence? Nat Rev Cancer, 2002. 2(10): p. 777-85.

57. Lappin, T.R., et al., HOX genes: seductive science, mysterious mechanisms.

Ulster Med J, 2006. 75(1): p. 23-31.

58. Rawat, V.P., R.K. Humphries, and C. Buske, Beyond Hox: the role of

ParaHox genes in normal and malignant hematopoiesis. Blood, 2012. 120(3): p. 519-27.

59. Pearson, J.C., D. Lemons, and W. McGinnis, Modulating Hox gene functions

during animal body patterning. Nat Rev Genet, 2005. 6(12): p. 893-904.

60. McIntyre, D.C., et al., Hox patterning of the vertebrate rib cage. Development,

2007. 134(16): p. 2981-9.

61. Wellik, D.M., Hox patterning of the vertebrate axial skeleton. Dev Dyn, 2007.

236(9): p. 2454-63.


62. Mallo, M. and C.R. Alonso, The regulation of Hox gene expression during

animal development. Development, 2013. 140(19): p. 3951-63.

63. Montavon, T. and D. Duboule, Chromatin organization and global regulation

of Hox gene clusters. Philos Trans R Soc Lond B Biol Sci, 2013. 368(1620): p.

20120367.

64. Deschamps, J. and J. van Nes, Developmental regulation of the Hox genes

during axial morphogenesis in the mouse. Development, 2005. 132(13): p. 2931-42.

65. Mlodzik, M., A. Fjose, and W.J. Gehring, Isolation of caudal, a Drosophila

homeo box-containing gene with maternal expression, whose transcripts form a

concentration gradient at the pre-blastoderm stage. EMBO J, 1985. 4(11): p. 2961-9.

66. Mlodzik, M. and W.J. Gehring, Expression of the caudal gene in the germ line

of Drosophila: formation of an RNA and protein gradient during early embryogenesis.

Cell, 1987. 48(3): p. 465-78.

67. Mlodzik, M., G. Gibson, and W.J. Gehring, Effects of ectopic expression of

caudal during Drosophila development. Development, 1990. 109(2): p. 271-7.

68. Levine, M., et al., Expression of the homeo box gene family in Drosophila.

Cold Spring Harb Symp Quant Biol, 1985. 50: p. 209-22.

69. Macdonald, P.M. and G. Struhl, A molecular gradient in early Drosophila

embryos and its role in specifying the body pattern. Nature, 1986. 324(6097): p. 537-

45.

70. Sanson, B., Generating patterns from fields of cells. Examples from

Drosophila segmentation. EMBO Rep, 2001. 2(12): p. 1083-8.

71. Svingen, T. and K.F. Tonissen, Hox transcription factors and their elusive

mammalian gene targets. Heredity (Edinb), 2006. 97(2): p. 88-96.

72. van den Akker, E., et al., Cdx1 and Cdx2 have overlapping functions in

anteroposterior patterning and posterior axis elongation. Development, 2002. 129(9):

p. 2181-93.

73. Lengerke, C., et al., BMP and Wnt specify hematopoietic fate by activation of

the Cdx-Hox pathway. Cell Stem Cell, 2008. 2(1): p. 72-82.

74. Butler, J.E. and J.T. Kadonaga, Enhancer-promoter specificity mediated by

DPE or TATA core promoter motifs. Genes Dev, 2001. 15(19): p. 2515-9.

75. Quinonez, S.C. and J.W. Innis, Human HOX gene disorders. Mol Genet

Metab, 2014. 111(1): p. 4-15.

76. Argiropoulos, B. and R.K. Humphries, Hox genes in hematopoiesis and

leukemogenesis. Oncogene, 2007. 26(47): p. 6766-76.

77. Frohling, S., et al., HOX gene regulation in acute myeloid leukemia: CDX

marks the spot? Cell Cycle, 2007. 6(18): p. 2241-5.


78. Scholl, C., et al., The homeobox gene CDX2 is aberrantly expressed in most

cases of acute myeloid leukemia and promotes leukemogenesis. J Clin Invest, 2007.

117(4): p. 1037-48.

79. Andreeff, M., et al., HOX expression patterns identify a common signature for

favorable AML. Leukemia, 2008. 22(11): p. 2041-7.

80. Starkova, J., et al., HOX gene expression in phenotypic and genotypic

subgroups and low HOXA gene expression as an adverse prognostic factor in

pediatric ALL. Pediatr Blood Cancer, 2010. 55(6): p. 1072-82.

81. Lengerke, C. and G.Q. Daley, Caudal genes in blood development and

leukemia. Ann N Y Acad Sci, 2012. 1266: p. 47-54.

82. Passegue, E., et al., Normal and leukemic hematopoiesis: are leukemias a

stem cell disorder or a reacquisition of stem cell characteristics? Proc Natl Acad Sci

U S A, 2003. 100 Suppl 1: p. 11842-9.

83. Eklund, E., The role of Hox proteins in leukemogenesis: insights into key

regulatory events in hematopoiesis. Crit Rev Oncog, 2011. 16(1-2): p. 65-76.

84. Zweig, A.S., et al., UCSC genome browser tutorial. Genomics, 2008. 92(2): p.

75-84.

85. Rosenbloom, K.R., et al., The UCSC Genome Browser database: 2015

update. Nucleic Acids Res, 2014.

86. Yamashita, R., et al., DBTSS: DataBase of Transcriptional Start Sites

progress report in 2012. Nucleic Acids Res, 2012. 40(Database issue): p. D150-4.

87. Hendrix, D.A., et al., Promoter elements associated with RNA Pol II stalling in

the Drosophila embryo. Proc Natl Acad Sci U S A, 2008. 105(22): p. 7762-7.

88. Nechaev, S. and K. Adelman, Pol II waiting in the starting gates: Regulating

the transition from transcription initiation into productive elongation. Biochim Biophys

Acta, 2011. 1809(1): p. 34-45.

89. Goldberg, M.L., PhD thesis. Stanford University, 1979.

90. Guglielmi, B., N. La Rochelle, and R. Tjian, Gene-specific transcriptional

mechanisms at the histone gene cluster revealed by single-cell imaging. Mol Cell,

2013. 51(4): p. 480-92.

91. Lim, C.Y., et al., The MTE, a new core promoter element for transcription by

RNA polymerase II. Genes Dev, 2004. 18(13): p. 1606-17.

92. Kedmi, A., et al., Drosophila TRF2 is a preferential core promoter regulator.

Genes Dev, 2014. 28(19): p. 2163-74.

93. Bhatlekar, S., J.Z. Fields, and B.M. Boman, HOX genes and their role in the

development of human cancers. J Mol Med (Berl), 2014. 92(8): p. 811-23.

94. Drabkin, H.A., et al., Quantitative HOX expression in chromosomally defined

subsets of acute myelogenous leukemia. Leukemia, 2002. 16(2): p. 186-95.

95. Savinkova, L.K., et al., TATA box polymorphisms in human gene promoters

and associated hereditary pathologies. Biochemistry (Mosc), 2009. 74(2): p. 117-29.

96. Giampaolo, A., et al., Expression pattern of HOXB6 homeobox gene in

myelomonocytic differentiation and acute myeloid leukemia. Leukemia, 2002. 16(7):

p. 1293-301.

97. Fischbach, N.A., et al., HOXB6 overexpression in murine bone marrow

immortalizes a myelomonocytic precursor in vitro and causes hematopoietic stem cell

expansion and acute myeloid leukemia in vivo. Blood, 2005. 105(4): p. 1456-66.

98. Cianfrocco, M.A., et al., Human TFIID binds to core promoter DNA in a

reorganized structural state. Cell, 2013. 152(1-2): p. 120-31.

99. Yunger, S., et al., Quantifying the transcriptional output of single alleles in

single living mammalian cells. Nat Protoc, 2013. 8(2): p. 393-408.

100. Kim, T.K., et al., Widespread transcription at neuronal activity-regulated

enhancers. Nature, 2010. 465(7295): p. 182-7.

101. Lai, F. and R. Shiekhattar, Enhancer RNAs: the new molecules of

transcription. Curr Opin Genet Dev, 2014. 25: p. 38-42.

102. Lam, M.T., et al., Enhancer RNAs and regulated transcriptional programs.

Trends Biochem Sci, 2014. 39(4): p. 170-82.

103. Li, W., M.T. Lam, and D. Notani, Enhancer RNAs. Cell Cycle, 2014. 13(20): p. 3151-2.

104. Andersson, R., et al., An atlas of active enhancers across human cell types

and tissues. Nature, 2014. 507(7493): p. 455-61.

105. Core, L.J., et al., Analysis of nascent RNA identifies a unified architecture of

initiation regions at mammalian promoters and enhancers. Nat Genet, 2014. 46(12):

p. 1311-20.

106. Weingarten-Gabbay, S. and E. Segal, A shared architecture for promoters

and enhancers. Nat Genet, 2014. 46(12): p. 1253-4.

107. Frontini, M., et al., TAF9b (formerly TAF9L) is a bona fide TAF that has

unique and overlapping roles with TAF9. Mol Cell Biol, 2005. 25(11): p. 4638-49.

108. Lu, H., et al., The regulation of p53-mediated transcription and the roles of

hTAFII31 and mdm-2. Harvey Lect, 1994. 90: p. 81-93.

109. Lu, H. and A.J. Levine, Human TAFII31 protein is a transcriptional coactivator

of the p53 protein. Proc Natl Acad Sci U S A, 1995. 92(11): p. 5154-8.

110. Shen, C., et al., The tumorigenicity diversification in human embryonic kidney

293 cell line cultured in vitro. Biologicals, 2008. 36(4): p. 263-8.

111. Maston, G.A., et al., Non-canonical TAF complexes regulate active promoters

in human embryonic stem cells. Elife, 2012. 1: p. e00068.

8. Publications during the M.Sc. period

Sloutskin A., Danino Y.M., Zehavi Y., Orenstein Y., Doniger T., Shamir Y.,

and Juven-Gershon, T., ElemeNT: A Computational Tool for Detecting Core

Promoter Elements, Plos One, under review.

Danino Y.M., Even D., Ideses D., and Juven-Gershon T., The core promoter:

at the heart of gene expression, BBA - Gene Regulatory Mechanisms, (invited

review) under review.

Safra M., Fickentscher R., Levi-ferber M., Danino Y.M., Haviv-Chesner A.,

Hansen M., Juven-Gershon T., Weiss M., and Henis-Korenblit S. (2014) The

FOXO transcription factor DAF-16 bypasses ire-1 requirement to promote

endoplasmic reticulum homeostasis, Cell Metabolism, 20(5):870-881.

9. Appendixes

Appendix 1

The minimal promoter sequences of the human Cdx and human Hox genes.
The table contains the name of the gene, its symbol and the sequence from
-10 to +40 (for Cdx1 only, the sequence is from -40 to +40) relative to the +1
position that we defined (in bold). The capital letters represent the nucleotides
that are transcribed as reported by RefSeq in the genome browser. Positions
matching the TATA box, the mammalian Inr and the DPE are color-coded.

Gene  Gene Symbol  Sequence (5' to 3')

Cdx1 uc0031rq.3 ccggagctataaaaggcctgggtggggcgggcgcggcggcAGGACAGCCGAGTTCAGGTGAGCGGTTGCTCGTCGTCGGG

Cdx2 uc001urv.4 ccgcctctgcagcctagtgggaaggaggtGGGAGGAAAGAAGGAAGAAAG

Hoxa1 uc003syd.3 cATTCATATCATTTTTCTTCTCCGGCCCCATGGAGGAAGTGAGAAAGTTG

Hoxa2 uc003syh.3 TGAATTCAATAGTTTAATAGTAGCGCGGTCCCCATACGGCTGTAATCAGT

Hoxa9 uc003syt.3 TGAAATCTGCAGTTTCATAATTTCCGTGGGTCGGGCCGGGCGGGCCAGGC

Hoxa11 uc003syx.3 ccaaatttctacttcacggatccgCTTCAAAGAGGCAGCTGCAGTGGAGA

Hoxb3 uc010wlm.2 accgcgcagtATATTTCACATTCTCCAGAATGTTAAGTGACACTTTAACT

Hoxb9 uc002inx.3 ttgaccaatcATTTTGCAAGGAGAGCTGAGACGGGCTGCTCCACTGTACT

Hoxc6 uc001sev.3 tgactttgtcaTTTTGTCTGTCCTGGATTGGAGCCGTCCCTATAACCATC

Hoxc8 uc001ser.3 gGCCGAGCTCAGCACCGAGGCGCCCCCCAACCTGCCCAGCCCCCAGCCCA

Hoxd3 uc002ukp.3 tcGCCTCCACAGATATCAAAAGAAACCTGAAGAGCCTACAAAAAAAAAAG

Hoxd9 uc010zex.2 ccgCGCGACCAATGGTGGAGGCTGCAGCCTGCGAACTAGTCGGTGGCTCG

Hoxd10 uc002ukf.1 ATGTTTTCCTAGAGATGTCAGCCTACAAAGGACACAATCTCTCTTCTTCA

Appendix 2

Representation of the 1524 'good' PCPs in the four human Hox gene clusters.

A. Hoxa cluster.

B. Hoxb cluster.

C. Hoxc cluster.

D. Hoxd cluster.

Each of the clusters contains the 'good' PCPs that were found in both strands (+/-).

Appendix 3

A table comparing the human Hox and histone gene clusters. It lists the
cluster name, the chromosomal location, the length (bp) of the genomic
region that contains the Hox genes or the histone genes, and the average
length (bp) of the Hox gene clusters. Notably, the Hox and histone clusters
are similar in size.

Cluster name   Hoxa                Hoxb                Hoxc                Hoxd                  Histones
Chromosome     Chr7                Chr17               Chr12               Chr2                  Chr6
Positions      27132000-27240000   46606807-46858272   54332000-54450000   176957000-177058134   26104094-26285993
Length (bp)    108000              251466              118001              104134                181900
Average (bp)   145400 (four Hox clusters)                                                        181900

Appendix 4

The minimal promoter sequences of the seven Drosophila Hox genes that

were used as the origin sequences for the generation of IGms (and thus, the

PCPs) in the human Hox gene clusters. The color code for matching positions

to Drosophila Inr, 'Neck' positions and DPE is shown.

[Figure: the minimal promoter sequences of the seven Drosophila Hox genes, with a color legend for the Inr, the 'Neck' positions (+17, +18, +19, +20, +24, +25 and +27) and the DPE.]

Each of the positions chosen for the IGms of each Drosophila Hox promoter is
colored in dark blue above the sequence (see Table 3). The number of IGms
generated from each promoter sequence is indicated above the colored name
of the gene.

Appendix 5

A screenshot of the 52 'good' PCPs uploaded to the genome browser, which
are located within the four human Hox gene clusters. Each of these PCPs is
represented as a vertical black line (under the 'good PCPs' title) and its name is
indicated to the right of it. The name is composed of the name of the Drosophila
gene from which the PCP originated and a serial number.

A. Hoxa cluster.

B. Hoxb cluster.

C. Hoxc cluster.

D. Hoxd cluster.

Appendix 6

The minimal promoter sequences of the human gene candidates obtained

from the hDPEsearcher software. The table contains the name of the gene, its

symbol and the sequence from -10 to +40 relative to the +1 position that we

defined (in bold). The capital letters represent the nucleotides that are

transcribed as reported by the RefSeq in the genome browser. The color code

of matching positions is the same color code that is used in Appendix 4.

Gene  Gene Symbol  Sequence (5' to 3')

p21 uc021yzb.1 aacatgtcccAACATGTTGAGCTCTGGCATAGAAGAGGCTGGTGGCTATT

tp53inp2 uc002xau.1 gcggccgcacAGACTCAAAGCCCCGCGGGCGAGCTCAGCAGCCCGGAGCG

ccnd1 uc010hoo.3 cagtaacgtcACACGGACTACAGGGGAGTTTTGTTGAAGTTGCAAAGTCc

ProS1 uc010hoo.3 tgtttccttcAGTTTTGTCAAAGCAACAGGCTTCACAAGTCCTGGTTAGG

twist2 uc021vyw.2 cagcccagctAGAGTTTCCAAAAAAGTTAGAATAACTTCCTCTCCCGGAG

snail1 uc002xuz.3 tgctgcattcATTGCGCCGCGGCACGGCCTAGCGAGTGGTTCTTCTGCGC

cdc25a uc003csh.1 CAGCGAAGACAGCGTGAGCCTGGGCCGTTGCCTCGAGGCTCTCGCCCGGC

cdc25b uc002wjn.3 gctgctgctcagcGCAGCCAGTCGCGGAGGCGGGGAGGCTGCGCGGTCAG

cdc34 uc010hoo.3 cggccaaggcAAGCGCCGGTGGGGCGGCGGCGCCAGAGCTGCTGGAGCGC

Hoxb6 uc010dbh.1 cctggtggttaTAATGCAGCATTCTTTTGGACACCACACCTAGGTCGGAG

Hoxd13 uc002ukf.1 cgagcgaaccagaGAGAAAGGAGAGGAGGGAGGAGGCGCGCCGCGCCATG

Appendix 7

A paper describing the ElemeNT and CORE resources, currently under

review.

ElemeNT: A Computational Tool for Detecting Core Promoter Elements

Anna Sloutskin1, Yehuda M. Danino1, Yonathan Zehavi1, Yaron Orenstein2, Tirza

Doniger1, Ron Shamir2 and Tamar Juven-Gershon1*

1The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University,

Ramat Gan 5290002, Israel

2Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801,

Israel.

The authors declare that there are no potential conflicts of interest.

Corresponding author

Email: [email protected] (T.J-G)

Dear Reviewer,

The resources described in this manuscript will be publicly available upon

acceptance of the paper. However, to secure them until publication, they are user-

protected. Please use the general user details provided below to access the

database and software described.

To protect the identity of the reviewer when accessing the resources, you may use a

proxy server that conceals your IP address. We suggest using google to choose such

a proxy server.

The resources are available at:

http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources

Username: GershonLab

Password: TJGL2014

Abstract

Core promoter elements play a pivotal role in the transcriptional output, yet their
detection within sequences of interest is largely performed manually. Here, we
present two contributions to the curation and detection of core promoter elements
within given sequences. First, CORE is a collection of TATA-box, initiator and
downstream core promoter element (DPE) sequences among RefSeq-defined
Drosophila melanogaster transcription start sites. Second, the Elements Navigation
Tool (ElemeNT) is a convenient web-based, interactive tool for prediction and display
of putative core promoter elements and their biologically-relevant combinations.
These resources, accessible at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources, facilitate the
identification of core promoter elements as active contributors to gene expression.



1. Introduction

The uniqueness of each cell, as well as the differences between cell types in

multicellular organisms, are largely achieved by distinct transcriptional programs. The

regulation of transcription initiation is a complex process that is primarily based on

the direct interactions between transcription factors and DNA. Transcription initiation

occurs at the core promoter region where the RNA Polymerase II (RNAPII) binds,

which is often referred to as the 'gateway to transcription' [1-6]. Although it was

previously believed that the core promoter is a universal component that works in a

similar mechanism for all protein-coding genes, it is nowadays established that core

promoters differ in their architecture and function [5-9]. Moreover, distinct core

promoter compositions were demonstrated to result in various transcriptional outputs

[10-14].

Transcription initiation is generally thought to occur in either a focused or a dispersed

manner with multiple detected combinations between these modes [6,7]. Promoters

that exhibit a dispersed initiation pattern typically contain multiple weak transcription

start sites (TSSs) within a 50 to 100bp region, and are associated with CpG islands.

In vertebrates, dispersed transcription initiation appears to account for the majority of

protein-coding genes and is believed to direct the transcription of constitutively-

expressed genes.

Focused promoters contain a single predominant TSS, or several TSSs within a cluster of a few
nucleotides, and are highly correlated with tightly regulated gene expression [6]. The

focused core promoter typically spans the region from -40 to +40 relative to the first

transcribed nucleotide, which is usually termed “the +1 position”. The focused core

promoter area encompasses distinct DNA sequence motifs, termed core promoter

elements or motifs. These elements are recognized by the basal transcription

machinery to recruit RNAPII and form the preinitiation complex [15-17]. The TFIID

multi-subunit complex is a key basal transcription factor that recognizes the core

promoter in the process of transcription initiation [15-18]. A distinct set of TFIID

subunits, namely TATA box-binding protein (TBP) and TBP-associated factors

(TAFs), recognize specific core promoter sequences [4-6,15,19-22]. Table 1 and

Figure 1 provide a summary of the characteristics of the known core promoter

elements. Remarkably, the MTE, DPE and Bridge elements are exclusively

dependent on the presence of a functional initiator with a strict spacing requirement,

and are typically enriched in TATA-less promoters [4-6,19,20,22-24].

An important aspect of core promoter elements is their synergistic nature. Although

the presence of a specific core promoter element is usually sufficient to influence

transcription, different combinations of core promoter elements exist, with some

shown to act in concert and hence, affect the potency of the transcriptional outcome

[10,25]. It is therefore important to consider all the elements present within the same

promoter in order to assess its transcriptional strength.

Manual annotation of experimentally-validated Drosophila promoters for the presence

of TATA-box, Initiator and DPE was previously described [23]. This analysis includes

205 promoters, whose TSSs were empirically determined. This mapping of core

promoter elements has facilitated the discovery that the Drosophila Hox gene

network is regulated via the DPE [26]. A more comprehensive analysis of the whole

Drosophila transcriptome revealed that DPE-containing genes are conserved and

highly prevalent among the target genes of Dorsal, a key regulator of dorsal-ventral

axis formation [11]. These examples demonstrate that the comprehensive annotation

of core promoter elements in each transcript can greatly advance the understanding

of gene expression regulation.

Prediction of promoter elements that affect the transcriptional output, in the absence

of experimental validation, is a difficult task. Although high-throughput transcription

data, such as cap analysis gene expression (CAGE) [27] and genomic run-on assay

followed by deep sequencing (GRO-seq) [28] exist, the RefSeq annotation is still

considered the “gold standard” for TSS annotation (see Discussion).

The majority of currently available promoter prediction programs search for over-

represented motifs in a given set of promoter sequences (based on annotated TSSs),

rather than known core promoter elements [29-31]. Most of these programs utilize

other features, such as transcription factor binding sites, physical properties of the

DNA, DNA accessibility, RNA polymerase II occupancy and various epigenetic

markers [31-37]. However, even available programs that aim to identify core

promoter elements, such as McPromoter [38] and Eukaryotic Core Promoter

Predictor (YAPP, http://www.bioinformatics.org/yapp/cgi-bin/yapp.cgi), rarely

consider the strict spacing required by the Inr-dependent elements, namely, DPE,

MTE and Bridge.

The selection of promoters that comprise the data set used to predict core promoter

elements based on position weight matrices (PWMs) is of pivotal importance, as

subtle variations in the sequences may generate completely different PWMs [33].

Motif finding algorithms, such as XXmotif, can be used to accurately construct a

PWM for over-represented motifs within a given set of sequences [39,40].

Unfortunately, even a perfect model that is only based on sequence features, cannot

exclusively account for the observed transcriptional activity, as most of the sequence

motifs are short and redundant, and can thus be found in many non-transcriptionally

active regions of the genome [33]. Using experimentally-validated sequences rather

than over-represented motifs, can greatly enhance the strength of the prediction

program, but cannot fully guarantee the accuracy of the prediction. Currently, the

experimental readout of transcription strength and start sites resulting from mutated

promoter sequences is not performed on a high-throughput scale; hence, the

currently available experimental results are prone to be biased. Moreover, the known

biologically functional sequences may slightly differ from the determined consensus,

and therefore the detection of candidate core promoter elements cannot be easily

performed using currently available resources.

2. Methods

2.1 Availability

CORE and ElemeNT are accessible at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources. Each resource is
described in a separate description page. For ElemeNT, both the source files (Perl
programming language) and the PWMs used can be downloaded at
http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description

2.2 CORE annotation guidelines

For each position between -10 and +10 relative to the RefSeq TSS, each adenosine
was examined as a potential A+1, and was assigned a score based on the number of
nucleotides matching the consensus Drosophila initiator sequence (Table 1). Only a
match of at least 4 out of 6 nucleotides was considered for further analysis.

DPE motifs were calculated for each putative initiator position by scoring the

sequence that is precisely located at +28 to +33 relative to the A+1 of the

corresponding initiator, based on a match to the DPE functional range set (DSWYVY;

an experimentally defined broad DPE consensus [23], presented in Table 1). The

presence of TATA-box motifs was determined by searching for a 4-nucleotide TATA
sequence match in the region between -45 and -19 relative to the RefSeq +1
position. This loose criterion was used in order to avoid missing functional TATA
box-containing promoters that do not match the 8-nucleotide-long consensus
(TATAWAAR).
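
The annotation rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the CORE pipeline itself: the function names are hypothetical, the Inr consensus TCAKTY (with A+1 as the third base) is an assumption made here because Table 1 is not reproduced in this section, and a TATA match is assumed to lie entirely within the -45 to -19 window.

```python
# IUPAC nucleotide codes needed to read the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "AGT", "K": "GT", "S": "CG", "V": "ACG", "W": "AT", "Y": "CT",
}

INR_CONSENSUS = "TCAKTY"  # assumption (not quoted in this section); A+1 is the 3rd base
DPE_CONSENSUS = "DSWYVY"  # DPE functional range set quoted in the text


def count_matches(site, consensus):
    """Count positions where `site` agrees with an IUPAC consensus (0 if truncated)."""
    if len(site) != len(consensus):
        return 0
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def annotate_promoter(seq, tss_index, min_inr_matches=4):
    """Annotate one promoter; `tss_index` is the 0-based index of the RefSeq +1."""
    seq = seq.upper()
    initiators = []
    # Promoter coordinates skip 0, so -10..+10 covers indices tss-10 .. tss+9.
    # The bounds also keep the 6-nt Inr window inside the sequence.
    for a in range(max(tss_index - 10, 2), min(tss_index + 10, len(seq) - 3)):
        if seq[a] != "A":
            continue  # only adenosines are examined as a potential A+1
        inr = count_matches(seq[a - 2:a + 4], INR_CONSENSUS)
        if inr < min_inr_matches:
            continue
        # +28..+33 relative to A+1 corresponds to indices a+27 .. a+32.
        dpe = count_matches(seq[a + 27:a + 33], DPE_CONSENSUS)
        position = a - tss_index + 1 if a >= tss_index else a - tss_index
        initiators.append(
            {"a_plus_1": position, "inr_matches": inr, "dpe_matches": dpe}
        )
    # A 4-nt TATA fully contained in -45..-19 relative to the RefSeq +1.
    window = seq[max(tss_index - 45, 0):max(tss_index - 18, 0)]
    return {"tata": "TATA" in window, "initiators": initiators}
```

Scoring the DPE only at the fixed offset from each candidate A+1, instead of scanning for it independently, is what encodes the strict Inr-DPE spacing described in the text.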

2.3 The ElemeNT algorithm

For each core promoter element, the user should specify a threshold between 0 and

1 for the presence of the element at a position. Default threshold values were





empirically determined for each element, based on known functional sequence

elements.

For a PWM matrix P with k columns, the PWM score is calculated for each
sub-sequence of length k (k-mer) in the sequences, by multiplying the appropriate values
of the PWM for each consecutive position, as follows:

$$\mathrm{PWM\_SCORE}(S_{i+1:i+k}, P) = \prod_{j=1}^{k} P'(j, S_{i+j}),$$

where $S_{i+1:i+k}$ is a k-mer starting at position i+1 in sequence S and $P'(j, x)$ is
the probability for nucleotide x at position j in P, normalized so that for a given j,
$\max_x \{P'(j, x)\} = 1$. The role of this normalization is to guarantee that the final
PWM score for every element is between 0 and 1, irrespective of the PWM's
parameters. Each sub-sequence with a score exceeding the specified threshold is
termed a 'hit'. The score is calculated for $0 \le i \le n-k$, where n is the length of the
input sequence S, and hits are displayed in a list sorted in descending score order for
each element. Consensus match scores, which are the number of base matches of
the hit to the motif's consensus, are also reported for each hit (Table 1).
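
The scoring scheme described above can be illustrated with a short Python sketch. It is not the ElemeNT Perl source: the function names and the representation of a PWM as a list of per-position dictionaries are assumptions made for illustration, and every PWM column is assumed to contain at least one non-zero probability.

```python
from math import prod


def normalize_pwm(pwm):
    """Scale every column so that its largest probability becomes 1."""
    return [
        {nt: p / max(column.values()) for nt, p in column.items()}
        for column in pwm
    ]


def pwm_hits(seq, pwm, threshold):
    """Score every k-mer of `seq` and return the hits, best score first.

    Each hit is (1-based start position, k-mer, score); a k-mer is a hit
    when its score exceeds `threshold`.
    """
    norm = normalize_pwm(pwm)
    k = len(norm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):  # 0 <= i <= n - k
        kmer = seq[i:i + k]
        # Product of the normalized probabilities, one factor per position.
        # Letters absent from a column (e.g. N) contribute a factor of 0.
        score = prod(col.get(nt, 0.0) for col, nt in zip(norm, kmer))
        if score > threshold:
            hits.append((i + 1, kmer, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Because each column is divided by its own maximum, the k-mer made of the most probable nucleotide at every position scores exactly 1, and any mismatch lowers the product, so thresholds are comparable across elements with different PWMs.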

3. Results

3.1 The CORE database

We constructed CORE, a database of all RefSeq-defined Drosophila melanogaster

transcripts, annotated for the presence of TATA-box, Drosophila initiator and

downstream core promoter element (DPE) (File S1). All Drosophila transcripts

initiating at the same nucleotide were treated as a single TSS. For a given TSS, an

initiator score was calculated for each position from -50 to +50 relative to +1 of the

RefSeq TSS. Two putative initiators were determined for each RefSeq TSS, with the

first priority Inr located closer to the annotated TSS. DPE scores were calculated for

each of the determined initiators, and the presence of a TATA box was assigned.

The annotation guidelines are detailed in section 2.2. Furthermore, the frequencies of

the following elements among the Drosophila transcripts were summarized: TATA-

box, Drosophila initiator and DPE motifs. In addition to a comprehensive analysis of

the core promoter composition of Drosophila transcripts, CORE provides clues

(based on the core promoter composition) with regards to an optimal TSS. Notably,

none of the available resources, including CORE, allow the identification of most

current core promoter elements and their potential combinations within a given

sequence.

3.2 The Elements Navigation Tool

In order to facilitate the joint identification of the vast majority of core promoter

elements and their biologically-relevant combinations within a sequence, we

developed the Elements Navigation Tool (ElemeNT). ElemeNT is a web-based,

interactive tool for rapid and convenient detection of core promoter elements and

their combinations within any given sequence. Core promoter elements have been

shown to function at a specific distance from the TSS and to affect transcription (e.g.

as examined by mutational analysis). ElemeNT searches the input sequences for the

presence of core promoter elements that are precisely located relative to the TSS, as

specified by the user (Figure 2). The elements are represented by PWMs, which are

constructed based on the validated biologically functional sequences (File S2, Table

1). Notably, for some elements, the PWMs differ from the defined consensus

sequences, reflecting differences in the data sources used to generate these models.

The elements that can be searched for are: Mammalian initiator, Drosophila initiator,

TATA box, MTE, DPE, Bridge, BREu, BREd, Human TCT, Drosophila TCT, XCPE1

and XCPE2 (Table 1, Figure 1). Notably, the MTE, DPE and Bridge motifs are only

scored at the precise location relative to each detected mammalian/Drosophila

initiator, based on the known strict spacing requirement that is crucial for these

elements to be functional. The scores are normalized to the scale of 0 to 1, to allow

more interpretable results. The ElemeNT algorithm is described in section 2.3.

The output of the program contains the analyzed sequence, a color display of certain

possible core promoter elements combinations found, and a table containing each of

the detected elements alongside its position, PWM and consensus match scores

(Figure 3). Suggested combinations of core promoter elements are displayed in order

to indicate potential synergism between elements that may inspire further

exploration. The elements that are considered to form possible combinations are any

combination of the following: 1) the mammalian/Drosophila initiator and either the

MTE, DPE or Bridge motifs, 2) TATA box and mammalian/Drosophila initiator, 3)

TATA box and either BREu or BREd (Figure 3A).

In the output table, the elements are ordered by their type and then sorted by PWM

scores (Figure 3B). The MTE, DPE and Bridge motifs, which are strictly dependent

on the presence of a functional initiator [4-6,19,20,22,24], are displayed immediately

below the corresponding initiator. For TATA box motifs, a message is displayed if the

specific TATA-box is located 26 to 40bp upstream of the A+1 of an initiator. In

addition, a message is displayed if a BREu or BREd is located in close proximity to

the specific TATA-box [41-43].
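
The TATA-initiator spacing check described above amounts to a simple distance filter. The sketch below is a hedged illustration: the function name is hypothetical, and the distance is assumed to be measured from the first nucleotide of the TATA box to the A+1, a detail this section does not specify.

```python
def tata_inr_pairs(tata_starts, inr_a_positions, min_gap=26, max_gap=40):
    """Pair each TATA box with every initiator located 26 to 40 bp downstream.

    Both inputs are lists of sequence positions; the gap is the A+1 position
    minus the TATA start (assumed measurement point).
    """
    return [
        (tata, a_plus_1)
        for tata in tata_starts
        for a_plus_1 in inr_a_positions
        if min_gap <= a_plus_1 - tata <= max_gap
    ]
```

For example, with TATA boxes starting at positions 10 and 60 and initiators at 40, 45 and 120, only the pairs (10, 40) and (10, 45) satisfy the spacing window.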

To assess the performance of the ElemeNT tool, a set of experimentally-validated

core promoter sequences were analyzed by the tool. The analysis of the Drosophila

Inr is presented as an example (Figure S1). Importantly, ElemeNT detected most of

the biologically functional Drosophila initiator motifs among the dataset, at cutoff

values around 0.01. As expected, lower threshold values detected a greater number
of correct hits, but the false positive ratio was higher as well. False negative hits,
i.e. functional motifs that were missed, were scored as well. The threshold
values of 0.005-0.01 had a strong correlation with scores obtained for previously
validated motifs' sequence variations [23].
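
The trade-off between correct hits, false positives and false negatives at different cutoffs can be tallied as sketched below. The function name and the toy scores in the usage note are hypothetical and are not taken from Figure S1.

```python
def evaluate_threshold(scored_sites, validated, threshold):
    """Return (true positives, false positives, false negatives) at one cutoff.

    `scored_sites` maps a site identifier to its PWM score; `validated` is the
    set of identifiers known to be functional.
    """
    called = {site for site, score in scored_sites.items() if score > threshold}
    true_pos = len(called & validated)
    false_pos = len(called - validated)
    false_neg = len(validated - called)  # functional motifs that were missed
    return true_pos, false_pos, false_neg
```

With toy scores {"p1": 0.5, "p2": 0.02, "p3": 0.008, "p4": 0.006, "p5": 0.001} and p1 to p3 validated, a cutoff of 0.01 gives (2, 0, 1), whereas 0.005 gives (3, 1, 0): lowering the cutoff recovers the missed motif at the cost of one false positive.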

This dataset cannot be used to compare the performance of ElemeNT with other

programs, such as YAPP, as YAPP does not search for Drosophila Inr, only for

mammalian Inr. The definition of mammalian Inr is looser than that of the

Drosophila Inr, and no individually-validated set of mammalian TSS was available.

Taken together, both the CORE database and the ElemeNT program present new

improved tools to assess the presence of core promoter elements within a given DNA

sequence.

4. Discussion

Core promoter elements, located in the immediate vicinity of the TSSs, were

demonstrated to have a great effect on the transcriptional output [6,7]. The majority

of core promoter elements were identified as DNA sequences that are recognized by

components of the preinitiation complex [19,41,42,44,45]. In addition,

overrepresented motifs were discovered in the region around the annotated TSSs

[46-48]. Some of these motifs affected the transcriptional outcome [24] and some

were bound by transcription-regulating proteins [49].

The determination of actual TSSs, which influence the motifs discovered in their

vicinity, is a critical factor in the prediction of core promoter elements. The

comprehensive determination of TSS provided by RefSeq is based on the rigorous

alignment of reads from high-quality RNA [50]. However, the TSS of the same gene

can vary across the developmental stages, tissues, and time points sampled, which

poses a great challenge for integration of the data provided by different studies.

Both the CORE database and the ElemeNT tool will benefit from the wealth of rapidly

evolving novel high-throughput techniques to identify features and sequences that

might affect transcription; these include PEAT [51], CAGE [27], FAIRE-seq [52],

ChIP-seq [53], and GRO-seq [28]. The above techniques are applied by major

projects and consortia, which are aimed at dissecting the rules governing

transcriptional regulation, including ENCODE [54], modENCODE [55], and

FANTOM5 [56], as well as other genome-wide studies [57,58]. These different

strategies complement each other and together introduce a much more complex view

of RNA transcription initiation than previously anticipated [59].

Furthermore, core promoter elements are associated with focused, rather than

dispersed, transcription [6], while the classification of promoters into these classes is
largely lacking. Since the CORE database uses the RefSeq's annotation of 5' ends, it
should be revisited in the future, when new standardized data for transcription start
sites become available. Insights gained during the integration of additional data, e.g.

CAGE [60-62] and GRO-seq [59,63] will be of utmost importance for re-defining

transcription start sites. Moreover, this will enable the re-evaluation of current tools.

The overall distribution of TATA box, Inr and DPE motifs among the Drosophila

transcripts might consequently change.

The uniqueness of the ElemeNT program, as compared to other promoter-prediction

software, is its major focus on biologically-functional core promoter elements,

manifested by two major concepts that lie at the foundation of the ElemeNT

algorithm. The first is the exclusive use of experimentally validated core promoter

motifs, rather than overrepresented motifs, to construct the PWMs used. The use of

an experimentally-determined individual TSSs set is, however, limited due to possible

statistical bias.

The second is the obligatory presence of an initiator, and the strict spacing for the

downstream promoter elements MTE, DPE and Bridge. Both the presence of a

functional initiator and the strict spacing are crucial for the functionality of the

downstream elements, and are frequently omitted by other core promoter elements

prediction programs available [29,31,34,37,38]. Moreover, the identification of

combinations of elements, which were experimentally demonstrated to result in

synergistic effects [10,24,25], may spark new research directions. Despite the fact

that the presence of potential core promoter elements, or any combination of them,

may not necessarily imply that the elements are functional, their presence might

indicate that the specific genomic locus is transcriptionally active. However, in

contrast to most of the available promoter prediction programs, ElemeNT is not

designed to produce or analyze genome-scale data, but is rather intended to

narrow down a given region of interest, considering the currently available,

experimentally-validated information about core promoter motifs themselves. The

redundancy of the core promoter motifs leads to the identification of sequences that

perfectly match functionally-verified sequences, yet are not functional. Based on

experience with transcription factors binding motifs [64], sorting out only the

functionally-relevant hits might prove to be a difficult task. Future modifications of the

algorithm used to annotate core promoter elements will be based on new insights

and a better understanding of transcription regulation, obtained by the

abovementioned techniques and consortia.

Importantly, the ElemeNT program can assist in the analysis of sequences from

organisms whose TSSs have not yet been comprehensively defined. For example,

both the TATA box and the BRE motifs are conserved from archaebacteria to

humans [65] and many organisms whose transcriptomes have not been annotated,

are likely to contain such core promoter elements.

To conclude, we anticipate that the ElemeNT tool, along with the CORE database,

will make the search for specific core promoter elements and their combinations

within Drosophila transcripts or any sequence of interest, accessible to scientists and

help in elucidating the major role core promoter elements play in gene expression.

Acknowledgments

We thank Marina Socol, Boris Komraz and Dr. Eli Sloutskin for invaluable assistance

in ElemeNT development and web execution. We thank Gal Nuta for assisting with

optimization of ElemeNT parameters. We thank Dr. Diana Ideses, Dan Even, Adi

Kedmi, Hila Shir-Shapira and Gal Nuta for critical reading of the manuscript.

Funding Statement

This research was supported by grants from the Israel Science Foundation to T.J-G

(no. 798/10) and R.S (no. 317/13) and the European Union Seventh Framework

Programme (Marie Curie International Reintegration Grant) to T.J-G (no. 256491).

Y.O was supported by the Edmond J. Safra Center for Bioinformatics at Tel-Aviv

University and the Israeli Center for Research Excellence (I-CORE), Gene

Regulation in Complex Human Disease, center 41/11.

References

1. Smale ST (2001) Core promoters: active contributors to combinatorial gene

regulation. Genes & Development 15: 2503-2508.

2. Smale ST, Kadonaga JT (2003) The RNA polymerase II core promoter. Annual

Review of Biochemistry 72: 449-479.

3. Heintzman ND, Ren B (2007) The gateway to transcription: identifying,

characterizing and understanding promoters in the eukaryotic genome. Cellular and

Molecular Life Sciences 64: 386-400.

4. Juven-Gershon T, Hsu J-Y, Theisen JWM, Kadonaga JT (2008) The RNA

polymerase II core promoter - the gateway to transcription. Current Opinion in Cell

Biology 20: 253-259.

5. Juven-Gershon T, Kadonaga JT (2010) Regulation of gene expression via the core

promoter and the basal transcriptional machinery. Dev Biol 339: 225-229.

6. Kadonaga JT (2012) Perspectives on the RNA polymerase II core promoter. Wiley

Interdiscip Rev Dev Biol 1: 40-51.

7. Lenhard B, Sandelin A, Carninci P (2012) Metazoan promoters: emerging

characteristics and insights into transcriptional regulation. Nature Reviews Genetics

13: 233-245.

8. Muller F, Demeny MA, Tora L (2007) New problems in RNA polymerase II

transcription initiation: matching the diversity of core promoters with a variety of

promoter recognition factors. J Biol Chem 282: 14685-14689.

9. Muller F, Tora L (2014) Chromatin and DNA sequences in defining promoters for

transcription initiation. Biochim Biophys Acta 1839: 118-128.

10. Juven-Gershon T, Cheng S, Kadonaga JT (2006) Rational design of a super core

promoter that enhances gene expression. Nature Methods 3: 917-922.

11. Zehavi Y, Kuznetsov O, Ovadia-Shochat A, Juven-Gershon T (2014) Core

promoter functions in the regulation of gene expression of Drosophila dorsal target

genes. J Biol Chem 289: 11993-12004.

12. Zehavi Y, Sloutskin A, Kuznetsov O, Juven-Gershon T (2014) The core promoter

composition establishes a new dimension in developmental gene networks. Nucleus

5.

13. Butler JE, Kadonaga JT (2001) Enhancer-promoter specificity mediated by DPE

or TATA core promoter motifs. Genes Dev 15: 2515-2519.

14. Dikstein R (2011) The unexpected traits associated with core promoter elements.

Transcription 2: 201-206.

15. Thomas MC, Chiang CM (2006) The general transcription machinery and general

cofactors. Critical Reviews in Biochemistry and Molecular Biology 41: 105-178.

16. He Y, Fang J, Taatjes DJ, Nogales E (2013) Structural visualization of key steps

in human transcription initiation. Nature 495: 481-486.

17. Grunberg S, Hahn S (2013) Structural insights into transcription initiation by RNA

polymerase II. Trends Biochem Sci 38: 603-611.

18. Cianfrocco MA, Kassavetis GA, Grob P, Fang J, Juven-Gershon T, et al. (2013)

Human TFIID binds to core promoter DNA in a reorganized structural state. Cell 152:

120-131.

19. Burke TW, Kadonaga JT (1996) Drosophila TFIID binds to a conserved

downstream basal promoter element that is present in many TATA-box-deficient

promoters. Genes & Development 10: 711-724.

20. Burke TW, Kadonaga JT (1997) The downstream core promoter element, DPE, is

conserved from Drosophila to humans and is recognized by TAF(II)60 of Drosophila.

Genes & Development 11: 3020-3031.

21. Wu CH, Madabusi L, Nishioka H, Emanuel P, Sypes M, et al. (2001) Analysis of

core promoter sequences located downstream from the TATA element in the hsp70

promoter from Drosophila melanogaster. Mol Cell Biol 21: 1593-1602.

22. Theisen JW, Lim CY, Kadonaga JT (2010) Three key subregions contribute to

the function of the downstream RNA polymerase II core promoter. Mol Cell Biol 30:

3471-3479.

23. Kutach AK, Kadonaga JT (2000) The downstream promoter element DPE

appears to be as widely used as the TATA box in Drosophila core promoters. Mol

Cell Biol 20: 4754-4764.

24. Lim CY, Santoso B, Boulay T, Dong E, Ohler U, et al. (2004) The MTE, a new

core promoter element for transcription by RNA polymerase II. Genes &

Development 18: 1606-1617.

25. Gershenzon NI, Ioshikhes IP (2005) Synergy of human Pol II core promoter

elements revealed by statistical sequence analysis. Bioinformatics 21: 1295-1300.

26. Juven-Gershon T, Hsu J-Y, Kadonaga JT (2008) Caudal, a key developmental

regulator, is a DPE-specific transcriptional factor. Genes & Development 22: 2823-

2830.

27. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, et al. (2003) Cap analysis

gene expression for high-throughput analysis of transcriptional starting point and

identification of promoter usage. Proc Natl Acad Sci U S A 100: 15776-15781.

28. Core LJ, Waterfall JJ, Lis JT (2008) Nascent RNA sequencing reveals

widespread pausing and divergent initiation at human promoters. Science 322: 1845-

1848.

29. Bajic VB, Tan SL, Suzuki Y, Sugano S (2004) Promoter prediction analysis on

the whole human genome. Nat Biotechnol 22: 1467-1473.

30. Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, et al. (2008) A code for

transcription initiation in mammalian genomes. Genome Res 18: 1-12.

31. Narlikar L, Ovcharenko I (2009) Identifying regulatory elements in eukaryotic

genomes. Brief Funct Genomic Proteomic 8: 215-230.

32. Ohler U, Niemann H (2001) Identification and analysis of eukaryotic promoters:

recent computational approaches. Trends Genet 17: 56-60.

33. Pedersen AG, Baldi P, Chauvin Y, Brunak S (1999) The biology of eukaryotic

promoter prediction--a review. Comput Chem 23: 191-207.

34. Rach EA, Winter DR, Benjamin AM, Corcoran DL, Ni T, et al. (2011)

Transcription initiation patterns indicate divergent strategies for gene regulation at the

chromatin level. PLoS Genet 7: e1001274.

35. Duran E, Djebali S, Gonzalez S, Flores O, Mercader JM, et al. (2013) Unravelling

the hidden DNA structural/physical code provides novel insights on promoter

location. Nucleic Acids Res 41: 7220-7230.

36. Abeel T, Saeys Y, Rouze P, Van de Peer Y (2008) ProSOM: core promoter

prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics

24: i24-31.

37. Datta S, Mukhopadhyay S (2013) A composite method based on formal grammar

and DNA structural features in detecting human polymerase II promoter region. PLoS

One 8: e54843.

38. Ohler U (2006) Identification of core promoter modules in Drosophila and their

application in accurate transcription start site prediction. Nucleic Acids Res 34: 5943-

5950.

39. Hartmann H, Guthohrlein EW, Siebert M, Luehr S, Soding J (2013) P-value-

based regulatory motif discovery using positional weight matrices. Genome Res 23:

181-194.

40. Luehr S, Hartmann H, Soding J (2012) The XXmotif web server for eXhaustive,

weight matriX-based motif discovery in nucleotide sequences. Nucleic Acids Res 40:

W104-109.

41. Deng W, Roberts SG (2005) A core promoter element downstream of the TATA

box that is recognized by TFIIB. Genes Dev 19: 2418-2423.

42. Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH (1998) New core

promoter element in RNA polymerase II-dependent transcription: sequence-specific

DNA binding by transcription factor IIB. Genes Dev 12: 34-44.

43. Deng W, Roberts SG (2007) TFIIB and the regulation of transcription by RNA

polymerase II. Chromosoma 116: 417-429.

44. Chalkley GE, Verrijzer CP (1999) DNA binding site selection by RNA polymerase

II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO J 18: 4835-

4845.

45. Tokusumi Y, Ma Y, Song X, Jacobson RH, Takada S (2007) The new core

promoter element XCPE1 (X Core Promoter Element 1) directs activator-, mediator-,

and TATA-binding protein-dependent but TFIID-independent RNA polymerase II

transcription from TATA-less promoters. Mol Cell Biol 27: 1844-1858.

46. FitzGerald PC, Sturgill D, Shyakhtenko A, Oliver B, Vinson C (2006) Comparative

genomics of Drosophila and human core promoters. Genome Biol 7: R53.

47. Ohler U, Liao GC, Niemann H, Rubin GM (2002) Computational analysis of core

promoters in the Drosophila genome. Genome Biol 3: RESEARCH0087.

48. Xi H, Yu Y, Fu Y, Foley J, Halees A, et al. (2007) Analysis of overrepresented

motifs in human core promoters reveals dual regulatory roles of YY1. Genome Res

17: 798-806.

49. Li J, Gilmour DS (2013) Distinct mechanisms of transcriptional pausing

orchestrated by GAGA factor and M1BP, a novel transcription factor. EMBO J 32:

1829-1841.

50. Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a

curated non-redundant sequence database of genomes, transcripts and proteins.

Nucleic Acids Res 33: D501-504.

51. Ni T, Corcoran DL, Rach EA, Song S, Spana EP, et al. (2010) A paired-end

sequencing strategy to map the complex landscape of transcription initiation. Nat

Methods 7: 521-527.

52. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-

Assisted Isolation of Regulatory Elements) isolates active regulatory elements from

human chromatin. Genome Res 17: 877-885.

53. Furey TS (2012) ChIP-seq and beyond: new and improved methodologies to

detect and characterize protein-DNA interactions. Nat Rev Genet 13: 840-852.

54. ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306:

636-640.

55. Washington NL, Stinson EO, Perry MD, Ruzanov P, Contrino S, et al. (2011) The

modENCODE Data Coordination Center: lessons in harvesting comprehensive

experimental details. Database (Oxford) 2011: bar023.

56. Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, et al. (2014) A promoter-

level mammalian expression atlas. Nature 507: 462-470.

57. Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, et al. (2007)

Mammalian RNA polymerase II core promoters: insights from genome-wide studies.

Nat Rev Genet 8: 424-436.

58. Lenhard B, Sandelin A, Carninci P (2012) Metazoan promoters: emerging

characteristics and insights into transcriptional regulation. Nat Rev Genet 13: 233-

245.

59. Core LJ, Martins AL, Danko CG, Waters CT, Siepel A, et al. (2014) Analysis of

nascent RNA identifies a unified architecture of initiation regions at mammalian

promoters and enhancers. Nat Genet 46: 1311-1320.

60. FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, et al. (2014)

A promoter-level mammalian expression atlas. Nature 507: 462-470.

61. Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, et al. (2011)

Genome-wide analysis of promoter architecture in Drosophila melanogaster.

Genome Res 21: 182-192.

62. Nechaev S, Fargo DC, dos Santos G, Liu L, Gao Y, et al. (2010) Global analysis

of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in

Drosophila. Science 327: 335-338.

63. Saunders A, Core LJ, Sutcliffe C, Lis JT, Ashe HL (2013) Extensive polymerase

pausing during Drosophila axis patterning enables high-level and pliable

transcription. Genes Dev 27: 1146-1158.

64. Shlyueva D, Stampfel G, Stark A (2014) Transcriptional enhancers: from

properties to genome-wide predictions. Nat Rev Genet 15: 272-286.

65. Reeve JN (2003) Archaeal chromatin and transcription. Mol Microbiol 48: 587-

598.

66. Smale ST, Baltimore D (1989) The "initiator" as a transcription control element.

Cell 57: 103-113.

67. Goldberg ML (1979) Ph.D. Thesis. Sequence analysis of Drosophila histone

genes.

68. Parry TJ, Theisen JWM, Hsu J-Y, Wang Y-L, Corcoran DL, et al. (2010) The TCT

motif, a key component of an RNA polymerase II transcription system for the

translational machinery. Genes & Development 24: 2013-2018.

69. Anish R, Hossain MB, Jacobson RH, Takada S (2009) Characterization of

transcription from TATA-less promoters: identification of a new core promoter

element XCPE2 and analysis of factor requirements. PLoS One 4: e5103.

70. Lewis BA, Kim TK, Orkin SH (2000) A downstream element in the human beta-

globin promoter: evidence of extended sequence-specific transcription factor IID

contacts. Proc Natl Acad Sci U S A 97: 7172-7177.

71. Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, et al. (2005)

Functional characterization of core promoter elements: the downstream core element

is recognized by TAF1. Mol Cell Biol 25: 9674-9686.

Figure legends

Figure 1. Schematic representation of the major core promoter elements. The region

of the core promoter area (-40 to +40 relative to the TSS) is illustrated. The diagram

is roughly to scale, and each element is colored according to its color in the output

table (see Figure 3B).

Figure 2. Flow diagram of the ElemeNT process. The flowchart demonstrates the

input, processing and output steps of the ElemeNT program. The input consists of a

set of sequences and the elements to search for with their corresponding thresholds.

ElemeNT calculates hits for each element, and considers possible combinations. The

output includes combinations of core promoter elements and a table containing all

the identified elements, their location, PWM score and consensus match score.

Figure 3. A sample output of the ElemeNT program. (A) A screen-shot of the sample

input sequence and the combinations of elements identified in it. ElemeNT has

detected a TATA box flanked by both a BREu element and a BREd element,

Drosophila and mammalian initiator elements and MTE, DPE and Bridge elements.

The two possible combinations result from a sequence match to both the Drosophila

and mammalian initiators, due to the partial sequence redundancy of the two

elements. (B) The table displaying all the elements identified within the sample input

sequence, their location, PWM and consensus match scores. Note the message

displayed for the TATA-box, indicating the presence of mammalian and Drosophila

initiator, as well as BREu and BREd, at optimal distances for transcriptional synergy.

Figure S1. Evaluation of ElemeNT's discovery rates. False positive (red) and false

negative (blue) hit ratios for the Drosophila initiator motif were scored as a function of

the threshold used. The false positive ratio was calculated as the number of false

positive matches among all potential matches. The false negative ratio was assigned

based on the discovery rate of true hits. The x-axis is presented on a logarithmic scale.

The analysis included 43 sequences, each 50 bp long, which were found to be

experimentally functional.

File S1. The CORE Database. This database was created in order to identify putative

TATA box, initiator (Inr) and DPE elements in the Drosophila melanogaster core

promoter regions of different TSSs and to calculate the frequencies of these core

promoter elements among the transcripts, as well as to provide clues (based on core

promoter composition) with regard to an optimal TSS. The RefSeq sheet contains

all the annotated Drosophila transcripts from -50 to +50, relative to +1 of the RefSeq

TSS. All the transcripts initiating at the same nucleotide are treated as a single TSS.

The Inr position sheet contains the determination of the optimal initiator +1 position.

To enable a more comprehensive analysis, a 'Second best initiator' search was performed

using less stringent criteria. The DPE & TATA sheet calculates the score for

downstream core promoter element (DPE) and the presence of TATA-box motifs for

each Drosophila distinct TSS. This file along with the documentation is available at

http://lifefaculty.biu.ac.il/gershon-tamar/index.php/core-description
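The two bookkeeping steps described for the RefSeq sheet (collapsing transcripts that initiate at the same nucleotide into one TSS, and taking the -50 to +50 window around +1) can be sketched as follows. This is a minimal illustration and not the code used to build the CORE database: the tuple layout, the function names and the assumption that positions skip zero (so the window is 100 nt long) are all hypothetical.

```python
from collections import defaultdict


def group_by_tss(transcripts):
    """Collapse transcripts that initiate at the same nucleotide into one TSS.

    transcripts: iterable of (transcript_id, chromosome, strand, tss) tuples,
    where tss is the 1-based genomic coordinate of the +1 nucleotide.
    (Hypothetical input layout, chosen for this sketch.)
    """
    groups = defaultdict(list)
    for transcript_id, chrom, strand, tss in transcripts:
        groups[(chrom, strand, tss)].append(transcript_id)
    return dict(groups)


def promoter_window(tss, strand, upstream=50, downstream=50):
    """Return 1-based inclusive genomic (start, end) covering -50 to +50.

    Assumes there is no position 0, so +1 is the TSS itself and the
    window spans upstream + downstream nucleotides.
    """
    if strand == "+":
        return tss - upstream, tss + downstream - 1
    # On the minus strand, upstream positions have larger coordinates.
    return tss - downstream + 1, tss + upstream


example = [
    ("tx1", "2L", "+", 1000),  # hypothetical transcript identifiers
    ("tx2", "2L", "+", 1000),
    ("tx3", "2L", "-", 1000),
]
for key, ids in group_by_tss(example).items():
    print(key, ids, promoter_window(key[2], key[1]))
```

Keying the groups on chromosome, strand and coordinate keeps transcripts on opposite strands apart even when they share a coordinate.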

File S2. Position weight matrices representing the core promoter elements. This file

contains the position weight matrices (PWMs) of the different core promoter

elements, containing the nucleotide distributions in each position of each element.

Each core promoter element appears in a separate sheet. The source of the

promoter sequences used to calculate the PWM is indicated, as well as the Laplace

smoothing performed to avoid zero values. All indicated positions are relative to the

TSS.
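As an illustration of why Laplace smoothing is applied, the sketch below builds a PWM from a handful of aligned sites and scores a candidate window. It is a minimal example under stated assumptions: the site sequences are invented, the pseudocount of 1 and the uniform background of 0.25 are arbitrary choices, and the log-odds score is a common convention that is not necessarily the scoring scheme used by ElemeNT.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Per-position base probabilities with Laplace (add-pseudocount) smoothing."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every count at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in BASES})
    return pwm


def log_odds_score(pwm, window, background=0.25):
    """Sum of log2(probability / background) over the window (assumed scoring)."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal PWM length")
    return sum(math.log2(column[base] / background)
               for column, base in zip(pwm, window.upper()))


# Hypothetical aligned sites, used only to exercise the functions.
sites = ["TATAAAAG", "TATATAAA", "TATAAAAA", "TATATAAG"]
pwm = build_pwm(sites)
print(pwm[0])
print(log_odds_score(pwm, "TATAAAAG"), log_odds_score(pwm, "GCGCGCGC"))
```

Without the pseudocount, any base absent from a column would have probability zero and the logarithm would be undefined for windows containing it.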

Table 1. The known core promoter elements.

| Name | Position (relative to the TSS) | PWM logo representation | Consensus (in IUPAC characters) | References |
|---|---|---|---|---|
| Mammalian Initiator | -2 to +4 | (image) | YYANWYY | [66] |
| Drosophila Initiator | -2 to +5 | (image) | TCAKTY | [66] |
| TATA box | -30/-31 to -23/-24 | (image) | TATAWAAR | [6,67] |
| BREu | Immediately upstream of the TATA box | (image) | SSRCGCC | [42] |
| BREd | Immediately downstream of the TATA box | (image) | RTDKKKK | [41] |
| DPE (Inr dependent) | +28 to +33 | (image) | DSWYVY (functional range set) | [19,20,23] |
| MTE (Inr dependent) | +18 to +29 | (image) | CSARCSSAAC | [24] |
| Bridge (Inr dependent) | Part I: +18 to +22; Part II: +30 to +33 | (image) | Part I: CGANC; Part II: WYGT | [22] |
| Drosophila TCT | -2 to +6 | (image) | YYCTTTYY | [68] |
| Human TCT | -1 to +6 | (image) | YCTYTYY | [68] |
| XCPE1 | -8 to +2 | (image) | DSGYGGRASM | [45] |
| XCPE2 | -9 to +2 | (image) | VCYCRTTRCMY | [69] |
| DCE | +6 to +11, +16 to +21, +30 to +34 | - | Necessary motifs: CTTC, CTGT, AGC | [70,71] |
| Motif 1 | Just upstream of TSS, but can be found up to -300 | (image) | YGGTCACACTR | [47,49] |

The table includes the position (relative to the TSS, +1), motif logo, IUPAC

consensus sequence and references for each element.

Figure 1:

Figure 2:

Figure 3:

Appendix 8

General schematic plasmid maps of the tagged-TAFs-containing plasmid

variants. A. The sequences of the FLAG and HA epitope-tags.

B. TAF6 with a C-term. FLAG-HA tag. C. TAF6 with an N-term. FLAG-HA

tag. D. TAF9 with a C-term. FLAG-HA tag. E. TAF9 with an N-term. FLAG-

HA tag.

A.

Appendix 9

An invited review article about the core promoter, currently under review.

The core promoter: at the heart of gene expression

Yehuda M. Danino1, Dan Even1, Diana Ideses1 and Tamar Juven-Gershon1*

1The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan

University, Ramat Gan 5290002, Israel

Running title: The core promoter: a central player in gene expression

Key words: core promoter; RNA Pol II transcription; core promoter

elements/motifs; enhancer-promoter specificity; core promoter preferential

activation; gene expression

The authors declare that there are no potential conflicts of interest.

* To whom correspondence should be addressed. Tel: +972-3-531-8244; Fax:

+972-3-738-4058; Email: [email protected]

ABSTRACT

The identities of different cells and tissues in multicellular organisms are

determined by tightly controlled transcriptional programs that enable accurate

gene expression. The mechanisms that regulate gene expression comprise

diverse multiplayer molecular circuits of multiple dedicated components. The

RNA polymerase II (Pol II) core promoter establishes the center of this

spatiotemporally orchestrated molecular machine. Here, we discuss

transcription initiation, diversity in core promoter composition, interactions of

the basal transcription machinery with the core promoter, enhancer-promoter

specificity, core promoter-preferential activation, enhancer RNAs, Pol II

pausing, transcription termination, Pol II recycling and translation. We further

discuss recent findings indicating that promoters and enhancers share similar

features and may not substantially differ from each other, as previously

assumed. Taken together, we review a broad spectrum of studies that

highlight the importance of the core promoter and its pivotal role in the

regulation of metazoan gene expression and suggest future research

directions and challenges.

Introduction

Appropriate temporal and spatial gene expression is a highly complex process

underlying the fate and function of different cells and tissues. The regulation

of this process is composed of multiple levels and orchestrated molecular

events [1-3]. A central event in the regulation of eukaryotic gene expression is

the initiation of transcription. The initiation of transcription of protein-coding

genes and distinct non-coding RNAs occurs following the recruitment of RNA

polymerase II (Pol II) to the core promoter region by the basal transcription

machinery [4].

The core promoter is generally defined as the minimal DNA sequence

that directs accurate initiation of transcription. The core promoter sequence

encompasses the transcription start site (TSS), typically referred to as the +1

position [5, 6]. Examination of the distribution of TSSs reveals that there are

multiple modes of transcription initiation (Fig. 1A). Distinct molecular players

can open the chromatin structure at the core promoter region and thus

facilitate initiation of transcription. Interestingly, active promoters are

associated with specific chromatin signatures. These include: nucleosome-

depleted regions (NDR) or reduced nucleosome occupancy over the

promoters, DNaseI hypersensitive sites (DHS) and the enrichment of specific

histone modifications, such as di- and tri-methylation of H3K4 and acetylation

of H3K4 and H3K27 (Fig. 1B) [7, 8]. In the past, it was assumed that the core

promoter is a generic entity that functions in a universal manner. Today,

however, the growing consensus is that the unique properties of a given

promoter are a function of its architecture and core promoter motif

composition (Fig. 1C and D) [5, 6, 9, 10].

The core promoter, which is often referred to as “the gateway to

transcription”, is a central component in the initiation of transcription [11, 12].

Research in the past decade has enhanced our understanding of the

fundamental roles that the core promoter plays in the initiation of transcription,

as well as in the regulation of additional aspects of gene expression. Insights

are gained from studies of specific genes and gene networks [12-14], as well

as from genome-wide studies [10, 15] utilizing methodologies such as PEAT

[16], 5' RACE [17], CAGE [18], FAIRE-seq [19], ChIP-seq [20], Gro-seq [21],

and RNA-seq [22], and key projects and consortia (e.g. modENCODE [23],

ENCODE [24] and FANTOM5 [25]), which developed following the

implementation of some of the above methods. Accordingly, core promoters

can be studied at different resolutions: from genomic architecture,

transcription co-regulators and sequence-specific transcription factors (Fig.

2A), through basal transcription factors (Fig. 2B and C) and DNA sequence

motifs (Fig. 2C). Importantly, the different experimental strategies complement

each other and, together, provide an elaborate view of core promoters. Here,

we review the current state of knowledge relevant to the contribution of the

core promoter to multiple aspects of gene expression, and discuss future

directions and challenges in the field.

1. Diversity in the transcription initiation landscape

1.1. Multiple modes of transcription initiation

The core promoter is best known for its role in directing proper transcription

initiation at the TSS. Several years ago, two modes of transcription initiation,

focused and dispersed, were noted in metazoans (Fig. 1A) [6, 10]. Focused

(also termed “sharp peak”) promoters contain a single predominant TSS or a

few TSSs within a narrow region of several nucleotides [9]. Focused

promoters span approximately -40 to +40 nucleotides relative

to the TSS (referred to as the +1 position). Focused transcription initiation is

associated with spatiotemporally regulated, tissue-specific genes [26] and with

canonical core promoter elements that have a positional bias, such as the

TATA box, Initiator, MTE and DPE [27] (Fig. 1C).

Dispersed (also termed “broad”) promoters contain multiple weak start

sites that spread over 50 to 100 nucleotides at the promoter region ([9, 10]

and refs therein). Dispersed transcription initiation is associated with

constitutive or housekeeping genes. Vertebrate dispersed promoters often

contain CpG islands and Sp1 and NF-Y sites [6, 9, 28] whereas Drosophila

core promoters often contain elements that have weaker positional biases (as

compared to the focused promoters), but frequently co-occur in a specific

order and orientation: Ohler 1, DNA replication element (DRE), Ohler 6 and

Ohler 7 [27, 29] (Fig. 1D). Although the focused promoter architecture exists

in all organisms and is the predominant initiation mode in simpler

organisms, the dispersed mode is more common in higher eukaryotes [9, 26].

For example, over 70% of vertebrate promoters are dispersed [28, 30-32].

From a teleological standpoint, the associations of sharp TSSs with regulated

genes and of broad TSS patterns with constitutively expressed genes are

rather intuitive. It would be easier to achieve a more precise control of gene

expression from focused TSSs, as compared with dispersed promoters of

housekeeping genes, which would be constitutively transcribed with minimal

variation of gene expression by usage of multiple start sites [9].
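The distinction between focused and dispersed initiation can be made concrete with a simple rule applied to per-nucleotide TSS read counts (for example, from CAGE data). The sketch below is only one possible operationalization: the 10-nt window and the 90% cutoff are hypothetical thresholds chosen for illustration and are not taken from the studies cited above.

```python
def classify_initiation(tss_counts, focused_max_width=10, min_fraction=0.9):
    """Label an initiation pattern as 'focused' or 'dispersed'.

    tss_counts: dict mapping genomic position -> number of reads that
    start there. The pattern is called focused when at least min_fraction
    of all reads fall inside some window of focused_max_width nucleotides.
    Both thresholds are illustrative assumptions.
    """
    total = sum(tss_counts.values())
    if total == 0:
        raise ValueError("no reads to classify")
    best = 0
    # Try a window starting at every observed start site.
    for start in tss_counts:
        in_window = sum(count for pos, count in tss_counts.items()
                        if start <= pos < start + focused_max_width)
        best = max(best, in_window)
    return "focused" if best / total >= min_fraction else "dispersed"


# Invented read counts: one dominant start site versus many weak ones.
sharp = {100: 90, 101: 8, 103: 2}
broad = {pos: 5 for pos in range(0, 80, 4)}
print(classify_initiation(sharp), classify_initiation(broad))
```

Real classification schemes use richer criteria, as discussed in the next section, so a two-way rule like this should be read as a starting point only.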

1.2. Focused versus Dispersed initiation patterns - recent studies, new

insights

Despite the abovementioned distinction between the two modes of

transcription initiation, classification of transcription initiation landscapes is not

so straightforward. Functional experiments and genome-wide studies using

advanced technologies imply that there are multiple ways to classify

promoters. Thus, the boundaries between these two major types of promoters

are sometimes unclear [6, 33]. Beyond the “focused vs. dispersed”

classification described above, an additional promoter type, the mixed promoter

(also termed “broad with peak” [16]), was identified. This promoter

type exhibits a dispersed initiation pattern with a single strong transcription

start site [6, 34] (Fig. 1A). Several studies classified mammalian promoters

using alternative criteria [26, 28, 32]. The Ren Lab classified active promoters

based on genome-wide ChIP experiments for TFIID and Pol II, as well as

H3Ac and H3K4me, regardless of focused or dispersed initiation patterns [32].

Bajic et al. [28] defined four promoter types, based on the distribution of

dinucleotides over the promoter regions, CpG islands and TATA boxes.

Moreover, Carninci et al. [26] classified promoters into four groups based on

CAGE analysis: single peak, broad shape peak, bimodal/multimodal peak and

broad with dominant peak. These studies also challenge the “focused vs.

dispersed” classification, as some mouse and human promoters contain both

CpG Islands and TATA boxes. A recent comprehensive review [10], which

compared genome-wide studies in human and Drosophila, presented another

sub-classification of three major types of promoters termed Type I, Type II

and Type III. Type I promoters contain TATA boxes and focused TSSs, lack

CpG islands and are associated with tissue-specific expression in adult

tissues. Type II promoters contain CpG islands and dispersed TSSs. In

mammals, type II promoters lack TATA boxes, and in Drosophila they contain

DRE, Ohler 1 or Ohler 6 motifs. Genes belonging to this group are associated

with broad expression throughout the organism's life. Type III promoters are

associated with developmentally regulated genes, which in Drosophila contain

combinations of Initiator and DPE motifs. In mammals, type III promoters

contain large CpG islands.

Taken together, the transcriptional initiation landscape is more complex than

the simple classification of two types of promoters.

1.3. Bidirectional and divergent transcription

Another manifestation of the complexity of transcription initiation is the

phenomenon of bidirectional transcription. Bidirectional transcription, which

presents two closely spaced transcription initiation events (within less than

1kb) of head-to-head Pol II transcripts in both sense and anti-sense

orientations, was originally defined for adjacent head-to-head oriented pairs of

protein-coding genes [35]. The relatively short region that contains the

opposite-oriented initiation sites and separates these genes is often

called a “bidirectional promoter” [36]. Experimental and computational studies

have characterized many features of bidirectional promoters. In general,

10%-22% of the genes in mammals were shown to be organized in this manner

[37]. Moreover, the bidirectionality was shown to be controlled in a cell-type

specific manner, and these pairs of genes are coordinately regulated ([37] and

refs therein). Hence, bidirectional promoters might have evolved to facilitate

the regulation of transcription of different genes at the same time, and might

consist of two separate, yet dependent, core promoters. Additionally, a

computational analysis supports an evolutionary role for bidirectional

promoters in the emergence of novel species-specific transcripts [38].

Bioinformatics analysis of the distribution of common core promoter elements

(BREu, TATA box, Inr and DPE) and CpG islands at bidirectional versus

unidirectional promoters, demonstrated that while the BREu is enriched at

bidirectional promoters, the Inr and DPE elements are similarly detected at

both promoter types [39]. The TATA box is rare in general, but is enriched in

bidirectional promoters of histone genes. Moreover, it was shown that

CpG islands and Sp1 binding sites are more common in

bidirectional promoters than in unidirectional promoters [40]. Other

studies focused on overrepresented binding-sites of different transcription

factors, and in some cases - on their influence on the expression of two

opposite genes regulated by a bidirectional promoter [37, 41].
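The original definition of a bidirectional promoter given at the start of this section (head-to-head gene pairs whose start sites lie less than 1 kb apart) translates into a short search over gene annotations. The sketch below is a simplified illustration: the `Gene` record and the example coordinates are hypothetical, and the check for intervening genes that a full adjacency test would need is omitted.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int     # 1-based coordinate of the transcription start site
    strand: str  # "+" or "-"


def head_to_head_pairs(genes, max_distance=1000):
    """Return (minus_gene, plus_gene) name pairs sharing a putative bidirectional promoter.

    A pair qualifies when the minus-strand TSS lies upstream (at a lower
    coordinate) of the plus-strand TSS on the same chromosome, so the two
    genes are transcribed away from each other, and the TSSs are less
    than max_distance apart.
    """
    minus = [g for g in genes if g.strand == "-"]
    plus = [g for g in genes if g.strand == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            if m.chrom == p.chrom and 0 < p.tss - m.tss < max_distance:
                pairs.append((m.name, p.name))
    return pairs


# Invented annotations: A/B are divergent, C/D are convergent.
genes = [
    Gene("A", "chr1", 1000, "-"),
    Gene("B", "chr1", 1400, "+"),
    Gene("C", "chr1", 5000, "+"),
    Gene("D", "chr1", 5300, "-"),
]
print(head_to_head_pairs(genes))
```

Requiring the minus-strand start site to lie at the lower coordinate is what separates head-to-head pairs from convergent pairs that happen to be close together.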

Interestingly, another manifestation of bidirectional transcription

involving non-coding RNAs (ncRNAs) was recently characterized. Multiple

classes of ncRNAs were identified in different organisms (reviewed in [42]).

One of these classes is promoter-associated ncRNAs. Over the years,

classes of promoter-associated non-coding transcripts were discovered in

bacteria, yeast, Drosophila, mouse, human and plants ([42-44] and refs

therein). Four studies, published back-to-back in 2008, described new classes

of promoter-associated ncRNAs in humans and mice [21, 45-48]. These

ncRNAs were generally divided into two classes, termed TSS-associated

RNAs (TSSa-RNAs) [47] and promoter upstream transcripts (PROMPTs) [46]

or upstream antisense RNAs (uaRNAs) [49], which share many features.

They are short, present at low abundance and are associated with CpG

islands and active-promoter-related histone marks (H3K4me3, H3ac), but not

with elongation-related histone marks (H3K36me3, H3K79me3).

Non-coding antisense RNAs derived from bidirectional promoters have

very short half-lives and are barely detectable. Two recent studies have

shown that an asymmetric distribution of polyadenylation signals and U1

snRNP-binding sites surrounding TSSs control transcript stability [49-51].

Notably, bidirectional initiation is also a feature of enhancer RNAs (eRNA; see

section 7) [52, 53].

The Lis lab has demonstrated that nearly 80% of active genes have

bidirectional promoters, suggesting that bidirectional initiation is a general

feature of mammalian genomes [21, 54]. Hence, these divergent ncRNAs

may be regarded as markers for active promoters of protein-coding genes [21,

45-47, 55]. Duttke et al. have recently analyzed transcription from human

promoters in HeLa cells and have classified promoters into three types:

unidirectional promoters, divergent promoters (containing an annotated gene

in the forward direction and no annotated gene in the reverse direction) and

bidirectional promoters (containing annotated genes in both directions) [56].

Surprisingly, they discovered that about half of human active promoters are

intrinsically unidirectional. Moreover, the divergent transcripts result from their

own reverse-oriented core promoters. Using DNaseI accessibility they

determined that unidirectional promoters are depleted at the edges of open

chromatin. The authors suggest that divergent transcription is not an inherent

property of the transcription process, but a consequence of the presence of

both forward and reverse-directed promoters. This suggestion is in line with

the two occupancy peaks observed for each of TBP and Pol II by the Lis lab

[54]. The Lis lab observed tight spacing (estimated at 110 bp) between the

forward- and reverse-directed promoters [54], whereas the Ohler and Kadonaga

labs observed variable, yet larger, spacing between the two [56]. It

remains to be determined whether the difference between these findings

results from the different cell lines used or from the

analysis methodology.

Despite the impressive discoveries related to bidirectional transcription

in the last few years (which highlight the complexity of gene expression), the

functional role of short non-coding antisense RNAs still remains elusive. From

this point onwards, we only refer to the comprehensively studied focused and

dispersed core promoter types.

2. Core promoter elements: the combinatorial code of precise

transcription initiation

The Pol II core promoter is composed of short DNA sequences that are

referred to as core promoter elements or motifs. The majority of core promoter

motifs serve as binding sites for components of the basal transcription

machinery, in particular TFIID, which is composed of TATA box-binding

protein (TBP) and TBP-associated factors (TAFs), and TFIIB [4, 57, 58].

The basal transcription machinery recruits Pol II to the core promoter

that directs the initiation of transcription [4, 6, 9, 59-61]. Nevertheless, there

are no universal core promoter elements, and diverse core promoter

compositions have been reported [6, 62]. In this section, we will briefly discuss

the majority of core promoter elements (schematically depicted in Fig. 1C and

D), which have been analyzed in Drosophila and mammals, with particular

emphasis on their variety and the relations between them.

2.1. The precisely-positioned core promoter elements are common in

the focused promoters

Early studies from the Chambon lab described the existence of a putative

element at the TSS [63]. The function of the initiator (Inr) as a transcriptional

element that encompasses the +1 TSS was articulated by Smale and

Baltimore [64]. The Inr is probably the most prevalent core promoter motif in

focused core promoters [65-67]. It is mainly bound by the TAF1 and TAF2

subunits of TFIID [68-70]. The mammalian Inr consensus sequence is

YYA+1NWYY (IUPAC nomenclature) [71], and the Drosophila consensus is

TCA+1KTY [70, 72]. Inr-like sequences were also identified in Saccharomyces

cerevisiae [73]. Computational analyses of promoters argue that the Inr

consensus is only YR (-1, +1 positions) in humans [10, 26, 74] or TCA+1GTY

for Drosophila [65, 67]. The A nucleotide (or R in the YR consensus) is

generally designated as the +1 position, even when transcription does not

initiate at this specific nucleotide. This critical convention is instrumental,

because functional downstream elements are completely dependent on the

presence of an Inr and the precise spacing from it [6, 9, 12].
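
The Inr consensus sequences quoted above are written in IUPAC notation, and the convention of naming the A as +1 can be illustrated with a short sketch. The Python code below is only an illustration and is not part of any analysis described here: the function names (consensus_to_regex, find_inr) and the a_offset parameter are hypothetical, and a plain regular-expression match is assumed to be an acceptable stand-in for the statistical motif models used in the cited studies.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def find_inr(sequence, consensus="YYANWYY", a_offset=2):
    """Return the 0-based index of the A(+1) for every consensus match.

    a_offset is the index of the +1 nucleotide inside the consensus
    (2 for the mammalian YYA+1NWYY and for the Drosophila TCA+1KTY).
    """
    pattern = consensus_to_regex(consensus)
    hits = []
    # A lookahead allows overlapping matches to be reported.
    for match in re.finditer("(?=(%s))" % pattern.pattern, sequence.upper()):
        hits.append(match.start() + a_offset)
    return hits
```

Calling find_inr(sequence, "TCAKTY") scans for the Drosophila consensus instead of the mammalian one. The returned index is the anchor from which the positions of downstream elements would be counted.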

Notably, a strict version of the mammalian initiator (sINR), which is

present in 1.5% of human genes and enriched in TATA-less promoters of

specific functional categories, was defined as CCA+1TYTT, with conserved

sequences flanking the motif [75]. The sINR motif functions in cooperation

with Sp1 and can replace the conventional Inr, but not vice versa. Similarly to

the canonical Inr element, sINR is bound by TAF1 and its function depends on

it [75]. The YY1 transcription factor binds sINR, but this binding is dispensable

for sINR function [75].

In addition to these versions of the Inr, a few core elements that

encompass the transcription start site were identified. The polypyrimidine

initiator motif (TCT), which was originally identified in mouse, is conserved

from Drosophila to humans [13, 76-78]. The TCT has a consensus sequence

of YYC+1TTTYY in Drosophila and YC+1TYTYY in humans, in which C is the

+1 TSS. Although the Inr consensus resembles the TCT consensus, the TCT

motif cannot substitute for an Inr to initiate transcription [13]. The TCT

overlaps with a motif that was previously identified in humans, termed 5'-

terminal oligopyrimidine tract (5'-TOP) (reviewed in [79]), which is functionally

distinct from it [13]. Both the TCT and the 5'-TOP elements are enriched and

are functional in the transcription of ribosomal protein genes and proteins

involved in the regulation of translation [13, 76].

Two additional core promoter motifs that are located around TSSs

were originally identified in the hepatitis B virus X gene promoter, which

contains two TSSs. The X gene core promoter element 1 (XCPE1) drives Pol

II transcription from the first TSS of the X gene promoter as well as from other

human promoters, when accompanied by co-activator sites. XCPE1 is found

in ~1% of the human genes (particularly TATA-less genes) and its consensus

sequence DSGYGGRASM spans positions -8 to +2 relative to the TSS [80].

Unlike XCPE1, the X gene core promoter element 2 (XCPE2) is sufficient to

drive Pol II transcription by itself. The XCPE2 directs transcription from the

second TSS of the X gene mRNA, but it also drives transcription from

additional human promoters, in a TAF-free manner. Its consensus sequence

VCYCRTTRCMY spans positions -9 to +2 relative to the TSS [81].

There are core promoter elements that are located upstream of the

TSS. The TATA box motif is the first core promoter motif to be identified [82].

Although the TATA box was previously considered to be a universal element,

it is presently estimated that only 8%-30% of metazoan core promoters [26,

32, 59, 67, 83] and 20%-46% of yeast promoters [61, 84, 85] are TATA-

dependent. The TATA box motif is also present in plants [86, 87]; however, the

majority of Arabidopsis promoters are TATA-less [88]. The TATA box is bound

by the TBP subunit of TFIID ([5, 6, 62] and refs therein). Both the TATA box

element and the TBP are conserved from archaebacteria to humans [9, 89].

The consensus sequence of the TATA box is TATAWAAR, where the 5' T is

usually located at -30 or -31 relative to the TSS in metazoans (or at -120 to

-40 in yeast). A wide range of sequences can functionally replace the yeast

TATA box for in vivo transcriptional activity [90].
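
The positional constraint on the metazoan TATA box can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a published method: the helper names (tata_positions, has_canonical_tata) and the tss_index argument are hypothetical, only exact matches to the TATAWAAR consensus are considered, and the coordinate system is assumed to have no position 0 (so that -1 lies immediately upstream of +1).

```python
import re

# TATAWAAR written as a regular expression (W = A/T, R = A/G).
TATA = "TATA[AT]AA[AG]"


def tata_positions(sequence, tss_index):
    """Return the position, relative to the +1 TSS, of the 5' T of each match.

    tss_index is the 0-based index of the +1 nucleotide. Because there is
    no position 0, the nucleotide just upstream of +1 is numbered -1.
    """
    positions = []
    for match in re.finditer("(?=%s)" % TATA, sequence.upper()):
        offset = match.start() - tss_index
        positions.append(offset if offset < 0 else offset + 1)
    return positions


def has_canonical_tata(sequence, tss_index):
    """True if a consensus TATA box starts at -30 or -31 (metazoan spacing)."""
    return any(p in (-30, -31) for p in tata_positions(sequence, tss_index))
```

Because a wide range of sequences can functionally replace the consensus, an exact-match scan of this kind will miss functional TATA boxes; it is meant only to make the spacing rule concrete.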

The two TFIIB recognition elements (BREs), which are bound by the TFIIB

basal transcription factor, are located immediately upstream and downstream of

the TATA box, respectively [91-93]. TFIIB contacts these two elements by two

independent DNA-recognition motifs within its core domain [92]. The

consensus of the upstream BRE (BREu) is SSRCGCC [93], and the

consensus of the downstream BRE (BREd) is RTDKKKK [91]. The TFIIB and

the BRE elements are conserved from archaebacteria to humans [6, 92]. Both

BREu and BREd act in conjunction with the TATA box [6, 9]. A bioinformatics

analysis using the EPD database showed that 25% of the eukaryotic core

promoters contain a potential BREu [83]. Surprisingly, this study revealed that

the BREu is more prevalent in TATA-less promoters (28.1%) than in TATA-

containing promoters (11.8%). Both elements exert positive as well as

negative effects on basal transcription and on activated transcription in a

manner that is context-dependent [91, 93-95].

In addition to the abovementioned upstream elements there are core

promoter elements that are located downstream of the TSS. The downstream

core promoter element (DPE), which was discovered as a TFIID recognition

site that is downstream of the Inr, is precisely located at +28 to +33 relative to

the A+1 of the Inr, with a functional range set of DSWYVY [96-98]. In addition

to this functional range set, the guanine at +24 was shown to contribute to

DPE function [98]. The DPE is prevalent in developmental gene networks [10,

14, 95, 99]. Importantly, a recent study provides in vivo evidence that

expression driven by the homeotic Antennapedia P2 promoter during

Drosophila embryogenesis is dependent on the DPE [99]. The motif ten

element (MTE) was identified as an overrepresented core promoter

sequence, which is located immediately upstream of the DPE, encompassing

positions +18 to +29 relative to the A+1 of the Inr [67]. As positions +28 to +29

overlap the DPE, the MTE consensus sequence was defined for positions +18

to +27 (CSARCSSAAC) [100]. Although the majority of the MTE-containing

promoters contain a DPE, the MTE motif functions independently of the DPE

[100, 101]. Both the MTE and DPE serve as recognition sites for TFIID and

appear to be in close proximity to TAF6 and TAF9 [97, 101]. Using single-

nucleotide substitution analysis, the MTE and DPE together were found to

consist of three functional sub-regions: positions 18-22, 27-29 and 30-33

downstream to the A+1 of the Inr. The bridge configuration, which includes the

first and the third functional sub-regions (bridge I, positions 18-22 with favored

nucleotides CSARC; bridge II, positions 30-33 with favored nucleotides

WYVY), was shown to be a naturally rare but functional core promoter

element [101]. Both the MTE and DPE are conserved from Drosophila to

humans [6, 96, 97, 100-104]. The MTE, DPE and Bridge motifs are

exclusively dependent on the presence of a functional Inr, and are enriched in

TATA-less promoters. However, co-occurrence of putative TATA, Inr and DPE

motifs was observed in a small fraction of Drosophila genes [14, 83].
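
Since the MTE, DPE and bridge elements are defined by their exact distance from the A+1 of the Inr, their presence in a given sequence can be checked by looking at fixed windows. The following Python sketch assumes that the index of the A+1 is already known; the function names (element_at, downstream_elements) are hypothetical, and a simple match to the consensus or favored nucleotides quoted above is assumed to approximate what is in reality a graded, experimentally defined preference.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def element_at(sequence, a_plus1_index, start, consensus):
    """Check whether an IUPAC consensus begins at position +start.

    a_plus1_index is the 0-based index of the Inr A(+1), so position +n
    lies at index a_plus1_index + n - 1.
    """
    begin = a_plus1_index + start - 1
    window = sequence.upper()[begin:begin + len(consensus)]
    if len(window) < len(consensus):
        return False  # the sequence ends before the element would end
    pattern = "".join(IUPAC[base] for base in consensus)
    return re.fullmatch(pattern, window) is not None


def downstream_elements(sequence, a_plus1_index):
    """Report which downstream elements match at their expected positions."""
    return {
        # DPE functional range set at +28 to +33
        "DPE": element_at(sequence, a_plus1_index, 28, "DSWYVY"),
        # MTE consensus at +18 to +27
        "MTE": element_at(sequence, a_plus1_index, 18, "CSARCSSAAC"),
        # bridge I at +18 to +22 together with bridge II at +30 to +33
        "bridge": element_at(sequence, a_plus1_index, 18, "CSARC")
        and element_at(sequence, a_plus1_index, 30, "WYVY"),
    }
```

A sequence can satisfy more than one of these checks at once, which is consistent with the overlap between the MTE, DPE and bridge sub-regions described above.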

An additional downstream element was identified and characterized in

the human adult β-globin promoter. This element, termed downstream core

element (DCE), was detected by scanning mutagenesis of the +10 to +45

region of the promoter. The DCE is composed of three sub-elements, located at

positions +6 to +11 (necessary motif CTTC), +16 to +21 (necessary motif

CTGT), and +30 to +34 (necessary motif AGC) relative to the TSS. The DCE

is distinct from the MTE, DPE and Bridge downstream elements, as the DCE

is recognized and bound by TAF1 [105] and not by TAF6 or TAF9 [97, 101].

Moreover, unlike the DPE, the DCE is frequently found in TATA box-

containing promoters [105, 106].
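
The three-part architecture of the DCE can be expressed as a simple window test. The Python sketch below rests on an assumption that is not stated in the text: each necessary motif is taken to be present if it occurs anywhere within the position range given for its sub-element. The labels sub1 to sub3, the function name dce_subelements and the tss_index argument are hypothetical.

```python
# (label, first position, last position, necessary motif) for each sub-element
DCE_SUBELEMENTS = [
    ("sub1", 6, 11, "CTTC"),
    ("sub2", 16, 21, "CTGT"),
    ("sub3", 30, 34, "AGC"),
]


def dce_subelements(sequence, tss_index):
    """Report, per sub-element, whether its necessary motif lies in its window.

    tss_index is the 0-based index of the +1 nucleotide, so position +n
    lies at index tss_index + n - 1.
    """
    seq = sequence.upper()
    found = {}
    for label, first, last, motif in DCE_SUBELEMENTS:
        window = seq[tss_index + first - 1:tss_index + last]
        found[label] = motif in window
    return found
```

A promoter would be considered a DCE candidate under this sketch only if all three entries of the returned dictionary are true.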

2.2. Core promoter elements with weak positional biases in dispersed

promoters

Even though the vast majority of core promoter elements are precisely located

in focused promoters, there are still a few variably located motifs that were

also identified in dispersed promoters. These variably located elements, like

some of the precisely located elements discussed above, are associated with

specific gene groups.

As mentioned, there are sequence motifs such as the DNA replication-

related element (DRE) and Ohler 1, 6 and 7 motifs, which were detected by a

computational analysis as common in dispersed promoters of

Drosophila genes with maternally inherited transcripts [27]. The consensus

sequences of the DRE, Ohler 1, 6 and 7 motifs are WATCGATW,

YGGTCACACTR, KTYRGTATWTTT and KNNCAKCNCTRNY, respectively

[67]. The DRE is a target of the DNA replication-related-element binding factor

(DREF). DREF, which was discovered in Drosophila and was later found to

have orthologues in many other species (including humans), is involved in

transcriptional regulation of proliferation-related genes [107]. A motif 1 binding

protein (M1BP) has recently been identified and the enrichment of Motif 1 and

M1BP was implicated in cytoskeletal organization, mitotic cell cycle and

metabolism [108].
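
In contrast to the precisely positioned elements, these motifs have to be searched for without a positional constraint. The Python sketch below scans a promoter region for the four consensus sequences listed above and reports every match with its strand and start index. Scanning both strands is an assumption made here for illustration, and the names scan_motifs and reverse_complement are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MOTIFS = {
    "DRE": "WATCGATW",
    "Ohler 1": "YGGTCACACTR",
    "Ohler 6": "KTYRGTATWTTT",
    "Ohler 7": "KNNCAKCNCTRNY",
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_motifs(sequence):
    """Map each motif name to a sorted list of (strand, start index) hits.

    Start indices are 0-based and always refer to the forward strand.
    """
    seq = sequence.upper()
    rc = reverse_complement(seq)
    hits = {}
    for name, consensus in MOTIFS.items():
        pattern = re.compile("(?=%s)" % "".join(IUPAC[b] for b in consensus))
        k = len(consensus)
        found = [("+", m.start()) for m in pattern.finditer(seq)]
        # Convert reverse-strand coordinates back to the forward strand.
        found += [("-", len(seq) - m.start() - k) for m in pattern.finditer(rc)]
        hits[name] = sorted(found)
    return hits
```

A palindromic site such as TATCGATA, which fits the DRE consensus, is reported once on each strand at the same forward-strand index.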

2.3. The interplay between core promoter elements

With the notion that there are no universal core promoter elements and that

core promoter elements are a very important feature of regulation of gene

expression, many studies examined the combinations between core promoter

elements such as: Inr, TATA box, BREu, BREd, MTE and DPE, and their

effects on the transcriptional output. For example, the BRE elements were

originally characterized as functional elements in conjunction with the TATA box.

In this context, both the BREu and the BREd either increase or decrease the

levels of basal transcription [91, 93, 94, 109]. Notably, the addition of a BREu

element to a core promoter of a Caudal target gene has a differential effect on

transcription in a TATA box- or DPE- context [95]. The TATA box and the Inr

cooperate, in certain cases, as synergistic elements [110]. An antagonistic

behavior was demonstrated between TBP, which activates TATA transcription

and inhibits DPE transcription, and NC2 and Mot1, which activate DPE

transcription by inhibiting the function of TBP [111].

The functionality of the DPE, MTE and Bridge elements is, by

definition, dependent on their precise location relative to the Inr [96, 97, 100,

101]. Synergy was observed between the MTE and DPE, as well as between

the MTE and TATA box [100]. Based on these relationships, a synthetic core

promoter, termed super core promoter (SCP), containing a TATA box, Inr,

MTE and DPE was designed. Remarkably, the SCP is stronger than any of

the natural core promoters examined [112].

Collectively, these findings indicate that the levels of gene expression

can be modulated by the core promoter composition. Such modulation is

directly achieved by the impact of the combinations of core promoter elements

on the architecture of the basal transcription machinery, which provides an

additional level of transcriptional regulation. The core promoter may have

diversified during evolution so that each element may work with the other,

depending on the context and organism. Hence, simple categorization may

disregard the complexity of gene expression.

3. Functional and structural insights regarding the role of the core

promoter in the assembly of the Pol II transcription machinery

In this section, we describe the assembly of the basal transcription

machinery components (primarily based on the analysis of TATA-dependent

promoters) and their distinct roles in specific cellular contexts.

3.1. Terminology change: from “general” to “basal” transcription

machinery

Classic biochemical studies performed over 30 years ago using the TATA

box-containing adenovirus major late promoter identified the general

transcription factors (GTFs) as accessory factors for accurate Pol II

transcription initiation [113, 114]. The GTFs were named TFIIA, TFIIB, TFIID,

TFIIE, TFIIF and TFIIH, based on the protein fractions in which they were purified

(reviewed in [4]). These components, together with Pol II, were necessary

and sufficient for basal transcription of the adenovirus major late promoter.

They assemble into the preinitiation complex (PIC) by protein-protein

interactions and by mediating core promoter recognition (Fig. 2B).

In the past, it was generally accepted that the PIC composition of GTFs

does not vary between promoters with different core promoter architecture,

and the PIC is nucleated by the binding of the TBP subunit of TFIID, which

binds the TATA box [115] (reviewed in [4, 30]). Traditionally, this simple model

has been considered “general”. However, due to the diversity in core promoter

composition and the realization that the known GTFs are insufficient to

transcribe DPE-containing promoters [116], it is suggested that the GTFs do

not function in a “general” manner, and different compositions of PIC exist.

Indeed, the non-ubiquitous expression pattern of certain TAFs implies that they

cannot be PIC components in every cell type [57]. Moreover, many studies

have presented the variability in PIC formation, specifically by the molecular

flexibility in TFIID composition. Hence, GTFs should be addressed as “basal”

rather than “general” transcription factors (also discussed in [57, 117-119]).

3.2. Compatibility between PIC components, related factors and core

promoter elements

Undoubtedly, the diverse assembly of the basal transcription factors, as well

as the diversity of core promoter elements, is a complex subject, both

structurally and functionally. Nevertheless, owing to this diversity, the PIC,

which is pivotal for core promoter recognition ([57, 117, 120] and refs therein),

can assemble at core promoters with varying compositions and regulate Pol II

transcription in different cells and organisms. In agreement with that,

requirements for a “match” between the PIC and the core promoter have been

observed in recent years.

This compatibility has mainly been reflected in studies addressing the

flexibility and modularity of TFIID subunits and the entire TFIID complex. Early

footprinting assays detected differential TFIID protection patterns with respect

to the presence of a TATA box and BRE in mammalian promoters [121, 122],

and a DPE in Drosophila [97]. These studies and others [123] have

demonstrated the important roles of TAFs in the assembly of the PIC, and

hence, in the transcription process. As mentioned earlier, sub-modules of

TFIID bind specific core promoter elements, e.g. TBP binds the TATA box,

TAF1/TAF2 bind the Inr, TAF1 binds the DCE and TAF6/9 bind the DPE and

the MTE (Fig. 2C) [68-70, 96, 97, 100, 103, 105]. It is noteworthy that

TAF4/TAF12 and TAF4b/TAF12 sub-complexes can also bind core promoters

[103], and are necessary for transcription of a sub-group of genes, which are

mostly associated with TATA box and Inr motifs [124]. Interestingly, TAF1

contains two distinct enzymatic activities: an acetyl-transferase and a kinase

activity, which are important for regulating non-overlapping, different gene

sets in vivo [125], suggesting that different functional modules of the PIC

contribute to transcription of different target genes.

While TBP and TAF1 were initially considered the nucleating subunits

of holo-TFIID assembly [126], Wright et al. [127] discovered that Drosophila

TAF4 preferentially nucleates TFIID in TATA-less, DPE-containing promoters.

This study also uncovered a stable core-sub-complex, composed of TAF5 and

the histone fold domain (HFD)-containing TAF4, TAF6, TAF9 and TAF12.

This core sub-complex is associated with the peripheral subunits TAF1, TAF2,

TAF11 and TBP. These core TAFs are incorporated into TFIID in two copies,

and are organized in five heterodimer pairs with other HFD-containing TAFs

(TAF3-TAF10, TAF6-TAF9, TAF4-TAF12, TAF8-TAF10 and TAF11-TAF13)

([120] and refs therein). Recent structural analysis of human TFIID

demonstrated that these core TAFs exhibit two-fold symmetry [128].

Interestingly, incorporation of the TAF8-TAF10 pair breaks the symmetry and

allows the entry of the single copy TAFs and TBP into the structure, resulting

in an asymmetric holo-TFIID that can nucleate the PIC.

Several TBP-free complexes have been characterized [123, 129, 130]. One of

them, the TBP-free TAF-containing complex (TFTC), is capable of replacing the

canonical TFIID at both TATA-less and TATA-containing promoters in vitro [123]. The

assembly of TAF-less TBP-containing complexes (such as TBP-TFIIA-containing

complexes) at specific core promoters, which was somewhat surprising, has also

been observed [131-133]. A TAF-free TBP-containing PIC is important for

transcription from HIV-1 LTR promoter [132]. Interestingly, a distinctive TBP-TAF

complex, lacking TAF1, TAF4 and TAF10, is involved in transcription of the U2

snRNA gene [134].

These findings add to a growing body of evidence implying that distinct

core promoters would be differentially recognized by PICs that contain TBP or

are devoid of it. Notably, TBP activates TATA-dependent transcription and

represses DPE-dependent transcription, whereas Mot1 and NC2 block TBP

function and thus repress TATA-dependent transcription and activate DPE-

dependent transcription [111, 135]. Interestingly, Deng et al. [136]

demonstrated that NC2 acts positively at promoters that lack functional BREs,

while TFIIA recruitment, which is dependent on the presence of BREs,

reduces transcriptional activity. The association of BRE elements with TATA

boxes further supports these findings [83, 93]. Interestingly, the architectural

DNA-binding protein HMGA1 has been shown to interact with the Mediator

and activate transcription of mammalian promoters containing both a TATA

box and an Inr [137].

Remarkably, the Nogales lab used electron microscopy to visualize

human TFIID with promoter DNA, and discovered that TFIID exists in two

structurally distinct conformations (termed canonical and rearranged) [138].

The transition between the two states is modulated by TFIIA, and the

presence of TFIIA and promoter DNA facilitates the formation of the

rearranged conformation [138]. Human TFIID is composed of three main

structural lobes (termed lobe A, B and C) [138, 139]. Using the super core

promoter DNA [112], lobe C was shown to interact with downstream elements

(DPE and MTE), while lobe A interacts with the Inr and TATA box.

Three TBP-related factors (TRF1, TRF2 and TRF3) have been

discovered in the animal kingdom based on their homology to the C-terminal

core domain of TBP, which is essential for interaction with the TATA box

(reviewed in [117-119, 140-142]). Unlike TRF1 and TRF3 (also termed TBP2

and TBPL2), TRF2 (also termed TLP, TLF, TRP and TBPL1), is unable to

recognize the TATA box, as the TATA-interacting Phe residues of TBP are

not conserved in TRF2 [143-145]. Drosophila TRF2 selectively regulates the

TATA-less histone H1 promoter, whereas TBP regulates the TATA-containing

core histone genes [133, 146]. The Kadonaga lab has recently discovered

that TRF2, and not TBP, regulates transcription of ribosomal protein genes

that lack a TATA box and contain functional TCT motifs [147]. Kedmi et al.

[148] discovered that TRF2 preferentially functions as a core promoter

regulator of DPE-containing promoters. These findings and others have

highlighted the involvement of TRF2 in the regulation of diverse biological

processes driven by distinct core promoter compositions (reviewed in [119]).

Taken together, promoter recognition by multiple TAFs, TRFs, and TBP-free or

TBP-containing complexes underscores a key regulatory role for core

promoters in transcription initiation, and may provide an explanation for

evolutionary changes affecting the PIC-promoter interface [149].

3.3. Different basal transcription factors promote distinct biological

processes

The diversity in the components of the PIC, especially in TFIID subunits,

establishes distinct protein complexes that drive transcription of specific sets

of genes (e.g. with cell type- or tissue-specific functions) (reviewed in [150]).

The Wassarman lab has shown that Drosophila TAF1 affects multiple

developmental events in vivo [151], and that Drosophila TAF6 is broadly

required for cell growth and cell fate specification [152]. Moreover, Drosophila

TAF4 and TAF6 were shown to be required for transcription of the snail and

twist Dorsal-target genes in vivo [153]. Human TAF8 was implicated in

differentiation of cultured 3T3-L1 preadipocytes to adipocytes [154].

Interestingly, the Drosophila TAF10 homologues, TAF10 and TAF10b, are

differentially expressed during Drosophila embryogenesis [155]. Expression of

mouse TAF10 was later shown to be required for early mouse embryogenesis

of the inner cell mass, but not the trophoblast [156]. Remarkably, conditional

knockout of mouse TAF10 in embryonic and adult liver resulted in the

dissociation of TFIID into individual components [157]. Based on these

findings, it was suggested that TFIID is not required for the maintenance of

ongoing transcription of hepatic genes. Rather, it is involved in the mechanism of

postnatal silencing of hepatic genes [157]. Additional studies reveal an

important role for distinct TFIID complexes in regulating pluripotency of

embryonic stem cells [158, 159].

Multiple TAF paralogues have been implicated in different biological

processes. A retroposed homologue of human TAF1 (TAF1L) and TAF7L are

expressed during male germ cell differentiation [160, 161]. Similarly to

humans, TAF7L in mice is required for spermatogenesis in cooperation with

TRF2 [161-163]. TAF7L was recently demonstrated to be an important

regulator of white- as well as brown- adipose tissue differentiation [164, 165].

TAF4b was originally identified as a cell-type-specific TAF in a human B

lymphocyte cell line [166]. Using knockout mice, TAF4b was shown to be

important for ovarian development and spermatogenesis [167-170].

Remarkably, mouse TAF9L was recently shown to regulate neuronal gene

expression in vivo [171]. Interestingly, tissue-specific TAF homologues of

Drosophila TAF4 (no hitter), TAF5 (cannonball), TAF6 (meiosis 1 arrest),

TAF8 (spermatocyte arrest) and TAF12 (ryan express) collaborate to control a

testis-specific transcriptional program [172].

TBP paralogues are involved in distinct biological processes, such as

embryonic development, differentiation and morphogenesis (reviewed in [117,

119, 141, 173]). TRF2 regulates a subset of genes that differ from TBP-

regulated genes. TRF2 is essential for embryonic development of C. elegans,

Drosophila, zebrafish and Xenopus [117, 119, 141, 173]. It is highly

conserved in evolution and is present in all bilaterian organisms [143]. Since

bilaterian organisms contain three germ layers (endoderm, mesoderm and

ectoderm) and more ancient animals only contain two germ layers (endoderm

and ectoderm), it is tempting to speculate that TRF2 may be important for

mesoderm formation. This suggestion is further supported by the fact that the

DPE motif is prevalent among Drosophila genes that are involved in

embryonic development [14, 95]. Mouse TRF2, unlike C. elegans, Drosophila,

zebrafish and Xenopus TRF2, is not required for embryonic development but

is essential for spermiogenesis [174, 175]. A separate study demonstrated

that the cleavage of the TFIIA-αβ precursor (into the α and β subunits of TFIIA) is

necessary for activation of spermiogenic TRF2 target genes [176]. Drosophila

trf2 is also required for the response to the steroid hormone ecdysone during

Drosophila metamorphosis [177]. Hence, TRF2 drives multiple transcriptional

programs [119].

Zebrafish TRF3 is important for initiation of hematopoiesis during

embryonic development [178, 179]; however, both zebrafish and Xenopus

TRF3 are mainly expressed in oocytes and are essential for embryogenesis

[180, 181]. Mouse TRF3, which is exclusively expressed in oocytes, is

essential for the differentiation of female germ cells but not for embryonic

development [182].

These fascinating findings emphasize the motivation to investigate the

regulation of gene expression at the core promoter level. It is possible that there are

core promoter motifs that have not yet been discovered, and they might be bound by

other PIC components. Thus, the analysis of novel core promoter elements in

multiple organisms is likely to shed light on mechanistic aspects of transcriptional

regulation.

4. Enhancer-promoter connectivity

Zooming out from the basal transcription resolution uncovers another facet of

regulation of gene expression, namely, enhancer-promoter interactions that

regulate the activation of specific genes in a precise spatio-temporal manner.

Enhancers contain DNA binding sites for sequence-specific transcription

factors that in turn, recruit co-activators and co-repressors and determine the

overall activity of the enhancers (reviewed in [183-190]). Originally, scientists

searched for enhancers as cis-regulatory elements that stimulate transcription

levels from the nearest promoter, irrespective of orientation. Enhancer-

promoter pairs are commonly engaged by enhancer looping, which

physically brings these regulatory elements into proximity, through recruitment

of multiple proteins (activators, co-activators, Mediator, cohesin and the PIC).

Studies in recent years, employing advanced global methodologies such as

chromatin conformation capture (3C), its derivatives (4C, 5C, Hi-C) and ChIA-

PET, have led to the discovery of both intrachromosomal and

interchromosomal physical contacts with promoters. While multiple

enhancers can interact with multiple promoters, specificity has been

observed. The mechanisms that determine enhancer–promoter specificity are

still poorly understood, but they are thought to include biochemical

compatibility, constraints imposed by the three-dimensional architecture of

chromosomes, insulator elements, and effects of local chromatin environment

[190].

In the last twenty years, the compatibility of enhancer-promoter

interactions has mostly been studied in Drosophila. One of the early studies

analyzing the compatibility between enhancer-promoter pairs examined the

expression of the neighboring gooseberry (gsb) and gooseberry neuro (gsbn)

genes [191]. Swapping experiments revealed that although both enhancers

(GsbE and GsbnE) are located between the two TSSs of the two genes (and

thus cross-activation could potentially occur), the GsbE could only activate the

gsb promoter, while the GsbnE could only activate the gsbn promoter.

Another study showed compatibility between the decapentaplegic (dpp)

promoter and its enhancer, which only activates the dpp gene, but not other

genes that are located closer to it [192]. Erythroid-specific long-range

interactions have been observed in vivo between the active murine β-globin

gene and the locus control region (LCR) [193]. These long-range interactions

of the β-globin gene were not observed in non-expressing brain cells. High-

throughput imaging of thousands of transparent transgenic zebrafish embryos

(which were injected with about two hundred combinations of enhancer-core

promoter pairs driving the expression of the GFP reporter gene),

demonstrated the specificity of individual enhancer-promoter interactions and

underscored the importance of the core promoter sequence in these

interactions [194]. Taken together, these results demonstrate distinct

compatibilities of enhancers to their cognate promoters and the importance of

the core promoters in the regulation of enhancer-promoter interactions.

While a few studies in Drosophila demonstrated the involvement of

proximal-promoter elements in enhancer specificity [195, 196], there are

multiple examples of enhancer-promoter communications that are affected by

specific core promoter elements. Promoter competition experiments revealed

that both the AE1 enhancer from the Drosophila Antennapedia gene complex

and the IAB5 enhancer from the Bithorax gene complex preferentially activate

TATA-containing promoters when challenged with linked TATA-less

promoters [197]. Nevertheless, both enhancers were able to activate

transcription from a TATA-less promoter in reporters that lacked a linked

TATA-containing promoter [197]. Enhancer-promoter specificity was first

demonstrated in transgenic Drosophila sister lines that contain a DPE- or a

TATA-dependent reporter gene at precisely the same genomic position

relative to the enhancer [198]. Remarkably, this study identified enhancers

that can discriminate between core promoters that are dependent on a TATA

or a DPE motif. Furthermore, Caudal, a sequence-specific transcription factor and a

key regulator of the Drosophila HOX gene network, activates transcription

with a preference for a DPE motif relative to the TATA-box [95]. More

recently, Zehavi et al. [14] analyzed the Drosophila dorsal-ventral

developmental gene network that is regulated by the sequence-specific

transcription factor Dorsal, and discovered that the majority of Dorsal target

genes contain DPE sequence motifs. The DPE motif is functional in multiple

Dorsal target genes, as mutation of the DPE leads to a loss of transcriptional

activity. Moreover, the analysis of hybrid enhancer-promoter constructs of

Dorsal targets reveals that the core promoter plays a pivotal role in the

transcriptional output [99].

High-throughput analyses of enhancers in diverse biological systems

have led to a wealth of information with regards to long-range enhancer-

promoter interactions and three-dimensional chromatin landscapes. We

highlight several remarkable findings below. First, most of the enhancer-

promoter interaction loops of regulated genes are distal, and are not localized

at the nearest promoter as originally considered [199-201]. Second, enhancer

looping enables cooperative regulation of genes of the same biological

process by organizing them in physical proximity [199, 201]. This may indicate

a similar core promoter composition among these gene networks or gene

clusters (as previously described for the Hox and dorsal-ventral

developmental gene regulatory networks [14, 95]).

A recently developed genome-wide screen termed STARR-seq (self-

transcribing active regulatory region sequencing) identified thousands of

enhancers that could activate transcription of a synthetic promoter containing

four core promoter elements in a single promoter - the TATA-box, Inr, MTE

and DPE motifs [202]. Notably, enhancers near ribosomal protein genes were

under-represented among the enhancers identified in this study, which could

be due to the fact that the majority of ribosomal protein gene promoters are

regulated via the TCT core promoter element [13, 190, 202].

Remarkably, both the Furlong lab analyzing enhancer three-

dimensional contacts during Drosophila embryogenesis, and the Ren lab

analyzing long-range chromatin interactions in human cells, discovered that

the majority of enhancer interactions remain unchanged during marked

developmental transitions or activation following gene induction, respectively

[199, 203]. These “on-hold” enhancer-promoter connections may be preparing

the cell for rapid activation of transcription. The Furlong lab discovered that

the pre-existing loops are associated with paused Pol II and proposed a

model where through transcription factor–enhancer occupancy, an enhancer

loops towards the promoter and polymerase is recruited, but paused in the

majority of cases (Pol II pausing is discussed below). They suggest that the

subsequent recruitment of transcription factor(s) or additional enhancers at

preformed enhancer-promoter interaction hubs could trigger activation by

releasing Pol II pausing [203]. Notably, enhancer–promoter interactions

analyzed in these studies involve active promoters, with high enrichment for

H3K27ac and H3K4me3, and active enhancers, defined by H3K27ac, Pol II

and H3K79me3, indicating similarities in 3D regulatory principles from flies to

humans [199, 200, 203].

Strikingly, the Stark lab has recently demonstrated that distinct sets of

enhancers activate transcription with core promoter specificity using two types

of Drosophila cultured cells [204]. They used the core promoter of a ribosomal

protein gene driven by the TCT motif, as a representative of housekeeping

promoters, and a synthetic promoter (derived from the even skipped

promoter), which contains four core promoter elements in a single promoter -

the TATA-box, Inr, MTE and DPE motifs, as a representative of

developmental promoters. Thousands of enhancers exhibit a marked

specificity to one of the two core promoters - the housekeeping promoter or

the developmental promoter. Interestingly, TSSs next to housekeeping

enhancers were enriched in Ohler motifs 1, 5, 6 and 7 (consistent with the

ubiquitous expression and housekeeping functions of these genes), whereas

TSSs next to developmental enhancers were enriched in TATA box, Inr, MTE

and DPE motifs (which are associated with cell-type-specific gene

expression).

Taken together, these observations strengthen the concept that the

core promoter composition is not only a pivotal component in basal

transcription and initiation, but also an active regulator of transcription that is

instrumental for activating developmental and housekeeping gene regulatory

programs via sequence-encoded enhancer-promoter specificity.

5. Transcription initiation, Pol II recycling and steps in between: the

crosstalk between the core promoter and other modules in the

transcription cycle

Apart from transcription initiation, the Pol II-driven transcription cycle contains

additional steps: elongation and termination. These steps contain at least

eight transition points at which transcription is regulated by multiple dedicated

factors, and each can be rate limiting (reviewed in [205, 206]). Moreover,

maturation of mRNA precursors occurs co-transcriptionally [207]. Below, we

briefly describe these highly regulated steps with a focus on the direct or

indirect role of the core promoter.

5.1. Timing and synchrony - Pol II pausing and productive elongation

Early elongation, following proper transcription initiation and preceding

productive elongation, contains two sequential steps: promoter-escape and

promoter-proximal pausing of Pol II. Pol II pausing is a highly regulated step,

which is characterized by accumulation of Pol II, typically at 20-60 nucleotides

downstream of the TSS (reviewed in [206, 208, 209]). The transition from

initiation to early elongation is regulated by multiple factors and

phosphorylation events of the heptad repeats within the C-terminal domain

(CTD) of the largest subunit of Pol II. The CTD is mostly unphosphorylated

when Pol II is recruited to the promoter. Serine 5 (Ser5) of the CTD is then

phosphorylated by TFIIH, which causes destabilization of the interaction

between Pol II and other PIC components and thus, permits promoter escape

and early elongation. Following Ser5 phosphorylation, association of DRB

sensitivity-inducing factor (DSIF) and Negative elongation factor (NELF)

complexes with the phosphorylated Pol II leads to pausing at the promoter-

proximal region [210]. Next, positive transcription elongation factor b (P-TEFb)

complex phosphorylates the Ser2 residue of the Ser5-phosphorylated CTD,

and the DSIF and NELF factors. These post-translational modifications result

in productive elongation (reviewed in [206, 208, 209]).

Pol II pausing was originally identified in Drosophila heat-shock and

human c-myc genes [211-214]. Although Pol II pausing was originally

considered to be restricted to a few specific genes, nowadays, the pausing of

Pol II appears to be a common step in the transcription process of multiple genes

from C. elegans [215] to humans, and generally prevalent in metazoans [21,

216-220]. Specifically, multiple genome-wide assays and studies in vitro and

in vivo, mostly in Drosophila, showed that Pol II pausing has a role in

facilitating metazoan developmental control genes and genes that respond to

environmental stimuli ([221] and refs therein, [215]). Thus, Pol II pausing

contributes to developmental dynamics, along with designated transcription

initiation programs [222, 223]. It was previously argued that Pol II pausing

prepares genes for a rapid and synchronous induction. Recent studies,

however, suggest that paused Pol II is not absolutely required for rapid gene

induction, as genes in which Pol II is not paused, can be induced just as

quickly, and to even higher levels than paused genes ([209, 221] and refs

therein). Promoters regulated by pausing possess a distinct chromatin

architecture that may facilitate the plasticity of gene expression in response to

signaling events [209]. Notably, paused Pol II complexes were recently shown

to be more stable than originally considered, and thus, pausing may serve as

a time-window to integrate regulatory signals [224]. There are two known

sequence-specific transcription factors that regulate pausing: the GAGA factor

(GAF) [211, 212, 218, 225] and the more recently identified M1BP factor

[108].

Pausing allows synchronous gene expression of developmentally

regulated genes following their induction during embryogenesis [221, 226-

229]. Differences in synchronicity are most likely due to the core promoter

composition, as demonstrated by promoter-swapping experiments [227] and

the relationship between Pol II pausing and core promoter sequence during

Drosophila development [226, 230].

The positive elongation factor P-TEFb controls NFκB target genes

driven by TATA-containing promoters, whereas the negative elongation factor

DSIF controls weak TATA and TATA-less genes [231]. Interestingly,

Drosophila TATA-dependent promoters are associated with a low degree of

pausing [226, 230], suggesting that the TATA box prevents Pol II pausing and

promotes P-TEFb activity, leading to a more productive elongation [231].

Remarkably, the Levine lab has shown that at least one fourth of

paused Drosophila promoters contain a shared sequence motif, the “pause

button” (PB), whose consensus (KCGRWCG) [232] is similar to that of the

DPE (DSWYVY) [9]. The PB motif is typically located between +25 and +35

(somewhat overlapping the DPE, although it has a wider distribution with

regard to its location relative to the TSS). Over one-fifth of the paused

Drosophila promoters are enriched for the DPE, MTE and PB core promoter

motifs, all of which are located close to the pause site [232]. Notably, 75% of

the genes in the dorsal-ventral network were identified as paused genes

[232]. Over two thirds of Dorsal target genes contain a DPE motif [14]. These

correlations, in addition to the fact that PB and DPE are GC-rich and share

the 'GGWC' sub-consensus, and that both motifs overlap with the paused Pol

II (see above), may indicate that the DPE, as opposed to the TATA box, could

contribute to Pol II pausing. The Adelman lab later found that both the

DPE and PB precisely align with the peak of Pol II pausing [219].
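
The consensus sequences quoted above are written in the IUPAC degenerate nucleotide code (e.g. K = G or T, W = A or T). As a minimal illustrative sketch, not taken from any of the cited studies, the following Python snippet shows how such a consensus could be matched against a promoter sequence and how hits could be reported relative to the TSS. The function names, the toy sequence and the choice of a regular-expression search are assumptions made for illustration only; published analyses typically rely on position weight matrices rather than exact consensus matching.

```python
import re

# Standard IUPAC degenerate nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus (e.g. 'KCGRWCG') into a regex string."""
    return "".join("[" + IUPAC[base] + "]" for base in consensus.upper())


def find_motif(sequence, consensus, tss_index):
    """Return the start position of every consensus match, relative to the TSS.

    tss_index is the 0-based index of the +1 nucleotide in `sequence`
    (a hypothetical convention chosen for this sketch). Positions follow
    the usual numbering in which +1 is the TSS and there is no position 0.
    A zero-width lookahead is used so that overlapping matches are reported.
    """
    pattern = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    hits = []
    for match in pattern.finditer(sequence.upper()):
        offset = match.start() - tss_index
        hits.append(offset + 1 if offset >= 0 else offset)
    return hits


# Toy sequence (invented): 27 filler bases followed by a DPE-like hexamer.
toy_promoter = "A" * 27 + "AGACGT" + "AAAA"
print(find_motif(toy_promoter, "DSWYVY", tss_index=0))   # [28]
print(find_motif("TTGCGAACGTT", "KCGRWCG", tss_index=0))  # [3]
```

Exact consensus matching is deliberately strict; it will miss functional sites that deviate from the consensus at a single position, which is one reason why a match (or its absence) should be treated only as a starting point for experimental validation.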

In addition, a recent study indicates that whereas proximity of Pol II

pausing to the TSSs is correlated with focused initiation, pausing at dispersed

promoters is located more distally, and with a wider pattern [221, 233].

Moreover, it seems that in contrast to dispersed promoters, Pol II pausing at

focused promoters is not dependent on nucleosome regulation. When the

core promoter elements are not located at optimal position, or do not match

the consensus sequence, pausing appears to be weaker and located more

downstream (+60 to +80) than its typical location. Thus, initiation modes and

core promoter architecture affect the strength and location of pausing [233].

It is well known that enhancers have a major effect on the activity and

synchrony of gene expression in development. Remarkably, Lagha et al. [227]

used a promoter swapping strategy and advanced imaging methods and

discovered that promoters of key developmental genes play a pivotal role in

pausing, which in turn determines the “time to synchrony”- the time it takes to

achieve coordinated gene expression in over 50% of the nuclei in the

developing Drosophila embryo. The authors demonstrate that substitutions of

paused promoters (e.g. tup), which show rapid and synchronous activity, with

non-paused promoters (such as pnr), result in slow and stochastic activation

of gene expression. Moreover, elements associated with pausing (e.g. GAGA)

influence the timing and synchrony of the gene expression. The synchronous

activation is essential for proper mesoderm invagination in the developing

Drosophila embryo. They provide evidence for a positive correlation between

pausing, synchrony and gene expression levels, which are necessary for

morphogenesis. Hence, it is the promoter, and not the enhancer, that

determines the levels of paused Pol II and the synchrony of gene activation

[227, 228].
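
To make the “time to synchrony” measure concrete, the short Python sketch below computes it from a time course of the fraction of nuclei showing expression, using the 50% criterion described above. This is an illustrative assumption of how the quantity could be calculated; it is not the analysis pipeline of Lagha et al. [227], and the function name, time units and data values are invented for the example.

```python
def time_to_synchrony(times, active_fraction, threshold=0.5):
    """Return the first sampled time at which the fraction of expressing
    nuclei exceeds `threshold`, or None if it never does.

    No interpolation between time points is attempted, so the result is
    limited by the sampling interval of the measurements.
    """
    for t, fraction in zip(times, active_fraction):
        if fraction > threshold:
            return t
    return None


# Invented example values (arbitrary time units), for illustration only.
times = [0, 2, 4, 6, 8]
paused_like = [0.0, 0.2, 0.6, 0.9, 1.0]        # rapid, synchronous activation
non_paused_like = [0.0, 0.05, 0.1, 0.3, 0.45]  # slow, stochastic activation

print(time_to_synchrony(times, paused_like))      # 4
print(time_to_synchrony(times, non_paused_like))  # None
```

In this toy comparison the rapidly activating profile crosses the 50% threshold at the third time point, whereas the slowly activating profile never reaches it within the sampled window.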

To summarize, these studies provide evidence regarding different

aspects of regulation of Pol II pausing via the core promoter. However,

additional biochemical studies are needed to elucidate the mechanisms

underlying pausing.

5.2. Termination, polyadenylation and recycling of Pol II - back to

square one

The promoter and terminator modules define the boundaries of the

transcribed region of protein-coding genes. Transcription termination includes

dephosphorylation of the Pol II CTD, its dissociation from the 3'-end and

cleavage of the pre-mRNA. Furthermore, this highly regulated event is

coupled with the 3'-end polyadenylation processing [234]. Numerous factors in

multi-subunit protein complexes and several RNA elements mediate the

termination/polyadenylation processes, including two central complexes:

cleavage and polyadenylation specificity factor (CPSF) and cleavage

stimulation factor (CstF) [235, 236]. Although several factors are shared, the

termination mechanism for metazoan replication-dependent core histone

genes, which are not polyadenylated, differs from the termination

mechanism of polyadenylated genes (reviewed in [235, 237, 238]).

There are mutual links between transcription initiation and termination/

polyadenylation. It should be noted that although many studies were done

using yeast, we focus here on metazoan transcriptional termination. The

CPSF complex was first immunoprecipitated and co-purified with holo-TFIID

from nuclear extracts of human cell-lines almost twenty years ago [239]. The

authors showed that CPSF is recruited to the core promoter by TFIID and

later dissociates from TFIID and continues to be associated with the

elongating Pol II and later with the polyA site. Specifically, the CPSF-160

subunit mainly interacts with TAF5, TAF7 and TAF12, but not with TAF1,

TAF10 and TAF15 and minimally, if at all, with TBP. Overexpression of TBP

reduced polyadenylation of transcripts initiated from a TATA-containing

promoter, while both polyadenylated transcripts and non-polyadenylated

transcripts that initiated from a TATA-less promoter were unaffected [58, 239].

Furthermore, the recruitment of CstF by TFIIB to the core promoter through

PIC assembly was also demonstrated ([240] and refs therein). Thus, subunits

of the main termination factors CPSF and CstF are brought to the PIC and

transferred to Pol II, which eventually leads to transcription termination.

Moreover, components of the core histone termination machinery were also

found associated with histone promoters ([235] and refs therein).

Conversely, it was previously observed that the termination/polyadenylation

machinery influences PIC assembly and the efficiency of transcription re-

initiation through Pol II recycling ([241] and refs therein). These transcription

initiation-termination/polyadenylation connections are mediated by two

different chromatin and genomic mechanisms: gene looping from 3'-end

processing sites to core promoters, which brings both modules into spatial

and physical proximity, and compartmentalization of genes into “gene

factories” [3, 235, 242]. It is noteworthy that these connections and couplings

are conserved throughout eukaryotes. In this regard, it is possible that the PIC

assemblies and 3'-associated machineries of the core histone genes are

particularly specialized, as compared to other protein-encoding genes [133,

235].

In a recent paper, Oktaba et al. [243] demonstrated that the promoters

are involved in the regulation of alternative cleavage and polyadenylation. The

nuclear RNA-binding protein embryonic lethal abnormal visual system (ELAV)

is known to inhibit the canonical polyadenylation processing at the 3' UTRs of

genes, which causes Pol II read-through and 3' UTR extension, during the

development of the nervous system in Drosophila and vertebrates. The

authors provide evidence that ELAV-mediated 3' UTR extension is dependent

on the promoter and Pol II pausing in the developing Drosophila nervous

system [243]. Using double-labeling assays and promoter-swapping

experiments, they show that only reporter constructs that were driven by

promoters of known extended genes in vivo, produced extended transcripts in

transgenic Drosophila embryos. Ectopic expression of ELAV in non-neural

tissues resulted in the induction of 3' UTR extension. Moreover, sequence

analysis of 252 neural-specific transcripts with 3' UTR extensions revealed the

enrichment of the GAGA motif and Pol II pausing. Indeed, reduced 3' UTR

extension levels were observed in GAGA-binding protein Trithorax-like (Trl)-

mutant Drosophila embryos. ChIP-seq analysis revealed the enrichment of

ELAV in promoter regions of extended genes, as well as in 3' UTRs and

introns. Thus, ELAV is selectively recruited to the 3' UTRs of extended genes

through paused Pol II promoters, perhaps via looping between the promoters

and the termination regions. Taken together, the above studies strengthen the

link between transcription initiation and termination and the pivotal role of the

promoter in this linkage.

6. Is the dogma really composed of sequential steps? – the

transcription-translation linkage

Traditionally, eukaryotic translation has been defined as a separate process

that is independent from transcription. However, the translation machinery

depends on mRNA-maturation processing, such as the m7G cap structure at

the 5' UTR and its associated protein complexes [244]. These complexes

recruit the small ribosomal subunit that in turn reaches the first codon, AUG,

via a 5' UTR scanning mechanism (reviewed in [245]). A common element for

translation initiation is the Kozak element (RCCAUGG), which contains the

AUG [246, 247]. In addition to this well-defined translational initiator, a

distinct element, Translation Initiator of Short 5' UTR (TISU), was

recently identified. Remarkably, this element is important for transcription and

initiation of translation of a specific set of genes [248]. The TISU is found in

4.5% of the mammalian protein-coding genes, with a consensus sequence of

'SAASATGGCGGC' and a rigid core sequence of 'ATG', located at +5 to +30,

and particularly positioned around the +10 relative to the TSS [59, 248, 249].

This core promoter element is enriched in TATA-less promoters of genes

mostly involved in cellular functions such as protein metabolism and RNA

processing. As a transcriptional element, it was shown to be necessary for

transcription and its function was mediated, at least in part, by YY1 [246, 248].

As a translational element, it was defined as an optimized translation initiator

for protein-coding genes possessing a very short 5' UTR (median of 12 nt) that

mediates translation in a cap-dependent but ribosomal scanning-independent

manner, as opposed to the Kozak sequence [246, 249]. The 5'-TOP, a

mammalian pyrimidine-tract regulatory element, was previously characterized

as a transcriptional and translational element [76, 77, 250, 251]. It was

identified as a core promoter motif used as a transcriptional "initiator" in many

protein-biogenesis genes, and its translational activity is critical under stress

conditions. The translational control element (TCE) [252], another

transcription/translation element, was previously shown to regulate translation

in Drosophila testes [253]. Katzenberger et al. [254] recently showed that the

overlapping transcriptional motifs, testis element 1 (TE1) and testis element 2

(TE2), which are overrepresented in testis-specific core promoters, are

together identical (TE1/2 motif) to the original TCE. Thus, this element is a

transcriptional element, too. The TCE is identified as a transcriptional element

in 45% of Drosophila testis-specific genes that are driven by focused

promoters. Its consensus sequence is “CTCAAAATTT”, with enrichment in the

-5 to +25 region, but without precise location relative to the TSS [254].

Hence, these three core promoter motifs play pivotal roles in both

transcription and translation of distinct sets of genes. Moreover, correlations

between the TATA box and different features of genes (e.g. gene length) have

been observed [255]. This co-regulation of these processes raises questions

regarding the interplay between transcription and translation, such as: Do

downstream core promoter elements affect the translation of these genes?

Based on the fact that the 5' UTRs of some organisms are short, are these

elements evolutionarily conserved? Indeed, a recent study reveals general

associations and co-occurrence between translational and transcriptional

regulatory trends and features, including core promoter composition [256].

Taken together, the core promoter region is, at least in part, a central

intersection for coordinating transcription and translation.

7. Discussion and future perspectives

In this review, we discussed diverse aspects of regulation of gene expression,

particularly in metazoans, with an emphasis on the core promoter. We

highlighted the complexity of the core promoter architecture. Furthermore, we

presented its intricate connections and its pivotal influences on different steps

of transcription: initiation, elongation, termination, polyadenylation and finally,

translation (Fig. 3). Moreover, we would like to raise a few issues that are

directly related to the core promoter but were not mentioned above.

First, in addition to the diversity of core promoter elements and the

relationships between them, nucleotide polymorphism in the core promoter

affects its activity including its binding by the PIC components. Multiple lines

of evidence point towards polymorphisms in many human promoters,

particularly in the TATA box sequence. These TATA box substitutions can

affect TBP binding and core promoter activity, and are associated with human

diseases ([257], reviewed in [258]). It is expected that like TATA box

polymorphism, polymorphisms in other elements exist, and may be clinically

relevant.

Second, the enhancer-promoter interactome seems to be a much more

complex landscape than previously considered. In agreement with that,

promoter-promoter interactions have recently been found [259]. These

interactions behave as enhancer-promoter interactions, where one promoter

is able to act as an enhancer of another. Hence, hypothetical, more

complicated hierarchies of direct and indirect interactions between enhancers

and promoters could be achieved (e.g. generating an enhancer-promoter-

promoter hub).

Moreover, an additional regulatory aspect that is associated with

enhancers is the discovery of enhancer-derived RNAs (eRNAs). This class of

ncRNAs was only discovered a few years ago in humans [260]. eRNAs are

short-lived, 5'-capped transcripts produced from enhancer regions. Their

expression is correlated with histone marks of active enhancers (H3K4me1

and H3K27ac), and they are enriched for transcription factors, co-activators

(such as p300/CBP), basal transcription factors and Ser5-phosphorylated Pol

II. eRNAs are preferentially found in enhancers that contact their target

promoters through enhancer looping, and it is suggested that these transcripts

play a role in generating or maintaining enhancer-promoter-loops and in

facilitating the recruitment of sequence-specific transcription factors,

chromatin remodeling or chromatin modifying complexes to the targeted

promoters [52]. Additionally, eRNAs are associated with several signaling-

pathways ([52, 53] and refs therein). Although eRNAs are extensively

investigated, also by high-scale methodologies [261], little is known about

their core promoter compositions and their TSS architectures [54]. Hence, one

of the future goals should be an in-depth investigation of the core promoter

architectures of eRNAs and their transcriptional machineries.

Actually, in agreement with the current knowledge that many active

mammalian promoters are bidirectional [21, 56], a study published several

months ago revealed shared architectures of bidirectional initiations at

promoters and active enhancers [54]. On one hand, similar trends and profiles

of transcription factor binding, nucleosome positioning, histone marks and

similar frequencies of sequence motifs such as the TATA box, BREs and Inr

(YR only) were present in both promoters and transcribed enhancers. On the

other, these modules differ in the stability of the transcripts that they

synthesize in each direction: promoters give rise to stable transcripts in the

sense direction, whereas promoter upstream antisense RNA and enhancer

RNAs are rapidly degraded [54]. This unifying architecture of TSSs [262],

along with recent findings (e.g. promoter-promoter interactions), challenges the

traditional classification of promoters and enhancers (see also [263]). It is

noteworthy that Core et al. [54] indicated that although there are distinct

pause modes, which include proximal focused pausing and distal dispersed

pausing (see also [233]), the length between the bidirectional TSS pairs and

the peaks of TFIIB are not affected. This high-resolution analysis of nascent

RNAs might also imply that the high frequency of dispersed mammalian core

promoters observed previously, represents multiple independent initiation

sites acting as enhancers for neighboring promoters [54]. Thus, the

phenomenon of dispersed mammalian promoters might be less abundant than

originally perceived. Taken together, the growing body of evidence indicates

that the core promoter lies at the heart of gene expression.

Acknowledgments

We thank Ron Even for graphic design assistance. We thank Jim Kadonaga,

Uwe Ohler, Sascha Duttke, Anna Sloutskin, Hila Shir-Shapira and Racheli

Harshish for critical reading of the manuscript. Core promoter-related

research in the Juven-Gershon lab is supported by grants from the Israel

Science Foundation (no. 798/10), the European Union Seventh Framework

Programme (Marie Curie International Reintegration Grant; no. 256491), the

United States-Israel Binational Science Foundation (no. 2009428; joint with

James T. Kadonaga) and the German-Israeli Foundation for Scientific

Research and Development (no. I-1220-363.13/2012; joint with Eileen E.M.

Furlong).

References

[1] E. Splinter, W. de Laat, The complex transcription regulatory landscape of our

genome: control in three dimensions, EMBO J, 30 (2011) 4345-4355.

[2] X. Dong, M.C. Greven, A. Kundaje, S. Djebali, J.B. Brown, C. Cheng, T.R.

Gingeras, M. Gerstein, R. Guigo, E. Birney, Z. Weng, Modeling gene expression

using chromatin features in various cellular contexts, Genome Biol, 13 (2012) R53.

[3] J. Shandilya, S.G. Roberts, The transcription cycle in eukaryotes: from productive

initiation to RNA polymerase II recycling, Biochim Biophys Acta, 1819 (2012) 391-

400.

[4] M.C. Thomas, C.M. Chiang, The general transcription machinery and general

cofactors, Crit Rev Biochem Mol Biol, 41 (2006) 105-178.

[5] J.E. Butler, J.T. Kadonaga, The RNA polymerase II core promoter: a key

component in the regulation of gene expression, Genes Dev, 16 (2002) 2583-2592.

[6] J.T. Kadonaga, Perspectives on the RNA polymerase II core promoter, Wiley

Interdiscip Rev Dev Biol, 1 (2012) 40-51.

[7] B. Li, M. Carey, J.L. Workman, The role of chromatin during transcription, Cell,

128 (2007) 707-719.

[8] E. Valen, A. Sandelin, Genomic and chromatin signals underlying transcription

start-site selection, Trends Genet, 27 (2011) 475-485.

[9] T. Juven-Gershon, J.T. Kadonaga, Regulation of gene expression via the core

promoter and the basal transcriptional machinery, Dev Biol, 339 (2010) 225-229.

[10] B. Lenhard, A. Sandelin, P. Carninci, Metazoan promoters: emerging

characteristics and insights into transcriptional regulation, Nat Rev Genet, 13 (2012)

233-245.

[11] N.D. Heintzman, B. Ren, The gateway to transcription: identifying, characterizing

and understanding promoters in the eukaryotic genome, Cell Mol Life Sci, 64 (2007)

386-400.

[12] T. Juven-Gershon, J.Y. Hsu, J.W. Theisen, J.T. Kadonaga, The RNA

polymerase II core promoter - the gateway to transcription, Current opinion in cell

biology, 20 (2008) 253-259.

[13] T.J. Parry, J.W. Theisen, J.Y. Hsu, Y.L. Wang, D.L. Corcoran, M. Eustice, U.

Ohler, J.T. Kadonaga, The TCT motif, a key component of an RNA polymerase II

transcription system for the translational machinery, Genes Dev, 24 (2010) 2013-

2018.

[14] Y. Zehavi, O. Kuznetsov, A. Ovadia-Shochat, T. Juven-Gershon, Core promoter

functions in the regulation of gene expression of Drosophila dorsal target genes, The

Journal of biological chemistry, 289 (2014) 11993-12004.

[15] A. Sandelin, P. Carninci, B. Lenhard, J. Ponjavic, Y. Hayashizaki, D.A. Hume,

Mammalian RNA polymerase II core promoters: insights from genome-wide studies,

Nat Rev Genet, 8 (2007) 424-436.

[16] T. Ni, D.L. Corcoran, E.A. Rach, S. Song, E.P. Spana, Y. Gao, U. Ohler, J. Zhu,

A paired-end sequencing strategy to map the complex landscape of transcription

initiation, Nat Methods, 7 (2010) 521-527.

[17] M.A. Frohman, M.K. Dush, G.R. Martin, Rapid production of full-length cDNAs

from rare transcripts: amplification using a single gene-specific oligonucleotide

primer, Proc Natl Acad Sci U S A, 85 (1988) 8998-9002.

[18] T. Shiraki, S. Kondo, S. Katayama, K. Waki, T. Kasukawa, H. Kawaji, R.

Kodzius, A. Watahiki, M. Nakamura, T. Arakawa, S. Fukuda, D. Sasaki, A.

Podhajska, M. Harbers, J. Kawai, P. Carninci, Y. Hayashizaki, Cap analysis gene

expression for high-throughput analysis of transcriptional starting point and

identification of promoter usage, Proc Natl Acad Sci U S A, 100 (2003) 15776-15781.

[19] P.G. Giresi, J. Kim, R.M. McDaniell, V.R. Iyer, J.D. Lieb, FAIRE (Formaldehyde-

Assisted Isolation of Regulatory Elements) isolates active regulatory elements from

human chromatin, Genome Res, 17 (2007) 877-885.

[20] T.S. Furey, ChIP-seq and beyond: new and improved methodologies to detect

and characterize protein-DNA interactions, Nat Rev Genet, 13 (2012) 840-852.

[21] L.J. Core, J.J. Waterfall, J.T. Lis, Nascent RNA sequencing reveals widespread

pausing and divergent initiation at human promoters, Science, 322 (2008) 1845-

1848.

[22] Z. Wang, M. Gerstein, M. Snyder, RNA-Seq: a revolutionary tool for

transcriptomics, Nat Rev Genet, 10 (2009) 57-63.

[23] N.L. Washington, E.O. Stinson, M.D. Perry, P. Ruzanov, S. Contrino, R. Smith,

Z. Zha, R. Lyne, A. Carr, P. Lloyd, E. Kephart, S.J. McKay, G. Micklem, L.D. Stein,

S.E. Lewis, The modENCODE Data Coordination Center: lessons in harvesting

comprehensive experimental details, Database (Oxford), 2011 (2011) bar023.

[24] The ENCODE (ENCyclopedia Of DNA Elements) Project, Science, 306 (2004)

636-640.

[25] A.R. Forrest, H. Kawaji, M. Rehli, J.K. Baillie, M.J. de Hoon, T. Lassmann, M.

Itoh, K.M. Summers, H. Suzuki, C.O. Daub, J. Kawai, P. Heutink, W. Hide, T.C.

Freeman, B. Lenhard, V.B. Bajic, M.S. Taylor, V.J. Makeev, A. Sandelin, D.A. Hume,

P. Carninci, Y. Hayashizaki, A promoter-level mammalian expression atlas, Nature,

507 (2014) 462-470.

[26] P. Carninci, A. Sandelin, B. Lenhard, S. Katayama, K. Shimokawa, J. Ponjavic,

C.A. Semple, M.S. Taylor, P.G. Engstrom, M.C. Frith, A.R. Forrest, W.B. Alkema,

S.L. Tan, C. Plessy, R. Kodzius, T. Ravasi, T. Kasukawa, S. Fukuda, M. Kanamori-

Katayama, Y. Kitazume, H. Kawaji, C. Kai, M. Nakamura, H. Konno, K. Nakano, S.

Mottagui-Tabar, P. Arner, A. Chesi, S. Gustincich, F. Persichetti, H. Suzuki, S.M.

Grimmond, C.A. Wells, V. Orlando, C. Wahlestedt, E.T. Liu, M. Harbers, J. Kawai,

V.B. Bajic, D.A. Hume, Y. Hayashizaki, Genome-wide analysis of mammalian

promoter architecture and evolution, Nat Genet, 38 (2006) 626-635.

[27] E.A. Rach, H.Y. Yuan, W.H. Majoros, P. Tomancak, U. Ohler, Motif composition,

conservation and condition-specificity of single and alternative transcription start sites

in the Drosophila genome, Genome Biol, 10 (2009) R73.

[28] V.B. Bajic, S.L. Tan, A. Christoffels, C. Schonbach, L. Lipovich, L. Yang, O.

Hofmann, A. Kruger, W. Hide, C. Kai, J. Kawai, D.A. Hume, P. Carninci, Y.

Hayashizaki, Mice and men: their promoter properties, PLoS Genet, 2 (2006) e54.

[29] R.A. Hoskins, J.M. Landolin, J.B. Brown, J.E. Sandler, H. Takahashi, T.

Lassmann, C. Yu, B.W. Booth, D. Zhang, K.H. Wan, L. Yang, N. Boley, J. Andrews,

T.C. Kaufman, B.R. Graveley, P.J. Bickel, P. Carninci, J.W. Carlson, S.E. Celniker,

Genome-wide analysis of promoter architecture in Drosophila melanogaster,

Genome Res, 21 (2011) 182-192.

[30] M. Baumann, J. Pontiller, W. Ernst, Structure and basal transcription complex of

RNA polymerase II core promoters in the mammalian genome: an overview, Mol

Biotechnol, 45 (2010) 241-247.

[31] S.J. Cooper, N.D. Trinklein, E.D. Anton, L. Nguyen, R.M. Myers, Comprehensive

analysis of transcriptional promoter structure and function in 1% of the human

genome, Genome Res, 16 (2006) 1-10.

[32] T.H. Kim, L.O. Barrera, M. Zheng, C. Qu, M.A. Singer, T.A. Richmond, Y. Wu,

R.D. Green, B. Ren, A high-resolution map of active promoters in the human

genome, Nature, 436 (2005) 876-880.

[33] M.C. Frith, Explaining the correlations among properties of mammalian

promoters, Nucleic Acids Res, 42 (2014) 4823-4832.

[34] J.A. Stamatoyannopoulos, Illuminating eukaryotic transcription start sites, Nat

Methods, 7 (2010) 501-503.

[35] N. Adachi, M.R. Lieber, Bidirectional gene organization: a common architectural

feature of the human genome, Cell, 109 (2002) 807-809.

[36] J.C. Ame, V. Schreiber, V. Fraulob, P. Dolle, G. de Murcia, C.P. Niedergang, A

bidirectional promoter connects the poly(ADP-ribose) polymerase 2 (PARP-2) gene

to the gene for RNase P RNA. Structure and expression of the mouse PARP-2 gene,

The Journal of biological chemistry, 276 (2001) 11092-11099.

[37] A.S. Orekhova, P.M. Rubtsov, Bidirectional promoters in the transcription of

mammalian genomes, Biochemistry. Biokhimiia, 78 (2013) 335-341.

[38] V. Gotea, H.M. Petrykowska, L. Elnitski, Bidirectional promoters as important

drivers for the emergence of species-specific transcripts, PloS one, 8 (2013) e57323.

[39] M.Q. Yang, L.L. Elnitski, Diversity of core promoter elements comprising human

bidirectional promoters, BMC genomics, 9 Suppl 2 (2008) S3.

[40] P.G. Engstrom, H. Suzuki, N. Ninomiya, A. Akalin, L. Sessa, G. Lavorgna, A.

Brozzi, L. Luzi, S.L. Tan, L. Yang, G. Kunarso, E.L. Ng, S. Batalov, C. Wahlestedt, C.

Kai, J. Kawai, P. Carninci, Y. Hayashizaki, C. Wells, V.B. Bajic, V. Orlando, J.F. Reid,

B. Lenhard, L. Lipovich, Complex Loci in human and mouse genomes, PLoS Genet,

2 (2006) e47.

[41] G. Wang, K. Qi, Y. Zhao, Y. Li, L. Juan, M. Teng, L. Li, Y. Liu, Y. Wang,

Identification of regulatory regions of bidirectional genes in cervical cancer, BMC

medical genomics, 6 Suppl 1 (2013) S5.

[42] M.U. Kaikkonen, M.T. Lam, C.K. Glass, Non-coding RNAs as regulators of gene

expression and epigenetics, Cardiovascular research, 90 (2011) 430-440.

[43] P. Kapranov, J. Cheng, S. Dike, D.A. Nix, R. Duttagupta, A.T. Willingham, P.F.

Stadler, J. Hertel, J. Hackermuller, I.L. Hofacker, I. Bell, E. Cheung, J. Drenkow, E.

Dumais, S. Patel, G. Helt, M. Ganesh, S. Ghosh, A. Piccolboni, V. Sementchenko, H.

Tammana, T.R. Gingeras, RNA maps reveal new RNA classes and a possible

function for pervasive transcription, Science, 316 (2007) 1484-1488.

[44] W. Wei, V. Pelechano, A.I. Jarvelin, L.M. Steinmetz, Functional consequences of

bidirectional promoters, Trends Genet, 27 (2011) 267-276.

[45] Y. He, B. Vogelstein, V.E. Velculescu, N. Papadopoulos, K.W. Kinzler, The

antisense transcriptomes of human cells, Science, 322 (2008) 1855-1857.

[46] P. Preker, J. Nielsen, S. Kammler, S. Lykke-Andersen, M.S. Christensen, C.K.

Mapendano, M.H. Schierup, T.H. Jensen, RNA exosome depletion reveals

transcription upstream of active human promoters, Science, 322 (2008) 1851-1854.

[47] A.C. Seila, J.M. Calabrese, S.S. Levine, G.W. Yeo, P.B. Rahl, R.A. Flynn, R.A.

Young, P.A. Sharp, Divergent transcription from active promoters, Science, 322

(2008) 1849-1851.

[48] S. Buratowski, Transcription. Gene expression--where to start?, Science, 322

(2008) 1804-1805.

[49] P. Richard, J.L. Manley, How bidirectional becomes unidirectional, Nature

structural & molecular biology, 20 (2013) 1022-1024.

[50] A.E. Almada, X. Wu, A.J. Kriz, C.B. Burge, P.A. Sharp, Promoter directionality is

controlled by U1 snRNP and polyadenylation signals, Nature, 499 (2013) 360-363.

[51] E. Ntini, A.I. Jarvelin, J. Bornholdt, Y. Chen, M. Boyd, M. Jorgensen, R.

Andersson, I. Hoof, A. Schein, P.R. Andersen, P.K. Andersen, P. Preker, E. Valen, X.

Zhao, V. Pelechano, L.M. Steinmetz, A. Sandelin, T.H. Jensen, Polyadenylation site-

induced decay of upstream transcripts enforces promoter directionality, Nature

structural & molecular biology, 20 (2013) 923-928.

[52] F. Lai, R. Shiekhattar, Enhancer RNAs: the new molecules of transcription,

Current opinion in genetics & development, 25 (2014) 38-42.

[53] M.T. Lam, W. Li, M.G. Rosenfeld, C.K. Glass, Enhancer RNAs and regulated

transcriptional programs, Trends in biochemical sciences, 39 (2014) 170-182.

[54] L.J. Core, A.L. Martins, C.G. Danko, C.T. Waters, A. Siepel, J.T. Lis, Analysis of

nascent RNA identifies a unified architecture of initiation regions at mammalian

promoters and enhancers, Nat Genet, 46 (2014) 1311-1320.

[55] M. Uesaka, O. Nishimura, Y. Go, K. Nakashima, K. Agata, T. Imamura,

Bidirectional promoters are the major source of gene activation-associated non-

coding RNAs in mammals, BMC genomics, 15 (2014) 35.

[56] S.H. Duttke, S.A. Lacadie, M.M. Ibrahim, C.K. Glass, D.L. Corcoran, C. Benner,

S. Heinz, J.T. Kadonaga, U. Ohler, Human Promoters Are Intrinsically Directional,

Molecular cell, (2015).

[57] F. Muller, L. Tora, The multicoloured world of promoter recognition complexes,

EMBO J, 23 (2004) 2-8.

[58] L. Tora, A unified nomenclature for TATA box binding protein (TBP)-associated

factors (TAFs) involved in RNA polymerase II transcription, Genes Dev, 16 (2002)

673-675.

[59] R. Dikstein, The unexpected traits associated with core promoter elements,

Transcription, 2 (2011) 201-206.

[60] J.T. Kadonaga, The DPE, a core promoter element for transcription by RNA

polymerase II, Exp Mol Med, 34 (2002) 259-264.

[61] S.T. Smale, J.T. Kadonaga, The RNA polymerase II core promoter, Annu Rev

Biochem, 72 (2003) 449-479.

[62] F. Muller, L. Tora, Chromatin and DNA sequences in defining promoters for

transcription initiation, Biochim Biophys Acta, 1839 (2014) 118-128.

[63] J. Corden, B. Wasylyk, A. Buchwalder, P. Sassone-Corsi, C. Kedinger, P.

Chambon, Promoter sequences of eukaryotic protein-coding genes, Science, 209

(1980) 1406-1414.

[64] S.T. Smale, D. Baltimore, The "initiator" as a transcription control element, Cell,

57 (1989) 103-113.

[65] P.C. FitzGerald, D. Sturgill, A. Shyakhtenko, B. Oliver, C. Vinson, Comparative

genomics of Drosophila and human core promoters, Genome Biol, 7 (2006) R53.

[66] N.I. Gershenzon, E.N. Trifonov, I.P. Ioshikhes, The features of Drosophila core

promoters revealed by statistical analysis, BMC genomics, 7 (2006) 161.

[67] U. Ohler, G.C. Liao, H. Niemann, G.M. Rubin, Computational analysis of core

promoters in the Drosophila genome, Genome Biol, 3 (2002) RESEARCH0087.

[68] J. Kaufmann, S.T. Smale, Direct recognition of initiator elements by a component

of the transcription factor IID complex, Genes Dev, 8 (1994) 821-829.

[69] C.P. Verrijzer, J.L. Chen, K. Yokomori, R. Tjian, Binding of TAFs to core

elements directs promoter selectivity by RNA polymerase II, Cell, 81 (1995) 1115-

1125.

[70] G.E. Chalkley, C.P. Verrijzer, DNA binding site selection by RNA polymerase II

TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator, EMBO J, 18 (1999)

4835-4845.

[71] R. Javahery, A. Khachi, K. Lo, B. Zenzie-Gregory, S.T. Smale, DNA sequence

requirements for transcriptional initiator activity in mammalian cells, Mol Cell Biol, 14

(1994) 116-127.

[72] B.A. Purnell, P.A. Emanuel, D.S. Gilmour, TFIID sequence recognition of the

initiator and sequences farther downstream in Drosophila class II genes, Genes Dev,

8 (1994) 830-842.

[73] C. Yang, E. Bolotin, T. Jiang, F.M. Sladek, E. Martinez, Prevalence of the

initiator over the TATA box in human and yeast genes and identification of DNA

motifs enriched in human TATA-less core promoters, Gene, 389 (2007) 52-65.

[74] M.C. Frith, E. Valen, A. Krogh, Y. Hayashizaki, P. Carninci, A. Sandelin, A code

for transcription initiation in mammalian genomes, Genome Res, 18 (2008) 1-12.

[75] G. Yarden, R. Elfakess, K. Gazit, R. Dikstein, Characterization of sINR, a strict

version of the Initiator core promoter element, Nucleic Acids Res, 37 (2009) 4234-

4246.

[76] N. Hariharan, R.P. Perry, Functional dissection of a mouse ribosomal protein

promoter: significance of the polypyrimidine initiator and an element in the TATA-box

region, Proc Natl Acad Sci U S A, 87 (1990) 1526-1530.

[77] A. Shibui-Nihei, Y. Ohmori, K. Yoshida, J. Imai, I. Oosuga, M. Iidaka, Y. Suzuki,

J. Mizushima-Sugano, K. Yoshitomo-Nakagawa, S. Sugano, The 5' terminal

oligopyrimidine tract of human elongation factor 1A-1 gene functions as a

transcriptional initiator and produces a variable number of Us at the transcriptional

level, Gene, 311 (2003) 137-145.

[78] R.P. Perry, The architecture of mammalian ribosomal protein promoters, BMC

Evol Biol, 5 (2005) 15.

[79] T.L. Hamilton, M. Stoneley, K.A. Spriggs, M. Bushell, TOPs and their regulation,

Biochem Soc Trans, 34 (2006) 12-16.

[80] Y. Tokusumi, Y. Ma, X. Song, R.H. Jacobson, S. Takada, The new core

promoter element XCPE1 (X Core Promoter Element 1) directs activator-, mediator-,

and TATA-binding protein-dependent but TFIID-independent RNA polymerase II

transcription from TATA-less promoters, Mol Cell Biol, 27 (2007) 1844-1858.

[81] R. Anish, M.B. Hossain, R.H. Jacobson, S. Takada, Characterization of

transcription from TATA-less promoters: identification of a new core promoter

element XCPE2 and analysis of factor requirements, PloS one, 4 (2009) e5103.

[82] M.L. Goldberg, Ph.D. thesis, in: Stanford University 1979.

[83] N.I. Gershenzon, I.P. Ioshikhes, Synergy of human Pol II core promoter

elements revealed by statistical sequence analysis, Bioinformatics, 21 (2005) 1295-

1300.

[84] M. Mencia, Z. Moqtaderi, J.V. Geisberg, L. Kuras, K. Struhl, Activator-specific

recruitment of TFIID and regulation of ribosomal protein genes in yeast, Molecular

cell, 9 (2002) 823-833.

[85] A.D. Basehoar, S.J. Zanton, B.F. Pugh, Identification and distinct regulation of

yeast TATA box-containing genes, Cell, 116 (2004) 699-709.

[86] C. Molina, E. Grotewold, Genome wide analysis of Arabidopsis core promoters,

BMC genomics, 6 (2005) 25.

[87] Y.Y. Yamamoto, H. Ichida, T. Abe, Y. Suzuki, S. Sugano, J. Obokata,

Differentiation of core promoter architecture between plants and mammals revealed

by LDSS analysis, Nucleic Acids Res, 35 (2007) 6219-6226.

[88] T. Morton, J. Petricka, D.L. Corcoran, S. Li, C.M. Winter, A. Carda, P.N. Benfey,

U. Ohler, M. Megraw, Paired-end analysis of transcription start sites in Arabidopsis

reveals plant-specific promoter signatures, The Plant cell, 26 (2014) 2746-2760.

[89] J.N. Reeve, Archaeal chromatin and transcription, Molecular microbiology, 48

(2003) 587-598.

[90] V.L. Singer, C.R. Wobbe, K. Struhl, A wide variety of DNA sequences can

functionally replace a yeast TATA element for transcriptional activation, Genes Dev,

4 (1990) 636-645.

[91] W. Deng, S.G. Roberts, A core promoter element downstream of the TATA box

that is recognized by TFIIB, Genes Dev, 19 (2005) 2418-2423.

[92] W. Deng, S.G. Roberts, TFIIB and the regulation of transcription by RNA

polymerase II, Chromosoma, 116 (2007) 417-429.

[93] T. Lagrange, A.N. Kapanidis, H. Tang, D. Reinberg, R.H. Ebright, New core

promoter element in RNA polymerase II-dependent transcription: sequence-specific

DNA binding by transcription factor IIB, Genes Dev, 12 (1998) 34-44.

[94] R. Evans, J.A. Fairley, S.G. Roberts, Activator-mediated disruption of sequence-

specific DNA contacts by the general transcription factor TFIIB, Genes Dev, 15

(2001) 2945-2949.

[95] T. Juven-Gershon, J.Y. Hsu, J.T. Kadonaga, Caudal, a key developmental

regulator, is a DPE-specific transcriptional factor, Genes Dev, 22 (2008) 2823-2830.

[96] T.W. Burke, J.T. Kadonaga, Drosophila TFIID binds to a conserved downstream

basal promoter element that is present in many TATA-box-deficient promoters,

Genes Dev, 10 (1996) 711-724.

[97] T.W. Burke, J.T. Kadonaga, The downstream core promoter element, DPE, is

conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila,

Genes Dev, 11 (1997) 3020-3031.

[98] A.K. Kutach, J.T. Kadonaga, The downstream promoter element DPE appears to

be as widely used as the TATA box in Drosophila core promoters, Mol Cell Biol, 20

(2000) 4754-4764.

[99] Y. Zehavi, A. Sloutskin, O. Kuznetsov, T. Juven-Gershon, The core promoter

composition establishes a new dimension in developmental gene networks, Nucleus,

5 (2014).

[100] C.Y. Lim, B. Santoso, T. Boulay, E. Dong, U. Ohler, J.T. Kadonaga, The MTE,

a new core promoter element for transcription by RNA polymerase II, Genes Dev, 18

(2004) 1606-1617.

[101] J.W. Theisen, C.Y. Lim, J.T. Kadonaga, Three key subregions contribute to the

function of the downstream RNA polymerase II core promoter, Mol Cell Biol, 30

(2010) 3471-3479.

[102] T. Zhou, C.M. Chiang, The intronless and TATA-less human TAF(II)55 gene

contains a functional initiator and a downstream promoter element, The Journal of

biological chemistry, 276 (2001) 25503-25511.

[103] H. Shao, M. Revach, S. Moshonov, Y. Tzuman, K. Gazit, S. Albeck, T. Unger,

R. Dikstein, Core promoter binding by histone-like TAF complexes, Mol Cell Biol, 25

(2005) 206-219.

[104] S.H. Duttke, RNA polymerase III accurately initiates transcription from RNA

polymerase II promoters in vitro, The Journal of biological chemistry, 289 (2014)

20396-20404.

[105] D.H. Lee, N. Gershenzon, M. Gupta, I.P. Ioshikhes, D. Reinberg, B.A. Lewis,

Functional characterization of core promoter elements: the downstream core element

is recognized by TAF1, Mol Cell Biol, 25 (2005) 9674-9686.

[106] B.A. Lewis, T.K. Kim, S.H. Orkin, A downstream element in the human beta-

globin promoter: evidence of extended sequence-specific transcription factor IID

contacts, Proc Natl Acad Sci U S A, 97 (2000) 7172-7177.

[107] A. Matsukage, F. Hirose, M.A. Yoo, M. Yamaguchi, The DRE/DREF

transcriptional regulatory system: a master key for cell proliferation, Biochim Biophys

Acta, 1779 (2008) 81-89.

[108] J. Li, D.S. Gilmour, Distinct mechanisms of transcriptional pausing orchestrated

by GAGA factor and M1BP, a novel transcription factor, EMBO J, 32 (2013) 1829-

1841.

[109] Z. Chen, J.L. Manley, Core promoter elements and TAFs contribute to the

diversity of transcriptional activation in vertebrates, Mol Cell Biol, 23 (2003) 7350-

7362.

[110] E. Martinez, H. Ge, Y. Tao, C.X. Yuan, V. Palhan, R.G. Roeder, Novel

cofactors and TFIIA mediate functional core promoter selectivity by the human

TAFII150-containing TFIID complex, Mol Cell Biol, 18 (1998) 6571-6583.

[111] J.Y. Hsu, T. Juven-Gershon, M.T. Marr, 2nd, K.J. Wright, R. Tjian, J.T.

Kadonaga, TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-

dependent versus TATA-dependent transcription, Genes Dev, 22 (2008) 2353-2358.

[112] T. Juven-Gershon, S. Cheng, J.T. Kadonaga, Rational design of a super core

promoter that enhances gene expression, Nat Methods, 3 (2006) 917-922.

[113] T. Matsui, J. Segall, P.A. Weil, R.G. Roeder, Multiple factors required for

accurate initiation of transcription by purified RNA polymerase II, The Journal of

biological chemistry, 255 (1980) 11992-11996.

[114] M. Samuels, A. Fire, P.A. Sharp, Separation and characterization of factors

mediating accurate transcription by RNA polymerase II, The Journal of biological

chemistry, 257 (1982) 14419-14427.

[115] Y. He, J. Fang, D.J. Taatjes, E. Nogales, Structural visualization of key steps in

human transcription initiation, Nature, 495 (2013) 481-486.

[116] B.A. Lewis, R.J. Sims, 3rd, W.S. Lane, D. Reinberg, Functional characterization

of core promoter elements: DPE-specific transcription requires the protein kinase

CK2 and the PC4 coactivator, Molecular cell, 18 (2005) 471-481.

[117] F. Muller, M.A. Demeny, L. Tora, New problems in RNA polymerase II

transcription initiation: matching the diversity of core promoters with a variety of

promoter recognition factors, The Journal of biological chemistry, 282 (2007) 14685-

14689.

[118] T.W. Sikorski, S. Buratowski, The basal initiation machinery: beyond the

general transcription factors, Current opinion in cell biology, 21 (2009) 344-351.

[119] Y. Zehavi, A. Kedmi, D. Ideses, T. Juven-Gershon, TRF2: TRansForming the

view of general transcription factors, Transcription, (2015) 0.

[120] G. Papai, P.A. Weil, P. Schultz, New insights into the function of transcription

factor TFIID from recent structural studies, Current opinion in genetics &

development, 21 (2011) 219-224.

[121] N. Nakajima, M. Horikoshi, R.G. Roeder, Factors involved in specific

transcription by mammalian RNA polymerase II: purification, genetic specificity, and

TATA box-promoter interactions of TFIID, Mol Cell Biol, 8 (1988) 4028-4040.

[122] C.M. Chiang, H. Ge, Z. Wang, A. Hoffmann, R.G. Roeder, Unique TATA-

binding protein-containing complexes and cofactors involved in transcription by RNA

polymerases II and III, EMBO J, 12 (1993) 2749-2762.

[123] E. Wieczorek, M. Brand, X. Jacq, L. Tora, Function of TAF(II)-containing

complex without TBP in transcription by RNA polymerase II, Nature, 393 (1998) 187-

191.

[124] K. Gazit, S. Moshonov, R. Elfakess, M. Sharon, G. Mengus, I. Davidson, R.

Dikstein, TAF4/4b x TAF12 displays a unique mode of DNA binding and is required

for core promoter function of a subset of genes, The Journal of biological chemistry,

284 (2009) 26286-26296.

[125] T. O'Brien, R. Tjian, Different functional domains of TAFII250 modulate

expression of distinct subsets of mammalian genes, Proc Natl Acad Sci U S A, 97

(2000) 2456-2461.

[126] R.O. Weinzierl, B.D. Dynlacht, R. Tjian, Largest subunit of Drosophila

transcription factor IID directs assembly of a complex containing TBP and a

coactivator, Nature, 362 (1993) 511-517.

[127] K.J. Wright, M.T. Marr, 2nd, R. Tjian, TAF4 nucleates a core subcomplex of

TFIID and mediates activated transcription from a TATA-less promoter, Proc Natl

Acad Sci U S A, 103 (2006) 12347-12352.

[128] C. Bieniossek, G. Papai, C. Schaffitzel, F. Garzoni, M. Chaillet, E. Scheer, P.

Papadopoulos, L. Tora, P. Schultz, I. Berger, The architecture of human general

transcription factor TFIID core complex, Nature, 493 (2013) 699-702.

[129] M.A. Demeny, E. Soutoglou, Z. Nagy, E. Scheer, A. Janoshazi, M. Richardot,

M. Argentini, P. Kessler, L. Tora, Identification of a small TAF complex and its role in

the assembly of TAF-containing complexes, PloS one, 2 (2007) e316.

[130] J. Bonnet, C.Y. Wang, T. Baptista, S.D. Vincent, W.C. Hsiao, M. Stierle, C.F.

Kao, L. Tora, D. Devys, The SAGA coactivator complex acts on the whole

transcribed genome and is required for RNA polymerase II transcription, Genes Dev,

28 (2014) 1999-2012.

[131] D.J. Mitsiou, H.G. Stunnenberg, TAC, a TBP-sans-TAFs complex containing

the unprocessed TFIIAalphabeta precursor and the TFIIAgamma subunit, Molecular

cell, 6 (2000) 527-537.

[132] T. Raha, S.W. Cheng, M.R. Green, HIV-1 Tat stimulates transcription complex

assembly through recruitment of TBP in the absence of TAFs, PLoS biology, 3

(2005) e44.

[133] B. Guglielmi, N. La Rochelle, R. Tjian, Gene-specific transcriptional

mechanisms at the histone gene cluster revealed by single-cell imaging, Molecular

cell, 51 (2013) 480-492.

[134] J. Zaborowska, A. Taylor, S. Murphy, A novel TBP-TAF complex on RNA

polymerase II-transcribed snRNA genes, Transcription, 3 (2012) 92-104.

[135] F.J. van Werven, H. van Bakel, H.A. van Teeffelen, A.F. Altelaar, M.G.

Koerkamp, A.J. Heck, F.C. Holstege, H.T. Timmers, Cooperative action of NC2 and

Mot1p to regulate TATA-binding protein function across the genome, Genes Dev, 22

(2008) 2359-2369.

[136] W. Deng, B. Malecova, T. Oelgeschlager, S.G. Roberts, TFIIB recognition

elements control the TFIIA-NC2 axis in transcriptional regulation, Mol Cell Biol, 29

(2009) 1389-1400.

[137] M. Xu, P. Sharma, S. Pan, S. Malik, R.G. Roeder, E. Martinez, Core promoter-

selective function of HMGA1 and Mediator in Initiator-dependent transcription, Genes

Dev, 25 (2011) 2513-2524.

[138] M.A. Cianfrocco, G.A. Kassavetis, P. Grob, J. Fang, T. Juven-Gershon, J.T.

Kadonaga, E. Nogales, Human TFIID binds to core promoter DNA in a reorganized

structural state, Cell, 152 (2013) 120-131.

[139] M.A. Cianfrocco, E. Nogales, Regulatory interplay between TFIID's

conformational transitions and its modular interaction with core promoter DNA,

Transcription, 4 (2013) 120-126.

[140] W. Akhtar, G.J. Veenstra, TBP-related factors: a paradigm of diversity in

transcription initiation, Cell & bioscience, 1 (2011) 23.

[141] F. Muller, A. Zaucker, L. Tora, Developmental regulation of transcription

initiation: more than just changing the actors, Current opinion in genetics &

development, 20 (2010) 533-540.

[142] J.H. Reina, N. Hernandez, On a roll for new TRF targets, Genes Dev, 21 (2007)

2855-2860.

[143] S.H. Duttke, R.F. Doolittle, Y.L. Wang, J.T. Kadonaga, TRF2 and the evolution

of the bilateria, Genes Dev, 28 (2014) 2071-2076.

[144] P.A. Moore, J. Ozer, M. Salunek, G. Jan, D. Zerby, S. Campbell, P.M.

Lieberman, A human TATA binding protein-related protein with altered DNA binding

specificity inhibits transcription from multiple promoters and activators, Mol Cell Biol,

19 (1999) 7610-7620.

[145] M.D. Rabenstein, S. Zhou, J.T. Lis, R. Tjian, TATA box-binding protein (TBP)-

related factor 2 (TRF2), a third member of the TBP family, Proc Natl Acad Sci U S A,

96 (1999) 4791-4796.

[146] Y. Isogai, S. Keles, M. Prestel, A. Hochheimer, R. Tjian, Transcription of

histone gene cluster by differential core-promoter factors, Genes Dev, 21 (2007)

2936-2949.

[147] Y.L. Wang, S.H. Duttke, K. Chen, J. Johnston, G.A. Kassavetis, J. Zeitlinger,

J.T. Kadonaga, TRF2, but not TBP, mediates the transcription of ribosomal protein

genes, Genes Dev, 28 (2014) 1550-1555.

[148] A. Kedmi, Y. Zehavi, Y. Glick, Y. Orenstein, D. Ideses, C. Wachtel, T. Doniger,

H. Waldman Ben-Asher, N. Muster, J. Thompson, S. Anderson, D. Avrahami, J.R.

Yates, 3rd, R. Shamir, D. Gerber, T. Juven-Gershon, Drosophila TRF2 is a

preferential core promoter regulator, Genes Dev, 28 (2014) 2163-2174.

[149] S.H. Duttke, Evolution and diversification of the basal transcription machinery,

Trends in biochemical sciences, (2015).

[150] J.A. Goodrich, R. Tjian, Unexpected roles for core promoter recognition factors

in cell-type-specific transcription and gene regulation, Nat Rev Genet, 11 (2010) 549-

558.

[151] D.A. Wassarman, N. Aoyagi, L.A. Pile, E.M. Schlag, TAF250 is required for

multiple developmental events in Drosophila, Proc Natl Acad Sci U S A, 97 (2000)

1154-1159.

[152] N. Aoyagi, D.A. Wassarman, Developmental and transcriptional consequences

of mutations in Drosophila TAF(II)60, Mol Cell Biol, 21 (2001) 6808-6819.

[153] J. Zhou, J. Zwicker, P. Szymanski, M. Levine, R. Tjian, TAFII mutations disrupt

Dorsal activation in the Drosophila embryo, Proc Natl Acad Sci U S A, 95 (1998)

13483-13488.

[154] M. Guermah, K. Ge, C.M. Chiang, R.G. Roeder, The TBN protein, which is

essential for early embryonic mouse development, is an inducible TAFII implicated in

adipogenesis, Molecular cell, 12 (2003) 991-1001.

[155] S. Georgieva, D.B. Kirschner, T. Jagla, E. Nabirochkina, S. Hanke, H.

Schenkel, C. de Lorenzo, P. Sinha, K. Jagla, B. Mechler, L. Tora, Two novel

Drosophila TAF(II)s have homology with human TAF(II)30 and are differentially

regulated during development, Mol Cell Biol, 20 (2000) 1639-1648.

[156] W.S. Mohan, Jr., E. Scheer, O. Wendling, D. Metzger, L. Tora, TAF10

(TAF(II)30) is necessary for TFIID stability and early embryogenesis in mice, Mol Cell

Biol, 23 (2003) 4307-4318.

[157] A. Tatarakis, T. Margaritis, C.P. Martinez-Jimenez, A. Kouskouti, W.S. Mohan,

2nd, A. Haroniti, D. Kafetzopoulos, L. Tora, I. Talianidis, Dominant and redundant

functions of TFIID involved in the regulation of hepatic genes, Molecular cell, 31

(2008) 531-543.

[158] W.W. Pijnappel, D. Esch, M.P. Baltissen, G. Wu, N. Mischerikow, A.J.

Bergsma, E. van der Wal, D.W. Han, H. Bruch, S. Moritz, P. Lijnzaad, A.F. Altelaar,

K. Sameith, H. Zaehres, A.J. Heck, F.C. Holstege, H.R. Scholer, H.T. Timmers, A

central role for TFIID in the pluripotent transcription circuitry, Nature, 495 (2013) 516-

519.

[159] G.A. Maston, L.J. Zhu, L. Chamberlain, L. Lin, M. Fang, M.R. Green, Non-

canonical TAF complexes regulate active promoters in human embryonic stem cells,

eLife, 1 (2012) e00068.

[160] P.J. Wang, D.C. Page, Functional substitution for TAF(II)250 by a retroposed

homolog that is expressed in human spermatogenesis, Human molecular genetics,

11 (2002) 2341-2346.

[161] J.C. Pointud, G. Mengus, S. Brancorsini, L. Monaco, M. Parvinen, P. Sassone-

Corsi, I. Davidson, The intracellular localisation of TAF7L, a paralogue of

transcription factor TFIID subunit TAF7, is developmentally regulated during male

germ-cell differentiation, Journal of cell science, 116 (2003) 1847-1858.

[162] Y. Cheng, M.G. Buffone, M. Kouadio, M. Goodheart, D.C. Page, G.L. Gerton, I.

Davidson, P.J. Wang, Abnormal sperm in mice lacking the Taf7l gene, Mol Cell Biol,

27 (2007) 2582-2589.

[163] H. Zhou, I. Grubisic, K. Zheng, Y. He, P.J. Wang, T. Kaplan, R. Tjian, Taf7l

cooperates with Trf2 to regulate spermiogenesis, Proc Natl Acad Sci U S A, 110

(2013) 16886-16891.

[164] H. Zhou, T. Kaplan, Y. Li, I. Grubisic, Z. Zhang, P.J. Wang, M.B. Eisen, R.

Tjian, Dual functions of TAF7L in adipocyte differentiation, eLife, 2 (2013) e00170.

[165] H. Zhou, B. Wan, I. Grubisic, T. Kaplan, R. Tjian, TAF7L modulates brown

adipose tissue formation, eLife, 3 (2014).

[166] R. Dikstein, S. Zhou, R. Tjian, Human TAFII 105 is a cell type-specific TFIID

subunit related to hTAFII130, Cell, 87 (1996) 137-146.

[167] A.E. Falender, R.N. Freiman, K.G. Geles, K.C. Lo, K. Hwang, D.J. Lamb, P.L.

Morris, R. Tjian, J.S. Richards, Maintenance of spermatogenesis requires TAF4b, a

gonad-specific subunit of TFIID, Genes Dev, 19 (2005) 794-803.

[168] A.E. Falender, M. Shimada, Y.K. Lo, J.S. Richards, TAF4b, a TBP associated

factor, is required for oocyte development and function, Dev Biol, 288 (2005) 405-

419.

[169] R.N. Freiman, S.R. Albright, S. Zheng, W.C. Sha, R.E. Hammer, R. Tjian,

Requirement of tissue-selective TBP-associated factor TAFII105 in ovarian

development, Science, 293 (2001) 2084-2087.

[170] K.J. Grive, K.A. Seymour, R. Mehta, R.N. Freiman, TAF4b promotes mouse

primordial follicle assembly and oocyte survival, Dev Biol, 392 (2014) 42-51.

[171] F.J. Herrera, T. Yamaguchi, H. Roelink, R. Tjian, Core promoter factor TAF9B

regulates neuronal gene expression, eLife, 3 (2014) e02559.

[172] M. Hiller, X. Chen, M.J. Pringle, M. Suchorolski, Y. Sancak, S. Viswanathan, B.

Bolival, T.Y. Lin, S. Marino, M.T. Fuller, Testis-specific TAF homologs collaborate to

control a tissue-specific transcription program, Development, 131 (2004) 5297-5308.

[173] U. Ohler, D.A. Wassarman, Promoting developmental transcription,

Development, 137 (2010) 15-26.

[174] I. Martianov, G.M. Fimia, A. Dierich, M. Parvinen, P. Sassone-Corsi, I.

Davidson, Late arrest of spermiogenesis and germ cell apoptosis in mice lacking the

TBP-like TLF/TRF2 gene, Molecular cell, 7 (2001) 509-515.

[175] D. Zhang, T.L. Penttila, P.L. Morris, M. Teichmann, R.G. Roeder,

Spermiogenesis deficiency in mice lacking the Trf2 gene, Science, 292 (2001) 1153-

1155.

[176] T. Oyama, S. Sasagawa, S. Takeda, R.A. Hess, P.M. Lieberman, E.H. Cheng,

J.J. Hsieh, Cleavage of TFIIA by Taspase1 activates TRF2-specified mammalian

male germ cell programs, Developmental cell, 27 (2013) 188-200.

[177] A. Bashirullah, G. Lam, V.P. Yin, C.S. Thummel, dTrf2 is required for

transcriptional and developmental responses to ecdysone during Drosophila

metamorphosis, Developmental dynamics : an official publication of the American

Association of Anatomists, 236 (2007) 3173-3179.

[178] D.O. Hart, T. Raha, N.D. Lawson, M.R. Green, Initiation of zebrafish

haematopoiesis by the TATA-box-binding protein-related factor Trf3, Nature, 450

(2007) 1082-1085.

[179] D.O. Hart, M.K. Santra, T. Raha, M.R. Green, Selective interaction between

Trf3 and Taf3 required for early development and hematopoiesis, Developmental

dynamics : an official publication of the American Association of Anatomists, 238

(2009) 2540-2549.

[180] R. Bartfai, C. Balduf, T. Hilton, Y. Rathmann, Y. Hadzhiev, L. Tora, L. Orban, F.

Muller, TBP2, a vertebrate-specific member of the TBP family, is required in

embryonic development of zebrafish, Current biology : CB, 14 (2004) 593-598.

[181] Z. Jallow, U.G. Jacobi, D.L. Weeks, I.B. Dawid, G.J. Veenstra, Specialized and

redundant roles of TBP and a vertebrate-specific TBP paralog in embryonic gene

regulation in Xenopus, Proc Natl Acad Sci U S A, 101 (2004) 13525-13530.

[182] E. Gazdag, A. Santenard, C. Ziegler-Birling, G. Altobelli, O. Poch, L. Tora, M.E.

Torres-Padilla, TBP2 is essential for germ cell development by regulating

transcription and chromatin condensation in the oocyte, Genes Dev, 23 (2009) 2210-

2223.

[183] M. Bulger, M. Groudine, Functional and mechanistic diversity of distal

transcription enhancers, Cell, 144 (2011) 327-339.

[184] M. Levine, Transcriptional enhancers in animal development and evolution,

Current biology : CB, 20 (2010) R754-763.

[185] M. Levine, C. Cattoglio, R. Tjian, Looping back to leap forward: transcription

enters a new era, Cell, 157 (2014) 13-25.

[186] J. Marsman, J.A. Horsfield, Long distance relationships: enhancer-promoter

communication and dynamic gene transcription, Biochim Biophys Acta, 1819 (2012)

1217-1227.

[187] C.T. Ong, V.G. Corces, Enhancer function: new insights into the regulation of

tissue-specific gene expression, Nat Rev Genet, 12 (2011) 283-293.

[188] D. Shlyueva, G. Stampfel, A. Stark, Transcriptional enhancers: from properties

to genome-wide predictions, Nat Rev Genet, 15 (2014) 272-286.

[189] F. Spitz, E.E. Furlong, Transcription factors: from enhancer binding to

developmental control, Nat Rev Genet, 13 (2012) 613-626.

[190] J. van Arensbergen, B. van Steensel, H.J. Bussemaker, In search of the

determinants of enhancer-promoter interaction specificity, Trends in cell biology,

(2014).

[191] X. Li, M. Noll, Compatibility between enhancers and promoters determines the

transcriptional specificity of gooseberry and gooseberry neuro in the Drosophila

embryo, EMBO J, 13 (1994) 400-406.

[192] C. Merli, D.E. Bergstrom, J.A. Cygan, R.K. Blackman, Promoter specificity

mediates the independent regulation of neighboring genes, Genes Dev, 10 (1996)

1260-1270.

[193] B. Tolhuis, R.J. Palstra, E. Splinter, F. Grosveld, W. de Laat, Looping and

interaction between hypersensitive sites in the active beta-globin locus, Molecular

cell, 10 (2002) 1453-1465.

[194] J. Gehrig, M. Reischl, E. Kalmar, M. Ferg, Y. Hadzhiev, A. Zaucker, C. Song, S.

Schindler, U. Liebel, F. Muller, Automated high-throughput mapping of promoter-

enhancer interactions in zebrafish embryos, Nat Methods, 6 (2009) 911-916.

[195] V.C. Calhoun, A. Stathopoulos, M. Levine, Promoter-proximal tethering

elements regulate enhancer-promoter specificity in the Drosophila Antennapedia

complex, Proc Natl Acad Sci U S A, 99 (2002) 9243-9247.

[196] O.S. Akbari, E. Bae, H. Johnsen, A. Villaluz, D. Wong, R.A. Drewell, A novel

promoter-tethering element regulates enhancer-driven gene expression at the

bithorax complex in the Drosophila embryo, Development, 135 (2008) 123-131.

[197] S. Ohtsuki, M. Levine, H.N. Cai, Different core promoters possess distinct

regulatory activities in the Drosophila embryo, Genes Dev, 12 (1998) 547-556.

[198] J.E. Butler, J.T. Kadonaga, Enhancer-promoter specificity mediated by DPE or

TATA core promoter motifs, Genes Dev, 15 (2001) 2515-2519.

[199] F. Jin, Y. Li, J.R. Dixon, S. Selvaraj, Z. Ye, A.Y. Lee, C.A. Yen, A.D. Schmitt,

C.A. Espinoza, B. Ren, A high-resolution map of the three-dimensional chromatin

interactome in human cells, Nature, 503 (2013) 290-294.

[200] A. Sanyal, B.R. Lajoie, G. Jain, J. Dekker, The long-range interaction

landscape of gene promoters, Nature, 489 (2012) 109-113.

[201] Y. Zhang, C.H. Wong, R.Y. Birnbaum, G. Li, R. Favaro, C.Y. Ngan, J. Lim, E.

Tai, H.M. Poh, E. Wong, F.H. Mulawadi, W.K. Sung, S. Nicolis, N. Ahituv, Y. Ruan,

C.L. Wei, Chromatin connectivity maps reveal dynamic promoter-enhancer long-

range associations, Nature, 504 (2013) 306-310.

[202] C.D. Arnold, D. Gerlach, C. Stelzer, L.M. Boryn, M. Rath, A. Stark, Genome-

wide quantitative enhancer activity maps identified by STARR-seq, Science, 339

(2013) 1074-1077.

[203] Y. Ghavi-Helm, F.A. Klein, T. Pakozdi, L. Ciglar, D. Noordermeer, W. Huber,

E.E. Furlong, Enhancer loops appear stable during development and are associated

with paused polymerase, Nature, 512 (2014) 96-100.

[204] M.A. Zabidi, C.D. Arnold, K. Schernhuber, M. Pagani, M. Rath, O. Frank, A.

Stark, Enhancer–core-promoter specificity separates developmental and

housekeeping gene regulation, Nature, (2014).

[205] N.J. Fuda, M.B. Ardehali, J.T. Lis, Defining mechanisms that regulate RNA

polymerase II transcription in vivo, Nature, 461 (2009) 186-192.

[206] S. Nechaev, K. Adelman, Pol II waiting in the starting gates: Regulating the

transition from transcription initiation into productive elongation, Biochim Biophys

Acta, 1809 (2011) 34-45.

[207] D.L. Bentley, Coupling mRNA processing with transcription in time and space,

Nat Rev Genet, 15 (2014) 163-175.

[208] K. Adelman, J.T. Lis, Promoter-proximal pausing of RNA polymerase II:

emerging roles in metazoans, Nat Rev Genet, 13 (2012) 720-731.

[209] D.A. Gilchrist, K. Adelman, Coupling polymerase pausing and chromatin

landscapes for precise regulation of transcription, Biochim Biophys Acta, 1819 (2012)

700-706.

[210] Y. Yamaguchi, H. Shibata, H. Handa, Transcription elongation factors DSIF and

NELF: promoter-proximal pausing and beyond, Biochim Biophys Acta, 1829 (2013)

98-104.

[211] D.S. Gilmour, J.T. Lis, RNA polymerase II interacts with the promoter region of

the noninduced hsp70 gene in Drosophila melanogaster cells, Mol Cell Biol, 6 (1986)

3984-3989.

[212] E.B. Rasmussen, J.T. Lis, In vivo transcriptional pausing and cap formation on

three Drosophila heat shock genes, Proc Natl Acad Sci U S A, 90 (1993) 7923-7927.

[213] D.L. Bentley, M. Groudine, A block to elongation is largely responsible for

decreased transcription of c-myc in differentiated HL60 cells, Nature, 321 (1986) 702-

706.

[214] A. Krumm, T. Meulia, M. Brunvand, M. Groudine, The block to transcriptional

elongation within the human c-myc gene is determined in the promoter-proximal

region, Genes Dev, 6 (1992) 2201-2213.

[215] C.S. Maxwell, W.S. Kruesi, L.J. Core, N. Kurhanewicz, C.T. Waters, C.L.

Lewarch, I. Antoshechkin, J.T. Lis, B.J. Meyer, L.R. Baugh, Pol II docking and

pausing at growth and stress genes in C. elegans, Cell reports, 6 (2014) 455-466.

[216] G.W. Muse, D.A. Gilchrist, S. Nechaev, R. Shah, J.S. Parker, S.F. Grissom, J.

Zeitlinger, K. Adelman, RNA polymerase is poised for activation across the genome,

Nat Genet, 39 (2007) 1507-1511.

[217] J. Zeitlinger, A. Stark, M. Kellis, J.W. Hong, S. Nechaev, K. Adelman, M.

Levine, R.A. Young, RNA polymerase stalling at developmental control genes in the

Drosophila melanogaster embryo, Nat Genet, 39 (2007) 1512-1516.

[218] C. Lee, X. Li, A. Hechmer, M. Eisen, M.D. Biggin, B.J. Venters, C. Jiang, J. Li,

B.F. Pugh, D.S. Gilmour, NELF and GAGA factor are linked to promoter-proximal

pausing at many genes in Drosophila, Mol Cell Biol, 28 (2008) 3290-3300.

[219] S. Nechaev, D.C. Fargo, G. dos Santos, L. Liu, Y. Gao, K. Adelman, Global

analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of

Pol II in Drosophila, Science, 327 (2010) 335-338.

[220] M. Quinodoz, C. Gobet, F. Naef, K.B. Gustafson, Characteristic bimodal

profiles of RNA polymerase II at thousands of active mammalian promoters, Genome

Biol, 15 (2014) R85.

[221] B. Gaertner, J. Zeitlinger, RNA polymerase II pausing during development,

Development, 141 (2014) 1179-1183.

[222] C. Nepal, Y. Hadzhiev, C. Previti, V. Haberle, N. Li, H. Takahashi, A.M. Suzuki,

Y. Sheng, R.F. Abdelhamid, S. Anand, J. Gehrig, A. Akalin, C.E. Kockx, A.A. van der

Sloot, W.F. van Ijcken, O. Armant, S. Rastegar, C. Watson, U. Strahle, E. Stupka, P.

Carninci, B. Lenhard, F. Muller, Dynamic regulation of the transcription initiation

landscape at single nucleotide resolution during vertebrate embryogenesis, Genome

Res, 23 (2013) 1938-1950.

[223] V. Haberle, N. Li, Y. Hadzhiev, C. Plessy, C. Previti, C. Nepal, J. Gehrig, X.

Dong, A. Akalin, A.M. Suzuki, W.F. van Ijcken, O. Armant, M. Ferg, U. Strahle, P. Carninci,

F. Muller, B. Lenhard, Two independent transcription initiation codes overlap on

vertebrate core promoters, Nature, 507 (2014) 381-385.

[224] T. Henriques, D.A. Gilchrist, S. Nechaev, M. Bern, G.W. Muse, A. Burkholder,

D.C. Fargo, K. Adelman, Stable pausing by RNA polymerase II provides an

opportunity to target and integrate regulatory signals, Molecular cell, 52 (2013) 517-

528.

[225] J. Li, Y. Liu, H.S. Rhee, S.K. Ghosh, L. Bai, B.F. Pugh, D.S. Gilmour, Kinetic

competition between elongation rate and binding of NELF controls promoter-proximal

pausing, Molecular cell, 50 (2013) 711-722.

[226] B. Gaertner, J. Johnston, K. Chen, N. Wallaschek, A. Paulson, A.S. Garruss, K.

Gaudenz, B. De Kumar, R. Krumlauf, J. Zeitlinger, Poised RNA polymerase II

changes over developmental time and prepares genes for future expression, Cell

reports, 2 (2012) 1670-1683.

[227] M. Lagha, J.P. Bothma, E. Esposito, S. Ng, L. Stefanik, C. Tsui, J. Johnston, K.

Chen, D.S. Gilmour, J. Zeitlinger, M.S. Levine, Paused Pol II coordinates tissue

morphogenesis in the Drosophila embryo, Cell, 153 (2013) 976-987.

[228] A. Saunders, H.L. Ashe, Taking a pause to reflect on morphogenesis, Cell, 153

(2013) 941-943.

[229] A. Saunders, L.J. Core, C. Sutcliffe, J.T. Lis, H.L. Ashe, Extensive polymerase

pausing during Drosophila axis patterning enables high-level and pliable

transcription, Genes Dev, 27 (2013) 1146-1158.

[230] K. Chen, J. Johnston, W. Shao, S. Meier, C. Staber, J. Zeitlinger, A global

change in RNA polymerase II pausing during the Drosophila midblastula transition,

eLife, 2 (2013) e00861.

[231] L. Amir-Zilberstein, E. Ainbinder, L. Toube, Y. Yamaguchi, H. Handa, R.

Dikstein, Differential regulation of NF-kappaB by elongation factors is determined by

core promoter type, Mol Cell Biol, 27 (2007) 5246-5259.

[232] D.A. Hendrix, J.W. Hong, J. Zeitlinger, D.S. Rokhsar, M.S. Levine, Promoter

elements associated with RNA Pol II stalling in the Drosophila embryo, Proc Natl

Acad Sci U S A, 105 (2008) 7762-7767.

[233] H. Kwak, N.J. Fuda, L.J. Core, J.T. Lis, Precise maps of RNA polymerase

reveal how promoters direct initiation and pausing, Science, 339 (2013) 950-953.

[234] N.J. Proudfoot, Ending the message: poly(A) signals then and now, Genes

Dev, 25 (2011) 1770-1782.

[235] P.K. Andersen, T.H. Jensen, S. Lykke-Andersen, Making ends meet:

coordination between RNA 3'-end processing and transcription initiation, Wiley

interdisciplinary reviews. RNA, 4 (2013) 233-246.

[236] D.C. Di Giammartino, J.L. Manley, New links between mRNA polyadenylation

and diverse nuclear pathways, Molecules and cells, 37 (2014) 644-649.

[237] O. Calvo, J.L. Manley, Strange bedfellows: polyadenylation factors at the

promoter, Genes Dev, 17 (2003) 1321-1327.

[238] K. Xiang, L. Tong, J.L. Manley, Delineating the structural blueprint of the pre-

mRNA 3'-end processing machinery, Mol Cell Biol, 34 (2014) 1894-1910.

[239] J.C. Dantonel, K.G. Murthy, J.L. Manley, L. Tora, Transcription factor TFIID

recruits factor CPSF for formation of 3' end of mRNA, Nature, 389 (1997) 399-402.

[240] Y. Wang, J.A. Fairley, S.G. Roberts, Phosphorylation of TFIIB links

transcription initiation and termination, Current biology : CB, 20 (2010) 548-553.

[241] C.K. Mapendano, S. Lykke-Andersen, J. Kjems, E. Bertrand, T.H. Jensen,

Crosstalk between mRNA 3' end processing and transcription initiation, Molecular

cell, 40 (2010) 410-422.

[242] S. Lykke-Andersen, C.K. Mapendano, T.H. Jensen, An ending is a new

beginning: transcription termination supports re-initiation, Cell cycle, 10 (2011) 863-

865.

[243] K. Oktaba, W. Zhang, T.S. Lotz, D.J. Jun, S.B. Lemke, S.P. Ng, E. Esposito, M.

Levine, V. Hilgers, ELAV Links Paused Pol II to Alternative Polyadenylation in the

Drosophila Nervous System, Molecular cell, 57 (2015) 341-348.

[244] T. Gonatopoulos-Pournatzis, V.H. Cowling, Cap-binding complex (CBC),

Biochem J, 457 (2014) 231-242.

[245] R.J. Jackson, C.U. Hellen, T.V. Pestova, The mechanism of eukaryotic

translation initiation and principles of its regulation, Nature reviews. Molecular cell

biology, 11 (2010) 113-127.

[246] R. Dikstein, Transcription and translation in a package deal: the TISU

paradigm, Gene, 491 (2012) 1-4.

[247] M. Kozak, Initiation of translation in prokaryotes and eukaryotes, Gene, 234

(1999) 187-208.

[248] R. Elfakess, R. Dikstein, A translation initiation element specific to mRNAs with

very short 5'UTR that also regulates transcription, PloS one, 3 (2008) e3094.

[249] R. Elfakess, H. Sinvani, O. Haimov, Y. Svitkin, N. Sonenberg, R. Dikstein,

Unique translation initiation of mRNAs-containing TISU element, Nucleic Acids Res,

39 (2011) 7598-7609.

[250] D. Avni, S. Shama, F. Loreni, O. Meyuhas, Vertebrate mRNAs with a 5'-

terminal pyrimidine tract are candidates for translational repression in quiescent cells:

characterization of the translational cis-regulatory element, Mol Cell Biol, 14 (1994)

3822-3833.

[251] O. Meyuhas, Synthesis of the translational apparatus is regulated at the

translational level, European journal of biochemistry / FEBS, 267 (2000) 6321-6330.

[252] M. Schafer, R. Kuhn, F. Bosse, U. Schafer, A conserved element in the leader

mediates post-meiotic translation as well as cytoplasmic polyadenylation of a

Drosophila spermatocyte mRNA, EMBO J, 9 (1990) 4519-4525.

[253] E. Kempe, B. Muhs, M. Schafer, Gene regulation in Drosophila

spermatogenesis: analysis of protein binding at the translational control element

TCE, Dev Genet, 14 (1993) 449-459.

[254] R.J. Katzenberger, E.A. Rach, A.K. Anderson, U. Ohler, D.A. Wassarman, The

Drosophila Translational Control Element (TCE) is required for high-level

transcription of many genes that are specifically expressed in testes, PloS one, 7

(2012) e45009.

[255] S. Moshonov, R. Elfakess, M. Golan-Mashiach, H. Sinvani, R. Dikstein, Links

between core promoter and basic gene features influence gene expression, BMC

genomics, 9 (2008) 92.

[256] A. Tamarkin-Ben-Harush, E. Schechtman, R. Dikstein, Co-occurrence of

transcription and translation gene regulatory features underlies coordinated mRNA

and protein synthesis, BMC genomics, 15 (2014) 688.

[257] L. Savinkova, I. Drachkova, T. Arshinova, P. Ponomarenko, M. Ponomarenko,

N. Kolchanov, An experimental verification of the predicted effects of promoter TATA-

box polymorphisms associated with human diseases on interactions between the

TATA boxes and TATA-binding protein, PloS one, 8 (2013) e54626.

[258] L.K. Savinkova, M.P. Ponomarenko, P.M. Ponomarenko, I.A. Drachkova, M.V.

Lysova, T.V. Arshinova, N.A. Kolchanov, TATA box polymorphisms in human gene

promoters and associated hereditary pathologies, Biochemistry. Biokhimiia, 74

(2009) 117-129.

[259] G. Li, X. Ruan, R.K. Auerbach, K.S. Sandhu, M. Zheng, P. Wang, H.M. Poh, Y.

Goh, J. Lim, J. Zhang, H.S. Sim, S.Q. Peh, F.H. Mulawadi, C.T. Ong, Y.L. Orlov, S.

Hong, Z. Zhang, S. Landt, D. Raha, G. Euskirchen, C.L. Wei, W. Ge, H. Wang, C.

Davis, K.I. Fisher-Aylor, A. Mortazavi, M. Gerstein, T. Gingeras, B. Wold, Y. Sun,

M.J. Fullwood, E. Cheung, E. Liu, W.K. Sung, M. Snyder, Y. Ruan, Extensive

promoter-centered chromatin interactions provide a topological basis for transcription

regulation, Cell, 148 (2012) 84-98.

[260] T.K. Kim, M. Hemberg, J.M. Gray, A.M. Costa, D.M. Bear, J. Wu, D.A. Harmin,

M. Laptewicz, K. Barbara-Haley, S. Kuersten, E. Markenscoff-Papadimitriou, D. Kuhl,

H. Bito, P.F. Worley, G. Kreiman, M.E. Greenberg, Widespread transcription at

neuronal activity-regulated enhancers, Nature, 465 (2010) 182-187.

[261] R. Andersson, C. Gebhard, I. Miguel-Escalada, I. Hoof, J. Bornholdt, M. Boyd,

Y. Chen, X. Zhao, C. Schmidl, T. Suzuki, E. Ntini, E. Arner, E. Valen, K. Li, L.

Schwarzfischer, D. Glatz, J. Raithel, B. Lilje, N. Rapin, F.O. Bagger, M. Jorgensen,

P.R. Andersen, N. Bertin, O. Rackham, A.M. Burroughs, J.K. Baillie, Y. Ishizu, Y.

Shimizu, E. Furuhata, S. Maeda, Y. Negishi, C.J. Mungall, T.F. Meehan, T.

Lassmann, M. Itoh, H. Kawaji, N. Kondo, J. Kawai, A. Lennartsson, C.O. Daub, P.

Heutink, D.A. Hume, T.H. Jensen, H. Suzuki, Y. Hayashizaki, F. Muller, FANTOM

Consortium, A.R. Forrest, P. Carninci, M. Rehli, A. Sandelin, An atlas of active

enhancers across human cell types and tissues, Nature, 507 (2014) 455-461.

[262] S. Weingarten-Gabbay, E. Segal, A shared architecture for promoters and

enhancers, Nat Genet, 46 (2014) 1253-1254.

[263] R. Andersson, Promoter or enhancer, what's the difference? Deconstruction of

established distinctions and presentation of a unifying model, BioEssays : news and

reviews in molecular, cellular and developmental biology, (2014).

Figure legends

Fig. 1. General features of the core promoter region. A. The three main

core promoter types based on the distribution of TSSs, including focused,

dispersed and mixed promoters. Small arrows represent weak TSSs,

whereas a large arrow represents a single strong TSS. B. Chromatin features

of active core promoters include distinct post-translational modifications and

nucleosome depletion. Associated histone marks are depicted:

H3K4me2/me3 (orange), H3K4ac (gray), H3K27ac (light blue). A DHS/NDR

pattern ranging from nucleosome-free (light) to nucleosome-occupied regions

(dark) is illustrated below. C. Schematic illustration of the most common core

promoter elements found in focused promoters. The diagram is roughly to

scale. D. Schematic illustration of the known factors and sequence motifs that

are associated with dispersed promoters.

Fig. 2. The core promoter can be studied from different angles in

multiple resolutions. A. Zooming in on global genomic interactions in the

nucleus, one can study long-range interactions, such as those between

enhancers and promoters, by analyzing chromatin looping, cohesin function,

interactions of transcription factors (TFs) with co-activators and cis-regulatory

modules and interactions of the preinitiation complex (PIC) components with

their target promoters. B. Zooming in on the basal transcription machinery,

one can study the assembly and composition of the PIC at different Pol II

promoters and the 3D structure of different PIC components. C. Zooming

in on the DNA-binding PIC components (TFIIB and TFIID), one can focus on

the alternative protein components at different Pol II-promoters, on the core

promoter composition of specialized transcription programs, and on the

interactions of different PIC components with specific core promoter elements.

Fig. 3. Schematic model depicting the pivotal role of the core promoter

module in diverse molecular events and stages of gene expression. The

core promoter is important for (clockwise): basal transcription initiation and

PIC-core promoter compatibility and thus for PIC formation; enhancer-promoter

compatibility (which is schematically represented by the preferential

activation of DPE-dependent promoters by Caudal); promoter-proximal Pol II

pausing; termination/polyadenylation and Pol II recycling; and translation, via

core promoter elements that play a role in both transcription and translation.

Please see the main text for detailed explanations.

Figure 1:

Figure 2:

Figure 3:

Bar-Ilan University

From promoters of Drosophila Hox genes to the human genome:
identification, characterization and potential function of the human DPE element

Yehuda-Matan Danino

This work is submitted in partial fulfillment of the requirements for the
Master's Degree in the Mina and Everard Goodman Faculty of Life Sciences,
Bar-Ilan University

Ramat Gan, Israel, 5775 (2015)

This work was carried out under the supervision of Dr. Tamar Juven-Gershon,
from the Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University.

Abstract

Proper gene expression is essential for the existence and activity of every
cell in the whole organism. It is controlled by a multistep process involving
many factors. Transcription lies at the basis of the gene expression mechanism.
In this process, the enzyme RNA polymerase II (Pol II) uses a DNA template to
transcribe an mRNA molecule that will be translated into protein. One of the
first and most critical control points in the expression of protein-coding
genes is the regulation of accurate transcription initiation, which takes place
at the promoter region. For Pol II to transcribe RNA molecules, it is recruited
to the core promoter by the basal transcription machinery, forming the
preinitiation complex (PIC) through protein-protein and protein-DNA
interactions.

The core promoter is the minimal DNA segment that allows accurate transcription
initiation. This segment contains the transcription start site (TSS; defined as
+1) and is typically about 80 bases long (from -40 to +40 relative to the TSS).
In addition, the core promoter contains short functional DNA sequences, termed
core promoter elements (or motifs), which serve as docking sites for PIC
components. One of these elements is the DPE. Among other findings, the DPE
motif was shown to play an important role in the expression of genes involved
in embryonic development, such as the Hox genes, which determine the identity
of the segments of the developing embryo, and the caudal gene, which regulates
them. The DPE was discovered in the fruit fly Drosophila melanogaster and was
found to be functional in two human genes as well. Although this element was
identified about 20 years ago, no additional human genes regulated by it have
been identified to date.
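The coordinate convention in this paragraph (the TSS is +1, and the core promoter spans roughly -40 to +40) can be illustrated with a minimal Python sketch. The function names `position_to_index` and `core_promoter_window` are hypothetical and are not part of the thesis. The sketch also assumes the common convention that there is no position 0, so that -1 lies immediately upstream of +1 and the window is exactly 80 bases long.

```python
def position_to_index(position: int, tss_index: int) -> int:
    """Map a TSS-relative position (+1 = TSS, no position 0) to a 0-based index.

    tss_index is the 0-based index of the +1 base in the sequence.
    Assumption (not stated in the text): -1 directly precedes +1.
    """
    if position == 0:
        raise ValueError("position 0 does not exist in this convention")
    if position > 0:
        return tss_index + position - 1
    return tss_index + position


def core_promoter_window(sequence: str, tss_index: int,
                         upstream: int = 40, downstream: int = 40) -> str:
    """Return the bases from -upstream to +downstream relative to the TSS."""
    start = position_to_index(-upstream, tss_index)
    end = position_to_index(downstream, tss_index) + 1  # slice end is exclusive
    if start < 0 or end > len(sequence):
        raise ValueError("sequence does not cover the requested window")
    return sequence[start:end]


# Illustrative synthetic sequence (not real data): the TSS base is the single G.
example = "A" * 40 + "G" + "C" * 39 + "T" * 10
window = core_promoter_window(example, tss_index=40)
```

With the default arguments the returned window is 80 bases long, which matches the approximate core promoter length given above.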

In this work, we attempted, using several experimental and computational
approaches, to characterize the DPE element in human genes in general and in
the human Hox genes in particular. Most of the research approaches were based
on the characteristics of the DPE as found in Drosophila, as well as on
evolutionary conservation between Drosophila and humans, especially among the
Hox genes, which are highly conserved in multicellular organisms. Using
promoter sequence comparisons and luciferase assays, we showed that a small
number of human Hox gene promoters contain a functional DPE that meets the
original requirements as characterized in Drosophila. In these assays, a
mutation in the sequence matching the DPE consensus reduces the level of
transcription from the tested promoter. Most of the above analyses, which rely
on the criteria of the Drosophila DPE, yielded no results. It is reasonable to
assume that, despite considerable evolutionary conservation, the corresponding
elements in humans and Drosophila differ to some extent, since the two
organisms are evolutionarily distant. To identify and characterize a human
element homologous to the DPE, independently of the fly DPE sequence, we
generated stable cell lines for performing advanced chromatin
immunoprecipitation experiments. The only assumption is that the human element
is bound by the TAF6 and TAF9 proteins, which are present in the PIC, an
assumption based on previous studies.

Furthermore, following sequencing of the promoter regions of several Hox and
Cdx genes from patients with different blood cancers, and examination of their
activity, we propose a mechanism for the overexpression of the Hoxb6 gene,
which characterizes several blood cancers. We propose that a single nucleotide
polymorphism (SNP) at the TSS of this gene alters its expression.

In summary, this work demonstrates the importance of understanding core
promoter composition in health and disease, with a focus on the DPE element.
In addition, the work shows that the DPE is functional in humans, but a
corresponding element, which may differ in its consensus sequence and position,
remains to be characterized.
