DNA-Binding Specificity of GATA Family Transcription Factors

MOLECULAR AND CELLULAR BIOLOGY, JULY 1993, p. 3999-4010, Vol. 13, No. 7
0270-7306/93/073999-12$02.00/0
Copyright © 1993, American Society for Microbiology

MENIE MERIKA AND STUART H. ORKIN*

Division of Hematology/Oncology, Children's Hospital, Dana-Farber Cancer Institute and Department of Pediatrics, Harvard Medical School; and Howard Hughes Medical Institute, Boston, Massachusetts 02115

* Corresponding author.

Received 4 February 1993/Returned for modification 11 March 1993/Accepted 8 April 1993

GATA-binding proteins constitute a family of transcription factors that recognize a target site conforming to the consensus WGATAR (W = A or T and R = A or G). Here we have used the method of polymerase chain reaction-mediated random site selection to assess in an unbiased manner the DNA-binding specificity of GATA proteins. Contrary to our expectations, we show that GATA proteins bind a variety of motifs that deviate from the previously assigned consensus. Many of the nonconsensus sequences bind protein with high affinity, equivalent to that of conventional GATA motifs. By using the selected sequences as probes in the electrophoretic mobility shift assay, we demonstrate overlapping, but distinct, sequence preferences for GATA family members, specified by their respective DNA-binding domains. Furthermore, we provide additional evidence for interaction of amino and carboxy fingers of GATA-1 in defining its binding site. By performing cotransfection experiments, we also show that transactivation parallels DNA binding. A chimeric protein containing the finger domain of areA and the activation domains of GATA-1 is capable of activating transcription in mammalian cells through GATA motifs. Our findings suggest a mechanism by which GATA proteins might selectively regulate gene expression in cells in which they are coexpressed.

The GATA family of proteins consists of a small, but enlarging, family of transcription factors with individual members represented in humans, mice, chickens, Xenopus laevis, Caenorhabditis elegans, Drosophila melanogaster (7a), and fungi. Among vertebrates, four GATA-binding proteins (designated GATA-1 to GATA-4), each with a different tissue and developmental profile, have been identified (see reference 25 for a review). GATA-1, the founding member of the family, was initially identified as an erythrocyte-specific DNA-binding activity with presumptive target sites in the promoters and/or enhancers of human and chicken globin genes (10, 21, 38). cDNAs were first characterized as encoding the mouse and chicken homologs (8, 37). As revealed through targeted gene disruption, expression of GATA-1 is essential for normal erythroid development (26). cDNAs for other vertebrate members of the GATA family have been isolated by virtue of their close sequence homology to the DNA-binding domain of GATA-1 (6, 13, 14, 18, 40, 45).

In vertebrates, this DNA-binding domain (20, 41) includes two similar zinc fingers of the general configuration Cys-X2-Cys-X17-Cys-X2-Cys. Nonvertebrate members include the two-fingered C. elegans protein elt-1 (31) and several single-finger fungal factors which also bind GATA motifs (4, 11, 12, 16, 23, 43). The fungal finger region is more closely related to the carboxy finger of the vertebrate members (16). Deletion and mutagenesis analyses have identified distinct functional roles for the two fingers of mouse GATA-1 (mGATA-1) (20) and chicken GATA-1 (cGATA-1) (41) proteins. The carboxy finger is required for binding, whereas the amino finger cooperates with it to provide full stability and specificity of binding.

Each GATA family member exhibits a distinctive, often overlapping, pattern of expression in tissues and cell lines. GATA-1 is found in cells of the erythroid lineage (8, 37), in two other hematopoietic lineages (megakaryocytic and bone marrow-derived mast cells [22, 29]), and in hematopoietic progenitor cells (3, 32). GATA-2 is also expressed in progenitor cells, mast cells, megakaryocytes (44), embryonic brain cells, primitive erythroblasts (40), endothelial cells, embryonic stem cells, and a variety of other cells and tissues (6, 18, 39, 40). In fact, GATA-2 has been implicated as a direct effector of selected endothelium-specific genes, such as preproendothelin-1 (6, 18, 39). GATA-3 protein is highly expressed in T-lymphoid cells and embryonic brain cells (13-15, 40), but elsewhere (endothelial cells and embryonic stem cells) it is expressed at a low level. A role of this protein in the regulation of T-cell receptor α- and β-chain genes has been suggested (13-15). Expression of GATA-4 is restricted to the heart, intestinal epithelium, primitive endoderm, and gonads (1).

A common property of all GATA proteins is their high-affinity binding to a sequence motif conforming to the consensus T/A (GATA) A/G. Motifs with assigned functional significance that conform to this consensus have been found in various regulatory regions (promoters of genes expressed in erythroid, megakaryocytic, mast, and endothelial cells; globin and T-cell receptor α- and β-chain gene enhancers; and α- and β-globin locus control regions) (for a review, see reference 25).

The abilities of various members of the GATA family to recognize closely related, but not identical, DNA sequence elements raise interesting possibilities as to how differential gene regulation is accomplished in cells expressing more than one GATA protein. That is, differential regulation might be achieved by selective high-affinity binding of one, but no other, GATA family member to a target sequence because of subtle variations in their DNA-binding domains. To define the spectrum of DNA-binding sites recognized by different GATA family members in an unbiased manner, we have employed the method of polymerase chain reaction (PCR)-



mediated random site selection to identify sites recognized by different GATA members.

Interestingly, we found that a substantial proportion of DNA motifs selected by all GATA-binding proteins deviate significantly from the previously described consensus. Furthermore, we have identified binding sites which are preferentially recognized by a specific GATA family member. We also show that mGATA-1 and human GATA-2 (hGATA-2) proteins appear to exhibit a broader sequence specificity compared with that of hGATA-3.

We provide direct evidence that the single zinc finger of the areA gene from the fungus Aspergillus nidulans (16) is capable of mediating specific interactions with consensus and nonconsensus GATA motifs. By performing cotransfection experiments, we observed that transactivation of reporter constructs by mGATA-1 and hGATA-3 proteins parallels their in vitro binding to the respective sites. Furthermore, a hybrid protein derived by substitution of the finger region of mGATA-1 protein by the corresponding region of the areA molecule is capable of activating transcription from reporter constructs, containing different GATA motifs, in mammalian cells.

MATERIALS AND METHODS

Expression and purification of bacterially expressed proteins. Proteins used were produced in Escherichia coli by the bacteriophage T7 RNA polymerase expression system (34). cDNAs for mGATA-1; CfmGATA-1 (residues 230 to 336); hGATA-3; the hybrid molecules mGATA-1(fhGATA-3), mGATA-1(Nf/fareA), and mGATA-1(fareA); as well as the genomic sequence of the areA gene (residues 467 to 587) were recovered by PCR with appropriate primers and cloned into the bacterial expression vector pET8C (35). In this manner, nonfusion proteins were expressed. The cDNA for hGATA-2 protein (residues 284 to 406) (6, 18) was cloned into pET15b (Novagen) under the T7 RNA polymerase promoter, but in fusion with 27 amino acids contributed by the vector, including the six-histidine moiety. Plasmids were introduced into the bacterial strain BL21(DE3), which expresses the T7 RNA polymerase under the control of the lacUV5 promoter.

mGATA-1 protein and the C finger of mGATA-1 were recovered from inclusion bodies as follows. Cultures were grown to an optical density at 600 nm of 0.5. Isopropyl-β-D-thiogalactopyranoside (0.5 mM) was then added, and cultures were incubated for 3 h. Cells were harvested and lysed by adding cold lysis buffer (50 mM Tris-Cl [pH 8.0], 200 mM NaCl, 2 mM EDTA [pH 8.0], 1 mM dithiothreitol [DTT], 1 mM phenylmethylsulfonyl fluoride [PMSF], 10 mM benzamidine). Lysozyme was added to a final concentration of 0.2 mg/ml, and the lysate was incubated on ice for 20 min. Triton X-100 was then added to a final concentration of 1% for 10 min on ice. Lysis was completed by sonication (six times for 15 s [each] with a microtip). Approximately 10 ml of the sonicated solution was layered onto 15 ml of sucrose solution (40% sucrose, 10 mM Tris-Cl [pH 8.0], 0.2 M NaCl, 1 mM EDTA [pH 8.0]) and centrifuged at 1,500 × g for 30 min at 4°C. The pellet (inclusion bodies) was resuspended in 0.5 ml of phosphate-buffered saline solution plus 2.5 ml of extraction buffer (8 M urea, 0.5 M NaCl, 0.5 M Tris-Cl [pH 8.0], 1 mM EDTA [pH 8.0], 1 mM DTT, 1 mM PMSF, 10 mM benzamidine) and then gently vortexed. The suspension was incubated on ice for 10 min. Denatured proteins were slowly allowed to renature by stepwise dialysis against buffers (50 mM Tris-Cl [pH 7.5], 100 mM NaCl, 10% glycerol, 1 mM PMSF, 1 mM benzamidine, 0.5 mM DTT, 10 μM ZnSO4) containing 6, 4, 2, and 0 M urea. Renatured protein was centrifuged at 7,500 × g for 10 min at 4°C and was aliquoted for storage at -80°C.

hGATA-3, areA, mGATA-1(fhGATA-3), mGATA-1(Nf/fareA), and mGATA-1(fareA) proteins do not form inclusion bodies in bacteria and were purified in a total bacterial lysate as follows: pelleted cells were resuspended in lysis buffer containing 25% sucrose, 0.2 mM EDTA, 40 mM Tris-Cl (pH 7.5), and 1 mM DTT. Lysozyme, PMSF, and urea were then added at final concentrations of 1 mg/ml, 1 mM, and 8 M, respectively. The lysate was incubated at 4°C for 1 h and then centrifuged at 7,500 × g for 1 h at 4°C to remove insoluble material. The supernatant was subjected to stepwise dialysis against buffers (10 mM Tris-Cl [pH 7.5], 25 mM NaCl, 1 mM EDTA, 0.1% Triton X-100, 1 mM DTT, 10% glycerol, 1 mM PMSF, 0.1 mM benzamidine, 10 μM ZnSO4) containing 6, 4, 2, and 0 M urea. Renatured protein was centrifuged at 7,500 × g for 10 min at 4°C and aliquoted for storage at -80°C.

hGATA-2 protein used in all experiments, as well as mGATA-1 and hGATA-3 proteins employed for affinity measurements, was purified to homogeneity as His fusion proteins. cDNAs were cloned into the vector pET15b (Novagen). Proteins were purified as follows. Induced bacterial pellets were resuspended in buffer A (10 mM Tris-Cl [pH 8.0], 0.1 M NaH2PO4, 8 M urea), pH 8.0. Lysis was completed by repeated sonication cycles. Insoluble material was removed by centrifugation at 7,500 × g for 30 min. The clear lysate was applied to a nickel chelate affinity resin (Qiagen), and proteins were eluted by a pH gradient of buffer A. Purified proteins were renatured slowly by dialysis against buffer B (500 mM NaCl, 20 mM HEPES [N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid] [pH 7.9]-KOH, 0.1% Nonidet P-40, 0.2 mM EDTA, 1 mM DTT, 0.5 mM PMSF) containing 6, 4, 2, 1, and 0 M urea. Although mGATA-1 and hGATA-2 are efficiently renatured by this procedure, hGATA-3 precipitated even in the presence of high amounts of carrier protein. Therefore, we renatured hGATA-3 by 100-fold dilution of the denatured protein in renaturing buffer containing 10% glycerol, 0.2 mM EDTA, 40 mM Tris-Cl (pH 7.5), 1 mM DTT, 50 mM NaCl, and 10 μM ZnSO4 supplemented with 0.1 mg of bovine serum albumin (BSA) per ml. Renaturation was allowed to proceed overnight at 4°C.

Random binding site selection. The oligonucleotide used contains 20 random nucleotides flanked by 14 bases of defined sequence on either side to permit PCR amplification and subsequent cloning of selected sequences. Double-stranded molecules were generated by annealing primer 1 (AAGCGGCCGCTCGAGGATCC) and extension by Taq DNA polymerase. The double-stranded pool was purified by gel electrophoresis and incubated with bacterially expressed GATA proteins. Protein-DNA complexes were separated on 4% (in the case of mGATA-1 and hGATA-3) or 8% (in the case of areA and hGATA-2) native polyacrylamide gels. Bound DNA was eluted in a solution of 0.5 M ammonium acetate, 0.1 mM EDTA, 10 mM magnesium acetate, and 0.1% sodium dodecyl sulfate. DNA was amplified by PCR with primers 1 and 2 (TGTAAGCTTCCCGGGAATTC). One-fifth of the PCR reaction mixture was used in a second cycle of binding and amplification. The product of the fourth cycle was digested with XhoI and HindIII and cloned into pSP73 (Promega). Individual clones were chosen and sequenced by standard methods.

DNA-binding reactions, band shift assays, and Kd measurements. Binding reactions were performed in a buffer containing 10 mM HEPES (pH 7.8), 50 mM potassium glutamate, 5 mM MgCl2, 1 mM EDTA, 1 mM DTT, and 5% glycerol. The bacterial lysate protein preparations were mixed in 20-μl reaction mixtures including 1 μg of poly(dI-dC)(dI-dC) (Pharmacia) and BSA at a final concentration of 0.1 mg/ml. Incubations were performed on ice for 20 min. Reactions were analyzed on 0.4× Tris-borate-EDTA native gels of acrylamide concentration between 4 and 8% (depending on the size of the proteins).

Binding reactions with the purified proteins used in the Kd determination experiments were performed and analyzed as mentioned above, with the exceptions that no nonspecific competitor was included and the BSA concentration was raised to 1 mg/ml.

The dissociation constant (Kd) determinations were performed under the binding and running conditions indicated above by using a constant amount of protein and serial dilutions of oligonucleotide probes. Quantitation of free and bound DNA was performed with a PhosphorImager (Molecular Dynamics).
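As an illustration of how a Kd can be extracted from paired free and bound probe measurements, the following minimal sketch applies a Scatchard-type linear fit (bound/free plotted against bound, slope = -1/Kd). It assumes simple single-site binding; the function name `scatchard_fit` and the synthetic data are hypothetical and are not taken from the experiments reported here.

```python
def scatchard_fit(bound, free):
    """Least-squares line through (bound, bound/free); returns (Kd, Bmax).

    For single-site binding, bound/free = (Bmax - bound) / Kd, so the
    slope of the line is -1/Kd and the intercept is Bmax/Kd.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    ratios = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    mean_x = sum(bound) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in bound)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


# Synthetic, noise-free data (assumed values, for illustration only).
TRUE_KD, TRUE_BMAX = 12.0, 2.0                      # nM
free_nM = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0]        # free probe
bound_nM = [TRUE_BMAX * f / (TRUE_KD + f) for f in free_nM]

kd_est, bmax_est = scatchard_fit(bound_nM, free_nM)
print(f"Kd = {kd_est:.2f} nM, Bmax = {bmax_est:.2f} nM")
```

With real gel quantitation the points scatter around the line, and a direct nonlinear fit of the binding isotherm is often preferred because the Scatchard transformation distorts the error structure.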

Plasmid constructions. Plasmids were constructed by standard methods. Eukaryotic expression constructs for mGATA-1, mGATA-1(fhGATA-3), and mGATA-1(fareA) contain the respective cDNAs cloned at the EcoRV site of pCDNA1 (Invitrogen). Reporter plasmids containing selected oligonucleotide sequences were generated by inserting one copy of the respective oligonucleotide at the HindIII site of TATA-GH (20). All plasmid constructions were verified by DNA sequencing.

The mGATA-1(fhGATA-3) chimeric cDNA contains mGATA-1 sequences corresponding to amino acid residues 1 to 201 and 306 to 413. The finger domains of mGATA-1 (residues 202 to 305) (37) were replaced by the corresponding region of hGATA-3 (residues 260 to 365) (13). Similarly, the mGATA-1(fareA) chimera contains the finger of areA (residues 501 to 567) (16) in place of the two fingers of mGATA-1. mGATA-1(Nf/fareA) retains the amino finger of mGATA-1. In the areA finger construct, the expressed region included amino acids 467 to 587 of areA (16). Chimeric constructs were generated by PCR amplification.

Cell culture, transient DNA transfections, and human growth hormone (hGH) and lacZ assays. Mouse NIH 3T3 cells were grown and transfected as previously described (20). In all experiments, the input reporter plasmid was 2 μg. Activator plasmid was varied from 0 to 12 μg in the activator titration experiments. Vector DNA was added as necessary to achieve a constant amount of DNA. As an internal control for transfection efficiency, pCMVβ-lacZ plasmid (19) was added in all experiments. Medium for growth hormone assay was removed 48 to 64 h after removal of calcium phosphate precipitates. β-Galactosidase activity was determined as previously described (30).

RESULTS

Selection of DNA-binding sites for GATA family members. WGATAR (W = A or T and R = A or G), a motif commonly found in promoters and enhancers of globin genes, has generally been taken as a consensus sequence for binding of GATA family members (10, 38). To define the specificity of these proteins in an unbiased manner, we used the method of PCR-mediated random oligonucleotide site selection (1a, 36) (see Materials and Methods). As proteins for this analysis, GATA-1, -2, and -3 were produced in bacteria by using the T7 polymerase expression system (34) (see Materials and Methods). As full-length GATA-2 was exceedingly toxic to E. coli, we expressed a portion of the protein (amino acids 284 to 406) containing the DNA-binding domain for these experiments.

A compilation of sequences selected by these proteins is presented in Fig. 1. The selected motifs are divided into four groups according to their similarity to the proposed consensus GATA recognition site. Groups 1 and 2 share a GATA core with consensus (group 1) or nonconsensus (group 2) flanking nucleotides. Those in groups 3 and 4 deviate from the consensus either within the core itself (group 3) or in both the core and flanking sequences (group 4). A consensus sequence deduced for the binding site of each protein is shown below in Fig. 1.

From the compilation of selected sequences, several conclusions can be drawn. First, a surprising fraction of the sequences (groups 3 and 4) for all proteins deviate from the prior proposed consensus site for GATA proteins. In all instances, at least 50% of the selected sequences are within groups 3 and 4. As shown below, these do not merely represent low-affinity binding sites. Second, members of the GATA family exhibit overlapping, but distinct, specificities, as will also be expanded upon below. Third, among the mammalian proteins, sequences selected for GATA-3 binding exhibit the least variation in target sites, whereas modest and considerably higher variation is evident for the binding of GATA-2 and GATA-1, respectively.
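The four-group scheme can be made concrete with a short sketch. It takes a site written in the flank/core/flank notation used in this paper (for example, A/GATA/G) and assigns a group by testing the core against GATA and the flanks against W and R. The function name `classify_site` is hypothetical, and treating a site as group 2 when either flank is nonconsensus is an assumption about how the grouping was applied.

```python
W = set("AT")  # W = A or T (5' flank of the WGATAR consensus)
R = set("AG")  # R = A or G (3' flank of the WGATAR consensus)


def classify_site(site: str) -> int:
    """Return group 1 to 4 for a site written as 'flank/core/flank'."""
    parts = site.upper().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected flank/core/flank, got {site!r}")
    flank5, core, flank3 = parts
    if len(flank5) != 1 or len(core) != 4 or len(flank3) != 1:
        raise ValueError(f"expected 1/4/1 nucleotides, got {site!r}")

    core_ok = core == "GATA"
    flanks_ok = flank5 in W and flank3 in R

    if core_ok and flanks_ok:
        return 1  # GATA core, consensus flanks
    if core_ok:
        return 2  # GATA core, nonconsensus flank(s)
    if flanks_ok:
        return 3  # deviation within the core only
    return 4      # deviation in both core and flanks


for motif in ("A/GATA/G", "C/GATA/T", "T/GATT/G", "C/GATG/C"):
    print(motif, "-> group", classify_site(motif))
```

A selected clone would first have to be aligned, on the appropriate strand, so that its best GATA-like hexamer is identified; that alignment step is not shown here.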

In vitro binding of GATA proteins to randomly selected clones. Though the relative frequencies of bases at specific positions in the deduced consensus sequences for GATA-1, -2, and -3 apparently vary, the overall consensus motifs are not sufficiently different to distinguish the GATA family members on this basis in a clearcut manner. In part, this is the case because the conjoined frequencies at specific positions may be more relevant to binding affinities than the overall consensus sequence. To demonstrate that the selected sequences are authentic protein binding sites and to ascertain whether individual family members exhibit distinctive binding properties, we used the electrophoretic mobility shift assay (EMSA) with probes derived from the selected sequences. The probes used (restriction fragments of respective clones) were randomly chosen from groups 1 to 4 of sequences selected with GATA-1 and GATA-3 proteins. The binding of GATA-1 to various sites is displayed in Fig. 2A. As anticipated, GATA-1 binds to group 1-selected clones (e.g., A/GATA/G [lanes 5 and 6]) with an apparent affinity similar to that evident with a native A/GATA/A site present in the erythropoietin receptor (EpoR) gene promoter (42). However, strong interactions are also observed with probes belonging to group 2 (C/GATA/T [lanes 9 and 10] and C/GATA/G [lanes 11 and 12]); hence, flanking nucleotides do not play a predominant role in protein binding. DNA sequences of groups 3 and 4 exhibited either strong or weak binding (T/GATT/G [lanes 1 and 2], C/GATG/C [lanes 3 and 4], T/GATG/G [lanes 7 and 8], and C/GATT/G [lanes 13 and 14]), depending on the sequences flanking the core motif. For example, the C/GATG/C motif represents a higher affinity site than T/GATG/G (compare lanes 3 and 4 with 7 and 8).

Assays of the same probes with GATA-3 reveal a distinct binding profile (Fig. 2B). Only two probes (T/GATT/G [lanes 1 and 2] and A/GATA/G [lanes 5 and 6]) show high-affinity binding. Other sequences bind poorly, if at all. Note that the T/GATT/G probe binds strongly but deviates within the GATA core from the original WGATAR consensus, as well as from the GATA-3 consensus derived from random site selection (Fig. 1). From the direct comparison of binding assays shown in Fig. 2A and B, we conclude that the precise binding specificities of GATA-1 and GATA-3 differ and that GATA-3 has a more restricted range of potential binding sites.

As shown in Fig. 3, the binding specificity of the GATA-2 finger domain approximates that of the GATA-1 protein more closely than that of GATA-3. Nonetheless, as revealed by assay with a probe containing the T/GATG/G motif, the binding specificities of GATA-1 and GATA-2 can be distinguished in at least one instance.

[Fig. 1 panels: mGATA-1, hGATA-2, hGATA-3; aligned selected sequences with per-position nucleotide counts and deduced consensus]

FIG. 1. Compilation of sequences selected by mGATA-1, hGATA-2, and hGATA-3 proteins. DNA was selected by using bacterially expressed GATA-binding proteins and a random oligonucleotide, as described in Materials and Methods. After four rounds of selection, the DNA was cloned and analyzed by DNA sequencing. The DNA sequences are aligned to conform to the previously described consensus for GATA-binding sites and grouped (groups 1 to 4) according to the core and immediate flanking sequences (in boldface) (see text). Only the random portion of the oligonucleotide sequence is shown. On the basis of the frequency of each nucleotide at each position, a consensus sequence can be deduced (bottom) for each protein's binding site. The compiled sequences are inclusive for each protein, i.e., all clones that were sequenced are shown.
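The Fig. 1 legend notes that a consensus is deduced from the frequency of each nucleotide at each position of the aligned sites. A minimal sketch of that kind of tabulation follows. The toy input sequences, the function names, and the 25% reporting threshold are all assumptions made for illustration; they are not the Fig. 1 data or the criterion used there.

```python
from collections import Counter


def position_counts(sites):
    """Count nucleotides at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must all have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]


def consensus(sites, min_fraction=0.25):
    """Report, per position, the bases seen in at least min_fraction of sites.

    Bases are listed most frequent first (ties broken alphabetically);
    a position where all four bases pass, or none do, is written as N.
    """
    n = len(sites)
    columns = []
    for counts in position_counts(sites):
        kept = [b for b in counts if counts[b] / n >= min_fraction]
        kept.sort(key=lambda b: (-counts[b], b))
        columns.append("/".join(kept) if 0 < len(kept) < 4 else "N")
    return " ".join(columns)


# Hypothetical aligned hexamers (flank + GATA-like core + flank).
toy_sites = ["AGATAA", "TGATAG", "AGATAA", "CGATAG"]
print(consensus(toy_sites))
```

Raising `min_fraction` gives a stricter consensus with fewer alternatives per position, at the cost of hiding minority bases that may still bind well.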

In total, 22 different probes were tested for GATA-1, -2,and -3 binding. Results are presented in Table 1, withrelative binding affinities scored as - to + + + + permittingclearcut discrimination between GATA-1 and GATA-3 bind-ing and more subtle distinctions between GATA-1 andGATA-2 binding.To substantiate the relative binding differences seen in the


FIG. 2. Comparative in vitro binding studies between mGATA-1 and hGATA-3 proteins by using different randomly selected clones as probes. EMSA was performed for the complexes formed between randomly selected sequences (restriction fragments of respective clones) with the indicated GATA motifs (shown on the top of the lanes) and either mGATA-1 (0.1 and 1 µl) (A) or hGATA-3 (1 and 5 µl) (B) bacterially expressed proteins. As a control probe, an oligonucleotide from the erythropoietin receptor promoter (EpoR) (42) including the A/GATA/A motif was used (lanes 15). Specific complexes are indicated by bracket (mGATA-1) or arrow (hGATA-3). F, free probe.

EMSA, we also determined dissociation constants (Kds) for the binding interactions of purified bacterially expressed proteins and oligonucleotides corresponding to various natural or randomly selected sequences (see Materials and Methods). A representative saturation experiment for GATA-1 binding and the derived Scatchard plot are shown in Fig. 4. Kd measurements are summarized in Table 2. These closely parallel the relative binding evident in the EMSA. High-affinity binding sites exhibit Kds of ~10 to 15 nM for GATA-1, -2, and -3 proteins. As expected from the EMSA results shown in Fig. 2, quantitative data cannot be obtained for the interaction of the His fusion purified GATA-3 protein and the G/GATA/G site, ruling out the possibility that the purity of the GATA-3 preparation influences its binding specificity. Also as expected, methylation interference experiments (data not shown) confirmed that mGATA-1 makes specific contacts within the nontypical T/GATT/G motif.

To establish whether the different DNA-binding preferences we identified for bacterially expressed GATA-1 and GATA-3 proteins are also shared by their native counterparts, we tested our selected sequences in EMSA with MEL cell nuclear extract as a source of endogenous mGATA-1 and with MOLT-4 cell nuclear extract and extract of COS cells transfected with an hGATA-3 expression vector as sources of hGATA-3. The binding specificities were identical to those observed with bacterially expressed proteins (data not shown). Hence, the different specificities of the GATA


FIG. 3. EMSA demonstrating differential binding specificity of different GATA-binding proteins for selective GATA sites. Radiolabelled probes containing the indicated GATA motifs (top of the lanes) were incubated with mGATA-1 (lanes 1 to 5), hGATA-2 (lanes 6 to 11), and hGATA-3 (lanes 12 to 17) bacterially expressed proteins. Complexes were analyzed on a native polyacrylamide gel. Specific complexes are indicated by bracket (mGATA-1), filled arrow (hGATA-3), and open arrow (hGATA-2). F, free probes.


TABLE 1. Discrimination in DNA-binding specificities between different GATA proteins^a

DNA-binding specificity
Probe  Site      mGATA-1  hGATA-3  mGATA-1(fareA)  mGATA-1(fhGATA-3)  CfmGATA-1  hGATA-2
1   T/GATT/G  +++  ++++  +++  ++++  ++  +++
2   C/GATG/C  +(+)  +/-  -  +  -  ++
3   A/GATA/G  +++  ++  +++  ND  +++  ++
4   T/GATG/G  +/-  -  -  -  -  +
5   C/GATA/T  ++  -  +(+)  -  ++  ++
6   C/GATA/G  +++  +++  ND  +++  +
7   C/GATT/G  ++(+)  -  +++  +/-  +
8   A/GTTA/G  +/-  -  -
9   A/GATT/G  +++  +/-  +++  ND  +++  +++
10  G/GATA/A  +++  -  ++  ND  +(+)  ND
11  A/GAAT/G  -
12  T/GATT/A  +++  +++  ++  ND  +++  +++
13  G/GTTA/G  +++  ++  ND  +  +(+)  +  +
14  G/GATT/T  +  -  +/-  ND  -  ND
15  C/GATC/G  -  -  -  -  -
16  G/GATA/G  +++  -  +++  -  +++
17  C/GATA/C  +++  +++  ND  +++  ND
18  C/GATT/C  ++  -  ++  ND  +/-  ND
19  G/GATG/G  +  -  +  ND  +/-  ND
20  C/GATT/A  +++  +++  ND  ND
21  C/GATA/G  +++  -  ND  ND  ND  +++
22  A/GATA/A  +++  +++  +++  +++  +++  +++

^a Results are summarized after EMSA was performed with restriction fragments of different randomly selected clones as probes and bacterially expressed GATA-binding proteins. Probe 22 is an oligonucleotide including the naturally existing GATA-binding site (A/GATA/A) from the erythropoietin receptor (EpoR) promoter (42). ND, binding specificity not determined; -, no detectable binding. Different degrees of binding are indicated as +/- to ++++.

members are intrinsic to the proteins rather than the consequence of modifications in mammalian cells.

Restricted binding specificity of the carboxy finger domain of GATA-1. Previous studies have suggested different roles for the two zinc finger domains of both mGATA-1 and cGATA-1 proteins. Specifically, the carboxy finger is sufficient for DNA binding, yet the amino finger contributes to the stability of the protein-DNA complex (20, 41). Using our collection of selected sequences, we have asked directly whether the binding specificity of the carboxy finger can be distinguished from that of GATA-1 bearing two fingers. For these experiments, the carboxy finger of mGATA-1 (amino acids 230 to 336) was cloned in a bacterial expression vector (see Materials and Methods). Bacterial lysates for this peptide, designated CfmGATA-1, were tested in parallel with intact GATA-1 by using the selected motifs shown in Table 1. Although the majority of selected motifs bind with comparable affinities to the two proteins, there are several motifs (C/GATG/C, G/GATT/T, T/GATG/G, A/GTTA/G, G/GATT/T, C/GATT/C, and G/GATG/G) which bind with various affinities to intact GATA-1 but fail to bind to the expressed carboxy finger peptide. Therefore, these results demonstrate that one role of the amino finger is to increase the spectrum of sequences recognized by the carboxy finger. Thus, the two finger domains interact in a complex manner in binding site recognition.

Restricted binding potential of GATA-3 is localized to the zinc finger region. A subset of selected sequences is bound by GATA-1 but not GATA-3. To determine whether this difference in specificity can be attributed to the zinc finger domains and/or other portions of the proteins, we constructed a chimeric molecule in which the two zinc fingers of GATA-1 were replaced with those of GATA-3. This chimera, designated mGATA-1(fhGATA-3), was assayed in parallel with intact GATA-1 and GATA-3 in EMSA. As shown in Fig. 5, we find that the zinc finger domain of GATA-3 is predominantly responsible for its restricted binding specificity. Very low binding of some probes (C/GATT/G and C/GATG/C) to the chimera implies a minor contribution of the body of GATA-1 in directing or stabilizing complex formation. Nonetheless, this finger replacement experiment demonstrates that the binding specificities of the GATA proteins are determined by their finger domains, a result consistent with independent experiments which define this region as necessary and sufficient for DNA recognition (20, 41).

The binding specificity of the areA finger closely resembles that of GATA-1. The carboxy finger domain of the vertebrate GATA proteins is more similar to the single finger of areA than is the amino domain (16). We have examined the binding properties of the areA finger by expressing the finger alone or as chimeras with the body of GATA-1 in which the areA finger replaced only the carboxy GATA-1 finger [mGATA-1(Nf/fareA)] or both fingers [mGATA-1(fareA)]. By the random site selection method, the sequences selected by the expressed areA finger resembled those recovered with the vertebrate proteins (Fig. 6A). Again, about 50% of the sequences were within groups 3 and 4. To search more directly for binding site distinctions, we assayed the chimeric GATA/areA proteins with individual selected sequences. The binding specificities of mGATA-1(Nf/fareA) and mGATA-1(fareA) were virtually identical (Fig. 6B). These proteins, as illustrated by mGATA-1(fareA), were most similar to intact GATA-1, rather than GATA-3, in their binding profiles (Table 1). Nonetheless, the binding of mGATA-1(fareA) and GATA-1 to some target sequences (e.g., T/GATG/G, C/GATG/C, and A/GTTA/G) could be distinguished. From these results we conclude that (i) the single finger of areA approximates the specificity of two-


FIG. 4. Measurement of the DNA-binding affinity of mGATA-1 protein for the GGATAG site. The dissociation constant (Kd) for this interaction was estimated as described in Materials and Methods by quantitating EMSA. (Top) The amount of bound probe is plotted as a function of the total input. (Bottom) Scatchard plot for the saturation curve shown in the top panel. Kd = -1/slope.
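The relation Kd = -1/slope in the Scatchard plot can be illustrated with a short Python sketch. It is not the authors' analysis: it assumes a simple one-site binding model, takes free probe as total input minus bound probe, and fits bound/free against bound by ordinary least squares. The function name and the synthetic data (generated with an assumed Kd of 50 nM and an assumed maximum of 1.5 nM bound) are hypothetical.

```python
def scatchard_kd(total_input_nM, bound_nM):
    """Estimate (Kd, Bmax) from a saturation experiment by a Scatchard fit.

    For one-site binding, bound/free = (Bmax - bound) / Kd, so a plot of
    bound/free against bound has slope -1/Kd and x-intercept Bmax.
    """
    xs, ys = [], []
    for total, bound in zip(total_input_nM, bound_nM):
        free = total - bound
        if bound > 0 and free > 0:  # skip points that cannot be plotted
            xs.append(bound)
            ys.append(bound / free)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable data points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, -intercept / slope

# Synthetic, noise-free data from an assumed Kd of 50 nM and Bmax of 1.5 nM.
free_nM = [5.0, 10.0, 25.0, 50.0, 100.0]
bound_nM = [1.5 * f / (50.0 + f) for f in free_nM]
total_nM = [f + b for f, b in zip(free_nM, bound_nM)]
kd, bmax = scatchard_kd(total_nM, bound_nM)
print(round(kd, 3), round(bmax, 3))
```

With real gel quantitation the points scatter around the line, and the linear Scatchard transform weights errors unevenly, so a direct nonlinear fit of the saturation curve is a common alternative.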

fingered GATA-1 and (ii) it more closely resembles that of GATA-1 than GATA-3.

Transcriptional activation through nonconsensus GATA sites parallels in vitro binding. GATA proteins activate transcription upon DNA binding (6, 9, 13-15, 18, 20). GATA-1 is a particularly strong activator in cotransfection assays. Indirect data suggest that, at least with a subset of binding sites containing overlapping or double GATA motifs, transactivation by GATA-1 may depend on the precise sequence to which it is bound (20, 41). In addition, it has been proposed that the amino finger of cGATA-1 may itself

TABLE 2. Differential affinity of GATA-binding proteins for different naturally existing or randomly selected GATA sites

                 Kd (nM)^a
Site             GATA-1  GATA-3  GATA-2  CfmGATA-1
AGATAA (EpoR)    11      25      15      30
TGATTG           29      7       ND      ND
GGATAG           52      -       ND      ND
TGATTA           33      8       ND      ND

^a The dissociation constants (Kds) were determined as described in Materials and Methods by quantitating EMSA with a PhosphorImager. ND, not determined; -, no detectable binding.

contain an activation domain (41). We, therefore, sought to determine whether transcriptional activation by GATA-1 and two chimeras, mGATA-1(fhGATA-3) and mGATA-1(fareA), paralleled their DNA-binding specificities or other properties of the respective molecules. The use of such chimeric molecules controls for differences in activation properties of GATA family members that would obscure direct comparisons. For example, GATA-3 is an appreciably weaker transactivator than GATA-1 in cotransfection assays (6, 14). cDNAs for the respective proteins were expressed transiently in NIH 3T3 cells with the mammalian expression vector pcDNAI (Invitrogen). As reporters, we used plasmids bearing the hGH gene driven by a minimal promoter into which single selected GATA binding sites were cloned (see Materials and Methods).

As shown in Fig. 7, by using saturating levels of effector plasmids, we observed a strict correlation between in vitro DNA binding and transactivation in the cotransfection assay. For example, GATA-1, mGATA-1(fhGATA-3), and mGATA-1(fareA) all bind T/GATT/G and T/GATT/A motifs and activate reporter plasmids bearing these elements. Transactivation by the chimera with the finger domain of GATA-3 is reduced in reporters bearing C/GATT/A, T/GATG/G, C/GATA/T, and G/GATA/G motifs. In agreement with our in vitro binding data, only GATA-1 itself binds and activates transcription through the T/GATG/G motif; however, it does so at a low level compared with other reporters. To confirm these observations, we have also examined transcriptional activation by GATA-1 and mGATA-1(fhGATA-3) on three different targets with a constant input of reporter and variable amounts (0 to 12 µg) of effector plasmid (Fig. 8). Again, transactivation directly parallels DNA binding by these activators. Differential protein accumulation can be excluded as an explanation for the differences between transactivation by mGATA-1 and mGATA-1(fhGATA-3) proteins through the G/GATA/G and C/GATT/A sites, since activation by the two effector molecules is equivalent when a high-affinity binding site for both proteins is included in the reporter construct (Fig. 8, upper panel, Mla-GH reporter). Thus, GATA-1 is capable of activating transcription through nonconsensus binding sites. In these experiments, we have not observed activation differences referable to properties of the finger domain unrelated to DNA binding per se.

DISCUSSION

GATA-binding proteins constitute a family of transcription factors that recognize a discrete target site, hitherto conforming to the consensus WGATAR (10, 38). Members of this family have been found in fungi, C. elegans, Drosophila melanogaster (7a), birds, amphibians, and mammals (for a review, see reference 25). DNA recognition is achieved through novel zinc fingers that are present either once (as in areA) (16) or twice (as in C. elegans and vertebrates) in the protein (20, 31, 41). In mammals, the four known members (GATA-1, -2, -3, and -4) are expressed in distinct, yet often overlapping, cell types and are regulated developmentally (for a review, see reference 25). Though the roles of these proteins in various cellular programs are still under active study, a requirement for GATA-1 in erythroid development has been established (26).

Flexibility of binding sites for GATA family members revealed by PCR random site selection. In this study, we have addressed the DNA-binding specificities of different GATA family proteins. In particular, we have used the technique of


FIG. 5. The restricted binding potential of hGATA-3 protein is localized to the zinc finger region. EMSA was performed for the binding interactions between probes containing the indicated GATA motifs (indicated at the tops of the lanes) and the chimeric mGATA-1(fhGATA-3) (lanes 1 to 7), mGATA-1 (lanes 8 to 14), and hGATA-3 (lanes 15 to 21) bacterially expressed proteins. Interactions of the respective proteins with the EpoR oligonucleotide containing the A/GATA/A motif (42) are shown in lanes 1, 8, and 15. The bracket indicates mGATA-1(fhGATA-3)- and mGATA-1-specific complexes, whereas the arrow indicates hGATA-3 protein complexed with DNA. F, free probe.

PCR-mediated random binding site selection to isolate in an unbiased manner sequences bound by recombinant GATA-1, -2, and -3, as well as the fungal protein AreA. In contrast to expectations based on previous studies, we have detected distinct preferences in the binding of these proteins to specific target sequences. Several interesting conclusions can be derived from these results.

A principal finding of this study is the appreciably greater flexibility of the recognition sites for these proteins. Although sequences consistent with the prior consensus site are represented in the selected binding sites, many others that deviate in the core and/or flanking sequences were identified. By EMSA and measurement of dissociation constants (Kds), we have shown that many of the nonconsensus sequences bind protein with a high affinity equivalent to that of conventional GATA motifs. Hence, despite the extent to which various experimental parameters, particularly the preparation of recombinant protein, the ratio of protein to


FIG. 6. Protein-DNA interactions of the areA Zn finger domain. (A) Compilation of the sequences selected by the bacterially expressed areA finger region (16). Sequences are aligned and grouped as defined in the legend to Fig. 1. The deduced consensus sequence is shown at the bottom of the panel. (B) Analysis of the DNA-binding interactions of mGATA-1/areA chimeric proteins. EMSA was performed with restriction fragments of different randomly selected clones as probes and bacterially expressed mGATA-1(Nf/fareA) (lanes 4, 6, 8, 10, 12, and 14) and mGATA-1(fareA) (lanes 3, 5, 7, 9, 11, and 13) proteins. The interactions of the two chimeric proteins with the control EpoR oligonucleotide are shown in lanes 1 and 2. The different GATA motifs in each probe are indicated on the top of the lanes. Bracket indicates specific protein-DNA complexes. F, free probe.


FIG. 7. Transactivation through naturally existing and randomly selected GATA motifs by different GATA-binding proteins. Cotransfection experiments in NIH 3T3 cells were performed with reporter plasmids (2 µg) containing a single copy of an oligonucleotide including a given GATA motif (site) cloned upstream of a TATA-hGH cassette (see Materials and Methods) and 12 µg of pcDNAI-derived expression vectors encoding mGATA-1, mGATA-1(fhGATA-3), or mGATA-1(fareA) proteins. A total of 1 µg of CMVβlacZ plasmid was included in each case to correct for transfection efficiency. Mla: control reporter construct derived from cloning of an oligonucleotide from the mouse α1-globin promoter (including a T/GATA/A site) upstream of a TATA-hGH cassette (20). The fold of activation is given relative to that of a control transfection performed with expression vector alone. In vitro binding data for the respective protein-DNA interactions are summarized at the left. Different degrees of binding are scored as +/- to +++. -, no detectable binding.
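The normalization described in this legend (correction for transfection efficiency with a cotransfected lacZ plasmid, then expression relative to a control transfection with expression vector alone) can be written out as a small Python sketch. The function name, argument names, and numbers below are hypothetical; the exact calculation used by the authors is given in their Materials and Methods and is not reproduced here.

```python
def fold_activation(hgh_signal, lacz_signal, control_hgh, control_lacz):
    """Fold activation of an hGH reporter, corrected for transfection efficiency.

    Each hGH reading is divided by the lacZ reading from the same transfection,
    and the corrected value is expressed relative to the corrected value of the
    control transfection (expression vector alone).
    """
    if lacz_signal <= 0 or control_lacz <= 0 or control_hgh <= 0:
        raise ValueError("signals used as denominators must be positive")
    corrected = hgh_signal / lacz_signal
    baseline = control_hgh / control_lacz
    return corrected / baseline

# Hypothetical readings in arbitrary units, for illustration only.
print(fold_activation(hgh_signal=300.0, lacz_signal=2.0,
                      control_hgh=10.0, control_lacz=1.0))
```

Dividing by the lacZ signal before taking the ratio means that a dish that simply took up more DNA does not appear as stronger activation.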

input oligonucleotides for the binding assay, or the number of cycles of selection employed, might influence the spectrum of binding sites obtained, the variant sites we have identified through this procedure are bona fide, high-affinity in vitro targets. In fact, chi-square analysis of individual motifs suggests that the selected sequences were obtained far in excess (P < 0.002 in most instances) of what would be anticipated at random. Thus, the GATA proteins appear to tolerate a greater degree of flexibility in their DNA targets than we had previously appreciated.
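A chi-square comparison of this kind can be sketched as follows. The details of the authors' test are not given in this section, so everything here is an assumption: a two-category goodness-of-fit test (clones that contain a motif versus clones that do not, 1 degree of freedom), equal base frequencies in the random region, and a rough independence approximation for the chance occurrence of a 6-bp motif. The clone counts and window length are hypothetical.

```python
import math

def chi_square_1df(observed, expected):
    """Chi-square statistic and P value for a two-category comparison (1 df).
    For 1 df the upper-tail probability equals erfc(sqrt(stat / 2))."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2.0))

def p_kmer_in_window(k, window_len, both_strands=True):
    """Approximate chance that one specific k-mer occurs at least once in a
    random window, assuming equal base frequencies and independent positions."""
    per_position = 0.25 ** k * (2 if both_strands else 1)
    positions = window_len - k + 1
    return 1.0 - (1.0 - per_position) ** positions

def motif_enrichment(clones_with_motif, total_clones, p_random):
    """Compare the observed number of clones carrying a motif with the number
    expected if the motif arose only by chance."""
    if not 0.0 < p_random < 1.0:
        raise ValueError("p_random must lie strictly between 0 and 1")
    expected_with = total_clones * p_random
    observed = [clones_with_motif, total_clones - clones_with_motif]
    expected = [expected_with, total_clones - expected_with]
    return chi_square_1df(observed, expected)

# Hypothetical example: 8 of 40 clones carry a given 6-bp motif within a
# 20-bp random region.
stat, p_value = motif_enrichment(8, 40, p_kmer_in_window(6, 20))
print(stat, p_value)
```

When the expected count is as small as it is in this toy example, the chi-square approximation is crude, and an exact binomial test would be the more careful choice.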

Interestingly, our selection scheme identified a few sites selected by a given GATA protein (notably, GATA-3) which bind poorly to the respective protein in EMSA. In contrast, the same sites bind more strongly to other GATA family members. As an example, the A/GTTA/G motif selected by hGATA-3 does not exhibit detectable binding with this protein in EMSA. However, upon incubation with GATA-1, binding is detected. Furthermore, when the C/GATA/T motif selected by hGATA-3 was tested for hGATA-3 or mGATA-1(fhGATA-3) protein binding, it also appeared to represent a low-affinity binding site. Nonetheless, efficient binding was obtained with all the other proteins tested (Table 1). These observations suggest that DNA motifs of this type do not represent irrelevant background of the random PCR site selection, but rather low-affinity sites revealed by repeated amplification steps. Whether such low-affinity sites would be functionally significant in vivo is speculative.

Differences in DNA-binding specificity reside in the finger domain. A novel finding to emerge from these studies is the existence of subtle differences in binding site specificities among the family members. In this regard, the more restricted specificity of GATA-3 compared with that of GATA-1 is most illustrative. Somewhat finer distinctions in binding specificities were apparent among GATA-1, GATA-2, and areA proteins. Different degrees of purity of the various GATA protein preparations cannot account for these observations, as individual motifs (such as G/GATA/G) permit a clearcut discrimination between mGATA-1 and hGATA-3 proteins from crude bacterial extracts, as well as from purified material (Table 2). By replacing the GATA-1 fingers with those of GATA-3, we have demonstrated that their differential specificities reflect properties of the respective finger domains rather than other portions of the proteins. Accordingly, the specificity we have observed for the hGATA-2 finger region should closely approximate that of the native protein. Since limited amino acid substitutions in the finger domains characterize the vertebrate GATA-1, -2, and -3 proteins (45), we infer that specific amino acid differences in the otherwise highly conserved fingers direct subtle variation in target site recognition for these proteins.

Our data from using selected binding sites also provide additional support for the interaction of the amino and carboxy finger domains of GATA-1 in refining the binding site (20, 41). Specifically, we have demonstrated that the binding specificity of the carboxy finger alone is subtly different from that of the two-fingered protein.

A more refined view of how primary sequence differences exert their effects on binding specificity will require knowledge of the three-dimensional structure of the GATA finger domains.

Single-finger GATA proteins bind consensus and nonconsensus GATA motifs. The ability of single-finger proteins to interact with a variety of GATA motifs raises an interesting question: do two-finger proteins have advantages over single-finger proteins in DNA recognition or transcriptional activation? Our analysis suggests that the finger of areA or the carboxy finger of GATA-1 has slightly more restricted sequence specificity than some of the two-finger proteins examined (Table 1). One may speculate, therefore, that the addition of a second finger might permit a wider range of DNA-protein interactions, as might be required for developmental and tissue-specific regulation in higher eukaryotic organisms.

Potential in vivo role(s) of nonconsensus GATA sites. Our observation that many variant sequences bind to GATA proteins with high affinity is intriguing on several grounds. Among target cis-regulatory elements through which these


FIG. 8. Transactivation through GATA sites by increasing amounts of different GATA-binding proteins parallels their in vitro binding affinities. NIH 3T3 cells were cotransfected with a constant amount (2 µg) of the reporter plasmid indicated on the top of each panel and increasing amounts (0 to 12 µg) of expression plasmids encoding mGATA-1 or mGATA-1(fhGATA-3) proteins. A total of 1 µg of pCMVβlacZ plasmid was included in all cases to correct for transfection efficiency. The fold of activation is given relative to that of a control transfection performed with expression vector alone. The in vitro binding data for the respective sites included in the reporters examined here are shown in Fig. 7.

proteins are thought to act, nearly all exhibit sites that conform to the prior consensus motif. In part, one might argue that the identification of potential binding motifs in cis-regulatory elements has been biased and that nonconsensus sites have been overlooked in many studies. Several examples of naturally occurring nonconsensus GATA sites have been reported, as in the 3' human β-globin gene enhancer (38) and hypersensitive site 2 of the β-globin locus control region (27). In these instances, motifs deviating from the WGATAR consensus were reported to bind GATA-1 protein from crude nuclear extracts. With the method of random site selection, we have shown that these also represent authentic binding sites for recombinant GATA proteins. Moreover, we have confirmed that bacterially expressed GATA-1 binds the promoter of the human β-globin gene at an atypical A/GATT/G motif which overlaps a CCAAT box (5). As shown in Fig. 1, this motif emerged several times from the pool of our selected sequences. Finally, a naturally occurring nonconsensus site (T/GATT/A) found in the human α-globin locus control region exhibits a strong in vivo dimethyl sulfate footprint and flanks a region of strong hypersensitivity in erythroid chromatin (33). This site has been selected and also shown to be a high-affinity binding site for recombinant GATA proteins. As a consequence of our experiments, it may be necessary to consider more broadly where GATA proteins might act within defined promoters or enhancers. We can only speculate at this time as to whether the use of such nonconsensus binding sites is limited to a subset of cis elements, perhaps those with special functional activity.

The finding that members of the GATA family exhibit overlapping, but distinct, sequence specificities is not a unique property of this class of transcription factors. Members of the ets, rel, homeodomain, and SRF families also display common and distinct sequence requirements (7, 17, 24, 28). A question for further study is the extent to which the differences in binding specificities demonstrated here for the mammalian GATA proteins influence the spectrum of potential target genes in vivo. In cells in which members of the family are coexpressed, do binding differences contribute to the selection of genes on which each acts? In effect, selection of targets by this mechanism would reduce apparent redundancy of DNA-binding factors. Alternatively, if the majority of critical binding sites conform to sequences recognized by several family members, selective binding would play little role in choosing which protein participates in transcriptional regulation of potential targets. In vivo, many parameters are important in defining the functional role (if any) of a given transcription factor binding site, particularly since such short recognition sequences (generally on the order of 6 to 8 bp) are represented far too frequently in genomic DNA. Hence, the context in which a potential binding site is embedded, local chromatin structure, and possible synergistic interactions with other transcription factors recognizing neighboring sites may be the primary determinants of transcriptional outcome. Specific in vivo experiments will be required to address these important biological issues.
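Two points made above, that a search restricted to the WGATAR consensus overlooks variant sites and that short recognition sequences occur frequently by chance, can be made concrete with a brief Python sketch. The function names and test sequences are hypothetical, and the expected-count calculation assumes equal base frequencies, which real genomic DNA does not strictly have.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Lookahead so that overlapping matches are also reported.
WGATAR = re.compile(r"(?=([AT]GATA[AG]))")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_wgatar(seq):
    """Return (start, strand, site) for every WGATAR match on either strand.
    `start` is the 0-based position of the 6-bp window on the given sequence."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in WGATAR.finditer(seq)]
    for m in WGATAR.finditer(reverse_complement(seq)):
        hits.append((len(seq) - m.start() - 6, "-", m.group(1)))
    return sorted(hits)

def expected_wgatar_hits(seq_len, both_strands=True):
    """Expected number of chance WGATAR matches in random DNA with equal base
    frequencies: W and R each allow 2 of 4 bases, G/A/T/A allow 1 of 4."""
    per_position = (2 / 4) * (1 / 4) ** 4 * (2 / 4)  # = 1/1024
    positions = seq_len - 6 + 1
    return positions * per_position * (2 if both_strands else 1)

print(find_wgatar("CCAGATAACC"))   # consensus site is reported
print(find_wgatar("CCTGATTGCC"))   # variant T/GATT/G site is not
print(expected_wgatar_hits(1029))
```

Under these assumptions a consensus match is expected by chance roughly once per 500 bp when both strands are counted, which illustrates why sequence context and chromatin structure must also be considered.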

ACKNOWLEDGMENTS

We thank David Wilson and Nancy Andrews for nuclear extracts, Herb Arst for the areA clone, and Ellis Neufeld for assistance in statistical analysis of the selected sequences.

This work was supported in part by a grant from the NIH to S.H.O., who is an Investigator of the Howard Hughes Medical Institute.

REFERENCES

1. Arceci, R. J., A. A. J. King, M. C. Simon, S. H. Orkin, and D. B. Wilson. 1993. Mouse GATA-4: a retinoic acid-inducible GATA-binding transcription factor expressed in endodermally derived tissues and heart. Mol. Cell. Biol. 13:2235-2246.


1a. Blackwell, T. K., and H. Weintraub. 1990. Differences and similarities in DNA-binding preferences of MyoD and E2A protein complexes revealed by binding site selection. Science 250:1105-1110.

2. Chen, C. A., and H. Okayama. 1988. Calcium phosphate-mediated gene transfer: a highly efficient transfection system for stably transforming cells with plasmid DNA. BioTechniques 6:632-638.

3. Crotta, S., S. Nicolis, A. Ronchi, S. Ottolenghi, L. Ruzzi, Y. Shimada, A. R. Migliaccio, and G. Migliaccio. 1990. Progressive inactivation of the expression of an erythroid transcriptional factor in GM- and G-CSF-dependent myeloid cell lines. Nucleic Acids Res. 18:6863-6869.

4. Cunningham, T. S., and T. G. Cooper. 1991. Expression of the DAL80 gene, whose product is homologous to the GATA factors and is a negative regulator of multiple nitrogen catabolic genes in Saccharomyces cerevisiae, is sensitive to nitrogen catabolite repression. Mol. Cell. Biol. 11:6205-6215.

5. deBoer, E., M. Antoniou, V. Mignotte, L. Wall, and F. Grosveld. 1988. The human β-globin promoter: nuclear protein factors and erythroid specific induction of transcription. EMBO J. 7:4203-4212.

6. Dorfman, D. M., D. B. Wilson, G. A. P. Bruns, and S. H. Orkin. 1992. Human transcription factor GATA-2. J. Biol. Chem. 267:1279-1285.

7. Ekker, S. C., K. E. Young, D. P. von Kessler, and P. A. Beachy. 1991. Optimal DNA sequence recognition by the Ultrabithorax homeodomain of Drosophila. EMBO J. 10:1179-1186.

7a. Engel, J. D., and S.-F. Tsai. Personal communication.

8. Evans, T., and G. Felsenfeld. 1989. The erythroid-specific transcription factor Eryf1: a new finger protein. Cell 58:877-885.

9. Evans, T., and G. Felsenfeld. 1991. trans-activation of a globin promoter in nonerythroid cells. Mol. Cell. Biol. 11:843-853.

10. Evans, T., M. Reitman, and G. Felsenfeld. 1988. An erythrocyte-specific DNA-binding factor recognizes a regulatory sequence common to all chicken globin genes. Proc. Natl. Acad. Sci. USA 85:5976-5980.

11. Fu, Y.-H., and G. A. Marzluf. 1990. nit-2, the major nitrogen regulatory gene of Neurospora crassa, encodes a protein with a putative zinc finger DNA-binding domain. Mol. Cell. Biol. 10:1056-1065.

12. Fu, Y.-H., and G. A. Marzluf. 1990. nit-2, the major positive-acting nitrogen regulatory gene of Neurospora crassa, encodes a sequence-specific DNA-binding protein. Proc. Natl. Acad. Sci. USA 87:5331-5335.

13. Ho, I.-C., P. Vorhees, N. Marin, B. K. Oakley, S.-F. Tsai, S. H. Orkin, and J. M. Leiden. 1991. Human GATA-3: a lineage-restricted transcription factor that regulates the expression of the T cell receptor α gene. EMBO J. 10:1187-1192.

14. Joulin, V., D. Bories, J.-F. Eleouet, M.-C. Labastie, S. Chretien, M.-G. Mattei, and P.-H. Romeo. 1991. A T-cell specific TCR δ binding protein is a member of the human GATA family. EMBO J. 10:1809-1816.

15. Ko, L. J., M. Yamamoto, M. W. Leonard, K. M. George, P. Ting, and J. D. Engel. 1991. Murine and human T-lymphocyte GATA-3 factors mediate transcription through a cis-regulatory element within the human T-cell receptor δ gene enhancer. Mol. Cell. Biol. 11:2778-2784.

16. Kudla, B., M. X. Caddick, T. Langdon, N. M. Martinez-Rossi, C. F. Bennett, S. Sibley, R. W. Davies, and H. N. Arst, Jr. 1990. The regulatory gene areA mediating nitrogen metabolite repression in Aspergillus nidulans. Mutations affecting specificity of gene activation alter a loop residue of a putative zinc finger. EMBO J. 9:1355-1364.

17. Kunsch, C., S. M. Ruben, and C. A. Rosen. 1992. Selection of optimal κB/Rel DNA-binding motifs: interaction of both subunits of NF-κB with DNA is required for transcriptional activation. Mol. Cell. Biol. 12:4412-4421.

18. Lee, M.-E., D. H. Temizer, J. A. Clifford, and T. Quertermous. 1991. Cloning of the GATA-binding protein that regulates endothelin-1 gene expression in endothelial cells. J. Biol. Chem. 266:16188-16192.

19. MacGregor, G. R., and C. T. Caskey. 1989. Construction of plasmids that express E. coli beta galactosidase in mammalian cells. Nucleic Acids Res. 17:2365.

20. Martin, D. I. K., and S. H. Orkin. 1990. Transcriptional activation and DNA binding by the erythroid factor GF-1/NF-E1/Eryf1. Genes Dev. 4:1886-1898.

21. Martin, D. I. K., S.-F. Tsai, and S. H. Orkin. 1989. Increased γ-globin expression in a nondeletion HPFH mediated by an erythroid-specific DNA-binding factor. Nature (London) 338:435-438.

22. Martin, D. I. K., L. I. Zon, G. Mutter, and S. H. Orkin. 1990. Expression of an erythroid transcription factor in megakaryocytic and mast cell lineages. Nature (London) 344:444-447.

23. Minehart, P. L., and B. Magasanik. 1991. Sequence and expression of GLN3, a positive nitrogen regulatory gene of Saccharomyces cerevisiae encoding a protein with a putative zinc finger DNA-binding domain. Mol. Cell. Biol. 11:6216-6228.

24. Nye, J. A., J. M. Petersen, C. V. Gunther, M. D. Jonsen, and B. J. Graves. 1992. Interaction of murine Ets-1 with GGA-binding sites establishes the ETS domain as a new DNA-binding motif. Genes Dev. 6:975-990.

25. Orkin, S. H. 1992. GATA-binding transcription factors in hematopoietic cells. Blood 80:575-581.

26. Pevny, L., M. C. Simon, E. Robertson, W. H. Klein, S.-F. Tsai, V. D'Agati, S. H. Orkin, and F. Costantini. 1991. Erythroid differentiation in chimaeric mice blocked by a targeted mutation in the gene for transcription factor GATA-1. Nature (London) 349:257-260.

27. Philipsen, S., D. Talbot, P. Fraser, and F. Grosveld. 1990. The β-globin dominant control region: hypersensitive site 2. EMBO J. 9:2159-2167.

28. Pollock, R., and R. Treisman. 1991. Human SRF-related proteins: DNA-binding properties and potential regulatory targets. Genes Dev. 5:2327-2341.

29. Romeo, P.-H., M.-H. Prandini, V. Joulin, V. Mignotte, M. Prenant, W. Vainchenker, G. Marguerie, and G. Uzan. 1990. Megakaryocytic and erythrocytic lineages share specific transcription factors. Nature (London) 344:447-449.

30. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

31. Spieth, J., Y.-H. Shim, K. Lea, R. Conrad, and T. Blumenthal. 1991. elt-1, an embryonically expressed Caenorhabditis elegans gene homologous to the GATA transcription factor family. Mol. Cell. Biol. 11:4651-4659.

32. Sposi, N. M., L. I. Zon, A. Care, M. Valtieri, U. Testa, M. Gabbianelli, G. Manani, L. Bottero, C. Mather, S. H. Orkin, and C. Peschle. 1992. Cell cycle-dependent initiation and lineage-dependent abrogation of GATA-1 expression in pure differentiating hematopoietic progenitors. Proc. Natl. Acad. Sci. USA 89:6353-6357.

33. Strauss, E. C., N. C. Andrews, D. R. Higgs, and S. H. Orkin. 1992. In vivo footprinting of the human α-globin locus upstream regulatory element by guanine and adenine ligation-mediated polymerase reaction. Mol. Cell. Biol. 12:2135-2142.

34. Studier, F. W., and B. Moffat. 1986. Use of bacteriophage T7 polymerase to direct selective high level expression of cloned genes. J. Mol. Biol. 189:113-130.

35. Studier, F. W., A. H. Rosenberg, J. J. Dunn, and J. W. Dubendorff. 1990. Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 185:60-89.

36. Sun, X.-H., and D. Baltimore. 1991. An inhibitory domain of E12 transcription factor prevents DNA binding in E12 homodimers but not in E12 heterodimers. Cell 64:459-470.

37. Tsai, S.-F., D. I. K. Martin, L. I. Zon, A. D. D'Andrea, G. G. Wong, and S. H. Orkin. 1989. Cloning of cDNA for the major DNA-binding protein of the erythroid lineage through expression in mammalian cells. Nature (London) 339:446-451.

38. Wall, L., E. deBoer, and F. Grosveld. 1988. The human β-globin gene 3' enhancer contains multiple binding sites for an erythroid-specific protein. Genes Dev. 2:1089-1100.

39. Wilson, D. B., D. M. Dorfman, and S. H. Orkin. 1990. A nonerythroid GATA-binding protein is required for function of the human preproendothelin-1 promoter in endothelial cells. Mol. Cell. Biol. 10:4854-4862.

40. Yamamoto, M., L. J. Ko, M. W. Leonard, H. Beug, S. H. Orkin, and J. D. Engel. 1990. Activity and tissue-specific expression of the transcription factor NF-E1 multigene family. Genes Dev. 4:1650-1662.

41. Yang, H.-Y., and T. Evans. 1992. Distinct roles for the two cGATA-1 finger domains. Mol. Cell. Biol. 12:4562-4570.

42. Youssoufian, H., L. I. Zon, S. H. Orkin, A. D. D'Andrea, and H. F. Lodish. 1990. Structure and transcription of the mouse erythropoietin receptor gene. Mol. Cell. Biol. 10:3675-3682.

43. Yuan, G.-F., Y.-H. Fu, and G. A. Marzluf. 1991. nit-4, a pathway-specific regulatory gene of Neurospora crassa, encodes a protein with a putative binuclear zinc DNA-binding domain. Mol. Cell. Biol. 11:5735-5745.

44. Zon, L. I., M. F. Gurish, R. L. Stevens, C. Mather, D. S. Reynolds, K. F. Austen, and S. H. Orkin. 1991. GATA-binding transcription factors in mast cells regulate the promoter of the mast cell carboxypeptidase A gene. J. Biol. Chem. 266:22948-22953.

45. Zon, L. I., C. Mather, S. Burgess, M. E. Bolce, R. M. Harland, and S. H. Orkin. 1991. Expression of GATA-binding proteins during embryonic development in Xenopus laevis. Proc. Natl. Acad. Sci. USA 88:10642-10646.