Molecular Evolution of the GATA Family of Transcription Factors: Conservation Within the DNA-Binding Domain

Jason A. Lowry, William R. Atchley

Department of Genetics, North Carolina State University, Raleigh, NC 27695-7614, USA

Received: 19 January 1999 / Accepted: 17 September 1999

Abstract. The GATA-binding transcription factors comprise a protein family whose members contain either one or two highly conserved zinc finger DNA-binding domains. Members of this group have been identified in organisms ranging from cellular slime mold to vertebrates, including plants, fungi, nematodes, insects, and echinoderms. While much work has been done describing the expression patterns, functional aspects, and target genes for many of these proteins, an evolutionary analysis of the entire family has been lacking. Herein we show that only the C-terminal zinc finger (Cf) and basic domain, which together constitute the GATA-binding domain, are conserved throughout this protein family. Phylogenetic analyses of amino acid sequences demonstrate distinct evolutionary pathways. Analysis of GATA factors isolated from vertebrates suggests that the six distinct vertebrate GATAs are descended from a common ancestral sequence, while those isolated from nonvertebrates (with the exception of the fungal AREA orthologues and Arabidopsis paralogues) appear to be related only within the DNA-binding domain and otherwise provide little insight into their evolutionary history. These results suggest multiple modes of evolution, including gene duplication and modular evolution of GATA factors based upon inclusion of a class IV zinc finger motif. As such, GATA transcription factors represent a group of proteins related solely by their homologous DNA-binding domains. Further analysis of this domain examines the degree of conservation at each amino acid site using the Boltzmann entropy measure, thereby identifying residues critical to preservation of structure and function. Finally, we construct a predictive motif that can accurately identify potential GATA proteins.

Key words: GATA; Transcription factor; Zinc finger; DNA binding domain; Phylogeny; Boltzmann entropy; Predictive motif

Correspondence to: Jason A. Lowry; e-mail: [email protected]

J Mol Evol (2000) 50:103–115
DOI: 10.1007/s002399910012

© Springer-Verlag New York Inc. 2000

Introduction

The GATA-binding family of transcription factors constitutes a subgroup of DNA-binding proteins whose members both bind a consensus HGATAR motif and contain the class IV zinc finger motif. Most of the proteins described to date include one or two zinc fingers fitting the consensus sequence CX2CX17–18CX2C followed by a basic region. In those proteins containing two zinc fingers, the GATA-binding domain comprises the C-terminal zinc finger (Cf) and the basic domain that follows it.

Generally speaking, there appears to be evolutionary conservation with regard to function. Many GATA factors in organisms studied thus far activate or inactivate genes in response to an environmental deficiency and/or to extract an element (e.g., iron or nitrogen) from the surroundings. Within fungi, some GATA factors act as positive regulators of nitrogen metabolism and are required to activate expression of nitrogen catabolic enzymes during periods of nitrogen deficiency. Examples include the AREAs (Caddick et al. 1986; MacCabe et al. 1998; Haas et al. 1995) and orthologous genes (Fu and Marzluf 1990; Froeliger and Carpenter 1996; Stanbrough et al. 1995; Minehart and Magasanik 1991). Conversely, a negative regulator of nitrogen metabolism, DAL80, has also been identified in S. cerevisiae (Cunningham and Cooper 1991). Additional fungal GATA factors have acquired diverse functions. For example, both WC-1 and WC-2 in N. crassa are involved in blue light-regulated photomorphogenesis and circadian regulation (Ballario et al. 1996; Crosthwaite et al. 1997). SRD1, also found in baker's yeast, may play a role in pre-rRNA processing (Hess et al. 1994). URBS1, SREA, and SRE, three of the six known fungal factors with two zinc fingers, repress biosynthesis of iron chelators called siderophores in U. maydis, A. nidulans, and N. crassa, respectively (Voisard et al. 1993; Zhou et al. 1998). Since GAF2 of S. pombe (fission yeast) and SREP from P. chrysogenum also have two zinc finger domains separated by a region comparable to that of URBS1, SREA, and SRE, they probably function similarly.

Within vertebrates, six GATA factors have been characterized that are expressed in a variety of tissues. GATA-1, -2, and -3 comprise one subgroup, while GATA-4, -5, and -6 constitute the other. A distinctive but overlapping pattern of expression, both temporally and spatially, has been established for these six proteins. GATA-1, -2, and -3 are differentially expressed in cells of erythroid lineage (e.g., megakaryocytes, progenitor cells, embryonic brain cells, primitive erythroblasts, endothelial cells, eosinophils, and testis). Conversely, GATA-4, -5, and -6 are detected in endodermally derived tissues (e.g., heart, lung, stomach, intestine, ovary, and blood vessels). Furthermore, numerous target genes have been identified for vertebrate GATA factors. Important targets, among many, for GATA-1 in red blood cells are the globin genes (Evans et al. 1988; Martin et al. 1989; Talbot and Grosveld 1991). Preproendothelin I is a candidate target gene for GATA-2 in endothelial cells (Kawana et al. 1995). GATA-3, expressed in T cells, plays a role in regulating the T-cell receptor (Joulin et al. 1991). Targets for GATA-4 include myosin heavy chain, actin, and cardiac troponin (Jiang and Evans 1996; Ip et al. 1994).

Herein, we explore the evolutionary relationships among 93 proteins that contain at least one GATA-type zinc finger. These proteins have been isolated from 30 organisms (see Table 1 for a complete list of all GATA proteins and organisms included in our analyses). While the C-terminal zinc finger (Cf) and basic domain have been conserved throughout evolution, there appear to be no other conserved regions across all GATA factors. However, it should be noted that GATA-binding capability has not been demonstrated for every protein included in this analysis. Furthermore, we address questions regarding sequence variability and its impact on structure in the DNA-binding domain. First, we measure the extent of sequence variability permitted while preserving the tertiary structure and function of this domain. Using this information, we can construct a predictive model identifying the primary structure of the GATA-binding zinc finger domain.

Materials and Methods

Sequence Analysis

We collected members of the GATA family from the DDBJ/EMBL/GenBank sequence database by keyword and query search (BLAST) using the conserved type IV zinc finger domain. We identified 89 full-length protein sequences, including 14 hypothetical GATA proteins, as well as 4 partial sequences (consisting of the zinc finger and C-terminal domain only) isolated from 30 species (Table 1). The amino acid sequences were aligned using the ClustalW algorithm (Thompson et al. 1994) and subsequently improved by eye. Neighbor-joining (NJ) trees (Saitou and Nei 1987) were constructed based upon the p-distance (fraction of sites that are different), gamma distance, and Poisson correction in the MEGA package (Kumar et al. 1994). Gapped sites in the alignments were deleted in pairwise comparisons, i.e., these sites were not informative. The final step was to perform 500 replications of the nonparametric bootstrap to measure support for the topology. Overall, we found the topologies generated with the different metrics to be concordant. We then submitted the alignment and NJ tree file to PAML (phylogenetic analysis by maximum likelihood) and used the Jones–Taylor–Thornton (JTT) substitution matrix to calculate the branch lengths (Yang 1997). We constructed trees for (a) the entire protein sequence; (b) each individual zinc finger with a subsequent basic domain, DNA binding or otherwise; and (c) the flanking regions. STKA, isolated from cellular slime mold (D. discoideum), served as the outgroup in our analysis (Chang et al. 1996). This protein was chosen as the outgroup because cellular slime mold diverged from the other mitochondrial eukaryotes (crown eukaryotes) earlier than any other organism in our analysis.
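The distance measures named above can be sketched in a few lines. The following is a minimal illustration, not the MEGA implementation: it computes the p-distance with pairwise deletion of gapped sites, then applies the standard Poisson correction d = −ln(1 − p) and the gamma distance d = a[(1 − p)^(−1/a) − 1]. The function names, the gap symbol, the toy sequences, and the shape value a = 2.0 are all assumptions made for the example.

```python
import math

GAP = "-"  # assumed gap symbol in the aligned sequences


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites, skipping any site gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    different = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # pairwise deletion: gapped sites are not informative
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return different / compared


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p: float, shape: float = 2.0) -> float:
    """Gamma distance, d = a[(1 - p)^(-1/a) - 1]; the shape a is illustrative."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not taken from Table 1).
    p = p_distance("CNAC-TT", "CSACGTS")
    print(p, poisson_distance(p), gamma_distance(p))
```

Both corrections grow faster than p as sequences diverge, which is why the text checks that the resulting topologies agree across metrics.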

Estimating Sequence Variability

Statistical analyses of protein sequences present complications since amino acid symbols lack any natural ordering or underlying metric. Recent works suggest analyses using such concepts as entropy and mutual information (Schneider et al. 1986; Clarke 1995; Herzel and Grosse 1995; Roman-Roldan et al. 1996). Along these lines, we used the Boltzmann entropy E as employed by Shannon (1949). This Boltzmann–Shannon statistic measures the amount of variation among amino acid categories at each site. It is defined as

$$E = -\sum_{i=1}^{n} P_i \log_2 P_i$$

where $P_i$ represents the relative frequency of residues in category $i$ (with $P_i \log_2 P_i = 0$ if $P_i = 0$). Hence, if all the elements are in the same category, then $E = 0$ for that site. $E$ increases as residues are spread over more categories, with the theoretical maximum value being $\log_2(n)$, where $n$ is the number of categories. Gaps were not considered in the calculation of $E$. We computed the Boltzmann–Shannon statistic at each site, allowing each amino acid to represent a separate category in our calculation of $E$ ($n = 20$, so the maximum is $\log_2 20 \approx 4.32$).
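As a rough illustration of the statistic just defined, the sketch below computes E for each column of an alignment, treating each amino acid as its own category and ignoring gaps. The function names and the toy alignment are assumptions for the example and do not reproduce the authors' software.

```python
import math
from collections import Counter


def site_entropy(column, gap="-"):
    """Boltzmann-Shannon entropy (in bits) of one alignment column, gaps ignored."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0  # a column made only of gaps carries no residue variation
    total = len(residues)
    counts = Counter(residues)
    # Categories with zero frequency never appear in `counts`,
    # which matches the convention P_i * log2(P_i) = 0 when P_i = 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(alignment):
    """Entropy at every site of a list of equal-length aligned sequences."""
    return [site_entropy(column) for column in zip(*alignment)]


if __name__ == "__main__":
    # Hypothetical three-site alignment: invariant C, variable site, invariant W.
    toy = ["CAW", "CGW", "C-W", "CTW"]
    print(alignment_entropies(toy))
```

An invariant site such as a conserved cysteine gives E = 0, while a site at which all 20 amino acids are equally frequent reaches the maximum of log2(20).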

Ancestral Sequence Reconstruction

Based on the most parsimonious tree generated using only the conserved zinc finger domains, we attempted to reconstruct the ancestral sequences at particular nodes. We used the program PROTPARS of the PHYLIP package (Felsenstein 1993) to perform the analyses.
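The general idea of parsimony-based ancestral reconstruction can be illustrated for a single alignment site. The sketch below uses Fitch's small-parsimony pass, which is a simpler, swapped-in technique and not PROTPARS's own scoring scheme. The tree shape, taxon names, and residues are hypothetical.

```python
def fitch_site(tree, leaf_states):
    """Bottom-up Fitch pass for one site on a rooted binary tree.

    `tree` is a nested 2-tuple of leaf names; `leaf_states` maps each leaf
    name to its residue. Returns (candidate root states, minimum changes).
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):           # leaf: its observed residue
            return {leaf_states[node]}
        left, right = node
        left_set, right_set = visit(left), visit(right)
        shared = left_set & right_set
        if shared:                          # children agree: no change needed
            return shared
        changes += 1                        # children disagree: one substitution
        return left_set | right_set

    root_states = visit(tree)
    return root_states, changes


if __name__ == "__main__":
    # Hypothetical four-taxon tree and residues at one site.
    tree = ((("taxonA", "taxonB"), "taxonC"), "taxonD")
    states = {"taxonA": "R", "taxonB": "R", "taxonC": "H", "taxonD": "R"}
    print(fitch_site(tree, states))
```

Repeating this pass for every site of the zinc finger alignment would give candidate ancestral residues at each node, which is the kind of output summarized for the reconstructed ancestors in the Results.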


Table 1. List of all proteins included in our analyses including source, length in amino acids, and accession numberᵃ

| Organism | No. | Protein | Length | Accession No. |
|---|---|---|---|---|
| Vertebrate | | | | |
| Danio rerio (zebrafish) | 10 | GATA-1 | 418 | U18311 |
| | 86p | GATA-2 (partial) | 194+ | U18312 |
| | 5 | GATA-3 | 438 | S80425 |
| | 33 | GATA-5 | 383 | AJ242515 |
| Gallus gallus (chicken) | 14 | GATA-1 | 304 | M26209 |
| | 7 | GATA-2 | 466 | X56930 |
| | 3 | GATA-3 | 444 | X56931 |
| | 87p | GATA-4 (partial) | 380+ | U11887 |
| | 31 | GATA-5 | 391 | U11888 |
| | 22 | GATA-6 | 387 | U11889 |
| Homo sapiens (human) | 13 | GATA-1 | 413 | M30601 |
| | 9 | GATA-2 | 474 | M77810 |
| | 2 | GATA-3 | 444 | X55037 |
| | 27 | GATA-4 | 442 | L34357 |
| | 21 | GATA-6 | 449 | U66075 |
| Mus musculus (mouse) | 12 | GATA-1 | 413 | X15763 |
| | 8 | GATA-2 | 480 | AB000096 |
| | 1 | GATA-3 | 443 | X55123 |
| | 26 | GATA-4 | 441 | SP:Q08369 |
| | 32 | GATA-5 | 404 | U84725 |
| | 20 | GATA-6 | 444 | S82462 |
| Rattus norvegicus (rat) | 11 | GATA-1 | 413 | D13518 |
| | 25 | GATA-4 | 440 | L22761 |
| | 19 | GATA-6 | 441 | L22760 |
| Xenopus laevis (frog) | 16 | GATA-1A | 359 | M76566 |
| | 15 | GATA-1B | 364 | M76563 |
| | 6 | GATA-2 | 452 | M76564 |
| | 4 | GATA-3 | 435 | M76565 |
| | 28 | GATA-4 | 392 | U45453 |
| | 30 | GATA-5A | 390 | L13701 |
| | 29 | GATA-5B | 388 | L13702 |
| | 24 | GATA-6A | 391 | U45454 |
| | 23 | GATA-6B | 391 | Y08865 |
| Echinoderm | | | | |
| Strongylocentrotus purpuratus (sea urchin) | 18 | GATAc | 431 | AF077674 |
| | 88p | GATAe (partial) | 119+ | AF077675 |
| Insect | | | | |
| Bombyx mori (silkworm) | 35 | GATAb | 509 | L27451 |
| Drosophila melanogaster (fruit fly) | 34 | GATAa/pannier | 540 | S68798 |
| | 49 | GATAb/serpent | 779 | X76217 |
| | 17 | GATAc | 486 | D50542 |
| | 89Hp | “GATAd” (partial) | ??? | AC005454 |
| Nematode | | | | |
| Caenorhabditis briggsae | 44 | ELT-3 | 249 | |
| Caenorhabditis elegans | 36 | ELT-1 | 416 | X57834 |
| | 48 | ELT-2 | 433 | U25175 |
| | 43 | ELT-3 (K02B9.4) | 226 | AF146517 (Z69663) |
| | 45 | END1 | 221 | AF026555 |
| | 90H | END3 | 242 | Z81555 |
| | 47H | T24D3 | 211 | Z68121 |
| | 46H | C18G1.2 | 198 | AF068708 |
| | 91H | C22D3.1 | 613 | Z50027/Z49867 |
| | 92H | F52C12.5 | 344 | AF100657 |
| | | F55A8.1 | 294 | AF067612 |
| Fungus | | | | |
| Aspergillus nidulans | 53 | AREA | 876 | X52491 |
| | 81 | NSDD | 461 | U70043 |
| | 39 | SREA | 549 | AF095898 |

Table 1. Continued

| Organism | No. | Protein | Length | Accession No. |
|---|---|---|---|---|
| Aspergillus niger | 52 | AREA | 882 | X81998 |
| Aspergillus oryzae | 51 | AREA | 866 | AJ002968 |
| Aspergillus parasiticus | 50 | AREA | 866 | AF148539 |
| Fusarium solani f.sp. pisi | 76 | PBP | 457 | U23722 |
| Gibberella fujikuroi | 60 | AREA | 971 | Y11006 |
| Magnaporthe grisea | 57 | NUT1 | 956 | U60290 |
| Metarhizium anisopliae | 59 | NRR1 | 944 | AJ006468 |
| Neurospora crassa | 58 | NIT-2 | 1036 | M33956 |
| | 41 | SRE | 587 | AF087130 |
| | 82 | WC-1 | 1154 | X94300 |
| | 77 | WC-2 | 530 | Y09119 |
| Penicillium chrysogenum | 40 | SREP | 532 | U48414 |
| | 56 | AREA | 725 | U02612 |
| | 64 | NREB | 298 | U96385 |
| Penicillium roqueforti | 55 | NMC | 860 | AJ001530 |
| Penicillium urticae | 54 | NRFA | 865 | U53137 |
| Saccharomyces cerevisiae (baker’s yeast) | 84 | ASH1 | 588 | Z28185 |
| | 62 | DAL80/UGA43 | 269 | X60199 |
| | 66 | GAT1/NIL1 | 510 | U27344 |
| | 67 | GLN3 | 730 | M35267 |
| | 63 | GZF3/DEH1/NIL2 | 551 | X86353 |
| | 74 | SRD1 | 225 | SP:P09007 |
| | 80H | Cos9375 | 560 | Z47071 |
| | 73H | CB15A-2D | 133 | X07388 |
| | 75H | Lpb10p/YPL021W | 187 | U36624 |
| | 78H | YIR013c | 121 | SP:P40569 |
| | 79H | YLR013w | 141 | Z73185 |
| Schizosaccharomyces pombe (fission yeast) | 61 | GAF1 | 290 | L31601 |
| | 38 | GAF2 | 564 | L29051 |
| | 37H | SPCC1393.08 | 557 | AL035592 |
| Ustilago maydis (smut fungus) | 42 | URBS1 | 950 | M80547 |
| Zygosaccharomyces rouxii | 65 | SAT1 | 327 | D83211 |
| Plant | | | | |
| Arabidopsis thaliana (thale cress) | 83H | BAC IG002P16.9 | 550 | AF007270 |
| | 93H | C7A10.740 | 211 | Z99708 |
| | 72 | GATA-1 | 274 | Y13648 |
| | 68 | GATA-2 | 264 | Y13649 |
| | 70 | GATA-3 | 269 | Y13650 |
| | 69 | GATA-4 | 240 | Y13651 |
| Nicotiana tabacum | 71 | Ntl1-NT7 | 305 | X73111 |
| Slime mold | | | | |
| Dictyostelium discoideum | 85 | STKA | 872 | U68754 |

ᵃ The accession numbers are from GenBank unless otherwise indicated (SP, Swiss-Prot; PIR database). The No. column corresponds to the position on the phylogenetic tree in Fig. 1. “H” and “p” indicate “hypothetical” and “partial” sequences, respectively. These numbers appear again in Fig. 3 for reference.

Results

Phylogenetic Analysis of GATA Proteins

Phylogenetic analysis of the full-length GATA protein sequences yields two drastically different patterns of protein evolution (Fig. 1). The evolutionary pathway of GATA factors among nonvertebrates appears to be much different from that within vertebrates. The topology for the vertebrate sequences is well resolved, suggesting a paralogous pattern of gene divergence, i.e., evolution by gene duplication. This region of the tree is strongly supported (within the NJ tree, 22 of 33 bootstrap values exceed 90%). These findings suggest that vertebrate GATA proteins arose from a common ancestor.

Fig. 1. Evolutionary relationships among 85 full-length GATA protein sequences that contain at least one zinc finger conforming to the conserved sequence (CX2–4CX17–20CX2C). Amino acid sequences were aligned with ClustalW. Drawn here is a neighbor-joining (NJ) tree with branch lengths estimated using the JTT substitution matrix in PAML. Gaps were deleted in pairwise comparisons. Taxa are numbered sequentially from bottom to top for cross reference with Fig. 3.

However, given the hypothesis that the vertebrate lineage underwent two major duplication events following the separation of Amphioxus and Craniata (Fig. 2A) (Ohno 1970; Holland et al. 1994; Pebusque et al. 1998; Sidow 1996), we must attempt to explain the existence of six vertebrate GATAs as opposed to four or eight (Fig. 2B). The first question to address is whether there were one or two GATA genes within the deuterostome ancestor. Given the bootstrap support of the NJ tree (94%; not shown), the branching pattern of Drosophila dGATAa and dGATAc relative to the vertebrate GATAs suggests that there were two ancestral deuterostome GATAs. dGATAa shares a common ancestor with GATA-4, -5, and -6, while dGATAc and GATA-1, -2, and -3 also have a common ancestor (Fig. 1). Therefore, we favor the hypothesis that the ancestral vertebrate organism had two GATAs, proposing that an independent gene duplication occurred prior to the protostome/deuterostome divergence (yielding dGATAa and dGATAc in Drosophila and two ancestral deuterostome GATAs). Together, the two major duplication events produced the separate GATA-4, -5, and -6 and GATA-1, -2, and -3 genes, along with two that either have been lost (pseudogenes) or simply have yet to be cloned. Expression patterns for the GATA factors correlate well with the proposed pattern of gene duplication. GATA-1/2/3 are expressed in cells of erythroid lineage, while GATA-4/5/6 are found in gut-derived (endodermal) tissues. Divergence within the regulatory regions of the ancestors of the two subfamilies would account for their different expression patterns. Further support is gained by comparison of their exon/intron structure, where structure is conserved among the members within each subfamily (MacNeill et al. 1997).
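The gene-count arithmetic behind the two scenarios can be made explicit. The sketch below is only bookkeeping under the stated hypothesis of two genome-scale duplication rounds. The function name is hypothetical, and the extra duplications and losses are the adjustments each scenario needs in order to reach the six observed vertebrate GATAs.

```python
def loci_after_duplications(ancestral_genes: int, rounds: int) -> int:
    """Number of loci if every gene is copied in each large-scale duplication round."""
    return ancestral_genes * 2 ** rounds


if __name__ == "__main__":
    observed = 6  # vertebrate GATA-1 through GATA-6

    # Scenario 1: one ancestral GATA, two rounds, then two independent duplications.
    one_ancestor = loci_after_duplications(1, 2)   # four loci
    extra_duplications = observed - one_ancestor   # two further duplications needed

    # Scenario 2: two ancestral GATAs, two rounds, then two loci lost or not yet cloned.
    two_ancestors = loci_after_duplications(2, 2)  # eight loci
    losses = two_ancestors - observed              # two loci unaccounted for

    print(one_ancestor, extra_duplications, two_ancestors, losses)
```

Both scenarios require two additional events, so the count alone cannot separate them. The preference for two ancestral genes rests on the branching pattern of dGATAa and dGATAc described above.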

The presence of six distinct GATA factors within amphibians suggests that each of these gene duplications occurred prior to the origin of amphibians approximately 360 million years ago (Benton 1990), consistent with Fig. 2A. Due to the pseudotetraploid genome of Xenopus laevis (Thiebaud and Fischberg 1977), GATA-1, -5, and -6 each have two slightly divergent genetic loci characterized in the literature. The absence of a duplicated locus for GATA-2, -3, or -4 is probably a consequence of insufficient sampling for the second gene.

Compared to the topology of the vertebrate GATAs, that of the nonvertebrate proteins is less resolved (Fig. 1). Of the 35 factors identified in fungi, 29 possess only a single zinc finger domain. Five of the six proteins with two zinc fingers differ from those described within vertebrates regarding the spacing of the fingers. Many of the clades involving nonvertebrate proteins arise deep within the tree and have low bootstrap support. This can be attributed to the lack of conservation, aside from the zinc finger motif, or to a lack of homology (i.e., no common ancestry) of the flanking regions. Based on this information, we infer that a modular evolutionary mechanism, such as exon (domain) shuffling (Gilbert 1987; Patthy 1995, 1996), could explain this observation. There are, however, two well-supported lineages (bootstrap values >95%): the orthologous AREA proteins and the four paralogous Arabidopsis GATAs. The AREA proteins have been shown to have similar functions to one another, in agreement with the phylogeny (Davis and Hynes 1987; Fu and Marzluf 1990; Haas and Marzluf 1995; Froeliger and Carpenter 1996).

Fig. 2. A Phylogenetic tree showing the deuterostome lineage as based on the “tree of life” (http://phylogeny.arizona.edu/tree/phylogeny.html). The black ovals represent the two hypothetical large-scale duplication events occurring after the divergence of amphioxus. An approximate time line is included for reference. B Based on the relative expression patterns, we propose two possible explanations for the existence of six GATA proteins in vertebrates. The top box shows a single ancestral GATA protein undergoing two major duplication events, yielding four GATAs. Two independent duplications produce the resultant six vertebrate GATA proteins. The bottom scenario depicts possible events assuming that there were two GATA proteins in the ancestral deuterostome organism. These two ancestral GATAs undergo two duplication events resulting in eight distinct genetic loci. Dashed lines indicate that two of these genes have become nonfunctional (pseudogenes), resulting in six functional GATAs.

Evolution of the Motif and Tandem Repeats

An interesting question is the origin of the multiple copies of the zinc finger domain. Of the 89 full-length and 4 partially sequenced GATA factors included in our analyses, 45 contain two zinc fingers. Thirty-three GATA proteins have been identified in vertebrates, all of which contain two zinc fingers. The remaining 12 are from sea urchin (2), D. melanogaster (2), C. elegans (1), silkmoth (1), and fungi (6). Based upon the phylogenetic tree produced in Fig. 1, a reasonable inference would be that the ancestral GATA factor had a single zinc finger domain. Those factors with two zinc fingers would have evolved through one or more tandem duplication events. Looking at the region separating the two zinc fingers, we must consider two hypotheses: (1) a single tandem duplication event followed by an insertion in the ancestor of GAF2, SREP, SREA, URBS1, and SRE and (2) two separate tandem duplication events. In either case, note that the hypothetical S. pombe GATA protein from cosmid c1393 (SPCC1393.08) has two zinc fingers homologous to those in C. elegans, Drosophila, sea urchin, and vertebrates. Therefore, the tandem duplication event that resulted in the two fingers of SPCC1393.08 was likely the same one that gave rise to ELT-1, dGATAa, dGATAc, SpGATAc, SpGATAe, and the known vertebrate GATAs. This would suggest that this tandem duplication occurred prior to the divergence of Fungi and Metazoa. The two zinc fingers of SRE, GAF2 (S. pombe), SREA, SREP, and URBS1 are separated by 139, 135, 120, 119, and 119 amino acids, respectively, while the vertebrate and SPCC1393.08 zinc fingers are separated by only 29 residues. Interestingly, an independent tandem duplication was observed in the areA-300 mutant in Aspergillus nidulans (Caddick and Arst 1990). The two fingers in the areA-300 mutant are separated by 114 amino acids.

Next, we wanted to examine the evolutionary history of the GATA-type zinc finger. Previously, Omichinski and others (1993) had identified a minimal 66-amino acid peptide of chicken GATA-1 that contains the C-terminal zinc finger and successive basic domain capable of binding the GATA site. Using only the region of the peptide that actually contacts the DNA as the model (residues 162 to 216 of cGATA-1), we aligned the corresponding DNA-binding domains from each of the GATA factors to that of chicken GATA-1. For those proteins containing two zinc fingers, two separate operational taxonomic units (OTUs), one for each zinc finger and basic domain (Nf, N-terminal finger; Cf, C-terminal finger), were analyzed. With this alignment of the conserved domain, we then constructed a second phylogenetic tree (Fig. 3). Based on the topology, it is likely that the tandem duplication observed in GATA proteins of Drosophila, sea urchin, and vertebrates occurred prior to the earliest gene duplication of their common ancestral GATA. In the two Drosophila GATAs that have two zinc finger domains, each finger and successive basic domain closely resembles the corresponding regions of the vertebrate and echinoderm GATAs, suggesting a gene duplication prior to the divergence of protostomes and deuterostomes. The subsequent gene duplication events within vertebrates occurred before the origin of amphibians, as amphibians contain at least one orthologue for each of GATA-1 through GATA-6. It is likely, however, that the zebrafish genome also contains at least six GATAs (see Table 1) and that two or more are merely absent from the literature. Therefore, it is probable that all six GATA proteins were present in the common ancestor of jawed vertebrates (Gnathostomata).

Under this assumption, we attempted to reconstruct the ancestral GATA sequences for each of the tandemly duplicated vertebrate zinc fingers and the fungal AREA orthologues (Fig. 4). The ancestral sequences are quite divergent, especially in the basic domain, suggesting unique functions for each finger. Within vertebrates, the two zinc finger domains have evolved vastly different functions. In most cases, the Cf is responsible for binding the HGATAR cis-element, while the Nf has been shown to have multiple functions. Perhaps the most critical function of the Nf is to participate in protein–protein interactions with GATA cofactors such as Sp1 (Merika and Orkin 1995), EKLF (Merika and Orkin 1995), FOG (Tsang et al. 1997), FOG-2 (Lu et al. 1999), Ush (Haenlin et al. 1997), and LIM-Only (Ono et al. 1998), among others. It also has been suggested that the Nf confers specificity in the binding activity of the C-terminal finger (Martin and Orkin 1990) and/or functions as part of an activation domain (Yang and Evans 1992). Using mouse GATA-1, Whyatt and others (1993) have shown that the presence of the Nf allows the protein to bind to a weaker site (T/C)AAG. Formerly, there was no evidence that the Nf could bind DNA by itself. More recent studies using chicken GATA-2 and GATA-3 have demonstrated the DNA-binding capability of the Nf and flanking basic domains (Pedone et al. 1997). This ability is specific to GATA-2 and GATA-3 since this basic domain, located 8 to 14 residues amino to the Nf, is lacking in the other four vertebrate GATA factors. DNA-binding specificity studies using GATA-1/2/3 have shown all to prefer GATA, while GATA-2 and GATA-3 recognize GATC equally well (Ko and Engel 1993).

Flanking Regions

Phylogenetic analysis of the flanking regions of the GATA factors gives further evidence of the two separate modes of evolution for vertebrates and fungi (data not shown). The strong bootstrap support that was evident in the analysis of the full protein within vertebrates is maintained even when the conserved zinc finger domains are removed. Hence, this presents further evidence that vertebrate GATA factors have evolved from a common ancestor. Additionally, as expected, removal of the conserved zinc finger domains from the nonvertebrate sequences results in even deeper nodes and less statistical support, again with the exception of the AREA orthologues and the four paralogous Arabidopsis GATAs, which maintain good resolution and strong support. This would argue against common ancestry for the flanking sequences, suggesting modular evolution via shuffling of the zinc finger domains. Further evidence of modular evolution exists in the identification of other domains located within the flanking regions. For example, three fungal proteins (WC-1, WC-2, and PBP) all contain the PAS domain, a domain which has been implicated in circadian regulation. Additionally, the fact that the zinc finger domain can be located anywhere within the protein lends support to this hypothesis. For instance, the Nf of GAF2 is located 11 amino acids from the N terminus, whereas those of URBS1, SRE, SREP, and SREA are located nearer the middle of the protein chain.

Flexibility in Primary and Tertiary Structure

In examining this family of transcription factors, we note that the region responsible for DNA binding is relatively well conserved. A certain degree of conservation of primary structure is necessary to maintain functionality of the folded tertiary structure. First, recall that the conserved motif structure is CX2CX17–18CX2C. With this in mind, we present some intriguing variations to the consensus sequence. The most recently reported example is END-1 (Zhu et al. 1997), which has four residues between the first two cysteines where normally there are only two. The single finger of Ash1p represents the only member with 20 amino acids between the second and the third cysteines (although there is no experimental evidence that Ash1p binds HGATAR). Also noteworthy, GATAb in Bombyx mori (silkworm) and chicken GATA-5 are two genes in which alternative splicing determines the number of GATA-type zinc fingers, either one or two, present in the resultant protein (Drevet et al. 1995; MacNeill et al. 1997). Further, the Nf of GATAb, found in two of the three alternatively spliced isoforms, represents the only case where histidine (H) is substituted for arginine (R) at position 19, immediately following the invariant tryptophan (W). Considering the variations and the structural information available, we propose an updated consensus motif and linear model (Fig. 4, bottom).
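The cysteine spacing alone can be expressed as a regular expression, which makes the effect of the END-1 and Ash1p variations easy to see. The sketch below encodes the strict consensus CX2CX17–18CX2C and the relaxed spacing CX2–4CX17–20CX2C quoted in the Fig. 1 legend. It is not the predictive motif of Fig. 4, which also constrains individual residues. The names STRICT, RELAXED, and find_fingers and the filler sequences are assumptions for illustration, not real protein sequences.

```python
import re

# Cysteine spacing only; X is any single-letter residue code.
STRICT = re.compile(r"C[A-Z]{2}C[A-Z]{17,18}C[A-Z]{2}C")      # CX2CX17-18CX2C
RELAXED = re.compile(r"C[A-Z]{2,4}C[A-Z]{17,20}C[A-Z]{2}C")   # CX2-4CX17-20CX2C


def find_fingers(sequence: str, pattern: re.Pattern) -> list:
    """Return (start, end) spans of non-overlapping matches in `sequence`."""
    return [match.span() for match in pattern.finditer(sequence.upper())]


def synthetic_finger(first_gap: int, loop: int) -> str:
    """Build a filler-only test string with the requested cysteine spacing."""
    return "C" + "A" * first_gap + "C" + "A" * loop + "C" + "AA" + "C"


if __name__ == "__main__":
    canonical = synthetic_finger(2, 17)   # consensus spacing
    end1_like = synthetic_finger(4, 17)   # four residues between the first two cysteines
    ash1_like = synthetic_finger(2, 20)   # 20 residues between the second and third

    for name, seq in [("canonical", canonical),
                      ("END-1-like", end1_like),
                      ("Ash1p-like", ash1_like)]:
        print(name, find_fingers(seq, STRICT), find_fingers(seq, RELAXED))
```

A spacing-only pattern of this kind will also match unrelated cysteine-rich sequences, so it serves as a first filter rather than as evidence of GATA binding.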

To measure the sequence variation, we calculated entropy values (Shannon 1949) for each of the sites within the conserved DNA-binding domain (Fig. 5). Again, we utilized the region delimited by Omichinski et al. (1993) using only the 55 amino acid region that included all contact with the DNA. Some of the sites within the zinc finger motif appear to be under strong selection pressures resulting in very low entropy values. To begin with the most obvious, the four invariant cysteine (C) residues are absolutely necessary to confer binding of the zinc ion and are conserved in all GATAs. The invariant tryptophan (W) residue in the loop appears to be critical in maintaining the structural integrity in the metal binding region (Omichinski et al. 1993). Studies using the AREA protein of A. nidulans demonstrated a requirement for the conserved leucine (L) residue which precedes the tryptophan (Ravagnani et al. 1997). This leucine makes specific hydrophobic contacts with the first two nucleotides of the consensus GATA-binding site (AGATAA) as shown in the resolved tertiary complex (Starich et al. 1998a). The asparagine (N) located at position 29 also makes critical specific DNA contacts and is well conserved. This site falls within the α-helical region at the C-terminal end of the zinc finger (sites 28–33). Additionally, structural analysis of the murine GATA-1 Nf interaction with FOG implicated six sites that make direct contact with this cofactor (Kowalski et al. 1999). These sites are very well conserved (at least 4/6 identical) among all the Nf's of GATAs with two zinc fingers except for the six fungal proteins.
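The per-site entropy calculation can be illustrated with a short sketch. It assumes the standard Shannon form E = -Σ p_i log2(p_i) over the residues observed at a site, with each amino acid as its own category as stated in the Fig. 5 caption. The function names, the decision to skip gap characters, and the toy alignment are assumptions made here for illustration and do not reproduce the authors' software or data.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (bits) of one alignment column.

    Gap characters ("-") are skipped; this is an assumption of the
    sketch, not something stated in the text.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # Sum of p * log2(1/p); an invariant site gives exactly 0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def alignment_entropies(sequences):
    """Per-site entropies for a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*sequences)]

# Toy alignment (hypothetical, not GATA data): sites 1 and 3 are
# invariant, site 2 is split evenly between two residues
toy = ["CAW", "CGW", "CAW", "CGW"]
entropies = alignment_entropies(toy)

# A site where all 20 amino acids are equally frequent reaches the
# theoretical maximum of log2(20)
max_entropy = site_entropy("ACDEFGHIKLMNPQRSTVWY")
```

In this toy case the invariant sites score 0 bits and the evenly split site scores 1 bit, which mirrors how the invariant cysteines and tryptophan would sit at the bottom of an entropy plot.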

However, there are many sites within the zinc finger/basic domain that have high entropy values (Fig. 5). If events causing such variability in the primary structure of the motif are propagated, the evolutionary history of this family may be too variable at this level to be resolved. A study comparing the tertiary structures complexed with DNA may be necessary to address this problem. To date, only three such structures have been resolved, with one containing a point mutation within the zinc finger (L→V) (Omichinski et al. 1993; Starich et al. 1998a, b). The two native proteins show similar structures and DNA contacts within the zinc finger motif. However, the basic tails make their contacts within different grooves of the double-helix (AREA, minor groove; cGATA1, major groove). This could help to explain the large amount of variability observed in the basic region of the DNA-binding domain.

Discussion

Phylogenetic analyses suggest that the ancestral GATA protein contained only a single zinc finger. We infer that a single tandem duplication event prior to the divergence of the Fungal and Metazoan lineages adequately explains the existence of proteins with two zinc fingers (Fig. 6). An insertion event would account for the larger region of separation between the two zinc fingers of the five fungal regulators of siderophore biosynthesis (Fig. 6, arrow). Within the vertebrate lineage, evolution of the GATA proteins has occurred via gene duplications, while nonvertebrate GATAs have undergone modular evolution in addition to gene duplication.

Fig. 3. Omichinski and others (1993) identified a minimal 66-amino acid peptide from chicken GATA-1, containing the C-terminal zinc finger and successive basic domain, which could bind the GATA site. From this information, we aligned the 55-amino acid region (residues 162 to 216 of cGATA-1) containing all sites that physically interact with DNA to the corresponding regions of other GATAs. For those proteins with two zinc fingers, we differentiate using Nf (N-terminal finger) and Cf (C-terminal finger). As both NJ and parsimony-based methods produced very similar trees, we submitted the NJ tree to PAML. Branch lengths were determined as in Fig. 1. Bootstrap values of the NJ tree were considerably lower due to the short length of the sequences used. The alignment has been submitted to the EMBL alignment database under accession number ds39412 (ftp://ftp.ebi.ac.uk/pub/databases/embl/align/).

Several more GATA proteins must be cloned and characterized to provide additional information regarding the evolutionary history of this complex transcription factor family, as many questions remain. First, are there two GATA pseudogenes in vertebrates as we have proposed? Given the hypothesis that the vertebrate lineage underwent two major duplication events following the divergence of Amphioxus and Craniata (Fig. 2A) (Ohno 1970; Holland et al. 1994; Pebusque et al. 1998), along with our hypothesis that there were two GATAs prior to the protostome/deuterostome divergence, we would expect to find eight GATA proteins in vertebrates. Possible explanations for observing only six include (i) insufficient sampling, leaving two to be discovered, and (ii) distinct events involving some type of single gene duplication (i.e., unequal crossing-over). Successful completion of the human genome project will eventually resolve this question and may uncover two or more GATAs or GATA pseudogenes.
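The expectation of eight vertebrate GATAs follows from simple doubling. As a worked sketch, assume that each of the two proposed duplication rounds doubles every GATA locus and that no copies are lost. The symbols N_0 (ancestral GATA count) and r (number of duplication rounds) are introduced here for illustration, and the observed count of six is the number of distinct vertebrate GATAs reported in this study:

```latex
N_{\text{expected}} = N_0 \cdot 2^{r} = 2 \cdot 2^{2} = 8,
\qquad
N_{\text{expected}} - N_{\text{observed}} = 8 - 6 = 2
```

The shortfall of two is what the two explanations above (undiscovered genes or pseudogenes, or independent single-gene duplications in place of whole-genome events) are meant to account for.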

In other organisms, genome and other large-scale sequencing projects have uncovered several putative GATA proteins. The completed genome projects of S. cerevisiae and C. elegans have yielded five and six hypothetical GATA proteins, respectively. Searching the dbEST at NCBI confirmed that four of the six hypothetical GATAs in C. elegans are at least transcribed. However, it remains unclear whether these proteins are expressed and function accordingly. It is also interesting that the S. cerevisiae genome contains no GATA proteins with two zinc fingers, especially since S. pombe contains at least two such proteins. Within the C. elegans genome, more than 3% of the inferred protein sequences contain potential zinc-binding structural domains (Clarke and Berg 1998). Surprisingly, the C. elegans genome contains only one protein, ELT1, with two zinc fingers. If recent hypotheses that place nematodes as a sister group to arthropods prevail (Aguinaldo et al. 1997; Knoll and Carroll 1999), then we would expect to find another (Fig. 6). It is possible that each of these organisms has lost one or more GATAs with two zinc fingers. Also, sequencing of chromosome 2R of Drosophila melanogaster has revealed a fourth GATA protein. Comparison to dbEST identified one EST (AI401919), suggesting that this gene is at least transcribed. Whether the protein is functional has yet to be determined.

Given the prevalence of GATAs containing a single finger within other organisms, why are there no such GATA proteins in vertebrates? Within Drosophila, it has been suggested that dGATAb/serpent is a functional homologue to the whole vertebrate GATA family (Rehorn et al. 1996). Rehorn and others have shown that dGATAb/serpent is essential for hematopoiesis, as are GATA-1/2/3, and for morphogenesis of the endodermal gut, as are GATA-4/5/6. If dGATAb/serpent contains a single zinc finger and is able to perform such a broad range of functions, why do we not see a direct orthologue in vertebrates? One possible explanation involves the presence of cofactors such as FOG (Tsang et al. 1997), FOG-2 (Lu et al. 1999; Svensson et al. 1999), LIM-Only (Ono et al. 1998), and others that use part of the Nf to form protein complexes. Perhaps multiple cofactors exist for each of the vertebrate GATAs, thereby relaxing selection pressures on the single-finger GATAs as they were no longer essential.

Fig. 4. Based on the most parsimonious tree (not shown), we show a simplified version along with the ancestral sequences for two interesting nodes. Nodes 1 and 2 correspond to ancestors of the Nf-basic domain and Cf-basic domain, respectively. The number in parentheses next to each terminal node indicates the number of sequences included in each group. The sequence next to each of the termini represents the consensus sequence for that group. Based on all known GATA sequences, we have constructed a linear model for identifying GATA-type zinc fingers. For the invariant residues, we use the corresponding single-letter code. Symbols for the other positions: a (acidic: D, E); b (basic: K, R); f (hydrophobic: A, V, L, I); c (amidic: N, Q); s (hydroxyl: S, T); u (aromatic: F, Y); lowercase (occurs in all but one paralogue); X (any amino acid).

Additionally, concerning the observed flexibility in primary structure, what impact do the additional amino acids in the GATA-binding domain (particularly of END-1 and ASH-1) have on the tertiary structure and, resultantly, the binding strength and specificity of GATA-binding proteins? Competitive gel-shift assays and NMR analyses of DNA–protein complexes could further demonstrate and characterize their effects.

Furthermore, sequences from other organisms (particularly lamprey, hagfish, and amphioxus) would enable estimates of the timing of the duplication events to be refined. Obtaining sequences from tunicates, crustaceans, and other echinoderms would also be useful, especially if a single-finger GATA were cloned. It is interesting to note that, to date, no single-finger GATA locus has been identified in a deuterostome organism. Additionally, we might obtain better estimates of events such as the tandem duplication event we have proposed. Specifically, was there only a single tandem duplication as we have postulated, or were there more? Sequencing GATAs from choanoflagellates, believed to have a common ancestor with Metazoa more recent than do the Fungi, could provide an answer to this question. If multiple tandem duplications have occurred, GATA proteins isolated from jellyfish, anemones, or sponges could help to estimate the timing of the tandem duplication within the Metazoan lineage. Likewise, GATAs isolated from members of Zygomycota (bread molds, Rhizopus, etc.), which diverged from the Ascomycota/Basidiomycota ancestor, may help estimate the timing of the tandem duplication within the fungal lineage.

Fig. 5. Plot showing the Boltzmann–Shannon entropy values calculated for each site within the conserved DNA-binding domain. We number the positions consistent with Fig. 1 of Omichinski et al. (1993). In this plot, each amino acid represents its own category. By definition, the theoretical maximum for E is log2(20) = 4.32. Below the plot is a small subset of the zinc finger/basic domain alignment. The boldface residues of chicken GATA1 and AREA of A. nidulans were shown to interact physically with the DNA (Omichinski et al. 1993; Starich et al. 1998a). The underlined positions of cGATA1 were determined to be involved in maintaining the structural integrity of the zinc-binding region. The six italicized residues (E6, V8, G11, A12, H25, Y26) of mouse GATA1 were shown to interact physically with FOG (Kowalski et al. 1999).

Fig. 6. To summarize, we have constructed a phylogenetic tree showing divergence of the 30 organisms included in our analysis (again, this is based on the “tree of life” except for the position of Nematoda). Note that branch lengths do not accurately reflect evolutionary time. The ovals indicate the proposed genome duplications as in Fig. 2. The square represents the postulated tandem duplication event. The arrow indicates the proposed insertion between the fingers of GAF2 and related homologues.


Acknowledgments. The authors would like to thank the labs of Leonard I. Zon and Nigel Holder for access to the zebrafish GATA5 sequences and John Gilleard for access to the C. briggsae and C. elegans ELT3 sequences prior to their submission to public databases. We would also like to thank Michael Purugganan and Jeff Thorne for critical comments on the manuscript. This work was supported by a National Institutes of Health Training Grant and the U.S. Department of Education Graduate Assistance in Areas of National Need (GAANN) Fellowship Program in Biotechnology, P200A80803.

References

Aguinaldo AM, Turbeville JM, Linford LS, Rivera MC, Garey JR, Raff RA, Lake JA (1997) Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387:489–493

Ballario P, Vittorioso P, Magrelli A, Talora C, Cabibbo A, Macino G (1996) White collar-1, a central regulator of blue light responses in Neurospora, is a zinc finger protein. EMBO J 15:1650–1657

Benton MJ (1990) Phylogeny of the major tetrapod groups: Morphological data and divergence dates. J Mol Evol 30:409–424

Caddick MX, Arst HN Jr (1990) Nitrogen regulation in Aspergillus: Are two fingers better than one? Gene 95:123–127

Caddick MX, Arst HN Jr, Taylor LH, Johnson RI, Brownlee AG (1986) Cloning of the regulatory gene areA mediating nitrogen metabolite repression in Aspergillus nidulans. EMBO J 5:1087–1090

Chang WT, Newell PC, Gross JD (1996) Identification of the cell fate gene stalky in Dictyostelium. Cell 87:471–481

Clarke ND (1995) Covariation of residues in the homeodomain sequence family. Protein Sci 4:2269–2278

Clarke ND, Berg JM (1998) Zinc fingers in Caenorhabditis elegans: Finding families and probing pathways. Science 282:2018–2022

Crosthwaite SK, Dunlap JC, Loros JJ (1997) Neurospora wc-1 and wc-2: Transcription, photoresponses, and the origins of circadian rhythmicity. Science 276:763–769

Cunningham TS, Cooper TG (1991) Expression of the DAL80 gene, whose product is homologous to the GATA factors and is a negative regulator of multiple nitrogen catabolic genes in Saccharomyces cerevisiae, is sensitive to nitrogen catabolite repression. Mol Cell Biol 11:6205–6215 [published erratum appears in Mol Cell Biol 12(5):2454, 1992]

Davis MA, Hynes MJ (1987) Complementation of areA- regulatory gene mutations of Aspergillus nidulans by the heterologous regulatory gene nit-2 of Neurospora crassa. Proc Natl Acad Sci USA 84:3753–3757

Drevet JR, Swevers L, Iatrou K (1995) Developmental regulation of a silkworm gene encoding multiple GATA-type transcription factors by alternative splicing. J Mol Biol 246:43–53

Evans T, Reitman M, Felsenfeld G (1988) An erythrocyte-specific DNA-binding factor recognizes a regulatory sequence common to all chicken globin genes. Proc Natl Acad Sci USA 85:5976–5980

Felsenstein J (1993) PHYLIP (Phylogeny Inference Package), version 3.57c. Department of Genetics, University of Washington, Seattle

Froeliger EH, Carpenter BE (1996) NUT1, a major nitrogen regulatory gene in Magnaporthe grisea, is dispensable for pathogenicity. Mol Gen Genet 251:647–656

Fu Y-H, Marzluf GA (1990) nit-2, the major nitrogen regulatory gene of Neurospora crassa, encodes a protein with a putative zinc finger DNA-binding domain. Mol Cell Biol 10:1056–1065

Gilbert W (1987) The exon theory of genes. Cold Spring Harbor Symp Quant Biol 52:901–905

Haas H, Bauer B, Redl B, Stoffler G, Marzluf GA (1995) Molecular cloning and analysis of nre, the major nitrogen regulatory gene of Penicillium chrysogenum. Curr Genet 27:150–158

Haas H, Marzluf GA (1995) NRE, the major nitrogen regulatory protein of Penicillium chrysogenum, binds specifically to elements in the intergenic promoter regions of nitrate assimilation and penicillin biosynthetic gene clusters. Curr Genet 28:177–183

Haenlin M, Cubadda Y, Blondeau F, Heitzler P, Lutz Y, Simpson P, Ramain P (1997) Transcriptional activity of pannier is regulated negatively by heterodimerization of the GATA DNA-binding domain with a cofactor encoded by the u-shaped gene of Drosophila. Genes Dev 11:3096–3108

Herzel H, Grosse I (1995) Measuring correlations in symbol sequences. Physica A 216:1–13

Hess SM, Stanford DR, Hopper AK (1994) SRD1, a S. cerevisiae gene affecting pre-rRNA processing contains a C2/C2 zinc finger motif. Nucleic Acids Res 22:1265–1271

Holland PW, Garcia-Fernandez J, Williams NA, Sidow A (1994) Gene duplications and the origins of vertebrate development. Development Suppl:125–133

Ip HS, Wilson DB, Heikinheimo M, Tang Z, Ting CN, Simon MC, Leiden JM, Parmacek MS (1994) The GATA-4 transcription factor transactivates the cardiac muscle-specific troponin C promoter-enhancer in nonmuscle cells. Mol Cell Biol 14:7517–7526

Jiang Y, Evans T (1996) The Xenopus GATA-4/5/6 genes are associated with cardiac specification and can regulate cardiac-specific transcription during embryogenesis. Dev Biol 174:258–270

Joulin V, Bories D, Eleouet JF, Labastie MC, Chretien S, Mattei MG, Romeo PH (1991) A T-cell specific TCR delta DNA binding protein is a member of the human GATA family. EMBO J 10:1809–1816

Kawana M, Lee ME, Quertermous EE, Quertermous T (1995) Cooperative interaction of GATA-2 and AP1 regulates transcription of the endothelin-1 gene. Mol Cell Biol 15:4225–4231

Knoll AH, Carroll SB (1999) Early animal evolution: Emerging views from comparative biology and geology. Science 284:2129–2137

Ko LJ, Engel JD (1993) DNA-binding specificities of the GATA transcription factor family. Mol Cell Biol 13:4011–4022

Kowalski K, Czolij R, King GF, Crossley M, Mackay JP (1999) The solution structure of the N-terminal zinc finger of GATA-1 reveals a specific binding face for the transcriptional co-factor FOG. J Biomol NMR 13:249–262

Kumar S, Tamura K, Nei M (1994) MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput Appl Biosci 10:189–191

Lu JR, McKinsey TA, Xu H, Wang DZ, Richardson JA, Olson EN (1999) FOG-2, a heart- and brain-enriched cofactor for GATA transcription factors. Mol Cell Biol 19:4495–4502

MacCabe AP, Vanhanen S, Sollewign Gelpke MD, van de Vondervoort PJ, Arst HN Jr, Visser J (1998) Identification, cloning and sequence of the Aspergillus niger areA wide domain regulatory gene controlling nitrogen utilisation. Biochim Biophys Acta 1396:163–168

MacNeill C, Ayres B, Laverriere AC, Burch JB (1997) Transcripts for functionally distinct isoforms of chicken GATA-5 are differentially expressed from alternative first exons. J Biol Chem 272:8396–8401

Martin DI, Orkin SH (1990) Transcriptional activation and DNA binding by the erythroid factor GF-1/NF-E1/Eryf 1. Genes Dev 4:1886–1898

Martin DI, Tsai SF, Orkin SH (1989) Increased gamma-globin expression in a nondeletion HPFH mediated by an erythroid-specific DNA-binding factor. Nature 338:435–438

Merika M, Orkin SH (1995) Functional synergy and physical interactions of the erythroid transcription factor GATA-1 with the Kruppel family proteins Sp1 and EKLF. Mol Cell Biol 15:2437–2447

Minehart PL, Magasanik B (1991) Sequence and expression of GLN3, a positive nitrogen regulatory gene of Saccharomyces cerevisiae encoding a protein with a putative zinc finger DNA-binding domain. Mol Cell Biol 11:6216–6228

Ohno S (1970) Evolution by gene duplication. Springer-Verlag, New York

Omichinski JG, Clore GM, Schaad O, Felsenfeld G, Trainor C, Appella E, Stahl SJ, Gronenborn AM (1993) NMR structure of a specific DNA complex of Zn-containing DNA binding domain of GATA-1. Science 261:438–446

Ono Y, Fukuhara N, Yoshie O (1998) TAL1 and LIM-only proteins synergistically induce retinaldehyde dehydrogenase 2 expression in T-cell acute lymphoblastic leukemia by acting as cofactors for GATA3. Mol Cell Biol 18:6939–6950

Patthy L (1995) Protein evolution by exon shuffling. Springer-Verlag, Austin, TX

Patthy L (1996) Exon shuffling and other ways of module exchange. Matrix Biol 15:301–310; discussion 311–312

Pebusque MJ, Coulier F, Birnbaum D, Pontarotti P (1998) Ancient large-scale genome duplications: Phylogenetic and linkage analyses shed light on chordate genome evolution. Mol Biol Evol 15:1145–1159

Pedone PV, Omichinski JG, Nony P, Trainor C, Gronenborn AM, Clore GM, Felsenfeld G (1997) The N-terminal fingers of chicken GATA-2 and GATA-3 are independent sequence-specific DNA binding domains. EMBO J 16:2874–2882

Ravagnani A, Gorfinkiel L, Langdon T, Diallinas G, Adjadj E, Demais S, Gorton D, Arst HN Jr, Scazzocchio C (1997) Subtle hydrophobic interactions between the seventh residue of the zinc finger loop and the first base of an HGATAR sequence determine promoter-specific recognition by the Aspergillus nidulans GATA factor AreA. EMBO J 16:3974–3986

Rehorn KP, Thelen H, Michelson AM, Reuter R (1996) A molecular aspect of hematopoiesis and endoderm development common to vertebrates and Drosophila. Development 122:4023–4031

Roman-Roldan R, Bernaola-Galvan P, Oliver JL (1996) Application of information theory to DNA sequence analysis: A review. Pattern Recog 29:1187–1194

Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

Schneider TD, Stormo GD, Gold L, Ehrenfeucht A (1986) Information content of binding sites on nucleotide sequences. J Mol Biol 188:415–431

Shannon CE (1949) The mathematical theory of communication. University of Illinois Press, Urbana

Sidow A (1996) Gen(om)e duplications in the evolution of early vertebrates. Curr Opin Genet Dev 6:715–722

Stanbrough M, Rowen DW, Magasanik B (1995) Role of the GATA factors Gln3p and Nil1p of Saccharomyces cerevisiae in the expression of nitrogen-regulated genes. Proc Natl Acad Sci USA 92:9450–9454

Starich MR, Wikstrom M, Arst HN Jr, Clore GM, Gronenborn AM (1998a) The solution structure of a fungal AREA protein-DNA complex: An alternative binding mode for the basic carboxyl tail of GATA factors. J Mol Biol 277:605–620

Starich MR, Wikstrom M, Schumacher S, Arst HN Jr, Gronenborn AM, Clore GM (1998b) The solution structure of the Leu22→Val mutant AREA DNA binding domain complexed with a TGATAG core element defines a role for hydrophobic packing in the determination of specificity. J Mol Biol 277:621–634

Svensson EC, Tufts RL, Polk CE, Leiden JM (1999) Molecular cloning of FOG-2: A modulator of transcription factor GATA-4 in cardiomyocytes. Proc Natl Acad Sci USA 96:956–961

Talbot D, Grosveld F (1991) The 5′HS2 of the globin locus control region enhances transcription through the interaction of a multimeric complex binding at two functionally distinct NF-E2 binding sites. EMBO J 10:1391–1398

Thiebaud CH, Fischberg M (1977) DNA content in the genus Xenopus. Chromosoma 59:253–257

Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680

Tsang AP, Visvader JE, Turner CA, Fujiwara Y, Yu C, Weiss MJ, Crossley M, Orkin SH (1997) FOG, a multitype zinc finger protein, acts as a cofactor for transcription factor GATA-1 in erythroid and megakaryocytic differentiation. Cell 90:109–119

Voisard C, Wang J, McEvoy JL, Xu P, Leong SA (1993) urbs1, a gene regulating siderophore biosynthesis in Ustilago maydis, encodes a protein similar to the erythroid transcription factor GATA-1. Mol Cell Biol 13:7091–7100

Whyatt DJ, deBoer E, Grosveld F (1993) The two zinc finger-like domains of GATA-1 have different DNA binding specificities. EMBO J 12:4993–5005

Yang Z (1997) PAML: A program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556

Yang HY, Evans T (1992) Distinct roles for the two cGATA-1 finger domains. Mol Cell Biol 12:4562–4570

Zhou LW, Haas H, Marzluf GA (1998) Isolation and characterization of a new gene, sre, which encodes a GATA-type regulatory protein that controls iron transport in Neurospora crassa. Mol Gen Genet 259:532–540

Zhu J, Hill RJ, Heid PJ, Fukuyama M, Sugimoto A, Priess JR, Rothman JH (1997) end-1 encodes an apparent GATA factor that specifies the endoderm precursor in Caenorhabditis elegans embryos. Genes Dev 11:2883–2896
