New Sequence Motifs in Flavoproteins: Evidence for Common Ancestry and Tools to Predict Structure



Olivier Vallon*
Institut de Biologie Physico-Chimique, CNRS, Paris, France

ABSTRACT We describe two new sequence motifs, present in several families of flavoproteins. The “GG motif” (RxGGRxxS/T) is found shortly after the βαβ dinucleotide-binding motif (DBM) in L-amino acid oxidases, achacin and aplysianin-A, monoamine oxidases, corticosteroid-binding proteins, and tryptophan 2-monooxygenases. Other dispersed sequence similarities between these families suggest a common origin. A GG motif is also found in protoporphyrinogen oxidase and carotenoid desaturases and, reduced to the central GG doublet, in the THI4 protein, dTDP-4-dehydrorhamnose reductase, soluble fumarate reductase, steroid dehydrogenases, Rab GDP-dissociation inhibitor, and in most flavoproteins with two dinucleotide-binding domains (glutathione reductase, glutamate synthase, flavin-containing monooxygenase, trimethylamine dehydrogenase . . . ). In the latter families, an “ATG motif” (oxhhhATG) is found in both the FAD- and NAD(P)H-binding domains, forming the fourth β-strand of the Rossmann fold and the connecting loop. On the basis of these and previously described motifs, we present a classification of dinucleotide-binding proteins that could also serve as an evolutionary scheme. Like the DBM, the ATG motif appears to predate the divergence of NAD(P)H- and FAD-binding proteins. We propose that flavoproteins have evolved from a well-differentiated NAD(P)H-binding protein. The bulk of the substrate-binding domain was formed by an insertion after the fourth β-strand, either of a closely related NAD(P)H-binding domain or of a domain of completely different origin. Proteins 2000;38:95–114. © 2000 Wiley-Liss, Inc.

Key words: FAD; NAD; NADP; dinucleotide; oxidase; reductase; cofactor binding; Rossmann fold

INTRODUCTION

Sequence motifs are those regions of a protein where variability is limited, usually because of structural constraints linked either to folding or to interactions with cofactors, substrates, or other proteins. Much effort is devoted to identifying motifs conserved between distantly related protein families: they can serve as “anchoring sites” for predicting the fold of a polypeptide chain, provided the structure of a distant relative is available. One of the earliest and most fruitful examples of such correlation efforts was the discovery by Rossmann and collaborators1 of a conserved sequence motif, reflecting a common chain fold for several dehydrogenases that use NADH as a cofactor. The basic structure consists of a parallel β-sheet, made up of six strands connected in a right-handed fashion by α-helices that flank the sheet on both sides. The structure has overall twofold rotational symmetry, with the first half of the sheet more specifically involved in binding the adenine moiety, the second the nicotinamide moiety. The nucleotide binds near the C-terminal end of the strands. When the central sheet is viewed with the C-terminal ends of the strands pointing upward and the first helix in the foreground (as in Fig. 5), the strands, reading left to right, are in the order 6, 5, 4, 1, 2, and 3. The nicotinamide points slightly to the front, the adenine to the back.

A slightly modified version of this fold was subsequently identified in the FMN-binding protein flavodoxin2 and in NADPH- and FAD-binding proteins.3 Structural information on the latter type has come mostly from the analysis of the glutathione reductase (GR)-related family of flavoprotein pyridine nucleotide reductases, which possess one FAD- and one NAD(P)H-binding domain.4 In this category of proteins, the basic fold of each domain differs slightly from that of dehydrogenases, because the third and fourth strands of the parallel β-sheet are connected not by a helix but by an antiparallel β-sheet (see Schulz5) that covers one side of the central sheet. In the NAD(P)H-binding domain of these proteins, the fifth and sixth strands of the central β-sheet are missing. In the FAD-binding domain, the fifth strand is retained but close to the end of the sequence, i.e., after the NAD(P)H-binding domain.

Abbreviations: CBP, corticosteroid-binding protein; COX, cholesterol oxidase; CMO, cyclic monooxygenases; DAO, D-amino acid oxidase; DBM, dinucleotide-binding motif; FCSD, flavocytochrome c-sulfide dehydrogenase; FMO, flavin-containing monooxygenase = dimethylaniline monooxygenase; GOX, glucose oxidase; GR, glutathione reductase; LAO, L-amino acid oxidase; MAO, monoamine oxidase; PHH, p-hydroxybenzoate hydroxylase; Rab-GDI, Rab GDP dissociation inhibitor; SFR, soluble fumarate reductase; TMO, tryptophan 2-mono-oxygenase; TMD, trimethylamine dehydrogenase.

Grant sponsor: Centre National de la Recherche Scientifique; Grant number: UPR1261.

*Correspondence to: Olivier Vallon, Institut de Biologie Physico-Chimique, UPR 9072 CNRS, 13 rue Pierre et Marie Curie, F-75005 Paris, France. E-mail: [email protected]

Received 1 December 1998; Accepted 19 August 1999

PROTEINS: Structure, Function, and Genetics 38:95–114 (2000)

© 2000 WILEY-LISS, INC.

Most of the dinucleotide-binding proteins that adopt the Rossmann fold show a series of conserved amino acid residues at a few crucial positions, whereas other regions of the sequence can vary considerably. Alignments of protein sequences have been proposed on the basis of pairwise and multiple structural alignments1,3,6 and have been used to generate a sequence consensus for the central part of the fold. This motif, presented in its most elaborate form by Wierenga et al.,7 applies to NADH- as well as FAD-binding domains and in a modified form to NADPH-binding domains. In the following, this sequence signature will be referred to as the dinucleotide-binding motif (DBM). The basic DBM consensus can be written:

ohxhxGxGxxGxxxhxxhxxxx....hxhxD/E

eeeeee hhhhhhhhhhhhhhh eeeee

where o stands for a polar or charged residue and h for a hydrophobic residue. The sequence constitutes the “βαβ” motif: the first two β-strands and the intervening amphipathic α-helix (as indicated by e and h, respectively, underneath). The conserved central GxGxxG sequence allows close approach of the polypeptide backbone to the ADP ribose and pyrophosphate and close packing of the helix with the β-sheet. The second and third Gs can be replaced by A in NADH or NADPH binding sites.8 The hydrophobic residues provide hydrophobic interactions between the α-helix and the β-sheet. The terminal residue (generally an E in flavoproteins, a D in NADH-binding proteins) hydrogen bonds to the ribose of the adenine moiety. At that position, a noncharged residue is usually found in NADPH-binding sites, to accommodate the phosphate group of the cofactor (see, however, Baker et al.9), followed by a positively charged residue. The importance of these and another residue downstream for NADH/NADPH discrimination has been established by site-directed mutagenesis.10
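As an illustration only, the consensus notation above can be translated mechanically into a regular expression. The sketch below is not the procedure used in this study: the membership of the residue classes h and o, the reading of the run of dots as a variable-length gap (here up to 12 residues, matched as short as possible), and the function name are all assumptions. The optional G-to-A replacement at the second and third glycines is not modeled. The test sequence is the N-terminal fragment of human MAO-B as it appears in the Figure 1 alignment, with gaps removed.

```python
import re

# Assumed residue classes: the text only says "hydrophobic" and "polar or charged".
HYDROPHOBIC = "ACFILMVWY"
POLAR = "DEHKNQRST"

DBM_CONSENSUS = "ohxhxGxGxxGxxxhxxhxxxx....hxhxD/E"


def consensus_to_regex(consensus, max_gap=12):
    """Translate the consensus notation into a regular expression string."""
    parts = []
    i = 0
    while i < len(consensus):
        symbol = consensus[i]
        if symbol == ".":
            # Assumption: a run of dots stands for one variable-length gap.
            while i < len(consensus) and consensus[i] == ".":
                i += 1
            parts.append(".{0,%d}?" % max_gap)  # lazy: prefer the shortest gap
            continue
        if i + 2 < len(consensus) and consensus[i + 1] == "/":
            # "D/E" means that either residue is accepted at this position.
            parts.append("[" + symbol + consensus[i + 2] + "]")
            i += 3
            continue
        if symbol == "o":
            parts.append("[" + POLAR + "]")
        elif symbol == "h":
            parts.append("[" + HYDROPHOBIC + "]")
        elif symbol == "x":
            parts.append(".")
        else:
            parts.append(symbol)  # invariant residue such as G
        i += 1
    return "".join(parts)


DBM_REGEX = re.compile(consensus_to_regex(DBM_CONSENSUS))

# Human MAO-B fragment transcribed from the Figure 1 alignment (gaps removed).
MAO_B = "DVVVVGGGISGMAAAKLLHDSGLNVVVLEARDRVGGRTYTLRNQKVKYVDLGGSYVGPTQ"

match = DBM_REGEX.search(MAO_B)
if match:
    last = match.end() - 1
    print("DBM spans", match.start(), "to", last, "and ends on", MAO_B[last])
```

With these assumptions the match starts at the first residue of the fragment and ends on the glutamate at 0-based index 28. A greedy gap would instead run on to a later aspartate, which is why the gap is matched lazily here.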

In addition to this motif, a second fingerprint has been identified by Eggink et al.11 in FAD-binding proteins of the GR family. The characteristic sequence is TxxxxhfhhGD, where f is an aromatic residue. The hydrophobic residues belong to the fifth β-strand of the FAD-binding domain, found near the C-terminus of the protein. The invariant G provides for a turn at the end of the β-strand, and the terminal D H-bonds to the O3* (and sometimes O4*) of the ribityl in the flavin moiety of the cofactor. The initial T participates in the formation of a “Greek key” motif found just before the fifth strand.

The Rossmann fold is the prevalent, although not unique,12,13 way adopted by proteins to bind dinucleotides. In the case of flavoproteins, numerous families have been described, and in most cases a classical DBM can be found in the N-terminal part of the sequence. However, outside of this region, sequence comparison programs in general have failed to identify sequence similarities between these families. The three-dimensional (3-D) structure of several flavoproteins is known at atomic resolution, and their folding pattern appears much more similar than would have been suggested by comparison of their sequences. The question of whether all these flavoproteins have a common origin or have convergently acquired their common structural similarities and DBM sequence has hitherto remained largely unanswered.

In a previous study, we presented the cloning and sequencing of a complementary deoxyribonucleic acid (cDNA) for a flavoprotein, the catalytic subunit of the periplasmic L-amino acid oxidase (LAO) of the green alga Chlamydomonas reinhardtii.14 Sequence comparison has revealed the presence of a short “GG motif,” found in LAOs and in a wide variety of other flavoprotein families. In addition, we identified another, hitherto overlooked sequence signature, the “ATG motif,” present in a large collection of flavoproteins and in many NAD(P)H-binding proteins. Finally, we systematically examined the presence in flavoproteins of the fingerprint described by Eggink et al.11 Our analysis lends support to the hypothesis of a common evolutionary origin for these flavoprotein families, rooted among dehydrogenases.

MATERIALS AND METHODS

Sequences and 3-D structures were retrieved from databases by using a web interface (www.infobiogen.fr/srs5). Homology searches were performed with the BLASTP or PSI-BLAST programs15 on the NCBI server. Multiple alignments were generated with MACAW16 or CLUSTALW17 (www.genome.ad.jp/SIT/CLUSTALW.html) or were retrieved from the PUMA database (www-c.mcs.anl.gov/home/compbio/PUMA/). Secondary structure predictions were generated with PSIpred (http://globin.bio.warwick.ac.uk/psipred/). 3-D structures were visualized with RASWIN.18 Interatomic distances and torsion angles were computed with the CCP4 program suite.19

RESULTS

GG Motif

Besides the classical DBM, the C. reinhardtii LAO shows only weak sequence similarity to the other LAO sequences, and we propose to classify it in a family different from the other LAOs. However, all LAOs share the sequence RxGGRxx(S/T) (hereafter referred to as the GG motif), located exactly four residues downstream of the DBM. This sequence was also found, at the same distance from the DBM, in many other families of flavoproteins (listed in Table I). Suspecting that the GG motif was indicative of homology, we attempted to identify other conserved sequences between these proteins. We found that four of these families share sufficient similarity to be considered homologous, i.e., of probable common ancestry: non-algal LAOs, corticosteroid-binding proteins (CBP), monoamine oxidases (MAO), and tryptophan 2-mono-oxygenases (TMO). Selected regions of a MACAW alignment are presented in Figure 1 (the entire file is available from the author upon request, as are all the other alignments used in this study). The line above the human MAO-B in Figure 1 is a secondary structure prediction for this sequence, generated with the program PSIpred. Other sequences yielded similar predictions for the conserved regions (not shown). The C. reinhardtii LAO and the two molluscan antibacterial proteins were also included in the alignment, even though the similarity to the other protein families is somewhat lower and homology is less certain. Carotenoid desaturases (not shown) show a weak similarity to these sequences in the middle (hhhxxP/S) and last (FAGE) similarity blocks (see Table V). With protoporphyrinogen oxidases, similarity was limited to the DBM and GG motif.
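The positional statement above (GG motif starting four residues downstream of the last DBM residue) can be checked on a single sequence with a short script. This is a minimal sketch under stated assumptions, not the search actually performed: the function name is hypothetical, and the index of the DBM terminal residue (0-based position 28, the glutamate in VVVLE) was assigned by hand for the human MAO-B fragment taken from the Figure 1 alignment.

```python
import re

# RxGGRxx(S/T), written as a regular expression.
GG_MOTIF = re.compile(r"R.GGR..[ST]")


def gg_motif_offsets(sequence, dbm_last_index):
    """Return, for each GG motif found, its start position counted in
    residues from the last residue of the DBM."""
    return [m.start() - dbm_last_index for m in GG_MOTIF.finditer(sequence)]


# Human MAO-B fragment transcribed from the Figure 1 alignment (gaps removed).
MAO_B = "DVVVVGGGISGMAAAKLLHDSGLNVVVLEARDRVGGRTYTLRNQKVKYVDLGGSYVGPTQ"
DBM_LAST_INDEX = 28  # assumption: hand-assigned position of the terminal E

offsets = gg_motif_offsets(MAO_B, DBM_LAST_INDEX)
print(offsets)
```

For this fragment a single motif (RVGGRTYT) is found, starting four residues after the DBM terminal residue.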

Expanding the Search: The “GG Doublet”

In the GG motif, the two central glycines appear as the only strictly invariant residues in the families listed in Table I. In a library of 394 sequences of nucleotide-binding proteins (mostly flavoproteins), extracted from NBRF by using various versions of the DBM consensus, we found 50 sequences with a GG doublet 6 residues after the end of the DBM. The protein families in which this GG doublet was significantly conserved are presented in Table II. A few additional observations are detailed below.
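A library scan of this kind reduces to a positional test on each sequence. The sketch below assumes that the position of the last DBM residue is already known for every entry; the function name, the dictionary layout, and the hand-assigned indices are hypothetical. The first two fragments are transcribed from the Figure 1 alignment (gaps removed); the third is a synthetic control, made by replacing the GG doublet of the MAO-B fragment, and is not a real sequence.

```python
def has_gg_doublet(sequence, dbm_last_index, offset=6):
    """True if a GG doublet starts `offset` residues after the last DBM residue."""
    start = dbm_last_index + offset
    return sequence[start:start + 2] == "GG"


# name -> (sequence fragment, 0-based index of the last DBM residue)
LIBRARY = {
    "MAO-B (human, Figure 1)": ("DVVVVGGGISGMAAAKLLHDSGLNVVVLEARDRVGGRTYT", 28),
    "TMO (first TMO row, Figure 1)": ("KVAVIGAGISGLVVANELLHAGVDDVTIYEASDRVGGKLWS", 29),
    "synthetic control": ("DVVVVGGGISGMAAAKLLHDSGLNVVVLEARDRVASRTYT", 28),
}

hits = sorted(
    name
    for name, (sequence, dbm_last_index) in LIBRARY.items()
    if has_gg_doublet(sequence, dbm_last_index)
)
print(len(hits), "of", len(LIBRARY), "sequences carry the GG doublet:", hits)
```

Testing a fixed offset rather than searching for any GG keeps the check specific: a GG that occurs elsewhere, for instance inside the GxGxxG core of the DBM itself, is not counted.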

On the basis of sequence comparison, fumarate reductases can be divided into two families sharing significant similarities (Fig. 2). The GG doublet was conserved in the soluble fumarate reductases (SFR) that bind FAD noncovalently,20 but not in the other family, comprising the soluble thiol-fumarate reductases from archaebacteria21 and the membrane-bound succinate dehydrogenases of eubacteria and mitochondria22 that bind FAD covalently.23 In the latter family, the corresponding region contained the His residue to which FAD is covalently attached (replaced by Cys in archaebacteria). However, the presence of a GG doublet was not strictly correlated with noncovalent binding of FAD: L-aspartate oxidases, which can reduce fumarate and are related in sequence to succinate dehydrogenases,24 bind FAD noncovalently.25 Still, their sequence lacks a GG doublet (Fig. 2). Several steroid dehydrogenases showing dispersed similarity to SFRs have been included in the alignment of Figure 2. Here, the GG doublet was found at a variable distance from the DBM (15 to 18), but other regions of similarity indicate a common origin. The boxed sequences in Figure 2 are conserved only between fumarate reductases, succinate dehydrogenases, and aspartate oxidase: they may delineate the substrate-binding domain.

Also listed in Table II are several families of complexflavoproteins that share the particularity of containingtwo dinucleotide-binding domains, each with its DBM. Ingeneral, the first domain is involved in binding FAD, the

TABLE I. Flavoproteins With a GG Motif

Protein family: 5n6 Sequence identifiers References GG motif Exceptions

L-amino acid oxidase (LAO, EC 1.4.3.2) 516 U78797 (C. reinhardtii) 14 RxGGRxxSL-amino acid oxidase (LAO, EC 1.4.3.2)a 566 A38314 (N. crassa);

O93364 (Crotalus); BC542A (B. cereus); O34363 (B. subtilis); Z48565 (Syn-echococcus sp. PCC6301); MMU70430 (M. musculus FIG1)

54–57 RxGGRxx(S/T)

Antibacterial proteinb 526 ACHC_ACHFU; D83255 58, 59 RxGGRxx(S/T)Corticosteroid-binding protein (CBP)c 526 CBP1_CANAL;

FMS1_YEAST60, 61 RxGGRxxT

Mono-amine oxidase (MAO, EC 1.4.3.4) 5176 Vertebrate MAOs:AOFA_HUMAN; AOFB_HUMAN; AOFA_RAT; AOFA_BOVIN; S45812;X15609; AOF_ONCMYOther: Z78198 (C. elegans); e327500 (A. thaliana) AOFN_ASPNG (A.nudans MAO-N); PUO_MICRU (M. rubens putrescine oxidase); AOFH-_MYCTU (M. tuberculosis); Y782_SYNY3 (Synechocystis sp.); AB010716 (M.luteus tyramine oxidase); AJ223391 (A. nicotinovorans 6-hydroxy-L-nicotineoxidase)

50, 62, 63 RxGGRxx(S/T) AOFB_RAT: CxGGRxxTZ35602 (C. elegansR13G10.2): RxGGRxxDY782_SYNY3(S. sp. PCC6803): RxGGRxxG

Tryptophan 2-mono-oxygenase (TMO, EC 1.13.12.3) 596 TR2M_AGRRA; S28687; TR2M_AGRT4; 499582; TR2M_AGRT3; TR2M_PSESS;A20966; A53376;

64, 65 RxGG(R/K)xxS

TR2M_AGRVI: VxGGRxxT

Carotenoid desaturased 52762CRTD_RUBGE; CRTD_RHOCA; CRTD_RHOSH; CRTI_RHOCA;CRTI_RHOSH; PDEH_STRGR; e224562; CRTI_ERWHE; CRTI_AGRAU;X97985; S32171; S43324; CRTJ_MYXXA; CRTI_NEUCR; CRTI_NEUCR;CRTI_PHYBL; CRTI_MYXXA;2CRTI_SYNY3; CRTI_ARATH; CRTI_MAIZE; CRTI_SOYBN2S66625; ZCDS_SYNY32crtN: B55548

66–70 xxGG(R/K) YZ25_MYCTU;d1011212; 2749982: xxGGAZCDS_ARATH : xxGAKP49_STRLI; e318945;e1237742 : xxGGG

Protoporphyrinogen oxidase (EC 1.3.3.4)e 5116 HEMG_BACSU;g699193; PPOX_MYCTU; SCHEM14PO; YAM7_SCHPO; PPOX_HUMAN;PPOX_MOUSE; ATHPPOX; NTY13465

71, 72 RxGGxxx(S/T) d1022768: HxGGxxxSNTY13466: KxGGxxxS

aHomology between the Synechococcus and Neurospora proteins was overlooked in55 and 14 because of sequencing errors. The Fig 1 protein hasbeen classified as a MAO but has more similarity to LAOs.56,57

bAchacin and aplysianin-A have not been recognized previously as flavoproteins. Their antibacterial effect could be the result of H2O2 production.cFMS1, the yeast CBP, acts as a multicopy suppressor of fenpropimorph resistance (Genbank entry), suggesting a role in sterol biosynthesis.73

dKnown as phytoene, methoxyneurosporene, hydroxyneurosporene, z-carotene, or dehydrosqualene desaturases (dehydrogenases).eThe S. cerevisiae enzyme binds FAD covalently. It oxidizes protoporphyrinogen IX, whereas the B. subtilis sequence is that of a protoporphyrino-gen IX/coproporphyrinogen III bispecific enzyme.

97SEQUENCE MOTIFS IN FLAVOPROTEINS

3URWHLQ VS UHV�

'%0 **�PRWLIo h h h h G G G h h o h h h h E R G G R S/T D/E G A/G

/$2 6� VS �� S V L V L G A G M A G L T A A L S L L R R G h - - - - Q V T V I E Y Q N R I G G R L L S v p l k g g q - - - F S - E A G G G H F R A N M P/$2 1�F� ��� N I A I V G A G M S G L M T Y L C L T Q A G M T - - - N V S I I E G G N R L G G R V H T e y l s g g p f d y s y q E M G P M R F P N T I T ���),*� 0�P� �� K V V V V G A G V A G L V A A K M L S D A G H - - - - K V T I L E A D N R I G G R I F T F R D E K T G - - - W I g E L G A M R M P S S - - ���

&%3 6�F� �� K V I I I G A G I A G L K A A S T L H Q N G I Q - - - D C L V L E A R D R V G G R L Q T V T G Y Q G R k - - - y - D I G A S W H H D T L T ���&%3 &�D� � K V L I I G A G V S G L K A A E T I L S K S F L t g d D V L V V E A Q N R I G G R L K T T D T S Q S K l g i n y - D L G A S W F H D S L N ���

( ( ( ( ( & & & + + + + + + + + + + + + & & & & ( ( ( ( ( & & & & & & & ( ( ( ( ( ( & & & & & ( ( ( & & & & & & & & & &

0$2�% +�V� � D V V V V G G G I S G M A A A K L L H D S G L - - - - N V V V L E A R D R V G G R T Y T L R N Q K V K - - - Y V - D L G G S Y V G P T - Q ���0$2 6�J� � D V I V I G G G I S G L S A A K L L K E K G L - - - - S P V V L E A R D R V G G R T F T V Q N E Q T K - - - Y V - D L G G A Y V G P T - Q ���3XWUR[ 0�U� �� D V V V V G A G P A G L M A A R T L V A A G r - - - - T V A V L E A R D R V G G R T W S - K T V D G A - - - F L - E I G G Q W I S P D - Q ���0$2�1 $�Q� �� D V I V I G G G Y C G L T A T R D L T V A G F - - - - K T L L L E A R D R I G G R S W S - S N I D G Y - - - P Y - E M G G T W V H W H - Q ���0$2�� &�H� �� K I A I I G A G I S G I S T A R H L K H L G I - - - - D A V L F E A K D R F G G R M M D - D Q S L G V - - - S V g k - G A Q I I V G N - I ���0$2�� &�H� �� S I A I I G A G F A G L R A A Q R F E E L G I - - - - V Y T I F E G S D R I G G R V Y S - F P Y Q N G - - - Y L - Q F G A E Y I N G E - D ���0$2 $�W� ��� K V I V I G A G P A G L T A A R H L Q R Q G F - - - - S V T V L E A R S R V G G R V F T D R S S L S V - - - P V - D L G A S I I T G i e a ���

702 $�W� ��� K V A V I G A G I S G L V V A N E L L H A G V D - - - D V T I Y E A S D R V G G K L W S H A F R D A - P S V V A - E M G A M R F P P A A F ���702 3�V� �� R V A I V G A G I S G L V A A T E L L R A G V K - - - D V V L Y E S R D R I G G R V W S Q V F D Q T r P R Y I A - E M G A M R F P P S A T ���

$FKDFLQ �� D V A V V G A G P S G T Y S A Y K L R N K G q - - - - T V E L F E Y S N R I G G R L F T T H L P N V - P D L N L - E S G G M R Y F K N H H ���$SO\VLDQLQ�$ �� N I A I V G A G P S G A Y S A Y K M R H S G k - - - - D V G L F E Y C N R V G G R L Y T Y Q L P N T - P D V N L - E L G G M R Y I T G A H ���/$2 &�U� �� D V V V V G G G C G G I Y S A Y R L L S G T T L k p - S V C T F E A T N R V G G R I F S i r g l g k l a d m T V - D L G A Y R Y V D G R H ���

/$2 6� VS ��� R I Q G G N D R L P K A M A A A I - - - G S E R F I L D A - - P V V A I D Q Q A - - - - - - - - - N R A T V T V K D G R - - - - - T - - - - - -/$2 1�F� ��� A I D G G L N R L P L S F H P L V - - - D N A - T T L N r r l E R V A F D A E T - - - - - - - - - Q K V T L H S R N S y k d S F E S - - - - - -),*� 0�P� ��� R I V G G W D L L P R A L L S S L - - - S G A - L L L N A - - P V V S I T Q G R - - - - - - - - - N D V R V H I A T S L H - S E K T - - - - - -

&%3 6�F� ��� F A L - N Y D S V V Q R I A Q S F - - - P Q N W L K L S C - - E V K S I T R E P s - - - - - - - - K N V T V N C E D G T - - - - - V - - - - - -&%3 &�D� ��� L N K K G Y G Y L V E S L A K R I - - - P E S S L L L E E - - P V N K I I R N N K d a G - - - - - K R V L V E T I N G L - - - - - Q - - - - - -

+ + + & & + + + + + + + + + + + & & & & ( ( & & & ( ( ( ( ( ( ( & & & & ( ( ( ( ( & & & ( (

0$2�% +�V� ��� K F V G G S G Q V S E R I M D L L - - - G D R - V K L E R - - P V I Y I D Q T R - - - - - - - - - E N V L V E T L N H E - - - - - M - - - - - -0$2 6�J� ��� K F L G G S S Q I S E C M A K E L - - - G E R - V K M E S - - P V Y K I D Q T G - - - - - - - - - D M V E V E T L N K E - - - - - T - - - - - -3XWUR[ 0�U� ��� R V V G G M Q S V S E T M A A E L - - - G E D V V F L D T - - P V R T I R W A G D g g t y a e h v p g t p v t v w s d r l - - - - T - - - - - -0$2�1 $�Q� ��� K F K D G Q S A F A R R F W E E A a g t G R L G Y V F G C - - P V R S V V N E R - - - - - - - - - D A A R V T A R D G R - - - - - E - - - - - -0$2�� &�H� ��� V I T D G A Q R I I D F L A T - - - - - G L D - I R L N C - - P V K C I D W G R - - - - - - - - - D D R K V K I F F E N a e q a a - - - - - - -0$2�� &�H� ��� f n t w d S G V D S E A T L N N M - - - G F K - T I L D E - - L T S K V S K S K m r m l s k v e g l d y t g k f f q s r - - - - - - - - - - - -0$2 $�W� ��� M I K G G Y S R V V E S L A E - - - - - G L D - I H L N K - - I V S D V S Y v s d v s a m d n s k H K V R V S T S N G C - - - - - E - - - - - -

702 $�W� ��� M C P E G I S E L P R R I A S E V - - - V N G V S V S Q R - - I C H V Q V R A I Q K E K - - - - - T K I K I R L K S G I - - - - - S - - - - - -702 3�V� ��� L I P D G I S S L A A R L A D Q S - - - F D G K A L R D R - - V C F S R V G R I S R E A - - - - - E K I I I Q T E A G E - - - - - Q - - - - - -

$FKDFLQ ��� T L T D G M S A L P Q A L A D A F - - - L K S S T S H A L - - T L N R K L Q S L S K T d n g l y l l e f l e t n t h e g y t e e s n i t d l - -$SO\VLDQLQ�$ ��� T V E E G M Q K V P K E L I K E F - - - K K T S A S N Q V - - Q L N K Y L Q A I R S K s d h s f v l y f r p t s t v d g k t t i l d y r p l q r/$2 &�U� ��� l g f v t f v e e L T K L S M S L - - - G M K - L F L N S - - P V T K V D R m p s t s a d a q y a v s v m g p n g k i m k - - - - - - - - - - -

$7*�OLNH"h h h P/S Φ Φ

/$2 6� VS F Q G D A I I S T I P F T V L P E - - - - - - - - - - - V A V R P G - - W S A G - - - K R R M F A E M E W E Q T V K V I A Q T R S P V W L A Q N V H G w p m a ���/$2 1�F� S E H D Y A V I A A P F S I V K K - - - - - - - - - - - W R F S P A - - L D L T a p t L A N A I Q N L E Y T S A C K V A L E F R T R F W E H L P Q p i y g s c ���),*� 0�P� L T A D V V L L T A S G P A L Q R - - - - - - - - - - - I T F S P P - - L T R K - - - R Q E A L R A L H Y V A A S K V F L S F R R P F W H E E H I E G G H S N ���

&%3 6�F� Y N A D Y V I I T V P Q S V L N L s v q p e k n l r g r I E F Q P P - - L K P V - - - I Q D A F D K I H F G A L G K V I F E F E E C C W S N E S S - K i v t l ���&%3 &�D� I F C D Y L I V T V P Q S I L L L e e s s p y s - - - - I K W E P K - - L P Q R - - - L V E S I N S I H F G A L G K V I F E F D R I F W D N S K D - R f q i i ���

( ( & & ( ( ( ( ( & & + + + + + + + + & & & & & & + + + + + + + + + & & & & & & & ( ( ( ( ( ( & & & & & & & & & & & & & & (

0$2�% +�V� Y E A K Y V I S A I P P T L G M K - - - - - - - - - - - I H F N P P - - L P M M - - - R N Q M I T R V P L G S V I K C I V Y Y K E P F W R K K D Y - C G T M I ���0$2 6�J� Y K A K Y V I V A T P P G L N L K - - - - - - - - - - - M H F N P E - - L P P L - - - R N Q L I H R V P M G S V I K C I V Y Y R E N F W R K K G Y - C G T M V ���3XWUR[ 0�U� V R A K D V V V A V P P N L Y S R - - - - - - - - - - - I S F E P P - - L P R L - - - Q H Q M H Q H Q S L G L V I K V H A V Y E T P F W R D K G L - S G T g f ���0$2�1 $�Q� F A A K R L V C T I P L N V L S T - - - - - - - - - - - I Q F S P A - - L - - - - - - S T E R I S A M Q A G H V N M C t k v h a e v - - - - - - - - - - - - - ���0$2�� &�H� E E F D K V V I T T S L S V L K S n h s - - - - - - - - K M F V P P - - L P I E - - - K Q K A I D D L G A G L I E K I A V K F D R R F W D T V D A - D G l r t ���0$2�� &�H� - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - K M R A I E R A G F G S N L K I F L E Y E K P W W - - - - - - - - - - - ���0$2 $�W� Y L G D A V L V T V P L G C L K A e t - - - - - - - - - I K F S P P - - L P D W - - - K Y A S I K Q L G F G V L N K V V L E F P T V F W D D S v d y f g a t a ���

702 $�W� E L Y D K V V V T S G L A N I Q L r h c l t c d t n i f q a p - - - - - - - - - - - - V N Q A V D N S H M T G S S K L F L M T E R K F W L D H I L - P S C V L ���702 3�V� R V F D R V I V T S S N R A M Q M i h c l t d s e s f l s r d - - - - - - - - - - - - V A R A V R E T H L T G S S K L F I L T R T K F W I K N K L - P T T I Q ���

$FKDFLQ V C A R K V I L A I P Q S A L I H - - - - - - - - - - - L D W K P L - - R S E T - - - V N E A F N A V K F I P T S K V F L T F P T A W W L S D A V - K N P a f ���$SO\VLDQLQ�$ V C A R Q V I L A L P V F A L R R - - - - - - - - - - - L D W P P L - - H E G R - - - A E T A Y A A V R N M A A S K V F M T F D Q A W W L D R N F t D N T a f ���/$2 &�U� V V A K H V I F N T P Q R P L M R i l q n s n l a r v t e - - - - - - - - - - - - - - W S S A L D M P F P L M A A K M Y L Y Y D D A W W V y m n r t t g s l n ���

*'�OLNH" *�KHOL["Φ G Φ h hF A G E o G A G/A

LAO S. sp       v v t l a d f a W G E Q - P W I R G S F g g - P P L G G a w m i r - - - - - - - - - - - - - - - E W T T P E G L I H F A G D F T T M K - - S G W V E G A I E S G L R A A R Q I d p g a q
LAO N.c.        t g q f n R R C W A L D - P L E S A S W A S P T V - - G Q H E L Y L P - - - - - - - - - - - - - E Y F Q T R N N L V F V G E H T S Y T - - H A W I A S A L E S G I R G S V Q L L L E L G
FIG1 M.m.       d g r g V V K R W A E D - P H S Q G G F V V Q P P L Y G R E A E D Y - - - - - - - - - - - - - - D W S A P F G R I Y F A G E H T A L P - - H G W V E T A V K S G L R A A V R I N N N Y G

CBP S.c.        L R N I I V S N W T R D - P Y S R G A Y S A C F P - - - - - G D D P V D M V v a - - - - - - - - M S N G Q D S R I R F A G E H T I M D G - A G C A Y G A W E S G R R E A T R I S D L L K
CBP C.a.        P I N T I V T D W T T N - P Y I R G S Y S T M Y T - - - - - N D D P S D L I i s l s g d f e - - D L G I L E P Y I K F A G E H T T S E G - T G C V H G A Y M S G I Y A A D C I L E N I F

PSIpred (MAO-B) C C C C E E E E C C C C C C C C C C C C C C C C C C C C C C H H H H H C C C C C C E E E E E C C C C C C C C H H H H H H H H H H H H H H H H H H H H H C

MAO-B H.s.      P V H Y E E K N W C E E - Q Y S G G C Y T T Y F P P G I L - - T Q Y G R - - - - - - - - - - - - V L R Q P V D R I Y F A G T E T A T H W - S G Y M E G A V E A G E R A A R E I L H A M G
MAO S.g.        P V H Y E E K N W C E E - E Y S G G C Y T A Y F P P G I L - - T Q Y G K - - - - - - - - - - - - V L R E P V G R L Y F A G T E T A T E W - S G Y M E G A V Q A G E R A A R E V M Y E M G
Putrox M.r.     P V V Y Y E S D F G S E - E W T R G A Y A A S Y D L G G L - - H R Y G A - - - - - - - - - - - - H Q R T P V G P I R W A G S D L A A E G - Y Q H V D G A L R Q G R L A A A E V L G A G S
MAO-N A.n.      v K R L V F H N W V K D - E F A K G A W F F S R P - G M V - - S E C L Q - - - - - - - - - - - - G L R E K H R G V V F A G S D W A L G W - R S F I D G A I E E G T R A A R V V L E E L G
MAO?? C.e.      P L G H M M S H W G A D - R F V G M S Y T F V P F g s d g d a t - Y N Q - - - - - - - - - - - - L K K S I D E K L Y F A G E H T I A A E - P Q T M A G A Y I S G L R E A G Q I V M S L K
MAO?? C.e.      i Q K I Y R H N W I T D - E F A R G S Y S Y I S E n t c h s N T D D I K I L r d p - - - - - - - V L R N R K P I I C F A G E H T D S K M - Y Q T A V G A S R S G L R E A D R I F N Y M R
MAO A.t.        p v a s V V T D W G T E - P Y S Y G A Y S Y v a i g a s g e d y d - - - - - - - - - - - - - - - V L G R P V Q N L F F A G E A T C K E H - P D T V G G A M M T G V R E A V R I I D I L R

TMO A.t.        E R Y V L H H D W L T D - P H S A G A F K L N Y P - - - - - G E D V Y S q r l f f q p m t a - - N S P N K D T G L Y L A G C S C S F A - - G G W I E G A V Q T A L N S A C A V L R S T G
TMO P.s.        D Q N V I Q H D W L T D - E N A G G A F K L N R R - - - - - G E D F Y S e e l f f q a - - - - - L D T A N D T G V Y L A G C S C S F T - - G G W V E G A I Q T A C N A V C A I I H N C G

Achacin         P K S G T S Q F W S S Y - P F - E G D W T V W K A G Y H c e y T Q Y I I - - - - - - - - - - - - E R P S L I D D V F V V G S D H V N C I E N A W T E S A F L S V E N V F E K Y F - - - -
Aplysianin-A    P K T A V S Q F W T D Y - P F - G C G W I T W R A G Y H f d d V M S T M - - - - - - - - - - - - R R P S L K D E V Y V V G A D Y S W G L M S S W T E G A L E T A D A V L K D Y F k g e c
LAO C.r.        P S Q A V V V V W d p q i Q W T G G A W H S Y K N g i y g - S Q V S T n y t d p r n a v p v a s I K P F A G E N V W V A N E A F S A V - - Q G W A E G S L I M A E N V V T Q L g a a r p

Figure 1.


second one NAD(P)H. The GG motif was always found after the first DBM (Table II). An alignment of several of these families is presented in Figure 3.

In particular, a series of proteins related to the baiH gene product of Eubacterium sp.26 present this type of organization in their C-terminal region. The 3-D structure of a protein in this family, trimethylamine dehydrogenase (TMD),27,28 allowed us to visualize the position of the GG doublet in the molecule (Fig. 4). In this protein, ADP, not FAD, occupies the first DBM.29 The GG doublet (residues 425–426) is located at the base of an α-helix that points toward the second dinucleotide-binding domain. The dihedral angles of these Gly residues (Table III) fall in a region of the Ramachandran plot that is not allowed for other types of residue.30 In addition, the first Gly is in contact with central residues in the DBM, whereas the second one is in close proximity to the pyrophosphate and ribose of the ADP cofactor (Table III). The packing is so dense that the introduction of side chains at these positions appears impossible without extensive rearrangement of the backbone. The next residue, H427, also plays an important structural role, because it makes main chain and side chain contacts with atom O2A of the cofactor. These strong structural constraints may explain the excellent conservation of this GG doublet in the family.
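The dihedral-angle argument can be made concrete with a small Python sketch that computes a backbone torsion angle from four atomic positions. For residue i, φ is conventionally measured over C(i−1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). The function name `dihedral_deg` and the toy coordinates are illustrative assumptions; they are not values taken from the TMD structure or from Table III.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points.

    The angle is measured about the p1->p2 axis, between the
    plane (p0, p1, p2) and the plane (p1, p2, p3).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the axis
    # Components of b0 and b2 perpendicular to the axis
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy geometry (hypothetical coordinates): a quarter turn about the z axis
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # about 90 degrees
```

Applied to the backbone atoms of a deposited structure, the same function would give the φ and ψ values that place a residue on the Ramachandran plot.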

The folding pattern of the two C-terminal domains of TMD closely resembles that of the FAD and NADPH domains of GR.27 The similarity is striking in the regions that surround the ADP/FAD cofactors. In a structure-based alignment, the GG doublet in TMD was shown to superpose perfectly on two Gly residues in GR.29 This doublet is found not six, but five residues downstream of the DBM, which explains why GR did not appear in our initial screen for proteins with a GG doublet. In GR, as in TMD, the first Gly is close to the DBM, and the second one approaches the ribose AO3* atom of the cofactor (Table III). The residue immediately following the GG doublet, T57, makes main chain atom contact with the AO2 atom of FAD (as does H427 in TMD), plus side chain contact with AO1. The following α-helix runs alongside the ribityl of the FAD and contains the two redox-active Cys. In the structure-based alignment of GR and TMD presented in Figure 3, the GG doublet appears as one of the few cases of sequence identity between the two proteins.

A number of proteins share significant sequence and structural similarity with GR and are usually grouped in the superfamily of flavoprotein pyridine nucleotide reductases (a more appropriate term than the often used “flavoprotein disulfide reductases”). The GG doublet was perfectly conserved in these proteins (Table II), which is remarkable in view of their limited sequence similarity. The spacing between the end of the DBM and the GG doublet varied considerably between different families, and even within a family. Still, in the Families of Structurally Similar Proteins (FSSP) database,31 where a given structure is structurally aligned with all other structures in PDB, we found that nearly all the flavoprotein pyridine nucleotide reductases had a GG doublet superposed to that of GR (Table II). The only exceptions were NADH peroxidase and NADH oxidase of Enterococcus faecalis.32 In these proteins, the α-helix that follows the DBM starts farther away from the cofactor and contains modified Cys residues.32 In flavocytochrome c-sulfide dehydrogenase (FCSD) of Chromatium vinosum, which does not bind pyridine nucleotides but is structurally related to GR,33,34 the GG doublet was also missing, which probably correlates with the covalent binding of FAD via Cys residues in the α-helix following strand 2.

When we examined the available 3-D structures of flavoproteins with a single dinucleotide-binding domain (5 families), Rab GDP dissociation inhibitor (Rab-GDI)35 was the only one with a bona fide GG doublet. Although this protein does not bind any dinucleotide cofactor, a sequence resembling a DBM can be found near its N-terminus,36 and its 3-D structure is strikingly similar to that of a flavoprotein.37 In particular, a stretch of 12 residues (D-34 to S-45), including the GG doublet, was easily superimposable with the GR structure. Another interesting case was that of cholesterol oxidase (COX)38 and glucose oxidase (GOX).39 With use of the FSSP database, these proteins were found to contain two consecutive Gly residues that could be superimposed with the GG doublet of GR. Surprisingly, this sequence was found at +72 and +51, respectively, after the DBM, i.e., at a position completely different from the usual GG doublet. Still, the orientation, dihedral angles, and distance to the DBM are comparable with those in TMD and GR (Table III). As in GR and TMD, the GG doublet is close to the FAD, and the residue immediately following makes contact with the pyrophosphate (OA1). Based on their similar 3-D structure, COX and GOX have been grouped in a superfamily of flavoproteins, called GMC oxidoreductases.40,41 We have found perfect conservation of these two Gly residues in the GMC superfamily (not shown).

A Novel Conserved Sequence in GR-Related Proteins: The “ATG Motif”

We wondered whether other discrete conserved sequence motifs could be identified in flavoproteins, in addition to the DBM and GG motifs. We initiated our

Fig. 1. Three regions of a MACAW sequence alignment of MAOs and related sequences. The italicized line above the MAO-B sequence is the secondary structure prediction generated by PSIpred (C for random coil, E for β-strand, H for α-helix). Residues that can be aligned with sequences from another family appear in upper case, with gray shading denoting the best conserved regions. Conserved residues appear on black background and are identified on the upper line (in italics when conservation is not absolute; h stands for hydrophobic, o for polar, F for aromatic). The arrow points to the Cys residue that covalently binds the flavin in vertebrate MAOs.50 LAO from Chlamydomonas reinhardtii (U78797), Neurospora crassa (A38314), and Synechococcus sp. PCC6301 (Z48565); CBP from Saccharomyces cerevisiae (FMS1_YEAST) and Candida albicans (CBP1_CANAL); MAO from Homo sapiens (AOFB_HUMAN), Salmo gairdneri (AOF_ONCMY) and Aspergillus niger (S55273), and related sequences from Caenorhabditis elegans (R13G10.2 and F55C5.6) and Arabidopsis thaliana (ATFCA5: protein e327500); Putrox: putrescine oxidase from Micrococcus rubens (PUO_MICRU); FIG1 from Mus musculus (U70429); TMO from Agrobacterium tumefaciens (A20966) and Pseudomonas syringae (TR2M_PSESS); Achacin from Achatina fulica (ACHC_ACHFU) and Aplysianin-A from Aplysia kurodai (D83255).


TABLE II. Flavoproteins With a GG Doublet†

Protein family | n | Sequence identifiers | References | GG | Exceptions

I) One dinucleotide-binding domain:
THI4 enzyme(a) | 9 | THI4_YEAST; THI4_SCHPO; THI4_ARATH; TH41_MAIZE; TH42_MAIZE; THI4_ALNGL; THI4_FUSOX; THI4_FUSSH; THI4_METJA | 74–77 | +6 |
dTDP-4-dehydrorhamnose reductase(b) (EC 5.4.99.9) | 5 | GLF_MYCGE; GLF_MYCPN; GLF8_KLEPN; GLF1_KLEPN; GLF_ECOLI | 78, 79 | +6 |
Soluble fumarate reductase (SFR)(c) | 9 | e257742 (Z78020); OSM1_YEAST; YEF7_YEAST; JC5123; CAB16560; CE01D10; FRDA_SHEPU; CAB38558.1; CAB37062 | 20, 22, 80 | +6 |
3-Oxosteroid 1-dehydrogenases | 4 | 3O1D_COMTE; e280795 (Z82098); U59422; D37969 | 81 | +6 | e266575: GG at +8
Δ4,5-alpha steroid dehydrogenase | 1 | L23428 | 82 | +5 |
Rab GDP dissociation inhibitor (Rab-GDI)(d) | 21 | GDIA_BOVIN; GDIA_RAT; C56956; GDIA_HUMAN; GDIB_RAT; GDIB_MOUSE; GDIB_HUMAN; CELRABGDI; S36746; X93166; U62866; X94983; D83531; AF012823; AF016896; AF016897; RAE1_HUMAN; RAE2_HUMAN; RAE1_RAT | 35–37 | +6 | RAEP_YEAST: GD; YD4C_SCHPO: GA

II) Two dinucleotide-binding domains:
Glutamate synthases (GOGAT)(e) | 6 | A46602; B29617; Z83864; X89221; Z49889; JQ1977 | 83–88 | +6 |
GOGAT-like(f) | 5 | U20981; ECAE000333 (yffG); ECAE000372 (f644); AE000303 (o462); U73807 | 83–88 | +6 |
Dimethylaniline monooxygenase (FMO)(g) | 16 | FMO2_RABIT; L08449 etc. + CEC01H6; CELC46H11; CEK08C7(.2); CEK08C7(.5); CEK08C7(.7); YHX6_YEAST | 89–91 | +6 |
Cyclic monooxygenases (CMO)(h) | 10 | YZ20_MYCTU; Y4ID_RHISN; STCW_EMENI; AB010439; Z80108; AL021287; AL021942; AL021309; Z83864 | 92, 93 | +6 | CYMO_ACISP: AG
BaiH–Trimethylamine dehydrogenase(i) | 10 | BAIH_EUBSP; NADO_THEBR; 2649317 (AE001017); FADH_ECOLI; CAA15852 (AL010186); CAA71086 (Y09960); CAA76082 (Y16136); DHTM_METME; X89575 | 26, 53, 94–96 | +6 | BAIC_EUBSP: AG
Glutathione reductase (GR) | 9 | GSHR_ARATH; GSHR_CAEEL; GSHR_ECOLI; GSHR_HAEIN; GSHR_HUMAN; GSHR_PEA; GSHR_PSEAE; GSHR_SPISP; GSHR_YEAST | | +5/+15 |
Dihydrolipoamide dehydrogenase | 41 | DEECLP; DEHULP etc. + related: S42920; U10552; S70187; E35156 | | +5/+12 |
Thioredoxin reductase | 21 | U67594; AE000058; X79603; U82978; TRXB_COXBU; TRXB_ECOLI; TRXB_CLOLI; R34K_CLOPA; TRXB_EUBAC; TRXB_HAEIN; TRXB_MYCGE; TRXB_MYCLE; TRXB_MYCTU; TRXB_NEUCR; TRXB_PENCH; TRXB_STRCL; TRXB_STRCO; TRXB_YEAST; YHQ6_YEAST; U63713; S44027 | | +5/+11 |
Alkyl hydroperoxide reductase | 5 | DHNA_BACSP; DHNA_BACSU; U82598; AHPF_SALTY; JC2311 | | +3 |
Mercuric reductases | 11 | MERA_BACSR; MERA_PSEAE; MERA_SERMA; MERA_SHIFL; MERA_STAAU; MERA_STRLI; MERA_THIFE; RDPSHA; ECOMERTET; D90903; NCNR1MER | | +5 |
Trypanothione reductase | 5 | TYTR_CRIFA; TYTR_LEIDO; TYTR_TRYBB; TYTR_TRYCO; TYTR_TRYCR | | +14 |

†Figures in the GG column indicate its distance to the end of the DBM.
(a) Known in yeasts as MOL1 or NMT2, in other fungi as STI35, in plants as THI1. They are involved in thiazole biosynthesis.
(b) Products of the rfbD genes from bacteria, involved in O-antigen biosynthesis.
(c) The Shewanella putrefaciens flavocytochrome c binds FAD noncovalently and has an additional cytochrome c-like N-terminal domain.
(d) Rab-GDIs from mammals, yeast, C. elegans and fruit fly, and choroideremia-related gene products CHM and CHML. These proteins do not bind dinucleotides, but a DBM-like sequence is found near their N-terminus, and the 3-D structure is similar to that of a flavoprotein. The region of the GG doublet (from D-34 to S-45) can be superposed with that of GR.
(e) NADH- and NADPH-dependent GOGATs (EC 1.4.1.14 and .13, respectively) are made up of either one or two polypeptide chains. The C-terminal part (or β-subunit) contains the two DBMs.
(f) Formate dehydrogenase (subunit β) of Moorella thermoacetica and the product of two E. coli open reading frames, ORF644 and YffG (for YffG, the EMBL AE000333 entry appears correct, whereas the Swissprot YFFG_ECOLI and Genbank ECOAEG530A entries apparently contain sequence errors that disrupt the alignment).
(g) N-oxide-forming (EC 1.14.13.8), also known as flavin-containing monooxygenases. Microsomal enzymes involved in the detoxification of xenobiotics. Their second DBM binds NADPH. An FMO homologue (presumably YHX6_YEAST) has been described in yeast.
(h) Fungal and bacterial sequences related to cyclohexanone monooxygenase and steroid monooxygenase. Show homology to FMOs.
(i) The baiH and baiC gene products of Eubacterium; NADH oxidases of Thermoanaerobium brockii and Archaeoglobus fulgidus; 2,4-enoyl-CoA reductase of E. coli (the product of the fadH = ygiL gene) and Mycobacterium tuberculosis; 2-enoate reductase from Clostridium tyrobutyricum and Moorella thermoacetica; trimethylamine dehydrogenase (TMD) (EC 1.5.99.7) of Methylophilus methylotrophus and its close relative dimethylamine dehydrogenase (DMD) of Hyphomicrobium X.


search with flavoprotein pyridine nucleotide reductases, for which ample structural and sequence data were available. In the 87 sequences examined, we found that the region of the fourth β-strand in the FAD-binding domain obeyed the following consensus:

oohhhATG

where o stands for a polar or charged residue and h for a hydrophobic residue (o can be D, E, K, R, H, N, S, or P, rarely A; h can be I, V, L, F, Y, or A). In the human GR sequence, this corresponded to residues 150–157 (PHILIATG). In the various structures available, the β-strand starts at the first residue of the consensus or slightly before and always ends at the fifth. The last three residues form a connection with a β-strand leading into the NADPH-binding domain. The sixth and seventh positions could be occupied not only by A or T (dominant) but also by S, V, or I. However, the final G was perfectly conserved. In GR, it is located in close proximity to O3 of the cofactor, and its φ and ψ angles are not compatible with the presence of a side chain at this position (Table III). In the following, this consensus sequence will be referred to as the ATG motif.
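As an illustration of how the consensus above could be applied to a sequence, the following Python sketch encodes the residue classes exactly as listed in the text and scans for the eight-residue pattern. The names `ATG_MOTIF` and `find_atg_motifs`, and the flanking residues in the usage line, are hypothetical; this is a minimal sketch of a strict pattern match, not the search procedure used in the study.

```python
import re

# Residue classes as stated in the text
POLAR = "DEKRHNSPA"      # o: D, E, K, R, H, N, S, P, rarely A
HYDROPHOBIC = "IVLFYA"   # h: I, V, L, F, Y, A
POS_6_7 = "ATSVI"        # positions 6 and 7: A or T dominant, also S, V, I

# oohhhATG with a strictly conserved final G; the lookahead lets
# overlapping candidates be reported as well
ATG_MOTIF = re.compile(
    rf"(?=([{POLAR}]{{2}}[{HYDROPHOBIC}]{{3}}[{POS_6_7}]{{2}}G))"
)

def find_atg_motifs(seq):
    """Return (1-based start, matched segment) for each candidate motif."""
    return [(m.start() + 1, m.group(1)) for m in ATG_MOTIF.finditer(seq.upper())]

# The human GR segment quoted in the text, with made-up flanking residues
print(find_atg_motifs("MKPHILIATGSR"))
```

With these strict classes, the second human GR motif quoted in the text (DCLLWAIG) is not matched, because C and W fall outside the listed classes; any practical scan over the NAD(P)H-binding domain would therefore need broader classes than the ones assumed here.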

Extending our search to the NAD(P)H domain of these proteins, we found that the sequence corresponding to the fourth β-strand of the Rossman fold and the following loop in that domain also fitted the ATG motif. In human GR, this second ATG motif (DCLLWAIG) corresponded to residues 283–290, again just at the end of the NADPH-binding domain. The second residue of the motif could be hydrophobic instead of polar. The sixth residue was almost always an A, as in the FAD-binding domain, whereas the seventh residue was more often I than T. Again, the final G was perfectly conserved, and its φ and ψ angles are such that no other residue can be accommodated (Table III). The sequence similarity between the two ATG motifs is remarkable, especially in view of the limited overall sequence conservation between the two domains in each family and between the different families.

ATG and GD Motifs Are Present in Most Flavoproteins With Two Dinucleotide-Binding Domains

In flavoprotein pyridine nucleotide reductases and in rubredoxin reductase (EC 1.18.1.1), Eggink et al.11 described a consensus sequence encompassing the fifth β-strand of the FAD binding site. This motif can be written: TxxxxhfhhGD, and we refer to it as the GD motif. In addition, some sequence similarity has also been recognized in GR immediately downstream,42 corresponding to the fifth α-helix of the FAD-binding domain. A G residue is well conserved in the middle of the helix, where it comes closest to the fifth β-strand. The fourth position before that G is occupied by A, which in the available 3-D structures faces the G of the GD motif, and a hydrophobic residue is usually found immediately afterward. We call this AhxxG sequence the “G helix.”
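The GD motif and the G helix can be sketched as two linked patterns in Python. Several points below are assumptions made only for illustration: the symbol f is read as an aromatic residue (F, Y, or W), h uses the hydrophobic class given earlier for the ATG consensus, "immediately downstream" is modelled as a 30-residue window, and the test sequence is invented rather than taken from any real protein.

```python
import re

HYDROPHOBIC = "IVLFYA"   # h, as defined for the ATG consensus
AROMATIC = "FYW"         # assumption: 'f' in TxxxxhfhhGD denotes an aromatic residue

# TxxxxhfhhGD and AhxxG written as regular expressions
GD_MOTIF = re.compile(rf"T.{{4}}[{HYDROPHOBIC}][{AROMATIC}][{HYDROPHOBIC}]{{2}}GD")
G_HELIX = re.compile(rf"A[{HYDROPHOBIC}]..G")

def find_gd_with_g_helix(seq, window=30):
    """Locate GD motifs and the first G helix in the following window.

    Returns a list of (GD start, GD segment, G-helix start or None),
    with 1-based positions. The window length is a hypothetical choice.
    """
    hits = []
    for m in GD_MOTIF.finditer(seq):
        downstream = seq[m.end():m.end() + window]
        helix = G_HELIX.search(downstream)
        helix_pos = m.end() + helix.start() + 1 if helix else None
        hits.append((m.start() + 1, m.group(), helix_pos))
    return hits

# Invented sequence containing one GD motif followed by one AhxxG segment
print(find_gd_with_g_helix("MSTAKSDLFVLGDNNKRAVEEGQ"))
```

Because AhxxG is short and degenerate, anchoring the search to a preceding GD motif is what keeps the number of spurious hits manageable in a sketch like this.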

We found that the GD motif and the G helix could be recognized, together with the two ATG motifs, in all flavoprotein pyridine nucleotide reductases and in nearly all the other protein families that present two dinucleotide-binding domains (Table IV). Whenever possible, the 3-D structure was examined to confirm the assignment of the motifs (see Fig. 4). Figure 3 presents an alignment of representative sequences from four of these families (TMD, GOGAT, FMO, and CMO). Once these motifs have been aligned, other regions of sequence similarity appear: for those families in which no 3-D structure is available, this may allow a few precise structural predictions. The other flavoproteins with two dinucleotide-binding domains are listed in Table IV together with the motifs observed. When all these sequences were taken into consideration, the definition of the motifs had to be broadened somewhat. In the first ATG motif, some variability was found in the penultimate and antepenultimate positions, although the final G was perfectly conserved. In the second ATG motif, the second residue was more often hydrophobic than polar. The GD motif often was missing the initial T (correlated in available 3-D structures with a loss of the Greek key), and the second residue of the hydrophobic stretch was not necessarily aromatic as in GR. The G helix could be recognized only by virtue of its proximity to the GD motif. Its second residue was not necessarily hydrophobic, and the final G could be replaced by A.

As can be seen in Table IV, the two ATG motifs are present in nearly all the families examined, the only exceptions being linked to loss of dinucleotide binding. A clear GD motif (and G helix) can be recognized in all but four families, of which three may have a GD-like sequence. The good conservation of these motifs points to a common structural organization for all flavoproteins with two dinucleotide-binding domains, suggestive of a common origin. Their general organization can be described as follows (the parentheses are for sequence motifs that are not present in all families): DBM_FAD-(GG)-ATG_FAD-DBM_NAD(P)H-ATG_NAD(P)H-(GD_FAD)-(G helix).

Are ATG and GD Motifs Also Present in Proteins With a Single FAD-Binding Domain?

We wondered whether the ATG and GD motifs could also be identified in flavoproteins with a single dinucleotide-binding domain (Table IV). When the available 3-D structures were examined, only p-hydroxybenzoate hydroxylase (PHH)43 and the related phenol hydroxylase44 were found to show a bona fide GD motif, as noted in Eggink et al.,11 with the final D liganding O3*. The other flavoproteins all lacked the final GD residues in the fifth strand and used a different bonding pattern for the ribityl. In COX and GOX, O3* is bound by water molecules, not by a D residue as in the GD motif. In D-amino acid oxidase (DAO), the flavin moiety adopts a different conformation,45 turning its O3* in the opposite direction, away from the fifth strand.

In contrast, the ATG motif appeared much more widespread, at least among the flavoprotein families for which 3-D structures were available. It was found at the expected


Motif labels: DBM, GG. Consensus line: o h h h h G G G/A h h o h h h h o G G

PSIpred (OSM1)  H H H H H H H H H H H H H H H H H H H H H H C C C C C C C C C C C E E E E C C C H H H H H H H H H H H H C C C C C E E E E E C C C C C C C C C C C C H H H H H C C C C C C C C C C C C C C C H H H H H H H H H H H H H C C C H H H H H H H H H H H
OSM1 S.c.       s v r r v f i y v s i f v l i i v l k r t l s g t d q t s m k q P V V V I G S G L A G L T T S N R L I S K y r i P V V L L D K A A S - - I G G N S I K A S S G I N - - G A H T D T Q Q N L K V M D T P E L F L K D T L H S A K G R G V P S L M D K L T K E
F-like C.e.     m a s v l h - - - - - - - - - - - - a e p - - - - - - - - - - - T V V V V G G G L A G L S A S L D I I N G G G - R V I L V E G E G N - - T G G N S A K A S S G I N - - G A G T E T Q E K L G I K D S P E L L V K D T L S A G D S E N D K K L V E I L A A N

PSIpred (SucD)  C C C C C C C C C C C C C C C C C C C C C C C C C C E E E E C C C H H H H H H H H H H H H C C C C E E E E E C C C C C C C C C E E C C C C C E E C C C C C C C C H H H H H H H H H H H H C C C C C C H H H H H H H H H H
SucD S.c.       q t q g s v n g s a s r s a d g k y h i i d h e y - - - - - - - D C V V I G A G G A G L R A A F G L A E A G Y - K T A C I S K L F P - - T R S H T V A A Q G G I N - - A A L G N m h - - - - - K D N W K W H M Y D T V K G S D W L G D Q D S I H Y M T R E
SucD M.j.       m k t - - - - - - - - - - - - - - - - - - - - - - - - - - - - - D I L I I G G G G A A A R A A I E C R D K n - - v I I A V K G L F G - - K S G C T V M A E G G Y N - - A V F N P - - - - - - - K D S F K K H F Y D T V K G G G F I N N P K L V E I L V K N
AspO B.s.       m s k k - - - - - - - - - - - - - - - - - - - - - - - - - - - - T I A V I G S G A A A L S L A A A F P P S y - - E V T V I T K K S V - - K N S N S V Y A Q G G I A - - A A Y A K - - - - - - - D D S I E A H L E D T L Y A G C G H N N L A I V A D V L H D
3OSD M.t.       m t v - q e f - - - - - - - - - - - - - - - - - - - - - - - - - D V V V V G S G A A G M V A A L V A A H R G L - S T V V V E K A P H - - Y G G S T A R S G G G V W - - I P N N E V L K R R G V R D T P E A A R T Y L H G I V G E I V E P E R I D A Y L D R
S-like M.t.     m a l t c t d m s d a v a g s d a e g l t a - - - - - - - - - - D A I V V G A G L A G L V A A C E L A D R G L - R V L I L D Q E N R a n V G G Q A F W S F G G L F - - L V N S P E Q R R L G I R D S H E L A L Q D W L G T A A F D R P E D Y - - - - - - -
4,5SD C.t.      m s n v k k h v s t i n p v g e v l d v g s a d e v q w s d a s D V V V V G W G G A G A S A A I E A R E Q G A - E V L V I E R F S G - - - G G A S V - L S G G V V y a G A V P A T R R K - p A S R F T E A M T A Y L K H E V N G V V S D E T L A R F S R D

PSIpred (OSM1)  C C H H H H H H H H C C C C C C C C C C C C C C C C E E C C C C C C C C C C C H H H H H H H H H H H H H H C C C C C C C E E E E E E H H C E E E E
OSM1 S.c.       S K S A I R W L q T E F D L K L D L L A Q L G G H S V P R T H - - - - R S S G K - L P P - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - G F E I V Q A L S K K L K D I S S K D S N L V Q I M L N S E V V D I
F-like C.e.     S A D A V E F L - R G V E V D L T D V N L C G G H S V P R T H w i p s P K E G R p I P A - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - G F E I M K R L R T R L N E K Q S E N P E A F K L L T Q T K M V G I

PSIpred (SucD)  H H H H H H H H H H H C C C C C C C C C C C E E E C C C C C C C C C C C C C C C C C E E E E E H H H H H H H H H H H H H H H H H H C C C E E E E E E E E E E E
SucD S.c.       A P K S - - - - - - - - - - - - - - I I E L E H Y G V P F S R - - - - T E N G K - - - - - - - - - - - - - - - - - - - - - I Y Q R A F G G Q T k e y g k g a q A Y R T C - A V A D R T G H A L L H T L Y G Q A L R h d - - - - - - T H F F I E Y F A L D L
SucD M.j.       A P K E - - - - - - - - - - - - - - L L N L E R F G A L F D R - - - - T E D G F - - - - - - - - - - - - - - - - - - - - - I A Q R P F G G Q S - - - - - - - - F N R T C - Y C G D R T G H E I M R G L m e y i s k f e R - - - - - I K I L E E V M A I K L
AspO B.s.       G K M M V - - - - - - - - - - - - - Q S L L E R - G F P F D R - - - - N E R G G - - - - - - - - - - - - - - - - - - - - - V C L G R E G A H S - - - - - - - - Y N R i f h A G G D A T G R L L I D Y L l k r i n s - - K - - - - - I K L I E N E T A A D L
3OSD M.t.       G P E M L S F V l k h t p l k m c w V P G Y S D Y Y P E A P G G R - P G G R S I E P K P f n a r k l g a d m a g l e p a y g k v p l n v v v m q q d y v r l n Q L K R H P R G V L R S M K V G A R T m w a k a t g k n l v g m - - - - - - G R A L I G P L
S-like M.t.     - - - - - - - - - - - - - - - - - - - - - W P E Q W A H A Y V D F a A G E K R S W L R A r g l k i f p - - - - - - - - - - - - - - - - - - - - - - - - - - - - L V G W A E R G G Y D A Q G H G N S V p r f - - - - - - - - - - h i t w g t G P A L V D I F
4,5SD C.t.      S V T N L N W L e k q g a t f a s t M P G Y K T S Y P a d g m y l y y s g n e v v p a y g n p q l l k k p p p r g h r v v a k g q s g a m f f a a l q k s t l a h g a r t l t q a r v q r l v r e k d s g r v l g v e v m v l p e g d p r t e r h k k l d

Motif label: ATG motif. Consensus line: o o h h h A T G G

PSIpred (OSM1)  E E C C C C E E E E E E E E C C C C C E E E E E C C E E E E E C C C C C C C H H H H H H C C C C C E E E E C C C C C C C C H H H H H H H H H H C C C C C C C C C E E C
OSM1 S.c.       E L d N Q G H V T G V - - - - - - - - - - - - - - - - - - - - V Y M D E N G N R K I M K S - - - - - - - - - - - - - - - H H V V F C S G G F G Y S K E M L K E Y S P N L I H L P - - - - - T T N G K Q T - T G D G Q K I L S K L G - A E L I D M D Q V Q V
F-like C.e.     L R - E N G K V S G I e - - - - - - - - - - - - - - - - - - - I V T S A S G E K S K I I G - - - - - - - - - - - - - - - D A V I L A T G G F S A D E T L L K E F G A E I F G F P - - - - - T T N G A F A - K G D d - - - - - - - - - - - - - - - - - - - -

PSIpred (SucD)  E E C C C E E E E E E E E E C C C C C E E E E E C C E E E E E C C C C C C C C C C C C C C C C C C H H H H H H H H H H C C C C C C C C E E E C
SucD S.c.       L T - H N G E V V G V I - - - - - - - - - - - - - - - - - - - A Y N Q E D G T I H R F R A - - - - - - - - - - - - - - - H K T I I A T G G Y G R - - - - - - - - - - - - A Y F S - - - - - C T S A H T C - T G D G N A M V S R A G - F P L Q D L E F V Q F
SucD M.j.       I V - K D N R C Y G A I - - - - - - - - - - - - - - - - - - - F L D L K T G N I F P I F A - - - - - - - - - - - - - - - K A T I L A T G G A G Q - - - - - - - - - - - - L Y P I - - - - - T S N P I Q K - T G D G F A I A Y N E G - A E L I D M E M V Q F
AspO B.s.       L I - E D G R C I G V - - - - - - - - - - - - - - - - - - - - M T K D S K G R L K V R H A - - - - - - - - - - - - - - - D E V V L A A G G C G N - - - - - - - - - - - - L F L H - - - - - H T N D L T V - T G D G L S L A Y R A G - A E L T D L E F T Q F
3OSD M.t.       R I G L Q R A G V p - V E L N T A F T D L F V E - N G V V S G V Y V R D s h e a e s a e p q l i r a r - - - - - - - - - R G V I L A C G G F E H N E Q M R I K Y q r a p i t t e w - - - - T V G A S A N - T G D G I L A A E K L G - A A L D L M D D A W W
S-like M.t.     V R Q L R D R P T v r F A H R H Q V D K L I V E - G N A V T G V R G t v l e p s d e p r g a p s s r k s v g k f e f r a S A V I V A S G G I G G N H E L V R K N w p r r m g r i p k q l l S G V P A H V - D G R M I G I A Q K A G - A A V I N P D R M w h
4,5SD C.t.      e l v a k s a c i r r r v p r r v a v n v r r s r a r s a r s a t s v p a k v w c c p l a a i s s i r n c w s m r r - - Y K P G W L T G A A G C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - D G S G L R L G Q S V G - G I A Q D L N N I S A

?

PSIpred (OSM1)  C C C C C C C C C C C C C C C E E E E E C C C C C C C C E E E E C C C C C C C C C C C C C C C C C C H H H H H H C C C C C C C C E E E E C C C C C C C C C C C C C
OSM1 S.c.       H P T G F I d p n d r e N N W K F L A A E A L R G L G G I L L H - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - p T T G R R F T N - - - - - - - E L S T R D T V T M E I Q S K C P K n d n r a l l v m s d k v y e n y t n n i n
F-like C.e.     s s y s v c r s k r s i S R N K V L A A E A I R G K G A L L I N - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - S K G Q R F G N - - - - - - - E L G R R D Y L T G K I L E l a e p i e e s f q g g s a g r n a a i m i m n e a

PSIpred (SucD)  C C C C C C C C C C E E E E C C C C C E E E E E C C C C C C E E C C C C C C C C C C C C H H H H H H H H H H H H H H C C C C C C C C C C C C C C c c C C C H H
SucD S.c.       H P S G I Y G S G C - - - - - - - L I T E G A R G E G G F L V N - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - S E G E R F M E R Y A P T A K D L A C R D V V S R A I T M E I R E G R G V g k k k d - h M Y L Q L S H L P P E
SucD M.j.       H P T G M V G T G I - - - - - - - L V T E A V R G E G G I L Y N - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - K Y K E R F M V R Y D K E R M E L S T R D V V A R A I Y K E I Q E G R G V n g g - - - - V Y L D V S H L P N E
AspO B.s.       H P T L L V K N G V s y g - - - - L V S E A V R G E G G C L V D - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - E N G R R I M A e r h p l g - D L A P R D I V S R V I H E E M A K G N R V - - - - - - - - Y I D F S A I s d -
3OSD M.t.       G P T V P L v - g k p w f a l s e r n s p - - G S I I V N M S - G K R F M N E S m p y v e a c h h M Y G G E H G Q G P G P G E n i p - A W L V F D Q R Y R d r y i f a g l q p g - - - - Q R I P S R W L D S G V I - - - - - - - - - - - - - - - - - - V Q
S-like M.t.     y t e g i t n y d p i w p r h g i r i i p g p S S L W L D A A - G K R l p v p l f p g f d t l g t l e y i t k s g h d y - - - - - - - T W F V L N A K i i e k e f a l s g q e q n p d l t g r r l g q l l r s r a h a g p p g p v q a f i d r g v d c V H
4,5SD C.t.      w r f i t p p s v w p k - - - - - - - - - - - - G L V V N I Q - G E R F C N E Q - - - - - - - - - V Y G A K L G Y E M M E K Q G G Q - A W L I I D S N V R r q a a w q c l f g g l w a f Q S M P A L A L M Y K V A - - - - - - - - - - - - - - - - - - I K

Figure 2.

Fig. 2. (Continued.) MACAW sequence alignment of SFR and related sequences. The alignment was generated with 15 sequences, but only 8 were retained for clarity. The DBM, GG, ATG, and GD-like motifs are shown. Arrows and italics indicate the residues involved in covalent binding of FAD23 or fumarate binding.51 ? indicates the only other fully conserved Arg, possibly the second active site Arg postulated in ref. 51. Regions conserved between all families are shaded in gray (in black for fully conserved residues). Regions common to fumarate reductases, succinate dehydrogenases, and aspartate oxidases are boxed. The italicized lines above the OSM1 and SucD sequences are the secondary structure predictions generated by PSIpred. OSM1 (OSM1_YEAST, res. 4 to end); F-like: an unidentified protein from C. elegans (AAC46539.1) with similarity to soluble fumarate reductases but lacking one of the two basic residues implicated in fumarate binding; SucD: succinate dehydrogenases from S. cerevisiae (DHSA_YEAST, mature) and Methanococcus jannaschii (U67462); AspO: aspartate oxidase from Bacillus subtilis (NADB_BACSU); 3OSD: 3-oxosteroid 1-dehydrogenases from M. tuberculosis (MTCY03C7.19c); S-like: steroid dehydrogenase-like protein from M. tuberculosis (MTCY369.29); 4,5SD: Δ4,5-alpha steroid dehydrogenase from Comamonas testosteroni (L23428).

Figure 3.


Fig. 3. (Continued.) Sequence alignment, using MACAW, of flavoproteins with two dinucleotide-binding domains. The alignment was generated with 27 sequences, but only 10 have been retained for clarity. The GR sequence was added based on structural superposition with TMD in the FSSP entry for GR. The lines in italics above the GR and TMD sequences indicate their secondary structures, according to FSSP: e for extended (β-strands), h and t for α-helices, g for 3-turns, s for bends. Arrows point to the redox-active Cys in GR. GR (GSHR_HUMAN, res. 19 to end); TMD (DHTM_METME, res. 386 to end); BaiC (BAIC_EUBSP, res. 369 to end; note that the sequence has been corrected by adding a nucleotide immediately before the published stop codon,52 to restore homology to the other family members); FadH: 2,4-enoyl-CoA reductase (FADH_ECOLI, res. 371 to end; note that the sequence in ref. 53 contains a frameshift that masks the GD motif); GOGAT-like protein of E. coli = ORF462 (AE000303 o462, res. 120 to end); FdhB: formate dehydrogenase β subunit (U73807, res. 371 to end); DHPD: dihydropyrimidine dehydrogenase (BTU20981, res. 183–539); GOGAT from S. cerevisiae (SCGLUTSYN, res. 1778 to end); FMOs from Oryctolagus cuniculus (1C1: FMO5_RABIT) and S. cerevisiae (YHX6_YEAST); CMO: stcW, a putative sterigmatocystin biosynthetic gene from Emericella nidulans (STCW_EMENI).


position in COX, GOX, and the entire GMC family, in DAO45 and in PHH and the related phenol 2-monooxygenases (PheA/TfdB). A notable exception was Rab-GDI (which, as mentioned above, does not bind FAD). When we examined flavoprotein families for which no structural data were available, our sequence alignments sometimes revealed putative ATG motifs (Table IV and Fig. 2). In SFR, the PSIpred program consistently predicted the corresponding region as a β-strand, and three other strands were predicted in the preceding 40 residues that may constitute the three-stranded antiparallel β-sheet connecting helix III and strand 4. In most families, however, the ATG motif could not be recognized unambiguously. Even in combination with secondary structure predictions, only tentative assignments could be made. For example, in the alignment proposed for MAO-related sequences, a hydrophobic stretch followed by P might correspond to the fourth β-strand (ATG-like in Fig. 1), whereas another region can be proposed to form the fifth β-strand (GD-like) and the following helix. Similarly, a central conserved region of carotenoid desaturases, centered around a hydrophobic sequence, may be interpreted as forming the fifth strand (residues 424–429 in CRTI_SYNY3).

DISCUSSION

Evolution of Dinucleotide-Binding Proteins: The Rossman Fold, the ‘‘Pre-DBM’’ and the DBM

Most of the flavoproteins whose 3-D structure has been solved were found to adopt a similar folding pattern based on the Rossman fold. This fold is also found in NAD(P)H-binding proteins and in proteins that bind other cofactors or do not bind any. A list of the available 3-D structures is presented in Table V, organized according to the FSSP classification. Figure 5 presents a proposal for a classification of Rossman-type proteins, intended to organize them from simplest to most complex on the basis of their topology, cofactor-binding properties, and recognized sequence motifs. It may also be viewed as a tentative scheme for the evolution of dinucleotide-binding proteins, but it should be stressed that other pathways are possible: the building of a solid evolutionary tree is beyond the scope of this study.

As described in the introduction, the Rossman fold in its simplest form (i.e., as it appears in glucose 6-phosphate dehydrogenase) is a six-stranded parallel β-sheet, with α-helices connecting the strands in a right-handed fashion.

Fig. 4. Position of the GG doublet in TMD. A space-filling representation is used for the GG doublet and ADP cofactor. Other important structures are represented as wireframe: the connection between DBM and GG doublet and the following helix (thin), the DBM (most of the helix removed), the ATG and GD motifs. β-strands are marked with roman numerals at their N-terminus. Dotted lines indicate H-bonds between ADP and residues D419 and D674.


Many variations on this theme can be found, i.e., structures that can be derived simply from the Rossman fold by deleting or adding structural elements without reorganizing the existing connectivity of the strands. For example, the final strand is missing in flavoproteins, and their third helix is replaced by an antiparallel β-sheet (except in DAO). The same holds for the NAD(P)H-binding domains of flavoprotein pyridine nucleotide reductases, which in addition lack the fifth helix and have an additional β-strand at their N-terminus.
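The idea of a fold that keeps "the existing connectivity of the strands" can be made concrete with a small check. The sketch below is only an illustration: it assumes that a sheet is described by the left-to-right order of its strands, that the canonical order is 6, 5, 4, 1, 2, 3 as given in the Introduction, and that any added strand carries a label outside 1 to 6. The function name and the example orders are hypothetical.

```python
CANONICAL_ORDER = [6, 5, 4, 1, 2, 3]  # left-to-right strand order of the basic fold


def derivable_from_rossman(order):
    """Return True if `order` keeps the canonical strand connectivity.

    Strands labelled outside 1-6 (e.g. "N" for an extra N-terminal strand)
    are treated as additions and ignored. The remaining strands must appear
    at most once and in the same relative order as in CANONICAL_ORDER.
    """
    kept = [s for s in order if s in CANONICAL_ORDER]
    if len(set(kept)) != len(kept):
        return False
    remaining = iter(CANONICAL_ORDER)
    # Subsequence test: each kept strand must be found further along the
    # canonical order than the previous one.
    return all(s in remaining for s in kept)


if __name__ == "__main__":
    print(derivable_from_rossman([5, 4, 1, 2, 3]))       # 6th strand deleted
    print(derivable_from_rossman(["N", 5, 4, 1, 2, 3]))  # plus an added strand
    print(derivable_from_rossman([6, 5, 4, 2, 1, 3]))    # strands 1 and 2 swapped
```

A deletion or an addition passes the check, whereas a swap of two canonical strands does not, which is the distinction the paragraph above draws.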

In Rossman-type proteins, G residues are often found at the C-terminus of the first (central) β-strand. This is true even for proteins that do not bind cofactors or bind cofactors other than dinucleotides at that site, such as aspartate carbamoyltransferase, vp39 methyltransferase or carboxyl methylesterase. The tendency to accumulate G residues in that area becomes particularly strong in NADH-binding proteins. For example, aldehyde dehydrogenase has the sequence GSTVG at this position, whereas glucose 6-phosphate dehydrogenase has GGTGDLA. For a large series of NAD(P)H-binding proteins, a more elaborate consensus sequence (type 3 in first column of Table V) can be derived for this region: G(G/A/x)x(G/x)xx(G/A), with the Gs at positions 1, 4, and 7 occupying positions equivalent

TABLE III. Angle and Distance Data for the GG, ATG, and GD Motifs in TMD, GR, and COX

Motif | Molecule | Res | No | Φ | Ψ | Distance to DBM (or ATG) | Distance to cofactor
GG | TMD | G | 425 | 135 | −22 | N at 2.85 Å from A 397 O; C at 3.72 Å from G 398 CA |
 | | G | 426 | 56 | −133 | N at 3.74 Å from A 397 O; N at 3.67 Å from G 398 CA | CA at 3.46 Å from O3*; CA at 3.74 Å from O5*; CA at 3.39 Å from O2A
 | GR | G | 55 | 104 | −24 | N at 2.90 Å from G 28 O; C at 3.90 Å from G 29 CA |
 | | G | 56 | 60 | −151 | N at 3.66 Å from G 28 O; CA at 3.76 Å from G 29 CA | N at 3.89 Å from AO3*; CA at 3.15 Å from AO3*; CA at 3.86 Å from AO5*; CA at 3.79 Å from AO2; C at 3.88 Å from AO2
 | COX | G | 113 | 117 | −12 | N at 3.10 Å from S 19 O; N at 3.05 Å from S 19 OG; CA at 3.07 Å from S 19 O; C at 3.17 Å from S 19 O; O at 3.76 Å from G 20 CA |
 | | G | 114 | 52 | −136 | N at 3.51 Å from S 19 O; CA at 3.95 Å from G 20 CA | N at 3.49 Å from AO3*; CA at 3.88 Å from AO5*; CA at 3.04 Å from AO3*; CA at 3.72 Å from AC3*; C at 3.87 Å from AO1
1st ATG | TMD | G | 488 | 87 | −162 | | N at 3.71 Å from C5*; CA at 3.77 Å from C5*; CA at 3.97 Å from O2B; CA at 3.40 Å from O1A; CA at 3.27 Å from O3A
 | GR | G | 157 | 65 | −153 | | N at 3.65 Å from AC5*; N at 3.68 Å from AO1; CA at 3.34 Å from O3; CA at 3.66 Å from OP2
 | COX | G | 290 | 95 | 164 | | N at 3.87 Å from AC5*; CA at 3.60 Å from O3; CA at 3.68 Å from OP2
2nd ATG | TMD | G | 643 | 156 | 173 | |
 | GR | G | 290 | 163 | −173 | |
GD | TMD | G | 673 | 68 | −153 | CA at 3.67 Å from S 400 OG | CA at 3.59 Å from O2B; C at 3.74 Å from O2B
 | | D | 674 | −64 | −21 | CB at 3.76 Å from G 488 C (ATG); C at 3.50 Å from G 488 O (ATG) | N at 2.96 Å from O2B; OD2 at 2.63 Å from O3B
 | GR | G | 330 | 83 | −158 | | CA at 3.52 Å from OP2
 | | D | 331 | −55 | −31 | CB at 3.92 Å from G 157 CA (ATG) | N at 2.97 Å from OP2; OD1 at 3.32 Å from C3*; OD2 at 2.76 Å from O3*; OD2 at 3.19 Å from C5*
 | COX | D | 474 | −124 | −159 | OD2 at 2.78 Å from G 290 N (ATG) | CB at 3.43 Å from OP2
 | | G | 475 | −68 | −11 | | N at 2.95 Å from OP2
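Table III reports backbone Φ and Ψ angles without stating how they were obtained. As a reminder of what such values mean, the following sketch computes a dihedral angle from four atom positions with the usual sign convention (for Φ the atoms would be C of the preceding residue, N, CA, and C; for Ψ they would be N, CA, C, and N of the following residue). The function name and the coordinates in the demonstration are made up for illustration and are not taken from any of the structures discussed here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) defined by four points."""
    b0 = _sub(p0, p1)            # from the second atom back to the first
    b1 = _sub(p2, p1)            # central bond
    b2 = _sub(p3, p2)            # from the third atom to the fourth
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Arbitrary test geometry, not real protein coordinates.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```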


to those of the DBM. These proteins include the RED superfamily46 (dihydropteridine reductase, 3-α,20-β-hydroxysteroid dehydrogenase and UDP-galactose 4-epimerase), carbonyl reductase, malate dehydrogenases, glucose 6-phosphate dehydrogenase, enoyl-acyl carrier protein reductases, cis-biphenyl-2,3-dihydrodiol-2,3-dehydrogenase, etc. We call this type of sequence a ‘‘pre-DBM’’ because it is less defined and less compact than the classical DBM, but we have no indication that the latter has evolved from the former. As in the DBM, hydrophobic residues are generally found at the critical positions in the two β-strands and intervening α-helix. Some of these proteins (malate dehydrogenase, UDP-galactose 4-epimerase, dihydrodipicolinate reductase, etc.) also have a negatively charged

TABLE IV. Occurrence of Sequence Motifs in Flavoproteins†

Protein family (n) | Example | GG (FAD domain) | ATG (FAD domain) | ATG (NAD(P)H domain) | GD (FAD domain) | G helix (FAD domain)
II) Two dinucleotide-binding domains:
Pyridine nucleotide reductases* [a] (92) | GSHR_HUMAN | +56 | +157 | +290 | +331 | +346
NADH oxidase/NADH peroxidase* (26) | NAPE_ENTFA | − | +112 | +243 | +281 | +306
BaiH-Trimethylamine dehydrogenase* (10) | DHTM_METME | +426 | +488 [b] | +641 [c] | +572 [d] | +686
Glutamate synthase (GOGAT) (11) | SCGLUTSYN | +1819 | +1876 | +2046 [e] | +2088 [d] | +2103
Dimethylaniline monooxygenase (16) | RABDIANMON | +40 | +149 | +329 [f] | ? (369) [g] | −
Cyclic monooxygenases (CMO) (10) | STCW_EMENI | +46 | +148 | +352 | ? (399) [h] | −
Rubredoxin/putidaredoxin/ferredoxin reductases (22) | RURE_PSEOL | − | +107 [i] | +238 | +275 | +309
Nitrate and nitrite reductases (20) | NIRB_ECOLI | − | +110 | +242 [j] | +281 [k] | +304 [l]
NADH-ubiquinone reductase/ndhA (14) | NDI1_YEAST | − | +117 | +280 | +323 [d] | +400 [l]
Adrenodoxin and ferredoxin reductases (6) | ADRO_BOVIN | − | +121 | +366 | ? (405) [m] | −
Lysine:N6-, ornithine:N5-hydroxylases (5) | IUCD_ECOLI | − | +150 | +331 | − | −
Flavocytochrome c sulfide dehydrogenase* (9) | 1fcd | − | +135 | − [n] | +324 [d′] | +342
I) One dinucleotide-binding domain:
GMC oxidoreductases* [o] (27) | GOX_ASPNG | ?124 | +312 | | − | −
D-amino acid oxidase* (7) | OXDA_PIG | − | +183 [p] | | − | −
p-hydroxybenzoate hydroxylase* (13) | PSEPOBA | − | +160 [q] | | +286 [d] | −
Rab-GDI* (21) | GDIA_BOVIN | +41 | − | | − | −
Fumarate reductase, steroid dehydrogenases (9) | OSM1_YEAST | +72 | +230 | | ?467 [r] | +494
dTDP-4-dehydrorhamnose reductase (5) | GLF_ECOLI | +38 | ? (231 or 239) [s] | | − | −
THI4 (9) | THI4_YEAST | +104 | +236 [t] | | ? (291) [u] | ? (303)
LAO, MAO, TMO, CBP (36) | A38314 | +213 | ? (475) [v] | | ? (647) [w] | ? (663)

†Families for which 3-D structures are known are indicated by *. The figures in the table indicate the position of the last residue of the motif in the example sequence.
[a] GR, mercuric reductase, dihydrolipoamide dehydrogenase, thioredoxin reductase, alkyl-hydroxyperoxide reductase and trypanothione reductase, see Table II.
[b] Can have Y as the 2nd residue, or T as the 3rd.
[c] In DMD, the final G is replaced by S; in DMD as in TMD, this domain does not bind NAD(P)H.97
[d] The initial T of the GD motif is missing or ([d′]) replaced by S.
[e] Charged residues at positions 3 or 5 pointing toward the antiparallel β-sheet.
[f] This sequence had been noted in ref. 98 but was considered not significant because both flavin-binding sites were thought to lie in the α-subunit. The S. cerevisiae enzyme lacks the initial T.
[e] Sequence conservation had been noted in ref. 90 but attributed to the formation of a stable C(4a)-hydroperoxyflavin intermediate.
[g] The closest match was a conserved hydrophobic stretch followed by GL. Absent in yeast FMO, which has an incomplete DBM and appears to bind FAD weakly.91
[h] The closest match was a conserved hydrophobic stretch followed by GP.
[i] In P. oleovorans rubredoxin reductase, final G is replaced by P.
[j] Sometimes has R at the 6th position.
[k] Ends more often in E than in D.
[l] G often replaced by A.
[m] The closest match was a conserved hydrophobic stretch followed by GW.
[n] Final G replaced by P. The domain does not bind NAD(P)H, but rather another unknown cofactor.33
[o] Glucose, cholesterol, alcohol, choline, and methanol oxidases; glucose, alcohol, choline, sorbose, and cellobiose dehydrogenases; mandelonitrile lyase; versicolorine synthase.
[p] 5th residue always an N.
[q] The unusual G and D residues at positions 5 and 7 are strictly conserved. In PHH and phenol hydroxylase, the D faces the N-end of a short α-helix in the back of the β-sheet, thus compensating for its dipole, and the G at the 5th position allows close approach to this helix.
[r] Ends in E, sometimes in N, A, or V.
[s] 3rd or 4th residue is hydrophilic.
[t] 5th residue is hydrophilic.
[u] Ends in M.
[v] Ends in P.
[w] Ends in D, E, S, T, C, or A.


TABLE V. Sequence Motifs and Cofactor Binding in Rossman-Type Proteins†

DBM | ATG | Cofactor | FSSP branch | PDB no. | Name | Notes on structure
2 | − [a] | NADH | 55.1.1.1.1.1 | 1ad3 | aldehyde dehydrogenase (N-term domain) | 6th strand missing
2E | − [a] | HCA + FeMo | 55.1.1.2.1.1 | 3min | nitrogenase molybdenum iron protein (C-term domain) | 6th strand missing
2D [b] | − | S-Ade-homoCys | 55.2.1.1.1.1 | 1v39 | vp39 (polyA polymerase regulatory subunit) | 6th strand missing
2D | − [a] | S-AdeMet | 55.2.1.1.2.2 | 1xva | glycine N-methyltransferase | 6th strand missing
3′AR | − | NAD(P)H | 55.2.1.2.1.1 | 1dpg | glucose 6-phosphate dehydrogenase |
3E | − | NAD(P)H | 55.2.1.2.2.1 | 1drw | dihydrodipicolinate reductase | additional 7th strand
4SR | + | NADPH | 55.2.1.2.2.1 | 1dap | diaminopimelic acid dehydrogenase | additional 7th strand
2A | − | none (buried) | 55.2.1.2.3.1 | 8atc | aspartate carbamoyltransferase (C-terminal domain) | N-term domain also Rossman fold: no DBM, an ATG
4E | − | NADH | 55.2.1.2.4.1 | 1leh | leucine dehydrogenase |
4′D | − | NADH | 55.2.1.2.4.2 | 2nad | formate dehydrogenase (C-term domain) | additional 7th strand (parallel)
4D | + | NADH | 55.2.1.2.5.1 | 1gyp | glyceraldehyde-3-phosphate dehydrogenase |
4D | + | NADH | 55.2.1.2.6.1 | 2ohx | alcohol dehydrogenase |
3′TR | + | NADPH | 55.2.1.2.7.1 | 1cyd | carbonyl reductase |
3D | + | NADH | 55.2.1.2.7.1 | 1dhr | dihydropteridine reductase |
2 [c] G | + [d] | NADH | 55.2.1.2.7.1 | 1eny | enoyl-acyl carrier protein reductase |
3′D | + | NADH | 55.2.1.2.7.1 | 1hdc | 3-α,20-β-hydroxysteroid dehydrogenase | additional 7th strand
3D | + | NADH | 55.2.1.2.7.1 | 1xel | UDP-galactose 4-epimerase |
3′LR | + | NADPH | 55.2.1.2.7.1 | 1fds | 17-β-hydroxysteroid-dehydrogenase |
3E | − [e] | NADH | 55.2.1.2.8.1 | 1bmd | malate dehydrogenase (T. flavus) |
3D | + | NADH | 55.2.1.2.8.1 | 2cmd | malate dehydrogenase (E. coli) |
3D | + | NADH | 55.2.1.2.8.1 | 1hyh | L-2-hydroxyisocaproate dehydrogenase |
4D | + | NADH | 55.2.1.2.8.1 | 1ldg | L-lactate dehydrogenase |
4NR | − | NADPH | 55.2.1.2.9.1 | 2pgd | 6-phosphogluconate dehydrogenase | additional anti-parallel 7th strand
4LR | − | NADPH | 55.2.1.2.9.2 | 1yveI | acetohydroxy acid isomeroreductase |
2 [f] Q | + | none | 55.2.2.1.1.1 | 1chd | cheB carboxyl methylesterase (C-terminal domain) | additional strand + helix after 2nd strand, order reads: 6, 5, 4, 1, 2, 3, 2′
2G | − | vit.B12 | 55.2.2.2.1.1 | 1bmt | methionine synthase | 3rd strand missing
3T | − | CoA | 55.2.2.2.2.1 | 1scu | succinate-CoA ligase (N-terminal) | 2nd helix replaced by β-strand
1E | − | none | 55.2.2.2.3.1 | 3chy | CheY | 3rd strand missing
2Y | − | NAD(P)H | 55.2.2.3.3.1 | 1iso | isocitrate dehydrogenase | 3rd and 6th strands missing
2H | + | ATP | 55.2.2.4.1.1 | 1bnc | biotin carboxylase |
1G | − | ATP | 55.2.3.1.1.1 | 1gpb | glycogen phosphorylase b |
1N | − | none | 55.2.3.2.1.1 | 1nba | N-carbamoylsarcosine amidohydrolase (A) |
2S | − | none | 55.3.1.1.1.1 | 1esc | serine esterase | 3rd strand missing
2 | − | NADH | 55.5.1.1.1.1 | 1ndh | cytochrome b5 reductase | 6th strand missing
2 | + | ADP | 55.5.2.1.1.1 | 1php | 3-phosphoglycerate kinase (N-terminal domain) | additional strand + helix after 2nd strand; 6th strand missing; order: 6, 5, 1, 2, 4, 3
3′I | − | none | 55.6.1.1.1.1 | 1pdo | mannose permease phosphotransferase | 3rd and 6th strands missing
2 | − | none | 55.7.1.1.1.1 | 1tpt | thymidine phosphorylase | 6th strand missing
*4D; §1S | *+; §− | *ADP; §? | 55.8.1.1.1.1 | 2tmd | trimethylamine dehydrogenase (chain A, C-terminal part) | *no antiparallel β-sheet, no Greek key; §4 strands + 1 (at N-terminus)
*4D; §2S | *+; §− | *FAD; §? | 55.8.1.1.1.1 | 1fcd | flavocytochrome c sulfide dehydrogenase, chain A | *No GG doublet, covalent FAD; §4 strands + 1 (at N-terminus)
*4T; §4HR | *+; §+ | *FAD; §NADPH | 55.8.1.1.1.1 | 1trb | thioredoxin reductase | §4 strands + 1 (at N-terminus)
*4E; §4VR | *+; §+ | *FAD; §NADPH | 55.8.1.1.1.1 | 3grs | glutathione reductase | §4 strands + 1 (at N-terminus)
*4E; §4D | *+; §+ | *FAD; §NADH | 55.8.1.1.1.1 | 1nhp | NADH peroxidase | *No GG doublet; §4 strands + 1 (at N-terminus)
*4E; §4E | *+; §+ | *FAD; §NADH | 55.8.1.1.1.1 | 3lad | dihydrolipoamide dehydrogenase | §4 strands + 1 (at N-terminus)
4′ | − | none | 55.8.1.1.2.1 | 1gnd | guanine nucleotide dissociation inhibitor | GG doublet at base of a β-strand
4 | + | FAD | 55.8.1.1.2.2 | 1pbe | p-hydroxybenzoate hydroxylase |
4 | + | FAD | 55.8.1.1.2.2 | 1aa8 | D-amino acid oxidase |
4 | + | FAD | 55.8.1.1.2.3 | 1gal | glucose oxidase | GG doublet at other location
4 | + | FAD | 55.8.1.1.2.3 | 3cox | cholesterol oxidase | GG doublet at other location

†Based on the FSSP classification (class 55 as of October 1997. This class also comprises proteins based on alternating β-strand and α-helices, but with a different topology that cannot be derived from the typical Rossman fold by a simple addition or deletion of strands. They have been omitted from the analysis). For the DBM, the nature of the central G-rich sequence is indicated by the first letter: 1 = no G; 2 = one G or more, but different from 3 or 4; 3 = GxxGxxG; 4 = Gx(G/A)xx(G/A). ′ indicates one mismatch. The last residue of the 2nd β-strand is also indicated, together with the following one when relevant to NADH/NADPH discrimination. When two dinucleotide-binding domains are present, they are marked with * and § for the N- and C-terminal domains, respectively.
[a] 4th strand ends in G.
[b] D95 at end of β-strand 2 binds O2* and O3*.
[c] The other family members have a pre-DBM.
[d] Half the sequences had A at the last position, instead of G.
[e] Aligned with ATG motif: DYALLVGA.
[f] The sequence resembles a DBM: GASTGG, where the central S has been implicated in catalysis.99
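The coding rule given in the footnote to Table V can be written down as a short classifier. This is a minimal sketch under stated assumptions, not the procedure used to build the table: it assumes the G-rich segment is supplied already aligned so that its first residue is position 1 of the template, that only the fixed template positions count as mismatches, and that a prime (written `'` in the code) marks exactly one mismatch. The names `TEMPLATES`, `count_mismatches`, and `classify_g_rich` are hypothetical, and because the published assignments also rest on alignments and structures, the sketch should not be expected to reproduce every entry.

```python
# Fixed positions (0-based) of each template and the residues allowed there.
TEMPLATES = {
    "3": {0: "G", 3: "G", 6: "G"},      # GxxGxxG
    "4": {0: "G", 2: "GA", 5: "GA"},    # Gx(G/A)xx(G/A)
}


def count_mismatches(seq, template):
    """Count fixed template positions that `seq` violates (None if too short)."""
    if len(seq) <= max(template):
        return None
    return sum(seq[i] not in allowed for i, allowed in template.items())


def classify_g_rich(seq):
    """Return '1', '2', '3', '4', or a primed code for one mismatch."""
    seq = seq.upper()
    if "G" not in seq:
        return "1"
    scores = {code: count_mismatches(seq, t) for code, t in TEMPLATES.items()}
    # Exact matches take priority over single-mismatch matches.
    for wanted, suffix in ((0, ""), (1, "'")):
        for code in ("3", "4"):
            if scores[code] == wanted:
                return code + suffix
    return "2"


if __name__ == "__main__":
    # GSTVG and GGTGDLA are the two segments quoted in the text.
    for segment in ("GSTVG", "GGTGDLA"):
        print(segment, classify_g_rich(segment))
```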


residue at the end of the 2nd β-strand, which binds the ribose of the cofactor.

The distributions of the pre-DBM and DBM coincide in part with the FSSP classification (Table V), which is a good indication that they can be used to trace evolutionary links. Although both types of motifs are encountered in branches 55.2.1.2.2, .6, and .8, the pre-DBM is the rule in branch 55.2.1.2.7, and the DBM predominates in branches 55.2.1.2.4, .5, .6, and .9, and in branch 55.8 (the flavoproteins). Presumably, the latter branch has evolved from an NAD(P)H-binding ancestor that already presented a DBM (step G in Fig. 5).
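The comparison between motif type and FSSP branch amounts to a simple tally. The sketch below transcribes the DBM codes of the 55.2.1.2 branches from Table V and counts them per branch. It assumes that code 3 (with or without one mismatch) stands for the pre-DBM and code 4 for the DBM, and that the mismatch mark in the extracted table is a prime; `ROWS`, `LABELS`, and `tally_by_branch` are names introduced here for illustration only.

```python
from collections import Counter, defaultdict

# (FSSP branch, DBM code) pairs transcribed from Table V.
# A trailing "'" marks one mismatch (an assumption about the extracted table).
ROWS = [
    ("55.2.1.2.2", "3"), ("55.2.1.2.2", "4"),
    ("55.2.1.2.4", "4"), ("55.2.1.2.4", "4'"),
    ("55.2.1.2.5", "4"),
    ("55.2.1.2.6", "4"),
    ("55.2.1.2.7", "3'"), ("55.2.1.2.7", "3"), ("55.2.1.2.7", "2"),
    ("55.2.1.2.7", "3'"), ("55.2.1.2.7", "3"), ("55.2.1.2.7", "3'"),
    ("55.2.1.2.8", "3"), ("55.2.1.2.8", "3"), ("55.2.1.2.8", "3"),
    ("55.2.1.2.8", "4"),
    ("55.2.1.2.9", "4"), ("55.2.1.2.9", "4"),
]

LABELS = {"3": "pre-DBM", "4": "DBM"}  # assumed mapping of code to motif type


def tally_by_branch(rows):
    """Count motif types per FSSP branch."""
    counts = defaultdict(Counter)
    for branch, code in rows:
        counts[branch][LABELS.get(code[0], "other")] += 1
    return counts


if __name__ == "__main__":
    for branch, tally in sorted(tally_by_branch(ROWS).items()):
        print(branch, dict(tally))
```

Only the structures listed in Table V are counted, so the tally reflects that sample rather than whole protein families.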

The ATG Motif

Because of the pseudosymmetry axis present in the Rossman fold, the end of the fourth β-strand and connecting loop are subject to structural constraints similar to those that gave rise to the G-rich sequence of the DBM. Here again, the need to accommodate a substrate or a cofactor appears to have favored G over more bulky residues. Indeed, a bona fide ATG motif (loosely defined here as the sequence oxhhhxxG) can be recognized in many Rossman-type proteins (Table V). In many cases, the first residue was a D. The nature of the second residue was variable from one family to another, as had been observed in flavoprotein pyridine nucleotide reductases. At variance with the classical ATG motif of flavoproteins, the fifth residue was not always hydrophobic, because N, H, or even E could be found at that position. We did not endeavor to check the conservation of the motif in every member of these families, as we had done for flavoproteins. However, cursory examination of additional sequences and of published alignments showed excellent conservation of the first and last residues and of the hydrophobic nature of the 3rd and 4th.
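A pattern such as oxhhhxxG translates directly into a regular expression. The sketch below is an illustration only and rests on assumptions that are not spelled out in this section: "o" is taken to mean a polar or charged residue, "h" a hydrophobic one, and the residue sets chosen for each class are ours. It is also stricter than the observations above, since it requires a hydrophobic residue at the fifth position. Positions are reported for the last residue of the motif, 1-based, following the convention of Table IV; the second test sequence is invented.

```python
import re

POLAR = "DEKRHNQST"       # assumed meaning of "o"
HYDROPHOBIC = "AVLIMFWC"  # assumed meaning of "h"

# o x h h h x x G, wrapped in a lookahead so overlapping hits are reported.
LOOSE_ATG = re.compile(rf"(?=([{POLAR}].[{HYDROPHOBIC}]{{3}}..G))")


def find_loose_atg(seq):
    """Return (position of final G, matched segment) for each hit."""
    return [(m.start() + 8, m.group(1)) for m in LOOSE_ATG.finditer(seq.upper())]


if __name__ == "__main__":
    # DYALLVGA is quoted in Table V as aligned with the ATG motif but ends in A.
    print(find_loose_atg("DYALLVGA"))
    # Invented sequence containing one segment that fits the loose pattern.
    print(find_loose_atg("MKDVAIVATGSS"))
```

In practice such hits would only be candidates, to be weighed against alignments and secondary structure predictions as described above.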

Note that most of the proteins that present an ATG motif also bind NAD(P)H or FAD, which suggests that this sequence plays a role in the interaction with dinucleotides. However, the ATG motif was absent in several families of NAD(P)H-binding proteins, and no correlation was found between the ATG motif and either the presence of NAD(P)H in the crystal structure or the tightness of the connection with the 4th α-helix. Note also that the ATG motif was present in some families that lack a DBM or pre-DBM. The motif may have appeared independently several times, each time probably as a further specialization toward dinucleotide binding (for the sake of clarity, this is presented as a single step D, in Fig. 5). However, in the case of flavoproteins, its presence in both domains of pyridine nucleotide reductases and in other flavoproteins strongly suggests that it was an established feature of a common ancestor. In flavoproteins, the ATG motif has acquired a specific function, because it is always at the junction with the substrate-binding domain, and not within a domain as in NAD(P)H-binding proteins. Interestingly, several examples of secondary loss of the ATG motif are found in proteins that no longer bind dinucleotides (TMD, FCSD, and Rab-GDI). In these cases, the DBM is also altered, which underlines how both sequence characteristics are tightly linked to cofactor binding.

The GD Motif

As recognized by Eggink et al.,11 the structural basis for conservation of the GD motif lies in the bonding of the O3* of the flavin ribityl by the OD2 of its final D residue. Schulz5 has noted the interesting symmetry between this D and the D/E residue that ends the 2nd β-strand (see Fig. 4), similar to the symmetry we have described between the ATG and the DBM. The distribution of the GD motif, however, suggests a less ancient origin. It is found in most flavoproteins with two dinucleotide-binding domains, where it clearly indicates homology. However, its presence in PHH and maybe in SFR and other flavoprotein families may be due to convergence. In many cases, the critical D residue is missing or poorly conserved, which suggests a different type of interaction with the ribityl. In conclusion, the GD motif appears as a dispensable sequence feature in flavoproteins, acquired probably late in evolution (step H in Fig. 5), and possibly several times independently. When present, it is a useful tool for structural predictions, in combination with the ATG motif (see Figs. 1–3).

GG Motif Versus GG Doublet

We propose that the GG doublet, i.e., the two conserved Gly residues found a few residues downstream of the first DBM in most flavoproteins with two dinucleotide-binding domains, is an ancestral feature of these proteins, as are the ATG and GD motifs. Its absence in FCSD or NADH oxidase would be the consequence of secondary loss. When structural data are available, the GG doublet appears to play an important role in binding FAD (or ADP). It allows close contact between the first phosphate of the pyrophosphate (closest to the adenine moiety) and the N-terminal end of a structurally important α-helix. Stabilization of the cofactor is achieved via H-bonding of the next residue with O2A or O1A and via the favorable electrostatic interaction between the helix dipole and the negatively charged phosphate. A similar role has been described for the first α-helix of the Rossman fold.47,46

Concerning the GG motifs or GG doublets of flavoproteins with a single dinucleotide-binding domain, however, the lack of extended sequence similarity makes it difficult to distinguish between homology and sequence convergence. When the structures of Rab-GDI or GMC oxidoreductases are compared with that of GR, it appears that although the GG doublets are superimposable, the sequences that follow are completely unrelated. When considering families for which no structural data are available, secondary structure predictions may provide interesting clues. For MAO-related sequences (Fig. 1), carotenoid desaturases and protoporphyrinogen oxidases (i.e., all proteins with a GG motif) and for dTDP-4-dehydrorhamnose reductases and THI4 proteins, the prediction was that the two G’s are immediately followed by a β-strand. Is this putative strand similar to the one found in Rab-GDI? Note that the latter also has a conserved S/T four residues downstream, as in the GG motif. In contrast, the prediction for SFR is that the GG doublet is followed by an α-helix, as in GR. Together with the presence of a classical ATG (and GD?) motif, this may be taken as an indication that SFR shares common ancestry with flavoprotein pyridine nucleotide reductases.

Origin and Evolution of Flavoproteins

Central to the building of Figure 5 is the notion that flavoproteins have evolved from an NAD(P)H-binding protein. Pyridine nucleotides are usually found as soluble electron/proton carriers in biological pathways, and their use in biological systems may therefore predate that of flavins, which generally serve as protein-bound intermediates in intraprotein redox reactions. The fact that NAD(P)H-binding proteins appear at the same time more diverse and simpler in structure than FAD-binding proteins also suggests an earlier appearance.

But where should we place the divergence between FAD- and NAD(P)H-binding proteins? Despite the limited number of structures available, it is obvious that the FAD-binding domain of simple flavoproteins (with only one Rossman-type domain) such as PHH, DAO, GOX, and COX (and even the non-flavoprotein GDI) closely resembles that of flavoprotein pyridine nucleotide reductases. All these proteins show, in addition to the DBM, several typical features (some already noted in Rao and Rossmann2) that are not present in other Rossman-type proteins: (a) an excursion into another (substrate-binding) domain after the 2nd strand; (b) a three-stranded antiparallel β-sheet between strands 3 and 4, often completed by an additional strand at the N-terminus of the protein (this antiparallel β-sheet is absent in TMD and DAO, which have an α-helix instead); (c) another excursion into the substrate-binding domain after the 4th strand or 4th helix; (d) a 5th helix folding back not behind the β-sheet but in front (in our conventional orientation), i.e., next to the 1st helix; (e) a missing 6th strand, often replaced by an antiparallel β-strand found just before the 5th strand in the sequence. These structural similarities suggest that all these flavoproteins have evolved from a common ancestor, which was already quite differentiated from the basic Rossman-type protein.

Fig. 5. A simplified classification, and possible evolutionary scheme, for Rossman-type proteins. β-strands are numbered in arabic, α-helices in roman numerals. A: Duplication of a hypothetical three-stranded β-sheet forms the basic Rossman fold (e.g., N-carbamoylsarcosine amidohydrolase). B: Acquisition of G residues between strand 1 and helix I (e.g., C-terminal domain of aspartate carbamoyltransferase), leading to binding of cofactor: NAD(P)H (e.g., enoyl-acyl carrier protein reductase), ATP (e.g., glycogen phosphorylase b), or other (e.g., glycine N-methyltransferase). C: Appearance of the DBM and pre-DBM motifs (e.g., leucine dehydrogenase, dihydrodipicolinate reductase). D: Acquisition of the ATG motif (e.g., alcohol dehydrogenase, dihydropteridine reductase). E: Replacement of helix II by a three-stranded antiparallel β-sheet; F: Loss of strands 5 and 6 and helices IV and V; addition of a strand and helix at the N-terminus (NAD(P)H-binding domain of flavoprotein pyridine nucleotide reductases). G: Acquisition of FAD-binding, loss of strand 6, helix V moves to front; acquisition of GD motif and GG doublet (only in the FAD domain of flavoprotein pyridine nucleotide reductases and a few others). H: Insertion of NAD(P)H domain after strands 4 and 2 (flavoprotein pyridine nucleotide reductases). I: Insertion of substrate binding domain after helix IV and after strand 2 (other flavoproteins).

Among these features of flavoproteins, it is remarkable that one of the most conspicuous, the antiparallel β-sheet between the 3rd and 4th strands, is also present, with the same connectivity, in the NAD(P)H-binding domain of flavoprotein pyridine nucleotide reductases. We have found no other example of a β-sheet connecting strands 3 and 4 in other NAD(P)H-binding proteins. In this domain, the excursion (a) described above is missing and the fifth strand and following helix in the central parallel β-sheet are missing (they are often replaced by an additional strand and helix at the N-terminus of the domain). In short, the NAD(P)H-domain appears as a truncated version of the FAD-domain. This, together with the presence of an ATG motif (see above), suggests that this domain has evolved from the same branch that gave the FAD-binding domain (step F in Fig. 5). In conclusion, we propose that flavoprotein pyridine nucleotide reductases have arisen by insertion into an ancestral FAD-binding domain (after the 4th strand) of an NAD(P)H-binding domain, itself evolved from the same ancestor. This scheme is somewhat more complex than the simple gene duplication event proposed before,49 but it appears more able to explain the structural similarities between the available structures.

In most flavoproteins, the bulk of the substrate-binding domain is inserted after the 4th β-strand or the 4th α-helix, the rest being found after the 2nd strand. When the structure of this additional domain is examined (i.e., using superimposition of Cα traces in FSSP), two main branches appear, probably reflecting different insertion events (steps H and I). In flavoprotein pyridine nucleotide reductases and related proteins, as stated above, the inserted domain is a classical dinucleotide-binding domain found immediately after the 4th strand (ATG). In PHH, GDI, and DAO, insertion occurs after the 4th helix. The inserted domain consists of a 7-stranded β-sheet, with mostly antiparallel orientation, whose complex topology is identical in the three proteins. As noted for COX,38 the substrate-binding domain of GMC oxidoreductases adopts a similar topology, albeit with additional secondary structure elements. We believe that this is evidence for a common origin for PHH, GDI, DAO, and GMC oxidoreductases, and that other families of flavoproteins will turn out to be structurally and phylogenetically related.

ACKNOWLEDGMENTS

The author thanks J.M. Camadro, Institut Jacques Monod, and D. Picot, Institut de Biologie Physico-Chimique, for their help with the sequence comparison and structural analysis programs, respectively.

REFERENCES

1. Rossmann MG, Moras D, Olsen KW. Chemical and biological evolution of nucleotide-binding protein. Nature 1974;250:194–199.

2. Rao ST, Rossmann MG. Comparison of super-secondary structures in proteins. J Mol Biol 1973;76:241–256.

3. Wierenga RK, Drenth J, Schulz GE. Comparison of the three-dimensional protein and nucleotide structure of the FAD-binding domain of p-hydroxybenzoate hydroxylase with the FAD- as well as NADPH-binding domains of glutathione reductase. J Mol Biol 1983;167:725–739.

4. Schulz GE, Schirmer RH, Sachsenheimer W, Pai EF. The structure of the flavoenzyme glutathione reductase. Nature 1978;273:120–124.

5. Schulz GE. Binding of nucleotides by proteins. Curr Opin Struct Biol 1992;2:61–67.

6. Ohlsson I, Nordstrom B, Branden CI. Structural and functional similarities within the coenzyme binding domains of dehydrogenases. J Mol Biol 1974;89:339–354.

7. Wierenga RK, Terpstra P, Hol WG. Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J Mol Biol 1986;187:101–107.

8. Bork P, Grunwald C. Recognition of different nucleotide-binding sites in primary structures using a property-pattern approach. Eur J Biochem 1990;191:347–358.

9. Baker PJ, Britton KL, Rice DW, Rob A, Stillman TJ. Structural consequences of sequence patterns in the fingerprint region of the nucleotide binding fold: implications for nucleotide specificity [published erratum appears in J Mol Biol 1993 Aug 5;232:1012]. J Mol Biol 1992;228:662–671.

10. Scrutton NS, Berry A, Perham RN. Redesign of the coenzyme specificity of a dehydrogenase by protein engineering. Nature 1990;343:38–43.

11. Eggink G, Engel H, Vriend G, Terpstra P, Witholt B. Rubredoxin reductase of Pseudomonas oleovorans: structural relationship to other flavoprotein oxidoreductases based on one NAD and two FAD fingerprints. J Mol Biol 1990;212:135–142.

12. Hecht HJ, Erdmann H, Park HJ, Sprinzl M, Schmid RD. Crystal structure of NADH oxidase from Thermus thermophilus. Nat Struct Biol 1995;2:1109–1114.

13. Benson TE, Filman DJ, Walsh CT, Hogle JM. An enzyme-substrate complex involved in bacterial cell wall biosynthesis. Nat Struct Biol 1995;2:644–653.

14. Vallon O, Wollman F-A. cDNA sequence of Ma, the catalytic subunit of the Chlamydomonas reinhardtii L-amino acid oxidase (Accession No. U78797): a new sequence motif shared by a wide variety of flavoproteins (PGR97–171). Plant Physiol 1997;115:1729.

15. Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–3402.

16. Schuler GD, Altschul SF, Lipman DJ. A workbench for multiple alignment construction and analysis. Proteins 1991;9:180–190.

17. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673–4680.

18. Sayle RA, Milner-White EJ. RASMOL: biomolecular graphics for all. Trends Biochem Sci 1995;20:374.

19. Wild DL, Tucker PA, Choe S. A visual data flow environment for macromolecular crystallographic computing [published erratum appears in J Mol Graph 1995 Dec;13:385]. J Mol Graph 1995;13:291–298, 299–300.

20. Morris CJ, Black AC, Pealing SL, et al. Purification and properties of a novel cytochrome: flavocytochrome c from Shewanella putrefaciens. Biochem J 1994;302:587–593.

21. Heim S, Kunkel A, Thauer RK, Hedderich R. Thiol:fumarate reductase (Tfr) from Methanobacterium thermoautotrophicum—identification of the catalytic sites for fumarate reduction and thiol oxidation. Eur J Biochem 1998;253:292–299.

22. Pealing SL, Black AC, Manson FD, Ward FB, Chapman SK, ReidGA. Sequence of the gene encoding flavocytochrome c from

112 O. VALLON

Shewanella putrefaciens: a tetraheme flavoenzyme that is asoluble fumarate reductase related to the membrane-bound en-zymes from other bacteria [published erratum appears in Biochem-istry 1993 Apr 13; 32:3829]. Biochemistry 1992;31:12132–12140.

23. Robinson KM, Rothery RA, Weiner JH, Lemire BD. The covalentattachment of FAD to the flavoprotein of Saccharomyces cerevisiaesuccinate dehydrogenase is not necessary for import and assemblyinto mitochondria. Eur J Biochem 1994;222:983–990.

24. Tedeschi G, Negri A, Mortarino M, et al. L-aspartate oxidase fromEscherichia coli. II. Interaction with C4 dicarboxylic acids andidentification of a novel L-aspartate: fumarate oxidoreductaseactivity. Eur J Biochem 1996;239:427–433.

25. Mortarino M, Negri A, Tedeschi, et al. L-aspartate oxidase fromEscherichia coli. I. Characterization of coenzyme binding andproduct inhibition. Eur J Biochem 1996;239:418–426.

26. Franklund CV, Baron SF, Hylemon PB. Characterization of thebaiH gene encoding a bile acid-inducible NADH:flavin oxidoreduc-tase from Eubacterium sp. strain VPI 12708. J Bacteriol 1993;175:3002–3012.

27. Lim LW, Shamala N, Mathews FS, Steenkamp DJ, Hamlin R,Xuong NH. Three-dimensional structure of the iron-sulfur flavo-protein trimethylamine dehydrogenase at 2.4-A resolution. J BiolChem 1986;261:15140–15146.

28. Barber MJ, Neame PJ, Lim LW, White S, Matthews FS. Correla-tion of x-ray deduced and experimental amino acid sequences oftrimethylamine dehydrogenase. J Biol Chem 1992;267:6611–6619.

29. Lim LW, Mathews FS. Steenkamp DJ. Identification of ADP in theiron-sulfur flavoprotein trimethylamine dehydrogenase. J BiolChem 1988;263:3075–3078.

30. Creighton TE. Proteins: structures and molecular properties. 1993.

31. Holm L, Sander C. Touring protein fold space with Dali/FSSP. Nucleic Acids Res 1998;26:318–321.

32. Ross RP, Claiborne A. Molecular cloning and analysis of the gene encoding the NADH oxidase from Streptococcus faecalis 10C1: comparison with NADH peroxidase and the flavoprotein disulfide reductases. J Mol Biol 1992;227:658–671.

33. Van Driessche G, Koh M, Chen ZW, et al. Covalent structure of the flavoprotein subunit of the flavocytochrome c: sulfide dehydrogenase from the purple phototrophic bacterium Chromatium vinosum. Protein Sci 1996;5:1753–1764.

34. Chen ZW, Koh M, Van Driessche G, et al. The structure of flavocytochrome c sulfide dehydrogenase from a purple phototrophic bacterium. Science 1994;266:430–432.

35. Schalk I, Zeng K, Wu SK, et al. Structure and mutational analysis of Rab GDP-dissociation inhibitor. Nature 1996;381:42–48.

36. Koonin EV. Human choroideremia protein contains a FAD-binding domain [published erratum appears in Nat Genet 1996 Apr;12:458]. Nat Genet 1996;12:237–239.

37. Wu SK, Zeng K, Wilson IA, Balch WE. Structural insights into the function of the Rab GDI superfamily. Trends Biochem Sci 1996;21:472–476.

38. Vrielink A, Lloyd LF, Blow DM. Crystal structure of cholesterol oxidase from Brevibacterium sterolicum refined at 1.8 Å resolution. J Mol Biol 1991;219:533–554.

39. Hecht HJ, Kalisz HM, Hendle J, Schmid RD, Schomburg D. Crystal structure of glucose oxidase from Aspergillus niger refined at 2.3 Å resolution. J Mol Biol 1993;229:153–172.

40. Cavener DR. GMC oxidoreductases: a newly defined family of homologous proteins with diverse catalytic activities. J Mol Biol 1992;223:811–814.

41. Li J, Vrielink A, Brick P, Blow DM. Crystal structure of cholesterol oxidase complexed with a steroid substrate: implications for flavin adenine dinucleotide dependent alcohol oxidases. Biochemistry 1993;32:11507–11515.

42. Becker K, Muller S, Keese MA, Walter RD, Schirmer RH. A glutathione reductase-like flavoenzyme of the malaria parasite Plasmodium falciparum: structural considerations based on the DNA sequence. Biochem Soc Trans 1996;24:67–72.

43. Entsch B, van Berkel WJ. Structure and mechanism of para-hydroxybenzoate hydroxylase. FASEB J 1995;9:476–483.

44. Enroth C, Neujahr H, Schneider G, Lindqvist Y. The crystal structure of phenol hydroxylase in complex with FAD and phenol provides evidence for a concerted conformational change in the enzyme and its cofactor during catalysis. Structure 1998;6:605–617.

45. Mizutani H, Miyahara I, Hirotsu K, et al. Three-dimensional structure of porcine kidney D-amino acid oxidase at 3.0 Å resolution. J Biochem (Tokyo) 1996;120:14–17.

46. Labesse G, Vidal-Cros A, Chomilier J, Gaudry M, Mornon JP. Structural comparisons lead to the definition of a new superfamily of NAD(P)(H)-accepting oxidoreductases: the single-domain reductases/epimerases/dehydrogenases (the ‘RED’ family). Biochem J 1994;304:95–99.

47. Hol WG, van Duijnen PT, Berendsen H. The alpha-helix dipole and the properties of proteins. Nature 1978;273:443–446.

48. Wierenga RK, De Maeyer MCH, Hol WG. Interaction of pyrophosphate moieties with α-helices in dinucleotide binding proteins. Biochemistry 1985;24:1346–1357.

49. McKie JH, Douglas KT. Evidence for gene duplication forming similar binding folds for NAD(P)H and FAD in pyridine nucleotide-dependent flavoenzymes. FEBS Lett 1991;279:5–8.

50. Bach AW, Lan NC, Johnson DL, et al. cDNA cloning of human liver monoamine oxidase A and B: molecular basis of differences in enzymatic properties. Proc Natl Acad Sci USA 1988;85:4934–4938.

51. Schroder I, Gunsalus RP, Ackrell BA, Cochran B, Cecchini G. Identification of active site residues of Escherichia coli fumarate reductase by site-directed mutagenesis. J Biol Chem 1991;266:13572–13579.

52. Mallonee DH, White WB, Hylemon PB. Cloning and sequencing of a bile acid-inducible operon from Eubacterium sp. strain VPI 12708. J Bacteriol 1990;172:7011–7019.

53. He XY, Yang SY, Schulz H. Cloning and expression of the fadH gene and characterization of the gene product 2,4-dienoyl coenzyme A reductase from Escherichia coli. Eur J Biochem 1997;248:516–520.

54. Niedermann DM, Lerch K. Molecular cloning of the L-amino-acid oxidase gene from Neurospora crassa. J Biol Chem 1990;265:17246–17251.

55. Bockholt R, Masepohl B, Kruft V, Wittmann-Liebold B, Pistorius EK. Partial amino acid sequence of an L-amino acid oxidase from the cyanobacterium Synechococcus PCC6301, cloning and DNA sequence analysis of the aoxA gene. Biochim Biophys Acta 1995;1264:289–293.

56. Raibekas AA, Massey V. Primary structure of the snake venom L-amino acid oxidase shows high homology with the mouse B cell interleukin 4-induced Fig1 protein. Biochem Biophys Res Commun 1998;248:476–478.

57. Chu CC, Paul WE. Fig1, an interleukin 4-induced mouse B cell gene isolated by cDNA representational difference analysis. Proc Natl Acad Sci USA 1997;94:2507–2512.

58. Obara K, Otsuka-Fuchino H, Sattayasai N, Nonomura Y, Tsuchiya T, Tamiya T. Molecular cloning of the antibacterial protein of the giant African snail, Achatina fulica Ferussac. Eur J Biochem 1992;209:1–6.

59. Takamatsu N, Shiba T, Muramoto K, Kamiya H. Molecular cloning of the defense factor in the albumen gland of the sea hare Aplysia kurodai. FEBS Lett 1995;27:373–376.

60. Malloy PJ, Zhao X, Madani ND, Feldman D. Cloning and expression of the gene from Candida albicans that encodes a high-affinity corticosteroid-binding protein. Proc Natl Acad Sci USA 1993;90:1902–1906.

61. Joets J, Pousset D, Marcireau C, Karst F. Characterization of the Saccharomyces cerevisiae FMS1 gene related to Candida albicans corticosteroid-binding protein 1. Curr Genet 1996;30:115–120.

62. Ishizuka H, Horinouchi S, Beppu T. Putrescine oxidase of Micrococcus rubens: primary structure and Escherichia coli. J Gen Microbiol 1993;139:425–432.

63. Schilling B, Lerch K. Cloning, sequencing and heterologous expression of the monoamine oxidase gene from Aspergillus niger. Mol Gen Genet 1995;247:430–438.

64. Camilleri C, Jouanin L. The TR-DNA region carrying the auxin synthesis genes of the Agrobacterium rhizogenes agropine-type plasmid pRiA4: nucleotide sequence analysis and introduction into tobacco plants. Mol Plant-Microbe Interact 1991;4:155–162.

65. Emanuele JJ, Heasley CJ, Fitzpatrick PF. Purification and characterization of the flavoprotein tryptophan 2-monooxygenase expressed at high levels in Escherichia coli. Arch Biochem Biophys 1995;316:241–248.

66. Hugueney P, Romer S, Kuntz M, Camara B. Characterization and molecular cloning of a flavoprotein catalyzing the synthesis of phytofluene and zeta-carotene in Capsicum chromoplasts. Eur J Biochem 1992;209:399–407.

67. Albrecht M, Linden H, Sandmann G. Biochemical characterization of purified zeta-carotene desaturase from Anabaena PCC7120 after expression in Escherichia coli. Eur J Biochem 1996;236:115–120.

68. Ehrenshaft M, Daub ME. Isolation, sequence, and characterization of the Cercospora nicotianae phytoene dehydrogenase gene. Appl Environ Microbiol 1994;60:2766–2771.

69. Gari E, Toledo JC, Gibert I, Barbe J. Nucleotide sequence of the methoxyneurosporene dehydrogenase gene from Rhodobacter sphaeroides: comparison with other bacterial carotenoid dehydrogenases. FEMS Microbiol Lett 1992;72:103–108.

70. Wieland B, Feil C, Gloria-Maercker E. Genetic and biochemical analyses of the biosynthesis of the yellow carotenoid 4,4’-diaponeurosporene of Staphylococcus aureus. J Bacteriol 1994;176:7719–7726.

71. Camadro JM, Labbe P. Cloning and characterization of the yeast HEM14 gene coding for protoporphyrinogen oxidase, the molecular target of diphenyl ether-type herbicides. J Biol Chem 1996;271:9120–9128.

72. Dailey TA, Meissner P, Dailey HA. Expression of a cloned protoporphyrinogen oxidase. J Biol Chem 1994;269:813–815.

73. Ladeveze V, Marcireau C, Delourme D, Karst F. General resistance to sterol biosynthesis inhibitors in Saccharomyces cerevisiae. Lipids 1993;28:907–912.

74. Manetti AG, Rosetto M, Maundrell KG. nmt2 of fission yeast: a second thiamine-repressible gene co-ordinately regulated with nmt1. Yeast 1994;10:1075–1082.

75. Praekelt UM, Meacock PA. MOL1, a Saccharomyces cerevisiae gene that is highly expressed in early stationary phase during growth on molasses. Yeast 1992;8:699–710.

76. Choi GH, Marek ET, Schardl CL, Richey MG, Chang SY, Smith DA. sti35, a stress-responsive gene in Fusarium spp. J Bacteriol 1990;172:4522–4528.

77. Machado CR, de Oliveira RL, Boiteux S, Praekelt UM, Meacock PA, Menck CF. Thi1, a thiamine biosynthetic gene in Arabidopsis thaliana, complements bacterial defects in DNA repair. Plant Mol Biol 1996;31:585–593.

78. Zhang L, al-Hendy A, Toivanen P, Skurnik M. Genetic organization and sequence of the rfb gene cluster of Yersinia enterocolitica serotype O:3: similarities to the dTDP-L-rhamnose biosynthesis pathway of Salmonella and to the bacterial polysaccharide transport systems. Mol Microbiol 1993;9:309–321.

79. Yao Z, Valvano MA. Genetic analysis of the O-specific lipopolysaccharide biosynthesis region (rfb) of Escherichia coli K-12 W3110: identification of genes that confer group 6 specificity to Shigella flexneri serotypes Y and 4a. J Bacteriol 1994;176:4133–4143.

80. Enomoto K, Ohki R, Muratsubaki H. Cloning and sequencing of the gene encoding the soluble fumarate reductase from Saccharomyces cerevisiae. DNA Res 1996;3:263–267.

81. Plesiat P, Grandguillot M, Harayama S, Vragar S, Michel-Briand Y. Cloning, sequencing, and expression of the Pseudomonas testosteroni gene encoding 3-oxosteroid delta 1-dehydrogenase. J Bacteriol 1991;173:7219–7227.

82. Florin C, Kohler T, Grandguillot M, Plesiat P. Comamonas testosteroni 3-ketosteroid-delta4(5alpha)-dehydrogenase: gene and protein characterization. J Bacteriol 1996;178:3322–3330.

83. Miller RE, Stadtman ER. Glutamate synthase from Escherichia coli: an iron-sulfide flavoprotein. J Biol Chem 1972;247:7407–7419.

84. Gregerson RG, Miller SS, Twary SN, Gantt JS, Vance CP. Molecular characterization of NADH-dependent glutamate synthase from alfalfa nodules. Plant Cell 1993;5:215–226.

85. Curti B, Vanoni MA, Verzotti E, Zanetti G. Glutamate synthase: a complex iron-sulphur flavoprotein. Biochem Soc Trans 1996;24:95–99.

86. Gosset G, Merino E, Recillas F, Oliver G, Becerril B, Bolivar F. Amino acid sequence analysis of the glutamate synthase enzyme from Escherichia coli K-12. Protein Seq Data Anal 1989;2:9–16.

87. Cavicchioli R, Kolesnikow T, Chiang RC, Gunsalus RP. Characterization of the aegA locus of Escherichia coli: control of gene expression in response to anaerobiosis and nitrate. J Bacteriol 1996;178:6968–6974.

88. Albin N, Johnson MR, Diasio RB. cDNA cloning of bovine liver dihydropyrimidine dehydrogenase. DNA Seq 1996;6:243–250.

89. Lawton MP, Cashman JR, Cresteil T, et al. A nomenclature for the mammalian flavin-containing monooxygenase gene family based on amino acid sequence identities. Arch Biochem Biophys 1994;308:254–257.

90. Atta-Asafo-Adjei E, Lawton MP, Philpot RM. Cloning, sequencing, distribution, and expression in Escherichia coli of flavin-containing monooxygenase 1C1: evidence for a third gene subfamily in rabbits. J Biol Chem 1993;268:9681–9689.

91. Suh JK, Poulsen LL, Ziegler DM, Robertus JD. Molecular cloning and kinetic characterization of a flavin-containing monooxygenase from Saccharomyces cerevisiae. Arch Biochem Biophys 1996;336:268–274.

92. Chen YC, Peoples OP, Walsh CT. Acinetobacter cyclohexanone monooxygenase: gene cloning and sequence determination. J Bacteriol 1988;170:781–789.

93. Brown DW, Yu JH, Kelkar HS, et al. Twenty-five coregulated transcripts define a sterigmatocystin gene cluster in Aspergillus nidulans. Proc Natl Acad Sci USA 1996;93:1418–1422.

94. Liu XL, Scopes RK. Cloning, sequencing and expression of the gene encoding NADH oxidase from the extreme anaerobic thermophile Thermoanaerobium brockii. Biochim Biophys Acta 1993;1174:187–190.

95. Boyd G, Mathews FS, Packman LC, Scrutton NS. Trimethylamine dehydrogenase of bacterium W3A1: molecular cloning, sequence determination and over-expression of the gene. FEBS Lett 1992;308:271–276.

96. Yang CC, Packman LC, Scrutton NS. The primary structure of Hyphomicrobium X dimethylamine dehydrogenase: relationship to trimethylamine dehydrogenase and implications for substrate recognition. Eur J Biochem 1995;232:264–271.

97. Huang L, Rohlfs RJ, Hille R. The reaction of trimethylamine dehydrogenase with electron transferring flavoprotein. J Biol Chem 1995;270:23958–23965.

98. Pelanda R, Vanoni MA, Perego M, et al. Glutamate synthase genes of the diazotroph Azospirillum brasilense: cloning, sequencing, and analysis of functional domains. J Biol Chem 1993;268:3099–3106.

99. Krueger JK, Stock J, Schutt CE. Evidence that the methylesterase of bacterial chemotaxis may be a serine hydrolase. Biochim Biophys Acta 1992;1119:322–326.
