

Proc. Natl. Acad. Sci. USA
Vol. 82, pp. 3771-3775, June 1985
Genetics

Cloning of cDNAs for human aldehyde dehydrogenases 1 and 2
(cDNA expression library/synthetic oligodeoxynucleotide probe/isozymes)

LILY C. HSU*, KENZABURO TANI*, TOSHINOBU FUJIYOSHI*, KOTOKU KURACHI†, AND AKIRA YOSHIDA*‡

*Department of Biochemical Genetics, Beckman Research Institute of the City of Hope, Duarte, CA 91010; and †Department of Biochemistry, University of Washington, Seattle, WA 98195

Communicated by Arno G. Motulsky, February 7, 1985

ABSTRACT Partial cDNA clones encoding human cytosolic aldehyde dehydrogenase (ALDH1) and mitochondrial aldehyde dehydrogenase (ALDH2) were isolated from a human liver cDNA library constructed in phage λgt11. The expression library was screened by using rabbit antibodies against ALDH1 and ALDH2. Positive clones thus obtained were subsequently screened with mixed synthetic oligonucleotides compatible with peptide sequences of ALDH1 and ALDH2. One of the positive clones for ALDH1 contained an insertion of 1.6 kilobase pairs (kbp). The insert encoded 340 amino acid residues and had a 3' noncoding region of 538 bp and a poly(A) segment. The amino acid sequence deduced from the cDNA sequence coincided with the reported amino acid sequence of human ALDH1 [Hempel, J., von Bahr-Lindström, H. & Jörnvall, H. (1984) Eur. J. Biochem. 141, 21-35], except that valine at position 161 in the previous amino acid sequence study was found to be isoleucine in the deduced sequence. Since the amino acid sequence of ALDH2 was unknown, 33 tryptic peptides of human ALDH2 were isolated and sequenced. Based on the amino acid sequence data thus obtained, a mixed oligonucleotide probe was prepared. Two positive clones, λALDH2-21 and λALDH2-36, contained the same insert of 1.2 kbp. Another clone, λALDH2-22, contained an insert of 1.3 kbp. These two inserts contained an overlap region of 0.9 kbp. The combined cDNA contained a sequence that encodes 399 amino acid residues, a chain-termination codon, a 3' untranslated region of 403 bp, and a poly(A) segment. The deduced amino acid sequence was compatible with the amino acid sequences of the tryptic peptides. The degree of homology between human ALDH1 and ALDH2 is 66% for the coding regions of their cDNAs and 69% at the protein level. No significant homology was found in their 3' untranslated regions.

Liver aldehyde dehydrogenase (ALDH; aldehyde:NAD+ oxidoreductase, EC 1.2.1.3) is considered to play a major role in alcohol metabolism. Two major and several minor isozymes exist in the livers of mammals, including man. One of the major isozymes, ALDH1 (or E1), is of cytosolic origin, associated with a low Km for NAD and a high Km for acetaldehyde, and strongly inactivated by disulfiram. Another major isozyme, ALDH2 (or E2), is of mitochondrial origin, associated with a high Km for NAD and a low Km for acetaldehyde, and insensitive to disulfiram (1-6).

Racial differences in these two isozymes have been found between Caucasians and Orientals. All Caucasians examined thus far have both ALDH1 and ALDH2 in their livers (commonly designated "usual"). In contrast, ≈50% of Orientals have only the ALDH1 isozyme and are missing the ALDH2 isozyme (commonly designated "atypical") (7, 8). The atypical Oriental livers, however, contain a defective enzyme, with diminished activity, that is immunologically related to ALDH2 (6, 9, 10). More recently, another abnormality, an absence of active ALDH1 and instead the presence of an enzymatically inactive protein, was found in some Orientals (11). A very high incidence (50-80%) of acute alcohol intoxication in Orientals in comparison to Caucasians (about 10%) could be attributed to genetic differences in the ALDH isozymes (7).

Both ALDH1 and ALDH2 are tetrameric forms (2-5), and the two isozymes do not contain a common subunit (6). The amino acid sequences of human and horse ALDH1 were recently reported (12, 13), but the sequence of ALDH2 is unknown.

In this paper, we report the isolation and characterization of cDNA clones for human ALDH1 and ALDH2. The amino acid sequences of the two isozymes were deduced from their cDNA sequences and compared.

MATERIALS AND METHODS

Sequence Analysis of ALDH2. Human liver ALDH2 was purified to homogeneity from a liver autopsy sample from a Caucasian with the usual phenotype, as described (6). The tryptic peptides were isolated either by peptide mapping or by reversed-phase HPLC (10). Amino acid sequences of the peptides were determined by manual Edman degradation (14).

Preparation of Radioactive Oligonucleotide Probes. Two types of mixed icosamers, corresponding to ALDH1 amino acid sequence, and mixed tetradecamers, corresponding to ALDH2 amino acid sequence, were synthesized by a solid-phase phosphotriester method (see Fig. 1). The chemically synthesized probe was labeled at the 5' end with [γ-32P]ATP (2000-5000 Ci/mmol, 1 Ci = 37 GBq; ICN) and T4 polynucleotide kinase (Bethesda Research Laboratories) by the standard method (15).

Screening of Human Liver cDNA Expression Library with Antibody. Rabbit antibody against homogeneous ALDH1 and that against ALDH2 were partially purified through (NH4)2SO4 precipitation and DEAE-cellulose chromatography as described (6). The human liver cDNA library, constructed by inserting the cDNA copies of poly(A)+ mRNA from human liver into the EcoRI site of bacteriophage vector λgt11 via synthetic linkers (16), was provided by S. L. C. Woo (Howard Hughes Medical Institute, Houston, TX). The cDNA library was screened by the antibody probe method described in a previous paper (17).

Identification of Fusion Protein from Lysates of Induced Recombinant Lysogens. The method used for isolation of fusion protein is essentially identical to that described previously (17). Fusion protein was detected with anti-ALDH1 or anti-ALDH2 antibody.

Abbreviations: ALDH, aldehyde dehydrogenase; ADH, alcohol dehydrogenase; bp, base pair(s).
‡To whom reprint requests should be addressed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Analysis of Recombinant λgt11 Inserts by Southern Blot Hybridization with Oligonucleotide Probes. The DNA preparations were separated by electrophoresis in agarose gels and transferred to nitrocellulose filters (15). Hybridization overnight with 5'-end-labeled oligonucleotide mixture (10⁶ cpm/ml) was at 56°C for the ALDH1 probes and at 38°C for the ALDH2 probes. The filters were subsequently washed three times in 0.9 M NaCl/0.09 M sodium citrate, pH 7, at the hybridization temperature for 15 min, dried at room temperature, and autoradiographed for 2 days at -70°C with two intensifying screens.
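The lower hybridization temperature for the ALDH2 probes goes together with their shorter length. As a rough illustration only (this is not a procedure stated in the paper), the sketch below applies the Wallace rule of thumb for short oligonucleotides, Tm ≈ 2(A+T) + 4(G+C) °C, to a degenerate probe written in IUPAC codes and reports the lowest and highest estimate across the pool. The probe strings are assumptions transcribed from the sense sequences of Fig. 1 (R = A/G, Y = C/T, N = any base), and the function name is hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they stand for (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}

def wallace_tm_range(probe):
    """Return (min, max) Wallace-rule Tm in deg C over a degenerate probe pool."""
    low = high = 0
    for code in probe:
        # A or T contributes 2 deg C; G or C contributes 4 deg C.
        scores = [4 if base in "GC" else 2 for base in IUPAC[code]]
        low += min(scores)
        high += max(scores)
    return low, high

# Assumed probe sequences (sense strand, DNA letters) read from Fig. 1.
ALDH2_14MER = "GGNAAYCCNTTYGA"
ALDH1_PROBE_I = "GARTTYGCNCAYCAYGGCGT"

print(wallace_tm_range(ALDH2_14MER))    # (38, 46)
print(wallace_tm_range(ALDH1_PROBE_I))  # (58, 68)
```

Under these assumptions the reported hybridization temperatures (38°C and 56°C) sit at or just below the bottom of each estimated range. The rule is only an approximation and ignores salt and mismatch effects.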

Subcloning of Phage Inserts and Preparation of Cloned DNA. The individual insert cDNA was ligated to the EcoRI-digested pUC13 vector. Competent Escherichia coli TB1 cells were transformed with the ligated DNA by the calcium chloride procedure (15). Plasmid DNA was prepared from the transformed cells grown in a large-scale liquid culture and was purified by gradient centrifugation in cesium chloride (15).

Restriction Endonuclease Maps. Restriction mapping of the cDNA insert of λgt11 was performed by single or double endonuclease digests of the recombinant phage or the purified insert DNA. Restriction enzymes (Bethesda Research Laboratories and Boehringer Mannheim) were used under the conditions recommended by the suppliers.

DNA Sequence Analysis. The restriction fragments were subcloned in phage M13mp18 and -mp19. DNA sequence was determined by the dideoxynucleotide chain-termination method (18).

RESULTS AND DISCUSSION

Amino Acid Sequences of Tryptic Peptides of ALDH2. To obtain structural information which is essential for synthesis of an oligonucleotide probe and confirmation of cloned cDNA, we determined amino acid sequences of tryptic peptides of ALDH2 (Table 1). Although some of these peptides isolated by peptide mapping or by HPLC were contaminated with other peptides, their sequences can still be aligned, because of the difference in amounts of major peptides and minor contaminant peptides. From the amino acid sequence data obtained, a peptide Gly-Asn-Pro-Phe-Asp was selected as a probe site, and a mixture of 64 different tetradecadeoxynucleotides encoding this sequence was prepared (Fig. 1).
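The count of 64 follows from the codon degeneracy of the chosen peptide. A minimal sketch, assuming the standard genetic code and a 14-nucleotide window that stops after the second base of the Asp codon (the function and variable names are illustrative, not from the paper):

```python
from itertools import product

# Codons (mRNA sense) for the residues of the probe site, standard genetic code.
CODONS = {
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
    "Asn": ["AAU", "AAC"],
    "Pro": ["CCU", "CCC", "CCA", "CCG"],
    "Phe": ["UUU", "UUC"],
    "Asp": ["GAU", "GAC"],
}

def coding_oligos(peptide, length):
    """All distinct mRNA-sense sequences of `length` nt that can encode `peptide`."""
    full = ("".join(codons) for codons in product(*(CODONS[aa] for aa in peptide)))
    # Truncation drops the degenerate third base of the last codon;
    # the set removes the duplicates this creates.
    return sorted({seq[:length] for seq in full})

pool = coding_oligos(["Gly", "Asn", "Pro", "Phe", "Asp"], 14)
print(len(pool))  # 64 = 4 (Gly) x 2 (Asn) x 4 (Pro) x 2 (Phe)
```

Choosing a peptide rich in two-fold degenerate residues and ending the probe before a wobble position keeps the pool small; a full 15-mer over the same peptide would need 128 sequences.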

Screening of cDNA Clones for ALDH1 and ALDH2 with Antibody Probes. The human cDNA library was screened successively four times with either anti-ALDH1 antibody or anti-ALDH2 antibody, neither of which cross-reacts with the host E. coli and λgt11 phage proteins. Five clones that gave a strong signal with anti-ALDH1 but no signal or only a weak signal with anti-ALDH2 were isolated. Two clones, λALDH1-1 and λALDH1-2, which had longer inserts than the others, were selected for the examination of fusion proteins. NaDodSO4/PAGE (not shown) revealed that the molecular sizes of the fusion proteins that reacted with anti-ALDH1 antibody were about 40 kDa (for λALDH1-1) and 20 kDa (for λALDH1-2) larger than E. coli β-galactosidase.

A total of 39 clones that gave a strong signal with anti-ALDH2 but no signal or only a weak signal with anti-ALDH1 were isolated. Three clones, λALDH2-21, λALDH2-22, and λALDH2-36, were selected for examination of fusion proteins. NaDodSO4/PAGE of the lysates prepared from the induced recombinant lysogens indicated that the molecular size of the fusion proteins that reacted with anti-ALDH2 antibody was about 40 kDa larger than E. coli β-galactosidase (Fig. 2), suggesting that the recombinants contained cDNA for human ALDH2.
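As a rough plausibility check on these fusion-protein sizes (an illustration, not part of the original analysis), the extra mass can be converted into an approximate number of residues and coding base pairs. The average residue mass of 110 Da is a common rule of thumb and an assumption here; the function name is hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption, not from the paper

def coding_length_from_mass(extra_kda):
    """Approximate residues and coding bp that add `extra_kda` to a fusion protein."""
    residues = round(extra_kda * 1000 / AVG_RESIDUE_MASS_DA)
    return residues, residues * 3

for clone, extra in [("ALDH1-1", 40), ("ALDH1-2", 20), ("ALDH2 clones", 40)]:
    residues, bp = coding_length_from_mass(extra)
    print(f"{clone}: ~{residues} residues, ~{bp} bp of coding sequence")
```

On this estimate, about 40 kDa of extra protein corresponds to roughly 1.1 kbp of in-frame coding sequence.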

Southern Blotting Analysis and Subcloning of the ALDH1 and ALDH2 Recombinant Clones. DNA from λALDH1-1 was digested with EcoRI and subjected to Southern blot hybridization with ALDH1 oligonucleotide probes. The 1.6-kilobase-pair (kbp) insert hybridized strongly with the synthetic probes (not shown). The λALDH1-1 insert was subcloned in vector pUC13 (Fig. 3A). DNA prepared from the subclone was digested with Pst I or doubly digested with Pst I and EcoRI and subjected to Southern blot hybridization. The probe site was located on a 0.94-kbp Pst I/EcoRI fragment (Fig. 3 B-D).

The inserts of all three λALDH2 DNAs hybridized with the ALDH2 oligonucleotide probe (not shown). The insert size of λALDH2-21 and -36 was estimated as about 1.2 kbp, and that of λALDH2-22 as 1.3 kbp.

Table 1. Amino acid sequences of tryptic peptides obtained from human liver ALDH2

Peptides from HPLC
Peak or spot: III, VII, VIII, IX, XII, XX, XXI, XXIII, XXX, XXXIII, XXXIV, XXXVII, XXXVIII, XLV
Amino acid sequence:
  Ala-Val-Lys; Tyr-His-Gly-Lys [5]
  Ser-Val-Ala-Arg [16]; Met-His-Gly-Lys
  Val-Pro-Glx-Lys [34]; Ser-Tyr-Thr-Arg; Met-Asn-Ala-Ser-His-Arg
  Cys-Leu-Arg [3]
  Leu-Leu-Asn-Arg
  Tyr-Tyr-Ala-Gly-Trp-Ala-Asp-Lys [4]; Gly-Thr-Leu-Glu-Leu-Glu-Val-Asx-Lys
  Val-Val-Gly-Asx-Pro-Phe-Asx-Ser-Lys [19]; Leu-Ala-Asx-Leu-Ile-Glx-Ile-Leu-Gly-Tyr-Ile-Asn-Thr-Gly-Lys [21]
  Ala-Ala-Phe-Pro-Thr-Gly-Ser-Pro-Ala-; Lys-Thr-Glx-Glx-Leu-Val-Asx-Leu-Arg
  Thr-Ile-Pro-Ile-Asp-Gly-Asx-Phe-Phe-Ser-Tyr- [6]
  His-Glu-Pro-Val-Gly-Val-Cys-Gly-Gln-Ile-Ile- [7]; Thr-Phe-Val-Glx-Glx-Asp-Ile-Tyr-Asp-Glu-Phe-Val- [15]
  Glu-Ala-Gly-Phe-Pro-Pro-Gly-Val-Val-Asn-Ile-Val-Pro- [10]
  Glu-Glu-Ile-Phe-Gly-Pro-Val-Met-Glx-Ile-Leu- [25]; Ser-Pro-Asn-Ile-Ile-Met-Ser-Asp-Ala-Asp-Met- [14]
  Ala-Asx-Tyr-Leu-Ser- [30]

Peptides from peptide map
Peak or spot: II, III, IV, V, VI, VII, VIII
Amino acid sequence:
  Met-Asn-Ala-Ser-His-Arg
  Met-Ser-Gly-Ser-Gly-Arg [31]
  Leu-Gly-Pro-Ala-Leu-Ala-Thr-Gly-Asx-Val-Val-Val- [8]; Asp-Leu-Asp- [29]
  Val-Thr-Leu-Glu-Leu-Gly-Gly-Lys [13]; Gly-Tyr-Phe-Ile-Glx-Pro-Thr-Val-Phe-Gly-Asx-Val- [24]
  Val-Val-Gly-Asn-Pro-Phe-Asp-Ser- [19]
  Val-Pro-Gln-Lys [34]
  Leu-Leu-Cys-Gly-Gly-Gly-Ile-Ala-Ala-Asp- [23]
  Val-Ala-Phe-Thr-Gly-Ser-Thr-Glx- [11]

The sequences in bold type are compatible with the sequence deduced from cDNA; the bracketed numbers correspond to the singly underlined sequences in Fig. 5.

ALDH1, Probe I
  Peptide:     H2N-Glu-Phe-Ala-His-His-Gly-Val-COOH
  mRNA:        5' GA(A/G) UU(U/C) GCN CA(U/C) CA(U/C) GGC GU 3'
  cDNA:        3' CT(C/T) AA(G/A) CGN GT(A/G) GT(A/G) CCG CA 5'

ALDH1, Probe II
  Peptide:     H2N-Gln-Gly-Gln-Cys-Cys-Ile-Ala-COOH
  mRNA:        5' CA(A/G) GGN CA(A/G) UG(U/C) UG(U/C) AUC GC 3'
  cDNA:        3' GT(C/T) CCN GT(C/T) AC(G/A) AC(G/A) TAG CG 5'

ALDH2
  Peptide:     H2N-Gly-Asn-Pro-Phe-Asp-COOH
  mRNA:        5' GGN AA(U/C) CCN UU(U/C) GA 3'
  cDNA probe:  3' CCN TT(A/G) GGN AA(A/G) CT 5'

FIG. 1. Synthetic oligonucleotides used as probes. The ALDH1 probes each consisted of 64 different icosamers corresponding to amino acid sequences of a tryptic peptide of ALDH1 which has been implicated in the disulfiram binding site (12, 19). The ALDH2 probe consisted of 64 different tetradecamers corresponding to amino acid sequence of a tryptic peptide obtained from human ALDH2. N, all four possible deoxynucleotides.
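Fig. 1 gives each probe as the strand complementary to the mRNA. A minimal sketch of that conversion for the ALDH2 probe, assuming IUPAC degenerate codes (R = A/G, Y = C/T, N = any base) and a sense sequence transcribed from the figure; the helper names are hypothetical:

```python
# Complements of the IUPAC codes used here; N is its own complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
              "R": "Y", "Y": "R", "N": "N"}

def antisense(sense_5to3):
    """Complementary DNA strand, written 5'->3', for a degenerate sense sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sense_5to3))

def degeneracy(seq):
    """Number of distinct sequences represented by a degenerate string."""
    count = 1
    for base in seq:
        count *= {"R": 2, "Y": 2, "N": 4}.get(base, 1)
    return count

aldh2_sense = "GGNAAYCCNTTYGA"  # assumed from Fig. 1, DNA letters
probe = antisense(aldh2_sense)
print(probe, degeneracy(probe))  # TCRAANGGRTTNCC 64
```

Read 3' to 5', the result is CCN TT(A/G) GGN AA(A/G) CT, matching the cDNA probe row of the figure, and the pool size is unchanged by taking the complement.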

Restriction Endonuclease Maps and Nucleotide Sequences.The restriction map of the EcoRJ insert of XALDH1-1 is

A1 2 3 4 5 6

B1 2 3 4 5 6

a -

FIG. 2. Detection of fusion protein by NaDodSO4/PAGE andimmunoblotting. Proteins accumulating in induced lysogens contain-ing Xgtll, XALDH2-21, XALDH2-22, and XALDH2-36, respectively,were compared. (A) Gel stained with Coomassie blue. (B) Replicanitrocellulose filter stained with anti-ALDH2 antibody and peroxi-dase-conjugated goat antiserum against rabbit IgG. Lane 1: E. coli,-galactosidase (a), bovine serum albumin (b), ALDH1 (c), ALDH2(d), and egg albumin (e). Lane 2: lysate from host E. coil BNN103.Lanes 3-6: lysates from BNN103 lysogenized with Xgt11, XALDH2-21, XALDH2-22, and AALDH2-36, respectively.

FIG. 3. Agarose gel electrophoresis and Southern blot hybridiza-tion of ALDH1 recombinant DNA. (A) Agarose gel stained withethidium bromide. Lanes: 1, EcoRI-digested XALDH1-1 DNA; 2,EcoRI-digested plasmid pUC13 DNA with XALDH1-1 insert; 3,HindIII-digested wild-type X phage DNA; (B-D) Agarose gel stainedwith ethidium bromide (B) and autoradiograms of replica nitrocel-lulose filters hybridized with ALDH1 probes I (C) and II (D),respectively. Lanes: a, HindIII-digested wild-type X DNA; b, PstI/EcoRI-digested plasmid DNA with AALDH1-1 insert; c, PstI-digested plasmid DNA with AALDH1-1 insert;

shown in Fig. 4. Cleavage sites for Sst I, Kpn I, Sma I,BamHI, Xba I, Sal I, and Sph I were not found in the insert.The strategy used forthe sequence determination of XALDH1-1 is also outlined in Fig. 4. The nucleotide sequence and thededuced amino acid sequence are shown in Fig. 5. The cDNAsequence was verified by the data generated from bothstrands. It contains a coding sequence for 340 amino acidresidues, a 3' noncoding region of 538 bp, and a poly(A)segment. The amino acid sequence deduced from the nucleo-tide sequence exactly coincides with the reported amino acidsequence from position 161 to the COOH-terminal position500 ofhuman ALDH1, except for one position (12). Based onour sequence data, the amino acid at position 161 should beisoleucine, whereas it was reported as valine in the proteinsequence study (12). Although the possibility of errors in thesequence determination cannot be ruled out, the discrepancycould be due to a substitution, G -- A, which occurred duringthe ligation steps, or to genetic polymorphism at the ALDHIlocus in man.The restriction endonuclease cleavage maps of the three

clones for ALDH2 are shown in Fig. 4. The cDNA insert ofAALDH2-22 contained a region of 0.9 kbp that overlappedwith the insert of XALDH2-21 and -36; together, the insertscovered 1.6 kbp. Cleavage sites for HincII, Sma I, Pst I,

HindIII, and Xba I were detected. Six Sau3A1 sites were alsolocated in the clones (Fig. 4). No site for BamHI, Acc I, KpnI, Sal I, Sst I, or Sph I was found in the clones. The syntheticoligonucleotide probe site was located between the Sma I andSau3Al sites.The strategy employed to determine the sequences of the

cDNA inserts ofXALDH2-21, -22, and -36 is outlined in Fig.4. The DNA sequences were verified by the data generatedfrom both strands. The combined sequence of cDNA forALDH2 derived from the clones is given in Fig. 5. Theexistence of a poly(A) segment at the 3' end of the combinedcPNA suggested that it contained the 3' untranslated regionofALDH2 mRNA. Two of the three possible reading framesencountered in-phase termination codons at positions 105and 215, respectively, from the 5' end of the combinedsequence. The remaining reading frame encodes 399 aminoacid residues before the stop codon TAA is encountered; thisstop codon is followed by a 3' noncoding region of 403nucleotides. The noncoding region does not contain the-A-A-T-A-A-A- sequence which is considered to be impor-

ALDH1

mRNA

Genetics: Hsu et al.

3774 Genetics: Hsu et al.

0 500 1000

Proc. Natl. Acad. Sci. USA 82 (1985)

1500 bp

SSHp T AS P S P HIll. II 113'

kXALDH2-22I XALDH2-21,36 I

S SHc SinS Ss HcPHS XI II Iii 1~~~~~3

cDNA2:ALDH2:

cDNAj:ALDH1:cDNA2:ALDH2:

cDNAI:ALDH1:cDNA2:ALDH2:

FIG. 4. Restriction maps of cDNA insert ofAALDH1 and that of XALDH2 and sequencedetermination strategy. Horizontal arrows indi-cate the direction and extent sequencing. Restric-tion endonuclease cleavage sites: A, Ava II; H,HindIII; Hc, HincII; Hp, Hpa II; P, Pst I; S,Sau3Al; Sm, Sma I; T, Taq I; and X,Xba I. Thicklines indicate the coding regions.

cDNA2: CTGGCGGCCTTGGAGACCCTGGACAATGGCAAGCCCTATGTCATCTCCTACCTGGTGALDH2: LeuAlaAlaLeuGluThrLeuAspAsnGlyLysProTyrVal IleSerTyrLeuVal 19

GATTTGGACATGGTCCTCAAATGTCTCCGGTATTATGCCGGCTGGGCTGATAAGTACCACGGGAAAACCATCCCCATTGACGGAGACTTCTTCAGCTACACACGCCATGAACCTGTGGGGAspLeuAspMetV LeuLysCysLeuArgTyrTyrAlaGlyTrpAlaAspLysTyrHisGlyLysThrI1ePro11eAspGlyAspPhePheSeryrTh9rArgHisGluProVa1G1y59

ATATGTGGCCAAATCATTCCTTGGAATTTCCCGTTGGTTATGCTCATTTGGAAGATAGGGCCTGCACTGAGCTGTGGAAACACAGTGGTTGTCAAACCAGCAGAGCAAACTCCTCTCACTI1 eCysGlyCl n IleI leProTrpAsnPheProLeuValMetLeulIleTrpLys IleGlyProAl aLeuSerCysGlyAsnThrVal Val Val LysProAl aGl uGl nThrProLeuThrGTGTGCGGGCAGATCATTCCGTGGMATTTCCCGCTCCTGATGCAAGCATGGAAGCTGGGCCCAGCCTTGGCAACTGGAAACGTGGTTGTGATGAAGGTAGCTGAGCAGACACCCCTCACCVal CysGlyGl n IleI 1eProTrpAsnPheProLeuLeuMetGl nAl aTrpLys~euGlyProAl aLeuAl aThrGlyAsnVal Val ValMetLysValAl aGl udlnThrProLeuThr 99

GCTCTCCACGTGGCATCTTTAATAAAAGAGGCAGGGTTTCCTCCTGGAGTAGTGAATATTGTTCCTGGTTATGGGCCTACAGCAGGGGCAGCCATTTCTTCTCACATGGATATAGACMAAA1 aLeuHi sValAl aSerLeuI leLysGl uAl aGl yPheProProGl yVal ValAsn IleVal ProGl yTyrGl yProThrAl aGlyAl aAl a IleSerSerHi sMetAspIlleAspLysGCCCTCTATGTGGCCAACCTGATCAAGGAGGCTGGCTTTCCCCCTGGTGTGGTCAACATTGTGCCTGGATTTGGCCCCACGGCTGGGGCCGCCATTGCCTCCCATGAGGATGTGGACAAAA1 aLeuTyr~alAl aAsnLeuI 1eLysGl uAl aGlyPheProProGlyVal ValAsn Il1eVa! ProGl'yPheGlyProThrAl iGl yAl aAl a IleAlaSerHi sGl uAspValAspLysl139

u_

cDNAj: GTAGCCTTCACAGGATCAACAGAGGTTGGCAAGTTGATCAAAGAAGCTGCCGGGAAAAGCAATCTGAAGAGGGTGACCCTGGAGCTTGGAGGAAAGAGCCCTTGCATTGTGTTAGCTGATALDHj: ValAl aPheThrGlySerThrGl uVal G1yLysLeuI leLysGl uAl aAl aGl yLysSerAsnLeuLysArgValThrLeuGl uLeuGlyGlyLysSerProCysI leVal LeuAl aAspcDNA2: GTGGCATTCACAGGCTCCACTGAGATTGGCCGCGTAATCCAGGTTGCTGCTGGGAGCAGCAACCTCAAGAGAGTGACCTTGGAGCTGGGGGGGAAGAGCCCCAACATCATCATGTCAGATALDH2: ValAlaPheThrGlySerThrGluIleGlyArgVal IeGlnVa1A1aAlaGlySerSerAsnLeuLysArgVa1ThrLeuGluLeuGlyGlyLysSerProAsnIleIleetSersALDH2: ~~~1113 1

Ap17

cDNAj:ALDH1:cDNA2:ALDH2:

cDNAj:ALDH1:cDNA2:ALDH2:

cDNAj:ALDH1:cDNA2:ALDH2:

cDNAI:ALDH1:cDNA2:ALDH2:

GCCGACTTGGACAATGCTGTT_>gATTTGCACACCATGGGhTTCTACCAkCfAGGGCCAGTGTTGTATAGCGCATCCAGGATTTTTGTGGAAGAATCAATTTATGATGAGTTTGTTCGAAlaAspLeuAspAsnAlaVal GuPheA aHisHisGlyVal PheTyrHisGlnGlyGlnCysCysTle- aAl aSerArgI1ePheVa1G1uGluSerIleTyrAspGl uPheValArgGCCGATATGGATTGGGCCGTGGAACAGGCCCACTTCGCCCTGTTCTTCAACCAGGGCCAGTGCTGCTGTGCCGGCTCCCGGACCTTCGTGCAGGAGGACATCTATGATGAGTTTGTGGTGA1 aAspMetAspTrpAl aVal G1uGl nAl aHi sPheAl aLeuPhePheAsnGlnGlyGl nCjsCysCysAl aGlySerArgThrPheVal G1nGl uAspIleTyrAspGl uPheVal Val 219

AGGAGTGTTGAGCGGGCTAAGAAGTATATCCTTGGAAATCCTCTGACCCCAGGAGTCACTCAAGGCCCTCAGATTGACAAGGAACAATATGATAAAATACTTGACCTCATTGAGAGTGGGArgSer~a1GluArgAl aLysLysTyrI1eLeuGl yAsnProLeuThrProGlya1ThrGlnGl yPrGnIeAspLysGluGlnTyrAspLysIeLeuAspLeuI1eGl uSerGlyCGGAGCGTTGCCCGGGCCAAGTCTCGGGTGGT GGAACCC TTTG4TAGCAAGACCGAGCAGGGGCCGCAGGTGGATGAAACTCAGTTTAAGAAGATCCTCGGCTACATCAACACGGGGArgSerVa1aArgAl aLysSerArgVa1Va1G1yAsnProPheAspSerLysThrGl uGlnGlyProGlnaAspGl uThrGl nPheLysLysIeLeuGlyTyrIeAsnThrGly 259

I6 19 Zi

AAGAAAGAAGGGGCCAAACTGGAATGTGGAGGAGGCCCGTGGGGGAATAAAGGCTACTTTGTCCAGCCCACAGTGTTCTCTMATGTTACAGATGAGATGCGCATTGCCAAAGAGGAGATTLysLysGl uGl yAl aLysLeuGl uCysGl yGl yGl yProTrpGlyAsnLysGl yTyrPheVal G1nProThrVa 1PheSerAsnVal1ThrAspGl uMetArgIl~eAl aLysGl uGl u IleAAGCAAGAGGGGGCGAAGCTGCTGTGTGGTGGGGGCATTGCTGCTGACCGTGGTTACTTCATCCAGCCCACTGTGTTTGGAGATCTGCAGGATGGCATGACCATCGCCMAGGAGGAGATCLysGl nGl uGl yAl aLysLeuLeuCysGl yGl y~lyI 1eAl aAl aAspArgGl yTyrPhelIleGl nProThrVal pheGl yAspVal G1nAspGlyMetThrI leAl aLysGl uGl uIle 299

23 24

TTTGGACCAGTGCAGCAAATCATGAAGTTTAAATCTTTAGATGACGTGATCAAAAGAGCAAACMATACTTTCTATGGCTTATCAGCAGGAGTGTTTACCMMAGACATTGATMAAGCCATAPheGlyProValG1nGlnIleMetLysPheLysSerLeuAspAspVal IleLysArgAlaAsnAsnThrPheTyrGlyLeuSerAlaGlyValPheThrLysAspIleAspLysAlalleTTCGGGCCAGTGATGCAGATCCTGAAGTTCAAGACCATAGAGGAGGTTGTTGGGAGAGCCAACMTTCCACGTACG3GCTGGCCGCAGCTGTCTTCACAAAGGATTTGGAC9GGCCMATpheGl {ProVal MetGl nlIleLeuLysPheLysthrIl1eGl uGl uVal1Va 1G1lyArgAl aAsnAsnSerThrTyrGlyLeuAl aAl aAl aVal PheThrLysAspLeuAspLysAl a~sn 339

cDNAI: ACAATCTCCTCTGCTCTGCAGGCAGGAACAGTGTGGGTGAAITGCTIAITGGCGTGGITAAGIGCCAGTCICICiTGGGATTCAAGATIb1bTGGAAATIbb GARRACITGGWAGAGTACALDHj: ThrI eSerSerAl aLeuGl nAl aGl yThrVal1TrpVal1AsnCysTyrGl yVal1Val1SerAl aGl nCysProPheGl ytlyPheLyst~etSerGl yAsnGl yArgGl uLeuGly~l uTyrcDNA2: TACCTGTCCCAGGCCCTCCAGGCGGGCACTGTGTGGGTCMACTGCTATGATGTGTTTGGAGCCCAGTCACCCTTTGGTGGCTACMAGATGTCGGGGAGTGGCCGGGAGTTGGGCGAGTACALDH2: TyrLeuSerGlnAlaLeuGnAl aGlyThrValTrpValAsnCysTyrAspValPheGlyAlaGl nSerProPheGlyGlyTyrLysetSerGlSerGlyArgGluLeuGlyGluTyr 379

cDNAI1G6TTTCCATGMTATACAGAGGTCAAMACAGTCACAGTGAAAATCTC TCAGMAGAACTCATAAAGAAAATACAAGAGTGGAGAGAAGC TCTTCMATA6CTAGCATCTCCTTACAGTCACALDH1: G1yPheHisG1uTyrThr luVa1LysThrVa1ThrVa1LysI1eSerG1nLysAsnSerTermcDNA2: 666CT6CA6GCATACACTGAGTGAAAACTGTCACAGTCAAAGTGCCTCAGAAGMCTCATAAGAATCATGCAAGCTTCCTCCCTCA6CCATTGAT6GAAAGTTCAGCAAGATCAGCMCALDH2: G1lyLeuGl nAl aTyrThriluVal1LysThrVal1ThrVal1LysVal1Pro61lnLysAsnSerTerm39

11 34

cDNAj:cDNA2:cDNA1:cDNA2:

cDNAI:cDNA2:

cDNAj:cDNAj:

TAATATAGTAGATTTTAAAGACAAAATTTTTCTTTTCTTGIfi^IATTTTTTTTAAACATAAGCTAAATCATATTAGTATTAATACTACCCATAGAAAACTTGACATGTAGCTI xTCTTCTAAAAAAACCAAGAAAAMTGATCCTTGCGTGCTGAATATCTGAAAAGAGAAATTTTTCCTACAAAATCTCTTGGGTCAAGAAGTTCTAGAATTTGAATTGATAAACATGGTGGGTTGGCTGAGATTATTTGCCTTCTGAAATGTGACCCCCAAGTCCTATCCTAAATAAAAAAGACAAATTCGGATGTATGATCTCTCTAGCTTTGTCATAGTTATGTGATTTTCCT-TTGTAGCTACTTTTGGGTAAGAGTATATGAGGAACCTTTTAAACGACAACATACTGCTAGCTTTCAGGATGATTTTTAAAAAATAGATTCAAATGTGTTATCCTCTCTCTGAAACGCTTCCTATAACTCGAGTTCAGGATAATAATTTTATAGAAAAGGAACAGTTGCATTTAGCTTCTTTCCCTTAGTGACTCTTGAAGTACTTAACATACACGTTMACTGCAGAGTAAATTGCTCTGTTC6CAGTAGTTATATATAGGGGAAGAAAAAGCTATTGTTTACAATTATATCACCATTAAGGCAACTGCTACACCCTGCTTTGTATTCTGGGCTAAGATTCATTAAAACTAGCTGCTCTT (A )1I5AAGTCCTTGGACTGTTTTGAAAAGTTTCCTAGGATGTCATGTCTGCTTGTCAAAAGAAATAATCCCTGTAATATTTAGCTGTAAACTGAATATAAAGCTTAATAA w cAACCTTGCATAT(A )18

FIG. 5. Nucleotide sequences ofcDNAs and deduced partial amino acid sequences for human ALDH1 and ALDH2. The singly underlinedamino acid sequences are compatible with the ALDH2 tryptic peptides analyzed; the numbers immediately below these regions correspond to

those of the tryptic peptides listed in Table 1. The doubly underlined region corresponds to the tryptic peptide which includes an amino acidsubstitution found in the atypical Oriental ALDH22 (10). Arrows indicate the substitution sites in the atypical gene and enzyme (10). The syntheticprobe sites are boxed. Numbers at right indicate amino acid residue numbers in the partial ALDH2 sequence.

ALDH1 5I

ALDH2 5'

Proc. Natl. Acad. Sci. USA 82 (1985) 3775

tant for the addition of the poly(A) tail to the 3' end of mRNA (20). However, a comparable sequence, -A-T-T-A-A-A-, is located 20 bases upstream from the 3' end of the region (Fig. 5).

The cDNA includes the synthetic probe site. The deduced 399 amino acid sequence contained 21 regions that were compatible with the amino acid sequences of ALDH2 tryptic peptides (Table 1 and Fig. 5), strong evidence that the cDNA we obtained is for human ALDH2 isozyme. The deduced amino acid sequence at positions 12-25 from the COOH terminus is similar to that of a tryptic peptide that has been implicated in the abnormality of the atypical Oriental ALDH2 molecule (10). In atypical Oriental ALDH2², the position 14 amino acids from the COOH terminus is lysine instead of the usual glutamic acid, and this single amino acid substitution seems to have resulted in a drastic reduction of the enzyme activity (10). There are some discrepancies between the previous amino acid sequence data (10) and the amino acid sequence deduced from the nucleotide sequence in this part of the molecule. The previous amino acid sequence data included errors, presumably due to contamination in the peptide samples and decomposition of tyrosine during acid hydrolysis.

Amino acid sequence data available for ALDH2 have been limited to sequences of two tryptic peptides of horse ALDH2, one of 15 residues and the other of 23 residues (21, 22). The subunit molecular weight of human ALDH2 was estimated to be 52,600 (6). The deduced COOH-terminal sequence of 399 amino acids thus accounts for >80% of human ALDH2. A comparison of the 399 residues of ALDH2 and the reported amino acid sequence of human ALDH1 (12) indicated 69% homology between the two human isozymes (Fig. 5). Sixty-five out of 123 substitutions are compatible with single-base changes.

The degree of homology (about 69% at the amino acid level and 66% at the coding nucleotide level) between human ALDH1 and ALDH2 is lower than the homology between human ALDH1 and horse ALDH1, which is estimated to be 91% (13). This finding is compatible with early evolutionary divergence of the cytosolic and mitochondrial isozymes. It has been reported that homology between the pig mitochondrial and cytosolic aspartate aminotransferases is about 50% at the protein level (23). Thus, the rate of divergence of mitochondrial and cytosolic isozymes appears to differ substantially among enzymes.

ALDH1 is strongly inactivated by disulfiram, whereas ALDH2 is resistant to this agent (2, 4). A cysteine residue at position 302 from the NH2 terminus (position 199 from the COOH terminus) has been implicated in the disulfiram binding site of ALDH1 (12, 19). A cysteine residue is in the corresponding position in ALDH2 (position 200 in Fig. 5). However, the isoleucine residue next to this cysteine in ALDH1 is replaced by a cysteine in ALDH2; the corresponding sequences are -Gly-Gln-Cys-Cys-Ile-Ala- for ALDH1 and -Gly-Gln-Cys-Cys-Cys-Ala- for ALDH2 (see Fig. 5).

The present study, together with the previous amino acid substitution study (10), provides the exact nucleotide sequences for the usual ALDH2¹ and the atypical ALDH2² genes: -GAA-GTG-AAA-ACT-GTC-ACA- in the region for ALDH2¹, and AAA instead of GAA for ALDH2² (Fig. 5, arrows). Use of two synthetic oligodeoxynucleotide probes corresponding to these sequences would make it possible to determine genotypes of individuals by Southern hybridization analysis of DNA from their peripheral blood cells, as previously accomplished in other cases (24, 25). Chromosomal assignments for ALDH1 and ALDH2 loci in man are not yet known. Use of the cloned cDNAs should allow these assignments to be made.
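The relationship between the two allelic sequences quoted above can be checked with a short sketch. The code below is an illustration only, not part of the original analysis: it translates the quoted six-codon region with the standard genetic code and lists the base differences between the usual and atypical alleles. The atypical sequence is built here by replacing GAA with AAA, as the text describes; the names `USUAL`, `ATYPICAL`, `translate`, and `base_differences` are hypothetical, and the codon table covers only the codons that occur in this region.

```python
# Standard genetic code, restricted to the codons present in the quoted region.
CODON_TO_AA = {
    "GAA": "Glu",
    "AAA": "Lys",
    "GTG": "Val",
    "GTC": "Val",
    "ACT": "Thr",
    "ACA": "Thr",
}

# Region quoted in the text for the usual allele.
USUAL = "-GAA-GTG-AAA-ACT-GTC-ACA-"
# Assumption: atypical allele constructed by substituting AAA for GAA, per the text.
ATYPICAL = "-AAA-GTG-AAA-ACT-GTC-ACA-"


def translate(dashed_seq):
    """Translate a dash-separated codon string into three-letter amino acid names."""
    codons = [codon for codon in dashed_seq.split("-") if codon]
    return [CODON_TO_AA[codon] for codon in codons]


def base_differences(seq_a, seq_b):
    """Return (0-based position, base in seq_a, base in seq_b) for each mismatch."""
    flat_a = seq_a.replace("-", "")
    flat_b = seq_b.replace("-", "")
    if len(flat_a) != len(flat_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(flat_a, flat_b)) if a != b]


if __name__ == "__main__":
    print("usual:   ", "-".join(translate(USUAL)))
    print("atypical:", "-".join(translate(ATYPICAL)))
    print("base differences:", base_differences(USUAL, ATYPICAL))
```

Under these assumptions the two 18-base sequences differ at a single base (G to A in the first codon), which converts glutamic acid to lysine. A single-base mismatch of this kind is what a pair of allele-specific oligonucleotide probes would be expected to discriminate.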

Racial differences were also observed in alcohol dehydrogenase (ADH). The majority of Orientals have the "variant" ADH2² gene, producing the atypical enzyme which exhibits 100 times higher specific activity than the wild-type enzyme (26). The structural difference between the two types of the enzyme has been determined (26). Recently, a full-length cDNA for human ADH2 was cloned and sequenced, and the nucleotide difference between the wild-type ADH2¹ and variant ADH2² genes became apparent (17). Determination of the organization of ALDH and ADH genes may shed light on the possible relationship between the polymorphism of these loci and alcohol sensitivity and/or alcoholism in Caucasians and Orientals.

We are indebted to Dr. Savio L. C. Woo for allowing us to use the human liver cDNA library. We also thank Dr. G. L. Forrest for providing the pUC13 vector, Dr. T. O. Baldwin for providing E. coli strain TB1, and Dr. B. Simmer and Mr. T. Hankapiller for computer analysis. This work was supported by Public Health Service Grants HL-29515 and AA05763.

1. Crow, K. E., Kitson, T. M., McGibbon, A. K. H. & Batt, R. D. (1974) Biochim. Biophys. Acta 350, 121-128.

2. Eckfeldt, J., Mope, L., Takio, K. & Yonetani, T. (1976) J. Biol. Chem. 251, 236-240.

3. Eckfeldt, J. & Yonetani, T. (1976) Arch. Biochem. Biophys. 175, 717-722.

4. Greenfield, N. J. & Pietruszko, R. (1977) Biochim. Biophys. Acta 483, 35-45.

5. Kitabatake, N., Sasaki, R. & Chiba, H. (1981) J. Biochem. (Tokyo) 89, 1223-1229.

6. Ikawa, M., Impraim, C. C., Wang, G. & Yoshida, A. (1983) J. Biol. Chem. 258, 6282-6287.

7. Goedde, H. W., Harada, S. & Agarwal, D. P. (1979) Hum. Genet. 51, 331-334.

8. Teng, Y.-S. (1981) Biochem. Genet. 19, 107-114.

9. Impraim, C. C., Wang, G. & Yoshida, A. (1982) Am. J. Hum. Genet. 34, 837-841.

10. Yoshida, A., Huang, I.-Y. & Ikawa, M. (1984) Proc. Natl. Acad. Sci. USA 81, 258-261.

11. Yoshida, A., Wang, G. & Dave, V. (1983) Am. J. Hum. Genet. 35, 1117-1125.

12. Hempel, J., von Bahr-Lindstrom, H. & Jornvall, H. (1984) Eur. J. Biochem. 141, 21-35.

13. von Bahr-Lindstrom, H., Hempel, J. & Jornvall, H. (1984) Eur. J. Biochem. 141, 37-42.

14. Huang, I.-Y., Rubinfien, E. & Yoshida, A. (1980) J. Biol. Chem. 255, 6408-6411.

15. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).

16. Young, R. A. & Davis, R. W. (1983) Proc. Natl. Acad. Sci. USA 80, 1194-1198.

17. Ikuta, T., Fujiyoshi, T., Kurachi, K. & Yoshida, A. (1985) Proc. Natl. Acad. Sci. USA 82, 2703-2707.

18. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.

19. Hempel, J., Pietruszko, R., Fietzek, P. & Jornvall, H. (1982) Biochemistry 21, 6834-6838.

20. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.

21. von Bahr-Lindstrom, H., Sohn, S., Woenckhaus, C., Jeck, R. & Jornvall, H. (1981) Eur. J. Biochem. 117, 521-526.

22. Hempel, J., von Bahr-Lindstrom, H. & Jornvall, H. (1983) Pharmacol., Biochem. Behav. 18, 117-121.

23. Kagamiyama, H., Sakakibara, R., Tanase, S., Morino, Y. & Wada, H. (1980) J. Biol. Chem. 255, 6153-6159.

24. Conner, B. J., Reyes, A. A., Morin, C., Itakura, K., Teplitz, R. L. & Wallace, R. B. (1983) Proc. Natl. Acad. Sci. USA 80, 278-282.

25. Kidd, V. J., Wallace, R. B., Itakura, K. & Woo, S. L. C. (1983) Nature (London) 304, 230-234.

26. Yoshida, A., Impraim, C. C. & Huang, I.-Y. (1981) J. Biol. Chem. 256, 12430-12436.
