  • Proc. Natl. Acad. Sci. USA 79 (1982)

    Correction. In the article "Nucleotide sequence and the encoded amino acids of human serum albumin mRNA" by Achilles Dugaiczyk, Simon W. Law, and Olivia E. Dennison, which appeared in the January 1982 issue of Proc. Natl. Acad. Sci. USA (79, 71-75), the first paragraph of Discussion was garbled by a printer's error. The correct paragraph is printed below.

    Determining the complete nucleotide sequence of the cDNA has permitted us to identify the pre- and the propeptides of human serum albumin. Actually, the amino acid sequence of the propeptide has been reported for carriers of an abnormal albumin, Christchurch (18), which is longer than normal human albumin by six amino acids at the NH2 terminus. This additional hexapeptide has the sequence Arg-Gly-Val-Phe-Arg-Gln (18) and differs only in the terminal position from the sequence we are presently reporting for the apparently normal protein, Arg-Gly-Val-Phe-Arg-Arg-albumin (Fig. 3). Thus, carriers of albumin Christchurch must be carriers of a CGA to CAA mutation, which changes the codon for Arg to Gln in the last position of the propeptide. The altered protein consequently ceases to be a substrate for the specific protease that removes propeptides from secretory proteins. It is interesting to note, however, that failure to remove such a propeptide does not prevent the protein from being secreted; at least this is true about albumin Christchurch, which has reached the bloodstream of its carriers (18).
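The codon change described in the corrected paragraph can be checked with a minimal sketch. The two codons and their amino acids come from the text; the helper name `single_base_changes` and the variable names are hypothetical, introduced only for illustration.

```python
# Only the two codons named in the text are needed here (standard genetic code).
CODON_TO_AA = {"CGA": "Arg", "CAA": "Gln"}


def single_base_changes(codon_a, codon_b):
    """Return (position, base_a, base_b) for every position where the codons differ.

    Positions are 1-based within the codon.
    """
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(codon_a, codon_b))
        if a != b
    ]


normal_codon = "CGA"        # last codon of the normal propeptide (Arg)
christchurch_codon = "CAA"  # corresponding codon inferred for albumin Christchurch (Gln)

changes = single_base_changes(normal_codon, christchurch_codon)
print(CODON_TO_AA[normal_codon], "->", CODON_TO_AA[christchurch_codon], changes)
```

Under these assumptions the two codons differ at a single position (the second base, G to A), which is consistent with the one-step Arg to Gln substitution that the authors infer.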

    Correction. In the article "A proton gradient controls a calcium-release channel in sarcoplasmic reticulum" by Varda Shoshan, David H. MacLennan, and Donald S. Wood, which appeared in the August 1981 issue of Proc. Natl. Acad. Sci. USA (78, 4828-4832), the authors request that the following correction be noted. In the experiment reported in Fig. 5 and discussed on pages 4830 and 4831, the concentration of Ca2+ used to inhibit Ca2+ release was, in fact, 100 µM, not 3.3 µM as reported. Consequently, although Ca2+ release was measurably reduced by Ca2+ in the experiments of Fig. 5, the data neither support nor preclude the possibility that physiological Ca2+ levels (~10 µM) can inhibit Ca2+ release.

  • Proc. Natl. Acad. Sci. USA
    Vol. 79, pp. 71-75, January 1982
    Biochemistry

    Nucleotide sequence and the encoded amino acids of human serum albumin mRNA

    (cDNA clones/codon usage/prepropeptide/triple-domain structure)

    ACHILLES DUGAICZYK, SIMON W. LAW*, AND OLIVIA E. DENNISON

    Department of Cell Biology, Baylor College of Medicine, Texas Medical Center, Houston, Texas 77030

    Communicated by James F. Bonner, September 3, 1981

    ABSTRACT The complete nucleotide sequence of human serum albumin mRNA has been determined from recombinant cDNA clones and from a primer-extended cDNA synthesis on the mRNA template. The sequence is composed of 2078 nucleotides, starting upstream from a potential ribosome binding site in the 5' untranslated region. It contains all the translated codons and extends into the poly(A) at the 3' terminus. Part of the translated sequence codes for a hydrophobic prepeptide, Met-Lys-Trp-Val-Thr-Phe-Ile-Ser-Leu-Leu-Phe-Leu-Phe-Ser-Ser-Ala-Tyr-Ser, followed by a basic propeptide, Arg-Gly-Val-Phe-Arg-Arg. These signal peptides are absent from mature normal serum albumin and, so far, have not been identified in their nascent state in humans. A remaining 1755 nucleotides of the translated mRNA sequence code for 585 amino acids, which are in agreement, with few exceptions, with the published amino acid sequence for human serum albumin. The mRNA sequence verifies and refines the repeating homology in the triple-domain structure of the serum albumin molecule.

    The gene for serum albumin, which codes for the major plasma protein, is of particular interest to study because it is regulated in development. In mammals, serum albumin is synthesized by the adult liver, and its synthesis increases from low levels early in development to a high plateau in adulthood. The embryonic liver and yolk sac, on the other hand, produce predominantly α-fetoprotein, but the synthesis decreases drastically after birth (1-3). This inverse relationship between the expression of the two genes makes it an attractive problem in developmental biology, particularly because the two genes are related. We have recently determined the complete sequence of mouse α-fetoprotein mRNA (4). The deduced α-fetoprotein structure revealed extensive homology to mammalian serum albumin, indicating that the two proteins are encoded in the same gene family. Similar conclusions have been reached by others from studies on rat (5) and mouse (6) α-fetoprotein genes.

    In the present effort we extend our studies to the human genome and report the cloning and sequence determination of DNA complementary to the human serum albumin mRNA. We deduce the unknown amino acid sequence of the signal peptide and verify a repeating homology in the triple-domain structure of the human serum albumin molecule.

    METHODS

    mRNA. Human liver mRNA was obtained by the procedure of Chirgwin et al. (7) from fetal livers removed at 17-20 weeks of gestation. Immunoprecipitation of albumin-containing polysomes was performed according to Taylor and Tse (8). In vitro translation of mRNA was carried out in a cell-free reticulocyte system, following the instructions of the supplier (New England Nuclear). The translation products were separated electrophoretically according to Laemmli (9).

    Cloning and Sequence Determination of DNA. Double-stranded cDNA has been cloned in the Pst I site of the plasmid pBR322, as described in detail previously (4, 10). DNA sequence was determined according to the procedure of Maxam and Gilbert (11).

    FIG. 1. Autoradiogram of proteins, labeled with [35S]methionine in an in vitro reticulocyte translation system, and separated electrophoretically in a sodium dodecyl sulfate/acrylamide gel (9). Lane A, no mRNA added to the translation system. Lane B, translation products of total human liver mRNA. Lane C, translation products of human liver mRNA obtained from immunoprecipitated albumin-producing polysomes. Lane D, part of the translation sample that is separated in lane B was immunoprecipitated with antibody prior to its electrophoretic separation on the gel.

    RESULTS

    Enriched mRNA. The products of in vitro translation of human liver mRNA are shown in Fig. 1. On the basis of this translation, mRNA that was enriched for serum albumin sequences was estimated to be over 50% pure (Fig. 1, lane C). This enriched mRNA was used to obtain a cDNA probe to screen the recombinant clones for serum albumin sequences.

    * Present address: National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20205.

    The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

    Recombinant Plasmids pHA36 and pHA206. A restriction endonuclease map of the largest positive clone, pHA36, is shown in Fig. 2, together with a restriction map of the primer-extended plasmid clone pHA206. The latter was obtained in a second transformation experiment after initiating the cDNA synthesis from an internal primer. This primer was a 91-base-pair-long DNA fragment, Msp I (152)-Taq I (182/3), isolated from pHA36. The two plasmids, pHA36 and pHA206, share 0.15 kilobase of homologous DNA. Together, they encode the entire sequence for human serum albumin, starting with the CTT codon for Leu at position -10 of the prepeptide and extending into the 3' untranslated region of poly(A).

    Sequence of the Albumin cDNA. The entire nucleotide sequence of the serum albumin mRNA, as determined from the cloned DNA in pHA36 and pHA206 and from the primer-extended cDNA at the 5' terminus of the message, is shown in Fig. 3. The inferred amino acid sequence is also indicated. The mRNA length is 2078 nucleotides, of which 38 represent the 5' untranslated region, 54 identify a prepeptide of 18 amino acids, 18 identify a propeptide of 6 amino acids, 1755 code for the known 585 amino acids of serum albumin, 189 make up the 3' untranslated region, and 24 are the poly(A) sequence. Nucleotides 5 to 15 (-34 to -24) in the 5' untranslated region (Fig. 3) are complementary to a 3'-terminal region of eukaryotic 18S RNA (13) and thus could represent a ribosome binding site:

    (5') ... T-T-C C-T-T-C-T-G-T .............. albumin mRNA
    (3') ... G-A-G-G-A-A-G-G-C-G-U-C-C-m62A-m6A .... 18S RNA

    The translated portion of the mRNA sequence codes for the signal peptide and the main body of the albumin polypeptide chain. Because prepeptides are removed from nascent secretory proteins (such as albumin) in the endoplasmic reticulum, and the conversion of proalbumin to albumin takes place in the Golgi vesicles, the presence and the sequence of the signal peptide for normal human serum albumin have not been reported previously.
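The length breakdown given above for the 2078-nucleotide mRNA can be verified with a few lines of arithmetic. This is a minimal sketch: the segment lengths are the figures stated in the text, while the dictionary layout and the helper name `residues` are hypothetical choices made for illustration. The text does not list the termination codon as a separate item, so none appears here.

```python
# Segment lengths in nucleotides, as stated in the text.
SEGMENTS = {
    "5' untranslated region": 38,
    "prepeptide": 54,
    "propeptide": 18,
    "mature albumin": 1755,
    "3' untranslated region": 189,
    "poly(A)": 24,
}

# Segments that are translated into amino acids.
CODING = ("prepeptide", "propeptide", "mature albumin")


def residues(segment):
    """Number of amino acids encoded by a coding segment (three nucleotides per codon)."""
    nucleotides = SEGMENTS[segment]
    if nucleotides % 3 != 0:
        raise ValueError(f"{segment}: {nucleotides} is not a whole number of codons")
    return nucleotides // 3


total_nt = sum(SEGMENTS.values())
residue_counts = {name: residues(name) for name in CODING}

print("total nucleotides:", total_nt)
for name, count in residue_counts.items():
    print(f"{name}: {SEGMENTS[name]} nt = {count} amino acids")
```

The six segments sum to the reported 2078 nucleotides, and each coding segment divides evenly into the stated number of residues (18, 6, and 585).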

    At the 3' end of the message, the putative polyadenylylation signal sequence, A-A-T-A-A-A, is located 16 nucleotides upstream from the beginning of the poly(A) sequence. Another characteristic sequence located near the polyadenylylation site has been identified by Benoist et al. (14); the consensus sequence from several mRNAs was T-T-T-T-C-A-C-T-G-C. A similar sequence, T-T-T-T-C-T-C-T-G-T, is located 19 nucleotides upstream from the A-A-T-A-A-A hexanucleotide in the human albumin mRNA molecule (Fig. 3).
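How similar the albumin sequence is to the consensus of Benoist et al. can be made concrete by counting mismatches position by position. The sketch below uses only the two decanucleotides quoted in the text; the function names `parse` and `mismatches` are hypothetical.

```python
def parse(dashed):
    """Turn the dash-separated notation used in the text into a plain string."""
    return dashed.replace("-", "")


def mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for each differing position (1-based)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


consensus = parse("T-T-T-T-C-A-C-T-G-C")  # consensus from several mRNAs (14)
albumin = parse("T-T-T-T-C-T-C-T-G-T")    # sequence found in the albumin mRNA

differences = mismatches(consensus, albumin)
matching = len(consensus) - len(differences)
print(f"{matching} of {len(consensus)} positions match; differences: {differences}")
```

With these two sequences, 8 of the 10 positions agree; the differences fall at positions 6 and 10.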

    Primary Structure of Human Serum Albumin. Amino acid sequences for human serum albumin have been published by Behrens et al. (15) and by Meloun et al. (16). (See Fig. 4.) There are a few discrepancies between the two reports, often involving sequences surrounding cysteine residues. Perhaps the most serious in terms of the structure of the albumin molecule is the sequence around the 17th and 18th cysteines, because the presence (15) or absence (16) of other amino acids between the two cysteines would affect the polypeptide loops generated by the disulfide bonds of these two cysteines. Our results in this critical region are shown by the sequencing gel in Fig. 5, which permits an identification of residues 261 to 290. The 17th and 18th cysteines are residues 278 and 279, and there are clearly no other amino acids between them. Consequently, such a sequence arrangement establishes the near-perfect homology in the triple-domain structure of the albumin molecule (Fig. 4). Our sequence results are in better agreement with those of Meloun et al. (16), although several discrepancies remain. Amino acid positions 94 (Gln), 95 (Glu), 97 (Gly), 170 (Gln), 464 (His), 465 (Glu), and 501 (Glu) are specified (16) as 94 (Glu), 95 (Gln), 97 (Glu), 170 (Glu), 464 (Glu), 465 (His), and 501 (Gln). Residues 364-370 (Fig. 3), Ala-Asp-Pro-His-Glu-Cys-Tyr, are specified (16) as His-Asp-Pro-Tyr-Glu-Cys-Ala. There are several possible explanations for these differences, but we do not think the differences are due to erroneous DNA sequence determinations.

    DISCUSSION

    Determining the complete nucleotide sequence of the cDNA has permitted us to identify the pre- and the propeptides of human serum albumin. Actually, the amino acid sequence of the propeptide has been reported for carriers of an abnormal albumin, Christchurch (18), which is longer than normal human albumin by six amino acids at the NH2 terminus. This additional hexapeptide has the sequence Arg-Gly-Val-Phe-Arg-Gln (18) and differs only in the terminal position from the sequence we are presently reporting for the apparently normal protein, Arg-Gly-Val-Phe-Arg-Arg-albumin (Fig. 3). Thus, carriers of albumin Christchurch must be carriers of a CGA to CAA mutation, which changes the codon for Arg to Gln in the last position of

    FIG. 2. Restriction endonuclease cleavage map of the overlapping human albumin cDNA inserts in the recombinant plasmids pHA36 and pHA206. The map shows the inserted albumin DNA (open bar) and its orientation in the pBR322 plasmid DNA (black bar). The mRNA sequence is presented by the upper DNA strands, with the 5' and 3' termini as indicated. Numbers on restriction sites within the human-derived DNA correspond to amino acid positions in the albumin molecule. Numbers such as 182/3 indicate that the restriction sequence overlaps two amino acids. The Taq I (270) site is actually not cleaved in our system by Taq I, due to methylation of this sequence to T-C-G-m6A. The inserted sequence in pHA206 starts with the CTT codon for Leu at position -10 of the prepeptide. Numbers on restriction sites within the pBR322 sequence are taken from available sequence data from Sutcliffe (12). One of the two Pst I sites in each recombinant plasmid, indicated by *, has not been reconstituted due to loss of a terminal A in the Pst I-linearized plasmid DNA.


    -18 p r e -10Met lys trp val thr phe ile ser leu leu phe leu phe ser

    (A)GCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACA ATG AAG TGG GTA ACC TTT ATT TCC CTT CTT m CTC m AGC (80)

    -1 -6 p r o -1 1 10 20

    ser ala tyr ser arg gly val phe arg arg asp ala his lys ser glu val ala his arg phe lys asp leu gly glu glu asn phe lysTCG GCT TAT TCC AGG GGT GTG m CGT CGA GAT GCA CAC AAG AGT GAG GTT GCT CAT CGG M AAA GAT TTG GGA GAA GAA AAT TTC AAA (170)

    21 30 34 40 50ala leu val leu ile ala phe ala gin tyr leu gin gin cys pro phe glu asp his val lys leu val am glu val thr glu phe alaGCC TTG GTG TTG ATT GCC TTT GCT CAG TAT CTT CAG CAG TGT CCA m GAA GAT CAT GTA AAA TTA GTG AAT GAA GTA ACT GAA TTT GCA (260)

    51 53 60 62 70 75 80

    lys thr cys val ala asp glu ser ala glu asn cys asp lys ser leu his thr leu phe gly asp lys leu cys thr val ala thr leuAAA ACA TGT GTT GCT GAT GAG TCA GCT GAA AAT TGT GAC AAA TCA CTT CAT ACC CTT m GGA GAC AAA TTA TGC ACA GTT GCA ACT CTT (350)

    81 90 91 100 101 110arg glu thr tyr gly glu met ala asp cys cys ala lys gin glu pro gly arg asn glu cys phe leu gin his lys asp asp asn proCGT GAA ACC TAT GGT GAA ATG GCT GAC TGC TGT GCA AAA CM GAA CCT GGG AGA MT GM TGC TTC TTG CM CAC MA GAT GAC AAC CCA (440)111 120 124 130 140

    asn leu pro arg leu val arg pro glu val asp val met cys thr ala phe his asp asn glu glu thr phe leu lys lys tyr leu tyrMC CTC CCC CGA TTG GTG AGA CCA GAG GTT GAT GTG ATG TGC ACT GCT m CAT GAC MT GM GAG ACA TTT TTG AAA MA TAC TTA TAT (530)

    141 150 160 168 169 170

    glu ile ala arg arg his pro tyr phe tyr ala pro glu leu leu phe pbe ala lys arg tyr lys ala ala phe thr glu cys cys ginGAA ATT GCC AGA AGA CAT CCT TAC TTT TAT GCC CCG GAA CTC CTT TTC m GCT MA AGG TAT MA GCT GCT m ACA GAA TGT TGC CM (620)

    171 177 180 190 200ala ala asp lys ala ala cys leu leu pro lysleu asp glu lie arg asp glu gly lys ala ser ser ala lys gin arg leu lys cysGCT GCT GAT MA GCT GCC TGC CTG TTG CCA MG CTC GAT GM CTT CGG GAT GM GGG AAG GCT TCG TCT GCC MA CAG AGA CTC AAG TGT (710)

    201 210 220 230ala ser leu gin lys pbe gly glu arg ala phe lys ala trp ala val ala arg leu ser gin arg phe pro lys ala glu phe ala gluGCC AGT CTC CAA MA TTT GGA GM AGA GCT TTC MA GCA TGG GCA GTA GCT CGC CTG AGC CAG AGA TTT CCC AAA GCT GAG TTT GCA GAA (800)

    231 240 245 246 250 253 260

    val ser lys leu val thr asp leu thr lys val his thr glu cys cys his gly asp leu leu glu cys ala asp asp arg ala asp leuGTT TCC AAG TTA GTG ACA GAT CTT ACC MA GTC CAC ACG GAA TGC TGC CAT GGA GAT CTG CTT GM TGT GCT GAT GAC AGG GCG GAC CTT (890)

    261 265 270 278 279 280 289 290

    ala lys tyr se cys glu asn gin asp ser ile ser ser lys leu lys glu cys cys glu lys pro leu leu glu lys ser his cys ileGCC AAG TAT ATC TGT GAA MT CAA GAT TCG ATC TCC AGT AAA CTG MG GM TGC TGT GAA MA CCT CTG TTG GM AAA TCC CAC TGC ATT (980)

    291 300 310 316 320ala glu val glu asn asp glu met pro ala asp leu pro ser leu ala ala asp phe val glu ser lys asp val cys lys asn tyr alaGCC GAA GTG GM MT GAT GAG ATG CCT GCT GAC TTG CCT TCA TTA GCT GCT GAT TTT GTT GAA AGT AAG GAT GTT TGC AM MC TAT GCT(1070)

    321 330 340 350glu ala lys asp val phe leu gly met phe leu tyr glu tyr ala arg arg his pro asp tyr ser val val leu leu leu arg leu alaGAG GCA AAG GAT GTC TTC TTG GGC ATG m TTG TAT GAA TAT GCA AGA AGG CAT CCT GAT TAC TCT GTC GTG CTG CTG CTG AGA CTT GCC(1160)

    351 360 361 369 370 380lys thr tyr glu thr thr leu glu lys eys cys ala ala ala asp pro his glu cys tyr ala lys val phe asp glu phe lys pro leuMG ACA TAT GM ACC ACT CTA GAG MG TGC TGT GCC GCT GCA GAT CCT CAT GAA TGC TAT GCC MA GTG TTC GAT GM U AA CCT CTT(1250)

    381 390 392 400 410val glu glu pro gin asn leu ile lys gin asn cys glu leu phe glu gin leu gly glu tyr lys phe gin asn ala leu leu val argGTG GAA GAG CCT CAG AAT TTA ATC MA CAA MT TGT GAG CTT m GAG CAG CTT GGA GAG TAC MA TTC CAG MT GCG CTG TTA GTT CGT(1340)

    411 420 430 437 438 440tyr thr lys lys val pro gin val ser thr pro thr leu val glu val ser arg asn leu gly lys val gly ser lys eys cys lys hisTAC ACC MG MA GTA CCC CAA GTG TCA ACT CCA ACT CTT GTA GAG GTC TCA AGA MC CTA GGA MA GTG GGC AGC AM TGT TGT AM CAT(1430)

    441 448 450 460 461 470pro glu ala lys arg met pro eys ala glu asp tyr leu ser val val leu asn gin leu cys val leu his glu lys thr pro val serCCT GM GCA AAA AGA ATG CCC TGT GCA GM GAC TAT CTA TCC GTG GTC CTG AAC CAG TTA TGT GTG TTG CAT GAG AAA ACG CCA GTA AGT(1520)

    471 476 477 480 490 500asp arg val thr lys cys cys thr glu ser leu val asn arg arg pro cys phe ser ala leu glu val asp glu thr tyr val pro lysGAC AGA GTC ACC AAA TGC TGC ACA GM TCC TTG GTG MC AGG CGA CCA TGC TTT TCA GCT CTG GM GTC GAT GM ACA TAC GTT CCC AAA(1610)501 510 514 520 530glu pbe asn ala glu thr phe thr phe his ala asp ile cys thr leu ser glu lys glu arg gin ile lys lys gin thr ala leu valGAG mU MT GCT GAA ACA TTC ACC TTC CAT GCA GAT ATA TGC ACA CTT TCT GAG MG GAG AGA CAA ATC MG AM CM ACT GCA CTT GTT(1700)531 540 550 558 559 560glu leu val lys his lys pro lys ala thr lys glu gin leu lys ala val met asp asp phe ala ala phe val glu lys cys cys lysGAG CTC GTG MA CAC MG CCC MG GCA ACA AM GAG CM CTG MA GCT GTT ATG GAT GAT TTC GCT GCT TTT GTA GAG MG TGC TGC MG(1790)

    561 567 570 580ala asp asp lys glu thr cys phe ala glu glu gly lys lys leu val ala ala ser gln ala ala leu gly leu ter terGCT GAC GAT MG GAG ACC TGC m GCC GAG GAG GGT AM MA CTT GTT GCT GCA AGT CAA GCT GCC TTA GGC TTA TM CATCACAUMAAMG(1883)

    ter terCATCTCAGCCTACCATGAGAATAAGAGAAAGAAAATGAAGATCAAAAGCTTATTCATCTGTTTTTCTTTTTCGTTGGTGTAAAGCCAACACCCTGTCTAAAAAACATAAATT CTTT (2oo2)

    TCATTTTGCCTCTTTTCTCTGTGCTTCMTMTAAMAATGGAAAGMTCTM ...... 20. M (2078)

    FIG. 3. Nucleotide sequence of human serum albumin mRNA as determined from cloned cDNA. The sequence 5'-upstream of the Leu at -10, which is not contained in pHA206, was determined as follows. A short DNA fragment, containing the Taq I site (Arg at -1 in the propeptide), was obtained from the 5' end of the albumin DNA in pHA206 (Fig. 2). The fragment was terminated by the Alu I sequence (Ser at -5 in the prepeptide) and the Mnl I sequence (Glu-6), and it was 32P-labeled at the 5' terminus of the latter. This fragment was annealed with human liver mRNA, and the resulting RNA-DNA template/primer was used in the reverse transcriptase reaction to synthesize a cDNA copy extended into the 5' terminus of the mRNA template. This cDNA was separated from the deoxyribonucleoside triphosphates on a Sephadex G-50 column and used directly for sequence determination. The first A residue was determined from genomic DNA. The amino acid sequence shown is deduced from the nucleotide sequence. ter, Termination.

    the propeptide. The altered protein consequently ceases to be a substrate for the specific protease that removes propeptides from secretory proteins. It is interesting to note, however, that failure to remove such a propeptide does not prevent the protein from being secreted; at least this is true about albumin Christchurch, which has reached the bloodstream of its carriers (18).

    The nucleotide sequence of the albumin mRNA has also permitted us to refine the repeated homology in the triple-domain structure of the human albumin molecule. The fact that cysteines 278 and 279 are not separated by other amino acids, as was originally thought (15), brings the structure of domain II closer to the structures of the remaining two domains (Fig. 4).

    FIG. 4. Amino acid sequence and disulfide bonding pattern of human serum albumin. Triple-domain structure of internal homology is indicated by arrows. The sequence represents our present data; the layout is according to Brown (17). Note the near-perfect symmetry of the disulfide bridges throughout the molecule.

    FIG. 5. Autoradiogram of a DNA sequencing gel showing the region coding for the 17th and 18th cysteines of human serum albumin; the two cysteines are residues 278 and 279 and there are no other amino acids between them.

    As nucleotide sequences of genes or mRNAs become available, they prompt renewed speculations about codon usage, because some insight might be gained into rates of mutation or other evolutionary forces, from any nonrandom pattern of codons used in a given gene, or a given species. For example, analyzing published nucleotide sequence data of human α- and β-globin genes, Modiano et al. (19) noticed that when several degenerate codons specify one amino acid, the codon that gives rise to a termination codon in a one-step mutation is never used. These "dispensable pretermination codons" are avoided, it was argued (19), in order to reduce the risk to mutate to termination. Such an argument implicitly postulates that evolution is anticipatory. This argument, however, receives no support from the distribution of codons utilized in the serum albumin gene.

    Table 1 summarizes the prevalence of codons utilized for each of the amino acids in human serum albumin, including the presequence and prosequence. There is no evidence for discrimination against the seven dispensable pretermination codons: UUA, UUG, UCA, UCG, CGA, AGA, and GGA.
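The set of seven dispensable pretermination codons can be derived from the standard genetic code, and their usage can then be read off Table 1. The sketch below is an illustration under stated assumptions, not the authors' procedure: a codon is treated as "pretermination" if one base substitution turns it into a stop codon, and as "dispensable" if its amino acid also has a synonymous codon that is not pretermination. The counts in `TABLE1` are transcribed from Table 1 for the four amino acids concerned, and all function and variable names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" marks termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def one_step_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:i] + base + codon[i + 1:]


def is_pretermination(codon):
    """True for a sense codon that a single substitution can turn into a stop codon."""
    return CODE[codon] != "*" and any(CODE[n] == "*" for n in one_step_neighbors(codon))


def dispensable_pretermination_codons():
    """Pretermination codons whose amino acid also has a non-pretermination synonym."""
    dispensable = []
    for codon, aa in CODE.items():
        if not is_pretermination(codon):
            continue
        synonyms = [c for c, a in CODE.items() if a == aa]
        if any(not is_pretermination(c) for c in synonyms):
            dispensable.append(codon)
    return sorted(dispensable)


# Codon counts transcribed from Table 1 (assumption: the transcription is correct).
TABLE1 = {
    "Leu": {"UUA": 10, "UUG": 13, "CUU": 19, "CUC": 7, "CUA": 3, "CUG": 12},
    "Ser": {"UCU": 3, "UCC": 7, "UCA": 6, "UCG": 3, "AGU": 6, "AGC": 3},
    "Arg": {"CGU": 3, "CGC": 1, "CGA": 3, "CGG": 2, "AGA": 13, "AGG": 5},
    "Gly": {"GGU": 3, "GGC": 3, "GGA": 6, "GGG": 2},
}

DISPENSABLE = dispensable_pretermination_codons()

# For each amino acid: (uses of dispensable pretermination codons, total uses).
usage = {}
for aa, counts in TABLE1.items():
    used = sum(n for codon, n in counts.items() if codon in DISPENSABLE)
    usage[aa] = (used, sum(counts.values()))

print(DISPENSABLE)
for aa, (used, total) in usage.items():
    print(f"{aa}: {used} of {total} codons are dispensable pretermination codons")
```

Under this definition the derived set is exactly the seven codons named in the text. With the Table 1 counts they are used 54 times in total, and AGA and GGA are the most frequently used codons for Arg and Gly, respectively, which is in line with the authors' statement that these codons are not avoided.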

    Table 1. Codon usage in human serum albumin mRNA

             U          C          A          G
    U     Phe 25     Ser  3     Tyr 13     Cys 15     U
          Phe 10     Ser  7     Tyr  6     Cys 20     C
          Leu 10     Ser  6     Ter  1     Ter  0     A
          Leu 13     Ser  3     Ter  0     Trp  2     G

    C     Leu 19     Pro 10     His 11     Arg  3     U
          Leu  7     Pro  6     His  5     Arg  1     C
          Leu  3     Pro  7     Gln 11     Arg  3     A
          Leu 12     Pro  1     Gln  9     Arg  2     G

    A     Ile  4     Thr  7     Asn 11     Ser  6     U
          Ile  4     Thr  9     Asn  6     Ser  3     C
          Ile  1     Thr 11     Lys 40     Arg 13     A
          Met  7     Thr  2     Lys 20     Arg  5     G

    G     Val 12     Ala 31     Asp 25     Gly  3     U
          Val  7     Ala 14     Asp 11     Gly  3     C
          Val  8     Ala 16     Glu 38     Gly  6     A
          Val 16     Ala  2     Glu 23     Gly  2     G

    The numbers indicate the number of times the individual codons are used in the coding region of the mRNA. Ter, termination.

    We thank Dr. Paul C. MacDonald and Dr. Evan R. Simpson from the University of Texas Southwest Medical Center, Dallas, for kindly providing human fetal liver samples. We also thank Dr. Brian J. McCarthy and Dr. Arthur D. Riggs for critical reading of the manuscript. The artwork is by David Scarff. The work was supported by the Baylor Center for Population Research and Reproductive Biology.

    1. Abelev, G. I. (1971) Adv. Cancer Res. 14, 295-358.
    2. Gitlin, D., Perricelli, A. & Gitlin, G. M. (1972) Cancer Res. 32, 979-982.
    3. Sell, S. & Gord, D. R. (1973) Immunochemistry 10, 439-442.
    4. Law, S. W. & Dugaiczyk, A. (1981) Nature (London) 291, 201-205.
    5. Jagodzinski, L. L., Sargent, T. D., Yang, M., Glackin, C. & Bonner, J. (1981) Proc. Natl. Acad. Sci. USA 78, 3521-3525.
    6. Gorin, M. B., Cooper, D. L., Eiferman, F., van de Rijn, P. & Tilghman, S. M. (1981) J. Biol. Chem. 256, 1954-1959.
    7. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. (1979) Biochemistry 18, 5294-5299.
    8. Taylor, J. M. & Tse, T. P. H. (1976) J. Biol. Chem. 251, 7461-7467.
    9. Laemmli, U. K. (1970) Nature (London) 227, 680-685.
    10. Law, S., Tamaoki, T., Kreuzaler, F. & Dugaiczyk, A. (1980) Gene 10, 53-61.
    11. Maxam, A. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
    12. Sutcliffe, J. G. (1978) Nucleic Acids Res. 5, 2721-2728.
    13. Azad, A. A. & Deacon, N. J. (1980) Nucleic Acids Res. 8, 4365-4376.
    14. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142.
    15. Behrens, P. O., Spiekerman, A. M. & Brown, J. R. (1975) Fed. Proc. Fed. Am. Soc. Exp. Biol. 34, 591 (abstr.).
    16. Meloun, B., Moravek, L. & Kostka, V. (1975) FEBS Lett. 58, 134-137.
    17. Brown, J. R. (1976) Fed. Proc. Fed. Am. Soc. Exp. Biol. 35, 2141-2144.
    18. Brennan, S. O. & Carrell, R. W. (1978) Nature (London) 274, 908-909.
    19. Modiano, G., Battistuzzi, G. & Motulsky, A. G. (1981) Proc. Natl. Acad. Sci. USA 78, 1110-1114.
