
Proc. Natl. Acad. Sci. USA Vol. 78, No. 7, pp. 4274-4278, July 1981
Biochemistry

Sequences of the ssb gene and protein
(DNA sequence/helix-destabilizing protein/tryptic peptides/secondary structure/domains)

AZIZ SANCAR*, KENNETH R. WILLIAMS†, JOHN W. CHASE‡, AND W. DEAN RUPP*†
Departments of *Therapeutic Radiology and †Molecular Biophysics and Biochemistry, Yale University School of Medicine, New Haven, Connecticut 06510; and ‡Department of Molecular Biology, Albert Einstein College of Medicine, Bronx, New York 10461

Communicated by Jerard Hurwitz, April 13, 1981

ABSTRACT We have determined the sequences of the ssb gene and protein of Escherichia coli. The coding region of ssb is 534 base pairs and is preceded and followed by dyad symmetries of 39 base pairs and 27 base pairs, respectively. The promoter for ssb is close to that for uvrA, and these two genes are transcribed in opposite directions: ssb clockwise and uvrA counterclockwise on the standard E. coli genetic map. The DNA helix-destabilizing protein encoded by the ssb gene (single-strand binding protein) contains 177 amino acids and has a calculated molecular weight of 18,873. Although there is no extensive sequence homology among the three helix-destabilizing proteins whose sequences are now known, both the E. coli and bacteriophage T4 DNA helix-destabilizing proteins do contain an acidic, α-helical region at their carboxy termini that may be functionally homologous. The remainder of the E. coli helix-destabilizing protein can be divided into two apparent domains on the basis of its amino acid sequence. The amino-terminal region (residues 1-105) contains 79% of the charged residues (27 out of 34 total) in the protein and is predicted to have a high degree of secondary structure (α helix and β pleated sheet). In contrast, the region including residues 106-165 contains only two charged amino acids and is devoid of α helix or β pleated sheet.

The Escherichia coli DNA helix-destabilizing protein (single-strand binding protein, SSB) is essential for replication of the chromosomes of E. coli and of its single-stranded DNA phages. In addition, SSB participates in recombination and repair of E. coli DNA and in the concerted response (SOS response) of this bacterium to DNA-damaging agents (see refs. 1 and 2 for reviews). Although the recent mapping (3) and cloning (4) of the ssb gene have facilitated the genetic study of ssb and large-scale purification of its protein (5), a detailed understanding of the regulation of this gene and of the interaction of SSB with DNA and with other proteins involved in DNA metabolism requires the sequences of the gene and protein.

We have reported (4) the cloning of the ssb and uvrA genes on a 9.4-kilobase pair (kb) DNA fragment. In this communication we determine the location of ssb relative to uvrA, its direction of transcription, and its sequence. We have found that the ssb start point is within about 200 base pairs of the uvrA start point and that ssb is transcribed clockwise on the E. coli chromosome. The coding region of ssb is 534 base pairs long, specifying a protein of 177 amino acids (after removal of the initiator methionine) with a calculated molecular weight of 18,873.

MATERIALS AND METHODS

Bacterial Strains and Plasmids. CSR603 (recA1, uvrA6, phr-1) was used as a host strain for various plasmids and has been described (6). Plasmid-carrying derivatives of KLC647 (recA1, uvrA6, ΔexoI) were the source of SSB. pDR2000 (uvrA+, ssb+, tet+, amp+) is a pBR322 derivative that carries the uvrA and ssb genes (4); pDR1996 (uvrA+, ssb+, tet+, amp+) was constructed by inserting a Cla I fragment of pDR2000 into pBR322. By cutting pDR1996 with Kpn I and religating, we constructed pDR1995 (uvrA-, ssb-, tet+, amp+).

Mapping and Sequence Determination of the ssb Gene. The restriction map of ssb was obtained by digesting pDR1996 and its various restriction fragments with Ava II, BstNI, Hae III, and HinfI and then analyzing the digests in 5% (wt/vol) polyacrylamide gels. DNA fragments were extracted from these gels and labeled at their 5' ends, and their sequences were determined as described (7). Maxicells were prepared and labeled as described (8), except that cycloserine (200 μg/ml) was added to the culture 1 hr after irradiation.
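As a rough in silico companion to the restriction mapping described above, the Python sketch below scans a DNA strand for the recognition sequences of the four enzymes used. The site strings are the standard recognition sequences for these enzymes; the helper names and the test sequence are hypothetical, and cut positions, star activity, and methylation sensitivity are ignored.

```python
import re

# Standard recognition sequences (5'->3'); W = A or T, N = any base.
# Each site is its own reverse complement, so scanning one strand finds every site.
RECOGNITION_SITES = {
    "AvaII": "GGWCC",
    "BstNI": "CCWGG",
    "HaeIII": "GGCC",
    "HinfI": "GANTC",
}

IUPAC_CLASSES = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def site_pattern(site):
    """Compile an IUPAC site into a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC_CLASSES[base] for base in site)
    return re.compile("(?=" + body + ")")


def find_sites(seq):
    """Return 0-based start positions of every recognition site, per enzyme."""
    seq = seq.upper()
    return {
        enzyme: [m.start() for m in site_pattern(site).finditer(seq)]
        for enzyme, site in RECOGNITION_SITES.items()
    }


# Hypothetical test strand containing one site for each enzyme.
print(find_sites("GGACCTTGGCCAAGAATCCCAGG"))
# → {'AvaII': [0], 'BstNI': [18], 'HaeIII': [7], 'HinfI': [13]}
```

Because each of the four sites is palindromic (for example GGACC and GGTCC are reverse complements of each other), a single-strand scan is sufficient here; a non-palindromic site would also need a scan of the reverse complement.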

Purification of SSB and Its Tryptic Peptides. SSB was purified from KLC647/pDR1996 by the same procedure used for CSR603/pDR2000 (8). KLC647/pDR1996 has two advantages over the CSR603/pDR2000 strain used in our previous purification. First, SSB is amplified about 20-fold in this strain, compared with 13-fold for CSR603/pDR2000; second, KLC647 is deficient in exonuclease I activity, which otherwise copurifies with SSB on several chromatographic supports (5).

Tryptic peptides were isolated from trichloroacetic acid-precipitated SSB that had been digested with trypsin (1:30, wt/wt) for 6 hr at 37°C in 0.05 M NH4HCO3. After removal of the precipitate (which proved to be the nearly homogeneous tryptic peptide containing residues 97-115), the supernatant was fractionated into six pools by Bio-Gel P4 chromatography. Each of these pools was then subjected to Aminex AG50W-X4 chromatography as described for phage T4 gene 32 protein tryptic peptides (9).

Amino Acid Analysis and Sequencing of SSB. The amino acid compositions of SSB and its tryptic peptides were determined on a Beckman 121M amino acid analyzer after hydrolysis at 110°C in 6 M HCl/0.2% phenol (16 hr for peptides; 24, 48, and 72 hr for the protein). The sequence of the first 52 amino acids in SSB was determined on an automated Beckman 890C sequencer with 180 nmol of trichloroacetic acid-precipitated SSB. The phenylthiohydantoin derivatives obtained from the sequencer were identified by gas chromatography and by amino acid analysis after back-hydrolysis in hydriodic acid (18 hr at 130°C). The carboxy-terminal amino acid of SSB was identified by carboxypeptidase A and B digestion (9). The secondary structure of SSB was predicted from the amino acid sequence with a computer program (10) based on the method of Chou and Fasman (11).

Abbreviations: kb, kilobase pair; SSB, single-strand binding protein (Escherichia coli DNA helix-destabilizing protein).


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Downloaded by guest on May 22, 2021


RESULTS

Physical Map of the ssb Gene. pDR2000 is a recombinant plasmid that carries the uvrA and ssb genes (4). In maxicells, this plasmid produces two proteins, with molecular weights of 19,500 and 114,000. Various genetic and biochemical tests demonstrated that these two proteins (SSB and UVRA) are the products of the ssb and uvrA genes, respectively (12). The location of uvrA on the pDR2000 plasmid and its direction of transcription also have been determined (12). To localize ssb relative to uvrA, we first mapped pDR2000 with the Cla I and Kpn I restriction enzymes. Then the Cla I fragment spanning from 8 to 13.7 kb of pDR2000 was cloned into pBR322 to generate pDR1996 (Fig. 1). When the proteins encoded by this plasmid were analyzed in maxicells, the plasmid was found to carry ssb as well as uvrA (Fig. 2). When the 2.2-kb Kpn I fragment spanning from 5.3 to 7.55 kb of pDR1996 was removed, the new plasmid, pDR1995, produced neither UVRA nor SSB in maxicells (Fig. 2). These results indicated that at least part of the ssb gene or its operon was located between 5.3 kb and 6.75 kb of pDR1996. Therefore, we decided to isolate the 2.2-kb Kpn I fragment, map it with four-base-pair-specific restriction enzymes, and determine the sequence of the region between 5.3 kb and the uvrA promoter at 6.75 kb. Fig. 3 shows the restriction map of this region, obtained by digesting the 2.2-kb Kpn I fragment with Ava II, BstNI, Hae III, and HinfI endonucleases. The resulting DNA fragments were isolated, and their sequences were determined. By comparing the DNA sequence with the amino-terminal sequence of SSB, we were able to localize the ssb gene between 6 kb and 6.75 kb of the pDR1996 map, as shown in Figs. 1 and 3. The uvrA and ssb promoters are close to one another, and the two genes are transcribed in opposite directions: ssb clockwise and uvrA counterclockwise on the standard E. coli genetic map (Fig. 3).

Nucleotide Sequence of the ssb Gene. The DNA sequence of ssb is shown in Fig. 4. The coding region of the gene is preceded and followed by dyad symmetries of 39 and 27 base pairs, respectively. The presumptive "RNA polymerase recognition sequence," the Pribnow box, and the Shine-Dalgarno sequence are also indicated in Fig. 4. The nucleotides in this figure were numbered starting seven base pairs downstream from the "invariable T" of the Pribnow box. The coding region of ssb contains 534 base pairs (178 codons, excluding the stop codon), but the methionine specified by the initiation codon is not present in SSB; the protein therefore contains 177 amino acids and terminates in a phenylalanine, followed by a UGA termination codon.
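The codon arithmetic above can be checked with a short Python sketch. It assumes the 534 base pairs exclude the UGA stop codon (the reading under which 178 codons minus the initiator methionine gives 177 residues) and translates the ATG initiation codon followed by the first codons of the coding region as they appear in Fig. 4 (GCC AGC AGA ...) with the standard genetic code. The function names are illustrative, and the TGA appended to the test sequence is added only to terminate translation.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame coding sequence up to (not including) the first stop."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def mature_length(coding_bp, initiator_met_removed=True):
    """Residues in the mature protein; coding_bp is assumed to exclude the stop codon."""
    codons = coding_bp // 3
    return codons - 1 if initiator_met_removed else codons


print(mature_length(534))  # → 177
# 5' end of the ssb coding region (Fig. 4), with a TGA stop appended for illustration.
print(translate("ATGGCCAGCAGAGGCGTAAACAAGTGA"))  # → MASRGVNK
```

Dropping the leading M gives ASRGVNK, that is, Ala-Ser-Arg-Gly-Val-Asn-Lys, the amino terminus of SSB as listed in Fig. 4.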

Protein Chemistry of SSB. The sequence of the first 52 amino acids in SSB was determined by automated sequence determination and is in complete agreement with the sequence shown in Fig. 4. Carboxypeptidase A and B digestion of SSB released

[Fig. 1 image: maps of the plasmids, including pDR1996 (10.6 kb) and pDR1995 (8.4 kb), with EcoRI, Cla I, Sal I, and Kpn I sites and the uvrA and ssb genes marked.]

FIG. 1. Plasmids used in this study. The promoters for uvrA and ssb are indicated by small circles. The arrows indicate the locations, lengths, and directions of transcription of uvrA and ssb. The vector DNA sequences carried by the plasmids are drawn with hatched lines.

[Fig. 2 image: autoradiogram with bands labeled UVRA, TET, and AMP.]

FIG. 2. Synthesis of SSB in maxicells. Plasmid-carrying derivatives of CSR603 (recA1, uvrA6, phr-1) were irradiated and labeled with [35S]methionine as described (8). The labeled proteins were separated on a 12% NaDodSO4/polyacrylamide gel. The gel was dried and autoradiographed for 5 days. Each lane contained approximately 20,000 cpm of acid-insoluble 35S-labeled protein. Purified uvrA and ssb proteins (UVRA and SSB) were used as molecular weight markers. Left, pDR1996 (ssb+); Right, pDR1995 (ssb-). TET and AMP indicate the products of the tet and amp genes of the vector, respectively.

0.82 mol of phenylalanine per mol of SSB and no other amino acids, again in agreement with the DNA sequence. To confirm the rest of the proposed amino acid sequence of SSB, we compared the amino acid composition of intact SSB and of its tryptic peptides with that predicted from the DNA sequence. As shown in Table 1, there is good agreement between the measured SSB composition, as determined in this study and as reported previously, and that calculated from the sequence given in Fig. 4. A more stringent confirmation of the sequence was obtained by isolating 12 of the 15 predicted tryptic peptides and determining their compositions. As shown in Table 2, there is complete agreement between the predicted and experimental compositions of these peptides.
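The "predicted tryptic peptides" can be enumerated in silico. The Python sketch below applies the common textbook rule that trypsin cleaves after Lys or Arg unless the next residue is Pro, using the SSB sequence transcribed from the Fig. 4 translation. It models complete digestion only, so missed-cleavage products such as the 87-96 and 116-177 peptides in Table 2 are not generated, and the number of fragments it returns depends on such conventions rather than reproducing the total of 15 given in the text. Note also that the rule predicts no cleavage at Arg-154, which is followed by Pro-155, yet Table 2 lists peptides 116-154 and 155-177; treat the rule as approximate. The function name is illustrative.

```python
# SSB residues 1-177 (initiator Met removed), one-letter code, transcribed from Fig. 4.
SSB = (
    "ASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESW"
    "RDKATGEMKEQTEWHRVVLFGKLAEVASEYLRKGSQVYIE"
    "GQLRTRKWTDQSGQDRYTTEVVVNVGGTMQMLGGRQGGGA"
    "PAGGNIGGGQPQSGWGQPQQPQGGNQFSGGAQSRPQQSAP"
    "AAPSNEPPMDFDDDIPF"
)


def tryptic_digest(protein):
    """Complete digest: cut after K or R unless the next residue is P.
    Returns (start, end, peptide) tuples with 1-based, inclusive numbering."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start + 1, i + 1, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):  # C-terminal peptide without a trailing K/R
        peptides.append((start + 1, len(protein), protein[start:]))
    return peptides


for first, last, peptide in tryptic_digest(SSB):
    print(f"{first}-{last}\t{peptide}")
```

The first fragments it lists, 1-3, 4-7, and 8-21, line up with the amino-terminal peptides in Table 2, and the 97-115 fragment matches the insoluble peptide described under Materials and Methods.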

Based on its amino acid sequence, SSB can be divided into three apparent domains. The first 60% of SSB (residues 1-105) contains most of the charged amino acids (27 of the 34 in SSB) and is predicted to be highly ordered, with most of its residues in either α helices or β pleated sheets (Fig. 5). In sharp contrast, the region including residues 106-165 contains only two charged amino acids and no predicted α helix or β pleated sheet. This unusually long coil region probably does have some secondary structure and, in fact, it is predicted to contain most of the β-structure turns (9 out of 14) in SSB. The carboxy-terminal region of SSB (residues 166-177) is interesting because of its possible homology to the carboxy terminus of the


[Fig. 3 image: restriction map of the ssb-uvrA region of pDR1996, marking the ssb and uvrA start codons (ATG) and the sequencing strategy.]

FIG. 3. Restriction map of ssb and sequencing strategy. The map coordinates are those of pDR1996. The translational start points for uvrA and ssb are indicated by ATG, and the presumptive terminator of ssb by an arrowhead on the line. The location of the uvrA promoter was determined in other studies (ref. 12; unpublished data). The arrows below the map indicate the direction of sequencing and the length of sequence determined in individual sequencing experiments. The region whose sequence is given in Fig. 4 is boxed.

bacteriophage T4 gene 32 protein (9). Both of these helix-destabilizing proteins contain a negatively charged, α-helical region within the last 12 residues at their carboxy terminus. Overall, SSB is predicted to contain 22% α helix and 19% β pleated sheet, which is very close to the experimental values of 20% α helix and 20% β pleated sheet previously determined by circular dichroism (13).
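The charge counts quoted for the three regions can be recomputed from the sequence. The Python sketch below counts Asp and Glu as acidic and Lys and Arg as basic, leaving His uncharged (which reproduces the text's total of 34 charged residues), over the sequence transcribed from Fig. 4. The region boundaries are those used in the text; the helper name is illustrative.

```python
# SSB residues 1-177 (initiator Met removed), one-letter code, transcribed from Fig. 4.
SSB = (
    "ASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESW"
    "RDKATGEMKEQTEWHRVVLFGKLAEVASEYLRKGSQVYIE"
    "GQLRTRKWTDQSGQDRYTTEVVVNVGGTMQMLGGRQGGGA"
    "PAGGNIGGGQPQSGWGQPQQPQGGNQFSGGAQSRPQQSAP"
    "AAPSNEPPMDFDDDIPF"
)

ACIDIC, BASIC = "DE", "KR"  # His treated as uncharged


def charge_profile(seq, first, last):
    """Acidic, basic, total charged, and net charge for residues first..last (1-based, inclusive)."""
    segment = seq[first - 1:last]
    acidic = sum(residue in ACIDIC for residue in segment)
    basic = sum(residue in BASIC for residue in segment)
    return {"acidic": acidic, "basic": basic,
            "charged": acidic + basic, "net": basic - acidic}


REGIONS = {"1-105": (1, 105), "106-165": (106, 165), "166-177": (166, 177)}
for name, (first, last) in REGIONS.items():
    print(name, charge_profile(SSB, first, last))

whole = charge_profile(SSB, 1, len(SSB))
share = charge_profile(SSB, 1, 105)["charged"] / whole["charged"]
print(f"residues 1-105 hold {share:.0%} of the {whole['charged']} charged residues")
# → residues 1-105 hold 79% of the 34 charged residues
```

With these definitions, residues 1-105 carry 27 charged residues (14 of the 16 basic ones), residues 106-165 carry 2, and the last 12 residues carry 5 acidic residues and no basic ones, a net charge of -5.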

DISCUSSION

We determined the sequence of the ssb gene, predicted the protein sequence of SSB from the DNA sequence, and then confirmed most of the protein sequence either by direct sequence determination or by analysis of tryptic peptides. The resulting DNA and protein sequences reveal several interesting points.

[Fig. 4 sequence panel. The nucleotide sequence (numbered from about -50 to 670, using the numbering described in the text) is not recoverable from this extraction. The deduced amino acid sequence of SSB printed beneath it reads, in one-letter code:
1   ASRGVNKVIL VGNLGQDPEV RYMPNGGAVA NITLATSESW
41  RDKATGEMKE QTEWHRVVLF GKLAEVASEY LRKGSQVYIE
81  GQLRTRKWTD QSGQDRYTTE VVVNVGGTMQ MLGGRQGGGA
121 PAGGNIGGGQ PQSGWGQPQQ PQGGNQFSGG AQSRPQQSAP
161 AAPSNEPPMD FDDDIPF (stop)]

FIG. 4. Sequences of the ssb gene and protein. The dyad symmetries 5' and 3' to the coding region are underlined, whereas the presumptive "-35 sequence," Pribnow sequence, Shine-Dalgarno ribosome binding sequence, and the initiation and termination codons are enclosed in boxes.** In a preliminary account of this work (1), glutamine-82 was erroneously identified as glutamic acid and, as a result of a typographical error, a valine residue was inserted between threonine-85 and arginine-86.


Table 1. Amino acid composition of SSB

                             Amino acid analyses*
Amino acid   DNA sequence   This study   Ref. 13   Ref. 14
Cys               0            0.3†        0.9        0
Asx              16           16.3        16.2       16
Thr               9            8.4        13.3       11
Ser              11            9.6        11.8       13
Glx              28           27.2        26.0       30
Pro              12           11.5        10.0        9
Gly              28           27.9        24.5       34
Ala              13           13.3        13.9       17
Val              13           12.6        12.2        8
Met               5            4.1         4.9        9
Ile               5            4.8         5.3        3
Leu               8            8.5         9.0        6
Tyr               4            4.4         4.4        4
Phe               4            4.3         3.9        4
His               1            1.3         2.6        1
Lys               6            6.2         6.1        8
Arg              10            9.5         8.1        9
Trp               4            4.6‡        2.6        3

* Data are numbers of amino acids based on a molecular weight of 18,873 for SSB.
† Determined as carboxymethylcysteine.
‡ Estimated spectrophotometrically.
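The agreement in Table 1 can be quantified directly. The Python sketch below takes the "DNA sequence" and "This study" columns, checks that the sequence-derived counts sum to 177 residues, and reports the residue with the largest deviation. It also illustrates the normalization stated in the table footnote (residues per molecule for a molecular weight of 18,873); the function name and the example amounts passed to it are hypothetical.

```python
# Table 1 columns: (DNA sequence, this study), residues per SSB molecule.
TABLE_1 = {
    "Cys": (0, 0.3), "Asx": (16, 16.3), "Thr": (9, 8.4), "Ser": (11, 9.6),
    "Glx": (28, 27.2), "Pro": (12, 11.5), "Gly": (28, 27.9), "Ala": (13, 13.3),
    "Val": (13, 12.6), "Met": (5, 4.1), "Ile": (5, 4.8), "Leu": (8, 8.5),
    "Tyr": (4, 4.4), "Phe": (4, 4.3), "His": (1, 1.3), "Lys": (6, 6.2),
    "Arg": (10, 9.5), "Trp": (4, 4.6),
}

MW_SSB = 18_873  # calculated molecular weight used in the table footnote


def residues_per_molecule(nmol_amino_acid, micrograms_protein, mw=MW_SSB):
    """Convert an analyzer reading (nmol) to residues per molecule of protein."""
    nmol_protein = micrograms_protein * 1000 / mw  # ug -> ng; ng / (g/mol) = nmol
    return nmol_amino_acid / nmol_protein


total_from_sequence = sum(dna for dna, _ in TABLE_1.values())
deviation = {aa: round(found - dna, 2) for aa, (dna, found) in TABLE_1.items()}
worst = max(deviation, key=lambda aa: abs(deviation[aa]))

print(total_from_sequence)        # → 177
print(worst, deviation[worst])    # → Ser -1.4
print(round(residues_per_molecule(56.0, 37.746), 1))  # hypothetical input → 28.0
```

Every measured value in this study lies within 1.4 residues of the sequence-derived count, and the largest shortfall (Ser) is the kind of loss commonly seen for serine during acid hydrolysis.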

SSB is a relatively abundant protein, with about 1200 monomers per cell (14). This high level could be due to efficient transcription of the gene, to efficient translation of its mRNA, or to both. Although it is not possible at present to tell what leads to efficient transcription, actively transcribed genes have been found to have at least one of the following features: (i) close homology to the canonical "-35 sequence," T-T-G-A-C-A, and the Pribnow box, T-A-T-A-A-T (15); and (ii) a high A+T/G+C ratio in the promoter region (16) and tandem promoters (17).
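The consensus comparison described above amounts to counting matches against two hexamers. The Python sketch below scores a candidate -35 region against T-T-G-A-C-A and a candidate Pribnow box against T-A-T-A-A-T, and computes an A+T/G+C ratio for a region. The example sequences are hypothetical, not the ssb promoter, and the helper names are illustrative.

```python
CONSENSUS_MINUS_35 = "TTGACA"
CONSENSUS_PRIBNOW = "TATAAT"


def consensus_matches(candidate, consensus):
    """Count positions at which a candidate hexamer agrees with the consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must have equal length")
    return sum(a == b for a, b in zip(candidate.upper(), consensus))


def at_gc_ratio(region):
    """(A+T)/(G+C) for a promoter region."""
    region = region.upper()
    at = sum(base in "AT" for base in region)
    gc = sum(base in "GC" for base in region)
    if gc == 0:
        raise ValueError("region contains no G or C")
    return at / gc


print(consensus_matches("TTGCCA", CONSENSUS_MINUS_35))  # → 5 (of 6 positions)
print(consensus_matches("TAAAAT", CONSENSUS_PRIBNOW))   # → 5 (of 6 positions)
print(at_gc_ratio("AATTGC"))                            # → 2.0
```

Position-by-position identity is the simplest possible score; it treats every position of the hexamer as equally important, which is an assumption of this sketch rather than a claim about how promoter strength is determined.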


An inspection of the DNA sequence (Fig. 4) shows that ssb does not have a strong promoter by these criteria. However, the ssb promoter does contain a 39-base-pair palindrome, which may contribute to active transcription. Alternatively, this dyad symmetry may be a recognition site for a positive or a negative regulator that affects the transcription of ssb. However, there is as yet no in vivo evidence suggesting that SSB is an inducible protein. The A-G-G-A-G sequence situated eight base pairs before the initiation codon is a good match to an ideal ribosome binding site; therefore, we expect ssb mRNA to be translated efficiently.

The 3' end of ssb does not contain a conventional terminator structure: a dyad symmetry that precedes or overlaps a G/C stretch of at least three nucleotides, which in turn precedes a T stretch of at least four bases, with termination occurring 20 ± 4 base pairs from the center of symmetry (15). Although there is a 27-base-pair palindrome immediately after the termination codon of ssb, there is no G/C stretch followed by multiple Ts in the 110 nucleotides beyond the coding region. Similarly, ssb does not show any sequence homology with the terminators of ρ-dependent genes. The significance of these observations will become apparent only when the actual point of transcription termination is determined in vitro.
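The sequence features discussed in the last two paragraphs can be screened for mechanically. The Python sketch below uses simplified rules that are assumptions, not the full criteria of ref. 15: it scores dyad symmetry by folding a region back on itself, looks for a G/C run of at least three bases immediately followed by at least four Ts, and measures the spacing between an A-G-G-A-G sequence and the next downstream ATG, counted here (an assumed convention) as the bases between the last G of A-G-G-A-G and the A of ATG. The helper names and example sequences are hypothetical.

```python
import re

PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def dyad_pairs(region):
    """Watson-Crick pairs formed when the region is folded about its centre
    (the middle base of an odd-length region stays unpaired)."""
    region = region.upper()
    n = len(region)
    return sum((region[i], region[n - 1 - i]) in PAIRS for i in range(n // 2))


def gc_then_t_runs(seq):
    """Start positions of >=3 G/C immediately followed by >=4 T (simplified rule)."""
    return [m.start() for m in re.finditer(r"[GC]{3,}T{4,}", seq.upper())]


def sd_to_start_spacing(seq, sd="AGGAG", start_codon="ATG"):
    """Bases between the end of the first SD match and the next downstream
    start codon; None if either is absent."""
    seq = seq.upper()
    sd_pos = seq.find(sd)
    if sd_pos < 0:
        return None
    sd_end = sd_pos + len(sd)
    start_pos = seq.find(start_codon, sd_end)
    return None if start_pos < 0 else start_pos - sd_end


print(dyad_pairs("GCGCAAAAGCGC"))                    # → 4
print(gc_then_t_runs("AAGCCCTTTTTA"))                # → [2]
print(sd_to_start_spacing("TTAGGAGCCTTAAACATGGCC"))  # → 8
```

A perfect inverted repeat of length n scores n // 2 pairs, so comparing the score with that maximum gives a crude measure of how symmetric a candidate region is; loops, mismatches tolerated in real stems, and folding energies are not modeled.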

The protein sequence given in Fig. 4 is in excellent agreement with our estimate of the molecular weight (19,500) and amino acid composition of SSB (Table 1). The proposed sequence has, in addition, been confirmed by direct sequence determination of the first 52 amino acids in SSB and by characterizing SSB tryptic peptides. The protein sequence and predicted secondary structure of SSB reveal three apparent domains. The first 60% of the molecule (residues 1-105) has a high degree of secondary structure and contains 14 of the 16 basic amino acids in SSB, which suggests that this domain may be important for DNA binding through ionic interactions. Preliminary results on a partial proteolysis product of SSB, referred to as SSB*-A and lacking an Mr 7000 fragment at the carboxy terminus,

Table 2. Amino acid compositions* of tryptic peptides from SSB

                                                   Residue numbers
Amino acid  4-7       8-21      42-43     44-49     50-56     57-62     63-72     73-84     87-96†    97-115    116-154‡  155-177‡
Asx         1.0(1)    2.2(2)    1.0(1)    -         -         -         -         -         2.0(2)    1.3(1)    2.1(2)    5.1(5)
Thr         -         -         -         1.2(1)    1.0(1)    -         -         -         1.0(1)    3.7(3)    -         -
Ser         -         -         -         -         -         -         0.8(1)    0.9(1)    0.9(1)    -         1.8(3)    1.5(2)
Glx         -         1.9(2)    -         0.8(1)    3.1(3)    -         1.9(2)    3.3(3)    1.6(2)    2.2(2)    9.2(9)    3.2(3)
Pro         -         0.7(1)    -         -         -         -         -         -         -         -         3.5(4)    6.2(6)
Gly         1.1(1)    2.7(2)    -         1.1(1)    -         1.1(1)    -         2.2(2)    1.3(1)    4.6(4)    14.7(14)  -
Ala         -         -         -         0.9(1)    -         -         1.9(2)    -         -         -         3.1(3)    2.9(3)
Val         1.0(1)    2.7(3)    -         -         -         1.1(2)    1.2(1)    1.0(1)    -         2.3(4)    -         -
Met         -         -         -         1.0(1)    -         -         -         -         -         1.5(2)    -         1.1(1)
Ile         -         0.5(1)    -         -         -         -         -         1.0(1)    -         -         1.0(1)    1.0(1)
Leu         -         2.1(2)    -         -         -         1.0(1)    2.1(2)    1.0(1)    -         0.9(1)    -         -
Tyr         -         -         -         -         -         -         1.0(1)    1.0(1)    -         0.8(1)    -         -
Phe         -         -         -         -         -         0.9(1)    -         -         -         -         1.0(1)    2.1(2)
His         -         -         -         -         1.0(1)    -         -         -         -         -         -         -
Lys         1.0(1)    -         1.0(1)    0.9(1)    -         1.0(1)    -         1.0(1)    0.9(1)    -         -         -
Arg         -         1.0(1)    -         -         1.0(1)    -         1.1(1)    1.0(1)    1.1(1)    1.0(1)    1.0(1)    -
Trp         -         -         -         -         ND(1)     -         -         -         ND(1)     -         ND(1)     -
Yield, %§   48        19        36        49        16        29        52        23        6         100       9         37

ND, not determined.
* Numbers in parentheses are calculated from the sequence in Fig. 4.
† The tryptic peptide extending from residue 88 to 96 was isolated in 10% yield.
‡ The overlapping tryptic peptide containing residues 116-177 was isolated in 7% yield. Digestion of either the peptide containing residues 116-177 or that containing residues 155-177 with carboxypeptidase A and B revealed phenylalanine as the carboxy-terminal residue.
§ Yield of purified tryptic peptide from 1.5 μmol of trichloroacetic acid-precipitated SSB.


[Fig. 5 image: predicted secondary-structure diagram of SSB (residues 1-177), with residue numbers at conformational boundaries and charged residues marked + or -.]

FIG. 5. Secondary structures of SSB. Helical, β-pleated-sheet, and random-coil residues are drawn with different symbols; β-structure turns are indicated by chain reversals. The + and - signs refer to charged residues, and the numbers refer to residues at conformational boundaries.

suggests this interpretation is correct. In that study (18), SSB*-A was found to bind single-stranded DNA about 3 times more tightly than intact SSB does.

The next 60 amino acids in SSB (residues 106-165) are unusual in that this region contains only two charged amino acids and, except for a large number of β-structure bends, is devoid of any other predicted secondary structure. Finally, the net charge (-5) and secondary structure (primarily α helix) of the last 12 amino acids in SSB (residues 166-177) are similar to those of the analogous region in the bacteriophage T4 gene 32 protein (9) and suggest that the carboxy termini of both proteins may be functionally homologous. The helix-destabilizing "activity" of the phage T4 single-stranded DNA binding protein (gene 32 protein) appears to be partially repressed by its carboxy terminus (19). This control may arise through direct interactions between the negatively charged carboxy terminus and positively charged amino acids at the DNA binding site (20, 21). This inhibition of helix-destabilizing activity appears to be relieved by proteolytic removal of the carboxy terminus or by protein-protein interactions between the gene 32 protein and other proteins in the phage T4 DNA replication complex (22). Based on the available data, it is tempting to speculate that the carboxy-terminal region (residues 166-177) of SSB has a similar function. Even though SSB and the gene 32 protein may have at least one functional domain in common and both are involved in DNA replication, repair, and recombination, the lack of any extensive sequence homology suggests that these proteins evolved independently.

We thank Dr. William Konigsberg for his advice and encouragement throughout this study. We are especially grateful to Mary LoPresti for her expert technical assistance in isolating the SSB tryptic peptides. We also wish to thank Gary David for running the amino acid sequencer and gas chromatograph. This investigation was supported in part by National Institutes of Health Grants GM23451, GM11301, and CA06519 and by a Swebilius Foundation Grant (to K.W.). J.W.C. is an Established Investigator of the American Heart Association (Grant 78-129).

1. Williams, K. R. & Konigsberg, W. H. (1981) in Gene Amplification and Analysis, eds. Chirikjian, J. G. & Paps, F. W. (Elsevier, Amsterdam), Vol. 2, in press.
2. Coleman, J. & Oakley, J. (1979) Crit. Rev. Biochem. 7, 247-289.
3. Glassberg, J., Meyer, R. & Kornberg, A. (1979) J. Bacteriol. 140, 14-19.
4. Sancar, A. & Rupp, W. D. (1979) Biochem. Biophys. Res. Commun. 90, 123-129.
5. Chase, J., Whittier, R., Auerbach, J., Sancar, A. & Rupp, W. D. (1980) Nucleic Acids Res. 8, 3215-3227.
6. Sancar, A. & Rupert, C. S. (1978) Nature (London) 272, 471-472.
7. Maxam, A. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
8. Sancar, A., Hack, A. & Rupp, W. D. (1979) J. Bacteriol. 137, 692-693.
9. Williams, K. R., LoPresti, M. & Setoguchi, M. (1981) J. Biol. Chem. 256, 1754-1762.
10. Cohen, F. (1979) Dissertation (Yale Univ., New Haven, CT).
11. Chou, P. & Fasman, G. (1978) Adv. Enzymol. 47, 45-148.
12. Sancar, A., Wharton, R. P., Seltzer, S., Kacinski, B. M., Clarke, N. D. & Rupp, W. D. (1981) J. Mol. Biol. 148, 45-62.
13. Anderson, R. & Coleman, J. (1975) Biochemistry 14, 5485-5491.
14. Weiner, J., Bertsch, L. & Kornberg, A. (1975) J. Biol. Chem. 250, 1972-1980.
15. Rosenberg, M. & Court, D. (1980) Annu. Rev. Genet. 13, 319-353.
16. Nakamura, K. & Inouye, M. (1979) Cell 18, 1109-1117.
17. Young, R. A. & Steitz, J. A. (1979) Cell 17, 225-234.
18. Williams, K. R., Guggenheimer, R., Chase, J. & Konigsberg, W. H. (1981) Fed. Proc. Fed. Am. Soc. Exp. Biol., in press.
19. Moise, H. & Hosoda, J. (1976) Nature (London) 254, 455-458.
20. Kowalczykowski, S. C., Lonberg, N., Newport, J. W. & von Hippel, P. H. (1981) J. Mol. Biol. 145, 75-104.
21. Lonberg, N., Kowalczykowski, S. C., Paul, L. S. & von Hippel, P. H. (1981) J. Mol. Biol. 145, 123-138.
22. Burke, R., Alberts, B. & Hosoda, J. (1981) J. Biol. Chem. 255, 11484-11493.
