
Vol. 175, No. 5

Cloning and DNA Sequence of the Gene Coding for Clostridium thermocellum Cellulase Ss (CelS), a Major Cellulosome Component

WILLIAM K. WANG, KRISTIINA KRUUS, AND J. H. DAVID WU*

Department of Chemical Engineering, University of Rochester, Rochester, New York 14627-0166

Received 17 August 1992/Accepted 8 December 1992

Clostridium thermocellum ATCC 27405 produces an extracellular cellulase system capable of hydrolyzing crystalline cellulose. The enzyme system involves a multicomponent protein aggregate (the cellulosome) with a total molecular weight in the millions, impeding mechanistic studies. However, two major components of the aggregate, Ss (Mr = 82,000) and SL (Mr = 250,000), which act synergistically to hydrolyze crystalline cellulose, have been identified (J. H. D. Wu, W. H. Orme-Johnson, and A. L. Demain, Biochemistry 27:1703-1709, 1988). To further study this synergism, we cloned and sequenced the gene (celS) coding for the Ss (CelS) protein by using a degenerate, inosine-containing oligonucleotide probe whose sequence was derived from the N-terminal amino acid sequence of the CelS protein. The open reading frame of celS consisted of 2,241 bp encoding 741 amino acid residues. It encoded the N-terminal amino acid sequence and two internal peptide sequences determined for the native CelS protein. A putative ribosome binding site was identified at the 5' end of the gene. A putative signal peptide of 27 amino acid residues was adjacent to the N terminus of the CelS protein. The predicted molecular weight of the secreted protein was 80,670. The celS gene contained a conserved reiterated sequence encoding 24 amino acid residues found in proteins encoded by many other clostridial cel or xyn genes. A palindromic structure was found downstream from the open reading frame. The celS gene is unique among the known cel genes of C. thermocellum. However, it is highly homologous to the partial open reading frames found in C. cellulolyticum and in Caldocellum saccharolyticum, indicating that these genes belong to a new family of cel genes.

Clostridium thermocellum, an anaerobic thermophile, produces a thermostable and highly active cellulase system (23, 40) capable of completely degrading crystalline cellulose (23). The enzyme system is a complex aggregate (cellulosome) of at least 14 subunits with a total molecular weight in the millions (1, 9, 26, 28, 36, 59). Its complicated structure has impeded the study of its reaction mechanism.

At least 15 endoglucanase genes have been cloned into Escherichia coli (6, 38, 45). The genes celA (5), celB (17), celC (51), celD (24), and celE (19), which code for endoglucanases A, B, C, D, and E, respectively, have been sequenced, and the gene products have been studied. The celH gene, coding for endoglucanase H, has been sequenced, and the gene product has been characterized by deletion analysis (60). The endoglucanase F gene, celF, has been sequenced, but its expression has not been reported (39). Other well-characterized genes are xynZ (18), bglA (16), and bglB (15), which encode xylanase Z, β-glucosidase A, and β-glucosidase B, respectively. Although the C. thermocellum cellulase system has a high specific activity for crystalline cellulose (21, 23), none of the genes identified so far appears to code for a protein capable of degrading crystalline cellulose. The genes essential for hydrolyzing crystalline cellulose therefore remain to be identified.

The first insight into the mechanism and organization of the cellulosome was obtained when two major cellulosome components, designated Ss and SL, were identified (59). Experiments in partially dissociating the cellulosome and reconstituting the cellulase activity showed that these two components act cooperatively to degrade crystalline cellulose (59). Furthermore, the SL component appears to serve as an anchor on the cellulose surface for the Ss component. An anchor-enzyme model was proposed to explain this synergism (58).

* Corresponding author.

Further characterization of Ss and SL, of the interactions between them, and of the interactions between the proteins and the substrate may help to elucidate the enzyme mechanism and to explain its high specific activity. Because these components are part of a complicated aggregate and are difficult to separate without denaturation, molecular cloning appears to be the only feasible approach for preparing them in large quantities. Furthermore, the DNA sequence obtained may shed more light on the nature of the interaction between these two components. Finally, expressing the genes coding for the Ss and SL proteins in an otherwise non-cellulase-producing host will allow the gene products to be characterized without interference from other cellulase subunits.

This report describes the cloning of the gene coding for the Ss subunit (now denoted CelS) into E. coli and the nucleotide sequence and sequence analysis of the respective DNA region.

MATERIALS AND METHODS

Bacterial strains and vectors. C. thermocellum ATCC 27405 was used as a source for the CelS protein and for genomic DNA. The E. coli strains used in this work were XL-1 Blue (Stratagene, La Jolla, Calif.) and DH10B. E. coli XL-1 Blue {recA1 endA1 gyrA96 thi hsdR17 supE44 relA1 lac (F' proAB lacIqZΔM15 Tn10 [Tetr])} served as the cloning host for plasmid pBluescript SK(-) and bacteriophage lambda ZAPII (Stratagene). E. coli DH10B [F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 ΔlacX74 endA1 recA1 deoR Δ(ara-leu)7697 araD139 galU galK nupG rpsL] was used as a host for plasmid pBR322.

JOURNAL OF BACTERIOLOGY, Mar. 1993, p. 1293-1302. 0021-9193/93/051293-10$02.00/0. Copyright © 1993, American Society for Microbiology

Growth conditions and culture maintenance. C. thermocellum was grown in Hungate tubes or anaerobic flasks with MJ medium (22) modified by increasing the amount of FeSO4 eightfold and the amount of vitamins twofold. The carbon source was cellobiose for seed cultures or cotton for cellulase production. Cultures were incubated at 60°C until the late exponential phase (optical density at 660 nm = 0.5 to 0.8) in cellobiose medium or until most of the cotton was consumed (approximately 48 h).

E. coli was grown at 37°C on a rotary shaker or on agar plates with Luria-Bertani medium (32) supplemented with maltose (0.2%), MgSO4 (10 mM), isopropylthiogalactoside (IPTG) (1 mM), 5-bromo-4-chloro-3-indolyl-β-galactoside (50 μg/ml), tetracycline (12.5 μg/ml), or ampicillin (50 μg/ml) as required. E. coli strains were stored in glycerol solution (3) at -70°C.

Amino acid residue number: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Native CelS amino acid sequence: N-THR (GLY)-PRO-THR-LYS-ALA-PRO-THR-LYS-ASP-GLY-
Deduced DNA sequence: 5'-ACN-CCN-ACN-AAR-GCN-CCN-ACN-AAR-GAY-GG-3'
Probe sequence: 5'-ACI-CCI-ACI-AAR-GCI-CCI-ACI-AAR-GAT-GG-3'
Cloned celS DNA sequence: 5'-GGT-CCT-ACA-AAG-GCA-CCT-ACA-AAA-GAT-GGG-3'
Deduced amino acid sequence: N-GLY-PRO-THR-LYS-ALA-PRO-THR-LYS-ASP-GLY-
(In the DNA sequences, N = A, C, G, or T; R = A or G; Y = C or T.)

FIG. 1. N-terminal amino acid sequence of the native CelS protein, the possible DNA sequences coding for the CelS N terminus, the oligonucleotide probe sequence, the corresponding DNA sequence of the cloned celS gene, and the deduced amino acid sequence for the cloned celS gene. Of two possible first amino acid residues at the N terminus, threonine was selected, although the cloned DNA sequence later revealed that glycine is the correct residue. I, inosine residue in the probe sequence.

Preparation of the crude cellulase concentrate. The C. thermocellum culture broth from the cotton medium was filtered through layers of cheesecloth to remove the residual cotton and centrifuged to remove the cell pellet. All steps were carried out at 4°C. The supernatant and cold acetone were mixed in a 1:2 ratio to precipitate the proteins. The supernatant was removed by centrifugation. The pellet was resuspended in a small amount of 10 mM Tris-HCl (pH 7.1). After the remaining salts were removed by centrifugation, the supernatant was dialyzed against 10 mM Tris-HCl (pH 7.1). The dialyzed enzyme solution was stored at -20°C.

Electrophoresis and electroblotting. The concentrated enzyme preparation was subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) by the method of Laemmli (27). Following electrophoresis, the protein on the gel was transferred to a polyvinylidene difluoride (PVDF) membrane by the method of Matsudaira (34). The membrane was stained with Coomassie brilliant blue R-250 for protein band visualization.

N-terminal amino acid sequence of the CelS protein. The CelS band on the PVDF membrane was excised with a razor blade. The N-terminal amino acid sequence of the CelS protein was determined from the excised membrane pieces by automated Edman sequencing on a model 475A gas-phase sequencer (Applied Biosystems, Foster City, Calif.) (29, 34).

Internal peptide sequences of the CelS protein. The CelS protein was eluted from an SDS gel that had been briefly stained in a 0.25 M KCl solution to reveal the protein bands. The crushed gel containing CelS was soaked in an elution buffer (10 mM Tris-HCl [pH 8.0], 1 mM dithiothreitol [DTT]) for 5 h. The gel material was removed by filtration through a 0.45-μm-pore-size low-protein-binding filter (Gelman Sciences, Ann Arbor, Mich.). Cold acetone (2.2 volumes) was used to precipitate the protein. The protein precipitate was resuspended in a 7 M urea-50 mM Tris-HCl (pH 8.0)-5 mM DTT solution.

Prior to protease digestion, the protein was denatured by incubating the mixture at 60°C for 1 h. The urea was then diluted to below 2 M by the addition of digestion buffer (50 mM sodium phosphate [pH 7.8], 1 mM DTT). Digestion with endoproteinase Glu-C (Promega, Madison, Wis.) was performed as recommended by the supplier. After digestion, the peptides were separated on a 16.5% SDS-polyacrylamide gel by the method of Schägger and von Jagow (50). The gel was electroblotted to a PVDF membrane, and individual peptide bands were excised for N-terminal amino acid sequencing as described above.

DNA isolation and manipulation. C. thermocellum genomic DNA was purified by CsCl gradient ultracentrifugation (37). Small-scale plasmid preparations from E. coli were made by the method of Holmes and Quigley (20). Plasmids for DNA sequencing reactions were isolated by the method of Reddy et al. (42), followed by CsCl gradient ultracentrifugation (41). DNA manipulations were carried out by standard procedures (3, 48).
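The protocol does not state the volumes used when the urea was diluted before Glu-C digestion. As a worked example only, the sketch below applies the dilution relation C1V1 = C2V2 to a hypothetical 1-volume aliquot of the 7 M urea resuspension, assuming the digestion buffer itself contains no urea; the function name `extra_buffer_volume` is illustrative.

```python
def extra_buffer_volume(sample_volume: float, start_molar: float, target_molar: float) -> float:
    """Volume of urea-free buffer that dilutes start_molar to exactly target_molar (C1*V1 = C2*V2)."""
    if not 0 < target_molar < start_molar:
        raise ValueError("target must be positive and below the starting concentration")
    final_volume = sample_volume * start_molar / target_molar
    return final_volume - sample_volume


# Hypothetical 1-volume aliquot of the 7 M urea resuspension; 2 M is the limit given in the text
threshold = extra_buffer_volume(1.0, 7.0, 2.0)
print(f"add more than {threshold:.1f} volumes of digestion buffer per volume of sample")
```

Because the result scales with the sample volume, the rule of thumb is simply "more than 2.5 volumes of buffer per volume of sample" to get below 2 M urea.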

Radiolabeling of DNA probes. Oligonucleotides were radiolabeled with [γ-32P]ATP (specific activity, 6,000 Ci/mmol; NEN-DuPont, Boston, Mass.) by use of T4 polynucleotide kinase (44). Double-stranded DNA fragments were radiolabeled by the random-primer technique (12) with [α-32P]dATP (specific activity, 3,000 Ci/mmol; Amersham, Arlington Heights, Ill.).


[FIG. 2 image: (A) ethidium bromide-stained agarose gel and (B) autoradiogram, lanes 1 to 3; size markers at 23.1, 9.4, 6.6, 4.4, 2.3, 2.0, and 0.56 kb]

FIG. 2. (A) Restriction digestion of total C. thermocellum genomic DNA with PstI (lane 1), EcoRI (lane 2), and BamHI (lane 3). Digested DNA was separated by electrophoresis through a 1% agarose gel and stained with ethidium bromide. (B) Autoradiogram of the total genomic DNA digest transferred to a nylon membrane and hybridized to a 32P-labeled, 29-base CelS N-terminal oligonucleotide probe. Strong hybridization occurred with a 15.5-kb PstI fragment (lane 1), a 1.7-kb EcoRI fragment (lane 2), and a 5.2-kb BamHI fragment (lane 3).

DNA hybridization. DNA separated by agarose gel electrophoresis (54) was transferred to nylon membranes (Nytran; Schleicher & Schuell, Inc., Keene, N.H.) by the method of Reed and Mann (43). For dot hybridization, plasmid DNA was denatured with 0.3 M NaOH by incubation at 65°C for 1 h and spotted onto nylon membranes. Hybridization with an oligonucleotide probe was done by the protocol of Southern (55). For hybridization with a randomly primed, double-stranded DNA probe, the membrane was prehybridized for 30 min at 50°C in QuikHyb hybridization solution (Stratagene). Hybridization was carried out for 1 h at 50°C as recommended by the supplier. Bacteriophage plaques were transferred to nylon membranes by the procedures of Sambrook et al. (48) prior to hybridization.

Construction and screening of genomic libraries. The agarose gel-fractionated or unfractionated genomic DNA fragments were ligated to the restriction enzyme-digested bacteriophage lambda ZAPII or plasmid pBR322 vector, which was, if necessary, dephosphorylated or partially filled in.

Bacteriophage libraries were screened by hybridization of plaque lifts with the respective probe. The inserts of the positive clones were subcloned into plasmid pBluescript SK(-) by the in vivo subcloning method (53).

The plasmids were introduced into E. coli DH10B by electroporation with a Gene Pulser device (Bio-Rad, Hercules, Calif.) at settings of 2,500 V, 200 Ω, and 25 μF. The transformants were screened by dot hybridization with an appropriate probe and by small-scale plasmid DNA preparations. Positive plasmids were then restriction mapped.

PCR. Cloned bacteriophage inserts were amplified by the polymerase chain reaction (PCR) with Taq polymerase (Perkin-Elmer, Norwalk, Conn., or GIBCO-Bethesda Research Laboratories, Gaithersburg, Md.) by the method described by Saiki et al. (47). The CelS N-terminal probe [5'-ACICCIACIAA(A/G)GCICCIACIAA(A/G)GATGG-3'] and either the M13 forward (5'-GTAAAACGACGGCCAGT-3') or the M13 reverse (5'-AGCGGATAACAATTTCACACAGGA-3') oligonucleotide were used as amplification primers (8).
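The electroporation settings given for transforming E. coli DH10B (2,500 V, 200 Ω, 25 μF) fix the nominal time constant of the exponential-decay pulse, τ = RC. The minimal sketch below works through that arithmetic only. It ignores the sample's own resistance, which sits in parallel with the resistor setting and can shorten the delivered pulse, and it does not compute field strength because the cuvette gap is not stated.

```python
import math

# Electroporation settings quoted in Materials and Methods
voltage_v = 2500.0       # 2,500 V
resistance_ohm = 200.0   # 200-ohm resistor setting
capacitance_f = 25e-6    # 25 microfarads

# Nominal time constant of the exponential-decay pulse: tau = R * C
tau_s = resistance_ohm * capacitance_f

# Voltage remaining after one time constant of decay: V0 * e^(-1)
v_after_tau = voltage_v * math.exp(-1)

print(f"nominal tau = {tau_s * 1e3:.1f} ms; voltage after one tau = {v_after_tau:.0f} V")
```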

[FIG. 3 image: restriction map of the celS region showing cloned fragments i to iv]

FIG. 3. Position and orientation of the 2.2-kb celS open reading frame relative to restriction sites. The four restriction fragments shown (i to iv) were those cloned to obtain the complete ORF sequence. The two arrows underneath the 5.2-kb BamHI fragment (ii) denote the PCR primers used to synthesize the probe leading to the cloning of the 0.97-kb EcoRI fragment (iii). One of the primer sequences was taken from the cloning vector. Restriction sites: B, BamHI; E, EcoRI; H, HindIII.

DNA sequencing. DNA was sequenced by the dideoxy chain termination method of Sanger et al. (49) with [α-35S]dATP and a Sequenase 2.0 sequencing kit (United States Biochemical Corp., Cleveland, Ohio). Double-stranded DNA templates were sequenced with primers corresponding to either the vector or the ends of a previously determined sequence. The entire sequence reported was determined at least twice on the same strand.

DNA sequence analysis. Computer programs used in analyzing the nucleotide and amino acid sequences were provided by the University of Wisconsin Genetics Computer Group (10). Protein sequences were compared with all entries in the nonredundant Swiss-Prot, Protein Information Resource, GenPept, and GenPeptUpdate data banks (a total of 81,719 sequences) by use of the BLAST network service at the National Center for Biotechnology Information (2).

Nucleotide sequence accession number. The nucleotide sequence data reported here (see Fig. 4) have been submitted to GenBank and have been assigned accession number L06942.

RESULTS AND DISCUSSION

Amino acid sequence at the N terminus of the CelS protein. The extracellular proteins of the C. thermocellum culture were concentrated by acetone precipitation and separated by SDS-PAGE. The separated protein bands on the gel were transferred by electroblotting to a PVDF membrane, and the membrane was stained with Coomassie brilliant blue R-250. The protein band corresponding to the CelS subunit (Mr = 82,000) was excised from the blot. The amino acid sequence of the CelS protein was determined by automated Edman degradation as described in Materials and Methods. A sequence containing 10 amino acid residues was obtained (Fig. 1).

-219 ATGTCAAATGCGCGGCTGATTTGATAAAAAAGTTTGTTAACACAAATTTATTATGTTAAC -160

-159 ACAAGTATTTTTTGGGTCCAGCTTAGTTTTATGATGAAAATAATGCGTAAAATTTATCCG -100

-99 CCAAAAGGGGGAATGAATTTATTGCGGGTAGGTTGCATTATTTCATCATATAACTTAAAA -40

1 M V K S R K I 7
-39 AGAATAAAAAAGTATATTTGAAAGGGGAAGATGGAGAGAATGGTAAAAAGCAGAAAGATT 20

8 S I L L A V A M L V S I M I P T T A F A 27
21 TCTATTCTGTTGGCAGTTGCAATGCTGGTATCCATAATGATACCCACAACTGCATTCGCA 80

28 G P T K A P T K D G T S Y K D L F L E L 47
81 GGTCCTACAAAGGCACCTACAAAAGATGGGACATCTTATAAGGATCTTTTCCTTGAACTC 140

48 Y G K I K D P K N G Y F S P D E G I P Y 67
141 TACGGAAAAATTAAAGATCCTAAGAACGGATATTTCAGCCCAGACGAGGGAATTCCTTAT 200

68 H S I E T L I V E A P D Y G H V T T S E 87
201 CACTCAATTGAAACATTGATCGTTGAAGCGCCGGACTACGGTCACGTTACTACCAGTGAG 260

88 A F S Y Y V W L E A M Y G N L T G N W S 107
261 GCTTTCAGCTATTATGTATGGCTTGAAGCAATGTATGGAAATCTCACAGGCAACTGGTCC 320

108 G V E T A W K V M E D W I I P D S T E Q 127
321 GGAGTAGAAACAGCATGGAAAGTTATGGAGGATTGGATAATTCCTGACAGCACAGAGCAG 380

128 P G M S S Y N P N S P A T Y A D E Y E D 147
381 CCGGGTATGTCTTCTTACAATCCAAACAGCCCTGCCACATATGCTGACGAATATGAGGAT 440

148 P S Y Y P S E L K F D T V R V G S D P V 167
441 CCTTCATACTATCCTTCAGAGTTGAAGTTTGATACCGTAAGAGTTGGATCCGACCCTGTA 500

168 H N D L V S A Y G P N M Y L M H W L M D 187
501 CACAACGACCTTGTATCCGCATACGGTCCTAACATGTACCTCATGCACTGGTTGATGGAC 560

188 V D N W Y G F G T G T R A T F I N T F Q 207
561 GTTGACAACTGGTACGGTTTTGGTACAGGAACACGGGCAACATTCATAAACACCTTCCAA 620

208 R G E Q E S T W E T I P H P S I E E F K 227
621 AGAGGTGAACAGGAATCCACATGGGAAACCATTCCTCATCCGTCAATAGAAGAGTTCAAA 680

228 Y G G P N G F L D L F T K D R S Y A K Q 247
681 TACGGCGGACCGAACGGATTCCTTGATTTGTTTACAAAGGACAGATCATATGCAAAACAG 740

248 W R Y T N A P D A E G R A I Q A V Y W A 267
741 TGGCGTTATACAAACGCTCCTGACGCAGAAGGCCGTGCTATACAGGCTGTTTACTGGGCA 800

268 N K W A K E Q G K G S A V A S V V S K A 287
801 AACAAATGGGCAAAGGAGCAGGGTAAAGGTTCTGCCGTTGCTTCCGTTGTATCCAAGGCT 860

288 A K M G D F L R N D M F D K Y F M K I G 307
861 GCAAAGATGGGTGACTTCTTGAGAAACGACATGTTCGACAAATACTTCATGAAGATCGGT 920

308 A Q D K T P A T G Y D S A H Y L M A W Y 327
921 GCACAGGACAAGACTCCTGCTACCGGTTATGACAGTGCACACTACCTTATGGCCTGGTAT 980

328 T A W G G G I G A S W A W K I G C S H A 347
981 ACTGCATGGGGTGGTGGAATTGGTGCATCCTGGGCATGGAAGATCGGATGCAGCCACGCA 1040

FIG. 4. Nucleotide and deduced amino acid sequences for the celS gene of C. thermocellum. The numbering of nucleotides and amino acid residues starts with the beginning of the coding sequence. The putative Shine-Dalgarno (SD) ribosome binding site is marked by asterisks. The putative signal peptide sequence is overlined. The experimentally determined N-terminal amino acid and two internal peptide sequences are marked by dots. The 3'-end palindrome is indicated by arrows facing each other.

Design of the oligonucleotide probe. A survey of the published DNA sequences for C. thermocellum (celA [5], celB [17], celC [51], celD [24], and xynZ [18]) revealed no significant bias in codon usage for the amino acid residues present in the CelS N terminus. Of the 10 amino acid residues obtained, 7 have a fourfold degeneracy and 3 have a twofold degeneracy in the genetic code (Fig. 1). The uncertainty presented by glycine in position 10 was eliminated by shortening the oligonucleotide probe from 30 to 29 bases. For the other amino acids that could be represented by four different possible codons each (amino acid residues 1, 2, 3, 5, 6, and 7), the neutral nucleoside inosine was used in the positions of uncertainty. Inosine is capable of pairing with all four conventional bases and neither increases nor decreases overall hybridization strength (33). Thus, six inosine residues were incorporated into the oligonucleotide probe (in positions 3, 6, 9, 15, 18, and 21).
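The degeneracy counts above follow directly from the genetic code. The sketch below, which assumes the standard genetic code and reads residue 1 as Thr (the choice made for the probe), counts the synonymous codons for each of the 10 N-terminal residues and multiplies them to get the number of possible coding sequences. The helper name `codons_for` is illustrative, not from the paper.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str) -> list[str]:
    """All codons of the standard code that specify the one-letter amino acid aa."""
    return sorted(codon for codon, x in CODON_TABLE.items() if x == aa)


# Edman N terminus of native CelS, reading residue 1 as Thr as was done for the probe
n_terminus = "TPTKAPTKDG"

fold = [len(codons_for(aa)) for aa in n_terminus]
fourfold = fold.count(4)
twofold = fold.count(2)
total = math.prod(fold)  # distinct 30-nt sequences that encode these 10 residues

print("codons per residue:", fold)
print(f"{fourfold} fourfold, {twofold} twofold; {total} possible 30-nt coding sequences")
# Dropping the third base of Gly-10 (a 29-mer instead of a 30-mer) removes its factor of 4
print(f"{total // fold[-1]} possible sequences for the first 29 nt")
```

The count of 131,072 possible sequences (32,768 for the first 29 nt) is what the inosine and mixed-base choices reduce to a fourfold-degenerate probe.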


348 H F G Y Q N P F Q G W V S A T Q S D F A 367
1041 CACTTCGGATATCAGAACCCATTCCAGGGATGGGTAAGTGCAACACAGAGCGACTTTGCT 1100

368 P K S S N G K R D W T T S Y K R Q L E F 387
1101 CCTAAATCATCCAACGGTAAGAGAGACTGGACAACAAGCTACAAGAGACAGCTTGAATTC 1160

388 Y Q W L Q S A E G G I A G G A T N S W N 407
1161 TATCAGTGGTTGCAGTCGGCTGAAGGTGGTATTGCCGGTGGAGCAACCAACTCCTGGAAC 1220

408 G R Y E K Y P A G T S T F Y G M A Y V P 427
1221 GGTAGATATGAGAAATATCCTGCTGGTACGTCAACGTTCTATGGTATGGCATATGTTCCG 1280

428 H P V Y A D P G S N Q W F G F Q A W S M 447
1281 CATCCTGTATACGCTGACCCGGGTAGTAACCAGTGGTTCGGATTCCAGGCATGGTCAATG 1340

448 Q R V M E Y Y L E T G D S S V K N L I K 467
1341 CAGCGTGTAATGGAGTACTACCTCGAAACAGGAGATTCATCAGTTAAGAATTTGATTAAG 1400

468 K W V D W V M S E I K L Y D D G T F A I 487
1401 AAGTGGGTCGACTGGGTAATGAGCGAAATTAAGCTCTATGACGATGGAACATTTGCAATT 1460

488 P S D L E W S G Q P D T W T G T Y T G N 507
1461 CCTAGCGACCTCGAGTGGTCAGGTCAGCCTGATACATGGACCGGAACATACACAGGCAAC 1520

508 P N L H V R V T S Y G T D L G V A G S L 527
1521 CCGAACCTCCATGTAAGAGTAACTTCTTACGGTACTGACCTTGGTGTTGCAGGTTCACTT 1580

528 A N A L A T Y A A A T E R W E G K L D T 547
1581 GCAAATGCTCTTGCAACTTATGCCGCAGCTACAGAAAGATGGGAAGGAAAACTTGATACA 1640

548 K A R D M A A E L V N R A W Y N F Y C S 567
1641 AAAGCAAGAGACATGGCTGCTGAACTGGTTAACCGTGCATGGTACAACTTCTACTGCTCT 1700

568 E G K G V V T E E A R A D Y K R F F E Q 587
1701 GAAGGAAAAGGTGTTGTTACTGAGGAAGCACGTGCTGACTACAAACGTTTCTTTGAGCAG 1760

588 E V Y V P A G W S G T M P N G D K I Q P 607
1761 GAAGTATACGTTCCGGCAGGTTGGAGCGGTACTATGCCGAACGGTGACAAGATTCAGCCT 1820

608 G I K F I D I R T K Y R Q D P Y Y D I V 627
1821 GGTATTAAGTTCATAGACATCCGTACAAAATATAGACAAGATCCTTACTACGATATAGTA 1880

628 Y Q A Y L R G E A P V L N Y H R F W H E 647
1881 TATCAGGCATACTTGAGAGGCGAAGCTCCTGTATTGAATTATCACCGCTTCTGGCATGAA 1940

648 V D L A V A M G V L A T Y F P D M T Y K 667
1941 GTTGACCTTGCAGTTGCAATGGGTGTATTGGCTACATACTTCCCGGATATGACATATAAA 2000

668 V P G T P S T K L Y G D V N D D G K V N 687
2001 GTACCTGGTACTCCTTCTACTAAATTATACGGCGACGTCAATGATGACGGAAAAGTTAAC 2060

688 S T D A V A L K R Y V L R S G I S I N T 707
2061 TCAACTGACGCTGTAGCATTGAAGAGATATGTTTTGAGATCAGGTATAAGCATCAACACT 2120

708 D N A D L N E D G R V N S T D L G I L K 727
2121 GACAATGCCGATTTGAATGAAGACGGCAGAGTTAATTCAACTGACTTAGGAATTTTGAAG 2180

728 R Y I L K E I D T L P Y K N * 741
2181 AGATATATTCTCAAAGAAATAGATACATTGCCGTACAAGAACTAATTTCAAAACTGATTT 2240

FIG. 4-Continued.

For lysine (amino acid residues 4 and 8), both possible bases (adenine and guanine) were incorporated into the oligonucleotide probe sequence, resulting in a probe with a fourfold degeneracy. For aspartic acid (amino acid residue 9), the third base could be either cytosine or thymine. Sambrook et al. (48) have argued that a mismatched thymine·guanine pair is somewhat more stable than an inosine·guanine pair, suggesting that thymine is a better choice than inosine or cytosine for this particular position in the probe. Thus, for the aspartic acid residue, the single codon GAT was used in the oligonucleotide probe. The CelS N-terminal probe was therefore a 29-base mixed oligonucleotide with a fourfold degeneracy and six inosine residues (Fig. 1).

The specificity of the oligonucleotide probe was examined by Southern hybridization of a total C. thermocellum genomic DNA digest with the radiolabeled oligonucleotide probe. As shown in Fig. 2, the probe hybridized predominantly to a 15.5-kb PstI fragment, a 1.7-kb EcoRI fragment, and a 5.2-kb BamHI fragment in the respective digests, demonstrating its specificity. Both the 1.7-kb EcoRI and the 5.2-kb BamHI fragments were cloned as described below.
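The probe description can be checked mechanically. The sketch below parses the probe as written in Materials and Methods, expands the two (A/G) positions into the individual oligonucleotides of the mixture, locates the inosines, and compares each species with the corresponding 29 nt of the cloned gene (Fig. 1). The comparison is made against the coding-strand sequence, which is equivalent to checking base pairing with the template strand. It treats inosine as matching any base, the simplifying assumption stated in the text, and it models nothing about hybridization stability; the function names are hypothetical.

```python
import re
from itertools import product

# N-terminal probe as written in Materials and Methods: I = inosine, (A/G) = mixed position
PROBE = "ACICCIACIAA(A/G)GCICCIACIAA(A/G)GATGG"
# First 29 nt of the cloned celS region encoding the mature N terminus (Fig. 1)
CLONED = "GGTCCTACAAAGGCACCTACAAAAGATGG"


def parse_probe(spec: str) -> list[str]:
    """Return the allowed base(s) at each position, e.g. 'I' or 'AG' for (A/G)."""
    return [m.group(1).replace("/", "") if m.group(1) else m.group(2)
            for m in re.finditer(r"\(([ACGT/]+)\)|([ACGTI])", spec)]


def mismatches(probe: str, target: str) -> list[int]:
    """1-based positions where probe and target differ; inosine is treated as matching any base."""
    return [i + 1 for i, (p, t) in enumerate(zip(probe, target)) if p != "I" and p != t]


positions = parse_probe(PROBE)
species = ["".join(combo) for combo in product(*positions)]  # every oligo in the mixture
inosines = [i + 1 for i, alts in enumerate(positions) if alts == "I"]
best = min(species, key=lambda s: len(mismatches(s, CLONED)))

print(len(positions), "nt;", len(species), "species; inosine at", inosines)
print("closest species:", best, "mismatches at", mismatches(best, CLONED))
```

The only two mismatches of the best-matching species fall in codon 1, where Thr was chosen for the probe although the cloned gene encodes Gly.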

Cloning of the celS gene. For cloning of the 1.7-kb EcoRI

3481041

3681101

3881161

4081221

4281281

4481341

4681401

4881461

5081521

5281581

5481641

5681701

5881761

6081821

6281881

6481941

6682001

6882061

7082121

7282181

3671100

3871160

4071220

4271280

4471340

4671400

4871460

5071520

5271580

5471640

5671700

5871760

6071820

6271880

6471940

6672000

6872060

7072120

7272180

7412240

VOL. 175, 1993

Page 6: Cloning DNA Coding Clostridium Ss · (50 pug/ml), tetracycline (12.5 Rg/ml), orampicillin (50 jug/ml) as required. E. coli strains were stored in glycerol solution (3) at-70'C. Preparation

1298 WANG ET AL.

G> -- C> < ___<--C

2241 GAAAGGACGGCTTGTGCCGGTCTTTTTTA(.CATTTCTAAAGCCATACCATGGCTTTTCGCA 2300

2301 TAATTTCTATTATATTCGCTAAAAACCAATGATTTTTGTGCCGAAATATTGTATAATAAA 2360

2361 TAATAATGTGTTCTTTTTTGAAAAAAGAGTCGCATGGTCTGCAATGTGCAAAGGAGCTGA 2420

2421 TGGTATGGCGGAGAAAAACAAGGTGGAAGTCAGGATAGCGGGAAAAGATTATACGTTGGT 2480

25402481 TGGTTGCGAATCCGAAGAGTATATTCAGAAGGTGGCGCTG

2541 TGAAATCATGAGATTGAATAATAAGCTT 2568

FIG. 4-Continued.

fragment, C. thermocellum genomic DNA was digested with EcoRI, and the DNA fragments were fractionated by agarose gel electrophoresis. EcoRI fragments between 1.0 and 2.2 kb were recovered from the gel and cloned into the bacteriophage lambda ZAPII vector. A recombinant bacteriophage clone containing the 1.7-kb EcoRI fragment was isolated by screening the bacteriophage library with the radiolabeled oligonucleotide probe. Its insert was subcloned into the pBluescript SK(-) plasmid vector by the in vivo excision procedure, and the nucleotide sequence was determined as described in Materials and Methods. As shown in Fig. 1, the cloned insert contained a DNA sequence encoding an amino acid sequence that matched the N terminus of the native CelS protein. Thus, the 1.7-kb EcoRI fragment contained the 5' end of the celS gene. Its DNA sequence contained an additional 85 bp of the celS structural gene, which was truncated by an EcoRI restriction site.

A clone containing the 5.2-kb BamHI fragment was similarly cloned and sequenced. This clone contained 247 bp of the celS structural gene in addition to the 85 bp contained in the 1.7-kb EcoRI fragment. However, the gene was still truncated, this time by a BamHI restriction site. To facilitate "genomic walking," we amplified the region of this fragment that corresponds to the N-terminal region of CelS by PCR. The cloned bacteriophage DNA served as the template. The primers were the N-terminal oligonucleotide, which corresponds to the N-terminal amino acid sequence, and the M13 forward oligonucleotide, which corresponds to the vector DNA sequence (Fig. 3). The PCR product was radiolabeled by the random-primer technique. Southern hybridization with this PCR product as a probe identified a 0.97-kb EcoRI fragment that contained the downstream sequence of the celS structural gene. This fragment was size-fractionated on an agarose gel and cloned into the pBR322 plasmid vector, with the PCR product used as the screening probe. Sequencing showed that the fragment contained 970 bp of the celS structural gene but not the complete open reading frame (ORF) of the celS gene. For a further step of genomic walking, the 0.97-kb EcoRI fragment was recovered from an agarose gel and radiolabeled by the random-primer technique. It was then used as a probe to identify a 2.08-kb BamHI-HindIII fragment that extended the celS structural gene. This fragment was also cloned and sequenced by use of the pBR322 vector. In total, four restriction fragments were found to encompass the complete ORF of the celS gene (Fig. 3).
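The walking strategy above depends on where EcoRI, BamHI, and HindIII sites fall relative to celS. The sketch below illustrates the in silico equivalent of a complete digest and gel size selection. It is not the authors' procedure. The function names and the toy sequence are hypothetical, and EcoRI is assumed to cut G^AATTC, which is offset 1 on the top strand. These recognition sites are palindromic, so scanning one strand finds every site.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `dna`."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)  # step by 1 so overlapping sites are found
    return positions


def linear_fragments(dna: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from a complete digest of a linear molecule.

    `cut_offset` is the top-strand cut position within the site
    (assumed 1 for EcoRI, G^AATTC).
    """
    cuts = [pos + cut_offset for pos in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def in_size_window(lengths: list[int], low: int, high: int) -> list[int]:
    """Keep fragments whose length lies in [low, high], like excising a gel slice."""
    return [n for n in lengths if low <= n <= high]


toy = "AAAGAATTCTTTTTGAATTCAA"  # hypothetical 22-bp sequence with two EcoRI sites
print(find_sites(toy, "GAATTC"))            # [3, 14]
print(linear_fragments(toy, "GAATTC", 1))   # [4, 11, 7]
print(in_size_window([4, 11, 7], 5, 20))    # [11, 7]
```

On a real genome, the window would be set in base pairs, for example 1,000 to 2,200 for the EcoRI library described above.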

Sequence of the C. thermocellum celS gene. The complete nucleotide sequence of the celS gene is shown in Fig. 4. Beginning with the native CelS N terminus, the encoded 714-amino-acid polypeptide had a calculated molecular weight of 80,670. This is in close agreement with the Mr of 82,000 observed for the native protein.
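A calculated molecular weight is normally obtained by summing residue masses over the mature chain. Here that chain has 741 - 27 = 714 residues once the putative signal peptide is removed. The sketch below shows that arithmetic. It assumes commonly tabulated average residue masses and adds one water for the free termini. The paper does not state which mass table was used, so small differences from 80,670 would be expected. The function name is hypothetical, and the CelS sequence itself (Fig. 4) is not reproduced here.

```python
# Commonly tabulated average residue masses in daltons (free amino acid minus H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water restores the free N- and C-terminal groups


def average_mw(protein: str) -> float:
    """Average molecular weight of an unmodified polypeptide chain."""
    protein = protein.upper()
    unknown = set(protein) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"nonstandard residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein) + WATER


print(round(average_mw("GA"), 2))  # dipeptide Gly-Ala: 146.15
```

Applied to the 714-residue mature sequence, the same function would give a value to compare with 80,670. Post-translational modifications are ignored.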

To confirm that the celS sequence codes for the CelS protein, we determined internal peptide sequences of the CelS protein. The native CelS protein eluted from an SDS gel was digested with endoproteinase Glu-C, and the N-terminal sequences of the proteolytic fragments were determined as described in Materials and Methods. Three internal peptide sequences were obtained. One matched deduced amino acid residues 412 to 431 of the celS gene (Fig. 4). The other two internal peptide sequences were identical, indicating that proteolysis was incomplete. They matched deduced amino acid residues 589 to 605 of the celS gene (Fig. 4). These sequences were preceded by a glutamic acid residue, the expected cleavage site for endoproteinase Glu-C.
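The check described above can be sketched as an idealized in silico digest. The code cuts C-terminal to every Glu, and optionally after Asp, since Glu-C is commonly reported to cleave after Asp as well under some buffer conditions. It then tests whether a peptide is preceded by Glu in the parent sequence. This is a simplification: it models complete digestion only, ignores any proline rule, and uses a hypothetical toy sequence rather than CelS. The function names are also hypothetical.

```python
def glu_c_digest(protein: str, cleave_after_asp: bool = False) -> list[str]:
    """Idealized complete digest: cut after every Glu (and optionally Asp)."""
    sites = {"E", "D"} if cleave_after_asp else {"E"}
    fragments, current = [], []
    for aa in protein.upper():
        current.append(aa)
        if aa in sites:  # the peptide bond C-terminal to this residue is cleaved
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


def preceded_by_glu(protein: str, peptide: str) -> bool:
    """True if `peptide` occurs in `protein` immediately after a Glu residue."""
    i = protein.upper().find(peptide.upper())
    return i > 0 and protein[i - 1].upper() == "E"


toy = "MKEAAGEWLDSR"  # hypothetical sequence, not CelS
print(glu_c_digest(toy))                # ['MKE', 'AAGE', 'WLDSR']
print(preceded_by_glu(toy, "AAGEW"))    # True
```

An incomplete digest, as inferred above from the two identical peptides, would also yield longer fragments spanning uncleaved sites. Those fragments would still share an N terminus that follows a Glu.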

TABLE 1. Codon usage in the celS gene of C. thermocellum

Codon  Amino acid  No. of times used    Codon  Amino acid  No. of times used
TTT    Phe         6                    TCT    Ser         8
TTC    Phe         22                   TCC    Ser         10
TTA    Leu         2                    TCA    Ser         15
TTG    Leu         16                   TCG    Ser         1
CTT    Leu         12                   CCT    Pro         23
CTC    Leu         8                    CCC    Pro         1
CTA    Leu         0                    CCA    Pro         3
CTG    Leu         3                    CCG    Pro         11
ATT    Ile         16                   ACT    Thr         14
ATC    Ile         5                    ACC    Thr         7
ATA    Ile         10                   ACA    Thr         29
ATG    Met         22                   ACG    Thr         2
GTT    Val         21                   GCT    Ala         20
GTC    Val         2                    GCC    Ala         6
GTA    Val         20                   GCA    Ala         37
GTG    Val         0                    GCG    Ala         1
TAT    Tyr         27                   TGT    Cys         0
TAC    Tyr         27                   TGC    Cys         2
TAA    End         1                    TGA    End         0
TAG    End         0                    TGG    Trp         28
CAT    His         4                    CGT    Arg         7
CAC    His         8                    CGC    Arg         1
CAA    Gln         2                    CGA    Arg         0
CAG    Gln         19                   CGG    Arg         1
AAT    Asn         9                    AGT    Ser         4
AAC    Asn         24                   AGC    Ser         12
AAA    Lys         21                   AGA    Arg         18
AAG    Lys         25                   AGG    Arg         0
GAT    Asp         17                   GGT    Gly         35
GAC    Asp         33                   GGC    Gly         7
GAA    Glu         25                   GGA    Gly         22
GAG    Glu         14                   GGG    Gly         2

J. BACTERIOL.

C. THERMOCELLUM celS 1299

FIG. 5. Alignment of the conserved, duplicated regions of CelS, CelA, CelB, CelD, CelE, CelF, CelX, CelH, and XynZ of C. thermocellum; CelCCA, CelCCC, CelCCD, CelCCG, and ORF1 of C. cellulolyticum; and EngB of C. cellulovorans. Boxed amino acids are identical or have similar chemical properties. Numbers indicate the positions, within the sequence of each protein, of the first and last amino acids shown on a line. Similar residues were as follows: V, L, I, M, and F; R and K; D and E; N and Q; Y, F, and W; and S and T.

Sequence analysis of the celS gene. (i) Ribosome binding site. A sequence (5'-AGGGGAA-3') homologous to the consensus ribosome binding site sequence for C. thermocellum cellulase genes (5'-AGGAGGA-3' [5, 16-19, 24, 39, 51, 60]) was found at the 5' end of the ORF. This Shine-Dalgarno sequence was located 10 bp upstream of a putative initiation codon. A 5- to 8-bp distance between the ribosome binding site and the initiation codon is most frequently found (5, 16-19, 51, 60), but a 12-bp distance has been reported for the celD gene (24).
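The two comparisons in the paragraph above can be made concrete with a short sketch. The first is how closely the celS motif matches the consensus. The second is how far the motif lies from the start codon. The spacing convention is an assumption: the sketch counts the bases between the 3' end of the motif and the first base of the next ATG, and published conventions differ. Alternative start codons (GTG, TTG) are ignored. The upstream sequence is hypothetical, and so are the function names.

```python
from typing import Optional


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def sd_spacing(seq: str, sd: str, start_codon: str = "ATG") -> Optional[int]:
    """Bases between the 3' end of `sd` and the first base of the next start codon."""
    seq, sd, start_codon = seq.upper(), sd.upper(), start_codon.upper()
    sd_pos = seq.find(sd)
    if sd_pos == -1:
        return None
    sd_end = sd_pos + len(sd)
    start = seq.find(start_codon, sd_end)
    return None if start == -1 else start - sd_end


print(mismatches("AGGGGAA", "AGGAGGA"))  # 2 (celS motif vs. consensus)
upstream = "TTAGGGGAA" + "T" * 10 + "ATGAAA"  # hypothetical 10-bp spacer
print(sd_spacing(upstream, "AGGGGAA"))  # 10
```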

(ii) A+T content. As in other cel, xyn, and bgl genes from C. thermocellum (5, 16-19, 24, 51, 60), the 5'-end noncoding region of the celS gene is enriched in A+T residues. The A+T content of the 203 bp upstream from the ribosome binding site is 71.4%, whereas the translated sequence has an A+T content of 54.9%. No sequence displaying obvious strong homology to known E. coli (46) or Bacillus subtilis (30) promoters was found.
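The A+T percentages above are simple base counts divided by region length. As a worked check, 71.4% of a 203-bp region corresponds to about 145 A or T bases. The sketch below assumes an unambiguous A/C/G/T sequence. The function name is hypothetical.

```python
def at_content(dna: str) -> float:
    """Percent A+T in an A/C/G/T sequence (case-insensitive)."""
    dna = dna.upper()
    if not dna or set(dna) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T sequence")
    return 100.0 * (dna.count("A") + dna.count("T")) / len(dna)


print(round(100.0 * 145 / 203, 1))     # 71.4
print(round(at_content("AATTGC"), 1))  # 66.7
```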

(iii) Signal peptide. Immediately following the putative initiation codon was a peptide sequence characteristic of prokaryotic signal sequences (57). The sequence contains 27 amino acids before the cleavage site. Three of the first 6 amino acids in the signal peptide are positively charged, and these are followed by 15 predominantly hydrophobic amino acids. A secondary-structure-breaking residue (proline) is located 6 residues before the cleavage site. As has been commonly observed (57), the signal sequence ends with an alanine residue.
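The features listed above can be tallied with a few simple checks: positive charges near the N terminus, a hydrophobic core, a helix-breaking proline near the cleavage site, and a terminal Ala. The sketch below is a crude illustration, not a signal peptide predictor, and it is not how the authors identified the sequence. The hydrophobic residue set is an assumption. The run counter is strict, whereas the text says "predominantly" hydrophobic. The 27-residue peptide is hypothetical, and so are the function names.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic set; definitions vary


def n_region_positive(sp: str, n: int = 6) -> int:
    """Number of Lys/Arg residues among the first `n` residues."""
    return sum(aa in "KR" for aa in sp[:n].upper())


def longest_hydrophobic_run(sp: str) -> int:
    """Length of the longest uninterrupted stretch of HYDROPHOBIC residues."""
    best = current = 0
    for aa in sp.upper():
        current = current + 1 if aa in HYDROPHOBIC else 0
        best = max(best, current)
    return best


def residues_after_last_pro(sp: str) -> int:
    """Residues between the last Pro and the end of the peptide (-1 if no Pro)."""
    i = sp.upper().rfind("P")
    return -1 if i == -1 else len(sp) - i - 1


sp = "MKKRLSLLAVLLAIFSLVGTMLPAAQA"  # hypothetical 27-residue signal peptide
print(len(sp), n_region_positive(sp), longest_hydrophobic_run(sp),
      residues_after_last_pro(sp), sp.endswith("A"))  # 27 3 9 4 True
```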

(iv) Codon usage. The codon usage for the celS gene (Table 1) is generally similar to that reported for other C. thermocellum genes (17-19, 24, 51). As with the celC (51) and celE (19) genes, there appears to be a bias toward codons that end with A or T nucleotides.
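The A/T bias noted above can be quantified directly from Table 1 by tallying codons by their third base. The sketch below uses the counts transcribed from Table 1, including stop codons. It is a simple tally rather than a formal codon-bias index, and the helper name is hypothetical. With these counts it reports 428 of 748 codons (about 57%) ending in A or T.

```python
# Codon counts transcribed from Table 1 (celS gene of C. thermocellum).
CODON_COUNTS = {
    "TTT": 6, "TTC": 22, "TTA": 2, "TTG": 16, "CTT": 12, "CTC": 8, "CTA": 0, "CTG": 3,
    "ATT": 16, "ATC": 5, "ATA": 10, "ATG": 22, "GTT": 21, "GTC": 2, "GTA": 20, "GTG": 0,
    "TCT": 8, "TCC": 10, "TCA": 15, "TCG": 1, "CCT": 23, "CCC": 1, "CCA": 3, "CCG": 11,
    "ACT": 14, "ACC": 7, "ACA": 29, "ACG": 2, "GCT": 20, "GCC": 6, "GCA": 37, "GCG": 1,
    "TAT": 27, "TAC": 27, "TAA": 1, "TAG": 0, "CAT": 4, "CAC": 8, "CAA": 2, "CAG": 19,
    "AAT": 9, "AAC": 24, "AAA": 21, "AAG": 25, "GAT": 17, "GAC": 33, "GAA": 25, "GAG": 14,
    "TGT": 0, "TGC": 2, "TGA": 0, "TGG": 28, "CGT": 7, "CGC": 1, "CGA": 0, "CGG": 1,
    "AGT": 4, "AGC": 12, "AGA": 18, "AGG": 0, "GGT": 35, "GGC": 7, "GGA": 22, "GGG": 2,
}


def third_position_counts(counts: dict[str, int]) -> dict[str, int]:
    """Sum codon usage by the nucleotide at the third codon position."""
    tally = {"A": 0, "C": 0, "G": 0, "T": 0}
    for codon, n in counts.items():
        tally[codon[2]] += n
    return tally


tally = third_position_counts(CODON_COUNTS)
total = sum(tally.values())
at_ending = tally["A"] + tally["T"]
print(f"A/T-ending codons: {at_ending}/{total} = {at_ending / total:.1%}")
# A/T-ending codons: 428/748 = 57.2%
```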

(v) Palindromic sequence. A palindromic sequence corresponding to an mRNA hairpin loop with a ΔG of -10.9 kcal/mol (ca. -45.6 kJ/mol) (7) was found 16 bp downstream of the stop codon. It is followed by a DNA sequence rich in A+T residues. Palindromic sequences have also been found downstream of other C. thermocellum genes (5, 16-18, 24, 60). These structures may function as transcription termination signals (46); however, their exact role remains to be determined.
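A palindrome of the kind described above is an inverted repeat: a stem, a short loop, and the reverse complement of the stem. The sketch below finds perfect inverted repeats with a fixed stem length and a bounded loop. It also shows the unit conversion behind the parenthetical kJ value (1 kcal = 4.184 kJ). It does not compute ΔG, which requires base-pair stacking energies such as those tabulated in reference 7. It also ignores mismatches and G-U pairs. The toy sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
KCAL_TO_KJ = 4.184  # thermochemical calorie


def revcomp(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_hairpins(dna: str, stem: int = 6, min_loop: int = 3, max_loop: int = 10):
    """Return (stem_start, loop_length) for each perfect inverted repeat."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * stem - min_loop + 1):
        partner = revcomp(dna[i:i + stem])  # sequence that would pair with this stem
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(dna):
                break
            if dna[j:j + stem] == partner:
                hits.append((i, loop))
    return hits


toy = "AAGGGCGCTTTTGCGCCCAA"  # hypothetical: GGGCGC-TTTT-GCGCCC
print(find_hairpins(toy))            # [(2, 4)]
print(round(-10.9 * KCAL_TO_KJ, 1))  # -45.6
```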

Sequence comparison. The deduced amino acid sequence for the celS gene revealed no global similarities (2) to those for other C. thermocellum genes (celA [5], celB [17], celC [51], celD [24], celE [19], celF [39], celH [60], xynZ [18], and bglA [16]). However, a region of 64 amino acids at the C terminus of CelS was highly homologous to the conserved, duplicated regions of other clostridial proteins. These include CelA, CelB, CelD, CelE, CelF, CelH, CelX (19), and XynZ of C. thermocellum; CelCCA (11), CelCCC (4), CelCCD (52), CelCCG (4), and ORF1 (4) of C. cellulolyticum; and EngB (13) of C. cellulovorans (Fig. 5). This region, located at the C terminus of all of these proteins except CelE and XynZ, contains two highly conserved sequences, each composed of 24 amino acids. In CelS, these two segments are connected by 8 residues that exhibit some similarity in size and composition to the peptides linking the duplicated sequences of CelA and CelB.


FIG. 6. Amino acid sequence comparison for CelS, the partial ORF (ORF1) preceding the celCCC gene of C. cellulolyticum (A), the partial ORF (also ORF1) preceding the manA gene of C. saccharolyticum (B), and the 28-amino-acid cartridge of an endoglucanase of B. ruminicola (C). Boxed amino acids are identical or have similar chemical properties. Numbers indicate the positions, within the sequence of each protein, of the first and last amino acids shown on a line. For A and B, the numbers start with the first amino acid residues reported. For C, the numbering refers to the second encoding ORF. Similar residues were as follows: V, L, I, M, and F; R and K; D and E; N and Q; Y, F, and W; S and T; and G and A.
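The caption above defines which residue pairs count as "similar." The sketch below applies those groups to a pre-aligned pair of sequences and reports percent identity and percent similarity. The groups are those listed for Fig. 6, which include G and A; Fig. 5 omits that pair. The choice of denominator, aligned positions with gaps excluded, is an assumption. The sequence analysis package cited for the percentages (10) may score differently. The toy sequences and the function names are hypothetical.

```python
# Similarity groups from the Fig. 6 caption (F appears in two groups).
SIMILAR_GROUPS = [set("VLIMF"), set("RK"), set("DE"), set("NQ"),
                  set("YFW"), set("ST"), set("GA")]


def similar(a: str, b: str) -> bool:
    """Identical, or both residues fall in one of the caption's groups."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identical and percent similar over gap-free aligned positions."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned positions")
    ident = sum(a == b for a, b in pairs)
    sim = sum(similar(a, b) for a, b in pairs)
    return 100.0 * ident / len(pairs), 100.0 * sim / len(pairs)


print(identity_similarity("VKDE", "LKNE"))  # (50.0, 75.0)
```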

This conserved region has been suggested to play a role in binding of the enzyme to the cellulose substrate (5). However, recent experimental data indicate that the repeated sequence is responsible for an interaction with other cellulase components, especially the SL subunit (56). The presence of the repeated sequence in the CelS protein therefore supports an interaction between CelS (Ss) and SL, consistent with their observed synergism in cellulase activity (58, 59). It would be interesting to determine whether deletion of the duplicated sequence would compromise the cooperativity between CelS and SL.

A comparison of the deduced CelS amino acid sequence with other sequences revealed two ORFs with highly homologous sequences (Fig. 6). However, only partial sequences have been reported for these ORFs (4, 31). The first partial ORF is located upstream of the celCCC gene of C. cellulolyticum (ORF1; 4). It has 58% identical residues (72% similar residues; 10) and includes the conserved, duplicated region at its C terminus. The second partial ORF is located upstream of the β-mannanase gene of Caldocellum saccharolyticum (ORF1; 31). This ORF also has 58% identical residues (73% similar residues; 10), but it does not include the conserved, duplicated region. The complete genes for these two ORF1 proteins have yet to be reported, so characterization of the CelS protein may provide insight into their function. Furthermore, a 28-amino-acid sequence of an endoglucanase from Bacteroides ruminicola (35) was found to align with all three sequences. This sequence is also shown in Fig. 6.

On the basis of the following evidence, we concluded that the cloned celS gene codes for the previously reported CelS protein. (i) The calculated molecular weight is in close agreement with the apparent molecular weight of the native CelS protein. (ii) The celS sequence agrees with three peptide sequences determined for the CelS protein: one at the N terminus and the other two near the middle and the C terminus of CelS. (iii) The celS sequence contains the conserved, duplicated sequence. So far, this sequence has been found only in clostridial genes coding for glucanases (and the xylanase XynZ), and it supports the previously reported interaction with the SL protein (58, 59).

We compared the celS sequence with that of the J3 gene (25), which was cloned by use of antibodies against CelS, and found no global similarities. The calculated molecular weight of the J3 gene product (35,186) is much lower than that of CelS. The J3 gene product does not contain the conserved, duplicated sequence, and its N-terminal sequence differs from that of CelS. It would therefore appear that the J3 gene was obtained because of cross-reactivity of the antibodies used. However, further study is needed to resolve this discrepancy.

In previous work, we showed that CelS (Ss) and SL, two major components of the cellulase aggregate (cellulosome), degrade crystalline cellulose cooperatively. Cellulose degradation by this enzyme system involves many more components than CelS and SL. Even so, focusing on these two subunits may yield important insights, since they probably form the simplest "subcellulosome" active on crystalline cellulose. Unfortunately, progress has been limited by the technical difficulty of purifying the two components by a nondenaturing procedure. In this work, we cloned the celS gene. The DNA sequence indicates that the complete celS gene had not been cloned previously and that celS belongs to a new family of cel genes.

An anchor-enzyme model was proposed to explain the synergism between CelS (Ss) and SL (58). Recent results indicate that SL contains multiple repeated domains (14). Each domain provides a receptor site for proteins carrying the conserved, reiterated sequence, which functions as the binding ligand. SL therefore appears to function as an anchor not only for CelS but also for other Cel proteins. CelS is the most abundant subunit of the cellulosome. It is also, so far, the only subunit shown to degrade a substantial portion of crystalline cellulose synergistically with SL. Studies on the mechanism of the CelS-SL synergism may therefore provide more insight into the mechanism and organization of the cellulosome. Expression of the celS gene in E. coli will facilitate such studies.

ACKNOWLEDGMENTS

This project was supported in part by a biomedical research support grant from the Public Health Service. W.K.W. appreciates the support of the Link Foundation Energy Fellowship.

We thank Wen Gang Chou, Shau-Ping Lei, and Ton-Hou Lee for helpful discussions; Robert Quivey and Hon-Chi Lin for valuable consultations and for providing cloning vectors and equipment; Tom Cummings for providing oligonucleotides; Robert Seid, Michael Fiske, and Sue Middlebrook for determining the peptide sequences; and Barbara Iglewski for providing research facilities in the initial stage of the project and for general support.

REFERENCES

1. Ait, N., N. Creuzet, and P. Forget. 1979. Partial purification of cellulase from Clostridium thermocellum. J. Gen. Microbiol. 113:399-402.

2. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

3. Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl. 1989. Current protocols in molecular biology. John Wiley & Sons, Inc., New York.

4. Bagnara-Tardif, C., C. Gaudin, A. Belaich, P. Hoest, T. Citard, and J.-P. Belaich. 1992. Sequence analysis of a gene cluster encoding cellulases from Clostridium cellulolyticum. Gene 119:17-28.

5. Beguin, P., P. Cornet, and J.-P. Aubert. 1985. Sequence of a cellulase gene of the thermophilic bacterium Clostridium thermocellum. J. Bacteriol. 162:102-105.

6. Beguin, P., J. Millet, O. Grepinet, A. Navarro, M. Juy, A. Amit, R. Poljak, and J.-P. Aubert. 1988. The cel (cellulose degradation) genes of Clostridium thermocellum, p. 267-282. In J.-P. Aubert, P. Beguin, and J. Millet (ed.), Biochemistry and genetics of cellulose degradation. Academic Press, Inc., New York.

7. Cantor, C. R., and P. R. Schimmel. 1980. Biophysical chemistry, part III. The behavior of biological macromolecules. W. H. Freeman & Co., New York.

8. Caruthers, M. H. 1985. Gene synthesis machines: DNA chemistry and its uses. Science 230:281-285.

9. Coughlan, M. P., K. Hon-Nami, H. Hon-Nami, L. G. Ljungdahl, J. J. Paulin, and W. E. Rigsby. 1985. The cellulolytic enzyme complex of Clostridium thermocellum is very large. Biochem. Biophys. Res. Commun. 130:904-909.

10. Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.

11. Faure, E., A. Belaich, C. Bagnara, C. Gaudin, and J.-P. Belaich. 1989. Sequence analysis of the Clostridium cellulolyticum endoglucanase-A-encoding gene, celCCA. Gene 84:39-46.

12. Feinberg, A. P., and B. Vogelstein. 1983. A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132:6-13.

13. Foong, F., T. Hamamoto, O. Shoseyov, and R. Doi. 1991. Nucleotide sequence and characteristics of endoglucanase gene engB from Clostridium cellulovorans. J. Gen. Microbiol. 137:1729-1736.

14. Fujino, T., P. Beguin, and J.-P. Aubert. 1992. Cloning of a Clostridium thermocellum DNA fragment encoding polypeptides that bind the catalytic components of the cellulosome. FEMS Microbiol. Lett. 94:165-170.

15. Grabnitz, F., K. P. Rucknagel, M. Seiss, and W. L. Staudenbauer. 1989. Nucleotide sequence of the Clostridium thermocellum bglB gene encoding thermostable β-glucosidase B: homology to fungal β-glucosidases. Mol. Gen. Genet. 217:70-76.

16. Grabnitz, F., M. Seiss, K. P. Rucknagel, and W. L. Staudenbauer. 1991. Structure of the β-glucosidase gene bglA of Clostridium thermocellum. Eur. J. Biochem. 200:301-309.

17. Grepinet, O., and P. Beguin. 1986. Sequence of the cellulase gene of Clostridium thermocellum coding for endoglucanase B. Nucleic Acids Res. 14:1791-1799.

18. Grepinet, O., M. C. Chebrou, and P. Beguin. 1988. Nucleotide sequence and deletion analysis of the xylanase gene (xynZ) of Clostridium thermocellum. J. Bacteriol. 170:4582-4588.

19. Hall, J., G. P. Hazlewood, P. J. Barker, and H. J. Gilbert. 1988. Conserved reiterated domains in Clostridium thermocellum endoglucanases are not essential for catalytic activity. Gene 69:29-38.

20. Holmes, D. S., and M. Quigley. 1981. A rapid boiling method for the preparation of bacterial plasmids. Anal. Biochem. 114:193-197.

21. Johnson, E. A. 1983. Regulation of cellulase activity and synthesis in Clostridium thermocellum. Ph.D. thesis. Massachusetts Institute of Technology, Cambridge.

22. Johnson, E. A., A. Madia, and A. L. Demain. 1981. Chemically defined minimal medium for growth of the anaerobic cellulolytic thermophile Clostridium thermocellum. Appl. Environ. Microbiol. 41:1061-1062.

23. Johnson, E. A., M. Sakajoh, G. Halliwell, A. Madia, and A. L. Demain. 1982. Saccharification of complex cellulosic substrates by the cellulase system from Clostridium thermocellum. Appl. Environ. Microbiol. 43:1125-1132.

24. Joliff, G., P. Beguin, and J.-P. Aubert. 1986. Nucleotide sequence of the cellulase gene celD encoding endoglucanase D of Clostridium thermocellum. Nucleic Acids Res. 14:8605-8613.

25. Kobayashi, T., N. Huskisson, P. Barker, A. L. Demain, and M. P. M. Romaniec. 1992. Nucleotide sequence of a gene encoding an endoglucanase of Clostridium thermocellum related to subunit Ss and purification of the enzyme. Enzyme Microb. Technol. 14:447-453.

26. Kohring, S., J. Weigel, and F. Mayer. 1990. Subunit composition and glycosidic activities of the cellulase complex from Clostridium thermocellum JW20. Appl. Environ. Microbiol. 56:3798-3804.

27. Laemmli, U. K. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227:680-685.

28. Lamed, R., E. Setter, and E. A. Bayer. 1983. Characterization of a cellulose-binding, cellulase-containing complex in Clostridium thermocellum. J. Bacteriol. 156:828-836.

29. LeGendre, N., and P. Matsudaira. 1988. Direct protein microsequencing from Immobilon-P transfer membrane. BioTechniques 6:154-159.

30. Losick, R., and J. Pero. 1981. Cascades of sigma factors. Cell 25:582-584.

31. Luthi, E., N. B. Jasmat, R. A. Grayling, D. R. Love, and P. L. Bergquist. 1991. Cloning, sequence analysis, and expression in Escherichia coli of a gene coding for a β-mannanase from the extremely thermophilic bacterium "Caldocellum saccharolyticum." Appl. Environ. Microbiol. 57:694-700.


32. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

33. Martin, F. H., M. M. Castro, F. Aboul-ela, and I. Tinoco, Jr. 1985. Base pairing involving deoxyinosine: implications for probe design. Nucleic Acids Res. 13:8927-8938.

34. Matsudaira, P. 1987. Sequence from picomole quantities of proteins electroblotted onto polyvinylidene difluoride membranes. J. Biol. Chem. 262:10035-10038.

35. Matsushita, O., J. B. Russell, and D. B. Wilson. 1991. A Bacteroides ruminicola 1,4-β-D-endoglucanase is encoded in two reading frames. J. Bacteriol. 173:6919-6926.

36. Mayer, F., M. P. Coughlan, Y. Mori, and L. G. Ljungdahl. 1987. Macromolecular organization of the cellulolytic enzyme complex of Clostridium thermocellum as revealed by electron microscopy. Appl. Environ. Microbiol. 53:2785-2792.

37. Meade, H. M., S. R. Long, G. B. Ruvkun, S. E. Brown, and F. M. Ausubel. 1982. Physical and genetic characterization of symbiotic and auxotrophic mutants of Rhizobium meliloti induced by transposon Tn5 mutagenesis. J. Bacteriol. 149:114-122.

38. Millet, J., D. Petre, P. Beguin, O. Raynaud, and J.-P. Aubert. 1985. Cloning of ten distinct DNA fragments of Clostridium thermocellum coding for cellulases. FEMS Microbiol. Lett. 29:145-149.

39. Navarro, A., M.-C. Chebrou, P. Beguin, and J.-P. Aubert. 1991. Nucleotide sequence of the cellulase gene celF of Clostridium thermocellum. Res. Microbiol. 142:927-936.

40. Ng, T. K., and J. G. Zeikus. 1981. Comparison of extracellular cellulase activities of Clostridium thermocellum LQRI and Trichoderma reesei QM 9414. Appl. Environ. Microbiol. 42:231-240.

41. Radloff, R., W. Bauer, and J. Vinograd. 1967. A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: the closed circular DNA in HeLa cells. Proc. Natl. Acad. Sci. USA 57:1514-1521.

42. Reddy, S. V., K. Hamsabhushanam, and P. Jagadeeswaran. 1989. A rapid method for large scale isolation of plasmid DNA by boiling in a plastic bag. BioTechniques 7:821-822.

43. Reed, K. C., and D. A. Mann. 1985. Rapid transfer of DNA from agarose gels to nylon membranes. Nucleic Acids Res. 13:7207-7221.

44. Richardson, C. C. 1981. Bacteriophage T4 polynucleotide kinase, p. 299-314. In P. D. Boyer (ed.), The enzymes, vol. 14A. Academic Press, Inc., New York.

45. Romaniec, M. P. M., N. G. Clarke, and G. P. Hazlewood. 1987. Molecular cloning of Clostridium thermocellum DNA and the expression of further novel endo-β-1,4-glucanase genes in Escherichia coli. J. Gen. Microbiol. 133:1297-1307.

46. Rosenberg, M., and D. Court. 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Annu. Rev. Genet. 13:319-353.

47. Saiki, R. K., D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, and H. A. Erlich. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487-491.

48. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

49. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.

50. Schagger, H., and H. von Jagow. 1987. Tricine-sodium dodecyl sulphate-polyacrylamide gel electrophoresis for the separation of proteins in the range from 1 to 100 kDa. Anal. Biochem. 166:368-379.

51. Schwarz, W. H., S. Schimming, K. P. Rucknagel, S. Burgschwaiger, G. Kreil, and W. Staudenbauer. 1988. Nucleotide sequence of the celC gene encoding endoglucanase C of Clostridium thermocellum. Gene 63:23-30.

52. Shima, S., Y. Igarashi, and T. Kodama. 1991. Nucleotide sequence analysis of the endoglucanase-encoding gene, celCCD, of Clostridium cellulolyticum. Gene 104:33-38.

53. Short, J. M., J. M. Fernandez, J. A. Sorge, and W. D. Huse. 1988. Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties. Nucleic Acids Res. 16:7583-7600.

54. Southern, E. 1979. Gel electrophoresis of restriction fragments. Methods Enzymol. 68:152-176.

55. Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-517.

56. Tokatlidis, K., S. Salamitou, P. Beguin, P. Dhurjati, and J.-P. Aubert. 1991. Interaction of the duplicated segment carried by Clostridium thermocellum cellulases with cellulosome components. FEBS Lett. 291:185-188.

57. Watson, M. E. E. 1984. Compilation of published signal sequences. Nucleic Acids Res. 12:5145-5164.

58. Wu, J. H. D., and A. L. Demain. 1988. Proteins of the Clostridium thermocellum cellulase complex responsible for degradation of crystalline cellulose, p. 117-131. In J.-P. Aubert, P. Beguin, and J. Millet (ed.), Biochemistry and genetics of cellulose degradation. Academic Press, Inc., New York.

59. Wu, J. H. D., W. H. Orme-Johnson, and A. L. Demain. 1988. Two components of an extracellular protein aggregate of Clostridium thermocellum together degrade crystalline cellulose. Biochemistry 27:1703-1709.

60. Yague, E., P. Beguin, and J.-P. Aubert. 1990. Nucleotide sequence and deletion analysis of the cellulase-encoding gene celH of Clostridium thermocellum. Gene 89:61-67.
