FEMS Microbiology Letters 121 (1994) 293-296 © 1994 Federation of European Microbiological Societies 0378-1097/94/$07.00 Published by Elsevier

FEMSLE 6129

Sequence and characterization of the Escherichia coli genome between the ndk and gcpE genes

Jeanette Baker and Jack Parker *

Department of Microbiology, Southern Illinois University, Carbondale, IL 62901-6508, USA

Received 10 May 1994; revision received and accepted 21 June 1994

Abstract: The region of the chromosome immediately upstream of the Escherichia coli gene gcpE has been cloned and sequenced. This region contains two functional open reading frames, orf 384 and orf 337, encoding proteins of 43082 and 36189 Da, respectively. Sequencing analysis (this paper) and the isolation of a DNA fragment containing a functional promoter (Talukder, A.A., Yanai, S., and Yamada, M. (1994) Biosci. Biotech. Biochem. 58, 117-120) indicate that orf 337 is in an operon with gcpE. The gene orf 384 is immediately downstream of the gene ndk, which encodes nucleoside diphosphate kinase.

Key words: Escherichia coli; Open reading frame; Genomic sequence

Introduction

We have previously reported the sequence and partial characterization of gcpE, a gene on the Escherichia coli chromosome encoding a protein of 40681 Da which may have an essential function [1]. This gene is immediately upstream from hisS, a gene encoding histidyl-tRNA synthetase [2]. However, gcpE is not co-transcribed with hisS [2]. No promoter was identified on the clone containing gcpE, and it seemed likely that gcpE was the downstream terminal gene of an operon [1]. In this paper we report the sequence of the region upstream from gcpE. This region contains two open reading frames that are expressed in E. coli, one of which is almost certainly in a dicistronic operon with gcpE. The other is immediately downstream of the essential gene ndk [3], but is most likely in a monocistronic operon.

* Corresponding author. Tel: (618) 453 2520; Fax: (618) 453 8036.

Materials and Methods

Bacteria, bacteriophage, plasmids and media

The bacteriophage λ428 (2D5), which contains a portion of the Escherichia coli K-12 chromosome [4], was provided by Catherine L. Squires. The E. coli strain BL21(DE3) [5] used for expressing cloned genes from a bacteriophage T7 promoter was kindly provided by F.W. Studier. Plasmids used included pLysS and pLysE, which help control the activity of the T7 RNA polymerase [5], and the cloning vector pTZ18U [6].

SSDI 0378-1097(94)00279-Z

DNA manipulations

Growth of bacteria for preparation of plasmids and phage, and routine cloning procedures were as described in Sambrook et al. [7]. Deletion derivatives of plasmids for sequencing were made using exonuclease III [8]. All sequencing was done using the Sequenase version 2.0 kit (United States Biochemical Corporation) with adjustments made for double-stranded templates [9]. Both strands were sequenced using GTP and ITP, and reactions for areas of persistent compression were run on gels containing formamide as described in the manufacturer's protocols (United States Biochemical Corporation). Computer analysis of the DNA and protein sequences was performed using the MacVector™ program (IBI-Kodak Company) and BLAST [10].

Labeling and analysis of proteins

Strains containing plasmids capable of allowing protein over-expression from genes cloned downstream of a T7 promoter were grown in M9 medium [7] containing 0.4% glucose, 30 mg l⁻¹ chloramphenicol and 50 mg l⁻¹ ampicillin. Transcription from the T7 promoter by a T7 polymerase whose expression was controlled by the lac promoter [5] was induced by adding IPTG to a final concentration of 1.0 mM at 1.5 h (in strains containing pLysS) or 3.5 h (in strains containing pLysE) before harvesting. Transcription of cellular genes was reduced by the addition of rifampicin to 150 µg ml⁻¹ 1 h before harvesting, and labeling was accomplished by adding 25 µCi ml⁻¹ [³⁵S]-L-methionine 15 min before harvesting. Radioactively labeled proteins were separated by electrophoresis using a 10% polyacrylamide gel containing sodium dodecyl sulfate [11] and analyzed by autoradiography.
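The timing of the additions relative to harvest can be laid out as a small schedule. The following is only a sketch: the function name `labeling_schedule` and the tuple layout are illustrative assumptions, while the times and concentrations are those given in the paragraph above.

```python
def labeling_schedule(helper_plasmid: str) -> list[tuple[float, str]]:
    """Return (minutes before harvest, addition) pairs, earliest addition first."""
    # IPTG lead time depends on the helper plasmid (1.5 h for pLysS, 3.5 h for pLysE).
    iptg_lead_min = {"pLysS": 90.0, "pLysE": 210.0}[helper_plasmid]
    steps = [
        (iptg_lead_min, "IPTG to 1.0 mM"),
        (60.0, "rifampicin to 150 ug/ml"),
        (15.0, "[35S]-L-methionine to 25 uCi/ml"),
    ]
    # Largest lead time first, i.e. chronological order of the additions.
    return sorted(steps, reverse=True)


for minutes, addition in labeling_schedule("pLysE"):
    print(f"{minutes:5.0f} min before harvest: {addition}")
```

In both cases IPTG is added first, so the T7 polymerase is already being made when rifampicin is added to reduce transcription of cellular genes.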

Results and Discussion

Cloning and analysis of genomic fragment containing upstream region

A 7.0 kb KpnI fragment of λ428 containing the region of the E. coli chromosome upstream from gcpE was cloned into the vector pTZ18U, yielding the plasmid pSIU904. This plasmid was used directly for sequencing and was also used to generate the exonuclease III deletion clones and subclones for sequencing. The sequence obtained (bases 1-2179 in Fig. 1) was analyzed with the previously reported sequence [1], which includes gcpE and a short region upstream (bases 2180-3850 in Fig. 1). This sequence is at approximately 56.7 centisomes on the E. coli restriction map of K.E. Rudd [12]. The combined sequence revealed two open reading frames upstream of gcpE (Fig. 1). The restriction map of the sequence agreed with the published map of this region [4] except that we found an extra EcoRV site at 1284. Restriction analysis of λ428 indicates that this site does exist on the E. coli chromosome (results not shown). Bases 1-197 of Fig. 1 are identical to bases 528 to 724 of Hama et al. [3]. This includes the last 27 codons of the ndk gene. Therefore, our sequence completes the gap between the essential genes ndk, which encodes nucleoside diphosphate kinase, and hisS, which encodes histidyl-tRNA synthetase.

[Figure 1: map with a scale marked at 1000, 2000 and 3000 bases, restriction sites including Kpn I, Sma I, Pst I and Bgl II, the position of hisS, and the regions carried by pSIU918, pSIU927 and pSIU914.]

Fig. 1. Map of the region of the E. coli genome containing gcpE. The accession number for bases 1-2179 in the EMBL Data Bank is U02965. The sequence for bases 2180-3850, including gcpE, has previously been reported (accession number X64451) [1]. The orf 384 spans bases 235-1389 and orf 337 spans bases 1674-2687. The locations of the 5' region of the hisS gene [2] and the 3' region of the ndk gene [3] are also shown. The direction of transcription in each transcriptional unit is from left to right as drawn.
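The coordinates quoted for Fig. 1 can be checked against the open reading frame names with simple arithmetic. The sketch below assumes that the quoted spans are 1-based, inclusive, and include the stop codon; that assumption is not stated in the text, but it is the one under which the numbers agree. The helper name `orf_summary` is illustrative.

```python
def orf_summary(start: int, end: int) -> dict:
    """Summarize an ORF from 1-based inclusive coordinates (stop codon assumed included)."""
    bases = end - start + 1
    if bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = bases // 3
    # One codon is the stop codon, so the protein has one residue fewer.
    return {"bases": bases, "codons": codons, "residues": codons - 1}


orf384 = orf_summary(235, 1389)   # {'bases': 1155, 'codons': 385, 'residues': 384}
orf337 = orf_summary(1674, 2687)  # {'bases': 1014, 'codons': 338, 'residues': 337}

# Bases lying strictly between the end of orf 384 and the start of orf 337.
intergenic = 1674 - 1389 - 1      # 284

# Length of the overlap with the sequence of Hama et al. (their bases 528 to 724).
overlap = 724 - 528 + 1           # 197, matching bases 1-197 of Fig. 1
```

Under that assumption the two frames encode 384 and 337 residues, which matches their names, and are separated by 284 bases.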

Characterization of open reading frames

The two open reading frames upstream of gcpE would encode proteins of 43082 Da (orf 384) and 36189 Da (orf 337). The use of synonymous codons in these genes is typical of E. coli genes [13] (results not shown). The frequency of optimal codon usage [13] can be calculated to be 0.75 in orf 384, 0.65 in orf 337, and 0.73 in gcpE.
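The frequency of optimal codon usage is the fraction of scored codons that are optimal, where only codons classified as optimal or non-optimal are counted. A minimal sketch of that calculation follows. The function name `fop` is illustrative, and the codon sets in the example are a toy placeholder restricted to three leucine codons, not the table of reference [13]; a real calculation would pass in the full sets from that reference.

```python
def fop(cds: str, optimal: set[str], nonoptimal: set[str]) -> float:
    """Fraction of scored codons that are optimal: n_opt / (n_opt + n_nonopt)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n_opt = sum(c in optimal for c in codons)
    n_non = sum(c in nonoptimal for c in codons)
    if n_opt + n_non == 0:
        raise ValueError("no scored codons in sequence")
    return n_opt / (n_opt + n_non)


# Toy example (hypothetical sequence and placeholder codon sets).
toy_optimal = {"CTG"}
toy_nonoptimal = {"CTA", "TTA"}
toy_cds = "ATGCTGCTGCTATTACTGTAA"  # ATG CTG CTG CTA TTA CTG TAA
print(fop(toy_cds, toy_optimal, toy_nonoptimal))  # 3 optimal of 5 scored -> 0.6
```

Codons in neither set, such as ATG and the stop codon here, do not enter the ratio.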

[Figure 2: autoradiogram with lanes 1, 2 and 3; arrows labeled 337, 384 and E.]

Fig. 2. Autoradiogram of an SDS polyacrylamide gel containing proteins expressed from a T7 promoter. Strains were grown, induced and labeled with [³⁵S]-methionine as described in Materials and Methods. Lane 1 shows proteins encoded by pSIU914, lane 2 shows proteins encoded by plasmid pSIU927, and lane 3 shows proteins encoded by plasmid pSIU918. The regions of the E. coli chromosome carried by the plasmids are shown in Fig. 1. The arrows labeled E, 337 and 384 point to the products of gcpE, orf 337 and orf 384, respectively.

This would indicate that the products of these genes might be present in the cell at middle-level quantities, similar to that of some biosynthetic enzymes [13]. Computer searches of protein sequences in SWISS-PROT, PIR(R), or Brookhaven protein sequence databases, or translated coding sequences from GenBank, did not uncover any protein clearly related to the predicted products of orf 384, orf 337 or gcpE (results not shown). The relatedness that was detected seemed to be primarily because of the three consecutive tryptophan codons and the abundance of threonine codons in orf 337.

In order to determine if these open reading frames were expressed in E. coli they were subcloned onto a variety of plasmids. Strains containing these plasmids were analyzed. Figure 2 shows autoradiograms of labeled proteins from strains containing plasmids where the E. coli chromosomal DNA was cloned under the control of the bacteriophage T7 promoter. The region contained by the plasmids is shown in Fig. 1. In Fig. 2, lane 1 shows labeled proteins from a strain containing pSIU914, lane 2 from a strain containing pSIU927 and lane 3 from a strain containing pSIU918. The non-vector proteins are marked with arrows. On the SDS gel (Fig. 2), the products of gcpE (lane 1) and orf 384 (lanes 2 and 3) have apparent molecular masses of 39 kDa and 43 kDa respectively, as would be expected from the DNA sequence. However, the product of orf 337 has an apparent size of 56 kDa, over 50% greater than expected. It seems likely that this is a result of anomalous migration on SDS gels [14]. The predicted product of this gene has a high concentration of the non-polar amino acids alanine, proline, and threonine and does not focus in a typical isoelectric focusing gel (data not shown). It would seem that its structure, or possibly post-translational modifications, are responsible for its anomalous migration behavior on these gels [14].
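The size comparison above can be reproduced from the numbers given in the text. The sketch below is only a worked calculation; the function name `mass_discrepancy` is illustrative, and the predicted masses are those quoted in this paper (40681, 43082 and 36189 Da).

```python
def mass_discrepancy(apparent_kda: float, predicted_da: float) -> float:
    """Percent by which the apparent gel mass exceeds the sequence-predicted mass."""
    predicted_kda = predicted_da / 1000.0
    return (apparent_kda - predicted_kda) / predicted_kda * 100.0


discrepancies = {
    "gcpE": mass_discrepancy(39, 40681),     # about -4%
    "orf 384": mass_discrepancy(43, 43082),  # about -0.2%
    "orf 337": mass_discrepancy(56, 36189),  # about +55%
}
for gene, pct in discrepancies.items():
    print(f"{gene}: {pct:+.1f}%")
```

Only the orf 337 product deviates substantially, by roughly 55%, which is the basis for the statement that it is over 50% larger than expected.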

Transcriptional units

The gene ndk is expressed from its own promoter, and the region between ndk and orf 384 contains an apparent transcription termination sequence [3]. Since orf 384 is expressed from clones not containing ndk (results not shown), there must be a promoter upstream of orf 384. There is also a region of dyad symmetry followed by a run of T's at base 1395 to base 1435, immediately downstream of orf 384. This would suggest that orf 384 is transcribed as a monocistronic mRNA.
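A region of dyad symmetry followed by a run of T's is the signature usually associated with factor-independent terminators, and it can be searched for mechanically. The following is a deliberately simple sketch, not the analysis used in this paper: it requires a perfect inverted repeat, and the function name, stem length, loop range and T-run length are arbitrary assumptions. The test sequence is invented and is not the sequence at bases 1395-1435.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminators(seq, stem=6, loop_range=(3, 8), min_t=4, window=10):
    """Return 1-based (start, end) of perfect hairpins followed by a run of T's."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - loop_range[0] + 1):
        left = seq[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                  # start of the right arm
            right = seq[j:j + stem]
            if len(right) < stem:
                break                            # ran off the end of the sequence
            if right == revcomp(left):
                tail = seq[j + stem:j + stem + window]
                if "T" * min_t in tail:          # T-run just downstream of the stem
                    hits.append((i + 1, j + stem))
    return hits


# Invented example: 6-bp stem, 4-base loop, then six T's.
example = "CACC" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTTCA"
print(find_terminators(example))
```

Real terminators often contain mismatches or G-U pairs in the stem, so a practical search would score imperfect repeats rather than demand an exact match.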

Our computer analysis revealed one or more consensus promoter sequences between orf 337 and orf 384. Preliminary evidence from our laboratory involving subcloning confirms that there is a promoter in this region (data not shown). Recently Talukder et al. have identified a promoter in this region using a technique involving production of a protein fusion [15]. (They also reported a 131 bp sequence from within orf 337. Our sequence differs from theirs at two positions, which we have checked by repeatedly sequencing both strands. We find a G at positions 1731 and 1753 of our sequence, where Talukder et al. report a C and an A, respectively [15].) The precise location of the promoter was not determined, but part of orf 337 was expressed as a fusion protein, and the fragment containing the promoter ended at the Sau3A site at base 1839 on our sequence (Fig. 1) and extended 500 to 1000 bases upstream [15]. This would exclude the possibility that orf 384 is transcribed with orf 337. Therefore, it seems likely that orf 337 and gcpE are part of a dicistronic operon.
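A search for consensus promoter sequences of the kind mentioned above can be sketched as a scan for the two standard E. coli hexamers, TTGACA (-35) and TATAAT (-10), separated by 16 to 18 bases. This is an illustrative assumption about how such a scan might look, not the MacVector procedure used in this paper; the function names, the mismatch threshold and the test sequence are all hypothetical.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq, max_mismatch=3, spacers=(16, 17, 18)):
    """Return (pos -35, pos -10, spacer, total mismatches), best matches first.

    Positions are 1-based starts of each hexamer.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        for spacer in spacers:
            j = i + 6 + spacer                   # start of the -10 hexamer
            if j + 6 > len(seq):
                continue
            total = m35 + mismatches(seq[j:j + 6], MINUS_10)
            if total <= max_mismatch:
                hits.append((i + 1, j + 1, spacer, total))
    return sorted(hits, key=lambda h: h[3])


# Invented example with a perfect consensus and a 17-base spacer.
example_promoter = "GGCC" + "TTGACA" + "G" * 17 + "TATAAT" + "GGCC"
print(scan_promoters(example_promoter)[0])  # (5, 28, 17, 0)
```

Matches to the consensus are common by chance in AT-rich DNA, which is why sequence matches alone are weaker evidence than the subcloning and fusion results described in the text.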

Acknowledgements

This work has been supported by a grant from the Office of Research Development and Administration of SIU.

References

1 Baker, J., Franklin, D.B. and Parker, J. (1992) Sequence and characterization of the gcpE gene of Escherichia coli. FEMS Microbiol. Lett. 94, 175-180.

2 Freedman, R., Gibson, B., Donovan, D., Biemann, K., Eisenbeis, S., Parker, J. and Schimmel, P. (1985) Primary structure of histidine tRNA synthetase and characterization of hisS transcripts. J. Biol. Chem. 260, 10063-10068.

3 Hama, H., Almaula, N., Lerner, C.G., Inouye, S. and Inouye, M. (1991) Nucleoside diphosphate kinase from Escherichia coli; its overproduction and sequence comparison with eukaryotic enzymes. Gene 105, 31-36.

4 Kohara, Y., Akiyama, K. and Isono, K. (1987) The physical map of the whole E. coli chromosome: Application of a new strategy for rapid analysis and sorting of a large genomic library. Cell 50, 495-508.

5 Studier, F.W., Rosenberg, A.H., Dunn, J.J. and Dubendorff, J.W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 185, 60-89.

6 Mead, D.A., Szczesna-Skorupa, E. and Kemper, B. (1986) Single-stranded DNA 'blue' T7 promoter plasmids: a versatile tandem promoter system for cloning and protein engineering. Protein Eng. 1, 67-74.

7 Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning, A Laboratory Manual, 2nd Edn. Cold Spring Harbor Press, Cold Spring Harbor, NY.

8 Henikoff, S. (1987) Unidirectional digestion with exonuclease III in DNA sequence analysis. Methods Enzymol. 155, 156-165.

9 Toneguzzo, F., Glynn, S., Levi, E., Mjolsness, S. and Hayday, A. (1988) Use of a chemically modified T7 DNA polymerase for manual and automated sequencing of supercoiled DNA. BioTechniques 6, 460-469.

10 Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

11 Laemmli, U.K. (1970) Cleavage of structural proteins during assembly of the head of bacteriophage T4. Nature 227, 680-685.

12 Miller, J.H. (1992) A Short Course in Bacterial Genetics: A Laboratory Manual and Handbook for Escherichia coli and Related Bacteria. Cold Spring Harbor Laboratory Press, NY.

13 Ikemura, T. (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151, 389-409.

14 Banker, G.A. and Cotman, C.W. (1972) Measurement of free electrophoretic mobility and retardation coefficient of protein-sodium dodecyl sulfate complexes by gel electrophoresis: A method to validate molecular weight estimates. J. Biol. Chem. 247, 5856-5861.

15 Talukder, A.A., Yanai, S. and Yamada, M. (1994) Analysis of the Escherichia coli genomic genes and regulation of their expressions: An applicable procedure for genomic analysis of other microorganisms. Biosci. Biotech. Biochem. 58, 117-120.