

YEAST VOL. 12: 281-288 (1996)

Yeast Sequencing Reports

Sequencing and Analysis of 51 kb on the Right Arm of Chromosome XV from Saccharomyces cerevisiae Reveals 30 Open Reading Frames

STEFAN WIEMANN*, STEFANIE RECHMANN, VLADIMIR BENES, HARTMUT VOSS, CHRISTIAN SCHWAGER, CESTMIR VLCEK†, JOSEF STEGEMANN, JURGEN ZIMMERMANN, HOLGER ERFLE, VACLAV PACES† AND WILHELM ANSORGE

Biochemical Instrumentation, EMBL, Meyerhofstr. 1, D-69117 Heidelberg, Germany
†Institute of Molecular Genetics, Academy of Science of the Czech Republic, Flemingovo 2, Cz-16637 Prague, Czech Republic

We have sequenced a region of 51 kb of the right arm of chromosome XV of Saccharomyces cerevisiae. The sequence contains 30 open reading frames (ORFs) of more than 100 amino acid residues. Thirteen new genes have been identified. Thirteen ORFs correspond to known yeast genes. One delta element and one tRNA gene were identified. Upstream of the RPO31 gene, encoding the largest subunit of RNA polymerase III, lies an Abf1p binding site. The nucleotide sequence data reported in this paper are available in the EMBL, GenBank and DDBJ nucleotide sequence databases under the Accession Number X90518.

KEY WORDS - Saccharomyces cerevisiae; chromosome XV; DNA sequencing project

INTRODUCTION

As part of the European Union project for sequencing the Saccharomyces cerevisiae genome, we have sequenced a 50,984 bp cosmid contig (consisting of cosmids pEOA306, pEOA265 and pEOA106) on the right arm of chromosome XV. We used random and ordered sequencing strategies on automated standard ALF DNA sequencers (Pharmacia, Uppsala, Sweden).

MATERIALS AND METHODS

Cosmids

Cosmids pEOA306, pEOA265 and pEOA106 made up an overlapping contig of 51 kb of yeast chromosome XV DNA, and were received from B. Dujon (Thierry et al., 1992). The inserts were part of a larger cosmid contig assigned to us for sequencing.

*Corresponding author.

Sequencing strategies

The cosmids were sequenced in a combination of random and ordered approaches. Random Sau3A and TaqI subclones were made from EcoRI fragments of cosmid pEOA106 and sequenced. Cosmid DNA of pEOA265 and pEOA306 was digested with EcoRI or BamHI and separated by agarose gel electrophoresis. Fragments were purified and subcloned into pUC18/EcoRI- and pUC18/BamHI-digested vectors, respectively. The sequence of one 5 kb fragment from cosmid pEOA265 was determined by sequencing nested deletion subclones in pUC28/29 (Benes et al., 1993).

All fragments were sequenced with standard primers to generate multiple starting sites for the walking primer strategy. Remaining gaps were closed with walking primers. Fluorescein-15-*dATP (Voss et al., 1992) was used as internal label in automated double-stranded dideoxy DNA sequencing on ALF DNA sequencers. Linking of first-order EcoRI fragments was done by sequencing on the respective BamHI subclones and vice versa, or by direct cycle sequencing on cosmids.

CCC 0749-503X/96/030281-08 © 1996 by John Wiley & Sons Ltd

Oligonucleotide design and synthesis

Walking primers were designed with the help of computer programs (see below), considering additional rules for sequencing with internal labelling (Wiemann et al., 1995). Sets of ten synthetic oligonucleotide primers were generated simultaneously on the EMBL multiple segmental synthesizer (Ansorge et al., 1992) for primer walking. Oligonucleotides were directly used in the sequencing reactions after deprotection, lyophilization and one n-butanol precipitation.

Software

Oligonucleotides optimal for sequencing were designed using the Geneskipper program developed by C. Schwager (EMBL, manuscript in preparation) or the OLIGO computer program (MedProbe, Oslo, Norway). Raw data collection and evaluation were performed with the ALF Manager software (Pharmacia, Uppsala, Sweden). Fragment assembly and analysis of sequences were done with the Geneskipper program (EMBL). Sequences of putative genes were aligned (FASTA) with the EMBL/GenBank data libraries (GenEMBL databank, update July 03, 1995) for nucleotide and the SwissProt and PIR data libraries for deduced amino acid sequences. In parallel, the analysis was done by MIPS (Munich). BLAST searches were performed at the NCBI using the BLAST network service. PROSITE (SwissProt, release 12.2, February 1995) searches were performed with deduced protein sequences of probable new genes using the Geneskipper software, which was also used for the other analyses. Codon usage of individual open reading frames (ORFs) was compared with a reference table compiled from 435 yeast genes (ysc.cod by J. Michael Cherry, unpublished).

RESULTS AND DISCUSSION

Sequence determination

A total of 771 overlapping fragments were sequenced to determine the 50,984 bp of the contig carried by cosmids pEOA306, pEOA265 and pEOA106. Ninety-two walking primers were necessary to obtain the complete sequence in double-stranded form. The average number of readings per base pair was 4.8 due to the high-redundancy random shotgun sequencing strategy used for sequencing of cosmid pEOA106. The average reading length was 317 bases because of several shotgun clones that contained only short inserts.
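The reported redundancy can be checked from the other figures in this section. The sketch below assumes that each of the 771 fragments contributed one reading of the average length of 317 bases; the function name is illustrative and not part of the authors' software.

```python
# Figures quoted in the Sequence determination section.
CONTIG_LENGTH_BP = 50_984
FRAGMENTS_SEQUENCED = 771
MEAN_READ_LENGTH = 317


def mean_redundancy(n_reads: int, mean_read_len: float, target_len: int) -> float:
    """Average number of readings per base pair: total read bases / target length."""
    return n_reads * mean_read_len / target_len


# Assumption: one reading per sequenced fragment.
redundancy = mean_redundancy(FRAGMENTS_SEQUENCED, MEAN_READ_LENGTH, CONTIG_LENGTH_BP)
print(f"{redundancy:.1f} readings per base pair")
```

Under that assumption the product of fragment number and read length, divided by the contig length, rounds to the 4.8 readings per base pair stated in the text.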

Sequence analysis

The overall GC content of the sequence is 38.1%, which is the same as the GC content reported for yeast chromosome XI (Dujon et al., 1994). The GC content of identified and probably new genes is slightly higher (39.1%). These ORFs cover 78% of the total sequence; only 22% of the reported sequence is non-coding, but this part still contains regulatory sequences such as promoters, terminators and other elements (see below).
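GC content as used here is simply the share of G and C among all bases. A minimal sketch follows; the function name and the toy sequence are illustrative, and ambiguous bases are ignored by assumption.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


# Toy example, not taken from the contig.
toy = "ATGCATTTAACG"
print(f"{gc_content(toy):.1f}% GC")
```

Applied to the full contig (accession X90518) and to the individual ORFs, the same calculation would give the whole-sequence and per-ORF values of the kind listed in Table 1.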

ORF analysis of the sequence revealed 30 ORFs of more than 100 amino acid residues, named #1-#30 (Figure 1 and Table 1). The ORFs were also given preliminary names by MIPS: each ORF was assigned an 'O' (for chromosome XV) and a number. This nomenclature is additionally used in the text and in Table 1. Computer analysis for possible introns (combinations of a 5' donor site within 1000 bp upstream of a branch point which, in turn, is not more than 180 bp upstream of a 3' acceptor site) revealed five possible intron locations (Figure 1). Three of these locations lie within ORFs (#8 and #18 in Figure 1) and are, therefore, not likely to be real. One location overlaps to a large extent with an ORF (#19) that is located on the opposite strand and is also not likely to be real. The fifth predicted intron location lies within ORF O3275 (#30), which is indeed interrupted by an intron (see below).
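The intron search rule described above (donor within 1000 bp upstream of a branch point, branch point within 180 bp upstream of an acceptor) can be sketched as follows. The motifs are assumptions, since the paper does not state which patterns were searched: GTATGT for the donor, TACTAAC for the branch point and a pyrimidine followed by AG for the acceptor. How the distances are measured is likewise an assumption, and only one strand is scanned.

```python
import re

DONOR = "GTATGT"                   # assumed 5' donor motif
BRANCH = "TACTAAC"                 # assumed branch point motif
ACCEPTOR = re.compile(r"[CT]AG")   # assumed 3' acceptor motif


def find_all(seq, motif):
    """0-based start indices of every (possibly overlapping) occurrence of motif."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def intron_candidates(seq, max_donor_to_branch=1000, max_branch_to_acceptor=180):
    """Return (donor_start, branch_start, acceptor_end) triples on the given strand.

    acceptor_end is the index just past the acceptor, so
    seq[donor_start:acceptor_end] is the candidate intron.
    """
    seq = seq.upper()
    donors = find_all(seq, DONOR)
    candidates = []
    for b in find_all(seq, BRANCH):
        b_end = b + len(BRANCH)
        # First acceptor downstream of the branch point within the allowed window.
        window_end = min(len(seq), b_end + max_branch_to_acceptor)
        m = ACCEPTOR.search(seq, b_end, window_end)
        if m is None:
            continue
        for d in donors:
            # Donor must lie entirely upstream of the branch point and close enough.
            if d + len(DONOR) <= b and b - d <= max_donor_to_branch:
                candidates.append((d, b, m.end()))
    return candidates


toy = "AAA" + DONOR + "A" * 20 + BRANCH + "A" * 10 + "TAG" + "AAA"
print(intron_candidates(toy))
```

As in the paper, hits from such a scan are only candidates; most of the five locations found in the contig were rejected because they fall inside or opposite ORFs.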

All ORFs were analysed with FASTA on DNA and protein level. Putative new yeast proteins were additionally analysed with PROSITE and BLAST. Several PROSITE consensus pattern hits were found in almost every ORF, namely putative phosphorylation sites of casein kinase II, protein kinase C and cAMP-dependent protein kinase, myristoylation sites (Towler et al., 1988) and microbodies C-terminal targeting signals (de Hoop and Ab, 1992). These matches were, therefore, not counted as significant.

ORFs were analysed for codon usage, which was compared with a reference codon usage table,


[Figure 1: panels A-E drawn along sequence positions 1 to 50984; ORF display settings: minimum length = 100 amino acid residues, start codon = ATG]

Figure 1. ORF analysis of the 51 kb contig. (A) EcoRI and BamHI maps. (B) Positions of the Abf1p binding site, the delta element and the tRNA-Asp gene. (C and E) Positions of 3' splice acceptor sites are drawn as vertical bars, possible locations of introns are indicated by open boxes for both orientations. (D) Location and frame (a-f) of identified ORFs are indicated. Known yeast genes are drawn as black boxes, probable and possible new genes grey. Open boxes represent ORFs that probably do not encode a protein. Positions of stop codons are indicated by vertical bars in the respective frames. Numbering of ORFs is the same as that used in Table 1.
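An ORF map of the kind shown in Figure 1 (six frames, ATG start codon, minimum length of 100 residues) can be produced with a short scan. This is a minimal sketch, not the Geneskipper implementation; function names are illustrative, and reverse-strand ORFs are reported with start greater than stop, following the convention of Table 1.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_aa=100):
    """Yield (start, end) 0-based half-open spans of ATG...stop ORFs in the
    three forward frames; end includes the stop codon."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first ATG after the previous stop
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_aa:  # residues, stop codon excluded
                    yield (start, i + 3)
                start = None


def find_orfs(seq, min_aa=100):
    """Return (strand, start, stop) with 1-based inclusive forward-strand
    coordinates; for the reverse strand start > stop."""
    seq = seq.upper()
    n = len(seq)
    found = [("+", s + 1, e) for s, e in orfs_in_strand(seq, min_aa)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_aa):
        found.append(("-", n - s, n - e + 1))
    return found


# Toy sequence: ATG plus 100 alanine codons plus a stop codon (101 residues).
toy = "CC" + "ATG" + "GCT" * 100 + "TAA" + "C"
print(find_orfs(toy))
```

Only complete ORFs are reported by this sketch; ORFs that run off either end of the sequence, such as #1 and #7 in Table 1, would need separate handling.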

compiled from known yeast genes. The similarity of the codon usage of an identified ORF and that of already known yeast genes from the reference table is calculated and the result given as a correspond value. The lower the correspond value, the greater the probability that a particular ORF contains coding sequence. Figure 2 shows a plot of the codon usage profiles of ORFs O3240 (#10), O3242 (#24) and O3260 (#21), each compared with the codon usage profile calculated from the codon usage reference table. The higher the similarity between the profile of an ORF and the reference profile, the greater the probability that a particular ORF represents a functional gene. The correspond value of ORF O3240 is low and the codon usage profile of this ORF is very similar to the reference profile, indicating that ORF O3240 is a functional gene. For ORF O3260, the correspond value is high and its profile is rather different from the reference profile. This indicates that ORF O3260 most probably does not represent a functional gene. The correspond value and the profile of ORF O3242 are intermediate and would suggest that this ORF is possibly coding.
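The paper does not give the formula behind the correspond value. Purely as an illustration of comparing an ORF's codon usage with a reference table, the sketch below uses the root-sum-square difference of relative codon frequencies; this metric, the function names and the toy input are assumptions and will not reproduce the values in Table 1.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(orf):
    """Relative frequency of each codon in an in-frame coding sequence."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # drop an incomplete trailing codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence shorter than one codon")
    return {codon: k / total for codon, k in counts.items()}


def usage_distance(freq, reference):
    """Illustrative distance between two codon frequency tables (0 = identical).

    NOT the correspond value of the paper, whose formula is not stated.
    """
    codons = set(freq) | set(reference)
    return sqrt(sum((freq.get(c, 0.0) - reference.get(c, 0.0)) ** 2 for c in codons))


toy = codon_frequencies("ATGGCTGCTTAA")
print(toy, usage_distance(toy, toy))
```

As with the correspond value, a smaller distance means the ORF's codon usage is closer to the reference table, which here would be ysc.cod.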

New genes

The reported sequence contains 13 new putative yeast genes and homologues of known or putative genes from other species. Ten of these genes have codon usages (correspond value < 2.00) similar to known yeast genes; the other three ORFs (O3311 (#15), O3299 (#16) and O3287 (#26)) are borderline cases in terms of codon usage (correspond value > 2.00), but their size of over 200 amino acid residues would still suggest that these ORFs are expressed, probably at a rather low level.

O3237 (#2) is homologous with a hypothetical protein from Bacillus subtilis (maf, AC: L08793; Butler et al., 1993). The deduced protein sequences have a BLAST score of 134; the smallest sum probability P(2) is 5.4e-13. The sequences share identity in 21 out of 46 amino acid residues (45%) and positives in 37 out of 46 residues (80%).

O3311 (#15) is homologous with a hypothetical 24K protein from Escherichia coli (AC: P32661; A. Lyngstadaas and E. Boye, unpublished), product of the yhfE gene. The deduced protein sequences have a score of 115; the smallest sum probability P(3) is 7.3e-12. Identities are in 20


Table 1. Prediction of open reading frames of more than 100 amino acid residues

ORF #  Frame  Name (MIPS)  Start         Stop          Length (bp)  Length (protein)  MW (D)    pI     GC (%)  Pred.  Corr.  Gene or FASTA score
1      a      O3234        (1)           645           (645)        (214)             (23886)   4.42   40      +      1.84   107
2      a      O3237        826           1524          699          232               26478     4.40   36      +      1.01   <100
3      a      O3263        17254         18957         1704         567               64295     5.84   37      +      0.93   <100
4      a      O3269        21511         22449         939          312               35080     8.20   40             0.79   GCY
5      a      O3290        31567         34590         3024         1007              112833    5.98   38             0.81   DBM1
6      a      O3320        48961         50190         1230         409               46217     7.22   39      +      0.96   113
7      a      O3326        50647         (50984)       (338)        (112)             (12132)   10.60  42             3.05   IDH2
8      b      O3244        4472          7216          2745         914               101171    7.00   39             0.69   AZF1
9      b      O3248        7967          8851          885          294               34976     10.35  37      +      1.47   133
10     c      O3240        1905          4190          2286         761               84770     5.00   38      +      0.75   <100
11     c      O3258        15426         16730         1305         434               48257     4.65   41             1.46   YTA1
12     c      O3314        43572         45227         1656         551               63204     6.51   40             1.46   VPS17
13     c      O3317        45495         48023         2529         842               93291     5.87   44             0.36*  EFT1
14     d      O3323        50897         50556         342          113               12471     8.46   38      -      2.27   <100
15     d      O3311        43235         42579         657          218               22828     6.21   39      +      2.50   <100
16     d      O3299        41204         40326         879          292               31014     9.67   42      +      2.69   267
17     d      O3284        30245         29427         819          272               29176     8.90   42             2.28   CAT5
18     d      O3281        29039         25221         3819         1272              140841    4.81   37             1.24   UBP2
19     e      O3296        39955         37274         2682         893               95478     5.95   37      +      1.13   124
20     e      O3278        24967         23573         1395         464               52520     4.04   41             1.91   LEO1
21     e      O3260        15817         15515         303          100               10529     8.31   38      +      3.16   (<100)
22     e      O3254        14542         10160         4383         1460              157004    8.25   39             1.26   RPO31
23     e      O3251        9862          9056          807          269               28845     5.91   37      +      2.15   (<100)
24     e      O3242        3262          2648          615          204               21681     10.03  37      -      1.75   (<100)
25     f      O3293        36588         34873         1716         571               62055     7.02   40             2.12   ADE2
26     f      O3287        31074         30358         717          238               25303     5.27   40      +      2.58   <100
27     f      O3272        22500         22195         306          101               11170     10.49  37      -      2.96   (<100)
28     f      O3266        20643         19189         1455         484               52924     6.12   37      +      2.36   421
29     f      O3246        6996          6679          318          105               11673     3.39   37      -      5.04   (216)
30     d + e  O3275        23284, 23062  23272, 22695  381          126               13678     5.36   44             3.07   PFY1

ORF analysis of the cosmid contig from cosmids pEOA306, pEOA265 and pEOA106. The list contains ORFs of more than 100 aa residues; the optimized FASTA scores were obtained in a comparison of the nucleotide/deduced amino acid sequences with the GenBank and EMBL/SwissProt data libraries. Given are the molecular weights (MW) and pI (isoelectric point) values of the deduced proteins, the GC content (GC %) of the individual ORFs and the coding prediction (Pred.) for new yeast genes (based on the correspond value, the length and the lack of overlap with other ORFs). '+', the ORF is probably coding; '-', the ORF is probably not coding. The correspond values (Corr.) were calculated with a yeast codon usage table produced by J. Michael Cherry, which includes 435 S. cerevisiae genes of the GenBank DNA data library 63 (ysc.cod). The sizes of ORFs #1 and #7 are given in parentheses because these ORFs extend beyond the sequence reported here, upstream and downstream, respectively. *The correspond value of ORF O3317 (#13), the yeast EFT1 gene, was calculated with the yeast-high.cod reference table from J. Michael Cherry, generated from highly expressed yeast genes (with ysc.cod, the value of ORF O3317 would have been 3.47). ORF O3317 was the only ORF in the contig considered as highly expressed, taking the codon frequency of yeast-high.cod as reference. The 'Name' column gives the working nomenclature of MIPS.
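The coordinate and length columns of Table 1 are related by simple arithmetic, which is useful when checking rows of the table. The sketch below uses three rows copied from Table 1 and assumes, as the numbers suggest, that the Length (bp) column includes the stop codon; function names are illustrative.

```python
# (ORF number, start, stop, length in bp, protein length) copied from Table 1.
ROWS = [
    (2, 826, 1524, 699, 232),
    (10, 1905, 4190, 2286, 761),
    (22, 14542, 10160, 4383, 1460),  # reverse strand: start > stop
]


def span_bp(start, stop):
    """Number of base pairs covered by an ORF, whichever strand it lies on."""
    return abs(stop - start) + 1


def protein_length(length_bp):
    """Residues encoded by a complete ORF whose length includes the stop codon."""
    if length_bp % 3:
        raise ValueError("not a whole number of codons")
    return length_bp // 3 - 1


mismatches = [orf for orf, start, stop, bp, aa in ROWS
              if span_bp(start, stop) != bp or protein_length(bp) != aa]
print(mismatches)
```

The partial ORFs #1 and #7 and the intron-containing ORF #30 need separate treatment, because their coordinates do not describe one complete, uninterrupted reading frame.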

out of 42 residues (47%), positives in 32 out of 42 residues (76%). The E. coli yhfE gene is part of the dam operon (AC: Z19601; A. Lyngstadaas and E. Boye, unpublished) and is probably a member of the ribulose-phosphate 3-epimerase family.

A yeast homologue of the human mitochondrial brown fat uncoupling protein (AC: P25874; Cassard et al., 1990) is encoded by ORF O3299 (#16). The protein sequences share 24.8% identity in a 282 amino acid overlap. The yeast gene YMC1 (AC: X67122; Graf et al., 1993) encoding a


[Figure 2: three panels, ORF O3240 (correspond: 0.75), ORF O3242 (correspond: 1.75) and ORF O3260 (correspond: 3.16); x-axis of each panel: individual codons]

Figure 2. Correspond values and codon usage profiles of ORFs O3240, O3242 and O3260 (from left to right). Codon usage of ORFs was compared with a codon usage reference table (ysc.cod by J. Michael Cherry, unpublished). The lower the correspond value and the higher the similarity of the two curves, the greater the probability that a particular ORF is coding.

mitochondrial carrier protein is also a homologue of ORF O3299. The protein sequences have 55.2% similarity (33.9% identity). BLAST analysis of this ORF revealed high homology with Arg13p, an amino acid transporter from Neurospora crassa (score: 129; smallest sum probability 2.5e-82; AC: L36378; Q. Liu and J.C. Dunlap, unpublished).

A yeast homologue of a gene described from other species is encoded by ORF O3266 (#28). The deduced gene product is similar to a hypothetical protein from Caenorhabditis elegans chromosome 3 (optimized FASTA score 421, 39% identity in a 213 amino acid overlap; AC: P34649; Wilson et al., 1994) and a hypothetical protein from Thermoplasma acidophilum (FASTA score 344, 37.0% identity in a 189 amino acid overlap; AC: 403021; Klenk et al., 1992).

The deduced protein from ORF O3263 (#3) contains a targeting signal for transfer into the endoplasmic reticulum (KEEL at position 145; consensus [KRHSA]-[DENQ]-E-L; Pelham, 1990). However, the consensus targeting signal in S. cerevisiae has been reported to be HDEL. The authenticity of the signal sequence in ORF O3263 is, therefore, questionable. Furthermore, proteins transported to the endoplasmic reticulum usually carry the targeting signal at the C-terminus, which is not the case for ORF O3263.

The protein encoded by ORF O3240 (#10) contains one possible aminotransferase class-II pyridoxal-phosphate attachment site, where the lysine residue (position 687) would be the pyridoxal-P attachment site (position 684, TLAKSIAPSS; consensus T-[LIVMFYW]-[SAG]-K-[SAG]-[LIVMFYW]-[GA]-x(2)-[SAG]; A. Bairoch, unpublished).
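PROSITE consensus patterns like the two quoted in this section can be translated into regular expressions and tested against a peptide. The converter below is a minimal sketch covering only simple pattern elements; it is not the Geneskipper or PROSITE implementation, and the function name is illustrative.

```python
import re

_ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supports single residues, [..] alternatives, {..} exclusions, the x
    wildcard, (n) and (n,m) repeat counts, and the < and > anchors.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # excluded residues
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


# Consensus patterns as quoted in the text.
ATTACHMENT_SITE = "T-[LIVMFYW]-[SAG]-K-[SAG]-[LIVMFYW]-[GA]-x(2)-[SAG]"
ER_SIGNAL = "[KRHSA]-[DENQ]-E-L"

print(prosite_to_regex(ATTACHMENT_SITE))
print(bool(re.fullmatch(prosite_to_regex(ATTACHMENT_SITE), "TLAKSIAPSS")))
print([bool(re.fullmatch(prosite_to_regex(ER_SIGNAL), s)) for s in ("KEEL", "HDEL")])
```

Short patterns such as the four-residue ER signal match by chance in many proteins, which is why the position of the match (C-terminal or not) matters in the discussion above.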

ORF O3287 (#26) contains an endoplasmic reticulum targeting signal at the C-terminus of the deduced protein. The sequence of the signal (position 200; HDEL) is identical with the pattern reported for S. cerevisiae (Pelham, 1990). The deduced protein contains three clusters of hydrophobic residues (surrounding residues 34, 76 and 120) and, around residue 130, a very hydrophilic stretch with a high probability of forming an α-helix. The codon usage (correspond value 2.58) would suggest that this ORF is rarely expressed.

The ORFs O3234 (#1), O3320 (#6), O3248 (#9), O3296 (#19), O3251 (#23) and O3272 (#27) did not have any significant hits in FASTA, BLAST and PROSITE analyses. ORF O3320 (#6) contains a repeat of [CAT]n, encoding a stretch of histidine residues. ORF O3272 (#27) is a borderline case because of its length of only 101 amino acid residues and its high correspond value.

Known genes

The protein encoded by ORF O3269 (#4) is identical with the GCY gene product (AC: X13228; Oechsner et al., 1988), which is homologous to a vertebrate eye lens protein. The two DNA sequences are 100% identical in a 1542 bp overlap. ORF O3290 (#5) is the gene encoding the S. cerevisiae rho-type GTPase-activating protein Dbm1p (AC: U07421; G. Chen et al., unpublished). The sequences are 99.9% identical in a 3669 bp overlap.

ORF O3326 (#7) is 100% identical (in 498 bp) with the published sequence for the yeast isocitrate dehydrogenase gene (IDH2; AC: M74131; Cupp and McAlister-Henn, 1991). The sequence of ORF O3244 (#8) is identical (100% identity in a 4217 bp overlap) with a previously reported gene (AZF1) encoding a hypothetical zinc finger protein of 100 kDa (AC: Z26253; Brohl et al., 1994). The S. cerevisiae TAT-binding homologue 1 (product of the YTA1 gene, AC: X73569; identities: ORF: 99.7% in 1305 bp, gene: 99% in 2884 bp; Stucka et al., manuscript in preparation) is encoded by ORF O3258 (#11). This gene is a member of a gene family; other members of the same family are e.g. CDC48, PAS1 and SEC18.

The S. cerevisiae vacuolar protein sorting-associated protein Vps17p is encoded by ORF O3314 (#12); the sequence is 99.3% identical in a 2819 bp overlap with the published sequence (AC: L02869; Kohrer and Emr, 1993). ORF O3317 (#13) is the gene coding for the yeast translation elongation factor 2, the product of the EFT1 gene (AC: M59369; Perentesis et al., 1992). The sequences are 100% identical in a 2932 bp overlap. Codon usage analysis of this ORF indicated that the elongation factor is highly expressed (see Table 1).

The yeast CAT5 gene is located in ORF O3284 (#17, AC: X82930; M. Proft, unpublished). The gene product is involved in glucose repression. The amino acid sequences are 100% identical. The yeast ubiquitin carboxyl-terminal hydrolase 2 is encoded by ORF O3281 (#18); the DNA sequence is 98.7% identical in a 3013 bp overlap with the published sequence of the UBP2 gene (AC: M94916; Baker et al., 1992). Compared to the published sequence, ORF O3281 contains a 24 bp HindIII fragment that is missing in the published sequence for the UBP2 gene. The reading frame is not destroyed by the lack of this fragment in the published sequence and the loss had, therefore, not been discovered. Presence of the HindIII fragment was verified by sequencing on a first-order EcoRI subclone of the cosmid.

The yeast LEO1 gene product (AC: X77135; Magdolen et al., 1994) is encoded by ORF O3278 (#20). The function of the putative protein remains to be determined. The two sequences are 100% identical in a 1876 bp overlap. The yeast profilin gene PFY1 (ORF O3275, #30), located on the same database entry (AC: X77135), had already been published earlier (AC: M23369; Magdolen et al., 1988). After four amino acid residues of this gene, the coding sequence is interrupted by 210 nucleotides of intronic sequence. Both the canonical splice donor (GTATGT) and acceptor (TACTAAC) sequences, as well as a branch point, are present within the intron. There are no further discrepancies between the genomic sequence and the 1352 bp of the published sequence. The sequence of ORF O3278 (#20) also overlaps with the published sequence of the UBP2 gene (Baker et al., 1992), encoded by ORF O3281 (#18).

ORF O3254 (#22) encodes the largest subunit of yeast RNA polymerase III. The DNA sequence of ORF O3254 is 99.9% identical in a 5462 bp overlap with that of the published sequence of the RPO31 gene (AC: X03129; Allison et al., 1985). The ADE2 gene (AC: M59824; Stotz and Linder, 1990), encoding the catalytic subunit of the phosphoribosylaminoimidazole carboxylase, is located in ORF O3293 (#25); the DNA sequences are 100% identical in a 2515 bp overlap.

Probably no genes

ORFs O3323 (#14), O3260 (#21) and O3246 (#29) overlap with known yeast genes coded in another frame (O3244, #8 and O3326, #7). These probably do not code for a gene product. A correspond value of 1.75 and the length of 204 amino acid residues would imply, however, that ORF O3242 (#24) might be coding, although it overlaps with ORF O3240 (#10). FASTA and PROSITE analysis did not reveal significant hits for this ORF.
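The overlaps discussed in this section can be read off the start and stop coordinates in Table 1. The sketch below copies the coordinates of six ORFs from Table 1 and counts shared positions irrespective of strand; the function name is illustrative.

```python
# (start, stop) coordinates from Table 1; start > stop means reverse strand.
ORFS = {
    7: (50647, 50984),   # IDH2 (partial)
    8: (4472, 7216),     # AZF1
    10: (1905, 4190),
    14: (50897, 50556),
    24: (3262, 2648),
    29: (6996, 6679),
}


def overlap_bp(a, b):
    """Number of sequence positions shared by two ORFs given as (start, stop)."""
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


for pair in [(14, 7), (29, 8), (24, 10), (7, 8)]:
    print(pair, overlap_bp(ORFS[pair[0]], ORFS[pair[1]]))
```

With these coordinates, ORF #29 lies entirely within the span of #8 and ORF #24 entirely within the span of #10, while #14 shares part of its length with #7.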

Other homologies

The sequence at positions 41877-42355 shares 72.9% identity in a 339 bp overlap with a yeast delta element (AC: X02417; Hauber et al., 1985). The adjacent sequence at positions 42355-42435 matches perfectly with the yeast tRNA-Asp gene from chromosome II (AC: X76294; Van der Aart et al., 1994). A yeast Abf1p binding site, reported to be present in yeast RNA polymerase genes, is present at positions 14876-14930. This region is located some 300 bp upstream of ORF O3254 (#22), the gene encoding the largest subunit of yeast RNA polymerase III. The sequence is 100% identical in a 56 bp overlap with the published sequence (AC: M60734; Seta et al., 1990).


ACKNOWLEDGEMENTS

We thank Dr Karl Kleine and all MIPS staff (Munich, Germany) for help with the sequence analysis. This work was supported by the Commission of the European Communities. The Prague laboratory was supported by grant no. 204/93/2349 of the Czech Grant Agency.

REFERENCES

Allison, L. A., Moyle, M., Shales, M. and Ingles, C. J. (1985). Extensive homology among the largest subunits of eukaryotic and prokaryotic RNA polymerase. Cell 42, 599-610.

Ansorge, W., Voss, H., Wiemann, S., et al. (1992). High-throughput automated DNA sequencing facility with fluorescent labels at the European Molecular Biology Laboratory. Electrophoresis 13, 616-619.

Baker, R. T., Tobias, J. W. and Varshavsky, A. (1992). Ubiquitin-specific proteases of Saccharomyces cerevisiae. Cloning of UBP2 and UBP3, and functional analysis of the UBP gene family. J. Biol. Chem. 267, 23364-23375.

Benes, V., Hostomsky, Z., Arnold, L. and Paces, V. (1993). M13 and pUC vectors with new unique restriction sites for cloning. Gene 130, 151-152.

Brohl, S., Lisowsky, T., Riemen, G. and Michaelis, G. (1994). A new nuclear suppressor system for a mitochondrial RNA polymerase mutant identifies an unusual zinc-finger protein and polyglutamine domain protein in Saccharomyces cerevisiae. Yeast 10, 719-731.

Butler, Y. X., Abhayawardhane, Y. and Stewart, G. C. (1993). Amplification of the Bacillus subtilis maf gene results in arrested septum formation. J. Bacteriol. 175, 3139-3145.

Cassard, A. M., Bouillaud, F., Mattei, M. G., et al. (1990). Human uncoupling protein gene: Structure, comparison with rat gene, and assignment to the long arm of chromosome 4. J. Cell. Biochem. 43, 255-264.

Cupp, J. R. and McAlister-Henn, L. (1991). NAD+-dependent isocitrate dehydrogenase: Cloning, nucleotide sequence, and disruption of the IDH2 gene from Saccharomyces cerevisiae. J. Biol. Chem. 266, 22199-22205.

de Hoop, M. J. and Ab, G. (1992). Import of proteins into peroxisomes and other microbodies. Biochem. J. 286, 657-669.

Dujon, B., et al. (1994). Complete DNA sequence of yeast chromosome XI. Nature 369, 371-378.

Graf, R., Baum, B. and Braus, G. H. (1993). YMC1, a yeast gene encoding a new putative mitochondrial carrier protein. Yeast 9, 301-305.

Hauber, J., Nelbock-Hochstetter, P. and Feldmann, H. (1985). Nucleotide sequence and characteristics of a Ty element from yeast. Nucl. Acids Res. 13, 2745-2758.

Klenk, H., Renner, O., Schwass, V. and Zillig, W. (1992). Nucleotide sequence of the genes encoding the subunits H, B, A' and A" of the DNA-dependent RNA polymerase and the initiator tRNA from Thermoplasma acidophilum. Nucl. Acids Res. 20, 5226.

Kohrer, K. and Emr, S. D. (1993). The yeast VPS17 gene encodes a membrane-associated protein required for the sorting of soluble vacuolar hydrolases. J. Biol. Chem. 268, 559-569.

Magdolen, V., Oechsner, U., Mueller, G. and Bandlow, W. (1988). The intron-containing gene for yeast profilin (PFY) encodes a vital function. Mol. Cell. Biol. 8, 5108-5115.

Magdolen, V., Lang, P., Mages, G., Hermann, H. and Bandlow, W. (1994). The gene LEO1 on yeast chromosome XV encodes a non-essential extremely hydrophilic protein. Biochim. Biophys. Acta 1218, 205-209.

Oechsner, U., Magdolen, V. and Bandlow, W. (1988). A nuclear yeast gene (GCY) encodes a polypeptide with high homology to a vertebrate eye lens protein. FEBS Lett. 238, 123-128.

Pelham, H. R. B. (1990). The retention signal for soluble proteins of the endoplasmic reticulum. Trends Biochem. Sci. 15, 483-486.

Perentesis, J. P., Phan, L. D., Gleason, W. B., LaPorte, D. C., Livingston, D. M. and Bodley, J. W. (1992). Saccharomyces cerevisiae elongation factor 2. Genetic cloning, characterization of expression, and G-domain modeling. J. Biol. Chem. 267, 1190-1197.

Seta, F. D., Treich, I., Buhler, J. M. and Sentenac, A. (1990). ABF1 binding sites in yeast RNA polymerase genes. J. Biol. Chem. 265, 15168-15175.

Stotz, A. and Linder, P. (1990). The ADE2 gene from Saccharomyces cerevisiae: Sequence and new vectors. Gene 95, 91-98.

Stucka, R., Schwarzlose, C., Mannhaupt, G., Schnall, R. and Feldmann, H. (1995). A novel multi-gene family in yeast with high similarity to the human tat-binding proteins. (In preparation.)

Thierry, A., Gaillon, L., Galibert, F. and Dujon, B. (1992). Yeast chromosome XI: Construction of a cosmid contig, a high resolution map and sequencing progress. Abstract CSH Meeting: Genome Mapping and Sequencing, 198.

Towler, D. A., Gordon, J. I., Adams, S. P. and Glaser, L. (1988). The biology and enzymology of eukaryotic protein acylation. Annu. Rev. Biochem. 57, 69-99.

Van der Aart, Q. J. M., Barthe, C., Doignon, F., Aigle, M., Crouzet, M. and Steensma, H. V. (1994). Sequence analysis of a 31 kb DNA fragment from the right arm of Saccharomyces cerevisiae chromosome II. Yeast 10, 959-964.

Voss, H., Wiemann, S., Wirkner, U., et al. (1992). Automated DNA sequencing system resolving 1,000 bases with fluorescein-15-*dATP as internal label. Meth. Mol. Cell. Biol. 3, 153-156.

Wiemann, S., Rupp, T., Zimmermann, J., Voss, H., Schwager, C. and Ansorge, W. (1995). Primer design for automated DNA sequencing utilizing T7 DNA polymerase and internal labeling with fluorescein-15-dATP. BioTechniques 18, 688-697.

Wilson, R., et al. (1994). 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32-38.

Yen, T. J., Li, G., Schaar, B. T., Szilak, I. and Cleveland, D. W. (1992). CENP-E is a putative kinetochore motor that accumulates just prior to mitosis. Nature 359, 536-539.