effects of transposable elements on the expression of ...washing was at 50" in 0.1x ssc (150 mm...

20
Copyright 0 1993 by the Genetics Society of America Effects of Transposable Elements on the Expression of theforked Gene of Drosophila melanogaster Kenley K. Hoover, Andy J. Chien and Victor G. Corces Department of Biology, The Johns Hopkins University, Baltimore,Maryland 21218 Manuscript received April 13, 1993 Accepted for publication June 23, 1993 ABSTRACT The products of the forked gene are involved in the formation and/or maintenance of a temporary fibrillar structure within the developing bristle rudiment of Drosophila melanogaster. Mutations in the forked locus alter this structure and result in aberrant development of macrochaetae, microchaetae and trichomes. The locus hasbeen characterized at the molecular level by walking, mutant character- ization and transcript analysis. Expression of the six forked transcripts is temporally restricted to mid- late pupal development. At this time, RNAsof 6.4, 5.6, 5.4, 2.5, 1.9 and 1.1 kilobases (kb) are detected by Northern analysis. The coding region of these RNAs has been found to be within a 21- kb stretch of genomic DNA. The amino terminus of the proteins encoded by the 5.4- and 5.6-kb forked transcripts contain tandem copies of ankyrin-like repeats that may play an important role in the function of forked-encoded products. The profile of forked RNA expression is altered in seven spontaneous mutations characterized during this study. Three forked mutations induced by the insertion of the gypsy retrotransposon contain a copy of this element inserted into an intron of the gene. In these mutants, the 5.6-, 5.4- and 2.5-kbforked mRNAs are truncated via recognition of the polyadenylation site in the 5’ long terminal repeat of the gypsy retrotransposon. These results help explain the role of the forked gene in fly development and further our understanding of the role of transposable elements in mutagenesis. D EVELOPMENT of macrochaetae and micro- chaetae in Drosophila melanogaster involves cho- reographed interactions between four sister cells. These cells are derived from two successive mitotic divisions of a sensory mother cell. At approximately 15 hr after puparium formation, the two predecessor cells generated from the first of these divisions and located within the epidermal layer divide to give rise to two sets of cells. One pair will differentiate to form the external sensory neuron and its protective sheath cell, the thecogen. The other set is composed of two precursor cells, the socket-forming tormogen cell and the bristle-secreting trichogen cell.Between 18 and 30 hr of pupal development, the tormogen and tri- chogen precursor cells grow larger than surrounding epidermal cells in volume and become polyploid througha series of endomitotic divisions (POODRY 1980). Bristle secretion commences at approximately 30 hr afterpuparium formation and proceeds rapidly during the next 25 hr of development. Socket for- mation begins after bristle secretion has started. The socket cell has completely circumscribed the bristle by 48 hr after puparium formation. Growth of the developing bristle occurs predomi- nantly at its tip and proceeds perpendicular to the bodyaxis(LEES and WADDINCTON 1942; LEES and PICKEN 1945). Histological studies indicate that bristle morphogenesis depends upon the structural grouping Genetics 135: 507-526 (October, 1993) and orientation of microtubules and microfibrils within these cells. In particular, the formation and arrangement of fiber bundles associated with the cell surface of the developing bristle rudiment plays a critical role in bristle development. These intracyto- plasmic fiber bundles are closely applied to the cell surface and are oriented parallel to the long axis of the bristle rudiment. Cross sectional analysis of the bristle rudiments of Stubble and singed mutants shows an alteration in the amount, shape, and distribution of these fiber bundles (OVERTON 1967). In contrast, these mutants do not show abnormalities in the more centrally located microtubule structures when com- pared to wild-type bristle sections. Cross-sectional analysis of the bristle rudiments in the forked (f) mutant f36a 40 hr after puparium formation fails to detect these longitudinally arrayed fiber bundles (N. PETERSON, personal communication).Theforked gene (1-56.7) is located in chromosomal subdivision 15F. The product of this gene is essential during a 24-hr period just prior to bristle secretion (DUDICK, WRIGHT and BROTHERS 1974). Failure of theforked product to be made in wild-typequantities results in bristles that are shortened and aberrantly bent or forked with ends that may be sharply twisted or split. The severity of the bristle phenotype ranges from the relatively mild f ”’ andf3 alleles to the severefhd,f360 and f 14’ alleles. The primary source of spontaneous mutations in

Upload: others

Post on 27-Feb-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

Copyright 0 1993 by the Genetics Society of America

Effects of Transposable Elements on the Expression of theforked Gene of Drosophila melanogaster

Kenley K. Hoover, Andy J. Chien and Victor G. Corces

Department of Biology, The Johns Hopkins University, Baltimore, Maryland 21218 Manuscript received April 13, 1993

Accepted for publication June 23, 1993

ABSTRACT The products of the forked gene are involved in the formation and/or maintenance of a temporary

fibrillar structure within the developing bristle rudiment of Drosophila melanogaster. Mutations in the forked locus alter this structure and result in aberrant development of macrochaetae, microchaetae and trichomes. The locus has been characterized at the molecular level by walking, mutant character- ization and transcript analysis. Expression of the six forked transcripts is temporally restricted to mid- late pupal development. At this time, RNAs of 6.4, 5.6, 5.4, 2.5, 1.9 and 1.1 kilobases (kb) are detected by Northern analysis. The coding region of these RNAs has been found to be within a 21- kb stretch of genomic DNA. The amino terminus of the proteins encoded by the 5.4- and 5.6-kb forked transcripts contain tandem copies of ankyrin-like repeats that may play an important role in the function of forked-encoded products. The profile of forked RNA expression is altered in seven spontaneous mutations characterized during this study. Three forked mutations induced by the insertion of the gypsy retrotransposon contain a copy of this element inserted into an intron of the gene. In these mutants, the 5.6-, 5.4- and 2.5-kbforked mRNAs are truncated via recognition of the polyadenylation site in the 5’ long terminal repeat of the gypsy retrotransposon. These results help explain the role of the forked gene in fly development and further our understanding of the role of transposable elements in mutagenesis.

D EVELOPMENT of macrochaetae and micro- chaetae in Drosophila melanogaster involves cho-

reographed interactions between four sister cells. These cells are derived from two successive mitotic divisions of a sensory mother cell. At approximately 15 hr after puparium formation, the two predecessor cells generated from the first of these divisions and located within the epidermal layer divide to give rise to two sets of cells. One pair will differentiate to form the external sensory neuron and its protective sheath cell, the thecogen. The other set is composed of two precursor cells, the socket-forming tormogen cell and the bristle-secreting trichogen cell. Between 18 and 30 hr of pupal development, the tormogen and tri- chogen precursor cells grow larger than surrounding epidermal cells in volume and become polyploid through a series of endomitotic divisions (POODRY 1980). Bristle secretion commences at approximately 30 hr after puparium formation and proceeds rapidly during the next 25 hr of development. Socket for- mation begins after bristle secretion has started. The socket cell has completely circumscribed the bristle by 48 hr after puparium formation.

Growth of the developing bristle occurs predomi- nantly at its tip and proceeds perpendicular to the body axis (LEES and WADDINCTON 1942; LEES and PICKEN 1945). Histological studies indicate that bristle morphogenesis depends upon the structural grouping

Genetics 135: 507-526 (October, 1993)

and orientation of microtubules and microfibrils within these cells. In particular, the formation and arrangement of fiber bundles associated with the cell surface of the developing bristle rudiment plays a critical role in bristle development. These intracyto- plasmic fiber bundles are closely applied to the cell surface and are oriented parallel to the long axis of the bristle rudiment. Cross sectional analysis of the bristle rudiments of Stubble and singed mutants shows an alteration in the amount, shape, and distribution of these fiber bundles (OVERTON 1967). In contrast, these mutants do not show abnormalities in the more centrally located microtubule structures when com- pared to wild-type bristle sections. Cross-sectional analysis of the bristle rudiments in the forked (f) mutant f36a 40 hr after puparium formation fails to detect these longitudinally arrayed fiber bundles (N. PETERSON, personal communication). Theforked gene (1-56.7) is located in chromosomal subdivision 15F. The product of this gene is essential during a 24-hr period just prior to bristle secretion (DUDICK, WRIGHT and BROTHERS 1974). Failure of theforked product to be made in wild-type quantities results in bristles that are shortened and aberrantly bent or forked with ends that may be sharply twisted or split. The severity of the bristle phenotype ranges from the relatively mild f ”’ andf3 alleles to the severefhd,f360 and f 14’ alleles.

The primary source of spontaneous mutations in

Page 2: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

508 K . K. Hoover, A. J. Chien and V. G. Corces

c1 2 3 4 5 6 1 - , & " L , m e 891011 5.4 kb(C)

1.9 kb(D)

1.1 kb(E)

cDNA done ??X Dl 2 3 4 5 m w cDNA dorm 2W

L 6.4 kb(F)

FIGURE 1.-Structure of theforked locus in wild-type and mutant alleles. The restriction map of theforked region is shown. The identity, insertion site, and orientation of the insertions inf3,f'7k,fhd,f',f5,fK,f360 andf'4k alleles are indicated; arrows above each element show the direction of transcription, arrows in the elements represent inverted repeats, and rectangles signify LTRs of the retrotransposons. Each demarcation of the grid corresponds to 1 kb and is numbered according to the sequence of the locus in Figure 3. Transcription units of the forked locus are presented below the grid. Rectangles indicate exons of the gene; shaded rectangles represent translated coding regions and empty rectangles indicate regions that are transcribed, but not translated. cDNA clones are bracketed beneath or above the transcripts that they contain. Each box (exon) is named according to its location on the transcript.

yeast and Drosophila is the insertion of transposable elements. The mechanism of transposable element mutagenesis upon the affected locus depends on the location of the insertion site with respect to the differ- ent structural and functional domains of the mutated gene. Possible modes of mutagenesis include interfer- ence with enhancer and/or promoter activity, disrup- tion of the coding region, alteration of message proc- essing, and perturbation through more complicated mechanisms inherent to the nature of the element causing the mutation. For example, insertion of the gypsy retrotransposon in the 5' region of the yellow (y) or cut (ct) loci affects the rate of transcription of these genes in a tissue-specific fashion (CORCES and GEYER 199 1 ; JACK et al. 199 I) , whereas insertion of the copia element in an intron of the white ( w ) gene results in premature truncation of transcripts initiating in the white promoter (ZACHAR et al. 1985; MOUNT, GREEN and RUBIN 1988; PENG and MOUNT 1990). Character-

ization of the forked locus has been undertaken in order to analyze the mode of gypsy mutagenesis in alleles where the element has inserted into the intron of the gene. The forked locus was cloned by utilizing the gypsy retrotransposon as a tag to screen a genomic library prepared from the DNA of a f ' stock (PARK- HURST and CORCES 1985; MCLACHLAN 1986). The f ', f and f alleles of this gene are gypsy-induced and bring the expression of the forked gene under the control of the su(Hw) gene as well as the suppressor of forked [sul f ) ] , suppressor of sable [su(s)], suppressor of white apricot [su(w")], suppressor of purple [su(pr)], sup- pressor of lozenge' [su(lz')], ~ ~ ( l r ' ~ ) , and enhancer of white eosin [en(w")] genes (RUTLEDGE et aZ. 1988; SMITH and CORCES 1991). The number of modifier loci that alter forked expression in these gypsy-induced alleles suggests that the gypsy element may alter the expres- sion of the forked gene by multiple mechanisms. The su(Hw) protein has been implicated to act at the tran-

Page 3: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophila forked Gene 509

A

-6.4 r 5 . 6 -5.4

- 2.5 -1.9

- 1.1

FIGURE 2.-Profile of forked mRNA expression during Drosoph- ila development. (A) Ten micrograms of poly(A+) RNA, isolated from different developmental stages of the Canton S stock were electrophoresed on a 1 % agarose-formaldehyde gel. transferred to a nitrocellulose membrane, and hybridized with a 32P-labeled 6.2- kb EcoRI-Sal1 genomic fragment (see Figure 1). Embryos were collected every 24 hr and reared at 22.5". The stage of the flies upon harvest is designated at the top of each lane. The sizes of forked transcripts are shown in the right margin in kilobases. (B) The same filter shown in A was hybridized with the Drosophila ras2 gene as a sample loading control. This R N A is expressed at constant levels throughout development (MOZER et al. 1985).

scription initiation level, and su(s), su(w") and su(f) are presumed to affect mRNA splicing and/or stability (SMITH and CORCFS 199 l) , suggesting that gypsy might affect these various processes when inserted into the forked locus. The work reported here describes the molecular characterization of the forked gene and the effect of gypsy insertion on forked transcription, offer- ing new insights into the mechanisms by which trans- posable elements affect the expression of adjacent genes.

MATERIALS AND METHODS

Fly stocks: Flies were reared at 22.5" and 65% relative humidity on standard molasses/yeast/cornmeal medium. Embryo samples for Northern analysis were collected at 24- hr intervals and raised at 22.5" until reaching the appropri- ate developmental stage for harvest. The f ', f ', f 5 and f 36a

stocks were obtained from stock centers. The f m' mutation was obtained from T. GERASIMOVA (Institute of Gene Biol- ogy, Russian Academy of Sciences, Moscow, Russia). The f hd allele was obtained from D. DORER and A. CHRISTENSEN (Thomas Jefferson University, Philadelphia). The f 14' and f '" alleles were obtained from A. KIM (Moscow State Uni- versity, Russia). The f tuh-l tuh-3 stock was provided by D. KUHN (University of Central Florida, Orlando). RNA preparation: Total RNA was prepared using the

sodium dodecyl sulfate (SDS)-phenol technique (SPRADLING and MAHOWALD 1979). Poly(A)+ mRNA was then purified

by oligo-(dT) cellulose chromatography (AVIV and LEDER 1972), fractionated on a 1.0% formaldehyde/agarose gel, and transferred to a nitrocellulose membrane. Filters were hybridized overnight at 42 O in a solution of 5X SSCP (1 50 mM NaCI, 15 mM sodium citrate, 20 mM sodium phosphate pH 6.8), 5X Denhardt's solution, 50% formamide, 1% Sar- cosyl, 100 Kg/ml carrier DNA and 10% dextran sulfate. Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed cDNA library was prepared using the cDNA Synthesis System Plus (Amersham, UK) and X gtlO arms. Random hexamer cDNA clones were generated with the Zap-cDNA synthesis kit (Stratagene, La Jolla).

DNA manipulations: DNA was isolated from Drosophila by homogenizing 1 g of adults in a buffer containing 0.1 M NaCI, 0.2 M sucrose, 10 mM EDTA, 0.5% Triton X and 30 mM Tris-HCI (pH 8.0). The sample was then filtered to remove debris, precipitated, and resuspended in a lysis buffer containing 0.35 M NaCI, 10 mM EDTA, 1 % N-lauroyl sarcosine, and 10 mM Tris-HCI (pH 8.0). DNA was extracted with phenol/chloroform and precipitated in ethanol. Treat- ment of DNA with restriction enzymes, Southern analysis, cloning of DNA into plasmid and phage vectors, library screening, and labeling of DNA by random priming were carried out by standard procedures (SAMBROOK, FRISCH and MANIATIS 1989). Genomic phage libraries were generated from partial Sau3Al digestions cloned into the lambda Dash vector (Stratagene, La Jolla). Sequencing of the forked locus from Canton S (wild-type), mutant alleles, and cDNA isolates utilized the dideoxy chain termination method (SANGER, NICKLEN and COULSON 1977) using Sequenase (United States Biochemical Corp.).

Electron microscopy: Young adults were fixed in a 4% glutaraldehyde solution of 0.1 M sodium cacodylate (pH 7.0). The samples were then dried, sputter coated with gold- palladium, and viewed in a Jeol JSM35 scanning electron microscope at a magnification of 95x.

RESULTS

Analysis of transcripts encoded by the forked gene: A chromosomal walk of contiguous genomic clones was performed in both directions flanking the insertion of the gypsy retrotransposon in the f' allele (PARKHURST and CORCES 1985) in order to derive a DNA map of the forked region. This walk spans a distance of 50 kb (Figure 1). The 6.2-kb EcoRI-Sal1 genomic fragment within which the gypsy is inserted was used as a probe to hybridize to a Northern blot of poly(A+) RNA prepared from various stages during Drosophila development. The results (Figure 2) show that the forked locus is temporally expressed during mid-late pupal development, during which time six transcripts are detected. The amount of these mRNAs was quantitated with a Molecular Dynamics PhosphorImager. A prominent 2.5-kb transcript ac- counts for 42% of the total forked RNA. Two RNAs of 1.9 and 1.1 kb make up 17 and 18% of the total RNA, respectively. Transcripts of 5.6 and 5.4 kb in length each constitute 10% of the total forked RNA. A rare 6.4-kb RNA constitutes 3% of the total forked RNA.

Characterization of forked transcripts by North-

Page 4: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

510 K. K. Hoover, A. J. Chien and V. G. Corces

1 CGAGTGCTGG TTTATCTCGT CAAATACCGT TGCTAGTTTG AGGTCTTTGT GCAGGGTTGT GTTCCTTACG AACCATTCTG CTCTACATM TTTGGCTMT

AAGTCATTM GCTACTTTCT TACAAATTGT TMTTATTCT ATTCCTTATT CCTTTCCATC CTGCAGAACC ATAAACGATT ATGCGCTCU TTCG~SITCTG C A C c t a a g c c a c a a c a c c a a a c t a a a c a a t c t g c a c a g c a a c a a g c t c c t c c a c a g c a a t c a a a g c c a c a g c a g c a g c a t c a a t c c g a g c aCaaggatgC

c1

a C t C C C a C C a g c t g g a t t c c t a c f t g g g c g g a g t g g g c a g t a t g a t g a a t c t g g g c g g c a t g g c c a a g t t g g c c a g A T G T ACCATCCGGG GAGCCTTTCC

MET1 V r H i s P r o C L y S e r V a l S e r 401 GGCAGCWM GACCCMCCG AGGAGCTCTG GCCCTTCACT ATCCCGCCGC ACGTCGTTGC CTTGATTGTG TCCAACTGCT GGTCGCCGCA TCCGTGGATA

GLVSerGluA rgProAsnGL vClVaLaLeU A laLeuHisT v rALaALaAL aArgGlvCvs LeuAsKvsV a lGLnLeuLe uVa lALaALa SerVaLAsDl

TTTGGTMGT AGATGAGTTA CTGTCCCAAA GGAGTATAGT ATATGATGAC CAAMCTTGC ATCGCGATTT GAGTGCGGGG TGAAAGCTCT TGCACCAGCC K1, H1

CTTCATTGTT TTTATGTGTC GGGGAAATCG CTCGACGATC GATGGCTCCA CTTTATATTT TATTTTTCCC CCATCTGTCT GCATTTCCAT CGATGCGGTT TCTTTCCATT TACTTGCGAC AATTATCTCA TTAGGGGTTT CTTATCTGAT TCGATTAGTA TAAGAGCTGA u C A A G T W \ A TTATTGGCTG T G ~ T C T A

801 ACTAMTTTG GACTTCGTTA TAMATCCAT TTCGMACGT ATGGTGGTM TTATAGCAAT G C T T T W CAAGATTAAG A G A G ~ C GACCMTGTA

TGTATATATA T T T T G M M A ATTAAAGAAC TGGTAATCAA TATTTTAGAA TACTCTTTTC TTMGCGAAA ATGATATTTG GCAGGTTTAT ATAGATTTCC

AAGTATCMT TCCGTACAAA CGTATTATTA AAAGTATTAT ATTGTTCGAT TTATGTACAC CATTTTGCCT AGTTGTAATA ATCATTCAAA TGGAAGCATT

TMGTGTATT MGCCTTCGT AACGATCTGA TTATCATAAA CTCAATCAAC GATGATGTCG ATTTGAGAAT TTATCTACCT GGTMGCTGG GAGTACTCAT 1201 TAAATGCACT M T T A M A C T TATTAGCCCA TCCMCAAAG TAGAAAGMT GAAGGCGAGT GGTGAAAAAC TAGGAAAMG GCMGGAAM CCTGGCGMT

A T C T T G W ATCTATCCGC TAAGTGGGTT ACCGATCCGT GGCCAGTGTT TGATTCACCT AAATTCCTGT GCTCCAACAG CAGCACGCGT CCAAAGCTO. TTCATTTAAT TGGGAAATGG GAATCGGAAT CCAACTGGCC AAGGCGAGCC AGCCAAGCGA GATGCACTCC ATCTTCAGCC CTCCTTGGCC CCCTGAATCG

CCTACGATCC CATGCCATCC CATCGAGTCC AGGCCAGGTT TGGTTCTATC GTGTGCTATT TTCAATTGGA GTCAAACTGG MAACTCACA G C G G C A C ~

1601 GCACCGTTGC M C M G T T T T CCTCACTCGC CACCAAGTGG AGCMCTCCG GACTGACTGG CCAAGGAATC AGAATTGGAC TCAGMTCAC AGTCGGMTC

C G A G T C A W TCAGTGCCAC TGCCTGATCC CCACCTCCCT CGATCCTCAC CAGCTATATT CATATTTACA MGCGCCAAG GCAGATCMT TGACATGACA

ACTGACATGG CGACACAGM AAGAGCGAGA G A T G A W C G AGGGCGGCTC TGTGTATAAA ATGTATTGTA CMTAACTCG TCCCGTTTTA A C G T T M T M

2001 ATTCTATTTT MTTACATTA CATCGCCATT CATTTGGCTT AACTGTGGCC AAAAGTTATT TGCCTTACAA T C T T T T C M C CGACCCTCGT CACMCCGTT

TAGGGCTTTT TAAAGTATTT TTATTTAATT ATTTAGGCTA AGGAACMAG GGTATTAAGT ACTGACAATT TTATCACGAG GTATTATTCC TATMTCACC

TWTAGTTTT GCGTACAGAG ACACAAACCG TTTGATAGTT TTGCGTACAG AGACACAACG ATCCAAACAG CTGGTGCTTC TTATATTAAT ACTGCAAAGA

AGAAGGCGAG GGCAGTTTCA TGACCTTCTT ATAAAGATAA AATAAAATTT TAAAATTTAA ATTTCTGTTA AAATTTAAAT GAAATTAGTT TMTGTCAAG AAGCAAAMA ATTTGAACAT TTTAGAGMT TGTTTACGTA TGATTTATTA TCTCATGATC TTGCTCCGAT GCTATTGTAA ATAATATTGA AAAAAAAACT

2401 TTCCCTAAAT AAGAAATTTA ATTGGAAAGC ACACTAATTT TAAACGCTTC AGAMTTGAT GATTTCAAAT TTTTTTTCAA ATTATTTAGC GTGTCATAAT

TAAATGGCAC CTATGTACTT CCGGGTGACT TGTGTTGAAC TGTCAATCCG CCCTCTCCAC TGGCCATTTG ATAATAATGC GTATGGCGAC AGCTCGGTCT

CGAGTTACCT GATTTGCCCC CGTCATGCCC CCTGCGTCAC CAAAAGAAGA CCAGGCCGAA GAAAAAGAAG AAGTGGAGGG AGCMCACCA CCAGCACCAC

2801 TATATCGTAT CGGTCTCGTG CAGCATAATA CMCACATAT ATTTCMACA AATATATAGA GTGATAGTTG GTGTTAAAGT TGTTACTGTT TACCAGCTAC CAGTGGTTCG CGGTGGCCAC TCGATGGCGC TTGCCGTTCG GTGGTGCTGG CTATTCGCGT TTGGCTAGTC GAGGCGAACG GTGCTTTCGT TCGTGGGTCT

CGTTATGCAG CAGCMCATG CGGCCACGGA ATGAATTTGA ATTTTGATTT CAATGGTGCA TTTAGTGCGG TTTCGTTGTG AAGCTCGATG CATCAGCGCG

GCCACGCCCC CGCGACTGTC CGTTGGTCAG CCAGCAGAAG CAGCATCAGC AGCAGCATCA GCAGCAGCAC TAGCAGCAGC AGCAGCAGCA GCAGCAGCAG

CMCAGCAGA C M M T T T C G C G C T T C M A G TATCTCAAAT GCAGTTAACG MTCGGAACG CGMTTAATT AACAGTTTCA GTGTACCAAA GAACTGCAAG 3201 GCACAAAGGC AGCTAAAGCA AAAGCAACAT CACCATCACC ATCACCGTCT GCACCCCGCA AACAGTAGCC ACAACATTTG GCTTAATGCC CAAGCCAGCA

GCAACATCCG CAGCAGCAGC AGCCGCAGCA ACCGCAACCA TGAAAAGGTA GCAACGGCTG CTTTTGGCCA ACATTCAGTT GCCACTAATC GATTGGCCM

GGCAGCGAGT AAATCGAATT GGCCACCGGG ACGCCAAGGA TGCCGAGCTA ATCCAATACG CTTTTTTTTT GCCAGTGTGT TTGAGTGCGG AATTAAATGC

TCTAAAATAT ATGTGCGTTC AGTTGTGAGT GCTTGTGCTT GGTCTCAGAA AAAACAGTTT AATGATCATT TAATACACCA ATAAATATAT TCAGAAATGC

3601 AGCACATCAG TAAATGGAAA TAGCAGTAAT AATTATAATT TATTCAAAAT AGAAAGCCTA GTTGAGCAAA TAACTTACTT TAAACAMTT GGATGTAGAA

TTGAGCTGGA CTGTGATTAA TCATCTTGAC TCGTCTAATA ATGTTTATAA GGAACAAAAT TAACATATTC TTACTGAGAA AATCTTCTAT A C T T C W

AGAAGGAGTC ACCMTCAAA ACTCAAATTT GTATGGTATA GAAGATTAAC ACATATTTAT ATGTATGTAG GTGGTAGTTG TTTGCTATGA GCCAGTATGC

f3(412) ATACCCTTTC E G A T G A C TTGAATCTTA AAATAACCCA GATAAGACTT GATAATTGTT AGAGAATTCC ACATCTTAAC GTATCCACTA ATCGCCGCGT

4001 AATACTAACA TCGGCTATTG TATCTGTATC TTTCTATCTG TTTATCTCCC TGGAGTCTTT TTGTGTCACT CGCATGTCTT TCATTCTGCA TTCTCTCAGC

GATGGGCMT CGACGAGTCT GGATTGTGGA TCGCCAATTG TTGATCGTCC ATCATTGGAT TCGAGCACGC MATGTTCAA TTAAATAAAC GAGTCTATM

CTGTCAACTG GCATTTCTM CGAGGCTCCT GCTTTGCCTC AATTATTTCG CTTTTTAAAA CCACTTGCTG CCTCGGCCCC CAGACATATT AAATAAATTA

AAAAATAATT ATTAAATTGC AAATACTGAG AAACAAGACA AGCGGGCCAA ACACAAAAAA AAAAAAAAAA CAGCAAAGAG GAAAAAATTG GCAATGTATA

4401 TAGAAATATG TTTCGTACTG TATTTTCACC TACTAAAGAA GCAACAACAT ACGAAAATAA ATGGCGGGAT GGAGTGGATG AAGATGAAAA ATCAACTTTA

ATTGGTTTAC AATTCATTTC TAAGCGAGAA AAGCTGGAAA ATGTACCAAG TAGTCCCAAG TCGGACGAAG AAAGCTAACT CCGCCGCCGT GCGCAATATC

CATGAAGTCG TAAGCCAGTT GACTAAGTGA CAGATACTAG TATCCGTACC CATATAACCA TCATCAACAA CATGCCTCCC ACGCCTCCGT TCGGCCCCCG

AAACTTCTGG AGCTCAGGGG CCAAAGCACA GTGGGACCAT GGTTTATTTA GGGTCGAATC TAAAGTAAAA TGGTTTAGGC AATATTTGGG ATAAGTAGGA

4801 GGGAATTGTT CAATACAAGC TAGTTAACTA GTACAAAAAT AACTATAAAT CAGATCATTA AAGTAAATTC TCCAAAATTA TACTGTTGGA TTAATGGATT

CATAGATATG C T T M T A T T A AAATTGATGA AAAATTATGA AAAAGGTTTC AAAAGTMTA ATCAAAATAA GTATGTATTT TTGTCTCAGA ATTCGTTTAT

TGAGTTCTGG TGCCTGATTT ATTGCTTGCG AATTGTGTTA ACAATTAAAT GGTCGCGAAG TTGGTAAACT ATTTTCCTAA CTCTGCTCAG CTGCCTGTTA

GGCAATTGAG AGTTGGAAAG GCCTGACCAA ATTCATGGCG TTTTGCTGGT CTATATTTTG GCGCTAGCAA ATTGGCGTGA TCACGGTGAT CAGTGGCCAT

FIGURE 3.-DNA sequence analy- sis of theforked locus. The sequence of the coding region offorked is pre- sented. Bases are numbered accord- ing to their location on the grid of Figure 1 . Lowercase letters indicate transcribed, untranslated RNA. Ex- ons have the amino acid translation listed below. Underlined residues are labeled as follows: A, B, C and D identify each exon and transcript (see Figure 1); K, ankyrin-like repeat; G , glycine-rich region; H, hydrophobic stretch; 0, CAX or opa repeat; P, area with high proline concentra- tions; and PRD, paired repeat. Trans- posable element identity and inser- tion site for the characterized alleles is marked by double underline. In- sertion of these elements occurs be- tween the underlined nucleotides.

ern and cDNA analysis: An oligo-(dT)-primed cDNA library was generated from poly(A+) mRNA of mid- late Canton S pupae. Several cDNA clones were iso- lated using the 6.2-kb EcoRISaZI genomic forked frag- ment as a probe. Genomic and cDNA isolates were sequenced and the organization of exons and introns was determined by comparing the sequence of the cDNA isolates with that of the corresponding genomic DNA. The splice site donor and acceptor sequences conformed with consensus splice sites (MOUNT 1982). As depicted in Figure 3, all clones that were obtained from this screen recognized the first of two close polyadenylation signals located at base 18,423 of the genomic sequence (PROUDFOOT and BROWNLEE

1976). Sites of polyadenylation were observed to be- gin 16-25 bases downstream of this polyadenylation signal. Analysis of these cDNA clones allowed the determination of the structure of forked encoded tran- scripts.

The 2.5-kb transcript: One of the clones isolated during the screen, named 22G, defines cDNA A (Fig- ure 1). This cDNA includes six exons. To determine which forked transcript is encoded by cDNA A, exon- specific probes were made by the polymerase chain reaction (PCR), with primers flanking each exon, and used for Northern analysis (data not shown). Exon A1 hybridizes only to the 2.5-kb transcript, while exon A2 also hybridizes to the 5.4-, 5.6- and 6.4-kb RNAs.

Page 5: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophilaforked Gene 51 1

5201 TATAATTCTC CGTTGGATGG GGCACTCCAC TCTTGCCCAC TGTGGCATGA GGAAACCAGG CCAAGCCAAC TAAACGGTAA CACCAGCAAC TCCAGCTAGA TCTACACGAG TGGCAGACGC CAAGACMGT GCAGACTTTA GACTGGGAGT CAGGCGTAAC TACTCGTGGA WGAACCTCT CCCCTTGACG GTTTCTATGT TTGTTTCGCT TTTAGGCGTT AATTCACCTA AAAATGTATA TAGATGTAGA TGTGAGGCAA GAAGCATAAG TTTGTCCAGC AGATCGAGCA TCGAGGGTTT MTCACAGCC CAGACITCGC AGCGAGATGT CGGAATAATG CCGACTTTCG GTTTCAGCAG GTGTATACAT ATGTGATCCA TAAAGAAATT GTACAAAtaa

E 1

5601 agtgaaaaac a a a a g a c t g c agccgaaaac t tgctatgca a a t c a a a g c a t t t a a g t g a t t t t t c c t a a a c c a g t t c c a t aaatcgaact c a t t t t t t t t cccgtgcctg tgtgtgtgca tataagtgtg t t t g tg taac g g t a c a a g g a a a a t c a a a g c t a c a c g a a a a c c g g a a a t a t a taaagtc ta gtgacaaacg aaagtaaaat g tgaaat t tc gaacgtgaac ggtaagccaa g g c g a a t a a a t taaagt tca c c a a a t a a g t t taat taaaa caccagcgca ccaagagcac tgccaattcg aacccattca aatataaatc aatatataca t a t a t c t g t t aaaagcaaag g a t c c c a c t c a a a c a c a c a c cccagtcacc tgaccaccaa

6001 atccagtcca c t t c c t c a t c g t c a c c a t c c t c c t c c t c a t t c g c t a g c t a c t t c t c c a c g a t c a a c g t t a agaagggcct c c t a g g c c g c a t g a c c a c a a g t c t g a c c t c ctaccgccgc agcaagcaga a g t c c a a g t c caacaatgac a t c c a c a c c a t c a t c g c c a c cgaggtggtg a c g g c g g c c a t tctcagcag cagtgtccac agcagcagca c a g g a t c g g a aatcggggat ggcggacagg g c a a c a c c a g c c t c a t c g a g gaggcagctg c c c a g g a t c a gagcagcgag gagcaggatc gccagcatca g a c g c c a a g g a t c c a c t a g c c c a a a c g g a g ATGAACCATC ACCCGCACTC GCACCACCAG CGGAAGTCCG CCGTGGAGGT

W E T A s n H i s H i s P r o H i s S e r H i s H i s G l n A r g L y s S e r A I a V a l G I u V a

6401 GGTGCGCCAG CAGATGGACG ACGACGGGGA TCAGGACTCC AACGATGACG ACGAGACCAT CCCCATGATG AGCCGCGTCG GGCGCAAGGA GCAAAACAAT I V a l A r g G I n G l m E T A s p A s p A s p G I y A s p C l n 4 s p S e r A s n A S p A S p A s p C l u T h r W E TProHETWET S e r G l y V a l G I y A r g L y s C I u G l m 4 s n A s n CGACGCAGCG TCGTTAGGTG GGATATCACT AAAAGATCAA TGATATCTTG CGTGATCACA AAACTTTTTA CCTACATATA CTACGCAATA AGTAATACAG A r g A r g S e r V a l V a l S e

GTCMGTAGT ATAGCAGTAT ATGTCTGGGG CTAAATAATT ATAATTAATT CCATTCTCGT TGCTGCAAAT GACTGCCTTT TTGTCTTGGT TTAAAAAAGA AAATGGTTAA GATCAAAAAG AATTTATAAG CCGTTAAGAA AGTTTTCAGT TACATCGTGC GTACTAGTGC GCTTGCACTT CTTCGAATTA CATTTTCATT

6801 GATGCTTAAA ATGAATCATG AACTGAAACC GTGTACTCAT ATAAATATGC CTTCGATCAT TGGTTTGAAT GACTAATTAT ATTATAAAAT ATTTACATAA AATATTGTGT CTCACAAGGC GATTAATTGG TTTAATGCGA TTAAAATTTT AAACTTATCA TTTAATAGAG GCCTATAGAT TTGCACATAA CTGGAAACTT GTTACCCAAT TCCTATCMT GGCAGACCAC ATTTGATGGG ATTATAGTCG GCTGAACACC GTGACAAATC GGTGCATAAC TTTTCCCACA CATTCTCCAA AGTTACCATT TCCATTGGCG GTGGAGTTGT TAATCTCTGC CTACCGTCCT CCATCTCTTG TMTCGATCG ATCAATGATA TTCCGCAATA AAAACGAGTG

7201 AAAACCGGTC AAAGAAGAGG AAAATGCTGA GTAGAGATTT TCCAGTGACC CAGTGCGTTT CTATCGATTA ACTGCAGTTA TATGTTATAG ACACTCTATA

TGCCATATCT GCGTAGGAAT AGGTGCCTTC GAAATGAGAG CACTTTTAGC TATTTCGGTG CAACTTTTTG TTCATTAAGC TCCCCTTTGT ACGGATACTA TACCACCTAA TTGGATGACA AGTAATGAGC CTTTTTTCCG CCGCATTGCA GCGCCAATAC GCAAATGGAC AACGATGTCA CGCCCGTTTA CCTGGCGGCG

s A l a A s n T h r G l r M E T A s p A s n A s D V a l T h r P r o V a l T v r L e u A l a A l a

G l n G l u G l Y H i s L e u G l u V a I L e u L v s P h e L e u V a l L e u G l u 4 l a G l y G l v S e r L e u T v r V a l A r s A l a A r q A s f f i l v M E T A l a P r o l l e H i s A l a A l a S K 3

7601 CACAAATGGG CTGCTTGGAT TGCCTCAAAT GGATGGTGAG TAGATGGTGC ACCAAATGCT TCGAATTCGG TCTTTTCATC CGATCTGTAA GCAGCTAACT e r G l r U E T G l Y C y s L e u A s p C v s L e u L v s T r p n E T

GCCAACTTCC TGGCCATCTT GGCAACTGCA GCAGTTGAAC CAATGCAGG TCMTTGCTA CGTTAAGTGC ATTGGCCAAA ACTGAATGTT GCATTAGGAT TGAACAGAAA CCGCAAAACA GAAAGGGTTT TTGCGGCTTA TAAATATTTG ATATTATTAT AATATGATCC GATTTTTAGG GCGGAGTMT AGATGATTAG T U T C T A A A A TGAGCAAATT GCTGCATTAA TATCAATTTA CAATGCATAG CATTGAAAAT CCATCAAGCT TGTACATTCT AAGAGTCTAC TTTCATTGAT

8001 GCAGAATACT ATTCTATTAC TCGCAAAAGG GACATGGATT GTTCTTAACC CATTCATTTT TCTCAGTCGG CTGTGAACCC GTTTCTTTTT GCTAATGTCT GTGGGTCCTG TTTTTCAATA GTTTTAACGG GGTTCTCGAG TGGGATTCCC GGCCAGTTTG ACGCCACGCC AGCGCCATTT ATCAATCGGG GAATGCGCTC GATGGAGGAG GGAGCAACTT ACCAAGTAGC CAACTAGCGA AGTGGACAAG TGACCGAGGA GCGGAGAGAG TGGCTTATGC AAAGCCACAT CTTCCAGTGC

TGACAATGGC CTCCCAATAA CCAGAATGCA ATTTCAAGAT CATCGCCAGC GGCAGTGGM AACCGCCTTT CTAAATTGAA CTAAATAGTT TCCATTATAG 8401 CCGCAAGCCT GCCGGCTGGC AAACCACTTT CCACATACGC AGTCTGTGAG ATACTTAACC GCCGAAAGAC ATTGAGTTGA TTTAGTCGAC AAGGAGTCGG

CATTTCGATG GACAGCGCGG AATGATACCT CAACATCTCG TCAAGACATC GTGCTTTTAG ATGGATAAGC TGTAATAAAC ATGGTCAAAA TGACAGTTGT GCTTCGMCT AAAGAGACGT TGATGGATCG TTCGGTTTCG GGTGGTTTM MACATATTC ATTAGTCACT TATAACAGAT AGTAAATCAT TTTATTGCCC ACATATTATA TTTGTTTTTC CCTCTTATCC CATCAAAGGT CCAGGATCAA GGTGTGGATC CCAATTTAAG GGATGGCGAT GGAGCTACGC CCTTGCACTT

V a L G l n A s D G l n G l W a l A s d r o A s n L e u A r g A s D G L Y A s D G l y A l a T h r P r o L e u H i s P h

E3, C 3 K3 K4

8801 TGCCGCCTCC CGTGGACATC TATCCGTAGT CCGTTGGCGG CAATCACGGA GCAAATTGTC CCTGGATAAA TATGGCAAAT CGCCCATTAA CGATGCGGCA e A l a A l a S e r A r g G l y H i s L e u S e r V a l V a I A r g T r M r q G l n S e r A r g S e r L y s L e u S e r L e u 4 s D L v s T Y r G l Y L Y s S e r P r o l l e A s n A s W l a A l a

K5

GAAAATCAGC AGGTTGAGGT CAGTGTTGAA CTAGAAAAAA TTTCAATTAA ACATACATAT TAACATATTA ACTATTTACC CCTACAATTT TTTCTTGTTT G l u 4 s n G l n G I n V a l G l u

AGTGCCTGAA TGTCTTGGTT CAGCACGGCA CTTCGGTGGA TTACAATGGC AAAAGTAGCT CACAGCGGCA CAACTCGCAG CAGCAACTGC ACCAGCAGCA C v s L e u 4 s n V a l L e u V a l G l n H i s G l y T h r S e r V a l A s p T y r A s n G l y L y s S e r S e r S e r G L n A r g H i s L y s S e r G l n G l n G l n L e u H i s G l n G l n H i

84, C 4 K5 01 CCAGCAACAG CAGCAGCMC AACAGCAGCA GCACCTGAGC AGCAGCTGCA ACTCGAACTC GAATTCCTCG AAGCAGACGA GTCGCAGCAA TACGATCAAG s G l n G l n G l n G l n G l n G l n G I n G l n C l n G l n H i s L e u S e r S e r S e r C y s A s n S e r A s n S e r A s n S e r S e r L y s G t n T h r S e r A r g S e r A s n T h r l l e L y s

Exons A3, A4, A5 and A6 hybridize to all forked transcripts with the exception of the 1.1-kb RNA, which only hybridizes to A3 and A6 (see below). These results suggest that cDNA A, defined by clone 22G, corresponds to the 2.5-kb transcript.

Sequence analysis of clone 22G indicates that the 5’ end has a guanosine nucleotide that is not present in the genomic sequence. This difference is presum- ably the result of the 7-methyl guanosine cap of the mRNA (BROWN et al. 1989). The transcription start site of this transcript is AGGCTGC, located at base 12,982 in the genomic sequence (Figure 3) and has weak homology to the ATCAG/TTC/T sequence that is conserved in the transcription initiation site of

other Drosophila genes (HULTMARK, KLEMENZ and GEHRINC 1986). A sequence resembling a TATA box is flanked by a G-C rich sequence and is located 32 bp upstream of this transcription site. There is a possible CAAT box located 61 bp upstream of the transcrip- tion initiation site (Figure 3). The low extent of ho- mology of the transcription start site, T A T A box, and CAAT box with that of the consensus for these pro- moter signals in other eukaryotes may explain the relatively low abundance of this transcript or may be due to a pupal or tissue specific promoter sequence. Similar low homologies have been described for the pupal specific transcript of the singed locus (PATERSON and O’HARE 199 1). Three in-frame ATG codons are

Page 6: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

512 K. K. Hoover, A. J. Chien and V. G. Corces

9201 TCCAAGTCCT CCTCGACTTT GAGCTCCGAC GTGGAGCCAT TTTACTTGCA TCCACCGGCC TTAGCAGGCG GATCAGGAGG ATCGGGMTG GGCTTGGGCA S e r L y s S e r S e r S e r T h r L e U S e r S e r A s p V a l G l u P r o P h e T y r L e u H i s P r o P r o A l a L e u A L a G l y G 1 y S e r G l y G l y S e r G l v n E T G l v L e u G l y l l TGGGMTGGG CATGGGCATG GGCATGAGCA TGGGCCTGGG MTCTCAACG AAGAACAGCG ATGCCTTGTA CTCGCAACAA TCGAGGAGTT C C T C C G A W E T G l V n E T G l v W E T G l y W E T G l V n E T S e r l l E T G l v L e u G l y I l e S e r A r g L y s A s n S e r A s p A l a L e u T y r S e r G L n G l n S e r A r g S e r S e r S e r G l u L y

HZ, GI GTTGTACAAC GGTTCGTCCT CCTTGGGTGG CAACAGTTCT TCGGGAGGCG GAGGCGTCGG AGGAGGAGGT GGAAGCGGTG GAGCCCTGCT GCCCAACGAT s L e u T y r A s n G l r j e r s e r s e r L e u G l y C I y A s n S e r S e r S e r G l y G l v G 1yGlWalGL Y G l Y G l v G l v G l y S e r G L v G L v A L a L e u L e u P r o A s r A s e

H3, G2 GGGTTGTATG TGAATCCCAT GCGGAATGGC GGCATGTACA ACACCCCATC GCCGAATGGC AGCATCTCCG GGGAGTCGTT CTTTCTGCAC GATCCGCAGG G l v L e u T v r V a l A s n P r d 4 E T A r g A s n G l y G l y N E T T y r A s n T h r P r o S e r P r o A s n G l y S e r l l e S e r G L y C l u S e r P h e P h e L e u H i s A s p r o G l n A

9601 ACTATCATCT ACAATCGGGT GCGGGATCTG TTCGTGGACT CCGATTGCAG CAGCAGTGTG AAACAGCATG GCAAGAGTCA CGCCCACCAG CTGCTACAGT

s p T y r W i s L e u G l n S e r G l y A l a G l y S e r V a l A r g G l y L e u A r g L e u G l n G L n G L n C y s G I u T h r A L a T r p C l n G l u s e r A r g P r o P r o A I a A l a T h r V a CGAAAGCGGG CAACGCTATG ACCATCAGGC GGACGTCACT CCAGCTCCAG CGGACGGCAG CGGCTCCGAC GAGTCCGTCT CCATCTCGTC AGCTTCAACT L G l u S e r C L y G l r A r g T y r A s p H i s c h A 1 a A s p V a l T h r P r o A L a P r o A L a A s p C l y S e r G l y S e r A s p G l u S e r V a l S e r l l e S e r S e r A l a S e r T h r CGCCGAAGCA GCAGCAGCAG CTGCGCAGCC AGAGTTTCAA TCGCCACGGT AGCAGCAGCA ACATCAGCGC CAGGCAACAT CAAGMCMC AACTACMGG A r g A r g S e r S e r S e r S e r S e r C v s A l a A L a A r s V a l S e r l L e A L a T h r V a l A l a A l a A l a T h r S e r A L a P r o G l y A s n l L e L y s A s n A s n A s n T y r C y s V

TGACAAGGGC MGCCCAGCA ATCAGAATGG CGGCGGGGAT CACGACTACT GAGGACATAT ATCTGGTTCG CGAGGAGTCG CGCAAGCAGC ACCAGCAGCA ~ 4 . ~ 2

10001 ACAGCAACAC CAGTTGCAGC AACAACAGCA GCTGCAGCAC CAGGCCACCC AGCAGCAGCA GCAGCAGCAA CAGCAACAGC AGCAGCTGCA ACACAAATAC a l T h r A r g A 1 a S e r P r o A l a I l e A r g H E T A L a A l a C l y I l e T h r T h r T h r G l u A s p l l e T y r L e u V a l A r g G l u G l u S e r A r g L y s G l n H i s G L n G L n G L

n G l n G l n H i s G L n l e u G l n G l n G L n G l n G 1 n L e u G l n H i s G l n A l a T h r G L n G I n G l n G I n G l n G l n G l n G l n G l n G L n G I n G l n L e u G l n H i s L y s T y r

02 AACGTAGGAC GATCCCGGTC GCGGGACAGT GGCTCCCATT CGCGATCGGC CAGCGCCAGT TCCACCAGGA GCACGGACAT TGTGCTGCAG TACAGCAACC A s n V a L G l y A r g S e r A r g S e r A r g A s p s e r G l y S e r H i s S e r A r g S e r A 1 a S e r A l a S e r S e r T h r A r g S e r T h r A s p l 1 e V a l L e u G L n T y r S e r A s n H ACGTAAGGTA CATTTGTTAT TGCAACAGAA AGGTCTAAAA GTATGCAAGA AAATATTTTA ATTTAATGCA ACAAACAACA TATTTTTAAA CCTGATATGC

ATTTGCTACT TAAAGAAACC TTTCAATGTG CTAATTAACT AAATTATGTT TTAGACTTAT ATTATAACAA TTCCTTTTGA TCCTTCAGCA TTTGAACAAC i s

H i s L e u A s n A s n B5,C5

10401 M G C G C M C A TGAACAACAA CAGCAACATG AACAACAGCA GCAGCAACAT CGCACAGAGC AGCAGCAGCA GCAACMCAA CAACAATAGT CTGCTGAATC 3."Continued' L y s A r g A s W E T A s n A s n A s n S e r A s n H E T A s n A S n S e r S e r S e r A s n l 1 e A l a G l n S e r S e r S e r S e r S e r A s n A s n A s n 4 s n A s n S e r L e u L e u A s n A GCAATAAATC ACATTCGATT ATTGGACTGC ACAGCAGCAA ATACGAGTCC TGCCTGAAGG ACAACTACAG CGCAAAGAAC GTCAATCTGA AGAACCAACT r g A s n L y s S e r H i s S e r l l e I l e G l y L e u H i s S e r S e r L y s T y r G l u S e r C y s L e u L y s A s p A s n T y r S e r A L a L y s A s n V a l A s n L e u L y s A s n G L n L e

f17k (hobo) CCTCAAEGC GGCATCAAGA GTGATACCTA CGAAAGTGTC TGTCCGCCGG AGGATGTGGC CGAGCGGACC AAGCAGACGC ACAAGAACTC CATGATCCGC

u L e u A s n G L y G l y I l e L y s S e r A s p T h r T y r G l u S e r V a 1 C y s P r o P r o G I u A s p V a l A l a G l u A r g T h r L y s G L n T h r H i s L y s A s n S e r W E T l l e A r g AACMTCTGG CGGATGCCTC CAGCAACAAC AACACCAGCG GCAGCATCAA CMCATCAGC AATATTGGCA ACATGAACGG TGGCAATCAG AGTAGCAGGA A s r A s n L e u A I a A s p A l a S e r S e r A s n 4 s n A s n T h r S e r G L y S e r l l e A s r A s n l l e S e r A s n l L e G l y A s n N E T A s n C 1 y C l y A s n G l n S e r S e r A r g A

10801 ATCTCMGCG GGTCTCCTCA GCGCCACCAA TGCAAAATCT GGCCGTGGGT GAGTACAGAA TATTTGAAAA CMTAAATAA GAAGATAACA ACTTTTAGAC

s n L e u L y s A r g V a l S e r S e r A L a P r o P r o W E T G l n A s n L e u A l a V a l TTATATGATA CTATGACTTA GAGTAAGTGC TCTGCTCTTG AGCCAATCTC GTTAGTAAAG TGTATTAAAC CTTGAAATCC TCTTCTAATC CTTAGTGAAT

V a l A s n

86, C6 GGACCACCTC CACCACCACT ACCTCCACCA TTGAGGGCAC CAGTTCAACA GCAACGCAAT MCTTGAGCG TCGATCAACC GAACAGCATG ACCGGTCCAA G l y P r o P r o P r o P r o P r o L e U P r o P r o P r o L e u A r g A l a P r o V a l G I n G 1 n G l n A r g A s n A s n L e u S e r V a l A s p C l n P r o A s n S e r n E T T h r C l y P r o T

CAATGCATCT ACAGAAGAAT GTCTTCGGAC CGAATCAGGG TCCAACCAAC GCCAATGGAA CAGGTGGTGC AGCTCCTCCG CCAGTTCCAG CACCCMTCC h r W E T H i s L e u G l n L y s A s n V a l P h e G l y P r o A s n G l n C 1 y P r o T h r A s n A l a A s n G l y T h r G l y G l y A l a A l a P r o P r o P r o V a l P r o A L a P r o A s n P r

11201 CGTGGCCACT GMGCAGTGG ACTCGGATTC GGGACTTGAA GTGGTCGAGG AGCCGAGCTT GAGGCCATCG GMCTGGTGC GCGGCAACCA CAATCGCACC o V a l A l a T h r G l u A l a V a L A s p S e r A s p S e r G l y L w G l u V a l V a L G l u G L u P r o S e r L e u A r g P r o S e r G l u L e u V a L A r g G l y A s n H i s A s r A r 9 T h r ATGTCCACCA TATCGGGTGA GTTCTMATG ATTAAAATTA TATACTCTCC TTATCTTCCA CTAATAATGG AGTTAAATTA GAAAATCTCG AATGATTAAT

W E T S e r T h r l L e S e r A GTAGTTTATA TCAAAATTCT TGCGCATGGT TCGGAGTTAT CCAGGACCCC ATGACCATTA TCACACCGAG AAATCCATCC GCTATAATCA TTGATTGACT GATTGATTM AGTTCCCTAA CCGTTTGCCA GGTCTTTGAA CCCCATGATT TTTGTGGGGG GTTCCGCATC GTCGATGGCT TGTGACCACT GTCCACTTGT

11601 CCGTCTTGAT TGATCAATTA GGGGMATTC GCCAGTCATG TTCGGCTGAT GGGGGGAAAA TGGCTGTGAA GTGCCCTGCA GTGAAAAGAT ATAGAAACGA GTTGCTCGAG GAGAACCTGA TGTCTGAGCG GCAGTCGAAG GCTGTCGATT GGATGGGATA ATGGTTACAT CTTGATTGAA GTGTTGGCTT TCAAGCAGTG

TTAGCAAATG GCTGTGCGAA CAAAGCCATA CAAATAATGT CGTCAGTTTG GAAAATATTA TGATTGATTG TGTGTAGCCA TATGGMACA ATTTCTCCTT ATTTCAACAA ATGCAGCTTC GCAATTTTCT GTCCACTACT TTTCATCACC TTTCAGGATG CCTTTGCTM TTAGTTAAAA ATCCAAGTGG TCGTTTTCTA

1 2 0 0 1 GCCGATTTGT TGAAGCCCGA AGACAATAAC CTCATTATCA CACAATAAAT ATCGGGTATT ACTTAATTGC AAAAATAGGC GTAACTACCA AGCAAAAMC

present at the beginning of the major open reading frame of this mRNA, preceded by an in-frame non- sense codon. Sequence flanking the first of these translation initiation sites best fits the nucleotide bias in Drosophila melanogaster (CAVENER and RAY 1991). There is an inframe termination codon 2 1 nucleotides upstream of this ATG site. The 2.5-kb RNA contains six exons and spans 5.5 kb of genomic DNA (Figure 3). The open reading frame of this mRNA is 1.9 kb and encodes a polypeptide of 7 1 kD.

Sequence analysis of cDNA clones shows that alter- nate acceptor splice sites are acknowledged at two locations in exons common to the 1.9-, 2.5-, 5.4-, 5.6- and 6.4-kb transcripts. In one instance, the alternative

acceptor-splice sites correspond to the fourth exon of the 2.5-kb transcript. These alternative sites are lo- cated 24 bp apart at nucleotides 16,892 and 16,916 in the genomic sequence. The downstream site is acknowledged in 75% of the clones examined. The fifth exon of the 2.5-kb mRNA has the second pair of alternative acceptor splice sites observed during se- quence analysis of the isolated clones. Twelve nucle- otides separate these sites. In this case, the upstream site located at nucleotide 17,206, is acknowledged in 75% of the clones. The alternative acceptor splice sites recognized during this analysis also agree with the Drosophila splice site consensus (MOUNT 1982). Alternatives are of lengths such that the reading frame

Page 7: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophilaforked Gene

M C T W T G C G M T M T ACCTGCCATG TTTGACATGT GTTGGCTCCG ATTTTCCATT TGTTAATTGC CCATTATGTC CGATGGGGCA GGACAAATGG

AGACAGGCTT TMTTGAGCC GACGTTCACA CGATAGGCGG CGMGATCCG TGCAMATGT GGCCAAMU MGAAGTGGA AGTCGGGTTT CTGCTGGMA A C T G A M M G TGGAAGTGCC M C G A T G M A C M T C C C A G C CACTTCACTG CCGTATMGA CACMGTCAT TCGATATMG CTACACTUG CGAMAGCCG

12401 TAGCTCATAT GCTCCTTCAC ACGGGAMAA ATATGAGGGG GCTTTCTTAT AAAGGTACGC AATCTAMAT CTTGGTMCT GTCTMTCTC TGTACATCTG

TACATCTCTG GCTTATMCC CMTTGATTT MCATCATTT TGATMAGAG TTTTTACCAA TTGATAMTT CGGTAGATGT GAATTATTTC GAGTCCATTA ACCTACMCA TAGTCGATTG TCGTTGGGTT AGAGGCCTTC AAAAGACCTA TGCCATATAC CCACAGTGGA CATTTGGGAT CCCCGGATTC GCGATGAGGA AGCMTTTGT CACMCTCGC GAATGTCCCA CACTATACAC ACTTGCACAT AACAATTGGC GACCTTCGAG CCATCACCCG CACCACCGCT CACATGGGTC

12801 ACGCCCACTG CGCCTCCTCC GCCTTTTGGT AATCGGCTCG TGGAACTTCG CATCTGCCM CTGCGAAAGT TTTACCCGCG AAAGTGAACG TGTTTCGCGA AGCGACCTCT CTCGTGTCTC CMTGCGATC CGTCGTCATG GTCTTGGGTG AATAACTTTG GCGATGGCTT GAACCCAGTT C a g g c t g c c a acggagctcg

A 1

cccaggcaac a t c g t c a c c c g g a g t t g a g a gtcgcgactt t t t c a a g c c t c c a c c c c g a c t ta tccggtc g t t c g t t c a t atcgcgttga a c a t t c t g a t aatgtgcatt gat tgaacat a g a t a a c a g a ATGAATATGC TGGGCTCGGT TMGATGCAC MGAATTCGC GCCGAAACTA TTTCACGCTC AGCTTCCATC

METAsnWETL e u G l y S e r V a I L y s M E T H i s L y s A s n S e r A r g A r g A s n T y r P h e T h r L e u S e r P h e H i s A

13200 GGTTCGATAT TTTTCGGTGT MGTGCTCCC MCTGAAATC GGTGTTGTAG A T M T C C C M ACGGTTACAG CCCGTGAAGT GTATTTACTT TMGTTTTTC r g P h e A s p I l e P h e A r g S TGTGTTTTTG TTMGTGGTT TTAGAGCTAC ATTCTTTGAT GGCTTTAAAT AAGAACGGCT ACATTTGATG ATATATTACA AACGAGTMT ATATTTACGT TATGTGTTGT GCAACTGGTA CCTTTTCTGC TTAACTTACA ATTTGTTTTT ATAAATTTTG TAAAATAAAT TTTAAACATA TACCGCTTM MTGTAGTGA GCCCACMTT TCTTAAATM TTAACGTMA CACATTGTAT TCTTTMTAC TTACACTTTT CTAACAGATT AAACTTAATA TTTATTGCTT MTATTATGC

13600 AATACTAACT CTTATGATTG AAATGTTTAA GCATTAGTTC CTAAATCGTA ATATTTTATC GAGACTGGAC ACCTCTTAAA GAAATGTGAT TTGCACTTTC

TMGCCAAGC CAATTAGCTC AATTTAAAAA CTGATACCAT GAAGATCGGG CATTGCACTC GTAAAACTGC AAACGAAACC AAACTGCACA TCAACTCCM TTTCGAATTT CACCGCGAAC TTGACAGGTA CAAGCCCCTT TTTGCTCCGC ACAAATTAAT AGATGAGATG TGTGGGAACA TGTGGAAAAA TGAAAAAATG TGCTTTTCGT TAGCTCGCTT GAGTTGTTTT TTCGAATCGA ATGAATAAAT TCAATGCAAA CTGACTGAAC TCGATTGCGA GTAGTTTTGA CCATAAGAGT

1 4 0 0 0 GCAGCCATGA GTCTGCACCA CCACCTCGTA TGGTCTCGTC CAAACCACGT AATGATCGCC GTGAAAACGG GTTACCAAAG GCCGTGAAGT TGGGGACCCG

ATGCGATCGG GTTTAGATGG TGTTTGGCCA ATTAACCGGC TGCCTGGAAA TTGAATGCAG AGAGGCGACT TCAATGGAAA CGGCGGCTAA T T C M T T T T T ATGGGTATAC TCCCACGAAG ACGTTGGCGG GGMCTTATT CTTCCGTGTC TGCGGATATT MTCCAACTA TATAATATAT ATATGATTTC CCAGGAAGCA

14401 TATGTTCTAT CCMTCATGT GCGGACTTTT AGCCACATTC GTTGGCCACC TATCTCTGTT CCTTGGTCTC ACCATTTATC AMATTGGGG TATCGAGCAT MGCGAGCTT TCCATCAAAT CTGAGCTGCT TCCAGGCAAA CAATAAGGAA AGGCATTGAA TTGGGACACT TGAATGTTTC CGACATGCCA ATATCTAAAT

CTCTCGATTA CATATACATT TAACCACCM TMGTTATCA TTTATCAACT TGAGATTCTG AAGTTTGGTT ACCAATTAAT ATGTATGCCT M A A T T M T T GAMTATAAA TATTTTAAAT MACTGCAGT AAAAGTGAAC CCGACTTGAT TTGGAAGCCA AGGAAATATT TAAAATATAT TCATTTTTTG AATGTATATA FIGURE 3.-Continued. TAMTGATTA TTATAGTTCC CAAAATGAGC TCTAATTTCA AATTATTTGC ATTCCAATAT GCAGCACTCA TTTAACATTT CTCTCCGTTA GACTTCCACA

14801 CAAAAGTAAT TTATTTGATT CACTTGTTCG TATTTTTATA TGCAACTGGG TTGCAACTCC GTCTCCTTCT CCTCGCAAAT CTCCGTCTGG TGGTGAAAGG CGCCTACGCA GCGATCCATT GTGGTTTGGG CTTCTCGTAG CTTCGTAGTT TCATAGTTTC GTATTTTCTT CAGTACATTT GCTGTAGTTT ATTTGCTGGC TGCTCAAGTT AGCATGACCA GAGTCAGACT GCTGATAACT GCTAACCCAC AAAACTCATA AATAAATTGG TTCAATTCCA GCGAACAAGA MGCCAAGCT

e r A s n L y s L y s A l a L y S L e

A2.07.C7

GCTCMTGCC GGCAGCACAA ATGGTAGCTC CATCGCCGCC AGCTCGCATG ACTCCCAGTC CCGATACGW GGATCTGTAC ATGCTGCCM TGGTTCTGCG u L e u A s n A l a G l y S e r T h r A s n G l y S e r S e r I l e A l a A l a S e r S e r A s p A s p S e r G l n S e r A r g T y r G l y G l y S e r V a l H i s A l a A l a A s n G l y S e r A l a

1 5 2 0 1 GCCAATGGTC ACTTCTATGG CTACTCGGAG GGCAACAAAA ACCAGGGGAG CAATGCCAAT GGCAACGGCA ACGGCAACGG AGGTGGCGGC AACTATCACA A l a A S f f i l y H i s P h e T y r C 1 y T y r S e r G l u G l y A s n L y s A s n G l n G l y S e r A s n A l a A s n G I y A s n G l y A s n G l y A s n G l yGlyGlyGly A s n T y r H i s S

GCCATCCCM TCAGCCACAC TACGCGACTG GCCAGCCGCA TCAGTACACG CCCAGCCTGT ACGGAGGAGG CTCCTCCGCC GAGGATCACT ACGAACTTAC e r H i s P r o A s n G l n P r o H i s T y r A l a T h r G L y G l n P r o H i s G l n T y r T h r P r o S e r L e u T y r G I y G L y G 1 y S e r S e r A l a G l u A s p H i s T y r G l u L e u T h ACAGCAGCAG CAGCAAATCT ACGGGGAAAC CCATTTGCCC AGCGGACAGA GCACGGGCGG ACCCAATCTG GTCAACAAGC AGCTGGTGCT GCCTTTTGTG r G L n G l n G l n G l n G l n I l e T y r G l y G l u T h r H i s L e u P r o S e r G l y G L n 4 r g T h r G l y G 1 y P r o A s n L e u V a l A s n L y s G l n L e u V a l L e u P r o P h e V a l

f hdw) CCGCCCAGTT TTCCCAATAA ATCACAAGAC GGCGTCACGC ATCTCATCAA GCCGTCCGAG TATCTCAAGA GCATCAGCAT CCACAAGCGT TCGTGGCAT P r o P r o S e r P h e P r o A s n L y s S e r C l n 4 s p G l y V a l T h r H i s L e u l l e L y s P r o S e r G l u T y r L e u L y s S e r l l e s e r l l e A s p L y s A r g S e r C y S P r o S

f ',f 5(9YPsY) 15601 CCTCCGCCCG GTGAGTAGTG TTTATCCCCC AAATACCCCC ATGAATCGCA CAGTATATGT ACCTACTCGT A Z T A T C C T C CATCCTCCGA CTTTCAATTA

e r S e r A l a A r GCACATGGCA GCCGATTTAA TTTGGCTGAA CATCTCGTGC AACGGCTTTT TATAATTAGT ATCGGCGGTA CTGTTGCCTC GCCTCAATTA GCCAMATCT

f h 9 Y P s Y ) A M A M M A A AAAAATATAL STATAAAAAG ATACATGGCT TCAAAGACGC TTGGCTGCCG CTGAGCAATT AGCAGCTAGC AGATTGGCTA GATCTCTGTG CTGCGCAATT TGCAGCGGAA MTGCCTAAA GAAAATATAG CTAGATTTTC TATACCCAAC AATAGTTTTT CTTACAAAAT GCAATTTGAG ACAACTCMT

16001 TTGAGGTAAG CGTAGCTGCT GACAAATTGA TGTTTAATGT GCCACTCATC AGTAATGGTA TTGAAGTTAC TTGGTAAAAC AAGCTCGGAT T G A A W GTCAGATGCG CAGCACTGTA GCCGACAGAT ATTCGCGACT TCCTCTGCCG ATCAAGAGTG CATGAAATTT TATGCGTGCC ACTTTGCGAC CTGCCACTTT

513

will not be altered depending upon the selection of the site. Acceptor splice site selection does not appear to be specific to each size class forked RNA, as two independent clones of the 2.5-kb transcript display differences in both alternate acceptor-splice site selec- tions (data not shown).

The 5.6-kb transcrzfit: To determine the structure of the larger forked transcripts, a random hexanucleotide primed cDNA library was generated from mid-late pupal poly(A+) RNA. This library, as well as the oligo- (dT)-primed library described above, was probed with DNA fragments spanning the forked locus depicted in Figure 1. Three overlapping cDNA clones that were isolated define the structure of the 5.6-kb transcript (Figure 1). Clone 15T hybridizes to all forked tran-

M E T A r g A l a T h r L e u A r g P r o A l a T h r P h 01

scripts on Northern blots whereas clones 86E and 1 1 E hybridize to the 5.4-, 5.6- and 6.4-kb RNAs. To determine which of these two transcripts is encoded by the cDNAs, exon-specific probes were made by PCR of primers flanking each exon and used for Northern analysis. Results from these experiments (data not shown) indicate that exon B 1 hybridizes only to the 5.6-kb transcript, suggesting that cDNA B which is assembled with clones 86E, 11E and 15T, encodes the 5.6-kb forked transcript. Analysis of clones 86E and 11E indicates that they extend into the 5' transcribed, untranslated leader of the transcript (Fig- ure 3). Probes corresponding to exons B2, B3, B4, B5 and B6, hybridize to the 5.4-, 5.6- and 6.4-kb RNAs, suggesting that the exons are common to these tran-

Page 8: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

514 K . K. Hoover, A. J. Chien and V. G. Corces

TCTTACCACA CATGCGATGT TGCAGCCGCC ACTGCMCTC AGTTCCAATC CGAACATGAT ACTGCGCACA CGCCACACCT CGCGCAGGTC CGCCATTCGC

e L e u T h r T h r A s p A l a M E T L e u G l n P r o P r o L e u G l n L e u S e r C y s A s n P r o L y s M E T l l e L e u A r g T h r A r g H i s T h r T r p A r g A r CTACTGGCCA CMCCCACCC CCATCTCCCA ACTCCCCTCT CCCMTTGCC ATTTTCCAGT GUAGTACCC CACATCTCAC ATCTCTACCT M T C C M C A T

16401 TTTCCCCCTT TTGTTCCACC MCACCTCCA CGCATACGGA CWCTACATG CACATACAGA TCGCCCAGCA GCAGTCGCAT CCGCACCACC ACCACCACCA g S e r T h r A s p T h r G l u A s p T y r M E T G l n l l e G l n l I e A l a C l n G l f f i l n S e r H i s P r o H i s G l n H i s C l n H i s H i

f 36a (springer) f 1 4 ' Q g y p s y ) A3,B8,C8,D2

CCCCCATCCC C E C C S A T C CGCACCATCC CCACCAGCTG CATGGAATGG GCCAGCAGCC GCCGTTGCCC CCACCACCGC CCCCTCTACC CCACCWATG s P r o H i s P r o H i s P r o H i s P r o H i s H i s P r o H i s C l n L e u H i s C l y W E T C l y C l u C l n P r o P r o L e u P r o P r o P r o P r o P r o P r o L e u P r o H i s G l y W E T

PRD P TTACCCCCGC AGCCACCGAT GGCTCCACCG ATCCGTGTAC CAGGTCCCCA CGACGCAACC ATCTCTGCCA AATCACACM GCCCAACAAG CCCCACCCAT L e u P r o P r o C I n P r o P r M E T G l y P r o P r o M E T G l y V a l A I a C l y G l y G l M s p A l a T h r M E T S e r C l y L y s S e r H i s L y s P r o L y s L y s P r o H i s C L y S CCCCACCCM TACCCAGGAC ACCGCCACCA GGAAGCACCA CCAGCCACTC TCGGCCATCT CTATCCAGCA TCTGAACAGC CTGCACGTCA CCMTTGGCA

e r A l a P r o A s n S e r C l n A s p T h r A l a T h r A r g L y s C l n H i s C l n P r o L e u S e r A l a l l e S e r l l e C l n 4 s p L e u A s n s e r V a l G l n 16801 CTCCACTGTG CTCAGGMCA CTTATTATTA GGCAGGCTCA T A A G A M C M T T G l C C T T M C T T A C T A C M TTCCTCCACC M T T C T C C T T CACCTGCCCC

L e u A r g A A&,BP,CP,D3

GCACGGACAC ACAGAagGTA CCCMCCCCT ACCACATGCC ACCTCCCAGC CTCAGCATGC AGTCTCTCAC CTCTAGCACG GAAACCTATC TTAAGTCCW r g T h r A s p T h r C l n L y s V a l P r o L y s P r o T y r G l W E T P r o A l a A r g S e r L e u S e r M E T C I n C y s L e u T h r S e r S e r T h r G l u T h r T y r L e u L y s S e r A s CCTMTTGCC CACCTAAAGA TCTCCAACCA CATACCGCCC ATCMGAAGA TCAAACTGGA GCAGCAGATG GCCAGCCGGA TGGACTCGGA GCACTACATG p l e u l l e A l a G l u L w L y s l I e S e r L y s A s p l l e P r o C l y I l e L y s L y s M E T L y s V a l G l u G l n G l W E T A l a S e r A r g M E T A s p S e r G l U H i s T y r M E T GAGATCACCA AGCAGTTCTC GGGCAACMC TACGTCCACC AGGTCAGTTC TGGCCMCTG GTCCCAGCCG AAACACMGC GAACGGTGGT TAAGAGGCGG

G l u l l e T h r L y s C l n P h e S e r C l y A s n A s n T y r V a l A s p C In

17201 CTCAAAWTC TACTTGCagA TACCGGAAAG GGACCACCCC CGCAACCCCA TACCCGATTG CMGCGCCAG ATGATGCCCA AGAACGCCGC CGACCCGGCC I l e T y r L e u G l n l I e P r o G l u A r g A s p C l n 4 l a G l y A s n 4 l a l I e P r o A s p T r p L y s A r g G l n M E T M E T A l a L y s L y s A l a A l B G l u A r g A l a

A5,BlO,ClO,D4 MGMGGACT TCGAGWCCG TATCCCTCAC WCCCAGACT CACCCCGCCT CTCACAGATT CCCCACTGGA AGCGGGATCT ACTGGCGCGA CGCGAGCAW L y s L y s A s p P h e C l u G l u A r g M E T A l a C l n C L u A l a C l u S e r A r g A r g L e u S e r C l n l l e P r o G l n T r p L y s A r g A s p L e U L e u A l a A r g A r g G l u G l u T

CCGACAACM GCTGMGTM GTGCGTGTAG TCCCTACTGG GCACTCTCCT GACCCCTCCC CTCAATTGTC TGACCTTCTT CTAGGGCGGC CATCTATACG h r G l u A s n L y s L e u L y s A l a A l a l l e T y r T h r

A 6 , B l l , C l l , D 5

CCCMGGTGG AGCAWACM CCCCCTCCCC GACACCTCGC GGCTGAAGAA CCGCGCCATG TCCATCGACA ACATCAACAT CAATCTGGCC CCTATGGAGC

17601 AGCACTATCA GCAGATCCAG TTTCAGCACA AGCAGCAGCA ACTCCACTTG GGCAACAAGG AGAACCATGT CGATCCCAAG GGAGCTCATC AGGATCAGCA P r o L y s V a l G luGlnAsn4s n A r g V a l A l a A s p T h r T r p A r g L e u L y s A s n 4 r g A l a M E T S e r l l e A s p A sn l le lsn l l e A s n L e u A l a G l y M E T G l u C

I n H i s T y r C I n G l n l l e G l n P h e G l n G l n L y s C l n G l n G l n L e u H i s L e u G l y A s n L y s C I u A s n H i S V a I A s p P r o L y s G l y A l a A s p C lnAspCln4s TCACGAGCAC GGAGTTGCCA TTCGCACTCC CCTTGCTAAC CCTAATTCTG GCCTCATTGG AGGTCAGGAG CTCATCGACC CATCGCACGG CACCCACCGC p G l n G l u H i s G l y V a l G l y l I e G l y S e r G l W a l G l y A s n A l a A s n S e r G I y V a l l l e C I y C l v C l n C l u L e u l l e A s p P r o S e r C l u C l r j e r C l n G l y

HS

CGCCCCGACG AGGAGGAGGA MCATCATCC CCTGCCCCCC CCAGCTGCCC ACACCAACAG CCCTCTCAGC CTGATCTGAT CTGCATCCAC GTACACATAC A r g A l a A s p G luclffilffil u T h r S e r S e r P r o G l y A l a P r o s e r C y s A l a A s p C l n G l n P r o S e r G l n P r o A S p L e u l l e C y s l l e H i S V a l H i S l l e P CCMAAAATT AAATCCATTT TTCMTCGAC TCCTAAGCGG TGTATTTTTT TTTTTTTTTT TTTTTGTGGA T t a a c t t a a a gcatgtgcaa t ta taatgag

r o L y s L y s L e u A s n P r o P h e P h e A s n 4 r q L e u L e u S e r C l v V a l P h e P h e P h e P h e P h e P h e P h e V a l A s e H 6 f s ( g y p s Y )

18001 aaaaacccta aatataaagc a t a a a a c c t a agcctaaggt c t a g c t a c t t a c a t t c t a c t a t a t g c a t a a t c a t t t t t g c c g t a t z a t c t a a t c t t a g c acctccatat t t t t t t t t t t t c g c c a a a a c g a a t a t t g t t a t tgcctagt g g g c a g t a c a g t a t c t c a a g g c g a a t c a a a t g c t g t g c t t a a g t t c t t a t

t t c g g t t t t g t a t t t t a a t c a c t t a t a c t c t c t a t t t t t t t a c a t t t t t t t t t g t g a c t t g g c t t g a a t t ta t t tagtac tggaat taat t t a a t t g a t t ggttaagcat t t t t t a t t g t t a t t t g t t t t t a t a t a c g c g g c t c a a a c a a t a t t a c t t t t a a t a a t a t a a a t a c a c g a a t a c a t a c a a c a catctatgaa

18401 tgaaaaagcg aacgaatgaa acaataaaTA CATATATATA ATTTAAAACA AAAGAGAGM TGCTTTAGTC GACTTCCCCG ATMTTATAT MCCGTGTCT CACCTAGTGT GMTGCGAGC CCGAAATTTT ATTATTTTTC TTGGATATCG ATAGGTATTG G M M T A A A A TGAGAAAAAT TGATCAAAAG TGTGGGCGTG CCACTTTTAG CCGGCTTGTG GGCGGTAGAG TGGGCGTGGC ATCGTGGGCC CCATGTTTTC CTTTGCCGTT GGGGTGTGGT M G T T T T T C T T M T T T T G T A M T C M T T T T A T A C C T M A T TCCAGCCACC GCGCCMCCA ACTCGGTCTC CCTAACAMT TTTGGTCGGA ACGTATCGAC GTATTGTATA CTTGACATAC

18801 TCCGAWCCA ATACATTTTC M C G M T C T A GTATACCCTT TTACTCTACG AGTAACGTGT TTMCTACGT GTTGTGACCT TCTAGCTTGG G C W T G G G TGCTGGTTGC AGTGATCCCT CTGTMTCCT TTCCCTTCAT ATTCTACTTG TTGCTTTTCT TGCAGTCCAG ACCCATTGAT ACCGCTGGCG CCATCATTTG MTCATCGAG TACAWATAT CCAGGMGTC TCTCTATTGG CAATACAATT MGATTTTTG AACATATMC ATCCTTTTTT ATCGGTATAG M C A T M G G T

TWAGGTCCT T M A C T C C M TAACTCACTT ACTAATMTA AGCACATCGA TTTCCAAACT TCTGTATCTC TCACATTCGT T T A M A G T T T W C A G C M 19201 AAAAAAGCCA TGTTTGTCGG TCCGTCGGTA TCTCGTGATC TGAGGCACTA TACAGGCTAG AAACGTGTAT AAGATATCTT CCCACACCCC W A C C C M T A

ACTCCCMAT CACCAAGAAA CCATCTCCTT TCCTCATAAT TTCATCGTTT TATTACCAAC TTATGTCATC CTGCCTATGC ACCTATTGAT GTTCCGCACA AAGCCCTTC

Ankyrin - I'k I e consensus

Repeat #1

Repeat #2

Repeat #3

Repeat #4

Repeat #5

forked consensus

FIGURE 3.-Continued.

FIGURE 4.-Alignment for the ankyrin-like repeats encoded by the 5.4-kb RNA. Those positions at which residues are identical to the ankyrin-like consensus are darkly shaded. Sites in which amino acids are chemically similar with consensus residues or are present in the major- ity of repeats are lightly shaded. Chemically similar amino acids are grouped as follows (SCHWARTZ and DAYHOFF 1978): A, S, T. P, and G ; N, D, E, and Q; H, R , and K; M , L, I , and V; F, Y , and W.

Page 9: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophilaforked Gene 515

J

Canton S f 3 f17k

FIGURE 5,"Bristle phenotypes of forked mutants. Scanning electron mi- crographs showing dorsal thorax of Canton S.f',f''*,f',f",f',f" and f flies. T h e forked phenotype of the flies ranges from the mildf' mu- tant to the severe jhd and f '& mu- tants.

L I

f 5 f K

scripts. The 5.6-kb RNA is encoded within 12 kb of genomic DNA from coordinate +6.4 to +18.4 of the chromosomal walk and has eleven exons. A 4.3-kb open reading frame in this mRNA could encode a protein of 157 kD.

The 5.4" transcript: A cDNA clone, designated 77X, composes cDNA C and was isolated from the screening of the random hexanucleotide primed li- brary. This clone hybridizes to 6.4-, 5.6- and 5.4-kb RNAs when used as a probe. Four exons are contained within the clone. The most 5' of these exons, named C1, hybridizes only to the 6.4- and 5.4-kb RNA. The relative abundance of these RNAs suggest that cDNA C, which is composed of clones 77X and 15T, corre- sponds to the 5.4-kb transcript. The three terminal exons of clone 77X are shown by sequence analysis to be the same as exons B2, B3 and B4 in the 5.6-kb

RNA. The 5' end of the C1 exon does not contain an open reading frame and represents the transcribed, untranslated leader of the transcript. Sequence flank- ing the translation initiation site of the 5.4-kb RNA are in agreement with that found in other Drosophila genes (CAVENER and RAY 199 1). This transcript spans 18 kb of genomic DNA from interval +0.4 to + 18.4 of the chromosomal walk and has a 4.3-kb open read- ing frame that encodes a polypeptide of 155 kD.

The 1.9-kb transcript: Clone 2W was isolated during the screen of the oligo-(dT) primed cDNA library. This clone, composing cDNA D, is 1.6 kb in length and contains five exons. Northern analysis had estab- lished that probes of exons A3, A4, A5 and A6 hybridize to the 1.9-kb transcript. In accordance with these findings, sequence analysis indicates that the four terminal exons of clone 2W are the same as those

Page 10: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

516 K. K. Hoover, A. J. Chien and V. G. Corces

in transcript A. A fragment that represents exon Dl (Figure l), hybridizes on Northern blots to both the 1.9- and 1.1-kb RNAs, suggesting that cDNA D, which contains clone 2W, represents the 1.9 kb tran- script. Genomic sequence analysis indicates that a 90- bp open reading frame extends upstream of the 5‘ end of clone 2W. This open reading frame contains three translation start sites. Sequences flanking the first of these sites is most similar to sequences that are adjacent to the translation start sites of other Dro- sophila genes (CAVENER and RAY 1991). Assuming that this is the 5’ most exon of the 1.9-kb transcript and that the ATG site within this open reading frame is the translation start site of this transcript, the open reading frame of the 1.9-kb RNA extends 1.4 kb and encodes a 53-kD protein.

The 1.1-kb transcript: The structure of the 1.1-kb RNA was determined by Northern analysis using exon specific probes that were generated by PCR. No cDNAs were isolated for this RNA, and therefore the structure of this transcript is only tentative. The 1.1- kb transcript hybridizes to probes representing exons D l , A3 and A6. Exon A6 was further divided into two equal 550-bp fragments and each were used as probes. The 1.1-kb RNA hybridizes only to the frag- ment corresponding to the terminal half of exon A6. These results show that the 1.1- and 1.9-kb RNAs probably share the same promoter and that alternative splice selection occurs in the 1. l-kb RNA. Analysis of consensus splice sequences suggests that the splice acceptor site for the last exon of the 1.1-kb RNA is close to the untranslated sequence in the terminal exon shared by all forked RNAs.

The 6.4-kb transcript: Northern analysis indicates that the 6.4- and 5.4-kb RNAs hybridize to the same genomic and exon specific probes from interval -0.5 to +19.4 of the chromosomal walk. A genomic frag- ment that extends 500 bp upstream of the 5’ end of cDNA clone 77X of the 5.4-kb RNA, from coordi- nates -0.5 to 0, encodes multiple stop signals in all reading frames and selectively hybridizes to both of these RNAs. Probing Northerns with a 1.0-kb frag- ment that extends distally from -0.5 to -1.5, detects only the 6.4-kb RNA and reveals that the transcription start site of this RNA is upstream of that for the 5.4- kb RNA. Northern hybridizations using genomic frag- ments that extend 10 kb upstream of interval - 1.5 as probes do not detect forked RNAs. Therefore, the transcription start site of the 6.4-kb RNA should be located between the XhoI site at position 0 and the BamHI site at -2.0 kb. Sequence of the genomic DNA in this region reveals no significant open reading frames. Based on these results, a tentative structure for the 6.4-kb transcript is represented in Figure 1, assuming that the only difference between the 5.4-

and 6.4-kb RNAs is a longer transcribed untranslated region in the latter.

Structural analysis of the forked proteins: T o de- termine if a function could be predicted from the structural domains of the forked products, the amino acid structure of the proteins encoded by the 2.5-, 5.4- and 5.6-kb mRNAs was analyzed. A feature com- mon to all three of these proteins is the presence of stretches of at least 16 hydrophobic residues that may represent regions of the protein spanning the cell membrane. Two such hydrophobic stretches occur in exon A6 (Figure 1) which is common to all three RNAs encoded by nucleotides 17,714-17,777 and 17,909-17,972 of the genomic sequence (Figure 3). The coding region shared by the 5.6- and 5.4-kb transcripts has as many as three of these hydrophobic stretches in exon B4 (Figure l), spanning nucleotides 9,285-9,348, 9,450-9,512 and 9,822-9,870. The 5’ exon of the 5.4-kb RNA also contains a significant stretch of residues that commence at nucleotide 458 and includes three residues of the next exon. Hydro- phobic stretches of this size are characteristic of trans- membrane spanning peptides, a-helices, and mem- brane associated proteins. It is appealing that these hydrophobic stretches are present in proteins that are involved in the formation of microfibrillar structures of bristles that are apposed to the cell membrane.

The characterized transcripts of the forked locus contain several regions in which there are concentra- tions of specific residues. Thirteen of the 24 amino acids in exon A3, between nucleotides 16,558 and 16,630 in the genomic sequence, encode proline res- idues. The 5.4- and 5.6-kb RNAs have glycine rich stretches encoded in exon B4, between nucleotides 9,267-9,341 and 9,444-9,482 in which 13 of 25 amino acids and 11 of 13 amino acids are glycine, respectively. Polyglycine tracts are devoid of structure and these peptide stretches may serve a role analogous to the hinge region of the products of Ultrabithorax (BEACHY, HELFAND and HOGNESS 1985).

All six forked transcripts contain an exon that en- codes a tandem repeat of alternating histidine and proline residues between nucleotides 16,478 and 16,535 in the genomic sequence (Figure 3). The re- peat is located in the third exon of the 2.5-kb RNA and the eighth exon of the 5.4- and 5.6-kb transcripts. This motif was first observed in the product of the paired gene and has been termed the PRD repeat, or PRD gene set (FRIGERIO et al. 1986). As is the case of the repeat in paired, the forked repeat is imperfect and occasionally disrupted. Taking into consideration this discontinuity, the forked PRD repeat spans 19 resi- dues. The repeat has also been observed in the D. melanogaster gene bicoid (BERLETH et al. 1988), and the Drosophila ananassae gene Om(1D) (TANDA and CORCES 199 1). The PRD repeat from the paired gene

Page 11: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophilaforked Gene 517

has been used to probe developmentally staged North- ern blots. This screen demonstrated that many other transcripts contain the PRD motif, but its role has not been determined (FRIGERIO et al. 1986).

The forked gene has two copies of a sequence de- fined by the presence of a repeated (CAX), motif, where X is the nucleotide G, A or C, and n usually has a value of less than 30. These repeats are located within exon B4 (Figure 1) which is the fourth exon of the 5.4- and 5.6-kb RNAs. The (CAX), motif is pres- ent eighteen times from nucleotides 9,078 to 9,135 in the genomic sequence (Figure 3). The second re- peat is located near the 3’ end of the exon and contains thirty-two of these codons from nucleotides 9,987-10,094. This repeat has been found in orga- nisms ranging from yeast to humans, in genes that are developmentally or tissue specifically regulated (WHARTON et al. 1985a; SCHULTZ and CARLSON 1987; SUZUKI et al. 1988; CHANG et al. 1987; DUBOULE et al. 1987). The (CAX), motif encodes a polyglutamine stretch that is often interrupted by histidine residues and has been recognized under the names of opa (WHARTON et al. 1985a,b), strep (KIDD, LOCKETT and YOUNG 1983), and the M repeat (SCHNEUWLY et al. 1986). In D. melanogaster, the opa repeat occurs at least 500 times in the haploid genome and has been found to be expressed differentially in a develop- mental profile (WHARTON et al. 1985b). The opa motif is associated with homeobox containing genes such as Antennupedia (LAUGHON et al. 1986), Deformed (RE- GULSKI et al. 1985), cut (BLOCHINGER et al. 1988), engrailed (KUNER et al. 1985) and fushi tararu (LAUGHON and SCOTT 1984). In the transcription factor Spl, two glutamine rich domains, containing approximately 25% glutamine residues, have been shown to be potent activators of transcription (COUREY et al. 1989). A chimaera containing the Spl zinc finger binding domain and the nonhomologous opa repeat of Antennapedia was observed to synergis- tically stimulate the expression of the SV40 enhancer, suggesting that the repeat may serve an evolutionary conserved structural or functional role (COUREY et al. 1989).

Products of the 5.6- and 5.4-kb RNAs contain ankyrin-like repeats: A search of the Genbank pro- tein database revealed significant homology between regions common to the three largest forked products and polypeptide stretches of proteins known to con- tain a repeated motif found in the ankyrin protein (LUX, JOHN and BENNETT 1990). The products of the 5.6- and 5.4-kb RNAs contain four and five copies of this ankyrin-like repeat, respectively. The motif has also been recognized as the Cdcl O/SW 16 repeat (BREEDEN and NASMYTH 1987). The amino terminally situated forked ankyrin-like repeats are arranged in tandem. This repeated motif is encoded in the first

four exons of the 5.4-kb transcript and the second, third, and fourth exon of the 5.6-kb RNA (Figure 3). When conservative residue substitution is taken into consideration (Figure 4), the forked ankyrin-like re- peats show significant homology to the ankyrin-like consensus (LAMARCO et al. 1991). Additionally, posi- tion two is usually an arginine residue within the forked ankyrin-like repeats. As observed in the consensus for the 22 ankyrin repeats (LUX, JOHN and BENNETT 1990) and the six ankyrin-like repeats in the product of the Caenorhabditis elegans gene fem-1 (SPENCE, COULSON and HODGKIN 1990), position 17 of the forked ankyrin-like consensus is typically a glycine res- idue. Position 22 of the ankyrin-like consensus is a leucine residue as is the case of the consensus for the seven copies of this repeat motif in the product of the human bcl-3 gene (OHNO, TAKIMOTO and MC- KEITHAN 1990). The length of ankyrin-like repeats has generally been found to be 33 residues, but varies from the half repeats in the human NF-KB binding protein (BOURS et al. 1990), to the presence of non- contiguous repeats in the products of the yeast genes cdclO and SW16 (AVES et al. 1985; BREEDEN and NASMYTH 1987). Ankyrin-like repeats of varying numbers and lengths have been found in many classes of proteins in organisms ranging from yeast to hu- mans. The yeast cell cycle control genes cdcl0, SW14 and SW16, encode proteins that contain two noncon- tiguous repeats (BREEDEN and NASMYTH 1987; AN- DREWS and HERSKOWITZ 1989). Products of the Notch locus of D. melanogaster and the lin-12 and glp-I genes of C. elegans are involved in tissue differentiation and contain six contiguous ankyrin-like repeats in their cytoplasmic domains (GREENWALD 1985; YOCHEM and GREENWALD 1989). The fem-1 gene product of C. elegans is involved in sex determination and contains six contiguous ankyrin-like repeats (SPENCE, COUISON and HODGKIN 1990). The ankyrin-like motif has been observed in transcription factors as well. Studies of the heterodimeric DNA binding protein GABP indi- cate that the amino terminus of the p subunit contains four contiguous ankyrin-like repeats (LAMARCO et al. 199 1). These repeats help stabilize GABP p interac- tion with the CY subunit (THOMPSON, BROWN and MCKNIGHT 199 1).

Analysis of spontaneousforked mutations: To ana- lyze the regulatory and coding regions of the forked gene further and to determine the mechanism of mutagenesis by the insertion of transposable elements, eight spontaneous forked mutations were character- ized. Genomic DNA was isolated from f ’, f ’, f K , f 36a,

f ’4k, f z7k, f hd and f mutant stocks. This DNA was then treated with restriction endonucleases. Genomic Southerns containing DNA isolated from Canton S and mutant individuals were prepared and analyzed for alterations using fragments from the 50 kb of

Page 12: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

518 K. K. Hoover, A. J. Chien and V. G. Corces

contiguous DNA in the chromosomal walk of the forked locus as hybridization probes (Figure 1). In all cases, insertions were found within the analyzed re- gion. The characterized mutations, with the exception of the f 17' and f alleles, have insertions located within a 6.2-kb EcoRISalI genomic fragment from interval + I 3.0 to +19.2 of the chromosomal walk. Insertions in the f "' and f alleles are located 2.6 and 9.3 kb upstream of the 6.2-kb region, respectively (Figures 1 and 3).

A minimum of 11.7 kb separates the transposable element insertion site of the f mutation from that of the f I , f and f 36a mutations. This distance between members of the left and right pseudoallelic series explains the intra-allelic recombination that has been observed to occur within the forked locus (GREEN 1955). Furthermore, mutant characterization shows that the forked locus is oriented with its 5' end closest to the telomere. This orientation is supported by the physical map of a phage walk spanning the Df(l)B263-20 deletion, which includes the forked locus (HICASHIJIMA et al. 1992).

The f allele is a member of the left pseudoallelic series and has an insertion in the forked locus that is the most distal of those characterized (Figure 1). The bristle phenotype off mutants is the weakest of all forked alleles characterized in this study, with macro- chaetae slightly bent (Figure 5). This mutation was cloned by screening a genomic library prepared from a f mutant stock with a 1.6-kb EcoRI-XhoI genomic fragment located between coordinates +2.6 to +4.1 of the chromosomal walk (Figure 1). Restriction and sequence analysis of clones isolated from the screening of this library indicate that the f allele has a full length 7.6-kb 412 retrotransposon (YUKI et al. 1986) inserted at the locus in the same orientation as forked. This insertion is located at nucleotide 3,914, within the first intron of the 5.4-kb transcript and upstream of the other forked transcripts (Figures 1 and 3).

f 17' mutants have a weak bristle phenotype, as only macrochaetae are affected (Figure 5). Clones contain- ing the forked locus of this allele were obtained by screening a genomic library made with DNA from f I" flies with a 3.8-kb EcoRI genomic fragment lo- cated between coordinates +9.2 to +14.0 of the chro- mosomal walk (Figure 1). Characterization of isolates obtained from this screen by restriction and sequence analysis reveals that a 2.9-kb hobo element (STRECK, MACGAFFEY and BECKENDORF 1986) is inserted in the same orientation as forked in this allele. The insertion site of the hobo element is within the fifth exon of the 5.6- and 5.4-kb RNAs and upstream of the smaller transcripts. The hobo element is inserted at nucleotide 10,608 (Figures 1 and 3).

The f hd allele was generated by P element-induced hybrid dysgenesis (D. DORER and A. CHRISTENSEN,

personal communication). There is no sign of reduced viability or fertility in the f hd stock, but the phenotype of the macrochaetae, microchaetae, and trichomes are severely affected (Figure 5). The bristle phenotype of females that are heterozygous for f and the deletion Dfll)B263-20 which includes all of the forked coding region is indistinguishable from homozygous f fe- males. Screening of a genomic phage library prepared with DNA that was isolated from a f hd stock with the 6.2-kb EcoRI-Sal1 genomic fragment provided clones containing the insertion. Sequence analysis of these clones shows that a 2.9-kb P element (O'HARE and RUBIN 1983) is inserted into the forked locus. The P element insertion is located 14 nucleotides upstream of a donor splice site for an exon that is common to the 6.4-, 5.6-, 5.4- and 2.5-kb transcripts at nucleotide 15,597 of the genomic sequence (Figure 3). This insertion is in the second exon of the 2.5-kb RNA and the seventh for the 5.4- and 5.6-kb transcripts. The P element is inserted in the same orientation as the forked gene (Figure 1).

Macrochaetae, microchaetae and trichomes are se- verely affected in the f j4' and f alleles (Figure 5). Both of these mutations are the result of transposable element insertions into an exon that has been deter- mined by screening mid-late pupal poly(A+) RNA of Canton S flies to be present in all forked transcripts. Analysis of clones isolated from a genomic library prepared from the DNA off j4' flies shows that a full length gypsy retrotransposon (MARLOR, PARKHURST and CORCES 1986) has inserted in an opposite orien- tation to the forked gene. This insertion is at nucleotide 16,517 of the genomic sequence (Figures 1 and 3). The screening of a genomic phage library prepared from the f mutant stock with the 6.2-kb EcoRI-Sal1 genomic fragment provided clones of the forked re- gion in this mutant. Analysis of the clones indicates that a full length springer retrotransposon (KARLIK and FYRBERG 1985) is inserted at nucleotide 16,513, four nucleotides upstream of the f j4' insertion (Figure 3). The springer insertion is oriented opposite to the

forked locus. The insertion sites for both the f 14' and f alleles are within the coding region containing the PRD repeat.

Unlike the f "' mutation which results from a gypsy insertion into an exon, insertion of the gypsy retro- transposon into an intron of the forked gene in f I , f and f alleles renders the expression of the bristle phenotype under the control of second site modifier loci. The gypsy insertion of the f I allele is located within an intron of the characterized transcripts, at nucleotide 15,673 of the genomic sequence (Figures 1 and 3) (HOOVER et al. 1992). This element is inserted in the same orientation as forked. The macrochaetae and microchaetae are mildly affected in the f allele (Figure 5).

Page 13: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophilaforked Gene 519

The f 5 mutation has a strong bristle phenotype. The macrochaetae and microchaetae are strongly gnarled in f flies (Figure 5). Southern analysis of genomic DNA cut with the EcoRI restriction endo- nuclease indicates that two insertions occur within the 23-kb Sal1 genomic fragment from the -4 to +19.4 interval of the genomic walk (data not shown). A genomic phage library was prepared from a f 5 stock and screened with the 6.2-kb EcoRI-Sal1 fragment. Sequence analysis of the clones obtained from this screen indicates that two full-length gypsy retrotrans- posons are inserted in the same orientation as the forked gene. One of the gypsy insertions is located at the identical site and orientation as that observed for the f I gypsy insertion (Figure 1). The second gypsy insertion of the f 5 allele is shown by Southern and sequence analysis to be located within the 3' tran- scribed untranslated region of all forked transcripts, at nucleotide 18,166 of the genomic sequence (Figures 1 and 3). BRIDGES (1938) isolated the f 5 allele in a screen 22 years after his initial isolation of the f I

mutation. Molecular characterization of the f m1 allele, a third independently derived mutation, indicates that it also contains a gypsy insertion at the identical site and orientation as that observed in the f I and f alleles. The f m1 allele arose spontaneously (T. GERA- SIMOVA, personal communication) and has a weak bristle phenotype. The insertion site common to all of these alleles appears to be a preferred location in the forked locus for gypsy mutagenesis. The TATA target site of these insertions is the same as that recognized by the gypsy insertion of the f allele (see below) and the second f gypsy retrotransposon inser- tion. The target site of the gypsy insertion in the f "' allele also consists of alternating pyrimidines and ad- enines, but is CACA.

The weak bristle phenotype off mutants is similar to that of the f I allele (Figure 5). Genomic phage libraries were prepared from DNA isolated from a f mutant stock. The library was screened with the 6.2- kb EcoRI-Sal1 genomic fragment from interval +13.0 to +19.2 of the chromosomal walk. Sequence char- acterization of the clones isolated from this screen show that a gypsy retrotransposon is inserted at nu- cleotide 15,821 in the genomic sequence, 148 bp downstream of the insertion site of the gypsy elements in the f I and f 5 alleles (Figures 1 and 3). This element is oriented in the same manner as the forked gene.

Effects of transposable element insertions on the expression of the forked gene: Northern analysis of poly(A+) RNA isolated during pupal stages of devel- opment, was carried out for each of the mutant stocks to study the correlation between RNA levels of the

forked gene and the phenotypic severity of each mu- tation. These experiments were also undertaken to further characterize the involvement of transposable

Canton S f 3 f '* 7 8 9 7 8 9 7 8 9 -"

A

(kb)

G 5 . 6 6.4

-5.4

- 2.5

-1.9

-1.1

FIGURE 6.-Analysis offorked mRNAs i n f ' andf"* alleles. (A) Poly(A+) RNA of pupal stages of development was analyzed for Canton S , f 3 and f ''' stocks. Numbers at the top of each lane signify the age of the flies in days after egg laying; day 7 corresponds to young pupae in Figure 1 , day 8 to mid pupae, and day 9 includes the mid-late pupae stages. The filter was probed with the 6.2-kb EcoRISalI genomic fragment. (B) The same filter was dehybridized and screened with the ras2 gene as a control for loading of amounts of RNA.

elements in eliciting a mutant phenotype. Northern blots were probed with the 6.2-kb EcoRI-Sal1 genomic fragment from coordinates +13.0 to +19.2 of the chromosomal walk. The filters were then stripped and reprobed with a DNA fragment containing the coding region of the Drosophila ras2 gene. This gene encodes a 1.6-kb transcript that is expressed at approximately constant levels throughout development and serves as a control for the amount of RNA loaded in each lane (MOZER et al. 1985).

The mutations have been separated into three groups according to severity of bristle phenotype and identity of the transposable element responsible for the mutation. The weak phenotypes off and f "' alleles are the result of similar alterations in the de- velopmental profile of forked RNAs. The 6.2-kb EcoRI-Sal1 genomic probe does not detect the 6.4-, 5.6- or 5.4-kb transcripts in RNA isolated from a f mutant stock (Figure 6, Table 1). In this allele, the 412 element insertion is within an intron of the 6.4- and 5.4-kb RNAs that is upstream of the coding regions for the 5.6-, 2.5-, 1.9- and 1.1-kb RNAs (Figure 1). The 412 insertion does not alter abun- dance of the 2.5-, 1.9- or 1.1-kb RNAs. As is the case for the f mutation, the 6.4-, 5.6- and 5.4-kb RNAs are also absent from the pupal RNA off flies in which a hobo element has inserted into an exon of these transcripts (Figures 1 and 3). Quantitation of transcript levels using ras2 as a standard shows that the accumulation of the 2.5- and 1.9-kb transcripts is

Page 14: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

520 K. K. Hoover, A. J. Chien and V. G. Corces

TABLE 1

Summary of forked phenotypes and quantitation of forked RNA levels in Canton S and mutant stocks

Nature of Effects on RNA levels

Strain Phenotype mutation 6.4 kb 5.6/5.4 kb 2.9 kb 1.9 kb 1 . 1 kb

cs Wild type 5 50 100 50 50 f ' Weak 412 ND ND 90 70 50 f '') Weak hobo ND ND 30 30 50 f Strong springer ND ND ND ND 370 f" f' Intermediate am 10 10 20 30 70 f" Intermediate DPSY 1 10 30 25 20 f ' Strong gYPV 1 10 10 20 30

Strong P ND ND ND 50 50

Data summarize the quantitation of forked mRNAs in the strains examined. Values represent the average of at least three different preparations. RNA levels were determined using a Molecular Dynamics Phosphorlmager or a densitometer. The value obtained for each recording was divided by that received for the ras2 control in the lane. A value of 100 has been assigned to the amount of 2.5-kb RNA in wild-type files to which other values have been normalized. ND indicates not detectable.

A

Canton S f 36a f" 7 8 9 7 8 9 7 8 9 "-

(kb) 6.4

5 5.6 - 5.4 - 4.0

- 2.5

- 1.9

-1.1

FIGURE 7.-Analysis f o r k d mRNAs in f 360 and f mutants. (A) Poly(A+) containing RNA (10 pg) of pupal stages of development for Canton S , f " and f was probed with the 6.2-kb EcoRISalI genomic fragment. Numbers at the top of each lane indicate the age of the flies in days after egg laying; day 7 corresponds to young pupae in Figure 1, day 8 to mid pupae, and day 9 includes the mid- late pupae stages. (B) This filter was screened with ras2 to control for equal loading of RNA.

affected in f "' flies. In this mutant, the amount of the 2.5-kb RNA is almost one-third and that of the 1.9 RNA is approximately two-thirds of the same size transcript in Canton S flies (Figure 6, Table 1).

The severity of the bristle phenotype in the f hd and f 36a mutations is the result of the insertion of a trans- posable element into an exon of the forked gene. The 6.4-, 5.6-, 5.4- and 2.5-kb RNAs are not detected when a Northern containing poly(A+) RNA isolated from pupal stages of development from f and f 36a stocks was probed with the 6.2-kb EcoRI-Sal1 frag- ment (Figure 7). In the f hd mutant, abundance of the 1.9- and 1. I-kb RNA is the same as that observed for wild-type flies (Table 1). The 5' most exon of these

1.1- and 1.9-kb RNAs is downstream of the insertion site of the P element in the f hd allele and these RNAs are the only forked transcripts within which the coding region has not been disrupted (Figure 1). A signal of approximately 1.1 kb is present in the f 36n mutant stock (Figure 7). This signal is amplified eightfold over that observed for the equivalent sized forked transcript in wild-type flies. Longer exposure of the membrane also detects a band corresponding to a transcript that is 4.2 kb. These forked RNAs detected in f 36a flies result from premature termination of transcription. The 4.2-kb RNA in this mutant is de- tected with probes that are specific to the 5.6- and 5.4-kb RNAs. A probe containing the 5' most exon of the 2.5-kb RNA hybridizes to a 1.1-kb RNA in the f 36a mutant stock (Figure 8B). Northern analysis using the long terminal repeat (LTR) of the springer ele- ment as a probe shows hybridization to the same RNAs as the 6.2 kb EcoRI-Sal1 fragment in addition to the expected RNAs of the element (data not shown). These results indicate that all forked RNAs are truncated within the 3' LTR of the springer ele- ment. These findings are supported by the presence of a polyadenylation signal on the nontranscribed strand of the springer LTR (KARLIK and FYRBERG, 1985).

The insertion of a gypsy retrotransposon into an intron of the forked gene in the f' mutation lowers the abundance of the five largest RNAs (Figure 8A). The 1.1-kb RNA of this mutant is elevated 1.4-fold over that observed in the Canton S stock (Table 1). The Northern blot in Figure 8 was probed with a fragment specific to the 2.5-kb transcript and found to hybridize to an 1 .I-kb RNA in f' flies (Figure 8B). This fragment hybridizes selectively to the 2.5-kb RNA in wild-type flies (see lane marked CS-8 in Figure 8B), suggesting that the 1.1-kb transcript arises by premature termination of the 2.5-kb RNA. T o ex- amine the role of the gypsy insertion in truncation of

Page 15: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophilaforked Gene 52 1

FIGURE 8.-Analysis of forked

thern was prepared from poly (A')

f ' f 369 f ' fm cs f' f a c,s- - cs- - I" mRNA inf'andf'6a mutants. A Nor- A 8 7 8 9 7 8 9 , 8 7 8 9 7 8 9 8 7 8 9 7 8 9 T (kb) R N A (IO pg) of pupal stages of

the 2.5-kb RNA, the Northern was stripped and re- probed with a fragment representing the 5' LTR of the gypsy element (Figure 8C). This probe hybridizes to a 1. I-kb RNA (marked with an arrowhead in Figure 8C) that migrates slightly faster than the 1.15-kb gypsy transcript present in Canton S, f I and f 36a samples in Figure 8C. In addition to the 1. I-kb forked transcript, other RNAs transcribed from the gyfisy element are also visible in Figure 8C, including a full-length 7.0- kb RNA, a spliced subgenomic 2.0-kb RNA that is specific for the envelope protein (A. P~LISSON and V. CORCES, unpublished), and several other transcripts that have not been characterized or are intermediates in the process of reverse transcription. These results indicate that in the f I allele, most of the 2.5-kb forked RNA is prematurely terminated within the LTR of the gypsy element and only low amounts of the RNA are being processed normally.

Examination of forked RNAs in the f mutation indicates that all transcripts are reduced in abundance when compared to Canton S (Table 1). When used as a probe, the 6.2-kb EcoRI-Sal1 fragment detects RNAs that are 4.0 and 4.2 kb in addition to the other

forked RNAs (Figure 9A). Northern analysis using genomic fragments that are proximal to the gypsy insertion as probes do not detect the 4.0- and 4.2-kb RNAs. Further hybridizations of this Northern with probes containing the first exons of the 5.6-, 5.4- and 2.5-kb RNAs indicate that reduced levels of normal sized transcripts are being produced and that some of the RNAs from each of these promoters are truncated to form transcripts that are 1.4 kb smaller than native

forked RNAs (Figure 9B). In agreement with the f ' data, the three truncated RNAs in f mutants are detected when the LTR of the gypsy retrotransposon is used as a probe (data not shown). Hybridization of the Northern in Figure 9 with a probe representing

. . 6.4 development isolated from Canton S.

F5.6 f' and f36" stocks. Numbers at the -5.4 top of each lane indicate the age of - 4.0 the flies in days after egg laying; day

7 corresponds to young pupae in Fig- - 2.5 ure 1, day 8 to mid pupae, and day 9

includes the mid-late pupae stages. The membrane was hybridized with the 6.2-kb EcoRISall genomic frag- ment (A); the exon A I (Figure 1) specific to the 2.5-kb RNA (B); the long terminal repeat of the gypsy ele- ment (C); and the r a d gene as an - internal standard (D). The arrow- head in panel C points to the 1. I-kb transcript that initiates in the forked promoter and terminates in the gypsy 5' LTR.

-1.9

cs f K cs f K

B A 7 8 9 7 8 9 " "

7 8 9 7 8 9 (kb)

G 5 . 6 6.4

L 5 . 4 - 4.0

.I) - 2.5

-1.9

- 1.1

""..I FIGURE 9.-The profile of forked mRNAs infK flies. A Northern

containing poly(A+) RNA of Canton S and f flies was probed with the 6.2-kb EcoRISalI genomic fragment (A), the A I exon specific for the 2.5-kb RNA (B), or the ras2 gene as a control for loading (C) . Numbers at the top of each lane indicate the age of the flies in days after egg laying; day 7 corresponds to young pupae in Figure 1, day 8 to mid pupae, and day 9 includes the mid-late pupae stages.

exon A3 that is common to all forked RNAs (Figure I), indicates that levels of the 1.9- and 1.1-kb RNA are reduced when compared to Canton S (data not shown). The coding region of these RNAs is down- stream of the gypsy insertion causing the f mutation (Figure 1).

The f mutation has a gypsy element insertion into the 3' transcribed, untranslated region of the forked gene in addition to a second gypsy insertion in the exact site and orientation as that for the f I allele (Figure 1). As predicted by the increased severity of the bristle phenotype in this mutant, the abundance

Page 16: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

522 K. K. Hoover, A. J. Chien and V. G. Corces

cs f 5 cs f 5

B A 7 a 9 7 a 9 7 8 9 7 8 9 ” ”

6.4 5.6 5.4 4.0

2.5

1.9

1 .I

FIGURE 10.-The profile offorked mRNAs inf’ flies. A North- ern containing p o l y (A+) RNA of Canton S andf’ flies was probed with exon A3, which hybridizes to allforked RNAs (A), the A 1 exon specific for the 2.5-kb RNA (B), or the ras2 gene as a control for loading (C). Numbers at the top of each lane indicate the age of the flies in days after egg laying; day 7 corresponds to young pupae in Figure 1, day 8 to mid pupae, and day 9 includes the mid-late pupae stages.

of the forked RNAs is further reduced from that observed for the f’ mutant (Figure IOA, Table 1). Like thefK mutant, some of the transcripts generated from the 5.6-, 5.4- and 2.5-kb promoters in the f 5

stock are prematurely terminated and give rise to RNAs that are 1.4 kb smaller than normal. In this mutant, truncation of forked RNA in the 5’ LTR of the downstream gypsy element would result in RNAs of the same size as wild type. A Northern containing f’ RNA was probed with exon A3 (Figure 1) present in all forked RNAs (Figure 10A). Use of this exon, which is located downstream of the gypsy insertion in the intron off’ (Figure I), allows for the examination of levels of those transcripts that are not truncated in this element. The blot was then stripped and reprobed with exon A1 (Figure I), a probe specific to the 2.5- kb RNA (Figure 10B). These results indicate that most of the 1.1-kb RNA that is detected in the f’ mutant when exon A3 is used as a probe is a truncated transcript generated from the 2.5-kb promoter.

DISCUSSION

Data presented here show that the forked locus encodes at least six different transcripts, and that the coding regions of these RNAs span 2 1 kb of genomic DNA. Analysis of spontaneous mutations in this study shows that changes in forked transcription result in disfigurement of chaetae and trichomes. Correlations exist between the presence or absence of specific forked-encoded mRNA and the severity of the mutant phenotype.

The terminally differentiated bristle cell is hollow and elliptical in cross section (RIBBERT 1972). The mechanical support necessary to generate this aniso- metric cell during mid to late stages of pupal devel- opment is presumably provided by nine to twelve regularly spaced fiber bundles that run in parallel to the long axis of the bristle shaft close to the cell surface. Electron microscopy of the developing bristle shaft in Oregon R flies shows that these fiber bundles are round, with a diameter of approximately 500 nm, and constitute 20% of the cross sectional area of the bristle (OVERTON 1967). The fiber bundle structures are temporary and only apparent between forty and fifty-three hours after puparium formation. The dis- appearance of these structures corresponds to the appearance of a dense cuticle around the periphery of the bristle shaft. Sclerotization and layers of cuticle provide the structural support necessary to maintain the terminally differentiated hollow bristle. Similar fibrillar structures are found during the development of scales in Ephestia (OVERTON 1966) and the footpads of Sarcophaga (RIBBERT 1972).

Electron micrographs of bristle rudiments from the severe f mutant, taken at 5-hr intervals of pupal development, fail to detect the temporary fibrillar structures seen in the bristle shafts of wild-type files (N. PETERSEN, personal communication). Immuno- staining of the bristle rudiments with polyclonal anti- bodies generated from the five amino terminal exons of the 2.5-kbforked mRNA localize to these fibrillar structures, but there is an absence of antibody binding in the bristle rudiments off 36n mutants (PETERSEN et al. 1993). This implicatesforked products in the for- mation and/or maintenance of the fibrillar structures.

The stretches of hydrophobic amino acids and an- kyrin-like repeats in someforked proteins may help to explain the role of these proteins during assembly of the fibrillar structures. The forked protein may be membrane associated, in which case, the presence of stretches of hydrophobic amino acids could represent regions of the protein that span the cell membrane or interact with membrane associated proteins. The an- kyrin-like repeats and/or other structural motifs within theforked proteins, could interact with proteins of the fibrillar complex and anchor them to the cell membrane. The ankyrin-like repeats present in the proteins encoded by the 6.4-, 5.6- and 5.4-kb mRNAs might serve a function analogous to the amino termi- nal domain of human erythrocyte ankyrin that con- tains 22 of these repeats. This domain binds to integral membrane proteins (DAVIS and BENNETT 1984; WEAVER, PASTERNACK and MARCHESI 1984) and tu- bulin (BENNETT and DAVIS 198 1). Theforked proteins that contain these repeats may likewise interact with the central microtubules and/or integral membrane proteins present on the cell membrane that are closely

Page 17: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The Drosophila forked Gene 523

associated with the fibrillar bundles. This model pre- dicts that mutations in the forked gene result in a failure of the fibrillar assembly to anchor properly to the cell membrane, compromising the structural sup- port. Stress created by cytoplasmic pressure within the bristle rudiment causes buckling of the shaft and results in a forked phenotype.

Alternatively, the forked products may be compo- nents of the fibrillar structures. The hydrophobic stretches and the ankyrin-like repeats of the forked proteins may be involved in protein/protein interac- tions necessary for the formation or packaging of the fibrillar structures. The ankyrin-like repeats of the 6.4-, 5.6- and 5.4-kb RNAs may serve a role in stabi- lizing the fibrillar structures by interacting with mi- crotubules or integral membrane proteins. The region of the forked proteins most likely to interact to form fibrillar structures is within the carboxy terminal amino acids common to the five largest forked prod- ucts. This model predicts that mutations in the forked gene would result in insufficient levels or an imbalance of materials required for normal fibrillar assembly.

Characterization of the effects of transposable ele- ment insertions on the expression of the forked gene clearly indicates that the 6.4-, 5.6- and 5.4-kb tran- scripts are necessary for normal bristle development. In the f allele, a 412 element is inserted into an intron of the 5.4- and 6.4-kb RNAs, upstream to the promoters of the other forked RNAs (Figure 1). The 6.4-, 5.6- and 5.4-kb RNAs are absent in f flies, but the other forked RNAs are unaffected. The f mutant has a weak bristle phenotype. Interestingly, elevated levels of the 2.5-, 1.9- and 1.1-kb forked RNAs are capable of rescuing the severe bristle phenotype of f 36a flies (PETERSEN et al. 1993). Higher levels of these RNAs, which encode proteins that do not have an- kyrin-like repeats, are sufficient for normal bristle development. The 12.8-kb BamHI-Sal1 genomic frag- ment from the interval +6.4 to +19.2 of the chro- mosomal walk (Figure 1) was cloned into the CaSpeR cloning vector, and introduced into a forked mutant stock through P element-mediated germline transfor- mation. When the construct is expressed at high levels, the mutant phenotype off 36a is rescued (PETERSEN et al. 1993), leading to the conclusion that the ankyrin- like repeats are dispensible, but abundance of the

forked protein is critical for normal bristle develop- ment.

The insertion of a transposable element into an exon of forked causes the severe phenotypes of the f hd

and f 36a alleles. In the f hd mutant, a P element is inserted in the same orientation as the forked gene. The insertion site of this element is near the 3' end of exon A2 (Figure 1) that is common to the 6.4-, 5.6-, 5.4- and 2.5-kb RNAs, which are absent in this mutant. This absence may result from effects of the

P element on transcription of the promoters of these RNAs or RNA instability resulting from insertion of the element. In contrast, levels of the 1.9- and 1.1-kb RNAs which are initiated downstream of this insertion site are unaffected. Results from Northern analyses off hd and f suggest that the severity of the forked phenotype correlates with levels of the 2.5-kb RNA. Southern analysis of DNA isolated from a f stock shows that the springer insertion responsible for this mutation is in exon A3 (Figure 1) common to all forked RNAs. This element is oriented opposite to forked. The RNA in f 36a mutants is truncated. This truncation is caused by recognition of a polyadenyl- ation signal located in the 3' LTR of the springer element.

In the f ',f5 and f alleles, the insertion of a gypsy retrotransposon into an intron of forked is responsible for the mutant phenotype. There is a reduction in the native levels of forked RNAs in these mutants. The mechanism(s) by which the gypsy element acts to re- duce the RNA levels in these mutants is unclear. The element may influence the expression of forked by truncating forked RNAs within the element, affecting stability of the forked RNA, or by altering transcrip- tion levels of the various forked promoters. Insertion of gypsy in f ', f and f also affects the structure of

forked RNAs by causing truncation of the transcript at a polyadenylation signal present in the 5' LTR of the gypsy retrotransposon. In the case of the f allele, this truncation leads to the appearance of 4.0 and 4.2- kb transcripts that are absent from the Canton S stock. Northern blots of the mRNA from f ', f and f" mutants analyzed with a probe specific for the pro- moter of the 2.5-kb transcript show hybridization to a 2.5-kb RNA and an additional 1.1-kb RNA. In contrast to forked RNAs in thef36a mutant, premature termination of forked RNAs within the LTR of the gypsy element in the f I , f and f" mutants is leaky, since low levels of native sized forked RNAs are present in the affected alleles. The fact that wild-type sized RNAs are detected inf' ,f5 and f " mutants indicates that during RNA processing, the gypsy element is excised with the intron into which it has inserted. The presence of the gypsy in unprocessed RNA may cause transcript instability that would further reduce abun- dance of the wild-type sized forked RNAs. Truncation of transcripts is also found in gypsy induced mutants of the hsp82 gene (DORSETT et al. 199 1) and the Hairy wing' (Hw' ) mutant (CAMPUZANO et al. 1985) where similarly oriented gypsy insertions into an intron or exon of the transcribed RNA respectively, result in truncation of the RNA from the host gene within the gypsy LTR. In addition, the gypsy retrotransposon may influence the normal transcription of forked RNAs. In thef" mutant, levels of the RNAs generated from the 1.9- and 1.1-kb promoter are reduced compared to

Page 18: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

524 K. K. Hoover, A. J. Chien a n d V. G . Corces

Canton S. The coding regions of these RNAs are downstream of the gypsy insertion site and are not disrupted by the insertion. Transcription of the larger forked RNAs may similarly be influenced by regulatory sequences within the gypsy element, as is the case for the yellow and achaete genes. In the case of yellow, interactions between sequences within the gypsy ele- ment that bind the su(Hw) protein and enhancers of the yellow gene in the y2 allele are responsible for the mutant phenotype (CORCES and GEYER 1991). Simi- larly, the truncated achaete RNA is elevated 5-20-fold over wild-type levels in the Hw' allele (CAMPUZANO et al. 1985), suggesting that the achaete promoter is being stimulated by regulatory sequences within the gypsy element. These observations, together with the results obtained from the analysis of gypsy-induced

forked alleles, suggest that this retrotransposon might cause mutant phenotypes by a combination of mech- anisms that include effects on the rate of transcription initiation, RNA stability, and termination of transcrip- tion.

We would like to thank S. PARKHURST and R. BURATOWSKI for help during the initial characterization of the forked locus, and N. PETERSEN for communicating results prior to publication. We are indebted to M. SEPANSKI for help with the electron microscope, and A. CHRISTENSEN, D. DERER, T. GERASIMOVA, A. KIM and D. KUHN for gifts of plasmids and Drosophila strains. Our special gratitude goes to M. GREEN for many iluminating discussions; without his persistence, we would never have cloned f '. In spite of this persistence, we never characterized f 3N. This work was sup- ported by U. S. Public Health Service grant GM35463 from the National Institutes of Health. The nucleotide sequence data re- ported here will appear in the EMBL, GenBank and DDBJ databases under the accession number X6987 1.

L I T E R A T U R E C I T E D

ANDREWS, B. J., and I. HERSKOWITZ, 1989 The yeast SW14 protein contains a motif present in developmental regulators and is part of a complex involved in cell-cycle dependent transcription. Nature 3 4 2 830-833.

AVES, S. J., B. W. DURKACZ, A. CARR and P. NURSE, 1985 Cloning, sequencing and transcriptional control of the Schizosaccharo- myces pombe cdc lO start gene. EMBO J. 4: 457-463.

AVIV, H., and P. LEDER, 1972 Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid-cellulose. Proc. Natl. Acad. USA 6 9 1408-1412.

BEACHY, P. A., S. L. HELFAND and D. S. HOGNESS, 1985 Segmental distribution of bithorax complex proteins during Drosophila development. Nature 313: 545-551.

BENNETT, V., and J. DAVIS, 198 1 Erythrocyte ankyrin: immuno- reactive analogues are associated with mitotic structures in cultured cells and with microtubules in brain. Proc. Natl. Acad.

BERLETH, T., M. BURRI, G. THOMA, D. BOPP, S. RICHSTEIN, G. FRIGERIO, M. NOLL and C. NUSSLEIN-VOLHARD, 1988 The role of localization of bicoid RNA in organizing the anterior pattern of the Drosophila embryo. EMBO J. 7: 1749-1 756.

BLOCHINGER, K., R. BODMER, J. JACK, L. Y . JAN and Y . N. JAN, 1988 Primary structure and expression of product from cut, a locus involved in specifying organ identity in Drosophila. Nature 333: 629-635.

BOURS, S., J. VILLABOS, P. R. BURD, K. KELLY and U. SIEBENLIST,

USA 78: 7550-7554.

1990 Cloning of a mitogen-inducible gene encoding a KB DNA-binding protein with homology to the re1 oncogene and cell cycle motifs. Nature 348: 76-80.

BREEDEN, L., and K. NASMYTH, 1987 Similarity between cell-cycle genes of budding yeast and fission yeast and the Notch gene of Drosophila. Nature 3 2 9 651-654.

BRIDGES, C. B., 1938 New mutations. Drosophila Inform. Serv. 9 47.

BROWN, N. H., D. L. KING, M. WILCOX and F. C. KAFATOS, 1989 Developmentally regulated alternative splicing of Drosophila integrin PS2a transcripts. Cell 5 9 185-195.

CAMPUZANO, S., L. CARRAMOLINO, C. V. CABRERA, M. RUIZ GOMEZ, R. VILLARES, A. BORONAT and J. MODOLELL, 1985 Molecular genetics of the achaete-scute gene complex of D. melanogaster. Cell 40: 327-338.

CAVENER, D. R., and S. C. RAY, 1991 Eukaryotic start and stop translation sites. Nucleic Acids Res. 1 9 3 185-3 192.

CHANG, C., J. KOKONTIS, C.-T. CHANG and S. LIAO, 1987 Cloning and sequence analysis of the rat ventral prostate glucocorticoid receptor cDNA. Nucleic Acids Res. 15: 9603.

CORCES, V. G., and P. K. GEYER, 1991 Interactions of retrotrans- posons with the host genome: the case of the gyfisy element of Drosophila. Trends Genet. 7: 86-90.

COUREY, A. J., D. A. HOLTZMAN, S. P. JACKSON and R. TJIAN, 1989 Synergistic activation by the glutamine-rich domains of human transcription factor S p l . Cell 5 9 827-836.

DAVIS, J. Q., and V. BENNETT, 1984 Brain ankyrin: a membrane associated protein with binding sites for spectrin, tubulin, and the cytoplasmic domain of the erythrocyte anion channel. J. Biol. Chem. 259: 13550-13559.

DORSETT, D., G. A. VIGLIANTI, B. J. RUTLEDCE and M. MESELSON, 1989 Alteration of hsp82 gene expression by the gypsy tran- sposon and suppressor genes of Drosophila melanogaster. Genes Dev. 3: 454-468.

DUBOULE, D., M. HAENLIN, B. GALLIOT and E. MOHIER, 1987 DNA sequences homologous to the Drosophila opa re- peat are present in murine mRNAs that are differentially expressed in fetuses and adult tissues. Mol. Cell. Biol. 7: 2003- 2006.

DUDICK, M. E., T. R. F. WRIGHT and L. L. BROTHERS, 1974 Developmental genetics of the temperature sensitive lethal of the suppressor offorked 1 , (l)sufj,l ts67g in Drosophila melanogaster. Genetics 76 487-510.

FRIGERIO, G., M. BURRI, D. BOPP, S. BAUMGARTNER and M. NOLL, 1986 Structure of the segmentation gene paired and the Drosophila PRD gene set as part of a gene network. Cell 47:

GREEN, M. M., 1955 Phenotypic variations and pseudoallelism at the forked locus in Drosophila melanogaster. Proc. Natl. Acad. Sci. 41: 375-379.

GREENWALD, I. S., 1985 lin-12, a nematode homeotic gene, is homologous to a set of mammalian proteins that includes epidermal growth factor. Cell 43: 583-590.

HIGASHIJIMA, S., T. KOJIMA, T . MICHIUE, S. ISHIMARU, Y . EMORI and K. SAIGO, 1992 Dual Bar homeobox genes of Drosophila required in two photoreceptor cells, R1 and R6, and primary pigment cells for normal eye development. Genes Dev. 6 50- 60.

HOOVER, K. K., T . I. GERASIMOVA, A. J. CHIEN and V. G. CORCES, 1992 Dominant effects of suppressor o f f fa i rywing mutations on gypsy-induced alleles of forked and cut in Drosophila melano- gaster. Genetics 132: 691-697.

HULTMARK, D., R. KLEMENZ and W. GEHRING, 1986 Translational and transcriptional control elements in the un- translated leader of the heat-shock gene hsp-22. Cell 44: 429- 438.

JACK, J., D. DORSETT, Y . DELOTTO and S. LIU, 1991 Expression of the cut locus in the Drosophila wing margin is required for

735-746.

Page 19: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

The D r o s o p h i l a f o r k e d Gene 525

cell type specification and is regulated by a distant enhancer. Development 113: 735-747.

KARLIK, C., and E. A. FYRBERG, 1985 An insertion within a variably spliced Drosophila tropomyosin gene blocks accumu- lation of only one encoded isoform. Cell 41: 57-66.

KIDD, S., T. J. LOCKETT and M. W. YOUNG, 1983 The Notch locus of Drosophila melanogaster. Cell 3 4 421-433.

KUNER, J., M. NAKANISHI, Z. ALI, B. DREES, E. GUSTAVSON, J. THIES, L. KAUVAR, T. KORNBERC and P. O’FARRELL, 1985 The engrailed locus of Drosophila melanogaster: molec- ular cloning. Cell 42: 309-3 16.

LAMARCO, K., C. C. THOMPSON, B. P. BYERS, E. M. WALTON and S. L. MCKNICHT, 1991 Identification of Ets- and Notch-re- lated subunits in GA binding protein. Science 253: 789-792.

LAUGHON, A., and M. P. SCOTT, 1984 Sequence of a Drosophila segmentation gene: structural homology with DNA-binding proteins. Nature 310: 25-3 1 .

LAUGHON, A,, A. M. BOULET, J. R. BERMINGHAM, R. A. LAYMAN and M. P. SCOTT, 1986 Structure of transcripts from the homeotic Antennapedia gene of Drosophila melanogaster: two promoters control the major protein coding region. Mol. Cell. Biol. 6 4676-4689.

LEES, A. D., and C. H. WADDINGTON, 1942 The development of the bristles in normal and some mutant types of Drosophila melanogaster. Proc. R. SOC. Ser. B 131: 87-110.

LEES, A. D., and L. E. R. PICKEN, 1945 Shape in relation to fine structure in the bristles of Drosophila melanogaster. Proc. R. SOC. Ser. B 132: 396-423.

Lux, S. E., K. M. JOHN and V. BENNETT, 1990 Analysis of cDNA for human eryhtrocyte ankyrin indicates a repeated structure with homology to tissue-differentiation and cell-cycle control proteins. Nature 344 36-42.

MARLOR, R. L., S . M. PARKHURST and V. G . CORCES, 1986 The Drosophila melanogaster gypsy transposable element encodes pu- tative gene products homologous to retroviral proteins. Mol. Cell. Biol. 6 1129-1 134.

MCLACHLAN, A., 1986 Drosophila forked locus. Mol. Cell. Biol. 6: 1-6.

MOUNT, S. M., 1982 A catalogue of splice junction sequences. Nucleic Acids Res. 10: 459-472.

MOUNT, S. M., M. M. GREEN and G . M. RUBIN, 1988 Partial revertants of the transposable element associated suppressible allele white-apricot in Drosophila melanogaster. Structures and responsiveness to genetic modifiers. Genetics 118: 221-234.

MOZER, B., R. MARLOR, S. PARKHURST and V. CORCES, 1985 Characterization and developmental expression of a Drosophila ras oncogene. Mol. Cell. Biol. 5: 885-889.

O’HARE, K . , and G . M. RUBIN, 1983 Structure of P transposable elements and their sites of insertion and excision in the Dro- sophila melanogaster genome. Cell 3 4 25-35.

OHNO, H., G . TAKIMOTO and T . W. MCKEITHAN, 1990 The candidate proto oncogene bcl-3 is related to genes implicated in cell lineage determination and cell cycle control. Cell 60:

OVERTON, J., 1966 Microtubules and microfibrils in morphoge- nesis of the scale cells of Ephestia kuhniella. J. Cell Biol. 2 9

OVERTON, J., 1967 The fine structure of developing bristles in wild type and mutant Drosophila melanogaster. J. Morphol. 122:

PARKHURST, S. M., and V. G. CORCES, 1985 forked, Gypsys, and suppressors in Drosophila. Cell 41: 429-437.

PATERSON, J., and K. O’HARE, 1991 Structure and transcription of the singed locus in Drosophila melanogaster. Genetics 129

PENG, M., and S. M. MOUNT, 1990 Characterization of Enhancer- of-white-apricot in Drosophila melanogaster. Genetics 126 1061- 1069.

991-997.

293-305.

367-380.

1073-1084.

PETERSEN, N. S., D.-H. LANKENAU, H. K. MITCHELL, P. YOUNG, and V. G . CORCFS, 1993 Forked proteins are components of fiber bundles present in developing bristles of Drosophila mel- anogaster. Genetics (in press).

POODRY, C. A,, 1980 Epidermis: morphology and development, pp. 443-497 in The Genetics and Biology of Drosophila, edited by M. ASHBURNER and T. R. F. WRIGHT. Academic Press, New York.

PROUDFOOT, N. J., and G. G . BROWNLEE, 1976 3’ Non-coding region sequences in eukaryotic messenger RNA. Nature 263:

REGULSKI, M., K. HARDING, R. KOSTRIKEN, F. KARCH, M. LEVINE and W. MCGINNIS, 1985 Homeo-box genes of the Antenna- pedia and Bithorax complexes of Drosophila. Cell 43: 71-80.

RIBBERT, D., 1972 Relation of puffing to bristle and footpad differentiation in Calliphora and Sarcophaga, pp. 153-178 in Results and Problems in Cell Differentiation, Vol. 4, edited by W. BEERMAN. Springer-Verlag, Berlin.

RUTLEDGE, B. J., M. A. MORTIN, E. SCHWARZ, D. THIERY-MIEG and M. MESELSON, 1988 Genetic interaction of modifier genes and modifiable alleles in Drosophila melanogaster. Ge- netics 119 391-397.

SAMBROOK, J., E. F. FRISCH and T. MANIATIS, 1989 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

SANGER, F., S. NICKLEN and A. R. COULSON, 1977 DNA sequenc- ing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.

SCHNEUWLY, S., A. KUROIWA, P. BAUMGARTNER and W. J. GEHR- INC, 1986 Structural organisation and sequence of the hom- eotic gene Antennapedia of Drosophila melanogaster. EMBO J. 5: 733-739.

SCHULTZ, K., and M. CARLSON, 1987 Molecular analysis ofSSN6, a gene functionally related to the SNF protein kinase of Saccha- romyces cerevisiae. Mol. Cell. Biol. 7: 3637-3645.

SCHWARTZ, R. M., and M. 0. DAYHOFF, 1978 Matrices for de- tecting distant relationships, pp. 353-358 in Atlas of Protein Sequence and Structure, Vol. 5, edited by M. 0. DAYHOFF. National Biomedical Research Foundation, Washington D.C.

SMITH, P. A., and V. G. CORCES, 1991 Drosophila transposable elements: mechanisms of mutagenesis and interactions with the host genome. Adv. Genet. 29: 229-300.

SPENCE, A. M., A. COULSON and J. HODGKIN, 1990 The product of fem-Z, a nematode sex-determining gene, contains a motif found in cell cycle control proteins and receptors for cell-cell interactions. Cell 60: 981-990.

SPRADLING, A. C., and A. P. MAHOWALD, 1979 Identification and genetic localization of mRNAs from ovarian follicle cells of Drosophila melanogaster. Cell 16: 589-598.

STRECK, R. D., J. E. MACGAFFEY and S. K. BECKENDORF, 1986 The structure of hobo transposable elements and their insertion sites. EMBO J. 5: 3615-3623.

SUZUKI, Y., N. YOSHUHISA, A. ABE and T . FUKASAWA, 1988 Gall Z protein, an auxiliary transcription activator for genes encod- ing galactose metabolizing enzymes in Saccharomyces cerevisiae. Mol.Cell. Biol. 8: 4991-4999.

TANDA, S., and V. G. CORCES, 1991 Retrotransposon-induced overexpression of a homeobox gene causes defects in eye morphogenesis in Drosophila. EMBO J. 1 0 407-417.

THOMPSON, C. C., T . A. BROWN and S. L. MCKNIGHT, 1991 Convergence of Ets- and Notch-related structural motifs in a heteromeric DNA binding complex. Science 253: 762-767.

WEAVER, D. C., G . R. PASTERNACK and V. T. MARCHESI, 1984 The structural basis of ankyrin function. 11. Identification of two functional domains. J. Biol. Chem. 259 6170-6175.

WHARTON, K., K. JOHANSEN, T . Xu and S . ARTAVANIS-TSAKONAS, 1985a Nucleotide sequence from the neurogenic locus Notch implies a gene product that shares homology with proteins

211-214.

Page 20: Effects of Transposable Elements on the Expression of ...Washing was at 50" in 0.1X SSC (150 mM NaCI, 15 mM sodium citrate pH 7.0) and 0.05% SDS for 1 hr at 50". The oligo-(dT) primed

526 K. K . H o o v e r , A. J. Chien and V. G. Corces

containing EGF-like repeats. Cell 43: 567-58 1. WHARTON, K. A. , B. A. YEDVOBNICK, V. G. FINNERTY and S.

ARTAVANIS-TSAKONAS, 1985b opa: a novel family of tran- scribed repeats shared by the Notch locus and other develop- mentally regulated loci in Drosophila melanogaster. Cell 4 0 55- 62.

YOCHEM, J., and I. GREENWALD, 1989 glp-1 and lin-12 genes implicated in distinct cell-cell interactions in C. elegans, encode similar transmembrane proteins. Cell 58: 553-563.

YUKI, S., S. INOUYE, S. ISHIMARU and K. SAIGO, 1986 Nucleotide sequence characteristization of a Drosophila retrotransposon 412 element. Eur. J. Biochem. 158: 403-410.

ZACHAR, Z. D. , D. DAVIDSON, D. GARZA and P. M. BINGHAM, 1985 A detailed developmental and structural study of the transcriptional effects of insertion of the copia transposon into the white locus of Drosophila melanogaster. Genetics 11 1: 495- 515.

Communicating editor: M. J. SIMMONS