Type I Procollagen COOH-terminal Proteinase Enhancer Protein



© 1994 by The American Society for Biochemistry and Molecular Biology, Inc. THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 269, No. 42, Issue of October 21, pp. 26280-26285, 1994

Printed in U.S.A.

Type I Procollagen COOH-terminal Proteinase Enhancer Protein: Identification, Primary Structure, and Chromosomal Localization of the Cognate Human Gene (PCOLCE)*

(Received for publication, July 21, 1994, and in revised form, August 11, 1994)

Kazuhiko Takahara‡, Efrat Kessler§, Luba Biniaminov§, Marina Brusel§, Roger L. Eddy¶, Sheila Jani-Sait¶, Thomas B. Shows¶, and Daniel S. Greenspan‡‖**

From the ‡Department of Pathology and Laboratory Medicine and the ‖Program in Cell and Molecular Biology, University of Wisconsin, Madison, Wisconsin 53706, the §Maurice and Gabriela Goldschleger Eye Research Institute, Tel Aviv University Faculty of Medicine, Sheba Medical Center, Tel Hashomer, 52621 Israel, and the ¶Department of Human Genetics, Roswell Park Memorial Institute, New York State Department of Health, Buffalo, New York 14263

Type I procollagen COOH-terminal proteinase (C-proteinase) enhancer, a glycoprotein that binds to the COOH-terminal propeptide of type I procollagen and enhances procollagen C-proteinase activity, was purified from mouse fibroblast culture media. Partial amino acid sequences obtained from proteolytic fragments were found to have identity with the deduced amino acid sequence of a cDNA clone of unknown function, previously isolated from a mouse astrocyte library. Sequences of mouse enhancer cDNA, obtained in the present study, predict a ~50-kDa, 468-amino acid protein that differs from the 43-kDa, 402-amino acid protein predicted by the previously reported astrocyte-derived clone. Human cDNAs encode an enhancer of 449 amino acids. Previous biochemical studies have found the mouse enhancer as a 55-kDa form, which is readily processed to 36- and 34-kDa forms, retaining full C-proteinase enhancing activity and the ability to bind the COOH-terminal propeptide. Data presented here show the 36-kDa form to correspond to the amino-terminal portion of the 55-kDa protein. This is the most conserved region between mouse and human enhancers, comprising two domains with homology to domains found in a number of proteases and proteins with developmental functions. Such domains are thought to mediate interactions between proteins. Mouse enhancer RNA is shown to be at highest levels in collagen-rich tissues, especially tendon. The human enhancer gene, PCOLCE, is localized to 7q21.3-q22, the same chromosomal region containing the type I collagen α2 chain gene, COL1A2.

* This work was supported by National Institutes of Health Grant GM46846 (to D. S. G.) and by Grant 89-00498/1 (to E. K. and D. S. G.) from the United States-Israel Binational Science Foundation, Jerusalem, Israel. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) L33799.

** To whom correspondence should be addressed: Dept. of Pathology, University of Wisconsin, 1300 University Ave., Madison, WI 53706. Tel.: 608-262-4676; Fax: 608-265-3301.

¹ The abbreviations used are: N- and C-propeptides, the amino- and carboxyl-terminal propeptides, respectively, of the procollagen molecule; C-proteinase, carboxyl-terminal proteinase; bp, base pair(s); PCR, polymerase chain reaction; PAGE, polyacrylamide gel electrophoresis; HPLC, high pressure liquid chromatography; CUB, complement-uegf-bmp-1.

Fibrillar collagen types I-III are synthesized as precursor molecules known as procollagens. These precursors contain amino- and carboxyl-terminal peptide extensions known as N- and C-propeptides,¹ respectively, which are cleaved, upon secretion of procollagen from the cell, to yield the mature triple helical monomer capable of association into highly structured fibrils (for reviews, see Refs. 1-3). Procollagen C-proteinase, the enzyme that cleaves the C-propeptide of type I procollagen, has been extensively purified from conditioned media of chick embryo tendons (4) and from media of cultured mouse fibroblasts (5, 6). Specificity of the C-proteinases from both sources was confirmed by demonstrating that cleavage of the pro-α chains of type I procollagen occurred at the correct peptide bond (4, 5). Both enzymes were also shown to cleave the C-propeptides of procollagen types II and III (4, 5). The properties of the chick and mouse C-proteinases were similar with the exception that in the mouse system, full expression of C-proteinase activity depended upon the presence of either a 55-kDa glycoprotein or smaller 36- and 34-kDa proteolytically processed forms of the same protein (6, 7). None of the three enhancer proteins exhibited intrinsic procollagen processing activity, but all were capable of enhancing the activity of the C-proteinase by approximately 1 order of magnitude (6).

The enhancer protein is far more abundant in the media of cultured mouse fibroblasts than is the C-proteinase. In addition, all three forms of the enhancer protein have been shown to bind specifically to the type I procollagen C-propeptide (6, 7). It has therefore been proposed that the enhancer may act through binding to procollagen, thereby inducing a conformational change that renders it a fitter substrate for cleavage by the C-proteinase (6). Consistent with a proposed role in procollagen processing, immunoblotting of extracts from various postnatal mouse and rat tissues has shown the enhancer proteins to be most abundant in connective tissues, with an apparent correlation between enhancer expression and extent of collagen deposition (8).

In this study, we present the full-length amino acid sequences for the human and mouse enhancer proteins, as deduced from partial amino acid sequencing and from characterization of cDNA clones. The chromosomal position of the cognate human gene PCOLCE is also presented. Possible implications of the domain structure of the enhancer proteins and of the chromosomal location of PCOLCE are discussed.

MATERIALS AND METHODS

Purification of Mouse Enhancer Proteins

Enhancer proteins were purified from the growth media of 3T6 mouse fibroblasts (9) as described (5-7). One difference was that for final purification of enhancer proteins by affinity chromatography on a column of Sepharose coupled to the C-propeptide of type I procollagen, the C-propeptide ligand was chick rather than human. It was isolated from 17-day-old chick embryo tendons by a published procedure (10) and was the kind gift of Dr. David Hulmes (University of Edinburgh, Scotland).

Preparation of Mouse Enhancer Fragments for NH₂-terminal Sequence Determination

Fragments from the 55-kDa Enhancer Protein. Unreduced samples of the partially purified 55-kDa enhancer fraction from the lysyl-Sepharose step (~100 µg of enhancer protein) were subjected to SDS-PAGE in a 10% separating gel. After rapid staining with Coomassie Blue (11), gel slices containing the 55-kDa enhancer protein were excised, washed with water, equilibrated with SDS-sample buffer containing 0.1 M dithiothreitol, and subjected to SDS-PAGE in a 7.5% acrylamide gel followed by rapid Coomassie staining as above. This step was required to separate the 55-kDa enhancer from two closely migrating proteins that were not well resolved without reduction. Gel pieces containing the enhancer protein were excised, washed with water, and applied to a 12% acrylamide gel for treatment with Staphylococcus aureus V8 protease (10 µg/ml, 30 min) performed as described by Cleveland et al. (12). Proteolytic fragments were separated by SDS-PAGE and electrotransferred to a polyvinylidene difluoride membrane (Immobilon-P, Millipore Corp.) (13). Three fragments with approximate molecular masses of 35, 30, and 27 kDa (designated V1, V2, and V3) were subjected to NH₂-terminal sequencing.

Fragments Derived from the 36-kDa Enhancer Protein. Propeptide-Sepharose-purified enhancer (~300 µg) was reduced and carboxymethylated as previously described (14). The modified protein (1 mg/ml in 50 mM ammonium bicarbonate) was digested with V8 protease (20 µg/ml, 6 h at 37 °C), and the resulting peptides were separated by reverse phase HPLC using an Aquapore RP300 column (4.6 × 30 mm, Applied Biosystems) and a Hewlett-Packard model HP1040 instrument. Two fractions, 50 and 56, were subjected to NH₂-terminal sequence analysis.

Fragments Derived from a Mixture of the 55- and 36-kDa Enhancer Proteins. A propeptide-Sepharose-purified enhancer fraction containing approximately 50 µg of each of the 55- and 36-kDa enhancer proteins in 400 µl of 50 mM Tris-HCl, pH 7.5, 0.15 M NaCl, 5 mM CaCl₂ was incubated with trypsin (L-1-tosylamido-2-phenylethyl chloromethyl ketone-treated, 2.5 µg/ml) (Sigma) for 15 min at room temperature. The reaction was stopped by adding trichloroacetic acid to 10%, and the resulting precipitate was washed with acetone, dissolved in SDS-sample buffer containing 0.1% dithiothreitol, and subjected to SDS-PAGE in a 12% separating gel followed by electroblotting to polyvinylidene difluoride (13). Two fragments with approximate molecular masses of 34 and 23 kDa (designated T2 and T4, respectively) were subjected to NH₂-terminal sequence analysis.

Amino Acid Sequence Analysis

The NH₂-terminal amino acid sequences of fragments V1-V3 were determined by automated Edman degradation on an Applied Biosystems 470A protein sequencer at the University of Wisconsin Biotechnology Center. NH₂-terminal sequencing of all other fragments was performed at the Weizmann Institute of Science, Rehovot, Israel, with either the ABI model 475A gas phase protein sequencer (fragments T2 and T4) or the ABI model 475A liquid phase protein sequenator (fragments 50A, 50B, 50C, and 56).

SDS-PAGE

SDS-PAGE was performed according to the method of Laemmli (15) with sample preparation as recommended by Hunkapiller et al. (11).

Polymerase Chain Reaction (PCR)

PCR amplification was performed with 0.2 µM each of specific oligonucleotide primers and with 2 ng of unamplified cDNA from mouse heart (Clontech) in a thermal cycler (Coy) with denaturation at 94 °C for 3 min, followed by 35 cycles of 94 °C (1 min), 53 °C (1 min), 72 °C (1.4 min), and a final incubation at 72 °C (8 min). Amplifications were performed in a final volume of 100 µl containing 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.01% (w/v) gelatin, 0.2 mM of each dNTP, 2.5 units of Taq polymerase (Perkin-Elmer), and 5% formamide.
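As an illustration only, the cycling program described above can be written down as data and its total hold time tallied. The `Step` class and the function name are hypothetical, and ramp times between temperatures are ignored, which is an assumption rather than something stated in the methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold of a thermal cycling program."""
    label: str
    temp_c: float
    minutes: float


# Values transcribed from the PCR paragraph.
INITIAL = Step("initial denaturation", 94, 3.0)
CYCLE = (
    Step("denature", 94, 1.0),
    Step("anneal", 53, 1.0),
    Step("extend", 72, 1.4),
)
FINAL = Step("final extension", 72, 8.0)
N_CYCLES = 35


def total_hold_minutes(initial, cycle, final, n_cycles):
    """Sum of programmed hold times; ramping between steps is not counted."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


if __name__ == "__main__":
    minutes = total_hold_minutes(INITIAL, CYCLE, FINAL, N_CYCLES)
    print(f"programmed hold time: {minutes:.0f} min")
```

Under these assumptions the program amounts to 3 + 35 × 3.4 + 8 = 130 min of hold time, so a real run on a cycler with finite ramp rates would take somewhat longer.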

DNA Sequence Analysis

Restriction fragments were subcloned into the vector pBluescript II KS+ (Stratagene), polymerase chain reaction products were cloned into the TA vector (Invitrogen), and sequences were obtained from double-stranded templates by dideoxy chain termination as previously described (16). Ends of subclones were sequenced using T3 and T7 primers (for pBluescript clones) or T7 primers (for TA clones), with internal portions of subclones made accessible to sequencing through introduction of appropriate deletions or use of oligonucleotide primers complementary to insert sequences. All reported sequences have been confirmed by sequencing of both strands.

Isolation of RNA

Total RNA was prepared from liver, kidney, heart, sternum, brain, tendon, and skin of 26-day-old mice. Tissues were frozen in liquid nitrogen, pulverized, and then homogenized in Tri Reagent (Molecular Research Center, Inc.) (17) followed by chloroform phase separation and successive isopropanol and ethanol precipitations, performed according to the manufacturer's instructions.

Chromosomal Assignment

The genomic clone was labeled with digoxigenin-11-dUTP (Boehringer Mannheim) by random priming (18). In situ hybridization on metaphase chromosome spreads was done as per the method of Trask (19). Digital images were obtained using a Nikon epifluorescence microscope coupled to a cooled CCD camera (Photometrics). Images were merged and enhanced using the IP Lab Spectrum image analysis software supported on an Apple Macintosh IIci computer.

RESULTS

NH₂-terminal Amino Acid Sequencing of Mouse Enhancer Fragments. When the intact 55-kDa enhancer protein, isolated without prior digestion with proteases, was subjected to automated Edman degradation, no products were detectable. In addition, no signal was obtained upon analysis of the 34-kDa tryptic fragment T2 (see "Materials and Methods"). These results suggested that the NH₂-terminal amino acid residue of the enhancer protein might be blocked and also raised the possibility, subsequently confirmed by nucleotide sequencing (see below), that the 36- and 34-kDa enhancer proteins represent the NH₂-terminal portion of the 55-kDa enhancer protein.

To obtain enhancer fragments with free amino ends, the 55-kDa protein was treated with S. aureus V8 protease, and the resulting fragments were separated by SDS-PAGE and electroblotted to polyvinylidene difluoride. Three fragments (V1, V2, and V3) were subjected to NH₂-terminal sequence analysis, and each yielded a distinct sequence (Fig. 1). Purified carboxymethylated 36-kDa enhancer was also digested with V8 protease. Products were separated by HPLC, and two fractions (50 and 56) were subjected to amino-terminal sequencing. Fraction 56 gave a single peptide sequence, whereas fraction 50 yielded three distinct sequences: 50A, 50B, and 50C (Fig. 1). An additional sequence was obtained by NH₂-terminal sequencing of tryptic fragment T4, derived from an enhancer preparation containing both the 55- and 36-kDa enhancer proteins. As evident from Fig. 1, the sequence of fragment T4 showed partial identity with that of fragment V2. Fragments V1 and 50A also shared portions of their sequences, whereas the sequences of fragments 50B, 50C, 56, and V3 were each unique.

Comparison of the combined amino acid sequences of fragments V2 and T4 with those on the National Center for Biotechnology Information BLAST network (nonredundant data base) found significant homology with the predicted amino acid sequence of a cDNA clone named p14, isolated from a library prepared from a mouse astrocyte cell line (20). Alignment of sequences of the various mouse enhancer peptide fragments with corresponding deduced amino acid sequences from cDNA p14 is shown in Fig. 1. In each case, residues successfully resolved by amino acid sequencing of the peptide fragments were found to be identical with those of the deduced protein product of the p14 cDNA. This clearly identified the mouse enhancer protein as the encoded product of clone p14.
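The matching of Edman-derived peptides to a deduced sequence, with X marking an unresolved cycle, can be sketched as a wildcard search. This is an illustration, not the BLAST comparison the authors ran. The function name is hypothetical, and the p14 segment used (residues 315-345) is a reading of the bold line in Fig. 1, which is an assumption because that line is partly garbled in the scan.

```python
import re


def find_peptide(peptide, protein, first_residue=1):
    """Locate a peptide in a protein, treating X as 'any residue'.

    Returns (start, end) in the protein's own residue numbering,
    or None if the resolved residues do not all match.
    """
    pattern = "".join("." if aa == "X" else re.escape(aa) for aa in peptide)
    match = re.search(pattern, protein)
    if match is None:
        return None
    return (match.start() + first_residue, match.end() + first_residue - 1)


# Deduced p14 residues 315-345 (assumed reading of Fig. 1).
P14_315_345 = "SQPAETPEASPATQATPVAPAAPSITCPKQY"

# Peptide sequences as reported for fragments T4 and V2.
T4 = "SQPAETPEASPAXQATPVAP"
V2 = "ASPAXQATPVAPAAPSITXPKQY"

if __name__ == "__main__":
    for name, peptide in (("T4", T4), ("V2", V2)):
        print(name, find_peptide(peptide, P14_315_345, first_residue=315))
```

With this reading, T4 covers residues 315-334 and V2 covers 323-345, which is the partial overlap between the two fragments noted in the text.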

Isolation of cDNAs Encoding the Full-length Human C-proteinase Enhancer and Comparison of the Deduced Amino Acid Sequence with That of the Mouse Protein. Preparatory to elucidating the structure of the human C-proteinase enhancer protein and obtaining probes for mapping its cognate gene, PCOLCE (see below), PCR was employed to produce a mouse enhancer cDNA probe with which to screen human cDNA libraries. Primers were designed corresponding to nucleotides 38-57 and 1344-1363 of the published p14 sequence (20). PCR was performed using unamplified mouse heart cDNA as template since p14 RNA transcripts had been reported to be expressed at relatively high levels in heart compared with some other tissues (21). A PCR product of the expected 1327-bp size was obtained. A 280-bp PstI fragment from near the 3'-end of the 1327-bp mouse PCR product was then prepared and used as a probe to screen a human placenta cDNA library. Three overlapping human cDNAs were obtained (Fig. 2), which together contained an open reading frame predicting a polypeptide consisting of 449 amino acid residues (Fig. 3).

[Fig. 1 alignment. Peptide sequences and the p14 residue numbers of each aligned region are listed; the bold p14 lines and three residues of peptide 50B are illegible in the source.]

Peptide   p14 residues   Peptide sequence
50A       44-74          GFPNLYPPNKKXIXTITVP
V1        44-74          SGYVASEGFPNLYPPXKKXIXTITVPEGQTV
50C       77-83          SFRVFDM
50B       96-126         VFAGSGTSGQRLGRFXGTFRPA[...]APGNQV
56        250-269        LLVQFVSDLSVTADGFSASY
V3        281-296        SALSPGEDVQRGPQSR
T4        315-345        SQPAETPEASPAXQATPVAP
V2        315-345        ASPAXQATPVAPAAPSITXPKQY

FIG. 1. Comparison of peptide sequences of the mouse C-proteinase enhancer protein with the corresponding deduced amino acid sequences of cDNA p14. Residues derived from amino-terminal sequencing of mouse C-proteinase enhancer peptide fragments, isolated after cleavage with either trypsin (peptide T4) or V8 protease (peptides V1, V2, V3, 50A, 50B, 50C, and 56), are compared with amino acid sequences deduced from published (20) p14 cDNA nucleotide sequences. The deduced p14 sequences are shown in bold face type beneath the corresponding peptide sequences and are numbered as in Lecain et al. (20). Residues that could not be resolved by amino acid sequencing of proteolytic fragments are indicated by X.

Comparison of the predicted amino acid sequence of the human enhancer protein with that published for the mouse protein (20) showed high homology between the two proteins, which was maintained up to mouse residue 390 (Fig. 3). However, from position 390 onward, no significant homology existed between the predicted mouse and human amino acid sequences. In contrast, corresponding DNA sequences downstream of this point were found to be quite conserved. It was apparent that insertion of an additional base at or near this point within the p14 sequence would cause a frameshift such that the predicted downstream mouse and human amino acid sequences would regain high homology up to amino acid 444 (421 of the human enhancer sequence). Furthermore, addition of another base at this latter position in the published mouse cDNA sequence would further extend homology between human and mouse enhancer proteins. To examine the possibility that bases may have been omitted at or near these points in the published sequence, primers corresponding to nucleotides 1107-1126 and 1453-1472 of the published mouse p14 sequence were synthesized and PCR performed; PCR products were cloned, and three independent clones were sequenced in their entirety. Except for 2 additional guanosine residues, all sequences obtained, which extended to just upstream of the polyadenylation signal, were found to be in full agreement with the published sequence. One of these additional guanosine residues was located between nucleotides 1213 and 1214, and the other was located between nucleotides 1375 and 1376 of the published sequence. The corrected mouse nucleotide sequence predicts a polypeptide of 468 amino acid residues, which, with the addition of a few small gaps, is homologous with the human protein throughout its entire length (Fig. 3). The mouse sequence presented here alters the sequence of the last 12 residues of the published mouse 402-residue protein (20) and extends the polypeptide chain by a stretch of 66 amino acid residues added to the carboxyl-end of the protein.

[Fig. 2: restriction map of clones KT11, KT9, and KT3; scale bar, 500 bp.]

FIG. 2. Partial restriction map of human C-propeptide enhancer protein cDNA clones. Human cDNA clones KT11, KT9, and KT3 (which contain 1474-, 1231-, and 924-bp inserts, respectively) are shown beneath a schematic representation of regions of the enhancer mRNA. Black and stippled boxes represent the signal peptide and "CUB" domain (see "Discussion") encoding regions, respectively. Wavy lines represent 5'- and 3'-untranslated regions. C indicates the position of cysteine codons. Those cysteines, which are found in the human but not in the mouse enhancer, are in bold face type. Vertical lines and arrowheads mark the areas where insertions are found in the mouse enhancer. A, ApaI; BI, BglI; BII, BglII; N, NcoI; P, PstI; Pv, PvuII; Sa, SacII; S, SmaI.
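The effect of a single omitted base on a predicted protein can be illustrated with a toy reading frame. The sketch below is not the p14 sequence: `TOY` and the insertion position are invented for illustration, and only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_base(dna, after_position, base="G"):
    """Insert one base after the given 1-based nucleotide position."""
    return dna[:after_position] + base + dna[after_position:]


# Hypothetical 30-nucleotide open reading frame (not from the paper).
TOY = "ATGGCTCCAGAAACTTGCCCAAAACAATAT"

if __name__ == "__main__":
    print(translate(TOY))                   # frame as originally read
    print(translate(insert_base(TOY, 12)))  # frame after one added G
```

Residues upstream of the insertion are unchanged, while every residue downstream is read in a new frame. The same logic underlies the correction described above, where the altered carboxyl terminus brings the mouse protein from 402 to 402 + 66 = 468 residues.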

The molecular weight calculated for the 468-amino acid mouse protein, deduced from the corrected cDNA sequence, is 50,216 (pI 8.51). Cleavage of the putative signal peptide (Fig. 3) would produce a 444-amino acid-long mature protein with a predicted molecular weight of 47,679 (pI 8.38). This value is in better agreement with the value of 55 kDa previously reported for the mouse enhancer protein (6) than would be the molecular weight of the 378-amino acid mature protein predicted by the published p14 sequence (20). The remaining difference between predicted and observed values for the molecular weight of the mouse enhancer protein is probably due to post-translational modification. Indeed, both the mouse and human enhancer proteins contain two conserved N-glycosylation sites (Fig. 3), and the mouse protein has been shown to be glycosylated (6, 7). The predicted molecular weights of the precursor and mature forms of the human enhancer protein are 47,946 (pI 7.34) and 45,523 (pI 7.36), respectively.
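A predicted molecular weight of this kind is normally the sum of average residue masses plus one water. The sketch below shows that calculation and the simple differences implied by the figures in the text. The mass table holds standard average residue masses; the paper does not say which table or program was used, so that choice, like the function name, is an assumption.

```python
# Average residue masses in daltons (standard values; which table the
# authors used is not stated, so this is an assumption).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(sequence):
    """Mass of an unmodified linear polypeptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Figures quoted in the text for the mouse enhancer.
PRECURSOR_RESIDUES, MATURE_RESIDUES = 468, 444
PRECURSOR_MW, MATURE_MW = 50216, 47679
OBSERVED_MW = 55000  # the 55-kDa form seen on gels

signal_peptide_length = PRECURSOR_RESIDUES - MATURE_RESIDUES
signal_peptide_residue_mass = PRECURSOR_MW - MATURE_MW
unexplained_mass = OBSERVED_MW - MATURE_MW

if __name__ == "__main__":
    print(f"signal peptide: {signal_peptide_length} residues, "
          f"{signal_peptide_residue_mass} Da")
    print(f"observed minus predicted mature mass: {unexplained_mass} Da")
    # Peptide V3 from Fig. 1, used only as a short worked input.
    print(f"V3 peptide mass: {average_mass('SALSPGEDVQRGPQSR'):.1f} Da")
```

The figures in the text imply a 24-residue signal peptide and roughly 7.3 kDa between the predicted mature mass and the 55-kDa gel estimate, the difference the authors attribute to post-translational modification.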

Expression Patterns of Enhancer RNA in Mouse Tissues. Previous immunoblotting experiments have shown the C-proteinase enhancer protein to be expressed at highest levels in type I collagen-rich connective tissues, particularly tendon (8). To ascertain that mRNA species represented by the sequences presented here are indeed expressed with the same tissue distribution as is the C-proteinase enhancer protein, an RNase protection assay with a riboprobe corresponding to mouse enhancer/p14 sequences was performed on RNA isolated from various mouse tissues. As seen in Fig. 4, all tissues examined showed some expression of enhancer RNA. However, comparison of relative intensities of the protected bands clearly indicated highest levels of specific RNA expression in tissues rich in type I collagen, such as skin and especially tendon. Expression in the brain, the tissue from which p14 was first isolated, was extremely low and detectable only after prolonged exposure of the gel (not shown). Expression in the liver, another tissue low in connective tissue content, was also very low. These results correlate well with the tissue distribution of the enhancer protein, further supporting identity of the p14 protein product and the mouse C-proteinase enhancer protein.

[Fig. 3: residue-by-residue alignment of the human and mouse enhancer precursor sequences; the alignment itself is not legible in the source.]

FIG. 3. Alignment of the predicted amino acid residues of the human and mouse procollagen C-proteinase enhancer precursor proteins. The amino acid residues of the human enhancer protein precursor predicted by the cDNA sequences reported here (accession number L33799) were aligned with mouse enhancer precursor amino acid sequence predicted by the cDNA sequences of Lecain et al. (20), amended by addition of 2 guanosine residues as explained in the text. Alignment was with the program GAP from the Genetics Computer Group (30) with a gap weight of 3 and a gap length weight of 0.1. Vertical arrows indicate the predicted cleavage sites between signal peptides and the mature human and mouse enhancer proteins, based on comparison with other secreted proteins (31). Arrowheads mark the beginnings of the two CUB domains (see "Discussion"). Those deduced mouse sequences, which align with mouse enhancer peptide fragments, are underlined. A horizontal arrow marks the point downstream of which amino acid residues differ from those reported by Lecain et al. (20) due to frameshifts from the insertion of the two guanosines.

Chromosomal Assignment of the Gene (PCOLCE), Which Encodes the Human Procollagen C-proteinase Enhancer Protein-Because of the apparent role of the C-proteinase enhancer in collagen biosynthesis, it was desirable to localize the chromosomal position of the cognate human gene, designated PCOLCE, as a first step toward examining its possible involvement in heritable connective tissue disorders. Toward this end, the 1,480-bp full-length insert of clone KT11 (Fig. 2) was used as a probe for hybridization to Southern blots containing EcoRI-digested genomic DNA from panels of human-mouse cell hybrids. Hybridization to strong human bands of 3.9 and 3.0 kilobases and to a weaker human band of 1.2 kilobases was observed, with cross-hybridization to a 7.1-kilobase mouse band (not shown). Examination of DNA from a total of 30 hybrid lines derived from 17 unrelated human cell lines and 4 mouse cell lines (22-24) showed the segregation of PCOLCE to correlate with the distribution of human chromosome 7 (Table I). Of the cell hybrids examined, the hybrid JSR-17S, with the 7/9 translocation 7pter→7q22::9p24→9pter and no intact chromosome 7, scored positive for PCOLCE, further localizing PCOLCE to the pter→q22 region of chromosome 7. To more precisely localize the PCOLCE gene, fluorescence in situ hybridization analysis was performed on metaphase chromosome spreads. To obtain a probe for fluorescence in situ hybridization analysis, a human placenta genomic library was screened with the insert of cDNA clone KT11 (Fig. 2). A positive genomic clone was obtained and shown by Southern blot analysis to have the same restriction patterns as those observed for the endogenous PCOLCE gene in human genomic DNA (not shown). Employing the uncut genomic clone as a probe for fluorescence in situ hybridization analysis, double fluorescent signals were found only at 7q21.3-22 in 12 out of 20 metaphase spreads (60%) examined and on no other chromosome (data not shown), localizing PCOLCE to this region.
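The percent discordancy values in Table I and the 60% figure for the metaphase spreads are simple proportions. The following minimal Python sketch illustrates the arithmetic, assuming that percent discordancy is the number of discordant hybrids divided by all hybrids scored for that chromosome; this definition is the conventional one and is not spelled out in the text. The function name `percent_discordancy` is hypothetical, and the three example rows are the counts for chromosomes 5, 7, and 21 as read from Table I.

```python
def percent_discordancy(plus_plus: int, minus_minus: int,
                        plus_minus: int, minus_plus: int) -> int:
    """Percent of scored hybrids in which gene and chromosome disagree.

    Assumed definition (not stated in the paper):
    100 * discordant / (concordant + discordant), rounded to an integer.
    """
    discordant = plus_minus + minus_plus
    total = plus_plus + minus_minus + discordant
    if total == 0:
        raise ValueError("no hybrids were scored for this chromosome")
    return round(100 * discordant / total)


# Counts as read from Table I: (+/+), (-/-), (+/-), (-/+)
table_rows = {
    "5": (14, 6, 5, 5),
    "7": (18, 11, 0, 0),
    "21": (14, 2, 5, 9),
}

for chromosome, counts in table_rows.items():
    print(f"chromosome {chromosome}: {percent_discordancy(*counts)}% discordant")

# Fraction of metaphase spreads with double signals at 7q21.3-22
fish_percent = 100 * 12 / 20
print(f"spreads with signal: {fish_percent:.0f}%")
```

Under this definition only chromosome 7 gives a discordancy of zero, which is the basis for the assignment.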

DISCUSSION

In this study, we have shown identity of mouse procollagen C-proteinase enhancer protein sequences with the deduced amino acid sequence of a previously described (20) cDNA clone, named p14, isolated from a mouse astrocyte-derived library. The p14 sequence, of hitherto unknown function, had been


TABLE I
Segregation of PCOLCE with human chromosomes in EcoRI-digested human-mouse cell hybrid DNA

              Concordant no.      Discordant no.
              of hybrids          of hybrids
Chromosome    (+/+)    (-/-)      (+/-)    (-/+)      % Discordancy(a)
1             10       10         6        0          23
2             12       6          7        4          38
3             15       5          4        5          31
4             10       8          8        3          38
5             14       6          5        5          33
6             14       8          5        3          27
7             18       11         0        0          0
8             14       5          5        6          37
9             6        10         11       0          41
10            15       2          4        9          43
11            11       9          4        2          23
12            12       3          7        8          50
13            11       6          8        5          43
14            14       5          4        6          34
15            14       5          4        6          34
16            9        8          10       3          43
17            15       2          3        8          39
18            10       3          9        8          57
19            9        11         10       0          33
20            13       4          6        7          43
21            14       2          5        9          47
22            8        8          11       2          45
X             10       5          3        ...        32

(a) The hybrids were characterized by karyotypic analysis and by mapped enzyme markers (22-24). Translocation chromosomes, with no intact chromosome present, were not tabulated for the percentage discordancy.

isolated through subtraction library methodology in an attempt to identify mRNAs specifically expressed by an astroglial cell line, D19 (21). However, in situ hybridization of brain sections (20) did not detect p14 sequences in adult mouse astrocytes. Instead, p14 expression was localized to the choroid plexus and leptomeninges, areas in the brain of relatively high connective tissue content. In the present investigation, we show expression of enhancer/p14 RNA in several mouse tissues, including brain. However, highest expression levels of the transcript were clearly associated with tissues high in collagen content, particularly tendon. This finding is consistent with previous immunoblotting data that showed the enhancer protein to be most abundant in tendons and other connective tissues rich in collagen type I (8). Therefore, it further supports the identity of sequences reported here and p14 sequences as representing the C-proteinase enhancer protein. The low levels of enhancer/p14 RNA in brain found in the present study and the previous demonstration that the enhancer antigen is barely detectable in mouse brain extracts (8) argue against a brain-specific role for the p14 protein product. The low levels of enhancer protein (8) and RNA found in sterna raise the question of whether the enhancer actually complements C-proteinase activity in the maturation of type II collagen in vivo.

Previous studies have demonstrated three active forms of the enhancer protein: a 55-kDa glycoprotein and two smaller proteins, with approximate molecular masses of 36 and 34 kDa, generated from the 55-kDa form by partial proteolysis during purification. Like the 55-kDa enhancer protein, the smaller enhancer forms were glycosylated and retained both C-proteinase-enhancing activity and ability to specifically bind the C-propeptide of type I procollagen (6, 7). Peptide analysis in the present study shows the 36-kDa form to be derived from the NH2-terminal portion of the 55-kDa form. Interestingly, this portion of the 55-kDa molecule is composed almost exclusively of two copies of a domain (Figs. 2 and 3), which is found conserved, to varying extents, in complement components C1r/C1s and in a family of proteins that appear to play roles in developmental processes such as embryogenesis and organogenesis (25). This shared domain, which has been referred to as the CUB (Complement-Uegf-Bmp-1) domain (25), appears to mediate protein-protein interactions. For example, in the C1r/C1s system, CUB domains are essential for the formation of C1r/C1s tetramers and the subsequent binding of these tetramers to C1q (26). Thus, specific binding of the type I procollagen C-propeptide by the 36-kDa enhancer, which contains little or no sequences other than the two CUB domains, is consistent with the proposed role of CUB domains in protein-protein interactions. Since antibodies to the 36-kDa enhancer form also recognize the 34-kDa form (6), both are probably derived from the same NH2-terminal portion of the 55-kDa precursor. This suggests that the 34-kDa enhancer binding of the C-propeptide may also be mediated by CUB domains. Although it seems likely that C-propeptide binding by the 55-kDa enhancer protein also occurs via CUB domains, this remains to be shown. The 36- and 34-kDa enhancer forms, which apparently lack sequences beyond those of the CUB domains, are as effective in enhancing C-proteinase activity as is the 55-kDa enhancer (6). This supports the suggestion (6) that the stimulating effect of the enhancer may be due solely to enhancer-propeptide binding, which could then induce a conformational change rendering the C-proteinase cleavage site more accessible to the enzyme.

FIG. 5. Comparison of the domain organization of the 55-kDa enhancer protein and CUB-containing metalloproteases. BP10/SpAN, the blastula-restricted proteins BP10 (33) and SpAN (34) of sea urchin; UVS.2, the UVS.2 protein (35) found in anterior embryonic tissues of Xenopus embryos; Tolloid, the Drosophila dorsal-ventral patterning gene product Tolloid (28); BMP-1, the bone morphogenic protein 1 of humans (27). Black, stippled, hatched, and open boxes represent metalloprotease, CUB, EGF-like, and non-homologous domains, respectively.

Immediately downstream of the CUB domains in the 55-kDa enhancer is a region devoid of cysteine residues that requires four gaps in the human sequence for optimum alignment with mouse sequences (see Fig. 3). This domain is the region least conserved between mouse and human enhancers and is likely a linker that connects the more highly conserved upstream and downstream domains, and in which proteolytic cleavages occur to produce the 36- and 34-kDa enhancer forms. The carboxyl-terminal domain of the 55-kDa form contains six cysteines in humans and four cysteines in mouse. However, it is relatively conserved between the two species and thus may subserve a specific function. Comparison of the predicted amino acid sequences of the carboxyl-terminal and linker domains with data bases did not detect similar sequences and thus has not provided further insight into the nature of these domains.
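The gapped alignment referred to above was produced with GAP using a gap weight of 3 and a gap length weight of 0.1 (Fig. 3 legend). As an illustration of how such penalties shape an alignment, the following Python sketch scores a global alignment with an affine gap cost. It is not the GAP program itself: the function name `affine_global_score` is hypothetical, the penalty form (gap weight plus gap length weight times gap length) is an assumption about how the two parameters combine, and the match/mismatch scores of 1 and 0 stand in for the amino acid comparison table that a real alignment program would use.

```python
def affine_global_score(seq_a: str, seq_b: str,
                        gap_weight: float = 3.0,
                        gap_length_weight: float = 0.1,
                        match: float = 1.0,
                        mismatch: float = 0.0) -> float:
    """Best global alignment score under an affine gap penalty.

    Assumed penalty for a gap of length k: gap_weight + gap_length_weight * k.
    Three tables are kept (Gotoh-style dynamic programming):
      M - last column aligns two residues
      A - last column aligns a residue of seq_a with a gap
      B - last column aligns a residue of seq_b with a gap
    """
    neg_inf = float("-inf")
    n, m = len(seq_a), len(seq_b)
    open_cost = gap_weight + gap_length_weight  # first position of a new gap
    extend_cost = gap_length_weight             # each further gap position

    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    A = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    B = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        A[i][0] = -(gap_weight + gap_length_weight * i)
    for j in range(1, m + 1):
        B[0][j] = -(gap_weight + gap_length_weight * j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            M[i][j] = pair + max(M[i - 1][j - 1], A[i - 1][j - 1], B[i - 1][j - 1])
            A[i][j] = max(M[i - 1][j] - open_cost,
                          B[i - 1][j] - open_cost,
                          A[i - 1][j] - extend_cost)
            B[i][j] = max(M[i][j - 1] - open_cost,
                          A[i][j - 1] - open_cost,
                          B[i][j - 1] - extend_cost)

    return max(M[n][m], A[n][m], B[n][m])


# One two-residue gap (cost 3.2) is preferred over two one-residue gaps (cost 6.2).
print(affine_global_score("AAAATTAAAA", "AAAAAAAA"))
```

Because opening a gap costs far more than extending one, a scheme of this kind introduces gaps sparingly, so a region that still needs four gaps for optimum alignment is a strong indication of low conservation.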

Of great potential interest is the question of whether the structure of the enhancer protein may provide some insight into the nature of the C-proteinase, for which peptide and nucleotide sequences are as yet unavailable. The C-proteinase, like the enhancer protein, binds to the C-propeptide of type I procollagen (5, 6) and, under certain circumstances (E. Kessler, unpublished data), may bind the enhancer protein itself. Therefore, it seems reasonable to ask whether these protein-protein interactions might also be mediated by CUB domains. Some developmentally important proteins that contain CUB domains, including human BMP-1 (27) and Drosophila Tolloid (28), also contain the conserved protease domain found among zinc-dependent metalloendoproteases of the astacin family (25, 29). Typically, these developmentally associated proteases, whose substrates have yet to be determined, are protein mosaics composed (in addition to the metalloprotease domain) of two to five CUB domains and one or more calcium-binding EGF-like domains (see Fig. 5). Thus, if interactions of the C-proteinase with its procollagen substrate and with the enhancer (like those of the enhancer protein itself) are mediated by CUB domains, then the calcium-dependent C-proteinase could well be a member of the family of proteins whose structures are depicted in Fig. 5.

We have mapped PCOLCE, the gene that encodes the enhancer protein in humans, to the q21.3→q22 region of chromosome 7. It is of interest that COL1A2, the gene coding for the pro-α2(I) chain of type I procollagen, maps to this same region. The proximity of these two genes raises the possibility that expression of type I collagen and the C-proteinase enhancer, a protein seemingly involved in type I collagen biosynthesis, might be coregulated. This proximity of the two genes also implies that the possibility of mutations in PCOLCE, as well as in COL1A2, should be considered in heritable diseases of connective tissue that map to 7q21.3→q22.

REFERENCES

1. Kivirikko, K. I., and Myllyla, R. (1984) in Extracellular Matrix Biochemistry (Piez, K. A., and Reddi, A. H., eds) pp. 83-118, Elsevier Science Publishing Co., Inc., New York
2. Prockop, D. J., and Kivirikko, K. I. (1984) N. Engl. J. Med. 311, 376-386
3. Kuhn, K. (1987) in Structure and Function of Collagen Types (Mayne, R., and Burgeson, R. E., eds) pp. 1-42, Academic Press, Inc., Orlando, FL
4. Hojima, Y., van der Rest, M., and Prockop, D. J. (1985) J. Biol. Chem. 260, 15996-16003
5. Kessler, E., Adar, R., Goldberg, B., and Niece, R. (1986) Collagen Relat. Res. 6, 249-266
6. Kessler, E., and Adar, R. (1989) Eur. J. Biochem. 186, 115-121
7. Adar, R., Kessler, E., and Goldberg, B. (1986) Collagen Relat. Res. 6, 267-277
8. Kessler, E., Mould, P. A., and Hulmes, J. S. (1990) Biochem. Biophys. Res. Commun. 173, 81-86
9. Todaro, G. J., and Green, H. (1963) J. Cell Biol. 17, 299-313
10. Pesciotta, D. M., Curran, S., and Olsen, B. R. (1982) in Immunochemistry of the Extracellular Matrix (Furthmayer, H., ed) Vol. I, pp. 91-109, CRC Press, Boca Raton, FL
11. Hunkapiller, M. W., Lujan, E., Ostrander, F., and Hood, L. E. (1983) Methods Enzymol. 91, 227-236
12. Cleveland, D. W., Fisher, S. G., Kirschner, M. W., and Laemmli, U. K. (1977) J. Biol. Chem. 252, 1102-1106
13. Matsudaira, P. (1987) J. Biol. Chem. 262, 10035-10038
14. Peretz, M., and Burstein, Y. (1989) Biochemistry 28, 6549-6555
15. Laemmli, U. K. (1970) Nature 227, 680-685
16. Lee, S.-T., Smith, B. D., and Greenspan, D. S. (1988) J. Biol. Chem. 263, 13414-13418
17. Chomczynski, P. (1993) BioTechniques 15, 532-535
18. Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13
19. Trask, B. (1991) Methods Cell Biol. 35, 1-35
20. Lecain, E., Zelenika, D., Laine, M.-C., Rhyner, T., and Pessac, B. (1991) J. Neurochem. 56, 2133-2138
21. Rhyner, T. A., Lecain, E., Mallet, J., and Pessac, B. (1990) J. Neurosci. Res. 27, 144-152
22. Shows, T. B., Sakaguchi, A. Y., and Naylor, S. L. (1982) in Advances in Human Genetics (Harris, H., and Hirschorn, K., eds) Vol. 12, pp. 341-452, Plenum Publishing Corp., New York
23. Shows, T., Eddy, R., Haley, L., Byers, M., Henry, M., Fujita, T., Matsui, H., and Taniguchi, T. (1984) Somatic Cell Mol. Genet. 10, 315-318
24. Shows, T. B. (1983) in Isoenzymes: Current Topics in Biological and Medical Research (Rattazzi, M. C., Scandalios, J. G., and Whitt, G. S., eds) Vol. 10, pp. 323-339, A. R. Liss, New York
25. Bork, P., and Beckmann, G. (1993) J. Mol. Biol. 231, 539-545
26. Arlaud, G. J., Colomb, M. G., and Gagnon, J. (1987) Immunol. Today 8, 106-110
27. Wozney, J. M., Rosen, V., Celeste, A. J., Mitsock, L. M., Whitters, M. J., Kriz, R. W., Hewick, R. M., and Wang, E. A. (1988) Science 242, 1528-1534
28. Shimell, M. J., Ferguson, E. L., Childs, S. R., and O'Connor, M. B. (1991) Cell 67, 469-481
29. Dumermuth, E., Sterchi, E. E., Jiang, W., Wolz, R. L., Bond, J. S., Flannery, A. V., and Beynon, R. J. (1991) J. Biol. Chem. 266, 21381-21385
30. Devereux, J., Haeberli, P., and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
31. von Heijne, G. (1983) Eur. J. Biochem. 133, 17-21
32. Greenspan, D. S., Lee, S.-T., Lee, B. S., and Hoffman, G. G. (1991) Gene Expression 1, 29-39
33. Lepage, T., Ghiglione, C., and Gache, C. (1992) Development 114, 147-164
34. Reynolds, S. D., Angerer, L. M., Palis, J., Nasir, A., and Angerer, R. C. (1992) Development 114, 769-786
35. Sato, S. M., and Sargent, T. D. (1989) Dev. Biol. 137, 135-141