8/17/2019 Collagen Gene Structure
COLLAGEN GENE STRUCTURE
The collagen genes, like most eukaryotic genes, are large, multiexon
genes interrupted at several points by noncoding DNA sequences of unknown function
called introns.54 Thus, the eukaryotic gene coding for a protein is much larger than
would be predicted from the amino acid sequence of the final protein. An example of
the complexity of collagen genes is the intron-exon organization of the type VII
collagen gene (COL7A1) (see Fig. 14-3), which consists of 118 exons, the largest
number in any published gene.55
During the early stages of gene expression, the entire gene is transcribed into a
high-molecular-weight precursor mRNA, which is a complementary copy of the template
strand of the double-helical DNA. The precursor mRNA undergoes posttranscriptional
modifications, such as capping and polyadenylation, and the introns are removed by
splicing to yield a linear, uninterrupted coding sequence with 5′ and 3′ untranslated
flanking regions. The mature mRNA is then transported into the cytoplasm and
translated in cells such as dermal fibroblasts.
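
As a rough illustration of the splicing step described above, the following Python sketch removes the introns from a toy precursor transcript. The sequence, the exon coordinates, and the function name splice are hypothetical and chosen only for illustration; capping and polyadenylation are not modeled.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments (0-based, end-exclusive coordinates) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy precursor (hypothetical data): exons in upper case, introns in lower case.
pre_mrna = "AUGGCU" + "guaaguuuag" + "CCAGGU" + "guccccacag" + "UAA"
exons = [(0, 6), (16, 22), (32, 35)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCUCCAGGUUAA
```

The mature transcript is much shorter than the precursor, which mirrors the point that a eukaryotic gene is larger than its protein product would suggest.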
Complementary and genomic DNA clones corresponding to the α chains of various
collagen molecules have been described in different laboratories. These clones have
been extensively characterized and hybridized with the corresponding mRNA
molecules to examine the temporal and topographic expression of these genes by
Northern blot and in situ hybridization techniques, respectively. In addition, their
nucleotide sequence homology with the corresponding amino acid sequences in the
collagen α chains in various animal species has been determined, thus allowing
estimates of the evolutionary conservation of certain segments within the collagens.56
A high degree of conservation implies a region of functional importance within a
protein molecule.
Recombinant DNA technology has also facilitated determination of the precise
chromosomal location of the different collagen genes within the human genome (Table
14-2). With few exceptions, the collagen genes are widely scattered throughout the
human genome. For example, the genes coding for the two constituent polypeptide
chains of type I collagen, α1(I) and α2(I), are located on separate chromosomes, 17
and 7, respectively. Knowledge of the precise chromosomal location of the genes
coding for collagens in human skin will allow development of polymorphic markers
within the genes and in the flanking DNA for use in genetic linkage studies. In addition,
sophisticated mutation detection strategies, based on scanning of the genes, have led
to identification of a large number of mutations in different collagen genes with
characteristic phenotypic consequences (see, eg, Refs. 3 to 5, 31, 32, and 57).
TRANSLATION OF COLLAGEN POLYPEPTIDES
Under physiologic conditions, collagen molecules spontaneously assemble into
insoluble fibers. This observation presented a logistic problem because it was difficult
to visualize how a collagen molecule could be synthesized inside the cell and then
secreted into the extracellular space without premature assembly of the molecules
into insoluble fibers. This problem was solved by the demonstration that collagen is
initially synthesized as a larger precursor molecule, procollagen, which is soluble under
physiologic conditions.
The precursor polypeptides of procollagen, so-called prepro-α chains, are synthesized
on the ribosomes of the rough endoplasmic reticulum in fibroblasts and related cells
(Fig. 14-4). This initial translation product, the prepro-α chain, contains an
amino-terminal signal (or leader) sequence. The signal sequence, a characteristic feature of
many secreted proteins, is rich in hydrophobic amino acids and probably serves as a
signal for attachment of the ribosomes to the membranes of the rough endoplasmic
reticulum and vectorial release of the nascent polypeptides into the cisternae of the
rough endoplasmic reticulum. During the transmembrane transport of the
polypeptides, the signal sequence is enzymatically removed in a reaction catalyzed by
signal peptidase (Table 14-3). The polypeptides released inside the lumen of the rough
endoplasmic reticulum are termed pro-α chains and are larger than collagen α chains
because they contain additional peptide sequences at both ends of the molecule.
Various studies show that these noncollagenous extension peptides are different from
the collagenous portion of the molecule in that they do not have glycine in every third
position, they are relatively poor in proline and hydroxyproline, and they are relatively
rich in acidic amino acids. These extension peptides also contain cysteine and
tryptophan, which are not present, for example, in type I and II collagens. These
noncollagenous domains often contain motifs homologous with sequences found as
building blocks in other extracellular matrix proteins, such as the fibronectin III
domain, von Willebrand factor A domain, and thrombospondin N-terminal domain
sequences. It should be noted that, in spite of their homology, these domains do not
have the functional characteristics of the original proteins.
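
The "glycine in every third position" criterion that separates the collagenous domain from the extension peptides can be sketched as a simple check. This is a minimal illustration only: the two one-letter amino acid sequences and the function name is_gly_x_y are hypothetical, and the check assumes the segment is read in frame starting from a glycine.

```python
def is_gly_x_y(segment: str) -> bool:
    """Return True if every third residue, starting with the first, is glycine (G)."""
    if not segment or len(segment) % 3 != 0:
        return False
    return all(segment[i] == "G" for i in range(0, len(segment), 3))


collagenous = "GPPGAPGPQGEK"  # hypothetical triple-helical stretch
extension = "MCWDEAGSLLCE"    # hypothetical noncollagenous stretch (contains C and W)

print(is_gly_x_y(collagenous))  # True
print(is_gly_x_y(extension))    # False
```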
POSTTRANSLATIONAL MODIFICATIONS OF POLYPEPTIDE CHAINS
After the assembly of amino acids into prepro-α chains on the ribosomes, the
polypeptides undergo several modifications before the completed collagen molecules
are deposited into extracellular fibers (Fig. 14-4). Most of these modification reactions
are catalyzed by specific enzymes, and many of the modifications are characteristic of
the biosynthesis of collagen (Tables 14-3 and 14-4). These events are often termed
posttranslational modifications to emphasize that these reactions are not directly
controlled by the information in the mRNA but occur in the polypeptide chains after
the amino acids have been linked together by peptide bonds. The posttranslational
modification reactions of
A. The α1(VII) polypeptide consists of a triple-helical domain that
contains imperfections or interruptions in the Gly-X-Y repeat sequence, including a
39-amino acid "hinge" region. The central collagenous domain is flanked by
noncollagenous segments, the amino-terminal NC-1 domain and the carboxyl-terminal
NC-2 domain. The NC-1 domain consists of submodules with homology to known
adhesive proteins, as indicated below the molecule. The NC-2 domain has a segment
of homology with the Kunitz proteinase inhibitor molecule. The type VII collagen
gene consists of a total of 118 exons (vertical blocks), which are separated from each
other by intervening noncoding intronic sequences (horizontal lines). The sizes (in
base pairs) of the introns (above the lines) and the exons (below the blocks) are
indicated. (Modified from Christiano et al,5 with permission.)
A. Intracellular steps
1. Translation of prepro-α chains on the ribosomes of the rough endoplasmic reticulum
2. Cleavage of the signal sequence
3. Hydroxylation of selected prolyl and lysyl residues
4. Glycosylation of some hydroxylysyl residues
5. Formation of interchain disulfide bonds
6. Formation of triple helices
B. Secretion of procollagen
C. Extracellular modifications
1. Cleavage of peptide extensions by specific proteases
2. Fibril formation
3. Cross-linking of collagen fibrils by deamination of hydroxylysine and lysine
residues to give aldehydes, followed by cross-link formation by reaction of either
(a) 2 aldehydes or (b) 1 aldehyde and 1 ε-amino group on adjacent molecules
TABLE 14-3
Characteristics of Enzymes Participating in the Biosynthesis of Collagen
Column headings: ENZYME; SUBSTRATE; PRODUCT; COFACTORS AND COSUBSTRATES

ENZYME
Lysyl hydroxylase
Collagen galactosyltransferase
Collagen glucosyltransferase
Protein disulfide isomerase*
Procollagen N-proteinase (ADAMT
Lysyl oxidases

SUBSTRATE
Nascent prepro-α chains
Prolyl residue in X-Pro-Gly sequence in pro-α chains†
Prolyl residue in Pro-Hyp-Gly sequence in pro-α chains†
Lysyl residue in Lys-Gly, Lys-Ser, or Lys-Ala sequence in pro-α chains†
Hydroxylysine in pro-α chains†
Galactosyl-O-hydroxylysine in pro-α chains†
Cysteine residues in the extensions of pro-α chains
Procollagen or pN-collagen
Procollagen or pC-collagen
Lysyl or hydroxylysyl residue in fibrillar collagen

PRODUCT
Pro-α chains
4-Hydroxyproline
3-Hydroxyproline
Hydroxylysine
Gal-O-hydroxylysine
Glc-gal-O-hydroxylysine
Synthesis of Hydroxylysine
Hydroxylysine is another amino acid characteristic of collagen (Fig. 14-7). During the
intracellular synthesis of procollagen, hydroxylysine serves as an attachment site for
the sugar residues and is critical to the formation of cross-links that stabilize the
extracellular collagen matrix. Free hydroxylysine is not incorporated into nascent
polypeptide chains, but certain lysyl residues in peptide linkages are converted to
hydroxylysine. The hydroxylation reaction is catalyzed by an enzyme, lysyl
hydroxylase, which, like the prolyl hydroxylases, requires O2, Fe2+, α-ketoglutarate, and
ascorbate as cofactors and cosubstrates. Despite certain similarities, the prolyl and
lysyl hydroxylases are different enzyme proteins and products of different genes. Lysyl
hydroxylase, like prolyl-4-hydroxylase, hydroxylates only lysyl residues in the Y
position of the repeating Gly-X-Y sequence. Even though hydroxylation of lysyl residues
in collagen is initiated while the polypeptides are still being assembled on the ribosomes, the
formation of hydroxylysine continues for some time after the release of peptides from
the ribosomes.
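
The positional specificity of lysyl hydroxylase, which acts on lysyl residues in the Y position of Gly-X-Y triplets, can be sketched as a search for candidate sites. The sequence and the function name y_position_lysines are hypothetical; the sketch assumes the chain is read in frame from the first glycine and ignores the other factors that influence whether a given lysine is actually hydroxylated.

```python
def y_position_lysines(chain: str) -> list:
    """Return 0-based indices of lysine (K) residues in the Y position of Gly-X-Y triplets."""
    return [
        i + 2
        for i in range(0, len(chain) - 2, 3)
        if chain[i] == "G" and chain[i + 2] == "K"
    ]


chain = "GPKGKPGAKGPP"  # hypothetical sequence: GPK | GKP | GAK | GPP
print(y_position_lysines(chain))  # [2, 8]
```

The lysine at index 4 sits in the X position of its triplet, so it is not reported as a candidate.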
The extent to which lysyl residues in the Y position of the Gly-X-Y sequence are
hydroxylated varies greatly among the collagens from different sources. In particular,
type I and type III collagens are frequently hydroxylated to a lesser degree, so that
these collagens normally contain approximately four to eight hydroxylysine residues
per 1000 amino acids, whereas type II collagen has approximately four to five times as
many hydroxylysine residues. In type IV collagen, most of the lysyl residues are
converted to hydroxylysine. This variation can be explained in part by differences in
the actual number of lysyl residues that are available for maximal hydroxylation in the
pro-α chains. The variation in the hydroxylation of lysyl residues can also be explained
by the fact that the nature of the amino acids in the X position and in the adjacent
triplets influences the rate at which lysyl residues in Gly-X-Lys sequences are
hydroxylated. In addition, lysyl hydroxylase does not hydroxylate a collagen substrate
that is in the triple-helical conformation.5 Therefore, folding of the pro-α chains into a
triple helix terminates the intracellular formation of hydroxylysyl residues. The rate at
which pro-α chains of different genetic types fold into the triple helix varies, and, in
particular, the rate of triple-helix formation is considerably slower during the synthesis
of type II procollagen than of type I procollagen. Thus, folding of the procollagen
polypeptides into their triple-helical conformation can regulate the amount of
hydroxylysyl residues in newly synthesized collagen molecules.
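
The hydroxylysine figures quoted above imply an approximate range for type II collagen. The short calculation below simply multiplies the extremes of the two ranges given in the text; it is a back-of-the-envelope estimate, not a measured value, and the variable names are illustrative.

```python
type_i_iii = (4, 8)     # hydroxylysine residues per 1000 amino acids (from the text)
fold_increase = (4, 5)  # type II has about four to five times as many (from the text)

low = type_i_iii[0] * fold_increase[0]
high = type_i_iii[1] * fold_increase[1]
print(f"Type II collagen: roughly {low} to {high} hydroxylysine residues per 1000 amino acids")
```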
The critical importance of lysyl hydroxylation of collagen is attested to by the
deficiency of lysyl hydroxylase in patients with the scoliotic (type VI) form of EDS.