nucleotide sequence and genome organization of foot-and-mouth
TRANSCRIPT
volume 12 Number 16 1984 Nucleic Acids Research
Nucleotide sequence and genome organization of foot-and-mouth disease virus
S.Forss, K.Strebel, E.Beck and H.Schaller
Microbiology, University of Heidelberg, Im Neuenheimer Feld 230, D-6900 Heidelberg, FRG
Received 21 May 1984; Revised and Accepted 24 July 1984
ABSTRACTA continuous 7802 nucleotide sequence spanning the 94* of foot and mouthdisease virus RNA between the 5 -proximal poly(C) tract and the 3'-terminatpoty(A) was obtained from cloned cpNA, and the total size of the RNAgenome was corrected to 8450 nucteotides. A long open reading frame wasidentified within this sequence starting about 1300 bases from the 5 end of theRNA genome and extending to a termination codon 92 bases from its polyade-nylated 3 end. The protein sequence of 2332 ammo acids deduced from thiscoding sequence was correlated with the 260 K FMDV polyprotein. Its pro-cessing sites and twelve mature viral proteins were interred from protein data,available for some proteins, a predicted cleavage specificity of an FMDVencoded protease for Gtu / GlyCThr, Ser) linkages, and homologies to relatedproteins from poliovirus. In addition, a short unlinked reading frame of 92codons has been identified by sequence homology to the polyprotein initiationsignal and by in vitro translation studies.
INTRODUCTION
Foot and mouth disease viruses (FMDV) or aphthoviruses are picornaviruses
which are the causative agent of an aggressive and economically important
disease of cloven-footed farm animals. Their virion contains a single-stranded
RNA genome of about 8 kb with a small protein (VPg) covalently attached to
its 5' end, an internal poly(C) tract, and a poly(A) sequence at the 3' end.
This RNA is of positive polarity and can act directly as a messenger RNA.
Protein sythesis involves post-translational cleavage of a 260 K potyprotein
which is encoded between a single major translation initiation site next to the
poly(C) tract and the 3' end of the RNA. This mode of protein synthesis is
common to all picornaviruses (1) and makes them interesting model systems for
studying the mechanisms of translation initiation and of protein maturation by
specific proteolytic cleavages (2).
To obtain more information about the FMDV genome, its control signals and its
gene products we have cloned cDNA copies of the viral RNA from strain O-jK
and determined their nucleotide sequence. Using a set of overlapping clones we
obtained a continuous sequence of 7915 nucleotides representing the 3' proxi-
mal long L segment of the FMDV RNA which is thought to contain all its cod-
ing information. In a previous publication we had already determined the initia-
© IRL Press Limited, Oxford, England. 6587
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
lion sites for polyprotein synthesis and a long open reading frame encoding the
structural proteins of FMDV (3). The present work extends this reading frame
by the coding sequence for the non-structural proteins, predicting a single
translation product of 2332 ammo acids which could be correlated in many
parts with known protein data trom FMDV induced proteins. In addition, we
report and discuss a sequence of 713 bases preceding the polyprotein gene to
the poly(C) tract.
MATERIALS AND METHODS
Enzymes
Restriction endonucleases were isolated and purified by standard procedures or
purchased from Boeringer Mannheim or Biolabs. Avian Myeloblastosis Virus
reverse transcriptase was a gift from W. Keller, Heidelberg. Nuclease S1 from
Aspergillus oryzae, calf intestinal phosphatase, and T4 polynucleotide kinase
were from Boehnnger Mannheim, and terminal deoxynucleotidyl transferase was
from BRL. [ a / - 3 2 P] -ATP was prepared according to (4).
FMDV RNA
FMDV O.|K RNA, isolated after 7 and 64 passages in BHK-cells of the same
field-isolate of the virus (Kaufbeuren 1966-67) was provided by K. Strohmaier
and W. Keller. The virus was plaque-purified at the beginning of the passages
and again after passage 16.
FMDV cDNA clones
The clones FMDV-2735 and -2615 were constructed essentially as described
by (3) using single-stranded restriction fragments (mapping at pos. 3662-3752
and 3792-3882, cf. Figure 2) trom the existing cDNA clone FMDV-715 as prim-
ers for cDNA-synthesis. The double-stranded cDNA was provided with dC-
tails and annealed to dG-tailed pBR322, linearized at the Pstl site. Construc-
tion of clones FMDV-3214a and -3214c was carried out using a modified
procedure that avoids second strand synthesis and G/C tailing (cf. Figure 1):
cDNA synthesis was primed with an Aval/Hindlll restriction fragment (Pos.
692-743) from cDNA clone FMDV2735 which had been isolated from a 6*
potyacrytamide strand separation get (8). Reverse transcription was followed
by RNase treatment (pane. RNase 20ug/ml, 30 min., 37°C). Oligo-dA tails of
70-80 nucleotides were added to the 3'-ends of the cDNA using Terminal
Deoxynucleotidyl Transferase. This ohgo-dA tailed cDNA was annealed to the
DNA fragment complementary to the primer fragment. By this a partially dou-
ble stranded molecule was obtained having a single stranded oligo-dA tail at
its 3'-side and a double stranded portion at its 5'-side reconstituting the origi-
6588
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
nal Hindlll site ot FMDV pos. 743. This DNA molecule was then annealed to
the vector pUC9 (6), that had been cleaved by Pstl, dT-tailed, Hindlll cleaved
and puntied by agorose gel electrophoresis. The DNA mixture was ligated and
transformed into competent E.coli C600 cells. Clones were screened for FMDV
inserts by colony hybridization with [32P]-labelled FMDV RNA as described
elsewhere (7). Clones FMDV-2735 and -2615 were derived from RNA isolated
from low passaged FMDV (7 passages), all other clones were derived from a
virus-isolate after 64 passages in BHK cells.
Nucleotide sequence analysis
Restriction fragments were endlabelted and chemically degraded by the base-
specific cleavage methods according to (8). 5' endlabelling was used through-
out except for the 3' end of the genome where 3' endlabelling was employed
at a Hpall site located 68 nucleotides in front of the poly(A) tail. Thin
sequencing gels (0,04 cm), either 40 or 100 cm long, were dried prior to print-
ing, according to (9) to improve resolution of the bands in the sequencing lad-
ders.
Computer analyses
The derived nucleotide sequences were entered into a data base, where the
information was stored and processed using the computer programs of (10).
The alignment program from Kriiger and Osterburg (unpublished) was based on
the algorithm from (11). The homology comparison in Figure 4b was based on
such computer-derived alignments scoring "similar ammo acids" as one third of
identical ammo acids.
RESULTS AND DISCUSSION
Cloning of FMDV cDNA
A set of cDNA clones from FMDV strain OiK (FMDV-144, -512, -703, -715,
-1034, -1448) has been described which cover the two thirds of the FMDV
genome from the VP3 coding region to the 3' end (5, cf. Figure 2). To obtain
cloned cDNA copies upstream of the VP3 gene single-stranded restriction frag-
ments from the existing cDNA clones (indicated by I and II in Figure 2) were
used in two steps (7, 12) to prime cDNA synthesis close to the missing parts
of the FMDV genome (see Methods and Figure 1). As a result the cloned part
of the FMDV genome was extended into the 3'-end of the poly(C) tract.
Nucleotide sequence analysis
In Figure 2 exact map positions of the cloned cDNA inserts used for nucleotide
sequence analysis and the sequencing strategy are depicted. The methods of
(8) were followed for 5' [3 2P]-endlabelling and subsequent partial chemical
6589
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
HinduRNA 51 __ _ = _ * '
cDNA 3' ~ *** •™
Transformation —
TTTTT
PUC9
F ig .1 : Strategy tor cloning 5'-terminal sequences of FMDV O i K . Theminus strand of an Aval/Hmdll l fragment (pos . 692-743, hatched box)from cDNA clone pFMDV2735 was used to speci f ica l ly prime cDNAsynthesis on FMDV RNA (step I). The RNA template was hydro lyzedusing pane. RNase, o l i go -dA tails were added to the 3 ' -ends of thecDNA and the cDNA was annealed to the plus strand of the Aval/Hmdll lfragment (empty box ) , complementary to the primer fragment (step II).The part ial ly double stranded and d A - tai led cDNA was annealed topUC9 vector DNA (6) that had been Pstl c leaved , dT- ta i led and Hmdlllc leaved in this order (step III) and then transferred into competent E.coliC600.
degradation of restr ict ion fragments. In order to obtain unambiguous sequence
data most regions were sequenced in several independent runs, and the entire
sequence was analyzed on both DNA strands (Figure 2). Using over laps of at
least hundred nucleot ides between subctones, a continuous sequence was g e n -
erated over the 7915 c loned nucleotides start ing with 11 C residues from the
poly(C) tract and ending in 102 A residues as determined in clone
pFMDV-3214c and pFMDV-512, respect ively (Figure 3). A l l cleavage sites
piedicted in this sequence for the restr ict ion enzymes used during the sequence
analysis were also observed experimental ly, except for a Clal site at position
5026. This site over laps two Mbol sites which are known to be targets for
adenosine methylat ion in E. col i (13).
Minor heterologies in the sequence were observed in over lapping regions from
different c lones (Figure 3, 14). Such point mutations are known to occur with
high probabi l i ty in populat ions of FMDV RNA (15) and in other viral RNAs
because of the intr insical ly imprecise mechanism of RNA repl icat ion (16). This
variation was part icular ly obvious when cDNA from moderately and highly pas -
saged virus (7 and 64 passages in 6HK ce l l s , respect ive ly) was compared. No
subst i tut ions were found in 550 bases over lapping the clones FMDV-2735 and
-2615 from the same low passaged virus (14).
The nucleot ide sequence
The FMDV genome is divided by the poly(C) tract into two parts of different
size and funct ion. The small (S) segment to the 5 ' side (approx. 400 nuc leo -
tides (17)) is probably involved only in initiation of viral RNA repl icat ion, and
the large (L) segment 3 ' of the poly(C) contains a l l the protein coding informa-
6590
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
ryib'f"i ** • p». ' I » " ' m \ I P12 | n* I m | »ni | raoi
; 'viv'W i; ;'—;B 't' " t y r t i 1 i " i ' i ' i i f t i / f t ^ ' i ^ I'T #N»tfU{'.itf"' ' ' 4 ^ 1_ . , - , i , . ™-j
Fig.2: Physical map ot the FMDV genome, FMDV cDNA clones andstrategy for the nucleotide sequence analysis. The top part shows thephysical and the gene map of the FMDV RNA. Positions of the restric-tion sites of endonucleases used for 5' end-labeling are indicated on thesecond line. Below that, the restriction fragments used to prime cDNAsynthesis (I and II) are illustrated as open arrows. Horizontal arrowsshow direction and extent ot individual sequencing runs, grouped accord-ing to the clones they originate from. Dashed lines represent portions ofclones or sequencing runs from which no information was obtained. Theanalysis of the clones FMDV-1034 and FMDV-144 has already beenreported (26) and only the extent ot the sequenced areas are indicatedhere. The restriction enzymes are represented by the letters: A, Aval;B, BamHi; C, Clal; E, Haelll; F, Hind; H, Hhal; I, Hphl; N, Hindlli; P,Hpali; R, EcoRl; T, Taql; U, Pvuli; X, Xhol; Xb, Xbal.
CD
o'
O)
3)a>inCD0)a
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
tion (18). The sequence ot 7802 nonhomopolymeric nucleotides shown in Figure
3, represents the complete primary structure ot the L segment. Assuming
additional 150 bases tor poly(C) and 400 tor the S segment this corresponds to
94 % of the FMDV genome indicating that the size of the FMDV genome is
about 8500 nucleotides, i.e. 500 nucleotides longer than estimated from sizing-
gets by (17).
This size correction is also supported by the sizing and the partial sequence
analysis of cDNA copies that cover the as yet uncloned 5'-terminat part of the
FMDV genome, i.e. the S segment and the poly(C) tract (7). So tar, all
attempts to clone this missing part of the viral RNA have been unsuccessful.
These difficulties seem to be related to the internal poly(C) tract, which, alt-
hough readily copied into cDNA (7), is probably highly unstable in E. coli, as
shown for other (GtC)-homopolymers longer than 30 basepairs (19). In accor-
dance with this notion we have only been able to clone 11 C residues from the
3' proximal part of the poly(C) tract which still contains interdispersed non-C
nucleotides. The sequence obtained ...GCT(C)i 1AAG... was confirmed by
direct sequencing of the cDNA extending into the poly(C) tract (7). It is diffe-
rent, but shows similarities to the sequence ..T(C)2AUUCCAAG... determined
for the 3' end ot a T1 oligo nucleotide containing the poly(C) tract from FMDV
O-V1 and A61 (17).
Coding regions and translation initiation sites
From the kinetics of the appearance of FMDV specific gene products it has
been predicted that FMDV RNA contains a single long open translational
reading frame for a 260 K polyprotein, which in turn is a precursor for all gene
products. Our sequence reveals such a reading frame of 7035 nucleotides
(pos. 766 to 7800) with a first possible initiaton codon for the polyprotein at
position 805 (see Figure 3). The coding region is preceded by 724 nucleotides
of known sequence and by some additional 550 nucleotides of yet uncloned
RNA comprising the poly(C) tract and the S fragment. It is followed by sev-
eral atop codons in all three reading frames leaving 92 nucleotides untranslated
in front of the poly(A) tail. Usually the 5' proximal AUG is used to initiate
translation in an eucaryotic mRNA, which suggests that the ribosomes first
recognize the capped 5' end ot the RNA and then traverse downstream until an
AUG is encountered (20). Clearly this model is not applicable to FMDV since
translation of the major primary gene product starts at position 805 and to a
lesser extent also at position 889 (3), which are the ninth and tenth AUGs
downstream trom the poly(C) (see Figure 3). These two start sites differ from
other AUG codons in the sequence in that they are preceded at a short dis-
6592
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
9 2 fCTnAACTTTTACCCTCCTTCCCGACGTAAAASCCAGGTAACCACAACCITCAAACCGTCCCCCCCCACCTAA^
?0« C G G A ^ G A A A C C A C A A G A C T T A C C T T C C C T C C C A A G T A A A A C G A C A A A C A C A C A C A G T T T I G C C C G T T T T ^ ^
31S TTGTACAAACACCATCTATGCAGCTTTCCOa^ACTGACACAAACCGTGCAACTTCAAACTCCGCCIGCTCT^^
«<0 G A T C C A C T A G C G A G T C T T A G T A G C G G T A C T G C T G T C T C G T A G C G G A G C A J G T T G G C C G T G G G A A C A C C T C T ^
See C A T G T G T G C A A C < S X > C C A C G G C A G C T T T A C T C T C A A A C C C A C T T C A A G G I G A C A T T G A T A C T G C T A C T C A A
• I t C « 1 A C A C T C G G G A T C T G A G A A C C G G A C T G G G A C T T C T T T A A A G T G C C C A G T T T A A A A A & C I T C T A C C C C T G A ^ ^
P 2 Q & B 0 & ATG AAT ACA ACT &*C TCT TTT ATC GCT TTG G T A CAG CCT A T C A G A G A G A I T AAA GCA CTT TTT CTA TCA CGC ACC A C A GGG AAA I ATG GAA P20fi
<u "j.V"J!;'_'J!'_i"i".tri"i*'j_""-*'j_i;i*'jJ!.'.t;ii';_'i'_L."--i"ifi>_'-i"_i'i-if_'i'-r_'_!'j'.M\r:7Ol" "PIO
C T GAA G C TC AG f f j G i f i GAT CTC
G&h CT T
CAA CTG CAT GAG GGT G&h CCA CCT CCT CTC GTG ATC TGG AAC ATC AAG CAC TTG CTC CAC ACC GGC ATC GGC ACC GCC TCG C & l
12SS TTT CCG TGT GTC ACC TCC AAC CCG TGG TAC GCG ATT CAC GAT GAG GAC TTC TAC CCC TGG ACG CCG GAC CCG TCC GAC GTT CTG GTC TTT
1 3 * S GTC CCC TAC CAT CAA GAA CCA CTC AAC GGG CAA TCG AAA GCC AAG GTT CAA CCC AAG CTC AAA GGG GCT GCA CAA TCC ACT CCA GCG ACC
V a I P ' o Tyr Aap G i n Cl u P r v L f u Asn G l y G l u T r p L y * A I * L y a V a l G i n A r g L y a L t u L y a G l y A l a G l y G i n S a r Sar P r o A l a Thr 2 1 0
H I S CCC TCC CAG AAC CAA TCT CCClAAT ACT GGC ACC ATA ATA AAC AAC TAC TAC ATC CAC CAG TAT CAA AAC TCC ATG GAC ACA CAC CTT GGT VP4
) i 5 I 5 GAC AAC GCA ATC ACT GCA GGC TCT AAC CAG GGC TCC ACC CAC ACA ACC TCC ACC CAC ACA ACC AAC ACC CAC AAC AAT GAC TGG TTC TCC
i e < » AAA CTT CCC ACC TCT GCT TTC AGC GGT CTT TTC GCC CCT CTT CTC CCC IcAC AAG AAG ACA GAG CAG ACC ACT CTC CTC CAA CAC CCC ATC VP2
Idti Ltl IiLF SlH S±S IE.' !£.' LU l U 41k 410 &!« ill 3 0 °170) CTC ACC ACC CGI AAC GGC CAC ACC ACG TCG ACA ACC CAC TCA AGC GTT GGA GTC A C A TAC GGG TAC CCA ACA CCT GAA GAT TTT CTG ACC
Lau Tnf Tnr A^j Ain 61 r H n Thr Tnr &*r Thr Tnr Gin S«' S«' V«l Gly V«l Thr Tyr Gly Tyr Al# Thr Alt Glu Aap Prl* V*l Sar 330
17^^ G y CCG 'UhC ACT TCC GCT CTC rt ACC ACA CTT CTG £P^1 GCA ^^ CGC TTT TTC AAA ACC f**ftC CTC TTC *if TCG CTC ACC ACT f^yf TCA
1 f TTC G&^ CGT TGC **f CTC CTG ^ A CTC CCG ACC ^c CAC ' OV GGT GTC TAC GGC AGC CTG ACT CAC TCG TAT CCA TAT ATG A y *<ff GGC
: AAC ACT CAA GGT CCC
VP3~C I G G A ATA TTC CCC GTG CCC
2 3 3 6 TGT ACC CAC GCC TAT GGT GGC CTC GTG ACC ACG GAC CCG AAC ACC CCT CAC CCC GTT TAT GGG A A * CTG TTC AAC CCC CCC CGC AAC CAC
C y t Sar Aap C l y Tyr G l y G l v L a u V a l T^^ T j ^ Aap Pio L y ^ T h r A l • Aap P r y V a I Tyr G l v ^ _ > V a I Pha Aan Pio P r o A r g A a n G i n 6 4 0
2a I S TTG CCG GGG CGT TTT ACC AAC CTC CTT CAT GTG GCT f>AO GCA TCC CCG ACG TTT CTC CGC TTC GAG GGT GCC GTA CCG TAC CTC ACC ACG
2S1S ^AA ff*fi fL^f* TCG G^C AGG CTC CTT CCT CAC TTT CAC ATC TCT TTC CCA CCA AAA CJ^^ ATC T C A SAC ACC TTC CTC C C A GCT CTT CCC C A GL r a T h r Aap f a r A i p * ' 9 V t i L a u A l a C l n P n t A a o M a I & i r L a u A4 a A l a L y a G i n Ma I S a r A m Thr Pha L t u A l t G l y L a u A l a G i n 0 0 0
T » f T y i Tnr G i n Tyr itt Gl r Tnr l i t A m L t u n n P n r M a i P u t Th< G l y *>'O Thr A I D A l a L v > A l t A r g T r f M a i V a l A l t Tyr A l t S 3 0
26 9ft CCA CCA GCC ATC CAC CCG CCC ^^G ACA CCT GAG GCG CCC CCC CAC TGC ATT CAT C T CAA TGG fr^f ACT GGC TTA ifeAC TCA AAG TTT ACT
i16S TTT TCC ATC CCC TAC CTC TCC CCC GCC CAT T A C fJC^m TAC ACC CCG TCT GfaC GTG GCC GAG ACC ACft J^^T GTG C^flr ^ f ^ TGG GTC TGC TTGP n t S a r i i a P r o Tyr L a u i n A l a A l a A«o T v A l a Tyr Thr A l a S i r Gly V » l A l a Q*u Tnr Tnr Aan V a l G i n G l y T r p V t I C y l L a u BOO
2 » 0 S CCC CCT CCC GAA I ACC ACT TCT CCC CCC GAG TCA CCC CAT CCT CTC ACC ACC ACC GTT GAA AAC TAC GGT CCC CAA ACA CAC ATC CAG ACC V P 1
Ala Arg Ai a G I U I T I W TIW Sa_ Ala Cly frlu $\t Ala A I D Pip Val Thr Thr Thr V i l Glu Ain Tirr CJj Gl v Glu Thr Gin I l i CjJJ AfQ f 50
JOSft CCC f~^* ^AC ACG fi^c O ' C TCC TTC ATC ATC ^ ^ f ACA TTT C T C %/sG C T C ACA CCQ C A ^ * ^ ^ ^ ^ ^ ATT ^\^c A T T T T 6 ^AC C T C ATC C A C ATT
3 7 3 6 ACC TCC C T T CCA AAT GCA G C C CCC C A A AAC C C C T T C CAC * A C ACC ACC AAC C C A ACT GCT T A C CAC AAC GCA CCA C T C A C C C C C C T T CCC
T n r T r p V a l P r o A m C l y A l t P r o C l u t - y a A l a L t u A » p A l n T h r T h r A « n P r o T n r A l t T y r M i a L y a A l a Pit L .au T n r A r g L a u A l t 1 * 0
3 3 7 $ C T C C C C T A C ACT C C C CCC CAC CGC G T G T T G CCA ^CC C T C T A C r \^c CCT C A C TCC i * ^ ^ T A C rtj^c ^^^^ AAT GCT CTG CCC AAC T T C A C A CGT
GcTo P12
IAan Pha Aap Lau Lau Lya Lau Ala Cly Aap Val Civ tar Aan Pro GlylPre Ph* P*na Put 6ar Aap VaI 00_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ — — - — — __ — — — — — — J
P52nA r g s a r A g o P h . t . r L * > L a u V i l G l u Tnr M a Aan C l n M a i C l n C l u * * P M a i S a r T n . L , . H . i C l y P r o Aap P n t A a n A r g L a u V a l 9 9 0
t n TCC fif TTT ^ hG ^ . ^ ^ T TG GCiC ATT ^ ^ * CTC M V fj^f ATC * f i * MTf GGT CTC C C ^ * B * f j f r fififl CCC TGG Ti C ^ ^ 3 CTT ATC *t^*j CTC CTA
I , , A | a p n a C | H c i u L ( u A | a M a G l v V a l L y a A l a M a A r g Tnr Gl r L a u A * p G l u * l « L y i P r o T ' P Tyr L y a L a u H i L y » L * u L a u 1 0 3 4
)S5 CTC CAC ACC ACC TTC GTC G I G AAC AAC * T C TCC CAC TCC CTC TCC ACT CTC TTT CAC GTG CCG GCC CCC GTC TTC ACT TTC GCA CCA CCC
2A6 CTC CTC TTC C C ^iTrfi TTC CTC ^Art CTT &CC TCC ACT TTC TTC ^fi4i TCC ACA CCC f^\A ^^^C CTT ^j^^j fv^^i f y * CAfi *a W CACICTC Ptfi^ ^^~* ^^34
6593
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
1 AAC CAC CCC ACC AAC TAC
I4BS G C * CAC CCA ATT TCC < 1 ACC CCC ACA ATC CAC I O CTC TCC TAC TCC CCA CCT CAC CCT CAC C A C TTC CAC CCT TAC AAC CAC
: ccc ACC ACC * A C TTC TAC i c e ccc T I C ACC
" l i CCO ACC ACC AT< r CCT ffiv; GAT ccc TAC AAA ATT AAC ACC
P i O O i l l i AAC CCC CAA <i ( p 3 ) L * . G . f Gl» •
ATC CAA CAA ACT
: ACC ACA CAT GAC i
i CTT CAC GAG CCC CAC AAC ACC CCT CTA CAC ACC ACC CCC CCC ACC ACC CTT GCC TTT ACA C A C A C A ACT CTC CCA CCT CAA AAC CCA TCC
4 * 1 5 GAT GAC C t C A A C TCC GAC CCT GCC CAA CCT CTT CAC CAC CAA CCA C A A GCT CAA ICCA CCC TAC CCC C C A CCA CTC T CAC AAA CCT
CTC AAA G I G AGA CCC AAC CTC CCA CAC CAC CAC CCC CCT TAC GCT GCT CCC ATC CAC ACA CAA AAA CCC C I A AAA CTC *
-. GAC CGC ACA GCC
e n s CCC CAT CTC CCC A C A (
TAC ACA GCC CCC ACC AAC CCT CC1 TAC TCC CCA CCA GCC CTT CTT CCC AAA CAC CCA GCT CAC ACT TTC A I C CTC CCC ACT CAC
GCA CCC AAC CCA GT T GGA TAC TCC TCA TCC CTT TCC ACC TCC ATC CTT CTT A A A A T C Af^fi CCA C A C ATT f^C CCC f^f t CCA CAC
: CAC ccc ATC CAA CCA GAC ACT cco ccc ccc
I CCT GCC CCT AAC ACT
'. C C i vAA ATT CCC TCi TCC AAC CCT G*T O i l I > 1 TGG CAS A&A T y •
• TTT Tt.T G C * ( A C ATC CAC TCA AAT AAC OCA
> CAC TTC CCC CAG TAC ACA AAC CTC TCC C A T
: ccc TTC CAC
« ATG CCC TCT
:TC TCC TIT CCA CGC <
iliAA rcccn ASCCGSSCTC
6594
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
tance by a stretch ot 11 pyrimidines, which are interrupted by no more than
one punne residue and which also show significant base complementarity to the
3'-end of the 18S libosomal RNA from eucaryotes (3). These pyrimidine runs
are also present at the translational start sites of poliovirus and EMCV (21,
22). We therefore speculate that both features may be of importance tor the
recognition by ribosomes or initiation factors of an uncapped mRNA with the
long untranslated leader sequence. Sequence complementary to the 18S rRNA
terminus has also been noted for less pyrimidine rich sequences in other eucar-
yotic mRNAs (23).
The sequence of 1.3 kb preceding the polyprotein gene seems to be exception-
ally long for a leader segment of a small, and otherwise compactly organized
viral genome and suggests that additional short coding sequences may exist in
this segment. At present the most likely candidate for such an unlinked FMDV
gene is a sequence of 92 translatable codons that follows the first AUG after
the poly(C) tract (pos. 209 in Fig. 3). The evidence for this hypothesis is two-
fold. Firstly, a polypeptide ot 10 K (tentatively named P10) has recently been
identified among the products of in vitro translation of FMDV RNA by immuno-
precipitation with an antiserum directed against the P10 amino acid sequence
(Strebel et al, in prep.). Secondly, the translational start of the presumed P10
gene (pos. 209) is structurally very similar to other sites controlling translation
initiation at internal positions of picornaviral RNAs in that it is also preceded at
a short distance by the pyrimidine rich sequence noted above.
The deduced protein map
The long open reading frame encodes a polypeptide with a maximal size of
2332 ammo acids and a calculated molecular weight of 258.9 K, in excellent
agreement with the 260 k determined experimentally tor the FMDV polyprotein
(1). Its deduced amino acid sequence could be correlated in the P1 (P88)
segment with all known sequence data from the structural proteins of FMDV
O-)K (3, 26, 27, ammo acids underlined in Figure 3). Much less information
Fig.3: The nucteotide sequence and encoded information of the L seg-ment of FMDV RNA strain O^K (U residues are shown as T). The aminoacid sequence corresponds to the total polyprotein of 2332 amino acids.A translated amino acid sequence is also shown for a hypothetical smallpolypeptide encoded upstream from the polyprotein. The numbers to theright represent amtno acid positions in the potyprotein. Predicted limits ofthe primary precursors are indicated to the left, and those of the stableviral potypeptides to the right. Dashed lines represent borders wherethe supporting data are weak. Underlined amino acids have been deter-mined experimentally for this virus strain (26,27). The nucleotide posi-tions to the left use the BamHI site at position 3000 as a reference (5).Nucleotide 92 (indicated by an arrowhead) is the first nucleotide down-stream Irom the poly(C), and is put forward here as nucleotide 1 in animproved numbering system. AUG codons in the 5' region are under-lined, a palindromic sequence at the 3' end is underlined by arrows.Nucleotide heterologies are indicated by dots.
6595
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
oFMDV
92
P16,,P20o VR. VP2 VP3 VP1 .P12 P2L
189217
69 218 220
PoliovirusVP4, VP2 VP3
69 271 238
213 16154 318 153P20b
VP1
302- f - H—h
P2-X
213
, P3-t,U9 97 329 182
P56
P3-ib661
RNA
proteins
- I proteins
Poliovirus2207 aa
FMDV f~2332 aa T
1 i
\
i1
L
r ^ \ M-& mii
$
d ^ 2 a
1S I
P2
2C /
mmi
i ii
p \
m i
3c
P3
3d
-capsid proteins - [primer -fprotealor replication?
Fig.4: Comparison of the FMDV and poliovirus genomes.a) Schematic map of gene organization and protein processing Open arrows
indicate cleavages probably executed by cellular proteases, while filledarrows represent processing by the viral protease. The dashed arrowsillustrate morphogenetic cleavages occurring during virus particle matura-tion. The FMDV polypeptides are termed as in Figure 3; the number ofamino acids is given for each protein below the lines.
b) Sequence homology between corresponding parts of the two polyproteins.The degree of homology in different segments (see Methods) is: hatched25-40'', crosshatched 40-65", filled in 565H. Related gene products areconnected by vertical lines and identified using the new general nomenc-lature for picornaviruses (according to the third Meeting of the EuropeanStudy Group of Picornaviruses, Urbino, Italy Sept. 5-10, 1983).
was available for the non-structural proteins encoded in the P2 (P52) and the
P3 (P100) precursors where only the VPg genes had been exactly mapped (24)
and only approximate positions had been allocated for P34 and P56a (1) and
for P12 and P20b (A.King, pers.comm.). A more complete protein map was
established (14), using size estimations from SOS polyacrylamide gel electro-
phoresis and sequence homologies to poliovirus polypeptides for which the
order was known (21). The exact coding limits ot the individual functional
proteins were predicted from the cleavage specificity for
Glu (GlnVGly (Ser,Thr) linkages of the FMDV protease (see below) and also
using very recent data from Grubman and collaborators, who determined amino
acid sequences at the N-termini of the polypeptides synthesized in vitro from
FMDV A12 R N A and corresponding to P52, P12, P34, P14, P20b (pers. comm.)
and P56 (25). The order and limits of FMDV proteins thus obtained (Figure 4)
were recently confirmed by immunoprecipitation of polypeptides of the predicted
size from FMDV infected BHK cells, using antisera against bacterially synthes-
6596
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
Table I: Predicted map positions and biochemical properties of FMDV polypeptides as
deduced from the nucleotide sequence.
Poly-peptide
Poly-protein
P20a
P16
P88
VP4
VP2
VP3
VP1
P52
(PI 2)
P34
P100
(P14)
VPg-1
VPg-2
VPg-3
P2Ob
P56
Map co- ,.ordinate '
L
L1
PI
la
lb
lc
Id
P2
2b
2c
P3
3a
3b-l
3b-2
3b-3
3c
3d
Mapposition
805-7800
805-1455
889-1455
1456-3615
1456-1662
1663-2316
2317-2976
2977-3615
3616-5079
3664-4125
4126-5079
5080-7800
5080-5538
5539-5607
5608-5679
5680-5751
5752-6390
6391-7800
No. ofami no acids
2332
217
189
720
69
218
220
213
488
154
318
907
153
23
24
24
213
470
Molecularweight
258946.8
24367.4
21243.1
79305.6
7362.3
24410.4
23746.4
23840.5
54501.1
16255.1
35892.9
100844.7
17355.2
2604.5
2622.4
2579.3
23012.4
52760.9
Netcharge
+37
-5
-7
+13
-3
+4
-2
+14
+8
+2
+8
+21
-5
+3
+4
+3
+9
+7
Function
precursor
?
?
precursor
capsid proteinn n
ii ii
ii n
precursor?
?
precursor
genome-linked
protein
protease
RNA polymerase
1) Nomenclature suggested for the picornaviral polypeptides at the 3r Meeting ofthe European Study Group of Picornaviruses, Urbino Italy, Sept. 5-10 1983.
ized polypeptides that correspond to the respective segments of the polypro-
tein (Strebel et al. , in prep.).
The biochemical properties of the FMDV proteins predicted from the ammo acid
sequences in Figure 3 are summarized in Table I and a schematic protein map is
shown in Figure 4a. Our sequence-derived molecular weights often differ from
earlier determinations, and correlate better with recent size estimations of FMDV
proteins synthesized in infected cells (3, Strebel et al. , in prep.).
The polyprotein sequence starts with two closely related "leader" proteins, L
and L' ("P20a" and "P16") which differ by 28 amino acids at their N-termini
and have molecular weights of 24.4 k and 21.2 K, respectively. The primary
product following the leader protein L/L' in the polyprotein is P1 (MW 79.3k),
formerly called "P88", the precursor of the capsid proteins which are arranged
in the order VP4-VP2-VP3-VP1 (1). P2 (formerly "P52"), the precursor from
6597
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
the middle part of the potyprotein has a calculated molecular weight of 54.5 K.
It contains two stable proteins, 2b ("P12") and 2c ("P34"), with unknown func-
tions. The N-termtnal limit of P2 was originally set next to the carboxy-termi-
nus of VP1 (cf. Figure 3) which had been accurately identified for FMDV OiK
by (26). However, as shown for FMDV A 1 2 , the proteins P2 and P12 start 16
ammo acids downstream from the C-terminus of VP1 (Grubman pers. comm.,
cf. Figure 3). The carboxy-termmal precursor P3 (formerly "P100", calculated
molecular weight 100.8 K) comprises the sequence between ammo acids 1426
and 2332. It is processed into six proteins: 3a ("P14") of 17.4 K, three VPgs
(in tandem) ot 2.3 K each, a candidate for a protease, 3c ("P20b"), of 23.0 K,
and an RNA polymerase, 3d ("P56"), of 52.7 K.
Pfotease cleavage sites
The present map ot the FMDV genome (Figures 2 and 4) predicts that at least
12 sites in the protein sequence need to be cleaved to give rise to mature
viral gene pioducts. Seven out of these sites show similar amino acid
sequences
VP2/VP3 Pro-Ser-Lys-Glu/Giy-Ile-Phe-Pro
VP3/VP1 Ala-Arg-Ata-Glu/Thr-Thr-Ser-Ala
P14/VPg-1 Pro-Gln-Ala-Glu/Gly-Pro-Tyr-Ala
VPg-1/VPg-2 Pro-Gln-Gln-Glu/Gly-Pro-Tyr-Ala
VPg-2/VPg-3 VaI-Val-Lys-Glu/Gly-Pro-Tyr-Glu
VPg-3/P20b Ite-Val-Thr-Glu/Ser-GIy-Ala-Pro
P20b/P56 Pro-His-His-GIu/GIy-Leu-Ile-Val
suggesting that they are cleaved by a single (viral) protease, recognizing the
consensus sequence Glu/Gly (Ser, Thr). This specificity is similar to the one
displayed by the poliovirus protease which cleaves between Gin and Gly resi-
dues at eight out ot eleven processing sites in the polio potyprotein (21, 28,
Fig. 4a). The sequences around the remaining five cleavage sites in FMDV
P2 0a/VP4 Ser-Gln/Asn-Gly-Ser-GIy/Asn-Thr-Gly-Ser
VP4/VP2 Ala-Leu-Leu-Ala/Asp-Lys-Asn-Thr
VP1/P5 2 Lys-GIn-Thr-Leu/Asn-Phe-Asp-Leu
P12/P34 Ala-Glu-Lys-Gln/Leu-Lys-Ala-Arg
P34/P100 Ile-Phe-Lys-Gln/lle-Ser-lle-Pro
show little sequence homology to each other and are thought to be recognized
by cellular proteases. However recent experiments from this laboratory (unpub-
lished data) suggest that the L polypeptide may also be involved in the pro-
cessing of at least the L/P1 junction.
6598
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
In FMDV there are 14 Glu-Gly, 3 Glu-Ser, and 9 Glu-Thr dipeptides in the
polyprotein sequence which may be substrate for the FMDV protease. Of these
only five, one, and one respectively are utilized, according to the protein map
(Figures 3 and 4a). Therefore, not simply the primary sequence but also a
certain conformation of the amino acid sequence must be recognized by the
viral enzyme.
Homoloqy to poliovirus
Gene organization and processing mechanisms predicted for FMDV from the
nucleotide sequence in Figure 3 are summarized in Figure 4 and compared to
those from poliovirus, a member of the enterovirus family. As outlined in Fig-
ure 4a, the overall organization is well conserved between the two genomes.
This map is also similar to the approximate gene map established for EMCV
(29), another well studied virus belonging to a third genus of picornaviridae. In
an overall comparison FMDV differs from poliovirus by an increase in size of
its genome by about 1000 nucleotides most of which are added in the
5' pioximal part of the genome. Within the polyprotein region the two genomes
differ drastically only by the addition of the L gene and two extra VPg genes
in FMDV, and by the addition of a third extended gene (2a or 2b) in the cen-
tral part of the polio genome. Other corresponding genes often differ in size.
Thus, the capsid proteins are larger in poliovirus, while FMDV has expanded its
nonstructural proteins in the P3 segment (Figure 4).
The alignment of the two genomes was refined using sequence homologies bet-
ween functionally corresponding segments. Using a dot matrix program signifi-
cant homology was detected on the ammo acid level throughout most part of
the coding region, and to a lower extend also on the nucleotide level indicat-
ing common structural features of general functional importance (14). As
shown in Figure 4 these sequence homologies are very high in certain parts of
two non-structural proteins, the polymerase (3d), and protein 2c (x/P34).
Less, but still high homology was detected between the protease genes and
the capsid proteins VP2 and VP3, indicating that the functional specificity of
these proteins was less stringently conserved during picornaviral evolution. The
evolutionary divergance is most pronounced in proteins 1d, 2a/2b and 3a. Pro-
tein 1d (VP1) is the capsid protein most exposed at the surface of the virion
and, as a consequence of the pressure of the host's immune system, its
sequence is highly variable also between different aphtoviruses giving rise to
seven serotypes and many more subtypes. In contrast proteins 2a/2b and 3a,
although varible between FMDV and poliovirus, are highly conserved between
two FMDV serotypes as exemplified by a comparison of FMDV O-|K and
6599
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
(unpublished results). Therefore, we conclude that these latter proteins play
an important role in the FMDV lite cycle but did coevolve with their targets
which are most likely FMDV-specified molecules. Following the same argu-
ment, we predict that the more generally conserved picornaviral proteins (2c
and 3d) depend in their functions on the interaction with host factors of con-
served structure. In this context it is of note that sequences common to FMDV
and poliovirus in proteins 2c and 3d are also present in Cow Pea Mosaic Virus,
a plant virus which has thus been correlated with picornaviruses (30).
The comparison of the two picornaviral genomes in Figure 4b also indicates
that these differ in size predominantly in regions with low sequence homology.
Sometimes extended blocks of nucleotides are also found inserted into well
conserved genes like in gene 1b (VP2). Together with the addition of com-
plete genes, like the L gene or the extra VPg genes in FMDV, these data indi-
cate that evolution of picornaviral genomes involved the insertion or deletion of
RNA segments of several hundred nucleotides.
CONCLUSIONS
We report the nucleotide sequence of the complete coding part of an aphthovi-
rus genome. This sequence is useful in several ways for further studies of the
viral life cycle. It has already allowed exact predictions of the genome organi-
sation of these viruses and of the amino acid sequences of their gene pro-
ducts. This information facilitates the identification and characterization of these
proteins, e.g. by antisera elicited against synthetic peptides. In addition, cDNA
segments from specific parts of the genome can be expressed in E. coli and
used as specific antigens free of any other viral protein, or be used as sub-
strate tor processing enzymes. Finally, FMDV genes and genomic signals can
be transferred well defined as cDNA copies into animal cells and their functions
analyzed. Such studies should also answer questions as to the function of the
1300 nucleotides long leader sequence preceding the polyprotein gene in the
FMDV RNA.
During preparation of this manuscript, the complete coding sequence for the
FMDV polyprotein has been reported for strain A-io (31). While there are exten-
sive serotype related sequence variations in the structural proteins (3, 12) the
sequence differs in the non-structural genes from the OiK sequence by 331
nucteotides (7.9*0 and 44 predicted amino acids (3.1"), provided that we
neglect three T (U) residues (positions 5978/79, 6013/14, 6016/17), appearing
in duplicate in the A 1 0 sequence published.
6600
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018
Nucleic Acids Research
ACKNOWLEDGEMENTS
We are indepted to Joern Wolters and Oliver Steinau lor the computer analyses,
and to Gudrun Feil for technical assistance. We also thank Dr. W. Keller and
Dr. K. Strohmaier for gifts of FMDV RNA. We acknowledge Dr. A. King and
Dr. M. Grubman for the communication of unpublished data. This work was
supported by research grants from Biogen S.A. and the Deutsche Forschungs-
gemeinschaft (Forschergruppe Genexpression).
REFERENCES1 Sangar, D.V. (1979) J. gen. Virot. 45, 1-132 Perez-Bercoff, R. (ed.) (1979), The Molecular Biology of the Picornavi-
ruses. Plenum, New York3 Beck, E., Foiss, S., Strebel, K., Cattaneo, R. and Fell, G., (1983)
Nucleic Acids Res. 11, 7873-78854 Johnson, R.A. and Walseth, T.F. (1979) Adv. cyclic Nucteotide Res.
10, 135-1375 Kupper, H., Keller, W., Kurz, C , Forss, S., Schaller, H., Franze, R.,
Marquardt, O., Zaslavsky, V. , and Hofschneider, P.-H. (1981) Nature289, 555-559
6 Vieira, J. and Messing,J. (1982) Gene 19, 259-2687 Strebel, K. (1982) Diplomarbeit, University of Heidelberg8 Maxam, A.M. and Gilbert, W. (1980) Methods in Enzymol. 65, 499-5609 Garoff, H. and Ansorge, W. (1981) Analyt. Biochem. 115, 454-457
10 Osterburg, G., Glatting, K.-H., and Sommer, R. (1982) Nucleic AcidsRes. 10, 207-216
11 Zucker, M. and Stiegler, P. (1981) Nucleic Acids Res. 9, 133-14812 Beck, E., Feil, G., and Strohmaier, K., (1983a) EMBO Journal 2, 555-55913 Hattman, S., Brooks, J.E., and Masurekar, M. (1978) J. Mol.Biol. 126,
367-38014 Forss (1983), P.H. thesis. University of Heidelberg15 Domingo, E., Dayila, M., and Ortin, J. (1980) Gene 11, 333-34618 Holland, J . , Spindler, K., Horodyski, F., Grabau, E., Nichol, S., and
VandePol, S. (1982) Science 215, 1577-158517 Harris, T.J.R., Robson, K.J.H., and Brown, F. (1980) J. gen. Virot. 50,
403-41818 Sangar, D.V., Black, D.N., Rowlands, D.J., Harris, T.J.R., and Brown,
F., (1980) J . Virol. 33, 59-6819 Peacock, S.L., Mclver, C M . , and Monoham, J.J. (1981) Biochem. Bio-
phys. Acta 655, 243-25020 Kozak, M. (1983) Microbiological reviews 47, 1-45.21 Kitamura, N., Semler, B.L., Rothberg, P.G., Larsen, G.R., Adler, C.G.,
Dorner, A . J . , Emini, E.A., Hanecak, R., Lee, J .J . , VanderWerf, S.,Anderson, C.W., and Wimmer, E. (1981) Nature 291, 547-553
22 Palmenberg, A.C. , Kirby, E.M., Janda, M.R., Drake, N.L., Duke, G.M.,Potratz, K.F., and Coltett, M.S. (1984) Nucleic Acids Res. 12,2969-2996
23 Hagenbiichle, O., Sauter, M., Steitz, J.A., and Maus, R.G. (1978) Cetl13, 551-563
24 Forss, S. and Schaller, H. (1982) Nucleic Acids Res. 10, 6441-645025 Robertson, B.H., Morgan, D.O., Moore, D.M., Grubman, M.J., Card, J . ,
Fischer, T., Weddell, G., Dowbenko, D., and Yansura, D. (1983) Virol-ogy 126, 614-623
26 Kurz, C , Forss, S., Kupper, H., Strohmaier, K., and Schaller, H.(1981) Nucleic Acids Res. 9, 1919-1931
27 Strohmaier, K., Wittmann-Liebold, B., and Geissler, A.W. (1978) Biochem.Biophys. Res. Comm. 85, 1840-1645
28 Hanecak, R., Semler, B.L., Anderson, C.W., and Wimmer, E. (1982)Proc. Natl. Acad. Sci. U.S.A. 79, 3973-3977
29 Palmenberg, A.C. (1982) J. Virol. 44, 900-90630 Franssen, H., Lennissen, J . , Goldbach, R., Lomonossoff, G., and Zim-
mern, D., EMBO Journal, in press.31 Carroll, A.R., Rowlands, D.J., and Clarke, B.E. (1984) Nucleic Acids
Res. 12, 2461-2470
6601
Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018