nucleotide sequence and genome organization of foot-and-mouth

volume 12 Number 16 1984 Nucleic Acids Research

Nucleotide sequence and genome organization of foot-and-mouth disease virus

S.Forss, K.Strebel, E.Beck and H.Schaller

Microbiology, University of Heidelberg, Im Neuenheimer Feld 230, D-6900 Heidelberg, FRG

Received 21 May 1984; Revised and Accepted 24 July 1984

ABSTRACTA continuous 7802 nucleotide sequence spanning the 94* of foot and mouthdisease virus RNA between the 5 -proximal poly(C) tract and the 3'-terminatpoty(A) was obtained from cloned cpNA, and the total size of the RNAgenome was corrected to 8450 nucteotides. A long open reading frame wasidentified within this sequence starting about 1300 bases from the 5 end of theRNA genome and extending to a termination codon 92 bases from its polyade-nylated 3 end. The protein sequence of 2332 ammo acids deduced from thiscoding sequence was correlated with the 260 K FMDV polyprotein. Its pro-cessing sites and twelve mature viral proteins were interred from protein data,available for some proteins, a predicted cleavage specificity of an FMDVencoded protease for Gtu / GlyCThr, Ser) linkages, and homologies to relatedproteins from poliovirus. In addition, a short unlinked reading frame of 92codons has been identified by sequence homology to the polyprotein initiationsignal and by in vitro translation studies.

INTRODUCTION

Foot and mouth disease viruses (FMDV) or aphthoviruses are picornaviruses

which are the causative agent of an aggressive and economically important

disease of cloven-footed farm animals. Their virion contains a single-stranded

RNA genome of about 8 kb with a small protein (VPg) covalently attached to

its 5' end, an internal poly(C) tract, and a poly(A) sequence at the 3' end.

This RNA is of positive polarity and can act directly as a messenger RNA.

Protein sythesis involves post-translational cleavage of a 260 K potyprotein

which is encoded between a single major translation initiation site next to the

poly(C) tract and the 3' end of the RNA. This mode of protein synthesis is

common to all picornaviruses (1) and makes them interesting model systems for

studying the mechanisms of translation initiation and of protein maturation by

specific proteolytic cleavages (2).

To obtain more information about the FMDV genome, its control signals and its

gene products we have cloned cDNA copies of the viral RNA from strain O-jK

and determined their nucleotide sequence. Using a set of overlapping clones we

obtained a continuous sequence of 7915 nucleotides representing the 3' proxi-

mal long L segment of the FMDV RNA which is thought to contain all its cod-

ing information. In a previous publication we had already determined the initia-

© IRL Press Limited, Oxford, England. 6587

Downloaded from https://academic.oup.com/nar/article-abstract/12/16/6587/2385244by gueston 04 April 2018

Nucleic Acids Research

lion sites for polyprotein synthesis and a long open reading frame encoding the

structural proteins of FMDV (3). The present work extends this reading frame

by the coding sequence for the non-structural proteins, predicting a single

translation product of 2332 ammo acids which could be correlated in many

parts with known protein data trom FMDV induced proteins. In addition, we

report and discuss a sequence of 713 bases preceding the polyprotein gene to

the poly(C) tract.

MATERIALS AND METHODS

Enzymes

Restriction endonucleases were isolated and purified by standard procedures or

purchased from Boeringer Mannheim or Biolabs. Avian Myeloblastosis Virus

reverse transcriptase was a gift from W. Keller, Heidelberg. Nuclease S1 from

Aspergillus oryzae, calf intestinal phosphatase, and T4 polynucleotide kinase

were from Boehnnger Mannheim, and terminal deoxynucleotidyl transferase was

from BRL. [ a / - 3 2 P] -ATP was prepared according to (4).

FMDV RNA

FMDV O.|K RNA, isolated after 7 and 64 passages in BHK-cells of the same

field-isolate of the virus (Kaufbeuren 1966-67) was provided by K. Strohmaier

and W. Keller. The virus was plaque-purified at the beginning of the passages

and again after passage 16.

FMDV cDNA clones

The clones FMDV-2735 and -2615 were constructed essentially as described

by (3) using single-stranded restriction fragments (mapping at pos. 3662-3752

and 3792-3882, cf. Figure 2) trom the existing cDNA clone FMDV-715 as prim-

ers for cDNA-synthesis. The double-stranded cDNA was provided with dC-

tails and annealed to dG-tailed pBR322, linearized at the Pstl site. Construc-

tion of clones FMDV-3214a and -3214c was carried out using a modified

procedure that avoids second strand synthesis and G/C tailing (cf. Figure 1):

cDNA synthesis was primed with an Aval/Hindlll restriction fragment (Pos.

692-743) from cDNA clone FMDV2735 which had been isolated from a 6*

potyacrytamide strand separation get (8). Reverse transcription was followed

by RNase treatment (pane. RNase 20ug/ml, 30 min., 37°C). Oligo-dA tails of

70-80 nucleotides were added to the 3'-ends of the cDNA using Terminal

Deoxynucleotidyl Transferase. This ohgo-dA tailed cDNA was annealed to the

DNA fragment complementary to the primer fragment. By this a partially dou-

ble stranded molecule was obtained having a single stranded oligo-dA tail at

its 3'-side and a double stranded portion at its 5'-side reconstituting the origi-

6588



nal Hindlll site ot FMDV pos. 743. This DNA molecule was then annealed to

the vector pUC9 (6), that had been cleaved by Pstl, dT-tailed, Hindlll cleaved

and puntied by agorose gel electrophoresis. The DNA mixture was ligated and

transformed into competent E.coli C600 cells. Clones were screened for FMDV

inserts by colony hybridization with [32P]-labelled FMDV RNA as described

elsewhere (7). Clones FMDV-2735 and -2615 were derived from RNA isolated

from low passaged FMDV (7 passages), all other clones were derived from a

virus-isolate after 64 passages in BHK cells.

Nucleotide sequence analysis

Restriction fragments were endlabelted and chemically degraded by the base-

specific cleavage methods according to (8). 5' endlabelling was used through-

out except for the 3' end of the genome where 3' endlabelling was employed

at a Hpall site located 68 nucleotides in front of the poly(A) tail. Thin

sequencing gels (0,04 cm), either 40 or 100 cm long, were dried prior to print-

ing, according to (9) to improve resolution of the bands in the sequencing lad-

ders.

Computer analyses

The derived nucleotide sequences were entered into a data base, where the

information was stored and processed using the computer programs of (10).

The alignment program from Kriiger and Osterburg (unpublished) was based on

the algorithm from (11). The homology comparison in Figure 4b was based on

such computer-derived alignments scoring "similar ammo acids" as one third of

identical ammo acids.

RESULTS AND DISCUSSION

Cloning of FMDV cDNA

A set of cDNA clones from FMDV strain OiK (FMDV-144, -512, -703, -715,

-1034, -1448) has been described which cover the two thirds of the FMDV

genome from the VP3 coding region to the 3' end (5, cf. Figure 2). To obtain

cloned cDNA copies upstream of the VP3 gene single-stranded restriction frag-

ments from the existing cDNA clones (indicated by I and II in Figure 2) were

used in two steps (7, 12) to prime cDNA synthesis close to the missing parts

of the FMDV genome (see Methods and Figure 1). As a result the cloned part

of the FMDV genome was extended into the 3'-end of the poly(C) tract.

Nucleotide sequence analysis

In Figure 2 exact map positions of the cloned cDNA inserts used for nucleotide

sequence analysis and the sequencing strategy are depicted. The methods of

(8) were followed for 5' [3 2P]-endlabelling and subsequent partial chemical

6589



HinduRNA 51 __ _ = _ * '

cDNA 3' ~ *** •™

Transformation —

TTTTT

PUC9

F ig .1 : Strategy tor cloning 5'-terminal sequences of FMDV O i K . Theminus strand of an Aval/Hmdll l fragment (pos . 692-743, hatched box)from cDNA clone pFMDV2735 was used to speci f ica l ly prime cDNAsynthesis on FMDV RNA (step I). The RNA template was hydro lyzedusing pane. RNase, o l i go -dA tails were added to the 3 ' -ends of thecDNA and the cDNA was annealed to the plus strand of the Aval/Hmdll lfragment (empty box ) , complementary to the primer fragment (step II).The part ial ly double stranded and d A - tai led cDNA was annealed topUC9 vector DNA (6) that had been Pstl c leaved , dT- ta i led and Hmdlllc leaved in this order (step III) and then transferred into competent E.coliC600.

degradation of restr ict ion fragments. In order to obtain unambiguous sequence

data most regions were sequenced in several independent runs, and the entire

sequence was analyzed on both DNA strands (Figure 2). Using over laps of at

least hundred nucleot ides between subctones, a continuous sequence was g e n -

erated over the 7915 c loned nucleotides start ing with 11 C residues from the

poly(C) tract and ending in 102 A residues as determined in clone

pFMDV-3214c and pFMDV-512, respect ively (Figure 3). A l l cleavage sites

piedicted in this sequence for the restr ict ion enzymes used during the sequence

analysis were also observed experimental ly, except for a Clal site at position

5026. This site over laps two Mbol sites which are known to be targets for

adenosine methylat ion in E. col i (13).

Minor heterologies in the sequence were observed in over lapping regions from

different c lones (Figure 3, 14). Such point mutations are known to occur with

high probabi l i ty in populat ions of FMDV RNA (15) and in other viral RNAs

because of the intr insical ly imprecise mechanism of RNA repl icat ion (16). This

variation was part icular ly obvious when cDNA from moderately and highly pas -

saged virus (7 and 64 passages in 6HK ce l l s , respect ive ly) was compared. No

subst i tut ions were found in 550 bases over lapping the clones FMDV-2735 and

-2615 from the same low passaged virus (14).

The nucleot ide sequence

The FMDV genome is divided by the poly(C) tract into two parts of different

size and funct ion. The small (S) segment to the 5 ' side (approx. 400 nuc leo -

tides (17)) is probably involved only in initiation of viral RNA repl icat ion, and

the large (L) segment 3 ' of the poly(C) contains a l l the protein coding informa-

6590


ryib'f"i ** • p». ' I » " ' m \ I P12 | n* I m | »ni | raoi

; 'viv'W i; ;'—;B 't' " t y r t i 1 i " i ' i ' i i f t i / f t ^ ' i ^ I'T #N»tfU{'.itf"' ' ' 4 ^ 1_ . , - , i , . ™-j

Fig.2: Physical map ot the FMDV genome, FMDV cDNA clones andstrategy for the nucleotide sequence analysis. The top part shows thephysical and the gene map of the FMDV RNA. Positions of the restric-tion sites of endonucleases used for 5' end-labeling are indicated on thesecond line. Below that, the restriction fragments used to prime cDNAsynthesis (I and II) are illustrated as open arrows. Horizontal arrowsshow direction and extent ot individual sequencing runs, grouped accord-ing to the clones they originate from. Dashed lines represent portions ofclones or sequencing runs from which no information was obtained. Theanalysis of the clones FMDV-1034 and FMDV-144 has already beenreported (26) and only the extent ot the sequenced areas are indicatedhere. The restriction enzymes are represented by the letters: A, Aval;B, BamHi; C, Clal; E, Haelll; F, Hind; H, Hhal; I, Hphl; N, Hindlli; P,Hpali; R, EcoRl; T, Taql; U, Pvuli; X, Xhol; Xb, Xbal.

CD

o'

O)

3)a>inCD0)a



tion (18). The sequence ot 7802 nonhomopolymeric nucleotides shown in Figure

3, represents the complete primary structure ot the L segment. Assuming

additional 150 bases tor poly(C) and 400 tor the S segment this corresponds to

94 % of the FMDV genome indicating that the size of the FMDV genome is

about 8500 nucleotides, i.e. 500 nucleotides longer than estimated from sizing-

gets by (17).

This size correction is also supported by the sizing and the partial sequence

analysis of cDNA copies that cover the as yet uncloned 5'-terminat part of the

FMDV genome, i.e. the S segment and the poly(C) tract (7). So tar, all

attempts to clone this missing part of the viral RNA have been unsuccessful.

These difficulties seem to be related to the internal poly(C) tract, which, alt-

hough readily copied into cDNA (7), is probably highly unstable in E. coli, as

shown for other (GtC)-homopolymers longer than 30 basepairs (19). In accor-

dance with this notion we have only been able to clone 11 C residues from the

3' proximal part of the poly(C) tract which still contains interdispersed non-C

nucleotides. The sequence obtained ...GCT(C)i 1AAG... was confirmed by

direct sequencing of the cDNA extending into the poly(C) tract (7). It is diffe-

rent, but shows similarities to the sequence ..T(C)2AUUCCAAG... determined

for the 3' end ot a T1 oligo nucleotide containing the poly(C) tract from FMDV

O-V1 and A61 (17).

Coding regions and translation initiation sites

From the kinetics of the appearance of FMDV specific gene products it has

been predicted that FMDV RNA contains a single long open translational

reading frame for a 260 K polyprotein, which in turn is a precursor for all gene

products. Our sequence reveals such a reading frame of 7035 nucleotides

(pos. 766 to 7800) with a first possible initiaton codon for the polyprotein at

position 805 (see Figure 3). The coding region is preceded by 724 nucleotides

of known sequence and by some additional 550 nucleotides of yet uncloned

RNA comprising the poly(C) tract and the S fragment. It is followed by sev-

eral atop codons in all three reading frames leaving 92 nucleotides untranslated

in front of the poly(A) tail. Usually the 5' proximal AUG is used to initiate

translation in an eucaryotic mRNA, which suggests that the ribosomes first

recognize the capped 5' end ot the RNA and then traverse downstream until an

AUG is encountered (20). Clearly this model is not applicable to FMDV since

translation of the major primary gene product starts at position 805 and to a

lesser extent also at position 889 (3), which are the ninth and tenth AUGs

downstream trom the poly(C) (see Figure 3). These two start sites differ from

other AUG codons in the sequence in that they are preceded at a short dis-

6592



9 2 fCTnAACTTTTACCCTCCTTCCCGACGTAAAASCCAGGTAACCACAACCITCAAACCGTCCCCCCCCACCTAA^

?0« C G G A ^ G A A A C C A C A A G A C T T A C C T T C C C T C C C A A G T A A A A C G A C A A A C A C A C A C A G T T T I G C C C G T T T T ^ ^

31S TTGTACAAACACCATCTATGCAGCTTTCCOaÂCTGACACAAACCGTGCAACTTCAAACTCCGCCIGCTCT^^

«<0 G A T C C A C T A G C G A G T C T T A G T A G C G G T A C T G C T G T C T C G T A G C G G A G C A J G T T G G C C G T G G G A A C A C C T C T ^

See C A T G T G T G C A A C < S X > C C A C G G C A G C T T T A C T C T C A A A C C C A C T T C A A G G I G A C A T T G A T A C T G C T A C T C A A

• I t C « 1 A C A C T C G G G A T C T G A G A A C C G G A C T G G G A C T T C T T T A A A G T G C C C A G T T T A A A A A & C I T C T A C C C C T G A ^ ^

P 2 Q & B 0 & ATG AAT ACA ACT &*C TCT TTT ATC GCT TTG G T A CAG CCT A T C A G A G A G A I T AAA GCA CTT TTT CTA TCA CGC ACC A C A GGG AAA I ATG GAA P20fi

<u "j.V"J!;'_'J!'_i"i".tri"i*'j_""-*'j_i;i*'jJ!.'.t;ii';_'i'_L."--i"ifi>_'-i"_i'i-if_'i'-r_'_!'j'.M\r:7Ol" "PIO

C T GAA G C TC AG f f j G i f i GAT CTC

G&h CT T

CAA CTG CAT GAG GGT G&h CCA CCT CCT CTC GTG ATC TGG AAC ATC AAG CAC TTG CTC CAC ACC GGC ATC GGC ACC GCC TCG C & l

12SS TTT CCG TGT GTC ACC TCC AAC CCG TGG TAC GCG ATT CAC GAT GAG GAC TTC TAC CCC TGG ACG CCG GAC CCG TCC GAC GTT CTG GTC TTT

1 3 * S GTC CCC TAC CAT CAA GAA CCA CTC AAC GGG CAA TCG AAA GCC AAG GTT CAA CCC AAG CTC AAA GGG GCT GCA CAA TCC ACT CCA GCG ACC

V a I P ' o Tyr Aap G i n Cl u P r v L f u Asn G l y G l u T r p L y * A I * L y a V a l G i n A r g L y a L t u L y a G l y A l a G l y G i n S a r Sar P r o A l a Thr 2 1 0

H I S CCC TCC CAG AAC CAA TCT CCClAAT ACT GGC ACC ATA ATA AAC AAC TAC TAC ATC CAC CAG TAT CAA AAC TCC ATG GAC ACA CAC CTT GGT VP4

) i 5 I 5 GAC AAC GCA ATC ACT GCA GGC TCT AAC CAG GGC TCC ACC CAC ACA ACC TCC ACC CAC ACA ACC AAC ACC CAC AAC AAT GAC TGG TTC TCC

i e < » AAA CTT CCC ACC TCT GCT TTC AGC GGT CTT TTC GCC CCT CTT CTC CCC IcAC AAG AAG ACA GAG CAG ACC ACT CTC CTC CAA CAC CCC ATC VP2

Idti Ltl IiLF SlH S±S IE.' !£.' LU l U 41k 410 &!« ill 3 0 °170) CTC ACC ACC CGI AAC GGC CAC ACC ACG TCG ACA ACC CAC TCA AGC GTT GGA GTC A C A TAC GGG TAC CCA ACA CCT GAA GAT TTT CTG ACC

Lau Tnf Tnr A^j Ain 61 r H n Thr Tnr &*r Thr Tnr Gin S«' S«' V«l Gly V«l Thr Tyr Gly Tyr Al# Thr Alt Glu Aap Prl* V*l Sar 330

17^^ G y CCG 'UhC ACT TCC GCT CTC rt ACC ACA CTT CTG £P^1 GCA ^^ CGC TTT TTC AAA ACC f**ftC CTC TTC *if TCG CTC ACC ACT f^yf TCA

1 f TTC G&^ CGT TGC **f CTC CTG ^ A CTC CCG ACC ^c CAC ' OV GGT GTC TAC GGC AGC CTG ACT CAC TCG TAT CCA TAT ATG A y *<ff GGC

: AAC ACT CAA GGT CCC

VP3~C I G G A ATA TTC CCC GTG CCC

2 3 3 6 TGT ACC CAC GCC TAT GGT GGC CTC GTG ACC ACG GAC CCG AAC ACC CCT CAC CCC GTT TAT GGG A A * CTG TTC AAC CCC CCC CGC AAC CAC

C y t Sar Aap C l y Tyr G l y G l v L a u V a l T^^ T j ^ Aap Pio L y ^ T h r A l • Aap P r y V a I Tyr G l v ^ _ > V a I Pha Aan Pio P r o A r g A a n G i n 6 4 0

2a I S TTG CCG GGG CGT TTT ACC AAC CTC CTT CAT GTG GCT f>AO GCA TCC CCG ACG TTT CTC CGC TTC GAG GGT GCC GTA CCG TAC CTC ACC ACG

2S1S ÂA ff*fi fL^f* TCG G^C AGG CTC CTT CCT CAC TTT CAC ATC TCT TTC CCA CCA AAA CJ^^ ATC T C A SAC ACC TTC CTC C C A GCT CTT CCC C A GL r a T h r Aap f a r A i p * ' 9 V t i L a u A l a C l n P n t A a o M a I & i r L a u A4 a A l a L y a G i n Ma I S a r A m Thr Pha L t u A l t G l y L a u A l a G i n 0 0 0

T » f T y i Tnr G i n Tyr itt Gl r Tnr l i t A m L t u n n P n r M a i P u t Th< G l y *>'O Thr A I D A l a L v > A l t A r g T r f M a i V a l A l t Tyr A l t S 3 0

26 9ft CCA CCA GCC ATC CAC CCG CCC ^^G ACA CCT GAG GCG CCC CCC CAC TGC ATT CAT C T CAA TGG fr^f ACT GGC TTA ifeAC TCA AAG TTT ACT

i16S TTT TCC ATC CCC TAC CTC TCC CCC GCC CAT T A C fJC^m TAC ACC CCG TCT GfaC GTG GCC GAG ACC ACft J^^T GTG C^flr ^ f ^ TGG GTC TGC TTGP n t S a r i i a P r o Tyr L a u i n A l a A l a A«o T v A l a Tyr Thr A l a S i r Gly V » l A l a Q*u Tnr Tnr Aan V a l G i n G l y T r p V t I C y l L a u BOO

2 » 0 S CCC CCT CCC GAA I ACC ACT TCT CCC CCC GAG TCA CCC CAT CCT CTC ACC ACC ACC GTT GAA AAC TAC GGT CCC CAA ACA CAC ATC CAG ACC V P 1

Ala Arg Ai a G I U I T I W TIW Sa_ Ala Cly frlu $\t Ala A I D Pip Val Thr Thr Thr V i l Glu Ain Tirr CJj Gl v Glu Thr Gin I l i CjJJ AfQ f 50

JOSft CCC f~^* ÂC ACG fi^c O ' C TCC TTC ATC ATC ^ ^ f ACA TTT C T C %/sG C T C ACA CCQ C A ^ * ^ ^ ^ ^ ^ ATT ^\^c A T T T T 6 ÂC C T C ATC C A C ATT

3 7 3 6 ACC TCC C T T CCA AAT GCA G C C CCC C A A AAC C C C T T C CAC * A C ACC ACC AAC C C A ACT GCT T A C CAC AAC GCA CCA C T C A C C C C C C T T CCC

T n r T r p V a l P r o A m C l y A l t P r o C l u t - y a A l a L t u A » p A l n T h r T h r A « n P r o T n r A l t T y r M i a L y a A l a Pit L .au T n r A r g L a u A l t 1 * 0

3 3 7 $ C T C C C C T A C ACT C C C CCC CAC CGC G T G T T G CCA ^CC C T C T A C r \^c CCT C A C TCC i * ^ ^ T A C rtj^c ^^^^ AAT GCT CTG CCC AAC T T C A C A CGT

GcTo P12

IAan Pha Aap Lau Lau Lya Lau Ala Cly Aap Val Civ tar Aan Pro GlylPre Ph* P*na Put 6ar Aap VaI 00_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ — — - — — __ — — — — — — J

P52nA r g s a r A g o P h . t . r L * > L a u V i l G l u Tnr M a Aan C l n M a i C l n C l u * * P M a i S a r T n . L , . H . i C l y P r o Aap P n t A a n A r g L a u V a l 9 9 0

t n TCC fif TTT ^ hG ^ . ^ ^ T TG GCiC ATT ^ ^ * CTC M V fj^f ATC * f i * MTf GGT CTC C C ^ * B * f j f r fififl CCC TGG Ti C ^ ^ 3 CTT ATC *t^*j CTC CTA

I , , A | a p n a C | H c i u L ( u A | a M a G l v V a l L y a A l a M a A r g Tnr Gl r L a u A * p G l u * l « L y i P r o T ' P Tyr L y a L a u H i L y » L * u L a u 1 0 3 4

)S5 CTC CAC ACC ACC TTC GTC G I G AAC AAC * T C TCC CAC TCC CTC TCC ACT CTC TTT CAC GTG CCG GCC CCC GTC TTC ACT TTC GCA CCA CCC

2A6 CTC CTC TTC C C îTrfi TTC CTC Ârt CTT &CC TCC ACT TTC TTC ^fi4i TCC ACA CCC f^\A ^^^C CTT ^j^^j fv^î f y * CAfi *a W CACICTC Ptfi^ ^^~* ^^34

6593



1 AAC CAC CCC ACC AAC TAC

I4BS G C * CAC CCA ATT TCC < 1 ACC CCC ACA ATC CAC I O CTC TCC TAC TCC CCA CCT CAC CCT CAC C A C TTC CAC CCT TAC AAC CAC

: ccc ACC ACC * A C TTC TAC i c e ccc T I C ACC

" l i CCO ACC ACC AT< r CCT ffiv; GAT ccc TAC AAA ATT AAC ACC

P i O O i l l i AAC CCC CAA <i ( p 3 ) L * . G . f Gl» •

ATC CAA CAA ACT

: ACC ACA CAT GAC i

i CTT CAC GAG CCC CAC AAC ACC CCT CTA CAC ACC ACC CCC CCC ACC ACC CTT GCC TTT ACA C A C A C A ACT CTC CCA CCT CAA AAC CCA TCC

4 * 1 5 GAT GAC C t C A A C TCC GAC CCT GCC CAA CCT CTT CAC CAC CAA CCA C A A GCT CAA ICCA CCC TAC CCC C C A CCA CTC T CAC AAA CCT

CTC AAA G I G AGA CCC AAC CTC CCA CAC CAC CAC CCC CCT TAC GCT GCT CCC ATC CAC ACA CAA AAA CCC C I A AAA CTC *

-. GAC CGC ACA GCC

e n s CCC CAT CTC CCC A C A (

TAC ACA GCC CCC ACC AAC CCT CC1 TAC TCC CCA CCA GCC CTT CTT CCC AAA CAC CCA GCT CAC ACT TTC A I C CTC CCC ACT CAC

GCA CCC AAC CCA GT T GGA TAC TCC TCA TCC CTT TCC ACC TCC ATC CTT CTT A A A A T C Af^fi CCA C A C ATT f^C CCC f^f t CCA CAC

: CAC ccc ATC CAA CCA GAC ACT cco ccc ccc

I CCT GCC CCT AAC ACT

'. C C i vAA ATT CCC TCi TCC AAC CCT G*T O i l I > 1 TGG CAS A&A T y •

• TTT Tt.T G C * ( A C ATC CAC TCA AAT AAC OCA

> CAC TTC CCC CAG TAC ACA AAC CTC TCC C A T

: ccc TTC CAC

« ATG CCC TCT

:TC TCC TIT CCA CGC <

iliAA rcccn ASCCGSSCTC

6594



tance by a stretch ot 11 pyrimidines, which are interrupted by no more than

one punne residue and which also show significant base complementarity to the

3'-end of the 18S libosomal RNA from eucaryotes (3). These pyrimidine runs

are also present at the translational start sites of poliovirus and EMCV (21,

22). We therefore speculate that both features may be of importance tor the

recognition by ribosomes or initiation factors of an uncapped mRNA with the

long untranslated leader sequence. Sequence complementary to the 18S rRNA

terminus has also been noted for less pyrimidine rich sequences in other eucar-

yotic mRNAs (23).

The sequence of 1.3 kb preceding the polyprotein gene seems to be exception-

ally long for a leader segment of a small, and otherwise compactly organized

viral genome and suggests that additional short coding sequences may exist in

this segment. At present the most likely candidate for such an unlinked FMDV

gene is a sequence of 92 translatable codons that follows the first AUG after

the poly(C) tract (pos. 209 in Fig. 3). The evidence for this hypothesis is two-

fold. Firstly, a polypeptide ot 10 K (tentatively named P10) has recently been

identified among the products of in vitro translation of FMDV RNA by immuno-

precipitation with an antiserum directed against the P10 amino acid sequence

(Strebel et al, in prep.). Secondly, the translational start of the presumed P10

gene (pos. 209) is structurally very similar to other sites controlling translation

initiation at internal positions of picornaviral RNAs in that it is also preceded at

a short distance by the pyrimidine rich sequence noted above.

The deduced protein map

The long open reading frame encodes a polypeptide with a maximal size of

2332 ammo acids and a calculated molecular weight of 258.9 K, in excellent

agreement with the 260 k determined experimentally tor the FMDV polyprotein

(1). Its deduced amino acid sequence could be correlated in the P1 (P88)

segment with all known sequence data from the structural proteins of FMDV

O-)K (3, 26, 27, ammo acids underlined in Figure 3). Much less information

Fig.3: The nucteotide sequence and encoded information of the L seg-ment of FMDV RNA strain O^K (U residues are shown as T). The aminoacid sequence corresponds to the total polyprotein of 2332 amino acids.A translated amino acid sequence is also shown for a hypothetical smallpolypeptide encoded upstream from the polyprotein. The numbers to theright represent amtno acid positions in the potyprotein. Predicted limits ofthe primary precursors are indicated to the left, and those of the stableviral potypeptides to the right. Dashed lines represent borders wherethe supporting data are weak. Underlined amino acids have been deter-mined experimentally for this virus strain (26,27). The nucleotide posi-tions to the left use the BamHI site at position 3000 as a reference (5).Nucleotide 92 (indicated by an arrowhead) is the first nucleotide down-stream Irom the poly(C), and is put forward here as nucleotide 1 in animproved numbering system. AUG codons in the 5' region are under-lined, a palindromic sequence at the 3' end is underlined by arrows.Nucleotide heterologies are indicated by dots.

6595



oFMDV

92

P16,,P20o VR. VP2 VP3 VP1 .P12 P2L

189217

69 218 220

PoliovirusVP4, VP2 VP3

69 271 238

213 16154 318 153P20b

VP1

302- f - H—h

P2-X

213

, P3-t,U9 97 329 182

P56

P3-ib661

RNA

proteins

- I proteins

Poliovirus2207 aa

FMDV f~2332 aa T

1 i

\

i1

L

r ^ \ M-& mii

$

d ^ 2 a

1S I

P2

2C /

mmi

i ii

p \

m i

3c

P3

3d

-capsid proteins - [primer -fprotealor replication?

Fig.4: Comparison of the FMDV and poliovirus genomes.a) Schematic map of gene organization and protein processing Open arrows

indicate cleavages probably executed by cellular proteases, while filledarrows represent processing by the viral protease. The dashed arrowsillustrate morphogenetic cleavages occurring during virus particle matura-tion. The FMDV polypeptides are termed as in Figure 3; the number ofamino acids is given for each protein below the lines.

b) Sequence homology between corresponding parts of the two polyproteins.The degree of homology in different segments (see Methods) is: hatched25-40'', crosshatched 40-65", filled in 565H. Related gene products areconnected by vertical lines and identified using the new general nomenc-lature for picornaviruses (according to the third Meeting of the EuropeanStudy Group of Picornaviruses, Urbino, Italy Sept. 5-10, 1983).

was available for the non-structural proteins encoded in the P2 (P52) and the

P3 (P100) precursors where only the VPg genes had been exactly mapped (24)

and only approximate positions had been allocated for P34 and P56a (1) and

for P12 and P20b (A.King, pers.comm.). A more complete protein map was

established (14), using size estimations from SOS polyacrylamide gel electro-

phoresis and sequence homologies to poliovirus polypeptides for which the

order was known (21). The exact coding limits ot the individual functional

proteins were predicted from the cleavage specificity for

Glu (GlnVGly (Ser,Thr) linkages of the FMDV protease (see below) and also

using very recent data from Grubman and collaborators, who determined amino

acid sequences at the N-termini of the polypeptides synthesized in vitro from

FMDV A12 R N A and corresponding to P52, P12, P34, P14, P20b (pers. comm.)

and P56 (25). The order and limits of FMDV proteins thus obtained (Figure 4)

were recently confirmed by immunoprecipitation of polypeptides of the predicted

size from FMDV infected BHK cells, using antisera against bacterially synthes-

6596



Table I: Predicted map positions and biochemical properties of FMDV polypeptides as

deduced from the nucleotide sequence.

Poly-peptide

Poly-protein

P20a

P16

P88

VP4

VP2

VP3

VP1

P52

(PI 2)

P34

P100

(P14)

VPg-1

VPg-2

VPg-3

P2Ob

P56

Map co- ,.ordinate '

L

L1

PI

la

lb

lc

Id

P2

2b

2c

P3

3a

3b-l

3b-2

3b-3

3c

3d

Mapposition

805-7800

805-1455

889-1455

1456-3615

1456-1662

1663-2316

2317-2976

2977-3615

3616-5079

3664-4125

4126-5079

5080-7800

5080-5538

5539-5607

5608-5679

5680-5751

5752-6390

6391-7800

No. ofami no acids

2332

217

189

720

69

218

220

213

488

154

318

907

153

23

24

24

213

470

Molecularweight

258946.8

24367.4

21243.1

79305.6

7362.3

24410.4

23746.4

23840.5

54501.1

16255.1

35892.9

100844.7

17355.2

2604.5

2622.4

2579.3

23012.4

52760.9

Netcharge

+37

-5

-7

+13

-3

+4

-2

+14

+8

+2

+8

+21

-5

+3

+4

+3

+9

+7

Function

precursor

?

?

precursor

capsid proteinn n

ii ii

ii n

precursor?

?

precursor

genome-linked

protein

protease

RNA polymerase

1) Nomenclature suggested for the picornaviral polypeptides at the 3r Meeting ofthe European Study Group of Picornaviruses, Urbino Italy, Sept. 5-10 1983.

ized polypeptides that correspond to the respective segments of the polypro-

tein (Strebel et al. , in prep.).

The biochemical properties of the FMDV proteins predicted from the ammo acid

sequences in Figure 3 are summarized in Table I and a schematic protein map is

shown in Figure 4a. Our sequence-derived molecular weights often differ from

earlier determinations, and correlate better with recent size estimations of FMDV

proteins synthesized in infected cells (3, Strebel et al. , in prep.).

The polyprotein sequence starts with two closely related "leader" proteins, L

and L' ("P20a" and "P16") which differ by 28 amino acids at their N-termini

and have molecular weights of 24.4 k and 21.2 K, respectively. The primary

product following the leader protein L/L' in the polyprotein is P1 (MW 79.3k),

formerly called "P88", the precursor of the capsid proteins which are arranged

in the order VP4-VP2-VP3-VP1 (1). P2 (formerly "P52"), the precursor from

6597



the middle part of the potyprotein has a calculated molecular weight of 54.5 K.

It contains two stable proteins, 2b ("P12") and 2c ("P34"), with unknown func-

tions. The N-termtnal limit of P2 was originally set next to the carboxy-termi-

nus of VP1 (cf. Figure 3) which had been accurately identified for FMDV OiK

by (26). However, as shown for FMDV A 1 2 , the proteins P2 and P12 start 16

ammo acids downstream from the C-terminus of VP1 (Grubman pers. comm.,

cf. Figure 3). The carboxy-termmal precursor P3 (formerly "P100", calculated

molecular weight 100.8 K) comprises the sequence between ammo acids 1426

and 2332. It is processed into six proteins: 3a ("P14") of 17.4 K, three VPgs

(in tandem) ot 2.3 K each, a candidate for a protease, 3c ("P20b"), of 23.0 K,

and an RNA polymerase, 3d ("P56"), of 52.7 K.

Pfotease cleavage sites

The present map ot the FMDV genome (Figures 2 and 4) predicts that at least

12 sites in the protein sequence need to be cleaved to give rise to mature

viral gene pioducts. Seven out of these sites show similar amino acid

sequences

VP2/VP3 Pro-Ser-Lys-Glu/Giy-Ile-Phe-Pro

VP3/VP1 Ala-Arg-Ata-Glu/Thr-Thr-Ser-Ala

P14/VPg-1 Pro-Gln-Ala-Glu/Gly-Pro-Tyr-Ala

VPg-1/VPg-2 Pro-Gln-Gln-Glu/Gly-Pro-Tyr-Ala

VPg-2/VPg-3 VaI-Val-Lys-Glu/Gly-Pro-Tyr-Glu

VPg-3/P20b Ite-Val-Thr-Glu/Ser-GIy-Ala-Pro

P20b/P56 Pro-His-His-GIu/GIy-Leu-Ile-Val

suggesting that they are cleaved by a single (viral) protease, recognizing the

consensus sequence Glu/Gly (Ser, Thr). This specificity is similar to the one

displayed by the poliovirus protease which cleaves between Gin and Gly resi-

dues at eight out ot eleven processing sites in the polio potyprotein (21, 28,

Fig. 4a). The sequences around the remaining five cleavage sites in FMDV

P2 0a/VP4 Ser-Gln/Asn-Gly-Ser-GIy/Asn-Thr-Gly-Ser

VP4/VP2 Ala-Leu-Leu-Ala/Asp-Lys-Asn-Thr

VP1/P5 2 Lys-GIn-Thr-Leu/Asn-Phe-Asp-Leu

P12/P34 Ala-Glu-Lys-Gln/Leu-Lys-Ala-Arg

P34/P100 Ile-Phe-Lys-Gln/lle-Ser-lle-Pro

show little sequence homology to each other and are thought to be recognized

by cellular proteases. However recent experiments from this laboratory (unpub-

lished data) suggest that the L polypeptide may also be involved in the pro-

cessing of at least the L/P1 junction.

6598



In FMDV there are 14 Glu-Gly, 3 Glu-Ser, and 9 Glu-Thr dipeptides in the

polyprotein sequence which may be substrate for the FMDV protease. Of these

only five, one, and one respectively are utilized, according to the protein map

(Figures 3 and 4a). Therefore, not simply the primary sequence but also a

certain conformation of the amino acid sequence must be recognized by the

viral enzyme.

Homoloqy to poliovirus

Gene organization and processing mechanisms predicted for FMDV from the

nucleotide sequence in Figure 3 are summarized in Figure 4 and compared to

those from poliovirus, a member of the enterovirus family. As outlined in Fig-

ure 4a, the overall organization is well conserved between the two genomes.

This map is also similar to the approximate gene map established for EMCV

(29), another well studied virus belonging to a third genus of picornaviridae. In

an overall comparison FMDV differs from poliovirus by an increase in size of

its genome by about 1000 nucleotides most of which are added in the

5' pioximal part of the genome. Within the polyprotein region the two genomes

differ drastically only by the addition of the L gene and two extra VPg genes

in FMDV, and by the addition of a third extended gene (2a or 2b) in the cen-

tral part of the polio genome. Other corresponding genes often differ in size.

Thus, the capsid proteins are larger in poliovirus, while FMDV has expanded its

nonstructural proteins in the P3 segment (Figure 4).

The alignment of the two genomes was refined using sequence homologies bet-

ween functionally corresponding segments. Using a dot matrix program signifi-

cant homology was detected on the ammo acid level throughout most part of

the coding region, and to a lower extend also on the nucleotide level indicat-

ing common structural features of general functional importance (14). As

shown in Figure 4 these sequence homologies are very high in certain parts of

two non-structural proteins, the polymerase (3d), and protein 2c (x/P34).

Less, but still high homology was detected between the protease genes and

the capsid proteins VP2 and VP3, indicating that the functional specificity of

these proteins was less stringently conserved during picornaviral evolution. The

evolutionary divergance is most pronounced in proteins 1d, 2a/2b and 3a. Pro-

tein 1d (VP1) is the capsid protein most exposed at the surface of the virion

and, as a consequence of the pressure of the host's immune system, its

sequence is highly variable also between different aphtoviruses giving rise to

seven serotypes and many more subtypes. In contrast proteins 2a/2b and 3a,

although varible between FMDV and poliovirus, are highly conserved between

two FMDV serotypes as exemplified by a comparison of FMDV O-|K and

6599



(unpublished results). Therefore, we conclude that these latter proteins play

an important role in the FMDV lite cycle but did coevolve with their targets

which are most likely FMDV-specified molecules. Following the same argu-

ment, we predict that the more generally conserved picornaviral proteins (2c

and 3d) depend in their functions on the interaction with host factors of con-

served structure. In this context it is of note that sequences common to FMDV

and poliovirus in proteins 2c and 3d are also present in Cow Pea Mosaic Virus,

a plant virus which has thus been correlated with picornaviruses (30).

The comparison of the two picornaviral genomes in Figure 4b also indicates

that these differ in size predominantly in regions with low sequence homology.

Sometimes extended blocks of nucleotides are also found inserted into well

conserved genes like in gene 1b (VP2). Together with the addition of com-

plete genes, like the L gene or the extra VPg genes in FMDV, these data indi-

cate that evolution of picornaviral genomes involved the insertion or deletion of

RNA segments of several hundred nucleotides.

CONCLUSIONS

We report the nucleotide sequence of the complete coding part of an aphthovi-

rus genome. This sequence is useful in several ways for further studies of the

viral life cycle. It has already allowed exact predictions of the genome organi-

sation of these viruses and of the amino acid sequences of their gene pro-

ducts. This information facilitates the identification and characterization of these

proteins, e.g. by antisera elicited against synthetic peptides. In addition, cDNA

segments from specific parts of the genome can be expressed in E. coli and

used as specific antigens free of any other viral protein, or be used as sub-

strate tor processing enzymes. Finally, FMDV genes and genomic signals can

be transferred well defined as cDNA copies into animal cells and their functions

analyzed. Such studies should also answer questions as to the function of the

1300 nucleotides long leader sequence preceding the polyprotein gene in the

FMDV RNA.

During preparation of this manuscript, the complete coding sequence for the

FMDV polyprotein has been reported for strain A-io (31). While there are exten-

sive serotype related sequence variations in the structural proteins (3, 12) the

sequence differs in the non-structural genes from the OiK sequence by 331

nucteotides (7.9*0 and 44 predicted amino acids (3.1"), provided that we

neglect three T (U) residues (positions 5978/79, 6013/14, 6016/17), appearing

in duplicate in the A 1 0 sequence published.

6600



ACKNOWLEDGEMENTS

We are indepted to Joern Wolters and Oliver Steinau lor the computer analyses,

and to Gudrun Feil for technical assistance. We also thank Dr. W. Keller and

Dr. K. Strohmaier for gifts of FMDV RNA. We acknowledge Dr. A. King and

Dr. M. Grubman for the communication of unpublished data. This work was

supported by research grants from Biogen S.A. and the Deutsche Forschungs-

gemeinschaft (Forschergruppe Genexpression).

REFERENCES1 Sangar, D.V. (1979) J. gen. Virot. 45, 1-132 Perez-Bercoff, R. (ed.) (1979), The Molecular Biology of the Picornavi-

ruses. Plenum, New York3 Beck, E., Foiss, S., Strebel, K., Cattaneo, R. and Fell, G., (1983)

Nucleic Acids Res. 11, 7873-78854 Johnson, R.A. and Walseth, T.F. (1979) Adv. cyclic Nucteotide Res.

10, 135-1375 Kupper, H., Keller, W., Kurz, C , Forss, S., Schaller, H., Franze, R.,

Marquardt, O., Zaslavsky, V. , and Hofschneider, P.-H. (1981) Nature289, 555-559

6 Vieira, J. and Messing,J. (1982) Gene 19, 259-2687 Strebel, K. (1982) Diplomarbeit, University of Heidelberg8 Maxam, A.M. and Gilbert, W. (1980) Methods in Enzymol. 65, 499-5609 Garoff, H. and Ansorge, W. (1981) Analyt. Biochem. 115, 454-457

10 Osterburg, G., Glatting, K.-H., and Sommer, R. (1982) Nucleic AcidsRes. 10, 207-216

11 Zucker, M. and Stiegler, P. (1981) Nucleic Acids Res. 9, 133-14812 Beck, E., Feil, G., and Strohmaier, K., (1983a) EMBO Journal 2, 555-55913 Hattman, S., Brooks, J.E., and Masurekar, M. (1978) J. Mol.Biol. 126,

367-38014 Forss (1983), P.H. thesis. University of Heidelberg15 Domingo, E., Dayila, M., and Ortin, J. (1980) Gene 11, 333-34618 Holland, J . , Spindler, K., Horodyski, F., Grabau, E., Nichol, S., and

VandePol, S. (1982) Science 215, 1577-158517 Harris, T.J.R., Robson, K.J.H., and Brown, F. (1980) J. gen. Virot. 50,

403-41818 Sangar, D.V., Black, D.N., Rowlands, D.J., Harris, T.J.R., and Brown,

F., (1980) J . Virol. 33, 59-6819 Peacock, S.L., Mclver, C M . , and Monoham, J.J. (1981) Biochem. Bio-

phys. Acta 655, 243-25020 Kozak, M. (1983) Microbiological reviews 47, 1-45.21 Kitamura, N., Semler, B.L., Rothberg, P.G., Larsen, G.R., Adler, C.G.,

Dorner, A . J . , Emini, E.A., Hanecak, R., Lee, J .J . , VanderWerf, S.,Anderson, C.W., and Wimmer, E. (1981) Nature 291, 547-553

22 Palmenberg, A.C. , Kirby, E.M., Janda, M.R., Drake, N.L., Duke, G.M.,Potratz, K.F., and Coltett, M.S. (1984) Nucleic Acids Res. 12,2969-2996

23 Hagenbiichle, O., Sauter, M., Steitz, J.A., and Maus, R.G. (1978) Cetl13, 551-563

24 Forss, S. and Schaller, H. (1982) Nucleic Acids Res. 10, 6441-645025 Robertson, B.H., Morgan, D.O., Moore, D.M., Grubman, M.J., Card, J . ,

Fischer, T., Weddell, G., Dowbenko, D., and Yansura, D. (1983) Virol-ogy 126, 614-623

26 Kurz, C , Forss, S., Kupper, H., Strohmaier, K., and Schaller, H.(1981) Nucleic Acids Res. 9, 1919-1931

27 Strohmaier, K., Wittmann-Liebold, B., and Geissler, A.W. (1978) Biochem.Biophys. Res. Comm. 85, 1840-1645

28 Hanecak, R., Semler, B.L., Anderson, C.W., and Wimmer, E. (1982)Proc. Natl. Acad. Sci. U.S.A. 79, 3973-3977

29 Palmenberg, A.C. (1982) J. Virol. 44, 900-90630 Franssen, H., Lennissen, J . , Goldbach, R., Lomonossoff, G., and Zim-

mern, D., EMBO Journal, in press.31 Carroll, A.R., Rowlands, D.J., and Clarke, B.E. (1984) Nucleic Acids

Res. 12, 2461-2470

6601


nucleotide sequence and genome organization of foot-and-mouth

Documents