Proc. Natl. Acad. Sci. USA
Vol. 90, pp. 10793-10797, November 1993
Medical Sciences

Cloning of glycoprotein D cDNA, which encodes the major subunit of the Duffy blood group system and the receptor for the Plasmodium vivax malaria parasite

ASOK CHAUDHURI†, JULIA POLYAKOVA†, VALERIE ZBRZEZNA†, KENNETH WILLIAMS‡, SUBHASH GULATI§, AND A. OSCAR POGO†¶

†Laboratory of Cell Biology, Lindsley F. Kimball Research Institute of the New York Blood Center, New York, NY 10021; ‡Protein and Nucleic Acid Chemistry Facility, Howard Hughes Medical Institute, Yale University, New Haven, CT 06510; and §Department of Medicine, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021

Communicated by Louis H. Miller, August 16, 1993 (received for review July 27, 1993)

ABSTRACT cDNA clones encoding the major subunit of the Duffy blood group were isolated from a human bone marrow cDNA library using a PCR-amplified DNA fragment encoding an internal peptide sequence of glycoprotein D (gpD) protein. The open reading frame of the 1267-bp cDNA clone indicated that gpD protein was composed of 338 amino acids, predicting a Mr of 35,733, which was the same as a deglycosylated gpD protein. Portions of the predicted amino acid sequence matched with six CNBr/pepsin peptides obtained from affinity-purified gpD protein. In ELISA analysis, an anti-Duffy murine monoclonal antibody reacted with a synthetic peptide deduced from the cDNA clone. Hydropathy analysis suggested the presence of 9 membrane-spanning α-helices. In bone marrow RNA blot analysis, the gpD cDNA detected a 1.27-kb mRNA in Duffy-positive but not in Duffy-negative individuals. It also identified the same size mRNA in adult kidney, adult spleen, and fetal liver; in brain, it detected a prominent 8.5-kb and a minor 2.2-kb mRNA. In Southern blot analysis, gpD cDNA identified a single gene in Duffy-positive and -negative individuals.
Duffy-negative individuals, therefore, have the gpD gene, but it is not expressed in bone marrow. The same or a similar gene is active in adult kidney, adult spleen, and fetal liver of Duffy-positive individuals. Whether this is true in Duffy-negative individuals remains to be demonstrated. A GenBank sequence search yielded a significant protein sequence homology to human and rabbit interleukin-8 receptors.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The Duffy blood group system consists of two principal antigens Fya and Fyb produced by FY*A and FY*B alleles. Antisera anti-Fya and anti-Fyb defined four phenotypes, Fy(a+b-), Fy(a-b+), Fy(a+b+), and Fy(a-b-) (1). Neither antiserum agglutinates Duffy Fy(a-b-) cells, the predominant phenotype in Blacks. Antisera defining the other phenotypes, Fy3, Fy4, and Fy5, are very rare. A murine monoclonal antibody, anti-Fy6, defined another Duffy antigenic determinant present in all Duffy-positive cells but absent in Fy(a-b-) cells (2). Blacks with Fy(a-b-) erythrocytes cannot be infected by the human malaria parasite Plasmodium vivax (3). These cells are also resistant to the in vitro invasion by Plasmodium knowlesi, a simian parasite that invades Fy(a+b-) and Fy(a-b+) human erythrocytes (4). Receptors for erythrocyte invasion by these parasites, therefore, are related to the Duffy blood group system.

Using the anti-Fy6 monoclonal antibody, we have developed a procedure for purification of Duffy antigens in human erythrocytes (5). Duffy antigens appear to be multimeric erythrocyte-membrane proteins composed of different subunits. A glycoprotein, named gpD, of 35-45 kDa is the major subunit of the protein complex and has the antigenic determinants defined by anti-Fya, anti-Fyb, and anti-Fy6 antibodies (5, 6).
The characterization, at the molecular level, of this protein will be crucial in finding its function on the erythrocyte membrane, in understanding the parasite-erythrocyte recognition process, and eventually in resolving the molecular mechanism of parasite invasion. This report describes the isolation, sequence analysis, and tissue expression of an mRNA encoding gpD.‖

MATERIALS AND METHODS

Partial Amino Acid Sequence Analysis of gpD Protein. The purified protein from Fy(a-b+) human erythrocytes (5) was alkylated and cleaved with cyanogen bromide (CNBr) as explained (7). Pe-1 peptide was obtained by sequencing the nonfractionated CNBr digest using the o-phthalaldehyde blocking reagent (8) (see legend, Fig. 1B). Pe-5 peptide was the partial sequence of the only fragment (≈4 kDa) that separated very well from the CNBr digest run on the three-layer SDS/PAGE system (9). After the run, the peptide fragment was electroblotted onto ProBlott (Applied Biosystems) and sequenced (7). Another aliquot was digested with pepsin (50:1 ratio) at 37°C overnight, and the fragments were separated by reverse-phase HPLC using a Vydac C18 column. Pe-2, Pe-3, Pe-4, and Pe-6 peptides, which were the few pepsin peptides yielded by reverse-phase HPLC, were sequenced. Applied Biosystems protein/peptide sequencer, model 470 or 477, was used according to the manufacturer's recommendations.

Primer Design and PCR. The nucleotide sequence of the primers (23-mer each) was deduced from the N-terminal and C-terminal amino acid sequences of Pe-5 peptide (see legend, Fig. 1B). Bases were chosen according to the codon preference described by Lathe (10), and deoxyinosine (I) was incorporated at positions where degeneracy exceeded 3-fold, except toward the 3' end. Primer A (sense) was specific for residues 245-252 (see Fig. 1B) and consisted of 12-fold degeneracy 5'-ATGAAYATHYTITGGGCITGGTT (where Y = C or T; and H = C, T, or A).
Primer B (antisense) was specific for residues 261-268 (see Fig. 1B) and consisted of 32-fold degeneracy 5'-ACIAGRAARTCIAGICCIARNAC (where R = A or G; and N = G, A, T, or C).

First-strand cDNA was synthesized from Fy(a-b+) phenotype mRNA using the preamplification kit from BRL and oligo(dT) as primer. For enzymatic amplification, cDNA, primer A, primer B, and Taq polymerase (Stratagene) were

Abbreviation: gpD, glycoprotein D.
¶To whom reprint requests should be addressed.
‖The sequence reported in this paper has been deposited in the GenBank data base (accession no. U01839).
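The 12-fold and 32-fold degeneracies quoted for primers A and B follow from multiplying the number of bases allowed at each ambiguous position. The short Python sketch below reproduces that arithmetic. It assumes that deoxyinosine (I) counts as a single species, since it is one physical nucleotide; the function name and the lookup table are illustrative and are not part of the original methods.

```python
from math import prod

# Number of bases represented by each symbol used in primers A and B.
# Assumption: deoxyinosine (I) is counted once because it is a single
# nucleotide, so it does not multiply the number of primer species.
BASES_PER_SYMBOL = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2,  # A or G
    "Y": 2,  # C or T
    "H": 3,  # C, T, or A
    "N": 4,  # G, A, T, or C
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct sequences in a degenerate primer."""
    return prod(BASES_PER_SYMBOL[symbol] for symbol in primer.upper())


PRIMER_A = "ATGAAYATHYTITGGGCITGGTT"  # sense, residues 245-252
PRIMER_B = "ACIAGRAARTCIAGICCIARNAC"  # antisense, residues 261-268

for name, primer in (("A", PRIMER_A), ("B", PRIMER_B)):
    print(f"primer {name}: {len(primer)}-mer, {degeneracy(primer)}-fold degenerate")
```

Counting inosine as one species is the reason it lowers the stated degeneracy: each I replaces a position that would otherwise contribute a factor of up to 4.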




[Figure 1 image. Panel A: map of cDNA clones Fyb81, Fyb71, and the combined Fyb71-81 with partial restriction sites, on a scale of 250-1250 bp. Panel B: nucleotide sequence (nt 1-1267) and deduced amino acid sequence of Fyb71-81, with the positions of peptides Pe-1 to Pe-6 marked.]

FIG. 1. (A) Schematic representation and partial restriction sites of the two longest gpD protein cDNA clones. The overlapping of Fyb81 and Fyb71 and the combination of the two clones (Fyb71-81) are shown. An, poly(A) tail. (B) Nucleotide and amino acid sequences of the combined Fyb71-81 cDNA encoding gpD protein. Amino acid residues are numbered on left; nucleotide positions are numbered on right. Positions of peptides that match predicted amino acids are shown by solid single lines. The two potential carbohydrate-binding sites to asparagine residues are shown by arrows. The third glycosylation site, the asparagine at position 37, is unlikely to occur because it is followed by aspartic acid (14). Double underlining at the 5' end indicates the sequence used to primer-extend the 5' end; double underlining at the 3' end is the consensus poly(A) addition sequence. The CNBr peptide (Pe-1) sequenced with the o-phthalaldehyde reagent was P(L/A)FRIQL(S/C)PGXPVLAQ. Sequences of pepsin peptides separated on HPLC were FSIVV (Pe-2), FAQAL (Pe-3), XXVGIX (Pe-4), and PSLPLPEGR (Pe-6). X denotes ill-defined residues in the corresponding cycle, and L/A and S/C mean either residue. The sequence of CNBr peptide separated on SDS/PAGE (Pe-5) was [M]NILWAWFIFWWPHGVVLGLDFLV (residues used to design primers A and B are in italics; residues used to design the probe for cDNA-library screening are in boldface letters; the methionine added to CNBr Pe-5 peptide is in brackets).

incubated in a Perkin-Elmer thermal DNA cycler. The amplification product of expected size (72 bp) was subcloned in pBluescript-SK vector (Stratagene). The deduced amino acid sequence of the insert matched the sequence of Pe-5 peptide (Fig. 1B). From the sequence WFIFWWPH of peptide Pe-5, the oligonucleotide TGGTTTATTTTCTGGTGGCCTCAT was chemically synthesized, 32P-labeled at the 5' end with T4 polynucleotide kinase (New England Biolabs), and used as a probe to screen a human bone marrow cDNA library (see below).

Human mRNA and DNA Isolation. Poly(A)+ RNA was isolated as explained (11) and by using the Invitrogen isolation kit. mRNAs from Caucasian adult liver, spleen, kidney, brain, and fetal liver, as well as erythroleukemia cells K-562, were obtained from Clontech. DNA was obtained from peripheral blood leukocytes of the four Duffy phenotypes by a slight modification of a published procedure (12).

RNA-Blot Analysis (Northern). Poly(A)+ RNAs were run on formaldehyde/agarose gel and transferred onto Hybond-N+ nylon membranes (Amersham). They were hybridized in QuickHyb (Stratagene) and washed according to the manufacturer's instructions.

DNA-Blot Analysis (Southern). All restriction enzyme digestions were done according to the conditions suggested by the supplier (New England Biolabs). Digested DNA was size-fractionated on 0.8% agarose gel and blotted as described for Northern analysis. Hybridization in QuickHyb solution was carried out at 68°C for 1 hr according to the manufacturer's instructions.

Construction and Screening of a cDNA Human Bone Marrow Library. A mixture of mRNA of several Fy(a-b+) individuals, the BRL Superscript Choice system, and oligo(dT) as a primer was used to prepare cDNA. The cDNA was ligated into λZAP II vector and packaged with Gigapack Gold (Stratagene) extract. About 1.9 × 10^6 unamplified cDNA clones were screened with the 32P-labeled probe described above. cDNA inserts in pBluescript were isolated by the plasmid-rescue method according to the manufacturer's protocol. Both DNA strands were sequenced by using vector primers and by primers designed from the sequenced regions of the transcript.

Primer Extension. A 32P-labeled 24-mer antisense primer from nt 57-80 of the coding strand (Fig. 1B) was extended on Fy(a-b+) mRNA using a preamplification kit (BRL), and the products were separated on a 6% sequencing gel. The M13 sequence ladder was used to determine sizes of products.

RESULTS AND DISCUSSION

Partial Amino Acid Sequence of gpD Protein. In this study the gpD protein was purified from Fy(a-b+) erythrocytes. The N terminus of this highly hydrophobic protein was blocked. The obstacles to obtaining internal sequences were its insolubility at neutral pH without detergents and its tendency to aggregate. After trial and error, the protein was digested with CNBr or pepsin, and three approaches were used for amino acid sequencing. (i) The unfractionated CNBr peptide mixture was sequenced by blocking with the o-phthalaldehyde reagent those peptides that lacked a proline in the earliest cycles of sequencing; the procedure yielded Pe-1 peptide of 16 residues (see legend for Fig. 1B). (ii) A fragment of the CNBr cleavage mixture (of ≈4 kDa) that resolved well on SDS/PAGE was sequenced and produced Pe-5 peptide of 23 residues. (iii) Four short pepsin fragments that eluted as single peaks from a reverse-phase HPLC column were sequenced and yielded Pe-2 and Pe-3 of five residues each, Pe-4 of three residues, and Pe-6 of nine residues (pepsin digestions at 100:1 ratio and 4°C for 30 or 60 min did not generate larger peptides).

RNA Amplification by PCR. Pe-5 peptide was the most promising for generating a probe for the selection of gpD protein clones. Pe-2, Pe-3, Pe-4, and Pe-6 peptides were too short for PCR amplification, whereas the Pe-1 peptide was larger, but it had three ill-defined residues (see legend for Fig. 1B). Because the Pe-5 peptide was produced by CNBr


cleavage, a methionine was included at the N terminus to increase the length of the peptide to 24 residues. Two degenerate primers for amino acids 245-252 (primer A) at the N terminus and amino acids 261-268 (primer B) at the C terminus were chemically synthesized and used to amplify the coding sequence of Pe-5 peptide from pooled human bone marrow mRNA of Fy(a-b+) individuals. The expected PCR-amplified product of 72 bp was cloned and sequenced; it had an open reading frame of the exact sequence of the Pe-5 peptide (data not shown). The 24-mer oligonucleotide probe having codon usage for amino acids 251-258 successfully identified true gpD protein cDNA clones.
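The relationship between the 24-mer screening probe and the Pe-5 peptide can be checked by translating the probe with the standard genetic code and locating the result within Pe-5. The Python sketch below does this under one stated assumption: the bracketed methionine of Pe-5 is residue 245, as implied by primer A covering residues 245-252. The helper names are illustrative.

```python
# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


PROBE = "TGGTTTATTTTCTGGTGGCCTCAT"     # 24-mer screening probe
PE5 = "MNILWAWFIFWWPHGVVLGLDFLV"       # Pe-5 with the added N-terminal Met
PE5_FIRST_RESIDUE = 245                # assumption: Met is residue 245

probe_peptide = translate(PROBE)
offset = PE5.find(probe_peptide)
probe_start = PE5_FIRST_RESIDUE + offset
probe_end = probe_start + len(probe_peptide) - 1

print(f"probe encodes {probe_peptide}, residues {probe_start}-{probe_end}")
```

Unlike primers A and B, the probe carries no ambiguity codes: a single codon was chosen for each of the eight residues, so one 24-nt sequence was synthesized.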

Nucleotide Sequence of the gpD Protein cDNA Clones. A nonamplified human bone marrow cDNA library constructed from pooled mRNA of Fy(a-b+) individuals was screened with the 24-mer probe. Of 1.9 × 10^6 recombinant λZAP II phage, four positive clones were selected and sequenced. All clones had overlapping sequences but did not fully extend gpD cDNA. Clone Fyb81 of 1085 bp, the only clone that included the ultimate 5' end, and clone Fyb71 of 1083 bp, which extended from nt 185 to the poly(A) tail, were the longest. Clone Fyb31 of 989 bp and clone Fyb82 of 726 bp extended from nt 275 and 527, respectively, to the poly(A) tail. Combination of clone Fyb81 with any other clone generated the full-length cDNA of gpD protein. Fig. 1A shows the overlapping and combination of the two longest clones.

The joined Fyb71-81 clone predicted an open reading frame that started at position 176 and stopped at position 1192, encoding a polypeptide of 338 amino acid residues (Fig. 1B). A GenBank sequence search (release 77) at the National Center for Biotechnology Information using the BLAST network service yielded a significant protein sequence homology to human and rabbit interleukin-8 receptors and quasi-total nucleotide sequence homology with a human hippocampus cDNA clone HHCMF86 (see below). Verification of the 5' end of clone Fyb81 was done by primer extension. The extended product of an antisense primer (from position 57 to 80, Fig. 1B) yielded a sequence of 80 nt that matched exactly with the predicted size at the 5' end of the Fyb81 clone (data not shown). At positions 176-178, the initiation codon is not embedded within a sequence context most frequently associated with mammalian translation initiation (13). We assumed, however, that it is the true initiation codon for the following reasons: (i) it is the only ATG codon at the 5' end; and (ii) from the first methionine residue, the polypeptide encoded by the combined clones has the same molecular mass as that of deglycosylated gpD protein (6). At the 3' end, clone Fyb71-81 included the consensus poly(A) addition signal AATTAAA (Fig. 1B).

Both clones had a perfect nucleotide sequence match, except at the 5' end, where several base substitutions yielded six different amino acid predictions. These discrepancies were not sequencing errors because both DNA strands were sequenced several times. They were a consequence of protein heterogeneity because the cDNA library was constructed from mRNA of several Fy(a-b+) individuals.

To establish that clone Fyb71-81 had a coding sequence specific to gpD protein, we compared the translated sequence with the partial amino acid sequence data obtained from the six peptides described in the legend of Fig. 1B. Portions of the predicted amino acid sequence matched with Pe-1 peptide sequenced by the o-phthalaldehyde reagent, with four peptides isolated by reverse-phase HPLC (Pe-2, Pe-3, Pe-4, and Pe-6 peptides), and with the Pe-5 peptide isolated by SDS/PAGE. However, two of a total of 62 residues did not match. Thus, the residues at positions 92 and 327 were tryptophan by codon sequence analysis, but they were isoleucine and arginine, respectively, by amino acid sequence determination. Because tryptophan is a very unstable residue, the discrepancies may be a technical problem in amino acid sequence analysis. On the other hand, they may be due to the heterogeneity of gpD protein.
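The reading-frame coordinates reported for clone Fyb71-81 can be cross-checked with simple arithmetic. The Python sketch below assumes that "stopped at position 1192" refers to the last nucleotide of the stop codon, which is an interpretation rather than a statement from the text; the variable names are illustrative. Under that assumption the frame spans a whole number of codons and gives the reported 338 residues.

```python
CDNA_LENGTH = 1267      # nt, combined Fyb71-81 clone
ORF_FIRST_NT = 176      # first nucleotide of the initiation codon
ORF_LAST_NT = 1192      # assumption: last nucleotide of the stop codon
REPORTED_MR = 35_733    # predicted Mr of the polypeptide

orf_length = ORF_LAST_NT - ORF_FIRST_NT + 1   # nucleotides in the frame
codons, remainder = divmod(orf_length, 3)     # remainder must be 0
residues = codons - 1                         # the stop codon encodes no residue

five_prime_utr = ORF_FIRST_NT - 1             # nt upstream of the ATG
three_prime_region = CDNA_LENGTH - ORF_LAST_NT  # nt after the stop, incl. poly(A)

mean_residue_mass = REPORTED_MR / residues    # average Da per residue

print(f"ORF: {orf_length} nt = {codons} codons -> {residues} residues")
print(f"5' untranslated: {five_prime_utr} nt; 3' region: {three_prime_region} nt")
print(f"mean residue mass: {mean_residue_mass:.1f} Da")
```

The mean residue mass is only a plausibility check on the reported Mr; it is not a substitute for summing the masses of the actual residues.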

Additional evidence that clone Fyb71-81 encoded gpD protein was provided by RNA blot and ELISA analysis. Probe Fyb81 did not detect any bone marrow mRNA in Duffy-negative individuals, but it detected an ≈1.27-kb transcript representing the full length of gpD mRNA in Duffy-positive individuals (see Fig. 3A). Anti-Fy6 antibody reacted with a 35-mer synthetic peptide (residues 9-44, see Fig. 1B), predicted by the Fyb71-81 clone (data not shown). The absence of gpD protein-specific mRNA in Fy(a-b-) phenotypes (see below) and the reaction of anti-Fy6 with a peptide derived from a gpD cDNA are strong indications that the clones we isolated are true Duffy clones.

Amino Acid Sequence and Membrane Topology of gpD Protein. The predicted translation product of the Fyb71-81 clone is an acidic protein of isoelectric point 5.65 and Mr 35,733. The protein carries at the N terminus only two potential canonical sequences for N-glycosylation to asparagine residues (14). This result agrees with previous investigations that N-glycosidase F digestion increases gpD protein mobility on SDS/PAGE and with the chemical detection of N-acetylglucosamine (6, 15, 16).

Predictions of transmembrane helices locations from sequence data using the hydropathy map of Engelman et al. (17) and a scanning window of 20 residues show that the bulk of the protein is embedded in the membrane (Fig. 2). Nine transmembrane α-helices, a hydrophilic domain of 66 residues at the N terminus, a hydrophilic domain of 25 residues at the C terminus, and short protruding hydrophilic connecting segments were predicted. The pair of helices, D and E, is so closely spaced that the pair may be arranged as coupled

[Figure 2 image. Upper panel: hydropathy profile plotted against amino acid position. Lower panel: membrane topology model with the NH2 terminus on the exocellular side and the COOH terminus in the cytoplasm.]

FIG. 2. (Upper) Hydropathy plot of the gpD sequence. Hydropathy values for a membrane-span of 20 residues were determined using the FOAMPC program as described by Engelman et al. (17). The predicted α-helices were assigned alphabetical labels from A to I. (Lower) Proposed model for membrane orientation of gpD was based on the predictions of the hydropathy profile and on the exocellular location of the N terminus determined from the charge-difference rule, the two potential glycosylation sites, and anti-Fy6 reactivity.
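A hydropathy profile of this kind is a sliding-window average of per-residue hydrophobicity values. The Python sketch below illustrates the idea with a 20-residue window. Note the substitution: it uses the Kyte-Doolittle scale, not the Engelman et al. scale or the program used for Fig. 2, so its numbers are not comparable with the published plot. It is applied only to the Pe-5 peptide from the legend of Fig. 1B, and the function name is illustrative.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
# Assumption: this scale stands in for the Engelman et al. scale of Fig. 2.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 20) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


PE5 = "MNILWAWFIFWWPHGVVLGLDFLV"  # Pe-5 peptide with the added Met

profile = hydropathy_profile(PE5, window=20)
for start, mean in enumerate(profile, start=1):
    print(f"window starting at peptide position {start}: {mean:+.2f}")
```

A window of about 20 residues is used because that is roughly the length of an α-helix spanning a lipid bilayer; sustained positive window means are read as candidate membrane-spanning segments.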


anti-parallel helices. A schematic illustrationtopology is shown in Fig. 2.The charge-difference rule proposed by l

(18) predicts that the N terminus is on the exothe C terminus of the protein is on the cytoplamembrane. The N-terminal prediction is vfinding of the two potential N-glycosylationterminus. Moreover, the reaction of anti-Fy6peptide deduced from this domain establishelocation experimentally because the antibod)rocytes. The signal-anchor sequence (19, 20insertion probably lies in the first transmembrfollows the N-terminal domain. From there odeeply buried in the membrane and exits at recytoplasmic side of the membrane (Fig. 2).predictions of helices, hydrophilic connectinjthe location of the C-terminal fragment shouated by direct biochemical and immunochenDuffy gpD protein is deeply buried in the

the membrane-associated fragment of band 3blood group Rh polypeptide (22, 23), bactericand lipophilin (25). The significant homologywith interleukin-8 receptors is very intriguingprotein binds chemokines and has the abililsignal-transduction cascade, this gives rise tcanother class of proinflammatory mediatcprotein is not present in leukocytes becausclonal antibody (anti-gpD) against purifiedgpD protein that reacts with erythrocytes alsors does not react with any leukocytes (unpuNorthern and Southern Analyses. On RN

Fyb71 or Fyb8l clone detected an -1.27-kbin the bone marrow of the three Duffy-positbut not in individuals of Fy(a-b-) phenotypxabsence of gpD mRNA was consistent withgpD protein in Duffy-negative individuals.body did not react with any erythrocyte memlFy(a-b-) erythrocytes (data not shown).

kbA B

kb12.0-

- 28S 6.5-_- 118S 3.5-

1.35- -2.0- _1.4-

0.24-

_ wActin

2 3 4

05-

i of gpD protein individuals do not express gpD protein because they do notsynthesize Duffy-specific mRNA.

On Southern blot analysis, the Fyb71 or Fyb81 probe hybridized with DNA of Duffy-positive and -negative individuals (Fig. 3B). The probes identified a single band of 6.5 kb in BamHI-digested DNA, two bands of 12 kb and 2 kb in EcoRI-digested DNA, and two bands of 3.5 kb and 1.4 kb in Pst I-digested DNA. These findings agree with the restriction map of the Fyb71 and Fyb81 clones and show a single-copy gene. Determination of the structural differences among the genes of Duffy-positive and -negative individuals should clarify the mechanism of gpD gene repression in negative individuals. A functional silencer element described in other systems may selectively repress transcription of the gpD gene in the erythrocytes of Fy(a-b-) individuals (28). The Duffy system is different from the ABO (29) and Kell (Colvin Redman, personal communication) systems, where mRNA has been found in individuals who do not express the blood group determinants.

As indicated in Fig. 4, a 1.27-kb mRNA species was found in adult spleen and kidney and in fetal liver, but not in adult liver and K-562 erythroleukemia cells. Hybridization with the β-globin probe showed a strong signal in bone marrow and fetal liver; it showed a weak signal in adult spleen but no signal in adult liver, brain, and kidney (data not shown). The presence of gpD mRNA in fetal liver was expected because fetal liver is an erythropoietic organ. In human brain, a strong band of 8.5 kb and a faint band of 2.2 kb were detected. This finding is interesting, as it indicates there is a Duffy-related protein in brain. This idea is also supported by the quasi-total homology between the Fyb71-81 clone and a human hippocampus cDNA clone, HHCMF86, which was recently identified (30). However, it is unlikely that the 8.5-kb brain mRNA codes for a Duffy protein with long 5' and 3' untranslated sequences. Perhaps the brain mRNA codes for a larger protein that has extensive homology with gpD protein. The homologies of these mRNA species with gpD-specific mRNA remain to be demonstrated by sequence analysis; however, the findings strongly indicate that gpD protein or a similar protein is produced in kidney, nonhemopoietic spleen cells, and probably in brain.

In summary, we have isolated four cDNA clones that encode gpD protein, the major subunit of the Duffy blood group antigenic system. It is a highly hydrophobic intramembrane glycoprotein with nine putative transmembrane α-helices. The cognate gene is present in Duffy-positive and -negative individuals, but the bone marrow of Duffy-negative individuals does not synthesize gpD-specific mRNA. In adult kidney, spleen, and fetal liver, the mRNA has the same size as gpD mRNA; however, in brain the mRNA is much larger.

FIG. 3. RNA and Southern blots probed with either the Fyb71 or the Fyb81 insert. (A) Human bone marrow poly(A)+ RNA from the four phenotypes. Lanes: 1, 10 µg of Fy(a-b-) mRNA; 2, 5 µg of Fy(a+b-) mRNA; 3, 5 µg of Fy(a-b+) mRNA; and 4, 2 µg of Fy(a+b+) mRNA. RNAs were resolved on a 2% denaturing agarose gel, blotted, hybridized, and autoradiographed for 72 hr at -80°C. The RNA size markers human 28S (5.1 kb) rRNA, 18S (2.0 kb) rRNA, and the 1.35-kb GIBCO/BRL marker (Life Technologies, Grand Island, NY) were used to calculate the size of gpD mRNA. The actin probe at bottom was used as a control of sample loading. RNA integrity was indicated by the presence of the two rRNAs in the poly(A)+ fraction and the actin probe. (B) Human genomic DNAs from the four phenotypes. Each lane contained 10 µg of digested DNA. Lanes: 1-4, Fy(a-b-); 5-8, Fy(a+b-); 9-12, Fy(a-b+) DNA. Enzyme digestions were as follows: lanes 1, 5, and 9, BamHI; 2, 6, and 10, EcoRI; 3, 7, and 11, HinfI; and 4, 8, and 12, Pst I. DNAs were resolved on a 0.8% agarose gel, blotted, hybridized, and autoradiographed for 7 days at -80°C. Sizes were calculated from the positions of GIBCO/BRL DNA markers.
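The caption states that the size of gpD mRNA was calculated from the 28S (5.1 kb), 18S (2.0 kb), and 1.35-kb markers but does not give the procedure. The following is a minimal sketch of one common approach, assuming that log10(size) varies linearly with migration distance on the gel. The migration distances, the band position, and the function names are hypothetical illustrations and are not taken from the paper.

```python
import math

def fit_log_linear(markers):
    """Least-squares fit of log10(size_kb) = a + b * distance_mm.

    markers: list of (distance_mm, size_kb) pairs for the size standards.
    Returns the intercept a and slope b.
    """
    xs = [d for d, _ in markers]
    ys = [math.log10(kb) for _, kb in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def estimate_size_kb(distance_mm, a, b):
    """Convert a migration distance into an estimated transcript size in kb."""
    return 10 ** (a + b * distance_mm)

# Marker sizes come from the caption; the distances (mm) are HYPOTHETICAL.
markers = [(20.0, 5.1), (45.0, 2.0), (56.0, 1.35)]
a, b = fit_log_linear(markers)

# HYPOTHETICAL band position for an unknown transcript.
band_distance_mm = 58.0
print(f"estimated size: {estimate_size_kb(band_distance_mm, a, b):.2f} kb")
```

With real measurements, the marker distances would be read from the autoradiograph, and the fit is only trustworthy within the size range spanned by the markers.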

[Fig. 4 image: lanes 1-7; size markers at 8.50, 2.20, and 1.35 kb; 28S and 18S rRNA positions marked.]

FIG. 4. RNA blot analysis of poly(A)+ RNA from human tissues probed with the insert of the Fyb81 clone. Lanes 1, 3, 5, and 7 contained 2 µg of Fy(a-b+) bone marrow, fetal liver, adult spleen, and erythroleukemia (K-562) mRNAs, respectively. Lanes 2, 4, and 6 contained 7 µg of total brain, adult liver, and adult kidney mRNA, respectively. RNAs were resolved on a 1.5% denaturing agarose gel and autoradiographed for 5 days at -80°C.


The clones that we have characterized will provide the elements to investigate (i) the structural components of gpD genes, (ii) the biosynthesis and expression of gpD protein in human bone marrow and other tissues, (iii) the structure-function of this erythrocyte-membrane protein that might exist in other cell types and may function as a chemokine receptor, and (iv) the role as the receptor for P. vivax merozoite invasion.

Note Added in Proof. Since submission of this article, a family of chemotactic and proinflammatory soluble peptides, including interleukin 8 (IL-8), melanoma growth-stimulating activity (MGSA), monocyte chemotactic peptide 1 (MCP-1), and regulated on activation, normal T-cell expressed and secreted peptide (RANTES), has been shown to bind to Duffy-positive erythrocytes only. This observation strongly supports the idea that gpD protein is an additional class of chemokine receptor (31).

We are grateful to Dr. J. Adamson for his support and encouragement. We thank Drs. M. Nichols and P. Rubinstein for the Fy6 murine monoclonal antibody, Drs. D. Engelman and R. Macnab for the FOAMPC program, and the Laboratory of Immunohematology for erythrocyte antigen typing. We are indebted to H. Hlawaty, M. Noble, and K. Stone for technical assistance and P. Lim for secretarial assistance. This research was partially supported by Grant HL 39021 from the National Institutes of Health and New York Blood Center Institutional Funds.

1. Marsh, W. L. (1975) Crit. Rev. Clin. Lab. Sci. 5, 387-412.
2. Nichols, M. E., Rubinstein, P., Barnwell, J., de Cordoba, S. R. & Rosenfield, R. E. (1987) J. Exp. Med. 166, 776-785.
3. Miller, L. H., Mason, S. J., Clyde, D. F. & McGinniss, M. H. (1976) N. Engl. J. Med. 295, 302-304.
4. Miller, L. H., Mason, S. J., Dvorak, J. A., McGinniss, M. H. & Rothman, K. I. (1975) Science 189, 561-563.
5. Chaudhuri, A., Zbrzezna, V., Johnson, C., Nichols, M., Rubinstein, P., Marsh, W. L. & Pogo, A. O. (1989) J. Biol. Chem. 264, 13770-13774.
6. Chaudhuri, A. & Pogo, A. O. (1993) in Blood Cell Biochemistry, eds. Cartron, J.-P. & Rouger, P. (Plenum, New York), Vol. 6, in press.
7. LeGendre, N. & Matsudaira, P. T. (1989) in A Practical Guide to Protein and Peptide Purification for Microsequencing, ed. Matsudaira, P. T. (Academic, New York), pp. 49-69.
8. Brauer, A. W., Oman, C. L. & Margolies, M. N. (1983) Anal. Biochem. 137, 134-142.
9. Shagger, H. & von Jagow, G. (1987) Anal. Biochem. 168, 368-379.
10. Lathe, R. (1985) J. Mol. Biol. 183, 111-112.
11. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. (1979) Biochemistry 18, 5294-5299.
12. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY).
13. Kozak, M. (1987) Nucleic Acids Res. 87, 8125-8131.
14. Marshall, R. D. (1972) Annu. Rev. Biochem. 41, 673-702.
15. Tanner, M. J. A., Anstee, D. J., Mallison, G., Ridgwell, K., Martin, P. G., Aventi, N. D. & Parsons, S. F. (1988) Carbohydr. Res. 178, 203-212.
16. Wasniowaska, K., Eichenberger, P., Kugele, F. & Hadley, T. J. (1993) Biochem. Biophys. Res. Commun. 192, 366-372.
17. Engelman, D. M., Steitz, T. A. & Goldman, A. (1986) Annu. Rev. Biophys. Chem. 15, 321-353.
18. Hartmann, E., Rapoport, T. A. & Lodish, H. F. (1989) Proc. Natl. Acad. Sci. USA 86, 5786-5790.
19. Wessels, H. P. & Spies, M. (1988) Cell 55, 61-70.
20. Blobel, G. (1980) Proc. Natl. Acad. Sci. USA 77, 1496-1500.
21. Jay, D. (1986) Annu. Rev. Biochem. 55, 511-538.
22. Cherif-Zahar, B., Bloy, C., Le Van Kim, C., Blanchard, D., Bailly, P., Hermand, P., Salmon, C., Cartron, J. P. & Colin, Y. (1990) Proc. Natl. Acad. Sci. USA 87, 6243-6247.
23. Avent, N. D., Ridgwell, K., Tanner, M. J. A. & Anstee, D. J. (1990) Biochem. J. 271, 821-825.
24. Carlton, P. & Rosenbusch, P. J. (1985) EMBO J. 4, 1593-1597.
25. Stoffel, W., Hillen, H., Schröder, W. & Deutzmann, R. (1983) Hoppe-Seyler Z. Physiol. Chem. 364, 1455-1466.
26. Holmes, W. E., Lee, J., Kuang, W.-J., Rice, G. C. & Wood, W. I. (1991) Science 253, 1278-1280.
27. Murphy, P. M. & Tiffany, H. L. (1991) Science 253, 1280-1283.
28. Li, L., Suzuki, T., Mori, N. & Greengard, P. (1993) Proc. Natl. Acad. Sci. USA 90, 1460-1464.
29. Yamamoto, F., Clausen, H., White, T., Marken, J. & Hakomori, S. (1990) Nature (London) 345, 229-233.
30. Adams, M. D., Dubnick, M., Kerlavage, A. R., Moreno, R., Kelley, J. M., Utterback, T. R., Nagle, J. W., Fields, C. & Venter, J. C. (1992) Nature (London) 355, 632-634.
31. Horuk, R., Chetan, C. E., Darbonne, W. C., Colby, T. J., Rybicki, A., Hadley, T. J. & Miller, L. H. (1993) Science 261, 1182-1184.
