letter

384 nature genetics • volume 22 • august 1999

Radiation hybrid map of the mouse genome

William J. Van Etten¹, Robert G. Steen¹, Huy Nguyen¹, Andrew B. Castle¹, Donna K. Slonim¹, Bing Ge², Chad Nusbaum¹, Greg D. Schuler³, Eric S. Lander¹,⁴ & Thomas J. Hudson¹,²

¹Whitehead Institute/MIT Center for Genome Research, Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA. ²Montreal General Hospital Research Institute, McGill University, Montreal, H3G 1A4, Canada. ³National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA. ⁴Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. Correspondence should be addressed to E.S.L. (e-mail: [email protected]) or T.J.H. (e-mail: [email protected]).

Radiation hybrid (RH) maps are a useful tool for genome analysis, providing a direct method for localizing genes and anchoring physical maps and genomic sequence along chromosomes. The construction of a comprehensive RH map for the human genome¹ has resulted in gene maps reflecting the location of more than 30,000 human genes²,³. Here we report the first comprehensive RH map of the mouse genome. The map contains 2,486 loci screened against an RH panel of 93 cell lines⁴. Most loci (93%) are simple sequence length polymorphisms (SSLPs) taken from the mouse genetic map, thereby providing direct integration between these two key maps. We performed RH mapping by a new and efficient approach in which we replaced traditional gel- or hybridization-based assays by a homogeneous 5´-nuclease assay⁵ involving a single common probe for all genetic markers. The map provides essentially complete connectivity and coverage across the genome, and good resolution for ordering loci, with 1 centiRay (cR) corresponding to an average of approximately 100 kb. The RH map, together with an accompanying World-Wide Web server, makes it possible for any investigator to rapidly localize sequences in the mouse genome. Together with the previously constructed genetic map⁶ and a YAC-based physical map reported in a companion paper⁷, the fundamental maps required for mouse genomics are now available.

RH mapping involves screening sequence-tagged sites (STSs) against cell lines from an RH panel created by fusing irradiated donor cells from one species with recipient cells from another species. Each resulting cell line retains a fraction of the donor genome carried in a collection of large genomic fragments. STSs that lie nearby in the genome tend to be retained in the same cell lines. Therefore, STSs showing similar retention patterns in the RH cell lines can be inferred to lie nearby in the genome. To construct a comprehensive RH map of the mouse, we used the T31 RH panel, which carries fragments of the mouse genome on a hamster background. The T31 panel was preliminarily characterized⁴ by the screening of 271 STSs to estimate the retention rate and average fragment size.

To construct an RH map providing comprehensive coverage of the mouse genome, we selected SSLPs from the MIT mouse genetic map⁶. The genetic markers were previously genotyped in an (OB×CAST) F2 intercross representing 92 meioses and thereby mapped to approximately 1,250 ‘bins’. At least one marker was selected from each bin, and as many as four were selected from the most populated bins. Preference was given to markers that had also been mapped on the higher resolution EUCIB cross⁸.

RH mapping is usually performed by scoring the presence or absence of PCR products either by gel electrophoresis or by spotting and hybridization with an internal probe¹. Both procedures involve considerable sample handling. For this project, we developed a different approach. The genetic markers all contain a (CA)ₙ-repeat sequence that is the basis of the length polymorphism. We tested whether the loci might all be detected using the 5´-nuclease (TaqMan) assay⁵ with a single common probe complementary to the (CA)ₙ-repeat. The 5´-nuclease assay is a homogeneous reaction: a target locus is amplified by PCR in the presence of an oligonucleotide probe complementary to an internal region of the locus; the probe hybridizes to the PCR product and is cleaved by the 5´ nuclease activity of Taq polymerase during the PCR reaction. Cleavage is assayed directly in the microtitre plate by labelling the internal probe at one end with a fluorescent dye and at the other end with a moiety that quenches the fluorescent signal when in close proximity. We tested this approach extensively and demonstrated high concordance with gel-based assays. The approach greatly streamlines RH mapping, reducing it to simply setting up PCR reactions in 384-well microtitre plates and scanning them in a fluorescence plate reader (Fig. 1).

Fig. 1 Scoring marker retention on the RH panel based on ratio of FAM to TAMRA fluorescence. The thresholds for positive and negative samples were as defined. For the locus shown, positive samples (shown in yellow) are those with ratios above 0.90 and negative samples (shown in blue) were those with ratios below 0.80. Samples between 0.80 and 0.90 were scored as uncertain (shown in red). Control samples are water (sample 2, green), hamster DNA (sample 58, orange) and mouse DNA (sample 71, magenta). The dashed line indicates the mean value; solid lines indicate thresholds. Axes: radiation hybrid cell line; FAM/TAMRA ratio.

© 1999 Nature America Inc. • http://genetics.nature.com

We constructed the map as follows. All loci were screened in duplicate. The average retention rate for the 93 cell lines used in this project was 30%. Loci with abnormally high or low retention rates or too many discrepancies between the duplicate typings were discarded as unreliable¹. We initially ordered the remaining 2,409 SSLPs according to their location on the genetic map and computed RH distances using the RHMAPPER program⁹. We then refined the map by eliminating loci that introduced an unusually high number of apparent breaks and by testing permutations of local order. A reference map of 2,115 loci was obtained in which the optimal order based on genetic mapping and RH mapping agreed completely. We placed the remaining loci in their most likely position in their respective genetic linkage groups within the reference map if they showed strong pairwise linkage (lod>10). There were 200 markers for which the optimal order differed from the genetic order. These differences tended to be slight, with 83% of the conflicts spanning no more than two bins on the genetic map. Finally, we added 174 random genomic loci, which were localized in the panel by virtue of strong pairwise linkage (lod>10) to a locus in the reference map.
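The quality filter on duplicate typings can be illustrated with a short sketch. It uses the cut-offs given in the Methods (retention above 60% or below 10%, or more than four discrepancies between duplicates). The class and function names are hypothetical, and computing the retention rate only from cell lines where both typings agree is an assumption, since the text does not say how discordant calls enter the rate.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cut-offs quoted in the Methods; the constant names are illustrative.
MAX_RETENTION = 0.60      # above this, a locus is treated as a likely repeat
MIN_RETENTION = 0.10      # below this, the assay is treated as weak
MAX_DISCREPANCIES = 4     # more than four duplicate mismatches is unreliable


@dataclass
class DuplicateTyping:
    """Two independent typings of one locus: 1 = retained, 0 = absent."""
    name: str
    first: List[int]
    second: List[int]


def consensus(t: DuplicateTyping) -> List[Optional[int]]:
    """Per-cell-line call, or None where the duplicates disagree."""
    return [a if a == b else None for a, b in zip(t.first, t.second)]


def discrepancies(t: DuplicateTyping) -> int:
    """Number of cell lines on which the two typings disagree."""
    return sum(1 for a, b in zip(t.first, t.second) if a != b)


def retention_rate(t: DuplicateTyping) -> float:
    """Fraction of concordant cell lines that retain the locus (assumption)."""
    calls = [c for c in consensus(t) if c is not None]
    return sum(calls) / len(calls) if calls else 0.0


def is_reliable(t: DuplicateTyping) -> bool:
    """Apply the three filters described in the Methods."""
    rate = retention_rate(t)
    return (MIN_RETENTION <= rate <= MAX_RETENTION
            and discrepancies(t) <= MAX_DISCREPANCIES)


if __name__ == "__main__":
    pattern = [1] * 28 + [0] * 65          # 93 cell lines, about 30% retained
    locus = DuplicateTyping("example_locus", pattern, list(pattern))
    print(locus.name, round(retention_rate(locus), 3), is_reliable(locus))
```

A locus retained in 28 of 93 concordant cell lines passes, whereas one retained in 70 of 93 would be rejected as a probable repeat sequence.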

The RH map contains 2,486 loci distributed across the 19 autosomes and the X chromosome. The markers on each chromosome are described (Table 1), and a representative map for chromosome 8 is shown (Fig. 2). The total length of the genome is approximately 29,500 cR, corresponding to about 100 kb per cR.
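The kb-per-cR figure is simply the physical length divided by the RH length. The sketch below recomputes it from the Table 1 totals (2,919 Mb and 29,559 cR) and from the chromosome 1 row (210 Mb and 2,509 cR); the function name is illustrative.

```python
def kb_per_centiray(physical_length_mb: float, rh_length_cr: float) -> float:
    """Average physical distance (kb) represented by one centiRay."""
    return physical_length_mb * 1000.0 / rh_length_cr


if __name__ == "__main__":
    # Genome-wide totals and the chromosome 1 row, as listed in Table 1.
    print(round(kb_per_centiray(2919, 29559), 1))
    print(round(kb_per_centiray(210, 2509), 1))
```

These reproduce the tabulated values of 98.8 and 83.7 kb/cR, which the text rounds to about 100 kb per cR.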

To use an RH map to localize new loci, it is necessary to determine the appropriate lod score threshold for reliably declaring linkage. RH maps typically require much higher thresholds than genetic maps, owing to the greater number of breaks between loci and the variable nature of the retention patterns. For each SSLP locus in the reference map, we calculated the highest lod score with a marker on a different chromosome (Fig. 3); this distribution reflects the potential for spurious linkage. It is clear that relatively high lod scores can occur simply by chance. For this reason, it is not reliable to detect linkage by searching for the highest lod score among a sparse collection of loci. This problem is largely eliminated through the use of a very dense map. Most consecutive loci in the current map have lod scores exceeding 9.
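To make the notion of a pair-wise lod score concrete, here is a minimal sketch under a standard textbook two-point model for radiation hybrids: both markers share one retention probability r, and a break occurs between them with probability θ. This is an illustrative assumption and not RHMAPPER's implementation; the function name, the grid search over θ and the restriction to unambiguous 0/1 calls are all choices made for the example.

```python
import math
from typing import Sequence


def two_point_lod(a: Sequence[int], b: Sequence[int],
                  retention: float = 0.30, steps: int = 1000) -> float:
    """Lod score for linkage between two retention patterns (0/1 per cell line).

    Model (an assumption for this sketch): with breakage probability theta,
      P(both present) = (1 - theta) * r + theta * r**2
      P(both absent)  = (1 - theta) * (1 - r) + theta * (1 - r)**2
      P(one present)  = theta * r * (1 - r)   for each discordant pattern
    The lod compares the best theta on a grid with theta = 1 (no linkage).
    """
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    n_disc = sum(1 for x, y in zip(a, b) if x != y)
    r = retention

    def log10_likelihood(theta: float) -> float:
        p11 = (1 - theta) * r + theta * r * r
        p00 = (1 - theta) * (1 - r) + theta * (1 - r) ** 2
        p_disc = theta * r * (1 - r)
        return (n11 * math.log10(p11) + n00 * math.log10(p00)
                + n_disc * math.log10(p_disc))

    # The grid starts above zero so that the discordant term stays finite.
    best = max(log10_likelihood(k / steps) for k in range(1, steps + 1))
    return best - log10_likelihood(1.0)


if __name__ == "__main__":
    marker = [1] * 28 + [0] * 65                         # 93 cell lines
    unlinked = [1] * 8 + [0] * 20 + [1] * 20 + [0] * 45  # chance-level overlap
    print(round(two_point_lod(marker, marker), 1))
    print(round(two_point_lod(marker, unlinked), 1))
```

Two identical patterns across 93 cell lines give a lod well above the threshold of 10 used for placement, while a pair whose overlap matches chance expectation gives a lod near zero.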

To determine the proportion of the mouse genome reliably linked to the RH map, we scored 97 consecutive random genomic STSs (defining a unique fragment in mouse but not hamster DNA) on the RH panel. All the STSs were linked to the map with a lod score of at least 7.0, and the lod score exceeded 9 in 98% of cases. We conclude that the RH reference map covers most of the mouse genome.

The RH map provides a powerful tool for anchoring contigs and localizing markers in the YAC-based mouse physical map reported in a companion paper⁷. STSs were reliably localized in the YAC-based map on the basis of double linkage (that is, linkage to a contig via two independent YACs), but not on the basis of single linkage, owing to the possibility that the single linking YAC is a chimaeric clone containing fragments from two different chromosomal locations. In these cases, the addition of RH data allows one to exclude the possibility of a chimaeric linkage and rely on single-YAC linkage information for physical coverage. With this RH information, the YAC-based map was shown to cover 92% of the mouse genome⁷.

Availability of the RH map allows investigators anywhere in the world to readily map genes and other loci in the mouse genome. In the past, it was necessary to develop a genetic polymorphism and trace its inheritance in an appropriate cross. With RH mapping, one need only score the presence or absence of an STS assay on the 93 DNAs from the publicly available RH map. By entering this information into a server on the web site of the Whitehead/MIT Center for Genome Research (http://www-genome.wi.mit.edu), the locus can be automatically mapped against the RH map. The one practical issue in using this RH panel is that one must use a mouse STS that does not generate a band of equal size in hamster.

The availability of a comprehensive RH map together with the availability of a large collection of mouse ESTs (ref. 10) is likely to spur an explosion in the mapping of mouse genes, just as it has done for human genes. Such a gene-based map is a powerful tool for genetic studies such as positional cloning. In addition, the availability of gene-based maps in multiple organisms will provide the foundation for detailed synteny maps showing the correspondence between conserved genomic segments, making it possible to exploit cross-species information in gene hunts, as well as revealing much about evolutionary forces molding the genome.

Table 1 • Map statistics by chromosome

Chrom  Total  Reference  Other  Other  Phys. length  RH length  kb/cR  Retention
       loci   SSLPs      SSLPs  STSs   (Mb)          (cR)              rate
1      203    174        13     16     210           2,509      83.7   0.266
2      192    172        11     9      203           1,957      103.7  0.269
3      131    108        10     13     175           1,787      98.0   0.337
4      155    129        15     11     172           2,108      81.6   0.271
5      170    145        7      18     166           1,898      87.5   0.273
6      142    117        13     12     161           1,674      96.2   0.263
7      107    85         13     9      151           1,536      98.3   0.273
8      136    118        11     7      145           1,468      98.8   0.318
9      125    100        13     12     140           1,323      105.8  0.319
10     137    116        12     9      141           1,356      104.0  0.314
11     132    113        14     5      138           1,538      89.7   0.342
12     120    106        5      9      142           1,051      135.1  0.349
13     110    94         10     6      128           1,016      126.0  0.293
14     108    94         11     3      130           1,175      110.6  0.279
15     90     80         4      6      118           1,073      110.0  0.332
16     83     68         11     4      111           988        112.3  0.353
17     99     81         10     8      113           1,208      93.6   0.323
18     68     63         2      3      113           1,199      94.3   0.316
19     87     72         6      9      80            806        99.3   0.379
X      91     80         6      5      182           1,889      96.4   0.234
Total  2,486  2,115      197    174    2,919         29,559
Ave    124    106        10     9      146           1,478      98.8   0.300

Methods

T31 RH panel. The T31 mouse/hamster RH panel (Research Genetics) contains 100 cell lines. In a preliminary characterization of the panel with ∼6 assays per chromosome, we identified 7 cell lines with retention rates under 10%; these were eliminated as being largely uninformative. We performed RH typing using the remaining 93 cell lines (listed on the Whitehead Institute/MIT web site) as well as control samples (mouse genomic DNA from strain 129, hamster genomic DNA from strain A23 and a water blank).

Fluorescent PCR assay. PCR reactions were prepared in black 384-well PCR plates (MJ Research) in a multi-step automated system. RH and control DNAs were first aliquoted into the PCR plates from a large-volume source plate using a 384-pin pipettor from the Genomatron robotic platform¹. Subsequently, reagents were added using a Packard liquid-handling robotic workstation. The final PCR reaction (20 µl) consisted of template DNA (10 ng), forward and reverse primers (1 µM each), dNTPs (200 µM each), MgCl₂ (3.5 mM), Tris-HCl (10 mM, pH 8.3), KCl (50 mM), Amplitaq Gold DNA polymerase (0.50 U) and fluorescent probe 5´-FAM-(CA)₁₁–TAMRA-3´ (0.024 µM). The reaction mixture was subjected to a two-step amplification protocol (40 cycles of 15 s denaturation, 92 °C; 60 s annealing, 56 °C) performed on a Tetrad thermocycler (MJ Research). The extent of cleavage of the internal probe was quantified with a Tecan fluorometer by measuring both FAM and TAMRA fluorescence (FAM Ex. 485 nm, Em. 515 nm; TAMRA Ex. 485 nm, Em. 595 nm), where FAM is the signal liberated by cleavage and thus reflects the presence of the desired PCR product and TAMRA is the quencher moiety whose signal decreases during the reaction.

Scoring marker retention with 5´-nuclease assay. We used a simple algorithm based on the ratio of FAM to TAMRA fluorescence to infer whether each of the 93 hybrids was positive or negative for the target PCR product. In general, most hybrids were negative (inasmuch as the retention rate is 30%), and the ratios for the negative hybrids tended to be tightly clustered. Samples were declared negative, positive or uncertain as follows. Histograms were constructed by dividing the range between the highest and lowest ratios into 48 bins of equal size and assigning each ratio to a bin. The most populated bin invariably contained negative samples and was well separated from positive samples by a number of empty bins. Accordingly, we defined a set ‘N’ (intended to contain most of the negative hybrids) to consist of the samples in the most populated bin together with those in all bins not separated from this bin by an empty bin. We checked that the set N included the negative controls. We then calculated the mean ‘m’ and standard deviation ‘s’ of the samples in N and set S=1.96s. We defined samples with ratios in the range [0, m+S] to be negative, those with ratios in the range [m+S, m+3S] to be uncertain and those with ratios in the range [m+3S, ∞) to be positive. For a large number of loci, the scoring algorithm was confirmed by comparing its results with the results of scoring PCR products by agarose gel electrophoresis. The concordance rate exceeded 99.5%.
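The scoring rule can be sketched in a few lines. The following is an illustrative reading of the description, not the authors' code: the function name is hypothetical, the sample standard deviation is assumed (the text does not say which estimator was used), ties at a threshold are assigned to the lower category, and the check that the negative cluster contains the negative controls is left out.

```python
from statistics import mean, stdev
from typing import List, Sequence


def score_hybrids(ratios: Sequence[float], n_bins: int = 48,
                  scale: float = 1.96) -> List[str]:
    """Call each FAM/TAMRA ratio 'negative', 'uncertain' or 'positive'."""
    lo, hi = min(ratios), max(ratios)
    if hi == lo:
        raise ValueError("all ratios are identical; no histogram can be built")
    width = (hi - lo) / n_bins

    def bin_of(x: float) -> int:
        # The largest ratio would land just past the last bin, so clamp it.
        return min(int((x - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for x in ratios:
        counts[bin_of(x)] += 1

    # Grow outwards from the most populated bin until an empty bin is met.
    peak = counts.index(max(counts))
    left = right = peak
    while left > 0 and counts[left - 1] > 0:
        left -= 1
    while right < n_bins - 1 and counts[right + 1] > 0:
        right += 1

    # Set N: the presumed negative hybrids.
    cluster = [x for x in ratios if left <= bin_of(x) <= right]
    m = mean(cluster)
    s = stdev(cluster) if len(cluster) > 1 else 0.0
    big_s = scale * s

    calls = []
    for x in ratios:
        if x <= m + big_s:
            calls.append("negative")
        elif x <= m + 3 * big_s:
            calls.append("uncertain")
        else:
            calls.append("positive")
    return calls


if __name__ == "__main__":
    # Synthetic ratios: a tight negative cluster, one borderline well, and positives.
    negatives = [0.70 + 0.001 * i for i in range(65)]
    positives = [1.40 + 0.01 * i for i in range(28)]
    calls = score_hybrids(negatives + [0.82] + positives)
    print({label: calls.count(label) for label in sorted(set(calls))})
```

Deriving the thresholds from the negative cluster means they adapt to each locus's background signal, which fits the observation that negative ratios are tightly clustered.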

Fig. 2 Integrated RH and genetic map of mouse chromosome 8. Long vertical lines represent the RH map (left line) and the MIT genetic map (right line). Both maps are drawn to equal lengths. Columns of STS names correspond to each map. Loci are indicated at positions spaced proportionally along the map according to the respective metrics. Loci belonging to the RH reference map (in which the optimal order based on genetic mapping and RH mapping agreed completely) are connected by solid lines to the genetic map. Loci placed relative to the reference map are shown in italics. Interrupted lines connecting both maps indicate instances where the optimal RH and genetic orders differ.

RH map construction. All loci were scored in duplicate. The average retention rate for the 93 cell lines used in this project was 30%. Loci with abnormally high (>60%) retention frequencies were discarded as likely to represent repeat sequences. Loci with abnormally low (<10%) retention frequencies or with more than four discrepancies (>4.3%) between the duplicate samples were similarly discarded as likely to be weak or unreliable assays. The mean discrepancy rate in the project was 2.6% for all markers and 1.4% for the mapped markers; a plot of per cent discrepancies is available on the Whitehead web site. We then ordered the 2,409 SSLPs meeting these criteria based on the MIT genetic map and calculated RH distances using the RHMAPPER program⁹. The loci were subjected to an ‘expansion’ test in which any locus whose inclusion expanded the map distance by more than 10 cR or mapped to a significantly different location was excluded. The remaining 2,019 loci were next subjected to a ‘ripple’ test (using the ‘ripple’ function in RHMAPPER) in which consecutive triples of loci were permuted to determine if any permutation yielded a significantly higher likelihood. The best permutation was re-inserted into the order and the next triple was then considered. Finally, the 390 SSLPs that were removed in the previous step were inserted back into the map using the ‘placement’ function of RHMAPPER if they showed a strong pair-wise linkage (lod>10) to another marker. The reference map was defined as a set of 2,115 markers for which the most likely orders on the genetic and RH maps were in agreement.
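The sliding-window logic of the ‘ripple’ test can be sketched as follows. The real test compares likelihoods computed by RHMAPPER; this example substitutes a simpler criterion, the number of obligate breaks between adjacent markers, purely as a stand-in. All function names and the `min_gain` parameter are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def obligate_breaks(order: Sequence[str],
                    patterns: Dict[str, Sequence[int]]) -> int:
    """Count cell lines whose retention status changes between neighbours."""
    total = 0
    for left, right in zip(order, order[1:]):
        total += sum(1 for x, y in zip(patterns[left], patterns[right]) if x != y)
    return total


def ripple(order: Sequence[str], patterns: Dict[str, Sequence[int]],
           window: int = 3, min_gain: int = 1) -> List[str]:
    """Permute each consecutive window and keep the best local order.

    A permutation replaces the current order only if it lowers the cost by
    at least min_gain, a crude analogue of requiring a significant gain.
    """
    current = list(order)
    for start in range(len(current) - window + 1):
        base_cost = obligate_breaks(current, patterns)
        best_perm = current[start:start + window]
        best_cost = base_cost
        for perm in permutations(current[start:start + window]):
            trial = current[:start] + list(perm) + current[start + window:]
            cost = obligate_breaks(trial, patterns)
            if cost < best_cost:
                best_perm, best_cost = list(perm), cost
        if base_cost - best_cost >= min_gain:
            current[start:start + window] = best_perm
    return current


if __name__ == "__main__":
    # Toy retention patterns over six cell lines for four hypothetical markers.
    patterns = {
        "A": [1, 1, 1, 0, 0, 0],
        "B": [1, 1, 0, 0, 0, 0],
        "C": [1, 0, 0, 0, 0, 1],
        "D": [0, 0, 0, 0, 1, 1],
    }
    print(ripple(["A", "C", "B", "D"], patterns))
```

In this toy case the misordered pair is swapped back in the first window and the second window leaves the order unchanged.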

Acknowledgements

We thank G. Farino and V. Frazzoni for technical assistance. T.J.H. is a recipient of a Clinician-Scientist award from the Medical Research Council of Canada. This research was supported by the National Institutes of Health (HG01806).

Received 16 April; accepted 28 June 1999.

Fig. 3 Histogram of pair-wise lod scores between consecutive markers on the Whitehead Institute mouse RH reference map (white bars). Lod scores were calculated using RHMAPPER, with an assumed retention rate of 30%. Bars represent lod scores in the range (n–1) to n, and their height indicates the number of such scores. In addition to the histogram, the figure shows the inverse cumulative distribution of the largest spurious, cross-chromosome lod score observed for each marker on the reference map. Thus, for example, 55% of markers have a cross-chromosomal lod score exceeding 6.

1. Hudson, T.J. et al. An STS-based map of the human genome. Science 270, 1945–1954 (1995).

2. Schuler, G.D. et al. A gene map of the human genome. Science 274, 540–546 (1996).

3. Deloukas, P.A. et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998).

4. McCarthy, L.C. et al. A first-generation whole-genome radiation hybrid map spanning the mouse genome. Genome Res. 7, 1153–1161 (1997).

5. Livak, K.J., Marmaro, J. & Todd, J.A. Towards fully automated genome-wide polymorphism screening. Nature Genet. 9, 341–342 (1995).

6. Dietrich, W.F. et al. A comprehensive genetic map of the mouse genome. Nature 380, 149–152 (1996).

7. Nusbaum, C. et al. A YAC-based physical map of the mouse genome. Nature Genet. 22, 388–393 (1999).

8. Rhodes, M. et al. A high-resolution microsatellite map of the mouse genome. Genome Res. 8, 531–542 (1998).

9. Slonim, D., Kruglyak, L., Stein, L. & Lander, E.S. Building human genome maps with radiation hybrids. J. Comput. Biol. 4, 487–504 (1997).

10. Marra, M. et al. An encyclopedia of mouse genes. Nature Genet. 21, 191–194 (1999).

Fig. 3 axis labels: pair-wise lod scores; number of markers; per cent.
