Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.107.074377
Predicting the Size of the Progeny Mapping Population Required to Positionally Clone a Gene
Stephen J. Dinka,* Matthew A. Campbell,† Tyler Demers* and Manish N. Raizada*,1
*Department of Plant Agriculture, University of Guelph, Guelph, Ontario, Canada N1G 2W1 and †The Institute for Genomic Research, Rockville, MD 20850
Manuscript received April 10, 2007
Accepted for publication June 4, 2007
ABSTRACT
A key frustration during positional gene cloning (map-based cloning) is that the size of the progeny mapping population is difficult to predict, because the meiotic recombination frequency varies along chromosomes. We describe a detailed methodology to improve this prediction using rice (Oryza sativa L.) as a model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping population size by comparing these theoretical estimates to 41 successful positional cloning attempts. We then used each validated equation to test whether neighborhood meiotic recombination frequencies extracted from a reference RFLP map can help researchers predict the mapping population size. We developed a meiotic recombination frequency map (MRFM) for ~1400 marker intervals in rice and anchored each published allele onto an interval on this map. We show that neighborhood recombination frequencies (R-map, >280-kb segments) extracted from the MRFM, in conjunction with the validated formulas, better predicted the mapping population size than the genome-wide average recombination frequency (R-avg), with improved results whether the recombination frequency was calculated as genes/cM or kb/cM. Our results offer a detailed road map for better predicting mapping population size in diverse eukaryotes, but useful predictions will require robust recombination frequency maps based on sampling more progeny.
A limited number of forward genetics techniques exist that isolate an allele underlying a mutant or polymorphic phenotype and that require no prior knowledge of the gene product. These include protocols to isolate host DNA flanking insertional mutagens (e.g., transposons) (Ballinger and Benzer 1989; Raizada 2003) and positional gene cloning techniques (Botstein et al. 1980; Paterson et al. 1988; Tanksley et al. 1995) that permit the discovery of alleles created by chemical mutagens, radiation, or natural genetic variation. Positional gene cloning is feasible when the following conditions are met: (1) two parents exist that differ in a trait of interest; (2) the parents can be distinguished at the chromosome level by polymorphic DNA markers (e.g., RFLP); and (3) in a population of progeny, the underlying gene can be mapped relative to nearby DNA segments that have previously been cloned (Botstein et al. 1980; Tanksley et al. 1995). Unfortunately, positional gene cloning suffers from unpredictability in terms of the number of post-meiotic progeny that a researcher can expect to genotype to narrow a candidate chromosomal region to a small number of candidate genes (Dinka and Raizada 2006). For example, in rice (Oryza sativa L.), only 1160 gametes were genotyped to narrow the Pi36(t) allele to a resolution of 17 kb (Liu et al. 2005), whereas 18,944 gametes were genotyped to map the Bph15 allele to a lower resolution of 47 kb (Yang et al. 2004). During fine mapping, the physical distance between a known physical location on a chromosome (i.e., the molecular marker) and the target allele is inferred from the frequency of meiotic recombinants that break cosegregation of the phenotype encoded by the target allele with physically anchored molecular markers (Botstein et al. 1980; Paterson et al. 1988). Ideally, a gene hunt ends once a molecular marker is found that always cosegregates with the target phenotype in a large population of genotyped and phenotyped F2 (or post-F2) progeny. Therefore, the frequency of meiotic recombination in the vicinity of the target locus (defined as R = kilobases/cM), along with the local density of molecular markers, determines the size of the mapping population. We are interested in helping researchers predict mapping population size. As initial analysis assigns a target allele to a 1–5-cM map interval, the goal of this study is to determine whether the recombination frequency at this interval size, obtained from a high-density molecular marker map, can be used in combination with user-friendly mathematical formulas to predict the number of progeny required for subsequent sub-centimorgan mapping.
Durrett et al. (2002) used the kb/cM ratio (R) as the basis of an equation (which we will refer to as the Durrett–Tanksley equation) to predict genotyping requirements during positional cloning, the only such equation we could find in the literature. Durrett et al. compared the results of their equation to empirical evidence from 12 published positional cloning successes in Arabidopsis thaliana; the model often appeared to overestimate the number of progeny required to be genotyped. However, the accuracy of the model was difficult to assess, because only the genome-wide recombination frequency was employed, rather than local rates of recombination. Perhaps as a result, it was simply concluded that some researchers were lucky or unlucky (Durrett et al. 2002).
1Corresponding author: Department of Plant Agriculture, University of Guelph, 50 Stone Rd., Guelph, Ontario, Canada N1G 2W1. E-mail: [email protected]
Genetics 176: 2035–2054 (August 2007)
Building upon the work of Durrett et al., we have tried to understand and predict when a researcher will be lucky or unlucky during positional gene cloning by accounting for: (1) over-genotyping (resulting in redundant crossovers between the target locus and the closest molecular markers); (2) a low density of available molecular markers in the target interval (causing some crossovers to be missed); and, most important, (3) high or low local rates of recombination (R) compared to the genome-wide average (Nachman 2002). We have compared the predictions of the Durrett–Tanksley equation to empirical data obtained from 41 positional cloning studies in rice (O. sativa L.), which is a model system for the world's most important crops, the cereals (Paterson et al. 2005). Specifically, we have measured the predictive accuracy of the Durrett–Tanksley equation and then focused on whether "neighborhood" (<2 cM) recombination values obtained from a reference genetic map (Harushima et al. 1998) further improve the accuracy of the model compared to using the genome-wide average recombination rate (R-avg). In addition, we have derived and tested a simpler equation that predicts progeny mapping size. Finally, we have measured the utility of employing R-values calculated as genes/cM rather than kb/cM to predict mapping population size, as the former allows the candidate gene number to be estimated, which is of greater interest to researchers targeting sequenced, annotated genomes.
MATERIALS AND METHODS
Use and modification of the Durrett–Tanksley equation: First, we used the Durrett–Tanksley equation (Durrett et al. 2002), which estimates the number of F2/post-F2 meiotic gametes required to positionally clone an allele as derived from an F1 heterozygote, based on the following probability:
$$P = 1 - \left[1 + \frac{NT}{100R}\right] e^{-NT/(100R)},$$
where P is the probability that, if a (proximal) crossover occurs in the vicinity of a target allele, a second (distal) crossover will be carried by a sibling gamete; N is the number of genotyped chromosomes (informative gametes) required; T is the map resolution, the candidate kilobase or gene block distance between the closest two molecular markers containing the target allele; and R is the recombination frequency (kb/cM or genes/cM).
As the equation depends only on the value NT/(100R), if the probability is set at 0.95, then NT/(100R) = 4.744, which may be rewritten as N = (4.744 × 100R)/T.
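The threshold value 4.744 can be checked numerically. The following is a minimal sketch rather than code from the study: it solves P = 1 − (1 + x)e^(−x) for x = NT/(100R) by bisection and then converts x into N. The function names are hypothetical, and the example inputs (T = 44.5 kb, R = 277 kb/cM) are values quoted elsewhere in this article, used here for illustration only.

```python
import math

def dt_probability(n, t, r):
    """Durrett-Tanksley probability for N gametes, resolution T, and frequency R."""
    x = n * t / (100.0 * r)
    return 1.0 - (1.0 + x) * math.exp(-x)

def solve_x(p, tol=1e-12):
    """Solve 1 - (1 + x) * exp(-x) = p for x >= 0 by bisection.

    The left-hand side increases monotonically with x, so bisection is safe.
    """
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - (1.0 + mid) * math.exp(-mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dt_gametes(p, t, r):
    """Informative gametes N required for success probability p."""
    return solve_x(p) * 100.0 * r / t

x95 = solve_x(0.95)                             # close to the 4.744 quoted in the text
n_example = dt_gametes(0.95, t=44.5, r=277.0)   # illustrative inputs only
```

Solving for x once and rescaling by 100R/T mirrors the observation above that the equation depends only on NT/(100R).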
To adjust for the target number of gametes containing an informative crossover (λT), which we assume may decrease T (better map resolution), we introduced the empirically derived T modifier, 4.744/λT (see Results); the resulting modified Durrett–Tanksley equation is as follows:
$$N = \frac{4.744 \times 100R}{T_{\text{marker}} \times (4.744/\lambda_T)},$$
or simplified,
$$N = \frac{100R \times \lambda_T}{T_{\text{marker}}},$$
where N is the total number of informative chromosomes (gametes) that must be genotyped with the probability of success set at P = 0.95, R is the local recombination frequency (R-local) (kb/cM or genes/cM), T-marker is the distance between the closest two molecular markers (in which crossovers are detected relative to the target allele) (kilobases or gene block), and λT is the number of crossovers between the closest two molecular markers (≥2).
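As a small worked sketch of the simplified form N = (100R × λT)/T-marker, the function below evaluates it directly. The function name and the example inputs are hypothetical; with λT = 4.744 the result coincides with the unmodified Durrett–Tanksley estimate at P = 0.95.

```python
def modified_dt_gametes(r, t_marker, lambda_t):
    """Modified Durrett-Tanksley estimate: N = 100 * R * lambda_T / T-marker.

    r        -- local recombination frequency (kb/cM or genes/cM)
    t_marker -- distance between the closest two markers (same unit as r's numerator)
    lambda_t -- number of crossovers between those markers (the text requires >= 2)
    """
    if lambda_t < 2:
        raise ValueError("lambda_t must be at least 2")
    if t_marker <= 0 or r <= 0:
        raise ValueError("r and t_marker must be positive")
    return 100.0 * r * lambda_t / t_marker

# Hypothetical inputs: R = 277 kb/cM, T-marker = 44.5 kb, three crossovers.
n_three_crossovers = modified_dt_gametes(277.0, 44.5, 3)
# With lambda_T = 4.744 the modifier cancels and the original estimate is recovered.
n_unmodified = modified_dt_gametes(277.0, 44.5, 4.744)
```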
The Durrett–Tanksley equation assumes that the recombination frequency (R) is constant in the vicinity T of the target allele. This equation also requires that the genotype of the target allele (a) in F2/post-F2 progeny can be assigned. Thus, in the case of a recessive target allele, N equals the number of F2 testcross progeny. Alternatively, where F2 progeny are the product of selfing F1 heterozygotes (such as in plants), since each F2 progeny is derived from two meioses, N equals two times the number of F2 progeny genotyped; this is true, however, only when the F2 progeny genotype AA can be distinguished from the genotype Aa, since this is required to determine whether a crossover occurred on the proximal or distal side of the target allele. Such a determination requires testing progeny for segregation of phenotypes in the F3 generation (progeny testing).
Derivation of a simplified equation based on single-crossover probability: We developed the following user-friendly equation to estimate the fine-mapping population size, an estimate of the number of F2 testcross progeny required to be genotyped to detect sufficient crossovers to achieve a desired kilobase or gene block resolution:
$$N = \frac{\log(1 - P)}{\log\left(1 - T_{\text{marker}}/(100R)\right)},$$
where N is the number of meiotic gametes (chromosomes) that must be genotyped in which it can be determined whether a crossover is located proximal or distal to the target allele, P is the threshold probability of success (e.g., 0.95), T-marker is the expected distance between flanking molecular markers (kilobases or candidate genes), and R is the local or genome-wide average recombination frequency (kb/cM or genes/cM).
This equation was based on the assumption that if a crossover occurs in a segment (with length T) on the proximal side of a target allele in a large population of F2 progeny (N), then there is an equal chance that a recombination event will be carried by a sibling F2 gamete on the distal side within a distance of <T from the target allele, as shown in Figure 1B. Hence, because only the probability of a single recombination event occurring within the mapping population must be calculated, the equation is simplified. It is recognized that the distance between the two crossovers will range from zero to 2T; on average, however, the distance will be T, and likely <T when there are more than two informative crossovers and/or when the molecular marker resolution is limiting. Since the majority of positional cloning studies report more than two informative crossovers (λ) (see Table 2), and since the minimum distance between flanking molecular markers (T-marker) is often limiting, the probability is high that the distance between the closest two crossovers will be <T-marker.
The detailed derivation of this equation is as follows:
1. P (failure) of a crossover in the target interval (T) per gamete = (total genome crossovers − target interval crossovers)/total genome crossovers.
2. Alternatively, P (failure) per gamete = 1 − (fraction of genome × number of crossovers in whole genome).
3. Thus, P (failure) per gamete = 1 − [(kb resolution/kb genome size) × (genome map in cM/100)] or P (failure) per gamete = 1 − [(gene block resolution/genome-wide gene number) × (genome map in cM/100)].
4. Since P (failure) = (P failure per gamete)^N, where N is the number of informative gametes, then
$$N = \frac{\log(P_{\text{fail}})}{\log(P_{\text{fail per gamete}})}$$
and
$$N = \frac{\log(1 - P_{\text{success}})}{\log(P_{\text{fail per gamete}})}.$$
5. Therefore, N = Log (1 − Psuccess)/Log [1 − (gene block/genome gene number) × (genome map cM/100)] or N = Log (1 − Psuccess)/Log [1 − (kb target/genome kb) × (genome map cM/100)].
6. Simplified, the above equation can be rewritten as
$$N = \frac{\log(1 - P_{\text{success}})}{\log\left[1 - (\text{kb target}/100) \times (\text{total cM}/\text{total genome kb})\right]},$$
or
$$N = \frac{\log(1 - P)}{\log\left(1 - T_{\text{marker}}/(100R)\right)},$$
where R is the local or genome-wide recombination frequency. Additional assumptions of this model are as follows:
1. The equation assumes that the phenotype of the trait of interest can be readily scored to determine if a crossover occurred proximal or distal to the target allele; hence N is equivalent to the number of testcross progeny, 0.5 × the number of F2 (selfed) progeny (if no progeny testing is performed), or 2 × the number of F2 (selfed) progeny (if F3 progeny testing is performed).
2. The equation assumes that the frequency of double recombinants in a small interval is negligible due to crossover interference.
3. The equation assumes that the crossover may occur anywhere in the defined interval T such that the distance between each informative crossover and the target locus is <T.
4. The recombination frequency is assumed to be constant in the region <2T.
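The derivation above can be sketched in code as a consistency check. This is an illustrative sketch with hypothetical function names, not the authors' implementation. It evaluates the final single-crossover form and the step 5 form (fraction of genome × map length/100) and confirms that they agree when R is taken as genome size divided by map length. The totals used (about 430,000 kb and 1521 cM) are the approximate rice figures quoted in this article, and the 44.5-kb target is an example value only.

```python
import math

def single_crossover_gametes(p, t_marker, r):
    """N = log(1 - P) / log(1 - T-marker / (100 R))."""
    per_gamete_success = t_marker / (100.0 * r)
    if not 0.0 < per_gamete_success < 1.0:
        raise ValueError("T-marker / (100 R) must lie strictly between 0 and 1")
    return math.log(1.0 - p) / math.log(1.0 - per_gamete_success)

def single_crossover_gametes_from_totals(p, target_kb, genome_kb, map_cm):
    """Step 5 form: failure per gamete = 1 - (target/genome) * (map cM / 100)."""
    per_gamete_fail = 1.0 - (target_kb / genome_kb) * (map_cm / 100.0)
    return math.log(1.0 - p) / math.log(per_gamete_fail)

genome_kb, map_cm = 430_000.0, 1521.0      # approximate totals quoted in the article
r_from_totals = genome_kb / map_cm         # kb/cM implied by these two totals
n_final_form = single_crossover_gametes(0.95, 44.5, r_from_totals)
n_step5_form = single_crossover_gametes_from_totals(0.95, 44.5, genome_kb, map_cm)
```

Because the result is a ratio of two logarithms, the base of the logarithm does not matter; natural logarithms are used here.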
Modified single crossover equation: Based on empirical data, we then modified this equation by adjusting the genetic map resolution T by the number of crossovers (see Results), resulting in the equation:
$$N = \frac{\log(1 - P)}{\log\left\{1 - \left[T_{\text{marker}} \times 3/\lambda_T\right]/(100R)\right\}},$$
where N is the total number of informative chromosomes that must be genotyped with the probability of success, P = 0.95, R is the local recombination frequency (R-local) (kb/cM or genes/cM), T-marker is the distance (kb or candidate gene block) between the closest two molecular markers (in which crossovers are detected relative to the target allele), and λT is the number of crossovers between the closest two molecular markers (≥2).
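A minimal sketch of the modified single crossover equation follows, under the assumption that the garbled modifier in the source reads T-marker × 3/λT. The function name and the example inputs are hypothetical. With λT = 3 the modifier equals 1, so the result matches the unmodified single-crossover estimate.

```python
import math

def modified_single_crossover_gametes(p, t_marker, r, lambda_t):
    """N = log(1 - P) / log(1 - [T-marker * 3 / lambda_T] / (100 R))."""
    if lambda_t < 2:
        raise ValueError("lambda_t must be at least 2")
    effective_t = t_marker * 3.0 / lambda_t       # resolution adjusted by crossover count
    per_gamete_success = effective_t / (100.0 * r)
    if not 0.0 < per_gamete_success < 1.0:
        raise ValueError("adjusted T / (100 R) must lie strictly between 0 and 1")
    return math.log(1.0 - p) / math.log(1.0 - per_gamete_success)

# Hypothetical inputs: P = 0.95, T-marker = 44.5 kb, R = 277 kb/cM.
n_lambda3 = modified_single_crossover_gametes(0.95, 44.5, 277.0, 3)
n_lambda6 = modified_single_crossover_gametes(0.95, 44.5, 277.0, 6)
```

More crossovers between the flanking markers shrink the effective T, so the predicted N rises with λT.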
Analysis of published positional cloning studies: We analyzed 41 published positional cloning/fine-mapping studies in rice to extract or calculate the three variables, N, T, and R (Table 1). The candidate gene resolution (T) [in kb or gene number, T(kb) or T(gene)] was either reported in each study or obtained by personal communication with the authors. In the latter case, these were confirmed by corroborating the kilobase resolution with the gene resolution using the TIGR Pseudomolecules Release 4.0 database (Yuan et al. 2005); retroelements, transposons, and transposases were excluded for gene resolution. The calculation of N gametes genotyped was more complex; it required us to distinguish the actual number of progeny genotyped (g) from the number of informative chromosomes (N), defined as chromosomes that had the potential of having a crossover between the target allele and a flanking molecular marker, and where the location of that crossover (proximal or distal to the target) was distinguished (e.g., using progeny testing). To convert g to N, we multiplied g by a meiosis factor (f) as shown in Table 1 (also see footnotes to Table 1). This required us to classify the mapping strategy used and note whether the target trait was dominant, recessive, or expressed in the haploid generation (gamete or gametophyte). For example, for the cloning of the recessive bc1 allele (Y. Li et al. 2003), since only F2 recessive progeny were genotyped (7068 recessives genotyped out of 30,000 F2 progeny) and hence the genotype of the target allele was unambiguous, the total number of informative chromosomes genotyped was 2 × 7068 (i.e., f = 2, hence N = 2 × g). In contrast, for the fine mapping of the dominant Psr1 allele (Nishimura et al. 2005), since 3800 (Backcross 3, BC3) F1 progeny were genotyped, and thus only 50% of the target chromosomes underwent informative meioses, f = 0.5 and N = 1900 informative chromosomes. For rice, it was assumed that males and females had equal rates of recombination, but in many species, such as zebrafish, this is not true (Singer et al. 2002; Lenormand and Dutheil 2005) and must be accounted for in the meiosis factor. Finally, to calculate the local recombination frequency (R-local) (Table 2), we used the following equation:
$$R_{\text{local}} = \frac{T(\text{local})}{m(\text{local})},$$
where R-local is the local recombination frequency (kilobases/cM), T(local) is the distance in kilobases between the closest two crossovers, m(local) is the genetic map distance between the two crossovers in centimorgans, and m = 100 × (λ1 + λ2)/N, where λ1 is the number of closest, proximal crossovers (Table 2), λ2 is the number of closest, distal crossovers (Table 2), and N is the total number of informative gametes (chromosomes) genotyped (Table 1). In a testcross, m = 100 × λ/progeny, whereas in a selfed cross with progeny testing, m = 100 × λ/(2 × progeny), since genotyping permits both chromosomes to contribute to the mapping population.
The only crossovers (λT) in the calculation were those that were between the two molecular markers used to define T. For each of the 41 studies, we applied the values for R(local) and T(kb), with P set at 0.95, to the Durrett–Tanksley equation and compared the number of informative gametes (N) required by this equation to the empirical numbers shown in Table 1. We performed both nonparametric correlation analysis (Spearman coefficient) and linear regression analysis using the software program Instat 3 (GraphPad Software).
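The two conversions described in this section, from progeny genotyped (g) to informative gametes (N) and from crossover counts to R-local, can be sketched as follows. The function names are hypothetical. The bc1 and Psr1 inputs are the figures given in the text; the R-local inputs are invented for illustration and do not come from any study.

```python
def informative_gametes(progeny_genotyped, meiosis_factor):
    """N = g * f, where f is the meiosis factor for the mapping strategy."""
    return progeny_genotyped * meiosis_factor

def local_recombination_frequency(t_kb, lambda_proximal, lambda_distal, n):
    """R-local = T / m, with m = 100 * (lambda1 + lambda2) / N in centimorgans."""
    m_cm = 100.0 * (lambda_proximal + lambda_distal) / n
    if m_cm <= 0:
        raise ValueError("at least one crossover is needed to estimate R-local")
    return t_kb / m_cm

n_bc1 = informative_gametes(7068, 2)       # recessive class genotyped, f = 2 -> 14136
n_psr1 = informative_gametes(3800, 0.5)    # dominant allele in backcross progeny, f = 0.5

# Hypothetical values: 50 kb between the closest crossovers, one crossover
# on each side of the target, 4000 informative gametes.
r_local_demo = local_recombination_frequency(50.0, 1, 1, 4000)
```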
Generation of a reference meiotic recombination frequency map (MRFM) for rice: To determine whether recombination frequencies derived from a reference genetic map could be used to predict progeny sampling requirements using the Durrett–Tanksley equation, we first assembled such a map, inspired by a previous report (Wu et al. 2003), to generate two types of recombination values: R(gene), in genes/cM; and R(kb), in kilobases/cM (see supplemental Table 1 at http://www.genetics.org/supplemental/). The names and GenBank accession numbers of RFLP markers genetically mapped in an F2 population between Nipponbare and Kasalath were obtained from the Rice Genome Project (RGP: http://rgp.dna.affrc.go.jp/) (Harushima et al. 1998). FASTA sequence files for the markers were obtained from NCBI. The RFLP marker sequences from the RGP map were physically mapped onto the version 4 TIGR rice pseudomolecules map (http://www.rice.tigr.org) using the Genomic Mapping and Alignment Program (GMAP) (Wu and Watanabe 2005). The physical map position of each marker was derived from the top hit that exceeded a threshold of 95% identity over 90% of the length. After physically positioning the RFLP markers onto the pseudomolecules, Perl scripts and manual inspection were used to remove all markers showing map incongruency (where the physical and genetic positions of the markers were at odds). We obtained 1391 congruent markers for the RGP map. This established both physical and genetic locations and hence interval distances for each RFLP marker; from these values, the kb/cM recombination frequency was calculated for each marker pair. To generate the corresponding genes/cM frequencies, we queried the Osa1 database at TIGR: the coordinates of all 42,535 non-transposable element-related transcription units were obtained (Yuan et al. 2005). Custom Perl scripts were written to bin these transcription units between each RFLP marker pair. This established the number of non-transposable element candidate genes for each interval along with the genetic locations of these markers, and hence the following parameters were calculated for each RFLP marker pair: the genetic distance between each marker and the corresponding genes/cM recombination rate.
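The per-interval calculation can be sketched as below. This is not the authors' Perl code; it is a small Python illustration with a hypothetical data layout (marker tuples of name, physical position in kb, and genetic position in cM, plus a list of gene start coordinates). It assumes that incongruent markers have already been removed, and it reports no frequency for intervals with zero genetic distance.

```python
from bisect import bisect_left

def interval_frequencies(markers, gene_positions_kb):
    """Compute kb/cM and genes/cM for each pair of adjacent markers.

    markers           -- list of (name, physical_kb, genetic_cM) on one chromosome
    gene_positions_kb -- start coordinates (kb) of non-transposable-element genes
    """
    ordered = sorted(markers, key=lambda m: m[1])
    genes = sorted(gene_positions_kb)
    rows = []
    for (name_a, kb_a, cm_a), (name_b, kb_b, cm_b) in zip(ordered, ordered[1:]):
        d_kb = kb_b - kb_a
        d_cm = cm_b - cm_a
        # Bin genes whose start coordinate falls in the half-open interval [kb_a, kb_b).
        n_genes = bisect_left(genes, kb_b) - bisect_left(genes, kb_a)
        if d_cm > 0:
            kb_per_cm, genes_per_cm = d_kb / d_cm, n_genes / d_cm
        else:
            kb_per_cm = genes_per_cm = None   # frequency undefined without recombination
        rows.append({"interval": (name_a, name_b), "kb": d_kb, "cM": d_cm,
                     "genes": n_genes, "kb_per_cM": kb_per_cm,
                     "genes_per_cM": genes_per_cm})
    return rows

# Toy data, invented for illustration only.
demo_markers = [("m1", 0.0, 0.0), ("m2", 300.0, 1.0), ("m3", 500.0, 3.0)]
demo_genes = [10.0, 50.0, 299.0, 300.0, 450.0]
demo_rows = interval_frequencies(demo_markers, demo_genes)
```

Sorting the gene coordinates once and using binary search keeps the binning step fast even for tens of thousands of transcription units.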
Testing the predictive value of the Modified Durrett–Tanksley equation using R-map recombination frequencies: Next we assigned each target allele to a physical location on the RGP physical map, which contains 1400 marker intervals. To accomplish this, each target allele was assigned a TIGR locus number (if cloned) or placed onto a BAC/PAC clone (if not cloned; TIGR Pseudomolecules Release 4.0); sometimes this information was published. In the remaining examples, the GenBank gene sequence or molecular marker information was used to screen the TIGR rice sequence database; the genetic map position, marker data, and BAC/PAC assignment helped to verify the physical assignment. The locus or BAC/PAC name and sequence were then used to assign each allele to an interval between two mapped markers on the RGP MRFM of rice (Table 2; supplemental Table 1 at http://www.genetics.org/supplemental/). The recombination frequency of the corresponding marker interval (R-map) was then employed; because we feared that chance crossovers might distort the recombination frequency in small intervals (<277 kb, 1-cM average) on this map, adjacent segments were sometimes added together (to achieve a >280-kb interval) before calculating an average R-map value, with the goal of situating the target allele at the physical center of the larger interval. In rare situations, an R-map value for an interval of <280 kb was accepted because adjacent intervals were unusually large. The choice to add or not add marker intervals was made blind to the R-local values in order not to bias R-map values. The R-map values were then applied to each equation.
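One way to automate the interval-merging step is sketched below. The article describes a judgment made case by case, so the rule used here is an assumption introduced for illustration and not the authors' procedure: keep adding the neighboring interval on whichever side has received less physical length so far, until the merged segment exceeds 280 kb. The function name, data layout, and toy values are hypothetical.

```python
def neighborhood_r_map(intervals, target_index, min_kb=280.0):
    """Merge adjacent marker intervals around a target and return (lo, hi, R-map).

    intervals    -- consecutive (length_kb, length_cM) marker intervals on one chromosome
    target_index -- index of the interval that contains the target allele
    R-map is the merged length in kb divided by the merged length in cM.
    """
    lo = hi = target_index
    total_kb, total_cm = intervals[target_index]
    left_kb = right_kb = 0.0   # physical length added on each side so far
    while total_kb <= min_kb:
        can_left = lo > 0
        can_right = hi < len(intervals) - 1
        if not (can_left or can_right):
            break              # no neighbors left to add
        # Assumed balancing rule: grow the side that has received less length.
        if can_left and (not can_right or left_kb <= right_kb):
            lo -= 1
            kb, cm = intervals[lo]
            left_kb += kb
        else:
            hi += 1
            kb, cm = intervals[hi]
            right_kb += kb
        total_kb += kb
        total_cm += cm
    r_map = total_kb / total_cm if total_cm > 0 else None
    return lo, hi, r_map

# Toy intervals (kb, cM), invented for illustration only.
demo_intervals = [(100.0, 0.5), (90.0, 0.2), (150.0, 0.6)]
demo_result = neighborhood_r_map(demo_intervals, target_index=1)
```

The balancing rule only approximates centering the target; it does not cover the rare case noted above in which a shorter interval was accepted because its neighbors were unusually large.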
Calculation of R-avg values: The genome-wide average recombination frequency in kilobases/cM was calculated by dividing the total genome size (~430 Mb) (IRGSP 2005) by the total genetic map length (~1521 cM) (Harushima et al. 1998); the average recombination frequency in genes/cM was calculated by dividing the total number of non-transposable element-encoded transcription units (~42,535) (Yuan et al. 2005) by the map length. The resulting genome-wide recombination frequency (R-avg) in rice is 277 kb/cM and 28 genes/cM.
RESULTS
Initial equations to predict mapping population size: Initially, we employed two equations to predict the size of the fine-mapping population, one of which is developed here. First, we used the Durrett–Tanksley equation (Durrett et al. 2002), which estimates the number of F2/post-F2 meiotic gametes required to positionally clone an allele as generated from an F1 heterozygote; it calculates the probability (P) that, if a (proximal) crossover occurs in the vicinity of a target allele, a second (distal) crossover will be carried by a sibling gamete, such that the distance between the two crossovers will be the kilobase distance T (Figure 1A), for a prescribed number of genotyped gametes (N) (informative chromosomes) and for a given recombination frequency (R), according to the following equation:
$$P = 1 - \left[1 + \frac{NT}{100R}\right] e^{-NT/(100R)}.$$
The primary assumption of the equation is that the progeny number will vary with the recombination frequency: the higher the frequency of recombination, the fewer progeny will be required to detect a crossover between the target allele and flanking molecular markers. See Materials and Methods for additional details.
Figure 1. An explanation of how the map resolution (T) was calculated for the equations used in this study. (A) The Durrett–Tanksley equation calculates the probability that two sibling post-meiotic (F2) gametes will carry informative crossovers: a proximal crossover (X) flanking the target allele (solid line) and a second, distal crossover occurring at a distance <T from the first crossover. (B) The simplified, single crossover equation divides the flanking region of the target allele into T segments and calculates the probability that a crossover will occur in any T segment. Thus, within the F2 gamete population, the average distance between flanking crossovers will be T (range >0 to <2T).
We then derived a second equation with the goal of making it more user-friendly for researchers. This equation was based on the following premise: if a crossover occurs in a segment (with length T) on the proximal side of a target allele in a large population of F2 progeny (N), then there is an equal probability that a sibling gamete will carry a crossover on the distal side within a distance of <T from the target allele, as shown in Figure 1B. This simplifies the equation, because only the probability of a single crossover within the population has to be calculated; note, however, that although on average any two crossovers will be distance T apart, they may range from zero to 2T apart (see Materials and Methods for further details). The number of F2 testcross progeny required to be genotyped to detect sufficient crossovers to achieve a desired kilobase or gene block resolution is thus as follows:
$$N = \frac{\log(1 - P)}{\log\left(1 - T_{\text{marker}}/(100R)\right)},$$
where N is the number of meiotic gametes (chromosomes) that must be genotyped in which it can be determined whether a crossover is located proximal or distal to the target allele, P is the threshold probability of success (e.g., 0.95), T-marker is the expected distance between flanking molecular markers (kilobases or candidate genes), and R is the local or genome-wide average recombination frequency (kb/cM or genes/cM).
Similar to the Durrett–Tanksley equation, this model assumes that the phenotype of the trait of interest can be readily scored to determine if a crossover occurred proximal or distal to the target allele; hence N is equivalent to the number of testcross progeny, 0.5 times the number of F2 (selfed) progeny (if no progeny testing is performed), or two times the number of F2 (selfed) progeny (if F3 progeny testing is performed). The derivation of this equation is in the Materials and Methods section.
Empirical gamete number, mapping resolution, and lessons from published studies in rice: To validate the equations noted above, we first analyzed 41 published positional cloning/fine-mapping studies in rice to extract or calculate N and T (Table 1) (see Materials and Methods). We made several observations that might be useful to future research groups who wish to undertake positional cloning in rice. First, as in other species, in rice there was a wide range in the number of informative gametes (N) (potential recombinant chromosomes) that were genotyped to positionally clone target alleles: this ranged from only 416 gametes for the Pi-kh allele (Sharma et al. 2005) to ~20,000 gametes for the alleles Gn1a (Ashikari et al. 2005), qSH1 (Konishi et al. 2006), and Bph15 (Yang et al. 2004), an ~25-fold range. The average number of informative gametes genotyped was 5686; the median was 4200. The median target resolution (T) achieved was 44.5 kb or five genes. There were seven examples of single-gene resolution mapping (Table 1), and to achieve this resolution, the number of informative gametes employed ranged from 2800 to 26,000 (~10-fold range); the average was 11,593 gametes. Single-gene resolution mapping in a smaller genome, A. thaliana, has been much rarer (Dinka and Raizada 2006).
Several fine-mapping strategies were used successfully:
1. Of 41 studies, 11 groups reported isolation of a quantitative trait locus (QTL); to reduce the effects of minor QTL and/or to be able to employ a background with well-characterized molecular markers, the target QTL was isolated by limited backcrossing (BC) or full introgression (near isogenic line, NIL) into a new genetic background. In other examples (e.g., qSH1) (Konishi et al. 2006), the original QTL genome was used for mapping such that all but the target QTL was fixed (not segregating); to create heterozygosity in the region containing the target allele for mapping, a corresponding chromosome segment from a polymorphic genotype was crossed in [segment substitution line (SSL)] (Table 1).
2. Because outcrosses/testcrosses are challenging in rice, most studies involved selfing progeny, which has the potential of carrying informative crossover events on both diploid chromosomes, thus potentially doubling the effective number of informative gametes (N). One of the challenges created by selfing, however, for recessive alleles, is that it is not possible to determine whether a crossover occurred proximal or distal to the target without checking for the segregation pattern (progeny testing, PT) in the subsequent generation (e.g., F3) to distinguish all genotype combinations (aa, Aa, AA) at the target locus. Six groups progeny-tested to check the recessive genotype (e.g., chl1) (H. T. Zhang et al. 2006). Alternatively, to avoid F3 generation phenotyping, 15 groups (e.g., bc1) (Y. Li et al. 2003) preselected recessive (mutant) progeny by phenotyping and then genotyped only this subset, thus discarding 75% of all progeny.
3. There were 12 fully dominant alleles targeted; in these cases, as in recessive alleles, because the proximal vs. distal location of flanking crossovers could not be distinguished without distinguishing AA from Aa genotypes, researchers either progeny-tested in the subsequent generation (e.g., Pi-kh) (Sharma et al. 2005) or, cleverly, preselected only the recessive progeny class for genotyping (e.g., Xa1) (Yoshimura et al. 1998).
4. Finally, there were four examples [f5-DU, Rf-1, S32(t), S5n] where the target alleles were expressed in the haploid generation (e.g., pollen grain, embryo sac) and where the nature of the gene products often required generating outcross/testcross progeny for mapping. In the case of f5-DU (Wang et al. 2006), an allele that boosts pollen viability in specific hybrid genotypes, testcross progeny were used for mapping, since phenotyping required a hybrid background to check for segregation of viable pollen grains (either high or low). Similarly, to fine map the S5n locus (Qiu et al. 2005), which confers embryo sac viability to wide-cross hybrids, 8000 hybrids were generated by outcrossing a heterozygous NIL S5n/− parent (NIL F1) to a wide-cross tester; phenotyping was performed by measuring segregation of fertility of F2 embryo sacs on hybrid rice spikelets. In the case of S32(t) (Li et al. 2007), which also confers (post-meiotic, haploid) embryo sac viability, the segregation of embryo sac
TABLE 1
Analysis of published positional cloning and fine-mapping studies in rice (Oryza sativa)

Target allele | TIGR annotation | Inheritance: type | Inheritance: trait | Mapping strategy^a | Total progeny | Progeny genotyped (g) | Meiosis factor (f)^b | Informative gametes genotyped (N)^c | Candidate resolution (T)^d: kb | Candidate resolution (T)^d: genes^e | Reference
bc1 | LOC_Os03g30250 | Simple | Rec | F2-Rec | ~30,000 | 7,068 | 2 | 14,136 | 3.3 | 1 | Y. Li et al. (2003)
bel | LOC_Os03g55240 | Simple | Rec | F2-Rec | 987 | 231 | 2 | 462 | 110 | 18 | Pan et al. (2006)
Bph15^f | BAC 20M14/BAC 64O9 | Simple | Dom | RIL F2-All + PT | 9,472 | 9,472 | 2 | 18,944 | 47 | 11 | Yang et al. (2004)
chl1 | LOC_Os03g59640 | Simple | Rec | F2-All + PT | ~2,000 | 477 | 2 | 954 | 71.6 | 9 | H. T. Zhang et al. (2006)
chl9 | LOC_Os03g36540 | Simple | Rec | F2-All + PT | ~10,000 | 2,458 | 2 | 4,906 | 1500 | 137 | H. T. Zhang et al. (2006)
cpt1 | LOC_Os02g35970 | Simple | Rec | F2-Rec | 5,000 | 1,400 | 2 | 2,800 | 325 | 23 | Haga et al. (2005)
d11 | LOC_Os04g39430 | Simple | Rec | SSL F2-Rec | ~15,000 | 3,020 | 2 | 6,040 | 98 | 19 | Tanabe et al. (2005)
d2 | LOC_Os01g10040 | Simple | Rec | SSL F2-All + PT | 3,000 | 3,000 | 2 | 6,000 | 60 | 10 | Hong et al. (2003)
Dbs | LOC_Os01g33040 | Simple | Rec | F2-Rec | ~12,400 | 3,100 | 2 | 6,200 | 86 | 15 | Sazuka et al. (2005)
dgl1 | LOC_Os01g49000 | Simple | Rec | F2-Rec | ~4,600 | 1,150 | 2 | 2,300 | 44.5 | 5 | Komorisono et al. (2005)
Eui | LOC_Os05g40384 | Simple | Rec | NIL F2-Rec | 5,500 | 1,400 | 2 | 2,800 | 24 | 1 | Zhu et al. (2006)
eui1 | LOC_Os05g40384 | Simple | Rec | F2-Rec | ~10,000 | 2,623 | 2 | 5,246 | 30 | 3 | Luo et al. (2006)
f5-DU^f | PAC P0008A07 | QTL | Gamete^g | NIL hybrid testcross | 1,993 | 1,993 | 1 | 1,993 | 70 | 9 | Wang et al. (2006)
fon1 | LOC_Os06g50340 | Simple | Rec | F2-All | 2,419 | 2,419 | 1 | 2,419 | 150 | 10 | Suzaki et al. (2004)
fon4 | LOC_Os11g38270 | Simple | Rec | F2-Rec | ~8,400 | 2,100 | 2 | 4,200 | 450 | 83 | H. W. Chu et al. (2006)
gh2 | LOC_Os02g09490 | Simple | Rec | F2-Rec | 13,000 | 3,256 | 2 | 6,511 | 30 | 3 | K. W. Zhang et al. (2006)
gid1 | LOC_Os05g33730 | Simple | Rec | F2-Rec | ~7,200 | 1,800 | 1 | 3,600 | 38 | 4 | Ueguchi-Tanaka et al. (2005)
gl-3^f | BAC OSJN3b0074M06 | QTL | Rec | SSL BC4F2 Rec | 2,068 | 499 | 2 | 998 | 87.5 | 10 | Wan et al. (2006)
Gn1a | LOC_Os01g10110 | QTL | Additive | NIL BC F2-All + PT | ~13,000 | 13,000 | 2 | 26,000 | 6.3 | 1 | Ashikari et al. (2005)
Hd1 | LOC_Os06g16370 | QTL | Dom | BC3F3-Rec | 9,000 | 1,505 | 2 | 3,010 | 12 | 2 | Yano et al. (2000)
Hd6 | LOC_Os03g55389 | QTL | Dom | NIL BC3F2 PT | 2,807 | 2,807 | 2 | 5,614 | 26.4 | 1 | Takahashi et al. (2001)
Htd1 | LOC_Os04g46470 | Simple | Rec | F2-Rec | 20,000 | 4,600 | 2 | 9,200 | 30 | 6 | Zou et al. (2005, 2006)
Moc1 | LOC_Os06g40780 | Simple | Rec | F2-Rec | 2,010 | 2,010 | 2 | 4,020 | 20 | 2 | X. Li et al. (2003)
Pi36(t)^f | PAC P0443G08 | Simple | Dom | F2-Rec | 4,884 | 580 | 2 | 1,160 | 17 | 2 | Liu et al. (2005)
Pib | LOC_Os02g57310 | Simple | Dom | BC2F3/F4-Rec + PT | ~13,000 | 3,305 | 2 | 6,610 | 80 | 12 | Wang et al. (1999)
Pi-d2 | LOC_Os06g29810 | Simple | Dom | F2-Rec | 20,000 | 4,000 | 2 | 8,000 | 180 | 33 | Chen et al. (2006)
Pi-kh | LOC_Os11g42010 | Simple | Dom | F2-All + PT | 208 | 208 | 2 | 416 | 143.5 | 18 | Sharma et al. (2005)
pla1 | LOC_Os10g26340 | Simple | Rec | F2-Rec | 2,312 | 578 | 2 | 1,156 | 74 | 3 | Miyoshi et al. (2004)
Psr1 | LOC_Os01g25484 | QTL | Dom | NIL BC3F2 | 3,800 | 3,800 | 0.5 | 1,900 | 50.8 | 4 | Nishimura et al. (2005)
qSh1 | LOC_Os01g62920 | QTL | Dom | SSL BC4F2 + PT | 10,388 | 10,388 | 2 | 20,766 | 0.612 | 1 | Konishi et al. (2006)
qUvr10 | LOC_Os10g08580 | QTL | Additive | NIL F2-All + PT | 1,850 | 1,850 | 2 | 3,700 | 27 | 6 | Ueda et al. (2005)
Rf-1 | LOC_Os10g35436 | Simple | Gamete^g | NIL CMS testcross | 5,145 | 5,145 | 1 | 5,145 | 76 | 4 | Komori et al. (2004)
S32(t)^f | PAC AP005294 | QTL | Gamete^g | Het BC4F2 gametes | 1,050 | 1,050 | 2 | 2,100 | 64 | 7 | Li et al. (2007)
S5n^f | PAC P0021C04 | QTL | Gamete^g | NIL hybrid outcross | 8,000 | 8,000 | 1 | 8,000 | 40 | 5 | Qiu et al. (2005)
Skc1 | LOC_Os1g20160 | QTL | Dom | BC3F2-All + PT | 2,973 | 2,973 | 2 | 5,946 | 7.4 | 1 | Ren et al. (2005)
spl11 | LOC_Os12g38210 | Simple | Rec | F2-Rec; F3-All | ~3,000 | 2,143 | 2/0.5^h | 1,537 | 27 | 3 | Zeng et al. (2004)
spl7 | LOC_Os05g45410 | Simple | Rec | SSL F2-All + PT | 2,944 | 2,944 | 2 | 5,888 | 1 | 1 | Yamanouchi et al. (2002)
Xa1 | LOC_Os04g53120 | Simple | Dom | F3-Rec | 4,225 | 965 | 2 | 1,930 | 25 | 7 | Yoshimura et al. (1998)
xa13 | LOC_Os08g42350 | Simple | Rec | F2-All + PT | ~8,000 | 7,972 | 2 | 14,842 | 14.8 | 2 | Z. H. Chu et al. (2006)
Xa26 | LOC_Os11g47000 | Simple | Dom | F2/NIL-Rec | ~1,908 | 477 | 2 | 954 | 67.2 | 12 | Yang et al. (2003); Sun et al. (2004)
xa5 | LOC_Os05g01580 | Simple | Rec | F2-All + PT | 2,345 | 2,345 | 2 | 4,790 | 8.1 | 2 | Iyer and McCouch (2004)
Median | | | | | ~4,884 | 2,419 | | 4,200 | 44.5 | 5 |

QTL, quantitative trait locus; Rec, recessive; Dom, dominant; Gamete, gametophytic; RIL, recombinant inbred line; PT, progeny tested; SSL, chromosome/segment substitution line; het, heterozygous.
^a To distinguish whether a crossover occurred between the target allele and the proximal vs. distal molecular marker, the post-meiotic genotype of the target locus (AA, Aa, aa) must be discernible, as this determines the number of effective meioses that contribute to the final mapping of the target locus. For example, for a recessive trait, in the F2 generation, whereas a crossover can be detected by genotyping flanking molecular markers, because AA vs. Aa alleles cannot be distinguished, the proximal vs. distal location of the crossover cannot be assigned without phenotyping segregants (PT, progeny testing) in the F3 generation; without progeny testing, only ~1/4 (aa class) of recombinant progeny contribute to the final map assignment of the allele. The following genotyping and phenotyping strategies were employed: F2-Rec, only F2 recessive progeny genotyped; F2-All, all F2 progeny genotyped randomly; F2-All + PT, all F2 progeny genotyped, then F3 progeny phenotyped to distinguish AA from Aa alleles at the target locus in the F2 generation; RIL/NIL/SSL/BC, used backcrossing and/or recombinant inbred lines, near-isogenic lines, and/or segment substitution lines to distinguish the target QTL phenotype or to introgress the target allele into an appropriate chromosome background suitable for mapping; F1 or F2 gametes, since the target gene product is expressed in the haploid gametophyte (pollen or embryo sac), used the ratio of gametophyte phenotypes (e.g., pollen germination) of the spikelet, in combination with genotyping the plants carrying the spikelets, to distinguish AA/Aa/aa genotypes of the target allele.
^b The meiosis factor (f) is multiplied by the number of post-meiotic progeny genotyped to give the number of meioses that contributed to the final mapping resolution of the target allele. For example, if the F2 population was derived from selfing F1 parents, then each F2 plant represents two meioses (f = 2) when the precise F2 genotype was discerned by progeny testing (phenotyping segregants in the F3 generation). Without progeny testing, then only 1/4 recessive (aa) F2 progeny were considered to be useful, but since two meioses contributed to this class, f = (1/4) × 2 = 0.5.
^c Number of chromosomes or effective post-meiotic products that were genotyped (N), where N = g × f.
^d The final map resolution (T) is defined as the number of kilobases or genes in the interval between the most proximal and distal molecular marker flanking the target allele, in which at least one crossover was found. When this value was not published, it was estimated using the Version 4 TIGR rice pseudomolecules genome browser, and when possible confirmed by personal communication with the study authors.
^e Only non-transposon, non-retroelement gene models are included as defined by the Version 4 TIGR rice pseudomolecules annotation database.
^f Gene not yet isolated.
^g The allele acts in the haploid gamete-derived generation, either pollen or embryo sac. See results for details.
^h Two progeny populations, with two genotyping strategies, were used.
viability was measured in the spikelets of selfed F2 plants. Finally, in the case of Rf-1, a nuclear locus that restores male gamete (pollen) fertility by overcoming the effects of a mitochondrial [cytoplasmic male sterility (CMS)] gene, 5145 testcross F2 progeny (three-way cross: heterozygote restorer × non-restorer tester) were generated for mapping and the segregation of pollen viability scored (Komori et al. 2003, 2004).
Lessons from calculating empirical local recombination frequencies (R-local) and their use in validating predictive equations: To both validate the equations noted in this study and later understand any discrepancies between the experimental data and predictions based on the molecular marker map, we then calculated the experimental (local) recombination frequency (R-local) for each of the 41 successful fine-mapping studies in rice (see materials and methods) (Table 2). From each study, we counted the number of crossovers located between the closest two markers used to define the final map resolution (T); these are the first recombinants used to define the edges of the candidate target region. Although we expected to find only 1 crossover on each distal or proximal flank (2 total), in 32 of 41 examples we found between 3 and 16 total crossovers, due to hotspots of recombination and/or poor marker density; such redundant crossover targets suggested that an excess number of progeny were genotyped given the available marker density in the majority of rice positional cloning attempts, an important observation.
Since a high density of molecular markers and large progeny numbers are used in positional cloning, the R-local values provide an interesting snapshot into the variation in recombination frequency in the rice genome: we found that though the genome-wide average R was 277 kb/cM or 28.0 genes/cM in rice, locally, R-values ranged from 3.3 to 1344.2 genes/cM or 28.2 to 14,718 kb/cM, an ~400-fold and ~500-fold range, respectively. Strongly influenced by chance, such a wide range in recombination frequencies would largely explain the wide range in the number of progeny that were genotyped in rice (Table 1). The most hyper-recombinogenic region (3.3 genes/cM, 28.2 kb/cM) flanked the Pi36(t) allele (Liu et al. 2005), which required only 1160 informative gametes to achieve a map resolution of 17 kb or two candidate genes. The region with the least amount of recombination (1344.2 genes/cM or 14,718 kb/cM) encompassed the chl9 allele; in this study, although 4906 informative chromosomes were genotyped, the map resolution was 1500 kb or 137 genes (H. T. Zhang et al. 2006). These two groups define the extremes of good and bad "luck," respectively, in rice, and as such may set upper and lower map-population-size boundaries for future positional cloning attempts in this important species.
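As a worked check of the two extremes quoted above, the following minimal Python sketch applies the R-local definition given in the footnotes of Table 2 (R-local = T/m, with m = 100 × crossovers/N). The function name `r_local` is ours, introduced only for illustration; the input values are the T, crossover, and N entries listed for Pi36(t) and chl9 in Tables 1 and 2.

```python
def r_local(t, crossovers, n_gametes):
    """Local recombination frequency, R-local = T / m(local).

    t          -- map resolution T (kb or genes)
    crossovers -- total flanking crossovers (proximal + distal)
    n_gametes  -- informative gametes genotyped (N)
    """
    m_local = 100.0 * crossovers / n_gametes  # genetic distance in cM
    return t / m_local


# (T in kb, total crossovers, N) as listed in Tables 1 and 2
pi36 = r_local(t=17, crossovers=7, n_gametes=1160)    # Pi36(t)
chl9 = r_local(t=1500, crossovers=5, n_gametes=4906)  # chl9

print(f"Pi36(t): {pi36:.1f} kb/cM")
print(f"chl9:    {chl9:.1f} kb/cM")
print(f"fold range: {chl9 / pi36:.0f}")
```

Under these inputs the sketch should return values close to the 28.2 and 14,718 kb/cM reported in the text, which gives the roughly 500-fold range.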
We then compared the empirical number of gametes that were genotyped (N) in each study to the number predicted by both equations (see above) given only the variables T and R-local; this allowed us to first test the validity of the equations in rice and to modify the equations if necessary. The size of the mapping population (informative chromosomes) (N) predicted by the Durrett–Tanksley equation compared to the empirical data, for given T and R-local values (in kb/cM), is shown in Figure 2A; we found a strong positive correlation between the mapping size predicted by the Durrett–Tanksley equation and the experimental results (Spearman r = 0.85, P < 0.0001, n = 41). In at least 10 examples (10/41), however, in spite of using the actual recombination frequencies, we found that the Durrett–Tanksley equation overestimated the mapping population by at least twofold, which would have caused researchers to unnecessarily genotype thousands of extra progeny. The simpler, Single Crossover model appeared to be a slightly better predictor of the progeny mapping population size as shown in Figure 2B. Although this second equation predicted the mapping population N with a near-equivalent correlation as the Durrett–Tanksley equation (Spearman r = 0.86; P < 0.0001; n = 41), linear regression analysis of the two models (Figure 3, A and B) demonstrated that the single crossover equation came closer to a linear slope of m = 1 on an x–y scatter plot of predicted vs. experimental N values; in the case of the Durrett–Tanksley model, the best-fit line followed the equation y = 1.70x − 1323 (goodness of fit r² = 0.76, Sy.x = 5456), whereas for the single crossover equation, the best-fit line was y = 1.07x − 833 (r² = 0.76, Sy.x = 3426). Although one equation was slightly better than the other, these results demonstrate for the first time that (both) simple formulas, if based on accurate local recombination frequency values, can provide significant guidance in predicting the mapping population size in the majority of alleles targeted for positional cloning.
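The two unmodified equations compared above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' own code: the function names are ours, and the constant 4.744 is recovered numerically by solving the Durrett–Tanksley probability expression P = 1 − (1 + x)e^(−x), with x = NT/(100R), for P = 0.95 (the form given in the caption of Figure 3).

```python
import math


def dt_constant(p=0.95):
    """Solve 1 - (1 + x) * exp(-x) = p for x by bisection, x = N*T/(100*R)."""
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - (1 + mid) * math.exp(-mid) < p:
            lo = mid  # probability still too low: need a larger x
        else:
            hi = mid
    return (lo + hi) / 2


def n_durrett_tanksley(t, r, p=0.95):
    """N = x_p * 100R / T, where x_p is about 4.744 when p = 0.95."""
    return dt_constant(p) * 100 * r / t


def n_single_crossover(t, r, p=0.95):
    """N = log(1 - P) / log(1 - T / (100R)); requires T < 100R."""
    return math.log(1 - p) / math.log(1 - t / (100 * r))


# Example inputs: bel, T = 110 kb and R-local = 169.4 kb/cM (Tables 1 and 2)
print(round(dt_constant(), 3))
print(round(n_durrett_tanksley(110, 169.4)))
print(round(n_single_crossover(110, 169.4)))
```

For bel, whose empirical N was 462, the Single Crossover sketch lands near the published value while the Durrett–Tanksley sketch comes out considerably higher, consistent with the overestimation described above for cases with few flanking crossovers.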
Fine-tuning of the equations based on empirical studies: We then wondered if we could fine-tune both predictive models. We noticed that the Durrett–Tanksley equation overestimated the number of progeny needed when the experimental number of crossovers found in distance T was low (<5 total); when the number of crossovers found was high (>5), this equation underestimated the number of progeny required (Figure 2A; Table 2). In the latter cases, it appeared as if T was limited by the local density of molecular markers; given this low density, the published studies appear to have "over-genotyped" the progeny population. Restated, when many crossovers were found within the interval T (final map resolution), then the actual candidate distance (in kilobases) might have been smaller (higher map resolution) had more molecular markers been available in the vicinity. By plotting the ratio N model/N empirical relative to the number of crossovers (lT) (where l = l1 + l2) (Table 2) on a scatter plot, we found that there was an inverse power relationship between the two variables such that N model/N empirical = 4.744/lT. Therefore, we adjusted T by multiplying it by 4.744/lT, where lT is the total number of crossovers in this region. Accordingly, we also redefined T as T-marker to note that marker density often rate-limits the physical resolution. The resulting modified Durrett–Tanksley equation is

$$N = \frac{4.744 \times 100R}{T\text{-marker} \times (4.744/l_T)},$$

or simplified,

$$N = \frac{100R \times l_T}{T\text{-marker}},$$

where N is total number of informative chromosomes that must be genotyped with the probability of success set at P = 0.95, R is the local recombination frequency (R-local), T-marker is distance between the closest two molecular markers (in which crossovers are detected relative to the target allele), and lT is number of crossovers between the closest two molecular markers (≥2). This is a rewritten version of the standard map distance calculation: m = 100 × recombinants/progeny for a testcross, assuming no double crossovers (Haldane 1919).

TABLE 2
Calculation of local recombination frequencies (R-local) in chromosome intervals containing target loci

Columns 2–5: R (local) as defined by positional cloning studies. Columns 6–11: regional R (map) as determined by anchoring the locus to the rice RGP 1391-marker map^a.

Target allele | Proximal (l1) crossovers | Distal (l2) crossovers | R (local)^b (kb/cM) | R (local)^b (genes/cM) | Chromosome | Physical interval (bp)^c | Proximal marker | Distal marker | R (map)^d (kb/cM) | R (map)^d (genes/cM)
bc1 | 5 | 2 | 66.6 | 20.2 | 3 | 17215334–17217366 | C12819 | E61171 | 403.6 | 46.8
bel | 1 | 2 | 169.4 | 27.7 | 3 | 31379697–31385021 | C0012 | S2722 | 140.2 | 21.4
Bph15 | 2 | 3 | 1,780.7 | 416.8 | 4 | 6882835–7231621 | C0708 | R0288 | 1460.4 | 125.4
chl1 | 1 | 2 | 227.7 | 28.6 | 3 | 33894227–33900896 | E50580 | C1484 | 369.9 | 53.8
chl9 | 2 | 3 | 14,718.0 | 1344.2 | 3 | 20204631–20202305 | E61340 | S2992 | 6028.8 | 512.0
cpt1 | 3 | 10 | 700.0 | 49.5 | 2 | 21602296–1927256 | E3634 | E50850 | 175.3 | 20.7
d11 | 5 | 7 | 493.3 | 95.6 | 4 | 23252491–23248070 | S10702 | R0278 | 272.7 | 36.3
d2 | 1 | 1 | 1,800.0 | 300.0 | 1 | 5233425–5241321 | E60110 | C12072 | 250.7 | 31.3
dbs | 2 | 5 | 761.7 | 132.9 | 1 | 18477785–18484690 | C0585 | E60808A | 2944.1 | 316.7
dgl1 | 1 | 3 | 255.9 | 28.8 | 1 | 28463215–28470187 | R1012 | C0922 | 169.3 | 21.3
eui | 1 | 2 | 224.0 | 9.3 | 5 | 23645362–23655166 | S14889 | E30003 | 127.0 | 18.7
eui1 | 1 | 2 | 524.6 | 52.5 | 5 | 23645362–23655166 | S14889 | E30003 | 127.0 | 18.7
f5-DU | 13 | 2 | 93.0 | 12.0 | 5 | 1278649–1379421 | R0830 | S16113 | 106.7 | 11.9
fon1 | 3 | 2 | 725.7 | 48.4 | 6 | 30473475–30469084 | R1479 | E10139 | 185.5 | 26.3
fon4 | 1 | 10 | 1,718.2 | 316.9 | 11 | 22163672–22162134 | E4336 | C0050 | 2171.6 | 236.7
gh2 | 1^e | 1^e | 976.7 | 97.7 | 2 | 4874079–4870146 | S1511 | S13446 | 177.5 | 23.9
gid1 | 1 | 1 | 684.0 | 72.0 | 5 | 19785334–19788132 | S2689 | C1268 | 252.2 | 28.3
gl-3 | 3 | 3 | 145.5 | 16.6 | 3 | 16469400–16556900 | E0323 | C60318A | 128.5 | 12.5
Gn1a | 1 | 1 | 819.0 | 130.0 | 1 | 5272324–5267274 | E60110 | C12072 | 250.7 | 31.3
hd1 | 1 | 2 | 120.4 | 20.1 | 6 | 9335377–9337570 | C0235 | R10069 | 257.9 | 22.4
Hd6 | 4 | 3 | 211.7 | 8.0 | 3 | 31453402–31447755 | C0012 | S2722 | 140.2 | 21.4
htd1 | 1 | 1 | 1,380.0 | 276.0 | 4 | 27348729–27351393 | C11378 | E51196 | 283.7 | 40.7
moc1 | 2 | 2 | 201.0 | 20.1 | 6 | 24310311–24315385 | S10324 | R1559 | 151.8 | 18.7
Pi36(t) | 2 | 5 | 28.2 | 3.3 | 8 | 2762977–2888909 | S4144B | C60293 | 139.4 | 12.5
Pib | 1 | 3 | 1,322.0 | 198.3 | 2 | 35112900–35107768 | C0379 | S10020 | 98.3 | 12.9
Pi-d2 | 1 | 3 | 3,600.0 | 660.0 | 6 | 17159337–17163823 | C0058 | S20510 | 9503.8 | 840.0
Pi-kh | 6 | 4 | 59.7 | 7.5 | 11 | 24761902–24762922 | R10829 | S5730 | 198.6 | 22.3
pla1 | 1^e | 1^e | 427.7 | 17.3 | 10 | 13329113–13327360 | C0961 | R1738A | 135.1 | 13.7
Psr1 | 2 | 1 | 321.7 | 25.3 | 1 | 14429394–14435943 | R2501 | C0045 | 428.0 | 39.1
qSh1 | 1 | 2 | 42.4 | 69.2 | 1 | 36776800–36772211 | R0578 | E20351 | 1145.6 | 131.3
qUvr10 | 2 | 1 | 333.0 | 74.0 | 10 | 4433064–4429249 | E30589B | E1064 | 291.7 | 28.2
Rf-1 | 2 | 1 | 1,303.4 | 68.6 | 10 | 18606065–18624126 | S11841 | S21174 | 318.5 | 48.2
S32(t) | 4 | 2 | 224.0 | 24.5 | 2 | 2109653–2227836 | S20351 | R10479 | 201.1 | 22.6
S5n | 1 | 2 | 1,066.7 | 133.3 | 6 | 5606631–5758152 | R1954 | R2349 | 70.7 | 10.0
Skc1 | 1 | 1 | 220.0 | 29.7 | 1 | 11460220–11455734 | E4175 | E10119 | 190.4 | 18.9
spl11 | 2 | 1 | 138.3 | 15.4 | 12 | 23439842–23434652 | E61976 | S5375 | 95.2 | 13.3
spl7 | 1 | 1 | 29.4 | 29.4 | 5 | 26261154–26263625 | C0466 | C11368 | 222.2 | 43.6
Xa1 | 3 | 2 | 96.5 | 27.0 | 4 | 31419036–31425732 | C52068 | S1544 | 298.3 | 45.0
xa13 | 3 | 2 | 439.3 | 59.4 | 8 | 26595951–26593109 | R3961A | R0639 | 109.1 | 16.0
Xa26 | 2 | 14 | 40.1 | 7.2 | 11 | 27688448–27692198 | R1506 | R3342 | 383.4 | 32.5
Xa5 | 1 | 1 | 194.0 | 47.9 | 5 | 323538–325468 | C12015 | C0568 | 111.9 | 13.1

For comparison, the genome-wide average R (avg) = 277 kb/cM or 28.0 genes/cM.
^a Rice Genome Project (RGP) RFLP map based on an F2 population between Nipponbare and Kasalath (http://rgp.dna.affrc.go.jp/) (Harushima et al. 1998).
^b The local recombination frequency was calculated as follows: R-local = T(local)/m(local), where T(local) is number of kilobases or genes between the closest flanking molecular markers with at least one crossover between each marker and the target locus, and m(local) is genetic distance between these two markers in centimorgans. The value for T(local) is reported in Table 1. The genetic map distance, m(local) = 100 × (l1 + l2)/N, where l1 is number of proximal crossovers (Table 2), l2 is number of distal crossovers (Table 2), and N is total number of informative chromosomes genotyped (Table 1). In a testcross, m = 100 × l/progeny (hence m = 100 × recombinants/progeny), whereas in a selfed cross with progeny testing, m = 100 × l/(2 × progeny) since genotyping permits both chromosomes to contribute to the mapping population.
^c This is the precise chromosome physical location of the target locus or candidate region on the rice physical map (Version 4.0 TIGR rice pseudomolecules).
^d R(map) = T(map)/m(map), where T(map) is number of kilobases or genes between the RGP proximal and distal markers containing the target locus as listed in Table 2, and m(map) is genetic map distance between these two RGP markers in centimorgans. These values can be found in supplemental Table 1 at http://www.genetics.org/supplemental/.
^e Number of recombinants not reported, so set at l1 = 1 and l2 = 1.
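To make the modified Durrett–Tanksley equation concrete, here is a minimal Python sketch; the function name `n_modified_dt` is hypothetical, and the inputs are the R-local (kb/cM), T (kb), and total crossover values listed for three alleles in Tables 1 and 2. Because R-local in Table 2 is itself computed from the same N, T, and crossover counts, the round trip is expected to recover the empirical N up to rounding.

```python
def n_modified_dt(r, t_marker, crossovers):
    """Modified Durrett-Tanksley estimate: N = 100 * R * lT / T-marker."""
    return 100.0 * r * crossovers / t_marker


# allele: (R-local in kb/cM, T-marker in kb, total crossovers lT, empirical N)
rows = {
    "bel": (169.4, 110, 3, 462),
    "Pi36(t)": (28.2, 17, 7, 1160),
    "chl9": (14718.0, 1500, 5, 4906),
}

for allele, (r, t, l_total, n_empirical) in rows.items():
    predicted = n_modified_dt(r, t, l_total)
    print(f"{allele}: predicted {predicted:.0f}, empirical {n_empirical}")
```

A prospective user would not know lT in advance; in that setting lT acts as a chosen target (at least 2), and N scales linearly with it.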
Figure 2. Testing the validity of two mathematical equations as predictors of the size of the progeny mapping population (N) required to positionally clone target alleles using rice as a model system. We compared N values predicted by the Durrett–Tanksley equation (A) and the Single Crossover model equation (B) to 41 published, empirical studies (shown in Table 1). In both graphs, the target alleles are shown on the x-axis; solid histograms denote the kilobase map resolution achieved, and the solid graphed line is the number of informative post-meiotic gametes (N) genotyped, as calculated in Table 1; the spotted line is the number of informative gametes predicted. When the probability of success is set at 95%, then the Durrett–Tanksley equation is simplified such that N = (4.744 × 100R)/T, where R is relevant meiotic recombination frequency and T is final map resolution achieved, notably the distance between the closest distal and proximal molecular markers that are subject to at least one crossover between the marker and the target trait in the progeny population. The single crossover model predicts that N = Log (1 − P)/Log (1 − T/100R). For both equations, we employed the published, empirical local recombination frequency (R-local) as shown in Table 2, and hence these graphs represent the upper limit of prediction possible by the equations. For the graphs above, we set the probability of success at 95% (P = 0.95).
We then compared the predictions of the modified Durrett–Tanksley equation, using R-local values (Table 2), to the published mapping size population values (N); as shown in Figure 3C, the modified equation was 100% predictive (y = 1.0x, r² = 1.0, F = 0). Using a similar approach, we also modified the Single Crossover equation. By plotting the ratio N model/N empirical relative to the number of crossovers (lT) (where lT = l1 + l2) (Table 2) on a scatter plot, we found that there was an inverse power relationship between the two variables such that N model/N empirical ≈ 3/lT. Therefore, we modified the genetic map resolution T by the number of crossovers, resulting in the following modified Single Crossover equation:

$$N = \frac{\log(1 - P)}{\log\{1 - [T\text{-marker} \times (3/l_T)]/100R\}}.$$

As shown in Figure 3D, again the modified equation was close to 100% predictive of the empirical results (y = 1.0x − 1.5, r² = 1.0).
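The modified Single Crossover equation can be sketched the same way. The function name below is ours, and the example inputs are the Pi-d2 values from Tables 1 and 2 (R-local = 3,600 kb/cM, T-marker = 180 kb, four flanking crossovers, empirical N = 8,000); treat it as an illustration of the formula rather than a reference implementation.

```python
import math


def n_modified_single_crossover(r, t_marker, crossovers, p=0.95):
    """N = log(1 - P) / log(1 - [T-marker * (3 / lT)] / (100 * R))."""
    t_adjusted = t_marker * (3.0 / crossovers)  # rescale T by 3/lT
    return math.log(1 - p) / math.log(1 - t_adjusted / (100 * r))


# Pi-d2: R-local = 3600 kb/cM, T-marker = 180 kb, lT = 1 + 3 crossovers
n_pid2 = n_modified_single_crossover(r=3600.0, t_marker=180, crossovers=4)
print(f"Pi-d2: predicted N = {n_pid2:.0f} (empirical N = 8000)")
```

The ratio of logarithms is independent of the base, so natural logarithms are used here for convenience.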
These modified equations offer some advantages for researchers: these equations define probability explicitly as the number of crossovers (informative gametes) that a researcher can expect to achieve for a given progeny population. A researcher is taking more of a risk if the goal is to achieve only two informative gametes, each carrying a crossover on either side of the target allele (lT = 2), compared to if the target is five informative gametes. These equations also make it explicit that the density of available molecular markers in the target region is critical: if there are few available molecular markers, a researcher does not achieve better resolution by increasing the number of progeny genotyped (N) beyond a certain threshold. We suggest that users of this equation who wish to predict N should select T based on a realistic density of achievable molecular markers in the vicinity of the target allele, and adjust lT according to their own risk assessment. For example, if obtaining only two informative recombinant gametes is too risky, N should be increased.
Predictive value of the equations using recombination frequencies derived from a MRFM: In the analysis above, we validated both Durrett–Tanksley equations and the Single Crossover equations using published high-resolution, local recombination frequencies (R-local) derived from already fine-mapped alleles. Our goal was to predict the progeny mapping population (N informative gametes) in advance, however, whereas R-local data are not available until the conclusion of a positional cloning attempt. Previous a priori mapping population estimates only used the genome-wide average recombination frequency (R-avg) (Durrett et al. 2002), but as we have confirmed (Table 2) and as many others have noted (Wu et al. 2003; Crawford et al. 2004; McVean et al. 2004), recombination frequencies vary tremendously along any chromosome. Therefore, we wondered if we could more accurately predict N in advance by employing regional meiotic recombination frequencies from a high-density molecular marker map (R-map). To accomplish this, we first developed a MRFM for 1400 marker intervals in rice, based on the Rice Genome Project (RGP) F2 [Nipponbare (Japonica) × Kasalath (Indica)] RFLP map (Harushima et al. 1998). Mean R-map values were 33.5 genes/cM and 294 kb/cM, similar to calculations of the whole-genome average recombination frequency (R-avg) for rice (28 genes/cM and 277 kb/cM). The entire R-map data set is located in supplemental Table 1 (http://www.genetics.org/supplemental/) and it should serve as a useful reference for future positional cloning studies in rice.

Figure 3. Linear regression analysis to validate and determine which mathematical models predict mapping population size during positional cloning attempts in rice. In each case, the y-axis is the number of gametes predicted by each model, and the x-axis is the published, empirical number of informative gametes genotyped. (A) Linear regression analysis of the Durrett–Tanksley equation, based on the calculation P = 1 − [1 + NT/(100R)] e^(−NT/(100R)), where P is threshold probability of success, N is the number of meiotic gametes (chromosomes) that must be genotyped in which it can be determined whether a crossover is located proximal or distal to the target allele, T is expected distance between flanking molecular markers (kilobases or candidate genes), and R is local recombination frequency (kb/cM or genes/cM). The equation was simplified by setting the probability of success at P = 0.95, resulting in N = (4.744 × 100R)/T. (B) Linear regression analysis of the Single Crossover equation, where N = Log (1 − P)/Log (1 − T-marker/100R). (C) Linear regression analysis of the modified Durrett–Tanksley equation, calculated as N = (100R × lT)/T-marker, where lT is number of crossovers between the closest two molecular markers (≥2). (D) Linear regression analysis of the modified Single Crossover equation, calculated as N = Log (1 − P)/Log {1 − [T-marker(3/lT)]/100R}. For each model, experimentally derived R-local values were used from Table 2, and hence these graphs represent the upper accuracy limit of the equations, as typically such high-resolution frequencies are not available before a positional cloning experiment. The results demonstrate that the Single Crossover model is moderately better predictive of the mapping population size compared to the Durrett–Tanksley equation, but both models become accurate when the equations are adjusted for the number of gametes carrying crossovers immediately flanking the target locus.
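The construction of a recombination frequency map can be illustrated with a short Python sketch that converts consecutive marker positions into per-interval kb/cM values, following the definition R(map) = T(map)/m(map). The marker names and coordinates below are invented for illustration and are not taken from the RGP map; the handling of intervals with zero genetic distance (returned as `None`) is likewise our assumption.

```python
def interval_rates(markers):
    """Return (proximal, distal, kb/cM) for each pair of adjacent markers.

    markers -- list of (name, genetic position in cM, physical position in bp),
               sorted along the chromosome.
    """
    rates = []
    for (name_a, cm_a, bp_a), (name_b, cm_b, bp_b) in zip(markers, markers[1:]):
        d_cm = cm_b - cm_a
        d_kb = (bp_b - bp_a) / 1000.0
        # No recombinants between co-segregating markers: rate is undefined.
        rate = d_kb / d_cm if d_cm > 0 else None
        rates.append((name_a, name_b, rate))
    return rates


# Hypothetical markers (name, cM, bp); values are illustrative only.
markers = [
    ("M1", 10.0, 1_000_000),
    ("M2", 12.0, 1_500_000),
    ("M3", 12.0, 1_700_000),
    ("M4", 13.0, 2_100_000),
]

for proximal, distal, rate in interval_rates(markers):
    print(proximal, distal, rate)
```

The same loop applies to genes/cM if the physical distance is replaced by a count of annotated gene models in each interval.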
Next, in silico, we mapped each cloned allele onto a physical and genetic interval on this map as shown in Table 2 (see materials and methods). We then used the corresponding "neighborhood" recombination frequencies (R-map) to calculate mapping population sizes (N). As shown in Figure 4, we found that there was a modest but significant improvement in predicting the number of informative gametes (N) required to be genotyped when recombination frequencies (calculated as kilobases/cM) were based on rice RGP R-map values; as we suspected, we found that there was not a significant correlation between the empirical mapping size (N) vs. mapping sizes predicted by either of the two (unmodified) equations when the R-avg value was used (Spearman r = 0.30, P = 0.0547, n = 41) (Figure 4, A and D). In contrast, the correlation was significant when R-map values were used (Spearman r = 0.46, P = 0.0022, n = 41) (Figure 4, B and E) and this correlation increased even further when several outliers were removed (Spearman r = 0.61, P < 0.0001, n = 36) (Figure 4, C and F). Surprisingly, however, the correlation did not improve even further when the modified equations were used that took into account the number of immediate crossovers (lT) (for R-map, Spearman r = 0.35, P = 0.0232, considered significant); however, the correlation was still a significant improvement over when the R-avg value was used in conjunction with the modified equations (Spearman r = 0.21, P = 0.19, n = 41, not significant; data not shown). We conclude that mapping size predictions based on neighborhood (>280-kb segments) recombination frequencies (in kilobases/cM) better predict the number of progeny required to be genotyped to positionally clone a gene than predictions based on using the genome-wide average recombination frequency.
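The comparison described above can be sketched as follows: predict N with the unmodified Durrett–Tanksley equation using either R-avg or R-map, then rank-correlate each set of predictions with the empirical N. The Spearman helper is a plain-Python implementation written for this illustration, and the six alleles are an arbitrary subset of Tables 1 and 2, so the coefficients it prints are not expected to match the 41-study values reported in the text.

```python
def ranks(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


R_AVG = 277.0  # genome-wide average, kb/cM
# allele: (T in kb, R-map in kb/cM, empirical N), from Tables 1 and 2
subset = {
    "bel": (110, 140.2, 462),
    "Pi-d2": (180, 9503.8, 8000),
    "Pi36(t)": (17, 139.4, 1160),
    "chl9": (1500, 6028.8, 4906),
    "Gn1a": (6.3, 250.7, 26000),
    "moc1": (20, 151.8, 4020),
}

empirical = [n for _, _, n in subset.values()]
pred_avg = [4.744 * 100 * R_AVG / t for t, _, _ in subset.values()]
pred_map = [4.744 * 100 * r_map / t for t, r_map, _ in subset.values()]

print("R-avg:", round(spearman(pred_avg, empirical), 2))
print("R-map:", round(spearman(pred_map, empirical), 2))
```

With a constant R-avg the predicted ranking depends only on T, which is one way to see why a region-specific R can add information.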
The effect of using R-map recombination frequencies calculated as kb/cM vs. genes/cM: Although use of R-map values better predicted the size of the progeny mapping population compared to the genome-wide average recombination frequency, we were disappointed that the improvement was not more significant. In order to understand the reason, we asked to what extent R-map values calculated as kilobases/cM (from the rice RGP 1400-marker map) in fact correlated with the R-local values that we extracted from the 41 published studies. As shown in Figure 5A, the correlation was in fact poor (Spearman r = 0.23, P = 0.1428, considered not significant); of course, there was no correlation when R-local was compared to R-avg, so the R-map (kb/cM) values were still useful.

Figure 4. Modest improvement in predicting the size of the progeny mapping population (informative gametes) required to be genotyped during positional cloning when using neighborhood recombination frequencies extracted from a reference genetic map (R-map) compared to the whole genome average (R-avg) using R-values based on kilobase/cM calculations. On the x-axis is the mapping population size from published positional cloning studies in rice (see Table 1). On the y-axis is the prediction. (A–C) Models based on the Durrett–Tanksley equation (unmodified). (D–F) Models based on the Single Crossover equation (unmodified). R-map values were calculated from the 1400-marker Rice Genome Project (RGP) RFLP map (F2 of Nipponbare × Kasalath cross) (see materials and methods). In C and F, five outliers were removed in comparison to B and E, respectively. Both equations set the probability of success at 95%.
However, we then asked whether the correlation improved when R-map was calculated as genes/cM instead of kb/cM. Limited evidence (Fu et al. 2001) suggested that the crossovers contributing to R-map values might primarily be occurring in and around genes. In fact, as shown in Figure 5B, we found a significantly improved correlation between R-map values calculated as genes/cM to R-local values also calculated as genes/cM (Spearman r = 0.48, P = 0.0016).
Therefore, we retested whether we could better predict progeny mapping population sizes (N) when using rice RGP R-map values calculated as genes/cM rather than kilobases/cM. Using R-map (genes/cM) calculations shown in Table 2, Figure 6 demonstrates that indeed the map population (N) predicted by both the (unmodified) Durrett–Tanksley equation and the (unmodified) Single Crossover equation based on R-map (genes/cM) values better predicted the published results over the genome-wide R-avg (28 genes/cM) or R-map values based on kb/cM (Figure 6 vs. Figure 4). In fact, with three outliers removed, the correlation between the progeny size predictions based on R-map vs. the published data was extremely significant (Spearman r = 0.67, P < 0.0001, n = 38) (Figure 6, C and F). Although the predictions did not improve further when the modified equations were used (for R-map, Spearman r = 0.38, P = 0.0151, considered significant), the predictions were significantly better than when the R-avg value was used in conjunction with the modified equations (Spearman r = 0.05, P = 0.7662, n = 41, not significant; data not shown). We conclude that mapping size predictions based on neighborhood (>280-kb segments) recombination frequencies (R-map) better predict the number of progeny required to be genotyped for positional gene cloning in rice when R-values are calculated as genes/cM rather than kilobases/cM, and both are significant improvements over calculations based on the genome-wide R-avg.
The limiting factor is that R-map values often do not reflect R-local frequencies, but when they do the progeny mapping size can be accurately predicted: As calculated in Table 2 and shown in Figure 7A, the limiting factor is that the neighborhood recombination frequency often does not reflect the local recombination frequency, even though it is more reflective of local rates of recombination than the genome-wide average. The situation may or may not be better for other maps in other species, particularly as more robust, higher-resolution maps are constructed. Indeed, the rice map gave us hope for the future; in spite of the problems with our use of this map (see discussion) as shown in Figure 7A, we found 11 examples where the R-map values (calculated as genes/cM) were <30% different from the corresponding R-local value. These corresponded to the following loci: f5-DU, spl11, gl-3, pla1, hd1, moc1, S32(t), bel, dl1, fon4, and Pi-d2. When the mapping population size (N) was calculated for only these 11 alleles, shown in Figure 7, B–E, linear regression analysis showed that both the modified Durrett–Tanksley equation as well as the modified Single Crossover equation very accurately predicted the mapping population size (N) using recombination frequency (R-map) values from the RGP map: the best-fit lines were linear (m = 1.2) and the predictions matched the best-fit lines with very high r² values (0.95–0.98). Similar results were obtained for 10 examples where R-map values, calculated as kb/cM, were used; in that case, the predictions matched the best-fit line also with an r² value of 0.98 (y = 0.8x − 590; data not shown).
Figure 5. Meiotic recombination frequencies (R-map) extracted from the Rice Genome Project (RGP) RFLP map better correlate with local frequencies (R-local) from published positional cloning studies when calculated as genes/cM rather than kilobases/cM. (A) Linear regression analysis of R-map vs. R-local values using kilobase/cM ratios. (B) Linear regression analysis using genes/cM ratios. The correlation between R-map vs. R-local will have to be calculated empirically for each map and each species to determine if the methodology described in this study can be employed.
The utility of our approach was best demonstrated by comparing the data for bel (Pan et al. 2006) vs. Pi-d2 (Chen et al. 2006); empirically, only 462 informative gametes (N) were genotyped to fine map bel to a map resolution (T) of 18 genes; in contrast, 8000 informative gametes were required to fine map Pi-d2 to a map resolution of 33 genes. The RGP map correctly predicted that the recombination frequency (R-local) flanking Pi-d2 was ~20-fold lower than that flanking bel. As a result, both modified equations would have predicted in advance that mapping bel to this resolution would require ~360 gametes, and that Pi-d2 would require ~10,000 gametes. If such accurate predictions could be made across the majority of target loci in the future, then researchers will be able to generate appropriately sized mapping populations and properly allocate human, growth room, and financial resources.
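To make the dependence of N on the recombination frequency concrete, the following Python sketch uses a textbook "at least one crossover" calculation. It is an assumption-laden stand-in, not the modified equations used in this study: it treats the recombination fraction across the target interval as T/(100 R), ignores the meiosis factor (f) and the crossover modifier, and the function name and example values are hypothetical.

```python
import math


def gametes_required(target_genes: float, genes_per_cm: float, p_success: float = 0.95) -> int:
    """Gametes needed to see at least one crossover in the target interval.

    target_genes : desired map resolution T (number of candidate genes)
    genes_per_cm : recombination frequency R in genes/cM (a larger value
                   means less recombination per gene)
    p_success    : probability of recovering at least one crossover
    """
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0 and 1")
    # Genetic length of the target interval in cM, converted to a
    # per-gamete crossover probability (valid for short intervals only).
    crossover_prob = (target_genes / genes_per_cm) / 100.0
    if not 0.0 < crossover_prob < 1.0:
        raise ValueError("interval too large or parameters not positive")
    # Solve 1 - (1 - p)^N >= p_success for the smallest integer N.
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - crossover_prob))


# Hypothetical values: the same 10-gene target in a region of average
# recombination vs. a region with 20-fold more genes per cM.
n_average = gametes_required(target_genes=10, genes_per_cm=10.0)
n_suppressed = gametes_required(target_genes=10, genes_per_cm=200.0)
```

Under this simplified model, a 20-fold drop in recombination translates into a roughly 20-fold larger mapping population for the same resolution, which is the qualitative behavior described for bel vs. Pi-d2.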
DISCUSSION
A key frustration during positional gene cloning, also known as map-based cloning, has been that the size of the mapping population has been found to vary >25-fold within a species (Dinka and Raizada 2006) (Table 1) depending on the target locus, and that this final size has been difficult to predict. As a result, researchers often undertake positional cloning attempts with some fear. More importantly, it has been difficult to estimate the time, resources, growth space, and personnel required to generate, propagate, genotype, and phenotype an appropriately sized progeny population. The goal of this research was to create a detailed methodology to improve mapping size predictability across eukaryotic species once researchers have initially mapped a target locus to a small interval (1–2 cM). As a side benefit, we have provided a detailed review of positional cloning strategies and results in rice, which should be useful information for the research community studying rice, the world’s most important crop. Building upon the work of Durrett et al. (2002), we have demonstrated the utility of a formula (the Durrett–Tanksley equation) that predicts progeny population size N (Figure 2). By further fine-tuning the Durrett–Tanksley equation, taking into account how many (redundant) crossovers defined the map resolution T (a measure of the local marker density), we were able to predict the size of the mapping population with 100% accuracy when provided with local, high-resolution recombination frequencies (Figure 3). We also derived and tested a simpler, more user-friendly equation, based on the probability of achieving only one crossover within the progeny population, instead of the two calculated by the Durrett–Tanksley equation. We found that the Single Crossover model was as predictive as the Durrett–Tanksley equation, and that the number of crossovers (l) was again a useful equation modifier (Figures 2 and 3). With validated equations in hand, and because researchers rarely have access to robust recombination frequencies in the vicinity of their target allele, we measured whether recombination frequencies derived from a 1400-marker reference genetic map (supplemental Table 1 at http://www.genetics.org/supplemental/) could be useful, and indeed the mapping population size was more accurately predicted when these values were used instead of the genome-wide average recombination frequency (Figures 4 and 6). Since researchers targeting a fully sequenced genome care more about how many candidate genes they must distinguish, not the number of kilobases per se, we also determined that the models could predict gene resolution as well as or better than the kilobase resolution (Figures 5 and 6). Although the rice map, in conjunction with our formulas, could have accurately predicted several target alleles requiring unusually large or small mapping populations, including alleles located near centromeres suffering from suppressed meiotic recombination (e.g., chl9, Pi-d2, and Bph15), we found that the limiting factor was the correlation between R-map vs. R-local recombination frequencies (Table 2, Figure 7).
Figure 6. More significant improvement in predicting the size of the progeny mapping population (informative gametes) required to be genotyped during positional cloning in rice when employing neighborhood recombination frequencies (R-map) from the RGP map calculated as genes/cM rather than kb/cM. On the x-axis is the mapping population size from published positional cloning studies in rice (see Table 1). On the y-axis is the prediction. (A–C) Models based on the Durrett–Tanksley equation (unmodified). (D–F) Models based on the Single Crossover equation (unmodified). R-map values were calculated from the 1400-marker Rice Genome Project (RGP) RFLP map (F2 of Nipponbare × Kasalath cross) (see Materials and Methods). In C and F, three outliers were removed in comparison to B and E, respectively. Both equations set the probability of success at 95%. A significant improvement over use of the R-avg frequency (A and D) is demonstrated.
Understanding R-map vs. R-local discrepancies: There are likely several reasons why recombination frequencies from a reference genetic map (R-map) in rice often did not match the frequency in the vicinity of target alleles (R-local), and these are important lessons for future attempts to predict mapping population size. First and most obvious, even within a >280-kb interval (~1 cM average), the rice RGP map demonstrated that the meiotic recombination frequency could vary significantly (Wu et al. 2003) (supplemental Table 1 at http://www.genetics.org/supplemental/). Second, as is the case with many whole-genome genetic maps, only small numbers of progeny (typically 100–200) were genotyped to generate the RGP map (Harushima et al. 1998); as a result, the location of rare crossovers was more subject to chance. In other words, had the RGP map been generated multiple times using independent populations, the recombination frequencies would likely have varied significantly within 1–2-cM intervals. Third, whereas the RGP map was based on two parental genotypes, the rice Indica variety (Kasalath) and the Japonica variety (Nipponbare) (Harushima et al. 1998), only 8 of the 41 studies that we compared our models to also used these genotypes to generate their mapping populations. Differences between genotypes, such as the density of repetitive DNA or local cytogenetic rearrangements as seen in maize (Bennetzen and Ramakrishna 2002; Wang and Dooner 2006), might have caused R-map values from the RGP map to differ from the published studies. Indeed, it has been shown that domesticated rice cultivars have an unusually high rate of ongoing gene duplications, vary considerably in the location and density of repetitive DNA (e.g., retroelements), and have very high rates of intergenic nucleotide polymorphisms (SNPs, indels), perhaps in part due to human selection in geographically isolated locations (Garris et al. 2005; Yu et al. 2005; Tang et al. 2006). Finally, the RGP map was generated using F2 selfed progeny, whereas the mapping populations used in the 41 published studies were generated by diverse methods, including the use of NILs, chromosome SSLs, and recombinant inbred lines (RILs), and in at least one locus with low recombination rates, fon4-1, an ~200-kb chromosome deletion was involved (H. W. Chu et al. 2006). It has been shown that when two chromatids differ in their relatedness to one another, as in RILs vs. NILs, the local recombination frequency may be affected (Burr and Burr 1991; Lukacsovich and Waldman 1999; Li et al. 2006); in the most extreme case, unequal deletions between chromatids, suppression of meiotic recombination has long been observed (Rieseberg 2001). All of these factors might have contributed to our observation that R-map values from the rice RGP map often did not match recombination frequencies in the vicinity of target alleles.
Figure 7. The underlying limiting factor is that the neighborhood (>280 kb) recombination frequency R-map often does not reflect the recombination frequency in the vicinity (<50 kb) of the target locus (R-local), but when the values do match, then the progeny mapping size can be accurately predicted. (A) Comparison of recombination frequencies in the vicinity of the target gene (R-local) with neighborhood recombination frequencies (R-map) derived from the Rice Genome Project (RGP) RFLP map. The graph shows at which loci R-map reflects R-local and where it does not. In the vicinity of the qSh1, dbs, fon4, chl-9, and Pi-d2 loci, R-map accurately predicted a low recombination frequency, unlike R-avg, and thus predicted that large numbers of progeny would need to be genotyped. (B–E) Linear regression analysis demonstrating that when R-map values were within 30% of R-local values, including several high or low recombination intervals shown in A, then the modified Durrett–Tanksley equation and modified Single Crossover equation could accurately predict the final outcome, namely large or small mapping populations, respectively. The modified equations (C and E), which took into account the number of gametes carrying crossovers immediately flanking each target locus, were more accurate than the original equations (B and D).
Figure 8. The final goal of this research: a mapping population size prediction graph. Shown are the predictions for rice chromosome 3 of the number of progeny (informative gametes, N) required to positionally clone a target allele to achieve a five-candidate gene map resolution (T) based on the Single Crossover equation (unmodified) with a 95% probability of success. The x-axis denotes the physical base pair location along the sequenced chromosome. Arrows point to previously isolated alleles in rice; the model was effective in predicting the relative mapping population size for these alleles (see Results text). For example, the graph accurately predicted that 20-fold more progeny would be required to positionally clone the chl9 locus compared to the nearby gl-3 locus. The model is based on meiotic recombination frequencies (R-map in genes/cM) as calculated from the Rice Genome Project (RGP) map (see supplemental Table 1 at http://www.genetics.org/supplemental/).
Applying these results: We recommend that researchers undertaking positional cloning rely on the R-map strategy only when they have access to a reference genetic map that has been demonstrated to have a strong correlation between R-map values and R-local values. To make this possible, higher-resolution maps, with more markers, must be generated and/or employed to account for subcentimorgan R variation. In potato, a genetic map with 10,000 markers was recently constructed (van Os et al. 2006), demonstrating progress in this area. Such high-resolution maps will provide researchers with a range of recombination frequencies across a 1–2-cM interval, and thus, at best, researchers could expect to predict an upper and lower range of N, not the precise number. To improve the robustness (reproducibility) of R-map frequencies, genetic maps must be generated based on sampling hundreds to thousands of progeny rather than only 100–200 individuals (Ferreira et al. 2006). To make reference map frequencies relevant to the genotypic targets of positional cloning, maps must be constructed from more parental genotype pairs. In addition, for some species, the number of informative gametes (N) might need to be adjusted to account for male vs. female differences in recombination frequency (Lenormand and Dutheil 2005) by adjusting the meiosis factor (f) (see Materials and Methods). As to whether R-map values based on genes/cM or kilobases/cM should be used, we had assumed, given that meiotic recombination in plant genomes has been shown to be highly biased to gene regions, rather than flanking heterochromatin (Fu et al. 2001), that if we ascribed most recombination as occurring within or flanking genes, then the genes/cM ratio would be less variable than the kb/cM ratio; in other words, as the number of genes increased in an interval, the frequency of crossovers would also increase in proportion, keeping the genes/cM ratio constant. However, in retrospect, two pieces of data now suggest that this assumption was incorrect. First, in the meiotic recombination frequency calculations we made on the RGP rice map, we found that the genes/cM ratio varied within the genome nearly as much as the kb/cM ratio; the coefficient of variation for R (genes/cM) was 98% across the rice genome (n = 971) compared to 113% for R (kb/cM) (n = 952). Second, if recombination was biased to within or near genes, then the recombination frequencies from positional cloning studies (R-local) would be predicted to be higher than the genome-wide average for rice (R-avg = 277 kb/cM); in fact, out of the 41 published studies, 20 studies had an R-local value below R-avg with 20 above the R-avg, suggesting no bias in recombination near genes (Table 2). It is therefore possible that the stronger correlation we found for the RGP map between R-map vs. R-local, when calculated as genes/cM, was random, but this should be tested for more maps and for more species. Indeed, it will be interesting to test the predictions of this paper in both larger and more compact genomes.
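For readers who wish to repeat the variability comparison on another reference map, the coefficient of variation can be computed as in the short Python sketch below. The function name and input values are hypothetical, and we assume the sample standard deviation; the text above does not state which estimator was used.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Coefficient of variation in percent: 100 * (sample SD / mean)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical per-interval R values (e.g., genes/cM), for illustration only.
example_r_values = [10.0, 20.0, 30.0]
cv_percent = coefficient_of_variation(example_r_values)
```

Applying the same function to the genes/cM and kb/cM columns of a recombination frequency map gives two directly comparable percentages, analogous to the 98% and 113% values reported above.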
As more robust, higher-resolution maps across more parental genotypes become available, our hope is that the methodology we have described here will generate accurate mapping population size graphs that predict a range of N-values for a given target allele. We conclude by showing an example of such a map in Figure 8, representing our predictions for rice chromosome 3. In spite of the challenges noted, this map did accurately predict the very different mapping population sizes required for the five alleles shown.
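A prediction graph of this kind can be assembled by evaluating a population size estimate for every marker interval along a chromosome. The Python sketch below illustrates the idea under stated assumptions: the interval coordinates and R-map values are invented placeholders, the helper names are ours, and the estimate is a simplified "at least one crossover" calculation rather than the exact Single Crossover equation given in Materials and Methods.

```python
import math
from typing import List, Optional, Tuple

# (start_bp, end_bp, R-map in genes/cM); hypothetical placeholder intervals.
Interval = Tuple[int, int, float]


def estimate_n(target_genes: float, genes_per_cm: float, p_success: float = 0.95) -> int:
    """Simplified estimate of gametes needed for one crossover within T genes."""
    crossover_prob = (target_genes / genes_per_cm) / 100.0
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - crossover_prob))


def prediction_profile(intervals: List[Interval], target_genes: float = 5.0) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, predicted N) for each marker interval."""
    return [
        (start, end, estimate_n(target_genes, r_map))
        for start, end, r_map in intervals
    ]


def predict_at(profile: List[Tuple[int, int, int]], position_bp: int) -> Optional[int]:
    """Look up the predicted N for the interval containing a physical position."""
    for start, end, n in profile:
        if start <= position_bp < end:
            return n
    return None  # position not covered by the reference map


example_intervals = [
    (0, 1_000_000, 10.0),           # region of higher recombination
    (1_000_000, 2_000_000, 200.0),  # region of suppressed recombination
]
profile = prediction_profile(example_intervals)
```

Plotting the interval midpoints against the predicted N values yields a graph of the same form as Figure 8, and `predict_at` returns the estimate for a target locus once its physical position is known.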
We thank the corresponding authors of the positional cloning studies cited here for numerous personal communications. Funds for work on rice genome annotation at The Institute for Genomic Research were through a grant from the National Science Foundation (DBI 0321538) to C. Robin Buell. This research was supported by an Ontario Premier’s Research Excellence Award, an Ontario Ministry of Agriculture and Food (OMAF) grant, and a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada, to M.N.R.
LITERATURE CITED
Ashikari, M., H. Sakakibara, S. Y. Lin, T. Yamamoto, T. Takashi et al., 2005 Cytokinin oxidase regulates rice grain production. Science 309: 741–745.
Ballinger, D. G., and S. Benzer, 1989 Targeted gene mutations in Drosophila. Proc. Natl. Acad. Sci. USA 86: 9402–9406.
Bennetzen, J. L., and W. Ramakrishna, 2002 Exceptional haplotype variation in maize. Proc. Natl. Acad. Sci. USA 99: 9093–9095.
Botstein, D., R. L. White, M. Skolnick and R. W. Davis, 1980 Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32: 314–331.
Burr, B., and F. A. Burr, 1991 Recombinant inbreds for molecular mapping in maize: theoretical and practical considerations. Trends Genet. 7: 55–60.
Chen, X. W., J. J. Shang, D. X. Chen, C. L. Lei, Y. Zou et al., 2006 A B-lectin receptor kinase gene conferring rice blast resistance. Plant J. 46: 794–804.
Chu, H. W., Q. Qian, W. Q. Liang, C. S. Yin, H. X. Tan et al., 2006 The floral organ number4 gene encoding a putative ortholog of Arabidopsis CLAVATA3 regulates apical meristem size in rice. Plant Physiol. 142: 1039–1052.
Chu, Z. H., B. Y. Fu, H. Yang, C. G. Xu, Z. K. Li et al., 2006 Targeting xa13, a recessive gene for bacterial blight resistance in rice. Theoret. Appl. Genet. 112: 455–461.
Crawford, D. C., T. Bhangale, N. Li, G. Hellenthal, M. J. Rieder et al., 2004 Evidence for substantial fine-scale variation in recombination rates across the human genome. Nat. Genet. 36: 700–706.
Dinka, S. J., and M. N. Raizada, 2006 Inexpensive fine mapping and positional cloning in plants using visible, mapped transgenes. Can. J. Bot. 84: 179–188.
Durrett, R. T., K. Y. Chen and S. D. Tanksley, 2002 A simple formula useful for positional cloning. Genetics 160: 353–355.
Ferreira, A., M. F. da Silva, L. Silva and C. D. Cruz, 2006 Estimating the effects of population size and type on the accuracy of genetic maps. Genet. Molec. Biol. 29: 187–192.
Fu, H. H., W. K. Park, X. H. Yan, Z. W. Zheng, B. Z. Shen et al., 2001 The highly recombinogenic bz locus lies in an unusually gene-rich region of the maize genome. Proc. Natl. Acad. Sci. USA 98: 8903–8908.
Garris, A. J., T. H. Tai, J. Coborn, S. Kresovich and S. R. McCouch, 2005 Genetic structure and diversity in Oryza sativa L. Genetics 169: 1631–1638.
Haga, K., M. Takano, R. Neumann and M. Iino, 2005 The rice coleoptile phototropism gene encoding an ortholog of Arabidopsis NPH3 is required for phototropism of coleoptiles and lateral translocation of auxin. Plant Cell 17: 103–115.
Haldane, J. B. S., 1919 The combination of linkage values and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309.
Harushima, Y., M. Yano, P. Shomura, M. Sato, T. Shimano et al., 1998 A high-density rice genetic linkage map with 2275 markers using a single F-2 population. Genetics 148: 479–494.
Hong, Z., M. Ueguchi-Tanaka, K. Umemura, S. Uozu, S. Fujioka et al., 2003 A rice brassinosteroid-deficient mutant, ebisu dwarf (d2), is caused by a loss of function of a new member of cytochrome P450. Plant Cell 15: 2900–2910.
IRGSP, 2005 The map-based sequence of the rice genome. Nature 436: 793–800.
Iyer, A. S., and S. R. McCouch, 2004 The rice bacterial blight resistance gene xa5 encodes a novel form of disease resistance. Mol. Plant Microbe Interact. 17: 1348–1354.
Komori, T., T. Yamamoto, N. Takemori, M. Kashihara, H. Matsushima et al., 2003 Fine genetic mapping of the nuclear gene, Rf-1, that restores the BT-type cytoplasmic male sterility in rice (Oryza sativa L.) by PCR-based markers. Euphytica 129: 241–247.
Komori, T., S. Ohta, N. Murai, Y. Takakura, Y. Kuraya et al., 2004 Map-based cloning of a fertility restorer gene, Rf-1, in rice (Oryza sativa L.). Plant J. 37: 315–325.
Komorisono, M., M. Ueguchi-Tanaka, I. Aichi, Y. Hasegawa, M. Ashikari et al., 2005 Analysis of the rice mutant dwarf and gladius leaf 1. Aberrant katanin-mediated microtubule organization causes up-regulation of gibberellin biosynthetic genes independently of gibberellin signaling. Plant Physiol. 138: 1982–1993.
Konishi, S., T. Izawa, S. Y. Lin, K. Ebana, Y. Fukuta et al., 2006 An SNP caused loss of seed shattering during rice domestication. Science 312: 1392–1396.
Lenormand, T., and J. Dutheil, 2005 Recombination difference between sexes: A role for haploid selection. PLoS Biol. 3: 396–403.
Li, D. T., L. M. Chen, L. Jiang, S. S. Zhu, Z. G. Zhao et al., 2007 Fine mapping of S32(t), a new gene causing hybrid embryo sac sterility in a Chinese landrace rice (Oryza sativa L.). Theoret. Appl. Genet. 114: 515–524.
Li, L. L., M. Jean and F. Belzile, 2006 The impact of sequence divergence and DNA mismatch repair on homeologous recombination in Arabidopsis. Plant J. 45: 908–916.
Li, X. Y., Q. Qian, Z. M. Fu, Y. H. Wang, G. S. Xiong et al., 2003 Control of tillering in rice. Nature 422: 618–621.
Li, Y. H., O. Qian, Y. H. Zhou, M. X. Yan, L. Sun et al., 2003 BRITTLE CULM1, which encodes a COBRA-like protein, affects the mechanical properties of rice plants. Plant Cell 15: 2020–2031.
Liu, X. Q., L. Wang, S. Chen, F. Lin and Q. H. Pan, 2005 Genetic and physical mapping of Pi36(t), a novel rice blast resistance gene located on rice chromosome 8. Mol. Genet. Genom. 274: 394–401.
Lukacsovich, T., and A. S. Waldman, 1999 Suppression of intrachromosomal gene conversion in mammalian cells by small degrees of sequence divergence. Genetics 151: 1559–1568.
Luo, A. D., Q. Qian, H. F. Yin, X. Q. Liu, C. X. Yin et al., 2006 EUI1, encoding a putative cytochrome P450 monooxygenase, regulates internode elongation by modulating gibberellin responses in rice. Plant Cell Physiol. 47: 181–191.
McVean, G. A. T., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley et al., 2004 The fine-scale structure of recombination rate variation in the human genome. Science 304: 581–584.
Miyoshi, K., B. O. Ahn, T. Kawakatsu, Y. Ito, J. I. Itoh et al., 2004 PLASTOCHRON1, a timekeeper of leaf initiation in rice, encodes cytochrome P450. Proc. Natl. Acad. Sci. USA 101: 875–880.
Nachman, M. W., 2002 Variation in recombination rate across the genome: evidence and implications. Curr. Opin. Genet. Dev. 12: 657–663.
Nishimura, A., M. Ashikari, S. Lin, T. Takashi, E. R. Angeles et al., 2005 Isolation of a rice regeneration quantitative trait loci gene and its application to transformation systems. Proc. Natl. Acad. Sci. USA 102: 11940–11944.
Pan, G., X. Y. Zhang, K. D. Liu, J. W. Zhang, X. Z. Wu et al., 2006 Map-based cloning of a novel rice cytochrome P450 gene CYP81A6 that confers resistance to two different classes of herbicides. Plant Mol. Biol. 61: 933–943.
Paterson, A., E. Lander, J. Hewitt, S. Peterson, S. Lincoln et al., 1988 Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms. Nature 335: 721–726.
Paterson, A. H., M. Freeling and T. Sasaki, 2005 Grains of knowledge: genomics of model cereals. Genome Res. 15: 1643–1650.
Qiu, S. Q., K. D. Liu, J. X. Jiang, X. Song, C. G. Xu et al., 2005 Delimitation of the rice wide compatibility gene S5(n) to a 40-kb DNA fragment. Theoret. Appl. Genet. 111: 1080–1086.
Raizada, M. N., 2003 RescueMu protocols for maize functional genomics. Methods Mol. Biol. 236: 37–58.
Ren, Z. H., J. P. Gao, L. G. Li, X. L. Cai, W. Huang et al., 2005 A rice quantitative trait locus for salt tolerance encodes a sodium transporter. Nat. Genet. 37: 1141–1146.
Rieseberg, L. H., 2001 Chromosomal rearrangements and speciation. Trends Ecol. Evol. 16: 351–358.
Sazuka, T., I. Aichi, T. Kawai, N. Matsuo, H. Kitano et al., 2005 The rice mutant dwarf bamboo shoot 1: A leaky mutant of the NACK-type kinesin-like gene can initiate organ primordia but not organ development. Plant Cell Physiol. 46: 1934–1943.
Sharma, T. R., M. S. Madhav, B. K. Singh, P. Shanker, T. K. Jana et al., 2005 High-resolution mapping, cloning and molecular characterization of the Pi-k(h) gene of rice, which confers resistance to Magnaporthe grisea. Mol. Genet. Genom. 274: 569–578.
Singer, A., H. Perlman, Y. L. Yan, C. Walker, G. Corley-Smith et al., 2002 Sex-specific recombination rates in zebrafish (Danio rerio). Genetics 160: 649–657.
Sun, X. L., Y. L. Cao, Z. F. Yang, C. G. Xu, X. H. Li et al., 2004 Xa26, a gene conferring resistance to Xanthomonas oryzae pv. oryzae in rice, encodes an LRR receptor kinase-like protein. Plant J. 37: 517–527.
Suzaki, T., M. Sato, M. Ashikari, M. Miyoshi, Y. Nagato et al., 2004 The gene FLORAL ORGAN NUMBER1 regulates floral meristem size in rice and encodes a leucine-rich repeat receptor kinase orthologous to Arabidopsis CLAVATA1. Development 131: 5649–5657.
Takahashi, Y., A. Shomura, T. Sasaki and M. Yano, 2001 Hd6, a rice quantitative trait locus involved in photoperiod sensitivity, encodes the alpha subunit of protein kinase CK2. Proc. Natl. Acad. Sci. USA 98: 7922–7927.
Tanabe, S., M. Ashikari, S. Fujioka, S. Takatsuto, S. Yoshida et al., 2005 A novel cytochrome P450 is implicated in brassinosteroid biosynthesis via the characterization of a rice dwarf mutant, dwarf11, with reduced seed length. Plant Cell 17: 776–790.
Tang, T., J. Lu, J. Huang, J. He, S. R. McCouch et al., 2006 Genomic variation in rice: genesis of highly polymorphic linkage blocks during domestication. PLoS Genet. 2: e199.
Tanksley, S. D., M. W. Ganal and G. B. Martin, 1995 Chromosome landing: a paradigm for map-based gene cloning in plants with large genomes. Trends Genet. 11: 63–68.
Ueda, T., T. Sato, J. Hidema, T. Hirouchi, K. Yamamoto et al., 2005 qUVR-10, a major quantitative trait locus for ultraviolet-B resistance in rice, encodes cyclobutane pyrimidine dimer photolyase. Genetics 171: 1941–1950.
Ueguchi-Tanaka, M., M. Ashikari, M. Nakajima, H. Itoh, E. Katoh et al., 2005 GIBBERELLIN INSENSITIVE DWARF1 encodes a soluble receptor for gibberellin. Nature 437: 693–698.
van Os, H., S. Andrzejewski, E. Bakker, I. Barrena, G. J. Bryan et al., 2006 Construction of a 10,000-marker ultradense genetic recombination map of potato: Providing a framework for accelerated gene isolation and a genomewide physical map. Genetics 173: 1075–1087.
Wan, X. Y., J. M. Wan, L. Jiang, J. K. Wang, H. Q. Zhai et al., 2006 QTL analysis for rice grain length and fine mapping of an identified QTL with stable and major effects. Theoret. Appl. Genet. 112: 1258–1270.
Wang, G. W., Y. Q. He, C. G. Xu and Q. F. Zhang, 2006 Fine mapping of f5-Du, a gene conferring wide-compatibility for pollen fertility in inter-subspecific hybrids of rice (Oryza sativa L.). Theoret. Appl. Genet. 112: 382–387.
Wang, Q. H., and H. K. Dooner, 2006 Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus. Proc. Natl. Acad. Sci. USA 103: 17644–17649.
Wang, Z. X., M. Yano, U. Yamanouchi, M. Iwamoto, L. Monna et al., 1999 The Pib gene for rice blast resistance belongs to the nucleotide binding and leucine-rich repeat class of plant disease resistance genes. Plant J. 19: 55–64.
Wu, J. Z., H. Mizuno, M. Hayashi-Tsugane, Y. Ito, Y. Chiden et al., 2003 Physical maps and recombination frequency of six rice chromosomes. Plant J. 36: 720–730.
Wu, T. D., and C. K. Watanabe, 2005 GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21: 1859–1875.
Yamanouchi, U., M. Yano, H. X. Lin, M. Ashikari and K. Yamada, 2002 A rice spotted leaf gene, Spl7, encodes a heat stress transcription factor protein. Proc. Natl. Acad. Sci. USA 99: 7530–7535.
Yang, H. Y., A. Q. You, Z. F. Yang, F. Zhang, R. F. He et al., 2004 High-resolution genetic mapping at the Bph15 locus for brown planthopper resistance in rice (Oryza sativa L.). Theoret. Appl. Genet. 110: 182–191.
Yang, Z., X. Sun, S. Wang and Q. Zhang, 2003 Genetic and physical mapping of a new gene for bacterial blight resistance in rice. Theoret. Appl. Genet. 106: 1467–1472.
Yano, M., Y. Katayose, M. Ashikari, U. Yamanouchi, L. Monna et al., 2000 Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. Plant Cell 12: 2473–2483.
Yoshimura, S., U. Yamanouchi, Y. Katayose, S. Toki, Z. X. Wang et al., 1998 Expression of Xa1, a bacterial blight-resistance gene in rice, is induced by bacterial inoculation. Proc. Natl. Acad. Sci. USA 95: 1663–1668.
Yu, J., J. Wang, W. Lin, S. Li and H. E. A. Li, 2005 The genomes of Oryza sativa: a history of duplications. PLoS Biol. 3: e38.
Yuan, Q. P., O. Y. Shu, A. H. Wang, W. Zhu, R. Maiti et al., 2005 The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol. 138: 17–26.
Zeng, L. R., S. H. Qu, A. Bordeos, C. W. Yang, M. Baraoidan et al., 2004 Spotted leaf11, a negative regulator of plant cell death and defense, encodes a U-box/armadillo repeat protein endowed with E3 ubiquitin ligase activity. Plant Cell 16: 2795–2808.
Zhang, H. T., J. J. Li, J. H. Yoo, S. C. Yoo, S. H. Cho et al., 2006 Rice chlorina-1 and chlorina-9 encode ChlD and ChlI subunits of Mg-chelatase, a key enzyme for chlorophyll synthesis and chloroplast development. Plant Mol. Biol. 62: 325–337.
Zhang, K. W., Q. Qian, Z. J. Huang, Y. Q. Wang, M. Li et al., 2006 GOLD HULL AND INTERNODE2 encodes a primarily multifunctional cinnamyl-alcohol dehydrogenase in rice. Plant Physiol. 140: 972–983.
Zhu, Y. Y., T. Nomura, Y. H. Xu, Y. Y. Zhang, Y. Peng et al., 2006 ELONGATED UPPERMOST INTERNODE encodes a cytochrome P450 monooxygenase that epoxidizes gibberellins in a novel deactivation reaction in rice. Plant Cell 18: 442–456.
Zou, J. H., Z. X. Chen, S. Y. Zhang, W. P. Zhang, G. H. Jiang et al., 2005 Characterizations and fine mapping of a mutant gene for high tillering and dwarf in rice (Oryza sativa L.). Planta 222: 604–612.
Zou, J. H., S. Y. Zhang, W. P. Zhang, G. Li, Z. X. Chen et al., 2006 The rice HIGH-TILLERING DWARF1 encoding an ortholog of Arabidopsis MAX3 is required for negative regulation of the outgrowth of axillary buds. Plant J. 48: 687–696.
Communicating editor: J. A. Birchler