
  • 8/10/2019 Genetic Differentiation

    1/10

    Copyright 1997 by the Genetics Society of America

    Genetic Differentiation and Estimation of Gene Flow from F-Statistics Under Isolation by Distance

    François Rousset

    Laboratoire Génétique et Environnement, Institut des Sciences de l'Évolution, Université de Montpellier II, 34095 Montpellier, France

    Manuscript received September 17, 1996

    Accepted for publication December 20, 1996

    ABSTRACT

    I reexamine the use of isolation by distance models as a basis for the estimation of demographic parameters from measures of population subdivision. To that aim, I first provide results for values of $F$-statistics in one-dimensional models and coalescence times in two-dimensional models, and make more precise earlier results for $F$-statistics in two-dimensional models and coalescence times in one-dimensional models. Based on these results, I propose a method of data analysis involving the regression of $F_{ST}/(1 - F_{ST})$ estimates for pairs of subpopulations on geographic distance for populations along linear habitats or logarithm of distance for populations in two-dimensional habitats. This regression provides in principle an estimate of the product of population density and second moment of parental axial distance. In two cases where comparison to direct estimates is possible, the method proposed here is more satisfactory than previous indirect methods.

    ANALYSES of the structure of natural populations are often based on the island or stepping stone models. Functions of probabilities of identity of genes within and between units, such as $F_{ST}$, are estimated and compared to expectations under the island model. The relationship between $F_{ST}$ and the number of migrants according to this model is often used to quantify gene flow. This relationship has been shown to approximate the relationship between $F_{ST}$ and the number of migrants in stepping stone models on a two-dimensional space (KIMURA and MARUYAMA 1971; CROW and AOKI 1984; SLATKIN and BARTON 1989), and is used to obtain indirect estimates of number of migrants or neighborhood size from genetic data. These models can also be used to study the distribution of coalescence times (SLATKIN 1991, 1993), and therefore to obtain the value of measures of population subdivision that have been proposed for analyzing differences in allele size distributions at microsatellite loci, number of nucleotide differences, and maybe quantitative traits (HUDSON 1990; LANDE 1992; SLATKIN 1995). Analytical approximations have been obtained for coalescence times in the one-dimensional stepping stone model (SLATKIN 1991).

    Here, I present a new method of analysis that may be deduced from isolation by distance models. The estimation method uses estimates of $F_{ST}$ for pairs of subpopulations rather than a single $F$-statistic for the entire set of subpopulations. To that aim, I first provide results concerning expected values of measures of population subdivision under isolation by distance. Then, I propose an indirect estimator of the product of population density and second moment of parental axial distance. Theory suggests that this method is more reliable than currently used methods of analysis, particularly for types of dispersal distributions that may be common in natural populations.

    Address for correspondence: François Rousset, Laboratoire Génétique et Environnement, Institut des Sciences de l'Évolution, CC065, USTL, Place E. Bataillon, 34095 Montpellier Cedex 05, France. E-mail: [email protected]

    Genetics 145: 1219-1228 (April, 1997)

    There may be some correlation between direct and indirect estimates of demographic parameters (e.g., HASTINGS and HARRISON 1994; SLATKIN 1994; WARD et al. 1994), but detailed case studies generally argue for discrepancies between the two approaches (e.g., CAMPBELL and DOOLEY 1992; SCHILTHUIZEN and LOMBAERTS 1994; JOHNSON and BLACK 1995). Cases where it is possible to compare indirect estimates to direct estimates obtained from observation of population densities and individual dispersal remain scarce. For the two most detailed data sets I have found, where such a comparison is possible, I find a better agreement between direct estimates and indirect estimates obtained by the present method than with other indirect methods.

    ANALYTIC THEORY

    Identity by descent in one and two dimensions: I will consider discrete generation models for populations on finite and infinite lattices in one or two dimensions, i.e., for subpopulations on a circle or two-dimensional torus of finite or infinite size. Because exact results are available for these models, it is possible to discuss their interpretation without concern for problems of mathematical formulation. Some models of potentially continuously distributed populations may yield similar results but their formulation remains difficult (SAWYER and FELSENSTEIN 1981).

    MARUYAMA (1970) and SAWYER (1976) provide detailed and complementary mathematical expositions of the lattice models. They are lengthy and will not be repeated here. The APPENDIX summarizes some results for the finite and infinite population models (i.e., with a finite or infinite number of subpopulations). Below are the main assumptions, meaning of parameters, and some approximations for the infinite lattice models based on results of SAWYER (1977).

    In their basic formulation the models assume that two gametes produced in the same subpopulation have probability $1/(2N)$ of being copies of the same gene. Under the multinomial sampling scheme of the Wright-Fisher model this amounts to assuming that there are either $2N$ breeding haploid individuals per subpopulation, or $N$ diploid individuals and that dispersal occurs through gametes only. The results are accurate for zygotic dispersal (NAGYLAKI 1983) and some other mating systems are accounted for by use of effective population size arguments (SAWYER 1976; TACHIDA and YOSHIMARU 1996). Within these models, the effective population size is a measure of the rate at which genes coalesce per generation.

    These models also assume that there is a finite second moment of the distance between a gene and its parent in the previous generation. For symmetric dispersal in one dimension, this is also the variance $\sigma^2$ of parental position $X$ relative to offspring position. $\sigma^2$ is not the variance $\mathrm{Var}(|X|)$ of unsigned distance $|X|$, as $\sigma^2 = \mathrm{Var}(X) = \mathrm{Var}(|X|) + E(|X|)^2$. For isotropic dispersal in two dimensions $\sigma^2$ is defined as the variance of parental axial distance $X$, that would be measured along one dimension. The noncentral second moment of parent-offspring Euclidean distance is $2\sigma^2$ and should not be confused with the variance $\mathrm{Var}(R)$ of the Euclidean distance, as $2\sigma^2 = \mathrm{Var}(R) + E(R)^2$ (e.g., CRAWFORD 1984).
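The distinction between the axial variance and the variance of unsigned distance can be checked on a toy case. The sketch below is only an illustration: the dispersal distribution and the helper name `axial_moments` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction

def axial_moments(dist):
    """Return (Var(X), Var(|X|), E|X|) for a dict {displacement: probability}."""
    second_moment = sum(p * x * x for x, p in dist.items())
    mean_signed = sum(p * x for x, p in dist.items())
    mean_abs = sum(p * abs(x) for x, p in dist.items())
    var_signed = second_moment - mean_signed ** 2   # sigma^2 for symmetric dispersal
    var_abs = second_moment - mean_abs ** 2         # variance of unsigned distance
    return var_signed, var_abs, mean_abs

# Hypothetical symmetric dispersal on a one-dimensional lattice:
# stay put with probability 1/2, move one step either way with probability 1/4.
dispersal = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
sigma2, var_abs, mean_abs = axial_moments(dispersal)
print(sigma2, var_abs, mean_abs)
```

For this assumed distribution the variance of unsigned distance is only half of the axial variance, so using it in place of the second moment would understate dispersal.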

    If $u$ is the mutation rate per generation, and $\dot{Q}_j$ the probability of identity by descent of a pair of genes at $j$ steps from each other, then using the results of SAWYER (1977) as explained in the APPENDIX, one obtains in one dimension (all starred symbols will refer to the infinite lattice models):

    $$\frac{\dot{Q}_j^*}{1 - \dot{Q}_0^*} \approx \frac{e^{-\sqrt{2u}\,j/\sigma}}{4N\sigma\sqrt{2u}}. \tag{1}$$

    This is an approximation for large geographic distances. On the other hand,

    $$\frac{\dot{Q}_0^*}{1 - \dot{Q}_0^*} \approx \frac{1}{4N\sigma\sqrt{2u}} + \frac{A_1}{4N\sigma}, \tag{2}$$

    where $A_1$ is a constant dependent on the dispersal distribution, but not on $N$ or $u$. Its definition is given by SAWYER (1977, Equation 2.4) and its biological significance is similar to that of $A_2$ discussed below.

    In two dimensions, for the probability of identity $\dot{Q}_{j,k}$ of genes at $j$ steps from each other in one dimension and $k$ in the other, one has

    $$\frac{\dot{Q}_r^*}{1 - \dot{Q}_0^*} \approx \frac{K_0(\sqrt{2u}\,r/\sigma)}{4N\pi\sigma^2}, \tag{3}$$

    where $\dot{Q}_r^*$ stands for $\dot{Q}_{j,k}^*$, $r = \sqrt{j^2 + k^2}$ is the distance between genes and $K_0$ is the modified Bessel function of second kind and zero order.

    A different formula must be considered when $r = 0$:

    $$\frac{\dot{Q}_0^*}{1 - \dot{Q}_0^*} \approx \frac{-\ln(\sqrt{2u}) + 2\pi A_2}{4N\pi\sigma^2}. \tag{4}$$

    $A_2$ is of the same nature as $A_1$ above and an explicit definition is given in the APPENDIX (Equation A11). Its biological significance is discussed later.

    $F_{ST}$ and related quantities: The previous theory provides values of the probabilities of identity by descent. Hence it can be used to provide values of the correlation $\beta_j \equiv (\dot{Q}_0 - \dot{Q}_j)/(1 - \dot{Q}_j)$ of genes within populations with respect to genes at some distance $j$. This quantity is different from $F_{ST,j}$, which is better defined as $(Q_0 - Q_j)/(1 - Q_j)$ where the $Q$s are probabilities of identity in state rather than identity by descent, and from the ratio of average coalescence times, $C_{ST,j} \equiv (T_j - T_0)/T_j$, where the $T$s are average coalescence times of pairs of genes at distance $j$. These distinctions are necessary to understand how $F$-statistics are affected by the mutation rate and mutation process (ROUSSET 1996). However, properties of $F_{ST}$ can be deduced from those of $C_{ST}$ and $\beta$, and in finite populations in the limit of low mutation rate, the values of all three parameters are identical (SLATKIN 1991; ROUSSET 1996).

    When $\dot{Q}_j = 0$, $\beta$ reaches its maximum possible value, which is $\dot{Q}_0$. This is the limit value of $\beta$ at long distances. Under the assumption that $F_{ST} = \beta = \dot{Q}_0$, Equation 4 implies $1/F_{ST} - 1 = 4N\pi\sigma^2/(-\ln(\sqrt{2u}) + 2\pi A_2)$ where the denominator is a function of the mutation rate and distribution of dispersal, but not of distance. For the stepping stone model $1/F_{ST} - 1 = 2Nm\pi/(-\ln(\sqrt{2u}) + 2\pi A_2)$ where $m$ is the fraction of migrants. Similar formulas have been obtained by KIMURA and WEISS (1964) and SLATKIN and BARTON (1989), and proposed as a basis for estimating either $4N\pi\sigma^2$ or $Nm$ by the latter authors.

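    It appears useful to reconsider the underlying models. For a one-dimensional infinite population one obtains

    $$\frac{\beta_j^*}{1 - \beta_j^*} \approx \frac{1 - e^{-\sqrt{2u}\,j/\sigma}}{4N\sigma\sqrt{2u}} + \frac{A_1}{4N\sigma}, \tag{5}$$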


    which is the difference between (1) and (2) (see APPENDIX). For a two-dimensional infinite population,

    $$\frac{\beta_r^*}{1 - \beta_r^*} \approx \frac{-\ln(\sqrt{2u}) - K_0(\sqrt{2u}\,r/\sigma) + 2\pi A_2}{4N\pi\sigma^2}, \tag{6}$$

    which is the difference between (3) and (4). The expression in SLATKIN and BARTON (1989) for $F_{ST}$ at short distances is equivalent except that they do not give a definition for the equivalent of $A_2$ in their formulas.

    The low mutation limits of the above expressions yield

    $$\frac{C_{ST,j}}{1 - C_{ST,j}} \approx \frac{A_1\sigma + j}{4N\sigma^2}. \tag{7}$$

    If $A_1$ is neglected, this is in agreement with a result of SLATKIN (1993) for the stepping stone model. The two-dimensional result is

    $$\frac{C_{ST,r}}{1 - C_{ST,r}} \approx \frac{\ln(r/\sigma) - 0.116 + 2\pi A_2}{4N\pi\sigma^2}, \tag{8}$$

    because $-K_0(x) \approx \ln(x/2) + \gamma_e$ for small $x$ (ABRAMOWITZ and STEGUN 1972, eq. 9.6.13), where $\gamma_e = 0.5772\ldots$ is Euler's constant. Note that $C_{ST}/(1 - C_{ST})$ is the coalescent approximation for $1/(2M)$ where $M$ is the quantity discussed by SLATKIN (1993). For the log-log plots of $\hat{M}$ vs. distance discussed there,

    $$\log M \approx \log(2N\sigma^2) - \log(A_1\sigma + j) \tag{9}$$

    in one dimension, and

    $$\log M \approx \log(2N\pi\sigma^2) - \log(\ln(r/\sigma) - 0.116 + 2\pi A_2) \tag{10}$$

    in two dimensions.
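A short worked version of the step behind the constant $-0.116$ may help. It assumes only the small-$x$ expansion $-K_0(x) \approx \ln(x/2) + \gamma_e$ quoted in the text, applied with $x = \sqrt{2u}\,r/\sigma$ to the numerator of the difference between Equations 4 and 3:

```latex
\begin{align*}
-\ln\sqrt{2u} - K_0\!\left(\frac{\sqrt{2u}\,r}{\sigma}\right)
  &\approx -\ln\sqrt{2u} + \ln\frac{\sqrt{2u}\,r}{2\sigma} + \gamma_e \\
  &= \ln\frac{r}{\sigma} - \ln 2 + \gamma_e \\
  &\approx \ln\frac{r}{\sigma} - 0.6931 + 0.5772
   = \ln\frac{r}{\sigma} - 0.116 .
\end{align*}
```

The mutation rate cancels in the first line, which is why the two-dimensional low-mutation result no longer depends on $u$.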

    Numerical evaluations of $\beta_j$ and $\beta_{jk}$ for finite population structures with different distributions of dispersal (detailed in the APPENDIX) are compared to the infinite population coalescent approximation with and without the $A_1$ or $A_2$ term in Figures 1 and 2. The theory is remarkably accurate if the $A$s are taken into account.

    Schematically, the variables considered have a linear relationship the slope of which is determined by $N\sigma^2$ only and the intercept determined by both $N$ and more complex features of the dispersal distribution embodied in the definition of the $A$s. It is not easy to relate the $A$ values to particular features of the distribution of dispersal. However, a given value of $\sigma^2$ may be due to a relatively large number of migrants at short distances or to a few long distance migrants. Differentiation between neighboring subpopulations should be more efficiently prevented in the former case than in the latter, resulting in negative $A_2/\sigma^2$ values if all migrants are from neighboring populations (e.g., the stepping stone model with $\sigma^2 = 1/200$), and in positive $A_2/\sigma^2$ values [...]

    FIGURE 1.-Coalescence and identity measures in one-dimensional models. Panels: geometric, $d = 1/2$, $q = 2/3$, $\sigma^2 = 15/4$; binomial, $n = 16$, $\sigma^2 = 4$; stepping, $d = 1/100$, $\sigma^2 = 1/200$. Horizontal axis: distance (lattice steps). Exact values of $\beta/(1 - \beta)$ (•••), computed from (A3) and (A7), are compared to the asymptotic approximation for $C_{ST}/(1 - C_{ST})$ in finite one-dimensional lattices for different distributions of migration (Equation 7, dark gray line) and the same approximation without the $A_1$ term (-----). $u$ = [illegible] and $N = 20$ in all cases (a different value of $N$ would only change the y-axis scale). Lattice size is 1000 steps and differentiation shown for distances up to 125 steps. Note the differences between identity by descent measures in the finite population (•••) and infinite population (light gray line) models.


    FIGURE 2.-Coalescence and identity measures in two-dimensional models. Panels: geometric, $d = 2q = 2/21$, $\sigma^2 = 0.055$; geometric, $d = 1/2$, $q = 2/3$, $\sigma^2 = 15/4$; geometric, $d = 2q = 4/3$, $\sigma^2 = 10$; binomial, $n = 16$, $\sigma^2 = 4$; stepping, $d = 1/100$, $\sigma^2 = 1/200$; stepping, $d = 1/4$, $\sigma^2 = 1/8$. Horizontal axis: distance (lattice steps). Note the logarithmic scale for distance. Exact values of $\beta/(1 - \beta)$ (•••), computed from (A5) and (A7), are compared to the asymptotic approximation for $C_{ST}/(1 - C_{ST})$ in finite two-dimensional lattices for different distributions of migration (Equation 8, dark gray line) and the same approximation without the $A_2$ term (-----). The value of $1/(4N\pi\sigma^2)$ is indicated by [...]


    FIGURE 3.-The difference between coalescence and identity measures. These differences are measured by $R_{I/C} = -xK_0'(x)$ (plain line) or $R_{I/C} = e^{-x}$ (gray line), in two or one dimension, respectively. Horizontal axis: $x$. See text for usage.

    [...] moments of unsigned distance, but from the noncentral moments). They would tend to have high $A_2/\sigma^2$ values, so differentiation among populations at low distances ($r < \sigma$) could be common. However, kurtosis is not a perfect descriptor of the extent of migration between neighboring populations in the present models nor of $A_2/\sigma^2$. For some extreme lattice models strong kurtosis is compatible with migration only between neighboring subpopulations (the stepping stone example with $\sigma^2 = 1/200$ has the highest kurtosis of all examples in Figure 2).

    $\beta$ and $C_{ST}$ are practically identical at short distances but progressively depart from each other. In two dimensions, the slope of the relationship between $\beta/(1 - \beta)$ and logarithm of distance, which is independent of $A_2$, will be $R_{I/C}$ times that for $C_{ST}/(1 - C_{ST})$ at distance $r$ where $-dK_0(\sqrt{2u}\,r/\sigma)/d\ln(r) = R_{I/C}$. This ratio of slopes can be deduced from the value of the function $-xK_0'(x)$ (Figure 3): at distance $r = x\sigma/\sqrt{2u}$, $R_{I/C} = -xK_0'(x)$. For example, $R_{I/C} = 0.8$ for $r = 0.56\sigma/\sqrt{2u}$, and $R_{I/C} = 0.2$ for $r = 2.4\sigma/\sqrt{2u}$. It is necessary to know both $u$ and $\sigma$ to determine the distance at which some discrepancy between coalescence and identity measures is reached. For example, if $u = 10^{-4}$ and $\sigma^2 = 0.1$ km$^2$, the distance where $R_{I/C} = 0.2$ is only 53.7 km, and 1697 km if $u = 10^{-6}$ and $\sigma = 1$ km. In the one-dimensional model, the slope of the relationship between $\beta/(1 - \beta)$ and distance (not its logarithm) is given by values of $e^{-x}$ rather than $-xK_0'(x)$ (see Equation 5). The deviation from the coalescent approximation occurs at a shorter distance in one than in two dimensions.
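The thresholds quoted above can be reproduced numerically. The following is a minimal sketch, assuming the standard identity $-xK_0'(x) = xK_1(x)$ and evaluating $K_1$ from its integral representation with a plain trapezoid rule so that only the standard library is needed. All function names are illustrative and are not from the paper.

```python
import math

def bessel_k1(x, step=0.01, t_max=20.0):
    """K_1(x) from the integral of exp(-x cosh t) cosh t over t >= 0 (trapezoid rule)."""
    total = 0.5 * math.exp(-x)          # half-weighted endpoint t = 0
    for i in range(1, int(t_max / step) + 1):
        c = math.cosh(i * step)
        total += math.exp(-x * c) * c   # integrand is negligible long before t_max
    return total * step

def ratio_2d(x):
    """R_{I/C} in two dimensions: -x K_0'(x) = x K_1(x)."""
    return x * bessel_k1(x)

def ratio_1d(x):
    """R_{I/C} in one dimension: exp(-x)."""
    return math.exp(-x)

def solve_x(target, lo=1e-3, hi=20.0):
    """Bisection for ratio_2d(x) = target; ratio_2d decreases from 1 toward 0."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ratio_2d(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def distance(x, sigma, u):
    """Distance r = x * sigma / sqrt(2u) at which the scaled argument equals x."""
    return x * sigma / math.sqrt(2.0 * u)

x08, x02 = solve_x(0.8), solve_x(0.2)
r_short = distance(2.4, math.sqrt(0.1), 1e-4)   # sigma^2 = 0.1 km^2, u = 1e-4
r_long = distance(2.4, 1.0, 1e-6)               # sigma = 1 km, u = 1e-6
print(round(x08, 2), round(x02, 2), round(r_short, 1), round(r_long))
```

The distances are computed from the rounded argument 2.4 used in the text rather than from the bisection root, so they should be compared with the quoted 53.7 km and 1697 km only to that precision.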

    FIGURE 4.-Differentiation in an elongated habitat. This figure shows differentiation as a function of distance and logarithm of distance along the long axis of a habitat of $1000 \times n_y$ subpopulations, for values of $n_y$ from 2 to 100 (curves labeled $n_y = 2$, 4, 20 and 100). Dispersal follows the geometric model with $d = 2q = 4/3$. Exact values of $\beta/(1 - \beta)$ (•••) were computed from (A5) and (A7). Dashed lines (---) show expected slopes for one-dimensional habitats with the same linear density $Nn_y$. The dark gray line is the asymptotic approximation for $C_{ST}/(1 - C_{ST})$ (Equation 8). $u$ = 10^-[exponent illegible] and $N = 20$.

    These values are valid only for identity by descent measures. The distance will be shorter, $\sqrt{(k - 1)/k}$ times those given above, for a symmetric $k$-allele model, and longer for stepwise mutation models (ROUSSET 1996). In two dimensions the differences on $F_{ST}$ values due to differences in mutation rates may not be considerable. For example, for $N = 6.23$, $\sigma^2 = 2.72$, and $A_2 = 0.745$ (values chosen to fit the indirect estimates in the example from human populations below), the maximum $\beta^*$ value is 0.05 for $u = 10^{-6}$ and 0.04 for $u = 10^{-4}$. Differences due to different mutational processes are smaller and may be difficult to detect (details not shown).
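The two maximum values can be recovered from Equation 4, since the maximum of $\beta$ is $\dot{Q}_0$ and Equation 4 gives $\dot{Q}_0/(1 - \dot{Q}_0)$. A minimal sketch with the parameter values quoted above follows; the function name `max_beta` is illustrative.

```python
import math

def max_beta(N, sigma2, A2, u):
    """Long-distance limit of beta, i.e. Q0 solved from Equation 4."""
    odds = (-math.log(math.sqrt(2.0 * u)) + 2.0 * math.pi * A2) / (4.0 * N * math.pi * sigma2)
    return odds / (1.0 + odds)          # convert Q0 / (1 - Q0) back to Q0

for u in (1e-6, 1e-4):
    print(u, round(max_beta(6.23, 2.72, 0.745, u), 3))
```

A hundredfold change in mutation rate moves the ceiling on differentiation by about one fifth, which is the sense in which the effect "may not be considerable".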

    Whether a narrow elongated habitat should be considered one- or two-dimensional is not obvious. Numerical examples (Figure 4) suggest that in such habitats, differentiation between populations at distances smaller than half the width of the habitat follows the two-dimensional model in that there is a linear relationship between $F_{ST}/(1 - F_{ST})$ and logarithm of distance, and differentiation between populations at distances larger than the width of the habitat follows the one-dimensional model, with a linear relationship between $F_{ST}/(1 - F_{ST})$ and distance and slope determined by the density of individuals per unit length. Thus, differentiation in elongated habitats can be analyzed using the two-dimensional model when habitat is locally two-dimensional at the scale of study defined by the distance between samples, and using the one-dimensional model at larger distances.

    IMPLICATIONS FOR DATA ANALYSIS

    The previous results show that an appropriate representation of data is a plot of estimates of $F_{ST}/(1 - F_{ST})$ against the distance in one dimension or logarithm of distance in two dimensions. In the latter case, the expected relationship is approximately linear, $y = a + bx$ with slope $b = 1/(4N\pi\sigma^2)$ and intercept $a = b\,(-\ln(\sigma) + \gamma_e - \ln 2 + 2\pi A_2)$. The slope of the regression may be used to estimate $1/(4N\pi\sigma^2)$. The quantity

    $$e^{-a/b} = 2\sigma e^{-\gamma_e - 2\pi A_2} = 1.123\,\sigma e^{-2\pi A_2} \tag{11}$$

    would be independent of subpopulation size. If we could somehow discard $A_2$, it would be possible to estimate both $\sigma$ and $N$, for example, by $\hat{\sigma} = e^{-a/b}/1.123$ and $\hat{N} = 1/(4\pi b\hat{\sigma}^2)$, if $A_2 = 0$. This would be a very poor method for most examples in Figure 2, and there is no reason to consider $A_2$ negligible in real situations, hence $\sigma$ and $N$ cannot be estimated separately.
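As a rough illustration of the two-dimensional regression, the sketch below generates noise-free values from the low-mutation approximation (Equation 8) for hypothetical parameters and fits a line by ordinary least squares. The parameter values, the sampled distances and the helper names are assumptions made for the example; real pairwise estimates would be noisy and would replace `ys`.

```python
import math

EULER_GAMMA = 0.5772156649015329

def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def expected_2d(r, N, sigma, A2):
    """Equation 8, with the constant -0.116 written as gamma_e - ln 2."""
    numerator = math.log(r / sigma) - math.log(2.0) + EULER_GAMMA + 2.0 * math.pi * A2
    return numerator / (4.0 * N * math.pi * sigma ** 2)

N, sigma, A2 = 20, 2.0, 0.3                    # hypothetical values
distances = [3, 5, 8, 13, 21, 34]              # hypothetical sample separations
ys = [expected_2d(r, N, sigma, A2) for r in distances]
a, b = fit_line([math.log(r) for r in distances], ys)

four_n_pi_sigma2 = 1.0 / b                     # what the slope estimates
scale = math.exp(-a / b)                       # Equation 11: 1.123 * sigma * exp(-2 pi A2)
print(round(four_n_pi_sigma2, 1), round(scale, 4))
```

The inverse slope returns $4N\pi\sigma^2$, whereas $e^{-a/b}$ mixes $\sigma$ with $A_2$. Dividing it by 1.123 would recover $\sigma$ only if $A_2$ were zero, which is the point made in the paragraph above.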

    The linear relationship never perfectly holds, and if possible it is preferable to take into account only the differentiation observed at distances $r > \sigma$ and $r$ lower than the value determined by some acceptable $R_{I/C}$ value, for example, $r < 0.56\sigma/\sqrt{2u}$ for $R_{I/C} = 0.8$ in two dimensions. If only a minimal estimate of $\sigma$ is available, it provides an upper bound to the bias measured by $R_{I/C}$.

    Interpretation of parameters: In many recent studies attempting to estimate a demographic parameter from genetic data, it is considered appropriate to estimate a number of migrants between subpopulations and this creates a need to define subpopulations. This raises two difficulties, that of estimating number of migrants when models of isolation by distance do not suggest any simple way to do so and that of defining subpopulations. The surface occupied by such subpopulations is often estimated by the surface within which no differentiation is detected, or equated to the neighborhood area as given by WRIGHT's formulas. These procedures have little theoretical support, and it is not the purpose of the present paper to provide such support. Nor does it provide any ground to define subpopulations that can be considered panmictic in some sense, another problematic interpretation of WRIGHT's neighborhood.

    In fact, the present method of analysis does not require the definition of subpopulations on a lattice, but only the knowledge of the distances between samples. In their basic formulation the lattice models assume that the distance between neighboring subpopulations $\varepsilon$ is 1. In data analyses the distance between neighboring subpopulations is often unknown or even difficult to define, so it is necessary to identify quantities whose interpretation does not depend on the assumption that $\varepsilon = 1$, or equivalently is not affected by a change of scale.

    The slope is such a quantity: whatever the scale it is inversely proportional to the product of population density $D_\varepsilon$ by the second moment of dispersal distance $\sigma_\varepsilon^2$. In two dimensions, the slope does not depend on the spatial scale because of the logarithmic effect of distance, and $D_\varepsilon\sigma_\varepsilon^2$ is always $N\sigma^2$: when the distance between steps $\varepsilon$ is 1, this is density, $D_1 = N$, times second moment, $\sigma_1^2 = \sigma^2$, and if scale is changed this is still density, $D_\varepsilon = N/\varepsilon^2$, times second moment, $\sigma_\varepsilon^2 = \sigma^2\varepsilon^2$. In one dimension, the slope is determined by $N\sigma^2\varepsilon$: it depends on the spatial scale but is always density, $D_\varepsilon = N/\varepsilon$, times second moment, $\sigma_\varepsilon^2 = \sigma^2\varepsilon^2$.

    Then, in the special case of the two-dimensional stepping stone model, $\sigma_\varepsilon^2 = \sigma^2\varepsilon^2 = m\varepsilon^2/2$ and $2D_\varepsilon\sigma_\varepsilon^2 = 2(N/\varepsilon^2)(m\varepsilon^2/2) = Nm$ is the number of migrants per subpopulation. Thus $Nm$ can be estimated even if $\varepsilon$ is unknown, but obviously this quantity provides no information about movements of individuals unless $\varepsilon$ is known. In the one-dimensional stepping stone model, $D_\varepsilon\sigma_\varepsilon^2 = Nm\varepsilon$ is not the number of migrants per subpopulation when $\varepsilon \neq 1$, so the number of migrants cannot be estimated if $\varepsilon$ is unknown.
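The scale argument can be made concrete with a few lines of arithmetic. This is only a sketch: the values of `N`, `m` and the lattice spacings are hypothetical, and the function name is illustrative.

```python
def density_times_second_moment(N, sigma2_lattice, eps, dims):
    """D_eps * sigma_eps^2 when neighboring subpopulations are eps distance units apart."""
    density = N / eps ** dims             # individuals per unit length (1-D) or area (2-D)
    sigma2 = sigma2_lattice * eps ** 2    # second moment in squared distance units
    return density * sigma2

N, m = 20, 0.1   # hypothetical subpopulation size and migration rate
for eps in (1.0, 5.0):
    # Two-dimensional stepping stone: sigma^2 = m/2 in lattice units.
    two_d = 2 * density_times_second_moment(N, m / 2, eps, dims=2)
    # One-dimensional stepping stone: sigma^2 = m in lattice units.
    one_d = density_times_second_moment(N, m, eps, dims=1)
    print(eps, two_d, one_d)
```

In two dimensions the product stays at $Nm$ whatever the spacing, while in one dimension it scales with the spacing, so it cannot be read as a number of migrants unless the spacing is known.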

    Examples: In this section, I give two applications of the approach described above. First, I have applied it to Gainj- and Kalam-speaking people of New Guinea for which both genetic differentiation and demographic properties have been extensively studied (e.g., WOOD 1987; LONG et al. 1987), which permits a comparison of different estimates of parameters.

    The natal dispersal data of WOOD et al. (1985) provide the position of parents of individuals that reproduced in some place. They can be used to estimate $\sigma^2$. Women have $\hat{\sigma}_f^2 = 2.21$ km$^2$ and men have $\hat{\sigma}_m^2 = 3.23$ km$^2$. Hence $\hat{\sigma}^2 = (\hat{\sigma}_f^2 + \hat{\sigma}_m^2)/2 = 2.72$ km$^2$. The population density is about 24 individuals·km$^{-2}$, and age structure and distribution of number of offspring may reduce the effective population size by a factor of 2 (WOOD 1987), hence a direct estimate of $4D\pi\sigma^2$ is 410.

    I reanalyzed the five loci studied by LONG et al. (1987) (Figure 5). The slope of the regression is 0.0047, hence the slope estimate of $4D\pi\sigma^2$ is 213 individuals, about half the direct estimate (the slope estimate is 265 individuals if differentiation at distances larger than $\hat{\sigma}$ only is taken into account). The estimate of $F_{ST}$ computed from all subpopulations is 0.025, hence by the $1/F_{ST} - 1$ method one would obtain 40, a poorer estimate of $4D\pi\sigma^2$. The high differentiation at short distances may be at least in part explained by the nature of the migration distribution, which is strongly leptokurtic (the kurtosis for the axial migration distribution, inferred under the assumption of isotropic migration, is 14.6), but may also be due to other factors not included in the model.
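The arithmetic behind the three estimates compared in this example can be retraced as follows. It is a sketch using only the figures quoted in the text, and the variable names are illustrative.

```python
import math

sigma2_women, sigma2_men = 2.21, 3.23            # km^2, from the dispersal data quoted above
sigma2 = (sigma2_women + sigma2_men) / 2         # sex-averaged second moment, km^2
effective_density = 24 / 2                       # individuals per km^2, halved as in the text
direct = 4 * effective_density * math.pi * sigma2

slope = 0.0047                                   # regression slope quoted for Figure 5
slope_estimate = 1 / slope                       # indirect estimate of 4*D*pi*sigma^2
fst_all = 0.025                                  # single F_ST over all subpopulations
island_estimate = 1 / fst_all - 1                # the 1/F_ST - 1 method

print(round(sigma2, 2), round(direct), round(slope_estimate), round(island_estimate))
```

The last quantity evaluates to 39, which the text rounds to 40. Either way it is an order of magnitude below the direct estimate, whereas the slope estimate is within a factor of two.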


    FIGURE 5.-Differentiation among Gainj- and Kalam-speaking peoples. Multilocus estimates of pairwise differentiation are plotted against logarithm of map distances (in km). The regression is $y = 0.0047x + 0.0191$ and the maximum distance between two subpopulations is 14 km. Genotypic data appear in LONG et al. (1986). $F_{ST}$ was estimated according to WEIR and COCKERHAM (1984).

    The other example allowing comparison of direct and indirect estimates that I have found is that of the intertidal snail Bembicium vittatum (JOHNSON and BLACK 1995). The habitat is linear (about 3 m wide) and the results will be analyzed according to the one-dimensional model.

    I used the estimate of density at Noddy Shore, $D = 111$ adults·m$^{-1}$, which is typical of populations along the 1600 m transect (JOHNSON and BLACK 1995). Dispersal was studied by mark-recapture experiments within this transect, and I computed an estimate of the second moment $\sigma^2$ of dispersal distance over one generation (about 12 months) as 2.4 times the estimate of the second moment over 5 months (see JOHNSON and BLACK 1995). From their Table 1, $\hat{\sigma}^2 = 2.4\,(6.4^2 + 181.5) = 533.9$ m$^2$. Then a direct estimate of $4D\sigma^2$ is $2.4 \times 10^5$ individuals·m (the latter unit may surprise, but in one dimension the product of linear density times second moment of dispersal distance necessarily scales as a number of individuals times a distance). I reanalyzed the genotypic data for the 1600 m transect (13 loci, provided by M. S. JOHNSON) and found that the slope of the regression of $\hat{F}_{ST}/(1 - \hat{F}_{ST})$ to distance is $2.76 \times 10^{-6}$ (individuals·m)$^{-1}$ (Figure 6). According to the one-dimensional model the inverse value $3.6 \times 10^5$ individuals·m is an estimate of $4D\sigma^2$, 1.5 times the direct estimate.
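The one-dimensional comparison reduces to a few multiplications, retraced in this sketch with the numbers quoted above. Variable names are illustrative.

```python
density = 111                                   # adults per metre at Noddy Shore
sigma2 = 2.4 * (6.4 ** 2 + 181.5)               # m^2 per generation, scaled up from 5 months
direct = 4 * density * sigma2                   # direct estimate of 4*D*sigma^2, individuals * m

slope = 2.76e-6                                 # regression slope, (individuals * m)^-1
indirect = 1 / slope                            # one-dimensional estimate of 4*D*sigma^2

print(round(sigma2, 1), f"{direct:.2g}", f"{indirect:.2g}", round(indirect / direct, 1))
```

Note that in one dimension the inverse slope estimates $4D\sigma^2$ directly, without the factor $\pi$ that appears in the two-dimensional case.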

    FIGURE 6.-Differentiation among Bembicium snails. Multilocus estimates of pairwise differentiation are plotted against distance (m). The regression is $y = 2.76 \times 10^{-6}x + 0.0015$. $F_{ST}$ was estimated according to WEIR and COCKERHAM (1984).

    JOHNSON and BLACK's (1995) estimate of $Nm$, based on the approximation $\log \hat{M} \approx \log(Nm)$ at one interdeme distance (SLATKIN 1993), is 22 at 150 m. Considering more general migration distributions, and neglecting the $A_1$ term in Equation 9, this could be interpreted as an estimate of $(D\sigma^2)$/distance so an estimate of $4D\sigma^2$ is $88 \times 150 = 13200$ individuals·m, far from the direct estimate. This discrepancy may be due to neglecting $A_1$ as well as estimation problems since log-log representations prevent the use of unbiased estimators of $F_{ST}$.
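To see how far this alternative estimate falls from the direct one, the same quantities can be put side by side. This is a sketch following the reading of $\hat{M}$ given in the paragraph above, with illustrative variable names.

```python
m_hat = 22           # M estimate reported at one interdeme distance
distance_m = 150     # that distance, in metres
d_sigma2 = m_hat * distance_m      # reading M as D*sigma^2 / distance
indirect = 4 * d_sigma2            # 4*D*sigma^2 in individuals * m

# Direct estimate from density and mark-recapture dispersal, as quoted in the text.
direct = 4 * 111 * 2.4 * (6.4 ** 2 + 181.5)
print(indirect, round(direct / indirect, 1))
```

The gap is more than an order of magnitude, compared with a factor of about 1.5 for the slope-based estimate.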

    The analytical theory can be used to assess to which extent the difference between the slope expected from coalescence measures and the slope expected from $F$-statistics may bias the analysis. In the human example the ratio $R_{I/C}$ as given by Figure 3 is only 0.9996 at 14 km, and in the one-dimensional example this is 0.907 at 1600 m (assuming $u = 10^{-6}$ and using estimated values of $\sigma^2$ in both cases). Thus in the one-dimensional case a slight bias in estimation is expected. It should result in an overestimation of $N\sigma^2$ by some factor [...]


    [...] kurtosis of the distribution of dispersal on expected levels of differentiation.

    In his formulation of isolation by distance models WRIGHT (1946) took into account the nature of the dispersal distribution by way of the neighborhood size, a number of individuals, and the neighborhood area, the surface occupied by this number of individuals. His most precise definition of neighborhood size is the inverse of the chance that two uniting gametes came from the same individual (see Equation A12). It is apparent from Figure 2 that this quantity does not describe any simple feature of the model, neither does the neighborhood area that is proportional to the neighborhood size for all examples in Figure 2. The quantity $4N\pi\sigma^2$ that determines the value of the slope in the two-dimensional model should not be confused with the neighborhood size. WRIGHT found that the value of the neighborhood size is $4N\pi\sigma^2$ for Gaussian dispersal, but is different for other distributions. Here the result that the slope is $1/(4N\pi\sigma^2)$ arises without reference to a Gaussian distribution of parental distances, and holds more generally.

    The theoretical results agree with earlier numerical studies of $F_{ST}$ and related quantities that showed that differentiation in the two-dimensional stepping stone (i.e., nearest neighbor dispersal) model is roughly as expected under the island model, and increases more rapidly with distance in one-dimensional models (KIMURA and MARUYAMA 1971; CROW and AOKI 1984; SLATKIN and BARTON 1989; SLATKIN 1991). An important but frequently neglected message from the two-dimensional stepping stone model is that subpopulations that never exchange migrants may not exhibit much higher $F_{ST}$ values than those that do.

    Another important implication of these models is that an apparent absence of a pattern of isolation by distance may be due not only to range expansions (SLATKIN 1993), but also to sampling at large distances (so that $R_{I/C} \ll 1$), or to large values of $D\sigma^2$. The capacity to detect isolation by distance depends also on the range of (logarithm of) distance values investigated, and on the variance of estimators that is probably lower at short distances.

    However, the models show that the variation of pairwise $F_{ST}$ values with distance may be more easily interpretable than the $F_{ST}$ values themselves. Using $F_{ST}$ values themselves to estimate demographic parameters is not straightforward, and the examples confirm these expectations. In both of them, the slope and direct estimates differ by less than twofold. This agreement may be due in part to the fact that the estimates are based on differentiation at a relatively small geographical scale, where stochastic equilibrium is approached more rapidly (SLATKIN 1993), and where differentiation should be independent of the details of the mutation process. Some other complicating factors such as spatial variation of demographic parameters or selection variable in space are also more easily avoided at short distances.

    The relatively small discrepancies between direct and indirect estimates may be due to minor inadequacies of the models as well as imprecision of the estimators. Good mark-recapture estimates of $\sigma^2$ may also be difficult to obtain because of long distance migration outside the study area. More studies of this kind would be necessary before systematic differences between different kinds of estimates can be detected and interpreted. Nevertheless, the variation of $F_{ST}/(1 - F_{ST})$ with distance contains the most easily interpretable information, and the available examples show that there is a better match between direct and indirect estimates of $D\sigma^2$ obtained in this way than with indirect estimates of this quantity obtained by other methods.

    I thank M. S. JOHNSON for providing the Bembicium data set and useful comments, and J. BRITTON-DAVIDIAN, C. CHEVILLON, P. JARNE, M. RAYMOND, M. SLATKIN and particularly Y. MICHALAKIS for providing references, helpful suggestions or comments on the manuscript. Some of the programs used in the data analyses were written in collaboration with M. RAYMOND and will be included in future versions of the Genepop package (RAYMOND and ROUSSET 1995) available by ftp at ftp.cefe.cnrs-mop.fr. This work was supported by the Programme Environnement du Centre National de la Recherche Scientifique (GDR 11.05) and the Centre de Biosystématique de Montpellier. This is paper 97-019 of the Institut des Sciences de l'Évolution.

    LITERATURE CITED

    ABRAMOWITZ, M., and I. A. STEGUN (Editors), 1972 Handbook of Mathematical Functions. Dover, New York.

    CAMPBELL, D. R., and J. L. DOOLEY, 1992 The spatial scale of genetic differentiation in a hummingbird-pollinated plant: comparison with models of isolation by distance. Am. Nat. 139: 735-748.

    CRAWFORD, T. J., 1984 The estimation of neighborhood parameters for plant populations. Heredity 52: 273-283.

    CROW, J. F., and K. AOKI, 1984 Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proc. Natl. Acad. Sci. USA 81: 6073-6077.

    ENDLER, J. A., 1977 Geographical Variation, Speciation, and Clines. Princeton University Press, Princeton, NJ.

    HASTINGS, A., and S. HARRISON, 1994 Metapopulation dynamics and genetics. Ann. Rev. Ecol. Syst. 25: 167-188.

    HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxford Surv. Evol. Biol. 7: 1-44.

    JOHNSON, M. S., and R. BLACK, 1995 Neighborhood size and the importance of barriers to gene flow in an intertidal snail. Heredity 75: 142-154.

    KIMURA, M., and T. MARUYAMA, 1971 Pattern of neutral polymorphism in a geographically structured population. Genet. Res. 18: 125-131.

    KIMURA, M., and G. H. WEISS, 1964 The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics 49: 561-576.

    LANDE, R., 1992 Neutral model of quantitative genetic variance in an island model with local extinction and recolonization. Evolution 46: 381-389.

    LONG, J. C., J. M. NAIDU, H. W. MOHRENWEISER, H. GERSHOWITZ, P. L. JOHNSON et al., 1986 Genetic characterization of Gainj- and Kalam-speaking peoples of Papua New Guinea. Am. J. Phys. Anthropol. 70: 75-96.

    LONG, J. C., P. E. SMOUSE and J. W. WOOD, 1987 The allelic correlation structure of Gainj- and Kalam-speaking people. II. The genetic distance between population subdivisions. Genetics 117: 273-283.

    MALÉCOT, G., 1950 Quelques schémas probabilistes sur la variabilité des populations naturelles. Annales de l'Université de Lyon A 13: 37-60.

    MALÉCOT, G., 1951 Un traitement stochastique des problèmes linéaires (mutation, linkage, migration) en génétique des populations. Annales de l'Université de Lyon A 14: 79-117.

    MARUYAMA, T., 1970 Effective number of alleles in a subdivided population. Theor. Pop. Biol. 1: 273-306.

    NAGYLAKI, T., 1976 The decay of genetic variability in geographically structured populations. II. Theor. Pop. Biol. 10: 70-82.

    NAGYLAKI, T., 1983 The robustness of neutral models of geographical variation. Theor. Pop. Biol. 24: 268-294.

    RAYMOND, M., and F. ROUSSET, 1995 GENEPOP Version 1.2: population genetics software for exact tests and ecumenicism. J. Hered. 86: 248-249.

    ROUSSET, F., 1996 Equilibrium values of measures of population subdivision for stepwise mutation processes. Genetics 142: 1357-1362.

    SAWYER, S., 1976 Results for the stepping stone model for migration in population genetics. Ann. Prob. 4: 699-728.

    SAWYER, S., 1977 Asymptotic properties of the equilibrium probability of identity in a geographically structured population. Adv. Appl. Prob. 9: 268-282.

    SAWYER, S., and J. FELSENSTEIN, 1981 A continuous migration model with stable demography. J. Math. Biol. 11: 193-205.

    SCHILTHUIZEN, M., and M. LOMBAERTS, 1994 Population structure and levels of gene flow in the Mediterranean land snail Albinaria corrugata (Pulmonata: Clausiliidae). Evolution 48: 577-586.

    SLATKIN, M., 1991 Inbreeding coefficients and coalescence times. Genet. Res. 58: 167-175.

    SLATKIN, M., 1993 Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47: 264-279.

    SLATKIN, M., 1994 Gene flow and population structure, pp. 3-17 in Ecological Genetics, edited by L. A. REAL. Princeton University Press, Princeton, NJ.

    SLATKIN, M., 1995 A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462.

    SLATKIN, M., and N. H. BARTON, 1989 A comparison of three indirect methods for estimating average levels of gene flow. Evolution 43: 1349-1368.

    SPITZER, F., 1976 Principles of Random Walk, Ed. 2. Springer-Verlag, Berlin.

    STROBECK, C., 1987 Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117: 149-153.

    TACHIDA, H., and H. YOSHIMARU, 1996 Genetic diversity in partially selfing populations with the stepping-stone structure. Heredity 77: 469-475.

    WARD, R. D., M. WOODWARK and D. O. F. SKIBINSKI, 1994 A comparison of genetic diversity levels in marine, freshwater, and anadromous fishes. J. Fish Biol. 44: 213-232.

    WEIR, B. S., and C. C. COCKERHAM, 1984 Estimating F-statistics for the analysis of population structure. Evolution 38: 1358-1370.

    WOLFRAM, S., 1991 Mathematica, Ed. 2. Addison-Wesley, Redwood City, CA.

    WOOD, J. W., 1987 The genetic demography of the Gainj of Papua New Guinea. 2. Determinants of effective population size. Am. Nat. 129: 165-187.

    WOOD, J. W., P. E. SMOUSE and J. C. LONG, 1985 Sex-specific dispersal patterns in two human populations of highland New Guinea. Am. Nat. 125: 747-768.

    WRIGHT, S., 1946 Isolation by distance under diverse systems of mating. Genetics 31: 39-59.

    WRIGHT, S., 1969 Evolution and the Genetics of Populations. II. The Theory of Gene Frequencies. University of Chicago Press, Chicago.

    Communicating editor: W. J. EWENS

    APPENDIX

    Identity by descent: Consider migration in one dimension. Let $\gamma \equiv (1 - u)^2$ and let $m_l$ be the probability that the parent of some gene in population $j$ was in population $j + l$; then at stochastic equilibrium the probabilities $Q_j$ of identity by descent of pairs of genes in subpopulations $i$ and $i + j$ obey the relationship (MALÉCOT 1951)

    An explicit solution is expressed as follows. For $n$ subpopulations on a circle, let $\bar{n}$ be the integer part of $n/2$, let $\psi(z) \equiv \sum_l m_l z^l$ be the generating function of the $m$s, and define

    Then

    where $\Lambda_0 = \Lambda_{n/2} = 1$ and $\Lambda_k = 2$ otherwise [see MARUYAMA (1970), with notations changed and a slightly different value of $\bar{n}$]. The limit when $n$ goes to infinity is (MALÉCOT 1950; NAGYLAKI 1976; SAWYER 1977)

    In the same way, for a two-dimensional torus of $n_x \times n_y$ subpopulations,

    $\times \cos\left(\frac{2\pi j l}{n_x}\right)\cos\left(\frac{2\pi k m}{n_y}\right)$ (A5)

    where the $\bar{n}$s, $E$s and $\psi$s are defined as above, one for each dimension. When both $n_x$ and $n_y$ go to infinity, the above result converges to

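    $$Q_{jk} = (1 - Q_{00})\,\frac{1}{2N\pi^2}\int_0^{\pi}\!\!\int_0^{\pi}\frac{\gamma\,\psi_x^2(e^{ix})\,\psi_y^2(e^{iy})}{1 - \gamma\,\psi_x^2(e^{ix})\,\psi_y^2(e^{iy})}\,\cos(jx)\cos(ky)\,dx\,dy. \qquad \text{(A6)}$$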

    Evaluation of the integrals in (A4) and (A6) is detailed by SAWYER (1977) and yields Equations 1-4. These formulas are asymptotic results for low mutation rates, i.e., for $\gamma$ in the neighborhood of 1. Taken as functions of $\gamma$, these are approximations for the generating functions of coalescence times $Q(\gamma)$ in the neighborhood of 1.
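    As a numerical check of the kind of solution discussed above, the sketch below computes equilibrium identities on a circle of subpopulations by a discrete Fourier transform and verifies that one generation of migration, drift and mutation leaves them unchanged. It assumes a recursion of the standard form $Q_j' = \gamma \sum_{l,l'} m_l m_{l'} [Q_{j+l'-l} + \delta_{j+l'-l,0}(1 - Q_0)/(2N)]$ and stepping stone migration with $m_{\pm 1} = d/4$; the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def psi_on_circle(n, d):
    """psi(e^{ix}) at x = 2*pi*k/n for stepping stone migration (m_{+-1} = d/4)."""
    x = 2.0 * np.pi * np.arange(n) / n
    return 1.0 - (d / 2.0) * (1.0 - np.cos(x))

def equilibrium_identity(n, N, d, u):
    """Equilibrium Q_j, j = 0..n-1, on a circle of n subpopulations of N diploids."""
    gamma = (1.0 - u) ** 2
    g = gamma * psi_on_circle(n, d) ** 2
    # S_j = (1 / (2 N n)) * sum_k g_k / (1 - g_k) * cos(2 pi j k / n)
    S = np.real(np.fft.ifft(g / (1.0 - g))) / (2.0 * N)
    Q0 = S[0] / (1.0 + S[0])          # solves Q_0 = (1 - Q_0) * S_0
    return (1.0 - Q0) * S

def one_generation(Q, N, d, u):
    """Apply the assumed recursion once to a vector of identities on the circle."""
    gamma = (1.0 - u) ** 2
    m = {-1: d / 4.0, 0: 1.0 - d / 2.0, 1: d / 4.0}
    R = Q.copy()
    R[0] += (1.0 - Q[0]) / (2.0 * N)  # parents in the same subpopulation may coalesce
    new_Q = np.zeros_like(Q)
    for l, p_l in m.items():
        for l2, p_l2 in m.items():
            # np.roll(R, -s)[j] equals R[(j + s) mod n]
            new_Q += p_l * p_l2 * np.roll(R, -(l2 - l))
    return gamma * new_Q

Q = equilibrium_identity(n=50, N=10, d=0.2, u=1e-4)
print(np.allclose(one_generation(Q, N=10, d=0.2, u=1e-4), Q))
```

    A nonzero mutation rate is needed here, since the term for $k = 0$ has $1 - \gamma$ in its denominator.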

    Values of $T_j$ and $C_{ST}$: It is useful to consider the quantity

    At the low mutation limit, it can be interpreted as aratio of average coalescence times,

    This ratio has a finite limit when the number of subpopulations increases, though the average coalescence times themselves become infinite. In finite populations $T_0 = 2Nn_p$, where $n_p$ is the number of subpopulations (STROBECK 1987), hence $T_j$ is readily obtained when $C_{ST}$ is known.

    From Equation A3,

    $\times \left(1 - \cos\frac{2\pi j k}{n}\right)$ (A9)

    and when $n$ goes to infinity,

    $$\frac{Q_0 - Q_j}{1 - Q_0} = \frac{1}{2N\pi}\int_0^{\pi}\frac{\gamma\,\psi^2(e^{ix})}{1 - \gamma\,\psi^2(e^{ix})}\,(1 - \cos jx)\,dx. \qquad \text{(A10)}$$

    The integral is no more than a difference between two integrals discussed by SAWYER (1977). In this way one obtains Equation 5 as the difference between Equations 1 and 2. Likewise, Equation 6 is the difference between Equations 3 and 4.
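    The integral in (A10) can also be evaluated numerically. The sketch below is a minimal illustration with hypothetical function names and parameter values: it applies the midpoint rule to the integrand $\gamma\psi^2/(1 - \gamma\psi^2)\,(1 - \cos jx)$ for stepping stone migration ($m_{\pm 1} = d/4$, so that $\sigma^2 = d/2$) at the low mutation limit, and compares the increase per step of distance with $1/(4N\sigma^2)$, the value that the integral implies for large $j$ under these assumptions.

```python
import numpy as np

def ratio_a10(j, N, d, u=0.0, n_grid=20_000):
    """(Q_0 - Q_j) / (1 - Q_0) from the integral over (0, pi), midpoint rule.

    Integrand: gamma*psi^2 / (1 - gamma*psi^2) * (1 - cos(j x)), divided by 2*N*pi,
    with psi(e^{ix}) = 1 - (d/2) * (1 - cos x) (stepping stone, an assumption here).
    """
    gamma = (1.0 - u) ** 2
    h = np.pi / n_grid
    x = (np.arange(n_grid) + 0.5) * h          # midpoints avoid x = 0 when u = 0
    g = gamma * (1.0 - (d / 2.0) * (1.0 - np.cos(x))) ** 2
    integrand = g / (1.0 - g) * (1.0 - np.cos(j * x))
    return integrand.sum() * h / (2.0 * N * np.pi)

N, d = 10, 0.2
sigma2 = d / 2.0
slope = (ratio_a10(20, N, d) - ratio_a10(10, N, d)) / 10.0
print(slope, 1.0 / (4.0 * N * sigma2))
```

    The two printed values should agree closely, which illustrates why the regression on distance is informative about $N\sigma^2$ in one dimension.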

    Discrete migration distributions, $A_2$ values, and Wright's neighborhood size: The examples make use of the following distributions of parent-offspring distance. In some cases the probability of migration by $l$ steps in one dimension is

    $$m_l(d, n) = d\binom{n}{l + n/2}2^{-n} + (1 - d)\,\delta_{l0} \quad (n \text{ even}),$$

    with $\psi(e^{ix}) = 1 - d\,[1 - \cos^n(x/2)]$ and $\sigma^2 = dn/4$. When $n = 2$ it corresponds to the stepping stone model with migration rate $d/2$ in each dimension. (The migration rate is $m \approx d$ in two dimensions if $d$ is small.) When $d = 1$ and $n > 2$, this is a shifted binomial that may be considered a discrete equivalent of Gaussian migration. In other cases $m_0 = 1 - d/2$ and

    $$m_l = (1 - q)\,q^{|l| - 1}\,d/4 \quad \text{for } l \neq 0,$$

    $$\psi(e^{ix}) = 1 - \left[1 - \frac{(1 - q)(\cos x - q)}{1 - 2q\cos x + q^2}\right]\frac{d}{2}$$

    and

    $$\sigma^2 = \frac{d}{2}\,\frac{1 + q}{(1 - q)^2}.$$

    Thus, the fraction of migrants is determined by $d$, and among migrants distance follows a geometric distribution described by $q$.
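    The closed forms for $\psi$ and $\sigma^2$ can be checked against direct sums over the migration probabilities. The following sketch uses hypothetical function names and arbitrary parameter values, and truncates the geometric distribution at a large distance `l_max`, which is an approximation introduced here for the computation.

```python
import math
import numpy as np

def binomial_model(d, n):
    """Steps l and probabilities m_l for the shifted binomial model (n even)."""
    half = n // 2
    steps = np.arange(-half, half + 1)
    m = np.array([d * math.comb(n, int(l) + half) * 2.0 ** (-n) for l in steps])
    m[half] += 1.0 - d                      # non-migrants stay at l = 0
    return steps, m

def geometric_model(d, q, l_max=200):
    """Steps l and probabilities m_l for the geometric model, truncated at |l| = l_max."""
    steps = np.arange(-l_max, l_max + 1)
    m = (1.0 - q) * q ** (np.abs(steps) - 1.0) * d / 4.0
    m[l_max] = 1.0 - d / 2.0                # m_0 replaces the placeholder at l = 0
    return steps, m

def psi_numeric(steps, m, x):
    """psi(e^{ix}) = sum_l m_l cos(l x) for a symmetric distribution."""
    return float(np.sum(m * np.cos(steps * x)))

def sigma2_numeric(steps, m):
    """Second moment of parent-offspring distance, sum_l m_l l^2."""
    return float(np.sum(m * steps ** 2))

d, n, q, x = 0.3, 4, 0.5, 0.7
s_b, m_b = binomial_model(d, n)
s_g, m_g = geometric_model(d, q)
psi_b = 1.0 - d * (1.0 - math.cos(x / 2.0) ** n)
psi_g = 1.0 - (d / 2.0) * (1.0 - (1.0 - q) * (math.cos(x) - q)
                           / (1.0 - 2.0 * q * math.cos(x) + q ** 2))
print(psi_numeric(s_b, m_b, x), psi_b, sigma2_numeric(s_b, m_b), d * n / 4.0)
print(psi_numeric(s_g, m_g, x), psi_g, sigma2_numeric(s_g, m_g),
      (d / 2.0) * (1.0 + q) / (1.0 - q) ** 2)
```

    In each printed line the first two values and the last two values should agree, up to rounding and the truncation of the geometric tail.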

    $A_2$ is defined by SAWYER (1977), Equation (3.4).

    Since $C_{ST}/(1 - C_{ST})$ is almost identical to the potential kernel of the random walk defined by the $m_l$s (SPITZER 1976), some results of SPITZER (1976), p. 124, can be used to obtain a somewhat more explicit formula in the case of isotropic migration:

    where $A = 0.9159\ldots$ is Catalan's constant. $A_1$ and $A_2$ were computed using Mathematica (WOLFRAM 1991).

    In the present notations, WRIGHT's neighborhood size (WRIGHT 1969, Equations 12.40-12.41) can be written

    for the lattice models. It has no simple relationship to $N$ and to $A_2$ or the integral in the definition of $A_2$,