

Copyright © 2008 by the Genetics Society of America
DOI: 10.1534/genetics.108.086835

Note

Detecting Local Adaptation Using the Joint Sampling of Polymorphism Data in the Parental and Derived Populations

Hideki Innan*,1 and Yuseob Kim†

*Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan and †The School of Life Sciences and the Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, Arizona 85287

Manuscript received January 7, 2008
Accepted for publication April 23, 2008

ABSTRACT

When a local colonization in a new niche occurs, the new derived population should be subject to different selective pressures from that in the original parental population; consequently it is likely that many loci will be subject to directional selection. In such a quick adaptation event through environmental changes, it is reasonable to consider that selection utilizes genetic variations accumulated in the precolonization phase. This mode of selection from standing variation would play an important role in the evolution of new species. Here, we developed a coalescent-based simulation algorithm to generate patterns of DNA polymorphism in both parental and derived populations. Our simulations demonstrate that selection causes a drastic change in the pattern of polymorphism in the derived population, but not in the parental population. Therefore, for detecting the signature of local adaptation in polymorphism data, it is important to evaluate the data from both parental and derived populations simultaneously.

Local adaptation is one of the driving forces of speciation (Coyne and Orr 2004; Beaumont 2005). Consider a biallelic locus with alleles A and B in a single population. It is assumed that there is little fitness difference between A and B. Suppose that a small subset of the population migrates to a new niche with a different environment, where B is advantageous over A. Then, it is likely that this locally adapted allele B becomes fixed in the derived local population, providing an opportunity for the evolution of a new species.

A good example is the evolution of body armor in sticklebacks, which are distributed from oceans to freshwater lakes and streams. It is believed that marine sticklebacks have spread into and colonized freshwater lakes and streams multiple times. One of the phenotypes specific to the freshwater sticklebacks is the reduction in the body armor. The marine sticklebacks typically have extensive armor plates from head to tail, while the freshwater sticklebacks retain only part of them. Because the freshwater stickleback populations are usually isolated from one another, parallel evolution of the body armor reduction has been suggested. Recent molecular evidence (Colosimo et al. 2005) demonstrated that genetic variation in a single gene called Ectodysplasin should be responsible for most of this evolution. A low-plate allele at this locus is shared by all investigated freshwater populations (except for one Japanese population). It was found that in these freshwater populations, the low-plate allele is completely fixed, while the allele is found in the marine populations at very low frequencies. It is suggested that repeated fixation of the low-plate allele present in the marine populations occurred when sticklebacks colonized freshwater lakes and streams.

Domestication also offers typical situations of the fixation of standing variation in the cultivated species (e.g., Doebley et al. 2006). Our ancestors have domesticated a number of animals and plants in our history for several tens of thousands of years, and the process is still ongoing. Domestication is usually initiated with a small subset of the population of its wild progenitor. In this small founder population, some already-existing variants are subject to humans' extremely strong selective pressure to achieve desired phenotypic characteristics. Subsequently, these beneficial phenotypes can be fixed in the founder population in a very short time.

The advantage of this mode of selection is that it does not have to wait for a new mutation to adapt to new environments (including artificial environments). It may be argued that standing variation plays important roles in adaptive evolution especially when there are rapid environmental changes. The purpose of this note is to provide a guideline on how to detect such selection in DNA polymorphism data. Figure 1 illustrates the model considered. It is supposed that a small part of the large population colonized a new niche at time ts, while the major part of the large population still remains. The population before ts is referred to as the ancestral population, and the two descendant populations are called the parental and derived populations, as illustrated in Figure 1. Let N0 be the diploid effective population size of the ancestral population, in which it is assumed that two alleles, A and B, are segregating with frequencies 1 − p0 and p0, respectively, at time ts. Let N1 be the size of the parental population after ts. It is assumed that N0 and N1 are constant.

1Corresponding author: Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan. E-mail: innan–[email protected]

Genetics 179: 1713–1720 (July 2008)

At time ts, a new small population (derived population) is established in a new niche with a new environment, and let N2 be its effective population size. As this population should be a subsample of the ancestral population at ts, it is reasonable to assume that the initial frequency of B in this population is also p0. Suppose that this allele is advantageous conditional on being in this new niche, so that it could contribute to local adaptation to the new environment. Following the colonization, the model allows a population expansion event at time te, changing the population size to N2′. The population sizes before and after te are assumed to be constant.

The frequencies of B in the two populations change over time due to random genetic drift and selection. The model emphasizes selection in the new local population, where the relative fitnesses of an A/A homozygote, a B/A heterozygote, and a B/B homozygote are given by 1, 1 + 2sh, and 1 + 2s, respectively (we assume h = 0.5 throughout this study for simplicity). No selection is considered here in the ancestral and parental populations (s = 0), but it is easy to incorporate selection in those populations into the model. The current frequencies of B in the parental and derived populations are denoted by p1 and p2, respectively.
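As a worked step, the fitness scheme above implies a simple deterministic recursion for the frequency p of B in the derived population. The derivation below is the standard one-locus result under random mating; it is not stated in the text and ignores drift, so it should be read as a sketch of the expected change per generation only.

```latex
\begin{align*}
\bar{w} &= (1-p)^2 + 2p(1-p)(1+2sh) + p^2(1+2s)
         = 1 + 4sh\,p(1-p) + 2s\,p^2,\\
w_B &= (1-p)(1+2sh) + p(1+2s),\qquad p' = \frac{p\,w_B}{\bar{w}}.\\
\intertext{With $h = 0.5$:}
\bar{w} &= 1 + 2sp,\qquad w_B = 1 + s(1+p),\\
\Delta p &= p' - p = \frac{s\,p(1-p)}{1+2sp}.
\end{align*}
```

For small s this reduces to the familiar approximation Δp ≈ sp(1 − p), which is why the allele spreads slowly when p0 is very small or when B is close to fixation.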

A typical pattern of local adaptation from standing variation would be that the beneficial allele, B, is fixed in the local derived population, while the two alleles may still be segregating in the current parental population. The major goal of this study is to understand how such selection affects the pattern of nucleotide polymorphism in the derived population. We then seek an effective method to detect the signature of selection in polymorphism data. The results would contribute to the identification of genes that have played a significant role in recent local adaptation.

We have previously demonstrated that this mode of selection may not leave as strong a signature as the standard selective sweep from a single mutant, which usually causes a drastic reduction of polymorphism (Innan and Kim 2004). We also showed that the statistical power to detect selection from standing variation is fairly low when we use the polymorphism data in the local population alone. Here, we quantitatively show that we can improve the power substantially when we compare the patterns of polymorphism in the two populations. The rationale behind this idea is that selection can be detected most efficiently when the patterns of polymorphism right before and after the period of adaptive selection are both observed. This is very difficult in natural populations in most cases, but if the selection event is recent, we expect the current pattern of polymorphism to be similar to that right after the selection event. As an approximation to the pattern before selection, we may be able to use the current polymorphism pattern in the parental population because this population is free from selection. The change in the pattern of polymorphism would be slow at a neutral region, especially when the population size is large, and this might be the case for the parental population.

This idea has been extensively applied to domesticated species such as maize. A classic example is the work of Doebley and colleagues (Wang et al. 1999), who demonstrated that the level of polymorphism in maize is significantly reduced (in comparison to its wild progenitor) in the regulatory region of the tb1 gene, suggesting the operation of strong selection specific to maize. In addition to a number of follow-up studies for maize (e.g., Wright et al. 2005), this approach has also been used for Drosophila melanogaster and humans (i.e., African vs. non-African populations; Schlotterer 2002, 2003; Kayser et al. 2003; Sabeti et al. 2007; Tang et al. 2007). Interesting examples of the demonstrations of selection include the spread of malarial-resistant alleles in non-African human populations (Tishkoff et al. 2001).

Figure 1. Illustration of the model. The vertical axis represents the frequency of the beneficial allele B. See text for details.

To quantitatively evaluate the advantage of using polymorphism data of joint sampling from two populations, we developed a simulation algorithm to generate patterns of polymorphism under the scenario of local adaptation described above. A coalescent simulation for the model outlined in Figure 1 can be performed in a manner similar to that in Innan and Kim (2004). First, the trajectories of allele B in the three populations are determined, given its frequency p0 at t = ts. The trajectory in the ancestral population is determined by simulating the frequency change backward in time, starting with p0 at t = ts, as described in Innan and Kim (2004). The trajectories in the two descendant populations are independently determined by forward-in-time simulations. In the simulation for the derived population, if allele B is lost by time t = 0, the recursion starts again at t = ts until a trajectory with p2 > 0 is obtained.
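The forward-in-time step for the derived population can be sketched as follows. This is a minimal illustration, assuming a Wright-Fisher model with genic selection (h = 0.5) and binomial sampling of 2N gene copies each generation; the function names, the `max_tries` guard, and the use of generation counts for ts and te are choices made for this sketch, not details given in the text.

```python
import random

def next_freq(p, s, n_diploid, rng):
    """One Wright-Fisher generation: selection (h = 0.5), then drift."""
    # Expected frequency after selection with fitnesses 1, 1 + s, 1 + 2s.
    p_sel = p * (1 + s * (1 + p)) / (1 + 2 * s * p)
    # Drift: binomial sampling of 2N gene copies.
    copies = sum(rng.random() < p_sel for _ in range(2 * n_diploid))
    return copies / (2 * n_diploid)

def derived_trajectory(p0, s, n2, n2_expanded, t_s, t_e, rng, max_tries=1000):
    """Frequency of B from the split (t_s generations ago) to the present.

    The population has size n2 until the expansion t_e generations ago and
    size n2_expanded afterwards. Runs in which B is lost are discarded and
    restarted from p0, so the returned trajectory always ends with p2 > 0.
    """
    for _ in range(max_tries):
        p, traj = p0, [p0]
        for gen in range(t_s):
            size = n2 if gen < t_s - t_e else n2_expanded
            p = next_freq(p, s, size, rng)
            traj.append(p)
            if p == 0.0:  # B lost: abandon this run
                break
        if traj[-1] > 0.0:
            return traj
    raise RuntimeError("B was lost in every attempt")
```

The per-copy sampling loop keeps the sketch dependency-free but is slow for population sizes on the scale used in the paper; a vectorized binomial sampler would be the natural replacement there.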

Next, the genealogy for n1 sequences from the parental population and n2 sequences from the derived population is constructed by the coalescent-with-recombination algorithm conditional on two allelic classes (Hudson and Kaplan 1986; Kim and Stephan 2002). In this case, each population is subdivided by the A and B alleles, and the trajectory of B obtained above specifies their frequencies at a given time and population. From t = 0 to ts, the ancestral recombination graph (ARG) conditional on two allelic classes (Griffiths and Marjoram 1997; Kim and Stephan 2002) is built separately for both populations. Let R0 be the population recombination parameter (scaled by N0); that is, R0 = 4N0g, where g is the recombination rate per generation.

In the parental population, the simulation starts with m chromosomes carrying B alleles and n1 − m chromosomes carrying A alleles at t = 0, where m is determined by binomial sampling with probability p1. The simulation for the derived population from t = 0 to ts is identical to that for the postdomestication period in Innan and Kim (2004). At the end of constructing the genealogy for the parental population (t = ts), there are n1A (n1B) ancestral sequences, i.e., the edges in the ARG, carrying A (B). Similarly, n2A and n2B ancestral sequences carrying A and B, respectively, remain in the derived population at t = ts. Then, these ancestral sequences from the two populations are combined to start the ARG in the ancestral population. Namely, the ARG starts at t = ts with n1A + n2A edges linked to A and n1B + n2B edges linked to B. The construction of the ARG continues until t = max(10N0, 10N2′).

The joint genealogy for n1 + n2 sampled sequences is extracted from the ARG for all individual sites. Then, mutations are added on the genealogy following the Jukes–Cantor model of sequence evolution (Jukes and Cantor 1969). The number of mutations over a unit length of branch (4N0 generations) is Poisson distributed with population mutation parameter θ0, which is θ0 = 4N0μ, where μ is the mutation rate per generation.
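The mutation step on a single branch might look like the sketch below. It assumes that θ0 is a per-site rate, so that a branch of length t (in units of 4N0 generations) over a sequence of L sites receives a Poisson number of mutations with mean θ0·t·L, and that under Jukes–Cantor each mutation replaces the base by one of the other three with equal probability. The helper names and the Knuth-style Poisson sampler are choices made for this illustration.

```python
import math
import random

def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method (suits small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mutate_branch(seq, branch_len, theta0, rng):
    """Return a copy of seq after Jukes-Cantor mutations along one branch.

    branch_len is in units of 4*N0 generations; theta0 is per site.
    """
    seq = list(seq)
    n_mut = poisson(theta0 * branch_len * len(seq), rng)
    for _ in range(n_mut):
        site = rng.randrange(len(seq))
        # Jukes-Cantor: the three alternative bases are equally likely.
        seq[site] = rng.choice([b for b in "ACGT" if b != seq[site]])
    return "".join(seq)
```

Because sites are drawn with replacement, the same site can be hit more than once, which is what a finite-sites model such as Jukes–Cantor allows.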

A number of patterns of polymorphism were simulated following the procedure described above. N0 = 100,000 is fixed throughout this study. Figure 2 contrasts the patterns of polymorphism after sweeps from a single mutation (Figure 2A) and from standing variation (Figure 2B). We focus on the levels of polymorphism within and between populations measured by π1, π2, and πb (see Table 1) and Tajima's D in each population (TD1 and TD2) that summarize the allele-frequency spectrum of polymorphism (Tajima 1989). Here, we assume that the ancestral population is split into two populations with sizes N1 = 0.9N0 and N2 = 0.1N0 at time ts = 0.1N0 and then a 10-fold increase in population size occurs in the derived population (N2′ = N0) at time te = 0.05N0. θ0 = 0.01 per site is assumed, and the recombination rate per adjacent sites is assumed to be the same as the mutation rate (R0 = θ0). In each replication, a 50-kb region was simulated, and the spatial distributions of π1, π2, πb, and TD1 and TD2 were investigated by a window analysis with size 5 kb.
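To make the window analysis concrete, the sketch below computes the average number of pairwise differences (π) and Tajima's D (Tajima 1989) for a set of aligned sequences, and slices an alignment into windows. The input format (a list of equal-length strings), the function names, and the convention of returning 0.0 when there are no segregating sites are assumptions of this sketch; the step between windows is not given in the text and is left as a parameter.

```python
from itertools import combinations
from math import sqrt

def pairwise_diversity(seqs):
    """Average number of pairwise differences (pi) over the whole region."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def segregating_sites(seqs):
    """Number of alignment columns with more than one base."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))

def tajimas_d(seqs):
    """Tajima's D; returns 0.0 when there are no segregating sites."""
    n, s = len(seqs), segregating_sites(seqs)
    if s == 0:
        return 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pairwise_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

def windows(seqs, size, step):
    """Yield (start, sub-alignment) for each window of the given size."""
    for start in range(0, len(seqs[0]) - size + 1, step):
        yield start, [x[start:start + size] for x in seqs]
```

Dividing `pairwise_diversity` by the window length gives π per site, which is the scale on which the expectation of about 0.01 is quoted.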

We first carried out a number of simulations with a sweep from a single mutation that arose at the same time as the population split event ts = 0.1N0 generations ago. The selection intensity is assumed to be s = 0.005 (2N2s = 100), and the target site of selection is located at the center of the simulated region. The observed patterns from the simulations are quite similar, and we show the results from three runs with typical patterns in Figure 2A. For each run, the profiles of the levels of polymorphism along the sequence are shown in the left graph and the distributions of Tajima's D are in the right one. The reduction in π is observed in a wide region around the target site of selection (π2 is represented by solid stars) because the haplotype on which the beneficial mutation arose has spread very rapidly in the derived population. Although recombination prevents the complete fixation of that haplotype over the entire recombining chromosome, we are likely to observe a wide dip in the distribution of π for the parameters chosen. Theoretically, the width of the dip depends on the selection intensity and the recombination rate (Kaplan et al. 1989; Stephan et al. 1992). In the parental population, which is free from selection for the beneficial allele, the level of polymorphism (π1, represented by solid rectangles) is distributed around its expectation, which is ~0.01. The distribution of πb is very similar to that of π1 because the derived population is so young that both π1 and πb primarily reflect the pattern of polymorphism at ts. Similarly, the reduction of Tajima's D is observed in a wide region around the target site of selection only in the derived population (solid stars in Figure 2A), indicating a wide region of skewed allele-frequency distribution toward rare alleles. Similar patterns for π and TD are observed for almost all simulations as demonstrated in Figure 2C.

Figure 2B shows the simulation results with a sweep from standing variation. Three runs of simulations with typical patterns were arbitrarily chosen to demonstrate the point. All parameters are the same as those for Figure 2A, except that the frequency of B at ts is p0 = 0.1. The pattern of the polymorphism level in the derived population seems quite variable under this setting: the patterns include those with almost no visible reduction in π2 and TD2 (e.g., the middle graphs of Figure 2B) and those with quite a sharp dip (e.g., the top graphs). The signature of selection from standing variation is not as clear as that of a complete sweep from a single mutation (see also Figure 2D), in agreement with our previous study and others (Innan and Kim 2004; Hermisson and Pennings 2005; Przeworski et al. 2005). This is simply because there could be nucleotide variation within B accumulated in the preselection phase (i.e., before ts), and the amount of polymorphism is determined by the frequency of the allele and the mode and intensity of selection (Innan and Tajima 1997, 1999). Nevertheless, the direction of the signature is similar. In general, π2 is reduced around the target site of selection in the derived population, and, more importantly, the effect can be well contrasted when compared to those in the parental population.

On the basis of these observations, we consider several summary statistics to detect selection in polymorphism data (summarized in Table 1). First, the reduction in the level of nucleotide polymorphism would be a typical signature of selection. There are two major statistical approaches to detect such a signature. One is to test the reduction at the focal region relative to the orthologous region of a closely related species or population (Vigouroux et al. 2002; Wright et al. 2005), while the other is to test against other regions in the same species (Hudson et al. 1987; Schlotterer 2003). One of the purposes of this study is to investigate which approach is more powerful. As a representative of the former, we use the summary statistic Δπ = π2/π1. The reduction may also be measured in terms of haplotype-based heterozygosity, denoted by ΔH = H2/H1. A haplotype is defined as a set of completely identical sequences.
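The two ratio statistics can be computed directly from samples of the two populations, as in the sketch below. It assumes that haplotype heterozygosity is estimated as n/(n − 1)·(1 − Σ f²) over haplotype frequencies f, which the text does not spell out, and it returns `None` when the parental value is zero and the ratio is undefined. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pi(seqs):
    """Average number of pairwise nucleotide differences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

def hap_heterozygosity(seqs):
    """Haplotype heterozygosity; identical sequences form one haplotype."""
    n = len(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in Counter(seqs).values()))

def ratio(derived_value, parental_value):
    """Derived/parental ratio, or None when the denominator is zero."""
    return derived_value / parental_value if parental_value > 0 else None

def delta_pi(parental, derived):
    return ratio(pi(derived), pi(parental))

def delta_h(parental, derived):
    return ratio(hap_heterozygosity(derived), hap_heterozygosity(parental))
```

Values well below 1 indicate reduced diversity in the derived population relative to the parental one, which is the direction expected after a sweep.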

For the latter, we consider tests using polymorphism data in the derived population alone. One such test can be performed in the framework of the Hudson–Kreitman–Aguadé (HKA) test (Hudson et al. 1987). The original form of the HKA test examines if the levels of polymorphism in multiple regions are consistent with each other by using the divergence data from an outgroup to correct for the mutation rate variation across regions. A modified version of the HKA test (Innan 2006) should be suitable in the situation here, where we wish to evaluate the effect of selection in the focal region. In this new version called the genomic HKA (gHKA) test (Innan 2006), a summary statistic r is used, which is the ratio of the level of polymorphism in the focal region to the divergence (see Table 1). The gHKA test examines whether this r is statistically consistent with the average over multiple reference regions, which are supposed to be neutral. We evaluate the power of the gHKA test, which is applied to the derived population. We consider two cases: l = 1 and 10 reference regions are available. In the reference regions, the sizes and the mutation and recombination rates are assumed to be identical to those in the focal region. The statistic r should be reduced by adaptive selection in the derived population.

Figure 2. Typical patterns of polymorphism measured by π1, π2, πb, and TD1 and TD2. (A) Those when a sweep occurred from a single mutant (p0 = 1/2N2). (B) Those when a sweep occurred from standing variation (p0 = 0.1). The density distributions of π1, π2, TD1 and TD2 in the 5000-bp window around the target site of selection are also shown in C and D: (C) for p0 = 1/2N2 and (D) for p0 = 0.1.

Adaptive selection in the derived population would dramatically change the genetic composition in this population, thereby enhancing the difference between the two populations. Therefore, it is easy to imagine that measures of population differentiation such as Wright's FST (Wright 1931) would be increased, as pointed out by Lewontin and Krakauer (1973). Here, FST is measured in terms of nucleotide and haplotype diversity (i.e., π and H), denoted by FSTπ and FSTH, respectively. A drastic change in Tajima's D after a sweep suggests that ΔTD = TD2 − TD1 might also be a good summary statistic, in addition to Tajima's D in the derived population itself (i.e., TD2).
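The text does not give the estimator used for FST, so the sketch below should be read as one plausible choice rather than the authors' definition: FST = 1 − (mean within-population diversity)/(between-population diversity), applied once with a nucleotide distance (for FSTπ) and once with a haplotype-identity distance (for FSTH). The within and between quantities correspond to π1, π2, πb and H1, H2, Hb in Table 1; all function names are hypothetical.

```python
from itertools import combinations, product

def nuc_dist(a, b):
    """Number of nucleotide differences between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def hap_dist(a, b):
    """1 if the two sequences are different haplotypes, else 0."""
    return int(a != b)

def mean_dist(pairs, dist):
    """Mean distance over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def fst(pop1, pop2, dist):
    """Assumed estimator: 1 - within/between; None if between is zero."""
    within = (mean_dist(combinations(pop1, 2), dist)
              + mean_dist(combinations(pop2, 2), dist)) / 2
    between = mean_dist(product(pop1, pop2), dist)
    return None if between == 0 else 1 - within / between
```

In the example used for checking, every sampled sequence is a distinct haplotype, so the haplotype-based value is 0 even though the nucleotide-based value is large; this illustrates how the two versions can disagree when haplotype diversity is saturated.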

We have also developed a new summary statistic focusing on the most common haplotype in the derived population (MC2). When selection from standing variation is completed, it is likely that the frequencies of one or a few haplotypes carrying the beneficial allele should be increased by selection. Therefore, if the haplotype diversity is measured in terms of the heterozygosity of MC2, which is defined as $H^{MC2} = 2\,\frac{n}{n-1}\,f(1-f)$, where f is the sample frequency of MC2, it is expected that $H^{MC2}$ in the derived population ($H^{MC2}_{2}$) should be much reduced after a sweep. The new statistic is designed as the ratio of $H^{MC2}$; that is, $\Delta H^{MC2} = H^{MC2}_{2} / H^{MC2}_{total}$. The denominator, $H^{MC2}_{total}$, is $H^{MC2}$ in the pooled sample from the two populations. $H^{MC2}_{total}$ is used instead of $H^{MC2}_{1}$ because the summary statistic cannot be computed when the parental population does not have MC2, which frequently occurs when ts and/or θ are large. This drawback should reduce the power to detect selection as is shown below.
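The MC2-based statistic follows directly from the definitions above and can be sketched as below. The input format (lists of haplotype strings) and the function names are assumptions; ties for the most common haplotype are broken by whichever haplotype `Counter` encounters first, a detail the text leaves open.

```python
from collections import Counter

def h_mc2(seqs, mc2):
    """Heterozygosity of haplotype mc2 in a sample: 2 * n/(n-1) * f * (1-f)."""
    n = len(seqs)
    f = seqs.count(mc2) / n
    return 2 * (n / (n - 1)) * f * (1 - f)

def delta_h_mc2(parental, derived):
    """H^MC2 in the derived sample divided by H^MC2 in the pooled sample."""
    # MC2 is the most common haplotype in the derived population.
    mc2 = Counter(derived).most_common(1)[0][0]
    pooled = list(parental) + list(derived)
    denom = h_mc2(pooled, mc2)
    return None if denom == 0 else h_mc2(derived, mc2) / denom
```

The denominator vanishes only when every pooled sequence is MC2, in which case the sketch returns `None`.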

The power of these summary statistics to detect adaptive selection from standing variation was evaluated by a large number of coalescent simulations under various conditions. The power of summary statistic S is defined as the proportion of simulation runs (in percent) that reject neutrality at the 5% level. We first used the demographic model and the mutation and recombination parameters that are identical to those in Figure 2. The null distributions of the nine summary statistics were obtained with neutral simulations, from which the 95% confidence intervals were computed. For all test statistics, one-tailed tests were applied. Then, additional simulations with selection (10,000 replications for each parameter set) were carried out to evaluate the power of the nine test statistics. The selection parameters are set such that s = 0.005 and 0.001 (2N2s = 100 and 20). We investigated the cases with p0 = 0.01, 0.02, 0.05, 0.1, 0.2, and 0.5 in addition to a sweep from a single mutation (i.e., p0 = 1/2N2). The sample sizes from the two populations were set to be n1 = n2 = 50. We chose this pair because n1 = n2 = 50 is sufficiently large so that using more samples does not improve the power much (data not shown).
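The power calculation described above reduces to comparing each simulated statistic with a critical value taken from the neutral distribution. The sketch below assumes an empirical-quantile critical value and a strict inequality for rejection; which tail to use for each statistic is a reading of the text (for example, lower tail for the diversity ratios and Tajima's D, upper tail for FST), and the function names are hypothetical.

```python
def critical_value(null_values, alpha=0.05, tail="lower"):
    """Empirical one-tailed critical value from neutral simulations."""
    ordered = sorted(null_values)
    k = int(round(alpha * len(ordered)))
    # Lower tail: values strictly below ordered[k] are the k smallest.
    # Upper tail: values strictly above ordered[-k - 1] are the k largest.
    return ordered[k] if tail == "lower" else ordered[-k - 1]

def power(null_values, selected_values, alpha=0.05, tail="lower"):
    """Percentage of selection replicates that reject neutrality."""
    c = critical_value(null_values, alpha, tail)
    if tail == "lower":
        rejected = sum(v < c for v in selected_values)
    else:
        rejected = sum(v > c for v in selected_values)
    return 100.0 * rejected / len(selected_values)
```

Feeding the neutral replicates back in as the "selected" set should return a value close to 5, which is a quick sanity check on the choice of tail and threshold.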

The length of the chromosomal region affected by selection is determined by a function of the selection intensity and the mutation and recombination parameters; therefore, the power to detect selection may also depend on the size of the region to which a summary statistic is applied. We investigated the power of the nine summary statistics for L = 1000- and 5000-bp regions. The result is summarized in the middle graphs (top for L = 1000 bp and bottom for L = 5000 bp) in Figure 3A.

For the strong selection case (s = 0.005) with L = 5000 bp, the common pattern for all summary statistics is that the power decreases as p0 increases, in agreement with our previous study (Innan and Kim 2004). FSTπ is overall the most powerful summary statistic among the nine, and Δπ is the second best. These two best ones are based on π, and similar statistics based on H (i.e., ΔH and FSTH) are not as good as Δπ and FSTπ. The situation is similar for MCH. The performance of TD2 and ΔTD is not as good as that of Δπ and FSTπ. Our previous study (Innan and Kim 2004) showed that the HKA test generally has more power to detect selection from standing variation than Tajima's D, which is again demonstrated in this study. The power of the gHKA test with l = 1 and 10 reference regions is much higher than that of Tajima's D. The effect of the number of reference regions on the power to detect sweeps may be relatively minor as pointed out by Innan (2006).

TABLE 1

Summary statistics

Statistic   Description

Summary statistics for the level of polymorphism
  π1        π within the parental population (a)
  π2        π within the derived population (a)
  πb        π between the two populations (a)
  H1        H within the parental population (b)
  H2        H within the derived population (b)
  Hb        H between the two populations (b)

Test statistics
  TD1       Tajima's D in the parental population
  TD2       Tajima's D in the derived population
  ΔTD       TD2 − TD1
  Δπ        π2/π1
  ΔH        H2/H1
  r         Polymorphism/divergence ratio, π2/d (c)
  FSTπ      FST in terms of nucleotide sequence
  FSTH      FST in terms of haplotype
  MCH       $H^{MC2}_{2} / H^{MC2}_{total}$ (d)

(a) π, the average number of pairwise nucleotide differences.
(b) H, the heterogeneity in terms of haplotypes. H within a population corresponds to the haplotype heterozygosity, while that between populations is the probability that randomly chosen haplotypes from the two populations (one from each) are different.
(c) d, the nucleotide divergence from an outgroup species.
(d) $H^{MC2}$, the heterozygosity of MC2, the most common haplotype in the derived population.

Figure 3. Power of the nine summary statistics (see text). The fixed parameters are θ0 = R0 = 0.01 and te = ts/2. (A) Strong bottleneck followed by an expansion. N1 = 0.9N0, N2 = 0.1N0, and N2′ = N0 are assumed. (B) Strong bottleneck with no expansion. N1 = 0.9N0 and N2 = N2′ = 0.1N0 are assumed. (C) Half-and-half split of the ancestral population. N1 = N2 = N2′ = 0.2N0 is assumed.

One of our motivations in this study is that we anticipated more chance of detecting selection when the polymorphism patterns in the two populations are compared than when the pattern in the focal region is compared with those in different regions in the same population. The former type of test statistics includes Δπ and FSTπ, while the HKA test would represent the latter type. Here we clearly show that the performance of the gHKA test is not as good as that of FSTπ and Δπ, the former type of tests. These results support our idea.

When s and L are changed, FSTπ is not always the best. For example, when L = 1000 bp (middle bottom graph in Figure 3A), Δπ is the best for small p0 while FSTπ is better for large p0. The power of haplotype-based summary statistics is improved especially when p0 is small. The performances of ΔH and FSTH are almost as good as (and occasionally exceed) those of Δπ and FSTπ.

It should be noted that the power and p0 are not perfectly negatively correlated, which is counterintuitive. This is simply because the fixation of the beneficial allele was not completed in some replications, causing a reduction in the power. This frequently happens when p0 is small because ts is not sufficiently long for the fixation. It appeared that FSTπ is sensitive to this effect in comparison with the others. This effect is also large when L = 1000 bp in comparison with the case of L = 5000 bp.

To investigate the effect of ts on the power, Figure 3A includes the results for two other ts (0.05N0 and 0.2N0) while the other parameters are the same. If the effect of incomplete sweeps is taken into account, it is clearly shown that the power decreases with increasing ts because the accumulation of new mutations obscures the footprint of selection.

The overall pattern is similar under different demographic parameters. We have investigated the power of the nine summary statistics in another two demographic scenarios. The first one assumes the same parameters as those for Figure 3A except that the population expansion is not allowed in the derived population (i.e., N2 = N2′). In this no expansion model, the efficacy of selection is reduced because it is positively correlated with the population size. Consequently, the power is generally reduced as shown in Figure 3B. Similarly, in the third demographic model where the ancestral population splits into the two populations with the same sizes (i.e., N1 = N2 = N0/2), the power of all summary statistics is increased (because of the increase of the population selection parameter, 2N2s), while the overall pattern of the relative power remains similar (Figure 3C).

In summary, this note introduces an algorithm to simulate patterns of polymorphism in both parental and derived populations when the latter has experienced a sweep from standing variation. Extensive simulations were performed to investigate the power of the summary statistics listed in Table 1. Our power simulations can be regarded as an extension of Teshima et al. (2006), who investigated similar selection modes in a single population. The power of all summary statistics is high when ts is small, but decreases substantially as ts increases. When selection starts from standing variation, p0 is a crucial factor determining the likelihood of detecting selection. Overall, we found that the performances of Dp and FSTp are good, although not always the best. FST has been one of the most commonly used summary statistics for detecting signatures of local adaptation (e.g., Pogson et al. 1995; Akey et al. 2002; Storz and Nachman 2003; Storz and Dubach 2004; see also Beaumont 2005 for a review). Our results confirm the usefulness of this measure, in agreement with Vitalis et al. (2001) and Beaumont and Balding (2004).

We showed that selection is detected with much greater power when we use summary statistics that capture the difference in the patterns of polymorphism between recently diverged populations. A fair demonstration is the comparison of two tests with similar properties, Dp and gHKA: both focus on the reduction of the level of polymorphism, but the former uses the level of polymorphism in the orthologous region in the parental population as a control, whereas the latter uses those in different regions in the same population (species). In almost all parameter sets investigated, the power of the former is larger than that of the latter. These results indicate the importance of joint sampling from both the derived and the parental populations, because the latter can be used as a good control. Understanding which regions of the genome played a significant role in local adaptation would provide insights into the mechanism of speciation. Our finding could be incorporated in such approaches.

The authors thank K. Teshima, S. Takuno, and an anonymous reviewer for comments. H.I. is supported by grants from the Japan Society for the Promotion of Science (JSPS-19681020), the National Science Foundation (NSF) (CCF-0622037), and the Graduate University for Advanced Studies, and Y.K. is supported by grants from the NSF (DEB-0449581) and Arizona State University.

LITERATURE CITED

Akey, J. M., G. Zhang, K. Zhang, L. Jin and M. D. Shriver, 2002 Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12: 1805–1814.

Beaumont, M. A., 2005 Adaptation and speciation: What can Fst tell us? Trends Ecol. Evol. 20: 435–440.

Beaumont, M. A., and D. J. Balding, 2004 Identifying adaptive genetic divergence among populations from genome scans. Mol. Ecol. 13: 969–980.

Colosimo, P. F., K. E. Hosemann, S. Balabhadra, G. Villarreal, Jr., M. Dickson et al., 2005 Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307: 1928–1933.

Coyne, J. A., and H. A. Orr, 2004 Speciation. Sinauer Associates, Sunderland, MA.

Doebley, J. F., B. S. Gaut and B. D. Smith, 2006 The molecular genetics of crop domestication. Cell 127: 1309–1321.

Griffiths, R. C., and P. Marjoram, 1997 An ancestral recombination graph, pp. 257–270 in Progress in Population Genetics and Human Evolution, edited by P. Donnelly and S. Tavaré. Springer-Verlag, New York.

Hermisson, J., and P. Pennings, 2005 Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352.

Hudson, R. R., and N. L. Kaplan, 1986 On the divergence of alleles in nested subsamples from finite populations. Genetics 113: 1057–1076.

Hudson, R. R., M. Kreitman and M. Aguadé, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153–159.

Innan, H., 2006 Modified Hudson–Kreitman–Aguadé test and two-dimensional evaluation of neutrality tests. Genetics 173: 1725–1733.

Innan, H., and Y. Kim, 2004 Pattern of polymorphism after strong artificial selection in a domestication event. Proc. Natl. Acad. Sci. USA 101: 10667–10672.

Innan, H., and F. Tajima, 1997 The amounts of nucleotide variation within and between allelic classes and the reconstruction of the common ancestral sequence in a population. Genetics 147: 1431–1444.

Innan, H., and F. Tajima, 1999 The effect of selection on the amounts of nucleotide variation within and between allelic classes. Genet. Res. 73: 15–28.

Jukes, T. H., and D. R. Cantor, 1969 Evolution of protein molecules, pp. 21–132 in Mammalian Protein Metabolism, edited by H. N. Munro. Academic Press, New York.

Kaplan, N. L., R. R. Hudson and C. H. Langley, 1989 The “hitchhiking” effect revisited. Genetics 123: 887–899.

Kayser, M., S. Brauer and M. Stoneking, 2003 A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol. Biol. Evol. 20: 893–900.

Kim, Y., and W. Stephan, 2002 Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.

Lewontin, R. C., and J. Krakauer, 1973 Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74: 175–195.

Pogson, G. H., K. A. Mesa and R. G. Boutilier, 1995 Genetic population structure and gene flow in the Atlantic cod Gadus morhua: a comparison of allozyme and nuclear RFLP loci. Genetics 139: 375–385.

Przeworski, M., G. Coop and J. D. Wall, 2005 The signature of positive selection on standing genetic variation. Evolution 59: 2312–2323.

Sabeti, P. C., P. Varilly, B. Fry, J. Lohmueller, E. Hostetter et al., 2007 Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913–918.

Schlötterer, C., 2002 A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 160: 753–763.

Schlötterer, C., 2003 Hitchhiking mapping – functional genomics from the population genetics perspective. Trends Genet. 19: 32–38.

Stephan, W., T. H. E. Wiehe and M. W. Lenz, 1992 The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41: 237–254.

Storz, J. F., and J. M. Dubach, 2004 Natural selection drives altitudinal divergence at the albumin locus in deer mice, Peromyscus maniculatus. Evolution 58: 1342–1352.

Storz, J. F., and M. W. Nachman, 2003 Natural selection on protein polymorphism in the rodent genus Peromyscus: evidence from interlocus contrasts. Evolution 57: 2628–2635.

Tajima, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

Tang, K., K. R. Thornton and M. Stoneking, 2007 A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5: e171.

Teshima, K. M., G. Coop and M. Przeworski, 2006 How reliable are empirical genomic scans for selective sweeps? Genome Res. 16: 702–712.

Tishkoff, S. A., R. Varkonyi, N. Cahinhinan, S. Abbes, G. Argyropoulos et al., 2001 Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 293: 455–462.

Vigouroux, Y., M. McMullen, C. T. Hittinger, K. Houchins, L. Schulz et al., 2002 Identifying genes of agronomic importance in maize by screening microsatellites for evidence of selection during domestication. Proc. Natl. Acad. Sci. USA 99: 9650–9655.

Vitalis, R., K. Dawson and P. Boursot, 2001 Interpretation of variation across marker loci as evidence of selection. Genetics 158: 1811–1823.

Wang, R.-L., A. Stec, J. Hey, L. Lukens and J. Doebley, 1999 The limits of selection during maize domestication. Nature 398: 236–239.

Wright, S., 1931 Evolution in Mendelian populations. Genetics 16: 97–159.

Wright, S. I., I. Vroh Bi, S. G. Schroeder, M. Yamasaki, J. F. Doebley et al., 2005 The effects of artificial selection on the maize genome. Science 308: 1310–1314.

Communicating editor: G. Gibson
