20/06/2003

Signatures of a population bottleneck can be localised along a recombining chromosome

Céline Becquet

Bioinformatics and Modelling, INSA of Lyon
Institute for Cell, Animal and Population Biology, University of Edinburgh, Scotland, UK

Tutor ICAPB: Prof. Nick H. Barton
Co-Tutor ICAPB: Dr. Peter Andolfatto
Tutor INSA: Dr. Guillaume Beslon


Page 1: Signatures of a population bottleneck can be localised ...przeworski.uchicago.edu/cbecquet/MasterThesis.pdfSignatures of a population bottleneck can be localised along a recombining



Céline BECQUET ICAPB, Edinburgh, 09/02 – 06/03Signatures of a population bottleneck can be localised along a recombining chromosome

ABSTRACT

Most statistical tests proposed to detect selection are sensitive to demographic factors such as changes in population size. Unlike the localised effect of strong selection, demographic factors are expected to have a similar effect on the whole genome. While this is generally true, we show that signatures of a population bottleneck can be more localised. We characterise spatial patterns of variability across a recombining chromosome that has experienced a recent and strong population bottleneck event. Interestingly, a bottleneck in the presence of recombination results in increased heterogeneity in variability patterns along a chromosome, reminiscent of the effects of selection. Since changes in population size may be common events in the history of natural populations, our results have implications for the interpretation of genome-wide scans of variability in Drosophila and humans.


Content

ABSTRACT

Content

A Year-Internship in ICAPB, Edinburgh, Scotland, UK

Scientific report
1. INTRODUCTION
1.1. DISTINGUISHING DEMOGRAPHY AND SELECTION
1.2. STATISTICAL TESTS PROPOSED
1.3. DATA FOR DROSOPHILA AND HUMANS
1.4. INTEREST OF DETECTING SIGNATURE OF A BOTTLENECK
2. MATERIALS AND METHODS
2.1. BOTTLENECK SIMULATIONS
2.1.1. Drosophila populations
2.1.2. Human populations
2.2. STATISTICAL TESTS
2.2.1. Levels of variability
2.2.2. Frequency spectrum
a) TAJIMA's (1989-a) D
b) FAY and WU's (2000) H
2.2.3. Linkage disequilibrium
a) High frequency haplotypes (HUDSON et al. 1994, VIEIRA and CHARLESWORTH 2000)
b) Number of haplotypes (STROBECK 1987)
3. RESULTS
3.1. DROSOPHILA POPULATION PARAMETERS
3.2. HUMAN POPULATION PARAMETERS
3.3. LARGE SURVEY REGIONS IN DROSOPHILA
3.3.1. Pattern of variability
3.3.2. Increase of the linkage disequilibrium
3.4. LARGE SURVEY REGIONS IN HUMANS
3.4.1. Pattern of variability
3.4.2. Increase of the linkage disequilibrium
4. DISCUSSION
4.1. INTERPRETATION OF SIGNIFICANT TESTS OF FREQUENCY SPECTRUM
4.1.1. TAJIMA's D
4.1.2. FAY and WU's H test
4.2. INTERPRETATION OF SIGNIFICANT TESTS OF THE LEVEL OF POLYMORPHISM


4.3. HAPLOTYPE TESTS AND LINKAGE DISEQUILIBRIUM
5. CONCLUSION AND PROSPECTS

Epilogue
ACKNOWLEDGMENT
LITERATURE CITED

SUPPLEMENTS
TABLE 2. SUPPLEMENT A
TABLE 2. SUPPLEMENT B
FIGURE 1. PATTERN OF VARIABILITY ALONG LARGE REGIONS IN DROSOPHILA AND HUMANS FOR S
FIGURE 1. CONTINUED. PATTERN OF VARIABILITY FOR K
FIGURE 1. CONTINUED. PATTERN OF VARIABILITY FOR fMFH
FIGURE 1. CONTINUED. PATTERN OF VARIABILITY FOR D
FIGURE 1. CONTINUED. PATTERN OF VARIABILITY FOR H


A Year-Internship in ICAPB, Edinburgh, Scotland, UK

Nine months ago, I arrived in Edinburgh, Scotland, as ready as one can be to work for a year in an almost completely new field, in an unknown city and, to make everything even easier, surrounded by total strangers speaking English with a funny accent (a truly unknown language). The part of the university where I spent most of my working hours was located to the south of the city, just ten minutes' walk from where I was living. The campus is mostly dedicated to science, the buildings either brand new or made of old stone and separated by lovely courtyards (which could be appreciated when the weather allowed). The Institute of Cell, Animal and Population Biology (ICAPB) is the first building one encounters when arriving from the city. In the General Office, the two secretaries are eager to help you find your way (and later help you with all the administrative problems one can experience as an exchange student). I was looking for Prof. Nick Barton, who had accepted me into his lab for the year, and it was in the north wing, on the first floor, that I found the messy little room where six students were already hard at work. A door at the end of this room led to the office of my future tutor, where my first unforgettable visit took place: Prof. Barton welcomed me by talking very fast about his ideas for the project and drawing dozens of incomprehensible graphs. After what seemed like hours, I was desperate: I had not understood a single word except for the conclusion: get yourself an idea of what people do here. Nick introduced me to many people that first day, whose names and projects were forgotten in minutes. I was advised to read several books and papers that would provide me with a basic understanding of quantitative and population genetics, which I definitely needed to communicate with my new colleagues. I installed myself at the desk that had been freed for me and started to get acquainted with the other students.
I was quickly relieved to discover that the new Ph.D. student, with whom I would have to share an Internet connection, was as lost as I was during our tutor's first speech. I also attended the courses of the M.Sc. in Quantitative Genetics and Genome Analysis from October to December, thus meeting more students who were more or less new to the field and, therefore, more or less lost during the heavy lectures.

I easily became accustomed to the way of life on the ICAPB first floor. Every morning at 11am, coffee break. From Monday to Wednesday, a one-hour seminar on different areas of quantitative and population genetics; needless to say, I learned a lot during these sessions. On Thursdays at 1pm, the Genetics Journal Club, in which a student or professor presents a recent controversial or revolutionary paper on a subject of his/her choice, preferably one that is not the speaker's speciality. And every other Friday, Happy Hour, a drinking event organised by a different lab each time. For the first few weeks, I did what I had been told: I read the recommended literature and met the first-floor population (apart from the scorpions, Arabidopsis, Drosophila…). Meanwhile, Nick realised that my theoretical knowledge was still a bit limited for the projects he had thought about, and suggested that I ask my biologist colleagues for data analysis or simulation projects. So I shyly went into each office, asking people about their current projects and begging for anything that I could do to help. At this point, I would like to thank Prof. Brian Charlesworth for suggesting a project, which I finally turned down after I met the person who would become my supervisor and collaborator for the project I present in this report. Dr. Peter Andolfatto is interested in the application of gene-genealogy approaches to understanding the major determinants of genome variability patterns, and had a lot of ideas for natural and simulated data analysis projects involving programming, but no time to work on them. Most importantly, we spoke the same language: he managed to explain things to me at my level (reaching my level of understanding was quite an achievement at that time), told me about his projects and helped me understand the concepts I was still having problems with. We decided to work together on the subject of distinguishing between evolutionary models through their effects on sequence variability. At first the project was meant to look at whether adaptation limits the power of purifying selection in highly recombining regions of the Drosophila genome. However, my preliminary work showed that it was too ambitious a project, and that some of the required data were lacking. That is how I started working on the project about detecting local signatures of a population bottleneck along a recombining chromosome.

I started by reading papers related to the subject that discussed the statistics we wanted to use, and began programming a draft functional core that performed these tests. Meanwhile, we tried to define sets of parameters for modelling bottlenecks that were relevant to Drosophila and human populations. I then tried to add functions to perform the analysis of the resulting simulations. My coding abilities and rather messy thinking pathways managed, quite logically, to create a bug in my program. The Christmas holidays arrived just in time, giving me a (to my mind) deserved break. January saw me writing the report due at the end of the first semester. As I was "a bit" too ready to rush into the practical side of my work, seldom stopping to think about or review it, this report helped me to define precisely what I was doing, and it quickly became a list of everything I had learned during the first part of my internship, together with a project plan.

Now knowing what I needed to do, I reused the existing functions and rewrote a clean program with the helpful advice of Hedi Soula. The first part of the project investigated the rejection probabilities of our tests of interest for a single locus. Our analysis of the simulations with bottleneck models for Drosophila very quickly gave interesting and unexpected results. We then worked on the second part of the project and simulated a larger genomic segment with our bottleneck models, which gave even more interesting results. Of course, all of this was not without trial and error: correcting statistical functions, little bugs, mistakes in the simulation parameters… everything that makes an internship a learning experience (and, of course, drives you crazy). Anyway, we finally reached the point where we could confirm our results by running similar simulations with bottlenecks modelled for human populations.

And that is how, after eight months of intensive work, very few days off, no holidays, but fortunately lots of relaxing (usually in the pub; I was in Scotland, remember), we are now at the point of writing a striking paper, which we hope will give scientists working in the fields of quantitative, evolutionary and population genetics a new perspective on, and a way of working with, the statistics we studied.

The following report will deal with the serious matter: the scientific work that Peter and I did during these months of collaboration.


Scientific report

1. INTRODUCTION

1.1. DISTINGUISHING DEMOGRAPHY AND SELECTION

Natural populations harbour enormous levels of genetic variability, and the signatures of an organism's evolutionary history lie hidden within this variability. The patterns of nucleotide variation between populations and species can be used to elucidate the functions encoded by the genomic sequence. Because a functional gene might be subject to selection, detecting the genomic regions subject to selection enables a better understanding of the processes of genome evolution under various population genetic forces. Historically, the importance of mutation, natural selection and migration (i.e. gene flow) has been emphasised in population genetics.

However, in his neutral theory, KIMURA (1968, 1983) challenged the notion that natural selection is the most important force in molecular evolution. He argued that most variation at the molecular level is selectively neutral, that recombination determines the extent of association among polymorphic mutations (linkage disequilibrium, LD) and, in his theory of genetic drift, described the role of random events in determining a mutation's history in the population, from its origin to either loss or fixation.

However, while most DNA variation within species may be neutral, natural selection may still play an important role in shaping it. For instance, the fixation of a favourable mutation reduces the genetic variation in surrounding regions (a phenomenon called 'hitchhiking' or a 'selective sweep', MAYNARD SMITH and HAIGH 1974). Most of the closely linked neutral variants are lost, but those variants on the same chromosome as the favourable mutation increase in frequency along with it. So, at the level of DNA where there is linkage (i.e. closely linked markers that are unlikely to be separated by a crossing-over event and hence have a greater probability of being inherited together), directional natural selection on functional DNA sequence variation contributes to the genetic drift of closely linked sequences and thus increases LD around the selected locus. Because of this characteristic localised reduction of variability, it is possible to find functional genomic regions. This ability to detect natural selection is useful for the study of the history of populations and, for humans in particular, for medicine.

But the task is made especially difficult by the fact that many different evolutionary processes affect genetic variation in a similar way to selection. For instance, low levels of polymorphism within a region can be explained not only by hitchhiking due to positive selection on a linked beneficial allele, but also simply by a low local mutation rate. In addition, the variability of coding regions tends to be lowest, due to selection against deleterious alleles (i.e. negative or purifying selection). Because coding regions generate functional protein products that can be the targets of natural selection, neutral variants linked to deleterious mutations are removed through indirect selection, termed "background selection" (CHARLESWORTH et al. 1993).

Another important process affecting natural genetic variability is the demographic history of a population. Genetic drift is the stochastic process by which alleles are transmitted from one generation to the next. In a large population, drift has little effect in each generation, because its random fluctuations tend to average out; in a small population, its effect can be rapid and significant. Moreover, any rapid reduction of the population size tends to "select", by chance, a few genomes that become the founder genomes of the new population. Thus genetic variability in the population can be sharply reduced (the bottleneck effect).
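To make the dependence of drift on population size concrete, the sketch below uses the standard Wright–Fisher result that, in a diploid population of N individuals with no new mutation, expected heterozygosity is multiplied by (1 - 1/(2N)) each generation. It is an illustration only, not how the reductions in variability reported in this study were obtained: it ignores mutation and recombination, and the function names are hypothetical. Plugging in Nb = 50 and T = 10 from Table 1 shows that even a ten-generation bottleneck removes roughly 10% of the heterozygosity present when it starts.

```python
"""Expected loss of heterozygosity under Wright-Fisher drift (illustrative sketch).

Assumes a diploid population of N individuals (2N gene copies) with no
mutation, migration or selection, so that E[H_t] = H_0 * (1 - 1/(2N))**t.
"""


def heterozygosity_retained(n_individuals: float, generations: int) -> float:
    """Fraction of the initial heterozygosity expected after `generations` at size N."""
    return (1.0 - 1.0 / (2 * n_individuals)) ** generations


def retained_through_sizes(sizes) -> float:
    """Same quantity for a size that changes every generation.

    `sizes` lists the diploid size in each successive generation, so a
    bottleneck is simply a run of small values among large ones.
    """
    retained = 1.0
    for n in sizes:
        retained *= 1.0 - 1.0 / (2 * n)
    return retained


if __name__ == "__main__":
    # Ten generations at Nb = 50 (the Nb and T of model (I) in Table 1).
    print(f"{heterozygosity_retained(50, 10):.4f}")  # prints 0.9044
    # The same ten generations at 5,000,000 individuals lose almost nothing.
    print(f"{heterozygosity_retained(5_000_000, 10):.7f}")
```

Because the per-generation factors multiply, the loss over a variable-size history depends approximately on the harmonic mean of the sizes, which is why a single reduced size can stand in for a series of periodic bottlenecks.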

A severe bottleneck is expected to cause a similar average reduction of genetic variability across the whole genome, because all loci share the same demographic history; moreover, in the absence of recombination, genomic segments are inherited en bloc from generation to generation and thus share a single genealogical history. However, in the presence of recombination, bottlenecks may produce patterns of variability along a chromosome that, by chance, mimic the localised effects of directional or negative selection. Recombination events juxtapose neighbouring chromosomal segments that have different histories, which breaks the correlation between their genealogies. As each region then follows its own genealogical and mutational history, the homogeneity is lost, which can lead to localised reductions of variability like those expected under selection.

1.2. STATISTICAL TESTS PROPOSED

To identify genomic regions in which selection might be operating, numerous tests of neutrality have been developed. They use patterns of genetic variability to detect departures from the standard neutral model (SNM), which assumes that mutations are neutral and that genes are sampled from a randomly mating (panmictic) population of constant size. Several tests have been designed to extract the information contained in a single locus.

Such tests can look at the level of polymorphism (KREITMAN and HUDSON 1991) or focus on the structure of haplotypes (i.e. sets of closely linked genetic markers present on one chromosome, which tend to be inherited together because they are not easily separated by recombination; see STROBECK 1987, HUDSON et al. 1994, VIEIRA and CHARLESWORTH 2000 and DEPAULIS and VEUILLE 1998). These tests can also consider the frequency spectrum of mutations (see TAJIMA 1989-a, FU and LI 1993 and FAY and WU 2000).

In addition, several multi-locus tests have been proposed to detect genomic regions subject to selection. Examples include the HKA test (HUDSON, KREITMAN and AGUADE 1987), which compares the divergence between species with the within-species variability at several independent loci. KIM and STEPHAN (2002) explicitly model genetic linkage between surveyed and selected regions and develop a maximum-likelihood method based on independent loci to examine the significance of a local reduction of genetic variability and to estimate the strength of directional selection. SCHLÖTTERER's (2002) lnRV focuses on differences in levels of variability between two populations.

Because demography affects natural variation, it affects these tests just as selection does. However, because the signature of demography is genome-wide while selection has localised effects, considering data from multiple loci should, in theory, enable these tests to distinguish between the two processes.

1.3. DATA FOR DROSOPHILA AND HUMANS

The data from Drosophila and humans suggest that both originated in Africa and later differentiated into African and non-African populations. Specifically, non-African populations have less diversity and higher LD. These observations suggest an "out of Africa" bottleneck.

In humans in particular, a bottleneck (i.e. a severe reduction in population size) is thought to be associated with the emergence of modern humans (~200,000 years ago). In addition, some human populations have remained small (e.g. hunter-gatherers), while others generally recovered in size and have recently experienced exponential growth (i.e. after the development of agriculture, ~10,000 years ago; see EXCOFFIER and SCHNEIDER 1999), leading to interesting differences in patterns of variability among human populations.

1.4. INTEREST OF DETECTING SIGNATURE OF A BOTTLENECK

Numerous examples of departure from neutrality have been studied in both Drosophila species and humans, and several investigators have claimed that the observation of a significant reduction of variability indicates selection. But even the tests specifically designed to detect selective sweeps (e.g. FAY and WU 2000) can be influenced by population structure (PRZEWORSKI 2002), and may also be sensitive to other departures from demographic stability.

One solution to this problem has been to treat heterogeneity in patterns of variability as evidence for selection. This is relevant because selection has a localised effect: in the presence of recombination, strong directional selection is predicted to produce a characteristic "valley" of reduced variability around the selected site (see KIM and STEPHAN 2002). In contrast, demography tends to affect genetic variability uniformly throughout the genome. However, a severe population bottleneck is also predicted to reduce genetic variability and, particularly in the presence of recombination, may produce patterns along a chromosome that, by chance, mimic the effects of localised selection.

In our study, we measure the extent to which the heterogeneity of the pattern of variability is affected by a severe reduction of population size. We first explicitly model bottlenecks with "best guess" parameters for Drosophila melanogaster, D. simulans and humans. We then characterise the patterns of variability across recombining chromosomes that have experienced our population bottleneck models, and examine how the results might affect the interpretation of the statistical tests.


2. MATERIALS AND METHODS

2.1. BOTTLENECK SIMULATIONS

Since many bottleneck models are possible, we focus on two highly simplified types of bottleneck: a "simple step" population size change, and a step followed by an instant recovery. In the former, an ancestral population of N0 individuals instantaneously crashes to Nb individuals at time Tb in the past. The reduced population size can also represent the harmonic mean of the population size over a period of severe periodic bottlenecks starting Tb generations ago. In the latter, an ancestral population of N0 individuals instantaneously crashes to Nb individuals at time Tb in the past, remains at that size for T generations, then instantly recovers to N0. Parameters for these two classes of models are set such that they produce the same average reduction in variability in the derived population. In principle, these two models are equivalent in their effects on variability, especially when their starting times are recent.
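Both bottleneck shapes can be written as one piecewise-constant population-size history, read backwards in time. The sketch below is ours, not the simulation code used in this study, and it rests on an interpretive assumption: in the step–recovery model the crash happens Tb generations ago and the population then stays at Nb for T generations, so the bottleneck occupies the interval from Tb - T to Tb generations before the present. Under that reading a simple step is the special case T = Tb, which matches the simple-step rows of Table 1, where T equals Tb. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bottleneck:
    """Piecewise-constant size history; t is measured in generations before present.

    Assumption: the size drops from n0 to nb at t = tb and stays at nb for
    `duration` generations forwards in time, i.e. for tb - duration <= t < tb.
    """

    n0: float        # ancestral (and recovered) size
    nb: float        # size during the bottleneck
    tb: float        # generations ago at which the crash occurred
    duration: float  # T, generations spent at size nb

    def size_at(self, t: float) -> float:
        """Population size t generations before the present."""
        if self.tb - self.duration <= t < self.tb:
            return self.nb
        return self.n0

    @property
    def is_simple_step(self) -> bool:
        # No recovery: the reduced size persists up to the present.
        return self.duration >= self.tb


if __name__ == "__main__":
    model_1 = Bottleneck(n0=5_000_000, nb=50, tb=2000, duration=10)  # Drosophila (I)
    model_6 = Bottleneck(n0=12_000, nb=900, tb=600, duration=600)    # Humans (VI)
    for t in (0, 1989, 1990, 1999, 2000):
        print(t, model_1.size_at(t))
    print(model_6.is_simple_step, model_6.size_at(0), model_6.size_at(600))
```

Treating the simple step as T = Tb keeps a single code path for both classes of model, which is convenient when looping over all rows of Table 1.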

TABLE 1. Parameter values used in bottleneck simulations (Nb in individuals; Tb and T in generations)

Bottleneck        Nb      Tb        T
Drosophila (a)
  (I)             50      2000      10
  (II)            50      120000    10
  (III)           50      2000      50
  (IV)            50      120000    50
Humans
  (V) (a)         10      600       15
  (VI) (b)        900     600       600
  (VII) (a)       10      2500      10
  (VIII) (b)      2800    2500      2500

See text for explanation.
(a) Parameters for step–recovery bottleneck
(b) Parameters for simple step bottleneck

The simulations with the standard neutral model (SNM) were run using HUDSON's (2002) program, which generates independent replicate samples assuming a constant panmictic population size, an infinite-sites mutation model and a neutral coalescent approximation to the WRIGHT–FISHER model. Simulations are based on the parameter θ = 4NµL, where N is the diploid effective population size, µ is the sex-averaged mutation rate per base pair per generation, and L is the sequence length in base pairs. Alternatively, we specify the number of segregating sites S, in which case each independent replicate sample has exactly S segregating sites. A simulation under a finite-sites recombination model also requires the population recombination rate along the sequence, ρ = 4NrL, where r is the sex-averaged recombination rate per base pair per generation. To simulate bottlenecks, the additional parameters are the number of intervals of population size change and, for each interval, a triplet of parameters giving the reduced population size (Nb), the starting time (Tb) and the length of the bottleneck (T).
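HUDSON's (2002) program is ms. To illustrate how the parameters above map onto its command line, the sketch below assembles an ms argument list for one bottleneck interval. It is a hedged reconstruction, not the command lines actually used in this study. It assumes ms's documented conventions (θ and ρ scaled by a reference size N0, -eN times in units of 4N0 generations, sizes given as fractions of N0), takes the ancestral size as that reference, reuses the assumption that the bottleneck spans Tb - T to Tb generations ago, and passes L as the ms nsites argument. The helper name ms_args is hypothetical.

```python
def ms_args(n_samples, n_reps, theta, rho, n_sites, n0, nb, tb, duration):
    """Assemble a hypothetical ms argument list for one bottleneck interval.

    Assumed ms conventions: theta and rho are per-locus values scaled by the
    reference size n0; each "-eN t x" sets the size to x * n0 at time t, with
    t in units of 4 * n0 generations.
    Assumption: the bottleneck spans tb - duration .. tb generations ago.
    """
    start = max(tb - duration, 0)   # 0 gives a simple step (no recovery)
    scale = 4.0 * n0                # generations per ms time unit
    args = ["ms", str(n_samples), str(n_reps),
            "-t", f"{theta:g}",
            "-r", f"{rho:g}", str(n_sites)]
    args += ["-eN", f"{start / scale:g}", f"{nb / n0:g}"]  # crash to nb
    args += ["-eN", f"{tb / scale:g}", "1"]                # back to n0 further in the past
    return args


if __name__ == "__main__":
    # Drosophila model (I), single locus: theta = 15, rho = 3 * theta, 500 bp, n = 15.
    print(" ".join(ms_args(15, 10000, 15, 45, 500, 5_000_000, 50, 2000, 10)))
    # Human model (VI), a simple step: the first size change starts at time 0.
    print(" ".join(ms_args(15, 10000, 3, 3, 2500, 12_000, 900, 600, 600)))
```

Expressing both models through the same pair of -eN events is a design choice of this sketch; the flags should be checked against the ms documentation before use.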


2.1.1. Drosophila populations

We applied the different bottleneck models listed in Table 1 to an ancestral population at neutral equilibrium with 5,000,000 individuals (as in WALL et al. 2002). The timings of these events are consistent with those proposed for the dispersal of D. melanogaster from Africa (models (II) and (IV), 10-15 kya ~ 120,000 generations, LACHAISE et al. 1988) and for the founding of North American populations (models (I) and (III), < 400 ya ~ 2,000 generations, LACHAISE et al. 1988), respectively. The extent to which variability is reduced in these bottlenecks is based on limited data suggesting that the X chromosome of D. melanogaster is 2-fold less diverse in non-African populations than in central African populations (inversion polymorphisms complicate the interpretation of patterns for autosomal genes; ANDOLFATTO 2001). In D. simulans, autosomal diversity may be about 25% lower in non-African than in African populations, and potentially even more reduced on the X chromosome (ANDOLFATTO 2001; BEGUN and WHITLEY 2000).

For the bottleneck scenarios described in Table 1, the average variability (measured as WATTERSON's (1975) θW, see Statistical tests below) in the derived (post-bottleneck) population is reduced by either 15% (models (I) and (II)) or 50% (models (III) and (IV)) compared to the ancestral population. We set the population mutation rate to θ = 4NµL = 15 for a length of 500 recombining base pairs for single-locus tests (µ = 1.5 × 10^-9, WALL et al. 2002). To model a larger chromosome segment, we set θ = 800 over 40,000 recombining base pairs, and the analysis of diversity was performed on 500 bp windows with a step size of 50 bp. We set the population recombination rate ρ = 4NrL = 3θ (ANDOLFATTO and PRZEWORSKI 2000), or ρ = 15θ when we consider regions with a high recombination rate (with µ = 1.5 × 10^-9), and consider a sample size of n = 15 chromosomes.
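The sliding-window analysis described above can be sketched as follows. Given the sorted positions of segregating sites along the simulated segment, it slides 500 bp windows in 50 bp steps across 40,000 bp and converts the number of segregating sites S in each window into WATTERSON's estimator θW = S / a_n, where a_n = 1 + 1/2 + ... + 1/(n - 1). The function names and the treatment of window edges (half-open windows, no partial window at the end of the segment) are our assumptions, not necessarily those of the original analysis program. The final line checks the arithmetic θ = 4NµL = 15 for the 500 bp single-locus case.

```python
from bisect import bisect_left


def harmonic_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, Watterson's denominator for n sequences."""
    return sum(1.0 / i for i in range(1, n))


def theta_w(segregating_sites: int, n: int) -> float:
    """Watterson's (1975) estimator for one window: theta_W = S / a_n."""
    return segregating_sites / harmonic_a(n)


def sliding_windows(length: int, width: int, step: int):
    """Half-open windows [start, start + width) lying entirely inside the segment."""
    return [(s, s + width) for s in range(0, length - width + 1, step)]


def theta_w_profile(positions, n, length, width, step):
    """(window start, theta_W) pairs; `positions` must be sorted."""
    profile = []
    for start, end in sliding_windows(length, width, step):
        s = bisect_left(positions, end) - bisect_left(positions, start)
        profile.append((start, theta_w(s, n)))
    return profile


if __name__ == "__main__":
    # Drosophila settings from the text: 40,000 bp, 500 bp windows, 50 bp steps.
    print(len(sliding_windows(40_000, 500, 50)))      # prints 791
    # Single-locus check: theta = 4 N mu L with N = 5e6, mu = 1.5e-9, L = 500.
    print(round(4 * 5_000_000 * 1.5e-9 * 500, 6))     # prints 15.0
```

Counting sites with two binary searches per window keeps the scan cheap even with many overlapping windows, since each window is evaluated without rescanning the whole segment.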

2.1.2. Human populations

Here, we assume an ancestral population at neutral equilibrium with 12,000 individuals (WALL 2003). The timing of the bottleneck reflects a population size contraction of non-African populations sometime between the emergence of modern humans about 200,000 years ago and the subsequent population expansion after the introduction of agriculture about 10,000 years ago. In Table 1, the timings of the bottlenecks for humans correspond to 12,000 ya (models (V) and (VI)) and 50,000 ya (models (VII) and (VIII)), respectively, assuming 20 years per generation. The extent to which variability has been reduced is based on data from ten non-coding autosomal regions sampled in one African and two non-African populations of humans (FRISSE et al. 2000). For the step–recovery models ((V) and (VII)) and the simple step models ((VI) and (VIII)) in Table 1, variability is reduced by 35% in the derived population. We set the population mutation rate to θ = 3 for a length of 2,500 recombining base pairs (µ = 2.5 × 10^-8, NACHMAN and CROWELL 2000-a). To model a larger chromosome segment, we set θ = 240 over 200,000 recombining base pairs, and the analysis was performed on 2,500 bp windows with a step size of 250 bp. We set ρ = θ (PRZEWORSKI, personal communication) and consider n = 15 chromosomes.
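The human parameter values can be checked with the same arithmetic. This short sketch, whose helper names are hypothetical, recomputes θ = 4NµL for N = 12,000 and µ = 2.5 × 10^-8 over the 2,500 bp locus and the 200,000 bp segment, sets ρ = θ as stated above, and converts the bottleneck start times of Table 1 from generations to years at 20 years per generation.

```python
def population_theta(n_diploid: float, mu_per_bp: float, length_bp: int) -> float:
    """theta = 4 N mu L for a diploid population of effective size N."""
    return 4 * n_diploid * mu_per_bp * length_bp


def generations_to_years(generations: int, years_per_generation: int = 20) -> int:
    """Convert a time in generations to years (20 years per generation in the text)."""
    return generations * years_per_generation


if __name__ == "__main__":
    N_HUMAN, MU_HUMAN = 12_000, 2.5e-8
    theta_locus = population_theta(N_HUMAN, MU_HUMAN, 2_500)      # single locus
    theta_segment = population_theta(N_HUMAN, MU_HUMAN, 200_000)  # large segment
    rho_segment = theta_segment                                   # rho = theta
    print(round(theta_locus, 6), round(theta_segment, 6), round(rho_segment, 6))
    # Tb of models (V)/(VI) and (VII)/(VIII), in years.
    print(generations_to_years(600), generations_to_years(2_500))
```

Rounding only the printed values avoids displaying floating-point residue such as 3.0000000000000004 while keeping the full-precision numbers for any downstream use.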


2.2. STATISTICAL TESTS

We employed several commonly used tests of neutrality that focus on different aspects of the data: the level of polymorphism, the frequency spectrum and linkage disequilibrium. These tests are used to detect selection, but they can also be sensitive to departures from demographic stability, and so could detect our modelled bottlenecks. The critical values for the level of diversity were computed from simulations based on θ, estimated from the average level of polymorphism in the simulations with a bottleneck. The critical values of all the other tests described below were computed from simulations under the SNM for all possible values of S. All statistical tests are performed assuming no intragenic recombination, with 15 chromosomes and 10,000 replicates. Note that this is conservative, since the bottleneck simulations include recombination, and recombination reduces LD (see Bottleneck simulations, part 2.1). The critical values are computed for a single correlated, non-recombining block of 500 or 2500 bp (for Drosophila and humans, respectively) in which only neutral mutation events create variability. In contrast, in the simulations of the equilibrium population and of the bottleneck models, recombination events may have broken up the block, generating new combinations of variants. Thus, when there is recombination, the tests of neutrality are more conservative: more haplotypic variation is generated, and fewer regions show significantly too low variability.
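The logic of an empirical one-tailed test, in which critical values come from simulations under the SNM and are then applied to replicates simulated under a bottleneck, can be sketched as follows. This is an illustrative sketch, not the code used here: the function name is hypothetical, and ties are handled conservatively (a replicate is rejected only if the null probability of a value at least as extreme is at most α). Because the null distributions in this study are conditioned on S, the function would be applied separately within each class of S.

```python
from bisect import bisect_left, bisect_right


def rejection_probability(test_values, null_values, alpha=0.05, tail="lower"):
    """Fraction of test replicates rejected by an empirical one-tailed test.

    A value x is rejected when the null probability of a value at least as
    extreme, P_null(X <= x) for the lower tail or P_null(X >= x) for the
    upper tail, is <= alpha. Integer counts are used instead of quantiles
    so that discrete statistics such as S or K are handled conservatively.
    """
    null_sorted = sorted(null_values)
    n_null = len(null_sorted)
    rejected = 0
    for x in test_values:
        if tail == "lower":
            as_extreme = bisect_right(null_sorted, x)          # null values <= x
        elif tail == "upper":
            as_extreme = n_null - bisect_left(null_sorted, x)  # null values >= x
        else:
            raise ValueError("tail must be 'lower' or 'upper'")
        if as_extreme / n_null <= alpha:
            rejected += 1
    return rejected / len(test_values)
```

For example, with a null sample 0, 1, ..., 99, the lower-tail test rejects values up to 4 (5 of the 100 null values are ≤ 4), and the upper-tail test rejects values of 95 and above.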

2.2.1. Levels of variability

We test the level of polymorphism using the test of KREITMAN and HUDSON (1991), Prob(S | θ, n), where S is the number of segregating sites, θ is the expected number of differences between a pair of sequences and n is the sample size. Note that this test requires that the ancestral variability θ be known, which is generally not the case when it is applied to natural data.

In the presence of a bottleneck, S is expected to be smaller, and hence less probable given θ and n, than under the SNM. We therefore use this as a one-tailed test and are interested in Prob(S < Scrit | θ, n), where Scrit is chosen so that Prob(S < Scrit | θ, n) ≤ 0.05 in the simulations under the SNM.
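Under the SNM without recombination, Prob(S | θ, n) also has a classical closed form under the infinite-sites model (see e.g. Tavaré 1984), P(S = k) = ((n − 1)/θ) Σ_{i=2..n} (−1)^i C(n − 2, i − 2) (θ/(θ + i − 1))^(k+1), so a lower-tail critical value can be obtained analytically instead of by simulation. The sketch below is offered as an alternative to the simulation-based critical values used in this study, not as the method used here; the function names are illustrative. `critical_S` returns the largest value with P(S ≤ Scrit) ≤ α, so rejection corresponds to S ≤ Scrit. The alternating sum is numerically adequate for small samples such as n = 15 but loses precision for large n.

```python
from math import comb


def prob_S(k, theta, n):
    """P(S = k | theta, n) under the infinite-sites SNM, no recombination."""
    total = 0.0
    for i in range(2, n + 1):
        total += (-1) ** i * comb(n - 2, i - 2) * (theta / (theta + i - 1)) ** (k + 1)
    return (n - 1) / theta * total


def lower_tail_S(s, theta, n):
    """P(S <= s | theta, n)."""
    return sum(prob_S(k, theta, n) for k in range(s + 1))


def critical_S(theta, n, alpha=0.05):
    """Largest S_crit with P(S <= S_crit | theta, n) <= alpha, or None if none exists."""
    cumulative, s_crit, k = 0.0, None, 0
    while True:
        cumulative += prob_S(k, theta, n)
        if cumulative > alpha:
            return s_crit
        s_crit = k
        k += 1


# Example: the Drosophila single-locus parameters theta = 15, n = 15.
s_crit = critical_S(15, 15)
print(s_crit, lower_tail_S(s_crit, 15, 15))
```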

2.2.2. Frequency spectrum

To test the frequency spectrum of a set of sequences, one needs to know the ancestral nucleotide at each segregating site. When these tests are applied to natural data, the sequence of an outgroup is taken as the ancestral sequence, which allows derived nucleotides to be distinguished from ancestral ones.
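Polarising segregating sites with an outgroup, as described above, yields the derived-variant counts on which the frequency-spectrum statistics are computed. This is a minimal sketch under stated assumptions: aligned sequences of equal length, an infinite-sites model with at most two alleles per site, and an outgroup that carries one of the two segregating alleles. Sites that violate these assumptions raise an error rather than being silently polarised. The function names are illustrative.

```python
def derived_counts(sample, outgroup):
    """Number of sampled sequences carrying the derived allele at each segregating site."""
    counts = []
    for j, ancestral in enumerate(outgroup):
        column = [seq[j] for seq in sample]
        alleles = set(column)
        if len(alleles) == 1:
            continue  # monomorphic in the sample: not a segregating site
        if len(alleles) > 2 or ancestral not in alleles:
            raise ValueError(f"site {j} cannot be polarised under infinite sites")
        counts.append(sum(base != ancestral for base in column))
    return counts


def unfolded_sfs(counts, n):
    """S_i = number of sites whose derived allele occurs i times (index 0 unused)."""
    sfs = [0] * n
    for c in counts:
        sfs[c] += 1
    return sfs


sample = ["AAC", "ATC", "TTG", "ATG"]
counts = derived_counts(sample, outgroup="ATC")
print(counts, unfolded_sfs(counts, len(sample)))  # [1, 1, 2] [0, 2, 1, 0]
```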

a) TAJIMA’s (1989-a) D

We employ TAJIMA’s (1989-a) D, which measures the normalised difference between θW (WATTERSON 1975) and π (TAJIMA 1983):

$$\theta_W = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}} \tag{1}$$

$$\pi = \frac{n}{n-1} \sum_{i=1}^{S} 2\,p_i\,(1 - p_i) \tag{2}$$

where n is the sample size, S the observed number of segregating sites and p_i the frequency of the variant at the ith segregating site (so that 2p_i(1 − p_i) is its heterozygosity). Under the SNM, the two estimators of θ, π and θW, are unbiased, so the mean of D is close to zero (E(D) ≈ 0).
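Because θW is unbiased, E(S) = θ a_n with a_n = Σ_{i=1}^{n−1} 1/i. For the n = 15 chromosomes used throughout, this gives the expected number of segregating sites before any bottleneck, which can be checked against the ancestral rows of Tables 2 and 3 (a worked check added for illustration):

$$a_{15} = \sum_{i=1}^{14} \frac{1}{i} \approx 3.252, \qquad E(S) = \theta\, a_{15} \approx \begin{cases} 15 \times 3.252 \approx 48.8 & \text{(Drosophila, } \theta = 15\text{)} \\ 3 \times 3.252 \approx 9.8 & \text{(humans, } \theta = 3\text{)} \end{cases}$$

These values agree with the mean S of the ancestral populations reported in Tables 2 and 3.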


$$D = \frac{\pi - \theta_W}{\sqrt{\operatorname{Var}(\pi - \theta_W)}} \tag{3}$$

A negative value of D indicates an excess of rare variants, while a positive D indicates an excess of intermediate-frequency variants. After a population bottleneck, E(D) can be positive, negative or zero, depending on the time since the bottleneck and its severity. In our analysis, we use the two one-tailed tests Prob(D ≤ D5% | S, n) and Prob(D ≥ D95% | S, n), where D5% and D95% are the critical values with 5% rejection in the negative and positive directions, respectively, computed from the simulations under the SNM.
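Equations (1) to (3) can be computed directly from the variant counts at each segregating site. The sketch below uses the standard variance constants given by TAJIMA (1989-a) for Var(π − θW). Here π is written as the average number of pairwise differences, Σ 2c(n − c)/[n(n − 1)] over sites, which is equation (2) with p_i = c_i/n, so the counts need not be polarised. The sketch is an illustration with a hypothetical function name, not the code used for the simulations.

```python
def tajimas_d(counts, n):
    """Tajima's D from variant counts (one per segregating site) in n sequences.

    Returns None when there are no segregating sites, where D is undefined.
    """
    if n < 4:
        raise ValueError("the variance constants degenerate for n < 4")
    S = len(counts)
    if S == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w = S / a1                                               # equation (1)
    pi = sum(2.0 * c * (n - c) for c in counts) / (n * (n - 1))  # equation (2)
    return (pi - theta_w) / (e1 * S + e2 * S * (S - 1)) ** 0.5   # equation (3)


# Toy sample of n = 4 sequences: all singletons give D < 0,
# all intermediate-frequency variants give D > 0.
print(tajimas_d([1, 1, 1], 4), tajimas_d([2, 2, 2], 4))
```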

b) FAY and WU’s (2000) H

This test was designed to have high power to detect positive selection:

$$H = \pi - \theta_H \tag{4}$$

where θH (FAY and WU 2000) is an alternative measure of diversity in which each variant is weighted by the homozygosity of the derived variant (the square of its frequency) rather than by its heterozygosity, as in π:

$$\theta_H = \sum_{i=1}^{n-1} \frac{2\,S_i\,i^2}{n(n-1)} \tag{5}$$

where S_i is the number of derived variants found i times in a sample of n chromosomes. Under neutrality and the infinite-sites model, θH is another unbiased estimator of θ, and so E(H) = 0. The H test is considered to be highly conservative in the presence of population growth, as growth tends to produce an excess of low-frequency variants. However, in the presence of population structure, highly unequal sampling from different populations can also lead to a significant H (PRZEWORSKI 2002). Here, we investigate how H responds to changes in population size using the one-tailed test Prob(H ≤ Hcrit | S, n), where Hcrit is chosen so that Prob(H ≤ Hcrit | S, n) ≤ 0.05 in simulations under the SNM.
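Equations (4) and (5) translate directly into code once derived-variant counts are available, for example from an outgroup-polarised alignment. In this sketch (the function name is hypothetical), summing 2c²/[n(n − 1)] site by site is the same as equation (5) grouped by frequency class, and H becomes strongly negative when derived variants are at high frequency, which is the pattern the test looks for:

```python
def fay_wu_h(counts, n):
    """Fay and Wu's H = pi - theta_H from derived-variant counts in n sequences."""
    denom = n * (n - 1)
    pi = sum(2.0 * c * (n - c) for c in counts) / denom  # equation (2)
    theta_h = sum(2.0 * c * c for c in counts) / denom   # equation (5), site by site
    return pi - theta_h                                   # equation (4)


# Two sites whose derived variant is carried by 3 of 4 sequences.
print(fay_wu_h([3, 3], 4))  # -2.0
```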

2.2.3. Linkage disequilibrium

a) High-frequency haplotypes (HUDSON et al. 1994, VIEIRA and CHARLESWORTH 2000)

Our statistical test for the frequency of the most frequent haplotype (fMFH) is similar to the tests for high-frequency haplotypes used by HUDSON et al. (1994) and VIEIRA and CHARLESWORTH (2000) to detect selection. Because selection (positive, negative or balancing) can create strong haplotype structure, fMFH is expected to be higher in a region that has experienced selection than in a neutral region. A recent bottleneck may have the same effect on the haplotype distribution of a population, as it tends to reduce haplotype diversity, so we test Prob(fMFH > fMFH(crit) | S, n), where fMFH(crit) is chosen so that Prob(fMFH ≤ fMFH(crit) | S, n) ≥ 0.95 in simulations under the SNM.
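The statistic fMFH itself is simple to compute once each sampled chromosome is encoded as a haplotype string over the segregating sites. Here 0 stands for the ancestral and 1 for the derived state, an encoding assumed for illustration. Its significance would then be assessed against the SNM distribution conditional on S. The function name is hypothetical.

```python
from collections import Counter


def f_mfh(haplotypes):
    """Frequency of the most frequent haplotype in a sample."""
    if not haplotypes:
        raise ValueError("empty sample")
    _, count = Counter(haplotypes).most_common(1)[0]
    return count / len(haplotypes)


sample = ["0110", "0110", "0110", "1001", "0111"]
print(f_mfh(sample))  # 0.6
```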

b) Number of haplotypes (STROBECK 1987)

STROBECK (1987) proposed a test of the SNM based on the number of distinct haplotypes, K (see also FU 1996). K is expected to be lower than under the SNM if a population has experienced a recent bottleneck (MARUYAMA and FUERST 1985-a) or periodic reductions in population size (MARUYAMA and FUERST 1985-b). We therefore use the one-tailed test Prob(K < Kcrit | S, n), where Kcrit is chosen so that Prob(K < Kcrit | S, n) ≤ 0.05 in simulations under the SNM. We also record the behaviour of the minimum number of recombination events, RM (HUDSON and KAPLAN 1985). WALL (2000) proposed an estimator of ρ based on K and RM; from the behaviour of these statistics, we may therefore infer how a bottleneck affects WALL’s (2000) estimate of ρ.
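K and RM can both be computed from the same 0/1 haplotype matrix. For RM, the sketch follows the logic of HUDSON and KAPLAN (1985): every pair of sites showing all four gametes implies at least one recombination event between them, and RM is the largest number of such intervals that do not overlap (intervals may share an endpoint). The greedy selection by right endpoint used here is a standard way to find that maximum. The function names are illustrative, and a non-empty sample of equal-length haplotypes is assumed.

```python
from itertools import combinations


def number_of_haplotypes(haplotypes):
    """K: number of distinct haplotypes in the sample."""
    return len(set(haplotypes))


def four_gamete_intervals(haplotypes):
    """Pairs of sites (i, j), i < j, at which all four gametes 00, 01, 10, 11 occur."""
    n_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            intervals.append((i, j))
    return intervals


def hudson_kaplan_rm(haplotypes):
    """Minimum number of recombination events, RM (Hudson and Kaplan 1985)."""
    rm, last_end = 0, -1
    for start, end in sorted(four_gamete_intervals(haplotypes), key=lambda iv: iv[1]):
        if start >= last_end:  # shares at most an endpoint with the last chosen interval
            rm += 1
            last_end = end
    return rm


sample = ["000", "011", "101", "110", "000"]
print(number_of_haplotypes(sample), hudson_kaplan_rm(sample))  # 4 2
```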


3. RESULTS

TABLE 2
Rejection probability for single loci under step-recovery bottleneck models for Drosophila (ρ = 3θ).

Bottleneck              θ/θ0   S             K              fMFH          D                       H
                               (µ, reject)   (µ, reject)    (µ, reject)   (µ, reject 5%, 95%)     (µ, reject)
(I)   Tb = 2000 ga      0.82   40.1, 0.005   8.2, 0.15      0.28, 0.07    0.41, 0.001, 0.03       -1.28, 0.02
(II)  Tb = 120,000 ga   0.83   40.7, 0.005   10.1, 0.02     0.23, 0.03    0.36, 0.001, 0.03       -1.19, 0.02
Eqb. Pop.ᵃ              0.81   39.3, 0.002   12.6, <0.0001  0.16, 0.001   -0.01, 0.001, 0.001     0.00, 0.01
(III) Tb = 2000 ga      0.50   24.3, 0.07    3.5, 0.96      0.55, 0.56    0.88, 0.04, 0.35        -3.94, 0.21
(IV)  Tb = 120,000 ga   0.52   25.4, 0.06    5.9, 0.39      0.47, 0.36    0.69, 0.04, 0.27        -3.77, 0.19
Eqb. Pop.ᵃ              0.50   24.5, 0.01    11.2, 0.0002   0.20, 0.002   -0.02, 0.01, 0.01       -0.01, 0.01
Ancestralᵇ              1.00   48.8, -       13.1, <0.0001  0.14, 0.0006  -0.01, 0.0003, 0.0008   0.05, 0.004

All the simulations consider ρ = 3θ, 15 chromosomes and 10,000 replicates. The Roman numerals refer to the bottleneck models described in Table 1. µ and reject are the mean and the rejection probability, respectively.

ᵃ Simulations under the SNM with θ = 12 or 7.5 (variability reduced to 0.8 and 0.5 of its ancestral level, respectively) and recombination ρ = 3θ.

ᵇ Simulations under the SNM with θ = 15 and recombination ρ = 3θ.

3.1. DROSOPHILA POPULATION PARAMETERS.

In Table 2, we present the effect of the bottleneck models on statistical tests applied to short sequenced regions in Drosophila. We modelled severe (θ/θ0 = 50%) and less severe (83%) bottlenecks associated with the expansion from Africa (~120,000 ga) and the colonisation of non-African regions (~2000 ga). For comparison, we also model an equilibrium population of the same size as the derived population.

The apparent robustness of some tests to even large departures from an equilibrium population of constant size reflects how conservative these tests are (PRZEWORSKI et al. 2001). This is because statistical tests are most often applied, as they are here, assuming no recombination. For example, for S, the proportion of significant tests is well below the 5% level for each of the equilibrium populations modelled, since ρ = 3θ in these populations.

Remarkably, although the variance of both S and TAJIMA’s D increases under a bottleneck relative to an equilibrium population, this rarely results in significant tests for the S statistic or for the negative tail of TAJIMA’s test (see Materials and Methods, part 2.2). The bottom line is that recent bottlenecks in a species’ history are unlikely to result in a significantly negative TAJIMA’s D or in a marked deficiency of segregating sites (and thus in rejection by the KREITMAN and HUDSON (1991) test). Very similar results were obtained assuming ρ = 15θ, or under a simple bottleneck in which the population loses the same amount of variability but never recovers in size (see Materials and Methods, part 2.1, and Supplement II). These details probably do not matter because both bottlenecks modelled here are very recent relative to the effective population size of the species (0.0004N0 and 0.024N0 generations, respectively).
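The scaled bottleneck times quoted above follow from the Drosophila single-locus parameters of part 2.1, taking N0 to be the size implied by θ = 4NµL (a worked check added for illustration):

$$N_0 = \frac{\theta}{4\mu L} = \frac{15}{4 \times 1.5\times10^{-9} \times 500} = 5\times10^{6}, \qquad \frac{2000}{N_0} = 0.0004, \qquad \frac{120{,}000}{N_0} = 0.024$$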


The statistical tests based on haplotype structure appeared to be the most sensitive. While the most sensitive overall was the number of haplotypes, K (STROBECK 1987), the frequency of the most frequent haplotype (fMFH), analogous to the haplotype test proposed by HUDSON et al. (1994), also had considerable power. In contrast to its negative tail, the positive tail of D shows considerable power to detect our most severe bottlenecks (Nb/T = 1), which result in positive mean values of D. This is not surprising because, following a reduction in population size, rare variants are lost more readily than common ones (NEI et al. 1975), and transient positive values of D are expected (TAJIMA 1989-b). Also, the mean of H is negative under a bottleneck, and FAY and WU’s H test is sensitive to the assumption of no recombination. Surprisingly, this statistic has some power to detect drastic bottlenecks (see Table 2, most severe bottlenecks (III) and (IV)); thus, like other statistical tests, it is not robust to assumptions about population history (see also PRZEWORSKI 2002, LAZZARO and CLARK 2003).

TABLE 3
Rejection probability for single loci under different bottleneck models for humans.

Bottleneck              θ/θ0   S             K              fMFH          D                       H
                               (µ, reject)   (µ, reject)    (µ, reject)   (µ, reject 5%, 95%)     (µ, reject)
(V)    Tb = 600 gaᵃ     0.65   6.2, 0.09     3.8, 0.10      0.55, 0.08    0.42, 0.03, 0.16        -0.58, 0.12
(VI)   Tb = 600 gaᵇ     0.65   6.1, 0.09     3.6, 0.12      0.55, 0.08    0.56, 0.03, 0.19        -0.57, 0.12
(VII)  Tb = 2500 gaᵃ    0.64   6.2, 0.08     4.3, 0.05      0.58, 0.07    -0.02, 0.06, 0.09       -0.55, 0.11
(VIII) Tb = 2500 gaᵇ    0.65   6.2, 0.08     3.9, 0.09      0.55, 0.07    0.40, 0.03, 0.15        -0.58, 0.11
Eqb. Pop.ᶜ              0.65   6.3, 0.04     5.1, 0.01      0.49, 0.02    -0.05, 0.03, 0.04       0.02, 0.05
Ancestralᵈ              1.00   9.8, -        6.5, 0.01      0.39, 0.02    -0.04, 0.03, 0.04       0.01, 0.04

All the simulations consider ρ = θ, 15 chromosomes and 10,000 replicates. The Roman numerals refer to the bottleneck models described in Table 1. µ and reject are the mean and the rejection probability, respectively.

ᵃ Step-recovery bottlenecks (V) and (VII) of Table 1.
ᵇ Simple step bottlenecks (VI) and (VIII) of Table 1.
ᶜ Simulations under the SNM with θ = 3 × 0.65 and recombination ρ = θ.
ᵈ Simulations under the SNM with θ = 3 and recombination ρ = θ.

3.2. HUMAN POPULATION PARAMETERS.

Table 3 presents the effect of several bottleneck models on statistical tests applied to short sequenced regions in humans. Under the different bottleneck models, the probability of rejecting the SNM increases about two-fold for most of the statistics. However, although none of the tests is entirely insensitive to our models, all have little power to detect the bottlenecks chosen for humans compared with those chosen for Drosophila (Table 2). All the tests also appear slightly more sensitive to the more recent population reductions, although the difference is not significant.

Our results for human populations are consistent with our two bottleneck models (simple step and step-recovery) being equivalent when the bottleneck is recent. In contrast, the results for older bottlenecks suggest that the history of a population after a bottleneck is important. If a population bounces back to its original size, TAJIMA’s D may be close to zero. However, if the average population size stays low (or fluctuates with a low harmonic mean; see Materials and Methods, part 2.1), TAJIMA’s D could be positive.
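The role of the harmonic mean can be illustrated numerically. Over t generations with sizes N_1, ..., N_t, the long-term effective size is approximately t / Σ 1/N_i, which is dominated by the smallest sizes. The population sizes in this sketch are hypothetical and chosen only to show the effect; Python’s standard `statistics.harmonic_mean` does the computation.

```python
from statistics import harmonic_mean, mean

# Hypothetical sizes over 10 generations: a one-generation crash, then recovery.
crash_then_recovery = [10_000] * 5 + [100] + [10_000] * 4
# Hypothetical sizes that stay low after the crash.
crash_without_recovery = [10_000] * 5 + [100] * 5

for label, sizes in [("recovery", crash_then_recovery),
                     ("no recovery", crash_without_recovery)]:
    # The arithmetic mean stays large, whereas the harmonic mean collapses
    # towards the smallest sizes.
    print(label, round(mean(sizes)), round(harmonic_mean(sizes)))
```

Even a single generation at size 100 pulls the harmonic mean from 10,000 down to about 900, and it falls to about 200 if the population never recovers.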


3.3. LARGE SURVEY REGIONS IN DROSOPHILA

Directional selection is expected to have localised effects on the genome (see MAYNARD-SMITH and HAIGH 1974; KAPLAN et al. 1989; FAY and WU 2000; KIM and STEPHAN 2000; PRZEWORSKI 2002). However, theory tells us that closely linked genomic regions have correlated genealogical histories even under neutrality (HUDSON 1983). This difficulty has been overcome either by comparing unlinked (or effectively unlinked) loci or by explicitly modelling linkage between the surveyed regions (KIM and STEPHAN 2000). In contrast, population history is expected to affect the entire genome in a similar manner. If we observe localised rejections of the neutral model in genome-wide scans, can we conclude that selection has been operating? To address this question, we modelled large segments of the genome (approx. 40 kb) undergoing bottlenecks like those described in Table 1, and asked how the statistical tests behaved spatially across such sequences. We thus asked not only about the number of unusual regions observed in the modelled genome segment, but also about the distribution of their sizes.
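Counting unusual regions and measuring their sizes requires a rule for turning significant windows into regions. One simple rule, assumed here for illustration rather than taken from this study, is to merge significant windows that overlap or abut into a single region. The function name is hypothetical.

```python
def rejecting_regions(window_starts, rejected, window):
    """Merge overlapping or abutting significant windows into regions.

    window_starts must be sorted in increasing order; rejected[k] says whether
    the window starting at window_starts[k] rejects the SNM. Returns half-open
    (start, end) intervals in bp.
    """
    regions = []
    for start, is_rejected in zip(window_starts, rejected):
        if not is_rejected:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the current region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


regions = rejecting_regions([0, 50, 100, 600], [True, True, False, True], window=500)
print(regions, [end - start for start, end in regions])  # [(0, 550), (600, 1100)] [550, 500]
```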

3.3.1. Pattern of variability

Figures 1a, 1b and 1c (see Supplements III to VII) show three examples of patterns of variability along large segments simulated under step-recovery bottleneck models (I) and (IV) with ρ = 3θ and (IV) with ρ = 15θ, respectively. These graphs were chosen from 10 samples randomly selected from the simulations with a bottleneck. The lower graphs show the worst pattern (the most and largest regions rejecting the SNM), with the equilibrium population of the same size as the derived population shown for comparison.

First, one can observe considerable heterogeneity across a sequence. More precisely, the regions in which the tests significantly reject the SNM are not uniformly distributed across the simulated genomic sequences, contrary to what one might expect given that demography should have a “similar average effect on the genome”. Moreover, the standard single-locus tests suggest that, for independent loci, some tests are conservative. For instance, the negative tail of TAJIMA’s D rejects the SNM with probability p = 0.04 (Table 2, bottleneck (IV); p = 0.01 for the equilibrium population). However, Figure 1b for the statistic D (Supplement VI) shows that the rejection probability gives no information about the distribution of the regions in which the SNM is rejected. Similarly, K is conservative under a severe bottleneck (p = 0.02 for bottleneck (IV) and p < 0.0001 for the equilibrium population in Supplement II), yet one can observe a 40 kb region with too few haplotypes along its whole length (Figure 1c for K, Supplement IV). This clearly shows that even tests that are conservative for independent loci can detect large regions rejecting the SNM under bottleneck models.

Depending on the patterns observed in the three graphs, conclusions about the event influencing genetic variability might differ. The graphs (a) and (c) for the statistic S show patterns with very few rejecting regions, similar to those observed for the equilibrium population; the observed segments therefore appear to be neutral. This means that the bottleneck has only small effects on the mean and variance of S. For this statistic, the graphs for the other bottleneck modelled (b) show very different patterns of variability: the upper graph has no rejection, the middle plot has large regions of very low variability, and the lower one shows low variability along the whole 40 kb segment. These observations would lead us to conclude that some selection is acting in the middle and lower graphs. Similarly, the graphs (a) and (c) for the statistics K, fMFH and FAY and WU’s H display very different patterns, which could lead to the conclusion of a localised selective sweep in the middle graphs and of selection spread along the whole 40 kb segment in the lower plots.

In contrast, for these three statistics, the graphs for the more severe bottleneck with less recombination (b) display very similar patterns of very numerous and large regions of too low variability, showing what one can expect to observe under a bottleneck. These observations show that a mild demographic reduction in the population history (a) does not affect independent recombining segments homogeneously, and can by chance mimic the pattern of a selective sweep. Similarly, a more severe bottleneck applied to genetic sequences subject to many recombination events (c) can display patterns expected under selection. The graphs for TAJIMA’s D display neutral genomic segments (a) or, for (b) and (c), patterns that seem to show selection in the two lower plots. Thus, although the tests for a negative TAJIMA’s D and for a deficiency of segregating sites are relatively robust to our modelled bottlenecks, a genome screen using these tests may detect selection where in fact a severe bottleneck has occurred. Also, the positive tail of TAJIMA’s D, which is powerful for detecting the most severe bottleneck (see Table 2), displays patterns that lack the homogeneity that would point to demography as the cause of the excess of intermediate-frequency variants detected by the test.

3.3.2. Increase of linkage disequilibrium

REICH et al. (2002) showed that the human genome contains sizeable regions (stretching over tens of thousands of base pairs) with intrinsically high and low rates of sequence variation, and that the primary determinant of these patterns is shared genealogical history. By measuring the average distance over which genealogical histories are typically preserved, one can estimate the average extent of correlation among variants (linkage disequilibrium). The size of correlated segments can be computed from approximation (6) (OHTA and KIMURA 1971 and WEIR and HILL):

$$E(r^2) \approx \frac{1}{1 + 4NrL} \tag{6}$$

For the Drosophila parameters (see Materials and Methods, part 2.1) and E(r²) = 0.1, the size of regions with correlated genealogical histories is expected to be between 20 and 100 bp (for ρ = 15θ and 3θ, respectively) under the SNM. The graphs for the equilibrium population in Figures 1a, b and c (lower graphs in Supplements III to VII) show regions rejecting the SNM with sizes in this range. However, for the haplotype and frequency spectrum tests, the more severe bottlenecks can create patterns in which regions of up to 40 kb share the same genealogical history (see Figures 1b and 1c, Supplements IV to VII).
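Solving approximation (6) for the distance L at which E(r²) falls to 0.1 reproduces the range quoted above for Drosophila, using θ per bp = 15/500 from part 2.1 and writing 4Nr as ρ per bp. This is a minimal sketch; the function name is illustrative.

```python
def correlated_length(rho_per_bp, target_r2=0.1):
    """Distance L (bp) at which E(r^2) ~ 1 / (1 + rho_per_bp * L) equals target_r2."""
    return (1.0 / target_r2 - 1.0) / rho_per_bp


theta_per_bp = 15 / 500  # Drosophila single-locus parameters
for factor in (3, 15):   # rho = 3 theta or rho = 15 theta
    length = correlated_length(factor * theta_per_bp)
    print(f"rho = {factor} theta: L = {length:.0f} bp")
# rho = 3 theta: L = 100 bp
# rho = 15 theta: L = 20 bp
```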

When a population has experienced a bottleneck, the probability of finding regions rejecting the SNM is significantly larger than under the SNM (for all tests except the negative tail of TAJIMA’s D; two-sample Kolmogorov-Smirnov test, p < 0.0001). Moreover, conditioning on detecting at least one unusual region in a sample of 40 kb sequences, the size of the region is significantly larger under a bottleneck model (p < 0.0001 for all the graphs where the test was applicable). Figures 2a and 2b summarise these observations and plot, for the six statistics, the probability of finding an unusual region of a given size along a 40 kb recombining chromosome that has experienced an old and severe bottleneck (model (IV)) and under the corresponding null model, for ρ = 3θ and ρ = 15θ, respectively. For all the tests, they show that the probability of detecting large regions is significantly increased when the population has experienced a bottleneck. Even the negative tail of TAJIMA’s D, which is conservative (see Table 2 and Supplements I and II), shows larger regions sharing genealogical histories under the bottleneck models. This is expected, as bottlenecks increase the variance of the statistics (except for the haplotype tests). These results confirm the substantial excess of linkage disequilibrium (LD) that a bottleneck can create across genomic sequences.
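The two-sample Kolmogorov-Smirnov comparisons above compare, for example, the distribution of region sizes under a bottleneck with that under the null model. The statistic is the largest gap between the two empirical distribution functions. This standard-library sketch computes only that statistic; p-values could be obtained from a library routine such as SciPy’s `scipy.stats.ks_2samp`, which is not used here. The region sizes in the example are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:  # the supremum is attained at an observed value
        f_x = bisect_right(xs, t) / len(xs)
        f_y = bisect_right(ys, t) / len(ys)
        d = max(d, abs(f_x - f_y))
    return d


# Hypothetical region sizes (bp) under the null model and under a bottleneck.
null_sizes = [500, 550, 600, 500, 700]
bottleneck_sizes = [500, 1500, 4000, 12000, 600]
print(ks_statistic(null_sizes, bottleneck_sizes))
```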


3.4. LARGE SURVEY REGIONS IN HUMANS.

We modelled large segments of the genome (approx. 200 kb) undergoing the bottlenecks described in Table 1.

3.4.1. Pattern of variability

Figures 1d and 1e (see Supplements III to VII) display the pattern of variability for the five statistics under the step-recovery bottleneck (recent (V) and older (VI), respectively). For all the statistics, the three graphs with bottlenecks show patterns of variability that differ greatly from one another, thus mimicking what one would expect to find under selection.

The only exception is graph (e) for the number of haplotypes, K, which displays relatively homogeneous patterns, with numerous small regions rejecting the null model scattered along the whole 200 kb segment. We do not show the results for the simple step model, but among the four bottlenecks we modelled for humans, this is the only case in which the 10 selected samples display such homogeneous patterns.

3.4.2. Increase of linkage disequilibrium

Using approximation (6) and the parameters defined for humans (see Materials and Methods, part 2.1), we expect the length of sequences sharing the same history to be about 5000 bp. The graphs for equilibrium populations (lower graphs) show rejecting regions with sizes consistent with this value. However, one can observe much larger regions, up to 150 kb, homogeneously rejecting the SNM in Figure 1d for S (Supplement III). The excess of linkage disequilibrium due to a severe bottleneck can be very strong.

As for the Drosophila parameters, we find significantly more regions rejecting the SNM (Kolmogorov-Smirnov test, p < 0.0001 for all tests except the negative tail of TAJIMA’s D), and, conditioning on a region rejecting the SNM, the probability that it is large is significantly higher when the population has experienced a bottleneck than when the population size has remained constant (p < 0.0001 for all statistics). Figure 2c summarises these results by displaying, for the six statistics, the probability of finding an unusual region of a given size along a 200 kb recombining chromosome under bottleneck model (VI). A bottleneck leaves a population in which only a few genealogies are shared, thus dampening the effect of recombination during the period of small population size. Recombination usually generates variability by breaking down the correlation between loci. However, when the population size is small (so that ρ = 4NrL is smaller), recombination events do not necessarily generate new variability, because many of them occur between identical chromosomes. Thus, LD along the chromosome breaks down less effectively than under the SNM. This explains how a bottleneck increases the extent to which genealogical histories are shared.

4. DISCUSSION.

4.1. INTERPRETATION OF SIGNIFICANT TESTS OF FREQUENCY SPECTRUM

4.1.1. TAJIMA’s D

Our simulations show that, in Drosophila, positing a recent and drastic bottleneck predicts that derived populations should have a more positive TAJIMA’s D than the ancestral population (Table 2 and Supplements I and II). In Table 3, the human values of D are closer to those expected under neutrality (for example, under the old step-recovery bottleneck (VII)), but they also tend to be positive.

In contrast, data from Drosophila usually do not show positive values of D (PRZEWORSKI et al. 2001). This could be explained by inadequate population sampling, which may mask positive values of D. However, WALL et al. (2002) pointed out that TAJIMA’s D is more positive on the X chromosome than on the autosomes in D. simulans, and that this may be better explained by the demographic history of the species than by selection. The X chromosome is affected differently from the autosomes when a population has experienced a bottleneck: because the X has a smaller effective population size, the same bottleneck falls at a different time, measured in units of effective population size, on the X than on the autosomes. Also, more negative values of D are found in African populations (HARR et al. 2002). If ancestral populations of Drosophila have experienced long-term growth, this may mask the evidence for bottlenecks in the derived populations. Note that our models might show positive values of D because our bottleneck models for Drosophila are recent relative to the effective population size of the species (only 0.024N0 generations for the oldest model). Thus, a mismatch between our models and the actual demographic history of this species could also explain the differences from natural data.

D is also not generally positive in humans (FRISSE et al. 2001), but this could be due to inadequate population sampling, which may mask positive values of D (PTAK and PRZEWORSKI 2002). Comparisons between African and non-African populations also show more negative values of D in the ancestral populations than in the derived ones (FRISSE et al. 2001, PLUZHNIKOV et al. 2002). GILAD and LANCET (2003) found positive values of D for data from a Pygmy population, whereas a sample defined as Caucasian showed values close to neutral. The Pygmies have a hunter-gatherer culture and remain a relatively small population. The value of D for this population is consistent with our results for the old simple step bottleneck model (see bottleneck (VIII), Table 3). In contrast, the results for the Caucasian population, which derives from an agricultural culture that may have experienced recent exponential growth, are similar to those found for our old step-recovery model (see bottleneck (VII), Table 3). Thus, these observations may well be due to differences in the demographic histories of the populations compared.

4.1.2. FAY and WU’s H test

Since it was designed specifically to detect selective sweeps (FAY and WU 2000 and OTTO 2000), FAY and WU’s H test has been applied to loci of interest to provide evidence for their adaptive function. We find, however, that H is generally negative under bottlenecks, so the test is not conservative when the demographic history of a population is unknown. In Drosophila, numerous comparisons between African and non-African populations show significant differences at loci thought to be subject to selection, such as Acp26Aa (FAY and WU 2000), desat2 (TAKAHASHI et al.) and janus-ocnus (PARSCH et al. 2001). However, these observations cannot be interpreted as an unambiguous signature of selection.

Studies of ten human non-coding autosomal regions found more significant FAY and WU’s H tests (4 loci out of 10) in the non-African populations, while the African populations fit the neutral model (FRISSE et al. 2001, HAMBLIN et al. 2002). Also, GILAD and LANCET (2003) found that H at human olfactory genes is significantly more negative in non-African than in African populations. Both studies were interpreted as evidence for selection. Our study shows that these signatures of selection could well be due to a bottleneck or to population structure (PRZEWORSKI 2002).

4.2. INTERPRETATION OF SIGNIFICANT TESTS OF THE LEVEL OF POLYMORPHISM

We measured the level of variability at single loci with the KREITMAN and HUDSON (1991) test, but one would expect the HKA test, which takes divergence between species into account and compares the within-species level of polymorphism at multiple loci, to be affected similarly by demographic changes.

In natural populations of Drosophila, there are few examples of loci showing significant HKA tests, such as Pgd in D. melanogaster (BEGUN and AQUADRO 1994) and runt in D. simulans (LABATE et al. 1999). In general, however, the HKA test fails to reject the SNM in Drosophila. Similarly, in humans, the gene Dmd shows a deficiency of segregating sites in non-African populations (NACHMAN and CROWELL 2000-b), but the HKA test does not detect departures from neutrality at human loci.

If selection is rare, it is understandable that so few loci show significant HKA tests in either species. Our S statistic has poor power to detect our bottleneck models for both species, which suggests that a bottleneck would have to be very severe indeed to produce rejections by the HKA test (HUDSON, KREITMAN, and AGUADE 1987) and similar tests (Tables 2 and 3, and Supplements I and II). Another explanation for the observations in natural populations of Drosophila and humans could therefore be that they have experienced relatively mild bottlenecks, since these observations are consistent with the lack of power we found for our test of the level of polymorphism in the presence of bottlenecks.

4.3. HAPLOTYPE TESTS AND LINKAGE DISEQUILIBRIUM

In contrast to the test of the level of polymorphism, the haplotype tests tend to be fairly sensitive to our bottleneck models in Drosophila: too few distinct haplotypes and a strong haplotype structure can be observed after a drastic or recent reduction in population size.

The haplotype tests were developed because directional selection tends to reduce heterozygosity and to increase haplotype structure. However, under neutrality, closely linked regions have correlated histories (HUDSON 1983). Researchers have used heterogeneity among regions as an argument for selection. Our results (Figure 1, Supplements III to VII and Figure 2) suggest instead that bottlenecks increase linkage disequilibrium and so exaggerate this correlation. The result is larger-scale heterogeneity among regions than the neutral model predicts. The increase in linkage disequilibrium shows up as a decrease in RM; these results are not shown but are consistent with those of WALL et al. (2002). Note that this holds even for tests with low power to detect our models of bottlenecks (level of polymorphism and negative tail of D). Conditional on finding a region that rejects the SNM, its size is expected to be larger in the presence of a bottleneck than under the SNM (Figure 1a, Supplement III). This is a problem for using the heterogeneity argument to detect selection, as heterogeneity along a sequence may be consistent with demography alone.
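RM, the minimum number of recombination events of HUDSON and KAPLAN (1985), shows how a bottleneck, by increasing linkage disequilibrium, leaves fewer detectable recombination events. If all four gametes are observed at a pair of biallelic sites (the four-gamete test), recombination must have occurred between them. RM is the largest number of such intervals that do not overlap. Below is a minimal Python sketch. It assumes biallelic sites, the infinite-sites model and no missing data, and the function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def four_gamete_incompatible(col_a: Sequence[str], col_b: Sequence[str]) -> bool:
    """True if all four combinations of two biallelic sites are observed."""
    return len(set(zip(col_a, col_b))) == 4


def hudson_kaplan_rm(haplotypes: Sequence[str]) -> int:
    """Hudson and Kaplan's (1985) lower bound RM on recombination events."""
    columns = list(zip(*haplotypes))
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    intervals = [(a, b) for a, b in combinations(biallelic, 2)
                 if four_gamete_incompatible(columns[a], columns[b])]
    # RM = largest set of non-overlapping incompatible intervals, found
    # greedily by right endpoint. Intervals sharing only an endpoint site do
    # not overlap: their events must fall in different gaps between sites.
    rm, last_right = 0, -1
    for a, b in sorted(intervals, key=lambda interval: interval[1]):
        if a >= last_right:
            rm += 1
            last_right = b
    return rm


sample = ["000", "011", "101", "110"]  # hypothetical haplotypes
print(hudson_kaplan_rm(sample))
```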

In natural populations, recent data suggest a deficiency of haplotypes and spatial heterogeneity in non-African relative to African populations. This has been reported both in Drosophila (HUDSON et al. 1994, BEGUN and AQUADRO 1995, HUDSON et al. 1997, VEUILLE et al. 1998, ANDOLFATTO et al. 1999, MOUSSET et al. 2003, and others) and in humans (KAYSER et al. 2003, SCHNEIDER et al. 2002). Multi-locus scans for selection have also revealed wells of diversity along the chromosomes of Drosophila (NURMINSKY et al. 2001, HARR et al. 2002) and humans (PATIL et al. 2001, SCHLÖTTERER 2002, SABETI et al. 2002). These numerous examples of heterogeneity cannot be interpreted as an unambiguous signature of selection, since they may also be explained by severe or recent bottlenecks.
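Spatial heterogeneity such as a "well" of diversity is usually detected by scanning a chromosome with a sliding window. The Python sketch below computes nucleotide diversity π per site in windows along an alignment. It assumes aligned sequences without missing data, and the function names and window settings are illustrative. Whether a dip is unusual must then be judged against simulations. As argued above, a null model that ignores bottlenecks may make such dips look more significant than they are.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def per_site_pi(haplotypes: Sequence[str]) -> List[float]:
    """Proportion of sequence pairs that differ at each alignment column."""
    n = len(haplotypes)
    pairs = n * (n - 1) / 2
    values = []
    for column in zip(*haplotypes):
        identical = sum(c * (c - 1) / 2 for c in Counter(column).values())
        values.append((pairs - identical) / pairs)
    return values


def sliding_window_pi(haplotypes: Sequence[str], window: int,
                      step: int) -> List[Tuple[int, float]]:
    """(start, mean pi per site) for each complete window along the alignment."""
    site_pi = per_site_pi(haplotypes)
    return [(start, sum(site_pi[start:start + window]) / window)
            for start in range(0, len(site_pi) - window + 1, step)]


print(sliding_window_pi(["AAAA", "AAAT", "AAAT"], window=2, step=2))
```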

5. CONCLUSION AND PROSPECTS

This study stresses the importance of demographic history in shaping patterns of genome variability. The biggest issue is that a population’s history is usually unknown. We have studied bottleneck models because changes in population size may be common in the history of most species. However, the effects of other departures from demographic stability should also be studied. An example is population structure, such as extinction and recolonisation.

Departures from the SNM make it difficult to find evidence for selection, and thus to identify genomic regions undergoing adaptive evolution. Specific methods need to be developed to better distinguish population size changes from selection. Combining several tests of neutrality that use uncorrelated information from genomic data may be an effective way to do so. Unfortunately, under our bottleneck models the five tests we studied tend to be significant in the same regions.
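One way to check whether two tests carry independent information is to compare how often they reject together with the rate expected if they were independent. This can be done on simulated or observed regions. Below is a minimal Python sketch. The function name is hypothetical, and the inputs are paired reject/accept outcomes of two tests on the same replicates.

```python
from typing import Sequence, Tuple


def joint_rejection(reject_a: Sequence[bool],
                    reject_b: Sequence[bool]) -> Tuple[float, float, float, float]:
    """Return (P(A), P(B), P(A and B), P(A) * P(B)) from paired outcomes.

    P(A and B) well above P(A) * P(B) means the two tests tend to reject
    on the same replicates, so combining them adds little information.
    """
    if len(reject_a) != len(reject_b) or not reject_a:
        raise ValueError("need two non-empty outcome lists of equal length")
    n = len(reject_a)
    p_a = sum(reject_a) / n
    p_b = sum(reject_b) / n
    p_ab = sum(1 for a, b in zip(reject_a, reject_b) if a and b) / n
    return p_a, p_b, p_ab, p_a * p_b


# Hypothetical outcomes of two tests on eight simulated regions.
d_rejects = [True, True, False, False, False, False, False, False]
h_rejects = [True, True, False, False, False, False, False, True]
print(joint_rejection(d_rejects, h_rejects))
```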

An alternative approach was pursued by LAZZARO and CLARK (2003), who compared a set of candidate genes to random genes. The mean values of the statistic differed significantly between the two sets of genes. This is unlikely under a demographic change, which is expected to affect all loci in the genome in a similar way. This could be a good method to distinguish selection from demography.
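A comparison of this kind can be carried out, for example, with a permutation test on the difference in the mean statistic between candidate and random genes. Reshuffling the gene labels gives the distribution of the difference expected if the two sets were exchangeable. The Python sketch below is one such test and not necessarily the procedure of LAZZARO and CLARK (2003). The function name and the example values are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test_mean_diff(candidates: Sequence[float],
                               controls: Sequence[float],
                               n_perm: int = 10000,
                               seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test of the difference in means.

    Returns the observed |mean(candidates) - mean(controls)| and a p-value
    estimated as (hits + 1) / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = abs(mean(candidates) - mean(controls))
    pooled = list(candidates) + list(controls)
    n_c = len(candidates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the gene labels
        if abs(mean(pooled[:n_c]) - mean(pooled[n_c:])) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical values of a neutrality statistic for the two gene sets.
candidate_d = [-2.0, -1.8, -2.2, -1.9, -2.1]
random_d = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
print(permutation_test_mean_diff(candidate_d, random_d, n_perm=2000))
```

The test relies on both gene sets sharing the same demographic history. Differences in recombination or mutation rate between the sets could still confound it.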


Epilogue

This internship has been a wonderful experience. I was immersed in the fields of quantitative and population genetics at an amazing speed. At ICAPB, all my colleagues provided a favourable learning environment and were eager to answer even the most basic questions about this field of study. This enabled me to rapidly gain confidence with the new concepts and methods I had to deal with during my project.

In addition to the scientific achievements of my time in Edinburgh, my professional future has changed profoundly. More precisely, for the next three years I will undertake a PhD at Brown University, Providence, Rhode Island, USA, under the supervision of Molly Przeworski. The subject itself will be determined later, but it will certainly be a continuation of the project presented in this report. This PhD opportunity is the direct result of my presence and work at ICAPB this year.

For all this I can never be grateful enough to all the people who made this project possible, and indeed a total success. In particular, INSA of Lyon and the department of Bioinformatics and Modelling were ready from the start to help me carry out this year-long internship. I really hope future classes will still have the option of gaining experience through a year-long project abroad during the fifth year of their engineering degree. It seems to me a valuable opportunity to experience professional life.

To conclude, I have had a great time this year and have learned a lot, which, I am convinced, has prepared me for academic research in quantitative and population genetics.


ACKNOWLEDGEMENT

First, I would like to thank Prof. Nick Barton for his invitation to work in his Population Genetics group. He gave me the wonderful opportunity to work in Edinburgh, and especially in the prestigious Institute for Cell, Animal and Population Biology (ICAPB), University of Edinburgh, UK. Throughout the internship, he was always available to discuss my hypotheses and directions.

I also thank Dr. Peter Andolfatto, who was my collaborator and supervisor for this project. He explained to me the quantitative genetics I needed to start work on the project: hitchhiking, the coalescent process, linkage disequilibrium and other notions. Each time I had unclear ideas or a problem in my program, he was available to help me solve it. He also provided the guidance I needed to organise my work. But most of all, he had the patience and skill to handle me despite my unsettled mood and willingness to work.

I thank Dr. Molly Przeworski for her guidance and her comments on the project.

Thanks also to the secretaries of ICAPB, who were always ready to provide immediate help in solving the many technical and administrative problems I had at the beginning of my training period.

I particularly thank all my colleagues in Nick Barton’s group and in the surrounding offices, Alex Kalinka, Tim Sands, Toby Johnson, Jelle Zuidema, Angus Davison, Penny Haddrill and Andy Gardner. I am grateful for our discussions about our projects, for their help and friendship, and, in some cases, for their proof-reading of parts of this report.

I also thank all the teams of ICAPB for their kindness. I particularly thank Xulio Maside for his help in finding Windows programs and solutions in such a Mac environment.

I thank all the MSc students for helping me have a nice time in Edinburgh and for discovering with me the entertaining parts of this beautiful city.

I particularly thank Hedi Soula for helping me rewrite my program into a clearer and more workable one, and Guillaume Beslon for his friendly supervision from Lyon.

Finally, I thank Jean-Michel Fayard and all my teachers for the help, advice and support they provided during the internship.


LITERATURE CITED

ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18: 279–290.

ANDOLFATTO, P., and M. PRZEWORSKI, 2000 A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genetics 156: 257–268.

ANDOLFATTO, P., J. D. WALL, and M. KREITMAN, 1999 Unusual haplotype structure at the proximal breakpoint of In(2L)t in a natural population of Drosophila melanogaster. Genetics 153: 1297–1311.

BEGUN, D. J., and C. F. AQUADRO, 1994 Evolutionary inferences from DNA variation at the 6-Phosphogluconate dehydrogenase locus in natural populations of Drosophila: selection and geographic differentiation. Genetics 136: 155–171.

BEGUN, D. J., and C. F. AQUADRO, 1995 Molecular variation at the vermilion locus in geographically diverse populations of Drosophila melanogaster and D. simulans. Genetics 140: 1019–1032.

BEGUN, D. J., and P. WHITLEY, 2000 Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97: 5960–5965.

CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303.

DEPAULIS, F., and M. VEUILLE, 1998 Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15: 1788–1790.

DEPAULIS, F., L. BRAZIER, and M. VEUILLE, 1999 Selective sweep at the Drosophila melanogaster suppressor of Hairless locus and its association with the In(2L)t inversion polymorphism. Genetics 152: 1017–1024.

EXCOFFIER, L., and S. SCHNEIDER, 1999 Why hunter-gatherer populations do not show signs of Pleistocene demographic expansions. Proc. Natl. Acad. Sci. USA 96: 10597–10602.

FAY, J. C., and C.-I. WU, 2000 Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.

FRISSE, L., R. R. HUDSON, A. BARTOSZEWICZ, J. D. WALL, J. DONFACK, and A. DI RIENZO, 2001 Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69: 831–843.

FU, Y.-X., 1996 New statistical tests of neutrality for DNA samples from a population. Genetics 143: 557–570.

FU, Y.-X., and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133: 693–709.

GILAD, Y., and D. LANCET, 2003 Population differences in the human functional olfactory repertoire. Mol. Biol. Evol. 20: 307–314.

HAMBLIN, M. T., E. E. THOMPSON, and A. DI RIENZO, 2002 Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70: 369–383.

HARR, B., M. KAUER, and C. SCHLÖTTERER, 2002 Hitchhiking mapping: a population-based fine-mapping strategy for adaptive mutations in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 99: 12949–12954.

HUDSON, R. R., 1983 Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23: 183–201.

HUDSON, R. R., 2002 Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.


HUDSON, R. R., and N. L. KAPLAN, 1985 Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147–164.

HUDSON, R. R., M. KREITMAN, and M. AGUADE, 1987 A test of neutral molecular evolution based onnucleotide data. Genetics 116: 153–159.

HUDSON, R. R., K. BAILEY, D. SKARECKY, J. KWIATOWSKY, and F. J. AYALA, 1994 Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics 136: 1329–1340.

HUDSON, R. R., A. G. SAEZ, and F. J. AYALA, 1997 DNA variation at the Sod locus of Drosophila melanogaster: an unfolding story of natural selection. Proc. Natl. Acad. Sci. USA 94: 7725–7729.

KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123: 887–899.

KAYSER, M., S. BRAUER, and M. STONEKING, 2003 A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol. Biol. Evol. 20: 893–900.

KIM, Y., and W. STEPHAN, 2002 Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.

KIMURA, M., 1968 Evolutionary rate at the molecular level. Nature 217: 624–626.

KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

KREITMAN, M., and R. R. HUDSON, 1991 Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics 127: 565–582.

LABATE, J. A., C. H. BIERMANN, and W. F. EANES, 1999 Nucleotide variation at the runt locus in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 16: 724–731.

LACHAISE, D., M. L. CARIOU, J. R. DAVID, F. LEMEUNIER, and L. TSACAS, 1988 The origin and dispersal of the Drosophila melanogaster subgroup: a speculative paleobiogeographic essay. Evol. Biol. 22: 159–225.

LAZZARO, B. P., and A. G. CLARK, 2003 Molecular population genetics of inducible antibacterial peptide genes in Drosophila melanogaster. Mol. Biol. Evol. 20: 914–923.

MARUYAMA, T., and P. A. FUERST, 1985-a Population bottleneck and nonequilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics 111: 675–689.

MARUYAMA, T., and P. A. FUERST, 1985-b Population bottleneck and nonequilibrium models in population genetics. III. Genic homozygosity in population which experience periodic bottlenecks. Genetics 111: 691–703.

MAYNARD SMITH, J., and J. HAIGH, 1974 The hitch-hiking effect of a favorable gene. Genet. Res. 23: 23–35.

MOUSSET, S., L. BRAZIER, M.-L. CARIOU, F. CHARTOIS, F. DEPAULIS, and M. VEUILLE, 2003 Evidence of a high rate of selective sweeps in African Drosophila melanogaster. Genetics 163: 599–609.

NACHMAN, M. W., and S. L. CROWELL, 2000-a Estimate of the mutation rate per nucleotide in humans. Genetics 156: 297–304.

NACHMAN, M. W., and S. L. CROWELL, 2000-b Contrasting evolutionary histories of two introns of the Duchenne muscular dystrophy gene, Dmd, in humans. Genetics 155: 1855–1864.

NEI, M., T. MARUYAMA, and R. CHAKRABORTY, 1975 The bottleneck effect and genetic variability in populations. Evolution 29: 1–10.


NURMINSKY, D., D. DE AGUIAR, C. D. BUSTAMANTE, and D. L. HARTL, 2001 Chromosomal effects of rapid gene evolution in Drosophila melanogaster. Science 291: 128–130.

OHTA, T., and M. KIMURA, 1971 Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68: 571–580.

OTTO, S. P., 2000 Detecting the form of selection from DNA sequence data. Trends Genet. 16: 526–529.

PARSCH, J., C. D. MEIKLEJOHN, and D. L. HARTL, 2001 Patterns of DNA sequence variation suggest the recent action of positive selection in the janus-ocnus region of Drosophila simulans. Genetics 159: 647–657.

PATIL, N., A. J. BERNO, D. A. HINDS, W. A. BARRETT, J. M. DOSHI, C. R. HACKER, C. R. KAUTZER, D. H. LEE, C. MARJORIBANKS, D. P. MCDONOUGH, B. T. N. NGUYEN, M. C. NORRIS, J. B. SHEEHAN, N. SHEN, D. STERN, R. P. STOKOWSKI, D. J. THOMAS, M. O. TRULSON, K. R. VYAS, K. A. FRAZER, S. P. A. FODOR, and D. R. COX, 2001 Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294: 1719–1723.

PLUZHNIKOV, A., A. DI RIENZO, and R. R. HUDSON, 2002 Inferences about human demography based on multilocus analyses of noncoding sequences. Genetics 161: 1209–1218.

PRZEWORSKI, M., 2002 The signature of positive selection at randomly chosen loci. Genetics 160: 1179–1189.

PRZEWORSKI, M., J. D. WALL, and P. ANDOLFATTO, 2001 Recombination and the frequency spectrum in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18: 291–298.

PTAK, S. E., and M. PRZEWORSKI, 2002 Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet. 18: 559–563.

REICH, D. E., S. F. SCHAFFNER, M. J. DALY, G. MCVEAN, J. C. MULLIKIN, J. M. HIGGINS, D. J. RICHTER, E. S. LANDER, and D. ALTSHULER, 2002 Human genome sequence variation and the influence of gene history, mutation and recombination. Nat. Genet. 32: 135–142.

SABETI, P. C., D. E. REICH, J. M. HIGGINS, H. Z. P. LEVINE, D. J. RICHTER, S. F. SCHAFFNER, S. B. GABRIEL, J. V. PLATKO, N. J. PATTERSON, G. J. MCDONALD, H. C. ACKERMAN, S. J. CAMPBELL, D. ALTSHULER, R. COOPER, D. KWIATKOWSKI, R. WARD, and E. S. LANDER, 2002 Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.

SCHLÖTTERER, C., 2002 A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 160: 753–763.

SCHNEIDER, J. A., et al., 2002 Non-neutral evolution revealed by comparison of gene-based DNA sequence diversity in humans and chimpanzees. Am. J. Hum. Genet. 71 (Suppl.): abstract 1149.

STROBECK, C., 1987 Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117: 149–153.

TAJIMA, F., 1983 Evolutionary relationship of the DNA sequences in finite populations. Genetics 105: 437–460.

TAJIMA, F., 1989-a Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.

TAJIMA, F., 1989-b The effect of change in population size on DNA polymorphism. Genetics 123: 597–601.

TAKAHASHI, A., S. C. TSAUR, J. A. COYNE, and C.-I. WU, 2001 The nucleotide changes governing cuticular hydrocarbon variation and their evolution in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 98: 3920–3925.

VEUILLE, M., V. BENASSI, S. AULARD, and F. DEPAULIS, 1998 Allele-specific population structure of Drosophila melanogaster alcohol dehydrogenase at the molecular level. Genetics 149: 971–981.


VIEIRA, J., and B. CHARLESWORTH, 2000 Evidence for selection at the fused locus of Drosophila virilis. Genetics 155: 1701–1709.

WALL, J. D., 2000 A comparison of estimators of the population recombination rate. Mol. Biol. Evol. 17: 156–163.

WALL, J. D., 2003 Estimating ancestral population sizes and divergence times. Genetics 163: 395–404.

WALL, J. D., P. ANDOLFATTO, and M. PRZEWORSKI, 2002 Testing models of selection and demography in Drosophila simulans. Genetics 162: 203–216.

WATTERSON, G. A., 1975 On the number of segregating sites. Theor. Popul. Biol. 7: 256–276.

WEIR, B. S., and W. G. HILL, 1986 Nonuniform recombination within the human β-globin gene cluster. Am. J. Hum. Genet. 38: 776–778.


Supplement I

TABLE 2. Supplement A. Simple step bottleneck models for Drosophila (ρ = 3θ).

Bottleneck            θ/θ0   S               K                fMFH           D                        H
                             (µ, reject)     (µ, reject)      (µ, reject)    (µ, reject 5%, 95%)      (µ, reject)
(I)   Tb = 2000 ga    0.82   40.2, 0.0002    8.2, 0.16        0.27, 0.07     0.42, 0.0004, 0.03       -1.17, 0.02
(II)  Tb = 120,000 ga 0.85   41.3, 0.0002    9.4, 0.05        0.24, 0.03     0.36, 0.0004, 0.02       -1.12, 0.02
Eqb. Pop. (a)         0.80   39.3, 0.002     12.6, < 0.0001   0.16, 0.001    -0.01, 0.001, 0.001      0.00, 0.01

(III) Tb = 2000 ga    0.50   24.3, 0.25      3.5, 0.96        0.55, 0.56     0.88, 0.05, 0.36         -3.94, 0.21
(IV)  Tb = 120,000 ga 0.51   24.9, 0.22      4.4, 0.78        0.50, 0.42     0.85, 0.04, 0.33         -3.76, 0.19
Eqb. Pop. (a)         0.50   24.5, 0.01      11.2, 0.0002     0.20, 0.002    -0.02, 0.01, 0.01        -0.01, 0.01

Ancestral (b)         1.00   48.8, -         13.1, < 0.0001   0.14, 0.0006   -0.01, 0.0003, 0.0008    0.05, 0.004

All simulations assume ρ = 3θ, 15 chromosomes and 10,000 replicates. The bottlenecks modelled are the simple step bottlenecks corresponding to the step-recovery bottlenecks of Table 1 (Roman numerals). They use Nb = 2000 and 120,000 for variability reduced to 0.8 θ0, and Nb = 10,000 and 600,000 for variability reduced to 0.5 θ0. Tb is given in generations ago (ga). µ and reject are the mean and the rejection probability, respectively.

(a) Simulations under the SNM with θ = 12 or 7.5 (for the 0.8 and 0.5 variability reductions, respectively) and recombination ρ = 3θ.

(b) Simulations under the SNM with θ = 15 and recombination ρ = 3θ.
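The D column presumably refers to TAJIMA’s (1989-a) statistic, as the reference list suggests. That statistic compares the mean number of pairwise differences π with WATTERSON’s estimate S/a_n. A positive mean D, as in rows (I) to (IV), means that π exceeds S/a_n on average. For reference, here is a Python sketch of the standard formula. The function name is our own, and the function returns NaN when S = 0, where D is undefined.

```python
import math


def tajimas_d(pi: float, s: int, n: int) -> float:
    """Tajima's D from the mean pairwise differences pi, the number of
    segregating sites s and the sample size n (the variance is zero for n < 4)."""
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    if s == 0:
        return float("nan")  # undefined without polymorphism
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# One singleton site in 4 sequences: pi = 0.5 < S/a1, so D is negative.
print(tajimas_d(0.5, 1, 4))
```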


Supplement II

TABLE 2. Supplement B. Step-recovery bottleneck models for Drosophila (ρ = 15θ).

Bottleneck            θ/θ0   S               K                fMFH           D                        H
                             (µ, reject)     (µ, reject)      (µ, reject)    (µ, reject 5%, 95%)      (µ, reject)
(I)   Tb = 2000 ga    0.83   40.3, 0.0002    8.8, 0.07        0.25, 0.04     0.42, < 0.0001, 0.01     -1.25, 0.01
(II)  Tb = 120,000 ga 0.84   40.8, 0.0001    13.4, < 0.0001   0.14, 0.0006   0.36, < 0.0001, 0.004    -1.30, 0.01
Eqb. Pop. (a)         0.80   39.0, 0.0001    14.3, < 0.0001   0.10, < 0.0001 0.00, < 0.0001, 0.0002   -0.01, 0.001

(III) Tb = 2000 ga    0.50   24.4, 0.03      3.7, 0.96        0.54, 0.54     0.90, 0.04, 0.32         -3.83, 0.18
(IV)  Tb = 120,000 ga 0.52   25.2, 0.03      9.7, 0.02        0.33, 0.12     0.71, 0.03, 0.21         -3.63, 0.17
Eqb. Pop. (a)         0.50   24.4, 0.002     13.6, < 0.0001   0.13, < 0.0001 0.00, 0.0005, 0.0009     -0.01, 0.01

Ancestral (b)         1.00   48.8, -         14.5, < 0.0001   0.10, < 0.0001 0.0031, < 0.0001, 0.0001 -0.01, 0.0007

All simulations assume ρ = 15θ, 15 chromosomes and 10,000 replicates. The Roman numerals refer to the bottleneck models described in Table 1. µ and reject are the mean and the rejection probability, respectively.

(a) Simulations under the SNM with θ = 12 or 7.5 (for the 0.8 and 0.5 variability reductions, respectively) and recombination ρ = 15θ.

(b) Simulations under the SNM with θ = 15 and recombination ρ = 3θ.

Supplement III

Supplement IV

Supplement V

Supplement VI

Supplement VII