

Proc. 1996 Australian New Zealand Conf. on Intelligent Information Systems, 18-20 November 1996, Adelaide, Australia. Editors, Narasimhan and Jain

The Power of Mutation

Pierre A. I. Wijkman

Stockholm University and Royal Institute of Technology Department of Computer and Systems Sciences

Electrum 230, 164 40 Kista, Sweden E-mail: pierre@dsv.su.se

Abstract

Traditional theories in evolutionary biology hold that a neutral mutation can occasionally drift to fixation in a population by genetic drift. In this paper we determine the frequency of this event more precisely. We show by simulation that a single neutral mutation can spread fast in a population by genetic drift. We show (1) that the size of the spread has a power law distribution and (2) that the average number of generations required for a specific spread size has a power law distribution. We also show that sexual reproduction effectively stops large neutral mutations from spreading through the population.

1 Introduction

Nature is far superior to man in designing systems of high complexity. The process in nature that is responsible for this design is called evolution. Two fields that are closely related to evolution are evolutionary biology and evolutionary computation. Evolutionary biology builds models of evolution from observations in nature; in short, it tries to achieve a better understanding of evolution. Evolutionary computation first interprets the models from evolutionary biology in more abstract forms and then applies these abstract models in the construction of artificial, or man-made, systems; in short, it tries to apply the design principles of evolution. Evolutionary computation is accordingly directly dependent on the results from evolutionary biology. The results of evolutionary computation can, in turn, give valuable information back to evolutionary biology: researchers within evolutionary biology can use evolutionary computation to test their models in simulations with artificial environments and artificial systems. Although the two fields are very closely related, they are currently treated as two separate fields, especially from the viewpoint of the biologists. Perhaps the two fields will become united as computers become faster and more realistic models in evolutionary biology can be simulated and tested.

In this paper we work within both these fields. We study the phenomenon of the spread of neutral mutations by genetic drift. A neutral mutation is a mutation that does not change the fitness of an individual. Genetic drift is due to the sampling error that results because not all individuals in a population reproduce. The smaller the number of reproducing individuals, the smaller the probability that the new population represents the original population. Thus the individuals in any population of finite size become more similar over time. Taken together, neutral mutations and genetic drift have very different dynamics than each of them has on its own. The current view is that a neutral mutation can occasionally drift to fixation in a population by genetic drift [F86]. In this paper we determine the frequency of this event more precisely. We show by simulation that a single neutral mutation can spread fast in a population by genetic drift. More specifically, we show that the spread of a mutation is random and power law distributed. This power law distribution indicates, in the jargon of theoretical physics, that the system is in a critical state [B93] [BC92] [C91] [C92]. Spreads of all sizes occur, in contrast to normal, or Gaussian, distributions, which fall off exponentially. We also show (1) that the number of generations required for a spread of a certain size is random and power law distributed and (2) that the spread of a large neutral mutation is stopped in populations that use sexual reproduction.

Systems with this kind of dynamics often reveal their dynamic nature as flicker noise and their static nature as fractals [BC92]. By taking a closer look at the spread of neutral mutations by genetic drift we have been able to see a very different picture of the dynamics of an evolutionary system. The dynamics of this system are similar, if not identical, to the dynamics of avalanches, forest fires and earthquakes [BC92] [C91].

0-7803-3667-4/96/$5.00 © 1996 IEEE

2 Methodology

Throughout our research we have used simulation as a means to test our ideas and also to gain inspiration or guidance for further ideas. A simulation is a general method for studying a real process or system and usually consists of two steps:

1. The construction of a model, i.e. a mathematical theory, that captures the mathematical and logical relations that represent the essential features of the process or system.

2. The execution of the computations that imitate, step by step, the way that the process or system works.

Simulation is an important tool in the solution of a large number of problems, especially when a conventional mathematical solution is impossible. We have used simulation in order to achieve a better understanding of the behaviour that different models of evolution give rise to. We believe that those models that result in behaviour that is more efficient, in terms of construction or search, reflect reality to a larger extent than the other, less efficient models.

3 Spread of Small Neutral Mutations

Figure 1 shows the general algorithm used in the simulations that examine how a small neutral mutation spreads in a population. First the population is initialised to identical individuals. The population consists of 25 strings, each ten characters long, where each character can be one of two symbols. Then two arrays are initialised that count the frequency of (1) the maximum number of mutated individuals and (2) the number of generations. In an outer loop that continues for a specific number of repetitions, the following happens:

(1) In one randomly selected individual, one randomly selected character is changed.

(2) In an inner loop that continues as long as not all individuals in the population are identical, the following happens:

(2a) In asexual reproduction, one individual is selected to make one individual by making a copy of itself. In sexual reproduction, two individuals are selected to make one individual by an ordinary single crossover operation.

(2b) One individual is selected randomly in the population and is replaced by the constructed individual.

(2c) If the number of mutated individuals in the population is larger than the previous maximum, the previous maximum is updated.

(2d) The generation counter is updated.

(3) When the inner loop has finished, the first array, which counts how often a specific maximum number of mutated individuals occurred, is updated. The second array, which counts how often a specific number of generations occurred, is also updated.

It is these two arrays that are shown in graphical form in the figures below.

pop = IniPop
Ini f1[1 to popsize] = 0
Ini f2[1 to popsize] = 0

Do
    i = RndSel(pop)
    j = Change(i)
    pop = pop + j - i
    maxMuts = 1
    gen = 0

    Do
        i1 = Sel(pop)
        i2 = Sel(pop)
        i3 = MakeNew(i1, i2)
        i4 = Select(pop)
        pop = pop + i3 - i4
        muts = NumMuts(pop)
        If muts > maxMuts Then maxMuts = muts
        gen = gen + 1
    Loop Until AllIdentical(pop)

    f1[maxMuts] = f1[maxMuts] + 1
    f2[gen] = f2[gen] + 1
Loop Until Ready

Figure 1: Simulation algorithm.
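The algorithm of Figure 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the reported results: the function name `simulate` and its parameters are hypothetical, parents and the replaced individual are assumed to be drawn uniformly at random, the crossover point is assumed to be uniform between characters, and the population is reset to the unmutated string at the start of each repetition (for a single changed character this only relabels the symbols).

```python
import random
from collections import Counter

def simulate(pop_size=25, length=10, reps=2000, sexual=False, seed=0):
    """Return (f1, f2): counts of the maximum spread and of the generations
    needed until all individuals are identical again."""
    rng = random.Random(seed)
    f1, f2 = Counter(), Counter()
    base = (0,) * length                        # the unmutated string
    for _ in range(reps):
        pop = [base] * pop_size                 # assumption: restart from identical individuals
        k = rng.randrange(length)
        mutant = base[:k] + (1,) + base[k + 1:]
        pop[rng.randrange(pop_size)] = mutant   # step (1): change one random character
        max_muts, gen = 1, 0
        while len(set(pop)) > 1:                # step (2): until all are identical
            p1 = rng.choice(pop)                # assumption: uniform selection
            if sexual:
                p2 = rng.choice(pop)
                cut = rng.randrange(1, length)
                child = p1[:cut] + p2[cut:]     # step (2a): single crossover
            else:
                child = p1                      # step (2a): exact copy
            pop[rng.randrange(pop_size)] = child            # step (2b)
            muts = sum(ind != base for ind in pop)
            max_muts = max(max_muts, muts)      # step (2c)
            gen += 1                            # step (2d)
        f1[max_muts] += 1                       # step (3)
        f2[gen] += 1
    return f1, f2
```

Calling `simulate(sexual=True)` gives the sexual variant. Normalising `f1` by the number of repetitions yields an estimate of P(w); the 10^6 repetitions used in the text would take considerably longer in pure Python than the default of 2000.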

The simulations were run for a total of 10^6 repetitions. Figure 2a shows the log-log plot of the distribution P(w) of the number w of individuals that a specific mutation spreads to using asexual reproduction. Figure 2b shows the corresponding results for sexual reproduction.

Figure 2a: P(w) for asexual reproduction (log-log plot; horizontal axis w).

Figure 2b: P(w) for sexual reproduction (log-log plot; horizontal axis w).

If we exclude the last value, the plot is linear, which indicates a power law distribution: P(w) = w-'.*' for asexual reproduction and P(w) = w-'.*~ for sexual reproduction. This shows that an originally single mutation can spread fast in a population. Note that the probability for a single mutation to spread to the whole population is equal to the probability that the mutation will spread to only four individuals.
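The last remark can be compared with a simple random-walk argument for the asexual case. The derivation below is a sketch and not part of the simulation results; it assumes that both the reproducing individual and the replaced individual are drawn uniformly at random, which the algorithm description suggests but does not state explicitly. The symbols m (current number of mutated individuals) and N (population size) are introduced here for illustration only.

```latex
% Assumption: parent and replaced individual are chosen uniformly at random.
% m = current number of mutated individuals, N = population size (N = 25 here).
\begin{align*}
\Pr(m \to m+1) &= \frac{m}{N}\cdot\frac{N-m}{N}, &
\Pr(m \to m-1) &= \frac{N-m}{N}\cdot\frac{m}{N}.
\end{align*}
% Up and down steps are equally likely, so a walk started at m = 1
% reaches k before 0 with probability 1/k (gambler's ruin).
\begin{align*}
\Pr(w \ge k) &= \frac{1}{k}, \qquad 1 \le k \le N,\\
P(w) &= \frac{1}{w} - \frac{1}{w+1} = \frac{1}{w(w+1)}, \qquad 1 \le w < N,\\
P(N) &= \frac{1}{N}.
\end{align*}
```

Under these assumptions N = 25 gives P(25) = 0.04, which lies between P(4) = 0.05 and P(5) ≈ 0.033, roughly in line with the remark above. The value at w = N lies well above the line followed by the other points, which is consistent with excluding the last value from the linear fit.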

We also studied the average number of generations it takes to reach a specific number of mutated individuals. Figure 3a shows the log-log plot of the distribution P(lM) of the number of generations required on average to reach lM mutated individuals using asexual reproduction. Figure 3b shows the corresponding results for sexual reproduction.

Figure 3b: P(lM) for sexual reproduction (log-log plot; horizontal axis lM).

The linearity indicates a power law distribution: P(lM) = lM^(-1.24) for asexual reproduction and P(lM) = lM^(-1.24) for sexual reproduction. This shows that an originally single mutation can spread fast in a population.

Finally we studied the number of generations that a single mutation survives. Figure 4a shows the log-log plot of the distribution P(l) of the number of generations l that an originally single mutation survives in a population using asexual reproduction. Figure 4b shows the corresponding results for sexual reproduction.

Figure 4b: P(l) for sexual reproduction (log-log plot; horizontal axis l).

The linearity indicates a power law distribution: P(l) = I-' for asexual reproduction and P(l) = l^(-1.93) for sexual reproduction. This shows that an originally single mutation can survive a long time.
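The exponents in this section are slopes of straight lines in log-log plots. One way to estimate such a slope from a measured distribution is an ordinary least-squares fit in log-log coordinates, sketched below. The helper name `loglog_slope` and the option `drop_last` are hypothetical; the fitting procedure actually used for the figures is not described in the text, so this is only one plausible approach.

```python
import math

def loglog_slope(xs, ps, drop_last=True):
    """Least-squares slope of log10(p) against log10(x).

    drop_last mirrors the exclusion of the last value (spread to the whole
    population) when fitting P(w). Zero or negative entries are skipped
    because they have no logarithm.
    """
    pts = [(math.log10(x), math.log10(p))
           for x, p in zip(xs, ps) if x > 0 and p > 0]
    if drop_last:
        pts = pts[:-1]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    return sxy / sxx    # estimated exponent of the power law
```

For data that follow an exact power law p = x^(-a) the function returns -a up to rounding error. For simulated frequencies the estimate depends on which points are included, so it should be read as a rough indication rather than a precise exponent.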

4 Spread of Large Neutral Mutations

In the following simulations we have used the same general algorithm that we used in section 3 (see figure 1), but we now change every character, instead of just one, in the randomly selected individual. The population again consists of 25 strings, each ten characters long, where each character can be one of two symbols. The simulations were, as before, run for a total of 10^6 repetitions.

P(w), P(lM), and P(l) are again power law distributed. In addition, we measured the change d, or total accumulated number of mutations, that the individuals in the population had acquired when they had returned to their identical status by genetic drift (i.e. when the inner loop was over). In this case we only considered sexual reproduction, since asexual reproduction never mixes individuals and only produces individuals with either no change or a maximal change.

Figure 5a shows the log-log plot of the distribution P(d) of the number of mutations d in the individuals in the population when all individuals are equal. Figure 5b shows the same relation in a log plot.
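The measurement of d described above can be sketched for a single repetition as follows. The function name `final_change` and its parameters are hypothetical, and the sketch assumes uniform random selection of parents and of the replaced individual, a crossover point drawn uniformly between characters, and a population that starts from the all-zero string.

```python
import random

def final_change(pop_size=25, length=10, rng=None):
    """Run one repetition with a large mutation under sexual reproduction
    and return d, the number of changed characters in the final population."""
    rng = rng or random.Random()
    base = (0,) * length
    pop = [base] * pop_size
    pop[rng.randrange(pop_size)] = (1,) * length    # every character is changed
    while len(set(pop)) > 1:                        # until all individuals are identical
        p1, p2 = rng.choice(pop), rng.choice(pop)   # assumption: uniform selection
        cut = rng.randrange(1, length)
        child = p1[:cut] + p2[cut:]                 # single crossover
        pop[rng.randrange(pop_size)] = child
    return sum(pop[0])                              # d for the surviving string
```

Collecting the return values over many repetitions gives an estimate of P(d). Values of d strictly between 0 and the string length can only arise through crossover, which is why the asexual case is not considered here.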

Figure 5a: P(d) for sexual reproduction (log-log plot; horizontal axis d).

Figure 4a: P(l) for asexual reproduction (log-log plot; horizontal axis l).

Figure 5b: P(d) for sexual reproduction (log plot; horizontal axis d).

The linearity in the log plot indicates an exponential behaviour: P(d) = This means that an individual that originally has a maximal difference in relation to the other individuals in the population does not spread this large difference in the population.

5 Conclusion

We have performed the simulations described above for a wide range of population sizes and string sizes. The results have always been power law distributions.

In evolutionary biology, the neutralist-selectionist controversy is about whether most of the variation in a population is selectively neutral and hence largely irrelevant to a population's capacity to respond to new forces of selection, or whether the genetic variants in a population differ in fitness and so constitute the raw material for adaptation to new selective regimes [F86]. Our aim has not been to settle this dispute but to show that neutral mutations at least have a possibility to spread fast in an arbitrarily large population. Perhaps our findings can be used to explain the remarkably high frequency of 7 percent colour blindness among humans in modern societies (in contrast to only 2 percent in hunting and gathering societies), which according to current theories requires a mutation rate much larger than has actually been observed [F86].

We hope that the results of this paper can lead to a better understanding in the areas of evolutionary biology and evolutionary computation, and that the twofold goal of understanding the design principles of nature and being able to use these principles in the design of artificial systems is one step closer.

References

[B93] Per Bak, Punctuated Equilibrium and Criticality in a Simple Model of Evolution, Physical Review Letters, Volume 71, Number 24, 13 December 1993, The American Physical Society

[BC92] Per Bak & Michael Creutz, Fractals and Self-Organised Criticality, Fractals and Disordered Systems, Vol 11, Edited by A. Bunde & S. Havlin, Springer-Verlag Book, 1992

[C91] Michael Creutz, Abelian Sandpiles, Computers in Physics, March/April 1991, American Institute of Physics

[C92] Michael Creutz, On Self-Organised Criticality, Nuclear Physics B, Proceedings Supplements, North-Holland, 1992

[F86] Douglas J. Futuyma, Evolutionary Biology, Second Edition, Sinauer Associates, Inc. - Publishers, 1986