AN AUTOMATIC REGROUPING MECHANISM
TO DEAL WITH STAGNATION
IN PARTICLE SWARM
OPTIMIZATION
A Thesis
by
GEORGE I. EVERS
Submitted to the Graduate School of the
University of Texas-Pan American
In partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
May 2009
Major Subject: Electrical Engineering
COPYRIGHT 2009 George Evers
All Rights Reserved
ABSTRACT
Evers, George I., An Automatic Regrouping Mechanism to Deal with Stagnation in
Particle Swarm Optimization. Master of Science (MS), May, 2009, 96 pp., 10 tables, 32
illustrations, 31 references, 6 titles.
Particle Swarm Optimization (PSO), which was intended to be a population-based global
search method, is known to suffer from premature convergence prior to discovering the
true global minimizer. In this thesis, a novel regrouping mechanism is proposed, which
aims to liberate particles from the state of premature convergence. This is done by
automatically regrouping the swarm once particles have converged to within a pre-
specified percentage of the diameter of the search space. The degree of uncertainty
inferred from the distribution of particles at premature convergence is used to determine
the magnitude of the regrouping per dimension. The resulting PSO with regrouping
(RegPSO) provides a mechanism more efficient than repeatedly restarting the search by
making good use of the state of the swarm at premature convergence. Results suggest
that RegPSO is less problem-dependent and consequently provides more consistent
performance than the comparison algorithms across the benchmark suite used for testing.
DEDICATION
The dynamics of living amongst various cultures have been fascinating,
while the constancy of family is refreshing.
Thanks for always being there!
ACKNOWLEDGEMENT
Thanks to Dr. Mounir Ben Ghalia for his guidance and
for helping me present my ideas clearly.
TABLE OF CONTENTS
Page
ABSTRACT ....................................................................................................................... iii
DEDICATION ................................................................................................................... iv
ACKNOWLEDGEMENT .................................................................................................. v
TABLE OF CONTENTS ................................................................................................... vi
LIST OF TABLES ............................................................................................................. ix
LIST OF FIGURES ............................................................................................................. x
CHAPTER I. INTRODUCTION ...................................................................................... 1
Motivation for Particle Swarm Optimization .................................................................. 1
Optimization: A Brief Overview ................................................................................. 1
Gradient-Based Methods ............................................................................................. 3
Population-Based Heuristics........................................................................................ 4
PSO as a Member of Swarm Intelligence .................................................................... 6
Research Motivation ....................................................................................................... 8
Research Objectives ...................................................................................................... 14
Empirical Determination of Quality Parameters ....................................................... 14
Development of Regrouping Mechanism for Gbest PSO ......................................... 14
Testing of RegPSO .................................................................................................... 15
Data Comparison ....................................................................................................... 15
Explore Applicability of RegPSO to Simple Uni-Modal Problems .......................... 15
Contributions ................................................................................................................. 15
High-Quality PSO Parameters ................................................................................... 15
Development of Efficient Regrouping Mechanism ................................................... 16
Development of Regrouping Model Specifically for Uni-Modal Case ..................... 16
CHAPTER II. PARTICLE SWARM OPTIMIZATION ALGORITHM ...................... 17
Problem Formulation..................................................................................................... 17
Evolution of the PSO Algorithm ................................................................................... 17
Original PSO Algorithm ............................................................................................ 17
“Lbest” PSO .............................................................................................................. 20
Inertia ......................................................................................................................... 20
Velocity Clamping ..................................................................................................... 28
Standard “Gbest” PSO .............................................................................................. 35
Illustration of Premature Convergence ......................................................................... 42
CHAPTER III. EMPIRICAL SEARCH FOR QUALITY PSO PARAMETERS .......... 48
Rastrigin Experiment Outlined...................................................................................... 48
Independent Validation of “Social Only” PSO ............................................................. 50
Socially Refined PSO .................................................................................................... 53
CHAPTER IV. REGROUPING PARTICLE SWARM OPTIMIZATION .................. 59
Motivation for Regrouping............................................................................................ 59
Detection of Premature Convergence ........................................................................... 60
Swarm Regrouping ........................................................................................................ 61
“Gbest” PSO Continues as Usual .................................................................................. 63
Two-Dimensional Demonstration of the Regrouping Mechanism ............................... 66
CHAPTER V. TESTING AND COMPARISONS ......................................................... 76
Comparison with Standard “Gbest” & “Lbest” PSO’s ................................................. 77
Comparison with Socially Refined PSO ....................................................................... 80
Comparison with MPSO ............................................................................................... 82
Comparison with OPSO ................................................................................................ 83
RegPSO for Simple Uni-Modal Problems .................................................................... 86
CHAPTER VI. CONCLUSIONS ................................................................................... 88
REFERENCES ................................................................................................................ 91
APPENDIX (BENCHMARKS) ...................................................................................... 94
BIOGRAPHICAL SKETCH ........................................................................................... 96
LIST OF TABLES
Page
Table II-1: Effect of Velocity Clamping Percentage with Static Inertia Weight .............. 31
Table II-2: Effect of Velocity Clamping with Linearly Decreased Inertia Weight .......... 33
Table III-1: “Social-only” Gbest PSO with Slightly Negative Inertia Weight ................. 52
Table III-2: “Socially Refined” PSO with Slightly Negative Inertia Weight ................... 55
Table III-3: “Socially Refined” PSO with Small, Negative Inertia Weight ..................... 57
Table V-1: RegPSO Compared to Gbest and Lbest PSO with Neighborhood Size 2 ...... 78
Table V-2: RegPSO Compared with Socially Refined PSO ............................................ 81
Table V-3: RegPSO Compared with MPSO ..................................................................... 83
Table V-4: OPSO Compared with RegPSO both with and without Cauchy Mutation .... 85
Table V-5: A RegPSO Model for Solution Refinement Rather than Exploration ............ 87
LIST OF FIGURES
Page
Figure II-1: Velocity Clamping Pseudo Code .................................................................. 30
Figure II-2: Rastrigin Benchmark Used for 2-D Illustration ............................................ 39
Figure II-3: Swarm Initialization (Iteration 0) .................................................................. 40
Figure II-4: First Velocity Updates (Iter. 1)...................................................................... 40
Figure II-5: First Position Updates (Iter. 1) ...................................................................... 41
Figure II-6: Second Velocity Updates (Iter. 2) ................................................................. 41
Figure II-7: Second Position Updates (Iter. 2) .................................................................. 42
Figure II-8: Swarm Initialization (Iter. 0) ......................................................................... 44
Figure II-9: Converging (Iter. 10) ..................................................................................... 44
Figure II-10: Exploratory Cognition and Momenta (Iter. 20) ........................................... 45
Figure II-11: Convergence Continues (Iter. 30) ............................................................... 45
Figure II-12: Momenta Wane (Iter. 40) ............................................................................ 46
Figure II-13: Premature Convergence (Iter. 102) ............................................................. 46
Figure IV-1: RegPSO pseudo code ................................................................................... 65
Figure IV-2: Swarm Regrouped (Iter. 103) ...................................................................... 66
Figure IV-3: PSO in New Search Space (Iter. 113) .......................................................... 67
Figure IV-4: Swarm Migration (Iter. 123) ........................................................................ 67
Figure IV-5: New Well Considered (Iter. 133) ................................................................. 68
Figure IV-6: Most Bests Relocated (Iter. 143) ................................................................. 68
Figure IV-7: Swarm Collapses (Iter. 153) ........................................................................ 69
Figure IV-8: Horizontal Uncertainty (Iter. 163) ............................................................... 69
Figure IV-9: Uncertainty Remains (Iter. 173) .................................................................. 70
Figure IV-10: Becoming Convinced (Iter. 183) ............................................................... 70
Figure IV-11: Premature Convergence Detected (Iter. 219) ............................................ 71
Figure IV-12: Second Regrouping (Iter. 220)................................................................... 71
Figure IV-13: Swarm Migration (Iter. 230) ...................................................................... 72
Figure IV-14: Best Well Found (Iter. 240) ....................................................................... 72
Figure IV-15: Swarm Collapsing (Iter. 250)..................................................................... 73
Figure IV-16: Particles Swarm to the Newly Found Well (Iter. 260) .............................. 73
Figure IV-17: Convergence (Iter. 270) ............................................................................. 74
Figure IV-18: Effect of Regrouping on Cost Function Value .......................................... 74
Figure V-1: Mean Behavior of RegPSO on 30D Rastrigin .............................................. 80
CHAPTER I
INTRODUCTION
Motivation for Particle Swarm Optimization
Optimization: A Brief Overview
Optimization is the search for a set of variables that either maximize or minimize a
scalar cost function, f(x). The n-dimensional decision vector, x, consists of the n
decision variables over which the decision maker has control. The cost function is
multivariate since it depends on more than one decision variable, as is common of real-
world relationships. The decision maker desires a more efficient method than trial and
error by which to obtain a quality decision vector, which is why optimization techniques
are employed.
In general, the literature focuses on minimization since the maximum of any cost
function, f(x), is mathematically equivalent to the minimum of its additive inverse,
-f(x). In other words, any scalar function to be optimized may be treated wholly as a
minimization problem due to the symmetric relationship between the cost function and its
additive inverse across the hyperplane f(x) = 0.
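As a small illustration of this equivalence, consider the following sketch. The one-dimensional cost function here is hypothetical, chosen only for this example; a coarse grid search stands in for a real optimizer.

```python
# Minimal sketch: maximizing f(x) is equivalent to minimizing -f(x).
# The cost function below is hypothetical, chosen only for illustration.

def f(x):
    return -(x - 3.0) ** 2 + 5.0  # concave; maximum value 5 at x = 3

# Brute-force search over a coarse grid (illustration only).
grid = [i * 0.01 for i in range(-1000, 1001)]

x_max = max(grid, key=f)                # maximize f directly
x_min = min(grid, key=lambda x: -f(x))  # minimize the additive inverse

assert x_max == x_min                   # both searches find the same point
assert abs(f(x_max) - 5.0) < 1e-6
```

Both searches select the same grid point, which is why the literature can restrict attention to minimization without loss of generality.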
When each decision variable is allowed to assume all real, integer, or other values
making up the n-dimensional search space, the optimization is said to be unconstrained.
If there are further limitations on the allowable values of any decision variable, the
optimization is said to be constrained. Boundary constraints, which specify a maximum
and/or minimum value for any or all decision variables, are not necessarily considered to
constitute constrained optimization, though this would literally be the case.
If the Rocky Mountain Range, with its hills and valleys, represents an
optimization function, with the goal of the optimization problem being to find the
geographical coordinates that minimize the altitude of the function, the bottom of each
valley and depression would be a local minimum in reference to the altitude, which is the
cost function’s value. The n-dimensional coordinates or decision vector at which a local
minimum occurs is called a local minimizer or local minimum point; the decision vector
to be optimized consists of longitude in the horizontal dimension and latitude in the
vertical dimension. Since the goal for this example would be to find the lowest altitude
of the mountain range, one might simply head in a downward direction from the current
location, which would lead him to a local minimum; however, one would not necessarily
have a reason to believe that location to be the global minimum. Local optimization (LO)
methods seek to find a local minimum and, more importantly, its corresponding local
minimizer, while global optimization (GO) methods attempt to find the global minimum,
or lowest function value, and its corresponding global minimizer.
An explorer could try walking all over the mountain range recording the local
minimum (as measured by altitude) at each local minimizer (as measured in two-
dimensional space by longitude and latitude) in order to find the global minimum and its
global minimizer; but this kind of exhaustive search would be quite inefficient. As a
more efficient example of global minimization, consider a team of explorers searching
for local minima independently while sharing information with each other via walkie
talkies: in this way, the team of explorers would have a much better chance to find the
global minimum quickly and efficiently since each agent would be aware of the quality of
regions in various directions.
The Rocky Mountain range, with its multiple local minima, is an example of a
multi-modal function. As an example of a uni-modal function, imagine a relatively
smooth crater on the surface of the moon. It has only one minimum, which does not
necessarily require a team of explorers to locate.
Gradient-Based Methods
Gradient-based optimization can be likened to an explorer with a device to
calculate rate of descent in any particular direction he or she might choose to explore in
order to determine the most feasible direction in which to search. This device might also
approximate second derivative or Hessian information, by which to infer the rate of
descent of the already calculated rate of descent. Using such derivative information, a
Taylor polynomial could conceivably be constructed in each direction for our explorer so
that expected hills and valleys are generated as a contour map, and the most feasible
direction can be inferred based upon all data available in the vicinity. Our explorer,
though well equipped with an expected contour map, has only information about his
greater vicinity, which does not necessarily help him navigate from his location to the
desired global minimizer if the search space is large. Regardless of how much time he
takes to calculate the curvatures in various directions from his immediate vicinity, he has
very little information of valleys far from him. He may even be deceived by local terrain
and end up at the bottom of a steep valley that, based upon all available information, is
the best minimum available but in reality is only a local minimum to which the explorer
has prematurely settled.
To escape from the state of premature convergence, one option is to simply restart
the algorithm. This is analogous to having our explorer restart his search. However,
since the path he followed was derived entirely from his instrument’s calculations based
on the terrain visible around him, restarting from the same position would
deterministically lead to premature convergence in the same region. This is because there
is no randomness or stochasticity in the algorithm to help him avoid deceit by local
terrain. Consequently, restarting a gradient-based search requires initializing explorers to
different locations each time, though they could still be deceived by some prominent
topographical feature such as a steep valley.
Population-Based Heuristics
For these reasons, when our explorer is amidst potentially deceitful multi-modal
terrain, he might benefit more from communication with other agents dispersed
elsewhere than from the solitary use of time-consuming and deterministic calculations.
In this way, the nature of the search changes from a series of highly analytical decisions
by one agent to the synchronous movements of multiple agents.
The fact that various agents are dispersed throughout the search space allows for
consideration of multiple regions simultaneously so that deceit by any one local region is
unlikely to occur unless all agents converge to the same region before reaching the global
minimizer, in which case it is said that the agents have prematurely converged. To
further hinder premature convergence, population-based approaches tend to employ some
form of stochasticity or randomness. For example, in genetic algorithms (GA) [1-3], the
decision vector or location of each agent is considered to be a sort of DNA, and
beneficial random mutations are seized upon by offspring. Randomness, in the
algorithmic world at least, is generated by a separate algorithm, a pseudo-random number
generator, which deterministically transforms each seemingly random number into the
next so that experiments can be reproduced scientifically despite the apparent
“randomness.”
In addition to offering more resistance to premature convergence than do
gradient-based methods, the computational simplicity of population-based optimization
methods allows progress to be made in a more time-efficient manner. Population-based
approaches may be able to further reduce computational complexity by lending
themselves more easily to parallel processing. For example, using one processor per
agent might allow one phase of the code to be executed in parallel; the other phase would
then extract from each agent’s memory the new function value for a simple comparison
and write the location of the best agent back to each memory location for consideration
by all other agents before re-entering the parallel phase. If this could be done, it would
reduce the computational complexity from O(s*k) to O(k), where “s” is the number of
agents employed and “k” is the expected number of iterations.
Heuristic approaches have not necessarily been proven to produce the global
minimum with every trial or to be applicable in all cases. Rather, they have been
demonstrated to work well in general.
The speed of a population-based search heuristic can be measured in iterations,
function evaluations, or real time. Since each particle evaluates its function value at each
iteration, the number of function evaluations conducted per iteration is equal to the
number of search agents. Function evaluations seem to be the most popular measure.
Real time is not generally used since the time required to run a simulation on one
computer might not equal the time required on another computer, making real-time
comparisons from paper to paper practically impossible. Furthermore, real simulation
times may vary even on one computer due to system heating and background activity or
other activity by the user.
The time required for a function evaluation, and therefore also the time required
for an iteration, which is a set of function evaluations, depend on the computational
complexity of the algorithm, which would be better reflected by a measure of real time.
However, it is not practical to ask all researchers to use the same system for comparison,
so the traditional function evaluations will be used herein. The reader is cautioned,
however, that an algorithm requiring fewer function evaluations or iterations than another
is not necessarily faster in real time if the seemingly quicker algorithm is computationally
more complex. To compare efficiencies in real time, one could call the different
algorithms according to a cleverly alternating pattern that might involve random selection
until the desired trial size had been collected for each algorithm; however, as mentioned
previously, such an approach would make for a standalone paper whose results would not
compare well with those of other authors using different computers.
PSO as a Member of Swarm Intelligence
Particle Swarm Optimization (PSO) was introduced in 1995 by social
psychologist James Kennedy and professor and chairman of electrical and computer
engineering Russell C. Eberhart to simulate the natural swarming behavior of birds as
they search for food [4]. The test function used was f(x) = (x1 - 100)^2 + (x2 - 100)^2,
which has a minimum function value of zero at Cartesian coordinates (100, 100). In
math, this would be called a three-dimensional function as it is graphed on the three-
dimensional Cartesian coordinate system; however, in optimization the focus is on the
number of dimensions in the decision vector: since there are two decision variables to be
optimized, this is referred to as a two-dimensional optimization problem. In other words,
this particular function has two decision variables, x1 and x2, to be optimized such that
the resulting decision vector, x, minimizes the cost function, f(x).
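The two-dimensional test function described above can be written out directly; this sketch follows the formula, with the function name chosen only for illustration.

```python
# Kennedy and Eberhart's "corn field" test function, written out directly
# from the formula above: f(x) = (x1 - 100)^2 + (x2 - 100)^2.

def f(x):
    return (x[0] - 100.0) ** 2 + (x[1] - 100.0) ** 2

# Two decision variables, so this is a two-dimensional optimization problem;
# the global minimizer is (100, 100) with a minimum function value of zero.
assert f([100.0, 100.0]) == 0.0
assert f([0.0, 0.0]) == 20000.0
```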
Kennedy & Eberhart considered the global minimizer of their test function as a
type of corn field and were curious to see whether the swarm of particles would
successfully flock toward the food. As the swarm flocked toward location (100, 100),
this algorithm mimicking the social interaction of swarming or schooling creatures was
verified to be an optimization algorithm. Since that time, PSO has been shown to
converge quickly relative to other population-based optimization algorithms such as GA
while still offering good solution quality [5].
Swarm intelligence is a type of multi-agent system whereby individual agents
behave according to simple rules but interact to produce a surprisingly capable collective
behavior. PSO is one form of swarm intelligence since each particle flies through the
search space by updating its individual velocity at regular intervals toward both the best
position or location it personally has found (i.e. the personal best), and toward the
globally best position found by the entire swarm (i.e. the global best). Since the function
value of each particle is evaluated at regular intervals to determine which particle offers
the lowest function value, and since that information affects the velocity, and by
implication the direction, of every other particle, an interestingly capable collective
behavior emerges. The global best, or in some forms of PSO the neighborhood best, is
stored to a memory location that all particles access and utilize to determine their
individual velocities.
individual velocities. This models the social act of communication.
While particles begin searching between each decision variable’s initial boundary
values, these are not necessarily boundary constraints since particles are generally
allowed to search outside of each decision variable’s range of values. Some forms of
PSO, however, use the initial boundary values as boundary constraints to prevent
particles from exploring outside a fixed search space. PSO has also been shown to be
applicable to constrained nonlinear optimization problems [6].
While global optimization algorithms such as PSO are most naturally applied to
the optimization of multimodal cost functions, they can optimize unimodal functions as
well.
Other examples of swarm intelligence are Ant Colony Optimization (ACO) [7]
[8] [9] and Stochastic Diffusion Search (SDS) [10].
Research Motivation
While population-based heuristics are less susceptible to deceit due to their use of
stochasticity and direct reliance upon function values rather than derivative information,
they are nonetheless susceptible to premature convergence, which is especially the case
when there are many decision variables or dimensions to be optimized. The more
communication that occurs between agents, the more similar they tend to become until
converging to the same region of the search space. In particle swarm, if the region
converged to is a local well containing a local minimum, there may initially be hope for
escape via a sort of momentum built into the algorithm via the inertial term; over time,
however, particles’ momenta decrease until the swarm settles into a state of stagnation,
from which the basic algorithm does not offer a mechanism of escape.
While allowing particles to continue in this state may lead to solution refinement
or exploitation following the initial phase of exploration, it has been observed empirically
that after enough time, velocities may become so small that at their expected rate of
decrease, even the nearest solution may be eliminated from the portion of the search
space particles can practically be expected to reach in later iterations. In traditional PSO,
when no better global best is found by any other particle for some time, all particles
converge about the existing global best, potentially eliminating even the nearest local
minimizer.
Van den Bergh appears to have solved this particular problem with his
Guaranteed Convergence PSO (GCPSO), which uses a different velocity update equation
for the best particle [11] [12]. Because the best particle’s personal best and the global
best lie at the same point, traditional PSO inhibits its explorative abilities: the particle is
so strongly pulled toward that one point that only its waning momentum and its
accelerations toward that point keep it exploring at all. GCPSO is therefore said to
guarantee convergence to a local minimizer.
There is still a problem, however, in that particles tend to converge to a local
minimizer before encountering a true global minimizer. Addressing this problem, Van
den Bergh developed multi-start PSO (MPSO) which automatically triggers a restart
when stagnation is detected. Various criteria for detecting premature convergence were
tested in order to avoid the undesirable state of stagnation [12]: (i) Maximum Swarm
Radius, which defines stagnation as having occurred when the particle with the greatest
Euclidean distance from global best reaches a minimum threshold distance, taken as a
percentage of the original swarm radius, (ii) Cluster Analysis, which terminates the
current search when a certain percentage of the swarm has converged to within a pre-
specified Euclidean distance, and (iii) Objective Function Slope, which records the
number of iterations over which no significant improvement has been seen in the function
value returned by the global best, and terminates the current search when that number
reaches a pre-specified maximum. The first two criteria monitor the proximity of
particles to one another, and the latter monitors whether improvement has been seen
recently in the function value being optimized. Which is better seems to depend on
which is the cause of the problem: (i) proximity of particles to one another making
exploration unlikely to impossible, or (ii) function value not improving over time. Since
the former seems to be the cause of the latter, measuring particles’ proximities directly
seems like the better idea, which is consistent with the fact that Van den Bergh found the
Maximum Swarm Radius and Cluster Analysis methods to outperform the Objective
Function Slope method.
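The Maximum Swarm Radius criterion can be sketched as below. The function name and the threshold value are illustrative; the specific normalization (search-space diameter versus original swarm radius) and threshold used in this thesis are developed in later chapters.

```python
import math

# Sketch of the Maximum Swarm Radius stagnation test described above:
# premature convergence is declared when even the particle farthest from
# the global best lies within a small fraction of the search-space diameter.
# The threshold value is illustrative only.

def premature_convergence(positions, gbest, diameter, threshold=1.1e-4):
    radius = max(
        math.dist(x, gbest)  # Euclidean distance of each particle from gbest
        for x in positions
    )
    return radius / diameter < threshold
```

A tightly clustered swarm triggers the test; a dispersed one does not, which is what makes proximity a direct measure of the problem rather than a symptom of it.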
Restarting in MPSO refers to starting the search anew with a different sequence of
random numbers generated so that even initial positions are different than they were in
previous searches. At restart, particles lose their memories of the previous search so that
each search is independent of those previously conducted. After each independent
search, the global best is compared to the best global best of previous searches. After a
pre-specified number of restarts have completed, the best of all global bests is proposed
as the most desirable decision vector found over all searches.
Following this logic, one wonders if there might be a more efficient mechanism
by which the swarm could “restart.” It was thought that restarting on the original search
space might cause unnecessarily repetitious searching of regions not expected to contain
quality solutions. GCPSO might even allow the swarm to escape local optima if
parameters were designed with exploratory intentions, but this approach would
effectively leave the rest of the swarm trailing almost linearly behind the globally best
particle’s random movements, which would not be ideal. So a mechanism became
desirable by which the swarm could efficiently regroup in a region small enough to avoid
unnecessarily redundant search, yet large enough to escape wells containing local minima
in order to try to prevent stagnation while retaining memory of only one global best
rather than a history of the best of them. Consequently, there is one continuous search
with each grouping making use of previous information rather than a series of
independent searches.
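The regrouping idea motivated here can be sketched as follows. The exact per-dimension range formula and regrouping factor used by RegPSO are developed in Chapter IV; the function name and the factor value below are illustrative assumptions for this sketch.

```python
import random

# A rough sketch of regrouping: particles are re-scattered uniformly about
# the global best within a per-dimension range scaled from the swarm's
# spread at premature convergence, capped by the original search range.
# The regrouping factor shown is an illustrative placeholder.

def regroup(positions, gbest, original_range, factor=6.0 / (5.0 * 1.1e-4)):
    n = len(gbest)
    # per-dimension uncertainty: maximum deviation of any particle from gbest
    spread = [max(abs(x[j] - gbest[j]) for x in positions) for j in range(n)]
    # new range: scaled uncertainty, never larger than the original range
    new_range = [min(original_range[j], factor * spread[j]) for j in range(n)]
    # regroup particles uniformly centered on the global best
    return [[gbest[j] + new_range[j] * (random.random() - 0.5) for j in range(n)]
            for _ in positions]
```

Because the new range is derived from the swarm's own distribution at collapse, each regrouping reuses the information gathered so far rather than discarding it as a restart would.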
In 1995, James Kennedy and Russell C. Eberhart observed that if each particle is
drawn toward its neighborhood best or local best instead of directly toward the global
best of the entire swarm, particles are less likely to get stuck in local optima [13].
Neighborhoods in this Lbest PSO overlap so that information about the global best is still
transmitted throughout the swarm but more slowly so that more exploration is likely to
occur before convergence, reducing the likelihood of premature convergence. The PSO
literature seems to have focused primarily on global best PSO (Gbest PSO) due to its
relatively quick initial convergence; however, hasty decisions may be of lower quality
than those made after due consideration, and Lbest PSO appears generally to produce
higher quality solutions if given enough time to do so. Since Gbest PSO is more popular,
it is often referred to simply as PSO; however, Lbest PSO should not be overlooked.
Lbest PSO still suffers from premature convergence in some cases, as demonstrated
rather severely on the Rastrigin benchmark, on which the standard Gbest PSO also
suffers.
Wang et al. applied an opposition-based learning scheme to PSO (OPSO) along
with a Cauchy mutation of the global best so that particles are less likely to be attracted to
the same position [14]. The main objective of OPSO with Cauchy mutation is to help
avoid premature convergence on multi-modal functions. Using opposition-based learning,
two different positions are evaluated for each selected particle: the particle’s own
position and the position opposite the center of the swarm. Only for a particle lying
exactly at the center of the swarm are these two positions the same.
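The reflection step can be sketched in a few lines. This is a minimal illustration of the opposite-position idea, not the authors’ implementation; the function name and the list-based representation are assumptions:

```python
def opposite_position(x, swarm):
    """Reflect position x through the centroid (center) of the swarm."""
    n = len(x)
    center = [sum(p[j] for p in swarm) / len(swarm) for j in range(n)]
    return [2.0 * c - xj for c, xj in zip(center, x)]

# With the swarm centered at (2, 2), the opposite of (0, 2) is (4, 2);
# a particle sitting exactly at the center is its own opposite.
swarm = [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]]
print(opposite_position([0.0, 2.0], swarm))  # [4.0, 2.0]
```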
Worasucheep proposed a PSO with stagnation detection and dispersion (PSO-DD) that
detects stagnation by monitoring changes in mean velocity and best function
value, reinvigorates the swarm with velocities one hundred times larger than their levels
at stagnation, and disperses particles by up to one-tenth of one percent of the range on
each dimension [15]. In this way, diversity is infused back into the system so that the
search can continue rather than restarting and searching anew. While this idea improves
performance on some benchmarks, performance suffers considerably on the Rosenbrock
benchmark.
Balancing between the explorative tendencies of Lbest PSO and the quick
convergence of Gbest PSO, Parsopoulos and Vrahatis with their Unified PSO (UPSO)
iteratively take a weighted average of the velocities proposed by each [16, 17]. In this
way, each particle has available for consideration its personal best, its neighborhood best,
and the swarm’s global best. Particles can consequently be thought of as being more
informed. However, it may be redundant for the personal best to be considered in both
the Gbest and Lbest velocities before weighting, which could conceivably cause it to be
over-represented unless its cognitive acceleration coefficient is decreased to account for this.
Rather than averaging together the two algorithms, it might be computationally simpler to
give the velocity update equation direct access to all three bests. UPSO or some variant,
due to its incorporation of Lbest PSO, may be able to reduce the effect of premature
convergence, but published data has so far focused on the number of iterations necessary
to converge to a pre-specified solution quality and on the relative performance of UPSO,
rather than on the absolute performance of the algorithm, which would indicate how well
it avoids premature convergence in approximating the global minimizer and would
facilitate comparison with other published results.
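The unification step amounts to a per-dimension weighted average; a minimal sketch, where the unification factor u and the function name are illustrative rather than notation taken from [16, 17]:

```python
def unified_velocity(v_gbest, v_lbest, u):
    """Weighted average of the velocities proposed by Gbest and Lbest PSO:
    u = 1 recovers pure Gbest behavior, u = 0 pure Lbest behavior."""
    return [u * vg + (1.0 - u) * vl for vg, vl in zip(v_gbest, v_lbest)]

# Halfway between a Gbest-proposed and an Lbest-proposed velocity:
print(unified_velocity([2.0, 4.0], [0.0, 2.0], 0.5))  # [1.0, 3.0]
```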
Once the swarm has converged prematurely, there are at least five options: (i)
terminate the search and accept the best decision vector found as the proposed solution,
(ii) allow the search to continue and hope that the swarm will slowly refine the quality of
the proposed solution, though it is likely only an approximation of a local minimizer
rather than the desired global minimizer, (iii) restart the swarm from new locations and
search again to see if a better solution can be found as in MPSO, (iv) somehow flag
regions of the space to which particles have prematurely converged as already explored
and restart the algorithm so that each successive search is more likely to encounter the
global minimizer, or (v) reinvigorate the swarm by introducing diversity so the search can
continue more or less from the current location without having to restart and re-search
low quality regions of the search space.
Binkley and Hagiwara’s velocity-based reinitialization (VBR) shares with Van
den Bergh’s MPSO the idea of maintaining a list of global bests at stagnation. Rather
than restarting on the entire search space, however, the swarm is reinvigorated by
reinitializing velocities, which seems to be more efficient since the entire search space
does not necessarily need to be searched again. At the end of the search, the best of all
global bests is returned as the optimal value. Stagnation here is defined as the median
velocity dropping below a pre-specified threshold. The relatively difficult Rastrigin and
Rosenbrock benchmarks still present difficulty for the algorithm when the search space
consists of many dimensions [16], which is the case of primary concern in this thesis.
Research Objectives
Empirical Determination of Quality Parameters
This research firstly searches for high-quality parameters capable of performing
well in general to see how effectively proper parameter selection can prevent stagnation.
If these parameters are of high enough quality, the stagnation problem will be considered
solved. Otherwise, the regrouping concept will be developed and tested using these
parameters as a basis for comparison.
Development of Regrouping Mechanism for Gbest PSO
The second task is to develop a regrouping mechanism to liberate particles from
entrapping local wells or otherwise deceptive terrain in order to allow continued progress.
The resulting algorithm is called Regrouping PSO (RegPSO).
Testing of RegPSO
Testing will be conducted on a benchmark suite consisting of common uni-modal
and multi-modal problems of varying levels of difficulty, including the incorporation of
noise.
Data Comparison
The results of testing will be compared with (a) Gbest PSO and Lbest PSO using
somewhat standard parameters found to work well, (b) the high-quality empirically
determined parameters for Gbest PSO, and (c) other approaches that have been developed
for solving the stagnation problem.
Explore Applicability of RegPSO to Simple Uni-Modal Problems
The idea of regrouping is to help the swarm escape from the state of premature
convergence, which is primarily troublesome on multi-modal problems. However, the
potential applicability of the concept to the simple uni-modal case will be explored as
well.
Contributions
High-Quality PSO Parameters
Many parameter combinations will be tested in order to find a combination that
works well across the benchmark suite in conjunction with Gbest PSO. The resulting
parameters will serve not only as a comparison basis for RegPSO but also as a good means to
delay stagnation for applications which do not allow sufficient time for regrouping to
take effect.
Development of Efficient Regrouping Mechanism
A regrouping mechanism is developed by which to liberate particles from the
state of premature convergence so that exploration can continue. This regrouping
mechanism will make use of the state of the swarm when premature convergence is
detected in order to re-organize the swarm according to information inferred from the
swarm state. The regrouping mechanism should work better than simply restarting on the
same search space repeatedly and should still be applicable to a variety of problem types.
Development of Regrouping Model Specifically for Uni-Modal Case
Whereas the previous contribution is expected to be useful on multi-modal
functions due to its exploratory intentions, it is desirable to show the applicability of the
same mechanism to the uni-modal case. It will be shown that RegPSO can have
parameters selected so as to regroup in a tiny region in order to help particles refine
solution quality or “exploit” the proposed solution.
CHAPTER II
PARTICLE SWARM OPTIMIZATION ALGORITHM
Problem Formulation
The goal of any optimization problem is to maximize or minimize an objective
function f(x), where x is the decision vector consisting of n dimensions, or decision
variables, each a real number. Since maximization of any function f(x) is equivalent to
minimization of −f(x), the literature generally focuses on minimization without loss of
generality.

Solution x* is a global minimizer of f(x) if and only if f(x*) ≤ f(x) for all x in the
domain of f(x). The unconstrained minimization problem of consideration here can be
formulated as

    minimize f(x), where f: ℝⁿ → ℝ.    (2.1)
Evolution of the PSO Algorithm
Original PSO Algorithm
The basic idea of particles searching individually while communicating with each
other concerning the global best, in order to produce a more capable collective search,
applies to all forms of PSO, from the originally conceived algorithm through the more
capable models available today.
Particle swarm, as originally published [4], consisted of a swarm of particles each
moving or flying through the search space according to velocity update equation

    v_i(k+1) = v_i(k) + c_1 r_{1i}(k) ∘ [p_i(k) − x_i(k)] + c_2 r_{2i}(k) ∘ [g(k) − x_i(k)]    (2.2)

where
    v_i(k) is the velocity vector of particle i at iteration k,
    x_i(k) is the position vector of particle i at iteration k,
    p_i(k) is the n-dimensional personal best of particle i found from initialization
    through iteration k,
    g(k) is the n-dimensional global best of the swarm found from initialization
    through iteration k,
    c_1 is the cognitive acceleration coefficient, so named for its term’s use of the
    personal best, which can be thought of as a cognitive process whereby a particle
    remembers the best location it has encountered and tends to return to that state,
    c_2 is the social acceleration coefficient, so named for its term’s use of the global
    best, which attracts all particles, simulating social communication,
    r_{1i}(k) and r_{2i}(k) are vectors of pseudo-random numbers with components
    selected from uniform distribution U(0,1) at iteration k, and
    ∘ is the Hadamard operator representing element-wise multiplication.
The farther a particle is from its personal best, the larger ‖p_i − x_i‖ is and the
stronger the acceleration toward that point is expected to be. Notice that if a particular
dimension of the current position is greater than the same dimension of the personal best,
the acceleration on that dimension is negative, which means that the particle is pulled
back toward that location on that dimension. Of course, this implies that when the
personal best lies ahead of the current position, the particle will accelerate in the positive
direction toward the personal best, so that each particle is always pulled toward its
personal best on each dimension. Similarly, the farther a particle is from the global best,
the larger ‖g − x_i‖ is and the stronger the acceleration toward that point. The cognitive
and social acceleration coefficients, c_1 and c_2, determine the respective strengths of those
pulls and the relative importance of each best.
When each dimension of the social and cognitive terms is multiplied by a different
random number, the acceleration is not necessarily directed straight toward the
corresponding best. Were the same random number used on all dimensions, each pull
would be straight toward its best. Either way, particles are accelerated in two different
directions at once, so that they do not actually move straight toward either best.
At each iteration, the previous velocity is altered by both accelerations (and, in the
later variants discussed below, scaled by an inertia weight) in order to produce the
velocity of the next iteration.
Treating each iteration as a unit time step, a position update equation can be stated
as

    x_i(k+1) = x_i(k) + v_i(k+1)    for i = 1, 2, …, s.    (2.3)
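Equations (2.2) and (2.3) can be sketched for a single particle as follows. This is an illustrative implementation with assumed names, drawing an independent U(0,1) value per dimension for the Hadamard products:

```python
import random

def pso_step(x, v, p, g, c1, c2):
    """One update of velocity (2.2) and position (2.3) for one particle.
    x: current position, v: velocity, p: personal best, g: global best."""
    n = len(x)
    r1 = [random.random() for _ in range(n)]  # components of r_1i(k) ~ U(0,1)
    r2 = [random.random() for _ in range(n)]  # components of r_2i(k) ~ U(0,1)
    v_new = [v[j] + c1 * r1[j] * (p[j] - x[j]) + c2 * r2[j] * (g[j] - x[j])
             for j in range(n)]
    x_new = [x[j] + v_new[j] for j in range(n)]  # unit time step: x(k+1) = x(k) + v(k+1)
    return x_new, v_new
```

Note that when a particle sits exactly at both bests, the cognitive and social terms vanish regardless of the random draws, and the particle simply coasts on its previous velocity.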
“Lbest” PSO
Though Eberhart and Kennedy published the Lbest version the same year as the
Gbest version [13], it was Gbest PSO that gained prominence – apparently for its quick
initial convergence [5]. The only difference between the two is that the velocity update
equation of Lbest PSO uses a neighborhood best rather than the global best as explained
in the research motivation section. “Lbest” PSO often outperforms Gbest PSO, as
demonstrated in Table V-1, since hasty decisions often compromise solution quality
when taking more time would be practical; for real-time implementations or cases of
limited available data, however, the ability to make quick decisions, even if imperfect,
becomes valuable, so Gbest PSO may be better suited to such applications.
The velocity update equation of Lbest PSO can be formulated in vector notation as

    v_i(k+1) = v_i(k) + c_1 r_{1i}(k) ∘ [p_i(k) − x_i(k)] + c_2 r_{2i}(k) ∘ [l_i(k) − x_i(k)]    (2.4)

where l_i(k) is the local or neighborhood best of particle i at iteration k.
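One common way to realize overlapping neighborhoods is a ring topology, in which particle i’s neighborhood is itself plus its index-adjacent particles. The sketch below is illustrative; the thesis does not specify this exact scheme here:

```python
def neighborhood_best(pbests, pbest_values, i, radius=1):
    """Return l_i: the best personal best among particles whose indices
    lie within `radius` of i on a ring (indices wrap around)."""
    s = len(pbests)
    neighbors = [(i + d) % s for d in range(-radius, radius + 1)]
    j_best = min(neighbors, key=lambda j: pbest_values[j])  # minimization
    return pbests[j_best]
```

Because neighborhoods overlap, a good position found by one particle spreads around the ring a few particles per iteration rather than instantly, which is what slows the transmission of the global best relative to Gbest PSO.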
Inertia
Static Inertia Weight & Constriction Coefficient
There was a weakness inherent in velocity update equations (2.2) and (2.4) that
was fixed by the introduction of an inertia weight. For the following derivation, let k = 0
be the iteration at which particles have their positions and, optionally, their velocities
randomly initialized. Then for any particle i, the velocity at iteration k = 1 is

    v_i(1) = v_i(0) + c_1 r_{1i}(0) ∘ [p_i(0) − x_i(0)] + c_2 r_{2i}(0) ∘ [g(0) − x_i(0)].    (2.5)
Since a particle has only one position, x_i(0), from which to choose in order to determine
its personal best, p_i(0), of necessity p_i(0) = x_i(0), and the middle term of equation (2.5)
is zero, so the particle’s velocity at iteration k = 1 can more succinctly be expressed as

    v_i(1) = v_i(0) + c_2 r_{2i}(0) ∘ [g(0) − x_i(0)].    (2.6)
Using (2.2) again, the velocity of particle i at iteration k = 2 is

    v_i(2) = v_i(1) + c_1 r_{1i}(1) ∘ [p_i(1) − x_i(1)] + c_2 r_{2i}(1) ∘ [g(1) − x_i(1)].    (2.7)
Substituting the value found in (2.6) for v_i(1), the velocity at the second iteration
following initialization becomes

    v_i(2) = v_i(0) + c_2 r_{2i}(0) ∘ [g(0) − x_i(0)]
             + c_1 r_{1i}(1) ∘ [p_i(1) − x_i(1)] + c_2 r_{2i}(1) ∘ [g(1) − x_i(1)]    (2.8)
with the substituted values written first for emphasis. By velocity update equation (2.2),
the velocity of particle i at iteration k = 3 is

    v_i(3) = v_i(2) + c_1 r_{1i}(2) ∘ [p_i(2) − x_i(2)] + c_2 r_{2i}(2) ∘ [g(2) − x_i(2)].    (2.9)
Substituting for v_i(2) the value found in (2.8), the velocity at the third iteration
following initialization becomes

    v_i(3) = v_i(0) + c_2 r_{2i}(0) ∘ [g(0) − x_i(0)]
             + c_1 r_{1i}(1) ∘ [p_i(1) − x_i(1)] + c_2 r_{2i}(1) ∘ [g(1) − x_i(1)]
             + c_1 r_{1i}(2) ∘ [p_i(2) − x_i(2)] + c_2 r_{2i}(2) ∘ [g(2) − x_i(2)]    (2.10)

with the substituted values written first for emphasis.
By mathematical induction, it can be seen that

    v_i(k+1) = v_i(0) + c_1 Σ_{a=1}^{k} r_{1i}(a) ∘ [p_i(a) − x_i(a)]
                      + c_2 Σ_{a=0}^{k} r_{2i}(a) ∘ [g(a) − x_i(a)].    (2.11)
Because the personal bests and global best can only improve over time, v_i(k+1) should
rely more heavily upon recent bests than upon early values. Yet (2.11) shows that the
early information in p_i(a) and g(a), for a much smaller than k, is given just as much
opportunity to affect v_i(k+1) as the higher quality information of later iterations, since
the information of all iterations is summed without any weighting scheme by which to
increase the relative importance of the higher quality information of later iterations.
This problem is remedied by introducing either an inertia weight [17, 18], ω ∈ (0,1), or
constriction coefficient [19], χ ∈ (0,1), into velocity update equation (2.2) according to

    v_i(k+1) = ω v_i(k) + c_1 r_{1i}(k) ∘ [p_i(k) − x_i(k)] + c_2 r_{2i}(k) ∘ [g(k) − x_i(k)]    (2.12)

or

    v_i(k+1) = χ (v_i(k) + c_1 r_{1i}(k) ∘ [p_i(k) − x_i(k)] + c_2 r_{2i}(k) ∘ [g(k) − x_i(k)]).    (2.13)
Equation (2.13), which is from Clerc’s constriction models, can be rewritten as

    v_i(k+1) = χ v_i(k) + χ c_1 r_{1i}(k) ∘ [p_i(k) − x_i(k)] + χ c_2 r_{2i}(k) ∘ [g(k) − x_i(k)],    (2.14)

which then simplifies to

    v_i(k+1) = χ v_i(k) + c_3 r_{1i}(k) ∘ [p_i(k) − x_i(k)] + c_4 r_{2i}(k) ∘ [g(k) − x_i(k)]    (2.15)

where

    c_3 = χ c_1 and c_4 = χ c_2.    (2.16)

Since (2.15) is mathematically equivalent to (2.12) as a result of the acceleration
coefficients being set arbitrarily by the user prior to execution, converting between the
velocity update equation of the constriction coefficient models (2.13) and the standard
velocity update equation with inertia weight (2.12) is straightforward using (2.16).
However, the mathematical equivalence of the velocity update equations does not render
the constriction models mathematically equivalent to standard PSO since the position
updates for the former vary by type such that the velocity vector in those models does not
simply carry particles from their previous positions to their new positions. In this sense,
the velocity concept is redefined by the constriction models.
The constriction models are used in conjunction with Clerc’s equation

    χ = 2κ / |2 − φ − √(φ² − 4φ)|,  where φ = c_1 + c_2,  φ > 4,  κ ∈ [0,1],    (2.17)

which recommends a value for the constriction coefficient based on preselected values of
the acceleration coefficients, where smaller values of κ lead to quick convergence and
larger values allow more exploration.
Equation (2.17) is based on theoretical studies of particle trajectories, but since it
was hoped that the constriction models would eliminate the need for velocity clamping
[19], the calculations that led to (2.17) did not account for the velocity clamping value,
though it affects particles’ trajectories. This is unfortunately somewhat of a weakness in
the model: empirical testing suggests that all PSO parameters are inter-related, so (2.17)
would be more useful if it accounted for the velocity clamping value, which has
continued to be beneficial, as discussed in the following section.
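As a numeric check of (2.16) and (2.17), the commonly used c_1 = c_2 = 2.05 with κ = 1 yields χ ≈ 0.72984 and equivalent acceleration coefficients of ≈ 1.49618, matching the parameters listed for Table II-1. The function names below are illustrative:

```python
import math

def constriction_coefficient(c1, c2, kappa=1.0):
    """Clerc's constriction coefficient, equation (2.17); valid for
    phi = c1 + c2 > 4 and kappa in [0, 1]."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("(2.17) assumes phi = c1 + c2 > 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

chi = constriction_coefficient(2.05, 2.05)   # ≈ 0.72984
c3, c4 = chi * 2.05, chi * 2.05              # (2.16): ≈ 1.49618 each
print(round(chi, 5), round(c3, 5))           # 0.72984 1.49618
```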
The same process that led to (2.11), beginning instead with velocity update equation
(2.12) with inertia weight, leads to

    v_i(k+1) = ω^{k+1} v_i(0) + c_1 Σ_{a=1}^{k} ω^{k−a} r_{1i}(a) ∘ [p_i(a) − x_i(a)]
                              + c_2 Σ_{a=0}^{k} ω^{k−a} r_{2i}(a) ∘ [g(a) − x_i(a)].    (2.18)
So long as the inertia weight has a magnitude less than one, (2.18) shows that past
personal bests are expected to have less effect on a particle’s velocity at iteration k+1
than more recent personal bests, due to the effect of multiplication at each iteration by the
inertia weight, ω. This makes sense conceptually since recent bests, both global and
personal, are expected to be of higher quality than past bests. However, past bests could
still have more effect on a particle’s overall velocity than recent bests for a while at the
beginning of the search, since ‖g(a) − x_i(a)‖, at least, is generally more significant in early
iterations when the swarm is more spread out.
Additionally, a particle’s initial velocity, which is not derived from any
information but randomly initialized to lie between the upper and lower velocity
clamping values, has less effect over time. This too makes sense, because its main
benefit comes in early iterations, where it provides momentum by which to propel the
best particle; after some time it effectively becomes noise diluting actual information.
Setting ω = 1 would make velocity update equation (2.12) with inertia weight
equivalent to velocity update equation (2.2) without inertia weight, so that (2.12) can be
accepted without a rigorous proof demonstrating its superiority to (2.2), since it simply
provides more options. So long as ω ∈ (0,1), velocity update (2.12) helps particles forget
their lower-quality past positions in order to be more affected by the higher-quality
information of late, which seems to make more sense conceptually.
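The forgetting effect in (2.18) is easy to see numerically: the social or cognitive information from iteration a is scaled by ω^(k−a), so older terms carry geometrically smaller weights. A small illustration using the static weight listed for Table II-1:

```python
def information_weights(omega, k):
    """Weight omega**(k - a) that (2.18) applies to the term contributed
    at iteration a when forming the velocity v(k+1)."""
    return [omega ** (k - a) for a in range(k + 1)]

w = information_weights(0.72984, 10)
# The most recent term (a = k) keeps full weight 1, while the term from
# iteration a = 0 has already decayed to roughly 0.043.
print(round(w[0], 3), w[-1])  # 0.043 1.0
```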
Time-Varying Inertia Weight
Decreasing the inertia weight over time would still allow the swarm to gradually
forget early information of relatively low quality, as in the static case, due to the iterative
multiplication of all past information by a fraction of one as in equation (2.18). For the
decreasing weight, however, information is forgotten more quickly than were the initial
value held constant. This time-decreasing weighting of information may provide more
balance between early and recent information since early information is forgotten at a
slower rate than later information due to the use of relatively large weights early in the
simulation. In other words, all memory is adversely affected, but short-term memory is
affected most. This potentially more balanced weighting of early information with late
information might help the standard algorithm postpone premature convergence to
candidate solutions of later iterations when appropriate initial and final values are used.
The decreasing inertia weight also allows early weights to be larger than were a
static weight used throughout the search. This corresponds to larger velocities early in
the search than would otherwise be seen, which may help postpone premature
convergence by facilitating exploration early in the search. The rate of decrease from
initial weight to final weight depends on the expected length of the simulation since the
step size is a fraction of the total number of iterations expected; hence, the amount of
time spent in the relatively explorative phase, as determined by the amount of time for
which the decreasing weight is larger than the value that would have been used for a
static weight, also depends on the expected length of the simulation. Table II-2 shows
data generated by decreasing the inertia weight gradually from 0.9 to 0.4 over the course
of 800,000 function evaluations.
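The decreasing schedule described above can be sketched as a linear interpolation over the expected length of the run. This is an illustrative helper; whether the weight is stepped per iteration or per function evaluation is an implementation choice:

```python
def linearly_decreasing_weight(k, k_max, w_start=0.9, w_end=0.4):
    """Linearly interpolate the inertia weight from w_start down to w_end
    over the expected run length of k_max iterations."""
    return w_start - (w_start - w_end) * (k / k_max)

# Start, midpoint, and end of a 1,000-iteration run:
print(linearly_decreasing_weight(0, 1000),
      linearly_decreasing_weight(500, 1000),
      linearly_decreasing_weight(1000, 1000))
```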
Increasing the inertia weight, on the other hand, would cause past information to
be forgotten more rapidly than recent information due to the weighting distribution, thus
tremendously increasing the importance of the higher quality information of later
iterations. For the right range of values, this could conceptually lead to quicker initial
convergence due to less diversity being maintained; however, this could adversely affect
solution quality on difficult functions by upsetting the balance between exploration and
exploitation.
Quick convergence is desirable when successfully converging to a global
minimizer, but it is undesirable when the search is so hasty as to converge prematurely to
a local minimizer. There is a delicate balance to achieve in order to search efficiently yet
thoroughly. The time-varying weight attempts to improve that balance as inferred from
equation (2.18), which shows that at any iteration a particle’s velocity vector is the result
of weighted attractions toward past information, which frames a time-varying inertia
weight as affecting the balance between the rates of short-term and long-term
forgetfulness.
The first study to vary the inertia weight decreased it with the idea that this would
help particles converge upon and refine a solution by reducing velocities over time. This
appeared to work better over the thirty trials conducted [17]; but with only one
benchmark tested, it is conceivable that this might have been a characteristic of that
particular benchmark, which would be consistent with the findings of Meissner et al. [20],
who used particle swarm to optimize its own parameters with very different parameters
being proposed per benchmark – including an increasing inertia weight on some
benchmarks and a decreasing weight on others. Since that experiment used Gbest PSO,
which tends to stagnate before reaching a global minimizer, the parameter combinations
recommended are likely not ideal, though they may be approximations of quality local
minimizers.
Whereas [20] found an increasing weight to outperform on some benchmarks,
[21] suggested that an increasing inertia weight outperformed on all benchmarks tested;
however, a different formulation of PSO was used so that the quicker convergence
claimed could not be attributed to the increasing weight alone. In an attempt to reproduce
the results of [21] using standard Gbest PSO, increasing the inertia weight from 0.4 to 0.9
with the same swarm size of 40 particles, acceleration constants 1.49618, and 1,000
iterations as used in the paper resulted in worse performance on all nine benchmarks
relative to decreasing the weight from 0.9 to 0.4. Therefore, decreasing the weight
appears better than increasing it, at least for the range between 0.9 and 0.4. When the
static weight was compared to decreasing, however, only the Ackley and Rastrigin
benchmarks saw much improvement from decreasing the weight; and performance on
Rosenbrock suffered from the decrease, so that decreasing the inertia weight is not
always best as can be seen by comparing the data of Table II-2 with that of Table II-1.
It is noteworthy that Naka and Fukuyama showed a decrease from 0.9 to 0.4 to
considerably outperform decreases from 2.0 to 0.9 and from 2.0 to 0.4 on their particular
state estimation problem [22], but they did not generate any comparison data using the
static inertia weight. Table II-1 and Table II-2 compare the performance of static and
decreasing inertia weights on some popular benchmark problems.
Velocity Clamping
Eberhart and Kennedy introduced velocity clamping, which helps particles take
reasonably sized steps in order to comb through the search space rather than bouncing
about excessively [13]. Clerc had hoped to alleviate the need for velocity clamping with
his constriction models [19]. Eberhart, however, showed clamping to improve
performance even when parameters are selected according to a simplified constriction
model (2.17) [18]. Clerc then compared equation (2.13) with velocity clamping to his
other constriction models without velocity clamping and concurred that velocity
clamping does offer considerable improvement even when parameters are selected
according to (2.17), so that the constriction models have not eliminated the benefit of
velocity clamping [19]. Consequently, velocity clamping has become a standard feature
of PSO.
Velocity clamping is done by first calculating the range of the search space on
each dimension, which is done by subtracting the lower bound from the upper bound.
For example, if each dimension of the search space is defined by lower and upper bounds
[−100, 100], the range of the search space is 200 per dimension. Velocities are then
clamped to a percentage of that range according to

    v_j^max = λ · range_j(Ω),  λ ∈ (0,1]    (2.19)

where

    range_j(Ω) = x_j^U − x_j^L,  for j = 1, 2, …, n,    (2.20)

and search space Ω is defined in (2.21).
For the commonly used clamping value of λ = 0.5, if the center of the search space
lies at the origin of Euclidean space, the maximum velocity is simply the upper bound of
the search space; for example, a search space defined by [−100, 100] on any dimension
would have v_j^max = 100. However, it is not desirable to define maximum velocity
explicitly in terms of the upper bound of the search space since this assumes that the
origin of Euclidean space will always be either the center or lower bound of the intended
search space so that the upper bound is proportional to the range to be explored – neither
of which is necessarily a valid assumption since some application problems have decision
variables, such as length, defined only for positive values, the lower bound of which may
not even be zero. Consequently, it is preferable to define the maximum velocity more
generally in terms of the range of the search space as in (2.19). If, for example, particles
are initialized on [100, 300] for any particular decision variable, they should logically
have the same maximum velocity as if they were initialized on [-100, 100], since the
same distance is expected to be traversed in either case.
The same maximum velocity should be applied in both the positive and negative
directions in order to avoid biasing the search in either the positive or negative direction.
The following pseudo code shows how velocities proposed by velocity update equation
(2.12) are clamped prior to usage in position update equation (2.3).
    if v_ij(k+1) > +v_j^max
        v_ij(k+1) = +v_j^max
    else if v_ij(k+1) < −v_j^max
        v_ij(k+1) = −v_j^max
    end if

Figure II-1: Velocity Clamping Pseudo Code
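The pseudo code of Figure II-1, together with (2.19), can be sketched in Python as follows (an illustration; the names are assumptions):

```python
def clamp_velocity(v, v_max):
    """Clamp each component of v to [-v_max[j], +v_max[j]] per Figure II-1;
    v_max[j] is lambda times the range of dimension j, per (2.19)."""
    return [max(-vm, min(vm, vj)) for vj, vm in zip(v, v_max)]

# lambda = 0.15 on a search space of [-100, 100] per dimension: range = 200,
# so v_max = 30 on each dimension.
v_max = [0.15 * 200.0] * 3
print(clamp_velocity([45.0, -12.0, -80.0], v_max))  # [30.0, -12.0, -30.0]
```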
As noted by Engelbrecht [23], clamping a particle’s velocity changes not only the
step size, but usually also the particle’s direction since changing any component of a
vector changes that vector’s direction unless each component should happen to be
reduced by the same percentage. This should not be thought of as a problem, however,
since each dimension is to be optimized independently, and the particle still moves
toward the global best on each dimension, though at a less intense speed. Since the
maximum iterative movement toward global best on any dimension is clamped, particles
may be thought of as combing the search space a bit more thoroughly than were their
velocities unclamped.
Though the same velocity clamping percentage of fifty percent is used in most
papers for the sake of comparison, the value does not appear to have been optimized yet.
Liu et al. suggested a value of fifteen percent [24], which has been empirically verified to
work well as shown in Table II-1 and Table II-2.
Table II-1: Effect of Velocity Clamping Percentage with Static Inertia Weight
Gbest PSO; 800,000 function evaluations; s = 20, c1 = c2 = 1.49618, ω = 0.72984

Benchmark (n)          Statistic   λ = 0.15      λ = 0.5       λ = 1         No Clamping
Ackley (30)            Median      2.4952        3.1206        3.6812        3.6544
                       Mean        2.5311        3.6524        3.9281        4.003
                       Minimum     4.4409e-15    1.5017        0.9313        1.6462
                       Maximum     5.4122        7.0836        7.8162        9.6772
                       Std. Dev.   1.0969        1.4975        1.6167        1.8326
Griewangk (30)         Median      0.040416      0.049122      0.052756      0.042845
                       Mean        0.1182        0.055008      0.12784       0.070346
                       Minimum     0             0             0             0
                       Maximum     2.2675        0.15666       0.95838       0.46229
                       Std. Dev.   0.3289        0.044639      0.21229       0.094686
Quadric (30)           Median      1.9076e-80    1.6824e-79    4.4508e-79    9.1067e-80
                       Mean        4.7513e-74    4.1822e-75    4.0909e-75    2.4315e-76
                       Minimum     1.3422e-84    4.146e-84     2.253e-83     6.3926e-84
                       Maximum     1.469e-72     2.0732e-73    1.8646e-73    1.1208e-74
                       Std. Dev.   2.4174e-73    2.9314e-74    2.6362e-74    1.5841e-75
Quartic w/ noise (30)  Median      0.0016193     0.00272       0.0023147     0.0030738
                       Mean        0.0027877     0.0039438     0.0044615     0.0053736
                       Minimum     0.00042632    0.00060861    0.00077887    0.00069323
                       Maximum     0.016272      0.019695      0.067881      0.031762
                       Std. Dev.   0.0030928     0.0040209     0.0095446     0.0063947
Rastrigin (30)         Median      51.7378       70.64194      75.11921      83.0789
                       Mean        56.1753       71.63686      75.81567      83.636
                       Minimum     24.874        42.78316      38.80337      41.7882
                       Maximum     91.536        116.4097      133.324       136.3089
                       Std. Dev.   14.4256       17.1532       22.04992      19.63376
Rosenbrock (30)        Median      1.6095e-9     5.35546e-9    2.34906e-8    4.828e-9
                       Mean        1.2763        2.06915       1.49279       1.87934
                       Minimum     1.7652e-17    2.68986e-18   1.2411e-18    3.05779e-19
                       Maximum     9.9657        13.315        10.101        18.6845
                       Std. Dev.   2.1661        3.1387        2.42602       3.60868
Schaffer’s f6 (2)      Median      0             0             0             0
                       Mean        0.0038864     0.0033034     0.0025261     0.004275
                       Minimum     0             0             0             0
                       Maximum     0.0097159     0.0097159     0.0097159     0.0097159
                       Std. Dev.   0.0048081     0.0046492     0.004305      0.0048718
Sphere (30)            Median      0             0             0             0
                       Mean        4.6936e-322   2.4703e-323   0             6.4229e-323
                       Minimum     0             0             0             0
                       Maximum     2.332e-320    8.745e-322    3.4585e-323   2.7223e-321
                       Std. Dev.   0             0             0             0
Weighted Sphere (30)   Median      0             0             0             0
                       Mean        5.0889e-322   1.0869e-321   6.9169e-323   4.1007e-322
                       Minimum     0             0             0             0
                       Maximum     2.4012e-320   5.3903e-320   1.5563e-321   1.6601e-320
                       Std. Dev.   0             0             0             0
Clamping velocities to fifteen percent provided noticeably better performance in
median and mean values on multi-modal functions of high dimensions, where cautious
step sizes in light of new information proved most beneficial; Griewangk was the
exception, since one poorly performing trial significantly affected the mean function
value. Smaller step sizes seem to have helped avoid premature convergence to
sub-optimal local minimizers. It appears that the standard velocity clamping value of fifty
percent widely used in the literature can be improved upon, and fifteen percent seems to
work well in agreement with Liu’s observation based on primarily different benchmarks
of low dimensions [24].
To determine whether fifteen percent is also a good clamping percentage in
conjunction with the linearly decreasing weight, each trial was repeated in Table II-2
using the same initial positions and sequences of random numbers used to generate each
row of Table II-1.
Table II-2: Effect of Velocity Clamping Percentage with Decreasing Inertia Weight
Gbest PSO; 800,000 evaluations; s = 20, c1 = c2 = 1.49618, ω from 0.9 to 0.4 linearly

Benchmark (n)          Statistic   λ = 0.15      λ = 0.5       λ = 1         No Clamping
Ackley (30)            Median      7.9936e-15    7.9936e-15    7.9936e-15    2.36167
                       Mean        1.1191e-14    1.0196e-14    9.4147e-15    3.58269
                       Minimum     4.4409e-15    4.4409e-15    4.4409e-15    3.9968e-14
                       Maximum     4.3521e-14    2.931e-14     2.2204e-14    20.8328
                       Std. Dev.   8.0648e-15    4.9149e-15    3.2892e-15    3.77189
Griewangk (30)         Median      0.012319      0.022141      0.01109       0.017239
                       Mean        0.022023      0.028174      0.018645      0.025321
                       Minimum     0             0             0             0
                       Maximum     0.090322      0.11254       0.11942       0.11743
                       Std. Dev.   0.024071      0.027334      0.023904      0.02732
Quadric (30)           Median      1.2644e-17    3.6766e-17    3.8379e-16    1.3274e-15
                       Mean        2.3189e-14    1.4071e-12    8.577e-13     8.9083e-11
                       Minimum     2.6219e-22    1.0224e-22    9.158e-20     7.0557e-21
                       Maximum     8.6952e-13    5.0725e-11    1.6719e-11    4.0182e-9
                       Std. Dev.   1.2438e-13    7.6037e-12    3.3624e-12    5.6939e-10
Quartic w/ noise (30)  Median      0.0015335     0.0019085     0.0022996     0.0024834
                       Mean        0.0015241     0.0021906     0.002396      0.0028831
                       Minimum     0.00033276    0.00086674    0.00081094    0.00075789
                       Maximum     0.0028314     0.007024      0.0071013     0.0078216
                       Std. Dev.   0.00065649    0.0010925     0.0013029     0.0015665
Rastrigin (30)         Median      24.3765       25.8689       30.3462       40.7933
                       Mean        25.252        27.4808       31.4805       42.6439
                       Minimum     13.9294       8.95463       10.9445       18.9042
                       Maximum     42.7832       48.7529       57.7075       72.6317
                       Std. Dev.   7.06661       8.29488       9.95111       11.5052
Rosenbrock (30)        Median      8.63847       8.655254      6.576994      11.81142
                       Mean        18.859        23.60514      16.49861      29.77723
                       Minimum     0.000292151   5.062084e-5   3.950323e-5   8.374797e-6
                       Maximum     81.5558       143.8422      103.6101      571.7925
                       Std. Dev.   25.9117       32.00213      24.48977      82.17547
Schaffer’s f6 (2)      Median      0             0             0             0
                       Mean        0             0             0             0
                       Minimum     0             0             0             0
                       Maximum     0             0             0             0
                       Std. Dev.   0             0             0             0
Sphere (30)            Median      2.8331e-106   9.4358e-106   1.5402e-104   1.4202e-101
                       Mean        1.0834e-94    1.5939e-88    1.5349e-90    1.6172e-90
                       Minimum     6.4959e-117   1.0629e-123   8.4934e-115   3.1766e-114
                       Maximum     4.885e-93     7.969e-87     7.5037e-89    7.9451e-89
                       Std. Dev.   6.9319e-94    1.127e-87     1.061e-89     1.1233e-89
Weighted Sphere (30)   Median      9.6085e-104   1.0392e-102   3.1051e-103   3.8037e-99
                       Mean        4.4182e-93    6.1741e-91    4.9532e-96    7.185e-90
                       Minimum     7.6884e-121   5.7277e-115   1.4453e-115   5.6992e-112
                       Maximum     1.6078e-91    3.0868e-89    1.5059e-94    3.5536e-88
                       Std. Dev.   2.3941e-92    4.3654e-90    2.4592e-95    5.0246e-89
On Rastrigin and the noisy Quartic, results were again better for velocities
clamped to fifteen percent of the range of the search space, which was the best percentage
from Table II-1. The best results on Rosenbrock in conjunction with the decreasing
weight were obtained by clamping velocities to a maximum step size equal to the full
range of the search space, which differs from what was seen in Table II-1. Other
performance differences were not of considerable magnitude. Fifteen percent appears to
be the best velocity clamping value for most of the benchmarks tested in Table II-1 and
Table II-2.
Comparison between Table II-1 and Table II-2 shows that the decreasing inertia weight had an adverse effect on performance on the simple uni-modal Quadric, Sphere, and Weighted Sphere benchmarks as well as on the more difficult uni-modal Rosenbrock, whereas it improved performance on the multi-modal Ackley, Griewangk, Rastrigin, and Schaffer’s f6 and on the essentially multi-modal Quartic with noise. Comparing the numbers of iterations required to produce a small function value showed that the static weight produced quicker convergence across the benchmark suite; however, this quicker convergence led to stagnation on multi-modal functions sooner than did the slower convergence achieved by the linearly decreased weight. In other words, the weighting of past information via the time-decreasing inertia weight is problem-dependent: the balance achieved by decreasing the weight was best for relatively difficult multi-modal functions, for which the increased exploration resulting from a larger weight in early iterations proved beneficial, while the quicker convergence of the static weight was better for the simpler,
uni-modal functions.
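The linear schedule compared above reduces to a one-line formula; the function name is illustrative, with the 0.9-to-0.4 endpoints taken from the Table II-2 header.

```python
def linear_inertia_weight(k, k_max, w_start=0.9, w_end=0.4):
    """Inertia weight decreased linearly from w_start at iteration 0
    to w_end at iteration k_max, as in the runs of Table II-2."""
    return w_start - (w_start - w_end) * (k / k_max)

# The weight passes from 0.9 through 0.65 to 0.4 over the run:
ws = [linear_inertia_weight(k, 1000) for k in (0, 500, 1000)]
```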
Standard “Gbest” PSO
Objective
Elaborating on (2.1), repeated below for convenience, the optimization problem considered herein is to

    minimize f(x),   f : R^n → R,   (2.1)

where f is the objective function, or cost function, of an application problem. The decision vector, x ∈ R^n, consists of the n decision variables to be optimized, thus producing the most desirable function value. A decision vector is called the global minimizer if it produces the optimal function value, called the true global minimum. Even though (2.1) is considered an unconstrained optimization problem, in practice only solutions belonging to a subset Ω ⊂ R^n are considered feasible. The search space is defined by the subset

    Ω = [x_1^L, x_1^U] × [x_2^L, x_2^U] × ⋯ × [x_n^L, x_n^U] ⊂ R^n,   (2.21)

where x_j^L and x_j^U are, respectively, the lower and upper bounds of the search space along dimension j for j = 1, 2, …, n.
Initializations
For a particle swarm of size s, each particle, x_i = [x_i^1, x_i^2, …, x_i^n], represents a potential solution to the optimization problem, where particles are indexed i = 1, 2, …, s. The swarm is initialized by randomizing particles’ positions about the center, center(Ω), of the search space, Ω, using random numbers drawn from a uniform distribution so that no portion of the search space is preferred over any other, as would result from random numbers being drawn from a normal distribution. This can be done according to (2.22) below:

    x_i(k = 0) = center(Ω) + r_i ∘ range(Ω) − (1/2)·range(Ω),   (2.22)

where r_i = [r_i^1, r_i^2, …, r_i^n] with each r_i^j ~ U(0,1) randomly selected, range(Ω) = [x_1^U − x_1^L, …, x_n^U − x_n^L], and

    center(Ω) = [(x_1^L + x_1^U)/2, (x_2^L + x_2^U)/2, …, (x_n^L + x_n^U)/2],

which is usually known in advance rather than calculated.
The personal bests are then initialized to be the same as the initial positions since there are no past positions with which to compare:

    p_i(k = 0) = x_i(k = 0).   (2.23)
Let P(k) = {p_1(k), p_2(k), …, p_s(k)} be the set of all personal bests at iteration k. In Gbest PSO, the global best is initialized and iteratively updated to be the best of all personal bests according to

    g(k) = arg min_{p_i(k) ∈ P(k)} f(p_i(k)).   (2.24)
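Equations (2.22)–(2.24) can be sketched in a few lines; the helper names are hypothetical, and a seeded NumPy generator stands in for whatever random-number sequences an implementation records.

```python
import numpy as np

rng = np.random.default_rng(0)  # stands in for saved random sequences

def initialize_swarm(s, x_min, x_max):
    """Uniform initialization over the search space Omega per (2.22),
    with personal bests set to the initial positions per (2.23)."""
    x_min = np.asarray(x_min, float)
    x_max = np.asarray(x_max, float)
    center = (x_min + x_max) / 2.0          # center(Omega)
    width = x_max - x_min                   # range(Omega)
    r = rng.random((s, x_min.size))         # r_i^j ~ U(0,1)
    x = center + r * width - width / 2.0    # (2.22)
    return x, x.copy()                      # positions and personal bests

def global_best(p, f):
    """Gbest initialization/update per (2.24): best of all personal bests."""
    return p[np.argmin([f(pi) for pi in p])]

sphere = lambda z: float(np.sum(z ** 2))
x, p = initialize_swarm(s=20, x_min=[-100.0, -100.0], x_max=[100.0, 100.0])
g = global_best(p, sphere)
```

Note that center + r∘range − range/2 is algebraically the same as x^L + r∘(x^U − x^L), so every position lands inside Ω.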
The value of each particle’s velocity along dimension j is initially randomized to lie within [−v_j^max, v_j^max] according to

    v_i(k = 0) = 2·r_i ∘ v^max − v^max,   (2.25)

where r_i = [r_i^1, r_i^2, …, r_i^n] with each r_i^j ~ U(0,1) randomly selected,
and subsequently clamped to lie within the same range since particles should only need to step through some maximum percentage of the search space per iteration. Before velocity clamping was implemented, particles were prone to roam far outside the bounds of the search space [25]. The value of v_j^max is selected as a percentage, λ, of the range of the search space along dimension j according to (2.19) and (2.20), repeated below [26]:

    v_j^max = λ·range_j(Ω),   λ ∈ (0, 1],   (2.19)

    range_j(Ω) = x_j^U − x_j^L,   for j = 1, 2, …, n.   (2.20)

range_j(Ω) represents the range of the search space along dimension j, where Ω can be thought of as the hypercube to be searched, with range_j(Ω) being the length of that hypercube along dimension j. The velocity clamping percentage, λ, is usually chosen within the range [0.1, 0.5].
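Equations (2.19), (2.20), and (2.25) translate directly into a velocity-initialization sketch; the names are hypothetical and the Rastrigin-style bounds are used only as an example.

```python
import numpy as np

rng = np.random.default_rng(1)

def v_max_per_dimension(x_min, x_max, lam=0.15):
    """(2.19)-(2.20): v_j^max = lam * (x_j^U - x_j^L), with lam in (0, 1]."""
    return lam * (np.asarray(x_max, float) - np.asarray(x_min, float))

def initialize_velocities(s, v_max):
    """(2.25): each component drawn uniformly from [-v_j^max, +v_j^max]."""
    r = rng.random((s, v_max.size))   # r_i^j ~ U(0,1)
    return 2.0 * r * v_max - v_max

v_max = v_max_per_dimension([-5.12] * 30, [5.12] * 30)   # example bounds
v = initialize_velocities(s=10, v_max=v_max)
```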
Iterative Swarm Motion
Iteratively, particle i moves from its current position to a new position along velocity vector v_i = [v_i^1, v_i^2, …, v_i^n] according to position update equation (2.3), restated below, where the velocity may be thought of as being multiplied by a unit time step of one iteration:

    x_i(k + 1) = x_i(k) + v_i(k + 1),   for i = 1, 2, …, s.   (2.3)
The velocity is first calculated according to velocity update equation (2.12),
restated below.
    v_i(k + 1) = ω·v_i(k) + c1·r_{1i}(k) ∘ (p_i(k) − x_i(k)) + c2·r_{2i}(k) ∘ (g(k) − x_i(k)),
        for i = 1, 2, …, s.   (2.12)
As the stepping process according to velocity and position update equations (2.12)
and (2.3) continues, particles update their personal bests as they encounter better
positions than encountered previously. At any point in time, the best of all personal bests
is the swarm’s global best shared freely between particles. Particles eventually converge,
via their communication of the global best and collective movement toward it, to the one
best position they have found. The algorithm can be allowed to run either for a number
of iterations expected to produce a good solution or until a user-specified criterion or
threshold is reached.
Each particle keeps a memory of its personal best, p_i(k), for its own consideration; this is the n-dimensional location, or position, that has produced the best function value over the particle’s search through the current iteration. Each personal best is updated only when the particle’s new position at iteration k + 1 yields a better function value than does the personal best at iteration k, as shown below:

    p_i(k + 1) = x_i(k + 1)   if f(x_i(k + 1)) < f(p_i(k)),
    p_i(k + 1) = p_i(k)       if f(x_i(k + 1)) ≥ f(p_i(k)).   (2.26)
In Gbest PSO the global best, g(k), is iteratively updated according to the same equation (2.24) by which it was initialized. It is then “communicated” via shared computer memory to all particles for consideration.
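Putting the update equations together, one Gbest PSO iteration looks roughly as follows; this is a sketch with hypothetical names, not code from the thesis, and r1, r2 are drawn independently per particle and per dimension.

```python
import numpy as np

rng = np.random.default_rng(2)

def gbest_pso_step(x, v, p, g, f, w=0.72984, c1=1.49618, c2=1.49618, v_max=None):
    """One Gbest PSO iteration: velocity update (2.12) with clamping,
    position update (2.3), personal-best update (2.26), and global-best
    update (2.24)."""
    s, n = x.shape
    r1 = rng.random((s, n))                               # fresh U(0,1) draws
    r2 = rng.random((s, n))
    v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)     # (2.12)
    if v_max is not None:
        v = np.clip(v, -v_max, v_max)                     # velocity clamping
    x = x + v                                             # (2.3)
    f_x = np.apply_along_axis(f, 1, x)
    f_p = np.apply_along_axis(f, 1, p)
    p = np.where((f_x < f_p)[:, None], x, p)              # (2.26)
    g = p[np.argmin(np.apply_along_axis(f, 1, p))]        # (2.24)
    return x, v, p, g

# A few iterations on a 5-D Sphere function: f(g) can only improve or hold.
sphere = lambda z: np.sum(z ** 2)
x = rng.uniform(-100.0, 100.0, (10, 5))
v = np.zeros_like(x)
p = x.copy()
g = p[np.argmin(np.apply_along_axis(sphere, 1, p))]
f0 = sphere(g)
for _ in range(100):
    x, v, p, g = gbest_pso_step(x, v, p, g, sphere, v_max=0.15 * 200.0)
```

Because personal bests never worsen and the global best is always the best personal best, f(g) is non-increasing from iteration to iteration, which is exactly the monotone improvement the text describes.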
The effects of the inertial term, cognitive term, social term, and velocity clamping percentage on particles’ velocities are illustrated in Figure II-3 through Figure II-7 for swarm size s = 10, acceleration constants c1 = c2 = 1.49618, inertia weight ω = 0.72984, and velocity clamping percentage λ = 0.15. The acceleration coefficients and inertia weight were obtained from Clerc’s constriction model (2.17) [27]. The velocity clamping value was selected following the suggestion in [24] and because it worked well with Gbest PSO in Table II-1 and Table II-2, though the benchmarks in these tables were of much higher dimensionality than those used in [24].
Figure II-2: Rastrigin Benchmark Used for 2-D Illustration
Figure II-3: Swarm Initialization (Iteration 0)
Positions and velocities are randomly initialized. Personal bests and the
global best are initialized accordingly. Particles 1 and 3 are selected to
visually illustrate how velocities update and are clamped.
Figure II-4: First Velocity Updates (Iter. 1)
The randomly initialized velocities of iteration 0 decrease via the inertia
weight, and particles accelerate toward the global best of particle 6. There is
no cognitive acceleration since all particles are initially at their personal bests.
The black resultant vectors stem from clamping on each dimension.
Figure II-5: First Position Updates (Iter. 1)
Particles move along their resultant velocity vectors to new positions. Particle
1 found a new personal best. The new position of particle 3 evaluates to a
higher function value, so its previous position is still its personal best.
Figure II-6: Second Velocity Updates (Iter. 2)
Particle 1 continues moving downward to the left according to its inertia and
social acceleration. Particle 3 now experiences cognitive acceleration, which
together with its leftward social acceleration overcomes its inertia to the right; it
experiences a larger acceleration toward the global best, as expected from (2.12),
due to the global best being farther away than its personal best.
The main challenge seen in the literature is that PSO tends to stagnate as illustrated
in the next section.
Illustration of Premature Convergence
The swarm is said to have prematurely converged when the proposed solution is
not a global minimizer and when progress toward better minima has ceased so that
continued activity could only hope to refine the quality of the solution converged upon,
which may or may not be a local minimizer. Stagnation is a result of premature
convergence. Once particles have converged prematurely, they continue converging to
within extremely close proximity of each other so that the global best and all personal
bests are within one miniscule region of the search space. Since particles are continually
attracted to the bests in that same small vicinity, particles stagnate as the momentum
from their previous velocities wears off. While particles are technically always moving, stagnation can be thought of as a lack of movement discernible on the large scale, from which perspective the stagnated swarm will appear as one dot or point.
Figure II-7: Second Position Updates (Iter. 2)
Particles iteratively follow their resultant velocity vectors to new positions.
The multi-modal Rastrigin function is one of the most difficult benchmarks
common in PSO literature because of its many local wells, each of which has a steep rate
of decrease relative to the overall curvature as shown in Figure II-2. These local wells
make the true global minimizer difficult to discover. PSO can successfully traverse many
of the wells containing local minima that would trap a gradient-based method but often
gets stuck in high-quality wells near the true global minimizer.
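The Rastrigin benchmark referred to here has the widely used standard form f(x) = Σ_j [x_j² − 10·cos(2π·x_j) + 10], typically searched over [−5.12, 5.12]^n; the sketch below assumes that standard definition, since the formula itself is not restated in this excerpt.

```python
import math

def rastrigin(x):
    """Standard Rastrigin function: global minimum 0 at the origin,
    with a grid of local wells near integer coordinates."""
    return sum(xj ** 2 - 10.0 * math.cos(2.0 * math.pi * xj) + 10.0 for xj in x)

# The origin is the global minimizer; [2, 0] sits in a nearby local well.
f_global = rastrigin([0.0, 0.0])
f_local = rastrigin([2.0, 0.0])    # about 4, the kind of value a trapped swarm refines
```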
In order to illustrate the stagnation problem that has plagued PSO since its original formulation, Gbest PSO was applied to minimize the two-dimensional Rastrigin function of Figure II-2 using swarm size s = 10, acceleration constants c1 = c2 = 1.49, inertia weight ω = 0.72, and velocity clamping percentage λ = 0.15.
Swarm motion is graphed on the colored contour maps of Figure II-8 through
Figure II-16, where particles can be seen flying from random initialization to eventual
stagnation at the local minimizer near [2,0]. The true global minimizer at [0,0] is not
discovered. A particle finds the relatively high-quality region near local minimizer [2,0] and
communicates this new global best to the rest of the swarm. As other particles fly in its
direction, none finds a better global best, so all converge near position [2,0] as momenta
wane. This is the continuation of the search used to illustrate how velocities update in
Figure II-3 through Figure II-7.
Figure II-8: Swarm Initialization at Iteration 0
Particles are randomly initialized within the search space.
Figure II-9: Converging (Iter. 10)
Particles are converging to local minimizer [2,0] via their
attraction to the global best in the vicinity.
Figure II-10: Exploratory Cognition and Momenta (Iter. 20)
Cognitive accelerations toward personal bests and “momenta”
keep particles searching prior to settling down.
Figure II-11: Convergence Continues (Iter. 30)
As momenta wane and no better global best is found, particles
continue converging to local minimizer [2,0].
Figure II-12: Momenta Wane (Iter. 40)
Momenta continue to wane as particles are repeatedly pulled
toward (a) the global best near [2,0] and (b) their own
personal bests in the same vicinity.
Figure II-13: Premature Convergence (Iter. 102)
The local minimizer near [2,0] is being honed in on, but no
progress is being made toward a better solution as the swarm
has converged prematurely without hope of escape.
Stagnation is clearly the main obstacle for PSO, as little if any progress is made in this state. Chapter III conducts an extensive experiment to search for a set of parameters capable of preventing or postponing stagnation. Chapter IV presents the formulation and pseudo code for PSO with regrouping (RegPSO). Chapter V compares RegPSO using standard parameters to (a) Gbest PSO using the best parameters of Chapter III, (b) the multi-start PSO (MPSO) of Van den Bergh, which escapes from premature convergence once it is detected, and (c) opposition-based PSO (OPSO), which is designed to maintain swarm diversity in the hope of preventing stagnation.
CHAPTER III
EMPIRICAL SEARCH FOR QUALITY PSO PARAMETERS
Rastrigin Experiment Outlined
While the parameters derived from Clerc’s constriction model are commonly used
in the literature, this is largely so that improvements can be compared in a
straightforward manner with those of other articles and papers using the same set of
parameters. It has not been empirically determined that these are actually the best
parameters available, and other parameters have been suggested [5, 28].
Since parameter selection affects solution quality, proper selection can be thought
of as postponing stagnation. Consequently, one becomes curious whether such selection
could prevent stagnation altogether. If simple parameter selection alone could prevent
stagnation, this would be preferable since it would not require any modification to the
standard algorithm. Before developing a novel mechanism, an empirical test was done to
check whether parameter selection itself might adequately prevent stagnation.
To explore this possibility, many different combinations of the social acceleration
coefficient, the cognitive acceleration coefficient, and the inertia weight were tested on
the relatively difficult multi-modal Rastrigin benchmark while holding constant: (i) the
velocity clamping threshold at fifteen percent of each dimension’s range, (ii) the swarm
size at thirty, (iii) the number of iterations at three thousand, and (iv) the number of trials
per parameter combination at fifty. A swarm size of thirty was utilized for the
experiment since the question was not yet whether parameter selection could prevent
stagnation with a small swarm size but whether it could prevent stagnation even with a
modestly large swarm size.
Parameter combinations were tested by building into the PSO Research Toolbox a
mechanical exploratory feature to automatically implement the following rules: (i) for the
starting values of the acceleration coefficients, initialize the inertia weight to a value
expected to work well; (ii) generate fifty trials for this set of parameters and record the
median, mean, minimum, maximum, and standard deviation in one column of a table;
(iii) increment the inertia weight by 0.01 in either direction; (iv) conduct fifty more trials
and record the resulting statistics in a new column; (v) sort the columns from lowest
inertia weight to highest; (vi) evaluate whether increasing or decreasing the inertia weight
appears most promising based on the median values generated; (vii) increment the inertia
weight in the most promising direction except for every fifteenth column, where the
opposite direction is selected in case the apparent direction is wrong; (viii) as long as the
best median or the best mean is in one of the outer six columns, repeat steps iv – vii to
continue testing other values of the inertia weight; (ix) when neither the best median nor
the best mean are in any of the outer six columns, increment both acceleration
coefficients by 0.1, thus preserving the difference between them; (x) until the maximum
number of tables is reached, use simple trends in the best inertia weight of each table to
determine a good starting value of the inertia weight for the next table, or if an
insufficient number of tables have been generated from which to infer a trend, begin with
the best inertia weight of the previous table; (xi) until the user-specified minimum
number of tables is reached, repeat steps ii – x to test various values of the inertia weight
for each new combination of acceleration coefficients. At the conclusion of this process,
tables of data were displayed for human analysis, after which the difference between the
acceleration constants was incremented, and the process was repeated. The linearly
varied inertia weight was not tested here as it would have added a tremendous number of
possible parameter combinations.
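A greatly simplified sketch of the exploratory inertia-weight sweep in steps (iii)–(viii) is shown below; the fifty-trial PSO run is replaced by a stand-in callable, the every-fifteenth-column reversal and acceleration-coefficient loop are omitted, and all names are hypothetical.

```python
def sweep_inertia_weight(median_for, w0, step=0.01, max_columns=100):
    """Move the inertia weight in whichever direction improves the median
    (steps iii-viii, simplified); stop when both neighbours of the best
    weight found so far have been explored.

    median_for(w) stands in for 'run fifty trials at inertia weight w and
    return the median final function value'."""
    results = {w0: median_for(w0)}
    w, direction = w0, step
    for _ in range(max_columns):
        w_next = round(w + direction, 10)
        if w_next in results:
            break                          # both directions explored
        results[w_next] = median_for(w_next)
        if results[w_next] < results[w]:
            w = w_next                     # still improving: keep going
        else:
            direction = -direction         # probe the other direction
    best_w = min(results, key=results.get)
    return best_w, results

# Toy stand-in whose median is best near w = 0.7:
best_w, table = sweep_inertia_weight(lambda w: (w - 0.7) ** 2, w0=0.6)
```

In the actual procedure each "column" of the table corresponds to one evaluated inertia weight, and the sweep terminates once neither the best median nor the best mean sits in an outer column.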
At this point, the best median and mean per table were highlighted, the difference
between the acceleration coefficients was incremented from zero by one-tenth to four and
three-tenths with the social coefficient kept larger than the cognitive coefficient for this
study, and steps (i) through (xi) were repeated. The toolbox allowed for thousands of
parameter combinations to be tested on Rastrigin with fifty trials generated per
combination.
Independent Validation of “Social Only” PSO
The best median over three thousand iterations was produced by the combination c1 = 0, c2 = 3.8, ω = −0.02. Interestingly, this combination constitutes the “social-only” PSO found by Kennedy to quickly train an ANN for solving the XOR problem [29] and should be thought of as an independent validation of that model. The unique difference here is the slightly negative inertia weight.
For the social-only PSO, equation (2.18) simplifies to

    v_i(k + 1) = ω^(k+1)·v_i(0) + Σ_{a=0}^{k} ω^(k−a)·c2·r_{2i}(a) ∘ (g(a) − x_i(a)).   (2.27)
Notice that the slightly negative inertia weight implies that it can be beneficial for the swarm to be somewhat skeptical of new information (i.e., to take information with a grain of salt): the social information of any particular iteration is slightly trusted when the inertia weight exponent k − a is even and slightly distrusted when k − a is odd, so that each iteration’s information is trusted and distrusted in an oscillatory fashion, with past information quickly forgotten due to iterative multiplication by the small inertia weight. On the conceptual level, the proposed parameters in combination with Kennedy’s social-only model place the most importance on new information, which is alternately trusted and distrusted until practically forgotten.
Even though g(a) − x_i(a) is expected to have a larger magnitude per dimension in earlier iterations when the swarm is more spread out, the small inertia weight will dominate the product, causing past global bests as well as past personal bests to have less effect on each particle at iteration k + 1 than more recent bests due to the effect of multiplication at each iteration by the slight inertia weight.
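The oscillatory trust described above can be seen numerically: the contribution of iteration a to the velocity at iteration k + 1 is scaled by ω^(k−a), whose sign alternates with the parity of k − a when ω is negative, while its magnitude decays geometrically. A small numeric check with ω = −0.02 as above:

```python
w = -0.02                              # the slightly negative inertia weight
scales = [w ** e for e in range(5)]    # exponents k - a = 0, 1, 2, 3, 4
signs = [1 if s_ > 0 else -1 for s_ in scales]
magnitudes = [abs(s_) for s_ in scales]
# Signs alternate (+, -, +, -, +) while each magnitude is 50 times smaller
# than the previous one, so only the most recent iterations contribute noticeably.
```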
The curiosity at this point was whether the recommended parameters would
perform well in general or simply be characteristic of the Rastrigin benchmark. The
results of testing this combination across the popular suite of benchmarks are displayed in
Table III-1, where the original swarm size of thirty is tested over the initial ninety thousand function evaluations as well as over a full eight hundred thousand function evaluations in order to test not only the initial convergence rate but also the eventual
solution quality produced. This work is more concerned with eventual solution quality,
though relatively short trials were used to derive the parameters in order to examine many
different parameter combinations. Swarm sizes of twenty-five and twenty were also
tested out of curiosity.
Table III-1: “Social-only” Gbest PSO with Slightly Negative Inertia Weight
c1 = 0, c2 = 3.8, ω = −0.02, λ = 0.15, 50 trials per benchmark per column

Benchmark (n)         Statistic    s = 30         s = 30         s = 25         s = 20
                                   90,000 FE’s    800,010 FE’s   800,000 FE’s   800,000 FE’s
Ackley (30)           Median:      1.8655e-5      3.4728e-13     6.6905e-10     4.0669e-6
                      Mean:        0.014907       0.014887       0.0010873      0.024254
                      Minimum:     3.1492e-6      5.0626e-14     5.0626e-14     7.9048e-14
                      Maximum:     0.73192        0.73192        0.044198       0.64485
                      Std. Dev.:   0.10347        0.10348        0.0062883      0.11786
Griewangk (30)        Median:      0.022134       0.022134       0.040576       0.030896
                      Mean:        0.046633       0.046435       0.047599       0.068903
                      Minimum:     4.347e-10      3.3307e-16     5.5511e-16     1.8874e-15
                      Maximum:     0.76961        0.76961        0.15602        0.70899
                      Std. Dev.:   0.10895        0.10903        0.042542       0.13558
Quadric (30)          Median:      0.7794         2.5499e-7      0.00042113     0.0076941
                      Mean:        1.1431         0.12165        0.610391       1.52907
                      Minimum:     0.23305        8.5254e-23     1.12388e-15    5.27791e-8
                      Maximum:     3.6574         3.452          22.0692        37.6524
                      Std. Dev.:   0.86872        0.55226        3.14867        5.90377
Quartic with noise (30)
                      Median:      0.07581        0.075803       0.10865        0.12476
                      Mean:        0.088061       0.088052       0.11937        0.13695
                      Minimum:     0.01469        0.014682       0.03071        0.040125
                      Maximum:     0.23258        0.23258        0.25844        0.29015
                      Std. Dev.:   0.048645       0.048643       0.051093       0.057916
Rastrigin (30)        Median:      3.3673e-6      1.6875e-14     4.802e-7       0.99496
                      Mean:        0.11948        0.099497       0.54013        0.78361
                      Minimum:     1.1064e-8      0              3.5527e-15     1.7764e-14
                      Maximum:     1.9899         1.9899         4.0506         3.9817
                      Std. Dev.:   0.38348        0.36238        1.0923         0.95187
Rosenbrock (30)       Median:      25.93824       9.84766        15.1304        16.84264
                      Mean:        44.77153       14.0412        22.2481        30.30881
                      Minimum:     0.08502217     0.0244401      0.0647027      0.0547216
                      Maximum:     167.5172       85.1125        79.3204        136.9281
                      Std. Dev.:   38.19886       18.3926        24.0442        34.61948
Schaffer’s f6 (2)     Median:      0.0097159      0.0097159      0.0097159      0.0097159
                      Mean:        0.010428       0.010428       0.010039       0.0098774
                      Minimum:     0              0              0              0
                      Maximum:     0.037224       0.037224       0.037224       0.037224
                      Std. Dev.:   0.0058499      0.0058499      0.0062036      0.0043897
Sphere (30)           Median:      1.3321e-11     6.0994e-35     2.1304e-22     3.0375e-12
                      Mean:        4.318e-9       4.2675e-9      0.00015874     0.0003836
                      Minimum:     3.4333e-13     4.2073e-94     2.2951e-89     1.837e-54
                      Maximum:     2.1134e-7      2.112e-7       0.0061238      0.012502
                      Std. Dev.:   2.9875e-8      2.9863e-8      0.00088397     0.0019265
Weighted Sphere (30)  Median:      2.9151e-10     8.5506e-32     2.6998e-23     2.508e-11
                      Mean:        0.00033694     0.00033694     1.6776e-6      6.4829e-5
                      Minimum:     1.8131e-12     6.7149e-95     1.6519e-95     3.7237e-52
                      Maximum:     0.016814       0.016814       8.241e-5       0.00088496
                      Std. Dev.:   0.0023778      0.0023778      1.1651e-5      0.00017912
It is not surprising that the parameters derived using a swarm size of thirty
generally performed best in conjunction with the same swarm size. Even for the same
thirty particles, however, performance was still lacking on Rosenbrock using the
empirically derived parameter combination.
While Table III-1 evidences that complicated functions such as Rastrigin can be
solved fairly well by optimizing parameters for the problem at hand, comparison with
Table II-1 shows that improved performance on Rastrigin came at the cost of deteriorated
performance on Rosenbrock, such that parameter optimization seems to be problem-
dependent. Meissner et al. attempted to use PSO to optimize its own parameters [20], and
their results also indicate parameter selection to be problem-dependent. It is cautioned,
however, that since standard PSO was used as the master PSO in that paper, the
parameters recommended for each benchmark should not necessarily be viewed as
optimal since stagnation may have been an issue.
Even the best parameters found for Rastrigin reduce its function value on average
only to one-tenth of one unit; furthermore, the fact that practically the same average
performance was usually seen over both short and long trials suggests that parameter
selection, though effective at postponing stagnation, was not able to avoid it.
Socially Refined PSO
In order to develop a comparison basis by which to gauge the success of RegPSO
relative to the approach of optimizing parameters, other parameters found to work well
on Rastrigin were tested to see which perform well across benchmarks. Interestingly, the
Socially Refined PSO with small, negative inertia weights tested in Table III-2 and Table
III-3 outperformed the social-only and predominantly social parameters with small,
positive inertia weights sampled from the same vicinity of the parameter-defined search
space. According to cumulative velocity update equation (2.18), the Socially Refined PSO trusts both the social and cognitive information of any particular iteration when k − a is even and slightly distrusts the same information when k − a is odd while quickly forgetting past information. In other words, particles oscillate between trust and distrust until the information is forgotten in light of new information. According to the data presented in Tables III-2 and III-3, this may be healthier than simply trusting all information.
Table III-2: “Socially Refined” PSO with Slightly Negative Inertia Weight
c1 = 0.1, c2 = 3.7, ω = −0.01, λ = 0.15, 50 trials per benchmark per column

Benchmark (n)         Statistic    s = 30         s = 25         s = 20
                                   800,010 FE’s   800,000 FE’s   800,000 FE’s
Ackley (30)           Median:      3.2863e-14     3.2863e-14     3.9968e-14
                      Mean:        3.5918e-14     3.6984e-14     4.9916e-14
                      Minimum:     1.8652e-14     1.8652e-14     2.931e-14
                      Maximum:     1.1458e-13     8.6153e-14     2.78e-13
                      Std. Dev.:   1.3505e-14     1.2481e-14     3.5686e-14
Griewangk (30)        Median:      0.018469       0.018452       0.020953
                      Mean:        0.029549       0.023528       0.029729
                      Minimum:     0              0              0
                      Maximum:     0.12983        0.14943        0.1662
                      Std. Dev.:   0.031745       0.028115       0.031242
Quadric (30)          Median:      1.0538e-20     4.7631e-23     1.4344e-26
                      Mean:        4.0136e-20     1.3101e-22     2.1305e-25
                      Minimum:     5.2502e-24     5.5505e-25     1.7896e-28
                      Maximum:     6.9434e-19     2.3811e-21     7.1409e-24
                      Std. Dev.:   1.0243e-19     3.4278e-22     1.0083e-24
Quartic with noise (30)
                      Median:      0.0017654      0.0027207      0.003471
                      Mean:        0.0024619      0.0046598      0.0044198
                      Minimum:     0.00063002     0.00068381     0.00090772
                      Maximum:     0.013427       0.041015       0.046851
                      Std. Dev.:   0.0023139      0.0067316      0.0063827
Rastrigin (30)        Median:      8.8818e-16     1.7764e-15     3.5527e-15
                      Mean:        0.23879        0.25869        0.45768
                      Minimum:     0              0              0
                      Maximum:     1.9899         1.9899         3.9798
                      Std. Dev.:   0.47398        0.52456        0.8339
Rosenbrock (30)       Median:      8.16376        2.65706        3.77921
                      Mean:        8.3122         5.3097         5.71585
                      Minimum:     0.012567       0.000600161    1.36618e-7
                      Maximum:     33.2969        14.8977        14.4996
                      Std. Dev.:   6.72267        5.54191        5.43735
Schaffer’s f6 (2)     Median:      0.0097159      0.0097159      0.0097159
                      Mean:        0.00855        0.0093273      0.0093273
                      Minimum:     0              0              0
                      Maximum:     0.0097159      0.0097159      0.0097159
                      Std. Dev.:   0.0031894      0.0019233      0.0019233
Sphere (30)           Median:      2.1808e-108    1.6591e-124    1.6609e-146
                      Mean:        2.4203e-106    1.2844e-122    2.5295e-143
                      Minimum:     1.4095e-112    5.2522e-129    3.5956e-151
                      Maximum:     5.141e-105     3.6725e-121    1.135e-141
                      Std. Dev.:   8.6854e-106    5.2553e-122    1.6036e-142
Weighted Sphere (30)  Median:      6.8498e-107    1.8222e-123    2.2059e-145
                      Mean:        8.8944e-105    6.2982e-121    4.2659e-141
                      Minimum:     3.3557e-111    6.6437e-128    1.335e-149
                      Maximum:     3.8603e-103    1.2251e-119    2.0798e-139
                      Std. Dev.:   5.458e-104     2.202e-120     2.9401e-140
Socially Refined PSO parameters c1 = 0.1, c2 = 3.7, ω = −0.01 were able to improve upon the “social-only” parameter combination by removing one-tenth from the social acceleration coefficient and applying it to the cognitive component. Improved performance on Rastrigin again came at the cost of deteriorated performance on Rosenbrock when compared to Table II-1, so that parameter selection is once again seen to be problem-dependent.
Table III-3: “Socially Refined” PSO with Small, Negative Inertia Weight
c1 = 0.1, c2 = 3.5, ω = −0.1, λ = 0.15, 50 trials per benchmark per column

Benchmark (n)         Statistic    s = 30         s = 25         s = 20
                                   800,010 FE’s   800,000 FE’s   800,000 FE’s
Ackley (30)           Median:      2.931e-14      3.8192e-14     3.9968e-14
                      Mean:        3.1299e-14     3.6557e-14     5.3966e-14
                      Minimum:     1.5099e-14     2.2204e-14     2.2204e-14
                      Maximum:     6.839e-14      7.5495e-14     2.0695e-13
                      Std. Dev.:   9.9237e-15     9.682e-15      3.9636e-14
Griewangk (30)        Median:      0.017226       0.017241       0.014772
                      Mean:        0.025327       0.026762       0.023663
                      Minimum:     0              0              0
                      Maximum:     0.15367        0.10746        0.12269
                      Std. Dev.:   0.030708       0.026636       0.026331
Quadric (30)          Median:      7.3887e-29     1.4001e-25     7.3887e-29
                      Mean:        5.0276e-28     7.6784e-25     5.0276e-28
                      Minimum:     2.0621e-31     2.2748e-27     2.0621e-31
                      Maximum:     1.0916e-26     6.5308e-24     1.0916e-26
                      Std. Dev.:   1.5776e-27     1.3586e-24     1.5776e-27
Quartic with noise (30)
                      Median:      0.0024288      0.0031992      0.0046839
                      Mean:        0.0031599      0.004724       0.0072108
                      Minimum:     0.0012296      0.00085537     0.0015745
                      Maximum:     0.02221        0.039222       0.025236
                      Std. Dev.:   0.0030853      0.0059909      0.0055917
Rastrigin (30)        Median:      0.49748        0.99496        0.99496
                      Mean:        0.71637        0.89546        1.4924
                      Minimum:     0              0              1.7764e-15
                      Maximum:     5.9698         4.9748         5.9697
                      Std. Dev.:   1.1378         1.1237         1.4249
Rosenbrock (30)       Median:      5.63767        1.2464         0.934938
                      Mean:        6.17404        4.2523         3.82356
                      Minimum:     0.00437651     0.000173502    6.75498e-6
                      Maximum:     14.5122        17.1673        11.5996
                      Std. Dev.:   5.87016        5.324          4.44694
Schaffer’s f6 (2)     Median:      0.0097159      0.0097159      0.0097159
                      Mean:        0.011334       0.009133       0.0097159
                      Minimum:     0              0              0.0097159
                      Maximum:     0.037224       0.0097159      0.0097159
                      Std. Dev.:   0.0080548      0.0023308      0
Sphere (30)           Median:      1.9666e-130    5.5713e-148    3.8905e-172
                      Mean:        4.5786e-125    4.2839e-146    2.3075e-166
                      Minimum:     3.5655e-135    4.1483e-152    1.0487e-177
                      Maximum:     2.2883e-123    1.2209e-144    1.1487e-164
                      Std. Dev.:   3.2361e-124    1.7476e-145    0
Weighted Sphere (30)  Median:      4.1491e-129    1.3336e-146    7.1988e-172
                      Mean:        1.1582e-127    3.1883e-143    2.635e-168
                      Minimum:     1.258e-132     4.3628e-151    5.0297e-176
                      Maximum:     2.1567e-126    1.0893e-141    7.7798e-167
                      Std. Dev.:   3.4054e-127    1.6308e-142    0
The case that parameter selection is problem-dependent rests not only on the tradeoff in performance between Rastrigin and Rosenbrock seen by comparing the tables of this chapter with Table II-1: the tables of this chapter alone show performance improving with swarm size on Rastrigin and the noisy Quartic while deteriorating with swarm size on Rosenbrock when the number of function evaluations is held constant. This means that not even one swarm size is most efficient for all problems. Furthermore, for the social-only model, performance on Sphere and Weighted Sphere improved with swarm size, but that trend reversed in the predominantly social model. The need to optimize parameters for the problem at hand is a weakness since it requires an additional optimization process prior to the optimization problem itself.
This chapter has empirically derived quality parameters to serve as a comparison
basis by which to test the proposed regrouping mechanism. The small, negative inertia
weight in conjunction with Socially Refined PSO provided good, general performance
with swarm sizes of twenty and twenty-five. While proper parameter selection can be
seen to postpone stagnation quite effectively, it may be insufficient to prevent stagnation
based on the thousands of parameter combinations tested for this chapter.
The problem-dependence of parameter selection suggests that the ability to escape
from the state of premature convergence via a regrouping mechanism in order to continue
searching for better regions might be a more generally applicable approach to dealing
with stagnation.
CHAPTER IV
REGROUPING PARTICLE SWARM OPTIMIZATION
Regroup: “to reorganize (as after a setback) for renewed activity” [30].
Motivation for Regrouping
Parameter selection is problem-dependent, so postponing stagnation may require an
extensive optimization process prior to the optimization problem itself; and even with
parameters optimized for the problem at hand, the swarm may still stagnate, as seen in
Table III-1. A regrouping mechanism is therefore sought to liberate the swarm from the
state of premature convergence, thus enabling continued progress toward a global
minimizer.
The goal of the proposed Regrouping PSO (RegPSO) is to detect when particles
have prematurely converged and regroup them within a new search space large enough to
escape from the local well in which particles have become trapped but small enough to
provide an efficient search. It is thought that this will provide an efficient means of
escape from the state of premature convergence so that the swarm can continue making
progress rather than restarting; while continually restarting the search requires running
the search an arbitrary number of times, which may or may not suffice to discover a
true global minimizer, RegPSO seeks to improve upon past searches.
Detection of Premature Convergence
As discussed earlier, all particles are pulled on all dimensions toward the global
best via update equations (2.12) and (2.3). If no particle encounters a better global best
over a period of time, the swarm will continually move closer to the unchanged global
best until the entire swarm has converged to one small region of the search space. If
particles actually have happened upon a global minimizer, they may refine that solution
by their tiny movements toward it; but in all other cases, it is undesirable for particles to
remain in this state. Therefore, it is useful to measure how near particles are to each
other so that an effective action can be taken once they have converged to the same
region. Van den Bergh’s Maximum Swarm Radius criterion for detecting premature
convergence is adopted for this purpose. It is proposed herein that when premature
convergence is detected using this maximum swarm radius measurement, the swarm be
regrouped in a new search space centered at the global best as follows.
At each iteration, k, the swarm radius, δ(k), is taken to be the maximum
Euclidean distance, in n-dimensional space, of any particle from the global best:

$$\delta(k) = \max_{i \in \{1,\dots,s\}} \left\lVert x_i(k) - g(k) \right\rVert \tag{4.1}$$

where ‖a‖ is the Euclidean norm of any vector a.
Let Ω^r represent the hypercube making up the search space at regrouping index
r, where r is initialized to zero and incremented by one with each regrouping so that
Ω^0 represents the initial search space within which particles are initialized and Ω^1, Ω^2, ...
are the subsequent search spaces within which particles are regrouped or re-initialized.

Let range^r be the vector containing the side lengths, or range per dimension, of
search space Ω^r as shown in (4.2).

$$\mathrm{range}^r = \left[ \mathrm{range}^r_1, \mathrm{range}^r_2, \dots, \mathrm{range}^r_n \right] \tag{4.2}$$

The n-dimensional hypercube Ω^r then has sides of length range^r_j for j = 1, 2, ..., n.

Let diam(Ω^r) represent the “diameter” of search space Ω^r, calculated as the
Euclidean norm of vector range^r.

$$\mathrm{diam}(\Omega^r) = \left\lVert \mathrm{range}^r \right\rVert \tag{4.3}$$
Particles are considered too close to each other, and regrouping is triggered, when the
normalized swarm radius, δ_norm, defined as the ratio of the maximum Euclidean distance
of any particle from the global best to the diameter of the search space [11], falls below a
user-specified stagnation threshold, ε, satisfying premature convergence condition (4.4).

$$\delta_{\mathrm{norm}} = \frac{\delta(k)}{\mathrm{diam}(\Omega^r)} < \varepsilon \tag{4.4}$$

An empirical study found ε = 1.1×10⁻⁴ to work well with the proposed regrouping
mechanism. Regrouping too early did not allow for the desired degree of solution
refinement, while regrouping too late meant wasting time in a stagnated state prior to
regrouping.
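Condition (4.4) amounts to one reduction over the swarm. The following Python/NumPy sketch is not part of the thesis; the function name and array conventions are illustrative only:

```python
import numpy as np

def premature_convergence(X, g, diam, eps=1.1e-4):
    """Test premature convergence condition (4.4).

    X    -- (s, n) array of particle positions
    g    -- (n,) global best position
    diam -- Euclidean norm of the current search space's range vector, eq. (4.3)
    eps  -- user-specified stagnation threshold
    """
    delta = np.max(np.linalg.norm(X - g, axis=1))  # swarm radius, eq. (4.1)
    return delta / diam < eps                      # normalized radius below threshold?
```

With the empirically recommended ε = 1.1×10⁻⁴, a swarm collapsed to within roughly a hundredth of a percent of the search-space diameter would trigger regrouping.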
Swarm Regrouping
When premature convergence is detected by condition (4.4), the swarm is
regrouped in a new search space centered at the global best. The side lengths, or range
on each dimension, of the hypercube defining the new search space, Ω^r, are determined
by (a) the magnitude of the regrouping factor, ρ, which is inversely proportional to the
stagnation threshold as shown in (4.5),

$$\rho = \frac{6}{5\varepsilon}, \tag{4.5}$$

and (b) the degree of uncertainty inferred on each dimension from the maximum
deviation from the global best. Note that the degree of uncertainty as inferred
computationally in (4.6) differs from the maximum Euclidean distance of any
particle from the global best in (4.1): the former is the maximum deviation per dimension
over all particles, while the latter is the maximum Euclidean deviation of any one particle.

$$\mathrm{range}^r_j = \min\left( \mathrm{range}^0_j,\ \rho \max_{i \in \{1,\dots,s\}} \left| x_{i,j}(k) - g_j(k) \right| \right) \tag{4.6}$$

The hypercube defining the new search space, Ω^r, is proportional on each dimension to
the degree of uncertainty upon detection of premature convergence, except that the range
on each dimension of Ω^r is clamped to a maximum of the range on the same dimension
of the initial search space, Ω^0, as shown in (4.6).
Each particle is then randomly regrouped about the global best within Ω^r
according to

$$x_i(k+1) = g(k) + r_i \circ \mathrm{range}^r - \tfrac{1}{2}\,\mathrm{range}^r \tag{4.7}$$

where r_i = [r_{i1}, r_{i2}, ..., r_{in}] with each r_{ij} ~ U(0,1) randomly selected, and ∘
denotes element-wise multiplication.

This randomizes particles to lie within implicitly defined search space

$$\Omega^r = \left[ x^{L,r}_1, x^{U,r}_1 \right] \times \left[ x^{L,r}_2, x^{U,r}_2 \right] \times \cdots \times \left[ x^{L,r}_n, x^{U,r}_n \right] \tag{4.8}$$

with respective lower and upper bounds

$$x^{L,r}_j = g_j - \tfrac{1}{2}\,\mathrm{range}^r_j, \qquad x^{U,r}_j = g_j + \tfrac{1}{2}\,\mathrm{range}^r_j. \tag{4.9}$$
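The regrouping step of (4.5) through (4.9) reduces to a few array operations. A Python/NumPy sketch follows (illustrative only, not from the thesis; the helper name and argument conventions are invented):

```python
import numpy as np

def regroup(X, g, range0, eps=1.1e-4, rng=None):
    """Regroup the swarm about the global best per eqs. (4.5)-(4.7).

    X      -- (s, n) particle positions at premature convergence
    g      -- (n,) global best
    range0 -- (n,) side lengths of the initial search space Omega^0
    """
    rng = np.random.default_rng() if rng is None else rng
    s, n = X.shape
    rho = 6.0 / (5.0 * eps)                    # regrouping factor, eq. (4.5)
    # per-dimension uncertainty, clamped to the initial range, eq. (4.6)
    new_range = np.minimum(range0, rho * np.max(np.abs(X - g), axis=0))
    # uniform regrouping about g within the new hypercube, eq. (4.7)
    X_new = g + rng.random((s, n)) * new_range - 0.5 * new_range
    return X_new, new_range
```

Each row of `X_new` then lies within the lower and upper bounds of (4.9).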
“Gbest” PSO Continues as Usual
The velocity clamping values are re-calculated based on the dimensions of the
new search space, Ω^r, according to

$$v^{\max,r} = \lambda\, \mathrm{range}^r \tag{4.10}$$

where λ is the velocity clamping percentage and superscript r is again the regrouping index.

Velocities are then re-initialized to lie within new range [−v^{max,r}_j, v^{max,r}_j] per
dimension according to

$$v_i(k) = 2\, r_i \circ v^{\max,r} - v^{\max,r} \tag{4.11}$$

where r_i = [r_{i1}, r_{i2}, ..., r_{in}] with each r_{ij} ~ U(0,1) randomly selected.

Personal bests are re-initialized as originally done such that

$$p_i(k) = x_i(k). \tag{4.12}$$
Rather than being re-initialized, the global best is remembered across regroupings.
This allows the search that was in progress prior to the occurrence of premature
convergence to continue since particles are attracted back to the best point found so far
while combing the search space along the way due to their cognitive pulls. After each
regrouping, velocities and positions continue updating as in Gbest PSO with particles
being regrouped within a new search space according to (4.6) and (4.7) when premature
convergence condition (4.4) is met.
If premature convergence occurs near an edge of the hypercube defining the
original search space, the new search space may not necessarily be a subspace of the
original search space since it may be desirable to search outside the original bounds if the
initial search space was only a guess as to where solutions were likely to be found.
Restricting particles to the original search space is easy to do via position clamping or
velocity reset [31] if it is known for a fact that better solutions do not lie outside the
search space. In practice, it is easier to make an educated guess as to where a solution
will lie than to know for certain that no better solutions can be found elsewhere; for this
reason, particles are not generally required to stay within the search space if they have
good reason to explore outside of it.
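If restricting particles to the original bounds is in fact warranted, the position-clamping and velocity-reset ideas mentioned above can be sketched as follows (a Python/NumPy illustration only; the helper name and the exact velocity-reset policy are assumptions, not the form given in [31]):

```python
import numpy as np

def clamp_positions(X, V, lb, ub, reset_velocity=False):
    """Restrict particles to the original search space, advisable only when
    better solutions are known not to lie outside it. Positions are clamped
    to the bounds; optionally, a clamped component's velocity is zeroed,
    a simple form of the velocity-reset idea."""
    clamped = (X < lb) | (X > ub)       # which components left the space
    X = np.clip(X, lb, ub)              # position clamping
    if reset_velocity:
        V = np.where(clamped, 0.0, V)   # velocity reset on clamped components
    return X, V
```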
Since the PSO algorithm works well prior to premature convergence, the new
RegPSO algorithm does not require changes to the original position and velocity update
equations but merely liberates the swarm from premature convergence via an automatic
regrouping mechanism. The pseudo code for RegPSO is given in Figure IV-1.
Do with Each New Grouping
    For j = 1 to n
        If r = 0
            Calculate range^0_j according to (2.20).
        Else
            Calculate range^r_j according to (4.6).
        End If
    End For
    Calculate v^{max,r} according to (4.10).
    Calculate the diameter, diam(Ω^r), of the current search space using (4.3).
    For i = 1 to s
        Randomly initialize the particle's velocity, v_i(k), according to (4.11).
        Randomly initialize the particle's position, x_i(k), to lie within Ω^r.
        Initialize the personal best: p_i(k) = x_i(k).
    End For
    If r = 0
        Initialize the global best, g(k), according to (2.24).
    End If
    Do Iteratively
        Update velocities according to (2.12).
        Clamp velocities when necessary according to Figure 1.
        Update positions according to (2.3).
        Update personal bests according to (2.26).
        Update the global best according to (2.24).
        Calculate the swarm radius according to (4.1).
        If (i) the premature convergence criterion of (4.4) is met, or (ii) a
        user-defined maximum number of function evaluations per grouping
        is satisfied,
            Then regroup the swarm.
        End If
    Until search termination
Figure IV-1: RegPSO pseudo code
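As a concrete illustration of Figure IV-1, a minimal RegPSO implementation in Python/NumPy follows. This is a sketch only, not the thesis's code: names and defaults are invented, the chapter II update equations are inlined in their common form, boundary handling is omitted, and regrouping is triggered only by condition (4.4), not by the per-grouping evaluation cap.

```python
import numpy as np

def regpso(f, lb, ub, s=20, w=0.72984, c1=1.49618, c2=1.49618,
           lam=0.5, eps=1.1e-4, max_evals=20000, seed=0):
    """Minimal RegPSO sketch with a Gbest PSO core (cf. Figure IV-1)."""
    rng = np.random.default_rng(seed)
    n = len(lb)
    rho = 6.0 / (5.0 * eps)                          # regrouping factor, eq. (4.5)
    g, g_val = None, np.inf
    X = None
    evals = 0
    while evals < max_evals:
        if g is None:                                # first grouping (r = 0)
            range_r = ub - lb
            X = lb + rng.random((s, n)) * range_r
        else:                                        # regroup: eqs. (4.6)-(4.7)
            range_r = np.minimum(ub - lb,
                                 rho * np.max(np.abs(X - g), axis=0))
            X = g + rng.random((s, n)) * range_r - 0.5 * range_r
        vmax = lam * range_r                         # eq. (4.10)
        V = 2.0 * rng.random((s, n)) * vmax - vmax   # eq. (4.11)
        P = X.copy()                                 # eq. (4.12)
        p_vals = np.array([f(x) for x in X])
        evals += s
        if p_vals.min() < g_val:                     # global best is remembered
            g_val = float(p_vals.min())
            g = P[p_vals.argmin()].copy()
        diam = float(np.linalg.norm(range_r))        # eq. (4.3)
        while evals < max_evals:                     # Gbest PSO continues as usual
            r1, r2 = rng.random((s, n)), rng.random((s, n))
            V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
            V = np.clip(V, -vmax, vmax)              # velocity clamping
            X = X + V
            vals = np.array([f(x) for x in X])
            evals += s
            better = vals < p_vals
            P[better] = X[better]
            p_vals[better] = vals[better]
            if p_vals.min() < g_val:
                g_val = float(p_vals.min())
                g = P[p_vals.argmin()].copy()
            delta = np.max(np.linalg.norm(X - g, axis=1))  # eq. (4.1)
            if delta / diam < eps:                         # eq. (4.4)
                break                                      # trigger regrouping
    return g, g_val
```

Note that the global best carries over from grouping to grouping, so each regrouped search improves upon past searches rather than restarting from scratch.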
Two-Dimensional Demonstration of the Regrouping Mechanism
In chapter two, the swarm behavior of Gbest PSO was observed within the search
space of the two-dimensional Rastrigin benchmark in order to demonstrate the stagnation
problem. At iteration 102 in Figure II-13, premature convergence condition (4.4) was
satisfied, which automatically triggered the regrouping shown in Figure IV-2, by which
to escape the state of premature convergence. Figures IV-2 through IV-17 show how
RegPSO helps the swarm find the global minimizer. Figure IV-18 shows the benefit that
regrouping has on the function value. These figures were generated using the same
parameters used for Figures II-3 through II-13 along with stagnation threshold
ε = 1.1×10⁻⁴ and regrouping factor ρ = 6/(5ε).
Figure IV-2: Swarm Regrouped (Iter. 103)
RegPSO detected premature convergence at iteration 102 of Figure II-13.
The swarm is regrouped above at iteration 103 in order to continue making
progress toward the global minimizer. Personal bests are re-initialized
according to (4.12). The new, smaller search space is shown.
Figure IV-3: PSO in New Search Space (Iter. 113)
“Gbest” PSO continues as usual within the new search space after
regrouping. The swarm is returning cautiously to the global best
with new momenta, personal bests, and perspectives.
Figure IV-4: Swarm Migration (Iter. 123)
The swarm is migrating toward a better position found by one
of the particles near [1, 0].
Figure IV-5: New Well Considered (Iter. 133)
Some particles are refining the approximation to the local minimizer
near [1, 0] while others continue exploring due to their momenta and
cognitive accelerations.
Figure IV-6: Most Bests Relocated (Iter. 143)
Most of the particles’ personal bests now belong to the well
containing the local minimizer near [1, 0]. Notice the uncertainty
on the horizontal dimension.
Figure IV-7: Swarm Collapses (Iter. 153)
Particles collapse on the horizontal dimension to the new improved well.
Figure IV-8: Horizontal Uncertainty (Iter. 163)
Cognitively, the swarm doubts its decision on the horizontal dimension
more so than on the vertical dimension.
Figure IV-9: Uncertainty Remains (Iter. 173)
The relative uncertainty on the horizontal dimension is still evident.
Figure IV-10: Becoming Convinced (Iter. 183)
The entire swarm is converging to the local minimizer near [1, 0],
refining the quality of the solution with small steps toward it.
Figure IV-11: Premature Convergence Detected (Iter. 219)
Premature convergence is again calculated via condition (4.4).
Particles have had enough time to refine solution quality, which is
an important part of the search, and will be regrouped in the
following iteration.
Figure IV-12: Second Regrouping (Iter. 220)
Because particles were cognitively less certain of their solution on the
horizontal dimension than on the vertical, the swarm regrouped
about the global best with a larger horizontal range than vertical.
Figure IV-13: Better Well Discovered (Iter. 230)
Particles return with momenta and cognitive restraint toward the global
best remembered near local minimum [1, 0].
Figure IV-14: Swarm Migration (Iter. 240)
The efficient regrouping mechanism helps particles quickly find the
well containing global minimizer [0, 0].
Figure IV-15: Swarm Collapsing (Iter. 250)
Particles swarm toward the new global best.
Figure IV-16: Particles Swarm to the Newly Found Well (Iter. 260)
Personal bests now lie within the new well, eliminating the cognitive
pull to other locations. Momenta wane.
Figure IV-17: Convergence (Iter. 270)
Solution refinement of the global minimizer is in progress.
Figure IV-18: Effect of Regrouping on Cost Function Value
Performance comparison of Gbest PSO and the proposed RegPSO
on the Rastrigin benchmark with dimension n = 2.
Figure IV-18 shows that shortly after the first regrouping and immediately after
the second regrouping, solution quality was improved as particles escaped premature
convergence in order to continue onward toward the true global minimizer rather than
simply stagnating in place. Having presented the regrouping concept in two dimensions,
its effectiveness is now tested in the much more difficult thirty-dimensional case.
CHAPTER V
TESTING AND COMPARISONS
While the linearly decreasing inertia weight was shown in Table II-2 to improve
solution quality on multi-modal functions by postponing stagnation, this results from a
weighting scheme where early information is forgotten less rapidly than late information
as inferred from cumulative velocity update equation (2.18). Postponing stagnation in a
more cautious search, however, prevents regrouping from being triggered as often. It was
found to be more beneficial to allow the quick convergence of the popular static weight
and regroup at premature convergence than to take a considerably longer time to
converge cautiously and regroup less often. For this reason, the popular static inertia
weight, ω = 0.72984, is proposed for use with RegPSO.
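The linearly decreasing inertia weight referred to above is conventionally scheduled from 0.9 down to 0.4 over the allotted iterations. A one-line Python sketch (the function name and the endpoints-as-defaults are illustrative):

```python
def inertia(k, k_max, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight: w_start at k = 0 down to
    w_end at k = k_max."""
    return w_start - (w_start - w_end) * (k / k_max)
```

RegPSO instead keeps the static value ω = 0.72984 throughout, relying on regrouping rather than a slow weight schedule to manage stagnation.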
While clamping velocities to fifteen percent rather than the more common fifty
percent generally leads both to quicker convergence and higher quality solutions, the
value of fifty percent was found to be most useful with RegPSO, which is surely
attributable to the re-initialization of velocities within [−v^{max,r}_j, v^{max,r}_j] per dimension
according to equation (4.11) with each regrouping. Using larger velocities at regrouping
would certainly help particles escape from entrapping regions in order to find better
minimizers – not only because larger velocities carry particles farther from entrapping
regions but also because larger velocities mean larger momenta by which to overshoot
the global best and explore on the other side of the entrapping region before converging
again upon the global best if no better position is found by which to update it.
Hence, it is with the standard parameters recommended by Clerc, as used in (2.18), that
RegPSO is tested, in conjunction with velocities clamped to the popular fifty percent of
the range of the search space per dimension.
Comparison with Standard “Gbest” & “Lbest” PSO’s
In Table V-1, Gbest PSO, Lbest PSO and RegPSO are compared side by side for
800,000 total function evaluations. The point of selecting this number is to show that
RegPSO is capable of avoiding stagnation and continuing onward to approximate the true
global minimizer if given enough time to do so. The standard algorithms use the linearly
decreasing inertia weight and velocity clamping to fifteen percent demonstrated to work
well for Gbest PSO in Table II-1 and Table II-2 and empirically verified to work well
with Lbest PSO. The results on Rastrigin are especially impressive since this benchmark
generally returns high function values in the literature due to stagnation of the swarm. It
is clear that RegPSO is more consistent across the benchmark suite.
Table V-1: RegPSO Compared to Gbest PSO & Lbest PSO with Neighborhood Size 2
800,000 function evaluations; s = 20; c1 = c2 = 1.49618; 50 trials per row per column.
RegPSO used ε = 1.1×10⁻⁴, ρ = 1.2/ε, and 100,000 function evaluations max per grouping.
Column settings: Gbest PSO (λ = 0.15, ω: 0.9 to 0.4); Lbest PSO with neighborhood
size of 2 (λ = 0.15, ω: 0.9 to 0.4); RegPSO (λ = 0.5, ω = 0.72984).

Benchmark (n)          Stat          Gbest PSO      Lbest PSO      RegPSO
Ackley (30)            Median:       7.9936e-15     7.9936e-15     5.0832e-7
                       Mean:         1.1191e-14     1.0623e-14     5.2345e-7
                       Minimum:      4.4409e-15     7.9936e-15     1.9571e-7
                       Maximum:      4.3521e-14     1.5099e-14     9.7466e-7
                       Std. Dev.:    8.0648e-15     3.428e-15      1.6771e-7
Griewangk (30)         Median:       0.012319       0.009861       0.0098573
                       Mean:         0.022023       0.012538       0.013861
                       Minimum:      0              0              0
                       Maximum:      0.090322       0.075718       0.058867
                       Std. Dev.:    0.024071       0.015404       0.01552
Quadric (30)           Median:       1.2644e-17     5.877e-24      2.5503e-10
                       Mean:         2.3189e-14     5.9577e-22     3.1351e-10
                       Minimum:      2.6219e-22     2.0446e-28     6.0537e-11
                       Maximum:      8.6952e-13     1.5377e-20     9.5804e-10
                       Std. Dev.:    1.2438e-13     2.2534e-21     2.2243e-10
Quartic w/ noise (30)  Median:       0.0015335      0.0024195      0.0006079
                       Mean:         0.0015241      0.0025417      0.00064366
                       Minimum:      0.00033276     0.00084968     0.0002655
                       Maximum:      0.0028314      0.0044732      0.0012383
                       Std. Dev.:    0.00065649     0.00070295     0.00021333
Rastrigin (30)         Median:       24.3765        28.8538        2.3981e-14
                       Mean:         25.252         31.2746        2.6824e-11
                       Minimum:      13.9294        15.9193        0
                       Maximum:      42.7832        73.6268        1.3337e-9
                       Std. Dev.:    7.06661        11.419         1.886e-10
Rosenbrock (30)        Median:       8.63847        0.070101       0.0030726
                       Mean:         18.859         1.0713         0.0039351
                       Minimum:      0.000292151    9.1079e-6      1.7028e-5
                       Maximum:      81.5558        4.0744         0.018039
                       Std. Dev.:    25.9117        1.7196         0.0041375
Schaffer’s f6 (2)      Median:       0              0              0
                       Mean:         0              0              0
                       Minimum:      0              0              0
                       Maximum:      0              0              0
                       Std. Dev.:    0              0              0
Sphere (30)            Median:       2.8331e-106    8.4679e-241    5.8252e-15
                       Mean:         1.0834e-94     2.1967e-215    9.2696e-15
                       Minimum:      6.4959e-117    1.1756e-258    1.2852e-15
                       Maximum:      4.885e-93      1.0983e-213    4.9611e-14
                       Std. Dev.:    6.9319e-94     0              8.6636e-15
Weighted Sphere (30)   Median:       9.6085e-104    3.5402e-240    8.1295e-14
                       Mean:         4.4182e-93     1.2102e-225    9.8177e-14
                       Minimum:      7.6884e-121    7.5531e-252    1.9112e-14
                       Maximum:      1.6078e-91     5.8251e-224    2.5244e-13
                       Std. Dev.:    2.3941e-92     0              5.4364e-14
As was shown in Table II-1 and Table II-2, the decreasing inertia weight
improved the performance of Gbest PSO on most of the benchmark suite at the cost of
deteriorated performance on Rosenbrock. “Lbest” PSO does not respond as adversely as
Gbest PSO on Rosenbrock to the otherwise beneficially decreasing inertia weight, but it
is outperformed by Gbest PSO on Rastrigin. “Lbest” PSO seems to perform well on
simple uni-modal benchmarks but suffers on the more complicated uni-modal
Rosenbrock, multi-modal Rastrigin, and noisy Quartic.
Regrouping was not necessary to successfully traverse the multi-modal Ackley
function since its local wells are minor relative to the overall curvature leading to the
global minimizer; however, only RegPSO consistently solved the more difficult multi-
modal Rastrigin due to its prominent local wells, which have a significant impact relative
to the slight overall curvature leading to the global minimizer. RegPSO also provided the
best performance in the presence of noise. Only RegPSO consistently solved the tricky
Rosenbrock, and only RegPSO was able to consistently solve the multi-modal
benchmarks.
Only RegPSO was able to approximate the true global minimizer for all four
hundred and fifty trials, which can be ascertained from the worst case performance per
benchmark. The two standard PSO algorithms show greater problem dependency such as
on Rastrigin, where the true global minimizer was not approximated with even one trial
by either standard PSO algorithm. Due to its apparently lower problem-dependency,
RegPSO may be more applicable than standard PSO for solving problems about which
little is known in advance since it performed consistently in the presence of noise, on
multi-modal benchmarks, and on uni-modal benchmarks.
Figure V-1: Mean Behavior of RegPSO on 30D Rastrigin
A swarm size of 20 is sufficient to approximate the global minimizer of the
30-D Rastrigin and reduce the cost function to approximately its true minimum.
Comparison with Socially Refined PSO
In Table V-2, RegPSO is compared to the best of the Socially Refined PSO
parameters derived from the Rastrigin experiment of chapter three in order to test
whether the regrouping mechanism provides better general performance than even
parameters painstakingly chosen by trial and error. Each algorithm uses the velocity
clamping percentage empirically found to work well for it.
Table V-2: RegPSO Compared with Socially Refined PSO
RegPSO used ε = 1.1×10⁻⁴, ρ = 1.2/ε, and 100,000 function evaluations max per grouping.
Column settings: Socially Refined PSO (c1 = 0.1, c2 = 3.5, ω = 0.1, λ = 0.15, s = 20);
Socially Refined PSO (c1 = 0.1, c2 = 3.7, ω = 0.01, λ = 0.15, s = 20);
RegPSO (c1 = c2 = 1.49618, ω = 0.72984, λ = 0.5, s = 20).

Benchmark (n)          Stat          SR PSO (c2=3.5)  SR PSO (c2=3.7)  RegPSO
Ackley (30)            Median:       3.9968e-14       3.9968e-14       5.0832e-7
                       Mean:         5.3966e-14       4.9916e-14       5.2345e-7
                       Minimum:      2.2204e-14       2.931e-14        1.9571e-7
                       Maximum:      2.0695e-13       2.78e-13         9.7466e-7
                       Std. Dev.:    3.9636e-14       3.5686e-14       1.6771e-7
Griewangk (30)         Median:       0.014772         0.020953         0.0098573
                       Mean:         0.023663         0.029729         0.013861
                       Minimum:      0                0                0
                       Maximum:      0.12269          0.1662           0.058867
                       Std. Dev.:    0.026331         0.031242         0.01552
Quadric (30)           Median:       7.3887e-29       1.4344e-26       2.5503e-10
                       Mean:         5.0276e-28       2.1305e-25       3.1351e-10
                       Minimum:      2.0621e-31       1.7896e-28       6.0537e-11
                       Maximum:      1.0916e-26       7.1409e-24       9.5804e-10
                       Std. Dev.:    1.5776e-27       1.0083e-24       2.2243e-10
Quartic w/ noise (30)  Median:       0.0046839        0.003471         0.0006079
                       Mean:         0.0072108        0.0044198        0.00064366
                       Minimum:      0.0015745        0.00090772       0.0002655
                       Maximum:      0.025236         0.046851         0.0012383
                       Std. Dev.:    0.0055917        0.0063827        0.00021333
Rastrigin (30)         Median:       0.99496          3.5527e-15       2.3981e-14
                       Mean:         1.4924           0.45768          2.6824e-11
                       Minimum:      1.7764e-15       0                0
                       Maximum:      5.9697           3.9798           1.3337e-9
                       Std. Dev.:    1.4249           0.8339           1.886e-10
Rosenbrock (30)        Median:       0.934938         3.77921          0.0030726
                       Mean:         3.82356          5.71585          0.0039351
                       Minimum:      6.75498e-6       1.36618e-7       1.7028e-5
                       Maximum:      11.5996          14.4996          0.018039
                       Std. Dev.:    4.44694          5.43735          0.0041375
Schaffer’s f6 (2)      Median:       0.0097159        0.0097159        0
                       Mean:         0.0097159        0.0093273        0
                       Minimum:      0.0097159        0                0
                       Maximum:      0.0097159        0.0097159        0
                       Std. Dev.:    0                0.0019233        0
Sphere (30)            Median:       3.8905e-172      1.6609e-146      5.8252e-15
                       Mean:         2.3075e-166      2.5295e-143      9.2696e-15
                       Minimum:      1.0487e-177      3.5956e-151      1.2852e-15
                       Maximum:      1.1487e-164      1.135e-141       4.9611e-14
                       Std. Dev.:    0                1.6036e-142      8.6636e-15
Weighted Sphere (30)   Median:       7.1988e-172      2.2059e-145      8.1295e-14
                       Mean:         2.635e-168       4.2659e-141      9.8177e-14
                       Minimum:      5.0297e-176      1.335e-149       1.9112e-14
                       Maximum:      7.7798e-167      2.0798e-139      2.5244e-13
                       Std. Dev.:    0                2.9401e-140      5.4364e-14
While the Socially Refined PSO improves performance over the standard Gbest
and Lbest PSO’s, RegPSO is still seen to be less problem-dependent and consequently
more consistent across the benchmark suite than the Socially Refined PSO resulting from
the Rastrigin experiment of chapter three. RegPSO demonstrates the versatility to solve
different types of problems.
Comparison with MPSO
In Table V-3, MPSO using the normalized swarm radius convergence detection
criterion was selected for comparison since it was Van den Bergh’s best-performing
restart algorithm: outperforming guaranteed convergence PSO (GCPSO), multi-start PSO
using the cluster analysis convergence detection technique (MPSOcluster), multi-start PSO
using the objective function slope convergence detection technique (MPSOslope), and
random particle swarm optimization (RPSO). MPSO [11] restarts particles on the
original search space when premature convergence is detected, and it uses an improved
local optimizer, GCPSO [11, 12], as its core search algorithm rather than the basic Gbest
PSO for which the regrouping mechanism is currently being tested. RegPSO is compared
to MPSO in order to confirm that the proposed regrouping mechanism is indeed more
efficient than continually restarting on the original search space.
Table V-3: RegPSO Compared with MPSO
50 trials per benchmark per algorithm; 200,000 function evaluations per trial.
s = 20; λ = 0.5; c1 = c2 = 1.49618; ω = 0.72984; and ε = 10⁻⁶ for both algorithms.
RegPSO used ρ = 1.2/ε and 100,000 function evaluations max per grouping.

Benchmark (n)       Stat       MPSO [11] (using GCPSO)   RegPSO (using Gbest PSO)
Ackley (30)         Median:    0.931                     2.0806e-8
                    Mean:      0.751                     2.4194e-8
Griewangk (30)      Median:    1.52e-9                   0.019684
                    Mean:      1.99e-9                   0.030309
Rastrigin (30)      Median:    45.8                      10.9525
                    Mean:      45.8                      11.9726
Mean Performance:              15.517                    4.001
The comparison was not entirely fair to RegPSO because GCPSO is an improved form of
the Gbest PSO at RegPSO's core; the regrouping mechanism was, however, efficient
enough to overcome this handicap and provide greater consistency than continually
restarting GCPSO on the original search space.
Comparison with OPSO
In Table V-4, RegPSO is compared to opposition-based PSO (OPSO) with
Cauchy mutation, which was developed to “accelerate the convergence of PSO and avoid
premature convergence” [14]. In OPSO, each particle has a fifty percent chance of being
selected to have its position opposite the center of the swarm evaluated in addition to
having its own position evaluated. If the opposite position is better, the particle jumps to
that position, leaving the less beneficial position behind. This is done to maintain
diversity with the hope of avoiding premature convergence. The Cauchy mutation
mutates the global best according to a distribution capable of providing large mutations at
times when compared to normal or uniform distributions; when the mutated position is
better than the original global best, the mutation is kept.
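The opposition step just described can be sketched in a few lines of Python/NumPy. This is an illustration of the idea only, not the reference OPSO implementation: the selection bookkeeping and the Cauchy mutation of the global best are omitted, and the helper name is invented.

```python
import numpy as np

def opposition_candidates(X, p=0.5, rng=None):
    """For each particle, with probability p, propose the position mirrored
    through the swarm center; the caller evaluates the proposals and moves
    any particle whose opposite position is better."""
    rng = np.random.default_rng() if rng is None else rng
    center = X.mean(axis=0)             # center of the swarm
    selected = rng.random(len(X)) < p   # about half the particles, on average
    X_opposite = 2.0 * center - X       # reflection through the center
    return X_opposite, selected
```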
OPSO was presented with a swarm size of ten for an expected sixteen function
evaluations per iteration resulting from ten particles: five opposite positions expected
according to probability one-half, and one extra function evaluation to consider mutating
the global best. However, it has been found empirically to work better with larger swarm
sizes. OPSO is compared with RegPSO using twenty particles over eight hundred
thousand function evaluations in order to see how capable the algorithm really is at
avoiding premature convergence. OPSO is given the benefit of the fifteen percent
velocity clamping value found to work well with the Gbest PSO it utilizes and
empirically verified to perform better with OPSO than a clamping value of fifty percent.
Table V-4: OPSO Compared with RegPSO both with and without Cauchy Mutation
800,000 function evaluations; s = 20; c1 = c2 = 1.49618; ω = 0.72984.
Column settings: OPSO (λ = 0.15); OPSO with Cauchy mutation (λ = 0.15); RegPSO (λ = 0.5).

Benchmark (n)          Stat          OPSO           OPSO w/ Cauchy   RegPSO
Ackley (30)            Median:       2.2686         1.8997           5.0832e-7
                       Mean:         2.3144         1.9851           5.2345e-7
                       Minimum:      0.9313         7.9936e-15       1.9571e-7
                       Maximum:      4.3405         3.3449           9.7466e-7
                       Std. Dev.:    0.68001        0.69098          1.6771e-7
Griewangk (30)         Median:       0.04051        0.016007         0.0098573
                       Mean:         0.071839       0.025233         0.013861
                       Minimum:      0              0                0
                       Maximum:      1.1762         0.15272          0.058867
                       Std. Dev.:    0.17207        0.030445         0.01552
Quadric (30)           Median:       1.323e-73      1.7387e-69       2.5503e-10
                       Mean:         7.3917e-71     3.4375e-66       3.1351e-10
                       Minimum:      6.4403e-79     3.0621e-74       6.0537e-11
                       Maximum:      1.6987e-69     1.1251e-64       9.5804e-10
                       Std. Dev.:    2.7855e-70     1.639e-65        2.2243e-10
Quartic w/ noise (30)  Median:       0.0048506      0.0047243        0.0006079
                       Mean:         0.0050701      0.0047242        0.00064366
                       Minimum:      0.0019586      0.0024328        0.0002655
                       Maximum:      0.0086246      0.0072114        0.0012383
                       Std. Dev.:    0.0015018      0.0011654        0.00021333
Rastrigin (30)         Median:       45.7681        49.74788         2.3981e-14
                       Mean:         48.136         51.63829         2.6824e-11
                       Minimum:      19.8992        24.87396         0
                       Maximum:      73.6268        107.4552         1.3337e-9
                       Std. Dev.:    13.0222        16.05482         1.886e-10
Rosenbrock (30)        Median:       5.3552e-7      0.000817947      0.0030726
                       Mean:         1.70137        2.45937          0.0039351
                       Minimum:      5.88319e-18    7.26269e-17      1.7028e-5
                       Maximum:      12.0378        24.5545          0.018039
                       Std. Dev.:    2.6865         3.9952           0.0041375
Schaffer’s f6 (2)      Median:       0              0                0
                       Mean:         0              0                0
                       Minimum:      0              0                0
                       Maximum:      0              0                0
                       Std. Dev.:    0              0                0
Sphere (30)            Median:       0              0                5.8252e-15
                       Mean:         3.4585e-322    0                9.2696e-15
                       Minimum:      0              0                1.2852e-15
                       Maximum:      1.6171e-320    2.4703e-323      4.9611e-14
                       Std. Dev.:    0              0                8.6636e-15
Weighted Sphere (30)   Median:       0              0                8.1295e-14
                       Mean:         9.3872e-323    2.4703e-323      9.8177e-14
                       Minimum:      0              0                1.9112e-14
                       Maximum:      4.0019e-321    4.0019e-322      2.5244e-13
                       Std. Dev.:    0              0                5.4364e-14
OPSO did not meet its goal of avoiding premature convergence as can be seen
from the Rastrigin benchmark over eight hundred thousand function evaluations.
RegPSO again provided the best consistency across benchmarks. The consistency across
the benchmark suite is a result of regrouping, which on very simple functions can
actually prevent particles from continuing to refine solution quality by regrouping when it
is not necessary to do so. This tradeoff is eagerly accepted when it is not known in
advance that a particular function is extremely simple since it is far more important to
approximate the global minimizer than to have better approximations sometimes and
horrible approximations other times; however, if it is known in advance that a problem is
quite simple to solve, a different regrouping mechanism is provided in the following
section specifically for this case so that no tradeoff in performance is necessary.
In all tables, the mean performance of RegPSO across benchmarks was superior
to that of the comparison algorithms.
RegPSO for Simple Uni-Modal Problems
Having demonstrated RegPSO to be less problem-dependent and more consistent
across the benchmark suite, the question became whether RegPSO might be capable of
improving performance on simple, uni-modal functions. Toward this end, a tiny
stagnation threshold of ε = 10⁻²⁵ was combined with a regrouping factor of
ρ = (6/(5×10¹⁹))·(1/ε) = 1.2×10⁶, which is a much smaller fraction of the inverse of the
stagnation threshold than used previously. The results are shown in Table V-5.
Table V-5: A RegPSO Model for Solution Refinement Rather than Exploration
s = 20; ω = 0.72984; c1 = c2 = 1.49618; ε = 10⁻²⁵; ρ = 1.2×10⁶; λ = 0.5.
800,000 function evaluations; 100,000 function evaluations max per grouping.

Benchmark (n)          Stat          RegPSO
Ackley (30)            Median:       3.1206
                       Mean:         3.6524
                       Minimum:      1.5017
                       Maximum:      7.0836
                       Std. Dev.:    1.4975
Griewangk (30)         Median:       0.049122
                       Mean:         0.055008
                       Minimum:      0
                       Maximum:      0.15666
                       Std. Dev.:    0.044639
Quadric (30)           Median:       7.6883e-77
                       Mean:         1.7754e-72
                       Minimum:      1.5117e-82
                       Maximum:      7.4807e-71
                       Std. Dev.:    1.0691e-71
Quartic w/ noise (30)  Median:       0.00058695
                       Mean:         0.00063169
                       Minimum:      0.0002655
                       Maximum:      0.0012383
                       Std. Dev.:    0.00021131
Rastrigin (30)         Median:       70.64194
                       Mean:         71.63686
                       Minimum:      42.78316
                       Maximum:      116.4097
                       Std. Dev.:    17.1532
Rosenbrock (30)        Median:       1.7703e-16
                       Mean:         0.87706
                       Minimum:      9.1369e-21
                       Maximum:      3.9866
                       Std. Dev.:    1.6682
Schaffer’s f6 (2)      Median:       0
                       Mean:         0
                       Minimum:      0
                       Maximum:      0
                       Std. Dev.:    0
Sphere (30)            Median:       0
                       Mean:         0
                       Minimum:      0
                       Maximum:      0
                       Std. Dev.:    0
Weighted Sphere (30)   Median:       0
                       Mean:         0
                       Minimum:      0
                       Maximum:      0
                       Std. Dev.:    0
Note that these results on the simple uni-modal Quadric, Sphere, and Weighted
Sphere are the best of all algorithms tested over a full eight hundred thousand function
evaluations, which suggests that RegPSO is highly scalable.
CHAPTER VI
CONCLUSIONS
An approach for dealing with the stagnation problem in PSO has been tested by
building into the algorithm a mechanism to automatically trigger swarm regrouping when
premature convergence is detected. The regrouping mechanism helps liberate particles
from the state of premature convergence and enables continued progress toward a global
minimizer. RegPSO has been shown to have better mean performance than the
algorithms it was compared with, a result that would have been more pronounced had
only multi-modal benchmarks been used. RegPSO also consistently outperformed in the
presence of noise. Given sufficient function evaluations, RegPSO was able to solve the
stagnation problem for each benchmark tested and approximate the true global minimizer
with each trial conducted.
Though the parameters used for RegPSO worked consistently across the
benchmark suite, it is not claimed that parameters have been fully optimized. While
RegPSO seems capable of reducing the problem-dependency usually seen in the standard
PSO algorithms so that parameter optimization may be less important, parameters such as
the regrouping factor certainly do have some degree of problem dependency. It may be
necessary to change parameters should the problem at hand present unusual difficulty.
For example, should greater precision be necessary, a smaller stagnation threshold could
be selected in order to allow more solution refinement prior to regrouping; conversely,
should less precision be necessary, particles could be regrouped sooner by setting a larger
stagnation threshold in order to achieve a quicker overall search.
RegPSO appears to be a good general-purpose optimizer based on the benchmarks
tested, which is certainly encouraging; however, it is cautioned that the empirical nature
of the experiment is not a theoretical proof that RegPSO will solve every problem well:
certainly, its performance must suffer somewhere. Future work will seek to identify
where the algorithm suffers in order to understand its limitations and apply it in the
proper contexts. One such difficulty already observed was with simple uni-modal functions,
where regrouping is unnecessary since particles quickly and easily approximate the
global minimizer to a high degree of accuracy, and where there is no better minimizer to
be found. However, even in this context, regrouping proved beneficial when the
stagnation threshold and regrouping factor were set small enough to help particles
improve accuracy of approximations to the true global minimizer – often finding it
exactly.
While the regrouping mechanism has been tested in conjunction with standard
Gbest PSO in order to demonstrate the usefulness of the mechanism itself, there does not
seem to be anything to prevent the same regrouping mechanism from being applied with
another search algorithm at its core. Performance may be improved in conjunction with
an improved local minimizer such as GCPSO.
It may be beneficial to consider turning off the regrouping mechanism once
particles have repeatedly converged to the same solution. This would allow eventual
solution refinement of greater precision rather than repeatedly cutting off the local search
in favor of exploration elsewhere.
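This suggestion can be sketched as a simple switch. The helper name, the tolerance, and the repeat count below are hypothetical choices for illustration, not part of the thesis:

```python
import numpy as np

def should_keep_regrouping(gbest_history, tol=1e-6, max_repeats=3):
    """Return False once the last max_repeats regroupings all converged to
    (nearly) the same global best, signalling a switch to pure refinement."""
    if len(gbest_history) < max_repeats:
        return True
    recent = [np.asarray(g, dtype=float) for g in gbest_history[-max_repeats:]]
    # Keep regrouping only if some recent solution differs from the first.
    return any(np.linalg.norm(g - recent[0]) > tol for g in recent[1:])
```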
It has been empirically observed that clamping velocities to fifteen percent of the
range of the search space on each dimension often provides a quicker convergence to
solutions of higher quality in conjunction with standard PSO. RegPSO using standard
Gbest PSO as its core, however, appears to benefit from larger velocities such as those
clamped to fifty percent of the range on each dimension. The larger maximum velocity
facilitates exploration after regrouping by allowing larger step sizes and more significant
momenta by which to resist repeated premature convergence to the remembered global
best. It may be possible to further improve RegPSO via a velocity clamping value that
gradually decreases from fifty percent to fifteen percent with each regrouping, so the
benefits of both values can be reaped.
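The clamping scheme just described can be sketched as follows; the geometric decay schedule and the helper names are illustrative assumptions, not a formulation taken from the thesis:

```python
import numpy as np

def clamp_velocity(v, lo, hi, k):
    """Clamp each velocity component to k times the search range on that
    dimension (e.g. k = 0.15 or k = 0.5)."""
    vmax = k * (hi - lo)
    return np.clip(v, -vmax, vmax)

def clamp_fraction(regroupings, k_start=0.5, k_end=0.15, decay=0.5):
    """Illustrative schedule: decay geometrically from 50% of the range
    toward 15% as the number of regroupings grows."""
    return k_end + (k_start - k_end) * decay ** regroupings
```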
With one set of parameters, RegPSO seems to improve performance consistency by
facilitating escape from potentially deceitful local wells; with another set of parameters,
designed to regroup within a tiny region rather than to escape from it, RegPSO solves
simple uni-modal problems free of entrapping wells quite well. It is suspected that
RegPSO may provide a degree of scalability previously missing in the standard PSO
algorithm.
APPENDIX (BENCHMARKS)

Each benchmark is listed with its dimensionality n, its formula, and its search range.

Ackley (n = 30):
    f(\vec{x}) = 20 + e - 20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2}\right) - \exp\left(\frac{1}{n}\sum_{j=1}^{n}\cos(2\pi x_j)\right), \quad -32 \le x_j \le 32

Griewangk (n = 30):
    f(\vec{x}) = 1 + \frac{1}{4000}\sum_{j=1}^{n} x_j^2 - \prod_{j=1}^{n}\cos\left(\frac{x_j}{\sqrt{j}}\right), \quad -600 \le x_j \le 600

Quadric (n = 30):
    f(\vec{x}) = \sum_{j=1}^{n}\left(\sum_{k=1}^{j} x_k\right)^2, \quad -100 \le x_j \le 100

Quartic with noise (n = 30):
    f(\vec{x}) = \sum_{i=1}^{n} i\,x_i^4 + \mathrm{random}[0,1), \quad -1.28 \le x_j \le 1.28

Rastrigin (n = 30):
    f(\vec{x}) = 10n + \sum_{j=1}^{n}\left(x_j^2 - 10\cos(2\pi x_j)\right), \quad -5.12 \le x_j \le 5.12

Rosenbrock (n = 30):
    f(\vec{x}) = \sum_{j=1}^{n-1}\left[100\left(x_{j+1} - x_j^2\right)^2 + \left(1 - x_j\right)^2\right], \quad -30 \le x_j \le 30

Schaffer's f6 (n = 2):
    f(\vec{x}) = 0.5 + \frac{\sin^2\left(\sqrt{x_1^2 + x_2^2}\right) - 0.5}{\left(1.0 + 0.001\left(x_1^2 + x_2^2\right)\right)^2}, \quad -100 \le x_j \le 100

Spherical (n = 30):
    f(\vec{x}) = \sum_{j=1}^{n} x_j^2, \quad -100 \le x_j \le 100

Weighted Sphere (n = 30):
    f(\vec{x}) = \sum_{j=1}^{n} j\,x_j^2, \quad -5.12 \le x_j \le 5.12
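For concreteness, two of the benchmarks can be implemented directly. This Python sketch uses the standard textbook forms of Rastrigin and Schaffer's f6, each with minimum value 0 at the origin:

```python
import numpy as np

def rastrigin(x):
    """Rastrigin: f(x) = 10n + sum(x_j^2 - 10 cos(2 pi x_j))."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def schaffer_f6(x):
    """Schaffer's f6 in two dimensions."""
    x1, x2 = float(x[0]), float(x[1])
    s = x1**2 + x2**2
    return 0.5 + (np.sin(np.sqrt(s))**2 - 0.5) / (1.0 + 0.001 * s)**2
```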
BIOGRAPHICAL SKETCH
George I. Evers received the Bachelor of Arts in Mathematics with a Physics
minor, secondary teaching certificate, and Cum Laude honors from Texas A&M
University – Kingsville, where he served as president of the TAMUK chapter of the
Society of Physics Students, taught physics labs from age 18, and enjoyed conversations
with professors.
Since then, he has primarily enjoyed working in other countries, staying long
enough to appreciate the cultures and languages. He enjoys seeing how various cultures
have evolved different approaches to life’s basic challenges and inferring the resulting
strengths and weaknesses of each approach.
Trading stocks at night while keeping his day job, he became interested in
algorithms. He has enjoyed tackling PSO's stagnation problem as well as drawing
parallels between the algorithm and society, such as the observation that an inherently
diverse population and a slow rate of agreement between groups may help avoid
premature convergence to sub-optimal solutions, even though these traits appear
unfruitful in the short term.
At The University of Texas – Pan American, he served as a teaching assistant for
the department of electrical engineering.