
Page 1:

MOEA Neural Network Design

Page 2:

Design Specification

• Essential requirements for neural network design:

o A training algorithm that can search for the optimal parameters (i.e., weights and biases) for the specified network structure and training task.

o A rule or algorithm that can determine the network complexity and ensure that it is sufficient for the given training problem.

o A metric or measure to evaluate the reliability and generalization of the produced neural network.

Page 3:

Design Dilemma

• A single optimal neural network $f_{NS}(\mathbf{x}^*, \omega^*)$ is very difficult to find because:

o Weights are tuned by training sets, which are finite; it is difficult to extract $f_{NS}(\mathbf{x}^*, \omega^*)$ from $F_{NS}$ using a finite training data set.

o There is a trade-off between NN learning capability and the number of hidden neurons:

A network with insufficient neurons: low training performance.

A network with an excessive number of neurons: poor generalization.

• Instead of trying to obtain a single optimal neural network, finding a set of near-optimal networks with different network structures is more feasible.

• NN design is a multiobjective optimization problem (achieve better performance while simplifying the network structure).

Page 4:

Design Principle

• A population-based, parallel-searching multiobjective genetic algorithm is suitable for neural network design, to find a set of non-dominated neural network solutions:

o Find a uniformly distributed Pareto front in one run

o Equally treat discontinuous, concave and other shapes of Pareto front

[Figure: Pareto front in (f1, f2) objective space; f1 = network complexity (neuron number), f2 = network training error]

Page 5:

Hierarchical Genotype

• Using GA to evolve an RBF neural network

o Evolving NN topology along with parameters

o Hierarchical gene structure in the genotype design

Phenotype (RBF network output):

$f(\mathbf{x}) = \sum_{i=1}^{m} \omega_i \exp(-\|\mathbf{x} - \mathbf{c}_i\|^2)$

[Figure: genotype made of binary control genes (e.g., 1 0 0 1 0 1) together with weight genes and center genes; the control genes determine which hidden neurons are expressed in the phenotype]
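To make the hierarchical encoding concrete, here is a minimal Python sketch of one plausible genotype-to-phenotype decoding. The class name and layout are illustrative assumptions, not the exact representation used in HRDGA.

import math

class RBFGenotype:
    def __init__(self, control, centers, weights):
        self.control = control    # binary control genes: 1 = hidden neuron expressed
        self.centers = centers    # center genes: one center vector per hidden neuron
        self.weights = weights    # weight genes: one output weight per hidden neuron

    def phenotype(self, x):
        # Decode to the RBF output; only neurons switched on by control genes contribute
        out = 0.0
        for on, c, w in zip(self.control, self.centers, self.weights):
            if on:
                dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                out += w * math.exp(-dist2)
        return out

Turning control genes off removes whole neurons, which is how the genotype evolves topology and parameters at the same time.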

Page 6:

Domination Ranking

• Goal: to convert the multiple fitness values of a multiobjective problem into a single rank value.

[Figure: population layered into successive non-dominated fronts in (f1, f2) space]

• The rank value represents the domination relationship, obtained by layering the resulting population.
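One way to realize this layering is to peel off successive non-dominated fronts; the following minimal Python sketch assumes a minimization problem over lists of objective vectors.

def dominates(a, b):
    # a dominates b (minimization): no worse in every objective, better in at least one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def layer_ranks(objs):
    # Rank of an individual = index of the non-dominated layer it belongs to
    remaining = set(range(len(objs)))
    ranks, layer = {}, 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = layer
        remaining -= front
        layer += 1
    return ranks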

Page 7:

Diversity Preservation

• Goal: to maintain population diversity during the evolutionary process, to obtain a uniformly distributed Pareto front.

[Figure: two populations in (f1, f2) space, one clustered and one uniformly spread along the Pareto front]

Page 8:

Automatic Accumulated Ranking

• Combine the hierarchical genotype representation with the Rank-Density based Genetic Algorithm (RDGA). The Automatic Accumulated Ranking Strategy includes both dominance and diversity information.

• All nondominated individuals have rank = 1. For a dominated individual y at generation t, the rank value is given by

$\mathrm{rank}(y, t) = 1 + \sum_{j=1}^{p(t)} \mathrm{rank}(y_j, t)$

where p(t) is the number of individuals that dominate y, and the $y_j$ are those dominating individuals.

[Figure: example accumulated rank values plotted in (f1, f2) space]
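A small sketch of the accumulated ranking, reusing dominates() from the earlier snippet; the recursion mirrors the formula directly and terminates because dominance is acyclic.

def aars_ranks(objs):
    n = len(objs)
    dominators = [[j for j in range(n) if dominates(objs[j], objs[i])]
                  for i in range(n)]
    ranks = [None] * n
    def rank(i):
        # Nondominated individuals get rank 1; others add the ranks of all dominators
        if ranks[i] is None:
            ranks[i] = 1 + sum(rank(j) for j in dominators[i])
        return ranks[i]
    return [rank(i) for i in range(n)]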

Page 9:

Adaptive Cell Density Estimation

• In HRDGA, an adaptive grid density estimation approach is proposed. The length of an adaptive grid cell along objective i in the objective space is computed as

$d_i = \frac{\max_{\mathbf{x} \in X} f_i(\mathbf{x}) - \min_{\mathbf{x} \in X} f_i(\mathbf{x})}{K_i}, \quad i = 1, \ldots, n$

• The density value of each individual is the number of individuals located in the same cell.

[Figure: adaptive grid in (f1, f2) space; a cell containing four individuals gives each of them density = 4]
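A minimal Python version of the grid density computation, assuming K[i] cells per objective; clamping boundary points into the last cell and the guard for a zero-width range are implementation choices.

def cell_density(objs, K):
    # Adaptive grid: cell length d_i = (max f_i - min f_i) / K_i for each objective i
    m = len(objs[0])
    lo = [min(f[i] for f in objs) for i in range(m)]
    d = [(max(f[i] for f in objs) - lo[i]) / K[i] or 1.0 for i in range(m)]

    def cell(f):
        return tuple(min(int((f[i] - lo[i]) / d[i]), K[i] - 1) for i in range(m))

    counts = {}
    for f in objs:
        counts[cell(f)] = counts.get(cell(f), 0) + 1
    # Density of an individual = number of individuals sharing its cell
    return [counts[cell(f)] for f in objs]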

Page 10:

Fitness Assignment

• In HRDGA, the original population is divided so as to optimize the rank and density values independently; the rank and density values can thus be optimized simultaneously.

[Figure: at the i-th generation the population is split into sub-pop 1, selected to minimize rank value, and sub-pop 2, selected to minimize density value; selection, crossover and mutation then produce the (i+1)-th generation]

Page 11:

Elitism

• An elitist archive is used, and an adaptive probability of sampling an individual from the archive is applied.

[Figure: exchange of individuals between the main population and the elitists' archive]

Page 12:

Local Search

• Diffusion scheme from Cellular GA

[Figure: diffusion scheme; a selected parent is recombined with the best individual in its neighborhood to produce offspring]

Page 13:

Forbidden Region

• To prevent the appearance of offspring with a high rank value and a low density value.

[Figure: offspring generated from a selected parent are rejected if they fall into the forbidden region]

Page 14:

Mackey-Glass Time Series Prediction

• Algorithms for comparison:

o K-Nearest Neighbor (KNN, Kaylani & Dasgupta, 2000)

o Generalized Regression Neural Network (GRNN, Wasserman, 1993)

o Orthogonal Least Squares (OLS, Chen et al., 1991)

• Testing problem: Mackey-Glass chaotic time series

$\frac{dx(t)}{dt} = \frac{a \times x(t-\tau)}{1 + x^{c}(t-\tau)} - b \times x(t)$

o To predict x(t+6) based on x(t), x(t-6), x(t-12) and x(t-18)

o Parameters: a = 0.2, b = 0.1, c = 10, τ = 150
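For reference, the series can be generated by simple Euler integration of the equation above; the step size, initial value and constant history below are illustrative assumptions, not necessarily the setup used for the experiments in these slides.

def mackey_glass(n, a=0.2, b=0.1, c=10, tau=150, dt=1.0, x0=1.2):
    # Euler steps of dx/dt = a*x(t - tau) / (1 + x(t - tau)**c) - b*x(t)
    lag = int(tau / dt)
    x = [x0] * (lag + 1)                      # constant history before t = 0
    for t in range(lag, lag + n - 1):
        x_tau = x[t - lag]
        x.append(x[t] + dt * (a * x_tau / (1.0 + x_tau ** c) - b * x[t]))
    return x[lag:]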

Page 15:

Parameter Setting

• For HRDGA, the lengths of all three layers of genes are chosen to be 150, and the population size is 400.

• For the KNN and GRNN methods, 70 individual networks with 11 to 80 neurons are used for comparison.

• For the OLS method, 40 different tolerance parameter ρ values are chosen, from 0.01 to 0.4 with step 0.01. ρ determines the trade-off between the performance and complexity of a network.

• Stopping criteria: either the epochs (generations) exceed 5,000, or the training Sum Square Error (SSE) between two sequential epochs (generations) is smaller than 0.01.

• Training data set: the first 250 seconds of data.

• Testing data sets: data from 250–499, 500–749, 750–999 and 1,000–1,249 seconds.

Page 16:

Simulation Study- Training Set

[Figure: training SSE vs. neuron number, and the non-dominated front for the training set]

Page 17:

Simulation Study- Testing Set #1

[Figure: training SSE vs. neuron number, and the non-dominated front for testing set #1]

Page 18:

Simulation Study- Testing Set #2

[Figure: training SSE vs. neuron number, and the non-dominated front for testing set #2]

Page 19:

Simulation Study- Testing Set #3

[Figure: training SSE vs. neuron number, and the non-dominated front for testing set #3]

Page 20:

Simulation Study- Testing Set #4

[Figure: training SSE vs. neuron number, and the non-dominated front for testing set #4]

Page 21:

Performance Comparison

Best performance (SSE and neuron number) on the training set and testing sets #1–#4:

Method   Training SSE / Neurons   Testing #1 SSE / Neurons   Testing #2 SSE / Neurons   Testing #3 SSE / Neurons   Testing #4 SSE / Neurons
KNN      2.8339 / 69              3.3693 / 42                3.4520 / 42                4.8586 / 48                4.8074 / 19
GRNN     2.3382 / 68              2.7720 / 38                3.0711 / 43                2.9644 / 40                3.2348 / 37
OLS      2.3329 / 60              2.4601 / 46                2.5856 / 50                2.5369 / 37                2.7199 / 54
HRDGA    2.2901 / 74              2.4633 / 47                2.5534 / 52                2.5226 / 48                2.7216 / 58

Page 22:

Problem for OLS

• The trade-off characteristic between network performance and complexity depends entirely on the value of the tolerance parameter ρ.

• The same ρ value means completely different trade-off features for different NN design problems, so the designer cannot control the network complexity.

• The relationship between the ρ value and the network topology is nonlinear and a many-to-one mapping.

Page 23:

Observations

• A new genetic-algorithm-assisted neural network design approach, the Hierarchical Rank-Density Genetic Algorithm (HRDGA), is proposed.

• Characteristics of HRDGA:

o Hierarchical genotype representation: evolves topology together with parameters.

o Multiobjective optimization: finds a set of near-optimal network candidates in one run.

o Elitism, local search and forbidden region techniques help HRDGA find near-optimal networks with low training errors.

o The network complexity is almost completely and uniformly sampled thanks to HRDGA's diversity-maintaining scheme.

Page 24:

PARTICLE SWARM OPTIMIZATION

Page 25:

Definition

Swarm Intelligence (SI) is the property of a system whereby the collective behaviors of (unsophisticated) agents interacting locally with their environment cause coherent functional global patterns to emerge. SI provides a basis with which it is possible to explore collective (or distributed) problem solving without centralized control or the provision of a global model.

Page 26:

In Simple Language…

SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment to accomplish a simple goal. Although there is normally no centralized control structure dictating how individual agents should behave, local interactions between such agents often lead to the emergence of global behavior (swarming behavior).

Page 27:

SI in Nature

[Images: fish schooling; bird flocking in V formation (©CORO, Caltech)]

The benefit of forming a swarm of thousands of animals (agents) is to increase foraging efficiency and defense against predators.

Page 28:

• Another example: nest building in termites.

• An eighteen-foot-tall termite nest at the University of Ghana, Legon.

(© Kjell B. Sandved / Visuals Unlimited)

Page 29:

How Termites Build their Nest

1. Each termite scoops up a mudball from its environment and invests the ball with pheromones.

2. Then it randomly drops the ball on the ground or on elevated positions to form small heaps.

3. Termites are attracted to their nestmates' pheromones. Hence, they are more likely to drop their own mudballs near or on top of their neighbors'.

4. The process stops when a heap reaches a specific height.

5. Termites look for heap clusters, choose the heap clusters, and connect them with each other by building walls.

6. Over time this leads to the construction of pillars, arches, tunnels and chambers.

Page 30:

Fundamental Concepts of SI

Self-organization process
– Each agent plays its role by interacting with its environment to gather the latest information, constantly making decisions based on simple local rules and the information received, and interacting locally with other agents

Division of labor
– Different groups of agents have their own specializations to carry out certain tasks
– They collaborate with other groups and perform their own tasks simultaneously

Page 31:

Applications of ‘Swarm’ Principle

Robotics – Swarm-bots
– http://www.swarm-bots.org/

Computer Animation
– http://gvu.cc.gatech.edu/animation/Areas/group_behavior/group.html

Games, interactive graphics and virtual reality

"Swarm-like" algorithms for solving optimization problems
– Particle Swarm Optimization (PSO)
– Ant Colony Optimization (ACO)

Page 32:

Early Particle Swarm Optimization

Craig Reynolds's flock motion (1986-1987)
– Models the flocking behavior of simple agents (boids)
– The resulting flock motion emerges from the interaction between the behaviors of individual boids
– Each boid has its own coordinate system and applies a geometric flight model to support its flight movement
– The geometric flight model includes translation and the flight dynamic parameters of yaw, pitch, and banking (roll)

Page 33:

– Incorporates three steering behaviors (local rules), which are the underlying concept of flocking
– Each boid has its own local neighborhood (the limited perception range of birds or fishes in nature)

[Figure: a boid's neighborhood defined by a distance (radius) and an angle (field of view)]

Page 34:

The three steering behaviors describe the maneuverability of an individual boid:

Separation: steer to avoid crowding local flockmates

Alignment: steer towards the average heading of local flockmates (neighbors)

Cohesion: steer to move toward the average position of local flockmates

Reference: http://www.red3d.com/cwr/boids/index.html

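The three rules translate naturally into vector arithmetic. Below is a minimal NumPy sketch; the neighborhood radius and rule weights are illustrative, and the perception-angle test is omitted for brevity.

import numpy as np

def boid_steering(i, pos, vel, radius=2.0, w_sep=1.5, w_ali=1.0, w_coh=1.0):
    # pos, vel: (n_boids, dims) arrays; returns the combined steering vector for boid i
    offsets = pos - pos[i]
    dist = np.linalg.norm(offsets, axis=1)
    near = (dist > 0) & (dist < radius)                         # local flockmates
    if not near.any():
        return np.zeros(pos.shape[1])
    sep = -(offsets[near] / dist[near, None] ** 2).sum(axis=0)  # separation: steer away, closer = stronger
    ali = vel[near].mean(axis=0) - vel[i]                       # alignment: match the average heading
    coh = pos[near].mean(axis=0) - pos[i]                       # cohesion: move toward the average position
    return w_sep * sep + w_ali * ali + w_coh * coh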

Page 35:

Simulated Boid Flock

[Animation: simulated boid flock avoiding cylindrical obstacles (1986)]

Reference: http://www.red3d.com/cwr/boids/index.html

Page 36:

Reynolds's pioneering work became the stepping stone for the development of a computer graphics area known as behavioral animation

– Georgia Tech's physically-based models of group behaviors
– Stampede sequence (The Lion King)
– Orc army (The Lord of the Rings)

Page 37:

Heppner's artificial bird flock simulation
– Studied bird flocks from movies
– Collaborated with Grenander and Potter to develop a program that simulates artificial bird flocks
– Through observation, Heppner realized that chaos theory can be used to explain the emergent behavior in flocking
– Designed four simple rules to model an individual bird's behavior

F. Heppner and U. Grenander, “A stochastic nonlinear model for coordinated bird flocks.” The Ubiquity of Chaos, ed. S. Krasner, AAAS Publications, Washington, DC, 1990

Page 38:

1. An attractive force allows the birds to attract each other, and a repulsive force forbids the birds from flying too close to each other
2. Each bird maintains the same velocity as its neighboring birds
3. Occasionally, a bird's flight path can be altered by a random input (craziness)
4. The birds are attracted to a roost, and the attraction increases as the birds fly closer to the roost

Page 39:

The whole concept is as follows:
– The birds begin to fly around with no particular destination
– Once a bird discovers the roost, it will move away from the flock and land on the roost
– Hence, it will pull its nearby neighbors to move towards the roost
– As these neighbors discover the roost, they will land on the roost and bring in more birds
– This process goes on until the entire flock lands on the roost

Page 40:

Particle Swarm Optimization (PSO)

Inspired by the "roosting area" concept, James Kennedy (a social psychologist) and Russell Eberhart (an electrical engineer) revised the methodology proposed by Heppner and Grenander.

In nature, how do the birds know where to locate food (the "roost") when they are hundreds of feet in the air?

They explored the area of social psychology, which relates to the social behavior of human beings.

Reference: http://www.adaptiveview.com/articles/ipsop1.html

Page 41:

Their conclusion: knowledge is shared within the flock.

They also include the "social mind" viewpoint:
– Individuals want to be individualistic, i.e., to improve themselves.
– Individuals want to learn from the success of their neighbors (both locally and globally), primarily learning from their "experiences".

Hence, they developed Particle Swarm Optimization (PSO).

Page 42:

About PSO

PSO is a population-based optimization technique.
The population is called a swarm (swarm population).
The potential solutions (particles) form a swarm "flying" around the search space to search for the best solution.
Particles = candidate solutions (decision variables).
Particles' flights are governed by historical information:
– Velocity (v)
– Own personal best position found so far (pbest)
– Global best position discovered so far by any particle in the swarm (gbest)

Page 43:

[Figure: in (x1, x2) space, a particle at its current position x_i(t) with velocity v_i(t) moves to a new position x_i(t+1), pulled towards its personal best position (pbest_i) and the global best position (gbest)]

Page 44:

Standard PSO Equation

At each iteration t, for each particle i, for each dimension j:

Update velocity:

$v_{i,j}(t+1) = w \times v_{i,j}(t) + c_1 \times r_1 \times (pbest_{i,j} - x_{i,j}(t)) + c_2 \times r_2 \times (gbest_j - x_{i,j}(t))$

– Momentum component: $w \times v_{i,j}(t)$
– Cognitive component: $c_1 \times r_1 \times (pbest_{i,j} - x_{i,j}(t))$
– Social component: $c_2 \times r_2 \times (gbest_j - x_{i,j}(t))$

Update position:

$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1)$

Page 45:

r1, r2 – random numbers within [0,1]
c1, c2 – acceleration constants
Momentum component – the inertia weight w controls the impact of the previous velocity
Cognitive component – the personal thinking of each particle; its personal desire to exceed its current achievement
Social component – social knowledge attained via the collaborative effort of all the particles

Page 46:

Velocity Clipping Criterion

Kennedy investigated the swarm behavior when the velocity clipping criterion is not introduced. Without the velocity clipping criterion, the swarm diverges in sinus-like waves of increasing amplitude without being able to converge to the global optimum. Hence, the velocity clipping criterion is necessary for the swarm to converge close or equal to the global optimum.

J. Kennedy, and R. C. Eberhart, Swarm Intelligence, ISBN 1-55860-595-9, Academic Press (2001)

Page 47:

[Figure: a particle's trajectory without the velocity clipping criterion (diverging oscillations) and with the velocity clipping criterion (bounded)]

Jakob Vesterstrøm and Jacques Riget, "Particle Swarms: Extensions for improved local, multi-modal, and dynamic search in numerical optimization," May 2002.

Page 48:

To prevent the particles from leaving the search space, one of the following steps can be taken:

– Velocity clipping criterion: each particle is not allowed to have velocities exceeding the user-defined limit, i.e., $[-v_{max}, v_{max}]$. Usually, $v_{max,i}$ is chosen to be $k \times x_{max,i}$, where $[x_i^L, x_i^U]$ is the feasible bound for variable i.

– Position clipping criterion: each particle is not allowed to have decision variables exceeding the feasible bounds $[x_i^L, x_i^U]$.

Page 49:

PSO Algorithm (Pseudo Code)

/* Initialization */
Initialize the swarm randomly (particles and velocities)
Set w, c1, c2, the max number of iterations (tmax); store the particles' positions (pbest)

begin
While t < tmax
  for each particle
    Calculate fitness
    Update pbest if the current position is better than the position contained in the memory
  End
  Find the global best position (gbest)
  for each particle
    Update the velocity and position of the particle
    Apply the velocity/position clipping criterion
  End
End while
Report the optimum solution (gbest)
End
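A compact runnable version of the loop above in Python, assuming a minimization problem; the inertia weight, acceleration constants and the k = 0.5 velocity limit are illustrative choices.

import numpy as np

def pso(f, lower, upper, n_particles=30, t_max=1000, w=0.7, c1=1.5, c2=1.5):
    lo, hi = np.asarray(lower, float), np.asarray(upper, float)
    dim = lo.size
    v_max = 0.5 * (hi - lo)                                    # velocity limit, k = 0.5
    x = np.random.uniform(lo, hi, (n_particles, dim))
    v = np.random.uniform(-v_max, v_max, (n_particles, dim))
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(t_max):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # standard update
        v = np.clip(v, -v_max, v_max)                               # velocity clipping
        x = np.clip(x + v, lo, hi)                                  # position update + clipping
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Example: minimize the 2-D sphere function
best_x, best_f = pso(lambda z: float(np.sum(z ** 2)), [-5, -5], [5, 5])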

Page 50:

Animation of PSO

Function:

$F(x_1, x_2) = \left( \sum_{k=1}^{5} k \cos((k+1)x_1 + k) \right) \times \left( \sum_{k=1}^{5} k \cos((k+1)x_2 + k) \right) + (x_1 + 1.42513)^2 + (x_2 + 0.80032)^2$

– Minimization problem
– Two decision variables
– No decision variable bounds

Page 51:

Page 52:

Modification in PSO for Solving SOPs

1. Parameter Settings
2. Modifications of the PSO Equation
3. Neighborhood Topology
4. Mutation/Perturbation Operators
5. Multiple-Swarm Concept in PSO

Page 53:

1. Parameter Settings

[Figure: when the inertia weight w is large, the previous velocity v_i(t) dominates the update, so the new position x_i(t+1) continues mostly along the particle's current direction, away from pbest_i and gbest]

Page 54:

[Figure: with c1 >> c2, the cognitive component dominates, so the new position x_i(t+1) is pulled mainly towards the particle's personal best pbest_i]

Page 55:

[Figure: with c2 >> c1, the social component dominates, so the new position x_i(t+1) is pulled mainly towards the global best gbest]

Page 56:

Random inertia weight
– Experiments indicate that this strategy accelerates the convergence of the particle swarm in the early stage of the algorithm

$w = 0.5 + \frac{rand()}{2}$

– rand() is a uniformly distributed random number within [0,1]

*Y. H. Shi, R. C. Eberhart, "Empirical Study of Particle Swarm Optimization", Proceedings of the Congress on Evolutionary Computation, Piscataway, pp. 1945-1949, 1999

Page 57:

Linearly decreasing inertia weight

$w = (w_1 - w_2) \times \frac{t_{max} - t}{t_{max}} + w_2$

– w1 and w2 are the initial and final values of the inertia weight
– A larger value of w facilitates global search at the beginning of the run
– A smaller w encourages more local search ability near the end of the run
– Experiments indicate good performance when the inertia weight descends from 0.9 to 0.4

*R. C. Eberhart, Y. Shi, “Comparing inertia weight and constriction factors in particle swarm optimization”, Proceeding Congress on Evolutionary Computation, San Diego, pp. 84-88, 2000

Page 58:

Chaotic inertia weight
– Uses a chaotic mapping to set the inertia weight coefficient
– Logistic mapping: $z(t+1) = \mu \times z(t) \times (1 - z(t))$

[Figure: distribution of the logistic map when µ = 4, iterated 30,000 times; the counts falling into the end intervals near 0.1 and 0.9 are very high, while the mean count over the interval [0.1, 0.9] is 200]

Page 59:

– Strategy of the chaotic inertia weight:
1. Select a random number z in the interval (0, 1)
2. Compute the logistic mapping of z with µ = 4
3. Apply z to either the linearly decreasing inertia weight or the random inertia weight:

$w = (w_1 - w_2) \times \frac{t_{max} - t}{t_{max}} + w_2 \times z$  or  $w = 0.5 \times z + \frac{rand()}{2}$

Experiments: good convergence precision, quick convergence velocity, and better global search ability

Yong Feng, Gui-Fa Teng, Ai-Xin Wang, and Yong-Mei Yao, “Chaotic inertia weight in particle swarm optimization,” Proceeding 2nd International Conference on Innovative Computing, Information and Control, Kumamoto, Japan, pp. 475-475, 2007
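A tiny sketch of the chaotic linearly decreasing variant; z is carried across iterations, and the 0.9/0.4 weight bounds are the commonly reported values, used here as assumptions.

def chaotic_inertia_weight(t, t_max, z, w1=0.9, w2=0.4):
    # One logistic-map step (mu = 4), then the chaotic linearly decreasing weight
    z = 4.0 * z * (1.0 - z)
    w = (w1 - w2) * (t_max - t) / t_max + w2 * z
    return w, z   # feed z back in at the next iteration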

Page 60:

Time-varying acceleration coefficients (c1, c2)
– Large c1 and small c2 in the early stage encourage particles to explore the search space
– Larger c2 and smaller c1 promote quick convergence to the optimum solution in the later stage

$c_1 = (c_{1f} - c_{1i}) \frac{t}{t_{max}} + c_{1i}$

$c_2 = (c_{2f} - c_{2i}) \frac{t}{t_{max}} + c_{2i}$

Ratnaweera A., Halgamuge S. K., and Watson H. C., "Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients," IEEE Transactions on Evolutionary Computation, Vol. 8, No. 3, June 2004

Page 61:

2. Modifications of PSO Equation

Canonical PSO by Maurice Clerc
– Studied the swarm behavior using second-order differential equations
– The study shows that it is possible to determine under which conditions the swarm will converge
– Introduces a constriction factor χ to ensure convergence by restraining the velocity of the particles
– Observation: the amplitude of a particle's oscillations decreases and increases depending on the distance between pbest and gbest

M. Clerc and J. Kennedy, "The Particle Swarm - Explosion, Stability, and Convergence in a Multidimensional Complex Space," IEEE Transactions on Evolutionary Computation, Vol. 6, No. 1, February 2002: 58-73

Page 62:

– A particle will oscillate around the weighted mean of pbest and gbest
– If pbest and gbest are near each other, the particle will perform a local search
– If pbest and gbest are far apart from each other, the particle will perform a global search
– During the search process, the particle will shift from local search back to global search depending on pbest and gbest
– The constriction factor balances the need for local and global search depending on how the social conditions are in place

Page 63:

The velocity update equation is

$v_{i,j}(t+1) = \chi \left( v_{i,j}(t) + c_1 r_1 (pbest_{i,j} - x_{i,j}(t)) + c_2 r_2 (gbest_j - x_{i,j}(t)) \right)$

where $\chi = \frac{2\kappa}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}$; $\kappa \in [0,1]$; and $\varphi = c_1 + c_2$, $\varphi > 4$
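Numerically, the constriction factor is a one-liner; the common choice c1 = c2 = 2.05 (so φ = 4.1) gives χ ≈ 0.7298.

import math

def constriction_factor(c1=2.05, c2=2.05, kappa=1.0):
    phi = c1 + c2                 # the formula requires phi > 4
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))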

Page 64:

The parameter κ controls the convergence speed to the point of attraction.
If κ is close to zero, χ will be close to zero, and the resulting velocity will be small. A small velocity encourages local search, so the convergence speed is high.
If κ is close to one, the search behavior is highly exploratory, with the slowest possible convergence speed.
Experiments: even without the velocity clipping criterion, the constriction factor can prevent the particles from leaving the search space and ensure convergence.

Page 65:

Gaussian Particle Swarm Model (GPSO)
– Observation shows that the expected values of the coefficients multiplying (pbest - x) and (gbest - x) are 0.729 and 0.85
– Use a probability distribution that generates random values with expected values of [0.729, 0.85]

$v_{i,j}(t+1) = |randn| \times (pbest_{i,j} - x_{i,j}(t)) + |randn| \times (gbest_j - x_{i,j}(t))$

$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1)$

– |randn| are positive random numbers generated according to abs[N(0,1)]

R. A. Krohling, “Gaussian swarm: a novel particle swarm optimization algorithm,” Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 372-376, 2004.

Page 66:

3. Neighborhood Topology

From the standard PSO equation, the movement of a particle is influenced by both the personal best (pbest) and the global best (gbest).
Neighborhood topology: the topology of a swarm (usually replacing gbest).
The neighborhood topology describes how the particles in the swarm are connected with each other in terms of sharing their knowledge.

Page 67:

The convergence rate can be estimated by calculating the average distance between two particles in the neighborhood topology. A shorter average distance facilitates a quicker convergence speed; a longer average distance corresponds to a lower degree of connectivity.

Page 68:

Global Topology (STAR)
– Happens in every iteration
– All the particles are connected, such that knowledge is shared by all particles
– The best solution found by any particle in the swarm will influence all the particles at the next iteration

[Figure: star topology in which particles a through g are all connected to one another]

Page 69:

Wheel Topology
– When a particle (example: b) finds the global best, particle a (the hub) is immediately drawn towards it (at the next iteration)
– Only when particle a moves to that location does it influence the rest of the particles
– One or more iterations are required before all the particles in the neighborhood are influenced by the global best

[Figure: wheel topology with hub particle a connected to particles b through g; the rest of the particles are drawn in through particle a]

Page 70:

Ring Topology (Circle or lbest)
– Each particle is connected with K immediate neighbors
– When one particle (example: b) finds the global best, only its immediate neighbors (i.e., a and c) will be drawn towards b
– Other particles are not influenced by b until their immediate neighbors have moved towards that location
– A few iterations may be required before all the particles in the neighborhood are influenced by the global best

[Figure: ring topology in which particles a through g are each connected to their immediate neighbors]

Page 71:

Each neighborhood has strengths and weaknesses:
STAR – fast information flow; converges fast, but with the potential of premature convergence
Wheel – moderate information flow; converges more quickly, but sometimes to a suboptimal point in the space
Circle – slow information flow; converges slower and favors more exploration; might have more chances to find better solutions, slowly
There are many more neighborhood topologies, and the choice among them depends on the problem (i.e., it is problem dependent).
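As a sketch of how a topology plugs into the algorithm, the ring (lbest) variant only changes which "best" each particle follows: in the velocity update, gbest is replaced by lbest[i], computed from the particle's immediate neighbors.

import numpy as np

def ring_lbest(pbest, pbest_f, k=1):
    # Best personal-best among each particle's k neighbors on either side of the ring
    n = len(pbest_f)
    lbest = np.empty_like(np.asarray(pbest, float))
    for i in range(n):
        neigh = [(i + d) % n for d in range(-k, k + 1)]   # wrap-around ring indices
        best = min(neigh, key=lambda j: pbest_f[j])       # minimization
        lbest[i] = pbest[best]
    return lbest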

Page 72:

4. Mutation/Perturbation Operators

Problem of PSO: lack of diversity; the swarm is easily trapped in a local optimum.
A mutating particle may be able to lead the other particles away from their current positions if this particle becomes the global best.
Hence, mutation operators are applied to PSO as a strategy to enhance the exploration of the particles and to escape local minima.

Page 73:

Where to apply the mutation operators in PSO?
– Apply to the updated particle's position (decision variables)
– Apply to the user-defined maximum velocity threshold

Page 74:

The mutation operators are the mutation approaches used for GA or MOEA. Examples:
– Random mutation; non-uniform mutation; normally distributed mutation
– Example of Gaussian mutation used for PSO*:

$mutation(x) = x \times (1 + Gaussian(\sigma))$

where σ is set to be 0.1 times the length of the search space in one dimension; OR σ can be set at 1.0 and linearly decreased to 0 as the iteration count reaches the maximum criterion
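A direct transcription of the Gaussian mutation above, assuming the multiplicative (1 + Gaussian) form; since the Gaussian is zero-mean, the sign convention does not change its statistical effect.

import random

def gaussian_mutation(x, sigma):
    # sigma = 0.1 * (upper - lower) of the search space in one dimension is one common setting
    return x * (1.0 + random.gauss(0.0, sigma))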

Page 75:

5. Multiple-Swarm Concept in PSO

Improve performance by promoting exploration and diversity
– Counters PSO's tendency towards premature convergence

Three main groups:
– Improve the performance of PSO by promoting diversity
– Solve multimodal problems
– Locate and track the optima of a multimodal problem in a dynamic environment

Page 76:

Kennedy proposed using a k-means clustering algorithm to identify the centers of different clusters of particles in the population, and then using these cluster centers to substitute for the personal bests.

– Requires a pre-specified number of iterations to determine the cluster centers, and a pre-specified number of clusters
– Not suitable for multimodal problems, since the cluster centers are not necessarily the best-fit particles in their clusters

*J. Kennedy, “Stereotyping: improving particle swarm performance with cluster analysis,” Proceedings of Congress on Evolutionary Computation, San Diego, CA. pp. 1507-1512, 2000

Page 77:

– The number of clusters and the number of iterations to identify the cluster centers must be predetermined.

[Figure: cluster A's center performs better than all members of cluster A, whereas cluster B's center performs better than some members and worse than others*]

Page 78:

Chen and Yu's TPSO
– The first subswarm will optimize following the global best; the second subswarm will move in the opposite direction
– A particle's pbest is updated based on its local best, its corresponding subswarm's best, and the global best collected from the two subswarms
– If the global best has not improved for 15 successive iterations, the worst particles of a subswarm are replaced by the best ones of the other subswarm. Then the subswarms switch their flight directions

G. Chen and J. Yu, “Two sub-swarms particle swarm optimization algorithm,” Proceeding of International Conference on Natural Computation, Changsha, China, pp. 515-524, 2005

Page 79:

Multi-population cooperative optimization (MCPSO)
– Based on the concept of a master-slave mode
– The swarm population has a master swarm and multiple slave swarms
– Slave swarms explore the search space independently to maintain the diversity of particles
– The master swarm updates via the best particles collected from the slave swarms

B. Niu, Y. Zhu, and X. He, “Multi-population cooperative particle swarm optimization,” Proceeding of European Conference on Artificial Life, Canterbury, UK, pp. 874-883, 2005

Page 80:

Speciation-based PSO (SPSO) is proposed by Parrott and Li
– Notion of species: a group of individuals sharing common attributes according to some similarity metric
– A radius, rs, is measured in Euclidean distance from the center of a species to its boundary
– The center of a species, the species seed, is always the fittest individual in the species
– All particles that fall within the distance rs from the species seed are classified as the same species

D. Parrott and X. Li, "Locating and tracking multiple dynamic optima by a particle swarm model using speciation," IEEE Transactions on Evolutionary Computation, Vol. 10, No. 4, pp. 440-458, 2006
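The species-seed assignment can be sketched as follows (minimization assumed): particles are scanned from fittest to least fit, and each one either joins an existing seed within radius r_s or becomes a new seed.

import numpy as np

def find_species(positions, fitness, r_s):
    order = np.argsort(fitness)        # best (lowest) fitness first
    seeds, species = [], {}
    for i in order:
        for s in seeds:
            if np.linalg.norm(positions[i] - positions[s]) <= r_s:
                species[i] = s         # falls inside an existing species' radius
                break
        else:
            seeds.append(i)            # outside every existing species' radius: new seed
            species[i] = i
    return seeds, species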

Page 81:

PSO Designs Inspired by Other Fields

Emotional Particle Swarm Optimization
– Each particle has two emotions (joy and sadness)
– The emotional state of a particle is determined by comparing its emotional factor with a random value
– If a certain condition is met, the particle is updated using the "joyful" velocity equation; otherwise the "sad" velocity equation is applied
– A psychological model is incorporated in both the "joyful" and "sad" velocity equations

B. Niu, Y.L. Zhu, K.Y. Hu, S.F. Li, X.X. He, “A novel particle swarm optimizer using optimal foraging theory,” Proceedings of Computational Intelligence and Bioinformatics, Part 3, Kunming, China , Vol. 4115, pp. 61-71, 2006

Page 82:

ANT COLONY OPTIMIZATION

Page 83:

Biological Fact

• Ant Colony Optimization (ACO) studies artificial systems that take inspiration from the behavior of real ant colonies and which are used to solve discrete optimization problems.

• Real ants are capable of finding the shortest path from a food source to their nest without using visual cues by exploiting pheromone information.

• While walking, ants deposit pheromone on the ground and follow, in probability, pheromone previously deposited by other ants.

• Also, they are capable of adapting to changes in the environment, for example finding a new shortest path once the old one is no longer feasible due to a new obstacle (dynamic environments).

Page 84:

• Consider the following figure in which ants are moving on a straight line which connects a food source to the nest:

• It is well known that the primary means used by ants to form and maintain the line is a pheromone trail. Ants deposit a certain amount of pheromone while walking, and each ant probabilistically prefers to follow a direction rich in pheromone rather than a poorer one. This elementary behavior of real ants can be used to explain how they can find the shortest path which reconnects a broken line after the sudden appearance of an unexpected obstacle has interrupted the initial path.

Page 85:

• In fact, once the obstacle has appeared, those ants which are just in front of the obstacle cannot continue to follow the pheromone trail and therefore have to choose between turning right or left. In this situation we can expect half the ants to choose to turn right and the other half to turn left. The very same situation can be found on the other side of the obstacle.

• It is interesting to note that those ants which choose, by chance, the shorter path around the obstacle will more rapidly reconstitute the interrupted pheromone trail compared to those which choose the longer path. Hence, the shorter path will receive a higher amount of pheromone per time unit, and this will in turn cause a higher number of ants to choose the shorter path. Due to this positive feedback (autocatalytic) process, very soon all the ants will choose the shorter path.

Page 86:

Page 87:

• The most interesting aspect of this autocatalytic process is that finding the shortest path around the obstacle seems to be an emergent property of the interaction between the obstacle shape and the ants' distributed behavior: although all ants move at approximately the same speed and deposit a pheromone trail at approximately the same rate, it is a fact that it takes longer to contour obstacles on their longer side than on their shorter side, which makes the pheromone trail accumulate more quickly on the shorter side. It is the ants' preference for higher pheromone trail levels which makes this accumulation still quicker on the shorter path.

• [R1] Beckers R., Deneubourg J.L. and S. Goss (1992). Trails and U-turns in the selection of the shortest path by the ant Lasius niger. Journal of theoretical biology, 159, 397-415.

• [R2] Hölldobler B. and E.O. Wilson (1990). The ants. Springer-Verlag, Berlin.

Page 88:

History

• The first ACO system was introduced by Marco Dorigo in his Ph.D. thesis (1992), and was called Ant System (AS).

• AS is the result of research on computational intelligence approaches to combinatorial optimization that Dorigo conducted at Politecnico di Milano in collaboration with Alberto Colorni and Vittorio Maniezzo. AS was initially applied to the travelling salesman problem and to the quadratic assignment problem.

• Since 1995, Dorigo, Gambardella and Stützle have been working on various extended versions of the AS paradigm. Dorigo and Gambardella have proposed Ant Colony System (ACS), while Stützle and Hoos have proposed MAX-MIN Ant System (MMAS).

• These have both been applied to the symmetric and asymmetric traveling salesman problems, with excellent results. Dorigo, Gambardella and Stützle have also proposed new hybrid versions of ant colony optimization with local search. In problems like the quadratic assignment problem and the sequential ordering problem, these ACO algorithms outperform all known algorithms on vast classes of benchmark problems.

Page 89:

Traveling Salesman Problem

• NP-complete
• Given a set of n cities and distances for each pair of cities, have a team of ants search in parallel for a roundtrip of minimal total length visiting each city exactly once.
• n!/(2n) possible tours
• Symmetric TSP: $\delta(r,s) = \delta(s,r)$
• Asymmetric TSP: $\delta(r,s) \neq \delta(s,r)$

Page 90:

Design Principle

• The solution is constructed in an iterative way. New cities are added to a partial solution by exploiting both information gained from past experience and a greedy heuristic.

• Use of "cooperation" among many relatively "simple agents" which communicate indirectly through a "distributed memory" implemented as pheromone deposited on the edges of a graph.

• Let V = {a, b, ...} be a set of cities, A = {(r,s) : r, s ∈ V} be the edge set, and δ(r,s) be a cost measure (e.g., Euclidean distance) associated with edge (r,s).

Page 91:

Ant System (AS)

• In addition to all variables defined in TSP, Ant System uses additional parameters. Each edge (r,s) has a desirability measure τ(r,s), called pheromone, which is updated at run time by artificial ants.

• Each ant generates a complete tour by choosing the cities according to a probabilistic state-transition rule: an ant prefers to move to cities which are connected by short edges with a high amount of pheromone.

• Random-proportional rule:

$p_k(r,s) = \begin{cases} \dfrac{\tau(r,s) \cdot \eta(r,s)^{\beta}}{\sum_{u \in J_k(r)} \tau(r,u) \cdot \eta(r,u)^{\beta}} & \text{if } s \in J_k(r) \\ 0 & \text{otherwise} \end{cases}$

o $p_k(r,s)$ denotes the probability with which ant k in city r chooses to move to city s.
o $J_k(r)$ denotes the set of cities that remain to be visited by ant k positioned on city r (to keep the solution feasible).
o τ is the pheromone.
o η = 1/δ is the inverse of the distance.
o β is the parameter which determines the relative importance of pheromone versus distance (β > 0).

• Favors the choice of edges which are shorter and have a greater amount of pheromone.
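In code, the random-proportional rule is a weighted random choice over the feasible cities; a minimal sketch, assuming tau and eta are indexable as tau[r][s] and eta[r][s]:

import random

def choose_next_city(r, unvisited, tau, eta, beta=2.0):
    # Weight each remaining city by tau(r,s) * eta(r,s)^beta, then sample proportionally
    cities = list(unvisited)
    weights = [tau[r][s] * eta[r][s] ** beta for s in cities]
    return random.choices(cities, weights=weights, k=1)[0]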

Page 92:

• Once all ants have completed their tours, a global pheromone updating rule is applied:

$\tau(r,s) \leftarrow (1-\alpha) \cdot \tau(r,s) + \sum_{k=1}^{m} \Delta\tau_k(r,s)$

where

$\Delta\tau_k(r,s) = \begin{cases} 1/L_k & \text{if } (r,s) \in \text{tour done by ant } k \\ 0 & \text{otherwise} \end{cases}$

o 0 < α < 1 is the pheromone decay parameter
o $L_k$ is the length of the tour performed by ant k
o m is the number of ants

• The pheromone updating rule is intended to allocate a greater amount of pheromone to shorter tours.
• The pheromone placed on the edges plays the role of a distributed long-term memory.
• This memory is not stored locally within the individual ants, but is distributed on the edges of the graph.
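A minimal sketch of the AS global update, assuming tau is a nested mapping and each tour is a list of city indices forming a roundtrip:

def as_global_update(tau, tours, lengths, alpha=0.1):
    # Evaporate pheromone on every edge, then let each ant deposit 1/L_k along its tour
    for r in tau:
        for s in tau[r]:
            tau[r][s] *= (1.0 - alpha)
    for tour, L in zip(tours, lengths):
        for r, s in zip(tour, tour[1:] + tour[:1]):   # close the roundtrip
            tau[r][s] += 1.0 / L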

Page 93:

• Each ant deposits an amount of pheromone on the edges which belong to its tour, in proportion to how short its tour was.

• Edges which belong to many short tours are the edges which receive the greatest amount of pheromone.

• A fraction of the pheromone evaporates on all edges (i.e., edges that are not refreshed become less desirable), and then each ant deposits an amount of pheromone on the edges which belong to its tour in proportion to how short its tour was. The process is then iterated.

• Although Ant System is useful for discovering good or optimal solutions for small TSPs (up to 30 cities), the time required to find such results makes it infeasible or ineffective for larger problems.

Page 94:

Ant Colony System (ACS)

• The state transition rule provides a direct way to balance between exploration of new edges and exploitation of a priori and accumulated knowledge about the problem.
• The global updating rule is applied only to the edges which belong to the best ant's tour.
• While ants construct a solution, a local pheromone updating rule is applied.
• m ants are initially positioned on n cities chosen according to some initialization rule (e.g., randomly).
• Each ant builds a tour (i.e., a feasible solution to the TSP) by repeatedly applying a stochastic greedy rule (i.e., the state transition rule).
• While constructing its tour, an ant also modifies the amount of pheromone on the visited edges by applying the local updating rule.
• Once all ants have terminated their tours, the amount of pheromone on the edges is modified again (by applying the global updating rule).

Page 95:

ACS Pseudo Codes

Initialize
Loop /* at this level each loop is called an iteration */
  Each ant is positioned on a starting node
  Loop /* at this level each loop is called a step */
    Each ant applies a state transition rule to incrementally build a solution, and a local pheromone updating rule
  Until all ants have built a complete solution
  A global pheromone updating rule is applied
Until End-condition

Page 96:

ACS Design Procedure

• State transition rule: an ant positioned on node r chooses the city s to move to by applying the rule

$s = \begin{cases} \arg\max_{u \in J_k(r)} \left\{ \tau(r,u) \cdot \eta(r,u)^{\beta} \right\} & \text{if } q \le q_0 \\ S & \text{otherwise} \end{cases}$

o q is a random number uniformly distributed within [0,1]
o $q_0$ is a parameter between 0 and 1
o S is a random variable selected according to the probability distribution function given earlier (the random-proportional rule)

• Favors transitions toward nodes connected by short edges and with a large amount of pheromone.
• The parameter $q_0$ determines the relative importance of exploitation versus exploration.
• If q ≤ $q_0$, the best edge is chosen (exploitation); otherwise, an edge is chosen according to the random-proportional rule from AS (biased exploration).
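A small sketch combining exploitation and biased exploration, reusing choose_next_city() from the AS snippet earlier:

import random

def acs_next_city(r, unvisited, tau, eta, beta=2.0, q0=0.9):
    if random.random() <= q0:     # exploitation: take the single best edge
        return max(unvisited, key=lambda u: tau[r][u] * eta[r][u] ** beta)
    # biased exploration: fall back to the AS random-proportional rule
    return choose_next_city(r, unvisited, tau, eta, beta)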

Page 97:

• Global updating rule: only the globally best ant (the ant which constructed the shortest tour from the beginning of the trial) is allowed to deposit pheromone:

$\tau(r,s) \leftarrow (1-\alpha) \cdot \tau(r,s) + \alpha \cdot \Delta\tau(r,s)$

where

$\Delta\tau(r,s) = \begin{cases} 1/L_{gb} & \text{if } (r,s) \in \text{global best tour} \\ 0 & \text{otherwise} \end{cases}$

and $L_{gb}$ is the length of the globally best tour from the beginning of the trial.

• Ants search in a neighborhood of the best tour found up to the current iteration of the algorithm.

Page 98:

• Local updating rule: while building a solution (a tour) of the TSP, ants visit edges and change their pheromone level by applying the local updating rule

$\tau(r,s) \leftarrow (1-\rho) \cdot \tau(r,s) + \rho \cdot \Delta\tau(r,s)$

where 0 < ρ < 1 is a design parameter and $\Delta\tau(r,s) = \tau_0$, the initial pheromone level.

• The effect of the local updating rule is to make the desirability of edges change dynamically.
• Every time an ant uses an edge, the edge becomes slightly less desirable (since it loses some of its pheromone); this favors the exploration of edges not yet visited.
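Both ACS updates fit in a few lines; a sketch, with the same nested-mapping tau convention as before and illustrative parameter values:

def acs_local_update(tau, r, s, rho=0.1, tau0=1e-4):
    # Each traversed edge decays toward tau0, becoming slightly less desirable
    tau[r][s] = (1.0 - rho) * tau[r][s] + rho * tau0

def acs_global_update(tau, best_tour, best_len, alpha=0.1):
    # Only the edges of the globally best tour are reinforced
    for r, s in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[r][s] = (1.0 - alpha) * tau[r][s] + alpha / best_len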

Page 99:

Simulation Results

• Five randomly generated 50-city problems
• Ant Colony System (ACS), Simulated Annealing (SA), Elastic Net (EN), Self-Organizing Map (SOM)
• Each was run 25 times to arrive at a statistical average

Problem      ACS    SA     EN     SOM
City Set 1   5.88   5.88   5.98   6.06
City Set 2   6.05   6.01   6.03   6.25
City Set 3   5.58   5.65   5.70   5.83
City Set 4   5.74   5.81   5.86   5.87
City Set 5   6.18   6.33   6.49   6.80

Parameters: β = 2, q0 = 0.9, α = ρ = 0.1

Page 100:

• http://www.ing.unlp.edu.ar/cetad/mos/TSPBIB_home.html
• Smaller geometric problems:

Problem                     ACS              GA                SA              EP                Optimum
Eil50 (50-city problem)     425 [1,830]      428 [25,000]      443 [68,512]    426 [100,000]     425
Eil75 (75-city problem)     535 [3,480]      545 [80,000]      580 [173,250]   542 [325,000]     535
KroA100 (100-city problem)  21,282 [4,820]   21,761 [103,000]  N/A [N/A]       N/A [N/A]         21,282

Page 101:

• http://www.ing.unlp.edu.ar/cetad/mos/TSPBIB_home.html
• Larger geometric problems

Page 102:

Possible Improvements

• Enhance the local search procedure
• Exploration vs. exploitation dilemma
• Population size (ant count m = 10 is a good heuristic choice)
• Parameterization of the parameters β, η
• Computational complexity
• Extend to multi-agent systems

Page 103:

ACO Repository

http://iridia.ulb.ac.be/~mdorigo/ACO/ACO.html

Page 104:

ACO Conference

Watch out for ANTS coming in 2010!!!

Page 105:

ACO Demonstration