SC741 – W6: An Introduction to Machine-Learning and Modeling Techniques for Swarm-Intelligent, Embedded Systems
Outline
• Machine-Learning Techniques
– Overview and definitions
– Genetic Algorithms
– In-line learning
– From single-unit to multiple-unit systems
• Probabilistic Modeling Techniques
– Overview and definitions
– Microscopic modeling
– Macroscopic modeling
• Combining Modeling and Learning Techniques
An Introduction to Machine-Learning Techniques
Why Machine-Learning?
• Complementarity to model-based/engineering approaches: when low-level details matter (optimization) and/or good solutions do not exist (design)!
• When the design/optimization space is too big (infinite) or too computationally expensive to be searched systematically
• Automatic design and optimization techniques
• Role of the engineer reduced to specifying performance requirements and problem encoding
Why Machine-Learning?
• Design and optimization techniques robust to noise, nonlinearities, and discontinuities
• Individual real-time adaptation to new environmental conditions, i.e., increased individual flexibility when environmental conditions are not known or cannot be predicted a priori
• Search space: parameters and/or rules
ML Techniques: Classification
– Supervised techniques: a "trainer/teacher" is available.
• Ex.: a set of input-output examples is provided to the system; the error between the output of the system and the teacher's correct answer for the same input is calculated and fed back to the system using a machine-learning algorithm so that the error is reduced over trials
• The system's generalization capabilities after training are tested on examples not presented to the system before (i.e., not in the training set)
– Unsupervised techniques: "trial-and-error", "evaluative" techniques; no teacher available.
• The system judges its performance according to a given metric to be optimized
• The metric does not refer to any specific input-to-output mapping
• The system tries out possible design solutions, makes mistakes, and tries to learn from its mistakes
• The number of possible examples is very large, possibly infinite, and not known a priori
ML Techniques: Classification
– Off-line: in simulation; the learned/evolved solution is downloaded onto real hardware when certain criteria are met
– Hybrid: most of the time in simulation (e.g., 90%), last period (e.g., 10%) of the process on real hardware
– On-line: from the beginning on real hardware (no simulation); more or less rapid depending on the algorithm
ML Techniques: Classification
ML algorithms sometimes require fairly substantial computational resources (in particular multi-agent search algorithms); therefore a further classification is:
– On-board: the machine-learning algorithm runs on the system to be learned or evolved (no external unit)
– Off-board: the machine-learning algorithm runs off-board and the system to be learned or evolved just serves as the phenotypical, embodied implementation of a candidate solution
Selected Unsupervised ML Techniques Robust to Noisy Fitness/Reinforcement Functions
• Evolutionary computation
– Genetic Algorithms (GA) – today
– Genetic Programming (GP)
– Evolutionary Strategies (ES)
– Particle Swarm Optimization (PSO) – W9
• Learning
– In-Line Adaptive Learning – today
– Reinforcement Learning – W11
GA: Coding & Decoding
phenotype → (coding) → genotype (chromosome) → (decoding) → phenotype
• phenotype: usually consists of real numbers
• chromosome = string of genotypical segments, i.e., genes G1 G2 G3 G4 … Gn, where Gi = gene = binary or real number
Coding: real to real, or binary to real via Gray code (minimizes nonlinear jumps between phenotype and genotype); see the sketch below
Decoding: inverse operation
Rem:
• Artificial evolution: usually one-to-one mapping between phenotypic and genotypic space
• Natural evolution: 1 gene codes for several functions, 1 function coded by several genes.
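The following is a minimal Python sketch of the binary-to-real coding/decoding via Gray code mentioned above; the bit width and interval bounds are illustrative assumptions, not values used in the course.

import math

def gray_encode(n: int) -> int:
    # binary integer -> Gray-coded integer
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    # Gray-coded integer -> binary integer
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def decode_gene(bits: int, n_bits: int, lo: float, hi: float) -> float:
    # map an n_bits Gray-coded gene onto the real interval [lo, hi]
    value = gray_decode(bits)
    return lo + (hi - lo) * value / (2 ** n_bits - 1)

# adjacent binary values differ by a single Gray-coded bit, so a one-bit
# mutation moves the phenotype by at most one quantization step:
assert bin(gray_encode(7) ^ gray_encode(8)).count("1") == 1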
GA: Population & Fitness
• Population: set of solutions tested in one generation; consists of m genetic individuals (e.g., m = 100), each one provided with its own chromosome.
• Generation: new population after the genetic operators have been applied (n = # of generations, e.g., 50, 100, 1000).
• Evaluation span: evaluation period of each genetic individual during one generation.
• Life span: number of generations an individual survives.
• Fitness function: measurement of each genetic individual's efficacy during its life span.
• Population manager: applies the genetic operators to generate the individuals of the new generation from the current one, at the end of the life span of the whole population.
GA: Basic Operators
• Selection: roulette wheel, ranked selection, elitist selection
• Crossover: 1 point, 2 points (e.g., p_crossover = 0.2)
• Mutation: Gk → Gk' (e.g., p_mutation = 0.05)
Note: examples for fixed-length chromosomes!
Generation Loop
Population manager: selection → crossover and mutation → population replenishing.
System: decoding (genotype → phenotype) → evaluation of individual candidate solutions → fitness measurement.
Evolutionary Loop: Several Generations
Start → initialize population → generation loop → end criterion met? If no, repeat the generation loop; if yes, end.
Ex. of end criteria:
• # of generations
• best solution performance
• …
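Below is a hedged Python sketch of the generation and evolutionary loops just described (roulette-wheel selection, 1-point crossover, per-gene mutation on fixed-length real-valued chromosomes). The fitness function and parameter values are placeholders; in the course's setting the evaluation is done on the embodied system.

import random

M, N_GENES, N_GENERATIONS = 100, 9, 50       # m individuals, genes per chromosome, n generations
P_CROSSOVER, P_MUTATION = 0.2, 0.05

def fitness(chromosome):                      # placeholder fitness: replace with the
    return -sum(g * g for g in chromosome)    # evaluation of the candidate solution

def roulette(population, scores):
    offset = min(scores)                      # shift so all weights are positive
    weights = [s - offset + 1e-9 for s in scores]
    return random.choices(population, weights=weights, k=1)[0]

population = [[random.uniform(-1, 1) for _ in range(N_GENES)] for _ in range(M)]
for generation in range(N_GENERATIONS):
    scores = [fitness(ind) for ind in population]     # evaluation span
    new_population = []
    while len(new_population) < M:                    # population replenishing
        p1, p2 = roulette(population, scores), roulette(population, scores)
        child = list(p1)
        if random.random() < P_CROSSOVER:             # 1-point crossover
            cut = random.randrange(1, N_GENES)
            child = p1[:cut] + p2[cut:]
        child = [g + random.gauss(0, 0.1) if random.random() < P_MUTATION else g
                 for g in child]                      # mutation
        new_population.append(child)
    population = new_population
print(max(fitness(ind) for ind in population))        # best solution performance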
Examples of GA Applications
• Mechanical design
• Single robot experiments
An Example of Mechanical Design using GA
Khepera external powering module
• External power in each position and orientation
• Power floor + redesigned basic module endowed with electrical contacts
Powering Module: Problem Encoding
• Mixed approach: engineering intuition and GA
• Engineer: bands, minimal number of contacts (≥ 4), to limit the search space
• Genotype (9 reals): encoding of w_cond and the (r, ϕ) coordinates of each contact
• Fitness: fraction of positions (translation and rotation) powered
• Simulated mechanical vibrations (Gaussian noise) of the contacts for evolving robust solutions
• m = 100, n = 50, p_c = 0.2, p_m = 0.01
• The evolved solution is symmetric; it can be demonstrated geometrically, a posteriori, that it is optimal
Evolving a Neural Controller
[Diagram: 8 proximity sensor inputs S1–S8 (Ij) feeding 2 motor neurons M1, M2 through excitatory/inhibitory connections with synaptic weights wij; each neuron Ni has a sigmoid transfer function f(x).]

x_i = Σ_{j=1}^{m} w_ij I_j + I_0,   O_i = f(x_i),   f(x) = 2 / (1 + e^{-x}) − 1
Note: In our case we evolve the synaptic weights, but Hebbian rules for dynamic change of the weights, transfer-function parameters, etc. can also be evolved (see Floreano course).
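A minimal Python sketch of the controller's forward pass under the equations above; the 8-sensor/2-motor layout and the random weights are illustrative assumptions.

import math, random

def f(x):                                   # sigmoid transfer function mapped to (-1, 1)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def motor_outputs(sensors, weights, bias):
    # sensors: I_1..I_m; weights[i][j] = w_ij of motor neuron i; bias[i]: the I_0 term
    return [f(sum(w * s for w, s in zip(weights[i], sensors)) + bias[i])
            for i in range(len(weights))]

sensors = [random.random() for _ in range(8)]                             # S1..S8 readings
weights = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(2)]   # evolved genome
bias = [0.0, 0.0]
left, right = motor_outputs(sensors, weights, bias)                       # M1, M2 commands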
Evolving Obstacle Avoidance (Floreano and Mondada 1996)
Defining performance (fitness function):

Φ = V (1 − Δv) (1 − i)

• V = mean speed of the wheels, 0 ≤ V ≤ 1
• Δv = absolute value of the algebraic difference between the wheel speeds, 0 ≤ Δv ≤ 1
• i = activation value of the sensor with the highest activity, 0 ≤ i ≤ 1
Note: fitness is accumulated during the evaluation span and normalized over the number of control loops (actions).
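A small Python sketch of how this fitness could be accumulated over an evaluation span; the normalization of wheel speeds and sensor activations to [0, 1] is assumed to be done by the caller.

def fitness_term(v_left, v_right, proximity):
    # v_left, v_right in [0, 1]; proximity = per-sensor activations in [0, 1]
    V = (v_left + v_right) / 2.0          # mean wheel speed
    dv = abs(v_left - v_right)            # absolute wheel-speed difference
    i = max(proximity)                    # most active proximity sensor
    return V * (1.0 - dv) * (1.0 - i)

def evaluate(control_loop, n_steps):
    # control_loop() -> (v_left, v_right, proximity) at each control step
    total = sum(fitness_term(*control_loop()) for _ in range(n_steps))
    return total / n_steps                # normalized over the number of control loops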
Evolving Robot Controllers
Note: the controller architecture can be of any type, but a GA is worth using when the number of parameters to be tuned is large.
Evolving Obstacle Avoidance
[Figures: evolved path; fitness evolution over generations.]
Evolved Obstacle Avoidance Behavior
• Generation 100
• Direction of motion NOT encoded in the fitness function: the GA automatically discovers the asymmetry in the sensory system configuration (6 proximity sensors in the front and 2 in the back)
Evolving Homing Behavior (Floreano and Mondada 1996)
[Figure: set-up and robot's sensors.]
• Fitness function: Φ = V (1 − i)
• V = mean speed of the wheels, 0 ≤ V ≤ 1
• i = activation value of the sensor with the highest activity, 0 ≤ i ≤ 1

Evolving Homing Behavior – Controller
• Fitness accumulated during the life span, normalized over the maximal number (150) of control loops (actions).
• No explicit expression of battery level/duration in the fitness function (implicit).
• Chromosome length: 102 parameters (real-to-real encoding).
• Generations: 240; 10 days of embedded evolution on the Khepera.
Evolving Homing Behavior
[Figures: fitness evolution; evolution of # control loops per evaluation span; battery recharging vs. motion patterns (battery energy, left and right wheel activations).]
Evolved behavior: reach the nest → battery recharging → turn on the spot → out of the nest.
Evolved Homing Behavior
Moving Beyond Controller-Only Evolution
• Evidence: nature evolves hardware and software at the same time …
• Faithful, realistic simulators make it possible to explore design solutions which encompass off-line co-evolution (co-design) of control and morphological characteristics (body shape, number of sensors, placement of sensors, etc.)
• GAs are powerful enough for this job and the methodology remains the same; only the encoding changes
Evolving Sensory Systems (Zhang, Antonsson, Martinoli 2002)
• Realistic simulator: Webots
• Evolution of:
– number of sensors
– cone of view
– range
– sensor placement
• Sensory system cost proportional to # of sensors, range, cone of view
• Fitness: maximal coverage at minimal cost
Dealing with Variable-Length Chromosomes
[Figure: crossover of variable-length chromosomes — parents and children, with the zero line and crossover line marked.]
• Evolution of the number of sensors -> variable-length chromosome
• Additional operators: insertion, deletion
• Crossover: extremely disruptive operator
• Idea: take advantage of the problem's genotype-to-phenotype mapping
• In our case: convex, closed structure on the vehicle -> fold the chromosome in the same way -> mapping maintained!
Evolving Control and Robot Morphology (Lipson and Pollack, 2000)
http://www.mae.cornell.edu/ccsl/research/golem/index.html
• Arbitrary recurrent ANN
• Passive and active (linear actuators) links
• Fitness function: net distance traveled by the centre of mass in a fixed duration
Example of evolutionary sequence:

Examples of Evolved Machines
Problem: the simulator is not realistic enough (performance is higher in simulation because friction is not simulated well enough; e.g., for the arrow configuration, 59.6 cm vs. 22.5 cm)
In-Line Adaptive Learning

Big Artillery such as a GA Is Not Always the Most Appropriate Solution…
• Simple individual learning rules combined with collective flexibility can achieve extremely interesting results
• Simplicity and low computational cost mean they can be embedded on simple, real individuals
• We will see similar learning rules in combination with threshold-based algorithms, foraging strategies, …

In-Line Adaptive Learning (Li, Martinoli, Abu-Mostafa, 2001)
Key ideas:
• Origin: adaptive bang-bang control
• 2-step algorithm
• Learning step adapted according to fixed rules
• Randomness when stuck
• Can be combined with a low-pass filter for noise reduction
[Diagram: GTP = parameter to be learned; ∆d = learning step; d = direction; performance signal.]
In-Line Adaptive Learning
Differences with gradient descent methods:
• Fixed rules for calculating the step increase/decrease -> limited descent speed -> no gradient computation -> more conservative but more stable
• Randomness for getting out of local minima (no momentum)
• Underlying low-pass filter is part of the algorithm
Differences with Reinforcement Learning:
• No learning history considered (only the previous step)
Differences with basic In-Line Learning:
• Adaptive step -> faster and more stable at convergence
Sample Results
Test-bed: multi-agent system, 1 parameter (GTP)
[Plots: stick-pulling rate (1/min) vs. initial gripping time parameter (0–600 s) for 2–6 robots; learned solutions (mean + std dev) compared with systematic search (mean only), for a high filter cut-off frequency (short averaging window) and a low filter cut-off frequency (long averaging window).]
Moving from Single-Unit to Multiple-Unit Systems

Evolving Controllers
• Collective: fitness becomes noisy due to partial perception and independent parallel actions

Credit Assignment Problem
With limited communication, no communication at all, or partial perception:
Example: Evolving the Sensory System of one Vehicle within a Traffic System
(Zhang, Antonsson, Martinoli 2002)
Fitness Function

f = max{ (1/V) Σ_{i=1}^{V} δ_i − a·n ; 0 }

a = cost factor for sensors
n = number of sensors used
δi = 0 if car not detected; 1 if car detected
V = actual number of vehicles in the detection region during a whole evaluation span
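A tiny Python sketch of this fitness, assuming the reconstruction f = max{(1/V)·Σδi − a·n, 0}: the fraction of vehicles detected during the evaluation span, penalized by a per-sensor cost a·n.

def sensory_fitness(detected_flags, n_sensors, a):
    # detected_flags: list of 0/1 flags, one per vehicle that entered the detection region
    V = len(detected_flags)
    if V == 0:
        return 0.0
    coverage = sum(detected_flags) / V        # fraction of vehicles detected
    return max(coverage - a * n_sensors, 0.0)

print(sensory_fitness([1, 1, 0, 1], n_sensors=3, a=0.02))   # 0.75 - 0.06 = 0.69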
Ad Hoc Evolutionary Loop
• Goals:
– Evaluating the actual fitness
– Eliminating lucky individuals as quickly as possible
• Solution:
– Dynamic population size
– Individual life span = several evaluation spans over generations
Co-Learning and Co-Evolution

Co-Learning or Co-Evolving Competitive Behavior
No credit assignment problem: individual fitness!
Example: co-evolution of the prey's and predator's controllers
Φ_prey = T    Φ_predator = 1 − T
T = normalized survival time of the prey (before it is caught)

Prey-Predator Experiment (Nolfi and Floreano, 1998)

Prey-Predator Experiment
[Figures: fitness co-evolution in simulation; fitness co-evolution in real robots.]
Co-Learning or Co-Evolving Collaborative Behavior
Key question: what metrics (group, individual, combination) should we measure for improving the individual controllers?

Ex.: Co-Evolving Collaborative Behavior
Three orthogonal axes to consider (extreme or balanced solutions are possible):
• Individual vs. group fitness
• Private (no sharing of parameters) vs. public (parameter sharing) policies
• Homogeneous vs. heterogeneous systems
An Introduction to Probabilistic Modeling Techniques
Why Modelling?
• Current target platform: self-organized robotic systems
• Individual robot control is deterministic, BUT mobility + robot-to-environment and robot-to-robot interactions -> overall individual behavior is highly stochastic
• However, we do not care about the microscopic interactions (part of the self-organization process) as long as the swarm behavior/performance is statistically predictable -> the only way to get swarm-based (industrial) applications!
• This way of proceeding is revolutionary in engineering (while traditional in science – biology, physics, chemistry). E.g., classical roboticists tend to eliminate all forms of inaccuracy and limitation in their initial models and then relax the original assumptions in order to come closer to reality. In our models, we consider noise and inaccuracies from the beginning (even more important with miniaturization), very much as biologists, physicists, and chemists do.
Why Modelling?
• Understanding the role of system parameters in the team performance: control, interaction geometry, body morphology, sensors, communication range, number of individuals
• Generalizing: other hardware platforms, other experimental constraints, other classes of experiments
• Prediction and fast evaluation -> design! Before developing ad hoc embodied simulators or real embedded systems (10^4–10^5 times faster than embodied simulations)
[Diagram: probabilistic model vs. embodied simulator vs. real robots.]
An Incremental Probabilistic Modeling Methodology for Nonspatial Metrics

Experimental Levels (from abstracted interactions / swarm behavior to detailed interactions / individual behavior)
• Simulation
– Macroscopic models
– Microscopic models
– Non-embodied simulations
– Embodied, sensor-based simulations
• Real robots
– Actively controlled lab environments
– Passively controlled lab environments
– Real environments
Methodology Overview
A priori known features of the robots and the environment, and the desired swarm performance metric(s) → characterization of the microscopic interactions (robot-to-robot, robot-to-environment) → prediction of the swarm performance / macroscopic system dynamics.
Modeling Assumptions
• Nonspatial metrics for swarm performance
• Environment and controller can be described as Probabilistic FSMs; the state granularity of the description is arbitrarily established by the engineer as a function of the microscopic or macroscopic level of description.
• Both robots and environment are semi-Markovian systems: the system's future state is a function of the current state and of how much time has been spent in this state.
• The mean spatial distribution of the robots is homogeneous, as if they were randomly hopping around the arena (trajectories not considered).
Experimental Validation of Nonspatial Assumption – (1)
[Figures: non-embodied obstacles = detection surfaces; effect of obstacle position and shape.]
Numerical example (mean ± std dev, 3 locations, 100 h simulated time): encountering probabilities across the obstacle geometries (square, rectangular, round, all shapes combined) and another robot are statistically indistinguishable; measured values: 0.31, 0.31 ± 0.03, 0.32 ± 0.02, 0.3 ± 0.03, 0.31 ± 0.04.

Experimental Validation of Nonspatial Assumptions – (2)
[Figure: symmetry of the stick distribution (# sticks per location) vs. default.]
Modeling Methodology: Microscopic Characterization
[Diagram: deterministic robot's flowchart (Start → Search → Obstacle? Y/N → Avoidance) abstracted into a deterministic agent's flowchart and then into a PFSM (Markov chain) with states Ss (search) and Sa (avoidance), transition probability pa, staying probability 1 − pa, and avoidance delay τa.]
Microscopic Modeling Methodology
[Diagram: the robotic system is represented by N PFSMs (N = total # of agents), grouped into castes 1..n (robots R11..R1l, …, Rn1..Rnm, each with states such as Ss, Sa or Se, Sd, Si); the environment is represented by Q PFSMs (Q = total # of objects O11..Oqr, each with states such as Sa, Sb); robot and environment PFSMs are coupled (e.g., through manipulation and sensing).]
Macroscopic Modeling Methodology
[Diagram: one PFSM per robotic caste (caste 1..n) and one PFSM per environment object type (type 1..q), coupled as in the microscopic model.]
• Average quantities
• Central tendency prediction (1 run)
• Continuous quantities: +1 DE (difference equation) per state for all robotic castes and object types (metric/task dependent!)
• −1 DE if substituted with conservation equations (e.g., total # of robots, total # of objects of type q, …)
Model Calibration
• Goal: a calibration method for achieving zero-free-parameter models!
• Two types of parameters:
– Delays
– Encountering probabilities
• Calibration procedure:
– Systematic experiments with 1 real or realistically simulated robot (robot-to-environment interaction)
– Systematic experiments with 2 real or realistically simulated robots (robot-to-robot interaction)
Time Discretization: The Engineering Recipe
Time-discrete vs. time-continuous models:
1. Assess what time resolution is needed for your swarm performance metrics
2. Consider the essence of the unit-to-unit interactions: probabilistic/deterministic, role of asynchronicity, …
3. Choose, whenever possible, the most computationally efficient model: time-discrete is less computationally expensive than emulation of continuity (e.g., Runge-Kutta, etc.), and the prediction accuracy may not be statistically different from continuous models because of other (more important) sources of inaccuracy in the system
Often there is not even a tradeoff: just use the appropriate instrument for the appropriate problem!
Delays
1. Measure all delays of interest in your system, i.e., those which might influence the swarm performance metrics. Note: often "delay states" can simply summarize all you need without getting into the details of what is going on within the state
2. Consider only average values (we might also consider parameter distributions in the future; the modeling methodology does not prevent doing so)
3. For time-discrete systems: choose the time step T as the greatest common factor of all the delays measured (e.g., 3 s obstacle avoidance, 4 s object manipulation, T = 1 s) -> no rounding error; see the sketch below
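A tiny Python sketch of rule 3: T is the greatest common factor of the measured delays, so every delay corresponds to an integer number of iterations.

from math import gcd
from functools import reduce

delays_s = [3, 4]                      # measured delays, in seconds (slide example)
T = reduce(gcd, delays_s)              # T = 1 s -> no rounding error
steps = {d: d // T for d in delays_s}  # {3: 3, 4: 4} iterations per delay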
Encountering Probabilities
• Idea: trajectories and object locations do not count (see assumption) -> 1D Monte Carlo simulation
[Diagram: mapping of the 2D physical space (objects O1, O2 and free space) onto a 1D probability space.]
Encountering Probabilities
1. Measure the geometric probabilities of encountering (detection areas)
2. Calculate the encountering rate ri [s⁻¹] for object i from the geometric probability gi:

ri = gi · Ws · v / As

As = detection area of the smallest object
v = mean robot speed
Ws = robot's detection width for the smallest object (center-to-center)

3. For time-discrete models, calculate the encountering probability pi (per time step) from the encountering rate:

pi = ri · T

NOTE: not exactly the same as in IJRR04 and AR04!
Geometric Probabilities (Detection Areas)
ps, pw, … are functions of the sensor range, the behavior, the robot's and object's size, …: interaction characterization!
Example: stick with detection area As: ps = As/Aarena
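A short Python sketch chaining steps 1–3 with the definitions above; the numerical values (arena radius, detection area and width, speed) are illustrative assumptions only.

import math

A_arena = math.pi * 0.40 ** 2      # arena surface [m^2] (40 cm radius, example value)
A_s     = 0.004                    # detection area of the smallest object [m^2] (assumed)
W_s     = 0.06                     # detection width for the smallest object [m] (assumed)
v       = 0.08                     # mean robot speed [m/s] (assumed)
T       = 1.0                      # time step [s]

g_stick = A_s / A_arena            # step 1: geometric probability of encountering a stick
r_stick = g_stick * W_s * v / A_s  # step 2: encountering rate [1/s]
p_stick = r_stick * T              # step 3: encountering probability per time step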
Example 1: Linear Model – Wandering and Obstacle Avoidance

A Simple Linear Model
Example: search (moving forwards) and obstacle avoidance
© Gilles Caprari, 1999 © Nikolaus Correll, 2004
A Simple Linear Model
[PFSM: Search ↔ Avoidance, with transition probability pa and avoidance duration Ta.]
Linear DEs!

Ns(k+1) = Ns(k) − pa Ns(k) + pa Ns(k−Ta)
Na(k+1) = N0 − Ns(k+1)

Ta = obstacle avoidance duration
pa = probability of encountering an obstacle
Ns = average # of robots in search
Na = average # of robots in obstacle avoidance
N0 = # of robots used in the experiment
k = 0, 1, … (iteration index)
Initial conditions: Ns(k) = Na(k) = 0 for all k < 0; Ns(0) = N0; Na(0) = 0
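A minimal Python sketch iterating these two equations; the parameter values are illustrative. With the given initial conditions, Ns converges toward N0 / (1 + pa·Ta).

N0, p_a, T_a, K = 10, 0.05, 3, 200       # robots, encounter probability, delay, steps (assumed)
Ns = [0.0] * K
Ns[0] = float(N0)                        # Ns(k) = 0 for k < 0, Ns(0) = N0
for k in range(K - 1):
    delayed = Ns[k - T_a] if k - T_a >= 0 else 0.0
    Ns[k + 1] = Ns[k] - p_a * Ns[k] + p_a * delayed
Na = [N0 - n for n in Ns]                # conservation: robots in avoidance
print(Ns[-1], Na[-1])                    # steady state: Ns ≈ N0 / (1 + p_a * T_a)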
A Simple Linear Model
[Plots: group size; micro-to-macro comparison (same robot density, but the wall surface becomes relatively smaller with bigger arenas); typical parameter analysis at steady state.]
Example 2: Nonlinear Model – Collaborative Stick-Pulling

Set-Up
Physical set-up: collaboration via indirect communication
[Photos: proximity sensors, arm elevation sensor, IR reflective band.]
• 2–6 robots
• 4 sticks
• 40 cm radius arena
Systematic Experiments
Real robots and embodied simulation
• [Martinoli and Mondada, ISER, 1995]
• [Ijspeert et al., AR, 2001]

Systematic Experiments' Results
• Real robots (3 runs) and embodied simulations (10 runs)
• System bifurcation as a function of #robots/#sticks: Nrobots > Nsticks vs. Nrobots ≤ Nsticks
Modeling Methodology: Microscopic Characterization
[Diagram: deterministic robot's flowchart → probabilistic agent's flowchart → Markov chain (PFSM).]
Geometric Probabilities

ps = As/Aa
pr = Ar/Aa · (N0 − 1)
pw = Aw/Aa
pg2 = Rg · pg1

Aa = surface of the whole arena
Simplified Macroscopic Model
Ti, Ta, Td, Tc << Tg → Ti = Ta = Td = Tc = 0
Simplified Macroscopic Model
Nonlinear DEs, coupled through the unit-to-unit interaction (in this case through the stick)!
[PFSM: Search ↔ Grip.]

Ns(k+1) = Ns(k) − pg1[M0 − Ng(k)]Ns(k) + pg2 Ng(k) Ns(k)  (successful)  + pg1[M0 − Ng(k−Tg)] Γ(k;0) Ns(k−Tg)  (unsuccessful)
Ng(k+1) = N0 − Ns(k+1)
C(k) = pg2 Ns(k) Ng(k)
Ct = (1/Te) Σ_{k=0}^{Te} C(k)

Ns = average # of robots in searching mode
Ng = average # of robots in gripping mode
N0 = # of robots used in the experiment
M0 = # of sticks used in the experiment
Γ = fraction of robots that abandon pulling
Te = maximal number of iterations
k = 0, 1, … Te (iteration index)
C = # of collaborations
Ct = mean collaboration rate over Te
Simplified Macroscopic Model
Fraction of robots which abandon pulling:

Γ(k; TSL) = Π_{j = k−Tg+TSL}^{k−TSL} [1 − pg2 Ns(j)]

with TSL = shift-left duration [iterations] (= 0 in this model)

Initial conditions and causality:
Ns(0) = N0, Ng(0) = 0
Ns(k) = Ng(k) = 0 for all k < 0
Results – Simplified Model
• Microscopic (100 runs) and macroscopic models overlap
• Only qualitative agreement with embodied/real-robot results
• Discrepancies micro-macro: small numbers and combinatorial issues
• 16 robots, 16 sticks, Ra = 80 cm; 4 robots, 4 sticks, Ra = 40 cm
Steady-State Analysis
• Steady-state analysis → Tg^opt can be computed analytically, in closed form, as a function of pg1, N0, Rg, and β, valid for β ≤ βc = 1 + 2/Rg
with β = N0/M0 = robots-to-sticks ratio
• No Tg^opt exists for β > βc
• Depending on Rg, 1 < β < βc is possible, i.e., an optimal Tg also exists with more robots than sticks
• [Lerman et al., Alife J., 2001], [Martinoli et al., IJRR, 2004]
Steady-State Analysis
Example: R̃g = Rg/10 (collaboration very difficult); 20 robots and 16 sticks (optimal Tg exists)
Full Model
For instance, for the average number of robots in searching mode:

Ns(k+1) = Ns(k) − [∆g1(k) + ∆g2(k) + pw + pR Ns(k)] Ns(k) + (delayed terms returning robots to search after wall avoidance, robot avoidance, successful collaboration, and unsuccessful gripping)

with time-varying coefficients (nonlinear coupling):

∆g1(k) = pg1 [M0 − Ng(k) − Nd(k)]
∆g2(k) = pg2 Ng(k)
Γ(k; TSL) = Π_{j = k−Tg+TSL}^{k−TSL} [1 − pg2 Ns(j)]

• 6 states: 5 DEs + 1 conservation equation
• Ti, Ta, Td, Tc ≠ 0; Txyz = Tx + Ty + Tz
• TSL = shift-left duration
• [Martinoli et al., IJRR, 2004]
Results (Standard Arena)
Webots (10 runs), microscopic (100 runs), macroscopic model (1 run)
Results: 4 x #Sticks, #Robots and Arena Area
Webots (10 runs), microscopic (100 runs), macroscopic model (1 run)
Example 3: Nonlinear Model – Collective Aggregation

To be Discussed in Week 10: Aggregation Process
Physical set-up:
• 1-10 robots
• 20-40 seeds
• 80x80 cm and 114x114 cm arenas
• With and without activity regulation
Combining Modeling and Machine-Learning Techniques

Key Idea
• Microscopic modeling (or, in general, any form of abstraction) allows us to consider certain parameters and leave out others
• Models act as a "parameter filter"; they allow for fast simulation and enhanced understanding
• Macroscopic models do not allow the study of specialization unless precise castes can be identified a priori (caste = set of homogeneous individuals)
• In this case study we identified a parameter crucial for our metrics and wanted to understand the best distribution of parameter values over the different individuals (learning as an automatic way of discovering diversity/specialization)
In-Line Adaptive Learning (Li, Martinoli, Abu-Mostafa, 2001)
• GTP: Gripping Time Parameter (parameter to be learned)
• ∆d: learning step
• d: direction
• Performance/reinforcement signal: collaboration rate, i.e., # of sticks pulled per minute
Co-Learning Collaborative Behavior
Three orthogonal axes to consider (extreme or balanced solutions are possible):
1. Individual vs. group fitness
2. Private (no sharing of parameters) vs. public (parameter sharing) policies
3. Homogeneous vs. heterogeneous systems
Results #1 – Public & Group & Homogeneous Policy
[Plots: stick-pulling rate (1/min) vs. initial gripping time parameter (0–600 s) for 2–6 robots; learned (mean + std dev) vs. systematic (mean only), for a high filter cut-off frequency (short averaging window) and a low filter cut-off frequency (long averaging window).]
Note: 1 parameter for the whole swarm!
Diversity and Specialization
Key question: does team diversity enhance performance? I.e., can individuals become specialized?
[Figures: performance ratio between a 2-caste and a homogeneous system (microscopic model + Webots, systematic); 4 robots, one per color, microscopic model + learning.]
Results #1, 2, 3 – All Three Learning Policies and Systematic
[Plot: systematic search vs. learning with the three policies — public/group/homogeneous, private/group/heterogeneous, private/individual/heterogeneous — showing the importance of specialization.]
• [Li, Martinoli, Abu-Mostafa, 2002] Note: high filter cut-off frequency!
Impact of Diversity on Performance (Li, Martinoli, Abu-Mostafa, 2004)
[Plot: stick-pulling rate ratio (≈1.0–1.3) vs. number of robots (2–6) for 2-caste/Global, Heterogeneous/Global, and Heterogeneous/Local policies.]
Performance ratio between heterogeneous (full and 2-caste) and homogeneous groups AFTER learning
Measuring Diversity and Specialization (Li, Martinoli, Abu-Mostafa, 2004)
[Plots: diversity (0–6) and specialization (0–2.5) vs. number of robots (2–6) for 2-caste/Global, Heterogeneous/Global, and Heterogeneous/Local policies.]
Diversity: entropy measurement
Specialization: diversity in order to fit the task
Additional Literature – Week 6
Books
• Ross S. M., Introduction to Probability Models, Academic Press, San Diego, CA, USA, 1997.
Papers
• Ijspeert A. J., Martinoli A., Billard A., and Gambardella L. M., "Collaboration through the Exploitation of Local Interactions in Autonomous Collective Robotics: The Stick Pulling Experiment". Autonomous Robots, 11(2):149–171, 2001.
• Lerman K., Martinoli A., and Galstyan A., "A Review of Probabilistic Macroscopic Models for Swarm Robotic Systems". Proc. of the Workshop on Swarm Robotics, 8th Int. Conference on the Simulation of Adaptive Behavior, Los Angeles, July 2004. Lecture Notes in Computer Science (2004), Sahin E. and Spears W., editors, to appear.
• Lerman K. and Galstyan A., "Mathematical Model of Foraging in a Group of Robots: Effect of Interference". Autonomous Robots, 13(2):127–141, 2002.