Distributed Intelligent Systems – W6: A Multi-Level Modeling Methodology for Swarm Robotics
Outline
• Multi-Level Modeling Methodology
  – Rationale
  – Assumptions, implementation
• Model Calibration
  – Methods
  – Open issues
• Examples
  – Linear (obstacle avoidance)
  – Nonlinear (stick-pulling)
• Combined modeling and machine-learning methods
• Diversity and specialization
Modeling Rationale and Methodology Overview
A First Achievement using Real Robots (Beckers, Holland, and Deneubourg, 1994)
Swarm Robotics: A new Engineering Discipline?
• Why does it work?
• What are the principles?
• Is it a new paradigm or just an isolated experiment?
• If yes, can we precisely define it?
• Can we generalize these results to other tasks and experimental scenarios?
• How can we design an efficient and robust SR system? Methods?
• How can we optimize an SR system?
All these questions suggest that formal methods and therefore modeling could help!
Rationale for Probabilistic Modelling – Distributed Intelligent Systems
• Even if the control of an individual node is deterministic, its interaction with the environment and teammates in the real world is noisy and only statistically predictable → probabilistic models
• Some distributed intelligent systems might be self-organized: among the key ingredients of self-organization (see week 1) is randomness, which is at the core of the exploration-exploitation balance of these systems. Coordinated collective behavior based on self-organization is statistically predictable using appropriate probabilistic models!
Rationale for Probabilistic Modelling – Swarm Robotics
• Swarm-robotic systems represent a specific class of distributed intelligent systems: artificial, self-locomoted, scalable, etc.
• Swarm-robotic systems often consist of miniature/small-scale nodes. The smaller the units are, the more they might be affected by noise (crude sensors/actuators/communication devices; limited computation): miniaturization increases the need for probabilistic modeling!
• It is not completely clear how far the methodology proposed here generalizes to non-robotic platforms and to other classes of experiments which do not fulfill its assumptions (on-going work)
Rationale for Multi-Level Modelling
• Different levels, different system/design parameters: certain low-level parameters (e.g., body morphology, sensor placement) are captured explicitly only with very realistic models; others (e.g., communication range, number of individuals) are also captured at higher levels; diversity can be studied with microscopic models but is difficult to study with macroscopic ones; the approach is system-oriented as opposed to single-layer oriented (e.g., control, communication)
• Different levels, different generalization power: the higher the abstraction, the better the generalizing power (e.g., other hardware platforms, other experimental constraints, other classes of experiments)
• Different levels, different computational cost: quantitatively accurate models often have to be solved numerically; the higher the abstraction, the lower the computational cost
Modeling Levels
Target swarm system: multiple units
Models:
• Microscopic models: units represented individually
  – S&A-based simulator
  – Kinematic point simulator
  – Multi-agent system
  – …
• Macroscopic models: single representation for the swarm
  – Master equations
  – Chemical reaction networks
  – Rate equations (ODE)
  – …
[Figure: successive levels linked by "Approximations"; axes labeled "Abstraction" and "Experimental time"]
Multi-Level Modeling Methodology
• Target system (physical reality): info on controller, S&A, communication, morphology and environmental features
• Microscopic – Module-based: intra-robot (e.g., S&A, transceiver) and environment (e.g., physics) details reproduced faithfully
• Microscopic – Agent-based: multi-agent models, only relevant robot features captured, 1 agent = 1 robot
• Macroscopic: rate equations, mean field approach, whole swarm
$$\frac{dN_n(t)}{dt} = \sum_{n'} W(n \mid n',t)\,N_{n'}(t) - \sum_{n'} W(n' \mid n,t)\,N_n(t)$$
[Figure: the levels stacked by abstraction, each with its PFSM (states Ss, Sa), compared through common metrics]
Originality and Differences with other Model-Based Approaches
• Specifically targeted to (miniature) swarm-robotic systems: exploit robust control design techniques for resource-constrained individual nodes (e.g. BB, ANN); incorporate microscopic constraints (e.g., technology, environment) from the start; bottom-up modeling and multi-level, system-centered design approach
• Different from traditional model-based techniques in robotics: top-down approaches that start from an idealized node model and relax assumptions; compensate with potentially expensive/sophisticated technology and control algorithms to meet the requirements; (multi-level) control-centered design approach
• Different from traditional modeling in biology (and other natural sciences): macroscopic models as simple as possible, targeting a given scientific question; free parameters + fitting based on macroscopic measurements, since microscopic information is often not available/accurate
Modeling Assumptions
1. Probabilistic FSM description for environment and multi-agent system; arbitrary state granularity
2. Semi-Markovian properties: the system's future state is a function of the current state (and possibly of the amount of time spent in it)
3. Nonspatial metrics for swarm performance
4. Mean field approach (well-mixed system): the mean spatial distribution of agents over multiple runs is assumed to be homogeneous, as if they were randomly hopping on the arena
5. Linear superposition of object/robot detection areas (sparse object/robot distribution; no overcrowding, no overlapping detection areas)
Assumptions 1 and 2
• We work with states
  – Sx: state x
  – Tx: duration of state x
  – pin, pout: probabilities to enter and leave state x
• States can characterize both robots and the environment
[Figure: state Sx with duration Tx, entered with probability pin and left with probability pout]
Assumptions 3,4,5
• Trajectories and object locations do not count -> 1D Monte Carlo simulation
• N objects of type i -> N x (prob. to encounter i)
[Figure: 2D physical space with objects O1 and O2, mapped to a 1D probability space with segments for O1, O2 and free space]
Experimental Validation of Assumptions 3,4,5 – (1)
Nonembodied obstacles = detection surfaces, varied in position and shape
Numerical example (mean ± std dev, 3 locations, 100 h simulated time):
Size  | Square      | Rect.      | Round       | All shapes  | Geometry
robot | 0.31 ± 0.04 | 0.3 ± 0.03 | 0.32 ± 0.02 | 0.31 ± 0.03 | 0.31
Experimental Validation of Assumptions 3,4,5
[Plots: symmetry of stick distribution; # sticks; default]
From Microscopic to Macroscopic Models: Theoretical Background
Microscopic Level
p(n,t) = probability of an agent to be in state n at time t
If Markov properties are fulfilled (neglect distributions):
$$\Delta p(n,t) = p(n,t+\Delta t) - p(n,t) = \sum_{n'} p(n,t+\Delta t \mid n',t)\,p(n',t) - \sum_{n'} p(n',t+\Delta t \mid n,t)\,p(n,t)$$
• First sum: inflow; second sum: outflow
• p(n',t): probability the agent was in a given state n'
• p(n,t+∆t | n',t): transition probability
• Sums run over all possible states n' the agent can be in
Transition rate:
$$W(n \mid n';t) = \lim_{\Delta t \to 0} \frac{p(n,t+\Delta t \mid n',t)}{\Delta t}$$
Macroscopic Level
Left and right side of the microscopic equation: averaging over the total number of agents, dividing by ∆t, limit ∆t → 0; neglect distributions of the stochastic variables (mean field approach):
Rate equation (time-continuous):
$$\frac{dN_n(t)}{dt} = \sum_{n'} W(n \mid n',t)\,N_{n'}(t) - \sum_{n'} W(n' \mid n,t)\,N_n(t)$$
• First sum: inflow; second sum: outflow
• n, n' = states of the agents (all possible states at each instant)
• Nn = average fraction (or mean number) of agents in state n at time t
• W = transition rates (linear, nonlinear)
Time-discrete version (k = iteration index, T = time step, often left out):
$$N_n((k+1)T) = N_n(kT) + \sum_{n'} T\,W(n \mid n',kT)\,N_{n'}(kT) - \sum_{n'} T\,W(n' \mid n,kT)\,N_n(kT)$$
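The time-discrete rate equation can be sketched in a few lines of Python. This is only an illustration under assumed names: `rate_equation_step`, the matrix layout `W[n][m]` (rate from state m to state n) and the two-state numbers are hypothetical and do not come from the lecture.

```python
def rate_equation_step(N, W, T=1.0):
    """Advance the mean state occupancies N by one time step T.

    N[n]    : mean number (or fraction) of agents in state n
    W[n][m] : transition rate from state m to state n, i.e. W(n|m)
    """
    n_states = len(N)
    nxt = list(N)
    for n in range(n_states):
        for m in range(n_states):
            if m == n:
                continue
            nxt[n] += T * W[n][m] * N[m]  # inflow into n from m
            nxt[n] -= T * W[m][n] * N[n]  # outflow from n to m
    return nxt


# Hypothetical two-state example: state 0 = search, state 1 = avoidance.
p_a, p_s = 0.05, 0.25          # assumed per-step transition probabilities
W = [[0.0, p_s],
     [p_a, 0.0]]
N = [5.0, 0.0]                 # all 5 robots start in search
for _ in range(200):
    N = rate_equation_step(N, W)
```

Because every outflow of one state is an inflow of another, the total number of agents is conserved at each step, which is what allows one equation per system to be replaced by a conservation equation.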
Time Discretization: The Engineering Recipe
Time-discrete vs. time-continuous models:
1. Assess the time resolution needed for your swarm performance metrics
2. Consider the essence of unit-to-unit interactions: probabilistic/deterministic, role of asynchronicity, …
3. Choose whenever possible the most computationally efficient model: time-discrete models are less computationally expensive than emulating continuity (e.g., Runge-Kutta); in our systems/metrics there is no evidence of decreased prediction accuracy
4. Advantage of time-discrete models: a single common sampling rate can be defined among different abstraction levels
Often not even a tradeoff: just use the appropriate instrument for the appropriate problem!
Model Parameters
• Gray-box approach: a priori information about the system is exploited
  – multi-agent system
  – # of agents
  – technological and environmental constraints
• Models should not only explain but also have predictive power: the mapping between model parameters and design choices should be as explicit as possible (the higher the abstraction level, the more difficult it is)
• Two types of parameters for micro-AB and macro:
  – State durations (e.g., interaction times with objects)
  – State-to-state transition probabilities (e.g., encountering probabilities)
From the Target System to Macroscopic Models: An Incremental Bottom-Up Recipe in Seven Steps
Recipe: Target System & Metric(s)
1. Perform your basic design choices for the single robot: HW (e.g., robot morphology, S&A technology, etc.); SW (e.g., control architecture)
2. Define your system performance metric(s)
Recipe: Module-Based Model & FSM
3. Implement your design choices faithfully in a module-based microscopic model (in principle even running the same control code; component library provided)
4. Capture the control structure with a finite number of states of interest (semi-Markovian properties must be fulfilled) and generate a corresponding FSM
[Figure: example FSM; labels: Search, Avoidance, Grip, "Obstacle detected", "Obstacle avoided"]
Recipe: Microscopic-AB Model
5. Approximate local interactions and intra-robot details and develop an agent-based model
• Robotic system (N PFSM; N = total # agents), organized in castes 1 … n
• Environment (Q PFSM; Q = total # objects)
• Coupling (e.g., manipulation, sensing)
[Figure: one PFSM per robot (R11 … R1l in caste 1, …, Rn1 … Rnm in caste n) and one PFSM per object (O11 … O1p, …, Oq1 … Oqr), with coupling between robot and object PFSMs]
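As a rough illustration of step 5, the sketch below runs one probabilistic FSM per robot for a search/avoidance controller. It assumes a well-mixed arena, a constant per-step probability `pa` of entering avoidance, and a fixed avoidance duration `Ta` in time steps; the function name and all values are hypothetical.

```python
import random


def simulate_micro_ab(n_robots, pa, Ta, n_steps, seed=0):
    """Agent-based (1 agent = 1 robot) search/avoidance simulation.

    Returns the number of robots in search mode at each time step.
    """
    rng = random.Random(seed)
    remaining = [0] * n_robots      # avoidance steps left; 0 means searching
    searching_counts = []
    for _ in range(n_steps):
        searching_counts.append(sum(1 for r in remaining if r == 0))
        for i in range(n_robots):
            if remaining[i] > 0:
                remaining[i] -= 1   # keep avoiding
            elif rng.random() < pa:
                remaining[i] = Ta   # obstacle encountered: start avoiding
    return searching_counts


counts = simulate_micro_ab(n_robots=20, pa=0.05, Ta=20, n_steps=5000, seed=1)
tail = counts[1000:]                # discard the initial transient
mean_searching = sum(tail) / len(tail)
```

Each run gives one noisy trajectory; averaging over time or over many runs yields the central tendency that the macroscopic model predicts in a single run.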
Recipe: Macroscopic Model
6. Approximate micro-to-macro mapping (mean field): exploit the AB blueprint and build the macroscopic model
• Single representation for the whole swarm: robotic system (1 PFSM, castes 1 … n), environment (1 PFSM, object types 1 … q), and their coupling
• Average quantities
• Central tendency prediction (1 run)
• Continuous quantities: +1 ODE per state for all robotic castes and object types (metric/task dependent!)
• −1 ODE if substituted with conservation equations (e.g., total # of robots, total # of objects of type q, …)
[Figure: one PFSM for the robotic system and one for the environment, linked by coupling]
Recipe: Model Calibration
• Two types of parameters for micro-AB and macro:
  – State durations (e.g., interaction times with objects)
  – State-to-state transition probabilities (e.g., encountering probabilities)
• A good calibration procedure should:
  – Maximize the quantitative matching between the target system and the model according to one or several metrics of performance evaluation
  – Minimize a posteriori fitting
  – Have a low cost in terms of experimental time and hardware availability
  – Maximize model predictive power by allowing the integration of detailed parameters anchored to physical/technological reality
7. Calibrate using the following procedures:
  – Method 1: run “orthogonal” experiments on local, a priori known interactions (robot-to-robot, robot-to-environment) → use these values for all types of interaction that occur
  – Method 2: use all a priori known information (e.g., geometry, technology) without running experiments → get initial guesses → fine-tune parameters automatically on the target experiment with as small a subset of the target system as possible
Model Calibration – Method 1 [Correll & Martinoli, ISER 2004]
Delays & Discretization Interval
1. Measure all interaction times of interest in your system, i.e. those which might influence the swarm performance metrics. Note: often “delay states” can summarize all you need without getting into the details of what is going on within the state.
2. Consider only average values (parameter distributions might also be considered in the future; the modeling methodology does not prevent doing so)
3. For time-discrete systems: choose the time step T = maximum common factor (greatest common divisor) of all the delays measured (e.g., 3 s obstacle avoidance, 4 s object manipulation, T = 1 s) -> no rounding error. Note: in this case, more accuracy in parameter measurement means more computational cost when simulating
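The choice of the time step in item 3 can be sketched as a greatest-common-divisor computation. The helper name `common_time_step_ms` and the use of integer milliseconds (to avoid floating-point remainders) are assumptions made for this illustration.

```python
from functools import reduce
from math import gcd


def common_time_step_ms(delays_ms):
    """Largest time step (in ms) that divides every measured delay exactly."""
    return reduce(gcd, delays_ms)


# Slide example: 3 s obstacle avoidance, 4 s object manipulation.
T_ms = common_time_step_ms([3000, 4000])
```

Measuring delays more finely (say 3.25 s instead of 3 s) shrinks the common divisor, so the same simulated duration needs more iterations. This is the accuracy versus computational cost remark in the note.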
Geometric Probabilities gi (Normalized Detection Areas)
gs, gw, … are functions of sensor range, behavior, robot's and object's size, …: interaction characterization!
Example: stick with detection area As
$$g_s = A_s / A_{arena}$$
Encountering Probabilities
1. Measure geometric probabilities of detection gi
2. Calculate the encountering rate ri [s^-1] for object i from the geometric probabilities gi:
$$r_i = \frac{v\,W_s}{A_s}\,g_i$$
  – As = detection area of the smallest object
  – v = mean robot speed
  – Ws = robot's detection width for the smallest object (center-to-center)
3. For time-discrete models, calculate the encountering probabilities pi (per time step) from the encountering rates:
$$p_i = r_i T$$
NOTE: see ISER 04; slightly different from IJRR 04 (decoupled time and space)!
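The three calibration steps can be chained as in the sketch below, which follows r_i = (v W_s / A_s) g_i and p_i = r_i T. The function names and all numerical values (areas in m², speed in m/s) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def geometric_probability(detection_area, arena_area):
    """g_i: detection area of object i normalized by the arena area."""
    return detection_area / arena_area


def encountering_rate(g_i, v, W_s, A_s):
    """r_i in 1/s: r_i = v * W_s / A_s * g_i."""
    return v * W_s / A_s * g_i


def encountering_probability(r_i, T):
    """p_i per time step of length T seconds."""
    return r_i * T


# Assumed values, not measurements from the lecture.
g_i = geometric_probability(detection_area=0.02, arena_area=1.0)
r_i = encountering_rate(g_i, v=0.08, W_s=0.1, A_s=0.01)
p_i = encountering_probability(r_i, T=1.0)
```

With these numbers g_i = 0.02, r_i = 0.016 per second, and p_i = 0.016 per step. Since p_i is a probability, T has to be small enough that r_i T stays well below 1.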
Model Calibration - Practice
• Assumptions (well-mixed, linear overlap of areas) might be only partially fulfilled
• We do not capture distributions in the model parameters, only deterministic average values; distributions might more faithfully capture:
  – Controller type (e.g. distal vs. proximal)
  – Active vs. passive objects (e.g., robot vs. puck)
  – Size/movability of objects (e.g., wall, blade vs. seed)
  – Embodiment vs. non embodiment (e.g., area vs. real obstacle)
  – Way of measuring your metrics (e.g., egocentric, allocentric)
  – Impact on the considered swarm performance metric through error propagation (clear decoupling between parameter and structure inaccuracies of the model)
Allocentric Parameter Calibration
• From a bird's-eye view perspective, using an external system (e.g. supervisor in Webots, tracking system in reality)
• Collision with an obstacle defined by a fixed distance between the obstacle and the robot
• High variance on the state durations, little variance (only due to different obstacle types such as wall, robots, corners, other objects) in the encountering probability
Egocentric Parameter Calibration
• From the internal robot perspective
• Collision with an obstacle based on behavioral transition and/or sensory perceptual threshold (might be slightly different for different types of controllers)
• Medium variance on the state durations and encountering probabilities (both due to the obstacle avoidance maneuvering, sensor and actuator noise, etc.)
Model Calibration - Practice
Micro-MB to micro-AB comparison (different controllers, calibration scenarios: static, with allocentric vs. egocentric calibration and allocentric metric)
[Plot: Na*/N0]
Matching between model calibration and model validation: the example of allocentric vs. egocentric calibration
Issue: model structure based on egocentric information (controller) and metric usually on allocentric performance (environment-based)
Model Calibration - Practice
Bin distribution of interaction time Ta (mean Ta = 25 × 50 ms = 1.25 s)
[Histogram: # of collisions vs. collision time; series: micro-AB/macro with deterministic delay; micro-AB/macro with probabilistic delay; micro-MB, proximal controller, allocentric; micro-MB, distal controller, allocentric]
Model Calibration - Practice
Geometric probability g: example of transition in space from search to obstacle avoidance (1 moving robot, 1 dummy robot, Webots measurements, egocentric)
[Figure panels: distal controller (rule-based); proximal controller (Braitenberg, linear)]
Model Calibration – Method 2 [Correll & Martinoli, DARS 2006]
Distributed Boundary Coverage
• Case study: turbine inspection
• 25 blades
• Up to 40 Alice II robots
• Metric: time to complete the inspection
• Suite of algorithms (reactive to deliberative) and models
• In this course: focus on the reactive algorithms and introduce the modeling methodology
Distributed Reactive Algorithms
[Figures: 10 Alices, 25 blades, reactive blade-to-blade movement; 20 Alices, 25 blades, obstacle avoidance only]
Parameter Calibration – Starting from Method 1
• Encountering probability
  – Detection area
  – Robot speed
  – Sensor range
  – Ad hoc experiments
• Interaction times
  – Ad hoc experiments
Improvement by data fitting:
• Can avoid ad hoc experiments for encountering probabilities
• Can relax one assumption
Relaxing Modeling Assumptions
5. Linear superposition of object/robot detection areas (sparse object/robot distribution; no overcrowding, no overlapping detection areas)
• Solution: constrained system identification technique [Correll & Martinoli, DARS 2006]
• Results (6 unknown parameters)
[Plot: initial guess (geometry, specs); experiment; optimally parameterized model]
Current Limitations
• Durations calibrated with previous method (constrained)
• Method has been tested with the whole system in place (including the targeted size in terms of number of robots).
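Linear Example: Wandering and Obstacle Avoidance
A Simple Linear Model
Example: search (moving forwards) and obstacle avoidance
[Figure © Nikolaus Correll 2006]
A Simple Example
From the deterministic robot's flowchart, via nonspatiality and microscopic characterization, to the probabilistic agent's flowchart and the PFSM:
• Deterministic robot's flowchart: Start → Search; Obstacle? N → keep searching, Y → Avoidance
• Probabilistic agent's flowchart: Start → Search; with probability pa → Avoidance (duration τa), with probability 1-pa → keep searching; back to Search with probability ps
• PFSM: states Ss and Sa; Ss → Sa with pa; Sa → Ss with ps (or after τa)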
Linear Model – Probabilistic Delay
[PFSM: Search → Avoidance (Ta) with probability pa; Avoidance → Search with probability ps = 1/Ta]
$$N_s(k+1) = N_s(k) - p_a N_s(k) + p_s N_a(k)$$
$$N_a(k+1) = N_0 - N_s(k+1)$$
$$p_s = 1/T_a$$
Initial conditions: Ns(0) = N0; Na(0) = 0
• Ta = mean obstacle avoidance duration
• pa = probability of moving to obstacle avoidance
• ps = probability of resuming search
• Ns = average # robots in search
• Na = average # robots in obstacle avoidance
• N0 = # robots used in the experiment
• k = 0,1, … (iteration index)
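A minimal sketch of iterating the probabilistic-delay model, Ns(k+1) = Ns(k) - pa Ns(k) + ps Na(k) with ps = 1/Ta and Na(k+1) = N0 - Ns(k+1). The function name and the parameter values are assumptions for illustration only.

```python
def simulate_probabilistic_delay(N0, p_a, T_a, n_steps):
    """Iterate the two-state macroscopic model with a probabilistic delay.

    Returns a list of (Ns, Na) pairs for k = 0 .. n_steps.
    """
    p_s = 1.0 / T_a                 # probability of resuming search
    Ns, Na = float(N0), 0.0         # initial conditions Ns(0)=N0, Na(0)=0
    history = [(Ns, Na)]
    for _ in range(n_steps):
        Ns_next = Ns - p_a * Ns + p_s * Na
        Na_next = N0 - Ns_next      # conservation of the number of robots
        Ns, Na = Ns_next, Na_next
        history.append((Ns, Na))
    return history


# Assumed values: 5 robots, pa = 0.05 per step, Ta = 4 steps.
history = simulate_probabilistic_delay(N0=5, p_a=0.05, T_a=4, n_steps=200)
Ns_final, Na_final = history[-1]
```

For these values the iteration settles at Ns = N0 / (1 + pa Ta), about 4.17 robots in search.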
Linear Model – Deterministic Delay
[PFSM: Search → Avoidance (Ta) with probability pa; Avoidance → Search with probability 1 after Ta]
$$N_s(k+1) = N_s(k) - p_a N_s(k) + p_a N_s(k - T_a)$$
$$N_a(k+1) = N_0 - N_s(k+1)$$
Initial conditions and causality: Ns(0) = N0; Na(0) = 0; Ns(k) = Na(k) = 0 for all k < 0
• Ta = mean obstacle avoidance duration
• pa = probability of moving to obstacle avoidance
• Ns = average # robots in search
• Na = average # robots in obstacle avoidance
• N0 = # robots used in the experiment
• k = 0,1, … (iteration index)
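The deterministic-delay variant, Ns(k+1) = Ns(k) - pa Ns(k) + pa Ns(k - Ta), needs the history of Ns and the causality condition Ns(k) = 0 for k < 0. The sketch below assumes Ta is an integer number of time steps; names and values are hypothetical.

```python
def simulate_deterministic_delay(N0, p_a, T_a, n_steps):
    """Iterate the two-state macroscopic model with a fixed delay T_a (steps).

    Returns the lists Ns[k], Na[k] for k = 0 .. n_steps.
    """
    Ns = [float(N0)]                # Ns(0) = N0
    Na = [0.0]                      # Na(0) = 0
    for k in range(n_steps):
        # Robots that started avoiding T_a steps ago now return to search.
        delayed = Ns[k - T_a] if k - T_a >= 0 else 0.0
        Ns_next = Ns[k] - p_a * Ns[k] + p_a * delayed
        Ns.append(Ns_next)
        Na.append(N0 - Ns_next)     # conservation of the number of robots
    return Ns, Na


# Assumed values: 5 robots, pa = 0.05 per step, Ta = 4 steps.
Ns, Na = simulate_deterministic_delay(N0=5, p_a=0.05, T_a=4, n_steps=300)
```

The explicit bounds check on `k - T_a` matters in Python: a negative list index would silently wrap around to the end of the list instead of returning 0.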
Linear Model – Sample Results
• Micro-AB to macro comparison (same robot density, but the wall surface becomes smaller with bigger arenas)
• Micro-AB to micro-MB comparison (different controllers, static scenarios, allocentric measures)
[Plots: Na*/N0]
Steady State Analysis
• Nn(k+1) = Nn(k) for all states n of the system → Nn*
• Note 1: equivalent to setting dNn/dt = 0 in the differential equation
• Note 2: for time-delayed equations it is easier to perform the steady-state analysis in the Z-space, but the t-space is also ok (see IJRR-04)
• For our linear example (time-delay option):
$$N_s^* = \frac{N_0}{1 + p_a T_a}, \qquad N_a^* = \frac{N_0\,p_a T_a}{1 + p_a T_a}$$
Ex.: normalized mean number of robots in search mode at steady state as a function of the time for obstacle avoidance
[Plot; label: group size]
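For the probabilistic-delay variant of the linear model the steady state can be worked out in two lines; the short derivation below is a sketch, and the numerical values are assumed for illustration.

```latex
% Set N_s(k+1) = N_s(k) = N_s^* in
% N_s(k+1) = N_s(k) - p_a N_s(k) + p_s N_a(k), with p_s = 1/T_a and N_a = N_0 - N_s:
p_a N_s^* = p_s N_a^* = \frac{N_0 - N_s^*}{T_a}
\quad\Longrightarrow\quad
N_s^* = \frac{N_0}{1 + p_a T_a}, \qquad
N_a^* = N_0 - N_s^* = \frac{N_0\, p_a T_a}{1 + p_a T_a}
% Example with assumed values N_0 = 5, p_a = 0.05, T_a = 4:
N_s^* = \frac{5}{1.2} \approx 4.17, \qquad N_a^* \approx 0.83
```

The inflow into avoidance (pa Ns*) balances the outflow (Na*/Ta), and this gives the same fixed point as the time-delay option.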
Nonlinear Example – Stick-Pulling
A Case Study: Stick-Pulling
Physical Set-Up
• 2-6 robots
• 4 sticks
• 40 cm radius arena
Collaboration via indirect communication
[Figure labels: proximity sensors; arm elevation sensor; IR reflective band]
Systematic Experiments
• Real robots and module-based model
• [Martinoli and Mondada, ISER, 1995]
• [Ijspeert et al., AR, 2001]
Experimental and Realistic Simulation Results
• Real robots (3 runs) and realistic simulations (10 runs)
• System bifurcation as a function of #robots/#sticks
[Plot regions: Nrobots > Nsticks; Nrobots ≤ Nsticks]
Geometric Probabilities
$$p_s = A_s/A_a, \qquad p_r = A_r/A_a, \qquad p_R = p_r (N_0 - 1), \qquad p_w = A_w/A_a$$
$$p_{g1} = p_s, \qquad p_{g2} = R_g\,p_s$$
Aa = surface of the whole arena
From Reality to Abstraction
[Figure: deterministic robot's flowchart → (nonspatiality & microscopic characterization) → probabilistic agent's flowchart → PFSM]
Full Macroscopic Model
• 6 states: 5 DE + 1 cons. EQ
• Ti, Ta, Td, Tc ≠ 0; Txyz = Tx + Ty + Tz
• TSL = Shift Left duration
• [Martinoli et al., IJRR, 2004]
For instance, for the average number of robots in searching mode:
$$\begin{aligned} N_s(k+1) = {} & N_s(k) - [\Delta_{g1}(k) + \Delta_{g2}(k) + p_w + p_R]\,N_s(k) + \Delta_{g1}(k - T_{cga})\,\Gamma(k; T_a)\,N_s(k - T_{cga}) \\ & + \Delta_{g2}(k - T_{ca})\,N_s(k - T_{ca}) + \Delta_{g2}(k - T_{cda})\,N_s(k - T_{cda}) + p_w N_s(k - T_a) + p_R N_s(k - T_{ia}) \end{aligned}$$
with time-varying coefficients (nonlinear coupling):
$$\Delta_{g1}(k) = p_{g1}\,[M_0 - N_g(k) - N_d(k)]$$
$$\Delta_{g2}(k) = p_{g2}\,N_g(k)$$
$$\Gamma(k; T_{SL}) = \prod_{j = k - T_g - T_{SL}}^{k - T_{SL}} [1 - p_{g2} N_s(j)]$$
Swarm Performance Metric
Collaboration rate: # of sticks per time unit
$$C(k) = p_{g2}\,N_s(k - T_{ca})\,N_g(k - T_{ca})$$
: mean # of collaborations at iteration k
$$C_t = \frac{1}{T_e} \sum_{k=0}^{T_e} C(k)$$
: mean collaboration rate over Te
Results (Standard Arena)
[Plot: Micro-MB (10 runs), Micro-AB (100 runs), Macro (1 run)]
Discrepancies because of ODE approximations (nonlinearities + discrete exact vs. average quantities)
Results: 4 x #Sticks, #Robots and Arena Area
[Plot: Micro-MB (10 runs), Micro-AB (100 runs), Macro (1 run)]
Reducing the Macroscopic Model
Τi,Τa,Τd,Τc << Τg →Τi=Τa=Τd=Τc=0
Goal: reach mathematical tractability
Nonlinear coupling!
Reduced Macroscopic Model
[PFSM: Search ⇄ Grip]
$$N_s(k+1) = N_s(k) - p_{g1}[M_0 - N_g(k)]\,N_s(k) + p_{g2}\,N_g(k)\,N_s(k) + p_{g1}[M_0 - N_g(k - T_g)]\,\Gamma(k;0)\,N_s(k - T_g)$$
• Term $p_{g2} N_g(k) N_s(k)$: successful collaborations
• Term $p_{g1}[M_0 - N_g(k - T_g)]\,\Gamma(k;0)\,N_s(k - T_g)$: unsuccessful collaborations
$$N_g(k+1) = N_0 - N_s(k+1)$$
$$\Gamma(k;0) = \prod_{j = k - T_g}^{k} [1 - p_{g2} N_s(j)]$$
Initial conditions and causality: Ns(0) = N0, Ng(0) = 0; Ns(k) = Ng(k) = 0 for all k < 0
• Ns = average # robots in searching mode
• Ng = average # robots in gripping mode
• N0 = # robots used in the experiment
• M0 = # sticks used in the experiment
• Γ = fraction of robots that abandon pulling
• Te = maximal number of iterations
• k = 0,1, … Te (iteration index)
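The reduced model can be iterated numerically as sketched below. The function name and every parameter value are hypothetical, and the collaboration count per step is taken as pg2 Ns(k) Ng(k), which assumes the zero-delay simplification (Tca = 0) of the reduced model; the rate is averaged over the Te simulated iterations.

```python
def simulate_reduced_stick_pulling(N0, M0, p_g1, p_g2, T_g, T_e):
    """Iterate the reduced macroscopic stick-pulling model for T_e steps.

    Returns (Ns, Ng, mean collaboration rate per step).
    """
    Ns = [float(N0)]                       # Ns(0) = N0
    Ng = [0.0]                             # Ng(0) = 0
    collaborations = []

    def ns(j):                             # causality: Ns(j) = 0 for j < 0
        return Ns[j] if j >= 0 else 0.0

    def ng(j):                             # causality: Ng(j) = 0 for j < 0
        return Ng[j] if j >= 0 else 0.0

    for k in range(T_e):
        # Gamma(k;0): fraction of robots that gripped T_g steps ago
        # and were never helped, so they abandon pulling now.
        gamma = 1.0
        for j in range(k - T_g, k + 1):
            gamma *= 1.0 - p_g2 * ns(j)
        collaborations.append(p_g2 * Ns[k] * Ng[k])
        Ns_next = (Ns[k]
                   - p_g1 * (M0 - Ng[k]) * Ns[k]            # new grips
                   + p_g2 * Ng[k] * Ns[k]                   # successful
                   + p_g1 * (M0 - ng(k - T_g)) * gamma * ns(k - T_g))  # unsuccessful
        Ns.append(Ns_next)
        Ng.append(N0 - Ns_next)            # conservation of robots
    return Ns, Ng, sum(collaborations) / T_e


# Assumed values: 4 robots, 4 sticks, gripping time of 50 steps.
Ns, Ng, rate = simulate_reduced_stick_pulling(
    N0=4, M0=4, p_g1=0.01, p_g2=0.005, T_g=50, T_e=2000)
```

Sweeping `T_g` and plotting `rate` is one way to look for an optimal gripping time numerically.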
Results Reduced Micro-AB Model
• 4 robots, 4 sticks, Ra = 40 cm
• 16 robots, 16 sticks, Ra = 80 cm
• Micro-AB (100 runs) and macro models overlap
• Only qualitative agreement with micro-MB/real robot results
Steady State Analysis (Reduced Macro Model)
• Steady-state analysis [Nn(k+1) = Nn(k)] → It can be demonstrated that:
$$\exists\; T_g^{opt} \quad \text{for} \quad \frac{N_0}{M_0} \le \frac{2}{1 + R_g}$$
with N0 = number of robots, M0 = number of sticks, and Rg ∝ approaching angle for collaboration
• Counterintuitive conclusion: an optimal Tg can exist also in scenarios with more robots than sticks if the collaboration is very difficult (i.e. Rg very small)!
Analysis Verification (Micro-AB and Macro Full Model)
Example (collaboration very difficult): $\tilde{R}_g = \frac{1}{10} R_g$, 20 robots and 16 sticks (optimal Tg)
• $T_g^{opt}$ can be computed numerically by integrating the full model ODEs or solving the full model steady-state equations
Optimal Gripping Time
• Steady-state analysis → $T_g^{opt}$ can be computed analytically in the simplified model (numerically approximated value):
$$T_g^{opt} = \frac{1}{\ln\left(1 - \frac{p_{g1} R_g N_0}{2}\right)} \, \ln\left[\frac{1 - \frac{\beta}{2}(1 + R_g)}{1 - \frac{\beta}{2}}\right] \quad \text{for} \quad \beta \le \beta_c = \frac{2}{1 + R_g}$$
with β = N0/M0 = ratio robots-to-sticks
[Lerman et al, Alife Journal, 2001], [Martinoli et al, IJRR, 2004]
[Plot: $T_g^{opt}$]
Journal Publications using the Same Modeling Framework
Stick Pulling
• [Martinoli, Easton, Agassounon, Int. J. of Robotics Res., 2004]
• [Lerman, Galstyan, Martinoli, Ijspeert, Artificial Life, 2001]
• [Ijspeert, Martinoli, Billard, Gambardella, Auton. Robots, 2001]
Object Aggregation
• [Agassounon, Martinoli, Easton, Autonomous Robots, 2004]
• [Martinoli, Ijspeert, Mondada, Robotics and Auton. Systems, 1999]
Wireless-Based Cohesive Swarming
• [Winfield, Liu, Nembrini, Martinoli, Swarm Intelligence J., 2008]
Combined Modeling and Machine-Learning Techniques
Rationale for Combined Methods (1)
• Any level of modeling (micro-MB, micro-AB, or macro) allows us to consider certain parameters and leave out others; models, as expressions of reality abstraction, can be considered as more or less coarse “filters” of reality
• Combined modeling/machine-learning techniques can be used at any of the abstraction levels; machine-learning techniques will explore the design parameters explicitly represented at a given level of abstraction
• Depending on the features of the hyperspace to be searched (size, continuity, noise, etc.), appropriate machine-learning techniques should be used (e.g., single-agent vs. multi-agent techniques); the different mapping policies (e.g., individual/group, public/private, homogeneous/heterogeneous) are “orthogonal” and can be applied to different microscopic levels
• One particular optimization problem is system identification: the performance to optimize is the matching with reality (or with a lower abstraction level). See model calibration theory, calibration method #2, this lecture.
Rationale for Combined Methods (2)
• Target system + ML = adaptation with HW in the loop (on-board or off-board)
• MB-Microscopic + ML (see Week 05 examples using GA and PSO); for instance, low-level design parameters can be learned
• AB-Microscopic + ML (see this lecture's examples); for instance, diversity and specialization can be studied
• Macroscopic + ML? Most of the time not needed since very fast + continuous; mainly homogeneous systems; standard numerical optimization techniques/systematic search can be used
[Figure: the four levels with their PFSMs (states Ss, Sa); axes labeled "Abstraction" and "Cost of optimization/design"]
In-Line Adaptive Learning
Big Artillery such as GA/PSO is Not Always the Most Appropriate Solution…
• Simple individual learning rules combined with collective flexibility can achieve extremely interesting results
• Simplicity and low computational cost mean possible embedding on simple, real robots
• Can be used for all sorts of parameters, for instance also those controlling the robot activity for a given task in a threshold-based algorithm (see lecture Week 7)
In-Line Adaptive Learning (Li, Martinoli, Abu-Mostafa, 2001)
Algorithm Parameters
• GTP: Gripping Time Parameter
• ∆d: learning step
• d: direction
• Underlying low-pass filter for measuring the performance
[Figure: low-pass filter; adapting rules for the learning step. From Li et al., Adaptive Behavior, 2004]
In-Line Adaptive Learning
Differences with gradient descent methods:
• Fixed rules for calculating step increase/decrease → limited descent speed → no gradient computation → more conservative but more stable
• Randomness for getting out of local minima (no momentum)
• Underlying low-pass filter is part of the algorithm
Differences with Reinforcement Learning:
• No learning history considered (only previous step)
Differences with basic In-Line Learning:
• Adaptive step → faster, and more stable at convergence
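The flavor of such a rule can be conveyed with a generic adaptive-step hill climber. This is NOT the rule set of Li et al., whose exact step-adaptation rules are not reproduced in this transcript: the grow/shrink factors, the first-order filter, and all names below are assumptions, and the randomness used to escape local minima is left out.

```python
def inline_adaptive_learning(measure, gtp0, step0, n_iter,
                             alpha=1.0, grow=1.2, shrink=0.5,
                             min_step=0.1, max_step=50.0):
    """Adapt one parameter (e.g. a gripping time) from performance feedback.

    measure(gtp) returns the raw performance obtained with parameter gtp.
    alpha is the low-pass filter coefficient (1.0 means no filtering).
    Only the previous filtered performance is remembered.
    """
    gtp, direction, step = gtp0, 1, step0
    filtered = measure(gtp)
    for _ in range(n_iter):
        gtp = gtp + direction * step           # try the new parameter in-line
        new_filtered = alpha * measure(gtp) + (1.0 - alpha) * filtered
        if new_filtered > filtered:
            step = min(max_step, step * grow)  # improvement: same direction, bigger step
        else:
            direction = -direction             # no improvement: turn around
            step = max(min_step, step * shrink)
        filtered = new_filtered
    return gtp


# Toy, noise-free performance curve with its maximum at gtp = 50 (assumed).
def toy_performance(gtp):
    return -(gtp - 50.0) ** 2


learned_gtp = inline_adaptive_learning(toy_performance, gtp0=10.0,
                                       step0=8.0, n_iter=200)
```

No gradient is computed: only the sign of the change in filtered performance is used, which keeps the rule cheap enough for simple robots. With noisy measurements, an `alpha` below 1 trades reaction speed for stability.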
Co-Learning in a Collaborative Framework – Homogeneous Systems
Baseline: Homogeneous Team
Three orthogonal axes to consider (extremities or balanced solutions are possible):
• Individual and group fitness
• Private (non-sharing of parameters) and public (parameter sharing) policies
• Homogeneous vs. heterogeneous systems
Example with binary encoding of candidate solutions
Sample Results – Homogeneous System
[Two plots: stick-pulling rate (1/min) vs. initial gripping time parameter (sec, 0 to 600), for 2, 3, 4, 5 and 6 robots; one panel with a short averaging window (filter cut-off f high), one with a long averaging window (filter cut-off f low); learned (mean + std dev) vs. systematic (mean only)]
Note: 1 parameter for the whole group!
Co-Learning in a Collaborative Framework – Heterogeneous Systems
Viable Policies for Exploring Het. Teams
Three orthogonal axes to consider (extremities or balanced solutions are possible):
• Individual and group fitness
• Private (non-sharing of parameters) and public (parameter sharing) policies
• Homogeneous vs. heterogeneous systems
[Figure annotations on the policy combinations: "Viable for exploring heterogeneous solutions"; "Not scalable"; "Heterogeneity allowed but eventually roughly homogeneous solution via shuffling around of candidate solutions"; "Homogeneity enforced"]
4 robots, one per color, AB-micro + learning
Key question: does team diversity enhance performance? I.e., can individuals become specialized?
Performance ratio between 2-caste and homogeneous system (MB- and AB-microscopic models, systematic)
Diversity and Specialization
Impact of Diversity on Performance (Li, Martinoli, Abu-Mostafa, 2004)
Performance ratio between heterogeneous (full and 2-castes) and homogeneous groups AFTER learning
[Plot: stick-pulling rate ratio (1 to 1.3) vs. number of robots (2 to 6); series: 2-caste, Global; Heterogeneous, Global; Heterogeneous, Local]
Notes:
• large Tm (long averaging window)
• only private strategies
• global = group
• local = individual
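Diversity Metrics (Balch 1998)
Entropy-based diversity measure introduced in AB-04 could be used for analyzing threshold distributions
Simple entropy:
$$H(R) = -\sum_{i=1}^{m} p_i \log p_i, \qquad \sum_{i=1}^{m} p_i = 1$$
pi = portion of the agents in cluster i; m clusters in total; h = taxonomic level parameter (the numerical examples in this lecture use base-10 logarithms)
Social entropy:
$$D(R) = \int_0^{\infty} H(R,h)\,dh$$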
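Example – Simple Entropy
• R = {r1, r2, r3}
• n = 3 (three swarm points)
• bi-dimensional space
• define a distance: Euclidean distance
• h = taxonomic level parameter
• m = number of clusters
[Figure: three points r1, r2, r3 with pairwise distances 3, 4 and 6]
h < 3, m = 3 (clusters c1, c2, c3):
$$H(R) = -\sum_{i=1}^{3} p_i \log p_i = H(\tfrac13, \tfrac13, \tfrac13) = -3 \cdot \tfrac13 \log \tfrac13 = 0.477$$
3 ≤ h < 4, m = 2 (clusters c1, c2):
$$H(R) = -\sum_{i=1}^{2} p_i \log p_i = H(\tfrac13, \tfrac23) = -\tfrac13 \log \tfrac13 - \tfrac23 \log \tfrac23 = 0.159 + 0.117 = 0.276$$
Example – Simple Entropy
$$\sum_{i=1}^{m} p_i = 1$$
4 ≤ h < 6, m = 2 (overlapping clusters c1, c2):
$$H(R) = -\sum_{i=1}^{2} p_i \log p_i = H(\tfrac13 + \tfrac13 \cdot \tfrac12,\; \tfrac13 + \tfrac13 \cdot \tfrac12) = H(\tfrac12, \tfrac12) = -\tfrac12 \log \tfrac12 - \tfrac12 \log \tfrac12 = 0.301$$
h ≥ 6, m = 1 (cluster c1):
$$H(R) = -\sum_{i=1}^{1} p_i \log p_i = H(\tfrac33) = -1 \log 1 = 0$$
Check with overlapping clusters!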
Example – Social Entropy
$$D(R) = \int_0^{\infty} H(R,h)\,dh = 3 \times H(\tfrac13, \tfrac13, \tfrac13) + 1 \times H(\tfrac13, \tfrac23) + 2 \times H(\tfrac12, \tfrac12) + 0 = 2.309$$
Note: in contrast to simple entropy, D(R) can be ≥ 1
Contrast with R = {r1, r2, r3} and r1 = r2 = r3 (homogeneous swarm): for any h ≥ 0 → single cluster → D(R) = 0!
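The numbers in the three-point example can be checked with a short sketch. It assumes base-10 logarithms (which is what reproduces the 0.477, 0.276 and 0.301 values) and takes the cluster proportions per interval of h directly from the slides rather than computing the clustering; function names are hypothetical.

```python
from math import log10


def simple_entropy(proportions):
    """H = -sum(p_i * log10(p_i)) over the cluster proportions p_i."""
    return -sum(p * log10(p) for p in proportions if p > 0)


def social_entropy(levels):
    """Integrate H over h, given (interval width, proportions) pairs.

    H is piecewise constant in h, so the integral is a weighted sum.
    """
    return sum(width * simple_entropy(props) for width, props in levels)


# Cluster proportions of the three-point example, per interval of h.
levels = [
    (3, [1 / 3, 1 / 3, 1 / 3]),   # 0 <= h < 3: three clusters
    (1, [1 / 3, 2 / 3]),          # 3 <= h < 4: two clusters
    (2, [1 / 2, 1 / 2]),          # 4 <= h < 6: two overlapping clusters
]                                 # h >= 6: one cluster, contributes 0
D = social_entropy(levels)        # about 2.31
```

For a homogeneous swarm every interval has a single cluster with proportion 1, so each term is 0 and D = 0.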
Differences with Plain Euclidean Diversity Measure
• Underlying difference measure might be the same (e.g. Euclidean distance)
• Social entropy looks for possible clustering of the vectors (looking for possible castes), while Euclidean diversity just looks at how spread out/diverse the vectors are in general
[Formula annotations: components in all dimensions; all points from any other point]
Specialization Metric
Specialization metric introduced in AB-04:
S = specialization; D = social entropy; R = swarm performance
Notes
• Idea: “weighting diversity with performance”
• This is useful when the number of tasks to be solved is not well-defined or it is difficult to assess the task granularity a priori. In such cases the mapping between task granularity and caste granularity might not be trivial (one-to-one mapping? how many sub-tasks for a given main task? etc.; see the limited performance of a caste-based solution in the stick-pulling experiment)
• Could be used for analyzing specialization arising from a variable-threshold division of labor algorithm (see lecture Week 7)
Sample Results for Standard Sticks
• 2 serial grips needed to get the sticks out
• 4 sticks, 2-6 robots, 80 cm arena
[Plots: Relative Performance; Diversity; Specialization]
• Spec more important for small teams
• Local p > global p
• Enforced caste: pay the price for odd team sizes
• Flat curves, difficult to tell whether diversity brings performance
• Specialization higher with global when needed, drops more quickly when not needed
• Enforcing caste: “low-pass filter” effect
Sample Results for Long Sticks
• 4 serial grips needed to get the sticks out
• 16 sticks, 8-24 robots, 4 times bigger area
[Plots: Relative Performance; Diversity; Specialization]
• Spec more important for small teams
• Global p > local p
• Enforced caste: helps because of original large space (= # of robots)
• Flat curves, difficult to tell whether diversity brings performance
• However, consistent with standard set-up
• Specialization higher with global when needed, drops more quickly when not needed
• Not clear why spec grows with team size and local
Some Remarks on the Results
• Global performance is usually less noisy, so the part of diversity that increases performance is higher with global performance (“specialization when needed”)
• Default sticks: when local and global performance are almost aligned (i.e. “by doing well locally I do well globally”), local performance achieves slightly better results since reinforcement is direct (no credit assignment problem)
• Long sticks: when local and global performance are not aligned (i.e. performance is aligned only for grip k; for grips 1 … k-1 there is no guarantee that a given “baton passing” action will lead to a stick being pulled out), global reinforcement achieves better results at the price of a bigger credit assignment problem, usually resulting in slower learning
Conclusion
Take Home Messages
• Models help in understanding and generalizing properties of real-time distributed intelligent systems
• Two main levels of models: micro and macro
• Macroscopic models using the mean field approach often result in nonlinear, time-delayed ODEs
• Multi-level modeling allows for different approximations and accuracy/computation trade-offs
• A methodology with precise inter-level mapping has been developed for a given class of experiments
• If carefully designed, models also allow for system optimization and closing the loop between analysis and synthesis
Take Home Messages
• Different modeling levels can be combined with machine learning for design and optimization purposes
• AB-microscopic models allow for efficiently studying diversity and specialization issues
• Specialization is the part of diversity that improves performance
• The diversity and specialization level of a heterogeneous swarm can be quantitatively measured
Additional Literature – Week 6
Books
• Balch T. and Parker L. E. (Eds.), Robot Teams: From Diversity to Polymorphism. A K Peters, Natick, MA.
• Ross S. M., Introduction to Probability Models, Academic Press, San Diego, CA, USA, 1997.
Papers
• Ijspeert A. J., Martinoli A., Billard A., and Gambardella L. M., “Collaboration through the Exploitation of Local Interactions in Autonomous Collective Robotics: The Stick Pulling Experiment”. Autonomous Robots, 11(2):149–171, 2001.
• Lerman K. and Galstyan A., “Mathematical Model of Foraging in a Group of Robots: Effect of Interference”. Autonomous Robots, 13(2):127–141, 2002.
• Murciano A., Millán J. del R., and Zamora J., “Specialization in Multi-Agent Systems through Learning”. Biological Cybernetics, 76:375–382, 1997.
• Wolpert D. H. and Tumer K., “Optimal Payoff Functions for Members of Collectives”. Advances in Complex Systems, 4:265–279, 2001.
• Matarić M. J., “Using Communication to Reduce Locality in Distributed Multi-Agent Learning”. Journal of Experimental and Theoretical Artificial Intelligence, 10:357–369, 1998.