SC741 – W6: An Introduction to Machine-Learning and Modeling Techniques for Swarm-Intelligent, Embedded Systems
Outline
• Machine-Learning Techniques
– Overview and definitions
– Genetic Algorithms
– In-line learning
– From single-unit to multiple-unit systems
• Probabilistic Modeling Techniques
– Overview and definitions
– Microscopic modeling
– Macroscopic modeling
• Combining Modeling and Learning Techniques
An Introduction to Machine-Learning Techniques
Why Machine-Learning?
• Complementarity to model-based/engineering approaches: when low-level details matter (optimization) and/or good solutions do not exist (design)!
• When the design/optimization space is too big (infinite) or too computationally expensive to be searched systematically
• Automatic design and optimization techniques
• Role of the engineer reduced to specifying performance requirements and problem encoding
Why Machine-Learning?
• Design and optimization techniques robust to noise, nonlinearities, and discontinuities
• Individual real-time adaptation to new environmental conditions, i.e., increased individual flexibility when environmental conditions are not known or cannot be predicted a priori
• Search space: parameters and/or rules
ML Techniques: Classification
– Supervised techniques: a "trainer/teacher" is available.
• Ex.: a set of input-output examples is provided to the system; the error between the output of the system and the teacher's correct answer for the same input is calculated and fed back to the system using a machine-learning algorithm so that the error is reduced over trials
• The system's generalization capabilities after training are tested on examples not presented to the system before (i.e., not in the training set)
– Unsupervised techniques: "trial-and-error", "evaluative" techniques; no teacher available.
• The system judges its performance according to a given metric to be optimized
• The metric does not refer to any specific input-to-output mapping
• The system tries out possible design solutions, makes mistakes, and tries to learn from its mistakes
• The number of possible examples is very large, possibly infinite, and not known a priori
ML Techniques: Classification
– Off-line: in simulation; the learned/evolved solution is downloaded onto real hardware when certain criteria are met
– Hybrid: most of the time in simulation (e.g., 90%), last period (e.g., 10%) of the process on real hardware
– On-line: from the beginning on real hardware (no simulation); more or less rapid depending on the algorithm
ML Techniques: Classification
ML algorithms sometimes require fairly substantial computational resources (in particular multi-agent search algorithms); therefore a further classification is:
– On-board: the machine-learning algorithm runs on the system to be learned or evolved (no external unit)
– Off-board: the machine-learning algorithm runs off-board and the system to be learned or evolved just serves as the phenotypical, embodied implementation of a candidate solution
Selected Unsupervised ML Techniques Robust to Noisy Fitness/Reinforcement Functions
• Evolutionary computation
– Genetic Algorithms (GA) – today
– Genetic Programming (GP)
– Evolutionary Strategies (ES)
– Particle Swarm Optimization (PSO) – W9
• Learning
– In-Line Adaptive Learning – today
– Reinforcement Learning – W11
GA: Coding & Decoding
phenotype → (coding) → genotype (chromosome) → (decoding) → phenotype
• phenotype: usually consists of real numbers
• chromosome = string of genotypical segments, i.e., genes G1 G2 G3 G4 … Gn, where Gi = gene = binary or real number
Coding: real to real, or binary to real via Gray code (minimizes nonlinear jumps between phenotype and genotype); see the sketch below
Decoding: inverse operation
Rem:
• Artificial evolution: usually one-to-one mapping between phenotypic and genotypic space
• Natural evolution: 1 gene codes for several functions, 1 function coded by several genes.
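The following is a minimal Python sketch of the binary-to-real coding/decoding via Gray code mentioned above; the bit width and interval bounds are illustrative assumptions, not values used in the course.

import math

def gray_encode(n: int) -> int:
    # binary integer -> Gray-coded integer
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    # Gray-coded integer -> binary integer
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def decode_gene(bits: int, n_bits: int, lo: float, hi: float) -> float:
    # map an n_bits Gray-coded gene onto the real interval [lo, hi]
    value = gray_decode(bits)
    return lo + (hi - lo) * value / (2 ** n_bits - 1)

# adjacent binary values differ by a single Gray-coded bit, so a one-bit
# mutation moves the phenotype by at most one quantization step:
assert bin(gray_encode(7) ^ gray_encode(8)).count("1") == 1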
GA: Population & Fitness
• Population: set of solutions tested in one generation; consists of m genetic individuals (e.g., m = 100), each one provided with its own chromosome.
• Generation: new population after the genetic operators have been applied (n = # of generations, e.g., 50, 100, 1000).
• Evaluation span: evaluation period of each genetic individual during one generation.
• Life span: number of generations an individual survives.
• Fitness function: measurement of each genetic individual's efficacy during its life span.
• Population manager: applies the genetic operators to generate the individuals of the new generation from the current one, at the end of the life span of the whole population.
GA: Basic Operators
• Selection: roulette wheel, ranked selection, elitist selection
• Crossover: 1 point, 2 points (e.g., p_crossover = 0.2)
• Mutation: Gk → Gk' (e.g., p_mutation = 0.05)
Note: examples for fixed-length chromosomes!
Generation Loop
Population manager: selection → crossover and mutation → population replenishing.
System: decoding (genotype → phenotype) → evaluation of individual candidate solutions → fitness measurement.
Evolutionary Loop: Several Generations
Start → initialize population → generation loop → end criterion met? If no, repeat the generation loop; if yes, end.
Ex. of end criteria:
• # of generations
• best solution performance
• …
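Below is a hedged Python sketch of the generation and evolutionary loops just described (roulette-wheel selection, 1-point crossover, per-gene mutation on fixed-length real-valued chromosomes). The fitness function and parameter values are placeholders; in the course's setting the evaluation is done on the embodied system.

import random

M, N_GENES, N_GENERATIONS = 100, 9, 50       # m individuals, genes per chromosome, n generations
P_CROSSOVER, P_MUTATION = 0.2, 0.05

def fitness(chromosome):                      # placeholder fitness: replace with the
    return -sum(g * g for g in chromosome)    # evaluation of the candidate solution

def roulette(population, scores):
    offset = min(scores)                      # shift so all weights are positive
    weights = [s - offset + 1e-9 for s in scores]
    return random.choices(population, weights=weights, k=1)[0]

population = [[random.uniform(-1, 1) for _ in range(N_GENES)] for _ in range(M)]
for generation in range(N_GENERATIONS):
    scores = [fitness(ind) for ind in population]     # evaluation span
    new_population = []
    while len(new_population) < M:                    # population replenishing
        p1, p2 = roulette(population, scores), roulette(population, scores)
        child = list(p1)
        if random.random() < P_CROSSOVER:             # 1-point crossover
            cut = random.randrange(1, N_GENES)
            child = p1[:cut] + p2[cut:]
        child = [g + random.gauss(0, 0.1) if random.random() < P_MUTATION else g
                 for g in child]                      # mutation
        new_population.append(child)
    population = new_population
print(max(fitness(ind) for ind in population))        # best solution performance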
Examples of GA Applications
• Mechanical design
• Single robot experiments
An Example of Mechanical Design using GA
Khepera external powering module
• External power in each position and orientation
• Power floor + redesigned basic module endowed with electrical contacts
Powering Module: Problem Encoding
• Mixed approach: engineering intuition and GA
• Engineer: bands, minimal number of contacts (≥ 4), to limit the search space
• Genotype (9 reals): encoding of w_cond and the (r, ϕ) coordinates of each contact
• Fitness: fraction of positions (translation and rotation) powered
• Simulated mechanical vibrations (Gaussian noise) of the contacts for evolving robust solutions
• m = 100, n = 50, p_c = 0.2, p_m = 0.01
• The evolved solution is symmetric; it can be demonstrated geometrically, a posteriori, that it is optimal
Evolving a Neural Controller
[Diagram: 8 proximity sensor inputs S1–S8 (Ij) feeding 2 motor neurons M1, M2 through excitatory/inhibitory connections with synaptic weights wij; each neuron Ni has a sigmoid transfer function f(x).]

x_i = Σ_{j=1}^{m} w_ij I_j + I_0,   O_i = f(x_i),   f(x) = 2 / (1 + e^{-x}) − 1
Note: In our case we evolve the synaptic weights, but Hebbian rules for dynamic change of the weights, transfer-function parameters, etc. can also be evolved (see Floreano course).
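A minimal Python sketch of the controller's forward pass under the equations above; the 8-sensor/2-motor layout and the random weights are illustrative assumptions.

import math, random

def f(x):                                   # sigmoid transfer function mapped to (-1, 1)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def motor_outputs(sensors, weights, bias):
    # sensors: I_1..I_m; weights[i][j] = w_ij of motor neuron i; bias[i]: the I_0 term
    return [f(sum(w * s for w, s in zip(weights[i], sensors)) + bias[i])
            for i in range(len(weights))]

sensors = [random.random() for _ in range(8)]                             # S1..S8 readings
weights = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(2)]   # evolved genome
bias = [0.0, 0.0]
left, right = motor_outputs(sensors, weights, bias)                       # M1, M2 commands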
Evolving Obstacle Avoidance (Floreano and Mondada 1996)
Defining performance (fitness function):

Φ = V (1 − Δv) (1 − i)

• V = mean speed of the wheels, 0 ≤ V ≤ 1
• Δv = absolute value of the algebraic difference between the wheel speeds, 0 ≤ Δv ≤ 1
• i = activation value of the sensor with the highest activity, 0 ≤ i ≤ 1
Note: fitness is accumulated during the evaluation span and normalized over the number of control loops (actions).
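A small Python sketch of how this fitness could be accumulated over an evaluation span; the normalization of wheel speeds and sensor activations to [0, 1] is assumed to be done by the caller.

def fitness_term(v_left, v_right, proximity):
    # v_left, v_right in [0, 1]; proximity = per-sensor activations in [0, 1]
    V = (v_left + v_right) / 2.0          # mean wheel speed
    dv = abs(v_left - v_right)            # absolute wheel-speed difference
    i = max(proximity)                    # most active proximity sensor
    return V * (1.0 - dv) * (1.0 - i)

def evaluate(control_loop, n_steps):
    # control_loop() -> (v_left, v_right, proximity) at each control step
    total = sum(fitness_term(*control_loop()) for _ in range(n_steps))
    return total / n_steps                # normalized over the number of control loops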
Evolving Robot Controllers
Note: the controller architecture can be of any type, but a GA is worth using when the number of parameters to be tuned is large.
Evolving Obstacle Avoidance
[Figures: evolved path; fitness evolution over generations.]
Evolved Obstacle Avoidance Behavior
• Generation 100
• Direction of motion NOT encoded in the fitness function: the GA automatically discovers the asymmetry in the sensory system configuration (6 proximity sensors in the front and 2 in the back)
Evolving Homing Behavior (Floreano and Mondada 1996)
[Figure: set-up and robot's sensors.]
• Fitness function: Φ = V (1 − i)
• V = mean speed of the wheels, 0 ≤ V ≤ 1
• i = activation value of the sensor with the highest activity, 0 ≤ i ≤ 1

Evolving Homing Behavior – Controller
• Fitness accumulated during the life span, normalized over the maximal number (150) of control loops (actions).
• No explicit expression of battery level/duration in the fitness function (implicit).
• Chromosome length: 102 parameters (real-to-real encoding).
• Generations: 240; 10 days of embedded evolution on the Khepera.
Evolving Homing Behavior
[Figures: fitness evolution; evolution of # control loops per evaluation span; battery recharging vs. motion patterns (battery energy, left and right wheel activations).]
Evolved behavior: reach the nest → battery recharging → turn on the spot → out of the nest.
Evolved Homing Behavior
Moving Beyond Controller-Only Evolution
• Evidence: nature evolves hardware and software at the same time …
• Faithful, realistic simulators make it possible to explore design solutions which encompass off-line co-evolution (co-design) of control and morphological characteristics (body shape, number of sensors, placement of sensors, etc.)
• GAs are powerful enough for this job and the methodology remains the same; only the encoding changes
Evolving Sensory Systems (Zhang, Antonsson, Martinoli 2002)
• Realistic simulator: Webots
• Evolution of:
– number of sensors
– cone of view
– range
– sensor placement
• Sensory system cost proportional to # of sensors, range, cone of view
• Fitness: maximal coverage at minimal cost
Dealing with Variable-Length Chromosomes
[Figure: crossover of variable-length chromosomes — parents and children, with the zero line and crossover line marked.]
• Evolution of the number of sensors -> variable-length chromosome
• Additional operators: insertion, deletion
• Crossover: extremely disruptive operator
• Idea: take advantage of the problem's genotype-to-phenotype mapping
• In our case: convex, closed structure on the vehicle -> fold the chromosome in the same way -> mapping maintained!
Evolving Control and Robot Morphology (Lipson and Pollack, 2000)
http://www.mae.cornell.edu/ccsl/research/golem/index.html
• Arbitrary recurrent ANN
• Passive and active (linear actuators) links
• Fitness function: net distance traveled by the centre of mass in a fixed duration
Example of evolutionary sequence:

Examples of Evolved Machines
Problem: the simulator is not realistic enough (performance is higher in simulation because friction is not simulated well enough; e.g., for the arrow configuration, 59.6 cm vs. 22.5 cm)
In-Line Adaptive Learning

Big Artillery such as a GA Is Not Always the Most Appropriate Solution…
• Simple individual learning rules combined with collective flexibility can achieve extremely interesting results
• Simplicity and low computational cost mean they can be embedded on simple, real individuals
• We will see similar learning rules in combination with threshold-based algorithms, foraging strategies, …

In-Line Adaptive Learning (Li, Martinoli, Abu-Mostafa, 2001)
Key ideas:
• Origin: adaptive bang-bang control
• 2-step algorithm
• Learning step adapted according to fixed rules
• Randomness when stuck
• Can be combined with a low-pass filter for noise reduction
[Diagram: GTP = parameter to be learned; ∆d = learning step; d = direction; performance signal.]
In-Line Adaptive Learning
Differences with gradient descent methods:
• Fixed rules for calculating the step increase/decrease -> limited descent speed -> no gradient computation -> more conservative but more stable
• Randomness for getting out of local minima (no momentum)
• Underlying low-pass filter is part of the algorithm
Differences with Reinforcement Learning:
• No learning history considered (only the previous step)
Differences with basic In-Line Learning:
• Adaptive step -> faster and more stable at convergence
Sample Results
Test-bed: multi-agent system, 1 parameter (GTP)
[Plots: stick-pulling rate (1/min) vs. initial gripping time parameter (0–600 s) for 2–6 robots; learned solutions (mean + std dev) compared with systematic search (mean only), for a high filter cut-off frequency (short averaging window) and a low filter cut-off frequency (long averaging window).]
Moving from Single-Unit to Multiple-Unit Systems

Evolving Controllers
• Collective: fitness becomes noisy due to partial perception and independent parallel actions

Credit Assignment Problem
With limited communication, no communication at all, or partial perception:
Example: Evolving the Sensory System of one Vehicle within a Traffic System
(Zhang, Antonsson, Martinoli 2002)
Fitness Function

f = max{ (1/V) Σ_{i=1}^{V} δ_i − a·n ; 0 }

a = cost factor for sensors
n = number of sensors used
δi = 0 if car not detected; 1 if car detected
V = actual number of vehicles in the detection region during a whole evaluation span
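A tiny Python sketch of this fitness, assuming the reconstruction f = max{(1/V)·Σδi − a·n, 0}: the fraction of vehicles detected during the evaluation span, penalized by a per-sensor cost a·n.

def sensory_fitness(detected_flags, n_sensors, a):
    # detected_flags: list of 0/1 flags, one per vehicle that entered the detection region
    V = len(detected_flags)
    if V == 0:
        return 0.0
    coverage = sum(detected_flags) / V        # fraction of vehicles detected
    return max(coverage - a * n_sensors, 0.0)

print(sensory_fitness([1, 1, 0, 1], n_sensors=3, a=0.02))   # 0.75 - 0.06 = 0.69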
Ad Hoc Evolutionary Loop
• Goals:
– Evaluating the actual fitness
– Eliminating lucky individuals as quickly as possible
• Solution:
– Dynamic population size
– Individual life span = several evaluation spans over generations
Co-Learning and Co-Evolution

Co-Learning or Co-Evolving Competitive Behavior
No credit assignment problem: individual fitness!
Example: co-evolution of the prey's and predator's controllers
Φ_prey = T    Φ_predator = 1 − T
T = normalized survival time of the prey (before it is caught)

Prey-Predator Experiment (Nolfi and Floreano, 1998)

Prey-Predator Experiment
[Figures: fitness co-evolution in simulation; fitness co-evolution in real robots.]
Co-Learning or Co-Evolving Collaborative Behavior
Key question: what metrics (group, individual, combination) should we measure for improving the individual controllers?

Ex.: Co-Evolving Collaborative Behavior
Three orthogonal axes to consider (extreme or balanced solutions are possible):
• Individual vs. group fitness
• Private (no sharing of parameters) vs. public (parameter sharing) policies
• Homogeneous vs. heterogeneous systems
An Introduction to Probabilistic Modeling Techniques
Why Modelling?
• Current target platform: self-organized robotic systems
• Individual robot control is deterministic, BUT mobility + robot-to-environment and robot-to-robot interactions -> overall individual behavior is highly stochastic
• However, we do not care about the microscopic interactions (part of the self-organization process) as long as the swarm behavior/performance is statistically predictable -> the only way to get swarm-based (industrial) applications!
• This way of proceeding is revolutionary in engineering (while traditional in science – biology, physics, chemistry). E.g., classical roboticists tend to eliminate all forms of inaccuracy and limitation in their initial models and then relax the original assumptions in order to come closer to reality. In our models, we consider noise and inaccuracies from the beginning (even more important with miniaturization), very much as biologists, physicists, and chemists do.
Why Modelling?
• Understanding the role of system parameters in the team performance: control, interaction geometry, body morphology, sensors, communication range, number of individuals
• Generalizing: other hardware platforms, other experimental constraints, other classes of experiments
• Prediction and fast evaluation -> design! Before developing ad hoc embodied simulators or real embedded systems (10^4–10^5 times faster than embodied simulations)
[Diagram: probabilistic model vs. embodied simulator vs. real robots.]
An Incremental Probabilistic Modeling Methodology for Nonspatial Metrics

Experimental Levels (from abstracted interactions / swarm behavior to detailed interactions / individual behavior)
• Simulation
– Macroscopic models
– Microscopic models
– Non-embodied simulations
– Embodied, sensor-based simulations
• Real robots
– Actively controlled lab environments
– Passively controlled lab environments
– Real environments
Methodology Overview
A priori known features of the robots and the environment, and the desired swarm performance metric(s) → characterization of the microscopic interactions (robot-to-robot, robot-to-environment) → prediction of the swarm performance / macroscopic system dynamics.
Modeling Assumptions
• Nonspatial metrics for swarm performance
• Environment and controller can be described as Probabilistic FSMs; the state granularity of the description is arbitrarily established by the engineer as a function of the microscopic or macroscopic level of description.
• Both robots and environment are semi-Markovian systems: the system's future state is a function of the current state and of how much time has been spent in this state.
• The mean spatial distribution of the robots is homogeneous, as if they were randomly hopping around the arena (trajectories not considered).
Experimental Validation of Nonspatial Assumption – (1)
[Figures: non-embodied obstacles = detection surfaces; effect of obstacle position and shape.]
Numerical example (mean ± std dev, 3 locations, 100 h simulated time): encountering probabilities across the obstacle geometries (square, rectangular, round, all shapes combined) and another robot are statistically indistinguishable; measured values: 0.31, 0.31 ± 0.03, 0.32 ± 0.02, 0.3 ± 0.03, 0.31 ± 0.04.

Experimental Validation of Nonspatial Assumptions – (2)
[Figure: symmetry of the stick distribution (# sticks per location) vs. default.]
Modeling Methodology: Microscopic Characterization
[Diagram: deterministic robot's flowchart (Start → Search → Obstacle? Y/N → Avoidance) abstracted into a deterministic agent's flowchart and then into a PFSM (Markov chain) with states Ss (search) and Sa (avoidance), transition probability pa, staying probability 1 − pa, and avoidance delay τa.]
Microscopic Modeling Methodology
[Diagram: the robotic system is represented by N PFSMs (N = total # of agents), grouped into castes 1..n (robots R11..R1l, …, Rn1..Rnm, each with states such as Ss, Sa or Se, Sd, Si); the environment is represented by Q PFSMs (Q = total # of objects O11..Oqr, each with states such as Sa, Sb); robot and environment PFSMs are coupled (e.g., through manipulation and sensing).]
Macroscopic Modeling Methodology
[Diagram: one PFSM per robotic caste (caste 1..n) and one PFSM per environment object type (type 1..q), coupled as in the microscopic model.]
• Average quantities
• Central tendency prediction (1 run)
• Continuous quantities: +1 DE (difference equation) per state for all robotic castes and object types (metric/task dependent!)
• −1 DE if substituted with conservation equations (e.g., total # of robots, total # of objects of type q, …)
Model Calibration
• Goal: a calibration method for achieving zero-free-parameter models!
• Two types of parameters:
– Delays
– Encountering probabilities
• Calibration procedure:
– Systematic experiments with 1 real or realistically simulated robot (robot-to-environment interaction)
– Systematic experiments with 2 real or realistically simulated robots (robot-to-robot interaction)
Time Discretization: The Engineering Recipe
Time-discrete vs. time-continuous models:
1. Assess what time resolution is needed for your swarm performance metrics
2. Consider the essence of the unit-to-unit interactions: probabilistic/deterministic, role of asynchronicity, …
3. Choose, whenever possible, the most computationally efficient model: time-discrete is less computationally expensive than emulation of continuity (e.g., Runge-Kutta, etc.), and the prediction accuracy may not be statistically different from continuous models because of other (more important) sources of inaccuracy in the system
Often there is not even a tradeoff: just use the appropriate instrument for the appropriate problem!
Delays
1. Measure all delays of interest in your system, i.e., those which might influence the swarm performance metrics. Note: often "delay states" can simply summarize all you need without getting into the details of what is going on within the state
2. Consider only average values (we might also consider parameter distributions in the future; the modeling methodology does not prevent doing so)
3. For time-discrete systems: choose the time step T as the greatest common factor of all the delays measured (e.g., 3 s obstacle avoidance, 4 s object manipulation, T = 1 s) -> no rounding error; see the sketch below
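A tiny Python sketch of rule 3: T is the greatest common factor of the measured delays, so every delay corresponds to an integer number of iterations.

from math import gcd
from functools import reduce

delays_s = [3, 4]                      # measured delays, in seconds (slide example)
T = reduce(gcd, delays_s)              # T = 1 s -> no rounding error
steps = {d: d // T for d in delays_s}  # {3: 3, 4: 4} iterations per delay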
Encountering Probabilities
• Idea: trajectories and object locations do not count (see assumption) -> 1D Monte Carlo simulation
[Diagram: mapping of the 2D physical space (objects O1, O2 and free space) onto a 1D probability space.]
Encountering Probabilities
1. Measure the geometric probabilities of encountering (detection areas)
2. Calculate the encountering rate ri [s⁻¹] for object i from the geometric probability gi:

ri = gi · Ws · v / As

As = detection area of the smallest object
v = mean robot speed
Ws = robot's detection width for the smallest object (center-to-center)

3. For time-discrete models, calculate the encountering probability pi (per time step) from the encountering rate:

pi = ri · T

NOTE: not exactly the same as in IJRR04 and AR04!
Geometric Probabilities (Detection Areas)
ps, pw, … are functions of the sensor range, the behavior, the robot's and object's size, …: interaction characterization!
Example: stick with detection area As: ps = As/Aarena
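A short Python sketch chaining steps 1–3 with the definitions above; the numerical values (arena radius, detection area and width, speed) are illustrative assumptions only.

import math

A_arena = math.pi * 0.40 ** 2      # arena surface [m^2] (40 cm radius, example value)
A_s     = 0.004                    # detection area of the smallest object [m^2] (assumed)
W_s     = 0.06                     # detection width for the smallest object [m] (assumed)
v       = 0.08                     # mean robot speed [m/s] (assumed)
T       = 1.0                      # time step [s]

g_stick = A_s / A_arena            # step 1: geometric probability of encountering a stick
r_stick = g_stick * W_s * v / A_s  # step 2: encountering rate [1/s]
p_stick = r_stick * T              # step 3: encountering probability per time step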
Example 1: Linear Model – Wandering and Obstacle Avoidance

A Simple Linear Model
Example: search (moving forwards) and obstacle avoidance
© Gilles Caprari, 1999 © Nikolaus Correll, 2004
A Simple Linear Model
[PFSM: Search ↔ Avoidance, with transition probability pa and avoidance duration Ta.]
Linear DEs!

Ns(k+1) = Ns(k) − pa Ns(k) + pa Ns(k−Ta)
Na(k+1) = N0 − Ns(k+1)

Ta = obstacle avoidance duration
pa = probability of encountering an obstacle
Ns = average # of robots in search
Na = average # of robots in obstacle avoidance
N0 = # of robots used in the experiment
k = 0, 1, … (iteration index)
Initial conditions: Ns(k) = Na(k) = 0 for all k < 0; Ns(0) = N0; Na(0) = 0
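A minimal Python sketch iterating these two equations; the parameter values are illustrative. With the given initial conditions, Ns converges toward N0 / (1 + pa·Ta).

N0, p_a, T_a, K = 10, 0.05, 3, 200       # robots, encounter probability, delay, steps (assumed)
Ns = [0.0] * K
Ns[0] = float(N0)                        # Ns(k) = 0 for k < 0, Ns(0) = N0
for k in range(K - 1):
    delayed = Ns[k - T_a] if k - T_a >= 0 else 0.0
    Ns[k + 1] = Ns[k] - p_a * Ns[k] + p_a * delayed
Na = [N0 - n for n in Ns]                # conservation: robots in avoidance
print(Ns[-1], Na[-1])                    # steady state: Ns ≈ N0 / (1 + p_a * T_a)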
A Simple Linear Model
[Plots: group size; micro-to-macro comparison (same robot density, but the wall surface becomes relatively smaller with bigger arenas); typical parameter analysis at steady state.]
Example 2: Nonlinear Model – Collaborative Stick-Pulling

Set-Up
Physical set-up: collaboration via indirect communication
[Photos: proximity sensors, arm elevation sensor, IR reflective band.]
• 2–6 robots
• 4 sticks
• 40 cm radius arena
Systematic Experiments
Real robots and embodied simulation
• [Martinoli and Mondada, ISER, 1995]
• [Ijspeert et al., AR, 2001]

Systematic Experiments' Results
• Real robots (3 runs) and embodied simulations (10 runs)
• System bifurcation as a function of #robots/#sticks: Nrobots > Nsticks vs. Nrobots ≤ Nsticks
Modeling Methodology: Microscopic Characterization
[Diagram: deterministic robot's flowchart → probabilistic agent's flowchart → Markov chain (PFSM).]
Geometric Probabilities

ps = As/Aa
pr = Ar/Aa · (N0 − 1)
pw = Aw/Aa
pg2 = Rg · pg1

Aa = surface of the whole arena
Simplified Macroscopic Model
Ti, Ta, Td, Tc << Tg → Ti = Ta = Td = Tc = 0
Simplified Macroscopic Model
Nonlinear DEs, coupled through the unit-to-unit interaction (in this case through the stick)!
[PFSM: Search ↔ Grip.]

Ns(k+1) = Ns(k) − pg1[M0 − Ng(k)]Ns(k) + pg2 Ng(k) Ns(k)  (successful)  + pg1[M0 − Ng(k−Tg)] Γ(k;0) Ns(k−Tg)  (unsuccessful)
Ng(k+1) = N0 − Ns(k+1)
C(k) = pg2 Ns(k) Ng(k)
Ct = (1/Te) Σ_{k=0}^{Te} C(k)

Ns = average # of robots in searching mode
Ng = average # of robots in gripping mode
N0 = # of robots used in the experiment
M0 = # of sticks used in the experiment
Γ = fraction of robots that abandon pulling
Te = maximal number of iterations
k = 0, 1, … Te (iteration index)
C = # of collaborations
Ct = mean collaboration rate over Te
Simplified Macroscopic Model
Fraction of robots which abandon pulling:

Γ(k; TSL) = Π_{j = k−Tg+TSL}^{k−TSL} [1 − pg2 Ns(j)]

with TSL = shift-left duration [iterations] (= 0 in this model)

Initial conditions and causality:
Ns(0) = N0, Ng(0) = 0
Ns(k) = Ng(k) = 0 for all k < 0
Results – Simplified Model
• Microscopic (100 runs) and macroscopic models overlap
• Only qualitative agreement with embodied/real-robot results
• Discrepancies micro-macro: small numbers and combinatorial issues
• 16 robots, 16 sticks, Ra = 80 cm; 4 robots, 4 sticks, Ra = 40 cm
Steady-State Analysis
• Steady-state analysis → Tg^opt can be computed analytically, in closed form, as a function of pg1, N0, Rg, and β, valid for β ≤ βc = 1 + 2/Rg
with β = N0/M0 = robots-to-sticks ratio
• No Tg^opt exists for β > βc
• Depending on Rg, 1 < β < βc is possible, i.e., an optimal Tg also exists with more robots than sticks
• [Lerman et al., Alife J., 2001], [Martinoli et al., IJRR, 2004]
Steady-State Analysis
Example: R̃g = Rg/10 (collaboration very difficult); 20 robots and 16 sticks (optimal Tg exists)
Full Model
For instance, for the average number of robots in searching mode:

Ns(k+1) = Ns(k) − [∆g1(k) + ∆g2(k) + pw + pR Ns(k)] Ns(k) + (delayed terms returning robots to search after wall avoidance, robot avoidance, successful collaboration, and unsuccessful gripping)

with time-varying coefficients (nonlinear coupling):

∆g1(k) = pg1 [M0 − Ng(k) − Nd(k)]
∆g2(k) = pg2 Ng(k)
Γ(k; TSL) = Π_{j = k−Tg+TSL}^{k−TSL} [1 − pg2 Ns(j)]

• 6 states: 5 DEs + 1 conservation equation
• Ti, Ta, Td, Tc ≠ 0; Txyz = Tx + Ty + Tz
• TSL = shift-left duration
• [Martinoli et al., IJRR, 2004]
Results (Standard Arena)
Webots (10 runs), microscopic (100 runs), macroscopic model (1 run)
Results: 4 x #Sticks, #Robots and Arena Area
Webots (10 runs), microscopic (100 runs), macroscopic model (1 run)
Example 3: Nonlinear Model – Collective Aggregation

To be Discussed in Week 10: Aggregation Process
Physical set-up:
• 1-10 robots
• 20-40 seeds
• 80x80 cm and 114x114 cm arenas
• With and without activity regulation
Combining Modeling and Machine-Learning Techniques

Key Idea
• Microscopic modeling (or, in general, any form of abstraction) allows us to consider certain parameters and leave out others
• Models act as a "parameter filter"; they allow for fast simulation and enhanced understanding
• Macroscopic models do not allow the study of specialization unless precise castes can be identified a priori (caste = set of homogeneous individuals)
• In this case study we identified a parameter crucial for our metrics and wanted to understand the best distribution of parameter values over the different individuals (learning as an automatic way of discovering diversity/specialization)
In-Line Adaptive Learning (Li, Martinoli, Abu-Mostafa, 2001)
• GTP: Gripping Time Parameter (parameter to be learned)
• ∆d: learning step
• d: direction
• Performance/reinforcement signal: collaboration rate, i.e., # of sticks pulled per minute
Co-Learning Collaborative Behavior
Three orthogonal axes to consider (extreme or balanced solutions are possible):
1. Individual vs. group fitness
2. Private (no sharing of parameters) vs. public (parameter sharing) policies
3. Homogeneous vs. heterogeneous systems
Results #1 – Public & Group & Homogeneous Policy
[Plots: stick-pulling rate (1/min) vs. initial gripping time parameter (0–600 s) for 2–6 robots; learned (mean + std dev) vs. systematic (mean only), for a high filter cut-off frequency (short averaging window) and a low filter cut-off frequency (long averaging window).]
Note: 1 parameter for the whole swarm!
Diversity and Specialization
Key question: does team diversity enhance performance? I.e., can individuals become specialized?
[Figures: performance ratio between a 2-caste and a homogeneous system (microscopic model + Webots, systematic); 4 robots, one per color, microscopic model + learning.]
Results #1, 2, 3 – All Three Learning Policies and Systematic
[Plot: systematic search vs. learning with the three policies — public/group/homogeneous, private/group/heterogeneous, private/individual/heterogeneous — showing the importance of specialization.]
• [Li, Martinoli, Abu-Mostafa, 2002] Note: high filter cut-off frequency!
Impact of Diversity on Performance (Li, Martinoli, Abu-Mostafa, 2004)
[Plot: stick-pulling rate ratio (≈1.0–1.3) vs. number of robots (2–6) for 2-caste/Global, Heterogeneous/Global, and Heterogeneous/Local policies.]
Performance ratio between heterogeneous (full and 2-caste) and homogeneous groups AFTER learning
Measuring Diversity and Specialization (Li, Martinoli, Abu-Mostafa, 2004)
[Plots: diversity (0–6) and specialization (0–2.5) vs. number of robots (2–6) for 2-caste/Global, Heterogeneous/Global, and Heterogeneous/Local policies.]
Diversity: entropy measurement
Specialization: diversity in order to fit the task
Additional Literature – Week 6
Books
• Ross S. M., Introduction to Probability Models, Academic Press, San Diego, CA, USA, 1997.
Papers
• Ijspeert A. J., Martinoli A., Billard A., and Gambardella L. M., "Collaboration through the Exploitation of Local Interactions in Autonomous Collective Robotics: The Stick Pulling Experiment". Autonomous Robots, 11(2):149–171, 2001.
• Lerman K., Martinoli A., and Galstyan A., "A Review of Probabilistic Macroscopic Models for Swarm Robotic Systems". Proc. of the Workshop on Swarm Robotics, 8th Int. Conference on the Simulation of Adaptive Behavior, Los Angeles, July 2004. Lecture Notes in Computer Science (2004), Sahin E. and Spears W., editors, to appear.
• Lerman K. and Galstyan A., "Mathematical Model of Foraging in a Group of Robots: Effect of Interference". Autonomous Robots, 13(2):127–141, 2002.