robot swarms as ensembles of cooperating components - matthias holzl
DESCRIPTION
Matthias Holzl's Case Study from AWASS 2013TRANSCRIPT
![Page 1: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/1.jpg)
Robot Swarms as Ensembles ofCooperating Components
Matthias HölzlWith contributions from Martin Wirsing, Annabelle Klarl
AWASSLucca, June 24, 2013
www.ascens-ist.eu
![Page 2: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/2.jpg)
The Task
Robots cleaning an exhibition area
Matthias Hölzl 2
![Page 3: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/3.jpg)
marXbot
Miniature mobile robot developed by EPFLRough-terrain mobilityRobots can dock to other robotsMany sensors
Proximity sensorsGyroscope3D accelerometerRFID readerCameras. . .
ARM-based Linux systemGripper for picking up items
Matthias Hölzl 3
![Page 4: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/4.jpg)
Swarm Robotics
Matthias Hölzl 4
![Page 5: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/5.jpg)
Problems
Noise, sensor resolutionExtracting information from sensor dataUnforeseen situationsUncertainty about the environmentPerforming complex actions when intermediateresults are uncertain. . .
Matthias Hölzl 5
![Page 6: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/6.jpg)
Action Logics
Logics that can represent change over timeProbabilistic behavior can be modeled (but is cumbersome)
Matthias Hölzl 6
![Page 7: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/7.jpg)
Markov Decision Processes
pos = (x,y)
pos = (x,y+1) pos = (x+1,y+1)
pos = (x + 1,y)e / ...
s, n / ...
s / 0.9 / -0.1e,w / 0.025 / -0.1
w / ...s,n / ...
e / ...s, n / ...
w/ ...s,n / ...
n / 0.9 / -0.1e, w / 0.025 / -0.1
s,n,w,e / 0.05 / -0.1
Matthias Hölzl 7
![Page 8: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/8.jpg)
Markov Decision Processes
watchTV
goToClub
DecideActivity
In Club
Watching TV
Dancing Alone
Drinking
Dancing With
Partner
dance p = 0.5
dance p = 0.5
drinkBeer
Oh, oh
Oh, no
flirt
p = 0.05
p = 0.95
Matthias Hölzl 8
![Page 9: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/9.jpg)
MDPs: Strategies
watchTV
goToClub
DecideActivity
In Club
Watching TV
Dancing Alone
Drinking
Dancing With
Partner
dance p = 0.5
dance p = 0.5
drinkBeer
Oh, oh
Oh, no
flirt
p = 0.05
p = 0.95
State TV CDB CDFDA watchTV goToClub goToClubIC drinkBeer drinkBeer danceDWP flirt flirt flirtUtility 0.1 −0.05(∗) −1.975(∗)
(∗) 0.05+ (−0.1)(∗∗) 0.05+ (0.5 × 0.2) + 0.5 × (0.25+ (0.05 × 5) + (0.95 × −5))
Matthias Hölzl 9
![Page 10: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/10.jpg)
Reinforcement Learning
General idea:Figure out the expectedvalue of each actionin each statePick the action withthe highest expectedvalue (most of thetime)Update the expectationsaccording to the actualrewards
Matthias Hölzl 10
![Page 11: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/11.jpg)
How well does this work?
Rather well for small problemsBut: state explosion
Matthias Hölzl 11
![Page 12: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/12.jpg)
Solutions
DecompositionHierarchyPartial programs
Matthias Hölzl 12
![Page 13: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/13.jpg)
POEM
Action languageFirst-order reasoningHierarchical reinforcement learning
Learns completions for partial programsConcurrency
Reflection / meta-object protocols· · ·
Matthias Hölzl 13
![Page 14: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/14.jpg)
Iliad: A POEM Implementation
Common Lisp-based programming languageFull first-order reasoning
Operations on logical theories: U(C)NA, domain closure, . . .Resolution, hyperresolution, DPLL, etc.Conditional answer extractionProcedural attachment, constraint solving
Hierarchical reinforcement learningBased on Concurrent ALispPartial programsThreadwise and temporal state abstractionHierarchically Optimal Yet Local (HOLY) Q-learningHierarchically Optimal Recursively Decomposed (HORD) Q-learning
Matthias Hölzl 14
![Page 15: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/15.jpg)
Planned Contents
Introduction to CALisp/PoemSimple TD-learning: banditsFlat reinforcement learning: navigationHierarchical reinforcement learning:collecting items individuallyThreadwise decomposition for hierarchicalreinforcement learning: learning collaboratively
Matthias Hölzl 15
![Page 16: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/16.jpg)
n-armed Bandits
S
search / 1.0 / N(0.1, 1.0)
coll-known / 1.0 / N(0.3, 3.0)
Choice between n actionsReward depends probabilisticly onthe action choiceNo long-term consequencesSimplest form of TD-learning
Matthias Hölzl 16
![Page 17: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/17.jpg)
Flat Learning
XXXXXXXXXXXXXXXXXXXXXXXX Target: (0 0)XXTT XX XXXX XXXXXXXXXX XXXX Choices: (N E S W)XXXXXX XX XX XX Q-values:XX XX XXXXXXXX XX #((N (Q -1.8))XX XXXXXX XX XX (E (Q -1.8))XX XX XXXXXXXX (S (Q -2.25))XXXXXXXXXX XX XX (W (Q 2.76)))XX XX XXXX XX Recommended choice is WXX XX XX XX XX XXXX XX RR XXXXXXXXXXXXXXXXXXXXXXXXXX
Matthias Hölzl 17
![Page 18: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/18.jpg)
Flat Learning
(defun simple-robot ()(call (nav (target-loc (robot-env)))))
(defun nav (loc)(until (equal (robot-loc) loc)
(with-choice navigate-choice (dir ’(N E S W))(action navigate-move dir))))
Matthias Hölzl 18
![Page 19: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/19.jpg)
Hierarchical Learning
(defun waste-removal ()(loop
(choose choose-waste-removal-action(action sleep ’SLEEP)(call (pickup-waste))(call (drop-waste)))))
(defun pickup-waste ()(call (nav (waste-source)))(action pickup-waste ’PICKUP))
Matthias Hölzl 19
![Page 20: Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl](https://reader033.vdocuments.us/reader033/viewer/2022051818/54b3bd324a7959dd7b8b4579/html5/thumbnails/20.jpg)
Multi-Robot Learning
XXXXXXXXXXXXXXXXXXXXXXXXXXTT XX RR XXXX XX XXXXXXXX XX XXXX XX RW RR XXXX XXXX RR WW XXXX XXXX RR XXXX XXXX XXXXXXXXXXXXXXXXXXXXXXXXXX
Matthias Hölzl 20