Artificial Intelligence in Robotics
Lecture 11: Game Theory in Robotics
Viliam Lisý
Artificial Intelligence Center
Department of Computer Science, Faculty of Electrical Eng.
Czech Technical University in Prague
Game Theory
Mathematical framework studying the strategies of players in
situations where the outcomes of their actions depend critically
on the actions performed by the other players.
Robotic GT Applications
[Image by Yuxuan Wang via Flickr]
Adversarial vs. Stochastic Environment
Deterministic environment
The agent can predict the next state of the environment exactly
Stochastic environment
The next state of the environment comes from a known distribution
Adversarial environment
The next state of the environment comes from an unknown
(possibly nonstationary) distribution
Game theory optimizes behavior in adversarial environments
GT and Robust Optimization
It is sometimes useful to model unknown environmental
variables as if they were chosen by an adversary:
the position of the robot is the worst one consistent with the observations
the planned action depletes the battery as much as it possibly can
the lost person in the woods moves to avoid detection
GT can thus be used for robust optimization even when no real
adversary is present, as the toy example below illustrates.
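A minimal sketch of this maximin view in Python (the plans, the environment hypotheses, and all payoff numbers are made up for illustration):

```python
import numpy as np

# Rows are the robot's candidate plans, columns are environment
# hypotheses (e.g., robot positions consistent with observations)
# chosen by the modeled adversary. All numbers are illustrative.
payoff = np.array([[5.0, 1.0, 2.0],   # plan A
                   [3.0, 3.0, 3.0],   # plan B
                   [6.0, 0.0, 4.0]])  # plan C

worst_case = payoff.min(axis=1)       # the adversary picks the worst column
best = int(np.argmax(worst_case))     # the robot picks the best worst case
print("robust plan:", "ABC"[best], "| guaranteed payoff:", worst_case[best])
```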
Normal form game
N is the set of players
A_i is the set of actions (pure strategies) of player i ∈ N
u_i : ×_{j∈N} A_j → ℝ is the immediate payoff for player i ∈ N
Mixed strategy
σ_i ∈ Δ(A_i) is a probability distribution over actions
we naturally extend u_i to mixed strategies as the expected value
Best response
The best response of player i to the strategy profile of the other players σ_{−i} is
BR_i(σ_{−i}) = argmax_{σ_i ∈ Δ(A_i)} u_i(σ_i, σ_{−i})
Nash equilibrium
Strategy profile σ* is a NE iff ∀i ∈ N : σ*_i ∈ BR_i(σ*_{−i})
Normal form game
0-sum game
Pure strategy, mixed strategy, Nash equilibrium, game value
Player 1 is the row player (the maximizer); Player 2 is the column player (the minimizer).
Entries are the payoffs of the row player:

      r     p     s
R    0.5    0     1
P     1    0.5    0
S     0     1    0.5
Computing NE
LP for computing Nash equilibrium of 0-sum normal form game
max_{σ1, v}  v
s.t.  Σ_{a1 ∈ A1} σ1(a1) · u(a1, a2) ≥ v    ∀a2 ∈ A2
      Σ_{a1 ∈ A1} σ1(a1) = 1
      σ1(a1) ≥ 0    ∀a1 ∈ A1
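A minimal sketch of this LP in Python using scipy.optimize.linprog, applied to the matrix game from the previous slide (the stacked variable layout [σ1; v] is a choice made here):

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix of the row player (maximizer) from the previous slide.
U = np.array([[0.5, 0.0, 1.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])
n, m = U.shape

# Decision variables: [sigma_1(a_1) for each row action] + [v].
c = np.zeros(n + 1)
c[-1] = -1.0                                # linprog minimizes, so minimize -v
# For every column a_2: v - sum_{a_1} sigma_1(a_1) U[a_1, a_2] <= 0.
A_ub = np.hstack([-U.T, np.ones((m, 1))])
b_ub = np.zeros(m)
# Probabilities sum to one.
A_eq = np.zeros((1, n + 1)); A_eq[0, :n] = 1.0
b_eq = np.array([1.0])
bounds = [(0, None)] * n + [(None, None)]   # sigma_1 >= 0, v is free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("equilibrium strategy:", res.x[:n])   # approx. [1/3, 1/3, 1/3]
print("game value:", res.x[-1])             # 0.5
```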
Pursuit-Evasion Games
Task Taxonomy
Robin, C., & Lacroix, S. (2016). Multi-robot target detection and tracking:
taxonomy and survey. Autonomous Robots, 40(4), 729β760.
Problem Parameters
Chung, T. H., Hollinger, G. A., & Isler, V. (2011). Search and pursuit-evasion in
mobile robotics: A survey. Autonomous Robots, 31(4), 299–316.
PERFECT INFORMATION CAPTURE
arena with radius r
the man and the lion have unit speed
alternating moves
can the lion always capture the man?
Algorithm for the lion
start from the center
stay on the radius that passes through the man
move as close to the man as possible
Analysis
capture time with discrete steps is O(r²) [Sgall 2001]
no capture in continuous time
the lion can get to distance ε in time O(r log(r/ε)) [Alonso et al. 1992]
a single lion can capture the man in any polygon [Isler et al. 2005]
[Figure: lion (L) and man (M) in a circular arena]
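A small simulation sketch of this lion strategy (discrete alternating unit steps; the man's boundary-running evasion policy and the unit capture radius are assumptions made for illustration):

```python
import numpy as np

R = 20.0                                   # arena radius
man = np.array([0.9 * R, 0.0])
lion = np.array([0.0, 0.0])                # lion starts from the center

def man_step(man):
    # Assumed evader policy: a tangential unit step along the current
    # circle, clipped back into the arena.
    r = max(np.linalg.norm(man), 1e-9)
    new = man + np.array([-man[1], man[0]]) / r
    return new * min(1.0, R / np.linalg.norm(new))

def lion_step(lion, man):
    # The slide's strategy: get onto the radius that passes through
    # the man, then move as close to the man as possible.
    ray = man / max(np.linalg.norm(man), 1e-9)
    foot = ray * np.dot(lion, ray)         # nearest point on the man's radius
    d = np.linalg.norm(foot - lion)
    if d >= 1.0:
        return lion + (foot - lion) / d    # full unit step toward the radius
    along = np.sqrt(1.0 - d * d)           # leftover budget of the unit step
    return foot + ray * min(along, np.linalg.norm(man - foot))

for t in range(200000):
    man = man_step(man)
    lion = lion_step(lion, man)
    if np.linalg.norm(man - lion) <= 1.0:  # within one step of the man
        print(f"captured after {t + 1} steps")  # on the order of R^2 steps
        break
```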
Modelling movement constraints
Homicidal chauffeur game [Isaacs 1951]
unconstrained space
the pedestrian is slow, but highly maneuverable
the car is faster, but less maneuverable (Dubins car)
can the car run over the pedestrian?
ẋ_P = u_P, ‖u_P‖ ≤ 1;  ẋ_C = (v cos θ, v sin θ);  θ̇ = u_C, u_C ∈ {−1, 0, 1}
Differential games
ẋ = f(x, u1(t), u2(t)),  L_i(u1, u2) = ∫_{t=0}^{T} r_i(x(t), u1(t), u2(t)) dt
analytic solution of a partial differential equation (gets intractable quickly)
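A forward-Euler sketch of these kinematics (the speed v = 2, both players' policies, the time step, and the capture radius are illustrative assumptions, not the game-theoretic solution):

```python
import numpy as np

dt, v = 0.01, 2.0                           # time step; car speed v > 1
x_P = np.array([3.0, 0.0])                  # pedestrian position
x_C, theta = np.array([0.0, 0.0]), 0.0      # car position and heading

for t in range(100000):
    # Pedestrian: unit-speed control u_P, here simply fleeing the car.
    away = x_P - x_C
    x_P = x_P + dt * away / np.linalg.norm(away)
    # Car: bang-bang steering u_C in {-1, 0, 1} toward the pedestrian.
    d = x_P - x_C
    bearing = np.arctan2(d[1], d[0]) - theta
    u_C = np.sign(np.arctan2(np.sin(bearing), np.cos(bearing)))
    x_C = x_C + dt * v * np.array([np.cos(theta), np.sin(theta)])
    theta += dt * u_C                       # theta_dot = u_C, as above
    if np.linalg.norm(x_P - x_C) < 0.1:
        print(f"pedestrian run over at t = {t * dt:.2f}")
        break
```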
Incremental Sampling-based Method
S. Karaman, E. Frazzoli: Incremental Sampling-Based
Algorithms for a Class of Pursuit-Evasion Games, 2011.
1 evader, several pursuers
Open-loop evader strategy (for simplicity)
Stackelberg equilibrium
the evader picks and announces her trajectory
the pursuers select their trajectories afterwards
Heavily based on RRT* algorithm
[Image by MIT OpenCourseWare]
Incremental Sampling-based Method
Algorithm
Initialize the evader's and the pursuers' trees T_e and T_p
For i = 1 to N do:
  n_e,new ← Grow(T_e)
  if {n_p ∈ T_p : dist(n_e,new, n_p) ≤ ε_n and time(n_p) ≤ time(n_e,new)} ≠ ∅, then delete n_e,new
  n_p,new ← Grow(T_p)
  C = {n_e ∈ T_e : dist(n_e, n_p,new) ≤ ε_n and time(n_p,new) ≤ time(n_e)}
  delete C ∪ descendants(C) from T_e
For computational efficiency pick ε_n ∝ log|T_p| / |T_p|
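A heavily simplified sketch of this co-growth loop (plain RRT growth instead of RRT*, a 2D unit square with straight-line unit-speed motion, and all helper names are simplifying assumptions made here):

```python
import math, random

class Node:
    def __init__(self, pos, parent=None):
        self.pos, self.parent, self.children = pos, parent, []
        # Arrival time = path length at unit speed.
        self.time = 0.0 if parent is None else parent.time + math.dist(parent.pos, pos)
        if parent:
            parent.children.append(self)

def grow(tree, step=0.05):
    # RRT-style growth: sample a point, extend from the nearest node.
    sample = (random.random(), random.random())
    nearest = min(tree, key=lambda n: math.dist(n.pos, sample))
    d = math.dist(nearest.pos, sample)
    s = min(1.0, step / d) if d > 0 else 0.0
    node = Node(tuple(p + s * (q - p) for p, q in zip(nearest.pos, sample)), nearest)
    tree.append(node)
    return node

def remove_subtree(tree, node):
    # Delete a node and all of its descendants.
    node.parent.children.remove(node)
    stack = [node]
    while stack:
        n = stack.pop()
        tree.remove(n)
        stack.extend(n.children)

T_e = [Node((0.1, 0.1))]                    # evader tree
T_p = [Node((0.9, 0.9))]                    # pursuer tree
for i in range(2000):
    eps = math.sqrt(math.log(len(T_p) + 1) / (len(T_p) + 1))
    n_e = grow(T_e)
    # The new evader node is unsafe if some pursuer node reaches it earlier.
    if any(math.dist(n_e.pos, p.pos) <= eps and p.time <= n_e.time for p in T_p):
        remove_subtree(T_e, n_e)
    n_p = grow(T_p)
    # Prune evader nodes (and their subtrees) dominated by the new pursuer node.
    caught = [n for n in T_e if n.parent is not None
              and math.dist(n.pos, n_p.pos) <= eps and n_p.time <= n.time]
    for n in caught:
        if n in T_e:                        # may already be gone via an ancestor
            remove_subtree(T_e, n)
print(len(T_e), "evader nodes survive")
```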
[Figures: the evader's and pursuers' trees after iterations 500, 3000, 5000, and 10000]
Discretization-based approaches
Open-loop strategies are very restrictive
Closed-loop strategies are generally intractable
Cops and robbers game
Graph G = (V, E)
Cops and the robber occupy vertices
Alternating moves along edges
Perfect information
Goal: step on the robber's location
Cop number: the minimum number of cops necessary to guarantee
capture of the robber regardless of the initial locations.
Cops and robbers game
Neighborhood N(v) = {u ∈ V : (v, u) ∈ E}
Marking algorithm (for a single cop and robber):
1. For all v ∈ V, mark state (v, v)
2. For all unmarked states (c, r): if ∃c′ ∈ N(c) ∀r′ ∈ N(r) such that (c′, r′) is marked, then mark (c, r)
3. If there are new marks, go to 2.
If there is an unmarked state, the robber wins.
If there is none, the cop's strategy results from the marking order.
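A minimal sketch of the marking algorithm for one cop and one robber (two modeling details the slide leaves implicit are assumed here: players may stay put, i.e., closed neighborhoods, and moving onto the opponent's vertex counts as immediate capture):

```python
from itertools import product

def cop_wins(V, E):
    N = {v: {v} for v in V}                 # closed neighborhood N[v]
    for u, w in E:
        N[u].add(w); N[w].add(u)
    marked = {(v, v) for v in V}            # step 1: capture states
    changed = True
    while changed:                          # steps 2-3: iterate to a fixpoint
        changed = False
        for c, r in product(V, V):
            if (c, r) in marked:
                continue
            # Some cop move c2 must capture outright or answer every
            # robber reply r2 with an already-marked state.
            if any(c2 == r or all(r2 == c2 or (c2, r2) in marked for r2 in N[r])
                   for c2 in N[c]):
                marked.add((c, r))
                changed = True
    # The robber wins iff some state stays unmarked.
    return len(marked) == len(V) ** 2

print(cop_wins([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))          # path: True
print(cop_wins([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # 4-cycle: False
```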
Cops and robbers game
The time complexity of the marking algorithm for k cops is O(n^(2(k+1))).
Determining whether k cops with given locations can capture a
robber on a given undirected graph is EXPTIME-complete
[Goldstein and Reingold 1995].
The cop number of trees and cliques is one.
The cop number of planar graphs is at most three [Aigner and
Fromme 1984].
Cops and robbers game
Simultaneous moves
No deterministic strategy guarantees capture
The optimal strategy is randomized
Stochastic (Markov) Games
N is the set of players
S is the set of states (games)
A = A_1 × ⋯ × A_N, where A_i is the set of actions of player i
T : S × A × S → [0,1] is the transition probability function
R = (r_1, …, r_N), where r_i : S × A → ℝ is the immediate payoff for player i
[Figure: four matrix-game states s1–s4 connected by transition probabilities T(s_j | s_i, (a1, a2))]
Stochastic (Markov) Games
Markovian policy: σ_i : S → Δ(A_i)
Objectives
Discounted payoff: Σ_{t=0}^{∞} γ^t · r_i(s_t, a_t),  γ ∈ [0,1)
Mean payoff: lim_{T→∞} (1/T) Σ_{t=0}^{T} r_i(s_t, a_t)
Reachability: reaching a goal set G ⊆ S
Finite vs. infinite horizon
Value Iteration in SG
Adaptation of the value iteration algorithm from Markov decision processes (MDPs)
For zero-sum, discounted, infinite horizon stochastic games
∀s ∈ S initialize v(s) arbitrarily (e.g., v(s) = 0)
until v converges:
  for all s ∈ S:
    for all (a1, a2) ∈ A(s):
      M(a1, a2) = r(s, a1, a2) + γ Σ_{s′∈S} T(s′ | s, a1, a2) · v(s′)
    v(s) = max_x min_y xᵀ M y    // solves the matrix game M
Converges to the optimum if each state is updated infinitely often;
the state to update can be selected (pseudo)randomly.
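A minimal sketch of this loop in Python, reusing the matrix-game LP from the Computing NE slide; the two-state game below (its rewards and uniform transitions) is an illustrative assumption, not an example from the lecture:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    # Value of the zero-sum matrix game M for the row maximizer
    # (the LP from the Computing NE slide).
    n, m = M.shape
    c = np.zeros(n + 1); c[-1] = -1.0
    A_ub = np.hstack([-M.T, np.ones((m, 1))])
    A_eq = np.zeros((1, n + 1)); A_eq[0, :n] = 1.0
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=A_eq, b_eq=np.array([1.0]), bounds=bounds)
    return res.x[-1]

gamma, n_states = 0.9, 2
# r[s] is the 2x2 payoff matrix in state s; T[s, a1, a2] is the
# distribution over next states (uniform here for simplicity).
r = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[2.0, 1.0], [0.0, 1.0]]])
T = np.full((n_states, 2, 2, n_states), 1.0 / n_states)

v = np.zeros(n_states)
while True:                                # until v converges
    v_new = np.empty(n_states)
    for s in range(n_states):
        M = r[s] + gamma * T[s] @ v        # backed-up matrix game for state s
        v_new[s] = matrix_game_value(M)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new
print("state values:", v)
```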
Pursuit Evasion as SG
N = {p, e} is the set of players
S = {(v_e, v_{p1}, …, v_{pn})} ⊆ V^{n+1} is the set of states
A = A_e × A_p, where A_e = E and A_p = E^n, is the set of actions
T : S × A × S → [0,1] is deterministic movement along the edges
R = (r_e, r_p), where r_p = −r_e equals one iff the evader is captured
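A sketch of instantiating this formulation for one evader and n pursuers on a graph (the data layout, the option to stay put, and the helper names are choices made here):

```python
from itertools import product

def build_peg(V, E, n_pursuers):
    adj = {v: {v} for v in V}               # moving along an edge or staying
    for u, w in E:
        adj[u].add(w); adj[w].add(u)
    # States: the evader's vertex followed by the pursuers' vertices.
    states = list(product(V, repeat=n_pursuers + 1))
    def actions(state):
        v_e, *v_ps = state
        return [(a_e, a_p) for a_e in adj[v_e]
                for a_p in product(*(adj[v] for v in v_ps))]
    def step(state, a_e, a_p):
        return (a_e,) + tuple(a_p)          # deterministic transitions
    def r_p(state):
        v_e, *v_ps = state
        return 1.0 if v_e in v_ps else 0.0  # r_e = -r_p
    return states, actions, step, r_p

# Usage on a 4-cycle with a single pursuer:
states, actions, step, r_p = build_peg(
    [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)], 1)
s = (1, 3)                                  # evader at 1, pursuer at 3
print(len(states), r_p(step(s, 2, (2,))))   # 16 states; capture reward 1.0
```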
Summary
PEGs are studied under a variety of assumptions
The simplest cases can be solved analytically
More complex cases have problem-specific algorithms
Even more complex cases are best handled by generic AI methods