Hierarchical mission control of automata with human supervision
Prof. David A. Castañon, Boston University
Problem of Interest
• Coordination of heterogeneous teams to accomplish tasks in uncertain, risky environments
- Vehicles with different capabilities, resources
- Some resources are renewable (sensors), others are not
- Tasks are spatially distributed, require combinations of capabilities
- Successful completion of tasks not guaranteed
- Likelihood of success depends on resources assigned
- Tasks arrive, depart randomly
- Task types may be unknown until observed
- Vehicles may fail randomly, depending on trajectories
• Key aspect: Real-time adaptation to events
• Human Supervision
- Determine task priority/value
- Modify individual vehicle task assignments when desired
- Determine specific vehicle schedules when desired
Problem Illustration
Experiment model
• Multiple robots search for and perform tasks at BU’s Mechatronics Lab
Why is this a hard problem
• Uncertain environment and dynamics
- Unknown targets
- Uncertain effectiveness of sensing, actions
- Requires highly adaptive system, anticipative of and responsive to new information
- Hedge against loss of assets, new arrivals, action failures, ...
• Diverse set of vehicles with multiple capabilities
- Dynamic role selection, ad hoc teaming
• Dual control problems: Manage both information acquisition and action
- Trade off search and sensing versus actions
- Dynamic coupling of available capabilities to achieve desired effects
• Support and adapt to human control inputs
- Goals, constraints, fixed decisions
- Provide information to assess effects of changes
Classes of algorithms
• Operations Research
- Deterministic and stochastic multi-vehicle task assignment and scheduling
  - Large vehicles, small tasks, limited cooperation, homogeneous activities
  - No risk, limited uncertainty to new task arrivals, departures independent of vehicle actions
- Search theory and sensor management
- Large-scale resource allocation and integer programming
• Stochastic Control
- Control of stochastic queuing systems in communications
- Single vehicle routing and low level vehicle trajectory control
- Swarm control approaches with stability and performance guarantees
  - Homogeneous vehicles
- Approximate dynamic programming techniques
  - Not focused on combinatorial optimization in general, rare exceptions
- Model predictive control of complex stochastic systems
• Artificial Intelligence/Computer Science
- Constraint satisfaction, temporal planning systems
  - Non-real time, off-line combinatorial constraint-based search
  - Limited incorporation of risk/reward, information dynamics
- Behavioral control in robotics for simple tasks
- Reinforcement learning for stochastic planning in well-defined repeated environments (e.g. games)
Proposed Approach: Hierarchical Model Predictive Control
• Hierarchical approach: avoid combinatorial explosion of complexity through decomposition
- Team strategy selection: address uncertainty
  - Allocate team capabilities to tasks, hedging against task type uncertainty, new task arrivals, action success probabilities
  - Simplify distribution of resources across vehicles
- Team activity scheduling: address combinatorial complexity
  - Allocate team activities to platforms
  - Select schedules and routes
• Model Predictive Control: re-solve algorithms in response to new information or human directives
- Receding horizon control
- Respond to new tasks, changes in task status, platform loss, ...
- Adapt to human guidance and constraints
- Requires fast algorithms for real-time control
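The receding horizon idea above can be sketched in a few lines of Python. This is only an illustrative loop under stated assumptions: `receding_horizon`, `toy_plan`, and the "remaining tasks" state are hypothetical names and a toy model, not the planner described in the slides. The point is the structure: plan over the full horizon, implement only the first stage, observe, and re-solve.

```python
from typing import Callable, List, Tuple


def receding_horizon(
    state: int,
    plan_fn: Callable[[int, int], List[int]],
    step_fn: Callable[[int, int], int],
    horizon: int,
    n_steps: int,
) -> Tuple[int, List[int]]:
    """Re-solve the planning problem at every step; apply only the first action."""
    applied = []
    for _ in range(n_steps):
        plan = plan_fn(state, horizon)   # plan over the whole horizon
        action = plan[0]                 # implement the first stage only
        state = step_fn(state, action)   # observe the resulting state
        applied.append(action)
    return state, applied


def toy_plan(remaining: int, horizon: int) -> List[int]:
    """Hypothetical planner: serve at most 2 tasks per stage (assumed capacity)."""
    plan = []
    for _ in range(horizon):
        served = min(2, remaining)
        plan.append(served)
        remaining -= served
    return plan


# Toy dynamics (assumption): every served task is completed.
final_state, actions = receding_horizon(
    5, toy_plan, lambda s, a: s - a, horizon=3, n_steps=3
)
print(final_state, actions)  # 0 [2, 2, 1]
```

In a stochastic setting `step_fn` would return an observed outcome rather than a deterministic update, which is where re-solving at each step pays off.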
Team Strategy Selection
• Stochastic dynamic programming formulation
- Multistage formulation, with outcomes observed after each stage
[Figure: multistage allocation of resources to tasks. Labels: Resources; Stage 1, Stage 2, Stage 3; Task 1 ... Task N at each stage; Type 1 to Type 4; Task N+1 ... Task N+M.]
Notation
• N tasks i = 1, …, N
• M resource types j = 1, …, M
• Assume independence of all task completion events
• $V_i$: value of task $i$
• $R_j$: cost of using a resource of type $j$
• $x_{ij}$: number of resources of type $j$ assigned to task $i$
• $p_{ij}$: probability that a single resource of type $j$ successfully completes task $i$
• $M_j$: number of resources of type $j$
Example: Two-Stage Single Resource Problem
• Define a task completion state after each stage
- Task completion state observed after each stage
• Decisions are now feedback policies
• Task completion state dynamics: Controlled Markov chain
- Resources assigned determine transition probabilities
- Independence of completion event outcomes decouples transition dynamics across tasks
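$\omega_i(k) \in \{0, 1\}$ denotes the completion state of task $i$ after stage $k$
$\omega(k) = \{\omega_1(k), \ldots, \omega_N(k)\}$ is the overall task completion state after stage $k$
$x_i(k, \omega(k-1))$: resources assigned to task $i$ in stage $k$
$x(k, \omega(k-1))$: vector of resource allocations at stage $k$
$$P\big(\omega_i(k) = 1 \mid \omega_i(k-1) = 1,\ x_i(k, \omega(k-1)) = n\big) = (1 - p_i(k))^n$$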
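The controlled Markov chain above can be illustrated with a short Python sketch. It rests on two assumptions that the slide does not state explicitly: state 1 means "task still uncompleted" (inferred from the objective, which penalizes uncompleted task value), and a completed task stays completed. The function names are hypothetical.

```python
from typing import Sequence


def stay_uncompleted_prob(p: float, n: int) -> float:
    """Probability that a task survives n independent attempts,
    each succeeding with probability p."""
    return (1.0 - p) ** n


def state_transition_prob(
    prev_state: Sequence[int],
    next_state: Sequence[int],
    p: Sequence[float],
    x: Sequence[int],
) -> float:
    """Joint transition probability; it factors across tasks because
    completion events are assumed independent."""
    prob = 1.0
    for w_prev, w_next, p_i, n_i in zip(prev_state, next_state, p, x):
        if w_prev == 0:
            # Assumption: completed tasks are absorbing.
            prob *= 1.0 if w_next == 0 else 0.0
        else:
            stay = stay_uncompleted_prob(p_i, n_i)
            prob *= stay if w_next == 1 else 1.0 - stay
    return prob


# Two uncompleted tasks; 1 resource on the first, 2 on the second.
print(state_transition_prob((1, 1), (0, 1), (0.5, 0.5), (1, 2)))  # 0.125
```

Because the joint probability is a product of per-task terms, each task can be analyzed as its own two-state chain.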
Two-Stage Problem Statement
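• Objective: minimize expected uncompleted task value plus expected resource use costs
$$\min_{\{x(1),\, x(2, \omega(1))\}} E\left\{ \sum_{i=1}^{N} \Big[ V_i\, I\{\omega_i(2) = 1\} + R_1 \big( x_i(1) + x_i(2, \omega(1)) \big) \Big] \right\}$$
• Constraints: Resource limits
$$\sum_{i=1}^{N} \big[ x_i(1) + x_i(2, \omega(1)) \big] \le M_1 \quad \text{for all outcomes } \omega(1)$$
$$x_i(1),\ x_i(2, \omega(1)) \in \{0, 1, \ldots, M_1\}$$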
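To make the two-stage problem concrete, here is a minimal brute-force sketch for tiny instances. It enumerates every first-stage allocation and, for each observed outcome, the best second-stage allocation within the remaining budget. The assumptions are mine: all tasks start uncompleted, state 1 means uncompleted, and the per-resource success probability is the same in both stages. The exponential enumeration is for illustration only and is not the method proposed in the slides.

```python
from itertools import product


def allocations(n_tasks, budget):
    """All nonnegative integer allocations using at most `budget` resources."""
    return [x for x in product(range(budget + 1), repeat=n_tasks) if sum(x) <= budget]


def second_stage_cost(state, V, p, R, budget):
    """Best expected cost-to-go after observing the first-stage outcome."""
    best = float("inf")
    for x2 in allocations(len(V), budget):
        cost = R * sum(x2)
        for w, v, p_i, n in zip(state, V, p, x2):
            if w == 1:  # task still uncompleted
                cost += v * (1.0 - p_i) ** n
        best = min(best, cost)
    return best


def solve_two_stage(V, p, R, M):
    """Enumerate first-stage allocations; second stage is a feedback policy."""
    n_tasks = len(V)
    best_cost, best_x1 = float("inf"), None
    for x1 in allocations(n_tasks, M):
        cost = R * sum(x1)
        for state in product((0, 1), repeat=n_tasks):
            prob = 1.0
            for w, p_i, n in zip(state, p, x1):
                stay = (1.0 - p_i) ** n
                prob *= stay if w == 1 else 1.0 - stay
            if prob > 0.0:
                cost += prob * second_stage_cost(state, V, p, R, M - sum(x1))
        if cost < best_cost:
            best_cost, best_x1 = cost, x1
    return best_cost, best_x1


cost, x1 = solve_two_stage(V=[10.0], p=[0.5], R=1.0, M=2)
print(cost, x1)  # 4.0 (1,)
```

In this one-task example, committing both resources up front costs 4.5, while using one resource, observing the outcome, and using the second only if needed costs 4.0. That gap is the value of feedback.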
Relaxed Two-Stage Problem
• Original problem is a stochastic integer program
- PSPACE-complete, hard
• Expand set of admissible feedback strategies in second stage
- Generates lower bound to optimal value function
- New constraint on average number of resources
  - Relaxes exponential number of constraints to a single constraint
- Simple result: All feasible strategies in original problem are feasible in relaxed problem
- Lower bound on original performance
- Idea: select optimal strategies for lower bound
$$\sum_{\{\omega(1)\}} P(\omega(1) \mid x(1)) \sum_{i=1}^{N} \big[ x_i(1) + x_i(2, \omega(1)) \big] \le M_1$$
$$x_i(1),\ x_i(2, \omega(1)) \in \{0, 1, \ldots, M_1\}$$
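The "simple result" can be checked in one line of algebra. The sketch below writes $\omega(1)$ for the first-stage completion state (the symbol is an assumption here) and $P(\omega(1) \mid x(1))$ for its probability under the first-stage allocation. If the resource limit holds for every outcome, then it also holds on average, because the probabilities sum to one:

```latex
% Assumed notation: \omega(1) = first-stage completion state,
% M_1 = number of available resources of the single type.
\sum_{i=1}^{N} \big[ x_i(1) + x_i(2,\omega(1)) \big] \le M_1
\quad \forall\, \omega(1)
\;\Longrightarrow\;
\sum_{\omega(1)} P(\omega(1) \mid x(1))
\sum_{i=1}^{N} \big[ x_i(1) + x_i(2,\omega(1)) \big]
\;\le\; M_1 \sum_{\omega(1)} P(\omega(1) \mid x(1)) \;=\; M_1
```

The relaxed feasible set therefore contains the original one, so minimizing the same objective over it cannot give a larger value, which is the lower bound.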
Characterization of Optimal Strategies
• Important concept: Mixed local strategies
- Local strategies: feedback strategies such that the actions on a given task depend only on the state of that task: $x_i(2, \omega(1)) = x_i(2, \omega_i(1))$
- Mixed strategy: random combination of pure strategies
- Mixed strategies may achieve better performance than pure strategies in relaxed problem
• Theorem: In relaxed problem, for every pure strategy, there is a mixed local strategy which uses same resources and achieves same expected performance
- Proven by construction
- Restricts search to local mixed strategies
- Fast algorithm for solution of optimal strategies using convex optimization principles!
- Can solve exactly with complexity $O((M_1 + N) \log N)$
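The slides do not spell out the fast algorithm, so the following is only a related illustration, not the theorem's method: a single-stage analogue of the problem (minimize the sum of $V_i (1 - p_i)^{x_i}$ plus $R$ times the resources used) solved by greedy marginal allocation with a heap. Each task's cost is convex in its allocation, so marginal gains shrink and the greedy choice is optimal for this separable case. The function name and test data are hypothetical.

```python
import heapq


def greedy_allocation(V, p, R, M):
    """Assign up to M identical resources one at a time to the task with the
    largest net marginal gain; stop when no assignment pays for itself."""
    x = [0] * len(V)
    # Max-heap via negated gains. Gain of the first resource on task i:
    # V_i * p_i (expected value recovered) minus the resource cost R.
    heap = [(-(v * p_i - R), i) for i, (v, p_i) in enumerate(zip(V, p))]
    heapq.heapify(heap)
    used = 0
    while heap and used < M:
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # best remaining gain is not worth a resource
        x[i] += 1
        used += 1
        # Gain of the next resource on task i shrinks geometrically.
        next_gain = V[i] * (1.0 - p[i]) ** x[i] * p[i] - R
        heapq.heappush(heap, (-next_gain, i))
    return x


print(greedy_allocation([10.0, 4.0], [0.5, 0.5], 1.0, 3))  # [2, 1]
```

The heap gives a running time of roughly O(N + M log N) for this simplified problem, which is in the same spirit as the complexity quoted on the slide.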
Comments and Extensions
• MPC approach guarantees feasibility of approximate problem solution in terms of original problem
- Obtain approximate solution, but implement only first stage allocations
- Re-solve problem when new observations are available, with receding horizon
- Fast algorithm allows for rapid computation
• Main extensions:
- Multiple stages
- Multiple resource types
  - Multiple renewable and non-renewable resources
  - Solution NP-hard, but can solve approximately
- Multiple task types: sensing and action
  - Must sense to observe outcomes
- New task arrivals, discovered by searching
- Unknown task types: Detect presence, but must observe to determine task type
- Task departures, deadlines
Team Activity Scheduling
• Inputs from team strategy selection
- Desired resources assigned to each task in current period
- Desired resources held in reserve when future information is collected
• Guidance and constraints from human operators
- Task values, select platform task assignments, select task resource assignments
• Known parameters
- Vehicle locations and resources in each vehicle, task locations
• Problem: assign resource deliveries for tasks to individual vehicles, and select sequence of activities for each vehicle
- Deterministic multi-vehicle routing problem (VRP)
- NP-hard, with many useful approximate approaches available
Team Activity Assignment Formulation
[Problem formulation: the equations are not recoverable from the transcript. Surviving labels: discounted cost; subject to: visit customers, N vehicles to route, integrality.]
• VRP is an NP-hard problem (traveling salesman) wrapped in an NP-hard problem (bin packing).
• Classical Application: Truck Routing
Team Activity Assignment Algorithm
• Candidate algorithm: Tabu Search
- Locally perturbs trial solutions
- Uses “Tabu” list to avoid local minima
- Evaluated by AFIT for UAV routing
- Fast replanning, leads to rapid response to events
- Handles time window constraints instead of precedence constraints
• Significant extensions to date
- Multiple task types
- Multiple resource types
- Compound tasks involving multiple vehicles
• Alternative algorithms (AFOSR-sponsored)
- Mixed Integer-Linear Programming, J. How, MIT
- Receding horizon controller, C. Cassandras, BU
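As a rough illustration of the tabu search idea (local perturbation plus a tabu list), here is a minimal sketch for a single vehicle visiting a set of task locations and returning to a depot. It is not the AFIT-evaluated implementation and ignores time windows, resources, and multiple vehicles. The swap neighborhood, the tenure value, the depot at the origin, and all names are assumptions made for the example.

```python
import math
from itertools import combinations


def route_length(route, points, depot=(0.0, 0.0)):
    """Length of depot -> points in route order -> depot."""
    path = [depot] + [points[i] for i in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def tabu_search(points, n_iters=50, tenure=5):
    """Swap-based tabu search; returns the best route found and its length."""
    current = list(range(len(points)))
    best, best_len = current[:], route_length(current, points)
    tabu = {}  # swapped pair of task indices -> last iteration it stays tabu
    for it in range(n_iters):
        candidates = []
        for a, b in combinations(range(len(current)), 2):
            neighbor = current[:]
            neighbor[a], neighbor[b] = neighbor[b], neighbor[a]
            length = route_length(neighbor, points)
            move = (min(current[a], current[b]), max(current[a], current[b]))
            is_tabu = tabu.get(move, -1) >= it
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if not is_tabu or length < best_len:
                candidates.append((length, move, neighbor))
        if not candidates:
            break
        # Take the best admissible neighbor even if it is worse than the
        # current route; this is what lets the search leave a local minimum.
        length, move, current = min(candidates, key=lambda c: (c[0], c[1]))
        tabu[move] = it + tenure
        if length < best_len:
            best, best_len = current[:], length
    return best, best_len


tasks = [(2.0, 2.0), (0.0, 1.0), (2.0, 0.0), (0.0, 2.0), (1.0, 0.0)]
route, length = tabu_search(tasks)
print(route, round(length, 3))
```

Replanning after an event amounts to calling the search again from the current route, which is one reason local-search methods suit fast response.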
Comments
• Algorithms available for dynamic control of automata performing tasks in uncertain, risky environments
- Fast generation of desired courses of action
- Hedge against uncertain outcomes, adapt to new information
• Operator interaction through value structure, plus fixed decision variables and constraints
- Allows for “micro”-management
- Very limited insight into effects of operator inputs on automata behavior and performance
• Fundamental problem for this MURI research: prediction of course of action in the presence of uncertainty
- Not a single plan, but a contingency tree of possible actions/responses
- Hard to modify, approve
Experimental Platform for Research
• Multiple robots search for and perform tasks at BU’s Mechatronics Lab
- Can provide operator control of some platforms: human-automata teams
- Control information displayed, risk to each operator using video
Future Activities
• Implement research experiments involving tasks with performance uncertainty in test facility
- Vary tempo, size, uncertainty, information
• Develop algorithms to interact with operators in alternative roles
- Supervisory control
- Team partners
• Extend existing algorithms to different classes of tasks
- Area search, task discovery, risk to platforms
• Develop algorithms to assist operators in predicting behavior of automata teams in uncertain environments
• Collaborate with MURI team to design and analyze experiments involving alternative structures for human-automata teams