hierarchical mission control of automata with human supervision prof. david a. castañon

Hierarchical mission control of automata with human supervision

Prof. David A. Castañon, Boston University

Problem of Interest

• Coordination of heterogeneous teams to accomplish tasks in uncertain, risky environments

- Vehicles with different capabilities, resources
  - Some resources are renewable (sensors), others are not
- Tasks are spatially distributed, require combinations of capabilities
  - Successful completion of tasks not guaranteed
  - Likelihood of success depends on resources assigned
- Tasks arrive, depart randomly
- Task types may be unknown until observed
- Vehicles may fail randomly, depending on trajectories

• Key aspect: Real-time adaptation to events

• Human Supervision
- Determine task priority/value
- Modify individual vehicle task assignments when desired
- Determine specific vehicle schedules when desired

Problem Illustration

Experiment model

• Multiple robots search for and perform tasks at BU’s Mechatronics Lab

Why is this a hard problem?

• Uncertain environment and dynamics
- Unknown targets
- Uncertain effectiveness of sensing, actions
- Requires highly adaptive system, anticipative of and responsive to new information
- Hedge against loss of assets, new arrivals, action failures, …

• Diverse set of vehicles with multiple capabilities
- Dynamic role selection, ad hoc teaming

• Dual control problems: manage both information acquisition and action
- Trade off search and sensing versus actions
- Dynamic coupling of available capabilities to achieve desired effects

• Support and adapt to human control inputs
- Goals, constraints, fixed decisions
- Provide information to assess effects of changes

Classes of algorithms

• Operations Research
- Deterministic and stochastic multi-vehicle task assignment and scheduling
  - Large vehicles, small tasks, limited cooperation, homogeneous activities
  - No risk, limited uncertainty to new task arrivals, departures independent of vehicle actions
- Search theory and sensor management
- Large-scale resource allocation and integer programming

• Stochastic Control
- Control of stochastic queuing systems in communications
- Single vehicle routing and low level vehicle trajectory control
- Swarm control approaches with stability and performance guarantees
  - Homogeneous vehicles
- Approximate dynamic programming techniques
  - Not focused on combinatorial optimization in general, rare exceptions
- Model predictive control of complex stochastic systems

• Artificial Intelligence/Computer Science
- Constraint satisfaction, temporal planning systems
  - Non-real time, off-line combinatorial constraint-based search
  - Limited incorporation of risk/reward, information dynamics
- Behavioral control in robotics for simple tasks
- Reinforcement learning for stochastic planning in well-defined repeated environments (e.g. games)

Proposed Approach: Hierarchical Model Predictive Control

• Hierarchical approach: avoid combinatorial explosion of complexity through decomposition
- Team strategy selection: address uncertainty
  - Allocate team capabilities to tasks, hedging against task type uncertainty, new task arrivals, action success probabilities
  - Simplify distribution of resources across vehicles
- Team activity scheduling: address combinatorial complexity
  - Allocate team activities to platforms
  - Select schedules and routes

• Model Predictive Control: re-solve algorithms in response to new information or human directives
- Receding horizon control
- Respond to new tasks, changes in task status, platform loss, …
- Adapt to human guidance and constraints
- Requires fast algorithms for real-time control
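The receding-horizon idea above (plan, implement only the first-stage allocation, observe, plan again) can be sketched as follows. This is a minimal illustration, not the planner used in this work: `plan` is a hypothetical greedy stand-in, and the single resource type, the common success probability `p`, and the `per_stage` cap are all assumptions made to keep the example short.

```python
import random

def plan(values, p, budget):
    """Hypothetical planner (an assumption, not the author's algorithm):
    give one resource at a time to the open task with the largest
    expected value gain. Returns {task: number of resources}."""
    alloc = {i: 0 for i in values}
    if not values:
        return alloc
    for _ in range(budget):
        # Marginal gain of one more resource on task i: V_i * (1-p)^n * p
        best = max(values, key=lambda i: values[i] * (1 - p) ** alloc[i] * p)
        alloc[best] += 1
    return alloc

def receding_horizon(values, p, budget, stages, per_stage, rng):
    """Re-plan at every stage using the tasks that are still open."""
    open_tasks = dict(values)
    for _ in range(stages):
        if not open_tasks or budget == 0:
            break
        # Re-solve with current information; implement this stage only
        first_stage = plan(open_tasks, p, min(per_stage, budget))
        for task, n in first_stage.items():
            budget -= n
            # The task is completed if any of its n resources succeeds
            if any(rng.random() < p for _ in range(n)):
                del open_tasks[task]
    return open_tasks, budget

remaining, left = receding_horizon(
    {"a": 10, "b": 5, "c": 1}, p=0.7, budget=6, stages=4, per_stage=2,
    rng=random.Random(0))
print(remaining, left)
```

Because the plan is recomputed from the observed set of open tasks, resources that would have gone to an already completed task are redirected at the next stage.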

Team Strategy Selection

• Stochastic dynamic programming formulation
- Multistage formulation, with outcomes observed after each stage

[Figure: resources allocated to Task 1, …, Task N over Stage 1, Stage 2, and Stage 3; tasks may be of Type 1 to Type 4; new tasks N+1, …, N+M arrive in later stages]

Notation

• N tasks i = 1, …, N

• M resource types j = 1, …, M

• Assume independence of all task completion events

$V_i$: Value of task $i$
$R_j$: Cost of using resource of type $j$
$x_{ij}$: Number of resources of type $j$ assigned to task $i$
$p_{ij}$: Probability that single resource of type $j$ successfully completes task $i$
$M_j$: Number of resources of type $j$

Example: Two-Stage Single Resource Problem

• Define a task completion state after each stage

- Task completion state observed after each stage

• Decisions are now feedback policies

• Task completion state dynamics: Controlled Markov chain
- Resources assigned determine transition probabilities
- Independence of completion event outcomes decouples transition dynamics across tasks

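$\omega_i(k) \in \{0, 1\}$ denotes the completion state of task $i$ after stage $k$ ($\omega_i(k) = 1$: task $i$ not yet completed)
$\omega(k) = \{\omega_1(k), \ldots, \omega_N(k)\}$ is the overall task completion state after stage $k$
$x_i(k, \omega(k-1))$: resources assigned to task $i$ in stage $k$
$x(k, \omega(k-1))$: vector of resource allocations at stage $k$

$$P\big(\omega_i(k) = 1 \mid \omega_i(k-1) = 1,\; x_i(k, \omega(k-1)) = n\big) = (1 - p_i(k))^n$$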
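The per-task Markov chain can be simulated directly from the transition rule: a task that is still uncompleted and receives n resources stays uncompleted with probability (1 - p_i)^n. The sketch below assumes the convention that state 1 means "not yet completed" and 0 means "completed", and that a completed task stays completed; the function names are illustrative only.

```python
import random

def stay_incomplete_prob(p, n):
    """P(task still uncompleted after the stage | uncompleted before,
    n resources assigned, each succeeding independently with prob. p)."""
    return (1.0 - p) ** n

def step(state, alloc, p, rng):
    """Advance every task by one stage.
    state[i] = 1: task i uncompleted (assumed convention), 0: completed.
    Tasks evolve independently, so each one is sampled on its own."""
    nxt = []
    for s, n, p_i in zip(state, alloc, p):
        if s == 1 and rng.random() >= stay_incomplete_prob(p_i, n):
            s = 0  # at least one of the n resources succeeded
        nxt.append(s)
    return nxt

rng = random.Random(0)
state = [1, 1, 1]
state = step(state, alloc=[2, 1, 0], p=[0.5, 0.8, 0.9], rng=rng)
print(state)  # the third task gets no resources, so it stays at 1
```

Since the transition of each task depends only on its own state and its own allocation, the joint chain factors into N small two-state chains.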

Two-Stage Problem Statement

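• Objective: minimize expected uncompleted task value plus expected resource use costs

$$\min_{\{x(1),\, x(2, \omega(1))\}} E\left\{ \sum_{i=1}^{N} \Big[ V_i\, I\{\omega_i(2) = 1\} + R_1 \big( x_i(1) + x_i(2, \omega(1)) \big) \Big] \right\}$$

Here $I\{\cdot\}$ is the indicator function, and $\omega_i(2) = 1$ means task $i$ is still uncompleted after stage 2.

• Constraints: Resource limits

$$\sum_{i=1}^{N} \big[ x_i(1) + x_i(2, \omega(1)) \big] \le M_1 \quad \text{for all outcomes } \omega(1)$$

$$x_i(1),\; x_i(2, \omega(1)) \in \{0, 1, \ldots, M_1\}$$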
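For very small instances the two-stage problem can be solved exactly by enumeration, which makes the structure of the feedback policy concrete: one first-stage allocation, then a separate second-stage allocation for every observed outcome. The sketch below is a brute-force illustration only (its cost grows exponentially with the number of tasks); it assumes a single resource type, a success probability `p[i]` that is the same in both stages, and that all tasks start uncompleted.

```python
from itertools import product

def allocations(n_tasks, budget):
    """All integer allocation vectors whose total does not exceed budget."""
    return [x for x in product(range(budget + 1), repeat=n_tasks)
            if sum(x) <= budget]

def best_second_stage(omega, V, p, R, budget):
    """Minimum stage-2 cost for outcome omega (1 = still uncompleted)."""
    best = float("inf")
    for x2 in allocations(len(V), budget):
        cost = R * sum(x2) + sum(
            V[i] * omega[i] * (1 - p[i]) ** x2[i] for i in range(len(V)))
        best = min(best, cost)
    return best

def solve_two_stage(V, p, R, M):
    """Return (optimal expected cost, optimal first-stage allocation)."""
    best_cost, best_x1 = float("inf"), None
    for x1 in allocations(len(V), M):
        cost = R * sum(x1)
        for omega in product((0, 1), repeat=len(V)):
            prob = 1.0  # probability of observing omega after stage 1
            for i, w in enumerate(omega):
                q = (1 - p[i]) ** x1[i]
                prob *= q if w == 1 else 1 - q
            if prob > 0:
                # Resource limit must hold for every outcome
                cost += prob * best_second_stage(omega, V, p, R, M - sum(x1))
        if cost < best_cost:
            best_cost, best_x1 = cost, x1
    return best_cost, best_x1

print(solve_two_stage(V=[10, 4], p=[0.6, 0.6], R=0.5, M=3))
```

The inner loop over `omega` is what makes the exact problem expensive: the number of outcomes, and therefore of second-stage decisions and resource constraints, doubles with every additional task.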

Relaxed Two-Stage Problem

• Original problem is stochastic integer program
- PSPACE-complete, hard

• Expand set of admissible feedback strategies in second stage
- Generates lower bound to optimal value function
- New constraint on average number of resources
  - Relaxes exponential number of constraints to a single constraint
- Simple result: All feasible strategies in original problem are feasible in current problem
  - Lower bound on original performance
- Idea: select optimal strategies for lower bound

$$\sum_{\{\omega(1)\}} P(\omega(1) \mid x(1)) \sum_{i=1}^{N} \big[ x_i(1) + x_i(2, \omega(1)) \big] \le M_1$$

$$x_i(1),\; x_i(2, \omega(1)) \in \{0, 1, \ldots, M_1\}$$

Characterization of Optimal Strategies

• Important concept: Mixed local strategies
- Local strategies: feedback strategies such that the actions on a given task depend only on the state of that task

$$x_i(2, \omega(1)) = x_i(2, \omega_i(1))$$

- Mixed strategy: random combination of pure strategies
- Mixed strategies may achieve better performance than pure strategies in relaxed problem

• Theorem: In relaxed problem, for every pure strategy, there is a mixed local strategy which uses same resources and achieves same expected performance
- Proven by construction
- Restricts search to local mixed strategies
- Fast algorithm for solution of optimal strategies using convex optimization principles!
- Can solve exactly in complexity $O((M_1 + N)\log N)$
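To make the local and mixed strategies concrete, the sketch below evaluates them for a single resource type. A pure local strategy for a task is taken to be a pair (a, b): a resources in stage 1, and b more in stage 2 only if the task is still uncompleted. A mixed strategy is a probability-weighted set of such pairs. The function names, the constant per-stage success probability `p`, and the omission of the resource cost term are assumptions made for illustration; this is an evaluation helper, not the fast optimization algorithm mentioned above.

```python
def pure_local(V, p, a, b):
    """(expected uncompleted value, expected resource use) for one task."""
    q1 = (1 - p) ** a            # P(task still uncompleted after stage 1)
    expected_loss = V * q1 * (1 - p) ** b
    expected_use = a + q1 * b    # stage-2 resources are spent only if needed
    return expected_loss, expected_use

def mixed_local(V, p, mix):
    """mix: list of (probability, a, b). Expectations are linear in the mix."""
    loss = use = 0.0
    for prob, a, b in mix:
        l, u = pure_local(V, p, a, b)
        loss += prob * l
        use += prob * u
    return loss, use

def satisfies_relaxed(tasks, M):
    """tasks: list of (V, p, mix). Checks the single relaxed constraint:
    total expected resource use must not exceed M."""
    return sum(mixed_local(V, p, mix)[1] for V, p, mix in tasks) <= M

print(pure_local(10, 0.5, 1, 1))                          # (2.5, 1.5)
print(mixed_local(10, 0.5, [(0.5, 1, 0), (0.5, 1, 1)]))   # (3.75, 1.25)
```

Mixing lets the expected resource use take values between those of the pure strategies (1.25 here, between 1 and 1.5), which is how a mixed strategy can use an average-resource budget that no single pure strategy meets exactly.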

Comments and Extensions

• MPC approach guarantees feasibility of approximate problem solution in terms of original problem
- Obtain approximate solution, but implement only first stage allocations
- Re-solve problem when new observations are available, with receding horizon
- Fast algorithm allows for rapid computation

• Main extensions:
- Multiple stages
- Multiple resource types
  - Multiple renewable and non-renewable resources
  - Solution NP-hard, but can solve approximately
- Multiple task types: sensing and action
  - Must sense to observe outcomes
- New task arrivals, discovered by searching
- Unknown task types: Detect presence, but must observe to determine task type
- Task departures, deadlines

Team Activity Scheduling

• Inputs from team strategy selection
- Desired resources assigned to each task in current period
- Desired resources held in reserve when future information is collected

• Guidance and constraints from human operators
- Task values, select platform task assignments, select task resource assignments

• Known parameters
- Vehicle locations and resources in each vehicle, task locations

• Problem: assign resource deliveries for tasks to individual vehicles, and select sequence of activities for vehicle
- Deterministic multi-vehicle routing problem (VRP)
- NP-hard, with many useful approximate approaches available

Team Activity Assignment Formulation

[Equation slide: VRP problem formulation with constraints labeled "Visit Customers", "N vehicles to route", and "Integrality"; cost terms labeled "Discounted Cost"]

• VRP is an NP-hard problem (traveling salesman) wrapped in an NP-hard problem (bin packing).

• Classical application: truck routing

Team Activity Assignment Algorithm

• Candidate algorithm: Tabu Search
- Locally perturbs trial solutions
- Uses “Tabu” list to avoid local minima
- Evaluated by AFIT for UAV routing
- Fast replanning, leads to rapid response to events
- Handles time window constraints instead of precedence constraint

• Significant extensions to date
- Multiple task types
- Multiple resource types
- Compound tasks involving multiple vehicles

• Alternative algorithms (AFOSR-sponsored)
- Mixed Integer-Linear Programming, J. How, MIT
- Receding horizon controller, C. Cassandras, BU
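The tabu search idea (perturb the current solution locally, and keep a short list of recent moves that may not be repeated, so the search can leave local minima) can be sketched on a much simpler problem than the multi-vehicle one above. The example below is an illustrative assumption, not the implementation evaluated for UAV routing: it routes a single vehicle on a closed tour from a depot (point 0), uses position swaps as the local move, and ignores time windows and resources.

```python
import math
from collections import deque
from itertools import combinations

def route_length(route, pts):
    """Length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[route[i]], pts[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def tabu_search(pts, iters=200, tenure=5):
    current = list(range(len(pts)))
    best, best_len = current[:], route_length(current, pts)
    tabu = deque(maxlen=tenure)  # recently used swaps are forbidden
    for _ in range(iters):
        candidates = []
        # Swap two positions; position 0 (the depot) is never moved
        for i, j in combinations(range(1, len(pts)), 2):
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            length = route_length(neighbor, pts)
            # Aspiration: a tabu move is allowed if it beats the best so far
            if (i, j) not in tabu or length < best_len:
                candidates.append((length, (i, j), neighbor))
        if not candidates:
            break
        # Take the best admissible neighbor, even if it is worse than current
        length, move, current = min(candidates)
        tabu.append(move)
        if length < best_len:
            best, best_len = current[:], length
    return best, best_len

points = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(tabu_search(points))
```

Accepting the best admissible neighbor even when it is worse is what distinguishes this from plain local descent; `tenure` should stay below the number of available moves, otherwise every move becomes tabu and the search stops early.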

Comments

• Algorithms available for dynamic control of automata performing tasks in uncertain, risky environments
- Fast generation of desired courses of action
- Hedge against uncertain outcomes, adapt to new information

• Operator interaction through value structure, plus fixed decision variables and constraints
- Allows for “micro”-management
- Very limited insight into effects of operator inputs on automata behavior and performance

• Fundamental problem for this MURI research: prediction of course of action in the presence of uncertainty
- Not a single plan, but a contingency tree of possible actions/responses
- Hard to modify, approve

Experimental Platform for Research

• Multiple robots search for and perform tasks at BU’s Mechatronics Lab
- Can provide operator control of some platforms: human-automata teams
- Control information displayed, risk to each operator using video

Future Activities

• Implement research experiments involving tasks with performance uncertainty in test facility
- Vary tempo, size, uncertainty, information

• Develop algorithms to interact with operators in alternative roles
- Supervisory control
- Team partners

• Extend existing algorithms to different classes of tasks
- Area search, task discovery, risk to platforms

• Develop algorithms to assist operators in predicting behavior of automata teams in uncertain environments

• Collaborate with MURI team to design and analyze experiments involving alternative structures for human-automata teams
