decision making for autonomous navigation abstract...! 1! decision making for autonomous navigation...

1

Decision Making for Autonomous Navigation

Abstract

Robots can automate a wide range of physical tasks from warehouse management to

space exploration. Many of these tasks require robots to navigate autonomously. Such a

robot should continuously choose and execute actions from a set of available actions until

it reaches the destination. The robot’s decision should avoid collisions and other

damaging outcomes while it ensures favorable ones, such as reduced travel distance and

time. This paper reviews decision-making techniques for effective navigation. In

particular, it reviews different methods to acquire information about the environment and

how this information can be used for effective decision making.

1. Introduction

Robots are machines that effect changes in the world inspired by human or animal

physical capabilities. A robot is autonomous if it can complete an assigned task with no

human intervention. Autonomous robots are increasingly used to automate routine tasks

in areas such as industrial automation and warehouse management, or tasks in areas that

are dangerous for people such as space exploration, military missions, and search and

rescue operations.

Mobile robots are robots that can move in a two-dimensional (2-D) or a three-

dimensional (3-D) world. This paper will focus on mobile robots that move in a 2-D

world. The success of a mobile robot depends on its ability to navigate, that is, to

ascertain its current position in the world and reach an assigned location. Mobile robots

that navigate autonomously are used to move goods in a warehouse (Sun et al. 2014),

2

carry out mundane tasks such as vacuuming (Forlizzi & Disalvo 2006), or transport

people in vehicles (Petrovskaya & Thrun 2009). Mobile robots are also used to explore

dangerous or unknown locations in space missions (Maimone et al. 2007) or search and

rescue operations (Davids 2002). Although these applications have different requirements

and criteria for success, they all rely on autonomous navigation.

Most commercially available mobile robots have a hardware/firmware layer and a

software layer. The hardware layer includes sensors and actuators. Sensors measure or

detect quantities in the environment, (e.g., light, heat) and convert them into digital

information. Actuators are motors that cause physical actions. The software layer accepts

sensor information as input, and generates commands to the robot actuators such that the

robot completes its assigned task. This separation of hardware systems from the

autonomous decision-making software has significant advantages. It allows robot

manufacturers to build standardized robots without application-specific details, and

allows robot users the flexibility to implement customized decision-making algorithms

based on the application. This paper focuses on such decision-making algorithms for

autonomous navigation.

In computer science, decision making involves several subfields of artificial

intelligence; including planning, reasoning, and machine learning. Planning organizes

future actions to achieve an assigned goal. Reasoning infers the truth-value of statements

from known facts. Learning uses past experience to improve decision making, so that the

performance of a robot on a task improves with its experience.

The remainder of this paper describes decision making, planning, learning, and

other techniques used to build autonomous navigation systems for mobile robots, with a

3

particular focus on how knowledge about the environment is acquired and used. Section 2

describes the basic hardware components of a mobile robot and classifies various mobile

robot settings. It also provides an overview of different decision-making approaches and

describes the role of knowledge acquisition in decision making. Section 3 focuses on

knowledge acquisition, that is, how raw sensor data can be converted into information

useful for decision making. Sections 4, 5, and 6 address deliberative, reactive, and hybrid

decision-making approaches to navigation, respectively. Section 7 briefly describes

spatial knowledge and decision making in humans. Finally, Section 8 discusses important

challenges in current autonomous navigation systems.

2. Mobile robots

This section introduces basic components of a mobile robot and describes the different

class of mobile robot systems based on their hardware capabilities, the environment they

inhabit, the tasks they are assigned, and the decision-making approach they adopt. Mobile

robots use a broad variety of sensors and actuators. Sensors can be classified as active or

passive. Active sensors use their own energy source. They emit radiation that is directed

toward the target to be investigated. The radiation reflected from that target is then

detected and measured by the sensor. The key advantage of an active sensor is that it is

less sensitive to external changes, such as time of day and season. Range-finding sensors

(e.g., LiDAR, SONAR, and UltraSonic sensors) are active sensors. They measure the

distance to the nearest obstacle by the emission of light or sound waves. LiDAR uses

light waves; SONAR and Ultrasonic sensors uses sound waves. The reflected light or

sound wave is detected, and the distance between the sensor and the obstacle is calculated

4

based on the delay until the arrival of the reflected wave. In robotics, range-finding

sensors are used to detect obstacles, map the environment, and avoid collision.

A passive sensor measures the naturally available signals in the environment.

Motion sensors and vision-based sensors are the commonly used passive sensors. Motion

sensors are primarily used as an odometer to estimate the position of the robot. Motion

sensors measure various aspects of robot movements. Inertial motion sensors measure the

inertial force experienced by a moving object; wheel encoders estimate the rotation of the

wheel. Vision-based sensors measure electromagnetic radiation in the environment. They

are most commonly used to identify landmarks along a route, but they can also serve as

odometers. There are also hybrid sensors that are both passive and active. Kinnect™ is

one such hybrid sensor. Kinnect™ passively captures visual information with a camera. It

also makes active depth measurements by its emission and detection of infrared radiation.

Kinnect™ sensors are often used to localize and identify landmarks (Veloso et al. 2012).

(a) (b) (c)

Figure 1: Robot models with different actuators and sensors. (a) Surveyor with treaded wheels. (b)

Turtlebot with a Kinnect™ Sensor. (c) Sony’s AiBO robot with legs.

5

Wheels and legs are the two primary kinds of actuators used in mobile robots. As

in Figure 1, Sony’s AiBO1 robots have legs where each joint is controlled by a motor.

Other robots (e.g., Surveyors2 and Turtlebots3) use motorized wheels. Indeed, any

standard vehicle (e.g., a car or a truck) can also be viewed as a wheeled robot. Wheeled

robots predominate among mobile robots because of their simplicity. Compared to a

robot with legs, a wheeled robot has fewer moving parts, which makes decision making

more tractable. Robots with legs often have more motors, whose simultaneous control

requires complex decision making.

Mobile robot systems can be classified, based on the number of robots, as single-

robot or multi-robot systems. Single-robot systems do not require task allocation,

teamwork, or collaboration. Multi-robot systems can parallelize tasks and hence reduce

the task completion time. They are also more robust because hardware or software failure

on a single robot does not necessarily result in task failure. Multi-robot systems are

classified based on whether the robots complete tasks individually or through

collaboration. They can also be classified by whether a central server makes decisions for

all robots or if each robot controls itself.

The sensing capabilities of a robot can be classified as local or global. Local

sensing captures information about the immediate environment through onboard sensors

(e.g., cameras, range-finding sensors). Global sensing captures a global perspective of the

environment with an external sensor like GPS or an overhead camera. In many cases,

mobile robot systems rely on both kinds of sensing.

1 http://www.sony-aibo.com 2 http://www.robotmarketplace.com/products/IL-SRV1Q.html 3 http://www.turtlebot.com

6

Navigational tasks assigned to the robots can be classified as exploratory or

target-based. Exploratory tasks focus on the exploration of a given area without any

particular target location. Target-based tasks expect the robot to travel to one or more

given targets. In navigation tasks with multiple targets, either the target set is known in

advance and is given to the robot before it begins execution, or the target set is not known

in advance and is assigned to robots as and when new targets arrive.

Finally, a robot’s environment can be classified as structured or unstructured.

Structured environments, such as offices or warehouses, have certain geometric

regularities (e.g., rooms, corridors) that allow a robot to make simplifying assumptions

that assist in navigation. In an unstructured setting these assumptions do not hold, and

more domain-independent approaches are required. Environments can also be classified

as static or dynamic. Environments where no features change during task execution are

called static environments. If features of the environment change during the execution of

the task the environment is called dynamic.

The acquisition of knowledge is central to decision making in autonomous

navigation. The knowledge required for autonomous navigation can be classified as

knowledge about the environment, knowledge about the position and orientation of the

robot in the environment, knowledge about an effective set of parameters for a robot in

the given environment, and knowledge about the robot’s goal. This knowledge can be

acquired continuously with sensors. Such knowledge will be up to date and need not be

stored internally. Another approach is one-time acquisition of knowledge that is then

stored internally for future use.

7

Reactive systems use only continuous sensor data without any stored internal

knowledge. Reactive systems are rule-based, fast, and can quickly respond to changes in

the environment. They can also be sub-optimal because they make decisions based only

on the immediate situation and ignore the future implications of a decision.

In contrast, deliberative systems require an internal representation of knowledge.

That knowledge can be either pre-defined or learned through sensor data. A deliberative

system uses its knowledge to predict the future states of a robot and to select actions that

would lead the robot to its goal. These methods work well only when the external

environment remains consistent with robot’s internal representation. Most approaches are

hybrids; they acquire some components of knowledge continuously from sensors and

store others. Whether or not it stores knowledge, every navigation system must acquire

and apply knowledge about the environment, about the position and orientation of the

robot, about its parameter settings, and about the goal.

3 Knowledge acquisition

This section describes the components of knowledge required for navigation. Mapping

generates a map of the environment. Localization determines the position and orientation

of the robot (robot state) in the environment. Simultaneous localization and mapping is a

knowledge acquisition problem where both mapping and localization is done

simultaneously. Estimation of parameter values that optimize robot performance in a

given environment is also discussed. Finally, this section ends with a survey of different

approaches to order tasks assigned to a robot.

8

3.1 Mapping

A map is a diagrammatic representation of an environment that represents features useful

for navigation. Metric maps capture the geometric properties of the environment;

topological maps capture the connectivity among different regions.

Two important approaches to mapping based on metric maps are those that use

geometric shapes (Chatila & Laumond 1985) and those that use probabilistic grids called

occupancy grids (Elfes 1989). Early mapping work used convex polygons to model

empty space (Chatila & Laumond 1985). Service robots in museums or factories could

use these techniques to build a map of the environment. The environment can be initially

controlled to ensure that there are no moving obstacles; once an accurate map is built, the

robot can then start to navigate. This approach has two main difficulties: geometric,

shape-based mapping provides no mechanism to incorporate uncertainties due to sensor

or actuator errors or changes in the environment, and it applies only to environments with

regular geometric structures.

In contrast, occupancy grids try to address both these problems with a

probabilistic grid that models a maps (Elfes 1989). This approach partitions the map into

a grid, each of whose cells stores a probabilistic estimate of whether it contains an

obstacle. This allows the use of techniques such as Bayesian estimation, where an initial

belief about the occupancy of the cell is continuously revised as more evidence is

collected from sensor measurements.

Topological approaches collect information about the connections among the

various regions in the map (Mataric 1992; Kuipers & Byun 1991). In one approach,

robots use sensors to detect and follow walls (Mataric 1992). A compass mounted on the

9

robot detects the orientation of the walls. As the robot tries to follow one wall after

another, it creates a network of such landmarks, and uses the network as a topological

map for navigation. In another approach, the robot randomly explores the environment

while it simultaneously identifies places with distinctive characteristics based on its

sensor measurements (Kuipers & Byun 1991). For example, a robot in a room could use

“equal-distance-from-nearby-obstacles” as its metric to correctly identify the middle of

the room. After the robot reaches a distinctive place, it randomly chooses and follows an

applicable control strategy such as “follow-the-midline-in-a-corridor” or “walk-along-

the-edge-of-a-large-space”. If a new distinctive place is found, then the two distinctive

places are treated as nodes and the control strategy between them is recorded as the edge

that connects the nodes.

Hybrid approaches use both topological and metric information for navigation

(Thrun 1998). Metric information is stored as an occupancy grid; a trained neural

network converts the sensor measurements into the values in the grid. Planning over an

occupancy grid depends on the granularity and can be suboptimal for fine granularity. A

topological map makes planning more efficient, since it has fewer regions and hence a

smaller state space compared to an occupancy grid. A Voronoi diagram converts the

occupancy grid into a topological map. A Voronoi diagram (Figure 2(a)) is a set of

points in free space that have at least two distinct nearest points. Critical points are points

in the set where the distance to its nearest neighbor reaches a local minimum. Critical

lines connect critical points to its nearest neighbors (Figure 2(b)). These critical lines

partition a map into topological regions (Figure 2(c)). These topological regions are then

10

converted into a graph where nodes represent the regions and edges represent adjacencies

between regions (Figure 2(d)).

(a) Voronoi diagram (b) Critical lines

(c) Regions (d) Topological graph

Figure 2: Extracting the topological graph from the map (Thrun 1998, p. 44).

3.2 Localization

Localization is a fundamental ability for navigation. There are two kinds of localization:

position tracking, which finds the position of the robot relative to its initial position, and

global localization, which finds the global co-ordinates of the robots.

Dead reckoning can be used for position tracking. It continuously measures the

motion of the robot and estimates a position relative to the robot’s initial position. The

robot’s motion can be estimated either by odometers that measure the speed of the wheels

or by inertial sensors, such as accelerometers, that measure the inertial forces experienced

11

by the robot during motion. Probabilistic approaches can improve the performance of

dead reckoning. One approach is to estimate the probability distribution of the next

position of the robot in the map, given the sensor measurements and its previous

positions. The Markov assumption reduces this dependency to only the previous position.

Kalman filters are commonly used to estimate this probability distribution (Montemerlo

et al. 2002). Kalman filters assume that a Gaussian distribution can model the actuator

and sensor errors.

The two most common approaches to global localization are landmarks and

sensor matching. A landmark is defined as a distinctive and recognizable feature of the

environment. The robot identifies and stores landmarks encountered in a map. The robot

is localized when a new landmark matches a stored landmark. Landmark-based

approaches are common in robots that use vision-based sensors. The robot can use scale-

invariant features (Lowe 1999) of the environment as landmarks (Se et al. 2001). Scale-

invariant features do not change under image translation, scaling, or rotation. This makes

them suitable landmarks for mobile robot localization. Given a database of such features,

a matching algorithm can be used to find the global coordinates of the robot.

Sensor matching uses all available sensor information to estimate the robot’s

global position. Monte Carlo Localization (MCL) is one such global localization

technique. It computes the probability distribution of the robot’s position in the

environment given all sensor measurements and a global map (Thrun et al. 2001). MCL

applies sampling-based methods to approximate the distribution. The key idea is to

maintain a probability distribution of the robot’s position as a set of random position

samples (particles) drawn from the probability distribution. Each particle is a possible

12

position for the robot. The set of particles constitute a discrete approximation of the

probability distribution. Weights are then assigned to each particle based on the sensor

information. These weights reflect changes in the probabilities of the possible positions

of the robot, given the new evidence from the sensors. Once the robot moves, it computes

a new particle set that represents the robot’s latest position. Both the old particle set and

its weights are used to compute the new particle set. This ensures that the new particle set

incorporates the evidence from the sensors. Because the number of particles can be

adjusted, MCL allows a tradeoff between accuracy and computational cost, which makes

it a popular localization algorithm.

3.3 Simultaneous localization and mapping

Simultaneous localization and mapping (SLAM) generates a map of the environment and

simultaneously determines the robot’s location in that map. This is accomplished by

detection of new landmarks and identification of observed landmarks as the robot moves

in the environment. When a robot moves, a noisy odometer estimates the new position.

Landmarks from the environment are extracted from this new position. The robot

attempts to associate these just-identified landmarks with earlier observed landmarks. Re-

observed landmarks are then used to improve the estimation of robot’s position.

The initial approach to SLAM proposed the use of an extended Kalman filter

(EKF) to estimate the distribution of the robot’s position and the distribution of landmark

positions (Smith et al. 1987). Key limitations of EKF-based approaches are that they are

slow and scale quadratically with the number of landmarks (Montemerlo et al. 2002).

FASTSLAM, a popular SLAM method based on particle filters, has been shown to scale

logarithmically with the number of landmarks (Montemerlo et al. 2002). An important

13

requirement of the SLAM-based approach is the detection and identification of enough

landmarks to ensure accurate localization. A large number of landmarks, however, imply

a large computation time, which might be inappropriate for real-time applications. This

inherent tradeoff between speed and accuracy remains an important issue in SLAM.

3.4 Parameter estimation

The identification of optimal parameter values to control robot navigation is called

parameter estimation. In reactive navigation systems, the parameters are often used to

weight different behaviors (Maes & Brooks 1990; Ram et al. 1994). In one approach, the

correlation between behaviors and conditions that result in positive reinforcements are

learned, and the behaviors with a high probability of positive reinforcement are preferred

(Maes & Brooks 1990).

Another approach to parameter estimation uses genetic algorithms (Ram et al.

1994). Initially, random weights are assigned to the behaviors of the robots. The robots

are run in simulation and their performance on a navigational task is recorded. Many such

sets of weights are generated and their performance is evaluated in simulation. A new

generation of weights is created from the combination of weights with better performance

from the previous generation and they are again tested in simulation. The process is

repeated until a termination condition is reached; then the weights with best performance

are selected.

A robot that learns to adapt to changes in its environment greatly improves its

utility. For example, a robot that can switch from one kind of sensor to another to account

for variations in lighting conditions (e.g., learn to use a vision camera during the day and

an infrared sensor at night) can be more useful. A robot that can adapt its behavior based

14

on feedback from its users is also useful. Such a robot could learn to navigate differently

in different settings, and optimize for time or distance travelled.

3.5 Goal Knowledge

Navigation goals set targets and order them, that is, they specify when a visit to one target

should precede a visit to another target. For a single robot with multiple unordered

targets, target ordering seeks an ordering that maximizes efficiency. In such cases, target

ordering can be modeled as a traveling salesperson problem. In other cases, target sets are

partially ordered or completely ordered. In scenarios such as manufacturing where the

robot has to move to different parts of a factory floor to follow a predefined workflow

that combines components to build products, the target set might be completely ordered

and the robot’s task is only to reach the targets in the specified order. Other scenarios

might specify the target set as a partial order. For example, if the goal is to shift identical

chairs from one room to another, the robots must visit the first room to pick a chair up

and then visit the destination room to place that chair. There are no restrictions, however

on which chair to pick up first or where in the destination room to place the chair.

A partially or unordered target set can also be dynamic. The target set is dynamic

either if the target’s position changes over time (the moving target tracking problem)

(Huang 2009) or if targets are dynamically added and deleted. To operate successfully in

such an environment, the robot should be able to re-compute the optimal target order

during the task. Another way to address dynamism is to rely on heuristics (e.g., visit the

closest point first) that can provide possibly suboptimal but quick solutions.

In multi-robot systems, target allocation partitions a set of targets among the

robots, in a way that improves performance. Optimal target allocation is computationally

15

expensive, because it must find both an optimal partition and an optimal target order for

every allocation. When both the robot’s environment and the set of targets are static,

combinatorial auctions can be used to allocate targets (Berhault et al. 2003). Each robot

bids for a bundle of targets. The targets are allocated to the lowest bidder. Since the

number of possible bundles increases exponentially with the targets, finding an optimal

bidding strategy is a hard problem.

4. Deliberative decision making

This section describes several planning mechanisms relevant for navigation. Given an

internal representation of the map, the robot state, parameter values, and a goal, a planner

should be able to generate a sequence of actions (a plan) that will accomplish its goal,

provided the robot’s actuators execute the action accurately. When the task is navigation,

a plan can be a sequence of waypoints on a map. The identification of this sequence of

waypoints is called pathfinding. There are many different ways to plan including classical

planning, graph-based approaches, sampling-based approaches, and planning in non-

deterministic settings.

4.1 Classical Planning

Classical planning uses formal logic to represent and create a plan. In classical planning

methods, the map, robot state information, robot action repertoire, and goal information

are stored as logical formulas, and the planner uses theorem proving to prove that

execution of the plan will satisfy the goal conditions. Classical planning can address

planning in a state space, a plan space, a planning graph, or as a constraint satisfaction

problem.

16

In state-space planning, the search space is a set of world states, transitions

between states are actions, and a path through the state space that begins at the start state

and ends in a goal state is a plan. STRIPS was an early a state-space planner that used

theorem proving to test if the state that eventually results from the execution of actions

listed in a plan would satisfy the goal condition (Fikes & Nilsson 1971). STRIPS used

means-ends analysis as a heuristic to guide search. ABSTRIPS improved on STRIPS; it

built an abstract plan first, and moved towards a more detailed version of the plan only if

the abstract plan succeeded (Sacerdoti 1974). This detection and elimination of

impossible search spaces allowed ABSTRIPS to plan faster.

In plan-space planning, the search space is a set of plans (including partial plans).

The initial state is a null plan and transitions are plan operators that add or reorder actions

to modify plans. Plan-space planning methods follow the principle of least commitment

and create a partial order of actions. Actions are ordered only when necessary; this

reduces backtracking, as it avoids a situation seen in state space planning called

Sussman’s anomaly (Sacerdoti 1975). For example, NOAH, a non-linear planner, fulfills

unsatisfied preconditions by adding actions to the plan (Sacerdoti 1975). It then uses

procedures called critics to confirm that the additions do not harm previously satisfied

preconditions. Actions that cause conflicts are ordered to resolve any conflicts. Because

actions are ordered only when required (in this case is to resolve conflicts), NOAH

follows the principle of least commitment. TWEAK is an improvement over NOAH; it is

a simpler, provably complete and correct non-linear planner that used a STRIPS-like

action representation (Chapman 1987). TWEAK, however, is restrictive; its

representation does not allow conditions and universal quantifiers. UCPOP, another

17

partial-order planner, allows universal quantification and conditional effects (Penberthy

& Weld 1992). The efficiency of UCPOP is further improved in RePOP, where heuristics

ensure that the partial-order plan most likely to lead to a solution is explored first

(Nguyen & Kambhampati 2001).

A planning graph is a directed graph where levels alternate between proposition

nodes and action nodes. An example appears in Figure 3. An edge connects an action in

one level to a proposition in the next level whenever the proposition forms a post-

condition of the action. An edge also connects a proposition in one level to an action in

the next whenever a proposition satisfies a pre-condition of an action. Two actions

interfere with one another if the outcome of one action reverses the outcome of the other,

or if the outcome of one action negates the preconditions of the other. Planning graphs

are more compact than their corresponding state-space based graphs, that is, they have

fewer nodes because they allow actions to interfere. To produce a plan, a search

algorithm is run on the planning graph in stages.

Figure 3: Example of a planning graph with alternate levels of actions and propositions (Blum 1997, p. 5).

18

GraphPlan is a planner based on a planning graph and a STRIPS-like

representation (Blum 1997). GraphPlan begins in stage 1 with a planning graph whose

single proposition level represents the initial conditions. In stage i, GraphPlan takes the

planning graph from stage i−1, extends it one time step (the next action level and the

following propositional level), and then searches the extended planning graph for a valid

plan of length i. GraphPlan’s search either finds a valid plan (in which case it halts) or

determines that the goals are not all achievable by time i (in which case it iterates to the

next stage). In each iteration through this extend/search loop, the algorithm either

discovers a plan or else proves that no plan with that many time steps or fewer is

possible.

Planners based on heuristic-search focus on how to guide the search through the

state space. Heuristics evaluate future states on their distance to the goal; the state closest

to the goal is explored first. The number of actions in a plan that solves a relaxed version

of the planning problem is used as the measure distance. The Fast-Forward (FF) planning

system uses planning graphs to construct relaxed planning problems whose solutions are

then used to estimate the distance to the goal (Hoffmann & Nebel 2001). The Fast-

Downward (FD) planning system combines heuristic search with hierarchal problem

decomposition (Helmert 2006). Like FF, FD computes plans by heuristic search in the

state space. Unlike FF, however, FD’s heuristic function uses hierarchal decomposition

of the planning task to construct relaxed planning problems. The number of actions in the

plans for these relaxed planning problems is then used to estimate the distance from a

state to the goal. The LAMA planner extends FD with landmarks, propositional formulas

19

that must be true at some point in every plan for a given task (Richter & Westphal 2010).

LAMA then uses landmarks to guide the search to states with many landmarks.

Some approaches have sought to transform the planning problem into a constraint

satisfaction problem, to exploit existing constraint satisfaction solvers. A planning graph

can be formulated as a SAT problem, and general-purpose SAT solvers can be used to

solve the problem (Kautz et al. 1996). In another approach, Local Planning Graph (LPG),

the planning graph is formulated as a propositional constraint satisfaction problem

(Gerevini & Serina 2003). LPG treats the stages of the planning graph as variables, and

the detection of a partial plan in a stage as variable assignment. LPG relies on heuristics

to improve its speed.

4.2 Graph-based Search

In graph-based search, an undirected graph represents the topology and distance

information of a fixed map. The map is divided into a regular grid, where nodes represent

cells and edges connect adjacent cells. Edge weights represent obstacles between the

cells. A route through the map from the initial position of the robot to the goal can be

represented as a path in the undirected graph. This reduces the path-planning problem to

a shortest path problem.

A* is the most common approach to search for a path to the goal state (Nilsson

1980). A* estimates the total cost from the start node to the goal as the sum of the cost to

get to the current node from the start node and a heuristic estimate of the cost from the

current node to the goal node. The algorithm returns an optimal path from the start to the

goal when the heuristic always underestimates the true distance. A* search keeps a queue

of states to be explored and the state with the lowest total cost is explored first.

20

On large maps, A* consumes considerable memory, because the number of states

in the queue grows at an exponential rate. In contrast, Iterative Deepening A* (IDA*)

only requires memory linear in the length of the solution that it constructs (Korf 1996).

IDA* is an iterative algorithm that begins with an initial cutoff for the total estimated cost

of the solution. Each iteration begins with a depth-first search for a solution whose

estimated total cost is less than the cutoff. If a solution is not found, the cutoff is

incremented and depth first search is repeated. Hierarchal planning A* (HPA*) is a near-

optimal algorithm that uses the hierarchal properties of path planning to save computation

(Botea et al. 2004). HPA* partitions the map into neighborhoods, and finds a path to the

border of the neighborhood that contains the start location. Next, it finds a path from the

border of the start neighborhood to the border of the neighborhood around the goal.

Because there are fewer neighborhoods than nodes in a grid-based graph, computation of

such a path is fast. Finally, HPA* finds a path from the border of the goal neighborhood

to the goal. All three paths combined forms a solution that is both quick to compute and

near-optimal.

4.3 Sampling-based approaches

Unlike simple path planning, motion planning does not assume that the robot is a single

point in space. Instead, motion planning considers the problems that can arise when

robots are treated as solid-bodied objects. The high dimensionality of motion planning

renders several of the standard planning approaches ineffective. Sampling-based

approaches are commonly used instead. Rather that an explicit representation of the

environment, sampling-based algorithms rely on collision checking to test whether two

randomly sampled states have a feasible trajectory. A graph of feasible trajectories is

21

built from multiple sampled states, which is then used to construct a solution to the

original motion-planning problem. Although sampling-based algorithms are not

complete, they provide probabilistic completeness guarantees, that is, the probability that

the planner fails to return a solution, if one exists, decays to zero as the number of

sampled points approaches infinity.

There are two principal sampling-based approaches: Probabilistic RoadMaps

(PRMs) (Kavraki et al. 1996) and Rapidly exploring Random Trees (RRT) (Zucker et al.

2007; Ferguson et al. 2006). As shown in Figure 4, PRM is a graph whose nodes

represent valid states, chosen at random from a state space. In motion planning, a state

includes not only the position of the robot, but also the orientation of the robot’s body.

Each state is represented as a node in a graph. A set of such nodes is chosen randomly. A

local planner checks for a path from one state to other nearby states (within distance d

from the given state, where d is pre-defined) and, if a path is found, the nodes that

represent the states are connected with an edge. The resultant graph becomes the input to

a graph search algorithm that then computes the shortest path from the initial state to the

goal state.

The RRT algorithm randomly builds a tree to search a non-convex, high-

dimensional space. RRT chooses a state at random, and connects it to the nearest state in

the tree, provided that a collision-free path exists between them. A local planner that

searches for a plan of pre-defined length l is used to test for the existence of a collision-

free path. Each iteration adds a new state to the tree, and the tree grows with a bias

toward unexplored areas of the search space. An example appears in Figure 5. Growth of

the tree stops when the goal state is randomly chosen and added to the tree. The tree is

22

then used as an input to a search algorithm, which returns a path from the initial state to a

goal state.

Figure 4: PRMs (Choset 2005, p. 245). The circles denote randomly sampled states and the lines denote

feasible trajectory. The dark lines form a path from initial state (qinit) to final state (qgoal).

Figure 5: Growth of RRTs biased towards unexplored spaces (Kuffner and LaValle 2000, p. 3).

4.4 Planning in non-deterministic settings

In many domains a robot’s actuators are imperfect, which leads to uncertain outcomes. In

such domains the state of a robot after an action can be represented as a probability

23

distribution over the possible states. A Markov decision process (MDP) can be used to

model such domains. An MDP framework, however, assumes that there is complete

observability (i.e. the robot can accurately detect its current state).

An MDP can be described as a tuple <S, A, T, R, π >, where:

• S is a finite set of states of the world. A continuous world can be converted into a

finite set of states by discretization.

• A is a finite set of actions.

• T : S × A → p(S) is a state-transition function, which for every state s ∈ S and

action a ∈ A, the function returns a probability distribution over the world states.

• R : S × A → ℝ is the reward function that returns expected reward gained by the

robot when it chooses an action in a specific state.

• 𝜋: S → A is policy function that specifies the action to be chosen in any given

state.

The objective of an MDP framework is to find an optimal policy function that results in

the maximum reward accumulation, as defined by the reward function. Value Iteration

(VI) is an iterative algorithm to find such an optimal policy. A value function returns the

maximum reward that the robot could get from a particular state if it is allowed to make a

limited number of decisions. VI initially computes the value function assuming the robot

is allowed make only one decision. In the next step, it iteratively computes the value

function if the robot is allowed to make two decisions with the previously computed

value function. This is repeated until the value function is found for the desired number

of decisions. The optimal policy is then to choose actions that should lead to states with a

maximum value as defined by the value function.

24

In many domains, a robot with noisy sensors may not able to determine its current

state with complete reliability. Partially Observable MDP (POMDP) provides a

framework to obtain optimal policies in such domains (Kaelbling et al. 1998). A POMDP

can be described as a tuple <S, A, T, R, O, Of> where,

• S, A, T and R describe a Markov decision process;

• O is a finite set of observations the robot can experience through its sensors. A

continuous valued sensor can be discretized to generate a finite observation set.

• Of : S × A → p(O) is the observation function which returns for each action and

resulting state a probability distribution over possible observations

Because POMDPs do not know the current state accurately, they maintain a probability

distribution over states called a belief state. Some approaches have a separate component

called State Estimator (SE) that updates the belief state based on the last action, the

current observation, and the previous belief state (Figure 6). The goal is to generate a

policy 𝜋 that could suggest optimal actions from any belief state. An important aspect of

POMDPs is that there is no difference between an action chosen to change the state of the

world and action chosen to gain information. POMDPs have several drawbacks, however.

First, the generation of an optimal policy is computationally expensive, so such a policy

cannot be generated for large domains. Second, POMDPs are based on a stochastic model

of the transition and reward function, so any policy generated is optimal only for that

model; any change in the stochastic nature of the environment results in suboptimal

results.

25

Figure 6: Relationship between state estimator, policy and the world (Kaelbling, Littman, and Cassandra

1998, p. 106).

Online POMDPs, unlike offline POMDPs, interleave policy construction and

execution (Figure 7) (Ross et al. 2008). The online approach tries only to find a good

local policy for the current belief state of the robot, so it considers only states reachable

from the current belief state. This focuses computation on a small number of states. Once

this local planning phase terminates, the execution phase executes the best action from

the local policy and updates the current belief based on the observation obtained.

Figure 7: Policy construction and execution in online and offline POMDPs (Ross et al. 2008, p. 671).

An improvement in performance of online POMDPs is made by the application of

Monte Carlo sampling (Silver & Veness 2010). Instead of search over all possible

observations, a search over a sampled set of observations, and beliefs reached by the

sampled observations allows for a deeper search within a given planning time.

26

5. Reactive decision making

Figure 8: Subsumption Architecture (R. Brooks 1986, p. 7).

This section reviews various architectures built for reactive decision-making. Rod

Brooks, in 1991, argued that intelligence can be achieved without any internal

representation (Brooks 1991), based on two important observations. First, the acquisition

of accurate world information through noisy sensors is difficult. Second, even if an

accurate internal representation were built, it would have to be updated constantly

because most real-world robotics deals with dynamic environments. Given these

arguments, Brooks’ approach was to replace internal representations with sensor

information and predefined rules that computed the decision reactively. Without any

internal representation, the problems of accurate knowledge acquisition and stale

knowledge are avoided. His reactive subsumption architecture is based on behaviors

organized into layers, where decisions from higher-layer behaviors override decisions

from lower-layer behaviors, as shown in Figure 8 (Brooks 1986). Behaviors at each layer

receive sensor information and make independent decisions that are sent to the actuators.

27

Toto, a mobile robot system, was built on the subsumption architecture (Brooks

1986). Its low-level behaviors enabled it to wander without collision. An intermediate

level of behavior recognized such landmarks as walls and clutters. Each time it

recognized a landmark, a behavior allocated itself as the 'place' of that particular

landmark. Behaviors that corresponded to physically adjacent landmarks had neighbor

relationship links activated between them. A graph structure was thus formed, where the

nodes of the graph were active computational elements rather than static data structures.

As the robot moved in the environment, the nodes tracked its location. Nodes became

active if they could detect that they corresponded to the robot’s current location. Thus the

robot had both a map, and a sense of where it was on the map, from a distributed

computational model.

AuRA also builds robot decision systems as a collection of parallel behaviors

(Arkin 1990). It includes motor schemas that are primitive movement behaviors, and

perceptual schemas that are behaviors to sense actively the required information from the

environment. Each motor schema recommends a possible move in the form a vector with

direction and magnitude. The sum of all such vectors represents the final decision that is

then executed by the robot.

Pengi consists of a collection of rules (Agre & Chapman 1987). These rules were

guided by the properties of the immediate situation. With its simple scheme of rules,

Pengi plays a fairly complicated game without an explicit internal representation of the

environment. Although reactive methods are fast and can handle dynamism well, the fact

that they do not exploit available knowledge to make long-range plans limits their

efficiency.

28

6. Hybrid decision making

This section reviews different approaches used for robot navigation in a dynamic

environment such as dynamic obstacles (e.g., people moving on a factory floor) and

dynamic maps (e.g., roads blocked for construction). Given a navigation plan as a

sequence of subgoals, changes in the environment could render it useless. Because not all

environmental changes can be predicted, robots have to rely on their sensors to gather

information about them. Thus the reliance on only an internal representation does not

serve well in realistic scenarios. Instead, hybrid architectures try to adapt deliberative

behavior in the light of sensor information about the changes in the world. Such

architectures are also reviewed in this section.

6.1 Obstacle Avoidance

If there are moving obstacles in the environment, the robot must avoid them (obstacle

avoidance) and continue to follow its earlier plan, revise it, or create a new plan. It is

typically assumed that the robot can sense the location and the velocity of the obstacle. A

common approach to obstacle avoidance is first to plan a path to avoid static obstacles

and then use local information about dynamic obstacles to divert minimally from that

plan. How to minimize the cost of diversion is treated as an optimization problem

(Chhugani et al. 2009). Other approaches to obstacle avoidance include reactive rules

(Sun et al. 2014) and potential fields (Khatib 1985). Although such rules are

computationally fast, they suffer from local minima and do not guarantee optimal

solutions (Koren et al. 1991).

6.2 Dynamic Maps

29

When maps are partially known or change over time, A*-generated plans can fail. Using

A* to replan whenever a plan fails is computationally expensive. To address this need for

repeated planning, incremental search algorithms like D* (Stentz 1994), focused D*

(Stentz 1995), LPA* (Koenig et al. 2004), D* lite (Koenig & Likhachev 2002), and

MPGAA* (Hernández, Baier, and Asin 2014) have been proposed. Incremental search

algorithms assume that the unknown parts of the map have no obstacles, and find a

shortest path to the goal under this assumption. The robot then follows the path. When

the robot observes new map information it adds the information to the map and, if

necessary, generates a new shortest path to the target from the current position. This new

path generation is more efficient because it uses information from the earlier planning

process.

D* search starts from the goal state and explores until the robot’s initial state is

reached. D* search maintains a list of states to be evaluated, along with the shortest

distance from each state to the goal state. The list begins with the goal state. In every

iteration, D* chooses the state closest to the goal state from the list and adds its neighbors

into the list. When the start state is added to the list, the search stops, and the path to the

goal can be found by backtracking. The robot then follows this path to the goal. When the

robot encounters a new obstacle, cost updates are propagated from the position of the

obstacle to the robot. D* is efficient because most changes in the map are detected near

the robot, and therefore require fewer cost updates.

Focused D* improves upon D* with a heuristic that focuses the propagation of

cost changes toward the robot. LPA* returns the shortest path in a changing graph, but

with its start and end positions held constant. Unlike D*, which is similar to uniform cost

30

search, LPA* is similar to A* search. Each node in LPA* uses heuristics to maintain an

estimate of the path length through that node. When edge costs change, these estimates

are updated and the updated estimates are used to re-compute the plan. D* lite adapts

LPA* to work when the initial position of the robot changes as the robot moves.

MPGAA*, introduced in 2014, is an empirically faster yet simpler algorithm than

D* lite. MPGAA* initially finds an optimal path from the robot to the goal using the A*

algorithm. After the robot’s environment changes, MPGAA* efficiently re-plans with

two main ideas. First, it uses the information from the initial A* search procedure to

improve the heuristic estimate to the goal. This makes subsequent searches faster.

Second, it saves all the previous optimal paths and tries to reuse them instead of

repeatedly planning.

Another important search variation is anytime A*, an incremental algorithm that

allows the user to query for a possibly suboptimal solution at any time during search.

This is useful for robots with real-time requirements (Likhachev et al. 2005).

6.3 Hybrid Architectures

Many hybrid architectures incorporate both sensor information and static internal

representations of the environment in decision making (Firby 1987; Connell 1992; Gat

1997; Goodwin et al. 1997; Ranganathan & Koenig 2003; Rosenblatt 1997; Arkin 1990).

Reactive planning selects plans based only on the current situation without any lookahead

(Firby 1987). Reactive planning uses a set of pre-defined procedures called Reactive

Action Planners (RAP). Every RAP is a specially designed search procedure that chooses

actions only based on the current world state, until the assigned goal state is reached. The

goal-dependent design of its search procedures makes the search more efficient than brute

31

force, while dependency on only the current state ensures that the search algorithm works

despite changes in the environment.

Many three-tier hybrid architectures have been proposed with the following

structure: a reactive tier of behaviors that directly map sensor values to actuators with no

internal state, a sequencer whose algorithms sequence the behaviors to achieve complex

objectives, and a slow deliberative planner to find a long-term plan. In 3T, for example,

the reactive layer has primitive behaviors such as wall finding, wall following, and

collision avoidance (Gat 1997). Algorithms in its sequence layer combined the reactive

behaviors to achieve complex objectives. For example, self-localization is achieved

through a combination of wall following and collision avoidance.

Figure 9: DAMN Architecture (Rosenblatt 1997, p. 2).

Rather than a top-down structure to achieve the desired symbiosis of reactive and

deliberative behavior, DAMN, shown in Figure 9, takes a horizontal approach, where

reactive and deliberative modules concurrently vote to control the robot (Rosenblatt

1997). DAMN’s slower deliberative modules vote at a slower rate, while the faster,

reactive modules vote at a faster rate. Votes from different modules are combined based

on a pre-defined set of weights given to the modules. These weights control the reactive

and deliberative tendencies of decision making in DAMN.

32

7. Human Navigation

Figure 10: (a) Graph. (b) Labelled graph. (c) Survey knowledge (Chrastil and Warren 2013, p. 2).

This section briefly reviews relevant research in psychology about how people capture

spatial information and navigate. Evidence from psychology suggests that the spatial

knowledge that underlies human navigation includes routes, topological graphs, labeled

graphs, and surveys arranged in increasing level of detail (Chrastil & Warren 2013). A

more detailed representation allows superior navigation, but it is much more difficult to

learn. A route is a list of place-action associations that links an action at each

recognizable place. Routes can guide navigation to a known location along a previously

traveled path. A topological graph, of an environment consists of a network of

recognizable place nodes linked by path edges (traversable paths between nodes). An

example appears in Figure 10(a). It represents the known connectivity of the environment

and enables travel between any two nodes in the graph. In contrast, route knowledge does

not accommodate path intersections or junctions, and restricts travel to only known routes

between specific locations.

In addition to topological information, a labeled graph, (as in Figure 10(b))

incorporates approximate local metric information about distances between known places

(edge weights) and/or angles between known paths (node labels). This local metric

33

information can help find efficient routes and detours, and can permit approximate,

distance-based shortcuts through the local integration of information. Survey knowledge

(as in Figure 10(c)) consists of information about environmental locations, their

coordinates in a Euclidian space. The metric distances between these locations can be

computed from the coordinates. If such a map were built by path integration, it would

enable direct and accurate shortcuts between known locations, instead of just a single

possible route.

Wayfinding identifies a route from the source to the destination. It has been

hypothesized that humans use different wayfinding techniques for different purposes

(Golledge 1999). For example, a trip home after work retraces a learned route, while a

trip to a new location in a known city requires a search over more detailed representations

like a labeled graph or survey knowledge. The use of different wayfinding techniques

requires that people learn multiple spatial representations simultaneously to form a spatial

collage (Tversky 1993). This ability to maintain multiple spatial representations of our

environment and use a combination of them as the situation demands is an important

feature of human navigation.

8. Challenges in Autonomous Navigation

This section identifies significant research challenges in autonomous navigation and

argues that solutions to these problems would improve autonomous navigation.

8.1 Partially ordered targets

Most robots in dynamic environments address target ordering and path planning with two

separate algorithms (Veloso et al. 2012; Berhault et al. 2003). Techniques, such as

34

auctions, generate a target order, and then graph-based dynamic planners (e.g., D* lite)

use the target order as input and generate a plan for the next target in sequence. Although

dynamic planners claim to be optimal, they assume a total order for targets. In many

robotic applications, however, navigational tasks are partially ordered. For example,

moving a box on a factory floor might require the robot to carry the box to one of many

inventory rooms, but not necessarily to a specific room. In scenarios where targets are not

fully ordered, the approach of “first order, then plan” may produce suboptimal results.

When a robot’s environment changes, it might be more efficient to reorder target points

rather than to re-plan. Current graph-based dynamic planners fail to do this.

8.2 Learning relevant information

Most state-of-the-art autonomous navigation systems have a predefined model of the

environment (Veloso et al. 2012; Petrovskaya & Thrun 2009). A path planner then finds

the shortest possible path in that model. An accurate model either requires expensive

sensors or forces the robot to work in a static environment. An ideal autonomous

navigation system should learn, from experience, how to ignore information about the

world that is irrelevant to navigation, and how to extract only the relevant information.

Plans in such an ideal navigation system would allow for inaccuracies of the world

model. If the plan fails due to an inaccurate model the robot should be able to identify the

part of the model caused the failure and improve it accordingly.

8.3 Multi-robot navigation

In many applications where redundancy is important and the task is parallelizable, a team

of robots is more appropriate than a single robot. For example, in a search and rescue

scenario a team of robots might be better able to survive harsh conditions. There, failure

35

of a single robot does not mean failure of the entire mission, because the job of a failed

robot can be transferred to a working robot. Although A* is effective in moderate-sized

maps, multi-robot planning even in a static environment is computationally expensive

and does not scale well. The dynamic version of multi-robot planning is even more

difficult.

Another problem in multi-robot navigation is the selection of the correct team size

for the task at hand. For example, on a factory floor with multiple homogenous robots

that move material, too many robots can cause congestion and slow the process, while too

few can result in long task completion time.

8.4 Human Interaction

Current autonomous navigation systems that interact with humans either solicit input

from the user or communicate the result of a computation (Petrovskaya & Thrun 2009).

In applications of autonomous navigation, such as autonomous driving, people often have

useful navigation information beyond the destination location. For example, a passenger

in an autonomously driven vehicle might know that a particular street in the city is

blocked or likely to be blocked. As another example, a passenger might wish to avoid a

neighborhood that she considers unsafe. Another example is in mission-critical

applications, such as search and rescue, where additional human knowledge can direct the

robots in unforeseen situations. All these examples demonstrate the usefulness of a

navigation system that can be easily understood by people and can incorporate human

intuition and understanding for better decision making.

8.5 Changing environments

36

Many existing autonomous navigation systems are hybrids that rely on a mixture of

reactive and deliberative approaches. The reactive components of the architecture ensure

that the system is responsive in a dynamic world, whereas the deliberative components

seek to harness potential efficiency gains from long-term planning. The balance between

the two approaches is then determined manually, based on the nature of the domain.

Domains that experience rapid changes might be biased toward reactive approaches,

whereas domains that experience limited changes might be biased towards deliberative

approaches.

In many domains where robots could be deployed, however, (e.g., in airports to

move cargo, on factory floors to move goods, or on roads for autonomous driving) there

are periods of time when the environment changes are rapid and others periods when the

environment remains static. For example, an optimal strategy for rush hour may be

different from an optimal strategy during other, less busy hours. The ability to focus on

deliberative decision making when the world is static and reactive decision making when

the world is dynamic remains an important challenge for navigation.

9. Conclusion

Autonomous navigation by robots helps automate physical tasks that require robots to

move to different places in an environment. This paper has reviewed different approaches

to automate decision making for robot navigation, with the acquisition and use of spatial

knowledge as the central theme. Reactive methods’ dependence on continuous sensor

data allows them to adapt quickly to changes in the environments, while deliberative

methods can make long term decisions which leads to significant improvements in

efficiency. Hybrid approaches, attempt to capitalize on the best of both. Research on

37

human navigation is relevant because many robot applications are in domains where

humans navigate with relative ease. The limitations in the current decision-making

approaches for navigation restrict the widespread use of mobile autonomous robots, and

present interesting research challenges.

10. References

Agre, P.E. & Chapman, D., 1987. Pengi  : Implementation of a Theory of Activity. In Proceedings of AAAI 1987. pp. 268–272.

Arkin, R.C., 1990. Integrating behavioral, perceptual, and world knowledge in reactive navigation. Robotics and Autonomous Systems, 6, pp.105–122.

Berhault, M., Huang, H., Keskinocak, P., Koenig, S., Elmaghraby, W., Griffin, P. & Kleywegt, A., 2003. Robot exploration with combinatorial auctions. In Proceedings of 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp.1957–1962.

Blum, A., 1997. Fast planning through planning graph analysis. Artificial Intelligence, 90, pp.281–300.

Botea, A., Müller, M. & Schaeffer, J., 2004. Near optimal hierarchical path-finding. Journal of game development, pp.1–30.

Brooks, R., 1986. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2(1), pp.14–23.

Brooks, R.A., 1991. Intelligence without representation. Artificial Intelligence, 47(1-3), pp.139–159.

Chapman, D., 1987. Planning for conjunctive goals. Artificial Intelligence, 32, pp.333–377.

Chatila, R. & Laumond, J., 1985. Position referencing and consistent world modeling for mobile robots. In Proceedings of IEEE International Conference on Robotics and Automation 1985, 2, pp.138 – 145.

Chhugani, J., Kim, C., Satish, N., Lin, M., Manocha, D. & Dubey, P., 2009. ClearPath: Highly Parallel Collision Avoidance for Multi-Agent Simulation. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp.177–187.

38

Choset, H.M., 2005. Principles of robot motion: theory, algorithms, and implementation, MIT press.

Chrastil, E. & Warren, W., 2013. From cognitive maps to cognitive graphs. PloS one, 9(11), e112544.

Connell, J.H., 1992. SSS: a hybrid architecture applied to robot navigation. In Proceedings of 1992 IEEE International Conference on Robotics and Automation, pp.2719–2724.

Davids, A., 2002. Urban search and rescue robots: From tragedy to technology. IEEE Intelligent Systems and Their Applications, 17(2), pp.81–83.

Elfes, A., 1989. Using occupancy grids for mobile robot perception and navigation. Computer, 22(6), pp.46–57.

Ferguson, D., Kalra, N. & Stentz, A., 2006. Replanning with RRTs. In Proceedings of IEEE International Conference on Robotics and Automation 2006. pp. 1243–1248.

Fikes, R.E. & Nilsson, N.J., 1971. Strips: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2, pp.189–208.

Firby, R., 1987. An investigation into reactive planning in complex domains. In Proceedings of AAAI - 87. pp. 202–206.

Forlizzi, J. & Disalvo, C., 2006. Service Robots in the Domestic Environment: A Study of the Roomba Vacuum in the Home. In Design. ACM Press, pp. 258–265.

Gat, E., 1997. On Three-Layer Architectures. Artificial intelligence and mobile robots, pp.195–210.

Gerevini, A. & Serina, I., 2003. Planning as Propositional CSP: From Walksat to Local Search Techniques for Action Graphs. Constraints, 8, pp.389–413.

Golledge, R.G., 1999. Human Wayfinding and Cognitive Maps. In Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes. pp. 5–45.

Goodwin, R., Goodwin, R., Simmons, R.G., Simmons, R.G., Durrant-Whyte, H.F., Durrant-Whyte, H.F., Koenig, S., Koenig, S., O’Sullivan, J. & O’Sullivan, J., 1997. A layered architecture for office delivery robots. In Proceedings of International conference on Autonomous agents 1997, pp.245–252.

Helmert, M., 2006. The fast downward planning system. Journal of Artificial Intelligence Research, 26, pp.191–246.

39

Hernández, C., Baier, J.A. & Asin, R., 2014. Making A* Run Faster than D*-Lite for Path-Planning in Partially Known Terrain. In Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS), pp.504-508.

Hoffmann, J. & Nebel, B., 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14, pp.263–312.

Huang, L., 2009. Velocity planning for a mobile robot to track a moving target - a potential field approach. Robot and Autonomous Systems, 57(1), pp.55–63.

Kaelbling, L.P., Littman, M.L. & Cassandra, A.R., 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2), pp.99–134.

Kautz, H., McAllester, D. & Selman, B., 1996. Encoding plans in propositional logic. In Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning (KR-96). pp. 374–384.

Kavraki, L.E., Švestka, P., Latombe, J.C. & Overmars, M.H., 1996. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), pp.566–580.

Khatib, O., 1985. Real-time obstacle avoidance for manipulators and mobile robots. In Proceedings of IEEE International Conference on Robotics and Automation 1985, 2, pp.41–59.

Koenig, S. & Likhachev, M., 2002. D* Lite. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp.476–483.

Koenig, S., Likhachev, M. & Furcy, D., 2004. Lifelong Planning A*. Artificial Intelligence, 155, pp.93–146.

Koren, Y., Member, S., Borenstein, J. & Arbor, A., 1991. Potential Field Methods and Their Inherent Limitations for Mobile Robot Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation, pp.1398–1404.

Korf, R.E., 1996. Artificial Intelligence Search Algorithms M. J. Atallah, ed., In Algorithms and Theory of Computation Handbook.

Kuffner, J.J. & LaValle, S.M., 2000. RRT-connect: An efficient approach to single-query path planning. In Proceedings of IEEE International Conference on Robotics and Automation 2000, pp. 995–1001.

Kuipers, B. & Byun, Y.-T., 1991. A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations. Robotics and Autonomous Systems, 8(1-2), pp.47–63.

40

Likhachev, M., Ferguson, D., Gordon, G., Stentz, A. & Thrun, S., 2005. Anytime Dynamic A *: An Anytime , Replanning Algorithm. Science, pp.262–271.

Lowe, D.G., 1999. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision. pp. 1150–1157.

Maes, P. & Brooks, R., 1990. Learning to coordinate behaviors. In Proceedings of the Eighth National Conference on Artificial Intelligence. AAAI Press, pp. 1–7.

Maimone, M., Cheng, Y. & Matthies, L., 2007. Two years of visual odometry on the Mars Exploration Rovers. Journal of Field Robotics, 24(3), pp.169–186.

Mataric, M.J., 1992. Integration of Representation Into Goal-driven Behavior-based Robots. In Proceedings 1992 IEEE International Conference on Robotics and Automation. pp. 304–312.

Montemerlo, M., Thrun, S., Koller, D. & Wegbreit, B., 2002. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of 8th National Conference on Artificial Intelligence/14th Conference on Innovative Applications of Artificial Intelligence 1999, pp. 593–598.

Nguyen, X. & Kambhampati, S., 2001. Reviving partial order planning. In Proceedings of IJCAI International Joint Conference on Artificial Intelligence- 2001, pp.459–464.

Nilsson, N.J., 1980. Principles of Artificial Intelligence, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Penberthy, J.S. & Weld, D., 1992. UCPOP: A Sound, Complete, Partial Order Planner for ADL. In Proceedings of the third international conference on knowledge representation and reasoning, pp.103–114.

Petrovskaya, A. & Thrun, S., 2009. Model based vehicle detection and tracking for autonomous urban driving. Autonomous Robots, 26(2-3), pp.123–139.

Ram, A., Boone, G., Arkin, R. & Pearce, M., 1994. Using Genetic Algorithms to Learn Reactive Control Parameters for Autonomous Robotic Navigation. Adaptive Behavior, 2(3), pp.277–305.

Ranganathan, A. & Koenig, S., 2003. A reactive robot architecture with planning on demand. In Proceedings of 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp.1462–1468.

Richter, S. & Westphal, M., 2010. The LAMA Planner: Guiding Cost-based Anytime Planning with Landmarks. Journal of Artificial Intelligence Research., 39(1), pp.127–177.

41

Rosenblatt, J.K., 1997. DAMN: a distributed architecture for mobile navigation. Journal of Experimental & Theoretical Artificial Intelligence, 9, pp.339–360.

Ross, S., Pineau, J., Paquet, S. & Chaib-draa, B., 2008. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, pp.663–704.

Sacerdoti, E.D., 1974. Planning in a hierarchy of abstraction spaces. Artificial intelligence, 5(2), pp.115–135.

Sacerdoti, E.D., 1975. The nonlinear nature of plans. In Proceedings of the 4th International Joint Conference on Artificial Intelligence. IJCAI’75. Morgan Kaufmann Publishers Inc., pp. 206–214.

Se, S., Lowe, D. & Little, J., 2001. Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features. In Proceedings of IEEE International Conference on Robotics and Automation 2001. pp. 2051 – 2058.

Silver, D. & Veness, J., 2010. Monte-Carlo Planning in Large POMDPs. Advances in neural information processing systems. pp. 2164–2172.

Smith, R., Self, M. & Cheeseman, P., 1987. Estimating uncertain spatial relationships in robotics. In Proceedings of IEEE International Conference on Robotics and Automation 1987, 4, pp.435–461.

Stentz, A., 1994. Optimal and efficient path planning for partially-known environments. In Proceedings of IEEE International Conference on Robotics and Automation 1994, pp.3310–3317.

Stentz, A., 1995. The focussed D* algorithm for real-time replanning. In Proceedings of 14th International Joint Conference on Artificial intelligence (IJCAI). Morgan Kaufmann Publishers Inc., pp. 1652–1659.

Sun, D., Kleiner, A. & Nebel, B., 2014. Behavior-based Multi-Robot Collision Avoidance. In Proceedings of IEEE International Conference on Robotics and Automation 2014, pp. 1668–1673.

Thrun, S., 1998. Learning metric-topological maps for indoor mobile robot navigation. Artificial Intelligence, 99(1), pp.21–71.

Thrun, S., Fox, D., Burgard, W. & Dellaert, F., 2001. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1-2), pp.99–141.

Tversky, B., 1993. Cognitive maps, cognitive collages, and spatial mental models. Spatial Information Theory A Theoretical Basis for GIS, pp.14–24.

42

Veloso, M., Biswas, J., Coltin, B., Rosenthal, S., Kollar, T., Mericli, C., Samadi, M., Brandao, S. & Ventura, R., 2012. CoBots: Collaborative robots servicing multi-floor buildings. In Proceedings of IEEE International Conference on Intelligent Robots and Systems, pp. 5446–5447.

Zucker, M., Kuffner, J. & Branicky, M., 2007. Multipartite RRTs for rapid replanning in dynamic environments. In Proceedings of IEEE International Conference on Robotics and Automation 2007, pp. 1603–1609.

decision making for autonomous navigation abstract...! 1! decision making for autonomous navigation...

Documents