

Novel Search Techniques for Path Finding in Complex Environment

Guni Sharon

June 15, 2015


This work was carried out under the supervision of Prof. Ariel Felner at the Department of Information Systems Engineering, Ben-Gurion University.


Acknowledgments

I would like to thank my advisor, Prof. Ariel Felner, for the tremendous effort he invested in guiding, supporting and encouraging me when needed. Prof. Felner’s vast knowledge, patience and experience enabled the submission of this thesis, and I owe him a great debt of gratitude for this.

I would also like to express my gratitude to Dr. Roni Stern, who served, de facto, as my second advisor and taught me a great deal about academic research. I would like to thank Prof. Nathan Sturtevant for hosting me at the University of Denver and guiding me through several research projects. I was privileged to work alongside some great researchers: Dr. Meir Goldenberg, Ofra Amir, Eli Boyarski and Max Barer. Thank you all.

Most importantly, I would like to thank my mother for many hours of proofreading and typo spotting.


Contents

1 Introduction & Overview
  1.2 Background
    1.2.1 Systematic Search
    1.2.2 A*
  1.3 Path-Finding Variants
    1.3.1 Multi-Agent Path-Finding
    1.3.2 Real-Time Agent-Centered Search
  1.4 Thesis Overview
    1.4.1 Chapter 2: The Multi-Agent Path Finding problem
    1.4.2 Chapter 3: Increasing-Cost Tree Search
    1.4.3 Chapter 4: Conflict-Based Search
    1.4.4 Chapter 5: Exponential-Deepening A*
  1.5 Related Publications
    1.5.1 Research led by myself
    1.5.2 Research led by others

2 Multi-Agent Path Finding
  2.1 Problem Definition and Terminology
    2.1.1 Problem Input
    2.1.2 Actions
    2.1.3 MAPF Constraints
    2.1.4 MAPF Task
    2.1.5 Cost function
    2.1.6 Distributed vs. Centralized
    2.1.7 Examples of a MAPF Problem
  2.2 Survey of Centralized MAPF Algorithms
    2.2.1 Reduction-based Solvers
    2.2.2 MAPF-Specific Sub-optimal Solvers
      Search-Based Suboptimal Solvers
      Rule-Based Suboptimal Solvers
      Hybrid Solvers
    2.2.3 Optimal MAPF Solvers
      Admissible Heuristics for MAPF
      Drawbacks of A* for MAPF
      Reducing the Effective Number of Agents with Independence Detection
      Enhancements to ID
      M*
      Avoiding Surplus Nodes
      Operator Decomposition
      Enhanced Partial Expansion
  2.3 Conclusions

3 The Increasing Cost Tree Search
  3.1 ICTS Formalization
    3.1.1 High-Level Search
    3.1.2 Low-Level Search
      Compact Paths Representation with MDDs
      Example of ICTS
      Goal Test in the k-Agent-MDD Search Space
      Generalization to k > 2 agents
    3.1.3 The ICTS Algorithm
      Search Choices
  3.2 Theoretical Analysis
    3.2.1 ICTS
    3.2.2 A*
    3.2.3 A*+OD
  3.3 Experimental Results: ICTS vs. A*
    3.3.1 Experiment Types
    3.3.2 3×3 Grid
    3.3.3 The Growth of k vs. ∆
    3.3.4 8×8 Grid
    3.3.5 Limitations of ICTS
    3.3.6 ICTS on other MAPF variants
  3.4 ICT Pruning Techniques
    3.4.1 Simple Pairwise Pruning
    3.4.2 Enhanced Pairwise Pruning
    3.4.3 Repeated Enhanced Pairwise Pruning
    3.4.4 Tradeoffs
    3.4.5 m-Agent Pruning
  3.5 Experiments: ICTS Pruning Techniques
    3.5.1 Pruning Effectiveness
    3.5.2 Runtime
  3.6 Experiments: ICTS vs. A* on Different Domains
    3.6.1 8×8 grids
    3.6.2 Grid with Scattered Obstacles
    3.6.3 Dragon Age Maps
  3.7 Conclusions

4 The Conflict-Based Search
  4.1 The Conflict-Based Search Algorithm
    4.1.1 Definitions for CBS
    4.1.2 High Level
      The Constraint Tree
      Processing a Node in the CT
      Resolving a Conflict
      Conflicts of k > 2 Agents
      Edge Conflicts
      Pseudo-Code and Example
    4.1.3 Low Level: Find Paths for CT Nodes
  4.2 Theoretical Analysis
    4.2.1 Optimality of CBS
    4.2.2 Completeness of CBS
      Claim a
      Claim b
    4.2.3 Comparison with Other Algorithms
      Example of CBS Outperforming A* (bottlenecks)
      Example of A* Outperforming CBS (open space)
  4.3 CBS Empirical Evaluation
    4.3.1 Experimental Problem Settings
    4.3.2 Experimental Results
      8×8 4-Connected Grid
      DAO Maps
  4.4 CBS Using Different Cost Functions
    4.4.1 High Level
    4.4.2 Low Level
    4.4.3 Optimality
    4.4.4 Completeness
  4.5 Meta-Agent Conflict-Based Search (MA-CBS)
    4.5.1 Motivation for Meta-Agent CBS
    4.5.2 Merging Agents Into a Meta-Agent
    4.5.3 Merge Policy
    4.5.4 Merging Constraints
      Merging External Constraints
    4.5.5 The Low-Level Solver
    4.5.6 Completeness and Optimality of MA-CBS
    4.5.7 MA-CBS as a Continuum
  4.6 MA-CBS Experimental Results
    4.6.1 Conclusions from Experiments
  4.7 Conclusions


5 Exponential Deepening A*
  5.1 RTACS: Definitions and Background
    5.1.1 RTACS Settings
    5.1.2 Classification of States
    5.1.3 Evaluation Metrics
    5.1.4 Optimality
    5.1.5 Terminology
  5.2 Previous Work on RTACS
    5.2.1 Learning Real-Time A*
      Local-Search Space LRTA*
      Real-Time Adaptive A*
      Depression Avoidance in LRTA*/RTA*
      Other variants of LRTA*
  5.3 Iterative-Deepening Algorithms for RTACS
    5.3.1 Real-time Iterative-deepening Best-first Search
    5.3.2 Theoretical analysis of ID
    5.3.3 Efficiency of IDA*/RIBS
      Exponential Domains
      Polynomial Domains
  5.4 Exponential Deepening A* (EDA*)
    5.4.1 Optimality of EDA*
      Classical search setting (without the RTACS restrictions)
      RTACS setting
    5.4.2 Balance of EDA*
      Exponential domains
      Polynomial domains
  5.5 BDFS in RTACS
  5.6 Experimental Results
    5.6.1 Supporting the Theoretical Analysis
      Open grid
      Game map
    5.6.2 RTACS Experiments on Video Game Maps
    5.6.3 Different Domains
    5.6.4 Worst-case performance
  5.7 LSS-EDA*
    5.7.1 Comparing EDA* and LSS-EDA* on an Example Graph
    5.7.2 Experimental Results When Lookahead is Applied
  5.8 Conclusions and Future Work

6 Conclusions and Future Work

Bibliography


A Memory Restricted CBS
  A.1 Domains with Many Duplicate States
  A.2 Polynomial Domains with Transpositions
    A.2.1 Extreme Case Example
    A.2.2 EDA* is Balanced Even When Transpositions Exist


List of Figures

2.1 An example of a MAPF instance with 2 agents. The mice, 1 and 2, must reach the pieces of cheese 1 and 2, respectively.
2.2 An example of a MAPF instance with 2 agents. The mice, 1 and 2, must reach the pieces of cheese 1 and 2, respectively.
3.1 ICT for three agents.
3.2 The 2- and 3-step MDDs for agent a1 and the 2-step MDD for agent a2.
3.3 ICT for the problem in Figure 2.2.
3.4 MDDs of 2 steps for agent a2 and its extension.
3.5 (i) Merging MDD1 and MDD2 into MDD12; (ii) the unfolded MDD1*3.
3.6 Example of changes in the f-value on a 4-connected grid. The f-value of the cells is for time step 1.
3.7 ∆ and k growth on a 3×3 grid with no obstacles.
3.8 Success rate on an 8×8 grid with no obstacles. Experiment of type 2, where ID is not activated.
3.9 ICTS pathology.
3.10 Example of a bottleneck for three agents.
3.11 Success rate on an 8×8 grid. ID activated.
3.12 Success rate for 40 agents on a 32×32 grid with scattered obstacles. Experiment of type 1; ID activated.
3.13 DAO maps (left) and their performance (right). The x-axis is the number of agents; the y-axis is the success rate.
4.1 A (k = 3)-way branching CT (top) and a binary CT for the same problem (bottom).
4.2 An example of a Constraint Tree (CT).
4.3 An example of an unsolvable MAPF instance.
4.4 A pathological case where CBS is extremely inefficient.
4.5 The CT for the pathological case where CBS is extremely inefficient.
4.6 Success rate vs. number of agents on an 8×8 grid.
4.7 The success rate of the different algorithms, all running on top of ID, for different DAO maps: den520d (top), ost003d (middle), brc202d (bottom).
4.8 DAO maps den520d (left), ost003d (middle), brc202d (right) and their conflicting locations.
4.9 Success rate of MA-CBS on top of ID with EPEA* as the low-level solver.
5.1 Example graph.
5.2 A running example of EDA* for RTACS on the problem instance depicted in Figure 5.1. For convenience, a copy of Figure 5.1 is given above.
5.3 Open map experiments: A*, IDA*, RIBS, EDA*.
5.4 F vs. R values on video game maps as 8-connected grids with octile-distance heuristics.
5.5 Sorted measurements of the 50% hardest instances.
5.6 An example 4-connected grid with an agent (robot) currently occupying the top left corner.
5.7 Comparing EDA* on the Brushfire map with no lookahead (left) and LSS-EDA* with a lookahead of 10 (right). In red: states visited by the agent; in gray: states marked dead.
A.1 Example MAPF instance where CBS has cycles.
A.2 Pathological example where EDA* with C = 2 might be imbalanced.


List of Tables

3.1 Summary of theoretical comparison: A*, A*+OD and ICTS.
3.2 Results on a 3×3 grid averaged over 100 instances. ID was not activated (experiment of type 2). Comparison of the theoretical measures to the number of nodes and to the actual running time (in ms).
3.3 k conflicting agents (experiment of type 2, where ID is not activated) on an 8×8 grid. Running time in ms.
3.4 Number of (non-goal) ICT nodes where the low-level search was activated for the 3×3 grid, 4×4 grid, 8×8 grid and the den520d map. ID was activated for the den520d map only.
3.5 Runtime (in ms) for the 3×3 grid, 4×4 grid, 8×8 grid and the den520d map.
3.6 Runtime (in ms) on an 8×8 grid. Type 1 experiment; ID activated.
3.7 A*+OD vs. basic ICTS and ICTS+3E on the DAO maps (runtime in ms).
4.1 Nodes generated and running time on an 8×8 grid.
4.2 Runtime (ms) on DAO problems. EPEA* as the low-level solver.
4.3 Runtime (ms) on DAO problems. A* as the low-level solver.
5.1 IDA* vs. EDA*. Ci = condition i.
5.2 Average measurements for the open map experiment.
5.3 Average measurements over all DAO problems (LSS = 1).
5.4 Average measurements in different domains.
5.5 1% worst-case average measurements over all experiments with no lookahead.
5.6 Average measurements when memory-bounded lookahead is applied in DAO scenarios.
5.7 Average measurements when time-bounded lookahead is applied in DAO scenarios.


Abstract

In the path-finding problem, given a graph, an initial vertex (start) and a destination vertex (goal), the task is to find a path along the graph leading from start to goal. The path-finding problem has many real-life applications such as GPS navigation, robot routing and video games. Many other problems, such as domain-independent planning, permutation puzzles and constraint satisfaction problems, are commonly reduced to path-finding problems.

Path-finding problems are commonly solved using the A* algorithm. A* is proven to be "optimally efficient" for any given admissible heuristic function, meaning that no other optimal algorithm employing the same heuristic is guaranteed to expand fewer nodes than A*. Nevertheless, A* and the path-finding problem remain an active field of research, and a tremendous number of research papers have been written on algorithms that are based on A*.

This thesis focuses on two complicated variants of the path-finding problem on an explicitly given graph. The first is the Multi-Agent Path-Finding problem (MAPF), which is defined and presented in Chapter 2. In MAPF, paths must be assigned to multiple agents, each with unique start and goal vertices, such that the paths do not overlap, i.e., the agents do not collide. While the underlying graph is explicitly given, the complexity of MAPF stems from the fact that the number of different assignments of agents to locations is exponential. MAPF arises in real-life applications such as video games, traffic control, robotics and aviation.

This thesis presents two novel algorithms for solving MAPF. The main contribution of these algorithms is that neither is based on A*; each introduces a new search algorithm with a unique and novel search tree.

Chapter 3 introduces the Increasing Cost Tree Search, a novel two-level formalization that optimally solves MAPF. The high level performs a search on a new search tree called the increasing cost tree (ICT). Each node in the ICT consists of a k-vector [C1, C2, ..., Ck] which represents all possible solutions in which the cost of the individual path of each agent ai is exactly Ci. The low level performs a goal test on each of the ICT nodes. We denote our two-level algorithm ICT search (ICTS). We compare ICTS to the traditional A*-based search formalization and show its advantages and drawbacks. Experimental results on a number of domains show that ICTS outperforms the previous state-of-the-art A* approach by up to three orders of magnitude in many cases.
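The high-level search over cost vectors can be sketched in a few lines (an illustrative sketch, not the thesis's implementation; `goal_test` stands in for the low-level search, which checks whether a conflict-free joint solution exists at exactly these individual costs):

```python
from collections import deque

def ict_search(root_costs, goal_test):
    """Breadth-first search over the increasing cost tree (ICT).

    root_costs: tuple of optimal individual path costs (C1, ..., Ck).
    goal_test:  callable standing in for the low-level search; returns
                True iff a non-conflicting joint solution exists whose
                individual path costs are exactly the given vector.
    """
    frontier = deque([tuple(root_costs)])
    seen = {tuple(root_costs)}
    while frontier:
        costs = frontier.popleft()          # BFS: lowest total cost first
        if goal_test(costs):                # low-level goal test on this ICT node
            return costs
        # Children: increase one agent's individual cost by 1.
        for i in range(len(costs)):
            child = costs[:i] + (costs[i] + 1,) + costs[i + 1:]
            if child not in seen:           # avoid duplicate ICT nodes
                seen.add(child)
                frontier.append(child)
```

Because BFS visits ICT nodes in order of non-decreasing total cost, the first node that passes the goal test yields an optimal solution.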

Chapter 4 presents another new algorithm for solving MAPF optimally, called Conflict-Based Search (CBS). CBS is also a two-level algorithm, where the high-level search is performed on a constraint tree (CT) whose nodes include constraints on time and location for a single agent. At each node in the CT a low-level search is performed to find new paths for all agents. The paths returned by the low level must satisfy the constraints given by the high level. The searches that CBS performs are strictly single-agent searches, meaning that the low-level state space of CBS is linear in the size of the underlying graph/map. It also means that the single-agent paths assigned to the different agents might conflict. Unlike A*-based searches, where the search tree is exponential in the number of agents, the high-level search tree of CBS is exponential in the number of conflicts encountered during the solving process. We analyze CBS and study circumstances where CBS is weak and where it is strong compared to the A*-based approaches and ICTS. In many cases, CBS outperforms other optimal solvers by up to a full order of magnitude. In other cases, usually in problems dense with agents, CBS is outperformed by previous solvers. In order to mitigate the drawbacks of CBS we generalize it and present Meta-Agent CBS (MA-CBS). The main idea is to merge groups of agents into meta-agents if the number of internal conflicts between them exceeds a given bound. MA-CBS acts as a framework that can run on top of any complete MAPF solver. We analyze MA-CBS and provide experimental results demonstrating that it outperforms basic CBS and other A*-based optimal solvers in many cases.
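As a structural sketch of the two levels just described (illustrative only: `low_level` and `find_conflict` are assumed helper functions, only vertex conflicts are handled, and Chapter 4 covers edge conflicts and the other details):

```python
import heapq
import itertools

def cbs(agents, low_level, find_conflict):
    """High level of Conflict-Based Search (structural sketch).

    low_level(agent, constraints) -> (path, cost): single-agent planner
        returning a minimal-cost path satisfying `constraints`.
    find_conflict(paths) -> None, or (agent_a, agent_b, vertex, time).
    """
    counter = itertools.count()             # heap tie-breaker
    constraints = {a: [] for a in agents}
    paths = {a: low_level(a, constraints[a]) for a in agents}
    cost = sum(c for _, c in paths.values())
    open_list = [(cost, next(counter), constraints, paths)]
    while open_list:                        # best-first on solution cost
        cost, _, constraints, paths = heapq.heappop(open_list)
        conflict = find_conflict({a: p for a, (p, _) in paths.items()})
        if conflict is None:                # conflict-free: optimal solution
            return {a: p for a, (p, _) in paths.items()}, cost
        a1, a2, v, t = conflict
        # Branch: constrain one of the two conflicting agents in each child.
        for agent in (a1, a2):
            child = {a: list(cs) for a, cs in constraints.items()}
            child[agent].append((v, t))     # agent may not be at v at time t
            new_paths = dict(paths)
            new_paths[agent] = low_level(agent, child[agent])
            new_cost = sum(c for _, c in new_paths.values())
            heapq.heappush(open_list,
                           (new_cost, next(counter), child, new_paths))
```

Note how each CT node replans only the newly constrained agent, which is why the low-level work stays strictly single-agent.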

A second path-finding variant addressed in this thesis is Real-Time Agent-Centered Search (RTACS). In RTACS a single agent must physically find its way from start to goal while utilizing a bounded amount of time and memory prior to performing each physical step. The agent is also assumed to have a limited sensing radius, which enables it to examine only a small portion of the graph (the part in close proximity to it). RTACS can be used to suboptimally solve permutation puzzles, robotics problems and path-planning problems in video games. RTACS algorithms can also be adapted to provide a policy for Markov decision processes and partially observable Markov decision processes.

Finding a path for a single agent from start to goal might require, in the worst case, examining all states at least once, i.e., complexity that is linear in the size of the state space (as in a breadth-first search). In existing RTACS algorithms the agent is sometimes forced to revisit each state many times, causing the entire procedure to be quadratic in the size of the state space. As a result, previous RTACS algorithms have an algorithmic gap. Chapter 5 introduces Exponential Deepening A* (EDA*), an iterative-deepening search variant in which the threshold between successive depth-first calls is increased exponentially. Unlike previous RTACS algorithms, EDA* is proven to hold a worst-case bound that is linear in the size of the state space. Experimental results supporting this bound are presented and demonstrate up to a 5x reduction over existing RTACS solvers with regard to distance traveled, states expanded and CPU runtime.
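The exponential-deepening idea can be sketched in a classical (non-RTACS) setting as follows. All names, the growth factor of 2 and the cycle check are illustrative assumptions; this sketch simply returns the first goal found under the current bound, while the thesis's optimality and linear-bound analysis for the RTACS setting involves details not reproduced here:

```python
def eda_star(start, successors, h, is_goal, growth=2.0):
    """Exponential Deepening A* (classical-search sketch).

    Like IDA*, runs cost-bounded depth-first searches from `start`, but
    multiplies the f-threshold by a constant factor between iterations
    instead of raising it to the minimal pruned f-value. With a constant
    growth factor, the number of iterations is logarithmic in the
    solution cost. `successors(n) -> [(child, edge_cost), ...]`.
    """
    def bounded_dfs(node, g, bound, path):
        f = g + h(node)
        if f > bound:                       # prune: over the current threshold
            return None
        if is_goal(node):
            return g                        # first goal found under the bound
        for child, cost in successors(node):
            if child in path:               # avoid cycles along this path
                continue
            found = bounded_dfs(child, g + cost, bound, path | {child})
            if found is not None:
                return found
        return None

    bound = max(h(start), 1)
    while True:
        result = bounded_dfs(start, 0, bound, {start})
        if result is not None:
            return result
        bound *= growth                     # exponential threshold growth
```

Contrast with IDA*, which raises the bound only to the smallest pruned f-value and may therefore run a number of iterations proportional to the solution cost in polynomial domains.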

In some cases the planning time and sensing radius allow extra computational effort prior to acting. For these scenarios we present Local Search Space EDA* (LSS-EDA*), a variant of EDA* that directs the agent towards its goal in fewer steps at the expense of extra computational effort. Experimental results show that LSS-EDA* outperforms existing RTACS solvers when extra computational effort is allowed prior to each step.

State-of-the-art experimental results were achieved by our algorithms in both the MAPF and the RTACS path-finding variants. The main parts of this thesis have been published in the AAAI, IJCAI and SoCS conferences [Sharon et al., 2012a; Sharon et al., 2012b; Sharon et al., 2011a; Sharon et al., 2011b; Sharon et al., 2014]. Our work on ICTS and on CBS was also summarized and published in the Artificial Intelligence Journal [Sharon et al., 2013a; Sharon et al., 2015]. The work on EDA* was also summarized and submitted to this journal and is currently under review. The relevant chapters in this thesis are almost exact copies of these journal manuscripts.

Keywords: Path Finding, Path Planning, Multi Agent, Heuristic Search, Iterative Deepening, Exponential Deepening, Real-Time, Agent Centered


Chapter 1

Introduction & Overview

Single-agent path finding is the problem of finding a path in a graph leading from an initial vertex, denoted as start, to a goal vertex, denoted as goal. It is a fundamental and important problem in AI that has been researched extensively, as it can be found in GPS navigation [Sturtevant and Geisberger, 2010; Geisberger et al., 2012], robot routing [Cohen et al., 2013; Bhattacharya et al., 2012; Likhachev et al., 2008], planning [Bonet and Geffner, 2001; Helmert, 2008], network routing [Broch et al., 1998], and many combinatorial problems (e.g., puzzles) [Korf and Taylor, 1996; Korf, 1997; Felner, 2006; Sharon et al., 2015].

A solution of a single-agent path-finding problem is a path leading from start to goal. Generally, the quality of a solution is the cost of the found path. An optimal solution is a lowest-cost path in the graph leading from start to goal.

We distinguish between two types of graphs:

• Exponentially growing graphs, where for any vertex v the number of unique vertices at distance r from v is Θ(c^r), where c is a constant.

• Polynomially growing graphs, where for any vertex v the number of unique vertices at distance r from v is Θ(r^k). k is commonly known as the dimensionality of the graph and is a constant.

In polynomially growing domains such as GPS navigation, robotics and computer games, the input graph is usually given explicitly and is completely stored in memory. In exponentially growing domains such as combinatorial puzzles, a large graph is defined implicitly by specifying the underlying graph structure.
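For intuition, the two growth regimes can be checked numerically (a small illustrative script, not taken from the thesis): on an infinite 4-connected grid the number of vertices at exact Manhattan distance r from the origin is 4r for r ≥ 1, i.e., polynomial in r, whereas a complete binary tree has 2^r vertices at depth r.

```python
def grid_ring(r):
    """Vertices at exact Manhattan distance r from the origin on an
    infinite 4-connected grid; brute-force count over a bounding box."""
    return sum(1 for x in range(-r, r + 1)
                 for y in range(-r, r + 1)
                 if abs(x) + abs(y) == r)

def tree_ring(r, c=2):
    """Vertices at depth r of a complete c-ary tree."""
    return c ** r

# At r = 10 the gap is already large: the grid ring holds 40 vertices,
# while the binary-tree ring holds 1024.
```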

1.2 Background

Two important properties of a search algorithm are optimality and completeness. An algorithm is said to be optimal if it is guaranteed that the solution it returns has minimal (or, in some cases, maximal) cost. An algorithm is said to be complete if it is guaranteed to return a solution if one exists. Throughout this thesis we require that the presented algorithms be complete. Therefore, all the algorithms introduced in this thesis are systematic search algorithms, which we define next.


1.2.1 Systematic Search

Systematic search algorithms are algorithms that can systematically go over the entire problem space. This is done by keeping track of the paths in the problem space that have been traversed by the search algorithm. The various mechanisms for doing this differ in their memory requirements. One of the common mechanisms (known as Best-First Search) is to store all states in the frontier of the search in a special data structure usually called the open list (OPEN for short). The frontier of the search is the set of states (open states) that have been generated but have not been expanded. The term generating a node refers to creating a data structure representing the node, while expanding a node means generating all of its children. Classical examples of systematic search algorithms that use Best-First Search are A* [Hart et al., 1968] and Dijkstra's algorithm [Dijkstra, 1959].

Systematic search algorithms have several beneficial properties. Assuming a finite state space, and given enough time, a systematic search will visit the entire search space, finding a solution if one exists. Thus, systematic search algorithms are complete. Additionally, a systematic search can identify when the entire search space has been searched, and terminate if a solution does not exist. Some systematic search algorithms can also guarantee that the returned solution is optimal. Dijkstra's algorithm and A* are well-known examples of search algorithms that can find an optimal solution.

1.2.2 A*

One of the most widely used systematic search algorithms is A* [Hart et al., 1968]. As a systematic search, A* keeps a list of open states (OPEN) and a closed list (CLOSED), which contains all the states that have been expanded. Every generated state is assigned a value equal to its distance from the start (g value) plus an estimate of the remaining distance to the goal (h value). In every iteration of A*, the state in OPEN with the lowest g + h value is chosen to be expanded. This minimal-valued state is moved from OPEN to CLOSED, and the children of this state are generated and inserted into OPEN. The purpose of CLOSED is to avoid reinserting states that have already been expanded back into OPEN. Once the goal is chosen for expansion, A* halts and the found path to the goal is returned.

If h(n) is admissible, i.e., it never overestimates the actual cost from n to the goal, then A* is guaranteed to return an optimal solution if one exists. An important property of the A* algorithm is that it is optimal in the number of expanded nodes. That is, any other equally informed search algorithm will have to expand all the nodes expanded by A* before identifying the optimal solution [Dechter and Pearl, 1985]. Due to this property we use A* as a baseline algorithm for theoretical and empirical comparison throughout this thesis.
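The expansion cycle described above can be sketched as follows. This is a minimal illustrative implementation, not code from the thesis; the `neighbors` and `h` callables and the grid demo are our assumptions.

```python
import heapq

def astar(start, goal, neighbors, h):
    """Minimal A* sketch: repeatedly expand the OPEN state with the lowest
    g + h. `neighbors(s)` yields (successor, edge_cost) pairs; `h` must be
    admissible for the returned path to be optimal."""
    open_heap = [(h(start), 0, start, [start])]   # entries: (f, g, state, path)
    closed = set()                                # states already expanded
    while open_heap:
        f, g, state, path = heapq.heappop(open_heap)
        if state == goal:
            return path                           # goal chosen for expansion
        if state in closed:
            continue                              # stale duplicate left in OPEN
        closed.add(state)
        for nxt, cost in neighbors(state):        # expanding: generate children
            if nxt not in closed:
                heapq.heappush(open_heap,
                               (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None                                   # OPEN exhausted: no solution

# Usage on a 5x5 4-connected grid with the Manhattan-distance heuristic:
def grid_neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

manhattan = lambda p: abs(p[0] - 4) + abs(p[1] - 4)
path = astar((0, 0), (4, 4), grid_neighbors, manhattan)  # optimal cost is 8
```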

1.3 Path-Finding Variants

This thesis focuses on two variants of the path-finding problem: Multi-Agent Path-Finding and Real-Time Agent-Centered Search. The input for both problem variants is a polynomially growing graph (usually a 2D map), but the settings of both variants are much more complicated than those of traditional single-agent path finding.

1.3.1 Multi-Agent Path-Finding

A multi-agent path finding (MAPF) problem is defined by a graph, G = (V,E), and a set of k agents labeled a1 . . . ak, where each agent ai has a start position si ∈ V and a goal position gi ∈ V. The task is to plan a sequence of move/wait actions for each agent ai, moving it from si to gi while avoiding conflicts with other agents (i.e., without occupying the same location at the same time). Time is discretized into time points, and at time point t0 agent ai is located in si. Between successive time points, each agent can perform a move action to an empty neighboring location or a wait action to stay idle at its current location. In this thesis we assume that both moving and waiting have unit cost; extensions to arbitrary costs of move and wait actions are possible.

The main constraint is that each location can be occupied by at most one agent at a given time. In addition, if a and b are neighboring locations, two different agents cannot simultaneously traverse the connecting edge in opposite directions (from a to b and from b to a). A conflict is a case where one of these two constraints is violated. The task discussed in this thesis is to find a solution to a MAPF problem that minimizes a global cumulative cost function.
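The two constraint types can be checked mechanically. The sketch below is our illustration, not code from the thesis; it represents each agent's plan as a list of locations indexed by time step and assumes an agent waits at its goal after arriving.

```python
from itertools import combinations

def first_conflict(paths):
    """Return the first vertex or swap conflict among time-indexed paths,
    or None if the paths are conflict-free. Each path is a list of
    locations; agents are assumed to stay at their goal once they arrive."""
    horizon = max(len(p) for p in paths)
    at = lambda p, t: p[min(t, len(p) - 1)]   # pad: agent waits at its goal
    for t in range(horizon):
        for i, j in combinations(range(len(paths)), 2):
            # constraint 1: two agents may not occupy the same location
            if at(paths[i], t) == at(paths[j], t):
                return ('vertex', i, j, t)
            # constraint 2: two agents may not swap across the same edge
            if t > 0 and at(paths[i], t) == at(paths[j], t - 1) \
                     and at(paths[j], t) == at(paths[i], t - 1):
                return ('swap', i, j, t)
    return None
```

For example, `first_conflict([[0, 1, 2], [2, 1, 0]])` reports a vertex conflict at time step 1, while `first_conflict([[0, 1], [1, 0]])` reports a swap conflict.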

MAPF has practical applications in video games [Hagelback and Johansson, 2008], traffic control [Silver, 2005; Dresner and Stone, 2008; Li and Fan, 2011], robotics [Bennewitz et al., 2002; Agmon et al., 2012], aviation [Pallottino et al., 2007; Cheng et al., 2001] and warehouse management [Guizzo, 2008; Cohen et al., 2014].

The traditional approach for solving MAPF optimally is in a coupled manner, using an A*-based search [Ryan, 2008; Standley, 2010; Goldenberg et al., 2014]. A node in the search tree consists of the location of every agent at time t. The start state and goal state include the initial locations and the goal locations of all the agents, respectively. For a graph with branching factor b (e.g., b = 4 in a 4-connected grid), there are b + 1 possible moves for any single agent: b moves to the neighboring locations and one wait move where the agent stays idle at its current location. In an A* search that includes k agents, the set of operators per state is the cross product of all k single-agent moves, which is of size O((b + 1)^k). Thus, the branching factor for any A*-based search in the MAPF problem is exponential in the number of agents.
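To make the blow-up concrete, the following sketch (our illustration; the function names are not from the thesis) enumerates the joint operators as the cross product of per-agent moves. Note that it does not filter out conflicting joint moves, which a real solver would also do.

```python
from itertools import product

def joint_moves(positions, neighbors):
    """All joint moves for k agents: the cross product of each agent's
    b neighbor moves plus one wait move, i.e. up to (b + 1)^k operators.
    (Conflict filtering is deliberately omitted in this sketch.)"""
    per_agent = [neighbors(p) + [p] for p in positions]  # b moves + wait
    return list(product(*per_agent))

# On a 4-connected grid (b = 4), two agents at interior cells already
# yield (4 + 1)^2 = 25 joint operators.
grid_neighbors = lambda p: [(p[0] + dx, p[1] + dy)
                            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
moves = joint_moves([(1, 1), (3, 3)], grid_neighbors)
assert len(moves) == 25
```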

This thesis presents two novel algorithms for optimally solving MAPF: the Increasing-Cost Tree Search and the Conflict-Based Search. Both algorithms model MAPF using a unique state space representation that allows them to exploit certain properties of this unique problem. Both algorithms are theoretically and empirically compared to the A* algorithm. In many circumstances these algorithms are shown to outperform the state-of-the-art A* variants.

Chapter 2 provides a survey as well as relevant definitions for the Multi-Agent Path Finding problem (MAPF). Chapters 3 and 4 introduce our two novel MAPF algorithms.


All three chapters regarding MAPF are almost exact copies of published journal papers [Sharon et al., 2013a; Sharon et al., 2015].

1.3.2 Real-Time Agent-Centered Search

Real-Time Agent-Centered Search (RTACS) is a path-finding problem defined by an undirected graph G, a start state and a goal state. The agent is located at start and its task is to arrive at goal while acting and reasoning in the physical environment. The agent is assumed to have limited computational power and only local perception of the environment. This results in two restrictions imposed on the agent's movement:

1. Real time - the agent has a bounded amount of planning time prior to each action it performs.

2. Agent centered - the agent is allowed to observe and manipulate only the states that are within the sensing radius of its current location.

As a result of these restrictions, algorithms for solving RTACS work in repeated plan-act cycles. Due to the real-time nature of interleaving planning and acting, the path followed by the agent is usually far from optimal.

RTACS settings assume that the agent is allowed to write a limited (constant) amount of information into each state (e.g., an estimate of the distance to the goal). In this way RTACS is an example of an 'ant' algorithm, with limited computation and memory [Shiloni et al., 2009].

Work on RTACS is quite diverse. Published experiments come from a range of applications with diverse evaluation metrics. The original research in this area was used to suboptimally solve the sliding-tile puzzle [Korf, 1990]. Other work has modeled this as a robotics problem [Koenig, 2001], or as a path-planning problem in video games [Bulitko et al., 2008]. The objective of a RTACS algorithm can be one of the following:

• 1: Minimize travel distance: This is relevant when the time of physical movement of the agent (between states) dominates the computational time spent by the CPU.

• 2: Minimize computational time: This is relevant when the CPU time dominates the time of the physical movement of the agent. CPU time can be measured exactly or approximated by counting the number of visited states.

Different variants of RTACS algorithms aim at satisfying different objectives.

Most existing RTACS solvers belong to the LRTA* family [Korf, 1990; Koenig and Sun, 2009; Hernandez and Baier, 2012]. The core principle in algorithms of this family is that when a state is visited by the agent, its heuristic value is updated through its neighbors using the Bellman equation [Bellman, 1957]. In areas where large heuristic errors exist, agents must revisit states many times, potentially a number of times per state that is linear in the size of the state space [Koenig, 1992]. Consequently, the total number of states visited in the entire procedure is potentially quadratic in the size of the state space. In this thesis we aim to remedy this and provide an algorithm which is linear in the size of the search space even in its worst case. Chapter 5 defines RTACS, provides relevant terminology and presents our novel EDA* algorithm that is, in the worst case, proven to be linear in the state space.
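The Bellman-update principle of the LRTA* family can be sketched as follows. This is a toy illustration under our own naming, not the thesis's algorithm: each plan-act cycle raises h of the current state from its best neighbor and then moves greedily.

```python
def lrta_star_step(state, h, neighbors, cost=lambda s, t: 1):
    """One LRTA* plan-act cycle: Bellman-update h(state) from its
    neighbors, then move greedily to the most promising neighbor."""
    best = min(neighbors(state), key=lambda n: cost(state, n) + h.get(n, 0))
    # learning step: h(state) <- max(h(state), cost + h(best neighbor))
    h[state] = max(h.get(state, 0), cost(state, best) + h.get(best, 0))
    return best

def lrta_star(start, goal, neighbors, h0):
    h = dict(h0)               # writable per-state memory (the 'ant' assumption)
    state, trail = start, [start]
    while state != goal:
        state = lrta_star_step(state, h, neighbors)
        trail.append(state)
    return trail

# Usage on a 3x3 4-connected grid with the Manhattan heuristic as h0:
h0 = {(x, y): abs(2 - x) + abs(2 - y) for x in range(3) for y in range(3)}
nbrs = lambda p: [(p[0] + dx, p[1] + dy)
                  for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if 0 <= p[0] + dx < 3 and 0 <= p[1] + dy < 3]
trail = lrta_star((0, 0), (2, 2), nbrs, h0)
assert trail[-1] == (2, 2)
```

With an accurate initial heuristic the agent walks straight to the goal; with large heuristic errors the learning step forces the repeated revisits discussed above.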

All algorithms presented in this thesis obtain significant speedups over other known algorithms and are considered, to date, state-of-the-art. Moreover, all of these algorithms were previously presented in top-tier journals and conferences.

The rest of this chapter is organized as follows. Section 1.4 provides an overview of the thesis and Section 1.5 lists publications relevant to the research presented in the thesis.

1.4 Thesis Overview

The main part of this thesis consists of four chapters. The following is a brief overview of these chapters.

1.4.1 Chapter 2: The Multi-Agent Path Finding problem

Algorithms for solving MAPF can be divided into two classes: optimal and sub-optimal solvers. Finding an optimal solution for the MAPF problem is NP-hard [Yu and LaValle, 2013b], as the state space grows exponentially with the number of agents. Sub-optimal solvers are usually used when the number of agents is large. In such cases, the aim is to quickly find a path for the different agents, and it is often intractable to guarantee that a given solution is optimal.

The problem addressed in this thesis is to find an optimal solution to the MAPF problem. Optimal solvers are usually applied when the number of agents is relatively small and the task is to find an optimal, minimal-cost solution. This can be formalized as a global, single-agent search problem. Therefore, the traditional approach for solving MAPF optimally is by using A*-based searches [Ryan, 2008; Standley, 2010; Goldenberg et al., 2014]. Recall that the traditional A* state space representation for MAPF grows exponentially in the number of agents. Naturally, search algorithms that are based on A* can solve this problem optimally, but they may run for a very long time or exhaust the available memory.

This chapter defines the MAPF problem and provides a survey on MAPF algorithms. We classify all existing work into two main categories: optimal and sub-optimal. We then further classify the different approaches for solving this problem sub-optimally. This is done through a consistent terminology that supports these classifications and is used throughout this thesis. Most of the text in this chapter was previously published [Sharon et al., 2015].

1.4.2 Chapter 3: Increasing-Cost Tree Search

In this chapter, we introduce a novel formalization and a corresponding algorithm for finding optimal solutions to the MAPF problem. The new formalization is based on the understanding that a complete solution for the entire problem is built from a set of individual paths for the different agents. The search algorithm that is based on this new formalization consists of two levels.


The high-level phase performs a search on a new search tree called the increasing cost tree (ICT). Each node in the ICT consists of a k-vector [C1, C2, . . . , Ck], where k is the number of agents in the problem. An ICT node represents all possible solutions in which the cost of the individual path of each agent ai is exactly Ci. The ICT is structured in such a way that there is a unique node in the tree for each possible combination of costs. The high-level phase searches the tree in an order that guarantees that the first solution found (i.e., an ICT node whose k-vector corresponds to a valid solution) is optimal.

For each ICT node visited, the high-level phase invokes the low-level phase to check if there is a valid solution that is represented by this ICT node. The low-level phase itself consists of a search in the space of possible solutions where the costs of the different agents are given by the specification of the high-level ICT node. For this we introduce a problem-specific data structure that is a variant, adjusted to our problem, of the multi-value decision diagram (MDD) [Srinivasan et al., 1990]. The MDD data structure stores all possible paths for a given cost and a given agent. The low-level phase searches for a valid (non-conflicting) solution amongst all the possible single-agent paths represented by the MDDs. We denote our two-level algorithm as ICT-search (ICTS).
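The high-level traversal of the ICT can be sketched as a best-first enumeration of cost vectors in order of total cost. The code below is our illustration only; `is_valid` is a stand-in for the MDD-based low-level search described above.

```python
from heapq import heappush, heappop

def icts_high_level(optimal_costs, is_valid):
    """Sketch of the ICT high-level search. The root is the vector of each
    agent's individual optimal path cost; each child increments one agent's
    cost by 1. Visiting nodes in non-decreasing order of total cost means
    the first node for which `is_valid` (the low-level check) succeeds
    corresponds to a minimal-sum solution."""
    root = tuple(optimal_costs)
    frontier, seen = [(sum(root), root)], {root}
    while frontier:
        total, costs = heappop(frontier)
        if is_valid(costs):          # low-level search over the MDDs
            return costs
        for i in range(len(costs)):  # children: one agent's cost + 1
            child = costs[:i] + (costs[i] + 1,) + costs[i + 1:]
            if child not in seen:
                seen.add(child)
                heappush(frontier, (total + 1, child))
    return None  # unreachable for solvable instances

# Toy usage: pretend the low level only succeeds once the total cost is 5.
found = icts_high_level((2, 2), lambda c: sum(c) >= 5)
assert sum(found) == 5
```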

Unlike an A* search, neither level of ICTS directly exploits information from an admissible heuristic. Nevertheless, we also introduce efficient pruning techniques that can quickly identify ICT nodes that do not represent any valid solution. These pruning techniques are based on examining small groups of agents (such as pairs or triples) and identifying internal conflicts that preclude the given ICT node from representing a valid solution. When such an ICT node is identified, there is no need to activate the low-level search and the high-level search can proceed to the next ICT node.

We study the behavior of our ICTS formalization and discuss its advantages and drawbacks when compared to the A*-based approaches. Based on characteristics of the problem, we show cases where ICTS will be very efficient compared to the A*-based approaches. We also discuss the limitations of ICTS and show circumstances where it is inferior to the A*-based approaches.

Substantial experimental results are provided, confirming our theoretical findings. While, in some extreme cases, ICTS is ineffective, there are many cases where ICTS outperforms the A*-based approach [Standley, 2010] by up to three orders of magnitude. Specifically, we experimented on open grids as well as on a number of benchmark game maps from Sturtevant's path finding database [Sturtevant, 2012]. Results show the superiority of ICTS over the A*-based approaches in these domains.

This chapter is almost an exact copy of a previously published paper [Sharon et al., 2013a].

1.4.3 Chapter 4: Conflict-Based Search

In this chapter we introduce another approach for optimally solving MAPF. First, we present a novel conflict-based formalization for MAPF and a corresponding new algorithm called Conflict-Based Search (CBS). CBS is a two-level algorithm, divided into high-level and low-level searches. The agents are initialized with default paths, which may contain conflicts. The high-level search is performed in a constraint tree (CT) whose nodes contain time and location constraints for a single agent. At each node in the CT, a low-level search is performed for all agents. The low-level search returns single-agent paths that are consistent with the set of constraints given at that CT node. If, after running the low level, there are still conflicts between agents, i.e., two or more agents are located in the same location at the same time, the associated high-level node is declared a non-goal node and the high-level search continues by adding more nodes with constraints that resolve the new conflict.
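The branching step described above can be sketched as follows. This is our minimal illustration of the high-level search, not the thesis's implementation: `low_level` and `find_conflict` are hypothetical helpers, and path length is used as a proxy for path cost.

```python
import heapq

def cbs(num_agents, low_level, find_conflict):
    """High-level CBS sketch. `low_level(agent, constraints)` returns a
    minimal path for one agent that avoids every (vertex, time) pair in
    `constraints`, or None if impossible; `find_conflict(paths)` returns
    (agent_i, agent_j, vertex, time) or None."""
    paths = [low_level(a, []) for a in range(num_agents)]
    root = (sum(len(p) for p in paths), 0, [[] for _ in range(num_agents)], paths)
    frontier, counter = [root], 1
    while frontier:
        cost, _, constraints, paths = heapq.heappop(frontier)
        conflict = find_conflict(paths)
        if conflict is None:
            return paths                      # goal CT node: conflict-free
        i, j, vertex, t = conflict
        for agent in (i, j):                  # branch: constrain each side
            child = [list(c) for c in constraints]
            child[agent].append((vertex, t))
            new_paths = list(paths)
            new_paths[agent] = low_level(agent, child[agent])
            if new_paths[agent] is not None:
                heapq.heappush(frontier,
                               (sum(len(p) for p in new_paths),
                                counter, child, new_paths))
                counter += 1                  # tie-breaker keeps entries comparable
    return None
```

Expanding CT nodes in order of total path cost is what lets the first conflict-free node found be returned as the solution.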

We study the behavior of our CBS algorithm and discuss its advantages and drawbacks when compared to A*-based approaches as well as other approaches. Based on characteristics of the problem, we show cases where CBS will be significantly more efficient than the previous approaches. We also discuss the limitations of CBS and show circumstances where CBS is inferior to the A*-based approaches. Experimental results are provided which support our theoretical findings. While CBS is ineffective in some cases, there are many cases where CBS outperforms ICTS and EPEA* [Felner et al., 2012; Goldenberg et al., 2014], the state-of-the-art A*-based approach for this problem. Specifically, we experimented on open grids as well as on a number of benchmark game maps from Sturtevant's path finding database [Sturtevant, 2012]. Results show the superiority of CBS over the A*-based approaches and ICTS on many of these domains.

Next, we mitigate the worst-case performance of CBS by generalizing CBS into a new algorithm called Meta-Agent CBS (MA-CBS). In MA-CBS the number of conflicts allowed between any pair of agents is bounded by a predefined parameter B. When the number of conflicts exceeds B, the conflicting agents are merged into a meta-agent and then treated as a joint composite agent by the low-level solver. In the low-level search, MA-CBS can use any complete MAPF solver to find paths for the meta-agent. Thus, MA-CBS can be viewed as a solving framework into which low-level solvers are plugged. Different merge policies give rise to different special cases. The original CBS algorithm corresponds to the extreme case where B = ∞ (never merge agents), and the Independence Detection (ID) framework [Standley, 2010] is the other extreme case where B = 0 (always merge agents when conflicts occur). Finally, we present experimental results for MA-CBS that show the superiority of MA-CBS over the other approaches on all domains.

This chapter is almost identical to a previously published manuscript [Sharon et al., 2015].

1.4.4 Chapter 5: Exponential-Deepening A*

In this chapter, we define RTACS and provide a short survey on existing algorithms. Next, we study the Iterative Deepening (ID) approach for solving RTACS.

Depth-First Iterative Deepening (DFID) and its informed variant Iterative Deepening A* (IDA*) [Korf, 1985] are commonly used algorithms for searching exponentially growing domains (e.g., permutation puzzles and many planning problems). Adding heuristic guidance does not change the theoretical attributes that interest us in this chapter; therefore, in all our analysis we refer to general Iterative Deepening (ID). ID performs a set of bounded depth-first search (BDFS) calls, incrementing the bound between successive BDFS calls. ID satisfies the restrictions imposed on a RTACS algorithm, assuming an undirected search graph.

The domain in which the agent acts greatly affects the performance of the different algorithms; we distinguish between domains that grow exponentially and those that grow polynomially. In exponentially growing domains with no transpositions, ID has a worst-case complexity that is linear in the size of the state space. This is because, in such domains, the cost of the last BDFS iteration dominates the cost of all previous iterations. Work on RTACS, however, usually focuses on domains with polynomially growing state spaces such as 2D maps. For such domains, the cost of the last BDFS iteration of ID does not dominate the sum of costs of previous iterations. In such domains, ID exhibits a worst-case complexity that is polynomially larger than the size of the state space.

We define a RTACS algorithm to be balanced if the number of times the agent revisits states is of the same order as the number of first visits. In polynomial domains ID, similar to LRTA*, may revisit states many times and has a worst-case complexity quadratic in the size of the state space. Therefore, in polynomial domains ID is imbalanced according to our definition.

To tackle the problem of extensive state revisiting in ID, we introduce Exponential Deepening A* (EDA*), a variant of IDA* [Korf, 1985]. Unlike IDA*, where the threshold for the next iteration grows linearly, in EDA* the threshold for the next iteration is multiplied by a constant factor and thus grows exponentially. We prove that in common RTACS domains, which are polynomial, EDA* is balanced according to our definition, resulting in complexity that is linear in the size of the state space. As a result, EDA* closes the algorithmic gap mentioned above.
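The contrast between linear and exponential threshold growth can be checked with simple arithmetic, under the modeling assumption (ours, for illustration) that a BDFS with threshold T visits about T^2 states in a 2D polynomial domain:

```python
def bdfs_cost(threshold, k=2):
    # modeling assumption: a BDFS to threshold T visits about T^k states
    # in a polynomially growing domain of dimensionality k
    return threshold ** k

d = 1024  # solution depth
ida_work = sum(bdfs_cost(t) for t in range(1, d + 1))   # ID thresholds 1, 2, ..., d
eda_work = sum(bdfs_cost(2 ** i) for i in range(11))    # EDA* thresholds 1, 2, 4, ..., 1024

# Linear deepening repeats work comparable to the final iteration about d
# times, so its total is cubic in d here; the exponential thresholds form
# a geometric series dominated by the last BDFS call.
assert eda_work < 2 * bdfs_cost(d)   # within a constant factor of the last call
assert ida_work > 100 * eda_work     # orders of magnitude more total work
```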

We then provide experimental results that validate our theoretical claims. EDA* outperforms existing RTACS solvers. This is especially evident in worst-case scenarios, where other RTACS algorithms perform a quadratic number of revisits while EDA*, which has a linear worst-case complexity, is much more efficient.

In many scenarios the time and memory bounds imposed on planning each step allow sensing larger areas of the graph (within a given sensing radius). This allows a larger search to be performed prior to acting. As a result, the action chosen at each step has a higher tendency to lead towards the goal and away from dead ends. For such scenarios we present an extension to EDA* called Local Search Space EDA* (LSS-EDA*). A set of states in close proximity to the current state is defined as its local search space (LSS). LSS-EDA* finds expendable states [Sharon et al., 2013b] within the LSS and marks them as dead. Once LSS-EDA* encounters a state that was previously marked dead, it ignores that state and continues as if it were not part of the underlying graph. We then provide further experimental results showing that when a non-trivial LSS is allowed, LSS-EDA* outperforms state-of-the-art RTACS algorithms such as daLSS-LRTA* and daRTAA* [Hernandez and Baier, 2012].

A version of this research was presented in [Sharon et al., 2014]. This chapter is almost an exact copy of a journal paper under submission to Artificial Intelligence.


1.5 Related Publications

The following papers have been published as a result of this thesis.

1.5.1 Research led by myself

The following is a list of papers that are closely related to this thesis.

A first study of ICTS was published in:

• 1) [Sharon et al., 2011a] - Guni Sharon, Roni Stern, Meir Goldenberg, Ariel Felner. "The Increasing Cost Tree Search for Optimal Multi-Agent Pathfinding." In the 22nd International Joint Conference on Artificial Intelligence (IJCAI), 2011.

• 2) [Sharon et al., 2011b] - Guni Sharon, Roni Tzvi Stern, Meir Goldenberg, and Ariel Felner. "Pruning techniques for the increasing cost tree search for optimal multi-agent pathfinding." In 4th Annual Symposium on Combinatorial Search (SOCS), 2011.

The work on ICTS was later summarized in:

• 3) [Sharon et al., 2013a] - Guni Sharon, Roni Stern, Meir Goldenberg, and Ariel Felner. "The increasing cost tree search for optimal multi-agent pathfinding." Artificial Intelligence 195 (2013): 470-495.

A first study of CBS was published in:

• 4) [Sharon et al., 2012a] - Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. "Conflict-based search for optimal multi-agent pathfinding." In the 26th AAAI Conference on Artificial Intelligence, 2012.

• 5) [Sharon et al., 2012b] - Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. "Meta-Agent Conflict-Based Search For Optimal Multi-Agent Path Finding." In 5th Annual Symposium on Combinatorial Search (SOCS), 2012. Awarded the best paper award of SoCS-2012.

The work on CBS was later summarized in:

• 6) [Sharon et al., 2015] - Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. "Conflict-based search for optimal multi-agent pathfinding." Artificial Intelligence 219 (2015): 40-66.

A first study of EDA* was published in:

• 7) [Sharon et al., 2014] - Guni Sharon, Ariel Felner, and Nathan Sturtevant. "Exponential Deepening A* for Real-Time Agent-Centered Search." In the 28th AAAI Conference on Artificial Intelligence, 2014.

The work on EDA* was later summarized in:

• 8) Guni Sharon, Ariel Felner, and Nathan Sturtevant. "Exponential Deepening A* for Real-Time Agent-Centered Search." Artificial Intelligence (under review).


Next is a list of papers that are partly related to this thesis.

A work generalizing CBS to solve Constraint Satisfaction Problems was presented in:

• 9) [Sharon, 2014] - Guni Sharon. "Partial Domain Search Tree For Constraint-Satisfaction Problems." In 8th International Symposium on Combinatorial Search (SOCS), 2015.

A method for online state pruning for RTACS problems was presented in:

• 10) [Sharon et al., 2013b] - Guni Sharon, Nathan R. Sturtevant and Ariel Felner. "Online Detection of Dead States in Real-Time Agent-Centered Search." In 6th International Symposium on Combinatorial Search (SOCS), 2013.

1.5.2 Research led by others

Next is a list of papers that are based on the theoretical foundations set by this thesis, starting with publications that include me as a co-author.

Amir et al. presented a framework, based on the ICTS algorithm, for reducing MAPF to combinatorial auctions in:

• 11) [Amir et al., 2015] - Ofra Amir, Guni Sharon and Roni Stern. "Multi-Agent Pathfinding as a Combinatorial Auction." In 29th AAAI Conference on Artificial Intelligence, 2015.

A suboptimal variant of CBS was presented in:

• 12) [Barer et al., 2014] - Max Barer, Guni Sharon, Roni Stern and Ariel Felner. "Suboptimal Variants of the Conflict-Based Search Algorithm for the Multi-Agent Pathfinding Problem." In 21st European Conference on Artificial Intelligence (ECAI), 2014.

Enhancements to the CBS and MA-CBS algorithms were presented in:

• 13) [Boyrasky et al., 2015] - Eli Boyarski, Ariel Felner, Guni Sharon, and Roni Stern. "Don't Split, Try To Work It Out: Bypassing Conflicts in Multi-Agent Pathfinding." In 25th International Conference on Automated Planning and Scheduling (ICAPS), 2015.

• 14) [Boyarski et al., 2015] - Eli Boyarski, Ariel Felner, Roni Stern, Guni Sharon, Oded Betzalel, Solomon Shimony and David Tolpin. "ICBS: Improved Conflict-based Search algorithm for Multi-Agent Pathfinding." In the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.

The current state-of-the-art A* variant for MAPF was presented in:

• 15) [Felner et al., 2012] - Ariel Felner, Meir Goldenberg, Guni Sharon, Roni Stern, Tal Beja, Nathan R. Sturtevant, Jonathan Schaeffer and Robert Holte. "Partial-Expansion A* with Selective Node Generation." In the 26th AAAI Conference on Artificial Intelligence, 2012.


• 16) [Goldenberg et al., 2014] - Meir Goldenberg, Ariel Felner, Roni Stern, Guni Sharon, Nathan R. Sturtevant, Robert C. Holte and Jonathan Schaeffer. "Enhanced Partial Expansion A*." J. Artif. Intell. Res. (JAIR) 50: 141-187 (2014).

I was not a co-author on the following papers; however, I have shared my code with some of these projects:

• [Ferner et al., 2013a] - Used the MA-CBS framework to enhance their algorithm, RM*.

• [Tolpin, 2014] - Suggested novel improvements for MA-CBS.

• [Goldenberg et al., 2012] - Presented a new algorithm that is also applicable to MAPF and is comparable to CBS/MA-CBS.

• [Cohen et al., 2014] - Presented a new variant of CBS that is fast to compute but suboptimal.


Chapter 2

Multi-Agent Path Finding

This chapter defines the Multi-Agent Path-Finding problem and presents coherent definitions that are used throughout this thesis. Next, a survey covering and characterizing the main MAPF algorithms is provided.

This chapter is organized as follows. In Section 2.1 we introduce and define MAPF. Section 2.2 provides a survey on previous work regarding MAPF. Finally, Section 2.3 concludes this chapter.

This chapter uses much of the text that was published in a journal paper [Sharon et al., 2015].

2.1 Problem Definition and Terminology

Many variants of the MAPF problem exist. We now define the problem and later describe the algorithms in the context of a general, commonly used variant of the problem [Standley, 2010; Standley and Korf, 2011; Sharon et al., 2013a; Felner et al., 2012; Sharon et al., 2015]. This variant is as general as possible and includes many sub-variants.

2.1.1 Problem Input

The input to the multi-agent path finding problem (MAPF) is:

• (1) A directed graph G(V,E). The vertices of the graph are possible locations for the agents, and the edges are the possible transitions between locations.

• (2) k agents labeled a1, a2 . . . ak. Every agent ai has a start vertex, starti ∈ V, and a goal vertex, goali ∈ V.

Time is discretized into time points. At time point t0 agent ai is located in location starti.

2.1.2 Actions

Between successive time points, each agent can perform a move action to a neighboring vertex or a wait action to stay idle at its current vertex. There are a number of ways to deal with the possibility of a chain of agents that are following each other in a given time step.


This may not be allowed at all, may be allowed only if the first agent of the chain moves to an unoccupied location, or may be allowed even in a cyclic chain which does not include any empty location. Our algorithms are applicable across all these variations.

2.1.3 MAPF Constraints

The main constraint in MAPF is that each vertex can be occupied by at most one agent at a given time. There can also be a constraint disallowing more than one agent from traversing the same edge between successive time steps. A conflict is a case where a constraint is violated.

2.1.4 MAPF Task

A solution to the MAPF problem is a set of non-conflicting paths, one for each agent, where a path for agent ai is a sequence of move and wait actions such that if ai performs this sequence of actions starting from starti, it will end up in goali.

2.1.5 Cost function

We aim to solve a given MAPF instance while minimizing a global cumulative cost function. We describe the algorithms in this thesis in the context of a common cost function that we call the sum-of-costs [Dresner and Stone, 2008; Standley, 2010; Sharon et al., 2013a; Sharon et al., 2012a; Sharon et al., 2012b]. Sum-of-costs is the summation, over all agents, of the number of time steps required to reach the goal for the last time and never leave it again. Finding the optimal solution, i.e., the minimal sum-of-costs, has been shown to be NP-hard [Yu and LaValle, 2013b].

Other cost functions have also been used in the literature. Makespan, for example, is another common MAPF cost function, which minimizes the total time until the last agent reaches its destination (i.e., the maximum of the individual costs). Another cost function is called Fuel [Felner et al., 2004], corresponding to the total amount of distance traveled by all agents (equivalent to the fuel consumed). The Fuel cost function is in fact the sum-of-costs of all agents where only move actions incur costs while wait actions are free. In addition, giving different weights to different agents is also possible.
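The three cost functions can be contrasted on concrete paths. The sketch below is our illustration (names are ours); paths are vertex lists indexed by time step, and each path is assumed to end at its agent's goal.

```python
def final_arrival_time(path, goal):
    """Time step at which the agent reaches its goal for the last time
    and never leaves it again (assumes the path ends at the goal)."""
    t = len(path) - 1
    while t > 0 and path[t - 1] == goal:
        t -= 1
    return t

def sum_of_costs(paths, goals):
    return sum(final_arrival_time(p, g) for p, g in zip(paths, goals))

def makespan(paths, goals):
    return max(final_arrival_time(p, g) for p, g in zip(paths, goals))

def fuel(paths):
    # only move actions incur cost; wait actions (staying put) are free
    return sum(sum(1 for a, b in zip(p, p[1:]) if a != b) for p in paths)

# Agent 1 moves A->B->C; agent 2 waits one step, then moves D->E->F.
paths, goals = [['A', 'B', 'C'], ['D', 'D', 'E', 'F']], ['C', 'F']
assert sum_of_costs(paths, goals) == 5   # 2 + 3 time steps
assert makespan(paths, goals) == 3       # last agent arrives at t = 3
assert fuel(paths) == 4                  # 2 + 2 moves; the wait is free
```

The example shows how Fuel differs from sum-of-costs exactly by the cost of wait actions.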

Some variants of MAPF do not have a global cost function to optimize, but a set of individual cost functions, one for every agent [LaValle and Hutchinson, 1998]. In such variants a solution is a vector of costs, one per agent. This type of cost function is part of the broader field of multi-objective optimization, and is beyond the scope of this thesis. In other MAPF variants the agents may be self-interested, and the task is to devise a mechanism that will cause them to cooperate [Bnaya et al., 2013]. In this work we assume that the agents are fully collaborative and are not self-interested.

Yu and LaValle [Yu and LaValle, 2012] studied a MAPF variant in which, instead of assigning each agent a goal position, a set of goal positions is given and the task is to find a solution that brings each of the agents to some goal position. They showed that this MAPF variant is solvable in polynomial time using network flows.



2.1.6 Distributed vs. Centralized

MAPF problems can be categorized into two groups: distributed and centralized. In a distributed setting, each agent has its own computing power and different communication paradigms may be assumed (e.g., message passing, broadcasting, etc.). A large body of work has addressed the distributed setting [Gilboa et al., 2006; Grady et al., 2011; Bhattacharya et al., 2010]. By contrast, the centralized setting assumes a single central computing power which needs to find a solution for all agents. Equivalently, the centralized setting also includes the case where we have a separate CPU for each of the agents but full knowledge sharing is assumed and a centralized problem solver controls all the agents. The scope of this thesis is limited to centralized approaches, and we cover many of them in the next section.

2.1.7 Examples of a MAPF Problem

Figure 2.1: An example of a MAPF instance with 2 agents. The mice, 1 and 2, must reach the pieces of cheese 1 and 2, respectively.

We now present two MAPF instances that will be used throughout the thesis. Figure 2.1 shows an example 2-agent MAPF instance. Each agent (mouse) must plan a full path to its respective piece of cheese. Agent a1 has to go from S1 to G1 while agent a2 has to go from S2 to G2. Both agents have individual paths of length 3: 〈S1, A1, D, G1〉 and 〈S2, B1, D, G2〉, respectively. However, these paths have a conflict, as they both include state D at time point t2. One of these agents must wait one time step. Therefore, the optimal solution cost, C∗, is 7 in this example.

Figure 2.2 shows a different 2-agent MAPF problem instance. Agent a1 has to go from A to F while agent a2 has to go from B to D. Both agents have individual paths of length 2: A−C−F and B−C−D, respectively. However, these paths have a conflict, as they both include state C at time point t1. One of these agents must wait one time step or take a detour. Therefore, the optimal solution cost C∗ is 5 in this example. Note that in this example, the optimal solution with respect to the total time elapsed (makespan) cost function has a total time elapsed of 3.




Figure 2.2: An example of a MAPF instance with 2 agents. The mice, 1 and 2, must reach the pieces of cheese 1 and 2, respectively.

2.2 Survey of Centralized MAPF Algorithms

Work assuming a centralized approach can be divided into three classes. The first class of solvers reduces MAPF to other problems that are well studied in computer science. The second class consists of MAPF-specific sub-optimal solvers. The third class is the class of optimal solvers. The focus of this thesis is on optimal solvers, but we include a brief survey of the other classes below.

2.2.1 Reduction-based Solvers

This class of solvers, used in recent work, reduces MAPF to other problems that are well studied in computer science. Prominent examples include reductions to Boolean Satisfiability (SAT) [Surynek, 2012], Integer Linear Programming (ILP) [Yu and LaValle, 2013a] and Answer Set Programming (ASP) [Erdem et al., 2013]. These methods return the optimal solution and are usually designed for the makespan cost function. They are less efficient, or even not applicable, for the sum-of-costs function. In addition, these algorithms are usually highly efficient only on small problem instances. On large problem instances the translation process from a MAPF instance to the target problem has a very large, yet polynomial, overhead which makes these approaches inefficient.

2.2.2 MAPF-Specific Sub-optimal Solvers

Algorithms of this class are usually highly efficient but do not guarantee optimality, and in some cases not even completeness. They are commonly used when the number of agents is large and finding the optimal solution is intractable. MAPF-specific sub-optimal solvers can be further classified into subclasses.

Search-Based Suboptimal Solvers

Search-based solvers usually aim to provide a high-quality solution (close to optimal) but they are not complete in many cases. These solvers differ in the way they treat conflicts



between agents. A prominent example of a search-based sub-optimal algorithm is Hierarchical Cooperative A* (HCA*) [Silver, 2005]. In HCA* the agents are planned one at a time according to some predefined order. Once the first agent finds a path to its goal, that path is written (reserved) into a global reservation table. More specifically, if the path found for any agent ai is v_i^0 = starti, v_i^1, v_i^2, . . . , v_i^l = goali, then the algorithm records that state v_i^j is occupied by agent ai at time point tj. Reservation tables can be implemented as a matrix of #vertices × #timesteps, or in a more compact representation such as a hash table for the items that have been reserved. When searching for a path for a later agent, paths chosen by previous agents are blocked. That is, the agent may not traverse locations that are in conflict with previous agents. An approach similar to HCA* was presented earlier for multi-agent motion planning [Erdmann and Lozano-Perez, 1987]. Windowed-HCA* (WHCA*) [Silver, 2005], one of several HCA* variants, only performs cooperative path finding within a limited window, after which other agents are ignored. A perfect single-agent heuristic is most often used to guide this search. Because HCA* is designed for games with limited memory, the heuristic cannot be pre-computed and must be calculated at runtime.
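The hash-table variant of the reservation table can be sketched as follows (an illustrative sketch; the class and method names are ours, not from [Silver, 2005], and only vertex occupancy is covered, whereas a full implementation would also reserve edge traversals):

```python
class ReservationTable:
    """Compact reservation table: a hash table keyed by (vertex, time)."""

    def __init__(self):
        self._occupied = {}                # (vertex, time) -> agent id

    def reserve_path(self, agent, path):
        # record that the agent occupies path[t] at every time step t
        for t, v in enumerate(path):
            self._occupied[(v, t)] = agent

    def is_free(self, vertex, time):
        # a later agent may only enter a vertex that is not reserved
        return (vertex, time) not in self._occupied
```

A later agent's single-agent search would call is_free on every candidate (vertex, time) pair, pruning moves that conflict with previously planned agents.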

Later work extended HCA* by abstracting the state space to reduce the runtime cost of building the heuristics [Sturtevant and Buro, 2006]. Finally, WHCA* was enhanced such that the windows are dynamically placed only around known conflicts and agents are prioritized according to the likelihood of being involved in a conflict [Bnaya and Felner, 2014]. The HCA* idea has a few drawbacks. First, when too many agents exist, deadlocks may occur, and HCA* is not guaranteed to be complete. Second, HCA* does not provide any guarantees on the quality of its solution, and thus the solutions may be far from optimal. Finally, HCA* may even slow the search significantly. This is particularly true with the windowed variant of HCA*, WHCA*. Because individual agents are independently looking for solutions of minimal length, agents may unnecessarily collide even when significant free space is available, incurring significant computational costs to resolve. The reservation table idea has also been used for managing traffic junctions where cars (agents) must cross a junction without causing collisions [Dresner and Stone, 2008]. That system solves the online version of the problem, where cars arrive at a junction and disappear once they cross it.

Rule-Based Suboptimal Solvers

Rule-based approaches include specific movement rules for different scenarios and usuallydo not include massive search. The agents plan their route according to the specific rules.Rule-based solvers favor completeness at low computational cost over solution quality.

TASS [Khorshid et al., 2011] and Push and Swap (and its variants) [Luna and Bekris, 2011; Sajid et al., 2012; de Wilde et al., 2013] are two recently proposed rule-based MAPF sub-optimal algorithms that run in polynomial time.1 Both algorithms use a set of “macro” operators. For instance, the Push and Swap algorithm uses a “swap” macro, which is a set of operators that swaps locations between two adjacent agents. Both TASS and Push and

1 An approach similar to TASS was presented earlier for tunnel environments [Peasgood et al., 2006].



Swap do not return an optimal solution and guarantee completeness for special cases only. TASS is complete only for tree graphs while Push and Rotate [de Wilde et al., 2013], a variant of Push and Swap, is complete for graphs where at least two vertices are always unoccupied, i.e., k ≤ |V| − 2.

Predating all of this work is a polynomial-time algorithm that is complete for all graphs [Daniel Kornhauser, 1984; Roger and Helmert, 2012]. This previous work focuses on a specific variant of MAPF called the pebble motion coordination problem (PMC). PMC is similar to MAPF where each agent is viewed as a pebble and each pebble needs to be moved to its goal location.

Hybrid Solvers

Some sub-optimal solvers are hybrids that include specific movement rules as well as significant search. For example, if the graph is a grid then establishing flow restrictions similar to traffic laws can simplify the problem [Wang and Botea, 2008; Jansen and Sturtevant, 2008]. Each row/column in the grid is assigned two directions. Agents are either encouraged or forced to move in the designated directions at each location in order to significantly reduce the chance of conflicts and the branching factor at each vertex. These approaches prioritize collision avoidance over shorter paths and work well in state spaces with large open areas. They are not complete for the general case, as deadlocks may occur in bottlenecks.

Another hybrid solver was presented by Wang and Botea [Wang and Botea, 2011]. The basic idea is to precompute a full path (Pi) for each agent (ai). For each pair of successive steps (pj, pj+1 ∈ Pi) an alternative sub-path is also pre-computed. If the original computed path of agent ai is blocked by another agent aj, agent ai is redirected to a bypass via one of the alternative paths. The main limitation of this approach is that it is only proven to be complete for grids which have the slidable property (defined in [Wang and Botea, 2011]). It is not clear how to generalize this algorithm to grids that are not slidable.

Ryan introduced a search-based approach for solving MAPF problems which uses abstraction to reduce the size of the state space [Ryan, 2008]. The input graph G is partitioned into subgraphs with special structures such as cliques, halls and rings. Each structure represents a certain topology (e.g., a hall is a singly-linked chain of vertices with any number of entrances and exits). Each structure has a set of rule-based operators such as Enter and Leave. Once a plan is found in the abstract space, a solution is derived using the rule-based operators. A general way to use these abstractions is to solve the entire MAPF problem as a Constraint Satisfaction Problem (CSP). Each special subgraph adds constraints to the CSP solver, making the CSP solver faster [Ryan, 2010]. The efficiency of subgraph decomposition (in terms of runtime) depends on the partitioning of the input graph. Finding the optimal partitioning is a hard problem and not always feasible. Open spaces are not suitable for partitioning into the defined structures, making this algorithm less effective on graphs with open spaces.



2.2.3 Optimal MAPF Solvers

Optimal MAPF solvers usually search a global search space which combines the individual states of all k agents. This state space is denoted as the k-agent state space. The states in the k-agent state space are the different ways to place k agents into |V| vertices, one agent per vertex. In the start and goal states agent ai is located at vertices starti and goali, respectively. Operators between states are all the non-conflicting actions (including wait) that the agents can perform. Given this general state space, any A*-based algorithm can be used to solve the MAPF problem optimally.

We use the term bbase to denote the branching factor of a single agent, that is, the number of locations that a single agent can move to in one time step. This thesis focuses on 4-connected grids where bbase = 5, since every agent can move in the four cardinal directions or wait at its current location. The maximum possible branching factor for k agents is bpotential = bbase^k. When expanding a state in a k-agent state space, all the bpotential combinations may be considered, but only those that have no conflicts (with other agents or with obstacles) are legal neighbors. The number of legal neighbors is denoted by blegal. Since blegal = O(bbase^k), for worst-case analysis one can consider blegal to be of the same order as bpotential, i.e., exponential in the number of agents (k). On the other hand, in dense graphs (with many agents and a small number of empty vertices), blegal can be much smaller than bpotential. In general, identifying the blegal legal neighbors among the bpotential possible neighbors is a Constraint Satisfaction Problem (CSP), where the variables are the agents, the values are the actions they take, and the constraints are to avoid conflicts. Hereafter we simply denote blegal by b.

Admissible Heuristics for MAPF

To solve MAPF more efficiently with A*, one requires a non-trivial admissible heuristic. A simple admissible heuristic is to sum the individual heuristics of the single agents, such as Manhattan distance for 4-connected grids or Euclidean distance for Euclidean graphs [Ryan, 2008]. HCA* improves on this by computing the optimal distance to the goal for each agent, ignoring other agents. As this task is strictly easier than searching with additional agents, it can be used as an admissible heuristic. HCA* performs this computation incrementally for each agent, while Standley performed the computation exhaustively a priori, before solving the MAPF problem [Standley, 2010].

We denote this heuristic as the sum of individual costs heuristic (SIC). Formally, the SIC heuristic is calculated as follows. For each agent ai we assume that no other agent exists and calculate its optimal individual path cost from all states in the state space to goali; this is usually done by a reverse search from the goal. The heuristic taken in the multi-agent A* search is the sum of these costs over all agents. For the example problem in Figure 2.1, the SIC heuristic is 3 + 3 = 6. Note that the maximum of the individual costs is an admissible heuristic for the makespan variant described above, in which the task is to minimize the total time elapsed.
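The SIC computation via a reverse search from each goal can be sketched as follows (an illustrative sketch, assuming unit edge costs and an undirected graph; in practice the per-goal tables would be precomputed once per instance rather than on every heuristic call):

```python
from collections import deque

def costs_to_goal(graph, goal):
    """Single-agent shortest-path cost from every vertex to `goal`,
    computed by a reverse breadth-first search from the goal.
    `graph` maps each vertex to a list of its neighbors."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def sic(graph, state, goals):
    """SIC heuristic: sum of the individual cost-to-goal values.
    `state` is a tuple of current vertices (one per agent),
    `goals` the corresponding goal vertices."""
    tables = [costs_to_goal(graph, g) for g in goals]
    return sum(tables[i][v] for i, v in enumerate(state))
```

On a graph shaped like the instance of Figure 2.1 (both agents three steps from their goals through the shared vertex D), the SIC value of the start state is 3 + 3 = 6.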

For small input graphs, the SIC heuristic for any problem configuration can be stored as a lookup table by precalculating the all-pairs shortest-path matrix for the input graph G.



For larger graphs we calculate the shortest path from each state only to the goal states of a given instance. This, however, must be recomputed for each problem instance with respect to its set of goal states.

Drawbacks of A* for MAPF

A* always begins by expanding a state and inserting its successors into the open list (denoted OPEN). All states expanded are maintained in a closed list (denoted CLOSED). Because of this, A* for MAPF suffers from two drawbacks. First, the size of the state space is exponential in the number of agents (k), meaning that CLOSED cannot be maintained in memory for large problems. Second, the branching factor of a given state may be exponential in k. Consider a state with 20 agents on a 4-connected grid. Each agent may have up to 5 possible moves (4 cardinal directions and wait). Fully generating all 5^20 ≈ 9.5 × 10^13 neighbors of even the start state could be computationally infeasible. The following enhancements have been proposed to overcome these drawbacks.

Reducing the Effective Number of Agents with Independence Detection

Since the state space of MAPF is exponential in the number of agents, an exponential speedup can be obtained by reducing the number of agents in the problem. To this end, Standley introduced the Independence Detection (ID) framework [Standley, 2010].

Algorithm 1: The ID framework
Input: A MAPF instance

1  Assign each agent to a singleton group
2  Plan a path for each group
3  repeat
4      Validate the combined solution
5      if a conflict is found then
6          Merge the two conflicting groups into a single group
7          Plan a path for the merged group
8  until no conflicts occur
9  return the paths of all groups combined

Two groups of agents are independent if there is an optimal solution for each group such that the two solutions do not conflict. ID attempts to detect independent groups of agents. Algorithm 1 provides the pseudo-code for ID. First, every agent is placed in its own group (line 1). Each group is solved separately using A* (line 2). The solution returned by A* is optimal with respect to the given group of agents. The paths of all groups are then checked for validity with respect to each other. If a conflict is found, the conflicting groups are merged into one group and solved optimally using A* (lines 6-7). This process of replanning and merging groups is repeated until there are no conflicts between the plans of all groups. The resulting groups are independent with regard to each other. Note that ID is not perfect, in the sense that more independent subgroups may lie undetected within the groups returned by ID.
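The control flow of Algorithm 1 can be sketched as follows, with the group solver and the conflict check supplied by the caller (an illustrative sketch; the function names are ours, and the line numbers in the comments refer to Algorithm 1):

```python
def independence_detection(agents, solve, find_conflict):
    """Sketch of the ID framework. `solve(group)` returns an optimal joint
    plan for a list of agents; `find_conflict(plans)` returns the keys of
    two conflicting groups, or None if the combined solution is valid."""
    plans = {(a,): solve([a]) for a in agents}   # lines 1-2: singleton groups
    while True:                                  # lines 3-8
        conflict = find_conflict(plans)          # line 4: validate
        if conflict is None:
            return plans                         # line 9: all groups independent
        g1, g2 = conflict
        merged = g1 + g2                         # line 6: merge the two groups
        del plans[g1], plans[g2]
        plans[merged] = solve(list(merged))      # line 7: replan the merged group
    # note: `solve` may be any optimal MAPF solver, not only A*
```

The sketch makes explicit that ID is a framework around an arbitrary optimal solver: only `solve` touches the underlying search algorithm.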



Since the complexity of a MAPF problem in general is exponential in the number of agents, the runtime of solving a MAPF problem with ID is dominated by the running time of solving the largest independent subproblem [Standley, 2010]. ID may identify that a solution to a k-agent MAPF problem can be composed from solutions of several independent subproblems. We use k′ to denote the effective number of agents, which is the number of agents in the largest independent subproblem (k′ ≤ k). As the problem is exponential in the number of agents, ID reduces the exponent from k to k′.

Consider A*+ID on our example problem in Figure 2.1 (Section 2.1.7), but with an additional agent. Assume that the third agent a3 is located at state D and that its goal state is S1. ID will work as follows. Individual optimal paths of cost 3 are found for agents a1 (path 〈S1, A1, D, G1〉) and a2 (path 〈S2, B1, D, G2〉), and a path of cost 2 is found for agent a3 (path 〈D, A2, S1〉). When validating the paths of agents a1 and a2, a conflict occurs at state D, and agents a1 and a2 are merged into one group. A* is called upon this group and returns a solution of cost 7 (agent a2 waits one step at B1). This solution is now validated against the solution of agent a3. No conflict is found and the algorithm halts. The largest group solved by A* was of size 2. Without ID, A* would have to solve a problem with 3 agents. Thus, the worst-case branching factor was reduced from bbase^3 to bbase^2.

It is important to note that the ID framework can be implemented on top of any optimal MAPF solver (one that is guaranteed to return an optimal solution) in line 7. Therefore, ID can be viewed as a general framework that utilizes a MAPF solver. Hence, ID is also applicable with the algorithms proposed in this thesis (ICTS and CBS) instead of A*. Indeed, in the experimental evaluation we ran ID on top of ICTS and CBS.

Enhancements to ID

In order to improve the chance of identifying independent groups of agents, Standley proposed a tie-breaking rule that uses a conflict avoidance table (CAT), as follows. The paths that were found for the agents are stored in the CAT. When a newly formed, merged group is solved with A* (line 7), the A* search breaks ties in favor of states that create the fewest conflicts with the existing planned paths of other groups (agents that are not part of the merged group), as stored in the CAT. The outcome of this improvement is that the solution found by A* using the conflict-avoidance tie-breaking is less likely to cause a conflict with the other agents. As a result, agents' paths are more likely to be independent, resulting in substantial speedup.

Standley also presented an enhanced version of ID (EID) [Standley, 2010]. In this version, once two groups of agents are found to conflict, a conflict-resolution procedure is called prior to merging the groups. EID tries to resolve the conflict by attempting to replan one group to avoid the plan of the other group, and vice versa. To maintain optimality, the cost of the plan found during replanning must be exactly the same as the cost of the original optimal solution for that group. If the conflict between the groups is not resolved, the groups are merged and solved together as in basic ID. If the resolve procedure is able to solve the conflict, the groups are not merged and the main loop continues.



M*

An algorithm related to ID is M* [Wagner and Choset, 2011]. It is an A*-based algorithm that dynamically changes the branching factor based on conflicts. In general, expanded nodes generate only one child in which each agent makes its optimal move towards the goal. This continues until a conflict occurs between q ≥ 2 agents at node n. In this case there is a need to locally increase the search dimensionality. M* traces back from n through all the ancestors of n up to the root node, and all these nodes are placed back in OPEN. If one of these nodes is expanded again, it generates b^q children in which the q conflicting agents make all possible moves and the k − q non-conflicting agents make their optimal move.

An enhanced version, called Recursive M* (RM*) [Wagner and Choset, 2011], divides the q conflicting agents into subgroups of agents, each with independent conflicts. Then, RM* is called recursively on each of these groups. A variant called ODRM* [Ferner et al., 2013b] combines Standley's Operator Decomposition on top of RM*.

Avoiding Surplus Nodes

Under some restrictions, A* is known to expand the minimal number of nodes required to find an optimal solution [Dechter and Pearl, 1985]. There is no such guarantee regarding the number of nodes generated by A*. Some of the nodes must be generated in order for an optimal solution to be found, but nodes with f > C∗, known as surplus nodes [Goldenberg et al., 2014], are not needed in order to find an optimal solution.

Surplus nodes are all nodes that were generated but never expanded. The number of generated nodes is the number of expanded nodes times the branching factor. Thus, in MAPF, where the branching factor is exponential in the number of agents, the number of surplus nodes is potentially huge, and avoiding generating them can yield a substantial speedup [Goldenberg et al., 2014; Goldenberg et al., 2012]. The challenge is how to identify surplus nodes during the search. Next, we describe existing techniques that attempt to detect surplus nodes.

Operator Decomposition

The first step towards reducing the number of surplus nodes was introduced by Standley in his operator decomposition technique (OD) [Standley, 2010]. Agents are assigned an arbitrary (but fixed) order. When a regular A* node is expanded, OD considers and applies only the moves of the first agent. Doing so introduces an intermediate node. At intermediate nodes, only the moves of a single agent are considered, generating further intermediate nodes. When an operator is applied to the last agent, a regular node is generated. Once the solution is found, intermediate nodes in OPEN are not developed further into regular nodes, so the number of regular surplus nodes is significantly reduced.



Enhanced Partial Expansion

Enhanced Partial Expansion A* (EPEA*) [Goldenberg et al., 2014] is an algorithm that avoids the generation of surplus nodes and, to the best of our knowledge, is the best A*-based solver for MAPF. EPEA* uses a priori domain knowledge to avoid generating surplus nodes. When expanding a node N, EPEA* generates only the children Nc with f(Nc) = f(N). The other children of N (with f(Nc) ≠ f(N)) are discarded. This is done with the help of a domain-dependent operator selection function (OSF). The OSF returns the exact list of operators which will generate nodes with the desired f-value (i.e., f(N)). N is then re-inserted into the open list with an f-cost equal to that of its next-best child. N might be re-expanded later, when its new f-value becomes the best in the open list. This avoids the generation of surplus nodes and dramatically reduces the number of generated nodes.

In MAPF problems, when using the SIC heuristic, the effect on the f-value of moving a single agent in a given direction can be computed efficiently. Exact details of how this OSF for MAPF is computed and implemented can be found in [Goldenberg et al., 2014].

2.3 Conclusions

In this chapter we defined MAPF and provided a survey of previous work, introducing a categorization that classifies previous work into the following classes:

1. Optimal solvers [Standley, 2010; Goldenberg et al., 2014; Sharon et al., 2013a; Sharon et al., 2015]

2. Sub-optimal search-based solvers [Silver, 2005; Dresner and Stone, 2008]

3. Sub-optimal procedure-based solvers [Daniel Kornhauser, 1984; Luna and Bekris, 2011; Khorshid et al., 2011; Sajid et al., 2012; de Wilde et al., 2013]

4. Sub-optimal hybrid solvers [Wang and Botea, 2008; Jansen and Sturtevant, 2008; Wang and Botea, 2011; Ryan, 2008]

We believe this new terminology will help classify and evaluate MAPF algorithms in the future.



Chapter 3

The Increasing Cost Tree Search

We present a novel formalization for the MAPF problem which includes a search tree called the increasing cost tree (ICT) and a corresponding search algorithm, called the increasing cost tree search (ICTS), that finds optimal solutions. ICTS is a two-level search algorithm. The high-level phase of ICTS searches the increasing cost tree for a set of costs (one cost per agent). The low-level phase of ICTS searches for a valid path for every agent, where each path is constrained to have exactly the cost given by the high-level phase.

We analyze this new formalization, compare it to the A* search formalization and provide the pros and cons of each. We then show how the unique formalization of ICTS allows even further pruning of the state space by grouping small sets of agents and identifying unsolvable combinations of costs. Experimental results on various domains show the benefits and limitations of our new approach. A speedup of up to three orders of magnitude was obtained in some cases.

This chapter is organized as follows. In Section 3.1 we introduce and formulate the ICTS algorithm. Section 3.2 theoretically compares ICTS to the A*-based search algorithms and shows its advantages and drawbacks. Section 3.3 provides initial experimental results. Section 3.4 introduces enhancements to the basic ICTS in the form of various pruning techniques. Sections 3.5 and 3.6 provide further experimental results, and Section 3.7 concludes this chapter and describes future and ongoing work.

This chapter is almost an exact copy of [Sharon et al., 2013a].

3.1 ICTS Formalization

In this section we describe our new increasing cost search formalization for the MAPF problem. This formalization is then used to construct an efficient optimal search algorithm, called the increasing cost tree search algorithm (ICTS).

As described in the previous sections, the classical coupled global-search approach spans an A*-based search tree where states correspond to the possible locations of each of the agents. This search is coupled with an admissible heuristic that guides it. ICTS is based on a conceptually different formalization and, unlike A*, is not guided by a heuristic. Based on the understanding that a complete solution for the entire problem is built from individual paths (one for each agent), ICTS divides the MAPF problem into



two problems:

1. What is the cost of the path of each individual agent in the optimal solution?

2. How can a set of non-conflicting paths for all the agents be found, given their individual costs?

ICTS answers these two questions at the two levels of the algorithm.

• High-level search: searches for a minimal-cost solution in a search space that spans combinations of individual agent costs (one cost for each agent).

• Low-level search: searches for a valid solution under the cost constraints (given by the high-level search). The low-level search can be viewed as the goal test of the high-level search.

Next, we cover these levels in detail.

3.1.1 High-Level Search

The high-level search is performed on a new search tree called the increasing cost tree (ICT). The ICT is built as follows.

• Nodes: In ICT, every node s consists of a k-vector of individual path costs, [C1, C2, . . . , Ck], with one cost per agent. Node s represents all possible complete solutions in which the cost of the individual path of agent ai is exactly Ci. The total cost of node s is C1 + C2 + . . . + Ck. Nodes at the same level of ICT have the same total cost.

• Root of the tree: The root of ICT is [opt1, opt2, . . . , optk], where opti is the cost of the optimal individual path for agent ai assuming no other agents exist. This vector for the root of ICT is equivalent to the SIC heuristic of the start state if used with A*-based searches as described above.1

• Successor function: Each node [C1, C2, . . . , Ck] generates k successors. Successor succi increments the cost of agent ai, resulting in the cost vector succi = [C1, C2, . . . , Ci−1, Ci + 1, Ci+1, . . . , Ck], thus increasing the total cost by one.

• Goal test: An ICT node [C1, . . . , Ck] is a goal node if there is a non-conflicting complete solution such that the cost of the individual path for agent ai is exactly Ci.

Figure 3.1 shows an example of an ICT with three agents, all with individual optimal path costs of 10. Dashed lines mark duplicate children, which can be pruned. Since the costs of nodes in the ICT increase by one at each successive level, it is easy to see that a breadth-first search of ICT will find the optimal solution.
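The breadth-first search of ICT, with duplicate pruning, can be sketched as follows (an illustrative sketch; the low-level goal test is passed in as a black box, and the search assumes some goal node exists, as in a solvable MAPF instance):

```python
def ict_bfs(root, is_goal):
    """Breadth-first search of the increasing cost tree. `root` is the
    vector of optimal individual costs [opt1, ..., optk]; `is_goal` is
    the low-level goal test on a cost vector. Duplicate children are
    pruned with a visited set."""
    frontier = [tuple(root)]
    visited = {tuple(root)}
    while frontier:
        next_frontier = []
        for node in frontier:               # all nodes of equal total cost
            if is_goal(node):
                return node                 # first goal found is optimal
            for i in range(len(node)):      # k successors: +1 to one agent
                child = node[:i] + (node[i] + 1,) + node[i + 1:]
                if child not in visited:    # prune duplicate children
                    visited.add(child)
                    next_frontier.append(child)
        frontier = next_frontier            # next level: total cost + 1
```

Because each level of the tree holds all cost vectors of a fixed total cost, the first node that passes the goal test yields the minimal sum-of-costs.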

¹We note, however, that A*-based searches are not restricted to SIC and may use any admissible heuristic. By contrast, ICTS is built exclusively on information about paths of individual agents, which is logically equivalent to the SIC heuristic.


Figure 3.1: ICT for three agents. The root is [10,10,10]; its children are [10,10,11], [10,11,10] and [11,10,10]; the next level contains [10,10,12], [10,11,11], [10,12,10], [11,10,11], [11,11,10] and [12,10,10].

The depth of the optimal goal node in the ICT is denoted by ∆. ∆ equals the difference between the cost of the optimal complete solution (C*) and the cost of the root (i.e., ∆ = C* − (opt_1 + opt_2 + ... + opt_k)). The branching factor of the ICT is exactly k (before pruning duplicates) and therefore the number of nodes in the ICT is O(k^∆).² Thus, the size of the ICT is exponential in ∆ but not in k. In the extreme case where all agents can reach their goals via an optimal path without conflicting with each other, we have ∆ = 0 and an ICT with a single node, regardless of the number of agents.
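The exact node count (the closed-form expression in footnote 2: the number of ways to distribute i extra unit costs among k agents, summed over levels 0..∆) can be checked numerically against a brute-force enumeration of cost-increment vectors. The function names below are ours:

```python
from itertools import product
from math import comb

def ict_nodes_by_formula(k, delta):
    # sum over levels i of the binomial coefficient C(k + i - 1, k - 1)
    return sum(comb(k + i - 1, k - 1) for i in range(delta + 1))

def ict_nodes_by_enumeration(k, delta):
    # distinct increment vectors over the root whose total is <= delta
    return sum(1 for v in product(range(delta + 1), repeat=k)
               if sum(v) <= delta)
```

For instance, with k = 3 and ∆ = 2 both functions return 10 (1 + 3 + 6), matching the three levels of Figure 3.1.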

The value of ∆ is constant but can only be revealed a posteriori, after the optimal solution cost is found. Many factors affect the value of ∆ for a given problem instance: for example, the number of agents k, the topology of the searched map, and the ratio between the number of agents and the number of vertices of the graph. Section 3.3.3 discusses the effect of k on ∆. In general, the experimental results (Section 3.3) show that while ∆ is affected by k, it can be significantly smaller than k in some cases, while in other cases the opposite is true. For example, in large open maps with a small number of agents, we expect ∆ to grow more slowly than k, while in narrow corridors where many conflicts occur, we expect ∆ to grow faster than k, even for small values of k.

The high-level phase searches the ICT. For each ICT node, the low-level search determines whether it is a goal node, i.e., whether its cost vector corresponds to non-conflicting individual paths of the given costs. This is discussed next.

3.1.2 Low-Level Search

A straightforward approach to check whether an ICT node [C_1, C_2, ..., C_k] is a goal would be: (1) for every agent a_i, enumerate all the possible individual paths of cost C_i; (2) iterate over all possible combinations of individual paths of the different agents until a complete non-conflicting solution is found.

The main problem with this approach is that for every agent a_i there may be an exponential number of paths of cost C_i. Moreover, the number of possible ways to combine paths of different agents is the cross product of the number of paths for every agent. Next, we introduce an effective algorithm for doing this.

²More accurately, the exact number of nodes at level i of the ICT is the number of ways to distribute i balls (actions) into k ordered buckets (agents). For the entire ICT this is ∑_{i=0}^{∆} (k + i − 1 choose k − 1).


Compact Paths Representation with MDDs

The number of different paths of length C_i for agent a_i can be exponential. We suggest storing these paths in a special compact data structure called a multi-value decision diagram (MDD) [Srinivasan et al., 1990]. MDDs are DAGs which generalize Binary Decision Diagrams (BDDs) by allowing more than two choices at every decision node.

An MDD for our purpose is structured as follows. Let MDD^c_i be the MDD for agent a_i which stores all the possible paths of cost c. It has a single source node. Nodes of the MDD can be distinguished by their depth below the source node. Every node at depth t of MDD^c_i corresponds to a possible location of a_i at time t that lies on a path of cost c from start_i to goal_i. MDD^c_i has a single source node at level 0, corresponding to agent a_i located at start_i at time t_0, and a single sink node at level c, corresponding to agent a_i located at goal_i at time t_c.

Figure 3.2: 2- and 3-step MDDs for agent a_1 (MDD^2_1 and MDD^3_1) and the 2-step MDD for agent a_2 (MDD^2_2). MDD^2_1 is the single path A-C-F; MDD^3_1 has levels {A}, {A, B, C}, {C, E, F}, {F}; MDD^2_2 is the single path B-C-D.

Consider again our example problem instance from Figure 2.2 (Section 2.1.7). For this problem, Figure 3.2 illustrates MDD^2_1 and MDD^3_1 for agent a_1, and MDD^2_2 for agent a_2. Agent a_1 only has a single path of length 2, and thus MDD^2_1 stores only one path. Now consider MDD^3_1. At time 0, the agent is at location A. Next, it has three options for time 1: move to B, wait at A (which causes A to also appear at level 1 of this MDD), or move to C. Thus, it can be at any of these locations at time step 1. These are all prefixes of paths of length 3 to the goal. Note that while the number of paths of cost c might be exponential in c, the size of MDD^c_i is at most |V| × c, as each level of the MDD has no more than |V| nodes and there are exactly c levels. For example, MDD^3_1 includes 5 different paths of cost 3.

Building the MDD is very easy. We perform a breadth-first search from the start location of agent a_i down to depth c and only store the partial DAG which starts at start_i and ends at goal_i at depth c. Furthermore, MDD^c_i can be reused to build MDD^{c+1}_i; thus, one might use previously built MDDs when constructing a new MDD.

We use the term MDD^c_i(x, t) to denote the node in MDD^c_i that corresponds to location x at time t. For example, the source node of MDD^2_1 shown in Figure 3.2 is denoted by MDD^2_1(A, 0). We use the term MDD_i when the depth of the MDD is clear from the context.
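The construction just described (forward BFS to depth c, then keeping only the nodes that still lie on a start-to-goal path of exactly c steps) can be sketched as follows. The representation is our own choice for illustration: an MDD is a list of per-time-step dictionaries mapping each location to its children, and the wait action is modeled by including a node's own location among its successors.

```python
def build_mdd(graph, start, goal, c):
    """Build MDD^c: levels[t] maps each location reachable at time t
    (on some start->goal path of exactly c steps) to its children.

    graph: dict mapping a location to its adjacent locations.
    """
    # Forward BFS: locations reachable from start at each time step 0..c.
    fwd = [{start}]
    for _ in range(c):
        fwd.append({n for x in fwd[-1] for n in set(graph[x]) | {x}})
    # Backward pruning: keep only nodes on a start->goal path of cost c.
    levels = [dict() for _ in range(c + 1)]
    if goal in fwd[c]:
        levels[c][goal] = []
    for t in range(c - 1, -1, -1):
        for x in fwd[t]:
            children = [n for n in set(graph[x]) | {x} if n in levels[t + 1]]
            if children:
                levels[t][x] = children
    return levels   # empty levels[0] means no path of cost c exists
```

On a hypothetical toy graph mirroring the example (A adjacent to B and C, B to C and E, C to F, E to F), `build_mdd(graph, 'A', 'F', 2)` yields the single path A-C-F, while depth 3 yields the levels {A}, {A, B, C}, {C, E, F}, {F} containing 5 distinct paths.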


Example of ICTS

Figure 3.3: ICT for the problem in Figure 2.2. The root [2,2] has cost 4; its children [3,2] and [2,3] have cost 5, and [3,2] is the goal; the next level contains [4,2], [3,3] and [2,4] with cost 6.

The low-level search performs a goal test on nodes of the ICT as follows. For every ICT node, we build the corresponding MDD for each of the agents. Then, we need to find a set of paths, one from each MDD, that do not conflict with each other. The ICT for our example problem from Figure 2.2 (Section 2.1.7) is shown in Figure 3.3. The high-level search starts with the root ICT node [2, 2]. MDD^2_1 and MDD^2_2 (shown in Figure 3.2) have a conflict, as they both have state C at level 1. The ICT root node is therefore declared a non-goal by the low-level search. Next, ICT node [3, 2] is verified by the high-level search. Now MDD^3_1 and MDD^2_2 have non-conflicting complete solutions, for example, [A-B-C-F] for a_1 and [B-C-D] for a_2. Therefore, this node is declared a goal node by the low-level search and the solution cost 5 is returned.

Goal Test in the k-Agent-MDD Search Space

Next, we present an efficient algorithm that iterates over the MDDs to determine whether a set of non-conflicting paths exists. We begin with the 2-agent case and then generalize to k > 2.

Figure 3.4: The 2-step MDD for agent a_2 (the path B-C-D) and its extension MDD^{2'}_2, in which a dummy node D is appended after the sink.

Consider two agents a_i and a_j located in their start positions. Define the global 2-agent search space as the state space spanned by moving these agents simultaneously in all possible directions. This is the same search space that is searched by the standard A*-based search algorithms for this problem [Standley, 2010]. Now consider the MDDs of agents a_i and a_j that correspond to a given ICT node [c, d]: MDD^c_i and MDD^d_j. It is important to note that without loss of generality we can assume that c = d. Otherwise, if c > d, a path of (c − d) dummy goal nodes can be added to the sink node of MDD^d_j to get an equivalent MDD, MDD^c_j. Figure 3.4 shows MDD^{2'}_2, where a dummy edge (with node D) was added to the sink node of MDD^2_2.

The cross product of MDD_i and MDD_j spans a subset of the global 2-agent search space, denoted as the 2-agent-MDD search space. This 2-agent-MDD search space is a subset of the global 2-agent search space, because we are constrained to only consider moves according to edges of the single-agent MDDs. We now define a 2-agent MDD, denoted MDD_ij, for agents a_i and a_j. A 2-agent MDD is a generalization of a single-agent MDD to two agents. MDD_ij is a subset of the 2-agent-MDD search space spanned by the cross product of the two single-agent MDDs. Every node in MDD_ij corresponds to a valid (non-conflicting) pair of locations of the two agents. That is, nodes in the cross product that correspond to a conflict are part of the 2-agent-MDD search space but are not part of the 2-agent MDD.

MDD_ij is formally defined as follows. A node n = MDD_ij([x_i, x_j], t) includes a pair of locations [x_i, x_j] for a_i and a_j at time t. It is a unification of the two MDD nodes MDD_i(x_i, t) and MDD_j(x_j, t). Starting at the two source nodes of MDD_i and MDD_j, MDD_ij is built level by level. The source node MDD_ij([x_i, x_j], 0) is the unification of the two source nodes MDD_i(x_i, 0) and MDD_j(x_j, 0). Consider node MDD_ij([x_i, x_j], t). The cross product of the children of MDD_i(x_i, t) and MDD_j(x_j, t) should be examined (they are all in the 2-agent-MDD search space), and only non-conflicting pairs are added as children of MDD_ij([x_i, x_j], t). In other words, we look at all pairs of nodes MDD_i(x'_i, t + 1) and MDD_j(x'_j, t + 1) such that x'_i and x'_j are children of x_i and x_j, respectively. If x'_i and x'_j do not conflict,³ then MDD_ij([x'_i, x'_j], t + 1) becomes a child of MDD_ij([x_i, x_j], t) in MDD_ij. There are at most |V| nodes at each level t of the single-agent MDDs. Thus, the size of the 2-agent MDD of height c is at most c × |V|².
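The level-by-level unification just described can be sketched as follows. This is an illustrative sketch under our own representation (an MDD is a list of levels mapping a location, or pair of locations, to its children), and both input MDDs are assumed to have already been padded to equal depth with dummy goal nodes:

```python
def merge_mdds(mdd_a, mdd_b):
    """Build the 2-agent MDD from two single-agent MDDs of equal depth.

    Each input is a list of levels; level t maps a location to its
    children at time t+1.  Returns the levels of MDD_ij, mapping a
    pair (x_a, x_b) to its non-conflicting child pairs.
    """
    c = len(mdd_a) - 1
    src_a, src_b = next(iter(mdd_a[0])), next(iter(mdd_b[0]))
    levels = [dict() for _ in range(c + 1)]
    levels[0][(src_a, src_b)] = []
    for t in range(c):
        for (xa, xb) in levels[t]:
            kids = []
            for na in mdd_a[t][xa]:
                for nb in mdd_b[t][xb]:
                    if na == nb:                  # vertex conflict
                        continue
                    if na == xb and nb == xa:     # edge (swap) conflict
                        continue
                    kids.append((na, nb))
                    levels[t + 1].setdefault((na, nb), [])
            levels[t][(xa, xb)] = kids
    return levels   # a pair at level c is the sink: a solution exists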

Figure 3.5: (i) Merging MDD^3_1 and MDD^{2'}_2 into the 2-agent MDD MDD^3_12, whose nodes are (A,B), (A,C), (B,C), (C,D), (E,D) and (F,D); the unification (C,C) is a conflict and is pruned, and the node (F,D) at level 2 is pruned with it. (ii) The unfolded MDD*^3_1.

In principle, one can actually build and store MDD_ij in memory by performing a search over the two single-agent MDDs and unifying the relevant nodes. Duplicate nodes at level t can be merged into one copy, but we must add an edge for each parent at level t − 1. For example, Figure 3.5(i) shows how MDD^3_1 and MDD^{2'}_2 (which has a dummy edge so as to have 3 levels) were merged into a 2-agent MDD, MDD^3_12. Elements in bold represent the resulting 2-agent MDD, MDD^3_12. Dotted elements represent parts of the 2-agent-MDD

³They conflict if x'_i = x'_j, or if x'_i = x_j and x'_j = x_i, in which case they are traversing the same edge in opposite directions.


search space that are pruned. In the process of building MDD^3_12, a conflict (where both agents are assigned position C at time step 1) was encountered. The conflicting node from the 2-agent-MDD search space is therefore not added to MDD^3_12, together with all its outgoing edges. Note that node (F, D) at the second level is also not added to MDD^3_12, despite the fact that it does not represent a conflict. This happens because all its legal predecessors (in this case only node (C, C)) are not part of MDD^3_12. There is only one possible node at level c (the height of the MDDs), in which both agents arrive at their goals. This is the sink node of MDD_ij. Any path to it represents a valid solution to the 2-agent problem.

In practice, one does not necessarily need to build the entire structure of MDD_ij and store it in memory. In order to perform the goal test for a given ICT node, all that is needed is to systematically search through the nodes of MDD_ij and check whether such a sink node exists in MDD_ij (in which case true is returned) or prove that such a node does not exist (in which case false is returned). The exact search we use is described below in Section 3.1.3.

Generalization to k > 2 Agents

Generalization to k > 2 agents is straightforward. The k-agent MDD search space is the cross product of all k single-agent MDDs. Similarly, a node in a k-agent MDD, n = MDD_[k](x_[k], t), includes k locations of the k agents at time t in the vector x_[k]. It is a unification of k single-agent MDD nodes of level t that do not conflict. The size of MDD_[k] is O(c × |V|^k). The low-level search is performed on the k-agent MDD search space, which is the cross product of all the k single-agent MDDs associated with the given ICT node. In practice (as just explained above for 2 agents), visiting all nodes of MDD_[k] is not mandatory in order to perform this goal test. Furthermore, as a combined MDD (for multiple agents) is potentially exponential, we would like to avoid building it entirely in memory. Instead, we perform a systematic exhaustive search through the k-agent-MDD search space in order to check whether the current ICT node is a goal node. This is done by first building a single-agent MDD for each of the k agents. Then, we run a search in the k-agent MDD search space, constrained by the unification rules defined above. This results in visiting the nodes of the k-agent MDD. Once a node at level c is reached, true is returned. If the entire k-agent MDD was scanned and no node at level c was reached, false is returned. This means that there is no way to unify k paths from the k single-agent MDDs; in other words, dead ends were reached in MDD^c_[k] before arriving at level c. This search in the k-agent-MDD search space is called the low-level search. We note that any systematic exhaustive search will work here; we discuss several alternatives in Section 3.1.3 below.

3.1.3 The ICTS Algorithm

ICTS is summarized in Algorithm 2. The high-level phase searches each node of the ICT (line 2). Then, the low-level phase searches the corresponding k-agent-MDD search space (line 11). The lines in the square brackets (lines 5-10) are optional and will be discussed


Algorithm 2: The ICT-search algorithm
Input: (k, n) MAPF
 1  Build the root of the ICT
 2  foreach ICT node, in a breadth-first manner, do
 3      foreach agent a_i do
 4          Build the corresponding MDD_i
 5      [ // optional
 6      foreach pair (triple) of agents do
 7          Perform node-pruning
 8          if node-pruning failed then
 9              break // conflict found; continue to the next ICT node
10      ]
11      Search the k-agent MDD search space // low-level search
12      if a goal node was found then
13          return solution

in Section 3.4 as a possible enhancement for further pruning the search space before the low-level search is activated. The version without these pruning techniques is referred to as basic ICTS.

Search Choices

To guarantee admissibility, the high-level search should be done with breadth-first search (or any of its variants). However, any systematic search algorithm can be activated by the low-level search on the k-agent MDD search space (line 11). We found the best variant to be depth-first search. The advantage of DFS is that if a solution exists (and the corresponding ICT node will be declared the goal), it will find the solution quickly, especially if many such solutions exist. In order to prune duplicate nodes, we coupled the DFS with a transposition table that stores all the visited nodes. This ensures that the low-level search for a given ICT node visits every node in the MDD search space at most once.

Note that, as stated in Section 2.2.3, ICTS can be activated on top of the ID framework. In such a case, ICTS is used to optimally solve subgroups of the k agents (see Algorithm 1 for a description of ID). Standley has shown that when solving one subgroup of agents it is worthwhile to prefer solutions that do not create conflicts with agents from other subgroups [Standley, 2010]. This is done using the conflict avoidance table, as explained in Section 2.2.3. Therefore, when ICTS is used on top of ID, we slightly modified the search strategy of the low-level search to prefer low-level nodes with minimal conflicts with other groups, as follows.

Depth-first search can be implemented as a best-first search where the cost function is the number of steps away from the start state (i.e., the g-value); DFS expands the node with the highest g-value. We used this implementation for our low-level search. When ID was used, we changed the cost function of the low-level search to prefer the k-MDD node with the smallest number of conflicts with other groups, as reported by the conflict avoidance table.
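As an illustrative sketch (names and signature are ours, not from the thesis), such a best-first search with a pluggable cost function might look like:

```python
import heapq

def best_first_search(start, successors, is_goal, cost):
    """Best-first search; cost(node, g) orders the open list.

    Passing cost = lambda n, g: -g expands the node with the highest
    g-value first, i.e., depth-first behavior; with ID, a
    conflict-avoidance count could be returned instead.
    """
    open_list = [(cost(start, 0), 0, 0, start)]  # (key, tie, g, node)
    tie = 1                        # unique tie-breaker for heap entries
    visited = {start}              # transposition table
    while open_list:
        _, _, g, node = heapq.heappop(open_list)
        if is_goal(node):
            return node
        for child in successors(node):
            if child not in visited:
                visited.add(child)
                heapq.heappush(open_list,
                               (cost(child, g + 1), tie, g + 1, child))
                tie += 1
    return None
```

Swapping the cost function is then the only change needed between the two variants described above.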


We therefore only report results for this variant in the experimental section. That is, we used DFS with a transposition table when ID was not used (experiments of type 2 below) and best-first search (which prefers nodes with a small number of conflicts with other groups) when ID was used (experiments of type 1 below).

3.2 Theoretical Analysis

This section compares the amount of effort done by ICTS to that of A* and A*+OD with the SIC heuristic. It is well known that A* will always expand all the nodes with f < C* and some of the nodes with f = C*. In the worst case (depending on tie-breaking), A* expands all the nodes with f ≤ C*. Let X be the number of nodes expanded by A* with the SIC heuristic in the worst case, i.e., X is the number of nodes with f ≤ C*.

A* is known to be "optimally effective", which means that A* expands the minimal number of nodes necessary to ensure an optimal solution [Dechter and Pearl, 1985]. As such, any algorithm that is guaranteed to return the optimal solution has a computational complexity of at least O(X). Therefore, we next analyze the complexity of ICTS, as well as A* and A*+OD, with respect to X. We show that the actual work done by A* and A*+OD (in terms of the total numbers of expanded and generated nodes) is much larger than X, and is substantially larger than that of ICTS in some cases.

3.2.1 ICTS

The time complexity of ICTS is composed of the complexity of applying the low-level search for every ICT node visited by the high-level search. As explained in Section 3.1.1, the number of ICT nodes visited by the high-level search until the optimal solution is found is O(k^∆). Therefore, the complexity of ICTS is the complexity of the low-level search on a single ICT node times O(k^∆).

The low-level search is a systematic search of the k-agent MDD search space. To search the k-agent MDD search space, one must first build the single-agent MDD for each of the k agents and then consider their cross product. Therefore, the complexity of the low-level search is composed of the complexity of building k single-agent MDDs and of searching the k-agent MDD search space.

Building an MDD for a single agent requires, in the worst case, time that is linear in the size of the single-agent state space and the depth of the MDD. This is, in general, exponentially smaller than the number of states visited by A* (X) when searching the global k-agent state space; thus, it can be omitted. Next, we show that the complexity of searching the k-agent MDD search space (i.e., the number of nodes in the k-agent MDD search space that are visited during the search) is bounded by X.

Lemma 1 For every node m in the global k-agent search space that is visited by a low-level search for ICT node [C_1, ..., C_k], it holds that f(m) ≤ ∑_{i=1}^{k} C_i, where f(m) is the cost given to node m by an A* search with the SIC heuristic.


Proof: Node m represents a possible location for every agent, denoted loc_i(m), at time step t. Let g_i(m) be the cost of agent a_i arriving at loc_i(m). Similarly, let h_i(m) be the heuristic estimate of the distance from loc_i(m) to goal_i, and let f_i(m) = g_i(m) + h_i(m).

The cost of node m in an A* search with the SIC heuristic is given by the sum of the f_i(m), since:

f(m) = g(m) + h(m) = ∑_{i=1}^{k} g_i(m) + ∑_{i=1}^{k} h_i(m) = ∑_{i=1}^{k} f_i(m)

By definition, the k-agent MDD search space is spanned by MDD_1 × MDD_2 × ... × MDD_k. MDD_i contains only nodes on a path of cost C_i from start_i to goal_i. Consequently, f_i(m) ≤ C_i, since f_i(m) uses an admissible heuristic (h_i(m)). Therefore, we can conclude that f(m) ≤ ∑_{i=1}^{k} C_i, as required.

Theorem 1 For any ICT node visited by the high-level search, the low-level search on the relevant k-agent MDD search space will visit at most X states from the global k-agent search space.

Proof: Since the high-level search is performed in a breadth-first manner, the cost of every ICT node returned by the high-level search never exceeds C*. This means that for any ICT node [C_1, ..., C_k] that is visited by the high-level search, it holds that C_1 + ... + C_k ≤ C*.

Following Lemma 1, any node m visited by the low-level search has f(m) ≤ C*. Since X is the number of nodes with f(m) ≤ C*, we can conclude that any low-level search visits at most X states. Note that since we implemented the low-level search with DFS plus a transposition table, each of these X states is visited at most once.

An important observation is that nodes in the global k-agent search space that are outside the k-agent MDD search space are never visited by the low-level search. This means that no node with cost larger than C* will ever be considered by ICTS. As will be shown next (in Section 3.2.2), this is not the case with A*.

While a single low-level search of ICTS will not visit more than X states from the global k-agent search space, ICTS performs many low-level searches, one for each ICT node. Since the number of ICT nodes is bounded by k^∆, and the number of nodes visited for a single ICT node is bounded by X, the number of nodes visited by ICTS is O(X × k^∆).⁴

Next, we perform a similar analysis of the number of nodes visited by A*, with respectto X .

⁴This is an upper bound. Searches in ICT nodes at level l will visit no more than the number of nodes with f = SIC(root) + l; only at depth ∆ does this equal C*. Furthermore, one can potentially reuse information from MDDs across ICT nodes.


3.2.2 A*

While A* expands only X nodes, it generates (= adds to the open list) many more. Every time a non-goal node is expanded, A* generates all its b_legal = O(b_base^k) children.⁵ Therefore, the total number of nodes generated by A* is X × b_legal.⁶ Recall that b_legal = O(b_base^k). Therefore, A* will generate more nodes than X by a factor that is exponential in k. On a four-connected grid this is a factor of 5^k.

The main extra work of A* with respect to X is that nodes with f > C* might be generated as children of expanded nodes with f ≤ C*. A* adds these generated nodes to the open list but never expands them.⁷ Therefore, the number of nodes visited by A* is actually O(X × b_base^k).

3.2.3 A*+OD

As described in Section 2.2.3, Operator Decomposition (OD) is a recent improvement of A* where intermediate states are generated between the regular full states by applying an operator to a single agent only. Thus, there is a path of k intermediate states between any pair of neighboring full states. OD reduces the branching factor from O(b_base^k) to b_base (the single-agent branching factor). However, since each operator only advances a single agent rather than every agent, the depth of a goal node increases by a factor of k.

While A*+OD expands more nodes than A*, it can generate substantially fewer nodes than A*. As explained above, when A* expands all the X nodes, it generates O(b_base^k × X) nodes. In OD, when any one of these X nodes is expanded, initially (for the first agent) only b_base nodes are generated. If the f-values of these b_base nodes are above C*, they will not be further expanded. Thus, potentially, OD may reduce the number of nodes generated by A* from O(b_base^k × X) to O(b_base × X). This is indeed a significant saving.

However, this potential saving of OD is a "best-case" analysis. Next, we show that in the worst case, expanding a full state with f = C* using A*+OD may incur expanding and generating a number of nodes that is exponential in the number of agents.

Consider a single agent a_i located at location loc, and assume that h_SIC is the heuristic of that individual agent (i.e., h_SIC(loc) is the distance from loc to goal_i). Define b_SIC as the number of locations adjacent to loc for which h_SIC = h_SIC(loc) − 1. For an open 4-connected grid, b_SIC = 2, as there are at most two neighbors whose general direction is towards the goal. These neighbors have the same f-value as location loc. For example, consider Figure 3.6, where the agent is located in the middle cell with f = c at time t_0. The agent has only two locations at time t_1 that reach the goal in the lower-right corner while keeping the cost f = c.

With A*+OD, every state (full or intermediate) generates only b_base children. However, b_SIC of them will be expanded afterwards, since they have exactly the same f-value as

⁵To be precise, A* has more overhead. It first considers all the b_potential = b_base^k potential children and selects only the legal ones. If duplicate detection is performed, duplicate legal nodes are also discarded.
⁶An equivalent situation occurs in the last iteration of IDA*, where the threshold is C*. IDA* will generate all these b_legal nodes and backtrack.
⁷These nodes are called surplus nodes in our new formalization, which studies tradeoffs of expanded vs. generated nodes [Felner et al., 2012].


Figure 3.6: Example of changes in the f-value on a 4-connected grid; the f-values of the cells are for time step 1. Staying in place yields f = c + 1, the two cells in the general direction of the goal keep f = c, and the two remaining cells yield f = c + 2.

Nodes visited

Algorithm       General                   Open 4-connected grid
A* (expanded)   X                         (all nodes with f ≤ C*)
A*              X × b_base^k              X × 5^k
A*+OD           X × b_SIC^k × b_base      X × 2^k × 5
ICTS            X × k^∆                   X × k^∆

Table 3.1: Summary of theoretical comparison: A*, A*+OD and ICTS.

their parent.⁸ The number of intermediate nodes that are expanded below a full node with f-value of C* is therefore b_SIC^k. Each of these nodes will generate b_base − b_SIC children with an f-value that is larger than C*. So the total number of nodes that are generated but not expanded below each full state is b_SIC^k × (b_base − b_SIC) = O(b_SIC^k × b_base).

In total, the number of nodes generated by A*+OD is O(X × b_SIC^k × b_base). Therefore, A*+OD will visit more nodes than X by a factor which is still exponential in k, but the base of the exponent is reduced from b_base to b_SIC. On an open 4-connected grid, this means a reduction from 5^k to 2^k. Note that this analysis ignores the intermediate states expanded below nodes with f-value smaller than C*. However, as MAPF is an exponential domain, the number of nodes with f-value equal to C* dominates the number of nodes with f < C*. This is a well-known phenomenon in combinatorial spaces, which is exploited by search algorithms such as IDA* [Zhang and Korf, 1995]. Thus, A*+OD expands O(X × b_SIC^k × b_base) nodes.

Table 3.1 summarizes the number of nodes visited by A*, A*+OD and ICTS, compared to the number of nodes expanded by A* (X). Consider the difference between the number of nodes visited by A*, i.e., X × b_base^k, and the number of nodes visited by ICTS, i.e., X × k^∆. Clearly, ICTS visits fewer nodes than A* if k^∆ is smaller than b_base^k. As explained in Section 3.1.1, the exact value of ∆ is affected by the value of k, as adding more agents will likely increase ∆. However, there are factors other than k that affect the value of ∆, such as the topology of the map and the ratio between the number of agents and the number of vertices of the graph.⁹ Therefore, in some cases k^∆ < b_base^k and ICTS will outperform A*,

⁸Naturally, b_SIC is different for every agent and location, but we assume that b_SIC is a constant for simplicity.
⁹This is discussed below in Section 3.3.3.


while in other cases b_base^k < k^∆ and A* will outperform ICTS. In the experimental results we show both cases. For example, on an open 8x8 grid with 10 agents located randomly, we have b_base^k = 9,765,625 while k^∆ = 4,105. On the other hand, on an open 3x3 grid with 8 agents located randomly, we have b_base^k = 390,625 while k^∆ = 11×10^8. These results for the 8x8 and 3x3 open grids can be seen in Table 3.3 and Table 3.2, respectively.
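The b_base^k figures quoted above can be checked directly (on an open 4-connected grid each agent has 4 moves plus wait, so b_base = 5); the k^∆ values are measured quantities from the experiments, repeated here only in the comments:

```python
# Open 4-connected grid: 4 moves + wait => b_base = 5.
b_base = 5
assert b_base ** 10 == 9_765_625  # 8x8 grid, k = 10; k^delta = 4,105 is far smaller
assert b_base ** 8 == 390_625     # 3x3 grid, k = 8; k^delta = 11x10^8 is far larger
```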

3.3 Experimental Results: ICTS vs. A*

In this section we provide experimental results comparing ICTS to basic A* and to the state-of-the-art A* variant of Standley [Standley, 2010], i.e., A*+OD. Both versions of A* were guided by the SIC heuristic. For each problem instance encountered during our experiments, a time limit of 5 minutes was set. If an algorithm was not able to solve a problem within the time limit, it was halted. In such cases, the different measurements (numbers of nodes generated/expanded, runtime, etc.) were accumulated and treated as lower bounds.

3.3.1 Experiment Types

The ID framework can be used to enhance both A* and ICTS. When a group of conflicting agents is formed, any optimal and complete MAPF solver can be used (line 7 in Algorithm 1). Therefore, ICTS was also implemented on top of the ID framework. This version is called ICTS+ID and is a competitor to A*+OD+ID. When ID is activated, we distinguish between two different agent counts:

• Total number of agents. This number is labeled k and represents the number of agents that exist in the problem instance to be solved.

• Effective number of agents. This number is labeled k' and represents the number of agents in the largest independent subgroup found by ID. Thus, the solvers (A* or ICTS) were actually activated on at most k' agents instead of k agents.

Our experiments below are also classified into two types with regard to the usage of ID. In the first type of experiments (type 1), we aimed to show the overall performance of the evaluated algorithms. In such cases, we activated ID and then executed both ICTS and A* on top of ID. The different algorithms are given a problem instance with a given number of agents k, where the start and goal locations are uniformly randomized. Since ID breaks these problems into subproblems, we also report the average value of k'.

In the second type of experiments (type 2), our aim is to study the behavior of the A* or ICTS algorithm for a given number of agents. However, when the ID framework is applied to k agents (whose start and goal locations are randomized), the resulting effective number of agents, k', is noisy and its variance is very large. Therefore, in some of our experiments we did not activate ID and compared the algorithms (A* and ICTS) on a fixed number of agents k. In such cases k' ≡ k. However, to make the experiments


                                      Nodes generated                Runtime (ms)
k   Cost   ∆      b_base^k   k^∆         A*       A*+OD    ICTS         A*      A*+OD    ICTS
2   3.6    0.10   25         1           23       17       1            1       0        0
3   5.5    0.23   125        1           90       38       2            3       0        0
4   7.8    0.77   625        3           294      102      6            12      1        1
5   10.7   1.93   3,125      22          980      425      31           71      8        7
6   13.8   3.27   15,625     350         2,401    1,383    204          327     44       42
7   18.6   5.83   78,125     84,513      6,050    7,105    2,862        1,547   611      1,654
8   23.6   10.02  390,625    11×10^8     11,055   35,288   79,942       5,311   13,474   248,407

Table 3.2: Results on a 3x3 grid, averaged over 100 instances. ID was not activated (experiment of type 2). The theoretical measures are compared to the number of nodes and to the actual running time (in ms).

meaningful, we generated these groups of agents as follows. In a preprocessing phase, we randomized the start and goal locations for a large number of agents and activated ID on these agents. Whenever ID recognized k agents that conflict with each other, we placed the problem instance that corresponds to these k agents in a bucket labeled k. This process was repeated until the buckets for all values of k were filled with enough instances. We denote this process as the coupling mechanism, which generates sets of conflicting agents of different sizes.
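The bucket-filling process above can be sketched as follows; the `random_instance` and `id_subgroups` callbacks are hypothetical stand-ins for instance generation and for running ID on an instance.

```python
def fill_buckets(random_instance, id_subgroups, ks, per_bucket):
    """Coupling-mechanism sketch (helper names are hypothetical).

    random_instance()      -> a fresh instance with random starts/goals
    id_subgroups(instance) -> [(k, sub_instance), ...] as found by ID
    """
    buckets = {k: [] for k in ks}
    while any(len(b) < per_bucket for b in buckets.values()):
        inst = random_instance()
        for k, sub in id_subgroups(inst):   # run ID, collect subgroups
            if k in buckets and len(buckets[k]) < per_bucket:
                buckets[k].append(sub)      # a k-conflicting-agent instance
    return buckets
```

Each bucket k then holds instances guaranteed to contain exactly k mutually conflicting agents, which is what the type 2 experiments need.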

3.3.2 3 × 3 Grid

Our first experiment is on a 4-connected 3 × 3 grid with no obstacles, where we varied the number of agents from 2 to 8. This was an experiment of type 2, i.e., the ID framework was not activated. This is due to the small search space and the fact that the density of the agents is high. Therefore, activating ID would not gain much, as agents tend to be in conflict with each other.10 Table 3.2 presents the results averaged over 100 instances generated with the coupling mechanism described above. The column "Cost" in Table 3.2 shows the average cost of the optimal solution. Naturally, since the cost function used is additive, adding more agents results in a higher solution cost.

Comparing the number of nodes is problematic, as the different algorithms have different phases with different types and sizes of nodes. In such cases, researchers sometimes only report running times [Standley, 2010; Korf, 2009]. We chose to report both the number of generated nodes (middle portion of the table) and running times (right portion of the table). However, we note that the number of generated nodes is calculated differently for each algorithm. For A* it is the traditional number of generated nodes. For A*+OD it includes both full states and intermediate states. For ICTS this number corresponds to the summation of the number of k-agent-MDD nodes visited by all the calls to the low-level search.

The results confirm our theoretical analysis (Section 3.2) about the correlation between the performance of the algorithms and the relation between bbase^k and k^∆. In a 4-connected grid bbase = 5 in the worst case, to account for four cardinal moves plus wait. For k ≤ 6 we see that bbase^k > k^∆, and ICTS is indeed superior to the A* variants. It generates a smaller

10 In fact, in some cases applying ID actually degraded the performance, due to the overhead of ID.


number of nodes and needs the same or slightly less time to solve the problem. For k ≥ 7, we have that bbase^k < k^∆. Correspondingly, the relative performance shifts. For k = 7

ICTS still generated a slightly smaller number of nodes than the A* versions, but the A* versions were faster in time due to a smaller constant time per node. Both A* and A*+OD clearly outperform ICTS for 8 agents in both nodes and time.

As a side note, we observed an interesting phenomenon regarding the performance of A*+OD with respect to A*. Standley reported that A*+OD is superior to A* [Standley, 2010], and we supported this claim theoretically in Section 3.2. However, in an extreme case where the graph is dense with agents, A*+OD may be weaker than A*. Such an extreme case is given in Table 3.2: for k ≥ 7, A* outperforms A*+OD in terms of generated nodes, and for k = 8, A* even outperforms A*+OD in terms of runtime. This occurs because in such dense cases, the branching factor of both A* and A*+OD plays a smaller role than the depth of the search tree, which is larger for A*+OD because of its intermediate states.

3.3.3 The Growth of k vs. ∆

Recall that according to the theoretical analysis described in Section 3.2, the performance of ICTS with respect to A* is affected by the value of ∆. The major cause for large values of ∆ is the existence of many conflicts. Increasing k can potentially increase the number of conflicts, but this depends on the density of the problem at hand, which is defined as the ratio between k and N. When the density is low, adding another agent will add relatively few conflicts and ∆ will increase slightly. When the density is high, adding another agent can increase ∆ substantially. This is shown in the ∆ column of Table 3.2: moving from 7 to 8 agents increases ∆ much more than moving from 2 to 3 agents. Naturally, the size of the graph has a direct influence on the density. For a given k, small graphs are denser and will have more conflicts (and thus larger values of ∆) than large graphs.

Figure 3.7: ∆ and k growth on a 3 × 3 grid with no obstacles (two curves, ∆ and k, plotted against density).

Figure 3.7 shows the relation between ∆, k and the density for the 3 × 3 grid. The X-axis corresponds to growing values of density (d = #agents / #cells). The figure presents two curves.


The k curve corresponds to the number of agents that produce the given density. For our case of a 3 × 3 grid, the relation between the density and the number of agents is linear, as there are 9 cells and therefore k = 9 × d. The second curve presents the ∆ obtained. As can be seen in Figure 3.7, for small density values ∆ is smaller than k. However, in general, ∆ increases in a manner which is super-linear in the density. Thus, from a certain density value onward, ∆ will be higher than k. In the results shown in Figure 3.7, ∆ is larger than k for densities of 0.82 and above. Of course, the exact density value from which ∆ is larger than k is greatly influenced by the topology of the map. Hence, while this value was 0.82 for our example, it may vary for other maps.

3.3.4 8 × 8 Grid

Next, we compared A*, A*+OD and ICTS on a larger grid of size 8 × 8. Two sets of experiments were performed. The first experiment, reported next, is of type 2. Instances were generated with the coupling mechanism (explained above) to verify that there are k conflicting agents in every instance, and the ID framework was not activated. A type 1 experiment was also performed, where we generated purely random instances of k agents and the ID framework was activated. The results for this experiment are reported in Section 3.6.

Figure 3.8: Success rate on an 8 × 8 grid with no obstacles. Experiment of type 2, where ID was not activated (curves: ICTS 3E, ICTS, A*+OD, A*; X-axis: number of agents; Y-axis: success rate).

Figure 3.8 presents the number of instances (out of 100 random instances) solved by each of the evaluated algorithms within the 5-minute time limit. Clearly, as the number of agents increases, ICTS is able to solve more instances than A*+OD, and both are stronger than A*. The line in Figure 3.8 labeled ICTS 3E denotes results for ICTS with the enhanced triple pruning technique that will be described in Section 3.4. ICTS with this enhancement solves more instances than basic ICTS and many more than A*+OD.

Table 3.3 presents the average number of states visited and the runtime in milliseconds for the instances that were solved by both the A*+OD and ICTS algorithms within the time limit. Since A* could not solve all these instances, we use the accumulated measures (nodes and time) of A* until the timeout as lower bounds. Note that for k = 10, A*+OD


| k  | Cost | ∆   | bbase^k   | k^∆   | Nodes: A* | A*+OD    | ICTS   | Time: A* | A*+OD   | ICTS   |
| 3  | 14.7 | 0.5 | 125       | 2     | 409       | 90       | 16     | 14       | 1       | 1      |
| 4  | 20.3 | 0.9 | 625       | 3     | 2,756     | 303      | 31     | 401      | 5       | 1      |
| 5  | 26.1 | 1.4 | 3,125     | 9     | >19,631   | 933      | 94     | >12,826  | 43      | 7      |
| 6  | 29.9 | 1.9 | 15,625    | 30    | >78,432   | 2,287    | 143    | >84,689  | 193     | 19     |
| 7  | 36.2 | 2.2 | 78,125    | 67    | >176,182  | 4,762    | 372    | >239,411 | 380     | 81     |
| 8  | 41.0 | 2.5 | 390,625   | 187   | NA        | 12,935   | 645    | NA       | 2,792   | 282    |
| 9  | 46.7 | 3.4 | 1,953,125 | 1,642 | NA        | 46,565   | 3,826  | NA       | 18,516  | 3,048  |
| 10 | 52.3 | 3.6 | 9,765,625 | 4,105 | NA        | >106,181 | 24,320 | NA       | >78,999 | 24,784 |

Table 3.3: k conflicting agents on an 8 × 8 grid (experiment of type 2, where ID is not activated). Running time in ms.

could solve fewer than 80% of the instances (as can be seen in Figure 3.8). Thus, for this case, we also report instances not solved by A*+OD, and we report accumulated measures (nodes and time) as lower bounds for A*+OD. It is important to point out that there were no instances solved by A*+OD that were not solved by ICTS. It is clear that in this setting ICTS significantly outperforms both A* variants (A* and A*+OD). For example, for 5 agents ICTS is more than 1,422 times faster than A* and 18 times faster than A*+OD. The superior performance of ICTS over the A* variants corresponds to the theoretical analysis presented in Section 3.2, where ICTS is expected to outperform the A* variants when k^∆ < bbase^k. Indeed, in this setting (8 × 8 grid, no obstacles), bbase^k grows very fast while k^∆ grows relatively slowly. These values can be seen in the relevant columns of Table 3.3. Note that, theoretically, if we continue to add agents to an 8 × 8 grid we may reach a point where k^∆ > bbase^k. However, solving instances with such a large value of k was not feasible with our computing resources.

3.3.5 Limitations of ICTS

In the vast majority of our experiments and settings, the ICTS approach was superior to the A* approach. However, there are cases where A* is significantly faster than ICTS. We now concentrate on such cases.

Figure 3.9: ICTS pathology. Two agents, a and b, must swap positions along a long corridor.

Recall that A* is exponential in k and ICTS is exponential in ∆. Therefore, when k is very small and ∆ is very large, ICTS will be extremely inefficient compared to A*. Figure 3.9 presents such a pathological example. Agents a and b are on the left side of the corridor and only need to swap their positions (linear conflict). Thus, the SIC heuristic is 2. However, both agents must travel all the way to the end of the corridor to swap their relative positions.

The cost of the optimal path is 74 (37 time steps for each agent). bbase ≤ 3 along the corridor (left, right, wait) and thus bbase^k ≤ 9. A* expanded X = 852 nodes, generated 2,367 nodes (blegal ≈ 4 due to illegal and duplicate nodes) and solved this problem relatively quickly, in 51 ms. In contrast, ∆ = 72 and, as a result, 2,665 ICT nodes were visited and ICTS solved the problem in 36,688 ms.
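The arithmetic behind this example can be checked directly; the ICT-size bound below is our own back-of-the-envelope upper bound, not a figure from the text.

```python
# Values quoted in the text for the corridor pathology
sic = 2            # SIC heuristic: each agent is 1 step from its goal
optimal = 74       # 37 time steps for each of the two agents
delta = optimal - sic
assert delta == 72

# For two agents, ICT level l holds l + 1 nodes (all (C1, C2) with
# C1 + C2 = SIC + l), so searching down to level delta visits at most:
ict_upper_bound = (delta + 1) * (delta + 2) // 2
print(ict_upper_bound)   # 2701 (the text reports 2,665 nodes visited)
```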


Similarly, as shown in Table 3.2, in the 3 × 3 grid when k was large and thus the density was very high, ∆ was again very large and ICTS was inferior to A*. In both cases, though for different reasons, ∆ was very large. In the corridor case, the density was low and there was only a single conflict, but solving this conflict caused many extra moves over the SIC heuristic; this case is thus pathological from the ICTS perspective due to its topological structure. In the 3 × 3 grid, the density was high and caused a large number of conflicts; resolving these conflicts results in a large value of ∆.

3.3.6 ICTS on other MAPF variants

In this thesis we focus on the commonly used variant of MAPF (described in Chapter 2). Other variants might entail a different time analysis for ICTS. To demonstrate this, two extreme examples are provided next.

• Time-elapsed variant - In this variant the task is to minimize the number of time steps elapsed until all agents reach their final positions (this is also known as the makespan). In this case, the individual cost of a single agent is meaningless; all agents effectively use the same number of time steps. Thus, the size of the ICT will be linear in ∆ instead of exponential.

• Variable costs variant - In this variant the different actions have different costs for the different agents. Let ε be the minimum possible step cost. The difference between two successive levels in the ICT must be at least ε (in order to maintain optimality). The size of the ICT will now be exponential in ∆/ε.

Thus, the relative behavior of the algorithms on these variants of the problem might differ from that reported in this chapter.

3.4 ICT Pruning Techniques

All ICT nodes visited by the high-level search are non-goal nodes, except for the last node (the goal node). For all these nodes the goal test will return false. The test is performed by the low-level search, which scans the entire k-agent-MDD search space.

We now turn to discuss a number of useful pruning methods that can be optionally activated before the low-level phase on an ICT node n. This is shown in lines 5-10 of Algorithm 2. If the pruning was successful, n can be immediately declared as non-goal and there is no need to activate the low-level search on n. In this case, the high-level search jumps to the next ICT node. We begin with the simple pairwise pruning and then describe enhancements, as well as generalize these techniques to pruning techniques that consider groups of more than two agents.

3.4.1 Simple Pairwise Pruning

As shown above, the low-level search for k agents is exponential in k. However, in many cases, we can avoid the low-level search by first considering subproblems of pairs of agents.


Consider a k-agent MAPF problem and a corresponding ICT node n = (C1, C2, ..., Ck). Now, consider the abstract problem of only moving a pair of agents ai and aj from their start locations to their goal locations at costs Ci and Cj, while ignoring the existence of the other agents. Solving this problem amounts to searching the 2-agent-MDD search space that corresponds to MDDij. If no solution exists to this 2-agent problem (i.e., searching the 2-agent-MDD search space will not reach a goal node), then there is an immediate benefit for the original k-agent problem, as this ICT node (n) can be declared as non-goal right away. There is no need to further perform the low-level search through the k-agent-MDD search space. This is done in the Simple Pairwise Pruning (SPP) variant, where we iterate over all (k choose 2) pairs of agents to find such cases.

Algorithm 3: Pairwise pruning in ICT node n

1 foreach pair of agents ai and aj do
2     Search MDDij with DFS
3     if solution found then
4         continue            // next pair
5     if solution not found then
6         return SUCCESS      // next ICT node
7 return FAILURE              // activate low-level search on n

SPP, presented in Algorithm 3, is optional and can be performed just before the low-level search (lines 5-10 in Algorithm 2). SPP iterates over all pairs (MDDi, MDDj) and searches the 2-agent-MDD search space that corresponds to MDDij. If a pair of MDDs with no pairwise solution is found (line 5), SUCCESS is returned. The given ICT node is immediately declared as a non-goal and the high-level search moves to the next ICT node. Otherwise, if pairwise solutions were found for all pairs of MDDs, FAILURE is returned (line 7) and the low-level search must be performed over the k-agent-MDD search space of the given ICT node n.

SPP performs a DFS on MDDij (line 2). The reason is again that a solution can often be found rather quickly, especially if many solutions exist. In this case, this particular pair of agents cannot prune the current ICT node n. Algorithm 3 then moves to the next pair of agents and tries to perform pruning for the new pair on node n. In the worst case, all pairwise searches find a 2-agent solution and FAILURE is returned. This incurs (k choose 2) = O(k²) different searches of a 2-agent-MDD search space, one for every pair of agents.
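A minimal sketch of SPP in Python, under an assumed MDD encoding in which `mdd[t]` maps a location at level t to its set of children at level t + 1, with both MDDs padded to a common depth. This representation is illustrative, not the thesis implementation.

```python
def has_joint_path(mdd_i, mdd_j, depth):
    """DFS of the 2-agent-MDD search space (the core of SPP).

    Level 0 holds the single start location and level `depth` the goal.
    Vertex conflicts and swaps are never expanded.
    """
    start = (next(iter(mdd_i[0])), next(iter(mdd_j[0])))
    stack, seen = [(0,) + start], set()
    while stack:
        t, u, v = stack.pop()
        if t == depth:
            return True                      # both agents reached their goals
        for nu in mdd_i[t][u]:
            for nv in mdd_j[t][v]:
                if nu == nv or (nu == v and nv == u):
                    continue                 # same cell, or a swap
                s = (t + 1, nu, nv)
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return False

def simple_pairwise_pruning(mdds, depth):
    """SPP: True (= SUCCESS) iff some pair has no joint solution."""
    for i in range(len(mdds)):
        for j in range(i + 1, len(mdds)):
            if not has_joint_path(mdds[i], mdds[j], depth):
                return True                  # prune this ICT node
    return False                             # must run the low-level search
```

For a 3-cell corridor in which two agents must swap (`{0: {0: {1}}, 1: {1: {2}}}` against `{0: {2: {1}}, 1: {1: {0}}}`), `has_joint_path` returns False, so the ICT node is pruned without any k-agent search.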

3.4.2 Enhanced Pairwise Pruning

It is still possible to gain knowledge from searching all the pairs of 2-agent MDDs, even if all such searches resulted in a 2-agent solution (and thus the ICT node was not pruned). This is done by changing the search strategy of the pairwise pruning from depth-first search to breadth-first search and adding a number of steps that modify the single-agent MDDs, MDDi and MDDj, as follows. Assume that MDDij was built by unifying MDDi and MDDj. A node at level t of MDDij represents a valid location of agent i and agent j at time step t. Conflicting locations, e.g., where agent i and agent j are at the same


location, are of course discarded from MDDij. We can now unfold MDDij back into two single-agent MDDs, MDD*i and MDD*j. MDD*i and MDD*j can be sparser than the original MDDs, since MDD*i only includes paths that do not conflict with MDDj (and vice versa). In other words, MDD*i only includes nodes that were actually unified with at least one node of MDDj. Nodes from MDDi that were not unified at all are called invalid nodes and are deleted.

Figure 3.5(ii) shows MDD*³₁ after it was unfolded from MDD³₁₂. Dashed nodes and edges correspond to parts of the original MDD that were pruned. For example, node C in the right path of MDD³₁ is invalid, as it was not unified with any node of MDD³₂. Thus, this node and its incident edges, as well as its only descendant (F), can be removed and are not included in MDD*³₁.

The unfolding process described above can require searching the entire 2-agent-MDD search space of MDDij. The outcome of this process is MDD*i and MDD*j, which are potentially sparser than MDDi and MDDj. Having sparser MDDs is useful for the following two tasks:

1. Further pairwise pruning. After MDD*i is obtained, it is used for the next pairwise check of agent ai. Sparser MDDs will cause more ICT (high-level) nodes to be pruned, as they have a smaller number of possible options for unifying nodes and lower chances of avoiding conflicts. Furthermore, when MDD*i is matched with MDDk, it might prune more portions of MDDk than if the original MDDi was used. This has a cascading effect, such that pruning of MDDs occurs through a chain of MDDs.

2. The general k-agent low-level search. This has a great benefit, as the sparse MDDs will span a smaller k-agent-MDD search space for the low-level search than the original MDDs.

We call this improved pruning process Enhanced Pairwise Pruning (EPP).
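The EPP idea can be sketched as follows, under an assumed dictionary encoding in which `mdd[t]` maps a location at level t to its children at level t + 1: BFS the 2-agent-MDD search space, record every (level, location) of each agent that takes part in some unified joint node, and drop the rest. This forward-reachability version is a simplification of the unfolding described in the text, which additionally removes nodes that do not lie on any complete joint path.

```python
from collections import deque

def enhanced_pairwise_prune(mdd_i, mdd_j, depth):
    """EPP sketch: returns (sparser MDD*_i, sparser MDD*_j, solvable)."""
    u0, v0 = next(iter(mdd_i[0])), next(iter(mdd_j[0]))
    valid_i, valid_j = {(0, u0)}, {(0, v0)}
    frontier, seen, solvable = deque([(0, u0, v0)]), set(), False
    while frontier:
        t, u, v = frontier.popleft()
        if t == depth:
            solvable = True
            continue
        for nu in mdd_i[t][u]:
            for nv in mdd_j[t][v]:
                if nu == nv or (nu == v and nv == u):
                    continue                 # vertex conflict or swap
                s = (t + 1, nu, nv)
                if s not in seen:
                    seen.add(s)
                    valid_i.add((t + 1, nu)) # nu was unified with nv
                    valid_j.add((t + 1, nv))
                    frontier.append(s)

    def sparsify(mdd, valid):
        # keep only nodes (and edges) that were unified at least once
        return {t: {u: {c for c in cs if (t + 1, c) in valid}
                    for u, cs in lvl.items() if (t, u) in valid}
                for t, lvl in mdd.items()}

    return sparsify(mdd_i, valid_i), sparsify(mdd_j, valid_j), solvable
```

The sparser MDDs returned here are exactly what feeds the cascading pruning of task 1 and the smaller low-level search space of task 2.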

3.4.3 Repeated Enhanced Pairwise Pruning

If all O(k²) pairs were matched and a solution was found for every pair, then the ICT node cannot yet be declared as a non-goal and the low-level search should be activated. Consider a pair of agents ai and aj such that a solution was found in MDDij when the agents were first matched. However, after all the mutual pruning of single-agent MDDs, the resulting MDD*i and MDD*j could potentially be much sparser. Repeating this process might reveal that the new, sparser MDDs can no longer be unified and that the previous solution no longer exists. Repeated Enhanced Pairwise Pruning (REPP) repeatedly performs iterations of EPP. In each iteration, EPP matches all (k choose 2) pairs and repeatedly makes the single-agent MDDs sparser. This process continues until either the ICT node is pruned (because there exists a pair ai and aj such that there is no solution to MDD*ij) or until no single-agent MDD can be made sparser by further pairwise pruning. Note that REPP can be viewed as a form of arc-consistency, a well-known technique in CSP solvers [Mohr and Henderson, 1986].
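The REPP fixpoint loop can be sketched as follows; `epp` is a hypothetical callback that returns the two sparsened MDDs for a pair together with a flag saying whether a joint solution still exists.

```python
def repeated_epp(mdds, depth, epp):
    """REPP sketch: rerun enhanced pairwise pruning to a fixpoint.

    epp(mdd_i, mdd_j, depth) -> (sparser_i, sparser_j, solvable).
    Returns True iff the ICT node can be pruned (declared non-goal).
    """
    changed = True
    while changed:                           # arc-consistency-style loop
        changed = False
        for i in range(len(mdds)):
            for j in range(i + 1, len(mdds)):
                si, sj, solvable = epp(mdds[i], mdds[j], depth)
                if not solvable:
                    return True              # prune: declare non-goal
                if (si, sj) != (mdds[i], mdds[j]):
                    mdds[i], mdds[j] = si, sj
                    changed = True           # sparser MDDs: iterate again
    return False                             # fixpoint; run low-level search
```

Termination follows because each iteration that does not prune strictly shrinks at least one finite MDD.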


3.4.4 Tradeoffs

Natural tradeoffs exist between the different pairwise pruning techniques. Per pair of agents, SPP is the fastest, because as soon as the first two-agent solution is found we stop and move to the next pair of agents. EPP is slower than SPP per pair of agents, because the pairwise search is performed until the entire MDDij has been searched and the single-agent MDDs have been made as sparse as possible. However, EPP might speed up future pruning and the low-level search, as described above.

Let d be the depth of a single MDD. Recall that the size of every layer in a given MDD is bounded by |V|. While searching a 2-agent MDD, all combinations of 2-agent locations might be examined in every layer. In the worst case |V|² × d, or O(|V|²), states will be visited. This will be done for all (k choose 2) = O(k²) pairs. The total work done by SPP (worst case) and EPP is O(|V|² × k²). Recall that searching the entire search space is O(|V|^k). Since |V| > k, in general the pruning is much faster than the actual search in the k-agent-MDD search space. REPP is even slower than EPP per ICT node, but can cause further pruning of ICT nodes and of single-agent MDDs.
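Plugging in illustrative numbers makes the gap concrete (the example values below are our own, not from the text):

```python
# Worst-case work of pairwise pruning vs. the full k-agent search
# (d is dropped, as in the text's O(|V|^2) per-pair bound)
V, k = 64, 8                           # e.g. an 8 x 8 grid with 8 agents
pairwise = (V ** 2) * k * (k - 1) // 2 # O(|V|^2) per pair, C(k,2) pairs
full = V ** k                          # the k-agent-MDD search space
print(pairwise)                        # 114688
print(full)                            # 281474976710656
```

Even in the worst case, all pairwise checks together cost roughly nine orders of magnitude less than one full 8-agent search here.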

3.4.5 m-Agent Pruning

Figure 3.10: Example of a bottleneck for three agents (agents labeled 1-3 in both examples).

Not all conflicts can be captured by pairwise pruning. To illustrate this, consider Figure 3.10 (Left), where there is a bottleneck of two locations through which three agents need to pass at the same time. Each pair of agents can pass without conflicts (at a cost of three moves per agent), but the three agents cannot pass it at the same time with a total of nine moves. Figure 3.10 (Right) shows a slightly more complex example of a three-way bottleneck in a 4-connected grid.

All variants of pairwise pruning can easily be generalized to groups of m > 2 agents. Given a group of m agents (where 2 < m < k), one can search through the m-agent-MDD search space. Again, if no solution is found for a given set of m agents, the corresponding ICT node can be pruned and declared as a non-goal. The low-level search on the k-agent-MDD search space will not be activated, and the high-level search moves to the next ICT node.
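A sketch of the generalization, with a hypothetical `has_joint_path_m` callback standing in for a search of an m-agent-MDD search space:

```python
from itertools import combinations

def m_agent_pruning(mdds, depth, m, has_joint_path_m):
    """Generalized pruning sketch: test every subgroup of m agents.

    has_joint_path_m(sub_mdds, depth) searches the m-agent-MDD search
    space (hypothetical signature).  Returns True iff some m-tuple of
    agents has no joint solution, so the ICT node is declared non-goal.
    """
    for group in combinations(range(len(mdds)), m):
        if not has_joint_path_m([mdds[i] for i in group], depth):
            return True          # prune: skip the k-agent low-level search
    return False
```

The cost grows as C(k, m) searches of an O(|V|^m) space, which is why small m (pairs and triples) is the practical sweet spot.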


Next, we demonstrate experimentally that the pruning techniques described above can yield substantial speedup for ICTS.

3.5 Experiments: ICTS Pruning Techniques

In this section we experiment with the pruning techniques described above and study their behavior. We compared 7 different variants of pruning techniques for ICTS. As explained above, each of these techniques was activated before the low-level search. If a pruning was successful, the low-level search was not activated. The 7 variants are labeled as follows:

1. No pruning (NP). This is the basic ICTS.

2. Simple pairwise pruning (2S).

3. Enhanced pairwise pruning (2E).

4. Repeated enhanced pairwise pruning (2RE).

5. Simple triple pruning (3S).

6. Enhanced triple pruning (3E).

7. Repeated enhanced triple pruning (3RE).

Again, we set a time limit of 5 minutes. If a variant could not solve an instance within the time limit, it was halted and failure was returned. We experimented on 3 × 3, 4 × 4 and 8 × 8 4-connected grids with no obstacles. We also experimented on a large 257 × 257 map (den520d) from the game Dragon Age: Origins (DAO), taken from Sturtevant's repository [Sturtevant, 2012]. This map is shown below in Figure 3.13 (top). For the grids, we used the coupling mechanism (described in Section 3.3.1) to generate hard instances, and the ID framework was not activated (experiment of type 2). For the game map, ID was activated (experiment of type 1).

For each of these grids and for every given number of agents, we randomized 100 problem instances. The numbers in the following tables are averages over the instances that were solved by all variants out of the 100 instances. The number of such instances is given in the Ins columns of the tables discussed below.

3.5.1 Pruning Effectiveness

Table 3.4 compares the effectiveness of the different pruning variants for a given number of agents (indicated by the k column). The effectiveness of a pruning technique was measured by the number of non-goal ICT nodes that were not pruned, i.e., the nodes for which the low-level search was activated. Obviously, lower numbers of non-goal ICT nodes indicate better pruning efficiency, where zero means perfect pruning. The total number of all non-goal ICT nodes is given in the NP column. For example, consider the line that corresponds to k = 7 (the last number where all 100 instances could be solved by all variants) for the


| k  | k′   | Ins | ∆    | NP     | 2S     | 2E    | 2RE   | 3S    | 3E   | 3RE  |

3 × 3 grid:
| 4  | -    | 100 | 0.8  | 5      | 3      | 1     | 1     | 0     | 0    | 0    |
| 5  | -    | 100 | 1.9  | 30     | 15     | 4     | 4     | 3     | 1    | 1    |
| 6  | -    | 100 | 3.3  | 203    | 112    | 26    | 25    | 24    | 5    | 5    |
| 7  | -    | 100 | 5.8  | 2,861  | 1,730  | 641   | 627   | 420   | 90   | 84   |
| 8  | -    | 80  | 9.0  | 36,588 | 23,317 | 7,609 | 7,454 | 5,444 | 775  | 686  |

4 × 4 grid:
| 5  | -    | 100 | 0.8  | 7      | 5      | 2     | 2     | 1     | 0    | 0    |
| 6  | -    | 100 | 1.5  | 28     | 12     | 4     | 4     | 4     | 0    | 0    |
| 7  | -    | 100 | 2.0  | 87     | 65     | 18    | 18    | 15    | 1    | 1    |
| 8  | -    | 99  | 3.3  | 528    | 300    | 53    | 51    | 46    | 4    | 4    |
| 9  | -    | 98  | 4.6  | 3,441  | 1,528  | 349   | 347   | 189   | 12   | 12   |
| 10 | -    | 77  | 5.6  | 8,658  | 3,618  | 584   | 582   | 382   | 9    | 7    |

8 × 8 grid:
| 5  | -    | 100 | 1.5  | 12     | 9      | 1     | 1     | 4     | 0    | 0    |
| 6  | -    | 99  | 1.9  | 43     | 21     | 2     | 2     | 7     | 0    | 0    |
| 7  | -    | 98  | 2.2  | 67     | 25     | 7     | 7     | 5     | 1    | 1    |
| 8  | -    | 96  | 2.4  | 135    | 53     | 9     | 9     | 17    | 1    | 1    |
| 9  | -    | 88  | 2.5  | 258    | 79     | 18    | 17    | 43    | 1    | 1    |
| 10 | -    | 65  | 2.8  | 402    | 93     | 24    | 23    | 56    | 2    | 2    |

den520d:
| 15 | 1.17 | 100 | 0.27 | 0.48   | 0.00   | 0.00  | 0.00  | 0.00  | 0.00 | 0.00 |
| 30 | 1.52 | 85  | 0.80 | 1.60   | 0.00   | 0.00  | 0.00  | 0.00  | 0.00 | 0.00 |
| 45 | 1.82 | 77  | 1.12 | 2.83   | 0.06   | 0.00  | 0.00  | 0.00  | 0.00 | 0.00 |
| 60 | 1.99 | 68  | 1.19 | 3.81   | 0.09   | 0.00  | 0.00  | 0.00  | 0.00 | 0.00 |
| 75 | 2.13 | 53  | 1.40 | 6.96   | 0.08   | 0.00  | 0.00  | 0.00  | 0.00 | 0.00 |
| 90 | 2.35 | 32  | 1.35 | 10.19  | 0.29   | 0.00  | 0.00  | 0.00  | 0.00 | 0.00 |

Table 3.4: Number of (non-goal) ICT nodes where the low-level search was activated, for the 3 × 3, 4 × 4 and 8 × 8 grids and the den520d map. ID was activated for the den520d map only.


3 × 3 grid in the top of the table. There were 2,861 non-goal ICT nodes. For all of these nodes, basic ICTS (with no pruning) activated the low-level search. When 2S was activated, almost half of them were pruned and the low-level search was only activated for 1,730 non-goal ICT nodes. This number decreases with the more sophisticated techniques, and for 2RE most of the nodes were pruned and only 641 nodes activated the low-level search. Triple pruning shows the same tendency, and it is not surprising that triple pruning was always able to prune more ICT nodes than the corresponding pairwise pruning.

The middle part of the table corresponds to the 4 × 4 and 8 × 8 grids. Similar tendencies can be observed. Note that solving problems with the same number of agents but on larger grids results in fewer conflicts, since the grids are less dense. This leads to small values of ∆ and therefore a small number of ICT nodes. This is counterintuitive, because one might expect that problems on larger graphs will be more difficult. While this is generally true for the case of a single agent, for k > 1 agents the number of conflicts between agents plays a significant role.

The bottom part of the table shows results for the DAO map (den520d). As mentioned above, for this domain we applied ID, and therefore present in the table both k and k′ (the number of agents in the largest independent subgroup). Notice that there is no need to use the advanced pruning methods for this map. The graph is very sparse with agents, and therefore k′ and ∆ are very small. This makes the ICT very small and easy to prune, even by the simple pruning techniques.

It is important to note the correlation between k (or k′ when applicable), ∆ and the NP column (the number of ICT nodes). When more agents exist, ∆ increases too, but the number of ICT nodes increases exponentially with ∆. This phenomenon was studied in Section 3.2.

3.5.2 Runtime

Table 3.5 shows the runtime results in ms for the same set of experiments.11 As explained above, there is a time tradeoff per ICT node between the different variants; the enhanced variants incur more overhead. Therefore, while the enhanced variants always managed to prune more ICT nodes (as shown in Table 3.4), this is not necessarily reflected in the running time. However, one can clearly observe the following trend: as the problem becomes denser with more agents, it pays off to use the enhanced pruning variants. Note that the best variant for each row outperformed the basic NP variant by up to a factor of 50 in many cases. It is interesting to note from both tables that, for the cases we tested, 2E, 2RE, 3E and 3RE performed similarly. They all managed to prune almost all non-goal ICT nodes, and their time performance was very similar.

11 Numbers differ from Table 3.3 because different instances were considered; here, problems that were not solved by A*+OD were also included.


| k  | k′   | Ins | ∆    | NP      | 2S      | 2E     | 2RE    | 3S      | 3E     | 3RE    |

3 × 3 grid:
| 4  | -    | 100 | 0.8  | 1       | 3       | 1      | 1      | 4       | 0      | 1      |
| 5  | -    | 100 | 1.9  | 9       | 30      | 5      | 7      | 37      | 6      | 7      |
| 6  | -    | 100 | 3.3  | 92      | 316     | 61     | 77     | 447     | 56     | 68     |
| 7  | -    | 100 | 5.8  | 3,800   | 9,408   | 2,174  | 2,936  | 14,343  | 1,821  | 2,450  |
| 8  | -    | 80  | 9.0  | 119,374 | 220,871 | 54,045 | 68,401 | 306,261 | 43,332 | 55,991 |

4 × 4 grid:
| 5  | -    | 100 | 0.8  | 5       | 14      | 2      | 3      | 13      | 2      | 3      |
| 6  | -    | 100 | 1.5  | 25      | 48      | 10     | 14     | 69      | 9      | 11     |
| 7  | -    | 100 | 2.0  | 217     | 406     | 54     | 71     | 463     | 41     | 58     |
| 8  | -    | 99  | 3.3  | 2,387   | 3,604   | 456    | 586    | 3,657   | 364    | 515    |
| 9  | -    | 98  | 4.6  | 23,254  | 28,097  | 4,731  | 5,872  | 26,933  | 3,148  | 4,359  |
| 10 | -    | 77  | 5.6  | 76,052  | 82,248  | 11,130 | 14,017 | 74,115  | 9,348  | 12,817 |

8 × 8 grid:
| 5  | -    | 100 | 1.5  | 781     | 797     | 50     | 55     | 643     | 13     | 19     |
| 6  | -    | 99  | 1.9  | 2,454   | 2,326   | 54     | 66     | 1,531   | 44     | 62     |
| 7  | -    | 98  | 2.2  | 5,183   | 3,745   | 507    | 536    | 1,615   | 92     | 124    |
| 8  | -    | 96  | 2.4  | 9,487   | 5,320   | 517    | 566    | 2,918   | 189    | 257    |
| 9  | -    | 88  | 2.5  | 47,778  | 31,733  | 2,042  | 2,183  | 18,428  | 451    | 628    |
| 10 | -    | 65  | 2.8  | 61,666  | 38,835  | 4,830  | 5,160  | 28,677  | 1,218  | 1,755  |

den520d:
| 15 | 1.17 | 100 | 0.27 | 3,458   | 3,865   | 701    | 725    | 3,857   | 705    | 763    |
| 30 | 1.52 | 85  | 0.80 | 19,038  | 18,482  | 1,671  | 1,787  | 18,723  | 1,688  | 1,858  |
| 45 | 1.82 | 77  | 1.12 | 23,672  | 24,478  | 6,317  | 6,513  | 25,003  | 6,413  | 6,627  |
| 60 | 1.99 | 68  | 1.19 | 38,810  | 41,140  | 20,749 | 21,079 | 41,901  | 20,826 | 21,155 |
| 75 | 2.13 | 53  | 1.40 | 69,665  | 67,107  | 25,316 | 25,771 | 67,897  | 25,508 | 25,910 |
| 90 | 2.35 | 32  | 1.35 | 55,734  | 54,474  | 26,696 | 27,296 | 54,598  | 26,871 | 27,435 |

Table 3.5: Runtime (in ms) for the 3 × 3, 4 × 4 and 8 × 8 grids and the den520d map.


3.6 Experiments: ICTS vs. A* on Different Domains

In Section 3.3, experimental results were provided for comparing A* with basic ICTS. In Section 3.5, experimental results were provided to analyze and compare the different pruning techniques. In this section, we conclude the experimental results by comparing the best versions of A*, ICTS and ICTS with pruning over a range of problem domains. The strongest variant for the A*-based approach is A*+OD, as presented by Standley [Standley, 2010]. While in general all pruning techniques (except for SPP) showed similar trends, we observed empirically that the ICTS+3E pruning technique was the best. Therefore, in our last set of experiments we only compared A*, A*+OD, basic ICTS and ICTS+3E. Note that the aim of the results presented in this section was to solve problems with as many agents as possible. Therefore, ID was always activated (experiments of type 1).

3.6.1 8 × 8 Grids

The first set of experiments presented in this section is on the same 8 × 8 open grid described in Section 3.3.

[Figure 3.11 here: success rate (%) vs. number of agents (4-22) on an 8×8 grid; curves: A*+ID, A*+OD+ID, ICTS+ID, ICTS+3E+ID.]

Figure 3.11: Success rate on an 8×8 grid. ID activated.

                 Nodes generated              Run-time
 k    k'  Ins    A*+OD    ICTS  ICTS+3E    A*+OD   ICTS  ICTS+3E
 4  1.09  100       54      34       32        5      6        4
 6  1.22  100      226     408       52    1,046     59       10
 8  1.71  100   >3,793   3,078      111   >5,102    593       31
10  2.44  100   >5,264   1,262      165  >19,227    470       27
12  3.41   99  >12,895  10,542      251  >22,856  4,310       73
14  4.15   93  >16,982   5,358      475  >44,473  2,134      265
16  5.02   66  >41,253  15,275    1,215  >77,216  9,453    1,167

Table 3.6: Run-time (in ms) on 8× 8 grid. Type 1 experiment; ID activated.

Figure 3.11 presents the number of instances that were solved under 5 minutes. Again

it is easy to see that the ICTS variants could solve more instances than the A* variants. Table 3.6 presents the number of nodes and runtime for the same experiment, averaged over the instances that could be solved by all three algorithms: A*+OD, ICTS and ICTS+3E (indicated in the Ins column). It is clear that ICTS+3E is faster than ICTS and outperforms A*+OD by almost three orders of magnitude.

Note that the previous results shown for the 8×8 grid (displayed in Figure 3.8 and Tables 3.3 and 3.5) were experiments of type 2 (no ID). Therefore, fewer instances were solved by all algorithms in the type 2 experiments, since when ID is applied, both A* and ICTS are activated on independent subgroups that often have significantly fewer agents than k.¹²

3.6.2 Grid with Scattered Obstacles

[Figure 3.12 here: success rate (%) vs. obstacle percentage (0-25%); curves: ICTS+3E+ID, ICTS+ID, A*+OD+ID.]

Figure 3.12: Success rate for 40 agents on a 32×32 grid with scattered obstacles. Experiment of type 1 - ID activated.

In this experiment we generated grids of size 32×32 which differ in the percentage of random cells that were declared as obstacles. This number was varied from 0% to 25% in increments of 5%. We then randomized start and goal locations for 40 agents. The experiment is of type 1 and ID was always activated.

Figure 3.12 presents the number of instances (out of 100 random instances) solved by each algorithm within the 5-minute limit. As can clearly be seen, for every obstacle percentage except 25% we see a similar trend: ICTS+3E outperforms basic ICTS, which in turn outperforms A*+OD. Note that for the fixed number of agents (40), when there are more obstacles the problem becomes harder, since more conflicts between agents occur. As a result, all algorithms managed to solve a very small number of instances for 25% obstacles.

[Figure 3.13 here: three DAO maps (left) and success rate (%) vs. number of agents (right); curves: ICTS+3E+ID, ICTS+ID, A*+OD+ID.]

Figure 3.13: DAO maps (left). Their performance (right). The x-axis = number of agents. The y-axis = success rate.

3.6.3 Dragon Age Maps

We also experimented with maps from the game Dragon Age: Origins, which are part of Sturtevant's repository of benchmarks [Sturtevant, 2012]. Figure 3.13 shows three such maps (den520d (top), ost003d (middle) and brc202d (bottom)) and the success rate of solving 100 random instances on these maps within the 5-minute time limit. These specific maps were chosen as they have the characteristics of different domains. Map den520d (top) has many large open spaces and no bottlenecks, map ost003d (middle) has a few open spaces and a few bottlenecks, and map brc202d (bottom) has almost no open spaces and many bottlenecks.

The different curves shown on the right side of Figure 3.13 have the same meaning as the curves in Figure 3.12. The experiments were of type 1, with ID always activated. Clearly, in all these maps ICTS+3E significantly outperformed basic ICTS and A*+OD. In den520d (top) and ost003d (middle), even basic ICTS outperformed A*+OD. By contrast, in brc202d (bottom), A*+OD outperformed basic ICTS. This is supported by our theoretical analysis, as follows. The brc202d map is similar to a maze. Mazes often have long and narrow corridors. Therefore, if a number of agents conflict, resolving their conflict might require a large number of extra steps. Each of these extra steps increases ∆ and, consequently, the performance of ICTS degrades. Thus, in this map ICTS was outperformed by

¹² For completeness, Figure 3.8 also shows the success rate for ICTS+3E and clearly demonstrates that ICTS+3E is far superior to A*+OD and basic ICTS in type 2 experiments.

A*+OD. Only the ICTS+3E enhancement managed to outperform A*+OD on this map. Note that an example of such a maze-like case was given in Figure 3.9, when discussing the limitations of ICTS.

k Ins k’ ∆ A*+OD ICTS ICTS+3Eden520d

10 97 1.1 0.1 992 561 48820 93 1.2 0.4 5,454 2,921 1,26630 71 1.4 0.6 16,316 6,560 1,39040 70 1.5 0.7 32,376 9,471 2,61750 45 1.8 1.1 49,511 8,406 10,219

ost003d10 98 1.3 0.4 10,843 1,555 35920 79 1.6 1.1 31,777 4,000 94530 41 1.9 1.4 90,765 10,971 2,449

brc202d5 96 1.2 0.3 6,312 2,238 552

10 86 1.4 0.9 23,218 6,286 1,57615 63 1.7 1.4 36,590 18,354 2,89620 37 1.9 1.6 51,927 46,371 5,010

Table 3.7: A*+OD Vs. basic ICTS and ICTS+3E on the DAO maps (runtime in ms).

Table 3.7 presents the average running times of A*+OD, ICTS and ICTS+3E over the instances (out of the same 100 instances used above) that could be solved by all three algorithms. When the number of agents increases, only relatively easy problems (out of these 100 instances) were solved, hence the numbers do not necessarily increase. In all these maps, ICTS+3E significantly outperforms ICTS and A*+OD by up to two orders of magnitude. Again, due to the maze-like topology of brc202d, A*+OD outperforms basic ICTS, and a relatively smaller advantage of ICTS+3E over A*+OD is observed for this specific map.

Note that a subset of the experiments shown in this chapter has already been presented by the author in previous publications [Sharon et al., 2011a]. A careful reader might notice faster running times in this chapter for both A* and A*+OD. This is due to an improved implementation of A* performed for this thesis. This improved implementation includes the following enhancement to both versions of A*: if a state is generated that has exactly the same f-value as its predecessor, it is immediately expanded and does not enter the open list. This enhancement, called immediate expand, was previously proposed for A* [Sun et al., 2009; Stern et al., 2010]. Immediate expand can result in significant speedups, especially in high-branching-factor problems such as MAPF. This explains the faster A* runtimes shown above.

3.7 Conclusions

In this chapter we presented the ICTS algorithm for optimally solving MAPF. We compared ICTS to A* theoretically and experimentally on a range of domains. In particular, we observed that the performance of A* tends to degrade mostly when k increases, while the performance of ICTS relative to that of A* tends to degrade when ∆

increases. Therefore, there is no universal winner. ICTS will be inefficient in very dense environments such as a 3×3 grid with 8 agents, where many conflicts occur, or in cases where resolving conflicts is very costly (e.g., the example shown in Figure 3.9). However, we have demonstrated that in many natural domains this is not the case, and ICTS obtains a significant speedup of up to two orders of magnitude over A*+OD.

We also introduced a number of techniques for pruning ICT nodes without the need to activate the low-level search of ICTS. All of these techniques significantly outperform the basic variant of ICTS, where no pruning is performed and the low-level search is activated for every ICT node. There is a tradeoff between the different pruning techniques: more sophisticated pruning methods incur larger overhead but are able to prune a larger portion of the ICT nodes. While no pruning technique dominated all other techniques on all problem instances, the following guideline was observed: when the problem becomes denser and more conflicts exist (causing a larger ∆), it is beneficial to apply the more advanced pruning techniques.

Future work will continue in two main directions:

(1) The low-level search can be viewed as a Constraint Satisfaction Problem (CSP). The goal is to decide whether there is a solution where every agent is constrained to find a path of a specific cost. One way to encode the low-level search as a CSP would be to assign a variable to every agent-time pair (a, t). The values of this variable would be the possible locations of agent a at time t according to the MDD of a. Constraints would be added to avoid collisions and to ensure that a valid path for every agent is returned. This would allow using state-of-the-art CSP solvers instead of the systematic search that is currently proposed. Note that the pruning techniques proposed in this chapter can be viewed as forms of arc- and path-consistency. Hence, possible future work would be to use more advanced pruning techniques based on arc-consistency and path-consistency algorithms, such as AC3.1 [Zhang and Yap, 2001].

(2) Extending ICTS to weighted graphs and to cases where the agents have an abstract goal instead of a specific goal for every agent. For example, a group of agents can be given a goal to leave a specific area.
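To make direction (1) concrete, the following is a minimal sketch of such a CSP encoding. The MDDs, agent names and move sets below are toy assumptions invented for illustration, and a brute-force enumerator stands in for a real CSP solver:

```python
from itertools import product

# Toy encoding: one CSP variable per (agent, time) pair, whose domain is
# that agent's MDD level. All names here are illustrative.
mdd = {  # mdd[a][t] = vertices agent a may occupy at time t
    'a1': [{'S1'}, {'A', 'B'}, {'G1'}],
    'a2': [{'S2'}, {'A'}, {'G2'}],
}
moves = {  # MDD edges: the moves each agent may take between levels
    'a1': {('S1', 'A'), ('S1', 'B'), ('A', 'G1'), ('B', 'G1')},
    'a2': {('S2', 'A'), ('A', 'G2')},
}
T = 2  # both MDDs have depth 2
agents = sorted(mdd)
variables = [(a, t) for a in agents for t in range(T + 1)]
domains = [sorted(mdd[a][t]) for (a, t) in variables]

def consistent(assign):
    # Path validity: consecutive locations must follow an MDD edge.
    for a in agents:
        for t in range(T):
            if (assign[(a, t)], assign[(a, t + 1)]) not in moves[a]:
                return False
    # Collision avoidance: no two agents share a vertex at the same time.
    for t in range(T + 1):
        occupied = [assign[(a, t)] for a in agents]
        if len(set(occupied)) < len(occupied):
            return False
    return True

def solve():
    # A real CSP solver would propagate constraints; brute force keeps it short.
    for values in product(*domains):
        assign = dict(zip(variables, values))
        if consistent(assign):
            return {a: [assign[(a, t)] for t in range(T + 1)] for a in agents}
    return None
```

On this toy instance, a2 must occupy A at time 1, so the solver routes a1 through B.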

Chapter 4

The Conflict-Based Search

In this chapter we present the Conflict Based Search (CBS), an optimal multi-agent path finding algorithm. CBS is a two-level algorithm that does not convert the problem into the single 'joint agent' model. At the high level, a search is performed on a Conflict Tree (CT), which is a tree based on conflicts between individual agents. Each node in the CT represents a set of constraints on the motion of the agents. At the low level, fast single-agent searches are performed to satisfy the constraints imposed by the high-level CT node. In many cases this two-level formulation enables CBS to examine fewer states than A* while still maintaining optimality. We analyze CBS and show its benefits and drawbacks.

Additionally, we present the Meta-Agent CBS (MA-CBS) algorithm. MA-CBS is a generalization of CBS. Unlike basic CBS, MA-CBS is not restricted to single-agent searches at the low level. Instead, MA-CBS allows agents to be merged into small groups of joint agents. This mitigates some of the drawbacks of basic CBS and further improves performance. In fact, MA-CBS is a framework that can be built on top of any optimal and complete MAPF solver in order to enhance its performance. Experimental results on various problems show a speedup of up to an order of magnitude over previous approaches.

This chapter is organized as follows. In Section 4.1 we introduce and formulate the CBS algorithm. Section 4.2 theoretically compares CBS to the ICTS and A*-based search algorithms and shows its advantages and drawbacks. Section 4.3 provides initial experimental results. Section 4.4 presents adaptations of CBS that are needed for different MAPF settings. Section 4.5 introduces Meta-Agent CBS. Section 4.6 provides further experimental results and Section 4.7 concludes this chapter and describes future work.

This chapter is almost an exact copy of [Sharon et al., 2015].

4.1 The Conflict Based Search Algorithm

We now turn to describe our new algorithm, the Conflict Based Search Algorithm (CBS). Later, in Section 4.5, we present a generalization of CBS called Meta-Agent Conflict Based Search (MA-CBS). In addition, a memory-efficient variant of CBS is presented in Appendix A.

Recall that the state space spanned by A* in MAPF is exponential in k (the number of agents). By contrast, in a single-agent path finding problem, k = 1, and the state space is

only linear in the graph size. CBS solves MAPF by decomposing it into a large number of constrained single-agent path finding problems. Each of these problems can be solved in time proportional to the size of the map and the length of the solution, but there may be an exponential number of such single-agent problems.

4.1.1 Definitions for CBS

The following definitions are used in the remainder of the chapter.

• We use the term path only in the context of a single agent and use the term solution to denote a set of k paths for the given set of k agents.

• A constraint is a tuple (ai, v, t) where agent ai is prohibited from occupying vertex v at time step t. During the course of the algorithm, agents will be associated with constraints. A consistent path for agent ai is a path that satisfies all its constraints. Likewise, a consistent solution is a solution that is made up of paths, such that the path for any agent ai is consistent with the constraints of ai.

• A conflict is a tuple (ai, aj, v, t) where agent ai and agent aj occupy vertex v at time point t. A solution (of k paths) is valid if its paths have no conflicts. A consistent solution can be invalid if, despite the fact that the individual paths are consistent with the constraints associated with their agents, these paths still have conflicts.
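The definitions above can be captured with a few lines of code. The following sketch (type and field names are our own, not from the thesis implementation) checks the consistency of a single path against a set of constraints:

```python
from collections import namedtuple

# Illustrative types mirroring the definitions above.
Constraint = namedtuple('Constraint', ['agent', 'vertex', 'time'])
Conflict = namedtuple('Conflict', ['agent_i', 'agent_j', 'vertex', 'time'])

def is_consistent(agent, path, constraints):
    """A path is a list of vertices indexed by time step. It is consistent for
    `agent` if it violates none of the constraints associated with that agent."""
    return not any(c.agent == agent and c.time < len(path)
                   and path[c.time] == c.vertex
                   for c in constraints)
```

For instance, the constraint Constraint('a1', 'A', 1) makes the path ['S', 'A', 'G'] inconsistent for a1 but leaves it consistent for any other agent.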

The key idea of CBS is to grow a set of constraints and find paths that are consistent with these constraints. If these paths have conflicts, and are thus invalid, the conflicts are resolved by adding new constraints. CBS works at two levels. At the high level, conflicts are found and constraints are added. The low level finds paths for individual agents that are consistent with the new constraints. Next, we describe each part of this process in more detail.

4.1.2 High Level

In the following section we describe the high-level process of CBS and the search tree it searches.

The Constraint Tree

At the high level, CBS searches a tree called the constraint tree (CT). A CT is a binary tree. Each node N in the CT consists of:

1. A set of constraints (N.constraints). Each of these constraints belongs to a single agent. The root of the CT contains an empty set of constraints. The child of a node in the CT inherits the constraints of the parent and adds one new constraint for one agent.

2. A solution (N.solution). A set of k paths, one path for each agent. The path for agent ai must be consistent with the constraints of ai. Such paths are found by the low-level search.

3. The total cost (N.cost) of the current solution (summed over all the single-agent path costs). This cost is referred to as the f-value of node N.

Node N in the CT is a goal node when N.solution is valid, i.e., the set of paths for all agents has no conflicts. The high level performs a best-first search on the CT where nodes are ordered by their costs. In our implementation, ties are broken in favor of CT nodes whose associated solution contains fewer conflicts. Further ties were broken in a FIFO manner.
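A CT node and the best-first ordering just described can be sketched as follows (an illustrative Python rendering; field names are our own). The heap key encodes the three ordering criteria: cost, then fewer conflicts, then FIFO:

```python
import heapq
from dataclasses import dataclass

# Illustrative CT node; field names are ours, not from the thesis code.
@dataclass
class CTNode:
    constraints: frozenset   # set of (agent, vertex, time) tuples
    solution: dict           # agent -> path (list of vertices)
    cost: int                # sum of individual path costs: the node's f-value
    num_conflicts: int = 0   # used only for tie-breaking

counter = 0  # FIFO tie-breaker among nodes with equal cost and conflicts

def push(open_list, node):
    global counter
    # Best-first order: lowest cost, then fewer conflicts, then FIFO.
    heapq.heappush(open_list, (node.cost, node.num_conflicts, counter, node))
    counter += 1

open_list = []
push(open_list, CTNode(frozenset(), {}, 7, num_conflicts=1))
push(open_list, CTNode(frozenset(), {}, 7, num_conflicts=0))
push(open_list, CTNode(frozenset(), {}, 6))
best = heapq.heappop(open_list)[-1]  # the cost-6 node comes out first
```

The unique counter also keeps the heap from ever comparing two CTNode objects directly.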

Processing a Node in the CT

Given the list of constraints for a node N of the CT, the low-level search is invoked. The low-level search (described in detail below) returns one shortest path for each agent, ai, that is consistent with all the constraints associated with ai in node N. Once a consistent path has been found for each agent (with respect to its own constraints), these paths are validated with respect to the other agents. The validation is performed by iterating through all time steps and matching the locations reserved by all agents. If no two agents plan to be at the same location at the same time, this CT node N is declared a goal node, and the current solution (N.solution) that contains this set of paths is returned. If, however, while performing the validation, a conflict C = (ai, aj, v, t) is found for two (or more) agents ai and aj, the validation halts and the node is declared a non-goal node.
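The validation step can be sketched as a single scan over time steps (a hedged illustration; it assumes agents wait at their goals once done and checks vertex conflicts only):

```python
def validate(solution):
    """Iterate through time steps and match the locations reserved by all
    agents; return the first vertex conflict (ai, aj, v, t), or None if the
    solution is valid. Agents are assumed to wait at their goal once done."""
    agents = sorted(solution)
    horizon = max(len(path) for path in solution.values())
    for t in range(horizon):
        occupied = {}
        for a in agents:
            path = solution[a]
            v = path[t] if t < len(path) else path[-1]
            if v in occupied:
                return (occupied[v], a, v, t)  # non-goal node: conflict found
            occupied[v] = a
    return None  # goal node: the set of paths has no conflicts
```

On the two-path example used later in this chapter, this scan reports the conflict at vertex D at time step 2.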

Resolving a Conflict

Given a non-goal CT node N whose solution N.solution includes a conflict Cn = (ai, aj, v, t), we know that in any valid solution, at most one of the conflicting agents (ai and aj) may occupy vertex v at time t. Therefore, at least one of the constraints (ai, v, t) or (aj, v, t) must be added to the set of constraints in N.constraints. To guarantee optimality, both possibilities are examined and node N is split into two children. Both children inherit the set of constraints from N. The left child resolves the conflict by adding the constraint (ai, v, t) and the right child adds the constraint (aj, v, t).

Note that for a given CT node N, one does not have to save all its cumulative constraints. Instead, the node can save only its latest constraint and extract the other constraints by traversing the path from N to the root via its ancestors. Similarly, with the exception of the root node, the low-level search should only be performed for agent ai, which is associated with the newly added constraint. The paths of the other agents remain the same, as no new constraints are added for them.
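This space-saving scheme can be sketched as a node that stores one constraint and a parent pointer (an illustrative fragment, not the thesis code):

```python
class LazyCTNode:
    """Sketch of the space-saving scheme: each CT node stores only its newest
    constraint plus a parent pointer; the full set is recovered on demand."""
    def __init__(self, parent=None, constraint=None):
        self.parent = parent
        self.constraint = constraint  # (agent, vertex, time); None at the root

    def all_constraints(self):
        node, acc = self, []
        while node is not None:       # walk up to the root via the ancestors
            if node.constraint is not None:
                acc.append(node.constraint)
            node = node.parent
        return acc

root = LazyCTNode()
child = LazyCTNode(root, ('a1', 'D', 2))
grandchild = LazyCTNode(child, ('a2', 'B1', 1))
```

Each node thus costs O(1) memory, at the price of an O(depth) walk whenever the full constraint set is needed.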

Figure 4.1: A (k = 3)-way branching CT (top) and a binary CT for the same problem (bottom).

Conflicts of k > 2 Agents

It may be the case that while performing the validation between the different paths, a k-agent conflict is found for k > 2. There are two ways to handle such k-agent conflicts. We can generate k children, each of which adds constraints to k − 1 agents (i.e., each child allows only one agent to occupy the conflicting vertex v at time t). Alternatively, an equivalent formalization is to focus only on the first two agents that are found to conflict, and branch according to their conflict alone. This leaves further conflicts for deeper levels of the tree. This is illustrated in Figure 4.1. The top tree represents a variant of the CT where k-way branching is allowed for a single conflict that includes k agents, for the case where k = 3. Each new successor adds k − 1 (= 2) new constraints (on all agents but one). The bottom tree presents a binary CT for the same problem. Note that the bottom middle state is a duplicate state, and if duplicate detection is not applied there will be two occurrences of this node instead of one. As can be seen, the size of the deepest layer in both trees is identical. The complexity of the two approaches is similar, as both will end up with k nodes, each with k − 1 new constraints. For simplicity, we implemented and describe only the second option.

Edge Conflicts

For simplicity, we have described only conflicts that occur at vertices. But if, according to the problem definition, agents are not allowed to cross the same edge in opposite directions, then edge conflicts can also occur. We define an edge conflict to be the tuple (ai, aj, v1, v2, t) where two agents "swap" locations (ai moves from v1 to v2 while aj moves from v2 to v1) between time step t and time step t + 1. An edge constraint is defined as (ai, v1, v2, t), where agent ai is prohibited from starting to move along the edge from v1 to v2 at time step t (and reaching v2 at time step t + 1). When applicable, edge conflicts are treated by the high level in the same manner as vertex conflicts.

Pseudo-Code and Example

Algorithm 4 presents the pseudo-code for CBS, as well as the more advanced MA-CBS that will be explained later, in Section 4.5. Lines 12-18 are only relevant for MA-CBS. For

Algorithm 4: High level of CBS (and MA-CBS)
Input: MAPF instance
 1  Root.constraints = ∅
 2  Root.solution = find individual paths by the low level()
 3  Root.cost = SIC(Root.solution)
 4  insert Root to OPEN
 5  while OPEN not empty do
 6      P ← best node from OPEN  // lowest solution cost
 7      Validate the paths in P until a conflict occurs
 8      if P has no conflict then
 9          return P.solution  // P is goal
10      C ← first conflict (ai, aj, v, t) in P
11      if shouldMerge(ai, aj) then  // Optional, MA-CBS only
12          ai,j = merge(ai, aj, v, t)
13          Update P.constraints (external constraints)
14          Update P.solution by invoking low level(ai,j)
15          Update P.cost
16          if P.cost < ∞ then  // A solution was found
17              Insert P into OPEN
18          continue  // go back to the while statement
19      foreach agent ai in C do
20          A ← new node
21          A.constraints ← P.constraints + (ai, v, t)
22          A.solution ← P.solution
23          Update A.solution by invoking low level(ai)
24          A.cost = SIC(A.solution)
25          if A.cost < ∞ then  // A solution was found
26              Insert A into OPEN

now, assume that the shouldMerge() function (in Line 11) always returns false, skipping Lines 11-18. The high level has the structure of a best-first search.

We describe CBS using our example from Figure 2.1 (Section 2.1.7), where the mice need to get to their respective pieces of cheese. The corresponding CT is shown in Figure 4.2. The root contains an empty set of constraints. In Line 2 the low level returns an optimal solution for each agent, 〈S1, A1, D, G1〉 for a1 and 〈S2, B1, D, G2〉 for a2. Thus, the total cost of this node is 6. All this information is kept inside this node. The root is then inserted into OPEN and will be expanded next.

When validating the two-agent solution given by the two individual paths (Line 7), a conflict is found when both agents arrive at vertex D at time step 2. This creates the conflict (a1, a2, D, 2) (Line 10). As a result, the root is declared a non-goal and two children are generated in order to resolve the conflict (Line 19). The left child adds the constraint (a1, D, 2) while the right child adds the constraint (a2, D, 2). The low-level search is now invoked (Line 23) for the left child to find an optimal path that also satisfies the new constraint. For this, a1 must wait one time step either at A1 or at S1, and the path 〈S1, A1, A1, D, G1〉 is returned for a1. The path for a2, 〈S2, B1, D, G2〉, remains unchanged in the left child. The total cost of the left child is now 7. In a similar way, the right child is

Figure 4.2: An example of a Constraint Tree (CT).

generated, also with cost 7. Both children are inserted into OPEN (Line 26). In the next iteration of the while loop (Line 5) the left child is chosen for expansion, and the underlying paths are validated. Since no conflicts exist, the left child is declared a goal node (Line 9) and its solution is returned as an optimal solution.
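The walkthrough above can be reproduced end to end with a compact, self-contained sketch. Everything below (graph, vertex names, horizon) is a toy reconstruction of the example, and edge conflicts as well as constraints on waiting at the goal are ignored for brevity:

```python
import heapq
from collections import deque

# Two corridors S1-A1-D-G1 and S2-B1-D-G2 that share vertex D (illustrative).
GRAPH = {
    'S1': ['A1'], 'A1': ['S1', 'D'], 'S2': ['B1'], 'B1': ['S2', 'D'],
    'D': ['A1', 'B1', 'G1', 'G2'], 'G1': ['D'], 'G2': ['D'],
}
AGENTS = {'a1': ('S1', 'G1'), 'a2': ('S2', 'G2')}
MAX_T = 10  # time horizon for the sketch

def low_level(agent, constraints):
    """BFS in (vertex, time) space for one shortest path that satisfies the
    vertex constraints of `agent`, ignoring all other agents."""
    start, goal = AGENTS[agent]
    banned = {(v, t) for (a, v, t) in constraints if a == agent}
    queue, seen = deque([(start, 0, [start])]), {(start, 0)}
    while queue:
        v, t, path = queue.popleft()
        if v == goal:
            return path
        for u in [v] + GRAPH[v]:  # wait in place or move to a neighbor
            if t < MAX_T and (u, t + 1) not in banned and (u, t + 1) not in seen:
                seen.add((u, t + 1))
                queue.append((u, t + 1, path + [u]))
    return None

def first_conflict(solution):
    """Return the first vertex conflict (ai, aj, v, t), or None if valid."""
    for t in range(max(len(p) for p in solution.values())):
        at = {}
        for a, p in sorted(solution.items()):
            v = p[min(t, len(p) - 1)]  # agents wait at their goal
            if v in at:
                return (at[v], a, v, t)
            at[v] = a
    return None

def cbs():
    """High-level best-first search on the constraint tree (Algorithm 4)."""
    tick = 0  # FIFO tie-breaker
    sol = {a: low_level(a, frozenset()) for a in AGENTS}
    open_list = [(sum(len(p) - 1 for p in sol.values()), tick, frozenset(), sol)]
    while open_list:
        cost, _, constraints, solution = heapq.heappop(open_list)
        conflict = first_conflict(solution)
        if conflict is None:
            return cost, solution  # goal CT node: a valid optimal solution
        ai, aj, v, t = conflict
        for a in (ai, aj):  # split into two children, one new constraint each
            child_cons = constraints | {(a, v, t)}
            new_path = low_level(a, child_cons)
            if new_path is not None:
                child_sol = dict(solution, **{a: new_path})
                tick += 1
                heapq.heappush(open_list,
                               (sum(len(p) - 1 for p in child_sol.values()),
                                tick, child_cons, child_sol))
    return None
```

Running cbs() on this instance finds the root solution of cost 6, detects the conflict at D at time step 2, splits the root, and returns a conflict-free solution of cost 7, matching the CT in Figure 4.2.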

4.1.3 Low Level: Find Paths for CT Nodes

The low-level search is given an agent, ai, and the set of constraints associated with ai. It performs a search in the underlying graph to find an optimal path for agent ai that satisfies all its constraints while completely ignoring the other agents. The search space of the low-level search has two dimensions: the spatial dimension and the time dimension.¹ Any single-agent path-finding algorithm can be used to find the path for agent ai, while verifying that the constraints are satisfied. We implemented the low-level search of CBS with A*, which handled the constraints as follows. Whenever a state (v, t) is generated, where v is a location and t a time step, and there exists a constraint (ai, v, t) in the current CT (high-level) node, this state is discarded. The heuristic we used is the shortest path in the spatial dimension, ignoring other agents and constraints.

For cases where two low-level A* states have the same f-value, we used a tie-breaking policy based on Standley's conflict avoidance table (CAT) (described in Section 2.2.3). States that contain a conflict with a smaller number of other agents are preferred. For example, if states s1 = (v1, t1) and s2 = (v2, t2) have the same f-value, but v1 is used by two other agents at time t1 while v2 is not used by any other agent at time t2, then s2 will be expanded first. This tie-breaking policy improves the total running time by a factor of 2 compared to arbitrary tie-breaking. Duplicate state detection and pruning (DD) speeds up the low-level procedure. Unlike single-agent path finding, the low-level state space also includes the time dimension and dynamic 'obstacles' caused by constraints. Therefore, two states are considered duplicates if both the position of ai and the time step are identical in both states.

¹ The spatial dimension itself may contain several internal dimensions. For example, a 2D map contains two dimensions. A k-dimensional graph along with the time dimension results in a search space with k + 1 dimensions.
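The low-level space-time A* described above, including the constraint check and a CAT-based tie-break, might look as follows (a hedged sketch; the CAT is passed in as a plain dictionary, only vertex constraints are handled, and the graph is assumed connected):

```python
import heapq
from collections import deque

def spatial_distances(graph, goal):
    """h(v): shortest-path distance to goal, ignoring time, agents, constraints."""
    dist, queue = {goal: 0}, deque([goal])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def low_level_astar(graph, start, goal, banned, cat, max_t=50):
    """Space-time A*. `banned` holds (v, t) pairs from this agent's constraints;
    `cat` maps (v, t) -> number of other agents there (conflict avoidance
    table). Ties on f are broken toward fewer accumulated conflicts."""
    h = spatial_distances(graph, goal)
    counter = 0  # insertion order, keeps the heap deterministic
    open_list = [(h[start], 0, 0, counter, start, 0, [start])]
    closed = set()
    while open_list:
        f, conflicts, g, _, v, t, path = heapq.heappop(open_list)
        if v == goal:
            return path
        if (v, t) in closed:        # duplicate detection on (vertex, time)
            continue
        closed.add((v, t))
        for u in [v] + graph[v]:    # wait in place or move to a neighbor
            if t + 1 > max_t or (u, t + 1) in banned:
                continue            # constrained states are discarded
            counter += 1
            heapq.heappush(open_list,
                           (g + 1 + h[u], conflicts + cat.get((u, t + 1), 0),
                            g + 1, counter, u, t + 1, path + [u]))
    return None
```

On a three-vertex corridor A-B-C with the constraint ('B', 1), the agent waits one step at A before moving through B to C.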

4.2 Theoretical Analysis

In this section we discuss theoretical aspects of CBS. We begin with a proof that CBS returns the optimal solution, followed by a proof of completeness. We then discuss the pros and cons of CBS when compared to other algorithms. All claims in this section assume the sum-of-costs cost function. Adapting these claims to other cost functions is discussed in Section 4.4.

4.2.1 Optimality of CBS

We start by providing several supporting claims for the optimality proof.

Definition 1 For a given node N in the constraint tree, let CV(N) be the set of all solutions that are: (1) consistent with the set of constraints of N; and (2) also valid (i.e., without conflicts).

If N is not a goal node, then the solution at N will not be part of CV(N) because the solution is not valid. For example, consider the root node. The root has no constraints; thus CV(root) equals the set of all possible valid solutions. If the solution chosen for the root is not valid and thus is not part of CV(root), the root will not be declared a goal.

Definition 2 We say that node N permits a solution p if p ∈ CV(N).

The root of the CT, for example, has an empty set of constraints. Any valid solution satisfies the empty set of constraints. Thus the root node permits all valid solutions.

The cost of a solution in CV(N) is the sum of the costs of the individual agents. Let minCost(CV(N)) be the minimum cost over all solutions in CV(N). For CV(N) = ∅, define minCost(CV(N)) = ∞.

Lemma 2 The cost of a node N in the CT is a lower bound on minCost(CV(N)).

Proof: Since N.cost is the sum of the optimal consistent single-agent solutions, it is the minimum cost among all consistent solutions. By contrast, minCost(CV(N)) is the minimum cost among all consistent and valid solutions. Since the set of all consistent and valid solutions is a subset of all consistent solutions, it must be that N.cost ≤ minCost(CV(N)).

Lemma 3 For each valid solution p, there exists at least one node N in OPEN such that N permits p.

Proof: By induction on the expansion cycle. In the base case, OPEN contains only the root node, which has no constraints. Consequently, the root node permits all valid solutions. Now, assume this is true after the first i − 1 expansions. During expansion i, assume that node N is expanded and its successors, N1 and N2, are generated. Let p be a valid solution. If p is permitted by another node in OPEN, we are done. Otherwise, assume p is permitted by N. We need to show that p must be permitted by at least one of its successors.

The new constraints for N1 and N2 share the same time and location but constrain different agents. Suppose a solution p permitted by N has agent a1 at the location of the constraint. Agent a1 can only be constrained at one of N1 and N2, but not both, so one of these nodes must permit p. Thus, the induction holds.

Consequence: at all times, at least one CT node in OPEN permits the optimal solution (as a special case of Lemma 3).

Theorem 2 CBS returns an optimal solution.

Proof: When a goal node G is chosen for expansion by the high level, all valid solutions are permitted by at least one node in OPEN (Lemma 3). Let p be a valid solution (with cost p.cost) and let N(p) be the node that permits p in OPEN. By Lemma 2, N(p).cost ≤ p.cost. Since G is a goal node, G.cost is the cost of a valid solution. Since the high-level search explores nodes in a best-first manner according to their cost, we get that G.cost ≤ N(p).cost ≤ p.cost.

4.2.2 Completeness of CBS

The state space for the high-level search of CBS is infinite, as constraints can be added for an infinite number of time steps. This raises the issue of completeness. Completeness of CBS includes two claims:

• Claim a: CBS will return a solution if one exists.

• Claim b: CBS will identify an unsolvable problem.

We will now show that claim a is always true, while claim b requires a test independent of CBS.

Claim a

Theorem 3 For every cost C, there is a finite number of CT nodes with cost C.

Proof: Consider a CT node N with cost C. After time step C all agents are at their goal positions. Consequently, no conflicts can occur after time step C. Since constraints are derived from conflicts, no constraints are generated for time steps greater than C. As the cost of every CT node is monotonically non-decreasing, all of the predecessors of the CT node N have cost ≤ C. Hence, neither N nor any of its predecessors can generate constraints for time steps greater than C. Since there is a finite number of such constraints (at most k · |V| · C constraints on vertices and k · |E| · C constraints on edges), there is also a finite number of CT nodes that contain such constraints.

Theorem 4 CBS will return a solution if one exists.

Proof: CBS uses a systematic best-first search, and the costs of the CT nodes are monotonically non-decreasing. Therefore, for every pair of costs X and Y, if X < Y then CBS will expand all nodes with cost X before expanding any node of cost Y. Since for each cost there is a finite number of CT nodes (Theorem 3), the optimal solution must be found after expanding a finite number of CT nodes.

Claim b

Figure 4.3: An example of an unsolvable MAPF instance.

Claim b does not always hold for CBS. For example, consider the problem presented in Figure 4.3, where two agents need to switch locations. The CT will grow infinitely, adding more and more constraints, never reaching a valid solution. Fortunately, Yu and Rus [Yu and Rus, 2014] have recently presented an algorithm that detects whether a given MAPF instance is solvable or not. Running their algorithm prior to CBS would satisfy claim b, as CBS will be called only if the instance is solvable.

4.2.3 Comparison with Other Algorithms

This section compares the work done by CBS to that of A* when both aim to minimize the sum-of-costs function and both use the SIC heuristic. Assume we use A* for MAPF. Let χ be the set of (multi-agent) A* nodes with f < C∗ when A* is executed with the SIC heuristic. Also, let X = |χ|. It is well known that A* must expand all nodes in χ in order to guarantee optimality [Dechter and Pearl, 1985].

In prior work we analyzed the worst case behavior of A* and ICTS [Sharon et al., 2013a]. A* generates, in the worst case, up to X × (b_base)^k nodes. ICTS searches, in the worst case, up to X × k^∆ low-level nodes (where ∆ is the depth of the lowest cost ICT goal node). Note that the low-level nodes visited by ICTS are states in the k-agent MDD search space, which is similar but not the same as the nodes visited by A*. For more details, see [Sharon et al., 2013a]. We limit the discussion here to comparing CBS only to A*, as the relation between A* and ICTS was already studied [Sharon et al., 2013a]. Let Υ be the set of nodes with cost < C∗ in the CT and let Y = |Υ|. As a best-first search guided by the cost of nodes, and since cost is monotonically non-decreasing, CBS must expand all the nodes in Υ. We restrict ourselves to giving an upper bound on Y . As the branching factor of the CT is 2, 2^d nodes must be expanded in the worst case, where d is the depth of the CT once a solution is found. At each node of the CT exactly one constraint is added. In the worst case an agent will be constrained to avoid every vertex except one at every time step in the solution. The total number of time steps summed over all agents is C∗. Thus, an upper bound on Y , the number of CT nodes that CBS needs to expand, is 2^(|V |·C∗). For each of these nodes the low level is invoked and expands at most |V | × C∗ (single-agent) states (at most |V | states for each time step). Let Y_l be the number of states expanded in the underlying graph (low-level states); then Y_l = O(2^(|V |·C∗) × |V | × C∗). Note that we counted low-level nodes that are expanded within expanded high-level nodes. If we also want to consider the generated high-level nodes we should multiply this by 2, as each expanded CT node generates 2 new


nodes. Again, in practice Y (the number of expanded CT nodes) and Y_l (the number of expanded low-level states) can be significantly smaller.

CBS performs a high-level search in the CT and then a low-level search in the underlying graph, and expands a total of Y_l low-level states. Thus, if Y_l ≪ X, that is, if the number of states expanded in all the single-agent searches is much smaller than the number of (multi-agent) A* states with f < C∗, then CBS will outperform A* and vice versa. One may be surprised by this result, showing that CBS can potentially consider fewer states than A*, as A* is known to be “optimally efficient” in the sense that it expands only the set of nodes necessary to find the optimal solution. The “optimally efficient” property of A*, however, is only true when comparing A* with another BFS searching the same state space, using the same, consistent, heuristic function, and ignoring the impact of tie-breaking between states with the same f values [Dechter and Pearl, 1985]. CBS searches a completely different state space than the traditional A*, and thus can be arbitrarily better than A*.

Next we show special cases where Y_l ≪ X (bottleneck) and where X ≪ Y_l (open space). The overall trend seems to be typical for MAPF, where the topology of the domain greatly influences the behavior of algorithms.

Example of CBS Outperforming A* (bottlenecks)

Our example of Figure 2.1 (Section 2.1.7) demonstrates a case where Y_l ≪ X, i.e., a case where CBS expands fewer nodes than A*. As detailed above, CBS generates a total of three CT nodes (shown in Figure 4.2 in Section 4.1.2). At the root, the low level is invoked for the two agents. The low-level search finds an optimal path for each of the two agents (each of length 3), and expands a total of 8 low-level states for the CT root. Now, a conflict is found at D. Two new CT children nodes are generated. In the left child the low level searches for an alternative path for agent a1 that does not pass through D at time step 2. S1 plus all m states A1, . . . , Am are expanded with f = 3. Then, D and G1 are expanded with f = 4, the search halts and the path 〈S1, A1, A1, D, G1〉 is returned.

Thus, at the left child a total of m + 3 nodes are expanded. Similarly, m + 3 states are expanded for the right child. Adding all these to the 8 states expanded at the root, we get that a total of Y_l = 2m + 14 low-level states are expanded.

Now, consider A*, which runs in a 2-agent state space. The root (S1, S2) has f = 6. It generates m² nodes, all of the form (Ai, Bj) for 1 ≤ i, j ≤ m. All these nodes are expanded with f = 6. Now, node (A1, D) with f = 7 is expanded (agent a1 waits at A1). Then nodes (D, G2) and (G1, G2) are expanded and the solution is returned. So, in total, A* expanded X = m² + 3 nodes. For m ≥ 5 this is larger than 2m + 14, and consequently CBS will expand fewer nodes. A* must expand the Cartesian product of single-agent paths with f = 3. By contrast, CBS only tried two such paths to realize that no solution of cost 6 is valid.
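The m ≥ 5 crossover follows directly from the two counts; a quick check (plain arithmetic, nothing thesis-specific):

```python
def a_star_expansions(m):
    # X = m^2 + 3: the m^2 Cartesian-product nodes with f = 6, plus three more
    return m * m + 3

def cbs_low_level_expansions(m):
    # Y_l = 2m + 14: 8 states at the root plus m + 3 in each of the two children
    return 2 * m + 14

# smallest m for which A* expands strictly more states than CBS
crossover = next(m for m in range(1, 100)
                 if a_star_expansions(m) > cbs_low_level_expansions(m))
```

For m = 5 the counts are 28 versus 24, matching the claim in the text.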

Furthermore, the constant time per (low-level) node of CBS is much smaller than the constant time per node of A*, for two reasons. First, A* expands multi-agent nodes while CBS expands single-agent states. Second, the open list maintained by CBS is much smaller, because the single-agent search space is linear in the size of the input graph. By contrast, the open list for A* deals with the multi-agent state space, which is exponentially larger. Consequently, insertion and extraction of nodes from the open list is faster in CBS. CBS also incurs overhead directly at the high-level nodes. Each non-goal high-level node requires validating the given solution and generating two successors. The number of high-level nodes is very small compared to the low-level nodes. Consequently, the overhead of the high level is negligible.

Example of A* Outperforming CBS (open space)


Figure 4.4: A pathological case where CBS is extremely inefficient.

Figure 4.4 presents a case where Y_l ≫ X and A* will outperform CBS. There is an open area in the middle (in gray) and all agents must cross this area. For each agent there are four optimal paths of length 4, and thus the SIC heuristic of the start state is 8. However, each of the 16 combinations of these paths has a conflict in one of the gray cells. Consequently, C∗ = 9, as one agent must wait at least one step to avoid collision. For this problem A* will expand 5 nodes with f = 8: (D2, C1), (D3, C2), (D3, B1), (C2, B1), (C3, B2), and 3 nodes with f = 9: (B3, B2), (A3, B3), (A3, B4), until the goal is found; a total of 8 nodes are expanded.

Now, consider CBS. CBS will build the CT shown in Figure 4.5. The CT consists of 5 non-goal CT nodes with cost 8, and 6 goal CT nodes (dashed outline) with cost 9. The root CT node will run the low-level search for each agent, for a total of 8 low-level expansions. Each non-goal CT node except the root will run the low-level search for a single agent, for a total of 4 low-level expansions. Each goal CT node will expand 5 low-level nodes. In total, CBS will expand 8 + 4 · 4 + 6 · 5 = 54 low-level nodes.

Since the Conflict Tree grows exponentially in the number of conflicts encountered, CBS behaves poorly when a set of agents is strongly coupled, i.e., when there is a high rate of internal conflicts between agents in the group.

While it is hard to predict the performance of the algorithms in actual domains, the above observations can give some guidance. If there are more bottlenecks, CBS will have an advantage over the A*-based approaches, as it will quickly rule out the f-value where agents conflict in the bottleneck and then move to solutions which bypass the bottlenecks. If there are more open spaces, A* will have the advantage over CBS, as it will rule out conflicted solutions very fast. Next, we show experimental results supporting both cases.


Figure 4.5: The CT for the pathological case where CBS is extremely inefficient.

4.3 CBS Empirical Evaluation

CBS, as well as the other algorithms presented in this thesis, is applicable to many variants of the MAPF problem. Next, we describe the MAPF variant and experimental setting used in the following empirical analysis. This specific MAPF variant and setting was chosen to conform with prior work [Sharon et al., 2013a; Sharon et al., 2015; Standley, 2010; Standley and Korf, 2011].

4.3.1 Experimental Problem Settings

At each time step all agents can simultaneously perform a move or wait action. Our implementation assumes that both moving and waiting have unit cost. We also make the following two assumptions:

1. The agents never disappear. Even if an agent has arrived at its goal, it will block other agents from passing through it.

2. Wait actions at the goal cost zero only if the agent never leaves the goal later. Otherwise, they cost one.

In addition, in our experimental setting agents are allowed to follow each other. That is, agent ai can move from x to y if, at the same time, agent aj also moves from y to z. Following [Standley, 2010; Sharon et al., 2013a; Sharon et al., 2015; Yu and LaValle, 2013a; Erdem et al., 2013], we allow agents to follow each other in a cyclic chain. We believe that this policy is better suited to represent a multi-robot scenario, as indeed robots can move simultaneously in a circle (without an empty space). Not allowing following in a chain is more common in the pebble motion literature [Daniel Kornhauser, 1984], where pebbles are moved one after the other, and thus at least one empty space is needed to initiate a movement. Edge conflicts are also prohibited. That is, agent ai is prohibited from moving from x to y if at the same time agent aj moves from y to x. Our implementation does not use a duplicate detection and pruning procedure (DD) at the high level, as we found it to have a large overhead with negligible improvement. The low level, on the other hand, does use DD.

The specific global cumulative cost function used in our experiments is the sum-of-costs function explained in Section 2.1.4. If, for instance, it takes agents a1 and a2 2 and 3 time steps, respectively, to reach their goals, then the sum-of-costs for these two agents is 2 + 3 = 5. Note that for each agent the number of time steps is counted until the time step in which it arrives at its goal without moving away. An agent that reaches its goal but later on is forced to move away might cause a dramatic increase in the total cost. To remedy this we used the same mechanism as EPEA* [Goldenberg et al., 2014], which generates high-level nodes one at a time according to their f-value.
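The arrival-time rule above can be made precise with a small helper (our sketch; path[t] is the agent's vertex at time t, and trailing waits at the goal cost nothing):

```python
def sum_of_costs(paths, goals):
    """Sum-of-costs where each agent's cost counts time steps until it
    arrives at its goal and never leaves it again; trailing waits at the
    goal are free, but leaving the goal and returning is charged in full."""
    total = 0
    for path, goal in zip(paths, goals):
        t = len(path) - 1
        while t > 0 and path[t] == goal and path[t - 1] == goal:
            t -= 1  # strip free trailing waits at the goal
        total += t
    return total
```

For the two agents in the text's example (arriving at time steps 2 and 3), the function returns 2 + 3 = 5, even if the first agent's recorded path contains extra waits at its goal.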

As an admissible heuristic for the low-level search we used the SIC heuristic (see Section 2.2.3) in all our experiments.

4.3.2 Experimental Results

We implemented and experimented with A*, EPEA*, ICTS+pruning (denoted ICTS) and CBS. For ICTS, we used the all-triples pruning [Sharon et al., 2013a], which has been found to be very effective. All algorithms, excluding ICTS, are based on the SIC heuristic. ICTS uses more advanced pruning that could potentially be applied to CBS and A* as advanced heuristics in the future. Despite this, CBS without this advanced heuristic still outperforms ICTS in many scenarios.

8×8 4-Connected Grid

We begin with an 8×8 4-connected open grid where the number of agents ranges from 3 to 21. We set a time limit of 5 minutes. If an algorithm could not solve an instance within the time limit it was halted and failure was reported. Our aim here is to study the behavior of the different algorithms for a given number of agents. When the ID framework is applied to k agents (whose start and goal locations are randomized), the resulting effective number of agents, k′, is noisy and its variance is very large. Therefore, for this experiment we followed [Sharon et al., 2013a] and created problem instances where all agents are dependent, according to the ID framework.² In such cases k′ ≡ k and running the ID framework is superfluous.

Figure 4.6 shows the success rate, i.e., the percentage of instances that could be solved within 5 minutes by the different algorithms, as the number of agents increases. In these simple problems A* is clearly inferior, and CBS holds a slight advantage over the other approaches.

Table 4.1 presents the number of nodes generated and the run time averaged over 100 instances.

²Such experiments were called type 2 experiments in [Sharon et al., 2013a].



Figure 4.6: Success rate vs. number of agents 8×8 grid.

               #Generated nodes                      Run-Time (ms)
k'  Count   A*      EPEA*   CBS(hl) CBS(ll)    A*      EPEA*   ICTS    CBS     p-val
3   100     640     15      10      490        8       0       1       7       0.01
4   100     3,965   25      24      1,048      207     1       1       14      0.02
5   100     21,851  35      51      2,385      3,950   3       1       32      0.01
6   89      92,321  39      45      1,354      37,398  4       8       20      0.09
7   100     NA      88      117     3,994      NA      15      20      60      0.00
8   100     NA      293     266     8,644      NA      75      100     148     0.00
9   100     NA      1,053   1,362   45,585     NA      444     757     879     0.01
10  99      NA      2,372   3,225   111,571    NA      1,340   3,152   2,429   0.02
11  94      NA      7,923   8,789   321,704    NA      8,157   7,318   7,712   0.44
12  92      NA      13,178  12,980  451,770    NA      13,787  19,002  12,363  0.36
13  86      NA      14,989  15,803  552,939    NA      18,676  28,381  16,481  0.50
14  83      NA      13,872  21,068  736,278    NA      15,407  35,801  24,441  0.14
15  71      NA      22,967  24,871  826,725    NA      33,569  54,818  30,509  0.45
16  64      NA      26,805  24,602  822,771    NA      41,360  65,578  34,230  0.32
17  49      NA      25,615  17,775  562,575    NA      42,382  75,040  25,653  0.22

Table 4.1: Nodes generated and running time on 8×8 grid.

For the case of CBS both the high-level (hl) nodes and the low-level (ll) states are reported. “NA” denotes problems where A* obtained less than an 80% success rate (more than 20% failures) for a given number of agents. The count column states the number of instances solved by all algorithms within the time limit. Average results are presented only for those instances. Similar to [Sharon et al., 2013a], we do not report the number of nodes for the ICTS variants because this algorithm is not based solely on search.

The results of EPEA* and CBS are relatively similar. To investigate the statistical significance of the differences between them, we performed a paired t-test on the runtime results of these two algorithms. The resulting p-values are shown in the “p-val” column. As can be seen, for larger problem sizes the significance of the difference between the algorithms becomes smaller (a high p-value corresponds to smaller significance). This is because some problems are very difficult to solve, while other problems are easy and can be solved fast. This fact, together with the exponential nature of MAPF, results in high variance of the runtime results. Clearly, pure A* is the worst algorithm while EPEA* is the strongest A* variant. CBS is faster than ICTS for more than 11 agents. Note that although CBS generates more nodes than EPEA*, it is still faster in many cases (k′ > 14) due to the fact that the constant time per node of the low-level CBS (single-agent state, small open list) is much smaller than that of EPEA* (multiple agents, large open list). CBS was faster than EPEA* and ICTS by up to a factor of 2 and 3, respectively.

DAO Maps

Figure 4.7: The success rate of the different algorithms, all running on top of ID, for different DAO maps: den520d (top), ost003d (middle), brc202d (bottom).

We also experimented on 3 benchmark maps from the game Dragon Age: Origins [Sturtevant, 2012]. Here we aimed to show the overall performance of the evaluated algorithms. Unlike the results above, here we run ID on top of each of the evaluated algorithms. The different algorithms are given a problem instance with a given number of agents k, where the start and goal locations are uniformly randomized. ID breaks these problems into subproblems and executes the associated algorithm on each subproblem separately. We show results only for the three strongest algorithms: EPEA*, ICTS and CBS.

Figure 4.7 shows the success rates given the number of agents for the three maps. Here the results are mixed and there is no global winner. One can clearly see that ICTS is always better than EPEA*. The performance of CBS on these maps supports our theoretical claims that CBS is very effective when dealing with corridors and bottlenecks but rather inefficient in open spaces. For den520d (top) there are no bottlenecks but there are large open spaces; CBS was third. For ost003d (middle) there are few bottlenecks and small open spaces; CBS was intermediate in most cases. Finally, for brc202d (bottom) there are many narrow corridors and bottlenecks but very few open spaces, and thus CBS was best. Note that while both den520d and ost003d have open spaces, they differ in the number of bottlenecks.


Figure 4.8: DAO maps den520d (left), ost003d (middle), brc202d (right) and their conflicting locations.

Figure 4.8 illustrates the conflicts encountered during the solving process of CBS. A cell where a conflict occurred is colored red. The darkness of the red corresponds to the number of conflicts that occurred in the cell: darker red implies more conflicts. As can be seen, in open spaces (den520d, ost003d) the conflicts are spread out and cover a large portion of the map. By contrast, in brc202d the conflicts are concentrated in the corridors and bottlenecks. This illustrates how open spaces encourage more conflicts and are thus less suitable for CBS.

4.4 CBS Using Different Cost Functions

Let SoC denote the sum-of-costs function as defined in Section 2.1.4. Up until now we focused on the task of minimizing the SoC function. Nevertheless, CBS can be generalized to other cost functions as follows.

4.4.1 High level

Generalizing CBS to the makespan cost function requires a single change to the high level: computing the cost of CT nodes according to the makespan of their corresponding solution instead of SoC (Line 24 in Algorithm 4, Section 4.1.2). More generally, let Φ be a function assigning costs to MAPF solutions. To find a solution that minimizes Φ, we modify CBS to compute the cost of every CT node N as Φ(N.solution). We denote an execution of CBS with cost function Φ as CBSΦ.
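A sketch of this generalization (illustrative names; the simplified SoC here charges every step, ignoring the free-wait-at-goal refinement of Section 4.3.1): the high-level cost computation becomes a pluggable Φ.

```python
def soc(solution):
    """Sum-of-costs: total number of moves over all single-agent paths."""
    return sum(len(path) - 1 for path in solution)

def makespan(solution):
    """Makespan: the time step at which the last agent arrives."""
    return max(len(path) - 1 for path in solution)

def ct_node_cost(solution, phi=soc):
    """CBS_Phi: the only high-level change is which Phi scores N.solution."""
    return phi(solution)
```

Swapping `phi` changes which objective the best-first high-level search minimizes, with no other change to the CBS machinery.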

4.4.2 Low Level

Recall that each high-level node N keeps a multi-agent solution, N.solution. Each multi-agent solution is composed of k single-agent paths. The low-level solver, as defined above for our cost function (SoC), returns a single-agent path for a given agent. The single-agent path returned by the low level must be optimal in the sense that it keeps Φ(N.solution) to a minimum. It is possible, however, to improve the low level for a given cost function. During the low-level search we suggest breaking ties according to the minimal number of conflicts encountered with other agents. This can be done using a conflict avoidance table (CAT), described above.
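A conflict avoidance table can be sketched as a simple (vertex, time) counter (our illustration of the idea, not the thesis's implementation):

```python
from collections import defaultdict

def build_cat(other_paths):
    """Conflict avoidance table: counts how many other agents occupy each
    (vertex, time) pair in the current high-level node's solution."""
    cat = defaultdict(int)
    for path in other_paths:
        for t, v in enumerate(path):
            cat[(v, t)] += 1
    return cat

def conflicts_along(path, cat):
    """Number of conflicts a candidate path would incur with the CAT."""
    return sum(cat[(v, t)] for t, v in enumerate(path))

def pick_path(candidate_paths, cat):
    """Among equal-cost low-level paths, prefer the one with fewest conflicts."""
    return min(candidate_paths, key=lambda p: conflicts_along(p, cat))
```

Breaking ties this way does not change the cost of the returned path; it only steers the low level toward paths that are less likely to generate new high-level conflicts.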


4.4.3 Optimality

With minor modifications, the optimality proof for the sum-of-costs function (presented in Section 4.2.1) holds for CBSΦ for a very wide range of cost functions. Φ is called admissible if Φ(N) ≤ minCost(CV(N)) for every CT node N.

Theorem 5 If Φ is admissible then CBSΦ is guaranteed to return the optimal solution.

Proof: CBSΦ performs a best-first search according to Φ. By definition, no CT node in the subtree below N can contain a solution of cost smaller than minCost(CV(N)). BFS guided by an admissible evaluation function is guaranteed to return the optimal solution [Dechter and Pearl, 1985].

SoC, makespan, and the fuel cost function are all admissible, as adding more constraints in subsequent CT nodes cannot lead to a solution with a lower cost. Therefore, according to Theorem 5, CBSΦ would return optimal solutions for all these cost functions.

4.4.4 Completeness

The proof of completeness provided in Section 4.2.2 holds for any cost function Φ as long as there is a finite number of solutions per given cost according to Φ. This condition holds for any cost function that has no zero-cost actions. SoC and makespan are examples of such functions. The fuel cost function, on the other hand, does not have this property: for the fuel cost function there can be an infinite number of high-level nodes with a given cost, as wait actions do not cost anything. Since the fuel cost function does not satisfy the above condition, it calls for a slightly different approach. Yu and Rus [Yu and Rus, 2014] showed that for any solvable MAPF instance no more than O(|V|^3) single-agent steps are required for all agents to reach their goals. This fact can be utilized for our needs as follows. For each node N, if SoC(N) > |V|^4, the node can safely be ignored. A solution larger than |V|^4 (according to SoC) must contain a time step where all agents perform a wait action simultaneously; in this case, there is a cheaper solution where this time step is removed.
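Under this scheme the completeness fix for the fuel cost function reduces to a single pruning test at the high level (a sketch of the rule just stated; the function name is ours):

```python
def prune_for_fuel(node_soc, num_vertices):
    """Discard CT nodes whose SoC exceeds |V|^4: such a solution must contain
    a time step in which every agent waits simultaneously, so a strictly
    cheaper solution with that time step removed exists."""
    return node_soc > num_vertices ** 4
```

Applying this test before inserting a CT node into OPEN restores completeness, since only finitely many nodes survive the bound.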

4.5 Meta-Agent Conflict Based Search (MA-CBS)

We now turn to explain a generalized CBS-based framework called Meta-Agent Conflict Based Search (MA-CBS). First we provide motivation for MA-CBS by focusing on the behavior of the basic version of CBS that was described above.

4.5.1 Motivation for Meta-agent CBS

As explained previously, CBS is very efficient (compared to other approaches) for some MAPF problems and very inefficient for others. This general tendency of different MAPF algorithms to behave differently in different environments or topologies was discussed previously [Sharon et al., 2013a; Sharon et al., 2015]. Furthermore, a given domain might have different areas with different topologies. This calls for an algorithm that will dynamically change its strategy based on the exact task and on the area it currently searches. There is room for a significant amount of research in understanding the relation between map topologies and the performance of MAPF algorithms. MA-CBS is a first step towards dynamically adapting algorithms.

As was shown in the previous section, CBS behaves poorly when a set of agents is strongly coupled, i.e., when there is a high rate of internal conflicts between agents in the set. In such cases, basic CBS may have to process a significant number of conflicts in order to produce the optimal solution. MA-CBS remedies this behavior of CBS by automatically identifying sets of strongly coupled agents and merging them into a meta-agent. Then, the high-level CBS continues, but this meta-agent is treated, from the CBS perspective, as a single agent. Consequently, the low-level solver of MA-CBS must be a MAPF solver, e.g., A*+OD [Standley, 2010], EPEA* [Goldenberg et al., 2014], or M* [Wagner and Choset, 2011]. Thus, MA-CBS is in fact a framework that can be used on top of another MAPF solver. Next, we provide the technical details of MA-CBS.

4.5.2 Merging Agents Into a Meta-Agent

The main difference between basic CBS and MA-CBS is the new operation of merging agents into a meta-agent. A meta-agent consists of M agents; thus, a single agent is just a meta-agent of size 1. Returning to Algorithm 4 (Section 4.1.2), we introduce the merging action, which occurs just after a new conflict is found (Line 10). At this point MA-CBS has two options:

• Branch: In this option, we branch into two nodes based on the new conflict (Lines 19-26). This is the option that is always performed by basic CBS.

• Merge: MA-CBS has another option: to merge the two conflicting (meta-)agents into a single meta-agent (Lines 12-18).

The merging process is done as follows. Assume a CT node N with k agents. Suppose that a conflict was found between agents a1 and a2, and these two agents were chosen to be merged. We now have k − 1 agents, among them a new meta-agent of size 2, labeled a1,2. This meta-agent will never be split again in the subtree of the CT below N; it might, however, be merged with other (meta-)agents to form new meta-agents. Since nothing changed for the other agents that were not merged, we now only call the low-level search again for this new meta-agent (Line 14). The low-level search for a meta-agent of size M is in fact an optimal MAPF problem for M agents and should be solved with an optimal MAPF solver. Note that the f-cost of this CT node may increase due to this merge action, as the optimal path for a meta-agent may be larger than the sum of optimal paths of each of these agents separately. Thus, the f-value of this node is recalculated and stored (Line 15). The node is then added again into OPEN (Line 17).

MA-CBS has two important components (in addition to the CBS components): a merging policy to decide which option to choose, branch or merge (Line 11), and a constraint-merging mechanism to define the constraints imposed on the new meta-agent (Line 13).


This constraint-merging mechanism must be designed such that MA-CBS still returns an optimal solution. Next, we discuss how to implement these two components.

4.5.3 Merge Policy

We implemented the following merging policy. Two agents ai, aj are merged into a meta-agent ai,j if the number of conflicts between ai and aj recorded during the search exceeds a parameter B. We call B the conflict bound parameter and use the notation MA-CBS(B) to denote MA-CBS with a bound of B. Note that basic CBS is in fact MA-CBS(∞); that is, we never choose to merge and always branch according to a conflict.

To implement this conflict bound oriented merging policy, a conflict matrix CM is maintained. CM[i, j] accumulates the number of conflicts between agents ai and aj seen thus far by MA-CBS. CM[i, j] is incremented by 1 whenever a new conflict between ai and aj is found (Algorithm 4, Section 4.1.2, Line 10). Now, if CM[i, j] > B, the shouldMerge() function (Line 11) returns true and ai and aj are merged into ai,j. If a conflict occurs between two meta-agents, a1 and a2, because of two simple agents, at ∈ a1 and ak ∈ a2, CM[t, k] is incremented by 1 and the shouldMerge() function will return true if Σ CM[x, y] > B, summed over all x ∈ a1, y ∈ a2. This policy is simple and effective. However, other merging policies are possible and could potentially obtain a significant speedup.

To illustrate MA-CBS, consider again the example shown in Figure 2.1 (Section 2.1.7). Assume that we are using MA-CBS(0). In this case, at the root of the CT, once the conflict (a1, a2, D, 2) is found, shouldMerge() returns true and agents a1 and a2 are merged into a new meta-agent a1,2.

Next, the low-level solver is invoked to solve the newly created meta-agent and a (conflict-free) optimal path for the two agents is found. If A* is used, a 2-agent A* will be executed for this. The high-level node is now re-inserted into OPEN and its f-value is updated from 8 to 9. Since it is the only node in OPEN, it will be expanded next. On the second expansion the search halts, as no conflicts exist - there is only one meta-agent which, by definition, contains no conflicts. Thus, the solution from the root node is returned. By contrast, for MA-CBS(B) with B > 0, the root node will be split according to the conflict as described above in Section 4.1.2.

4.5.4 Merging Constraints

Denote a meta-agent by x. We use the following definitions:

• A meta constraint for a meta-agent x is a tuple (x, x̂, v, t) where a subset of agents x̂ ⊆ x is prohibited from occupying vertex v at time step t.

• Similarly, a meta conflict is a tuple (x, y, v, t) where an individual agent x′ ∈ x and an individual agent y′ ∈ y both occupy vertex v at time point t.

Consider the set of constraints associated with (meta-)agents ai and aj before the merge. They were generated due to conflicts between agents. These conflicts (and therefore the resulting constraints) can be divided into three groups.

1. internal: conflicts between ai and aj .

2. external(i): conflicts between ai and any other agent ak (where k ≠ j).

3. external(j): conflicts between aj and any other agent ak (where k ≠ i).

Since ai and aj are now going to be merged, internal conflicts should not be considered, as ai and aj will be solved in a coupled manner by the low level. Thus, we only consider external constraints from this point on.

Merging External Constraints

Assume that agents ai and aj are to be merged into a new meta-agent ai,j and that ai has the external constraint (ai, v, t). This constraint means that further up in the CT, ai had a conflict with some other agent ar at location v at time t, and therefore ai is not allowed to be located at location v at time t.

The new meta-agent must include all external constraints. Assume an external constraint (ai, v, t). After merging ai and aj, this constraint should apply only to the original agent, ai, and not to the entire meta-agent, i.e., {ai} ∪ {aj}. Therefore, the merged constraint is of the form (ai,j, {ai}, v, t). This is done in Line 13 of Algorithm 4 (Section 4.1.2).

When merging ai and aj, one would be tempted to introduce a meta constraint (ai,j, {ai} ∪ {aj}, v, t) where both agents ai and aj are prohibited from location v at time t. However, this might break the optimality of the algorithm, because of the following scenario. Assume a 3-agent problem where in the optimal solution agent a3 must go through vertex v at time step t. Calling MA-CBS to solve this problem creates the root CT node. Assume that in the path chosen for each agent in the root CT node, both agents a1 and a2 are assigned vertex v at time step t. Next, MA-CBS branches according to the conflict (a1, a2, v, t). Two new CT nodes are generated, N1 and N2. In N1 (N2) agent a1 (a2) is constrained from taking v at time t. Next, agent a1 (a2) is merged with agent a3 at node N1 (N2). If we allow the constraints imposed on agents a1 and a2 to apply to agent a3, we will block the optimal solution in both N1 and N2, which are the only two nodes in OPEN.

By contrast, assume a conflict (x, y, v, t) is detected between x and y after the meta-agent x was created. The exact identity of the conflicting agents x′ ∈ x and y′ ∈ y is irrelevant. The constraint on both meta-agents should include the entire meta-agent, i.e., (x, x, v, t), or similarly for y (Line 20 of Algorithm 4, Section 4.1.2). Doing so preserves completeness, as all possible solutions exist where either all agents in x or all agents in y are constrained from being at v at time t.
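The two rules can be sketched as follows (our representation, not the thesis's code: a meta constraint is a pair of a prohibited agent subset and a (v, t) pair, meaning every agent in the subset must avoid v at time t):

```python
def merge_meta_agents(meta_i, meta_j, external_constraints):
    """Merge two (meta-)agents. External constraints carry over unchanged:
    a constraint that named only a_i keeps binding only a_i, never the
    whole merged meta-agent (otherwise optimality may be lost)."""
    merged = frozenset(meta_i) | frozenset(meta_j)
    return merged, set(external_constraints)

def meta_conflict_constraint(meta, v, t):
    """A conflict detected after the merge constrains the entire meta-agent;
    the identity of the internal agent involved is irrelevant."""
    return (frozenset(meta), v, t)

def violates(agent, v, t, constraints):
    """True if placing `agent` at vertex v at time t breaks any constraint."""
    return any(agent in subset and v == cv and t == ct
               for subset, cv, ct in constraints)
```

The asymmetry is deliberate: pre-merge (external) constraints stay narrow, while post-merge constraints cover the whole meta-agent, matching the optimality argument above.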

4.5.5 The Low-Level Solver

The low level finds a path for a given agent. In the case of a meta-agent, the low level needs to solve an instance of MAPF induced by the internal agents that make up the meta-agent. Any MAPF solver that possesses the following three attributes may be used at the low level:


1. Completeness - the solver must return a solution if one exists; otherwise it must return false.

2. Constraint handling - the solver must never return a solution that violates a constraint.

3. Optimality - the solver must return the optimal solution.
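
Attribute 2 can be stated as an executable check (a hedged sketch with our own representation: each path is a list of vertices indexed by time, and agents are assumed to wait at their goal once their path ends):

```python
def violates(paths, constraints):
    """Return True if the solution `paths` breaks any constraint, i.e.,
    places a constrained agent at vertex v at time t.

    paths:       agent -> list of vertices, one per time step
    constraints: iterable of (agent, v, t) tuples
    """
    for (agent, v, t) in constraints:
        path = paths[agent]
        # after its path ends, an agent is assumed to wait at its goal
        loc = path[t] if t < len(path) else path[-1]
        if loc == v:
            return True
    return False
```

A conforming low-level solver must only ever return solutions for which `violates(...)` is `False`.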

Many known MAPF algorithms are suitable for the low level of MA-CBS, e.g., A*+OD [Standley, 2010], EPEA* [Goldenberg et al., 2014] and M* [Wagner and Choset, 2011]. ICTS [Sharon et al., 2013a] and CBS in their basic form are not suitable for the low level as they cannot detect an unsolvable problem instance. In Section 4.2.2 we presented a way to deal with this issue by applying an algorithm that detects unsolvable MAPF instances [Yu and Rus, 2014]. This method, however, is not directly applicable to the constrained MAPF instance passed to the low-level solver. Interestingly, MA-CBS with a merge bound smaller than infinity can be configured to serve as a low-level solver, resulting in a recursive structure of MA-CBS. To avoid cases where an MA-CBS solver calls another MA-CBS solver ad infinitum, using MA-CBS as a low-level solver requires increasing the merge threshold between successive recursive calls. The manner in which the threshold is increased is a deep question that requires extensive research.

4.5.6 Completeness and Optimality of MA-CBS

The proof of completeness provided in Section 4.2.2 also holds for MA-CBS as is. In order to prove the optimality of MA-CBS we will use the supporting claims and lemmas defined and proven in Section 4.2.1. All lemmas from Section 4.2.1 hold for the MA-CBS case. The proofs are not affected by the option to merge agents. The only exception is Lemma 3, which says: "For each valid solution p, there exists at least one node N such that N permits p." The proof was by induction on a branching action. However, for MA-CBS, expanding a high-level node can result in merging agents rather than branching. We thus complete the proof by handling the merge-action case. In this case, node N is expanded, a merging action is performed and node N is re-inserted into OPEN. Any valid solution in VS(N) must remain in the new VS(N) after the expansion process since no new constraints were added.

4.5.7 MA-CBS as a Continuum

As explained above, the extreme case of MA-CBS(∞) is equivalent to basic CBS. We now note that the other extreme case, MA-CBS(0), is equivalent to the basic independence detection mechanism introduced by Standley [Standley, 2010] (see Section 2.2.3). In MA-CBS(0) we merge agents as soon as a conflict occurs. In the root node MA-CBS(0) solves each agent separately. Then, MA-CBS(0) expands the CT root node. When validating this node, a conflict between the solutions of the single agents is found (if one exists). The conflicting agents are merged since B = 0. The combined group is solved using the low-level MAPF solver. Next, the root is re-inserted into OPEN and validated again. Since B = 0 the branching option is never chosen and conflicts are always solved by merging the conflicting (meta-)agents. Thus, this variant has only one CT node, which is repeatedly re-inserted into OPEN, merging conflicting agents until no conflicts occur. This is identical to the behavior of ID.
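
The MA-CBS(0) ≡ ID behavior described above can be sketched as a plain merge loop (hypothetical helper names; `solve` stands for any coupled optimal MAPF solver and `find_conflict` for the validation step):

```python
def macbs_zero(agents, solve, find_conflict):
    """MA-CBS(0), i.e., Independence Detection: plan each (meta-)agent
    separately, then repeatedly merge and re-solve conflicting groups
    until the joint solution is conflict-free. There is a single CT node
    and branching never occurs."""
    groups = [frozenset([a]) for a in agents]
    paths = {g: solve(g) for g in groups}       # solve each agent alone
    while True:
        conflict = find_conflict(paths)         # (group1, group2) or None
        if conflict is None:
            return paths                        # conflict-free joint solution
        g1, g2 = conflict
        merged = g1 | g2                        # merge the conflicting groups
        del paths[g1], paths[g2]
        paths[merged] = solve(merged)           # re-solve the combined group
```

The stubs passed for `solve` and `find_conflict` determine the actual planner; only the merge loop itself is fixed.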

The enhanced ID version explained in Section 2.2.3 tries to resolve conflicts by re-planning the path of one of the conflicting agents. This re-planning is reminiscent of MA-CBS(1): once a conflict is found, we try to bypass it by branching. If more conflicts are found, the agents are merged.

Thus, MA-CBS(B) is a continuum that has these two previous algorithms as its extreme cases. However, MA-CBS(B) with varying values of B can be significantly better than ID when it solves agents that are only loosely coupled, by adding constraints to these agents separately. For example, in the case of a bottleneck (such as Figure 2.1, Section 2.1.7) where the individual solutions of the agents conflict, ID (≡ MA-CBS(0)) will merge these agents into a single group and solve it in a coupled manner. By contrast, MA-CBS(B) (with B > 0) can avoid this bottleneck by adding a single constraint to one of the agents. Therefore, using MA-CBS(B) and choosing a suitable value of B ≥ 0 adds much more flexibility and may significantly outperform ID. This is clearly seen in the experimental results described next.

4.6 MA-CBS Experimental Results

In this section we study the behavior of MA-CBS empirically on three standard benchmark maps from the game Dragon Age: Origins [Sturtevant, 2012]. Recall that each of the three maps, shown again in Figure 4.9, represents a different topology. Map den520d (top) has many large open spaces and no bottlenecks, map ost003d (middle) has a few open spaces and a few bottlenecks, and map brc202d (bottom) has almost no open spaces and many bottlenecks. Our main objective was to study the effect of the conflict bound parameter B on the performance of MA-CBS. We ran MA-CBS(B) in our experiments with B = 0, 1, 5, 10, 100, 500 and ∞. Since MA-CBS is a framework that can run on top of any A*-based solver, we experimented with two such solvers: A* and Enhanced Partial Expansion A* (EPEA*) [Goldenberg et al., 2014]. Both solvers used the SIC heuristic (defined above). A* was chosen as a baseline, while EPEA* was chosen since it is currently the state-of-the-art A*-based MAPF solver.

For each of the maps we varied the number of agents k. We ran our algorithms on 100 random instances for each value of k. If an algorithm did not solve a given problem instance within five minutes it was halted. The numbers reported are an average over all instances solved by all the algorithms that managed to solve more than 70% of the instances. If an algorithm solved less than 70%, we report its average results as a lower bound (including instances where the solver was halted due to the time limit). These cases are denoted by ">" in the tables below.
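
This reporting rule can be sketched as follows (our own helper names; halted runs are charged the full five-minute limit, which makes the reported averages of low-success algorithms lower bounds):

```python
TIMEOUT_MS = 5 * 60 * 1000  # five-minute limit per instance

def report(runtimes_by_alg, num_instances):
    """runtimes_by_alg: alg -> {instance_id: runtime_ms} for solved instances.
    Algorithms solving >70% of instances are averaged over the instances
    they all solved; the rest get a '>' lower bound that counts timeouts."""
    qualified = {a for a, r in runtimes_by_alg.items()
                 if len(r) / num_instances > 0.7}
    common = (set.intersection(*(set(runtimes_by_alg[a]) for a in qualified))
              if qualified else set())
    out = {}
    for alg, runs in runtimes_by_alg.items():
        if alg in qualified:
            out[alg] = sum(runs[i] for i in common) / len(common)
        else:
            # unsolved instances count at the timeout -> a lower bound
            total = sum(runs.values()) + (num_instances - len(runs)) * TIMEOUT_MS
            out[alg] = f'>{total // num_instances}'
    return out
```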

Tables 4.3 and 4.2 show runtime in ms for the experiments described above. The k column denotes the number of agents in the experiment. MA-CBS(x) is denoted by B(x). For a given number of agents, the result of the best-performing algorithm is given in bold.


den520d
  k     B(0)     B(1)     B(5)    B(10)   B(100)   B(500)     B(∞)
  5      899      190      180      181      180      180      256
 10    1,633    1,782      470      467      469      469      632
 15    1,621    2,241    1,708    1,702    1,713    1,738    1,807
 20    3,393    3,725    1,527    1,515    1,553    1,555    1,867
 25    7,675    8,327    1,701    1,620    1,731    2,071    3,264
 30   12,574   13,308    3,955    3,773    5,276   16,191  >38,707
 35   15,736   12,655    4,974    4,993    7,199   18,998  >50,050
 40   14,635   15,452    4,860    4,971    7,686   20,860  >50,891

ost003d
  k     B(0)     B(1)     B(5)    B(10)   B(100)   B(500)     B(∞)
  5      187      231      168      168      169      169      222
 10    1,718    1,983      764      753      757      757      935
 15    4,888    4,593    1,597    1,592    1,568    1,570    1,909
 20   10,463   13,426    3,701    3,654    3,623    3,598    4,119
 25  >60,140  >58,902  >28,881   15,109   18,159   35,536  >73,860
 30  >84,473  >80,248  >30,781   25,860   27,525   46,328  >92,209
 35  >90,703  >81,633  >39,660   21,466   28,241   47,544  >95,262

brc202d
  k     B(0)     B(1)     B(5)    B(10)   B(100)   B(500)     B(∞)
  5    1,834    2,351    1,286    1,276    1,268    1,267    1,664
 10    6,034    8,059    4,580    4,530    4,498    4,508    5,495
 15   12,354   15,389    6,903    6,871    6,820    6,793    8,685
 20  >70,003  >73,511   35,095   21,729   19,846   31,229  >43,625

Table 4.2: Run-Time (ms) on DAO problems. EPEA* as low-level solver.

den520d
  k      B(0)      B(1)      B(5)    B(10)   B(100)   B(500)     B(∞)
  5       223       273       218      220      219      222      219
 10     1,099     1,458       553      552      549      552      546
 15     1,182     1,620     1,838    1,810    1,829    1,703    1,672
 20     4,792     4,375     1,996    2,011    2,020    1,857    1,708
 25     7,633    14,749     2,193    2,255    2,320    2,888    3,046
 30   >62,717   >60,214     8,082    8,055    8,107    8,013    7,745
 35   >65,947   >51,815    13,670   13,587   15,981   28,274  >45,954
 40   >81,487   >82,860    18,473   18,399   20,391   31,189  >45,857

ost003d
  k      B(0)      B(1)      B(5)    B(10)   B(100)   B(500)     B(∞)
  5       470       631       220      218      220      219      222
 10     8,192    16,270     1,006      995      981      977      935
 15     8,971    15,679     1,640    1,619    1,624    1,551    1,458
 20    29,507    47,204     3,293    3,234    3,208    3,074    3,000
 25  >122,166  >125,417   >73,014  >53,481   28,443   38,422  >59,923
 30  >162,290  >170,094   >63,963  >51,167   29,912   43,405  >69,681

brc202d
  k      B(0)      B(1)      B(5)    B(10)   B(100)   B(500)     B(∞)
  5     7,382    12,200     1,682    1,665    1,640    1,657    1,664
 10    22,554    39,346     5,372    5,312    5,263    5,226    5,318
 15    47,822    84,460     8,851    8,746    8,736    8,701    8,681
 20  >116,675  >159,039   >51,592   24,011   24,817   31,069  >34,726
 25  >197,268  >223,838  >146,301  >85,891   63,162   63,835  >66,178

Table 4.3: Run-Time (ms) on DAO problems. A* as Low-level Solver.


Table 4.3 shows results for the case where A* was used for the low-level search while Table 4.2 reports results when EPEA* was used. Each frame in the tables presents a different map.

The results clearly show that as the problems become harder (longer time to solve), MA-CBS with non-extreme values, i.e., with B ≠ 0 and B ≠ ∞, is able to solve most instances faster than MA-CBS(0) (ID) and MA-CBS(∞) (basic CBS). The new variants achieved up to an order-of-magnitude speedup over MA-CBS(∞) (e.g., in den520d for 35 and 40 agents with EPEA* as the low-level solver) and up to a factor of 4 over MA-CBS(0) (e.g., in ost003d with 25 agents).

Next, consider the effect of increasing the number of agents k for the den520d map where EPEA* was used (Table 4.2, first frame). Instances with few agents (k < 25) were solved faster using MA-CBS with large B values. As the problems become denser (k > 30), MA-CBS with smaller B values is faster. In addition, the relative performance of basic CBS (≡ MA-CBS(∞)) and MA-CBS(500) with respect to the best variant degrades. This is explained as follows. In dense problem instances, where there are many agents relative to the map size, many conflicts occur. Recall that basic CBS is exponential in the number of conflicts encountered. Thus, increasing the number of agents degrades the relative performance of MA-CBS with large B values (which behaves closer to basic CBS) compared to variants with small B values. In separate experiments (not reported here) in the extreme scenario, where k = |V| − 1, we observed that MA-CBS(0) performs best.

Now, consider the results where A* was used as the low-level solver (Table 4.3). Here we see the same general trend as observed in the results for EPEA*. However, the best-performing value of B was larger than that of MA-CBS with EPEA* (Table 4.2). For example, in the den520d map with 30 agents, MA-CBS(5) with A* as the low-level solver did not obtain a significant speedup over CBS, whereas with EPEA* as the low-level solver MA-CBS(5) obtained an order-of-magnitude speedup over CBS. The same tendency can also be observed in the other maps. The reason is that for a relatively weak MAPF solver, such as A*, solving a large group of agents is very inefficient. Thus, we would like to avoid merging agents and run in a more decoupled manner; for these cases a higher B is preferred. On the other hand, with a faster MAPF solver, such as EPEA*, a lower value of B performs better. In MA-CBS(∞) the low level is never invoked for meta-agents (only for single agents). Consequently, running A* or EPEA* at the low level makes no difference; the difference in runtime is accounted for by the different sets of problems that entered the averages. With B(5), EPEA* could solve many more problems than A* (the harder ones). Since these harder problems are included in the averages of Table 4.2, its runtime results are higher than those of Table 4.3.

Figure 4.9 shows the success rate, i.e., the number of instances solved before the timeout, for MA-CBS with B = 0, 1, 10, 100 and ∞. The low-level solver was set to EPEA*, hence B = 0 is denoted by EPEA*. Additionally, for comparison we also report the success rate of the best ICTS variant [Sharon et al., 2013a]. Note that the legends are ordered according to the performance in the given map.

Figure 4.9: Success rate of MA-CBS on top of ID with EPEA* as the low-level solver.

As can be seen, in all the experiments MA-CBS with intermediate values, 0 < B < ∞, is able to solve more instances than both extreme cases, i.e., EPEA* (≡ MA-CBS(0)) and basic CBS (≡ MA-CBS(∞)). Additionally, MA-CBS with intermediate values also outperforms the ICTS solver. Consider the performance of MA-CBS variants with B < ∞ in comparison with basic CBS (B = ∞). Basic CBS performs very poorly for den520d (top), somewhat poorly for ost003d (middle), but rather well for brc202d (bottom). This is because in maps with no bottlenecks and large open spaces, such as den520d, CBS is inefficient, since many conflicts occur in many different locations. This phenomenon is explained in the pathological example of CBS given in Figure 4.4 (Section 4.2.3). Thus, in den520d the benefit of merging agents is high, as we avoid many conflicts. By contrast, for maps without large open spaces and with many bottlenecks, such as brc202d, CBS encounters few conflicts, and thus merging agents results in only a small reduction in conflicts. Indeed, as the results show, for brc202d basic CBS (MA-CBS(∞)) achieves almost the same performance as lower values of B.

In problems with a higher conflict rate it is, in general, more helpful to merge agents, and hence lower values of B perform better. MA-CBS(10) obtained the highest success rates, for example, in den520d (top). By contrast, MA-CBS(100) obtained the highest success rates in ost003d and brc202d.

4.6.1 Conclusions from Experiments

The experiments clearly show that there is no universal winner. The performance of each of the known algorithms depends greatly on problem features such as density, topology of the map, the initial heuristic error and the number of conflicts encountered during the CBS solving process. It is not yet fully understood how these different features relate to the performance of each algorithm, a point we intend to research in the future. At the same time, we are trying to come up with new features to assess the performance of each algorithm prior to search. Nevertheless, we present the following general trends that we observed:

• MA-CBS with intermediate B values (0 < B < ∞) outperforms the previous algorithms A*, EPEA* and CBS. It also outperforms ICTS in most cases.

• Density. In dense maps with many agents, low values of B are more efficient.

• Topology. In maps with large open spaces and few bottlenecks, low values of B are more efficient.

• Low-level solver. If a weak MAPF solver (e.g., plain A*) is used for the low-level search, high values of B are preferred.

4.7 Conclusions

In this chapter the CBS algorithm was introduced. CBS is a novel optimal MAPF solver. CBS is unique in that all low-level searches are performed as single-agent searches, yet it produces optimal solutions. The performance of CBS depends on the structure of the problem. We have demonstrated cases with bottlenecks (Figure 2.1, Section 2.1.7) where CBS performs well, and open spaces (Figure 4.4, Section 4.2.3) where CBS performs poorly. We analyzed and explained these cases and how they affect CBS's performance.

We then presented the MA-CBS framework, a generalization of the CBS algorithm. MA-CBS can be used on top of any MAPF solver, which then serves as its low-level solver. Furthermore, MA-CBS can be viewed as a generalization of the Independence Detection (ID) framework introduced by Standley [Standley, 2010].

MA-CBS serves as a bridge between CBS and other optimal MAPF solvers, such as A*, A*+OD [Standley, 2010] and EPEA* [Goldenberg et al., 2014]. It starts as a regular CBS solver, where the low-level search is performed for a single agent at a time. If MA-CBS identifies that a pair of agents conflicts often, it groups them together. The low-level solver treats this group as one composite agent and finds solutions for that group using the given MAPF solver (e.g., A*). As a result, MA-CBS is flexible and can enjoy the complementary benefits of both CBS and traditional optimal solvers by choosing when to group agents together. As a simple yet effective mechanism for deciding when to group agents, we introduced the conflict bound parameter B. The B parameter corresponds to the tendency of MA-CBS to create large groups of agents and solve them as one unit. When B = 0 MA-CBS converges to ID and when B = ∞ MA-CBS is equivalent to CBS. Setting 0 < B < ∞ gives MA-CBS flexibility, so that in cases where only few conflicts occur MA-CBS can act like CBS, while if conflicts are common MA-CBS can converge to a single meta-agent problem that includes all or most of the conflicting agents. Experimental results on testbed map problems support our theoretical claims. The domains presented have different rates of open spaces and bottlenecks. MA-CBS with a high B value (100, 500) outperforms other algorithms in cases where corridors and bottlenecks are more dominant. In addition, experimental results showed that MA-CBS with non-extreme values of B (i.e., neither B = 0 nor B = ∞) outperforms both CBS and other state-of-the-art MAPF algorithms. The results lead to the conclusion that in the general case it is most beneficial to group agents to a certain extent. This results in a faster solving process compared to never grouping agents, as in CBS, or to grouping all agents, as in all previous optimal solvers.

There are many open challenges for the CBS and MA-CBS algorithms:

1. Currently no heuristic guides the search in the high-level constraint tree. Coming up with an admissible heuristic for the high level could potentially result in a significant speedup.

2. Further work could be done to understand the effect of the B parameter on MA-CBS, which might give insight into how B could be varied dynamically.

3. Using a single B parameter for merging agents is relatively simple; it is an open question whether more sophisticated merging policies could significantly improve performance. For instance, merging might be based on areas of the map instead of on individual agents.

4. Following Ferner et al. [Ferner et al., 2013b], it would be valuable to experiment with and compare different low-level solvers, including ICTS [Sharon et al., 2013a], ODrM* [Ferner et al., 2013b], Boolean Satisfiability (SAT) [Surynek, 2012], Integer Linear Programming (ILP) [Yu and LaValle, 2013a] and Answer Set Programming (ASP) [Erdem et al., 2013].


Chapter 5

Exponential Deepening A*

In the Real-Time Agent-Centered Search (RTACS) problem, an agent has to arrive at a goal location while acting and reasoning in the physical world. Time and memory constraints or a limited sensing radius prevent the agent from planning a full start-to-goal path before acting. Traditionally, RTACS problems are solved by propagating and updating heuristic values of states visited by the agent. In existing RTACS algorithms the agent is sometimes forced to revisit each state many times, causing the entire procedure of arriving at the goal to be quadratic in the size of the state space. In this chapter, we study the Iterative Deepening (ID) approach for solving RTACS and introduce Exponential Deepening A* (EDA*), an RTACS algorithm where the threshold between successive depth-first calls is increased exponentially. EDA* is proven to hold a worst-case bound that is linear in the state space. Experimental results supporting this bound are presented and demonstrate up to a 4× reduction over existing RTACS solvers with regard to distance traveled, states expanded and CPU runtime.
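
The heart of this idea is the threshold schedule between bounded depth-first iterations. A minimal sketch (doubling is used here for illustration; the exact growth factor is an assumption of this example):

```python
def exponential_thresholds(h_start, goal_cost):
    """Exponential deepening: each bounded-DFS iteration doubles the previous
    depth threshold, so only O(log(goal_cost)) iterations are needed and the
    total work across iterations is dominated by the last (largest) one,
    unlike uniform deepening, which re-runs near-identical searches."""
    t, bounds = max(1, h_start), []
    while True:
        bounds.append(t)
        if t >= goal_cost:  # the goal is reachable within the current bound
            return bounds
        t *= 2
```

For example, a goal at cost 10 starting from an initial bound of 1 yields the bound sequence 1, 2, 4, 8, 16.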

We introduce two versions of EDA*. The first only senses and manipulates the immediate neighbors of the agent's location. For settings where the agent is allowed to perform significant computation and sensing prior to acting, we introduce Local Search Space EDA* (LSS-EDA*), a variant of EDA* that senses and manipulates all states within a given local search space before choosing a move. This results in plans with shorter distances at the expense of extra computational effort. Experimental results show that LSS-EDA* outperforms existing LSS-based RTACS solvers.

This chapter is organized as follows. In Section 5.1 we define RTACS and provide relevant background. Section 5.2 presents a short survey of previous work on RTACS. In Section 5.3 we present and analyze the Iterative-Deepening approach for solving RTACS. Section 5.4 introduces EDA* and provides a theoretical analysis. Section 5.5 describes the BDFS procedure for RTACS algorithms. In Section 5.6 we present experimental results that support our theoretical analysis and demonstrate the strengths of EDA* compared to previous RTACS solvers. Section 5.7 presents LSS-EDA* along with experimental results that demonstrate its superiority. Finally, Section 5.8 summarizes this chapter and provides conclusions and future research directions.

This chapter is almost an exact copy of a journal paper under submission to Artificial Intelligence. A version of this research was presented in [Sharon et al., 2014].


5.1 RTACS: Definitions and Background

In this section we define the settings of RTACS and provide background and related work.

5.1.1 RTACS Settings

Given an undirected graph, a start vertex and a goal vertex, we differentiate between two types of problems and related algorithms: classical search problems and real-time agent-centered search problems (RTACS). In a classical search problem the task is to search for and return a full start-to-goal path (in a single planning phase). Later on, this path may be followed by a real-world agent (acting phase), but this is beyond the scope of the classical search problem, which focuses on finding a path. Traditional search algorithms in their common use (e.g., A*, IDA*) belong to this class.

In contrast, in RTACS, a moving agent is located at start and its task is to physically arrive at the goal. RTACS algorithms perform cycles that include a planning phase, where search (and possibly sensing of the environment) occurs, and an acting phase, where the agent physically moves. Several plan-act cycles are performed due to the following restrictive assumptions:

• Assumption 1: (real time) As a real-time problem, the agent can only perform a constant-bounded number of computations before it must act by following an edge from its current state. Then, a new plan-act cycle begins from its new position.

• Assumption 2: (memory) The internal memory of the agent is limited, but agents are allowed to write a small (constant) amount of information into each location (e.g., g- and h-values). In this way RTACS solvers are an example of 'ant' algorithms, with limited computation and memory [Shiloni et al., 2009].

• Assumption 3: (sensing radius) As an agent-centered problem, the agent is constrained to only manipulate (i.e., read and write information in) states which are in close proximity to it; these are usually assumed to be contiguous around the agent. Unless stated otherwise, and as a default, we assume that the sensing radius is 1 and an agent can only manipulate its direct neighbors. However, a larger sensing radius may be assumed. In such cases, and given sufficient time and memory, an algorithm might allow a Local Search Space (LSS). Relevant algorithms may manipulate states within the LSS.

A basic building block for RTACS solvers is propagation (learning) of h-values and/or g-values between neighboring states. Since the agent is assumed to have no or limited memory, any needed data regarding a state must be stored in the actual state (according to Assumption 2 above). The propagation procedure (known as the Bellman equation [Bellman, 1957]) is defined as x(v) = min_u (x(u) + cost(v, u)), where x(v) is the g- or h-value of vertex v, u is a vertex within the sensing radius of vertex v, and cost(v, u) is the cost along the path from v to u.
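
The propagation procedure translates directly into code (a minimal sketch; `values` maps each vertex to its current g- or h-estimate and `cost` is the edge-cost function, both our own representations):

```python
def propagate(values, cost, v, neighbors):
    """Bellman update for one vertex: x(v) = min over sensed neighbors u
    of x(u) + cost(v, u)."""
    return min(values[u] + cost(v, u) for u in neighbors)
```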


[Figure: states A(3), S(0), B(0) and G(0), connected by edges with costs 3, 1 and 1.]

Figure 5.1: Example graph.

5.1.2 Classification of States

Due to the special settings of RTACS defined above, agents may arrive at a state, leave it, but later arrive at the same state again. For example, consider the graph depicted in Figure 5.1. The agent's initial location is S and its task is to arrive at G (for now, ignore the numbers in parentheses). As its first action the agent must either move left (to state A) or right (to state B). If the agent moves to state B it will be obligated to return to the start state at least once before arriving at the goal. Returning to a previously visited state is called a revisit of this state. A RTACS agent has two types of state visits:

• First visit - the current state was never visited previously by the agent. We denote the number of first visits by F.

• Revisit - the current state was visited previously by the agent. We denote the number of revisits by R.

5.1.3 Evaluation Metrics

There are many possible metrics to evaluate the performance of RTACS algorithms, and previous work differs in this aspect. We therefore first define these metrics. Common metrics for evaluating RTACS algorithms are:

• (1) Travel distance: the total distance (sum of edge costs) that the agent traversed during all acting phases. This is relevant when the time of physical movement of the agent (between states) dominates the computational CPU time.

• (2) Computational time: the sum of CPU runtime over all planning phases; relevant when the CPU runtime dominates the time of a physical movement. CPU time can be measured precisely or approximated by counting the number of states expanded in the planning phase.¹

• (3) First Visits Ratio: In this thesis we also introduce a new metric, the First Visits Ratio (FVR). FVR is the ratio between the number of first visits and the total number of visits, and is defined as FVR = F / (F + R). FVR represents the balance between exploring new areas of the state space (F) and revisiting previously seen states (R) that are more promising. Keeping a balanced (constant) FVR is desired in RTACS algorithms. We further elaborate on this matter in Section 5.6.

¹ Koenig [Koenig, 2004] discussed these and other metric variants. Recently, Burns et al. [Burns et al., 2013; Ruml and Do, 2007] suggested combining the two metrics into one utility function that takes both into account. Working with this metric is left for future work.
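
FVR can be computed directly from the sequence of states the agent physically visits during a trial (a minimal sketch with our own representation of the visit log):

```python
def first_visits_ratio(visit_sequence):
    """FVR = F / (F + R): the fraction of physical visits that reach a
    state for the first time during the trial."""
    seen, first_visits = set(), 0
    for s in visit_sequence:
        if s not in seen:       # first visit (F)
            seen.add(s)
            first_visits += 1
        # otherwise: a revisit (R)
    return first_visits / len(visit_sequence)
```

For the example of Figure 5.1, a trial that visits S, B, S, A has F = 3 and R = 1, giving FVR = 0.75.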

5.1.4 Optimality

In RTACS, the agent physically moves in the state space searching for the goal state. Therefore, unless the shortest path is known to the agent in advance, the path it follows is often not optimal. Moreover, for most RTACS algorithms, even when the agent reaches the goal state, the optimal path remains unknown. Hence, we distinguish between two types of optimality:

• Optimal-0: Returning the optimal solution. Defined only for classical search algorithms. If the search algorithm is guaranteed to return the shortest path (after a single planning phase) we say that it is Optimal-0. For example, A*, IDA* and DFBnB are all Optimal-0. However, Optimal-0 is not defined for RTACS algorithms because they do not return paths; their task is to arrive at the goal.

• Optimal-1: Converging to the optimal solution. Defined for RTACS. In some cases it is assumed that the agent must repeatedly solve the same problem many times (each of these is called a trial) while sharing knowledge across all trials. That is, we assume that values that are stored inside the states (e.g., h-values) remain across trials. In these cases, the heuristics can be updated by successive trials until they converge to the true value along the optimal path. Once this process converges the agent will be able to follow the optimal path in all future trials. RTACS algorithms that guarantee this convergence are said to be Optimal-1. For example, algorithms from the LRTA* family are Optimal-1 [Korf, 1990]. By contrast, Real-Time A* (RTA*) [Korf, 1990] is not Optimal-1. All these algorithms are discussed below.

5.1.5 Terminology

The following list summarizes the terminology used throughout this chapter.

• Underlying graph - the input graph. Each vertex is a possible physical location of the agent. Each edge represents a possible transition between two vertices (locations).

• State - a state is composed of: (1) a vertex from the underlying graph; (2) extra data such as g- and h-values. The exact values stored as extra data may change according to the applied search algorithm. For example, LRTA* [Korf, 1990] requires storing only h-values while RIBS [Sturtevant et al., 2010] stores g-values.

• Learning - propagating and updating h- or g-values from (to) neighboring states.

• Directing - choosing an action (or a set of actions) that will lead the agent to the next chosen location.

• Planning - performing learning and directing.


• Acting - physically moving the agent according to the directing policy.

• Iteration - performing a single Bounded Depth-First Search call within an Iterative-Deepening algorithm. (This is further explained in Section 5.3.)

• Trial - the entire process of getting the agent from the initial state to the goal state. Notice that in Iterative Deepening algorithms, a trial may be composed of several iterations.

• Visit a state - physically arrive at a state.

• Revisit - visiting a previously visited state. The number of revisits across a trial is denoted by R.

• First visit - visiting a state that was never visited before during the current trial. The number of first visits across a trial is denoted by F.

• Balanced algorithm - a balanced algorithm satisfies F = Θ(R) in every trial. A comprehensive definition is given in Section 5.3.2.

• Sensing radius - the agent is assumed to be able to manipulate (read and write in-formation) states that are within a given sensing radius around the currently occupiedstate. Unless stated otherwise, we assume that the sensing radius is 1 (i.e., the agentcan only manipulate the immediate neighbors of a node).

• Lookahead - considering states that are beyond the immediate neighboring states for learning or directing purposes. Lookahead is only applicable when the sensing radius, memory and time constraints allow it. The basic setting does not allow lookahead.

• Expand a state - examining the data in neighboring states either for learning, directing or both. Note that any visited state must also be expanded, but states may be expanded during a lookahead without being visited. When lookahead is not applied, the number of visited states is equal to the number of expanded states.

• Local Search Space - the set of states expanded during a lookahead.

• Dead state - states that the agent does not need to visit in order to get to the goal state. States that are marked dead will never be expanded or visited.

• Global A* - the basic A* algorithm [Hart et al., 1968] that halts once the goal state is chosen for expansion. Global A* stores a global g-value for each state that represents the distance from the initial state.

• Local A* - usually applied during the lookahead phase in RTACS. Local A* halts either when the goal state is expanded or when the number of expanded states exceeds a given bound. Local A* stores a local g-value for each state that represents the distance from the location currently occupied by the agent.



Algorithm 5: LRTA* with lookahead radius 1
Input: state start, state goal

1  sc = start
2  while sc ≠ goal do
3      sc.h = ∞
4      foreach (state sn in Neighbors(sc)) do
5          if (sc.h > sn.h + cost(sc, sn)) then
6              snext = sn
7              sc.h = sn.h + cost(sc, sn)
8      sc = snext  // physical move

5.2 Previous Work on RTACS

Work on RTACS is quite diverse. Published work comes from a range of applications with diverse evaluation metrics. The original research in this area was used to suboptimally solve the sliding-tile puzzle [Korf, 1990]. Other work has modeled this as a robotics problem [Koenig, 2001], or as a path-planning problem in video games [Bulitko et al., 2008]. RTACS algorithms can also be adapted to provide a policy for Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) [Dibangoye et al., 2012]. On the other hand, reinforcement learning (RL) algorithms suited for solving MDPs and POMDPs can be adapted to solve RTACS. When applicable, RTACS algorithms have a clear advantage over RL algorithms [Bulitko and Lee, 2006a], mainly for three reasons.

1. RTACS algorithms assume a deterministic world, which allows them to perform more aggressive value update rules.

2. Given a non-trivial admissible heuristic function, RTACS algorithms converge much faster to the optimal solution since they never decrease the heuristic values of states.

3. RTACS algorithms usually employ sophisticated local search techniques that are generally not applicable in RL algorithms.

In this section we cover a representative set of previously published RTACS algorithms.

5.2.1 Learning Real-Time A*

Many RTACS algorithms belong to the Learning Real-Time A* (LRTA*) family [Korf, 1990]. Different LRTA* variants differ in the learning policy, the directing policy or both. Basic LRTA*, with no lookahead, works according to the following policies:

1. Learning - the heuristic value (h) of the currently occupied state (s) is updated according to s.h = min_{sn}(sn.h + cost(s, sn)), where sn is a neighbor of s.

2. Directing - the agent, currently located at sc, is directed to move towards the neighboring state sn with the minimal sn.h + cost(sc, sn) value.



Basic LRTA* (introduced by Korf [Korf, 1990]) is presented in Algorithm 5. In its learning process the agent propagates the heuristic of the current state from its neighbors (Lines 4-7). Then, the agent is directed (directing policy) to the most promising neighbor snext (Line 8). The agent then acts and visits (physically moves to) snext. LRTA* is proved to be Optimal-1 [Korf, 1990].

We illustrate LRTA* using the example graph presented in Figure 5.1. The numbers in parentheses denote the initial heuristic value for each state. The agent's initial location is S and its task is to arrive at G. Notice that states S and B have a large heuristic error (4 and 5 respectively). LRTA* starts by updating the heuristic of S to 1 before moving to state B - its most promising neighbor. In B the heuristic is updated to 2, and then the agent moves back to S. The agent then updates h(S) to 3 and moves back to B. This example illustrates one drawback of this approach - an agent may revisit a state many times during the solving process. In the worst case, in a state space with N states, an agent may perform O(N²) moves before finding the goal [Koenig, 1992].

Real-Time A* (RTA*) [Korf, 1990] is an algorithm that is closely related to LRTA*.

RTA* has a similar directing policy but differs in its learning policy. In RTA*, s.h is propagated not from s's best neighbor, as in LRTA*, but from its second-best neighbor. Assume a currently occupied state s that has several neighboring states, among them s1 and s2. Assume all neighboring states are at distance 1 from s, and that s1 and s2 have the minimal h-values across all neighbors of s, with s1.h = 2 and s2.h = 3. LRTA* will update s.h = 3 (via the best neighbor, s1) since the optimal path from s might be through s1. By contrast, RTA* will update s.h = 4 (via the second-best neighbor, s2). The intuition behind this learning policy is as follows: if the path via s1 is found to hold an accurate heuristic then the agent will not revisit s and its h-value will not matter. If, however, s1.h is later updated to be larger than s2.h = 3, the heuristic value of s is at least 4 (via s2). This learning policy allows escaping local heuristic depressions faster than the learning policy of LRTA*. However, over multiple trials RTA* may end up with inadmissible heuristic values, causing RTA* not to be Optimal-1.

Local-Search Space LRTA*

In many RTACS scenarios the planning time per step, memory restrictions and sensing radius allow extra computational effort prior to performing an action. This extra computational effort is commonly known as lookahead, during which the agent stays stationary and expands states that are in close vicinity but beyond the immediate neighbors, unlike basic LRTA* (which assumes no lookahead). The set of states that are expanded during the lookahead is referred to as the Local-Search Space (LSS). In LSS-LRTA* the agent is assumed to have the ability to read and write data in states that are within the LSS. Examining a larger LSS usually yields an increase in the quality of the plan, and so the total distance traveled by the agent is reduced. This comes at the cost of longer planning time per step. We now turn to present LSS-LRTA* [Koenig and Sun, 2009].

Algorithm 6: Learning phase in LSS-LRTA*
Input: open list OPEN

1  while OPEN not empty do
2      s = extract(OPEN)  // minimal h-value
3      s.closed = false
4      foreach (state sn in Neighbors(s)) do
5          if (sn.closed = true AND sn.h > s.h + cost(s, sn)) then
6              sn.h = s.h + cost(s, sn)
7              insert(sn, OPEN)

In each planning phase, LSS-LRTA* initiates a local A* search. The root of the local A* search is the current location of the agent and it is assigned glocal = 0. After expanding the root and generating its immediate neighbors, states are prioritized according to their (glocal + h) value and are expanded in a best-first manner. The process of expanding and generating states in the local A* phase (lookahead) halts in five scenarios:

1. Goal reached - if the goal is found within the LSS, the local A* search halts and the agent is directed to move towards the goal.

2. OPEN is empty - if the open list (OPEN for short) is empty at any stage of the local A* search, meaning that the entire (connected) state space was expanded, the search halts and FALSE is returned, meaning that there is no feasible path leading the agent to its goal.

3. Time exhausted - each planning phase might be time bounded. Once the time bound has been reached the local A* search halts.

4. Memory exhausted - A* requires memory to store its open list. Once the memory limit has been reached (due to the size of OPEN) the A* search halts.

5. Sensing limit reached - once a state that is beyond the sensing radius is chosen for expansion, the local A* search halts.

We assume all states expanded by local A* are marked closed. Once the local A* search halts due to scenarios 3, 4, or 5, a learning procedure is performed. This procedure is described in Algorithm 6. The open list from the local A* search is stored and passed as an argument to the learning procedure. Next, a best-first search is initiated. At each expansion cycle, the state with the minimal h-value is extracted from OPEN (Line 2) and stored in variable s. Each state sn that is marked closed (sn.closed = TRUE) and is a direct neighbor of s is considered for learning. States that are not marked closed are either out of the sensing radius or hold an updated h-value. If sn should be updated via s, i.e., sn.h > s.h + cost(s, sn), then sn.h is updated and sn is inserted into OPEN (Lines 6-7). This learning process guarantees to keep the heuristic values admissible, and so it preserves the Optimal-1 attribute of LRTA*.

The directing policy of LSS-LRTA* is as follows. Prior to the learning process, the best state (lowest glocal + h) is stored in snext. Once learning is finished, the agent is ordered to physically move to snext.
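The learning phase can be sketched as a small Dijkstra-style pass (an illustrative sketch, not the thesis code; following the standard presentation of LSS-LRTA*, the h-values of closed states are first set to infinity before accurate values are propagated back from the frontier - a step left implicit in Algorithm 6):

```python
import heapq, math

def lss_lrta_learning(frontier_h, closed, graph):
    """Hedged sketch of the LSS-LRTA* learning phase (Algorithm 6).
    frontier_h: dict frontier state -> h; closed: set of LSS-expanded states;
    graph: dict state -> {neighbor: cost}. Returns the learned h for closed states."""
    h = dict(frontier_h)
    for s in closed:
        h[s] = math.inf                        # will be overwritten via the frontier
    heap = [(v, s) for s, v in frontier_h.items()]
    heapq.heapify(heap)
    still_closed = set(closed)
    while heap:
        hs, s = heapq.heappop(heap)            # extract minimal h-value (Line 2)
        if hs > h[s]:
            continue                           # stale heap entry
        still_closed.discard(s)                # s.closed = false (Line 3)
        for sn, c in graph.get(s, {}).items():
            if sn in still_closed and h[sn] > h[s] + c:
                h[sn] = h[s] + c               # propagate learned value (Lines 6-7)
                heapq.heappush(heap, (h[sn], sn))
    return {s: h[s] for s in closed}
```

For a line a-b-c with the frontier at c (h = 2) and {a, b} closed, the learned values are h(b) = 3 and h(a) = 4.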



Algorithm 7: Learning phase in RTAA*
Input: open list OPEN

1  f = min_{s∈OPEN} (s.glocal + s.h)  // minimal f-value
2  foreach (state sc where sc.closed = true) do
3      sc.h = f − sc.glocal

Real-Time Adaptive A*

Real-Time Adaptive A* (RTAA*) [Koenig and Likhachev, 2006] is a simple and effective lookahead variant of RTA*. RTAA* differs from LSS-LRTA* only in the learning policy (both perform a local A* search initially and have a similar directing policy). Algorithm 7 presents the learning procedure of RTAA*. First, the minimum f = g + h value over all states in OPEN is stored in variable f (Line 1). Next, for each state sc that was closed during the local A* search, we update sc.h = f − sc.glocal.

This kind of learning gives higher h-values to states that are closer to the current position, allowing the agent to escape local heuristic depressions faster. Moreover, this procedure is significantly faster than the learning procedure of LSS-LRTA*.
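A minimal sketch of this learning rule (illustrative names; OPEN is represented by its list of f-values and the closed states by their local g-values):

```python
def rtaa_learning(open_f, closed_g, h):
    """Sketch of Algorithm 7 (RTAA* learning): every state closed by the
    local A* search gets h = f_min - g_local, where f_min is the best
    f-value on OPEN. h is a dict, mutated in place and returned."""
    f_min = min(open_f)
    for s, g in closed_g.items():
        h[s] = f_min - g
    return h
```

Note how states with smaller g_local (closer to the agent) receive larger h-values, as the text observes.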

Depression Avoidance in LRTA*/RTA*

Depression Avoidance (DA) [Hernandez and Baier, 2012] is a directing policy for a RTACS agent that performs learning. As such, it is applicable to both LRTA* and RTA* (which differ in their learning policy but have a similar directing policy). We distinguish between the stored (current) heuristic value, denoted hs, and the initial heuristic value (prior to performing any learning), denoted hi. Given a set of possible locations, DA directs the agent towards the state for which the least amount of learning has occurred, i.e., argmin_s ∆(s) where ∆(s) = s.hi − s.hs. As an example, consider a situation where there are two candidate states, s1 and s2, for the next step. Assume s1.hs = 1 and s2.hs = 3, meaning that LRTA* (no DA) would direct the agent towards s1 (minimal hs). Further assume that s1.hi = 3 and s2.hi = 4, resulting in ∆(s1) = 2 and ∆(s2) = 1. As a result, daLRTA* would direct the agent towards s2 (minimal ∆). By directing the agent towards states with the least learning, daLRTA* may avoid revisiting states belonging to the same local heuristic depression many times.
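The DA directing rule can be sketched as follows (a hypothetical helper; ∆ is computed exactly as in the chapter's formula and example):

```python
def da_direct(candidates, hs, hi):
    """Depression Avoidance directing sketch: among candidate next states,
    move toward the one with the least accumulated learning, i.e.,
    minimal delta(s) = hi[s] - hs[s] as in the text's example."""
    return min(candidates, key=lambda s: hi[s] - hs[s])
```

With the example values above, plain LRTA* would pick s1 (minimal hs), whereas `da_direct` returns s2 (minimal ∆).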

Other variants of LRTA*

f-LRTA* [Sturtevant and Bulitko, 2011] learns both the heuristic to the goal (h) and the distance from the start state (g). f-LRTA* uses g-values both alone and in combination with h-values to find and mark dead states. By learning the g-value of each state, f-LRTA* converges to the optimal solution in very few trials at the cost of slightly worse performance within each trial.

LRTS [Bulitko and Lee, 2006b] is a generalization of ε-LRTA* [Shimbo and Ishida, 2003], SLA* [Shue et al., 2001] and γ-traps [Bulitko, 2004], all extending the LRTA* algorithm. LRTS uses three main enhancements:



Algorithm 8: IDA*/RIBS/EDA*
Input: Vertex start, Vertex goal, Int C

1  T = start.h
2  while BDFS(start, goal, T) = FALSE do
3      Case IDA*: T = T + C
4      Case EDA*: T = T × C

1. Deeper Lookahead Search - use the LSS learning policy to update heuristic values and converge to the accurate heuristic faster.

2. Heuristic Weight - converge to a sub-optimal solution (a factor of ε from optimality) in very few trials by increasing the weight of the heuristic by a factor of 1 + ε.

3. Backtracking - upon making an update to the heuristic value of the current state, the agent also updates the heuristic value of the previously visited state. This allows the agent to escape local heuristic depressions faster by increasing the heuristic values within the depression.

The number of first visits (F) any search algorithm performs is bounded by the size of the state space (denoted by N), i.e., F = O(N). In areas where large heuristic errors exist, the number of revisits (R) that all LRTA* algorithms may perform is, in the worst case, linear (in N) per state [Koenig, 1992], i.e., R = O(N²). Consequently, the total number of state visits (F + R) is O(N²) - quadratic in N.

5.3 Iterative-Deepening Algorithms for RTACS

We now present the Iterative-Deepening (ID) framework in its RTACS form and theoretically analyze its attributes. Later, in the next section, we introduce our new algorithm, Exponential Deepening A*.

ID acts according to the high-level procedure presented in Algorithm 8. T denotes the threshold for a given Bounded DFS (BDFS) iteration, where all states with f ≤ T will be visited. T is initialized to start.h (Line 1). For the next iteration, T is incremented to the lowest f-value seen in the current iteration that is larger than T. For simplicity, we assume that T is incremented by a constant C (Line 3). A lower bound for C is the minimal edge cost.

In many domains, multiple paths to the same node exist; these are called transpositions or duplicates. Transpositions may blow up the search tree to be exponentially larger than the state space. In a 2D, 8-connected grid of radius r there are O(r²) unique states but, due to transpositions, the number of states expanded by a depth-first search is O(8^r). In such cases a mechanism called duplicate detection and pruning (DD) may optionally be employed. Since in RTACS the agent is allowed to leave small pieces of information in each state, marking and detecting previously visited states is allowed and does not incur extra overhead. ID algorithms risk blowing up the state space exponentially (due to duplicate states) if DD is not applied. As a result, all ID-based algorithms presented in this chapter use DD, as we detail below in Section 5.5.
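The gap between the unpruned DFS tree and the underlying grid can be checked numerically (a toy calculation; `dfs_tree_nodes` counts nodes of an idealized search tree with a fixed branching factor, not any specific algorithm's expansions):

```python
def dfs_tree_nodes(r, b=8):
    # nodes in an unpruned depth-first search tree of depth r, branching b:
    # 1 + b + b^2 + ... + b^r, i.e., O(b^r)
    return sum(b ** i for i in range(r + 1))

def unique_states(r):
    # unique cells within Chebyshev radius r on an 8-connected grid: O(r^2)
    return (2 * r + 1) ** 2
```

Already at r = 2 the tree has 73 nodes against 25 unique cells, and the ratio grows exponentially with r, which is why DD matters here.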

5.3.1 Real-time Iterative-deepening Best-first Search

Real-time Iterative-deepening Best-first Search (RIBS) [Sturtevant et al., 2010] is a variant of IDA* for RTACS. At the high level RIBS is identical to IDA* and executes Algorithm 8. However, the BDFS iterations are physically performed by a moving agent. As a RTACS algorithm, RIBS may store the index of the current BDFS iteration in each state it visits. If the agent encounters a state whose stored index equals the current iteration index, it can infer that this state was previously visited in the current iteration. Consequently, RIBS can detect and prune duplicates (DD) within any given iteration. In addition, since the threshold T corresponds to the f = g + h of states, RIBS stores both g- and h-values in each state.2 RIBS is Optimal-1. Moreover, after finishing a single trial RIBS can trace the optimal solution, i.e., it is also Optimal-0.

Sturtevant et al. [Sturtevant et al., 2010] described pruning techniques that can be employed on top of RIBS. These techniques find and mark redundant states. Redundant states are a special form of dead states that can be removed from the state space while still maintaining the optimal solution. Sharon et al. [Sharon et al., 2013b] further extended the notion of dead states by pruning states as follows:

• Optimal-1 algorithms - if convergence to the optimal solution is desired, all states that are guaranteed not to be part of the only optimal solution (swamp states [Pochter et al., 2009]) can safely be marked dead.

• Non Optimal-1 algorithms - for a single trial, i.e., finding the goal once, all states that are not needed to maintain reachability of the goal (expendable states) can be marked dead.

Dead-state pruning is applicable in any algorithm that updates and stores global g-values. It is thus also applicable to our new algorithm, EDA*.

While not fully generalized for k-dimensional polynomial domains, examples in [Sturtevant et al., 2010] suggest that in the worst case, RIBS, as well as algorithms from the LRTA* family, will perform O(d^(k+1)) actions before finding a goal at depth d.

5.3.2 Theoretical analysis of ID

Our new algorithm, EDA*, is a variant of ID. Hence, before presenting EDA*, we provide a theoretical analysis of ID with regard to state visits.

For simplicity of discussion, hereafter we make the following assumptions:

1. All ID-based RTACS algorithms use DD within a given iteration (the details of the DD we used are given below).

2 Note that RIBS uses a global g-value (distance from start) in contrast to local A*, which uses local g-values (within each LSS).



2. The heuristic h is admissible and consistent.

3. The underlying graph has unit edge costs.

4. Fixed branching factor b and fixed dimensionality k for exponential and polynomial domains, respectively.

5. For worst-case analysis we assume h = 0 for all states.

For a given ID iteration i, we denote the number of states visited for the first time by Fi and the number of states that are revisited (were visited in previous iterations) by Ri. Note that Σ_i Fi = F and Σ_i Ri = R, where F and R are the total first visits and revisits across the entire trial as defined above.

Theoretical studies of ID usually focus on other quantities: Nprev, the total number of states visited in all iterations prior to the last one, and Nlast, the number of states visited in the last iteration. We now show that Nlast = F and that Nprev = R. We denote the last iteration index by n.

Lemma 4 F = Nlast

Proof: States revisited in iteration i are exactly the states visited during iteration i−1, i.e., Ri = F(i−1) + R(i−1). Note that there are no revisits at the first iteration, i.e., R1 = 0. Now,

Nlast = Fn + Rn = Fn + F(n−1) + R(n−1) = Fn + F(n−1) + F(n−2) + R(n−2) = ... = Σ_{i=1}^{n} Fi = F

Lemma 5 R = Nprev

Proof: We again use the fact that Ri = F(i−1) + R(i−1). Thus,

Nprev = Σ_{i=1}^{n−1} (Fi + Ri) = Σ_{i=1}^{n} Ri = R

Definition: We say that any RTACS algorithm, and ID in particular, is balanced if F = Θ(R) (or equivalently, if Nlast = Θ(Nprev) in ID). We break this into two conditions:

• Condition 1 - R = O(F), meaning that the order of F is greater than or equal to that of R.

• Condition 2 - F = O(R), meaning that the order of R is greater than or equal to that of F.

If R ≫ F (condition 1 is violated) the agent spends most of the time revisiting previously seen (non-goal) states. Many existing RTACS algorithms (e.g., the LRTA* family) do not satisfy condition 1. In the worst case, they are quadratic in the state space due to extensive revisits. That is, for such algorithms F = O(N) but R = O(N²).

If F ≫ R (condition 2 is violated) the agent might spend too much time exploring new but irrelevant states. In an infinite or very large state space this behavior should be avoided. A balanced algorithm has a good balance between F and R; this proved to have a positive effect on the worst-case performance in experiments. Note that a balanced algorithm has an FVR (FVR = F/(F+R)) that will never asymptotically converge to 0 or 1. For ID algorithms, being balanced depends on how fast the BDFS threshold increases between successive iterations. If the threshold increases too slowly, most of the work will be spent on iterations previous to the last one, which are all superfluous. As a result, condition 1 will be violated. If, on the other hand, the threshold increases too fast, the final iteration might use a threshold that is larger than what is needed to find the goal. A large threshold might result in visiting many states that are situated farther than the goal state and are thus also superfluous. As a result, condition 2 will be violated.

Lemma 6 An algorithm has a worst-case complexity linear in the size of the state space, N, iff it satisfies condition 1.

Proof: Since, in the worst case, the entire state space will be visited, and each state can be visited for the first time only once, F = O(N). If condition 1 is satisfied then R = O(F). Now, since F = O(N), then R = O(N) too. Thus the complexity of the algorithm (F + R) is also O(N). The second direction is similar. We are given that F + R = O(N). Since F = O(N), R must also be O(N), so R = O(F) and condition 1 is satisfied.

5.3.3 Efficiency of IDA*/RIBS

We now discuss the circumstances under which IDA* satisfies both conditions 1 and 2 (and is considered balanced). We differentiate between two types of domains: exponential domains and polynomial domains. We treat each in turn.

Exponential Domains

In exponential domains, usually given implicitly, the number of states at depth d is b^d (exponential in d), where b is the branching factor. Each level contains a factor of b more states than the previous one. Therefore, in IDA* the total number of states visited in the last iteration is Nlast = F = O(b^d). The sum of states visited over all prior iterations, Nprev = R, is also O(b^d) (see [Russell and Norvig, 2010], p. 90 for more details). Therefore, F = Θ(R), both conditions 1 and 2 are met, and according to our definition, IDA* is balanced in exponential domains.

Polynomial domains

In polynomial domains the number of states at radius r from the start state is r^k, where k is the dimension of the domain. The last iteration visits Nlast = F = O(r^k) states. However, the total number of states visited prior to the last iteration is (we denote q = r − 1):

Nprev = R = Σ_{i=1}^{q} i^k > Σ_{i=q/2}^{q} i^k > Σ_{i=q/2}^{q} (q/2)^k = (q/2) · (q/2)^k = q^(k+1) / 2^(k+1)

Since 2^(k+1) is a constant and q = r − 1, R = Ω(r^(k+1)). Condition 2 is satisfied (F = O(R)). But, since R = Ω(r^(k+1)) and F = O(r^k), R ≫ F and condition 1 is violated.



For example, in a straight line of length d, IDA* will perform O(d) first visits and O(d²) revisits.3

Since condition 1 is violated in polynomial domains, the complexity of IDA* is larger than the size of the state space (according to Lemma 6). Exponential Deepening A* remedies this and satisfies both conditions 1 and 2 for RTACS problems on polynomial domains such as grids. This was the main motivation for developing EDA*.
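This straight-line behavior is easy to simulate (an idealized model that counts states seen per iteration, not physical agent moves, which would roughly double due to backtracking):

```python
def ida_star_line_visits(d):
    """Count first visits (F) and revisits (R) for IDA*-style iterative
    deepening on a straight line of length d with h = 0 and unit edge
    costs: iteration with threshold T reaches states 0..T, T = 1..d."""
    visited_ever = set()
    first, revisits = 0, 0
    for threshold in range(1, d + 1):
        for state in range(threshold + 1):
            if state in visited_ever:
                revisits += 1
            else:
                visited_ever.add(state)
                first += 1
    return first, revisits
```

For d = 10 this yields F = 11 first visits but 54 revisits; R grows roughly as d²/2 while F stays linear, matching the violated condition 1.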

5.4 Exponential Deepening A* (EDA*)

We now turn to describe our new algorithm, Exponential Deepening A* (EDA*). We begin by describing and analyzing EDA* as a classic single-planning-phase algorithm. EDA* is similar to IDA* in that it performs multiple bounded depth-first searches until the solution is found. EDA* requires only one change to the IDA* pseudocode presented in Algorithm 8 (Line 4). That is, instead of adding a small constant to the threshold, we multiply the previous threshold by a constant (C). For example, assuming unit edge costs and h ≡ 0, IDA* will perform iterations with thresholds 1, 2, 3, ..., i. By contrast, if C = 2, the EDA* thresholds will be 2, 4, 8, 16, ..., 2^i.
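The two threshold schedules of Algorithm 8 can be sketched as follows (an illustrative helper; with h = 0 the multiplicative schedule presumably starts from a minimal threshold of 1 rather than 0, which is an assumption here):

```python
def thresholds(h0, c, mode, n):
    """First n BDFS thresholds of the high-level ID loop (Algorithm 8):
    additive deepening for IDA*, multiplicative deepening for EDA*."""
    t, out = h0, []
    for _ in range(n):
        out.append(t)
        t = t + c if mode == "IDA*" else t * c
    return out
```

The additive schedule needs Θ(d) iterations to reach depth d, while the multiplicative one needs only Θ(log d).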

EDA* is a general framework and can be applied both in classical search settings and in RTACS environments. Similar to RIBS, EDA* in RTACS environments performs full DD within a given iteration.

We will show below that, unlike IDA*, in polynomial domains the number of nodes in the last iteration of EDA*, F, equals Θ(R). Thus, conditions 1 and 2 are satisfied, and EDA* is considered balanced on polynomial domains. Empirical results supporting this claim are provided in the experimental section.

5.4.1 Optimality of EDA*

When proving the optimality of EDA* we must distinguish between the classical search setting and RTACS.

Classical search setting (without the RTACS restrictions)

Lemma 7 EDA*, as a classic single-planning-phase search algorithm, is Optimal-0.

Proof: Let d be the cost of the optimal solution. Let i be the value for which C^(i−1) < d ≤ C^i. The goal will not be found in iteration i−1 as d > C^(i−1). On the other hand, EDA* must find the optimal solution in the ith iteration if it completes that iteration (i.e., the iteration does not halt when the goal is first reached). In iteration i, all states with f ≤ C^i are visited. In particular, we will see the goal via its optimal solution as d ≤ C^i.4

3 We worked under the assumption of a fixed branching factor and unit edge costs in exponential domains (where both conditions 1 and 2 were met) or fixed dimensionality in polynomial domains (where condition 2 was violated). In order to satisfy conditions 1 and 2 for a varying branching factor and/or for non-uniform edge costs, the threshold of IDA* should be dynamically changed. To this end, a formula is required to calculate/predict the next effective threshold. Prediction techniques for IDA* have been well-studied [Korf et al., 2001; Zahavi et al., 2010; Lelis et al., 2013]. These methods, however, require a preprocessing phase, which is not possible for RTACS due to its real-time nature. There exist other prediction methods which do not require preprocessing [Vempaty et al., 1991; Sarkar et al., 1991; Burns and Ruml, 2012]. However, all the above prediction methods require extra memory, which is not available in RTACS, and assume no DD. For example, for an 8-connected grid these methods are suitable for an exponential domain with b = 8.

RTACS setting

Lemma 8 EDA*, when used as a RTACS algorithm, is Optimal-1: let d be the cost of the optimal solution. EDA* will converge to the optimal solution after d trials if it completes the final iteration in each trial.5

Proof: By induction on d. We will prove that the optimal path of length d will be discovered in trial d. For d = 1, the optimal path of length 1 will be discovered in the first trial (trial 1) via the start state. Assume the induction holds for all paths of length z − 1. Now, consider the optimal path of length z composed of states s1, s2, ..., sz. The optimal path to state s(z−1) was found after z − 1 trials, according to the assumption. State s(z−1) must be expanded during the last iteration. When s(z−1) is expanded the optimal path to state sz will be found, that is, the known shortest path to s(z−1) plus the connecting edge E(s(z−1), sz). Tracing back from the goal state will reveal the optimal path.

5.4.2 Balance of EDA*

To deal with the balance conditions for EDA*, we again distinguish between the two types of domains.

Exponential domains:

Assume that the depth of the goal is d = C^i + 1. In this case, the goal will not be found in iteration i, and will instead be found in iteration i + 1. All states with f ≤ C^(i+1) will be visited during the last iteration. There are Nlast = F = O(b^(C^(i+1))) such states in total. In all previous iterations Nprev = R = Σ_{j=0}^{i} b^(C^j) = O(b^(C^i)) states will be visited. In exponential domains EDA* satisfies condition 1, R = O(F). EDA*, however, violates condition 2 since F ≠ O(R). In practice, violating condition 2 translates into visiting many surplus states [Felner et al., 2012; Goldenberg et al., 2014]; in such states f > d, meaning that visiting such nodes is not necessary in order to reach the goal (via the optimal path). Thus, EDA* is imbalanced for exponential domains.

Polynomial domains:

We first assume that in a polynomial domain of dimension k the number of unique states visited by EDA* within a threshold T is Θ(T^k). This is a simplifying assumption which is always true in state spaces with no transpositions. When transpositions exist, and since we perform DD and do not allow re-opening of nodes, there are pathological cases where this assumption does not hold. Such cases require a different treatment, which is discussed in A.2. Nevertheless, our experimental results show that the general trends occur in real-world domains (even when transpositions occur).

4 Strictly speaking, when transpositions exist, EDA* is Optimal-0 only if one of the following conditions holds: (1) DD is not applied (e.g., in implicitly given exponential domains). (2) DD is applied (e.g., in explicitly given polynomial domains) and we allow re-opening and then re-expansion of nodes if they are seen again with smaller g-values.

5 Note that a trial is the entire procedure of solving a given problem while an iteration is a single BDFS call within a trial.

Solver | Exponential (N = b^d)                 | Polynomial (N = d^k)
       | Nodes        Cond-1  Cond-2  Balanced | Nodes     Cond-1  Cond-2  Balanced
IDA*   | b^d          +       +       ✓        | d^(k+1)   -       +       ✗
EDA*   | b^(C(d−1))   +       -       ✗        | (Cd)^k    +       +       ✓

Table 5.1: IDA* vs. EDA*. Cond-i = condition i.

If the goal is found in iteration i, EDA* will visit F = (C^i)^k = (C^k)^i = Ĉ^i states, where Ĉ = C^k is a constant. In all previous iterations the agent will visit

R = Σ_{j=0}^{i−1} (C^j)^k = Σ_{j=0}^{i−1} Ĉ^j = Θ(Ĉ^i)

Consequently, F = Θ(R). EDA* satisfies both conditions 1 and 2. Since EDA* satisfies condition 1, its worst-case complexity is linear in the state space, as proven in Lemma 6. Since it satisfies condition 2, the number of surplus nodes visited will not hurt the complexity. As a result, EDA* is considered balanced on polynomial domains.
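This balance can be checked numerically on the idealized model (writing Ĉ = C^k, the ratio R/F converges to the constant 1/(Ĉ − 1) rather than drifting to 0 or infinity):

```python
def eda_star_ratio(C=2, k=2, iterations=12):
    """Ratio R/F for EDA* on an idealized polynomial domain of dimension k:
    F = (C^i)^k states in the last iteration i, and
    R = sum of (C^j)^k over all prior iterations j < i."""
    Chat = C ** k
    ratios = []
    for i in range(1, iterations + 1):
        F = Chat ** i
        R = sum(Chat ** j for j in range(i))
        ratios.append(R / F)
    return ratios
```

For C = 2 and k = 2 the ratios quickly settle near 1/3 = 1/(Ĉ − 1), i.e., F = Θ(R).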

Table 5.1 summarizes the total complexities of IDA*/RIBS and EDA* on polynomial and exponential domains and also indicates which of the conditions each algorithm satisfies. Note that for both exponential and polynomial domains EDA* satisfies condition 1 and has a worst-case complexity linear in the size of the state space.

5.5 BDFS in RTACS

ID algorithms can be easily adapted to work in RTACS environments. The high-level ID procedure (as presented in Algorithm 8) remains unchanged. The low-level BDFS procedure, by contrast, requires some adaptations, as it must now be simulated by a moving agent. The low-level BDFS procedure for RTACS environments is presented in Algorithm 9.

During a BDFS iteration, the following pieces of information are written in each visited state s:

• 1: Iteration index (labeled s.i): The iteration index of the last BDFS iteration in which this state was visited. If, for iteration I and state s, s.i = I, then state s is treated as a duplicate state (this is checked in Line 7). If, however, s.i ≠ I, it means that s was never visited during iteration I, and EDA* sets s.i = I (Line 5).

• 2: Best g-value (labeled s.g): The shortest known distance from the start state, the (global) g-value, of state s. When a state sn is examined (among the other neighbors, Line 7) via s, sn.g is updated according to sn.g = min(sn.g, s.g + cost(s, sn)). The best g-value is stored across visits, iterations and trials.

• 3: h-value (labeled s.h): An estimate of the distance to the nearest goal.


Algorithm 9: BDFS for ID in RTACS
Input: State current, State goal, Threshold T, Iteration I

1  current.g = 0
2  while current ≠ goal do
3      if current = NULL then
4          return FALSE                 // Backtracked from root
5      current.i = I
6      LSS(current)                     // Local search space - lookahead (optional)
7      snext = best unvisited neighbor, null if none exists
8      if (current.g + h(current, goal) > T) or snext = null then
9          current = current.p          // Physical move - backtrack
10         continue                     // Next while loop
11     snext.p = current
12     current = snext                  // Physical move
13 return TRUE

• 4: Parent pointer (labeled s.p): BDFS must backtrack once it reaches a leaf. To this end, a backpointer is stored. When the agent moves away from state s to a neighboring state snext, the parent pointer snext.p is set to s (Line 11).

When BDFS examines a state s, s was either previously visited or not visited during the current iteration. In both EDA* and RIBS, the agent never revisits a state within the same iteration. There is one exception: backtracking to a previously visited state. If the agent reaches a state s with no valid successors (a valid successor holds f ≤ T and is as yet unvisited during the current iteration), the agent backtracks to the state visited prior to s.

Lookahead within BDFS (Line 6) is optional and is activated only if the settings of the environment allow a local search space larger than 1. This procedure is explained below in Section 5.7.

When the agent moves to a new state s ≠ goal in iteration I (Line 12), a new while loop begins (Line 2) and the iteration index s.i is updated to I (Line 5). In Line 7 all neighbors of state s are examined; during this process their g-values, as well as the h-value of state s, are updated.6 The neighbor with the lowest f = g + h that was not yet visited during the current iteration is chosen as snext. Checking whether sn was visited in the current iteration is done by checking whether sn.i = I. If no unvisited neighbor exists, snext is set to null. Next, two cases exist:

(1) (Lines 8-10) s.g + s.h > T or snext = null. In this case, the agent backtracks to state s.p.

(2) (Lines 11-12) s.g + s.h ≤ T and snext ≠ null. In this case, snext.p is updated to be s. Then, the agent physically moves to snext.
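Algorithm 9, together with the high-level exponential-deepening driver, can be sketched in runnable form as follows. This is a minimal illustration under our own assumptions (a small explicit graph, no h-learning, and helper names such as `connect` and `eda_star` that are ours, not the thesis'); the per-state fields g, h, i and p and the backtracking rules follow the description above.

```python
import math

class State:
    """Per-state bookkeeping described above: iteration index (i), best
    known g-value, heuristic estimate (h) and backtrack pointer (p)."""
    def __init__(self, name, h=0.0):
        self.name, self.h = name, h
        self.g = math.inf
        self.i = -1        # last BDFS iteration in which this state was visited
        self.p = None      # parent (backtrack) pointer
        self.edges = {}    # neighbor State -> edge cost

def connect(a, b, cost):
    a.edges[b] = cost
    b.edges[a] = cost

def bdfs(start, goal, T, I):
    """One BDFS iteration (Algorithm 9): the agent physically walks the
    graph, backtracking when f exceeds T or no unvisited neighbor remains."""
    current = start
    while current is not goal:
        if current is None:            # backtracked from the root: iteration failed
            return False
        current.i = I
        # Relax neighbor g-values and pick the unvisited neighbor of lowest f.
        snext = None
        for nb, cost in current.edges.items():
            nb.g = min(nb.g, current.g + cost)
            if nb.i != I and (snext is None or nb.g + nb.h < snext.g + snext.h):
                snext = nb
        if current.g + current.h > T or snext is None:
            current = current.p        # physical move: backtrack
            continue
        snext.p = current
        current = snext                # physical move: advance
    return True

def eda_star(start, goal, C=2.0):
    """High-level exponential deepening: thresholds T = 1, C, C^2, ...
    Assumes the goal is reachable from start."""
    start.g = 0.0
    T, I = 1.0, 0
    while not bdfs(start, goal, T, I):
        T, I = T * C, I + 1
    return goal.g    # best g found for the goal (not guaranteed optimal)

# Hypothetical 3-state example: S --3-- A --1-- G. The iterations with
# T = 1 and T = 2 fail (f(A) = 3 > T, so the agent backtracks), and the
# agent reaches G during the T = 4 iteration.
S, A, G = State('S'), State('A'), State('G')
connect(S, A, 3)
connect(A, G, 1)
assert eda_star(S, G, C=2.0) == 4.0
```

Note that, as footnote 6 observes, the g (and h) updates are not required for completeness; the sketch keeps the g relaxation so the returned value reflects the best path found.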

We illustrate EDA* for RTACS using the example graph depicted in Figure 5.1 (presented here again). The running example is provided in a scheme presented in Figure 5.2. Each dashed bracket represents an iteration (T = 1, 2, 4). Each full bracket represents the current location of the agent at each time step. An arrow within a full bracket represents the

6 Updating g- and h-values is not necessary for EDA* to be complete. It is required, however, for the Optimal-1 attribute.


[Figure 5.1 (copy): the example graph with states A(3), S(0), B(0), G(0) and edge costs 3, 1, 1]

Figure 5.2: A running example of EDA* for RTACS on the problem instance depicted in Figure 5.1. For convenience, a copy of Figure 5.1 is given above.


updating of a field. An arrow between two full brackets represents a physical movement between two states. (bt) represents arriving at a state via the backtracking procedure. Assume the initial threshold is T = 1 (top-left dashed bracket). The agent's initial location is state S (leftmost bracket within T = 1). At its first location (S) the agent updates both S.i and S.h from zero to one, leaving S.g = 0 and S.p = NULL. At this point the agent also updates the g-values of the neighboring states (B.g and A.g) to one. Next, the agent is directed to the most promising neighboring location (with the lowest f-value), B. At B the agent updates B.i = 1, B.h = 2 and B.p = S. Since B has no neighbors that were not visited during the current iteration, the agent backtracks to state S. Since S has no neighbors that were not visited during the current iteration and hold f ≤ T (A.f = 4), the first iteration comes to a halt. Next, the threshold is increased (T = 2), but no state is visited since the f-value of the root is greater than T. Consequently, T is increased to 4 (bottom dashed bracket). During this iteration the goal state is reached by the agent.

5.6 Experimental Results

This section presents experimental results supporting the theory presented above and comparing EDA* to other RTACS solvers. In some experiments we add results for global A*, which is a classical search algorithm and not a RTACS algorithm. Nevertheless, when the different RTACS algorithms use a Local Search Space (LSS), they perform a local A* search over the LSS in their planning phase. In the limit, where the LSS includes the entire state space, these algorithms act identically to global A*. Consequently, global A* serves as a lower bound (when infinite memory, planning time and sensing radius are provided).

5.6.1 Supporting the Theoretical Analysis

In order to empirically support the claim that EDA* is balanced in polynomial domains, we present two experimental settings: an open grid and a game map.

Open grid

First, we used a large grid (2000×2000) with no obstacles to clearly show the trends of the algorithms. As we aim to prove the worst-case performance bounds for the different algorithms, the heuristic values of all states were set to zero (h ≡ 0). The distance d between start and goal varied from 1 to 500.

Figure 5.3(left) presents the number of expanded states (y-axis) as a function of the distance from start to goal (d) (x-axis) for IDA*, A*, RIBS (IDA* + dead-state pruning), and EDA* with C = 2. Clearly, EDA* outperforms RIBS, which in turn outperforms IDA*. The quadratic growth of IDA*/RIBS vs. the linear growth of EDA* is clearly shown. A* expanded fewer states than EDA* and is thus invisible, hidden by the EDA* curve. Recall that FVR was defined above as FVR = F/(F+R). Similarly, an algorithm is balanced if F = θ(R). Hence, the FVR metric can give empirical evidence of whether a RTACS


Figure 5.3: Open map experiments: A*, IDA*, RIBS, EDA*.

Algorithm   Expanded     Time (MS)   Expansions per second   FVR
A*          222,792      316         705                     1.00
EDA*        569,402      226         2,524                   0.58
IDA*        56,000,887   39,998      1,400                   0.01
RIBS        28,139,592   37,309      754                     0.02

Table 5.2: Average measurements for the open map experiment.

algorithm is balanced. If F = O(R) and F ≠ Ω(R), then the corresponding FVR will converge to zero. If F = Ω(R) and F ≠ O(R), then the FVR will converge to one. If F = Θ(R), then the FVR will neither converge to zero nor to one.
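Computing the FVR from an agent's visit log is straightforward; the helper below is our own illustration of the definition FVR = F/(F+R) (every visit is either a first visit or a revisit, so F + R equals the length of the log).

```python
def fvr(visit_log):
    """First Visit Ratio of a sequence of visited states.
    F = number of first visits, R = number of revisits, FVR = F/(F+R)."""
    seen = set()
    first = 0
    for state in visit_log:
        if state not in seen:
            seen.add(state)
            first += 1
    return first / len(visit_log)

# An algorithm that never re-expands a state (like A* here) has FVR = 1.0.
assert fvr(['s', 'a', 'b', 'g']) == 1.0
# A depth-first agent that physically backtracks revisits states: FVR < 1.
assert fvr(['s', 'a', 's', 'b', 's', 'g']) == 4 / 6
```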

Figure 5.3(right) presents the FVR (y-axis). Here, A*, RIBS and EDA* are reported. A* never checks the same state more than once; therefore, R = 0 and its FVR is always 1. The FVR for RIBS decreases quickly, converging to 0. This supports the claim that for polynomial domains RIBS (and IDA*) are imbalanced: both perform a factor of d more revisits than first visits (F = d^k and R = d^(k+1)). By contrast, EDA* has an FVR which is always very close to 0.5, neither significantly decreasing nor increasing. This supports the claim that EDA* is balanced (F = θ(R)). The "steps" in the FVR for EDA* correspond to the jumps of T and can be observed at C^i = 1, 2, 4, 8, 16, 32, 64, 128. For any i, once the goal is farther than C^i, iteration i+1 must be performed and some states at radius C^(i+1) will be visited.

Table 5.2 presents the average measurements over all 500 open map instances. A* has a larger FVR compared to EDA*, i.e., A* performs fewer revisits. In fact, A* has an FVR equal to one, meaning it performs no revisits (R = 0). As a result, EDA* expands more than twice as many nodes as A*. However, since EDA* expands a factor of three more states per second, its average time (226 MS) was faster than that of A* (316 MS). Note that other RTACS solvers cannot be compared when h ≡ 0, as they would move arbitrarily around the state space.


Figure 5.4: F vs R values on video game maps as 8-connected grid with octal-distance heuristics.

Game map

Our second experimental setting aims to support our theoretical claims in a practical domain, video game maps, using heuristic guidance. For this experiment we used the entire set of Dragon Age: Origins (DAO) problems from the movingai repository [Sturtevant, 2012]. h was set to octal distance. Figure 5.4 presents the F (x-axis) and R (y-axis) values of each problem instance as a scatter plot. We present values for three algorithms: EDA* and the two strongest LRTA* variants, daLRTA* and daRTA*. We also present three fitted curves for these algorithms based on the dots. Both the daLRTA* and daRTA* lines show that R grows superlinearly with F. By contrast, the line for EDA* shows a linear relation between R and F. These results give evidence that, unlike other RTACS algorithms, EDA* is balanced even when heuristic guidance is available.

5.6.2 RTACS Experiments on Video Game Maps

In this section we report average measurements for the entire set of DAO problems (all buckets, all 159,252 instances).7 h was again set to octal distance. The following algorithms were used for this experiment: A*, LRTA*, RTA*, daLRTA*, daRTA*, f-LRTA*, RIBS and EDA*.8 In this experiment, for all algorithms, we used the default lookahead (LSS = 1); results for larger values of LSS are provided below in Section 5.7.2. For EDA*, the number in parentheses denotes the size of the constant factor C, taken from 1.1,

7 Problem instances in the repository are divided into buckets. The nth bucket holds problems of length [4n, 4n + 4).

8 The dead-state pruning of RIBS is also applicable to EDA*. Unlike RIBS, the complexity of EDA* is not dominated by state revisits. Consequently, EDA* does not benefit greatly from dead-state pruning; we found the overhead to be not worthwhile.


Algorithm    Expanded    Distance     Time (MS)   FVR

Classical search algorithms
A*           11,345      380          78          1.00
IDA*         6,142,549   14,617,700   9,975       0.19

Real-Time Agent-Centered algorithms
RIBS         330,397     742,138      1,971       0.29
f-LRTA*      82,111      92,149       340         0.49
LRTA*        237,233     243,075      284         0.42
daLRTA*      33,486      35,645       105         0.66
RTA*         60,744      70,481       78          0.60
daRTA*       26,664      30,978       82          0.74
EDA*(1.1)    48,797      109,146      135         0.40
EDA*(1.5)    18,984      38,518       51          0.59
EDA*(2)      15,243      29,764       40          0.65
EDA*(4)      12,970      24,248       34          0.70
EDA*(8)      12,714      23,553       33          0.71
EDA*(16)     12,785      23,689       33          0.71

Table 5.3: Average measurements over all DAO problems (LSS = 1).

1.5, 2, 4, 8, 16.

Table 5.3 reports the averages over all instances of the following four measures:

1. The number of expanded states (Expanded column). This shows the magnitude of work done by each algorithm.

2. The total distance traveled during the solving process (for RTACS algorithms) (Distance).

3. CPU runtime in milliseconds. The CPU time spent in the planning phases (Time).

4. First Visit Ratio (FVR).

The best algorithm in each category is shown in bold. The total distance traveled highly correlates with the number of expanded states. Distance is slightly higher than the number of nodes expanded due to the agent's ability to move diagonally; diagonal moves cost √2 while only expanding one state. In the DFS algorithms (IDA*, RIBS, EDA*), the distance traveled is roughly twice the number of expanded states, because the agent traverses each edge twice (once moving forward and once backtracking).

Different C values for EDA* influence its performance. The value C = 8 was best for all four measures. EDA*(8) outperformed all other algorithms in all measures. If we disregard RTA* and daRTA*, which are not Optimal-1, then the advantage of EDA* over all Optimal-1 algorithms is even more dramatic.

5.6.3 Different Domains

In our next experiment we test the relative performance of EDA* across all three different domains in the repository [Sturtevant, 2012]: mazes, grids with random obstacles and room maps.


Algorithm   Expanded    Distance    Time (MS)   FVR

Mazes
daLRTA*     167,032     167,032     149         0.34
daRTA*      265,073     265,073     223         0.15
EDA*(32)    73,496      73,495      57          0.51

Random
daLRTA*     146,682     153,214     190         0.11
daRTA*      1,391,168   1,576,520   1,727       0.02
EDA*(8)     51,547      55,827      52          0.50

Rooms
daLRTA*     20,569      21,003      29          0.52
daRTA*      42,203      45,778      59          0.32
EDA*(2)     47,390      54,152      39          0.57

Table 5.4: Average measurements in different domains.

• Mazes - We experimented on all 10 mazes with corridor width equal to 1. Each of these mazes has exactly 131,071 passable states. We used all available instances (145,020 in total).

• Random obstacles - We experimented on all ten maps with 40% random obstacles. The average number of passable states is 96,365. We used all instances (35,360 in total) from all scenarios.

• Rooms - We experimented on all ten room maps with room size 64×64 and 80% of doors open. The average number of passable states is 249,352. We used all instances (21,080 in total).

Table 5.4 presents the same average measurements as Table 5.3. Here we only compared the three strongest algorithms: daLRTA*, daRTA* and EDA*. For EDA* we report the best C observed for each domain. On mazes and random obstacles one can observe a clear advantage for EDA* over all metrics. On the rooms domain, by contrast, EDA* was inferior to both daLRTA* and daRTA*. This phenomenon occurs because the rooms domain has relatively small heuristic depressions (each room), causing algorithms of the LRTA* family to perform well.

5.6.4 Worst-Case Performance

EDA* has the best worst-case complexity (linear in the state space) among all RTACS solvers (most are quadratic in the state space). Consequently, the advantage of EDA* is much clearer when we look at the worst performance over all instances for each algorithm. We present the worst-case measurements for the three strongest RTACS algorithms, daLRTA*, daRTA* and EDA*, on two representative domains: video game maps (DAO) and random obstacles.

Figure 5.5 presents four graphs. For each of the representative domains, two graphs are presented: one for distance traveled and another for CPU runtime. For each domain and each algorithm, the distance and runtime measurements were sorted across the instances.


Figure 5.5: Sorted measurements of the 50% hardest instances.

Each curve in each graph shows the worst 50% of values in increasing order (on the x-axis). The value of the measurement is specified on the y-axis (in logarithmic scale). There are two important conclusions that can be inferred from these results:

1. EDA* has the best worst-case performance (the instances at the right of the curves). On all domains the worst-case performance of EDA* is better than that of the currently known state-of-the-art algorithms. There is one exception to this claim: in the rooms domain EDA* performs similarly to daLRTA* in the worst case.

2. As the problems become more complex, the relative performance of EDA* improves. Remember that the y-axis is in logarithmic scale, so this tendency is actually much stronger than it seems. It is reasonable to assume that this tendency will continue and that for more complex instances (even from the rooms domain) EDA* will outperform its competitors.

Table 5.5 has the same format as Table 5.3 but presents an average over the 1% of instances on which each algorithm performed worst (the instances may vary among the algorithms). The advantage of EDA* is much clearer when we consider the hardest instances. For instance, daRTA* expanded a factor of 2 more states than EDA* over all DAO instances, while on the 1% worst instances it expanded a factor of 4 more states.

5.7 LSS-EDA*

In this section we handle the case where memory, time and sensing-radius constraints allow performing lookahead prior to each move. We propose a lookahead technique for EDA*


Alg.        Expanded     Distance     Time

DAO
daLRTA*     553,082      588,912      1,071
daRTA*      769,562      916,424      1,356
EDA*(8)     178,141      206,261      216

Random
daLRTA*     1,412,960    1,477,766    1,844
daRTA*      10,645,791   12,085,471   13,021
EDA*(32)    257,113      278,463      275

Table 5.5: 1% worst-case average measurements over all experiments with no lookahead.

denoted as LSS-EDA*, which is inspired by online detection of dead states [Sharon et al., 2013b]. In LSS-EDA*, before choosing the agent's next step (Line 7 in Algorithm 9), a lookahead procedure is performed (Line 6). The lookahead procedure is described in Algorithm 10. This lookahead search is performed while the agent remains stationary in state s. During the lookahead, a local A* search rooted at s is performed. Recall that the states expanded during the lookahead are denoted the Local Search Space (LSS). The agent is assumed to be able to read and write data in states that are within the LSS.

Algorithm 10: LSS for EDA*
Input: State current

1 OPEN, CLOSED ← A*(current)
2 Mark closed and open states expandable
3 Mark current needed
4 foreach state s in OPEN do
5     p ← reach(s)
6     Mark p needed
7 Mark all expandable states dead

As in LSS-LRTA*, the local A* search halts on one of the four conditions specified in Section 8. Once the local A* search halts, the OPEN and CLOSED lists (representing the frontier nodes and the inner nodes of the LSS, respectively) are stored in memory (Line 1). Next, all states from both OPEN and CLOSED of the local A*, except the current state s, are initially marked as expandable, while s itself is marked as needed. Then, the algorithm iterates over all states in OPEN (Line 4). For each state x in OPEN, we calculate P(x), which is the shortest path in the LSS (i.e., over the stored nodes of OPEN and CLOSED) leading from x to any needed state within the LSS. Next, all states in P(x) are marked needed (function reach(s), Line 5). We note that P(x) can be of length zero if x is already needed. Once the for loop terminates (all OPEN states are now needed), all the remaining expandable states are marked as dead (Line 7). These nodes do not lie on a shortest path to any of the OPEN nodes; thus, they may later be ignored by the BDFS search. This process guarantees that at least one path leading the agent to the goal remains. The advantage of this process is that the BDFS iteration (where the agent physically moves) will not enter any of the dead states. The LSS phase in EDA* dramatically speeds up the BDFS phase and significantly reduces the number of visited states.
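The needed/dead marking of Algorithm 10 can be sketched as follows. This is our own simplification: edges inside the LSS are assumed to have uniform cost, so reach(s) is approximated by a BFS from each frontier state that stops at the first already-needed state (the thesis computes shortest paths inside the stored LSS; all names here are ours).

```python
from collections import deque

def mark_dead_states(current, open_frontier, lss_edges):
    """Algorithm 10 sketch: keep, for each frontier (OPEN) state, one
    shortest in-LSS path back to a needed state; every stored state not
    on such a path is marked dead. lss_edges: state -> neighbors in LSS."""
    needed = {current}
    for s in open_frontier:
        # reach(s): BFS inside the LSS until an already-needed state is found.
        prev, queue = {s: None}, deque([s])
        hit = s if s in needed else None
        while queue and hit is None:
            u = queue.popleft()
            for v in lss_edges[u]:
                if v not in prev:
                    prev[v] = u
                    if v in needed:
                        hit = v
                        break
                    queue.append(v)
        # Mark every state on the recovered path (hit ... back to s) as needed.
        while hit is not None:
            needed.add(hit)
            hit = prev[hit]
    dead = set(lss_edges) - needed   # remaining expandable states
    return needed, dead

# Hypothetical LSS: S is the current state, C is the only OPEN state;
# A lies on no shortest path from C back to S, so it is marked dead.
edges = {'S': ['A', 'B'], 'A': ['S'], 'B': ['S', 'C'], 'C': ['B']}
needed, dead = mark_dead_states('S', ['C'], edges)
assert needed == {'S', 'B', 'C'} and dead == {'A'}
```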


After each planning phase, LSS-LRTA* and RTAA* direct the agent towards the best state on the fringe of the lookahead and, in practice, may perform several move actions per acting phase. LSS-LRTA* and RTAA* do so because moving to a neighboring state and performing lookahead from that state usually results in a similar LSS, which would not contribute much learning. EDA*, by contrast, performs a lookahead after every single move and benefits greatly from it, as the lookahead usually considers many different states. This happens because the lookahead marks many states dead, meaning that they are never considered in successive lookaheads.

Figure 5.6: An example 4-connected grid with an agent (robot) currently occupying the top-left corner.

As an example, consider the 4-connected grid depicted in Figure 5.6. The agent (represented by the robot) is currently located in the top-left corner. The robot has a sensing radius (dashed red line) allowing it to manipulate all the states seen in the figure. In this example, each state is given a unique letter and a heuristic value (in parentheses). After performing the local A* search (LSS), OPEN contains the following states, with their f = g_local + h values in parentheses: C(8), H(6), I(6), J(6). All states within the sensing radius are initially marked expandable, except the state currently occupied by the agent (S) (Algorithm 10, Lines 2-3). Next, each state from OPEN (ordered by its f-value) is considered. We start with state J. The shortest path from J to the currently only needed state (S) is returned, p = (D, E, F, I, J), and all its constituent states are marked needed (Lines 5-6). Next, state I is chosen; since I is already needed, p = ∅. State H is chosen next; since H has a needed neighbor (E), p = (H) and H is marked needed. Finally, state C is chosen, resulting in p = (B, C), which are marked needed. The only remaining expandable state is A, which is marked dead, i.e., it will never be considered again by the agent.

5.7.1 Comparing EDA* and LSS-EDA* on an Example Graph

Figure 5.7 demonstrates the differences between EDA* and LSS-EDA*. The figure depicts the Brushfire map from the video game StarCraft, showing the final iteration of both EDA* (left) and LSS-EDA* (right). In this example the lookahead of LSS-EDA* allows 10 states to be expanded (due to memory restrictions). The color legend is as follows:


Figure 5.7: Comparing EDA* on the Brushfire map with no lookahead (left) and LSS-EDA* with lookahead of 10 (right). In red - states visited by the agent; gray - states marked dead.

• White - traversable terrain.

• Black, Green - obstacles.

• Light Red - states visited during the final iteration.

• Dark Red - states backtracked during the final iteration.

• Gray - states that were marked dead by LSS-EDA*.

A large area of the map on the right is colored gray, meaning that many states were marked dead. As a result, the last iteration of LSS-EDA* visited only a small fraction of the states visited by EDA* with no lookahead. LSS-EDA* reduced the total distance traversed by the agent by a factor of 5.44 (170,069 vs. 31,282). Of course, this comes at the cost of a more expensive planning phase.

5.7.2 Experimental Results When Lookahead is Applied

Lookahead is used when the available resources (time, memory, sensing) allow it. In these cases, minimizing resource consumption is not the objective; instead, the aim is to use the available resources in order to reduce the physical distance traveled. This is usually the case when the physical movement of the agent is slow and allows extra computation per step. Next, we present experimental results when lookahead is available in two scenarios which define the LSS:

• Bounded memory - the agent is allowed to store a limited number of states in memory. This assumption was commonly used in previous publications [Koenig and Sun, 2009; Hernandez and Baier, 2012], where the bound on the number of states allowed in memory was referred to as the lookahead parameter.


Algorithm      Expanded   Distance   Time (MS)

5 states bound
LSS-daLRTA*    52,300     14,896     393
daRTAA*        41,279     14,936     152
LSS-EDA*(8)    36,601     9,513      216

10 states bound
LSS-daLRTA*    70,566     11,871     595
daRTAA*        47,298     9,886      210
LSS-EDA*(8)    32,099     4,834      256

100 states bound
LSS-daLRTA*    62,575     3,809      407
daRTAA*        76,373     4,657      310
LSS-EDA*(8)    73,315     1,238      515

Table 5.6: Average measurements when memory-bounded lookahead is applied in DAO scenarios.

Algorithm   Expanded   Distance   Time (MS)

2 MS bound
daLRTA*     42,280     1,891      252
daRTAA*     38,125     1,442      152
EDA*(8)     143,238    685        670

4 MS bound
daLRTA*     34,582     1,354      206
daRTAA*     30,235     1,051      124
EDA*(8)     142,414    570        611

8 MS bound
daLRTA*     26,840     983        185
daRTAA*     24,694     813        123
EDA*(8)     55,710     492        293

Table 5.7: Average measurements when time-bounded lookahead is applied in DAO scenarios.


• Bounded planning time - the agent is allowed to spend a bounded amount of computation time in the planning phase prior to making each step (acting).
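Both budget types can be enforced with simple guards inside the local A* expansion loop. The sketch below is a generic illustration under our own assumptions (the memory bound is taken as the number of stored states, i.e., |OPEN ∪ CLOSED|; names such as `bounded_local_astar` are ours, not from the thesis):

```python
import heapq
import time

def bounded_local_astar(start, neighbors, h, goal=None,
                        max_states=None, max_seconds=None):
    """Local A* that stops expanding when the memory bound (number of
    stored states) or the planning-time bound is exhausted.
    neighbors: state -> iterable of (successor, edge_cost) pairs."""
    t0 = time.monotonic()
    g = {start: 0.0}
    open_heap = [(h(start), start)]
    closed = set()
    while open_heap:
        if max_states is not None and len(g) >= max_states:
            break                        # bounded memory: the LSS is full
        if max_seconds is not None and time.monotonic() - t0 >= max_seconds:
            break                        # bounded planning time
        f, u = heapq.heappop(open_heap)
        if u in closed:
            continue                     # stale heap entry
        if u == goal:
            break
        closed.add(u)
        for v, cost in neighbors(u):
            new_g = g[u] + cost
            if new_g < g.get(v, float('inf')):
                g[v] = new_g
                heapq.heappush(open_heap, (new_g + h(v), v))
    frontier = [u for _, u in open_heap if u not in closed]
    return closed, frontier

# A four-state corridor a-b-c-d; with a 3-state memory bound the local
# search stores {a, b, c}, closes {a, b} and leaves c on the frontier.
graph = {'a': [('b', 1)], 'b': [('a', 1), ('c', 1)],
         'c': [('b', 1), ('d', 1)], 'd': [('c', 1)]}
closed, frontier = bounded_local_astar('a', lambda u: graph[u],
                                       lambda s: 0, max_states=3)
assert closed == {'a', 'b'} and frontier == ['c']
```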

We compared the strongest RTACS solvers, daRTAA* and LSS-daLRTA*, to LSS-EDA* with C = 8, with varying memory/time bounds on the LSS. Again, all values are averaged over the entire set of DAO problems. Table 5.6 presents results for memory-bounded lookahead, while Table 5.7 presents results for time-bounded lookahead. LSS-EDA* presents an impressive reduction in the distance traveled by the agent (up to a ×3 reduction with a 100-state bound) when compared to LSS-daLRTA* and daRTAA*. On the other hand, LSS-EDA* consumes more CPU runtime and expands more nodes (up to a ×5 increase in expanded states). The results show a clear picture in which EDA* better utilizes its allowed lookahead (evident in the number of expansions and the CPU runtime) to reduce the total distance traveled by the agent.

5.8 Conclusions and Future Work

In this chapter we discussed three main issues:

1. The Real-Time Agent-Centered Search problem (RTACS).

2. Conditions under which RTACS algorithms are considered to be balanced.

3. A new ID variant, Exponential Deepening A* (EDA*), designed to solve RTACS problems while maintaining the conditions for being balanced.

EDA* is intuitive and very simple to implement. To the best of our knowledge, EDA* is the only RTACS algorithm that is, in the worst case, linear in the state space. Experimental results on grids support our theoretical claims; EDA* shows worst-case behavior that is linear in the state space. Moreover, EDA* outperforms the other algorithms in all measurements and in all domains except the rooms domain. In addition, EDA* is shown to be very robust across different instances, as evidenced by its worst-case performance.

We then introduced a lookahead variant of EDA* denoted LSS-EDA*. LSS-EDA* utilizes its allotted resources, mainly memory and CPU runtime, to reduce the total distance traveled by the agent. LSS-EDA* presents up to a ×3 reduction in distance traveled over other algorithms that use lookahead.

Throughout this chapter we assumed a constant dimensionality (k) of the polynomial domains. In many cases the dimensionality may vary in different parts of the domain. The theory presented in this chapter does not hold for these cases. For them, a more complex variant of EDA* is needed, one with a dynamic C parameter. Such a parameter calls for using learning and prediction methods. A large body of work has dealt with this in exponential domains, where no DD is used. Those methods need to be adapted to work in polynomial domains with DD.


Chapter 6

Conclusions and Future Work

This thesis focused on two complex variants of the path-finding problem. The first variant is the Multi-Agent Path-Finding (MAPF) problem. Two novel algorithms, the Increasing-Cost Tree Search and the Conflict-Based Search, were presented and evaluated.

MAPF was presented and defined in Chapter 2, where we also presented new terminology to help classify existing and future MAPF solvers into four categories: optimal, sub-optimal search-based, sub-optimal procedure-based and sub-optimal hybrid.

Chapters 3 and 4 presented three novel approaches for solving MAPF:

• The Increasing-Cost Tree Search (ICTS) - ICTS breaks the MAPF optimization problem into a set of easier decision problems (given a set of costs, is there a solution?).

• The Conflict-Based Search (CBS) - CBS is unique in that all low-level searches are performed as single-agent searches.

• The Meta-Agent CBS (MA-CBS) - MA-CBS is a generalization of the CBS algorithm.

ICTS, CBS and MA-CBS present impressive performance on benchmark problems, where they outperform previous state-of-the-art solvers.

We believe that in this thesis we have only scratched the surface of tackling optimal MAPF. Much research is left for future work:

1. Developing stronger heuristics for the basic (A*) state-space representation of MAPF and applying them to ICTS and (MA-)CBS.

2. The use of inference in ICTS pruning techniques and of constraints in (MA-)CBS overlaps with work on CSP and SAT, a connection that has not been well explored. There are theoretical connections between these fields [Rintanen, 2011] that need more study.

3. Obtaining deeper insights into the influence of the different parameters of a problem on the properties of a MAPF instance. Such deeper insight could, for example, better reveal when the ICTS, CBS and MA-CBS frameworks are valuable and which enhancements will perform best under which circumstances. Such understanding might give rise to new hybrid algorithms as well.


4. Conducting a rigorous theoretical and empirical comparison between all known algorithms and approaches for solving MAPF.

The second path-finding variant discussed in this thesis is Real-Time Agent-Centered Search (RTACS). Chapter 5 presented EDA*. EDA* is the only RTACS algorithm that is, in the worst case, linear in the state space. Experimental results on grids supported this theoretical claim. Future work on RTACS will concentrate on two main objectives:

5. EDA* may visit exponentially more states than A*/Dijkstra's algorithm, even in polynomially growing domains. The existence of a RTACS algorithm whose worst-case complexity is linear in the complexity of A* is still an open question.

6. Applying RTACS algorithms to solve Markov Decision Problems (MDPs) is a promising objective that has previously been researched [Dibangoye et al., 2012]. Adapting new RTACS techniques such as EDA*, Depression Avoidance [Hernandez and Baier, 2012] and State Pruning [Sharon et al., 2013b] might have a great impact on the MDP community.


Bibliography

[Agmon et al., 2012] Noa Agmon, Chien-Liang Fok, Yehuda Emaliah, Peter Stone, Christine Julien, and Sriram Vishwanath. On coordination in practical multi-robot patrol. In IEEE International Conference on Robotics and Automation (ICRA), May 2012.

[Amir et al., 2015] Ofra Amir, Guni Sharon, and Roni Stern. Multi-agent pathfinding as a combinatorial auction. In AAAI, 2015.

[Barer et al., 2014] Max Barer, Guni Sharon, Roni Stern, and Ariel Felner. Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In ECAI 2014 - 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014), pages 961–962, 2014.

[Bellman, 1957] Richard Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, USA, 1 edition, 1957.

[Bennewitz et al., 2002] Maren Bennewitz, Wolfram Burgard, and Sebastian Thrun. Finding and optimizing solvable priority schemes for decoupled path planning techniques for teams of mobile robots. Robotics and Autonomous Systems, 41(2):89–99, 2002.

[Bhattacharya et al., 2010] Subhrajit Bhattacharya, Vijay Kumar, and Maxim Likhachev. Distributed optimization with pairwise constraints and its application to multi-robot path planning. In Robotics: Science and Systems, pages 87–94, 2010.

[Bhattacharya et al., 2012] Subhrajit Bhattacharya, Maxim Likhachev, and Vijay Kumar. Topological constraints in search-based robot path planning. Autonomous Robots, 33(3):273–290, 2012.

[Bnaya and Felner, 2014] Zahy Bnaya and Ariel Felner. Conflict-oriented windowed hierarchical cooperative A*. In International Conference on Robotics and Automation (ICRA), 2014.

[Bnaya et al., 2013] Zahy Bnaya, Roni Stern, Ariel Felner, Roie Zivan, and Steven Okamoto. Multi-agent path finding for self interested agents. In Symposium on Combinatorial Search (SOCS), 2013.

[Bonet and Geffner, 2001] Blai Bonet and Hector Geffner. Planning as heuristic search. Artificial Intelligence, 129(1):5–33, 2001.


[Boyarski et al., 2015] Eli Boyarski, Ariel Felner, Roni Stern, Guni Sharon, Oded Betzalel, Solomon Shimony, and David Tolpin. ICBS: Improved conflict-based search algorithm for multi-agent pathfinding. 2015.

[Boyrasky et al., 2015] Eli Boyrasky, Ariel Felner, Guni Sharon, and Roni Stern. Don't split, try to work it out: Bypassing conflicts in multi-agent pathfinding. In Twenty-Fifth International Conference on Automated Planning and Scheduling, 2015.

[Broch et al., 1998] Josh Broch, David A. Maltz, David B. Johnson, Yih-Chun Hu, and Jorjeta Jetcheva. A performance comparison of multi-hop wireless ad hoc network routing protocols. In Proceedings of the International Conference on Mobile Computing and Networking, pages 85–97. ACM, 1998.

[Bulitko and Lee, 2006a] Vadim Bulitko and Greg Lee. Learning in real-time search: A unifying framework. J. Artif. Intell. Res. (JAIR), 25:119–157, 2006.

[Bulitko and Lee, 2006b] Vadim Bulitko and Greg Lee. Learning in real-time search: A unifying framework. J. Artif. Intell. Res. (JAIR), 25:119–157, 2006.

[Bulitko et al., 2008] Vadim Bulitko, Mitja Lustrek, Jonathan Schaeffer, Yngvi Bjornsson, and Sverrir Sigmundarson. Dynamic control in real-time heuristic search. JAIR, 32:419–452, 2008.

[Bulitko, 2004] Vadim Bulitko. Learning for adaptive real-time search. Technical Report http://arxiv.org/abs/cs.AI/0407016, Computer Science Research Repository (CoRR), 2004.

[Burns and Ruml, 2012] Ethan Burns and Wheeler Ruml. Iterative-deepening search with on-line tree size prediction. In LION, pages 1–15, 2012.

[Burns et al., 2013] Ethan Burns, Wheeler Ruml, and Minh Binh Do. Heuristic search when time matters. J. Artif. Intell. Res. (JAIR), 47:697–740, 2013.

[Cheng et al., 2001] Victor H. L. Cheng, Vivek Sharma, and David C. Foyle. A study of aircraft taxi performance for enhancing airport surface traffic control. IEEE Transactions on Intelligent Transportation Systems, 2(2):39–54, 2001.

[Cohen et al., 2013] Benjamin Cohen, Sachin Chitta, and Maxim Likhachev. Single- and dual-arm motion planning with heuristic search. The International Journal of Robotics Research, 2013.

[Cohen et al., 2014] Liron Cohen, Sven Koenig, and Tansel Uras. Using highways for bounded-suboptimal multi-agent path finding. In Eighth International Symposium on Combinatorial Search, 2014.

[Daniel Kornhauser, 1984] Daniel Kornhauser, Gary Miller, and Paul Spirakis. Coordinating pebble motion on graphs, the diameter of permutation groups, and applications. In Symposium on Foundations of Computer Science, pages 241–250. IEEE, 1984.


[de Wilde et al., 2013] Boris de Wilde, Adriaan W. ter Mors, and Cees Witteveen. Push and rotate: cooperative multi-agent path planning. In AAMAS, pages 87–94, 2013.

[Dechter and Pearl, 1985] Rina Dechter and Judea Pearl. Generalized best-first search strategies and the optimality of A*. Journal of the ACM (JACM), 32(3):505–536, 1985.

[Dibangoye et al., 2012] Jilles S. Dibangoye, Christopher Amato, and Arnoud Doniec. Scaling up decentralized MDPs through heuristic search. arXiv preprint arXiv:1210.4865, 2012.

[Dijkstra, 1959] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.

[Dresner and Stone, 2008] Kurt M. Dresner and Peter Stone. A multiagent approach to autonomous intersection management. Journal of Artificial Intelligence Research, 31:591–656, 2008.

[Erdem et al., 2013] Esra Erdem, Doga G. Kisa, Umut Oztok, and Peter Schueller. A general formal framework for pathfinding problems with multiple agents. In AAAI, 2013.

[Erdmann and Lozano-Perez, 1987] Michael Erdmann and Tomas Lozano-Perez. On multiple moving objects. Algorithmica, 2(1-4):477–521, 1987.

[Felner et al., 2004] Ariel Felner, Roni Stern, Sarit Kraus, Asaph Ben-Yair, and Nathan S. Netanyahu. PHA*: finding the shortest path with A* in an unknown physical environment. Journal of Artificial Intelligence Research, 21:631–670, 2004.

[Felner et al., 2012] Ariel Felner, Meir Goldenberg, Guni Sharon, Roni Stern, Tal Beja, Nathan R. Sturtevant, Jonathan Schaeffer, and Robert Holte. Partial-expansion A* with selective node generation. In AAAI, 2012.

[Felner, 2006] Ariel Felner. Solving the graph-partitioning problem with heuristic search. Annals of Mathematics and Artificial Intelligence, 67:19–39, 2006.

[Ferner et al., 2013a] Cornelia Ferner, Glenn Wagner, and Howie Choset. ODrM*: optimal multirobot path planning in low dimensional search spaces. In International Conference on Robotics and Automation (ICRA), pages 3854–3859, 2013.

[Ferner et al., 2013b] Cornelia Ferner, Glenn Wagner, and Howie Choset. ODrM*: optimal multirobot path planning in low dimensional search spaces. In International Conference on Robotics and Automation (ICRA), pages 3854–3859, 2013.

[Geisberger et al., 2012] Robert Geisberger, Peter Sanders, Dominik Schultes, and Christian Vetter. Exact routing in large road networks using contraction hierarchies. Transportation Science, 46(3):388–404, 2012.

[Gilboa et al., 2006] Arnon Gilboa, Amnon Meisels, and Ariel Felner. Distributed navigation in an unknown physical environment. In AAMAS, pages 553–560. ACM, 2006.


[Goldenberg et al., 2012] Meir Goldenberg, Ariel Felner, Roni Stern, and Jonathan Schaeffer. A* variants for optimal multi-agent pathfinding. In Symposium on Combinatorial Search (SOCS), 2012.

[Goldenberg et al., 2014] Meir Goldenberg, Ariel Felner, Roni Stern, Guni Sharon, Nathan R. Sturtevant, Robert C. Holte, and Jonathan Schaeffer. Enhanced partial expansion A*. Journal of Artificial Intelligence Research, 50:141–187, 2014.

[Grady et al., 2011] Devin K. Grady, Kostas E. Bekris, and Lydia E. Kavraki. Asynchronous distributed motion planning with safety guarantees under second-order dynamics. In Algorithmic Foundations of Robotics IX, pages 53–70. Springer, 2011.

[Guizzo, 2008] Erico Guizzo. Three engineers, hundreds of robots, one warehouse. IEEE Spectrum, 45(7):26–34, 2008.

[Hagelback and Johansson, 2008] Johan Hagelback and Stefan J. Johansson. Using multi-agent potential fields in real-time strategy games. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2, pages 631–638. International Foundation for Autonomous Agents and Multiagent Systems, 2008.

[Hart et al., 1968] Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. on Systems Science and Cybernetics, 4(2):100–107, 1968.

[Helmert, 2008] Malte Helmert. Understanding Planning Tasks: Domain Complexity and Heuristic Decomposition, volume 4929. Springer, 2008.

[Hernandez and Baier, 2012] Carlos Hernandez and Jorge A. Baier. Avoiding and escaping depressions in real-time heuristic search. J. Artif. Intell. Res. (JAIR), 43:523–570, 2012.

[Jansen and Sturtevant, 2008] Renee Jansen and Nathan R. Sturtevant. A new approach to cooperative pathfinding. In AAMAS, pages 1401–1404, 2008.

[Khorshid et al., 2011] Mokhtar M. Khorshid, Robert C. Holte, and Nathan R. Sturtevant. A polynomial-time algorithm for non-optimal multi-agent pathfinding. In Symposium on Combinatorial Search (SOCS), 2011.

[Koenig and Likhachev, 2006] Sven Koenig and Maxim Likhachev. Real-time adaptive A*. In AAMAS, pages 281–288, 2006.

[Koenig and Sun, 2009] Sven Koenig and Xiaoxun Sun. Comparing real-time and incremental heuristic search for real-time situated agents. Autonomous Agents and Multi-Agent Systems, 18(3):313–341, 2009.

[Koenig, 1992] Sven Koenig. The complexity of real-time search. Technical Report CMU-CS-92-145, School of Computer Science, Carnegie Mellon University, Pittsburgh, 1992.

[Koenig, 2001] Sven Koenig. Agent-centered search. AI Magazine, 22(4):109–132, 2001.


[Koenig, 2004] Sven Koenig. A comparison of fast search methods for real-time situated agents. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, pages 864–871, 2004.

[Korf and Taylor, 1996] Richard E. Korf and Larry A. Taylor. Finding optimal solutions to the twenty-four puzzle. In AAAI, pages 1202–1207, 1996.

[Korf et al., 2001] Richard E. Korf, Michael Reid, and Stefan Edelkamp. Time complexity of iterative-deepening-A*. Artif. Intell., 129(1-2):199–218, 2001.

[Korf, 1985] Richard E. Korf. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27(1):97–109, 1985.

[Korf, 1990] Richard E. Korf. Real-time heuristic search. Artif. Intell., 42(2-3):189–211, 1990.

[Korf, 1997] Richard E. Korf. Finding optimal solutions to Rubik's cube using pattern databases. In AAAI/IAAI, pages 700–705, 1997.

[Korf, 2009] Richard E. Korf. Multi-way number partitioning. In IJCAI, pages 538–543, 2009.

[LaValle and Hutchinson, 1998] Steven M. LaValle and Seth A. Hutchinson. Optimal motion planning for multiple robots having independent goals. Robotics and Automation, 14(6):912–925, 1998.

[Lelis et al., 2013] Levi H. S. Lelis, Sandra Zilles, and Robert C. Holte. Predicting the size of IDA*'s search tree. Artif. Intell., 196:53–76, 2013.

[Li and Fan, 2011] Q. Li and H. S. L. Fan. A simulation model for detecting vessel conflicts within a seaport. Navigational Systems and Simulators: Marine Navigation and Safety of Sea Transportation, page 133, 2011.

[Likhachev et al., 2008] Maxim Likhachev, Dave Ferguson, Geoff Gordon, Anthony Stentz, and Sebastian Thrun. Anytime search in dynamic graphs. Artif. Intell., 172:1613–1643, September 2008.

[Luna and Bekris, 2011] Ryan Luna and Kostas E. Bekris. Efficient and complete centralized multi-robot path planning. In Intelligent Robots and Systems (IROS), pages 3268–3275, 2011.

[Mohr and Henderson, 1986] Roger Mohr and Thomas C. Henderson. Arc and path consistency revisited. Artificial Intelligence, 28(2):225–233, 1986.

[Pallottino et al., 2007] Lucia Pallottino, Vincenzo Giovanni Scordio, Antonio Bicchi, and Emilio Frazzoli. Decentralized cooperative policy for conflict resolution in multivehicle systems. Robotics, 23(6):1170–1183, 2007.


[Peasgood et al., 2006] Mike Peasgood, John McPhee, and Christopher M. Clark. Complete and scalable multi-robot planning in tunnel environments. Computer Science and Software Engineering, page 75, 2006.

[Pochter et al., 2009] Nir Pochter, Aviv Zohar, and Jeffrey S. Rosenschein. Using swamps to improve optimal pathfinding. In Carles Sierra, Cristiano Castelfranchi, Keith S. Decker, and Jaime Simao Sichman, editors, AAMAS (2), pages 1163–1164. IFAAMAS, 2009.

[Rintanen, 2011] Jussi Rintanen. Planning with SAT, admissible heuristics and A*. In IJCAI, pages 2015–2020, 2011.

[Roger and Helmert, 2012] Gabriele Roger and Malte Helmert. Non-optimal multi-agent pathfinding is solved (since 1984). In Symposium on Combinatorial Search (SOCS), 2012.

[Ruml and Do, 2007] Wheeler Ruml and Minh Binh Do. Best-first utility-guided search. In IJCAI, pages 2378–2384, 2007.

[Russell and Norvig, 2010] Stuart J. Russell and Peter Norvig. Artificial Intelligence - A Modern Approach (3. internat. ed.). Pearson Education, 2010.

[Ryan, 2008] Malcolm R. K. Ryan. Exploiting subgraph structure in multi-robot path planning. Journal of Artificial Intelligence Research, 31:497–542, 2008.

[Ryan, 2010] Malcolm R. K. Ryan. Constraint-based multi-robot path planning. In International Conference on Robotics and Automation (ICRA), pages 922–928, 2010.

[Sajid et al., 2012] Qandeel Sajid, Ryan Luna, and Kostas E. Bekris. Multi-agent pathfinding with simultaneous execution of single-agent primitives. In Symposium on Combinatorial Search (SOCS), 2012.

[Sarkar et al., 1991] Uttam K. Sarkar, Partha P. Chakrabarti, Sujoy Ghose, and S. C. De Sarkar. Reducing reexpansions in iterative-deepening search by controlling cutoff bounds. Artif. Intell., 50(2):207–221, 1991.

[Sharon et al., 2011a] Guni Sharon, Roni Stern, Meir Goldenberg, and Ariel Felner. The increasing cost tree search for optimal multi-agent pathfinding. In IJCAI, pages 662–667, 2011.

[Sharon et al., 2011b] Guni Sharon, Roni Stern, Meir Goldenberg, and Ariel Felner. Pruning techniques for the increasing cost tree search for optimal multi-agent pathfinding. In Symposium on Combinatorial Search (SOCS), 2011.

[Sharon et al., 2012a] Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. Conflict-based search for optimal multi-agent path finding. In AAAI, 2012.

[Sharon et al., 2012b] Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. Meta-agent conflict-based search for optimal multi-agent path finding. In Symposium on Combinatorial Search (SOCS), 2012.


[Sharon et al., 2013a] Guni Sharon, Roni Stern, Meir Goldenberg, and Ariel Felner. The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence, 195:470–495, 2013.

[Sharon et al., 2013b] Guni Sharon, Nathan R. Sturtevant, and Ariel Felner. Online detection of dead states in real-time agent-centered search. In SOCS, 2013.

[Sharon et al., 2014] Guni Sharon, Ariel Felner, and Nathan Sturtevant. Exponential deepening A* for real-time agent-centered search. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[Sharon et al., 2015] Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 219:40–66, 2015.

[Sharon, 2014] Guni Sharon. Partial domain search tree for constraint-satisfaction problems. In Eighth International Symposium on Combinatorial Search, 2014.

[Shiloni et al., 2009] Asaf Shiloni, Noa Agmon, and Gal A. Kaminka. Of robot ants and elephants. In AAMAS '09: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, pages 81–88, Richland, SC, 2009. International Foundation for Autonomous Agents and Multiagent Systems.

[Shimbo and Ishida, 2003] Masashi Shimbo and Toru Ishida. Controlling the learning process of real-time heuristic search. AIJ, 146(1):1–41, 2003.

[Shue et al., 2001] L.-Y. Shue, S.-T. Li, and R. Zamani. An intelligent heuristic algorithm for project scheduling problems. In 32nd Annual Meeting of the Decision Sciences Institute, 2001.

[Silver, 2005] David Silver. Cooperative pathfinding. In Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pages 117–122, 2005.

[Srinivasan et al., 1990] Arvind Srinivasan, Timothy Ham, Sharad Malik, and Robert K. Brayton. Algorithms for discrete function manipulation. In International Conference on Computer Aided Design (ICCAD), pages 92–95, 1990.

[Standley and Korf, 2011] Trevor S. Standley and Richard E. Korf. Complete algorithms for cooperative pathfinding problems. In IJCAI, pages 668–673, 2011.

[Standley, 2010] Trevor S. Standley. Finding optimal solutions to cooperative pathfinding problems. In AAAI, 2010.

[Stern et al., 2010] Roni Stern, Tamar Kulberis, Ariel Felner, and Robert Holte. Using lookaheads with optimal best-first search. In AAAI, 2010.

[Sturtevant and Bulitko, 2011] Nathan R. Sturtevant and Vadim Bulitko. Learning where you are going and from whence you came: h- and g-cost learning in real-time heuristic search. In International Joint Conference on Artificial Intelligence (IJCAI), pages 365–370, 2011.


[Sturtevant and Buro, 2006] Nathan R. Sturtevant and Michael Buro. Improving collaborative pathfinding using map abstraction. In Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pages 80–85, 2006.

[Sturtevant and Geisberger, 2010] Nathan R. Sturtevant and Robert Geisberger. A comparison of high-level approaches for speeding up pathfinding. In Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2010.

[Sturtevant et al., 2010] Nathan R. Sturtevant, Vadim Bulitko, and Yngvi Bornsson. On learning in agent-centered search. In AAMAS, pages 333–340, 2010.

[Sturtevant, 2012] Nathan R. Sturtevant. Benchmarks for grid-based pathfinding. Computational Intelligence and AI in Games, 4(2):144–148, 2012.

[Sun et al., 2009] Xiaoxun Sun, William Yeoh, Po-An Chen, and Sven Koenig. Simple optimization techniques for A*-based search. In AAMAS, pages 931–936, 2009.

[Surynek, 2012] Pavel Surynek. Towards optimal cooperative path planning in hard setups through satisfiability solving. In The Pacific Rim International Conference on Artificial Intelligence (PRICAI), pages 564–576, 2012.

[Tolpin, 2014] David Tolpin. Justifying and improving meta-agent conflict-based search. arXiv preprint arXiv:1410.6519, 2014.

[Vempaty et al., 1991] Nageshwara Rao Vempaty, Vipin Kumar, and Richard E. Korf. Depth-first versus best-first search. In AAAI, pages 434–440, 1991.

[Wagner and Choset, 2011] Glenn Wagner and Howie Choset. M*: A complete multirobot path planning algorithm with performance bounds. In Intelligent Robots and Systems (IROS), pages 3260–3267, 2011.

[Wang and Botea, 2008] Ko-Hsin Cindy Wang and Adi Botea. Fast and memory-efficient multi-agent pathfinding. In ICAPS, pages 380–387, 2008.

[Wang and Botea, 2011] Ko-Hsin Cindy Wang and Adi Botea. MAPP: a scalable multi-agent path planning algorithm with tractability and completeness guarantees. Journal of Artificial Intelligence Research, 42(1):55–90, 2011.

[Yu and LaValle, 2012] Jingjin Yu and Steven M. LaValle. Multi-agent path planning and network flow. In Algorithmic Foundations of Robotics X - Proceedings of the Tenth Workshop on the Algorithmic Foundations of Robotics, WAFR 2012, MIT, Cambridge, Massachusetts, USA, June 13-15 2012, pages 157–173, 2012.

[Yu and LaValle, 2013a] Jingjin Yu and Steven M. LaValle. Planning optimal paths for multiple robots on graphs. In International Conference on Robotics and Automation (ICRA), pages 3612–3617, 2013.

[Yu and LaValle, 2013b] Jingjin Yu and Steven M. LaValle. Structure and intractability of optimal multi-robot path planning on graphs. In AAAI, 2013.


[Yu and Rus, 2014] Jingjin Yu and Daniela Rus. Pebble motion on graphs with rotations: Efficient feasibility tests and planning algorithms. In Eleventh Workshop on the Algorithmic Foundations of Robotics, 2014.

[Zahavi et al., 2010] Uzi Zahavi, Ariel Felner, Neil Burch, and Robert C. Holte. Predicting the performance of IDA* using conditional distributions. J. Artif. Intell. Res. (JAIR), 37:41–83, 2010.

[Zhang and Korf, 1995] W. Zhang and R. E. Korf. Performance of linear-space search algorithms. Artificial Intelligence, 79:241–292, 1995.

[Zhang and Yap, 2001] Yuanlin Zhang and Roland H. C. Yap. Making AC-3 an optimal algorithm. In IJCAI, pages 316–321, 2001.


Appendix A

Memory Restricted CBS

Many recently presented optimal MAPF solvers require an exponential amount of memory. For A* variants, the memory is used to store OPEN and CLOSED. For ICTS, the memory is used for the ICT. Both are exponential (ICTS in ∆ and A* variants in k). In domains with no transpositions, this memory problem can be easily solved by running Iterative-Deepening A* (IDA*) [Korf, 1985]. However, the efficiency of IDA* degrades substantially as it encounters more duplicate nodes. In domains such as maps and grids – the most common domains for MAPF – duplicate nodes are very frequent. For example, in a 4-connected grid the number of unique states within radius r of a given location is O(r^2) but the number of paths of length r is O(4^r). Thus, all previous optimal MAPF solvers were memory-intensive. Next, we describe how MA-CBS can be modified to be an effective optimal solver that requires memory of size O(k · C* · |V|), that is, the product of the number of agents, k, the optimal solution cost, C*, and the size of the input graph, |V|.
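To make the states-versus-paths gap concrete, the following short sketch (an illustration we add; the function names are ours) counts both quantities on an unbounded 4-connected grid:

```python
from itertools import product

def states_within_radius(r):
    """Count unique cells reachable in at most r moves on an unbounded
    4-connected grid: all offsets (dx, dy) with |dx| + |dy| <= r.
    Closed form: 2r^2 + 2r + 1, i.e., O(r^2)."""
    return sum(1 for dx, dy in product(range(-r, r + 1), repeat=2)
               if abs(dx) + abs(dy) <= r)

def paths_of_length(r):
    """Count distinct move sequences of length r (revisits allowed):
    4 choices per step, hence 4**r."""
    return 4 ** r

for r in (2, 5, 10):
    print(r, states_within_radius(r), paths_of_length(r))
```

Already at r = 10 there are only 221 unique states but over a million move sequences, which is why IDA* without duplicate detection struggles on grids.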

Figure A.1: Example MAPF instance where CBS has cycles.

Unlike A* and its variants, CBS searches a conceptually different state space – the constraint space. Nevertheless, duplicate states may be encountered in this space too. Figure A.1 shows a MAPF instance (left) and a corresponding CT with a duplicate state (right). In this problem there are three agents. Each has a starting location, Si, and a goal location, Gi. The root of the CT has no constraints and the solution found for agents (a1, a2, a3) is 〈S1, A, G1〉, 〈S2, A, G2〉, 〈S3, B, G3〉. The root contains a conflict (a1, a2, A, 1) and a new CT node N1 is created with the constraint (a1, A, 1). N1 contains the conflict (a1, a3, B, 1). Therefore another CT node N2 is created with the constraint (a2, A, 1), but it contains the conflict (a2, a3, B, 1). Next, CT node N1 is expanded, creating CT node N3 with the constraints (a1, A, 1), (a3, B, 1). Then, CT node N2 is expanded, generating CT node N4 with the constraints (a2, A, 1), (a3, B, 1). CT node N3 creates CT node N5 with the constraints (a1, A, 1), (a3, B, 1), (a2, A, 1). CT node N4 creates CT node N6 with the constraints (a2, A, 1), (a3, B, 1), (a1, A, 1). As can be seen, CT nodes N5 and N6 contain the same set of constraints and are thus duplicates.

Even though the above example proves that duplicates may exist in the constraint tree, in practice, on 4-connected grids, we encountered very few duplicates. We ran all the experiments reported in this chapter using a duplicate detection mechanism. The results, with and without duplicate detection, were almost identical in all parameters due to the small number of duplicates. Consequently, we experimented with Iterative-Deepening as the high-level search algorithm. Iterative-Deepening is a depth-first search that requires memory linear in the depth of the solution. The resulting average run time was about 12% higher for Iterative-Deepening compared to best-first search as the high-level solver.
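Such a duplicate-detection mechanism only needs to hash the constraint set of each CT node, since a CT node is fully characterized by its (unordered) constraints. A minimal sketch (the names are ours, not the thesis implementation):

```python
def is_duplicate(ct_node_constraints, seen):
    """Return True if this set of constraints was already generated.
    A constraint is a tuple (agent, vertex, timestep); order of the
    constraints within a CT node does not matter, hence the frozenset."""
    key = frozenset(ct_node_constraints)
    if key in seen:
        return True
    seen.add(key)
    return False

seen = set()
# N5 and N6 from Figure A.1 carry the same constraints in different order:
n5 = [("a1", "A", 1), ("a3", "B", 1), ("a2", "A", 1)]
n6 = [("a2", "A", 1), ("a3", "B", 1), ("a1", "A", 1)]
assert not is_duplicate(n5, seen)   # first occurrence is kept
assert is_duplicate(n6, seen)       # N6 is pruned as a duplicate
```

Keeping `seen` as a hash set makes each check O(1) expected time at the cost of storing one key per generated CT node.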

A.1 Domains with Many Duplicate States

Despite the fact that 4-connected grids have very few duplicate states, other domains such as random graphs may contain many duplicates. For these domains DFID will be inefficient as the high-level solver. To solve this problem we developed a new CT branching technique which completely prevents duplicates. In CBS, when a conflict (a1, a2, v, t) is found, the node is split into two children, adding the constraint (a1, v, t) in one child and (a2, v, t) in the other child. For the memory-efficient variant we define a new, positive constraint, denoted (ai, v, t)+, which means that agent ai must be at location v at time step t. Now, when a conflict (a1, a2, v, t) is found, three children are generated. The first child adds the constraints (a1, v, t), (a2, v, t)+, i.e., a1 cannot be located at v at time t but a2 must be at v at time t. The second adds the constraints (a2, v, t), (a1, v, t)+, i.e., a2 cannot be located at v at time t but a1 must be at v at time t. The third child adds the constraints (a1, v, t), (a2, v, t), i.e., neither a1 nor a2 is allowed to be at v at time t.

This mechanism prevents the occurrence of duplicates while still maintaining optimality and completeness. For memory-restricted environments and domains that contain many duplicates we suggest using this formalization along with DFID as the high-level CT search algorithm.
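The three-way split can be sketched as follows. The four-tuple with a `must` flag is our rendering of the positive "must be at" constraint; the actual solver's data structures may differ:

```python
def split_three_ways(parent_constraints, conflict):
    """Generate the three children for a conflict (a1, a2, v, t).
    A constraint is (agent, vertex, timestep, must): must=False forbids
    the agent from being at the vertex at that time, must=True requires it."""
    a1, a2, v, t = conflict
    return [
        # a1 forbidden, a2 required at (v, t)
        parent_constraints | {(a1, v, t, False), (a2, v, t, True)},
        # a2 forbidden, a1 required at (v, t)
        parent_constraints | {(a2, v, t, False), (a1, v, t, True)},
        # both forbidden at (v, t)
        parent_constraints | {(a1, v, t, False), (a2, v, t, False)},
    ]

children = split_three_ways(frozenset(), ("a1", "a2", "A", 1))
# The three constraint sets are pairwise different, so no two CT nodes
# generated this way can ever carry identical constraint sets.
assert len({frozenset(c) for c in children}) == 3
```

Because the three children partition the ways of resolving the conflict, the resulting CT is a tree of distinct constraint sets and DFID needs no duplicate table.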

A.2 Polynomial Domains with Transpositions

The theoretical analysis provided above in Section 5.4.2 assumed no transpositions (or that the state space grows polynomially even when transpositions exist and DD is not performed). The experimental section showed that the general tendency that was proven is evident in real-world domains (even though they might have transpositions). To complement this, in this section we theoretically deal with cases where transpositions exist. We first show an extreme example where EDA* with C = 2 might be imbalanced due to transpositions. Then, we prove that for each domain there exists a constant C such that EDA* will be balanced.

Figure A.2: Pathological example where EDA* with C = 2 might be imbalanced.

A.2.1 Extreme Case Example

When transpositions exist, since DD is performed and re-opening is not allowed, EDA* with a unique C value might be imbalanced. This phenomenon occurs in a graph with exponentially growing cycles. As an example of such a case, consider EDA* with C = 2 where the initial position of the agent is vertex 0 in the graph depicted in Figure A.2. Each solid edge is of weight one. Dotted paths represent a chain of vertices, where the number below each dotted path is the length of the path. That is, a dotted path of length l includes l − 1 intermediate vertices (not shown in the figure) and l edges of cost one. The nodes are labeled with the length of the optimal path to them (via the top path). We note that the lower detour from node i to node i + 1 is of length 2^i. Assume the goal is located at distance x from node 0 and that x = 2^k.

Consider an EDA* run with thresholds T = 1, 2, 4, . . . , 2^k = x. Assume that in all iterations except the final iteration (i.e., the first k − 1 iterations) ties are broken in favor of the bottom paths. Let i be the index of a representative iteration from that group. In that iteration EDA* will visit 2^i nodes via the lower path. This includes all i nodes from the upper path. For example, consider iteration 3 (T = 8). In that iteration, EDA* will visit 8 nodes from the bottom path and this also includes node 3, which is at distance 7 along the bottom path. It will also include one node from the bottom path from node 3 to node 4. EDA* will now backtrack and, since we perform DD, it will not traverse more than one edge in the upper path. The total number of nodes visited in all k − 1 iterations is O(2^k) = O(x).

Now consider the last iteration k where T = 2^k = x. Assume that in this iteration ties are broken in favor of the upper path. So, EDA* will follow the upper path until it reaches the goal node at distance x. In addition, when backtracking and arriving at node j, EDA* will visit x − j of the 2^j nodes in the detour that connects node j with node j + 1. So, in practice it will visit min(x − j, 2^j) nodes on that detour. For the second half of the nodes in the upper chain (where x/2 = 2^(k−1) ≤ j < 2^k = x) it holds that x − j < 2^j. Therefore the total number of new visits in the second half of the bottom detours is Σ_{j=x/2}^{x} (x − j) = O(x^2).
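This quadratic count can be checked numerically. The following small verification sketch is ours, not part of the thesis experiments:

```python
def final_iteration_detour_visits(x):
    """New nodes visited on the bottom detours during the final EDA*
    iteration (threshold T = x): min(x - j, 2**j) fresh nodes on the
    detour leaving node j, summed over all detours."""
    return sum(min(x - j, 2 ** j) for j in range(x))

for k in (6, 8, 10):
    x = 2 ** k
    total = final_iteration_detour_visits(x)
    # The second-half terms alone already contribute about x^2 / 8,
    # so F = Theta(x^2) while the first k - 1 iterations cost only O(x).
    assert (x * x) // 8 <= total <= x * x
    print(x, total)
```

The loose bounds x^2/8 ≤ total ≤ x^2 are enough to witness the Θ(x^2) growth that makes this instance imbalanced for C = 2.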

122

Page 137: Novel Search Techniques for Path Finding in Complex ...guni/Papers/Thesis-Sharon.pdf3.3.2 3 3 Grid ... 4.5.5 The Low-Level Solver ... 5.6 An example 4-connected grid with an agent

Therefore F = Θ(x^2) while R = Θ(x). In this example R ≪ F and condition 2 is violated. Therefore, EDA* with C = 2 is imbalanced. However, we prove next that there must exist a C value for which EDA* is balanced.

A.2.2 EDA* is Balanced Even When Transpositions Exist

We now deal with the case where transpositions exist and present the following theorem: for any polynomial domain, even with transpositions, there exists a value C > 1 such that EDA* is balanced. We prove this below under the following assumptions:

• Constant dimensionality - the rate at which the polynomial domain grows (k) is fixed.

• Fixed tie-breaking policy - each time a given state is visited, all its unvisited neighbors (candidates for the next step) will be prioritized in the same order.

• No heuristic guidance - in order to simplify the worst-case analysis we assume that for each state s, h(s) = 0.

• All edges cost 1 - this is, again, an assumption that helps simplify the analysis below.

Lemma 9 For a given iteration with threshold T, EDA* will visit no more than T^k states.

Proof: Given a threshold T, the agent can visit all states with f ≤ T. The number of such states is maximized when h = 0 for all states. In that case all states with g ≤ T can be visited. According to the definition of a polynomial domain there are T^k states with g ≤ T.

Lemma 10 For a given iteration with threshold T, no fewer than T states will be visited by EDA*.

Proof: Within each iteration we classify all states to three categories:

1. Undiscovered - states that were never visited by the agent in the current iteration.

2. Open - states that were visited once, i.e., the agent never backtracked from them.

3. Closed - states that were visited twice, i.e., the agent backtracked from them.

Assume a path p = c_0, c_1, ..., c_T (where c_0 is the starting state) of length T exists, i.e., the agent is able to travel to depth T. If this is not the case, the agent will necessarily visit the entire state space and will reach the goal. At each iteration at least one state from p must be open. Let c_i be the state with the largest index i that is both open and in p. If c_i.g = T then the agent traversed at least T states, since c_0.g = 0, increasing the g value requires traversing one edge, and DD is applied. Once the iteration concludes, c_i.g must be equal to T. Assume, in contradiction to the previous claim, that the process halted with c_i.g < T. This is impossible since, when backtracking via c_i, the agent must continue to state c_{i+1}, where it will update the g value to c_{i+1}.g = c_i.g + 1.
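A minimal sketch of this lower bound (the chain graph, unit edge costs, and h = 0 are illustrative assumptions, not the thesis's benchmark domains): a depth-bounded traversal with threshold T on a sufficiently long chain must visit at least T states before the iteration can end.

```python
# Depth-bounded DFS on a chain graph 0 - 1 - 2 - ... with unit edges
# and h = 0, so f = g and the threshold T caps the reachable depth.
def dfs_visits(chain_length, T):
    visited = set()

    def dfs(node, g):
        visited.add(node)
        if g < T and node + 1 < chain_length:
            dfs(node + 1, g + 1)

    dfs(0, 0)
    return len(visited)

print(dfs_visits(1000, 50))  # 51: the start state plus one state per step up to g = T
```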


Lemma 11 The number of states visited in each iteration (i) grows exponentially with i.

Proof: According to Lemmas 9 and 10, the number of states visited in each iteration i is between T and T^k, i.e., T ≤ visited(i) ≤ T^k. In EDA*, T = C^i, i.e., T grows exponentially with i. Consequently, the number of states visited in each iteration is between two functions that are exponential in i, i.e., C^i ≤ visited(i) ≤ (C^k)^i (recall that C^k is a constant).
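The sandwich can be tabulated directly (the constants C = 2 and k = 2 are assumed for illustration): both bounds on the per-iteration visit count are exponential in i.

```python
# Illustrative check of Lemma 11: with T = C^i, the visit count of
# iteration i lies between C^i (Lemma 10) and (C^k)^i (Lemma 9).
C, k = 2, 2
for i in range(1, 6):
    T = C ** i                 # EDA* threshold at iteration i
    lower, upper = T, T ** k   # Lemma 10 and Lemma 9 bounds
    assert upper == (C ** k) ** i  # the upper bound is exponential in i too
    print(i, lower, upper)
```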

Lemma 12 For any constant C > 1, condition 1 (R = O(F)) must be satisfied.¹

Proof: The number of states visited in each iteration grows exponentially (Lemma 11), meaning that the last iteration (N_last) dominates the complexity of all previous iterations (N_prev), i.e., N_prev = O(N_last). Since R = N_prev (Lemma 5) and F = N_last (Lemma 4), R = O(F).
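The dominance step rests on the standard geometric-series argument. As a sketch, under the simplifying assumption (not made explicit in the text) that visited(i) grows exactly geometrically, visited(i) = a r^i for some r > 1:

```latex
\sum_{i=1}^{n-1} a\,r^{i}
  \;=\; a\,\frac{r^{n} - r}{r - 1}
  \;<\; \frac{a\,r^{n}}{r - 1}
  \;=\; \frac{\mathit{visited}(n)}{r - 1}
  \;=\; O\bigl(\mathit{visited}(n)\bigr),
```

so the total cost of all previous iterations is within a constant factor of the last one, giving N_prev = O(N_last).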

Lemma 13 $\lim_{C \to 1} R = \infty$

Proof: Assume that the difference between the heuristic value of the initial state (s.h) and the cost of the optimal solution (s.h*) is s.h* − s.h = ∆ > 0.² Reaching the goal requires T to be at least the cost of the optimal solution (h*). Since T = C^i, we get that reaching the goal is possible for the first time in iteration i = log_C(h*). Consequently, as C decreases toward 1, i increases to infinity. Since each iteration visits no fewer than one state, R increases to infinity with i.
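A numeric illustration of this limit (h_star = 100 and the sampled C values are assumptions for the example): the index of the first iteration whose threshold reaches the optimal cost grows without bound as C approaches 1.

```python
import math

# Lemma 13 in numbers: T = C^i reaches h_star for the first time at
# i = ceil(log_C(h_star)); this index blows up as C -> 1.
h_star = 100
for C in (2.0, 1.5, 1.1, 1.01, 1.001):
    i = math.ceil(math.log(h_star, C))
    print(C, i)
```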

Theorem 6 For any problem instance there exists a C value for which EDA* is balanced.

Proof:

• Condition 1 (R = O(F)) - satisfied for any constant C > 1 according to Lemma 12.

• Condition 2 (F = O(R)) - since R can grow as much as needed (Lemma 13) and F is bounded by the size of the state space, there must exist a C value for which R ≥ F.

¹ Condition 1 may be violated by EDA* in logarithmically growing domains, which are not in the scope of this thesis.
² If ∆ = 0 there might be only one iteration and R will be equal to zero.
