
A study on n-puzzle solving search algorithms*

Kadir Firat Uyanik
Electrical and Electronics Engineering Department,
Middle East Technical University
[email protected]

Abstract— Classical AI search algorithms have been tested on various problems, such as 8-queens, the traveling salesman problem, automatic assembly sequencing, and even robot navigation. In this study, several search algorithms are compared on the n-puzzle problem in terms of their computational complexity and memory usage under different board configuration scenarios (e.g. board size, tile placement, etc.).

I. INTRODUCTION

Designing a software agent that can find a reasonable action sequence to reach a particular goal state from an initial state is an intriguing problem. In general, search is the process of examining different possible sequences of actions that lead to a desired state, and choosing the best action sequence to execute when a similar query arises in the future.

Almost all search algorithms share the following elements and functionalities: an initial state, a goal state, a successor function, a goal test, and a path cost.

These are the common attributes and functionalities of the algorithms tested in this study, namely depth-first search (memoizing and iterative-deepening variants), breadth-first search, and A* search (with Manhattan-distance, Euclidean-distance, and misplaced-tiles heuristics).

II. EXPERIMENTAL SETUP

To represent the state of the puzzle, a generic data structure is used (Fig. 1).
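The report's Node is templated so that it can hold any state type. A minimal sketch of such a structure is shown below; the field names are hypothetical illustrations, not the report's actual code.

```cpp
#include <cstddef>

// Minimal sketch of a templated search node. In this problem the
// stored State would be a Board composed of Tile structures.
template <typename State>
struct Node {
    State state;    // the puzzle state held by this node
    Node* parent;   // nullptr for the root node
    int depth;      // g(n): path cost from the root
    double cost;    // f(n) used by the informed searches

    Node(const State& s, Node* p = nullptr, int d = 0, double c = 0.0)
        : state(s), parent(p), depth(d), cost(c) {}
};
```

Keeping a `parent` pointer lets the solution path be reconstructed by walking back from the goal node to the root.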

As with the Node data structure, object-oriented programming principles are used in the design of the search algorithms as well (Fig. 2). The whole software setup consists of more than 3000 lines of C++ code and around 10 classes. It was very important to consider several performance issues during the implementation, such as const-correctness and memory de-allocation.

Since nodes are allocated dynamically from the heap, it is very important to release the memory acquired for a particular search operation back to the system.

*This study is a part of the EE586 Artificial Intelligence course offered by Dr. Afsar Saranli.

Fig. 1: Nodes are the structures that the search algorithms operate on. Since nodes are templated structures, any kind of data can be stored inside them. In this problem, nodes hold Boards, which are in turn composed of Tile structures.

Fig. 2: All of the search algorithms share similar functionality, provided by a common search class.

Const-correctness is one of the most important good practices when coding in C++. If a variable is passed by value to a function, a new copy of the variable is created in the function's local scope, so passing the variable by reference avoids the copy. However, one should be careful when objects or data are passed by reference, since the value stored at that memory location can easily be changed. To prevent this, the const keyword can be added so that the compiler knows the value will not be modified in that scope, and it raises a compile-time error whenever the code tries to change it.
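As a minimal illustration of this practice (the function name and board encoding are hypothetical, not taken from the report's code), a large object passed by const reference incurs no copy, and the compiler rejects any attempted modification:

```cpp
#include <vector>

// Passing by const reference: no copy is made, and any attempt to
// modify 'board' inside the function is a compile-time error.
int countTiles(const std::vector<int>& board) {
    // board.push_back(0);  // would not compile: board is const
    return static_cast<int>(board.size());
}
```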

III. EXPERIMENTS

Several tools are used to measure memory usage, such as Valgrind with its Massif heap profiler, together with the Linux system clock, which is precise at the nanosecond scale. To show how the different search algorithms use the heap and stack regions of memory over time, several experiments have been conducted on the board configuration given in the homework sheet, that is:

initial =
3 4 6
1 0 8
7 2 5

goal =
1 2 3
4 5 6
7 8 0

Memory usage and computation time of the search algorithms are given in the following figures. Please note that when Valgrind is used to measure memory usage (configured with max. 100 snapshots at 10 Hz resolution), execution time increases to approximately 50 times the actual time. However, the actual execution times are also given in the figures, in addition to the number of nodes expanded and the number of nodes on the optimal path, which is found by BFS or the A* searches, both known to be optimal.
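The report does not show its timing code; one plausible sketch of taking nanosecond-resolution timings with the Linux system clock (via POSIX `clock_gettime`, an assumption about the mechanism used) is:

```cpp
#include <ctime>
#include <cstdint>

// Elapsed nanoseconds between two timespec readings, e.g. taken with
// clock_gettime(CLOCK_MONOTONIC, &t) before and after a search run.
int64_t elapsedNs(const timespec& start, const timespec& end) {
    return (end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```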

In general, an algorithm expands a node according to some selection criterion and decides what to do next by evaluating this node. The way nodes are evaluated is the most critical part. For instance, the depth-first search (DFS) algorithm expands nodes so that the depth of the search tree increases at each iteration, whereas breadth-first search (BFS) expands all the successors of a particular node first.

Since depth-first search goes deep down the solution tree, if no duplicate-state check is done, DFS will most likely get stuck in an infinite loop and never find the goal. To avoid this problem, all expanded nodes are stored in an expanded list, and each candidate node is first checked against this list before being expanded (though the very first check is whether it is the goal).
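The goal-test-then-duplicate-check order described above can be sketched as follows; the flat-vector state encoding and function name are assumptions for illustration, not the report's actual interface:

```cpp
#include <set>
#include <vector>

// A state is expanded only if it is not the goal and has not been
// seen before; otherwise the search skips it (or terminates on goal).
using State = std::vector<int>;

bool shouldExpand(const State& s, const State& goal,
                  std::set<State>& expanded) {
    if (s == goal) return false;          // goal test comes first
    if (expanded.count(s)) return false;  // already expanded: skip
    expanded.insert(s);                   // remember this state
    return true;
}
```

A hash set keyed on the board contents would serve the same purpose with cheaper lookups; an ordered `std::set` is used here only to keep the sketch short.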

Since BFS expands the nodes closest to the starting node first, it is guaranteed to find one of the optimal solutions (there might be several goals at the same depth). DFS, however, is not optimal in most cases. DFS can be made optimal by limiting the maximum search depth at each iteration, which yields the so-called iterative-deepening depth-first search (IDDFS) algorithm. IDDFS behaves similarly to both DFS and BFS, and it finds one of the optimal solutions like BFS. Another modification is to expand the whole graph, save all solution alternatives, and pick the shortest one after expanding the entire search tree, which is what memoizing DFS does.
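The IDDFS driver loop can be sketched as below; the `dls` depth-limited-search callback and its signature are assumptions for illustration, not the report's actual interface:

```cpp
#include <functional>

// IDDFS: run depth-limited DFS with an increasing depth bound until a
// solution is found. 'dls' is a hypothetical depth-limited search that
// returns true when it reaches the goal within the given limit.
bool iddfs(const std::function<bool(int)>& dls, int maxDepth) {
    for (int limit = 0; limit <= maxDepth; ++limit)
        if (dls(limit))   // restart the search with a larger bound
            return true;  // the first success is at minimal depth
    return false;
}
```

Because the bound grows one level at a time, the first bound at which `dls` succeeds is the depth of an optimal solution, which is why IDDFS matches BFS on optimality.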

The search algorithms mentioned above consider only the actual path cost accumulated while reaching a particular node. They expand nodes without considering how much closer they are getting to the goal state. Informed search methods were developed to address this. One of the most famous informed search algorithms is A* (pronounced "A-star"). It takes into account not only the actual distance of a node from the initial state but also the estimated cost of reaching the goal state. If this estimate, the heuristic, is admissible (it never overestimates the distance between a given node and the goal node), the algorithm will find the optimal path. In this study, Manhattan-distance, Euclidean-distance, and number-of-misplaced-tiles heuristics are used.
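A*'s combination of actual and estimated cost amounts to ordering the frontier by f(n) = g(n) + h(n); a minimal sketch (the struct and its fields are hypothetical, not the report's code) is:

```cpp
#include <queue>
#include <vector>

// A*'s frontier is a min-heap ordered on f(n) = g(n) + h(n).
struct SearchNode {
    int g;  // actual cost from the initial state
    int h;  // heuristic estimate of the cost to the goal
    int f() const { return g + h; }
};

struct ByF {
    bool operator()(const SearchNode& a, const SearchNode& b) const {
        return a.f() > b.f();  // smallest f on top of the heap
    }
};

using Frontier =
    std::priority_queue<SearchNode, std::vector<SearchNode>, ByF>;
```

Uninformed BFS is the special case h(n) = 0, where the ordering degenerates to plain path cost.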

Although the Euclidean and Manhattan distance heuristics (A* w/ manhattan) work almost the same way, they outperform the number-of-misplaced-tiles heuristic (A* w/ misplaced). This is because the more informative a heuristic is, the better the algorithm performs and the more reasonable its decisions are. It is therefore expected that A* w/ manhattan will perform better than A* w/ misplaced.
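The two heuristics compared above can be sketched as follows for an n x n board stored as a flat vector with 0 for the blank; this encoding is an assumption, not necessarily the report's representation:

```cpp
#include <cstdlib>
#include <vector>

using Board = std::vector<int>;

// Sum over all tiles of the row + column distance to the tile's
// goal position; the blank is not counted.
int manhattan(const Board& b, const Board& goal, int n) {
    std::vector<int> pos(b.size());  // tile value -> goal index
    for (int i = 0; i < (int)goal.size(); ++i) pos[goal[i]] = i;
    int d = 0;
    for (int i = 0; i < (int)b.size(); ++i) {
        if (b[i] == 0) continue;
        int gi = pos[b[i]];
        d += std::abs(i / n - gi / n) + std::abs(i % n - gi % n);
    }
    return d;
}

// Number of non-blank tiles not on their goal square.
int misplaced(const Board& b, const Board& goal) {
    int m = 0;
    for (std::size_t i = 0; i < b.size(); ++i)
        if (b[i] != 0 && b[i] != goal[i]) ++m;
    return m;
}
```

On the homework board given in Section III, `manhattan` evaluates to 12, matching the 12-step optimal path reported in the figures, while `misplaced` gives only 7, which illustrates why the Manhattan heuristic is the more informative of the two.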

IV. DISCUSSION

In this study, several Monte Carlo simulations were conducted. Randomly generated board configurations (with an a priori known true distance to the goal) have been tested on all of the algorithms. The optimal path length is shown in Fig. 9, the average solution time in Fig. 10, and the total number of expanded nodes (= moves) in Fig. 11.

100 experiments were conducted for each board configuration (true distance in the range 2-14). The results show that the complexity of the informed algorithms (A* w/ manhattan and A* w/ misplaced) grows almost linearly, whereas that of the uninformed search algorithms blows up exponentially due to the branching factor of the problem. Since the average branching factor of the n-puzzle is only around 3, uninformed search algorithms would be rather useless for problems with larger branching factors, such as chess playing.

To sum up, A* w/ manhattan is the best algorithm in terms of execution time, memory usage, and shortest-path-length criteria. To see how it performs for different board sizes, it was tested on 3x3, 5x5, and 7x7 boards with an initial state 19 steps away from the goal state. The results show that A* is not significantly affected by the board size, since the most critical factor is how far a particular state is from the goal state. According to the simulation results, A* uses only 300-400 nodes to reach a goal state that is 19 steps away from the initial state. It still finds the optimal solution, and the execution time is less than 15 seconds. To be more specific, the execution times for the 3x3, 5x5, and 7x7 board configurations are 1.17 sec, 5.2 sec, and 15 sec, and the average numbers of opened nodes are 298.8, 312.95, and 330.34, respectively.

V. CONCLUSIONS AND FUTURE WORK

In this study, I have investigated several search algorithms in the n-puzzle problem domain. DFS turned out to be the worst and most unreliable among the algorithms, whereas the A* algorithm with the Manhattan-distance heuristic performed best with regard to the memory used and the execution time required. The algorithms are implemented in C++ on a Linux box. The memory usage vs. execution time figures were obtained via the open-source Valgrind tool with its Massif extension. Graphs were drawn with Octave's plotting utilities. All of the source code can be obtained from my personal code repository.

As future work, the GUI implementation that I have started with the Qt framework can be finalized, and the search algorithms can be optimized by carefully investigating heap usage, especially when running Monte Carlo simulations. Another important point is that the search algorithms, especially depth-first and breadth-first, can easily be parallelized, and the search programs can be designed so that each subtree of the search tree is solved in a different thread. This is best done by creating thousands of threads on a graphics processing unit. During this study, I had the chance to investigate several parallelization methods implemented with CUDA on NVidia graphics cards.


Fig. 3: BFS algorithm. Actual solution time: 0.981 sec, # nodes expanded: 2233, # nodes expanded in the optimal path: 12.

Fig. 4: IDDFS algorithm. Actual solution time: 0.257 sec, # nodes expanded: 318, # nodes expanded in the optimal path: 12.


Fig. 5: DFS algorithm. Actual solution time: 6457.16 sec, # nodes expanded: 168788, # nodes expanded in the optimal path: 9, but the solution is found at the 67562nd level. Please note that the execution was killed in the Valgrind case, because it takes about two hours even when Valgrind is not active. We can therefore estimate that this run would take approximately 3 or 4 days.

Fig. 6: A* algorithm with the Manhattan-distance heuristic. Actual solution time: 0.016 sec, # nodes expanded: 31, # nodes expanded in the optimal path: 12.


Fig. 7: A* algorithm with the Euclidean-distance heuristic. Actual solution time: 0.018 sec, # nodes expanded: 30, # nodes expanded in the optimal path: 12.

Fig. 8: A* algorithm with the total-number-of-misplaced-tiles heuristic. Actual solution time: 0.0298 sec, # nodes expanded: 113, # nodes expanded in the optimal path: 12.


Fig. 9: All of the algorithms except DFS find the optimal path. DFS is not included in the experiments, since simulating DFS takes days even for a problem of true distance 4.

Fig. 10: The uninformed algorithms require much more time to solve problems with larger depth values. It is worth noting that IDDFS requires much more time than the BFS algorithm. Although they should perform similarly in theory, restarting the search from the very first level, as IDDFS does, requires releasing memory and allocating it again, which is a serious overhead. This problem can be overcome by storing nodes in a hash table or by keeping track of already-allocated nodes to avoid the de-allocation+allocation cycle.


Fig. 11: The difference between IDDFS and BFS is not as pronounced as it is in Fig. 10. It is expected to shrink if the number of simulations is increased.