

174 ECTI TRANSACTIONS ON COMPUTER AND INFORMATION TECHNOLOGY VOL.8, NO.2 November 2014

Multi-Robot Coordination Using Switching of Methods for Deriving Equilibrium in Game Theory

Wataru Inujima 1, Kazushi Nakano 2, and Shu Hosokawa 3, Non-members

ABSTRACT

The study of Multi-Agent Systems using multiple autonomous robots has recently attracted much attention. In target tracking, a typical case study, multiple autonomous robots decide their own actions to achieve the overall task of tracking a target. The actions of the robots influence one another, so each action decision must be coordinated with the other robots and the environment to achieve the overall task effectively.

Game theory is a major mathematical tool for realizing coordinated action decisions. It treats a multi-agent environment in which agents influence one another as a game situation. Conventional methods model target tracking as an n-person general-sum game and use either the non-cooperative Nash equilibrium or the semi-cooperative Stackelberg equilibrium. The semi-cooperative Stackelberg equilibrium may obtain better control performance than the non-cooperative Nash equilibrium, but it requires communication among robots.

In this study, we propose switching between two methods of equilibrium derivation, the non-cooperative Nash equilibrium and the semi-cooperative Stackelberg equilibrium, in a coordination algorithm for target tracking. In simulation, the proposed method achieves coordination with fewer connections than the method that uses the semi-cooperative Stackelberg equilibrium at all times. Furthermore, the proposed method shows better control performance than the non-cooperative Nash equilibrium.

Keywords: Multi-Agent, Coordination Control, Robot, Game Theory

Manuscript received on July 20, 2013; revised on January 6, 2014. Final manuscript received May 3, 2014.

1,2 The authors are with Dept. of Mechanical Eng. and Intelligent Systems, The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo, 182-8585, Japan, E-mail: [email protected] and [email protected]

3 The author is with Dept. of Electronic Eng., The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo, 182-8585, Japan, E-mail: [email protected]

1. INTRODUCTION

Recently, robots that follow people to help carry baggage have been studied. With multiple robots, the overall task performance is expected to improve through cooperation among individual robots whose performance and cost are low. For instance, cooperating robots can carry heavy baggage that a single robot cannot carry.

Thus, the study of Multi-Agent Systems (MAS) using multiple autonomous robots is actively increasing. In target tracking, a typical case study, competitive situations are assumed in which collisions of a robot with the tracked target robot, obstacles, or other robots prevent the overall task from being achieved. Coordination control that resolves these situations is therefore needed.

Game theory is a method for realizing a coordinated action decision for each robot. It deals with a multi-agent environment in which multiple autonomous agents decide their own actions to achieve the overall task and these action decisions influence one another [1].

In game theory, a situation that assumes no binding agreement among agents is called a “non-cooperative game”. In contrast, a situation that assumes a binding agreement among agents is called a “cooperative game” [2].

Skrzypczyk [3][4][5] introduces the cost value as a way to apply game theory to the target tracking problem, and models the problem as an “n-person general-sum game”. The cost value indicates the burden of achieving the task, based on the positions of each tracking robot, the tracked target robot, and the obstacles.

To decide the action of each robot in coordinated target tracking, Skrzypczyk [3] presents a method using the “Nash equilibrium” based on non-cooperative game theory. On the other hand, Harmati [4] proposes the “Stackelberg equilibrium” based on a type of cooperative game.

In coordination using the Nash equilibrium, the tracking robots, as agents, have no communication mechanism within the tracking team, and each robot decides its own action using only cost values. In coordination using the Stackelberg equilibrium, the tracking robots have a communication mechanism and exchange information within the tracking robot team. The robots reach an agreement that guarantees the action decision of the robot whose cost value is relatively large. With action decisions made under this agreement, the Stackelberg equilibrium yields a better balance among cost values than the Nash equilibrium [5].

Therefore, the appearance of a robot with remarkably poor performance is expected to be suppressed. The Stackelberg equilibrium earns a lower cost value than the Nash equilibrium and is expected to achieve the task with better performance than the Nash equilibrium.

For these reasons, we propose a switching methodfor equilibrium derivation.

Our proposed method aims to combine the Nash equilibrium, which needs no communication, with the Stackelberg equilibrium, which needs communication, in situations where the task is difficult to achieve with the Nash equilibrium alone.

2. FORMULATION OF PROBLEM

This chapter defines the information and estimated values that the control algorithm deals with. To treat the target tracking problem within game theory, we formulate target tracking in terms of cost values.

2.1 Definition of Target Tracking and State Estimation

In the target tracking considered in this study, there are a single tracked target robot, a robot team consisting of $N$ autonomous robots that follow the tracked target robot, and $M$ static obstacles. The positions of the robots and obstacles are shown in Fig. 1. Each robot is cylindrical with radius $D_r$.

The task of target tracking is for each autonomous robot in the tracking robot team to avoid collisions with the other tracking robots, the tracked target robot, and the obstacles while keeping a desired formation. For target tracking, the rotational angular velocity and the translational velocity are controlled by a control algorithm. All tracking robots can obtain the positions of all robots and obstacles from a camera mounted in the environment.

The position $p_i$ of an autonomous robot $i$ ($i = 1, 2, \ldots, N$) in the tracking robot team, the position $g$ of the tracked target robot, and the position $o_j$ of an obstacle $j$ ($j = 1, 2, \ldots, M$) at discrete time $t_n$, obtained from the camera, are defined in Eqs. (1)-(3).

$$p_i(t_n) = \begin{bmatrix} x_i(t_n) & y_i(t_n) \end{bmatrix}^T \tag{1}$$

$$g(t_n) = \begin{bmatrix} x_g(t_n) & y_g(t_n) \end{bmatrix}^T \tag{2}$$

$$o_j(t_n) = \begin{bmatrix} x_j & y_j \end{bmatrix}^T \tag{3}$$

The positions of obstacles are constant.

Fig. 1: Position Relation

The position $c(t_n)$ of the center of the tracking robot team is defined by

$$c(t_n) = \begin{bmatrix} \dfrac{1}{N}\sum_{i=1}^{N} x_i(t_n) & \dfrac{1}{N}\sum_{i=1}^{N} y_i(t_n) \end{bmatrix}^T . \tag{4}$$

Estimation is needed to obtain the route direction, translational velocity, angular velocity, and other information about the robots that will be observed at $t_{n+1}$.

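At discrete time $t_n$, robot $i$ estimates the translational velocity $v_g(t_n)$ of the tracked target robot from Eq. (5), using the positions obtained by the camera.

$$\begin{aligned}
v_{g,x}(t_n) &= \frac{x_g(t_n) - x_g(t_{n-1})}{t_n - t_{n-1}} = \frac{x_g(t_n) - x_g(t_{n-1})}{\Delta t} \\
v_{g,y}(t_n) &= \frac{y_g(t_n) - y_g(t_{n-1})}{t_n - t_{n-1}} = \frac{y_g(t_n) - y_g(t_{n-1})}{\Delta t} \\
v_g(t_n) &= \sqrt{\left(v_{g,x}(t_n)\right)^2 + \left(v_{g,y}(t_n)\right)^2}
\end{aligned} \tag{5}$$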

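The estimated position $g(t_{n+1}) = \begin{bmatrix} x_g(t_{n+1}) & y_g(t_{n+1}) \end{bmatrix}^T$ after one control period is obtained by

$$\begin{aligned}
x_g(t_{n+1}) &= x_g(t_n) + \left[x_g(t_n) - x_g(t_{n-1})\right] = v_{g,x}(t_n)\,\Delta t + x_g(t_n) \\
y_g(t_{n+1}) &= y_g(t_n) + \left[y_g(t_n) - y_g(t_{n-1})\right] = v_{g,y}(t_n)\,\Delta t + y_g(t_n) .
\end{aligned} \tag{6}$$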
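The estimation in Eqs. (5) and (6) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the sample camera readings are hypothetical, and positions are assumed to be plain (x, y) tuples in metres.

```python
import math

def estimate_target_velocity(g_now, g_prev, dt):
    """Finite-difference velocity components and speed of the target, Eq. (5)."""
    vx = (g_now[0] - g_prev[0]) / dt
    vy = (g_now[1] - g_prev[1]) / dt
    return vx, vy, math.hypot(vx, vy)

def predict_target_position(g_now, vx, vy, dt):
    """Constant-velocity prediction one control period ahead, Eq. (6)."""
    return g_now[0] + vx * dt, g_now[1] + vy * dt

# Hypothetical camera readings taken one control period (0.1 s) apart.
g_prev, g_now = (1.00, 2.00), (1.03, 2.04)
vx, vy, speed = estimate_target_velocity(g_now, g_prev, dt=0.1)
g_next = predict_target_position(g_now, vx, vy, dt=0.1)
```

Because the prediction reuses the last displacement, it assumes the target keeps its velocity over one control period.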

The route direction of a tracking robot $i$ is estimated as

$$\Theta_i(t_n) = \arctan\left(\frac{y_i(t_n) - y_i(t_{n-1})}{x_i(t_n) - x_i(t_{n-1})}\right). \tag{7}$$
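As a hedged sketch of Eq. (7), the heading can be computed from two successive positions. Note one deliberate substitution: the code uses the two-argument `math.atan2` rather than a plain arctangent of the ratio, an assumption made here so that a zero x-displacement and all four quadrants are handled; the paper itself states only the arctangent form. The function name is hypothetical.

```python
import math

def estimate_heading(p_now, p_prev):
    """Route direction from two successive positions, after Eq. (7).

    atan2 is swapped in for arctan(dy / dx) to avoid division by zero
    and to resolve the quadrant; this is an assumption, not the paper's text.
    """
    dx = p_now[0] - p_prev[0]
    dy = p_now[1] - p_prev[1]
    return math.atan2(dy, dx)

heading_diagonal = estimate_heading((1.0, 1.0), (0.0, 0.0))
heading_vertical = estimate_heading((0.0, 1.0), (0.0, 0.0))
```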

The positions of the tracking robots after one control period depend on the control inputs applied to the robots. The control input shown in Eq. (8) is applied to a tracking robot $i$.

$$u_i = \begin{bmatrix} \omega_{i,d_i}(t_n) & v_i(t_n) \end{bmatrix}^T, \qquad \omega_{i,d_i}(t_n) \in \left\{ -\frac{\pi}{6},\ 0,\ \frac{\pi}{6} \right\}, \qquad 0 \le v_i(t_n) \le v_{\max} \tag{8}$$

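$\omega_{i,d_i}(t_n)$ and $v_i(t_n)$ denote the angular velocity and the translational velocity applied to the robot. $d_i$ denotes the action decision of robot $i$, which selects the angular velocity $\omega_{i,d_i}(t_n)$ from $\left\{ -\frac{\pi}{6},\ 0,\ \frac{\pi}{6} \right\}$. $v_{\max}$ denotes the maximum translational velocity in a control input. $v_i(t_n)$ is calculated by

$$\begin{aligned}
v_i^{r}(t_n) &= k_f\, v_g(t_n) + \frac{l_{i,g}(t_n) - D_r}{k_m D_r}\, k_p\, v_{\max} \\
v_i(t_n) &= \begin{cases} v_i^{r}(t_n) & \left(v_i^{r}(t_n) < v_{\max}\right) \\ v_{\max} & (\text{otherwise}) . \end{cases}
\end{aligned} \tag{9}$$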

Here, $k_f$, $k_m$, and $k_p$ are factors, and $l_{i,g}$ is the distance between the estimated position $g(t_{n+1})$ after one control period and the position $p_i(t_n)$ of tracking robot $i$, given by

$$\begin{aligned}
x_{g,dist} &= x_g(t_{n+1}) - x_i(t_n) \\
y_{g,dist} &= y_g(t_{n+1}) - y_i(t_n) \\
l_{i,g}(t_n) &= \sqrt{x_{g,dist}^2 + y_{g,dist}^2} .
\end{aligned} \tag{10}$$
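The velocity rule of Eqs. (9) and (10) can be illustrated as follows. This sketch assumes one reading of Eq. (9), in which the distance term $(l_{i,g} - D_r)$ is divided by $k_m D_r$ and scaled by $k_p v_{\max}$. The default parameter values are taken from Table 1; the function name is hypothetical.

```python
import math

def translational_velocity(p_i, g_next, v_g, *, k_f=0.3, k_m=10.0, k_p=0.8,
                           d_r=0.09, v_max=0.023):
    """Translational velocity of robot i under one reading of Eqs. (9)-(10)."""
    # Eq. (10): distance from the robot to the predicted target position.
    l_ig = math.hypot(g_next[0] - p_i[0], g_next[1] - p_i[1])
    # Eq. (9): feed-forward on target speed plus a distance-proportional term.
    v_ref = k_f * v_g + (l_ig - d_r) / (k_m * d_r) * k_p * v_max
    # Saturate at v_max, as in the second line of Eq. (9).
    return v_ref if v_ref < v_max else v_max

v_close = translational_velocity((0.0, 0.0), (0.09, 0.0), v_g=0.017)
v_far = translational_velocity((0.0, 0.0), (1.0, 0.0), v_g=0.017)
```

As written, the rule only saturates from above; the lower bound $0 \le v_i$ of Eq. (8) is not enforced in this sketch.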

Thus, the translational velocity is derived deterministically at discrete time $t_n$, so the role of game theory in the control algorithm is to derive an appropriate $\omega_{i,d_i}(t_n)$.

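The control period of each autonomous robot is denoted by $\Delta t$, and the time delay caused by position detection and the calculation of the control algorithm is denoted by $T_0$. $\Delta t_{-T_0} = \Delta t - T_0$ denotes the interval between the time when the calculation of the control algorithm is completed and the time when the next control period starts. From these, the estimated robot position $p_{i,d_i}(t_{n+1}) = \begin{bmatrix} x_{i,d_i}(t_{n+1}) & y_{i,d_i}(t_{n+1}) \end{bmatrix}^T$ after one control period, given an action decision $d_i(t_n)$, is calculated from Eq. (11) at discrete time $t_n$.

$$\begin{aligned}
x_{i,d_i}(t_{n+1}) ={}& x_i(t_n) + v_i(t_{n-1})\, T_0 \cos\!\left(\Theta_i(t_n) + \omega_{i,d_i}(t_{n-1})\, T_0\right) \\
& + v_i(t_n)\, \Delta t_{-T_0} \cos\!\left(\Theta_i(t_n) + \omega_{i,d_i}(t_{n-1})\, T_0 + \omega_{i,d_i}(t_n)\, \Delta t_{-T_0}\right) \\
y_{i,d_i}(t_{n+1}) ={}& y_i(t_n) + v_i(t_{n-1})\, T_0 \sin\!\left(\Theta_i(t_n) + \omega_{i,d_i}(t_{n-1})\, T_0\right) \\
& + v_i(t_n)\, \Delta t_{-T_0} \sin\!\left(\Theta_i(t_n) + \omega_{i,d_i}(t_{n-1})\, T_0 + \omega_{i,d_i}(t_n)\, \Delta t_{-T_0}\right)
\end{aligned} \tag{11}$$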
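A minimal sketch of the two-phase prediction in Eq. (11) is given below: during the delay $T_0$ the previous control input still acts, and the new input acts for the remaining $\Delta t - T_0$. The function name and the sample values are hypothetical; $\Delta t$ and $T_0$ default to the values in Table 1.

```python
import math

OMEGAS = (-math.pi / 6, 0.0, math.pi / 6)  # admissible angular velocities, Eq. (8)

def predict_robot_position(x, y, theta, v_prev, omega_prev, v_now, omega_now,
                           dt=0.1, t0=0.05):
    """Predicted position after one control period, following Eq. (11)."""
    dt_rest = dt - t0                                   # time left after the delay
    heading_delay = theta + omega_prev * t0             # heading while the old input acts
    heading_rest = heading_delay + omega_now * dt_rest  # heading once the new input acts
    x_next = (x + v_prev * t0 * math.cos(heading_delay)
              + v_now * dt_rest * math.cos(heading_rest))
    y_next = (y + v_prev * t0 * math.sin(heading_delay)
              + v_now * dt_rest * math.sin(heading_rest))
    return x_next, y_next

# One candidate position per admissible decision, for a robot heading along +x.
candidates = [predict_robot_position(0.0, 0.0, 0.0, 0.02, 0.0, 0.02, w)
              for w in OMEGAS]
```

Evaluating all three candidates per robot is what makes the game finite: each robot has exactly three possible decisions per control period.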

To estimate the center position of the tracking robot team after one control period, when an action decision set $(d_1, d_2, \ldots, d_N)$ at discrete time $t_n$ is acquired, the position $c(d_1, \ldots, d_N)$ is calculated from

$$c(d_1, \ldots, d_N) = \left[ \frac{1}{N}\sum_{i=1}^{N} x_{i,d_i},\ \frac{1}{N}\sum_{i=1}^{N} y_{i,d_i} \right]. \tag{12}$$

2.2 Cost Value

To achieve target tracking, the control algorithm makes action decisions by game theory using cost values that indicate the situation of each robot in the tracking team. The cost value is designed to take a larger value for action decisions of a tracking robot that hinder the achievement of target tracking.

Assuming that the tracking robots acquire the decision set $(d_1, d_2, \ldots, d_N)$, the cost value $I_i$ of tracking robot $i$ is given by Eq. (13).

$$I_i(d_1, d_2, \ldots, d_N) = f_i(d_1, d_2, \ldots, d_N, p_1, p_2, \ldots, p_N, o_1, o_2, \ldots, o_M, g) \tag{13}$$

The cost value consists of four elements that depend on the positions of the other tracking robots, the tracked target robot, and the obstacles. It is defined by

$$I_i(d_1, d_2, \ldots, d_N) = K_{1,i} I_{1,i} + K_{2,i} I_{2,i} + K_{3,i} I_{3,i} + K_{4,i} I_{4,i} . \tag{14}$$

$K_{1,i}$, $K_{2,i}$, $K_{3,i}$, and $K_{4,i}$ are the weights of the corresponding cost elements. The cost elements $I_{1,i}$, $I_{2,i}$, $I_{3,i}$, and $I_{4,i}$ are defined as follows.

$$I_{1,i}(d_1, d_2, \ldots, d_N) = \left[ \min_{\substack{j = 1, 2, \ldots, N,\ j \neq i \\ k = 1, 2, \ldots, M}} \left( l_{r,i,j},\ l_{o,i,k},\ l_{i,g} \right) \right]^{-1} . \tag{15}$$

The cost element of Eq. (15) acts as a collisionavoidance term.

$l_{r,i,j}$ denotes the distance between the estimated positions of robots $i$ ($i = 1, 2, \ldots, N$) and $j$ ($j = 1, 2, \ldots, N$, $j \neq i$) in the tracking robot team.

Similarly, $l_{o,i,k}$ denotes the distance between the estimated position $p_{i,d_i}$ of tracking robot $i$ and the position $o_k$ ($k = 1, 2, \ldots, M$) of obstacle $k$, and $l_{i,g}$ denotes the distance between the estimated position of robot $i$ and the estimated position $g$ of the tracked target robot. The value of $I_{1,i}$ increases as robot $i$ approaches other robots, the tracked target robot, or obstacles.

Therefore, to avoid a situation in which a collision is imminent, a decision should be made that reduces the value of $I_{1,i}$.

The cost element $I_{2,i}$ is simply the distance $l_{c,i}$ between the estimated position $p_{i,d_i}$ of robot $i$ and the estimated center position $c(d_1, d_2, \ldots, d_N)$ of the tracking robot team. So, $I_{2,i}$ is defined as

$$I_{2,i}(d_1, d_2, \ldots, d_N) = l_{c,i} . \tag{16}$$

$I_{2,i}$ is the term for keeping a constant distance among tracking robots, so $I_{2,i}$ takes a larger value as the robot moves away from $c$.

The third cost element $I_{3,i}$ serves one part of the target tracking task, in which the tracking robot team follows the tracked target robot while keeping a certain distance. $I_{3,i}$ is defined using the distance $l_{c,g}$ between the estimated position $c(d_1, \ldots, d_N)$ of the robot team center and the estimated position $g$ of the tracked target robot, together with a desired value $I_{ref,3,i}$ for $l_{c,g}$ that keeps the desired formation, as follows:

$$I_{3,i}(d_1, d_2, \ldots, d_N) = \left| l_{c,g} - I_{ref,3,i} \right| . \tag{17}$$

Thus, $I_{3,i}$ becomes large as the distance between the whole tracking robot team and the tracked target robot departs from the desired value.

The fourth cost element $I_{4,i}$ serves to keep the desired formation of the tracking robot team and is calculated from

$$I_{4,i}(d_1, d_2, \ldots, d_N) = l_{p_i, p'_i} , \tag{18}$$

where $p'_i$ denotes the estimated position of tracking robot $i$ when the tracking robots form the desired formation around the estimated position $g$ of the tracked target. $l_{p_i, p'_i}$ denotes the distance between the estimated position $p_i$ resulting from the action decision of robot $i$ and the estimated position $p'_i$.

Therefore, the farther robot $i$ moves from its place in the desired formation, the larger $I_{4,i}$ becomes.

Fig. 2: Control Algorithm

3. CONTROL ALGORITHM FOR COORDINATION

This chapter describes the control algorithm given to each tracking robot to achieve target tracking.

3.1 Control Algorithm

Figure 2 shows the control algorithm of each tracking robot used in this study.

In the control algorithm, target tracking is modeled from the position $g$ of the tracked target robot, the positions $p_i$ ($i = 1, 2, \ldots, N$) of the robots in the tracking team, and the positions $o_j$ ($j = 1, 2, \ldots, M$) of the obstacles, all obtained by a camera in the real environment, and the cost values are calculated.

Equilibriums $S_1, S_2, \ldots, S_l$ are derived from the calculated cost values using game theory. When there is more than one equilibrium ($l > 1$), one equilibrium is selected so as to obtain more even position control. Once an equilibrium is selected, the tracking robot obtains $d_i$ as its action decision. The control input $u_i$ is applied to tracking robot $i$ according to the derived decision.

3.2 Action Decision by Game Theory

In this study, two methods of equilibrium derivation in game theory are used to derive a rational decision for each robot.


3.2.1 Non-cooperative Nash Equilibrium

We now introduce the non-cooperative Nash equilibrium of [5]. In this equilibrium, the target tracking problem is regarded as a non-cooperative game, and a Nash equilibrium, which is self-enforcing [6], is obtained. With this equilibrium, each tracking robot acquires the action that minimizes its cost value and eventually achieves the task.

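Here, the Nash equilibrium is a set of decisions $\left(d_1^{NN}, d_2^{NN}, \ldots, d_N^{NN}\right)$ such that the cost values $I_1, I_2, \ldots, I_N$ of all tracking robots satisfy Eq. (19).

$$\begin{aligned}
I_1\left(d_1^{NN}, d_2^{NN}, \ldots, d_N^{NN}\right) &\le I_1\left(d_1, d_2^{NN}, \ldots, d_N^{NN}\right) \\
I_2\left(d_1^{NN}, d_2^{NN}, \ldots, d_N^{NN}\right) &\le I_2\left(d_1^{NN}, d_2, \ldots, d_N^{NN}\right) \\
&\ \ \vdots \\
I_N\left(d_1^{NN}, d_2^{NN}, \ldots, d_N^{NN}\right) &\le I_N\left(d_1^{NN}, d_2^{NN}, \ldots, d_N\right)
\end{aligned} \tag{19}$$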

Therefore, the cost value $I_i^{NN}$ of a tracking robot acquired by the Nash equilibrium is represented by

$$I_i^{NN} = I_i\left(d_1^{NN}, d_2^{NN}, \ldots, d_N^{NN}\right). \tag{20}$$
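Since each robot chooses among only three angular velocities, the condition of Eq. (19) can be checked by exhaustive enumeration. The following sketch assumes a hypothetical callable `cost(i, profile)` that returns $I_i$ for a decision tuple, and encodes the three decisions as -1, 0, 1; the toy cost function exists only to exercise the search and is not taken from the paper.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # stand-ins for the angular velocities -pi/6, 0, pi/6

def nash_equilibria(cost, n_robots, actions=ACTIONS):
    """All pure-strategy decision sets satisfying Eq. (19), by brute force."""
    equilibria = []
    for profile in product(actions, repeat=n_robots):
        # A profile is stable if no robot can lower its own cost by deviating alone.
        stable = all(
            cost(i, profile) <= cost(i, profile[:i] + (alt,) + profile[i + 1:])
            for i in range(n_robots)
            for alt in actions
        )
        if stable:
            equilibria.append(profile)
    return equilibria

def toy_cost(i, d):
    """Hypothetical two-robot cost: a preferred action plus a penalty for matching."""
    preferred = (1, -1)
    return (d[i] - preferred[i]) ** 2 + 0.1 * (d[0] == d[1])

found = nash_equilibria(toy_cost, n_robots=2)
```

The search visits $3^N$ profiles, which stays small for the team sizes considered here. A pure-strategy equilibrium need not exist for every cost function, in which case the returned list is empty.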

3.2.2 Semi-Cooperative Stackelberg Equilibrium

The non-cooperative Nash equilibrium does not always make the cost values of all tracking robots even. Therefore, when there is a robot whose cost value differs from those of the other robots, an unfavorable situation for that robot may persist.

We therefore next introduce the semi-cooperative Stackelberg equilibrium of [5], which promotes evenness of the cost values through a decision that focuses on reducing the maximum cost value in the tracking robot team. In the semi-cooperative Stackelberg equilibrium, each tracking robot is equipped with a communication mechanism, and the tracking robots exchange their cost values with one another within the robot team.

Based on these cost values, the robot with the maximum cost value is classified as the leader and the other robots as followers. An agreement that guarantees the decision of the leader is reached. Next, assuming the guaranteed decision of the leader, the decisions of the followers are derived by the Nash equilibrium of the non-cooperative game played only among the followers (the followers game).

Thus, the target tracking problem is treated as a semi-cooperative situation: the decisions among followers are obtained by non-cooperative game theory, whereas the decision of the leader is obtained by cooperative game theory.

Robot $i$, as the leader, derives a Nash equilibrium $d_{-i} = (d_1, d_2, \ldots, d_{i-1}, d_{i+1}, \ldots, d_N)$ of the followers game with the leader's decision $d_i$ fixed.

The leader repeats this process for all of its own decisions and derives the Nash equilibriums of the followers for each decision of the leader. It then selects its own decision $d_i^{SS}$ and the followers' decisions $d_{-i}^{SS} = \left(d_1^{SS}, d_2^{SS}, \ldots, d_{i-1}^{SS}, d_{i+1}^{SS}, \ldots, d_N^{SS}\right)$ that minimize the cost value $I_i$. After that, the leader communicates $d_i^{SS}$ to each follower, and each follower derives the Nash equilibrium of the followers game assuming the communicated $d_i^{SS}$.

From these, a Stackelberg equilibrium $\left(d_1^{SS}, d_2^{SS}, \ldots, d_N^{SS}\right)$ is acquired in the semi-cooperative game.

Here, $I_i^{SS}$, the cost value obtained by tracking robot $i$ in the Stackelberg equilibrium of the semi-cooperative game, is represented as

$$\begin{aligned}
I_1^{SS} &= I_1\left(d_1^{SS}, d_2^{SS}, \ldots, d_N^{SS}\right) \\
I_2^{SS} &= I_2\left(d_1^{SS}, d_2^{SS}, \ldots, d_N^{SS}\right) \\
&\ \ \vdots \\
I_N^{SS} &= I_N\left(d_1^{SS}, d_2^{SS}, \ldots, d_N^{SS}\right).
\end{aligned} \tag{21}$$

The leader is represented as the robot $L \in \{1, 2, \ldots, N\}$, and Eq. (22) holds.

$$I_L^{SS} \le I_L\left(d_1^{NN}, d_2^{NN}, \ldots, d_N^{NN}\right) \tag{22}$$
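The leader-follower procedure described above can be sketched as a nested search: for every leader decision, solve the followers game, then keep the combination with the lowest leader cost. As before, `cost(i, profile)` is a hypothetical callable, decisions are encoded as -1, 0, 1, and the toy cost function is invented purely to show a case where the leader does better than at the Nash equilibrium.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # stand-ins for the angular velocities -pi/6, 0, pi/6

def followers_nash(cost, n_robots, leader, d_leader, actions=ACTIONS):
    """Nash equilibria of the followers game with the leader's decision fixed."""
    followers = [i for i in range(n_robots) if i != leader]
    found = []
    for choice in product(actions, repeat=len(followers)):
        profile = [d_leader] * n_robots
        for f, a in zip(followers, choice):
            profile[f] = a
        profile = tuple(profile)
        # Only followers may deviate; the leader's decision is guaranteed.
        if all(cost(f, profile) <= cost(f, profile[:f] + (alt,) + profile[f + 1:])
               for f in followers for alt in actions):
            found.append(profile)
    return found

def stackelberg(cost, n_robots, leader, actions=ACTIONS):
    """Decision set minimizing the leader's cost over all followers' equilibria."""
    best = None
    for d_leader in actions:
        for profile in followers_nash(cost, n_robots, leader, d_leader, actions):
            if best is None or cost(leader, profile) < cost(leader, best):
                best = profile
    return best  # None if no followers' equilibrium exists for any leader decision

def toy_cost(i, d):
    """Hypothetical costs: robot 1 copies robot 0, and robot 0 dislikes robot 1 turning."""
    if i == 0:
        return (d[0] - 1) ** 2 + 2 * d[1] ** 2
    return (d[1] - d[0]) ** 2

solution = stackelberg(toy_cost, n_robots=2, leader=0)
```

In this toy game the unique Nash equilibrium is (1, 1) with a leader cost of 2, whereas the search above returns (0, 0) with a leader cost of 1, which is consistent with the inequality in Eq. (22).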

3.2.3 Equilibrium Selection

Multiple Nash equilibriums may exist. To select the equilibrium $S'$ that attains the minimum deviation of cost values among multiple equilibriums, Eqs. (23) and (24), which are used in [3], are applied.

$$C(S_k) = \sum_{i=1}^{N} \left( I_i(S_k) + I_i^{dev}(S_k) \right), \qquad S' = \arg\min_{k = 1, 2, \ldots, l} C(S_k) \tag{23}$$

$$I_i^{dev} = \left| I_i - \frac{1}{N} \sum_{j=1}^{N} I_j \right| \tag{24}$$
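The selection rule of Eqs. (23) and (24) reduces to a single `min` over the candidate equilibriums. The sketch below again assumes a hypothetical `cost(i, profile)` callable; the cost table is made up to show that, for equal total cost, the more even equilibrium wins.

```python
def selection_score(costs):
    """C(S_k) of Eq. (23): each cost plus its absolute deviation from the mean, Eq. (24)."""
    mean = sum(costs) / len(costs)
    return sum(c + abs(c - mean) for c in costs)

def select_equilibrium(equilibria, cost, n_robots):
    """Equilibrium S' with the smallest selection score."""
    return min(equilibria,
               key=lambda s: selection_score([cost(i, s) for i in range(n_robots)]))

# Hypothetical cost table: both candidates have total cost 4, but (1, 1) is more even.
table = {(0, 0): [1.0, 3.0], (1, 1): [2.0, 2.0]}
chosen = select_equilibrium(list(table), lambda i, s: table[s][i], n_robots=2)
```

Because the score adds the deviation term to the cost itself, it penalizes both a high total cost and an uneven distribution of that cost across the team.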

3.3 Switching of Methods in Equilibrium Derivation

In the traditional control algorithm, the method of equilibrium derivation is fixed in advance to either the non-cooperative Nash equilibrium or the semi-cooperative Stackelberg equilibrium. Under the non-cooperative Nash equilibrium, when an unfavorable situation persists for a tracking robot that deviates significantly from the task, it is effective for achieving the task to promote evenness of the cost values and to focus on lowering the maximum cost value in the tracking robot team. However, the semi-cooperative Stackelberg equilibrium requires communication among robots, so the non-cooperative Nash equilibrium is preferable from the viewpoint of communication burden.

By properly switching between these two methods of equilibrium derivation, improvements in both communication efficiency and the ability to achieve the task are expected. Therefore, in this paper, we calculate the equilibriums of the two methods in parallel and switch between them depending on the $I_{MAX}$ of each equilibrium. To switch from the non-cooperative Nash equilibrium to the semi-cooperative Stackelberg equilibrium at an appropriate time, a cost threshold $I_{th}$ is introduced. Only when $I_{MAX}$, the maximum cost value in the tracking robot team, satisfies $I_{MAX} \ge I_{th}$ does the tracking robot team use the semi-cooperative Stackelberg equilibrium. Otherwise the tracking robot team uses the non-cooperative Nash equilibrium.
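The switching rule itself is a single threshold test. The sketch below is one possible reading, assuming that $I_{MAX}$ is evaluated on the team's cost values under the Nash equilibrium and that both decision sets have already been computed in parallel; the function name and the sample numbers are hypothetical, and the default threshold is the value of $I_{th}$ in Table 1.

```python
def switch_equilibrium(nash_profile, nash_costs, stackelberg_profile, i_th=7.96):
    """Pick the decision set for this control period by comparing I_MAX with I_th."""
    i_max = max(nash_costs)  # maximum cost value in the tracking robot team
    if i_max >= i_th:
        # A robot is in serious trouble: accept the communication burden.
        return "stackelberg", stackelberg_profile
    return "nash", nash_profile

# Hypothetical control periods: one calm, one exactly at the threshold.
calm = switch_equilibrium((0, 0, 0), [3.1, 2.8, 4.0], (1, 0, 0))
tense = switch_equilibrium((0, 0, 0), [3.1, 7.96, 4.0], (1, 0, 0))
methods = [calm[0], tense[0]]
stackelberg_share = methods.count("stackelberg") / len(methods)
```

Counting how often the rule returns "stackelberg" over a run gives the fraction of control periods that require communication, which is the quantity the switching method is meant to reduce.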

4. VALIDATION BY THE SIMULATION

We validate the effectiveness of decisions made with the switching method for equilibrium derivation through a simulation, in which three tracking robots follow one tracked target robot while keeping a formation. All the parameters in the simulation are shown in Table 1.

Table 1: Parameters used by Simulation

Parameters                                                     Values
Translational Velocity of Tracked Target $v_g$                 0.017 [m/s]
Max Translational Velocity of Tracking Robot $v_{\max}$        0.023 [m/s]
Radius $D_r$                                                   0.09 [m]
$k_f$                                                          0.3
$k_p$                                                          0.8
$k_m$                                                          10
Computation Time $T_0$                                         0.05 [s]
Control Period $\Delta t$                                      0.1 [s]
Cost Weight $K_{1,i}$ ($i = 1, 2, 3$)                          1.8
Cost Weight $K_{2,i}$ ($i = 1, 2, 3$)                          7
Cost Weight $K_{3,i}$ ($i = 1, 2, 3$)                          11
Cost Weight $K_{4,i}$ ($i = 1, 2, 3$)                          2
$I_{ref,3,1}$                                                  0 [m]
$I_{ref,3,2}$                                                  0.04 [m]
$I_{ref,3,3}$                                                  0.04 [m]
Cost Threshold $I_{th}$                                        7.96

Each parameter was determined by trial and error. The orbits of the robots when only the non-cooperative Nash equilibrium is used for their decisions are shown in Fig. 3.

Fig. 3: Orbits of Robots by Non-Cooperative Nash Equilibrium

Figure 3 confirms that each tracking robot starts moving from its initial position and moves while keeping the formation, avoiding collisions, and following the tracked target robot.

A relatively large disturbance in the formation of the tracking team occurs after the position $(x, y) = (6.8\,[\mathrm{m}], 3.8\,[\mathrm{m}])$, where the tracked target changes its route direction.

Nevertheless, target tracking and collision avoidance are eventually achieved. However, the positions of tracking robot 0 and robot 1 are reversed while avoiding the second obstacle, and in the end the precise formation is never achieved.

When a relatively large disturbance in the formation occurs during obstacle avoidance, it is considered that a tracking robot appears whose cost value rises significantly compared with those of the other tracking robots.

However, because decisions under the non-cooperative Nash equilibrium take into account not only a robot's own cost value but also the reduction of the other tracking robots' cost values, it is difficult to make a decision that focuses on reducing the cost value of one specific robot, even if that robot is far from achieving the task.

Therefore, an action decision that keeps a formation different from the desired one is obtained.

In this situation, to keep the desired formation, it is effective to acquire a decision that focuses on reducing the maximum cost value in the tracking robot team.

Next, the orbit of each robot when the proposed switching method for equilibrium derivation is used for decisions is shown in Fig. 4. Because each robot uses the semi-cooperative Stackelberg equilibrium only if the maximum cost value in the tracking robot team is larger than or equal to $I_{th}$, the number of communications in the tracking robot team is suppressed to about 77.1% of that in the case where the semi-cooperative Stackelberg equilibrium is used at all times.

Fig. 4: Orbits of Robots by Switching Methods

As in Fig. 3, each tracking robot in Fig. 4 starts moving from its initial position, and it is confirmed that each tracking robot moves while keeping the formation, avoiding collisions, and following the tracked target. However, while avoiding the second obstacle, the behavior of the tracking robots differs from that in Fig. 3.

In the end, the reversed placement of tracking robots observed in Fig. 3 does not occur. As a result, precise formation keeping is achieved.

Fig. 5: Maximum Cost Values in Team of Tracking Robot

The maximum cost value in the tracking robot team is shown in Fig. 5 for the cases of using only the non-cooperative Nash equilibrium and of switching between the two methods of equilibrium derivation. The transition of the switching in the latter case is shown in Fig. 6.

Fig. 6: Transition of Switching in Switching Methods

The maximum cost value hardly varies from the start of the simulation until the halfway point. However, when the tracking robot team avoids the second obstacle, variations in the orbits of the tracking robots are observed.

This is because, under the switching method, the maximum cost value becomes larger than or equal to the cost threshold $I_{th}$, so the semi-cooperative Stackelberg equilibrium is used for equilibrium derivation and the decision focuses on reducing the maximum cost value in the tracking robot team.

Owing to this effect, the appearance of a tracking robot whose cost value is larger than those of the other robots is expected to be suppressed. Therefore, the reversed placement does not occur.

As a result, precise formation keeping is considered to have been achieved.

5. CONCLUSION

In this study, we proposed a switching method for equilibrium derivation in game theory for a control algorithm for target tracking, and demonstrated the effectiveness of the proposed method through a simulation.

With the non-cooperative Nash equilibrium, a situation in which the task was not achieved persisted over the long term. In particular, when the risk of collision increased because of a rapid, unplanned change in the route direction of the tracked target robot while approaching obstacles, it was confirmed that no effective measures could be taken against the appearance of robots deviating remarkably from the achievement of the task.

In the switching method for equilibrium derivation, we switched from the non-cooperative Nash equilibrium to the semi-cooperative Stackelberg equilibrium only when the maximum cost value in the tracking robot team was larger than or equal to the cost threshold.

Thus, it was possible to suppress communication among robots compared with the case of using the semi-cooperative Stackelberg equilibrium at all times. In addition, the effect of equalizing the cost values of the tracking robots under the semi-cooperative Stackelberg equilibrium worked well by focusing on the robot with the maximum cost value.

As a result, it was confirmed that the desired formation placement, which was never achieved with the non-cooperative Nash equilibrium, was achieved.

References

[1] Azuma Ohuchi, Masahito Yamamoto, Hidenori Kawamura, “Theory and application of multi-agent systems: computing paradigm from complex systems engineering”, Corona Publishing Co., Ltd., 2002 (in Japanese).

[2] Shigeo Mutou, “Game Theory”, Ohmsha, Ltd., 2011 (in Japanese).

[3] K. Skrzypczyk, “Game theory based target following by a team of robots”, Proceedings of Fourth International Workshop on Robot Motion and Control, pp.91-96, 2004.

[4] I. Harmati, “Multiple robot coordination for target tracking using semi-cooperative Stackelberg equilibrium”, IEE International Control Conference, 2006.

[5] I. Harmati, K. Skrzypczyk, “Robot team coordination for target tracking using fuzzy logic controller in game theoretic framework”, Robotics and Autonomous Systems, Volume 57, Issue 1, pp.75-86, 2009.

[6] Akira Okada, “Game Theory”, Yuhikaku Publishing Co., Ltd., 1996 (in Japanese).

Wataru Inujima is a master’s course student in the second year at The University of Electro-Communications, Tokyo, Japan. He received his B.Eng. from The University of Electro-Communications in 2012. His research interests include Control Engineering and Robotics.

Kazushi Nakano received his Dr. Eng. degree from Kyushu University, Fukuoka, Japan, in 1982. From 1980, he was a Research Associate at Kyushu University. From 1986, he was an Associate Professor at Fukuoka Institute of Technology. He is currently a Professor in the Department of Informatics and Engineering, the University of Electro-Communications, Tokyo, Japan. His interests include system identification/control and their applications. He is a member of IEEE, ISCIE and IEEJ.

Shu Hosokawa received his B.Eng. from The University of Electro-Communications, Tokyo, Japan in 2007. In 2009, he received his M.Eng. from the graduate school of The University of Electro-Communications. His research interests include Control Engineering and Robotics. He is now working at Control System Security Center.