
Solving n-Player Games: a Combinatorial-Game-Theory Approach

Zipeng Zhang

Abstract

Combinatorial game theory (CGT) is the study of 2-player games with perfect information and no chance moves. In the past, some scholars, such as Krawec[4], have tried to extend this study to games with more than 2 players. In this paper, we provide a method of analyzing games with payoffs in which there are multiple players, each making multiple moves through the course of the game. Specifically, we propose a backward-induction[11] function that determines the net payoffs in an n-player game, assuming all players play optimally. After this function is derived, we use it to analyze cases with real-world applications, such as the cheating behavior of NEV¹ producers. Lastly, this paper presents a way, based on the previous analytical method, of analyzing n-player games in which random moves are inevitably involved. We find that this leads to a more general case of n-player games, in which identical payoffs are involved, that is worthy of further study.

Keywords Combinatorial game theory, n-player games, Preference system

Statement of Originality

Our team claims that the paper submitted is our own accomplishment under the guidance of the instructor. To the best of our knowledge, the content of the paper does NOT include any results from other researchers' works, except as specially noted and acknowledged in the text. If there is any dishonesty, we are willing to bear all relevant responsibilities.

Signature: Zhang Zipeng
Advisor: Zhang Zhonghuan

¹ NEV is short for New-Energy Vehicle.

David Zhang


CONTENTS

I Introduction
II Methodology
    II-A Multi-Player Games with a Defined Winner
    II-B Multi-Player Games with Payoffs
    II-C Games with Mixed Strategies
III Examples and Applications
    III-A Multi-Player Games with a Defined Winner
    III-B Multi-Player Games with Payoffs
IV Discussions
V Conclusion
VI Appendices

I. INTRODUCTION

Combinatorial games have been studied extensively since at least the 19th century. A typical game of this kind has 2 players making moves in turn; what is special and necessary is that each player has perfect information about the game and that no chance moves or ties are involved. Researchers such as Berlekamp[1] and Nivasch[2] have developed efficient ways to study some games of this type², including determining the game values and the best strategies for each player.

However, it is only in recent decades that scholars have paid more attention to less conventional combinatorial games (e.g. games with passes[3], games with imperfect players³). In this paper, a special kind of combinatorial game is studied: a game with n players. In fact, n can be 1, 2, or greater than 2, which corresponds to three different game categories. For n = 1, it is no longer a game but a situation in which a person finds the optimal strategy according to his decision tree[12]⁴. An n-player game (n > 2) extends 2-player games by relaxing the restriction on the number of players. Since 2-player games (n = 2) have been studied most commonly, some games of this type will be used to test the theories about games with more players.

There are two major differences between n-player games and one-on-one games. Firstly, the definition of the game value of an n-player game cannot be the same as before, because there will be a single winner and several losers. Secondly, the generalization to games with n players gives rise to an essential question: if a player cannot win and is not able to exit or pass, whom will he support to win? That question leads to two important concepts: Alliance and Preference. In the first case (Kelly[5]), the players form m alliances (m ≤ n) and care only about the benefits of their alliances, and usually every alliance is able to win. In the other case (Krawec[4]), each player has an order of preference over the rest of the players, and among all the players who are able to win, he will support his most preferred one⁵. In this paper, only preferences are considered, and an approach similar to Krawec's is taken.

² Although many combinatorial games still remain unsolved, 2-player games are comparatively well researched, and relevant algorithms based on many different principles have been proposed.

³ In this kind of game, at least one player fails to choose, or does not know how to choose, the best strategy for his own benefit over the whole game. In Analyzing n-player impartial games[4], Krawec gives an overview of a special type of game with imperfect players, in which some players make random moves with a uniform probability distribution.

⁴ This game can be viewed as a decision tree with weights corresponding to the player's payoffs for making moves. If no payoffs are involved, then a 1-player game is usually of no interest for study.

After deriving the function that determines the value of n-player games with a defined winner, we move to multi-player games with payoffs. In order to solve these games, a preference-of-payoff system is introduced. This leads to the important observation that Krawec's preference matrix is not simply a preference over players; instead, it can be generalized to a preference-of-outcome matrix, so that it can solve n-player games in broader terms.

For convenience, the i-th player is denoted $P_i$, where i is the player's number. The players are numbered from 0 to n − 1, and $P_k$ is equivalent to $P_{k \bmod n}$ if k > n − 1. Also, it is assumed that all players in the games analyzed in this paper play optimally.

⁵ Although rarely seen, there are cases where a player prefers some other player(s) to himself, which leads to an interesting discussion of altruism.

II. METHODOLOGY

A. Multi-Player Games with a Defined Winner

We set the rules of the game with multiple players as:

• The player who cannot make any move wins (misère winning rule).

• All players make decisions in order from $P_0$ to $P_{n-1}$.

• There is a finite number of positions and every player knows them.

• There is no tie.

• No chance moves are involved.

After setting the rules, we start with some necessary definitions used in this paper.

Definition 1.1 A game G with k possible moves is defined as a set, $G = \{G_1, G_2, \cdots, G_k\}$, where $G_i$ is a resulting game that is one move away from game G; for convenience, we sometimes call the $G_i$ the moves. Therefore, a combinatorial game is a finite collection of game sets (which are themselves collections of game sets, etc.).

Definition 1.2 An end game is a game G where the current player cannot make any legal move, and it is denoted as $G = \emptyset$.

Definition 1.3 The value of game G at round t is defined as the number of this game's winner relative to the number of the current player⁶. This value is determined by the recursive function g(G, t), which is defined below. In other words, if $P_i$ (numbered relative to the current player) is the winner of game G at round t, then $g(G, t) = i$.

⁶ Note: Because of this definition, the game value in this paper is completely different from the Sprague-Grundy value[2].

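Definition 1.4 The preferences of all players are defined in a matrix:

\[
PM =
\begin{bmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,n-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n-1,0} & A_{n-1,1} & \cdots & A_{n-1,n-1}
\end{bmatrix}
\]

where row i lists the players in $P_i$'s order of preference: each $A_{i,j}$ is a player number, and a smaller value of j means that the player is more preferred by $P_i$. (The number of the player is relative to the current player.)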

The current player's best strategy is: for him (who is $P_{t \bmod n}$), among all the players who can win if he lets them win, he will let his most preferred one win. Representing that statement mathematically, we have

\[
g(G, t) =
\begin{cases}
0, & G = \emptyset \\
A_{t,k}, & \text{otherwise}
\end{cases}
\tag{1}
\]

such that

\[
k = \min \{\, j \in \mathbb{N} \mid A_{t,j} = g(H, t+1) + 1,\ H \in G \,\}
\]

where t actually means t mod n, and H is the game after the current player makes his move. For simplicity, in this section the arithmetic is done modulo n unless otherwise specified.

Thus, g(G, t) is a recursive function that starts from the bottom vertices of the game decision tree (GDT) and computes the value of each node up to the top.

Proof If the game is an empty set ($G = \emptyset$), then, by Definition 1.3, its value should be 0, because the current player (who is $P_0$ relative to himself) cannot make any legal move and thus is the winner. Hence, the boundary value of the recursive function, $g(\emptyset, t) = 0$, is proven.

If the game G is not an empty set, then a player $A_{t,k}$ who can win if the current player lets him win satisfies $A_{t,k} = g(G, t) = g(G, t+1) + 1 = \cdots = g(G, t+a) + a$ (where $a \in \mathbb{N}$). Here, with a slight abuse of notation, $g(G, t+a)$ denotes the value of the position reached after a further moves, so the same winner has a relative number that drops by 1 each round. Also, because the current player will make the optimal move by letting his most preferred player win, he should choose the one with the smallest k among all the possible values of $A_{t,k}$. Therefore, the recursion $g(G, t) = A_{t,k}$ with $k = \min \{\, j \in \mathbb{N} \mid A_{t,j} = g(H, t+1) + 1,\ H \in G \,\}$ is correctly defined.
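The recursion in equation 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the program of Appendix 1.1: a game is represented as a nested list (the empty list is the end game ∅), row `t mod n` of the preference matrix `A` lists relative player numbers from most to least preferred, and the names `game_value` and `chain` are hypothetical.

```python
from typing import List, Sequence


def game_value(G: List, t: int, A: Sequence[Sequence[int]]) -> int:
    """Value of game G at round t: the winner, numbered relative to the mover."""
    n = len(A)                      # number of players
    if not G:                       # G is the end game: the mover cannot move and wins
        return 0
    # g(H, t+1) + 1 (mod n) renumbers each child's winner relative to the mover.
    reachable = {(game_value(H, t + 1, A) + 1) % n for H in G}
    # Walk the mover's preference row from most to least preferred.
    for candidate in A[t % n]:
        if candidate in reachable:
            return candidate
    raise ValueError("each preference row must list every relative player number")


def chain(length: int) -> List:
    """Hypothetical helper: a game in which exactly `length` forced moves remain."""
    G: List = []
    for _ in range(length):
        G = [G]
    return G


# Three players; P0 chooses between a 1-move chain and a 2-move chain.
G = [chain(1), chain(2)]
selfish = [[0, 1, 2]] * 3           # everyone prefers himself, then the next player
print(game_value(G, 0, selfish))    # 0: P0 enters the 2-move chain and wins
```

If the first row is changed to `[2, 0, 1]`, so that $P_0$ prefers the player two seats after him, the same call returns 2: the preference row alone decides which reachable winner is selected.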

B. Multi-Player Games with Payoffs

This type of game is more commonly seen in economic studies. First of all, the rules of these games are:

• The winner and loser are not defined.

• Each player receives an assigned payoff at the end of the game, depending only on the position of the ending point.

• The information about all possible payoffs and game positions (which are finite) is known by all players before the game starts.

• All players play in order from $P_0$ to $P_{n-1}$.

In fact, Krawec's approach can be understood as follows: the outcome of the game is the number of the winner, and the more the winner is preferred by $P_i$, the better the outcome is for $P_i$. By analogy, the outcome of an n-player game with payoffs is the final payoffs to the players, and the more preferred the payoff is for $P_i$, the better the outcome is for $P_i$. Therefore, we can define a new preference matrix and solve n-player games with payoffs systematically.

Definition 2.1 A game G is defined as a finite collection of sets, $G = \{G_1, G_2, \cdots, G_k\}$, where $G_i$ is a possible move that leads to the next game of G. Each $G_i$ is, if not a vertex, also a finite collection of sets, etc.⁷

Definition 2.2 An end game is a game G where the current player has no legal moves available, and such G is a vector with each entry $\alpha_{ij}$ being player $P_j$'s i-th payoff.⁸

Definition 2.3 The outcome of game G at round t is defined as the equivalent payoff for the current player. The payoff of a game G at round t is determined by the recursive function p(G, t), which is defined below. In other words, if $P_i$ is the player to move in game G at round t and his payoff is $p_i$, then $p(G, t) = p_i$.

Definition 2.4 The Preference-of-Payoff Matrix is defined as:

\[
PPM =
\begin{bmatrix}
f_{0,1} & f_{0,2} & \cdots & f_{0,m} \\
f_{1,1} & f_{1,2} & \cdots & f_{1,m} \\
\vdots & \vdots & \ddots & \vdots \\
f_{n-1,1} & f_{n-1,2} & \cdots & f_{n-1,m}
\end{bmatrix}
\]

with $f_{i,j}$ being the j-th most satisfactory payoff for player $P_i$⁹. The total number of payoffs is m.

Definition 2.5 To serve the purpose of this paper, we define a notation ρ(A). Let $u_i$ be the elements of set A. If all $u_i$ are vectors ($u_i \in V$), then ρ(A) = 1; otherwise, ρ(A) = 0.

Definition 2.6 To serve the purpose of this paper, we define a function T as: $T(p(G, t_2), P_{t_1}, P_{t_2})$ returns player $P_{t_1}$'s corresponding payoff if player $P_{t_2}$ receives a payoff of $p(G, t_2)$ in game G at round $t_2$.

⁷ Note: although our notation does not account for non-impartial games, the moves in a game can depend on the player to move, so partizan games are included in our analysis.

⁸ This notation may suggest that G is a set of bases for a subspace of the vector space V, but in fact they are completely different.

⁹ Note: the preference-of-payoff matrix PPM is in general not a square matrix, because the total number of possible outcomes m does not necessarily equal the total number of players n.

The current player's best strategy is: for him (who is $P_{t \bmod n}$), among all the payoffs that can be the outcome of the game, he will make a move that leads to his most preferred payoff.¹⁰ Representing that statement mathematically, we can define the function p to be

\[
p(G, t) =
\begin{cases}
f_{t,a}, & \rho(G) = 1 \\
f_{t,l}, & \rho(G) = 0
\end{cases}
\tag{2}
\]

such that

\[
l = \min \{\, j \in \mathbb{N} \mid f_{t,j} = T(p(H, t+1), P_t, P_{t+1}),\ H \in G \,\}
\]

\[
a = \min \Big\{\, k \in \mathbb{N} \;\Big|\; f_{t,k} \in \bigcup_i \alpha_{ij} \,\Big\}
\]

where t should be taken modulo n, H is the resulting game after the current player makes his move, and $P_j$ is the player who makes the last move ($j = t_{total} \bmod n$).

Thus, p(G, t) is a recursive function that starts from the bottom vertices of the GDT and computes each current player's payoff up to the top. After it determines the first player's payoff, we can use the function $T(p(G, 0), P_i, P_0)$ to find player $P_i$'s payoff.

Proof If G is a collection of vectors (ρ(G) = 1), then the last round of the game is reached, and the current player should make his optimal move by choosing the strategy that returns the most satisfactory result. That is, among all the payoffs available to him at this round ($\bigcup_i \alpha_{ij}$), he should choose the one with the smallest k, so the boundary value $p(G, t) = f_{t,a}$ with $a = \min \{\, k \in \mathbb{N} \mid f_{t,k} \in \bigcup_i \alpha_{ij} \,\}$, for ρ(G) = 1, is proven.

If G is a collection of sets, then a payoff $f_{t,l}$ that the current player can possibly receive satisfies $f_{t,l} = p(G, t) = T(p(G, t+1), P_t, P_{t+1}) = \cdots = T(p(G, t+a), P_t, P_{t+a})$ (where $a \in \mathbb{N}$). Also, because the current player will make his optimal move by receiving the most satisfactory payoff, he should choose the one with the smallest value of l among all the possible values of $f_{t,l}$. Therefore, the recursion $p(G, t) = f_{t,l}$ with $l = \min \{\, j \in \mathbb{N} \mid f_{t,j} = T(p(H, t+1), P_t, P_{t+1}),\ H \in G \,\}$, given ρ(G) = 0, is correctly defined.

¹⁰ It is usual to assume that all players prefer higher payoffs. However, with the PPM, this does not necessarily need to be the case.
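Equation 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's program: a game is a nested list whose leaves are payoff tuples, the PPM is a list of rows, and, instead of calling T, the sketch passes the whole payoff vector up the tree so that every player's payoff is available at the root. The name `solve` and the toy game are hypothetical.

```python
from typing import Sequence, Tuple, Union

Payoffs = Tuple[float, ...]
Node = Union[Payoffs, list]


def solve(G: Node, t: int, PPM: Sequence[Sequence[float]]) -> Payoffs:
    """Backward induction: the payoff vector reached from G at round t."""
    if isinstance(G, tuple):            # a payoff vector ends the game
        return G
    i = t % len(PPM)                    # index of the player to move
    outcomes = [solve(H, t + 1, PPM) for H in G]
    # A smaller column index in row i of the PPM means "more preferred" by P_i.
    # Ties keep the first move; Section II-C treats tied payoffs properly.
    return min(outcomes, key=lambda v: PPM[i].index(v[i]))


# Toy 2-player, 2-round game: P0 picks a branch, then P1 picks a payoff vector.
G = [[(3, 1), (0, 2)], [(2, 0), (1, 5)]]
profit_max = [[3, 2, 1, 0], [5, 2, 1, 0]]
print(solve(G, 0, profit_max))          # (1, 5)
```

Because the ordering lives entirely in the PPM rows, replacing the second row by `[0, 1, 2, 5]` (a player who prefers lower payoffs) changes the result to `(3, 1)` without touching the recursion.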

C. Games with Mixed Strategies

In the previous part, the function $T(p, P_t, P_{t+1})$ was used to find the current player's payoff given the payoff of the player at the next round. However, in some cases the next player may receive identical payoffs from different moves, while those moves correspond to different payoffs for the current player. In that situation, the function T is not well defined, because a mathematical function cannot take one input (or one set of inputs) and return more than one value (or set of values). Therefore, we need a new way to analyze these games.

A game of this kind may have a GDT as shown in figure 1. This GDT indicates that if the first player chooses his first move, then no matter what move the second player makes, $P_1$ will always earn 3, while $P_0$ may earn 2 or 4. If $P_0$ earns 4, then it is worthwhile for him to choose his first move; if not, then he should choose his second move to earn 3. Because negotiation is forbidden, the second player will make random moves among all the choices that yield identical payoffs, while the first player makes his moves based on the probability distribution of the second player. The study of chance and probability is not part of combinatorial game theory, so this paper does not focus primarily on that topic.

Fig. 1. A 2-round, 2-player game with payoffs. The number on each edge represents the number of the move for the current player, and the tuple (a, b) gives the payoffs for the 2 players respectively.

Firstly, we need an accurate definition of the function T. We now define it to be a vector function $\mathbf{T}(p(G, t_2), P_{t_1}, P_{t_2})$ that returns all of player $P_{t_1}$'s payoffs as a vector if player $P_{t_2}$ has a payoff of $p(G, t_2)$; if $P_{t_1}$ has only one corresponding payoff, then the function $\mathbf{T}$ returns a vector with only one entry. Also, if the payoffs for k moves in round t are identical, then the player's decisions will be made according to the probability distribution vector $\mathbf{p}_t = (p_1, p_2, \cdots, p_k)$, with $\sum_{i=1}^{k} p_i = 1$. When the function $\mathbf{T}$ returns a vector with only one entry, the player $P_{t_2}$ has only one move to make, and his probability distribution is then $\mathbf{p}_{t_2} = (1)$. The probability distribution vectors are known by all players before the game starts. Therefore, the inner product $\langle \mathbf{p}, \mathbf{T} \rangle$ returns a player's average payoff.

Using these notions, we can postulate the payoff function of game G at round t, assuming all players are profit-maximizing, to be

\[
p(G, t) =
\begin{cases}
\max \Big\{ \bigcup_i \alpha_{ij} \Big\}, & \rho(G) = 1 \\
\max \big\{ \langle \mathbf{p}_{t+1}, \mathbf{T}(p(H, t+1), P_t, P_{t+1}) \rangle \big\}, & \text{otherwise}
\end{cases}
\tag{3}
\]

where t should be taken modulo n, H is the resulting game after the current player makes his move, and $P_j$ is the player who makes the last move.
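The inner product in equation 3 can be illustrated on the game of figure 1. The sketch below is only an illustration: the function names are hypothetical, the probability vectors are made-up inputs, and breaking an exact tie in favour of the sure payoff is an assumption that the paper does not specify.

```python
from typing import Sequence


def expected_payoff(p: Sequence[float], T: Sequence[float]) -> float:
    """Inner product <p, T>: the mover's average payoff over the tied replies."""
    if len(p) != len(T):
        raise ValueError("p and T must have the same length")
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(pi * ti for pi, ti in zip(p, T))


def first_player_move(p1: Sequence[float]) -> int:
    """P0's move in figure 1 when P1 mixes with distribution p1."""
    risky = expected_payoff(p1, (2, 4))   # move 1: P0 earns 2 or 4, P1 earns 3
    sure = 3                              # move 2: P0 earns 3
    return 1 if risky > sure else 2       # assumed tie-break: take the sure payoff


print(expected_payoff((0.25, 0.75), (2, 4)))  # 3.5
print(first_player_move((0.25, 0.75)))        # 1
print(first_player_move((0.75, 0.25)))        # 2
```

With a uniform distribution (0.5, 0.5) the two moves have the same average payoff of 3 for $P_0$, which is exactly the kind of indifference that makes these games harder to analyze.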

III. EXAMPLES AND APPLICATIONS

To test and apply the functions derived earlier, we use them to solve several example games, some of which have important applications in the real world.

A. Multi-Player Games with a Defined Winner

Example 1.1 Consider a Nim game with 2 players. There are 2 heaps of cards, of sizes 1 and 2 respectively, which look like:

/

//

Since this is a 2-player game with each player being rational and self-interested, the preference matrix is:

\[
PM =
\begin{bmatrix}
0 & 1 \\
0 & 1
\end{bmatrix}
\]

By analyzing how each player can take cards away from the 2 heaps, we can draw the following GDT. The process is straightforward but tedious, so it is not discussed.

Fig. 2. Game decision tree of Example 1.1. The tuple a, b means that there are a cards left in the first heap and b cards left in the second heap. $T_{ij}$ means that the current player takes i cards from the j-th heap.

We can see from this simple decision tree that if the first player takes both cards from the second heap, the second player is forced to take the last card, which leaves the first player with no legal move, so the first player wins. Therefore, this game should have a value of 0, meaning that player $P_0$ (the first player) will win.

The same value follows from equation 1 along this line of play. The vertex at the bottom of the GDT represents $\emptyset$, so $g(\emptyset, 2) = 0$. Using equation 1, we can see that the subscript k at round t = 1 is

\[
k = \min \{\, j \in \mathbb{N} \cup \{0\} \mid A_{1,j} = g(\emptyset, 2) + 1 \,\}
  = \min \{\, j \in \mathbb{N} \cup \{0\} \mid A_{1,j} = 1 \,\}
  = 1
\]

Therefore, the game value at this round is

\[
g(G, 1) = A_{1,1} = 1
\]

Similarly, applying equation 1 again, we can find the value of the game at its starting point as g(G, 0) = 0, which agrees with our previous derivation by common sense.
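Example 1.1 can also be checked mechanically. The sketch below is an illustration, not the program of Appendix 1.1: it generates the Nim moves directly from the heap sizes, applies equation 1 with the preference matrix above under the rules of Section II-A, and caches positions by (heaps, t mod n). The function names are hypothetical, and the move labels follow the $T_{ij}$ convention of Fig. 2.

```python
from functools import lru_cache
from typing import Iterator, Tuple

PM = ((0, 1), (0, 1))   # preference matrix of Example 1.1
N = len(PM)             # number of players

Heaps = Tuple[int, ...]


def moves(heaps: Heaps) -> Iterator[Tuple[str, Heaps]]:
    """Yield (label, successor): 'Tij' means take i cards from the j-th heap."""
    for j, size in enumerate(heaps):
        for i in range(1, size + 1):
            yield f"T{i}{j + 1}", heaps[:j] + (size - i,) + heaps[j + 1:]


@lru_cache(maxsize=None)
def g(heaps: Heaps, t: int) -> int:
    """Equation 1 on a Nim position; t must already be reduced modulo N."""
    successors = [h for _, h in moves(heaps)]
    if not successors:                     # no cards left: the mover wins
        return 0
    reachable = {(g(h, (t + 1) % N) + 1) % N for h in successors}
    return next(a for a in PM[t] if a in reachable)


start = (1, 2)
print(g(start, 0))                                        # 0
print({m: (g(h, 1 % N) + 1) % N for m, h in moves(start)})
# {'T11': 1, 'T12': 1, 'T22': 0}
```

The dictionary gives the winner (relative to $P_0$) after each opening move. In this sketch only T22 evaluates to 0, which is enough for the game value g(G, 0) = 0.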

Example 1.2 Consider a Nim game with 4 players. There are 4 heaps of cards, of sizes 3, 5, 2 and 4 respectively, which look like:

///

/////

//

////

Using the recursive function (this time, the recursion has too many steps to be presented, and the computation is done by the recursive program), we find that the value of this game is g(G, 1) = 2.

B. Multi-Player Games with Payoffs

Example 2.1 The decision tree of the game is shown in figure 3.

Fig. 3. The number on each edge represents the number of the current player's possible move, "DP" stands for Decision Point, and the vector (a, b, c) gives the payoffs for the 3 players respectively.

Represented as a set, this game is

\[
G = \big\{ \{ \{(2, 5, 1)\}, \{(2, 7, 4), (2, 7, 2)\} \}, \{ \{(-4, 6, 8), (-4, 6, 3)\} \} \big\}
\]

We assume that all 3 players are profit-maximizing, so the preference-of-payoff matrix is

\[
PPM =
\begin{bmatrix}
2 & 2 & 2 & -4 & -4 \\
7 & 7 & 6 & 6 & 5 \\
8 & 4 & 2 & 1 & 3
\end{bmatrix}
\]

$P_0$ has two possible moves. If he decides to make his first move, then $P_1$ will have 2 possible moves. If $P_1$ then chooses to make his first move, the only choice left for $P_2$ is to receive a payoff of 1. If $P_1$ decides to make his second move, then $P_2$ can choose between his second and third moves. Because making the second move returns him a better payoff (in the third row of the PPM, 4 is to the left of 2), he will certainly choose his second move. Therefore, if $P_1$ makes his first move, he will earn 5, and if he makes his second move, he will earn 7, which means that he will make his second move (in the second row of the PPM, 7 is to the left of 5). Therefore, the payoff for $P_0$ is 2. Similarly, if $P_0$ makes his second move, he will incur a loss of 4. As a result, $P_0$ will choose his first move (in the first row of the PPM, 2 is to the left of −4), so that the net payoff vector for the game is (2, 7, 4).
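The walk-through of Example 2.1 can be reproduced with a short backward-induction sketch. It is an illustration under assumptions rather than the author's code: the set G is written as a nested list with payoff tuples as leaves, the PPM rows are copied from the matrix above, the whole payoff vector is propagated instead of calling T, and `best_line` is a hypothetical name. Branch indices in the result are 0-based positions inside each node, not the move numbers printed in Fig. 3.

```python
from typing import List, Sequence, Tuple, Union

Payoffs = Tuple[float, ...]
Node = Union[Payoffs, list]


def best_line(G: Node, t: int, PPM: Sequence[Sequence[float]]) -> Tuple[Payoffs, List[int]]:
    """Return (payoff vector, chosen branch indices) for G at round t."""
    if isinstance(G, tuple):                # a payoff vector ends the game
        return G, []
    i = t % len(PPM)                        # player to move
    best_rank, best_payoff, best_moves = None, None, None
    for branch, H in enumerate(G):
        payoff, line = best_line(H, t + 1, PPM)
        rank = PPM[i].index(payoff[i])      # position in P_i's preference row
        if best_rank is None or rank < best_rank:
            best_rank, best_payoff, best_moves = rank, payoff, [branch] + line
    return best_payoff, best_moves


G = [[[(2, 5, 1)], [(2, 7, 4), (2, 7, 2)]],
     [[(-4, 6, 8), (-4, 6, 3)]]]
PPM = [[2, 2, 2, -4, -4],
       [7, 7, 6, 6, 5],
       [8, 4, 2, 1, 3]]

print(best_line(G, 0, PPM))   # ((2, 7, 4), [0, 1, 0])
```

The returned line reads: $P_0$ takes his first branch, $P_1$ his second, and $P_2$ the first of the two payoff vectors then available, giving the net payoff vector (2, 7, 4) stated in the text.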

Example 2.2 In recent years, there has been a growing concern that many NEV producers are trying to cheat in order to earn the government's subsidy for developing new-energy technology¹¹. It turns out that almost all investigators know whether the producers they investigate have cheated, but whether they report the cheating behavior to the government or take a bribe to collude is a different story. Therefore, the game between a producer and an investigator is one with perfect information. In this example, we analyze a case of subsidy cheating.

¹¹ Each car sold by an NEV producer can receive a large subsidy, so the producer may make a false report about the actual number sold. Since the government usually sends an investigator to check the producer's report, the producer is likely to negotiate a bribe with the investigator in order to collude.

Fig. 4. This is a typical GDT between a car producer and an investigator. "P" stands for producer and "I" stands for investigator. For the investigator, move 1 means "report the cheating behavior", while move 2 means "do NOT report the cheating behavior"; for the producer, move 1 means "cheat without a bribe", move 2 means "develop technology (no cheating)", and move 3 means "cheat with a bribe". The data are taken from the reports in [1] and [2].

As shown in figure 4, the payoffs of this GDT are given. Using equation 2, it is not difficult to determine that the net payoff vector of this game should be (1.5, 40000), which means that the producer will develop new-energy technology and the investigator won't report the cheating behavior. However, if the investigator is not allowed to receive the bribe and report cheating at the same time, the outcome will be that the producer cheats with a bribe, which can be demonstrated by analyzing the new GDT obtained when the end node with payoff (11, −10005) is removed.

In fact, this situation takes place in cycles, and, in order to regulate the cheating behavior, the government usually decreases the investigator's stable salary¹² steadily (in this example, the government decreases the investigator's stable salary by 33% each cycle) until a cheating behavior is reported (which means the investigator's stable salary in the next cycle will be lower if he does not report cheating). Assuming the bonus and the bribe are the same, we can draw a new GDT illustrating the effect of such a policy, as presented in figure 5. If needed, a larger display of this GDT can be seen in Appendix 2.

Fig. 5. Usually one cycle takes 4 months to complete. In this graph, two cycles are shown, covering the game between the producer and the investigator for 8 months. Seven more cycles are taken into account when calculating the payoffs; they are not shown in this graph so that it does not look too dense. After the cheating behavior is reported, the firm will be forced to shut down, ending the game.

Analyzing the complete game using equation 2, it is found that the producer will decide to develop new technology starting at the fifth round, because otherwise the investigator will report his cheating behavior, forcing him to leave the industry. Hence, if the stable salary for the investigator falls below $1.5 \times (1 - 33\%)^4 \approx 0.3$, he will not collude with the producer, which forces the producer to develop new technology.
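The threshold can be checked by expanding the expression step by step. The only assumption in this short derivation is that the 33% cut is applied multiplicatively to the value 1.5 once per cycle, for four cycles, as the expression in the text indicates:

```latex
\begin{aligned}
1.5 \times (1 - 0.33)^4 &= 1.5 \times 0.67^4 \\
                        &= 1.5 \times 0.20151121 \\
                        &= 0.302266815 \approx 0.3
\end{aligned}
```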

IV. DISCUSSIONS

Based on the application of equations 1 and 2 to some example games, we find that these recursive functions yield accurate results in a reasonable amount of time. The time it takes for the recursive functions to return results is proportional to the number of nodes in the GDT, since this backward induction[10] begins from the end points and moves toward the start of the game one node at a time. However, because the number of nodes grows exponentially as the number of rounds increases, the efficiency of this recursive algorithm is still limited when it is applied to large games. Therefore, it is worth studying heuristic solutions that reduce the time complexity to a polynomial in the number of rounds, some of which have already been tried by scholars¹³.

¹² For a clear definition of stable salary and its effect on the investigator's earnings, please refer to Corruption, evasion and environmental policy: a game theory approach[6].
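To make the exponential growth of the GDT concrete, consider an idealized tree (an assumption for illustration only, not a game from this paper) in which the player to move has exactly $b \ge 2$ moves in each of $r$ rounds. The number of nodes $N$ that backward induction must visit is then a geometric sum:

```latex
N = \sum_{k=0}^{r} b^{k} = \frac{b^{r+1} - 1}{b - 1},
\qquad \text{e.g. } b = 3,\ r = 10:\quad N = \frac{3^{11} - 1}{2} = 88573 .
```

Under this assumption each additional round multiplies the work by roughly $b$, which is why the running time, although linear in the number of nodes, is exponential in the number of rounds.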

A more general case of n-player games with payoffs is one with identical payoffs, which means that some players are indifferent among the moves at certain rounds. To solve games of that kind, mixed strategies and chance moves are involved. When equation 3 is used to solve games involving mixed strategies, we find that the payoff vectors need to be updated through the course of the recursion. When probability is introduced into our study, it is inevitable to consider the average payoff a player will earn. This concept is not within the field of combinatorial game theory, and it may lead to new definitions of the outcomes of games, which is worth more research.

V. CONCLUSION

Many 2-player combinatorial games can be extended to n-player games. This paper uses the preference system to solve for the game value of such games. Afterwards, a recursive function for determining an n-player game's outcome is derived and proven analytically.

We then provide an analysis of n-player games with payoffs by revising the preference system. A recursive function is then proposed and proven to yield accurate results when solving these games.

When generalizing to n-player games with mixed strategies, we find that an important function used previously needs to be redefined. The new definition makes the computation more complicated, so this type of game is worth further study for better solutions.

To help future researchers extend the study of these games, we have attached the recursive algorithms, programmed in Python, with which game values can be quickly determined.

¹³ In recent times, researchers such as Smith[9] have developed approximation algorithms that can largely reduce the complexity of backward induction, although many of them (especially those that take advantage of bionic evolution) are not always stable and sometimes return only a local optimum.

REFERENCES

[1] Berlekamp, Elwyn R., John H. Conway, and Richard K. Guy. Winning Ways for Your Mathematical Plays, Volume 1. AK Peters/CRC Press, 2018.
[2] Nivasch, Gabriel. "The Sprague-Grundy function of the game Euclid." Discrete Mathematics 306.21 (2006): 2798-2800.
[3] Liu, Wen An, and Juan Yang. "Multi-player last nim with passes." International Journal of Game Theory 47.2 (2018): 673-693.
[4] Krawec, Walter O. "Analyzing n-player impartial games." International Journal of Game Theory 41.2 (2012): 345-367.
[5] Kelly, A. R. "Analysis of one pile misère Nim for two alliances." Rocky Mountain Journal of Mathematics 41.6 (2011): 1895-1906.
[6] Cerqueti, R., and R. Coppier. "Corruption, evasion and environmental policy: a game theory approach." IMA Journal of Management Mathematics 27.2 (2014): 235-253.
[7] Sanyal, R., and S. Samanta. "Trends in international bribe-giving: do anti-bribery laws matter?" Journal of International Trade Law and Policy 10.2 (2011): 151-164.
[8] Cairns, Grant, Nhan Bao Ho, and Tamás Lengyel. "The Sprague-Grundy function of the real game Euclid." Discrete Mathematics 311.6 (2011): 457-462.
[9] Smith, David K. "Dynamic programming and board games: A survey." European Journal of Operational Research 176.3 (2007): 1299.
[10] Aumann, Robert J. "Backward induction and common knowledge of rationality." Games and Economic Behavior 8.1 (1995): 6-19.
[11] Iqbal, A., and A. H. Toor. "Backwards-induction outcome in a quantum game." Physical Review A 65.5 (2002): 052328.
[12] Berlekamp, Elwyn R., John H. Conway, and Richard K. Guy. Winning Ways for Your Mathematical Plays, Volume 4. AK Peters/CRC Press, 2004.

VI. APPENDICES

Appendix 1.1

Below is the Python program used to solve Example 1.1.

def flat(nums, t, n):
    """Flatten an arbitrarily nested list of winners into a flat list."""
    res = []
    if isinstance(nums, list):
        for i in nums:
            if isinstance(i, list):
                res.extend(flat(i, t, n))
            else:
                res.append(i)
    else:
        res = nums
    return res


def indexing(lista, element, t, n):
    """Return the index of the first branch of lista containing element + (t mod (n+1))."""
    for i in range(len(lista)):
        listb = lista[i]
        if isinstance(listb, list):
            players = flat(listb, t, n)
            if (element + (t % (n + 1))) in players:
                return i
        elif (element + (t % (n + 1))) == listb:
            return i
    return None


def g(G, t, n, A, winners):
    """Equation 1: value of game G at round t; n is the largest player index (n + 1 players)."""
    Alist = A[t % (1 + n)]
    if G == []:
        return 0
    listk = []
    for k in range(n + 1):
        decision = indexing(winners, Alist[k], t, n)
        if isinstance(decision, int):
            NextMove = G[decision]
            nextwinners = winners[decision]
            # The arithmetic is done modulo the number of players (Section II-A).
            if Alist[k] == (g(NextMove, t + 1, n, A, nextwinners) + 1) % (n + 1):
                listk.append(k)
    res = Alist[min(listk)]
    print(listk)  # admissible preference indices at this node
    return res


G = [[], [[], [[]]]]
winners = [[[1]], [[2], [[0]]]]
A = [[0, 1], [0, 1]]
B = [[0, 2, 1], [0, 2, 1], [0, 1, 2]]  # a 3-player preference matrix, unused in this example
gamevalue = g(G, 0, 1, A, winners)
print(gamevalue)

Appendix 1.2

In this appendix, the Python code used for generating the decision trees is presented.

import matplotlib.pyplot as plt

# Box and arrow styles for decision nodes and leaves.
decisionNode = dict(boxstyle="sawtooth", fc="0.8")
leafNode = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")


def davidTree(j):
    """Return the j-th (1-based) example tree as nested dictionaries."""
    resulttree = [
        {'1,2': {'T11': {'0,2': {'T12': {'0,1': {'T12': '0,0'}}, 'T22': '0,0'}},
                 'T12': {'1,1': {'T11': {'0,1': {'T12': '0,0'}}, 'T12': {'1,0': {'T11': '0,0'}}}},
                 'T22': {'1,0': {'T11': '0,0'}}}},
        {'0': {1: {'2': {1: {'5': {1: '1'}}, 2: {'7': {2: '4', 3: '2'}}}},
               2: {'-4': {3: {'6': {4: '8', 5: '3'}}}}}},
        {0:{1:{4:{1:{3:{1:{11:{1:{3:{1:5,2:7}}}},2:{9:{2:{15:{3:8}},
         3:{17:{4:7,5:-1}},4:{-2:{6:12}}}}}},2:{-2:{3:{5:{5:{0:{7:-7,
         8:-9}},6:7}}}},3:{1:{4:{4:{7:{18:{9:17,10:19}}}},5:{-9:{8:{
         9:{11:3}},9:{-1:{12:5,13:6}},10:{7:{14:8}}}}}}}},2:{7:{4:{5:
         {6:{21:{11:{-6:{15:3,16:-6}}}},7:{12:{12:{5:{17:3,18:-1,19:
         2}},13:{-1:{20:-5,21:3}}}}}}}}}},
    ]
    return resulttree[j - 1]


def getNumLeafs(myTree):
    """Count the leaves below the root of myTree."""
    numLeafs = 0
    firstSide = list(myTree.keys())
    firstStr = firstSide[0]
    secondDict = myTree[firstStr]
    for key in secondDict:
        if isinstance(secondDict[key], dict):
            numLeafs += getNumLeafs(secondDict[key])
        else:
            numLeafs += 1
    return numLeafs


def getTreeDepth(myTree):
    """Return the number of edge levels below the root of myTree."""
    maxDepth = 0
    firstSide = list(myTree.keys())
    firstStr = firstSide[0]
    secondDict = myTree[firstStr]
    for key in secondDict:
        if isinstance(secondDict[key], dict):
            thisDepth = 1 + getTreeDepth(secondDict[key])
        else:
            thisDepth = 1
        if thisDepth > maxDepth:
            maxDepth = thisDepth
    return maxDepth


def plotNode(nodeTxt, centerPt, parentPt, nodeType):
    """Draw one node box at centerPt with an arrow coming from parentPt."""
    createPlot.ax1.annotate(str(nodeTxt), xy=parentPt, xycoords='axes fraction',
                            xytext=centerPt, textcoords='axes fraction',
                            va="center", ha="center", bbox=nodeType,
                            arrowprops=arrow_args)


def plotMidText(cntrPt, parentPt, txtString):
    """Write the move label halfway along the edge between parent and child."""
    xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]
    yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]
    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center",
                        rotation=30)


def plotTree(myTree, parentPt, nodeTxt):
    """Recursively draw myTree; the drawing state is kept in function attributes."""
    numLeafs = getNumLeafs(myTree)
    firstSide = list(myTree.keys())
    firstStr = firstSide[0]
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW,
              plotTree.yOff)
    plotMidText(cntrPt, parentPt, nodeTxt)
    plotNode(firstStr, cntrPt, parentPt, decisionNode)
    secondDict = myTree[firstStr]
    plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD
    for key in secondDict.keys():
        if isinstance(secondDict[key], dict):
            plotTree(secondDict[key], cntrPt, str(key))
        else:
            plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW
            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt,
                     leafNode)
            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
    plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD


def createPlot(inTree):
    """Create a blank figure and draw inTree on it."""
    fig = plt.figure(1, facecolor='white')
    fig.clf()
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)
    plotTree.totalW = float(getNumLeafs(inTree))
    plotTree.totalD = float(getTreeDepth(inTree))
    plotTree.xOff = -0.5 / plotTree.totalW
    plotTree.yOff = 1.0
    plotTree(inTree, (0.5, 1.0), '')
    plt.show()


for i in range(1, 4):
    mytree = davidTree(i)
    print(mytree)
    createPlot(mytree)

Appendix 2


ACKNOWLEDGMENTS

The author would like to thank Professor Morley for his instruction and guidance in the completion of this paper. The author would also like to thank the Pioneer Academics team for providing necessary assistance in improving the quality of the writing of this paper. The author also wishes to thank his family for their never-ending support.
