

Social Norms for Self-Policing Multi-agent Systems and Virtual Societies

A dissertation submitted by Daniel Villatoro Segura

at Universitat Autònoma de Barcelona to fulfill the degree of PhD in Computer Science.

Bellaterra, June 1st, 2011

Director: Dr. Jordi Sabater-Mir
Institut d'Investigació en Intel·ligència Artificial
Consell Superior d'Investigacions Científiques


To my parents and to The Beatles.


Acknowledgements

Many hours and much dedication have gone into this document, but fortunately I have not walked this path alone. Here I will name only some of the people and institutions that have actively contributed to pushing me along the way.

First of all, I must thank my thesis supervisor, Jordi Sabater, for two reasons. First, I am grateful that, years ago, he trusted an African interviewed via Skype, and that he let me explore and grow scientifically. Thanks to him I have not only learned scientific method and integrity; he has also helped me mature as a person, as we faced adverse moments and unexpected joys together. The second reason I am grateful to him is for having granted me this opportunity alongside Señor Pinyol: a great colleague and friend, with whom I suffered through the eRep project. Who would have thought that so many hours of deciphering and programming would lead us to toast (once more) to science on the Danube?

The following people are also protagonists of those toasts; I am infinitely grateful to them for making the road easier. Meritxell, guapa!, for being our example to follow, professionally and personally. Angi, for your encouragement, strength and camaraderie. Sr. Nin, for the humor and the coffees. Mari Carmen, for that inexhaustible smile and exemplary perseverance. Peter (Dani Polak), for always being there with us. The girls in Administration and Inma Moros, for solving a thousand and one problems for me, always with a smile on their faces. Tito, for being a loser at the ping-pong table but a good friend and a professional I am proud to have worked with. Elena, for so many beers together in so many corners of the world, trying to cope with the sorrows of science. Pablo Noriega, for letting me share so much time at his side, enlightening me with his elegance, and for having helped me countless times with advice wiser than he imagines. Pedro Meseguer, for shortening the distance between PhD students and scientists with his friendship. Juan Antonio Rodríguez, for all his advice at key moments. I also want to thank each and every member of the IIIA: it is a source of pride for me to have been part of this institution and to have shared all these years with its members.

I cannot forget to thank the partners from the projects I have been involved in: David de la Cruz, Jordi Brandts, Héctor Solaz, Assumpció Vila, Jordi Estévez, Raquel Piqué, Manuela Rodríguez, Thijs, Wander, Tina, Stefan, Torsten, and the Yamanas.

I would like to thank Professor Sandip Sen for hosting me at the University of Tulsa and sharing with me hours of work. I have learnt a lot from him, and an important part of this thesis was accomplished thanks to him.

Finally, I will never be able to forget my colleagues and friends in Rome. I would like to thank Rosaria Conte for welcoming me into the LABSS and teaching me this craft. Federica Mattei for the thousand favors. Francesca Giardini, Gennaro di Tosto, Stefano Picascia, Francisco Grimaldo, Luca Tummolini, Walter Quattrociocchi, Mario Paolucci, Federico Cecconi and Cristiano Castelfranchi for hosting me. And there are no words in Italian to thank Giulia Andrighetto: she gave me the burst of strength to finish this thesis. I also want to thank her for passing on to me her scientific passion and for being an incredible colleague and a very good friend. Grazies milles cara mia!

I want to thank the Santa Fe Institute for choosing me to participate in their summer school, giving me the chance to meet an incredible group of scientists at one of the best research labs in the world.

I also want to thank all those people who have trusted me during these years and without whom this thesis would not have been carried out: Julia Cot, Tomás Fuentes, Adrià Serra, Jordi López, Dani Vico, Jordi Esteller, Carles Amagat, Charly Burgmann, Pedro Martín, José Ramón Lozano, Alejandro Parodi, Elvira Quero, María Elías, Cecily Roche, Javier Medina, Mariola Ruiz, Gabriela Petrikovich, Maite Segu, Anna Trillo, Estíbaliz Puente, Víctor Cornet, Mario Bolívar, Ingrid Dolader, Ory Shoemaker and Alfonso Rodríguez (fling).

To finish, I cannot fail to thank my family. Your unconditional support has been fundamental in reaching this goal, and only thanks to you has it been possible. To my aunts Puri and Toni, and to my cousins in Barcelona, for making me feel at home. To my Tata for her good humor and all the affection she always sends me; to my cousin Inés, for never failing me; to my sister-in-law Cristina for her encouragement; to my nephews and niece Pablo, Julia and Jorge for believing in me; to my brothers Diego and Fernando, my role models, for always being so close; and finally, to my mother, for passing on to me a part of the strength, determination and perseverance that characterize her and fascinate us all; and to my father, for having trusted me and supported me always, and for the wise words he once taught me that helped me get through the "selectividad": "Head held high, firm step, and a good dose of bad temper."

Thank you, everyone.

To science!


Abstract

Social norms are one of the mechanisms by which decentralized societies achieve coordination amongst individuals. Such norms are conflict resolution strategies that develop from the interactions of the population rather than from a centralized entity dictating agent protocol. One of the most important characteristics of social norms is that they are imposed by the members of the society themselves, who are also responsible for the fulfillment and defense of these norms. By allowing agents to manage (impose, abide by and defend) social norms, societies achieve a higher degree of freedom, removing the need for authorities that supervise all the interactions amongst agents.

In this thesis we approach social norms as a malleable concept, understanding norms as dynamic and dependent on environmental situations and agents' goals. By equipping agents with the necessary mechanisms to handle this concept of norm, we have obtained an agent architecture able to self-police its behavior according to the social and environmental circumstances in which it is located.

First of all, we have grounded the difference between conventions and essential norms from a game-theoretical perspective. This difference is essential, as the two types favor coordination in games with different characteristics.

With respect to conventions, we have analyzed the search space of the emergence of conventions when approached through social learning. This exploration led us to discover the existence of Self-Reinforcing Structures that delay the emergence of global conventions. In order to dissolve them and accelerate the emergence of conventions, we have designed socially inspired mechanisms (rewiring and observation) that agents can use by accessing only local information. These social instruments represent a robust solution to the problem of convention emergence, especially in complex networks (like the scale-free).

For essential norms, on the other hand, we have focused on the "Emergence of Cooperation" problem, as it contains the characteristics of any essential norm scenario. In this type of game there is a conflict between the self-interest of the individual and the group's interest, with the social norm fixing a cooperative strategy. In this thesis we study different decentralized mechanisms by which cooperation emerges and is maintained.

An initial set of experiments on Distributed Punishment led us to discover that certain types of punishment have a stronger effect on decision making than a pure cost-benefit calculation would predict. Based on this result, we hypothesize that punishment (utility detriment) has a weaker effect on the cooperation rates of the population than sanction (utility detriment plus normative elicitation). This hypothesis has been integrated into the developed agent architecture (EMIL-I-A). We validate the hypothesis by performing experiments with human subjects and observing that the architecture behaves in accordance with human subjects in similar scenarios (representing the emergence of cooperation).


We have exploited this architecture, proving its efficiency in different in-silico scenarios and varying a number of important parameters that would be unfeasible to reproduce in laboratory experiments with human subjects (because of economic and time resources).

Finally, we have implemented an Internalization module, which allows agents to reduce their computation costs by linking compliance with their own goals. Internalization is the process by which an agent abides by the norms without taking into consideration the punishments/sanctions associated with defection. We have shown with agent-based simulation how Internalization has been implemented in the EMIL-I-A architecture, obtaining efficient performance of our agents without sacrificing adaptation skills.


Contents

Acknowledgements

Abstract

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Overview and Structure of the Thesis
  1.4 Related Publications

2 State of the Art
  2.1 Categorizing Norms
    2.1.1 Conventional Norms
    2.1.2 Essential Norms
  2.2 Convention Emergence
  2.3 Essential Norms and the Emergence of Cooperation in Artificial Social Systems
    2.3.1 Socially Inspired Mechanisms for Cooperation Emergence
    2.3.2 Punishment

3 Conventions: Trespassing the 90% Convergence Rate
  3.1 Model
  3.2 Overpassing the 90%
  3.3 Is cultural maintenance the source of subconventions?
    3.3.1 Experiments
    3.3.2 Effect of Memory Size
    3.3.3 Effect of Neighborhood Size
    3.3.4 Effect of Learning Modalities
    3.3.5 Effect of Number of Players
    3.3.6 Effect of Action Set
  3.4 Discovering subconventions: not only history, but topology

4 Unraveling Subconventions
  4.1 Social Instruments for Subconvention Dissolution
    4.1.1 Our Social Equipment
  4.2 Experiments
    4.2.1 Rewiring
    4.2.2 Observation
  4.3 Discussion on the Frontier Effect
  4.4 Understanding Subconventions in Scale-Free Networks
    4.4.1 Who's the Strongest Node?
  4.5 Combining Instruments: Solving the Frontier Effect
    4.5.1 Results
  4.6 Conclusions

5 Distributed Punishment
  5.1 HIHEREI: Humans playing in an Electronic Institution
    5.1.1 Extending EIDE
  5.2 Experimental Design
  5.3 Empirical Results
  5.4 Disentangling Distributed Punishment: the Punished's motivation

6 EMIL-I-A: The Cogno-Normative Agent Architecture
  6.1 Punishment and Sanctions
  6.2 The cognitive dynamics of norms
  6.3 The EMIL-I-A Architecture
    6.3.1 Norm Salience
    6.3.2 Salience Control Module
  6.4 Conclusions

7 Proving EMIL-I-A
  7.1 EMIL-I-A plays Distributed Punishment
    7.1.1 EMIL-I-A Decision Making
    7.1.2 EMIL-I-A Results in Distributed Punishment Experiment
  7.2 Experimenting with Punishment and Sanction
    7.2.1 Experimental Design
    7.2.2 Empirical Results
    7.2.3 EMIL-I-A Results
  7.3 Exploiting EMIL-I-A
    7.3.1 Simulation Model
    7.3.2 Experimental Design
    7.3.3 Punishment and Sanctioning on the Emergence of Cooperation
    7.3.4 Norms Spreading
    7.3.5 Costs of Punishment
    7.3.6 Dynamic Adaptation Heuristic
    7.3.7 Adapting to the Environment: Free Riders Invasions
  7.4 Conclusions

8 Internalization
  8.1 What is Internalization?
    8.1.1 Factors affecting internalization
  8.2 Dynamics of Norms Internalization
    8.2.1 Internalization Module
    8.2.2 Urgency Management Module
  8.3 Self-Regulated Distributed Web Service Provisioning
    8.3.1 Motivation
    8.3.2 Norms in our Web Service Scenario
    8.3.3 Model
    8.3.4 Agents Architectures
    8.3.5 Experimental Design
    8.3.6 Experiment 1: Adapting to Environmental Conditions
    8.3.7 Experiment 2: Internalizers vs IUMAs
    8.3.8 Experiment 3: Effect of Initial Norm Holders
    8.3.9 Experiment 4: Testing Locality: Norms Coexistence
    8.3.10 Experiment 5: Dealing with Emergencies: Selective Norm Violation
    8.3.11 Experiment 6: Topological Location
  8.4 Conclusions

9 Conclusions
  9.1 Convention Emergence
    9.1.1 Conclusions
    9.1.2 Future Work
  9.2 Essential Norms and Emergence of Cooperation
    9.2.1 Conclusions
    9.2.2 Future Work


List of Figures

2.1 Coordination Game
2.2 Prisoner's Dilemma
2.3 Collective Action Game
3.1 Underlying Topologies
3.2 Convergence rates with different topologies and actions set
3.3 Effect of Memory Size in Convergence Time on a Fully Connected network (100 Agents)
3.4 Topologies Comparison with different Memory Sizes with Mono Learning Approach
3.5 Convergence rates with different Neighborhood Sizes in a One Dimensional Lattice
3.6 Diameter relation with Neighborhood size in a One Dimensional Lattice with population = 100
3.7 Different Learning Approaches in One Dimensional Lattices with different Neighborhood Sizes
3.8 Subnetwork topology resistant to subconventions in a multi learning approach
3.9 Different Players in Fully Connected Networks with Different Learning Approaches
3.10 Neighborhood Sizes Comparison with Different Players with Multi Learning Approach
3.11 Reduced and Super Reduced Neighborhood sizes with different Players using the Multi Learning modality
3.12 Frontier effect in 3-players game with two competing conventions
3.13 Multi Player and Multi Action Games
4.1 Observation Methods
4.2 Examples of Frontiers in Different Networks
4.3 Cluster Sizes Histogram for Scale Free
4.4 Example of Components Evolution in Scale Free NA Rewiring Method
4.5 Rewiring Methods Comparison for Number of components in a One Dimensional Lattice
4.6 Convergence Times with different Rewiring Methods in a Scale Free Network
4.7 Comparison on Effects of Convergence Time on One Dimensional Lattice
4.8 Fixing Agents Behavior
4.9 Self-Reinforcing Structures
4.10 Comparison with Fixed SRS Nodes
4.11 Comparison with Simple and Combined Social Instruments on Regular Network using MBR
4.12 Comparison with Simple and Combined Social Instruments on Scale Free Network using BRR
5.1 Human - eI interaction
5.2 Distributed Punishment Experimental Results with Human Subjects
5.3 Empirical Results of the Punished
6.1 EMIL-I-A Architectural Design
7.1 Distributed Punishment Experimental Results with Simulated Agents
7.2 Punishment and Sanction Experiment with Human Subjects
7.3 Punishment and Sanction Experiment with EMIL-I-A Simulated Subjects
7.4 Prisoner's Dilemma Game Payoff Matrix
7.5 Effects of Punishment (P) and Sanction (S) on the Emergence of Cooperation
7.6 Effects of Punishment (P) and Sanction (S) on the Salience
7.7 New Free Riders introduced at TS = 5000
8.1 EMIL-I-A Architectural Design
8.2 Social Network of the Web-Service Provisioning Scenario
8.3 The Interaction Protocol Between Agents
8.4 Different Combinations of Resources Capacities Distribution and Expected Quality Distributions
8.5 Percentage of Unsuccessful Transactions with Different Proportions of Strategic Agents and Internalizers
8.6 Complex Loop Experiment Average Salience
8.7 Different norms coexisting in the same social environment
8.8 Effect of the topological positioning of INHs


Chapter 1

Introduction

1.1 Motivation

Social norms have been identified as one of the research topics in multi-agent systems that deserve investigation [Luck et al., 2005]. Social norms are part of our everyday life, and they have been of interest in several areas of research [Elster, 1989]. Social norms help people self-organize in many situations, especially in open environments where having an authority representative is not feasible. In contrast to institutional rules, the responsibility to enforce social norms is not the task of a central authority but a task of each member of the society. From the book of Bicchieri [Bicchieri, 2006], the following definition of social norms is extracted:

“The social norms I am talking about are not the formal, prescriptive or proscriptive rules designed, imposed, and enforced by an exogenous authority through the administration of selective incentives. I rather discuss informal norms that emerge through the decentralized interaction of agents within a collective and are not imposed or designed by an authority”.

Although this is not intended to be a formal definition, it conveys the notion that social norms are used in human societies as a mechanism to improve the behavior of the individuals in those societies without relying on a centralized and omnipresent authority. In recent years, the use of these kinds of norms has also been considered as a mechanism to regulate virtual societies, specifically societies formed by artificial agents ([Saam and Harrer, 1999, Shoham and Tennenholtz, 1992, Grizard et al., 2006]), and in some cases, mixed populations (agents and humans).

The study of social norms within multi-agent systems has been framed inside the Computational Social Science community, which is “in charge of collecting and analyzing data at scale that may reveal patterns of individual and group behaviours” [Lazer et al., 2009]. The combination of the two disciplines (computer science and social science) makes special sense when we deal with multi-agent systems, as agents are social entities. The exploitation of the knowledge provided by the social sciences is becoming more useful with the integration of humans and agents (e.g., [Brito et al., 2009]).


Recently, owing to the interest of the multi-agent community in social norms, the NorMAS movement was founded [Nor, 2008]. This movement gathers an interdisciplinary community (mainly formed by computer scientists, social scientists, psychologists, and economists) around the topic of norms in multi-agent systems. At one of the workshops organized by this community, the following definition of normative multi-agent system was agreed by consensus:

“A normative multi-agent system is a multi-agent system organized by means of mechanisms to represent, communicate, distribute, detect, create, modify, and enforce norms, and mechanisms to deliberate about norms and detect norm violation and fulfillment” [Boella et al., 2008].

The online emergence of social norms (how they are created in the first place) in a decentralized way is one of the open problems in the community, and several authors are trying to approach it. As we focus on decentralized open virtual societies, different norms might emerge in different areas of the social topology. However, social norms are by definition ([Coleman, 1998]) more socially efficient when the whole population abides by them. We are interested in studying the spreading and acceptance of social norms, what Axelrod [Axelrod, 1986] calls norm support. Our understanding of norm support deals with the problem of which norm is established as the dominant one when more than one norm exists for the same situation, or in those situations where the agents' self-interest runs against the interest of the society, and therefore violating the norms is the strategy that provides the highest payoff.

There are basically two well-known approaches to studying this emergence: the game-theoretical and the cognitive. The former studies the process by which agents calculate the cost-benefit ratio of the possible actions taking into consideration the existence of a norm [Villatoro et al., 2010] and the strategic behavior of the other agents involved in the interactions. The latter studies the emergence from the point of view of the beliefs, desires and intentions of the agents and their relation to normative beliefs [Andrighetto et al., 2010b].
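The game-theoretical reading can be made concrete with a small sketch. The following Python fragment is purely illustrative (the function and parameter names are ours, not taken from the cited works): an agent abides by a norm only when the expected enforcement cost outweighs the gain from violating it.

    def complies(gain_from_violation: float,
                 punishment_size: float,
                 detection_probability: float) -> bool:
        # Illustrative cost-benefit rule for norm compliance: abide whenever
        # the expected cost of being punished outweighs the immediate gain.
        expected_punishment = detection_probability * punishment_size
        return expected_punishment >= gain_from_violation

    # Example: a modest fine that is rarely enforced fails to deter.
    print(complies(gain_from_violation=2.0,
                   punishment_size=3.0,
                   detection_probability=0.5))  # False: 1.5 < 2.0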

After reviewing the literature on both research approaches, we agree with Young [Young, 2008] that there are three mechanisms by which norms emerge, which generalize all the factors previously treated in the literature:

• Pure Coordination: These are “social” phenomena, because they are held in place by shared expectations about the appropriate solution to a given coordination problem, but there is no need for social enforcement.

• Threat of social disapproval or punishment for norm violations.

• Internalization of norms of proper conduct.

The first mechanism has been widely studied under the topic of convention emergence [Shoham and Tennenholtz, 1997a, Kittock, 1993, Delgado et al., 2003, Mukherjee et al., 2007, Sen and Airiau, 2007, Walker and Wooldridge, 1995]. The typical example is the choice of which side of the road cars should drive on: both agents benefit from following the coordination norm; otherwise, their utility is drastically reduced.


However, we have observed a general practice in that area of research which consists in declaring the system converged when 90% of the population shares the same convention. Given the definition of a convention (a norm that establishes a focal point amongst the possible ones to promote coordination when shared by all participants), we cannot consider 90% an acceptable convergence rate, and we therefore perform an exhaustive study of that area. We focus on aspects like the strategy update rule used by agents or the topology that constrains their interactions in order to study full convergence (100%) in the system. We will observe how certain strategy update rules and topologies promote the existence of subconventions, which delay the emergence of a global convention. We propose socially inspired methods to dissolve subconventions and allow the society to reach the desired full convergence.
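To make the convergence criterion precise, here is a hedged sketch (a helper of our own, not code from the thesis) that computes the fraction of the population sharing the most common convention; the practice criticized above declares convergence at 0.9, whereas we require 1.0:

    from collections import Counter

    def convergence_rate(actions: list) -> float:
        # Fraction of agents currently using the most common convention.
        counts = Counter(actions)
        return max(counts.values()) / len(actions)

    population = ["A"] * 93 + ["B"] * 7   # a surviving subconvention of B-agents
    rate = convergence_rate(population)
    print(rate >= 0.9)   # True: "converged" under the usual 90% criterion
    print(rate == 1.0)   # False: the subconvention persists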

The second and third points, on the other hand, are tightly related to each other. Sanctions serving as a mechanism for norm emergence have been widely studied by social scientists [Coleman, 1998], psychologists [Bandura, 1976] and economists [Fehr and Gachter, 2002]. In this thesis we will study how different punishment technologies can reinforce the process of norm emergence, allowing the system to self-police. We initially envisioned endowing agents with the necessary mechanisms to achieve Multilateral Costly Punishment [Posner and Rasmusen, 1999]. In order to be realistic with the parameter setting for Multilateral Costly Punishment, we performed experimental economics experiments that revealed to us the existence of a stronger force behind punishment: the normative message affecting a normative decision making. In light of those experiments, we decided to develop a cognitive agent architecture that not only performs a benefit-cost calculation with respect to the existence of norms, but also treats the very existence of the norm as a determinant in the decision making. In this part we will also demonstrate the decisive difference between punishment and sanction, which is an important element of our developed architecture, and a state-of-the-art discovery in the behavioural economics literature. As we said previously, the third point affirmed by Young for norms to emerge (Internalization) is directly related to our research on punishment technologies. In our work [Andrighetto et al., 2007, Conte, 2009, Andrighetto et al., 2010b], we understand Internalization as the process by which an agent becomes compliant with norms not because of the fear of potential punishment (and utility detriment) but merely because of the existence of the norms and the willingness to abide by them. We will observe how this process yields a society more robust against norm violations, without sacrificing adaptation skills.

1.2 Contribution

This thesis contributes to the field of self-organized and normative multi-agent systems along three lines:

First - Exhaustive Exploration of the Dimensions in the Convention Emergence Problem

We present an experimental framework to analyze the dynamics of the process of convention emergence, observing the effects of the different parameters configuring the model. In particular, we focus on the problem of the emergence and dissolution of subconventions. We have detected that subconventions emerge mainly for two reasons: strategy update rules promoting concordance with previous history (and culture maintenance), and topological conditions promoting endogamy. The conducted research led us to discover the existence of Self-Reinforcing Structures, mainly in Scale-Free Networks, which cause subconventions to remain metastable. We propose socially inspired mechanisms that dissolve these Self-Reinforcing Structures, allowing the society to achieve full convergence.

Second - Experimental Platform for Regulated Hybrid Experiments

In order to obtain experimental results about certain punishment technologies, we needed a platform for testing the behavior of humans under certain conditions (achieved through the partners' interactions). Another requirement for the platform was a restricted behavioural set of actions available to agents and a pre-established interaction protocol. For these reasons, the paradigm of Electronic Institutions seemed the most adequate.

We have developed a user-friendly web interface for human subjects to interact with other subjects inside an Electronic Institution. This platform allows humans to interact from remote locations through a simple web interface. At the same time, the Electronic Institution paradigm allows us, as experiment designers, to implement agents that behave in a specific way, placing human subjects in specific experimental conditions of interest.

This platform has been essential for performing behavioral economics experiments, introducing agents into them and allowing subjects to participate remotely, advancing beyond state-of-the-art platforms (like Z-tree [Fischbacher, 2006]).

Third - EMIL-I-A Architecture for Self-Regulating Societies

We have developed a BDI agent architecture which distinguishes, at a cognitive level, between punishment and sanction. Experimental results with human subjects confirmed the difference between punishment (utility detriment) and sanction (utility detriment plus norm elicitation) in the decision making of human subjects. This difference is incorporated precisely into our agent architecture (EMIL-I-A), producing effects on the decision making of the agents at several levels. Moreover, we test the performance of our EMIL-I-As under the same experimental conditions as with humans. These experiments show that EMIL-I-As produce the same dynamics as those shown by humans, yielding a state-of-the-art agent architecture with punishment capabilities, able to interact (accordingly) with humans.
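As a rough illustration of the distinction the architecture relies on (a sketch under our own simplifying assumptions, not the actual EMIL-I-A code), punishment only lowers the recipient's utility, whereas a sanction additionally conveys a normative message that raises the norm's salience for the recipient:

    class NormView:
        # Minimal stand-in for an agent's view of one norm (illustrative only).

        def __init__(self, salience: float = 0.1):
            self.salience = salience   # degree of activation of the norm
            self.utility = 0.0

        def receive_punishment(self, cost: float):
            # Pure punishment: a utility detriment with no normative content.
            self.utility -= cost

        def receive_sanction(self, cost: float, elicitation: float = 0.2):
            # Sanction: the same utility detriment plus a normative message
            # that makes the norm more salient to the recipient.
            self.receive_punishment(cost)
            self.salience = min(1.0, self.salience + elicitation)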

Finally, we implement an Internalization mechanism that allows norm compliance to happen without the external enforcement of punishment, demonstrating the adaptive capabilities of EMIL-I-A while saving on cognitive load.


1.3 Overview and Structure of the Thesis

This thesis is structured in nine chapters:

Chapter 2: We present a game-theoretical distinction between conventions and essential norms. This distinction is basic to the structure of the rest of the thesis. We then analyze the state of the art in the MAS literature in both fields. With respect to conventions, we review the different works on the emergence of conventions and how they treat the different aspects of that process. We then review the works and mechanisms analyzed for the emergence of cooperation, focusing on those applicable to self-organizing societies and paying special attention to MAS with punishment technologies incorporated.

Chapter 3: We present the general framework for the convention emergence problem. This framework allows us to study the different dimensions affecting convention emergence, focusing on the achievement of full convergence. A detailed analysis of the parameter search space allows us to discover that full convergence was not previously achieved because certain strategy update rules and certain topological configurations promote subconventions.

Chapter 4: In order to dissolve the identified subconventions, we propose socially inspired techniques that also accelerate the process of convention emergence. Using these instruments we discover the existence of Self-Reinforcing Structures, and we propose a combined social instrument for dissolving the subconventions created in those areas of the topology.

Chapter 5: We propose distributed punishment as a punishment technology that can produce norm emergence in situations represented by common-good games. Experimental results with human subjects give us the intuition that other types of punishment may exist and may affect the decision making of the agents differently.

Chapter 6: We present the EMIL-I-A architecture, conceived to interpret the difference between punishment and sanction at a cognitive level. This architecture is constructed assuming the existence of normative beliefs and goals, which are orchestrated through the norm salience, representing the degree of activation of the different recognized norms in the social environment.

Chapter 7: We prove the correctness of the EMIL-I-A architecture by simulating the same experiments in which human subjects participated, obtaining similar results. Moreover, we exploit our agent architecture under different experimental conditions, studying the dynamics of our agents when interacting in different environmental situations. Finally, we introduce a Dynamic Adaptation Heuristic for the cost of punishment in order to achieve a more efficient sanction, profiting from its cognitive load while maintaining high cooperation rates.

Chapter 8: We discuss a cognitive mechanism that allows agents to internalize norms and therefore comply with them without external enforcement. By means of simulation in a P2P-inspired scenario, we observe how internalization allows agents to respond efficiently to their peers' needs, adapting to environmental conditions and to dynamic changes of the normative scheme.

Chapter 9: We conclude our research on normative self-organized systems and sketch future lines of research to exploit the work presented in this thesis.


1.4 Related Publications

The following publications are a direct consequence of the development of this thesis.

• G. Andrighetto, D. Villatoro, F. Cecconi, R. Conte. Simulazione ad Agenti e Teoria della Cooperazione. Il Ruolo della Sanzione. Sistemi Intelligenti (forthcoming).

• G. Andrighetto, D. Villatoro, R. Conte. Norm Dynamics within Agents. In B. Edmonds, Dynamic View of Norms. Cambridge University Press (forthcoming).

• D. Villatoro, J. Sabater-Mir and S. Sen. Social Instruments for Robust Convention Emergence. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011). (In press).

• D. Villatoro, G. Andrighetto, R. Conte and J. Sabater-Mir. Dynamic Sanctioning for Robust and Cost-Efficient Norm Compliance. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011). (In press).

• G. Andrighetto, D. Villatoro. Beyond the Carrot and Stick Approach to Enforcement: An Agent-Based Model. Proceedings of the European Conference on Cognitive Science, New Bulgarian University, Sofia, 21-24 May 2011. Cognitive Science Society.

• T. Balke and D. Villatoro. Operationalization of the Sanctioning Process in Hedonic Artificial Societies. Proceedings of the AAMAS Workshop on Coordination, Organizations, Institutions and Norms (COIN @ AAMAS 2011).

• D. Villatoro, J. Sabater-Mir and S. Sen. Social Instrument for Convention Emergence. Proceedings of the 10th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2011).

• D. Villatoro, S. Sen and J. Sabater-Mir. Exploring the Dimensions of Convention Emergence in Multi-agent Systems. Advances in Complex Systems (ACS), Vol. 14, No. 2, pp. 201-227 (2011).

• G. Andrighetto, D. Villatoro, R. Conte and J. Sabater-Mir. Simulating the Relative Effects of Punishment and Sanction in the Achievement of Cooperation. Proceedings of the Eighth European Workshop on Multi-Agent Systems (EUMAS10).

• R. Conte, G. Andrighetto, D. Villatoro. From Norm Adoption to Norm Internalization (DEON 2010).

• G. Andrighetto, D. Villatoro, R. Conte. Norm Internalization in Artificial Societies. AI Communications, Vol. 23, No. 4, pp. 325-339 (2010).

• D. Villatoro, S. Sen and J. Sabater-Mir. Of Social Norms and Sanctioning: A Game Theoretical Overview. International Journal of Agent Technologies and Systems, Vol. 2, No. 1, pp. 1-15 (2010).

• A. Vila-Mitja, J. Estevez, D. Villatoro and J. Sabater-Mir. Archaeological Materiality of Social Inequality Among Hunter-Gatherer Societies. Archaeological Invisibility and Forgotten Knowledge: Conference Proceedings, Lodz, Poland, 5th-7th September 2007. Archaeopress, 2010.

• D. Villatoro, S. Sen and J. Sabater-Mir. Topology and Memory Effect on Convention Emergence. Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2009).

• D. Villatoro, N. Malone and S. Sen. Effects of Interaction History and Network Topology on Rate of Convention Emergence. Proceedings of the 3rd International Workshop on Emergent Intelligence on Networked Agents (WEIN'09 @ AAMAS).

• D. Villatoro and J. Sabater-Mir. Dynamics in the Normative Group Recognition Process. Proceedings of the IEEE Congress on Evolutionary Computation (IEEE CEC 2009).

• I. Brito, I. Pinyol, D. Villatoro, J. Sabater-Mir. HIHEREI: Human Interaction within Hybrid Environments. Proceedings of the 8th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2009) (Best Student Demo Award).

• D. Villatoro and J. Sabater-Mir. Group Recognition through Social Norms. Proceedings of the 8th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2009).

• D. Villatoro, J. Sabater-Mir and S. Sen. Interaction, Observance or Both? Effects on Convention Emergence. Proceedings of the Dotzè Congrés Internacional de l'Associació Catalana d'Intel·ligència Artificial (CCIA 2009).

• D. Villatoro and J. Sabater-Mir. Towards the Group Formation through Social Norms. Proceedings of the Sixth European Workshop on Multi-Agent Systems (EUMAS08).

• D. Villatoro and J. Sabater-Mir. Towards Social Norm. Proceedings of the Eleventh International Congress of the Catalan Artificial Intelligence Association (CCIA08).

• D. Villatoro and J. Sabater-Mir. Mechanisms for Social Norms Support in Virtual Societies. Proceedings of the Fifth Conference of the European Social Simulation Association (ESSA08).

• D. Villatoro and J. Sabater-Mir. Categorizing Social Norms in a Simulated Resource Gathering Society. Proceedings of the AAAI Workshop on Coordination, Organizations, Institutions and Norms (COIN @ AAAI08).

• D. Villatoro and J. Sabater-Mir. Norm Selection Through Simulation in a Resource-Gathering Society. Proceedings of the 21st European Simulation and Modelling Conference (ESM07).

• J. Sabater-Mir, I. Pinyol, D. Villatoro and G. Cuni. Towards Hybrid Experiments on Reputation Mechanisms: BDI Agents and Humans in Electronic Institutions. Proceedings of the 12th Conference of the Spanish Association for Artificial Intelligence (CAEPIA07).


Chapter 2

State of the Art

Descriptions of tasks like greeting another person, dressing, driving, etc. are often accompanied by the phrase “in a proper way”. In human societies, the “proper way” to fulfill these interaction protocols is specified by social norms. A number of tasks that require some kind of interaction with other agents might require agents to follow specified guidelines to successfully complete these tasks. Social norms can facilitate such agent interactions and enable agents to complete these tasks efficiently. Such social norms can emerge and spread among the society until they are widely accepted and adopted. Therefore, we can view social norms as key elements that enable coordination and self-organization in our everyday life.

Not all social norms, however, deal with the same kind of interaction scenarios. We observe that social norms like greeting (shaking hands, kissing, leaning towards each other, or a simple “hi!”) pertain to different situations compared to, for example, the social norm of recycling. We also observe that social norms, though referring to the same concept, are defined using different terms in the literature, e.g. norms, social laws, conventions, social norms.

In addition to the different types of norms and the wide variety of terms used to name this social instrument, the study of social norms is made more challenging by the heterogeneous perspectives on the issue across diverse research areas such as economics, the social sciences and multi-agent systems. We believe that although these areas have interesting theories and practices to contribute to the social norms literature, and can benefit from prudent adaptations and applications of social norms, not enough attention and effort has been expended on this potentially effective social coordination mechanism by the corresponding research groups.

The primary goal of this chapter is to develop a characterization of social norms into two primary groups: coordination norms and essential norms. This division is performed from a game-theoretical point of view with the goal of understanding the process of norm emergence.

Before proceeding, we need to characterize the type of environment in which the contribution of this thesis is intended to be applied. Technically speaking, we assume the existence of a MAS environment –observable as an objective entity– with a fixed ontology and a dynamic state, regimented with constitutive conventions, and where the participation of capable and entitled agents is permitted.1 The decision-making processes of agents may not be under the control of the MAS. Agents may respond to their own motivations and be owned by different owners. Agents may enter or leave the MAS at will, as long as they are capable and entitled to be in the MAS.

The norms treated in this work are those created through the interaction of agents and not imposed by any central authority. These emerging norms possess interesting dynamics that must be studied in order to achieve truly self-regulated societies.

2.1 Categorizing Norms

In the multi-agent systems literature, several terms (convention, social norm, social law) have been used as synonyms for the same concept, without that concept being specifically described. Considering norms as regularities in the behavior of agents, Coleman [Coleman, 1998] defines two main types of norms: conventions and essential norms. In this section we build on a game-theoretical paradigm to clearly define conventions and essential norms as mechanisms that solve coordination problems, albeit in games with different dynamics.

2.1.1 Conventional Norms

Conventions (or conventional norms) are a type of norm that emerges naturally, with no threat of punishment. Conventional norms solve coordination problems (a representative payoff matrix of this type of problem can be seen in Figure 2.1), where there exists no conflict between the individual and the collective interests: what is desired is that everyone behaves in the same way, and it makes no major difference on which action the agents coordinate.

Following Coleman's theory, the selection of the focal action2 in such norms is arbitrary, although most of the time it is strongly biased by cultural patterns. One clear example of this kind of norm is the selection of which side of the road to drive on.

1That is: (i) There exists a world that may be populated by (ii) agents whose activity may involve other agents and possibly other entities –like speed limits, traffic lights, or fines. (iii) At any point in time, that world is in a state which is represented by a set of variables whose values may change due to the actions of agents or to their own dynamics. (iv) Out of the many conceivable actions that, in principle, agents may attempt, only the ones that are deemed admissible in that world may have any effect in the world. (v) All actions are subject to preconditions (on the state of the world) that make them feasible and post-conditions that determine their intended effect in the world.

2The term focal action is borrowed directly from game theory, where there exists the concept of a focal point. A focal point is defined as a solution that players will tend to use in the absence of communication, because it seems natural, special or relevant to them. For example, imagine that you and your partner are visiting Paris. It is the first time in that city for both of you. Unfortunately, you are not in the same hotel and you have no means to communicate with each other, although you know that you have to meet at a certain time in a public place. You can choose among all the public places in Paris. Using common-sense reasoning, you might choose to go to the Eiffel Tower, the Pyramid of the Louvre Museum, or the Arc de Triomphe. Those would be focal points in the decision game. Consequently, the focal action is the action taken by the agents in the absence of communication.


                         Player 2
    Player 1        Action A      Action B
    Action A        +1, +1        -1, -1
    Action B        -1, -1        +1, +1

Figure 2.1: Coordination Game (payoffs listed as Player 1, Player 2)

This definition is in concordance with that of Young [Young, 2007]: “A convention is a pattern of behavior that is customary, expected, and self-enforcing. Everyone conforms, everyone expects others to conform, and everyone wants to conform given that everyone else conforms.”

Before any agent has a preference for a particular convention, all the possible conventions are equally beneficial from a utilitarian point of view, and therefore the convention to use is initially chosen at random. Its spread through the society (and how it is affected by social learning) has been widely studied, as we will see in Section 2.2.

Convention emergence is a process that starts with a population of agents with no initial preferences for any of the existing conventions, who have to agree on a common one (because a convention, by definition, is only useful when all agents share it); once all agents have agreed on the same convention, the system has converged. Pertinent examples of conventions in Multi-agent Systems are communication protocols (the language an individual chooses for communication), behavioral patterns (coordination protocols like the side of the road on which to drive, or how to greet when introduced to other people), or the selection of the task to be executed in a multi-task scenario.

2.1.2 Essential Norms

Essential norms are norms that solve or ease collective action problems, where there is a conflict between the individual and the collective interests; this is the main difference with respect to conventions. Following the definition of focal actions, in this kind of norm the focal action is not chosen randomly, because the targets' interests lie in the direction of action opposing observance of the norm, while the beneficiaries' interests lie in the direction of action favoring observance of the norm. Essential norms help address situations where individuals are tempted not to contribute to the common good. These problems are commonly known in the literature as collective action problems. Heckathorn [Heckathorn, 1996] claims that any collective action system has two characteristics:

1. Non-excludable: Excluding anyone from consumption of the common good isimpractical. For example, scabs benefit from higher wages won through strikes.

2. Jointness of supply: the degree to which the good costs the same to produce regardless of the number of people who consume it. A radio broadcast has very high jointness of supply, because the costs of production have very little, if any, proportion to the number of people who consume the broadcast. Manufactured products, in contrast, typically have low jointness of supply, because the cost of production increases with the number of consumers, though economies of scale may make the increase in cost less than directly proportional to the increase in consumers [Barros, 2007].

                         Player 2
    Player 1        Cooperate     Defect
    Cooperate       3, 3          0, 5
    Defect          5, 0          1, 1

Figure 2.2: Prisoner's Dilemma (payoffs listed as Player 1, Player 2)

Referring to the first of these characteristics of collective action systems, conflicts appear when an individual is tempted not to contribute to the common pool, leading to an incomplete collective action. The fact that an individual's behavior affects the welfare of the group is enough for the group to acquire the right to control individual behavior. Social norms are applied to control individual behaviour in this kind of problem.

It has been proven [Hardin, 1971] that a Collective Action Game is equivalent to the Prisoner's Dilemma3. In a game that follows the structure of a Prisoner's Dilemma (like the one shown in Figure 2.2), the individually rational strategy is to Defect, no matter what the other player decides. The resulting equilibrium is suboptimal in the sense of Pareto, as there exists a different focal point (the mutually cooperative outcome) that both players prefer. A social norm prescribing cooperation would help agents obtain socially efficient behavior.
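The two claims above can be checked directly against the payoffs of Figure 2.2 (a throwaway snippet of ours):

    # My payoff given (my_action, other_action), from Figure 2.2.
    pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

    # Defect dominates: better against a cooperator and against a defector...
    assert pd[("D", "C")] > pd[("C", "C")] and pd[("D", "D")] > pd[("C", "D")]
    # ...yet mutual defection is Pareto-dominated by mutual cooperation.
    assert pd[("C", "C")] > pd[("D", "D")]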

However, conflict between the individual's interest and the collective's interest can occur in situations other than that captured by the Prisoner's Dilemma game. Such conflicts can also take the form of the Collective Action Game, as shown in Figure 2.3. Only the payoffs for the individual are shown, and they are calculated considering that V is the value of the collective good, c is the cost for the individual to contribute, and p is the proportion of the total value that an individual can produce by itself.

3The Prisoner's Dilemma states: two suspects are arrested by the police. The police have insufficient evidence for a conviction and, having separated the prisoners, visit each of them to offer the same deal. If one testifies (defects) for the prosecution against the other and the other remains silent (cooperates), the betrayer goes free and the silent accomplice receives the full ten-year sentence. If both remain silent, both prisoners are sentenced to only six months in jail on a minor charge. If each betrays the other, each receives a five-year sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other will not know about the betrayal before the end of the investigation. How should the prisoners act?


Individual \ Collective | Cooperate | Defect
Cooperate               | V - c     | pV - c
Defect                  | V(1 - p)  | 0

Figure 2.3: Collective Action Game (individual's payoffs)

In this thesis we will focus on the subgroup of essential norms that can be characterized by the Prisoner's Dilemma, and we refer the reader to [Villatoro et al., 2010] for a further classification of Collective Action Games.
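As an illustration, the following Python sketch reproduces the individual's payoffs of Figure 2.3; the numeric values of V, c and p are hypothetical, chosen so that c > pV, the condition under which defection strictly dominates:

```python
# Individual's payoffs in the Collective Action Game of Figure 2.3.
def individual_payoff(i_cooperates: bool, rest_cooperate: bool,
                      V: float, c: float, p: float) -> float:
    if i_cooperates:
        return (V - c) if rest_cooperate else (p * V - c)
    return V * (1 - p) if rest_cooperate else 0.0

# Defection dominates exactly when c > p * V: contributing costs more than
# the share of the good the individual can produce by itself.
V, c, p = 10.0, 3.0, 0.2  # illustrative values with c > p * V
for rest in (True, False):
    print(rest,
          individual_payoff(True, rest, V, c, p),   # cooperate
          individual_payoff(False, rest, V, c, p))  # defect
```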

2.2 Convention Emergence

In real self-organized systems (and in most of the work in the literature), convention emergence can be achieved through social learning. The social learning process is a complicated task: agents learn from the other agents with whom they interact, who are at the same time also learning from those interactions. Therefore, the structure of the social network that restricts the interactions between agents is essential for the emergence of conventions learnt through social learning.

This section will help us understand the roots of this area of research and how it has evolved over the years, leaving unexplored an important gap that we will cover in this thesis.

The concept of social learning was first used by [Shoham and Tennenholtz, 1994], although referring to it as co-learning4: "the process in which several agents simultaneously try to adapt to one another's behaviour so as to produce desirable global system properties". One of the main contributions of that work is the presentation of the convention emergence problem, characterized by the following dynamics: from a population of agents, n players are randomly selected, and these agents (without any possible means of communication amongst them) choose an action from the m possible ones following a specific strategy update rule; the system then returns a payoff (calculated depending on the game played, normally a coordination game with a payoff matrix similar to the one in Figure 2.1), with which agents update their strategy, and the interaction is finished. The system (ideally) continues this process until all agents agree on the same action, at which point the convention is considered to have emerged.

In the specific case of [Shoham and Tennenholtz, 1994], the authors focused on the question of how different update rules and other system characteristics affect the eventual emergence of desirable global system characteristics.

4Shoham and Tennenholtz [Shoham and Tennenholtz, 1994] were the first to use the term co-learning, also used by [Kittock, 1993], even though the dates might seem misleading: [Shoham and Tennenholtz, 1994] was submitted for publication in 1993, when Kittock was writing the other article [Kittock, 1993], which was published earlier.


The Highest Cumulative Reward (HCR) rule was the object of study in that paper, in a 2-player, 2-action scenario, analyzing the results when using Coordination Games and Prisoner's Dilemma games. The HCR rule tells agents which action to choose by observing which action has provided the maximum accumulated reward in the last interactions; this strategy update rule is therefore sensitive to the memory size of the agents. Their experimental results showed how the different parameters that configure HCR affect the overall system performance. At the experimentation phase, the authors took a decision that would mark the empirical practices in this area of research: they considered the system as converged when 85% of the agents performed the same action5. This decision was motivated by the need to save computational effort: it would seem intuitive that the system would not reverse its state given such a high convergence rate, making this a perfectly practical reason to fix convergence at that level. However, it still goes against the strict definition of conventions, which specifies that a convention has to be shared by the whole population in order to obtain the maximum social benefit; otherwise, a subgroup of agents will not benefit from it.
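For concreteness, one possible reading of an HCR-style rule is sketched below in Python; the class and method names are ours, and only the finite memory window and the cumulative-reward maximization are taken from the description above:

```python
import random
from collections import deque

# Sketch of an HCR-style update rule: the agent keeps its last
# `memory_size` (action, reward) pairs and plays the action with the
# highest accumulated reward within that window.
class HCRAgent:
    def __init__(self, actions, memory_size):
        self.actions = list(actions)
        self.history = deque(maxlen=memory_size)  # finite memory window

    def choose(self):
        if not self.history:
            return random.choice(self.actions)  # no evidence yet
        totals = {a: 0.0 for a in self.actions}
        for action, reward in self.history:
            totals[action] += reward
        return max(self.actions, key=totals.get)

    def observe(self, action, reward):
        self.history.append((action, reward))
```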

Soon after Shoham and Tennenholtz presented their work, Kittock [Kittock, 1993] noticed an important problem in their model. As we said before, these conventions are achieved through social learning, the interactions amongst agents being a key element for the success of the system. Shoham and Tennenholtz did not explicitly consider this issue and used a random interaction approach. Kittock's contribution [Kittock, 1993] is the first attempt to observe the effect of a social network of interaction on the convention emergence problem. Even though the only topologies tested were regular lattices6 with different neighborhood sizes, these were enough for him to discover the relationship between the diameter of the network and the speed of convergence. However, following the tradition started by Shoham and Tennenholtz, the empirical convergence rate was fixed at 90%.

Having laid the basis for the convention emergence problem, different extensions have been developed over the years. Without paying attention to the topology of interaction, the seminal work of Peyton Young [Young, 1993] describes a new strategy update rule, External Majority, where agents update their strategy with a sample of the strategies played by other agents in previous games. In our view, such a strategy update rule is intrusive with respect to agents' privacy. Later, in [Walker A, 1995], this strategy update rule was refined to use only information observed by the updating agent.

On top of that refinement, Walker and Wooldridge introduced the convention emergence problem in a Hunter-Gatherer inspired society (previously presented in [Castelfranchi and Conte, 1995]), allowing agents to choose one rule from a set of four possible ones. Moreover, because of the scenario, agents were not located in a fixed network of interactions; instead they were mobile and located in a grid. Thanks to the combination of agent mobility and the strategy update rule, the system reaches a convergence of 100%, making them the first authors to obtain such an achievement.


5An extension of this work is presented in [Shoham and Tennenholtz, 1997b], where the convergence rate is extended to 95% in a subset of the experiments (2 of them), the rest (10) remaining at 85%.

6A lattice is a regular network in which all nodes are sequentially connected to a constant number of nodes. The ring network and the fully connected network are extreme cases of lattices.


Work                             | Game    | Strategy Update                               | Instruments                 | Topology             | Convergence
[Shoham and Tennenholtz, 1994]   | CG & PD | HCR                                           | Memory Restarts             | NT                   | 85%
[Kittock, 1993]                  | CG & PD | HCR                                           | None                        | RN                   | 90%
[Kittock, 1994]                  | CG      | HCR                                           | Authority                   | G & T                | 100%
[Shoham and Tennenholtz, 1997b]  | CG & PD | HCR                                           | Memory Restarts             | NT                   | 95%
[Young, 1993]                    | SD      | Selective Majority                            | None                        | NT                   | Theoretical
[Sen and Airiau, 2007]           | CG & SD | Q-Learning, FP & WoLF-PHC                     | None                        | NT                   | 90%
[Mukherjee et al., 2008]         | CG      | Q-Learning, FP, WoLF-PHC & HCR                | None                        | G                    | Avg Payoff 3.5
[Delgado, 2002]                  | CG      | Generalized Simple Majority & HCR             | None                        | SF-AB                | 90%
[Delgado et al., 2003]           | CG      | Generalized Simple Majority & HCR             | None                        | SF-AB, SF-W & SF-F   | 90%
[Pujol et al., 2005]             | CG      | HCR                                           | Imitation                   | Rand, RN, SW & SF-AB | 95%
[Urbano et al., 2009a]           | CG      | Recruitment based on Force with Reinforcement | Force                       | Rand, RN, SW & SF-AB | 90%
[Urbano et al., 2009b]           | CG      | External Majority                             | Memory                      | Rand, RN, SW & SF-AB | 90%
[Walker A, 1995]                 | CG      | 7 implementations of Majority Rules           | None                        | G                    | 100%
[Savarimuthu et al., 2007b]      | UG      | Unspecified                                   | None                        | DN                   | 100%
[Mungovan et al., 2011]          | CG      | Weighted Replicators                          | Dynamic Random Interactions | SW + Rand            | 100%

Table 2.1: Comparative table of Convention Emergence works. CG = Coordination Game; PD = Prisoner's Dilemma; SD = Social Dilemma; UG = Ultimatum Game. NT = No Topology; G = Grid; T = Trees; Rand = Random; RN = Regular Network; SW = Small-World; SF-AB = Scale-Free with Albert-Barabasi; SF-W = Scale-Free with Walsh; SF-F = Scale-Free Fitness; DN = Dynamic Networks.


With the growth of the Internet, networks with characteristics similar to the Internet's captured the attention of scientists, mainly small-world and scale-free networks. Theoretical scale-free networks have been shown to reproduce the characteristics of real social networks [Newman, 2003, Barabasi and Bonabeau, 2003]. The work of Delgado [Delgado, 2002] explicitly analyzes the relationship between the emergence of social conventions and the effect of topologies. In their empirical evaluation they perform experiments with two different strategy update rules: the generalized simple majority rule and the highest cumulative reward rule. In their scenario, pairs of agents play a pure coordination game with 2 actions. As the main novelty of the work, agents are located in theoretical social networks, namely Scale-Free networks constructed with the Albert-Barabasi algorithm7. However, like Kittock (and Shoham and Tennenholtz), Delgado fixes the convergence rate at 90%. They observe that the process of convention emergence is accelerated by the usage of Scale-Free networks, because of the cascading effect produced by the inherent structure of the network. We will see later how this acceleration holds only in an initial phase, as certain topological subconventions may emerge, specially in scale-free networks, although these cannot be observed when fixing convergence at 90%. In later work [Delgado et al., 2003], Delgado's model is extended with different algorithms for the construction of the Scale-Free networks (the Albert-Barabasi algorithm already studied in [Delgado, 2002], the Walsh model and the Fitness model) to be used in the convention emergence problem. Despite the extended variety of algorithms to build Scale-Free networks, convergence dynamics are similar and proportional to the density of the network. In a different work [Pujol et al., 2005], with the aim of accelerating the process of convention emergence, agents are endowed with socially inspired mechanisms. The authors proved empirically that by providing agents with a small probability of Imitation (copying the action performed by another agent independently of what the strategy update rule dictates), faster convergence is obtained in scale-free networks. However, the convergence rate was still fixed, in this case at 95%.

Ten years after the first strategy update rule was presented, Sen and Airiau [Sen and Airiau, 2007] decided to incorporate different machine learning algorithms into the convention emergence problem, avoiding in that way the problems inherent to the selection of an appropriate strategy update rule. In that work they did not incorporate any interaction topology, restricting their empirical results to the efficiency of different Multi-agent Learning algorithms such as Q-Learning [Watkins and Dayan, 1992], WoLF-PHC [Bowling and Veloso, 2002], and Fictitious Play [Fudenberg and Levine, 1998]. The contribution of Sen and Airiau is important as it combines two interesting lines of research in Multi-agent Systems: Convention Emergence and Social Norms on the one hand, and Multi-agent Learning on the other. As an extension of the previous work, in [Mukherjee et al., 2008] agents are located in a grid environment in order to study the topological effects of the social network of interactions, as others ([Walker A, 1995]) did before. In both works, they consider a convention to have emerged when the average payoff of the population reaches 3.5 (out of a maximum of 4 and a minimum of -1).

In a renewal of interest in the Convention Emergence problem, in recent years the MAS community has focused on discovering different mechanisms that could accelerate the emergence, specially in complex networks.

7Albert-Barabasi algorithm tuned with the following parameters: m0 = 4, m = 2, p = q = 0.4.


In the work of Savarimuthu et al. [Savarimuthu et al., 2007b], the authors describe the emergence of norms in a scenario where the network of interactions changes dynamically as the agents move in a grid world, their neighbours in the network being those agents geographically closest to them in the grid at a specific moment in time. Even though this work might remind us of others [Walker A, 1995], the authors here focus on the dynamic aspect of the network of interactions and how it affects the process of emergence, which in this case is fixed at 100% convergence.

Another recent work [Mungovan et al., 2011] proposes an interesting scenario where agents have a fixed interaction network (small-world) and are additionally given the possibility of random interactions. Their treatment of the random interactions is very elegant, as these are not chosen uniformly from the population but with a weighted measure considering distance in the network (i.e., agents that are closer in the network are more likely to interact). Their results, together with those presented in [Walker A, 1995, Pujol et al., 2005, Savarimuthu et al., 2007b], suggest that random interactions produce faster convergence than a fixed interaction topology. However, the type of scenario we are interested in (virtual societies constructed on top of a real social network) cannot assume the possibility of intense dynamic changes in the underlying network.

Trying to provide agents with other mechanisms, Urbano et al. [Urbano et al., 2009a] investigated how a new strategy update rule, Recruitment based on Force with Reinforcement (RFR), which promotes the dynamic creation of a hierarchy, speeds up convention emergence. Their proposed strategy update rule works by providing agents with a measure of force that increases with every successful interaction; in case of an unsuccessful interaction, the agent with the lower force copies the strategy and force of the winner. We find this mechanism similar to the SLACER algorithm [Hales and Arteconi, 2006]. The authors prove the efficiency of their new strategy update rule on different topologies: regular graphs, an unconventional implementation of a random graph with uniform degree distribution, scale-free networks (developed following the Albert-Barabasi model), and Small-World networks (constructed using the Watts-Strogatz model). However, the convergence rate is still fixed at 90%. Their results are in concordance with those obtained by Kittock in [Kittock, 1994], where he proved the efficiency of tree structures, which are precisely what the concept of force dynamically creates.

Providing a slightly different framework from the classical convention emergence one, Salazar et al. [Salazar et al., 2010] propose a variation of the coordination game, implementing a Language Coordination Game similar to [Lakkaraju and Gasser, 2008]. In their game, agents have to match words with concepts, creating a huge convention space. Their solution is founded on a spreading protocol where agents can communicate part of a local solution together with a truthful valuation of that solution. Agents then aggregate into their own solutions the best pieces of the solutions received and communicate them again with a truthful valuation. Because of this reliance on communication, their framework is fragile against the presence of untruthful agents whose preferences differ from those of the collaborative agents. The authors demonstrate the robustness of their instrument on different topologies (scale-free and small-world), reaching convergence rates of 100% in most cases.


Even though their approach is successful, it assumes the existence of a fully cooperative community, which does not have to be the case in our setting. Conventions have to be reached by the agents themselves, as reaching them is in the common interest.

The general conclusion we can extract from the reviewed literature is that only those authors who provide agents with instruments (generally instruments that access agents' private information) achieve complete convergence in complex networks. The remaining works, which do not incorporate any extra mechanism, state their convergence results at a rate of 90%, which, as we have argued previously, cannot be considered an emerged convention. A summary of the reviewed works, together with some of the characteristics analyzed, can be seen in Table 2.1.

In the following chapters we will overcome this preset limit, observing the dynamics of the system beyond that point.

2.3 Essential Norms and the Emergence of Cooperation in Artificial Social Systems

As we mentioned previously in this chapter, there are norms that solve problems different from those solved by conventions, represented by different games: the Prisoner's Dilemma or the Public Goods Game (depending on the number of players). In general, in this type of games the individual's self-interest is in conflict with the social interest, and norms establish a balance, motivating agents to adopt the prosocial behaviour. These games generally represent what is known in the literature as the Emergence of Cooperation [Axelrod, 1981]. This problem has been treated in several areas of research such as philosophy [Bicchieri, 2006], economics [Fehr and Gachter, 2002, Dreber et al., 2008], politics [Axelrod, 1986], and computer science [Kollock and Smith, 1996, Shoham and Tennenholtz, 1997a].

"The Tragedy of the Commons", first described in [Hardin, 1968], is the situation where potentially everyone can benefit greatly from a common resource; the temptation, however, is to selfishly enjoy the common resource without contributing to it.

In order to achieve and maintain cooperation, researchers have looked for socially inspired techniques that can be implemented in agent architectures to obtain a macroscopic result. Throughout this section we review the most influential techniques suitable for decentralized virtual societies, focusing on the achievement of cooperation.

2.3.1 Socially Inspired Mechanisms for Cooperation Emergence

Tags

One of the most studied mechanisms is Tags. The term was introduced by Holland in [Holland, 1993], and it is conceptualized to allow agents to carry a certain externally visible marking, recognizable by the rest of the population, relating the tag to a specific behaviour.


A good example of a Tag mechanism is found in military stripes: any member of the military can recognize the rank of another (and associate it with certain protocols of interaction) by observing their stripes.

By carrying tags, agents can recognize features that they associate with others' cooperative or defective strategies. It has been shown that incorporating a tag-based donation system (donating to those agents with similar tags) can sustain the emergence of cooperative behavior [Riolo et al., 2001]. Nevertheless, the model is based on a strong assumption: agents change their tags by accessing (and copying) the tags of more successful agents. We consider such information (others' strategies and their success) to be private and not accessible to other agents, at least in the type of scenarios we envision, where each agent might belong to a different owner; this type of information access is therefore unrealistic.

In addition, in these models tags evolve by allowing agents to substitute their neighborhood with the neighborhood of the most successful agents, producing a dynamic variation of the network as well as unrealistic access to agents' private information (such as their network of interactions).

Other models [Hales and Edmonds, 2005, Hales and Arteconi, 2006] have applied tag-based mechanisms in P2P networks, obtaining higher cooperation rates than when tags were not used. Further models have been inspired by Hales' SLAC algorithm (copying the neighbours and strategy of the most successful known agent), like Araujo's Memetic Networks [Araujo and Lamb, 2008] (inspired by the concept of memes: information that is propagated through copies in a cultural setting), or the work of Griffiths and Luck [Griffiths and Luck, 2010] integrating Context Awareness.

Even though some variants of the tag mechanism have been analyzed [Matlock and Sen, 2009], we find it unrealistic to allow agents to access others' private information (the success of the opponent agent and its strategy of interaction), which is necessary for tags to function, and to copy it for their own success. Moreover, in tag-based mechanisms agents reproduce and die as genes do in a genetic algorithm. We are more interested in finding online mechanisms that rely on the present and public actions of agents, without agents having to be sacrificed or reproduced.

Partner Selection

The private-information access problem that concerns us most with respect to Tag mechanisms is avoided by another studied mechanism: Partner Selection. The main difference with respect to tags is that this mechanism allows agents to choose whom they want to interact with, thus reducing the risks associated with cooperation. Tags allow agents to identify other agents as similar (by comparing their tags) and then interact accordingly; with partner selection, in contrast, agents need to explore first and then start exploiting by interacting repeatedly with those agents that have provided the most satisfactory results. This type of mechanism generates the evolution of a social network, where an agent interacts more often with those it considers more suitable. Depending on the agents' tolerance towards defection and the costs of refusal and social isolation, the resulting social network of interaction might be very detached, resulting in higher rates of endogamy [Ashlock et al., 1996].

From the point of view of defectors, one of the problems is that in large populations they might avoid punishment by roving from group to group, whereas in smaller populations they might end up ostracized.


Some simulation models of real phenomena have shown [de Vos et al., 2001, Tosto et al., 2006, Nosratinia et al., 2007] that by allowing partner selection, higher group payoffs are obtained, increasing the rate of cooperation while avoiding free-riders.

However, the models implementing partner selection also implement the possibility of interaction rejection between agents, producing segregation in the population.

Reputation

One mechanism that has been thoroughly studied is Reputation. This mechanism emerges when agents can communicate amongst themselves and can express and reason about other agents' previous experiences. MAS researchers have produced a number of different models to reason about trust and reputation [Sabater and Sierra, 2005], in either a centralized or a decentralized way. The latter (decentralized) models are those we are most interested in. Reputation mechanisms [Ohtsuki and Iwasa, 2004, Nowak and Sigmund, 2005] help establish indirect reciprocity ("I help you if you help somebody else"). However, this type of system is fragile against the existence of unreliable agents.

Reputation can be used to promote partner selection (by spreading good evaluations about an agent, so that everyone will want to interact with it) or to punish (by spreading bad evaluations about an agent, producing ostracism).

Some MAS scientists have shown that by introducing reputation mechanisms in P2P-like scenarios, cooperation is sustained [Hales, 2002, Banerjee et al., 2005], ensuring a good performance of the system.

Authority

The existence of a centralized authority that regiments the behaviour of agents is an efficient mechanism to obtain norm compliance [Cardoso and Oliveira, 2008, Cardoso and Oliveira, 2009, Garcia-Camino A, 2006, Grossi et al., 2007]. The success of the authority rests on the agents' recognition of an entitled figure that objectively distinguishes right from wrong and (ideally) punishes what is not considered pro-social behaviour.

Our main concern with Authority is that it requires role specialization, where some agents have special abilities over the others. The type of scenario we deal with is self-regulated, which assumes that all agents have the same potential to perform actions over other agents. The MAS scenarios where authority has been proven to work are small and controllable environments; we can, however, think of other types of scenarios (like P2P networks) where controlling all interactions amongst agents is totally unfeasible.

Authority has also been implemented [Aldewereld et al., 2005] inside the agents' minds: instead of having an entitled agent controlling the correctness of interactions, agents are initialized with normative protocols of interaction, called landmarks, which restrict the agents' freedom, establishing the actions to be taken in order to achieve certain tasks in a normative manner.


The main drawback associated with landmarks is the difficulty for agents to adapt to environmental changes, since adaptation requires the agents' internal landmarks to be changed by their programmers.

In general, we can observe that the success of most of the previously explained mechanisms is based on their potential repercussions on agents' payoffs, and therefore on the associated punishment. When using tags or partner selection, agents fear potential ostracism [de Pinninck et al., 2007a]. Reputation indirectly affects the utility of an agent by restricting the number, or decreasing the quality, of its interactions (due to bad reputation). And when authority is implemented, agents fear the fines imposed by the authority when norm violations are observed. For all these reasons, we focus on punishment as one important mechanism that ensures cooperation.

2.3.2 Punishment

Allowing agents to react against the behaviour of wrongdoers is the scope of Punishment; conversely, an agent's utility can be affected by the punishments it receives from others. Nobel laureate Elinor Ostrom [Ostrom, 2000] gives an evolutionary perspective on the emergence and maintenance of the cooperation norm in collective action dilemmas, discussing the relationship between the maintenance of norms and the existence of monitoring and sanctioning mechanisms that reduce the probability of free-riding. Another Nobel laureate, Gary Becker, also related the existence of punishment mechanisms to the maintenance of cooperation [Becker, 1968]. Nonetheless, Becker's view describes a homo economicus type of agent, motivated only by utilitarian aspects. The pessimistic conclusion of many researchers [Hardin, 1968] is that coercion by a strong external authority is necessary to ensure cooperation.

However, Ostrom's theory suggests the existence of mechanisms that sustain cooperation other than the utilitarian damage that comes with punishment. In [Ostrom, 2000], she discusses empirical laboratory evidence that is hard (or impossible) to explain following Becker's homo economicus theory, or zero-contribution thesis.

Ostrom argues that for the evolution and maintenance of norms, types of agents other than the rational egoist have to be present in the society, and she defines two of them: "conditional cooperators" and "willing punishers". The former are willing to contribute to the common good (even though the rational strategy would move the agent to reduce its contribution to the minimum, even to zero). However, this type of agent is also affected by the behaviour of others, which might provoke a collapse of the cooperation strategy followed by the conditional cooperator. The willing punishers, on the other hand, are agents that are "willing, if given the opportunity, to punish presumed free riders through verbal rebukes or to use costly material payoffs when available". With the existence of these types of agents, a more robust, compliant and norm-defending society can be obtained.

Several works in Multi-agent Systems have considered the existence of punishment mechanisms to ensure norm compliance [de Pinninck et al., 2007b, Blanc et al., 2005, Grossi et al., 2007, Bazzan et al., 2008]. In [Pasquier et al., 2006], a classification of the possible types of sanctions that can be present in a MAS is presented, together with the styles and scope of sanctions.


By allowing peer punishment, in contrast to punishment administered through a centralized institution, agents retain the option to violate norms in those situations where they find it necessary. This intelligent norm violation allows society as a whole to achieve self-organization.

Normative BDI Architectures

Some authors [Castelfranchi et al., 2000] had already noted the necessity of intelligent norm violation. They describe a theoretical agent architecture with a number of characteristics that allow an agent:

• to know that a norm exists in the society and that it is not simply a diffuse habit, or a personal request, command or expectation of one or more agents;

• to adopt this norm, letting it impinge on its own decisions and behaviour, and then

• to deliberately follow that norm in the agent's behaviour, but also

• to deliberately violate a norm in case of conflicts with other norms or, for example, with more important personal goals; of course, such an agent can also accidentally violate a norm (either because it ignores or does not recognize it, or because its behaviour does not correspond to its intentions).

Despite the accurate theoretical policy described in [Castelfranchi et al., 2000], no agent implementation was produced. A more complete description of an agent architecture, based on a BDI architecture, is presented in [Dignum et al., 2000]; it integrates norms into the agents' decision making, especially in the process of generating (candidate) intentions. With such architectures, agents' behaviour differs from that of utility-maximizing agents, being norm-regulated.

[Boella and Lesmo, 2002] sketched a BDI agent architecture that copes with game-theoretical concepts of norms, mainly punishment. A key element for their theoretical model to function is the existence of a "normative agent", which imposes obligations on other agents, observes the fulfillment of norms, and applies sanctions to norm violators, similar to the "willing punishers" described in [Ostrom, 2000].

As we will explain throughout this thesis, there is a clear need for a modular BDI agent architecture that can deal with norm-related actions such as punishment and integrate them correctly into the agent's decision making. Moreover, this agent architecture has to be designed in such a way that agents can intelligently apply punishments to other, norm-violating agents. With a correct implementation of a normative BDI architecture that reasons about norm compliance but also about metanorm compliance (applying punishments), intelligent norm violation can be achieved.

Other researchers have investigated a negotiation approach to the emergence of norms [Boella et al., 2009]. In this work, agents negotiate about all the aspects related to the norms, such as the social goal to be achieved with the fulfilment of the norm, the obligations and the sanctions. Moreover, this model is theoretically designed for online game scenarios, like The Sims Online or Second Life. An alternative to the BDI models for achieving norm consensus is presented in [Joseph and Prakken, 2009].


Instead of centering on emergence, other proposals [Tinnemeier et al., 2010] have been envisioned to facilitate the runtime modification of norms. Such a framework assumes a fully cooperative population that will accept any change in the normative code produced by any other agent. Similarly to the work of Tinnemeier, in [Meneguzzi and Luck, 2009] the authors present an agent architecture able to modify plans at runtime in reaction to newly accepted norms. Their main contribution is the introduction of plan manipulation strategies to enable reasoning about norms and to ensure compliance with newly accepted norms. The authors focus on the intelligent compliance with norms rather than on their emergence. They present an elegant framework where norms are obligations and prohibitions over certain states of the world or certain actions, specified with a validity period (if the norm says to pay your bills, it is impossible to identify a norm violation if there is no deadline to abide by). Their framework can generate new plans to enable agents to comply with norms, and remove those plans when the norms are no longer relevant.

Another important contribution in the literature on normative agent architectures is EMIL-A [Andrighetto et al., 2010a]. EMIL-A is an agent architecture conceived to represent the process of norm innovation, how this process is affected by different societal levels, and the relations amongst those levels. In particular, the authors focus on the recognition of social norms in open societies where there is no central authority. The architecture is built using a BDI approach, where the micro-interactions amongst agents give rise to a macro-behaviour (norms). Their theoretical architecture is also validated with a simulation framework, where agents recognize the existence of norms and behave according to that recognition. However, agents were still not given the tools and mechanisms to reason about the dynamic state of norms, and about how norms might disappear (and lose scope) over time.

In this thesis we will present an agent architecture (EMIL-I-A), constructed on top of the BDI EMIL-A architecture, that is able to integrate normative cues (such as compliance, violations, sanctions and punishments) into the agent's decision making. Moreover, this architecture incorporates an adaptive vision of the agent, which can adapt to environmental and normative changes and achieve intelligent norm violation. Finally, the architecture is endowed with an extra mechanism, called Internalization, that allows agents to dissociate compliance from punishment. Our EMIL-I-A architecture fits the operational approach described in [Lotzmann and Mohring, 2009].


Chapter 3

Conventions: Trespassing the 90% Convergence Rate

Conventions are a special type of norm, used as a mechanism for sustaining social order, increasing the predictability of behavior in the society and specifying the details of unwritten laws. Following Coleman's theory [Coleman, 1998], conventions emerge to solve coordination problems, where there exists no conflict between the individual and the collective interests: what is desired is that everyone behaves in the same way, with no major difference as to which action agents coordinate on. Therefore, the selection of the focal action in such norms is arbitrary. One clear example of this kind of norm is the selection of which side of the road to drive on; both options are equally good as long as all drivers agree. Examples of conventions pertinent to MAS would be the selection of a coordination protocol, a communication language, or (in a multitask scenario) the selection of the problem to be solved.

Research from several communities (multi-agent systems, physics, or economics) has studied a restricted configuration of parameters, giving special importance to the underlying interaction topology [Kittock, 1993, Shoham and Tennenholtz, 1997a, Urbano et al., 2009a, Delgado, 2002, Delgado et al., 2003] and leaving all the other parameter configurations unexplored.

Moreover, in order to save computational effort, most of this research considered a convention as emerged when 90% of the population had converged to the same convention. However, 90% convergence cannot, by definition, be considered a convention, as a convention needs to be shared by the complete population. Therefore, we have decided to extend the 90% convergence rate to 100% in order to be consistent with the definition.

In this work we provide a multi-dimensional analysis of several factors that we believe determine the process of convention emergence, such as:

• the population size.

• the underlying interaction topology.

• the strategy selection rules.


• the learning algorithm.

• the agents' memory size.

• the number of players per interaction.

The contribution of this chapter centers on the achievement of full convergence and on a detailed analysis of the parameter search space.

3.1 Model

The social learning situation for norm emergence that we are interested in is that of learning to reach a social convention. We adopt the following definition of a social convention, derived from the definition of a social law [Shoham and Tennenholtz, 1997a]: a social law is a restriction on the set of actions available to agents; a social law that restricts agents' behavior to one particular action is called a social convention.

For the sake of generality, our framework is built on the most widely accepted convention emergence model (used by [Kittock, 1993, Walker and Wooldridge, 1995, Shoham and Tennenholtz, 1997a, Delgado et al., 2003, Mukherjee et al., 2007, Sen and Airiau, 2007, Villatoro et al., 2009]). We represent the interaction between agents as an n-person m-action game.

for a fixed number of epochs do
    A ← N    // initialize the available agents with the whole population
    repeat
        select randomly an agent from the available agents: i ∈ A
        select randomly a neighbor of i that is available: j ∈ A ∩ { j | (i, j) ∈ G }
        remove i and j from the set of available agents: A ← A \ {i, j}
        ask i to select an action r in A_r
        ask j to select an action c in A_c
        send the joint action (r, c) to both i and j for policy update
    until no pair of available agents remains: ∄ (i, j) ∈ A × A such that (i, j) ∈ G
end

Algorithm 1: Interaction protocol.
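As a concrete illustration, the following minimal Python sketch implements one epoch of this protocol. The names run_epoch, agents and graph are ours (hypothetical, not from the thesis): agents maps node identifiers to agent objects exposing choose() and update(), graph maps each node to its neighbours, and the payoff computation is omitted for brevity.

```python
import random

def run_epoch(agents, graph):
    """One epoch of Algorithm 1: pair available, connected agents at random."""
    available = set(agents)
    while True:
        pairable = [i for i in available
                    if any(j in available for j in graph[i])]
        if not pairable:
            break  # no available pair of connected agents remains
        i = random.choice(pairable)
        j = random.choice([j for j in graph[i] if j in available])
        available -= {i, j}
        r, c = agents[i].choose(), agents[j].choose()
        agents[i].update((r, c))  # joint action sent back for policy update
        agents[j].update((r, c))
```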

The model runs as specified in Alg. 1: at each time step, each agent is paired with another agent from the population and chooses one of several alternatives (the choice can either be that of an action to execute, which we use here, or a particular state to be in). In our case a social convention will have been reached when all the n agents are in the same state or choose the same action; i.e., the actual state or action chosen as the convention is not important.

Based on the underlying assumptions of the convention emergence problem, all actions (or states) are potentially of the same value, as long as all agents converge to the same action (i.e., given the set of actions = {A, B}, converging to action A produces the same utility for agents as converging to action B).



We model agent environments as networks, where each agent is represented by a node and the links in the network represent the possibility of interaction between nodes (or agents).

We consider the following three agent network topologies or environment types:

1. a one-dimensional lattice that provides a structure in which agents are connected with their n nearest neighbors. Different values of the neighborhood size (n) produce different network structures. For example, when n = 2 the network has a ring structure (as in Figure 3.1(b)) and agents are only connected with their immediate neighbors on either side. On the other hand, when n = PopulationSize, the connections result in a fully connected network (as in Figure 3.1(a)) where each agent is connected with all other agents. The motivation for using this type of network is to simulate the ordered structure in which many computational systems (memory clusters, sensor networks, etc.) are organized.

2. a scale-free network, whose node degree distribution asymptotically follows a power law, meaning that there are many vertices with small degrees and only a few vertices with large degrees. This makes the network diameter (the largest minimum distance between pairs of nodes) significantly smaller than that of one-dimensional lattices with small neighborhood sizes. Scale-free networks do represent the topology of social networks [Albert and Barabasi, 2002], and we find it valuable to have results at least on synthetic scale-free networks. An example of this topology can be seen in Fig. 3.1(c). The scale-free networks used in this work are generated with the BA algorithm [Barabasi and Albert, 1999], with m0 = 2, m = 1, p = 1 and q = 0, as in other works such as [Delgado et al., 2003, Salazar et al., 2010].

3. to capture some typical real-world scenarios, e.g., a community of closely knit researchers and their students, we use a rather novel network topology, namely the fully connected stars network: such a network has a relatively small number of hubs or core nodes which are fully connected, forming a clique, and each of these core nodes is also connected with a number of leaf nodes (an example can be seen in Figure 3.1(d)). This topology possesses some interesting characteristics for understanding the topological dynamics of convention emergence. A sketch of how these three topologies can be generated in code follows this list.
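As a rough sketch of how the three topologies could be instantiated, the following Python fragment uses the networkx library (our choice; the thesis does not name an implementation library). Note that networkx's barabasi_albert_graph implements the plain BA preferential-attachment model, not the extended variant with the p and q parameters quoted above, and the hub and leaf counts are illustrative.

```python
import networkx as nx

n = 100  # population size (illustrative)

# One-dimensional lattice: each node linked to its k nearest neighbours;
# k = 2 gives the ring of Figure 3.1(b), k close to n a fully connected net.
lattice = nx.watts_strogatz_graph(n, k=10, p=0)  # p=0: no rewiring, pure lattice

# Scale-free network (plain Barabasi-Albert model).
scale_free = nx.barabasi_albert_graph(n, m=1)

# Fully connected stars: a clique of hub nodes, each hub with its own leaves.
hubs, leaves_per_hub = 5, 19
stars = nx.complete_graph(hubs)
for h in range(hubs):
    for leaf in range(leaves_per_hub):
        stars.add_edge(h, hubs + h * leaves_per_hub + leaf)
```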

Agents cannot observe the other agent's current decision or immediate reward, and hence cannot calculate the payoff of any action before actually interacting with the opponent.

Agents use a learning algorithm to estimate the worth of each action and choose their action in each interaction in a semi-deterministic fashion: a certain percentage of decisions are made randomly, representing the exploration of the agent, and for the rest of the decisions the agents deterministically choose the action estimated to be of highest utility. In all the experiments presented in this work, the exploration rate has been fixed at 25%, i.e., one fourth of the actions are chosen randomly.


Figure 3.1: Underlying Topologies. (a) Fully Connected Network; (b) Ring Network or One-dimensional Lattice with Neighborhood Size 2; (c) Scale-Free Network; (d) Fully Connected Stars Network.


The learning algorithm used here is a simplified version of the Q-Learning algorithm [Watkins and Dayan, 1992]. The Q-update function for estimating the utility of an action is:

Q_t(a) ← (1 − α) × Q_{t−1}(a) + α × reward,    (3.1)

where reward is the payoff received from the current interaction and Q_t(a) is the utility estimate of action a after selecting it t times. When agents decide not to explore, they choose the action with the highest Q value. The reward used in the learning process is a proportional reward obtained from the system after each interaction.
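A minimal sketch of this learner in Python follows; the class name is ours, and the learning rate α = 0.3 is an illustrative assumption, while the 25% exploration rate matches the text above.

```python
import random

class QAgent:
    def __init__(self, actions, alpha=0.3, exploration=0.25):
        self.q = {a: 0.0 for a in actions}
        self.alpha = alpha              # learning rate (assumed value)
        self.exploration = exploration  # fraction of random decisions

    def choose(self):
        if random.random() < self.exploration:
            return random.choice(list(self.q))  # explore
        return max(self.q, key=self.q.get)      # exploit highest Q value

    def update(self, action, reward):
        # Q_t(a) <- (1 - alpha) * Q_{t-1}(a) + alpha * reward  (Formula 3.1)
        self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * reward
```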

We have used two different learning modalities: (a) in the Multi learning approach, both interacting agents use the payoff to update their memory and action estimate; (b) in the Mono learning approach, only the first agent selected, and not the second one, updates its memory and action estimate after an interaction. Each agent interacts exactly once per time step in mono-learning, whereas in the multi-learning mode different agents may interact a different number of times in the same time step because of random partner selection.

3.2 Overpassing the 90%

The first experiments performed have the aim of showing how convergence evolves in the convention emergence process, as done in [Delgado et al., 2003, Kittock, 1993, Mukherjee et al., 2007, Sen and Airiau, 2007, Shoham and Tennenholtz, 1997a, Walker and Wooldridge, 1995]. This time, however, we have fixed the convergence rate at 100%.

Our experimental results confirm those obtained by others with the same parameters although, as we venture further than 90%, we discover different dynamics in those ranges.

It can be seen in Figure 3.2 that the convergence rate grows very fast at the beginning of the execution, as agents do not yet have any clear preference and converge through the learning obtained by interacting. In certain topologies, the convergence rates suffer from a delay which might be produced by the existence of subconventions.

We hypothesize that subconventions are facilitated by the topological configuration of the environment (isolated areas of the graph which promote endogamy) or by the agent reward function (concordance with previous history, promoting cultural maintenance). While the former can be studied by analyzing the convention emergence dynamics of the different topologies presented, in order to test the latter a new reward function that captures the agents' previous history has to be designed.

3.3 Is cultural maintenance the source of subconventions?

We hypothesize that the history of interaction is instrumental for norm evolution. Learning algorithms incorporate the history of interactions into their decisions, but reward metrics are typically static and independent of the agents' histories.


Figure 3.2: Convergence rates with different topologies and action sets. (a) 2 actions; (b) 5 actions. Each panel plots the convergence rate against timesteps for a Scale-Free network and lattices with NS = 10, 30 and 50.


Norm evolution is dependent upon the exertion of social pressure by the group on aberrant individuals. It is through learning via repeated interactions that social pressure is applied to individuals in the group. However, a reward metric based only on the current interaction does not necessarily model the full context or capture the persistent nature of social pressure in human societies.

In particular, society often uses past history to judge individuals, and hence actions have future consequences in addition to immediate effects. Accordingly, we propose a reward structure based upon the agents' interaction history as a more appropriate alternative to the single-interaction reward metric normally used in the literature. In our model agents are rewarded based upon the conformity of action between the two interacting agents, such that the agent holding the larger share of the majority action choice in the stored interaction histories receives the higher reward. Hence, both interacting agents' histories of actions are used to calculate each individual's payoff from an interaction. We investigate how this history, and in particular its size (memory size), affects different types of social structure.

We have designed a memory-based reward function that incorporates the past actions of the interacting agents into the calculation.

/* First, we select the majority action */
TotalAActions = A1 + A2
TotalBActions = B1 + B2
if TotalAActions < TotalBActions then
    MajorityAction = B
if TotalBActions < TotalAActions then
    MajorityAction = A
if TotalAActions == TotalBActions then
    MajorityAction = randomly selected between A and B

/* Then, we calculate the reward depending on the agent's action selection
   and on the majority action */
if Action1 == MajorityAction then
    reward1 = MajorityActions1 / TotalMajorityActions
else
    reward1 = 0
end

Algorithm 2: Memory-Based Reward Function.

When two agents interact, the instantaneous reward that an agent receives is calculated based on the action it selected and the action histories of both agents, as shown in Algorithm 2 (the algorithm calculates the reward for agent 1 and assumes only two actions available per agent, but can easily be extended to an arbitrary number of actions). Here Ax and Bx are the numbers of A and B actions in the memory of agent x, Actionx is the last action taken by agent x, for which it is rewarded, MajorityAction is whichever action has been selected more frequently by the two players combined, MajorityActionsx is the number of actions equal to the majority action that agent x has previously taken, and TotalMajorityActions is the number of times the majority action was chosen by both players in their finite histories.


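For concreteness, a minimal Python sketch of this reward computation (under the same two-action assumption as Algorithm 2) might look as follows; the function name and the history representation are ours:

```python
import random

def memory_reward(history1, history2, last_action1):
    """Reward for agent 1 (cf. Algorithm 2); histories hold "A"/"B" actions."""
    counts = {"A": 0, "B": 0}
    for a in list(history1) + list(history2):
        counts[a] += 1
    if counts["A"] == counts["B"]:
        majority = random.choice(("A", "B"))  # tie broken at random
    else:
        majority = max(counts, key=counts.get)
    if last_action1 != majority or counts[majority] == 0:
        return 0.0
    # agent 1's share of all majority actions found in the two histories
    return list(history1).count(majority) / counts[majority]
```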

3.3.1 Experiments

To evaluate the rate and success of norm emergence, we ran experiments with different societal configurations by varying the following system and agent properties:

Memory Size: To be able to analyze the effects of memory size, we vary the number of past interactions stored by an agent.

Population Size: We study the effects of scale-up by varying population size.

Neighborhood Size: We study how different neighborhood sizes in a one-dimensional lattice affect the process of emergence of conventions.

Underlying Topology: We observe the dynamics of the process of emergence of conventions depending on the underlying network topology.

Learning Modalities: We compare how conventions are reached with different learning modalities, namely one or both agents learning from an interaction.

Number of Players: We analyze the effect of the number of players participating in each interaction on the speed of convention emergence.

Actions Set: We study the effect of the search space, namely the number of options available to agents, on the convention emergence process.

Results reported here have been averaged over 25 runs. Agents are initialized with uniformly random memories and initially are unbiased in their action choice. We conclude that a social convention has been reached when 100% of the population chooses the same action.

3.3.2 Effect of Memory Size

In this set of experiments, we want to observe the effect of different memory sizes on convention emergence. We fix the population size at 100 agents. We present the convergence times for different memory sizes in Figure 3.3.

The first experiments are performed on fully connected networks, and the results are shown in Figure 3.3. For both learning modalities, convergence times increase proportionally with memory size. The relationship between memory size and convergence time is explained by the configuration of the reward function and the learning algorithm. Each action in memory gets a relatively higher reward for smaller memory sizes than for larger ones (refer to the reward function defined in Algorithm 2). The learning algorithm therefore receives larger reinforcements for the actions performed with smaller memory sizes, resulting in faster convergence. Convergence is accelerated in this situation because higher rewards have a larger impact on the Q value updated by the learning algorithm represented in Formula 3.1. On the other hand, when dealing with larger memory windows, the proportional reward is much smaller and, therefore, so is the reinforcement. Due to this smaller reinforcement, a higher number of interactions, and hence a higher number of timesteps, will be needed to reinforce an action to the same degree, thereby increasing convergence time.


Figure 3.3: Effect of Memory Size on Convergence Time in a Fully Connected Network (100 agents).


Moreover, observing the learning modalities, the mono-learning approach takes longer to converge than the multi-learning approach. Part of this difference is explained by the fact that the average number of learning interactions in the multi-learning approach is twice that of the mono-learning approach for the same number of time steps. There is, however, an additional clear trend of accelerated learning when both agents learn from the same interaction.

We extend the experiment to the other presented network topologies. The results show that larger memory sizes consistently increase convergence times for all the topologies, with a stronger impact on the scale-free and fully connected stars networks than on the one-dimensional lattice.

In Figure 3.4 (note that the y-axis is on a logarithmic scale), we can observe the relative performance of the different topologies for different memory sizes with the mono-learning approach. For this experiment, we limited the execution of the simulations to one million timesteps. We observe that the Fully Connected Stars network takes the longest to converge, followed by the Scale-Free network. For both the Scale-Free and the Fully Connected Stars networks we can observe that the convergence time increases with increasing memory size. These inefficiencies are largely due to the extra time taken to break or resolve the conflicting subconventions that form in scale-free and fully connected stars networks but not in fully connected networks (see Section 3.3.4). Accordingly, the fully connected network scales up much better with increasing memory size.

These first experiments provide us with the hint that topologies have a strong impact on the delay of full convergence, especially in Scale-Free and Fully Connected Stars networks.


Figure 3.4: Topologies Comparison with different Memory Sizes with Mono Learning Approach.

3.3.3 Effect of Neighborhood Size

To observe the effect of neighborhood size, we use a one-dimensional lattice (as scale-free networks and fully connected stars predetermine the neighbors of each node). In these experiments, we fixed the memory size at 5. Figure 3.5(a) shows a comparison of convergence times for different neighborhood sizes, measured as percentages of the population size, in a multi-learning approach.

We can see that when increasing the neighborhood size, the convergence time is steadily reduced until it stabilizes after a certain neighborhood size. This effect is due to the topology of the network. When the one-dimensional lattice has a small neighborhood size, the diameter of the graph¹ is, on average, high, and therefore agents have a relatively higher amount of local interactions. These local interactions might promote different conventions in different parts of the network due to the inherent structure of the convention emergence problem: one of the possible conventions has to be adopted by interacting with your neighbours, and as the topology restricts agents to more immediate neighbours (promoting endogamy), they can develop a different convention than agents in another part of the network. These subconventions need to be broken within a certain number of timesteps to achieve the global convention. In the case of larger neighbourhood sizes, this endogamy has a softer effect, and therefore the topology does not promote such strong metastable subconventions. It is also interesting to note that for smaller neighborhoods, larger populations exhibit much faster convergence.

¹ The diameter of a graph is the largest number of vertices that must be traversed in order to travel from one vertex to another.


(a) Multi Learning

(b) Mono Learning

Figure 3.5: Convergence rates with different Neighborhood Sizes in a One Dimensional Lattice


Figure 3.6: Diameter relation with Neighborhood size in a One Dimensional Lattice with population = 100

Similar convergence results are also obtained with the mono-learning approach, shown in Fig. 3.5(b). When the neighborhood size crosses about 30% of the population size, the convergence time does not significantly decrease anymore. The relation between the neighborhood size and the diameter follows a geometric decay and is shown in Figure 3.6. We see that when neighborhood sizes cross 30% of the population size, the diameter of the network is no longer significantly reduced, and hence the convergence times are not significantly reduced any further.
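The saturation visible in Figure 3.6 is easy to reproduce. The sketch below (assuming the networkx library; the helper name is ours) builds 100-node one-dimensional lattices where each agent is connected to N/2 neighbours on each side, and prints the resulting diameter:

import networkx as nx

def ring_lattice(n, neighborhood_size):
    # Connect each node to neighborhood_size/2 nodes on each side.
    return nx.circulant_graph(n, range(1, neighborhood_size // 2 + 1))

for ns in (10, 20, 30, 40, 50):
    print(ns, nx.diameter(ring_lattice(100, ns)))
# The diameter drops quickly at first, but barely changes once the
# neighborhood size crosses roughly 30% of the population size.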

This experiment convinced us of the importance of the topology, but also led us to think about the differences between the learning modalities and their different effects.

3.3.4 Effect of Learning Modalities

In this set of experiments, we want to observe the difference in convergence times between the two learning modalities for different topologies. We first compare results of the two learning modalities in a one-dimensional lattice with 100 agents (see Figure 3.7, where the y-axis is drawn on a logarithmic scale). For smaller neighborhood sizes, i.e., when the network diameter is high, multi-learning takes longer to converge than mono-learning. After reaching the point where the diameter is no longer affected by the neighborhood size (as discussed before, this happens when the neighborhood size is about 30% of the population size), multi-learning performs better. The reason for this interesting phenomenon is the creation of local subconventions with multi-learning when the neighborhood size is small.

When agents have a small neighborhood size, they will interact often with their neighbors, resulting in diverse subconventions forming at different regions of the network. With the multi-learning approach, agents reinforce each other in each interaction. Such divergent subconventions conflict in overlapping regions.


Figure 3.7: Different Learning Approaches in One Dimensional Lattices with different Neighborhood Sizes

To resolve these conflicts, relatively more interactions between the agents in the overlap area between regions adopting conflicting subconventions are necessary. Unfortunately, agents in the overlapping regions may have more connections within their own subconvention region and hence will be reinforced more often by their subconventions, which makes it harder to break subconventions and arrive at a consistent, uniform convention over the entire society. In the case of the mono-learning approach, the agents in the overlapping region will not be disproportionately reinforced by the other agents sharing their subconvention, making it easier to break those subconventions.

On the other hand, when neighborhood sizes are large, and hence network diameters are small, agents interact with a larger portion of the population. This makes it more difficult to create or sustain subconventions. In addition, this large neighborhood size is more effectively utilized by multi-learning, as agents will be learning from all the interactions they are involved in, and not only from the interactions initiated by them.

For the scale-free and fully-connected star networks, systematic variation of neighborhood size is not possible in general. We do, however, observe an interesting phenomenon for these kinds of networks. When the multi-learning approach is used in scale-free networks and fully connected stars, subconventions are persistent and the entire population does not converge to a single convention! This is the first time in all of our research on norm emergence that we observed the coexistence of stable subconventions.

Figure 3.8: Subnetwork topology resistant to subconventions in a multi-learning approach

The explanation of this rather interesting phenomenon can be found in the combination of the memory-based reward function and the inherent topologies of such networks. We present, in Figure 3.8, a portion of a representative Scale Free or Fully Connected Stars network where subconventions have formed. We see that agent 1 (hub node 1) and its connected leaf nodes (nodes 10, 11, and 12) have converged to one subconvention (represented by the color of the nodes) that is different from the subconvention reached by agent 2 (hub node 2) and its connected leaf nodes (nodes 21, 22, and 23). As an agent has equal probability of interaction with any of its neighbors, both agents 1 and 2 interact more frequently with their associated leaf nodes that share their subconvention.

Also note that when two agents interact and both actions have been used equally often in their combined memories, as will be the case when agents 1 and 2 interact, the majority action will be selected randomly, giving advantage to one of the agents. For the subconventions to be broken in this scenario, the following must hold for one of the hub nodes: (1) the agent's q-value for its preferred action decreases, and (2) the q-value for the action preferred by the other agent increases. In order for the agent's q-value for its preferred action to decrease, a number of repeated interactions (proportional to the memory size) between the hub nodes (in our example, 1 and 2) have to occur, and as there will be no clear majority action, the preference has to be given to the same action, e.g., that preferred by agent 2, in all those interactions. As the reward for agent 1's action will then be 0, its q-value will start decreasing. In order for agent 1's q-value for its non-preferred action to increase, a number of interactions (also proportional to the memory size) between it and agent 2 have to occur, and agent 1 has to explore in those interactions and try agent 2's preferred action. This will cause agent 1's estimate of agent 2's preferred action to increase, albeit slowly. Only when both these fortuitous events follow each other, and without the intervention of another interaction with the leaves associated with agent 1 (which would reinforce the subconvention), can the subconvention ultimately be broken. The likelihood of this sequence of events happening is exceedingly small, and hence subconventions routinely persist with the multi-learning approach. Viewed another way, the leaf nodes can only interact with their hubs, and each of them will reinforce the subconvention action for their associated hub node in every time step, making it very unlikely that conflicting subconventions will be resolved in situations such as in Figure 3.8.

On the other hand, as an agent is reinforced only once each time-step in the mono-learning approach, the processes required to break the subconventions are more likely, even though they still have a relatively small probability. This probability decreases with larger memory sizes, and hence subconventions are more persistent with larger memory sizes when using mono-learning. Therefore, with larger memory sizes, subconventions will be harder to break; this phenomenon caused the significant increase in convergence time for scale-free and fully connected star networks (we discussed this in the previous section with reference to the results displayed in Figure 3.3).

3.3.5 Effect of Number of Players

To investigate the effect of the number of players per interaction on the convention emergence process, we designed the following experiment set. Agents are situated in a fully connected network. As the original design of the Memory Based Reward Function does not consider situations with more than two players, we modify it to produce a new reward function (shown in Algorithm 3) that reinforces the convention taken by the majority of the players.

// First, we select the majority action
Choose MajorityAction as the most frequent action over the memories of all interacting players;
In case of tie, select randomly amongst the possible ones;
// Then, we calculate the reward depending on the agent's action selection and on the majority action
if Action_1 == MajorityAction then
    reward_1 = 1
else
    reward_1 = 0
end

Algorithm 3: MultiPlayer Reward Function.
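Algorithm 3 computes the reward of player 1; by symmetry, the same rule is applied to every interacting player. A runnable transcription could look as follows (a sketch: representing each player's memory as a list of its past actions is our assumption):

import random
from collections import Counter

def multiplayer_rewards(memories, actions):
    # The majority action is the most frequent action over the
    # memories of all interacting players (ties broken randomly).
    counts = Counter(a for memory in memories for a in memory)
    top = max(counts.values())
    majority = random.choice([a for a, c in counts.items() if c == top])
    # Each player is rewarded 1 iff it played the majority action.
    return [1 if action == majority else 0 for action in actions]

# Three players with memory size 3, choosing actions "A", "B", "A":
memories = [["A", "A", "B"], ["B", "B", "A"], ["A", "A", "A"]]
print(multiplayer_rewards(memories, ["A", "B", "A"]))  # [1, 0, 1]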

Results from this experiment are presented in Fig. 3.9. We observe that the convergence time decreases when the number of players in each interaction increases. The reason for this phenomenon is found in the design of the reward function: agents reinforce the action played by the majority. In situations with a smaller number of players, fewer agents are consulted while deciding the majority action. This leads to the emergence of more subconventions, whose members reinforce each other. On the other hand, in scenarios with a larger number of players per interaction, the majority action is calculated by consulting the history of more agents, and therefore these subconventions are less likely to appear. In essence, more consistent rewards are provided to a larger number of agents, hence promoting convergence of choices and enabling the emergence of a global convention. Information sharing between a larger number of individuals thus has a similar effect on global convention emergence as the likelihood of interacting with a larger proportion of the society: both facilitate uniform action adoption.

Figure 3.9: Different Players in Fully Connected Networks with Different Learning Approaches

In Figure 3.10 we can observe how the neighborhood size affects (as identified in Section 3.3.3) the emergence of multiplayer conventions. We previously observed that the emergence of conventions in one-dimensional lattices was strongly affected by the neighborhood size: the convergence times are drastically reduced when the diameter of the network (inversely proportional to the neighborhood size) is reduced. We can now observe similar effects in multiplayer situations. We also observe that the convergence time increases when increasing the number of players. This phenomenon occurs due to the learning modality used by the agents (a multi-learning approach) and the relation between the number of players and the neighborhood size. When a small number of players interact (relative to the neighborhood size), subconventions are less likely to be created. However, when the number of players per interaction is larger (and closer to the neighborhood size), agents will always interact with the same agents, creating and reinforcing subconventions.

Figure 3.10: Neighborhood Sizes Comparison with Different Players with Multi Learning Approach

The neighborhood size also plays an important role when dealing with multiplayer interactions. Assuming that all the players in an interaction must be neighbors, we can infer that the number of neighbors an agent has directly affects the convergence time. We have accordingly designed two methods that adapt the neighborhood size of the network to the number of players per interaction: the reduced and the super-reduced. The super-reduced scheme assigns the minimum number of neighbors needed for the number of players specified. Recall that the neighborhood size must be an even number N, representing N/2 neighboring agents on each side of the agent. Therefore, in the super-reduced method, for either 2- or 3-player games, the neighborhood size will be N = 2. We have observed that using the super-reduced method in the case of the upper limit of players (3 players with a neighborhood size N = 2) makes agents interact repeatedly with the same agents every timestep. Depending on the type of reward function, this repeated interaction with the same players can lead to metastable subconventions. Therefore, we relaxed this method, creating the reduced method. The reduced method assigns the minimum number of neighbors needed for the number of players specified plus 2. This addition of two extra neighbors introduces variety into the games (reducing the endogamy), allowing agents to play with different agents in different interactions.
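In code, the two sizing schemes reduce to rounding up to the nearest even neighborhood size; a minimal sketch (the function names are ours):

def super_reduced(players):
    # Minimum even neighborhood size hosting a game of `players`
    # participants: the focal agent plus players-1 neighbours.
    needed = players - 1
    return needed if needed % 2 == 0 else needed + 1

def reduced(players):
    # Two extra neighbours introduce variety and reduce endogamy.
    return super_reduced(players) + 2

for p in (2, 3, 4, 5):
    print(p, super_reduced(p), reduced(p))  # (2,2,4) (3,2,4) (4,4,6) (5,4,6)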

From the results presented in Figure 3.11 we can observe the convergence times of both methods (reduced and super-reduced) with different numbers of players per interaction. The super-reduced method produces larger convergence times and more pronounced changes between the even and odd values of each neighborhood size². The reason why the super-reduced method takes longer to converge is the interaction topology. When agents are not allowed to interact with agents other than their direct neighbors, no variety is introduced (complete endogamy), promoting the appearance of a "frontier effect". This frontier effect affects the agent located in the middle of two regions of clear preference: this agent in the "frontier" will be doubtful about its preference (one example of this situation can be seen in Fig. 3.12). If the agents that the frontier agent interacts with are always the same, the frontier effect it is under the influence of will be harder to break.

Until one of the involved agents explores a different action, this frontier effect will not be broken.

Moreover, in Fig. 3.11 we observed another phenomenon that has not been explained yet: scenarios with an odd number of players take longer to converge than those with an even number of players. We can find an explanation for this phenomenon in the design of the reward function and the frontier effect: the majority action is chosen by observing the actions of the interacting agents, and, if those are even in number, there might be a tie. In case of a tie, the majority action is chosen randomly, giving a clear advantage to one of the actions in the "frontier" region. In the case of an odd number of players, the frontier agent will be affected by the same number of non-frontier agents. We can observe an example scenario in Figure 3.12. Initially, in Figure 3.12(a), Agent 3 is doubtful about its preference. When Agent 2 interacts, in Figure 3.12(b), it affects (together with Agent 1) Agent 3, biasing its action preference towards one convention. On the other hand, when Agent 4 interacts, it affects (together with Agent 5) Agent 3, making it change its preference to the other convention.

² For N = 2, 2- and 3-player games can be played. For N = 4, 4- and 5-player games can be played. For N = m, m-player (even) and m+1-player (odd) games can be played.

Figure 3.11: Reduced and Super Reduced Neighborhood sizes with different Players using the Multi Learning modality.

The random selection of the majority action in case of a tie, possible only for an even number of players, speeds up the process of convergence, although it produces a larger number of preference changes per agent.

On the other hand, with an odd number of players we obtain the previously explained "frontier" effect. This effect entails a longer convergence time (because the frontier agent needs to explore in order to break the frontier) with a smaller number of preference changes. Experimental results shown in Figure 3.13 confirm our hypotheses.

3.3.6 Effect of Action Set

This last experiment was designed to evaluate the important role the topology plays in the emergence of conventions in environments where there are more than two conventions to choose from. We can observe in Figure 3.13 that the larger the search space of possible conventions, the longer it takes for a convention to emerge. The effect is especially pronounced with smaller neighborhood sizes. The explanation for this phenomenon is again the creation of metastable subconventions promoted by the endogamy produced by the topological structure, as explained in Section 3.3.3. However, in the previous experiments agents only had two options to choose from; having a larger set of options generates a larger number of different subconventions that need to be broken.


(a) Agent 3 doubtful

(b) Agent 2 interacts and affects Agent 3

(c) Agent 4 interacts and affects Agent 3

Figure 3.12: Frontier effect in a 3-player game with two competing conventions.

3.4 Discovering subconventions: not only history, but topology

This first set of experiments has been useful to understand the emergence of social conventions based not only on direct interactions but also on the memory (and previous history) of each of the agents, under different interconnection topologies between agents. Our initial hypothesis was that subconventions are facilitated by the topological configuration of the environment or by the agent reward function. Experimental results using the Memory Based Reward function confirmed that the cultural maintenance promoted by this function slows down convention emergence and in certain situations provokes subconventions to emerge.

However, empirical results have also given us considerable evidence of the effect of topology on the maintenance of subconventions. Specific topological structures promote endogamy within a certain subgroup of agents, which might produce a different subconvention from the rest, as identified by several authors [Salazar-Ramirez et al., 2008, Toivonen et al., 2009, Villatoro et al., 2009]. In particular, we have observed that subconventions are more likely to appear and are more resistant when using the multi-learning approach, and might not be resolved at all in scale-free and fully-connected star networks. This is an important result, as we consider the multi-learning approach to be the most realistic. From now on, in the remainder of this thesis we will only consider that learning approach.


(a) Convergence Time for 2 Player Games (b) Convergence Time for 3 Player Games

(c) Average Number of Preference Changes for 2 Player Games

(d) Average Number of Preference Changes for 3 Player Games

Figure 3.13: Multi Player and Multi Action Games


Chapter 4

Unraveling Subconventions

The problem of subconventions is a critical bottleneck that can derail the emergence of conventions in agent societies, and mechanisms need to be developed to alleviate it. Subconventions are conventions adopted by a subset of agents in a social network who have converged to a different convention than the majority of the population. As we have seen, subconventions are facilitated by the topological configuration of the environment (isolated areas of the graph which promote endogamy) or by the agent reward function (concordance with previous history, promoting cultural maintenance). Assuming that agents cannot modify their own reward functions, the problem of subconventions has to be solved through the topological reconfiguration of the environment.

4.1 Social Instruments for Subconvention Dissolution

As part of the social network, agents can exercise a certain control over it so as to improve their own utility or social status. We define Social Instruments as a set of tools available to agents to be used within a society to influence, directly or indirectly, the behaviour of its members by exploiting the structure of the social network. Social instruments are used independently (an agent does not need any other agent in order to use a social instrument) and have an aggregated global effect (the more agents use the social instrument, the stronger the effect).

Normally, in social learning scenarios the reward function for an agent is determined either by the environment (e.g., if you over-exploit resources, they will be exhausted) or by individual social interactions (e.g., drive on the same side of the road as the rest of the drivers for your and others' safety), and typically individual agents cannot modify it. However, in social learning interactions, the reward is determined by the strategies of the partners involved in the interaction. Therefore, an agent can indirectly modify the reward obtained, and change its success, visibility and hence social status, by controlling or modifying the social network it is located in: e.g., selectively choosing which other agents it interacts with, giving different importance to the relationships with other agents, sharing information about the interactions with other agents, etc.

Social instruments are used by agents to improve their utility without directly altering the utility function. Given a set of social instruments, agents can exploit the state of their social environment to have tangible effects on the utility obtained from their interactions with other agents in that environment.

A society where individual agents repeatedly use a social instrument has the properties of a complex system: the individual elements, aggregated together, exhibit as a whole properties that are not present in the individual parts. Moreover, the emergent aggregate behavior may change over time. The impact of the social instruments is reflected in their aggregated effect, which often transforms the social environment.

As a source of inspiration, we can look to human societies and observe how humans have developed different social instruments to exploit and produce changes in their social networks. These human social mechanisms have direct effects or side-effects on the social networks. There exists some literature on classifying and analyzing the effects of these social instruments on the network and therefore on the agents' interactions [Albert et al., 2000, de Pinninck et al., 2007b, Savarimuthu et al., 2007a, Urbano et al., 2009a, Villatoro et al., 2009, Babaoglu and Jelasity, 2008, Conte and Paolucci, 2002, Hales, 2002, Hales and Arteconi, 2006].

However, there is a critical dearth of work in this area of socially-inspired tools for multi-agent systems (MAS). Our goal is to fill this gap by formalizing and adapting the social mechanisms used by humans for agent societies. As a social network can be represented as a type of graph, graph theory provides us with a good ground formalism and allows us to establish a parallel between network operators and social operators. However, from a societal point of view only a subset of all the possible graph operations make sense.

In a social network, agents are represented by nodes and relationships by edges. Assuming agents cannot create or destroy other agents, the set of graph operations that are socially relevant is the following: destroy a link, create a link, add attributes to nodes (node coloring), add attributes to links (link coloring), and transmit information (any node can transmit information to any other node following the right protocols using the communication network, which might be different from the social network).

The social instruments available to the agents should be able to perform those operations. To alter the state of the social network in which agents are located, researchers have developed different social mechanisms, which we now categorize. We identify some social instruments found in the MAS literature and observe their effect on the social network. The mechanisms we are interested in are those that can be used by the agents without institutional support.

• Partner Selection (Edge Removal): People continually select who they do and do not want to interact with. Giving agents the capability of removing edges attached to them in their social network gives them direct control over the part of the network that directly affects them, improving the overall behavior of the system [Albert et al., 2000, de Pinninck et al., 2007b].


• Tags (or Node Coloring): Recognizing certain attributes and characteristics of other agents before interacting with them reduces social friction (e.g., by reducing the number of unsuccessful interactions) and improves coordination. To facilitate this process, some kind of externally visible social markings are needed, e.g., tag mechanisms [Chao et al., 2008, Holland, 1993]. By carrying a tag, having a role [Savarimuthu et al., 2007a] or a certain force [Urbano et al., 2009a], an agent allows all other agents to recognize a certain characteristic or quality even before any direct interaction.

• Social Position (or Link Coloring): Humans recognize that our behaviour has to be adapted depending on who we are interacting with; e.g., I behave differently with a friend from school than with my boss. Agents can extract information from the topology of a social network to gauge the influence or status of other agents and use this information strategically to further their own interests [Villatoro et al., 2009].

• Mimicking (or Node Imitation): Humans consciously and sub-consciously imitate the strategies of more successful individuals as they aspire to improve their own situation. Access to the strategies of other nodes in the network can be used as a social instrument [Hales and Arteconi, 2006]. Such access can be applied to compare the efficiencies of several strategies and mimic the most effective ones.

4.1.1 Our Social Equipment

Some researchers in the literature have used certain instruments for the establishment and distribution of norms in virtual societies. However, we have envisioned two new instruments that we hypothesize will act efficiently on the dissolution of subconventions: rewiring and observation. As we will see in the next sections, rewiring directly changes the structure of the network, whereas observation exploits information from certain parts of the network to bias agents' decisions.

Rewiring (or Intelligent Link Removal and Creation)

If we use human societies as inspiration, agents should have the ability to choose whom they want to interact with. We have adapted this idea into a social instrument that allows agents to "break" the relationships from which they are not receiving any benefit and try to create new ones. This social instrument allows an agent to remove links with other agents it is connected with and to intelligently substitute those links with new ones; this last step is the crucial difference from plain edge removal. Agents decide to rewire a link after the number of unsuccessful interactions¹ with another agent goes above a certain Tolerance threshold. Agents also need to decide whom they want to establish the new link with. We have developed three different methods:

¹ Unsuccessful interaction in our convention emergence scenario corresponds to being uncoordinated or not sharing the same convention for that interaction.


1. Random Rewiring: Agents rewire to a randomly selected agent from the population.

2. Neighbour's Advice: Agents rewire to an agent recommended by a neighbour. The rewiring agent, X, asks a neighbour with the same preference as that of X (the referent agent) for another agent also with the same preference but not a neighbour of X (the referred agent). If that is not possible, Random Rewiring is applied.

3. Global Advice: Agents rewire to an agent that is randomly selected by the system from those that have the same strategy. This rewiring technique is used as a control case.
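A minimal sketch of the three methods follows (assuming a networkx-style graph, a map `preference` from nodes to their current convention, and a per-edge counter `failures` of unsuccessful interactions; these data structures and names are ours, not the simulator's exact code):

import random

def rewire(g, agent, neighbor, preference, failures, tolerance, method):
    # Rewiring is triggered once the unsuccessful interactions with
    # `neighbor` exceed the Tolerance threshold.
    if failures[(agent, neighbor)] <= tolerance:
        return
    g.remove_edge(agent, neighbor)
    target = None
    if method == "neighbours_advice":
        # Ask a neighbour with the same preference (the referent) for a
        # non-neighbour with that same preference (the referred agent).
        referents = [n for n in g.neighbors(agent)
                     if preference[n] == preference[agent]]
        referred = [m for r in referents for m in g.neighbors(r)
                    if preference[m] == preference[agent]
                    and m != agent and not g.has_edge(agent, m)]
        target = random.choice(referred) if referred else None
    elif method == "global_advice":
        # Control case: the system selects any agent with the same strategy.
        same = [n for n in g if preference[n] == preference[agent]
                and n != agent and not g.has_edge(agent, n)]
        target = random.choice(same) if same else None
    if target is None:
        # Random rewiring, also the fallback of the other two methods.
        target = random.choice([n for n in g if n != agent
                                and not g.has_edge(agent, n)])
    g.add_edge(agent, target)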

Despite the similarity with the work of Griffiths and Luck [Griffiths and Luck, 2010], there exists a crucial difference with our approach: [Griffiths and Luck, 2010] use an evolutionary approach, observing the results of their techniques after the reproduction of a number of generations and with a certain mutation rate. We, on the other hand, use a more online approach where agents can modify their social network at runtime, without the necessity of evolving new generations. In addition, our rewiring methods do not access any agent's private information, such as its actual reward (the exception is Global Advice, which is used only as a control case). There is also a clear difference between this social instrument and the partner selection mechanism: rewiring produces a tangible change in the network, whereas partner selection does not directly affect the network but prioritises some nodes over the rest.

Observation

In a social learning scenario, allowing agents to observe the strategy of other agents outside their circle of interaction can provide useful information to support the convention emergence process: observing other agents' state might help when deciding which strategy to adopt. However, there has to be a trade-off between observing and interacting. In order to analyze the effects of observation we allow agents to observe, at certain timesteps, a subset of the agents' states in the population. Therefore, agents are assigned an Observation Probability. Moreover, agents need to know the number of agents they can observe (Observation Limit) and how they want to observe (Observation Method). We propose three different observation methods:

1. Random Observation: Agents observe random agents from the society (Figure 4.1(b)).

2. Local Observation: Agents observe their immediate neighbours in the social network (Figure 4.1(a)).

3. Random Focal Observation: Agents select one random agent from the society and observe that agent and its direct neighbours (Figure 4.1(c)).


(a) Local (b) Random (c) Random Focal

Figure 4.1: Observation Methods

After the observation process, the agent chooses the majority action taken by the selected observed agents and reinforces it. We are aware that in certain configurations, choosing the observed "majority action" might not help agents to achieve the general convention. Nonetheless, because of the intrinsic nature of the convention emergence process, the observed "majority action" will in most situations correspond to the general majority action.
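The three methods and the subsequent majority update can be sketched as follows (assuming a networkx-style graph `g` and a dictionary `last_action` holding each agent's publicly visible last decision; both names are ours):

import random
from collections import Counter

def select_observed(g, agent, method, observation_limit=10):
    # Choose which agents to observe, according to the three methods.
    if method == "local":
        pool = list(g.neighbors(agent))
    elif method == "random":
        pool = [n for n in g if n != agent]
    else:  # "random_focal": a random agent plus its direct neighbours
        focal = random.choice([n for n in g if n != agent])
        pool = [focal] + list(g.neighbors(focal))
    return random.sample(pool, min(observation_limit, len(pool)))

def observed_majority(last_action, observed):
    # The agent reinforces the majority action among the observed
    # agents' last decisions (ties broken randomly).
    counts = Counter(last_action[n] for n in observed)
    top = max(counts.values())
    return random.choice([a for a, c in counts.items() if c == top])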

Despite their similarity, this instrument and mimicking (presented in the previous section) behave differently. Mimicking agents copy from a successful agent the strategy (which in most cases is private) as well as the interaction neighbours (which might also be private). On the contrary, with the observation instrument agents observe the last decision (hoping this represents the actual strategy) of a group of agents (without knowing the actual success of those agents) and update their estimates based on the majority. With observation, agents only access information that has been previously made public by the observed agent, while with mimicking, they access information that can be considered private.

4.2 Experiments

We use the simulation model presented in Section 3.1 to evaluate how the social instruments that we propose influence the process of emergence of social conventions.

However, in order to obtain complete results, we have at this point introduced different decision making functions to evaluate the efficiency of our social instruments. The decision making functions are:

1. Best Response Rule (BRR) [Mukherjee et al., 2007, Sen and Airiau, 2007]: Agents choose the action with which they have obtained the highest payoff in the last iteration. A positive reward is given to agents if they are coordinated. This function gives preference to the instantaneously utility-maximizing action.

2. Highest Cumulative Reward Rule (HCRR) [Shoham and Tennenholtz, 1997a, Kittock, 1993]: "an agent switches to a new action iff the total payoff obtained from that action in the latest m iterations is greater than the payoff obtained from the currently-chosen action in the same time period". A reward is generated based on a coordination game. This function gives preference to the action that has obtained the highest accumulated payoff in the last m interactions (rather than only in the last one).

3. Memory Based Rule (MBR) [Villatoro et al., 2009]: A positive reward is given to agents if they are coordinated, proportional to how often the chosen action appears in their past actions. This function gives preference to the action that has provided the largest payoff while also taking the previous actions into consideration in the reward function (promoting concordance with previous history).
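As an illustration, the first two rules can be sketched as follows (MBR corresponds to the memory-proportional reward already sketched in Section 3.3.2; the data structures here are hypothetical):

def best_response(last_payoff):
    # BRR: prefer the action with the highest payoff in the last iteration.
    return max(last_payoff, key=last_payoff.get)

def highest_cumulative_reward(current, payoff_history, m=5):
    # HCRR: switch iff another action earned a strictly greater total
    # payoff over the latest m iterations than the current action.
    totals = {a: sum(h[-m:]) for a, h in payoff_history.items()}
    best = max(totals, key=totals.get)
    return best if totals[best] > totals[current] else current

print(best_response({"A": 1.0, "B": 0.0}))                               # A
print(highest_cumulative_reward("A", {"A": [1, 0, 0], "B": [1, 1, 0]}))  # B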

For all the following experiments we fix the population size to 100 agents (as we have been able to understand the different population size effects so far) and the memory size to 5 (for HCRR and MBR).

4.2.1 Rewiring

We have experimented with the three rewiring methods introduced in Section 4.1.1 on three topologies: a low-clustered² one-dimensional lattice (Neighborhood Size = 10), a high-clustered one-dimensional lattice (Neighborhood Size = 30), and a scale-free network. We have explored the search space of Tolerance levels, observing how they affect the convergence time and the number of components (maximally connected subgraphs) created when convergence is reached with the different decision making functions.

Influence of Rewiring Methods

From the experimental results obtained we have observed that, in general, the Global Advice (GA) rewiring method produces the best convergence time due to its centralized nature and access to global information. Nonetheless, the decentralized methods, especially the Neighbour's Advice (NA) method, also show good performance. The NA method improves on the Random Rewiring (RR) method as it more expediently resolves the subconventions that appear in the one-dimensional lattices during the convention emergence process. These metastable subconventions have already been identified in [Salazar-Ramirez et al., 2008, Toivonen et al., 2009, Villatoro et al., 2009] as the preferred action of a group of nodes who have converged to a different convention than the rest. These subconventions, as noted in [Villatoro et al., 2009], persist as long as the frontier region remains metastable³. The frontier region is the group of nodes in the subconvention that directly interacts with other nodes that have adopted a different convention (if nodes represent agents, and the color of a node the convention it has adopted, Fig. 4.2 shows examples of frontiers in two different networks).

² The Clustering Coefficient is a measure of the degree to which nodes in a graph tend to cluster together.

³ Metastable conventions are those that should preferably be broken, but remain stable because of a combination of factors (like interaction dynamics, topology and decision making functions).


(a) Frontier in Ring Network. (b) Frontier in Scale Free

Figure 4.2: Examples of Frontiers in Different Networks.

When using the Neighbour's Advice method, these subconventions are resolved more expediently. Agents in the frontier use the rewiring instrument as they cross the tolerance level faster than those not in the frontier. For this reason, the RR method will relink an agent with a more suitable agent with a probability of 1/NumberOfActions. In contrast, the NA method will relink the agent with another one with the same preference whenever such an agent is accessible. In case there is no other agent with the same preference to connect with, random rewiring is applied, obtaining, in the worst-case scenario, the same results.

These results are reaffirmed for the scale-free networks, although an interesting phenomenon concerning the final number of components is observed in this case. Figure 4.3 shows the size of the clusters when a convention was reached. The horizontal axis represents the Tolerance levels, the vertical axis presents ranges (of size 10) of cluster sizes, and the darkness of the histogram the number of clusters. When using the RR (Fig. 4.3(a)) and the GA (Fig. 4.3(c)) methods, the number of components obtained is always above one, resulting in a fragmented society. The number of components is slightly higher for the NA method than when using the other methods. This phenomenon is observed because of the structure of the scale-free networks (very few nodes with high degree (hubs) and a larger number of nodes with a lower degree, following a power-law distribution) and the dynamics of the rewiring methods. Non-hub nodes reach their tolerance levels faster than hub nodes (as non-hub nodes interact with a smaller number of agents). Once the tolerance level is reached, the rewiring process starts working. We do not observe any distinctive behavior separating the RR and GA methods, as they both incorporate a random component. However, when using NA the number of components is variable, depending on the tolerance level: with a low tolerance the number of components is higher, although we have observed that most of them are smaller components (as shown in Fig. 4.3(b), where larger amounts, represented with darker colours in the figure, appear in the smaller cluster ranges).

(a) RR Method (b) NA Method (c) GA Method

Figure 4.3: Cluster Sizes Histogram for Scale Free

Non-hubs (when using the NA method) will rewire to any of their leaf nodes⁴, separating completely from the initial complete component (e.g. in Fig. 4.4) and creating a greater number of components, but of smaller size.

⁴ A leaf node is a vertex of degree 1.

Figure 4.4: Example of Components Evolution in Scale Free NA Rewiring Method.

Influence of topology

When observing the effects of the topology, we find that under rewiring the convergence time increases when the neighborhood size is increased. This effect is due to the clustering coefficient of the network: one-dimensional lattices with higher neighborhood sizes are more densely clustered than those with more restricted neighborhoods, since increasing the neighborhood size increases the number of links between agents and thereby the clustering coefficient. Highly clustered societies are more resistant to rewiring, as a node that wants to use rewiring has to apply it to a higher number of nodes, and then be rewired to the same number of nodes with the appropriate strategy. This is one of the scenarios where the difference between our social instrument and those presented in [Hales and Arteconi, 2006, Griffiths and Luck, 2010] becomes apparent: our instrument needs to substitute the selected links one by one; their instruments, on the other hand, access the private information of another agent and copy its set of neighbors. Our social instrument sacrifices convergence time for the sake of privacy.

(a) NS = 10 (b) NS = 30 (c) Scale Free

Figure 4.5: Rewiring Methods Comparison for Number of Components in a One Dimensional Lattice (each panel plots the Average Number of Clusters against the Tolerance level for Random Rewiring, Neighbour's Advice and Global Advice).

Tightly linked to the previous results, we also observed that under the effects of rewiring there exists a trade-off between the diameter of the network and the number of components: with higher diameters (smaller neighborhood sizes) the number of components remains constant and greater than one (as can be seen in Figure 4.5(a)). However, we can observe in Figure 4.5(b) that when the diameter is smaller (with a higher neighborhood size), the number of components is reduced to one for certain values of Tolerance. The explanation of this phenomenon is again related to the clustering coefficient of the network combined with the dynamics of the rewiring process. When the clustering level of the network is low, agents have fewer links to be rewired. Initially, and before the Tolerance level is reached, agents also change their preferences without rewiring. Eventually, after the Tolerance level is reached, agents start rewiring their links (and reducing their number of preference changes). As explained previously, low-clustered societies have a higher tendency to become disconnected and form different components.

Experimental results (shown in Figure 4.6) also reveal interesting properties of Scale Free networks (see Section 4.2.1). Similar to what was observed by [Villatoro et al., 2009], we found that the subconvention effect is stronger on Scale Free networks when using MBR, generally producing (as can be seen in Figure 4.5(c)) a larger number of components. However, when using the rewiring social instrument, global convergence is obtained in this type of network.

We can conclude that rewiring performs better in low-clustered societies, producing a stratified population, which results in a significant reduction in convergence time. In more clustered networks, the tolerance level has to be chosen carefully (depending on the other experimental parameters) to produce an effective technique for norm emergence.


Figure 4.6: Convergence Times with different Rewiring Methods in a Scale Free Network (Convergence Time against Tolerance for Random Rewiring, Neighbour's Advice and Global Advice).

4.2.2 Observation

In this section we analyze the effects of observation as a social instrument when used by agents. We test and compare the three different methods proposed, exploring the search space with a representative range of Observation Probability values. As in the previous section, we experiment with three topologies: a low-clustered lattice, a high-clustered lattice and a scale-free network. To observe the effects of the different observation methods, we fix the Observation Limit to 10 for the experiments.

Influence of Observation Methods

Comparing the results from the three Observation methods in Figure 4.7, we observe that the Random Observation (RO) and the Random Focal Observation (RFO) methods are the most effective ones (with different topological effects, commented on below), and have very similar results, when compared with the Local Observation (LO) method. The reason for this phenomenon is to be found in the frontier effect. When agents use the LO method, they observe their direct neighbors. If the observing agent is in the frontier area, then this observation is pointless. However, observing different areas gives a better understanding of the state of the world, and hence the RO and RFO methods perform better.

Influence of topology

For the BRR and MBR decision making functions, we have observed that the different Observation methods produce a more pronounced effect in societies with higher diameters, as we can see in Fig. 4.7. We notice that a small percentage of Observation drastically reduces convergence times.


(a) Local Observation (b) Random Observation (c) Random Focal Observation

Figure 4.7: Comparison of Effects on Convergence Time on a One Dimensional Lattice (each panel plots Convergence Time against Observance Probability for NS = 10, 30, 50, 70 and 90).


The reason for this effect can again be found in the frontier and subconvention effects previously discussed. Subconventions emerge more readily when the social network has a large diameter, and the frontier region represents the unsettled area. These subconventions are more easily resolved at the frontiers by observation than by learning through interactions.

Influence of Strategy Decision Techniques

The main experimental result obtained for BRR and MBR is that greater observation leads to faster convergence. Moreover, in most cases even a small observation percentage helps significantly reduce the convergence time; this result is important if Observation has a concomitant cost. However, observation has a poor effect on HCRR: we find that the convergence time is proportional to the observation probability (the higher the observation, the worse it performs). The reason for this effect is found in HCRR's lack of an exploration policy. When observing, an action is reinforced, and this action might be different from the one that the agent has converged to within its environment. This new reinforcement destabilizes the convergence that was already obtained. This effect is not produced with the other reward rules, as they have an exploration rate that quickly corrects this destabilization.

4.3 Discussion on the Frontier Effect

The experiments performed up to now have shown us how the emergence of social conventions can be facilitated with the usage of our proposed social instruments, achieving good results. However, some practical considerations have to be kept in mind for their appropriate usage. On the one hand, tolerance levels have to be carefully chosen when using rewiring: on topologies with large diameters the social network might suffer important changes affecting the number of components. Moreover, we have observed an interesting phenomenon on Scale Free networks: without the usage of the social instruments, full convergence was hard to obtain; with the rewiring instrument convergence is reached, producing a larger number of components when using the Neighbour's Advice method.

On the other hand, a small percentage of observation is very helpful in the achievement of full convergence. Interestingly, when using the Local Observation method, this improvement is not as pronounced as with the other methods.

With these pieces of information we can hypothesize that there exists a special type of subconvention that cannot be resolved at a local level with the methods proposed, especially in Scale Free networks.

4.4 Understanding Subconventions in Scale-Free Networks

The results from the experiments presented above, together with the observations reported by other authors [Epstein, 2000, Toivonen et al., 2009, Villatoro et al., 2009], convinced us that subconventions are problematic obstacles to the emergence of global conventions. These subconventions thrive (amongst other reasons) because of the topological structure of the network where they emerge.

Because of the inherent structure of scale-free networks (nodes whose connections follow a power-law distribution), interesting properties of the network emerge in the dynamics of the convention emergence process. To get a more detailed idea of how the topology affects the process, we have performed a detailed study of its dynamics.

4.4.1 Who's the Strongest Node?

In previous works in the literature [Sen and Airiau, 2007, Mukherjee et al., 2008], some authors fixed the behaviour of a number of agents in the population to observe how their unchangeable behaviour would affect the emergence of a convention. Those experiments were performed in regular networks, and the position of the fixed learners in the network was not really considered. Inspired by those works, we have decided to apply this technique in irregular networks, locating the fixed learners in different positions of the network.

Scale-free networks are basically defined as irregular networks where very few nodes have a large number of connections (hubs) and a large number of nodes have very few connections (leafs). In order to discover which nodes have a stronger effect on the emergence of conventions, we experiment by fixing a certain percentage of the population to a specific convention (common to all fixed agents). Three experimental variants are tested:

1. fixing hub agents' behaviour: the behavior of the nodes with the highest degree is fixed.

2. fixing leaf agents' behaviour: the behavior of the nodes with the lowest degree is fixed.

3. fixing random agents' behaviour: the behavior of randomly selected nodes is fixed.
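These variants amount to different node-selection policies; a minimal sketch (assuming networkx; the generator, fraction and names are illustrative):

import random
import networkx as nx

def fixed_agents(g, fraction, variant):
    # Select the agents whose behaviour will be fixed.
    k = int(fraction * g.number_of_nodes())
    by_degree = sorted(g.nodes, key=g.degree, reverse=True)
    if variant == "hubs":
        return by_degree[:k]          # highest-degree nodes
    if variant == "leafs":
        return by_degree[-k:]         # lowest-degree nodes
    return random.sample(list(g.nodes), k)  # random variant

g = nx.barabasi_albert_graph(100, 2)  # an illustrative scale-free network
print(fixed_agents(g, 0.05, "hubs"))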

We hypothesize that by fixing the behaviour of hub agents, the rest of the population will converge faster than in any other possible situation. Our intuition is founded on the dynamics of convention emergence: given that a hub node interacts with more agents, its fixed behaviour will remain unaffected by the others' decisions; the others, however, being learners, will learn from the decisions of the hubs.

Experimental results presented in Fig. 4.8 partially confirm our hypotheses. We have performed an intensive evaluation of the search space, exploring different percentages of fixed agents in different locations of the network and analyzing their convergence rates. For small numbers of fixed agents, fixing the hubs produces better convergence rates than fixing the same number of leafs or random agents (as we can observe in Fig. 4.8(a), Fig. 4.8(b), Fig. 4.8(c), and Fig. 4.8(d)). On the other hand, when the number of fixed agents is larger, fixing the leafs results in much faster convergence (as can be seen in Fig. 4.8(e) and Fig. 4.8(f)).

This empirical result led us to pay attention to the leafs.


(a) 5% Fixed Learners

(b) 25% Fixed Learners

(c) 35% Fixed Learners

(d) 45% Fixed Learners

(e) 65% Fixed Learners

(f) 75% Fixed Learners

Figure 4.8: Fixing Agents Behavior


Experimental data confirms that when the majority of the leafs have reached a common convention (artificially, by fixing their behaviour, or naturally, through the convention process), full convergence can be easily achieved. Consequently, in order to identify why learning leafs (i.e., fixing the hubs' behavior) produce such a delay in convention emergence, we carefully analyzed the networks once metastability was reached. By taking snapshots of the state of the convention emergence process (the structure and state of the network: which node is in which state and connected to which other nodes) at the moment where no further improvement in the convergence rate is made, we found what we have defined as Self-Reinforcing Substructures (SRS). These substructures are groups of nodes that, given the appropriate configuration of agents' preferences and network topology, maintain subconventions.

During this research we have been able to abstract the topology of these substructures into two general structures:

• The Claw SRS is formed by a central node whose number of links with the rest of the network is smaller than the number of hangers [5] connected to it. When the hangers coordinate on the same convention among themselves and with the central node, we have a self-reinforcing structure. For example, in Fig. 4.9(a), A is the central node, having one connection with the rest of the network and three hangers: B (which is itself another claw), C (a plain hanger) and D (a chain's connecting node). A simplified detection sketch is given after this list.

• The Caterpillar SRS is a structure formed by a central path, from whose members other SRSs (such as claws, chains, or plain hangers) can hang. For example, in Fig. 4.9(b), A, B, C, and D are members of the central path, and the other nodes reinforce them.
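As a rough illustration only (this is not the detection algorithm used in the thesis, and it covers just the simplest case, where all hangers are plain degree-one neighbours), a claw candidate can be flagged as follows:

    def simple_claws(graph):
        # Flag nodes whose plain degree-1 hangers outnumber their links
        # to the rest of the network (simplest form of the Claw SRS).
        claws = []
        for node in graph.nodes():
            neighbours = list(graph.neighbors(node))
            hangers = [n for n in neighbours if graph.degree(n) == 1]
            rest_links = len(neighbours) - len(hangers)
            if hangers and len(hangers) > rest_links:
                claws.append((node, hangers))
        return claws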

These two abstract structures (examples in Fig. 4.9) can be found as subnetworks of scale-free and random networks. As we have observed, the existence of these SRS (74% of the networks generated with the methods described in [Delgado et al., 2003] contain SRS) is the main reason why convergence to a 90% level (as observed by [Delgado et al., 2003]) is achieved relatively quickly, while overcoming the last 10% (which contains the SRS) is much harder.

In order to check the validity of our hypotheses, we have extended the previous experiment, adding another variant where we fix the behavior of the nodes in the SRS with the highest betweenness [6].

Experimental results in Fig. 4.10 compare the convergence rates of the best-performing variant among those presented previously and the variant that fixes the agents' behavior in the SRS. We can observe that fixing the SRS nodes outperforms any of the other strategies.

This result is very interesting, as it helps us understand the specific substructures that generate subconventions within social networks. It is applied in the next section to resolve the effect of subconventions in a more robust way.

[5] A hanger is formed by nodes that are connected to a member of a cyclic component but which do not themselves lie on a cycle [Scott, 2000]; a chain is a walk in which all vertices and edges are distinct.

[6] Betweenness measures the extent to which a particular node lies "between" the various other nodes in the graph [Freeman, 1979].


[Figure 4.9: Self-Reinforcing Structures. (a) Claw; (b) Caterpillar.]

4.5 Combining Instruments: Solving the Frontier Effect

After experimenting with both social instruments in Sec. 4.2, we observed that subconventions need to be resolved in what we consider to be the "frontier" region.

Theoretically, a subconvention in a regular network is not metastable (i.e., the subconvention is continuously perturbed and tends to disappear), but it unfortunately slows down the emergence process. On the other hand, in other network types, such as random or scale-free networks, subconventions do seem to reach metastable states [7] because of the Self-Reinforcing Structures identified in Sec. 4.4.1.

Therefore, by giving agents the tools to dissolve these frontiers, we hypothesize that convention emergence will be achieved faster and full convergence rates will be obtained.

However, because of the way the social instruments have been designed, they can be activated in situations where it is not necessary (observation follows a probabilistic approach, and rewiring is activated using a rewiring tolerance). Therefore, by combining both social instruments developed in this work, we have designed a composed instrument for resolving subconventions in the frontier in an effective and robust manner. This composed instrument allows agents to "observe" when they are in a frontier and then apply rewiring, with the intention of breaking subconventions. To effectively use this combined approach, agents must first recognize when they are located on a frontier.

[7] By experimentation, we have observed that around 99% of the generated scale-free networks do not achieve full convergence before one million timesteps with any of the decision making functions used in this work when no social instrument is available.


[Figure 4.10: Comparison with Fixed SRS Nodes. Panels (a) to (f) show the results for 5%, 25%, 35%, 45%, 65%, and 75% fixed learners.]


We define a frontier as a group of nodes in the subconvention that are neighbours of other nodes with a different convention and that, at the same time, are not in the frontier with any other group. To provide a more precise definition, we first need to define our system:

$$System_t = \{A, Rel_t, S_t\} \qquad (4.1)$$

where $A$ is a set of agents, $Rel_t$ is a neighbouring function at time $t$, and $S_t$ is the state of the agents at time $t$. We can therefore formalize the notion of a subconvention $Sub$:

$$Sub_t \subset A \text{ where } \exists a \in Sub_t,\ \nexists b \in Sub_t \mid Rel_t(a,b) \wedge S_t^a \neq S_t^b$$

And now we can define when an agent is located in a frontier:

$$Frontier_t(a) = \{a, c \in Sub_t \wedge \exists b \not\in Sub_t \wedge Rel_t(a,b) \wedge S_t^a \neq S_t^b \wedge Rel_t(a,c) \wedge (\forall d \in A \mid Rel_t(c,d) \rightarrow S_t^c = S_t^d)\} \qquad (4.2)$$

This formula basically means that an agent a is in the frontier when it is in front of another agent (b) not sharing the same convention and, at the same time, is being reinforced by other agents (c) that are not in the frontier.
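A minimal sketch of this membership test (our own naming, not code from the thesis; state maps each agent to its current convention and subconvention is the candidate set Sub_t):

    def in_frontier(agent, graph, state, subconvention):
        # Eq. 4.2: the agent faces an outsider holding a different
        # convention, while some neighbour inside the subconvention is
        # fully coordinated with all of its own neighbours (reinforcement).
        if agent not in subconvention:
            return False
        confronted = any(b not in subconvention and state[b] != state[agent]
                         for b in graph.neighbors(agent))
        reinforced = any(c in subconvention and
                         all(state[c] == state[d] for d in graph.neighbors(c))
                         for c in graph.neighbors(agent))
        return confronted and reinforced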

As the cases are different for regular and irregular networks, two types of frontiers need to be defined:

• weak frontiers as the ones that are not metastable in regular networks.

• strong frontiers as the ones generated by the SRSs in irregular networks.

The most important characteristic that defines a frontier is the existence of a confrontation. Confrontation occurs when two agents in an interaction do not share the same convention [8].

Before proceeding further, we will define three characteristics of agents with respect to their convention and their topological position in the network. An agent is in equilibrium if it has the same number of neighbours in its own convention as in the other convention. An agent is a weak node if the number of neighbours in its own convention is lower than the number in the other, and an agent is a strong node otherwise (if the number of neighbours in its own convention is greater than the number in the other). In regular networks, two confronted agents are in a frontier region iff: (1) at least one of the confronted agents is in an equilibrium position, and (2) all the neighbours of an in-equilibrium confronted agent are strong nodes. In irregular networks, two confronted agents are in a frontier region iff both agents are strong nodes.
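For illustration, and assuming two competing conventions, the three node classes can be computed directly from the neighbourhood (the function name is ours):

    def classify(agent, graph, state):
        # Compare neighbours sharing the agent's convention with the rest.
        same = sum(state[n] == state[agent] for n in graph.neighbors(agent))
        other = graph.degree(agent) - same
        if same > other:
            return "strong"
        if same < other:
            return "weak"
        return "equilibrium"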

With these formalizations and the identification of what the frontiers are in each of the topologies, agents can be equipped with this knowledge and with the tools necessary to recognize when they are located in a frontier and to repair that situation. To achieve this

[8] Not sharing the same convention, choosing a different action, or choosing a different state are considered equivalent expressions for our purposes.


task we will use the simple instruments presented in Sec. 4.1.1, but in a combined manner: firstly, agents use observation (with the local observance method) to identify whether they are located in a frontier (and therefore part of a self-reinforcing structure), and secondly, they use rewiring to solve the frontier problem, disconnecting from another agent in the SRS and reconnecting to a random agent.
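Putting both steps together, a hedged sketch of one activation of the composed instrument (reusing the in_frontier check sketched above; the reconnection policy is simplified here to a uniformly random non-neighbour):

    import random

    def composed_instrument(agent, graph, state, subconvention):
        # Observe first; only when the agent sits on a frontier, rewire.
        if not in_frontier(agent, graph, state, subconvention):
            return  # observation says we are not on a frontier: do nothing
        # Disconnect from one agent inside the reinforcing substructure...
        inside = [n for n in graph.neighbors(agent) if n in subconvention]
        if not inside:
            return
        graph.remove_edge(agent, random.choice(inside))
        # ...and reconnect to a random agent elsewhere in the network.
        outside = [n for n in graph.nodes()
                   if n != agent and not graph.has_edge(agent, n)]
        if outside:
            graph.add_edge(agent, random.choice(outside))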

4.5.1 Results

We have conducted exhaustive experimentation with the composed instrument on the three topologies, using the different decision making functions described in the previous section. The use of the composed instrument on regular networks does not produce an improvement in convergence time with respect to simple rewiring (one example of topology and decision technique can be observed in Fig. 4.11(a)). However, an important improvement is observed in the number of rewired links (one example of this improvement can be seen in Fig. 4.11(b)). In general, this improvement is observed for lower tolerances. The reason for this effect is that for higher tolerances rewiring works in the same way as the composed social instrument, but without observation. For smaller tolerance values the effect is strong, reducing the number of rewired links to half of the original value.

On the other hand, we observe an important improvement in convergence times when using the composed instrument (with the recognition of SRS) on irregular networks. The results presented in Figure 4.12 are the average results over 25 different scale-free networks, with and without the combined social instrument. By comparing Figure 4.12(a) and Figure 4.12(b) we notice the trade-off between the improvement in convergence time and the amount of rewiring to be done. The reason for this phenomenon is that the composed social instrument decomposes the SRS differently from simple rewiring, which only rewires the node actually in the frontier.

4.6 Conclusions

In this chapter we have taken a step forward in the state-of-the-art research on convention emergence by raising the convention emergence rate to 100%. This improvement might initially seem minor, but our experimental results showed that overcoming the 90% rate is a challenging task in certain configurations of the environment because of the creation of subconventions. This research focused on identifying under what conditions subconventions emerge and are maintained over time. We hypothesized that this was mainly due to two reasons: cultural maintenance and endogamy amongst the members. To test the first, we introduced a memory-based reward rule that simulated a function promoting cultural maintenance, carefully analyzing the situations where subconventions emerged. After a thorough analysis we learned that subconventions are strongly affected by the topological configuration of the interaction network, which promotes strong endogamy. As system designers we are still interested in achieving a 100% convergence rate (as it is the most efficient strategy for the system as a whole when dealing with conventions), and, assuming agents cannot change their reward function, we have introduced the use of Social Instruments as tools that facilitate


[Figure 4.11: Comparison of Simple and Combined Social Instruments on a Regular Network using MBR. (a) Convergence Time; (b) Rewired Links.]


[Figure 4.12: Comparison of Simple and Combined Social Instruments on a Scale-Free Network using BRR. (a) Convergence Time; (b) Rewired Links.]


norm evolution. We have identified the characteristics and opportunities for effectively utilizing these social instruments to facilitate norm emergence through social learning. Social instruments are attractive since they do not require centralized monitoring or enforcement mechanisms, are normally extremely easy to use, have very low computational costs, and are scalable to large systems. Despite the usage of the social instruments, subconventions were still resistant in certain configurations of the environment, leading us to perform a more exhaustive study that identified the Self-Reinforcing Structures present in social networks such as the scale-free.

Finally, we have presented a composed social instrument as a robust solution against the persistence of subconventions generated by the identified SRS, improving the convergence times obtained with simple rewiring.

In a world where almost 950 million users belong to an online social networking platform (where virtual agents could also exist) [Radwanick, 2010] [9], it is important to understand what mechanisms these virtual entities should be equipped with to facilitate the emergence of common conventions (for the sake of the whole group) as quickly as possible. Moreover, for a system manager, the results from this work highlight the harmful potential of Self-Reinforcing Structures within the network for delaying the emergence process, and draw attention to solutions for such critical problems.

[9] This report was accessed Sept 1st, 2010 at http://www.comscore.com/Press_Events/Press_Releases/2010/8/Facebook_Captures_Top_Spot_among_Social_Networking_Sites_in_India


Chapter 5

Distributed Punishment

The previous chapter has illustrated how conventions can be delayed by the emergence and maintenance of subconventions. As analyzed, subconventions are mainly generated by the topological configuration of the interaction network. However, this situation emerges because we dealt with the strictest and purest definition of convention, where all options are potentially equally good as long as the whole population follows the same convention. Consequently, it is in the agents' self-interest to adopt the convention with which they obtain maximum benefit. However, the empirical results taught us that certain agents obtain a benefit from following a convention but, due to exogenous reasons, can be positioned in a frontier.

Being located in a frontier forces an agent to maintain the convention with which it obtains the highest benefit; because of the social learning approach, agents obtain a larger benefit from the convention followed by the majority of their neighbours. The topological configuration can prevent the neighbours from changing their convention, so our frontier agent cannot interact successfully with all of its neighbours. For that reason, some of the interactions of the frontier agent (depending on the number of uncoordinated neighbours) will be unsuccessful. At that moment, all the agents related to the agent(s) in the frontier are interested in reaching a common convention (and resolving the subconvention), in order to reduce the number of unsuccessful interactions and maximize their utility.

Agents' strategies are ruled by a utility-maximizing policy. Therefore, provided with the correct mechanisms, agents would be able to reduce others' utility depending on their behaviour. This type of action is commonly known as a punishment or a sanction [1]. Normally, punishment implies a cost for both the punisher and the punished, reducing both utilities. In that way, we can see how a group of agents could coordinate to punish a certain agent whose behaviour should be changed for the benefit of the society; in our case, the frontier agent. Punishment and sanction make special sense in public-good scenarios, where it is in the participants' self-interest to free-ride; however, through punishments/sanctions, free-riding can become the least convenient action.

The usage of punishment and sanctions is essential for self-policing societies [Papaioannou and Stamoulis, 2005, de Pinninck Bas et al., 2010].

[1] We will see in Chapter 6 the difference between punishment and sanction at a cognitive level.


Axelrod [Axelrod, 1986] identified a number of mechanisms to impose social norms in a society: metanorms, dominance, internalization, deterrence, social proof, membership, law and reputation. We are especially interested in dominance and deterrence, in the form of sanctions. Posner and Rasmusen [Posner and Rasmusen, 1999] claim that the following types of sanctions exist:

1. Automatic sanctions: those that an agent receives for not being coordinated with the others, i.e., for not following the convention.

2. Guilt: the violator feels bad about his violation as a result of his education and upbringing, quite apart from external consequences. Probably most people in our society, though certainly not all, would feel at least somewhat guilty about stealing even if they believed they were certain not to be caught. This can be understood as a self-punishment imposed after breaking an internally respected norm.

3. Shame: the violator feels that his action has lowered him either in his own eyes or in the eyes of other people. In its most common form, shame arises when other people find out about the violation and think badly of the violator. The violator may also feel ashamed, however, even if others do not discover the violation. He can imagine what they would think if they did discover it, a moral sentiment which can operate even if he knows they never will. Also, he may feel lowered in his own eyes, a "multiple self" situation in which the individual is both the actor and the observer of his actions. In order to apply this kind of punishment in MAS, we would need to equip our agents' minds with a mental representation of the other agents' normative beliefs, so as to know when others might consider that an agent has broken a norm and how important they consider the violation to be.

4. Informational sanctions: the violator's action conveys information about himself that he would rather others not know. This type of punishment is closely related to trust and reputation systems in MAS. By transmitting a certain piece of information, agents can construct a specific opinion of other agents, to be used in their decision making when interacting with those agents.

5. Bilateral costly sanctions: the violator is punished by the actions of, and at the expense of, just one other person, whose identity is specified by the norm. The expense to that person could be the effort needed to cause the violator disutility, or the utility that the person imposing the punishment loses by punishing. An example of what we are calling bilateral costly sanctions would be an agent that spends part of its bandwidth to reduce another agent's bandwidth after the latter shares a bad-quality resource in a P2P network.

6. Multilateral costly sanctions: the violator is punished by the actions, and at the expense, of many other people. A good example would be a coalition of agents reducing another agent's bandwidth for sharing bad-quality resources in a P2P network.


Due to the type of assumptions we make in this thesis about the scenario for which we are designing our utility-based agents, both costly sanctions (bilateral and multilateral) are interesting mechanisms, as they can directly reduce other agents' utility. The rest of the sanctions (guilt, shame and informational sanctions) are closely linked to the internal beliefs and construction of the agent, and can also be delayed in time, making the sanction hard to interpret (like ostracism resulting from the distribution of a bad reputation).

Even though it is applicable to conventions, costly punishment makes special sense when dealing with essential norms, which are generally called social norms. Essential norms help agents fix the focal action for coordinating in the most beneficial situation for the society (full cooperation). However, agents are tempted by the Nash equilibrium, which pulls them towards defection (as an agent obtains a higher benefit from defecting when the rest are cooperating). As Coleman puts it: "If holding a norm is assumption of the right to partially control a focal action and recognition of other norm holders' similar right, then a sanction is the exercise of that right. A sanction may be negative, directed at inhibiting a focal action which is proscribed by a norm, or positive, directed at inducing a focal action which is prescribed by a norm." [Coleman, 1998].

Theoretical, empirical and ethnographic studies of costly punishment in human societies have demonstrated that this behavior promotes and sustains cooperation in large groups of unrelated individuals and, more generally, plays a crucial role in the maintenance of social order [Fehr and Gachter, 2002, Ostrom et al., 1992, Boyd and Richerson, 1992]. However, when talking about peer punishment, experimental results [Dreber et al., 2008] have proven that costly punishment increases the amount of cooperation but not the average payoff of the group, drastically reducing, in particular, the payoff of those subjects applying the costly punishment. We agree that punishment is therefore a second-order public good, as it is executed at someone's cost. These costs also affect the agents' decision making, and it might not be convenient for the potential punisher to punish.

We hypothesize that by dividing the costs of punishment amongst the members of the society, this second-order public good becomes less costly and, therefore, more attractive for the members of the society: at a small cost, the potential benefits increase as the number of free-riders decreases.
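In our notation (an illustration, not a formula taken from the experiment): if a punishment with total cost $C$ is exerted jointly by $k$ punishers, each punisher bears only

$$c_i = \frac{C}{k},$$

so the individual cost of enforcement decreases with the number of agents willing to punish.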

Within the scope of the MacNorms project [MacNorms, 2008], and in conjunction with the Instituto de Analisis Economico, we have proven the viability of distributed costly punishment through experimentation with human subjects. The main objective of the experiments is to study the validity of distributed punishment amongst human subjects, in order to later introduce these technologies in heterogeneous societies populated by human and virtual entities.

To perform the experiments we used HIHEREI: a platform for running experimental economics experiments with human subjects. As this platform is built on top of an Electronic Institution, we can easily extend it with the capability of performing experiments with virtual agents.


5.1 HIHEREI: Humans playing in an Electronic Institution

Electronic institutions are regulated frameworks where agents can participate in order to achieve tasks. The actions and interactions performed by agents inside an electronic institution are prefixed by the system designer, making it impossible for agents to perform non-authorized actions. The set of tools already provided to specify, develop and run electronic institutions did not offer a way to incorporate humans participating remotely, either alone or together with autonomous agents, in the electronic institution. Given that, we have extended the current technological framework to incorporate this capability. This work is the implementation of the design presented in [Sabater-Mir et al., 2007].

The electronic institutions (eI) paradigm is especially useful for solving this problem, for several reasons. The first is that the experimental subjects' behaviour can be controlled by the eI, allowing them to perform only certain tasks at certain moments (with this functionality we can implement the desired experimental scenario). Moreover, the HIHEREI technology allows us to create a remote connection for human subjects to be (invisibly) represented by a virtual agent in the eI; this remote connection has been designed to be accessed through any web browser, allowing the human subjects to participate in the experiment from anywhere in the world [2]. Finally, HIHEREI and the eI provide us with the correct environment for humans to participate with other entities without knowing their nature (virtual or human), transporting our subjects of study to the situation in which we are most interested: scenarios where the human subject is not certain of the nature of his peers.

Figure 5.1 shows the main elements of the presented architecture:

• Electronic Institution (eI): The concept of electronic institution ([Noriega, 1997, Rodriguez-Aguilar, 2003, Esteva, 2003]) is inspired by human institutions. Open multi-agent systems also contain autonomous entities that interact to achieve individual goals, and the behaviour of these entities cannot be guaranteed. Therefore, similarly to what happens in human societies, agents need mechanisms to guarantee the good functioning of the system in spite of local behaviours. To fully understand the electronic institution machinery we refer the reader to [Sierra et al., 2004].

• Virtual agents (E-Agents): Agents endowed with autonomous behavior that participate in the e-institution through the governors (elements that provide the agent with the interface to interact with the eI and at the same time restrict the possible actions the agent can perform given the current state of the eI).

• Interface agents (I-Agents): Agents that represent human users in the electronic institution. They also act as web servers and, like E-Agents, connect to the eI by interacting with the governors. This allows a totally distributed approach.

[2] In order to test how populations from different countries behave with respect to punishment, we initially planned to run the experiments in the LINEEX Laboratory in Valencia and in the Nuffield Centre for Experimental Social Sciences in Oxford. In the end, because of a shortage of funds and the findings of other researchers [Noussair et al., 2003], the experiments were restricted to LINEEX.


[Figure 5.1: Human - eI interaction. On the server side, the Electronic Institution (with its performative structure and database) hosts E-Agents and an I-Agent, each connected through a governor, together with staff agents; the I-Agent also acts as a servlet behind a web server, storing tracker and institutional data. On the client side, a JavaScript/AJAX web application exchanges XML data with the servlet.]

• Staff agents: Institutional agents in charge of different aspects related to the proper functioning of the eI.

• Database: Everything relevant that happens in the e-institution is stored in the DB for subsequent analysis. The DB also stores the human user's activity regarding the actions she/he performs on the client side.

• Client application: The software that provides a friendly interface for the human user.

5.1.1 Extending EIDE

EIDE [Esteva et al., 2008], the Integrated Development Environment for Electronic Institutions, is a set of tools developed at the IIIA-CSIC aimed at supporting the engineering of multi-agent systems as electronic institutions. However, these tools did not provide a general framework for flexible participation of humans in electronic institutions. The desirable requirements for this extension are the following:

1. Allow remote access of human users to the eI.

2. Provide flexible and standard client applications for this access. Web applications are a good solution since they can run in standard Internet browsers.

The central element in the link between the human user and the e-institution is the servlet, which is activated once the user (client side), using a web browser, establishes the connection with the server side. This piece of software (I-Agent) is seen as a servlet from the point of view of the web server, but at the same time as a normal agent from the point of view of the eI.

The I-Agent, acting as a servlet, receives messages in XML format from the client application running locally on the user's computer as a web application. The XML messages can be of two types:

• Tracker messages. Information that can be used later to analyze the actions performed by the user on the client side.

• Institutional messages. Information that is associated with the e-institution. These are actions that the user wants to perform and that have an influence on the state of the eI.

Simultaneously, the client application receives XML messages from the servlet describing the changes that have been produced in the eI. These changes are shown to the user by the client application.

For our experimental economics experiments, the following instantiation of the platform was developed:

1. Electronic institution: Using the eI development environment [Esteva et al., 2008], an appropriate electronic institution was defined and verified. This includes the specification of interaction protocols, illocution schemas and a concrete ontology.

2. Client application: Using HIHEREI we developed a web-based application to perform the experiment. Human users can then be remotely connected to the eI through a simple web browser. The application guides human subjects through the experiment while they interact with other agents (human or simulated).

3. E-Agents: Virtual agents are defined to achieve certain experimental situations.

5.2 Experimental Design

In order to test the potential of Distributed Punishment, and following the tradition of experimental economics, we have designed the following experiment. Participants have to repeatedly play a three-phase game. All participants are divided into groups of 4 that remain constant throughout the 40 rounds of the experiment.

              Number of cooperators amongst the other group members
                 0 C's     1 C      2 C's     3 C's
        C          5        10        15        20
        D         10        15        20        25

Table 5.1: Payoff Matrix for the Distributed Punishment Experiment.

Every round is structured in 3 stages, with two decisions to be taken by each participant (human or virtual): in the first stage, participants have to decide whether or not to contribute to a public good, following the payoff matrix shown in Table 5.1, where action C means cooperation and action D means defection. From the payoffs described in Table 5.1 it is easy to see that it is in the participants' self-interest to defect (by selecting action D).
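A minimal encoding of Table 5.1 (our own function, not taken from the experiment's software; num_cooperators counts the cooperating other members of the 4-person group) makes the dominance of D explicit:

    def stage_one_payoff(action, num_cooperators):
        # Payoffs of Table 5.1; 'action' is "C" or "D",
        # num_cooperators ranges over 0..3.
        base = 5 if action == "C" else 10
        return base + 5 * num_cooperators

    assert stage_one_payoff("C", 3) == 20  # everybody cooperates
    assert stage_one_payoff("D", 3) == 25  # defecting always pays 5 more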

During the second stage, all participants are informed of the decisions of the other participants in their group and of the benefits obtained so far by each participant; moreover, subjects are given the option to assign punishments to the other participants within their group, reducing the latter's first-stage payoffs to zero.

Depending on the treatment, punishing has different costs for the punishers, but it affects the punished in the same way: removing all the benefits obtained in the first stage of the game. After all second-stage decisions have been taken, in the third stage all participants are informed of the decisions of the other participants in their group and of the resulting payoffs for that round.

In order to test the effect of distributed punishment on experimental subjects, we have designed an experiment with 3 different treatments:

1. No Punishment Treatment: no punishment is allowed in this treatment. This is the control treatment.

2. Monolateral Punishment: in this treatment participants are allowed to send punishments to the other participants, with each punisher assuming the whole cost of the punishment.

3. Distributed Punishment: in this treatment participants are allowed to send punishments to the other participants, dividing the punishment costs amongst all the punishers.

In the last two treatments, it is straightforward to observe that the optimal strategy for an individual is to defect and not punish. However, this behaviour can be interpreted as free-riding by the rest of the peers, who may impose costly punishment on the defector. Therefore, where punishment exists, subjects can impose it in order to achieve and maintain high cooperation rates. We hypothesize that with a cheaper option to punish, punishment will be exerted more frequently, increasing the cooperation rates obtained by the subjects.

5.3 Empirical Results

The experiments with human subjects were performed at the Laboratory for Research in Experimental Economics (LINEEX), at the Center for Research in Social and Economic Behavior of the University of Valencia. 40 subjects participated in each treatment, yielding 10 groups and thus 10 independent observations per treatment. All participants were paid according to their performance in the game.

Experimental results for the 3 treatments are shown in Figure 5.2. Figure 5.2(a) shows the dynamics of the average cooperation rates of the participants in the three treatments, i.e., the proportion (from 0 to 1) of agents in the population that decided to cooperate. Even though high cooperation rates were not obtained in any of the


[Figure 5.2: Distributed Punishment Experimental Results with Human Subjects. (a) Average cooperation rates over the 40 rounds for the No Punishment, Monolateral Punishment and Distributed Punishment treatments; (b) average punishment rates over the 40 rounds for the Monolateral and Distributed Punishment treatments.]


treatments, the experimental results on cooperation rates confirmed that Distributed Punishment outperformed Monolateral Punishment by 20.95%, and No Punishment by 31.84%.

We believe that high cooperation rates were not obtained because punishment was still too costly for the subjects and, moreover, the cost of punishment is not linear (in either monolateral or distributed punishment, the relationship between the punishment costs and their effects is not linear, as the punished loses whatever he earned in the first phase of the game). Other works in the literature have shown that a 1:4 punishment factor is most effective at promoting cooperation [Nikiforakis and Normann, 2008]. We can therefore observe how, on the one hand, the punished is strongly affected by the punishment received (as he loses all the benefits obtained in that round) but, on the other hand, punishment is too costly to be applied unless distributed punishment is achieved. Because of the design of the game, distributed punishment is not easily achievable without the possibility of communication amongst the potential punishers, so punishment rates are in general very low, as can be seen in Figure 5.2(b).

    Treatment                  Cooperative Groups
    No Punishment                       0
    Monolateral Punishment              1
    Distributed Punishment              4

Table 5.2: Cooperative Groups in the Different Treatments.

By observing the data more carefully (Table 5.2), we detected that in the Distributed Punishment treatment cooperation was established and maintained in 4 groups, which happen to be exactly the groups where distributed punishment was achieved (in the rest of the groups, subjects did not coordinate to punish in a distributed manner and did not achieve high cooperation rates); we can therefore infer that cooperation was achieved because of the distributed punishment. This result is telling, as in the Monolateral treatment stable cooperation was obtained in only one group, and it was never obtained in the No Punishment treatment.

We would like to remind the reader that the experiment was designed to be executed with no possibility of communication or planning among the punishers. Agents have to coordinate through their actions, and only when this happens are punishment costs reduced for the punishers, and cooperation therefore achieved.

The results obtained regarding Distributed Punishment were encouraging; however, we noticed that two dynamics were acting at the same time: the punishers' motivations and the punished's motivations. These two dynamics reinforce one another, preventing us from fully understanding the behavior of the phenomenon.

As we are interested in observing how effective distributed punishment is, we separate the two dynamics and carefully analyze the motivations of the punished agent. In order to fully understand the dynamics of the punished subjects, we need to explore the different possible situations: (1) when the subject receives no punishment, (2) when the subject receives punishment from only one subject, (3) when the subject receives punishment from two subjects, and (4) when the subject receives punishment from three subjects.


To extract a scientific conclusion, each situation has to be reached a significant number of times, yielding the necessary observations; however, it is unfeasible (in terms of money and time) to repeat the previous experiment as many times as necessary to obtain them. Moreover, we cannot ensure that the experimental subjects will reach the situations we are interested in by themselves. Therefore, we need to introduce confederate subjects into our experiment, as done in other experiments [Asch, 1955], to produce the experimental situations we need. As the groups in our Public Goods game are of size 4, three confederates are needed for each experimental subject in order to have a completely controlled situation and observe the subject's reactions.

For the sake of resource optimization (in terms of economic and time expenses) we decided to make these confederates virtual agents, preprogrammed by us to realize the different treatments. In that way we fully exploit the capacity of HIHEREI, where the experimental subjects interact on a web-based platform without knowing the identity of their opponents. Moreover, these types of experiments place the subjects in exactly the situation in which we are most interested: human subjects playing against other participants without knowing whether they are human or virtual entities, as happens in many interactions performed on the Internet.

5.4 Disentangling Distributed Punishment: the Punished's Motivation

A total of 80 subjects from the Universitat Autonoma de Barcelona participated voluntarily in a repeated Public Goods game at the Experimental Economics Lab. The participants, students from a wide range of fields of study, interacted anonymously using the HIHEREI software. Subjects were not allowed to participate in more than one session of the experiment. In all, four sessions were conducted in December 2010, with an average of 20 participants per session.

Each treatment comprised 20 human subjects. Reproducing the conditions of the previous experiment, each human subject was assigned to a group with three preprogrammed confederate agents, which remained the same for the entire session. Subjects knew that the experiment would last 40 rounds, but knew neither of the existence of virtual entities nor the identity of the other participants in their group. Each round follows the same structure as in the initial experiment: a first stage where subjects have to decide whether to cooperate or not, a second stage where they decide whom to punish, and a third stage where subjects are informed of the decisions of their peers.

In order to produce a complete experiment, the search space has to be exhaustively explored. To do so, we programmed two types of acting strategies for the virtual agents interacting with each subject: the punisher and the non-punisher. Their behavior is programmed to be the same during the first ten rounds and to differ afterwards. During the first ten rounds, all virtual actors contribute 50% of the time and punish 25% of the other agents. After the tenth round, all agents contribute 90% of the time. The difference comes in the punishment strategy: non-punishers never punish, whereas punishers act on defectors 90% of the time, and only when the punisher itself has cooperated (to maintain consistency); a sketch of this scripted strategy is given after the treatment list. In order to have complete information about the different possibilities, we explore the complete search space with 4 different treatments with respect to the studied human subject:

1. No Punisher Agent Treatment: each human subject interacting with 3 non-punisher agents.

2. 1 Punisher Agent Treatment: each human subject interacting with 1 punisher agent and 2 non-punisher agents.

3. 2 Punisher Agents Treatment: each human subject interacting with 2 punisher agents and 1 non-punisher agent.

4. 3 Punisher Agents Treatment: each human subject interacting with 3 punisher agents.
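Our reading of this scripted behaviour, as a sketch (function and parameter names are ours; per-target punishment decisions are modelled as independent random draws):

    import random

    def confederate_move(kind, round_number, target_defected):
        # Returns (cooperate?, punish_target?) for a confederate agent;
        # kind is "punisher" or "non-punisher".
        if round_number <= 10:              # common warm-up behaviour
            return random.random() < 0.50, random.random() < 0.25
        cooperate = random.random() < 0.90  # after the tenth round
        punish = (kind == "punisher" and cooperate
                  and target_defected and random.random() < 0.90)
        return cooperate, punish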

Empirical data (shown in Figure 5.3(a)) confirms that distributed punishment has a much stronger impact on the cooperation rates of human subjects, especially when the punishment is consistent across the whole group (3 Punisher Agents Treatment). A small difference between the treatments has to be taken into consideration when analyzing the results: as the number of punisher actors increases, so does the probability of a subject being punished (as each punisher actor punishes 90% of the time); the probability of a defector being punished in each of the treatments is therefore 90%, 99% and 99.9% respectively. These probabilities directly affect the ratio of non-punished defectors along the experiment, shown in Figure 5.3(b), which remains below 5% in the 1-punisher treatment, around 2% in the 2-punishers treatment, and at 0.005% in the 3-punishers treatment. Despite the different punishment probabilities in each treatment, we observe that they do not have a significant enough impact on the non-punished defector rates to explain the differences in the cooperation rates.
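These probabilities follow from treating the punishers' decisions as independent 90% draws: with $k$ punishers,

$$P(\text{punished} \mid k) = 1 - (1 - 0.9)^k = 1 - 0.1^k,$$

which gives 0.9, 0.99 and 0.999 for $k = 1, 2, 3$.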

We hypothesize that the explanation for the difference in the cooperation rates is to be found in the non-evident causes that made the experimental subjects change their cooperation strategy. Punishment affects the experimental subjects equally at a monetary level, independently of the number of punishers (in those treatments with punishers); the subjects' decisions must therefore have been taken by considering information other than the potential monetary punishment. In the treatment with three punishers, cooperation rates are higher than in any other treatment.

Based on this empirical result, and on those obtained by others [Noussair and Tucker, 2005a], we hypothesize that some behaviours (like consistent punishment from the entire group) might contain implicit normative messages that affect decision making in a more profound way than a mere cost-benefit calculation. Human subjects not only take into consideration the potential punishment they can receive; rather, they interpret certain social cues as signalling the existence of a social norm which, even in the absence of punishment systems, can lead the subject to abide by the norm.

To the best of our knowledge, there is no existing agent architecture able to identify these types of cues and use them in its decision making. Consequently, we have envisioned a cognitive architecture that incorporates more complex reasoning than


[Figure 5.3: Empirical Results of the Punished. (a) Average cooperation rates over the rounds for the 0-, 1-, 2- and 3-Punisher treatments; (b) non-punished defector rates over the rounds for the same treatments.]


a simple cost-benefit calculation. This architecture, named EMIL-I-A, will be presented in the following chapter.


Chapter 6

EMIL-I-A: The Cogno-Normative Agent Architecture

In centralized systems, norms are explicitly specified [Garcia-Camino A, 2006, Aldewereld et al., 2007], may be accessible in a repository, and are imposed by a legitimated authority. In contrast, in self-organized societies norms are created and controlled (through peer punishment) by the members of the society. As discussed in Chapter 1, in this work we focus on the establishment of social norms and not on their emergence.

Peer punishment [Casari, 2004] is considered a second-order public good: through its application norm compliance is achieved, although it implies a cost for the punisher [Ostrom et al., 1992, Fehr and Gachter, 2000]. This view of punishment is in line with the one assumed by the economic model of crime (see [Becker, 1968]) and with the approach adopted by experimental economics (see [Sigmund, 2007] for a review). If punishment were only a deterrent mechanism, the three treatments of our previous experiment should have produced similar results in terms of cooperation. However, the empirical data showed that one of the treatments outperformed the rest, even though the punishment affected the instrumental reasoning of the punished in the same way.

As noted by others [Giardini et al., 2010, Xiao and Houser, 2005, Hirschman, 1984], peer punishment, and punishment in general, can be seen as a twofold mechanism: (1) an economic instrument to achieve deterrence of the wrongdoer, and (2) a signalling mechanism used to let the audience know that the punished behavior is understood as an infraction and therefore should not be performed.

In the experiments performed in the previous chapter, we have seen that the deterrent effect on the wrongdoer is the same in all the treatments; therefore, we hypothesize that the difference in compliance rates can only be explained through a more complex cognitive mechanism.

Nonetheless, several experiments show that punishment is effective in promoting cooperation even when it does not impose an incentive and is purely symbolic [Sunstein, 1996, Tyran and Feld, 2006, Houser and Xiao, 2010, Galbiati and D'Antoni, 2007, Masclet et al., 2003]. Social disapproval, peer pressure, public embarrassment of offenders, or the communication that a norm violation has occurred are often applied to alter people's conduct, and in many cases appear to be effective. In these cases, punishing (explicitly or implicitly) informs the offenders and the public that the targeted behavior is not condoned, and thus elicits the social norms.

This evidence seems at odds with the idea that people's decisions are influenced only by incentives, and suggests that there are other factors driving their choices. As shown by several works in psychology, focusing people's attention on the norm is a crucial factor in producing norm-compliant behavior [Cialdini et al., 1990, Bicchieri, 2006]. One possible explanation of this phenomenon is that the normative content expressed by punishment has the effect of activating people's normative motivation to comply with the norm of cooperation. When the normative content is signalled by punishment, it frames the situation in such a way that not only preferences to avoid costs are activated, but normative preferences as well. By normative motivation, we refer to the fact that people follow the norm because they recognize that there is a norm and are disposed to comply with it, even when there is little possibility of instrumental gain or future reciprocation and when the surveillance rate is very small. We are not claiming here that people are always and only driven by normative motivations when complying with norms, but that their decision is motivated by a combination of normative and cost-avoidance motivations. The hypothesis that people can follow norms as ultimate ends is controversial (see [Barkow et al., 1995] for an evolutionary explanation), but there are several interesting models, such as [Bicchieri, 2006, Gintis, 2003], showing how normative preferences can be included in the utility function of individuals and how this preference interacts with their other preferences.

Recently, researchers have conducted several experiments designed to explore the norm-signalling effect of sanction in the achievement of cooperation, analysing what factors might impact the expressive power of this mechanism [Xiao and Houser, 2005, Masclet, 2003, Noussair and Tucker, 2005a], including ourselves within the MacNorms project. However, to the best of our knowledge, this is the first attempt to fill a gap in the state-of-the-art normative-agent literature by designing an agent architecture able to interpret certain social cues (like punishment) in a way similar to humans. With this contribution we produce an agent that is able to interact consistently in hybrid virtual environments (where both humans and virtual entities interact).

Basing our work on the findings of cognitive scientists [Giardini et al., 2010] and on the results obtained within the MacNorms project, we have developed a BDI cognitive architecture that allows agents to process (and include in their reasoning and decision making) the intrinsic signalling that may accompany punishment, disentangling punishment and sanction.


6.1 Punishment and Sanctions

As specified by [Giardini et al., 2010], punishment and sanction are both mechanisms aimed at changing the behaviors of others, in order to make them abstain from future violations. Because of this similarity, these two phenomena are often mistaken for one another and considered a single behavior. Punishment and sanction are, however, different behaviours that can be distinguished on the basis of their mental antecedents and of the way in which they aim to influence the future conduct of others.

On the one hand, punishment refers to a practice consisting in imposing a cost on the wrongdoer, with the aim of deterring him from future offenses. Deterrence is achieved by modifying the relative costs and benefits of the situation, so that wrongdoing becomes a less attractive option. The effect of punishment is achieved by influencing the instrumental mind of the individual, shaping his material payoffs. This approach to punishment is in line with the economic model of crime, also known as the rational choice theory of crime [Becker, 1968], which claims that the deterrent effect of punishment is caused by increasing individuals' expectations about the price of non-compliance. A rational comparison of the expected benefits and costs guides criminal behavior, and this produces a disincentive to engage in criminal activities.

On the other hand, sanction works by imposing a cost, as punishment does, and in addition by communicating to the target (and possibly to the audience) both the existence and the violation of a norm [Giardini et al., 2010, Hirschman, 1984, Xiao and Houser, 2005, Andrighetto et al., 2010b] [1], and by asking them to comply with it in the future.

The sanctioner ideally wants the sanctioned to change his conduct not just to avoid the penalty, but because he recognizes that there is a norm and wants to respect it. Sanction mixes material and symbolic aspects and is aimed at changing the future behaviour of an individual by influencing both his instrumental and his normative mind. In order to decide how to behave, the individual will take into consideration not only a mere cost-benefit measure but also the norm.

Often the sanctioner uses scolding to rein in free-riders, or expresses indignation or blame, or simply mentions that the targeted behaviour violated a norm. Through these actions, he aims to focus people's attention on different normative aspects, such as: (a) the existence and violation of a norm; (b) the high rate of norm surveillance in the social group; (c) the causal link between violation and sanction: "you are being sanctioned because you violated that norm"; (d) the fact that the sanctioner is a norm defender. As suggested by works in psychology, all these normative messages have a key effect in producing norm compliance and favouring social control as well. Even a strong personal commitment to a norm does not predict behaviour if that norm is not activated or made the focus of attention [Bicchieri, 2006, Cialdini et al., 1990]. Furthermore, the more these norms are made salient, the more they will elicit normative conduct.

Norm salience indicates to an individual how operative and relevant a norm is within a group and a given context [Andrighetto et al., 2010b]. It is a complex function, depending on several social and individual factors. On the one hand, the actions of others

[1] Clearly, punishment can also have a norm-signalling effect as an unintended by-product, but only the sanctioner intentionally has this norm-defense goal.


provide information about how important a norm is within that social group. On the other hand, norm salience is also affected by the individual sphere: it depends on the degree of entrenchment with the beliefs, goals, values and previously internalized norms of the agent.

We claim that both punishment and sanction favor the increase of cooperation in social systems, but sanction achieves cooperation in a more stable way and at a lower cost for the system. Cooperation is expected to be more robust if agents' decisions are driven not only by instrumental considerations but also by normative ones. Moreover, an individual that complies with the norm for internal reasons is also more willing to exercise a special form of social control, reproaching transgressors and reminding would-be violators that they are doing something wrong.

6.2 The cognitive dynamics of norms

Building on Ullman-Margalit's definition of a norm [Ullman-Margalit, 1977] as a prescribed guide for conduct which is generally complied with by the members of a society, a norm has been defined [Andrighetto et al., 2007, Campenni et al., 2009] as a behavior that spreads through a given society to the extent that the corresponding prescription spreads as well, giving rise to a shared set of normative beliefs and goals. A normative belief is a mental representation, held to be true in the world, that a given action is either obligatory, forbidden or permitted for a given set of individuals in a given context. A normative goal, on the other hand, is an internal goal [2] relativized [3] to a normative belief: it is the will to perform an action because, and to the extent that, it is believed to be prescribed by a norm.

There are at least three main types of normative beliefs:

• the main normative belief, stating that: there is a norm prohibiting, prescribing, permitting that... [von Wright, 1963, Kelsen, 1979, Conte and Castelfranchi, 2006, Conte and Castelfranchi, 1999].

• the normative belief of pertinence, indicating the set of agents on which the norm is impinging.

• the norm enforcement belief, indicating that a positive sanction is consequent to norm obedience and a negative sanction is consequent to norm violation.

In order to be compliant with the norm, the first two normative beliefs are necessary conditions: agents should recognize that there is a norm and that it applies to them. When individuals do not have them in mind, norms exert no effect on behavior [Andrighetto et al., 2007, Campennì et al., 2009]. Furthermore, the more these normative mental representations are salient, the more they will elicit a normative behavior [Bicchieri, 2006, Xiao and Houser, 2005, Cialdini and Goldstein, 2004]. Norm compliance and norm salience are strongly intertwined: findings from psychology [Cialdini et al., 1990, Bandura, 1991] and behavioral economics [Xiao and Houser, 2009, Bicchieri and Chavez, 2010] have pointed out that drawing people's attention to a social norm and making it salient elicits the appropriate behavior. Making a norm salient typically means providing people with information about the behavior and beliefs of the other individuals [Bicchieri and Xiao, 2007, p. 4].

2 From a cognitive point of view, goals are internal representations that trigger and guide action at once: they represent the state of the world that agents want to reach by means of action and that they monitor while executing the action [Conte, 2009].

3 A goal is relativized when it is held because and to the extent that a given world-state or event is held to be true or is expected [Cohen and Levesque, 1990]. An example is the following: tomorrow, I want to go sunbathing at the beach (relativized goal) because and to the extent that I believe tomorrow will be sunny (expected event). The instant I cease to believe that tomorrow will be sunny, I will drop any will to go to the beach.

Unlike the main normative belief and the belief of pertinence, the norm enforcement belief is not a defining element of the norm; it simply enforces it. A normative command is a special command that is intended to be adopted by its addressees because it is normative, and norms must be obeyed [von Wright, 1963]. Of course, this motive can be absent or weak in the minds of people, depending on the socialization and education process and on the credit enjoyed by current institutions. Sub-ideally, norms are often complied with because they are enforced by a system of sanctions. But ideally, they are meant to be observed because they are norms, and should be complied with for their own sake.

However, a belief is not yet a decided action. Normative beliefs are necessary but insufficient conditions for norms to be complied with. What leads agents endowed with one or more normative beliefs to act on them, especially since, by definition, norms prescribe costly behaviours? How can norms generate goals?

Usually normative beliefs generate normative goals by reference to an external enforcement4 (sanctions, approval, etc.). The agent calculates the costs and benefits of complying with or violating the norm and then decides how to behave. If no such goal is generated, the norm will be violated.

6.3 The EMIL-I-A Architecture

In order to account for the different forms of punishment (and, as we will see, to support other cognitive processes), a rich cognitive platform is required, namely a BDI-type architecture, and EMIL-A [Andrighetto et al., 2007, Campennì et al., 2009] seemed a good candidate. Our extension is called EMIL-I-A, which stands for EMIL Internalizer Agent. As the name suggests, this architecture was first conceived for norm internalization. Internalization is achieved with another module of this architecture that will be carefully analyzed in Chapter 8. In this chapter we center on how the difference between sanction and punishment is taken into account by the described architecture.

EMIL-I-A (like EMIL-A) consists of mechanisms and mental representations allowing norms to affect the behavior of autonomous intelligent agents. As any BDI (Belief-Desire-Intention) architecture, EMIL-I-A operates through modules for different sub-tasks (recognition, adoption, decision making, salience control, etc.) and acts on mental representations of goals and beliefs in a non-rigid sequence.

4 See [Conte and Castelfranchi, 1995, Conte and Castelfranchi, 2006] for a fine-grained analysis of the different reasons behind norm compliance.


[Figure 6.1 is an architectural diagram: events from the World feed a Norm Recognition Module and a Salience Control Module. When no norm is recognized, a Default Decision Making Module pursues the goal of maximizing utility and chooses the utility-maximizing strategy. When a norm is recognized, a Normative Decision Making Module, holding the normative beliefs (NBelief 1: Norm Existence; NBelief 2: Pertinence; NBelief 3: Punishment), weighs the goal of maximizing utility against the normative goal of complying with norms through a benefit-cost evaluation of the available actions; actions, social cues and rewards flow back between the world and the modules.]

Figure 6.1: EMIL-I-A Architectural Design


Our normative agent architecture (represented in Fig. 6.1) has three important parts: the norm recognition module, the salience control module, and decision-making.

For further references on how the norm recognition module works, we refer the reader to [Campennì et al., 2009]. The norm recognition module (accessed when norms have not yet been recognized, as shown in Fig. 6.1) allows agents to interpret a social input as a norm. In order for agents to recognize the existence of a norm, they have to hear from consistent agents5 a certain number of normative messages, such as "you should not take advantage of your group members by shirking", and observe a few normative actions compliant with the norm or aimed at defending it (i.e. cooperation, punishment and sanction, observed or received).

After recognition, a norm activates the three types of normative beliefs described in Section 6.2, which are stored in the normative board6 (inside the Normative Decision Making Module represented in Fig. 6.1). Once generated or activated, normative beliefs are fed into the norm-adoption module: a normative goal, relativized to the expected enforcement, will be generated. In this condition the normative agent adopts the norm because it wants to avoid punishment. The normative goal is then fed into the normative decision-maker and compared with other goals (on the basis of their salience, updated with the Salience Control Module) possibly active in the system. The normative decision-maker will choose which one to execute and will convert it into a normative intention (i.e. an executable goal). Once executed, this normative intention will give rise to norm compliance and/or norm defense and/or norm transmission through communication. Otherwise, it will eventually be abandoned, a solution that leads to the utility-maximizing strategy (normally, norm violation).
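To make this flow concrete, the following minimal sketch (in Python; the class layout, the adoption rule and the 0.5 discount for unenforced norms are illustrative assumptions, not the thesis' actual implementation) shows how a normative board ordered by salience could turn normative beliefs into a normative intention, falling back to the utility-maximizing strategy when no normative goal prevails:

from dataclasses import dataclass, field

@dataclass
class Norm:
    name: str
    salience: float            # degree of activation of the norm, in [0, 1]
    pertinent: bool = True     # NBelief 2: the norm applies to this agent
    enforced: bool = True      # NBelief 3: violations are expected to be enforced

@dataclass
class NormativeBoard:
    norms: list = field(default_factory=list)  # normative beliefs, ordered by salience

    def decide(self, utility_drive):
        """Return a normative intention, or None for the utility-maximizing strategy."""
        for norm in sorted(self.norms, key=lambda n: n.salience, reverse=True):
            if not norm.pertinent:
                continue
            # Norm adoption: a normative goal relativized to the expected enforcement.
            goal_strength = norm.salience if norm.enforced else 0.5 * norm.salience
            if goal_strength > utility_drive:
                return "comply with " + norm.name   # executable normative intention
        return None                                  # abandoned: utility maximization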

6.3.1 Norm Salience

We adopt a definition of norms in which social norms are not static objects: the degree to which they are operative and active varies from group to group and from one agent to another. We refer to this degree of activation as the norm's salience. The more salient a norm is, the more it will elicit a normative behaviour [Bicchieri, 2006, Xiao and Houser, 2005, Cialdini et al., 1990]. Norm salience is a complex function, depending on several social and individual factors, that allows agents to dynamically monitor whether the normative scene is changing and to adapt to it7. For example, in an unstable social environment, if norm enforcement suddenly decreases, agents holding highly salient norms are less inclined to violate them, as a highly salient norm is a reason for an agent to continue complying even in the absence of punishment. It guarantees a sort of inertia, making agents less prompt to shift from the present strategy to a more favorable one. Vice versa, if a specific norm decays, agents are able to detect this change, ceasing to comply with it and adapting to the new state of affairs. Finally, if an agent faces an emergency or a normative conflict, norm salience allows it to decide which action to perform, providing it with a criterion to compare the norms applicable to the context. For example, on the basis of the relative salience of the two norms "stop at the red traffic light" and "do not block the way when hearing an ambulance siren", the agent will decide whether to wait or move on.

5 An agent is consistent if, when choosing to punish, it has previously cooperated in the PD.

6 The normative board is a portion of the long-term memory where normative beliefs are stored, ordered by salience.

7 It is interesting to notice that this mechanism allows agents to record social and normative information without necessarily proactively exploring the world (e.g. with a trial-and-error procedure).

This is possible because our normative agents are autonomous as well as socially responsive. They are autonomous in that they act on their own beliefs and goals (on the basis of their salience). However, they are also responsive to their environment and to the inputs they receive from it, especially social inputs.

On the one hand, the actions of others provide information about how important a norm is within the group. In particular, a norm's salience is affected by:

1. The amount of observed compliance and the costs people are willing to sustain in order to comply ([Cialdini et al., 1990]).

2. The surveillance rate, the frequency and intensity of punishment ([Haley, 2003]).

3. The enforcement typology (private or public, 2nd or 3rd party, punishment or sanction, etc.) ([Masclet, 2003]).

4. The efforts and costs invested in educating the population about a certain norm; the visibility and explicitness of the norm ([Cialdini et al., 1990]).

5. The credibility and legitimacy of the normative source ([Sacks et al., 2009]).

On the other hand, norm salience is also affected by the individual sphere: it depends on the degree of entrenchment with the agent's beliefs, goals, values and previously internalized norms. It has to be pointed out that norm salience can also gradually decrease: for example, this happens when agents realize that norm violations received no punishment, or when normative beliefs stay inactive for a certain time lag, suggesting that the norm is not very active in the population anymore.

In order for our agents to obtain an accurate and representative salience measure, we have designed a Salience Control Module. This module is fed with the social information accessible to agents, transforming and aggregating those cues into a value that represents the degree of activation of a specific norm. With that information, agents can be endowed with other mechanisms in order to decide intelligently when to abide by (or violate) the existing social norms. However, the agents' decision making module depends on the scenario and environment, and we will analyze two different decision making modules, built on top of norms' salience, in the following chapters.

6.3.2 Salience Control Module

By using the Salience Control Module, agents can assess the relative importance of each norm. The module is fed (interaction after interaction) by the social and normative cues available in the agent's surroundings, which are the following:

• Compliants Observed: Neighboring agents who comply with the norm.

• Violators Observed: Neighboring agents who violated the norm.


Social Cue                                         Weight
Norm Compliance/Violation (C)                      wC = (+/-) 0.99
Observed Norm Compliance (O)                       wO = (+) 0.33
Non Punished Violators (NPD)                       wNPD = (-) 0.66
Punishment Observed/Applied/Received (P)           wP = (+) 0.33
Sanctioning Observed/Applied/Received (S)          wS = (+) 0.99
Explicit Norm Invocation Observed/Received (E)     wE = (+) 0.99

Table 6.1: Normative Social Information Weights for the Salience Aggregation Function.

• Non Punished Violators: The amount of unpunished neighboring agents that violated the norm.

• Punishment Observed/Applied/Received: The amount of punishment actions observed in the neighbourhood, applied to other agents or received from other agents.

• Sanctioning Observed/Applied/Received: The amount of sanctioning actions observed in the neighbourhood, applied to other agents or received from other agents.

• Explicit Norm Invocation Observed/Received: The amount of explicit norm invocations observed in the neighbourhood, applied to other agents or received from other agents.

Each of these cues (see Table 6.1) is aggregated with a different weight. A higher weight is given to those social actions that are interpreted as driven by normative motivations; the values and their ranking have been extracted from [Cialdini et al., 1990].

Norms' salience is updated according to the formula below and the social weights described in Table 6.1:

Sal^N_t = Sal^N_{t-1} + \frac{1}{\alpha \times \phi} \left( w_C + O \cdot w_O + NPD \cdot w_{NPD} + P \cdot w_P + S \cdot w_S + E \cdot w_E \right)    (6.1)

where Sal^N_t represents the salience of the norm N at time t, \alpha the number of neighbors that the agent has, \phi the normalization value, w_X the weights specified in Table 6.1, and finally O, NPD, P, S, E the registered occurrences of each cue. The resulting salience measure (Sal^N_t \in [0, 1], with 0 representing minimum salience and 1 maximum salience) is subjective, thus providing flexibility and adaptability to the system.
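In code, one application of Eq. 6.1 could look as follows (a minimal sketch assuming the Table 6.1 weights; the clamping to [0, 1] and the default φ = 1 are illustrative assumptions):

W_COMPLY = 0.99       # own compliance (+) or violation (-), Table 6.1
W_OBSERVED = 0.33     # observed norm compliance
W_UNPUNISHED = -0.66  # non-punished violators
W_PUNISH = 0.33       # punishment observed/applied/received
W_SANCTION = 0.99     # sanctioning observed/applied/received
W_INVOKE = 0.99       # explicit norm invocation observed/received

def update_salience(salience, complied, O, NPD, P, S, E, alpha, phi=1.0):
    """One application of Eq. 6.1; O, NPD, P, S, E are the registered cue counts."""
    w_c = W_COMPLY if complied else -W_COMPLY
    delta = (w_c + O * W_OBSERVED + NPD * W_UNPUNISHED
             + P * W_PUNISH + S * W_SANCTION + E * W_INVOKE)
    salience += delta / (alpha * phi)
    return min(1.0, max(0.0, salience))   # keep the measure in [0, 1]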


6.4 Conclusions

In this chapter we have discussed the necessity of a cognitive model able to represent the existence of norms in the agents' decision making. We have seen in previous chapters, based on the empirical results obtained with human subjects, that a purely utilitarian agent architecture is not enough.

We have developed a cogno-normative agent architecture based on different normative beliefs and goals that allows agents to comply with norms in a cognitive way once these are recognized. When norms are identified, our modular architecture activates a normative decision making, different from the classical utility-maximizing one seen in Chapters 3 and 4. Once the norms are identified, the normative beliefs related to their recognition also affect the agents' decision making, orchestrated by the norms' salience. This salience indicates to agents how important norms are within a normative context, allowing them to perform an intelligent norm violation when necessary.

In the next chapter we will put this architecture into use, exploiting it to observe the effect of norms and punishment on the agents' self-regulation.


Chapter 7

Proving EMIL-I-A

In this chapter we explore, by means of cognitive modeling and agent-based simulation, the specific ways in which punishment and sanction favor the recognition of social norms, and how agents adapt the norms' degree of activation accordingly, in order to achieve an intelligent compliance.

For this task, we put EMIL-I-A into action. We therefore need to develop a complete agent architecture, integrating the Salience Control Module (explained in Chapter 6) with the agents' decision making. We prove the efficiency of EMIL-I-A by reproducing, with agent-based simulation, the Distributed Punishment Experiment presented in Section 5.4.

Based on our results with human subjects in the Distributed Punishment Experiment, and under the suspicion that different types of punishment have a different effect on the emergence of cooperation, we will present the design and results of new experiments with human subjects that make explicit the difference between punishment and sanction, afterwards proving the consistency of EMIL-I-A with the empirical results.

After checking the efficiency of EMIL-I-A, we fully exploit its capabilities by analyzing in a simulated scenario the different dynamics of punishment and sanction in the achievement of norm compliance, focusing especially on the emergence of cooperation. As both punishment and sanction are costly behaviors, both for the enforcer and the target, a well-designed enforcement system should combine high efficacy in discouraging cheating with limited costs for society. For this reason we also introduce a heuristic to be used locally by agents in order to intelligently adapt the regulation costs.

7.1 EMIL-I-A plays Distributed Punishment

In order to test the EMIL-I-A implementation, we first prove its behaviour in the Distributed Punishment experiment presented previously.

As specified in Section 5.4, in the Distributed Punishment game agents have to take two decisions at two different phases of a repeated game. During the first stage agents have to choose whether to cooperate or not, and then in the second stage they have to choose whom to punish inside their group.


The dynamics of the game are exactly the same, and we will not repeat them (referring the reader to Section 5.2 for the complete specification of the experiment). However, we need to specify how the agents take these two decisions.

7.1.1 EMIL-I-A Decision Making

In this model, agents have to take two decisions at two different stages: to cooperate or defect, and to punish/sanction or not. These decisions are influenced by an aggregation of economic, social and normative considerations. The decision making modules in charge of these decisions are built into the agents following an evolutionary approach using mixed strategies, which are updated after each timestep of the simulation with an aggregation of the drives explained in this section. These mixed strategies allow agents to adapt to the environmental conditions and to other changes in their social environment.

First Stage Decision: Cooperate or Defect?

We have designed three different drives that are aggregated to update the mixed strategy. These three drives represent the influence of different aspects on the cooperation decision.

SID_t = SID_{t-1} + O \times \frac{R_t - R_{t-1}}{R_{Max} - R_{Min}}    (7.1)

1. Self-Interested Drive (SID): it motivates agents to maximize their individual utility independently of what the norm asks. The SID is updated according to Equation 7.1, where SID_t represents the self-interested drive at time t, O the orientation (+1 if the agent cooperated and -1 if it defected), R_t the reward obtained at time t, and R_{Max} and R_{Min} respectively the maximum and minimum rewards that can be obtained. In the case where the marginal reward is zero, it is substituted by an inertial value with the same orientation as the last variation. This way, the proportional and normalized value of the marginal reward (the gain/loss with respect to the last timestep) indicates how the agent should change its cooperation probability.

2. Social Drive1: Agents are influenced by what the majority of their neighbors do. Combining the results obtained in experiments with human subjects, 31% of agents ([Asch, 1955]) will change their tendency (with an increment of 18% [Fowler and Christakis, 2010]) towards that of their neighbors in case those are unanimous.

1 Even though we model this drive at the theoretical level, we have decided not to include it in the actual platform yet in order to have clearer results.


3. Normative Drive (ND): once the cooperation norm is recognized, agents' decisions are also influenced by the normative drive. The normative drive is affected by the norm salience: the more salient the norm is, the higher the motivation to cooperate. Salience is updated as explained in Sec. 6.3.2.
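A minimal sketch of the first-stage update follows; the aggregation weights w_sid and w_nd and the clamping to [0, 1] are illustrative assumptions, and the Social Drive is omitted, as in the actual platform:

def update_sid(sid, cooperated, r_t, r_prev, r_max, r_min):
    """Self-Interested Drive, Eq. 7.1: normalized marginal reward, signed by the action."""
    orientation = 1.0 if cooperated else -1.0
    return sid + orientation * (r_t - r_prev) / (r_max - r_min)

def cooperation_probability(sid, salience, norm_recognized, w_sid=0.5, w_nd=0.5):
    """Aggregate the drives into the mixed strategy for the first stage."""
    normative_drive = salience if norm_recognized else 0.0   # Normative Drive
    p = w_sid * sid + w_nd * normative_drive
    return min(1.0, max(0.0, p))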

Second Stage Decision: Punish, Sanction, or Neither?

The agent's decision making module is also in charge of deciding when to punish or sanction a non-compliant behaviour. Considering the difference between punishment and sanction, agents need to decide which one to use: only agents having recognized the existence of a norm regulating their group can sanction; otherwise they will just use punishment.

As discussed in Section 6.1, the punisher and the sanctioner are driven by different motivations. The former punishes in order to induce the future cooperation of others, thus expecting a future pecuniary benefit from its acts. The sanctioner, on the other hand, is driven by a normative motivation: it sanctions to favor the generation and spreading of norms within the population. Given these differences, the probabilities governing the decisions to punish or to sanction are modified by different factors2, and they change in the following way:

(4) Punishment Drive: Agents change their tendency to punish on the basis of the relative amount of defectors with respect to the last round. "The more commonly a norm violation is believed to occur, the lower the individuals' inclination to punish it." [Traxler and Winter, 2009]. If the number of defectors increases, agents' motivation to punish decreases accordingly.

(5) Sanction Drive: Agents change their tendency to sanction on the basis of the norm salience. The more salient the norm is, the higher the probability to sanction defectors.
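A minimal sketch of these two enforcement drives follows; the step size and the direct use of salience as a sanctioning probability are illustrative assumptions:

import random

def update_punish_tendency(p_punish, defectors_now, defectors_prev, step=0.05):
    """Punishment Drive: more perceived violations, lower inclination to punish."""
    if defectors_now > defectors_prev:
        p_punish -= step
    elif defectors_now < defectors_prev:
        p_punish += step
    return min(1.0, max(0.0, p_punish))

def enforcement_action(norm_recognized, salience, p_punish):
    """Sanction Drive: only norm-recognizers sanction; the rest can merely punish."""
    if norm_recognized:
        return "sanction" if random.random() < salience else None
    return "punish" if random.random() < p_punish else None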

Therefore, we can see how the mixed strategies are affected both by agents' decisions and by social information. As in evolutionary game theory, these mixed strategies can eventually tend to extreme values (full cooperation or full defection, and complete punishment or no punishment), meaning that the system has converged.

7.1.2 EMIL-I-A Results in Distributed Punishment Experiment

As was done with the human subjects, each subject (in this case, an EMIL-I-A agent) is grouped with 3 other preprogrammed agents. The effects of the 4 different treatments (no punishment, 1 punisher, 2 punishers, and 3 punishers) are analyzed.

Before showing the results, we need to specify that EMIL-I-As are loaded into the experiment with an initial cooperation and punishment probability of 50%, based on empirical evidence. As communication was not allowed in the experiments with human subjects, the EMIL-I-As' norm recognition module parameters have been changed, reducing the required normative messages to 0 and leaving the other parameter at the same value. Therefore, for an agent to recognize the norm, 10 normative actions need to be seen (either compliance, a violation, or a punishment). Moreover, and for the same reason, the confederate actor agents only punish, and do not sanction. In this way we are able to reproduce with fidelity the experimental conditions under which human subjects were tested.

2 Even though agents pay a cost to enforce defectors, this cost is not taken into account when they update their decisions to punish and sanction.

Experimental results show that EMIL-I-As (shown in Figure 7.1(b)) behave following the same pattern as humans (whose results are shown in Figure 7.1(a)). However, the difference observed in the experiments with human subjects between the three treatments (with a stronger impact on cooperation when 3 punishers acted simultaneously) is not as strong with EMIL-I-A subjects.

In order to observe whether EMIL-I-A is affected by the different punishment probabilities noticed in Sec. 5.4, we contrast the results with other types of agent architectures, driven exclusively by utilitarian motivations (like the ones we used in Section 3.1, built with reinforcement learning algorithms).

From the different options (Fictitious Play [Fudenberg and Levine, 1998], Q-Learning [Watkins and Dayan, 1992], Win or Learn Fast - Policy Hill Climbing [Bowling and Veloso, 2002]), we decided to use WoLF-PHC, as we believe it best represents the learning of a utilitarian human: it maintains its strategy while obtaining benefits, and changes it and learns otherwise.

Experimental results show that the reinforcement learning agents obtain dynamics similar to the humans', confirming that the instrumental motivation in humans is very strong, although the high cooperation rates obtained with humans (and EMIL-I-As) are not reached.

These results give us confidence that the dynamics of EMIL-I-A have been correctly transferred from the model into a working architecture, allowing us to recover our initial hypotheses and test the specific ways in which punishment and sanction favor the achievement of cooperation and the spreading of social norms in social systems populated by autonomous agents.

7.2 Experimenting with Punishment and Sanction

Experimental economics experiments with human subjects allowed us to differentiate the effects of punishment and sanction on the emergence of cooperation. As we have commented already, our hypothesis states that when sanction is available as a mechanism for peer punishment, higher cooperation rates are obtained and, in the absence of punishment, they remain higher for a longer time than when punishment is used.

To test this hypothesis we perform a new set of experiments with human subjects that explicitly differentiates between punishment and sanction. For the sake of comparison, the experiment is designed in a similar fashion to [Fehr and Gachter, 2000].

7.2.1 Experimental Design

Participants are divided into groups of 4, which remain constant for the whole treatment. For this experiment, no virtual agents were used, and all the experiments were performed following the Experimental Economics methodology [Croson, 2005].


[Figure 7.1 plots the cooperation rate over 35 rounds for the four treatments (0, 1, 2 and 3 punishers): (a) Human Subjects Cooperation Rates; (b) EMIL-I-A Subjects Cooperation Rates; (c) Comparing Humans, EMIL-I-A and WoLF agents.]

Figure 7.1: Distributed Punishment Experimental Results with Simulated Agents


As the objective of this experiment is to test the different effects of punishment and sanction on the subjects' decision making, we perform two different treatments: the Punishment Treatment and the Sanction Treatment. Each treatment is composed of 3 blocks of 10 rounds (for a total of 30 rounds). Depending on the treatment, the second block varies, while the first and third blocks remain constant.

In all three blocks, at the beginning of every round, participants are given an endowment of 20 ECUs, and at the end of each round participants are informed of the benefits obtained in that round.

In the first block of 10 rounds (as well as in the third block), subjects simultaneously choose the portion of their endowment to contribute to a group account. Each ECU contributed to the group account yields a payoff of 0.4 ECUs to each of the four members of the group. Each ECU not contributed by the subject is credited to the subject's own earnings. Therefore, the earnings of an individual in a period are calculated with Equation 7.2, where c_x is the amount contributed by subject x.

E = 20 - c_i + 0.4 \times \sum_{k=1}^{n} c_k    (7.2)
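As a quick check of Eq. 7.2, a minimal sketch follows (the function and constant names are ours, not part of the experimental software):

ENDOWMENT = 20   # ECUs received at the beginning of every round
MPCR = 0.4       # return of the group account per contributed ECU

def round_earnings(contributions, i):
    """First-stage earnings of subject i (Eq. 7.2), given all group contributions."""
    return ENDOWMENT - contributions[i] + MPCR * sum(contributions)

# Example: with contributions [20, 10, 0, 5], the full defector (subject 2)
# earns 20 - 0 + 0.4 * 35 = 34 ECUs, while the full contributor earns 0 + 14 = 14.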

Once all participants of the group make their contribution decision for that round, the system informs each subject of his initial endowment, his individual contribution, the total group contribution, and his individual earnings.

In the second block (rounds 11-20), subjects have to take decisions in two phases. The first-phase decision is exactly the same as in the first block. During the second phase, subjects have to simultaneously decide whom and how much to punish/sanction. At the end of the first stage, subjects are informed of the contribution levels of each of the other members of their group. For the Punishment Treatment, they can assign from zero to ten punishment points to each of the three other group members, at a cost of 1 ECU per point. Each point received from any other subject reduces the first-stage earnings of the receiving subject by 3 ECUs.

In the case of the Sanction Treatment, the second phase remains the same with a slight variation: apart from sending punishment points as in the Punishment Treatment, subjects can also send a message that justifies their punishment decision, delivered together with the punishment points. The message is predefined by us, allowing certain parameters to be chosen by the subjects, such as the correct amount that should be contributed (represented by the XXX). The message structure is the following:

XXX ECUs should be contributed, because:

1. it is what is supposed to be done.

2. because that way we all are better off.

3. because that way you will not be punished.


With this message structure, subjects have to specify (in the provided XXX space) what they consider the "correct" amount to be contributed, and one of the three reasons why that should be done. The first message ("it is what is supposed to be done") corresponds to a normative explanation, the second ("because that way we all are better off") to a social preference explanation, and the last one ("because that way you will not be punished") to a self-interested explanation. This way, in the Sanction Treatment, subjects have the chance to send punishment points with a justification of what the correct behaviour should be and why, in contrast with the Punishment Treatment, where no messages are sent. We want to remark that, theoretically, and as the messages that go with the punishment points are optional and cost-free, the Punishment Treatment can produce the same utilitarian effect as the Sanctioning Treatment.

At the end of this second phase, participants are shown the punishments (and messages, if any) sent to each of them, and their final benefit for the round.

This three-block experimental structure allows us to observe what happens in the absence of punishment (in the first block) and to clearly observe the effect of both punishment technologies when available (in the second block). The third block of the experiment allows us to observe the inertia that punishment/sanction introduces in the subjects.
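The second-stage accounting, common to both treatments, can be summarized in a short sketch (the matrix representation and all names are illustrative; the 1 ECU cost per point sent and the 3 ECU damage per point received follow the description above):

COST_PER_POINT = 1    # ECUs paid by the sender per punishment point
DAMAGE_PER_POINT = 3  # ECUs deducted from the receiver per point received

def apply_enforcement(earnings, points):
    """points[i][j]: punishment points that subject i assigns to subject j (0..10)."""
    final = list(earnings)
    for i, row in enumerate(points):
        final[i] -= COST_PER_POINT * sum(row)      # the sender pays for its points
        for j, p in enumerate(row):
            final[j] -= DAMAGE_PER_POINT * p       # the receiver absorbs the damage
    return final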

7.2.2 Empirical Results

The experiments with human subjects were performed in the Laboratory for Research in Experimental Economics (LINEEX), at the Center for Research in Social and Economic Behavior of the University of Valencia. 48 subjects participated in each treatment, yielding 12 groups, and hence 12 independent observations, per treatment. All participants were paid depending on their performance in the game.

Experimental results are shown in Figure 7.2(a). The empirical data confirm our hypotheses, showing that sanction does have a stronger impact on the subjects' decision making, resulting in a higher and more stable cooperative strategy than when punishment is used. This result is important since the message accompanying the punishment has no associated monetary cost, and yet produces higher cooperation rates. Moreover, as we can see in Figure 7.2(b), the average amount of sanctions sent per subject is considerably reduced with respect to the average amount of punishments3.

In general we can observe how both punishment/sanctioning technologies are effective in attaining contribution levels higher than those that would typically be observed in the absence of any system. The results obtained in our experiments are in concordance with those presented in [Fehr and Gachter, 2000, Noussair and Tucker, 2005b], at least in the punishment treatment.

3 At the beginning of each treatment, subjects are informed about the 3-block structure of the experiment, but they do not know the specific configuration of each block. Instructions were read and explained at the beginning of each block. Because of this configuration, subjects discover only at the beginning of the second block that they can punish/sanction after the first phase, and that they will be able to do so for 10 rounds. This introduces an "end-of-the-world" effect: when "the end of the world" is approaching, decisions are not rational anymore. In our case this is the last round of the second block, where we observe an abrupt increase of both punishments and sanctions, which should not be considered representative.


[Figure 7.2 shows, over the 30 rounds of each treatment: (a) Average Cooperation (contribution rate) under Punishment and Sanction; (b) Average Punishments/Sanctions per agent; (c) the Message Distribution for the Sanction Treatment (No Message, Normative, Social Preference, Self-Interested).]

Figure 7.2: Punishment and Sanction Experiment with Human Subjects

Page 113: Social Norms for Self-Policing Multi-agent Systems and ... · Multi-agent Systems and Virtual Societies A dissertation submitted by Daniel Villatoro Segura at Universitat Autonoma

In [Noussair and Tucker, 2005b] the authors incorporate a symbolic punishment reminiscent of our sanction mechanism: they allow subjects to send punishment points in the same way we did in our Punishment Treatment, although with no costs associated with assigning or receiving points. These points can be understood as a level of disapproval of a subject's contribution decision in the first stage. However, they lack an accompanying explanation of why that behaviour is expected, which is included in the message that goes with our punishment points.

With respect to the message chosen by subjects in the Sanctioning Treatment (the only one where a message can accompany the punishment points), subjects prefer either sending no message at all (50.48% of the time, delivering a plain punishment) or sending the Social Preference message ("because that way we all are better off", sent 42.63% of the time), as indicated in Figure 7.2(c). While others have studied the strength of different types of messages accompanying punishment under other punishment technologies [Bo and Bo, 2009] (centralized punishment coming from the system instead of from peers), our experimental results confirm that when punishment comes with a message justifying the punishing decision from the peer, cooperation rates are increased. Even though we will not further explore the reasons for this effect (leaving this line of research open for future work), we hypothesize that it occurs because, when punishment is accompanied by a message, the punisher is seen as an entitled authority by the punished, activating its normative mind.

In the light of the results obtained with human subjects, we can affirm that our assumption (integrated into the EMIL-I-A platform) about the stronger impact of sanctioning over punishment was correct.

7.2.3 EMIL-I-A Results

In order to check the correctness of the implementation of our EMIL-I-A platform, we decided to simulate the previously performed experiments substituting the human subjects with EMIL-I-A agents, as we did with the Distributed Punishment experiment in Sec. 7.1.

There are a number of reasons why we believe it is hard to reproduce the human subjects' results, as many other factors affect their decision making and cannot be controlled in these in-vitro experiments. Among them, the one we consider essential is the distribution of personalities in the experiment; it has already been demonstrated that there are different types of subject attitudes towards contributing to a public good [Brennan et al., 2008], and we believe that the proportion of each type of potential personality has an effect on the group's dynamics.

In the experiments with EMIL-I-As we assume a homogeneous population of agents, with the same characteristics and tendencies in their decision making; this assumption is taken because, for the design of EMIL-I-A, average values from human subjects were taken and implemented in the architecture, without considering different types of human personalities4.

4 Agents can be given a "personality" with the EMIL-I-A architecture by fine-tuning the decision making. In the specific configuration detailed in Section 7.1.1, different weights applied while aggregating the different drives would produce a different type of behaviour.


We need to make a number of assumptions that help us fix some of EMIL-I-A's behavioural parameters before running the simulation: agents are initialized with a tendency to cooperate of 50% and an initial tendency to punish of 70%. We believe these are pertinent (based on empirical results from humans) and necessary to run the simulation in the proper way. Moreover, we assume that our EMIL-I-A agents already know of the existence of the cooperation norm (as humans do before going into the experiment), with an initial salience of 0.75.

In order to reproduce the same experiments performed with human subjects, we restrict the simulation platform to allow only punishment in one of the treatments, and both punishment and sanction in the second treatment. For the sanctioning treatment we have taken an implementation decision: as we saw with the human subjects, the message they chose was fundamental to achieving the desired effect (a rise in the cooperation rate); in our architecture, and assuming the homogeneity of agents, we have decided to avoid the problems associated with the semantics of the message, and we fix it as a universal message, understood equally by all EMIL-I-A agents as the normative message that is part of a sanction5.

We have slightly altered the structure of the experiment by eliminating the first block: when agents are deprived of the punishment/sanctioning technologies, cooperation rates start decreasing. Because of the evolutionary strategies with which agents are provided, the recovery from the absence of punishment/sanction would turn out to be much slower than in humans. The reason humans react this way is the "re-start" effect: the experiment pauses and the instructions are read to the humans, giving them the impression of a re-start of the experiment. Unfortunately, these dynamics cannot be realistically incorporated into the agents.

The experimental results are shown in Figure 7.3(a). The dynamics obtained by EMIL-I-A when contrasting punishment and sanction are similar to those obtained with humans: when sanctioning is available, cooperation rates reach values between 70% and 80%, in contrast with the 50% obtained with punishment. Moreover, in the absence of punishment, cooperation rates in both cases start decreasing, the decrease being slower when sanction is used. In general, we observe that EMIL-I-A is slower than humans because of the evolutionary strategy update performed; humans, on the other hand, were affected by different external inputs, like the reading of the instructions between blocks, and possibly other unidentified dynamics. The observed delay is of 5 rounds, the average behaviour being almost identical after that adaptation time.

The dynamics of punishment and sanction (shown in Figure 7.3(b)) are also similar after this adaptation time, with the average amount of punishments and sanctions at the end of the punishment block similar to that of the human experiment. We observed that, in both humans and simulated agents, the number of sanctions is lower than the number of punishments, representing a lower cost for society to obtain cooperation.

These experiments have been useful to test the correct implementation of the EMIL-I-A architecture and to confirm a satisfactory selection of the salience weights.

5 We want to remind the reader that we consider a sanction to be a punishment with a normative message.


[Figure 7.3 shows, over the 20 rounds of the simulated treatments: (a) Average Cooperation (contribution rate) and (b) Average Punishments/Sanctions per agent, for Punishment and Sanction, including the final block with no punishment/sanction available.]

Figure 7.3: Punishment and Sanction Experiment with EMIL-I-A Simulated Subjects


Payoffs are listed as (Player 1, Player 2):

                          Player 2
Player 1        Cooperate        Defect
Cooperate       3, 3             0, 5
Defect          5, 0             1, 1

Figure 7.4: Prisoner's Dilemma Game Payoff Matrix

7.3 Exploiting EMIL-I-A

So far we have seen how EMIL-I-A and the Salience Control Mechanism have been correctly designed, obtaining results similar to the behaviour of human subjects. To perform an exhaustive analysis of these two mechanisms, we introduce a simulation model in which to test them.

7.3.1 Simulation Model

In order to capture the specific dynamics of punishment and sanction, and to test their relative effects in the achievement and maintenance of cooperation, a simulation model has been developed.

In this model, agents play a variation of the classic Prisoner's Dilemma Game (PDG) (whose payoff matrix is shown in Figure 7.4), where an extra stage has been included in the game: after deciding whether to cooperate or not, agents can also choose whether they want (or not) to punish or sanction the opponents who defected. The motivation behind the PDG is due to our long-term research goal of studying enforcing technologies in virtual societies, and more specifically in environments like P2P scenarios or web-services markets. These types of scenarios share a number of characteristics with the PDG: dyadic encounters, repeated interactions, and one-shot games.

Each timestep of the simulation is structured in 4 phases, repeated for a fixed number of timesteps (a minimal sketch of the resulting loop follows the list):

1. Partner Selection: Agents are randomly paired with their neighbours in the social network.

2. First Stage: Agents play the PDG, deciding whether to cooperate (C) or to defect (D).

3. Second Stage: Agents decide whether or not to punish/sanction the opponents who defected. Only agents who have recognized that there is a norm of cooperation governing their group use sanction to enforce others' behaviours; otherwise punishment is used. Punishment works by imposing a cost on the defector, this way affecting its payoffs. On the other hand, sanction also informs the target (and possibly the audience) that the performed action violated a social norm, thus having an impact both on agents' payoffs and on the process of norm recognition and norm salience. If an agent decides not to punish/sanction and it is a norm-holder (i.e. an agent with a highly salient norm of cooperation stored in its mind), it can send an educational message to its opponent. Problems related to the way agents communicate the abstract concept of norm are outside the scope of this work and, therefore, we assume perfect communication.

4. Strategy Update: As agents have mixed strategies, these strategies are updated on the basis of their decisions, payoffs and the social information acquired.
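The following minimal, self-contained sketch shows one such timestep; the payoffs follow Figure 7.4, while the Agent fields, the pairing scheme and all numeric defaults are illustrative assumptions, not the exact implementation:

import random
from dataclasses import dataclass

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}   # Figure 7.4, (row, column)

@dataclass
class Agent:
    p_coop: float = 0.5         # mixed strategy: probability of cooperating
    p_punish: float = 0.7       # probability of enforcing a defecting opponent
    norm_recognized: bool = False
    salience: float = 0.0       # salience of the cooperation norm
    damage: float = 5.0         # cost imposed on the defector
    enforcement_cost: float = 1.0
    reward: float = 0.0

def timestep(agents):
    random.shuffle(agents)                                  # 1. partner selection
    for a, b in zip(agents[::2], agents[1::2]):
        act_a = "C" if random.random() < a.p_coop else "D"  # 2. first stage (PDG)
        act_b = "C" if random.random() < b.p_coop else "D"
        a.reward, b.reward = PAYOFFS[(act_a, act_b)]
        for me, other, other_act in ((a, b, act_b), (b, a, act_a)):
            if other_act == "D" and random.random() < me.p_punish:
                other.reward -= me.damage                   # 3. second stage:
                me.reward -= me.enforcement_cost            #    punish or sanction;
                # a sanctioner (me.norm_recognized True) would also deliver the
                # normative message here, feeding the target's norm salience
    # 4. strategy update: the drives of Sec. 7.1.1 would adjust p_coop/p_punish here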

7.3.2 Experimental Design

One of the main objectives of this research is to study the achievement of cooperation in adverse situations, where defecting is the utility-maximizing strategy for the agents. In order to observe the relative effects of punishment and sanction in our artificial scenario, we exploit the advantages of having agents endowed with normative minds (fine-tuned with the parameters obtained with human subjects in similar scenarios), allowing them to process the signals produced by these two enforcing mechanisms separately.

In order to analyze the different effects of both types of punishment on the agents' decision making, we have performed an exhaustive experimental analysis. To reduce the search space (and save computation costs), we have fixed the population size at 100 agents.

In the following experimental section, we contrast the results obtained in simulations where only punishment is allowed against situations in which both punishment and sanction are allowed. Only after having recognized the existence of the cooperation norm will agents sanction defectors, thus defending the norm; otherwise they will just punish them.

As in this work we are not interested in analysing the emergence of norms, some agents already endowed with the cooperation norm are initially loaded into the simulation: we refer to them as initial norm holders (INHs), and they are loaded with the norm at a salience of 0.8. If no agent had the norm, the agents would have to start a process of norm emergence that would include the recognition of an anti-social behavior, the identification of a possible solution and its consequent implementation in the society in the form of norms.

All the following experiments have been run on two different topologies: a fully connected network (where agents have access to complete information) and a scale-free network (where agents only access local information). Despite obtaining the same final convergence results in both topologies, we have observed a delay in the scale-free networks, caused by the cascade effect of the spread of norms produced by the location of the INHs (better connected INHs produce faster convergence results).


7.3.3 Punishment and Sanctioning on the Emergence of Cooperation

In this first experiment, we analyze the relative effects of punishment and sanction on the achievement of cooperation, paying attention to the amount of INHs and the different damages imposed with punishment and sanction. By comparing the results obtained with different damages in Fig. 7.5, we observe that different damages (i.e. the amount of punishment/sanction imposed on the target) affect the cooperation levels differently. We can understand this first experiment as a proof of concept to check the correctness of our architecture. As expected, agents' motivation to defect decreases in a much stronger way with a damage of 5 than with a lower damage of 36. It has to be pointed out that, unlike what happens when using punishment, in populations enforced by sanction (with a minimum amount of INHs, 10 in this experiment), cooperation is also achieved when imposing the lower damage, as can be seen in Fig. 7.5(a), Fig. 7.5(c), and Fig. 7.5(e).

Both punishment and sanction directly affect the agents' SID, reducing their motivation to defect. However, these two enforcing mechanisms are effective in achieving deterrence only when the damage imposed is at least 3, as with a damage of 3 (Cooperation Payoff = 3, Defection Payoff = 5 - 3 = 2) cooperation is the utility-maximizing strategy.

As said in Sec. 7.1.1, agents' cooperation probability is affected by both the SID and the ND: sanction, thanks to its signalling component, influences the normative drive more than punishment does. In order to obtain deterrence, punishment exploits the power of norms much less than sanction; that is why it needs to impose higher damages on its targets.

The amount of INHs produces an interesting result when using sanctioning: when the number of INHs is increased, emergence is achieved faster and follows a distribution equal to the neighbors' distribution (in regular networks the emergence of cooperation increases linearly, and exponentially in scale-free networks). When using punishment, the dynamics of cooperation are different, as the salience is not affected as strongly, and therefore the normative drive does not push towards cooperation as much as when sanction is used. In the next section we analyze this phenomenon.

7.3.4 Norms Spreading

With the proof-of-concept experiment presented in the last section, we not only tested the effects of punishment and sanctioning on the achievement of cooperation, but also how the initial amount of norm holders affects the dynamics of cooperation. Analysing the same experimental data presented in the previous section, we now observe the relative effects of sanctions and punishments on the recognition of the cooperation norm. As specified previously, our hypothesis is that, thanks to its signaling nature, sanction allows norms to spread further and more quickly in the population, making it more resilient to change than if enforced only by mere punishment.

6 These values have been chosen for experimentation as both punishment damages of 3 and 5 turn the cooperative action into the utility-maximizing strategy. A damage of 3 produces a slight advantage for cooperation (Payoff = 3) over defection (Payoff = 5 - 3 = 2). On the other hand, a damage of 5 produces a stronger difference between cooperation (Payoff = 3) and defection (Payoff = 0).


[Figure 7.5 plots the cooperation rate over 200 timesteps under Sanction (S) and Punishment (P) for six configurations: (a) 10 INHs, damage 3; (b) 10 INHs, damage 5; (c) 30 INHs, damage 3; (d) 30 INHs, damage 5; (e) 50 INHs, damage 3; (f) 50 INHs, damage 5.]

Figure 7.5: Effects of Punishment (P) and Sanction (S) on the Emergence of Cooperation.



By paying attention to the average norm salience per agent in Fig. 7.6, we appreciate that sanction produces a stronger effect on the spread of norms than punishment, thus verifying our hypothesis. This phenomenon is mainly caused by the design of our normative architecture, where sanctions impact the agents' recognition and salience of norms in a stronger way.

Moreover, this is a self-reinforcing process: once an agent has recognized a norm, it will start sanctioning, getting others to comply with the norm, reproaching transgressors and reminding would-be violators that they are doing something wrong. This process affects and reinforces the sanctioner's own salience, but also the norm's salience among its neighbours.

7.3.5 Costs of Punishment

Our second hypothesis is that sanctioning combines high efficacy in discouraging defectors with lower costs for society as compared to punishment. From the data shown in Table 7.1, we observe that in the two situations where both punishment and sanction allow cooperation to emerge (i.e. imposing a damage of 5), the amount of punishment/sanction occurrences and the cost for society with sanction are 32.52% of those with punishment. In other words, when using sanction, the amount of enforcement acts and consequently the associated costs are reduced to roughly one third. This is an interesting result that confirms our hypothesis that the use of sanctions reduces the social costs of the achievement of cooperation, as it impacts the salience more deeply.

              Punishments    Sanctions    Individual Cost    Social Cost
Punishment    4.8748         0            5                  24.374
Sanction      0.3364         1.2491       5                  7.9275

Table 7.1: Costs of Punishment and Sanctioning. Average values per agent and timestep.

Moreover (and even though the data obtained with damage 3 are not comparable amongst themselves, because of the different effects on the emergence of cooperation), we can observe that, from a system designer's point of view, knowing the right value of punishment/sanction to apply (5 rather than 3 in this case) would reduce the global costs while obtaining the same result on the emergence of cooperation (full cooperation). The reason for this phenomenon is that a sanction of 5 produces a stronger deterrent effect on the self-utilitarian drive of the agents than a sanction of 3, which needs to be repeatedly applied to achieve the same results.


[Figure 7.6 plots the average norm salience per agent over 200 timesteps under Sanction (S) and Punishment (P) for the same six configurations: (a) 10 INHs, damage 3; (b) 10 INHs, damage 5; (c) 30 INHs, damage 3; (d) 30 INHs, damage 5; (e) 50 INHs, damage 3; (f) 50 INHs, damage 5.]

Figure 7.6: Effects of Punishment (P) and Sanction (S) on the Salience.


7.3.6 Dynamic Adaptation Heuristic

In the experiment in Sec. 7.3.3, the damage that both punishers and sanctioners can impose on defectors, in order to deter them from violating again, is fixed for the entire duration of the simulation. But in some circumstances this damage could be reduced without lowering its deterring effect. In order to allow our agents to dynamically choose the right amount of punishment and sanction to impose, thus reducing social costs, a Dynamic Adaptation Heuristic (DAH) has been implemented. This heuristic works in the following way:

if Defectors_{t−1} ≤ Defectors_t AND Defectors_t > Tolerance_Def then
    Increase PunCost by ∆;
if Defectors_{t−1} > Defectors_t OR Defectors_t < Tolerance_Def then
    Decrease PunCost by ∆;

Algorithm 4: Dynamic Adaptation Heuristic (DAH).

By keeping track of the number of defectors in the previous timestep (Defectors_{t−1}), and comparing it with the current number of defectors, the imposed cost of punishment (PunCost) is adapted accordingly by a step ∆ (∆ = 0.1 in this work), thus obtaining an intelligent dynamic adaptation. Tolerance_Def is another parameter used to allow agents to be tolerant towards exploration by the other agents, allowing a percentage of violations in their social environment. If Tolerance_Def is fixed to zero, it might produce a continuous rise of the Punishment Cost without having any effect on the violators' minds (as some violations are due to mere exploration).
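The DAH update can be rendered as a minimal Python sketch (variable names are ours; the surrounding simulation machinery is assumed):

    def update_punishment_cost(pun_cost, defectors_prev, defectors_now,
                               tolerance_def, delta=0.1):
        """One DAH step: raise the damage while defection is not receding
        and exceeds the tolerated level; lower it otherwise.
        The two conditions are mutually exclusive, as in Algorithm 4."""
        if defectors_prev <= defectors_now and defectors_now > tolerance_def:
            pun_cost += delta
        if defectors_prev > defectors_now or defectors_now < tolerance_def:
            pun_cost -= delta
        return max(pun_cost, 0.0)  # assumption: damage never goes negative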

                  Punishments   Sanctions   Individual Cost   Social Cost
DAH Punishment      2.1142        0              5.1640        10.9177
DAH Sanction        0.2932        0.8669         3.7031         4.2959

Table 7.2: Performance of the Dynamic Adaptation Heuristic. Average values per timestep.

In order to test how the implemented DAH affects the performance of both punishment and sanction, an exhaustive exploration of the search space with different amounts of INHs (from 10 to 90, in steps of 10) has been performed.

Results obtained for different amounts of INHs follow the same distribution, in the cases both with and without DAH. Table 7.2 shows the average performance of both punishment and sanction in the dynamic and static conditions. The DAH allows society to significantly reduce the number of punishing (57% fewer) and sanctioning (27% fewer) acts with respect to the static condition reported in Table 7.1.

However, when using DAH, the average individual cost of punishment is slightly higher than that of sanctioning. This is caused by the cyclic dynamics produced by agents driven only by their strategic drives. When enforced by punishment, agents abandon their defecting strategy because they want to avoid the cost of punishment. The number of cooperators starts to increase and punishers decrease the punishment cost accordingly. However, this reduced punishment cost makes defection become


Figure 7.7: New Free Riders introduced at TS = 5000. [Two panels plot the level of cooperation (y-axis, 0 to 1) over timesteps 4750 to 5500 for injections of 10, 20, 50, 100, 150, 200 and 500 new free riders (NFR): (a) Punishment; (b) Sanction.]

the utility-maximizing strategy. Consequently, the number of defectors will increase again, and the punishment cost along with it. With this newly adapted punishment cost, defection will no longer be the optimal strategy, and the initial situation is reached again.

Consequently, the social expenses using DAH are considerably reduced both when punishing (66%) and when sanctioning (56%). The implemented heuristic allows society to intelligently reduce the social costs needed for the achievement and maintenance of cooperation.

7.3.7 Adapting to the Environment: Free Riders Invasions

This experiment is aimed at testing the hypothesis that sanction makes the population more resilient to change than if it were enforced only by mere punishment. If suddenly a


large amount of new defectors joins the population, we expect that defectors will take longer to invade the population in which sanction has been used. In order to compare the relative speed of adaptation and degree of resilience of the populations enforced with punishment and sanction, we run simulation experiments in which, after timestep 5000, new free riders (from a minimum of 10 to a maximum of 500) are injected into the populations.

Experimental results (see Fig. 7.7) show that a population enforced by (DAH) sanction is able to receive up to 200 new free riders and still maintain a high level of cooperation, while when (DAH) punishment is used, only 100 new free riders make cooperation collapse. In the population enforced by sanction, a larger number of cooperation norms have spread, and this has a restraining effect on the decision to abandon the cooperative strategy. Highly salient norms guarantee a sort of inertia, restraining agents from changing their strategy to a more favorable one.

7.4 Conclusions

The Distributed Punishment experiment performed with human subjects (presented in Chapter 5) gave us the hint of the existence of a heavier "message" in certain types of punishment, affecting the reasoning of subjects differently. The designed EMIL-I-A architecture takes into consideration the difference stated by cognitive scientists [Giardini et al., 2010] between Punishment and Sanction. We rebuilt the Distributed Punishment experiment in a simulated scenario with EMIL-I-A agents, obtaining satisfactory results in recreating with a virtual agent the behavior of human subjects under the same experimental conditions.

Meanwhile, we have also proven with experimental economics the hypothesized difference between punishment and sanction. As far as we know, this is the first time this result has been proven, and it is therefore of interest for that community. On the other hand, EMIL-I-A has also performed well in recreating the human subjects' dynamics in the same experiment. After proving the validity of such an architecture, we (computer scientists and policy makers) are provided with a powerful tool that allows us to further exploit these punishment technologies in different situations, unfeasible to obtain in the laboratory with human subjects.

For that reason, we have developed a simulation platform that has served us to test our normative theory of punishment and the designed EMIL-I-A architecture. The simulation results obtained by exploiting the capabilities of EMIL-I-A show the ways in which punishment and sanction affect the emergence of cooperation. More specifically, these results seem to verify our hypotheses that the signaling component of sanction allows this mechanism (a) to be more effective in the achievement of cooperation; (b) to spread norms faster and wider in the population, making it more resilient to environmental change than if enforced only by mere punishment; and (c) to significantly reduce the social cost for cooperation to emerge.

Moreover, to the best of our knowledge, this is the first work in which agents are endowed with a rich cognitive architecture allowing them to be affected by the normative information associated with sanction and to dynamically gauge the amount of damage to impose on defectors, achieving an efficient compliance.


However, one question remains unanswered: what happens when punishment is not available but the norm is still necessary? We need a special mechanism that allows agents to comply with the norms independently of the external outcomes. Axelrod defined this process as internalization [Axelrod, 1986], and we will show in the following chapter how it is tightly linked with the salience of norms. By endowing agents with norm internalization mechanisms we will achieve a more compliant, stable and cost-efficient society, without sacrificing adaptability.


Chapter 8

Internalization

In the previous chapter we have seen how punishment is a useful mechanism to impose social norms by deterring (and educating) norm violators. However, we have also seen how, in the absence of punishment mechanisms, agents tend to adapt to the self-interested utility-maximizing strategy.

The problem social scientists still revolve around is how autonomous systems, like living beings, perform positive behaviors toward one another and comply with existing norms, especially since self-regarding agents are much better off than other-regarding agents at within-group competition. Since Durkheim, the key to solving the puzzle has been found in the theory of internalization of norms [Mead, 1963, Parsons, 1937, Grusec and Kuczynski, 1997, Gintis, 2004, Horne, 2003]. One plausible explanation of voluntary, non-self-interested compliance with social norms is that the norms have been internalized. Internalization is another of the mechanisms identified by Axelrod [Axelrod, 1986] for the establishment of norms1.

Internalization occurs when

a norm's maintenance has become independent of external outcomes - that is, to the extent that its reinforcing consequences are internally mediated, without the support of external events such as rewards or punishment ([Aronfreed, 1968, p. 18]).

This process entails several advantages. For example, norm compliance is expected to be more robust when norms are internalized than when norms are external reasons for conduct: if everybody in the population internalizes a norm, there is no incentive to defect and the norm remains stable [Gintis, 2004]. Driven by internal motivations, internalizers are not only much better at complying with norms than externally-enforced individuals, but also at defending them. An effect of the latter prediction is that norm internalization is decisive, if not indispensable, for distributed social control. Internalization is not only a mechanism of private compliance, but also a factor of social enforcement. Individuals who have internalized the norm not only

1 Other factors identified by Axelrod in [Axelrod, 1986], like dominance and deterrence, have been analyzed in Chapter 7.


comply with it without any need of external enforcement, but in many circumstances they also want others to observe the norm, reproaching transgressors and reminding would-be violators that they are doing something wrong.

The importance of norm internalization in favouring social order has been largely recognized, but philosophers, social scientists, psychologists and anthropologists still strive to answer some fundamental questions: Why and how do agents internalize social inputs, such as commands, values, norms and tastes, transforming them into endogenous states? Which types of mental properties and ingredients ought individuals to possess in order to exhibit different forms of compliance? How many people must internalize a norm in order for it to spread and remain stable? What are the implications of distinct modalities of norm compliance for society and governance? These questions have received no conclusive answer so far. In particular, no explicit, controllable, and reproducible model of the process of internalization is available yet.

Agents conform to an internal norm because so doing is an end in itself, and notmerely because of external sanctions, such as material rewards or punishment.

This chapter provides an operational model of the internal mechanisms of internalization. Rather than an all-or-none phenomenon, norm internalization is a highly dynamic process, whose deepest step occurs when norms are observed thoughtlessly. Norm internalization is here characterized as a multi-step process, occurring at various levels of depth, giving rise to more or less robust compliance, and also allowing in certain circumstances for automatic behaviours to be unblocked and deliberate normative reasoning to be restored. If only the deepest level of internalization were modelled, it would be sufficient to hardwire agents with norms and make them automatically compliant. Even though in several contexts this type of agent is highly efficient, in rapidly changing environments it proves inadequate to deal with the adaptive problems it faces. Thus, fully operationalising such a multilevel model of norm internalization requires a complex agent architecture. Unlike the vast majority of simulation models in which heterogeneous agents interact according to simple local rules, in our model the agents (EMIL-I-As) are provided with normative mental modules, allowing them to internalize norms active in the social system in which they interact.

Emotions, although playing a significant role in this process, will not be investigated at this stage. However, recent work [de Melo et al., 2011] has explored the effects of emotions in hybrid environments (populated with humans and agents) on the decision making of human subjects.

8.1 What is Internalization?

Norm internalization is the process by means of which norm compliance becomes independent of external outcomes, i.e. when the norm addressee observes the norm free from external punishment and reward. In other words, it is a mental process that takes a (social) norm as input and provides the individual with terminal goals, i.e. goals that are pursued as ends in themselves rather than as means for achieving other goals. Therefore, it is the next step to take on our road towards norm compliance, after having studied the punishment/sanction mechanisms.


As discussed in Sec. 6.2, normative beliefs generate normative goals by reference to an external enforcement. However, a norm is internalized when the norm addressee complies with it independently of external sanctions and rewards. In such a case, the normative goal is no longer relativized to an expected sanction, but only to the main normative belief.

8.1.1 Factors affecting internalization

Why do agents observe a norm irrespective of external enforcement? Far from providing a complete list of the factors favouring norm internalization, in the present work we will focus on some of the elements playing a key role in this process, such as:

• consistency;

• self-enhancing effect;

• urgency;

• calculation cost saving;

• norm salience.

Let us start with consistency. This mechanism operates at two stages: first by selecting which norm to internalize, and later by enforcing it (self-enforcement) and controlling that one's behavior corresponds to it (self-control). Consistency of new norms with one's beliefs, goals and previously internalized norms plays a crucial role in the selection process (of which norm to internalize). Successful educational strategies favor internalization processes, often by linking new inputs with previously internalized norms. Analogous considerations apply to policy-making. Consider antismoking legislation: the efficacy of antismoking campaigns based on frightening announcements and warning labels (e.g., sentences like 'Smoking kills' on cigarette packages, see [Goodall, 2005]) is still controversial. One of the factors reducing their efficacy is the effect known as hyperbolic discounting ([Bickel and Johnson, 2003]; see also [Rachlin, 2000]), a psychological mechanism that leads to investing in goal-pursuit a measure of effort that is a hyperbolically decreasing function of the time-distance from goal-attainment, and leads people to procrastinate energy-consuming work until the very last moment. Due to hyperbolic discounting, people, especially young people, are unable to act under the representation of delayed consequences of current actions. On the other hand, much more efficacious anti-smoking campaigns are those playing on a previously emerged and diffused set of social norms, such as the live-healthy precepts, which are highly consistent with the message they want to transmit.
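For reference, hyperbolic discounting is commonly formalized (e.g. in Mazur's classic hyperbolic discount function, on which the works cited above build; this formula is our addition, not part of the original text) as

    V = A / (1 + k·D)

where V is the present value of an outcome of amount A delayed by time D, and k is an individual discounting parameter. Unlike exponential discounting, the hyperbolic form falls off very steeply at short delays, which is what makes distant consequences (such as smoking-related illness) almost weightless in current decisions.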

The second factor playing a role in norm internalization is the self-enhancing effect of norm compliance: the norm addressee realizes that it achieves one of its goals by observing a given norm. Suppose I succeed in refraining from smoking and that, after a few days, I realize an advantage that I had not perceived before: food starts to taste good again. This discovery generates a goal (quit smoking to enjoy good food), not relativized to the norm but supporting it: I have converted the norm into an ordinary goal. Whether this goal will be strong enough to out-compete addiction is another matter.


Third, we focus on urgency (see, for example, [Zhang and Huang, 2006]). In particular, one can argue that the more a given norm allows agents to answer problems frequently encountered under conditions of urgency, when time for decision-making is scarce or nonexistent, the more likely that norm will be internalized.

Fourth, we claim that agents are parsimonious calculators: under certain conditions, they internalize norms in order to save calculation and execution time. Upholding a norm that has led one to succeed reasonably well in the past is a way of economizing on the calculation costs that one would have to sustain whenever facing a new situation. Imagine a driver's decision to stop at the traffic light when it turns red. Each time our driver approaches a red traffic light, it calculates the costs and benefits of complying with the norm: e.g. it predicts that it will save time by ignoring the norm, but that a fine will then follow with a certain probability. The driver then chooses what is best for itself.

After a certain number of calculations always returning the same output (e.g. the driver always deciding to stop in order to avoid punishment), an agent able to internalize will abstain from instrumental reasoning: it will stop the car when the traffic light is red without calculating the convenience of such behavior. After having weighed the costs and benefits of complying or not with a certain norm a certain number of times (and having reached exactly the same decision every time), the agent stops calculating and takes norm compliance as the best choice. By doing so, it will save time, acting in a more efficient way. Explicit processing starts declining as norms gain force, and gradually stops once norms have been internalized.

This last point is strictly intertwined with another important factor favoring norm internalization: norm salience.

Norm salience (already analyzed in Sec. 6.3) is another factor enabling norm internalization: if a norm is highly salient, it is a candidate to be internalized by the agent. We claim that social norms are not static objects, and the degree to which they are operative and active varies from group to group and from one agent to another: we refer to this degree of activation as salience. The more salient a norm is, the more it will elicit a normative behaviour [Bicchieri, 2006, Xiao and Houser, 2005, Cialdini et al., 1990]. Salience may then increase to the point that the norm becomes internalized, i.e. converted into an ordinary goal, or even into an automated conditioned action, a routine. Norm salience is a complex function, depending on several social and individual factors.

The social factors affecting salience have been discussed in Chapter 6. On the other hand, norm salience is also affected by the individual sphere: it depends on the degree of entrenchment with beliefs, goals, values and previously internalized norms2.

In this work we will focus only on the last two factors, calculation cost saving and norm salience, leaving the rest open for future work.

2 It has to be pointed out that norm salience can also gradually decrease: for example, this happens when agents realize that norm violations receive no punishment, or when normative beliefs stay inactive for a certain time lag, suggesting that the norm is no longer very active in the population.


In the next sections, we will show how EMIL-I-As can account for reversible routines or, equivalently, for flexible conformity.

8.2 Dynamics of Norms Internalization

When located in environments regulated by different competing norms, agents need a mechanism to detect the respective importance of each norm, thus allowing them to select which one of them to internalize, when to internalize it, or to de-internalize one norm in order to internalize another. EMIL-I-A is equipped with several modules to handle this complex task. Some of these modules (like the norm recognition module and the salience control module) have already been explained in Sec. 6.3. Their structure and functioning remain exactly the same, and in this section we will focus on the Internalization Module. This module is a conceptual selector of the decision making function to be followed: agents can follow an instantaneous utility maximization function, a normative decision function, or an automatic response (with no computation associated).

8.2.1 Internalization Module

The internalization mechanism accounts for the trade-off between self-adaptation and internalization. It is in charge of selecting which norm should be internalized, converting it into an automatic behavior that is fired without any decision making process. Of course, automatization is a multi-step complex process that needs conceptual, modelling and experimental work. For the time being, we implement norm compliance as either automated or decided-upon behaviour. In reality, the alternative is probably fuzzier, and different steps on the continuum from a fully deliberate to a wholly automated behaviour ought to be modelled. In the present work (and as we can see in Figure 8.1), an internalized norm fires an automated action without any decision making process. An agent can internalize only one norm regulating a certain situation (e.g. people can greet by waving from a respectful distance, with a friendly handshake, or with a number of kisses: depending on the environment, only one of these actions will be internalized). Moreover, thanks to the information provided by the Salience Control Module, this mechanism is also in charge of the de-internalization process. As explained in Sec. 8.1.1, EMIL-I-As are parsimonious calculators, and therefore they internalize norms in order to save calculation (decision-making) and execution time. To record the monotony of agents' decisions, agents keep track (Eval_N) of the number of times the decision-making calculation (in which the expected payoffs obtained by complying or not with the norm are computed) is performed and returns the same decision. In case the decision-making calculation returns a different decision, the counter is reset, representing a break-up of the monotony.

Two conditions are necessary for a norm N to be internalized by an agent:

1. the salience of the candidate norm is at its maximum value (Sal_t^N = 1), and

2. the decision-making calculation has been performed for a certain number of times, always returning the same decision (in the present model we fixed the calculation repetition tolerance to 10, therefore Eval_N ≥ 10).

Conversely, a norm N1 is de-internalized when the following conditions apply (a code sketch of both triggers is given below):

1. the salience of the internalized norm reaches its minimum value (Sal_t^{N1} = 0), and

2. the salience of another norm (ruling exactly the same situation) exceeds that of the internalized one (Sal_t^{N2} > Sal_t^{N1}).
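The following Python sketch illustrates these two triggers (a sketch under the stated assumptions; EVAL_TOLERANCE and the bookkeeping names are ours, not from the dissertation):

    EVAL_TOLERANCE = 10  # calculation repetition tolerance used in this work

    def should_internalize(salience, eval_count):
        """Internalize when salience is maximal and the benefit-cost
        calculation has returned the same decision at least 10 times."""
        return salience >= 1.0 and eval_count >= EVAL_TOLERANCE

    def should_deinternalize(salience_internalized, salience_competitor):
        """De-internalize when the internalized norm's salience has dropped
        to its minimum and a competing norm ruling the same situation
        has overtaken it."""
        return (salience_internalized <= 0.0
                and salience_competitor > salience_internalized)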

Summing up, Figure 8.1 shows a representation of the EMIL-I-A architecture with all its capabilities. An example decision making process would start by recognizing the situation in which an agent finds itself; the agent then reacts with the automated action in case a norm is internalized. Otherwise, if the agent has not recognized any norm, it will use the Norm Recognition Module and then proceed with a default decision-making. However, if the norm was already recognized but not yet internalized, the agent checks the internalization conditions previously explained and whether the norm can be internalized; the decision making at this stage is restricted to the normative beliefs and goals, orchestrated by the salience. A sketch of this dispatch logic is given below.
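Read as control flow, the architecture selects among three decision functions. A minimal sketch, reusing should_internalize from the sketch above (the agent object and all identifiers are ours, assumed to expose the modules named in the text):

    def decide(agent, situation):
        """Conceptual EMIL-I-A dispatch: automatic response if a norm is
        internalized, normative decision making if a norm is recognized,
        default utility maximization otherwise."""
        norm = agent.internalized_norm(situation)
        if norm is not None:
            return norm.automated_action            # no computation associated
        norm = agent.recognized_norm(situation)     # Norm Recognition Module
        if norm is None:
            return agent.default_decision(situation)  # goal: maximize utility
        if should_internalize(agent.salience(norm), agent.eval_count(norm)):
            agent.internalize(norm)
            return norm.automated_action
        return agent.normative_decision(situation, norm)  # salience-driven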

8.2.2 Urgency Management Module

In self-organizing societies, where agents are in charge of creating, imposing, modifying, and enforcing norms, norms are neither explicitly specified nor defended by any in-charge authority. In a number of situations norms must be broken, and agents should be able to recognize these situations properly. This means agents have to intelligently decide to violate the norm, for example crossing at the red light when hearing an ambulance coming from behind.

In our model, to violate norms when urgent situations occur, agents need to explicitly recognize and weigh the urgent and necessary nature of the situation they are facing (in the ambulance example, the lights and the siren make explicit the socially desirable violation of the Do not cross in red norm). Therefore, all Urgent Situations are accompanied by a salience value (Sal_t^Urgency) that agents can compare with the salience of the norm "normally" regulating that specific situation, and decide accordingly. We are aware of the fact that associating Urgent Situations with a pre-defined salience value is a short-cut, and agents should instead learn this urgency salience; but for the moment, how agents learn to recognize what is considered an urgency has been deliberately left out and will be explored in future work. Therefore, in urgent situations the decision making compares the salience of the urgent situation with the salience of the norm that usually regulates the situation, as shown in Algorithm 5.


Figure 8.1: EMIL-I-A Architectural Design. [Flow diagram: events from the World enter the Norm Recognition Module and the Salience Control Module. The Internalization Module routes the decision: if a norm is internalized, an automated action is fired; if a norm is merely recognized, the internalization conditions are checked and the Normative Decision Making Module is used (normative goal: comply with norms; benefit-cost evaluation of available actions, driven by the normative beliefs NBelief 1: Norm Existence, NBelief 2: Pertinence, NBelief 3: Punishment, together with social cues and rewards); if no norm is recognized, the Default Decision Making Module applies (goal: maximize utility, choosing the utility-maximizer strategy).]


if Sal_t^N ≤ Sal_t^Urgency then
    Violate the norm;
else
    Comply with the norm;
end

Algorithm 5: Urgency Management Algorithm.

8.3 Self-Regulated Distributed Web Service Provisioning

Our simulation scenario consists of a web-service market populated by agents whose task is to find out which services are offered by other agents (distributed) and to control the behavior of the rest of the peers (self-regulated). This environment allows EMIL-I-A to be tested in a setting relevant for the MAS community. We will use the complete EMIL-I-A architecture, although with a simplified version of the decision making shown in Sec. 7.1.1. The decision making of the agents used here is explained in Sec. 8.3.4.

The simulated scenario presents some fundamental features reproducing the real-world phenomenon it models: it is dynamic (new services with different capabilities can be created during the simulation), unpredictable, and populated by heterogeneous agents.

To test the performance of internalizers in a multi-norm scenario, we have introduced two possible norms regulating the system. This variant is a novel contribution with respect to previous works in the normative self-adapting community (such as [de Pinninck et al., 2007b, Blanc et al., 2005]).

8.3.1 Motivation

The presented simulation model aims to reproduce a self-regulated web-service market. Inspired by the "Tragedy of the Digital Commons", we present a provider-consumer scenario populated by providers that might suffer a tragedy of the commons caused by the consumers' over-exploitation. The services we refer to present the following features: (1) they are available for use, and return agents a certain utility; (2) they are a common good; and (3) they can attend up to a certain amount of requests: once a certain threshold is exceeded, the service starts losing quality.

While looking for services, consumers ask central registries for information about who can provide them the service they are looking for; information is also obtained by mobilizing neighbors in their social network. Networks are an effective way to obtain information - for example, provided through word-of-mouth - and represent an alternative source with respect to traditional methods. Conventional approaches in multi-agent systems, such as registries (centralized repositories with the services that are offered) or matchmakers (auxiliary agents with knowledge to couple agents offering services and agents in need of services), partially address this problem [Decker et al., 1997].


However, in highly dynamic environments there is a valuable amount of information that cannot be stored in a centralized repository. In some cases, much of this information (such as updated details about the quality of the service or the availability of the service) may be accessed only by using social networks of interaction. In this work, a hybrid approach has been proposed: similarly to the white pages in UDDI [Curbera et al., 2002], our agents query a central server to obtain pointers to service providers, and then all the other important information about the service is provided by the service providers. The functioning of the system is similar to that implemented by Napster [nap, 2006].

More specifically, our case study presents the following dynamics: during the simulation, agents look for and find services (or learn tasks) that are necessary to satisfy their necessities. According to the number of agents that are exploiting it, every service has a level of quality that can increase or decrease during the simulation. Agents obtain the services they need by finding other agents willing to share the service they have control of. For the sake of simplicity, the allocation of new resources and necessities is done automatically by the system, with different allocation probability distributions depending on the experiment.

By finding a needed service with its expected quality, an agent obtains a reward; however, if the service obtained does not fulfil the agent's expectations in terms of quality, the requester will continue to look for the service. The service itself does not take any part in the decision of whether to be shared or not: the decision is taken by the service holder, and if a service holder decides to share its service, it will be automatically shared.

Moreover, when sharing, a service provider remains blocked for a number of timesteps (representing the transaction time), thus reducing its possibility to look for and find its own needed services. This transaction time is inspired by P2P networks: in this type of network, when a resource is shared, the bandwidth of both agents is reduced (one for uploading the resource and the other for downloading it).

Thus, we can see how in this scenario a selfish agent is motivated to free-ride by obtaining services from other agents while giving none. The spread of this selfish behaviour would eventually lead to a "tragedy of the commons" situation, making the society collapse, as predicted in [Adar and Huberman, 2000].

Trying to model the performance of a real system, EMIL-I-As behave dynamically, changing the rates of service assignation and request over time. To test how the EMIL-I-A architecture performs when facing this "tragedy of the digital commons" situation, we compared its performance with that of agents endowed with different architectures.

8.3.2 Norms in our Web Service Scenario

As we have already specified, service providers can share a resource with a requester. However, the service provider is the only one that knows at run-time the quality of the service it can offer, and the quality needed by the requester. We introduce two different norms that regulate the environment: (N1) the Always Share when Asked for a Service norm, and (N2) the Share Only High-Quality Services norm. Depending on the environmental conditions (the services' capacity distributions) and on the agents'


necessities (the services' expected quality), one or the other norm will become more salient, and therefore will govern the behavior of the system.

In order to enforce norms and to deter agents from violating them, EMIL-I-As can use the two different enforcing mechanisms already analyzed: punishment or sanction.

8.3.3 Model

For the sake of clarity, we model the environment as a Hybrid Decentralized architecture (as in Fig. 8.2), as defined in [Androutsellis-Theotokis and Spinellis, 2004]. In this type of architecture, each agent shares services (or resources) with the rest of the network. All agents are connected to a central directory server that maintains (a) a table in which all the users' information is recorded (normally the IP addresses and the connection bandwidth, which in our system are represented as an ID and the capacity of the service), and (b) a table listing the services held by each agent, along with metadata descriptions of the services (such as type, capacity and so on). Every agent looking for a service sends a query to the central server. The server searches for matches in its index, returning a list of users that hold the matching service. The user can then open direct connections with one or more of the peers that hold the requested service, and request the needed service. The final decision of whether to share the service or not is taken by the peers.

In our scenario, the central server maintains the following information: A, the set of agents in the system; SN, the social network that connects the agents; and T, the types of services that can be provided and looked for in the system.

Agents are located in a social network that restricts their interactions and communications. The social network is described in the following way:

SN = {A, Rel}

where A is the set of agents populating the network and Rel is a neighborhood function. The neighborhood function is undirected, therefore ∀a,b ∈ A, if Rel(a,b) then Rel(b,a).

During the simulation, agents find resources (or learn tasks) that are offered as services:

Service = {ID, t, c, Clients}

where ID represents the identification number of the service provider (e.g. IP address), t ∈ T is the type of service offered by the service provider (e.g. storage, calculation time, data analysis), c is the capacity of the service provider, i.e. the number of clients it can satisfy while offering a good Quality of Service, and Clients ⊂ A indicates all the clients the service provider is holding at the moment.

Moreover, agents find out new necessities, which can be fulfilled by obtaining services from other agents:

Needs = {t, q, d}

where t ∈ T represents the type of service an agent needs, with a minimum Quality of Service level q, before timestep d. The Quality of Service of a provider is calculated as:

QoS = Capacity / |Clients|

For the sake of simplicity, the allocation of new resources and necessities to the agents is done automatically by the system, with different allocation probability distributions depending on the experiment.
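These tuples translate directly into data structures; a minimal Python encoding (field names are ours, and the guard against an empty client list is our addition):

    from dataclasses import dataclass, field

    @dataclass
    class Service:
        provider_id: int          # ID of the service provider (e.g. IP address)
        type: str                 # t in T, e.g. "storage", "data analysis"
        capacity: int             # c: clients servable at good quality
        clients: list = field(default_factory=list)  # Clients, a subset of A

        def qos(self) -> float:
            """Quality of Service = Capacity / |Clients|."""
            return self.capacity / max(len(self.clients), 1)

    @dataclass
    class Need:
        type: str                 # required service type t in T
        quality: float            # minimum QoS level q
        deadline: int             # must be satisfied before timestep d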


Figure 8.2: Social Network of the Web-Service Provisioning Scenario.


Moreover, the service's capacity and the service's expected quality follow two different probability distributions, both defined by the system and dependent on the environmental conditions being simulated.

Algorithm 6: Simulation Process.

 1  for timesteps do
        /* Environmental Phase: Resource & Needs Assignation */
 2      forall i ∈ agents do
 3          coin = random();
 4          if coin ≤ serviceAssignationProbability then
 5              s = createNewService(type, capacity);
 6              S.add(s);
 7              i.assignService(s);
 8              forall j ∈ agents do
 9                  if coin ≤ needAssignationProbability then
10                      j.assignNeed(s.type, deadline);
11                  end
12              end
13          end
14      end
        /* Search & Exchange Phase */
15      forall i ∈ agents do
            /* Ordering the needs */
16          i.orderNeedsByUrgency();
17          n = i.mostUrgentNeed();
            /* Query server about most urgent needed service */
18          Server ← i.serverRequest(n);
19          potentialContacts = i.receiveServerResponse() ← Server.serverResponse({c0,d0},{c1,d1},...,{ck,dk});
            /* Send a request to the closest of the available service holders */
20          contact = potentialContacts.closestDistance();
21          contact ← i.sendRequest(n, n.quality, n.urgency);
            /* Receive and evaluate response for norm violations */
22          response = i.receiveResponse() ← contact.decideRequest(i, n, urgency);
23          if response == Share then
24              i.Block(contact.distance);
25              contact.Block(contact.distance);
26          end
27          normativeResponse = i.decideNormativeReaction(response);
28      end
        /* Strategy Update Phase */
29      forall i ∈ agents do
30          nc = i.ObserveNormativeCues(i.neighbours);
31          i.UpdateStrategy(nc);
32      end
33  end

The simulation is run for 20,000 timesteps, and each timestep is structured in the following way:

1. According to the environmental situation, each agent has the possibility to provide some services and has some needs to satisfy (lines 2-14 in Algorithm 6). Needs are only assigned to agents when a service is created, which is why the needs assignation takes place after the creation of the service.

2. Each agent can ask the system for a needed service (line 18 in Alg. 6), thus receiving a list containing the service's holders (line 19 in Alg. 6). The server list has the following form: {c0,d0},{c1,d1},...,{ck,dk}, where ci represents the agent identity and di represents the distance from the requester to that agent.


3. Each agent can send the selected service holder a request (as can be seen in the schematic communication protocol shown in Fig. 8.3) with an urgency parameter. The service holder is chosen giving preference to those that have available services and are located at the closest distance (lines 20-21 in Alg. 6).

4. To simplify our experiments, agents can take any of these three decisions: "share", "not to share" or "not allowed to share". The decision-making functions will be further discussed in Section 8.3.4.

5. If the service holder's response is positive ("share") and the quality of the received service is at least equal to the expected one, the requester becomes a service holder, and both the original service holder and the requester remain blocked for a fixed number of timesteps (representing the transaction time); if the quality of the service is insufficient, the requester will continue to look for that service, discarding the one received (lines 23-26 in Alg. 6).

6. After receiving the service holder's response, if a low-quality service is received (line 27 in Alg. 6), the requester can decide to punish or sanction the service provider. As said in Section 8.3.2, punishment works by imposing a fine on the target, thus modifying the cost-to-benefit ratio of norm compliance and violation, while sanction is also accompanied by a normative message, making the existence and violation of the norm explicit to the target (and potentially to the audience); see the sketch after this list.

7. At the end of each timestep, agents observe the interactions that have occurred around them, this way checking the amount of norm compliance, violation, punishment and sanction, and consequently update the norm salience mechanism (as explained in Sec. 8.2). Our agents have access to perfect local information: they can record only the social normative information of their direct neighbors in the social network (lines 29-32 in Alg. 6).
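The difference between the two enforcement reactions in step 6 can be made concrete with a small sketch (the message fields and the 1:4 cost split are assumptions consistent with Sec. 8.3.5; all names are ours):

    PUNISH_COST, PUNISH_DAMAGE = 1, 4  # 1:4 punishment technology (Sec. 8.3.5)

    def punish(enforcer, target):
        """Plain punishment: a fine, with no normative information attached."""
        enforcer.utility -= PUNISH_COST
        target.utility -= PUNISH_DAMAGE

    def sanction(enforcer, target, norm):
        """Sanction: the same material damage, plus a normative message that
        makes the norm's existence and its violation explicit to the target."""
        punish(enforcer, target)
        target.receive_message({"norm": norm, "violated": True,
                                "sender": enforcer})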

Figure 8.3: The Interaction Protocol Between Agents


The two norms that govern the environment, described in Sec. 8.3.2, are formalized as follows:

• N1, the Always Share when Asked for a Service norm:
  ∀requesterAgent: Share ← contact.decideRequest(requesterAgent, requestedService, urgency);

• N2, the Share Only High-Quality Services norm:
  ∀requesterAgent: Share ← contact.decideRequest(requesterAgent, requestedService, urgency) iff requestedService.capacity > requestedService.Clients.size.

8.3.4 Agent Architectures

In environments where the designer has perfect knowledge of both the system's behaviour and the agents' necessities, hardwired strategies are the best option: the system designer can use scheduling algorithms to synchronize the usage of the resources amongst the agents to obtain the optimal distribution. However, in this work we are interested in scenarios where the dynamics of the system are unpredictable and unfold at runtime. Therefore we need agents able to adapt to environmental changes. In the experimental section (Sec. 8.3.5) we compare two agent architectures: the Instantaneous Utility Maximizing Agents (IUMAs) and the EMIL-I-As.

Aiming to maximize their instantaneous utility, IUMAs share their resources onlyif the probability of being punished is sufficiently high. Moreover, these agents neverpunish (as punishing is costly and reduces their utility).

EMIL-I-As are provided with the normative architecture described in Sec. 8.2. Before internalizing the norm, they comply with it only to avoid punishment, as IUMAs do. We refer to EMIL-I-As that are not provided with the internalization module as normative agents. They do not comply with norms in an automatic way, as internalizers do; they choose to comply with the norm only to avoid punishment. In other words, unlike internalizers, their goal to comply with the norm is an instrumental one: it is satisfied only to avoid punishment and is not an end in itself. Once the norm is internalized, agents follow it also in the absence of punishment. Initially, only norm holders know about the norms governing their environment. Any normative agent (EMIL-I-A or not) may or may not be a norm holder, depending on whether it is initialized with knowledge about the norms and the salience variable.

Therefore, normative agents observe norms under the threat of punishment (as punishment would reduce their utility). If a norm is intensively defended through the application of punishment, a normative agent will consequently observe it; on the other hand, when the norm is not defended, a normative agent will probably not observe it. However, agents do not know beforehand what the surveillance rates of the norm are. During the simulation, agents update (with their own direct experience and the observed normative social information) the perceived probability of being punished. Normative agents' decision making is also sensitive to a risk tolerance rate: when the perceived punishment probability is below their risk tolerance threshold, agents will decide to violate the norm; otherwise they will observe it. Although this process might provide agents with the highest benefit, it carries the computational cost of evaluating each of the options at every timestep. Before internalizing norms, a normative agent calculates the convenience of complying with the norm, while the salience mechanism works in the background. From the technical point of view, the normative agent will internalize a norm when both the following conditions are fulfilled: (1) the norm salience is above a certain threshold, indicating that the norm is important enough within the social group; and (2) the cost-to-benefit calculations for all the possible actions exceed the tolerance threshold.
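Before internalization, this compliance decision reduces to a threshold test; a sketch (the names and the frequency-style estimator for the perceived probability are our assumptions, not a specification from the dissertation):

    def comply_before_internalization(perceived_punishment_prob, risk_tolerance):
        """Instrumental compliance: observe the norm only while the perceived
        probability of being punished stays above the agent's risk tolerance."""
        return perceived_punishment_prob >= risk_tolerance

    def update_perceived_prob(punished_events, observed_violations):
        # Assumed estimator: fraction of experienced/observed violations
        # that were actually punished.
        if observed_violations == 0:
            return 0.0
        return punished_events / observed_violations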

Once a norm is internalized, EMIL-I-As stop executing the benefit-cost calculation and start observing the norm in an automatic way. Nevertheless, the salience mechanism is still active and continuously updated. This way, if necessary, agents are able to unblock the automatic action, restoring a cost-to-benefit analysis in order to decide whether to comply or not.

Internalizer Decision Making

Internalizers take two decisions: (a) whether to share a service or not, and (b) whether to punish or sanction norm violators. Concerning the first choice (see Alg. 7), to share a service or not: if no norm has been internalized yet, the agents will evaluate all the possible available actions and choose the one returning the highest payoff (lines 11-24 in Alg. 7). Otherwise, once the norm has been internalized, the agent will behave in an automatic way: the decision will not be evaluated further, but executed straightforwardly, following the compliant behaviour of the internalized norm (lines 1-10 in Alg. 7).

Concerning the second choice, when detecting a norm violation, the decision whether to punish/sanction3 the wrongdoers depends on the level of salience of the violated norm. This decision making is described in a self-explanatory way in Algorithm 8.

IUMAs Decision Making

The IUMAs' decision making for the first-stage decision is modelled with a classic Q-Learning algorithm (as in [Sen and Airiau, 2007, Villatoro et al., 2009]). The learning algorithm used here is a simplified version of the Q-Learning one [Watkins and Dayan, 1992].

The Q-Update function for estimating the utility of a specific action is:

Q_t(a) ← (1 − α) × Q_{t−1}(a) + α × reward        (8.1)

where reward is the payoff received from the current interaction and Q_t(a) is the utility estimate of action a after selecting it t times. When agents decide not to explore, they choose the action with the highest Q value. The reward used in the learning process is the one obtained from the interaction, also taking into account the amount of punishment received.
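A direct transcription of Eq. 8.1 in Python (the epsilon-greedy exploration rule is our assumption; the text only states that non-exploring agents pick the highest Q value):

    import random

    class QLearner:
        def __init__(self, actions, alpha=0.1, epsilon=0.1):
            self.q = {a: 0.0 for a in actions}  # Q_0(a) = 0 is our assumption
            self.alpha, self.epsilon = alpha, epsilon

        def update(self, action, reward):
            # Eq. 8.1: Q_t(a) <- (1 - alpha) * Q_{t-1}(a) + alpha * reward
            # (reward already includes any punishment received)
            self.q[action] = (1 - self.alpha) * self.q[action] + self.alpha * reward

        def choose(self):
            if random.random() < self.epsilon:        # explore
                return random.choice(list(self.q))
            return max(self.q, key=self.q.get)        # exploit: argmax Q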

The IUMAs' decision making for the second stage is also modelled with Q-Learning. However, as there are no potential risks consequent to not punishing (unlike in the meta-punishment situation presented by [Axelrod, 1986]), agents will always prefer not to punish.

3 As we said, only agents having recognized that there is a norm regulating their group will sanction; otherwise they will just use punishment.


Algorithm 7: Resource Sharing Decision Making.

Input: Deciding Agent i, Requesting Agent j, Required Service s, Requested Quality q, Requested Urgency u
Data: Potential Reward for Defection RDef = 3
Data: Potential Reward for Cooperation RCoop = 0

        /* Automatism in case of Internalized Norms */
 1  if i.Internalized(N1) then
        /* Norm 1 Compliant Behaviour */
 2      Share requested resource;
 3  end
 4  if i.Internalized(N2) then
        /* Norm 2 Compliant Behaviour */
 5      if s.clients.size < s.capacity then
 6          Share requested resource;
 7      else
 8          Do not Share;
 9      end
10  end
        /* If no norm has been internalized, Benefit-Cost Calculation considering the presence of Norms */
        /* Agents have to update their Beliefs about Punishment Policies */
11  i.updateBeingPunishedProb();
12  if i.RiskTolerance ≤ i.perceivedPunishmentProb(N1) ∨
13     i.RiskTolerance ≤ i.perceivedPunishmentProb(N2) then
14      RDef ← Deduce Punishment Cost;
15  end
16  if RDef ≥ RCoop then
17      Do not Share;
18  else
19      if i.Salience(N2) ≥ i.Salience(N1) ∧ i.Salience(N2) ≥ Norm Activation Value then
20          Not allowed to Share;
21      else
22          Share;
23      end
24  end


Algorithm 8: Decision Making in front of norm violations.

Input: Requesting Agent i, Requested Agent j, Required Service s, Requested Quality qreq, Requested Urgency u, response, Obtained Quality qrec

/* Response against an Excused Service Denial */
if response == Not allowed to Share then
    /* If low quality was requested... */
    if qreq < 1 then
        /* ...and N1 is more important, it is considered a Punishable Violation */
        if i.salience(N1) > i.salience(N2) then
            if i.salience(N1) ≥ ACTIVATION VALUE then
                i.sanction(j);
            else
                i.punish(j);
            end
        end
    end
end
/* Response against a Service Denial */
if response == Do Not Share then
    /* If low quality was requested... */
    if qreq < 1 then
        /* ...and any norm is active, it is considered a Punishable Violation */
        if i.salience(N1) ≥ ACTIVATION VALUE then
            i.sanction(j);
        else
            i.punish(j);
        end
    else
        if i.salience(N2) ≥ ACTIVATION VALUE then
            i.sanction(j);
        else
            i.punish(j);
        end
    end
end
/* Response against a Low Quality Sharing */
if response == Share ∧ qrec < qreq then
    /* If it is not an urgency... */
    if u < i.salience(N2) then
        /* ...and N2 is active, it is considered a Punishable Violation */
        if i.salience(N2) ≥ ACTIVATION VALUE then
            i.sanction(j);
        else
            i.punish(j);
        end
    end
end
/* If no normative action has been taken, and either norm is highly salient, educational messages are sent */
if No Punishment/Sanction Yet then
    if i.salience(N1) > i.salience(N2) ∧ i.salience(N1) ≥ ACTIVATION VALUE then
        i.educate(j);
    end
    if i.salience(N2) > i.salience(N1) ∧ i.salience(N2) ≥ ACTIVATION VALUE then
        i.educate(j);
    end
end



8.3.5 Experimental Design

In the simulation platform presented, two important factors can vary, thus affecting the system's behaviour: (1) the services' capacity, and (2) the expected desired quality of the services.

In order to check the adaptability of EMIL-I-As, the services' capacities and their expected desired quality vary during the simulations.

To reduce the search space, the service and task assignment rates have been fixed to a constant 10%: resources are created at an average rate of 1 every 10 timesteps, and at that timestep, 10% of the population is assigned tasks that need that resource. Tasks are assigned only when resources are created.

Agents are located in a scale-free network (representative of theoretical social networks [Newman, 2003, Albert and Barabasi, 2002]). This topology restricts the agents' observation (with respect to the social and normative information) to their direct neighbors, but they can potentially interact (ask for and receive services) with any other agent in the network.
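Such a topology can be generated, for instance, with the Barabási-Albert preferential attachment model (the attachment parameter m = 2 is our choice; the dissertation does not specify it):

    import networkx as nx

    # Scale-free interaction topology for the 50-agent population.
    # m = 2 edges per newly attached node is an assumed parameter.
    social_network = nx.barabasi_albert_graph(n=50, m=2)

    # Observation is local: an agent only records the normative
    # information of its direct neighbors.
    def observable_agents(agent_id):
        return list(social_network.neighbors(agent_id))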

In this work, the cost of punishment has been fixed at 1:4, meaning that punishing costs 1 unit to the punisher while reducing the target's endowment by 4 units (the 1:4 punishment technology is used because it has been found [Nikiforakis and Normann, 2008] to be more effective in promoting cooperation). The decision making of both normative agents and internalizers is not affected by the cost of applying a punishment/sanction to the target; however, the cost of being punished/sanctioned affects the agents' decision to cooperate or defect in the next rounds. The results presented in this section are the average results of 25 simulations.

All non-IUMA agents are initialized with a constant propensity to violate norms, and the perceived probability of being punished/sanctioned is equal to or lower than 30%. From the beginning of the simulation, some agents (this quantity varies from experiment to experiment) are already endowed with the two norms, N1 Always Share when Asked for a Service and N2 Share Only High-Quality Services. We refer to these agents as norm holders, and the initial salience of their norms is set to 0.9. Before proceeding, and for the sake of clarification, as specified in Sec. 8.3.4, a normative agent is an agent endowed with the EMIL-I-A architecture that has not internalized the norm yet, while by internalizers we refer to normative agents that have the internalization module and can reach the internalization state, complying with norms in an automatic way. Both types of agents (internalizers and normative) may or may not be norm holders, i.e. know about the existence of the norm, with a salience value assigned to it.

The simulations are populated with a constant amount of 50 agents, with a variableamount of internalizers (whether or not norm holders) and IUMAs.


8.3.6 Experiment 1: Adapting to Environmental Conditions

One key feature of our internalizers is their ability to dynamically adapt to unpredictable changes. To test this ability, we have designed several dynamic situations where the providers' capacities and the requesters' desires vary, in order to dynamically change the norm that applies along the simulation without the knowledge of the agents. From the beginning of the simulation, the environment is fully populated by internalizers that already have the two norms, highly salient, stored in their minds (norm holders)4.

In Fig. 8.4, the x-axis shows the timesteps of the simulation, and the y-axis indicates the number of internalizers.

Firstly, in Fig. 8.4(a) we show the dynamics of a basic scenario where the services are endowed with infinite capacity to attend clients, and the requesters do not care about the quality of the services they receive (meaning that even services with a low quality satisfy their needs). We can observe that, after a number of timesteps, the internalization process starts working, resulting in an increasing number of agents that internalize norm 1 (Always share when asked for a service). Why is norm 1 spreading and being internalized rather than norm 2 (Share only high-quality services)? This is due to the fact that in this experimental condition agents looking for a service are not interested in its quality; thus, even though they receive services with a low quality, their needs are satisfied and they will not punish/sanction the service holder that provided them. Sharing a service, whatever its value, is interpreted as an action compliant with norm 1, and since instances of this action exceed the number of actions in which high-quality services are shared, the salience of this norm is higher than the other's. This experiment confirms that our internalization architecture selects the correct norm to internalize.

The second experiment is a slight but important variation of the previous one. This time, the services' capacity is restricted (to 3 clients). Results in Fig. 8.4(b) show that agents correctly internalize the norm that rules the situation (i.e. norm 1), although the internalization process is slower than in the previous experiment. The reason for the delay lies in the dynamics taking place in this experiment: no clear information is given to the agents for deciding which one, between norm 1 and norm 2, is governing the system. Internalizers are initialized with both norms at high salience. When the resource holders start sharing, they provide requesters with high-quality resources, even though the requesters' needs would be satisfied also by low-quality resources. Agents interpret high-quality transactions as actions compliant with norm 2, thus increasing its salience value. After a while, agents start sharing resources of lower quality, because the list of requesters exceeds the amount of high-quality resources. Low-quality resource transactions make the salience level of norm 2 decrease, while the salience level of norm 1 increases to its maximum value (a necessary condition for norm internalization, as shown in Sec. 8.2.1). We can observe how all these micro-dynamics in the individuals affect the global behaviour of the system, producing the delay shown in Fig. 8.4(b).
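
The qualitative salience dynamics driving these results can be sketched as follows; the update weights and the maximal-salience internalization condition below are illustrative stand-ins for the calibrated EMIL-I-A parameters:

```python
DELTAS = {
    "compliance": +0.02,            # compliant action observed
    "sanctioned_violation": +0.05,  # violation met with punishment/sanction
    "unpunished_violation": -0.04,  # violation with no reaction
}

def update_salience(salience: float, event: str) -> float:
    """Clamp salience to [0, 1] after each observed event."""
    return min(1.0, max(0.0, salience + DELTAS[event]))

def can_internalize(salience: float) -> bool:
    # Maximal salience is a necessary condition for internalization
    return salience >= 1.0
```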

In the third experiment, we test the agents' capacity to internalize norm 2: in this situation the services' capacity is restricted to 3 clients and the requesters' needs are satisfied only when receiving high-quality resources. In general, we observe that

4 As we will see in Sec. 8.3.8, the internalization process speed is proportional to the initial amount of norm holders.


(Figure 8.4 shows five panels; in each, the x-axis spans timesteps 0–20000 and the y-axis the number of internalizers, 0–50, with one curve per norm: Int N1 and Int N2.)

(a) Experiment 1: Infinite services' capacity and expected low quality of the services (meaning that agents need to have the service without paying attention to its quality).

(b) Experiment 2: Restricted services' capacity (restricted to 3 clients) and expected low quality of the services.

(c) Experiment 3: Restricted services' capacity (restricted to 3 clients) and expected high quality of the services.

(d) Experiment 4: The services' capacity is variable along the simulation: it is infinite during the first 2000 timesteps and, after that moment, linearly decreases from 20 to 1, reaching the service exploitation point (where they can no longer offer high-quality services) after timestep 8000. Agents are assigned tasks to be fulfilled at a high quality level along the entire simulation.

(e) Experiment 5 (The Complex Loop): Before timestep 2000, the capacity of the services is infinite, but it is set to 3 after that. Before timestep 5000 and after timestep 15000, agents are assigned needs that are satisfied also with low-quality services. During the rest of the execution, the expected quality of the services is high.

Figure 8.4: Different Combinations of Resources Capacities Distribution and Expected Quality Distributions.


the agents internalizing norm 2 do not reach the majority, but when they do, they do so quickly. This is because, until the number of resources is high enough to satisfy all the requests with their expected quality, being compliant with norm 2 can also be interpreted as an action compliant with norm 1, thus maintaining the salience of norm 1 high.

In the fourth experiment a dynamic change in the environment is included. The system is programmed in such a way that for the first 2000 timesteps the capacity to provide resources is infinite; after that, the capacity linearly decreases from 20 to 1. Agents' needs are satisfied only when they receive high-quality resources. We observe that after timestep 8000, when the capacity of the services has significantly decreased, agents slowly start deinternalizing norm 1 and substituting it with norm 2. This experiment shows how our agents are able to adapt to these dynamic situations.
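
The capacity schedule of this fourth experiment can be reconstructed as a simple piecewise function; the linear interpolation endpoints below are our reading of the description above:

```python
import math

def service_capacity(t: int) -> float:
    """Experiment 4 capacity schedule: infinite until t = 2000,
    then a linear ramp from 20 down to 1, reached at t = 8000."""
    if t < 2000:
        return math.inf
    if t >= 8000:
        return 1.0
    return 20.0 + (1.0 - 20.0) * (t - 2000) / (8000 - 2000)
```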

In order to test their speed of adaptation, we have designed one last experiment. In Fig. 8.4(e), we show the result of the experiment named The Complex Loop, where we can observe that our internalizers perform efficiently even in complex situations, i.e. where not only the environment (the services' capacity) but also the agents' preferences (the desired quality of the services) change during the simulation.

The Complex Loop experiment shows that when the services' capacity is infinite (meaning that high-quality services are always available) and the agents' needs are satisfied by high-quality (HQ) resources, norm 1 is internalized. At timestep 2000, an abrupt change occurs in the environment, making the services' capacity drop to 3, and agents switch to internalizing norm 2. After timestep 5000, agents' preferences change (being satisfied now also with low-quality resources), producing another change in the internalization dynamics: agents turn back to internalizing norm 1. Finally, agents' preferences change again at timestep 15000, preferring HQ resources and internalizing norm 2 again.

We can conclude that a population of internalizers can adapt to sudden and unexpected environmental changes in a flexible manner.

8.3.7 Experiment 2: Internalizers vs IUMAs

In this experiment, the internalizers' performance is compared with that of the IUMAs. We run several simulations to cover the search space of different proportions of Internalizers5 and IUMAs.

As shown in Fig. 8.5, a higher proportion of Internalizers allows the population to adapt faster to the changing environmental conditions. The figure shows the percentage of unsuccessful transactions6 that occurred in the system during a situation identical to the one presented in the Complex Loop experiment, except that the population is a combination of Internalizers and IUMAs.

The results show that the number of unsuccessful transactions is inversely proportional to the number of internalizers. The explanation of this result can be found in the way in which IUMAs decide whether or not to comply with the norm: when they decide to

5 Also in this scenario, Internalizers are also norm-holders.
6 We consider transactions as unsuccessful when a service provider offers a service of a quality lower than the expected one, or when a service, though available, is not shared.


Figure 8.5: Percentage of Unsuccessful Transactions with Different Proportions of Strategic Agents and Internalizers.

share a resource, their payoff is reduced (when sharing, agents are blocked for a number of timesteps); therefore they decide to free-ride. In other words, they always ask for services, but they never share.

On the other hand, internalizers decide to comply with or violate the norm not only to maximize their material payoffs, but also because they recognize a norm in their social environment and form the goal to comply with it, depending on its salience. Consequently, when the environment changes and agents need to find out the new norm, internalizers perform a certain number of unsuccessful transactions before adapting to the new situation. The simulation data show that unsuccessful transactions are about 50% lower (from 45% to 22.9%) in groups entirely populated by internalizers than in those populated by IUMAs.
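
The contrast between the two decision rules can be summarized in the following sketch, where the trade-off weight between material gain and normative drive is an illustrative parameter rather than the fitted architecture value:

```python
def iuma_shares(gain_if_share: float, gain_if_freeride: float) -> bool:
    # IUMAs are purely utilitarian: since sharing blocks the agent for
    # several timesteps, free-riding dominates and they never share.
    return gain_if_share > gain_if_freeride

def internalizer_shares(net_material_gain: float, salience: float,
                        norm_weight: float = 1.0) -> bool:
    # Internalizers add a normative goal whose strength grows with the
    # salience of the recognized norm.
    return net_material_gain + norm_weight * salience > 0
```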

8.3.8 Experiment 3: Effect of Initial Norm Holders

This experiment aims to analyze how the initial amount of norm holders affects the performance of a group fully populated by internalizers. As explained in Sec. 8.2, internalizers do not know which norms are in force in the group at the simulation onset; therefore the presence of norm holders, explicitly eliciting the norms, is necessary for triggering the process of norm recognition.

Focusing again on the Complex Loop experiment (presented in Sec. 8.3.6 and shown in Fig. 8.4(e)), Fig. 8.6 shows the average norm salience per agent: the x-axis shows the simulation time evolution, while the y-axis indicates the amount of initial norm holders (randomly located in the network). We obtained similar results in the rest of the experiments performed in Sec. 8.3.6, but we decided to analyze only this situation because of its complexity and completeness (as it contains all the possible dynamics of our scenario with respect to norm internalization and de-internalization). Figure 8.6(a) represents the salience dynamics of norm 1 and Figure 8.6(b) that of norm 2: depending on the moment of the experiment, only one of the two norms really prescribes how to behave; that is why its salience is higher than the other one's (for a more detailed description of the relation between the two norms, see Sec. 8.3.6).


(Figure 8.6 shows two heat maps; in each, the x-axis spans timesteps 0–20000, the y-axis the number of initial norm holders, 10–50, and the colour scale the average salience, 0–1.)

(a) Norm 1 Average Salience. (b) Norm 2 Average Salience.

Figure 8.6: Complex Loop Experiment Average Salience.


Figure 8.7: Different norms coexisting in the same social environment.

During the first 2000 timesteps of the simulation, we can appreciate the impact of a different number of norm holders on the norms' salience: when this number increases, so does the norms' salience. Special attention should be paid to the situation with no initial norm holders: here norms are never recognized, as no explicit norm elicitation occurs (by explicit norm invocation or by sanction). Even a small number of initial norm holders (10% of the population in this case) allows agents to recognize the norms and internalize them. Once the norms are recognized and stored according to their salience degree, agents will start both complying with them and defending them. Thus, norm holders are necessary to trigger a virtuous circle from compliance with the norms to their enforcement. After the two norms have been recognized (around timestep 5000), the number of initial norm holders no longer affects the subsequent dynamics.

8.3.9 Experiment 4: Testing Locality: Norms Coexistence

In self-organized societies, such as the ones we are interested in, norms are an important tool for solving the dilemma of public good provision. Since individual behaviours affect the group's welfare, the group itself needs to exercise control over its members. Consequently, agents can create and impose a norm depending on the goals of the society. Moreover, in this type of distributed system, different dilemmas could appear in different parts of the social network, depending (again) on the distribution of the population's preferences and the environmental conditions. Our internalizers are able to cope with the locality of norms, thus allowing the coexistence of different competing norms in the same social network. To observe this result we have performed the following experiment.

We have placed 60 agents in a one-dimensional lattice, with the neighborhood size set to 6 (i.e. each agent has a constant number of 6 neighbors). In Fig. 8.7, agents in the top-right and bottom-left quarters of the ring are assigned infinite-capacity resources and desire low-quality services, while the rest of the agents are given limited-capacity services and desire high-quality tasks. The colour of the nodes indicates the internalized norm (light colour standing for norm 1, and dark colour for norm 2). The


self-adaptive capability of our internalizers shows a good performance in the designed environment: two norms can coexist in different areas of the same social network.
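
A sketch of this experimental setup, using networkx (the quarter boundaries and condition labels are our assumptions about how the ring was partitioned):

```python
import networkx as nx

N, K = 60, 6  # 60 agents on a ring, each with exactly 6 neighbours

# p=0.0 keeps the regular ring lattice (no random rewiring)
ring = nx.watts_strogatz_graph(n=N, k=K, p=0.0)

def conditions_of(node: int) -> str:
    quarter = (4 * node) // N  # 0..3 around the ring
    # quarters 0 and 2: infinite capacity, low desired quality (norm 1)
    # quarters 1 and 3: limited capacity, high desired quality (norm 2)
    return "norm1_conditions" if quarter in (0, 2) else "norm2_conditions"
```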

8.3.10 Experiment 5: Dealing with Emergencies: Selective Norm Violation

One of our claims is that internalizers are able to unblock an internalized norm not only in situations where its salience is very low (meaning that the norm is disappearing), but also in emergency situations.

While IUMAs cannot handle normative requests with different levels of urgency (because they are not endowed with normative architectures), our internalizers (thanks to the salience module) can. In the ambulance case, the urgency of the situation is announced by the siren, thus allowing agents to unblock the automatic behaviour of stopping at the red traffic light (and the ambulance to pass), as the emergency situation is more salient than the observance of the norm. In our distributed web-service scenario, we can imagine the following situation: one of our agents is writing a scientific paper which includes a number of important calculations. However, the day before the deadline, the agent in question realizes it needs to rerun some of the experiments; in order to get the results in time, it needs to use different calculation clusters distributed amongst its peers, who are requested accordingly. The requested agents can choose to follow the FIFO (first-in, first-out) norm or, given the urgent situation, make an exception and proceed to the execution of our last-minute scientist's calculations. It is easy to see how this norm violation performed by its neighbours/colleagues would bring a benefit to our agent (as it will have the results in time for submission).

Experimental results show us that internalizers successfully handle emergencies when these are explicitly specified (like the siren in the ambulance case). However, there is a cost associated to this advantage. An emergency is interpreted by the audience as a non-punished norm violation, consequently reducing the salience of the norm (stopping at the red traffic light). If agents face the same emergency too often, the salience of the norm "normally" regulating the situation will decrease, and this reduction unblocks the internalization process, leading the agent back to the normative (benefit-to-cost calculation) phase. We can then affirm that emergencies can be managed by internalizers, but only when they occur sporadically; otherwise they are interpreted as a change in the salience of the norm governing the environment.
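
The cost of repeated emergencies can be sketched as a simple decay rule; the decay step and the de-internalization threshold are illustrative assumptions:

```python
def observe_emergency(salience: float, decay: float = 0.04,
                      stay_threshold: float = 0.9) -> tuple:
    """An emergency exception looks, to observers, like an unpunished
    violation: salience decays, and once it falls below the threshold
    the agent returns to deliberate benefit-to-cost reasoning."""
    salience = max(0.0, salience - decay)
    still_internalized = salience >= stay_threshold
    return salience, still_internalized
```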

8.3.11 Experiment 6: Topological Location

Finally, we pay attention to the importance of the location of our INHs (initial norm holders) in the topology, when dealing with scale-free networks. As agents use local information to adapt their notion of norm activation, those agents located closer to INHs will have a stronger and faster norm recognition process and salience adaptation. In other chapters of this thesis (Chapter 4), we identified that agents located in certain positions within a scale-free network played an important role in the emergence of conventions. We hypothesize that in irregular networks, such as scale-free ones, agents with more connections will be the most adequate to be initialized as INHs, as they will be


able to influence a larger portion of the population, producing a faster spread of norms within the population.

In order to analyze the effect of the topological location of INHs, we adjust our experimental platform so as to position the INHs in certain locations of the topology and contrast their effects (a placement sketch follows the list):

• Hubs: INHs will be located in the nodes with the highest degree.

• Leafs: INHs will be located in the nodes with the lowest degree.

• Self-Reinforcing Structures (SRSs): INHs will be located in the nodes with a higher betweenness inside a Self-Reinforcing Structure. Even though the concept of Self-Reinforcing Structure makes sense in the convention emergence problem, we believe that the position of these agents (connecting small communities with the rest of the society) can have an important impact on the distribution of norms.
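
The sketch below illustrates one way to select the INH nodes for the three treatments; in the SRS treatment we use global betweenness centrality as a proxy for membership of a Self-Reinforcing Structure, which is a simplification of the description above:

```python
import networkx as nx

def place_inhs(g: nx.Graph, n_inhs: int, treatment: str) -> list:
    if treatment == "hubs":     # highest-degree nodes
        ranked = sorted(g.degree, key=lambda kv: kv[1], reverse=True)
    elif treatment == "leafs":  # lowest-degree nodes
        ranked = sorted(g.degree, key=lambda kv: kv[1])
    elif treatment == "srs":    # betweenness as an SRS proxy
        bc = nx.betweenness_centrality(g)
        ranked = sorted(bc.items(), key=lambda kv: kv[1], reverse=True)
    else:
        raise ValueError(f"unknown treatment: {treatment}")
    return [node for node, _ in ranked[:n_inhs]]
```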

In order to obtain the results, we run each of the treatments 25 times and plot the average results. As we want to observe the effect of the location of the INHs on the distribution of norms, we load a society fully populated with internalizers, and we vary the number and position of INHs. Agents are located in the first scenario used in Sec. 8.3.6, where service capacities are infinite and low quality is expected by agents; that way we only pay attention to the average salience of Norm 1: Always Share when Asked for a Service.

The experimental results shown in Figure 8.8 contradict our initial hypothesis, showing that locating INHs on the leafs produces a stronger effect on the average population salience. Even though it seems counterintuitive, the explanation is straightforward. Because of the design of our agents, INHs' decisions are motivated by the information gathered at a local level. In the Hubs treatment, INHs access more information than in the Leafs treatment, as they have more neighbours. For that reason, hub INHs initially observe more violations than the others. Leaf INHs, on the other hand, are only connected to very few agents, and therefore the information that they gather is reduced with respect to the hub INHs. Moreover, to use an example, if a leaf INH is connected to only one agent, and this one is a violator, the INH will punish/sanction this violation. As we saw previously, a punished/sanctioned violation has a strong impact on salience. A hub INH with a large number of neighbours, by contrast, will observe only a reduced fraction of violations being punished/sanctioned, with the others remaining unpunished. As we have said, those actions have an effect on the salience that motivates the INHs' decisions. Consequently, hub INHs lose their motivation to punish/sanction faster than leaf INHs.

These results suggest that the use of punishment in a self-organizing society should be a bottom-up process, so that agents do not lose the motivation to punish/sanction free-riders. Moreover, this result is encouraging for policy-makers, as it is not necessary to place INHs in hub positions of a network (which are hard to obtain because of the degree of these nodes); the correct positions are the leafs of the network.


(Figure 8.8 shows three panels; in each, the x-axis spans timesteps 0–5000 and the y-axis the average salience, 0–0.8, with one curve per treatment: Leafs, Hubs and SRS.)

(a) 10 INHs. (b) 30 INHs. (c) 50 INHs.

Figure 8.8: Effect of the topological positioning of INHs.


8.4 Conclusions

When Vygotsky first formulated his theory of internalization, he noted that only "the barest outline of this process is known" [Vygotskii and Cole, 1978, p. 57]. We do not know, yet, how people manage to internalize beliefs and precepts with reasonably adequate success, partly because we still do not agree about what to investigate, or what we mean by this notion. Consequently, no useful notion and model are available for applications, despite its wide and profound implications. Questions such as how norm internalization unfolds, which factors elicit it, and which are its effects, obstacles and counter-indications are issues of concern for all of the behavioral sciences. The internalization of social inputs is indispensable for the study and management of a broad spectrum of phenomena: from the development of a robust moral autonomy to the investigation and enforcement of distributed social control; from the solution to the puzzle of cooperation to fostering security and fighting criminality.

After a cognitive definition of the subject matter, this chapter presented and discussed the building blocks of a rich cognitive model of internalization as a multi-step process, including several types and degrees of internalization. Next, factors favoring different types of internalization were discussed. The modular character of BDI architectures like EMIL-A was shown to fit the approach advocated here.

In this work we have also implemented the internalization module and presented the results of its incorporation into the existing platform, creating the new EMIL-I-A (EMIL Internalizer Agent). Internalizers have been tested in a tragedy-of-the-digital-commons scenario, where the emergence of cooperation is a difficult task to achieve, and they have proven to deal with this task successfully. The simulation data have also shown that internalizers are able to self-adapt when facing situations in which the environment can change quickly. We have also observed that the amount of initial norm holders and internalizers speeds up the process of norm convergence, even when the scenarios are dynamic and different norms have to be internalized and de-internalized. Another interesting result is how the subjective character of salience allows the coexistence of different norms in the same social environment: depending on the agents' interests and the environmental conditions, different norms can emerge in the same environment. Finally, the experiments have shown that our internalization architecture is flexible enough to handle emergencies and decide to violate an already internalized norm. We claim that internalization provides agents with self-policing capabilities that are very useful in settings where social control is unfeasible or expensive.

What is the added value of a rich cognitive model of internalization, as compared to simpler ones (e.g., reinforcement learning)? There are several competitive advantages. First, reinforcement learning does not account for the main intuition shared by different authors, i.e. the idea that internalization makes compliance independent of external enforcement. Second, a rich cognitive model, namely a BDI architecture with its specific modules, accounts for different types and degrees of internalization, bridging the gap between self-enforcement and automatic responses. Third, a BDI architecture accounting for different levels of internalization allows flexibility to be combined with automatism, as well as thoughtless conformity with autonomy. A BDI system can host automatism, but a simpler agent architecture does not allow for flexible, innovative and


autonomous action.


Chapter 9

Conclusions

In this final chapter we summarize the main contributions of the thesis, which have already appeared partially in the respective sections of the dissertation. We also state the future lines of research that this thesis has opened. For the sake of clarity we divide both the conclusions and the future work into two sections, one for the convention emergence part and the other for the emergence of cooperation in Multi-agent Systems. However, there is a main conclusion to be extracted from the thesis as a whole: we have presented a game-theoretical differentiation for social norms, and a range of different techniques that can be used by agents (independently) in order to achieve their tasks in a more efficient manner. This thesis shows how norms are essential for the achievement of coordination between agents without a centralized authority, even when these norms go against the agents' self-interest.

9.1 Convention Emergence

9.1.1 Conclusions

In this thesis we have developed a general framework for the convention emergence problem. The problem of convention emergence is a key problem to be solved for real self-organized systems: allowing agents to reach a common convention, out of all the possible ones, is one of the simplest coordination tasks to be achieved in a decentralized way. In our approach we focus on the emergence of conventions following a social learning approach, as it seems to be the most realistic and least intrusive for the rest of the agents in the population (others have studied this problem by accessing private information of the other agents). Our experimental framework has been designed to vary a number of parameters that were never previously explored in the literature (type of game, players per interaction, conventions available, strategy update rules, learning algorithms and topologies), which we believed were essential for a complete understanding of the phenomenon.

The systematic variation of these parameters allowed us to analyze the dynamics of the process of convention emergence and the reasons why previous researchers


did not overcome the threshold of a 90% emergence rate. We have seen how certain configurations of the social network and some strategy update rules promote the emergence of subconventions that remain meta-stable. It is the first time in the research on convention emergence that this meta-stability, introduced by social learning and the topology of interactions, has been identified.

In order to dissolve those subconventions, we analyzed the state of the art of the socially-inspired instruments used in MAS and proposed two new ones: rewiring and observance. Rewiring allows agents to break links with agents from which they are not getting satisfactory interactions, while observance allows agents to obtain information about the last played strategy of a certain subset of agents. With the usage of these instruments independently, the emergence of global conventions is accelerated in some scenarios, while others remain meta-stable. Specifically, we observe that when our instruments are only fed with local information, the subconventions remain meta-stable.
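
The two instruments admit a compact sketch; the satisfaction bookkeeping and the threshold below are illustrative assumptions, not the exact thesis implementation:

```python
import random
import networkx as nx

def rewire(g: nx.Graph, agent, satisfaction: dict, threshold: float = 0.0):
    """Drop the link to the least satisfactory neighbour (if below the
    threshold) and attach to a random non-neighbour instead."""
    neighbours = list(g.neighbors(agent))
    if not neighbours:
        return
    worst = min(neighbours, key=lambda n: satisfaction.get(n, 0.0))
    if satisfaction.get(worst, 0.0) < threshold:
        g.remove_edge(agent, worst)
        candidates = [n for n in g.nodes
                      if n != agent and not g.has_edge(agent, n)]
        if candidates:
            g.add_edge(agent, random.choice(candidates))

def observe(last_strategy: dict, subset) -> dict:
    """Observance: read the last played strategy of a subset of agents."""
    return {a: last_strategy[a] for a in subset}
```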

The recognition of this locality effect led us to discover the existence of Self-Reinforcing Structures in scale-free networks. These structures are groups of agents in which a subconvention can emerge and remain stable because of the interaction topology, which directly affects the social learning. The Self-Reinforcing Structures detected are claws and caterpillars in scale-free networks. With an instrument that combines the power of observance and rewiring, we achieved the dissolution of the Self-Reinforcing Structures, allowing the system to achieve full convergence while accessing only public and local information.

9.1.2 Future Work

Throughout this work, we have seen how the convention emergence problem shows interesting dynamics when studied in complex networks such as scale-free ones. As this type of network represents theoretical social networks, the future work in this area of research involves a careful analysis of convention emergence in scale-free networks.

In this thesis we have already thrown some light on the dynamics observed in those types of networks, and on how the existence of Self-Reinforcing Structures delays the process of convention emergence. Our solution (social instruments) proposes the dynamic rewiring of the network. However, we did not concentrate on the maintenance of the characteristics that define such networks: scale-free networks are characterized by a number of properties (short average network diameter, small clustering). Our rewiring instrument does not pay attention to maintaining those properties which determine a scale-free network. Recent work [Scholtes, 2010] presented a distributed rewiring scheme that is suitable for effectuating scale-free overlay topologies. One of our short-term tasks is the development of an algorithm that produces effective rewiring, maintaining the scale-free network while accessing only local information.

Another line of research that we consider of vital importance is the analysis of reconfiguration techniques. This thesis has studied how conventions are achieved in static environments. It would be interesting to study how our social instruments ease the process of reconfiguration to different conventions if one of them loses its validity within the social context of the agents.


Finally, we plan to introduce more complex protocols to be learnt by agents: in this part of our work, agents only had to learn how to behave in a single-action interaction. However, we plan to introduce norms that involve more than one action, allowing agents to learn normative patterns in the form of protocols. By studying the complexity of the protocols (the number of available actions, the length of the protocol and the typology of agents) we will observe how our agents are able to coordinate for more complicated tasks.

9.2 Essential Norms and Emergence of Cooperation

9.2.1 Conclusions

This thesis has shown a game-theoretical approach to norms that differentiates between conventions and essential norms, as games that represent different dynamics with respect to the interests of the individuals. We have seen how conventions are not a solution for certain phenomena, like situations that represent common-good games. Initially we envisioned a specific type of peer punishment (distributed punishment) to impose the emergence of cooperation and its maintenance. Contrary to monolateral punishment technologies (like that used in [Dreber et al., 2008]), distributed punishment seemed attractive since it would reduce individual costs (by dividing the cost of punishment amongst the punishers) while obtaining higher social costs (by making defection a less attractive option for norm-violators).

The empirical results obtained with human subjects showed us that motivations towards cooperation are not only driven by external enforcement, but are affected by other factors as well. Our hypothesis was that certain types of peer punishment have a stronger impact on the agents' decision making than the utilitarian damage.

In this thesis we have established the distinction between punishment (utility deduction) and sanction (utility deduction plus norm elicitation), and we have sketched a BDI agent architecture (EMIL-I-A) whose goals (and beliefs) are affected not only by a benefit-cost calculation, but also by the existence of norms (and their importance within the social context).

Experiments with human subjects confirmed the hypothesis of the difference between punishment and sanction, using an experimental platform where humans could participate in the experiments through an electronic institution. These experiments have also served to fine-tune the parameters that regulate the agent architecture, providing another successful example of cross-fertilization between experimental economics and computer science [Grossklags, 2007]. This differentiation between punishment and sanction is an important step forward, especially in economics research, as so far little importance has been given to non-costly communication messages (assuming them to be cheap talk [Farrell and Rabin, 1996]). The usage of sanction produces a more resilient individual, as its decision making is also influenced by normative motivations.

We have exploited the capabilities of EMIL-I-A in simulated scenarios, proving that our agents achieve the emergence of cooperation with lower social costs and in a more resilient way. We have performed a detailed analysis varying the amount of our agents in populations of pure utilitarian agents, observing the adaptive capabilities of our


architecture. Another interesting contribution of this work is the intelligent Dynamic Adaptation Heuristic, which allows agents to dynamically change the amount of punishment/sanction sent to norm-violators, producing an efficient punishment mechanism. One final interesting result confirms the robustness of sanction against invasions of free-riders into the population.

Finally, we have studied how norm internalization can be incorporated as a separate module in the agents' minds, allowing them to avoid a continuous evaluation of the possible actions in their decision-making process. This internalization mechanism allows agents to save computational resources by firing automated actions in certain internalized situations. We have studied how internalizers perform in different types of scenarios, achieving good performance without sacrificing adaptation skills. Moreover, the final experiments have allowed us to test the locality of social norms, making possible the emergence of different norms in different areas of the topology.

9.2.2 Future Work

The emergence and maintenance of cooperation is a problem that worries the policy makers of open distributed systems, as the natural tendency in these types of systems is to defect and free-ride, producing a collapse of the system. As an example, when Gnutella was one of the most popular p2p sharing platforms, 70% of its users did not share resources [Adar and Huberman, 2000]. In this thesis we have studied how different punishment technologies can affect the decision making of agents in a distributed manner, without the necessity of a centralized authority, which would be costly for the system. Unfortunately, the sanctioning technology that we propose also implies a cost for the punisher, entering a second-order public good game. For that reason, and as future work to be performed in this area of research, we are interested in studying how reputation can work as a cost-free mechanism that ensures cooperation. By providing agents with the ability to communicate, they can send evaluations that affect their future interactions with other agents. Agents need to be endowed with trust and reputation mechanisms that (1) aggregate the rumors they receive, and (2) activate a broadcasting mechanism to spread rumors. Given such a mechanism, agents also need a heuristic to determine what to communicate and to whom. This problem is essential to understand within the scope of social networks.

As a more direct task in our future work plan, we need to carefully analyze the Dynamic Adaptation Heuristic used for efficient sanctioning. At the moment, the heuristic changes the damage associated with punishment and sanction depending on the amount of norm-violators within the agents' social environment. However, we believe that this adaptation should also be proportional to the changes in the environment in order to be more efficient: if a larger amount of norm-violators appears, the damage of sanctions should be larger (instead of increasing gradually, which produces a number of punishment acts with no deterrent effect).
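
A minimal sketch of the proportional variant we have in mind (the gain constant is illustrative):

```python
def adapt_sanction(damage: float, violators_now: int,
                   violators_before: int, gain: float = 1.0) -> float:
    """Scale the change in sanction damage with the change in the number
    of observed norm-violators, instead of a fixed gradual step."""
    delta = violators_now - violators_before
    return max(0.0, damage + gain * delta)
```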


Bibliography

[nap, 2006] (2006). Napster. http://www.napster.com/.

[Nor, 2008] (2008). Special issue on normative multiagent systems. In van der Torre, L., Boella, G., and Verhagen, H., editors, Journal of Autonomous Agents and Multi-Agent Systems, volume 17. Springer.

[Adar and Huberman, 2000] Adar, E. and Huberman, B. A. (2000). Free riding on Gnutella. First Monday.

[Albert and Barabasi, 2002] Albert, R. and Barabasi, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97.

[Albert et al., 2000] Albert, R., Jeong, H., and Barabasi, A.-L. (2000). Error and attack tolerance of complex networks. Nature, 406(6794):378–382.

[Aldewereld et al., 2007] Aldewereld, H., Garcia-Camino, A., Dignum, F., Noriega, P., Rodriguez-Aguilar, J. A., and Sierra, C. (2007). Operationalisation of norms for usage in electronic institutions. In Coordination, Organization, Institutions and Norms in Agent Systems, Lecture Notes in Computer Science.

[Aldewereld et al., 2005] Aldewereld, H., Grossi, D., Vazquez-Salceda, J., and Dignum, F. (2005). Designing normative behaviour by the use of landmarks. In Proceedings of the AAMAS-05 International Workshop on Agents, Norms and Institutions for Regulated Multi Agent Systems, pages 5–18.

[Andrighetto et al., 2010a] Andrighetto, G., Campenni, M., Cecconi, F., and Conte, R. (2010a). The complex loop of norm emergence: A simulation model. In Deguchi, H. et al., editors, Simulating Interacting Agents and Social Phenomena, volume 7 of Agent-Based Social Systems, pages 19–35. Springer Japan.

[Andrighetto et al., 2007] Andrighetto, G., Campenni, M., Conte, R., and Paolucci, M. (2007). On the immergence of norms: a normative agent architecture. In Emergent Agents and Socialities: Social and Organizational Aspects of Intelligence. Papers from the AAAI Fall Symposium.

[Andrighetto et al., 2010b] Andrighetto, G., Villatoro, D., and Conte, R. (2010b). Norm internalization in artificial societies. AI Communications. (In press).


[Androutsellis-Theotokis and Spinellis, 2004] Androutsellis-Theotokis, S. and Spinellis, D. (2004). A survey of peer-to-peer content distribution technologies. ACM Comput. Surv., 36:335–371.

[Araujo and Lamb, 2008] Araujo, R. and Lamb, L. (2008). Memetic networks: Analyzing the effects of network properties in multi-agent performance. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence.

[Aronfreed, 1968] Aronfreed, J. M. (1968). Conduct and Conscience: The Socialization of Internalized Control over Behavior. Academic Press, New York.

[Asch, 1955] Asch, S. (1955). Opinions and social pressure. Scientific American, 193(5):31–35.

[Ashlock et al., 1996] Ashlock, D., Smucker, M. D., Stanley, E. A., and Tesfatsion, L. (1996). Preferential partner selection in an evolutionary study of prisoner's dilemma. Biosystems, 37(1-2):99–125.

[Axelrod, 1981] Axelrod, R. (1981). The emergence of cooperation among egoists. The American Political Science Review, 75(2):306–318.

[Axelrod, 1986] Axelrod, R. (1986). An evolutionary approach to norms. The American Political Science Review, 80(4):1095–1111.

[Babaoglu and Jelasity, 2008] Babaoglu, O. and Jelasity, M. (2008). Self-* properties through gossiping. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, 366(1881):3747–3757.

[Bandura, 1976] Bandura, A. (1976). Social Learning Theory. Prentice Hall.

[Bandura, 1991] Bandura, A. (1991). Social cognitive theory of moral thought and action. In Kurtines, W. M. and Gewirtz, J. L., editors, Handbook of Moral Behavior and Development. Lawrence Erlbaum, Hillsdale, NJ.

[Banerjee et al., 2005] Banerjee, D., Saha, S., Sen, S., and Dasgupta, P. (2005). Reciprocal resource sharing in p2p environments. Proceedings of AAMAS'05, Utrecht, Netherlands.

[Barabasi and Bonabeau, 2003] Barabasi, A. and Bonabeau, E. (2003). Scale-free networks. Scientific American, 288(5):60–69.

[Barabasi and Albert, 1999] Barabasi, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286:509–512.

[Barkow et al., 1995] Barkow, J. H., Cosmides, L., and Tooby, J., editors (1995). The Adapted Mind: Evolutionary Psychology and the Generation of Culture. Oxford University Press, USA.


[Barros, 2007] Barros, B. (2007). Group Size, Heterogeneity, and Prosocial Behavior: Designing Legal Structures to Facilitate Cooperation in a Diverse Society. SSRN eLibrary.

[Bazzan et al., 2008] Bazzan, A. L. C., Dahmen, S. R., and Baraviera, A. T. (2008). Simulating the effects of sanctioning for the emergence of cooperation in a public goods game. In AAMAS '08, pages 1473–1476, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Becker, 1968] Becker, G. S. (1968). Crime and punishment: An economic approach. The Journal of Political Economy, 76(2):169–217.

[Bicchieri, 2006] Bicchieri, C. (2006). The Grammar of Society: The Nature and Dynamics of Social Norms. Cambridge University Press, New York.

[Bicchieri and Chavez, 2010] Bicchieri, C. and Chavez, A. (2010). Behaving as expected: Public information and fairness norms. Journal of Behavioral Decision Making, 23(2):161–178.

[Bicchieri and Xiao, 2007] Bicchieri, C. and Xiao, E. (2007). Do the right thing: But only if others do so. MPRA Paper 4609, University Library of Munich, Germany.

[Bickel and Johnson, 2003] Bickel, W. K. and Johnson, M. W. (2003). Time and Decision, chapter Delay discounting: a fundamental behavioral process of drug dependence. Russell Sage Foundation, New York.

[Blanc et al., 2005] Blanc, A., Liu, Y.-K., and Vahdat, A. (2005). Designing incentives for peer-to-peer routing. In INFOCOM, pages 374–385.

[Bo and Bo, 2009] Bo, E. D. and Bo, P. D. (2009). "Do the right thing:" the effects of moral suasion on cooperation. NBER Working Papers 15559, National Bureau of Economic Research, Inc.

[Boella et al., 2009] Boella, G., Caire, P., and van der Torre, L. (2009). Norm negotiation in online multi-player games. Knowl. Inf. Syst., 18:137–156.

[Boella and Lesmo, 2002] Boella, G. and Lesmo, L. (2002). A game theoretic approach to norms and agents. Cognitive Science Quarterly, pages 492–512.

[Boella et al., 2008] Boella, G., Torre, L., and Verhagen, H. (2008). Introduction to the special issue on normative multiagent systems. Autonomous Agents and Multi-Agent Systems, 17(1):1–10.

[Bowling and Veloso, 2002] Bowling, M. H. and Veloso, M. M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136(2):215–250.

[Boyd and Richerson, 1992] Boyd, R. and Richerson, P. (1992). Punishment allows the evolution of cooperation (or anything else) in sizable groups. Ethology and Sociobiology, 13(3):171–195.


[Brennan et al., 2008] Brennan, G., Gonzalez, L. G., Guth, W., and Levati, M. V. (2008). Attitudes toward private and collective risk in individual and strategic choice situations. Journal of Economic Behavior & Organization, 67(1):253–262.

[Brito et al., 2009] Brito, I., Pinyol, I., Villatoro, D., and Sabater-Mir, J. (2009). Hiherei: human interaction within hybrid environments regulated through electronic institutions. In AAMAS '09: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, pages 1417–1418, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Campenni et al., 2009] Campenni, M., Andrighetto, G., Cecconi, F., and Conte, R. (2009). Normal = normative? The role of intelligent agents in norm innovation. Mind & Society, 8:153–172.

[Cardoso and Oliveira, 2008] Cardoso, H. L. and Oliveira, E. (2008). Electronic institutions for b2b: dynamic normative environments. Artif. Intell. Law, 16:107–128.

[Cardoso and Oliveira, 2009] Cardoso, H. L. and Oliveira, E. (2009). Adaptive deterrence sanctions in a normative framework. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, WI-IAT '09, pages 36–43, Washington, DC, USA. IEEE Computer Society.

[Casari, 2004] Casari, M. (2004). On the design of peer punishment experiments. UFAE and IAE Working Papers 615.04, Unitat de Fonaments de l'Analisi Economica (UAB) and Institut d'Analisi Economica (CSIC).

[Castelfranchi and Conte, 1995] Castelfranchi, C. and Conte, R. (1995). Artificial Societies: The Computer Simulation of Social Life, chapter Simulative understanding of norm functionalities in social groups, pages 252–267. UCL Press.

[Castelfranchi et al., 2000] Castelfranchi, C., Dignum, F., Jonker, C. M., and Treur, J. (2000). Deliberative normative agents: Principles and architecture. In 6th International Workshop on Intelligent Agents VI, Agent Theories, Architectures, and Languages (ATAL), pages 364–378, London, UK. Springer-Verlag.

[Chao et al., 2008] Chao, I., Ardaiz, O., and Sanguesa, R. (2008). Tag mechanisms evaluated for coordination in open multi-agent systems. pages 254–269.

[Cialdini and Goldstein, 2004] Cialdini, R. and Goldstein, N. (2004). Social influence: Compliance and conformity. Annual Review of Psychology, 55:591–621.

[Cialdini et al., 1990] Cialdini, R. B., Reno, R. R., and Kallgren, C. A. (1990). A focus theory of normative conduct: Recycling the concept of norms to reduce littering in public places. Journal of Personality and Social Psychology, 58(6):1015–1026.

[Cohen and Levesque, 1990] Cohen, P. R. and Levesque, H. J. (1990). Intention is choice with commitment. Artificial Intelligence, 42(2-3):213–261.

[Coleman, 1998] Coleman, J. (1998). Foundations of social theory.


[Conte, 2009] Conte, R. (2009). Rational, goal-oriented agents. In Encyclopedia of Complexity and Systems Science, pages 7533–7548.

[Conte and Castelfranchi, 1995] Conte, R. and Castelfranchi, C. (1995). Cognitive and Social Action. University College of London Press, London.

[Conte and Castelfranchi, 1999] Conte, R. and Castelfranchi, C. (1999). From conventions to prescriptions. Towards a unified theory of norms. AI and Law, 7:323–340.

[Conte and Castelfranchi, 2006] Conte, R. and Castelfranchi, C. (2006). The mental path of norms. Ratio Juris, 19(4):501–517.

[Conte and Paolucci, 2002] Conte, R. and Paolucci, M. (2002). Reputation in Artificial Societies: Social Beliefs for Social Order. Kluwer Academic Publishers.

[Croson, 2005] Croson, R. (2005). The method of experimental economics. International Negotiation, 10:131–148.

[Curbera et al., 2002] Curbera, F., Duftler, M., Khalaf, R., Nagy, W., Mukhi, N., and Weerawarana, S. (2002). Unraveling the web services web: An introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, 6:86–93.

[de Melo et al., 2011] de Melo, C., Carnevale, P., and Gratch, J. (2011). The effect of expression of anger and happiness in computer agents on negotiations with humans. In Tenth International Conference on Autonomous Agents and Multiagent Systems.

[de Pinninck et al., 2007a] de Pinninck, A. P., Sierra, C., and Schorlemmer, M. (2007a). Distributed norm enforcement via ostracism. In COIN @ MALLOW.

[de Pinninck et al., 2007b] de Pinninck, A. P., Sierra, C., and Schorlemmer, W. M. (2007b). Friends no more: Norm enforcement in multi-agent systems. In Durfee, E. H. and Yokoo, M., editors, Proceedings of AAMAS 2007, pages 640–642.

[de Pinninck Bas et al., 2010] de Pinninck Bas, A. P., Sierra, C., and Schorlemmer, M. (2010). A multiagent network for peer norm enforcement. Autonomous Agents and Multi-Agent Systems, 21:397–424.

[de Vos et al., 2001] de Vos, H., Smaniotto, R., and Elsas, D. (2001). Reciprocal altruism under conditions of partner selection. Rationality and Society, 13(2):139–183.

[Decker et al., 1997] Decker, K., Sycara, K. P., and Williamson, M. (1997). Middle-agents for the internet. In IJCAI (1), pages 578–583.

[Delgado, 2002] Delgado, J. (2002). Emergence of social conventions in complex networks. Artificial Intelligence, 141(1-2):171–185.

[Delgado et al., 2003] Delgado, J., Pujol, J. M., and Sanguesa, R. (2003). Emergence of coordination in scale-free networks. Web Intelli. and Agent Sys., 1(2):131–138.


[Dignum et al., 2000] Dignum, F., Morley, D., Sonenberg, E., and Cavedon, L. (2000). Towards socially sophisticated BDI agents. In Proceedings of the Fourth International Conference on MultiAgent Systems (ICMAS-2000), pages 111–, Washington, DC, USA. IEEE Computer Society.

[Dreber et al., 2008] Dreber, A., Rand, D. G., Fudenberg, D., and Nowak, M. A. (2008). Winners don't punish. Nature, 452(7185):348–351.

[Elster, 1989] Elster, J. (1989). Social norms and economic theory. Journal of Economic Perspectives, 3(4):99–117.

[Epstein, 2000] Epstein, J. M. (2000). Learning to be thoughtless: Social norms and individual computation. Working Papers 00-03-022, Santa Fe Institute.

[Esteva, 2003] Esteva, M. (2003). Electronic Institutions: from specification to development. IIIA PhD Monography. Vol. 19.

[Esteva et al., 2008] Esteva, M., Rodriguez-Aguilar, J. A., Arcos, J.-L., Sierra, C., Noriega, P., Rosell, B., and de la Cruz, D. (2008). Electronic institutions development environment (demo paper). In Proc. of AAMAS'08.

[Farrell and Rabin, 1996] Farrell, J. and Rabin, M. (1996). Cheap talk. The Journal of Economic Perspectives, 10(3):103–118.

[Fehr and Gachter, 2000] Fehr, E. and Gachter, S. (2000). Cooperation and punishment in public goods experiments. American Economic Review, 90(4):980–994.

[Fehr and Gachter, 2002] Fehr, E. and Gachter, S. (2002). Altruistic punishment in humans. Nature, 415:137–140.

[Fischbacher, 2006] Fischbacher, U. (2006). z-Tree: Zurich toolbox for ready-made economic experiments.

[Fowler and Christakis, 2010] Fowler, J. H. and Christakis, N. A. (2010). Cooperative behavior cascades in human social networks. Proceedings of the National Academy of Sciences, 107(12):5334–5338.

[Freeman, 1979] Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1(3):215–239.

[Fudenberg and Levine, 1998] Fudenberg, D. and Levine, D. K. (1998). The Theory of Learning in Games. The MIT Press.

[Galbiati and D'Antoni, 2007] Galbiati, R. and D'Antoni, M. (2007). A signalling theory of nonmonetary sanctions. International Review of Law and Economics, 27:204–218.

[Garcia-Camino A, 2006] Garcia-Camino, A., Rodriguez-Aguilar, J. A., Sierra, C., and Vasconcelos, W. (2006). Norm-oriented programming of electronic institutions. In AAMAS '06.


[Giardini et al., 2010] Giardini, F., Andrighetto, G., and Conte, R. (2010). A cognitive model of punishment. In COGSCI 2010, Annual Meeting of the Cognitive Science Society, 11-14 August 2010, Portland, Oregon.

[Gintis, 2003] Gintis, H. (2003). The Hitchhiker's Guide to Altruism: Gene-culture Coevolution, and the Internalization of Norms. Journal of Theoretical Biology, 220(4):407–418.

[Gintis, 2004] Gintis, H. (2004). The genetic side of gene-culture coevolution: internalization of norms and prosocial emotions. Journal of Economic Behavior and Organization, 53:57–67.

[Goodall, 2005] Goodall, C. E. (2005). Modifying Smoking Behavior Through Public Service Announcements and Cigarette Package Warning Labels: A Comparison of Canada and the United States. PhD thesis, Ohio State.

[Griffiths and Luck, 2010] Griffiths, N. and Luck, M. (2010). Changing neighbours: Improving tag-based cooperation. In Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems, pages 249–256.

[Grizard et al., 2006] Grizard, A., Vercouter, L., Stratulat, T., and Muller, G. (2006). A peer-to-peer normative system to achieve social order. In Workshop on COIN @ AAMAS '06.

[Grossi et al., 2007] Grossi, D., Aldewereld, H. M., and Dignum, F. (2007). Ubi lex, ibi poena: Designing norm enforcement in e-institutions. In Coordination, Organizations, Institutions, and Norms in Agent Systems II, pages 101–114. Springer.

[Grossklags, 2007] Grossklags, J. (2007). Experimental economics and experimental computer science: a survey. In Experimental Computer Science, pages 12–12, Berkeley, CA, USA. USENIX Association.

[Grusec and Kuczynski, 1997] Grusec, J. E. and Kuczynski, L. (1997). Parenting and Children's Internalization of Values: A Handbook of Contemporary Theory. Wiley, New York.

[Hales, 2002] Hales, D. (2002). Group reputation supports beneficent norms. Journal of Artificial Societies and Social Simulation, 5.

[Hales and Arteconi, 2006] Hales, D. and Arteconi, S. (2006). SLACER: A self-organizing protocol for coordination in p2p networks. IEEE Intelligent Systems, 21:29–35.

[Hales and Edmonds, 2005] Hales, D. and Edmonds, B. (2005). Applying a socially-inspired technique (tags) to improve cooperation in p2p networks. IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans, 35(3):385–395.

[Haley, 2003] Fessler, D. and Haley, K. J. (2003). Genetic and Cultural Evolution of Cooperation, chapter The strategy of affect: emotions in human cooperation, pages 7–36. The MIT Press, Cambridge, MA.


[Hardin, 1968] Hardin, G. (1968). The tragedy of the commons. Science, xx:1243–47.

[Hardin, 1971] Hardin, R. (1971). Collective action as an agreeable n-prisoners’dilemma. Behavioral Science, 16(5):472–481.

[Heckathorn, 1996] Heckathorn, D. D. (1996). The dynamics and dilemmas of collec-tive action. American Sociological Review, 61(2):250–277.

[Hirschman, 1984] Hirschman, A. O. (1984). Against parsimony: Three easy ways ofcomplicating some categories of economic discourse. American Economic Review,74(2):89–96.

[Holland, 1993] Holland, J. (1993). The effects of labels (tags) on social interactions.Working Paper Santa Fe Institute, 93-10-064.

[Horne, 2003] Horne, C. (2003). The internal enforcement of norms. Eur Sociol Rev,19(4):335–343.

[Houser and Xiao, 2010] Houser, D. and Xiao, E. (2010). Understanding context ef-fects. Journal of Economic Behavior & Organization, 73(1):58–61.

[Joseph and Prakken, 2009] Joseph, S. and Prakken, H. (2009). Coherence-driven ar-gumentation to norm consensus. In Proceedings of the 12th International Confer-ence on Artificial Intelligence and Law, ICAIL ’09, pages 58–67, New York, NY,USA. ACM.

[Kelsen, 1979] Kelsen, H. (1979). General Theory of Norms. Hardcover.

[Kittock, 1993] Kittock, J. E. (1993). Emergent conventions and the structure of multi-agent systems. In Lectures in Complex systems: the proceedings of the 1993 Complexsystems summer school, Santa Fe Institute Studies in the Sciences of ComplexityLecture Volume VI, Santa Fe Institute, pages 507–521. Addison-Wesley.

[Kittock, 1994] Kittock, J. E. (1994). The impact of locality and authority on emergent conventions: initial observations. In Proceedings of AAAI'94, volume 1, pages 420–425. American Association for Artificial Intelligence.

[Kollock and Smith, 1996] Kollock, P. and Smith, M. (1996). Managing the Virtual Commons: Cooperation and Conflict in Computer Communities, pages 109–128. John Benjamins, Amsterdam.

[Lakkaraju and Gasser, 2008] Lakkaraju, K. and Gasser, L. (2008). Norm emergence in complex ambiguous situations. In Proceedings of the Workshop on Coordination, Organizations, Institutions and Norms at AAAI.

[Lazer et al., 2009] Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., and Van Alstyne, M. (2009). Social science: Computational social science. Science, 323(5915):721–723.

[Lotzmann and Mohring, 2009] Lotzmann, U. and Möhring, M. (2009). Simulating norm formation: an operational approach. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS '09, pages 1323–1324, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Luck et al., 2005] Luck, M., McBurney, P., Shehory, O., and Willmott, S. (2005). Agent Technology: Computing as Interaction (A Roadmap for Agent Based Computing). AgentLink.

[MacNorms, 2008] MacNorms (2008). MacNorms: Mechanisms for Self Organization and Social Control generators of Social Norms. http://www.iiia.csic.es/es/project/macnorms-0.

[Masclet, 2003] Masclet, D. (2003). L'analyse de l'influence de la pression des pairs dans les équipes de travail [An analysis of the influence of peer pressure in work teams]. CIRANO Working Papers 2003s-35, CIRANO.

[Masclet et al., 2003] Masclet, D., Noussair, C., Tucker, S., and Villeval, M.-C. (2003). Monetary and nonmonetary punishment in the voluntary contributions mechanism. American Economic Review, 93(1):366–380.

[Matlock and Sen, 2009] Matlock, M. and Sen, S. (2009). Effective tag mechanisms for evolving cooperation. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '09, pages 489–496, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Mead, 1963] Mead, M. (1963). Cultural patterns and technical change. New York: The New American Library.

[Meneguzzi and Luck, 2009] Meneguzzi, F. and Luck, M. (2009). Norm-based behaviour modification in BDI agents. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '09, pages 177–184, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Mukherjee et al., 2007] Mukherjee, P., Sen, S., and Airiau, S. (2007). Norm emergence in spatially constrained interactions. In Proceedings of ALAg-07, Honolulu, Hawaii, USA.

[Mukherjee et al., 2008] Mukherjee, P., Sen, S., and Airiau, S. (2008). Norm emergence under constrained interactions in diverse societies. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2, AAMAS '08, pages 779–786, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Mungovan et al., 2011] Mungovan, D., Howley, E., and Duggan, J. (2011). The influence of random interactions and decision heuristics on norm evolution in social networks. Computational & Mathematical Organization Theory, pages 1–27. DOI 10.1007/s10588-011-9085-7.

[Newman, 2003] Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45:167–256.

[Nikiforakis and Normann, 2008] Nikiforakis, N. and Normann, H.-T. (2008). A comparative statics analysis of punishment in public-good experiments. Experimental Economics, 11(4):358–369.

[Noriega, 1997] Noriega, P. (1997). Agent-Mediated Auctions: The Fishmarket Metaphor. IIIA PhD Monography, Vol. 8.

[Nosratinia et al., 2007] Nosratinia, A. and Hunter, T. E. (2007). Grouping and partner selection in cooperative wireless networks. IEEE Journal on Selected Areas in Communications, 25:369–378.

[Noussair et al., 2003] Noussair, C., Masclet, D., Tucker, S., and Villeval, M. (2003). Monetary and non-monetary punishment in the voluntary contributions mechanism. Open Access Publications from Tilburg University, Tilburg University.

[Noussair and Tucker, 2005a] Noussair, C. and Tucker, S. (2005a). Combining monetary and social sanctions to promote cooperation. Open Access Publications from Tilburg University urn:nbn:nl:ui:12-377935, Tilburg University.

[Noussair and Tucker, 2005b] Noussair, C. and Tucker, S. (2005b). Combining monetary and social sanctions to promote cooperation. Economic Inquiry, 43(3):649–660.

[Nowak and Sigmund, 2005] Nowak, M. A. and Sigmund, K. (2005). Evolution of indirect reciprocity. Nature, 437(7063):1291–1298.

[Ohtsuki and Iwasa, 2004] Ohtsuki, H. and Iwasa, Y. (2004). How should we define goodness? Reputation dynamics in indirect reciprocity. Journal of Theoretical Biology, 231(1):107–120.

[Ostrom, 2000] Ostrom, E. (2000). Collective action and the evolution of social norms. Journal of Economic Perspectives, 14(3):137–158.

[Ostrom et al., 1992] Ostrom, E., Walker, J., and Gardner, R. (1992). Covenants with and without a sword: Self-governance is possible. The American Political Science Review, 86(2):404–417.

[Papaioannou and Stamoulis, 2005] Papaioannou, T. and Stamoulis, G. (2005). An incentives' mechanism promoting truthful feedback in peer-to-peer systems. In IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005), volume 1, pages 275–283.

[Parsons, 1937] Parsons, T. (1937). The structure of social action. A study in social theory with special reference to a group of recent European writers. New York, London: Free Press.

[Pasquier et al., 2006] Pasquier, P., Flores, R. A., and Chaib-draa, B. (2006). An ontology of social control tools. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS '06, pages 1369–1371, New York, NY, USA. ACM.

[Posner and Rasmusen, 1999] Posner, R. and Rasmusen, E. (1999). Creating and enforcing norms, with special reference to sanctions. Law and Economics 9907004, EconWPA.

[Pujol et al., 2005] Pujol, J. M., Delgado, J., Sangüesa, R., and Flache, A. (2005). The role of clustering on the emergence of efficient social conventions. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 965–970, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

[Rachlin, 2000] Rachlin, H. (2000). The science of self-control. Cambridge, London: Harvard University Press.

[Radwanick, 2010] Radwanick, S. (2010). Facebook captures top spot among social networking sites in India. comScore Inc. Press Release.

[Riolo et al., 2001] Riolo, R., Cohen, M., and Axelrod, R. (2001). Cooperation without reciprocity. Nature, 414:441–443.

[Rodríguez-Aguilar, 2003] Rodríguez-Aguilar, J. A. (2003). On the design and construction of agent-mediated electronic institutions. IIIA PhD Monography, Vol. 14.

[Saam and Harrer, 1999] Saam, N. J. and Harrer, A. (1999). Simulating norms, social inequality, and functional change in artificial societies. Journal of Artificial Societies and Social Simulation, 2(1).

[Sabater and Sierra, 2005] Sabater, J. and Sierra, C. (2005). Review on computational trust and reputation models. Artificial Intelligence Review, 24:33–60.

[Sabater-Mir et al., 2007] Sabater-Mir, J., Pinyol, I., Villatoro, D., and Cuní, G. (2007). Towards hybrid experiments on reputation mechanisms: BDI agents and humans in electronic institutions. In Proceedings of CAEPIA'07.

[Sacks et al., 2009] Sacks, A., Levi, M., and Tyler, T. (2009). Conceptualizing legitimacy: Measuring legitimating beliefs. American Behavioral Scientist.

[Salazar et al., 2010] Salazar, N., Rodriguez-Aguilar, J. A., and Arcos, J. L. (2010). Robust coordination in large convention spaces. AI Communications, 23:357–372.

[Salazar-Ramirez et al., 2008] Salazar-Ramirez, N., Rodríguez-Aguilar, J. A., and Arcos, J. L. (2008). An infection-based mechanism for self-adaptation in multi-agent complex networks. pages 161–170.

[Savarimuthu et al., 2007a] Savarimuthu, B., Purvis, M., Cranefield, S., and Purvis, M. (2007a). How do norms emerge in multi-agent societies? Mechanisms design. The Information Science Discussion Paper, (1).

[Savarimuthu et al., 2007b] Savarimuthu, B. T. R., Cranefield, S., Purvis, M., and Purvis, M. (2007b). Norm emergence in agent societies formed by dynamically changing networks. In Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT '07, pages 464–470, Washington, DC, USA. IEEE Computer Society.

[Scholtes, 2010] Scholtes, I. (2010). Distributed creation and adaptation of random scale-free overlay networks. In Proceedings of the 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems, SASO '10, pages 51–63, Washington, DC, USA. IEEE Computer Society.

[Scott, 2000] Scott, J. (2000). Social Network Analysis: A Handbook. Sage Publications, second edition.

[Sen and Airiau, 2007] Sen, S. and Airiau, S. (2007). Emergence of norms through social learning. In Proceedings of IJCAI-07, pages 1507–1512.

[Shoham and Tennenholtz, 1992] Shoham, Y. and Tennenholtz, M. (1992). On the synthesis of useful social laws for artificial agent societies (preliminary report). In Proceedings of the AAAI Conference, pages 276–281.

[Shoham and Tennenholtz, 1994] Shoham, Y. and Tennenholtz, M. (1994). Co-learning and the evolution of social activity. Technical report, Stanford, CA, USA.

[Shoham and Tennenholtz, 1997a] Shoham, Y. and Tennenholtz, M. (1997a). On the emergence of social conventions: modeling, analysis, and simulations. Artificial Intelligence, 94:139–166.

[Shoham and Tennenholtz, 1997b] Shoham, Y. and Tennenholtz, M. (1997b). On the emergence of social conventions: Modeling, analysis, and simulations. Artificial Intelligence, 94(1-2):139–166.

[Sierra et al., 2004] Sierra, C., Rodríguez-Aguilar, J., Noriega, P., Arcos, J., and Esteva, M. (2004). Engineering multi-agent systems as electronic institutions. UPGRADE, 5:33–38.

[Sigmund, 2007] Sigmund, K. (2007). Punish or perish? Retaliation and collaboration among humans. Trends in Ecology & Evolution, 22(11):593–600.

[Sunstein, 1996] Sunstein, C. R. (1996). Social norms and social roles. Columbia Law Review, 96(4):903–968.

[Tinnemeier et al., 2010] Tinnemeier, N., Dastani, M., and Meyer, J.-J. (2010). Programming norm change. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '10, pages 957–964, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

[Toivonen et al., 2009] Toivonen, R., Castelló, X., Eguíluz, V. M., Saramäki, J., Kaski, K., and San Miguel, M. (2009). Broad lifetime distributions for ordering dynamics in complex networks. Physical Review E, 79.

[Tosto et al., 2006] Tosto, G. D., Paolucci, M., and Conte, R. (2006). Altruism among simple and smart vampires. International Journal of Cooperative Information Systems.

[Traxler and Winter, 2009] Traxler, C. and Winter, J. (2009). Survey evidence on conditional norm enforcement. Working Paper Series of the Max Planck Institute for Research on Collective Goods 2009/03, Max Planck Institute for Research on Collective Goods.

[Tyran and Feld, 2006] Tyran, J.-R. and Feld, L. P. (2006). Achieving compliance when legal sanctions are non-deterrent. Scandinavian Journal of Economics, 108(1):135–156.

[Ullman-Margalit, 1977] Ullman-Margalit, E. (1977). The Emergence of Norms. Clarendon Press, Oxford.

[Urbano et al., 2009a] Urbano, P., Balsa, J., Antunes, L., and Moniz, L. (2009a). Force versus majority: A comparison in convention emergence efficiency. pages 48–63.

[Urbano et al., 2009b] Urbano, P., Balsa, J., Ferreira Jr., P., and Antunes, L. (2009b). How much should agents remember? The role of memory size on convention emergence efficiency. In Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence, EPIA '09, pages 508–519, Berlin, Heidelberg. Springer-Verlag.

[Villatoro et al., 2009] Villatoro, D., Sen, S., and Sabater, J. (2009). Topology and memory effect on convention emergence. In Proceedings of the International Conference on Intelligent Agent Technology (IAT). IEEE Press.

[Villatoro et al., 2010] Villatoro, D., Sen, S., and Sabater-Mir, J. (2010). Of social norms and sanctioning: A game theoretical overview. International Journal of Agent Technologies and Systems, 2:1–15.

[von Wright, 1963] von Wright, G. H. (1963). Norm and Action: A Logical Inquiry. Routledge and Kegan Paul, London.

[Vygotskii and Cole, 1978] Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Edited by M. Cole et al. Harvard University Press, Cambridge, MA.

[Walker and Wooldridge, 1995] Walker, A. and Wooldridge, M. (1995). Understanding the emergence of conventions in multi-agent systems. In Lesser, V., editor, Proceedings of the First International Conference on Multi-Agent Systems, pages 384–389, San Francisco, CA. MIT Press.

[Watkins and Dayan, 1992] Watkins, C. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4):279–292.

[Xiao and Houser, 2009] Xiao, E. and Houser, D. (2009). Avoiding the sharp tongue: Anticipated written messages promote fair economic exchange. Journal of Economic Psychology, 30(3).

[Xiao and Houser, 2005] Xiao, E. and Houser, D. (2005). Emotion expression in human punishment behavior. Proc Natl Acad Sci U S A, 102(20):7398–7401.

[Young, 1993] Young, H. P. (1993). The evolution of conventions. Econometrica, 61(1):57–84.

[Young, 2007] Young, H. P. (2007). Social norms. Economics Series Working Papers 307, University of Oxford, Department of Economics.

[Young, 2008] Young, H. P. (2008). Social norms. The New Palgrave Dictionary of Economics.

[Zhang and Huang, 2006] Zhang, H. and Huang, S. Y. (2006). Dynamic control of intention priorities of human-like agents. In Proceedings of ECAI 2006, pages 310–314, Amsterdam, The Netherlands. IOS Press.
