
Modeling the Process of Collaboration and Negotiation with Incomplete Information

Katia Sycara, Praveen Paruchuri, Nilanjan Chakraborty

Collaborators: Roie Zivan, Laurie Weingart, Geoff Gordon, Miro Dudik

MURI 14 Program Review-- September 10, 2009


Theory Formation

Identify Cultural Factors: CUNY, Georgetown, CMU

Computational Models: CMU, USC

Virtual Humans: USC

Implementation: CMU

RESEARCH PRODUCTS

Surveys & Interviews: CUNY, CMU, U Mich, Georgetown

Cross-Cultural Interactions: U Pitt, CMU

Data Analysis: CUNY, Georgetown, U Pitt, CMU

Validation

Validated Theories

Models

Modeling Tools

Briefing Materials

Scenarios

Training Simulations

Common task

Subgroup task


Problem
• Computational model of reasoning in Cooperation and Negotiation (C&N)
• Capture the rich process of C&N
– Not just outcome
– Not just offer-counteroffer but additional communications
• Account for cultural, social factors
• Rewards of other agents not known
• Uncertain and dynamic environment


Contributions
• Created an initial model from real human data. The model:
– Applicable in a uniform way to both collaboration and negotiation
– Derives sequences of actions for an agent from real transcripts, as opposed to state-of-the-art work where action selection is constructed heuristically
– Adapts its beliefs during the course of the interaction
– Learns elements of the negotiation (e.g. other party type) as the interaction proceeds
– Produces optimal activity sequences considering also the other agents
– Has only incomplete information about others

POMDP: Partially Observable Markov Decision Process
• Agent has initial beliefs
• Agent takes an action
• Gets an observation
• Interprets the observation
• Updates beliefs
• Decides on an action
• Repeats

Agent takes optimal action considering world/other agents

Elements: {States, Actions, Transitions, Rewards, Observations}

[Figure: the Agent sends an Action to the World (other agents) and receives an Observation back.]
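
The act, observe, update cycle above can be sketched as a discrete Bayes filter. This is a minimal sketch under stated assumptions, not the presenters' implementation: the function name `update_belief`, the two-type state set, and every probability below are hypothetical and chosen only for illustration.

```python
def update_belief(belief, action, observation, T, O):
    """One POMDP belief update: b'(s') is proportional to
    O[a][s'][o] * sum over s of T[a][s][s'] * b(s)."""
    states = list(belief)
    new_belief = {}
    for s_next in states:
        # Prediction step: probability of landing in s_next after the action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s_next.
        new_belief[s_next] = O[action][s_next].get(observation, 0.0) * predicted
    total = sum(new_belief.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in new_belief.items()}


# Hypothetical model: the hidden state is the other agent's type,
# which is assumed not to change during the interaction.
T = {"concede": {"coop": {"coop": 1.0, "ncoop": 0.0},
                 "ncoop": {"coop": 0.0, "ncoop": 1.0}}}
O = {"concede": {"coop": {"counter_concede": 0.8, "hold": 0.2},
                 "ncoop": {"counter_concede": 0.3, "hold": 0.7}}}

belief = {"coop": 0.5, "ncoop": 0.5}   # initial beliefs
belief = update_belief(belief, "concede", "counter_concede", T, O)
print(belief)
```

In this toy setting a single cooperative-looking reply shifts the belief toward the cooperative type; repeating the call after each turn gives the "Repeats" step of the cycle.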


Why POMDP-based modeling?
– Decentralized algorithm
– Incorporated in an agent that interacts with others
– Can represent communication (arguments, offers, preferences etc)
– Many conversational turns
– Learns e.g. the model of the other player
– Adaptive best response
– Computationally efficient for realistic interactions
– Extendable to more than two agents

Natural way to represent cultural and social factors in C and N


Output of POMDP

• The output is a policy matrix

• Policy: Optimal action to take, given current state (observations and other’s model)

• At run-time, agent consults the matrix and takes appropriate action
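
At run time, consulting the policy reduces to a table lookup. The sketch below assumes a hypothetical state encoding (believed buyer type, seller price, buyer price) and borrows action names from the simplified seller/buyer example later in the deck; the table entries are illustrative and are not the output of a POMDP solver.

```python
# Hypothetical policy table for the seller.
# Keys: (believed buyer type, seller price, buyer price). Values: actions.
POLICY = {
    ("coop", 10, 0): "Concede 1",
    ("ncoop", 10, 0): "Concede 0",
    ("coop", 5, 5): "Accept",
}


def choose_action(state, policy=POLICY, default="Concede 0"):
    """Look up the action for the current state; fall back to a default if unlisted."""
    return policy.get(state, default)


print(choose_action(("coop", 10, 0)))   # Concede 1
```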


Simplified Example

• Two agents negotiating
– Seller S (POMDP Agent)
– Buyer B (Other player)

• Single item negotiation

• Initially buyer at 0 price and seller at max = 10


Example: State Space
• State composed of 2 parts:
– Seller type, Buyer type
– Negotiation status: current offers

• Agent types: cooperative or non-cooperative

• Negotiation modeled from Seller’s perspective
– Initially high uncertainty of Buyer type

• Seller’s belief about Buyer, and state of negotiation are dynamic


Example: POMDP State
• Agent Type: cooperative vs non-cooperative
– 0 cooperative, 1 non-cooperative
– Discretized to {0, .5, 1}
• Price discretized to the set {0, 1, ..., 9, 10}
• Sample state:
Me (Seller) Type = Coop
You (Buyer) = Unknown
Negotiation status: <S price = $10; B price = $0>
• State space = Number of Buyer types * Negotiation states = 363
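
The state count of 363 follows from 3 buyer types times 11 seller prices times 11 buyer prices. A short enumeration confirms the arithmetic; the tuple layout and the reading of 0.5 as "unknown" are assumptions made for this sketch.

```python
from itertools import product

# 0 = cooperative, 1 = non-cooperative; 0.5 is read here as "unknown" (assumption).
BUYER_TYPES = (0.0, 0.5, 1.0)
PRICES = range(0, 11)          # price discretized to {0, 1, ..., 10}

# A state is (buyer type, seller price, buyer price); the seller's own type is known.
states = list(product(BUYER_TYPES, PRICES, PRICES))
print(len(states))             # 3 * 11 * 11 = 363
```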


Example: Action & Transition

• Action set: {Concede 2, Concede 1, Concede 0, Accept, Reject}

• Transition: Probability of ending in some state if agent takes a particular action in current state
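
A transition model of this kind can be stored as a table from (state, action) to a distribution over next states. The sketch below is a hypothetical illustration: the state encoding (buyer type, seller price, buyer price), the helper names, and the probabilities are assumptions, not values estimated from data.

```python
import random

ACTIONS = ("Concede 2", "Concede 1", "Concede 0", "Accept", "Reject")

# Hypothetical entry: the seller concedes 1 (from $10 to $9) and a cooperative
# buyer responds by moving to $0, $1, or $2. Probabilities are illustrative.
transitions = {
    (("coop", 10, 0), "Concede 1"): {
        ("coop", 9, 0): 0.1,
        ("coop", 9, 1): 0.7,
        ("coop", 9, 2): 0.2,
    },
}


def next_state_distribution(state, action):
    """Return P(next state | state, action), checking that it is normalized."""
    dist = transitions[(state, action)]
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to 1")
    return dist


def sample_next_state(state, action, rng=random):
    """Draw one next state according to the transition probabilities."""
    dist = next_state_distribution(state, action)
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


rng = random.Random(0)          # seeded so repeated runs give the same draw
sampled = sample_next_state(("coop", 10, 0), "Concede 1", rng)
print(sampled)
```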


[Figure: transition diagram from the Seller's perspective. Root state: Me = Coop, You = Unknown, My price = $10, Your price = $0. Successor states with the seller at $9: You = Coop or You = Ncoop, each with offers ($9, $0), ($9, $1), ($9, $2); edge probabilities 0.1, 0.7, 0.2 and 0.6, 0.35, 0.05. Further successors with You = Coop: ($8, $0), ($8, $1), ($8, $2) with probabilities 0.1, 0.7, 0.2, and ($7, $0), ($7, $1), ($7, $2) with probabilities 0.1, 0.4, 0.5. Later states shown: ($6, $4), ($5, $5), ($4, $6), ending in Agree. Edge labels: Concede 0, Concede 1, Concede 2. Other edge probabilities shown: 0.5, 0.5, 0.35, 0.65, 0.75, 0.25.]


Building Initial Simplified POMDP
• Human negotiation transcripts
– 2 players (Grocer and Florist) with 4 issues

• Mapped dialogues to 14 base codes (actions)

• Other player’s type known for each transcript
– Used for training and validation of the model

• Transition: Frequency of reaching some state, given a code

• Observation: Frequency of observing a code given some negotiation state
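
The frequency-based estimates above amount to counting and normalizing. A minimal sketch, assuming coded transcripts have already been reduced to (condition, outcome) pairs; the helper name `estimate_conditional`, the negotiation state labels, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_conditional(pairs):
    """Turn (condition, outcome) pairs into relative frequencies P(outcome | condition)."""
    counts = defaultdict(Counter)
    for condition, outcome in pairs:
        counts[condition][outcome] += 1
    return {
        condition: {o: n / sum(c.values()) for o, n in c.items()}
        for condition, c in counts.items()
    }


# Hypothetical coded data: (negotiation state, observed code) pairs.
observed = [
    ("no_offer", "PC"), ("no_offer", "OS"), ("no_offer", "Q"), ("no_offer", "OS"),
    ("offer_open", "RPS"), ("offer_open", "SBF"), ("offer_open", "SBF"), ("offer_open", "RPO"),
]
observation_model = estimate_conditional(observed)
print(observation_model["no_offer"]["OS"])   # 2 of 4 observations -> 0.5
```

The same helper could estimate transitions by passing ((state, code), next state) pairs instead.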


POMDP construction

[Figure: construction pipeline. Grocer-Florist Transcript, as <Player, Action code> pairs, feeds the Model Generator, which starts from an empty model ("(Empty)") and "Learns". Model generated, then Reasoning over model, then Prescription of optimal actions given state of interaction.]


Codes used

Code  Definition           Code  Definition                   Code  Definition
OFFER                      REACTIONS                          Misc  Miscellaneous
OS    Single-Issue         RPO   Agreement to offer made      SBF   Substantiation
OM    Multi-Issue          RPS   Agreement with statement     Q     Question
PROVIDE INFORMATION        RNO   Disagreement with offer      PC    Procedural Comment
IP    Issue Preferences    RNS   Disagreement with statement  INT   Summarizing
IR    Priorities           TP    Threat/Power
IB    Bottom-line

Courtesy of Laurie Weingart


Sample Grocer-Florist Transcript

Speaker  Code  Unit
Florist  PC    So let’s start with temperature
Grocer   RPS   Okay
Florist  OS    So I would suggest a temperature of 64 degrees
Grocer   RPS   Okay
Florist  Q     How does that work for you?
Grocer   IP    Well personally for the grocery I think it is better to have a higher temperature
Grocer   SBF   Just because I want the customers to feel comfortable
Grocer   SBF   And if it is too cold that might turn the customers away a little bit
Florist  RPS   Okay
Grocer   SBF   And also if it is warm, people are more apt to buy cold drinks to keep themselves comfortable and cool
Florist  RPS   That's true.
Grocer   OS    I think 66 would be good.
Grocer   SBF   That way it is not too cold and it is not too hot as well.
Grocer   SBF   And its good for the customers.
Florist  RPO   Okay, yeah

• Assumed Florist is Cooperative


Grocer POMDP generated

[Figure: state sequence with Me = Coop, You = Coop throughout. Temperature pairs: (70F, 62F), (70F, 64F), (66F, 64F), (66F, 66F). Step labels: Florist: 64F; Agrees without committing; Discuss preferences and support their positions; Proposes 66F; Grocer substantiates his offer; Florist: Doesn’t commit; Florist: Agrees to 66F. Reward: 60 points for both Grocer and Florist.]


Negotiation Game

[Figure: interaction loop. The Agent (Grocer), running the optimal POMDP policy, sends a Grocer Action to the Human (Florist), who replies with a Florist Action.]

• Sequential
• Process oriented
• Blends computational and social science results


Initial results – Classification of Florist
• 10 transcripts for training: 4 cooperatives, 6 non-cooperatives
• 5 for testing – average of correctly classified
• X axis – Number of communications
• Y axis – Uncertainty of belief of grocer about florist

[Figure: plot of uncertainty of belief (Y axis, ticks 0 to 1.2) against number of communications (X axis, 1 to 8).]
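
One way to track how the Grocer's uncertainty about the Florist's type changes with each communication is to update a belief over types and record its entropy. The slides do not say which uncertainty measure was plotted, so Shannon entropy is an assumption here, as are the likelihood values, which are hypothetical and not taken from the transcripts.

```python
import math


def entropy_bits(belief):
    """Shannon entropy of a discrete belief, in bits (0 means certain)."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0.0)


def bayes_update(belief, likelihood):
    """Posterior over types given the likelihood of the latest code under each type."""
    posterior = {t: belief[t] * likelihood[t] for t in belief}
    total = sum(posterior.values())
    return {t: p / total for t, p in posterior.items()}


# Hypothetical partial table of P(code | florist type); only two codes are listed.
likelihoods = {
    "RPS": {"coop": 0.6, "ncoop": 0.3},
    "RPO": {"coop": 0.7, "ncoop": 0.2},
}

belief = {"coop": 0.5, "ncoop": 0.5}      # maximal uncertainty at the start
trace = [entropy_bits(belief)]
for code in ["RPS", "RPS", "RPO"]:
    belief = bayes_update(belief, likelihoods[code])
    trace.append(entropy_bits(belief))
print(trace)   # uncertainty after 0, 1, 2, 3 communications
```

With these illustrative numbers the entropy falls after each cooperative-looking code, which is the qualitative shape such a plot would show when classification succeeds.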


Modeling Cultural Factors

• How do we model cultural factors for C and N in a POMDP?

• How do we validate the model?

• Is the model general enough to exhibit plausible culturally-specific human behavior?


Culture and POMDP
• Initial beliefs about others’ social value orientation and behavior usually reflect own culture beliefs about the interaction
• Culture influences frequency of particular actions and communications
• Interpretation of each observation refines the agent’s model of others
• Interpretation is influenced by culture
– Model can capture cultural misinterpretations and their consequences in terms of strategy and outcomes
• Agents from different cultures can have different rewards for the same actions


Other’s type

• Includes factors such as:
– Social Value Orientation
  • Pro-Social/cooperative, individualistic, competitive, altruistic
– Trust, Reputation etc
– Cultural factors
  • Individualist vs Collectivist
  • Egalitarian vs Hierarchy
  • Direct vs Indirect communication


Cognitive Schema of A POMDP

[Figure: interaction schema between agents A and B. Inputs to A’s schema: A’s culture, A’s history with B, Context. Inputs to B’s schema: B’s culture, B’s history with A, Context. Nodes: A’s real intent, A’s behavior, B’s interpretation of A’s intent, B’s real intent, B’s behavior, A’s interpretation of B’s intent. POMDP elements listed alongside: State Space, Initial Beliefs, Actions, Observations, Transition, Reward.]

Capturing initial state of model

[Figure: the same A/B interaction schema and POMDP elements (State Space, Initial Beliefs, Actions, Observations, Transition, Reward), annotated with the data sources Survey experiments and Observer Experiments.]

Capturing model dynamics

[Figure: the same A/B interaction schema and POMDP elements (State Space, Initial Beliefs, Actions, Observations, Transition, Reward), annotated with the data source Intercultural transcripts.]


Plans for Next Year
• Initial beliefs from Observer Experiment and from surveys (US, Turkey, Egypt, Qatar)
• Collect intra-cultural negotiation transcripts
– US, Turkey, Egypt
• Build POMDPs from intra-cultural negotiation transcripts
– US, Turkey, Egypt
• Build POMDPs from inter-cultural negotiation transcripts
– US-Hong Kong, US-German, US-Israeli (have) (courtesy of Wendi Adair and Jeanne Brett)
– US-Turkish, US-Egyptian, US-Qatari (collect)


Plans for Next Year
• Validate the predictive behavior of the models
– Using the transcripts for training and testing

• Use the models in negotiation with humans

• Use the models in what-if scenarios

• Use the models to generate hypotheses to test with human subjects

• Initial models for collaboration scenarios using POMDP


Thank You

Any questions?
