lecture 6 - emotions and decision makingjhoey/teaching/cs886-affect/... · 2017-04-10 ·...

Lecture 6 - Emotions and Decision Making

Jesse Hoey

School of Computer Science

University of Waterloo

February 10, 2017

Readings:

Damasio, Descartes’ Error, Chapter 8 (handouts on Slack)

Nabiha Asghar and Jesse Hoey. Intelligent Affect: Rational Decision Making for Socially Aligned Agents. Proceedings of Uncertainty in Artificial Intelligence, Amsterdam, 2015

Lawler, Thye and Yoon, Social Commitments in a Depersonalized World, Chapter 1 (handouts on Slack)

Somatic Marker Hypothesis

Damasio, Descartes’ Error, Chapter 8

Reasoning and deciding assume knowledge about

the situation

the different options

the consequences of options

some method of computing a strategy

Emotion is seldom recognized

Primary vs. Secondary Emotions

Damasio Descartes’ Error Chapter 7

Primary Emotions:

innate

simple

child-like

(some) facial expressions

Secondary Emotions:

expressed verbally

more complex data structures

dependent on context and memory

learned

adult-like

Situations with conflicting elements

“high-reason” view
  - logical reasoning
  - no place for emotion
  - hard because attention and working memory have a limited capacity

“somatic-marker” hypothesis
  - feelings generated from secondary emotions
  - focusses action choices
  - flags up good/bad choices
  - leads to fewer alternatives
  - a biasing device
  - “high-reason” can be applied afterwards to the reduced space of possibilities
  - “there is still room for using a cost/benefit analysis and proper deductive competence, but only after the automated step drastically reduces the number of options” (p173)
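The two-stage idea above (an automated affective filter followed by deliberate cost/benefit analysis on what remains) can be sketched in a few lines. This is only an illustration: the `Option` class, the `marker` score, the threshold and all numbers are hypothetical and do not come from Damasio.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    marker: float   # hypothetical learned "gut feeling", negative = bad
    benefit: float  # hypothetical expected benefit
    cost: float     # hypothetical expected cost


def somatic_preselect(options, threshold=0.0):
    """Automated step: drop options flagged as bad by their marker."""
    return [o for o in options if o.marker >= threshold]


def high_reason(options):
    """Deliberate step: cost/benefit analysis over the options given."""
    return max(options, key=lambda o: o.benefit - o.cost)


options = [
    Option("risky investment", marker=-0.8, benefit=10.0, cost=2.0),
    Option("safe investment", marker=0.4, benefit=4.0, cost=1.0),
    Option("do nothing", marker=0.1, benefit=0.0, cost=0.0),
]

shortlist = somatic_preselect(options)   # fewer alternatives
choice = high_reason(shortlist)          # reasoning on the reduced space
print([o.name for o in shortlist], "->", choice.name)
```

With these made-up numbers, pure cost/benefit analysis over all options would pick the risky investment, while the marker removes it before deliberation starts.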

Somatic Markers

based on secondary emotions

pre-selection of actions/choices

may be absent
  - psycho- or socio-paths
  - brain damage
  - “sick culture”

learned in childhood/adolescence

can be sub-conscious

call forth attention and working memory

located in prefrontal cortex along with secondary emotions

“tuned to cultural prescriptions designed to ensure survival in a particular society” (p200)

Example (p193 in Descartes’ Error)

[Figure: choosing between two dates (“Date 1 or 2?”). Panel labels: “undamaged prefrontal” and “ventromedial prefrontal damage”. Other labels: proximity to other dates; day of week; even numbered day?; last day?; dither; flip coin; defer.]

Neural basis of somatic markers

pre-frontal cortices

“eavesdropping” posts

bring together
  - external signals
  - internal “body” signals
  - effectors (motor)

Action selection in Affect Control Theory

From Heise, 2007 (Chapt 7)

Institutionally constrained

e.g. professor acting upon a student

can:
  - advise
  - question
  - compliment
  - punish

can’t:
  - arrest
  - sell to
  - spank
  - make love to

then, “behaviours that best confirm your sentiments become psychologically available”

you then “select from this relatively small set the behaviour that is most sensible in the circumstances”

institutionally appropriate → feasible → sentiment affirming

Action selection in Affect Control Theory

In ACT:
  - Deflection: $D = \sum_i w_i (f_i - \tau_i)^2$
  - Optimal behaviour found by solving $0 = \frac{\partial D}{\partial f_{b,j}}$, where $j \in \{e, p, a\}$

In BayesACT:
  - sentiment distribution: $\Pr(f' \mid f, \tau, x, b_a, \varphi) \propto e^{-\psi(f', \tau, x) - \xi(f', f, b_a, x)}$
  - optimal behaviour found by solving $0 = \frac{d}{d f'_{b,j}} \Pr(f' \mid f, \tau, x, b_a, \varphi)$
  - optimal behaviour in BayesACT is simply the mean of the product of Gaussians
  - if instead we solve $0 = \frac{\partial}{\partial f'_{b,j}} \Pr(f' \mid f, \tau, x, b_a, \varphi)$, we get the same answer as ACT
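The "mean of the product of Gaussians" can be made concrete in one dimension. The sketch below assumes both factors in the exponent are quadratic, so each is a Gaussian in the same variable; the function name and the numbers are illustrative only and are not part of BayesACT.

```python
def gaussian_product(mu1, var1, mu2, var2):
    """Mean and variance of the renormalised product of two 1-D Gaussians.

    Precisions (inverse variances) add, and the mean is the
    precision-weighted average of the two means.
    """
    precision = 1.0 / var1 + 1.0 / var2
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var


# Equal confidence in both factors: the mean lands halfway between them.
print(gaussian_product(2.0, 1.0, 0.0, 1.0))

# A tighter first factor pulls the mean towards its own centre.
print(gaussian_product(2.0, 0.25, 0.0, 1.0))
```

Because a Gaussian peaks at its mean, setting the derivative of the product to zero and reading off the mean give the same point.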

BayesACT action selection

Recall the Normative Action Bias:

$\pi^\dagger(f'_b) = \int_{f'_a, f'_c} \int_s \Pr(f' \mid f, \tau, x, \varphi)\, b(s)$

$b^*_a = \arg\max_{f'_b} \pi^\dagger(f'_b)$

$\pi^\dagger(f'_b)$ is marginal belief over $f'_b$

$b^*_a$ is action prescription: the deflection minimizing, most affectively aligned action to take

agent attends only to actions, $a$, that have affective meaning close to $b^*_a$

However, it may not lead to the highest reward individually

POMCP allows search of “nearby” actions that could be (almost) equally aligned but more highly rewarding
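A minimal sketch of "attend only to nearby actions, then prefer the more rewarding one" follows. It is a greedy stand-in, not POMCP: the action names, their EPA vectors, the rewards and the attention radius are all hypothetical.

```python
import math


def epa_distance(u, v):
    """Euclidean distance between two EPA vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def attended_actions(actions, b_star, radius):
    """Keep actions whose affective meaning lies within `radius` of b_star."""
    return {
        name: (epa, reward)
        for name, (epa, reward) in actions.items()
        if epa_distance(epa, b_star) <= radius
    }


def best_nearby(actions, b_star, radius):
    """Among attended actions, pick the one with the highest reward."""
    nearby = attended_actions(actions, b_star, radius)
    return max(nearby, key=lambda name: nearby[name][1])


# Hypothetical actions: name -> (EPA vector, individual reward).
actions = {
    "help": ((2.0, 1.0, 0.5), 1.0),
    "advise": ((1.8, 1.2, 0.3), 2.0),
    "exploit": ((-2.0, 1.5, 1.0), 5.0),
}
b_star = (2.0, 1.0, 0.4)  # hypothetical action prescription

print(best_nearby(actions, b_star, radius=0.5))
```

Here the highest-reward action overall is never considered because it lies far from the prescription in EPA space.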

Somatic Markers in Game Trees

[Figure: game tree unrolled over TIME. Node labels: Current State, Agent Action, Post-Action State, Observation/Alter Action, Next State, REWARD/UTILITY. Layer labels: COGNITIVE/DENOTATIVE and AFFECTIVE/CONNOTATIVE.]

Somatic Marker Action Selection in BayesACT

Somatic Action Transform selects best actions to attend to:

[Figure: Somatic Transform. Labels: AFFECTIVE DYNAMICS; τ; P(f_b); CONNOTATIVE SPACE (EPA SPACE); culturally shared affective meaning of actions; P(a); DENOTATIVE SPACE (ACTION SPACE); DENOTATIVE PLAN TREE; search; explored branches; unexplored branch; away from connotative optimal solution.]

Reinforcement Learning

Traditional Reinforcement Learning (RL) is based on the “exploration-exploitation” tradeoff

Exploitation: use learned knowledge of reward structure to optimize decisions

Exploration: try something new (it might be better than what you know)!

Tradeoff: often based on “optimism in the face of uncertainty”

In BayesACT:

Exploitation: use learned knowledge of alignment to automatically (System I/affective) choose socially best action

Exploration: use System II (rational/cognitive) to explore nearby, possibly better (individually at first, then socially) actions.

Tradeoff: based on resource (time and energy) bounds

Exploratory actions that lead to individual reward are extinguished

Exploratory actions that lead to social/global reward are reified, celebrated and called “creative”.
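For the traditional RL case, "optimism in the face of uncertainty" is often illustrated with the UCB1 rule for multi-armed bandits: each arm's estimated value gets a bonus that shrinks as the arm is tried more often. The sketch below is a generic bandit example, not part of BayesACT; the arm probabilities and step count are made up.

```python
import math
import random


def ucb1_choose(counts, values, t):
    """Pick an arm: try untested arms first, then maximise mean + bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda a: scores[a])


def run_bandit(probs, steps, seed=0):
    """Play a Bernoulli bandit with UCB1; return pull counts and value estimates."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)
    for t in range(1, steps + 1):
        arm = ucb1_choose(counts, values, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


counts, values = run_bandit(probs=[0.2, 0.8], steps=2000)
print(counts, [round(v, 2) for v in values])
```

The uncertainty bonus plays the role of exploration; in the BayesACT framing above, that role is instead bounded by time and energy.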

Reward and Deflection

EPA used by humans to assess reward (Fennell et al. 2013):

Evaluation: expected value,

Potency: risk (e.g. powerful things are more risky, because they do what they want and ignore you),

Activity: uncertainty, increased risk, and decreased values (e.g. faster and more excited things are more risky and less likely to result in reward).

EPA in correspondence with choice in Social Dilemmas (Scholl 2013):

Evaluation: affiliation or correspondence between outcomes: agents with similar goals will rate each other more positively.

Potency is a measure of dependence: agents who can reach their goals independently of other agents are more powerful.

Activity is a measure of the magnitude of dependence: agents with bigger payoffs will tend to be more active.

Deflection ≡ reward?

Prisoner’s Dilemma

Payoff Matrix:

                     Cooperate: Give 2   Defect: Take 1
Cooperate: Give 2    (2,2)               (0,3)
Defect: Take 1       (3,0)               (1,1)

Nash Equilibrium: Always defect (take 1)

But Humans will cooperate for long periods

Tit-for-Tat is nice, retaliatory and forgiving,

Tit-for-Tat supports long(er)-term cooperation, but not like humans
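The payoff matrix and the two claims above (defection is the only Nash equilibrium; Tit-for-Tat sustains cooperation but retaliates) can be checked with a short script. The payoffs are those on the slide, written as (row player, column player); the function names and the round count are illustrative.

```python
# Payoffs from the slide: C = cooperate (give 2), D = defect (take 1).
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def is_nash(a, b):
    """True if neither player gains by unilaterally changing their move."""
    row_ok = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in "CD")
    col_ok = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in "CD")
    return row_ok and col_ok


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Repeated game; each strategy sees only the other's past moves."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b


print([(a, b) for a in "CD" for b in "CD" if is_nash(a, b)])
print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
```

Two Tit-for-Tat players cooperate throughout, while against a constant defector Tit-for-Tat loses only the first round and then defects as well.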


Rationalistic Interpretation

Nowak 2006 [1]

modify the payoff matrix adding a bonus for cooperation because of:
  - Kinship: bonus for cooperating with your kin
  - Direct reciprocity (knowledge of repeated play)
  - Indirect reciprocity (trust and reputation)
  - Network reciprocity (your neighbours will be good to you)
  - Group selection (you will interact with a smaller group)

[1] Martin A. Nowak. Five Rules for the Evolution of Cooperation. Science, Vol. 314, Issue 5805, pp. 1560-1563, 2006. DOI: 10.1126/science.1133755
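To see how a cooperation bonus changes the game, the sketch below adds a flat bonus to a player's payoff whenever that player cooperates and recomputes the pure-strategy Nash equilibria. The flat bonus is a simplifying assumption made for illustration; Nowak derives the size of the effect separately for each of the five mechanisms. The base payoffs are the "give 2 / take 1" matrix used earlier in the lecture.

```python
BASE = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def with_bonus(bonus):
    """Add `bonus` to a player's payoff in every cell where they cooperate."""
    return {
        (a, b): (
            pa + (bonus if a == "C" else 0),
            pb + (bonus if b == "C" else 0),
        )
        for (a, b), (pa, pb) in BASE.items()
    }


def nash_equilibria(payoff):
    """All pure-strategy profiles where no player gains by deviating alone."""
    found = []
    for a in "CD":
        for b in "CD":
            row_ok = all(payoff[(a, b)][0] >= payoff[(x, b)][0] for x in "CD")
            col_ok = all(payoff[(a, b)][1] >= payoff[(a, y)][1] for y in "CD")
            if row_ok and col_ok:
                found.append((a, b))
    return found


print(nash_equilibria(with_bonus(0)))
print(nash_equilibria(with_bonus(1.5)))
```

With this matrix, any bonus above 1 makes cooperation the better reply to both moves, so mutual cooperation replaces mutual defection as the equilibrium.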


Affect Control Theoretic Interpretation

Reciprocity:

friend: 2.75, 1.88, 1.38
scrooge: 2.15, 0.21, 0.54

                                                                 distance from
agent    client   optimal behaviour  closest labels              collaborate  abandon
friend   friend   1.98, 1.09, 0.96   treat/toast                 0.4          23.9
friend   scrooge  0.46, 1.14, 0.27   reform/lend money to        1.7          10.5
scrooge  friend   0.26, 0.81, 0.77   curry favor/look away from  8.5          4.2
scrooge  scrooge  0.91, 0.80, 0.01   borrow money/chastise       9.6          2.7


Affect Control Theoretic Interpretation

Kin Selection:

enemy: 2.11, 0.75, 0.19
brother: 1.86, 1.82, 1.5
cousin: 1.66, 0.57, 0.74
stranger: 0.02, 0.09, 0.23
stepbrother: 0.43, 0.23, 0.31

                                                             distance from
agent        client       optimal behaviour  closest labels  collaborate  abandon
brother      brother      2.06, 1.04, 1.19   play with       0.7          25.2
cousin       cousin       1.68, 0.36, 0.59   chitchat with   0.62         18.4
stepbrother  stepbrother  0.62, 0.16, 0.03   implore         1.9          9.6
brother      stranger     1.39, 1.56, 0.59   reply to        0.2          19.7
stranger     stranger     0.23, 0.03, 0.39   dress down      3.6          6.7
brother      enemy        0.12, 1.67, 0.56   convict         2            12
enemy        enemy        0.68, 0.24, 0.45   flee            5.2          4.7

Prisoner’s Dilemma - Robot Experiments

[Figure: four panels plotting Reward (0 to 12) against game (up to 20), one panel per timeout: 10, 30, 60 and 120.]

here actions are “give 10” or “take 1”

above with γ = 0.9: more time buys more breadth, shallow solutions found

with γ = 0.99, bots always cooperate: more time buys more depth, deeper solutions found
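One way to read the difference between γ = 0.9 and γ = 0.99 is through how quickly future rewards are discounted. The sketch below uses the standard rule of thumb that rewards matter over roughly 1/(1 − γ) steps; it is a generic illustration of discounting, not a model of the planner used in the experiments.

```python
def effective_horizon(gamma):
    """Rule-of-thumb number of future steps that still matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def discount_weight(gamma, k):
    """Weight applied to a reward received k steps in the future."""
    return gamma ** k


for gamma in (0.9, 0.99):
    print(
        gamma,
        round(effective_horizon(gamma)),
        round(discount_weight(gamma, 20), 3),
    )
```

At γ = 0.9 a reward 20 games ahead keeps about 12% of its value, while at γ = 0.99 it keeps about 82%, so long-run cooperation is worth searching for only in the second case.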

Prisoner’s Dilemma - Human Experiments

[Figure: four panels plotting Reward (0 to 3) against game (0 to 20), one panel per player type: human, bayesact, titfortat and jerkbot.]

here actions are “give 2” or “take 1”

70 participants, 12-18 round games, 360 games total

participants’ winnings = draws for money

Prisoner’s Dilemma - Human experiments

Joshua DA Jung, Jesse Hoey, Jonathan H Morgan, Tobias Schroder, and Ingo Wolf. Grounding social interaction with affective intelligence. In Canadian Conference on Artificial Intelligence, pages 52-57. Springer, 2016.

Battle of the Sexes - Corobots

            Shopping   Football
Shopping    (10,3)     (0,0)
Football    (0,0)      (3,10)

[Figure: axis running from −5 through 0 to +5. Labels: ?, [−2,+1,+1], [+2,−1,−1], [+2,+2,+2].]

Robot “identity”: $F_a = [E, P, A]$

Normative Action Communication Bias: $\Pr(F_b \mid F_a, F_c) \sim \mathcal{N}((F_a - F_c)/2, \Sigma_b)$

Social Coordination Bias: towards own goal if $F_{a,p} > F_{c,p}$, else towards other robot

Asymmetrical: one robot gets more planning time, larger $\Sigma_b$, and better initial estimate of other robot’s identity

Disappointment and Anger

Maarten J.J. Wubben, David De Cremer, Eric van Dijk. How emotion communication guides reciprocity: Establishing cooperation through disappointment and anger. Journal of Experimental Social Psychology, Volume 45, Issue 4, July 2009, Pages 987-990.

More complex prisoner’s dilemma game
  - Each play splits a pot of 10 “tokens”
  - Give a token: worth $1
  - Take a token: worth $.50

Participants played a robot that expressed:
  - nothing
  - disappointment
  - anger

expression was in proportion to amount donated/lost

Showed that
  - disappointment encouraged cooperation,
  - anger encouraged retaliation

See also Papers by Antos (webpage)
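The token game above can be written as a payoff function to show why it is still a dilemma. The sketch assumes each player decides how many of 10 tokens to give away, that a kept token is worth $0.50 to its holder and a given token is worth $1 to the recipient; that reading of the slide, and the function name, are assumptions.

```python
GIVE_VALUE = 1.00  # dollars per token received from the other player (from the slide)
KEEP_VALUE = 0.50  # dollars per token kept for oneself (from the slide)


def payoff(my_gift, their_gift, pot=10):
    """Dollar payoff when I give `my_gift` of my tokens and receive `their_gift`."""
    if not 0 <= my_gift <= pot or not 0 <= their_gift <= pot:
        raise ValueError("gifts must be between 0 and the pot size")
    return KEEP_VALUE * (pot - my_gift) + GIVE_VALUE * their_gift


print(payoff(10, 10))  # both give everything
print(payoff(0, 0))    # both keep everything
print(payoff(0, 10))   # I keep everything, the other gives everything
print(payoff(10, 0))   # I give everything, the other keeps everything
```

Under these assumptions, keeping tokens always raises one's own payoff, yet both players do better when both give than when both keep.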

Networked Prisoner’s Dilemma

move from simple descriptions of network dynamics to ACT-based descriptions

BayesACT agents replicate 4/5 properties of human behaviour [2]
  - invariance to network structure,
  - global cooperation rates decline over time, but remain non-zero,
  - cooperation is anti-correlated with reward,
  - “moody conditional cooperative” behaviour,
  - human play is stratified into four major groups.

[2] Joshua DA Jung and Jesse Hoey. Socio-Affective Agents as Models of Human Behaviour in the Networked Prisoner’s Dilemma. arxiv.org/abs/1701.09112

Networked Prisoner’s Dilemma

predict social network structures based on ACT

compute policies for social networking “bots” that
  - incentivize economic growth
  - implement social policy
  - help people’s quality of life
  - catalyze social change
  - help with human long-term survival

Theory of Social Commitments

Chapter 2: Narratives of Social Transformation

Individualization
  - basis of social order is transactional, contractual, brittle
  - requires rules to enforce social contracts

Socio-Relational
  - emphasis on social nature of humans
  - relational and emotional ties
  - rules cannot eliminate these ties, they re-emerge covertly

From: Edward J. Lawler, Shane R. Thye and Jeongkoo Yoon. Social Commitments in a Depersonalized World. Russell Sage Foundation, 2009.


Chapter 2: Forms of Commitment

[Figure: diagram of Instrumental Commitment, Affective Commitment and Normative Commitment, with links labelled (a), (b) and (c).]

normative commitments: external enforcement of joint efforts/collective goods (b)

normative commitments arise from affective sentiments about shared membership (c)

affective ties arise as a by-product of instrumental conditions (a)

Individualization narrative: Instrumental, normative and (b) only

Socio-Relational narrative: all forms + links




Chapter 3: Theories of Affect in Social Interaction

Interaction Context
  - normative
  - structural
  - emotions are constructs to encode norms

Interaction Process
  - Signals to self
  - Signals to others
  - Cognitive adjustments
  - emotions create interpersonal feelings that promote group cohesion

Interaction Outcome
  - Instrumental → affective
  - emotions “create” groups
  - groups are the objects of emotions




Chapter 4: Core Assumptions

Social interactions involve jointness

Social interactions foster emotions

emotions are self-reinforcements or punishments

People strive to experience positive emotions (**)

Motivating effects of emotion → causal interpretations

Social group is referenced as cause of emotions

Ties are strengthened by that attribution




Chapter 4: Theory of Social Commitments

non-separability of contributions

perceptions of shared responsibility

attribution of emotions to group

strengthening of ties (if +ve emotions)

longer-lasting group cohesion

Instrumental/Transactional

Affective/Relational



Next:

Facial Expressions

Student presentations
