Lecture 6 - Emotions and Decision Making
Jesse Hoey
School of Computer Science
University of Waterloo
February 10, 2017
Readings:
Damasio Descartes’ Error Chapter 8 (handouts on Slack)
Nabiha Asghar and Jesse Hoey. Intelligent Affect: Rational Decision Making for Socially Aligned Agents. Proceedings of Uncertainty in Artificial Intelligence, Amsterdam, 2015
Lawler, Thye and Yoon Social Commitments in a Depersonalized World, Chapter 1 (handouts on Slack)
Somatic Marker Hypothesis
Damasio Descartes’ Error Chapter 8
Reasoning and deciding assumes knowledge about
the situation
the different options
the consequences of options
some method of computing a strategy
Emotion is seldom recognized
Primary vs. Secondary Emotions
Damasio Descartes’ Error Chapter 7
Primary Emotions:
innate
simple
child-like
(some) facial expressions
Secondary Emotions:
expressed verbally
more complex data structures
dependent on context and memory
learned
adult-like
Situations with conflicting elements
“high-reason” view
  - logical reasoning
  - no place for emotion
  - hard because attention and working memory have a limited capacity
“somatic-marker” hypothesis
  - feelings generated from secondary emotions
  - focusses action choices
  - flags up good/bad choices
  - leads to fewer alternatives
  - a biasing device
  - “high-reason” can be applied afterwards to the reduced space of possibilities.
  - “there is still room for using a cost/benefit analysis and proper deductive competence, but only after the automated step drastically reduces the number of options” (p173)
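The two-stage process described above (an automatic pre-selection followed by cost/benefit analysis on the reduced set) can be sketched as follows. This is only an illustration: the function name `choose`, the `threshold` parameter, and all option names and numbers are hypothetical and do not come from Damasio.

```python
from typing import Callable, Dict, List


def choose(options: List[str],
           marker: Dict[str, float],
           utility: Callable[[str], float],
           threshold: float = 0.0) -> str:
    """Two-stage choice: somatic pre-selection, then cost/benefit analysis."""
    # Stage 1 (automatic): drop options whose marker flags them as bad.
    shortlist = [o for o in options if marker.get(o, 0.0) >= threshold]
    # If every option is flagged, fall back to the full set rather than fail.
    if not shortlist:
        shortlist = list(options)
    # Stage 2 (deliberate): "high-reason" evaluation of the reduced set only.
    return max(shortlist, key=utility)


# Hypothetical example values (assumptions for illustration only).
OPTIONS = ["risky_deal", "safe_deal", "walk_away"]
MARKER = {"risky_deal": -0.8, "safe_deal": 0.4, "walk_away": 0.1}
UTILITY = {"risky_deal": 10.0, "safe_deal": 6.0, "walk_away": 0.0}

print(choose(OPTIONS, MARKER, UTILITY.get))
```

With these numbers the negatively marked option is never evaluated, even though it has the highest nominal utility; lowering `threshold` far enough recovers the pure “high-reason” choice.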
Somatic Markers
based on secondary emotions
pre-selection of actions/choices
may be absent
  - psycho- or socio-paths,
  - brain damage
  - “sick culture”
learned in childhood/adolescence
can be sub-conscious
call forth attention and working memory
located in prefrontal cortex along with secondary emotions
“tuned to cultural prescriptions designed to ensure survival in a particular society” (p200)
Example (p193 in Descartes’ Error)
[Figure: choosing between two dates (“Date 1 or 2?”). Undamaged prefrontal cortex: the choice is weighed on considerations such as proximity to other dates, day of week, even-numbered day?, last day? Ventromedial prefrontal damage: dither, flip coin, defer.]
Neural basis of somatic markers
pre-frontal cortices
“eavesdropping” posts
bring together
  - external signals
  - internal “body” signals
  - effectors (motor)
Action selection in Affect Control Theory
From Heise, 2007 (Chapt 7)
Institutionally constrained
e.g. professor acting upon a student
can:
  - advise
  - question
  - compliment
  - punish
can’t:
  - arrest
  - sell to
  - spank
  - make love to
then, “behaviours that best confirm your sentiments become psychologically available”
you then “select from this relatively small set the behaviour that is most sensible in the circumstances”
institutionally appropriate → feasible → sentiment affirming
Action selection in Affect Control Theory
In ACT:
  - Deflection: $D = \sum_i w_i (f_i - \tau_i)^2$
  - Optimal behaviour found by solving $0 = \frac{\partial D}{\partial f_{b,j}}$, where $j \in \{e, p, a\}$
In BayesACT:
  - sentiment distribution: $\Pr(f' \mid f, \tau, x, b_a, \varphi) \propto e^{-\psi(f', \tau, x) - \xi(f', f, b_a, x)}$
  - optimal behaviour found by solving $0 = \frac{d}{d f'_{b,j}} \Pr(f' \mid f, \tau, x, b_a, \varphi)$
  - optimal behaviour in BayesACT is simply the mean of the product of Gaussians
  - if instead we solve $0 = \frac{\partial}{\partial f'_{b,j}} \Pr(f' \mid f, \tau, x, b_a, \varphi)$, we get the same answer as ACT
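The statement that the optimum is “simply the mean of the product of Gaussians” can be illustrated in one dimension. The sketch below assumes that both exponent terms are quadratic in a single scalar, which is a simplification and not the actual form of ψ and ξ in BayesACT; the function names are hypothetical. Under that assumption the density is a product of two Gaussians, its mode equals its mean, and setting the derivative to zero gives the precision-weighted average.

```python
def product_of_gaussians(mu1, var1, mu2, var2):
    """Mean and variance of the renormalised product of two 1-D Gaussians."""
    precision = 1.0 / var1 + 1.0 / var2
    # Precision-weighted average of the two means.
    mean = (mu1 / var1 + mu2 / var2) / precision
    return mean, 1.0 / precision


def quadratic_cost(x, mu1, var1, mu2, var2):
    """Negative log of the unnormalised product, up to an additive constant."""
    return 0.5 * (x - mu1) ** 2 / var1 + 0.5 * (x - mu2) ** 2 / var2


# Hypothetical numbers: a tight term centred at 1.0 and a loose one at -1.0.
mean, var = product_of_gaussians(1.0, 0.5, -1.0, 2.0)
print(mean, var)
```

The returned mean is also the minimiser of `quadratic_cost`, which is the same kind of weighted squared-deviation objective as the deflection D.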
BayesACT action selection
Recall the Normative Action Bias:
$\pi^\dagger(f'_b) = \int_{f'_a, f'_c} \int_s \Pr(f' \mid f, \tau, x, \varphi)\, b(s)$
$b^*_a = \arg\max_{f'_b} \pi^\dagger(f'_b)$
$\pi^\dagger(f'_b)$ is marginal belief over $f'_b$
$b^*_a$ is action prescription: the deflection minimizing, most affectively aligned action to take
agent attends only to actions, a, that have affective meaning close to $b^*_a$
However, it may not lead to the highest reward individually
POMCP allows search of “nearby” actions that could be (almost) equally aligned but more highly rewarding
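The idea of attending only to actions near the prescription and then picking the most rewarding of them can be sketched with a one-step filter. This is a deliberate simplification that replaces the POMCP tree search with a single distance test; the function names, the `radius` parameter, and all EPA vectors and rewards are hypothetical values chosen for illustration.

```python
import math


def attended_actions(actions, b_star, radius):
    """Names of actions whose EPA vector lies within `radius` of b_star."""
    return [name for name, epa in actions.items()
            if math.dist(epa, b_star) <= radius]


def select_action(actions, rewards, b_star, radius):
    """Most rewarding action among those affectively close to b_star."""
    candidates = attended_actions(actions, b_star, radius)
    if not candidates:
        # Nothing within the radius: fall back to the single closest action.
        candidates = [min(actions, key=lambda n: math.dist(actions[n], b_star))]
    return max(candidates, key=lambda n: rewards[n])


# Hypothetical EPA vectors and individual rewards (illustration only).
ACTIONS = {"help": (2.0, 1.0, 1.0),
           "advise": (1.8, 1.2, 0.6),
           "punish": (-1.5, 1.5, 0.5)}
REWARDS = {"help": 1.0, "advise": 1.5, "punish": 3.0}
B_STAR = (2.0, 1.1, 0.9)

print(select_action(ACTIONS, REWARDS, B_STAR, radius=0.6))
```

A small radius keeps the agent on the most aligned action; a larger radius admits nearby actions with higher individual reward, while far-away actions are never considered regardless of their reward.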
Somatic Markers in Game Trees
[Figure, built up over four slides: game tree unrolled over TIME, with levels labelled Current State, Agent Action, Post-Action State, Observation/Alter Action and Next State, and REWARD/UTILITY attached. Later builds label the tree COGNITIVE/DENOTATIVE and add an AFFECTIVE/CONNOTATIVE label.]
Somatic Marker Action Selection in BayesACT
Somatic Action Transform selects best actions to attend to:
[Figure: Somatic Transform. Labels: AFFECTIVE DYNAMICS taking τ to P(f_b) in CONNOTATIVE SPACE (EPA SPACE); culturally shared affective meaning of actions; P(a) in DENOTATIVE SPACE (ACTION SPACE); DENOTATIVE PLAN TREE search with explored branches and an unexplored branch away from the connotative optimal solution.]
Reinforcement Learning
Traditional Reinforcement Learning (RL) is based on the “exploration-exploitation” tradeoff
Exploitation: use learned knowledge of reward structure to optimize decisions
Exploration: try something new (it might be better than what you know)!
Tradeoff: often based on “optimism in the face of uncertainty”
In BayesACT:
Exploitation: use learned knowledge of alignment to automatically (System I/affective) choose socially best action
Exploration: use System II (rational/cognitive) to explore nearby, possibly better (individually at first, then socially) actions.
Tradeoff: based on resource (time and energy) bounds
Exploratory actions that lead to individual reward are extinguished
Exploratory actions that lead to social/global reward are reified, celebrated and called “creative”.
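For the traditional RL side of this comparison, “optimism in the face of uncertainty” is commonly illustrated with the UCB1 rule for multi-armed bandits, which adds an uncertainty bonus to each action's average reward. The sketch below shows UCB1 as one standard example of that principle; it is not claimed to be the method used in the lecture, and the function name is an assumption.

```python
import math


def ucb1_choice(counts, totals, c=2.0):
    """Pick an arm index using the UCB1 rule.

    counts[a] is how often arm a was tried; totals[a] is its summed reward.
    """
    # Try every arm once before comparing optimistic estimates.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    # Average reward (exploitation) plus a bonus that shrinks as an arm
    # is tried more often (exploration).
    scores = [totals[a] / counts[a] + math.sqrt(c * math.log(t) / counts[a])
              for a in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)
```

An arm that has been tried only once gets a large bonus and is chosen even when its average is lower, whereas the BayesACT tradeoff described above is bounded by time and energy rather than by visit counts.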
Reward and Deflection
EPA used by humans to assess reward (Fennell et al. 2013):
Evaluation: expected value,
Potency: risk (e.g. powerful things are more risky, because they do what they want and ignore you),
Activity: uncertainty, increased risk, and decreased values (e.g. faster and more excited things are more risky and less likely to result in reward).
EPA in correspondence with choice in Social Dilemmas (Scholl 2013):
Evaluation: affiliation or correspondence between outcomes: agents with similar goals will rate each other more positively.
Potency is a measure of dependence: agents who can reach their goals independently of other agents are more powerful.
Activity is a measure of the magnitude of dependence: agents with bigger payoffs will tend to be more active.
Deflection ≡ reward?
Prisoner’s Dilemma
Payoff Matrix:
                      Cooperate (Give 2)   Defect (Take 1)
Cooperate (Give 2)    (2,2)                (0,3)
Defect (Take 1)       (3,0)                (1,1)
Nash Equilibrium: Always defect (take 1)
But humans will cooperate for long periods
Tit-for-Tat is nice, retaliatory and forgiving
Tit-for-Tat supports long(er)-term cooperation, but not like humans
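The payoff matrix and the Tit-for-Tat strategy above can be simulated directly. The sketch below uses the payoffs from the slide; the function names and the strategy interface (each strategy sees its own and the opponent's history) are assumptions made for this example.

```python
# Payoffs from the matrix above: (row player, column player).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """The one-shot Nash strategy: always take 1."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
```

Two Tit-for-Tat players cooperate throughout, while against a constant defector Tit-for-Tat loses only the first round and then retaliates.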
Prisoner’s Dilemma
Rationalistic Interpretation
Nowak 2006 [1]
modify the payoff matrix adding a bonus for cooperation because of:
  - Kinship: bonus for cooperating with your kin
  - Direct reciprocity (knowledge of repeated play)
  - Indirect reciprocity (trust and reputation)
  - Network reciprocity (your neighbours will be good to you)
  - Group selection (you will interact with a smaller group)
[1] Martin A. Nowak. Five Rules for the Evolution of Cooperation. Science, Vol. 314, Issue 5805, pp. 1560-1563, 2006. DOI: 10.1126/science.1133755
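The effect of adding a bonus for cooperation can be checked numerically on the “give 2 / take 1” payoffs used earlier. The sketch below assumes a single flat bonus paid to a player whenever that player cooperates, which is a simplification of the five mechanisms listed above; the names `BASE`, `with_bonus` and `best_response` are hypothetical.

```python
# My payoff for (my move, their move), from the "give 2 / take 1" matrix.
BASE = {("C", "C"): 2.0, ("C", "D"): 0.0,
        ("D", "C"): 3.0, ("D", "D"): 1.0}


def with_bonus(payoff, bonus):
    """Add a flat bonus to my payoff whenever I cooperate (an assumption)."""
    return {(mine, theirs): value + (bonus if mine == "C" else 0.0)
            for (mine, theirs), value in payoff.items()}


def best_response(payoff, their_move):
    """My payoff-maximising move against a fixed opponent move."""
    return max(("C", "D"), key=lambda mine: payoff[(mine, their_move)])


for bonus in (0.0, 1.5):
    modified = with_bonus(BASE, bonus)
    print(bonus, best_response(modified, "C"), best_response(modified, "D"))
```

With no bonus, defection is the best response to either move; under this flat-bonus assumption, any bonus above 1 makes cooperation the best response to both.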
Prisoner’s Dilemma
Affect Control Theoretic Interpretation
Reciprocity:
friend: 2.75, 1.88, 1.38
scrooge: 2.15, 0.21, 0.54

                                                                     distance from
agent     client    optimal behaviour   closest labels               collaborate   abandon
friend    friend    1.98, 1.09, 0.96    treat/toast                  0.4           23.9
friend    scrooge   0.46, 1.14, 0.27    reform/lend money to         1.7           10.5
scrooge   friend    0.26, 0.81, 0.77    curry favor/look away from   8.5           4.2
scrooge   scrooge   0.91, 0.80, 0.01    borrow money/chastise        9.6           2.7
Prisoner’s Dilemma
Affect Control Theoretic Interpretation
Kin Selection:
enemy: 2.11, 0.75, 0.19
brother: 1.86, 1.82, 1.5
cousin: 1.66, 0.57, 0.74
stranger: 0.02, 0.09, 0.23
stepbrother: 0.43, 0.23, 0.31

                                                                distance from
agent         client        optimal behaviour   closest labels  collaborate   abandon
brother       brother       2.06, 1.04, 1.19    play with       0.7           25.2
cousin        cousin        1.68, 0.36, 0.59    chitchat with   0.62          18.4
stepbrother   stepbrother   0.62, 0.16, 0.03    implore         1.9           9.6
brother       stranger      1.39, 1.56, 0.59    reply to        0.2           19.7
stranger      stranger      0.23, 0.03, 0.39    dress down      3.6           6.7
brother       enemy         0.12, 1.67, 0.56    convict         2             12
enemy         enemy         0.68, 0.24, 0.45    flee            5.2           4.7
Prisoner’s Dilemma - Robot Experiments
[Figure: four panels, one per timeout (10, 30, 60, 120), each plotting Reward (0 to 12) against game (2 to 20).]
here actions are “give 10” or “take 1”
above with γ = 0.9: more time buys more breadth, shallow solutions found
with γ = 0.99, bots always cooperate: more time buys more depth, deeper solutions found
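One way to see why the discount factor γ matters for how deep a useful search goes is to look at how strongly future rewards are weighted. The sketch below is generic discounting arithmetic, not the planner used in the experiments; the function names are assumptions, and 1/(1 − γ) is used as the usual rule-of-thumb “effective horizon”.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma to the power of its time step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def effective_horizon(gamma):
    """Rule-of-thumb number of steps that carry most of the weight."""
    return 1.0 / (1.0 - gamma)


# Present value of a single reward of 10 received 20 steps from now.
delayed = [0.0] * 20 + [10.0]
for gamma in (0.9, 0.99):
    print(gamma, effective_horizon(gamma), discounted_return(delayed, gamma))
```

With γ = 0.9 a reward twenty steps away keeps only about 12% of its value, so deep branches contribute little; with γ = 0.99 it keeps about 82%, so deeper plans can change the decision.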
Prisoner’s Dilemma - Human Experiments
[Figure: four panels (human, bayesact, titfortat, jerkbot), each plotting Reward (0 to 3) against game (0 to 20).]
here actions are “give 2” or “take 1”
70 participants, 12-18 round games, 360 games total
participants’ winnings = draws for money
Prisoner’s Dilemma - Human experiments
Joshua DA Jung, Jesse Hoey, Jonathan H Morgan, Tobias Schroder, and Ingo Wolf. Grounding social interaction with affective intelligence. In Canadian Conference on Artificial Intelligence, pages 52-57. Springer, 2016.
Battle of the Sexes - Corobots
            Shopping   Football
Shopping    (10,3)     (0,0)
Football    (0,0)      (3,10)
[Figure: axis from −5 through 0 to +5, with labels “?”, [−2,+1,+1], [+2,−1,−1] and [+2,+2,+2].]
Robot “identity”: $F_a = [E, P, A]$
Normative Action Communication Bias: $\Pr(F_b \mid F_a, F_c) \sim \mathcal{N}((F_a - F_c)/2, \Sigma_b)$.
Social Coordination Bias: towards own goal if $F_{a,p} > F_{c,p}$, else towards other robot
Asymmetrical: one robot gets more planning time, larger $\Sigma_b$, and better initial estimate of other robot’s identity
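The coordination problem in the Shopping/Football payoff matrix above can be made explicit by enumerating its pure-strategy Nash equilibria. The sketch below is a generic brute-force check and is not part of the Corobots system; the names `PAYOFF`, `MOVES` and `pure_nash` are assumptions.

```python
from itertools import product

MOVES = ("Shopping", "Football")
# (row player payoff, column player payoff), from the matrix above.
PAYOFF = {("Shopping", "Shopping"): (10, 3),
          ("Shopping", "Football"): (0, 0),
          ("Football", "Shopping"): (0, 0),
          ("Football", "Football"): (3, 10)}


def pure_nash(payoff, moves):
    """All move pairs from which neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(moves, repeat=2):
        row_ok = all(payoff[(row, col)][0] >= payoff[(alt, col)][0]
                     for alt in moves)
        col_ok = all(payoff[(row, col)][1] >= payoff[(row, alt)][1]
                     for alt in moves)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


print(pure_nash(PAYOFF, MOVES))
```

Both matching outcomes are equilibria, but each favours a different player, so the payoffs alone do not say who should yield; the coordination bias above gives a rule for breaking that tie.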
Disappointment and Anger
Maarten J.J. Wubben, David De Cremer, Eric van Dijk, How emotion
communication guides reciprocity: Establishing cooperation through
disappointment and anger, Journal of Experimental Social Psychology, Volume
45, Issue 4, July 2009, Pages 987-990.
More complex prisoner’s dilemma game
  - Each play splits a pot of 10 “tokens”
  - Give a token: worth $1
  - Take a token: worth $.50
Participants played a robot that expressed:
  - nothing
  - disappointment
  - anger
expression was in proportion to amount donated/lost
Showed that
  - disappointment encouraged cooperation,
  - anger encouraged retaliation
See also Papers by Antos (webpage)
Networked Prisoner’s Dilemma
move from simple descriptions of network dynamics to ACT-based descriptions
BayesACT agents replicate 4/5 properties of human behaviour [2]
  - invariance to network structure,
  - global cooperation rates decline over time, but remain non-zero,
  - cooperation is anti-correlated with reward,
  - “moody conditional cooperative” behaviour,
  - human play is stratified into four major groups.
[2] Joshua DA Jung and Jesse Hoey. Socio-Affective Agents as Models of Human Behaviour in the Networked Prisoners Dilemma. arxiv.org/abs/1701.09112
Networked Prisoner’s Dilemma
predict social network structures based on ACT
compute policies for social networking “bots” that
  - incentivize economic growth
  - implement social policy
  - help people’s quality of life
  - catalyze social change
  - help with human long-term survival
Theory of Social Commitments
Chapter 2: Narratives of Social Transformation
Individualization
  - basis of social order is transactional, contractual, brittle
  - requires rules to enforce social contracts
Socio-Relational
  - emphasis on social nature of humans
  - relational and emotional ties
  - rules cannot eliminate these ties, they re-emerge covertly
From: Edward J. Lawler, Shane R. Thye and Jeongkoo Yoon. Social Commitments in a Depersonalized World.
Russell Sage Foundation, 2009.
Theory of Social Commitments
Chapter 2: Forms of Commitment
[Figure: diagram linking Normative Commitment, Instrumental Commitment and Affective Commitment, with links labelled a, b and c.]
normative commitments: external enforcement of joint efforts/collective goods (b)
normative commitments arise from affective sentiments aboutshared membership (c)
affective ties arise as a by-product of instrumental conditions (a)
Individualization narrative: Instrumental, normative and (b) only
Socio-Relational narrative: all forms + links
Theory of Social Commitments
Chapter 3: Theories of Affect in Social Interaction
Interaction Context
  - normative
  - structural
  - emotions are constructs to encode norms
Interaction Process
  - Signals to self
  - Signals to others
  - Cognitive adjustments
  - emotions create interpersonal feelings that promote group cohesion
Interaction Outcome
  - Instrumental → affective
  - emotions “create” groups
  - groups are the objects of emotions
Theory of Social Commitments
Chapter 4: Core Assumptions
Social interactions involve jointness
Social interactions foster emotions
emotions are self-reinforcements or punishments
People strive to experience positive emotions (**)
Motivating effects of emotion → causal interpretations
Social group is referenced as cause of emotions
Ties are strengthened by that attribution
Theory of Social Commitments
Chapter 4: Theory of Social Commitments
non-separability of contributions
perceptions of shared responsibility
attribution of emotions to group
strengthening of ties (if +ve emotions)
longer-lasting group cohesion
Instrumental/Transactional
Affective/Relational
Next:
Facial Expressions
Student presentations