thesis defensegaips.inesc-id.pt/.../2013/sequeira2013phdthesis_pres.pdf · 2015-09-10 · 4/38...

Thesis Defense Ph.D. Program in Information Systems and Computer Engineering


Page 1: Thesis Defensegaips.inesc-id.pt/.../2013/sequeira2013phdthesis_pres.pdf · 2015-09-10 · 4/38 Pedro Sequeira – Ph.D. Thesis Defense – September 18, 2013 ! Simple navigation problem

Thesis Defense Ph.D. Program in Information Systems and Computer Engineering


2/38 Pedro Sequeira – Ph.D. Thesis Defense – September 18, 2013

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

3/38

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

4/38

• Simple navigation problem
  • State: agent location
  • Actions: movement

5/38

• RL main ideas (Sutton & Barto, 1998; Kaelbling et al., 1996)
  • Objective: maximize reward over time
  • Task: discover which actions maximize reward in each state
  • e.g., using Q-learning (Watkins, 1989)

[Figure: navigation example with a reward of +1]
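The Q-learning idea named on this slide can be illustrated with a minimal tabular sketch. The environment below is an assumption made for illustration only: a hypothetical one-dimensional corridor whose state is the agent's location, whose actions are left and right moves, and whose right end gives a reward of +1. The learning rate, discount and exploration values are likewise illustrative, not taken from the presentation.

```python
import random


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor; reward +1 at the right end."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = move left, action 1 = move right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s2 = min(max(s + moves[a], 0), goal)  # walls at both ends
            r = 1.0 if s2 == goal else 0.0
            # The goal is terminal, so no bootstrapping from it.
            target = r if s2 == goal else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


q = q_learning()
# Greedy action in every non-goal state (1 means "move right").
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(len(q) - 1)]
```

After training, the greedy policy moves right in every non-goal state, which is the reward-maximizing action in this toy problem.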

6/38

• Assumptions (Sutton & Barto, 1998; Kaelbling et al., 1996)
  • Fully observable environments
  • Infinite visits to all states and actions
  • Stationary environments
• Agent limitations
  • Limited perception and computational resources
  • Dynamic, unpredictable and unreliable environment
  • Demand for manual adjustments
• Design assumptions too restrictive (Littman, 1994; Loch and Singh, 1998; Singh et al., 1994)

7/38

• Rewards are fundamental (Sutton & Barto, 1998; Kaelbling et al., 1996)
  • Implicitly define the agent’s task
  • Impact on the learning time
  • Impact on what is learned
• Major challenge (Abbeel and Ng, 2004; Ng and Russell, 2000; Sorg et al., 2010a)
  • Build reward mechanisms so the task is learned efficiently
  • Flexible and robust
  • Enhance agent’s autonomy

8/38

“Design reward mechanisms for RL agents that are able to alleviate their inherent perceptual limitations and make them operate in a wide variety of domains without the explicit intervention of others or relying on expert or domain knowledge about a particular task.”

9/38

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

10/38

• Agent has access to several sources of information
  • Perception, reasoning, learning, etc.
  • “How long since I observed this state?”
  • “What is the best state that I can be in?”
  • “How many times have I performed this action?”
  • “What is the value of this state?”
  • …

[Figure: agent diagram. Labels: state, reward; interaction information; processing mechanisms; decision making; action]

11/38

• Intrinsically Motivated RL (Singh et al. 2009, 2010; Sorg et al. 2010)
  • Agent learns with intrinsic rewards
  • Fitness: measures performance
  • Mitigates computational limitations of learning agents

[Figure: agent diagram. Labels: state, extrinsic reward; interaction information; processing mechanisms; intrinsic motivation; intrinsic reward; decision making]
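The split between the signal the agent learns from and the fitness used to judge it can be sketched as follows. This is only an illustration under assumptions: the count-based novelty bonus, the class name `NoveltyBonusReward`, and the weight `bonus_weight` are hypothetical and are not the reward mechanism defined in the thesis.

```python
from collections import Counter


class NoveltyBonusReward:
    """Hypothetical intrinsic reward: extrinsic reward plus a count-based novelty bonus."""

    def __init__(self, bonus_weight=0.5):
        self.bonus_weight = bonus_weight
        self.visits = Counter()  # how many times each state has been observed

    def __call__(self, state, extrinsic_reward):
        self.visits[state] += 1
        novelty = 1.0 / self.visits[state]  # decays as the state becomes familiar
        return extrinsic_reward + self.bonus_weight * novelty


def fitness(extrinsic_rewards):
    """Fitness is measured on extrinsic reward only, whatever the agent learned from."""
    return sum(extrinsic_rewards)


reward_fn = NoveltyBonusReward(bonus_weight=0.5)
trajectory = [("s0", 0.0), ("s1", 0.0), ("s0", 0.0), ("goal", 1.0)]
learning_signal = [reward_fn(s, r) for s, r in trajectory]  # what the learner sees
episode_fitness = fitness(r for _, r in trajectory)         # what the designer measures
```

Here the learner receives a non-zero signal even in states with no extrinsic reward, while the fitness of the episode still counts only the extrinsic reward collected.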

12/38

• Parallel with biological organisms
  • Limited perception and resources
  • Dynamic, unpredictable and unreliable environment
• Natural motivational mechanisms
  • Shaped by evolution
  • Provide adaptive advantages
• Social mechanisms
  • Cooperation in inherently competitive environments

13/38

“We focus on the role of emotions and also on the way individuals interact and cooperate with each other as a social group to design more flexible and robust reward mechanisms that enhance the autonomy of RL agents in both single and multiagent settings.”

14/38

• Emotion-based Intrinsic Rewards
  • Role of emotions in decision-making
  • 4 emotion-based domain-independent reward features
• Emerging Emotions
  • Emergence of useful sources of information
  • Discuss relation with emotions
• Socially-Aware Learning Agents
  • Extend IMRL to multiagent scenarios
  • Socially-aware behaviors

15/38

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

16/38

• Studies show that emotions (Cardinal et al. 2002; Dawkins 2000; Phelps & LeDoux 2005):
  • Are a basic and ancient survival mechanism
  • Are a beneficial adaptive mechanism for decision-making
  • Elicit physiological signals
• Absence of emotions (Bechara et al. 2000; Damasio 1994; LeDoux 2000)
  • Impairs the ability to make advantageous decisions
• How to provide emotion-based motivation?

17/38

• Evaluation through appraisal (Frijda and Mesquita 1998; Lazarus 2001; Reisenzein 2009; Roseman 2001; Scherer 2001)

[Figure: appraisal process. Labels: goals, beliefs, norms; novelty, causality, focus of event, importance, …; appraisal mechanism; emotional state; re-appraisal; belief update; behavior]

18/38

• Major dimensions of appraisal (Ellsworth & Scherer 2003; Leventhal & Scherer 1987)
  • 4 domain-independent reward features
  • Evaluate agent’s history of interaction with environment

[Figure: agent diagram. Labels: interaction information; processing mechanisms; appraisal-based reward mechanism; emotion-based intrinsic reward; decision making]

19/38

• Foraging scenarios
  • Each presents a distinct challenge
  • Partially observable
  • Compare the performance of emotion-based vs. fitness-based agents
• Results
  • Emotional agents outperform standard agents
  • Careful consideration of emotional aspects
  • Learn the intended task
  • Overcome perceptual limitations

20/38

• 4 emotion-based reward features
  • Novelty, Valence, Goal relevance and Control
  • Domain-independent
• General-purpose guiding system for RL agents
  • Mitigation of perceptual limitations
• Departs from previous works within Affective Computing
  • Domain-independent and appraisal-based
  • Does not alter RL algorithm
  • Does not focus on a set of basic emotions
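As a sketch of how four such features could feed a single reward signal without altering the RL algorithm, one simple possibility is a weighted sum. This form is an assumption made here for illustration; the slide does not state how the features are combined. The feature functions $\phi_i$ and the weights $\theta_i$ are hypothetical symbols.

```latex
r(s, a) \;=\; \theta_{\mathrm{nov}}\,\phi_{\mathrm{nov}}(s, a)
        \;+\; \theta_{\mathrm{val}}\,\phi_{\mathrm{val}}(s, a)
        \;+\; \theta_{\mathrm{rel}}\,\phi_{\mathrm{rel}}(s, a)
        \;+\; \theta_{\mathrm{ctl}}\,\phi_{\mathrm{ctl}}(s, a)
```

Under this assumption, the learner simply receives $r(s, a)$ in place of the usual reward, which is consistent with the point that the RL algorithm itself is left unchanged.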

21/38

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

22/38

• Answer the question:
  • “Are emotions the best candidate to complement the agents’ information processing mechanism?”

[Figure: agent diagram. Labels: interaction information; processing mechanisms; intrinsic motivation; intrinsic reward; decision making]

23/38

• Reward optimization using Genetic Programming (Niekum et al. 2010)
  • Population of reward functions
  • Evolved and evaluated according to agent’s performance

[Figure: agent diagram. Labels: basic variables; processing mechanisms; GP mechanism (selection, mutation, crossover); evolved reward function; decision making]
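The loop of selection, mutation and crossover over a population of reward functions can be sketched in a few lines. Everything specific below is an assumption for illustration: the variable names in `VARS`, the operator set, and above all `toy_fitness`, which stands in for the agent's measured performance (in the setting described on the slide, fitness would come from running a learning agent with each candidate reward function).

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
VARS = ("r_ext", "visits", "value")  # hypothetical basic variables


def random_tree(rng, depth=2):
    """Grow a random expression: a variable, a constant, or (op, left, right)."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(VARS) if rng.random() < 0.7 else round(rng.uniform(-1, 1), 2)
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, env):
    """Compute the value of an expression tree for one assignment of the variables."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])


def mutate(tree, rng, depth=2):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.2:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, max(depth - 1, 0)), right)
    return (op, left, mutate(right, rng, max(depth - 1, 0)))


def crossover(a, b, rng):
    """Replace one subtree of `a` with a random subtree of `b`."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return rng.choice(list(subtrees(b)))
    op, left, right = a
    if rng.random() < 0.5:
        return (op, crossover(left, b, rng), right)
    return (op, left, crossover(right, b, rng))


def evolve(fitness, pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        children = [pop[0]]             # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=fitness)


SAMPLES = [{"r_ext": r, "visits": n, "value": v}
           for r in (0.0, 1.0) for n in (1.0, 2.0) for v in (0.0, 0.5)]


def toy_fitness(tree):
    """Stand-in for agent performance: closeness to r_ext + value on SAMPLES."""
    err = 0.0
    for env in SAMPLES:
        d = evaluate(tree, env) - (env["r_ext"] + env["value"])
        err += d * d
    return -err if math.isfinite(err) else float("-inf")


best = evolve(toy_fitness)
```

Because the best individual is carried over unchanged each generation, the fitness of the returned expression is never worse than that of the best member of the initial population.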

24/38

• Same foraging scenarios as before
  • Observe resulting evolved optimal rewards
  • Discover patterns in the reward functions’ expressions
• Results
  • Set of 5 informative signals
  • Fitness, relevance, advantage, prediction, frequency
  • Each signal can be used as an intrinsic reward feature

25/38

• Scenarios based on PacMan
  • Objective: validate emerged optimal sources of information
  • Different and much more complex scenarios
  • Limited perception
• Results
  • GP-based agent outperformed standard agents
  • Learn the intended task
  • Overcome perceptual limitations

26/38

• Analyze emerged information signals
  • Compare the “kinds” of evaluation
  • According to appraisal theories of emotion
• Results
  • Informative signals provide similar evaluation
  • Share structural and dynamical properties

27/38

• Information-processing reward mechanism
  • Emerged by means of genetic programming
  • Domain-independent
  • Mitigation of perceptual limitations
• Relation with natural agents and emotions
  • Dynamic and structural connections with appraisal dimensions
  • Adaptive mechanism that allows for higher fitness
  • Reinforce the role of emotions in agents’ adaptation

28/38

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

29/38

• Extend previous research into multiagent settings
  • Shared environment
  • Each agent has its own goals
  • May be conflicting
  • Achieve cooperation
• Inspiration from social mechanisms
  • Living in a group improves the chances of survival
  • Still there is competition for resources and power
  • Communicate intentions and evaluate each other’s actions

30/38

• Cooperation in competitive contexts (Axelrod 1984; de Waal 2008; Dörner 1999; Falk & Fischbacher 2006; Hamilton 1964; Trivers 1971)
• Need for affiliation
  • Altruistic behaviors despite momentary losses
• Legitimacy signals
  • Signal socially-aware behaviors
• Reciprocation mechanism
  • Evaluates “kindness” of others’ actions

31/38

• Limited resource sharing scenarios
  • Internal and external social rewards
  • Evaluate appropriateness of behaviors towards social group

[Figure: agent diagram. Labels: social aspects; processing mechanisms; social motivation mechanism; social intrinsic reward; decision making]

32/38

• Limited resource and mutual dependency scenarios
  • 2 or 3 agents interact in the same environment
  • Limited perception
  • Agents learn and act individually; fitness is measured for the social group
• Results
  • Socially motivated agents outperformed “greedy” group
  • Mostly homogeneous populations
  • Emergence of “socially-aware” behaviors

33/38

• Extend IMRL to multiagent scenarios
  • Social intrinsic motivation
  • Based on affiliation, social signaling and reciprocity
• Emergence of “socially aware” behaviors
  • Trade off immediate gains for future collaboration
  • Learned behaviors benefit whole group
• Accordance with how cooperation thrives in nature
  • Existence of signaling mechanism
  • Reciprocation opportunities
  • Future interactions

34/38

Problem

Approach

Emotion-based Intrinsic Rewards

Emerging Emotions

Socially-Aware Learning Agents

Conclusions

35/38

“We focus on the role of emotions and also on the way individuals interact and cooperate with each other as a social group to design more flexible and robust reward mechanisms that enhance the autonomy of RL agents in both single and multiagent settings.”

36/38

• RL and IMRL
  • Alleviate perceptual limitations and modeling effort
  • More autonomous, robust and flexible mechanisms
  • Novel approach for multiagent IMRL
• Affective Computing
  • Show the importance of emotion-based reward design
  • Independent of algorithm, not focused on set of basic emotions
• Parallel with natural organisms
  • Importance of emotion-related information

37/38

• Multiagent Systems
  • Relation with evolutionary game theory
  • Emergence of cooperation with relatedness and reciprocation
  • Signaling mechanism according to internal social standards
• Emergence of cooperation by means of
  • Social pressures
  • Pure altruism

38/38