phd presentation - ulisboaweb.ist.utl.pt/.../documents/phdpresentation4gaips.pdf ·...
TRANSCRIPT
PhD Presentation: Biologically-inspired Models for Learning Agents
web.ist.utl.pt/~pedro.sequeira/phd
Introduction
Motivation
Case Studies
Conclusions
Introduction
Prof. Francisco Melo as thesis co-supervisor
CAT in mid-July
Objectives
General problem
General solution
Focus on case studies / experiments
Main idea
Provide learning models to autonomous agents
Inspired by biological models
Motivation
Definitions [Franklin & Graesser, 1997; Maes, 1994]
situated in dynamic environments
have and actively pursue goals
satisfy their needs
respond to external events from the environment
MAS: live and interact with other agents
Requirements [Franklin & Graesser, 1997; Maes, 1994]
mechanisms to distinguish perceived features
focus on relevant features, ignore non-important ones
adapt to and learn new knowledge from the environment
take the right action at each decision time
structures that represent the acquired knowledge
update representations over time to reflect experience
Building Agents
Key is ADAPTATION
Provide prior knowledge
sufficient for the agent to perceive its environment
Use learning mechanisms
update the agent’s knowledge
Problems
Prior knowledge
lots of pre-programming of behaviors
large knowledge bases
Perceptual limitations
world dynamics, good states
Acting limitations
good actions
Learning
which paradigm / framework to use?
Parallel between natural and artificial agents
Inhabit highly dynamic environments
Have to make complex decisions under uncertainty
Limited perceptual and acting capabilities
Focus on important events
Live in organized societies
Inspiration from biological models
Evolutionary adaptive mechanisms
Simple but powerful survival tools
Improve performance with experience
Make the most of the perceived information
Lead to a greater fitness
Inspiration from several research areas
Psychology
Biology
Ethology
Neuroscience
…
Classical conditioning in RL
Improve learning speed
State-space reduction
Emotion-based Intrinsically Motivated RL
Single-agent event-processing mechanism
Use emotions as intrinsic rewards
Clues from agent-environment relationship
Improve agent fitness
Socially-aware IMRL
Multi-agent social processing mechanism
Use affiliation / cooperation
Improve population fitness
Case Studies
Inspired by animal learning
Teach an animal to respond in a certain way
Provide reward and punishment appropriately
Main Ideas [Sutton & Barto, 1998]
Learn from experience
Situations + Actions → Reward
Reward is external feedback signal
Objective: maximize the reward received over time
Task: discover which actions maximize reward in each state
Trial-and-error search
Mind subsequent (delayed) rewards
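The trial-and-error loop and the handling of delayed rewards can be illustrated with a minimal tabular sketch. It uses the standard Q-learning update as one possible instance of these ideas; the two-step episode, the action names and the values of `alpha` and `gamma` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


actions = ["left", "right"]          # hypothetical action set
q = defaultdict(float)               # all values start at 0.0

# Hypothetical episode: the reward only arrives on the last transition,
# so the value of the first step has to be learned from the delayed reward.
episode = [(0, "right", 0.0, 1), (1, "right", 1.0, 2)]
for _ in range(50):
    for s, a, r, s2 in episode:
        q_update(q, s, a, r, s2, actions)
```

After repeated experience, the value of the rewarded step propagates backwards, so the earlier step also acquires a positive (smaller, discounted) value.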
Main Idea
Inspiration from classical conditioning paradigm
Partition observations into stimuli
Propose a measure for distance between states
Learn the value of states based on proximate states
Propagated learning
Reduce state space
Reduce learning time
Classical Conditioning [Pavlov, 1927]
Contingency between stimuli in the environment
Independent of the animal's behavior
Animal does not learn behavior consequences
Advantages
Predict the outcomes of new events from already-known situations
Create new contexts for behavior activation
Diagram: US (food delivery) → UR (salivation)
Diagram: CS (bell), training…
Diagram: CS (bell) → CR (conditioned salivation)
Model: based on sensory pattern mining [Sequeira & Antunes, 2010]
Partition observations into stimuli
e.g. see bone, has ball, hear “Fetch!”
Build tree containing frequent patterns
Model: use the Jaccard index [Jaccard, 1912]
frequency of intersection between stimuli over frequency of union of the stimuli
Advantages
Sensitive to particular correlations between stimuli
Rapid access to frequent patterns
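The slide's own formula did not survive in this transcript. As a hedged illustration, the verbal definition (frequency of the intersection over frequency of the union) can be sketched by counting observations directly. The function name, the observation list and the plain counting approach are assumptions; the original model reads these frequencies from a pattern tree, which is not reproduced here.

```python
def jaccard_index(observations, stimulus_a, stimulus_b):
    """Jaccard index of two stimuli over a list of observations (sets of stimuli).

    Numerator: observations containing both stimuli (intersection).
    Denominator: observations containing at least one of them (union).
    """
    both = sum(1 for obs in observations if stimulus_a in obs and stimulus_b in obs)
    either = sum(1 for obs in observations if stimulus_a in obs or stimulus_b in obs)
    return both / either if either else 0.0


# Hypothetical observation history using the stimuli named in the slides.
observations = [
    {"see bone", "hear Fetch!"},
    {"see bone", "has ball"},
    {"see bone", "hear Fetch!", "has ball"},
    {"has ball"},
]

bone_fetch = jaccard_index(observations, "see bone", "hear Fetch!")
fetch_ball = jaccard_index(observations, "hear Fetch!", "has ball")
```

In this made-up history, "see bone" and "hear Fetch!" co-occur more consistently than "hear Fetch!" and "has ball", so the first pair gets the higher index.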
Learning Model: extend the Q-learning algorithm [Watkins, 1989]
Determine similar states using the pattern tree
State distance measure:
Propagated multi-state update of values
New state receives information from similar states
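The state distance measure and the exact update rule are missing from this transcript, so the following is only a sketch of the general idea of a propagated multi-state update. It assumes states are sets of stimuli, uses a plain Jaccard similarity in place of the pattern-tree lookup, and scales the update by that similarity; the `threshold` cut-off and all names are hypothetical.

```python
from collections import defaultdict


def similarity(state_a, state_b):
    """Jaccard similarity between two states, each a frozenset of stimuli."""
    union = state_a | state_b
    return len(state_a & state_b) / len(union) if union else 1.0


def propagated_update(q, known_states, state, action, td_error,
                      alpha=0.1, threshold=0.5):
    """Apply a TD error to every known state that is similar enough to `state`.

    `known_states` should include `state` itself, which has similarity 1.0
    and therefore receives the full update.
    """
    for other in known_states:
        weight = similarity(state, other)
        if weight >= threshold:  # hypothetical cut-off, not from the slides
            q[(other, action)] += alpha * weight * td_error


s1 = frozenset({"see bone", "has ball"})
s2 = frozenset({"see bone", "has ball", "hear Fetch!"})
s3 = frozenset({"hear whistle"})
q = defaultdict(float)
propagated_update(q, [s1, s2, s3], s1, "Eat", td_error=1.0)
```

Here the experienced state `s1` gets the full update, the overlapping state `s2` gets a scaled share, and the unrelated state `s3` is left untouched.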
Experiment
Inspired by animal training
stimuli: 3 visual, 2 tactile, 2 auditory
actions: Pick, Drop, Eat, Approach Trainer, Approach Ball
4 phases: acquisition, extinction, association, substitution
Objectives
form associations between co-occurring stimuli
evoke innate responses to new stimuli
discover new contexts for already-known responses
Main results
Faster initial learning
Secondary conditioning (e.g. “Fetch!” heard in more cells)
New contexts for actions (e.g. Eat when bone is present)
Main Ideas [Singh et al., 2010]
Reward behaviors rather than consequences
Agent receives augmented reward
extrinsic reward
“normal” reward in RL, related to the task (e.g. fulfillment of needs)
intrinsic reward
not directly related to the task (e.g. play or explore)
Objective: maximize total reward
Main Idea
Inspiration from emotional appraisal mechanisms
Mathematical adaptation of dimensions
Emotions as intrinsic rewards
Integrate with IMRL framework
Provide clues from agent-environment relationship
Enhance single agent fitness
Emotions [Dawkins, 2000; Cardinal et al., 2002]
Evolutionary adaptive mechanism
Combined with learning, signal advantageous and dangerous situations
help when seeking food and avoiding harm
Bias decision making [Naqvi et al., 2006]
maximizing reward and minimizing punishment
In humans [Phelps & LeDoux, 2005]
memory enhancement, sensory plasticity, attention facilitation, regulation of social behavior, regulation and inhibition of emotional responses
Appraisal theories of emotion [Ellsworth & Scherer, 2003; Leventhal & Scherer, 1987]
Emotions arise from evaluations
Characterize subject-environment relationship
Significance for the person’s well-being or goals
Appraisal dimensions
each dimension evaluates a specific aspect
Model of emotions in IMRL
Inspired by appraisal theories of emotions
Adopt four common appraisal dimensions
novelty, motivation, valence, control
each evaluates agent-environment relationship
numerical value represents dimension activation
Use dimension adaptations as reward features
each feature is component of intrinsic reward
Affective reward features
Adaptation from Major Dimensions of Appraisal [Ellsworth & Scherer, 2003]
intentionally did not adapt social dimensions
Problem: appraisal theories usually deal with high-level psychological processes
complex concepts (e.g. causal attribution, norms)
Solution: inspiration from the Multilevel Process Theory of Emotion [Leventhal & Scherer, 1987]
appraise events at different levels
emotions range from reflex-like responses to complex cognitive patterns
Evaluate aspects of the agent’s history of interaction
Affective reward features
Novelty: degree of familiarity of events
Valence: innate pleasure detector, learned preferences
Motivation: relevance of event for goals or needs
Control: degree of correctness of the world-model
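How the four features enter the reward is not spelled out in this transcript. A common choice in the IMRL setting, assumed here, is a linear combination of the reward features with a weight vector (the slides do mention optimizing a "feature weight vector"). The feature values, the weights and the inclusion of the extrinsic reward as a fifth weighted term are illustrative assumptions.

```python
# Feature names follow the slides; "extrinsic" is added as an assumed fifth term.
FEATURES = ("novelty", "motivation", "valence", "control", "extrinsic")


def total_reward(features, weights):
    """Weighted sum of reward features (assumed linear combination)."""
    return sum(weights[name] * features[name] for name in FEATURES)


# Hypothetical feature activations for one time step.
features = {"novelty": 0.5, "motivation": 0.0, "valence": 0.0,
            "control": 0.0, "extrinsic": 1.0}

# Hypothetical weight vectors: one mixing novelty in, one "only extrinsic".
mixed = {"novelty": 0.2, "motivation": 0.0, "valence": 0.0,
         "control": 0.0, "extrinsic": 0.8}
only_extrinsic = {"novelty": 0.0, "motivation": 0.0, "valence": 0.0,
                  "control": 0.0, "extrinsic": 1.0}

r_mixed = total_reward(features, mixed)
r_baseline = total_reward(features, only_extrinsic)
```

Under this assumption, the "only extrinsic" agent is simply the special case where every affective weight is zero.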
Experiments
Grid-world scenarios inspired by foraging environments
agent is a predator, tries to eat prey in the environment
observations: cell position, see prey
actions: N, S, E, W, Eat
Dyna-Q/prioritized sweeping algorithm [Moore & Atkeson, 1993]
Objectives
maximize the agent’s fitness (extrinsic reward)
optimize feature weight vector
Exploration scenario
One prey
Eat prey, rext=1
Non-Markovian
Results
optimal weight vector
optimal fitness: 1,902.2
"only extrinsic" fitness: 135.9
Persistence scenario
Two prey: rabbit and hare
Eat prey
rabbit: rext=0.1; hare: rext=1
Fence
crossing takes n North actions; the next time, n+1
Non-Markovian
Results
optimal weight vector
optimal fitness: 1,020.8
"only extrinsic" fitness: 25.4
Prey-season scenario
Two prey: rabbit and hare
Eat prey
rabbit: rext=0.1; hare: rext=1
Two seasons: rabbit and hare
10,000 steps
if 10 rabbits eaten, rext=-1
Non-Markovian
Results
optimal weight vector
optimal fitness: 5,203.5
"only extrinsic" fitness: 334.2
Different rewards scenario
Two prey: rabbit and hare
always available
Eat prey
rabbit: rext=0.1; hare: rext=1
Markovian
Results
optimal weight vector
optimal fitness: 87,925.7
"only extrinsic" fitness: 87,890.8
Conclusions
Intrinsic reward features based on emotional appraisal
Guide the agent during learning
Focus on specific aspects of the environment
Balance between different strategies
Bring attention to advantageous states
Ignore not so favorable states
Main Idea
Integrate with IMRL framework
Multi-agent scenarios
Inspiration from affiliation and altruism
Mathematical adaptation of social signals
Emergence of socially-aware behaviors
Raise the fitness of the population
Even the fitness of each agent
Affiliation [Dörner, 1999; Bach, 2009]
Urge to affiliate / interact with other agents
Send and receive legitimacy signals
reward socially-acceptable behaviors (l-signals)
punish unsuccessful interactions (anti l-signals)
internally reward or punish socially-aware behaviors
(internal l-signals)
Altruism [de Waal, 2008]
Intrinsic reward when the social group benefits
initial cost but subsequent compensation
Model for socially-motivated learning
Fitness measured at the population level:
Intrinsic reward: two social features
external signal: received from other agents, based on l-signal
internal signal: generated by the agent, based on internal l-signal
represent level of satisfaction of affiliation need
Total reward
Social features for limited resource scenarios
Extrinsic reward
rext = IsFull – 0.1 IsHungry
External reward feature
rsExt = LastToEat AND SeeFood AND SeeOther AND !Eat
Internal reward feature
rsInt = LastToEat AND SeeFood AND !Eat
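The three expressions above translate almost directly into code. The sketch below treats each observation as a boolean and returns the features as 0.0 or 1.0; the function and parameter names are hypothetical renderings of `rext`, `rsExt` and `rsInt`, and the numeric encoding of the boolean features is an assumption.

```python
def extrinsic_reward(is_full, is_hungry):
    """rext = IsFull - 0.1 * IsHungry, with booleans read as 0/1."""
    return float(is_full) - 0.1 * float(is_hungry)


def external_social_feature(last_to_eat, see_food, see_other, eat):
    """rsExt: the agent ate last, sees food and the other agent, and does not eat."""
    return float(last_to_eat and see_food and see_other and not eat)


def internal_social_feature(last_to_eat, see_food, eat):
    """rsInt: the agent ate last, sees food, and does not eat."""
    return float(last_to_eat and see_food and not eat)
```

The only difference between the two social features is `SeeOther`: the external signal needs the other agent to be present to send it, while the internal one is generated by the agent alone.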
Experiments
Grid-world scenarios inspired by foraging environments
two predator agents
observations: position, SeeFood, SeeOther, LastToEat, IsHungry
actions: N, S, E, W, Eat
rewards: reat = 1, rhungry = -0.1
agents become hungry after 30 timesteps
Objectives
maximize the population fitness (sum of extrinsic rewards)
optimize feature weight vector
Single-food scenario
One food resource
Agent that eats starts closer to food resource (bottom-right)
Results
optimal weight vector
optimal fitness: 3,249.2
"only extrinsic" fitness: -19,991.3
Equal-resource scenario
Two food resources
Agent that eats starts bottom-right
Possibility of both eating
Results
optimal weight vector
optimal fitness: 18,178.8
"only extrinsic" fitness: -2,296.9
Stronger-agent scenario
One food resource
Both start bottom-right
One agent is stronger
When both try to eat, only one succeeds
Results
optimal weight vector
optimal fitness: 2,656.1
"only extrinsic" fitness: -1,164.9
Conclusions
Biologically-inspired learning models
Provide built-in prior knowledge
Learning framework based on RL and IMRL
Rewards based on agent-environment relationship
Results
Speed up learning
State-space reduction
Intrinsic features provide clues on important aspects
Lead to different strategies
Not directly related to, but increase fitness
Lead to “socially-aware” behaviors
Improve classical conditioning model
Support more learning paradigms
Improve multi-agent model
Inspiration from cooperation
Evolutionary Game Theory
CAT…