cognitive robotics, embodied cognition and human-robot ... · cognitive robotics, embodied...

Cognitive Robotics, Embodied Cognition and Human-Robot

Interaction

Greg Trafton, Ph.DNaval Research Laboratory

Wednesday, November 3, 2010

Report Documentation Page Form ApprovedOMB No. 0704-0188

Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering andmaintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information,including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, ArlingtonVA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if itdoes not display a currently valid OMB control number.

1. REPORT DATE SEP 2010

2. REPORT TYPE N/A

3. DATES COVERED -

4. TITLE AND SUBTITLE Cognitive Robotics, Embodied Cognition and Human-Robot Interaction

5a. CONTRACT NUMBER

5b. GRANT NUMBER

5c. PROGRAM ELEMENT NUMBER

6. AUTHOR(S) 5d. PROJECT NUMBER

5e. TASK NUMBER

5f. WORK UNIT NUMBER

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Naval Research Laboratory Washington, USA

8. PERFORMING ORGANIZATIONREPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S)

11. SPONSOR/MONITOR’S REPORT NUMBER(S)

12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release, distribution unlimited

13. SUPPLEMENTARY NOTES See also ADA560467. Indo-US Science and Technology Round Table Meeting (4th Annual) - Power Energyand Cognitive Science Held in Bangalore, India on September 21-23, 2010. U.S. Government or FederalPurpose Rights License

14. ABSTRACT The last decade has seen a strong push for more and better autonomous systems. In many cases,autonomous systems and people need to interact to accomplish joint goals. We have been working ondeveloping autonomous systems that interact with people in a variety of ways. Our approach has beenfrom a strong cognitive science viewpoint: we build models of how people think, perceive, and behave, andthen put those models on our robots. I will describe our research paradigm and show severaldemonstrations of our working robots (through video).

15. SUBJECT TERMS

16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT

SAR

18. NUMBEROF PAGES

36

19a. NAME OFRESPONSIBLE PERSON

a. REPORT unclassified

b. ABSTRACT unclassified

c. THIS PAGE unclassified

Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18

Quick Background

• As a cognitive scientist, I am interested in building computational cognitive models of people

• Our models are process descriptions of how people think, reason, perceive, etc. They express behavior through a series of human representations and strategies

• Building models of people facilitates human-robot interaction: People have expectations on how to interact with other people


Online vs. Offline Embodied Cognition

• A recent drumbeat in cognitive science is that cognition is for action (embodied cognition)

• We are building embodied models for cognitive robotics and human-robot interaction

• Online Cognition: Our perceptual and motoric processing are primarily for here and now interactions (Brooks)

• Offline Cognition: Even when decoupled from the environment, thinking is grounded in body-based mechanisms (sensing and motor control)


Cognitive Models

• We build models for online and offline cognition using a computational cognitive architecture (ACT-R; Anderson et al., 2004; Anderson, 2007)

• A cognitive architecture is a specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind (Anderson, 2007)

• Hybrid symbolic and sub-symbolic system

• Connects with psychological data (experiments) and neuroscience data (fMRI)

• ACT-R models are precise process descriptions of how people perform different tasks


ACT-R/E (Embodied)

Procedural ModuleM

atc

hin

g

Selection

Intentional

module

Goal

Buffer

Temporal

Module

Temporal

Buffer

Imaginal

Module

Declarative

Module

Retrieval

Buffer

Visual

Module

Visual

Buffer

Aural

Module

Aural

Buffer

Motor

Buffer

Motor

Module

Configural

Buffer

Configural

Module

Manipulative Buffer

Manipulative Module

Vocal

Buffer

Vocal

Module

Robot Sensors and Effectors

Environment

Imaginal

Buffer

Execu

tion


ACT-R can make predictions about brain regions (fMRI)


Embodied Cognitive Modeling

• We use an MDS robot (Trafton et al., 2010) [Mobile/Dextrous/Social] named Octavia

• 2 color cameras (eyes)

• ToF camera (swiss ranger in forehead)

• Segway base

• 52 DOFs


ACT-R/E• We’ve hooked up ACT-R/E to our robotic sensors

and effectors

• Visual input enters into ACT-R’s visicon and is then processed the same way ACT-R processes any other visual stimuli

• Spatial information is also extracted through our sensors

• ACT-R/E motor commands move the robot

• Eyes follow attention, then head, then body

• Auditory information enters into the audicon

• We have lots of good sensor / perception systems which I will not talk about here...


Gaze-Following

• As a concrete example of online embodied cognition, I will show a classic gaze-following experiment with young children from Corkum & Moore (1998)

• Debate in the literature about maturation (via stages) vs. learning

• Gaze following is important for joint attention, early social development, theory of mind, later interaction


Corkum & Moore, 1998Participants

• 3 ages:

• 6-7 months old

• 8-9 months old

• 10-11 months old


Corkum & Moore, 1998Procedure

• Child came in and sat on their parent’s lap

• Parent was blindfolded so they could not give the child cues

• Experimenter got child’s attention so the child would look at the experimenter

Child

Experimenter

Toy 1 Toy 2



• The experimenter looked at one of two toys

• The experimenter did not point or talk to the child while looking at the toy

Child

Experimenter

Toy 1 Toy 2



• If the child looked at the same toy the experimenter did, it was coded as a “joint gaze”

Child

Experimenter

Toy 1 Toy 2



• If the child looked at the toy the experimenter was not looking at, it was coded as a “non-target gaze”

• Navel gazing and non-looks were not coded

Child

Experimenter

Toy 1 Toy 2



• There were three phases to the experiment

• For all trials, experimenter looked at a toy

Phase # Trials Description

Baseline 4 (2 on each side) Toy did not light up

Shaping 4 (2 on each side)Regardless of what infant did, toy lit up and rotated when experimenter

gazed at it

Testing 20 (10 on each side)Toy lit up and rotated only if both experimenter and infant gazed at it


Corkum & Moore, 1998Scoring

• Accuracy was scored as

• Perfect gaze-following = 100%

• Random gaze-following = 50%

• Perfect anti-gaze-following = 0% (should not happen)

# Joint Gazes

Total # Gazes


Corkum & Moore, 1998Research Questions

• At what age does gaze-following naturally occur?

• Can children who did not show gaze-following at baseline learn to follow gaze?

• This “learning to follow gaze” hypothesis was quite novel when it came out


Corkum & Moore, 1998Results

• At Baseline:

• 6-7 m.o.: Chance


• 10-11 m.o.: Joint!

• After Training:


• 8-9 m.o.: Joint!

• 10-11 m.o.: Joint!

6−7 8−9 10−11 6−7 8−9 10−11

Gaze−FollowingFrom Corkum and Moore (1998)

Age (months)

Accu

racy

of f

ollo

win

g ga

ze (%

)

0

25

50

75

100

BaselineAfter training


Corkum & Moore

• Corkum & Moore showed that gaze-following can be increased during a lab setting but:

• They just showed that children could learn how to follow gaze, not the mechanism of learning (e.g., what type of learning)

• Were 10-11 m old children at the right “stage?” Or was it continuous learning?

• They did not show that the experimental trial learning was what happened during ‘normal’ development

• There are no process descriptions of this finding

• So we built an ACT-R/E model (details light here)


ACT-R/E Model of Gaze-Following

• When the model is young, it looks around the world for interesting objects

• When it habituates to an object, it looks for another interesting object

• When it looks at the same object a person does, it gets a small reward, which is propagated back in time

• In “real-life” the object of gaze is relevant to both parent and child (Deak et al., 2008) [reward is tied to relevance]

• As model ages, it gradually learns to select the right spatial information and follow gaze



• Age is modeled by providing the model with opportunities to follow gaze

• The older the model, the more gaze-following experience it has had

• After aging each model, we ran it through the exact same experimental procedure that Corkum & Moore (1998) used

• We collected accuracy at each age group for the experiment and compared it to the empirical data



• Dots are model fits

• R2 = .95

• RMSD = .3

• All model points are within 95% CI

• Process support for learning hypothesis, not stage maturation

Trafton, Harrison, Fransen, & Bugajska, 2009

6−7 8−9 10−11 6−7 8−9 10−11

Gaze−FollowingFrom Corkum and Moore (1998)

Age (months)

Accu

racy

of f

ollo

win

g ga

ze (%

)

0

25

50

75

100

●

●

●

●

●

●

BaselineAfter training


Embodied Cognitive Modeling

• We also ran our model up to 10-11 m on our MDS robot, Octavia

• Several reasons for emphasizing embodied cognition

• No cheating allowed: very explicit theory of spatial cognition, learning, development

• Must deal with the complexity of the environment (dynamic, 3D, etc): the complexity of the real world tests the model, theory, framework


Model of 10-11 month old


We can also use head position to infer what someone sees


ACT-R/E Model of Level 1 Visual PT

• Dots are model fits

• R2 = .99

• RMSD = 10.4

• All model points are within 95% CI

• Trafton, Harrison, Fransen, & Bugajska, 2009

Accuracy of Level 1 Perspective−TakingFrom Moll and Tomasello (2006)

Mea

n pr

opor

tion

of h

andi

ng ta

rget

0

20

40

60

80

100

18 months 24 months

● ●●

●

ControlExperimental


Beyond Gaze-Following

• Gaze following is a strong example of online embodied cognition: very much “here and now”

• What about another classic cognitive phenomena that uses offline embodied cognition: Theory of Mind?


Theory of Mind

• Recognition that others can have beliefs, desires and intentions different from your own

• Typically develops around age 4

• Ability to infer what the beliefs, desires and intentions of others

• Critical ability for interacting with others

• The Sally-Anne task is one of the typical tasks used to study Theory of Mind


Sally-Anne Task(Baron-Cohen et al., 1985)

A child watches while:1. Sally puts a marble in her box

2. Sally leaves the room

3. Anne moves the marble to Anne’s box

4. Sally returns to the room

5. The child is asked where Sally believes the marble is

!" #"!"

#"

$" %"

#"

&"

!" #"

'"


Theory of Mind

• Wellman (2001) performed a meta-analysis on the existing experiments with the Sally-Anne false-belief task (and there were very many)

• False-beliefs are beliefs that one may have but which are not actually true in the physical world

• They built a statistical model (not a process model)

• We built a process model of the Sally-Anne task and ran it many times at different age levels

• (Had a combination of maturation + learning)


Results

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

20 30 40 50 60 70 80 90 100

Pro

porti

on C

orre

ct

Age (Months)

662 Child Development

only what we termed primary conditions. These wereconditions in which (1) subjects were within 14months of each other in age, (2) less than 20% of theinitially tested subjects were dropped from the re-ported data analyses (due to inattention, experimen-tal error, or failing control tasks), and (3) more than80% of the subjects passed memory and/or realitycontrol questions (e.g., “Where did Maxi put thechocolate?” or “Where is the chocolate now?”). Ourreasoning was that age trends are best interpretable ifeach condition’s mean age represents a relatively nar-row band of ages; interpretation of answers to the tar-get false-belief question is unclear if a child cannot re-member key information, does not know where theobject really is, or cannot demonstrate the verbal facil-ity needed to answer parallel control questions. Inmost of the studies, few subjects were dropped, veryhigh proportions passed the control questions, andages spanned a year or less, so primary conditions in-cluded 479 (81%) of the total 591 conditions available.The primary conditions are enumerated in Table 1;they were compiled from 68 articles that contained128 separately reported studies. Of the 479 primaryconditions, 362 asked the child to judge someoneelse’s false belief; we began our analyses by concen-trating on these conditions. On average in the pri-mary conditions, 3% of children were dropped from acondition, children were 98% correct on control ques-tions, and ages ranged 10 months around their meanvalues.

In an initial analysis only age was considered as afactor. As shown in Figure 2, false-belief performancedramatically improves with age. Figure 2A showseach primary condition and the curve that best fits thedata. The curve plotted represents the probability ofbeing correct at any age. At 30 months, the youngestage at which data were obtained, children are morethan 80% incorrect. At 44 months, children are 50%correct, and after that, children become increasinglycorrect. Figure 2B shows the same data, but in thiscase the dependent variable, proportion correct, istransformed via a logit transformation. The formulafor the logit is:

,

where “ln” is the natural logarithm, and “

p

” is theproportion correct. With this transformation, 0 rep-resents random responding, or even odds of predict-ing the correct answer versus the incorrect answer.(When the odds are even, or 1, the log of 1 is 0, so thelogit is 0.) Use of this transformation has three majorbenefits. First, as is evident in Figure 2B, the curvilin-ear relation between age and proportion correct is

logit ! ln p1 p–------------! "

# $

straightened, yielding a linear relation that allowssystematic examination of the data via linear regres-sion; second, the restricted range inherent to propor-tion data is eliminated, for logits can range fromnegative infinity to positive infinity; and third, thetransformation yields a dependent variable and ameasure of effect size that is easily interpretable interms of odds and odds ratios (see, e.g., Hosmer &Lemeshow, 1989).

The top line of Table 2 summarizes the initial anal-ysis of age alone in relation to correct performance

Figure 2 Scatterplot of conditions with increasing age show-ing best-fit line. (A) raw scatterplot with log fit; (B) proportioncorrect versus age with linear fit. In (A), each condition is rep-resented by its mean proportion correct. In (B), those scores aretransformed as indicated in the text.

Welman, 2001 Our data

R2 = 0.55 R2 = 0.56

Hiatt & Trafton, 2010Wednesday, November 3, 2010

Theory of Mind

• Work so far has focused on development

• It is impossible to program every situation an autonomous agent will encounter, so we have focused on learning because people are the best, most robust learning systems we know of

• Besides building theoretical models of people (which advances cognitive science by providing process models of how people think, learn, embodied cognition, etc.), we can show the robustness of our models by using/demonstrating them in a different context


Embodied Cognition• We build computational cognitive models that takes

embodiment seriously

• Our models are process models of how people think

• Our models match human data at multiple levels

• We take our models and put them on robots so they are functional and have an embodied presence and interaction

• Our approach is radically different from typical robotics architectures (and from other EC researchers)

• Our last video shows interactive behavior with a focus on computational vision and perception...


Thanks!


•

• • I

cognitive robotics, embodied cognition and human-robot ... · cognitive robotics, embodied...

Documents