Proceedings of the RAAD 2013

22nd International Workshop on Robotics in Alpe-Adria-Danube Region

September 11-13, 2013, Portoroz, Slovenia

Parental scaffolding as a bootstrapping mechanism for learning grasp affordances and imitation skills

Emre Ugur a, Yukie Nagai b, and Erhan Oztop c

a Intelligent and Interactive Systems, Institute of Computer Science,

University of Innsbruck, Innsbruck, Austria.

E-mail: [email protected]

b Graduate School of Engineering,

Osaka University, Osaka, Japan

E-mail: yukie at ams.eng.osaka-u.ac.jp

c Robotics Laboratory, Department of Computer Science,

Ozyegin University, Istanbul, Turkey

E-mail: [email protected]

Abstract. Parental scaffolding is an important mechanism utilized by infants during their development. Infants, for example, pay stronger attention to the features of objects highlighted by parents and learn how to manipulate an object while being supported by parents. Parents are known to make modifications in infant-directed actions, i.e. to use “motionese”. Motionese is characterized by a higher range and simplicity of motion, more pauses between motion segments, higher repetitiveness of demonstration, and more frequent social signals to the infant. In this paper, we extend our previously developed affordances framework to enable the robot to benefit from parental scaffolding and motionese. First, we present our results on how parental scaffolding can be used to guide the robot and modify the robot’s crude action execution to speed up the learning of complex actions such as grasping. For this purpose, we realize the interactive nature of a human caregiver-infant skill transfer scenario on the robot. During reach and grasp attempts, the movement of the robot hand is modified by the human caregiver’s physical interaction to enable successful grasping. Next, we discuss how parental scaffolding can be used to enable or speed up imitation learning. We describe how our robot, by using previously learned affordance prediction mechanisms, can go beyond simple goal-level imitation and become a better imitator using the infant-directed modifications of parents.

Keywords. Developmental robotics, affordance, imitation, parental scaffolding, motionese

1. Introduction

Scaffolding in developmental psychology refers to the

support from an (adult) caregiver in order to speed up

a child’s skill and knowledge acquisition (Berk and

Winsler (1995)). This support can take various forms,

including the attraction and maintenance of the child’s

attention on relevant items, the shaping of the envi-

ronment in order to ease the task (such as positioning

and orienting the child so as to limit its degree of

freedom), signalling the important features or subgoals

of the task, or providing feedback and reinforcement

(Wood et al. (1976)) (Fig. 1, left). When the task gets

out of hand, puzzling the child, the caregivers step in

as the “trouble-shooter” (Zukow-Goldring and Arbib

(2007)). They interfere at different steps of the task,

initially demonstrating the goal and drawing attention

to task-relevant features, then “embodying” the child

to co-achieve the goal. Throughout the process, they

let the child have proprioceptive, force, tactile, visual,

and auditory feedback, until the goal is achieved.

Moreover, it is not only the adults who are interested

in this kind of interaction. By acquiring the ability

for joint attention, the children become aware of the

caregiver as a “helper”, and begin “asking” for help

when faced with a difficult task by displaying significant

communicative gestures. From the age of 9 months,

when the joint attention mechanisms begin to emerge,

to 18 months, infants use more and more of these

communicative signals (Goubeta et al. (2006)). These

interactions have a purpose. Infants can exhibit certain

skills in game-contexts together with their mothers far

before they can perform them in isolated cognitive

tests (Hodapp et al. (1984)).

Fig. 1. Left: The caregivers step in when the task of the child gets out of hand. They teach infants different skills by

highlighting important features of the task or marking the boundaries of action units. Right: 7 DoF robot arm, 16 DoF

robot hand, table, and a sample object are shown. The range camera is placed at the top-right and is not visible. The

human teacher can change the default trajectory of the robot to enable grasping thanks to the force/torque sensor

placed between the robot arm and hand.

The idea of parental scaffolding seems especially inspiring from a robotics point of view. The idea

has been exploited in various studies with different

viewpoints, such as for better communication between

humans and robots (Breazeal (1999)), or as a ground-

ing principle for the lifelong development of robots “at

home” (Saunders et al. (2006)). It has been shown

that caregivers tend to modify their motions when

teaching a task to a child. Analogous to “motherese”,

Brand et al. (2002) call these motions of higher in-

teractiveness, enthusiasm, proximity, range of motion,

repetitiveness and simplicity, as “motionese”. In a

similar line, Nagai and Rohlfing (2009a) reveal a

significant amount of bottom-up saliency features in

infant-directed interaction versus adult-directed inter-

action. Motivated by these findings, Nagai et al.

(2008) develop a bottom-up architecture for robot-

infants, who, like human children, are equipped with

minimal a-priori information and therefore need to

depend mainly on bottom-up signals as much

as possible. Interestingly, such infant-robots are also

found to motivate humans to use motionese, as if the robots

were treated like human children. Due to the robots’

limited attention mechanisms, humans try to carefully

teach a task, for example by approaching the robots

and bringing the object close to their attention,

sometimes shaking it, amplifying their movements and

making pauses. Evidently, these are also widely used

tactics in parent-infant communication.

Another issue in robot imitation is the inability of

the robot to understand “what” to imitate. In particu-

lar, there are goal-oriented tasks, where any means to

achieve the goal are acceptable, versus means-oriented

tasks, where the motion itself is equally important.

This is also a problem faced by human infants. Here

parents again come to the rescue by signalling the impor-

tant features (Nagai and Rohlfing (2008)). In a goal-

oriented task, they emphasize initial and final states,

as well as important sub-goals, by taking long pauses.

Conversely, in a means-oriented task, they emphasize

the movement itself by adding additional movements

to the object, for instance by shaking it.

Scaffolding has also been used as a means of

“correcting” the robot’s experiences and letting it learn

the “right” way: Saunders et al. (2006) demonstrate

the scaffolding of the environment itself as one way

of reducing complexity. The robot is expected to

perform pre-taught behaviors, such as wall-following,

by matching its current sensory values to previously

memorized instances. The most “similar” previous

instance is selected, however, by giving more weight to

features with higher information gain for the specific

task. This is where the human teacher comes in, modifying

the environment during the learning phase by reducing

variations in irrelevant features. These features, hav-

ing constant values throughout, will not affect the robot’s

behavior later on. Argall et al. (2010) take a more

direct approach, by correcting the very movement of

the robot. The robot is allowed to learn a behavior

from demonstrations by a teacher. Once it derives a

policy, the human will help it correct its movement

by online tactile feedback from sensors attached to its

wrist.

Learning through self-exploration and imitation

are crucial mechanisms in developing sensorimotor

skills for developing robots. Our previous research

(Ugur et al. (2011b)) contributed to this research pro-


gram by showing that with self-exploration a robot can

shape its initially crude motor patterns into well con-

trolled, parameterized sensorimotor behaviors which

it can use to understand the world around it. In this

paper, we extend this framework by enabling the robot

to use parental scaffolding in two major robot learning

problems.

• Section 2 describes our first extension attempt

(Ugur et al. (2011a)) where a human caregiver

speeds up the robot’s affordance acquisition for the

grasp action through parental scaffolding. As

the behavior parameter space is very large in

grasping with a dexterous robot hand and many

different parts of complex objects provide gras-

pability, learning through self-exploration is a

slow and expensive process. A human caregiver

can step in for help by physically modifying

the robot’s built-in reach-grasp-lift behavior exe-

cution trajectory. While being guided by the

human, the robot first detects the ‘first-contact’

points its fingers make with the object, and

stores the collection of these points as graspable

regions if the object is lifted successfully. Later,

it builds up simple classifiers using these experi-

enced contact regions and uses these classifiers to

detect graspable regions on novel objects. At the

end, the robot hand was shown to lift an object

in different orientations by selecting one of the

experienced trajectories.

• In our previous work, we showed that after

learning affordances, the robot can make plans

to achieve desired goals and emulate end states

of demonstrated actions. In Section 3, we

extend this framework and discuss how our

robot, by using the learned behaviors and af-

fordance prediction mechanisms, can go beyond

simple goal-level imitation and become a better

imitator. For this, we develop mechanisms

to enable the robot to recognize and segment,

with the help of the demonstrator, an ongoing

action in terms of its affordance based goal

satisfaction. Once the subgoals are obtained, the

robot imitates the observed action by chaining

these sub-goals and satisfying them sequen-

tially. In this study, the demonstrator is expected

to modify his/her action executions for the robot

to better understand and imitate these actions,

similar to parents who make modifications in

infant-directed actions.

2. Scaffolding in learning affordances

In our parental scaffolding framework, the robot has

a default reach-grasp-lift action where the object is

detected by the robot’s perceptual system, a reach trajec-

tory is computed based on the robot’s arm kinematics and the

object center, and the robot fingers are closed when

they are near the object. The robot has no initial

knowledge about the graspability of objects. Differ-

ent objects can be grasped from different parts with

different hand orientations, so the reach-to-object-center

execution should be modified by the human teacher

during trajectory execution. Thus, the initial trajectory

is modified by incorporating the force applied to the

robot hand by the human during the course of the

action. The points on the object where fingers make

first contact are stored as potential grasp affording

parts. After hand closure is completed, the robot lifts

its hand and checks whether the object is lifted or not

by searching the table surface with its perceptual system

again and through force sensor measurements. This

online, human-modified reach-grasp-lift action is re-

peated many times with different object configurations

and graspable parts of the objects are discovered by

the robot. Below, the tools and methods to realize this

framework will be detailed.
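The loop just described can be summarized by the following sketch (Python). The perception and execution routines are passed in as callables because their details are platform specific and are covered below; names such as detect_object, execute_guided_trial, and object_lifted are hypothetical placeholders, not our actual implementation.

    # Sketch of the scaffolded grasp-learning loop described above (hypothetical names).
    def scaffolded_grasp_learning(detect_object, execute_guided_trial, object_lifted,
                                  n_trials=20):
        """Collect first-contact regions from caregiver-guided reach-grasp-lift trials."""
        experiences = []
        for _ in range(n_trials):
            obj_points = detect_object()                 # 3D point cloud of the object
            # The caregiver physically modifies the default reach-to-center trajectory;
            # the trial returns the first-contact points and the executed joint trajectory.
            contacts, joint_trajectory = execute_guided_trial(obj_points)
            if object_lifted():                          # checked via camera and force sensor
                experiences.append((obj_points, contacts, joint_trajectory))
        return experiences                               # later used to train the handle classifier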

2.1. Robot platform

An anthropomorphic robotic system equipped with a

range camera is used as the experimental platform.

This system uses a 7 DoF Motoman robot arm, which is

placed on a vertical bar, similar to a human arm, as shown

in Figure 1. A five fingered 16 DoF Gifu robot hand is

mounted on the arm to enable manipulation. The lengths of the arm

and hand are 123 cm and 23 cm, respectively. For environment perception,

an infrared time-of-flight range camera (SwissRanger

SR-4000), with a 176x144 pixel array, 0.23◦ angular

resolution and 1 cm distance accuracy is used.

We aim to guide the robot in a manner similar to a caregiver

guiding an infant’s movement. It is natural for a

human parent to hold the infant’s hand, and position

it in the space to help with a grasp. A similar intuitive

effect is obtained by attaching a force sensor right

below the hand, at the wrist position. The desired

effect is being able to hold the wrist of the robot and move the

7-DoF arm freely in the 6-DoF Cartesian space. This

is achieved by measuring the force applied to the robot

and converting it into joint displacements.
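A minimal sketch of one way to realize this conversion is given below, assuming an admittance-style scheme in which the measured wrist wrench is scaled into a small Cartesian step and mapped to joint displacements through a damped least-squares inverse of the arm Jacobian; the gain and damping values are purely illustrative.

    import numpy as np

    def force_to_joint_displacement(wrench, jacobian, gain=1e-3, damping=1e-2):
        """Map a measured 6D wrist wrench to a small joint displacement (illustrative gains).

        wrench   : (6,) force/torque reading at the wrist [N, Nm]
        jacobian : (6, 7) arm Jacobian at the current configuration
        """
        dx = gain * np.asarray(wrench)                 # desired Cartesian step, (6,)
        J = np.asarray(jacobian)
        # Damped least-squares inverse keeps the mapping well behaved near singularities.
        JJt = J @ J.T + damping * np.eye(6)
        dq = J.T @ np.linalg.solve(JJt, dx)            # joint-space step, (7,)
        return dq

    # In every control cycle the step is added to the default reach trajectory, e.g.
    # q_command = q_default + force_to_joint_displacement(wrench, J).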

2.2. Robot perception

The robot uses the range camera to detect the object on the

table. The detected object is represented as a 3D point

cloud and from this point cloud various features, such

as the local distance histograms detailed in the next

subsection, are computed. Furthermore, the distance

between any point on the object and hand fingers is

computed by comparing the 3D position of that point,

measured by the range camera, with the 3D position of the

finger computed by forward kinematics based on arm

and hand joint angles. As a result the robot can close

its hand when the fingers are near the object or

can detect the object points that are in contact with

fingers.
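The proximity and contact tests can be sketched as follows, assuming the object is given as an N x 3 point array from the range camera and the fingertip positions come from forward kinematics; the 3 cm and 5 mm thresholds are illustrative, not the values used on the robot.

    import numpy as np

    def finger_object_distances(object_points, fingertip_positions):
        """Pairwise distances between fingertips and object points, in meters."""
        P = np.asarray(object_points)           # (N, 3), from the range camera
        F = np.asarray(fingertip_positions)     # (5, 3), from arm+hand forward kinematics
        return np.linalg.norm(P[None, :, :] - F[:, None, :], axis=2)   # (5, N)

    def should_close_hand(object_points, fingertip_positions, close_dist=0.03):
        """Trigger hand closure when any fingertip is near the object (3 cm, illustrative)."""
        return finger_object_distances(object_points, fingertip_positions).min() < close_dist

    def contact_point_indices(object_points, fingertip_positions, contact_dist=0.005):
        """Indices of object points taken to be in contact with a finger (5 mm, illustrative)."""
        d = finger_object_distances(object_points, fingertip_positions)
        return np.where(d.min(axis=0) < contact_dist)[0]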


Fig. 2. Guided grasp experience. Rows from top to bottom

show detected objects in the scene before action

execution, how the robot hand was controlled, detected

‘first-contact’ points, and the scene after behavior

execution.

2.3. Distance histogram based classifier

We used a feature that captures the relative property

of the grasped part with respect to the totality of the

object. Inspired by shape and topological feature

detectors in the primate brain (Murata et al. (2000)), we

proposed a metric that captures the distribution of

three dimensional points (i.e. voxels) that make up a

given object. We propose that each voxel is identified

by the distribution of its distances from the neighbor-

ing voxels that make up the object. This distribution

changes smoothly as one moves along the

surface of the object, and is invariant to orientation

changes. Our idea was to develop a classifier based

on this metric, with the intuition that the handle voxels

would have similar distance distributions that are sig-

nificantly different from the body voxel distributions.

For the handle, voxels found by interacting with the

object are used to construct a distance distribution,

p_H. Likewise, the rest of the object points are used to

construct a distribution representing the non-handle

points, p_B. At a later time, when the robot faces a

novel object it computes a distance distribution for

each point on the object and compares it with p_H and

p_B, and decides which points can be used as handles.
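The classifier can be sketched as below, assuming each voxel is described by a normalized histogram of its distances to all other object voxels, and that a voxel is labeled handle-like when its histogram lies closer (in Euclidean distance) to p_H than to p_B; the bin settings and the histogram-distance measure are illustrative choices, not necessarily the ones used in our experiments.

    import numpy as np

    BINS = np.linspace(0.0, 0.3, 31)     # distance bins in meters (illustrative)

    def distance_histogram(voxels, index):
        """Normalized histogram of distances from voxel `index` to all other object voxels."""
        d = np.linalg.norm(voxels - voxels[index], axis=1)
        h, _ = np.histogram(d, bins=BINS)
        return h / max(h.sum(), 1)

    def class_distribution(voxels, indices):
        """Mean distance histogram over a set of voxels (e.g. the touched handle voxels)."""
        return np.mean([distance_histogram(voxels, i) for i in indices], axis=0)

    def classify_voxels(voxels, p_handle, p_body):
        """Mark a voxel as handle-like when its histogram is closer to p_H than to p_B."""
        labels = []
        for i in range(len(voxels)):
            h = distance_histogram(voxels, i)
            labels.append(np.linalg.norm(h - p_handle) < np.linalg.norm(h - p_body))
        return np.array(labels)

    # Training: p_H from the first-contact voxels, p_B from the remaining object voxels;
    # at test time classify_voxels() marks candidate grasp regions on a novel object.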

2.4. Experiments

Touch region detection results: The robot initially

has a rough grasping skill; it reaches for the center

of the object and encloses its fingers upon contact. A

human caregiver interferes with the execution of this

basic skill in an attempt to achieve successful grasping.

The human provides only partial guidance, making

this a true collaboration.

Fig. 3. Objects and their first contact voxels obtained during

training (see Figure 2) are shown.

Fig. 4. The mean histograms of graspable and remaining

voxel distances, respectively.

In this setup, learning by the

caregiver is also critical, as the ability of the robot

control system and the properties of the robot hand for

grasping must be learned by the caregiver. Once the

human-robot collaborative system manages grasping,

by using camera and proprioceptive information the

robot can discern parts of the object that it grasps. In

the current implementation, we show that this informa-

tion can be used to develop ’handle detectors’ so that

the robot can perform general handle-grasps without

human guidance. The successful grasping obtained

by human guidance is detected by the robot and the

positions of the fingers on the object are computed

as illustrated in Fig. 2. Repeated application of this

process allows robust discrimination of object points

that afford the current action (i.e. handle grasping).

Grasp region detection on novel objects: We tested

this classifier using grasp executions mediated by the

caregiver for a mug-type object that was placed at

five different orientations, as shown in Figure 3. The

final representative distributions for handle (p_H) and

body (p_B) obtained by combining these individual

histograms are shown in Figure 4.

We first tested whether this simple classifier can

identify the handle parts of the original object accu-

rately or not. In particular, during an interaction not all

parts of the handle are touched, so some handle voxels were initially

marked as belonging to the object body (Figure 3). With

this density-based classifier, it can be seen that most of

the handle voxels are indeed found as handle voxels

(Figure 5 (a)) with few false positives. The more

challenging task was to see whether this classifier

could detect handle-like parts from unseen objects.

For this, we used five different objects as seen in

Figure 5 (b). Although there were false matches, most

of the voxels identified as handles were indeed handles

or could be considered handles (Figure 5 (c)).

Fig. 5. (a) Grasp classification results for each voxel on the training objects; see Fig. 3 for the actual touch regions

of these objects. (c) Grasp classification results for the novel objects whose pictures are shown in (b).

Autonomous grasp executions: The focus in our parental scaffolding framework was to learn and infer the graspable

parts of the objects. Although autonomous execution is

not the main focus, the learned knowledge can also

be utilized to autonomously control the robot arm and

to lift the objects. For this purpose, we designed

a simple lookup-table based mechanism to select a

reach-grasp-lift execution trajectory to lift the object

that had been used in training.

During training, the robot’s guided lift experience

was stored as a list of tuples, each holding the set of object voxels, the set of touch

voxels, and the modified hand-arm joint angle trajectory. From

this experience, the position of the largest touch region

relative to the object center was computed. Then,

a lookup table was constructed with relative position

information in one column and hand-arm trajectory in

the other column. When a new object is perceived,

the robot first finds the grasp regions using the simple

classification method, then computes the position of

the largest grasp region relative to the object center.

This relative position is searched in the lookup table,

the closest experienced relative grasp region position

is found, and the corresponding hand-arm trajectory

is executed. Note that the use of this distance metric

for trajectory selection is limited to the object that

was experienced during training, i.e. this metric

cannot handle multiple objects with different shapes

and sizes.
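The lookup-table mechanism can be sketched as follows; each entry stores the position of the largest touch region relative to the object center together with the recorded hand-arm trajectory, and the nearest stored entry is selected by Euclidean distance (the class and method names are hypothetical).

    import numpy as np

    class TrajectoryLookup:
        """Selects a stored reach-grasp-lift trajectory from the relative grasp-region position."""

        def __init__(self):
            self.entries = []            # list of (relative position, joint trajectory)

        def add(self, touch_region_center, object_center, trajectory):
            rel = np.asarray(touch_region_center) - np.asarray(object_center)
            self.entries.append((rel, trajectory))

        def select(self, grasp_region_center, object_center):
            rel = np.asarray(grasp_region_center) - np.asarray(object_center)
            dists = [np.linalg.norm(rel - stored_rel) for stored_rel, _ in self.entries]
            return self.entries[int(np.argmin(dists))][1]   # closest experienced trajectory

    # During training add() is called once per guided lift; at execution time the classifier
    # of Section 2.3 provides the largest predicted grasp region and select() returns the
    # trajectory to replay.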

In the experiments, the object used in training was

placed in 5 different orientations. Each row in Fig-

ure 6 corresponds to a grasp execution for a different

orientation. The snapshots were taken for initial hand

posture, while hand was reaching the object, during

the first contact, during grasping and at the final stage

of lifting, respectively. The first four executions were

successful at the end since the object was placed in

similar orientations to the training instances. In the last

execution, the handle was behind the object, so the

robot selected an incorrect execution trajectory.

Fig. 6. Robot grasps objects using the trajectories learned

during scaffolding. Each row corresponds to a

different grasp execution.

3. Scaffolding in imitation learning

In this section, the next stage of developmental pro-

gression is discussed in the form of imitation learning

where the affordance prediction capability can be used

to emulate the end states but not to imitate the action

trajectory. Such tasks can be taught to a robot through

imitation, where the robot observes the demonstration,

extracts important steps from the movement trajectory,

encodes those steps as sub-goals, and finds the behaviors

to achieve these goals. Learning higher level skills

based on previously learned simpler ones is more

economical and usually easier for building a complex

sensorimotor system (Kawato and Samejima (2007)).

Therefore, here we propose to use learned affordances

(in the form of (object, behavior, effect) relations) and

affordance prediction capabilities as basic elements in

understanding and achieving sub-goals.

Extracting sub-goals or important features from a

demonstration is not straightforward, as the demonstrated

action trajectory may not correspond to any robot

behavior developed so far. For example, when the

robot is asked to imitate a demonstration shown in

Fig. 7(a), as the observed trajectory is not represented

in the robot’s sensorimotor space, executing the behavior

that seemingly achieves the goal would not satisfy the

imitation criteria (for example a right push of the ring

would tip over the cylinder). Young infants also have

similar difficulties in mapping observed actions within

Fig. 7. An example scenario where the demonstration of enclosing a cylinder with a ring is continuous in (a, without

motionese) and exaggerated in (b, with motionese). The goal configuration of the

ring is indicated by the red colored ring enclosing

the cylinder. (a) If the robot cannot capture the

important features of the demonstration, it may

attempt to bring the ring to the goal position by

simply pushing it to the right, where the ring will

push the cylinder away rather than enclosing it. On

the other hand, when important steps are highlighted,

for example by pauses as in (b), the robot can extract

sub-goals represented in its perceptual space and

find a behavior sequence from its behavior repertoire

to imitate the action correctly.

their own repertoire and in imitating these actions

successfully.

As explained in the Introduction Section, to over-

come this difficulty, parents are known to make mod-

ifications in infant-directed actions, i.e. use “mo-

tionese”. Fine-grained analysis using a computational

attention model reveals the role of motionese in action

learning (Nagai and Rohlfing (2009b)). Longer pauses

before and after the action demonstration underline the

initial and final states of the action (i.e. the goal of

the action) whereas shorter but more frequent pauses

between movements highlight the sub-goals of the

action.

Inspired by infant development, in this section

we also use ‘motionese’ to enable the robot to identify

the important steps and the boundaries in the otherwise

complex stream of motion involving multiple objects.

A human tutor can exaggerate the relevant features

in his demonstration as in Fig. 7(b) and enable the

robot to map the exaggerated sub-steps into its own

behavior repertoire and imitate the action sequence

successfully.

3.1. Affordances and Effect Prediction

In our previous work (Ugur et al. (2011b, 2012)), the

affordances were defined as (object, behavior, effect)

relations, and with this we have shown that affordance

relations can be learned through interaction without

any supervision. As learning the prediction ability

is not the focus of this paper, we skip the details and

briefly present how the prediction operator works. This

prediction operator can predict the continuous change

in features given an object feature vector, a behavior type,

and the behavior parameters:

$(f^{()}, b_j, \rho_f) \rightarrow f'^{[b_j]}_{\mathrm{effect}}$   (1)

where $f'^{[b_j]}_{\mathrm{effect}}$ denotes the effect predicted ($'$) to be

observed after the execution of behavior $b_j$.

3.2. State Transition

The state corresponds to the list of feature vectors

obtained from the objects in the environment:

$S_0 = [f^{()}_{o_0}, f^{()}_{o_1}, \ldots, f^{()}_{o_m}]$

where $()$ denotes the zero-length behavior sequence

executed on the objects, and $m$ is the maximum num-

ber of objects. If the actual number of objects is less

than m, the visibility features of non-existing objects

are set to 0.

State transition occurs when the robot executes one

of its behaviors on an object. Only one object is

assumed to be affected at a time during the execution

of a single behavior, i.e. only the features of the

corresponding object are changed during a state tran-

sition. Thus, the next state can be predicted for any

behavior using the prediction scheme given in Eq. (1)

as follows:

$S'_{t+1} = S_t + [\ldots, 0, f'^{[b_j]}_{o,\mathrm{effect}}, 0, \ldots]$   (2)

where behavior $b_j$ is executed on object $o$ and the features

of this object are changed by the summation operator.

Using an iterative search in behavior parameter

space, the robot can also find the best behavior and

its parameters that are predicted to generate a desired

(des) effect given any object:

$bb(f^{()}, f^{[\mathrm{des}]}_{\mathrm{effect}}) = \underset{b_j,\,\rho_f}{\arg\min}\; \big\| f^{[\mathrm{des}]}_{\mathrm{effect}} - f'^{[b_j]}_{\mathrm{effect}} \big\|$   (3)

where $bb$ denotes the “best behavior” operator.
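The operators above can be summarized in the following sketch, where the learned prediction operator of Eq. (1) is represented by a regressor passed in as predict_effect, and the search of Eq. (3) is written as a plain grid search over candidate parameters, an illustrative simplification of the iterative search mentioned above.

    import numpy as np

    def predict_next_state(state, obj_index, effect):
        """Eq. (2): add the predicted effect to the features of the affected object only."""
        next_state = [np.array(f, dtype=float) for f in state]   # per-object feature vectors
        next_state[obj_index] = next_state[obj_index] + np.asarray(effect)
        return next_state

    def best_behavior(predict_effect, obj_features, desired_effect, behaviors, candidate_params):
        """Eq. (3): behavior and parameters whose predicted effect is closest to the desired one."""
        best, best_err = None, np.inf
        for b in behaviors:
            for rho in candidate_params[b]:
                effect = predict_effect(obj_features, b, rho)     # Eq. (1), learned operator
                err = np.linalg.norm(np.asarray(desired_effect) - np.asarray(effect))
                if err < best_err:
                    best, best_err = (b, rho), err
        return best, best_err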

3.3. Goal-emulation and plan generation

In the previous section, we explained how the robot can

(1) predict the effect given an object-behavior pair and (2) find

the best behavior to achieve a desired effect.

Because prediction is based on vector

summation, the robot can estimate the total effect

that a sequence of behaviors will create by simply

summing up all effect vectors, and thus can use this

for multi-step prediction.

Goal-emulation is achieved by generating a plan,

i.e. finding the behavior sequence required to trans-

form the given state into the goal state. Forward

chaining can be used to search the state space and find

a sequence. Forward chaining uses a tree structure

with nodes holding the perceptual states and edges cor-

responding to (behavior-object) pairs. The execution

of each behavior on each different object can transfer

the state to a different state based on Eq. (2). Starting

from the initial state encoded in the root node, the next

states for different behavior-object pairs are predicted

for each state.

The goals are represented as desired world states:

$G = [f_{o_1}, f_{o_2}, \ldots]$
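A sketch of this forward-chaining search is given below as a breadth-first expansion over (behavior, object) pairs up to a fixed depth; the effect predictor and the goal test (e.g. a tolerance check against the desired world state G) are passed in, and behavior parameters are omitted for brevity.

    import numpy as np
    from collections import deque

    def forward_chain_plan(initial_state, goal_test, predict_effect, behaviors, max_depth=4):
        """Breadth-first search over predicted states; returns a list of (behavior, object) pairs."""
        queue = deque([(initial_state, [])])
        while queue:
            state, plan = queue.popleft()
            if goal_test(state):                 # e.g. within tolerance of the desired state G
                return plan
            if len(plan) >= max_depth:
                continue
            for obj_index, obj_features in enumerate(state):
                for b in behaviors:              # behavior parameters omitted for brevity
                    effect = predict_effect(obj_features, b)                 # Eq. (1)
                    next_state = [np.array(f, dtype=float) for f in state]
                    next_state[obj_index] = next_state[obj_index] + effect   # Eq. (2)
                    queue.append((next_state, plan + [(b, obj_index)]))
        return None                              # no plan found within max_depth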


3.4. Imitation through scaffolding

Imitation and goal emulation are achieved by finding

behavior sequences that will bring the initial state

($S_{\mathrm{init}}$) to the goal state ($S_{\mathrm{goal}}$) depending on or indepen-

dent of demonstration, respectively. For this purpose,

the robot should have the ability to predict the effects

of its behaviors on the objects, i.e. it should be able to

predict the next state ($S_{t+1}$) for any behavior executed in

a given state ($S_t$). In the rest of this section, we present

the structures and methods that enable imitation and

goal emulation.

The robot observes the demonstration and extracts

the initial and goal states, as well as the intermediate

states (encoded as sub-goals) by detecting pauses

which may be introduced by a motionese-engaged

tutor. If no pause can be detected, then a random

intermediate state is picked as the sub-goal

state.
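This segmentation can be sketched as follows, assuming the demonstration is available as a sequence of perceived feature vectors sampled at a fixed rate and that a pause is a stretch of frames in which the state barely changes; both thresholds are illustrative.

    import numpy as np

    def extract_subgoals(state_sequence, motion_threshold=0.01, min_pause_frames=10):
        """Indices of paused frames in a demonstration, used as sub-goal states.

        state_sequence   : (T, D) array of perceived feature vectors over the demonstration
        motion_threshold : largest frame-to-frame change still counted as a pause (illustrative)
        min_pause_frames : minimum pause length, in frames, to count as a sub-goal boundary
        """
        S = np.asarray(state_sequence, dtype=float)
        speed = np.linalg.norm(np.diff(S, axis=0), axis=1)    # per-frame change magnitude
        subgoals, pause_len = [], 0
        for t, v in enumerate(speed):
            pause_len = pause_len + 1 if v < motion_threshold else 0
            if pause_len == min_pause_frames:                 # a long enough pause was detected
                subgoals.append(t)                            # store the paused frame index
        return [0] + subgoals + [len(S) - 1]                  # initial, sub-goal, and final states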

The imitation module finds the behavior sequence that

brings the initial state to the goal state following the

detected state sequence. Finding the behavior that

transfers one observed state to the next observed state

corresponds to one-step goal-emulation, which the

robot can perform as described in Section 3.3. Thus,

imitating the behavior sequence practically corre-

sponds to applying goal-emulation for each successive

sub-goal extracted from the observed demonstration.
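Given the extracted sub-goal states and a one-step goal-emulation routine (for instance, a short-horizon call to the planner sketched in Section 3.3), imitation reduces to the following sketch; emulate_goal and execute are hypothetical placeholders for the planner and the robot’s execution loop.

    def imitate(subgoal_states, current_state, emulate_goal, execute):
        """Imitate a demonstration by emulating each extracted sub-goal in sequence.

        emulate_goal(state, goal) is assumed to return a short (behavior, object) sequence,
        e.g. from the forward-chaining planner; execute() runs one behavior on the robot
        and returns the newly perceived state.
        """
        state = current_state
        for goal in subgoal_states:
            plan = emulate_goal(state, goal)           # one-step or short-horizon goal emulation
            for behavior, obj_index in plan:
                state = execute(behavior, obj_index)
        return state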

Selecting behaviors based on the Imitation Module

results in following the trajectory of the demon-

strator, to the granularity determined by the pauses

inserted by the tutor. On the other hand, when the Goal

Emulation Module is selected, it

finds a behavior sequence that brings the current state

to the goal state using forward chaining independent

of the intermediate states. In the Introduction Section,

we discussed that understanding “what to imitate” is

a nontrivial problem for both infants and robots, and

an active line of research in robotics. An autonomous

mechanism that is guided by caregiver/demonstrator

signals should be formalized and implemented based

on insights obtained from developmental psychology.

4. Conclusion

In this paper, we discussed how we can extend our pre-

viously developed unsupervised affordance learning

framework so that the robot can benefit from parental

scaffolding and motionese in learning of complex

affordances and in replicating observed actions. For

this purpose, we realized the interactive nature of a

human caregiver-infant skill transfer scenario on the

robot. First, we utilized parental scaffolding to speed

up learning of the complex grasp action through the human

caregiver’s physical interaction. Next, we discussed

how parental scaffolding can be used in enabling

or speeding up imitation learning. We proposed an

extension to our framework where the robot, by using

previously learned affordance prediction mechanisms,

can go beyond simple goal-level imitation and become

a better imitator using motionese, i.e. the infant-directed

modifications made by parents.

In the future, the motionese-based imitation exten-

sion should be tested and verified on the real robot with

human caregivers who demonstrate particular tasks for

imitation. These human subjects should observe the

robot’s imitation attempts (which will probably fail

initially) but should not be informed about the working

mechanisms of the system. We expect to observe that

even naïve subjects will modify their movements so

as to make the robot understand their actions as the

motionese theory predicts, similar to human parents

who unconsciously modify their demonstrations for

the infants. Such modifications should be elicited by

the responses of an action learner, as in Nagai et al.

(2010).

5. Acknowledgments

This research was partially supported by Euro-

pean Community’s Seventh Framework Programme

FP7/2007-2013 (Specific Programme Cooperation,

Theme 3, Information and Communication Technolo-

gies) under grant agreement no. 270273, Xperience.

It was also partially funded by a contract in H23 with

the Ministry of Internal Affairs and Communications,

Japan, entitled ‘Novel and innovative R&D making

use of brain structures’.

6. References

Argall, B. D., Sauser, E. L., and Billard, A. G. (2010). Tactile

guidance for policy refinement and reuse. In Pro-

ceedings of the 9th IEEE International Conference

on Development and Learning, pages 7–12.

Berk, L. E. and Winsler, A. (1995). Scaffolding children’s

learning: Vygotsky and early childhood education.

National Association for the Education of Young Children.

Brand, R. J., Baldwin, D. A., and Ashburn, L. A. (2002).

Evidence for ‘motionese’: Modifications in moth-

ers’ infant-directed action. Developmental Science,

5(1):72–83.

Breazeal, C. (1999). Learning by Scaffolding. PhD thesis,

Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA.

Goubeta, N., Rochat, P., Maire-Leblond, C., and Poss, S.

(2006). Learning from others in 9–18-month-old

infants. Infant and Child Development, 15:161–177.

Hodapp, R. M., Goldfield, E. C., and Boyatzis, C. J.

(1984). The use and effectiveness of maternal scaf-

folding in mother-infant games. Child Development,

55(3):772–781.

Kawato, M. and Samejima, K. (2007). Efficient reinforce-

ment learning: computational theories, neuroscience

and robotics. Current Opinion in Neurobiology,

17:205–212.

Murata, A., Gallese, V., Luppino, G., Kaseda, M., and

Sakata, H. (2000). Selectivity for the shape, size,


and orientation of objects for grasping in neurons of

monkey parietal area AIP. Journal of Neurophysiol-

ogy, 83(5):2580–2601.

Nagai, Y., Muhl, C., and Rohlfing, K. J. (2008). Toward

designing a robot that learns actions from parental

demonstrations. In IEEE International Conference

on Robotics and Automation, pages 3545–3550.

Nagai, Y., Nakatani, A., and Asada, M. (2010). How a robot’s

attention shapes the way people teach. In Proc.

of the 10th International Conference on Epigenetic

Robotics, pages 81–88.

Nagai, Y. and Rohlfing, K. J. (2008). Parental action mod-

ification highlighting the goal versus the means. In

IEEE 7th International Conference on Development

and Learning.

Nagai, Y. and Rohlfing, K. J. (2009a). Computational anal-

ysis of motionese toward scaffolding robot action

learning. IEEE Transactions on Autonomous Mental

Development, 1(1):44–54.

Nagai, Y. and Rohlfing, K. J. (2009b). Computational Anal-

ysis of Motionese Toward Scaffolding Robot Action

Learning. IEEE Transactions on Autonomous Men-

tal Development, 1(1):44–54.

Saunders, J., Nehaniv, C. L., and Dautenhahn, K. (2006).

Teaching robots by moulding behavior and scaf-

folding the environment. In Proceedings of the

ACM SIGCHI/SIGART conference on Human-robot

interaction, pages 118–125.

Ugur, E., Celikkanat, H., Nagai, Y., and Oztop, E. (2011a).

Learning to grasp with parental scaffolding. In IEEE

Intl. Conf. on Humanoid Robotics, pages 480–486.

Ugur, E., Oztop, E., and Sahin, E. (2011b). Goal emulation

and planning in perceptual space using learned affor-

dances. Robotics and Autonomous Systems, 59(7–

8):580–595.

Ugur, E., Sahin, E., and Oztop, E. (2012). Self-discovery

of motor primitives and learning grasp affordances.

In IEEE/RSJ International Conference on Intelligent

Robots and Systems, pages 3260–3267.

Wood, D., Bruner, J. S., and Ross, G. (1976). The role

of tutoring in problem-solving. Journal of Child

Psychology and Psychiatry, 17:89–100.

Zukow-Goldring, P. and Arbib, M. A. (2007). Affor-

dances, effectivities, and assisted imitation: Care-

givers and the directing of attention. Neurocomput-

ing, 70:2181–2193.
