
Autonomous learning and reproduction of complex sequences: a multimodal architecture for bootstrapping imitation games

Pierre Andry, Philippe Gaussier
ETIS, Neurocybernetic team, CNRS 8051
6 avenue de Ponceau
95014 Cergy-Pontoise, France
andry,[email protected]

Jacqueline Nadel
Unit CNRS UMR 7593
47, Bd de l'Hôpital
Paris, France
[email protected]

Abstract

This paper introduces a control architecture for the learning of complex sequences of gestures, applied to autonomous robots. The architecture is designed to exploit the robot's internal sensory-motor dynamics generated by visual, proprioceptive, and predictive information, in order to provide intuitive behaviors for the purpose of natural interactions with humans.

1. Introduction

Discriminating and learning sequences is a crucial mechanism in our efforts to devise robots that can learn from imitation. Previous techniques have included symbolic approaches, and they frequently separated the demonstration and reproduction phases clearly (see, from (Kuniyoshi, 1994) to (Tani, 2002), and a complete review of sequence learning architectures in (Tijsseling and Berthouze, 2003)). Our architecture, taking inspiration from experiments with pre-verbal children, removes both of these assumptions. Indeed, addressing the problem of learning sequences of gestures with an autonomous and developing system raises important and very interesting remarks. First, we can argue that at a given stage of its "developmental course" our system has no access to any explicit or symbolic information. Nevertheless, if we look at pre-verbal children, or even young babies, learning sequences of actions seems to be possible on the basis of, for example, imitation games or simple sensory-motor interactions (see (Uzgiris, 1999, Nadel, 2000) for a selective review of infant imitation). Learning something new from a gestural interaction, such as a new arrangement of one's own motor primitives shown by the other, constitutes an interesting building block of "pre-verbal" communication.

In this sense, our problem raises the exciting question of how sense and high-level categorization can emerge in a basic sensory-motor system from gestural interactions and from the building of shared representations. For the autonomous system designer interested in modeling this course of development, it obviously means that he cannot constrain the environment or the demonstration phase in order to fit previously determined inputs or sensors. It also means that the robot is part of an interaction whose dynamics may be constituted of actions of the demonstrator and actions of the robot, and, unless the robot has autonomously learned how to take turns or switch roles, actions of both actors could happen at any time. Demonstration and/or reproduction phases can occur at the same time, with the consequence that the system has to extract by itself the relevant information from its perception. Consequently, the sequence learning and recognition tasks appear to be components of the same issue, and can hardly be stated as independent problems. This paper presents a neural architecture that fuses the demonstration and reproduction stages, which correspond to two different dynamics of the same perception-action architecture: learning corresponds to a situation where the different input information is not at equilibrium, and reproduction can occur when the system is able to predict learned sequences. Interestingly, we will link these results with what has been the drive of the development of our architecture from scratch, the homeostatic behavior, which has led the system to learn its first visuo-motor repertory and triggered imitative behaviors and first gesture imitations. Finally, we will discuss our perception/action system in the perspective of turn-taking: how different modalities can determine the behavior of the system.

2. Homeostasis and development

Let us suppose a simple perception-action architecture using vision as its main source of information, and a device (of a given complexity) to perform its actions. Such a system has to deal with at least two different kinds of information:

• Visual information (V), about the external environment. It can concern useful information, such as the movements of others, an object being grasped, or the movements of self. This information is often two-dimensional, related to the pixel space.

• Proprioceptive information (P), i.e. information about the movements of self that are being performed (joint angles φ1, φ2, ..., φn, joint velocities dφ/dt, etc.).

Figure 1: a) Simplified architecture for self visuo-motor coordination and immediate imitation (blocks: vision and movement detection, competition, neural field, read-out, derivative and speed command, with a visuo-motor map linking vision and proprioception). The visuo-motor map allows visual (V) and proprioceptive (P) information to be compared in the same space. Visual information triggers dynamical attractors on the neural field. b) Initially, the (V) and (P) values of the same end point of self do not match (random initialization). This error induces changes of the visuo-motor weights, and the step-by-step equilibration of (P) and (V) during random movements. Immediate imitation is obtained in the same way, if the vision of others is confused with the vision of self.

In previous works (Andry et al., 2004), we have implemented a model relying on dynamical neural fields (see S. Amari's equations (Amari, 1977)) that allows a control architecture to behave like a homeostatic machine: it tries to minimize the differences between P and V. We have shown that this homeostatic behavior was enough to drive, in order:

1. Visuo-motor learning of associations between P and V about the devices of self. For example, an eye-arm system during a babbling phase produces random arm movements in order to learn to equilibrate the input V and P information projected on an internal visuo-motor map. The result is a system that is able to transform motor information (φ1, φ2, ..., φn) into a simpler 2D space corresponding to the visual one. As a result, movements to reach visual goals can then be computed more easily in this visual space (see the sketch after this list).

2. Because the system's perception is ambiguous, V information about the movements of others is confused with the movements of self. We have shown that the homeostatic behavior combined with this perceptual ambiguity triggers, for free, an immediate and systematic imitation behavior.
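To make the homeostatic V/P equilibration concrete, here is a minimal sketch (our own illustration, not the authors' implementation): it assumes a 2 DOF device whose end point projects roughly linearly into a 2D visual space, a linear visuo-motor map W, and a simple delta rule driven by the V-P mismatch during random babbling.

```python
import numpy as np

# Minimal sketch of homeostatic visuo-motor babbling (illustrative assumptions:
# a hypothetical linear device-to-image mapping A, a linear map W, a delta rule).
rng = np.random.default_rng(0)

A = np.array([[40.0, 5.0], [3.0, 35.0]])   # assumed device-to-image mapping (unknown to the system)
W = rng.normal(0.0, 0.1, (2, 2))           # visuo-motor map weights, random initialization
eta = 0.05                                  # assumed learning rate

for step in range(10000):
    phi = rng.uniform(-1.0, 1.0, 2)         # random babbling posture (proprioception, rad)
    V = A @ phi + rng.normal(0.0, 0.5, 2)   # visual position of the end point of self
    P = W @ phi                             # proprioception projected onto the visual space
    error = V - P                           # homeostatic V/P mismatch
    W += eta * np.outer(error, phi)         # weight change driven by the mismatch

print("learned visuo-motor map:\n", np.round(W, 1))   # W approaches A, so P tracks V
```

Once W has converged, P and V agree for movements of self, and any residual mismatch signals something else moving in the visual field, which is exactly the ambiguity exploited in point 2.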

Now we propose that the same simple homeostasis principle can still be applied to this same system in order to switch from a phase of immediate, systematic imitation of simple gestures to the learning of more complex sequences of these gestures, so that they can be reproduced later. To do this, we introduce a new structure that tries to predict the next sensory-motor activity. This structure tries to keep an equilibrium between its predictions (Pred) and the sensory-motor activity of the system. Moreover, most sequences that are used for natural gestural interactions turn out to be complex, and a reliable system needs to use hidden states to code them when the same state is involved in different transitions (for example A-B-A-C-A-B, etc.; a toy illustration is given below). We describe in the following section the system that learns and predicts such complex sequences of actions.
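As a toy illustration of why hidden states are needed (our own example, not from the paper): a plain first-order transition table learned from A-B-A-C-A-B cannot decide what follows A, whereas splitting A into context-dependent hidden states makes every successor unique.

```python
from collections import defaultdict

# First-order transitions are ambiguous for the sequence A-B-A-C-A-B.
seq = list("ABACAB")
successors = defaultdict(set)
for a, b in zip(seq, seq[1:]):
    successors[a].add(b)
print(dict(successors))   # 'A' maps to {'B', 'C'}: its successor is ambiguous

# With hidden states (one copy per context), each state has exactly one successor.
hidden = ["A1", "B1", "A2", "C1", "A3", "B2"]
print({a: b for a, b in zip(hidden, hidden[1:])})   # every hidden state predicts uniquely
```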

3. Learning complex sequences

Figure 2: The architecture for on-line sequence learning. Dashed lines correspond to learned connections. (Blocks: V and P inputs, visuo-motor map, movement detection, competition, neural field, read-out, derivative and motor command, plus the Prototyping, Prediction and Time base groups producing the prediction signal Pred.) See text for details.

Our system does not directly take the visual input as the model of the sequence to learn. Instead, immediate imitation plays a crucial role in the learning mechanism. The movements of the robot are a filtered response to the visual inputs, and the proprioceptive information is therefore more continuous and less noisy, while still containing the dynamics of the demonstrated sequence. Hence, using the output of the visuo-motor map provides a simpler and more reliable signal of the demonstrated trajectory. An integration and derivation mechanism applied to this visuo-motor information allows changes in the trajectory to be detected. These detected points form a temporal succession of relevant positions (in the visuo-motor space). In real conditions, a sequence of gestures demonstrated by a human will not be exactly the same each time. In a gestural game, for example, repetitions of the same sequence turn out to differ across numerous demonstrations. In this case, the detection of "via points" in a simple sequence quickly becomes unstable from one iteration to the other. To build a stable sequence, we introduce a Prototyping group (see fig. 2) that learns the spatial characteristics (in the visual space) of the relevant points of the sequence and is robust to changes across iterations of the same sequence.
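The integration/derivation step is not detailed in the paper; the sketch below is one plausible reading (the angle threshold, the minimum step size and the function name detect_via_points are our assumptions): it keeps the points of the visuo-motor trajectory where the direction of movement changes noticeably.

```python
import numpy as np

def detect_via_points(traj, angle_thresh_deg=30.0, min_step=1e-3):
    """Sketch of via-point detection on a 2D visuo-motor trajectory.

    traj: (T, 2) array of successive positions in the visuo-motor space.
    A point is kept when the movement direction changes by more than
    angle_thresh_deg (threshold is an assumption, not from the paper).
    """
    via = [0]                                   # always keep the starting point
    prev_dir = None
    for t in range(1, len(traj)):
        step = traj[t] - traj[t - 1]
        if np.linalg.norm(step) < min_step:     # ignore near-zero movement
            continue
        direction = step / np.linalg.norm(step)
        if prev_dir is not None:
            cos_angle = np.clip(direction @ prev_dir, -1.0, 1.0)
            if np.degrees(np.arccos(cos_angle)) > angle_thresh_deg:
                via.append(t - 1)               # direction changed: keep the corner
        prev_dir = direction
    via.append(len(traj) - 1)                   # always keep the end point
    return np.array(via)

# Example: a square-like gesture yields its corners as via points.
square = np.concatenate([
    np.linspace([0, 0], [1, 0], 20),
    np.linspace([1, 0], [1, 1], 20),
    np.linspace([1, 1], [0, 1], 20),
    np.linspace([0, 1], [0, 0], 20),
])
print(detect_via_points(square))
```

On repeated, slightly different demonstrations these raw via points jitter, which is precisely the instability the Prototyping group is introduced to absorb.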

The neurons are activated as follows (Hersch, 2004), eq. 1:

P(S_j^t \mid x) = \frac{P(x \mid S_j^t) \cdot \sum_i P(S_j^t \mid S_i^{t-1}) \, P(S_i^{t-1})}{\sum_j P(x \mid S_j^t) \cdot \sum_i P(S_j^t \mid S_i^{t-1}) \, P(S_i^{t-1})} \qquad (1)

Moreover, this law combines properties of HMM and Self-Organizing Map algorithms, where hidden states can be used to code complex sequences. The neurons S_j of this group, with activity P(S_j^t | x) at time t, have one-to-all connections W_ij whose weights are modified in order to learn the transitions between states:

\delta w_{ij} = \varepsilon_w \, P(S_j^t \mid x) \cdot P(S_i^{t-1} \mid x) \, (1 - w_{ij}) \qquad (2)

The main advantage is that the computation can be done on line, and the weight update is made according to the previously activated state (P(S_i^{t-1} | x), a kind of contextual activity at t-1). The neurons of the Prediction group learn to predict the timing of the transitions between activated neurons of the Prototyping group. The activation equations of the Prediction and Time Base neurons can be found in (Andry et al., 2001). The rule learning the timing k of the transition between neurons i and j of the Prototyping group, using the time trace of the Time Base (TB) group, is the following:

\Delta w^k_{ij} = \varepsilon \cdot TB_{ik} \cdot (\mathrm{Prototyping}_j - \mathrm{Pred}_{ij}) \qquad (3)

where ε is the learning rate. The sequence learning mechanism generates step-by-step predictions of the sensory-motor information used to recall the sequence, in order to play back a learned sequence.
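As a rough, non-authoritative reading of equations (1) and (2), the sketch below keeps a small group of spatial prototypes, combines an (assumed) gaussian emission term with the transitions from the previously active states, and reinforces the transition weights between successive winners; the state count, emission model and rates are our assumptions, and the timing rule (3) is left out.

```python
import numpy as np

class PrototypingGroup:
    """Sketch of eqs. (1) and (2): probabilistic state activities plus
    learned transition weights. Emission model and sizes are assumptions."""

    def __init__(self, n_states=8, eps_w=0.2, sigma=0.1, seed=1):
        rng = np.random.default_rng(seed)
        self.centers = rng.uniform(0.0, 1.0, (n_states, 2))         # spatial prototypes
        self.trans = np.full((n_states, n_states), 1.0 / n_states)  # P(S_j^t | S_i^{t-1})
        self.act = np.full(n_states, 1.0 / n_states)                 # P(S_i^{t-1} | x)
        self.eps_w, self.sigma = eps_w, sigma

    def _emission(self, x):
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2)) + 1e-12        # P(x | S_j), assumed gaussian

    def step(self, x):
        prior = self.trans.T @ self.act                  # sum_i P(S_j | S_i) P(S_i)
        act = self._emission(x) * prior
        act /= act.sum()                                 # normalisation: denominator of eq. (1)
        self.trans += self.eps_w * np.outer(self.act, act) * (1.0 - self.trans)  # eq. (2)
        self.trans /= self.trans.sum(axis=1, keepdims=True)   # keep rows stochastic (assumption)
        self.act = act
        return act

group = PrototypingGroup()
# Feeding the via points of a repeated demonstration strengthens its transitions.
for _ in range(10):
    for x in [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)]:
        group.step(np.array(x))
```

Prediction neurons (eq. 3) would then learn, for each of these transitions, when the next state is expected, which is what allows the sequence to be replayed with its original timing.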

4. Experiments and results

Experiments were made with a 2 degrees-of-freedom eye-neck device. Initially, the architecture was planned to be tested with the 5 DOF robotic arm used in previous works, but a mechanical fault has delayed the tests (a new robotic arm is under construction in order to validate the present work). The main point is that the architecture remains unchanged for the 2 DOF device, and that the transformation of complex device information into the 2-dimensional visuo-motor space was demonstrated in (Andry et al., 2004). During the experiment, the experimenter repeatedly demonstrates the same sequence in front of the robot, adapting the speed of their movements to the robot's dynamics (but the demonstration remains very close to natural movements). The robot learns the sequence online. First, the progressive learning of the sequence induces a low prediction activity. This activity is not strong enough to trigger the activity of the neural field. As a result, the robot's behavior is only triggered by the V information, which "drives" the motor command: while the system is not able to produce strong predictions, it keeps an imitative behavior. After several demonstrations, the Prediction information reaches a level strong enough to activate the neural field: Pred "drives" the system's movements and allows the reproduction of the learned sequence; the system switches from imitation to the demonstration of the learned sequence. Of course, proper reproduction is only possible if the experimenter stops the demonstration when the system starts to predict, otherwise the V and Pred information will be in competition to drive the movements (imitation or reproduction?). Figures 3 and 4 show the internal activities of the main groups of neurons of the architecture during learning and reproduction.

Figure 3: Learning and reproducing a circle. Up: representation in the visual space. Left: input activity (competition of the movement detection). Middle: visuo-motor map activity (filtered response) during immediate imitation. Right: visuo-motor map activity during the reproduction of the sequence; this figure also shows in black the neurons of the prototyping group that have learned. Bottom: activity showing the transitions learned by the Prediction group during a reproduction of the sequence (panels: predictions from neurones 7 to 14). The step-by-step recall of the transitions triggers the neural field activity and the motor command to reproduce the sequence.

Figure 4: A more complex trajectory with ambiguous states: "start-up-right-left-down-stop". The sequence was repeated about 20 times. Result of the prototyping group, with arrows showing the transitions between the learned points (x position 40-120, y position 60-140 in the visual space). The "back and forth" characteristic of the sequence has been properly learned.
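The switch from imitation to reproduction described in this experiment can be summarised by a small decision sketch (our reading; the threshold value and the function name are assumptions): the neural field is driven by whichever of V or Pred is available, and their simultaneous presence is the conflict mentioned in the text.

```python
# Minimal sketch of the imitation/reproduction switch (our reading of the
# text, with an assumed activation threshold): whichever signal is able to
# activate the neural field drives the motor command.
PRED_THRESHOLD = 0.6   # assumed level needed for Pred to activate the neural field

def drive_motor(v_activity: float, pred_activity: float) -> str:
    """Return which information drives the movement at this time step."""
    if pred_activity > PRED_THRESHOLD and v_activity == 0.0:
        return "reproduction"    # Pred drives the playback of the learned sequence
    if pred_activity > PRED_THRESHOLD and v_activity > 0.0:
        return "conflict"        # V and Pred compete: unstable behavior
    if v_activity > 0.0:
        return "imitation"       # V drives the motor command (immediate imitation)
    return "rest"

print(drive_motor(v_activity=0.8, pred_activity=0.2))   # early learning -> imitation
print(drive_motor(v_activity=0.0, pred_activity=0.9))   # demonstrator stops -> reproduction
print(drive_motor(v_activity=0.8, pred_activity=0.9))   # both present -> conflict
```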

5. Discussion

According to the purpose of this work, the main question we can now ask is: "is the robot's behavior autonomous?". The tested scenario was the following: the robot imitates the experimenter's gestures, with the consequence that the V information is filtered by the movements of self into a more continuous visuo-motor information. The sequence of visuo-motor activities is then learned by changing the weights of the Prediction (time) and the Prototyping (space) groups. Moreover, taking hidden states into account (the algorithm of the prototyping group) allows the experimenter to repeat the same sequence many times during the interaction with the robot, until the prediction information arises. But at this point, the interaction stops being intuitive and autonomous, in the sense that a complete reproduction of the learned sequence implies that the experimenter controls the system. First, when the prediction arises, the robot's reproduction can only be done if the experimenter stops moving, otherwise the V and prediction information will compete, inducing an unstable behavior: at any time, the vision of the other moving can change the system's dynamics and confuse the reproduction of the sequence. Second, to stop moving, the experimenter has to monitor the system's predictions in order to see when the prediction information will overshoot a given threshold and activate the neural field activity. To sum up, such a system is not able to know how to switch back from reproduction to learning. If our system is able to build two different sensory-motor dynamics, immediate imitation on the one hand and learning vs. reproduction of a learned sequence on the other, it is not able to articulate these two dynamics. In future works, we plan to adapt ongoing research (Andry et al., 2001) about the dynamics of turn-taking in simple perception-action systems, to see if minimal changes can be applied to the architecture in order to combine the learning and reproduction dynamics into stable interactions, in the manner of (Ito and Tani, 2004) with the learning of sequences of increasing complexity.

Acknowledgements

The authors especially want to acknowledge the work of Micha Hersch and Benoit Lombardot, and the Humaine European network of excellence for funding.

References

Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27:77–87.

Andry, P., Gaussier, P., Nadel, J., and Hirsbrunner, B. (2004). Learning invariant sensory-motor behaviors: A developmental approach of imitation mechanisms. Adaptive Behavior, 12(2).

Andry, P., Gaussier, P., Moga, S., Banquet, J., and Nadel, J. (2001). The dynamics of imitation processes: from temporal sequence learning to implicit reward communication. IEEE Trans. on Systems, Man, and Cybernetics, Part A: Systems and Humans, 31(5):431–442.

Hersch, M. (2004). Apprentissage de séquence d'actions via des jeux d'imitations (learning of action sequences via imitation games). Master's thesis, Master of Cognitive Science, Paris.

Ito, M. and Tani, J. (2004). On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system. Adaptive Behavior, 12(2).

Kuniyoshi, Y. (1994). The science of imitation - towards physically and socially grounded intelligence. Special Issue TR-94001, Real World Computing Project Joint Symposium, Tsukuba-shi, Ibaraki-ken.

Nadel, J. (2000). The functional use of imitation in pre-verbal infants and nonverbal children with autism. In Meltzoff, A. and Prinz, W., (Eds.), The Imitative Mind: Development, Evolution and Brain Bases.

Tani, J. (2002). Articulation of sensory-motor experiences by "forwarding forward model": from robot experiments to phenomenology. In Hallam, B., Floreano, D., Hallam, J., Hayes, G., and Meyer, J., (Eds.), From Animals to Animats: Simulation of Adaptive Behavior SAB'02, pages 171–180. MIT Press.

Tijsseling, A. and Berthouze, L. (2003, submitted). A neural network architecture for learning temporal information. Neural Networks.

Uzgiris, I. C. (1999). Imitation as activity: its developmental aspects. In Nadel, J. and Butterworth, G., (Eds.), Imitation in Infancy, pages 187–206.
