

IEEE Intelligent Systems, 1094-7167/01/$10.00 © 2001 IEEE

Semisentient Robots

Language Games for Autonomous Robots
Luc Steels, Sony Computer Science Laboratory, Paris

Pet robots are currently entering the consumer market, and humanoid robots will soon follow. The ultimate success of such products will depend on our ability to resolve a key question: how can we design flexible, grounded dialogue systems for autonomous robots that permit open-ended dialogue with unprepared owners? This task is extraordinarily difficult and poses challenges for both integration and grounding.

Here, I propose a unifying idea that meets both challenges: language games. A language game is a sequence of verbal interactions between two agents situated in a specific environment. Language games both integrate the various activities required for dialogue and ground unknown words or phrases in a specific context, which helps constrain possible meanings.

Over the past five years, I have been working with a team to develop and test language games on progressively more sophisticated systems, from relatively simple camera-based systems to humanoid robots. The results of our work show that language games are a useful way to both understand and design human–robot interaction.

Key challenges: Integration and grounding

Human–robot dialogue requires solving many fundamental AI problems, ranging from vision and speech to action planning, plan execution, and learning. AI researchers have made important progress in most of these areas over the last 20 years, aided by major advances in sensors, actuators, and computer hardware and software. As a result, we have components today that would have been hard to imagine 15 years ago.

For example, in the mid ’80s, it took Shakey1 seconds to laboriously extract object segments despite a carefully constructed scene, whereas the SRI Small Vision System currently performs real-time 3D template-based object matching with stereo in an unknown real environment.2 But, whatever the performance of individual AI components, we can only obtain a complete intelligent system through integration. By overcoming individual component weaknesses using output from other components, we can create a total effect that is more than the sum of its parts.

Integration must take place at two levels. First, from a computational point of view, we must combine disparate computational processes (possibly running in parallel on distributed hardware) into a coherent global system. To do this, we need a real-time distributed operating system and a secondary layer composed of standard components specialized for robot control.

From an AI viewpoint, we must combine different tasks and methods, often using fundamentally different representations and approaches. For example, object recognition might involve techniques from instance-based learning, neural network style statistical pattern recognition, and the matching of structured symbolic representations. Dialogue requires a combination of syntactic and semantic processing, coupled to components for speech and image processing.

Language games integrate all the different activities required for effective dialogue: vision, gesturing, pattern recognition, speech analysis and synthesis, conceptualization, verbalization, interpretation, and follow-up action. The agent’s scripts for playing a game not only invoke the necessary components to achieve success in the game, but also trigger learning algorithms that can help the robot fill in missing concepts or learn the meaning of new natural-language phrases.

Integration and grounding are key AI challenges for human–robot dialogue. The author and his team are tackling these issues using language games and have experimented with them on progressively more complex platforms.

In addition to integration, human–robot dialogue requires grounding. Grounding relates language processing’s symbolic representations to sensory-motor processing. If I tell a robot to “give me the red ball,” I expect it can both detect the red ball in the environment and execute the action required to hand it to me. Although impressive natural-language interfaces and software agent technologies exist,3

they do not address the grounding issue. The grounding problem goes deeper than attaching labels to structures derived from signal processing and pattern-recognition algorithms. For example, color categories such as red or brown do not simply equate with wavelength measures. A fire truck, a bottle of wine, and a tomato are all “red” even though they show very different physical reflectance characteristics.

Expectations, context, and even language all influence how we categorize physical reality. Given this, we must establish a strong interdependence between the conceptual and sensory-motor layers. Pattern recognition needs guidance from the conceptual layer to avoid combinatorial explosions and the potential confusion resulting from input signals’ inherent lack of detail or noise. Also, we must relate primitive concepts at the conceptual layer to sensory layer output, which often means we need techniques other than those currently employed by many purely symbolic AI programs.

Language games contribute to solving the grounding problem because they create a strong context that constrains the possible meanings of words, thus making it easier for the robot to guess unknown meanings. As an example, consider a human playing a game of showing objects to the robot, asking their name and, if the robot does not know the name, teaching it the new name. If the speaker holds up an object and says “Look, wrob,” the robot can figure out that “wrob” is the name of an object, as well as being the name of the specific object that the speaker is holding. The robot can thus infer a lot from the game context and the situation that will help it learn and help it solve ambiguity and uncertainty.

Solution: Language games

A language game is both an integrating glue and a vehicle for supporting the grounding of symbolic representations. The interaction between agents involves both language aspects (parsing and producing utterances) and grounding aspects. The latter aspects include sensory processing, executing gestures or actions, and, most importantly, steps for learning new language (new words and phrases, and new meanings or pronunciations for existing words). Complex dialogue involves multiple language games interlaced with each other.

Ludwig Wittgenstein promoted the language game notion to emphasize that language and meaning are not based on context-independent abstractions, but rather arise as part of specific interactive situations. That is, the meaning of a word or phrase comes from its role in a game; for example, the meaning of “queen” in chess. This explains why a word’s meaning cannot be easily defined in absolute terms but rather arises from the situation and context. It also explains why humans can easily disambiguate words or phrases. We typically interpret words in a context that strongly restricts what is being talked or written about. Our work recreates this experience for robots through situated language games.

Language games must not only be grounded and situated, but also adaptive. That is, dialogue participants must adapt their language to negotiate communications in the game. Recent research on human dialogue shows that humans invent new language while they are solving cooperative tasks.4

Language is not fixed and preprogrammable, but rather highly adaptive and situated. To achieve open-ended and grounded communication with robots, we must account for this. Concept acquisition, language learning, and language negotiation must be an integral part of the dialogue.

The guessing game

The guessing game is an example of a language game.5 The “Guessing Game: A Simple Example” sidebar sketches a simplified version, and Figure 1 shows the processes it involves.

In the guessing game, the speaker tries to draw the listener’s attention to an object in the environment. For example, Mary sits at the table and asks her neighbor, Pierre, for the salt by saying “salt.” Mary might also point to the salt. The game’s context is the table, the objects on it, and the people around the table and their actions. The salt is the topic. From this example, it’s clear that the word spoken (“salt”) is only a small part of what’s going on. In addition to hearing the word, the listener must perceive and conceptualize the situation, interpret the speaker’s gestures, guess what action the speaker desires, and so on. All of these elements are an intrinsic part of the language game.

The guessing game can fail in many ways. In the example above, Pierre might misunderstand the word, not know the word, or assume a different meaning for the word. This failure would become obvious in Pierre’s subsequent action. He might, for example, hand Mary the water instead of the salt.

Every language game must contain provisions for detecting and repairing failure. Typically, the speaker will provide more information, possibly nonverbally through additional gestures. If failure is due to a lack of knowledge, the language game offers an opportunity for learning. For example, if Pierre does not know the word “salt” (perhaps he is French), he can use this situation to acquire a new word. If he failed to conceptualize the scene (perhaps in Pierre’s native culture, salt is never purified to white grains), he can enrich his repertoire of concepts.

There are many possible variations on the guessing game, but a few factors are crucial:

1. The speaker and listener must rate different associations based on the appropriateness of their form and meaning. Doing so lets the players select the association with the highest potential for success.

2. The game must have a positive feedback loop between use and success. This lets the players use higher scoring associations in the future, increasing the likelihood of successful communication.

3. The game needs a strong structural coupling between concept formation and language, which is achieved when players have feedback on the adequacy of concepts.
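The scoring and feedback-loop factors above can be sketched as a simple lateral-inhibition update. The constants, list representation, and invented words below are illustrative, not from the article; note also that this sketch inhibits both kinds of competitor, whereas the article's agents each inhibit only the kind relevant to their role (speaker: same meaning, different word; listener: same word, different meaning).

```python
def update_scores(lexicon, used, success, delta=0.1):
    """Reward or punish the association just used in a game.

    lexicon: one agent's list of [form, meaning, score] associations.
    used: the entry of `lexicon` selected for this game.
    On success, the used association gains score and its competitors
    (same form with a different meaning, or same meaning with a
    different form) lose score; on failure, the used association
    loses score. Repeated over many games, this creates the positive
    feedback loop between use and success described above.
    """
    if success:
        used[2] = min(1.0, used[2] + delta)
        for assoc in lexicon:
            if assoc is not used and (assoc[0] == used[0] or assoc[1] == used[1]):
                assoc[2] = max(0.0, assoc[2] - delta)
    else:
        used[2] = max(0.0, used[2] - delta)

lexicon = [["wrob", "ball", 0.5], ["wrob", "cup", 0.4], ["bevu", "ball", 0.3]]
update_scores(lexicon, lexicon[0], success=True)
# wrob/ball is reinforced; wrob/cup and bevu/ball are inhibited
```

Because winners are rewarded and their rivals damped, the population of associations converges toward one dominant form per meaning.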

SEPTEMBER/OCTOBER 2001 computer.org/intelligent 17

[Figure 1 diagram: process labels include World, Sense, Perception, Conceptualize, Reference, Concept, Verbalize, Utterance, Analyze, Apply, and Dereference.]

Figure 1. The guessing game consists of several processes. The speaker’s processes are on the left, and the listener’s are on the right. Between them are feedback processes that move in alternating directions until the agents settle on coherent choices for each stage.

Challenges

Building dialogues for physical robots using speech input and output is quite different from building natural-language front ends for databases or expert systems, and even from building dialogues for synthetic characters.3 On the positive side, physical robots have real-world situations that can help constrain the meaning of utterances, and human users can help delineate dialogue using prosody and gestures.

On the negative side, there is enormous uncertainty at every step of the process:

• Getting speakers and listeners to share attention on the same object or event is extraordinarily difficult, but is required if they are to discuss such objects or events in the real world.

• Given that they have different perspectives, speakers and listeners typically derive different low-level features and even different segments.

• Speakers and listeners cannot telepathically discern each other’s conceptualizations of the world. Because there are many ways to conceive reality, it is almost certain that conceptualizations will differ. For example, “the wine glass on the edge of the table” might also be conceptualized as “the glass from which you just drank” or simply “your glass.”

• There are many ways to express conceptualizations, and language is ambiguous; a single word can have different meanings. This creates uncertainty for listeners. Moreover, we cannot assume that speakers and listeners have exactly the same knowledge of the language. People typically have different histories of language exposure, and thus use language in subtly different ways.

• Finally, there is an inherent uncertainty and ambiguity in the speech signal itself. As decades of speech recognition research have shown, utterances articulated by the speaker are seldom unambiguously recognized by the listener.

It follows that we cannot view the different process steps outlined in Figure 1 as sequential. They must work as a dynamic constraint-propagation process in which information flows in all directions until speaker and listener settle on a coherent communication. To select the best conceptualization and verbalization, the speaker must take the listener’s circumstances, prior knowledge, and viewpoint into account. The listener must perform bottom-up and top-down processing to maximize the use of all available constraints.

Guessing Game: A Simple Example

In a simplified version of the guessing game, utterances are single words and the lexicon consists of associations between single words and visually grounded predicates.

Each association is a triple <r, s, k>, where r is a representation, s a symbol, and k a score reflecting how successful this association has been in past games (and hence how successful it might be in the future). Each player has his or her own lexicon. There is no global knowledge nor central control.

The steps to play the game are:

Step 1: Shared attention
By pointing, eye gazing, moving an object, or other means, the speaker draws the listener’s visual attention to the topic or at least to a narrower context that includes the topic. To aid attention, the speaker can emit a word, such as “look,” and observe whether this directs the listener’s gaze toward the topic. Based on this activity, we assume both agents have captured an image that reflects the shared context.

Step 2: Speaker behavior
The speaker conceptualizes the topic, yielding a representation (r). Conceptualization is a combination of concepts that distinguishes the topic from the other objects in the context. For this simplified game, we’ll assume that the concept is a single predicate that is true for the topic but not for the other objects. For example, if every object on the table is blue but the topic is white, then color is a good way to refer to the topic. The speaker then collects all associations <r, s, k> in his lexicon and picks out the one with the highest score (k). The symbol (s) from this association is the best word to communicate from the speaker’s viewpoint; s is transformed into a speech signal and transmitted to the listener.

Step 3: Listener behavior
The listener receives the speech signal, recognizes the word s, and looks up all associations <r′, s, m> in her memory.

Step 3.1
If the listener did not have an association in memory for s, s is a new word for the listener. She signals incomprehension, and the speaker points to the topic. In this way, the listener can conceptualize the scene by finding categories indicating how the topic is different from the other objects. If there is more than one possibility, she picks one, yielding a representation r′′, and stores the new association <r′′, s, i>, where i is an initial default score.

Step 3.2
If there are associations, the listener applies each representation r′ to the current scene (perhaps starting with the highest scoring ones) to see whether any one picks out a unique object. If so, this is the assumed topic. There might be ambiguity due to more than one possible topic. In this case, the listener selects the referent identified by the association with the highest score. The listener then points to the topic.

Step 4: Feedback
If the listener finds a referent (Step 3.2), there are two possible outcomes.

Step 4.1
The speaker agrees that the referent is correct (that it is the topic he intended) and signals his agreement. In this case, both speaker and listener increase the score of the association they used and decrease the score of competing associations. For the speaker, a competing association is one that involves the same meaning, but a different word. For the listener, a competing association is one that involves the same word, but a different meaning.

Step 4.2
If the speaker signals that the listener failed to recognize the topic, both speaker and listener decrease the score of the selected associations. The speaker then gives additional feedback until speaker and listener share the same topic. The listener then conceptualizes the topic from her viewpoint and either stores a new word (as in Step 3.1) or increases the score of an existing association (as in Step 4.1).

Step 5: Acquire a new conceptualization
If the speaker or listener fails to conceptualize the scene, a concept acquisition algorithm is triggered. The attempt here is to use the situation as a learning source to acquire a new conceptualization.
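Under the sidebar's simplifications, Steps 2 and 3 might be sketched as follows. The predicate set, the dictionary object representation, and the invented word forms are illustrative assumptions, not the article's actual implementation.

```python
def distinctive_predicate(topic, context, predicates):
    """Step 2: find a single predicate true for the topic but for no
    other object in the shared context (conceptualization)."""
    for name, pred in predicates.items():
        if pred(topic) and not any(pred(o) for o in context if o is not topic):
            return name
    return None  # Step 5: no existing concept distinguishes the topic

def speak(lexicon, meaning):
    """Step 2 (continued): emit the highest-scoring word for `meaning`."""
    words = [(s, k) for r, s, k in lexicon if r == meaning]
    return max(words, key=lambda w: w[1])[0] if words else None

def interpret(lexicon, word, context, predicates):
    """Step 3.2: apply stored meanings of `word`, best score first,
    returning the first unique referent; None signals incomprehension
    (Step 3.1: unknown word or no unique referent)."""
    for r, s, k in sorted(lexicon, key=lambda a: -a[2]):
        if s == word:
            matches = [o for o in context if predicates[r](o)]
            if len(matches) == 1:
                return matches[0]
    return None

predicates = {"white": lambda o: o["color"] == "white",
              "blue": lambda o: o["color"] == "blue"}
context = [{"color": "white"}, {"color": "blue"}, {"color": "blue"}]
topic = context[0]
speaker_lex = [("white", "malewina", 0.7), ("white", "bozopite", 0.2)]
listener_lex = [("white", "malewina", 0.5)]

meaning = distinctive_predicate(topic, context, predicates)  # color is distinctive
word = speak(speaker_lex, meaning)
guess = interpret(listener_lex, word, context, predicates)
```

If `guess` matches the speaker's topic, Step 4.1's score updates would apply; otherwise Step 4.2's repair takes over.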

Applications

For the past five years, I have been working with a team to concretize and elaborate the language game idea. Our experimental platforms have been different generations of Sony robots: the EVI pan-tilt camera and the AIBO dog-like robot. Our current work focuses on the humanoid Sony Dream Robot (SDR). All the robots use Aperios6 as their real-time OS and Open-R7 as the secondary computational layer.

To integrate the AI components, we designed a cognitive architecture, Coala, to run on top of these components. In keeping with AI tradition, Coala captures an agent’s interaction with the world and other agents using schemas. A schema contains local slots, a schema applicability monitor, constraints, and actions specified as augmented finite state machines. Coala has facilities for memory storage and retrieval, action selection, and flexible schema execution. To interface between cognition and sensori-motor intelligence, Coala uses shared data structures. Although computational and cognitive architectural layers are critical for a successful integration, here I focus on our general design philosophy for using these tools to achieve coherent, global interactive behavior from disparate components.
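The article does not give Coala's API, so the following is only a hypothetical reading of the schema description (local slots, an applicability monitor, constraints, and actions organized as a finite state machine); every name and the toy "greet" schema are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Schema:
    """Illustrative Coala-style schema, not Sony's actual data structure:
    local slots, an applicability monitor, constraints, and actions
    organized as a small state machine (state name -> action function)."""
    name: str
    slots: Dict[str, object] = field(default_factory=dict)
    applicable: Callable[[dict], bool] = lambda world: True
    constraints: List[Callable[[dict], bool]] = field(default_factory=list)
    states: Dict[str, Callable] = field(default_factory=dict)

    def run(self, world, start="init"):
        """Step through states; each action returns the next state name
        (None terminates). Constraints are re-checked at every step."""
        state = start
        while state is not None:
            if not all(check(self.slots) for check in self.constraints):
                return False  # a constraint failed: abandon the schema
            state = self.states[state](self, world)
        return True

# A toy "greet" schema: if a face is visible, say hello.
greet = Schema(
    name="greet",
    applicable=lambda world: world.get("face_visible", False),
    states={
        "init": lambda s, w: "speak" if w["face_visible"] else None,
        "speak": lambda s, w: (w["said"].append("hello"), None)[1],
    },
)
world = {"face_visible": True, "said": []}
if greet.applicable(world):
    greet.run(world)
```

The applicability monitor gates whether the schema is even considered, which is what later allows multiple game schemas to compete for execution.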

Talking Heads

The Talking Heads experiment8 was one of our first. It involved two robots playing a guessing game. As Figure 2 shows, each robot consisted of a pan-tilt unit with a color camera, oriented toward geometric figures pasted on a whiteboard. The situation’s context consisted of a small area on the whiteboard. The topic was one of the figures, such as a red square.

For conceptualization, we used a decision-tree-like algorithm. Input to the decision tree is output from a battery of statistical pattern-recognition and computer-vision algorithms. Thus, “left” meant that the x coordinate of the figure’s middle point was less than the average x coordinate of all figures in the context, “right” meant that it was greater than this average, “large” meant that the figure’s size was greater than the average size of all figures, and so on.
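These context-relative predicates can be sketched directly; the dictionary feature names ("x", "size") are illustrative, standing in for the channels extracted by the vision algorithms.

```python
def context_predicates(objects):
    """Build Talking Heads style predicates whose thresholds are the
    averages over all objects currently in the context, so "left" or
    "large" always means relative to the current scene."""
    mean_x = sum(o["x"] for o in objects) / len(objects)
    mean_size = sum(o["size"] for o in objects) / len(objects)
    return {
        "left": lambda o: o["x"] < mean_x,
        "right": lambda o: o["x"] > mean_x,
        "large": lambda o: o["size"] > mean_size,
        "small": lambda o: o["size"] < mean_size,
    }

ctx = [{"x": 0.2, "size": 0.1}, {"x": 0.8, "size": 0.5}]
preds = context_predicates(ctx)
# the first figure is "left" and "small" relative to this context
```

Because the thresholds are recomputed per context, the same figure can be "large" in one scene and "small" in another, which is exactly the situatedness the language game exploits.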

For concept acquisition, we used a selectionist learning method: decision trees grow in a random fashion when the agent fails to find a concept that distinguishes the topic from other objects, and branches are pruned if they prove irrelevant or unsuccessful in subsequent language games.5 The “Talking Heads Example” sidebar shows part of a game, where the listener acquires a new word for a particular shade of blue.
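A minimal sketch of the selectionist idea, assuming sensory channels scaled to [0, 1]; representing tree branches as flat channel intervals, and the channel names, are simplifying assumptions of this sketch.

```python
import random

def discriminate(topic, context, discriminators):
    """Return a channel interval (channel, lo, hi) that holds for the
    topic's sensory value but for no other object in the context."""
    for ch, lo, hi in discriminators:
        if lo <= topic[ch] < hi and not any(
                lo <= o[ch] < hi for o in context if o is not topic):
            return (ch, lo, hi)
    return None  # conceptualization failed: trigger growth

def grow(discriminators, channels, rng=random):
    """Selectionist step: on failure, split a random channel's range in
    two; intervals that never prove distinctive would be pruned in
    later games (pruning not shown)."""
    ch = rng.choice(channels)
    mid = rng.uniform(0.0, 1.0)
    discriminators.extend([(ch, 0.0, mid), (ch, mid, 1.0)])

# The topic's blue-channel value (0.39) falls in an interval that no
# other object's value does, echoing the sidebar's shade-of-blue case.
discriminators = [("B", 0.25, 0.5)]
context = [{"B": 0.39}, {"B": 0.0}, {"B": 0.0}]
found = discriminate(context[0], context, discriminators)
```

Growth is blind and variation-driven; it is the repeated success or failure in games that selects which distinctions survive.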

We set up a teleportation facility that let us run several guessing games through Internet-connected installations in different real-world locations. The whiteboard in each location featured a different configuration of colored figures. An agent population traveled through the Internet and installed themselves into different robot bodies to play the games.

People created the agents through the World Wide Web (www.csl.sony.fr/talking-heads). Owners could also play guessing games with their agents through recorded images, and thus teach them new words. If no word was available, a speaking agent could construct a new word. As a result, a new language progressively developed that was only partly influenced by human language.

During a three-month period, the agents played close to half a million language games and created a stable core vocabulary of 300 words (they generated thousands of words overall). Our experiment showed not only that the language game approach is useful for implementing grounded dialogues between one human and a robot, but also that the game might be useful as an explanatory model for how language originates. Indeed, an evolving population of grounded agents developed, from scratch, a repertoire of concepts and a lexicon to communicate about their environment.8

Figure 2. Setup for the Talking Heads experiment. Two pan-tilt cameras look at a whiteboard containing colored geometric figures, which the robots use as subjects of a guessing game.

Talking to AIBO

More recently, we used the language game framework to run experiments on the AIBO robot. In many respects, these experiments were a giant step beyond the Talking Heads experiment. AIBO is a fully autonomous mobile robot with more than a thousand behaviors, coordinated through a complex behavior-based motivational system.7 Given this, getting the robot’s attention and achieving shared perspective on the world prior to a game is complex. We enhanced the dialogue using the robot’s gestures and movements and its onboard visual processing and sensing. We used off-the-shelf speech components for speech input and output. Obviously, using spoken language increases still further the communication uncertainty.

Nevertheless, we successfully implemented several language games, starting with the guessing game. Figure 3 shows an example of an interaction; the “AIBO Game Dialogue” sidebar shows the dialogue. Rather than using decision trees as before, we performed conceptualization using a nearest-neighbor algorithm with a memory of stored object views. We used instance-based learning to build the object memory. Every language game was thus an opportunity to acquire a new object view or to learn about a new object class, of which the current example was a first instance. This experiment therefore showed that any kind of concept acquisition can be used. The topic might be a single object, or it might also be an action or property of the situation. Other games focused on naming body parts and actions to be used in commands.
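A minimal sketch of nearest-neighbor recognition over stored views; the three-component feature vectors, the distance threshold, and the object names are illustrative assumptions.

```python
import math

def classify(view, memory, threshold=0.3):
    """Nearest-neighbor recognition over stored object views.

    `memory` maps an object name to the list of feature vectors stored
    for it so far (instance-based learning). If no stored view lies
    within `threshold`, return None: the view may belong to a new
    object class, so the current game becomes a chance to learn its
    name and store this view as the class's first instance.
    """
    best_name, best_dist = None, float("inf")
    for name, views in memory.items():
        for stored in views:
            d = math.dist(view, stored)  # Euclidean distance
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None

memory = {"ball": [(0.9, 0.1, 0.1)], "cup": [(0.2, 0.2, 0.8)]}
known = classify((0.85, 0.15, 0.1), memory)   # close to a stored ball view
unknown = classify((0.5, 0.9, 0.5), memory)   # nothing nearby: new object?
```

On a confirmed game, the fresh view would simply be appended to the recognized object's list, so recognition improves with every interaction.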

Current work: Communicating with humanoids

Designing a single language game is a very difficult task. The challenge lies in many subtle details. First, you must set up the right opportunities for the robot to get every possible piece of information from the environment. You must then exploit this information to improve the robot’s understanding, and help it learn new concepts and language whenever possible. Designing systems that can handle multiple language games is even more difficult, because humans smoothly switch from one game to another, often without a clear explicit indication.

Our current work focuses on dialogues for multiple language games within the context of humanoid robots, specifically the Sony SDR. The SDR humanoid robot has the necessary behavioral capacities to support fully animated grounded communication (gestures, visual recognition, speech input and output, and so on). The main challenge is to bring all these subsystems together into a coherent dialogue.

Talking Heads Example

Figure A shows an example of the Talking Heads experiment. Table A shows the raw data that the speaker derives from the image: X is the horizontal position of the figure’s middle point, Y the vertical position, H the height, W the width, A the angularity, R, G, Y, and B the color opponent channels (red, green, yellow, blue), and L the brightness.

Table A. The raw data that the speaker derives from the image in Figure A.

Object        X     Y     H     W     A     R     G     Y     B     L
0 (0,1)      0.37  0.71  0.48  0.21  0.45  0.17  0.00  0.00  0.39  0.28
1 (1,0.96)   0.70  0.69  0.38  0.22  0.45  0.98  0.00  0.52  0.00  0.36
2 (0.42,0.0) 0.51  0.31  0.21  0.51  0.70  0.00  0.99  0.73  0.00  0.46

The first object (with coordinates 0,1) is the topic. Based on the decision trees, the agent conceptualizes this object in terms of the blue channel. A shade of blue (between 0.25 and 0.5) is distinctive for the topic, but not for any other object in the context. The speaker has three words in the lexicon for this: Xagadude (score 0.1), Nibidesu (score 0.0), and Tetipi (score 0.0). The speaker chooses the first word. The listener does not know this word and performs a categorization, which happens to yield the same conceptualization. The listener therefore adds a new association to the lexicon.

Note that the listener might have easily chosen another conceptualization for this scene, such as one based on brightness or height. This would create a divergence in the lexicon. This divergence would show up in a later game when the same agents are confronted with a similarly unclear situation.

Figure A. Example of a guessing game played by two “talking heads”: (a) the image captured by the speaker and (b) by the listener; (c) the speaker’s and (d) the listener’s decision trees. Notice that the images are not exactly the same, nor are the decision trees. Although their repertoires are not the same, both agents in this case chose the same distinction.

Managing flexible dialogues in real-world interactions is a problem similar to the action-selection problem, which has been a central challenge in behavior-based robotics architectures.9 As with action selection, robots in dialogue must swiftly respond to impulses coming from the environment, while maintaining behavioral coherence and remaining on target for long-term goals. In her work with Kismet, Cynthia Breazeal10 offered a first example of how we can use ideas from behavior-based robotics to regulate human–robot interaction. The key is to introduce continuously varying motivational states and combine them in a dynamical system with sensory-motor states and behavioral progress monitors. Motivational states indicate a behavior’s desired intensity and degree of opportunity. Action selection occurs through a kind of bidding process in which different behaviors compete for execution. Motivational dynamics are tightly coupled to the environment so that the robot can swiftly switch to another behavior as required by the environment.

We are using these same principles to manage multiple language games. A game’s implementation schema has associated monitors that determine whether the schema would be appropriate in a given situation. Opportunity is determined both by environmental conditions and by fragments of language utterances that signal certain games. We implement each language game using a schema or a small hierarchy of schemas, and local states monitor opportunity or progress. Games have associated motivational states that permit flexible decisions and flexible switching from one game to another.

The increased complexity of individual language games and the need to coordinate multiple language games raise the issue of syntax. Language games use and produce bits of syntactic structure as needed. Thus, there is no separate central language module that provides complete parses or handles the planning of complete sentences. Rather, syntactic processing is similar to object tracking: It is an ongoing process that yields occasional results that are immediately used to further the game. Alternatively, the game can provide strong constraints to aid syntactic processing.
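The bidding idea can be sketched as follows; the example games, the opportunity monitors, and the product-of-motivation-and-opportunity bid are illustrative assumptions, not the actual Coala mechanism.

```python
def select_game(games, world):
    """Pick the language game with the highest bid, where a bid is the
    product of a game's current motivational state and its opportunity
    monitor applied to the world. Returns None when no game has any
    opportunity, so the robot can fall back on other behaviors."""
    bids = {name: motivation * monitor(world)
            for name, (motivation, monitor) in games.items()}
    best = max(bids, key=bids.get)
    return best if bids[best] > 0 else None

games = {
    "guessing": (0.8, lambda w: 1.0 if w["object_in_view"] else 0.0),
    "naming_body_parts": (0.4, lambda w: 1.0 if "touch" in w["events"] else 0.0),
}
chosen = select_game(games, {"object_in_view": True, "events": []})
```

In a full system the motivations themselves would vary continuously with the dialogue's progress, so a stalled game loses its bid and another game can take over.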

It’s now becoming possible for humans to have open-ended dialogues with physically embodied robots. However, such dialogues remain extraordinarily difficult to implement, mainly because they require the integration of a wide range of technologies and methods into a coherent system.

As my examples here show, the language game metaphor is a useful way to conceive and design open-ended dialogues. Language games group all aspects of verbal interaction: the parsing and production of utterances, as well as the grounding in sensory-motor intelligence, nonverbal gestures, and actions that result from communication. In this way, language games provide the glue to integrate many diverse components into a single whole. Language games also further language and concept acquisition, which can help robots fill in gaps and negotiate communication conventions.

The difficulty in setting up adequate learning is not in finding good machine-learning algorithms (plenty exist now) but in setting up the right opportunities for agents to learn. Properly designed language games create this opportunity.

Acknowledgments

The technical work described in this article involved many people from the Sony Computer Science Laboratory in Paris, the Digital Creatures Lab in Tokyo, and the VUB Artificial Intelligence Laboratory in Brussels. In particular, Frédéric Kaplan of Sony CSL was chief architect of the language games on the AIBO. In addition, I thank Tony Belpaeme, Edwin de Jong, Toshi Doi, Masahiro Fujita, Angus McIntyre, Joris Van Looveren, and Paul Vogt.


The AIBO robot is preprogrammed to respond to many action names. At the start of the AIBO experiment, the experimenter tells AIBO to sit, making concentration on the language game easier.

1. Human: Sit.

2. Human: Sit down.

The experimenter then shows AIBO the ball (see Figure 3).

3. Human: Look.

4. Human: Ball.

The word "look" helps AIBO focus on the guessing game based on visual input. The robot performs image capturing and segmentation. The game is possible if AIBO finds a segment. It tries to recognize the object using a nearest-neighbor algorithm.
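The recognition step can be sketched as a nearest-neighbor lookup. This is a hedged illustration, not AIBO's actual code: the feature vectors and stored prototypes below are invented, and a real system would derive them from the segmented image.

```python
# Minimal nearest-neighbor sketch of the recognition step, assuming each
# segmented object has been reduced to a small feature vector
# (e.g., color and shape statistics of the segment).

import math
from typing import Dict, List


def nearest_neighbor(query: List[float], memory: Dict[str, List[float]]) -> str:
    """Return the stored label whose feature vector is closest to `query`."""
    def dist(a: List[float], b: List[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    label, _ = min(memory.items(), key=lambda kv: dist(kv[1], query))
    return label


# Hypothetical prototypes for objects the robot has seen before.
memory = {"ball": [0.9, 0.2, 0.3], "puppet": [0.1, 0.8, 0.5]}

segment_features = [0.85, 0.25, 0.35]  # features of the current segment
guess = nearest_neighbor(segment_features, memory)  # closest prototype: "ball"
```

The guess produced here is what the robot then verbalizes for confirmation, as in utterance 5 of the dialog.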

5. AIBO: Ball?

The robot asks for feedback on the word to make sure it understood it correctly.

6. Human: Yes.

With this positive feedback on the pronounced word, the game proceeds. "Ball" is the word that AIBO will then associate with the object; if it is a new word, the robot will store it. If the word already exists, the robot will increase the word's score.
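The store-or-reinforce update described above can be sketched as a simple scored lexicon. The score values, step size, and object identifier here are assumptions for illustration only.

```python
# Hedged sketch of the word-object association update: store a new word with
# an initial score, or adjust the score of an existing association after the
# game succeeds or fails.

from typing import Dict, Tuple


def update_lexicon(lexicon: Dict[Tuple[str, str], float], word: str, obj: str,
                   success: bool = True, step: float = 0.1) -> float:
    """Map (word, object) pairs to association scores in [0, 1]."""
    key = (word, obj)
    if key not in lexicon:
        lexicon[key] = step  # new word: store it with an initial score
    elif success:
        lexicon[key] = min(1.0, lexicon[key] + step)  # reinforce on success
    else:
        lexicon[key] = max(0.0, lexicon[key] - step)  # weaken on failure
    return lexicon[key]


lexicon: Dict[Tuple[str, str], float] = {}
update_lexicon(lexicon, "ball", "obj-7")  # first hearing: stored at 0.1
update_lexicon(lexicon, "ball", "obj-7")  # positive feedback: raised to 0.2
```

Repeated successful games thus gradually strengthen an association, while failed games weaken it, which is how competing word-meaning conventions get negotiated over time.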

AIBO Game Dialog

Figure 3. A guessing game between AIBO and a human experimenter, Frédéric Kaplan, involving the use and acquisition of a word for "ball." The "AIBO Game Dialog" sidebar gives an example of dialogue from the interaction.

References

1. N.J. Nilsson, "Shakey the Robot," tech. note 323, SRI AI Center, Menlo Park, Calif., 1984.

2. D. Beymer and K. Konolige, "Real-Time Tracking of Multiple People Using Continuous Detection," IEEE Frame Rate Workshop, 1999; available at www.eecs.lehigh.edu/~tboult/FRAME/Beymer (current 31 Aug. 2001).

3. N. Badler, M. Palmer, and R. Bindiganavale, "Animation Control for Real-Time Virtual Humans," Comm. ACM, vol. 42, no. 8, Aug. 1999, pp. 64–73.

4. H.H. Clark and S.A. Brennan, "Grounding in Communication," Perspectives on Socially Shared Cognition, L.B. Resnick, J.M. Levine, and S.D. Teasley, eds., APA Books, Washington, D.C., 1991, pp. 127–149.

5. L. Steels, "The Origins of Syntax in Visually Grounded Robotic Agents," Artificial Intelligence, vol. 103, nos. 1–2, 1998, pp. 133–156.

6. Y. Yokote, "The Apertos Reflective Operating System: The Concept and Its Implementation," SIGPLAN Notices, vol. 33, no. 10, 1992, pp. 414–434.

7. M. Fujita and H. Kitano, "Development of an Autonomous Quadruped Robot for Robot Entertainment," Autonomous Robots, vol. 5, 1998, pp. 1–14.

8. L. Steels et al., "Crucial Factors in the Origins of Word-Meaning," The Transition to Language, A. Wray et al., eds., Oxford Univ. Press, UK, to be published 2002.

9. L. Steels and R.A. Brooks, eds., The Artificial Life Route to Artificial Intelligence: Building Embodied Situated Agents, Lawrence Erlbaum Associates, Hillsdale, N.J., 1995.

10. C. Breazeal, "A Motivational System for Regulating Human-Robot Interaction," Proc. AAAI-98, AAAI Press, Menlo Park, Calif., 1998, pp. 31–36.



The Author

Luc Steels is professor of artificial intelligence at the University of Brussels and director of the Sony Computer Science Laboratory in Paris. His research interests in AI include robotics, vision, learning, and natural language, with a focus on the development of computational and robotic models for studying the origins of language. Contact him at AI-lab (ARTI), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; [email protected].

