
We present a methodology for designing and implementing interactive intelligences. The constructionist design methodology (CDM), so called because it advocates modular building blocks and incorporation of prior work, addresses factors that we see as key to future advances in AI, including support for interdisciplinary collaboration, coordination of teams, and large-scale systems integration.

We test the methodology by building an interactive multifunctional system with a real-time perception-action loop. The system, whose construction relied entirely on the methodology, consists of an embodied virtual agent that can perceive both real and virtual objects in an augmented-reality room and interact with a user through coordinated gestures and speech. Wireless tracking technologies give the agent awareness of the environment and the user's speech and communicative acts. User and agent can communicate about things in the environment, their placement, and their function, as well as about more abstract topics, such as current news, through situated multimodal dialogue.

The results demonstrate the CDM's strength in simplifying the modeling of complex, multifunctional systems that require architectural experimentation and exploration of unclear subsystem boundaries, undefined variables, and tangled data flow and control hierarchies.

The creation of embodied humanoids and broad AI systems requires integration of a large number of functions that must be carefully coordinated to achieve coherent system behavior. We are working on formalizing a methodology that can help in this process. The architectural foundation we have chosen for the approach is based on the concept of a network of interacting modules that communicate via messages. To test the design methodology we chose a system with a human user that interacts in real time with a simulated human in an augmented-reality environment. In this article we present the design methodology and describe the system that we built to test it.

Newell (1990) urged the search for unified theories of cognition,1 and recent work in AI has increasingly focused on integration of multiple systems (compare with Simmons et al. 2003, McCarthy et al. 2002, and Bischoff and Graefe 1999). Unified theories necessarily mean integration of many functions, but our prior experience in building systems that integrate multiple features from artificial intelligence and computer graphics (Bryson and Thórisson 2000, Lucente 2000, Thórisson 1999) has made it very clear that such integration can be a challenge, even for a team of experienced developers. In addition to basic technical issues (connecting everything together can be prohibitive in terms of time), it can be difficult to get people with different backgrounds, such as computer graphics, hardware, and artificial intelligence, to communicate effectively. Coordinating such an effort can thus be a management task of a tall order; keeping all parties synchronized takes skill and time. On top of this comes the challenge of deciding the scope of the system: what seems simple to a computer graphics expert may in fact be a long-standing dream of the AI person, and vice versa.

Constructionist Design Methodology for Interactive Intelligences

Kristinn R. Thórisson, Hrvoje Benko, Denis Abramov, Andrew Arnold, Sameer Maskey, and Aruchunan Vaseekaran

AI Magazine Volume 25, Number 4 (Winter 2004) (© AAAI). Copyright © 2004, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2002 / $2.00

Several factors motivate our work. First, a much-needed move towards building on prior work in AI to promote incremental accumulation of knowledge in creating intelligent systems is long overdue. The relatively small group working on broad models of mind, the researchers who are bridging across disciplines, need better ways to share results, to work together, and to work with others outside their fields. To this end our principles foster reusable software components through a common middleware specification2 and mechanisms for defining interfaces between components. Second, by focusing on the reuse of existing work, we are able to support the construction of more powerful systems than otherwise possible, speeding up the path towards useful, deployable systems. Third, we believe that to study mental mechanisms they need to be embedded in a larger cognitive model with significant breadth, to contextualize their operation and enable their testing under boundary conditions. This calls for an increased focus on supporting large-scale integration and experimentation. Fourth, bridging across multiple functions in a single, unified system increases researchers' familiarity and breadth of experience with the various models of thought to date, as well as new ones. This is important (as are in fact all of the above points) when the goal is to develop unified theories of cognition.

Inspired to a degree by the classic LEGO bricks, our methodology, which we call a constructionist approach to AI, puts modularity at its center: functions of the system are broken into individual software modules, which are typically larger than software classes (that is, objects and methods) in object-oriented programming but smaller than the typical enterprise application. The role of each module is determined in part by specifying the message types and content of the information that needs to flow between the various functional parts of the system. Using this functional outline we then define and develop, or select, components for perception, knowledge representation, planning, animation, and other desired functions.

Behind this work lies the conjecture that the mind can be modeled through the adequate combination of interacting functional machines (modules). Of course, this is still debated in the research community, and not all researchers are convinced of the idea's merits. However, this claim is in its essence simply a combination of two less radical ones. First, that a divide-and-conquer methodology will be fruitful in studying the mind as a system. Since practically all scientific results since the Greek philosophers are based on this, it is hard to argue against it. In contrast to the search for unified theories in physics, we see the search for unified theories of cognition in the same way as articulated in Minsky's (1986) theory that the mind is a multitude of interacting components, and his (perhaps whimsical but fundamental) claim that the brain is a hack.3 In other words, we expect a working model of the mind to incorporate, and coherently address, what at first seems a tangle of control hierarchies and data paths. This relates to another important theoretical stance: the need to model more than a single one or a handful of the mind's mechanisms in isolation in order to understand the working mind. In a system of many modules with rich interaction, only a model incorporating a rich spectrum of (animal or human) mental functioning will give us a correct picture of the broad principles underlying intelligence.

There is essentially nothing in the constructionist approach to AI that lends it more naturally to behavior-based AI (compare with Brooks 1991) or classical AI; its principles sit beside both. In fact, since constructionist AI is intended to address the integration problem of very broad cognitive systems, it must be able to encompass all variants and approaches to date. We think it unlikely that any of the principles we present will be found objectionable, or even completely novel for that matter, by a seasoned software engineer. But these principles are custom-tailored to guide the construction of large cognitive systems, and we hope they will be used, extended, and improved by many others over time.

To test the power of a new methodology, a novel problem is preferred over one that has a known solution. The system we chose to develop presented us with a unique scope and unsolved integration issues: an augmented-reality setting inhabited by an embodied virtual character; the character would be visible via a see-through stereoscopic display that the user wears and would help the user navigate the real-world environment. The character, called Mirage, should appear as a transparent, ghostlike stereoscopic 3-D graphic superimposed on the user's real-world view (figure 1). This system served as a test bed for our methodology; the system is presented in sufficient detail here to demonstrate the application of the methodology and to show the methodology's modular philosophy, which the system mirrors closely.

Related Work

Related work can be broadly categorized into three major sections based on emphasis: modularization, mind models, and embodiment. Our work on constructionist AI directly addresses the first two, while embodiment is more relevant to the project that we used to test the methodology.

Modularity

Modular approaches have been shown to speed up the development of large, complex systems and to facilitate the collaborations of large teams of researchers (Fink et al. 1995, 1996). For instance, Martinho, Paiva, and Gomes (2000) describe an architecture designed to facilitate modular, rapid development of theatrical agents in virtual worlds. Like many other systems (compare with Granström, House, and Karlsson 2002; Laird 2002; McKevitt, Ó Nualláin, and Mulvihill 2002; Badler, Palmer, and Bindiganavale 1999; and Terzopoulos 1999), their agents live in an enclosed virtual world where the update cycle of the world can be artificially controlled. Martinho, Paiva, and Gomes's architecture involves splitting the world, along with the agents, into three separate modules, each of which can be developed independently of the others. As in our approach, this enables parallel execution of implementation, which reduces development time, and combining the pieces to form a full system becomes a simpler task. Modularity also played a large role in the construction of Bischoff and Graefe's (2000) impressive Hermes robot. Simmons and colleagues' robot Grace (2003), which involved the work of five institutions and more than 20 people, is another fine example of a project that has benefited from a modular approach.

The emphasis on modularity in our constructionist methodology derives primarily from the approach of Thórisson (1996), first used for the communicative humanoid Gandalf and later for work at LEGO Digital (Bryson and Thórisson 2000), and from behavior-oriented design, or BOD (Bryson 2002). Bryson details the elements of BOD, using principles from behavior-based AI, such as reactive plans and tight integration of perception and action. However, this approach encompasses principles focused on particular ways of planning and lacks more general principles and ways of using preexisting modules. The Ymir architecture (Thórisson 1999) presents relatively general modularization principles, using module types such as perceptors, deciders, and actions with subtypes, in a way that clearly distinguishes their roles and mechanisms. This was formalized and incorporated into the present work. We further extend it by turning the conventional module-oriented build process on its head by making the messages in the system equally as important as the modules themselves, as explained further below.

Figure 1. Our Embodied Agent Mirage Is Situated in the Laboratory. Here we see how Mirage appears to the user through the head-mounted glasses. (The image has been enhanced for clarity.)

Mind Models

There has been substantial work in the area of mind construction for simulated humans (recent overviews can be found in McKevitt et al. 2002; and Granström, House, and Karlsson 2002). Given several decades of AI research, the first thing we'd like to avoid is reinventing the wheel. With its operators and problem spaces, Soar (Newell 1990) presents a highly unified set of principles and architectural constructs that address a number of psychological regularities observed in human cognition. The Soar architecture has been explored in various systems, including computer games (Laird and van Lent 2001, Laird 2002). Originally Soar focused on the concept of cognition in a narrow sense: perception and action were left out of the initial model. Yet to use Soar in simulated worlds requires that these gaps be bridged. Soar has since been extended to these areas, but its small set of cognitive principles doesn't make it very well suited as a foundation for studying cognitive model alternatives. Because of uniform granularity, Soar is also relatively complicated to use when building large systems. For that a methodology with better abstraction support is called for.

Behavior-based AI (compare with Bryson 2001, Brooks 1991) was a response to old-school knowledge representation and its use of explicit mental models of the world. The approach taken in behavior-based AI is to tightly couple perception and action in integrated software structures that combine perceptual variables and processes with motor plans and their selection and execution. In a complex system, this can sometimes make software development and maintenance easier: implementing processes like limb-eye coordination can, for example, be done with simpler control structures than in most other approaches. However, these architectures tend to be limited to a single level of representation (compare with Maes 1990); they are not conducive to the acceptance of superstructures that are needed, for example, to do dialogue interpretation and generation at multiple levels of abstraction (although see Bryson 2000).

Puff the Magic LEGO Dragon (Bryson and Thórisson 2000) and the Ymir architecture before it (Thórisson 1996, 1999) are both examples of hybrid systems that use blackboards for modularizing information flow between processes based on type and scales of time. These systems contain both planners and schedulers, but their core is behavior-based, building upon a large number of modules that interact at different time scales. The system we describe in this article uses fewer modules, each module being a rather monolithic application, a choice made not for theoretical reasons but as a direct result of following our design principles for quickly building a relatively broad system. CDM can be applied in the design of systems incorporating any of the above approaches; moreover, to our knowledge, it is the best method for integrating principles from all of them.

Embodiment

In bringing together an agent and a user, two methods are most common: the agent is brought into the user's world, as in robotics or screen-based agents,4 or the user is brought into the agent's world, as in immersive virtual-world simulations.

Due to their physical presence in the real world, robots are inherently embodied. There has been a recent surge in simulating humanlike movement and decision-making processes in robotics, and even more recent have been attempts to create goal-oriented humanlike interaction between the robot and the user. An example is the work of Iba, Paredis, and Khosla (2002), where multimodal input (gestures and speech) is used to program a vacuum-cleaning robot. Another version of bringing the agent into the user's world is exemplified by Thórisson's social agent Gandalf (Thórisson 1997, 2002), which modeled real-time, coordinated psycho-social dialogue skills including turn-taking, body posture, gaze, and speech.

An example of bringing the user into the agent's world is the research by Rickel et al. (2001). In their work, a virtually embodied character is situated in simulated training sessions for the army; the user is placed in a virtual world with the agent. The authors acknowledge that much improvement is still needed in areas of virtual human bodies, natural language understanding, emotions, and humanlike perceptions, yet a very small fraction of ongoing development in this area makes use of, or makes available, readily available components that can be plugged into other systems.

A third alternative exists for bringing together agent and user. The term augmented reality has been used to refer to systems that combine computer-generated virtual objects with the real surrounding scene in an interactive, real-time manner (Azuma 1997). Participants in an augmented-reality environment typically wear either a semitransparent or a video see-through head-mounted display (HMD) that enables them to see localized virtual, graphical information rendered in context in the real environment. Placing autonomous characters into such augmented reality presents a compelling and powerful new human-computer interaction metaphor. Such characters can have many practical applications: they can enhance the realism of multiplayer games and virtual worlds, they can function as information-retrieval assistants, and they could serve as virtual tour guides in real museums.

Design Principles

Our approach is based on the concept of multiple interacting modules. Modularity is made explicit in our system by using one or more blackboards for message passing:

To mediate communication between modules, use one or more blackboards with publish-subscribe functionality.

The blackboard provides a localized recording of all system events, system history, and state, precisely at the level of abstraction represented in the messages. The benefits of message-based, publish-subscribe blackboard architectures are numerous. Among the most obvious is that their use allows the definition of modules whose input and output is fully defined by messages in the system, the messages thus embodying an explicit representation of the modules' contextual behavior. Building stateless modules wherever possible is therefore a direct goal in our approach, as it greatly simplifies construction. Given that the messages effectively implement the modules' application programming interfaces (APIs), they in turn directly mirror the abstraction level of the full system, making the blackboard architecture a powerful tool for incremental building and debugging of complex, interactive systems with multiple levels of abstraction. In many cases the messages we defined became key driving forces in the Mirage system, around which the modules were organized.
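As a concrete illustration, the publish-subscribe blackboard pattern described above can be sketched in a few lines. This is our own minimal sketch, not the actual Mirage middleware; the `Blackboard` class and the message names are invented for the example.

```python
from collections import defaultdict

class Blackboard:
    """Minimal publish-subscribe blackboard. Modules register callbacks
    for message types; every published message is also appended to a
    history list, giving a localized record of system events and state."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # message type -> callbacks
        self.history = []                     # (type, content) pairs

    def subscribe(self, msg_type, callback):
        self.subscribers[msg_type].append(callback)

    def publish(self, msg_type, content):
        self.history.append((msg_type, content))
        for callback in self.subscribers[msg_type]:
            callback(msg_type, content)

# Two stateless "modules" wired together purely through messages:
# a decider reacting to a perception message, and an output module.
bb = Blackboard()
bb.subscribe("perception.speech.recognized",
             lambda t, c: bb.publish("decision.reply", "You said: " + c))
bb.subscribe("decision.reply", lambda t, c: print(c))
bb.publish("perception.speech.recognized", "hello")
```

Because each module's input and output is fully defined by the messages it subscribes to and publishes, the blackboard history doubles as a debugging trace of the whole system.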

    We also base our approach on the followingmeta-assumption:

Only build functionality from scratch that is critical to the raison d'être of the system; use available software modules wherever possible.

This means that a critical part of the process is to define the project's goal, for example, whether the project is practically motivated (the primary goal being its functionality) or research-oriented (the primary goal is to answer one or more questions regarding mental process, human-computer interaction, or human-human interaction).

Steps

The specifics of CDM are encapsulated in the following outline. The indented text provides details by specific references to the example system that we built.

Step One. Define the Goals of the Project. Specify the primary motivation and goals behind the system.

We wanted to show how to build an embodied, situated agent that significantly increased the value of an augmented reality by being able to interact naturally with human users. The primary focus was therefore on demonstrating the interaction between the user, the agent, and the environment, and the integration of the technologies necessary to support such a system.

Step Two. Define the Scope of the System. Specify at a high level what the intelligent system is intended to do. Using narratives, write example scenarios that typify its behavior.

We start with an initial write-up of a few high-level examples of user-agent interactions. From this, a more detailed specification of the abilities of the agent can be extracted, guiding the selection of which modules to include, such as gesture tracking, speech recognition, graphics, and so on. For example, in addition to including a module encapsulating knowledge about the objects in a room, we decided to include a news summarization system, Columbia Newsblaster (McKeown et al. 2002),5 which enables the agent to tell us the world headlines when requested and read from the news articles.

Step Three. Modularize the System. The system is built using modules that communicate through a publish-subscribe mechanism, blackboards, or a combination of both. Define the functional areas that the system must serve, and divide these into 5 to 10 distinct modules with roughly defined roles. The modules can then be recursively subdivided using the same approach (the principle of divisible modularity). This step is best done in a group meeting where all developers can contribute to the effort.
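The divisible-modularity principle can be sketched as a simple tree of modules with roughly defined roles, where subdividing a module just applies the same decomposition step again. The module names below are illustrative, not the actual Mirage decomposition.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """A module with a roughly defined role. Submodules are obtained by
    recursively applying the same decomposition (divisible modularity)."""
    name: str
    role: str                # e.g. "perception", "decision", "action"
    submodules: list = field(default_factory=list)

    def leaves(self):
        """The modules that will actually be implemented or selected."""
        if not self.submodules:
            return [self]
        return [m for sub in self.submodules for m in sub.leaves()]

# A first-pass decomposition of a hypothetical Mirage-like system.
system = Module("system", "root", [
    Module("perception", "perception", [
        Module("gesture-tracking", "perception"),
        Module("speech-recognition", "perception"),
    ]),
    Module("decision", "decision"),
    Module("animation", "action"),
])
print([m.name for m in system.leaves()])
```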


First, classify modules into one of three roles, according to whether they serve perception, decision/planning, or action/animation. Each of these roles can be further split into more levels, for example, high-level perception versus low-level perception; short-, mid-, and long-term planning; and high-level animation versus low-level animation.

Second, find off-the-shelf software. Locate software that is already implemented that could be used in the system. This step may require modification of the system's scope. (This step does not apply to components that are part of the project's raison d'être, as already explained.)

Third, specify custom-written modules. Having identified usable external modules, specify additional modules that need to be written and what their functionality should be.

Fourth, follow the natural breaks in the information-processing path. Let information flow via blackboards (1) when processes are of distinctly different types; (2) where information flow is relatively low frequency; (3) where information flow is event driven; or (4) where the method of information flow is pull rather than push (that is, where the modules typically request their data, rather than having it pushed to them by other processes), provided that they are saving processing cycles by pulling (for example, because the frequency of data transfer via blackboards is lower that way). This step helps define the modules and their input and output.

Fifth, define message types. Specify how the various parts of the system should convey their state to the rest of the system. This step operationalizes the role of each module and defines the module's interfaces. Define message content in such a way as to leave the modules as raw processors of data; in other words, make the blackboards (messages on the blackboards) contain all the data necessary for every module to produce its output. This pushes state information into the blackboards, greatly easing system construction. Where this is not possible, divide the data that needs to be external to the blackboards in a way that leaves a single module in charge of a clearly demarcated type of data so that the module serves as an interface or wrapper between the data and the blackboards. Should this not be possible, consider creating a separate module cluster with its own blackboard and recursively follow these guidelines.

Sixth, avoid duplication of information. If a module has been designated as being the only module in the system to need access to a particular type of data, and then later another module is found to require access to the same data, several possibilities should be considered. Modules with highly different functional roles should typically not need access to the same set of data; if they do, it means they may have close functional dependencies. Consider coupling them or putting them into the same executable with shared access to the same data. Perhaps more modules are found to need the data in question; consider creating a separate data-store server with its own API that both modules can access, either via a blackboard or a separate mechanism. Again, try to split the data such that only one module is in charge of one set of data.

Seventh, determine the number of blackboards, and which modules use which blackboards. The most obvious reason for modules to share a blackboard is when they need to communicate directly, that is, when the output of one is the input of another. Consider using two or more blackboards if (1) there is a wide range of information types in the total set of messages in the system, for example, coexisting complex symbolic plan structures and simple Boolean switches; (2) there is a wide range of real-time requirements for the information flow, for example, high-frequency vision processing and low-frequency plan announcements; or (3) the system needs to run on multiple computers to achieve acceptable performance. It is natural that modules with messages containing similar content share a blackboard. Some modules are a natural bridge between blackboards, as for example when a module producing decisions to act requires perceptual data to make those decisions. Blackboards serve both as engineering support and system optimization.

Step Four. Test the System against the Scenarios. Once a reasonably stable set of modules and messages has been reached, the scenarios outlined in step 2 are used to test-run the system on paper. Using these scenarios as a guide, every module should be made very simple at first, reading one type of message and posting another. A full end-to-end chain is specified for a single interaction case, enabling every element in the path to be tested on the whiteboard, along with the routing mechanisms, timing assumptions, and so on, very early in the design process. Such end-to-end testing should also be performed regularly during implementation. The viability of the modularization is verified with regard to the following:

Expected communication (blackboard) throughput. Network speed and computing power put natural constraints on the maximum throughput of the blackboard. For example, perceptual data, which may need to be updated at 10-15 Hz, may have to bypass central message paths. If the middleware/blackboard is powerful enough to service a stream of data, it is preferable to route the data through it; a centralized data-traffic center enables monitoring, debugging, prioritization, and so on. However, most of today's network solutions do not handle transmission of raw visual or auditory data at sufficiently high rates, necessitating workarounds. In practical terms, this means that one may need to combine what otherwise should naturally be two or more separate modules.

Efficient information flow. Static or semistatic information that is frequently needed in more than one place in the system may be a hint that the processes using that information should share an executable, thus containing the information in one place. Sometimes this is not a good idea, however, especially when the processes sharing the information are of very different natures, as in visual processing and motor control. In such cases it is better to set up a server (hardware or software) for the data that needs to be shared. Again, it is best that only one module be in contact with, or in charge of, data that cannot be represented in the content of messages for reasons of size, frequency of updating, or other reasons.

Convenience with regard to programming languages and executables. If two modules, for example speech synthesis and an animation system, are written in the same language, it may be convenient to put them together into one executable. This is especially sensible if the modules (1) use the same set of data, (2) are serially dependent on each other, (3) have similar update-frequency requirements, (4) need to be tightly synchronized, or (5) are part of the same conceptual system (for example, low- and high-level visual perception). When conceptually distinct modules are combined into the same executable, care must be taken not to confuse their functionality in the implementation, which could prevent further expansion of either module in the future.

The ability to verify timing is especially important when building communicative characters since the system is bounded by real-time performance requirements: the agent needs to act in real time with regard to the user's natural behavior and expectations. If any part of the path, from input (such as a user finishing a question) to the agent responding (such as providing an answer), takes longer than the designers expect, or if some delay expected to be acceptable to the user turns out to be unacceptable, a redesign or significant modification may be required.

Three real-world examples serve to illustrate the implications of efficient information flow and convenience with regard to programming languages and executables. First, team members could not agree on how to design the perception part: should perception be part of the world, or should it be a separate module that was fed information continuously from the world module? Second, how should integration of perceptual information from speech and vision be implemented? A third question, which didn't come up until we had iterated the design a few times, was the placement of the speech synthesis module. Because this module was written in Java (we used FreeTTS6) and the world was written in Java 3-D,7 we decided to put the speech module into the world (see figure 2) so that it could be better synchronized with the multimodal behavior of the agent's body (we had the animation scheduler control it because it gave us forced synchronization with almost no overhead). Even though the speech synthesizer contains knowledge (prosody generation) that rightfully belongs to the mind rather than the body, we decided in favor of performance over theoretical correctness.

Step Five. Iterate. Go through steps 2-4 as often as needed.

Step Six. Assign the Modules to Team Members. The natural way to assign responsibilities is along the lines of the modules, following people's strengths and primary interest areas. Every module or module cluster gets one lead team member who is responsible for that module's or module cluster's operation and for defining the messages the module or module cluster receives and posts. A good way to assign tasks, especially if the number of team members is small, is to borrow from extreme programming8 and assign them in pairs: one lead, chosen for his or her primary strength, and one support, chosen by his or her interest or secondary strength. This step depends ultimately on the goals of the project and the experience and goals of the project members.

    We had eight modules and five teammembers; members served as lead on two

    Articles

    WINTER 2004 83

  • the interaction between modules and the com-plexity of the message content being transmit-ted between them.

    We did very little tuning of our system,but as a result, we did not implement allthe functionality we had originallyplanned.

    Execution in a Classroom SettingIn a classroom setting we divide the abovemethodological principles into four phases.The first phase starts off with a single-para-graph proposal from each student.. The pro-posals, which are reviewed and approved bythe instructor, should outline the topic, moti-vation, and components of the system to bebuilt. When a project has been chosen (some-times by merging two or more ideas into onesystem), the second phase focuses on creatingan integration plan, the expected results, and afinal (functional) form of the implemented sys-tem. It is important to write usage scenarios ex-emplifying the major functional aspects of thedesired system.

    During the third phase each team memberwrites up a description and breakdown (as longand detailed as necessary) of his or her individ-ual contribution to the project, identifying po-tential difficulties and challenges and propos-ing solutions to deal with them, as well asidentifying which team members he or sheneeds to collaborate with. The description de-tails the modules and the functions that are tobe coded, along with the input and outputmessages that the modules need and whichmodules produce or consume them. The moredetail about the messages content and syntaxin these write-ups, the better for the wholeteam, because the write-ups help concretizehow the whole system works. At the end of thisphase, the team should have a project planwith measurable milestones, based on the out-line from phase two.

The fourth phase is the implementation. The lead team members for each part of the system, in collaboration with the instructor, help revise the project, reassign tasks, and so on, as necessary as the project progresses. The leads are ultimately responsible for making sure that all modules in the project can handle minimal interaction, as defined.

Mirage System

We now give an overview of the system used for testing the methodology, to provide an example system built with the methodology and better equip readers to evaluate its applicability in their own context.

modules and support on one, or lead on one and support on two.

Step Seven. Test All the Modules in Cooperation. It is critical to test the modules at run time early on in the implementation phase to iron out inconsistencies and deficiencies in the message protocols and content. Selecting the right middleware for message routing and blackboard functionality is key, as the final system depends on this being both reliable and appropriate for the project. Open adapters have to exist for the middleware to allow integration of new and existing modules. The testing is then done with the chosen middleware and stub versions of the modules before their functions have been implemented. To begin with, the stubs can simply print out the messages they take as input. They can also be set to publish canned messages via keyboard or graphical menu input. Using these two features, the stub modules can be used to do simple integration testing of the entire system.
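The two stub capabilities described above, echoing every received message and publishing canned messages on demand, can be sketched as follows. This is an illustrative reconstruction only; the class and message names are assumptions, not the project's actual code.

```python
class StubModule:
    """Stand-in for an unimplemented module, as in Step Seven:
    it prints whatever it receives and can emit canned messages."""

    def __init__(self, name, canned_messages=()):
        self.name = name
        self.canned = list(canned_messages)  # (msg_type, content) pairs
        self.received = []                   # log of incoming message types

    def receive(self, msg_type, content=None):
        # First stub capability: print out messages taken as input.
        self.received.append(msg_type)
        print(f"[{self.name}] got {msg_type}: {content}")

    def next_canned(self):
        # Second stub capability: publish a canned message on request
        # (in the real setup, triggered by keyboard or menu input).
        return self.canned.pop(0)

# Integration smoke test: route a canned speech event to a perception stub.
speech = StubModule("SpeechRecognition",
                    canned_messages=[("Input.Perc.UniM.Hear:On", None)])
perception = StubModule("Perception-1")
msg_type, content = speech.next_canned()
perception.receive(msg_type, content)
```

Wiring several such stubs to the middleware lets the team exercise the full message flow before any real functionality exists.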

Integration testing often takes longer than expected. It is also one of the most overlooked steps, yet it cannot be avoided; postponing it only forces integration testing to happen later in the process. Later testing, in turn, delays an accurate estimate of the project's full scope and increases the probability of missed deadlines and cancellation of planned functionality.

Step Eight. Build the Modules to Their Full Specification. This step is naturally done in frequent alternation with the prior step; as functionality is added, regression errors tend to creep in. However, designing modules with a clearly defined set of inputs and outputs dramatically decreases the number of potential communication errors and makes the system's behavior more predictable. In addition to building the full modules, one must build into each module extensive fine-tuning capabilities, such as the timing of message posting and tracking of state and history, to achieve the desired communication between modules later on.

Step Nine. Tune the System with All Modules Cooperating. It is important to test the system for a while with a group of people using it and to tune it based on the results. Especially for real-time interactive systems, variations in people's interaction style often unearth faults in the implementation logic, especially simplifications that the team may have made in the interest of time. Since the system is based on multiple modules interacting and cooperating, their real performance is not understood until they have been subjected to a number of different user input scenarios. This step can be arbitrarily long, depending on the complexity of


To implement publish-subscribe functionality between modules we used CMLabs Psyclone,9 a platform that provides easy hookup via Java and C++ adapters. Psyclone shares some features with earlier efforts in this area, for example Cohen et al.'s (1994) Open Agent Architecture and Fink and colleagues' (1996) DACS. In our case the blackboard's state and history served primarily as support for iterative development and debugging. The architecture of the Mirage system is shown in figure 2; a list of the modules is given in table 1. We found a single blackboard sufficient for the level of complexity of our system but would probably advocate a multiblackboard approach with further expansion. The system and its modules are discussed briefly in the following sections.
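A minimal sketch of the publish-subscribe pattern such middleware provides might look as follows. This is not Psyclone's actual API; the class, method names, and history mechanism are assumptions for illustration.

```python
from collections import defaultdict

class Blackboard:
    """Minimal publish-subscribe hub in the spirit of the middleware
    discussion above (illustrative only, not Psyclone's API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # message type -> callbacks
        self.history = []                      # posted messages, for debugging

    def subscribe(self, msg_type, callback):
        self._subscribers[msg_type].append(callback)

    def post(self, msg_type, content=None):
        # Record the message (blackboard state and history aid debugging),
        # then notify every module subscribed to this type.
        self.history.append((msg_type, content))
        for callback in self._subscribers[msg_type]:
            callback(msg_type, content)

# Usage: a perception module reacts to a speech event from the recognizer.
board = Blackboard()
received = []
board.subscribe("Input.Perc.UniM.Hear:On", lambda t, c: received.append(t))
board.post("Input.Perc.UniM.Hear:On")
```

The recorded history is what makes iterative development convenient: every message that crossed module boundaries can be inspected after a run.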

Augmented Reality Environment

To bridge the gap between the real and the virtual and bring our agent to life, we chose an augmented reality approach. A head-mounted stereoscopic display (HMD) with selective transparency allows us to render the agent as a 3-D, stereoscopic overlay graphic that the user can perceive to be situated in the real environment. The glasses are fully transparent except where the agent appears. Special tracking technology makes the agent appear just like any other object with a fixed location in the room.

The augmented reality environment consists of both perception and action/animation-scheduling modules, as well as a complete graphics system that renders our embodied agent (figure 1). The user wears the optical see-through glasses (Sony LDI-D100B) and sees the real world overlaid with a stereoscopic image of the agent generated in real time. Using position and orientation tracking of the user's head and hand,10 the system can make the virtual agent appear as an active part of the real environment. Mirage can remain stationary, or move around in the world, independently of the user, giving the impression that he occupies an actual physical location in space.11

Our system has a detailed 3-D model of the entire room in which Mirage is placed, including sofa, tables, chairs, desks, computers, and so on, giving him a certain amount of knowledge about his surroundings. Mirage can thus perceive the real objects in the room, walk around the (real-world) furniture and objects, and orient himself relative to these. All graphics are implemented using the Java 3-D API. While the 3-D model of Mirage and the animation schedulers for him were designed and implemented as part of this project, the tracking and graphical model of the environment had already been built and had to be adapted to our system.

Interaction Capabilities

We see future versions of Mirage as guiding tours in museums or tutoring employees about the layout of production plants, giving him potential as part of future commercial applica-


[Figure 2 graphic: the Psyclone blackboard at the center, connected to the Graphics, Newsblaster, Decision, Speech Recognition, Perception-1, Perception-2, Model and Room KB, Animation Control, and Speech Synthesizer modules.]

Figure 2. Mirage System Architecture, Showing Sensor and Display Connections to the Physical Devices. Arrows to and from Psyclone indicate message and control flow. Small striped rectangles on the edges of modules signify Psyclone adapters.

phisticated humanoid, we envision several separate cognitive modules for tasks such as reasoning, dialog management, and so on, all interacting via message passing.

The primary reason for separating perception into two modules was to be able to use the built-in geometric computation methods of the graphics subsystem and its already-available ability to detect selection of objects, pointing, and intersection with other objects. Doing those checks outside of the graphics environment would have been inefficient and harder to implement, albeit theoretically cleaner. This meant that perceptual responsibilities were eventually split between two modules (not counting speech). However, once those basic messages regarding geometric operations were created, integrating those with the speech messages coming separately from the speech-recognizer module was a relatively easy task, handled through the separate integration module, Perception-2.
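As a rough illustration of how such an integration module might pair pointing events with speech by event time stamps, consider the sketch below. The function name, data layout, and the 0.5-second window are assumptions for illustration, not taken from the Mirage implementation.

```python
def fuse(speech_events, pointing_events, window=0.5):
    """Pair each (time, utterance) speech event with the pointing event
    closest in time, within `window` seconds; otherwise pair it with an
    empty object list. Purely illustrative of time-stamp-based fusion."""
    fused = []
    for s_time, utterance in speech_events:
        candidates = [(abs(s_time - p_time), objects)
                      for p_time, objects in pointing_events
                      if abs(s_time - p_time) <= window]
        # Take the temporally closest pointing event, if any qualifies.
        objects = min(candidates)[0:2][1] if candidates else []
        fused.append((utterance, objects))
    return fused

# A speech act at t=2.1 s pairs with the pointing event at t=2.0 s,
# not with the stale one at t=0.5 s.
speech = [(2.1, "what is that?")]
pointing = [(0.5, ["sofa"]), (2.0, ["desk", "monitor"])]
print(fuse(speech, pointing))
```

Fusing on time stamps rather than arrival order is what makes this robust to the differing latencies of the speech and graphics subsystems.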

The Decision module is responsible for integrating and interpreting all messages from the perception modules and forming a response action for the agent. That selected action is then sent to the Action Scheduling module, which coordinates the avatar animation and speech synthesis. When the Decision module receives contradicting inputs or is unable to settle on a single top-level decision, it can request that the avatar ask questions for clarification.
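A toy sketch of this integrate-then-act logic, with message names loosely modeled on those in table 1; the rules and action names here are invented for illustration and are not the module's actual decision policy.

```python
def decide(percepts):
    """Map a dictionary of fused percepts to a single action name,
    or to a clarification request when interpretations conflict."""
    wants_object_info = "knowledge.user.pointing-at-room-objects" in percepts
    wants_news = percepts.get("knowledge.dialog-topic") == "news"
    if wants_object_info and wants_news:
        # Contradicting inputs: have the avatar ask for clarification.
        return "action.speak.clarify"
    if wants_object_info:
        return "action.speak.describe-object"
    if wants_news:
        return "action.speak.recite-news"
    return "action.idle"

print(decide({"knowledge.dialog-topic": "news"}))  # action.speak.recite-news
```

The selected action name would then be posted for the action scheduler to turn into coordinated animation and speech.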

Discussion

Two of the third-party components we used turned out to function inadequately in our final system. Sphinx II was too difficult to set up and tune properly in the time we had planned. We could have gotten better accuracy with minimal tuning using an off-the-shelf speech-recognition product such as ViaVoice or Dragon NaturallySpeaking. We also found that the speech produced by FreeTTS is very hard to understand in noisy environments: people who tried our demo had a very hard time understanding much of what Mirage said, save for the very shortest, most obvious utterances. While it is great to have an open-source synthesizer in Java, it would be even better if it generated more comprehensible output.

One potential problem of using existing modules is that sometimes they lack control from the outside. Although such limitations turned out not to be a problem with Mirage (we had a clear idea up front of how to engineer our way around them), the issue is a generic one that will not go away. As a future remedy to this problem we are working on a set of principles to help guide module builders

tions. To make the agent interactive and conversational, we implemented a multimodal input system that uses speech recognition and motion tracking of the user's right hand. The agent itself is capable of multimodal communication with the user via a speech synthesizer, body language, and manual gesture.

For speech recognition we used the Sphinx speech recognizer from Carnegie Mellon University.12 Mirage has access to a knowledge base for all the objects, where type, use, names, and other attributes of individual objects are stored, allowing him to talk about them. The user can point at any object in the room and ask Mirage what that object is. When asked about the news today, Mirage recites news summaries generated dynamically by the Newsblaster system from online news feeds (McKeown et al. 2002). Mirage is also aware of proximity, and he knows his own name; when either his name is spoken or the user comes within 1 meter, Mirage will turn to and greet the user.

Perception, Decision, and Action

To allow for the human interaction scenarios mentioned above, our system used two custom-written perception modules in combination with a custom decision module. The role of the perception modules is to create messages about changes in the environment, including recognition of the user's speech, proximity of the user to the Mirage agent, and so on. Those messages are then used by the Decision module to create sensible responses to the user's input that are subsequently executed by the animation scheduler. Examples of messages between modules are given in table 2.

The Perception-1 module is built into the augmented reality environment. It keeps track of the agent's and the user's positions and orientations, noting changes of states such as "is the agent visible to the user," "is the agent looking at the user," and so on. Perception-1 uses technology based on previous work by Olwal, Benko, and Feiner (2003) in which SenseShapes (volumetric regions of interest) are used to determine the user's selection and pointing. This relatively simple selection mechanism is triggered on any speech command, returning a list of objects being pointed to at the time of speech. The output of Perception-1 is further processed by the Perception-2 module, where objects pointed at are combined with speech, based on event time stamps, to create a holistic perception of the user's action. Implied meaning of perceptions is not represented explicitly but emerges from their combination with decision mechanisms, thus leaving out the use of an explicit cognition module. In a more so-


when selecting, documenting, and exposing the modules' "knobs and dials" so that modules are better suited for use with each other in multimodule, multiauthor systems.

Psyclone was a good match to CDM and made it easy to distribute processes between machines, something that enabled us to achieve acceptable run-time performance quickly. This approach of distributing processes is likely to become increasingly attractive over the coming years as computer prices keep dropping, making garage-AI feasible and bringing new developers to the field with clusters of cheap computers from yesteryear, networks of which can supply the processing power even for serious AI work. The computer industry has no motivation to promote the reuse of old computing equipment; software such as Psyclone does, and Mirage shows this to be a promising option.

Unlike closed virtual-world systems, ours is an open loop, with unpredictable human behavior streaming in real time. Since the full system lives on separate computers that cannot be run in lock-step, unpredictable delays on the network and the computers can differentially affect the performance of the various modules. This calls for an architecture in which modules have tolerance for the error caused by such delays. Although we did not take the implementation very far in this direction, our general approach reflects these constraints directly. For example, universal time stamps were used extensively throughout to coordinate events and messages between modules.

Conclusion

Using CDM we designed and implemented a conversational character embodied in an augmented reality system. We developed and incorporated a variety of technologies to serve our goal of a natural and simple human-humanoid interaction that worked in real time. The design and implementation of the full system took place within a period of nine weeks, with five students dividing the work evenly, taking an estimated total of two mind-months (man-months) to complete. The results demonstrate that the methodology and its modularization principles work very well for the construction of such systems.

The student team was composed of one undergraduate and four second-year master's students. Before starting the Mirage project, only the instructor had experience with the methodology, and none of the students had built an interactive system of this scale before. One student had more than what could be considered a minimal experience in building interactive systems, while two had prior experience with speech-recognition systems. None of them were AI majors: one had taken no introductory AI courses; the others had taken one. Students felt that the approach taken gave them an increased overview of the system's functioning, something they would not have experienced otherwise. We feel confident to conclude that the methodology works well for building large, modular systems, even for teams on which most of the members are novices in AI. This makes CDM an excellent choice for classroom use.

Because CDM was developed for integrating large numbers of modules and system hierarchies, it is not sensible to apply it to the reinvention of well-tested algorithms already developed over the past decades, such as, for example, natural language parsing. However, the approach's utility is clear for incorporating those available algorithms into larger systems, where their operation can be observed in context and where they have some hope of achieving utility. Mirage provides examples of how this can be done.


Module               Posts                                       Subscribing Modules
Speech Recognition   Input.Perc.UniM.Hear:Off                    Perception-1, Perception-2
                     Input.Perc.UniM.Hear:On
                     Input.Perc.UniM.Voice.Speak
Perception-1         knowledge.user.agent-proximity              Perception-2
                     knowledge.user.looking-at-agent
                     knowledge.agent.looking-at-user
                     knowledge.user.looking-at-model
                     knowledge.model.visible
                     knowledge.user.pointing-at-model-objects
                     knowledge.user.pointing-at-room-objects
Perception-2         knowledge.dialog-topic                      Decision
                     knowledge.dialog-topic.shifting
                     knowledge.user-goal

Table 1. A List of Selected Modules and Their Posted and Subscribed-to Message Types.
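Dot-separated message types like those in table 1 lend themselves to subtree subscriptions, where one pattern covers a whole family of messages. A sketch of such prefix matching follows; the ".*" wildcard convention here is an assumption for illustration, not a documented Psyclone feature.

```python
def matches(subscription, msg_type):
    """Prefix matching over dot-separated message types: a subscription
    ending in '.*' matches the named type and its whole subtree;
    any other subscription must match exactly."""
    if subscription.endswith(".*"):
        prefix = subscription[:-2]
        return msg_type == prefix or msg_type.startswith(prefix + ".")
    return msg_type == subscription

# One subscription covers all of Perception-1's knowledge.user messages.
print(matches("knowledge.user.*", "knowledge.user.agent-proximity"))  # True
print(matches("knowledge.user.*", "knowledge.model.visible"))         # False
```

Naming message types hierarchically in this way keeps subscription lists short as the number of message types grows.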

Message Type         Input.Perc.UniM.Hear:On
Sending Module       Speech
Subscribing Modules  Perception-1, Perception-2
Description          Speech started
Arguments            None

Message Type         Input.Perc.UniM.Hear:Off
Sending Module       Speech
Subscribing Modules  Perception-1, Perception-2
Description          Speech stopped
Arguments            None

Table 2. Example Specification of Message Format for Modules.

tems. While we don't consider the Mirage project to provide a scientific test of the hypothesis that models of mind can be developed in a modular fashion, nothing in our explorations indicates some sort of limit to the kinds of mental structures that CDM could support. The approach applies equally to building believable interaction and to the goal of accurately modeling psychological and psycho-biological processes. Indeed, since it allows modularization to happen at any size, and at one, two, or many levels of detail, there is hardly a system to which the methodology cannot be applied. That does not mean to say that it is equally applicable to modeling all kinds of systems: indeed, we believe that it is most applicable for systems with ill-defined boundaries between subsystems and where the number of variables to be considered is relatively large. In the current state of science, primary examples include ecosystems, biological systems, social systems, and intelligence.

We are currently in the process of setting up a forum, in collaboration with a larger team, to further develop the constructionist approach to AI and build a repository of plug-and-play AI modules and systems. We invite anyone with interest to join in this effort; the URL is mindmakers.org.

Acknowledgments

We thank Steven Feiner for access to his laboratory, technology, and students; Alex Olwal for creating the avatar; the Columbia Newsblaster team; our reviewers for excellent suggestions; and CMLabs for Psyclone, a free academic version of which can be downloaded from www.mindmakers.org/projects/Psyclone. Psyclone is a trademark of Communicative Machines Laboratories. LEGO is a trademark of The LEGO Group, A/S.

Notes

1. While Newell's architecture, Soar, is based on a small set of general principles that are intended to explain a wide range of cognitive phenomena, Newell makes it very clear in his book (Newell 1990) that he does not consider Soar to be the unified theory of cognition. We read his call for unification not to mean in the narrow sense the particular premises he chose for Soar but rather to refer in the broader sense to the general breadth of cognitive models.

    2. www.mindmakers.org/air/airPage.jsp

    3. Personal communication.

4. If a screen-based agent is aware of real-world objects, as well as virtual ones, it might be classified as an augmented reality system. Traditionally, though, such classification would also imply that the user's reality is augmented in some obvious way by the agent/system.

Explicit representation of module interaction through messages, which can be inspected both on paper and at run time, increased the system's transparency and all team members' understanding of how the various parts of the system work. In spite of the typical optimism of (young) programmers, the students were surprised that all modules, except for the Speech Recognition module, were working as planned at the end of the semester (the project did not start until the last third of the semester). We feel the total estimated development time of two mind-months strongly supports the hypothesis that CDM speeds up system creation. A parallel can be found in the work by Maxwell and colleagues (2001), who explored the use of a message-passing architecture to facilitate modular integration of several key technologies needed for constructing interactive robots. Their robots have sophisticated vision, planning and navigation routines, speech recognition and synthesis, and facial expression. They achieved impressive results with limited resources: ten undergraduates working on the project for eight weeks, constructing three robots with this architecture, two mobile and one stationary.

One question that might be raised is whether the system thus built is perhaps too simple, somehow hiding a hypothetical sharp rise in difficulty when applied to the design of more realistic models of mind, with one or two orders of magnitude greater complexity (or requiring functionalities not addressed in Mirage). Although we cannot answer this conclusively, we feel that evidence from earlier variants of the methodology counters this claim, it having been used with good results in, for example, the design of a large virtual-reality environment (Bryson and Thórisson 2000) and a natural language system deployed for CNET and Buy.com (Lucente 2000) that supported large numbers of Internet users. The methodology is very general; there is no doubt that systems of significantly greater complexity than Mirage can be built using the approach, reaping the same, quite possibly even greater, benefits.

In summary, we have presented (1) a novel, relatively broad, real-time system developed by a small group of relatively inexperienced students working within a very short period of time; (2) team members' significantly heightened awareness of the functional breadth needed to construct interactive intelligent systems; (3) integration of prior work into a new system; and (4) a novel demo with commercial potential, as well as commercial examples of earlier uses of the methodology, showing that it can help in the development of real-world AI sys-


5. www.cs.columbia.edu/nlp/newsblaster

    6. freetts.sourceforge.net/docs/index.php

    7. java.sun.com/products/java-media/3D/

    8. www.extremeprogramming.org/

9. See The Psyclone Manual (www.cmlabs.com/psyclone). CMLabs Psyclone executables can be downloaded for Linux, Windows, and Macintosh OS X from www.mindmakers.org/projects/psyclone.

    10. InterSense IS900. http://www.isense.com/

11. The tracking keeps the agent still most of the time; fast movements of the head will cause the agent's image to jiggle.

    12. www.speech.cs.cmu.edu/sphinx/

References

Azuma, R. T. 1997. A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 6(4): 355–385.

Badler, N.; Palmer, M. S.; and Bindiganavale, R. 1999. Animation Control for Real-Time Virtual Humans. Communications of the ACM 42(8): 65–73.

Bakker, B.; and den Dulk, P. 1999. Causal Relationships and Relationships between Levels: The Modes of Description Perspective. In Proceedings of the Twenty-First Annual Conference of the Cognitive Science Society, 43–48, ed. M. Hahn and S. C. Stoness. Mahwah, N.J.: Lawrence Erlbaum Associates.

Bischoff, R. 2000. Towards the Development of Plug-and-Play Personal Robots. Paper presented at the First IEEE-Robotics and Automation Society International Conference on Humanoid Robots, Cambridge, Mass., September 7–8.

Bischoff, R.; and Graefe, V. 1999. Integrating Vision, Touch and Natural Language in the Control of a Situation-Oriented Behavior-Based Humanoid Robot. In Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics, 999–1004. Piscataway, N.J.: Institute of Electrical and Electronics Engineers.

Brooks, R. A. 1991. Intelligence without Reason. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, 569–595. San Francisco: Morgan Kaufmann Publishers.

Brooks, R. A.; Ferrell, C.; Irie, R.; Kemp, C. C.; Marjanovic, M.; Scassellati, B.; and Williamson, M. 1998. Alternative Essences of Intelligence. In Proceedings of the Fifteenth National Conference on Artificial Intelligence and the Tenth Conference on Innovative Applications of Artificial Intelligence, 961–968. Menlo Park, Calif.: AAAI Press.

Bryson, J. 2001. Intelligence by Design: Principles of Modularity and Coordination for Engineering Complex Adaptive Agents. Ph.D. diss., Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge, Mass.

Bryson, J. 2000. Making Modularity Work: Combining Memory Systems and Intelligent Processes in a Dialog Agent. Paper presented at the Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB) 2000 Symposium on How to Design a Functioning Mind, Birmingham, England, April 17–20.

Bryson, J.; and Thórisson, K. R. 2000. Bats, Dragons and Evil Knights: A Three-Layer Design Approach to Character-Based Creative Play. Virtual Reality (Special Issue on Intelligent Virtual Agents) 5(1): 57–71. Heidelberg: Springer-Verlag.

Cohen, P. R.; Cheyer, A. J.; Wang, M.; and Baeg, S. C. 1994. An Open Agent Architecture. In Readings in Agents, 197–204, ed. M. N. Huhns and M. P. Singh. San Francisco: Morgan Kaufmann Publishers.

Fink, G. A.; Jungclaus, N.; Kummer, F.; Ritter, H.; and Sagerer, G. 1996. A Distributed System for Integrated Speech and Image Understanding. In Proceedings of the Ninth International Symposium on Artificial Intelligence. Monterrey, Mexico: Instituto Tecnologico y de Estudios Superiores de Monterrey.

Fink, G. A.; Jungclaus, N.; Ritter, H.; and Sagerer, G. 1995. A Communication Framework for Heterogeneous Distributed Pattern Analysis. In Proceedings of the First IEEE International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP95), 881–890. Piscataway, N.J.: Institute of Electrical and Electronics Engineers.

Granström, B.; House, D.; and Karlsson, I., eds. 2002. Multimodality in Language and Speech Systems. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Iba, S.; Paredis, C. J. J.; and Khosla, P. K. 2002. Interactive Multi-Modal Robot Programming. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation. Piscataway, N.J.: Institute of Electrical and Electronics Engineers.

Laird, J. 2002. Research in Human-Level AI Using Computer Games. Communications of the ACM 45(1): 32–35.

Laird, J. E.; and van Lent, M. 2001. Human-Level AI's Killer Application: Interactive Computer Games. AI Magazine 22(2): 15–25.

Lucente, M. 2000. Conversational Interfaces for E-Commerce Applications. Communications of the ACM 43(9): 59–61.

Maes, P. 1990. Situated Agents Can Have Goals. In Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, 49–70, ed. P. Maes. Cambridge, Mass.: The MIT Press.

Martinho, C.; Paiva, A.; and Gomes, M. R. 2000. Emotions for a Motion: Rapid Development of Believable Pathematic Agents in Intelligent Virtual Environments. Applied Artificial Intelligence 14(1): 33–68.

Maxwell, B. A.; Meeden, L. A.; Addo, N. S.; Dickson, P.; Fairfield, N.; Johnson, N.; Jones, E. G.; Kim, S.; Malla, P.; Murphy, M.; Rutter, B.; and Silk, E. 2001. REAPER: A Reflexive Architecture for Perceptive Agents. AI Magazine 22(1): 53–66.

McCarthy, J.; Minsky, M.; Sloman, A.; Gong, L.; Lau, T.; Morgenstern, L.; Mueller, E. T.; Riecken, D.; Singh, M.; and Singh, P. 2002. An Architecture of Diversity for Common Sense Reasoning. IBM Systems Journal 41(3): 5–24.

McKeown, K. R.; Barzilay, R.; Evans, D.; Hatzivassiloglou, V.; Klavans, J. L.; Nenkova, A.; Sable, C.; Schiffman, B.; and Sigelman, S. 2002. Tracking and


in computer science from Columbia University and began his Ph.D. in machine learning at Carnegie Mellon University in 2004. His primary research interests are unsupervised machine learning and anomaly detection. He can be reached at [email protected].

Sameer R. Maskey is a Ph.D. student at Columbia University. His primary area of research is speech summarization. He has been working with Julia Hirschberg in exploring structural, linguistic, and acoustic features to summarize broadcast news. He is also interested in machine learning techniques for mining speech data. He can be reached at [email protected].

Denis Abramov holds an M.Sci. degree in computer science from Columbia University and obtained his B.A. from Brandeis University. He has done speech-recognition research at Dragon Systems as well as consulting for startup companies. For the past few years he has been developing applications for the financial industry. His email is [email protected].

Aruchunan Vaseekaran is vice president for R&D at m-Rateit.com, a startup firm that he cofounded. This firm develops intelligent expert-based solutions for personal health management. He holds a B.Sc. in electrical engineering and an M.Sc. in computer science from Columbia University. He is a member of Tau Beta Pi, the National Engineering Honor Society. Vaseekaran can be reached at vasee@ontash.net.

modal Dialogue with People. In Proceedings of the First ACM International Conference on Autonomous Agents. New York: Association for Computing Machinery.

Thórisson, K. R. 1996. Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Ph.D. diss., Massachusetts Institute of Technology Media Laboratory, Cambridge, Mass.

Kris Thórisson has been developing AI systems and related technologies for 15 years. As a codirector of the Wizard Group at LEGO Digital he and his group developed a large virtual world inhabited by interactive creatures. At the Massachusetts Institute of Technology, he created Gandalf, a graphical guide to the solar system featuring multimodal perception and action, real-time turn-taking, and task knowledge. As vice president of engineering at Soliloquy Inc. he directed the development of online natural language bots that assisted customers of CNET and Buy.com. His recent startup, Radar Networks Inc., develops semantic technologies for next-generation Web services. Thórisson has consulted on business and technology for numerous companies including British Telecom, Interactive Institute, Kaupthing Investment Bank, and NASA. He is currently chief technology officer of Radar Networks Inc., a company he cofounded. He has degrees in psychology and human factors, and holds a Ph.D. in media arts and sciences from the Massachusetts Institute of Technology Media Laboratory. Thórisson recently joined the faculty at Reykjavik University's Department of Computer Science, where he develops communicative humanoids. He can be reached at kris@ru.is.

Hrvoje Benko is currently a Ph.D. student working with Steven Feiner at the Department of Computer Science at Columbia University, New York. Among his research interests are hybrid user interfaces, augmented reality, and wearable mobile devices. He is a member of the Institute of Electrical and Electronics Engineers. His e-mail address is [email protected].

Andrew Arnold is a software engineer at Google in New York City. He received a B.A.

Summarizing News on a Daily Basis with Columbia's Newsblaster. Paper presented at the Conference on Human Language Technology (HLT 02), San Diego, Calif., March 24–27.

McKevitt, P.; Ó Nualláin, S.; and Mulvihill, C., eds. 2002. Language, Vision and Music. Amsterdam: John Benjamins.

Minsky, M. 1986. The Society of Mind. New York: Simon and Schuster.

Newell, A. 1990. Unified Theories of Cognition. Cambridge, Mass.: Harvard University Press.

Nicolescu, M. N.; and Mataric, M. J. 2002. A Hierarchical Architecture for Behavior-Based Robots. In Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems. New York: Association for Computing Machinery.

Olwal, A.; Benko, H.; and Feiner, S. 2003. SenseShapes: Using Statistical Geometry for Object Selection in a Multimodal Augmented Reality System. In Proceedings of the Second IEEE and ACM International Symposium on Mixed and Augmented Reality. Los Alamitos, Calif.: IEEE Computer Society.

Rickel, J.; Gratch, J.; Hill, R.; Marsella, S.; and Swartout, W. 2001. Steve Goes to Bosnia: Towards a New Generation of Virtual Humans for Interactive Experiences. In Artificial Intelligence and Interactive Entertainment: Papers from the 2000 AAAI Spring Symposium, ed. John Laird and Michael van Lent. Technical Report SS-01-02. Menlo Park, Calif.: American Association for Artificial Intelligence.

Simmons, R.; Goldberg, D.; Goode, A.; Montemerlo, M.; Roy, N.; Sellner, B.; Urmson, C.; Schultz, A.; Abramson, M.; Adams, W.; Atrash, A.; Bugajska, M.; Coblenz, M.; MacMahon, M.; Perzanowski, D.; Horswill, I.; Zubek, R.; Kortenkamp, D.; Wolfe, B.; Milam, T.; and Maxwell, B. 2003. GRACE: An Autonomous Robot for the AAAI Robot Challenge. AI Magazine 24(2): 51–72.

Terzopoulos, D. 1999. Artificial Life for Computer Graphics. Communications of the ACM 42(8): 33–42.

Thórisson, K. R. 2002. Natural Turn-Taking Needs No Manual: Computational Theory and Model, from Perception to Action. In Multimodality in Language and Speech Systems, 173–207, ed. B. Granström, D. House, and I. Karlsson. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Thórisson, K. R. 1999. A Mind Model for Multimodal Communicative Creatures and Humanoids. International Journal of Applied Artificial Intelligence 13(4–5): 449–486.

Thórisson, K. R. 1997. Gandalf: An Embodied Humanoid Capable of Real-Time Multi-

