MITSUBISHI ELECTRIC RESEARCH LABORATORIES
http://www.merl.com

COLLAGEN: Applying Collaborative Discourse Theory to Human-Computer Interaction

Charles Rich, Candace L. Sidner, Neal Lesh

TR-2000-38    November 2000

Abstract

We describe an approach to intelligent user interfaces based on the idea of making the computer a collaborator, and an application-independent technology for implementing such interfaces.

To appear in AI Magazine, Special Issue on Intelligent User Interfaces, 2001.

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2000
201 Broadway, Cambridge, Massachusetts 02139



Submitted in December 2000.

COLLAGEN: Applying Collaborative Discourse Theory to Human-Computer Interaction

Charles Rich, Candace L. Sidner and Neal Lesh

We describe an approach to intelligent user interfaces based on the idea of making the computer a collaborator, and an application-independent technology for implementing such interfaces.

What properties of a user interface would make you want to call it intelligent? For us, any interface that is called intelligent should at least be able to answer the six types of questions from users shown in Figure 1. Being able to ask and answer these kinds of questions implies a flexible and adaptable division of labor between the human and the computer in the interaction process. Unlike most current interfaces, an intelligent user interface should be able to guide and support you when you make a mistake or if you don't know how to use the system well.

What we are suggesting here is a paradigm shift. As an analogy, consider the introduction of the undo button. This one button fundamentally changed the experience of using interactive systems by removing the fear of making accidental mistakes. Users today expect every interactive system to have an undo button and are justifiably annoyed when they can't find it. By analogy, to focus on just one of the question types in Figure 1, what we are saying is that every user interface should have a "What should I do next?" button.

Note that we are not saying that each of the questions in Figure 1 must literally be a separate button. The mechanisms for asking and answering these questions could be spoken or typed natural (or artificial) language, adaptive menus, simple buttons, or some combination of these. We have experimented with all of these mechanisms in the various prototype systems described later.

Finally, some readers may object that answering the question types in Figure 1 should be thought of as a function of the "application" rather than the "interface." Rather than getting into an unproductive semantic argument about the boundary between these two terms, we prefer instead to focus on what we believe is the real issue, namely, whether this characterization of intelligent user interfaces can lead to the development of a reusable middleware layer that makes it easy to incorporate these capabilities into diverse systems.

Who should/can/will do ___?

What should I/we do next?

Where am/was I?

When did I/you/we do ___?

Why did you/we (not) do ___?

How do/did I/we/you do ___?

Figure 1. Six Questions for an Intelligent Interface.

(Adapted from the news reporter's "five W's.") The blanks are filled in with application-specific terms, ranging from high-level goals, such as "prepare a market survey" or "shut down the power plant," to primitive actions, such as "underline this word" or "close valve 17."

Again, there is a relevant historical analogy. A key to the success of so-called wimp (Windows, Icons, Menus and Pointers) interfaces has been the development of widely used middleware packages, such as Motif™ and Swing™. These middleware packages embody generally useful graphical presentation and interaction conventions, such as tool bars, scroll bars, check boxes, and so on. We believe that the next goal in user interface middleware should be to codify techniques for supporting communication about users' task structure and process, as suggested by the question types in Figure 1. This article describes a system, called Collagen™, which is the first step in this direction.


Collaboration

So what does all of this have to do with the "collaborative discourse theory" in the title of this article? The goal of developing generic support for communicating about the user's task structure cannot, we feel, be achieved by taking an engineering approach focused directly on the questions in Figure 1. We therefore started this research by looking for an appropriate theoretical foundation, which we found in the concept of collaboration.

Collaboration is a process in which two or more participants coordinate their actions toward achieving shared goals. Most collaboration between humans involves communication. Discourse is a technical term for an extended communication between two or more participants in a shared context, such as a collaboration. "Collaborative discourse theory" (see theory sidebar) thus refers to a body of empirical and computational research about how people collaborate. Essentially, what we have done in this project is apply a theory of human-human interaction to human-computer interaction.

In particular, we have taken the approach of adding a collaborative interface agent (see Figure 2) to a conventional direct-manipulation graphical user interface. The name of our software system, Collagen (for Collaborative agent), derives from this approach. (Collagen is also a fibrous protein that is the chief constituent of connective tissue in vertebrates.)

The interface agent approach mimics the relationships that typically hold when two humans collaborate on a task involving a shared artifact, such as two mechanics working on a car engine together or two computer users working on a spreadsheet together.

In a sense, our approach is a very literal-minded way of applying collaborative discourse theory to human-computer interaction. We have simply substituted a software agent for one of the two humans that would appear in Figure 2 if it were a picture of human-human collaboration. There may be other ways of applying the same theory without introducing the concept of an agent as separate from the application (Ortiz & Grosz 2000), but we have not pursued those research directions.

Notice that the software agent in Figure 2 is able both to communicate with and observe the actions of the user, and vice versa. Among other things, collaboration requires knowing when a particular action has been done. In Collagen, this can occur two ways: either by a reporting communication ("I have done x") or by direct observation. Another symmetrical aspect of the figure is that both the user and the agent can interact with the application program.

[Figure 2 diagram: the agent and the user communicate with and observe each other, and both interact with the application.]

Figure 2. Collaborative Interface Agent.

There are many complex engineering issues regarding implementing the observation and interaction arrows in Figure 2, which are beyond the scope of this article (Lieberman 1998). In all our prototypes, the application program has provided a program interface (api) for performing and reporting primitive actions and for querying the application state. Communication between the user and the agent has been variously implemented using speech recognition and generation, text and menus.

Outline of the Article

The remainder of this article lays out the work we have done in more detail, starting with snapshots of four interface agents built by us and our collaborators in different domains using Collagen. Following these examples comes a description of the technical heart of our system, which is the representation of discourse state and the algorithm for updating it as an interaction progresses. Next, we discuss another key technical contribution, namely how Collagen uses plan recognition in a collaborative setting. Finally, we present the overall system architecture of Collagen, emphasizing the application-specific versus application-independent components. We conclude with a brief discussion of related and future work.

Theory

In 1986, Grosz and Sidner proposed a tripartite framework for modelling task-oriented discourse structure. The first (intentional) component records the beliefs and intentions of the discourse participants regarding the tasks and subtasks ("purposes") to be performed. The second (attentional) component captures the changing focus of attention in a discourse using a stack of "focus spaces" organized around the discourse purposes. As a discourse progresses, focus spaces are pushed onto and popped off of this stack. The third (linguistic) component consists of the contiguous sequences of utterances, called "segments," which contribute to a particular purpose.

Grosz and Sidner extended this basic framework in 1990 with the introduction of SharedPlans, which are a formalization of the collaborative aspects of a conversation. The SharedPlan formalism models how intentions and mutual beliefs about shared goals accumulate during a collaboration. Grosz and Kraus provided a comprehensive axiomatization of SharedPlans in 1996, including extending it to groups of collaborators.

Most recently, Lochbaum developed an algorithm for discourse interpretation using SharedPlans and the tripartite model of discourse. This algorithm predicts how conversants follow the flow of a conversation based on their understanding of each other's intentions and beliefs.

References

Grosz, B. J., and Sidner, C. L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3):175–204.
Grosz, B. J., and Sidner, C. L. 1990. Plans for discourse. In Cohen, P. R.; Morgan, J. L.; and Pollack, M. E., eds., Intentions and Communication. Cambridge, MA: MIT Press. 417–444.
Grosz, B. J., and Kraus, S. 1996. Collaborative plans for complex group action. Artificial Intelligence 86(2):269–357.
Lochbaum, K. E. 1998. A collaborative planning model of intentional structure. Computational Linguistics 24(4).

Application Examples

This section briefly describes four interface agents built by us and our collaborators in four different application domains using Collagen. We have also built agents for air travel planning (Rich & Sidner 1998) and email (Gruen et al. 1999). All of these agents are currently research prototypes.

Figure 3 shows a screen image and sample interaction for each example agent. Instances of the question types in Figure 1 are underlined in the sample interactions.

Each screen image in Figure 3 consists of a large application-specific window, which both the user and agent can use to manipulate the application state, and two smaller windows, labelled "Agent" and "User," which are used for communication between the user and agent. Two of the example agents (3a and 3d) communicate using speech recognition and generation technology; the other two allow the user to construct utterances using hierarchical menus dynamically generated based on the current collaboration state.

The agent in Figure 3a helps a user set up and program a video cassette recorder (vcr). The image in the figure, which a real user would see on her television screen, includes the vcr itself so that the agent can point at parts of the vcr during explanations (see line 12). In the first part of the vcr agent transcript, the agent helps the user eliminate the annoying blinking "12:00" that is so common on vcr clocks. Later on, the agent walks the user through the task of connecting a camcorder to the vcr.

The agent in Figure 3b was developed in collaboration with the Industrial Electronics and Systems Laboratory of Mitsubishi Electric in Japan. The application program in this case is a sophisticated graphical interface development tool, called the Symbol Editor. Like many such tools, the Symbol Editor is difficult for novice users because there are too many possible things to do at any moment and the system itself gives no guidance regarding what to do next. Our agent guides a user through the process of achieving a typical task using the Symbol Editor, automatically performing many of the tedious subtasks along the way.

The agent in Figure 3c was developed in collaboration with the Information Sciences Institute of the University of Southern California (Rickel et al. 2001). This agent teaches a student user how to operate a gas turbine engine and generator configuration using a simple software simulation. The first time the agent teaches a new task or subtask, it walks the student through all the required steps. If a task has already been performed once, however, the agent tells the student to "take it from here" (line 3). If the student later asks for help (line 5), the agent will describe just the next step to be performed.

The gas turbine agent is part of a larger effort, which also involves the MITRE Corporation (Gertner et al. 2000), to incorporate application-independent tutorial strategies into Collagen. Teaching and assisting are best thought of as points on a spectrum of collaboration (Davies et al. 2001), rather than as separate capabilities.

Finally, Figure 3d shows an agent being developed at the Delft University of Technology to help people program a home thermostat (Keyson et al. 2000). The transcript here illustrates only a very simple interaction with the agent. This agent will eventually be able to help people analyze their behavior patterns and construct complicated heating and cooling schedules to conserve energy. This work is part of a larger research project at Delft to add intelligence to products.


1 User says "Hello."
2 Agent says "Hello, what's your name?"
3 User says "Candy."
4 Agent says "Hello, Candy!"
5 User says "Let's set the clock."
6 Agent says "What is the time?"
7 User says "The time is 11:00 AM."
8 Agent sets the clock to 11:00 AM.
...
9 User says "How do I play back a video from my camcorder?"
10 Agent says "First, plug in the camcorder cables."
11 User says "How do I plug in the camcorder cables?"
12 Agent points to where you plug in the camcorder cables.
13 Agent says "The red cable plugs into the red terminal, the black cable plugs into the black terminal, and the yellow cable plugs into the yellow terminal. Make sure the other ends of the cables plug into the corresponding colors on the camcorder."
14 User says "What next?"
15 Agent says "Now press Play on your camcorder. You should see a picture on the TV screen."
16 User says "Ok."
17 Agent says "I don't see a picture. Make sure the camcorder is plugged in or has fresh batteries, and is turned on."
18 User says "Ok."
19 Agent says "Good. It's working."

(a) VCR Agent.

1 Agent says "Please list the other (non-boolean) attributes used in the columns of the logic table."
2 User says "The attributes are: voltage."
3 Agent creates a new user-defined attribute.
4 Agent enters "voltage" as the name of the new attribute.
5 Agent says "Please select the data type of the new attribute."
6 User selects float as the data type of the new attribute.
7 User clicks ok to accept the definition of voltage.
8 Agent says "Please edit the initial setting for the voltage attribute (if necessary) and then click ok."
9 User clicks ok to accept 0.0 as the initial setting for voltage.
10 Agent shows the expression editor for column 1.
11 Agent enters (voltage > 0.0) as the title for column 1.
12 Agent says "Please define an expression for column 1."
...[user goes to lunch]...
14 User says "Where are we?"
15 Agent says "You are defining an expression for column 1, as part of making a logic table."
16 User enters (voltage >= 1000.0) as the title for column 1.
17 Agent shows the logical operator window.
18 Agent selects the >= operator.
19 Agent drags and drops >= into expression for column 1.
20 Agent selects the voltage attribute.
21 Agent drags and drops voltage into expression for column 1.
22 Agent types "1000.0" into expression for column 1.

(b) Symbol Editor Agent.

1 Agent says "Let's start the generator."
2 User says "Ok."
3 Agent says "You take it from here."
4 User presses the alarm reset button.
5 User says "What next?"
6 Agent says "Please press the gas turbine generator on button."
7 Agent points to the gas turbine generator on button.
8 User presses the gas turbine generator button.
9 Agent says "Good. We have successfully started the generator."

(c) Gas Turbine Agent.

1 Agent says "Let's set the temperature in a room."
2 User selects the bathroom.
3 User says "What next?"
4 Agent says "What should the temperature setting be?"
5 User says "18 degrees Centigrade."
6 Agent sets the temperature in the bedroom to 18 degrees.

(d) Thermostat Agent.

Figure 3. Example Agents Built with Collagen.


[Figure 4 diagram: the focus stack and plan tree following events 1–3. The plan tree decomposes RecordProgram into the steps DisplaySchedule, AddProgram, and ReportConflict; the focus stack holds RecordProgram with DisplaySchedule above it.]

Scheduling a program to be recorded.
  1 User says "I want to record a program."
  Done successfully displaying the recording schedule.
    2 Agent displays recording schedule.
    3 Agent says "Here is the recording schedule."
  Next expecting to add a program to the recording schedule.
  Expecting optionally to say there is a conflict.

Figure 4. Example Discourse State and Segmented Interaction History for VCR Agent.

Discourse State

Participants in a collaboration derive benefit by pooling their talents and resources to achieve common goals. However, collaboration also has its costs. When people collaborate, they must usually communicate and expend mental effort to ensure that their actions are coordinated. In particular, each participant must maintain some sort of mental model of the status of the collaborative tasks and the conversation about them—we call this model the discourse state.

Among other things, the discourse state tracks the beliefs and intentions of all the participants in a collaboration and provides a focus of attention mechanism for tracking shifts in the task and conversational context. All of this information is used by an individual to help understand how the actions and utterances of the other participants contribute to the common goals.

In order to turn a computer agent into a collaborator, we needed a formal representation of discourse state and an algorithm for updating it. The discourse state representation currently used in Collagen, illustrated in Figure 4, is a partial implementation of Grosz & Sidner's theory of collaborative discourse (see theory sidebar); the update algorithm is described in the next section.

Collagen's discourse state consists of a stack of goals, called the focus stack (which will soon become a stack of focus spaces to better correspond with the theory), and a plan tree for each goal on the stack. The top goal on the focus stack is the "current purpose" of the discourse. A plan tree in Collagen is an (incomplete) encoding of a partial SharedPlan between the user and the agent. For example, Figure 4 shows the focus stack and plan tree immediately following the discourse events numbered 1–3 on the right side of the figure.
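To make this data structure concrete, here is a minimal sketch in Java. This is illustrative only, not Collagen's actual API; all class and method names are invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a discourse state: a focus stack of goals,
// where each goal may carry a plan tree decomposing it into steps.
public class DiscourseState {
    static class Goal {
        final String name;
        final List<Goal> steps = new ArrayList<>(); // plan tree children
        Goal(String name) { this.name = name; }
    }

    final Deque<Goal> focusStack = new ArrayDeque<>();

    void push(Goal g) { focusStack.push(g); }       // start a new segment
    Goal pop() { return focusStack.pop(); }         // current purpose completed
    Goal currentPurpose() { return focusStack.peek(); }

    public static void main(String[] args) {
        DiscourseState ds = new DiscourseState();
        Goal record = new Goal("RecordProgram");
        record.steps.add(new Goal("DisplaySchedule"));
        record.steps.add(new Goal("AddProgram"));
        record.steps.add(new Goal("ReportConflict"));
        ds.push(record);
        ds.push(record.steps.get(0)); // focus shifts to displaying the schedule
        System.out.println(ds.currentPurpose().name); // DisplaySchedule
        ds.pop();                                     // schedule displayed
        System.out.println(ds.currentPurpose().name); // RecordProgram
    }
}
```

The main method mirrors the situation in Figure 4: RecordProgram is pushed as the current purpose, and a subsidiary goal is pushed (and later popped) as its segment opens and closes.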

Segmented Interaction History

The annotated, indented execution trace on the right side of Figure 4, called a segmented interaction history, is a compact, textual representation of the past, present and future states of the discourse. We originally developed this representation to help us debug agents and Collagen itself, but we have also experimented with using it to help users visualize what is going on in a collaboration (see discussion of "history-based transformations" in (Rich & Sidner 1998)).

The numbered lines in a segmented interaction history are simply a log of the agent's and user's utterances and primitive actions. The italic lines and indentation reflect Collagen's interpretation of these events. Specifically, each level of indentation defines a segment (see theory sidebar) whose purpose is specified by the italicized line that precedes it. For example, the purpose of the toplevel segment in Figure 4 is scheduling a program to be recorded.

Unachieved purposes that are currently on the focus stack are annotated using the present tense, such as scheduling, whereas completed purposes use the past tense, such as done. (Note in Figure 4 that a goal is not popped off the stack as soon as it is completed, because it may continue to be the topic of conversation, for example, to discuss whether it was successful.)

Finally, the italic lines at the end of each segment, which include the keyword expecting, indicate the steps in the current plan for the segment's purpose which have not yet been executed. The steps which are "live" with respect to the plan's ordering constraints and preconditions have the added keyword next.
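The indentation convention described above can be sketched as a simple recursive renderer. This is a toy illustration with invented names, not Collagen's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a segment prints its purpose line, then its events
// and nested subsegments indented one level deeper.
public class SegmentedHistory {
    static class Segment {
        final String purpose; // e.g. "Scheduling a program to be recorded."
        final List<Object> body = new ArrayList<>(); // String events or Segments
        Segment(String purpose) { this.purpose = purpose; }
    }

    static String render(Segment s, int depth) {
        StringBuilder out = new StringBuilder();
        String indent = "  ".repeat(depth);
        out.append(indent).append(s.purpose).append('\n');
        for (Object item : s.body) {
            if (item instanceof Segment)
                out.append(render((Segment) item, depth + 1));
            else
                out.append(indent).append("  ").append(item).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Segment top = new Segment("Scheduling a program to be recorded.");
        top.body.add("1 User says \"I want to record a program.\"");
        Segment sub = new Segment("Done successfully displaying the recording schedule.");
        sub.body.add("2 Agent displays recording schedule.");
        top.body.add(sub);
        System.out.print(render(top, 0));
    }
}
```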

Discourse Interpretation

Collagen updates its discourse state after every utterance or primitive action by the user or agent using Lochbaum's discourse interpretation algorithm (see theory sidebar) with extensions to include plan recognition (see next section) and unexpected focus shifts (Lesh et al. 2001).

According to Lochbaum, each discourse event is explained as either: (i) starting a new segment whose purpose contributes to the current purpose (and thus pushing a new purpose on the focus stack), (ii) continuing the current segment by contributing to the current purpose, or (iii) completing the current purpose (and thus eventually popping the focus stack).

An utterance or action contributes to a purpose if it either: (i) directly achieves the purpose, (ii) is a step in a recipe for achieving the purpose, (iii) identifies the recipe to be used to achieve the purpose, (iv) identifies who should perform the purpose or a step in the recipe, or (v) identifies a parameter of the purpose or a step in the recipe. These last three conditions are what Lochbaum calls "knowledge preconditions."
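The case analysis above can be sketched as follows. The predicates here are toy stand-ins for illustration only (the event encoding and all names are invented), not Collagen's actual reasoning.

```java
// Hypothetical sketch of Lochbaum-style interpretation: each event is
// explained as starting, continuing, or completing the current segment.
public class Interpreter {
    enum Explanation { START_NEW_SEGMENT, CONTINUE_SEGMENT, COMPLETE_PURPOSE, UNEXPLAINED }

    // The five ways an event can contribute to a purpose (cases i-v in the text).
    static boolean contributes(String event, String purpose) {
        return directlyAchieves(event, purpose)
            || isRecipeStep(event, purpose)
            || identifiesRecipe(event, purpose)     // knowledge precondition
            || identifiesActor(event, purpose)      // knowledge precondition
            || identifiesParameter(event, purpose); // knowledge precondition
    }

    static Explanation explain(String event, String currentPurpose) {
        if (directlyAchieves(event, currentPurpose)) return Explanation.COMPLETE_PURPOSE;
        if (contributes(event, currentPurpose)) return Explanation.CONTINUE_SEGMENT;
        if (startsSubgoal(event)) return Explanation.START_NEW_SEGMENT;
        return Explanation.UNEXPLAINED;
    }

    // Toy stand-in predicates over a made-up "tag:purpose" event encoding.
    static boolean directlyAchieves(String e, String p) { return e.equals("do:" + p); }
    static boolean isRecipeStep(String e, String p) { return e.startsWith("step:" + p); }
    static boolean identifiesRecipe(String e, String p) { return e.startsWith("recipe:" + p); }
    static boolean identifiesActor(String e, String p) { return e.startsWith("who:" + p); }
    static boolean identifiesParameter(String e, String p) { return e.startsWith("param:" + p); }
    static boolean startsSubgoal(String e) { return e.startsWith("goal:"); }
}
```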

A recipe is a goal-decomposition method (part of a task model). Collagen's recipe definition language (see Figure 5) supports partially ordered steps, parameters, constraints, pre- and post-conditions, and alternative goal decompositions.

Our implementation of the discourse interpretation algorithm above requires utterances to be represented in Sidner's (1994) artificial discourse language. For our speech-based agents, we have used standard natural language processing techniques to compute this representation from the user's spoken input. Our menu-based systems construct utterances in the artificial discourse language directly.

public recipe RecordRecipe achieves RecordProgram {
   step DisplaySchedule display;
   step AddProgram add;
   optional step ReportConflict report;
   constraints {
      display precedes add;
      add precedes report;
      add.program == achieves.program;
      report.program == achieves.program;
      report.conflict == add.conflict;
   }
}

Figure 5. Example Recipe in VCR Task Model.

Definition of the recipe used in Figure 4 to decompose the non-primitive RecordProgram goal into primitive and non-primitive steps. Collagen task models are defined in an extension of the Java language which is automatically processed to create Java class definitions for recipes and act types.

Scheduling a program to be recorded.
  1 User says "I want to record a program."
  Done successfully displaying the recording schedule.
    2 Agent displays recording schedule.
    3 Agent says "Here is the recording schedule."
    4 User says "Ok."
  Done identifying the program to be recorded.
    5 Agent says "What is the program to be recorded?"
    6 User says "Record 'The X-Files'."
  Next expecting to add a program to the recording schedule.
  Expecting optionally to say there is a conflict.

Figure 6. Continuing the Interaction in Figure 4.

Discourse Generation

To illustrate how Collagen's discourse state is used to generate as well as interpret discourse behavior, we briefly describe below how the vcr agent produces the underlined utterance on line 5 in Figure 6, which continues the interaction in Figure 4.

The discourse generation algorithm in Collagen is essentially the inverse of discourse interpretation. Based on the current discourse state, it produces a prioritized list, called the agenda, of (partially or totally specified) utterances and actions which would contribute to the current discourse purpose according to cases (i) through (v) above. For example, for the discourse state in Figure 4, the first item on the agenda is an utterance asking for the identity of the program parameter of the AddProgram step of the plan for RecordProgram.

In general, an agent may use any application-specific logic it wants to decide on its next action or utterance. In most cases, however, an agent can simply execute the first item on the agenda generated by Collagen, which is what the vcr agent does in this example. This utterance starts a new segment, which is then completed by the user's answer on line 6.
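A highly simplified sketch of agenda construction follows. The Step representation, the notion of a "live" step, and all names are invented for illustration; this is not Collagen's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: for each live step of the current plan, propose either
// asking for a still-unknown parameter (a knowledge precondition) or
// performing the step itself.
public class AgendaGenerator {
    static class Step {
        final String name; final boolean live; final String unboundParameter;
        Step(String name, boolean live, String unboundParameter) {
            this.name = name; this.live = live; this.unboundParameter = unboundParameter;
        }
    }

    static List<String> agenda(List<Step> planSteps) {
        List<String> items = new ArrayList<>();
        for (Step s : planSteps) {
            if (!s.live) continue;
            if (s.unboundParameter != null)
                items.add("ask: What is the " + s.unboundParameter + " of " + s.name + "?");
            else
                items.add("perform: " + s.name);
        }
        return items;
    }

    public static void main(String[] args) {
        List<Step> plan = List.of(
            new Step("DisplaySchedule", false, null),  // already done
            new Step("AddProgram", true, "program"),   // program still unknown
            new Step("ReportConflict", false, null));  // not yet live
        System.out.println(agenda(plan).get(0));
        // ask: What is the program of AddProgram?
    }
}
```

The main method mirrors the Figure 4 situation: the first agenda item asks for the program parameter of AddProgram, which is the utterance the vcr agent produces on line 5 of Figure 6.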

Plan Recognition

Plan recognition (Kautz & Allen 1986) is the process of inferring intentions from actions. Plan recognition has often been proposed for improving user interfaces or to facilitate intelligent help features. Typically, the computer watches "over the shoulder" of the user and jumps in with advice or assistance when it thinks it has enough information.

In contrast, our main motivation for adding plan recognition to Collagen was to reduce the amount of communication required to maintain a mutual understanding between the user and the agent of their shared plans in a collaborative setting (Lesh et al. 1999). Without plan recognition, Collagen's discourse interpretation algorithm onerously required the user to announce each goal before performing a primitive action which contributed to it.

[Figure 7 diagram: the plan recognizer takes as input the events k and l, a plan whose root A has steps B and C, the current focus B, and a recipe library (A → B, C; B → G, h; C → k, D; G → k, l; A → D, d, f; B → k, J; D → l, n; J → l, m). It outputs two extensions of the input plan which "minimally explain" events k and l by applying recipes below the focus B: one expanding B into G and h, with G expanding into k and l, and one expanding B into k and J, with J expanding into l and m.]

Figure 7. Example Plan Recognizer Inputs and Outputs in a Collaborative Setting.

Although plan recognition is a well-known feature of human collaboration, it has proven difficult to incorporate into practical computer systems due to its inherent intractability in the general case. We exploit three properties of the collaborative setting in order to make our use of plan recognition tractable. The first property is the focus of attention, which limits the search required for possible plans.

The second property of collaboration we exploit is the interleaving of developing, communicating about and executing plans, which means that our plan recognizer typically operates only on partially elaborated hierarchical plans. Unlike the "classical" definition of plan recognition, which requires reasoning over complete and correct plans, our recognizer is only required to incrementally extend a given plan.

Third, it is quite natural in the context of a collaboration to ask for clarification, either because of inherent ambiguity, or simply because the computation required to understand an action is beyond a participant's abilities. We use clarification to ensure that the number of actions the plan recognizer must interpret will always be small.

Figure 7 illustrates roughly how plan recognition works in Collagen. Suppose the user performs action k. Given the root plan (e.g., A) for the current discourse purpose (e.g., B) and a set of recipes, the plan recognizer determines the set of minimal extensions to the plan which are consistent with the recipes and include the user performing k. If there is exactly one such extension, the extended plan becomes part of the new discourse state. If there is more than one possible extension, action k is held and reinterpreted along with the next event, which may disambiguate the interpretation (which l does not), and so on. The next event may in fact be a clarification.

Our algorithm also computes essentially the same recognition if the user does not actually perform an action, but only proposes it, as in, "Let's achieve G." Another important, but subtle, point is that Collagen applies plan recognition to both user and agent utterances and actions in order to correctly maintain a model of what is mutually believed.
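The minimal-extension search can be sketched as follows, using the recipe library from Figure 7. This toy recognizer (invented names, depth-limited one-level-at-a-time expansion) only illustrates the search below the focus; it is not Collagen's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collaborative plan recognition: enumerate the recipe
// expansions of the focus goal that could explain an observed primitive event.
public class PlanRecognizer {
    // Recipe library (from Figure 7): goal -> alternative decompositions.
    // Lower-case steps are primitive; capitalized steps are subgoals.
    static final Map<String, List<List<String>>> RECIPES = Map.of(
        "B", List.of(List.of("G", "h"), List.of("k", "J")),
        "G", List.of(List.of("k", "l")),
        "J", List.of(List.of("l", "m")));

    // Return the decompositions of the focus goal that could explain the event,
    // expanding subgoals recursively (depth-limited to stay tractable).
    static List<List<String>> explain(String focus, String event, int depth) {
        List<List<String>> results = new ArrayList<>();
        if (depth < 0 || !RECIPES.containsKey(focus)) return results;
        for (List<String> recipe : RECIPES.get(focus)) {
            if (recipe.contains(event)) {
                results.add(recipe);          // event is directly a step
            } else {
                for (String step : recipe)    // event may occur under a subgoal
                    if (!explain(step, event, depth - 1).isEmpty()) {
                        results.add(recipe);
                        break;
                    }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Event k is ambiguous below focus B: via B -> G, h (then G -> k, l)
        // or via B -> k, J. Two minimal explanations, as in Figure 7.
        System.out.println(explain("B", "k", 2).size()); // 2
    }
}
```

Because event k has two explanations, this recognizer would hold k and wait for the next event, as described above.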

System Architecture

Figure 8 summarizes the technical portion of this article by showing how all the pieces described earlier fit together in the architecture of a collaborative system built with Collagen. This figure is essentially an expansion of Figure 2, showing how Collagen mediates the interaction between the user and the agent. Collagen is implemented using Java Beans™, which makes it easy to modify and extend this architecture.

The best way to understand the basic execution cycle in Figure 8 is to start with the arrival of an utterance or an observed action (from either the user or the agent) at the discourse interpretation module at the top center of the diagram. The discourse interpretation algorithm (including plan recognition) updates the discourse state as described above, which then causes a new agenda to be computed by the discourse generation module. In the simplest case, the agent responds by selecting and executing an entry in the new agenda (which may be either an utterance or an action), which provides new input to discourse interpretation.

[Figure 8 diagram: inside Collagen, the discourse interpretation module updates the discourse state, from which the discourse generation module computes the agenda, drawing on the recipe library; utterances, actions, and observations flow among the user, the agent, and the application, and a subset of the agenda is presented to the user as a menu.]

Figure 8. Architecture of a Collaborative System Built with Collagen.

In a system without natural language understanding, a subset of the agenda is also presented to the user in the form of a menu of customizable utterances. In effect, this is a way of using expectations generated by the collaborative context to replace natural language understanding. Because this is a mixed-initiative architecture, the user can, at any time, produce an utterance (e.g., by selecting from this menu) or perform an application action (e.g., by clicking on an icon), which provides new input to discourse interpretation.
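Deriving such a menu from the agenda might look like the following sketch, assuming a hypothetical AgendaEntry type that marks which entries are utterances (the real agenda representation is richer):

```java
import java.util.*;

// Hypothetical agenda entry: its text, and whether it is an utterance
// (as opposed to an application action).
record AgendaEntry(String text, boolean isUtterance) {}

class UtteranceMenu {
    // The user-facing menu is the subset of agenda entries that are
    // utterances; actions are performed directly in the application GUI.
    static List<String> build(List<AgendaEntry> agenda) {
        List<String> menu = new ArrayList<>();
        for (AgendaEntry e : agenda)
            if (e.isUtterance()) menu.add(e.text());
        return menu;
    }
}
```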

In the simple story above, the only application-specific components an agent developer needs to provide are the recipe library and an API through which application actions can be performed and observed (for an application-independent approach to this API, see (Cheikes et al. 1999)). Given these components, Collagen is a turnkey technology: default implementations are provided for all the other needed components and graphical interfaces, including a default agent which always selects the first item on the agenda.
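For illustration only, a recipe-library entry might be modeled as a goal plus an ordered list of step names; Collagen's actual recipe language is considerably richer than this sketch.

```java
import java.util.*;

// Highly simplified model of a recipe library: each recipe decomposes
// a goal into an ordered list of steps.
record Recipe(String goal, List<String> steps) {}

class RecipeLibrary {
    private final Map<String, Recipe> byGoal = new HashMap<>();

    void add(Recipe r) { byGoal.put(r.goal(), r); }

    // Returns the recipe for a goal, or null if the library has none.
    Recipe lookup(String goal) { return byGoal.get(goal); }
}
```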

In each of the four example applications in Figure 3, however, a small amount (e.g., several pages) of additional application-specific code was required in order to achieve the desired agent behavior. As the arrows incoming to the agent in Figure 8 indicate, this application-specific agent code typically queries the application and discourse states and (less often) the recipe library. An agent developer is free, of course, to employ arbitrarily complex application-specific and generic techniques, such as theorem proving, first-principles planning, etc., to determine the agent's response to a given situation.

Related Work

This work lies at the intersection of many threads of related research in artificial intelligence, computational linguistics, and user interfaces. We believe it is unique, however, in its combination of theoretical elements and implemented technology. Other theoretical models of collaboration (Levesque et al. 1990) do not integrate the intentional, attentional, and linguistic aspects of collaborative discourse, as SharedPlan theory does. On the other hand, our incomplete implementation of SharedPlan theory in Collagen does not deal with the many significant issues in a collaborative system with more than two participants (Tambe 1997).

There has been much related work on implementing collaborative dialogues in



the context of specific applications, based either on discourse planning techniques (Chu-Carroll & Carberry 1995; Ahn et al. 1995; Allen et al. 1996; Stein et al. 1999) or rational agency with principles of cooperation (Sadek & De Mori 1997). None of these research efforts, however, has produced software that is reusable to the same degree as Collagen. In terms of reusability across domains, a notable exception is the Verbmobil project (Verbmobil 2000), which concentrates on linguistic issues in discourse processing, without an explicit model of collaboration.

Finally, a wide range of interface agents (Maes 1994) continues to be developed, which have some linguistic and collaborative capabilities, without any general underlying theoretical foundation.

Future Work

We are currently extending our plan recognition algorithm to automatically detect certain classes of so-called "near miss" errors, such as performing an action out of order, or performing the right action with the wrong parameter. The basic idea is that, if there is no explanation for an action given the "correct" recipe library, the plan recognizer should incrementally relax the constraints on recipes before giving up and declaring the action to be unrelated to the current activity. This functionality is particularly useful in tutoring applications of Collagen.
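The constraint-relaxation idea can be sketched as follows; the step representation and classification categories here are illustrative assumptions, not the algorithm under development.

```java
import java.util.*;

// Sketch of near-miss detection. A recipe step has a name, an expected
// parameter, and a position in the recipe. An observed action either
// matches a step fully, matches out of order, matches the expected step
// with the wrong parameter, or is unrelated to the recipe.
class NearMiss {
    enum Kind { CORRECT, OUT_OF_ORDER, WRONG_PARAMETER, UNRELATED }
    record Step(String name, String param, int position) {}

    static Kind classify(String actName, String actParam,
                         int expectedPos, List<Step> recipe) {
        for (Step s : recipe) {
            if (!s.name().equals(actName)) continue;
            if (s.param().equals(actParam))                    // right action, right parameter...
                return s.position() == expectedPos
                     ? Kind.CORRECT : Kind.OUT_OF_ORDER;       // ...possibly at the wrong time
            if (s.position() == expectedPos)
                return Kind.WRONG_PARAMETER;                   // right step, wrong parameter
        }
        return Kind.UNRELATED;                                 // all relaxations failed
    }
}
```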

We have, in fact, recently become increasingly interested in tutoring and training applications (Gertner et al. 2000; Rickel et al. 2001), which has revealed some implicit biases in how Collagen currently operates. For example, Collagen's default agent will always, if possible, itself perform the next action that contributes to the current goal. This is not, however, always appropriate in a tutoring situation, where the real goal is not to get the task done, but for the student to learn how to do the task. We are therefore exploring various generalizations and extensions to Collagen to better support the full spectrum of collaboration (Davies et al. 2001), such as creating explicit tutorial goals and recipes, and recipes that encode "worked examples."

We are also working on two substantial extensions to the theory underlying Collagen. First, we are adding an element to the attentional component (see theory sidebar) to track which participant is currently in control of the conversation. The basic idea is that, when a segment is completed, the default control (initiative) goes to the participant who initiated the enclosing segment.
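This default-initiative rule can be sketched with a stack of segment initiators; the representation is an assumption for illustration, not the actual attentional component.

```java
import java.util.*;

// Sketch of the proposed initiative rule: when a discourse segment is
// completed (popped), default control passes to whoever initiated the
// enclosing segment.
class InitiativeTracker {
    private final Deque<String> segmentInitiators = new ArrayDeque<>();

    void beginSegment(String initiator) { segmentInitiators.push(initiator); }

    // Completing a segment returns the participant who holds control next
    // (falling back to the user when no segment remains open).
    String endSegment() {
        segmentInitiators.pop();
        return segmentInitiators.isEmpty() ? "user" : segmentInitiators.peek();
    }
}
```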

Second, we are beginning to codify the negotiation strategies used in collaborative discourse. These strategies are different from the negotiation strategies used in disputations (Kraus et al. 1998). For example, when the user rejects an agent's proposal (or vice versa), the agent and user should be able to enter into a sub-dialogue in which their respective reasons for and against the proposal are discussed.

References

Ahn, R., et al. 1995. The DenK-architecture: A fundamental approach to user-interfaces. Artificial Intelligence Review 8:431–445.

Allen, J., et al. 1996. A robust system for natural spoken dialogue. In Proc. 34th Annual Meeting of the ACL, 62–70.

Cheikes, B., et al. 1999. Embedded training for complex information systems. Int. J. of Artificial Intelligence in Education 10:314–334.

Chu-Carroll, J., and Carberry, S. 1995. Response generation in collaborative negotiation. In Proc. 33rd Annual Meeting of the ACL, 136–143.

Davies, J., et al. 2001. Incorporating tutorial strategies into an intelligent assistant. In Proc. Int. Conf. on Intelligent User Interfaces.

Gertner, A.; Cheikes, B.; and Haverty, L. 2000. Dialogue management for embedded training. In AAAI Symposium on Building Dialogue Systems for Tutorial Applications, Tech. Report FS-00-01. Menlo Park, CA: AAAI Press.

Gruen, D.; Sidner, C.; Boettner, C.; and Rich, C. 1999. A collaborative assistant for email. In Proc. ACM SIGCHI Conference on Human Factors in Computing Systems.

Kautz, H. A., and Allen, J. F. 1986. Generalized plan recognition. In Proc. 5th National Conf. on Artificial Intelligence, 32–37.

Keyson, D., et al. 2000. The intelligent thermostat: A mixed-initiative user interface. In Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, 59–60.

Kraus, S.; Sycara, K.; and Evanchik, A. 1998. Argumentation in negotiation: A formal model and implementation. Artificial Intelligence 104(1-2):1–69.

Lesh, N.; Rich, C.; and Sidner, C. 1999. Using plan recognition in human-computer collaboration. In Proc. 7th Int. Conf. on User Modeling, 23–32.

Lesh, N.; Rich, C.; and Sidner, C. 2001. Collaborating with focused and unfocused users under imperfect communication. In Proc. 9th Int. Conf. on User Modeling. Submitted.

Levesque, H. J.; Cohen, P. R.; and Nunes, J. H. T. 1990. On acting together. In Proc. 8th National Conf. on Artificial Intelligence, 94–99.



Lieberman, H. 1998. Integrating user interface agents with conventional applications. In Proc. Int. Conf. on Intelligent User Interfaces, 39–46.

Maes, P. 1994. Agents that reduce work and information overload. Comm. ACM 37(7):30–40. Special Issue on Intelligent Agents.

Ortiz, C. L., and Grosz, B. J. 2000. Interpreting information requests in context: A collaborative web interface for distance learning. Autonomous Agents and Multi-Agent Systems. To appear.

Rich, C., and Sidner, C. 1998. COLLAGEN: A collaboration manager for software interface agents. User Modeling and User-Adapted Interaction 8(3/4):315–350.

Rickel, J.; Lesh, N.; Rich, C.; Sidner, C.; and Gertner, A. 2001. Using a model of collaborative dialogue to teach procedural tasks. In Proc. 10th Int. Conf. on Artificial Intelligence in Education. Submitted.

Sadek, D., and De Mori, R. 1997. Dialogue systems. In De Mori, R., ed., Spoken Dialogues with Computers. Academic Press.

Sidner, C. L. 1994. An artificial discourse language for collaborative negotiation. In Proc. 12th National Conf. on Artificial Intelligence, 814–819.

Stein, A.; Gulla, J. A.; and Thiel, U. 1999. User-tailored planning of mixed initiative information seeking dialogues. User Modeling and User-Adapted Interaction 9(1-2):133–166.

Tambe, M. 1997. Towards flexible teamwork. J. of Artificial Intelligence Research 7:83–124.

Verbmobil. 2000. http://verbmobil.dfki.de.

Charles Rich is a Senior Research Scientist at Mitsubishi Electric Research Laboratories in Cambridge, Massachusetts. He received his Ph.D. from MIT in 1980. The thread connecting all of Rich's research has been to make interacting with a computer more like interacting with a person. As a founder and director of the Programmer's Apprentice project at the MIT Artificial Intelligence Laboratory in the 1980s, he pioneered research on intelligent assistants for software engineers. Rich joined MERL in 1991 as a founding member of the research staff. He is a Fellow and past Councillor of AAAI and was Co-Chair of AAAI'98. His email address is [email protected].

Candace L. Sidner is a Senior Research Scientist at Mitsubishi Electric Research Laboratories in Cambridge, Massachusetts. She received her Ph.D. from MIT in 1979. Sidner has researched many aspects of user interfaces, especially those involving speech and natural language understanding and human-computer collaboration. Before coming to MERL, she was a member of the research staff at Bolt Beranek Newman, Inc., Digital Equipment Corp., and Lotus Development Corp., and a visiting scientist at Harvard University. She is a Fellow and past

Councillor of AAAI, past President of the Association for Computational Linguistics, and Co-Chair of the 2001 International Conference on Intelligent User Interfaces. Her email address is [email protected].

Neal Lesh is a Research Scientist at Mitsubishi Electric Research Laboratories in Cambridge, Massachusetts. He received his Ph.D. from the U. of Washington in 1998. His research interests of late lie primarily in human-computer collaborative problem solving. He has also worked in the areas of interface agents, interactive optimization, goal recognition, data mining, and simulation-based inference. After completing his thesis on scalable and adaptive goal recognition at U. Washington with Oren Etzioni, Lesh was a postdoc at U. Rochester with James Allen. His email address is [email protected].
