Building Agents to Serve Customers

Mihai Barbuceanu, Mark S. Fox, Lei Hong, Yannick Lallement, and Zhongdong Zhang

Copyright © 2004, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2004 / $2.00

■ AI agents combining natural language interaction, task planning, and business ontologies can help companies provide better-quality and more cost-effective customer service. Our customer-service agents use natural language to interact with customers, enabling customers to state their intentions directly instead of searching for the places on the Web site that may address their concern. We use planning methods to search systematically for the solution to the customer's problem, ensuring that a resolution satisfactory for both the customer and the company is found, if one exists. Our agents converse with customers, guaranteeing that needed information is acquired from customers and that relevant information is provided to them in order for both parties to make the right decision. The net effect is a more frictionless interaction process that improves the customer experience and makes businesses more competitive on the service front.

As companies optimize their production and supply-chain processes, more people use the quality of customer service to differentiate between alternative vendors or service providers. Customer service is currently a manual process supported by costly call-center infrastructures. Its lack of flexibility in adapting to fluctuations in demand and product change, together with the staffing and training difficulties caused by massive personnel turnovers, often results in long telephone queues and frustrated customers. This is a major cause for concern, as it generally costs five times more to acquire a new customer than to keep an existing one.

How can AI help in addressing this problem? For several years we have built a domain-independent AI platform for creating conversational customer-service agents that use a variety of natural language understanding and reasoning methods to interact with customers and resolve their problems. We have applied this platform to customer-service applications such as technical diagnosis of wireless-service delivery problems, product recommendation, order management, quality complaint management, and sales recovery, among others. The resulting solutions and the lessons learned in the process are the subject of this article.

Understanding, Interaction, and Resolution

Compared to the "newspaper page" model of Web sites, in which people have to navigate the site, find the information they need, and make the ensuing decisions on their own, natural language interaction combined with AI-based reasoning can change the interactive experience profoundly (Allen et al. 2001). The high bandwidth of natural language allows users to state their intentions directly, instead of searching for a place in the site that seems to address their problem. Having a reasoning system that plans (in an AI sense) for achieving the user goals increases the certainty that a solution that can satisfy both the user and the company will be found. The ability to converse guarantees that the relevant information is acquired from the user and provided to the user if and when needed in order for both parties to make the right decisions.

Ideally, a conversational customer-interaction agent should be able to understand language, converse and resolve problems with high accuracy within its main area of competence, and degrade gracefully as we depart from this area. Human users react with more displeasure when the agent exhibits complete failure to understand than when it shows partial understanding or even an effort to understand.


They also assume that an agent should know about issues that are commonsensically related to its main competence area. Therefore, graceful degradation must be provided, supported by a critical minimum of commonsense knowledge around the main competence area.

The Internet currently reaches a variety of touch points through which the agent must be available: Web browsers on high-resolution desktops and small-screen devices such as cellular telephones and PDAs, e-mail, and the public telephony network accessible by telephone. Consequently, a user must be able to achieve the same goals with the same results through any of these channels. State changes performed on one channel (for example, changing an order) must be reflected on the other channels. As some interactions are easier on some channels than on others, the agent is responsible for planning and carrying out conversations such that it uses the advantages and avoids the pitfalls of each channel in part.

To create customer agents of this type, a large variety of types of knowledge need to be represented and applied, including linguistic knowledge, conversation-planning knowledge, and industry- and company-specific business knowledge. Last but not least, an engineering challenge is to represent all these types of knowledge uniformly in a complete ontology that is supported by visual editing tools and that facilitates componentization and reuse.

Overall Architecture

How do we achieve these objectives? An interactive customer agent built on our platform has the architecture shown in figure 1. The example illustrated is about games that are subscribed to online and downloaded through the air for use on Java-enabled cellular telephones.

The user input (from whatever channel) is first converted into an internal message format. From this, the Natural Language Understanding module produces a set of semantic frames that represent the user's intention and contain relevant information extracted from the input. These frames are passed to the Conversation Management module. This module identifies whether the issue can be answered immediately or whether a longer conversation needs to be initiated. In the latter case, a workspace is created for the Interaction Planner, consisting of a goal stack and a database. The goal stack is initiated with the starting goal. The planner uses a hierarchical task-planning method to decompose goals into subgoals and perform a backtracking search to satisfy them. If the user complains about not being able to use a specific game, the planner creates subgoals for applying a diagnosis method to determine the cause.

Normally the investigation would start with obtaining user account information from the back end and determining the conditions under which the incident occurred. The Business Reasoning module performs domain inferences, such as determining that the user's device must be J2ME enabled, and applies business policies, such as verifying content-rating restrictions on user accounts (in the process, accessing account information from the back end). In figure 1, the incident location was not provided in the input, and thus has to be asked for. The Ask-incident-location method used for the Establish-incident-location goal determines the incident location by requesting the Natural Language Generation module to create the question for the user. At this point, the user can answer the question or diverge and create another conversation topic by asking a different question. Dialogue management policies in the agent allow such diversions to be handled in various ways.

Understanding Natural Language

The need for both breadth and depth of interaction has led us to integrate two natural language-understanding methods. The first, conferring high accuracy in limited domain segments, is analytic. It relies on a staged approach involving tokenizing, limited syntax analysis, semantic analysis, merging, and logical interpretation. The second is a similarity-based method that computes the degree of similarity between a question and a set of possible answers, returning the answers that have the highest similarity. This is used to provide general answers from an arbitrary set of preexisting documents. The two methods can be used in combination based on a unique domain ontology.

Domain Ontology

An ontology is a conceptualization of an application domain shared by a community of interest, in our case including at least the vendors and the customers of the products the agent is servicing.

We use description logics (Borgida et al. 1989) to represent ontologies. Description logics represent knowledge by means of structured concepts organized in lattices. A lattice is an acyclic directed graph with a top and a bottom node. See figure 2 for an example. An edge in the graph represents a subsumption relation between two concepts.


A concept a subsumes a concept b, subsumes(a, b), if and only if any instance of b is also an instance of a. Description logics provide automated concept classification (see figure 3, "Classification in Description Logics"). Figure 2 shows a very small, classified product lattice. It also illustrates the ability of the formalism to handle disjoint subconcepts. These are subconcepts that do not have common instances, such as those marked with an "x" in the figure (Vegetable and Meat). Classification is valuable for ontology construction, as it enforces and clarifies the semantics of concepts by making explicit the logical consequences of their definitions.
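As a rough illustration of what classification does (and only an illustration: the platform relies on a description logic system, not on code like this), the following Python sketch places a new concept into a lattice by computing its most specific subsumers and most general subsumees. Concepts are modeled as attribute sets, and the Seafood concept is invented for the example.

# Minimal illustration of lattice classification: place a new concept under its
# most specific subsumers (MSS) and above its most general subsumees (MGS).
# Concepts are modeled as attribute sets; c1 subsumes c2 if c1's attributes
# are a subset of c2's. This is a simplification of real description logics.

def subsumes(c1, c2, defs):
    return defs[c1] <= defs[c2]

def classify(new, defs):
    others = [c for c in defs if c != new]
    subsumers = [c for c in others if subsumes(c, new, defs)]
    subsumees = [c for c in others if subsumes(new, c, defs)]
    # MSS: subsumers that have no other subsumer of `new` below them.
    mss = [a for a in subsumers
           if not any(b != a and subsumes(a, b, defs) for b in subsumers)]
    # MGS: subsumees that have no other subsumee of `new` above them.
    mgs = [a for a in subsumees
           if not any(b != a and subsumes(b, a, defs) for b in subsumees)]
    return mss, mgs

defs = {
    "Top": set(),
    "Product": {"product"},
    "Vegetable": {"product", "vegetable"},
    "Meat": {"product", "meat"},
    "Fish": {"product", "meat", "fish"},
}
defs["Seafood"] = {"product", "meat", "fish", "from-sea"}  # hypothetical new concept
print(classify("Seafood", defs))  # MSS: ['Fish'], MGS: []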

Knowledge-Based Language Understanding

We use linguistic and domain knowledge to understand natural language expressions by applying a sequence of steps to the linguistic input: tokenization, syntax analysis, semantic analysis, merging, and logical interpretation. Here's what each step does.

Tokenization. Words are first tagged with part-of-speech information. Main tags indicate nouns, verbs, adjectives, adverbs, prepositions, and determiners, as well as the tense and the singular or plural form. Words not found in the dictionary are spelling-checked and corrected. Remaining words are tagged as general alpha-digit or digit strings. Alpha-digit and digit strings are then recognized as URLs, e-mails, or telephone numbers. Token sequences are recognized as dates, addresses, person and institution names, and a few others. Domain-specific token grammars defined as regular expressions are applied if defined.
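The domain-specific token grammars mentioned above can be pictured as a small set of typed regular expressions. The following Python sketch is only an approximation of that idea; the patterns and tag names are examples, not the grammars used by the platform.

import re

# Illustrative token grammar in the spirit of the tokenization step described
# above: regular expressions that promote raw alpha-digit strings into typed
# tokens. The patterns are examples only.
TOKEN_GRAMMAR = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")),
    ("URL", re.compile(r"\bhttps?://\S+|\bwww\.\S+\b")),
    ("PHONE", re.compile(r"\b1-\d{3}-\d{3}-\d{4}\b")),
    ("DATE", re.compile(
        r"\b\d{1,2}\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w*\s+\d{2,4}\b",
        re.I)),
]

def tag_tokens(text):
    """Return (tag, surface form) pairs for every typed token found."""
    found = []
    for tag, pattern in TOKEN_GRAMMAR:
        for match in pattern.finditer(text):
            found.append((tag, match.group(0)))
    return found

print(tag_tokens("Call 1-800-555-1212 or visit www.example.com before 26 Nov 2002."))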


[Figure 1 shows the overall agent architecture: user input arrives over the Web, e-mail, or phone channels through a Channel Interface and Adapter and flows through the Natural Language Understanding, Conversation Management, Interaction Planner, Business Reasoning, and Natural Language Generation modules. In the illustrated exchange, the user says "I bought a bunch of games for my kid, but yesterday he couldn't use any," which is recognized as a Usage-complaint (application: game; date: 17 Nov 02); account information is retrieved (account 123234), the device is inferred to be J2ME-enabled, and the goal Establish-incident-location is pursued with the method Ask-incident-location, producing the agent question "Where were you when you experienced these problems?"]

Figure 1. Overall Agent Architecture.


Syntax Analysis. Syntax analysis is limited to the recognition of structures that can be reliably recognized. Based on the Fastus (Appelt et al. 1993) approach, we recognize only noun groups and verb groups. The noun group consists of the head noun of a noun phrase together with its determiners and modifiers, for example, "the new game" or "two pounds of brown-spotted Bonita bananas." The verb group consists of a verb together with its auxiliaries and adverbs, for example, "were delivered late yesterday evening" in "two pounds of brown-spotted bananas were delivered late yesterday evening." Verb groups are tagged as active, passive, infinitive, gerund, and negated. Noun and verb groups are scored in a manner that increases with the percentage of the input they account for. This biases the system toward preferring longer subsuming phrases to shorter ones. However, no phrase is discarded.

Semantic Analysis. The next step is to recognize instances of ontology concepts. At the topmost level (immediately under the top node), the ontology is divided into Objects, Events, Attributes, Roles, and Values. Objects have single or multiply valued attributes, while events have single or multiply valued roles. Instances of these concepts are recognized by semantic patterns. Semantic patterns rely on the presence of denoting words as well as on verifying constraints among entities. The denoting words are sets of words or expressions that, if encountered in the linguistic input, imply the possible presence of that concept. The denoting words and expressions are inherited by subconcepts. An object or event can be implied directly, by finding any of its denoting words, or by finding denoting words that imply any of its subconcepts in the classified ontology.

Figure 4 illustrates a semantic pattern for the Produce concept and the way the fragment "the large white spuds" is recognized as denoting an instance of produce. The particular ObjectPattern shown simply requires a noun group whose head is a word that denotes (through the inheritance-based denoting-words mechanism) the concept Produce. Fragments like "the fatty Atlantic salmon" or "a box of President's Choice soy milk" would be recognized as denoting the Produce concept in exactly the same way by this pattern.

Objects and events recognized by some patterns can be bound in other patterns to recognize further objects and events. For example, having recognized "the large white spuds" as an instance of Produce makes it possible to apply the produce-not-fresh EventPattern in figure 4 to the full input sentence "the large white spuds delivered Monday were not fresh." This results in recognizing the input as a complaint of the type "<produce> is <not fresh>." Note that a sentence of the type "is <not fresh> <produce>" would also have been recognized by the same event pattern, because the sequencing constraints in the pattern (the seq predicates in the test clause) allow both of these orders.

Merging. After applying the patterns in the semantics stage, the merging stage combines objects and events into higher-order aggregates. In figure 4, a Quality-complaint event (not containing the role date) and a Date object are merged into a new Quality-complaint event. The new event contains the union of the roles and attributes of the merged objects and events. Merge patterns solve most of the pronominal references to entities occurring in the message, as, for example, binding "it" to "an order" in "I placed an order yesterday on the site. Now I need to change it." To solve references to entities introduced in the dialogue by the system, these entities are appended to the user message by the Natural Language Generation module and then merged as usual.


[Figure 2 shows a small classified concept lattice: Top subsumes Product; Product subsumes Vegetable, Beverage, and Meat, with Vegetable and Meat marked as disjoint ("x"); Meat subsumes Beef, Pork, and Fish; Beverage subsumes Soy Milk; all concepts subsume Bottom.]

Figure 2. Classified Concept Lattice.

Given a set of nodes N and the subsumption relations amongst them as a subset of N*N, classification generates a lattice containing the nodes N such that any node a in N is connected to its most specific subsumers (MSS) and most general subsumees (MGS). The set MSS contains subsumers of a such that no node exists that subsumes a and is subsumed by a node from MSS. The set MGS contains subsumees of a such that no node exists that is subsumed by a and subsumes a node in MGS.

Figure 3. Classification in Description Logics.


Input: "the large white spuds delivered Monday were not fresh."

Ontology fragment: Top -> Object -> Produce -> {Vegetable, Meat};
(Potato (name Potato) (super Vegetable) (denoted-by "potato" "murphy" "spud" "tater"))

Recognized instances:
(Produce (name Produce) (produce potato))
(Date (name Date) (date "12 Dec 2002"))

(ObjectPattern (name produce-1)
  (object produce)
  (pattern "(noun-group (head ?n))")
  (test "(denotes ?n object produce)")
  (fill-attributes produce "?n"))

(EventPattern (name produce-not-fresh)
  (event Quality-complaint)
  (pattern "?p <- (Produce)"
           "?v <- (verb-group (head be) (negated TRUE))"
           "?f <- (adj-group (head ?n))")
  (test "(denotes ?n attribute state fresh)"
        "(or (seq ?p ?v ?f) (seq ?v ?f ?p))")
  (fill-role produce "?p")
  (certainty 0.8))

(MergePattern (name complaint-and-date)
  (resulting-event Quality-complaint)
  (pattern "?ev <- (Quality-complaint)"
           "?date <- (Date)")
  (test "(not (contains-role ?ev date))")
  (merge "?ev")
  (remove "?ev"))

Figure 4. Semantic Analysis and Merging.
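To make the denoting-words mechanism of figure 4 concrete, here is a hedged Python sketch of how a noun-group head can be recognized as an instance of Produce through inherited denoting words. The ontology fragment, the Salmon concept, and the recognize_produce helper are stand-ins for the platform's pattern language, not its actual implementation.

# Sketch of the denoting-words mechanism behind ObjectPatterns such as
# produce-1 in figure 4. The ontology, the denoting-word sets, and the
# noun-group handling are simplified stand-ins.
SUPER = {"Potato": "Vegetable", "Vegetable": "Produce", "Salmon": "Meat", "Meat": "Produce"}
DENOTED_BY = {"Potato": {"potato", "murphy", "spud", "tater"}, "Salmon": {"salmon"}}

def denotes(word, concept):
    """True if `word` denotes `concept` directly or via any of its subconcepts."""
    for c, words in DENOTED_BY.items():
        if word in words:
            while c is not None:
                if c == concept:
                    return True
                c = SUPER.get(c)
    return False

def recognize_produce(noun_group_head):
    # Mirrors produce-1: a noun group whose head denotes Produce.
    if denotes(noun_group_head, "Produce"):
        return {"concept": "Produce", "head": noun_group_head}
    return None

print(recognize_produce("spud"))   # recognized as an instance of Produce
print(recognize_produce("game"))   # None: not in the produce lattice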


Logical Interpretation. The application domain may logically constrain the meaning of a user message by making certain interpretations inconsistent with each other. For example, it is not possible for someone at the same time to change the date of an order and to cancel it. We define a consistent interpretation of a message as a set of mutually consistent recognized objects and events. Assuming we have an order with unique ID 735, {(change-order 735), (cancel-order 735)} is not a consistent interpretation, but both {(change-order 735)} and {(cancel-order 735)} are. The score of an interpretation is the sum of the scores of its final components (objects and events). A component is said to be final if it is not contained in other components (as values of their roles or attributes). The logical-interpretation stage has the goal of finding a consistent interpretation of the input that has a maximal score (Fox and Mostow 1977). The problem can be solved exactly by discrete optimization methods with branch and bound or other forms of backtracking search. We have not yet encountered situations in which this would be needed. Instead, we provide incompatibility patterns that detect inconsistent interpretations and locally resolve them by heuristically choosing what components to eliminate.
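For illustration, the maximal-score consistent interpretation can be computed by brute force over subsets of recognized components, as in the Python sketch below. The scores and the incompatibility table are invented; as noted above, the deployed system resolves conflicts locally with incompatibility patterns rather than by exhaustive search.

from itertools import combinations

# Exhaustive sketch of the logical-interpretation step: pick the mutually
# consistent subset of recognized components with maximal total score.
components = [("change-order", 735, 0.8), ("cancel-order", 735, 0.7), ("date", "26 Nov 2002", 0.5)]
incompatible = {frozenset({"change-order", "cancel-order"})}  # cannot both hold

def consistent(subset):
    return all(frozenset({a[0], b[0]}) not in incompatible
               for a, b in combinations(subset, 2))

def best_interpretation(comps):
    best, best_score = (), 0.0
    for r in range(1, len(comps) + 1):
        for subset in combinations(comps, r):
            score = sum(c[2] for c in subset)
            if consistent(subset) and score > best_score:
                best, best_score = subset, score
    return best, best_score

print(best_interpretation(components))
# keeps change-order and date (score 1.3); cancel-order is dropped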

Similarity-Based Language Understanding

The analytic approach can provide high accuracy but is knowledge intensive. To obtain wide domain coverage and graceful degradation, we have integrated a similarity-based method into our system. This approach does not provide a first-principles interpretation of a sentence. Rather, it computes the degree of lexical similarity between a message and a set of lexical (multiword) entities, returning the entities with highest similarity. The set of lexical entities we match against can be arbitrary documents or more specialized data such as product catalogues, geographical locations, institution names, and so on.

We define a lexical entity e as a tuple: e = <Q, R, C, O>. Here, Q is a reference question, R is the response to it, and C is a set (corpus) of questions, C = {qi}, such that each qi has R as its response or is semantically a synonym of R. Finally, O is a list of ontology concepts established when this entity is recognized.

If e is a frequently asked question (FAQ), an example is Q(e) = "How do I contact you?", R(e) = "Call 1-800-555-1212," C(e) = {"Do you guys have a phone number?", "Where do I call you?"}, and O(e) = {FAQ}.

With e a geographical location, an example is Q(e) = "" (empty), R(e) = "Quebec, Canada," C(e) = {"Quebec," "La Belle Province"}, and O(e) = {Location}.

Finally, with e a product name, an example is Q(e) = "" (empty), R(e) = "Red Rabbit," C(e) = {"Red Rabbit by Tom Clancy," "Clancy's latest book"}, and O(e) = {Product}.

Searching and Matching. The standard solution to recognizing the presence of such entities in a message is key word search, where key words consist of words from the Q, R, and C components after stop-word elimination, stemming, and synonym equivalence. Found words are scored based on measures such as inverse corpus frequency or fixed scores based on the part of speech (nouns and verbs score more than adjectives and adverbs). The final similarity score is computed in several ways depending on the application. These include the standard information retrieval tf*idf method (Salton and McGill 1983) (for FAQs) or more specific methods that consider word order as well (for geographical locations and product names). The implementation is based on the Jakarta Lucene public-domain search engine (jakarta.apache.org).

Reinforcement Learning. The similarity approach relies on having a rather large and up-to-date corpus C for each entity. We use reinforcement learning (Watkins and Dayan 1992) to automate the construction and maintenance of this corpus. Given a set of entities E where each e = <Q, R, C, O>, reinforcement learning continuously updates a relevance measure of the value of each element in C(e) every time the entity e is being asked about. The low-valued elements are eventually forgotten, while new useful ones are being learned, guaranteeing that the corpus C(e) is always up to date.

Assume the user has input the question q = "Can I give a few bucks to the driver?" First, the best n matches to this are found. The reference questions are displayed and the user clicks the one conveying the intended meaning, for example, "Is tipping necessary?" Based on this information, we update the relevance scores for the questions in the corpus for this selected entity. Let qm be the corpus question that actually did match.


Its score is increased by relevance(qm) <- d * relevance(qm) + (1 - d) * 1. The other questions qi were not useful, so their score is reduced by relevance(qi) <- d * relevance(qi) + (1 - d) * (-1). In both cases, 0 <= d <= 1 is a constant that determines how quickly we devalue past experiences. Finally, q is added to the corpus of the selected entity. If the size of the corpus for the selected entity exceeds a set limit, the lowest-scored question is removed.
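The corpus-maintenance step just described can be sketched as follows; the data structures and the concrete values of d and the size limit are illustrative assumptions, not the platform's actual bookkeeping.

# Sketch of the relevance update: reward the corpus question that matched the
# user's intent, penalize the others, add the new phrasing, and prune the
# lowest-valued entry when the corpus grows past its limit.
def update_corpus(corpus, relevance, matched, user_question, d=0.7, max_size=50):
    for q in list(corpus):
        reward = 1.0 if q == matched else -1.0
        relevance[q] = d * relevance[q] + (1 - d) * reward
    corpus.append(user_question)            # newly seen phrasing joins the corpus
    relevance[user_question] = 0.0
    if len(corpus) > max_size:               # forget the lowest-valued question
        worst = min(corpus, key=relevance.get)
        corpus.remove(worst)
        del relevance[worst]

corpus = ["Is tipping necessary?", "Do I tip the driver?"]
relevance = {q: 0.0 for q in corpus}
update_corpus(corpus, relevance, "Is tipping necessary?",
              "Can I give a few bucks to the driver?")
print(relevance)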

Similarity-based language understanding has several uses in our system. It is the fallback approach when the analytic method fails. Often FAQ sets are used as the repository in this case, ensuring that the user will get some relevant answers. It is also used in situations where first-principles methods are simply inapplicable, for example, to recognize product names from product catalogues (important in retail applications) or to recognize references to geographical locations in applications dealing with the delivery of goods or services.
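A minimal stand-in for the Searching and Matching step might look like the tf*idf scorer below. It is only a sketch: the deployed system uses Jakarta Lucene together with stemming and synonym equivalence, none of which are reproduced here, and the FAQ entities are invented.

import math, re
from collections import Counter

# Minimal tf*idf cosine scorer in the spirit of Searching and Matching.
STOP = {"the", "a", "an", "do", "i", "you", "is", "to", "can"}

def words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]

def tfidf_vector(tokens, idf):
    counts = Counter(tokens)
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

def cosine(v1, v2):
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

entities = {"FAQ-contact": "How do I contact you? Do you guys have a phone number?",
            "FAQ-tipping": "Is tipping necessary? Do I tip the driver?"}
docs = {name: words(text) for name, text in entities.items()}
idf = {w: math.log(len(docs) / sum(w in d for d in docs.values()))
       for d in docs.values() for w in d}
query = words("Can I give a few bucks to the driver?")
scores = {name: cosine(tfidf_vector(query, idf), tfidf_vector(d, idf))
          for name, d in docs.items()}
print(max(scores, key=scores.get))  # FAQ-tipping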

Managing the Dialogue

The results of the language-understanding stage are expressed as frame structures that have slots for the information extracted from the input. An input like "I got the order delivered yesterday, and I noticed that the strawberries were all smashed" is expressed as a Quality-complaint frame, with strawberries as the product, physically damaged as the quality issue, and "26 Nov 2002" (say) as the date. This frame implies that the user's goal is to make a complaint and obtain a resolution. User goals are treated in two ways. Complex goals, like the quality complaint, are treated in a deliberative manner. The agent starts planning a conversation that will decide what information to elicit and what business policies to apply to find a mutually satisfactory resolution. Simple goals, usually requests for information ("Do you charge a fee for bounced checks?"), are treated reactively, with the system providing the response and considering the issue terminated. The two types of goals are treated differently during the interaction. In the planned mode, the user can always ignore the current system question and ask a reactive question; this is immediately answered, and the planned mode is resumed (figure 5, left). If the request can be addressed only in planned mode, the currently executing planned interaction may or may not be interrupted, depending on the priorities of the respective goals (figure 5, right).

For each deliberative interaction, the Conversation Management module creates a workspace, which contains a local database and a goal stack initialized with the inferred user goal, for the Interaction Planner. When the workspace is active, goals are refined into subgoals, questions are asked, inferences are made, and business policies are applied. Several workspaces can be active at the same time, corresponding to various conversation topics. Mechanisms are provided for a workspace to inherit values from other workspaces, creating an interaction context that extends across the entire session. Reactive goals can generate responses sensitive to the current deliberative goal as context, as illustrated in figure 5.
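Schematically, the dispatch between reactive and deliberative treatment can be pictured as in the following Python sketch; the frame format, the FAQ table, and the Workspace class are illustrative assumptions rather than the platform's data structures.

# Schematic view of the Conversation Management dispatch described above:
# complex goals get a workspace (goal stack plus local database) for the
# planner, while simple ones are answered reactively.
class Workspace:
    def __init__(self, goal):
        self.goal_stack = [goal]    # initialized with the inferred user goal
        self.database = {}          # facts elicited or inferred during planning

FAQ_ANSWERS = {"bounced-check-fee": "Yes, a fee applies to bounced checks."}  # invented answer
workspaces = []

def dispatch(frame):
    if frame["type"] == "faq":                       # simple goal: answer and stop
        return FAQ_ANSWERS[frame["topic"]]
    workspace = Workspace(goal=frame["type"])        # complex goal: start planning
    workspace.database.update(frame.get("slots", {}))
    workspaces.append(workspace)
    return f"Planning conversation for goal '{frame['type']}'"

print(dispatch({"type": "faq", "topic": "bounced-check-fee"}))
print(dispatch({"type": "quality-complaint",
                "slots": {"product": "strawberries", "issue": "physically damaged"}}))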


User: I've got the order yesterday, and all strawberries were smashed.
System: Do you have the order number?
User: Anyway, what's your return policy?
System: You can be refunded up to $10 per month, up to $100 per year. In your case, you have already been refunded $70 this year. So, can I have the order number, please?

User: I've got the order yesterday, and all strawberries were smashed.
System: Do you have the order number?
User: And two weeks ago I asked to cancel the order and it was not.
System: I understand you have a second complaint. Let's get to that after we finish the current one. So, can I have the order number, please?

Figure 5. Switching between Deliberative and Reactive Behaviors.


The Interaction Planner

In a business environment, a planning approach to interaction is necessary to guarantee that the solution search space is systematically explored and that a solution is found if one exists. This is a strong argument for AI agents: they never tire in the search for the best solution and either find it or tell you if none exists.

We use a hierarchical task network (HTN) (Wilkins 1988) type of planning for its clarity and naturalness. In designing the interactive planning language, the issue has not been the complexity of the task networks or the size of the search space; in many business environments these have to be simple enough for people to manage. The issue has been conducting the dialogue such that the logical flow is clear, backtracking in the middle of the dialogue can be made comprehensible, and the agent keeps communicating what it is doing and why.

Hierarchical task networks consist of goal refinements into subgoals. Figure 6 illustrates a task network example from the wireless-service diagnosis application, together with an example customer-service conversation. Refinements marked with OR are disjunctive (at least one must be carried out), while the others are conjunctive and ordered (all must be carried out in the shown left-to-right order).

At the top level of the illustrated task network, troubleshooting consists of first acquiring a problem description followed by establishing the cause of the problem. There can be two classes of problems in the example, which we discuss briefly. Download problems are identified by a recorded attempt to download (through the air). The attempt to download can fail because the device has insufficient memory or because the device loses the connection in the process. In the former case, users are advised as to how to increase the memory on their telephone. The latter is treated as a report of network failure, and the service database is updated with the location where the failure was reported. If the download was successful, then the problem is local to the device, and the user is advised on how to deal with it.

Nondownload problems fall into several subclasses, and more than one can occur at the same time. The device may be incompatible with the service, there may be account restrictions (such as content rating), subscriptions may have run out, or the network may be down. Any of the established problems are explained to the user. This process is in fact a form of heuristic classification (Clancey 1985), a well-known problem-solving method applicable to diagnosis tasks.

To achieve a goal, three types of operators are available: Ask, Infer, and Expand. The interactive operator (Ask) generates a question for the user (via the Natural Language Generation module), obtains the interpretations of the response from Natural Language Understanding, and uses them to update the values of the planning variables associated with the current workspace. The AskLocation goal in figure 6 can be achieved by an Ask operator that generates the following example dialogue fragment:

Agent: Where were you when you experienced the loss of connection?
User: On Highway 400, south of Barrie.

In this case, "Highway 400, south of Barrie" is extracted as the location of the incident and stored in the current workspace, where it can be accessed by other operators. The main components of the Ask operator are: (1) a precondition in terms of the workspace variables and the current and past semantic frames; (2) a language-generation request for question formulation; (3) the function or method to interpret the response; and (4) the number of times the question can be reasked if the response is not understood. If this is exceeded without the answer being understood, the operator fails and backtracking is initiated.

The noninteractive operator (Infer) is used to perform deductions based on the existing values in the workspace. It is similar to Ask, but does not generate any questions. It may, however, inform the user about inferences drawn to further clarify the interaction.

If the goal cannot be achieved directly, it can be expanded into an ordered sequence of subgoals using the Expand operator. Figure 6 is composed of possible refinements as specified by this operator. This operator is similarly instrumented to clarify the dialogue flow. Inform-type messages can be specified for use when a goal is expanded into certain subgoals (for example, "I'm going to ask you now a few questions about your account"). Other Inform messages are for use when an expansion failed and backtracking is about to begin (for example, "I could not find your order based on the provided information") or when an expansion succeeded. The Expand operator also specifies the constraints that must be satisfied in order for the expansion to be considered successful.
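A toy interpreter for these three operators, with backtracking over disjunctive expansions, might look like the sketch below. The goal names loosely echo figure 6 (a few, such as CheckDownloadAttempt, are invented), and the scripted user and trivial success tests stand in for real Ask and Infer machinery.

# Toy hierarchical task-network interpreter in the spirit of the Expand/Ask/
# Infer operators: expansions are tried in order, disjunctive alternatives are
# backtracked over, and Ask operators pull answers from a scripted user.
EXPANSIONS = {
    "Troubleshoot": [["AcquireProblemDescription", "EstablishCause"]],
    "EstablishCause": [["DownloadProblem"], ["NonDownloadProblem"]],   # OR branches
    "DownloadProblem": [["CheckDownloadAttempt", "AskLocation", "UpdateServiceDB"]],
    "NonDownloadProblem": [["VerifySubscription"]],
}
ASK = {"AcquireProblemDescription": "Please describe your problem.",
       "AskLocation": "Where were you when this happened?"}
SCRIPTED_USER = {"Please describe your problem.": "My kid could not play Invaders.",
                 "Where were you when this happened?": "At our cottage in Haliburton."}

def achieve(goal, workspace):
    if goal in ASK:                               # Ask operator
        answer = SCRIPTED_USER[ASK[goal]]
        workspace[goal] = answer
        return True
    if goal in EXPANSIONS:                        # Expand operator with backtracking
        for alternative in EXPANSIONS[goal]:
            saved = dict(workspace)
            if all(achieve(g, workspace) for g in alternative):
                return True
            workspace.clear(); workspace.update(saved)   # undo the failed branch
        return False
    if goal == "CheckDownloadAttempt":            # Infer-style operator, always succeeds here
        return True
    return goal in ("UpdateServiceDB", "VerifySubscription")

workspace = {}
print(achieve("Troubleshoot", workspace), workspace)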


The constraints are of two kinds. Computed constraints are predicates among workspace variables (for example, that a subscription to a service by a user is still active). Interactive constraints replace the predicate by an accept-reject type of question addressed to the user, for example, "Service upgrade during weekends is subject to an extra $5.00 per month. Would you like to upgrade?" Computed constraints are similar to Infer operators, while interactive ones are similar to Ask operators, except that they do not update workspace data.

This description applies to the deliberative behavior. The reactive behavior allows users to deviate for a short time from the planned interaction, ask their own questions, get the response, and continue with the planned interaction. To make the agent reactive, we use the React operator. This looks for unexpected semantic frames (coming from unexpected input) and answers them if the preconditions are satisfied. Reactions can be supported everywhere or in specified contexts, for example, when a certain (deliberative) goal is being pursued or during certain Ask executions. A reaction can specify the response messages to be generated and the side-effect actions to execute. In many cases, we use reactions to answer FAQs in context, as illustrated in figure 5.

The interaction planning system is aware of the interaction channel (Web, e-mail, or voice) on which the system functions.


[Figure 6 shows the diagnosis task network. The goals in the network include WirelessServiceTroubleshoot, AcquireProblemDescription, EstablishCause, DownloadProblem, NonDownloadProblem, DownloadSuccessful, InsufficientMemory, LossOfConnection, IdentifyLocation, AskLocation, UpdateServiceDB, VerifyDeviceCompatibility, VerifyAccountRestrictions, VerifySubscription, and VerifyNetworkDowntime; several refinements are marked OR (disjunctive). The example conversation reads:
Agent: Please describe your problem.
User: 2 hours ago my kid tried several times to play Invaders, but without success.
Agent: Where were you when this happened?
User: At our cottage in Haliburton.
Agent: Our digital network is temporarily out of service in the Haliburton area, due to some upgrade work. The network should be restored by 5 pm today. We really apologize for this.]

Figure 6. Task Network for Service Diagnosis and Example Conversation.


The differences among these channels require that we use the advantages and avoid the drawbacks of each; for example, the agent can plan to fill in a form in a browser, parse information from e-mails, or prompt for and check pieces of information individually on the telephone. Channel-specific behaviors are created by specifying alternative goal expansions conditioned by the interaction channel. Voice interactions based on commercial voice-recognition engines are particularly limited by the size of the vocabulary that can be reliably recognized and are therefore handled in our applications by specific goal expansions.

Business Reasoning

Our architecture makes a clear distinction among the strategic level at which all interactions are managed by Conversation Management, the level of planning of goal-directed interactions by the Interaction Planner, and the domain reasoning level supported by the Business Reasoning module. This is consistent with previous work on knowledge acquisition and problem-solving methods (McDermott 1988) that has emphasized the recognition of the types and roles of knowledge as the basis of systematic development of AI applications.

The main purpose of the Business Reasoning module is to provide the framework for specifying and applying business-specific policies. In the wireless-service diagnosis example, a business policy is that accounts cannot contain applications that violate defined content ratings. Business policies for e-commerce applications may specify lead times and cutoff times for order modification, various constraints on refunds and returns, and so on. The second purpose is to map the data model used by back-end systems into a format understood by our platform and to update the back end in a transactional manner.

The provided conceptualization for describing business reasoning consists of classes for business objects, business policies, business rules, queries, and actions. The system is used through an Ask and Tell interface. The Ask operator is used to pose a query that will be answered, while Tell is used to assert facts, which the system may act on at its discretion. For example, (ask shortest-lead-time Toronto Vancouver) asks for the shortest lead time for a delivery from Toronto to Vancouver. To answer it, the system applies business rules that match queries with policies to produce answers. For an example, see figure 7.

Language Generation

The language-generation component creates the linguistic form for the system's questions and messages. We use a two-stage generation process. First, the individual messages requested by various planning operators are collected into several buckets. Second, these messages are assembled and edited in the final form to be seen by the user. The component messages can contain any number of Inform speech acts and a single Ask speech act. Each component message is created by instantiating a GenerationTemplate. The templates themselves are annotated with pragmatic attributes of the text they generate, for example concise, verbose, polite, and so on. There can be multiple templates generating the same information content in different forms. The choice of the actual template used depends on a score that reflects how well its pragmatic attributes match the attributes in the initial generation request (the latter are set by the planning operators and are based on the context and nature of the interaction). Randomization is used when scores are equal. The assembly and editing of the individual messages into the final form are performed by language-generation operators named GenerationPolicies. These can sort the components, filter out redundant ones, insert punctuation, format the text, and so on.

On interaction channels that use text (Web, e-mail), the system generates either simple text or HTML forms. On the voice channel, it generates VoiceXML (Boyer et al. 2000) scripts that play vocal prompts, define input grammars, and evaluate the voice response using a voice-recognition engine. The Natural Language Understanding module is bypassed for voice interactions, as we use the comparatively limited language-understanding ability of the voice-recognition engine. We currently use commercial voice-recognition engines provided by voice ASPs like Voxeo (www.voxeo.com), which also provide all the telephony interfaces. The voice ASP handles the voice exchange according to the VoiceXML script received from our server.
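The template-selection rule described under Language Generation (score candidates by how well their pragmatic attributes match the request, randomize on ties) can be sketched as follows; the template texts and attribute names are invented for the example.

import random

# Sketch of template selection: score each candidate GenerationTemplate by how
# many requested pragmatic attributes it carries, and break ties randomly.
templates = [
    {"text": "Where were you when you experienced these problems?",
     "attributes": {"polite", "verbose"}},
    {"text": "Incident location?", "attributes": {"concise"}},
]

def choose_template(requested, candidates):
    scored = [(len(t["attributes"] & requested), t) for t in candidates]
    best = max(score for score, _ in scored)
    return random.choice([t for score, t in scored if score == best])

print(choose_template({"polite"}, templates)["text"])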


(BusinessRule (name fastest-delivery)
  (pattern "(shortest-lead-time-query (from ?from) (to ?to))"
           "?crt <- (current-shortest ?from ?to ?time ?mode ?cost)"
           "(delivery-lead-time-policy (from ?from) (to ?to) (lead-time ?lead) (price-per-kg ?pr))")
  (test "(< ?lead ?time)")
  (do "(retract ?crt)"
      "(assert (current-shortest ?from ?to ?lead ?pr))"))

Figure 7. Sample Business Rule.
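Read procedurally, the rule in figure 7 keeps the shortest lead time among the delivery policies that match the query. A loose Python rendering, with made-up policy data, is:

# Re-expression in Python of the fastest-delivery rule in figure 7: scan the
# delivery-lead-time policies matching the query's origin and destination and
# keep the shortest lead time. The policy data is invented for the example.
policies = [
    {"from": "Toronto", "to": "Vancouver", "lead_time": 72, "price_per_kg": 4.0},
    {"from": "Toronto", "to": "Vancouver", "lead_time": 24, "price_per_kg": 9.5},
    {"from": "Toronto", "to": "Montreal", "lead_time": 12, "price_per_kg": 3.0},
]

def ask_shortest_lead_time(origin, destination):
    current = None                      # plays the role of the current-shortest fact
    for policy in policies:
        if policy["from"] == origin and policy["to"] == destination:
            if current is None or policy["lead_time"] < current["lead_time"]:
                current = policy        # retract the old fact, assert the new one
    return current

print(ask_shortest_lead_time("Toronto", "Vancouver"))  # the 24-hour policy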


Interestingly, adding VoiceXML scripts to the generation module was the only platform modification required to have our conversational approach function on the telephony network.

Ontology-Driven Development

Managing the application development process for this platform is challenging due to the diversity and specialization of the knowledge involved. We are addressing this problem by adopting an ontology-driven development approach. In essence, we develop an application by creating an ontology that contains frames for all the types of knowledge we need. This is possible because, as illustrated, we have defined frame structures for every type of knowledge in the system: linguistic components (such as object and event patterns, merge patterns), conversation and planning components (such as the Ask, Infer, Expand, and React operators), and language-generation and business-reasoning components.

We use Protégé (Noy, Fergerson, and Musen 2000) as our visual ontology editor (figure 8). The frames manipulated in the development process are translated into the platform's native Jess (Friedman-Hill 2003) rule language objects via the JessTab included in Protégé. Once translated into Jess objects, the normal processing takes place. Some objects are interpreted by Jess rules to provide the desired functionality, for example, the planning and the language-generation operators. Others are compiled into Jess rules using Jess's ability to generate rules dynamically. This is how the natural-language and business-reasoning components are treated.

The advantage of this approach is the ability to represent a large knowledge typology in a uniform frame format for which a sophisticated visual knowledge editor exists. The editor makes it possible to develop industry-specific repositories of components, such as the retail and telecommunication repositories from which we used several examples throughout the article.

Evaluation

The retail repository was used to create several applications, currently in different stages of deployment (table 1). Each column in table 1 shows an application, with each row representing a service issue dealt with by the application. Client Evaluation means the application has been evaluated by the client company prior to beta testing on site. For each issue (row), we show, as a percentage, how often it occurs in interactions in the application. (As we show only the main issues, they sum to less than 100 percent for each application.) The Correctly Solved row shows what total percentage (across all issues in an application) was correctly solved by the system. An 80/20 rule was apparent: a few issues dominate the majority of interactions. This is good news because it helps focus natural language and problem solving on those issues.

To cope with the remaining issues, we have included in each application a base of FAQs recognized through similarity matching.


Table 1. Applications Based on the Retail Repository.
Columns: Gifts (Production), WebGrocery (Beta), Flowers (Beta), Department Store (Client Evaluation), Books and Music (Client Evaluation).

Correctly Solved:            66%   65%   52%    55%   64%
Order Status:                33%   10%   3.3%   –     20%
Order Change:                2%    3%    4.6%   –     12%
Quality Issue:               –     17%   –      –     –
Missing Items:               –     16%   –      –     4%
Account Issues:              15%   9%    4.9%   12%   6%
Product Availability:        5%    10%   7.2%   –     1%
Delivery Inquiries:          21%   2%    2.8%   –     4%
Business Policy Questions:   3%    11%   56.4%  67%   11%


Table 2 shows similar results for the telecommunications repository. Here we have two different applications. The diagnosis one uses heuristic classification (Clancey 1985) as the problem-solving method and leads to longer conversations. The wireless billing application has a broad but shallow one question–one answer nature and was done by similarity methods.

The critical aspect in all applications is the ability to understand language.

We monitor and assess this constantly on the basis of graphs like the one shown in figure 9 (a typical week from the production Gifts application). We distinguish in-scope questions (those that the knowledge base is prepared to deal with) from out-of-scope ones, but we try to extend the scope by similarity methods, currently answering about 30–40 percent of questions overall.

These results are generally satisfactory for a low-cost "self-serve" customer service solution, as they show the ability of the automated system to reduce call-center costs and give customers more access to information.


Figure 8. Ontology Editor Screen (Protégé).

Table 2. Applications Based on the Telecommunications Repository.
Columns: Telecommunications Solutions Vendor (Client Evaluation), Consulting Company (Client Evaluation).

Correctly Solved:                                       74%    85%
Diagnosis and explanation of wireless service issues:   100%   –
Wireless billing questions:                             –      100%


There is also considerable room for improvement, which we pursue in several ways. The industry-specific repository approach based on ontology modeling languages allows us to improve our language-understanding ability by creating reusable concepts and semantic patterns. The lower-accuracy but wide-coverage similarity-matching method provides a simple, nontechnical way to extend the scope of the system dynamically. Finally, we are extending the similarity-matching approach with statistical text-classification methods (Dumais et al. 1998), such as support vector machines, that can learn to perform more complex mappings between text and abstract categories than is possible with the standard information-retrieval model.

Lessons Learned and Conclusions

Paradoxically, designers of conversational customer-interaction systems do not control the scope of their system. Users do. People will ask about anything they need to know or do, without caring about the "scope" of the system as defined by its designers. If the e-commerce site being serviced happens to be slow for whatever reason, the customer-service agent will get complaints about it. If any information is missing or unclear on the site, the agent will get questions about it. The scope of the system is always "anything commonsensically related to the business." Achieving such breadth through detailed semantic language analysis is very hard, if at all possible. For this reason, we are always aiming for a "bell curve" relation between the breadth and depth of understanding. Business subareas that require deep understanding (for example, quality complaints in retail) are supported by elaborated linguistic patterns. Other subareas are supported by the less-accurate but wide-coverage similarity method that taps into the existing document base of any business. This graceful degradation property ensures that people get an answer in most cases, even if an approximate one. Experience has shown that users are more forgiving if they get an approximate or partially relevant response than no response at all. Finally, if none of these works, clear provisions must be made to connect customers to a live representative before they run out of patience. A good rule of thumb is to be risk averse: every time the answer has been determined by similarity search, insert information for live-person connection.

The second lesson has to do with how responses are written. In the absence of the nonverbal communication and gestures that play such an important role in interhuman interaction, the connotations contained in the textual responses must be weighed carefully. Freshness, variety, and being cute when you don't understand will always help.

The third lesson is about the role of AI. Many previous applied AI systems have reported the "disappearance of AI" phenomenon: fewer and fewer AI technologies were used as the system was being deployed commercially. Our experience to date has been that this need not be the case; our current system contains a good assortment of integrated AI solutions, each applied to significant parts of the overall task.

One reason this was possible is the fourth lesson learned: we couldn't have done it without using an AI programming language. Most of the system is written in Jess, a Java-embedded descendant of OPS5 and Clips. Properly understood and used, rule-based programming and AI-style "second-order programming" based on constructs like eval, apply, and the Jess build are invaluable for writing and integrating complex algorithms in a concise and timely manner. And their integration with Java gives us unconstrained access to all the Internet and other programming resources we have ever needed.

The platform has been used to create customer-interaction agents in industry verticals such as wireless-service provisioning and several retail domains. These have undergone extensive beta testing; some of them have been in production, and more are to come. The jury is still out with respect to whether we have managed to build the knowledgeable and infinitely patient agents able to provide the frictionless customer-service experience envisioned. However, we strongly believe that this will happen only at the end of a serious integration effort aimed at putting together a wide range of AI, Internet, and conventional technologies.


[Figure 9 plots daily natural language understanding performance (percentages from 0 to 90) over days 11–17 of the month for four measures: in scope, correct versus in scope, correct versus total, and error.]

Figure 9. Natural Language Understanding Performance.


References

Allen, J. F.; Byron, D. K.; Dzikovska, M.; Ferguson, G.; and Galescu, L. 2001. Toward Conversational Human-Computer Interaction. AI Magazine 22(4) (Winter): 27–37.

Appelt, D. E.; Hobbs, J. R.; Bear, J.; Israel, D.; and Tyson, M. 1993. FASTUS: A Finite-State Processor for Information Extraction from Real-World Text. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers.

Borgida, A.; Brachman, R. J.; McGuinness, D.; and Resnick, L. A. 1989. CLASSIC: A Structural Data Model for Objects. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 59–67. New York: Association for Computing Machinery.

Boyer, L.; Danielsen, P.; Ferrans, J.; Karam, G.; Ladd, D.; Lucas, B.; and Rehor, K. 2000. Voice Extensible Markup Language (VoiceXML), W3C Note, 5 May (www.w3.org/TR/voicexml). Piscataway, N.J.: VoiceXML Forum.

Clancey, W. J. 1985. Heuristic Classification. Artificial Intelligence 27(3): 289–350.

Dumais, S. T.; Platt, J.; Heckerman, D.; and Sahami, M. 1998. Inductive Learning Algorithms and Representations for Text Categorization. In Proceedings of the Seventh International Conference on Information and Knowledge Management, 148–155. New York: ACM Press.

Fox, M. S., and Mostow, J. 1977. Maximal Consistent Interpretations of Errorful Data in Hierarchically Modeled Domains. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, 165–171. Los Altos, Calif.: William Kaufmann.

Friedman-Hill, E. 2003. Jess in Action. Greenwich, Conn.: Manning Publications.

McDermott, J. 1988. A Taxonomy of Problem-Solving Methods. In Automating Knowledge Acquisition, ed. S. Marcus, 225–226. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Noy, N. F.; Fergerson, R. W.; and Musen, M. A. 2000. The Knowledge Model of Protégé-2000: Combining Interoperability and Flexibility. In Proceedings of the Twelfth European Workshop on Knowledge Acquisition, Modeling and Management, 17–32. Lecture Notes in Computer Science. Berlin: Springer-Verlag.

Salton, G., and McGill, M. 1983. Introduction to Modern Information Retrieval. New York: McGraw-Hill.

Watkins, C. J. C. H., and Dayan, P. 1992. Q-learning. Machine Learning 8(3): 279–292.

Wilkins, D. 1988. Practical Planning. San Mateo, Calif.: Morgan Kaufmann Publishers.

Mihai Barbuceanu's research interests are in creating theories and models of interaction and coordination between human and artificial agents. He has created one of the first agent-coordination languages based on a conversational model; modeled joint work in organizations using deontic relationships; and applied decision theory, game theory, and constraint optimization to develop a negotiation method that achieves optimal allocations of goods and services. Recently, his work has been focused on human-computer interaction using written and spoken natural language. He is the principal architect of a commercial framework for conversational customer-interaction systems that integrates knowledge-based and statistical methods for language understanding, information retrieval, commonsense and specialized domain ontologies, multitopic dialogue management, and hierarchical task-planning methods. Barbuceanu received his Ph.D. from the Technical University of Bucharest. He can be reached at mihai at novator.com.

Mark S. Fox cofounded and is chairman and chief executive officer of Novator Systems Ltd., a provider of e-commerce services and software since 1994. In October 2003, Fox launched ChocolatePlanet.com, which will provide same-day delivery of premium chocolates across North America. Fox is also a professor of industrial engineering at the University of Toronto, where his research focuses on enterprise integration and artificial intelligence. Fox is known for his pioneering work in constraint-directed scheduling, which underlies current logistics, supply-chain management, and scheduling solutions, and for his work in ontologies for enterprise modeling. In 1984 Fox cofounded Carnegie Group Inc. (CGIX on Nasdaq until 1998), a software company that specialized in intelligent systems for solving engineering, manufacturing, and telecommunications problems. In 1975 Fox received his B.Sc. in computer science from the University of Toronto, and in 1983 he received his Ph.D. in computer science from Carnegie Mellon University, where he was an associate professor of computer science and robotics until his return to Toronto in 1991. He is a Fellow of the American Association for Artificial Intelligence. His e-mail address is msf at eil.utoronto.ca.

Lei Hong received her B.Sc. degree from Harbin University of Technology, China, in 1990, and a Master's degree from the University of Toronto, Canada, in 2000. In 1990 she joined China Aerospace Industry Inc., where she spent most of her time on real-time control-system development for various industrial applications. From 2000 to 2003 she worked for Novator Systems Inc., where she actively worked on e-commerce applications. Hong's main interests include software development for intelligent systems.

Yannick Lallement received his Ph.D. in computer science from the University of Nancy I, France, in 1996. He later joined the Human Computer Interaction Institute at Carnegie Mellon University, developing models of human performance in Soar. He is now with Novator Systems in Toronto, working on automated customer-care solutions. His interests focus on artificial intelligence, cognitive modeling, and human-computer interaction. His e-mail address is yannick at novator.com.

Zhongdong Zhang received his Ph.D. from the Department of Computer and Information Science, University of Constance, Germany, in 1998. He is currently working on e-commerce and automated customer-care solutions. His research interests include Web-based information systems, text categorization/information extraction, and Web analytics. His e-mail address is zhang at novator.com.

