
michaelkipp.de/student/GregorThesis2010.pdf · Embodied Presentation Teams: A plan-based approach for affective sports commentary

SAARLAND UNIVERSITY

Faculty of Natural Science and Technology I

Department of Computer Science

Master’s Program in Computer Science

Master’s Thesis

Embodied Presentation Teams:

A plan-based approach for affective

sports commentary in real-time

submitted by

Ivan Gregor

on March 1, 2010

Supervisor

Prof. Wolfgang Wahlster

Advisor

Dr. Michael Kipp

Reviewers

Prof. Wolfgang Wahlster

Dr. Michael Kipp


Statement

Hereby I confirm that this thesis is my own work and that I have documented all sources

used.

Signed:

Date:

Declaration of Consent

Herewith I agree that my thesis will be made available through the library of the Com-

puter Science Department.

Signed:

Date:


Abstract

Virtual agents are essential representatives of multimodal user interfaces. This thesis presents the IVAN system (Intelligent Interactive Virtual Agent Narrators), which generates real-time affective commentary on a tennis game given as an annotated video. The system employs two distinguishable virtual agents with different roles (TV commentator, expert), personality profiles, and positive, neutral, or negative attitudes toward the players. The system uses an HTN planner to generate dialogues, which makes it possible to plan large dialogue contributions and to generate alternative plans. The system can also interrupt the current discourse when a more important event happens. The current affect of the virtual agents is conveyed by lexical selection, facial expression, and gestures. The system integrates background knowledge about the players and the tournament as well as pre-defined user questions. We have focused on the dialogue planning, knowledge processing, and behaviour control of the virtual agents. Commercial products have been used as the audio-visual component of the system.

A demo version of the IVAN system was accepted for GALA 2009, which was part of the 9th International Conference on Intelligent Virtual Agents. We have verified that an HTN planner can be employed to generate affective commentary on a continuous sports event in real-time. However, while HTN planning is well suited to generating large dialogue contributions, expert systems are better suited to producing commentary on a rapidly changing environment. Most parts of the system are domain-dependent; however, the same architecture can be reused to implement applications such as interactive tutoring systems, tourist guides, or guides for the blind.


Acknowledgements

First of all, I would like to thank Michael Kipp and Jan Miksatko for being very helpful and inspiring supervisors. Thanks as well to the DFKI for providing the opportunity to work on this project, the necessary equipment, and funding to attend the GALA competition and the IVA conference. Thank you also to Charamel GmbH and Nuance Communications, Inc., for providing the Charamel virtual agents Mark and Gloria and the RealSpeak Solo software with the Tom and Serena voices, respectively. Finally, I would like to thank my parents for being very supportive during my studies in Prague and Saarbruecken.


Contents

Abstract i

Acknowledgements ii

List of Figures v

List of Tables vii

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 GALA 2009 Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 IVAN System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.4 Research Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Related Work 8

2.1 ERIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.1.1 The Affect Module . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.1.2 The Natural Language Generation Module . . . . . . . . . . . . . 9

2.2 DEIRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3 Spectators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.4 STEVE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.5 Presentation Teams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.5.1 Design of Presentation Teams . . . . . . . . . . . . . . . . . . . . . 13

2.5.2 Inhabited Marketplace . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.5.3 Rocco II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Methods for Controlling Behaviour of Virtual Agents 16

3.1 Hierarchical Task Network Planning . . . . . . . . . . . . . . . . . . . . . 16

3.1.1 Example of a Planning Task . . . . . . . . . . . . . . . . . . . . . . 17

3.1.2 Java Simple Hierarchical Ordered Planner (JSHOP) . . . . . . . . 19

3.1.3 JSHOP Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 Expert Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.3 Statecharts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4 Generating Dialogue 26


4.1 Commentary Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.1.2 Dialogue Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.1.3 Planning Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1.4 Commentary Excerpt . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.2 Affect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2.2 Planning with Attitude . . . . . . . . . . . . . . . . . . . . . . . . 35

4.2.3 OCC Generated Emotions . . . . . . . . . . . . . . . . . . . . . . . 36

4.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 Architecture 41

5.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.1.1 Design Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

5.1.2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 42

5.1.3 Off-the-shelf Components . . . . . . . . . . . . . . . . . . . . . . . 44

5.2 Tennis Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

5.3 Plan Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5.3.1 Event Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.3.2 Background Knowledge . . . . . . . . . . . . . . . . . . . . . . . . 52

5.3.3 Discourse Planner . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.4 Plan Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.4.1 Template Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.4.2 Avatar Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.4.3 Output Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6 Discussion 62

6.1 Comparison with the ERIC system . . . . . . . . . . . . . . . . . . . . . . 62

6.2 Evaluation in Terms of Research Aims . . . . . . . . . . . . . . . . . . . . 63

6.3 Comparison JSHOP vs Jess . . . . . . . . . . . . . . . . . . . . . . . . . . 66

7 Conclusion 68

7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

A Commentary Excerpt 72


List of Figures

1.1 Event Position Specification . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Example of an ANVIL File . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.1 ERIC commenting on a Horse Race . . . . . . . . . . . . . . . . . . . . . . 9

2.2 DEIRA (Dynamic Engaging Intelligent Reporter Agent) . . . . . . . . . . 10

2.3 STEVE in a 3D Simulated Student’s Work Environment . . . . . . . . . . 12

2.4 Example of a Planning Method (Dialogue Scheme) to Discuss an Attribute Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5 Excerpt of the Domain Knowledge . . . . . . . . . . . . . . . . . . . . . . 14

2.6 Gerd and Metze commenting on a RoboCup Soccer Game . . . . . . . . . 15

3.1 Example of a Planning Task - HTN . . . . . . . . . . . . . . . . . . . . . 18

3.2 Example of a Planning Task - generated Plan . . . . . . . . . . . . . . . . 18

3.3 JSHOP Input Generation Process . . . . . . . . . . . . . . . . . . . . . . 19

3.4 Sample JSHOP Axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.5 Sample JSHOP Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.6 Sample JSHOP Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.7 Overview of the COHIBIT system . . . . . . . . . . . . . . . . . . . . . . 25

4.1 Example of a Planning Method . . . . . . . . . . . . . . . . . . . . . . . . 28

4.2 Example of a Compound Task Decomposition . . . . . . . . . . . . . . . . 30

4.3 Possible Decompositions of a Compound Task . . . . . . . . . . . . . . . . 31

4.4 Decomposition of the Goal Task “Comment” . . . . . . . . . . . . . . . . 32

4.5 Decomposition of the Subgoal Task Comment on rally . . . . . . . . . . . 32

4.6 Decomposition of the Goal Task “Comment” that leads to a Subgoal Task Drop Volley . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.7 Emotion Module GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.1 IVAN Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.2 Dataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.3 Charamel Virtual Agents Mark and Gloria . . . . . . . . . . . . . . . . . 45

5.4 Tennis Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.5 Tennis Simulator GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.6 IVAN Architecture - Plan Generation . . . . . . . . . . . . . . . . . . . . 47

5.7 Dataflow - Plan Generation . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.8 States of the Tennis Game . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.9 Tennis Score Counting using a Finite State Machine . . . . . . . . . . . . 50

5.10 Hierarchy of Facts from which an Ace can be deduced . . . . . . . . . . . 52

5.11 JSHOP Input Generation Process . . . . . . . . . . . . . . . . . . . . . . 55


5.12 IVAN Architecture - Plan Execution . . . . . . . . . . . . . . . . . . . . . 56

5.13 Dataflow - Plan Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . 56


List of Tables

1.1 Tennis Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Event Position Specification . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Track Element Specification . . . . . . . . . . . . . . . . . . . . . . . . . . 4

4.1 Dialogue Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2 Example of Generated Dialogues based on different Appraisals . . . . . . 36

4.3 Description of the eight Basic OCC Emotions . . . . . . . . . . . . . . . . 37

4.4 Five Personality Traits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.5 Example of Events that elicit respective Emotions . . . . . . . . . . . . . 38

5.1 Description of the Tennis Counting Terminology . . . . . . . . . . . . . . 50

5.2 Example of high-level facts deduced from low-level facts . . . . . . . . . . 52

5.3 Examples of Facts deduced from the Background Knowledge . . . . . . . 53


Chapter 1

Introduction

This thesis presents the IVAN system (Intelligent Interactive Virtual Agent Narrators), which provides affective commentary on a continuous sports event in real-time. We have employed two virtual agents that are engaged in dialogues to comment on a tennis game given as the GALA 2009 challenge (see section 1.2). The virtual agents can have different attitudes toward the players, and their current affective state can be conveyed by lexical selection, facial expression, and gestures. We have focused on the knowledge processing, dialogue planning, and behaviour control of the virtual agents, and have used commercial software as the audio-visual component of the system. In the following sections, we will explain why it is beneficial to employ virtual agents, then describe our task as given by the GALA 2009 challenge, outline the IVAN system, and state our research aims.

1.1 Motivation

Multimodal user interfaces are becoming more and more important in human-machine communication. Essential representatives of such interfaces are virtual agents, which aim to act like humans in the way they employ gestures, gaze, facial expression, posture, and prosody to convey facts in face-to-face communication with a user [1]. Face-to-face interaction, which uses a rich communication channel, is often considered an exclusively human domain; for instance, if people have something important to say, they say it in person. To generate such complex behaviour in a virtual agent, it is important to endow the agent with emotions, since this makes him/her more believable to humans and makes a system that employs such agents more entertaining and enjoyable for its users [2]. Virtual agents can be employed in many fields, such as computer games, tutoring systems, virtual training environments [3], storytelling systems [4, 5], advertisement, automated presenters [6, 7, 8, 9], and commentators [10, 11].


In this thesis, we have focused on commentary agents. Moreover, we have employed a presentation team [6], i.e., several distinguishable virtual agents with different personality profiles, roles, and goals, since this enriches the communication strategies, and the information being conveyed can be distributed over several virtual agents in the form of a dialogue. It is particularly important to endow the virtual agents of a presentation team with emotions, since this makes them more distinguishable, and distinct virtual agents can better represent different roles and opposing points of view. A presentation team is also more advantageous than a single virtual agent, since its performance is more entertaining for the audience, provides better understanding, and improves recall of the presented information.

An additional advantage of virtual commentary agents is that they can run locally on a user’s computer. Hence, the commentary can be partly customized, since the user can adjust its basic settings. It is therefore a good idea to employ virtual agents as a presentation team to comment on a sports event.

1.2 GALA 2009 Challenge

In this section, we will introduce our task, which was given as the GALA 2009 1 challenge (Gathering of Animated Lifelike Agents). The GALA event is part of the annual International Conference on Intelligent Virtual Agents (IVA) 2. The aim of GALA is to encourage students to implement a system that provides behaviourally complex commentary on a continuous stream of events in real-time. The challenge of GALA 2009 was to provide commentary on a tennis game given as an annotated video. The GALA challenge in previous years was to comment on a horse race produced by a horse race simulator.

The events that occur in the video of a tennis game are manually annotated with the

ANVIL tool [12] and stored into an ANVIL file. The ANVIL file contains timestamped

events that are grouped into tracks where each track contains events that have the same

source, namely, we have one track for the ball and one track for each player. Table 1.1

contains all events that can be annotated.

Each event is further specified with the place on a tennis court where it happened.

Table 1.2 contains attributes that specify the position of a ball or a player and Figure

1.1 depicts these tags in the picture of a tennis court.

1 http://hmi.ewi.utwente.nl/gala
2 http://iva09.dfki.de/


Player events      Ball events
throw              shot
serve              cross net
forehand           hit net
backhand           hit tape
forehand-volley    bounce
backhand-volley    fault
smash              out
miss

Table 1.1: Tennis Events

Position side   Position longitudinal   Position lateral   Position height
server          net                     left               low
receiver        mid court               middle             middle
                baseline                right              high

Table 1.2: Event Position Specification

Figure 1.1: Event Position Specification

Each event, together with its timestamp and position specification, forms a track element. Table 1.3 lists the attributes of each type of track element.


Ball track element      Player track element
timestamp               timestamp
ball event              player event
position lateral        position lateral
position longitudinal   position longitudinal
position side
position height

Table 1.3: Track Element Specification

Figure 1.2: Example of an ANVIL File

Figure 1.2 shows two excerpts from an ANVIL file. The left column is an example of a ball track and the right column is an example of a track of the first player. As we can see, each track consists of track elements, where each track element represents one event. Furthermore, each track element has a start time and an end time. While the start time of an event corresponds to its timestamp, the end time can be omitted, since all events can be considered instantaneous. The ball track describes that the ball was shot from the right side of the baseline on the server side at time 7.49 s, then crossed the net in the middle, bounced in the middle of the mid-court on the receiver side, and was then shot back from the right side of the baseline. The player track describes that the player throws the ball up on the right side of the baseline at time 7.4 s and then serves. Later, the player plays a forehand from the right side of the baseline and then a backhand from the left side of the baseline.
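As a minimal illustration of this structure, the ball track of Figure 1.2 can be modelled as a list of track elements. This is a hypothetical Java sketch: the real ANVIL files are XML, the field names are illustrative, and all timestamps after 7.49 s are invented.

```java
import java.util.List;

// Hypothetical, simplified view of one ANVIL track. Each track element is
// reduced to its event name, timestamp, and two position attributes
// (cf. Tables 1.2 and 1.3); field names are not the actual ANVIL schema.
public class BallTrack {
    public record TrackElement(double start, String event,
                               String lateral, String longitudinal) {}

    // The ball track of Figure 1.2: a shot at 7.49 s from the right of the
    // baseline, then cross net, bounce, shot. Times after 7.49 are invented.
    public static List<TrackElement> exampleBallTrack() {
        return List.of(
            new TrackElement(7.49, "shot",      "right",  "baseline"),
            new TrackElement(7.9,  "cross net", "middle", "net"),
            new TrackElement(8.3,  "bounce",    "middle", "mid court"),
            new TrackElement(8.8,  "shot",      "right",  "baseline"));
    }

    // Events are instantaneous, so the start time alone is the timestamp.
    public static double timestampOf(TrackElement e) { return e.start(); }
}
```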


1.3 IVAN System

In this section, we will introduce the IVAN system (Intelligent Interactive Virtual Agent Narrators) [13], which we have developed to produce affective, behaviourally complex commentary on a continuous sports event in real-time. The system was employed to comment on a tennis game given as the GALA 2009 challenge. We have employed a presentation team in the sense of André and Rist [6], in our case two virtual agents with different roles (TV commentator, expert) that reflect two different presentation styles, attitudes toward the players (positive, neutral, negative), and personality profiles, to jointly comment on a tennis game. One virtual agent can interrupt the other or himself/herself when a more important event happens. The system also integrates background knowledge about the players and the tournament. Moreover, the user can fire one of the pre-defined questions at any time. We have focused on the knowledge processing, dialogue planning, and behaviour control of the virtual agents, and have used commercial software as the audio-visual component of the system.

The IVAN system consists of several modules that run in separate threads and communicate via shared queues. We employed an HTN planner to generate dialogues, statecharts to simulate the basic states of the game, and an expert system to maintain the emotional state of each virtual agent. When the system starts, the tennis simulator reads an ANVIL [12] file containing the description of a tennis game and sends timestamped events (e.g. a player plays a forehand, the ball hits the net) to the input interface of the core system at the time they occur. The core system transforms these elementary events into low-level facts (e.g. which player just scored) that form the knowledge base for the HTN planner and the emotion module. Generated plans, which represent possible dialogues, are transformed into individual utterances and annotated with gestures. The current emotional state of a virtual agent is used to derive his/her facial expression. Annotated utterances, along with the corresponding facial expression tags, are sent to the audio-visual component that creates the multimodal output of the system.

While the system runs, our two virtual agents are engaged in dialogues to comment on the tennis game or on background facts. A virtual agent is happy if his/her favourite player is doing well and unhappy if that player is losing. A virtual agent comments in a positive way on a player s/he likes and on events that lead toward the victory of his/her favourite player, and in a negative way on a player s/he dislikes and on events that hinder that victory. The current affect of a virtual agent is conveyed by lexical selection, facial expression, and gestures.
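The event-to-fact step of the pipeline above can be sketched as follows. The class, method, and fact names are assumptions for illustration, not the actual IVAN interfaces.

```java
// Hedged sketch of transforming an elementary ball event into a low-level
// fact: given the ball event and which player hit last, derive who scored.
// The fact syntax "scored(playerN)" is invented for this example.
public class FactDeriver {
    // If the ball lands out, faults, or hits the net, the point goes to
    // the opponent of the player who hit it last.
    public static String deriveScoringFact(String ballEvent, int lastHitter) {
        switch (ballEvent) {
            case "out":
            case "fault":
            case "hit net":
                int scorer = (lastHitter == 1) ? 2 : 1;
                return "scored(player" + scorer + ")";
            default:
                return null; // this event produces no scoring fact
        }
    }
}
```

In the real system such facts would be asserted into the knowledge base shared by the HTN planner and the emotion module.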


1.4 Research Aims

In this section, we will describe our four main research aims. They will be discussed in section 6.2 (Evaluation in Terms of Research Aims) after we describe the architecture of the whole system.

• Dialogue Planning for Real-time Commentary

In this master’s thesis, we wanted to investigate how an HTN planner can be employed to generate real-time commentary on a continuous sports event in the form of a dialogue between two virtual agents. An example of a real-time commentary system that uses an expert system to control one virtual agent is ERIC [10]; however, it may be too reactive: individual utterances are uttered at particular knowledge states, so ERIC cannot generate larger contributions. In addition, an expert system cannot generate alternative plans, so HTN planning offers more variability. We therefore wanted to examine an HTN planner, which is expected to be a good strategy for generating elaborate, large, and coherent dialogue contributions.

• Reactivity

The system should be able to react quickly to new events that happen during the tennis game. Moreover, when an event happens that is more important than the one the virtual agents are currently commenting on, the system should be able to interrupt the current discourse and comment on the new event. The interruption should be graceful, with a smooth transition.

• Behavioural Complexity

The virtual agents of our presentation team should ideally behave like human tennis commentators and produce interesting, suitable, and believable commentary. They should use the whole range of communication channels to convey facts about the tennis game. They should generate a variety of dialogues along with synchronized hand and body gestures, and show facial expressions appropriate to their current emotional states. Moreover, if we allow the user to interact with the system, the system becomes more engaging. Behavioural complexity ensures the believability of the virtual characters; without the above-mentioned traits, the virtual agents would look unrealistic.

• Affective Behaviours

The virtual agents should react affectively to the events that occur in the tennis game according to their (positive, neutral, or negative) attitudes toward the players. Their emotional state should be derived from appraisals of the events that happen during the game, and their current affect should be conveyed by lexical selection, facial expression, and gestures. Endowing virtual agents with emotions increases their believability and makes them better accepted by users.
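The pre-emption behaviour described under Reactivity can be sketched as a simple priority test; the event names, priority values, and the API below are assumptions, not IVAN's actual design.

```java
// Hypothetical sketch: a new event interrupts the current discourse only
// if its priority exceeds that of the topic currently being discussed.
public class InterruptSketch {
    public record Event(String name, int priority) {}

    private Event current = new Event("idle", 0);

    // Returns true if the new event should interrupt the current discourse.
    public boolean onEvent(Event e) {
        if (e.priority() > current.priority()) {
            current = e;      // abandon the old topic, comment on the new one
            return true;
        }
        return false;         // keep commenting; the new event can wait
    }
}
```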


Chapter 2

Related Work

In this chapter, we will describe several examples of virtual agent applications that are relevant to our work. We will introduce ERIC, an affective, rule-based sports commentary agent that won GALA 2007 as a horse race commentator. We will also present DEIRA, another horse race reporter. Then we will present the project Spectators, which participated in GALA 2009 (see section 1.2); it employs several autonomous affective virtual agents that jointly watch a tennis game as ordinary tennis spectators. To introduce HTN planning (see section 3.1), which we have employed in our system to generate dialogues, we will describe STEVE, which uses HTN planning to help students perform physical procedural tasks in a 3D simulated work environment. Since we employed a presentation team [6] in our system, we will also describe the general design of presentation teams and two applications that employ them.

2.1 ERIC

ERIC [10, 14] won GALA 2007 1 as a horse race commentator. ERIC is a generic rule-based framework for affective real-time commentary developed at DFKI. The system was tested in two domains, a horse race and a tank battle game, where the horse race was given in the form of a horse race simulator supplied by GALA 2007. The simulator sends the speed and position of each horse to ERIC every second via a socket. ERIC receives events from the horse race simulator and produces coherent natural language along with non-verbal behaviour. The visual output is represented by a virtual agent that has lip movement synchronized to speech, can express various facial expressions, and can perform many different gestures. ERIC employs the same avatar engine as our system. The graphical output of ERIC is shown in Figure 2.1.

1 http://hmi.ewi.utwente.nl/gala/finalists 2007/


Figure 2.1: ERIC commenting on a Horse Race

ERIC consists of several modules. We will describe the two most interesting ones, the Affect module and the Natural Language Generation module, in detail.

2.1.1 The Affect Module

The affect module receives facts from the world and assigns appraisals to each event, action, and object according to goals, desires, and cause-effect relations. The appraisal of an event, action, or object is then sent in the form of a specific tag to the ALMA module [15], which maintains the commentator’s affective state. ALMA considers three types of affect: emotions (short-term), mood (medium-term), and personality (long-term). Emotions are bound to specific events and decay over time. Mood represents the average of the emotional state across time. Personality is defined by the Big Five [16]: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Personality is used to compute the initial mood and influences the intensity and decay of emotions. The affective state of a virtual agent influences the selection of utterances, gestures, and facial expressions.
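The three time scales can be illustrated with a minimal sketch, assuming a simple exponential decay for emotions, a sliding-window average for mood, and a hypothetical mapping from one personality trait (neuroticism) to the decay rate; none of these constants come from ALMA itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of ALMA's three time scales: emotions decay per tick,
// mood averages recent emotion intensity, and a personality trait scales
// the decay. All constants and the trait mapping are assumptions.
public class AffectSketch {
    private final double decayRate;          // influenced by personality
    private double emotion = 0.0;            // short-term, bound to events
    private final Deque<Double> history = new ArrayDeque<>();

    public AffectSketch(double neuroticism) {
        // Hypothetical mapping: more neurotic -> emotions decay more slowly.
        this.decayRate = 0.5 - 0.3 * neuroticism;
    }

    // An appraised event raises the current emotion intensity.
    public void appraise(double intensity) { emotion += intensity; }

    // Called once per time step: decay the emotion, record it for the mood.
    public void tick() {
        emotion *= (1.0 - decayRate);
        history.addLast(emotion);
        if (history.size() > 10) history.removeFirst();
    }

    // Mood as the average of the recent emotional state.
    public double mood() {
        return history.stream().mapToDouble(Double::doubleValue)
                      .average().orElse(0.0);
    }
}
```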

2.1.2 The Natural Language Generation Module

This module uses a template-based algorithm to generate utterances. Each template corresponds to a rule in a rule-based engine. Each such rule has conditions that can be partitioned into four groups: facts that must be known, facts that must be unknown, facts that must be true, and facts that must be false. For each template there is at least one utterance, containing flat text and slots for variables. First, all candidate templates are generated, then the corresponding utterances are retrieved, and finally one of the most coherent utterances is chosen. Discourse coherence is ensured by Centering Theory [17], which, in a simplified reading, says that a discourse is coherent if every two consecutive utterances are coherent. Thus, each template defines its topic and a list of all possible topics for a coherent following sentence. After a template has been chosen, the next template is chosen so that its topic is among the possible follow-up topics of the last one.
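A minimal sketch of this selection scheme follows, reduced to two of the four condition groups for brevity; all names are illustrative, not ERIC's actual API.

```java
import java.util.Set;

// Sketch of ERIC-style template conditions and topic-based chaining: a
// template fires if its required facts hold, and the next template must
// pick up one of the previous template's follow-up topics.
public class TemplateSketch {
    public record Template(String topic, Set<String> nextTopics,
                           Set<String> mustBeKnown, Set<String> mustBeTrue) {}

    // A template is a candidate if the knowledge base satisfies its
    // conditions (the must-be-unknown/false groups are omitted here).
    public static boolean fires(Template t, Set<String> known,
                                Set<String> truths) {
        return known.containsAll(t.mustBeKnown())
            && truths.containsAll(t.mustBeTrue());
    }

    // Centering-style coherence: the next topic must be among the
    // previous template's possible follow-up topics.
    public static boolean coherentAfter(Template prev, Template next) {
        return prev.nextTopics().contains(next.topic());
    }
}
```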

This system is most closely related to our work since the overall goal of ERIC is the

same as ours. A comparison of the IVAN system and ERIC is given in section 6.1.

2.2 DEIRA

DEIRA [11] (Dynamic Engaging Intelligent Reporter Agent) is another commentary

agent that participated in GALA 2007 2 as a horse race reporter. DEIRA employs an

expert system to generate affective commentary in real-time. The system maintains

the affective state of the reporter according to his personality and events that occur in

the horse race. The current affect is represented by a vector of four values (tension,

surprise, amusement, pity) and is conveyed by lexical selection and facial expression of

the reporter. The graphical output of the system is shown in Figure 2.2.

Figure 2.2: DEIRA (Dynamic Engaging Intelligent Reporter Agent)

2http://hmi.ewi.utwente.nl/gala/finalists 2007/
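The four-value affect vector described above might be maintained as in this minimal Python sketch; the update rules and numbers are invented, not DEIRA's actual model:

```python
# A four-dimensional affect vector: events bump individual dimensions,
# clamped to [0, 1], and all dimensions decay back toward neutral over time.

DIMS = ("tension", "surprise", "amusement", "pity")

def neutral():
    return {d: 0.0 for d in DIMS}

def on_event(affect, deltas):
    """Apply an event's appraisal to the affect vector, clamped to [0, 1]."""
    for d, v in deltas.items():
        affect[d] = min(1.0, max(0.0, affect[d] + v))
    return affect

def decay(affect, rate=0.5):
    """Let every dimension fall back toward neutral."""
    for d in DIMS:
        affect[d] *= rate
    return affect

a = on_event(neutral(), {"tension": 0.8, "surprise": 0.4})
a = decay(a)
```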


2.3 Spectators

Project Spectators [18] participated in GALA 2009 3 (see section 1.2). The system

consists of several autonomous virtual agents that are watching a tennis game. The

spectators can have different attitudes towards the teams, where an attitude can be positive
or neutral. Each spectator has a euphoria factor that determines how much a spectator's
mood state changes when an important event happens in the tennis game. The
euphoria factor represents a spectator's personality trait. The mood of a spectator

is expressed by his facial expression, typical animations, and speech. The spectators’

moods are as follows: euphoric, happy, slightly happy, neutral, slightly sad, sad, and

disappointed. Furthermore, the position of the ball is interpolated so that the spectators

can gaze at the ball within a rally. Also the voice of a referee is incorporated to utter

the score in a conventional way.
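The euphoria-factor mechanism can be illustrated with a short Python sketch; the thresholds and scaling below are illustrative assumptions, not the project's actual values:

```python
# An important event shifts a spectator's mood value, scaled by that
# spectator's euphoria factor; the continuous mood value is then mapped
# onto one of the seven discrete mood labels listed above.

MOODS = ["disappointed", "sad", "slightly sad", "neutral",
         "slightly happy", "happy", "euphoric"]

def mood_label(value):
    """Map a mood value in [-1, 1] onto one of the seven labels."""
    idx = round((value + 1.0) / 2.0 * (len(MOODS) - 1))
    return MOODS[int(idx)]

def on_event(mood, importance, euphoria, favours_my_team=True):
    """Shift the mood by the event importance, scaled by euphoria."""
    sign = 1.0 if favours_my_team else -1.0
    mood += sign * importance * euphoria
    return max(-1.0, min(1.0, mood))

calm = on_event(0.0, importance=0.5, euphoria=0.2)
excitable = on_event(0.0, importance=0.5, euphoria=0.9)
```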

However, the system focuses only on non-verbal behaviour, i.e., neither the spectators
nor the referee comment on the game in the manner of tennis commentators. The system
essentially consists of a limited set of rules that trigger the respective animations. Thus, our
system and the Spectators project could be combined to generate a complex scene of a
tennis game with both tennis commentators and spectators.

2.4 STEVE

STEVE (Soar Training Expert for Virtual Environments) [3] is a sample application that

uses the same method as our system to control the behaviour of virtual agents, namely
HTN planning (see section 3.1). STEVE is a virtual agent that helps students

to perform physical procedural tasks in a 3D simulated student’s work environment.

STEVE can either demonstrate procedural tasks or monitor students while they are

performing tasks and provide assistance if they need help or ask questions. Each task

consists of a set of partially ordered steps, where a step can be a primitive or a
composite action; this creates a hierarchical structure in which some steps of a task
can also be reused to solve other tasks. Therefore, STEVE employs the Hierarchical

Task Network to define particular tasks. STEVE consists of the perception, cognition,

and motor control module. The perception module monitors the state of the virtual

world and maintains its coherent representation. In each loop of the decision cycle of

the cognition module, the cognition module gets the current snapshot of the world from

the perception module, chooses appropriate goals, and then constructs and executes

plans. The motor control module gets high level commands from the cognition module

3http://hmi.ewi.utwente.nl/gala/finalists 2009/


to control the voice, locomotion, gaze, gestures and objects manipulation. The graphical

output of STEVE is shown in Figure 2.3.

Figure 2.3: STEVE in a 3D Simulated Student’s Work Environment
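The perceive-decide-act cycle described above can be sketched in Python; the world model and the task are invented for illustration:

```python
# Each iteration takes a snapshot of the world (perception), picks the next
# useful step of the current task (cognition), and executes it (motor control).

def perceive(world):
    """Return a snapshot of the current world state."""
    return dict(world)

def decide(snapshot, task):
    """Pick the first step of the task whose effect is not yet established."""
    for step, establishes in task:
        if not snapshot.get(establishes, False):
            return step
    return None  # task accomplished

def act(world, step):
    """Execute a primitive step by making its effect true in the world."""
    world[step] = True

# task: ordered steps as (step_name, fact_it_establishes)
task = [("open_valve", "open_valve"), ("press_button", "press_button")]
world = {}
executed = []
while True:
    step = decide(perceive(world), task)
    if step is None:
        break
    act(world, step)
    executed.append(step)
```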

Our system, like STEVE, uses an HTN planner to generate speech and can respond to
user questions. We were also inspired by STEVE's execution cycle and

the concept of snapshots of the world. In comparison to STEVE, our system employs two

virtual agents, maintains their affective states, and generates affective commentary. On

the other hand, our system generates shorter contributions, it does not have elaborate

user interaction, and our virtual agents cannot move in the virtual environment.

2.5 Presentation Teams

We employed a presentation team [6, 7, 8, 9] in our system to comment on a tennis

game. In this section, we briefly describe the general design of presentation teams and

then focus on two projects that employ them. The first project is Inhabited Marketplace

where a car seller and customers have different preferences (e.g. running costs, prestige)

and character profiles. They are engaged in dialogues to discuss different attributes of

a car that the customers are interested in. The second project is Rocco II where two

soccer fans that can have different attitudes to the teams and character profiles jointly

watch a RoboCup soccer game and comment on it.


2.5.1 Design of Presentation Teams

The idea of presentation teams is to automatically generate presentations on the fly. A

presentation team consists of at least two virtual agents to convey information in style

of a performance to be observed by a user. This approach is believed to be more enter-

taining and provide better understanding than a system with only one presenter. The

virtual agents’ roles, character profiles, and dialogue types are chosen in dependence

on the discourse purpose. Moreover, the characters should be distinguishable, i.e., they

should have different audio-visual appearances, expertise, interests, and personalities. Distinct
agents can also better express opposing roles. There are two basic approaches
to generating the dialogue [19]. Agents with scripted behaviour correspond to actors
of a play who can still improvise a little at performance time, i.e., their behaviour is
first generated as a script (containing slots for variables that can be substituted at
runtime) and later on executed. In contrast to agents with scripted behaviour,

the autonomous agents have no script, thus, they generate the dialogue contributions

on the fly, i.e., they pursue their own communicative goals and react to the dialogue

contributions of the other characters. First, we present a project that employs agents
with scripted behaviour, and then a project that employs autonomous agents.

2.5.2 Inhabited Marketplace

The Inhabited Marketplace project employs a presentation team to present facts along

with an evaluation under constraints. Each character’s profile is defined by agreeable-

ness (agreeable, neutral, disagreeable), extraversion (extravert, neutral, introvert) and

valence (positive, neutral, negative). The presentation team consists of a car seller and
customers, where each of them can prefer a different dimension (e.g. environment, economy,

prestige, or running costs). The aim of each customer is to discuss all attributes that

have positive or negative impact on a dimension they are interested in. Furthermore, the

dialogue is also driven by the characters’ personality traits, e.g., an extrovert will start

the conversation or an introvert will use less direct speech. The dialogue is generated

by an HTN planner (see section 3.1), i.e., the goal task is successively decomposed by

planning methods into individual utterances. An example of a planning method that

represents a particular dialogue scheme is shown in Figure 2.4. The method represents

a scenario where two agents discuss a feature of an object. It applies if the feature has a

negative impact on any dimension and if this relationship can be easily inferred. Thus,

any disagreeable buyer produces a negative comment referring to this dimension, e.g.,

to the dimension running costs considering facts contained in Figure 2.5.


Figure 2.4: Example of a Planning Method (Dialogue Scheme) to Discuss an Attribute Value

Figure 2.5: Excerpt of the Domain Knowledge

2.5.3 Rocco II

Gerd and Metze are two soccer fans that comment on a RoboCup soccer game. They can

have different attitudes to the teams and their character profile is defined by extraversion

(extravert, neutral, introvert), openness (open, neutral, not open) and valence (positive,

neutral, negative). The project focuses on the following dispositions: arousal (calm,

neutral, excited) and valence. The system performs incremental event recognition [20],
proceeding from a high-level analysis of the scene, via recognized events, to the basis for
the commentary, which additionally contains background knowledge about the game and

teams. The system employs two autonomous agents that use template based natural

language generation to produce the commentary on the fly. Furthermore, an agent can

interrupt himself if a more important event happens. The templates are strings with slots

for variables. Each template contains several tags, for instance: verbosity (the number

of words), bias (positive, neutral, negative), formality (formal, normal, colloquial) and

floridity (dry, normal, flowery language). The candidate templates are filtered in four

steps in the execution cycle:


1. pass only short templates in the case of time pressure

2. eliminate templates used recently

3. pass only templates expressing the speaker’s attitude

4. choose templates according to the speaker’s personality
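The four filtering steps can be sketched as a pipeline in Python. The template records, tags, and the word-count threshold below are invented for illustration, and step 4 is reduced to a trivial choice:

```python
# Candidate templates are filtered step by step: short ones under time
# pressure, then not recently used, then matching the speaker's attitude,
# and finally one survivor is chosen.

templates = [
    {"text": "Goal!", "verbosity": 1, "bias": "positive", "recent": False},
    {"text": "An absolutely magnificent strike by the attacker!",
     "verbosity": 7, "bias": "positive", "recent": False},
    {"text": "Oh no, they scored.", "verbosity": 4, "bias": "negative",
     "recent": False},
    {"text": "Goal for them.", "verbosity": 3, "bias": "negative",
     "recent": True},
]

def select(candidates, time_pressure, attitude, max_words=5):
    # 1. under time pressure, pass only short templates
    if time_pressure:
        candidates = [t for t in candidates if t["verbosity"] <= max_words]
    # 2. eliminate templates used recently
    candidates = [t for t in candidates if not t["recent"]]
    # 3. pass only templates expressing the speaker's attitude
    candidates = [t for t in candidates if t["bias"] == attitude]
    # 4. personality-based choice (simplified: pick the first survivor)
    return candidates[0]["text"] if candidates else None

happy = select(templates, time_pressure=True, attitude="positive")
upset = select(templates, time_pressure=True, attitude="negative")
```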

The agents’ emotions are influenced by the current state of the game. Emotions can

be expressed by the speed and pitch range of the speech along with different hand and

body gestures. The graphical output of the system is shown in Figure 2.6.

Figure 2.6: Gerd and Metze Commenting on a RoboCup Soccer Game

Similar to our system, Gerd and Metze can have different attitudes towards the teams
(players) and different personality profiles, and the system integrates background knowledge
about the game and teams and allows interruptions. In contrast to our system, Rocco II
employs two autonomous agents that use template-based natural language generation to
produce the commentary on the fly. While our templates can be categorized only according to bias, in the Rocco II

project they use a wide range of different templates that are categorized according to
verbosity, bias, formality, and floridity. Thus, the system can generate more reactive and

elaborate commentary than our system. The system also maintains the emotional state

of the virtual agents which can be expressed by prosody, and hand and body gestures.

On the one hand, our system does not integrate prosody; on the other hand, our virtual

agents have more elaborate facial expressions and gestures.


Chapter 3

Methods for Controlling

Behaviour of Virtual Agents

In this chapter, we will introduce three basic methods for controlling the behaviour of

virtual agents that we have employed in our system. The most important method is

HTN planning, which we have employed to generate the dialogues for our presentation

team (see section 4.1). The second method is expert systems, which we have used to
define emotion-eliciting conditions in the emotion module (see section 4.2.3). The third
method is statecharts, where we have used three simple finite state machines to model
the basic states of the system (see section 5.3.1). Let us note that all these methods can be

used separately for natural language generation (e.g. see ERIC in section 2.1, which
uses an expert system).

3.1 Hierarchical Task Network Planning

In our system, we have employed Hierarchical Task Network (HTN) planning to

generate the dialogues for our presentation team (see section 4.1). In general, planning

is employed for problem solving and can be applied in many different domains to
save time and money, e.g., in air transport, flight control, the control of space probes,
military missions, the maintenance of complex machines (e.g. submarines), relief efforts
after natural disasters, or tutoring systems (e.g. see STEVE in section 2.4) [21].

HTN planning is a variant of automated planning. First, we will introduce
STRIPS-Like planning [22] (where STRIPS stands for Stanford Research Institute Problem
Solver) and then compare it to HTN planning. The input of a STRIPS-Like

planner consists of a set of facts that describe the initial state of the world, a set of goal


facts, and a set of planning operators that correspond to actions that can modify the

current state of the world. Let us denote the set of facts that describe the current state

of the world as a Base. A planning operator has a list of preconditions, a delete list,

and an add list. A planning operator can be applied if its preconditions are contained

in the Base. After a planning operator is applied, all facts that are in its delete list are

deleted from the Base and all facts that are in its add list are added to the Base. The

STRIPS-Like planner reaches the goal state of the world if the Base contains all goal

facts. After the planner is started, it is searching for a sequence of planning operators

that successively change the initial state of the world to its goal state. The output of

the planner is a plan (or a list of all possible plans) that consists of a list of planning

operators such that if we successively apply these operators to the initial state of the

world, we get the goal state of the world. While a STRIPS-Like planner can try to

apply any planning operator at any step of the planning process to reach the goal state

of the world, an HTN planner can only try to apply planning operators that are defined

in the HTN at a particular step of the planning process.
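The STRIPS-like behaviour described above can be sketched in Python with a toy domain; the operators and facts are invented for illustration, and the search is a naive depth-first search with a depth cap:

```python
# Operators with preconditions, delete lists, and add lists are applied to a
# base of facts; a plan is a sequence of operators turning the initial state
# into one containing all goal facts.

def applicable(base, op):
    return op["pre"] <= base

def apply_op(base, op):
    return (base - op["delete"]) | op["add"]

def forward_search(base, goal, operators, plan=()):
    """Depth-first search for one plan (each operator used at most once)."""
    if goal <= base:
        return list(plan)
    if len(plan) >= 5:
        return None
    for op in operators:
        if applicable(base, op) and op["name"] not in plan:
            result = forward_search(apply_op(base, op), goal,
                                    operators, plan + (op["name"],))
            if result is not None:
                return result
    return None

ops = [
    {"name": "pick_up", "pre": {"hand_empty", "on_table"},
     "delete": {"hand_empty", "on_table"}, "add": {"holding"}},
    {"name": "put_in_box", "pre": {"holding"},
     "delete": {"holding"}, "add": {"in_box", "hand_empty"}},
]
plan = forward_search({"hand_empty", "on_table"}, {"in_box"}, ops)
```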

HTN planning is based on task decomposition, i.e., compound tasks are decomposed
into subtasks, where each subtask is either a compound task on a lower level of

the planning hierarchy or a primitive task that corresponds to an action that can be

executed in the real world. Let us note that the primitive tasks in the HTN planning

correspond to the planning operators in the STRIPS-Like planning. The description

of the world (called planning domain in the HTN planning terminology) is given as a

Hierarchical Task Network and the planning goal (called planning problem) is given as

a list of goal tasks and a list of facts that describe the initial state of the world. The

resulting plan is a list of primitive tasks such that if we successively perform these prim-

itive tasks we accomplish the goal tasks. In the following text, we will show an example

of a planning task, introduce JSHOP1 as an implementation of an HTN planner that

we have employed in our system to generate the dialogues for our presentation team (see

section 4.1), and finally we will define some basic constructs of the JSHOP language.

3.1.1 Example of a Planning Task

Let us consider an example of a planning task that is depicted in Figure 3.1 to demon-

strate a typical task for an HTN planner. [23] There is a Hierarchical Task Network that

represents how to travel from x to y, more precisely, how to accomplish the goal
task travel(x,y). We can either take a taxi for a short distance or fly by air for
a long distance. (There might also be other ways to travel that we do not consider
here.) Thus, to accomplish the compound goal task travel(x,y) we have to fulfil one of

1JSHOP2 (Java Simple Hierarchical Ordered Planner) http://www.cs.umd.edu/projects/shop/


its compound subtasks, namely, travel by taxi or travel by air. In the first case (travel

by taxi) we must first get a taxi, then ride the taxi from x to y and finally pay for it. In

the second case (travel by air) we must first buy a ticket from airport(x) to airport(y),

then travel from x to airport(x), fly by air from airport(x) to airport(y) and eventually

travel from airport(y) to y. Thus, to fulfil a compound task travel by taxi or travel by air

we have to satisfy all its respective subtasks. Let us note that after the planner starts,

it first finds out whether it is possible to travel by taxi and, if not, it backtracks and tries
the option of travelling by air.

Figure 3.1: Example of a Planning Task - HTN

The resulting plan how to travel from the UMD (University of Maryland) to the MIT

is depicted in Figure 3.2. First we have to buy a ticket from the BWI (Baltimore

Washington International) airport to the Logan airport, then take a taxi from the UMD

to the BWI airport, then fly by air from the BWI airport to the Logan airport, and

finally take a taxi from the Logan airport to the MIT.

Figure 3.2: Example of a Planning Task - generated Plan
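The decomposition above can be sketched in Python, with methods encoded as ordinary functions. The distances and the taxi/air decision rule (taxi below 300 miles, otherwise air) are invented for illustration:

```python
# Compound tasks are reduced by methods until only primitive tasks remain;
# the resulting list of primitive tasks is the plan.

DIST = {("UMD", "BWI"): 20, ("BWI", "Logan"): 400, ("Logan", "MIT"): 5,
        ("UMD", "MIT"): 450}
AIRPORT = {"UMD": "BWI", "MIT": "Logan"}

def distance(x, y):
    return DIST.get((x, y)) or DIST.get((y, x))

def travel(x, y):
    """Decompose travel(x, y) into a list of primitive tasks."""
    if distance(x, y) < 300:                    # method: travel by taxi
        return [("get_taxi",), ("ride", x, y), ("pay",)]
    ax, ay = AIRPORT[x], AIRPORT[y]             # method: travel by air
    return ([("buy_ticket", ax, ay)]
            + travel(x, ax) + [("fly", ax, ay)] + travel(ay, y))

plan = travel("UMD", "MIT")
```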


3.1.2 Java Simple Hierarchical Ordered Planner (JSHOP)

In the following text, we will introduce the Java Simple Hierarchical Ordered Planner

(JSHOP)2 [24, 25], which is the implementation of an HTN planner that we have employed

in our system. JSHOP is a Java implementation of a domain-independent Hierarchical

Task Network (HTN) planner, developed at the University of Maryland, that is based on

ordered task decomposition. The planning is conducted by problem reduction, i.e., the

planner recursively decomposes tasks into subtasks and stops when it reaches primitive

tasks that can be performed directly by planning operators. The compound task de-

composition is realized by methods that define how to decompose compound tasks into

subtasks. Since there may be more than one method that can be applied to a compound

task, the planner can backtrack, i.e., it can try more than one method to decompose a

compound task. As a consequence, the planner can find more than one suitable plan.

The Input of JSHOP consists of a description of a planning domain and a planning

problem. The planning domain constitutes the world description, i.e., it consists of planning

methods, planning operators and axioms. The planning problem consists of a list of

tasks and a list of facts that hold in the initial state of the world. The planning domain

description is stored in a domain file and the problem description in a problem file.

The Output of JSHOP is a list of suitable plans where each plan consists of a list of

primitive tasks and each primitive task corresponds to an action that can be executed

in the real world (e.g. utter an utterance or move object O from place X to place Y ).

Figure 3.3: JSHOP Input Generation Process

To Run the Planner, we first have to generate Java code from the respective domain
and problem files, which are written using a special Lisp-like syntax. JSHOP is implemented

2JSHOP2 (Java Simple Hierarchical Ordered Planner) http://www.cs.umd.edu/projects/shop/


in this way since this approach allows certain optimizations to be performed and produces
Java code that is tailored to a particular domain and problem description [26]. See

Figure 3.3. (The generated Domain Description Java file is compiled with the Domain-

Independent Templates which results in a Domain-Specific Planner. The generated Java

Problem file is compiled as well. At the end, we can run the planner that outputs all

possible Solution Plans.)

3.1.3 JSHOP Language

In the following text, we will describe the most important JSHOP constructs, namely:

axioms, planning operators, and planning methods. See the JSHOP manual [27] for

more details on the whole syntax of the language. JSHOP contains many constructs

characteristic of an HTN planner (e.g. symbols, terms, call terms, logical atoms, logical

expressions, implication, universal quantification, assignment, call expressions, logical

preconditions, task atoms, task list, axioms, operators, and methods). Furthermore, it

is possible to write user defined functions in Java.

Axioms

An axiom is an expression of the form:

(:- a [name1] L1 [name2] L2 ... [namen] Ln)

where the head of the axiom is a logical atom a and its tail is a list of pairs (name,
logical precondition); a is true if L1 is true, or if L1, ..., Lk−1 are false and Lk is true
(for some k ≤ n). The name of a logical precondition is optional; however, it can improve

readability. Figure 3.4 shows an example of an axiom. A place ?x is within walking distance
if the weather is good and ?x is within two miles of home, or if the weather is
bad and ?x is within one mile of home.

Figure 3.4: Sample JSHOP Axiom
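The ordered evaluation of an axiom's preconditions can be sketched in Python; the facts and the walking-distance example follow the figure, but the encoding is invented for illustration:

```python
# The axiom head holds if some precondition L_k is satisfied and all
# preconditions before it are not; evaluating the branches in order and
# stopping at the first satisfied one implements exactly this semantics.

def axiom_holds(preconditions, facts):
    """Ordered evaluation: return True at the first satisfied branch."""
    for precondition in preconditions:
        if precondition(facts):
            return True
        # this branch failed; fall through to the next one
    return False

def walking_distance(x):
    """Branches of the walking-distance axiom for place ?x."""
    return [
        lambda f, x=x: f["weather"] == "good" and f["distance"][x] <= 2,
        lambda f, x=x: f["weather"] == "bad" and f["distance"][x] <= 1,
    ]

facts = {"weather": "good", "distance": {"park": 1.5, "mall": 3.0}}
near = axiom_holds(walking_distance("park"), facts)
far = axiom_holds(walking_distance("mall"), facts)
```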


Operators

An operator has the following form:

(:operator h P D A [c])

where h is the operator’s head; P is the operator’s precondition; D is the operator’s

delete list; A is the operator’s add list; c is the operator’s cost where the default cost

is 1. Let us denote the set of facts that describe the current state of the world as the

facts base. The operator can be applied if the preconditions in P are satisfied. After the

operator has been applied, all facts contained in D are deleted from the facts base and

all facts contained in A are added to the facts base. Figure 3.5 shows an example of a

planning operator. We can drive a ?truck from a ?old-loc to a ?location if the ?truck

is at the ?old-loc. After the operator has been applied, the fact (at ?truck ?old-loc) is

deleted from the facts base and a new fact (at ?truck ?location) is added to the facts

base.

Figure 3.5: Sample JSHOP Operator

Methods

A method is a list of the form:

(:method h [name1] L1 T1 [name2] L2 T2 ... [namen] Ln Tn)

where h is the method’s head; each Li is a precondition; each Ti is a list of tasks; each

namei is a respective optional name. The compound task specified by the method can

be performed by performing all tasks in the list Ti if the precondition Li is satisfied and

for all preconditions Lk such that k < i holds that they are not satisfied. Figure 3.6

presents an example of a method. The task specified by this method is to eat a ?food.

If we have a fork then we eat the ?food with a fork. If we do not have a fork but we

have a spoon then we eat the ?food with a spoon.


Figure 3.6: Sample JSHOP Method
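The first-applicable-branch semantics of a method can be sketched in Python using the eat example; the fact encoding is invented for illustration:

```python
# Method branches are tried in order; the first branch whose precondition
# holds supplies the task list that replaces the compound task.

def decompose_eat(food, facts):
    """Decompose the compound task (eat ?food): fork before spoon."""
    branches = [
        ("have-fork" in facts, [("eat_with_fork", food)]),
        ("have-spoon" in facts, [("eat_with_spoon", food)]),
    ]
    for condition, tasks in branches:
        if condition:
            return tasks
    return None  # no branch applicable: the method fails

with_fork = decompose_eat("soup", {"have-fork", "have-spoon"})
with_spoon = decompose_eat("soup", {"have-spoon"})
```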

3.2 Expert Systems

Expert systems can also be employed to generate commentary on a sports event, as
demonstrated by ERIC (see section 2.1). Nevertheless, we have employed an expert system

only in the emotion module to define emotion eliciting conditions (see section 4.2.3).

Expert systems are used in many domains to “replace” human experts. The know-how

of human experts is first stored in the system. Afterwards, the system can be queried
by users, who always get consistent answers. However, the disadvantage of such a
system is that it is not well suited to changing environments. Expert systems can be,

for instance, used in the following domains: financial services, accounting, production,

process control, medicine, or human resources. Examples of expert systems are CLIPS

(C Language Integrated Production System) [28] and its Java reimplementation
Jess (Java Expert System Shell) [29], which we have employed in our system.

Expert systems are used to reason about the world using some knowledge that consists

of facts and rules. While the facts describe the current world in terms of assertions, the

rules define how to modify the facts base (knowledge base), e.g., how to deduce new

facts from already known facts, where each rule has the form of an if-then clause. Let

us note that it is also possible to retract or modify facts as a result of a rule being fired.

The inference loop of a typical expert system consists of the following three steps:

1. Match the left hand side of the rules against facts and move matched rules onto

the agenda.

2. Order the rules on the agenda according to some conflict resolution strategy (e.g.

at random).

3. Execute the right hand side of the rules on the agenda in the order decided by

step (2).


The inference loop ends when no new facts can be inferred. After the inference process
ends, we know which rules have been fired and the facts base contains all initial and
inferred facts that have not been retracted. In the following text, we will present an
implementation of an expert system that we have employed in our system.
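The three-step loop above can be sketched as naive forward chaining in Python; the rules form a toy version of the cafeteria example below, and the conflict resolution strategy is simply definition order:

```python
# 1. match rules against the facts base, 2. order the agenda,
# 3. fire right-hand sides; repeat until no rule would add a new fact.

def run(initial_facts, rules):
    facts = set(initial_facts)
    fired = []
    while True:
        # 1. match: rules whose left-hand side is satisfied and that
        #    would still change the facts base
        agenda = [r for r in rules
                  if r["if"] <= facts and not r["then"] <= facts]
        if not agenda:
            break
        # 2. conflict resolution: keep definition order
        # 3. act: execute the right-hand sides
        for rule in agenda:
            facts |= rule["then"]
            fired.append(rule["name"])
    return facts, fired

rules = [
    {"name": "open_cafeteria", "if": {"lunchtime"},
     "then": {"food-available"}},
    {"name": "have_lunch", "if": {"hungry", "food-available"},
     "then": {"had-lunch"}},
]
facts, fired = run({"lunchtime", "hungry"}, rules)
```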

Java Expert System Shell (Jess)

Jess [29] is a fast Java implementation of an expert system developed at Sandia National

Laboratories. Although it has a rich Lisp-like syntax, we will show only two examples:
one that defines an unordered fact and another that defines a rule. See [29] for more

details on the complete syntax of the language.

Unordered Fact - Every fact corresponds to a particular template. The definition

of a template starts with a keyword deftemplate followed by a template name and an

optional documentation comment. The following template is an example of how to define
an automobile. The template contains four slots: the manufacturer (make), the model, the year
of production as an integer, and the color, where red is the default color.

(deftemplate automobile

"A specific car."

(slot make)

(slot model)

(slot year (type INTEGER))

(slot color (default red))

)

The following command asserts a concrete Volkswagen Golf that was produced in 2009

and is of the default red colour.

(assert (automobile (model Golf)(make Volkswagen)(year 2009)))

Rule - Consider the following templates. The first template defines an agent that has

a name and can be hungry; the second template defines the current time.

(deftemplate agent

"A hungry agent"

(slot name)

(slot hungry)

)


(deftemplate current_time

"The current time"

(slot ctime (type FLOAT))

)

The following commands assert an agent George, who is hungry, and the current time,
which is half past twelve.

(assert (agent (name George)(hungry TRUE)))

(assert (current_time (ctime 12.5)))

Consider the following rules that are chained.

(defrule open_cafeteria

(current_time {(12.0 <= ctime && ctime <= 14.0)})

=>

(assert (food-available))

)

(defrule have_lunch

?agent <- (agent (name ?name) (hungry TRUE))

(food-available)

=>

(modify ?agent (hungry FALSE))

(printout t ?name " had lunch." crlf)

)

The first rule opens a cafeteria if the current time is between 12 and 14, and asserts the

fact that the food is available at the moment. Thus, the rule fires since the current time

is 12.5 and adds a new fact (food-available) to the facts base. The second rule fires if

there is an agent that is hungry and the food is available. Hence, the second rule fires

as well, prints out “George had lunch.”, and modifies the respective fact (i.e. the slot
hungry is set to FALSE).

3.3 Statecharts

Another method that can be employed to control virtual agents is statecharts. In our

system, we have used finite state machines to maintain different states of the system


(see section 5.3.1). However, statecharts can also be used to generate speech. An
example of a tool that enables virtual agents to be controlled using statecharts is SceneMaker
[30]. A user can create an arbitrary statechart using SceneMaker to describe the behaviour

of virtual agents. In every node of a statechart, a scene is stored. A scene can, for

instance, describe a dialogue between two virtual agents, i.e., the scene is described in a

theater script-like language and consists of utterances that are annotated with gestures.

A statechart can also consist of several types of edges that are used to define transitions

between nodes (e.g. a timeout edge, a conditional edge, or a probability edge).
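A small state machine with the mentioned edge types can be sketched in Python; the states, conditions, and probabilities are invented for illustration:

```python
import random

# A minimal statechart with a conditional edge, a probability edge, and a
# timeout edge; each tick advances time and may trigger a transition.

class Statechart:
    def __init__(self, rng=None):
        self.state = "idle"
        self.elapsed = 0
        self.rng = rng or random.Random(42)

    def tick(self, rally_running):
        self.elapsed += 1
        if self.state == "idle":
            if rally_running:                  # conditional edge
                self.state, self.elapsed = "commenting", 0
        elif self.state == "commenting":
            if not rally_running:
                # probability edge: pick one of two follow-up states
                self.state = ("small_talk" if self.rng.random() < 0.5
                              else "idle")
                self.elapsed = 0
        elif self.state == "small_talk":
            if self.elapsed >= 3:              # timeout edge
                self.state, self.elapsed = "idle", 0

sc = Statechart()
sc.tick(rally_running=True)   # the rally starts: idle -> commenting
```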

The difference between SceneMaker and our approach is that while SceneMaker performs
one of the pre-defined scenes at a node, we first run the HTN planner to generate the

scene, and then the scene is performed. Nevertheless, we have employed only three

simple finite state machines to maintain the basic states of our system and the logic was

implemented in the domain description of the HTN planner.

SceneMaker was employed in several projects: CrossTalk [31], VirtualHuman [32],
IDEAS4Games [33], and COHIBIT [34, 35]. For instance, the purpose of the COHIBIT

project is to provide knowledge about car technology and virtual agents in an entertaining
way. Two virtual agents interact with users and give them advice on how to build a
car from different car pieces. The system is informed about the presence of users via

cameras and about the location and orientation of car pieces, which is realized using

RFID technology. The overview of the COHIBIT system is depicted in Figure 3.7.

Figure 3.7: Overview of the COHIBIT system


Chapter 4

Generating Dialogue

In this chapter, we will explain how we generate affective commentary on a tennis game

for our two virtual agents. First, we will describe how we generate dialogues using an

HTN planner. Then, we will describe how we generate a piece of a dialogue that conveys

a particular attitude of a virtual agent to a player, how we maintain the affective state

of a virtual agent, and how a particular affect can be conveyed by different modalities.

4.1 Commentary Planning

In this section, we will describe how we generate the dialogues for our presentation team, which consists of two virtual agents. We have employed the JSHOP planner (see section 3.1) to generate the commentary, where the generated plans correspond to possible dialogues in which the presentation team can be engaged. The planner is triggered at particular states of the tennis game, receives facts that describe the current state of the game, and outputs all possible plans. A detailed description of the states in which the planner is triggered, of the input facts, and of how the generated plans are executed will be given in Chapter 5. Thus, in this section, we will focus only on the dialogue generation, i.e., on the dialogues in which our commentary team can be engaged in distinct states of the tennis game, according to the facts that describe the game and the background of the players and the tournament.

4.1.1 Motivation

The overall goal of our system is to automatically generate interesting, suitable, coherent, and affective commentary from different points of view (depending on the commentators' attitudes to the players) in real-time. To investigate what real tennis commentators


say during the game, we have analysed several tennis games from YouTube1. We have found that a tennis match is usually covered by two commentators, the second of whom is typically a former tennis player or an expert in the field who can always provide additional background information. We have also found that the commentary is to some extent driven by the states of the game, e.g., nobody talks while the serving player concentrates before serving, the commentators engage in small talk about the players' background when there is nothing else to comment on, and they usually summarize every rally after it finishes. Thus, the statechart approach presented in the SceneMaker project (see section 3.3) would also be convenient here; we have therefore employed finite state machines to decide when to run the planner according to the states of the tennis game.

We have also noticed that the information conveyed by a sports commentator often adds little to what an ordinary spectator can perceive while watching the same tennis game. Since we wanted our commentary to be more sophisticated, we drew inspiration from the TennisEarth2 web page, which describes tennis matches (rally by rally) for tennis fans who have not seen them; as a consequence, the commentary on TennisEarth is more elaborate and thus instructive for us. We also wanted to incorporate more background knowledge, since a standard tennis match is usually long-winded and there is often nothing to comment on; we therefore made use of the OnCourt3 project as a source of background knowledge about players and tennis tournaments.

As we have already stated, the commentators have positive, neutral, or negative attitudes to the players. Since standard live commentary is usually balanced, except for particular international tournaments, we had to add the respective bias to our utterances. Note that biased utterances usually convey particular affects. To deal with the real-time requirement, we had to make sure that the dialogues are not too long. However, we can predict the time at our disposal for a commentary according to the state of the tennis game: for instance, we always have more time to comment on a just-finished game than on an event that happens within a rally. Nevertheless, these predictions are only rough approximations, so we had to allow interruptions, i.e., to interrupt the current plan if a more relevant event happens. The coherence of the commentary is ensured by the dialogue planning that is elaborated in the next section.
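The interruption policy described above can be sketched as follows. The topic names and their relative priorities are hypothetical assumptions for illustration; the thesis does not enumerate them at this point.

```python
# Hypothetical relevance ordering: events inside a rally outrank
# background small talk; a finished game outranks everything.
PRIORITY = {"background": 0, "rally_summary": 1, "ace": 2, "game_finished": 3}

class PlanExecutor:
    def __init__(self):
        self.current = None  # (priority, remaining utterances) or None

    def submit(self, topic, utterances):
        prio = PRIORITY[topic]
        # Interrupt the running plan only if the new one is more relevant.
        if self.current is None or prio > self.current[0]:
            self.current = (prio, list(utterances))

    def step(self):
        # Utter the next line of the current plan, if any remains.
        if self.current and self.current[1]:
            return self.current[1].pop(0)
        return None

ex = PlanExecutor()
ex.submit("background", ["The weather is cloudy today."])
ex.submit("ace", ["What a serve!"])  # interrupts the background chat
```

A lower-priority submission while a plan is running is simply dropped in this sketch; the real system would requeue or regenerate it.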

1 http://www.youtube.com/
2 http://www.tennisearth.com/
3 http://www.oncourt.info/


4.1.2 Dialogue Planning

To represent our presentation team, we have employed two virtual agents that have different roles, attitudes to the players, and audio-visual appearance. The first commentator is the Charamel virtual agent Mark, who represents a TV tennis commentator; the second Charamel virtual agent is Gloria, who represents a tennis expert (see section 5.1.3 for more details on the Charamel avatar engine). While Mark concentrates on simple facts concerning the tennis game, Gloria elaborates on these facts. Recall that all dialogues are based on the commentators' attitudes to the players, which can be positive, neutral, or negative.

Dialogue Schemes

We were inspired by the dialogue schemes presented in the project Presentation Teams (see section 2.5.2). A dialogue scheme is a generic representation of a piece of dialogue that can be generated under certain conditions by a planner. Note that dialogue schemes correspond to methods in HTN planning. Recall that in HTN planning, the compound goal task is decomposed by planning methods into subtasks, where each subtask is either a planning operator that corresponds to a template (representing an utterance) or a compound task that is further decomposed by planning methods. Consider the planning method depicted in Figure 4.1.

Figure 4.1: Example of a Planning Method

Let us assume that player ?P1 has played a winning return (i.e. player ?P2 has lost

the rally) and the subgoal task deduced by the planner from the goal task according

to the current state of the game is the compound task “comment on rally”. Thus, we


can satisfy the compound task "comment on rally" by performing the BODY of the planning method if its PRECONDITIONS can be satisfied (i.e. ?A is a commentator, ?B is an expert, player ?P1 has played a winning return, player ?P2 has lost the rally, and ?A and ?B both have a positive attitude to player ?P1). Figure 4.1 also presents an example of a possible dialogue generated by applying this planning method, assuming that the BODY of the planning method consists of only two planning operators (i.e. no compound tasks) and that the variables ?P1, ?P2, ?A, and ?B stand for the respective players, commentator, and expert. We have already stated that all dialogue schemes are based on the commentators' attitudes to the players; nevertheless, the semantics of a dialogue scheme can take one of the forms defined in Table 4.1. While the left column defines the individual dialogue schemes, the right column presents an example of a possible generated dialogue for each scheme.

Dialogue Scheme Example of a Generated Dialogue

A: argument for/against X A: “That serve was really phenomenal!”

B : contrary B : “Well, that is a little exaggerated!”

A: argument for/against X A: “Blake is in great shape as usual.”

B : contrary B : “But he already produced several unforced errors.”

A: override A: “Still, he is the best player on the court.”

A: argue for X A: “Excellent return by Safin.”

B : elaborate on X B : “Unreachable for Blake.”

A: background fact X A: “Thomas, the brother of James Blake, is a well-known player.”

B : evidence of X B : “His best ranking was the 141st place in 2002.”

A: background fact X A: “Roddick has been 4 times injured recently.”

B : consequence of X B : “It will be hard to break through today.”

Table 4.1: Dialogue Schemes
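The planning method of Figure 4.1 can be sketched in Python-like pseudocode. JSHOP itself uses a Lisp-style domain description, so the representation below (facts as tuples in a set, the method as a function) is only an illustrative stand-in, and the concrete fact names are assumptions.

```python
# World state: a set of ground facts deduced from the tennis game.
world = {
    ("commentator", "A"), ("expert", "B"),
    ("winning_return", "P1"), ("lost_rally", "P2"),
    ("positive_attitude", "A", "P1"), ("positive_attitude", "B", "P1"),
}

def comment_on_rally(state):
    # PRECONDITIONS: the method applies only if these facts hold.
    if {("winning_return", "P1"),
        ("positive_attitude", "A", "P1"),
        ("positive_attitude", "B", "P1")} <= state:
        # BODY: two planning operators, each instantiating a template
        # utterance (cf. the "argue for X / elaborate on X" scheme).
        return [("utter", "A", "Excellent return by P1."),
                ("utter", "B", "Unreachable for P2.")]
    return None  # method not applicable in this state

plan = comment_on_rally(world)
```

In the real domain, each variable (?P1, ?A, ...) would be bound by unification rather than hard-coded, and several alternative methods may match the same compound task.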

Planning Large Dialogue Contributions

We have already shown how to generate a simple dialogue. In the following text, we will describe how to generate large dialogue contributions that consist of several simple dialogues. Consider the part of a planning tree depicted in Figure 4.2, where all nodes stand for compound tasks. Imagine that a game has finished and the subgoal task of the planner deduced from the goal task is the compound task "comment on just finished game". Hence, to satisfy the compound task "comment on just finished game", we have to satisfy all its compound subtasks, namely Introduction, Body, and Conclusion. Similarly, to satisfy the compound task Body, we have to satisfy all its compound subtasks, namely comment on score, comment on winning team, and comment on losing team. The decompositions of the compound subtasks comment on winning team and


Figure 4.2: Example of a Compound Task Decomposition

comment on losing team are analogous. Every leaf of the subtree depicted in Figure 4.2 corresponds to at least one planning method that decomposes the respective compound task. The decomposition is accomplished either by a planning method that stands for a single dialogue scheme or by a planning method that represents a hierarchy of dialogue schemes, i.e., a compound task can be decomposed by a planning method into several dialogue schemes depending on the facts that hold in the current description of the world (e.g. the commentators' attitudes to the players). The following list presents a possible generated dialogue that summarizes a game that has just finished (where C and E stand for a commentator and an expert, respectively).

Introduction

E : “What a relief!”

C : “Tight game, let’s summarize it.”

Comment on Score

C : “Blake and Roddick won the first game.”

E : “That’s unbelievable that they broke opponents’ serve!”

C : “That was spectacular!”

Comment on winning team - Highlights

C : “Blake and Roddick played an excellent game.”

E : “Well, they played several excellent winning returns.”

Comment on winning team - Difficulties

C : “Can you say something about difficulties of Blake and Roddick?”


E : “They were already trailing.”

C : “But they recovered.”

Comment on winning team - Odds

C : “Are Blake and Roddick going to win the match?”

E : “They are my favourites!”

Comment on losing team - Difficulties

C : “What difficulties did Safin and Ferrer have?”

E : “They made many unforced errors.”

Comment on losing team - Odds

C : “Do Safin and Ferrer have any chance to win?”

E : “Well, they can still break through.”

Conclusion

C : “Let’s see the next game.”

E : “Definitely.”

4.1.3 Planning Tree

In this section, we will describe our planning tree, which represents the hierarchy of all dialogues that can be generated. The planning tree is defined as a Hierarchical Task Network (HTN) in the planning domain of the JSHOP planner (see section 3.1). The root of the planning tree is the goal task, every internal node is a compound task (i.e. a possible subgoal task), and every leaf is either a primitive task that corresponds to a template (representing an utterance) or a reference to a particular compound task that is an internal node of the planning tree. Consider Figure 4.3: to satisfy a compound task, we have to satisfy either all its descendants (1), one arbitrary descendant that can be satisfied (2), or the first descendant that can be satisfied (3).

Figure 4.3: Possible Decompositions of a Compound Task
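The three decomposition modes of Figure 4.3 can be sketched as follows. A subtask is modelled as a function mapping the world state to a plan (a list of operators) or None when it cannot be satisfied; this Python representation and the example subtasks are illustrative, not the actual JSHOP domain.

```python
def decompose_all(subtasks, state):
    # (1) Every descendant must be satisfiable; concatenate their plans.
    plans = [t(state) for t in subtasks]
    if any(p is None for p in plans):
        return None
    return [step for p in plans for step in p]

def decompose_any(subtasks, state):
    # (2) Any satisfiable descendant yields a plan; collect all candidates.
    return [p for t in subtasks if (p := t(state)) is not None]

def decompose_first(subtasks, state):
    # (3) Take the first descendant that can be satisfied.
    for t in subtasks:
        if (p := t(state)) is not None:
            return p
    return None

# Illustrative subtasks (not from the actual planning domain).
intro = lambda s: [("utter", "E", "What a relief!")]
body = lambda s: ([("utter", "C", "Tight game, let's summarize it.")]
                  if "game_finished" in s else None)

plan = decompose_all([intro, body], {"game_finished"})
```

Mode (2) is what lets the planner output all possible plans for a state, from which the executor later selects one.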


The root of our planning tree is the goal task "Comment". Figure 4.4 depicts how the goal task "Comment" is decomposed into subgoal tasks depending on the state of the game, e.g., the presentation team is engaged in dialogues introducing the upcoming game at the very beginning, or summarizes a rally just after it finishes.

Figure 4.4: Decomposition of the Goal Task “Comment”

Figure 4.5 shows the further decomposition of the compound task "Comment on rally", which is a subgoal task of the goal task "Comment". Thus, our presentation team comments on the result of the last rally depending on its outcome, e.g., it can comment on an excellent ace or on a winning return played by a player.

Figure 4.5: Decomposition of the Subgoal Task Comment on rally

Figure 4.6 depicts the whole decomposition path from the goal task "Comment" to the subgoal task "Drop Volley", which results in a commentary on a rally that finished with a winning return played as a drop volley (i.e. the player won the rally with a ball that he played before it bounced and placed just behind the net).


Figure 4.6: Decomposition of the Goal Task “Comment” that leads to a Subgoal Task Drop Volley

4.1.4 Commentary Excerpt

In this section, we will show an example of a generated dialogue in which the serving team consists of Blake and Roddick and the receiving team of Safin and Ferrer. In this example, the dialogues are unbiased, i.e., the attitude of the commentators is neutral, since we would like to show how detailed the commentary can be, supposing there is enough time to utter it. The state of the game and the subgoal of the planner are mentioned before each dialogue. Note that C stands for a commentator and E stands for a tennis expert. Another commentary excerpt is shown in Appendix A.

Beginning - Introduction to the upcoming game

C : “Ladies and Gentlemen! Welcome to the Wimbledon semi-final in doubles.”

E : “We will guide you through the match in which James Blake and Andy Roddick

are playing versus Marat Safin and David Ferrer.”

C : “Enjoy the show!”

Rally in Progress - Serving Player’s Background

C : “Roddick has been injured 4 times since last year.”

E : “It will be hard to break through today.”

Rally in Progress - Comment on a nice shot

E : “What a shot!”

Rally finished - Summarize the rally (score: 15:0)

C : “What a forehand by Roddick!”


E : “Roddick hit an excellent forehand-volley right into the left corner.”

C : “Roddick took advantage of a weak forehand return from Safin.”

Rally in Progress - Players’ Background

C : “Thomas, the brother of James Blake, also plays tennis.”

E : “His best ranking was in 2002 when he occupied the 141st place in doubles.”

Rally in Progress - Comment on a nice shot

C : “What a shot by Roddick!”

Rally finished - Summarize the rally (score: 30:0)

C : “What a long rally!”

E : “Ended by an inaccurate backhand-volley by Safin.”

C : “30:0”

E : “Blake and Roddick are holding their serve so far.”

Rally in Progress - Background

E : “The weather is cloudy today.”

C : “Hopefully it won’t be raining.”

Rally finished - Summarize the rally (score: 30:15)

C : “Nice high lob by Safin.”

E : “Too high for Roddick.”

C : “Caused unforced error by Blake.”

4.2 Affect

In the following sections, we will explain why it is important to generate affective com-

mentary on a tennis game and how an affect can be conveyed by different modalities.

We will explain two methods that we have employed to generate affective commentary

on a tennis game and discuss the pros and cons of this approach.

4.2.1 Motivation

In this section, we will clarify how important it is to incorporate emotions into the commentary and how affect can be expressed. In general, virtual agents are better accepted by users if they are endowed with emotions [2]. Different personality profiles and affects make virtual agents more distinguishable, which is beneficial for the creation of presentation teams. We were inspired by the concept of the presentation teams described in section 2.5. Thus, we have employed two distinct virtual agents that


have different roles (commentator, expert), attitudes to the players (positive, neutral, negative), and personality profiles (defined by the traits optimistic, choleric, extravert, neurotic, and social). Two affective virtual agents can also better represent opposing opinions and are more entertaining than a single presenter. Moreover, the user should recall the conveyed facts better.

There can be many exciting moments in a tennis game: to win a game, a player must score at least four points in total and two points more than the opponent, so the finish of a game can be quite thrilling, since there can be many game and break points (i.e. situations when the serving or receiving player needs only one point to win the game). Therefore, our virtual agents should react affectively to events that, e.g., lead to the victory of their favourite player or lower his/her odds of winning. The current affect of a virtual agent can be expressed by dialogue scheme selection, lexical selection (i.e. the choice of an appropriate utterance according to the current affect), gaze, facial expression, and hand and body gestures.

4.2.2 Planning with Attitude

In this section, we will describe how a particular affect can be conveyed via the choice of a corresponding dialogue scheme, where a dialogue scheme is a generic definition of a piece of dialogue (see section 4.1.2). As we have already stated, a virtual agent can have a positive, neutral, or negative attitude to a player. Note that almost every topic of the commentary is related to a specific event (e.g. a player has just scored, a player has lost the lead). Every such event can be appraised by a virtual agent as desirable or undesirable according to his/her attitude to the players (e.g. it is desirable when my favourite player gets a point and undesirable when he loses the lead). Hence, a virtual agent will comment in a positive way on desirable events and in a negative way on undesirable events. Each event is also usually connected with a particular player, so a virtual agent will comment in a positive way on the actions of a player s/he likes and in a negative way on the actions of a player s/he dislikes. A virtual agent that has a neutral attitude to a player will comment in a neutral way on events connected with that player.
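This appraisal logic can be sketched as a small function. The desirability reading used below (an event is desirable iff it favours a liked player or harms a disliked one) is an assumption consistent with the text, not the literal rule set of the system.

```python
def appraise(attitude, event_good_for_player):
    """Return the tone in which an agent comments on an event.

    attitude: the agent's attitude to the player involved
              ("positive" | "neutral" | "negative")
    event_good_for_player: True if the event favours that player.
    """
    if attitude == "neutral":
        return "neutral"
    liked = (attitude == "positive")
    # Desirable: a liked player succeeds, or a disliked player fails.
    desirable = (liked == event_good_for_player)
    return "positive" if desirable else "negative"

appraise("positive", True)   # favourite player scored: comment positively
appraise("positive", False)  # favourite player lost a point: negatively
```

The pair of tones returned for the two agents then selects a dialogue scheme, e.g. (positive, negative) selects "argue for X / play down X" from Table 4.2.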

Let us consider a dialogue that consists of two utterances uttered by two virtual agents. Assume that the dialogue is related either to an event that can be appraised as positive, neutral, or negative, or to a player to whom a virtual agent has a positive, neutral, or negative attitude. Table 4.2 presents examples of possible generated dialogues, where A and B stand for the respective commentators. The first column represents a particular combination of appraisals of an event or a combination of


attitudes to the player related to a particular event. The second column represents a dialogue scheme of a possible dialogue, where X stands for a player's action or a fact. The third column presents an example of a generated dialogue.

Appraisal Dialogue Scheme Example of a Generated Dialogue

A: positive A: argue for X A: “Outstanding ace by Blake!”

B : positive B : support X B : “Blake hits blistering serve down the line!”

A: positive A: argue for X A: “Excellent forehand by Safin!”

B : negative B : play down X B : “That’s a bit overstated.”

A: negative A: point out fault X A: “Safin failed to get the ball over the net.”

B : positive B : excuse X B : “Safin just overhit the serve.”

A: neutral A: convey fact X A: “The score is already 30:0.”

B : negative B : consequence of X B : “Safin and Ferrer are real losers as usual!”

A: neutral A: convey fact X A: “Deuce again.”

B : neutral B : elaborate on fact X B : “Safin and Ferrer got back on board.”

Table 4.2: Example of Generated Dialogues based on different Appraisals

Thus, we have shown how a particular affect can be conveyed via the choice of an appropriate dialogue scheme. Note that the pieces of a generated dialogue are individual utterances, where an utterance is usually uttered by a virtual agent in a particular situation that is correlated with a particular affect. Therefore, we annotated each utterance with default gesture and facial expression tags so that an utterance seamlessly conveys a particular affect. Nevertheless, these tags are only defaults and can be substituted by tags generated by other modules. For instance, the facial expression can also be set according to the current affective state of a virtual agent, as generated by the emotion module described in the next section.

4.2.3 OCC Generated Emotions

In this section, we will describe the emotion module that models the affective state of each virtual agent according to the OCC (Ortony, Clore, Collins) cognitive model of emotions [36, 37]. We simulate eight basic OCC emotions that are relevant to tennis commentary; they are explained in Table 4.3. The emotion module is initialized with the personality of each virtual agent, which is defined by the five personality traits listed in Table 4.4.


OCC Emotion Description

JOY Something happened that I wanted to happen.

DISTRESS Something happened that I did not want to happen.

HOPE Something may happen that I really want to occur.

FEAR Something may happen that I wish to never occur.

RELIEF Something bad did not happen.

DISAPPOINTMENT Something did not happen that I really wanted to occur.

SATISFACTION Something happened that I really wanted to occur.

FEAR-CONFIRMED Something bad did actually happen.

Table 4.3: Description of the eight Basic OCC Emotions

Personality Trait

optimistic

choleric

extravert

neurotic

social

Table 4.4: Five Personality Traits

The input of the emotion module consists of facts that our system deduces from the elementary events received from the tennis game. The main functionality of the emotion module4 is implemented in Jess (see section 3.2). The goals and antigoals of a virtual agent are deduced from his/her attitude to the players, e.g., virtual agent A, who has a positive attitude to player P, wants P to win the game; conversely, virtual agent B, who has a negative attitude to player P, wants P to lose the game. The events that happen in the tennis game are appraised as desirable if they lead to a goal or undesirable if they hinder it. The conditions that elicit emotions based on these events are called emotion eliciting conditions. The appraisals of the emotion eliciting conditions then generate particular emotions with respective intensities, where the initial intensity of an emotion depends on the personality of the respective virtual agent. The affective state of a virtual agent is represented by a vector of intensities, one per emotion; for instance, the emotion with the highest intensity can be considered the output of the emotion module. Since emotions decay over time, the emotion module maintains the emotion decay using, e.g., a linear decay function. Table 4.5 shows examples of events that elicit the respective emotions.

4The definitions of the OCC emotions (in source file occ.clp) were provided by Michael Kipp (DFKI).


OCC Emotion Event

JOY My favourite player scored.

DISTRESS My favourite player lost a point.

HOPE My favourite player is now leading.

FEAR My favourite player is now trailing.

RELIEF My favourite player settled the score.

DISAPPOINTMENT My favourite player lost the lead.

SATISFACTION My favourite player won the game.

FEAR-CONFIRMED My favourite player lost the game.

Table 4.5: Example of Events that elicit respective Emotions
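The core of such a module can be sketched minimally as follows. The actual implementation is in Jess; personality-dependent scaling of initial intensities is omitted here, and the decay rate and intensity values are illustrative assumptions.

```python
# The eight basic OCC emotions simulated by the module (Table 4.3).
EMOTIONS = ["joy", "distress", "hope", "fear",
            "relief", "disappointment", "satisfaction", "fear-confirmed"]

class EmotionModule:
    def __init__(self, decay_rate=0.1):
        # Affective state: a vector of intensities, one per emotion.
        self.intensity = dict.fromkeys(EMOTIONS, 0.0)
        self.decay_rate = decay_rate

    def elicit(self, emotion, intensity):
        # An appraised event raises the intensity of one emotion
        # (capped at 1.0); the real module scales this by personality.
        self.intensity[emotion] = min(1.0, self.intensity[emotion] + intensity)

    def tick(self):
        # Linear decay towards zero, applied once per time step.
        for e in self.intensity:
            self.intensity[e] = max(0.0, self.intensity[e] - self.decay_rate)

    def dominant(self):
        # The highest-intensity emotion serves as the module's output.
        return max(self.intensity, key=self.intensity.get)

em = EmotionModule()
em.elicit("joy", 0.8)  # e.g. "My favourite player scored."
em.tick()              # one decay step
```

The dominant emotion computed this way is what the system uses to update each agent's facial expression every second.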

Figure 4.7 depicts the GUI of the emotion module. The left part of the chart shows the current intensities of the respective emotions for the first virtual agent, and the right part shows the corresponding data for the second virtual agent. The dynamic bar chart was created using the JFreeChart5 library. There is also a log for each virtual agent that lists all events that have caused a particular emotion since the beginning of the tennis game. (Note that Figure 4.7 depicts only the last two events.) Each log entry consists of the emotion name, the initial intensity, and a description of the cause.

Figure 4.7: Emotion Module GUI

5Andreas Viklund. The JFreeChart Class Library. http://www.jfree.org/jfreechart/


The output of the emotion module is currently employed to set and update the facial expression of each virtual agent every second. Nevertheless, it could also be used for gesture and lexical selection, or as an input to the planner (if we had dialogue schemes based on the OCC emotions).

4.2.4 Discussion

In this section, we will explain why we have employed two methods to simulate emotions and which other options we have considered. As we have already stated, all dialogue schemes are based on the virtual agents' attitudes to the players. Nevertheless, we could also have based the dialogue schemes on the virtual agents' current emotions. In that case, we would first have derived the current emotion of each virtual agent and then tried to find an appropriate dialogue scheme. However, we would then have faced a substantial growth in the number of dialogue schemes and a subsequent growth in the number of templates that represent individual utterances, since we would have needed dialogue schemes for every meaningful combination of emotions that the virtual agents can have.

However, we noticed that positive appraisals usually correspond to emotions such as joy, hope, satisfaction, and relief, and that negative appraisals usually correspond to emotions such as distress, disappointment, fear, and fear-confirmed. Therefore, we could simplify the design of the planning domain, base the dialogue schemes only on the virtual agents' attitudes to the players, and derive the specific emotion in a separate emotion module. Such a specific emotion can then be expressed by modalities other than dialogue scheme selection (e.g. facial expression, gaze, gestures, lexical selection).

Nevertheless, if we had the specific emotion of each virtual agent as an input of the planner, we could also generate plans in which the emotions change at some point as a reaction to what the other agent has said. However, this option is not useful in our case, since both virtual agents share the same knowledge about the tennis game, and the emotion of a virtual agent should correspond to the current state of the game; it should not change substantially, for instance from joy to distress, merely because the other virtual agent has just said something bad about the winning player the agent favours.

Nevertheless, the option to change the emotion at some point of a plan would be useful if the virtual agents had different knowledge about the tennis game, such that an utterance uttered by one virtual agent could substantially change the emotion of the other (e.g. one virtual agent could make the other happy by telling him/her that his/her favourite player had just won the game). To change


the emotion at some point in a plan would also be useful if the plans were longer, which in our case applies only to the commentary on a just-finished game; but it is hard to imagine that a virtual agent who is very happy because his favourite player has just won the game would change his/her emotion, e.g., from joy to distress, merely because the other virtual agent said something bad about a player s/he likes.

We have written a separate emotion module because we wanted to simulate the emotional state of each virtual agent more precisely, e.g., we wanted to maintain the emotion decay, which would be infeasible in the planner. We could also have used off-the-shelf software to simulate the emotional state of each virtual agent. Nevertheless, we wanted to simulate the emotions in a transparent way, so that we could clearly see which event had elicited which emotion and which emotion currently prevailed, and we wanted full control over the module (i.e. we can adjust the computation of the initial intensities of the individual OCC emotions depending on the personality, define our own decay function, and control the input and output tags). Therefore, we did not use a "black box" such as ALMA [15], although ALMA is in general a good choice for simulating the affective state of a virtual agent, since it additionally maintains history and emotion blending.

The emotion module and the planner run independently. The planner cannot update the

emotion module since not every plan that is generated is also executed. Additionally,

the time of the plan generation and the time of the plan execution are different. The

emotion module could have passed the current emotional states of the virtual agents to the planner; nevertheless, we do not need the exact emotional states of the virtual agents in the planner, since our dialogue schemes are based only on the virtual agents' attitudes to the players.


Chapter 5

Architecture

In this chapter, we will introduce individual modules of our system and describe how

they cooperate to generate a commentary on a tennis game for our presentation team

based on elementary events that are produced by a tennis simulator in real-time. The

system consists of several modules that run in separate threads and communicate via shared queues. For each module, we will describe its task and how it communicates with the other modules, i.e., what the input and output of the particular module are. First, we will introduce the tennis simulator, which produces elementary events (e.g. a player plays a forehand, the ball crosses the net, the ball lands out). Then, we will describe the plan generation, i.e., how we generate plans based on the knowledge deduced from the elementary events received from the tennis simulator, where a plan represents a particular dialogue. Afterwards, we will explain how these generated plans are executed, i.e., how we select plans from all the plans generated in the previous step. Our presentation team is then engaged in dialogues that correspond to the selected plans.

5.1 System Overview

In the following sections, we will present the main design aims, introduce the overall

architecture of the system, and present the off-the-shelf components that are employed

in the system. We will discuss the advantages of the modular architecture of the system, how to ensure reactivity, and the need for extensibility. Finally, we will briefly introduce the individual modules of the system and how they cooperate to produce

a commentary on a tennis game.




5.1.1 Design Aims

The system was designed with three main design aims, namely modularity, reactivity, and extensibility, which are described below.

Modularity

The overall system is broken down into individual modules, where each module provides a clearly defined interface and functionality. Each module runs in a separate thread and communicates asynchronously with other modules via shared queues. This approach is advantageous since each module can be tested separately and possibly replaced by another module that implements the same interface.
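The thread-plus-shared-queue pattern described above can be sketched with standard Java concurrency primitives. The class and method names and the String message type are illustrative assumptions, not the actual IVAN interfaces.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of two modules communicating asynchronously via a
// shared queue; names and message type are illustrative.
public class ModuleQueueSketch {
    static final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    static void post(String event) {      // producer module side
        events.add(event);
    }

    static String nextEvent() throws InterruptedException {
        return events.take();             // consumer module side; blocks until a message arrives
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> post("rally_finished"));
        producer.start();
        producer.join();
        System.out.println(nextEvent());  // rally_finished
    }
}
```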

Reactivity

The system should be able to react quickly to new events. Evidently, reactivity is closely related to modularity, which facilitates not only parallel execution on multi-core platforms but also the possibility of interruption, i.e., one module can cause the interruption of another module by sending an asynchronous message. The response time of each module must be reasonably bounded as well.

Extensibility

Since we wanted to participate in GALA 2009 (see section 1.2), we had to rapidly develop a demo application at that time. As a consequence, the overall design had to allow for a simple initial implementation of the functionality and its subsequent refinement. This aim is also related to modularity, since individual modules can be added, replaced, or improved separately.

5.1.2 System Architecture

In the following text, we will briefly explain how we generate the commentary for our

presentation team based on the elementary events (e.g. a player serves, a ball hits the

net) that are produced by the tennis simulator. We will introduce individual modules

of the system and describe how they communicate. Figure 5.1 depicts the overall architecture of the IVAN system and Figure 5.2 describes the dataflow that starts with

the elementary events produced by the tennis simulator and ends with the multimodal

output represented by the Charamel avatar engine (see section 5.1.3).

Figure 5.1: IVAN Architecture

Figure 5.2: Dataflow

The tennis simulator sends elementary events to the event manager. The event manager receives these elementary events (such as the ball crossing the net or the ball bouncing) and deduces low-level facts (e.g. a rally finished). These derived low-level facts are stored in the knowledge base. The event manager also decides when to run the discourse planner based on the global state of the game. In other words, the event manager plays the role of a perception unit, since it receives events from the outside world and maintains a coherent representation of it in the form of the knowledge base. The discourse planner, triggered by the event manager, gets facts from the knowledge base, generates all possible plans, and passes them to the output manager, where a plan represents a possible dialogue. Some facts can also be deduced during the planning process and stored in the knowledge base (e.g. statistics to generate the commentary that summarizes the game). The output manager maintains the plan execution: it chooses one plan to execute, matches planning operators with templates, adds gesture annotations, and sends the appropriate



commands to the avatar manager, which transforms them into avatar-engine-specific commands. More precisely, there is a mapping that maps each planning operator onto a template, where a template represents a set of possible annotated utterances. Thus, a planning operator is mapped onto an annotated utterance that is chosen at random from all utterances that correspond to the respective template. Furthermore, the avatar manager maintains the state of the dialogue (e.g. who is speaking at the moment or how long it will take to finish the current utterance), which can be used, e.g., to decide when to interrupt the current discourse.

There is also the emotion module, which maintains the emotional state of each virtual agent separately. For instance, the facial expression of each virtual agent is updated every second according to the current emotional state that is stored in the knowledge base.

Let us note that the knowledge base also contains background facts about the game and

players, virtual agents’ roles (commentator or expert), personality profiles, and attitudes

(positive, neutral, or negative) to the players.

5.1.3 Off-the-shelf Components

We have used two commercial products as the audio-visual components of the system. We have employed Charamel1 to visualize the virtual agents and RealSpeak Solo2 as the text-to-speech (TTS) engine. We will describe both software toolkits in the following paragraphs.

Charamel Avatar Engine

Charamel is a standalone application that communicates via a socket and can visualize several virtual agents at the same time. The individual virtual agents are controlled via the scripting language CharaScript. The virtual agents can express 14 different facial expressions (e.g. smile, happy, disappointed, angry, sad) with varying intensities. Their lip movement is synchronized with the speech that is produced by the RealSpeak Solo TTS. The virtual agents can play back around one hundred pre-fabricated gesture clips that can be tweaked using many different parameters (e.g. velocity, start time, end time, interpolation time). Moreover, the transitions between any two consecutive gestures or facial expressions are interpolated; the virtual agents also perform idle gestures while no other gestures are triggered, in order to look natural. Figure 5.3 depicts the two Charamel virtual agents Mark and Gloria that were employed in the system.

1 http://www.charamel.com/
2 http://www.nuance.com/realspeak/solo/



Figure 5.3: Charamel Virtual Agents Mark and Gloria

RealSpeak Solo TTS Engine

RealSpeak Solo is a TTS engine that receives commands from Charamel to vocalize the desired utterances. While the TTS engine vocalizes an utterance, it sends tags back to Charamel, which enables synchronized lip movement of the speaking virtual agent. RealSpeak Solo supports several male and female voices. We employed the British female voice Serena for the Charamel virtual agent Gloria and the American male voice Tom for Mark.

5.2 Tennis Simulator

The GALA 2009 challenge was given as a static ANVIL file that describes a tennis game

(see section 1.2). Since we wanted to test our system as if it were a real-time application, we wrote a tennis simulator that first reads an ANVIL file and then simulates the game in real time. Although we consider the tennis simulator a part of our system, it can easily be reused in other systems since it communicates via a socket. Moreover, only a subtle modification is needed to simulate any game that is given as an ANVIL file (with a corresponding video). In the following text, we will describe our tennis simulator in detail.

The architecture of the tennis simulator is shown in Figure 5.4. The tennis simulator

first reads a video file and its annotation, which is stored in an ANVIL file. The video is opened in a video player that is implemented using the Java Media Framework API3; the timestamped events read from the ANVIL file are stored in a priority queue. When the simulator is started, it sends the events one by one to a socket at the times they occur. Since the simulation time is determined by the video player, it is possible to pause the simulation or to move it forward. It is also possible to fire one of the pre-defined question events at any time.

Figure 5.4: Tennis Simulator
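The replay idea can be sketched as follows: timestamped events sit in a priority queue ordered by time and are emitted once the playback clock reaches them. The event type, method names, and the in-memory "emission" in place of a socket write are our illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the simulator's replay loop; types and names are illustrative.
public class ReplaySketch {
    record TimedEvent(long timeMs, String name) {}

    // Collects every event whose timestamp the clock has reached;
    // the real simulator would write these to the socket instead.
    static List<String> emitUpTo(PriorityQueue<TimedEvent> queue, long clockMs) {
        List<String> fired = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().timeMs() <= clockMs) {
            fired.add(queue.poll().name());
        }
        return fired;
    }

    public static void main(String[] args) {
        PriorityQueue<TimedEvent> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.timeMs(), b.timeMs()));
        queue.add(new TimedEvent(1200, "serve"));
        queue.add(new TimedEvent(300, "throw_ball"));
        System.out.println(emitUpTo(queue, 1000));   // [throw_ball]
    }
}
```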

Figure 5.5 shows the GUI of the tennis simulator. A user first chooses an input file. S/he can decide whether the video will be displayed in the video player and whether the start of the simulation will be postponed or moved forward; then the simulation can be started.

Figure 5.5: Tennis Simulator GUI

3 http://java.sun.com/javase/technologies/desktop/media/jmf/



5.3 Plan Generation

In this section, we will describe how we generate plans that correspond to possible

dialogues from the elementary events that are generated by the tennis simulator. Figure

5.6 depicts in colors the part of the system that is responsible for the plan generation

and Figure 5.7 shows which part of the dataflow is covered in this section. First, we will

describe the event manager that is getting elementary events from the tennis simulator,

deduces low-level facts from the elementary events and stores them in the knowledge

base where the low-level facts along with the background knowledge, virtual agents’

roles, personality profiles, and attitudes to the players create coherent representation

of the outside world. Then, we will describe the discourse planner that is triggered by

the event manager, gets facts from the knowledge base, and outputs all possible plans

that are subsequently passed to the output manager that maintains the plan execution

described in section 5.4.

Figure 5.6: IVAN Architecture - Plan Generation



Figure 5.7: Dataflow - Plan Generation

5.3.1 Event Manager

In this section, we will describe the event manager, which plays the role of a “perception unit”, since it receives events from the outside world and maintains a coherent representation of it in the knowledge base. More precisely, the event manager receives elementary events from the tennis simulator and deduces low-level facts that are stored in the knowledge base. It also maintains the overall state and score of the match and decides when to run the discourse planner. The elementary events (e.g. a player plays a backhand, the ball lands out) that the event manager receives from the tennis simulator were defined in detail in the GALA 2009 scenario (see section 1.2); moreover, an elementary event can also be a pre-defined user question event. Let us remember that a tennis match consists of sets, a set consists of games, and a game consists of rallies. However, for the sake of simplicity, we consider only one tennis game. Since we cannot run the discourse planner every time we get an elementary event, we first describe the basic states of the tennis game, which are modelled using finite state machines, and then we identify the states at which we run the discourse planner. After that, we explain which low-level facts are deduced by the event manager, stored in the knowledge base, and subsequently available to the discourse planner.

States

Two finite state machines that we have employed to simulate basic states of the tennis

game are depicted in Figure 5.8. Both finite state machines run in parallel; the initial state is marked in red, and the transitions correspond to particular sequences of elementary events.

Figure 5.8: States of the Tennis Game

Let us first look at the finite state machine on the left side. We start in the state beginning; after a player throws the ball up to serve, we move to the state game in progress; finally, after the game finishes, we move to the state game finished. The state machine on the right side starts in the state game not in progress. After a player throws the ball up to serve, we move to the state rally beginning. A player can throw the ball up several times before he actually serves, but after he serves we move to the state rally in progress. After the ball hits the net, lands out, or bounces twice, we get to the state rally finished. Then, in case the game is finished, we get to the state game not in progress; otherwise, we wait until a player throws the ball up to serve and move to the state rally beginning. Both finite state machines could be perceived as one, but presenting two of them provides a better understanding. There are also two facts stored in the knowledge base, derived from the respective finite state machines.
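As a sketch, the right-hand state machine of Figure 5.8 can be encoded as an enum with a transition function. The enum constants and event strings are our reconstruction of the figure, not the actual implementation.

```java
// Sketch of the rally state machine from Figure 5.8; names are illustrative.
public class RallyFsm {
    enum State { GAME_NOT_IN_PROGRESS, RALLY_BEGINNING, RALLY_IN_PROGRESS, RALLY_FINISHED }

    static State step(State s, String event) {
        switch (s) {
            case GAME_NOT_IN_PROGRESS:
                return event.equals("throw_ball") ? State.RALLY_BEGINNING : s;
            case RALLY_BEGINNING:
                // the player may throw the ball up several times; only a serve advances
                return event.equals("serve") ? State.RALLY_IN_PROGRESS : s;
            case RALLY_IN_PROGRESS:
                // a net hit, a ball out, or a second bounce all end the rally
                return (event.equals("net") || event.equals("out") || event.equals("bounce_twice"))
                        ? State.RALLY_FINISHED : s;
            case RALLY_FINISHED:
                return event.equals("game_finished") ? State.GAME_NOT_IN_PROGRESS
                     : event.equals("throw_ball")    ? State.RALLY_BEGINNING : s;
        }
        return s;
    }

    public static void main(String[] args) {
        State s = State.GAME_NOT_IN_PROGRESS;
        for (String e : new String[]{"throw_ball", "serve", "out"}) s = step(s, e);
        System.out.println(s);   // RALLY_FINISHED
    }
}
```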

The event manager triggers the discourse planner at some states of the tennis game. We

will now show the list of specific states at which the discourse planner is triggered. The

list also contains some examples of goals that can be derived by the discourse planner

at the respective states. (Let us note that additional states could be added if desired.)

• beginning - do some introduction to the upcoming game

• rally finished - summarize just finished rally

• game finished - discuss just finished game

• rally beginning & a player has thrown the ball already twice - a player is nervous,

a player concentrates

• rally in progress - comment on the serving player’s background

• rally in progress & a volley or a smash was played - nice shot, risky shot

• rally in progress & the ball hit the tape - luck, inaccuracy

• a question event occurred - answer the question



Score

The score of the game is also maintained in the event manager, using a point counter for each player and the finite state machine depicted in Figure 5.9. If a player wins a rally, s/he gets one point. A player wins the game if s/he has at least 4 points in total and at least 2 points more than the opponent. After both players reach at least 3 points and the game is not over yet, the score is either deuce or advantage. Table 5.1 explains how the tennis score is expressed for one player in the tennis terminology. Let us note that the same player serves throughout one game and that the score is read with the serving player's score first.

Figure 5.9: Tennis Score Counting using a Finite State Machine

Score         Explanation
“love/zero”   0 points
“fifteen”     1 point
“thirty”      2 points
“forty”       3 points
“deuce”       at least 3 points have been scored by each player and the scores are equal
“advantage”   for the leading player; at least 3 points have been scored by each player and one player has one point more

Table 5.1: Description of the Tennis Counting Terminology
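The counting rules above can be sketched directly in code. The method names are illustrative; the real system tracks the score with the finite state machine of Figure 5.9 rather than with this ad hoc helper.

```java
// Sketch of the point-counter logic behind Figure 5.9 and Table 5.1.
public class TennisScore {
    static final String[] TERMS = {"love", "fifteen", "thirty", "forty"};

    // Spoken form of the score, serving player first.
    static String call(int server, int receiver) {
        if (server >= 3 && receiver >= 3) {
            if (server == receiver) return "deuce";
            return "advantage " + (server > receiver ? "server" : "receiver");
        }
        return TERMS[server] + " " + TERMS[receiver];
    }

    // A player wins the game with at least 4 points and a 2-point lead.
    static boolean gameWon(int points, int opponentPoints) {
        return points >= 4 && points - opponentPoints >= 2;
    }

    public static void main(String[] args) {
        System.out.println(call(3, 2));      // forty thirty
        System.out.println(call(4, 3));      // advantage server
        System.out.println(gameWon(5, 3));   // true
    }
}
```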

Facts

We will now explain which low-level facts are deduced by the event manager from the elementary events and stored in the knowledge base. The reason why we perform the deduction of the low-level facts at this level, in the event manager, is that it substantially facilitates the design of the planning domain. Working with the elementary events in the planning domain would be quite cumbersome and unsuitable if we want to reach a reasonable latency. As we have already mentioned, the state of the game and the score are maintained in the event manager; thus, the respective facts are also stored in the knowledge base. While the knowledge base contains only the current state of the



game, it contains all the facts that describe the score from the beginning of the game. To distinguish between the individual score facts and to order them, we introduce the concept of score generations, i.e., the first score fact has generation 0, the second score fact has generation 1, etc. From consecutive score facts we can deduce, e.g., whether a player has lost the lead or drawn level. (Let us note that the concept of generations is often used in computer science to distinguish among data that originates at consecutive steps of an algorithm.)
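A minimal sketch of deducing such a fact from two consecutive score generations, under our own encoding of the score facts (the thesis stores them as planner facts, not Java objects):

```java
// Illustrative sketch; the Score record and field names are our invention.
public class ScoreGenerations {
    record Score(int generation, int playerA, int playerB) {}

    // Player A lost the lead if s/he led in the earlier generation
    // but no longer leads in the later one.
    static boolean lostLead(Score earlier, Score later) {
        return earlier.playerA() > earlier.playerB()
            && later.playerA() <= later.playerB();
    }

    public static void main(String[] args) {
        Score gen0 = new Score(0, 2, 1);   // A leads 2-1
        Score gen1 = new Score(1, 2, 2);   // B draws level
        System.out.println(lostLead(gen0, gen1));   // true
    }
}
```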

Rally Snapshots

All events that occur in the tennis game are partitioned into so-called rally snapshots. We will now describe which low-level facts are derived from a rally snapshot and stored in the knowledge base. Each rally snapshot has a generation that is defined similarly to the score generation. (Let us note that the rally generation and the score generation differ in general since, e.g., the first fault is a rally without a score change.) The low-level facts are deduced for each rally snapshot and stored in the knowledge base. If the planner is triggered in the middle of a rally, the knowledge base contains only the facts deduced from the elementary events of the current, incomplete rally snapshot. The following list outlines which specific low-level facts are deduced from a rally snapshot and stored in the knowledge base:

• how many times the ball crossed the net

• a list of the heights at which the ball crossed the net

• a list of pairs (player, shot) sorted from the beginning of the rally to its end

• the position where the last ball that was in the field first bounced

• the position where the last ball that was out bounced

• whether the ball crossed the net before it landed out

• which player missed the last ball

• how many times the serving player had thrown the ball up before he served

Table 5.2 contains three examples that show which high-level facts can be deduced from

the low-level facts listed above. Figure 5.10 depicts a hierarchy of facts that shows how

an ace can be deduced.



high-level fact   low-level facts
ace               the ball crossed the net once; bounced in the field; state: rally finished
lob               the ball crossed the net at a high position; bounced at the baseline
drop              the ball crossed the net at a low position; bounced at the net

Table 5.2: Examples of high-level facts deduced from low-level facts

Figure 5.10: Hierarchy of Facts from which an Ace can be deduced
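The ace deduction of Table 5.2 and Figure 5.10 can be sketched as a predicate over a rally snapshot. The field names are our illustration of the low-level facts.

```java
// Sketch of deducing the high-level fact "ace" from rally-snapshot facts.
public class AceDeduction {
    record RallySnapshot(int netCrossings, boolean bouncedInField, boolean rallyFinished) {}

    // An ace: the ball crossed the net exactly once, bounced in the field,
    // and the rally is already over (the receiver never returned the ball).
    static boolean isAce(RallySnapshot s) {
        return s.netCrossings() == 1 && s.bouncedInField() && s.rallyFinished();
    }

    public static void main(String[] args) {
        System.out.println(isAce(new RallySnapshot(1, true, true)));    // true
        System.out.println(isAce(new RallySnapshot(3, true, true)));    // false
    }
}
```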

Comparison to Related Work

The event manager is to some extent similar to STEVE's perception module (see section 2.4), since it also maintains the state of the world and a coherent representation of it. Our approach is also similar to SceneMaker (see section 3.3), which employs statecharts to control virtual agents, with the difference that while SceneMaker can perform, e.g., a pre-defined scene (i.e. a dialogue whose utterances are annotated with gestures) at a certain state, we run the planner to generate the scene.

5.3.2 Background Knowledge

The background knowledge about the players and the game is incorporated to produce commentary when, for instance, there is currently nothing else to comment on. We will show some examples of background facts that are stored in the knowledge base. The background knowledge is stored in several static CSV (Comma Separated Values) files that could alternatively be replaced with a relational database. After the system starts, all CSV files are read and the background knowledge they contain is transformed into facts that are stored in the knowledge base. Table 5.3 shows some examples of the facts that can be deduced from the background knowledge.

Background knowledge   Example of a deduced fact
Player's details       A sister of a player is also a tennis professional.
Ranking                A player is leading the ATP score.
Style                  A player is playing risky as usual.
Injury                 A player has been injured four times recently.
Player's results       A player won two matches in a row.
Tournament details     The tournament is played in London on grass.

Table 5.3: Examples of Facts deduced from the Background Knowledge
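The CSV-to-facts step can be sketched as follows; the CSV layout, the placeholder field values, and the Lisp-like fact syntax are illustrative assumptions, not the thesis' actual file format.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of turning background-knowledge CSV lines into fact strings
// for the knowledge base; the layout is illustrative.
public class BackgroundFacts {
    // e.g. "ranking,player_a,1" -> "(ranking player_a 1)"
    static String toFact(String csvLine) {
        String[] fields = csvLine.split(",");
        return "(" + String.join(" ", fields) + ")";
    }

    static List<String> load(List<String> csvLines) {
        List<String> facts = new ArrayList<>();
        for (String line : csvLines) facts.add(toFact(line));
        return facts;
    }

    public static void main(String[] args) {
        System.out.println(load(List.of("ranking,player_a,1", "surface,london,grass")));
    }
}
```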

5.3.3 Discourse Planner

The discourse planner is responsible for the plan generation, where a plan represents a dialogue. The discourse planner is triggered by the event manager at particular states of the game. It gets all facts from the knowledge base and outputs all possible plans, which are subsequently passed to the output manager. We will describe the input of the planner, the planner itself, and the representation of the planner output. Let us note that the concept of the dialogue generation has already been described in Chapter 4.

Input

The input of the planner consists of a planning task and a list of facts that describe

the initial state of the world. The planning task is the same all the time, namely, the

compound task “comment”, since the planner decides each time what it should comment

on according to the supplied facts. The list of facts varies and contains all the facts that

are stored in the knowledge base, i.e., it contains the following types of facts:

• the current state of the game

• scores of the game

• rally snapshots

• background knowledge (see section 5.3.2)

• commentators’ (positive, neutral, negative) attitudes to the players

• roles (commentator, expert)

• a question (a fact identifying that there is a question to be answered)



The Planner

We have employed JSHOP (Java Simple Hierarchical Ordered Planner) as the HTN planner to produce the commentary on a tennis game. See section 3.1 for more details on JSHOP. As described above, the planner gets its input in the form of a problem description and outputs all possible plans. The concept of how these plans are generated has already been described in detail in Chapter 4. Since JSHOP is an offline planner, we had to modify it to run online. We will first describe what makes JSHOP an offline planner, then how we modified it to run online, and finally how JSHOP could have been employed without modification, since we also considered and implemented this option.

JSHOP as an Offline Planner - The drawback of JSHOP is that it requires the problem description to be generated and compiled prior to running the planner, assuming that the problem description changes whereas the domain description remains the same during the system run. As we can see, there is a costly compilation step before each run of the planner. See section 3.1, where we explained the JSHOP input generation process in detail. Let us also note that the planner does not have its own working memory, in the sense that every time it is run, all facts have to be supplied again.

JSHOP as an Online Planner - We investigated how the problem description Java file is generated from the JSHOP problem file and found out how to bypass the compilation step described above. We have written a universal problem description Java file that is compiled only once and fully replaces the problem description Java file that would be generated by JSHOP, i.e., an instance of this problem description Java class accepts the discourse planner's problem description representation as Java objects and serves as input to JSHOP as if the problem file had been generated by JSHOP. This approach is fast, and the plan generation takes only about 50-150 ms.

Alternative Use of JSHOP as an Online Planner - JSHOP can also be used as an online planner without modification. However, this approach is quite costly, since the compilation step takes about 1 second each time and also consumes a lot of CPU resources. Figure 5.11 shows the individual steps of this alternative approach, which will be described below. The discourse planner uses its own problem description representation that is first transformed into the JSHOP problem file (which uses a special Lisp-like syntax); then the respective Java file is generated and compiled. After that, we make use of a convenient Java feature, namely that it allows one class implementation to be replaced by another at runtime, i.e., one *.class file can be replaced by another during the system run. Thus, at the end of the process depicted in Figure 5.11, we have a *.class representation of the problem description, and the planner can be started.

Let us note that we use this approach to compile the domain description once at the beginning, when the system starts. In this case, the process starts with the JSHOP domain file, from which the corresponding Java file is generated, compiled, and replaced at runtime.

Figure 5.11: JSHOP Input Generation Process
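The compile-and-load step of Figure 5.11 can be sketched with the standard javax.tools compiler API and a URLClassLoader; this is generic Java runtime compilation under our assumptions, not the exact IVAN code.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write a generated .java file, compile it, and load the
// resulting class into the running JVM.
public class RuntimeCompile {
    static Class<?> compileAndLoad(Path dir, String className, String source) throws Exception {
        Path javaFile = dir.resolve(className + ".java");
        Files.writeString(javaFile, source);

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, javaFile.toString());
        if (result != 0) throw new IllegalStateException("compilation failed");

        // The .class file is written next to the source; load it from there.
        try (URLClassLoader loader = new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            return loader.loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("jshop");
        Class<?> c = compileAndLoad(dir, "Problem",
            "public class Problem { public static int id() { return 42; } }");
        System.out.println(c.getMethod("id").invoke(null));   // 42
    }
}
```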

Output

The output of the planner is the so-called planning response, which contains a list of all possible plans, the time when the planner was triggered, and the respective state of the game. Each plan from the list of all possible plans contains a priority, a semantic token, and a list of planning operators. The semantic tokens are strings that identify plans. For instance, the semantic tokens can be used to avoid repetition: we disallow the consecutive execution of two plans with the same semantic token. The list of planning operators corresponds to a dialogue, where each planning operator stands for one template (which corresponds to an utterance). Moreover, some facts can also be deduced during the planning process and stored in the knowledge base for the next run of the planner. For instance, these can be the statistics that summarize the game (e.g. the number of outs, winning returns, and aces for each player). Such facts can be used, for instance, to generate the commentary on a just-finished game.
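The planning response can be sketched as a small data class; the field names, the example token, and the repetition check on semantic tokens are our reconstruction of the description above.

```java
import java.util.List;

// Illustrative data classes for the planner output described above.
public class PlanningResponse {
    record Plan(int priority, String semanticToken, List<String> operators) {}

    final long triggeredAtMs;     // when the planner was triggered
    final String gameState;       // the respective state of the game
    final List<Plan> plans;       // all possible plans

    PlanningResponse(long triggeredAtMs, String gameState, List<Plan> plans) {
        this.triggeredAtMs = triggeredAtMs;
        this.gameState = gameState;
        this.plans = plans;
    }

    // Repetition filter: a plan is admissible unless it repeats the
    // semantic token of the previously executed plan.
    static boolean admissible(Plan candidate, String lastExecutedToken) {
        return !candidate.semanticToken().equals(lastExecutedToken);
    }

    public static void main(String[] args) {
        Plan p = new Plan(1, "ace_comment", List.of("praise_serve", "give_stats"));
        System.out.println(admissible(p, "ace_comment"));   // false
    }
}
```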

5.4 Plan Execution

In this section, we will describe how we execute the plans that are generated by the discourse planner, i.e., how we select the plans that will be executed or, more precisely, in which dialogues the virtual agents will be engaged. Figure 5.12 highlights the part of the system that is responsible for the plan execution, and Figure 5.13 shows which part of the dataflow is covered in this section. First, we will describe the template manager, which maps each planning operator of a plan onto a particular utterance that is furthermore annotated with gesture tags. Then, we will describe the avatar manager, which serves as an interface to the Charamel avatar engine, and finally we will describe the output manager, which is responsible for the plan execution, i.e., it decides which plans will be executed and when.

Figure 5.12: IVAN Architecture - Plan Execution

Figure 5.13: Dataflow - Plan Execution

5.4.1 Template Manager

Let us remember that each plan corresponds to a dialogue, where a plan consists of a list of planning operators (primitive tasks) and each planning operator corresponds to a template that contains a set of possible utterances that can be uttered by a virtual agent. In this section, we will describe how a planning operator is mapped onto a particular utterance that can additionally be annotated with gesture tags. The template manager contains over 220 different templates and provides a mapping from each planning operator onto a particular template, where each template usually has several slots that are substituted with the parameters of the respective planning operator. Each template contains 1-3 variants of an utterance. Which utterance will be chosen is decided at random for the sake of higher variability.

Moreover, there are default gesture and facial expression tags in every utterance since

each utterance is more or less bound to a particular situation that is correlated with a

certain emotion. The facial expression tags can be for instance: Smile, Happy, Surprise,

Angry, or Sad with different intensities. The gesture tags can be for instance: Disagree,

DontKnow, Disappointed, Surprise, Oops, or OhYes. Each gesture tag is stored in a so-called gesticon and is mapped onto a set of 1-3 possible gestures that can be directly

performed by a virtual agent in a particular situation. Every time the gesticon is queried

to find a mapping for a given gesture tag, it chooses one gesture from the corresponding

set of possible gestures at random to achieve higher variability.
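The random choice of an utterance variant and the gesticon lookup described above can be sketched in a few lines of Java. This is a minimal sketch; all class and method names are hypothetical and not taken from the IVAN source.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of the template manager's random variant selection and the
// gesticon lookup; all names here are hypothetical.
public class TemplateSketch {
    static final Random RNG = new Random();

    // A template maps a planning operator onto 1-3 utterance variants;
    // one variant is picked at random for higher variability.
    static String pickVariant(List<String> variants) {
        return variants.get(RNG.nextInt(variants.size()));
    }

    // The gesticon maps a gesture tag onto a set of 1-3 concrete gestures
    // and likewise picks one at random.
    static String lookupGesture(Map<String, List<String>> gesticon, String tag) {
        List<String> candidates = gesticon.get(tag);
        return candidates.get(RNG.nextInt(candidates.size()));
    }
}
```

With a single candidate the choice is deterministic; with 2-3 candidates the random pick yields the intended variability.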

Furthermore, there are two duration tags for each utterance: the first denotes the number of milliseconds needed to utter the utterance with a male voice, and the second is the respective duration for a female voice. These tags can be used to estimate the duration of an utterance in case it is not provided by the text-to-speech engine. Let us note that the gesture and facial expression tags are only default values, i.e., they can be filtered out and substituted by tags generated by other modules.

Example

In the following text, we will show an example of how a planning operator can be mapped

onto a particular utterance. Imagine that the server has served and the receiver has

returned the ball in such a way that the server failed to return it. One planning operator (more precisely, the operator's head) of the generated plan can be, for instance:

briskly_returned_serve ?server ?receiver ?receiver_shot

Here, the first string is the operator's name, and the strings that begin with a question mark are variables that are substituted into the slots of a template. The planning

operator’s head contains three variables: ?server refers to the serving player, ?receiver

refers to the receiving player, and ?receiver_shot refers to the type of shot that the

receiving player played. There is a corresponding template in the template manager

that contains three slots that correspond to the three variables of the planning operator.

The template consists of two utterances:

{EmotionSurprise} {ExplainTo} ?receiver surprised ?server with an accurate

?receiver_shot return.

{EmotionSurprise} {Play} ?receiver generated a ?receiver_shot {Look} return

that was out of ?server’s reach.

The facial expression and gesture tags are annotated in curly brackets. The facial

expression tags start with the prefix Emotion whereas all other tags are gesture tags.

Let us assume that the second utterance has been chosen at random, the variable

substitutions are known, and the respective gesture tags have been chosen from the

gesticon at random. Thus, we get the following substitutions:

?server := Safin

?receiver := Federer

?receiver_shot := forehand

{EmotionSurprise} := $(Emotion,surprise,0.9,500,1000,3000)

{Play} := $(Motion,interaction/bye/bye01,400,500,0,10000,1.5)

{Look} := $(Motion,presentation/look/lookto_right02,400,500,0,1200,0.8)

where the facial expression and gesture tags are mapped onto the avatar engine specific

tags (see the Charamel manual [38] for more details). After we apply the substitutions

we get the following annotated utterance that can be directly sent to the Charamel

avatar engine.

$(Emotion,surprise,0.9,500,1000,3000)

$(Motion,interaction/bye/bye01,400,500,0,10000,1.5)

Federer generated a forehand

$(Motion,presentation/look/lookto_right02,400,500,0,1200,0.8)

return that was out of Safin’s reach.

After a Charamel virtual agent gets this utterance, s/he looks surprised, s/he makes a

hand movement as if s/he played a ball with a tennis racket, and then s/he gazes at the

other virtual agent.
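The substitution step itself is a plain string replacement; the following Java sketch (hypothetical names) reproduces the example above. Note that longer variable names must be substituted first so that ?receiver does not clobber ?receiver_shot:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the slot-substitution step; names are hypothetical.
public class Substitution {
    // Applies substitutions in the map's iteration order, so callers must
    // insert longer variable names (?receiver_shot) before their prefixes
    // (?receiver) to avoid partial replacements.
    static String apply(String utterance, Map<String, String> subs) {
        for (Map.Entry<String, String> e : subs.entrySet()) {
            utterance = utterance.replace(e.getKey(), e.getValue());
        }
        return utterance;
    }

    public static void main(String[] args) {
        Map<String, String> subs = new LinkedHashMap<>();
        subs.put("?receiver_shot", "forehand");  // longest first
        subs.put("?receiver", "Federer");
        subs.put("?server", "Safin");
        subs.put("{EmotionSurprise}", "$(Emotion,surprise,0.9,500,1000,3000)");
        subs.put("{Play}", "$(Motion,interaction/bye/bye01,400,500,0,10000,1.5)");
        subs.put("{Look}", "$(Motion,presentation/look/lookto_right02,400,500,0,1200,0.8)");
        String template = "{EmotionSurprise} {Play} ?receiver generated a ?receiver_shot "
                + "{Look} return that was out of ?server's reach.";
        // Prints the fully annotated utterance shown above.
        System.out.println(apply(template, subs));
    }
}
```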

5.4.2 Avatar Manager

The avatar manager serves as an interface to the Charamel avatar engine. In the following text, we will describe how we have incorporated this module into our system

and which functionality it provides. The avatar manager is placed between the output manager and the Charamel avatar engine. The output manager decides which plan will be executed, i.e., which utterance will be uttered and when, whereas the Charamel

avatar engine displays two virtual agents that represent our commentary team and accepts commands to control their behaviour. Thus, the role of the avatar manager is

to transform commands from the output manager to the Charamel specific commands.

Furthermore, it maintains the state of the dialogue that can be exploited by the output

manager. An annotated utterance, a gesture, or a facial expression can be sent to the

avatar manager. The tags describing the state of the dialogue that can be obtained from the avatar manager are as follows: which virtual agent is currently speaking, how long s/he has already been speaking, how much time it will take to finish the current utterance, and which gesture or facial expression was last set for each virtual agent.

Let us remember that all commands that are sent to the avatar manager or to the

Charamel avatar engine are sent in a non-blocking manner (i.e., the sender never waits until a command is completed). Thus, the output manager must first get the current state

of the dialogue and then decide which command to send to the avatar manager. For

instance, if nobody is speaking, it can immediately send an annotated utterance to the Charamel avatar engine. If somebody is speaking, it knows who is speaking and how

long it will take to finish the current utterance. Thus, the output manager can then

decide whether to wait or send a new utterance right away. For instance, it should wait

if the utterance that is being uttered will finish within a second. Nonetheless, in the case that somebody is speaking and the avatar manager gets a command to utter another utterance, it interrupts the virtual agent that is speaking and starts uttering the new utterance.

There can be two kinds of interruptions: self-interruption or an interruption by the

other agent. Gaze gestures and interruption utterances (e.g. “Wait!” or “Look!”) are

used to make the interruptions smoother. As we have already stated, the length of an

utterance is stored in the template manager for each template, nevertheless this length is

not accurate since the exact length of an utterance depends on slot substitutions in the

templates (e.g. a ?name “Ray” is shorter than “Richard”). Thus, the Charamel avatar

engine is always queried to send back the real length of an utterance. However, it can

take up to 1 second to get the response, thus the estimated length that is stored in the

template manager is used as long as the real length returned from the Charamel avatar

engine is unknown. A gesture or a facial expression can be sent to the Charamel avatar

engine at any time. A new gesture or facial expression will be smoothly interpolated with the previous one.
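The fallback from the estimated to the real utterance length can be sketched as follows (hypothetical names; in the real system the engine's response would arrive asynchronously over the socket):

```java
// Sketch: use the template manager's estimated duration until the real
// duration arrives from the avatar engine (which can take up to ~1 second).
// All names are hypothetical.
public class UtteranceDuration {
    private final long estimatedMs;       // from the template's duration tag
    private volatile long realMs = -1;    // filled in once the engine responds

    UtteranceDuration(long estimatedMs) { this.estimatedMs = estimatedMs; }

    // Called when the Charamel engine reports the actual utterance length.
    void onEngineResponse(long ms) { realMs = ms; }

    // Best currently known duration: the real one if available, else the estimate.
    long currentBestMs() { return realMs >= 0 ? realMs : estimatedMs; }
}
```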

Since the avatar manager communicates with the Charamel avatar engine via a socket (see the Charamel manual [38]), we have to deal with latency that can be up to one second, which can cause unwanted delays in the commentary. Another shortcoming of the

Charamel avatar engine is that a virtual agent that is speaking cannot be interrupted at

a specific position in an utterance since the exact state of the virtual agent is unknown.

We can only estimate the position in an utterance according to the time elapsed from

its beginning. Therefore, we cannot prevent an utterance from being interrupted in the middle of a word.

5.4.3 Output Manager

The output manager is responsible for the plan execution, i.e., it decides in which dialogues the virtual agents will be engaged. In the following text, we will explain the

functionality of the output manager in detail. The output manager gets plans from the

discourse planner, chooses one plan to execute, maps planning operators onto templates,

and sends respective annotated utterances to the avatar manager that transforms them

into Charamel-specific commands. Thus, the output manager decides which plan to execute and when. Furthermore, the output manager can interrupt the current plan and

run a new one while the interrupted plan can be started again later. The decision when

to interrupt a plan is based on heuristics. Moreover, the output manager keeps a plan history that prevents repetitions, so that one plan is not executed twice in a row.

Decision Loop

The functionality of the output manager is implemented in the decision loop, which maintains the state of the plan that is being executed, the stack of candidate plans, and the plan history. The decision loop consists of the following steps:

1. Try to get new plans.

2. If there are new plans then select one and put it on the stack of candidate plans.

3. Remove old plans from the stack of candidate plans.

4. Get the status of the dialogue engine.

5. In the case that nobody is speaking we can perform one of the following actions:

• The plan that is being executed continues with the next utterance.

• The plan that has been interrupted starts again.

• The current plan is interrupted by a new one.

• A new plan is started.

6. In the case that somebody is speaking and there is a newer plan on the stack of

candidate plans, we decide according to heuristics whether the current plan will

be interrupted or not.
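The steps above can be sketched as a simplified Java skeleton. All types, thresholds, and method names are illustrative stubs, not IVAN's actual API, and the least-recently-used bookkeeping of step 2 is omitted:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified skeleton of the output manager's decision loop (steps 1-6).
// All names and thresholds are hypothetical stubs.
public class DecisionLoop {
    record Plan(String token, int priority, long createdMs) {}

    final Deque<Plan> candidates = new ArrayDeque<>();
    Plan current;       // plan being executed, or null
    Plan interrupted;   // interrupted plan that may be resumed, or null

    void tick(List<Plan> newPlans, boolean somebodySpeaking, long nowMs) {
        // Steps 1-2: select one of the new plans and put it on the stack.
        newPlans.stream()
                .max((a, b) -> Integer.compare(a.priority(), b.priority()))
                .ifPresent(candidates::push);
        // Step 3: remove plans that are no longer up-to-date.
        candidates.removeIf(p -> isStale(p, nowMs));
        // Steps 4-6: act depending on whether an agent is speaking.
        if (!somebodySpeaking) {
            if (current != null)            continueCurrentPlan();
            else if (interrupted != null)   resumeInterruptedPlan();
            else if (!candidates.isEmpty()) startPlan(candidates.pop());
        } else if (!candidates.isEmpty() && shouldInterrupt(candidates.peek(), nowMs)) {
            interrupted = current;
            startPlan(candidates.pop());
        }
    }

    // Heuristic stubs; the real decisions use semantic tokens and timing.
    boolean isStale(Plan p, long now) { return now - p.createdMs() > 10_000; }
    boolean shouldInterrupt(Plan p, long now) { return p.priority() > 5; }
    void continueCurrentPlan() {}
    void resumeInterruptedPlan() {}
    void startPlan(Plan p) { current = p; }
}
```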

The plan is selected according to its priority and the least recently used strategy (at step

2) such that it prefers plans with high priority and plans that have not been executed

recently. To ensure that the stack of the candidate plans contains only plans that are

up-to-date (at step 3), we go through the plans and filter out old plans depending on

the semantic tokens of the plans. For instance, a plan that contains some background facts (e.g. that the serving player is leading the ATP score) does not become outdated as quickly as a plan that is related to some event that happened in the middle of a rally (e.g. when a player played a smash).

Each time the output manager gets new plans, it has to decide on the basis of some

heuristics whether to interrupt the current plan and continue with a new one or not.

The output manager makes use of the state of the dialogue to know the approximate

time needed to finish the current utterance or how long the current plan has already

been running. For instance, the current plan will not be interrupted if it finishes in a

second or if it was started a couple of milliseconds ago. The interruptions also cannot

occur too often. In dependence on the semantic tokens of plans, some plans should be

executed as soon as possible (e.g. a comment referring to an ace) and some plans can

be executed with certain delay (e.g. a comment on player’s background). Furthermore,

an interrupted plan can be run again if it is still up-to-date and has not been almost

finished last time.
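These heuristics can be sketched as a single predicate; the thresholds below are illustrative placeholders, not the values used in IVAN:

```java
// Sketch of the interruption heuristics described above; all thresholds
// and names are hypothetical.
public class InterruptHeuristic {
    static boolean shouldInterrupt(long remainingMs, long runningMs,
                                   long sinceLastInterruptMs, boolean newPlanUrgent) {
        if (remainingMs < 1000) return false;          // current utterance almost done
        if (runningMs < 500) return false;             // current plan has barely started
        if (sinceLastInterruptMs < 5000) return false; // avoid too frequent interruptions
        return newPlanUrgent;  // e.g. the new plan comments on an ace
    }
}
```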

Chapter 6

Discussion

In this chapter, we will compare the IVAN system with ERIC, evaluate our system in

terms of the research aims, and discuss two basic tools (JSHOP and Jess) that can both be employed to generate affective commentary on a continuous sports event in

real-time.

6.1 Comparison with the ERIC system

In this section, we will compare our system with ERIC (see section 2.1) since ERIC

is most closely related to our work. ERIC is an affective commentary virtual agent

that won GALA 2007 1 as a horse race reporter. The overall goal of ERIC is the same as ours, with the difference that while ERIC is a monologic system employing one virtual agent, we have employed a presentation team that consists of two virtual agents to comment on a sports event. Our virtual agents have different roles (TV commentator,

expert) and can have different attitudes to the players (positive, neutral, negative). The

use of a presentation team is believed to be more entertaining for the audience than only

one presenter and enriches the communication strategies since our virtual agents can be

engaged in dialogues and represent opposing points of view.

ERIC employs an expert system to generate speech: his utterances reflect his current knowledge state, and discourse coherence is ensured by centering theory. Nevertheless, ERIC may be too reactive, i.e., individual utterances are uttered at particular knowledge states, and ERIC cannot generate larger contributions. Hence, we have employed an HTN planner to generate the dialogues, which enabled us to plan large dialogue contributions with discourse coherence ensured by the planner.

1 http://hmi.ewi.utwente.nl/gala/finalists 2007/

In contrast to ERIC, we have also implemented the possibility of interruptions, i.e.,

the current discourse can be interrupted if a more important event happens. However,

there is always a certain trade-off between reactivity, i.e., a reactive commentary with frequent interruptions, and discourse coherence, i.e., a commentary with large and coherent dialogue contributions that does not comment on each event.

While ERIC uses ALMA to maintain his affective state, we use two methods: one that

generates affective dialogues based on virtual agents’ attitudes to the players and the

other that maintains the affective state of each virtual agent in the emotion module.

ALMA might appear to be a “black box”; in contrast, the generation of affective dialogues and the simulation of the affective states of our virtual agents are more transparent, i.e., we can adjust the computation of the initial intensities of individual OCC emotions in dependence on the personality, we can define our own decay function, and

we have full control over the input tags and output of our emotion module. We can also

always say which event has caused the virtual agent’s current emotion or why a virtual

agent is commenting in a positive or negative way on an event or a player.

In comparison to ERIC, our virtual agents have gestures more synchronized with speech,

use more elaborate idle gestures (provided by Charamel), can gaze at each other, and

can interact with a user via pre-defined questions. Whilst ERIC was designed to be domain independent and was tested in two different domains, our system has only been designed to comment on a tennis game; nevertheless, the same architecture can be used to produce affective commentary in other domains.

6.2 Evaluation in Terms of Research Aims

In this section, we will compare our research aims, listed in section 1.4, with the system that we have implemented.

Dialogue Planning for Real-time Commentary and Reactivity

We have employed JSHOP as an HTN planner to produce commentary on a continuous sports event in real-time. The motivation for using an HTN planner was to generate large dialogue contributions and to avoid being too reactive (in the sense described in 6.1). It also seemed to be a good strategy for generating dialogues. First,

JSHOP gets all facts that describe the current state of the world and outputs all possible

plans (dialogues). Then, in the decision loop, one plan is selected and executed. The

problem arises when an important event happens in the middle of the execution of a

plan (dialogue) that comments on another event. In this case, our system can either

interrupt the execution of the current plan or wait till the current plan finishes. This

problem could be solved by dynamic replanning, i.e., by modifying the current plan on the fly. Since JSHOP does not support dynamic replanning, we can only either wait until the current plan finishes or interrupt it. However, even if JSHOP supported dynamic replanning, it would not be sufficient, since the Charamel avatar engine does not indicate its exact state, e.g., we cannot interrupt an utterance at a specific position within it. Moreover, if we sent an utterance word by word to the Charamel avatar engine, it

would not be uttered in a coherent way. Thus, the planner would need to work with whole utterances, which would not be optimal: we would have to wait until the current utterance had been uttered, and only then continue with an utterance of the modified plan created by the dynamic replanning.

Therefore, there is always a certain trade-off between reactivity and discourse coherence. We can either interrupt plans (dialogues) often to be reactive, or we can delay the commentary on some events, or even ignore some events, to get large, coherent dialogue contributions. Nevertheless, we have noticed that real-life tennis commentators do not comment on every event and, when the game is not interesting, they engage in small talk to amuse the audience by talking about the players' background.

Thus, we have implemented a compromise that uses some heuristics to decide when

to interrupt the discourse. The resulting commentary is partly reactive, but since we cannot interrupt the discourse too often, it sometimes has delays or does not cover some events.

There is also always a certain trade-off between a reactive commentary that uses short utterances and an elaborate, more detailed commentary that is not so reactive. Since we

wanted to produce more interesting and detailed commentary to convey more facts, our

utterances are rather long.

We have assumed that HTN planning is well suited to produce commentary on sports events that are rather slow-paced (e.g. a live tennis game). However, the test files provided by GALA 2009 were generated by the Wii2 software, which produced tennis games that unfolded more quickly than standard live tennis games. Hence, there was a slight mismatch between the input we anticipated and the input that we

got. Nevertheless, our system was able to produce the commentary even under these

conditions.

The reactivity of the system also partly depends on the response time of the avatar

engine and the speed at which the virtual agents talk. Slightly faster speech and a lower response time of the avatar engine (which is sometimes up to 1 second) would lead to better results in terms of reactivity.

2 http://wii.com/

Behavioural Complexity and Affectivity

Our virtual agents provide affective commentary on a tennis game according to their

(positive, neutral, negative) attitudes to the players and according to the events that

occur during the tennis game. The current affect of a virtual agent is expressed by

dialogue scheme selection, lexical selection, facial expression, and gestures. A user can

recognize which virtual agent is in favour of which player and whether the virtual agent’s

favourite player is doing well or not. For instance, the virtual agent's facial expression alone can reveal whether his/her favourite player is leading or not. The virtual agents also have gestures synchronized with speech and can interact with a user in the form of pre-defined questions.

The variability of dialogues is ensured by the planner that always outputs all possible

plans (dialogues), and by the random selection of utterances and gestures within partic-

ular templates. However, there is always a certain trade-off between a few nice, suitable, and specific dialogues and a large variety of general dialogues. Since we wanted to have a specific commentary for GALA, we have preferred the first option. Nevertheless,

more variety could be achieved if we added more dialogue schemes and more variants

of utterances and gestures to the respective templates. The dialogue schemes could also be based on different types of OCC emotions that are maintained for each virtual agent in a simple emotion module, which would also increase the variability and affectivity of the

commentary.

We have used two methods to produce affective commentary: one that generates affective

dialogues based on virtual agents’ attitudes to the players and the other that maintains

the affective state of each virtual agent in the emotion module. Thus, the user can see

which event elicited which emotion and why a virtual agent is commenting in a positive

or negative way.

Generalizability

Although our system was not designed to be domain independent, we will describe below which modifications would be necessary to change the domain. The tennis simulator

would need only a subtle modification to simulate any sports event given as an ANVIL

file. We would need to define new states at which the discourse planner is triggered by

the event manager. We would also need to define the snapshots of the world and which

low-level facts would be derived from the respective snapshots. The pre-processing of

the background facts is done in a generic way; thus, we would only need to provide the corresponding

input CSV files. While the Java code in the discourse planner is domain independent,

the definition of the Hierarchical Task Network in the planning domain would need to

be rewritten except for the part that concerns the background knowledge (e.g. injury,

weather). We would also need to add corresponding templates and change some heuristics in the output manager, e.g., to determine under which conditions a plan can be

interrupted. We would also need to define respective emotion eliciting conditions in the

emotion module. The avatar manager is domain independent. Thus, the most complex

task would be to rewrite the domain description of the planner and to add respective

templates.

6.3 Comparison: JSHOP vs. Jess

In this section, we will compare two approaches, i.e., HTN planning (see section 3.1) and expert systems (see section 3.2), which can be used to generate a commentary on a sports event as defined by GALA 2009 (see section 1.2). We will focus on two tools, namely JSHOP3, a representative HTN planner that we have employed in our system to generate dialogues, and Jess4, a representative expert system that was used, e.g., in ERIC (see section 2.1) to generate speech. Whereas

HTN planning is well suited to plan larger contributions (e.g. dialogue planning), the

expert systems are more suitable to produce shorter comments that reflect the current

state of the world. In the following text, we will compare JSHOP and Jess in terms of

their expressive power, usability, and user-friendliness.

• Variability

The variability is important, e.g., for the dialogue planning since the virtual agents

should not be engaged in the same dialogues all the time. In logistics, it is also convenient to have more than one way to deliver a package: not all paths cost the same, so the cheapest path should be chosen, and some paths can also be dynamically added to or deleted from the domain. The advantage of planning is that it finds all solutions to a problem, while an expert system outputs

only one. (More precisely, while a planner is backtracking to find all possible plans,

it can try several substitutions of a variable. In contrast to the planning, once a

rule fires in an expert system a variable is substituted and cannot be changed.)

Nevertheless, it is possible to set a random conflict resolution strategy in a rule-based system, which resembles choosing a plan at random among all possible plans output by a planner. Thus, variability can be achieved in rule-based systems to some extent as well.

• Priority

We can assign a cost to each planning operator in the planning domain such that

3 JSHOP2 (Java Simple Hierarchical Ordered Planner) http://www.cs.umd.edu/projects/shop/
4 Jess (Java Expert System Shell) http://www.jessrules.com/

the cost of a plan is equal to the sum of the costs of all planning operators that

the plan contains. After the planner outputs all possible plans, we can choose the

most or least expensive plan according to our preferences. If the cost corresponds

to the length of a path, we will probably choose the shortest one. If the cost

corresponds to the amount of money that we get when we execute the plan, we

will presumably choose the most profitable plan. In an expert system, we can

assign a salience value to each rule, which specifies how urgently the rule should be fired; if two rules have the same salience value, the current conflict resolution strategy decides which rule is fired first. This is how rule-based systems can prioritize some outcomes. Nevertheless, the use of the

salience value should be avoided since it makes the execution of the rules very

difficult to monitor.

• Expressive Power

Jess offers substantially more constructs than JSHOP. We will show two examples

of constructs that are defined in Jess but not in JSHOP, where it would be advantageous to have them in JSHOP as well. The first example:

JSHOP does not support unordered facts; thus, if we want to work with only one slot of a fact, we have to consider all its slots, since JSHOP supports only ordered facts. The second example: it is quite cumbersome to count the number of facts that match a certain condition in JSHOP, although this can be worked around with recursion. The same task can be solved in Jess using the accumulate construct in an

intuitive way.

• Online vs Offline Execution

We have already pointed out that JSHOP runs offline (see Figure 3.3). Thus, after any change to the domain or problem file, the respective Java file has to be first generated and then compiled before the planner can actually be run. In contrast

to JSHOP, Jess runs online, i.e., after the Jess rule-based engine is initialized, it

can be run several times, and facts and rules can be added to its fact base or retracted in the meantime.

• Development Environment

Jess can be better integrated into a development environment than JSHOP since

there is a plugin that integrates Jess into Eclipse IDE 5, which facilitates development, e.g., it offers a Jess editor that highlights the Jess Lisp-like syntax and marks errors. In comparison to Jess, JSHOP is provided as a Java library.

Nevertheless, the input JSHOP files can be edited as text files in Eclipse IDE as

well.

5 http://www.eclipse.org/
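The plan-cost selection described under Priority can be illustrated in a few lines of Java. In this sketch (hypothetical names), a plan is reduced to the list of its operator costs, and a plan's total cost is the sum of those:

```java
import java.util.Comparator;
import java.util.List;

// Illustrates choosing among plans by total cost, where a plan's cost is
// the sum of its operators' costs; names are hypothetical.
public class PlanCost {
    static int cost(List<Integer> operatorCosts) {
        return operatorCosts.stream().mapToInt(Integer::intValue).sum();
    }

    // Returns the plan with the lowest total cost (e.g. the shortest path).
    static List<Integer> cheapest(List<List<Integer>> plans) {
        return plans.stream().min(Comparator.comparingInt(PlanCost::cost)).orElseThrow();
    }
}
```

Choosing the most expensive plan instead (e.g. the most profitable one) only requires swapping `min` for `max`.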

Chapter 7

Conclusion

7.1 Summary

In this thesis, we have presented the architecture of the IVAN system (Intelligent Interactive Virtual Agent Narrators), which generates an affective commentary on a tennis game in real-time, where the input is given as an annotated video provided by GALA 2009. The demo version of the IVAN system was accepted for GALA 2009 1, which was part of the 9th International Conference on Intelligent Virtual Agents (IVA)2.

The system employs two virtual agents with different attitudes to the players that are

engaged in dialogues to comment on a tennis game. We have focused on the knowledge

processing, dialogue planning, and behaviour control of the virtual agents. Commercial

products have been employed to represent the audio-visual component of the system.

Most parts of the system are domain dependent. However, the same architecture can

be reused to implement applications such as an interactive tutoring system, a tourist guide, or a guide for the blind.

The system consists of several modules. We have employed an HTN planner to plan the

dialogues, an expert system to define the appraisals of the emotion eliciting conditions in

the emotion module, and finite state machines to simulate basic states of the system. Our

two virtual agents can have positive, neutral, or negative attitudes to the players. The

system uses two methods to generate affective multimodal output. In the first method,

the dialogue schemes in the HTN planner are selected according to the desirability

of particular events for respective virtual agents. In the second method, the system

maintains the affective state of each virtual agent in the emotion module, according

to the OCC cognitive model of emotions [36], based on the appraisals of the events

1 http://hmi.ewi.utwente.nl/gala/finalists 2009/
2 http://iva09.dfki.de/

that happen in a tennis game. The current affect of the virtual agents is expressed

by lexical selection, facial expression, and gestures. Furthermore, the system integrates

background knowledge about the players and the tournament and allows the user to fire

one of the pre-defined questions at any time.

We have employed JSHOP3 as an HTN planner to generate dialogues for our two virtual agents. We have verified that JSHOP can be employed to generate affective commentary on a continuous sports event in real-time. However, HTN planning is best suited to generating large dialogue contributions; thus, if the environment changed rapidly and we wanted to consider most of the events that occur in the environment, it would be more appropriate to use an expert system, as in ERIC [10].

7.2 Future Work

In the following paragraphs, we will outline which modifications could be made to improve our system in the future.

EMBR

We could integrate EMBR (A Realtime Animation Engine For Interactive Embodied Agents) [39], since EMBR has more advanced behaviour control, e.g., it can produce more precise gaze that can express particular emotions, whereas the Charamel virtual agents (see section 5.1.3) can only turn the head to gaze at the other virtual agent. We did not employ EMBR since it had not been released at that time, and EMBR also offered only one virtual agent, whereas we needed two distinguishable characters.

Prosody

We could also integrate a prosody module if we had a TTS engine that provided the
option to set the respective parameters. The current emotional state of a virtual agent,
as simulated by the emotion module (see section 4.2.3), could then be used to set those
parameters. We have not implemented a prosody module since the RealSpeak Solo TTS4
did not provide the option to change the respective parameters.
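A prosody module of this kind could map the agent's emotional state to TTS parameters roughly as follows. The sketch is hypothetical: the (arousal, valence) state, the parameter names `rate` and `pitch`, and the mapping coefficients are invented, and any real mapping would depend entirely on the parameters the TTS engine exposes.

```python
def prosody_parameters(arousal, valence):
    """Map a simple (arousal, valence) state, each in [-1, 1], to
    hypothetical TTS controls: speech rate and pitch expressed as
    multipliers of the engine's neutral baseline (1.0)."""
    rate = 1.0 + 0.3 * arousal                    # aroused agents speak faster
    pitch = 1.0 + 0.2 * arousal + 0.1 * valence   # and higher, more so when happy
    return {"rate": round(rate, 2), "pitch": round(pitch, 2)}

# an excited, positive commentator after a spectacular point
print(prosody_parameters(arousal=0.8, valence=0.6))   # {'rate': 1.24, 'pitch': 1.22}
# a calm, neutral one during a changeover
print(prosody_parameters(arousal=-0.5, valence=0.0))  # {'rate': 0.85, 'pitch': 0.9}
```

The resulting values would be passed to the engine alongside each utterance.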

ALMA

We could use ALMA [15] to maintain the emotional state of each virtual agent, since
ALMA, in addition to what our emotion module does, maintains an emotion history and
performs emotion blending; we could therefore expect smoother transitions between the
individual emotional states of a virtual agent. Nevertheless, we did not employ ALMA
since we wanted to have full control

3 JSHOP2 (Java Simple Hierarchical Ordered Planner), http://www.cs.umd.edu/projects/shop/
4 http://www.nuance.com/realspeak/solo/



over the emotion module so that we could, e.g., adjust the computation of the initial
intensities of individual OCC emotions depending on the personality and define our own
decay function.
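The kind of control we wanted to keep, personality-dependent initial intensities and our own decay function, can be illustrated as follows. The constants and the choice of exponential decay are invented for the sketch; the actual computation in our emotion module differs.

```python
def initial_intensity(base, extraversion):
    # scale a base appraisal intensity by a Big-Five trait in [0, 1]:
    # extraverted agents react more strongly (invented scaling)
    return min(1.0, base * (0.5 + extraversion))

def decayed(intensity, seconds, half_life=10.0):
    # exponential decay: the emotion intensity halves every half_life seconds
    return intensity * 0.5 ** (seconds / half_life)

i0 = initial_intensity(0.6, extraversion=0.9)  # 0.84
print(round(decayed(i0, seconds=10.0), 2))     # 0.42
```

Using ALMA would trade this fine-grained control for its built-in history and blending.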

Affect

We could base some dialogue schemes on the particular OCC emotions output by our
emotion module; in this way, we would get more affective and suitable dialogues. Nev-
ertheless, this would entail a lot of work, since we would also have to author many
utterances expressing particular emotions. Note that to work with a reasonable number
of templates, we can have either many general affective dialogues or many specific dia-
logues that express particular emotions in a limited way. We have chosen the second
option, so our dialogue schemes are based only on the virtual agents’ (positive, neutral,
negative) attitudes to the players.

We could also base the selection of particular utterances and gestures within templates
on the current emotional state of a virtual agent as maintained by our emotion module:
a particular utterance and gesture would be chosen according to the agent’s current
emotional state, and the current affect could also, for instance, influence the velocity of
particular gestures. In this way, we would get more affective dialogues. Nevertheless,
we did not implement this feature, since it would have required authoring many different
affective utterances, and we assumed it sufficient for the utterances to convey only the
virtual agents’ (positive, neutral, negative) attitudes to the players.
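Affect-based selection of utterances and gestures could look roughly like this. The template structure, the attitude keys, and the velocity clamp are hypothetical, invented for the example.

```python
import random

# hypothetical utterance templates, keyed by the agent's attitude
# toward the player the utterance is about
TEMPLATES = {
    "positive": ["What a control by {player}!",
                 "{player} has been unbeatable recently."],
    "neutral":  ["{player} scored."],
    "negative": ["{player} got lucky there."],
}

def select_utterance(attitude, player, rng=random):
    # pick any template matching the attitude and fill in the player
    return rng.choice(TEMPLATES[attitude]).format(player=player)

def gesture_velocity(arousal):
    # higher arousal -> faster gestures, clamped to a sensible range
    return max(0.5, min(2.0, 1.0 + arousal))

print(select_utterance("positive", "Roddick"))
print(gesture_velocity(0.7))   # 1.7
```

Keying the tables on full OCC emotions instead of three attitudes would multiply the number of templates to author, which is exactly the cost discussed above.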

Dynamic Replanning

We could try another planner (e.g., HOTRiDE [40]) that supports dynamic replanning,
since the only way we can currently change the plan (dialogue) is to interrupt the current
plan and start a new one. Nevertheless, dynamic replanning seems quite difficult to
implement. One reason why we did not try such a planner is that the Charamel avatar
engine (see section 5.1.3) does not report the exact state of the discourse, so such a
planner would have to work with whole utterances, which would not be optimal. Thus,
a precondition for employing such a planner is an avatar engine that indicates exactly
what has been uttered so far at any point in time.
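Our current interruption scheme, discarding the remainder of the running plan and starting a new plan for the more important event, corresponds to a control loop like the following simplified sketch (the real system plans with JSHOP and speaks through the avatar engine; the function names here are invented, and the sketch assumes `replan` always returns a non-empty plan).

```python
from collections import deque

def run_commentary(plan, events, replan):
    """Speak the utterances of `plan` in order; whenever a high-priority
    event arrives, discard the rest of the plan and replan from scratch.
    `replan` maps an event to a new plan (a list of utterances)."""
    spoken = []
    queue = deque(plan)
    while queue:
        if events:                                   # a more important event occurred
            spoken.append("-- interruption --")
            queue = deque(replan(events.popleft()))  # whole-plan replacement, no repair
        spoken.append(queue.popleft())
    return spoken

events = deque(["lob"])                      # arrives before we start speaking
plan = ["The weather is cloudy.", "I hope it won't be raining."]
print(run_commentary(plan, events, lambda e: [f"What an unexpected {e}!"]))
# ['-- interruption --', 'What an unexpected lob!']
```

A dynamic replanner such as HOTRiDE would instead repair the existing plan, which requires knowing exactly how far the discourse has progressed.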

Evaluation

A more elaborate evaluation of the system could be performed. We could run an exper-
iment to find out what a user remembers from the commentary with and without virtual
agents. However, live tennis commentators are usually hidden so that the audience can
concentrate on the tennis game. Though we would in general expect the commentary
with virtual agents to perform better, it could easily happen that the users



would concentrate more on the video of the tennis game and remember more without
the virtual agents, since the virtual agents would rather distract them. We have not
performed this sort of evaluation since it was not clear how to interpret the possible
results.

We could also compare our commentary with a live commentary. Nevertheless, in con-
trast to our system, real commentators are usually hidden and their commentaries are
not biased. Our system was also partly optimized for the GALA 2009 (see section 1.2),
which was slightly different from a live tennis game since it used Wii5 videos of tennis
games. Live tennis commentary is also often very elaborate, so our system cannot com-
pete with such a commentary in terms of variability.

Other Domains

We could reuse the architecture of the system to implement systems in other domains,
e.g., other long-running sports events, interactive tutoring systems, tour guides, or
guides for the blind.

5 http://wii.com/


Appendix A

Commentary Excerpt

In the following list, we show a commentary excerpt, where C stands for the commen-
tator and E for the tennis expert.

C : “Ladies and Gentlemen. Welcome to the Wimbledon semi-final in doubles.”

E : “We will guide you through the match in which James Blake and Andy Roddick are

playing versus Marat Safin and David Ferrer.”

C : “Enjoy the show!”

C : “The weather is cloudy.”

E : “I hope it won’t be raining.”

C : “Oops, Roddick scored!”

E : “Roddick hits an excellent forehand-volley right into the left corner.”

C : “Roddick has been unbeatable recently.”

E : “What a control by Roddick!”

C : “The score is already 30:0.”

E : “Players Safin and Ferrer are real losers as usual!”

E : “None of the players are playing on their favourite surface.”

C : “It should harm performance of Blake and Rod..”




Interruption

E : “What an unexpected lob!”

C : “It’s a bad idea to try to return a lob at the net.”

C : “The brother of Blake Thom..”

Interruption

C : “Look!”

C : “Blake missed the huge backhand return by Ferrer!”

C : “The brother of Blake Thomas is also playing tennis.”

E : “His best ranking was in 2002.”

C : “The score is already 30:40.”

E : “No, Blake and Roddick are facing the break point.”

C : “Ferrer had a hard time recently.”

C : “Deuce.”

E : “Great recover by Blake and Roddick!”

C : “Roddick has been unbeatable recently.”

E : “No doubt he is a genius.”

C : “Off forehand by Blake drifts outside the left sideline.”

E : “Blake just overcooked his forehand.”

E : “Blake concentrates on his serve.”

C : “Roddick has been injured four times since last year.”

E : “It’s amazing how he ...”

Interruption

C : “What a relief!”

E : “Oh, no!”

C : “Tight game, let’s summarize it.”

C : “Safin and Ferrer won the first game.”

E : “That’s unbelievable that they broke opponents’ serve!”



C : “That was quite obvious!”

E : “Safin and Ferrer played below par!”

C : “There were some excellent shots!”

E : “Well there might have been some bright sides.”

C : “Of course there were!”

C : “Did Safin and Ferrer have any difficulties?”

E : “They were already trailing.”

C : “We have seen a nice recovery.”

C : “Let’s see the next game.”

E : “Definitely.”


Bibliography

[1] Justine Cassell, Tim Bickmore, Lee Campbell, Hannes Vilhjalmsson, and Hao Yan.

Human conversation as a system framework: Designing embodied conversational
agents. In Embodied Conversational Agents, pages 29–63. MIT Press, Cambridge,

2000.

[2] Jonathan Gratch and Stacy Marsella. Tears and fears: modeling emotions and

emotional behaviors in synthetic agents. In Proceedings of the fifth international

conference on Autonomous agents, pages 278 – 285. ACM Press, Montreal, Quebec,

Canada, 2001.

[3] Jeff Rickel and W. Lewis Johnson. Animated agents for procedural training in

virtual reality: Perception, cognition, and motor control. Applied Artificial
Intelligence, 13:343–382, 1998.

[4] Marc Cavazza, Fred Charles, and Steven J. Mead. Interacting with virtual char-

acters in interactive storytelling. In Proceedings of the first international joint

conference on Autonomous agents and multiagent systems, pages 318–325. ACM

Press, Bologna, Italy, 2002.

[5] Mark Riedl, C.J. Saretto, and R. Michael Young. Managing interaction between

users and agents in a multi-agent storytelling environment. In Proceedings of the 2nd

International Joint Conference on Autonomous Agents and Multi Agent Systems.

Melbourne, 2003.

[6] Elisabeth Andre, Thomas Rist, Susanne van Mulken, Martin Klesen, and Stephan

Baldes. The automated design of believable dialogues for animated presentation

teams. In Embodied Conversational Agents, pages 220–225, Cambridge, 2000. MIT

Press.

[7] Elisabeth Andre and Thomas Rist. Presenting through performing: On the use of

multiple Life-Like characters in Knowledge-Based presentation systems. In 2000

International Conference on Intelligent User Interfaces, pages 1–8. ACM Press,

New York, 2000.




[8] Elisabeth Andre, Thomas Rist, and Jochen Muller. Integrating reactive and scripted

behaviors in a Life-Like presentation agent. In Proceedings of the Second Inter-

national Conference on Autonomous Agents (Agents 1998), pages 261–268. ACM

Press, New York, 1998.

[9] Elisabeth Andre, Kim Binsted, Kumiko Tanaka-Ishii, Sean Luke, Gerd Herzog,

and Thomas Rist. Three RoboCup simulation league commentator systems. AI

Magazine, 22:57–66, 2000.

[10] Martin Strauss and Michael Kipp. ERIC: a generic rule-based framework for an

affective embodied commentary agent. 2007.

[11] Francois L. A. Knoppel, Almer S. Tigelaar, Danny Oude Bos, Thijs Alofs, and

Zsofia Ruttkay. Trackside DEIRA: a dynamic engaging intelligent reporter agent.

In Proceedings of the 7th international joint conference on Autonomous agents and

multiagent systems (AAMAS). Portugal, 2008.

[12] Michael Kipp. ANVIL: a generic annotation tool for multimodal dialogue. In
Proceedings of Eurospeech, pages 1367–1370, Aalborg, 2001.

[13] Ivan Gregor, Michael Kipp, and Jan Miksatko. IVAN: intelligent interactive virtual

agent narrators. In Proceedings of the 9th International Conference on Intelligent

Virtual Agents (IVA-09), pages 560–561. Springer, Amsterdam, 2009.

[14] Martin Strauss. Realtime generation of multimodal affective sports commentary

for embodied agents, 2007.

[15] Patrick Gebhard. ALMA - a layered model of affect. In Proceedings of the Fourth In-

ternational Joint Conference on Autonomous Agents and Multiagent Systems (AA-

MAS 05), pages 29–36. Utrecht, 2005.

[16] Lewis R. Goldberg. An alternative description of personality: The Big-Five fac-

tor structure. In Journal of Personality and Social Psychology, volume 59, pages
1216–1229, 1990.

[17] Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein. Centering: A frame-

work for modeling the local coherence of discourse. In Computational Linguistics,

volume 21, pages 203–225, 1995.

[18] Ionut Damian, Kathrin Janowski, and Dominik Sollfrank. Spectators, a joy to

watch. In Proceedings of the 9th International Conference on Intelligent Virtual

Agents (IVA-09), pages 558–559. Springer, Amsterdam, 2009.



[19] Elisabeth Andre and Thomas Rist. Controlling the behavior of animated pre-

sentation agents in the interface: Scripting versus instructing. In AI Magazine,

volume 22, pages 53–66. AAAI Press, 2001.

[20] Elisabeth Andre, Gerd Herzog, and Thomas Rist. Generating multimedia presen-

tations for RoboCup soccer games. In RoboCup-97: Robot Soccer World Cup I

(Lecture Notes in Computer Science). Springer, 1998.

[21] Dana Nau, Tsz-Chiu Au, Okhtay Ilghami, Ugur Kuter, Hector Munoz-Avila,

J. William Murdock, Dan Wu, and Fusun Yaman. Applications of SHOP and

SHOP2, 2004.

[22] Richard Fikes and Nils Nilsson. STRIPS: a new approach to the application of

theorem proving to problem solving. In Artificial Intelligence, volume 2, pages

189–208. 1971.

[23] Dana S. Nau, Stephen J. J. Smith, and Kutluhan Erol. Control strategies in HTN

planning: Theory versus practice. In AAAI-98/IAAI-98 Proceedings, pages 1127–

1133. 1998.

[24] Dana Nau, Hector Munoz-Avila, Yue Cao, Amnon Lotem, and Steven Mitchell.

Total-Order planning with partially ordered subtasks. In Proceedings of the Sev-

enteenth International Joint Converence on Artificial Intelligence (IJCAI-2001).

Seattle, 2001.

[25] Dana Nau, Yue Cao, Amnon Lotem, and Hector Munoz-Avila. SHOP: simple hier-

archical ordered planner. In International Joint Conference on Artificial Intelligence

(IJCAI-99), pages 968–973, Stockholm, 1999.

[26] Okhtay Ilghami and Dana S. Nau. A general approach to synthesize Problem-

Specific planners, 2003.

[27] Okhtay Ilghami. Documentation for JSHOP2. 2006.

[28] Gary Riley. CLIPS: a tool for building expert systems, 2008. URL
http://clipsrules.sourceforge.net/.

[29] Ernest Friedman-Hill. Jess, the rule engine for the java platform, 2009. URL

http://www.jessrules.com/.

[30] Patrick Gebhard, Michael Kipp, Martin Klesen, and Thomas Rist. Authoring scenes

for adaptive, interactive performances. In Proceedings of the Second International

Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-03),

pages 725–732. ACM Press, New York, 2003.



[31] Martin Klesen, Michael Kipp, Patrick Gebhard, and Thomas Rist. Staging exhibi-

tions: Methods and tools for modelling narrative structure to produce interactive

performances with virtual actors. In Virtual Reality. Special Issue on Storytelling

in Virtual Environments, volume 7, pages 17–29. Springer-Verlag, 2003.

[32] Norbert Reithinger, Patrick Gebhard, Markus Lockelt, Alassane Ndiaye, Norbert

Pfleger, and Martin Klesen. VirtualHuman: Dialogic and affective interaction with

virtual characters. In Proceedings of the 8th International Conference on Multimodal

Interfaces (ICMI’06), pages 51–58. Canada, 2006.

[33] Patrick Gebhard, Marc Schroder, Marcela Charfuelan, Christoph Endres, Michael

Kipp, Sathish Pammi, Martin Rumpler, and Oytun Turk. IDEAS4Games: building

expressive virtual characters for computer games. In Proceedings of the 8th Interna-

tional Conference on Intelligent Virtual Agents (IVA’08), pages 426–440. Springer,

2008.

[34] Patrick Gebhard and Susanne Karsten. On-Site evaluation of the interactive CO-

HIBIT museum exhibit. In Proceedings of the 9th International Conference on

Intelligent Virtual Agents (IVA-09), pages 174–180. Springer, Amsterdam, 2009.

[35] Michael Kipp, Kerstin H. Kipp, Alassane Ndiaye, and Patrick Gebhard. Evaluating

the tangible interface and virtual characters in the interactive COHIBIT exhibit,

2006.

[36] Andrew Ortony, Allan Collins, and Gerald L. Clore. The Cognitive Structure of
Emotions. Cambridge University Press, 1988.

[37] Christoph Bartneck. Integrating the OCC model of emotions in embodied charac-

ters. In Proceedings of the Workshop on Virtual Conversational Characters: Appli-

cations, Methods, and Research Challenges. Melbourne, 2002.

[38] Alexander Reinecke, Christian Dold, and Thomas Koch. Charamel Avatar Player

Interface. 2009.

[39] Alexis Heloir and Michael Kipp. EMBR - a realtime animation engine for interactive

embodied agents. In Proceedings of the 9th International Conference on Intelligent

Virtual Agents (IVA-09), pages 393–404. Springer, Amsterdam, 2009.

[40] N. Fazil Ayan, Ugur Kuter, Fusun Yaman, and Robert P. Goldman. HOTRiDE:

hierarchical ordered task replanning in dynamic environments. In Proceedings of

the ICAPS-07 Workshop on Planning and Plan Execution for Real-World Systems

- Principles and Practices for Planning in Execution. Providence, Rhode Island,

USA, 2007.