Data-Driven Programming and Behavior for Autonomous Virtual Characters

Jonathan Dinerstein, Parris K. Egbert, Dan Ventura and Michael Goodrich
Computer Science Department

Brigham Young University
[email protected], {egbert, ventura, mike}@cs.byu.edu

In the creation of autonomous virtual characters, two levels of autonomy are common. They are often called motion synthesis (low-level autonomy) and behavior synthesis (high-level autonomy), where an action (i.e., motion) achieves a short-term goal and a behavior is a sequence of actions that achieves a long-term goal. There exists a rich literature addressing many aspects of this general problem (it is discussed in the full paper). In this paper we present a novel technique for behavior (high-level) autonomy and utilize existing motion synthesis techniques.

Creating an autonomous virtual character with behavior synthesis abilities frequently includes three stages: forming a model used to generate decisions, running the model to select a behavior to perform given the conditions in the environment, and then carrying out the chosen behavior (translating it into low-level synthesized or explicit motions). For this process to be useful it must efficiently produce realistic behaviors. We address both of these requirements with a novel technique for creating cognitive models. The technique uses programming-by-demonstration to address the first requirement, and uses data-driven behavior synthesis to address the second. Demonstrated human behavior is recorded as sequences of abstract actions, the sequences are segmented and organized into a searchable data structure, and then behavior segments are selected by determining how well they accomplish the character's long-term goal (see Fig. 1). The resulting model allows a character to engage in a very large variety of high-level behaviors.
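As a purely illustrative reading of this pipeline, the sketch below records a demonstration as a sequence of abstract actions and cuts it into fixed-length segments. The names `record_demonstration`, `segment_demonstration`, and `K_LENGTH`, and the choice of overlapping windows, are our assumptions rather than details from the paper.

```python
# Illustrative sketch only; identifiers and the overlapping-window choice
# are assumptions, not the paper's implementation.

K_LENGTH = 8  # assumed fixed segment length (number of abstract actions)

def record_demonstration(observe_action, num_steps):
    """Record the demonstrator's abstract action at discrete time steps."""
    return [observe_action(t) for t in range(num_steps)]

def segment_demonstration(behavior, k=K_LENGTH):
    """Cut a demonstrated behavior b = <a_1, ..., a_l> into fixed-length
    segments b_i(j, j+k); their union over all demonstrations forms Q."""
    return [behavior[j:j + k] for j in range(0, len(behavior) - k + 1)]
```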

Overview and Formulation

The proposed data-driven technique contains two interrelated parts: training via programming-by-demonstration and autonomous behavior synthesis. The training phase consists of 1) a programmer integrating our system into a new character, 2) a user demonstrating behaviors, and 3) the demonstrator testing and editing the behaviors. The synthesis is effected by 1) segmenting and clustering the demonstrated behaviors, 2) performing segment selection to choose the next behavior to perform, and 3) iteratively performing the actions in the selected segments. These processes are formalized in the full paper.
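The synthesis half of this split can be viewed as a small control loop. The sketch below is our own schematic of that flow, with segment selection, action execution, and the validity check passed in as callables (they are sketched in later sections); none of these names come from the paper.

```python
def run_character(select_segment, execute_action, still_valid, state, num_decisions):
    """Schematic runtime loop (our assumption of the control flow):
    pick a demonstrated behavior segment, execute its actions one by one,
    and re-select as soon as the segment is judged no longer appropriate."""
    for _ in range(num_decisions):
        remaining = list(select_segment(state))        # choose next behavior segment
        while remaining:
            state = execute_action(state, remaining.pop(0))
            if not still_valid(remaining, state):      # periodic re-evaluation
                break                                  # abandon segment and re-plan
    return state
```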

Briefly, we assume a state/action representation and a nondeterministic environment that probabilistically maps state/action pairs to new states, $s_{t+1} \sim \Pr(s_{t+1} \mid s_t, a_t)$.
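Read concretely, this environment model amounts to sampling a successor state. A minimal sketch, assuming a tabular transition model (the representation is illustrative, not the paper's):

```python
import random

def sample_next_state(transition_model, s_t, a_t):
    """Draw s_{t+1} ~ Pr(s_{t+1} | s_t, a_t).
    transition_model is assumed to map (state, action) to a list of
    (next_state, probability) pairs."""
    next_states, probs = zip(*transition_model[(s_t, a_t)])
    return random.choices(next_states, weights=probs, k=1)[0]
```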

Behaviors are represented as an ordered sequence of actions, $b = \langle a_1, a_2, \ldots, a_l \rangle$, and are recorded by observing the user at discrete time intervals. Behaviors are segmented into a set $Q$ of small, fixed-length sequences, and novel behavior is produced by concatenating behavior segments from $Q$. Behavior segments are selected such that their concatenation fulfills the goals of the animator and appears stylized and natural. Goals are encoded as a fitness function, and the sequence of states induced by a behavior (sequence of actions) can be evaluated against this (finite-horizon) fitness:

$$b_{\mathrm{best}} = \operatorname*{argmax}_{b_i(j,\, j+K_{\mathrm{length}}) \in Q} \; \sum_{u=1}^{K_{\mathrm{length}}} f(\tilde{s}_u)$$

where $b_i(j, j+K_{\mathrm{length}})$ is the sequence of actions tested, $\tilde{s}_1, \ldots, \tilde{s}_{K_{\mathrm{length}}}$ are the states resulting from performing these actions, and $f : S \to \mathbb{R}$ is the fitness function.
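Ignoring the clustering described in the next section, this selection rule can be implemented directly by rolling each candidate segment forward through a one-step simulator and summing the fitness of the visited states. The sketch below is a brute-force illustration; `simulate_step` and `fitness` are assumed callables standing in for the character's predictive model and goal encoding.

```python
def select_best_segment(segments, current_state, simulate_step, fitness):
    """Brute-force version of b_best = argmax_{b_i in Q} sum_u f(s~_u).

    segments      -- the set Q of fixed-length action sequences
    simulate_step -- (state, action) -> predicted next state
    fitness       -- state -> float, encoding the animator's goals
    """
    best_segment, best_score = None, float("-inf")
    for seg in segments:
        state, score = current_state, 0.0
        for action in seg:                  # simulate the segment's actions
            state = simulate_step(state, action)
            score += fitness(state)         # accumulate finite-horizon fitness
        if score > best_score:
            best_segment, best_score = seg, score
    return best_segment
```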

Autonomous Behavior

Our approach makes deliberative decisions by simulating the outcome of candidate choices. Such simulation is analogous to traditional planning, except that we only consider a set of $n = |Q|$ paths in the search tree. In other words, each behavior segment $b_i \in Q$ represents a path through the search tree (and thus the user demonstration has, in essence, pruned the search tree, allowing faster deliberation). We further reduce computational requirements by hierarchically clustering behavior segments and using a greedy search that relies on the notion of a prototype member of a cluster. The fitness of a cluster is approximated by computing the fitness of its prototype segment. The effect of other agents in the environment is included in the simulation using their existing behavioral models, if known, or by using agent modeling techniques. Finally, since our technique is a heuristic one and since, more importantly, the environment is dynamic, it is important to occasionally reevaluate and perhaps modify behavior. During execution, the model periodically evaluates whether the currently selected behavior segment is still appropriate by determining whether the average fitness of the remaining actions is above a threshold. If not, the current segment is deemed invalid and a new segment is selected.
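One possible realization of the prototype-based greedy descent and the re-evaluation test is sketched below. The cluster-node layout, the `ClusterNode` name, and the averaging details are our assumptions, not the paper's code.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class ClusterNode:
    prototype: Any                                   # representative behavior segment
    children: List["ClusterNode"] = field(default_factory=list)

def segment_fitness(segment, state, simulate_step, fitness):
    """Finite-horizon fitness of one segment, as in the selection rule above."""
    total = 0.0
    for action in segment:
        state = simulate_step(state, action)
        total += fitness(state)
    return total

def greedy_cluster_search(node, state, simulate_step, fitness):
    """Descend the segment-cluster hierarchy, scoring only each child's
    prototype; roughly O(log n) segment evaluations for n segments."""
    while node.children:
        node = max(node.children, key=lambda c:
                   segment_fitness(c.prototype, state, simulate_step, fitness))
    return node.prototype

def segment_still_valid(remaining_actions, state, simulate_step, fitness, threshold):
    """Re-evaluation during execution: keep the current segment only if the
    average fitness over its remaining actions stays above a threshold."""
    if not remaining_actions:
        return False
    avg = segment_fitness(remaining_actions, state, simulate_step, fitness) / len(remaining_actions)
    return avg >= threshold
```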

Figure 1: Overview. (a) A user interactively demonstrates the desired high-level character behavior. (b) These demonstrations are recorded as sequences of abstract actions. (c) At run-time, segments of the action sequences are combined into novel behaviors. The resulting sequence of abstract actions determines the categories and goals of the character's motion synthesis. Thus a character can engage in very long-term planning to achieve a virtual environment state S(x).

Experimental Results

For this work, the most important evaluation criterion is the aesthetic value of the resulting animation. Since aesthetic value is subjective, the case studies and resulting video serve as the basis for this evaluation. Other criteria are also relevant, however, including the following: objective quality of action selection, sensitivity to parameter settings, algorithmic complexity, search time, memory requirements, CPU usage, and time required by the animator. We have performed five case studies and demonstrations in the experiments: a submarine piloting task, a predator/prey scenario, a crowd behavior scenario, a capture-the-flag game, and a cooperative assembly task (Fig. 2). In each case study, there is no explicit communication between the characters. All characters must rely on "visual" perceptions to ascertain the current state of the virtual world $s_t$, and perception is performed by each character individually. The first three case studies use actions of short duration (< 1 second). The last two use actions of long duration (> 5 seconds). In all case studies, significant action planning is performed (up to 20 actions into the future).

Our programming-by-demonstration approach has empirically proven to take very little time and storage, with a small amount of behavior trajectory data sufficient to achieve effective character behavior, and an informal user case study provides evidence that our interface is intuitive and easy to use. Our technique provides deliberative decision making while being significantly faster and more scalable than a cognitive model, because the use of example action sequences allows efficient computation in large action spaces. (See video at rivit.cs.byu.edu/a3dg/videos/dinerstein ddpb.mov.)

Our technique produces stylized and natural high-level character behavior that can be quickly demonstrated by non-technical users. While our approach is more computationally expensive than most reactive approaches to decision making, it can be argued that deliberative techniques (like ours) can produce superior behavior. Moreover, our technique, being $O(\log n)$, where $n$ is the number of demonstrated action sequences, is significantly faster than traditional deliberative methods, which are usually $O(|A|^m)$, where $|A|$ is the number of possible actions and $m$ is the depth of the planning tree. Because our technique is heuristic, optimal decision making is not guaranteed. However, we empirically demonstrate that the approach is robust, is effective in complex virtual environments, and is less computationally expensive than many traditional deliberative techniques. It is a surprisingly scalable deliberative scheme, and it can work with actions of fine or coarse granularity, as demonstrated in our case studies. We can compute long plans (e.g., 20 actions), allowing us to be confident in these plans. Our technique works in conjunction with standard character animation algorithms, providing goals and guidance through long-term deliberation.
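As a purely illustrative comparison (the numbers here are invented for this sketch and are not drawn from the case studies): with $|A| = 10$ possible actions and planning depth $m = 5$, a full deliberative search would examine on the order of $|A|^m = 10^5$ action sequences, whereas a greedy descent through a balanced hierarchy over $n = 1000$ demonstrated segments evaluates only on the order of $\log_2 n \approx 10$ cluster prototypes.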

Figure 2: Animation of the Assembly test bed. Two characters work together, without explicit communication, to assemble a color-coded pyramid from boxes. The fitness function (which defines the desired shape and color configuration) is automatically learned during the demonstration phase. The characters choose high-level actions (e.g., "pick up a red box and place at location x"), which are translated into novel motion through a combination of data-driven motion synthesis and inverse kinematics.