Proceedings of the 1991 Winter Simulation Conference
Barry L. Nelson, W. David Kelton, Gordon M. Clark (eds.)

SIMULATING HUMAN TASKS USING SIMPLE NATURAL LANGUAGE INSTRUCTIONS

Moon Ryul Jung*, Jugal K. Kalita+, Norman I. Badler*, Wallace Ching*

* Department of Computer & Information Science, University of Pennsylvania, Philadelphia, PA 19104-6389.
+ Department of Computer Science, University of Colorado, Colorado Springs, CO 80933.

ABSTRACT

We report a simple natural language interface to a human task simulation system that graphically displays performance of goal-directed tasks by an agent in a workspace. The inputs to the system are simple natural language commands requiring achievement of spatial relationships among objects in the workspace. To animate the behaviors denoted by instructions, a semantics of action verbs and locative expressions is devised in terms of physically-based components, in particular geometric or spatial relations among the relevant objects. To generate human body motions to achieve such geometric goals, motion strategies and a planner that uses them are devised. The basic idea for the motion strategies is to use commonsensical geometric relationships to determine appropriate body motions. Motion strategies for a given goal specify possibly overlapping subgoals of the relevant body parts in such a way that achieving the subgoals makes the goal achieved without collision with objects in the workspace. A motion plan generated using the motion strategies is basically a chart of temporally overlapping goal conditions of the relevant body parts. This motion plan is animated by sending it to a human motion controller, which incrementally finds joint angles of the agent's body that satisfy the goal conditions in the motion plan, and displays the body configurations determined by the joint angles.

1 INTRODUCTION

Computer animation of human tasks in a given workspace is an effective tool for visualizing the results of task simulation. To facilitate the use of task animation for this purpose, the workspace designers should be allowed to specify tasks using a high-level language which does not require extensive training or learning. Natural language is most suitable among such languages, because it is especially geared to describing and expressing human behavior. Languages have evolved rich verbal, prepositional and adverbial vocabularies to facilitate the expression of subtle as well as complex movements. Accordingly, we have been conducting research towards achieving realistic animations of tasks specified in terms of a sequence of natural language instructions (Badler et al. 1990), such as Put the block on the table, Close the door, Put the block in the box and Push the block against the chair. Typically natural language instructions specify what is to be done, and rarely indicate the specific articulations required of body segments or parts to achieve a given goal. The agent should plan his motions so that the goal can be effectively achieved. Typical motions include taking a step, re-orienting the body, kneeling down, and reaching around an obstacle. The planned motions are dependent upon the structure of the workspace and the situations in which the agent is positioned.

To support the use of natural language instructions as a human task simulation language, we should be able to represent goals conveyed by instructions in terms of fine-grained physical components, in particular geometric or spatial relations among the relevant objects. We also need the capability of human motion planning by which body motions needed to achieve the fine-grained physical relations are found. This paper addresses how to provide these two capabilities. More specifically, given a task level goal in a given workspace specified in simple natural language commands such as Put the block on the table, the problem of this study is to automatically find a set of possibly overlapping primitive motions that the relevant body segments of the agent must perform to achieve the task.

2 HANDLING NATURAL LANGUAGE INPUTS

To handle natural language instructions, we need to develop an approach to representing the semantics of action verbs.


Action verbs are used to denote actions that an agent is instructed to perform. Instructions also contain prepositional and adverbial modifiers to provide values of arguments, obligatory as well as optional, as well as other relevant information to enable the performance of the underlying task.

Elsewhere, we have discussed in detail the scheme adopted for the representation of meanings of action verbs and their modifiers (Badler 1990, Kalita 1990a, Kalita 1990b, Kalita 1991). We specify an implementable semantics of action verbs and their modifiers in terms of physically-based components. These include geometric or spatial relations among relevant objects, relationships among sub-actions such as sequentiality, concurrency and repetitiveness, inherent specification of an action's termination condition or lack thereof, and inherent motion or force-related components of word meanings. Specification of geometric relations among objects is crucial in describing a physical environment as well as describing how an action changes an environment. Thus, specification of geometric relations must play an important role in the lexical semantics of action verbs which denote actions performed by one or more agents. Geometric relations provide information regarding how one or more objects or their sub-parts relate to one another in terms of physical contact, location, distance among them, and orientation. It will suffice to classify geometric relations into two classes: positional and orientational. Positional geometric relations specify a situation in which a geometric entity (a point, line, surface or a volume) is related to (or constrained to) another geometric entity. For example, the verb put, which requires two obligatory objects as arguments, specifies that a certain geometric relation be established between the two objects. The particulars of the geometric relation are provided by prepositional phrases as seen in the two sentences: Put the ball on the table and Put the ball in the box. The preposition on specifies that an arbitrary point on the ball be related to or constrained to an arbitrary point on the surface of the table. Similarly, in in the second sentence refers to the fact that the ball or the volume occupied by the ball be constrained to the interior volume of the box. There are two geometric entities involved in the case of both on and in. In the case of the in example, the two entities are both volumes, viz., the volume of the ball and the interior volume of the box. In the case of the on example, there are two points: a point on the ball and a point on the table. In general, a geometric relationship relates two geometric entities, each of which may be 0-, 1-, 2-, or 3-dimensional. We call them the source-space and destination-space, or simply source and destination.
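This classification can be summarized in a small data structure. The following Python sketch records a positional or orientational relation between a source space and a destination space, each of dimension 0 through 3; the class and field names are ours, for illustration only, not the paper's.

```python
from dataclasses import dataclass
from enum import Enum

class Dimension(Enum):
    """Dimensionality of a geometric entity (source or destination space)."""
    POINT = 0     # e.g., a point on the surface of a ball
    LINE = 1
    SURFACE = 2   # e.g., the top face of a table
    VOLUME = 3    # e.g., the interior volume of a box

@dataclass
class GeometricEntity:
    owner: str            # object the space belongs to, e.g. "ball1"
    feature: str          # which sub-space, e.g. "surface-point", "interior-volume"
    dimension: Dimension

@dataclass
class GeometricRelation:
    kind: str             # "positional" or "orientational"
    source: GeometricEntity
    destination: GeometricEntity

# "Put the ball in the box": the volume occupied by the ball is constrained
# to the interior volume of the box (both entities are volumes).
in_relation = GeometricRelation(
    kind="positional",
    source=GeometricEntity("ball1", "occupied-volume", Dimension.VOLUME),
    destination=GeometricEntity("box1", "interior-volume", Dimension.VOLUME),
)
```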

Although not discussed in this paper, our representation also allows for specifying motion and force-related components of a word's meaning, and can be used to handle verbs such as move, push, pull, roll, put, fill and turn. Similar representations have also been developed for prepositions such as against, along, around, from and to, as well as adverbs such as gently, quickly and exactly.

Kalita (1990b) demonstrated the validity and usefulness of the representation scheme adopted, by performing animation of the underlying tasks starting from natural language instructions. An earlier attempt towards the establishment of a link between natural language instructions and the graphical animation of underlying tasks was also reported in (Esakov 1989, 1990). In the current paper, we will discuss how an example command put the block on the table is interpreted and planned. We assume that in our environment, the block is initially sitting at the bottom of an open box and that there is a table by the side of the box. The agent is standing in a position where she can reach the block as well as the table without walking.

3 REDUCING THE TASK LEVEL GOALS

Let us consider the command Put the block on the table. This command will be used as a running example of planning and animation. The relevant lexical entries for the verb put and the preposition on are given below. The lexical entries for the determiner and the nouns are not shown here. In these definitions we have removed some of the details to make them more presentable. Put the block on the table is paraphrased as saying Put the block so that the block is on the table, and thereby parsed into put(hearer, (the block), on((the block), (the table))). Our lexical definition of the verb put is given as

put(agent, object, spatial-relation) →
    achieve(agent, positional-goal(spatial-relation))

According to this definition, put(ag0, block1, on(block1, table1)) is translated into achieve(ag0, positional-goal(on(block1, table1))). It means that agent ag0 is to achieve a particular positional goal between block1 and table1 denoted by the function term positional-goal(on(block1, table1)). That particular positional goal is dependent on the nature of the spatial relationship on and the natures of the two objects block1 and table1.

Here we present only one sense of the preposition on. This sense is associated with the most common usage of on and is expressed in terms of three components: above, contact and support.


Using these components, the positional goal associated with the spatial relationship on(object1, object2) is specified as follows:

positional-goal(on(object1, object2)) →
    above(source-space, destination-space) and
    contact(source-space, destination-space) and
    support(destination-space, source-space), where
        source-space := any-of(self-supporting-space-of(object1));
        destination-space := an area X in any-of(supporter-space-of(object2))
            such that horizontal(X) and
                      direction-of(normal-to(X)) = "global-up" and
                      free(X);

Here, ':=' represents assignment of a value to a placeholder. The source space and the destination space are determined by the geometric relationship intended and the nature of the objects involved in the relationship. The source space is a self-supporting-space of object1. A self-supporting-space of an object is a point, line or surface on the object on which it can be supported. This is a fundamental functional property of an object. Some examples are that a ball can be supported on any point on its surface, a cubic block on any of its faces, and a table on its four legs.

The destination space is an area on the supporter-space of object2. A supporter-space of an object is a feature of the object that can support other things against the force of gravity. The knowledge base has information as to which face of a block is a self-supporting-space and which face is a supporter-space. For example, a table's function is to support objects on its top surface, and a bookcase supports objects on the top surface of the individual shelves. It is also required that (1) the destination space is horizontal, (2) the normal vector of the destination space, the outward going vector perpendicular to the destination space, is up with respect to the global reference frame, and (3) the destination space is a free area spacious enough to accommodate the source space.
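As a rough illustration, this selection can be written as a filter over knowledge-base entries. The Python sketch below only mirrors the constraints listed above; the toy knowledge base and its attribute names are our own assumptions, not the system's actual representation.

```python
GLOBAL_UP = (0.0, 0.0, 1.0)

# A toy knowledge base: each object's self-supporting spaces, and its
# supporter spaces with a few geometric attributes (illustrative only).
KB = {
    "block1": {"self_supporting": ["side1", "side2", "side3", "side4", "side5", "side6"]},
    "table1": {"supporter": [{"name": "top", "normal": GLOBAL_UP,
                              "horizontal": True, "free_area": 1.2}]},
}

def choose_source_space(object1):
    """any-of(self-supporting-space-of(object1)): e.g. any face of a cubic block."""
    spaces = KB[object1].get("self_supporting", [])
    return spaces[0] if spaces else None

def choose_destination_space(object2, required_area):
    """A supporter-space area that is horizontal, faces globally up, and is free
    and spacious enough to accommodate the source space."""
    for area in KB[object2].get("supporter", []):
        if area["horizontal"] and area["normal"] == GLOBAL_UP and area["free_area"] >= required_area:
            return area["name"]
    return None   # the goal cannot be reduced in the current situation

print(choose_source_space("block1"), choose_destination_space("table1", 0.04))  # side1 top
```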

Using the definition for positional-goal(on(object1, object2)), and the knowledge base entries for the block block1 and the table table1, the goal achieve(ag0, positional-goal(on(block1, table1))) is reduced into a more refined form:

happen(achieve(ag0,
               holdat(above(Src, Dest) and
                      contact(Src, Dest) and
                      support(Dest, Src), T2)),
       E, [T1, T2]).

Here, Src is the source space of object block1 and Dest is the destination space of object table1. A formula of the form happen(achieve(Agent, holdat(Rel, T2)), E, [T1, T2]) means that an action E, in which agent Agent achieves goal Rel to hold at time T2, happens between time T1 and T2. The component subgoals above, contact, and support can be geometrically defined as follows.

above(Src, Dest) ≡
    positioned-at(center-of(Src), Pos) such that
    minimal-vertical-distance-between(Src, Dest) > 0.

That is, Src is above Dest if the center of Src is positioned at some location Pos so that the minimal vertical distance between Src and Dest is greater than zero.

contact(Src, Dest) ≡
    positioned-within(center-of(Src), Dest) and
    align-direction(normal-to(Src), normal-to(Dest)).

That is, Src contacts Dest if the center of Src is positioned within Dest, and Src and Dest are parallel, their normal vectors being aligned with each other.

support(Dest, Src) ≡
    intersect(vertical-line-through(center-of-mass(Src)), Dest).

That is, Dest supports Src if the vertical line passing through the center of mass of Src intersects Dest.
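Read geometrically, the three subgoals are simple tests. The Python sketch below implements them for a horizontal rectangular patch; this simplified geometry is our own assumption, whereas the system's geometric reasoner works on general polyhedra.

```python
import numpy as np

class Patch:
    """A horizontal, axis-aligned rectangular area: center (x, y, z) and half-extents."""
    def __init__(self, center, half_x, half_y):
        self.center = np.asarray(center, dtype=float)
        self.half_x, self.half_y = half_x, half_y

    def contains_xy(self, point):
        return (abs(point[0] - self.center[0]) <= self.half_x
                and abs(point[1] - self.center[1]) <= self.half_y)

def above(src_center, dest: Patch) -> bool:
    """above(Src, Dest): the vertical distance from Src's center down to Dest is positive."""
    return src_center[2] - dest.center[2] > 0.0

def contact(src_center, src_normal, dest: Patch) -> bool:
    """contact(Src, Dest): Src's center lies within Dest and the normals are aligned."""
    aligned = np.allclose(np.cross(src_normal, (0.0, 0.0, 1.0)), 0.0, atol=1e-9)
    return dest.contains_xy(src_center) and aligned

def support(dest: Patch, src_center_of_mass) -> bool:
    """support(Dest, Src): the vertical line through Src's center of mass intersects Dest."""
    return dest.contains_xy(src_center_of_mass)

table_top = Patch(center=(0.0, 0.0, 0.7), half_x=0.5, half_y=0.4)
print(above((0.1, 0.1, 0.75), table_top),
      contact((0.1, 0.1, 0.7), (0.0, 0.0, 1.0), table_top),
      support(table_top, (0.1, 0.1, 0.75)))   # True True True
```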

4 GEOMETRY OF THE HUMAN FIGURE

Our human figure model is anthropometrically realistic. Figure 1 shows the geometry of the human figure model. The human figure is modelled as a tree of rigid segments connected to each other via joints. For example, the left lower arm segment and the left upper arm segment are connected to each other via the elbow joint. All the segments in the human figure and in the workspace are assumed to be polyhedra. The human figure model has 36 joints with a total of 88 degrees of freedom in it, excluding the hands and fingers. The human figure has an upper body consisting of 17 segments and 18 vertebral joints (Monheit & Badler 1991). Each joint is described by defining a local reference frame (coordinate system) on it. The degrees of freedom of a joint is the number of independent parameters that uniquely determine the orientation of the joint with respect to the joint reference frame. In the case of the human figure, the parameters are joint angles measured with respect to each joint's reference frame. For example, the elbow is modelled to have one degree of freedom, and the wrist three degrees of freedom. The wrist joint with the three degrees of freedom is considered to have three joint angles that determine its orientation. The spatial configuration of a figure with N degrees of freedom at a time is uniquely determined by a sequence of N joint angles in the figure at that time.

The segments of the body whose end points are free are called end effectors. At the level of gross motion planning, hands and feet are typically considered end effectors. To specify a desired spatial configuration of the end effectors or other parts of the body, we define local reference frames on the body parts and state goal conditions in terms of the origin positions and orientations of these reference frames. Such reference frames will be called reference sites, and are considered handles on the body. The movement of a given body part is dependent on the base or starting joint with respect to which the end effector is moved. Such a base joint is called the pivot joint, and the segments from the base joint to the moving body part are called the moving body chain. In figure 1, the shaded part is the moving body chain determined relative to the pivot joint waist. We use two pivot joints: a shoulder and a waist. For example, if the waist is the pivot joint, the segments in the chain are the palm, the lower and upper arms, and the vertebrae forming the torso. The joints in the chain are the wrist, the elbow joint, the shoulder joint, the joints among the vertebrae, and the waist joint. This means that the palm and the arms can be moved in any direction and orientation within their physical limits. The torso can also be moved and bent at the waist.

Given a sequence of joint angles of the moving body chain, the spatial configuration of the moving body chain is uniquely determined. This mapping from the joint angles to the spatial configuration is called forward kinematics and is usually computationally straightforward. The opposite process of finding the joint angles that determine a given spatial configuration is called inverse kinematics, and is computationally difficult.
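For intuition, forward kinematics for a simple planar chain can be written in a few lines. This is a generic textbook illustration of the mapping from joint angles to end-effector position, not the 88-degree-of-freedom body model used in the system.

```python
import numpy as np

def forward_kinematics(joint_angles, segment_lengths):
    """Return the (x, y) position of the chain tip for the given joint angles."""
    x = y = 0.0
    total_angle = 0.0
    for angle, length in zip(joint_angles, segment_lengths):
        total_angle += angle                       # joint angles accumulate along the chain
        x += length * np.cos(total_angle)
        y += length * np.sin(total_angle)
    return np.array([x, y])

# Example: a two-segment "arm" with the first joint bent to 90 degrees.
print(forward_kinematics([np.pi / 2, 0.0], [0.3, 0.25]))   # -> approx. [0.0, 0.55]
```

Inverse kinematics asks the opposite question, which joint angles place the tip at a desired position, and in general requires iterative numerical search.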

5 POSTURE PLANNING

In planning human body motions for a given fine-grained geometric goal, we are concerned with gross motions rather than fine motions. The process of finding the appropriate movements of relevant body parts or segments for gross motions is called posture planning. The planned sequence of primitive motions is sent to a process animator, which finds and graphically displays the body configurations over time for the planned motions.

Most AI symbolic planning works treat body motions such as put on as primitive, and do not consider how anthropometrically realistic human agents will achieve such primitive tasks. However, we need such capability in order to use animation of the human figure for evaluation of workspaces for human operators. The techniques of robot motion planning (Lozano-Perez 1983, 1985, 1987, Donald 1987, Barraquand, Langlois & Latombe 1989) could be used for planning motions of realistic human agents. The common framework of most robot motion planning techniques is that all information about object boundaries in the environment and about movement of the manipulator is represented in terms of joint angles of the manipulator. Note that a manipulator (robot arm) at each moment can be represented by its joint angles at that moment, because a sequence of joint angles determines the spatial configuration of the manipulator. The motions of the manipulator are found by searching through the joint angle space from the initial joint angles to the final joint angles, so that a sequence of intermediate joint angles of the manipulator satisfies given constraints or motion criteria. We can use this framework if a motion criterion under consideration can be directly characterized in terms of joint angles. Trying to characterize motion criteria in terms of joint angles of the manipulator is natural because the purpose of the motion planning is to determine spatial configurations of the manipulator over time, and in turn joint angles of the manipulator over time. However, for human agents, whose behaviors are much more complicated than robots, it is not easy to characterize all the motion criteria directly in terms of joint angles of the human body. For the current robot manipulators, motion criteria can be characterized in terms of joint angles, because their behaviors are rather simple. Usually, the base of a robot manipulator is assumed to be fixed at a given location and the distance between the initial configuration and the final configuration of the manipulator is assumed to be relatively short.

In the case of human motions, various kinds of reasoning are needed before characterizing motions directly in terms of joint angles. Hence the basic idea of our approach is that before transforming the human motion problem into a purely numerical problem formulated with respect to the joint angle space, we need to apply symbolic reasoning based on our commonsensical geometric notions. Symbolic reasoning is used to deal with motion criteria that are difficult to specify in terms of joint angles. More specifically, the Euclidean space positions and orientations of reference sites of the human figure are manipulated to satisfy motion criteria. The joint angles causing such positions and orientations are computed only when intermediate spatial configurations are suggested, in order to check whether the suggested spatial configurations are feasible in terms of joint angles. The joint angles for a given spatial configuration are computed by an inverse kinematic algorithm.

To plan for a given fine-grained geometric goal, the planner obtains the goal position of the end effector that would satisfy the given geometric goal, and then generates intermediate goal conditions of the relevant body parts. These goal conditions are generated based on the inherent motion constraints such as avoiding obstacles and respecting the limit of the agent's muscular strength while lifting or moving objects. More specifically, given a goal, goal-directed motion strategies are employed to generate the goal conditions about the positions and orientations of the reference sites on the relevant body parts, so that a given set of motion constraints is respected. These goal conditions comprise a chart of temporally overlapping goal conditions about the relevant body parts. Then, the joint angles that satisfy the overlapping sequence of the goal conditions are computed using a robust inverse kinematic algorithm (Zhao 1989, Phillips, Zhao & Badler 1990, Phillips & Badler 1991). Zhao's algorithm uses a numerical optimization technique.

In general, the motion criteria (motion strategies or rules) used to plan human motions can vary depending on the purpose of task animation, which determines how fine-grained the simulated motions should be. The motion strategies are specified using information about the structures of the workspace and the agent, and the relationships between the agent and the world.

As an example of motion strategies, we describe relatively fine-grained motion strategies by which the motions of the agent are planned for a given fine-grained geometric goal. The basic idea of the motion strategies is to try straightforward planning initially, and then, if the initial planning is not feasible, to find the causes of the failure and generate goal conditions to avoid them. The process of trial and error, however, would not be seen during the execution time. The motions are planned by exploiting the structure of the human body. It is assumed that human agents move with their feet on the ground or some horizontal base. The human body motion planning is decoupled into upper body planning and lower body planning. The interaction between the two is controlled by the overall motion planning. Lower body planning is basically planning movements of the waist position, which includes stepping, walking, and raising or lowering the body. The waist may be lowered, raised, or moved horizontally depending on the goal position of the end effector. Upper body planning is moving the end effector to a given goal position. If moving the end effector to the given goal position is not possible with respect to the current body posture, the lower body planning is invoked to change the body posture. In this paper, we are concerned with the upper body planning.
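The interaction between the two planners amounts to a retry loop: attempt the reach, and if it fails, adjust the posture and try again. A minimal control-flow sketch is shown below; the planning callables and the one-dimensional "posture" are hypothetical stand-ins supplied for illustration only.

```python
def plan_posture(posture, goal, plan_upper_body, plan_lower_body, max_adjustments=5):
    """Try upper-body planning; on failure, adjust the lower body and retry."""
    for _ in range(max_adjustments):
        upper_plan = plan_upper_body(posture, goal)      # reach with hand/arm/torso
        if upper_plan is not None:
            return upper_plan                            # goal reachable from this posture
        posture = plan_lower_body(posture, goal)         # step, walk, raise or lower the waist
        if posture is None:
            return None                                  # no feasible postural change found
    return None

# Toy usage: an "upper body" that succeeds only once the waist is close enough.
reach = 0.6
upper = lambda posture, goal: ("reach", posture, goal) if abs(goal - posture) <= reach else None
lower = lambda posture, goal: posture + 0.5 if goal > posture else posture - 0.5
print(plan_posture(0.0, 1.5, upper, lower))   # steps twice, then reaches
```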

6 UPPER BODY PLANNING

The upper body planning is attempted with respect to a given pivot joint suggested by the overall motion planning. If the end effector cannot directly reach the final goal with respect to the given pivot joint, the planner tries to find an intermediate goal position of the end effector, from which the final goal is expected to be reached. An intermediate goal position is selected so that it is within the reach of the end effector with respect to the given pivot joint, collision-free, and goal-directed. If such a selection cannot be made, backtracking to the overall motion planning occurs.

Heuristic rules used to find a collision-free location are defined differently depending on whether the straightforward linear movement of the end effector towards the final goal position is horizontal and forwards, horizontal and to the left, horizontal and to the right, or other combinations. Here the directions are determined using the coordinate system shown in figure 2. As an example of heuristic rules, consider the case where the linear movement of the end effector towards the goal position is horizontal and forwards and that movement causes collision with an obstacle. This movement is shown in figure 3 by the dotted line. In this case, an intermediate goal position of the end effector is collision-free if the end effector at that intermediate goal position is exactly above, exactly below, exactly to the right of, exactly to the left of, above and to the right of, above and to the left of, below and to the right of, or below and to the left of the obstacle, by a given margin. The other cases are treated similarly. The heuristics for collision-avoidance of the inner segments are the same as the heuristics for the end effector.

Fig. 2: Coordinate System (front of X, back of X, left of X, right of X, determined relative to the agent looking at X).

Fig. 3: The Agent Moves The Hand Linearly Towards The Block, Following The Dotted Line.
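The eight candidate placements named above (the four cardinal offsets and the four diagonal combinations, each by a given margin) can be enumerated directly. The sketch below does so for an axis-aligned box obstacle and then picks the candidate closest to the final goal, anticipating the goal-directedness condition discussed next; the simplified obstacle model and helper names are our own.

```python
import itertools
import numpy as np

def candidate_positions(obstacle_min, obstacle_max, margin):
    """Eight collision-free candidate positions around an axis-aligned box obstacle."""
    lo, hi = np.asarray(obstacle_min, float), np.asarray(obstacle_max, float)
    center = (lo + hi) / 2.0
    candidates = []
    # vertical offsets: above / level / below; lateral offsets: left / center / right
    for dz, dx in itertools.product((+1, 0, -1), (-1, 0, +1)):
        if dz == 0 and dx == 0:
            continue                                       # that would be inside the obstacle
        pos = center.copy()
        pos[0] += dx * ((hi[0] - lo[0]) / 2.0 + margin)    # x: to the left/right of the obstacle
        pos[2] += dz * ((hi[2] - lo[2]) / 2.0 + margin)    # z: above/below the obstacle
        candidates.append(pos)
    return candidates

def best_candidate(candidates, final_goal):
    """Rank candidates by goal-directedness: minimal distance to the final goal."""
    return min(candidates, key=lambda p: np.linalg.norm(p - np.asarray(final_goal, float)))

cands = candidate_positions((0.2, -0.1, 0.5), (0.6, 0.3, 0.8), margin=0.05)
print(len(cands), best_candidate(cands, final_goal=(0.4, 0.1, 0.95)))
# 8 candidates; the one exactly above the obstacle is closest to this goal
```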

An intermediate goal position is goal-directed if it really helps the end effector to reach the final goal position. This goal condition is defined differently depending on whether the final goal position is within some container or not. In a workplace, every object is in some container object; the workspace itself is considered a container. If an object is within a container, it is assumed that there is an opening through which objects in the container can be reached. The opening may be covered or not. When it is covered, it is assumed that it can be opened. In the case where the final goal position of the end effector and the current position of the end effector are immediately within the same container (including the workspace), an intermediate goal position is goal-directed if it is minimally away from the final goal position. An intermediate goal position is minimally away from the final goal position if it is chosen so that the length of the straight line from the intermediate goal position to the final goal position is minimized, while achieving other goal conditions and respecting the current constraints. In the case where the final goal position of the end effector is within some container and the current position of the end effector is outside of that container, a goal-directed intermediate goal position should be within that container, or in front of the opening of the container, or minimally away from the opening of the container.

When goal conditions are generated as a result of the upper body planning, some of them are to be achieved at the same time. For example, in the case where the end effector's linear movement towards the final goal position FinalGoalPos is forwards and horizontal, and the end effector collides with an object Obstacle while moving, goal conditions about an intermediate goal position Pos of the end effector may contain goal conditions such as (a) EndEffector at Pos is above Obstacle by a given margin D, (b) Pos is within the reach of EndEffector, and (c) Pos is minimally away from Opening of the container. All these goal conditions about Pos should be achieved at the same time.

7 TRANSLATING GOAL CONDITIONS TO OBJECTIVE FUNCTIONS

To achieve generated goal conditions, they are first translated to objective functions, which are then to be minimized by a numerical optimization technique (Zhao 1989). For example, consider a goal about an end effector EndEff such that its reference site ref is at (Pos, Ori), that is, a goal such that the position of ref is equal to a position Pos and the orientation of ref is aligned with a coordinate frame Ori. That goal is translated to two objective functions

MIN_{ref_pos} (ref_pos - Pos)^2  and  MIN_{ref_ori} (ref_ori - Ori)^2,

where ref_pos refers to the position at which the reference frame ref is positioned and ref_ori refers to the axes of the reference frame ref. A goal about an end effector EndEff such that it is above an obstacle Obstacle by a given margin D is translated to an objective function that tries to make the minimal distance between the end effector and the obstacle equal to the given margin D. That function is represented by

MIN_{(ref_pos, ref_ori)} [minimal-distance(X, Y) - D]^2,

where X is EndEff(ref_pos, ref_ori) and Y is Obstacle. EndEff(ref_pos, ref_ori) is the end effector EndEff whose reference site ref is located at (ref_pos, ref_ori). As another example, the objective function F corresponding to a collision-avoidance goal condition about a segment Seg colliding with an obstacle Obstacle from above is a function that tries to make the minimal distance between the segment and the obstacle equal to a given margin if that distance is less than the margin, and does nothing otherwise. That function is represented by

F = MIN_{(ref_pos, ref_ori)} [minimal-distance(X, Y) - D]^2
        if minimal-distance(X, Y) < D, where X = Seg(ref_pos, ref_ori) and Y = Obstacle;
    zero, otherwise.

Here, Seg(ref_pos, ref_ori) is the segment Seg whose reference frame ref is located at (ref_pos, ref_ori).
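The translation-and-minimize step can be sketched as follows. Here scipy.optimize.minimize stands in for the optimization-based inverse kinematics algorithm cited above, and the quantity being optimized is simply a reference-site position rather than the full vector of joint angles; the numbers are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

goal_pos = np.array([0.4, 0.1, 0.9])       # desired reference-site position Pos
obstacle_top = 0.7                         # height of the obstacle's top face
margin = 0.05                              # required clearance D

def position_objective(ref_pos):
    """(ref_pos - Pos)^2, summed over coordinates."""
    return float(np.sum((ref_pos - goal_pos) ** 2))

def clearance_objective(ref_pos):
    """Penalize being less than the margin D above the obstacle, else contribute zero."""
    distance = ref_pos[2] - obstacle_top
    return float((distance - margin) ** 2) if distance < margin else 0.0

def total_objective(ref_pos):
    # Simultaneous goal conditions are combined by summing their objective functions.
    return position_objective(ref_pos) + clearance_objective(ref_pos)

result = minimize(total_objective, x0=np.zeros(3), method="BFGS")
print(result.x)    # a position near goal_pos that also respects the clearance
```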

8 PLANNING AND ANIMATION

To specify motion strategies that require overlapping temporal relationships among goal conditions about, and actions of, body parts, a time-based action formalism is devised. The action formalism is Kowalski's Event Calculus (Kowalski 1986), extended to allow actions with durations. A motion plan is defined to be a set of actions together with the temporal ordering goals on the actions. Motion plans for goals which often overlap are found using a planner that uses goal-driven backwards-chaining reasoning based on given motion strategies and checks the physical feasibility of the current plan using an inverse kinematic algorithm.
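A motion plan in this formalism is essentially a chart: a set of entries, each recording that an action achieving a goal condition on some body part happens over an interval, with intervals allowed to overlap. The sketch below illustrates such a chart; the condition strings and times are made up from the running example and are not output of the actual planner.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PlanEntry:
    action: str                     # identifier of the action E
    body_part: str                  # the reference site the condition constrains
    condition: str                  # goal condition to hold at the end of the interval
    interval: Tuple[float, float]   # [T1, T2]

plan: List[PlanEntry] = [
    PlanEntry("e1", "right-hand", "above(right-hand, box1.opening)",    (0.0, 1.0)),
    PlanEntry("e2", "elbow",      "behind(elbow, box1.front-top-edge)", (0.5, 2.0)),
    PlanEntry("e3", "right-hand", "contact(right-hand, block1.side1)",  (1.0, 2.0)),
]

def conditions_active_at(plan, t):
    """Goal conditions the motion controller must be satisfying at time t."""
    return [p.condition for p in plan if p.interval[0] <= t <= p.interval[1]]

print(conditions_active_at(plan, 0.75))   # e1 and e2 overlap at this time
```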

Here we show an example of planning. Consider the workspace of figure 3, where block block1 is inside box box1. Consider a goal achieve(ag0, holdat(contact(block1.side6, (X in table1.top)), T)). It says that agent ag0 should achieve a goal so that it holds at time T. That goal is a relationship in which the side side6 of the block block1 contacts an area X in the top of the table table1. It is assumed that block1.side6 is the bottom of the block block1 in the current situation. Relative positions such as top and front are determined with respect to the agent's view direction. The above goal is reduced to a subgoal achieve-with(ag0, Pivot, ag0.right-hand, holdat(contact(ag0.right-hand, block1.side1), T)), which means that agent ag0 achieves holdat(contact(ag0.right-hand, block1.side1), T) using his right hand with respect to pivot joint Pivot. It is assumed that block1.side1 is the top of the block block1 in the current situation. It is assumed that the agent ag0 decides to use the right hand as the end effector. It is also assumed that the box is open, that the opening of the box is the top side of the box, and that the agent will use the waist joint as the pivot joint Pivot.

Here we show what goal conditions should be generated by the planner to find an intermediate goal position Pos of the end effector ag0.right-hand. In the current situation, the linear movement of the end effector toward the destination space block1.side1 would cause the end effector to collide with the front side of the box box1 (fig. 3). Among the collision-free intermediate goal positions, a position TopPos1 satisfying above(TopPos1, box1.opening) will be chosen because it is goal-directed, being in front of the opening of box1.

Suppose the end effector is moved to the chosen intermediate goal position (fig. 4). Consider what would happen when the end effector is moved from the intermediate goal position in figure 4 to the final goal position. Suppose that the (vertical and downwards) linear movement of the end effector causes the lower arm to collide with the top edge of the front side of the box.

Fig. 4: The Hand Is Above The Opening Of The Box.

To prevent the collision of the lower arm, the planner generates a goal condition about the elbow position. Candidate intermediate goal positions of the elbow include to the left of, to the right of, in front of, and behind the edge of the collision (the top edge of the front side of the box). If the left, the right, or the front of the edge of the collision is chosen as the intermediate goal position of the elbow, it will fail: when the elbow is constrained to satisfy such intermediate goal positions, the end effector will be placed in such a way that the final goal position is outside of the reach of the end effector. Thus, the intermediate goal position of the elbow is constrained to be behind the edge of the collision (fig. 5), with respect to the view direction of the agent. If that goal condition in conjunction with the other goal conditions enables the end effector to be moved onto the block inside the box without collision, the planning for the goal of reaching the block is done, which is the case in this example.

Fig. 5: The Hand Is In The Box With The Elbow Away From The Top Edge Of The Front Side Of The Box.

9 CONCLUSION

The present study is a step towards achieving the goal of specifying human task animation using natural language instructions (Badler et al. 1990). To link a given instruction with the corresponding action in the workspace, a physically-based semantics of motion verbs and prepositional phrases is presented, in which the task level goal denoted by the instruction is reduced into relevant fine-grained geometric relations among the relevant objects. We also show how to plan gross human body motions needed to achieve the fine-grained geometric goals. To generate the relevant goal conditions about body parts needed to achieve the goal obtained from the input instruction, we have identified the important goal conditions of the end effector, those of collision-avoidance and goal-directedness, and have provided commonsensical motion strategies to achieve these goal conditions. The commonsensical motion strategies are devised with respect to the ordinary Euclidean space. Human motion planning based on the heuristic motion strategies is not complete, in that it may not find a plan even when there is one. However, for the purpose of simulating human behaviors in workspaces, our approach has an advantage over the joint angle space reasoning used by robot motion planners, which permits a complete planner. Our approach allows the planner to use a wider range of motion strategies including commonsensical motion strategies, and that is especially needed to animate behaviors described by natural language commands.

ACKNOWLEDGMENTS

This research is partially supported by Lockheed Engineering and Management Services (NASA Johnson Space Center), NASA Ames Grant NAG-2-426, NASA Goddard through University of Iowa UICR, FMC Corporation, Siemens Research, NSF CISE Grant CDA88-22719, Air Force HRL/LR ILIR-40-02 and F33615-88-C-0004 (SEI), and ARO Grant DAAL03-89-C-0031, including participation by the U.S. Army Human Engineering Laboratory, Natick Laboratory, and TACOM.

We would like to thank members of the Animation and Natural Language Group at the University of Pennsylvania for their insightful discussions. We would also like to thank Cary Phillips and Jianmin Zhao, who freely shared their time explaining the technical details of the human motion animation tools. We thank Welton Becket and Suneeta Ramaswami for reading the draft of the paper.

REFERENCES

Badler, N., B. Webber, J. Kalita, and J. Esakov. Animation from Instructions. In Making Them Move: Mechanics, Control, and Animation of Articulated Figures, N. Badler, D. Zeltzer, and B. Barsky (eds.). Morgan-Kaufmann, 1990.

Barraquand, J., B. Langlois, and J. Latombe. Numerical Potential Field Techniques for Robot Path Planning. Technical Report STAN-CS-89-1285, Department of Computer Science, Stanford University, October 1989.

Donald, Bruce. A Search Algorithm for Motion Planning with Six Degrees of Freedom. Artificial Intelligence, 31 (1987), 295-353.

Esakov, J., Norm Badler, and Moon Jung. An Investigation of Language Input and Performance Timing for Task Animation. In Graphics Interface '89, pp. 86-93, Morgan-Kaufmann, Waterloo, Canada, 1989.

Esakov, J., and Norm Badler. An Architecture for High-Level Human Task Animation Control. In Paul A. Fishwick and Richard B. Modjeski, editors, Knowledge Based Simulation: Methodology and Application, Springer Verlag, Inc., 1990.

Kalita, J. K., and Norman Badler. A Semantic Analysis of a Class of Action Verbs Based on Physical Primitives. Annual Meeting of the Cognitive Science Society, Boston, July 1990(a), pp. 412-419.

Kalita, J. K. Analysis of Action Verbs and Synthesis of Tasks for Animation. Ph.D. Dissertation, Computer and Information Science, Univ. of Pennsylvania, 1990(b).

Kalita, J. K., and Norman Badler. Interpreting Prepositions Physically. Annual Meeting of the American Association for Artificial Intelligence, Anaheim, California, 1991.

Lozano-Perez, T. Spatial Planning: A Configuration Space Approach. IEEE Transactions on Computers, Vol. C-32, No. 2 (1983).

Lozano-Perez, T. Task Planning. In: Brady, M. (ed.), Robot Motion: Planning and Control. MIT Press, Cambridge, Mass., 1985.

Lozano-Perez, T. A Simple Motion-Planning Algorithm for General Robot Manipulators. IEEE Journal of Robotics and Automation, Vol. RA-3, No. 3, June 1987.

Monheit, Gary, and Norman I. Badler. A Kinematic Model of the Human Spine and Torso. Technical Report MS-CIS-90-77, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 1990.

Phillips, Cary, Jianmin Zhao, and Norman I. Badler. Interactive Real-time Articulated Figure Manipulation Using Multiple Kinematic Constraints. ACM Computer Graphics 24, 2 (1990), pp. 245-250.

Phillips, Cary, and Norman I. Badler. Interactive Behaviors for Bipedal Articulated Figures. To appear in the Proceedings of SIGGRAPH 1991.

Zhao, Jianmin, and Norman I. Badler. Real Time Inverse Kinematics with Spatial Constraints and Joint Limits. Technical Report MS-CIS-89-09, Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 1989.

AUTHOR BIOGRAPHIES

MOON RYUL JUNG is a Ph.D. student in Computer Science at the University of Pennsylvania. He received the BS degree in Computer Science and Statistics from Seoul National University, Korea, in 1980, and the MSc in Computer Science from the Korea Advanced Institute of Science and Technology in 1982. His research interests include human figure modeling and animation, human figure motion planning, reactive/incremental planning, and natural language semantics.

JUGAL KALITA graduated from the Indian Institute of Technology, Kharagpur (B.Tech, 1982), the University of Saskatchewan, Saskatoon (M.Sc., 1984), and the University of Pennsylvania, Philadelphia (Ph.D., 1991). He is an assistant professor of computer science at the University of Colorado at Colorado Springs. His research interests include natural language processing, expert systems, and computer graphics, especially task-based human figure simulation and animation.

Dr. NORMAN I. BADLER is the Cecilia Fitler Moore Professor and Chair of Computer and Information Science at the University of Pennsylvania and has been on that faculty since 1974. Active in computer graphics since 1968 with more than 85 technical papers, his research focuses on human figure modeling, manipulation, and animation. Badler received the BA degree in Creative Studies Mathematics from the University of California at Santa Barbara in 1970, the MSc in Mathematics in 1971, and the Ph.D. in Computer Science in 1975, both from the University of Toronto. He is Co-Editor of the journal Graphical Models and Image Processing. He also directs the Computer Graphics Research Laboratory with two full-time staff members and about 40 students.

WALLACE CHING graduated from Oxford University, England, in 1988 with a B.A. in Engineering Science. He is a Ph.D. student in Computer Science at the University of Pennsylvania. His research interests include robotics path planning, human figure motion simulation, dynamics-based motion simulation, and computer graphics.