Frames of Reference and Direct Manipulation Based Navigation
Representation of Movement in Architectural Space
17-Jun-09

A dissertation by Asaf Friedman



Abstract

Although highly advanced in visual realism, current virtual reality systems used by designers do not provide the user with the ability to interact effortlessly and move in space as desired, as one does in real space. As a result, these systems offer limited help to designers when they want to test or communicate the spatial quality of their projects. The aim of the research was to develop a design tool for navigation in virtual environments that can offer a sense of ‘immersion’ and vividness similar to that experienced when one moves in real space.

The research examined a number of typical cases of CAD-VR navigation systems and analyzed their performance. All programs use direct manipulation: a virtual representation of reality is created, which can be manipulated by the user through physical actions like pointing, clicking, dragging, and sliding. The analysis showed that the lack of realism of these systems means that they do not offer a ‘sense of reality’ because of the user’s inability to interact easily with the computer to navigate among represented objects. The user cannot: 1. plan a course from a given location to a desired one; 2. shift the direction of their gaze and focus attention on objects as they move along a path; 3. move around an object while keeping track of this change in relation to the surrounding objects; 4. turn an object in front of the viewer in order to examine it. This lack of ‘sense of reality’ cannot be improved simply by adding more realistic attributes to the system – details, shadows, and reflections.

Departing from the pioneering, rigorous studies of ‘cognitive mapping’ developed in environmental design by Kevin Lynch and his followers, and drawing on recent research in cognitive science on spatial thinking, the study identified the cognitive processes through which people perceive their environment as they move through it. In contrast to Lynch’s approach, concerned with the visual quality of urban environments and focusing on visual urban cues for recognition and orientation within a city, the present research, related to movement through the built environment, concentrated on linguistic commands: the ‘frames of reference’ people use to plan their path among objects, shift their attention to them, move around them, and turn them. The frames of reference used are 1. allocentric, 2. egocentric, 3. relative, 4. intrinsic, and 5. absolute.

Following the criteria of realism and vividness in exploring the virtual world through movement, the system uses an agent/avatar which is an immersed


character that allows the user to direct navigation and the view of objects. It permits both agent-based and object-based navigation. The user can refer to the movement of the avatar as a basis for movement or to the object as a reference point. To enhance the feeling of engagement, the user’s input to the system requesting a change of viewing position, direction of view, object of view, or a path, is expressed in terms of natural language while the output remains visual. In other words, the user talks to the avatar and, at the same time, sees on the screen what the avatar views in the virtual world. The user-centered navigation tool produces “on the fly” navigation, most desirable for professional design applications, as opposed to a tailored presentation. It can be applied in urban environments as well as in architectural interiors, both using the same types of axes and frames of reference. It is targeted to support testing of the quality of designed environments, both interior and exterior, by individual designers, but it can be most effective in architectural presentations and debates where the architect communicates with various parties while examining the various aspects of the three-dimensional project.

Keywords:

CAD-VR Navigation Systems

Spatial Frames of Reference

Visual versus Language Representation of Space

Design Tool Development

Design Methodology

Urban Design Representation


Acknowledgements

I would like to thank Prof. A. Tzonis for giving me the space to formulate my thoughts and his time to exchange ideas. This fruitful exchange also involved Prof. Dr. W. Porter, who followed the whole process. My thanks also go to the committee members Prof. Dr. E. Backer, Prof. Dr. Y. Kalay, Dr. E. Ross, and Prof. H. J. Rosemann for their useful comments. I would also like to thank Prof. Dr. B. Tversky for her help in the initial stages of the dissertation.

I would also like to thank all past DKS associates, Prof. Dr. L. Lefaivre and Prof. R. Serkesma; past colleagues Dr. P. Bay Joo Hwa, Dr. Ir. B. S. Inanç, Dr. K. Moraes Zarzar, Dr. J. Press, Dr. P. Sidjanin, and T. Beischer; the student assistants; and J. Arkesteijn for administrative support. I would also like to thank M. Richardson for editing my work.

I would also like to thank Bouwkunde TUD for its financial support, and all the people of Bouwkunde who made it such a pleasant environment.


Table of Contents

Abstract

Acknowledgements

Contents

1. Introduction

1.1 Navigational interface

1.2 Descriptive aspects of action

1.3 Spatial reasoning

1.4 Route knowledge

1.5 Navigational goal analysis

1.6 Examination of movement in virtual space

1.7 Outline of this study

2. The basic principles of navigation in virtual environments

2.1 Basic configuration of current navigational systems

2.2 Description of spaces that people navigate

2.3 Current performance of navigation in virtual reality

2.4 Basic assumptions concerning navigation


3. Selected cases of current navigational systems

3.1 Introduction to the state of the art in computer navigation

3.2 Exploration Programs - Cave™

3.3 Exploration Programs - Cosmo Player®

3.4 Exploration Programs - Myst®

3.5 Representation programs - 3DS MAX®

3.6 First person shooters - Tomb Raider®

4. Critical analysis of the state of the art technology

4.1 Historical origins of the ‘flatness’ of the computer screen

4.2 Limitations of interaction in virtual space with surrounding objects

4.3 Navigational criteria

4.4 Scenarios of interaction

4.5 Adequacy criteria for an agent directed by the user

5. Conceptual system of agent-based frames & axis navigation

5.1 Language of the path

5.2 Elements of the path

5.3 Method of examination


6. Visual representation - implementation

6.1 Egocentric and allocentric systems

6.2 Panoramic communicative view

6.3 Possible interaction with objects and routes

7. Language based representation - implementation

7.1 Directing the path

7.2 How view changes; axes and frames of reference

7.3 Flat-shooters – Two-dimensional – Amazon’s Sonja

8. A comparison between visual & language-based representation

8.1 Aligning the visual and linguistic frames

8.2 Travel Guide – guided walk – Amsterdam

8.3 Comparison between linguistic and visual frames

8.4 Comparison between existing simulation programs

9. An agent/object-based system of navigation

9.1 Conceptual framework

9.2 Controlling the pedestrian agent behavioral model

9.3 Operation of the system

9.4 Process of the system


10. Usability of the system

10.1 Architectural case

10.2 Evaluation of the object-centered tool

10.3 Testing the hypothesis

11. Conclusions; future avenues

11.1 Review of process

11.2 Findings

11.3 Future research

Appendix A

References

Summary

Index

Samenvatting


CHAPTER 1

INTRODUCTION

Over the last two decades of architectural design there has been growing interest in computerized architecture. These Virtual Reality [VR] and Computer Aided Design/Drawing [CAD] systems employ interactive software and hardware. Although the systems are advanced in visual realism, they still lack the ability to permit their users to interact intuitively with three-dimensional objects as they navigate among them. As architects design, the design process can be roughly divided into a creative part that generates the built form, and an observation/exploration part that attempts to comprehend the consequences of that intervention. Thus the design process enables the architect to develop a solution to a required local condition. Currently those tasks are given to digital media production houses, as described in an article written by Michel Marriott published in The New York Times on March 4, 2004, entitled “For New Buildings, Digital Models Offer an Advance Walk-Through”. Michael Schuldt, president of AEI Digital, states: “What we're trying to do is to bring this technology that's out there, that's been developed for the video game industry and Hollywood, and bring that to the building industry … It's really just a communications tool.” Indeed, AEI's presentations often resemble video games and special effects in movies, permitting viewers to fly over and around structures – or even to enter them, walking into richly detailed corridors and exhibition halls. In order to establish and evaluate the design product associated with human action [use


and meaning], architects utilize Virtual Reality systems that generate a realistic three-dimensional simulation of an environment.

On the other hand, according to an article by Edward Rothstein published in the New York Times on April 6, 2002, entitled “Realism May Be Taking the Fun Out of Games”, “One of the major goals of video game systems has been to simulate the real, to create images so lifelike, and movements so natural that there is no sense of artifice.” Existing CAD-VR systems are highly reductive and abstract, and offer only a limited potential to grasp the spatial quality of new design proposals. This lack of ‘sense of reality’ can be improved not just by adding attributes that are more realistic – details, shadows, and reflections – but also by improving the navigation techniques. The virtual space for the purposes of this research is the built environment: a simulation with human activity, a topological unit with continuous and discontinuous movement. According to Evans (1980), Passini (1984), Garling et al. (1986), and Peponis et al. (1990), in most situations architects want to design a building to reduce wayfinding problems for the people working in or visiting a complex of designed environments. This research, on the other hand, focuses on the mechanism for navigation. It proceeds from pioneering work developed in urban studies, and follows a tradition of rigorous studies in environmental design by Lynch (1960), Appleyard (1964), and Thiel (1961). These studies analyzed the urban environment in order to examine the visual quality of cities, and to enhance urban quality through people’s recollection of places. This research draws on knowledge from cognitive studies in linguistics: Jackendoff (1983, 1993), Talmy (2001), Levinson (1996), and Levelt (1996); on studies of vision by Marr (1982), Ullman (1996), and O'Keefe (1993); and on spatial reasoning by Piaget (1948), Campbell (1994), Eilan (1993), Paillard (1996), and Brewer (1993). It also employs current studies in spatial and motion cognition by Taylor (1992), Crawford (2000), and Tversky (1981).

Direct Manipulation is used by all current navigational programs; a virtual representation of reality is created, which can be manipulated by the user through physical actions like pointing, clicking, dragging, and sliding. In these simulated environments one usually interacts with the aid of a visual display and a hand-input data device. At the low end it is a computer desktop system, which includes a computer and a screen, and is controlled with a mouse. At the high end, it is an immersed environment system, which includes a computer controlled by a set of wired gloves or a ‘magic wand’, a position tracker, and a head-mounted stereoscopic display for three-dimensional visual output, immersed in a cube-like


environment with a display projected on three to six sides of a box-like structure. Virtual reality in the broader sense is used to describe the sense of reality, as opposed to virtual space of any kind, imagined or real. This study uses the limited sense of computer-simulated virtual reality.

The basic limitation of current virtual reality systems lies in their failure to provide users with an effective system for object manipulation and interaction within the virtual environment. As we shall show, current programs designed for the architect are tools for modeling the single architectural object and are thus designed around an object-centered view, allowing users to move around, but they lack the capacity to manipulate objects while moving in a large-scale environment. This is inadequate for a higher-level function of movement, used in our everyday description of places. For example, the use of an input device for direct action manipulation commands – forward and backward, left and right – does not make any sense in some instances when users need to travel long distances in virtual reality. Moving the mouse the equivalent distance is simply too laborious compared to an abrupt transportation from place to place. Other systems that allow for object manipulation supply the user with the ability to interact with the environment but lack control over movement to allow the manipulation of the observer’s point of view.
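The contrast between incremental direct-manipulation commands and an abrupt change of place can be sketched in a few lines of Python. This is a hypothetical illustration only; the class and method names (Pose, Navigator, forward, go_to) are placeholders, not part of any system discussed here.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    """Viewer position (x, y) and heading in degrees."""
    x: float
    y: float
    heading: float

class Navigator:
    def __init__(self, pose: Pose, step: float = 1.0):
        self.pose = pose
        self.step = step

    def forward(self) -> None:
        # Low-level direct-manipulation move: one small displacement
        # per input event, which is laborious over long distances.
        rad = math.radians(self.pose.heading)
        self.pose.x += self.step * math.cos(rad)
        self.pose.y += self.step * math.sin(rad)

    def go_to(self, x: float, y: float) -> None:
        # Higher-level command: abrupt transportation from place to place.
        self.pose.x, self.pose.y = x, y
```

Covering a distance of 100 units with forward() requires 100 input events; go_to(100, 0) requires one, which is the kind of higher-level movement function the everyday description of places relies on.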

The question that concerned this study is: what is the interactive experience of objects and places through navigation in such a virtual space? The answer lies in understanding the intuitive action that reflects our spatial reasoning in a virtual environment as opposed to “everyday reality”, based on the observation of what type of action was used and what constrains it. The aim of this research is to help architects move in virtual reality by automating such activity, utilizing a verbal description. Navigation is a means to achieve one’s goal; here I follow an assumption concerning the importance of navigation similar to that of Berthoz: “To localize an object simply means to represent to oneself the movement that would be necessary to reach it.” (Berthoz, 2000 p. 37) The tool is to be a mechanism that facilitates exploration in the virtual world, through interaction with objects.

The investigation focuses on the role of the cognitive structures utilized as one moves through the environment, and tries to identify ‘the descriptive geometric conventions and cognitive processing involved’. This study aims to develop a theoretical framework through which the applied system can overcome such weaknesses. To achieve this augmented navigational interaction this research will mobilize knowledge from cognitive science and design, and more specifically


knowledge that deals with spatial reasoning. The goal of this dissertation is to present an augmented linguistic navigational system that is interactive and fulfils the requirements of the architect to explore new horizons in a virtual three-dimensional world. Thus the first part will examine the missing frames that do not allow one to manipulate objects freely in current navigational programs. The objective of the second part of this dissertation is to specify the main characteristics of a navigation system that works “on the fly”, a system that would be generic in its application as opposed to a tailored presentation.

The challenge is to construct a tool that will explicate recent cognitive findings and thus represent a better model for human-computer interaction. The dissertation will concentrate on the process and not the product. This is interdisciplinary research spanning related areas of endeavor, such as perception, cognition, neuroscience, linguistics, information science, and robotics. This is both its strength and its weakness, since it has to borrow many different terminologies and make comparisons between them, thus expanding the number of areas of research covered.

1.1 Navigational Interface

The navigational interface is the response of the computer to our indication of the desire and need to experience environments visually – a command translated through spatial reasoning to perform a required task. ISO 13407 defines an interactive system as a “combination of hardware and software components that receive input from, and communicate output to, a user in order to support his or her performance of the task.” But how does one examine the actions of a person to understand whether the interaction is effective? “An interaction technique defines a consistent mapping between the user and the virtual environment technology, and how the environment will react as a user interacts with the input devices.” (Willans et al. 2001) It is this technique that we intend to explore.

Users of Virtual Reality (VR) systems can suffer from severe usability problems, such as disorientation and an inability to explore objects naturally. Therefore there is a need for better-designed VR systems that support exposition, navigation, exploration and engagement (Shneiderman, 1982). Significant usability problems with current VR systems have been reported by Tognazzini (1989), Broll et al. (2001), Azuma et al. (2001), and van Dam (2000). According to Norman (1986), user activity is analyzed through the examination of the stages of performing and evaluating action sequences. “The primary, central stage is the establishment of the


goal. Then, to carry it out requires three stages: forming the intention, specifying the action sequence, and executing the action. To assess the effect of the action also requires three stages, each in some sense complementary to the stages of execution: perceiving the state, interpreting the state, and evaluating the interpreted state with respect to the original goals and intentions.” (Norman, 1986 p. 37) According to Shneiderman (1998), one should consider the syntactic-semantic model, which makes a major distinction between meaningful acquired semantic concepts and arbitrary syntactic details. Evaluation methods may be able to discover some usability problems, but no current evaluation methods fit the specific problem of navigation in three-dimensional virtual environments. It may be argued that conventional usability evaluation methods, such as layered interaction analysis (Nielsen 1992), co-operative evaluation with users to diagnose problems (Monk et al. 1993), or cognitive walkthrough (Wharton et al. 1994), could be applied to virtual reality systems. However, neither Shneiderman’s syntactic-semantic model nor Nielsen’s heuristics addresses issues of locating and manipulating objects or navigating in three-dimensional environments, while neither Norman’s theory of action nor cognitive walkthroughs (Wharton et al. 1994) were designed to address perceptual orientation and navigation in virtual environments.

1.2 Descriptive aspects of action

When one examines the relationship between the representation of a space and ‘movement’, a gap appears – a “mismatch” according to O'Keefe (1990 p. 52) – between what we imagined the space to be and what that space is like. This perception of what one imagined can tint our acquisition of knowledge from an experience. According to de Certeau (1984 p. 93), “Perspective vision and prospective vision constitute the two-fold projection of an opaque past and uncertain future into a surface that can be dealt with.” The definition of perception of concern to this research is an active process that manifests itself in three main areas: vision, language, and body movement – the kinesthetic. Perception is where “the observer is constructing a model of what environmental situation might have produced the observed pattern of sensory stimulation” (Palmer 1999 p. 10). That is, the viewer’s desires are the basis for interpretation: what one believes the relation between objects to be, with possible interpretations competing against each other.

When attempting to describe human action, the fundamental questions that have to be asked are: What is its nature? What is its essence? And how will we describe it?


“Actions are agent-managed processes” (Rescher, 2000). Action as a philosophical term has a long history. Here I will circumvent the philosophical debate and enter the discussion with an attempt to describe action, believing that in order to label an action, one has to describe it. According to Rescher (2000), the conceptual tools for what might be called the canonical description of an action inventory are the following:

1. Agent (Who did it?)

2. Act-type (What did he do?)

3. Modality of action (How did he do it?)

a. Modality of manner (In what way did he do it?)

b. Modality of means (By what means did he do it?)

4. Setting of action (In what context did he do it?)

a. Temporal aspect (When did he do it? How long did it take?)

b. Spatial aspect (Where did he do it?)

c. Circumstantial aspect (Under what circumstances did he do it?)

5. Rationale of action (Why did he do it?)

a. Causality (What caused him to do it?)

b. Finality (With what aim did he do it?)

c. Intentionality (From what motives did he do it?)

The explanation of why an agent performs an action has to do with the rationalization of the goal into intention, that is, the decision to act to achieve a goal. However, intention must also include the motivation and belief underpinning the reasons to act. The critical questions that the user asks when moving in space are: “What is of interest?” and “How does one get there?”

According to Jackendoff (1972), the qualification of perception of causality can be divided into STATES and EVENTS. STATES are considered non-dynamic instances, while EVENTS are associated with dynamic causation. The distinction between them is a fundamental frame position that determines their representation. The distinction between moving oneself in space and moving objects in space is basically assumed to be a distinction between the awareness of what one sees [recognition] and how one acts. In our case, the distinction between moving oneself and moving objects is the state one is in, a distinction between perception and action. This can be summarized thus:


Desire: I want to see X1 object in relation to X2 object.

Belief: If I change location to L2 then I will see (X1 object in relation to X2 object).

Fact: What X objects are in relation to other X objects.

Backing: I can see no hindrance to view X1 object in relation to X2 object.

Base: I have done similar operations before.

Thus we see that the above argument is about the change of state; regardless of its ‘primary reason’, it is warranted through the backing of spatial reasoning.
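The structure of this change-of-state argument can be sketched in code. This is a hypothetical illustration under stated assumptions: the data structures and the predicate names (which mirror the Desire/Belief/Backing/Base terms above) are placeholders, not part of the system described later.

```python
from dataclasses import dataclass

@dataclass
class ViewRequest:
    target: str     # X1, the object the user wants to see
    reference: str  # X2, the object it is seen in relation to
    location: str   # L2, the proposed new viewing location

def warranted(req: ViewRequest, visible_from: dict,
              hindrances: set, precedents: set) -> bool:
    # Belief: from L2 one will see X1 in relation to X2.
    belief = (req.target, req.reference) in visible_from.get(req.location, set())
    # Backing: no hindrance blocks the view from L2.
    backing = req.location not in hindrances
    # Base: a similar operation has been performed before.
    base = "change_location" in precedents
    # The desire (to see X1 relative to X2) is warranted only when
    # belief, backing, and base all hold.
    return belief and backing and base
```

The desire itself supplies the goal; the three predicates supply the warrant for acting on it.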

1.3 Spatial reasoning

There are two types of reasoning about space: visual reasoning and spatial reasoning. Visual reasoning is how one extracts information from the environment. The process of visual perception is conceptualized as a hierarchy spanning two main levels. Low-level vision is usually associated with the extraction of certain physical properties of the visible environment, driven purely by stimulus input, such as depth, three-dimensional (3-D) shape, object boundaries, or surface material properties. High-level visual processing relies on previously stored information about the properties of objects and events that allows us to identify objects, navigate, and reach toward a visible goal. It is beyond the scope of this study to provide a thorough account of what the visual system does to produce a mental representation; the research assumes that object recognition is a solved problem and focuses instead on motion and the relationship between objects and the observer. However, several aspects of visual reasoning are important for our concern here. According to Goodale, “The ventral stream plays the major role in the perceptual identification of objects, while the dorsal stream mediates the required sensorimotor transformation for visually guided actions directed at those objects. Processing within the ventral stream enables us to identify an object, such as a ripe pear in a basket of fruit; but processing within the dorsal stream provides the critical information about location, size, and shape of that pear so that we can accurately reach out and grasp it with our hand.” (1995, p. 177) The lower (ventral) system seems to be involved in identifying objects, whereas the parietal centers in the upper (dorsal) system seem to be involved in locating objects. These two pathways are often called the “what” system, whose task is object discrimination, and the “where” system, whose task is landmark discrimination (Palmer, 1999 p. 38) (Landau, 1993). We shall show later how those findings connect to our model of movement commands.


The second type of reasoning about space is spatial reasoning. “Spatial reasoning is the relation between an observer and the space that surrounds that observer in order to achieve some defined goal through movement.” (Presson, 1982) Reasoning about space is how one plans how one will proceed to move in such an event/state, or concludes where something is in relation to where it was first, assuming certain premises. “Reasoning about space is tied to the ‘how’ system representation. The ‘what’ system and the ‘where’ system both sub-serve spatial guided motor behaviour – the ‘how’ system.” (Palmer 1999 p. 39) Thus, spatial reasoning is always related to a task to be performed, the bases of which are perception and visual reasoning. Spatial reasoning is also related to causality – the knowledge by which one knows how the environment, as well as objects within that environment, will perform. To plan a route, a representation of the environment must be achieved at the level of mental representation devoted to encoding the geometric properties of objects in the world and the spatial relationships among them, i.e. visual and verbal description – expressions of multifaceted origin, of goals and desires (Huttenlocher, 1991), (Crawford, 2000).

1.4 Route knowledge

In order to have route knowledge, two processes must be included to bridge the gap between passive observation of the temporal succession of sensory images (State) and active navigation using a description of the fixed environment (Event). The first process constructs descriptions of routes traveled so that they can be followed without guidance, or mentally reviewed in the absence of the environment. The second process must construct descriptions of the fixed features of the environment (places and paths) from the succession of Views and Actions. According to Kuipers (1983), there are two capabilities we would expect to correspond with knowing a route: being able to travel the route without guidance, and being able to describe or review the route without physically moving along it. These are different capabilities, since we occasionally experience the anomalous state of being able to travel a route but not being able to describe it without traveling it. Traveling the route must include, as a minimum, knowing which action to take at any given moment when faced with a particular view. Thus, knowledge of a particular route must include a set of associations (View - Action) between descriptions of sensory images and the corresponding actions to take. An action like travel would terminate at the next decision point, when the current View should activate an association providing the next Action. The ability to rehearse the route in the absence of the


environment requires, in addition, a different set of associations (View - Next-View) from the current View to the View resulting at the next decision point.

We can represent both associations in a three-part schema for a path, an element of the route: (Context: <View>; Action: <Action>; Result: <View>).

The route description consists of a set of these schemas, each indexed for retrieval under the View in its Context part. As a new route is learned from experience, new schemas are created, and their Action and Result components are filled in. According to Kuipers (1983), a route description consisting of a set of partially filled-in schemas constitutes partial knowledge of the route, and has some interesting properties. When only the Context parts of the schemas are filled, the route description supports recognition of landmarks, but not self-guided travel; when the Action parts are also present, the route description supports self-guided travel but not mental rehearsal of the route apart from the environment; finally, when all three components are present, knowledge of the route is complete. These states of partial knowledge allow the route to be learned incrementally from observations when processing resources are scarce.
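Kuipers' three-part schema and its states of partial knowledge can be sketched as a small data structure. This is a hypothetical Python illustration; the class and function names are placeholders, not Kuipers' own notation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Schema:
    """One element of a route description:
    (Context: <View>; Action: <Action>; Result: <View>)."""
    context: str                  # View at a decision point
    action: Optional[str] = None  # Action to take at that View
    result: Optional[str] = None  # View at the next decision point

def knowledge_level(route: List[Schema]) -> str:
    """Classify a set of partially filled-in schemas."""
    if all(s.action and s.result for s in route):
        return "complete"            # supports mental rehearsal as well
    if all(s.action for s in route):
        return "self-guided travel"  # can follow, cannot rehearse
    return "landmark recognition"    # Context parts only
```

Filling in the Action components upgrades the description from landmark recognition to self-guided travel; filling in the Result components completes it.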

The other major process at the foundation of route knowledge is the creation of descriptions of the fixed features of the environment – places and paths – given the temporal sequence of sensory images experienced during travel. We must augment the cognitive map to include a “place” description that has explicit associations with the Views obtainable at that place, and is the object of the topological and metrical relations. We then ask how the many Views available in our agent’s sensory world come to be grouped according to place. As a minimum, we expect that two Views will be associated with the same place description if they are linked by an Action consisting of a rotation with no travel. In fact, this relation is an equivalence relation on the set of Views, and the equivalence classes turn out to correspond exactly with the places in the environment.

Precisely the same technique can be used to define path descriptions as equivalence classes of Views joined by travel without rotation. A path, by this definition, is more than an arc in a network. Rather, it collects the Views occurring along a street. Since those Views may also correspond to places, a topological connection between a place and a path is made whenever the current View has associations to both a place and a path description. The spatial order of places along a path can be obtained, incrementally, from the temporal order of Views during travel.
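The grouping of Views into places (or, symmetrically, into paths) can be sketched as a standard union-find computation of equivalence classes. This is a hypothetical illustration, not part of Kuipers' formulation; the links stand for rotation-with-no-travel Actions when computing places, and for travel-without-rotation Actions when computing paths.

```python
def equivalence_classes(views, links):
    """Group Views into equivalence classes given linking Actions."""
    parent = {v: v for v in views}

    def find(v):
        # Follow parent pointers to the class representative,
        # compressing the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Each link merges the classes of its two Views.
    for a, b in links:
        parent[find(a)] = find(b)

    classes = {}
    for v in views:
        classes.setdefault(find(v), set()).add(v)
    return list(classes.values())
```

With rotation-only links the resulting classes are the places; rerunning the same function over translation-only links yields the paths.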


1.5 Navigational goal analysis

Spatial reasoning related to navigation can be further divided into tactical and strategic navigation. The hierarchy of navigation behaviors can be examined through performance analysis, whereby one would have a short-aim task – tactical navigation – compared with long goals – strategic navigation. For example, a task could be to move from one place to another on a particular route, while the goal could be arrival at a desired location at the end of the route. “Strategy can be defined broadly as the individual’s approach to the task” (Rogers, 2000). This type of analysis requires that the intention of the subject be revealed. A navigational goal is for someone to be somewhere, and from there to see something. In the virtual world it is the placement of a person at a specified location with a perspective view. To achieve this goal, to be somewhere, one has to have an intention, a sub-goal that breaks the movement into its components. For example, if our goal is to reach the town hall, our intention is the script that takes us through the process of getting there. It might follow this script: Leave the house, turn left, go to the street corner and turn left again… This task identity can then be broken down into high-level and low-level action. According to Norman (1986), a convenient summary of the analysis of tasks (the process of performing and evaluating an action) can be approximated by seven stages of user activity:

Goals and intentions: A goal is the state the person wishes to achieve; an intention is the decision to act so as to achieve the goal.

Specification of the action sequence: The psychological process of determining the psychological representation of the actions to be executed by the user on the mechanisms of the system.

Mapping from psychological goals and intentions to action sequence: In order to specify the action sequence, the user must translate the psychological goals and intentions into the desired system state, then determine what settings of the control mechanisms will yield that state, and then determine what physical manipulations of the mechanisms are required. The result is the internal, mental specification of the actions that are to be executed.

Physical state of the system: The physical state of the system, determined by the values of all its physical variables.

Control mechanisms: The physical devices that control the physical variables.


Mapping between the physical mechanisms and system state: The relationship between the settings of the mechanisms of the system and the system state.

Interpretation of system state: The relationship between the physical state of the system and the psychological goals of the user can only be determined by first translating the physical state into psychological states (perception), then interpreting the perceived system state in terms of the psychological variables of interest.

Evaluating the outcome: Evaluation of the system state requires comparing the interpretation of the perceived system state with the desired goals. This often leads to a new set of goals and intentions.

However, as a normative theory, it still lacks action analysis; for that I shall follow criteria similar to those set by Ullman (1996, p. 272). The action criteria are:

1. Spatial properties and relations are established by the application of a route to a set of early representations.

2. Routes are assembled from a fixed set of elemental operations.

3. New routes can be assembled to meet newly specified processing goals.

4. Different routes share elemental operations.

5. A route can be applied to different locations. The processes that perform the same route at different locations are not independent.

6. In applying routes, mechanisms are required for sequencing elemental operations and for selecting the locations at which they are applied.

The use of routines to establish shape properties and spatial relations raises fundamental problems at the levels of computational theory and the underlying mechanisms. A general problem at the computational level is to establish which spatial properties and relations are important for different tasks.
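Several of these criteria can be illustrated with a toy sketch. The grid representation and the operation names below are my own assumptions, not Ullman's:

```python
# Toy illustration of visual routines: a fixed set of elemental operations
# (criterion 2) is sequenced into a route (criterion 6); different routes
# can share operations (criterion 4); and the same route can be applied at
# different locations (criterion 5). All names are hypothetical.

ELEMENTAL = {
    "east":  lambda p: (p[0] + 1, p[1]),
    "west":  lambda p: (p[0] - 1, p[1]),
    "north": lambda p: (p[0], p[1] + 1),
}

def apply_route(route, location):
    """Apply a sequence of elemental operations from a starting location."""
    for op in route:
        location = ELEMENTAL[op](location)
    return location

scan = ["east", "east", "north"]    # a route assembled for one goal
print(apply_route(scan, (0, 0)))    # -> (2, 1)
print(apply_route(scan, (5, 5)))    # same route, new location -> (7, 6)
```

Assembling a new route for a new goal (criterion 3) is simply composing a new sequence from the same `ELEMENTAL` set.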

1.6 Examination of movement in virtual space

The first task of this study is to explain some of the reasons why existing virtual reality programs seem so detached from the observer. The study will attempt to clarify some of the misdirected technological analysis generated in some recent articles about the phenomena of interaction, such as those written by Lauwerreyns (1998), Laurel (1986), Hutchins (1986), Myers (1999), Conway (2000), Pierce (1997), Stoakley (1995), Steuer (1992), McOmber (1999), Biocca (1992), and many others. The study will attempt to expose the relationship between the observer and objects in the virtual environment and follow models proposed by Jackendoff (1983) and elaborated since.

Since there is no model available for the examination of action that fits our specific task – see, for example, Rubin (1994) and Hackos (1998) – the methodology used for the examination of the performance of interaction within virtual environments was the case study. The case studies examined existing computer navigational programs. According to Yin (1994), from the case study methodology one can generate the adequacy criteria of action. From the new criteria, one can build a conceptual system and respond to some of the difficulties encountered in the integration between cognitive and computerized systems. The conceptual system is conceived through an analysis of the spatial reasoning in directing movement through frames of reference. The conceptual system is constrained by translation and transformation rules. Once the tool is established, we will use scenarios to examine its effectiveness. According to Carroll (2002), scenario-based design is a family of techniques in which the use of a future system is concretely described at an early point in the development process. Narrative descriptions of envisioned usage episodes are then employed in a variety of ways to guide the development of the system. Scenario-based design changes the focus of design work from defining system operations to describing how people will use a system to accomplish work tasks and other activities. Scenario-based practices now span most of the software development lifecycle. The subject of such analysis is visually guided action and natural language commands (English). There are wide cultural differences in approaching such a task, and therefore the study is constrained to a native English-speaking population.

1.7 Outline of this study

The Introduction deals with the general formulation of the problem and the methodological premises, research methods, and theoretical models used in the study. The attempt is to decompose movement consistent with cognitive aspects of spatial reasoning, frames, and axes.

Chapter 2 – Reviews the basic configuration of current navigation systems. The chapter reviews the principles and performance of navigation in desktop computer-based applied virtual environment programs. The chapter ends with basic assumptions concerning navigational capabilities.


Chapter 3 – Presents the case studies, from computer simulations to guided tours, in order to discuss the control of representation of movement in virtual space. The cases include state-of-the-art programs available commercially, from computer games to three-dimensional navigational programs to architects’ modelling programs, in order to discuss the controls of representation/interaction as one moves in virtual space.

Chapter 4 – Analyzes the software programs to show the ‘flatness’ of interaction and the performance of movement in different tasks. The adequacy criteria are introduced to answer some of the questions on how to overcome ‘the flatness of the computer screen’ – the inability to interact intuitively with the surroundings. Body input devices and screen output devices are examined to generate criteria for an agent directed by the user.

Chapter 5 – Presents the conceptual system that underlies the endeavour. The conceptual structure is shown to connect to the linguistic and visual faculties. The chapter also presents the method by which the system is examined.

Chapter 6 – Presents basic framework components for representing spatial knowledge and performing spatial reasoning. The chapter introduces a basic pointing panoramic navigational system for representing an agent moving in an environment. Through the use of a visual panoramic model, we present a mechanism of iconic representation. Thus, the chapter examines the relationship between the basic visual panoramic model and basic language commands.

Chapter 7 – Presents the integration of language commands with action to produce spatial reasoning. Through a spatial semantic model, we are able to introduce paths controlled by frames and axes.

Chapter 8 – In this chapter we align the various frames that are activated when one moves, and introduce one more system of description. Then a case of a tourist guide allows us to explore natural language and the visual system. Finally, all the different computer programs are examined.

Chapter 9 – Introduces the proposed computerized navigation tool and its spatial reasoning mechanism. The tool’s components are then examined to reveal how the visual and linguistic parsers analyze and generate a route command.

Chapter 10 – Presents a simulation of how the system works in an architectural case. The simulation is based on a scenario given by experts; four commands have been extracted and presented as a simulation.


Chapter 11 – Concludes the study and shows that localization of action is possible in an object-centered system constrained by a panoramic view.


CHAPTER 2

THE BASIC PRINCIPLES OF NAVIGATION IN VIRTUAL ENVIRONMENTS

The general aim of this research is to find ways to make interaction between the computer and user more vivid while navigating in virtual environments. Interacting with objects is a primary way to learn about an environment, yet interacting with current computer programs is not an intuitive task. The study will show that one of the ways to improve interaction between the computer and user is through voice commands. The specific aim of this research is to build a system that transforms linguistic instructions into an agent’s movement in virtual environments. In order to begin, we shall now provide a short introduction to the basic configurations of existing systems.

2.1 Basic configurations of current systems

The navigational discourse between the architect and his client involves representation of the artifact through movement. The architect-client relations involve communication through which one examines scenarios of usability. In the architectural scenario, designer and client walk through the environment utilizing different scenarios involving user participant roles: resident, employee, customer, manager, janitor, service worker, neighbor, and passer-by. In our case the end-user (architect) seeks to communicate better about the placement of a proposed object within the virtual environment. That is, the intention of the end-user (designer) is to examine changes or alternatives to the proposed architectural object, and its effect on the environment.

Movement is one of the basic tools, a means to an end by which people examine objects, whether it is to move objects, move around them, or through them. Movement is primary, yet the knowledge of how one directs movement has been ignored. Movement knowledge includes wayfinding and orientation strategies, and through it one can reason about space. The examination of movement deals with the character of experience, that is, perception and description. Description involves the production of data, how one can talk about what one sees; from an introspective point of view, one can just talk about it, virtually without effort. Perception is one of the sources of information about our world: as one moves in space the process of visual perception converts retinal information into visual information. There is a distinction between the real world and the projected world (Kant, 1790/1987). Gestalt psychology demonstrated that perception is the result of an interaction between environmental input and active principles in the mind that impose structure on that input (Wertheimer, 1923; Kohler, 1929; Koffka, 1935). The phenomenon of perception is a mental construct, i.e. thought. Perception is the attempt to categorize in a content-rich environment with mutually referring variable sets (Lakoff, 1987). Most categorization is automatic and unconscious, and if we become aware of it at all, it is only in problematic cases. In moving about the world, we automatically categorize objects and abstract entities, like events, actions, emotions and spatial relationships (Rosch, 1978; Jackendoff, 1983).

According to Aloimonos (1997), “A [biological] system that successfully navigates using its perceptual sensors must have a number of capabilities that we can roughly separate into local and global ones.” Visual navigation amounts to the control of sensory-mediated movement and encompasses a wide range of capabilities, ranging from low-level capabilities related to kinetic stabilization to high-level capabilities related to the ability of a system to acquire a memory of a place or location and recognize it (homing). In the human-computer interface, this system is similar in its representational relationship, but before jumping to conclusions let us examine the components.

According to O'Keefe (1993), navigation in an environment includes two stages in the acquisition of information. In the first stage, the mammal identifies a notional point in its environment, the origin of its polar coordinates. In the second stage, the mammal identifies a landmark, a transitive reference in the environment. The landmark is fixed no matter how one moves around, and one can partially define which way one is going by saying what angle one is making with it. Once the two stages are identified, the mammal can construct a map of its environment by recording the vector from the landmark to each of its targets, using the slope/permanent landmark to define direction. Assuming that the mammal has done this and now wants to know how to get to a particular target, what it must do is to find the vector from itself to the permanent landmark. Once it has the vector from itself to the permanent landmark, it can establish a vector from the permanent landmark to the new target. The mammal can then find the vector from itself directly to the new target.
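This vector arithmetic is easy to make concrete. The sketch below is my simplification of O'Keefe's account, assuming a flat 2-D plane with Cartesian components; the target names and stored values are invented for illustration:

```python
# Illustrative two-stage landmark navigation: the map stores vectors from a
# permanent landmark to each target; at run time the agent needs only its
# own vector to the landmark, because
#   agent->target = agent->landmark + landmark->target.

def add(v, w):
    """Component-wise addition of two 2-D vectors."""
    return (v[0] + w[0], v[1] + w[1])

# Stored map: landmark-to-target vectors (hypothetical values).
landmark_to = {"nest": (1, -2), "water": (-3, 0)}

def vector_to_target(agent_to_landmark, target):
    """Compose the agent->landmark vector with the stored landmark->target one."""
    return add(agent_to_landmark, landmark_to[target])

# The landmark currently lies 3 east and 4 north of the agent:
print(vector_to_target((3, 4), "nest"))   # -> (4, 2)
```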

The other factor that we have to take into account is body movement in space as one moves to a target: aiming at a visual target and arm reaching, the basic function of which is to transport the hand to a given place (see Figure 2.1). In the case of computer interaction it is the manipulation of the input device, and although it is reductive relative to a haptic device, the remaining action still serves an important role in the process of visuo-motor feedback.

Figure 2.1 Basic sensorimotor channels involved in reaching movement (after Pillard, 1996).

In virtual reality, movement takes place as a result of competing frames of reference through intentional action by the user. There are six stages to how one operates/controls movement in virtual reality, which are common to all computer simulations:

1. The user observes the monitor and interprets the display as an architectural scene. He formulates a strategy of action as a result of what he wants to see occurring on the screen and what kind of action he needs to take.

2. The hand makes contact with the input device and is on a flat surface in the case of a mouse, or unrestrained in the case of, for example, a handheld input device, or at a targeted location in the case of a keyboard with alphanumeric commands.


3. The monitor displays a scene plus the registration of the user’s movement on the screen. This is the system input aid, and it is usually provided through a pointer in the form of an arrow or hand symbol.

4. The user issues commands by means of hand movement simulating movement location, forwards, backwards, etc.

5. The monitor displays a scene. This is the system output.

6. The user perceives the scene and interprets it as having occurred as a result of his action, and then compares it to the expected output.

7. Repeat of process (go to 1).

2.2 Description of the space people navigate

When people describe the movements of a person in space, they represent them in terms of ‘states’ and ‘events’: there is a subject/object in state S1, which is relocated to state S2. Actions are bifocal; they create a topological reality of conceptual relations and meaning. When people verbally relate to the environment, the task of formulating the requisite and sufficiently transitive properties for the use of a spatial expression to describe an event is considerably difficult. Many have argued that it is impossible, in principle, to formulate definitions that are frame-free. The problem arises in part because of the allotropic nature of an event. According to Jackendoff (1983), this is a critical linguistic element in spatial cognition – the connection between conceptual structure and spatial representation. Conceptual structure is the encoding of meaning independent of any language, while spatial representation is the encoding of objects and their configurations in space. According to Jackendoff’s frame/script theory, “The way of looking at the frame/script theory captures a generalization … the essential connection between the frame selection task (how one categorizes novel things and events) and the use of a frame for its default values. I am claiming here that these tasks use the very same information” (Jackendoff, 1983, p. 141). Tversky (1981) substantiates this claim by observing that people seem to keep track effortlessly of the objects immediately around them as they move and turn in environments. However, the systematic distortion of the relationship to objects depends on the directional reference of those objects from the body and the way the objects are placed in relation to the axial parts of objects.


The agent has to move from one location to the next through a route to arrive at a destination. Means-ends analysis involves strategies and procedures for reducing the difference between states. Golledge (1999) conducted an experimental comparison of strategies, stated criterion versus revealed criterion in route selection, that is, the stated intention versus the tangible action (see Table 2.1). The results show that the way we think about spatial problems and the way we act on spatial problems diverge: there is a gap between thinking about acting and thinking about space.

Stated criterion                                      Ranking of stated      Ranking of revealed
                                                      criteria (most used)   criteria (most used)

Shortest path                                                  1                      1
Least time                                                     2                      6
Fewest turns                                                   3                      3
Most scenic or aesthetic                                       4                      9
First noticed                                                  5                      2
Longest leg first                                              6                      7
Many curves                                                    7                     10
Most turns                                                     8                      5
Different from previous route taken (variability)              9                      8
Shortest leg first                                            10                      4

Table 2.1 Comparison of stated criterion versus revealed criterion in route selection (taken from Golledge, 1999)

Spatial descriptions are composed of elements, typically expressed by nouns, and spatial relations, typically expressed by prepositions and verbs. In order to formulate a linguistic navigational theory, a number of requirements have to be adhered to. Similar to Jackendoff (1983, p. 11), there are four requirements:

Expressiveness: A theory of linguistic navigation must follow the observation of the user’s requirement adequately; it must be able to express all the distinctions made by a natural language. In practice, of course, no theory can be tested on all possible sentences, but everyone assumes that some significant fragment of the language must be accounted for.


Universality: In order to account for linguistic navigational needs, the stock of navigational route structures available to be used by particular languages must be universal. On the other hand, this does not mean that every navigational route structure is necessarily capable of expressing every meaning, for a language may be limited by its lexicon, grammatical structure, or correspondence rules.

Compositionality: A linguistic navigational theory must provide a principled way for the structure of the parts of a sentence to be combined into the meaning of the whole sentence. This requirement may be taken more or less rigidly, depending on whether or not one requires each constituent (as well as each word) of a sentence to be provided with an independent interpretation.

Semiotic Properties: A linguistic navigational theory should be able to account formally for the so-called "indexical properties" of an object and the signified. In particular, the notion of "valid inference" must be explicated.

2.3 Current performance of navigation in virtual reality

The capabilities of current VR systems are limited to integrating the hand movement of the observer with the action taken by him. Current VR systems are also limited in their ability to integrate the following movements – as a result of limitations at the conceptual level the system cannot:

1. Combine movement of the agent with object examination;

2. Move from one landmark to the next through a trajectory;

3. Move along an object.

It is the belief of this researcher that this lack of ‘reference systems’ causes one to miss vivid concreteness, unlike Lynch (1973, p. 304), who expected to find a shorthand physical symbol for the city, both to organize impressions of it and to conduct daily activity. It is expected that the conceptual reference systems of communication can be used as a tool for improving spatial accessibility to the three-dimensional environment.

Expanded knowledge permits us to develop devices for representing knowledge with the competence to do and control the following:

1) A way of referring to an object;

2) A way of referring to a change of view;


3) A way that relays our movement in space.

To avoid confusion, we will adopt the distinction of Franklin, Tversky and Coon (Franklin, 1992), and refer to the point of view as the viewpoint of the character in described scenes, while the perspective is that of the observer.

The act of movement results from control, i.e. input.

Control and input types:

1) Control using screen aid (pointer device);

2) Control using input device (mouse, hand-held device);

3) Control using English commands (keyboard, voice command).

2.4 Basic assumptions concerning navigation

The Elementary Navigation control levels are:

1. Movement instruction level - provides a structure for selecting objects in relation to the background scene, through the use of the information supplied by the system.

2. Information representation level - relays our movement in space to virtual space.

The navigational control permits consciousness of the frame, which specifies the position of the viewer in relation to the surrounding objects and, at the same time, the position of the viewer’s movement in space in relation to the frame of attention, thus allowing for an information update. The navigational control permits the shifting of attention, through which one focuses on the visual scene as one moves through space in relation to specific targets. The device distinguishes between frames of observation and frames of object manipulation and representation.

To demonstrate how the system works for architects, we will use the technique of scenarios. Scenarios of human-computer interaction help us to understand and to create computer systems and applications as artifacts of human activity – as things to learn from, as tools to use in one’s work, as media for interacting with other people. Scenario-based design of architecture addresses five technical challenges: scenarios evoke reflection on the content of design work, helping architects coordinate design action. Scenarios are at once concrete and flexible, helping architects manage the fluidity of design situations. Scenarios afford multiple views of an object, diverse kinds and amounts of detailing, helping the architect manage the many consequences entailed by any given design project (Carroll, 2000; Hertzum, 2003; Rosson, 2002). A typical scenario between the architect and the client contains many directional references, since it examines relations between objects.

The subsequent scenario has been compiled from interviews conducted by the author throughout the research. Although it is an imaginary scenario, it is accurate in its details. A sample of interviews was recorded in June 2004 with architects in Rotterdam, the Netherlands (see Appendix II). Typical scenes that architects imagine the client, or they themselves, might want are as follows:

A hypothetical scenario between the architect and the client:

Architect: These are the plans for the proposed city.

Client: What you are showing me is plans; can you show me what it might look like when I approach the city?

Architect: (Pulls up the elevations) This is the North elevation.

Client: What you are presenting me with is a projection of space similar to the plans in that it shows space as an analytical section. But can you show me what it would look like with what you call ‘perspective’?

Architect: (Pulls up a three-dimensional representation) This is what you see. You see the main road that runs through the city and connects the parts divided by three rings of roads; the road runs from South to North. From here you see the statues, fountains, palaces.

Client: Can you show me the citadel?

Architect: (Pulls up another three-dimensional representation) This is the view of the citadel as seen from the intersection of the main road with the inner ring.

Client: This is just a zoom! Can you show me what the citadel would look like from the back?

Architect: (Pulls out another drawing) This is the view from the other side.

Client: Wait, I am lost. Can you show me what is the relation to the fountain? Can you show me a panoramic view of this location?

Architect: (Pulls up a panorama) As you turn left you can see the fountain, and as we turn around we see the palace.

Client: Can you take me through the palace?

Architect: Would you like to see a movie?

Client: No, I would like something interactive.

Architect: The latest thing is you control an agent moving around in a virtual environment. This is the joystick. It works on the analogy of movement. When you move the stick forwards the agent moves forwards, when you move it backwards the agent moves backwards, and so on.

Client: (Looking at the joystick and the perspective generated on the screen) How do I get to the main hall?

Architect: Go through the entrance into the vestibule, and the second door on your right is the main hall.

Client: (Goes through the building, arriving at the hall) Where is the bar?

Architect: You cannot see it from here.

Client: Why not?

Architect: If we rotate this bookcase then you would be able to see it.

Client: How does the deliveryman get to the bar?

Architect: The door on the left leads to the storeroom in the back of the building. We have an access road all the way to the back.

Client: Can you show me the delivery station in the back?

Architect: For this we either have to go through the storeroom or go through the kitchen located to your right.

Client: Fine.

Architect: (Moves the agent to the new location)

Client: Can you show me what happens when we add another floor to the palace?

Architect: This will make the palace look higher than the city hall.

Client: Can you show me?

Architect: This is the view from the left side of the city hall.

The hypothetical scenario presented here demonstrates a many-sided argumentation about the nature of representation, the nature of architectural spatial reasoning, and the technology and desire for control that drives the industry. Architectural representation tools have evolved through the ages, with plans, elevations, and sections as the main tools of the trade, but the ability to represent the three-dimensional product was always the highlight thought to bring most clients to a holistic comprehension of the proposed design. However, the perspective was 2 1/2-D. Hence, it offered no interactivity and no feeling of immersion; on the contrary, as Marr suggested, it keeps the viewer aware of its flatness.

What is the nature of architectural spatial reasoning? One has to start by describing the design process. The task of the architect is to plan and integrate new design solutions to human habitation needs. The task involves the coordination of different stakeholders to produce a comprehensive design. The architect has to coordinate between different trades, clients, and end-user demands. The design process can be roughly divided into a creative part, which generates the built form, and an observation/exploration part, which attempts to comprehend the consequences of that intervention. Thus, the design process enables the architect to develop a solution to a required local condition (Schön, 1983).

In order to investigate the consequences of an architectural intervention one must examine the quality of space through images. The architect must inspect the object’s ability to contain and be contained. The presentation is a dialog between the client and the architect. The architectural presentation takes the client through the design process, and as the scenario shows, the client has their own agenda, and as he/she moves through the site different aspects of the project are examined. The architectural presentation takes the client/user through a series of anticipated views, coupled with a scaled model. The “tour”, the Cave, games, and architectural presentation programs have a lot in common: they all lead the user through a narrative, and they all attempt to present to the user a new environment. The problem is that the number of possible stored images increases exponentially as we add more objects to the virtual environment, and the architectural client would like to have them “on the fly”; hence the desire for controlled dynamic movement in space.


Concluding remarks

This chapter has provided some basic definitions to facilitate an understanding of the cases that will follow. The current performance limitations are: 1) combining movement of the agent with object examination; 2) moving from one landmark to the next through a controlled trajectory; 3) moving along an object. We still require adequacy criteria to examine various computer navigational programs, and these will be extracted following the case-based analysis. In order to build a tool to enhance movement, one needs to understand the observer/observed relationship as well as object description. Movement is a change in object description. The user of computer-based representation via a monitor or other immersive device still has to decide: how do I change the observer/object relation, and how do I provide input into the computer? This is a highly complex question that cannot be answered immediately, and is the reason why we need ‘cases’.


CHAPTER 3

SELECTED CASES OF CURRENT NAVIGATIONAL SYSTEMS

This chapter will describe a range of typical cases of navigation systems, a collection of applied computer-based programs from commercial enterprises. The programs vary from desktop systems to fully immersive systems. An initial comparison of their methods of interaction will allow us to explore the different systems.

3.1 Introduction to the state of the art in computer navigation

The computer-based navigation cases were selected from the professional literature on architecture and computers, as exemplified in the “International Journal of Design Computing” 2002, Vol. 4, a special issue on “designing a virtual world”. The reason I have chosen to examine these programs is their ability to display architectural virtual reality in a dynamic way. The commercial games represent the state of the art of what can be achieved in virtual reality systems as compared to existing architectural systems. In practice, architects use their own initiative and understanding in producing a tailored demonstration. The following case division provides the range needed to examine what is involved, from the perspective of the user, in movement in virtual environments.

Exploration Programs: CAVE™, Cosmo Player®, Myst®

Representation programs: 3DS MAX®

‘First-person shooters’: Tomb Raider®


The selection presents navigational programs that use different computerized systems. Although 3DS MAX is a representation program, it was added to the list to show how architects work. The CAVE is an immersive environment, while the majority of the programs use desktop computers, and Tomb Raider uses a game console. However, there is no reason why the programs cannot all be converted to desktop programs, since they share a common mechanism of transition and pointing. According to Buxton (1990), the model relates the state of an application to the state of the device. Figure 3.1 presents the state transition diagram for the use of the mouse. Pressing or releasing the button of the mouse transfers the system between states one and two. If the button of the mouse is up, the system is in the tracking state until the mouse button is pressed; then the dragging state is entered. That is ‘direct manipulation’, where objects are manipulated by the user through physical actions like pointing, clicking, dragging, and sliding.

Figure 3.1. The possible three-state model of interaction between the user and the visual input device, taken from Buxton (1990)
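The two button-driven states described above can be sketched as a minimal state machine. This is an illustrative sketch only; the class and method names are not from Buxton (1990):

```python
class MouseInteraction:
    """Minimal sketch of the two mouse states in Buxton's model:
    state 1 (tracking, button up) <-> state 2 (dragging, button down)."""
    TRACKING, DRAGGING = 1, 2

    def __init__(self):
        self.state = self.TRACKING  # button starts up, so we track

    def button_down(self):
        self.state = self.DRAGGING  # pressing the button enters dragging

    def button_up(self):
        self.state = self.TRACKING  # releasing returns to tracking

    def move(self, dx, dy, target=None):
        # In the tracking state, motion only moves the pointer;
        # in the dragging state, motion also moves the grabbed object.
        if self.state == self.DRAGGING and target is not None:
            target[0] += dx
            target[1] += dy
```

In this reading, direct manipulation is precisely the dragging state: the user's physical hand motion is bound to an on-screen object for as long as the button is held.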

This study involves a considerable “reduction” from the everyday experience of movement or the experience of computer-simulated virtual reality, i.e. it focuses exclusively on questions regarding the representation/interaction of movement in architectural environments. It is a study of the mechanism of representation, indicating what the new system ought to be able to achieve based on the cases of CAD virtual reality systems. It also examines how new theoretical systems could improve existing limitations through the inclusion and integration of the user’s movement in a new transitional structure involving communication between the user and the computer, i.e. the interface that facilitates interaction.

The following aspects are to be examined:

How does the user express his/her desires: what types of actions are involved?

What does the user perceive as output on the computer screen?

What is involved within the system of representation that generates this output?

What are the disappointments, frustrations, and difficulties experienced by the user?

What are the constraints of the system as a result of input action?


3.2. Exploration Programs - the CAVE™

The CAVE™ system can be used as an example to introduce what the experience of virtual reality can be. Participants visit the virtual environment alone or in groups. The CAVE, created by the Electronic Visualization Laboratory of the University of Illinois at Chicago, is a cube-shaped space with stereo projections on three walls and on the floor, which are adjusted to the viewpoint of one of the users.

In Cruz-Neira (1993), the authors define a VR system as one that provides a real-time, viewer-centered, head-tracking perspective with a large angle of view, interactive control, and binocular (stereo) display. Alternative VR systems, such as Head Mounted Displays (HMDs), achieve these features by using small display screens that move with the viewer, close to the viewer's eyes.

Fig 3.2 The CAVE setup: a participant with the shutter glasses, a large group in the CAVE, and a participant in a demonstration (taken from Saakes, 2002)

In the CAVE, one has the ability to share the virtual environment with multiple users. The CAVE is an immersive environment, but it does not completely isolate the users from the real world. According to Cruz-Neira (1993), real-world isolation can be highly intrusive and disorienting. The viewer is still aware of the real world and may fear events such as running into a wall. It has been shown in Cruz-Neira (1993) that tracking objects in the CAVE is less distracting than in other systems. This is due to the fact that the projection plane does not move with the viewer's position and angle as it does in an HMD device.

The 3D effect of the CAVE comes from the stereo projection, which is achieved by projecting an image for the left eye followed by an image for the right eye. Viewers wear stereographic LCD Crystal Eyes stereo shutter glasses to view the stereoscopic images. The glasses are synchronized with the computer using infrared emitters mounted around the CAVE. As a result, whenever the computer is rendering the left-eye perspective, the right shutter on the glasses is closed, and vice versa. This ‘tricks’ the brain and gives the illusion that the left eye is seeing the left perspective and the right eye is seeing the right perspective.


Navigation can come in many forms, including flying or walking, and may be controlled by a device called a ‘wand’. The wand is a hardware device that can be thought of as a 3D equivalent of a mouse. In addition to being tracked, the wand has three buttons and a pressure-sensitive joystick. Since the wand is tracked, it facilitates various interaction techniques. The observer in the CAVE also uses simple commands to interact with the environment. For instance, by pointing with the wand one can pick up an object in the virtual world. Typically, with such a technique, a virtual beam is emitted from the wand, allowing the user to intersect the beam with the desired object and then select it with a wand button.

Figure 3.3: The wand

The illusion of movement in the CAVE with the help of the wand in the immersive environment is intuitive. The tracking of the movement of the user in space offers six degrees of freedom.
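The wand's beam-based selection described above amounts to a ray test: does a ray from the tracked wand's position, along its pointing direction, hit a volume bounding a virtual object? A minimal sketch, assuming objects are bounded by spheres (the function and parameter names are illustrative, not from the CAVE library):

```python
import math

def wand_pick(origin, direction, center, radius):
    """Sketch of wand-beam selection: test whether a ray from the
    wand (origin, direction) hits a sphere (center, radius)."""
    # normalize the wand's pointing direction
    norm = math.sqrt(sum(d * d for d in direction))
    d = [x / norm for x in direction]
    # project the vector from wand to object center onto the beam
    oc = [c - o for c, o in zip(center, origin)]
    t = sum(a * b for a, b in zip(oc, d))
    if t < 0:  # the object lies behind the wand
        return False
    # distance from the object's center to the closest point on the beam
    closest = [o + t * di for o, di in zip(origin, d)]
    dist2 = sum((p - c) ** 2 for p, c in zip(closest, center))
    return dist2 <= radius ** 2
```

The same test runs once per candidate object per frame; the nearest hit (smallest t) would be the selected object.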

3.3. The exploration program - Cosmo Player®

Cosmo Player® is a program that plugs into the web browser to enable the user to explore three-dimensional worlds. With Cosmo Player one can visit any three-dimensional world written in the Virtual Reality Modeling Language (VRML). These three-dimensional worlds often include other kinds of media, such as sound and movies, and a brief guide shows the basics of the main controls (see Figure 3.4).

Figure 3.4 Cosmo’s navigation control

Control and movement; from the Cosmo Player Manual

The main controls on the Cosmo Player dashboard do two things: move around in 3D worlds and explore objects in 3D worlds.

Moving Around in a World

To move around in a 3D world, click the Go, Slide, or Tilt button and then drag the pointer in the Cosmo Player window. Once one clicks a control, it stays selected until another is clicked.


Go Click and then drag to move in any direction of the visual array.

Slide Click and then drag to slide straight up and down or to slide right or left.

Tilt Click and then drag up or down or from side to side.

Exploring Objects

To explore objects in a 3D world, click the Rotate, Pan, or Zoom button and then drag the pointer in the Cosmo Player window. Once one clicks a control, it stays selected until another is clicked.

Rotate Click and then drag to rotate an object.

Pan Click and then drag to pan right, left, up, or down.

Zoom Click and then drag up to zoom in or drag down to zoom out.

Interacting with Active Objects

An active object is one that will do something — like play a sound or an animation — when one clicks it or clicks-and-drags it. When one passes the pointer over an active object, the pointer changes form.

Seek Click Seek and then click an object to move closer to it.

Another Way of Moving Through a World

Authors of 3D worlds can set viewpoints — places of interest — for one to visit. The user can move from one viewpoint to another by choosing from the Viewpoint list or clicking the Next Viewpoint or Previous Viewpoint button.

Viewpoint List Click the Viewpoint List button and choose a viewpoint from the pop-up list of pre-selected scenes.

Figure 3.5 Cosmo’s navigation in action

Cosmo Player is an exploration program in which attention and direction are integrated with dragging (see Figure 3.5). The dragging mechanism makes the user feel as if he or she is constantly dragging some object behind. Cognitively, the dragging feeling comes from a concept related to anticipation: pointing to where one wants to go and the delay of arriving at that point. The monitor is treated as a physical entity with flat horizontal and vertical dimensions. Cosmo Player was designed with the visitor in mind, to create places where the user explores the designer’s creation. The belief of the Cosmo Player designers was that the basic navigation movement should also change, to achieve the same status of wonder and surprise as the objects themselves. Movement in a Cosmo Player world is fragmentary; although it makes sense to define each move in cinematic terms, such as pan and tilt, when one makes a movie, these terms restrict movement to a scene co-ordinate system when moving in virtual reality. When a person moves in the real world, the actions described above do not readily occur as individual actions. For example: should moving around in a world and zooming be separate concepts? Examining objects in Cosmo Player is done through array rotation; at first it is not clear what kind of rotation is occurring, since the visual indicator of movement is constrained to the two-dimensional screen and hand movement is constrained relative to the surface of the user’s pointing device. Combining dragging and pointing into one action results in screen confusion. Although Cosmo Player comes closest to architectural movement, it fails to overcome the sense of delay and confusion when one interacts with objects.

3.4 The exploration program - Myst®

In Myst®, an adventure game, the mouse acts as the input device. Control is achieved through pointing and requires almost no skill to be learned. In Myst, the hand pointer integrates action by indicating a possible direction and the manipulation of an object. The mouse works well in combination with the flat surface on which it moves. To move forward, the user must click in the middle of the screen. If one wants to turn right or left, one clicks on the right or left side of the screen. By pointing and clicking from one node to the next, one executes movement. Each node is a 360° panoramic view (cylindrical panoramic images); the sense of movement from one node to the next is abrupt. The user does not have any option for a different location other than the one provided by the designer of the game. There are two possible ways to view a scene: one is a panoramic view, the other is an object view. The user must stop at every new node to update his position. When one passes the pointer over an active object, the pointer changes form. In Myst the engagement is labeled through pre-determined hotspots in a scene; that is to say, the user must constantly search for visual clues with the pointer device on the screen. The pointer acts both as a navigational device and as an interaction device with objects at the same time (see Figure 3.6).
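The node-to-node structure described above can be sketched as a small graph: each node is a panoramic view, and a click in a screen region either jumps to a linked node or leaves the player in place. The node names and region layout below are invented for illustration and are not from the game:

```python
# Hypothetical node graph: each node maps screen regions to the
# next node, as designed by the author of the world.
NODES = {
    "library": {"left": "hallway", "middle": "desk"},
    "hallway": {"right": "library"},
    "desk":    {"left": "library"},
}

def region(x, width):
    """Map a click x-coordinate to a screen region (left/middle/right)."""
    if x < width / 3:
        return "left"
    if x > 2 * width / 3:
        return "right"
    return "middle"

def click(node, x, width=800):
    """Return the next node; stay put if the clicked region has no link."""
    return NODES[node].get(region(x, width), node)
```

The abruptness the text notes falls out of this structure: there is no trajectory between nodes, only a discrete replacement of one panorama by another.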

Pointer: This is the standard navigation pointer for exploration.


Open Hand: This pointer indicates an object that you can use or manipulate, or an object that you can pick up. Click and see what happens!

Zoom In / Zoom Out:

This pointer indicates something that you can zoom in on (+) or away from (-). Click once to see what you are examining in more detail and then click again to zoom out.

Lightning Bolt:

When Zip Mode is turned on, this indicates an area that you can zip to instantly.

Figure 3.6 Cursor icons from the Myst manual

3.5 Representation programs – 3DS MAX®

3DS MAX® is a representation tool aimed at animators of architecture and multimedia producers. 3D Studio MAX attempts to move objects in order to meet the user's demands to manipulate objects. The history of MAX is of some interest in understanding the development of CAD. As a company, it started on the Omega platform. Omega took the first steps towards visual display, as opposed to the alphanumeric commands of the first computers. MAX started as a representation program for designing objects. It was bought by AutoDesk to supplement the manufacture of AutoCAD, a drafting program. Recently it was spun off as a new company called Discreet, attempting to corner the market of game-building agents to be installed in games like Tomb Raider.

In order to construct or describe an object through descriptive geometry, orthographic projections are used. ‘Orthographic’ means that all lines along the same axis pertain to, or involve, right angles or perpendiculars. The object co-ordinates are plotted on a two-dimensional plane that defines the limits of the user's ‘sight’. In a sense, the viewing plane is like the frame of peripheral vision. To see what is behind, you either have to turn your head (rotate the viewing plane) or step backward until the object is in front (move the viewing plane). In other words, the user can only see things that are in front of the viewing plane, and everything else is ‘outside the field of view’.
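The defining property of an orthographic projection is that the projection rays stay perpendicular to the viewing plane, so projecting simply discards the coordinate along the viewing axis. A minimal sketch (the function name is illustrative):

```python
def orthographic_project(point, axis=2):
    """Sketch of orthographic projection: drop the coordinate along
    the viewing axis. There is no convergence and no foreshortening,
    because all projection rays are parallel and perpendicular to
    the viewing plane."""
    return tuple(c for i, c in enumerate(point) if i != axis)
```

A top view drops the vertical axis, a front view drops the depth axis, and so on; this is why distant and near objects appear the same size in the three orthographic viewports.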

When an object is in alignment within geometrical co-ordinates it can be explored predictably in three ways: 1) array rotation, 2) object rotation, and 3) scene rotation. An alternative to using the local view co-ordinates is to define an agent axis, often associated with character animation of the represented agent in virtual reality.

In 3DS MAX, the windows that allow the user to view the 3D space are called viewports (see Figure 3.7). The monitor screen itself is akin to the viewing plane because the user can only see what is ‘beyond’ the monitor in cyberspace. In MAX, three of the four default views are orthographic, where objects are shown as orthographic projections. The fourth default viewport in MAX, the perspective viewport, represents a more realistic view of 3D space where lines converge to vanishing points, as they do in real life.

Figure 3.7. The viewpoint represents the current vantage point of the user. The viewport (viewing plane) indicates the limits of the user's view, because only objects in front of that plane are visible.

There are four standards for the object's geometrical Cartesian co-ordinate system. They are:

1) A reference location that defines the object's relation to the point of origin;

2) A reference orientation that defines the object axis;

3) A reference distance that defines the unit of measurement;

4) A reference sense along the orientation that defines the positive direction in relation to the human axis.
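The four standards above can be sketched as a small data structure; the field names are illustrative, not taken from 3DS MAX:

```python
from dataclasses import dataclass

@dataclass
class ReferenceFrame:
    """Sketch of the four standards of a Cartesian co-ordinate system:
    location, orientation, unit of measurement, and sense."""
    origin: tuple = (0.0, 0.0, 0.0)                   # 1) reference location
    axes: tuple = ((1, 0, 0), (0, 1, 0), (0, 0, 1))   # 2) reference orientation
    unit: float = 1.0                                  # 3) reference distance (unit)
    sense: int = +1                                    # 4) positive direction

    def to_world(self, p):
        """Express frame co-ordinates p in world co-ordinates."""
        x, y, z = (self.sense * self.unit * c for c in p)
        ax, ay, az = self.axes
        return tuple(o + x * a + y * b + z * c
                     for o, a, b, c in zip(self.origin, ax, ay, az))
```

A scene, an object, and an agent can each carry such a frame; navigation is then a matter of changing one frame relative to another.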

3DS MAX has two representational systems: one that gives the user the appearance that all objects are seen and all have equal visual access, and a second, a representational camera through which one explores the virtual space with the limited point of view of the perspective (see Figure 3.8). That is, in order to enhance object manipulation, one can toggle between having one single viewport and splitting the viewport into four pre-defined world co-ordinates (top view, front view, side view and a perspective). The 3DS MAX viewport combines the idea of world co-ordinates with an axis of array rotation, and thus one can see all objects and all positions according to the world and view co-ordinates. With view co-ordinates one can place an object within viewports and rotate the array of objects. (See Figure 3.8 for an overall view of the system, and Figure 3.9 for the specific viewport located at the bottom right.)


Figure 3.8 User interface for 3DS MAX

Figure 3.9. MAX viewport navigation

Using Standard View Navigation - Button Operation; from MAX Manual

Clicking standard view navigation buttons produces one of two results:

• It executes the command and returns to your previous action.

• It activates a view navigation mode.

While in a navigation mode, one can activate other viewports of the same type,without exiting the mode, by clicking in any viewport.

Zooming, Panning, and Rotating Views; from MAX Manual

Click Zoom or Zoom All and drag in a viewport to change view magnification. Zoom changes only the active view, while Zoom All simultaneously changes all non-camera views.

If a perspective view is active, you can also click Field of View. The effect of changing FOV is similar to changing the lens on a camera. As the FOV gets larger you see more of your scene and the perspective becomes distorted, similar to using a wide-angle lens. As the FOV gets smaller you see less of your scene and the perspective flattens, similar to using a telephoto lens.

Click Region Zoom to drag a rectangular region within the active viewport and magnify that region to fill the viewport. Region Zoom is available for all standard views except the Perspective view. In a perspective view, Field-of-View replaces Region Zoom.

Click the Zoom Extents or Zoom Extents All fly-out buttons to change the magnification and position of your view to display the extents of objects in your scene. Your view is centered on the objects and the magnification is changed so the objects fill the viewport.

Click Pan and drag in a viewport to move your view parallel to the viewport plane. You can also pan a viewport by dragging with the middle mouse button held down while any tool is active.

Click Arc Rotate, Arc Rotate Selected, or Arc Rotate Sub-Object to rotate your view around the view center, the selection, or the current sub-object selection respectively. The latter option is a new feature in 3DS MAX. When you rotate an orthogonal view, such as a Top view, it is converted to a User view.

With Arc Rotate, if objects are near the edges of the viewport they may rotate out of view.

With Arc Rotate Selected, selected objects remain at the same position in the viewport while the view rotates around them. If no objects are selected, the function reverts to the standard Arc Rotate.

With Arc Rotate Sub-Object, selected sub-objects or objects remain at the same position in the viewport while the view rotates around them.


Cameras are non-rendering objects that you can position in the 3D scene. They work like real cameras in that they provide a viewpoint on the scene that can be adjusted in space and animated in time. Just as with real cameras, MAX cameras have different settings, such as lens lengths and focal lengths, that one can use to control the view of the scene. Cameras can move anywhere, even through objects. Contrary to real-world cameras, MAX does not create effects such as depth of field, so everything is in focus.

In 3DS MAX, there exist two types of action cameras: a target camera and a free camera. A target camera makes use of a target, which is a point in 3D space at which the camera is aimed. The target camera can track an object moving in the scene. A free camera is a camera without a target that can easily be animated along a path or easily pointed by simply rotating the camera. The camera can be manipulated by dragging, by direct manipulation of the camera in the world co-ordinates, or by camera viewport control through pan, tilt and zoom. It can also be further manipulated by time lines and key frames.

Figure 3.10 Camera navigational directions

You move a camera view by clicking one of the following buttons and dragging in the camera viewport.

• Dolly moves the camera along its line of sight.

• Truck moves the camera and its target parallel to the view plane.

• Pan moves the target in a circle around the camera. For a target camera, it rotates the target about the camera. (For a free camera, it rotates the camera about its local axes.)

• Orbit moves the camera in a circle around the target. The effect is similar to Arc Rotate for non-camera viewports. It rotates the camera about its target. (Free cameras use the invisible target, set to the target distance specified in the camera Parameters rollout.)

Figure 3.11 Possible camera movements
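Two of the camera moves listed above can be sketched as simple vector operations: dolly translates the camera along its line of sight, and orbit rotates it about the target while preserving the camera-target distance. This is an illustrative sketch of the geometry, not 3DS MAX code; the orbit here is restricted to the vertical axis through the target for brevity:

```python
import math

def dolly(camera, target, amount):
    """Sketch: dolly moves the camera by `amount` along its line of
    sight towards the target."""
    d = [t - c for t, c in zip(target, camera)]
    n = math.sqrt(sum(x * x for x in d))
    return tuple(c + amount * x / n for c, x in zip(camera, d))

def orbit(camera, target, angle):
    """Sketch: orbit rotates the camera about the target, here about
    the vertical (y) axis through the target; angle in radians."""
    x, y, z = (c - t for c, t in zip(camera, target))
    ca, sa = math.cos(angle), math.sin(angle)
    x, z = ca * x + sa * z, -sa * x + ca * z
    return (target[0] + x, target[1] + y, target[2] + z)
```

Truck would translate both camera and target by the same vector in the view plane, and pan would apply the orbit rotation to the target about the camera instead.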


The camera view in MAX is slightly different from the conventional perspective view; the difference lies in the treatment of object rotation. In the conventional views there is a switch between array rotation and view rotation in relation to the view co-ordinates. That is, when the view co-ordinates are visible the rotation is an array rotation, and when the view co-ordinates disappear from view the system switches to panoramic rotation. In the camera views one can be engaged with an object, allowing array rotation in both conditions.

The 3DS MAX approach to immersed navigation is to distribute views between the various viewports. The camera routine mimics the act of directing movies, with elements of array rotation. One can always direct action by physically moving the camera to the required position, but then one is at the level of orthogonal projection, with no direct experience of space.

3.6. First-person shooters - Tomb Raider®

Early ‘shoot-up gallery’ programs had an agent immersed in a two-dimensional environment with four basic navigation commands: turn left, turn right, move forward, and move back. The navigational commands expressed the attempt to integrate different views with the user’s demand to experience space. It soon became obvious that those commands were insufficient and more elaborate systems were required.

Tomb Raider is an example of a first-person action game, where action is integrated by means of an agent. In the old arcade games, a person had a gun and shot at everything that moved; in the first-person shooting gallery, ‘action’ is controlled through direct control of the agent via input from the hand-held device. The Tomb Raider agent is Lara Croft, the representation of a character that moves as instructed by the user through computer keys: as one presses a key or combination of keys, a certain action follows. The control is divided according to the type of action one wants to perform. In the game, there are three levels of control over the agent: the first is basic movement of the agent (run forward, jump back, side-step left, side-step right), with some control over motion and speed (run, jump, walk) and some rotation (roll, turn left, turn right); the second level of control pertains to actions of the agent (draw weapons, get/throw flare, fire); and the third level of control is action in relation to the environment (grab ledge, pull object).
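The three-level control scheme amounts to a discrete key-to-action table. The sketch below illustrates that structure only; the key bindings and action names are invented for illustration and are not the game's actual controls:

```python
# Hypothetical key-to-action table for the three levels of agent
# control: basic movement, agent action, and environment action.
CONTROLS = {
    "Up":     ("movement", "run forward"),
    "Down":   ("movement", "jump back"),
    "Left":   ("movement", "turn left"),
    "Right":  ("movement", "turn right"),
    "Space":  ("agent action", "draw weapons"),
    "Ctrl":   ("environment", "grab ledge / use object"),
}

def dispatch(key):
    """Look up the control level and action bound to a key press."""
    level, action = CONTROLS.get(key, ("none", "no-op"))
    return f"{level}: {action}"
```

Because each key maps to one discrete action, combinations not in the table (such as a diagonal run) have no direct expression, which is exactly the limitation the analysis below points to.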


Figure 3.12 Lara turning and shooting

Below is a list of some of the actions of Lara Croft taken from the manual:

Running Pressing Up moves Lara forward at a running pace, while pressing Down makes Lara jump back a short distance. Pressing Left or Right turns Lara left or right.

Walking By pressing the Walk button in conjunction with the Cursor Keys, Lara can carefully walk forwards or backwards. Whilst the Walk button is held down, Lara will not fall off any edge - if one makes her walk up to an edge, Lara will automatically stop.

Side Steps Side-step right and left do exactly as one might imagine.

Roll Selecting Roll will make Lara roll forward and finish up facing the opposite direction. This also works when Lara is underwater. Roll may also be activated by pressing the Up and Down Cursor Keys simultaneously.

Jumping Lara can jump in any direction to evade her enemies. Press the Jump Key and Lara will jump straight up into the air. If you press a Cursor Key immediately after pressing Jump, Lara will jump in that direction. In addition, pressing Down or Roll straight after starting a forward jump makes Lara somersault in the air and land facing the opposite direction. This also works when jumping backwards by pressing Up or Roll immediately after takeoff.

Vaulting If Lara is faced with an obstacle that she can climb over, pressing Up and Action will make her vault onto it.

Climbing Some walls are climbable. If Lara comes across such a surface, pressing Up and Action will make her jump up (if there is room) and catch handholds on the wall. She will only hang on whilst Action is held down. She can then be made to climb up, down, left and right by pressing the Cursor Keys. Pressing Jump will make Lara jump backwards away from the wall.

Grabbing hold If Lara is near to a ledge while she is jumping, pressing and holding the Action Key will allow her to grab the ledge in front of her and hang there. If a wall is climbable, Lara can catch onto it anywhere (not just ledges). Press Left or Right, and Lara will shimmy sideways. Pressing Up will make Lara climb up to the level above. Let go of Action and Lara will drop.

Picking objects up Lara can retrieve objects and store them in her inventory. Position Lara so that the object you want to retrieve is in front of her feet. Press the Action Key and she will pick it up.

Using puzzle items Position Lara so that the object receptor is in front of her. Press the Action Key and the Inventory Ring will appear. Left and Right will allow you to select the object you want to try, and pressing Action again will use it.

Pushing/pulling objects Lara can push certain blocks around and use them to climb up to greater heights. Stand in front of the block and hold down Action, and Lara will get into her ready stance. Once she is ready, press Down to pull the block and Up to push it, or if you decide you no longer wish to carry on with this task, simply release the Action Key.

Looking around Pressing the Look Key will make the camera go directly behind Lara, whatever the camera is currently doing. With the Look button held down, the Cursor Keys allow Lara to look around her. Once you let go of the key, the view returns to normal. (TIP: if you are trying to line Lara up for a jump, and the camera is in an awkward position, pressing just the Look Key on its own will show you exactly what direction she is facing.)

First-person action games have combined the agent and the directing into one space. The challenges of the game Tomb Raider are about exploration and timing: how fast one can move without mistakes in guiding Lara to her target. Directing movement is continuous and requires a learning curve and good hand-eye co-ordination. The user is encouraged to be constantly on the move. Playing Lara is non-intuitive; one presses alphanumeric keys, a process which takes time to learn. The player must learn how to align or position the agent with virtual objects, and how to make the agent run while constantly adjusting the agent’s position. The discrete commands make it difficult to move Lara diagonally.

Concluding remarks

This chapter has described a range of navigational programs and their interface control mechanisms. The cases, ranging from games to exploration programs to modeling programs, are all available commercially. In the next chapter, we will analyze them critically and compare the various programs in order to produce adequacy criteria.


CHAPTER 4

CRITICAL ANALYSIS OF THE STATE OF THE ART TECHNOLOGY

This chapter will examine the overall performance of the different cases presented in the previous chapter. From this analysis the general criteria of the systems will be exposed. We have examined professional architectural programs, navigational programs and games. Unlike the games, all the other computer programs were not intuitive to learn, and the movement of the hand did not necessarily correspond with the action on the screen. We call this phenomenon the ‘flatness’ of interaction.

4.1 Historical origins of the ‘flatness’ of the computer screen

Movement in virtual reality as represented on the computer screen has a long history, with its roots in the Renaissance. In order to understand the process of interaction between the user and objects of desire, we must start with Alberti’s attempt in “On Painting” (Alberti, 1956) to describe the process of representation: “I beg you to consider me not as a mathematician but as a painter writing of these things. Mathematicians measure with their minds alone the forms of things separated from all matter. Since we wish the object to be seen, we will use a more sensate wisdom. […] The painter is concerned solely with representing what can be seen. The plane is measured by rays that serve the sight, called visual rays, which carry the form of the thing seen. We can imagine those rays to be like the finest hairs […] tightly bound within the eye where the sense of sight has its seat. The rays, gathered together within the eye, are like a stalk; the eye is like a bud, which


extends its shoots rapidly, and in a straight line to the plane opposite. [...] The extrinsic rays, thus encircling the plane [...] like the willow wands of a basket-cage, make, as is said, this visual pyramid. […] The base of this pyramid is a plane that is seen. The sides of the pyramid are those rays that I have called extrinsic. The cuspid, that is the point of the pyramid, is located within the eye” (Book I). Alberti’s system of representation for perspective, as demonstrated in his book “On Painting”, was a new way of seeing the world (Gadol, 1969), (Edgerton, 1966), (White, 1957), (Panofsky, 1924). According to Panofsky (1924), Alberti developed a mathematical perspective. The basic visual forms were mediated through an optical projection on a planar surface in a Euclidean space. Alberti’s substitution of the cone of vision with a pyramid makes the representation of a one-point perspective possible. In Alberti’s perspective, the size of the object observed varies with the height of the observer’s eye and the distance to the object; the construction has the lines converge into a single point, called the vanishing point. Alberti’s concern was with an active representation of the physical world, by means of the ‘window’ described through the ‘flatness’ of the canvas. One’s observation is fixed on the intersection of the picture plane and the physical world. It employs perspective for relating the viewer to a represented world through the visual rays and the visual pyramid. The instance of observation is juxtaposed upon the intersection of the picture plane - the canvas - and the physical world. “First of all about where I draw. I inscribe a quadrangle of right angles, as large as I wish, which is considered to be an open window through which I see what I want to paint” (Alberti, 1956). To construct such a window frame one needs an object of an intended view, and to project it onto a surface perpendicular to the observer (see Figure 4.1). According to Gadol (1969, p. 29), the window view or the centric-point scheme, as defined by Alberti, is represented through the visual rays - the observer’s perception of the picture - and on the other side of the picture plane is the pyramid of ‘rays’, a purely imaginary construct. The information displayed is dual: the picture is both a scene and a surface, and the scene is paradoxically behind the surface.

Figure 4.1 In perspective, all lines of sight converge on the viewer’s eye, which is positioned in a stationary, privileged location. This creates the illusion of a vanishing point.
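Alberti's construction can be sketched numerically: the picture plane cuts the visual pyramid at a fixed distance from the eye, so a point at depth z is scaled by similar triangles. The sketch below is illustrative (the function and parameter names are not from the sources cited):

```python
def perspective_project(point, eye_distance):
    """Sketch of the cross-section of the visual pyramid: the picture
    plane sits at `eye_distance` from the eye, so a point (x, y, z)
    projects to (x, y) scaled by eye_distance / z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the eye")
    s = eye_distance / z
    return (s * x, s * y)
```

Receding points with the same lateral offset project ever closer to the centric point (0, 0), which is the vanishing point; and, as Kubovy's argument below implies, a point with z smaller than the eye distance gets a scale factor greater than one, i.e. it is depicted larger than life.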


“He who looks at a picture, done as I have described [above], will see a certain cross-section of a visual pyramid, artificially represented with lines and colors on a certain plane according to a given distance, center and lights” (Alberti, 1956). In the perspective as defined by Alberti, the visual lines (radiant lines) emanating from the object or non-object on the horizon of one’s visual field intersect at the vanishing point, which constructs the perspective (see Figure 4.2). According to Kubovy, “The geometric evidence for this point can be found in the size of the depictions of known objects. The geometry of perspective implies that the painting of an object which is in front of the picture plane will be larger-than-life; since Renaissance painters very rarely painted larger-than-life figures, most figures must be behind the picture plane” (1986, p. 23).

Figure 4.2 Construction of a perspective representation of a pavement consisting of square tiles (Kubovy, 1986).

According to Panofsky (1924 p. 29), “In order to guarantee a fully rational - that is, infinite, unchanging and homogeneous - space, this "central perspective" makes two tacit but essential assumptions: first, that we see with a single and immobile eye, and second, that the planar cross section of the visual pyramid can pass for an adequate reproduction of our optical image. In fact these two premises are rather bold abstractions from reality, if by “reality” we mean the actual subjective optical impression.” That is, Alberti attempted to set the conditions for realism. According to Tzonis, “The tripartition that characterized the organization of perspective-based pictures corresponded to the tripartite cognitive framework of front-middle-back, up-middle-down, right-middle-left - categorical structures internal to the mind. Consequently, what the viewer recognized in such paintings was nature categorized, humanized. Perspective paintings were not only naturalistic images, but also mental images” (Tzonis, 1993). Current mechanisms controlling movement in virtual space are predominantly based on such perspective systems. The analogy between the observer of a painting and the observer in computer interaction also has its limitations. As Kubovy (1986) shows, the interaction between the observer and the picture plane does not change, and as a result one experiences a double dilemma that is resolved differently in the human-computer interface. The dilemma of the picture plane is explained as follows: the experience of the picture turning stems from two perceptions. On the one hand, even though we are walking around the picture, we perceive the spatial layout of the represented scene as if it remains unchanged. On the other hand, even though the spatial layout of the scene remains unchanged, we perceive our own motion in space as we walk past the picture.

The modern reader may find in Alberti’s “On Painting” a similarity with a phenomenon one encounters in the human-computer interface, where the screen can be either transparent to the observer or opaque. “When painters fill the circumscribed places with colors, they should only seek to present the forms of things seen on this plane as if it were of transparent glass. Thus the visual pyramid could pass through it, placed at a definite distance with definite lights and a definite position of center in space and in a definite place in respect to the observer” (Alberti p. 51). When the screen is transparent one is working with an agent moving within a three-dimensional world; when the screen is opaque one is working with a representation, a flat pointer that moves up and down. In cases where the input device is a mouse, one’s input movement is restricted to a two-dimensional space - forward and backward, left and right - and so is the screen feedback: up and down, left and right.

The attempt to break the Albertian window is characteristic of the modern movement in art. Two approaches emerged from the ‘modern movements’ that concern the re-construction and super-imposition of the views of an object over time. One examined tracking movement in relation to objects, recording different views as combined viewpoints frozen in an imaginary instance, or what Henderson (1983) calls the fourth dimension; this characterizes part of the aims of the Cubist painters. The Futurist movement, meanwhile, examined an object as it moved in space, registering the instances of its trajectory, taking its guide from the first experiments with a fixed camera and human movement (animation). The two approaches displayed a cognitive representation of the act of tracing an object in short- and long-term memory.

So what are the characteristics of flatness, and how do we define it? Flatness works on two levels: one physical, the other conceptual. The physical flatness is the disembodied, unattached movement of the hand relative to the display of the computer screen, which does not correspond to the representation of movement in space. The conceptual flatness is our inability to manipulate objects intuitively and immediately, since the scene is paradoxically behind the surface. This paradigm has prevented us from exploring ways of working and navigating with objects in an intuitive way. It is now our task to expose the limits of such interaction.


4.2 Limitations of interaction with surrounding objects

Architects utilize several means to examine an object against design criteria. They first employ descriptive geometry to construct the object, and perspective views to combine the views of the projected object so as to be able to examine the relationships. Those relations can be summarized as follows: 1) the relation between the object and its detail, or the relationship between the object and another object; 2) the relation between what is seen and the occluded parts; 3) the relationship between objects and people. Those views are then examined in a cyclical manner throughout the design process.

Much of the work of the locative prepositions involves the identification of these two variables: action description (EVENT) and location description (STATE). When modeling a computer program for the transformation of states into events, one has to abstract different levels of representation, from object recognition to language interpretation – fields of developing expertise. The program derives its knowledge from object analysis, to spatial propositions, to reference systems. On the conceptual level the system works on the notion of the path as a vector. A vector consists of a direction and distance from a known location to a new location: the agent’s existing location (departure point) and the agent’s new location (arrival point).
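The path-as-vector notion can be sketched in two dimensions as follows; this is a hypothetical illustration only, and the function names are not taken from any of the programs discussed here:

```python
import math

def path_vector(departure, arrival):
    """Return (direction in degrees, distance) of the path from the
    agent's existing location (departure point) to its new location
    (arrival point), as described in the text."""
    dx = arrival[0] - departure[0]
    dy = arrival[1] - departure[1]
    distance = math.hypot(dx, dy)
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    return direction, distance

def apply_path(departure, direction, distance):
    """Inverse operation: follow a direction/distance pair from the
    departure point to obtain the arrival point."""
    rad = math.radians(direction)
    return (departure[0] + distance * math.cos(rad),
            departure[1] + distance * math.sin(rad))
```

The two functions are inverses: a path reduced to a vector can be replayed to reach the same arrival point, which is the sense in which the vector is a complete description of the state change.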

Accessibility is a task for a navigator to perform. Physical accessibility answers the question: can one move between objects, or between other agents? Visual accessibility asks the questions: what can be seen? Which point allows the most unhindered visible range? Locations are insular or they have access to other locations; this physical accessibility is secondary to visual accessibility, since one has to know where objects are in order to access them. Accessibility, or what Gibson (1979) calls affordance, relies on the information that the various senses gather about an object. Visual accessibility is the unhindered visible range, sometimes referred to as spatial openness (Fisher-Gewirtzman, 2003) or an isovist (Batty, 2001). The space that can be seen from any vantage point is called an isovist, and the set of such spaces forms a visual field whose extent defines different isovist fields based on different geometric properties. This is part of a growing field that attempts to build our understanding of how our spatial knowledge is acquired.
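A minimal sketch of an isovist computation on a toy occupancy grid, assuming `'#'` marks a solid cell; the grid representation, sampling scheme, and all names are illustrative assumptions, not the method of any system cited in the text:

```python
def visible(grid, vantage, target):
    """Line-of-sight test: sample points along the segment from vantage
    to target and fail if any sampled cell is solid ('#')."""
    (x0, y0), (x1, y1) = vantage, target
    steps = max(abs(x1 - x0), abs(y1 - y0)) * 2
    for i in range(1, steps):
        t = i / steps
        cx, cy = round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)
        if (cx, cy) != target and grid[cy][cx] == '#':
            return False
    return True

def isovist(grid, vantage):
    """The isovist of a vantage point: the set of open cells visible
    from it (its unhindered visible range)."""
    return {(x, y)
            for y, row in enumerate(grid)
            for x, cell in enumerate(row)
            if cell != '#' and visible(grid, vantage, (x, y))}
```

Comparing the isovists of candidate vantage points then answers the visual-accessibility question of which point allows the most unhindered visible range.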


There are different reasons to examine pedestrian movement, from security to services (Kerridge, 2001), to understanding pedestrian flow (Helbing, 2001), (Haklay, 2001), and also the importance of viewpoints for wayfinding (Ramloll, 2001), (Mallot, 1999), and (Elvins, 1997). Those systems use the observer space to record views triggered by requests for object handling and route planning. User type requires the identification of the type of navigation group one belongs to. For example, tourists would have different needs than architects or shoppers in examining the environment. Every type of group requires different operations in relation to the objects used. The difference between the different kinds of spatial knowledge lies in the way spatial data is captured and organized. On the one hand, there is the geometric description of the environment; on the other, the description of the user. For example, a shop of one kind or another may be given importance over other environmental landmarks, or object features might be emphasized for architects. The environment may also impose restrictions on the users. For example, in a grid city the user might prefer to use the streets to navigate rather than use landmarks.

Thiel (1961) attempted to create a notational system from the participant’s point of view, by dividing his system into two parts: the individual traveler’s viewpoint and the taxonomy of the environment. As opposed to ‘user-centered’ (Norman, 1986), Thiel (1970) coined the term ‘user participant’. The user participant is a viewer who acts out different roles, i.e. different scripts. The information is filtered through three components: 1) Transducers – transfer information from one physical system to another; 2) Coders – a set of conditioned or learned responses; 3) Attenders – monitor the set conditions from the end-user’s point of view. By introducing this division, Thiel is able to escape the attempt to map the user’s goals and intentions onto the resulting action. That is, Thiel eliminates judgment and decision from the process. The determinant factors of behavior include both motivation and intention, exemplified by the theories of reasoned action and planned behavior (Ajzen, 1985) – theories in which intention mediates the relation between motivation (attitude) and behavior.

In order to understand what one can expect from a computer navigation program, one must go back to the reason why architects adopted the computer, or, to be exact, which computer software functions and features were offered. One of the things the representational programs presented is the ability to change an object and instantly be able to re-examine it: a reproducible perspective without the construction involved in redrawing. What architects expected was that they would be able to combine the way they work with the way the perspective was represented. The architectural expectation from the representational programs was the ability to change views instantly in a complex environment. What the user received was a turn-wheel procedure, a chain of actions that he or she had to follow in order to examine object relationships. Those procedures differ from program to program, yet all of them require action to switch from one global coordinate system to another, a process that is dependent on computational power, i.e. the time to execute the commands. Those actions have profound implications for the production of an artifact, which is manufactured through collaborative communication.

The question is not just how much control one has over the procedures, but also the flexibility in the design process. What is the most effective way to move between those established frames as one walks in the virtual environment? How far can the visual co-ordinates be extended in relation to the users so as to facilitate movement of view? This implies that the controls of movement are through spatial relations. The way objects relate to other objects, to the viewer, or to an object or part of it in those programs is through the determination of what is active and passive in a scene, the manifestations of which are object and array rotation. There are two instances of rotation in relation to the observer. One is object rotation – the user selects an object and can then rotate the object. This is an observer control that refers to strategic or goal-oriented information processing, where the individual intentionally directs attention towards relevant stimuli (Wang, 1999). The other is array rotation – the user selects the object and then moves around the object at a constant distance from it, thus changing his point of view. This is directing attention elicited by characteristics of the object, and implies automatic or mandatory information processing.
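The geometric difference between the two rotations can be sketched in two dimensions; the helper names below are hypothetical, and the sketch illustrates only the geometry, not any particular program’s implementation:

```python
import math

def rotate_about(point, center, angle_deg):
    """Rotate a 2D point counter-clockwise about a center."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

def object_rotation(obj_vertices, obj_center, angle_deg):
    """Object rotation: the object turns about its own center;
    the viewer does not move."""
    return [rotate_about(v, obj_center, angle_deg) for v in obj_vertices]

def array_rotation(viewer, obj_center, angle_deg):
    """Array rotation: the viewer orbits the object at a constant
    distance; the object itself is static."""
    return rotate_about(viewer, obj_center, angle_deg)
```

Both operations change the viewer/object relation in the same way; array rotation preserves the viewer’s distance to the object, which is why orbiting can substitute for walking around a building when only the view matters.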

Rotation is one of the most intuitive actions and the most controversial in terms of which category it belongs to. In cognitive skill acquisition it precedes the categories of object and route. Rotation is an instance where object manipulation is exposed: through rotating the object and the array one engages with the object.

The programs examined all have different approaches concerning how to resolve the need for movement. The different approaches generate different procedures for hand-eye co-ordination. All those programs use the interactive devices of a physical pointer (the mouse) and a screen cursor. The use of the hand to control the mouse and screen transfers our attention from the pointer to objects on the screen, thus captivating us.


The 3DS MAX solution is to split the viewport fields; in practice, when one views the perspective there is no update to the viewport fields. For MAX, as a manufacturer of virtual objects, the way it was set up worked well, since for the professional architect manufacturing is the ability to record an object projected geometrically. The basic components of MAX navigation are pan, array rotate, and zoom; through them one can circumnavigate an object. MAX also introduced a camera similar to Cosmo Player.

The Cosmo Player solution is to assume that directing a movie and navigation are the same problem; it takes the film director as a model. It divides the navigational tools into route and object manipulation.

Myst’s solution is what they term panoramic nodes – scene rotations which are constructed before a preview and can then be connected through hotspots. The use of the mouse and a pointer allows for greater freedom by working as a spotlight, but is the most restrictive in the sense of movement flexibility and feedback control.

The Tomb Raider solution is to immerse an actor in the scene to direct movement. This powerful idea allows the user to forgo the pointer for a puppet. Tomb Raider is the most immersive of the programs: one’s attention is on the object’s movement in three-dimensional space, and it allows the visualization of the experience of space through the action of the agent, but it allows no architectural investigation.

There are two ways in which interaction is represented in the programs examined in the previous chapter: the first is a cursor sign on the screen indicating the current position of the pointer in relation to the screen; the second is the use of an agent (see Table 4.1). The question of the normative values of immersion does not necessarily have to be settled by experiment; suffice to say that a three-dimensional action is better than a two-dimensional action. The question one has to ask is: what about the performance of different tasks? What can those programs achieve once all possible moves are accounted for? A comparison table examines the viewer’s ability to move.

Viewer’s ability…                         | 3DS MAX                    | Tomb Raider | Myst | Cosmo
To move forward and backwards             | Zoom                       | Yes         | Yes  | Zoom
To turn right and left                    | Pan                        | Yes         | Yes  | Pan
To move up and down                       | Tilt                       | No          | No   | Tilt
To rotate object                          | Yes                        | Yes         | No   | No
To rotate array according to desired axis | No (only established axis) | No          | No   | No (only proximity axis)

Table 4.1 Viewer’s ability to perform simple tasks

In Table 4.1 a list of possible actions is generated and contrasted across the different programs. From it, it is clear that one cannot rotate the array arbitrarily; also missing is the ability to move an agent through a trajectory.

4.3 Navigational Criteria

The way that humans think about physical situations contains four aspects of analysis that appear to be particularly important in solving simple qualitative spatial reasoning problems. These aspects are:

1. Representation of detail at multiple levels. People are able to store a large amount of detailed information about a complex object, yet also consider that object in terms of gross shape alone when this is necessary. They are also able to focus on a particular detail of the overall shape, while retaining a record of its context. An example of this ability is the way that an architect views a house. He knows a huge amount about its detailed shape, but is able to think, when necessary, simply in terms of its overall shape.

2. Independent reasoning in local contexts. Where the overall shape is very complex, people are able to reason about one part of it, treating it as an independent context. The architect, for example, when designing a facade, is able to work purely with that local context, abstracted from the overall shape of the building. This is the conflict between the frame and the object’s intrinsic properties.

3. Assignment of properties to groups of features. People are able to assign an abstract description to a whole set of shape features, and then make statements about the new abstraction, rather than simply about a single instance of it.

4. Qualitative size description and judgment. In many spatial reasoning situations, the absolute size of a given shape feature is not important. Its size relative to other shapes may be more important, as in the question “will this cargo fit through the door?” Alternatively, its size may be altogether irrelevant, as in the question “is that a door?” If qualitative reasoning methods are available, it is possible to discuss relative size, or size-independent questions, without numerical information.

An architectural presentation can be seen as a visual lecture on the potential of selected architectural objects, through plans, elevations, sections and perspectives that help to visualize the building. The transition of movement from one place to another in virtual environments needs to make sense to the audience. So what kind of characteristic output will give us the spatial experience in movement? The transition between the different nodes in every VR program is what gives it its character. In cinema, a shot must end in a cut. Thus the architectural presentation is an exposition of the object’s different points of view, and the architectural transition has to respond to the architectural description for it to make sense. We have an observer in an existing location in relation to world and view. The observer desires a new view of the world, and thus moves from the existing location to a new location through a channel or path. For a pointer/visual aid one has an observer who believes: if I move through this channel and find myself in this new location, I will have a new view of the world (this is spatial knowledge, using spatial-knowledge strategy, tactics, and so forth). For an agent we have an observer who believes that if he or she moves the agent, they will have a new view of the object.

According to Peacocke (1993), “We experience objects specifically as material objects. Part of what is involved in having such experiences is that perceptual representations serve as input to an intuitive mechanics, which employs the notion of force. This involvement is in turn a consequence of a general principle governing the ascription of content to mental representations, together with very general philosophical considerations about what it is for an object to be a material”. This is intuitive mechanics: an important aspect of movement between objects is the discontinuous space based on the ‘naive physics’ notions of substantiality and continuity. The substantiality constraint states that solid objects cannot pass through one another. The continuity constraint states that if an object first appears in one location and later appears at a different location, then it must have moved along a continuous path between those two locations. In other words, objects do not disappear and then later re-appear elsewhere. Substantiality and continuity are major points of decision; they determine the strategy of movement, and in a virtual built world they do not necessarily need to be applied. In short, one must have at least a simple theory of perception and action. Grasping an object employs a representation of the procedure of movement towards that object, overcoming obstacles in its way.
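The two ‘naive physics’ constraints can be written as simple checks on a candidate path of grid positions; the grid representation and function names are assumptions made for illustration:

```python
def satisfies_substantiality(path, solids):
    """Substantiality: solid objects cannot be passed through, so no
    position on the path may coincide with a solid cell."""
    return all(p not in solids for p in path)

def satisfies_continuity(path):
    """Continuity: an object may not disappear and re-appear elsewhere,
    so successive positions must be adjacent (at most one cell apart
    on each axis)."""
    return all(abs(ax - bx) <= 1 and abs(ay - by) <= 1
               for (ax, ay), (bx, by) in zip(path, path[1:]))
```

As the text notes, a virtual built world may deliberately relax either check: teleporting an agent violates continuity, and ghosting through walls violates substantiality.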

When comparing the task of turning around a building in programs like Tomb Raider and 3DS MAX, several distinctions arise. Tomb Raider has an agent moving forward, backward, right, and left, and lacks any further elements of interaction as the agent moves in continuous three-dimensional space. In 3DS MAX, on the other hand, one can rotate the object as well as rotate the array. The ability to go from agent-action to directed-action is critical if one wants to be able to augment reality. As 3DS MAX proves, it is more effective to rotate the array in order to see the rear of a building than to walk around the building. It is also more efficient in an architectural office to prepare a presentation of a building or an entire environment in this manner. A program that can do those things will improve the efficiency of the office in examining the proposed design as well as in the architectural presentations to the client.

Tomb Raider                                 |                                       | 3DS MAX                   |
What is represented                         | How one inputs things                 | What is represented       | How one inputs things
Initial condition                           | Identify movement – Pointing          | Initial condition         | Identify axis – Pointing and labeling
Turn towards the path                       | Using Right/Left coordinates          | --                        | --
Move from existing location to new location | Use Front/Back/Left/Right coordinates | Array rotation            | Use Left/Right coordinates
Turn towards the building                   | Using Right/Left coordinates          | --                        | --
Exit                                        | --                                    | Exit at new/old location? | --

Table 4.2 Difference between movements for the task of “moving to the back of the object”

4.4 Scenarios of interaction

For visual navigation one has two modes of thinking.


Bounded Agent: directing an agent to move. For example: go forward, turn left, etc. (see Chapter 5 for the equivalent linguistic device).

Directed Agent: moving the agent in relation to a cursor on the screen. For example: go towards an object (see Chapter 5 for the equivalent linguistic device).

For navigation one has two instances of encounter: the ‘opaque screen analogy’ (up and down movement of the screen pointer) and the ‘transparent screen analogy’ (forward and backward movement of the screen pointer). (See Table 4.3.)

               | Opaque screen analogy (up and down movement of screen pointer) | Transparent screen analogy (forward and backward movement of screen pointer)
Bounded Agent  | Animated being, early video games (PacMan)                     | Augmented agent (Tomb Raider®)
Directed Agent | Animated scene (Myst®)                                         | Augmented environment (Cosmo Player®, 3DS MAX®)

Table 4.3 Analysis of current systems

The four possibilities represent the distinct situations that one encounters for each navigational strategy.

Animated scene – pointing at the desired location in order to move. It is the basic attention to an object, i.e. zoom in relation to the center of projection.

Animated agent – moving the agent, with hand movement corresponding to the agent’s image: forward (hand) – up (screen), backward (hand) – down (screen). It has a topological, side, or axonometric view in relation to the observer.

Augmented environment – pointing at the desired location in order to move. It is the basic attention to an object, but with flexibility of the projecting axis of observer/object.

Augmented agent – the movement of the agent, corresponding to hand movement, forward and backward, in immersed environments. That is, the analogical relation of the projecting axis of observer/object.
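The 2×2 taxonomy of Table 4.3 can be written out as a small lookup table; the dictionary and function below are only a hypothetical illustration of the classification, with the program names taken from the table:

```python
# Axis 1: how the agent is driven (bounded vs. directed).
# Axis 2: how the screen is read (opaque vs. transparent analogy).
SYSTEMS = {
    ("bounded agent", "opaque"):       "animated being (early video games, PacMan)",
    ("bounded agent", "transparent"):  "augmented agent (Tomb Raider)",
    ("directed agent", "opaque"):      "animated scene (Myst)",
    ("directed agent", "transparent"): "augmented environment (Cosmo Player, 3DS MAX)",
}

def classify(agent_mode, screen_analogy):
    """Return the interaction style for an (agent mode, screen analogy) pair."""
    return SYSTEMS[(agent_mode, screen_analogy)]
```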


4.5 Adequacy criteria for an agent directed by the user

The arrival of animation capabilities at the desktop has provoked interest in the use of known animation techniques for computer-human communication. A common thread in the proposal and inclusion of animation capabilities in user interfaces is a strong intuition that motion, and making information objects move, should make the interface environment more credible, more “real”, and less cognitively foreign to users. Baecker (1990) discussed the potential of user interface animation to reveal process and structure (by moving the viewpoint) and introduced the following taxonomy of eight uses of animating function to make the interface more engaging and comprehensible:

• Identification associates the symbol with its function (“What is this?”);

• Transition carries the user smoothly between states (“Where did I come from and where have I gone?”);

• Choice shows possible actions (“What can I do now?”);

• Demonstration illustrates the capabilities of the tool or service (“What can I do with this?”);

• Explanation shows how to employ it (“How can I do this?”);

• Feedback provides information on process dynamics and state (“What is happening?”);

• History replays previous actions and effects (“What have I done?”); and

• Guidance suggests suitable next steps (“What should I do now?”).

Stasko (1993) adds three design guidelines drawn from the principles of traditionalanimation:

• Appropriateness dictates that the operation or process should be represented according to the user’s mental model and system entities.

• Smoothness is essential, since jerky, wildly varying animations are difficult to follow.

• Duration and control vary with the type of animation. Demonstrations of unit operations such as selection should be short (no more than a few seconds). Animation of continuous processes with a clock-time correspondence should be kept faithful to the clock time.


According to Sloman (1978), verbs of motion all seem to involve a subset of the following ideas:

1. An agent (which may or may not also change position, and may or may not change the position of other objects).

2. There is a route for the motion of each object, with a starting and a finishing location.

3. An agent may use an instrument, possibly to move an object.

4. Moving things have absolute and relative speeds.

5. If A causes B to move, A may be on the side away from which B is moving or on the side to which B is moving.

6. The movement of B may be merely initiated by A (pushing something over the edge of a table) or may be entirely due to A (throwing something, pushing it along).

7. The agent may have a purpose in moving the object.

8. There may be a previous history of movements or locations referred to (e.g. if A retrieves B).

9. There may be more than one stage in the motion (e.g. A fetches B).

10. A may do something to B which tends to produce motion, but the motion may be resisted (e.g. pushing an object which is too heavy, pulling an object with a string which stretches or breaks).

11. The agent’s movement may be supported by an object (e.g. in riding it).

What kinds of needs does the tool have to satisfy? To satisfy the need to be there and see, one must be able to control movement through a movement-instruction level. The structure of object display can be divided as:

1. Plan a course from a given location to a desired one.

2. Shift the direction of their gaze and focus attention on objects as they move across a path.

3. Move around an object, keeping track of this change in relation to the surrounding objects.

4. Turn an object in front of the viewer in order to examine it.

Concluding remarks

In this chapter we have analyzed the programs to show the flatness of interaction and the performance of movement pertaining to different tasks with different tools.


When one examines the current computer programs for architecture, one discovers that they are based on the work-desk metaphor and on exploration programs that use a pointing metaphor (agent-based). The work-desk metaphor uses the universal coordinate system, allowing the user to rotate an array. The system works well for a carefully placed object at the point of origin, thus transferring it to an object-centered model. The exploration programs, on the other hand, use agent-centered models. Yet when one navigates in an environment, an important gap opens for a system that is able to choose and manipulate the agent/object relationship: an object-centered system as opposed to agent-based navigation. In the next chapter we examine the overall relation of vision and language and the methodology by which we are to examine the interaction.


CHAPTER 5

CONCEPTUAL SYSTEM OF AGENT-BASED FRAMES & AXIS NAVIGATION

Up to now we have examined existing computer programs; now we must turn our attention to a phenomenological approach, the visually and linguistically encoded information. We will examine the conceptual theory that will allow us a full range of action in directing an agent/avatar, and the resulting elements of the path. We will also present the methodology by which we are to examine the interaction.

5.1 Language of the path

According to Jackendoff (1992), there is a single level of mental representation, a conceptual structure, at which linguistic, sensory, and motor information is compatible. Word meaning is instantiated in large part in the brain’s combinatorial organization. The full class of humanly possible concepts (or conceptual structures) is determined by the combinatorial principles of the Conceptual Well-Formedness rules. That is, the conceptual well-formedness rules characterize the space of possible conceptual states – the resources available in the brain for forming concepts. The conceptual well-formedness rules are the foundation on which learning is based. Inference rules are those relations among conceptual structures specifying how to pass from one concept to another, including the rules of inference and the principles of pragmatics and heuristics. For example, what makes the verb ‘approach’ an object different from the verb ‘reach’ an object? If you approach an object, you are going towards it, but you do not necessarily arrive there. By contrast, if you reach an object, you have been going towards it, and you have arrived there.

Spatial representation is a format or level of mental representation devoted to encoding the geometric properties of objects in the world and the relationships among them in space. We assume that spatial representation must be translatable into a form of representation specific to the motor system used to initiate and guide behavior. Here we want to equate spatial representation with Marr’s 2½-D sketch, where objects have the same properties as in the spatial representation, and not with 2D topological relationships.

The semiotic rules express the spatial relation of an object (the figure) to a region in which the other object (the reference object) is located. While the semiotic rules are factual, the syntactic rules are asymmetrical. In our case the asymmetrical description does not apply in cases of relative size. The relative size rule states that objects are considered relative to their size. For example, one can say “the agent is next to the house” but not “the house is next to the agent”. The last element is the semiotic well-formedness rule, which abstracts the geometric properties of an object according to linguistic criteria of spatial representation. The semiotic well-formedness rules apply to the notion of ‘affordance’ as defined by Gibson (1979) and also examine the route to such an object.

Figure 5.1 Spatial Semantic Model according to Jackendoff

The principle of phrase structure is a homomorphic relationship between what is said and what is there. A homomorphism connects every point from system A to system B, without, however, connecting every point of structure B to structure A. An isomorphism is a symmetrical relation; it connects every point from system A to every point of system B and vice-versa (Frey, 1969). Consequently the ‘theory of conceptual structures’ has to be linked by a different set of correspondence rules to the representations for perception and action. In addition, conceptual structures of course have to be linked by a principled set of correspondence rules to the mental representations that serve language: conceptual structures are by hypothesis the form in which the meaning of a linguistic expression must be couched internally. Therefore there must be a correspondence between the syntax and the conceptual structure. For every major phrasal constituent in the syntax of the sentence there must be a correspondence to a conceptual constituent that belongs to one of the major ontological categories. Hence the head of the tree structure is a major phrasal constituent corresponding to a function in conceptual structure.

In order to structure a sentence grammatically, a primary distinction is customarily made between the lexical categories (or parts of speech), e.g. Noun (N), Verb (V), Adjective (A), Preposition (P), and Sentence (S). The cases that describe spatial location and motion take the form NP VP PP. Within this restricted class the correspondence of syntax and semantics is transparent: the PP refers to a place or path, the subject NP refers to a thing, and the sentence as a whole refers to a situation or event in which an object and agent are located in the virtual environment. According to Herskovits (1998), the lexicon of English spatial prepositions comprises a limited number of relations. The full list can be seen in Figure 5.2.

Primarily Location                        Primarily Motion
at/on/in                                  across
upon                                      along
against                                   to/from
inside/outside                            around
within/without                            away from
near/(far from)                           toward
next to                                   up/down to
beside                                    up/down
by                                        into/out of
between                                   onto/off
beyond                                    out
opposite                                  through
amid                                      via
among                                     about
throughout                                ahead of
above/below                               past
under/over
beneath
underneath
alongside
on top/bottom of
on the top/bottom of
behind
in front/back of
left/right of
at/on/to the left/right front/back of
at/on/to the left/right side
north/east/west/south of
to the east/north/south/west of
on the east/north/south/west side of

Figure 5.2 The English spatial prepositions, taken from Herskovits 1998
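As a minimal illustration of the transparent syntax-semantics correspondence described above, one might sort a small sample of these prepositions by the conceptual category their PP denotes. The following sketch is ours, not part of Herskovits' system; the function name and the tiny lexicon are illustrative only, drawn from Figure 5.2.

```python
# Hypothetical sketch: sorting a sample of English spatial prepositions
# into the two conceptual categories a PP can refer to, PLACE or PATH,
# following the "Primarily Location" / "Primarily Motion" split above.
LOCATION_PREPS = {"at", "on", "in", "near", "beside", "behind", "between"}
MOTION_PREPS = {"to", "from", "into", "toward", "along", "around", "through"}

def classify_pp(preposition: str) -> str:
    """Return the conceptual category a PP headed by this preposition denotes."""
    if preposition in LOCATION_PREPS:
        return "PLACE"
    if preposition in MOTION_PREPS:
        return "PATH"
    raise ValueError(f"not in the sample lexicon: {preposition}")
```

For example, classify_pp("into") yields "PATH", matching the claim that the PP of a motion sentence refers to a path rather than a place.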

5.2 Elements of the path

On the phenomenal level, the event one is looking at is depicted as a sequential stream of action: one is in a state and commands a change to a new state. What we are interested in is the relationship between the agent's previous position and subsequent position relative to either the object or the agent. That is, what directions are equivalent to the relationship of the subject's transformed position? The task of formulating the necessary and sufficient truth conditions to describe an event is immensely difficult in vision as well as in linguistic commands. Many have argued that it is in principle impossible to formulate definitions that clearly delineate the subject of occurrences of an event. The problem arises in part because of the fuzzy nature of event classes. For any event there will be subjects that are clear instances of that event type, those that clearly are not instances, and those whose membership in that class of events is unclear. For examples, see Levinson (2003), Levelt (1996), and Tversky (1998).

Visual and linguistic descriptions have the ability to convey information about the path through explicit and implicit knowledge; for example, "go left" is a description in which the start point is implicit. The path can also have its end point suspended, as in "go towards the house" or "go into the house"; this is the equivalent of pointing visually. The converse path can have an arrival point, as in "go to the left of the house". Lastly, the transverse path can have an explicit start and end point, giving us the ability to determine the path's relation to an object. These four types of path are represented below (see Figures 5.3-5.6). The verb actions used in this analysis are "Go", "See" and "Turn".


Figure 5.3 Bounded agent

Bounded agent: the agent can move in any direction desired (six degrees of freedom). It is operated through the correspondence of on-screen movement to the observer's purpose, combining the agent's input and output reference systems. This is an agent-centered reference system. To move the agent, use the command GO. Utilizes the commands: Go → forward, backwards, left, right, up, and down. The agent can also turn (look) sideways. Utilizes the commands: Turn to → the left/right.

Figure 5.4 Directed agent

Directed agent: identifies the new position of an agent by directing it to an object; it uses the object-centered reference system. The go towards command differentiates between movement and end goal. The 'go towards an object' directed command has no ability to discern between different regions of space relative to the object reference system.

Figure 5.5 Converse Path

Converse path: defines a spatial path of the observer in relation to an object (object-centered). Two-point relations: this includes the evaluation of topological, angular and distance-dependent relations, which share the common characteristic that they – in their most basic form – relate two objects to each other. Uses the commands: in front of, on the left/right side, behind – with preference to the agent's role: architect, tourist, etc. It is operated by identifying an object and the new position relative to it. The go to command differentiates between movement and end goal. 'Go to an object' has the ability to discern between different regions of space relative to the object reference system.

Figure 5.6 Transverse Path

Transverse path: defines the relation of objects to the path of the observer. N-point relations: relations that cannot be reduced to a two-point problem, such as path relations [through/along/across]. The user operates it by identifying a target (object) and the object axis along which movement is to be performed. Go along has the ability to discern between different path movements relative to the object. Utilizes the commands: Go along → (path); Go around → (an object) is also part of these commands.

The elements presented here are the full range of interaction in an immersed environment, ranked from low to high knowledge function. The agent movement already exists in the other visual systems reviewed so far; the converse and transverse paths still await implementation. The new features will be examined as part of the enlarged system.
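The four command families in Figures 5.3-5.6 can be summarized as a small command model. This is a sketch of ours, not the system's actual implementation; the Agent class, method names and region labels are illustrative, chosen to mirror the commands listed in the figures.

```python
# Illustrative command model for the four path types described above
# (bounded, directed, converse, transverse). Each method returns a
# plain tuple standing in for a command the navigation system would execute.
class Agent:
    def go(self, direction):
        """Bounded: move in one of six directions (agent-centered)."""
        assert direction in {"forward", "backwards", "left", "right", "up", "down"}
        return ("GO", direction)

    def go_towards(self, obj):
        """Directed: head for an object; no region distinction (object-centered)."""
        return ("GO_TOWARDS", obj)

    def go_to(self, obj, region):
        """Converse: take up a position in a named region of the object."""
        assert region in {"in front of", "behind", "on the left side", "on the right side"}
        return ("GO_TO", obj, region)

    def go_along(self, obj_or_path):
        """Transverse: move along an axis of the target object or path."""
        return ("GO_ALONG", obj_or_path)
```

Note how only go_to carries a region argument: it is the converse path that discerns between regions of space relative to the object reference system.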

5.3 Method of examination

The method of examination is usually a compromise between task analysis and information analysis (Sutcliffe, 1997). The process starts with a requirements analysis (see Figure 5.7). The first part of the method concentrates on user context analysis, eliciting information to classify information requirements in their task context. The task requires interviews with trained users who have knowledge about the process. The information/knowledge is used in a task walkthrough with the architect and client. This method will be demonstrated in Chapter 10.1. Information analysis builds on the task model, which in our case is the possibility of action of any given spatial-preposition task. In the case of information analysis, the questions that one asks at every stage of the process are:

• What input information is required for this action?

• What output information is produced by this action?

• Is any other information required to help the user complete this action?

Figure 5.7 Method diagrammatic overview

The model is then analyzed in terms of the specific demands of the user and categories of description. The information analysis of the user when performing a task depends on information categories and information declarations. The information categories are examined in terms of verb action, and information declarations are examined in terms of syntax and semiotics.

To investigate the usability of the system one asks the following questions:

• Were there any tasks that the system was unable to represent?

• Were there any relations between intention and commands that the system was unable to represent?

• Is there any evidence that the use of the system would have saved the actors any effort?

• Would the use of the system have created substantial new work for the actors?

Concluding remarks

In this chapter we have presented the theoretical basis for conducting the research and explained the relationship between the visual and the linguistic aspects of communication. We have also shown that the method used is the analysis of information on user demands. In the next chapter we will examine the visual representation of movement through a historical review of Piaget (1948) and then examine the cognitive visual location analysis.


CHAPTER 6

VISUAL REPRESENTATION - IMPLEMENTATION

There are three systems of movement representation: the agent-centered model, the object-centered model, and the environment-centered model, which allow movement to be represented and verbally communicated in virtual environments. The object-centered model is of importance to architecture, since it relates object/landmark to the user, yet there is no exploration program that mimics those features. This chapter will examine a way of representing movement and performing spatial reasoning with an object-centered model based on panoramic views, as opposed to topological maps. We will examine the visual representation of movement through a historical review of Piaget and then examine the way cognitive visual navigation translates to topological vector analysis.

6.1 Egocentric and allocentric systems

Let us start with the very simple premise that people interact with objects and places in real or virtual environments. The way people represent and arrange objects, locations and paths is a mental construction, both in the built 'reality' and in the computer interface. Visual navigation represents one possible way to move about in space: a sequence of arrangements of objects in a location and the paths of object action. The representation of the manipulation of objects is a choice of objects transferred through attention and directed by gesture.


The term "spatial frames of reference" has been used by researchers in several different but related areas of endeavor, for example in perception, cognition, neuroscience, linguistics, and information science. Across these various areas a consensus emerges (Jackendoff, 1983; Levinson, 1996; Campbell, 1994). Fundamentally, a spatial frame of reference is a conceptual basis for determining spatial relations. This description is applicable across situations in which person-to-object and object-to-object spatial relations are represented or described.

When Piaget and Inhelder first published their book "The Child's Conception of Space" in 1948, Piaget was striving to understand the development of spatial reasoning. They reasoned as follows: "As the straight line leads to an understanding of projection, and the three dimensions of projective space lead to the idea of a bundle of lines intersected by a plane, so both these fundamental operations of projection and section become familiar enough to enable the child to give the kind of explanation seen in the examples quoted. But the concept of the straight line itself, together with the various relationships resulting from its synthesis with the original topological relations, ultimately presumes the discovery of the part played by points of view, that is, their combined co-ordination and differentiation. How is this discovery to be accounted for? To ascribe the origin and development of projective geometry to the influence of visual perception … is to overlook the fact that the purely perceptual point of view is always completely egocentric. This means that it is both unaware of itself and incomplete, distorting reality to the extent that it remains so. As against this, to discover one's own viewpoint is to relate it to other viewpoints, to distinguish it from and co-ordinate it with them. Now perception is quite unsuited to this task, for to become conscious of one's own viewpoint is really to liberate oneself from it. To do this requires a system of true mental operations, that is, operations which are reversible and capable of being linked together" (Piaget, 1960). This division between egocentric and allocentric still remains with us and divides the locational structure, within which the positions of objects and events are specified, into frames. In the allocentric view (i.e. a many-to-many relationship), all objects are related to all objects. The points which represent object location in Cartesian space relate to X and Y co-ordinates and to other objects in that space.
By contrast, in the egocentric view (i.e. a one-to-many relationship), all objects relate to a single object. The allocentric view is a mathematical construction; when spatial reasoning is introduced into the allocentric construction one is dealing with representation, visual and verbal description. Traditionally, on the visual side, 'plan' and 'axonometric' are associated with an allocentric view, while 'perspective' is associated with an egocentric view. The plan, or schema, is an analytical section, a reduction of space to 2D. It is capable of transmitting accurate distances, whereas in the egocentric view distance is a matter of judgment. Piaget (1960) experimented with the limits of children's abilities to transform spatial information. Piaget and Inhelder attempted to discover the age by which children can switch from an egocentric to an allocentric frame; that is, the egocentric frame was considered to be innate while the allocentric was considered to be acquired. They presented "perspective" problems in which children were shown a model of three colored mountains and were asked to indicate how it would look to an observer who viewed it from a different position. Until 9-10 years of age, children tended to make what were thought to be egocentric errors when shown a representation of the array, which depicted a variety of elevations. According to Huttenlocher, Piaget had shown that viewer-rotation problems are difficult when children must choose among pictures or models of an array from differing perspectives. In parallel tasks, Huttenlocher (1979) found that array-rotation problems are much easier to solve than viewer-rotation problems. Huttenlocher proposed that in solving these problems, subjects interpret the instructions literally, recoding the position of the viewer vis-à-vis the array for viewer-rotation problems and recoding the array with respect to its spatial framework for array-rotation problems. The results show that the viewer is fixed vis-à-vis the spatial context rather than vis-à-vis the array.

Campbell (1993) adds some distinctions between ways of thinking that involve an explicit or implicit dependence upon an observer and those that have no such dependence. Campbell's suggestion is that the resultant system is egocentric only if this significance can be given solely by reference to the subject's own capacities for perception and action, in what he calls causally indexical terms. The causal significance – the judgment made about objects standing in various spatial relations – is essentially given in terms of its consequences for the subject's perception or action: causally indexical. It will be allocentric if, and only if, this significance can be given without appeal to the subject's perceptual and active abilities – causally non-indexical – in terms that give no single object or person a privileged position and treat all the world's objects (of a given kind) as on a par with respect to their physical interaction.


6.2 Panoramic communicative view

Spatial reasoning is engagement in the representation of motion over the geometric properties of points in space. The most elementary properties are prepositions, which specify the relation between objects in space; I call such prepositions spatial prepositions. Examples of spatial prepositions are in – city(x) or next to – house(x). Typically, the set of points where a predicate is true forms a single compact region of space, and spatial reasoning amounts to detecting intersection relations among combinations of regions, called environments. Spatial reasoning in our case is the representation of this reasoning in order to communicate.

To understand the relationship between an observer and the relatum, one must understand the communication relationship. It is a representation which makes only as many distinctions as are necessary and sufficient for the communication of the spatial relations of an object to the observer. The representation for ordering information that restricts the location of a point with respect to some reference points is given by the panorama, defined as a continuous image of the front and sideways view while on the move, from which features are extracted. The examination of how one describes what one sees while moving is an expression of the speaker's underlying cognitive states. In order to achieve spatial cognition, i.e. the ability to represent the environment and act upon this representation to form a decision – where to go and what to see – one has to be able to distinguish between an object in a scene and its background. The English spatial predicate marks a location, an operation that designates it as one to be remembered. The spatial predicate also marks the referent and relatum position in space and arranges its parts to be accessed. The English spatial predicate takes the form of a predicate, a referent and a relatum: (1) Referent – the object; (2) Relatum – the reference object in the background; (3) Predicate – the spatial relationship between the referent and the relatum. This distinction was first defined in Fillmore's (1968) "Case Grammar", and later by Talmy (1983).
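The referent-relatum-predicate triple can be written down directly as a small record type. This is our own illustrative sketch, not a structure from the system described; the class and field names are ours.

```python
# Hypothetical representation of the spatial-predicate triple, e.g.
# "the agent (referent) is next to (predicate) the house (relatum)".
from dataclasses import dataclass

@dataclass
class SpatialPredicate:
    referent: str   # the located object
    relatum: str    # the reference object in the background
    relation: str   # the spatial preposition linking the two

example = SpatialPredicate(referent="agent", relatum="house", relation="next to")
```

The asymmetry of the syntactic rules noted earlier is visible here: swapping referent and relatum ("the house is next to the agent") produces an ill-formed description even though the geometry is unchanged.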

Orientation information locates a point object at some position on the semi-straight line from the origin of the Cartesian co-ordinates at a given angle. Orientation information can be given by polar co-ordinates: the orientation is given by a vector – an angle – and the exact position on the straight line of orientation by a distance, both measured from the origin of the Cartesian co-ordinates. Three spatial point objects are involved in the definition of orientation relationships in the orientation model: 'a' and 'b', which define the reference system, and 'c', the object whose orientation is provided with respect to the reference system.

In investigating the hippocampus (the area of the brain thought to contain the encoding of spatial relationships), O'Keefe (1990, 1991) proposed the slope-centroid model as the way in which animals successfully navigate. This model represents the basic relations of frames by always having a reference which lies outside the simple Euclidean metric relation of a trajectory vector between the current location and the desired location. The model contains two stages in an animal's construction of a map of its environment. In the first stage, the animal identifies a notional point in its environment, the centroid – notional in the sense in which the South Pole or the Equator are notional: there may be no distinctive physical feature at that place. It is a fixed point, in that it does not move with the animal. In the second stage, the animal also identifies a gradient for its environment, a way of giving compass directions. This is the slope of the environment; it functions like the direction east-west. The direction is fixed no matter how one moves around, and one can partially define which way one is going by saying what angle one makes with it. As in almost all models of mapping, we take it that the animal is constructing a two-dimensional map of its environment; the third dimension is not mapped. Once the animal has completed these two stages, it can construct a map of its environment by recording the vector from the centroid to each of its targets, using the slope to define direction. Assuming that the animal has done this and now wants to know how to get to a particular target, what it must do is find the vector from itself to the centroid. Once the animal has the vector from itself to the centroid and the vector from the centroid to the target, it can find the vector from itself directly to the target.

According to O'Keefe (1990), at any point in an environment, an animal's location and direction are given by a vector to the centroid whose length is the distance to the centroid and whose angle is the deviation from the slope (360 - γ), as in Figure 6.1. Other places (A and B) are similarly represented. This dichotomy has its roots in egocentric and allocentric frames of reference and subsequent attempts by O'Keefe (1993) to define the possibility of navigation without allocentric thinking. For people, "This is done by enhancing landmarks which permit the use of object reference system" (Hazen, 1980, p. 14).


Figure 6.1 Use of the movement translation vector (T). (Taken from O'Keefe 1990; 1991)
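The vector bookkeeping of the slope-centroid model reduces to simple vector addition: the self-to-target vector is the self-to-centroid vector plus the stored centroid-to-target vector. The sketch below is ours; positions are 2-D coordinates in an arbitrary map frame whose x-axis plays the role of the slope, and the function names are illustrative.

```python
# Sketch of the vector composition in O'Keefe's slope-centroid model.
def vector(p, q):
    """Vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def vector_to_target(self_pos, centroid, target_vec_from_centroid):
    """Compose self->centroid with the stored centroid->target vector
    to obtain the direct self->target translation vector (T in Figure 6.1)."""
    sc = vector(self_pos, centroid)
    return (sc[0] + target_vec_from_centroid[0],
            sc[1] + target_vec_from_centroid[1])
```

Because the centroid-to-target vectors are recorded once, relative to the fixed slope, the animal only ever needs to re-estimate the single vector from itself to the centroid.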

The co-ordinate system centered on the viewer seems to be based generally on the planes through the human body, giving us an up/down, back/front and left/right set of half-lines. Such a system of co-ordinates can be thought of as centered on the main axis of the body and anchored by one of the body parts. Although the position of the body of the viewer may be one criterion for anchoring the co-ordinates, the direction of gaze may be another, and there is no doubt that relative systems are closely hooked into visual criteria. An axis is a locus with respect to which spatial position is defined. Landau and Jackendoff distinguish three axes; three types of axes are required to account for linguistic terms describing aspects of an object's orientation. According to Jackendoff, "The generating axis is the object's principal axis as described by Marr (1982). In the case of a human, this axis is vertical. The orienting axes are secondary and orthogonal to the generating axis and to each other (e.g., corresponding to the front/back and side/side axes). The directed axes differentiate between the two ends of each axis, marking top vs. bottom or front vs. back." (Landau, 1993)

Figure 6.2 Three axes - object to parts construction

In the TOUR model (Kuipers, 1978), the simulated robot performs two types of actions: TURN and GO-TO. The purpose of the procedural behavior is to represent a description of sensorimotor experience sufficient to allow the traveler to follow a previously experienced route despite incomplete sensed information. It is stored as a sensorimotor schema of the form <goal, situation, action, result>. The "you are here" pointer describes the current position of the robot by determining its place and orientation. The topological map is constructed when there are enough sensorimotor schemas. The topological map consists of a topological network of places (points), paths (curves), regions (areas), and topological relationships among them (connectivity, order and containment). A place consists of an orientation reference frame, a set of paths intersecting at the place together with the angles of the paths relative to the orientation reference frame, and the distances and directions of other places which are visible from this place. A path consists of a partial ordering of places on the path, and regions bounded by the path on the left and the right. The orientation reference frame is described in terms of its orientation relative to other frames. A district consists of edges and paths.
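The TOUR-style representation described above can be sketched as two small record types: the sensorimotor schema <goal, situation, action, result> and a place with its orientation reference frame. This is our own illustration of the data shapes, not Kuipers' implementation; the class and field names are ours.

```python
# Hypothetical data shapes for a TOUR-style representation:
# a sensorimotor schema and a topological place.
from dataclasses import dataclass, field

@dataclass
class Schema:
    goal: str        # what the traveler is trying to reach
    situation: str   # the recognized place or view
    action: str      # "TURN" or "GO-TO"
    result: str      # the situation the action leads to

@dataclass
class Place:
    name: str
    # path name -> angle of the path in this place's orientation frame
    paths: dict = field(default_factory=dict)
```

A route is then simply a chain of schemas in which each schema's result matches the next schema's situation; when enough such schemas accumulate, the topological map can be assembled from the places they mention.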

According to Escrig (1998), there are four different types of inference rules defined to manipulate knowledge embedded in this representation: (1) rules which compare the "you are here" pointer with the topological description of the environment; (2) rules for maintaining the current orientation with respect to the current coordinate frame; (3) rules which detect special structural features; and (4) rules which solve route-finding and relative-position problems.

The approach to pointing is to define a path from a to b with the position of the observer c; thus, in the panoramic model, one can point to an object and locate a new perspective relative to it (see Figure 6.3).

The basic knowledge represented in Freksa and Zimmermann's approach is the orientation of an object, c, with respect to the reference system defined by two points, a and b – that is, c with respect to ab. The vector from a to b and the perpendicular line through b define the coarse reference system (Figure 6.3 a), which divides space into nine qualitative regions: straight-front (sf), right-front (rf), right (r), right-back-coarse (rbc), straight-back-coarse (sbc), left-back-coarse (lbc), left (l), left-front (lf), and identical-front (idf). The vector from a to b and the two perpendicular lines through a and b define the fine reference system (Figure 6.3 b), which divides space into 15 qualitative regions: straight-front (sf), right-front (rf), right (r), right-middle (rm), identical-back-right (ibr), back-right (br), straight-back (sb), identical-back (ib), straight-middle (sm), identical-front (idf), left-front (lf), left (l), left-middle (lm), identical-back-left (ibl), and back-left (bl).

Figure 6.3 a) The coarse reference system and b) the fine reference system
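The coarse reference system can be computed with two sign tests: a cross product decides left/right of the line ab, and a dot product decides front/back of the perpendicular through b. The sketch below is ours, and the assignment of the boundary cases follows our reading of Figure 6.3 a; it is not Freksa and Zimmermann's code.

```python
# Illustrative classifier for the coarse reference system: the qualitative
# region of point c with respect to the oriented pair (a, b).
def coarse_region(a, b, c):
    abx, aby = b[0] - a[0], b[1] - a[1]   # vector a -> b
    bcx, bcy = c[0] - b[0], c[1] - b[1]   # vector b -> c
    if bcx == 0 and bcy == 0:
        return "idf"                       # c coincides with b
    cross = abx * bcy - aby * bcx          # > 0: c left of ab; < 0: right
    dot = abx * bcx + aby * bcy            # > 0: c beyond the perpendicular at b
    if dot > 0:
        return "lf" if cross > 0 else "rf" if cross < 0 else "sf"
    if dot < 0:
        return "lbc" if cross > 0 else "rbc" if cross < 0 else "sbc"
    return "l" if cross > 0 else "r"       # c on the perpendicular through b
```

For instance, with a = (0, 0) and b = (1, 0), a point straight ahead such as (2, 0) falls in sf, while (2, 1) falls in lf.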

Given the original relationship c with respect to ab, five more relationships can be directly obtained (Freksa and Zimmermann) by permutation of the three objects a, b and c. The number of permutations of three elements is 3! = 6, which gives the following five operations plus the original relationship:

• c with respect to ab is the original relationship.

• c with respect to ba is defined as the inverse operation. It answers the question: "What would the spatial orientation of c be if I were to walk back from b to a?"

• a with respect to bc is the homing operation. It answers the question: "Where is start point a if I continue my way by walking from b to c?"

• a with respect to cb is the homing-inverse operation.

• b with respect to ac is the shortcut operation. It allows us to specify the relative position of objects that are not on our path but to the side of it.

• b with respect to ca is the shortcut-inverse operation.

These observer instructions are examined through basic left, right, below, and above, aimed at target acquisition in a three-dimensional space. A table with iconic representations is shown in Figure 6.4. One should note that the shortcut is the only relationship that refers to the object; we will come back to this point as we progress in our analysis.

Figure 6.4 Iconic representation of the relationship c with respect to ab and the result of applying the five operations to the original relationship (adapted from Freksa and Zimmermann).

In our case we can redefine the following:

Visual accessibility: What can be seen? Which point allows the most unhindered visible range for the agent? Accessibility, or what Gibson (1979) calls affordance, relies on information the various senses gather about an object.

Object context: How does an object look from a different angle? (Go to the front/left side of the building.)

Topological relations: What is the relation to other objects not necessarily seen? (Am I to the left of the church?)

• a with respect to bc is Visual accessibility

• b with respect to ac is Topological relation

• c with respect to ab is Object context
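The redefinition above amounts to a three-entry lookup from Freksa-Zimmermann queries to navigation questions. The table below is an illustrative restatement of the mapping in the three bullets; the structure and labels are ours.

```python
# The mapping above, as a lookup table: (queried object, reference pair)
# -> the navigation question it answers in our redefinition.
QUERY_TO_OPERATION = {
    ("a", "bc"): "visual accessibility",  # what can be seen along the route?
    ("b", "ac"): "topological relation",  # where is b relative to my path?
    ("c", "ab"): "object context",        # how does c look from this approach?
}
```

Keeping the mapping explicit makes it easy to see that only the shortcut-style query ("b", "ac") refers to an object off the path, echoing the earlier remark about the shortcut operation.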


6.3 Possible interaction with objects and routes

The aim of the tool is to enhance the user's interaction with virtual environments and to make navigation a natural, directed experience. Thus, the proposed system can enhance interaction with the user by utilizing an object-centered frame of reference. By pointing at an object with visual feedback, one can indicate where one wants to be relative to that object, employing the topological map to convey information through the cursor keys. This is the way computer games allow us to play football: we indicate to which player we want to pass the ball and the direction in which the ball should go relative to the indicated player. If you replace the ball with a camera and the player with a building, you have a system that functions as object-centered with visual feedback.

Reaching for a nearby object requires pre-existing knowledge – knowledge of what properties define and delimit objects. According to Berthoz, movement and the representation of movement, the 'body' and 'thought', go hand in hand. There is no apparent command structure of 'thought' then 'body'; on the contrary, the body already anticipates our next action. When navigating in built environments, one employs a different strategy of grasping: that of approach and position (look) and that of reach (interaction). According to Merleau-Ponty (1962), "Concrete movement is centripetal whereas abstract movement is centrifugal." Or in the words of Berthoz, "The brain is capable of recognizing movements of body segments, selecting and modulating information supplied by the receptors at the source. But proprioceptive receptors can themselves only detect relative movement of body masses. They are inadequate for complex movement - locomotion, running, jumping - where the brain must recognize absolute movements of head and body in space." (Berthoz, 2000, p. 32) Thus, the higher functions of movement have a conceptual linguistic component in them that existing direct manipulation cannot provide. The grasping frames relate to objects in two ways: Manipulation mode – an intention that conveys an impulse toward the object, as in the case of object use. Observational mode – an intention conducted away from an object, as in the case of object observation.

To design the architectural object, the architect works mostly with canonical views such as sections, plans, and elevations. The perspectives that architects use enhance the design by allowing an observational comparison of at least two sides of an object. Thus the canonical extension of axes most commonly used is divided into eight qualitative regions: straight-front, right-front, right, right-back, straight-back, left-back, left, and left-front of the reference system. The canonical architectural reference-system representation (see Figure 6.8) avoids the granularity of choice among users by setting a constraint that all relative distances to the agent-object shall remain constant when the viewpoint changes, unless this contradicts the substantiality constraint, which states that solid objects cannot pass through one another.

Figure 6.8 The architectural reference systems, incorporating the visible area from a generating location (or convergence location of the optic rays)

In the city there is the added distinction between observation and movement. The city form dictates a physical limitation on where one can view the city from. For simplification, interaction in the built environment is divided into sixteen canonical points of view from which to approach an object (see Figure 6.6). The eight points of view of Cases 2 and 3 take on the nature of urban bound movement interaction, with the agent approaching the side of an object from a road, where only one side of the object is visible. This view is often ignored by architects when designing, yet its effect on architecture, as Venturi (1977) shows, is critical for the modern architect. As yet it has not become part of the architect's canonical language.

Case 1 Case 2 Case 3 Case 4

Figure 6.6 Sixteen different cases depending on the relative orientation of the point “a” with respect to the extended object “b”, which are grouped into four cases due to symmetries. Taken from Escrig (1998).
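A minimal sketch of how such qualitative orientations can be computed. The clockwise ordering and the 45-degree sector boundaries below are our own assumptions; only the eight region names follow the canonical axes listed earlier:

```python
import math

REGIONS = ["straight-front", "right-front", "right", "right-back",
           "straight-back", "left-back", "left", "left-front"]

def qualitative_region(obj, front_angle_rad, point):
    """Classify a point of view into one of the eight qualitative regions
    around an object whose front axis points along front_angle_rad."""
    bearing = math.atan2(point[1] - obj[1], point[0] - obj[0])
    # Clockwise angle from the object's front axis to the point.
    relative = (front_angle_rad - bearing) % (2 * math.pi)
    # Offset by half a sector (22.5 degrees) so regions are centred on the axes.
    sector = int(((relative + math.pi / 8) % (2 * math.pi)) // (math.pi / 4))
    return REGIONS[sector]
```
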

The approach to pointing is to define a path from a to b with the position of the observer c; thus, in the panoramic model, one can point to an object and locate a new perspective relative to it (see Figure 6.3). The notion of a reference system can be viewed as a conceptual neighborhood with topological and linear relations (see Figure 6.5). Thus one can walk around to the back of the building or transport to the new position at the back of the building. There are two possibilities for manipulating an object: the first (Figure 6.5 a) is a topological conceptual system; the other (Figure 6.5 b) is a linear route. What is needed is a system that allows the user to choose between the different rotations of object and array system according to the adequacy criteria.


Figure 6.5 Topological and linear view of the conceptual neighborhood (taken from Escrig, 1998)
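The distinction between walking around a building (traversing neighboring positions along a linear route) and transporting (jumping directly within the topological neighborhood) can be sketched on a ring of eight viewing positions. This is our own illustrative encoding, not Escrig's formalism:

```python
POSITIONS = ["front", "right-front", "right", "right-back",
             "back", "left-back", "left", "left-front"]

def walk(start, goal):
    """Linear route: the shortest chain of neighboring positions on the ring."""
    i, j = POSITIONS.index(start), POSITIONS.index(goal)
    n = len(POSITIONS)
    cw, ccw = (j - i) % n, (i - j) % n  # clockwise vs counterclockwise steps
    step = 1 if cw <= ccw else -1
    return [POSITIONS[(i + step * k) % n] for k in range(min(cw, ccw) + 1)]

def transport(start, goal):
    """Topological jump: move directly to the goal position."""
    return [start, goal] if start != goal else [start]
```

A fuller system would let the user choose between the two modes according to the adequacy criteria mentioned above; here the two functions simply make the difference in traversal explicit.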

Concluding remarks

This chapter examined a way of representing movement and performing spatial reasoning with an object-centered model, based on panoramic views. We examined the visual representation of movement through the historical review of Piaget, and examined the way cognitive visual navigation translates into topological vector analysis. In this chapter we have presented a visual system for object-centered interaction. The system works well as a module but lacks any competing attention devices; thus the user cannot switch between different frames of reference. We now turn our thoughts to the linguistic model, which has an attention mechanism built into the frames.


CHAPTER 7

LANGUAGE-BASED REPRESENTATION - IMPLEMENTATION

In this chapter we present the way in which language performs the task of object manipulation through a command/description. We examine the mechanism of attention and the divisions that language creates. The use of frames of reference will be introduced, and a case of a basic linguistic navigational program will be brought in to demonstrate some of the conceptual difficulties encountered in the early development of such programs.

7.1 Directing the path

Language has a different set of frames than the visual axis; in fact, language has more axes of attention. A frame of reference is a set of axes with respect to which spatial position is defined. To draw attention to an object, one must correlate the sharing of attention with the desired location. To represent the relationship, one has to refer to visual and linguistic aids for pointing and labeling. Confirming attention is an engagement to form attention. When interacting, pointing is the token action to identify joint engagement. When interacting, labeling is the matching of the object with the signified. Labeling is imperative pointing, a declarative pointing using a referential language. When interacting, directing is an explicit instruction of how to proceed from one place to another via a route.

Directing involves an explicit declarative set of instructions, in our case in the English language. In English, a preposition is a function word that typically combines with a noun to form a phrase that relates one object to another. According to Jackendoff (1987), the structure of spatial prepositional phrases in


English consists of two notions: the first is place, and the other is path. This is a reference as projection in the sense of a “conceptual structure” of mental information, including both “propositional” and “diagrammatic” information (Johnson-Laird, 1999). Prepositional phrases in English explicitly mention a reference object as the object of the preposition, as in “on the table”, “under the counter”, or “in the can”. The path is an ordered sequence of places and the translation vectors between them. Paths can be identified by their end places or by a distinct name. On the other hand, places along the path can be identified and associated with the path. A path may be marked by continuous features such as a trail or road, but need not be.

The internal structure of a path often consists of a path-function and a reference object, as expressed by phrases like ‘towards the house’, ‘around the house’, and ‘to the house’. Alternatively, the argument of the path-function may be a reference place (Jackendoff, 1983). This possibility appears in phrases like “from under the table”, where “from” expresses the path-function and “under the table” expresses the reference place. Prepositions such as “into” and “onto” express both a path-function and the place-function of the reference place, for example, “The man ran into the shelter”. Many prepositions in English, such as “over”, “under”, “on”, “in”, “above”, and “between”, are ambiguous between a pure place-function and a path-function, for example, “the man is under the shelter” and “the man ran under the shelter”. One of the ways to view an architectural building is to travel along a route. A route is a sequence of procedures between two nodes requiring decision-making. The path is then the course of action or conduct taken between two nodes. One can express this conceptual possibility formally in terms of a phrase-structure-like rule for the functional composition of a conceptual structure.

[Place X] -> [Place PLACE-FUNCTION ([Thing Y])]

[PLACE] projects into a point or region, but within the structure of an event or state, a [PLACE] is normally occupied by a [THING].

As noted above, the internal structure of [PATH] often consists of a path-function and either a reference object (‘towards the mountain’, ‘around the tree’, ‘to the floor’) or a reference place (‘from under the table’). The following examples show the resulting conceptual structures.


The mouse ran from under the table.

[Path FROM ([Place UNDER ([thing TABLE])])]

The mouse ran into the room.

[Path TO ([Place IN ([thing ROOM])])]
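The nested bracket notation above can be sketched as data. The encoding below is our own hypothetical rendering of these conceptual structures as Python objects, not Jackendoff's formalism; it shows how a place-function and a path-function compose:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Thing:
    name: str

@dataclass(frozen=True)
class Place:
    function: str          # a place-function, e.g. "UNDER", "IN"
    reference: Thing

@dataclass(frozen=True)
class Path:
    function: str          # a path-function, e.g. "FROM", "TO"
    reference: Union[Place, Thing]

# "The mouse ran from under the table."
from_under_table = Path("FROM", Place("UNDER", Thing("TABLE")))

# "The mouse ran into the room."  ("into" = path TO + place IN)
into_room = Path("TO", Place("IN", Thing("ROOM")))
```

The composition mirrors the phrase-structure rule: a [PATH] takes either a [THING] or a [PLACE] as its argument, and a [PLACE] takes a [THING].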

Language also makes use of different frames of reference for spatial description; they are used to identify places in directing our actions, in deciding where to move. The vocabulary used in the description of spaces in such situations is what Jackendoff (1993) calls “directions”:

Vertical: over, above, under, below, beneath.

Horizontal: side to side, beside, by, alongside, next to.

Front to back: in front of, ahead of, in the back of, behind, beyond.

“Two factors affected task difficulty. The first factor was whether the problem was described as a rotation of the array or of the viewer. The second was the type of question. The effect of these two factors interacted: with appearance questions, array-rotation tasks were easy and viewer-rotation tasks were difficult; with item questions, viewer-rotation tasks were easy and array-rotation tasks were difficult.” (Huttenlocher, 1979) Two principles are involved in how people treat these problems. First, arrays are coded item by item in relation to an outside framework. Second, transformation instructions are interpreted literally as involving movement of the viewer or array. For array rotation this entails recoding the array with respect to the framework; for viewer rotation it entails recoding the viewer’s position with respect to both the array and its framework.
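The two recodings can be sketched directly. In this illustration (our own, with assumed 2-D coordinates), array rotation recodes every item with respect to the outside framework, while viewer rotation recodes only the viewer's position and heading and leaves the array fixed:

```python
import math

def rotate_point(p, angle_rad, center=(0.0, 0.0)):
    """Rotate a point about a center in the framework's coordinates."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (center[0] + dx * c - dy * s, center[1] + dx * s + dy * c)

def rotate_array(items, angle_rad, center=(0.0, 0.0)):
    """Array rotation: recode every item with respect to the framework."""
    return {name: rotate_point(pos, angle_rad, center)
            for name, pos in items.items()}

def rotate_viewer(viewer_pos, viewer_heading, angle_rad, center=(0.0, 0.0)):
    """Viewer rotation: recode the viewer's position and heading only;
    the array stays fixed in the framework."""
    return rotate_point(viewer_pos, angle_rad, center), viewer_heading + angle_rad
```
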

The extension of axes, such as above, below, next to, in front of, behind, alongside, left of, and right of, is used to pick out a region determined by extending the reference object's axes out into the surrounding space. For instance, “in front of X” denotes a region of space in proximity to the projection of X’s front-back axis beyond the boundary of X in the frontward direction (Johnson-Laird, 1983; Landau and Jackendoff, 1993). By contrast, “inside X” makes reference only to the region subtended by X, not to any of its axes; “near X” denotes a region in proximity to X in any direction. Note that “many of the ‘axial prepositions’ are morphologically related to nouns that denote axial parts.” (Jackendoff, 1999) For example, “Go to the front of the building”: [event GO [thing AGENT] [preposition FRONT [thing BUILDING]]]
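The contrast between an axis-projected region (“in front of”), a subtended region (“inside”), and all-direction proximity (“near”) can be sketched for an axis-aligned rectangular object whose front is assumed to face +x. The geometry below is our own simplification:

```python
import math

def in_front_of(obj, point, reach=5.0):
    """Region projected from the object's front face along its front-back axis."""
    xmin, ymin, xmax, ymax = obj
    return xmax < point[0] <= xmax + reach and ymin <= point[1] <= ymax

def inside(obj, point):
    """Region subtended by the object itself; no axis is involved."""
    xmin, ymin, xmax, ymax = obj
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax

def near(obj, point, radius=2.0):
    """Proximity in any direction: distance to the object's boundary."""
    xmin, ymin, xmax, ymax = obj
    dx = max(xmin - point[0], 0.0, point[0] - xmax)
    dy = max(ymin - point[1], 0.0, point[1] - ymax)
    return math.hypot(dx, dy) <= radius
```
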


At this point one notices the disparity between representing and acting, the difference between ‘bounded agent’ and ‘directed agent’, which are low-level commands, alongside two additional high-level commands. The first is the ‘converse path’, which is parallel to sight, where one indicates the target and can plot a course to the target. The second high function is the ‘transverse path’, which is perpendicular to sight, where one also needs to define the start and end point of the path, shown in prepositions like ‘through’, ‘around’, and ‘along’. The route-bound agent is transverse, where the agent turns in relation to a building, while the object-bound agent is converse when it refers to an independent path in relation to the architectural object. Thus the conceptual linguistic model employs two basic operations in the modeling of an agent’s movement: three-point interaction and N-point interaction (see Figure 7.1).

Converse Transverse

Figure 7.1 Symbol inequalities for a) the parallel and b) the perpendicular lines to the South-North straight line. Taken from Escrig (1998).

7.2 How views change; axes and frames of reference

With the introduction of an agent into the environment, a situation is created where the observer frame may be projected onto the object from a real or hypothetical observer. “This frame establishes the front of the object as the side facing the observer. We might call this the ‘orientation mirroring observer frame’. Alternatively, the front of the object is the side facing the same way as the observer's front. We might call this the ‘orientation-preserving observer frame’” (Jackendoff, 1999, p. 17).

According to Levinson, “To describe where something (let us dub it the ‘figure’) is with respect to something else (let us call it the ‘ground’) we need some way of specifying angles on the horizontal. In English we achieve this either by utilizing features or axes of the ground or by utilizing angles derived from the viewer’s body coordinates… The notion ‘frame of reference’… can be thought of as labeling distinct kinds of coordinate systems” (Levinson, 1996, p. 110). The linguistic literature usually invokes three frames of reference: an intrinsic or object-centered frame, a deictic or observer-centered frame, and an absolute frame (see Figure 7.2).


The frames of reference presuppose a ‘view-point’, and a figure and ground distinct from it, thus offering a triangulation of three points and utilizing coordinates fixed on the viewer to assign directions to a desired location.

Intrinsic frames of reference – the position-defining loci are external to the person in question. This involves taking the inherent object-centered reference system to guide our attention, and uses an allocentric frame (Jackendoff, 1999).

Relative frames of reference (deictic) – those that define a spatial position in relation to loci of the body, that is, agent-centered. The relative frame of reference is used to identify objects’ directions; this involves imposing our egocentric frame on objects (Jackendoff, 1999).

Absolute frames of reference – defining the position in absolute terms, such as North, South, or polar coordinates. Absolute frames are environment-centered and use either Cartesian or polar coordinates (Jackendoff, 1999).

Figure 7.2 Three linguistic frames of reference
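The three frames can be sketched as three ways of anchoring the same figure-ground direction. The formalization below is our own illustration (angles in radians, environment coordinates with 0 = east), not a claim about any of the cited authors' models:

```python
import math

def bearing(frm, to):
    """Direction from one point to another in environment coordinates."""
    return math.atan2(to[1] - frm[1], to[0] - frm[0])

def absolute_direction(ground, figure):
    """Absolute frame: bearing anchored in the environment itself."""
    return bearing(ground, figure)

def intrinsic_direction(ground, ground_facing, figure):
    """Intrinsic frame: angle relative to the ground object's own front axis."""
    return (bearing(ground, figure) - ground_facing) % (2 * math.pi)

def relative_direction(viewer, ground, figure):
    """Relative (deictic) frame: angle relative to the viewer's line of
    sight toward the ground object."""
    return (bearing(ground, figure) - bearing(viewer, ground)) % (2 * math.pi)
```

The same figure-ground pair yields three different numbers: one anchored in the environment, one in the ground object, and one in the viewer, which is exactly why the frames cannot freely translate into one another without the anchoring information.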

Let us consider an example: “the gate is in front of the house”. For a manufactured artifact, the way we access or interface with the object determines its front, anchored to an already-made system of opposites: front/back, sides, and so forth. This would also be the case with any centralized symmetrical building, but not with a cylindrical building. In fact, the situation is more complex. The sentence “the gate is to the left of the house” can sometimes employ a relative frame of reference that depends on knowledge of the viewer's location. This entails that the gate is between the viewer and the house, because the primary coordinates on the viewer have been rotated in the mapping onto the ground object, so that the ground object has a “left” before which the gate is situated. “Viewing a frame of reference as a way of determining the axes of an object, it is possible to distinguish at least eight different available frames of reference” (for further details see Jackendoff, 1996, p. 15; many of these appear as special cases in Miller and Johnson-Laird 1976, which, in turn, cites Bierwisch 1967, Teller 1969, and Fillmore 1971, among others). Despite extensive interest in the role of frames of reference in spatial representation, there is little consensus regarding the cognitive effort associated with various reference systems and the cognitive costs (if any) involved in


switching from one frame of reference to another. An experiment was conducted by Allen (2001) with regard to these issues, in which accuracy and response latency data were collected in a task in which observers verified the direction of turns made by a model car in a mock city in terms of four different spatial frames of reference: fixed observer (relative-egocentric), fixed environmental object (intrinsic-fixed), mobile object (intrinsic-mobile), and cardinal directions (absolute-global). The results showed that frames of reference could be differentiated on the basis of response accuracy and latency. In addition, no cognitive costs were observed in terms of accuracy or latency when the frames of reference switched between fixed-observer and global frames of reference, or between mobile object and fixed environmental object frames of reference. Instead, a distinct performance advantage was observed when frames of reference were changed (Allen, 2001).

When comparing the frames of reference, a few conclusions can be drawn:

1. Frames of reference cannot freely “translate” into one another.

2. There is common ground between visual axes and linguistic frames of reference that allows them to converge on an object, and allows one to talk about what one sees.

3. Language is most adaptive to directing, and therefore the other modalities should follow.

7.3 Flat-shooters – Two-dimensional – Amazon’s Sonja

Amazon (Chapman, 1991) is a computer program that recognizes simple commands from a user. We introduce this case of a basic linguistic navigational program to demonstrate some of the conceptual difficulties encountered in the early development of such programs (see Appendix I). As a player views the environment, his vision is limited, and so is the built environment that is exposed. The program/game is designed to search the represented space and recognize objects viewed by the user. In order to recognize the occurrence of events in the world, we need some way of representing the transitive properties of occurrences of those events. The command can be said to describe the relationship between objects and agent. Similar attempts include those of Schank (1973), Jackendoff (1983, 1990), and Pinker (1995). These prior efforts attempted to ground spatial expressions in perceptual input. However, they did not offer a procedure for


determining whether a particular perceived event meets the transitive properties specified by a particular spatial expression.

The alignment of the agent’s view of a place with the analysis of control through English commands is exemplified by the system of commands in Sonja (Chapman, 1991). Sonja uses English instructions in the course of visually guided activity to play a first-generation video action game, specifically one called Amazon. The game features an agent - an Amazon warrior - whose goal is to find an amulet and kill a ghost in two-dimensional space. According to Chapman, the use of advice to the Amazon warrior requires that the computer interpret the instructions. Interpretation can in general require unbounded types and amounts of work. Sonja’s interpretation is largely perceptual; it understands instructions by relating them to the current Amazon-playing situation. When Sonja is given an instruction, it registers the entities the instruction refers to and uses the instruction to choose between courses of action that themselves make sense in the current situation. An instruction can fail to make sense if it refers to entities that are not present in the situation in which it is given, or if the activity it recommends is implausible in its own right. Some instructions have variant forms. For example, pick-up-the-goody and use-a-potion are chained together when the instruction ‘Get the potion and set it off’ is given.

Instruction | Instruction buffer(s) set | Field
Get the monster/ghost/demon | kill-the-monster | type
Don't bother with that guy | don’t-kill-the-monster |
Head down those stairs | go-down-the-stairwell |
Don't go down yet | don’t-go-down-the-stairwell |
Get the bones | kill-the-bones |
Ignore the bones for now | don’t-kill-the-bones |
Get the goody | pick-up-the-goody, register-the-goody |
Don't pick up the goody | don’t-pick-up-the-goody, register-the-goody |
Head direction | suggested-go | direction
Don't go direction | suggested-not-go | direction
Go around to the left/right | go-around | direction
Go around the top/bottom | go-around | direction
Go on in | go-in |


OK, head out now | go-out |
Go on in and down the stairs | in-the-room, go-down-the-stairwell |
Go on in and get the bones | in-the-room, kill-the-bones |
Go in and get the goody | register-the-goody, in-the-room, pick-up-the-goody |
Get the potion and set it off | register-the-potion, pick-up-the-goody, use-a-potion (chained) |
Scroll's ready | scroll-is-ready |
On your left! and similar | look-out-relative | rotation
On the left! and similar | look-out-absolute | direction
Use a knife | use-a-knife |
Hit it with a knife when it goes light | hit-it-with-a-knife-when-it-goes-light |
Use a potion | use-a-potion |
No, the other one | no-the-other-one |

Table 7.1 Amazon’s natural and formal instructions.
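The mapping from instructions to instruction buffers can be sketched as a simple dispatch. The buffer names below are taken from a few rows of Table 7.1; the dispatch and perceptual check are our own hypothetical implementation, not Chapman's:

```python
# Buffer names follow Table 7.1; the lookup logic is an assumed sketch.
INSTRUCTION_TABLE = {
    "get the monster": ["kill-the-monster"],
    "head down those stairs": ["go-down-the-stairwell"],
    "go on in": ["go-in"],
    "go on in and down the stairs": ["in-the-room", "go-down-the-stairwell"],
    "get the potion and set it off": ["register-the-potion",
                                      "pick-up-the-goody", "use-a-potion"],
}

def interpret(instruction, visible_entities):
    """Return the buffers an instruction sets, or None when the instruction
    fails to make sense, e.g. it refers to an entity not currently seen."""
    buffers = INSTRUCTION_TABLE.get(instruction.lower())
    if buffers is None:
        return None
    # Perceptual check: "get the monster" only makes sense if a monster is seen.
    if "monster" in instruction.lower() and "monster" not in visible_entities:
        return None
    return buffers
```

Note how the chained case sets three buffers from one sentence, mirroring how ‘Get the potion and set it off’ chains pick-up-the-goody and use-a-potion in the text above.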

When one examines Sonja, one is constrained in terms of language and terrain; the two-dimensional space allows for easier reference to action, but the visual and linguistic extended reference systems operate differently (Jackendoff, 1983). When a command is given, Sonja carries out the demand to move; for example, ‘go around’ still has the pattern a → b → c and utilizes a reference system. When navigating in three-dimensional space, people generally consider up and down, left and right, forward and backward as possible movements of the vantage point. The terrain in Amazon consists of barriers that require decisions to move up or down, and interlocking spaces restraining the use of the ‘channel’ between elements to left and right, up and down.

Head direction | suggested-go | direction
Don't go direction | suggested-not-go | direction
Go around to the left/right | go-around | direction
Go around the top/bottom | go-around | direction
Go on in | go-in |
OK, head out now | go-out |
Go on in and down the stairs | in-the-room, go-down-the-stairwell |


Table 7.2 Amazon’s Sonja, possible moves in the terrain

Placement of an agent in a scene does not necessarily enhance the feeling of immersion. Sonja’s icon is a two-dimensional attempt (see Figure 7.3) at the transition of frames according to movement: the geographical up-down of the screen icon in relation to the front-back direction of the user, and the side view of the agent in relation to the left-right direction. The instructions to the agent show some of the difficulties of translation in deriving route knowledge.

On your left! and similar | look-out-reference – agent
On the left! and similar | look-out-reference – topological map

Figure 7.3 Amazon icons showing Sonja’s avatar

This confusion is avoided in third-person games like Tomb Raider, where the agent (e.g. Lara Croft) mostly shows its back, and there is an isomorphic correlation between the user and the agent through a common reference system.

The following commands constitute what needs to change or be added in the system examined (see Table 7.3):

Go around (object) to the left/right

Go around (object) under/over

Go above/below (object)

Go through

Go back

Go left and similar

Go along the (object)

Go to the left of (object)! and similar

Go near (object)

Go between (objects)

Go to east/west/north/south

Go towards

Page 96: Frames of Reference and Direct Manipulation Based Nagivation

86

Go in front/back of (object)

Look left/right

Go ahead of

Go past

Table 7.3 The most commonly used prepositions
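Commands of this shape decompose into a path-function, a reference object, and sometimes a direction. The grammar below is our own hypothetical sketch covering a few rows of Table 7.3; the pattern set and function names are assumptions, not part of the system examined:

```python
import re

# (pattern, path-function) pairs for a handful of Table 7.3 commands.
PATTERNS = [
    (r"go around (?P<obj>.+) to the (?P<dir>left|right)", "AROUND"),
    (r"go to the (?P<dir>left|right) of (?P<obj>.+)", "TO-SIDE-OF"),
    (r"go along (?P<obj>.+)", "ALONG"),
    (r"go between (?P<obj>.+)", "BETWEEN"),
    (r"go towards (?P<obj>.+)", "TOWARD"),
    (r"go near (?P<obj>.+)", "NEAR"),
]

def parse_command(text):
    """Return (path_function, reference_object, direction), or None if the
    command does not match any known pattern."""
    for pattern, function in PATTERNS:
        m = re.fullmatch(pattern, text.strip().lower())
        if m:
            groups = m.groupdict()
            return function, groups.get("obj"), groups.get("dir")
    return None
```
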

Concluding remarks

We have examined the mechanism of attention and the divisions that language creates, and the use of frames of reference. The three frames of reference (relative, intrinsic, and absolute) were introduced, and a case of a basic linguistic navigational program was brought in to demonstrate some of the conceptual difficulties encountered in the early development of such programs. In this chapter we have established the basic ways of handling objects as shown in language. We have shown how, with the different frames of reference, one handles objects regardless of prior position. We now need to examine the differences and similarities between the visual command and the linguistic command, and what the introduction of linguistic commands contributes to the navigational system.


CHAPTER 8

A COMPARISON BETWEEN VISUAL & LANGUAGE-BASED REPRESENTATION

In this chapter we will align the various frames that are involved in spatial reasoning. We compare the different frames of reference, as well as egocentric and allocentric reasoning. We will compare the different frames in relation to the task to be performed. A case taken from a tourist guide that describes a walkthrough will allow us to explore natural language and visual systems and compare the two. Finally, all the computer simulation programs introduced in previous chapters will be examined through common frames and axes.

8.1 Aligning the visual and linguistic frames

The linguistic command involves the desire to be somewhere, and the mental representation of an object already assumes a position. The updating mechanism of pointing and labeling has been compared and has shown a clear preference for objects rather than locations (de Vega, 2001; Warga, 2000). Frames of reference have a visual component and a linguistic component. The visual components are the generative axis, orienting axes, and directed axes, while the linguistic frames of reference are absolute, intrinsic, and relative. This idea of frames of reference is further developed by Levinson (1996) and Campbell (1994), for whom the frames of reference already involve


egocentric and allocentric thinking. According to Levinson, the three linguistic frames can be summed up as in Table 8.1.

Intrinsic | Absolute | Relative
Origin ≠ ego | Origin ≠ ego | Origin = ego
Object-centered | Environment-centered | Viewer-centered
3-D model | – | 2½-D sketch
Allocentric | Allocentric | Egocentric
Orientation-free | Orientation-bound | Orientation-bound

Table 8.1 Aligning classifications of frames of reference (S. Levinson)

8.2 Travel Guide – guided walk – Amsterdam

The case presents one of the recommended guided walks in the Jordaan in Amsterdam, a visual depiction and written description taken from a popular travel guide book on Amsterdam (Pascoe, 1996). The role of the different frames involved in the proposed system is demonstrated. The walk is accompanied by a written description, as well as a map and images of various highlights of buildings and streets that one would encounter along the way. The two printed pages are an integration of a multimedia production for the presentation of a Dutch quarter, a mixture of image and text.

Page one - On the first side of a two-page spread there is an aerial-view photograph of the Jordaan, a three- to five-story row of houses. On the second page there is text and a map of Amsterdam, giving the scale and the direction of north, with two photographs of locations beneath the map and a wall plaque inserted between the graphics (see Figure 8.1).

Figure 8.1 First page of the Jordaan tour

The text reads as follows:

Guided Walk

MANY OF AMSTERDAM'S most important historical landmarks, and several fine examples of 16th- and 17th-century architecture, can be enjoyed on both of these walks. The first takes the visitor through the streets of the Jordaan, a peaceful quarter known for its narrow, pretty canals, houseboats and traditional architecture. The route winds through to the man-made Western Islands of Bickerseiland, Realeneiland and Prinseneiland, built in the 17th century to accommodate the


expansion in Amsterdam's overseas trade. The area, with its rows of warehouses and wharves, is a reminder of the city's erstwhile supremacy at sea.

The first two-page spread is an architectural historical description; here we shall examine one of the photographs whose underlying caption tells of a tour. The photographic perspective is the view of the Drieharingenbrug across Prinsengracht on the next page (see Figure 8.2).

Second page - This two-page spread has a different frame: the map is in the middle of the page and the text is to the sides, in between the graphic and photographic displays of an architectural frontal elevation and two more viewer perspectives. The first paragraph encourages people to participate in a possible event.

Figure 8.2 Second page of the Jordaan tour

The text reads as follows:

A Walk around the Jordaan and Western Islands

The Jordaan is a tranquil part of the city, crammed with canal houses, old and new galleries, restaurants, craft shops and pavement cafes. The walk route meanders through narrow streets and along enchanting canals. It starts from the Westerkerk and continues past Brouwersgracht, up to the IJ river and on to the Western Islands. These islands have now been adopted by the bohemian artistic community as a fashionable area to live and work.

The actual tour starts with this description.

Prinsengracht to Westerstraat

Outside Hendrick de Keyser's Westerkerk [page and illustration referral in the text] turn left up Prinsengracht, past the Anne Frankhuis [page and illustration referral in the text], and cross over the canal. Turn left down the opposite side of Prinsengracht and walk along Bloemgracht - the prettiest, most peaceful canal in the Jordaan. Crossing the second bridge, look out for the three identical mid-17th-century canal houses called the Drie Hendricken (the three Henrys) [page and illustration referral in the text]. Continue up [illustration referral in the text] Leliedwarsstraat, with its cafes and old shops, turn right and walk past the St Andrieshofje [illustration referral in the text], one of the numerous well preserved almshouses in the city. It is worth pausing to take a look across Egelantiersgracht at No. 360, a rare example of an Art Nouveau canal house.


Guided walk; a case

The command description can be seen as a verb of action specifying speed, direction and attention; the verb presupposes the position of the viewer in relation to an object, a predicate. In our demonstration the most common command is the verb ‘turn’, followed by the verb ‘walk’, used half as often. The verb already contains a relationship of the surrounding objects to the viewer script, and of the specific place to the rotation of the viewer’s body. The verbal and visual description takes the viewer through: 1) the Amsterdam city frame, 2) the Jordaan quarter frame, 3) the street frame, 4) building object frames. We will analyze the first three sentences in the guided walk (see Figure 8.3).

Figure 8.3. The first three sentence segments enlarged for clarity.

The first sentence in the tour starts (all references are part of the guide book):

a) Outside Hendrick de Keyser's Westerkerk (1) ‘turn’ left up Prinsengracht, past the Anne Frankhuis (2) (see p. 90), and cross over the canal.

Outside – a reference location: object extended axis; the agent is facing the front.

‘turn’ left – a reference orientation: agent extended axis.

up – a reference orientation mediating between the map and the location through absolute coordinates: c with respect to route ab (c → ab).

past – a reference in relation to the route (confirmation): b with respect to ac (b → ac).

cross over – the cross over is also a rotation/relative position; one is turning left. There is no equivalent egocentric reference.

b) ‘Turn’ left down the opposite side of Prinsengracht and walk along Bloemgracht.

‘turn’ left – rotation/orientation: agent extended axis, but also reflection/location (opposite side).

down – a reference orientation mediating between the map and the location through absolute coordinates: a with respect to bc (a → bc).


It is interesting to note that this sentence needs both visual and linguistic understanding. Let me start by explaining that ‘gracht’ is the Dutch word for canal, and in the Netherlands the same street name is given to both sides of the canal. In our situation, when the tourist approaches the canal and looks around for Bloemgracht without referring to the map, the tourist does not cross the canal to the opposite side and goes in the opposite direction, since the next command - turn right - is relative (to the tourist); and since Bloemgracht (the street) has two sides, the crossing can be performed both ways.

along – object extended axis: b with respect to ac (b → ac).

c) Crossing the second bridge, look out for the three…

This is a rotation in terms of an object’s relative position and map reading; thus no further information is given for orientation. There is no equivalent egocentric pointing.
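The way these instructions mix an egocentric command (‘turn left’) with the map's absolute coordinates can be sketched by updating an absolute heading with relative turns. The four-way compass encoding below is our own simplification of the frames at work in the guide text:

```python
HEADINGS = ["north", "east", "south", "west"]

def apply_turn(heading, command):
    """Update an absolute heading from a viewer-relative turn command."""
    i = HEADINGS.index(heading)
    if command == "turn left":
        return HEADINGS[(i - 1) % 4]
    if command == "turn right":
        return HEADINGS[(i + 1) % 4]
    return heading  # e.g. "walk along" keeps the current heading
```

Without knowing the current heading (the viewer's frame), the relative command is ambiguous on the map, which is exactly the Bloemgracht confusion described below: the same ‘turn right’ resolves to two different absolute directions depending on which side of the canal the tourist stands.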

The analysis of the accessibility of the site produced a map of possible branching and a script, shown in Table 8.2 and Figure 8.4.

Access description of a | Access description of b | Access description of c

First right

Second right

Third right

First left (First bridge)

First right

Second right

Third right

Or First right after canal

Fourth right (First bridge)

Third right | First left

Fifth right (Second bridge)

Fourth right | First left | First left

Table 8.2 Script of first three sentences

Figure 8.4 Branching graph for the first sentence segments with one degree branching to target

From the examples above we observe that the analysis depends on the strategy and contextual parameters which specify performance. The morphological characteristics and performance are mediated through visual and linguistic frames. That is, performance and heuristic rules govern the strategy by which one frames exploration.


8.3 Comparison between linguistic and visual frames

In order to compare the different frames, two different rotations are examined: object and array. The literature examining this phenomenon is large and goes back to Marr (1982). As Levinson demonstrates (in Table 8.3), there is a significant difference between the various frames, but the frames can be compatible across modalities (1996, p. 153).

F = figure or referent with center point at volumetric center Fc

G = ground or relatum, with volumetric center Gc, and with a surrounding region R

V = viewpoint

A = anchor point, to fix labeled co-ordinates

“Slope” = fixed-bearing system, yielding parallel lines across environment in each direction

                              Intrinsic     Absolute    Relative
Relation is                   binary        binary      ternary
Origin on                     ground        ground      viewpoint V
Anchored by                   A within G    "slope"     A within V
Transitive                    No            Yes         Yes, if V constant
Constant under rotation of:
  whole array?                Yes           No          No
  viewer?                     Yes           Yes         No
  ground?                     No            Yes         Yes

Table 8.3 Summary of properties of different frames of reference (S. Levinson)

When comparing the visual and linguistic systems to the allocentric and egocentric systems, one has four elements:

Visual egocentric: is perceived by the internal senses and is, in principle, located within the limits of a person’s own body. In the visual egocentric space one can perceive one’s own body – what Paillard defines as the sensorimotor mode. The visual egocentric mode is the alignment of objects in relation to the environment and the alignment of object category in relationship to the environment. Berthoz differentiates between the relationship of ‘personal space’ (2000) and grasping space, which allows one to point.

Linguistic egocentric: is the representation of the location of objects in space through a relative frame of reference. It is what Berthoz defines as the ‘egocentric frame of reference’ (2000) and Paillard defines as the representational mode.

Visual allocentric: encodes the spatial relations between objects and the relation to a frame of reference external to one’s own body. Thus, in the visual allocentric space, the alignment with one’s own body and the represented reality is already given. Visual allocentric can be a frozen visual egocentric point of view, referred to as the pictorial perspective or Marr’s 2½-D sketch.

Linguistic allocentric: is the direction of relations between objects and the relation to a frame of reference external to one’s own body. What Berthoz defines as an allocentric frame of reference is a direction through absolute and intrinsic frames of reference.

When a comparison table is drawn (see Table 8.4), one notices that if one equates the visual frames of reference with the linguistic frames of reference, and the difference between them is the number of axes or a semiotic distinction, then egocentric and allocentric cannot be the distinction between the frames of reference. This is a question of one-to-one relations between the observer and agent, object, and polar coordinates. When using the intrinsic and relative frames of reference the observer is using an analogous process relative to himself, while in the absolute frame of reference he is using an analogous process external to himself. Thus in this act of communication one can use egocentric frames, since it is a one-to-one relation. This conclusion coincides well with findings in neurophysiology (Dodwell, 1982; O'Keefe, 1993).

Constant under rotation   Object rotation          Array rotation                   Agent rotation
Visual egocentric         Pointing                 Possible through window frame    Indirectly through window frame
Linguistic egocentric     Relative                 Intrinsic                        Relative
Visual allocentric        Display dependent        Possible through window frame    Indirectly through pointing
Linguistic allocentric    Intrinsic and absolute   Intrinsic and absolute           Intrinsic and absolute


Table 8.4 Aligning Classifications of Frames

8.4 Comparison between existing simulation programs

According to Amarel (1968), “a problem of reasoning about actions is given in terms of an initial situation, a terminal situation, a set of feasible actions, and a set of constraints that restrict the applicability of action.” Visual and linguistic descriptions have the ability to convey information about the path through explicit and implicit knowledge; for example, “go left” is a description where the start point is implicit. The path can also have the end point explicit, like “go towards the house” or “go into the house”; it is the equivalent of pointing visually. The converse path can have an arrival point, like “go to the left of the house”. Lastly, the transverse path can have an explicit start and end point, giving us the ability to determine the path’s relation to an object. Vision has three types of movement: bounded agent, directed agent, and converse, while language also has three types of movement: bounded agent, converse path, and transverse path.
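The distinction between implicit and explicit path endpoints can be sketched computationally. The phrase patterns and category names below are illustrative assumptions for the example, not the actual grammar of any of the systems discussed:

```python
# Classify a command by which path endpoints it makes explicit.
# The phrase patterns are illustrative only.

def path_type(command):
    c = " " + command.lower() + " "
    if " from " in c and (" to " in c or " towards " in c):
        return "transverse"   # explicit start and end point
    if " to the left of " in c or " to the right of " in c:
        return "converse"     # arrival point relative to an object
    if " towards " in c or " into " in c or " to " in c:
        return "directed"     # explicit end point, implicit start
    return "bounded"          # e.g. "go left": start point implicit

print(path_type("go from the bridge to the house"))   # transverse
```

A command such as “go left” falls through all the tests and is classified as bounded, matching the implicit-start case in the text.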

Comparing the various navigational programs to the operational criteria of qualifying movement via a route, one has to take into account the discrepancy between pointing and directing, as exemplified through axes and frames. In the existing navigational programs not all three modules are present (see Figure 7.9). Every one of the programs has attempted to respond to the requirements of enhancement of immersion differently. 3DS MAX uses a global coordinate system to rotate an object, while Cosmo utilizes a viewer-centered frame with elements of converse mobility; thus the latter system is more effective and efficient.

Comparing the overall tasks that are involved in visual and linguistic commands: in visual systems the observer has attentional competition between the various tasks that he/she has to perform. For example, when one identifies an object, one has a choice between array rotation and object rotation as well as object pointer and moving pointer. The linguistic system command is adaptive; it can perform the various tasks with one interface. On the other hand, the visual system has visual feedback/confirmation that provides the user with a more interactive experience, while the linguistic-based system has no visual feedback. Probably the best immersive solution is a mixture of the two. In order to examine the limits of each system, the choice fell on the best task performance, i.e. descriptive systems.


                     3DS MAX   Tomb Raider   Myst   Cosmo   Sonja
Manipulation mode    Yes       No            No     No      No
Observational mode   Yes       No            Yes    No      No
Converse             No        Yes           No     Yes     Yes
Transverse           No        No            No     No      Yes
Intrinsic frame      No        No            No     No      Yes
Relative frame       Yes       Yes           Yes    Yes     Yes

Table 8.5 Comparison between the different navigational programs

Concluding remarks

In this chapter we have compared the basic ways of handling objects with both visual and linguistic systems. The visual and linguistic systems’ approaches to objects have shown that one handles objects with the same frames and axes, although they use different methods to convey movement, and both have an added ability to refer to the path’s intrinsic value. If one equates the visual frames of reference with the linguistic frames of reference and the difference between them is the number of axes, then egocentric and allocentric cannot be the distinction between the frames of reference, only a broad description. This is a question of one-to-one relations between the observer and agent, object, and polar coordinates. When using the intrinsic and relative frames of reference the observer is using an analogous process relative to himself, while in the absolute frame of reference he is using an analogous process external to himself. In the case of intrinsic frames of reference, opinions tend mostly towards the use of an allocentric frame of reference. I prefer to think of this act of communication, where one employs an egocentric frame of reference, as giving preference to vision, in a one-to-one relationship. This conclusion coincides well with findings in neurophysiology (Dodwell, 1982; O'Keefe, 1993). We have also shown that the existing simulation programs still lack a comprehensive modulation. The new proposed system will work better because, in contrast to previous products, it will utilize the intrinsic frame of reference in a wider variety of possibilities (see Chapter 7.3). In the next chapter, we will demonstrate how the proposed ideal system might work.


CHAPTER 9

AN AGENT/OBJECT-BASED SYSTEM OF NAVIGATION

The aim of this object-based navigation tool is to enhance for the user the exploration of a virtual environment, and to make the navigation a natural, directed experience. Following the goals of realism and immersion in exploring through movement in the virtual world, the system uses an agent that allows the user to directly control “on the fly” navigation while viewing objects, moving towards a destination that is non-distal – in other words, objects that are visible to the person who moves. The objective of this chapter is to illustrate some of the aspects of how such a proposed system works, and in what way it differs from existing navigational programs. The chapter will concentrate on the cognitive aspects that underlie the proposed system and the linguistic and visual expressions of its input/output. The navigation is controlled by the user employing spatial frames of reference. It offers linguistic control through a non-metric topological environment as used in object-based navigation, while the visual output is metrical.

When one examines the English spatial prepositions in conjunction with the visual representation, one realizes that there are a few basic elements of movement in the toolkit. Agent-centered – has an agent that can move in any direction using the relative frame of reference; object-centered – where one can move in relation to an object using the intrinsic frame of reference; and environment-centered – where one moves according to invariance within the environments using the absolute frame of reference. In fact the situation is more complex; when one looks at agent-centered movement one has four choices: while moving, ‘to view the agent’ or ‘not’, and while standing still, ‘to view the agent’ or ‘not’. In our case, while moving one sees the agent, and when one is standing still the agent’s point of view takes priority.

9.1 Conceptual framework

The proposed system takes as input a language command and outputs a visual display. The system comprises a screen in front of an observer, who sees a generated perspective – a location. The Albertian window divides the lines of sight (visual rays) from the viewer, positioned in a stationary location, and the transparent picture plane where all lines of construction converge on the vanishing point. The distance from the constructed objects to the observer is an imaginary virtual space. The observer thus surveys the location of objects in front of him. The observer’s position can now be calculated, and we can call this calculated point the agent. This is sometimes referred to in the literature as perspective-taking, a common core of representation and transformation processes for visualization, mental rotation, and spatial orientation abilities.

The aim of this tool is to move from one location to another through a route by specifying either the start position, for example “go left”, the end location, for example “go towards the house”, or a path, for example “go along the street”. In order to recognize the change in position, we need some way of representing those directional commands. Through limited basic elements, we can generate a large computational variance that is meaningful to the observer. In order to command an agent based on the belief, context and desire of a user, the computer program must go through two steps to interpret such a request.

Participants engage in dialogues and sub-dialogues for a reason. Their intentions guide their behavior, and their conversational partner’s recognition of those intentions aids in the latter’s understanding of their utterance. A sub-dialogue is a command segment concerned with a subtask of the overall act underlying a dialogue. For example, the user wants to examine the effect of a proposed building on its environment; the sub-dialogue is a linguistic command to move Agent A from P0 to P1.

Thus, the discourse includes three components: a linguistic structure, an intentional structure and an attentional state. The linguistic structure consists of command segments and an embedded relation. In the linguistic command, an agent performs an act in relation to objects. The intentional structure consists of command segment purposes and their interrelationships. A command segment purpose is an intention that leads to initiation of a command segment (intended to be recognized). The attentional state is an abstraction of the state of action of the participant’s focus of attention.

The system does not need to recognize an object, since everything is labeled in the virtual reality and indexicality is assumed. The field of view helps to narrow down the object search. When the system encounters an ambiguous command, it clarifies it through a linguistic query. All replies are complaints, generated when the system cannot make sense of an instruction. Examples are “Where is the X (object)?” and “Go where?” The system only complains when an instruction does not make sense: specifically, when it determines that one of the entities it refers to does not exist. Accordingly, the system clears the instruction buffer, thereby rejecting the instruction. The system does not allow for a reply. The system is similar to that of Chapman (1991); Chapman’s ‘Sonja’ is what is implemented in this system. In order to engage in more extended negotiation of reference, one would in many cases require a mechanism for storing and using more of the linguistic context. The knowledge preconditions of the agents are axiomatized as follows:

1. Agents need to know the scripts for the acts they perform

2. Agents have some primitive acts in their repertoire

3. Agents must be able to identify the parameters of the acts they perform

4. Agents may only know some descriptions of the acts

5. Agents know that the knowledge necessary for complex acts is derived from their component acts

To determine the command scripts the proposed system uses a parser to answer four questions about the nature of movement in virtual environments. The parser’s model of analysis consists of four different types of information. It answers the question Why? – What is the intent of the user? That is, how does one want to view the image: ‘Observation mode’ or ‘Manipulation mode’? The grasping frames use a user’s profile to establish the intent of the user, according to the user type, in determining Manipulation mode and Observational mode action.

It answers the question How? – What kind of action should be taken? The Mobility module distinguishes between converse and transverse movement.


Converse – Two-Point Relations: this includes the evaluation of topological, angular and distance-dependent relations, which share the common characteristic that they – in their most basic form – relate two objects to each other. Transverse – N-Point Relations: relations that cannot be reduced to a two-point problem, such as path relations, special cases like in between, or constellational relations, are qualitatively different from two-point relations. They require more than two objects, and/or additional arguments such as shape or outline information. The system relies on object width and length alignment, empowering the user with the ability to decide within a channel which way he or she wants to look.
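The converse/transverse distinction can be sketched by the number of objects a relation takes as arguments. The toy relation lexicon below is an assumption for illustration, not the Mobility module’s actual vocabulary:

```python
# Converse relations take exactly two objects; transverse relations, such
# as "between", need more. The arity table is an assumed toy lexicon.

RELATION_ARITY = {"left-of": 2, "near": 2, "along": 2,
                  "between": 3}

def relation_kind(relation, *objects):
    """Return 'converse' for two-point relations, 'transverse' otherwise."""
    arity = RELATION_ARITY[relation]
    if len(objects) != arity:
        raise ValueError("wrong number of objects for " + relation)
    return "converse" if arity == 2 else "transverse"

print(relation_kind("between", "agent", "house", "bridge"))   # transverse
```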

It answers the question Where? – What kinds of spatial frames of reference are used? The Orientation module deals with several relations. Angular relations in particular depend on the establishment of a frame of reference in order to be non-ambiguous. In our case these are the object-centered and viewer-centered frames of reference. Intrinsic frames of reference are relations that take two or more objects as arguments and specify the position of a located object with respect to the reference object. Relative (deictic) frames of reference are relations that take two or more objects as arguments, specifying the position of one object with respect to the reference frame of the viewer.

It answers the question What? – Which object? That is, the way one categorizes the data as opposed to the retrieval classification of the data. The thematic sentence analysis processes both objects and routes (see Figure 9.2). An agent’s thematic role specifies the agent’s relation to an action. In linguistic terms, verbs specify actions, nouns identify the objects, and prepositions identify the relation to the object, either route or location. The conceptual structure expression helps us to establish the modules of language parsing, or the thematic roles through which verbs, nouns, and prepositions cooperate towards a semantic understanding.

Figure 9.2. The thematic sentence analysis
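The division of labour just described – verbs naming actions, nouns identifying objects, prepositions fixing the relation – can be sketched as a minimal thematic-role splitter. The vocabulary and the output structure are illustrative assumptions, not the system’s actual lexicon:

```python
# Minimal sketch of thematic analysis: split a command into
# (action, relation, object) roles using a toy vocabulary.

VERBS = {"go", "walk", "turn"}
PREPOSITIONS = {"to", "towards", "along", "into", "left", "right", "of"}

def thematic_parse(command):
    """Assign each word of a command to a thematic role."""
    words = command.lower().strip(".").split()
    action = next((w for w in words if w in VERBS), None)
    relation = [w for w in words if w in PREPOSITIONS]
    # Whatever is neither a verb, a preposition, nor an article is
    # treated as the reference object.
    objects = [w for w in words
               if w not in VERBS and w not in PREPOSITIONS
               and w not in {"the", "a", "an"}]
    return {"action": action, "relation": " ".join(relation),
            "object": objects[0] if objects else None}

print(thematic_parse("go to the left of the house"))
# {'action': 'go', 'relation': 'to left of', 'object': 'house'}
```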

The specification of location requires something more than identifying the object: what is needed is a location description. This can take one of two forms: (1) state and (2) process. A state description of location tells where something is located in terms of a well-known and commonly understood system of coordinates. A process description is a set of instructions telling how to get to a particular location. The linguistic command does not necessarily carry that much information – most of the process is assumed to be contextual.

The computational tool is based on a digital information retrieval system of objects and locations in space. The tool attempts to match different meta-tags within the database in order to infer the command of the user. The database system consists of database retrieval strategies for forward propagation. A standard assumption in computationally oriented semantics is that knowledge of the meaning of a sentence can be equated with knowledge of its truth conditions: that is, knowledge of what the world would be like if the sentence were true. This is not the same as knowing whether a sentence is true, which is (usually) an empirical matter, but knowledge of truth conditions is a prerequisite for such verification to be possible (Davidson, 1969). Meaning as truth conditions needs to be generalized somewhat for the case of imperatives.

The cognitive linguistics system draws its sources from an older tradition of functionalist linguistics. This theory holds that constraints on the form of language (where form means the range of allowable grammatical rules) are derived from the function of language. Many linguistic phenomena remain unaccounted for in our grammar, among them agreement, tense, aspect, adverbs, negation, coordination, quantifiers, pronouns, reference and demonstratives. ‘Frame semantics’ is considered fundamental to an adequate understanding of linguistic entities and, as such, is integrated with traditional definitional characterizations.

9.2 Controlling the pedestrian agent behavioral model

People are used to the situations they normally interact with; their behavior is determined by their experience, which reacts to a certain stimulus (situation) and is then evaluated. Their reactions are usually rather ‘habitual’ and well predictable – a micro-movement (Haklay, 2001). The micro-movement fluctuates, taking into account random variations of behavior that arise. In the following paragraphs, we will specify the micro-movement of agent motion:

1. A user wants to walk in a desired direction ‘g’ (the direction of his/her next destination) with a certain desired speed ‘v’. In our case the desired speed of an agent is uniformly distributed.


2. Agents keep a certain distance from borders (of buildings, walls, streets, obstacles, etc.). This effect can be described by a repulsive, monotonically decreasing potential. At the same time, people are attracted to certain streets and objects.

3. Agents need to simulate pedestrian preference. These interactions cause either high or low walkability effects, like walking on different objects or materials, and the effects of performing desired maneuvers to achieve such preference – for example, walking up a staircase or walking down a corridor. In effect, one can walk anywhere; for example, the imperatives “walk in the shade” or “walk in the middle of the road”.

4. The user needs to see the agent moving to grasp his environment, and since no haptic or immersive device was integrated into the system to provide feedback, vision is the primary source of information.

Micro-movement characteristics contribute to the detailed behavior of agents and the immersion of the observer. Factors include progress, visual range, and fixation. Progress is simply the desired walking speed at which an agent moves. Visual range relates to an observer’s visual acuity and determines which buildings and other elements in the environment the agent will ‘see’ and potentially respond to. In order to enable an agent to receive commands from the user, search the nearby area, and match an object to its label, the agent must control the following elements:

(1) Route – This includes the whole route, an agent’s position on that route, its current location in the world, its fixation on its route, its thresholds for evaluating possible new destinations (derived from its behavioral profile), and a threshold for deciding that a target has been reached.

(2) Progress – The preferred speeds by which one moves in the environment as they relate to object size; in other words, the progress made towards the next target and the minimum acceptable progress per unit time.

(3) Direction – Encompasses the agent’s current directional heading and the direction to the next waypoint.

(4) Location – The coordinates of the agent’s center point and the agent’s size, expressed in terms of the effect of the agent’s presence on the walkable surface of a grid square.

(5) Visual range – Describes an agent’s visual capabilities, expressed as a visual cone with breadth and range into the surroundings. Potential destinations inside the visual cone are considered as potential deviations from the current plan as directed by the observer, or as obstacles by the wayfinding task.

(6) Control – Reflects the agent’s movement state. For example, is the agent active or waiting, moving normally or stuck?
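The state variables enumerated above can be collected into a single structure. The following is a minimal sketch assuming a 2-D grid world; all field names and default values are illustrative, not taken from the proposed system:

```python
# Sketch of the agent state: route, progress, direction, location,
# visual range, and control, per the elements listed above.
from dataclasses import dataclass

@dataclass
class AgentState:
    route: list                     # ordered waypoints still to visit
    location: tuple                 # (x, y) centre point of the agent
    heading: float = 0.0            # current directional heading, radians
    speed: float = 1.0              # preferred speed (progress per step)
    min_progress: float = 0.1       # minimum acceptable progress per step
    visual_breadth: float = 1.5     # half-angle of the visual cone, radians
    visual_range: float = 10.0      # how far the agent can 'see'
    control: str = "active"         # e.g. 'active', 'waiting', 'stuck'

    def next_waypoint(self):
        """Direction-setting target: the first waypoint still on the route."""
        return self.route[0] if self.route else None

state = AgentState(route=[(5, 0), (5, 5)], location=(0, 0))
print(state.next_waypoint())   # (5, 0)
```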

The subroutine structure

These different levels enable the agents to compute separately local movement (the process of moving to the next grid square on the walkable surface), medium-range movement (maintaining a proper direction), and longer-range movement (trying to move to the next point while avoiding obstacles). The system is able to examine where one is located through the shortest path between two points. The modules as currently used are presented in Figure 9.1. The following sections describe their operation, starting with the low-level operation of the helmsman and proceeding up to the highest level of the chooser.

Figure 9.1. The interactions between the agent control modules and the agent state variables stored on its ‘agent state’. There is a general ‘zig-zag’ of interaction whereby high-level modules control the state variables to which lower-level modules respond.

The helmsman module

This module moves the agent ‘physically’ through the environment. It reserves space for the agent in the environment and checks for obstacles like other agents and buildings. Each grid square or ‘cell’ in the ‘world’ has a certain capacity – its walkability value, from low to high, where a high value refers to a non-penetrable object, such as a building. Values indicate the proportional occupation of that grid square, so that low values are preferred by pedestrians and high values indicate that a cell is unsuitable for pedestrians, such as the center of busy roads. Each agent is also assigned a value representing its own occupancy of a grid cell. In a tour, the mover module looks in up to five directions, starting from the current heading direction, to determine where the most space is available. It sets the heading to this direction and places the agent at the new location, according to its heading and speed.
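The helmsman’s local move can be sketched as follows. The grid layout, the five probe offsets and the occupancy scale are assumptions for the example, not the module’s actual parameters:

```python
# Probe up to five headings around the current one and step into the
# freest cell, per the helmsman description above.
import math

def helmsman_step(grid, pos, heading, speed=1.0):
    """Return (new_pos, new_heading).

    grid[y][x] holds a cell's occupancy (walkability cost); a value of
    1.0 or more marks a non-penetrable cell such as a building.
    """
    best = None
    for offset in (0.0, 0.5, -0.5, 1.0, -1.0):   # five probe directions
        h = heading + offset
        x = int(round(pos[0] + speed * math.cos(h)))
        y = int(round(pos[1] + speed * math.sin(h)))
        inside = 0 <= y < len(grid) and 0 <= x < len(grid[0])
        if inside and grid[y][x] < 1.0 and (best is None or grid[y][x] < best[0]):
            best = (grid[y][x], h, (x, y))       # ties keep the earlier probe
    if best is None:
        return pos, heading                      # boxed in: stay put
    return best[2], best[1]

grid = [[0.0, 0.0, 1.0],
        [0.0, 0.2, 0.0],
        [0.0, 0.0, 0.0]]
print(helmsman_step(grid, (0, 0), 0.0))   # ((1, 0), 0.0): steps east
```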

The navigator module

Supporting the helmsman at the next level is the navigator, which also maintains the agent’s heading so that it does not deviate too far from the target direction. However, the agent must be allowed to deviate somewhat from the heading towards a target so that it can negotiate corners and get out of dead ends. Therefore, the controlled variable is the agent’s progress towards its current target. In operation, the navigator module checks if it is possible to walk in the target direction. If so, the navigator sets a new heading; if not, the heading remains unchanged. These modules together deal with the ‘tactical’ movement of getting to the next point in the route. The last behavioral modules attend to more strategic movement and planning.
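The navigator’s check-then-steer rule can be sketched on the same grid representation; the walkability threshold is an assumed convention, not taken from the system:

```python
# Steer back towards the target only when the cell in that direction is
# walkable; otherwise keep the current (helmsman-set) heading.
import math

def navigator_heading(grid, pos, heading, target):
    """Return the heading the agent should adopt for its next step."""
    target_dir = math.atan2(target[1] - pos[1], target[0] - pos[0])
    x = int(round(pos[0] + math.cos(target_dir)))
    y = int(round(pos[1] + math.sin(target_dir)))
    inside = 0 <= y < len(grid) and 0 <= x < len(grid[0])
    if inside and grid[y][x] < 1.0:   # target direction is walkable
        return target_dir
    return heading                    # blocked: heading remains unchanged

grid = [[0.0, 1.0],
        [0.0, 0.0]]
# The target to the east is blocked by a wall, so the heading is kept:
print(navigator_heading(grid, (0, 0), 1.57, (1, 0)))   # 1.57
```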

The chooser module

The user-choice module, or chooser, receives a command from the user that identifies the next target in the route of the agent. This module enables an agent to receive commands from the user, search the nearby area, and match an object to its label. The target can be a point like a junction in the street network, or a building or location the agent wants to enter. The chooser uses the agent’s visual field to detect candidate objects in its immediate surroundings. In motion, the visual field’s extent is defined by the agent’s speed and fixation on its task: the higher the speed and fixation, the narrower is the fan of rays which the vision module sends into the environment. Building objects in the field of view are considered by the chooser module as potential new destinations which may be added to the currently planned route. Building attributes such as type and general attractiveness are compared with the user’s commands, and if a match is found then the location may be pushed onto the route as the new next destination.
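The chooser’s behaviour can be sketched as below: the visual fan narrows as speed and fixation rise, and a matching building is pushed onto the route. All coefficients and the building-attribute format are assumptions for the example:

```python
# Scan the visual fan for a building matching the commanded type and,
# if found, push it onto the route as the new next destination.
import math

def choose_target(route, agent_pos, heading, speed, fixation,
                  buildings, wanted_type):
    """Return the (possibly updated) route list."""
    # Higher speed and fixation narrow the fan of rays (half-angle, radians).
    fan = max(0.2, 1.5 - 0.5 * speed - 0.5 * fixation)
    for name, pos, btype in buildings:
        bearing = math.atan2(pos[1] - agent_pos[1], pos[0] - agent_pos[0])
        if abs(bearing - heading) <= fan and btype == wanted_type:
            return [pos] + route          # new next destination
    return route                          # nothing matched: route unchanged

buildings = [("cafe", (4, 1), "cafe"), ("bank", (0, 5), "bank")]
print(choose_target([(9, 9)], (0, 0), 0.0, 1.0, 0.5, buildings, "cafe"))
# [(4, 1), (9, 9)]
```

Note that with the same speed and fixation the bank, lying well outside the narrowed fan, would not be detected at all.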

9.3 Operation of the system

The linguistic interpretation system is based on work already done both theoretically and empirically in computer science and robotics. The system is viable because the domain is constrained, by limiting the vocabulary of objects in the architectural domain in the virtual environment. There are numerous systems using the domain-constrained hypothesis, to mention just a few: (Chapman, 1991), (Tollmar, 1994), (Smith, 1994), (Pollock, 1995), (Bunt, 1998), (Cremers, 1998), and (Strippgen, 1999). According to Varile (1997), there are several approaches to language analysis and understanding. Some of these languages are more geared towards conformity with formal linguistic theories; others are designed to facilitate certain processing models or specialized applications. Language understanding divides into constraint-based grammar formalisms and lexicons for constraint-based grammars. According to Allen (1990), the original task was defined in the 1980s as one in which positional change was defined in terms of the object, the source location and the goal location.

In order to respond to a verbal command the system must interpret the verbal command and at the same time translate the visual scene into a verbal description, so that there is a correlation between the two components. Each component, the visual parser and the linguistic parser, has its own information constraints (see previous chapters). The computational model shows the basic processing modules and assumes that most modules already exist; for example, modules like voice recognition have been worked out, as has a rendering engine. The system makes use of partial matches between incoming event sequences and stored categories to help solve this command and implementation. The system imposes a user constraint that comprises four principles. First, each sentence must describe some action to be performed in the scenes; that is, the agent must be instructed on the action and location in order to perform. Second, the user is constrained to making only true statements about the visual context. Everything the user says must be true in the current visual context of the system. The user cannot say something which is either false or unrelated to the visual context. Third, the order of the linguistic description must match the order of occurrence of the EVENTS. This is necessary because the language fragment handled by the system does not support tense and aspect. Finally, the user is also restricted to homomorphic relations, i.e. language is time/effort restricted in the amount of information it contains about reference objects. These constraints help reduce the space of possible lexicons and support search-pruning heuristics that make computation faster.

The above four learning principles make use of the notion of a sentence “describing” a sequence of scenes. The notion of description is expressed via the set of correspondence rules. Each rule enables the inference of an [EVENT] or [STATE] description from a sequence of [STATE] descriptions which match the pattern. For example, Rule 1 states that if a sequence of scenes can be divided into two concatenated sub-sequences such that in every scene of the first sub-sequence x is at P0 and not at P1, while in every scene of the second sub-sequence x is at P1 and not at P0, then we can describe the entire sequence of scenes by saying that x went on a path from P0 to P1 – for example, “go to the left of the house”.
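Rule 1 can be sketched directly: try every split of the scene sequence into a prefix and a suffix, and check the two state conditions. Modelling scenes as object-to-place dictionaries is our encoding for the example, not the system’s internal representation:

```python
# Correspondence Rule 1: a prefix of scenes with x at p0 followed by a
# suffix with x at p1 licenses the event description "x went from p0 to p1".

def infer_go_event(scenes, x, p0, p1):
    """Return True if the scene sequence matches Rule 1 for x, p0, p1."""
    for split in range(1, len(scenes)):
        prefix, suffix = scenes[:split], scenes[split:]
        if (all(s.get(x) == p0 for s in prefix) and
                all(s.get(x) == p1 for s in suffix)):
            return True
    return False

scenes = [{"agent": "P0"}, {"agent": "P0"}, {"agent": "P1"}]
print(infer_go_event(scenes, "agent", "P0", "P1"))   # True: the rule fires
```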

The navigational command program consists of two parsers: one visual, the other linguistic. A parser is a computer program that breaks down text or visual object information into recognized strings for further analysis. The organization of these modules is illustrated in Figure 9.2. The visual parser examines visual geometric relations between the agent, an object and the object’s background. The objects within this unhindered visual space are divided into access trees in which objects and nodes are connected through string clusters. The visual parser then produces a linguistic script of all the possible relations between the agent and the objects in view. The language parser examines the syntactic properties of the sentence, and the geometric relations between the agent, object and the object’s background as described in the linguistic command. The Linker receives information from the object parser and the predicate parser. The data received by the Linker is already restricted by the observer’s point of view. The Linker compares the linguistic object relation and matches it with the appropriate visual object/node relations. The user profile module establishes the subroutine behavior of the agent. The knowledge rules further restrict the movement by the user.

Figure 9.2. The process of analyzing a command

The visual object parser

The object parser examines the relationship between objects and nodes, i.e. object location and the path one has to travel to get to one’s goal. The more nodes one has to pass through, the higher the circulation cost; i.e. it is a question of distance versus energy. The hindered visual space is used to eliminate from the search objects that cannot be seen. Physical accessibility examines whether one can move between objects or between other agents, and eliminates all inefficient nodes. The object parser process is as follows:

1. Locate the agent position within the scene

1.1 Locate the visual environment boundary

2. Detect object types and count objects

2.1 Locate object positions

2.2 Identify the attributes of objects

3. Determine the object’s reference frame system through geometrical analysis of objects in view

3.1 Create a reference frame system for each of the adjacent objects

3.2 Detect the adjacency relationship to other objects

4. Detect all paths available to the user to reach the objects

4.1 Create a physical access path to each object
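The four steps above can be laid out as a code skeleton. Every function body here is a placeholder assumption, kept only to show the order of the steps; the scene encoding is likewise invented for the example.

```python
# Sketch of the object-parser pipeline listed above (steps 1 through 4.1).

def object_parser(scene):
    agent = locate_agent(scene)                   # 1. agent position
    boundary = visual_boundary(scene)             # 1.1 visual environment boundary
    objects = detect_objects(scene)               # 2./2.1/2.2 types, positions, attributes
    frames = {o["name"]: reference_frame(o) for o in objects}    # 3./3.1/3.2
    paths = {o["name"]: access_path(agent, o) for o in objects}  # 4./4.1
    return {"agent": agent, "boundary": boundary,
            "frames": frames, "paths": paths}

def locate_agent(scene):      return scene["agent"]
def visual_boundary(scene):   return scene["boundary"]
def detect_objects(scene):    return scene["objects"]
def reference_frame(obj):     return {"front": obj["front"], "axis": obj["axis"]}
def access_path(agent, obj):  return [agent, obj["position"]]  # straight-line placeholder

scene = {"agent": (0, 0), "boundary": 50,
         "objects": [{"name": "house", "position": (10, 5),
                      "front": "south", "axis": "x"}]}
result = object_parser(scene)
print(result["paths"]["house"])  # [(0, 0), (10, 5)]
```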

Topological analyses of objects consist of a semantic graph and a geometry graph of the object. The geometry graph analyzes the object's faces, vertices, and edges, while the semantic graph analyzes top and bottom, front and back, left and right. The semantic graph also analyzes the zones of object viewing, the adjacent zones.

The visual parser prepares the basis for comparison with the linguistic command. The operation of the attentional mechanism in vision is based on three principles concerning the allocation of spatial attention (Mozer, 2002). These abstract principles concerning the direction of attention can be incorporated into the computational model by translating them into rules of activation, such as the following:

(1) Locations containing objects in the visual field of the agent should be activated.

(2) Locations adjacent to objects’ regions should also be activated. The directed regions use an architectural reference frame system (see Figure 9.3) in which the exact location is established through a view that encompasses the object. Eight regions were used to increase identification of the target area by the users.

The architectural reference frame system not only refers to adjacent points in relation to an object but links them into a conceptual neighborhood; for example, the preposition ‘along’ will use three points from a side of the object.
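The activation principles and the conceptual-neighborhood grouping can be illustrated as follows. The activation values and all names are assumptions, not values from the dissertation.

```python
# Illustrative activation of the eight directed regions around an object
# (front, back, left, right and the four diagonals), plus the grouping of
# side points into a conceptual neighborhood for 'along'.

REGIONS = ["front", "back", "left", "right",
           "front left", "front right", "back left", "back right"]

def activate(objects_in_view):
    """Principle (1): activate locations containing objects;
    principle (2): also activate the adjacent directed regions."""
    activation = {}
    for obj in objects_in_view:
        activation[obj] = {"object": 1.0}  # the object location itself
        for region in REGIONS:
            activation[obj][region] = 0.5  # adjacent regions, weaker
    return activation

def along_neighborhood(side):
    # 'along' links three points from one side of the object
    # into a single conceptual neighborhood.
    return [f"{side}-start", f"{side}-middle", f"{side}-end"]

act = activate(["house"])
print(along_neighborhood("left"))  # ['left-start', 'left-middle', 'left-end']
```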

Figure 9.3. Architectural reference frame system: front, back, left, right, front left, front right, back left, and back right, incorporating the visible area from a generating location (or convergence location of the optic rays)

The visual parser links the topological and visual accessibility graph (distances and orientation) to produce accessibility tables, and adjacency graphs (path and direction) to reach an object. Pedestrian activity can be considered to be an outcome of two distinct components: the configuration of the street network or urban space, and the location of particular attractions (shops, offices, public

buildings, and so on) on that network. The visual parser then ranks the routes to each object and describes them in tables.
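The node-counting cost described above ("the more nodes one has to pass through, the higher the circulation cost") can be sketched with a breadth-first search over an assumed access graph; the graph contents and names are illustrative only.

```python
# Ranking routes by circulation cost: fewer nodes on the path means lower cost.
from collections import deque

def shortest_route(graph, start, goal):
    """Breadth-first search over the access graph; returns the node path."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def rank_routes(graph, agent, objects):
    """Rank visible objects by the number of nodes on the route to each."""
    routes = {o: shortest_route(graph, agent, o) for o in objects}
    return sorted(routes.items(), key=lambda kv: len(kv[1]))

graph = {"agent": ["street"], "street": ["house", "square"], "square": ["church"]}
print(rank_routes(graph, "agent", ["house", "church"]))
# [('house', ['agent', 'street', 'house']),
#  ('church', ['agent', 'street', 'square', 'church'])]
```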

The linguistic parser

The language parser has to enforce a relation between a time-ordered sequence of words in a sentence and a corresponding time-ordered sequence of syntactic structures or parse trees, which are tagged by lexical category (for the definition of the prepositions, see Figure 9.4). The language parser's use of noun, verb, preposition, and adjective slots requires the sentence to be linked by a simple syntax. The proposed function of the language parser produces a corresponding parse-tree/semantic-structure pair. The language parser imposes compositional syntax on the semantic tree, relating the individual words to grammatical English and the lexicon. The parser directly represents the clauses of the grammatical syntactic functions and grammatical constituents. The language parser will be able to distinguish a command given in relation to the position of the object, and thus will differentiate between object positional reference systems, for example:

“Go along the street.”

The sentence is constructed thus: Verb – go, Preposition – along, Noun – street.

“Go to the left of the house.”

The sentence is constructed thus: Verb [Preposition, Noun], where the object-adjacent location is the target.

“Outside the church, go left.”

The sentence is constructed thus: [Preposition, Noun] Verb, Noun, where the object reference frame is the starting point and the agent reference frame is the end point.
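The three sentence patterns above can be distinguished with a toy slot parser. The tiny lexicon and the substring matching are assumptions made for the example; a real parser would need a full grammar and lexicon.

```python
# A toy slot parser for the three command patterns: it finds the verb, the
# preposition, and the noun, and uses their order to pick the pattern.

VERBS = {"go"}
PREPOSITIONS = {"along", "to the left of", "outside", "towards"}
NOUNS = {"street", "house", "church"}

def parse(sentence):
    s = sentence.lower().rstrip(".").replace(",", "")
    verb = next(v for v in VERBS if v in s.split())
    prep = next(p for p in sorted(PREPOSITIONS, key=len, reverse=True) if p in s)
    noun = next(n for n in NOUNS if n in s)
    # The order of the verb and the prepositional phrase decides the pattern.
    pattern = ("Verb [Preposition, Noun]" if s.index(verb) < s.index(prep)
               else "[Preposition, Noun] Verb")
    return {"pattern": pattern, "verb": verb, "prep": prep, "noun": noun}

print(parse("Go along the street."))
print(parse("Outside the church, go left."))
```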

The list of prepositional definitions is as follows:

“left/right/back/front side”
The chosen object needs to be analyzed for its orienting axis.
Process: We need to establish the reference system at a distance that shows the whole facade.

“Rotate”
Rotating the object requires the retrieval of the object's generating axis.
Process: Switch to an object panorama at the same distance from the object as the observer.

“Through”
‘Through’ depends on the opening/passageway within an object or between objects.
Process: Move through the opening and continue in the same direction, while relinquishing part of the control to the observer (left and right in relation to the observer).

“Around”
The term ‘around’ is defined as moving in relation to the circumference of an object. One directs the observer through the directed axes.
Process: Move the agent on a path neighboring the object in relation to the agent, while relinquishing part of the control to the observer (left and right in relation to the object), and stop at the furthest point from the agent.

“Along”
‘Along’ requires the retrieval of the object's directed axes.
Process: Move the agent on a path in relation to the object, while relinquishing part of the control to the observer (left and right in relation to the observer).

“Towards”
‘Towards’ moves the observer to the desired object.
Process: Move the agent to the object and wait for a further command.

“Zoom”
‘Zoom’ places the observer in front of the desired element within an object.
Process: If the agent is in front of the object, then elements of that object can be chosen.

Figure 9.4. Definition of the prepositions
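The preposition definitions above can be read as a dispatch table from preposition to movement routine. The handlers below are illustrative stubs paraphrasing the Process lines; the function names and return strings are assumptions.

```python
# Dispatch table: each preposition maps to a movement routine stub.

def handle_towards(agent, obj):
    return f"move {agent} to {obj}; wait for further command"

def handle_around(agent, obj):
    return (f"move {agent} on a path around {obj}; "
            f"stop at the furthest point, ceding left/right to the observer")

def handle_through(agent, obj):
    return f"move {agent} through the opening in {obj}; keep the same direction"

def handle_along(agent, obj):
    return f"move {agent} on a path following the directed axes of {obj}"

PREPOSITION_HANDLERS = {
    "towards": handle_towards,
    "around": handle_around,
    "through": handle_through,
    "along": handle_along,
}

def execute(prep, agent, obj):
    return PREPOSITION_HANDLERS[prep](agent, obj)

print(execute("towards", "agent", "house"))
# move agent to house; wait for further command
```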

The Linker

The linker component relates a time-ordered sequence of linguistic input to a sequence of cognitive structures that directs the visual output. The input is given in terms of a virtual visual location as well as a sentence that describes part of the environment and the process the agent has to perform. The linker then connects all possible sequences of the STATE with the EVENT to form connections with the objects in a scene in front of the user, and produces as an output a new STATE, a location. The linker has a list of prepositions that link objects to action. The linker relies on the object parser to determine the access graph and object adjacent

relations, as well as for topological examination. When other information is given, such as “Go to the front of the second building”, the linker compares the description produced by the system with the description produced by the user. The linker divides the sentence into the four categories of the path, the conceptual constraints. The syntactic analysis is constrained by the four modules of movement (see Chapter 5.2). The linker also relies on user profiles and knowledge rules.

This can be thought of as a set of regulations for preposition actions; when the system examines a sentence it must compare the new sentence with those rules. The pattern that emerges from those commands is as follows:

“Go left” Agent-centered – (Bounded)
I) GO(x- current position + Preposition)

“Go towards the house” Object-centered – (Directed)
II) GO(x- current position, TO (z an object))

“Go to the left of the house” Object-centered – (Converse)
III) GO(x- current position, TO (Preposition) + (z an object))

“Go along the street” Object-centered – (Transverse)
IV) GO([FROM(y an object) TO(x- current position, minimal distance) + (Preposition), TO(y an object) (x- current position, maximal distance)])

Chains
“Go from the house to the left of the town hall”
V) GO(x- current position, [FROM(y an object) + (Preposition), TO(z an object)])

“In front of the house go left”
VI) (Preposition) + (y an object) + GO(x- current position + Preposition)

The referring expressions of natural language will be just those expressions that map into the slots of the conceptual structure expressions.
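The six command patterns can be represented as conceptual-structure templates whose slots are filled by the referring expressions of the sentence. The encoding below (template names, slot names) is an assumption made for illustration.

```python
# Conceptual-structure templates for the six command patterns I-VI.

PATTERNS = {
    "I":   {"type": "bounded",    "slots": ["current_position", "preposition"]},
    "II":  {"type": "directed",   "slots": ["current_position", "object"]},
    "III": {"type": "converse",   "slots": ["current_position", "preposition", "object"]},
    "IV":  {"type": "transverse", "slots": ["from_object", "preposition", "to_object"]},
    "V":   {"type": "chain",      "slots": ["from_object", "preposition", "to_object"]},
    "VI":  {"type": "chain",      "slots": ["anchor_preposition", "anchor_object", "preposition"]},
}

def fill(pattern_id, **values):
    """Map referring expressions into the slots of the conceptual structure."""
    slots = PATTERNS[pattern_id]["slots"]
    missing = [s for s in slots if s not in values]
    if missing:
        raise ValueError(f"unfilled slots: {missing}")
    return {"type": PATTERNS[pattern_id]["type"],
            **{s: values[s] for s in slots}}

# "Go to the left of the house" -> converse (pattern III)
print(fill("III", current_position=(0, 0), preposition="left-of", object="house"))
```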

User profile

The computer system queries the end-user about his preferences for object examination or for grabbing frames, to suit the user type. The types of questions that the system asks are:

What type of action would you like to perform?

A. Visit locations? B. Architectural tour? C. Maintenance inspection?

Would you like to be close to the object or would you like to see the object’s outline?

Would you like to see the object from a frontal position or from the side?

Would you like to go directly or through a path?

Knowledge/production rules

In order to draw attention to an object, the user must specify the location of the object, as well as the new position. Once the information from the linker and parser correlate, the command statement should be executed.

1) IF THE LOCATION OF THE OBJECT IS KNOWN

AND THE OBJECT WITHIN THE POINT OF VIEW IS "DEFINED"
AND THE OBJECT AND DESCRIPTION MATCH

AND THE OBJECT HAS A REFERENCE SYSTEM
AND NO OTHER RELATIONS ARE SPECIFIED

THEN THE USER IS REFERRING TO AN INTRINSIC FRAME
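Production rule 1 maps directly onto a conjunction of conditions in code. The dict encoding of the object and all names below are assumptions; only the rule's logic is taken from the text.

```python
# Knowledge rule 1 expressed as a production rule: all conditions must hold
# for the conclusion ("intrinsic frame") to fire.

def rule_intrinsic_frame(obj, description_matches, other_relations):
    """IF the object's location is known AND the object within the point of
    view is defined AND object and description match AND the object has a
    reference system AND no other relations are specified,
    THEN the user is referring to an intrinsic frame."""
    if (obj.get("location") is not None
            and obj.get("in_view_defined")
            and description_matches
            and obj.get("reference_system")
            and not other_relations):
        return "INTRINSIC FRAME"
    return None

house = {"location": (10, 5), "in_view_defined": True,
         "reference_system": "front/back/left/right"}
print(rule_intrinsic_frame(house, description_matches=True, other_relations=[]))
# INTRINSIC FRAME
```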

In order to draw attention to an object, the user must specify the location of the object, as well as the new position. Once the information from the linker and parser correlate, the following movement should be executed.

Go left – Agent-centered – Relative frame of reference

Go towards the house – Object-centered – Intrinsic frame of reference

Go to the front of the house – Object-centered – Intrinsic frame of reference

Go left of the tree – Agent-centered – Relative frame of reference

Go south of the house – environment-centered – Absolute frame of reference

The semiotic distinction is at the heart of the theoretical system. It makes the distinction between the label signifier and the object image itself. The system differentiates between the semantic object and the signified, thus disambiguating some of the sentence's confusion between the intrinsic and relative frames (see Figure 9.5). In the case of a converse path, the system matches the topology of the object to the linguistic preposition. If the object has no reference system, the attention shifts to the agent; when the object has a geometric axis, the attention shifts to the object.

To this statement, we add the following rules:

2) IF THE IDENTITY OF THE OBJECT IS KNOWN

AND THE OBJECT HAS NO REFERENCE SYSTEM

AND THE SURFACE OF THE OBJECT IS "UNDEFINED"
AND THE OBJECT IS AN ARTEFACT

THEN THE USER IS REFERRING TO A RELATIVE FRAME

Figure 9.5. An example of the inference rule used in the sentence “go to the left of the house”

Also:

3) IF THE IDENTITY OF THE PATH IS KNOWN

AND THE PATH HAS AN ACCESS GRAPH

AND THE SURFACE OF THE PATH IS "DEFINED"
AND THE PATH IS AN ARTEFACT

THEN THE USER IS REFERRING TO A PRIMARY PATH

DEMONS

The following two rules examine the possible connection between the type of command and the preferred strategic means to achieve the desired route.

4) IF THE PREPOSITION THROUGH IS USED IN CONJUNCTION WITH A PATH

IF THE LOCATION OF THE PATH IS KNOWN
AND THE PATH WITHIN THE POINT OF VIEW IS "DEFINED"

AND THE PATH HAS AN ACCESS GRAPH

AND NO OTHER RELATIONS ARE SPECIFIED
THEN THE USER IS REFERRING TO A SCENIC PATH

5) IF THE PREPOSITION THROUGH IS USED IN CONJUNCTION WITH AN OBJECT

IF THE LOCATION OF THE PATH IS KNOWN
AND THE PATH WITHIN THE POINT OF VIEW IS "DEFINED"

AND THE PATH HAS AN ACCESS GRAPH

AND NO OTHER RELATIONS ARE SPECIFIED
THEN THE USER IS REFERRING TO THE SHORTEST PATH

The system will be able to distinguish a command given in relation to the position and direction of the object, and thus will differentiate between the object

directional reference system and the egocentric position, which creates a linguistic mirror effect. When we discussed the robotic system in Chapter 6.2, the frames of reference always assumed a relative frame, but people use an intrinsic frame of reference as well, and herein lies the confusion. For example, the front right and back left depend on the viewer position. In order to solve this, the system will adopt the viewer position. This does not mean that all human error will be handled by the system, but that the system will respond to corrections made by the user. The system now has the intended object and the preposition, and in this case also an adjective.

6) IF THE IDENTITY OF THE OBJECT IS KNOWN

AND THE OBJECT HAS A REFERENCE SYSTEM

AND THE OBJECT DIRECTIONAL AXIS IS NOT THE SAME AS THE AGENT'S
AND THE OBJECT IS AN ARTEFACT

THEN THE USER IS REFERRING TO A REFLECTIVE RELATIVE FRAME

The system will be able to distinguish a command given in relation to the position of the object, and thus will differentiate transverse action.

7) IF THE IDENTITY OF THE OBJECT IS KNOWN

AND THE OBJECT HAS A REFERENCE SYSTEM
AND THE OBJECT DIRECTIONAL AXIS IS NOT THE SAME AS THE AGENT'S

AND THE OBJECT IS AN ARTEFACT
AND THE PREPOSITION ALONG IS USED

AND THE AGENT POINT OF VIEW IS THE DOMINATING VIEW TO THE LEFT OR RIGHT OF THE OBJECT

THEN THE USER IS REFERRING TO THE LEFT OR RIGHT OF THE OBJECT

IF THE DISTANCES ARE EQUAL

THEN PROMPT THE AGENT FOR LEFT OR RIGHT DIRECTIONS

8) IF THE PREPOSITION PAST IS USED
AND IF THE IDENTITY OF THE OBJECT IS KNOWN

AND THE OBJECT HAS A REFERENCE SYSTEM

THEN GO TO THE BACK OF THE OBJECT
THEN LOOK AWAY FROM THE OBJECT FROM THE DIRECTION OF THE POINT OF DEPARTURE

9.3 Process of the system

Let us go through the process one more time and see the different phases of this multi-processing. In the first phase the user, looking at the screen, utters a command. It is then processed through technologies for the spoken language interface. According to Zue (1997), the spoken language interface can be broken down into ‘speech recognition’, which receives information from ‘speaker recognition’, which in turn receives information from ‘language recognition’. The command is then translated into text, and that is when our system starts to operate. This information flow can be displayed in basic ways through three elements: the perception of objects, the object in the virtual world, and lastly the language expression. The system gathers information into the following slots: Why, How, Where, What. Each slot is a semantic tree that answers or informs the linker about a specific task.

Why – User preference
Type of user

Manipulation mode preference
Viewer preference

In this module, the system displays a dialogue box which attempts to categorize the user into a type of potential navigator. In responding to the user's input, the system incorporates demons to help the user navigate.

How – Verbs – Mobility module

What is the objective of the task? What is the EVENT?
What kind of path type is it?

Is it a path or is it a location?

In this module the system searches the database for verbs that describe the action to be taken. It has to decide which type of path is to be taken, so it borrows from the orientation module and together they determine the path type: bounded agent, directed agent, converse path, and transverse path.

Where – Preposition – Orientation module

What preposition is used?
What object-attachment action is required?
Does it use an object-centered command or an agent-centered command?

The where module establishes the linguistic command's center of attention. In order to accomplish such a task it must use the identification module to determine the object's geometrical properties.

What – Nouns – Identification module – world's objects

Identify objects in the agent field of view. What is the STATE?
How many objects are there?

Identify each object position relative to the agent.
Identify each object and its attributes relative to the agent.

What are the axes of the targeted objects?
Identify the path in the agent field of view.

Identify path relation to objects.

The identification module determines the location and direction of the agent relative to the object in the agent's visual field. It parses the visual information to be retrieved by the other modules. Through accessibility graphs and adjacency graphs, a basic linguistic relationship of the objects in the agent's visual field is generated in the form of a table (see Chapter 8.2).

The semantic trees seek information to fulfill their objective. Once certain information nodes have been fired and the semantic net gives a reply, the system can then execute the command. As one can see, the knowledge that is required has its source in both the visual and linguistic parsers. Depending on the activation, the system has a different output. The linker applies a simple rule: if no object is activated, or the object has no reference frame, then it is agent-centered navigation; otherwise it is object-centered. The linker also matches the user's command description with the system's table of object locations relative to the agent.
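The linker's simple rule reduces to a small decision function. The dict encoding of the activated object is an assumption made for the example.

```python
# The linker's activation rule: no activated object, or no reference frame,
# means agent-centered navigation; otherwise navigation is object-centered.

def navigation_mode(activated_object):
    if activated_object is None or not activated_object.get("reference_frame"):
        return "agent-centered"
    return "object-centered"

print(navigation_mode(None))                                   # agent-centered
print(navigation_mode({"name": "house",
                       "reference_frame": "intrinsic"}))       # object-centered
```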

There are practical difficulties at this stage of development in adding more features within the confined time of a dissertation. Adding more features requires bringing in more modules, as is exemplified by the statement “Go to the left side of the house with a tree in front.” This is an instruction which contains a subset, a particular affirmative: some of a is b; or, if we were to alter the statement slightly, “Go to the left side of the house without a tree in front”: some of a is not b. All these types of reasoning within the sentence require the system to reason within the subject and predicate, which in turn requires ever-growing sentence complexity for the predicate parser, and ever-growing reasoning means and representations. The system also needs a topological visual Gestalt grouping mechanism so that contextually related objects can be referred to.

In addition, many linguistic phenomena remain unaccounted for in our grammar. In the theoretical world this is not a limit, but in the real world it does make a difference. The navigation system also has difficulties in extremely heterogeneous and homogeneous spaces, and in sparse and dense environments. The system is vulnerable, in navigation in the wild, to deficient or distorted language from the user in commanding the agent within virtual environments.

Concluding remarks

In this chapter, we have shown how a theoretical model might operate and perform on a broad range of movement. The bounded agent and directed agent use a different method of calculation, with a primitive spatial entity, while the converse and transverse use a region-based theory. The environment-centered directions are not included; these impose an independent coordinate system on the object (centroid). This system is not part of the dissertation and was not treated, although a simple polar coordinate scheme is implemented in existing GIS.

CHAPTER 10

USABILITY OF THE SYSTEM

10.1 Architectural case

Typical scenes that architects imagine the client might want, or that they might want themselves, are as follows:

“Go to the back of the building”

“Rotate the building”

“Go through the building”

“Go around the building”

Those sentences contain all the ingredients of the two higher-function commands, converse and transverse. The particular commands were chosen for their graphical effect; any other combination, as described, can be generated.

The following is a simulation of the system's performance for a user. In the initial condition the viewer sees the front of a building (see Figure 10.1). The agent/avatar is not seen at this point; only when the agent moves does the camera fall back and show the agent/avatar moving. (We will show the movement of the agent/avatar in the command “Go around the building”; see Figure 10.6.)

Figure 10.1. Initial condition

The user then commands the system: “Go to the back right of the building”.

To understand what is meant by this command we must break down the sentence into its components. Given the agent's position, we need to identify the building in question and analyze its geometric properties. We need to establish the prepositional relationship, i.e. the back of the building at a distance that shows the whole facade. The system then examines the object reference system, matches it to the “back right”, and establishes it as the new position. This is a converse command and follows Rule III and knowledge rules 1 & 3. In the demonstration the camera is in the new position (see Figure 10.2).

Figure 10.2. Back of the building

“Rotate the building”

Rotating the building requires the retrieval of the building's generative axis. In ‘rotate the building’, the system prompts the user to find which way to rotate the building and at what angle. In the case of array rotation, the system switches to a fixed-distance rotation around the building, and the viewer controls the rotation through left and right commands. This is a demon activation (see Figure 10.3).

Figure 10.3. Rotate the building

“Go through the building”

‘Go through the building’ is a term for the system to perform movement between two objects, or through an opening in a building; that is, the node has a two- or three-level structure. If the building has a complex layout, there are systems that recognize this and find the most efficient route. This is possible through a search such as the ‘A*’ algorithm, but it is not the purpose of this system to solve such problems. This is a transverse command using Rule IV, knowledge rules 1 & 3, and demon 7. In our case the system shows an intermediary image at the point that requires a decision (Figure 10.4), to which the user replies “Go left”.

Figure 10.4. Through the building

“Go around the building”

The term ‘around’ is defined as moving in relation to the circumference of an object. In addition, one can direct the observer through segmenting of the object reference system. The segmentation can be done by either referring to another goal, as in ‘circumvent the object’, or an even smaller section, as in ‘turn around the corner’. In either case of segmentation, movement is away from the object. In our case one can direct movement while traveling through the commands ‘look left’, ‘look right’, and ‘stop’. In the case of prepositions like ‘around’, the system starts to move the agent around the house (see Figures 10.5-10.6). The camera pulls back to reveal the agent moving along a path around the building. The user then commands the system to stop at the back of the building (see Figure 10.7).

Figure 10.5. Initial phase – The user viewing the buildings

Figure 10.6. Around the building – Turning with agent in view

Figure 10.7. Final phase – Turning endpoint seeing the building as the agent sees it

This simulation shows the need for such an architectural tool even in the most basic form of a single-building presentation. The system does not deal with rules of composition in an attempt to frame “the perfect view”.

10.2 Evaluation of the object-centered tool

The present research has attempted to provide insights into the process of navigation in order to improve the design process, offering the architect the ability to interact with objects as one moves in ‘virtual’ environments. The model augments elements of a route that work with an intrinsic frame of reference to move the observer. The proposed conceptual system is modular in the sense that presenting all the modules gives the observer the possibility to form the required system. The model represents the cognitive process as one moves in virtual environments. The interpretation of the model into a tool for the simulation of movement required the development of semiotics in order to construct the language command tool. As opposed to direct manipulation, the linguistic command of

movement in the virtual environment is without gesture, since movement is interpreted as a directional vector, an analogical spatial representation.

The following aspect of movement has been achieved:

The user-centered tool performs both agent-based and object-based navigation. That is, the user can refer to the movement of the agent as a base for movement, or the user can refer to an object as a reference point in a triangulation scheme using frames of reference. We have shown that of the four elements of the path (see Chapter 5.2), the transverse path with phrases like ‘through’ has a limit. The act requires a decision to be taken along the path, as for example in the command ‘walk through the building’. The user-centered tool is generic and exposes all possible types of movement as presented through logic and evident in language.

The object-centered tool enhances movement in virtual environments by introducing an agent, making it more vivid, and introducing language, making it more immersive. As was shown in the analysis of Tomb Raider, introducing an agent into the virtual environment causes the viewer to see the context of his environment from a point of view external to himself. For example, when the agent walks one can see the agent in contact with the ground, enhancing the haptic perception of that environment. The user-centered tool is also more immersive in the sense that it allows the user to be more engaged in the subject matter, and navigation becomes second nature.

Used ‘on the fly’, the user-centered tool's performance is enhanced by a more efficient and effective system. The user-centered tool will function better in an architectural office environment where the architect communicates with various parties while examining the various aspects of the three-dimensional project. The user-centered tool is effective in moving an agent through virtual reality, as shown in Table 4.2.

The object-centered language tool has integrated all modules of the visual interface into a unified whole. As was shown in the examination of existing visual programs (see Chapter 4), one module cannot perform all the tasks that are required, while the linguistic interface does not need any modules, and the interpretation of the verbal command is sufficient to direct the navigation (see Chapter 7).

The object-centered tool is easy to use and learn since it uses everyday language. Talking to an agent represented on the screen does not require any transference of hand movement direction to an agent in the virtual environment. As with directing people in the ‘real’ world, movement in a virtual world opens almost the same channels of communication. The ability of most people to communicate properly

was never in question, despite mounting evidence of people who are dyslexic and cannot perform this task. The primacy of vision, although recognized, should not interfere with the establishment of an alternative communication route.

The object-centered tool is limited in the ability to select objects in the field of view: as they increase, so do the reference strings, and so does the demand on computing power. The user-centered tool is also limited in the number of inferences that the system can make with the positional reference system. The system is bound to make misjudgments when the user switches from an in-front-of-the-object command to a back-of-the-object command using the intrinsic and relative frames of reference (see Chapter 9.3).

The object-centered tool would be difficult for one person to build since it requires a combination of many components, such as voice recognition, language understanding, and database knowledge, combined with a three-dimensional rendering engine. Thus a full demonstration of the tool is not possible at this stage of very limited resources, and only a thought experiment and a user-needs demonstration are possible.

10.3 Initial testing of the hypothesis

During the study, a test was conducted in order to determine the initial student reaction as to which type of instruction, direct manipulation or linguistic commands, is preferred by the user. The test did not simulate an architectural investigation but a wayfinding task. The test simulated the proposed interaction using a human mediator to perform the task of the system on users. The test compared two groups using a direct manipulation program (VRML - Cosmo Player) and the linguistic approach. The test entailed training eight students, in two groups of four, to be proficient users of VRML. Both groups performed the same set of tasks but in opposite order, on two different navigational courses. The users were then asked to fill in a questionnaire asking them to rank their preferences. If both groups had more or less the same preferences, then the data was not contaminated by the recent experimental experience.

The experiment took place in a simulated environment of the Jordaan, Amsterdam, with the same points as the tour guide (see the guided walk in Chapter 8.2). Course (1) is point [c] and course (2) is point [f]. The students were able to observe both points [c] and [f] from the existing location, point [x] (see Figure 10.8). The student instruction level was simple: the students used bounded and directed agent commands for the task. The location goal did not place any demands on the student to observe the location features. The students in the first group (A) learned VRML first, and were then told to navigate course (1) from point [x] to point [c]. Then group (B) verbally instructed group (A) to navigate course (2) from point [x] to point [f]. The second group (B), which first gave instructions to group (A), learned how to use VRML, and then navigated course (2) from point [x] to point [f]. Group (A) then gave verbal instructions to group (B) to navigate course (1) from point [x] to point [c]. The students received an initial observation point (Figure 10.8) and then proceeded to navigate at eye level.

The group of students selected came from across the European Union. They were in their fourth year of architectural education, aged 20-23. Out of a class of 12 students, eight volunteered. The questions they were asked were:

1) Which system do you prefer?

2) Why did you prefer one over the other?

3) Rank the difficulty of each task.

We did not pose to the students the question of whether they would need higher-function language to accomplish an architectural task. At the moment, direct manipulation programs do not have any higher-function tools to compare with language.

The results from the eight users show that all students preferred direct manipulation to voice commands.

1) They expressed a preference for being able to control movement.

2) They found it moderately difficult to express their intentions through language.

3) Ranking did not yield a consensus; three students from group A ranked the difficulty of giving instructions as moderate, while four students from group B and one from group A ranked the difficulty of giving instructions as considerable.

Figure 10.8 The test initial observation point

Discussion

The results do not support the hypothesis of language as the preferred navigational tool. The following is a discussion of why the students intuitively preferred direct manipulation.

To begin with, the architectural student group that was tested is biased towards direct manipulation, and also falls within the target group of computer players (aged 15-35). Hand-eye coordination presents the user with the pleasure of learning a new skill and controlling another medium with direct commands. The idea of the one-to-one relationship between one's action and the computer's reaction, i.e. the idea of the action of the user being analogical to the action on the screen, is captivating.

The students' direction instructions were in simple language, using expressions like "go forward" or "turn right"; that is, the students used bounded and directed agent commands. The students used language intuitively, with no clear sense of the difference between linguistic command-based manipulation and the direct manipulation of body movement. This result is not surprising, but why it is so has a more complex answer. The ranking shows that the students who had practiced language usage had an easier time. Perhaps if the students had practiced natural language for navigation they might have chosen language navigation over direct manipulation navigation, but as it stands the students have a bias towards visual language. This is the first part of the answer to why the students did not use linguistic higher functions. The other part has to do with the students' experience of language prepositions: during their studies, students have difficulty determining the frame of reference as they move around a building.

There is a difference between existing direct manipulation with 'bounded agent' and 'directed agent', which are low functions, and the linguistic conceptual commands 'converse path' and 'transverse path', which are high functions. There may be no advantage to verbal commands if they are at the same level as direct manipulation commands, e.g. "go straight" or "turn right", which can easily be done with direct manipulation. But they could have an advantage had they been at a higher level, e.g. "go around the building", which direct manipulation cannot do.
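The gap between the two levels can be sketched in code: a hypothetical planner (all names are illustrative assumptions, not taken from the dissertation's tool) expands the high-function command "go around the building" into the kind of low-function moves that direct manipulation already supports.

```python
import math

def circle_path(center, radius, steps=8):
    """Approximate a path around an object as waypoints on a circle
    centred on that object (a 'converse path' in this chapter's terms)."""
    return [(center[0] + radius * math.cos(2 * math.pi * i / steps),
             center[1] + radius * math.sin(2 * math.pi * i / steps))
            for i in range(steps + 1)]

def go_around(building_center, agent_pos):
    """Expand the high-level command into primitive 'move to' steps,
    keeping the agent's current distance from the building."""
    radius = math.dist(agent_pos, building_center)
    return [("move_to", wp) for wp in circle_path(building_center, radius)]

# Usage: nine waypoints trace one full circuit and return to the start.
commands = go_around(building_center=(0.0, 0.0), agent_pos=(10.0, 0.0))
```

A verbal interface can issue the single high-level utterance; a purely direct-manipulation interface forces the user to perform each primitive step by hand.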


The results do not support the hypothesis of language as a navigational tool, but they show the way a 'real' test could be conducted. Much more sophisticated testing would have to be developed, and users of many age groups and professions would be needed to establish user preferences.


CHAPTER 11

CONCLUSIONS; FUTURE AVENUES

The research has investigated the capability of computer navigational systems to allow the user intuitive interaction and movement in space along a desired trajectory. The tool uses a linguistic interface for navigation and manipulation in virtual three-dimensional environments. Existing visual systems can only point, whereas the new linguistic system can do more: it allows for converse and transverse paths. For this reason, the linguistic system was chosen, as well as a theoretical stance, to expose the visual instruction level. The mechanism of language understanding was examined and a way to address the imprecise and ambiguous nature of language was found. The mechanism of frames of reference helped to disambiguate linguistic spatial expressions. The tool contains a visual parser and a linguistic parser, based on a constrained domain. This linguistic system demonstrated that it could work more effectively than previous systems.

The problem that this dissertation tackled was how to overcome an existing software shortcoming, what I termed the 'flatness' of the computer screen: the inability to grasp and manipulate objects naturally. The aim was to improve the immersive vividness of navigational interaction based on the knowledge people have when they move through the built environment. A comparison between the visual and linguistic systems revealed the linguistic system's superiority in its ability to differentiate between different types of movement. The investigation focused on the linguistic mechanism of directing the movement of an agent immersed in a virtual environment; that is, the facilitation of an observer's interaction through agent-based and object-based frames similar to real-world experience.


11.1 Review of process

The study revealed that there are three basic frames identified with navigation in virtual environments: agent-centered, object-centered and environment-centered. Thus one can navigate in a virtual environment not just through agent-centered commands but also through reference to random objects in one's path, and this can be done verbally in virtual environments.
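As a hedged illustration of why the frame matters, the sketch below resolves the same spatial term, "left of", against different frame headings; the function and its parameters are assumptions made for illustration, not part of the tool described here.

```python
import math

def left_of(anchor, frame_heading, distance=1.0):
    """Return a point 'to the left of' an anchor, given the heading
    (in radians) supplied by whichever frame of reference is in force:
      agent-centered       -> the agent's current heading
      object-centered      -> the intrinsic facing of the reference object
      environment-centered -> a fixed world axis
    """
    left = frame_heading + math.pi / 2  # 90 degrees counter-clockwise
    return (anchor[0] + distance * math.cos(left),
            anchor[1] + distance * math.sin(left))

# The same utterance, "left of the door", lands at different points:
agent_view  = left_of((0, 0), frame_heading=0.0)          # agent faces east
world_frame = left_of((0, 0), frame_heading=math.pi / 2)  # fixed north axis
```

The utterance is constant; only the frame heading passed in changes, which is exactly the ambiguity a verbal navigation system must resolve.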

In the first phase of the dissertation, we critically examined various programs that facilitate movement in the virtual environment. The examination concluded that there is a phenomenon of 'flatness' of interaction: the gap between intuitive hand movement and visual response, the gap between what we see and what we expect to see. Finally, a matrix of all possible variations of visual interaction was introduced.

In the second phase, we discussed some of the limits of communication, as defined by visual and linguistic commands, and proposed a new linguistic model of communication. A navigational system of verbal commands and visual output was chosen, and some of the problems associated with it were addressed, mainly the different parameters of object grasping: fundamentally, point of origin, goal and direction.

The third phase examined the performance of such a system in the context of architectural office use. It examined an imaginary scenario of an architect's presentation of a project to his clients. The scenario developed is a compound of interviews conducted with architects, and it demonstrates the importance of such a system for architects.

11.2 Findings

The originality of the project lies in developing an explanation and representation of movement in environments that goes beyond the conceptual confines of current CAD-VR systems, leading to a system of integrated multiple frames of spatial reference for interacting in the virtual environment. Towards this goal, the research initially examined a number of typical cases of CAD-VR navigation systems and analyzed their performance. All those programs use direct manipulation: a virtual representation of reality is created, which can be manipulated by the user through physical actions like pointing, clicking, dragging, and sliding. The analysis showed the lack of realism of these systems, caused by the user's inability to interact naturally and effectively with the computer-based system to navigate and manipulate represented objects.

In addition, the proposed system uses an agent to enhance the feeling of vividness and immediacy: an avatar moving around the built environment that allows the user to directly control navigation while viewing objects as the avatar views them. This permits both agent-based and object-based navigation with a destination that is non-distal. The user can define the navigational movement of the agent as the basis of movement, or the user can refer to objects.

In the new system, the user requests a change of viewing position, expressed in natural language, while the output remains visual. In other words, the user talks to the agent and sees on the screen what the agent views in the virtual world. The mechanism of language understanding was examined and a way to address the imprecise and ambiguous nature of language was found. The mechanism of frames of reference helped us to disambiguate linguistic spatial expressions. In order to address the problem of the imprecise nature of language, the tool uses a user profile to predict possible intentions. We have stated that intention can be divided into an observational mode and a manipulative mode. The results were incorporated in the new system using four types of command: bounded agent, directed agent, converse path and transverse path.
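A minimal sketch of how an utterance might be routed to one of these four command types follows; the keyword rules are placeholder assumptions standing in for the tool's actual linguistic parser over its constrained domain.

```python
def classify(utterance):
    """Route an utterance to one of the four command types. The keyword
    rules here are illustrative placeholders, not the real parser."""
    u = utterance.lower()
    if "around" in u:
        return "converse path"    # circle an object while viewing it
    if "through" in u or "across" in u:
        return "transverse path"  # cross a region from side to side
    if " to " in u:
        return "directed agent"   # head toward a named goal
    return "bounded agent"        # local moves: "go forward", "turn right"

assert classify("go around the tower") == "converse path"
assert classify("walk through the courtyard") == "transverse path"
assert classify("go to the entrance") == "directed agent"
assert classify("turn right") == "bounded agent"
```

In the actual system the classification would of course rest on parsing and the frame-of-reference mechanism rather than keyword matching, but the four-way routing is the same.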

Tool development

The goal of this dissertation was to build a navigation system that is interactive and fulfils the requirements of the architect to explore new horizons in the virtual three-dimensional world. The system was built without predicate reasoning, due to time constraints. This means that there is a limit to how complex a sentence formation the system can handle.

The linguistic tool has its limits when navigating in extremely heterogeneous or homogeneous space, and in sparse or dense environments. The system is vulnerable to navigation in the wild (the natural environment), and to deficient or twisted language from the user in describing the environment.


Model development

This study developed a theoretical framework for a system that utilizes agent-centered and object-centered navigation to allow the user maximum flexibility. Since the tool is a linguistic tool, it can differentiate between agent-centered and object-centered commands, allowing it to differentiate between different types of path.

This research has presented an explanation of the specific experience of a person's movement in virtual space. For the purpose of the explanation, a model was developed to direct the movement of a person with a 'point of view' and with the ability to direct attention, through linguistic commands. The model is effective because it achieves a better matching of expectation to performance relative to previous tools, since it combines the different levels of representation. The tool extends the observer's ability to manipulate objects through frames of reference. It presents the possibility of traveling through and engaging objects in virtual environments.
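The separation between moving the body and directing attention can be sketched as independent position and gaze state; the class and method names below are illustrative assumptions, not the model's implementation.

```python
import math

class Agent:
    """Position and gaze held as independent state, so a command can
    redirect attention without altering the path being travelled."""
    def __init__(self, x=0.0, y=0.0, heading=0.0):
        self.x, self.y = x, y
        self.heading = heading  # direction of travel, in radians
        self.gaze = heading     # direction of view; may diverge

    def move_forward(self, d):
        self.x += d * math.cos(self.heading)
        self.y += d * math.sin(self.heading)

    def look_at(self, px, py):
        self.gaze = math.atan2(py - self.y, px - self.x)

a = Agent()
a.look_at(0, 5)    # attend to an object to the north...
a.move_forward(3)  # ...while still walking east
```

Because gaze and heading are separate, a verbal command like "look at the tower" only updates the view, matching the model's distinction between directing movement and directing attention.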

The tool had to overcome some difficulties inherent in language understanding that make language imprecise. The mechanism of frames of reference helped to disambiguate linguistic spatial expressions. Two mechanisms were employed to correct the problem: one is the user profile, through which the intention of the user is revealed; the other is the use of a reference system to establish direct location. Add to these the constrained domain, and one has a system that can be used to navigate in virtual built environments.

11.3 Future research

The approach that this research has taken has yet to be fully implemented in the virtual world. Once the tool has been developed, future research might expand our understanding of the way people navigate in the virtual environment. The tool allows for broad types of investigation into wayfinding (recall and inference) and can have a direct influence on our understanding of the physical structure of urban environments.

Future research could establish qualitative research methods in urban studies and media studies. The investigation could also concentrate, for example, on the translation of the linguistic system's functions back to the visual instruction level. This could be examined by simulating the proposed interaction of the visual manipulation engine compared with a control group using a human mediator (much like in the second set of scenarios described in Chapter 10.3).


Conclusion

We applied an interdisciplinary research method that draws on knowledge from cognitive science, computer science, vision and linguistic theory. We also utilized case studies to facilitate this interaction and examine its consequences.

We examined how one can improve navigation through the examination of existing tools. Through this examination, we demonstrated the inability of existing tools to interact with objects naturally. With further examination of visual and linguistic systems, we arrived at a linguistic system for navigation that is superior to existing systems. Within the limits of a scenario, we established the importance of such a system for architects as a presentation tool.


Appendix I

A scene from Sonja (see Chapter 7.3)

Figure 1.1

The Amazon is in the center of the picture; an amulet is directly to her right. The frog-like creature with ears is a demon; a ghost is directly below it. In the room there are two more amulets.

Figure 1.2

Sonja kills the ghost by shooting shuriken at it. A shuriken is visible as a small three-pointed star between the Amazon and the ghost.

Figure 1.3

Sonja shoots at the demon.

Figure 1.4

The Amazon throws a knife at the demon based on the instruction "Use a knife". The knife appears as a small arrow directly above the Amazon.

Figure 1.5

Sonja passes through the doorway of the room.


Figure 1.6

Sonja turns downward to get the bottom amulet.

Figure 1.7

Sonja has been told "No, the other one!" and heads for the top amulet instead.

Figure 1.8

Sonja has just picked up the amulet, causing it to disappear from the screen.


Appendix II

This is a compilation of interviews conducted in 2004 with architects in Rotterdam, the Netherlands.

Andrew Tang (MVRDV)

Flight Forum, Eindhoven, Business Park

Andrew: We were concerned with the grouping of several towers and also the relation to the main road with its services, i.e. a bus line and car transport.

From what points of view did you choose to examine the site?

Andrew: We looked at the project from the airport (since it is seen from the air). We also looked at the project from the point of view of a passenger in a car as they approach the site.

What were the objects you wanted to examine?

Andrew: I was interested in viewing the relationship between the towers. For that we generated an elevation along the line of the buildings. The map helped us to establish the distances between the buildings, but this is highly abstract. We also analyzed the movement of cars through the site. We did not generate a sequence of perspectives of the chronological movement of the passengers, which would have helped people understand and experience the site. There are many parameters for one to examine in this kind of project, and one of the dimensions missing from the plans and elevations is the three-dimensional view. There are many planes that you have to visualize three-dimensionally. As architects we are trained to work in two dimensions and think three-dimensionally, but the general public is not.

Is there some other point of view that you think was important but did not generate?

Andrew: We needed some perspectives from different points of the site which would have shown the relation between the different elements of the site in three dimensions. The model can only generate views from the parameters of the site; you are still not embedded in the site.

Eva Pfannes

OOZE Architects

Vandijk clothing store

Eva: I was concerned with how my client inhabited the space. I wanted to develop a structure to support the activity that takes place in the confined space, to use the 20-meter depth of the space with maximum flexibility.

From what points of view did you choose to examine the store?

Eva: We examined three points of view:

The entrance, from where the client looks into the store;

The way the client would look at the merchandise, looking at the left side of the store;

And from the back of the store looking out.

Eva: I then realized that we needed an overall view so we could examine particular areas of concern, so I generated a wide perspective view from the top to show the layout of the store. The client had problems understanding how particular situations would work, and I had to explain how the place would function. In order to overcome the limiting view of the plan, we ran several usability scenarios. So with my client I went through scenarios of shoppers, suppliers and attendants. For example, with the shopper, we examined how to attract him to the store and, once inside, what he would see and do next. Where are the dressing rooms, and can the shopper see them without hindrance? How would the merchandise be displayed, and how inviting is it?


Paul van de Voort

DAF

A house in Den Haag

The house design was part of a prearranged package for the client to choose from. This is a very restricted site, with requirements on pitched roofs, volume, gutters, etc. The reason such restrictions were imposed was an attempt to preserve the old nostalgic style. We were asked to generate a typical house. When the client came to us, he needed a bigger house, and that is how we met.

The client wanted gables at the front and a veranda at the back. When you look at the section, the house is divided into two parts: the front, with a garage and bedrooms on top, and the living room at the back. The front was more in keeping with the old nostalgic houses, while the back was more open.

From what points of view did you choose to examine the house?

The way we work is with models. We presented the client with a model and talked to him about the house. We always put in the human scale so the client can imagine himself in the house. This is the closest experience one can have to an actual walkthrough. We do not generate walkthroughs or single perspectives because they do not convey enough information, and a video walkthrough is too constrained for the client. There is always something that the client wants to see that is not included in the video.


References

Index

Axial Parts: Directed, Generating, Orienting
Observational mode / Manipulation mode
Converse / Transverse
Allocentric / Egocentric
Centroid
Cognitive maps
User activity Model
Telepresence Model
Augmented
Computer programs

3DS MAX® – CAD
Amazon's Sonja
CAVE™
Cosmo Player® – exploration program
Myst® – exploration quest games
Tomb Raider® – action games

Continuity constraint
Direct Manipulation
Spatial frames of reference

Absolute frames of reference
Intrinsic frames of reference
Relative frames of reference (deictic)

Gesture
Correspondence rules
Intention
Visual reasoning / spatial reasoning
Visual criteria
Sensorimotor information
Object rotation / array rotation

Orthographic
Panning / Zooming
Path structure
Perspective / Point of view
Pointing and labeling task
Orientation information

Polar co-ordinates
Object reference system

Prepositions of English
Features of an object
Reference system
Scene recognition

Event and State
Route knowledge
Semantic / Syntactic

Lexical Conceptual Semantics
Spatial cognition / Spatial language
Spatial depiction
Spatial description
Spatial representation

Visual information
Haptic information

Substantiality constraint / continuity constraint
Naive physics
Intuitive physics

Panoramic navigation
User activity
Visual criteria
Visual perception
Visual navigation
Intentional structure
Attentional state


Samenvatting

ABOUT THE AUTHOR

Asaf Friedman was born in Tel Aviv, Israel, in 1958. Prior to his doctoral studies, he was engaged in teaching architecture at several architectural and design schools, primarily at the Technion, teaching basic design courses, second-year design, and multimedia courses. In addition, he co-founded "Archimedia Ltd.", which provided architectural services to clients as well as visual and multimedia presentations. The firm produced 3D models, animation, and multimedia interactive presentations for major architectural and industrial design studios in Israel.

Between 1991 and 1993, he studied for his Master's degree at Pennsylvania State University, with a thesis entitled "The Architecture of the Face, Levinas' Theory of the Other". In this project, he showed the importance of non-representation and explored (what he termed) the 'architectural Other': that which evades representation. Emmanuel Levinas's notion of the face was examined with respect to the façade of buildings, specifically buildings presented in the works and writings of Leon Battista Alberti.

In the years 1988-1991, he worked in New York as an architect, and during that time he constructed a conceptual ideographic tool, which, together with the philosopher Dr. I. Idalovichi, he published as a book with Macmillan Maxwell Keter Publishing under the title "Eidostectonic: The Architectonic of Conceptual Ideographic Organon". In this text, he attempted to combine the conceptual and visual elements of semiotics. The relationship between representation and signification was examined, and a synthesis of ideograms and concepts was represented in a singular comprehensive tectonic system. The book concentrated on the notion of space and its historical conception from the ancient Greeks to the modern sciences. In 1986 he gained his Bachelor of Architecture degree from the Technion Israel Institute of Technology in Haifa, Israel.