
Research Collection

Doctoral Thesis

Quality aspects of multimodal communication: user perception and acceptance thresholds

Author(s): Zuberbühler, Hans-Jörg

Publication Date: 2003

Permanent Link: https://doi.org/10.3929/ethz-a-004583162

Rights / License: In Copyright - Non-Commercial Use Permitted


DISS. ETH NO. 15124

QUALITY ASPECTS OF MULTIMODAL COMMUNICATION: USER PERCEPTION AND ACCEPTANCE THRESHOLDS

A dissertation submitted to the

SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH

for the degree of

Doctor of Natural Sciences

presented by

HANS-JÖRG ZUBERBÜHLER

Dipl. Umwelt-Natw. ETH

born 11.02.1968

citizen of Urnäsch (AR)

accepted on the recommendation of

Prof. Dr. Dr. Helmut Krueger, examiner

Prof. Dr. Albert Kündig, co-examiner

Dr. Sissel Guttormsen Schär, co-examiner

2003

Acknowledgement

This thesis would not exist without the support and cooperation of a number of people, whom I would like to thank:

First and foremost, Prof. Dr. Dr. Helmut Krueger, my promoter, for his keen observation and his valuable advice. He provided an excellent research environment for the achievement of this thesis.

Furthermore, Prof. Dr. Albert Kündig for the hours we spent discussing, and for funding the QED-project, in whose frame I was writing my thesis.

Many thanks also to Dr. Sissel Guttormsen Schär, who introduced me to the world of scientific research. And to my other colleagues in the research group man-machine interaction, who have contributed in one way or another to making my time at the ETH one that I will always look back on with great pleasure: Marc Arial, Morten Fjeld, Christine Hitzke, Pamela Ravasio, Sam Schluep, and Phillipe Zimmermann.

Thanks also go to the QED-team members Alexander Braun and Patrik Estermann for the work they have done to implement the videoconference setup and to run experiments.

Special thanks to Kent Riopelle, who proofread my thesis and provided valuable feedback to improve its comprehensibility.

Finally, I would like to thank my parents and friends who supported and encouraged me. Most of all, I thank my partner Ruth for her continuous and loving support.

Zürich, August 2003 Hans-Jörg Zuberbühler

Table of Contents

Table of Contents

Abstract

Zusammenfassung

1 Transfer to Practice
    1.1 Regarding Human-Computer Interaction (HCI)
    1.2 Regarding Human-Human Interaction (HHI)

2 Introduction
    2.1 Background and Aims
    2.2 Scope of Investigation
        2.2.1 Delay as Quality of Service (QoS) Parameter
        2.2.2 Published Results for Perception and Acceptance of Delay
        2.2.3 A Psychophysical Approach
    2.3 Structure of the Thesis

3 Theory
    3.1 A Taxonomy of Communication
        3.1.1 Social context
        3.1.2 Orientation
        3.1.3 Coding
        3.1.4 Modality
        3.1.5 Timing
        3.1.6 Exemplification of the interpersonal communication model
    3.2 Processing Time of Auditory and Visual Stimuli
        3.2.1 Indirect: Reaction Time Differences
        3.2.2 Direct: Event-Related Potentials (ERPs)
    3.3 Mental Representation of Time
        3.3.1 Low Frequency Processing
        3.3.2 High Frequency Processing
    3.4 Neural and Cognitive Models of Time Perception
        3.4.1 Labelled Lines
        3.4.2 Population Clocks (Neural Networks)
        3.4.3 Pacemaker-Switch-Accumulator Models
    3.5 Psychophysical Theory for Measuring Thresholds
        3.5.1 Testing paradigms
        3.5.2 Specification of the Psychometric function ψ = f(φ)
        3.5.3 Adaptive Psychophysical Procedures

4 Experiments
    4.1 In Human-Computer Interaction (HCI) Mode
        4.1.1 Experimental Setup
        4.1.2 Procedure
        4.1.3 HCI-Results
    4.2 In Human-Human Interaction (HHI) Mode
        4.2.1 Experimental Setup
        4.2.2 Procedure
        4.2.3 HHI-Results

5 Discussion and Conclusions
    5.1 Regarding Relative Delays
        5.1.1 In Human-Computer Interaction (HCI)
    5.2 Regarding Absolute Delays
        5.2.1 In Human-Computer Interaction (HCI)
        5.2.2 In Human-Human Interaction (HHI)
    5.3 Further Research
        5.3.1 Relative Delay
        5.3.2 Absolute Delay

Annex
    Developed Software: The best-PEST Calculator
        Description
        Monte-Carlo-Simulations

References

Glossary

Index


Abstract

Recent trends in telecommunication networks indicate a shift away from the use of circuit-switched networks towards the use of packet-switched networks. This new networking environment will present end users with new characteristics such as variations in transmission delays and bit rates, as well as potential loss of data packets. These characteristics represent a challenge in the design and use of packet-switched networks, since they may lead to user impairments, depending on the kind of source coding and compression used in the end-systems.

It is generally agreed that very little is known about user expectations, perceptive mechanisms, and user behaviour in this new situation. As a consequence, it is presently difficult to base network engineering on proper traffic forecasts and real user requirements. This lack of knowledge is the driving force behind our work, which aims to examine user perception and acceptance of the Quality of Service (QoS) parameters absolute and relative delays (also referred to as roundtrip delay and synchronisation error).

In this thesis we investigated the perception and the acceptance thresholds for particular delay parameters using psychophysical methods, i.e. thresholds are obtained by means of empirical determinations applying either 2-alternative forced-choice or yes-no paradigms, and using the adaptive psychophysical procedure called best-PEST. The experiments are conducted in the interaction modes Human-Computer-Interaction (HCI) and Human-Human-Interaction (HHI), which evoke different delay perceptions. HCI delay thresholds are obtained using an experimental set-up that includes stimulus presentation, best-PEST algorithm, and data acquisition. It is implemented using the object-oriented scripting language Lingo. The experiments conducted in the HHI mode comprise threshold determinations in which the experimental subjects interact with each other over a videoconference that uses an ATM-network infrastructure. The experimental set-up consists of two or three videoconference stations connected via fibre passing through a system called ARES, which emulates the behaviour of ATM channels in real-time with the possibility to emulate performance degradations, such as delay or errors.

In the HCI mode the following thresholds are determined:

• Relative delay between auditory stimuli preceding visual stimuli (AV).

• Relative delay between visual stimuli preceding auditory stimuli (VA).

• Absolute delay between voice input and visual computer-generated response (VoiVis).

• Absolute delay between mouse input and visual computer-generated response (MouVis).

In the HHI mode the following thresholds are determined:

• Absolute delay in basic auditory interaction between two subjects (AudBas).

• Absolute delay in basic visual interaction between two subjects (VisBas).

• Absolute delay in realistic audio-visual interaction between three subjects (AudVisReal).

• Absolute delay in realistic auditory interaction between three subjects (AudReal).

The thresholds for relative delays are 71 (±17) ms for the AV condition, and 105 (±25) ms for the VA condition. The thresholds for absolute delay in HCI are 115 (±23) ms for the VoiVis condition, and 78 (±14) ms for the MouVis condition. In HHI the thresholds for absolute delays are 216 (±44) ms for the AudBas condition, and 237 (±92) ms for the VisBas condition. When subjects accomplish a realistic task, the perception threshold is 1220 ms and the acceptance threshold is 2080 ms in the AudVisReal condition. In the AudReal condition the perception threshold is 970 (±310) ms, and the acceptance threshold is 1760 (±410) ms. Age and gender of the experimental subjects have no significant effect (p>0.05) on these results.

To obtain psychometric functions, the experimental data of each condition are fitted using a logistic model. The benefit of such functions is that they give network planners, as well as content and service providers, a means to estimate which percentage of users is expected to detect and to reject a specific delay. This 'political' question is influenced by economic considerations, namely which price/performance ratio is intended to be offered to the user.

Furthermore, the relative delay thresholds are discussed in the light of neural processing times for different modalities, and the absolute delay thresholds are discussed with regard to their task dependency, represented by different degrees of interactivity.

Zusammenfassung (Summary)

Telecommunication networks are being converted from circuit-switched to packet-switched networks. As a consequence of this conversion, users are confronted with changed network properties, for example a variable throughput rate and transmission delay, but also losses of data packets. These new properties pose a challenge for the design and use of packet-switched networks, since they can impair the accustomed communication process.

So far, little reliable knowledge exists in this field, neither about how users perceive this new situation nor about how they behave overall. This complicates the design and dimensioning of telecommunication networks, since well-founded assumptions about user needs and reliable forecasts of network load are not available. The knowledge gap described is the driving force behind the present work, which investigates the perception and acceptance of the two quality-of-service parameters absolute and relative delay.

The perception and acceptance thresholds of the individual delay parameters are determined by empirical experiments using psychophysical methods. Either the 2-alternative forced-choice or the yes-no paradigm is applied, together with the adaptive psychophysical procedure best-PEST. The experiments are divided into the two interaction modes Human-Computer-Interaction (HCI) and Human-Human-Interaction (HHI), each of which evokes different delay perceptions. To determine the HCI delay thresholds, an experimental set-up is used that combines stimulus presentation, the best-PEST algorithm, and data acquisition, and that is programmed in the object-oriented scripting language Lingo. Experiments in the HHI mode, on the other hand, are carried out with a videoconference application running over an emulated ATM network. This set-up consists of two or three videoconference stations connected via optical fibre to the so-called ARES system. (The ARES system emulates the real-time behaviour of ATM channels and makes it possible to simulate targeted performance degradations with respect to delay and error behaviour.)

In the HCI mode the following thresholds are determined:

• Relative delay between auditory stimuli preceding visual stimuli (AV).

• Relative delay between visual stimuli preceding auditory stimuli (VA).

• Absolute delay between voice input and visual computer-generated response (VoiVis).

• Absolute delay between mouse input and visual computer-generated response (MouVis).

In the HHI mode the following thresholds are determined:

• Absolute delay in basic auditory interaction between two subjects (AudBas).

• Absolute delay in basic visual interaction between two subjects (VisBas).

• Absolute delay in realistic audio-visual interaction between three subjects (AudVisReal).

• Absolute delay in realistic auditory interaction between three subjects (AudReal).

The thresholds for relative delays are 71 (±17) ms in the AV condition and 105 (±25) ms in the VA condition. The thresholds for absolute delay in HCI are 115 (±23) ms in the VoiVis condition and 78 (±14) ms in the MouVis condition. In HHI the thresholds for absolute delay are 216 (±44) ms in the AudBas condition and 237 (±92) ms in the VisBas condition. When the subjects have to reproduce realistic conversation situations, their perception threshold lies at 1220 ms and their acceptance threshold at 2080 ms (AudVisReal condition). In the AudReal condition these values are 970 (±330) ms for perception and 1760 (±410) ms for acceptance. Neither the age nor the gender of the subjects has a significant influence (p>0.05) on the thresholds.

To obtain psychometric functions from the experimental data, logistic curves are fitted for all conditions. The benefit of these functions is that network planners as well as providers of content and services can estimate what share of users will notice and/or reject particular delay values. This 'political' question is substantially influenced by economic considerations, namely which price/performance ratio is to be offered to customers.

Furthermore, the relative delay thresholds are discussed in the light of the neural processing speed for different modalities. And absolute delay thresholds are discussed with regard to their dependence on the tasks to be carried out, which are characterised by different degrees of interactivity.

1 Transfer to Practice

This chapter compiles the results of the thesis that are directly transferable to fields of practice. At first a brief summary of the background and then the motivation for the thesis is presented. Subsequently qualitative results are discussed, and finally quantitatively listed in diverse tables, each consisting of user percentages for particular delay types, and for the two interaction modes, Human-Computer-Interaction (HCI), and Human-Human-Interaction (HHI).

In recent times, the underlying technology of public network infrastructures has undergone a radical change from circuit-switched to packet-switched technology. The original reason for this change is that with new packet-switched technologies, for instance IPᵃ, the infrastructure can be operated with better capacity utilisation, since several data streams can be multiplexed. This is in contrast to traditional circuit-switched technologies, with ISDN serving as the most service-wise advanced example, where certain circuits are reserved for the respective services. Another reason for the change is that packet-switching better matches the characteristics of computer-generated data. This is crucial considering the spread of computers acting as end-systems. Regarding the characteristics of these two technologies it appears that - on the one hand - packet-switching results in higher network efficiency, but - on the other hand - causes lower network predictability in terms of the Quality of Service (QoS): variations in transmission delays and bit rates, as well as potential loss of data packets, are more likely to occur.

ᵃ All acronyms and abbreviations are explained in the glossary beginning on page 121.

These new characteristics may lead to user impairments, depending on the application used. For instance, real-time applications like voice-over-IP (VoIP) or videoconferences are considered most critical in terms of delay. In this context, the question is at which threshold value a delay becomes perceivable, and at which threshold value it becomes perturbing. In this thesis we decided to investigate these two delay thresholds by means of a psychophysical approach. As a quantifiable result of the thesis, so-called psychometric functions are obtained, which describe the user detectability and acceptance of different delay values. These functions are listed at the end of this chapter. Before that, we take a look at the results of the thesis that are of a rather qualitative nature.
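The abstract names best-PEST as the adaptive procedure used within this psychophysical approach. As a rough illustration of the general idea behind such a procedure (each trial is presented at the current maximum-likelihood estimate of the threshold), the following Python sketch simulates a single run. It is not the Lingo implementation used in the thesis: the logistic shape with a fixed slope of 20 ms, the candidate range of 0 to 400 ms, the two fictitious start trials, and the deterministic simulated observer with a 115 ms threshold are all assumptions made for the example.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def best_pest(respond, lo=0, hi=400, slope=20.0, n_trials=40):
    """Maximum-likelihood threshold tracking (assumed parameters).

    respond(delay_ms) returns True if the simulated subject detects the delay.
    """
    candidates = list(range(lo, hi + 1))
    # Assumed start: one fictitious "detected" trial at hi and one
    # "not detected" trial at lo, so the first stimulus lies mid-range.
    loglik = [log_sigmoid((hi - t) / slope) + log_sigmoid((t - lo) / slope)
              for t in candidates]
    for _ in range(n_trials):
        # Present the stimulus at the currently most likely threshold.
        stimulus = candidates[loglik.index(max(loglik))]
        sign = 1.0 if respond(stimulus) else -1.0
        # Add the log-probability of the observed response for every candidate.
        loglik = [ll + log_sigmoid(sign * (stimulus - t) / slope)
                  for ll, t in zip(loglik, candidates)]
    return candidates[loglik.index(max(loglik))]


# Hypothetical observer who detects every delay of 115 ms or more.
estimate = best_pest(lambda delay_ms: delay_ms >= 115)
print(estimate)
```

With a real subject the responses are noisy, so the estimate would settle more slowly than in this idealised run.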

The experiments showed that perception and consequently acceptance of delays are very much task-dependent. Therefore it is probably not helpful to recommend universal threshold values; rather, they should be suggested for different task categories. It seems that the degree of interactivity acts as the most delay-sensitive property of any communication scenario. For the time being, the choice of offered delay values should remain a business strategy of the service provider. In order to base this strategy on a reliable foundation, it will be helpful to classify the abundance of relevant task categories, and to assess the proper delay thresholds for these categories separately. With such knowledge it will be possible to adjust the delay values according to the measured degree of interactivity. Given the associated psychometric function, the delay can be set according to a predefined (or negotiated) percentage of users perceiving or accepting this particular delay.
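Setting a delay from a predefined percentage amounts to inverting the psychometric function. The sketch below assumes the two-parameter logistic form ψ(φ) = 1 / (1 + exp(-(φ - midpoint) / scale)); the abstract states that a logistic model was fitted, but this exact parameterisation and the function names are assumptions. The two anchor points (25 % at 53 ms, 75 % at 101 ms) are the AV values listed in Table 1 of this chapter.

```python
import math


def logistic_from_quantiles(p1, x1, p2, x2):
    """Midpoint and scale of a logistic curve through two (share, delay) points."""
    logit1 = math.log(p1 / (1 - p1))
    logit2 = math.log(p2 / (1 - p2))
    scale = (x2 - x1) / (logit2 - logit1)
    midpoint = x1 - scale * logit1
    return midpoint, scale


def delay_for_share(share, midpoint, scale):
    """Delay in ms at which the given share of users detects the delay."""
    return midpoint + scale * math.log(share / (1 - share))


def share_for_delay(delay_ms, midpoint, scale):
    """Share of users expected to detect a delay of delay_ms."""
    return 1 / (1 + math.exp(-(delay_ms - midpoint) / scale))


# Anchor points taken from the AV column of Table 1.
midpoint, scale = logistic_from_quantiles(0.25, 53, 0.75, 101)

# Delay budget if at most 10 % of users may notice the asynchrony.
budget_ms = delay_for_share(0.10, midpoint, scale)
print(round(midpoint, 1), round(scale, 1), round(budget_ms, 1))
```

The resulting budget of about 29 ms agrees with the 10 % entry of the AV column, which suggests that the tabulated values follow a curve of this shape.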

Additionally, the experiments of this thesis showed that delay perception and acceptance are influenced not only by the degree of interactivity but also, to a great extent, by the number of communication channels the application offers: it seems that the visual channel in an audio-visual application acts as a distractor, i.e. the focus of attention is divided between the audio and the visual channel. Thus, the gain of 'media richness' in audio-visual communication has to be paid for by a loss of focussed perception. Furthermore, the number of participants in the communication event turned out to be another distractor: it seems that the focus of attention is divided among all communication members. With an increasing number of participants, this results in a decrease of attentional resources for the detection of delay.

In summary it appears that the following three factors are responsible for the users' delay requirements. All of them result in high delay-tolerance, since they evoke poor delay perception:

• Low degree of interactivity.

• Increasing number of participants.

• Transition from mono- to multimodal communication.


Suboptimal communication support, e.g. missing gaze awareness in the videoconference technology, could be mentioned as a fourth factor that leads to higher delay-tolerance. Without gaze awareness the communication members are not sure when they are addressed - unless they are explicitly addressed verbally. This again lowers the degree of interactivity, and might be the reason why the results of the conducted experiments suggest acceptable delay values for realistic audio-visual tasks that are well above the values suggested elsewhere for audio-only tasks. Nevertheless, we have to bear in mind two things:

• Users still have little practice with multiperson, multimodal telecommunication services. With the upcoming use of such services, users will most probably improve their delay perception skills (i.e. they will avail themselves of free attentional resources that are no longer needed to cope with the new technology).

• The psychophysical methods used in the experiments do not allow conclusions about long-term effects regarding what Wilson calls user costs (Wilson et al., 2000). Although users may not perceive a particular delay as disturbing, it may subconsciously increase mental strain. A technology which evokes such effects contradicts the user-centred paradigm.

The remainder of this chapter presents the quantitative results of the conducted experiments. They provide insight into the users' perception performance for different delay types in different modalities. For the reasons mentioned above, the results of the realistic tasks (Table 4) should not be applied to situations where only two partners communicate.

The results are divided into Human-Computer-Interaction (HCI) and Human-Human-Interaction (HHI). These are two fundamental interaction modes resulting in different delay perception. A further distinction concerns the types of delay: Relative delay is perceived between particular modalities, e.g. between the auditory and the visual channel. This delay is sometimes called intermedia synchronisation, synchronisation error, or lip synchronisation. The other type is called absolute delay. It is perceived only in dialogue settings, between sending information and receiving an answer from the dialogue partner. This delay is sometimes called roundtrip delay or return trip delay.
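The two delay types can be made concrete with timestamps. The following minimal sketch uses hypothetical helper names and made-up timestamps in milliseconds; the sign convention (positive skew when audio precedes video) is an assumption chosen to match the AV and VA condition labels.

```python
def relative_delay_ms(audio_ms, video_ms):
    """Signed skew between modalities; positive when audio precedes video."""
    return video_ms - audio_ms


def skew_condition(skew_ms):
    """Map a signed skew onto the condition labels AV and VA."""
    if skew_ms > 0:
        return "AV"
    if skew_ms < 0:
        return "VA"
    return "synchronous"


def absolute_delay_ms(sent_ms, answer_received_ms):
    """Time between sending information and receiving the partner's answer."""
    return answer_received_ms - sent_ms


# Hypothetical example: the sound of an event is played out 60 ms before its image.
skew = relative_delay_ms(audio_ms=1000.0, video_ms=1060.0)
round_trip = absolute_delay_ms(sent_ms=5000.0, answer_received_ms=5230.0)
print(skew, skew_condition(skew), round_trip)
```

Note that a roundtrip measured this way also contains the partner's own response time, so it only bounds the network-induced part of the absolute delay from above.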

The benefit of the following tables is that they give network planners and content providers a means to estimate which percentage of users is expected to detect and to reject a specific delay. This 'political' question is influenced by economic considerations, namely which price/performance ratio is intended to be offered to the user.

1.1 Regarding Human-Computer Interaction (HCI)

In HCI, we ran experiments determining the relative and the absolute delay thresholds. Results concerning relative delays are available for situations where audio precedes the associated visual stimulus (condition AV), as well as for the opposite stimulus order (condition VA) (see Table 1). The results are considered suitable for the most stringent requirements, i.e. for tasks facilitating the perception of asynchrony.

Further results concern the detection of absolute delays in situations where users experience a delay between their vocal input and a computer-generated visual response (condition VoiVis), or between their mouse input and a computer-generated visual response (condition MouVis) (see Table 2). These results are considered suitable for applications that e.g. enable voice recognition, or that are driven by mouse pointer or joystick inputs (e.g. database queries, browsing the WWW, or image processing software). Additionally, the absolute delay thresholds in HCI can also be used to analyse the HHI thresholds described later, since they represent a component inherent to all network-mediated HHI.

Table 1 Relative delay values perceived by particular percentages of users.

Reading example: It can be expected that not more than 25 % of users will detect an AV-delay of 53 ms, and a VA-delay of 74 ms, respectively.

Percentage of users      Extent of asynchrony when        Extent of asynchrony when
detecting asynchrony     auditory precedes visual (AV)    visual precedes auditory (VA)
in HCI, ψ [%]            φ [ms]                           φ [ms]

 5                        12                               34
10                        29                               50
25                        53                               74
33                        61                               67
50                        77                               98
67                        92                              113
75                       101                              122
90                       125                              146
95                       141                              162

Table 2 Absolute delay values perceived by particular percentages of users.

Reading example: It can be expected that up to 75 % of users will detect an absolute delay of 146 ms when interacting by voice. And up to 75 % of users will detect an absolute delay of 96 ms when interacting by mouse clicks.

Percentage of users      Absolute delay in                Absolute delay in
detecting absolute       VoiVis interaction mode          MouVis interaction mode
delays in HCI, ψ [%]     φ [ms]                           φ [ms]

25                        50                               33
33                        67                               45
50                        98                               65
67                       129                               85
75                       146                               96
90                       195                              128
95                       228                              149
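As an illustration of how a logistic psychometric function relates to such a table, the sketch below fits a straight line through logit(ψ) versus φ by ordinary least squares and converts it into a midpoint (the 50 % point) and a scale parameter. This is a simplification under stated assumptions: the thesis fitted its logistic model to the experimental data, whereas this example only uses the tabulated voice and mouse values of Table 2, and the function name `fit_logistic` is hypothetical.

```python
import math


def fit_logistic(shares_pct, delays_ms):
    """Least-squares line through logit(share) versus delay.

    Returns (midpoint, scale) of psi = 1 / (1 + exp(-(phi - midpoint) / scale)).
    """
    logits = [math.log(p / (100 - p)) for p in shares_pct]
    n = len(delays_ms)
    mean_x = sum(delays_ms) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays_ms, logits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope, 1 / slope


# Values transcribed from Table 2.
shares = [25, 33, 50, 67, 75, 90, 95]
voice_ms = [50, 67, 98, 129, 146, 195, 228]
mouse_ms = [33, 45, 65, 85, 96, 128, 149]

voice_mid, voice_scale = fit_logistic(shares, voice_ms)
mouse_mid, mouse_scale = fit_logistic(shares, mouse_ms)
print(round(voice_mid), round(mouse_mid))
```

The fitted midpoints land close to the tabulated 50 % values of 98 ms (voice) and 65 ms (mouse), as expected if the table was generated from a logistic curve.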


1.2 Regarding Human-Human Interaction (HHI)

In the HHI mode we ran experiments determining absolute delay thresholds. Results are available for delay perception in basic auditory and visual interaction (conditions AudBas and VisBas) (see Table 3). Since these experiments evoked a maximal degree of interactivity, the results are considered to represent the minimal delay users can perceive when interacting with each other. Further HHI experiments concern perception and acceptance of absolute delays when users carry out realistic tasks (conditions AudVisReal and AudReal) (see Table 4). Note that these results hold only for the chosen task (free discussion about a familiar topic) involving three participants. Since other tasks might evoke different degrees of interactivity, they are assumed to allow for different delay values.


Table 3

Reading example: It can be expected that up to 75 % of users will detect an absolute delay of 228 ms in auditory HHI. And up to 75 % of users will detect an absolute delay of 239 ms in visual HHI.

Percentage of users      Absolute delay in                Absolute delay in
detecting absolute       auditory HHI (AudBas)            visual HHI (VisBas)
delays in HHI, ψ [%]     φ [ms]                           φ [ms]

 5                       109                              109
10                       131                              133
25                       164                              169
33                       175                              181
50                       196                              204
67                       217                              227
75                       228                              239
90                       261                              275
95                       283                              299

Table 4

Reading example: It can be expected that not more than 33 % of users will detect an absolute delay of 734 ms when interacting audio-visually, and 535 ms when interacting solely in the auditory mode. And not more than 33 % of users will find that an absolute delay of 1610 ms is disturbing when interacting audio-visually, or 1430 ms when interacting in the auditory mode. Note that these values hold only for the chosen task.

Percentage of users      Perception of absolute delay      Acceptance of absolute delay
detecting or accepting   in realistic     in realistic     in realistic     in realistic
absolute delays in       audio-visual     audio-only       audio-visual     audio-only
HHI, ψ [%]               task             task             task             task
                         (AudVisReal)     (AudReal)        (AudVisReal)     (AudReal)
                         φ [ms]           φ [ms]           φ [ms]           φ [ms]

 5                       n.a.             n.a.             n.a.              617
10                       n.a.             n.a.              629              889
25                        466              386             1350             1290
33                        734              535             1610             1430
50                       1220              800             2080             1690
67                       1710             1070             2550             1940
75                       1970             1220             2810             2090
90                       2730             1640             3530             2480
95                       3240             1920             4030             2750
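For delay values that fall between the tabulated entries, a planner could interpolate. The following sketch uses piecewise-linear interpolation on the AudVisReal columns of the table above; the linear interpolation and the helper name `share_at` are assumptions for illustration, whereas the thesis describes the underlying curves with a logistic model.

```python
def share_at(delay_ms, delays_ms, shares_pct):
    """Piecewise-linear share of users at delay_ms; None outside the table."""
    points = list(zip(delays_ms, shares_pct))
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if d0 <= delay_ms <= d1:
            return s0 + (s1 - s0) * (delay_ms - d0) / (d1 - d0)
    return None


# AudVisReal columns of Table 4 (entries marked n.a. are omitted).
perceive_ms = [466, 734, 1220, 1710, 1970, 2730, 3240]
perceive_pct = [25, 33, 50, 67, 75, 90, 95]
disturb_ms = [629, 1350, 1610, 2080, 2550, 2810, 3530, 4030]
disturb_pct = [10, 25, 33, 50, 67, 75, 90, 95]

delay = 1220
noticing = share_at(delay, perceive_ms, perceive_pct)
disturbed = share_at(delay, disturb_ms, disturb_pct)
print(noticing, round(disturbed, 1))
```

In this example about half of the users would notice a 1220 ms delay in the realistic audio-visual task, while roughly a fifth would find it disturbing.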

2 Introduction

In this chapter we expose the reasons that motivated us to investigate quality issues in multimodal real-time communication. To begin, we briefly describe the state-of-the-art in telecommunication technology and outline user impacts of such technology. Subsequently the generic approach is narrowed down to fit the actual scope of the investigation, pointing out the psychophysical approach for measuring delay perception by means of a global model of the users' perception and acceptance of environmental stimuli. Lastly the structure of the thesis is presented.

2.1 Background and Aims

Recent trends in telecommunication networks indicate a shift away from the use of circuit-switched networks (with ISDN serving as the most technologically and service-wise advanced example) towards the use of packet-switched networks (e.g. IP, ATM or MPLS) (Coffman et al., 1998). Thus, most operators of public networks plan to migrate their core network infrastructure to a universal, service-independent system operating in a packet-switched mode. The original motive for using packet-switching stemmed from the idea that the existing infrastructure could be used more efficiently by multiplexing several data streams. In the meantime, however, it has become clear that this focus is no longer sufficient. Rather, packet-switching better matches the characteristics of computer-generated data. The traditional telecommunication networks (e.g. ISDN) have essentially been characterized by the following properties:

• Almost constant and, in the case of wireline transmission, low delay.

• Very low error rates for the fixed network.

• Network services associated with constant bandwidth.

In contrast, the new networking environment will present end users with new characteristics like:

• Variations in transmission delay.

• Variations in bit rates.

• Potential loss of data packets.

Thus it appears that with packet-switched networks the users cannot count on a stable Quality of Service (QoS)ᵇ anymore. These new characteristics represent a challenge in the design and use of packet-switched networks, since they may lead to user impairments, depending on the kind of source coding and compression used in the end-systems.
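To make the characteristics listed above tangible, the following sketch derives simple QoS figures from a packet trace: mean transit delay, jitter expressed as delay variance, loss rate, and throughput. The trace format, the field order, and the numbers are invented for illustration, and the packet loss rate is used here as a stand-in for an error rate; none of this describes the measurement set-up of the thesis.

```python
from statistics import mean, pvariance

# Hypothetical trace: (send time in ms, receive time in ms or None if lost,
# payload size in bytes).
trace = [
    (0, 40, 1000),
    (20, 70, 1000),
    (40, None, 1000),
    (60, 100, 1000),
    (80, 130, 1000),
]


def qos_summary(packets):
    """Summarise delay, jitter, loss and throughput of a packet trace."""
    received = [(s, r, n) for s, r, n in packets if r is not None]
    delays = [r - s for s, r, _ in received]
    first_send = min(s for s, _, _ in packets)
    last_receive = max(r for _, r, _ in received)
    duration_s = (last_receive - first_send) / 1000.0
    return {
        "mean_delay_ms": mean(delays),
        "jitter_ms2": pvariance(delays),  # delay variance
        "loss_rate": 1 - len(received) / len(packets),
        "throughput_bps": 8 * sum(n for _, _, n in received) / duration_s,
    }


summary = qos_summary(trace)
print(summary)
```

In a circuit-switched setting the delay variance and the loss rate of such a trace would be close to zero, which is the stability that users can no longer count on.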

It is generally agreed that very little is known about user expectations, perceptive mechanisms, and user behaviour in this new situation (Bouch et al., 2000b). Furthermore, it is not yet known how objective system quality relates to users' subjective perceptions of quality. The reason for this situation is that to date the majority of research on QoS is systems-oriented, focusing on traffic analysis, scheduling, and routing. Relatively little attention has been paid to user-level QoS issues (Bouch et al., 2000a). Moreover, it is not yet known if and how users make trade-off decisions between varying quality performance and cost. As a consequence, it is presently difficult to base network engineering on proper traffic forecasts and real user requirements.

ᵇ The basic Quality of Service (QoS) parameters are: throughput, transit delay, jitter (delay variance), and error rate. For the numerous definitions of the QoS-concept see Fluckiger (1995). Note that the QoS concept is also applied to a broader scope including e.g. picture and sound quality as well as security aspects.

At the same time, the range of applications run by users is growing considerably (Odlyzko, 2000), from traditional point-to-point phone calls to sophisticated computer-based applications, involving both users and servers. In addition, the last few years have seen mobile communication become all-pervasive, with telephony and short message services (SMS) dominating. With mobile users, yet another phenomenon is observed: Many such users value the ability to communicate freely at least as highly as some performance measures for the actual service. Two examples may illustrate this observation:

• Audio quality of mobile end devices is obviously very often tolerated at a

level considerably below POTS standards.

• SMS users accept an extremely unwieldy user interface.

This observation is in some ways analogous to the well-known masking effects in auditory perception (Zwicker et al., 1967), where certain stimuli are not noticed when some other stimulus is present at a level above a certain threshold. Such effects may possibly be generalized to a fundamental 'masking principle' where impairments are judged in the light of the attained benefits, i.e. perturbing stimuli could be masked by more valued stimuli. However, the inverse effect also exists, describing a 'negative masking', referred to here as an 'amplifying principle', where negative circumstances amplify a perturbing stimulus. This might be the case, for example, in emergency situations where a lack of effective communication quality may cause adverse effects. In contrast to the quality factors based on technological sourcesᶜ, masking and amplifying quality factors can hardly be controlled, since they are based on contextual and psychological causes.

To recapitulate, it appears that evolutions in telecommunication technology as well as the growing number of applications deployed by users reveal a broad field of unanswered questions concerning quality issues. This lack of knowledge is the driving force behind our work, which aims to examine the end-user's perception and acceptance of QoS parameters, thereby answering the question:

• How do network-induced impairments affect the interaction of two or more users of a telecommunication system?

The investigations to answer the above question are undertaken in the framework of a project called QED (see footnote d) (Kündig et al., 2001), which aims at making a substantial contribution towards quality-based network engineering, emphasising multimodal person-to-person and person-to-computer communication, and thus a user-centred perspective.

c Which in fact are addressed by the Quality of Service (QoS) concept.

d QED is the acronym for Quality of Service Expectations for Real-Time Dialogue Communication, which is accomplished in collaboration with Albert Kündig and Alexander Braun from the Computer Engineering and Networks Laboratory (TIK) of the Swiss Federal Institute of Technology Zurich (ETHZ). For his part, Alexander wrote a thesis (Braun, 2003) with emphasis on technological aspects.

2.2 Scope of Investigation

We will now briefly describe the scope of the investigations undertaken in this thesis. For this purpose, the following hierarchical diagram (see Figure 1) classifies some outstanding attributes of communication. The chosen emphasis is drawn bold, whereas situations not in our focus are drawn grey.

[Figure 1: tree diagram; node labels recoverable from the original: Communication, Technologically-Mediated, Real-Time (Synchronous)]

Figure 1 Tree diagram showing the chosen emphasis on technologically mediated dialogue communication in real-time. Note that not all possible connections are depicted.

The emphasis on technologically mediated dialogue communication in real-time has

been chosen for two essential reasons:

• Promising Future Applications

• Predictable User Expectations


In the following the two reasons are briefly explained.

Promising Future Applications


Real-time dialogue communication between users will - despite upcoming new types of applications - most probably remain an important and revenue-generating application in both fixed and mobile telecommunication, and in both private and business communication. Examples of such applications are pure videoconference applications, CSCW tools, or services using UMTS technologies.

Predictable User Expectations

Face-to-face communication between people - which is the unmediated counterpart to technologically mediated communication - requires extremely sophisticated and well-trained pattern recognition skills: in contrast to computer-based pattern recognition, humans are capable of interpreting very subtle variations in facial expression, voice pitch and timing. As a consequence, we are all experts in the recognition of behavioural deviations from what we consider normal (see footnote e). For this reason it should be easy to model and predict user expectations for technologically mediated interpersonal communication in real-time: since the users compare such services with face-to-face communication, they are assumed to expect from the application a behaviour which is equivalent to natural face-to-face communication. This is in contrast to many other applications in the area of man-machine interaction (e.g. browsing the WWW), for which user expectations are difficult to predict. The reason could be that there is no natural equivalent for these kinds of applications.

In summary, it appears that applications should support the fundamental information exchange by the use of audio (hearing each other), video (seeing each other), and shared tools (such as chat or whiteboard, and application sharing), at best without perceivable differences to the face-to-face situation (which of course is hard to attain). Ultimately, users expect telecommunication services to include the proper conveyance of relevant environmental aspects (e.g. background noise). Moreover - and probably most crucial - technologically mediated real-time communication is expected to allow for temporal patterns which are similar to face-to-face communication.

e On the other hand, a lifetime is probably not enough to attain perfection in face-to-face communication, in the sense that the communicating partners can be sure that the meanings of their statements are understood in the intended way.

As mentioned in the previous section, it is generally agreed that very little is known about user expectations in regard to QoS issues (Bouch et al., 2000b). On the other hand, it was assumed in this section that users expect an application behaviour that is similar to face-to-face communication. In fact, these contradictory statements delineate the objectives of this thesis: under the assumption that users equate technologically mediated communication with face-to-face communication, we aim at measuring the boundary values of particular QoS parameters that should not be exceeded in order to allow for a 'feeling of naturalness'. This upper boundary - upper in regard to the lack of quality - is called acceptance threshold.

Furthermore it is assumed that people perceive maximum communication quality regarding QoS parameters in face-to-face situations. Or inversely, they do not perceive a lack of quality in face-to-face situations. As such, this would mean that in face-to-face communication people are familiar with having maximal information throughput, no delays, no jitter and no errors. Of course this premise is somewhat out of touch with reality, and strictly speaking - in the case of delay - incorrect: there is always a minimal delay due to the propagation speed of sound and light. The following three reasons may illustrate why this premise, nevertheless, makes sense:

• The transmission delay in face-to-face communication is constant and negligibly small (approximately 3 ms per meter of communication distance).

• The comparison is drawn by means of an idealised face-to-face situation, where no disturbing outer influences are present.

• The face-to-face situation is chosen as an idealised point of reference, in order to position the quality perception in technologically mediated communication.
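As a rough check of the first point, the acoustic propagation delay can be computed from the speed of sound. The following is a minimal sketch only: the value of 343 m/s (dry air at about 20 °C) and the function name are our assumptions for illustration and are not part of the experiments.

```python
# Assumed value: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m: float) -> float:
    """One-way propagation delay of sound over distance_m, in milliseconds."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    # Typical conversational distances (hypothetical examples).
    for distance in (1.0, 2.0, 5.0):
        print(f"{distance:.0f} m -> {acoustic_delay_ms(distance):.1f} ms")
```

Under this assumption the delay grows linearly with distance and stays just below 3 ms per meter, in line with the approximate figure given above.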

With this premise in mind, we aim at investigating a second threshold, providing an answer to the question: Which degradations of particular QoS parameters are 'just' noticed by the users? This question is mainly important for economic reasons, since the knowledge of the so-called perception threshold provides network planners and content providers with a basis for decision. In fact, below perception threshold values, users will not benefit from optimisation of network and end-system infrastructure with respect to QoS parameters.


In summary, the scope of our investigations consists of perception and acceptance thresholds (see Figure 2) in technologically mediated, real-time communication. Note that in Figure 2 the face-to-face situation is assumed to be at the point where no perturbing stimulus is present.

[Figure 2: plot of hypothetical user responses (0% to 100%, with the 50% level marked) against an increasing perturbing stimulus; recoverable labels: Perception Threshold, Acceptance Threshold]

Figure 2 Delineation of the scope of investigation for this thesis, showing the acceptance and perception thresholds for an arbitrary perturbing stimulus. The curves depict hypothetical user response behaviours. Straight line: Perception of lack of quality. Dashed line: Non-acceptance of lack of quality.
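The two hypothetical response curves can be sketched numerically. The following is an illustration only: the logistic shape, the threshold values, and the slope are assumptions chosen for readability; they are neither measured results nor the psychometric functions used in the experiments. The threshold is taken here as the stimulus level at which 50% of the responses are positive.

```python
import math


def response_probability(stimulus: float, threshold: float, slope: float) -> float:
    """Logistic response curve; equals exactly 0.5 where stimulus == threshold."""
    return 1.0 / (1.0 + math.exp(-slope * (stimulus - threshold)))


# Hypothetical parameters (e.g. a delay in ms), for illustration only.
PERCEPTION_THRESHOLD = 100.0
ACCEPTANCE_THRESHOLD = 400.0
SLOPE = 0.02


def perception_probability(stimulus: float) -> float:
    """Share of users assumed to perceive a lack of quality."""
    return response_probability(stimulus, PERCEPTION_THRESHOLD, SLOPE)


def non_acceptance_probability(stimulus: float) -> float:
    """Share of users assumed not to accept the lack of quality."""
    return response_probability(stimulus, ACCEPTANCE_THRESHOLD, SLOPE)


if __name__ == "__main__":
    for level in (0.0, 100.0, 250.0, 400.0, 600.0):
        print(level,
              round(perception_probability(level), 3),
              round(non_acceptance_probability(level), 3))
```

With these assumed parameters the perception curve lies above the non-acceptance curve at every stimulus level, which mirrors the idea that an impairment is noticed before it is rejected.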

2.2.1 Delay as Quality of Service (QoS) Parameter

So far the question of the users' QoS perception has been addressed in a rather generic manner, incorporating the basic QoS parameters. The systematic investigation of all these parameters, including interdependencies such as masking and amplifying effects, would require a study design of exorbitant scale. Therefore the investigation is restricted to a selection of QoS parameters which is considered most relevant, necessary, and feasible. We decided to short-list the QoS parameter delay, which includes intermedia synchronisation as well as roundtrip delays. The reasons for this choice are set out in the following:


• The timing of interpersonal real-time communication contains important prosodic (or non-verbal) information about the frames of mind of the communicating partners. E.g. a longer than accustomed delay between one partner's proposal and the other partner's answer leads to misinterpretations (e.g. that the latter (1) would need to think about what was said, (2) would not be certain of the answer, or (3) would simply have a slow reaction). Thus, timing plays an important role in appraising individual characters and is therefore considered a key parameter in quality-based network engineering.

• Networked audio-visual communication requires - compared to audio only - more end-system and network resources, since encoding, transmission, and decoding of motion images are very data-intensive, requiring either high network throughput or adequate processing power for compression/decompression in the end-systems (it should be noted that compression/decompression operations usually introduce considerable additional delay). Thus, there are several interdependencies between throughput, compression, and delay. Whereas throughput rates and the extent of compression of underlying network and end-system configurations cannot be directly perceived by the end users, this is not the case for delay. Moreover - beside picture and sound quality - it is the delay of a particular network service that makes throughput and compression perceivable.

• Valid empirical data concerning perception and acceptance of various delay parameters in multimodal communication remain sparse. Although a lot of statements have been made about the upper limit of delay for real-time communication, most of the values either refer to audio only or simply reflect the technical limits. An early example is the investigations conducted by Bell Laboratories (Helder, 1966). They were triggered by the introduction of satellites with their inherently large delays when high orbits are used. The results from these investigations are not fully convincing, since Bell Laboratories were probably somewhat biased, as they were certainly not interested in finding 'killing arguments' against satellite communication.


2.2.2 Published Results for Perception and Acceptance of Delay

The subsequent tables list selected studies concerning perception and acceptance of relative and absolute delays. There exists a trade-off between these two delay parameters: on the one hand, the relative delay can be set to zero by buffering the faster stream (usually audio); on the other hand, this means accepting additional use of network and end-system resources. Thus, in order to optimise the allocation of resources without impairing the users' quality perception, it is important to have profound knowledge of the detection and impairment potential of both delay parameters.
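The trade-off can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the example delays are hypothetical, and buffering is modelled simply as holding back the faster stream until the slower one arrives, which raises the absolute delay of the faster stream to that of the slower one.

```python
def synchronise(audio_delay_ms: float, video_delay_ms: float) -> dict:
    """Return the buffering needed to bring the relative delay to zero."""
    # Positive value: audio arrives before video (AV order).
    relative_delay = video_delay_ms - audio_delay_ms
    return {
        "relative_delay_ms": relative_delay,
        # Only the faster stream is held back; the other needs no buffer.
        "audio_buffer_ms": max(relative_delay, 0.0),
        "video_buffer_ms": max(-relative_delay, 0.0),
        # After synchronisation both streams share the larger delay.
        "absolute_delay_after_sync_ms": max(audio_delay_ms, video_delay_ms),
    }


if __name__ == "__main__":
    # Hypothetical one-way delays, for illustration only.
    print(synchronise(audio_delay_ms=50.0, video_delay_ms=200.0))
```

In this example the audio stream would have to be buffered for 150 ms, so that the relative delay vanishes while the absolute audio delay rises from 50 ms to 200 ms.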

Relative Delay

As mentioned, due to different compression/decompression needs for audio and video data, the transmission of audio-visual data can result in considerable intermedia synchronisation errors (referred to here as relative delay). Table 5 lists some selected findings about the perception of relative delay for both asynchrony orders: auditory before visual (AV), and visual before auditory (VA). The results of the listed studies - except Steinmetz' findings - showed that AV stimuli were detected more easily than the opposite order. A further result concerned the type of the presented stimuli: synchronisation errors of distinct stimuli are detected more easily than synchronisation errors of the more complex lipreading.

Table 5 Excerpt of studies concerning the perception of asynchronies.

Authors                  Condition                       AV [ms]   VA [ms]
(Dixon et al., 1980)     Lipreading                      131       258
                         Hammer hitting peg              75        189
(McGrath et al., 1985)   Drawn moving lips               79        138
(Lewkowicz, 1996)        Bouncing disk                   65        112
(Pandey et al., 1986)    Lipreading with masking noise   n.a.      Slump in performance at 80-120
(Steinmetz, 1996)        Lipreading                      ca. 80    ca. 80


The relative delay thresholds do not vary much, considering the different experimental designs used in the listed studies. Unfortunately some results lack a detailed specification of confidence levels. Furthermore, in the scanned literature no studies were encountered which provide psychometric functions of the synchronisation errors. In fact, perception thresholds were rarely obtained by means of psychophysical methods.

Absolute Delay

While transmitting multimedia data from one place to another, it is inevitable that a certain amount of delay is introduced. For pure audio transmissions this delay can be kept very small, depending on the system architecture and on the coding and compression of the signal. If there is video data in addition, more delay is added because of the greater complexity of the video information.

So far a lot of statements have been made about the upper limit of delay for real-time

communication that can be expected of the users. Unfortunately, most of the suggested

values do not reflect the needs of the users but result from technical limits. In Table 6

some selective findings about absolute delay are summed up.

Table 6 Excerpt of studies concerning roundtrip delay.

Authors                    Condition      Delay [ms]
(Yamaguchi et al., 1986)   audio          360
(Chen et al., 1989)        audio          600
(Gonsalves, 1989)          audio          400
(Ranta-aho et al., 1998)   audio-visual   1400 - 1920
(Alfano, 2000)             audio          300 - 800
(Wilkins et al., 1998)     audio-visual   20 (LAN)
                                          380 (WAN)
(Bouch et al., 2000b)      audio-visual   400
(Isaac et al., 1994)       audio-visual   640 - 840


2.2.3 A Psychophysical Approach

Investigating humans' perceptions of external events is an interdisciplinary undertaking involving several branches of study, such as physics, sensory physiology, cognitive and social psychology, and even cultural anthropology. The inclusion of all branches would of course go beyond the scope of this thesis. Therefore we will restrict ourselves to a feasible approach. In our opinion a psychophysical approach is suitable for the investigation of human perception and acceptance of perturbing influences from technologically mediated communication.

The following general definition of psychophysics and its interpretation is offered by John C. Baird and Elliot Noma (Baird et al., 1978) and used by the International Society of Psychophysics: »Psychophysics is commonly defined as the quantitative branch of the study of perception, examining the relations between observed stimuli and responses and the reasons for those relations. This is, however, a very narrow view of the influence it has had on much of psychology. Since its inception, psychophysics has been based on the assumption that the human perceptual system is a measuring instrument yielding results (experiences, judgments, responses) that may be systematically analysed. Because of its long history (over 100 years), its experimental methods, data analyses, and models of underlying perceptual and cognitive processes have reached a high level of refinement. For this reason, many techniques originally developed in psychophysics have been used to unravel problems in learning, memory, attitude measurement, and social psychology. In addition, scaling and measurement theory have adapted these methods and models to analyse decision making in contexts entirely divorced from perception.« Hence, according to this definition, it appears that the term psychophysics is used to denote both the substantive study of stimulus-response relationships and the methodologies used for this study.

In Figure 3 a global model (Krueger, 1994) is introduced in which the psychophysical approach is embedded. The model has been developed as a result of extensive research concerning deterioration of health and well-being, conducted at the Institute of Hygiene and Applied Physiology (IHA) at the Swiss Federal Institute of Technology (ETH). It offers a means to explain different user perceptions for objectively identical interfering environmental stimuli. The model is also assumed to be suitable for investigating technologically mediated communication, since the intermediate technology can be considered an artefact that evokes perturbing influences.


Figure 3 Global model (Krueger, 1994). The model explains the variance of user perceptions observed for objectively identical stimuli.

The model in Figure 3 expresses the basic message that psychological effects may not be disregarded when explaining environmental factors, i.e. the objective world is communicated to a subjective world of mental constructs. Subjective assessments (attribution and affective judgement; for attribution see footnote f) are made according to this mental construct system and not to the objective world directly. As an entrance to the above model, the classical stimulus-response relationship measured by means of psychophysical methodology is diagrammed. This upper layer outlines the topic under investigation, whereas the deeper layers in the model are not the subject of this thesis.

2.3 Structure of the Thesis

After having outlined background and aims of this thesis, and after having delineated its scope, comprising the chosen psychophysical approach for measuring delay thresholds, the remaining components of the thesis are now briefly described.

f For which the Attribution Theory offers means of explanation. The Attribution Theory was founded by Fritz Heider (1958) and advanced by Harold Kelley (1973), both social psychologists. The theory is seen as relevant - among other things - to the study of event perception. It describes how people explain events and the behavioural and emotional consequences of those explanations.


Chapter 3 deals with the theoretical background in the fields of communication, cognitive psychology and psychophysics, which we consider necessary to elucidate. In the first part of this chapter a taxonomy of communication is developed, which is arranged in a layered order with higher positioned entities influencing the subjacent ones. Since diverse - and sometimes contradictory - theories deal with communication, we will give insight into an excerpt which is considered suitable and sufficient for our purposes. In the next part some concepts are presented dealing particularly with the neural processing speed of different modalities, and with mental representations of time resolved on the neurological level. Subsequently, chapter 3 details the psychophysical background necessary to understand the procedure of the experiments described in chapter 4.

Chapter 4 is subdivided into two parts, each describing procedures and results of experiments conducted in the mode of either human-computer interaction (HCI) or human-human interaction (HHI). The outcomes of these experiments are discussed in chapter 5, where concluding remarks are also presented with an emphasis on human factors in network engineering.

The annex that follows consists of a description of the software called best-PEST calculator, which has been programmed in order to run the threshold experiments and which has been developed further into a fully independent, browser-based application in order to make it accessible to a broad audience.

A glossary, together with the cited references and an index of keywords concludes the

thesis.


3 Theory

In this chapter we give an overview of the theoretical background concerning the topic under investigation, particularly the experiments described in chapter 4. First we develop a taxonomy of communication, where a thread through the theories dealing with communication is established. Subsequently we give insight into current research on the mental representation of time, and processing speed in different modalities. The last part of this chapter deals with psychophysical theory and the methods applied in the experiments.

3.1 A Taxonomy of Communication

Understanding an entity under investigation implies analysing and describing it. When

this entity is too complex models have to be created, which classify objects and concepts.

Of course a model is an approximation of reality; nevertheless it should provide enough

resolution, so that gained insights are reproducible in reality.

The entity under investigation here is communication. Since there is no widely accepted general taxonomy (see footnote g) of communication, we are about to develop one in the sense of a 'coordinate system', allowing for a concise description of specific communication settings. For the users' quality expectations, the communication setting is considered crucial. Therefore, it is important to have appropriate models for such settings.

In the next sections an approach is described, consisting of the five aspects social context, orientation, coding, modality, and timing. It is considered suitable for the investigation of interpersonal communication. It will be shown that these five aspects can - to some extent - be ordered in a layered fashion as shown in Figure 4. The suggested communication layers will be defined and insight into their theoretical provenance will be given. Thereafter the layered order is exemplified, pointing out the interface between the interpersonal communication model and the OSI reference model. Parts of this chapter are published in (Guttormsen Schär et al., 2002).

g Taxonomy is the science of the laws of classification. The notion of taxonomy means establishing classes within a set; classes may form partitions, overlapping, or nested subsets.

Technically skilled readers will recognise the interpersonal communication model of Figure 4 as a variant of the well-known seven-layer OSI reference model (see the description in the Glossary). In fact there is a resemblance, and the concepts of these two models are not too far from each other, although they cannot be mapped one-to-one. Rather, they belong to different systems, as depicted in Figure 4, which differentiates between culture, individuals, and technology. The model of interpersonal communication can be thought of as being stacked above the OSI model (technology) and being subordinate to the cultural context. Before we take a closer look at the interpersonal communication model we will briefly explain the other two systems, which are not in the focus of our investigation.

With OSI's technological approach, control is passed from one layer to the next. A communication begins with the application layer on one end (for example, a user working with a videoconference (VC) application). The information is passed through each of the seven layers down to the physical layer (which is the actual transmission of bits). On the receiving end, control passes back up the hierarchy.
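The passing of control down and up the layers can be illustrated with a toy example. The sketch below is a strong simplification and not an implementation of any real protocol stack: each layer merely prepends a textual tag on the sending side and removes it again on the receiving side, and the function names are hypothetical.

```python
# The seven OSI layers, ordered from top (application) to bottom (physical).
OSI_LAYERS = [
    "application", "presentation", "session", "transport",
    "network", "data link", "physical",
]


def send(payload: str) -> str:
    """Pass the payload down the stack; each layer wraps it in its own tag."""
    frame = payload
    for layer in OSI_LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Pass the frame back up the stack; each layer removes its own tag."""
    for layer in reversed(OSI_LAYERS):
        tag = f"[{layer}]"
        if not frame.startswith(tag):
            raise ValueError(f"missing tag of the {layer} layer")
        frame = frame[len(tag):]
    return frame


if __name__ == "__main__":
    wire = send("hello")
    print(wire)
    print(receive(wire))
```

The tag added last (physical) is the first one removed at the receiving end, which reflects the symmetric traversal of the hierarchy described above.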

In the system culture, we distinguish between individualistic and collectivistic cultures. Individualism holds that the individual is the primary unit of reality and the ultimate standard of value. This view does not deny that societies exist or that people benefit from living in them, but it sees society as a collection of individuals, not something over and above them. Collectivism holds that the group - the nation, the community, the race, etc. - is the primary unit of reality and the ultimate standard of value. This view does not deny the reality of the individual. But ultimately, collectivism holds that the groups one interacts with determine one's identity.


[Figure 4: layered diagram; recoverable layer labels and examples: Social Context (Formal, Informal), Orientation (Person, Non-Person), Coding (Verbal, Non-Verbal), Modality (Visual, Auditory), Timing (Synchronous), Application (Interpersonal Real-Time Communication)]

Figure 4 Outline of communication taxonomy in a layered fashion with typical examples. Attributes bounded by dashed lines are not in the focus of our investigation. Reading example: communication settings between people take place either in formal or informal context (Short et al., 1976). The transmitted information concerns the relation between the partners (person-oriented) or the content (non-person-oriented) (Watzlawick et al., 1967). It is encoded verbally or non-verbally, and is received with the aid of either the visual or the auditory sense organs. The timing decides - among other things - which applications take place in the considered communication setting.
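The idea of a 'coordinate system' for communication settings can be sketched as a small data structure. This is an illustration under assumptions: the class and attribute names are hypothetical, each aspect is reduced to a single value although real settings may combine several (e.g. audio-visual), and 'asynchronous' is assumed as the counterpart of 'synchronous'.

```python
from dataclasses import dataclass
from enum import Enum


class SocialContext(Enum):
    FORMAL = "formal"
    INFORMAL = "informal"


class Orientation(Enum):
    PERSON = "person"
    NON_PERSON = "non-person"


class Coding(Enum):
    VERBAL = "verbal"
    NON_VERBAL = "non-verbal"


class Modality(Enum):
    VISUAL = "visual"
    AUDITORY = "auditory"


class Timing(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"  # assumed counterpart, for illustration


@dataclass(frozen=True)
class CommunicationSetting:
    """One point in the five-aspect taxonomy, ordered from top to bottom layer."""
    social_context: SocialContext
    orientation: Orientation
    coding: Coding
    modality: Modality
    timing: Timing

    def describe(self) -> str:
        aspects = (self.social_context, self.orientation, self.coding,
                   self.modality, self.timing)
        return " / ".join(aspect.value for aspect in aspects)


if __name__ == "__main__":
    call = CommunicationSetting(SocialContext.INFORMAL, Orientation.PERSON,
                                Coding.VERBAL, Modality.AUDITORY,
                                Timing.SYNCHRONOUS)
    print(call.describe())
```

A setting such as an informal telephone conversation can then be written down as one combination of the five aspects, which is all the sketch is meant to show.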

3.1.1 Social context

The higher layers in the communication model (i.e. social context and orientation) comprise a rather broad range of features. As a consequence, the study of higher-level aspects of communication has been the source of many different theories. In what follows, a thread is established through some of these concepts. It should be noted that most of these approaches were defined for business communication, probably because this area is regarded as most influential when quality requirements are established.

In the following we describe social context according to the aspect degree of formality, i.e. depending on how far there exist formal rules or some code for the exchange of information.

Formal and informal communication

Several theorists make the distinction between formal and informal communication. Smith (1972) defines formal communication channels as »those emanating from official sources and carrying official sanctions [...]. Formal messages usually flow through these channels, thus acquiring legitimacy and authenticity«. On the other hand, informal communication channels »are not specified rationally. They develop through accidents of spatial arrangement, through friendships«. Both formal and informal communication can follow an upward, downward, or horizontal path (to higher, lower, or equal authority). The purposes of formal communication are to command, to instruct, and to finalise matters through the application of regulations. The purposes of informal communication are to educate through information sharing, to motivate through personal contacts, and to resolve conflicts through participation and friendship. It seeks to involve participants in organizational matters as a means of maintaining their enthusiasm, loyalty, and commitment. Table 7 lists some characteristics of formal and informal communication.

Table 7 Characteristics of formal and informal communication.

Formal communication                    Informal communication
official, binding                       unofficial
precise, unlikely to be misunderstood   personal, inaccurate
traceable, can be preserved             hard to trace
can avoid embarrassment                 can refute rumours and gossips
restricted jargon, rigid                more emotional
authoritarian, likely to be obeyed      less intimidating
fails to motivate                       promotes disclosure of underlying motives

3.1.2 Orientation

Orientation describes to some extent the purpose of a communication setting and the related view the participants should have about the tacit assumptions needed to understand the topics being discussed. In some cases, these assumptions might be very limited in range (e.g. comprising some technical knowledge needed to solve a specific task), while in other cases, a common worldview is necessary for a fruitful discussion.

The summary below shows theories and methods that can be attributed to the orientation layer. It is necessarily far from comprehensive; it should rather be seen as an indication that it is extremely important to define a certain experimental setting properly, using terminology and insight gained from the cited theories.

The Bales Categories

Already in the fifties Bales (1955) ran a series of experiments in which subjects held

simulated meetings. He analysed the nature of the interactions that took place. From

these experiments he elaborated four main categories: positive reactions, negative reactions,

problem-solving attempts and questions. At the Communication Studies Group (CSG), Short

(1976) reduced these four categories to two: Bales' positive and negative reactions were

classed as person-oriented, and the two other categories, problem-solving attempts and

questions were classed as non-person-oriented. The CSG considers person-orientation to be

the core category in understanding communication mediated by teleconferencing.


A further step in developing classification schemes was the SYMLOG (see footnote h) space. Elaborated from the large amount of research conducted by Bales (1999), this approach indicates at least three bipolar characteristics that are fundamental to describing communication in small groups. The three dimensions spanning the SYMLOG space are:

• Dominance versus Submissiveness

• Friendliness versus Unfriendliness

• Acceptance versus Non-acceptance of Authority.

These characteristics have been implemented in standardised questionnaires and have already been applied innumerable times in different cultures. Thus, SYMLOG is supposed to assess person-oriented parts of small group communication reliably.

The distinction between content and relationship

Watzlawick (1966) distinguishes between the content and the relation part of a message, thus establishing a direct link to the terms report and command introduced by Bateson (Ruesch et al., 1951). Watzlawick (1967) aptly points to the correspondence of these terms to the computer science terms data and control. Since control information specifies what is to be done with the data at hand, it can be regarded as 'information on information', i.e. metainformation. The following axiom describes this insight: »Every communication has a content and a relationship aspect such that the latter classifies the former and is therefore a metacommunication.« (Watzlawick et al., 1967). In the case of interpersonal communication, the exchange of control information could be seen as 'downloading applets' to be executed by the communicating persons. In that sense, Watzlawick's view is very close to the categories established by Bales, and we can possibly simplify our taxonomy by understanding a content-oriented approach to be non-person-oriented and, on the other hand, relation to be person-oriented.

How can one express person- and non-person-oriented information? The answer to this question leads to section 3.1.3, where coding is discussed. But first, we take a closer look at the orientation layer by introducing the distinction between implicit and explicit information types. Both person- and non-person-oriented information can be of implicit or explicit nature, and both can be expressed by verbal and non-verbal coding. More precisely, there are no statements conveying solely implicit information; there always needs to be the 'carrying' explicit information too. In contrast, there are statements conveying solely explicit information. An example will illustrate this fact: »Joe drank ten beers last night« is a statement which is by itself explicit and unequivocal. But depending on the context and the tone of voice in which it is said, it can be understood as »Incredible how much alcohol Joe always drinks!«, which is the implicit meaning. For the sake of simplicity, when we speak of implicit information we mean both the implicit and the carrying explicit information.

h SYMLOG is the acronym for SYstematic Multiple Level Observation of Groups.

In our approach there is no sharp distinction between implicit and explicit information types. We differentiate the two by means of the degree of ambiguity, i.e. implicit information is strongly ambiguous, whereas explicit information is slightly ambiguous or not at all. The use of the term ambiguity likewise implies that the communicating partners have mutual and tacit assumptions about the rules of their information exchange (see footnote i). This means that partners should agree on, and be aware of, the 'rules' of ambiguity (see footnote j); fulfilment of this requirement is - among other things - a job of the education system, imparting verbal and cultural literacy. To clarify the distinction between implicit and explicit information types, see Table 8 (page 33), where some examples are given.

3.1.3 Coding

Information can be encoded and transmitted in many different ways, and different forms of communication with specific codes may be used concurrently. A voice signal can conceptually be decomposed into a verbal part and a non-verbal part (the so-called prosodic information, such as pitch, melody, level and timing). On the one hand, non-verbal features of both the visual and the auditory modality convey information that allows a message to be interpreted properly (e.g. to differentiate between a question and an exclamation), or that allows the speaking person to be made out (e.g. by means of moving lips); on the other hand, they help us to identify a known person or to guess at his/her state of mind.

i Otherwise, as a consequence, they would have to accept an impaired conversation (as might happen between partners speaking different languages), or they might have to oversimplify the topic.

j We do not consider the particular meanings of an ambiguous statement to be vague, unclear, or obscure; far from it, they are very precise and clear. Ambiguity is caused by the fact that one is not sure which of the meanings can be accepted as true.


In addition, we should consider the different recognition capabilities associated with different codings (footnote k). Weidenmann (1988) points out this aspect, referring to the learning characteristics of different media types. He states that, when choosing the appropriate learning media, our differing familiarity in handling words and pictures comes into play. Whereas linguistic skills like reading and writing (i.e. verbal skills) are systematically trained in our educational system, the competence of handling instructional pictures (i.e. non-verbal skills) needs to be developed. As far as we can see, the two outstanding attributes on the coding level are the verbal and the non-verbal information types, as described below.

Verbal Coding

We speak of verbal coding when written and/or spoken languages and/or numbers are

used, and when mechanisms (i.e. grammars or lexica) exist through which the correctness

of a text or utterance and its meaning can be determined. Thus, verbal coding can itself

be seen at different levels, as shown in Figure 5. It should be noted that the higher we

move up in Figure 5, the more it becomes difficult to set up grammars and lexica as a

formal and comprehensive basis. In fact, the complete interpretation of text and speech

is partly dependent on the semantic level, and to some extent also on the pragmatic level,

which is in addition represented by the orientation and the social context layers as discussed

in sections 3.1.2 and 3.1.1 (page 27).

k Note that we are considering external coding here, unlike internal coding, which is used in cognitive psychology with emphasis on the mental coding of input and the resulting information processing.


[Figure 5: layered diagram of semiotics; layers listed from top to bottom]

• Pragmatics: Derivation of actions from the meaning.

• Semantics: Generally accepted meaning of words, sentences and texts.

• Syntax: Rules by which words are combined to make sentences and texts.

• (Lowest layer, label not recoverable): Rules by which signs are combined to make words.

Figure 5 A layered view of verbal information. Note that the attributes given on the right side are typical examples.

Non-verbal Coding


Non-verbal coding is often associated with many different kinds of pictorial representation (e.g. gestures and facial expressions conveyed through video, graphs and pictures, animations, pictograms, icons, etc.). Also, as already described, the prosodic features contained in a speech signal represent non-verbally coded information, as does any non-verbal sound, for example instrumental music. Furthermore, it should be noted that various forms of background information (both visual and auditory) might supply important information about the context of a communication session. For example, hearing the background noise of a railway station makes a phone call more credible when the subject of the call is train delays, and even more so when the cabin is visible in the background, e.g. when using a UMTS device.

3.1.4 Modality

One of the most basic conditions for participation in any communication event is the sensation and the perception of the transmitted signals conveying information. This involves the human sense organs, which are able to detect light, sound, smell, taste, touch and position, each corresponding to one specific mode (the term channel is often used as an alternative). Although future developments in telecommunications might bring the introduction of olfactory (smell) and haptic (touch) modes in special contexts (e.g. telesurgery), we will restrict ourselves for the time being to the auditory and visual channels.

Multimodal Communication: Audio-Visual

Whenever the auditory and the visual channels are simultaneously invoked in a communication setting, we normally speak of multimedia communication. Increasingly, the alternative term multimodal communication is used, where 'multi' does not just imply 'sound and vision', but the fact that several different forms of communication (in the sense of submodes) can be implemented within both the auditory and visual channels. For example, the visual channel is involved when a video signal represents a 'head and shoulder' picture of the communication partners; alternatively, it is used as well when text and graphical information are exchanged in shared workspace applications. The latter application usually comprises still another supporting communication mode in the form of a separate channel linking a mouse or a joystick simultaneously with a local and a remote pointer. These examples belong to the coding layer in our communication model, since the submodes differentiate themselves through different forms of coding.


Table 8 Exemplification of the three layers orientation, coding and modality (including the implicit/explicit distinction). The implicit messages are made explicit in the only explicit column (in the verbal-coding rows only). Note that the distinction between implicit and explicit is made by means of the degree of ambiguity.

Verbal coding, visual modality

• Person oriented, implicit and explicit: Reading 'between the lines' some relational information, e.g. »Mr. Miller has extraordinary interpersonal skills.«

• Person oriented, only explicit: Written text with relational information, e.g. »Big parts of his attendance time Mr. Miller was chatting with his colleagues.«

• Non-person oriented, implicit and explicit: Reading 'between the lines' some task-related information, e.g. »Maintaining job security will be a big challenge in near future.«

• Non-person oriented, only explicit: Written, task-related text, e.g. »We will have to lay off workers next month.«

Verbal coding, auditory modality

• Person oriented, implicit and explicit: Hearing 'between the lines' some relational information, e.g. »Are you sure of what you are talking about?«

• Person oriented, only explicit: Spoken text with relational information, e.g. »I doubt about your competence.«

• Non-person oriented, implicit and explicit: Hearing 'between the lines' some task-related information, e.g. »The Porsche engine still uses traditional injection.«

• Non-person oriented, only explicit: Spoken, task-related text, e.g. »The Porsche engine has to be reengineered.«

Non-verbal coding, visual modality

• Person oriented, implicit and explicit: Extracting relational information from gazes, gestures, images, etc., e.g. avoiding eye contact.

• Person oriented, only explicit: Gazes, gestures, images, emoticons [e.g. :-(] etc. with relational information, e.g. the referee showing the yellow card.

• Non-person oriented, implicit and explicit: Extracting task-related information from gazes, gestures, images, etc., e.g. configuration of lines indicating a 3D-cube.

• Non-person oriented, only explicit: Task-related gestures, images, icons etc., e.g. showing a picture of a defect of an aeroplane.

Non-verbal coding, auditory modality

• Person oriented, implicit and explicit: Extracting relational information from pitch, volume, staccato, etc. of voice and sounds, e.g. talking with higher pitch to someone.

• Person oriented, only explicit: Pitch, volume, etc. of voice and sounds with relational information, e.g. hooting, cheering.

• Non-person oriented, implicit and explicit: Extracting task-related information from pitch, volume, etc. of voice and sounds, e.g. rattle noise from a vehicle.

• Non-person oriented, only explicit: Task-related pitch, volume, etc. of voice and sounds, e.g. the sound of a beep instead of a censored word in a spoken sentence.


3.1.5 Timing


As can be seen in Figure 4, we distinguish between synchronous and asynchronous interactions (footnote l). Simplifying, the main difference between the two timing categories is the time magnitude of the interactions between the communicating partners. Whereas synchronous interaction is in the range of milliseconds to seconds, asynchronous interaction is in the range of minutes to hours, or even days to weeks. Since we restrict the model to synchronous and asynchronous timing, we implicitly restrict ourselves to dialog or interactive communication.

According to Fluckiger (1995), the timing of interaction decides which applications take place in the considered communication setting. Examples of synchronous or real-time interaction are:

• Interpersonal applications: Only two individuals are involved. Also called person-to-person applications, and sometimes called one-to-one applications.

• Distribution applications: Sometimes called person-to-group applications, where multimedia information such as live audio and video is transmitted from one source to multiple recipients in a one-way mode (no return channel from the recipient to the source). This is analogous to TV broadcasting.

• Group teleconferencing: Sometimes also called group-to-group teleconferencing, which is a generic term referring to bi-directional conversational communication between two or more groups of people.

Examples of asynchronous interaction are:

• Multimedia e-mail: This is conventional e-mail where the documents exchanged are not only plain text, but also include rich text, hyperlinks, and audio or video sequences.

• Asynchronous computer conferencing: Refers to a service where people exchange multimedia messages asynchronously. The technique often consists of submitting or retrieving contributions to or from centralised servers.

l We define a basic interaction unit as one reciprocal action, consisting of an action triggered by a source, echoed by a sink, and received by the source again.


Since asynchronous communication is not the focus of our investigations, we will only consider synchronous (i.e. real-time) settings in the following. Within these real-time settings we focus on network-mediated interpersonal communication as well as on people-to-systems communication.

Absolute Delay

The main issue of our investigation of real-time communications concerns absolute delay, which is only perceivable in dialog settings. Strictly speaking, typical one-way applications such as video-on-demand also have a dialog part, namely between sending the request and receiving the video stream. This means that one-way applications also let the user perceive absolute delays in an initial phase, but as soon as the connection is established the user is no longer aware of absolute delays, so that the term one-way for such applications is justifiable.

In Figure 6A we depict the definition of the absolute delay as users of real-time dialog applications are aware of it. The same figure also shows the technologically inspired definition of the term round-trip delay, which is sometimes used synonymously. We define absolute delay as the elapsed time between the expression of an auditory, visual or tactile trigger and the answer from a communicating partner (human or machine). That is, the acting user at the source sets a primary internal time marker when executing an expression, and a secondary one when perceiving the answer. The estimation of the elapsed time between these two markers is what the acting user perceives as absolute delay. It remains for the time being an open question whether the acting user sets the time marker at the time he/she perceives his/her own expression, or at the time he/she is planning to produce it. The absolute delay consists of:

• two network transit delays (bi-directional)

• two times the depacketising delay

• the source encoding and decoding

• the sink echo processing

• the neural transit delay of the user at the source receiving the answer

In Figure 6B we depict a magnified view of the sink echo processing, consisting of:

• the sink encoding and decoding

• the reacting user's reaction time

In turn, the reaction time consists of:

• the neural transit delay between peripheral excitation and conscious perception

• the cognitive processing time

• the time needed to produce and execute the output stimulus

Depending on the point of view, a particular user in a real-time dialog setting has both roles: for oneself that of the source and for the partner that of the sink. Hence, when perceiving the absolute delay, the user estimates, besides the technologically generated delays, the reaction time of the partner, but not his/her own reaction time. That is, the perceived and estimated source echo processing time is not equal to the sink echo processing time.
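The decomposition of the absolute delay listed above can be written as a plain sum. The following Python sketch is only an illustration under stated assumptions: the class and function names are hypothetical, and all millisecond values are made-up placeholders, not measurements from this work. It merely adds up the components named for Figure 6A and 6B.

```python
from dataclasses import dataclass


@dataclass
class ReactionTime:
    """Reaction time of the reacting user; all values in ms (hypothetical)."""
    neural_transit_ms: float         # peripheral excitation -> conscious perception
    cognitive_processing_ms: float   # cognitive processing time
    stimulus_production_ms: float    # producing and executing the output stimulus

    def total_ms(self) -> float:
        return (self.neural_transit_ms
                + self.cognitive_processing_ms
                + self.stimulus_production_ms)


@dataclass
class SinkEchoProcessing:
    """Sink echo processing: sink decoding/encoding plus the partner's reaction."""
    sink_coding_ms: float            # sink decoding and encoding taken together
    reaction: ReactionTime

    def total_ms(self) -> float:
        return self.sink_coding_ms + self.reaction.total_ms()


def absolute_delay_ms(network_transit_1_ms, network_transit_2_ms,
                      depacketising_ms, source_coding_ms,
                      sink_echo, source_neural_transit_ms):
    """Sum of the components that make up the absolute delay."""
    return (network_transit_1_ms + network_transit_2_ms   # two network transits
            + 2 * depacketising_ms                        # depacketising, twice
            + source_coding_ms                            # source encoding and decoding
            + sink_echo.total_ms()                        # sink echo processing
            + source_neural_transit_ms)                   # acting user perceives the answer


# Placeholder values chosen only to make the arithmetic easy to follow.
example = absolute_delay_ms(
    network_transit_1_ms=40, network_transit_2_ms=40, depacketising_ms=10,
    source_coding_ms=20,
    sink_echo=SinkEchoProcessing(20, ReactionTime(15, 100, 50)),
    source_neural_transit_ms=15)
print(example)  # prints 320
```

Note that the acting user's own reaction time does not appear in the sum, in line with the observation that the user estimates the partner's reaction time but not his/her own.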

[Figure 6: timing diagram with panels A and B. Panel A marks, along the time axis, the events first bit transmitted by source, first bit received by sink, last bit received by sink, first bit transmitted by sink and first bit received by source, with the segments source coding, network transit delay 1, sink decoding, sink coding and network transit delay 2, and the spans round-trip delay, sink echo processing and absolute delay. Panel B shows the reaction time of the reacting subject (neural transit delay, conscious perception of the reacting subject, cognitive processing time, stimulus production) and the neural transit delay of the acting subject.]

Figure 6 A: Schematic diagram of the round-trip delay according to Fluckiger (1995), and of the absolute delay according to our definition. Grey shaded areas indicate human information processing time. B: Magnified view of the reacting subject's reaction time, and the neural transit delay of the acting subject.

Relative Delay

In contrast to absolute delay, relative delay is also perceivable in one-way settings. We define relative delay as the time difference between the appearance of the visual stimulus and the appearance of its associated auditory stimulus in an audio-visual presentation. Furthermore, we distinguish between the possible orders of the incoming stimuli: auditory precedes visual (AV), visual precedes auditory (VA), or they are in sync, i.e. there is no relative delay. Relative delay is sometimes referred to as intermedia synchronisation, or lip synchronisation, pointing to the particular synchronisation requirements needed either to give the feeling of naturalness in audio-visual telecommunications, or to enable or enhance lip reading for hearing-impaired people (e.g. to optimise hearing aids). These areas comprise a rich body of literature (e.g. McGrath et al., 1985; Pandey et al., 1986; Summerfield, 1992; Kouvelas et al., 1996; Steinmetz, 1996; Stone et al., 1999; Oviatt et al., 2000; Stone et al., 2001; Van Hoesel et al., 2002), which in fact rarely treats perception thresholds obtained by means of psychophysical methods. Further studies investigated intermedia synchronisation by means of distinct stimuli like bouncing disks (Lewkowicz, 1996), or a hammer hitting a peg (Dixon et al., 1980). (See also section 2.2.2 Published Results for Perception and Acceptance of Delay.)
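The definition of relative delay and its three cases (AV, VA, in sync) can be sketched in a few lines of Python. The function name, the return format and the use of onset times in milliseconds are assumptions made for illustration only.

```python
from typing import Tuple


def relative_delay(visual_onset_ms: float, auditory_onset_ms: float) -> Tuple[str, float]:
    """Return the stimulus order and the magnitude of the relative delay.

    'AV'   : auditory precedes visual
    'VA'   : visual precedes auditory
    'sync' : both stimuli appear at the same time (no relative delay)
    """
    difference = visual_onset_ms - auditory_onset_ms
    if difference > 0:
        return ("AV", difference)      # visual appears later than auditory
    if difference < 0:
        return ("VA", -difference)     # auditory appears later than visual
    return ("sync", 0.0)


print(relative_delay(visual_onset_ms=100, auditory_onset_ms=60))  # ('AV', 40)
print(relative_delay(visual_onset_ms=60, auditory_onset_ms=100))  # ('VA', 40)
```

Reporting the order separately from the magnitude mirrors the distinction made in the text, where the same magnitude is treated differently depending on which modality comes first.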

3.1.6 Exemplification of the interpersonal communication model

In the following we explain the interpersonal communication model by means of a videoconference (VC) user. Furthermore, we point out the interface between the OSI model and the interpersonal communication model when we consider the interaction timing of different applications.

Before the VC user starts sending information through the videoconference application, s/he will be aware of the social context in which the communication setting will take place. In our approach, this means that s/he knows whether the communication partner belongs e.g. to the family, to the workmates, to the circle of friends, or to the circle of acquaintances. Consequently, s/he also has an idea of the hierarchical position of the communicating partner, of the overall importance of the event and the like. We subsume these factors by saying that the user is aware of the degree of formality of the event. Furthermore, we posit that the degree of formality determines the communication process to come. That is, the communicating partners will choose an appropriate language (footnote m) as well as modify the topics of the conversation, voice pitch, gestures, gazes, and interaction timing. When we consider these modified aspects separately, deeper layers in the communication model are probed.

As stated before, communication between people comprises content information, including the problem or purpose, and metainformation, i.e. implicit information concerning the intended meaning of the verbal, usually ambiguous content. Metainformation usually reveals the relation between the partners, and is therefore considered person-oriented information, unlike the non-person-oriented information, which is the 'real' content. The orientation of the information stream to be sent to the communication partner is the next crossroads the VC user has to pass. According to her/his appraisal of the actual communication setting (which also includes the problem to solve), s/he will direct the information flow more towards the partner or more towards the task. And s/he will choose a more explicit or a more implicit way to express her/his message. Again, this influences the following layers.

In order to illustrate how the higher layers influence the coding of a message, let us assume two examples of the use of a videoconference, both in a formal context:

• Two industrial designers are working on improving the ergonomics of a drilling machine.

• Superior and employee are talking about the employee's personal performance.

First of all, these examples show that the choice of verbal or non-verbal coding is determined by the problem to solve. The designers will primarily choose sketches and schemes to solve the problem, whereas the superior will talk to the employee before writing a letter of reference. Thus it appears that the purpose determines the orientation of the conversation, and furthermore the suitable coding: the coding is mainly non-verbal in the case of the (non-person-oriented) designers, and mainly verbal in the case of the (person-oriented) superior. The chosen examples are not necessarily typical; there are probably further examples showing the contrary, e.g. using non-verbal coding in person-oriented communication and using verbal coding in non-person-oriented communication. Hence these examples do not imply rules for the use of a particular coding. They only exemplify the layered order of our model. We suggest that, depending on the particular communication setting, there are tacit agreements about the accepted and optimal manner of encoding the message. Referring to the two examples, this means that it is probably not helpful to rely to a great extent on spoken and written language in order to improve the ergonomics of a drilling machine. And it is unusual, and probably not accepted by the employee, to be appraised only by charts and diagrams; a personal word is expected here.

m For instance, they restrict their vocabulary if they consider themselves to be in a formal conversation.

Strictly speaking, the next entity, modality, need not lie underneath coding in the sense of being determined by it. But the ordering makes sense when we consider the degree of consciousness that is necessary either to perceive or to decode a message. Perceiving visual and auditory information is handled by the sense organs, their corresponding neurological pathways and the visual and auditory cortex, whereas decoding verbal and non-verbal information involves higher levels of information processing and consciousness. In short: an amoeba is capable of detecting light, but will fail to extract abstract information from a visual pattern.

Considering the relative delay of an audio-visual event where sound precedes the visual component, we leave the 'natural' frame of reference: in a natural environment no sound precedes the corresponding visual event, whereas the contrary situation, in which sound lags the corresponding visual component, is familiar to everyone, e.g. first seeing a hammer hit a peg before hearing the knock. A comparable reasoning applies to absolute delay: audio-visual communication in natural environments creates no bigger transit delays than sound needs to travel through the range of vision, whereas in technologically mediated communication this delay can theoretically take any value above a minimal delay due to physical constraints. To recapitulate, fundamental characteristics of the timing layer concerning order or asynchrony are not found in face-to-face communication. In contrast, all characteristics of the higher layers in our communication model are found in technologically mediated communication as well as in face-to-face communication. This fact predestines timing to be the most basic layer, representing the interface to the system technology, which, in turn, is instantiated by the OSI model.

The basic layers timing, modality and coding of the communication model in Figure 4 are mainly of an elementary nature. They can be regarded as absolute prerequisites for any communication between people. Procedures for the investigation of these layers are expected to be manageable. This is not the case for the upper layers orientation and social context, where many diverse situations are conceivable, usually depending very much on the nature of the tasks performed. Moreover, at these layers psychological characteristics of the persons involved will play an important role; thus, the character of the people involved and, ultimately, group dynamics may have to be taken into account when designing experiments or interpreting their results.

It has been found that the investigation of most multi-participant (dialog or conversation type) settings is a move into 'terra incognita', i.e. generally accepted research approaches do not exist, and most often there is a lack of methods, taxonomies and even proper definitions of the entities under investigation. This is especially true for psychophysics, where traditionally many problems associated with 'one-way' situations were investigated (humans as stimulus receivers), and where, on the other hand, research concerning dialog settings appears to be extremely sparse.

3.2 Processing Time of Auditory and Visual Stimuli

When we consider the perception of events conveying coexistent information of different modalities, such as, in our case, auditory and visual, we have to take into account that different receptors and perceptual pathways are involved for different modalities. It is therefore natural to consider the possibility of different processing times in different modalities. In fact, there are differences. In the following we present two ways of determining them: indirectly through differences in reaction time for different modalities, and directly through measurement of Event-Related Potentials (ERPs).

3.2.1 Indirect: Reaction Time Differences

Reaction time has been a favourite subject of experimental psychologists since the

middle of the nineteenth century. Thereby three basic kinds of reaction time experiments

have been conducted.

• Simple reaction time experiments

• Recognition reaction time experiments and

• Choice reaction time experiments


In simple reaction time experiments, there is only one stimulus and one response. When the stimulus appears, the response is required as fast as possible. In recognition reaction time experiments, there are some stimuli that should be responded to and others that should be ignored. And in choice reaction time experiments, the experimental subject must give a response that corresponds to the stimulus, such as pressing a key corresponding to a letter if the letter appears on the screen.

Since the beginning of reaction time research, many researchers have confirmed that reaction to sound is faster than reaction to light. The accepted figures for mean simple reaction times for college-age individuals are about 190 ms for visual stimuli and about 160 ms for auditory stimuli (Galton, 1899; Fieandt et al., 1956; Brebner et al., 1980; Welford, 1980). Differences in reaction time between these types of stimuli persist whether the subject is asked to make a simple response or a complex response (Sanders, 1998). The time for motor preparation (e.g. tensing muscles) and motor response is the same in all three types of reaction time test, implying that the differences in reaction time are due to processing time (Miller et al., 2001).

Hence, there is evidence from reaction time experiments that the mean processing time of auditory stimuli is about 30 ms shorter than the mean processing time of visual stimuli. On the other hand, there is also evidence that processing speeds are not fixed values; rather, they are influenced by various forms of facilitation effects: the difference between reaction time to visual and auditory stimuli can be eliminated if a sufficiently high visual stimulus intensity is used (Kohfeld, 1971). Cross-modal facilitation can be demonstrated with experiments showing that reactions to multimodal inputs presented in close spatial and temporal proximity are typically faster and more accurate than those made to the unimodal stimuli alone (Hershenson, 1962; Welch et al., 1986; Giard et al., 1999; McDonald et al., 2000).

3.2.2 Direct: Event-Related Potentials (ERPs)

Electroencephalography (EEG) provides a non-invasive technique to measure the processing speed of different modalities directly: embedded within EEG signals are short-term transient waves known as Event-Related Potentials (ERPs). These waveforms reflect the singular experience associated with an external stimulus such as an auditory or visual event.


When a stimulus is presented to a subject, and brain activity is recorded following the presentation of the stimulus, an ERP can be recorded. That is, the voltage fluctuations recorded at the surface of the scalp contain elements specific to the presented stimulus. Typically, ERPs are largely contaminated by other activities of the brain. By averaging across several tens or hundreds of trials, individual ERPs become apparent. A specific ERP becomes evident by adding a series of individual EEG samples time-locked to the evoking stimulus. By summing these samples, the background brain activity, which is assumed to vary randomly over time, will tend to average out.
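The averaging procedure described above can be illustrated with a small simulation. This is a minimal sketch under explicit assumptions: the ERP template, the noise level and the number of trials are invented for the example, the background activity is modelled as independent Gaussian noise, and the helper names are hypothetical.

```python
import random


def average_epochs(epochs):
    """Average time-locked epochs sample by sample."""
    n_trials = len(epochs)
    return [sum(samples) / n_trials for samples in zip(*epochs)]


def simulate_epoch(template, noise_sd, rng):
    """One recorded epoch: the stimulus-specific waveform plus random background."""
    return [value + rng.gauss(0.0, noise_sd) for value in template]


rng = random.Random(0)                            # fixed seed for reproducibility
template = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]    # invented ERP shape (arbitrary units)
noise_sd = 10.0                                   # background far larger than the ERP

epochs = [simulate_epoch(template, noise_sd, rng) for _ in range(400)]
erp_estimate = average_epochs(epochs)

# Largest deviation of the averaged waveform from the underlying template.
residual = max(abs(e - t) for e, t in zip(erp_estimate, template))
print(residual)
```

In a single epoch the template is buried in the background, whereas after averaging 400 epochs the residual is much smaller than the single-trial noise. For independent noise the residual is expected to shrink roughly with the square root of the number of trials.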

Accepted figures for visual processing time derived from ERP studies are between 45 ms and 55 ms, as represented by the onset of the earliest cortical potential (Clark et al., 1995; Clark et al., 1996; Foxe et al., 2002). On the other hand, the earliest auditory evoked potential reaches the cortex between 9 ms and 15 ms (Celesia et al., 1971; Vaughan et al., 1988), or in less than half the time of visual input, approximately 30 - 40 ms earlier than the visual stimulus. The consequence of the different processing times is that asynchronies of audio-visual events are perceived differently depending on the stimulus order: the same relative delay in both incoming modality orders would evoke a bigger perceived delay when auditory precedes visual than in the opposite order (see Figure 7).
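The asymmetry can be made concrete with a small calculation. The sketch below is a deliberately simplified model and not a result of this work: it assumes fixed processing times at the midpoints of the ranges quoted above (50 ms visual, 12 ms auditory), ignores any cross-modal interaction, and uses hypothetical function and constant names.

```python
VISUAL_PROCESSING_MS = 50.0     # assumed midpoint of the 45-55 ms range
AUDITORY_PROCESSING_MS = 12.0   # assumed midpoint of the 9-15 ms range


def cortical_asynchrony_ms(order, physical_delay_ms,
                           visual_ms=VISUAL_PROCESSING_MS,
                           auditory_ms=AUDITORY_PROCESSING_MS):
    """Cortical arrival time of the visual minus that of the auditory stimulus.

    order: 'AV' (auditory presented first) or 'VA' (visual presented first).
    A positive result means the auditory signal reaches the cortex first.
    """
    if order == "AV":
        auditory_arrival = auditory_ms
        visual_arrival = physical_delay_ms + visual_ms
    elif order == "VA":
        visual_arrival = visual_ms
        auditory_arrival = physical_delay_ms + auditory_ms
    else:
        raise ValueError("order must be 'AV' or 'VA'")
    return visual_arrival - auditory_arrival


# The same physical relative delay of 80 ms in both orders:
print(cortical_asynchrony_ms("AV", 80.0))  # 118.0
print(cortical_asynchrony_ms("VA", 80.0))  # -42.0
```

Under these assumptions an 80 ms delay is stretched to 118 ms when auditory precedes visual, but shrunk to 42 ms when visual precedes auditory. For VA delays shorter than the 38 ms processing difference, the sign flips, i.e. the auditory signal would still arrive first in this simplified model.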

However, this effect might be compensated by recently reported findings: two studies (Giard et al., 1999; Molholm et al., 2002), which investigated the integration of audio-visual (AV) information by means of ERPs, showed an early AV effect after 46 ms over the right parieto-occipital scalp. This finding suggests that the auditory part of AV inputs modifies early visual sensory processing, and leads to the following interpretation: firstly, auditory input activates primary auditory cortex (A1) within 15 ms after stimulus presentation and is then transmitted up the auditory processing stream. This input is then projected to visual areas. The critical issue is one of timing. The question is whether there is sufficient time for auditory input to reach early visual areas to result in modulation of the later arriving visual input. Given the above-mentioned processing times between the initial auditory and visual inputs to their respective primary cortices, there is a window of 25 ms - 30 ms in which the auditory evoked process can prepare visual areas for arriving visual evoked processes.

[Figure 7: timeline from 0 to 120 ms (axis t in ms) showing the perceived relative delay AV and the perceived relative delay VA.]

Figure 7 Effect of the different processing times of auditory and visual stimuli in the human brain, disregarding the early AV effects described by Molholm (2002). Grey shaded areas indicate processing time for both modalities.

3.3 Mental Representation of Time

Synchronous interaction is immediate. As we know from real-life situations, the term immediate is used with considerable tolerance. In some situations the reaction to a request should be as fast as possible, whereas other situations allow for a reaction after a certain delay, e.g. after a work step already begun has been completed. In any case, whenever an immediate reaction is required, it is expected to be executed now. Therefore the term now, which means the present, is subject to considerable tolerance too.

This real-life experience has an analogy in philosophical discourse: if one argues on an abstract level, the present can be considered the dimensionless border between the past and the future; thus the present does not last, since it is a timeless cut-off point. On the other hand, we know by experience that the present has a certain duration, i.e. we are aware of the present and we can easily distinguish between what is now, what has been before and what is still to come. Otherwise we would be torn between past and future. This discrepancy between experience and theory represents a profound problem, and philosophers have been dealing with it since antiquity (for some examples see Pöppel (1997a)). Since we focus on phenomenological reality, we do not treat the abstract connotation of the present, but the experienced 'nowness', which is called the subjective present (Stern, 1897; Pöppel, 1978).

3.3.1 Low Frequency Processing

Given that the subjective present is experienced as a certain amount of time, how can it be determined? Pöppel (1997a) describes some experiments dealing with the duration of subjective time. In the following we give an excerpt of these experiments, concerning the visual and the auditory modality.

Figure 8 shows an ambiguous line drawing, the Necker cube, named after Louis Albert Necker, who first described it. It is a wire-frame drawing of a cube in isometric perspective, which means that parallel edges of the cube are drawn as parallel lines in the picture. When two lines cross, the picture does not show which is in front and which is behind. This makes the picture ambiguous, i.e. it can be interpreted in two different ways. When a person stares at the picture, it will often seem to flip back and forth between the two valid interpretations. In order to reproduce what follows, it helps to become familiar with both perspectives. The black spot in the corner of the cube in Figure 8 is an aid to envisioning the two perspectives: in one perspective it is in the foreground of the cube, in the other it is in the background. Once we are capable of deliberately swapping between the two perspectives, an experiment can be conducted demonstrating the scope of the human time integration capability: we stare at the cube and try to hold one perspective as long as possible. What happens then is that after a few seconds the perspective swaps automatically. Now we try to hold the swapped perspective as long as possible. We will notice that once again, after some seconds, the cube swaps against our wishes. One way to overcome the cube's forced swapping is to stare at an arbitrary point of the cube and try to think of something different. As a result, the cube remains stable, because we have banned it from consciousness.

3.3 Mental Representation of Time

Figure 8 The Necker Cube is an optical illusion first published in 1832 by the Swiss crystallographer Louis Albert Necker. It offers a means to estimate the duration of the subjective present.


The spontaneous alternation of ambiguous figures is an effect that is also observed in the auditory modality. A similar experiment can be conducted with, e.g., the ambiguous phoneme sequence CU - BA - CU - .... For some seconds one hears BACU, whereupon for another couple of seconds one hears CUBA (Pöppel, 1997b). The spontaneous alternation rate in the two modalities suggests that a low-frequency mechanism binds successive events of up to 3 s (Pöppel, 1994) into perceptual units. After this period attentional mechanisms are elicited that open sensory channels for new information; if the physical stimulus remains the same, the alternative interpretation of the stimulus will gain control. Metaphorically, at intervals of up to 3 s the brain scans the sensory inputs and asks: »what is new?«

Evidence for the 3-seconds-hypothesis is also supplied by experiments using other paradigms. Studies on the temporal reproduction of stimuli of different durations show that stimuli of up to 3 s are reproduced almost veridically. Longer stimuli are reproduced as significantly shorter and with much greater variability (see Figure 9). Intervals of up to 3 s can be mentally preserved, or grasped as a unit, whereas longer stimuli are likely to be squeezed into the 3 s interval.


[Figure 9: plot; horizontal axis: Duration of stimulus (s), 0 to 7; vertical axis scaled 0 to 7]

Figure 9 Example of the reproduction of temporal stimuli between 0.5 and 7 s duration from one subject. Stimuli were given in random order. A continuous light was used as stimulus. At S=R, stimulus duration equals reproduction. ALth is the geometric mean of all stimulus durations. Note that for stimuli longer than 3 s temporal reproduction remains short. Data from Pöppel (1971).

3.3.2 High Frequency Processing

Evidence for a high-frequency processing system comes, in part, from studies on temporal order thresholds (Hirsh et al., 1961; von Steinbüchel et al., 1996). If experimental subjects have to indicate the temporal order of two stimuli, a threshold of 30 ms is observed, independent of sensory modality. Data picked up within 30 ms are treated as co-temporal, that is, a relationship between separate stimuli with respect to the before-after dimension can no longer be established. This does not mean that the central nervous system cannot process information over intervals shorter than 30 ms: the localisation of objects in auditory space, for example, requires a much higher temporal resolution (for detailed explanations concerning microsecond timing, see section 3.4.1). Distinct events, however, require a minimum of 30 ms to be perceived as successive.


Support for distinct system states comes from a variety of studies using different paradigms: under stationary conditions, response distributions of reaction time (Jokeit, 1990) or pursuit eye movements (Pöppel, 1986) show typical characteristics in the sense that preferred response latencies are separated by intervals of approximately 30 ms (see Figure 10). These effects can be explained on the basis of neuronal oscillations. After the transduction of a stimulus, an oscillation with a period of 30 ms is initiated that is phase-locked to the stimulus. Such an oscillatory mechanism, under environmental stimulus control, allows integration of information from different sensory modalities, i.e., data from various inputs can be collected within one period, which defines a basic system state. The separate response modes possibly represent similar successive and discrete decision-making stages, as is assumed in high-speed short-term memory scanning (Sternberg, 1966).

[Figure 10: histogram; horizontal axis: Latency (ms), 0 to 350; arrows mark the preferred latencies]

Figure 10 Histogram of 463 latencies of pursuit eye movements in three subjects. Data are summarised in 10 ms bins. Arrows indicate temporal positions of the preferred latencies that are separated by 30 to 40 ms. Data qualitatively from Pöppel (1986).

Further support for the 30-ms-hypothesis is supplied by neurophysiological observations. The auditory evoked potential in the midlatency region shows an oscillatory component with a period of 30 ms (Galambos et al., 1981). This component is a sensitive marker for the anaesthetic state because it selectively disappears during general anaesthesia (Madler et al., 1987). Thus, oscillations with a period of 30 ms represent functional system states that are apparently necessary prerequisites for the establishment of events (Schwender, 1994).

3.4 Neural and Cognitive Models of Time Perception

In a strict sense, time perception should not occur, because receptors for what we refer to as 'time' do not exist. Following the reasoning of the previous section, where the notion of the subjective present was introduced, time can be regarded as a mental construction based on sensory processing. Conceptions of the underlying neural functioning, as well as cognitive models of time perception positioned on a higher level of abstraction, are the topics of this section.

A fundamental part of sensory processing is pattern recognition, that is, how central neurons develop selective responses to spatial and temporal patterns of activity from environmental stimuli. Sensory stimuli can be decomposed into spatial and temporal components. Spatial patterns refer to those that can be discriminated based on a static 'snapshot' of which neurons are active (e.g. retinotopy of cortical activation). Temporal patterns refer to those in which the order, duration, or interval between the activation of neurons is required for stimulus discrimination. The duration of flashed bars of light and the voice-onset time of phonemes are examples of temporal stimuli that span only a few orders of magnitude in time. Altogether, the brain processes temporal information over a range of at least ten orders of magnitude - from microseconds to daily circadian rhythms (see Figure 11).


[Figure 11: logarithmic time axis in ms, from 10^-3 to 10^9, with marks at 1 s, 1 min, 1 h and 1 d]

TASK                                                  APPROPRIATE MODEL
Microsecond Processing: Sound Localisation            Delay Lines, Labelled Lines
Millisecond Processing: Speech Generation/            Population Models
  Recognition, Motion Detection, Motor Coordination
Second Processing: Conscious Time Estimation          Pacemaker-Switch-Accumulator Models
Circadian Rhythm: Appetite, Sleep-Wake

Figure 11 Scales of temporal processing. Humans process temporal information over a scale of at least ten orders of magnitude, executing tasks in the microsecond to the daily scope. At the right side of the figure, appropriate models for particular tasks are listed. There is no sharp border between the use of the appropriate models. Rather, they are assumed to overlap the particular tasks. Modified from Buonomano et al. (2002).


3.4.1 Labelled Lines

The Labelled Lines models are used to explain microsecond temporal processing, which is primarily responsible for the detection of interaural delays used to localise sound sources. In humans it takes sound approximately 600 µs to 700 µs to travel the distance between the left and right ear. The auditory system uses these intervals to calculate the spatial location of the sound source. A relatively simple but extremely sensitive mechanism is used to determine these microsecond intervals: a sound arriving in each ear will activate neurons in the cochlear nucleus. The axons from these neurons function as delay lines; that is, the time an action potential takes is proportional to the distance it has to travel. Neurons in the medial superior olive function as coincidence detectors and use the delays to respond selectively to different intervals. Together these neurons establish a topographic map of auditory space (Carr, 1993). Whereas Labelled Lines models have proven suitable to explain microsecond processing, they are not well suited for complex forms of temporal processing such as sequences and speech (Buonomano et al., 2002). Computationally, the Labelled Lines models are very effective, but only for simple tasks.
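The delay-line principle described above can be illustrated with a minimal sketch. The function name `best_coincidence_detector`, the number of detectors (15) and the total delay-line length (700 µs) are assumptions made for illustration and are not taken from the cited models. The sketch only shows how a row of coincidence detectors with graded axonal delays could map an interaural delay onto a position.

```python
def best_coincidence_detector(itd_us, max_delay_us=700.0, n_detectors=15):
    """Return the index of the detector whose axonal delays best cancel the
    interaural time difference (ITD).

    itd_us > 0 means the sound reaches the left ear first (an assumed
    convention). Detector k receives the left signal after a delay that grows
    with k and the right signal after a delay that shrinks with k.
    """
    best_index = 0
    best_mismatch = float("inf")
    for k in range(n_detectors):
        left_delay = max_delay_us * k / (n_detectors - 1)
        right_delay = max_delay_us - left_delay
        # Left spike arrives at left_delay, right spike at itd_us + right_delay.
        mismatch = abs(left_delay - (itd_us + right_delay))
        if mismatch < best_mismatch:
            best_index = k
            best_mismatch = mismatch
    return best_index


if __name__ == "__main__":
    for itd in (-700.0, 0.0, 350.0, 700.0):
        print(itd, best_coincidence_detector(itd))
```

In this sketch a sound from straight ahead (ITD of 0 µs) activates the central detector, while the largest delays activate the detectors at either end of the row, which is one way to picture the topographic map mentioned in the text.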

3.4.2 Population Clocks (Neural Networks)

In Population Clocks (or population models), time is coded in the population activity of a network of neurons, where any given neuron will contain little temporal information. An additional difference from Labelled Lines models is that there is no explicit range of time constants or time delays specifically set to capture specific intervals. These models generally rely on local network dynamics and time-dependent changes in network states, which appear as a result of e.g. plasticity of synaptic delays. Central to 'biologically feasible' population models are oscillatory pacemaker neurons. The idea of using oscillators to store an arbitrary temporal sequence was introduced in the sixties by Longuet-Higgins (1968). Since then a series of refinements has taken place, triggered by the use of computer simulations.

Figure 12 shows a recent approach aiming to model stored time intervals (Miall, 1996). This model relies on a large population of pacemakers with only a narrow distribution of oscillation periods. A unique group of pacemakers is selected that have the appropriate beat frequency to store any particular time interval. Consider a group of oscillators (pacemaker neurons), each with a slightly different frequency of oscillation, and each spiking for a brief part of each cycle. The beat frequency of any pair of these oscillators is then the frequency at which they spike simultaneously. Thus their beat frequency is much lower than their intrinsic oscillation frequency (footnote n). It is given by the difference between the frequencies of the two cells. For a population of oscillators the beat period, i.e. the reciprocal of the beat frequency, is given by the lowest common multiple of the periods of their oscillations. A group of a few hundred pacemaker cells, even with similar oscillation frequencies, can encode a wide range of time intervals and can recall the interval at a later time.
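The beat relationship can be made concrete with a small numerical sketch. The function names and the example periods (100 ms, 110 ms, 120 ms) are hypothetical and chosen only for illustration; periods are assumed to be whole milliseconds so that a lowest common multiple exists.

```python
from functools import reduce
from math import gcd


def pair_beat_frequency_hz(f1_hz, f2_hz):
    """Beat frequency of two oscillators: the difference of their frequencies."""
    return abs(f1_hz - f2_hz)


def population_beat_period_ms(periods_ms):
    """Time after which all oscillators spike together again.

    Computed as the lowest common multiple of the integer periods (in ms).
    """
    return reduce(lambda a, b: a * b // gcd(a, b), periods_ms)


if __name__ == "__main__":
    # Two pacemakers with periods of 100 ms and 110 ms (10 Hz and about 9.09 Hz).
    print(pair_beat_frequency_hz(1000 / 100, 1000 / 110))  # beats per second
    print(population_beat_period_ms([100, 110]))           # ms between coincidences
    print(population_beat_period_ms([100, 110, 120]))
```

With these example values, oscillators whose periods differ by only 10 ms coincide roughly once per second, which illustrates how pacemakers with similar, fast rhythms could mark intervals in the range of seconds.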

n which is the requirement for storing time intervals in the second range. With such a model it is not necessary to assume pacemaker neurons with a great variability in the oscillation frequency, as other models do.


[Figure 12: A: spike trains of oscillators 1 to 5 along a time axis marked t0, t1 and t2, with the interval to be stored indicated. B: network diagram.]

Figure 12 Storing time with oscillating neurons. A: A schematic diagram of activity in five oscillators, indicated by short vertical bars. The interval t0 - t1 can be encoded by selection of those oscillators active both at t0 and at t1 (oscillators 1, 2, 5). B: The network: a heterogeneous population of oscillators jointly excites an output neuron, which sums incoming activity and fires when a threshold is reached. Modified from Miall (1996).


Computer simulations of this model show both impressive characteristics and severe weaknesses regarding comparability with biological systems. With such a model it is necessary neither to assume unrealistically accurate pacemaker neurons, nor to assume that they fire with unrealistic variability (e.g. from tens of milliseconds to tens of seconds). Furthermore, the model is very robust to noise: large random fluctuations of the pacemaker neurons have little impact on the system's behaviour. But as soon as there is a directional shift instead of a random fluctuation of the units' periods, recall is poor. A further failure to mimic biology is the relationship between interval duration and accuracy: the networks, as modelled, are either accurate or they fail. There is no distribution of responses about the desired time that might lead to the typical Weber's Law relationship between errors and duration. The remaining difficulty with the model presented here is that the group of selected units encoding a particular time interval or sequence needs to be synchronously reset to allow recall of the stored interval. This is possible, but would require some powerful reset signal to reach the entire group of oscillators.


3.4.3 Pacemaker-Switch-Accumulator Models

CHAPTER 3. THEORY

Positioned on a higher level of abstraction than Labelled Lines and Population Clocks is a class of models referred to here as Pacemaker-Switch-Accumulator models. As the name suggests, central to these models is a three-step process beginning with a pacemaker unit that emits pulses (whose rate can be increased or decreased). These pulses are gated to an accumulator through a switch, which can be closed (so that pulses pass) or open (pulses cannot pass). The closure of the switch is triggered by incoming significant temporal information, its opening by the end of the temporal episode to be estimated (Church, 1984; Gibbon et al., 1984). The accumulator is a perceptual store similar to an 'up' counter incremented by pulses which have passed the switch.
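The three-step process can be sketched in a few lines. The perfectly regular pacemaker, the function names and the pulse rates (10 Hz and 20 Hz) are assumptions for illustration only; the models cited above do not prescribe these values.

```python
def regular_pacemaker(rate_hz, duration_s):
    """Pulse times of an idealised, perfectly regular pacemaker (an assumption)."""
    n_pulses = int(round(rate_hz * duration_s))
    return [i / rate_hz for i in range(n_pulses)]


def accumulate_pulses(pulse_times_s, switch_closed_s, switch_opened_s):
    """Count the pulses that pass while the switch is closed.

    The switch closes at the onset of the episode and opens at its end,
    so only pulses inside [switch_closed_s, switch_opened_s) are counted.
    """
    return sum(1 for t in pulse_times_s if switch_closed_s <= t < switch_opened_s)


if __name__ == "__main__":
    # The same 2 s episode, timed with a slow and with a fast pacemaker.
    slow = accumulate_pulses(regular_pacemaker(10.0, 5.0), 1.0, 3.0)
    fast = accumulate_pulses(regular_pacemaker(20.0, 5.0), 1.0, 3.0)
    print(slow, fast)
```

In this sketch, doubling the pacemaker rate doubles the accumulated count for the same physical interval, which is the kind of mechanism such models use to describe changes in subjective duration.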

The Temporal Information Processing Model (TIP) of Figure 13 is an approach that uses the Pacemaker-Switch-Accumulator model at its core. It explains the variance of duration estimations in humans and animals. The model stems from experiments on animal timing behaviour in which particular durations were reinforced by means of Classical Conditioning procedures. The TIP was developed from the animals' recall behaviour for the reinforced durations. TIP also suits human duration estimation well, where consciously learned, rather than reinforced, durations have to be recalled, although some mechanisms concerning attention and arousal are not fully clarified. For attention- and arousal-related work see e.g. Treisman et al. (1990), Block (2001), or Zackay (1998).

[Figure 13: block diagram. Clock Level: Pacemaker, Switch, Accumulator. Memory Level: Working Memory (t), Reference Memory (n*). Decision Level: Comparator with threshold b*; YES if abs (t-n*)/n* < b*, NO if abs (t-n*)/n* > b*.]

Figure 13 The Temporal Information Processing Model (TIP) composed of the three interacting levels: clock, memory, and decision. Modified from (Church, 1984).


Following the Pacemaker-Switch-Accumulator level of the TIP model, two additional levels are introduced and discussed here. The memory level includes a short-term memory store (working memory), which is functionally equivalent to the accumulator, and a long-term store (reference memory), to which reinforced (or learned) durations are transferred at the end of a trial.

Finally, at the decision level, a comparator compares the number of pulses t currently in the short-term store with a random sample n* from the reference memory for the standard duration, which is represented as a Gaussian distribution. A decision as to whether or not to respond depends on the comparison of the absolute difference between t and n*, expressed as a fraction of n*, with a threshold b*, which is a random value drawn from a Gaussian distribution. Thus, the condition for responding is expressed as [abs (n*- t)/n* < b*], with abs indicating absolute difference. If this normalised difference is less than the threshold, responding is initiated (see Church et al. (1994) for an application of this model).
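The decision rule can be written down directly. The following is a minimal sketch: the function names and the example means and standard deviations are hypothetical, and a reference sample n* close to zero is not handled.

```python
import random


def tip_decision(t, n_star, b_star):
    """Respond (True) if the normalised difference is below the threshold b*."""
    return abs(n_star - t) / n_star < b_star


def sample_decision(t, n_mean, n_sd, b_mean, b_sd, rng):
    """Draw n* and b* from Gaussian distributions, then apply the rule.

    The distribution parameters are illustrative assumptions.
    """
    n_star = rng.gauss(n_mean, n_sd)
    b_star = rng.gauss(b_mean, b_sd)
    return tip_decision(t, n_star, b_star)


if __name__ == "__main__":
    rng = random.Random(1)
    # Proportion of 'respond' decisions for different accumulated pulse counts
    # when the remembered standard is about 100 pulses.
    for t in (60, 80, 100, 120, 140):
        hits = sum(sample_decision(t, 100.0, 10.0, 0.2, 0.05, rng) for _ in range(2000))
        print(t, hits / 2000)
```

Because n* and b* vary from trial to trial, the rule produces a graded response probability around the remembered duration rather than a sharp cut-off.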

Pacemaker-Switch-Accumulator models, including TIP, account for (or have been developed because of) several effects in human time perception showing that the subjective duration of a stimulus can be influenced by factors in addition to its actual physical length. For example, stimuli that are 'filled' (e.g. continuous tones) are usually perceived as longer than equal-length stimuli that are 'empty' (Thomas et al., 1974). Likewise, moving stimuli have been judged as lasting longer than static ones (Goldstone et al., 1974; Brown, 1995), and presentations of familiar words were judged as lasting longer than those of unfamiliar ones (Witherspoon et al., 1985). Frequent results from the classical timing literature are that more intense stimuli tend to be judged as lasting longer than less intense ones (Fraisse, 1964), and that 'sounds are judged longer than lights' (Goldstone et al., 1974). The latter refers to the phenomenon that auditory stimuli frequently appear to have longer subjective durations than visual stimuli of the same real-time length.

Explanations for these effects mainly concern either the closure latency of the switch or the speed of the pacemaker. The closure latency of the switch is supposed to exceed its opening latency, and this difference might depend on the modality or the degree of expectation of the temporal signal (Lejeune, 1998). Alternatively, pacemaker speed can be increased or decreased with arousing or calming stimuli (Boltz, 1994; Wearden et al., 1999).


3.5 Psychophysical Theory for Measuring Thresholds

As mentioned in the introduction to this thesis, we consider the psychophysical approach suitable for measuring the perception and acceptance thresholds of particular delay parameters. In what follows, the fundamentals of psychophysical testing, a specification of the psychometric function, and the adaptive psychophysical procedure applied in the empirical part of the thesis are described.

3.5.1 Testing paradigms

Psychophysical procedures offer various testing paradigms, of which we describe the yes-no and the forced-choice (nAFC: n-Alternative-Forced-Choice) mode. In the yes-no mode subjects are given a series of trials, in each of which they must judge the presence or absence of a stimulus. The ratio between the number of trials containing a stimulus and the total number of trials is usually 0.5, but can be any other value. Usually this ratio is told to the subject in advance. The rate of yes-responses for all tested stimulus intensities is defined as the dependent variable.

A fundamentally different testing mode is the forced-choice mode: subjects are given n alternatives, from which they have to choose the one containing the stimulus. The alternatives are presented with either spatial or temporal coincidence, or without either coincidence. The subjects know that exactly one alternative contains the stimulus, and that the rest have a zero-stimulus. The differences between these two methods become obvious when the presented stimuli are faint. In the yes-no paradigm the proportion of yes-answers approaches zero, whereas in the forced-choice paradigm the proportion of correct answers approaches the value of equal probability for all alternatives, which is the reciprocal of the number of alternatives. This means that, e.g., in two-alternative forced-choice (2AFC) tasks the threshold is located where observers give 75% correct responses, since 50% correct responses are already obtained through the guessing inherent in 2AFC. The basic advantage of 2AFC lies in its well-founded assumption that subjects will opt for the stimulus evoking the strongest perception, regardless of their tendency to say 'yes' or 'no'. This is in contrast to the yes-no paradigm, where decision making in the presence of uncertainty depends on the subject's psychological characteristics, e.g. prudence. Unlike in the yes-no mode, the dependent variable of nAFC is the rate of correct responses for all tested stimuli instead of the rate of yes-responses. In the following we subsume both kinds of dependent variables under the term positive-response rate ψ.

For most psychophysical testing, be it in the clinic or in the research lab, efficiency is of great importance, i.e. the threshold should be estimated with satisfactory accuracy after as few trials as possible. The requirement of a minimal number of trials arises from the fact that after a long run of trials experimental subjects tend to become fatigued and bored, resulting in an apparent drift of their thresholds. For this reason, so-called adaptive psychophysical procedures have been developed, whose primary purpose is to minimize the number of trials. We will recapitulate the adaptive procedure called best-PEST in section 3.5.3; for more details about adaptive procedures see the overview of Treutwein (1995). In the next section we describe the theoretical background necessary to understand this procedure.

3.5.2 Specification of the Psychometric function ψ = f(φ)

The psychometric function assigns a positive-response rate ψ to the range of stimulus intensities. The particular properties of this function are described in the following:

The range of ψ is bounded below by the probability of giving positive responses without perceiving the stimulus (false positive or false alarm rate). This false positive rate consists of a methodical part (only in nAFC) and the 'proper' false positive rate ε. The methodical part is equal to the reciprocal of the number of alternatives n. The upper limit of ψ is (1-δ): large stimulus intensities elicit positive responses in virtually all cases, reduced only by the false negative rate (i.e. misses) δ. The error terms δ and ε are caused, for instance, by observers' inattention or fatigue.

\[ \psi_{-\infty} = P(\text{positive response} \mid \phi \to -\infty) = \frac{1}{n} + \varepsilon \]    eq (1)

\[ \psi_{+\infty} = P(\text{positive response} \mid \phi \to +\infty) = 1 - \delta \]    eq (2)

φ : stimulus intensity {φ ∈ ℝ}
ε : false positive {ε ∈ ℝ | 0 ≤ ε ≤ 0.5}
δ : false negative {δ ∈ ℝ | 0 ≤ δ ≤ 0.5}
n : number of alternatives {n ∈ ℕ | 2 ≤ n ≤ 100} (footnote o)

We define the threshold θ to be that value of stimulus intensity that yields a specified positive-response rate. For practical reasons in testing, the threshold is located at the steepest slope of the psychometric function (for the derivation see section 3.5.3). In the following we will exemplify the psychometric function by means of the logistic model, because this is the kernel function of the adaptive procedure best-PEST, which is the topic of section 3.5.3:

\[ \psi^{*}(\phi) = \left(1 + e^{\beta^{*}(\theta - \phi)}\right)^{-1} \]    eq (3)

ψ*(φ) : kernel function
β* : steepness parameter
θ : threshold

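Since the logistic function is rotationally symmetric about the inflection point, the threshold is in the middle of the response range [ψ_{-∞}, ψ_{+∞}]. Therefore, the rate of positive responses at threshold is:

\[ \psi_{\theta} = 0.5\,(\psi_{-\infty} + \psi_{+\infty}) = 0.5\left(1 - \delta + \frac{1}{n} + \varepsilon\right) \]    eq (4)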

In order to create a formal link between the two testing paradigms, the yes-no situation can be considered as a forced-choice situation with an infinite number of alternatives. In this case the threshold converges to the value where the positive-response rate is:

\[ \psi_{\theta}(\text{Yes/No}) = \lim_{n \to \infty} 0.5\left(1 - \delta + \frac{1}{n} + \varepsilon\right) = 0.5\,(1 - \delta + \varepsilon) \]    eq (5)

o the number of alternatives is restricted to 100, since the practicability of experiments with more than 100 alternatives is doubtful.

The psychometric function ψ*(φ) has to be adjusted for the observers' false positive and false negative rates. For this purpose the kernel function is shifted by n^-1 + ε and scaled to the response range [ψ_{-∞}, ψ_{+∞}], whose width is - according to eq (1) and eq (2) - equal to |1 - δ - n^-1 - ε|:

\[ \psi(\phi) = n^{-1} + \varepsilon + \left(1 - \delta - n^{-1} - \varepsilon\right)\left(1 + e^{\beta^{*}(\theta - \phi)}\right)^{-1} \]    eq (6)

ψ(φ) : adjusted psychometric function

In order to deal with a well-known constant that is comparable between different magnitudes of stimuli, we let β be the slope at the inflection point of the normalized psychometric function. We define the threshold to be at a stimulus intensity of 0.5, i.e. we normalize the stimulus intensity to two threshold units, so that the 'real' slope is obtained in an equally scaled plot (i.e. the slope is equivalent to the tangent of the gradient angle):

\[ \beta = \left.\frac{d\psi}{d\phi}\right|_{\phi=\theta} = \frac{\beta^{*}\left(1 - \delta - n^{-1} - \varepsilon\right)}{4}, \quad \text{that is} \quad \beta^{*} = \frac{4\beta}{1 - \delta - n^{-1} - \varepsilon} \]    eq (7)

β : slope of the psychometric function at threshold (inflection point)

eq (7) inserted in eq (6) leads to:

\[ \psi(\phi) = n^{-1} + \varepsilon + \left(1 - \delta - n^{-1} - \varepsilon\right)\left(1 + e^{\frac{4\beta(\theta - \phi)}{1 - \delta - n^{-1} - \varepsilon}}\right)^{-1} \]    eq (8)

Equation eq (8) is the underlying, generic formula for the threshold estimation by the

best-PEST calculator. Figure 14 depicts the mapping of eq (8) with different parameter

settings:

n ∈ {2, 4, ∞}

β ∈ {1.5, 3, 7}

ε = 0.07

δ = 0.04
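The adjusted logistic function can be evaluated numerically for these parameter settings. The sketch below assumes the form described in the text (a logistic kernel shifted by 1/n + ε, scaled by 1 - δ - 1/n - ε, with the steepness chosen so that the slope at threshold equals β). The function name `psychometric` and the convention `n=None` for the yes-no case are assumptions made for illustration.

```python
import math


def psychometric(phi, theta=0.5, beta=3.0, n=2, eps=0.07, delta=0.04):
    """Positive-response rate at normalised stimulus intensity phi.

    n is the number of alternatives; n=None stands for the yes-no case,
    where the guessing term 1/n vanishes.
    """
    guess = 0.0 if n is None else 1.0 / n
    lower = guess + eps                 # lower asymptote
    span = 1.0 - delta - lower          # distance between the asymptotes
    beta_star = 4.0 * beta / span       # steepness of the logistic kernel
    kernel = 1.0 / (1.0 + math.exp(beta_star * (theta - phi)))
    return lower + span * kernel


if __name__ == "__main__":
    for n in (2, 4, None):
        label = "yes-no" if n is None else f"{n}AFC"
        values = [round(psychometric(x / 4, n=n), 3) for x in range(5)]
        print(label, values)
```

At the threshold the function returns the midpoint between the two asymptotes, and a numerical derivative at that point returns β, which is a quick check of the normalisation.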


[Figure 14: plot of the positive-response rate ψ (0.00 to 1.00) against the normalized stimulus intensity φ [2θ] (0.00 to 1.00); curves labelled 4AFC and Yes/No; legend: β = 1.5, β = 3.0, β = 7.0]

Figure 14 Logistic psychometric graphs depicting yes-no and forced-choice situations (nAFC). The asymptotes are at (1/n + ε), and at (1-δ). The slope β is 3 (straight lines), and 7 and 1.5 (dashed lines) respectively. The stimulus intensity is normalized to 2 threshold units.

Typically psychometric functions are - as depicted in Figure 14 - of a statistical nature (unless they represent a Heaviside step function with its 'step' at the threshold value), i.e. when an observer is presented on several occasions with the same stimulus, he or she is likely to respond yes on some trials and no on other trials. Thus, the threshold cannot be defined as the stimulus value below which detection never occurs and above which detection always occurs, but rather as the stimulus value which is perceptible in a predefined percentage (usually 50%) of the trials. Experimenters are confronted with the question of how to determine the psychometric function of an experimental subject or of a particular study cohort. For that purpose classical psychophysics offers several methods, which we will not explain here in detail. Readers interested in this topic may consult the standard work of Gescheider (1997). To recapitulate, with these methods we determine the detectability of several stimulus intensities and fit an appropriate sigmoid-shaped curve to the data to obtain the psychometric function. From this function the 50% threshold, for instance, can be read off.

In order to measure the empirical threshold, the experimenter must decide what stimulus intensities should be used in the experiment. It should be clear that choosing intensities that are all far above or below the threshold would provide little information for an accurate estimation of the threshold. In addition to the problem of requiring a large number of trials to obtain the threshold, wasted trials are likely to occur with these methods, unless the testing range is known in advance. An approach with these characteristics is far from optimally efficient, and consequently the adaptive methods for measuring thresholds have evolved.

3.5.3 Adaptive Psychophysical Procedures

In all adaptive procedures, the intensity of a stimulus presented on a particular trial is determined by the observer's performance in detecting stimuli presented on prior trials. Except for one class of procedures called maximum-likelihood methods, all methods described in Gescheider (1997) use more or less heuristic rules for deciding after how many trials and by how much the presented stimulus intensity has to be adjusted. Even though it is a characteristic of all adaptive procedures to recall information from the past history of an experimental run, only the maximum-likelihood procedures determine the next stimulus presentation based on a statistical estimation of the observer's threshold, which is made from all of the results obtained from the beginning of the run. The statistical technique of maximum-likelihood estimation assumes that the underlying psychometric function has a specific form. For example it could be a Gaussian (the cumulative normal distribution), logistic, Weibull, or some other sigmoid-shaped function. Because these functions have similar forms, the estimated thresholds are not greatly different, and the choice may only be of importance if e.g. a particular perception model is under test. In the following we describe the best-PEST method (Pentland, 1980), which uses the logistic function as underlying model (footnote p).

Maximum-Likelihood: best-PEST

In best-PEST the approach taken to the problem of determining a threshold is to maximise the information gained with each measurement. In so doing, the smallest possible number of measurements will be required. First we derive the choice of the sampling point on the psychometric function:

For any value φ of the stimulus range [0,k], there is a probability ψ of a positive answer. Given N samples taken at φ, of which p were positive, our estimate of ψ is:

\[ \hat{\psi} = \frac{p}{N} \]    eq (9)

ψ̂ : estimate of the probability of a positive response
p : number of positive responses
N : number of samples

the variance is

\[ \sigma^{2} = \frac{\hat{\psi}\,(1 - \hat{\psi})}{N} \]    eq (10)

σ² : variance of estimation

and the confidence intervals are

\[ CI_{\hat{\psi}} = W \sigma \]    eq (11)

CI_ψ̂ : width of the confidence interval about ψ̂
W : level of desired confidence (e.g. 0.95)

Equations eq (9) and eq (10) inserted in eq (11) lead to

\[ CI_{\hat{\psi}} = W \sqrt{\frac{p\,(N - p)}{N^{3}}} \]    eq (12)

p PEST is the acronym for Parameter Estimation by Sequential Testing.

To get the range of the stimulus φ corresponding to the confidence interval of the dependent variable, the latter has to be divided by the slope of the psychometric curve:

\[ CI_{\phi} = \frac{CI_{\hat{\psi}}}{d\psi / d\phi} \]    eq (13)

CI_φ : width of the confidence interval about φ
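A short numerical sketch shows how the slope enters the width of the confidence interval. The function names are hypothetical, and W is simply passed in as a multiplier of the standard deviation.

```python
import math


def ci_response_rate(p, n_samples, w):
    """Width of the confidence interval about the estimated response rate,
    computed as W * sqrt(p * (N - p) / N**3)."""
    return w * math.sqrt(p * (n_samples - p) / n_samples ** 3)


def ci_stimulus(p, n_samples, w, slope):
    """Corresponding width on the stimulus axis: the response-rate interval
    divided by the local slope of the psychometric curve."""
    return ci_response_rate(p, n_samples, w) / slope


if __name__ == "__main__":
    # 5 positive answers out of 10 samples, sampled where the curve is
    # shallow (slope 0.5) and where it is steep (slope 2.0).
    print(ci_stimulus(5, 10, 1.0, 0.5))
    print(ci_stimulus(5, 10, 1.0, 2.0))
```

For the same number of trials, sampling where the curve is four times steeper yields a stimulus interval that is four times narrower.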

Thus, in order to minimise the estimated confidence interval about the stimulus φ for a given number of trials, we have to maximise the slope of the psychometric function. For all sigmoid-shaped functions, the steepest slope is located at the inflection point. In the rotationally symmetric logistic function used in best-PEST this point is at the centre of the curve. In the yes-no mode this is at 50% (if E=0 and S=1); in the 2AFC mode this is at 75% (if E=0.5 and S=0.5).

In order to explain the best-PEST procedure we reformulate eq (8) and obtain the probability of getting a positive (if r=1) or negative (if r=-1) response at the i-th trial:

\[ P(r_{i}) = E + S\left(1 + e^{\,r_{i}(\hat{\theta}_{i} - \phi)\,4\beta S^{-1}}\right)^{-1} \]    eq (14)

r_i : response of the observer at the i-th trial, r_i ∈ {1, -1}
θ̂_i : i-th estimate of the threshold
E : elevation of the psychometric function according to eq (1)
S : scaling of the psychometric function to the response range according to eq (1) and eq (2)

The strategy in best-PEST is to calculate the likelihood of the threshold lying at each point within the testing range, and to take as the new estimate the stimulus value that is assigned the highest likelihood. After N-1 trials, we find the N-th point of measurement by solving:

\[ \hat{\theta}_{N} = \operatorname*{arg\,max}_{\phi \in (0,k)} \prod_{i=1}^{N-1} \left[ E + S\left(1 + e^{\,r_{i}(\hat{\theta}_{i} - \phi)\,4\beta S^{-1}}\right)^{-1} \right] \]    eq (15)

where (0, k) is the test range of the stimulus φ, and (θ̂_i, r_i) denotes the results of the i-th measurement that was taken at value θ̂_i.

The maximum likelihood estimator is known to be asymptotically the most efficient unbiased estimator. One problem arises: the product of all the probability distributions approaches zero for large numbers of trials. To overcome this problem, we apply a logarithmic transformation to the likelihood function, which yields the sum instead of the product of all likelihood functions. That way, the log-likelihood functions do not run into underflow and need not be standardised to an overall probability of 1. Since the logarithmic transformation is strictly monotonically increasing, the locations of relative maxima are preserved:

$$\operatorname*{arg\,max}_{x\in(a,b)} \prod_{i=1}^{N} f_i(x) = \operatorname*{arg\,max}_{x\in(a,b)} \sum_{i=1}^{N} \log f_i(x)$$   eq (16)
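The underflow problem is easy to reproduce in double-precision arithmetic. The short Python sketch below uses a hypothetical series of 1200 trials that each contribute a likelihood of 0.5; the numbers are assumptions for illustration only and are far larger than the trial counts used in the experiments.

```python
import math

# Hypothetical example: 1200 trials, each contributing a likelihood of 0.5.
likelihoods = [0.5] * 1200

# Direct product: falls below the smallest representable double and becomes 0.0.
product = 1.0
for value in likelihoods:
    product *= value

# Sum of logarithms: stays finite, so maxima can still be located.
log_sum = sum(math.log(value) for value in likelihoods)

print(product)
print(log_sum)
```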

For the function of eq (14) used here, the N-th threshold estimation is calculated according to eq (15) and eq (16):

$$\theta_N = \operatorname*{arg\,max}_{\phi\in(0,k)} \sum_{i=1}^{N-1} \log\left(E + S\left(1 + e^{-r_i(\theta_i-\phi)\,4\beta S^{-1}}\right)^{-1}\right)$$   eq (17)
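The estimation step can be sketched in a few lines of Python. This is a minimal illustration, not the Lingo implementation used in the experiments. It assumes that the logistic term has the form E + S / (1 + exp(-r_i (θ_i - φ) 4β / S)), that the test range is searched on an evenly spaced grid, and that β is expressed per unit of stimulus; the function names, the grid search and the example values are all hypothetical.

```python
import math


def log_likelihood(phi, stimuli, responses, E, S, beta):
    """Log-likelihood that the threshold lies at phi (sum over past trials)."""
    total = 0.0
    for theta_i, r_i in zip(stimuli, responses):
        exponent = -r_i * (theta_i - phi) * 4.0 * beta / S
        # Assumed form of the response probability; fine for moderate exponents,
        # math.exp would overflow for exponents above roughly 709.
        total += math.log(E + S / (1.0 + math.exp(exponent)))
    return total


def next_estimate(stimuli, responses, E, S, beta, k, step):
    """Return the grid point in [0, k] with the highest log-likelihood."""
    grid = [i * step for i in range(int(k / step) + 1)]
    return max(grid, key=lambda phi: log_likelihood(phi, stimuli, responses, E, S, beta))


# Hypothetical yes-no run (E=0, S=1): "yes" at 300 ms, "no" at 100 ms.
estimate = next_estimate([300, 100], [1, -1], E=0.0, S=1.0, beta=0.01, k=400, step=10)
print(estimate)  # the two responses are symmetric about 200 ms
```

A grid search is used here because the procedure only ever presents stimuli in multiples of the smallest step size, so a continuous optimiser would add nothing.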

Figure 15 depicts the expansion of the log-likelihood functions according to eq (17). The parameter settings in Table 9 are used:

Table 9 Parameter settings of the curves depicted in Figure 15.

Parameter   A = 2AFC                              B = yes/no
N           10                                    10
E           0.5                                   0
S           0.5                                   1
β           2                                     2
r           {1, 1, -1, 1, 1, -1, 1, 1, 1, -1}     {1, -1, 1, -1, 1, 1, -1, -1, 1, -1}

Figure 15 Expansion of the log-likelihood functions in the stimulus interval [0, k] of the adaptive procedure best-PEST. Circles indicate the relative maxima; dashed lines show the progression of the threshold convergence. Bold lines represent the predefined initialisations; thin lines are calculated according to the responses r. A: 2-alternative forced-choice (2AFC) paradigm. B: yes-no paradigm. (Panels A and B; x-axis: stimulus intensity φ, from 0 to k.)

4 Experiments

This chapter consists of the description and the results of the conducted threshold determination experiments. The first part concerns experiments in which single subjects interact with the computer, aiming to determine perception thresholds for relative and absolute delays. Relative delay thresholds are obtained for auditory and visual stimuli in both orders. Absolute delay thresholds are obtained for the interaction with both vocal or mouse input, and the corresponding visual response. The second part describes experiments in which pairs or triples of subjects interact over a videoconference using an emulated communication network. This part consists of experiments aiming to determine absolute delay thresholds for basic auditory and visual interaction as well as for realistic communication tasks.

4.1 In Human-Computer Interaction (HCI) Mode

The experiments conducted in the human-computer interaction (HCI) mode comprise threshold determinations where the experimental subjects receive computer-generated stimuli triggered by the subjects' inputs. Such experiments can be conducted without using an emulated communication network, and require single subjects only. This makes these experiments easier to control, since there is no group dynamic aspect present. Furthermore, due to the plain technical infrastructure of such experiments, much less effort is needed to install and calibrate the whole set-up. The HCI experiments consist of the following threshold determinations:

• Relative delay thresholds: auditory before visual (condition AV), and visual before auditory (condition VA) (Zuberbühler et al., 2002).

• Absolute delay thresholds: vocal trigger - visual response (condition VocVis), and mouse trigger - visual response (condition MouVis) (Zuberbühler et al., 2003).

4.1.1 Experimental Setup


The experimental set-up, including stimulus presentation, best-PEST algorithm, and data acquisition, was implemented in a fully computerised environment using Lingo, the object-oriented scripting language of Macromedia Director. The temporal resolution of the entire system was in the range of ±5 ms, whereas the minimal increment administered by the adaptive procedure was set to 10 ms. The settings of the particular parameters are listed in Table 10.

Table 10 Parameter settings for threshold determinations in HCI (for an explanation of the parameters see the Annex on page 101).

Parameter               Relative Delay (AV and VA)   Absolute Delay (VocVis and MouVis)
Mode                    2AFC                         2AFC
Start value k           400 ms                       610 ms
Smallest step size      10 ms                        10 ms
Termination criterion   12 trials                    12 trials
Slope of best-PEST      1.75                         1.75
False negative δ        0                            0
False positive ε        0                            0
Mean of x trials        3                            3
Runs per subject        3                            2 + 1 training

4.1.2 Procedure

We used 2AFC tasks, and applied the adaptive procedure called best-PEST, suggested by Pentland (1980), in all experiments investigating HCI. Best-PEST is described in section 3.5.3 (page 59). 2AFC tasks are designed to keep the observers' decision criterion from biasing the result, and best-PEST is assumed to deliver thresholds after the smallest possible number of trials. In all experiments the subjects received their instructions via written text, displayed at the appropriate time during the experimental run. Their task was to detect whether a delay appeared on the right or on the left side of the screen. The position of the delay was randomly balanced. In addition, every six trials we presented one intermittent trial with a large delay. This trial neither contributed to the results nor did it influence the best-PEST estimation. This particular procedure was chosen based on insights gained in pre-tests: subjects tended to become bored after reaching their approximate threshold (at this point the stimulus is very faint). The presentation of an intermittent trial with a large delay gives the subject the experience of success, resulting in increased motivation.

In the following we describe those parts of the procedure that were not common to all threshold determination experiments.
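The trial sequence described above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the original Lingo code: it assumes that "randomly balanced" means an equal number of left and right positions in random order, and that the intermittent trial is inserted after every sixth adaptive trial. The function name `build_schedule` and the dictionary keys are hypothetical.

```python
import random


def build_schedule(n_adaptive=12, easy_every=6, seed=None):
    """Return a list of trials; 'easy' trials are never passed to best-PEST."""
    rng = random.Random(seed)

    # Balanced delay positions: half left, half right, in random order.
    sides = ["left", "right"] * (n_adaptive // 2)
    if n_adaptive % 2:
        sides.append(rng.choice(["left", "right"]))
    rng.shuffle(sides)

    schedule = []
    for index, side in enumerate(sides, start=1):
        schedule.append({"kind": "adaptive", "delay_side": side})
        if index % easy_every == 0:
            # Intermittent trial with a large delay: motivational only.
            schedule.append({"kind": "easy", "delay_side": rng.choice(["left", "right"])})
    return schedule


schedule = build_schedule(seed=1)
adaptive = [trial for trial in schedule if trial["kind"] == "adaptive"]
print(len(schedule), len(adaptive))
```

Only the responses to the `adaptive` entries would be used to update the threshold estimate; the `easy` entries are presented and then discarded.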

Relative Delay: Auditory before Visual (AV) and Visual before Auditory (VA)

The delay occurred between the presentation of a black disc (diameter of 4 arc degrees) on a yellow background and the presentation of a 1 kHz tone of 60 dBA with a rise/fall time of 10 ms. Both disc and tone lasted for 500 ms; therefore stimulus onsets as well as stimulus offsets served as cues for delay. Stimulus order (auditory before visual or vice versa) was randomly chosen. That way subjects could not gain insight into the logic of the best-PEST procedure, and should not have been able to predict the next trials.

Figure 16 Test sequence of one trial. The duration of Δt is equal to the maximum likelihood of the threshold computed in the best-PEST procedure. The occurrence of Δt is randomly balanced between the left and the right side of the screen. The question after each sequence was: »On which side did you perceive a time difference between sound and picture?« (Timing diagram of the visual and auditory stimuli; x-axis: t [ms], 0 to 1750.)

Seven female and nine male subjects (aged between 25 and 56, mean=32) participated in the 20 minute experiment. The experimental design was within-group, i.e. all subjects performed all conditions. The 1 kHz hearing threshold level and the visual acuity of the subjects were tested by means of the Bosch ST10 audiometer and the Landolt-ring acuity chart. All subjects had normal hearing and normal or corrected-to-normal vision. Each subject had to complete the threshold procedure three times for both orders, resulting in a total of 96 threshold estimations. The median of the three threshold estimations per person was taken for further analysis.

Absolute Delay: Vocal Trigger - Visual Response (VocVis)

The delay occurred between a vocal trigger and the disappearance of a visual stimulus on the screen. As visual stimulus, a black cipher (digit) with a height of 4 arc degrees appeared on a white background. The subjects were told to pronounce the displayed cipher. As soon as the sound level of their voice exceeded 65.5 dBA, the cipher disappeared either after a small, system-inherent delay or after the delay calculated by best-PEST. A sound level of 65.5 dBA was chosen as the trigger point because (1) this sound level represented the average voice level of the subjects and (2) it is high enough to avoid the disappearance of the cipher due to background noise.

Figure 17 Test sequence of one trial of the voice-visual interaction threshold experiment. The duration of Δt is equal to the maximum likelihood of the threshold computed in the best-PEST procedure. The occurrence of Δt is randomly balanced between the left and the right side of the screen. t0 is the average response latency of the microphone device. The question after each sequence was: »On which side did you perceive a delayed disappearance of the cipher?« (Timing diagram of voice input and visual stimulus; x-axis: t [ms]; T = trigger on 65.5 dBA, D = disappearance.)


Absolute Delay: Mouse Trigger - Visual Response (MouVis)

The delay occurred between a mouse trigger and the appearance of a visual stimulus on the screen. The visual stimulus consisted of a red square with a side length of 5 arc degrees on a white background. The subjects were told to click on a white square, whereupon the square changed its colour to red, either immediately or after a delay calculated by best-PEST.

Figure 18 Test sequence of one trial of the click-visual interaction threshold experiment. The duration of Δt is equal to the maximum likelihood of the threshold computed in the best-PEST procedure. The occurrence of Δt is randomly balanced between the left and the right side of the screen. The question after each sequence was: »On which side did you perceive a delayed change of colour?« (Timing diagram of mouse input and visual stimulus; x-axis: t [ms]; T = trigger on mouse up, D = disappearance.)

Seven female and 17 male subjects (aged between 19 and 41) were recruited for the two experiments testing two different input modalities (VocVis and MouVis). The experimental design was within-group, i.e. all subjects performed all conditions. Each of the experiments lasted approximately 20 minutes. Each subject had to complete the threshold procedure three times for both modalities, resulting in a total of 144 threshold estimations. The first threshold estimation in each condition was considered as practice and was therefore excluded from further analysis.

4.1.3 HCI-Results


Relative Delay: A=Auditory before Visual (AV) and B=Visual before Auditory (VA)

Figure 19 shows the logistic psychometric functions for the perception of relative delays (curve fitting by the method of least squares). The 75 % thresholds obtained are 74 ms (AV) and 98 ms (VA), respectively. The thresholds obtained by the adaptive procedure best-PEST are 71 (±17) ms for the AV-condition and 105 (±25) ms for the VA-condition (numbers in brackets stand for the 95% confidence intervals; see Figure 20). A one-sided, paired t-test shows that the mean of the VA-threshold is significantly higher (p<0.05) than the mean of the AV-threshold. Gender and age had no significant effect on the detection of relative delays.

Figure 19 Psychometric functions for perception of relative AV delay (solid line) and relative VA delay (dashed line). Arrows indicate the 75% thresholds, which are at 74 ms (AV) and 98 ms (VA), respectively. The data are fitted with a logistic model. (x-axis: Delay (ms), 0 to 400; legend: Auditory before Visual (AV), Visual before Auditory (VA).)


Figure 20 Delays calculated by best-PEST for every trial. The curves represent the mean of all subjects. Continuous lines show the progression of the threshold convergence. Dashed lines indicate the final thresholds; grey lines indicate the 95% confidence interval. A: Auditory before visual, B: Visual before auditory. (Panels A and B, n=16; x-axis: # of trials, 1 to 11; y-axis: delay (ms), 0 to 200; legend: mean delay, plus and minus 95% C.I., threshold.)


Absolute Delay: VocVis and MouVis


Figure 21 shows the logistic psychometric functions for the perception of absolute delays (curve fitting by the method of least squares). The 75 % thresholds obtained are 98 ms when vocal inputs trigger visual responses (VocVis), and 65 ms when mouse clicks trigger visual responses (MouVis). The thresholds obtained by the adaptive procedure best-PEST are 115 (±23) ms for the VocVis-condition and 78 (±14) ms for the MouVis-condition, respectively (numbers in brackets stand for the 95% confidence intervals). A one-sided, paired t-test shows that the mean of the VocVis-threshold is significantly higher (p=0.01) than the mean of the MouVis-threshold. Gender and age had no significant effect on the detection of absolute delays in HCI.

Figure 21 Psychometric functions for delay perception between mouse trigger and visual response (solid line) as well as between voice trigger and visual response (dashed line). Arrows indicate the 75% thresholds. (x-axis: Delay (ms), 0 to 600; legend: Mouse Trigger / Visual Response, Voice Trigger / Visual Response.)


Figure 22 Delays calculated by best-PEST for every trial. The curves represent the mean of all subjects. Continuous lines show the progression of the threshold convergence. Dashed lines indicate the final thresholds; grey lines indicate the 95% confidence interval. A: Voice trigger - Visual response, B: Mouse trigger - Visual response. (Panels A and B, n=24; x-axis: # of trials, 1 to 12; y-axis: delay (ms); legend: mean delay, plus and minus 95% C.I., threshold.)

Table 11 summarises the results of the threshold determinations conducted in the HCI mode.

Table 11 Summary of results of the HCI experiments.

                                          Relative Delay                  Absolute Delay
                                          AV             VA               VocVis          MouVis
Threshold best-PEST                       71 (±17) ms    105 (±25) ms     115 (±23) ms    78 (±14) ms
Threshold fitted model                    77 ms          98 ms            98 ms           65 ms
Slope β_s of the standardised
psychometric function (threshold at 0.5)  1.756          2.252            1.111           1.133
Experimental design                       Within-group                    Within-group
Significance difference                   AV < VA (p<0.05)                VocVis > MouVis (p=0.01)
Significance age (p<0.05)                 no             no               no              no
Significance gender (p<0.05)              no             no               no              no

4.2 In Human-Human Interaction (HHI) Mode

The experiments conducted in the human-human interaction (HHI) mode comprise threshold determinations in which the experimental subjects interact with each other over a videoconference that uses an emulated ATM-network infrastructure. Thus, in a strict sense these experiments investigate network-mediated HHI. The participating subjects act as both stimulus producer and stimulus receiver. In contrast to the experiments described before, the intermediary computer system is not involved in producing stimuli, in the sense of newly created ones. Rather it is used to process and transmit the human expressions, and to reproduce them as realistically as possible. The HHI experiments consist of the following threshold determinations:

• Absolute delay: Basic auditory interaction between two subjects (condition AudBas), and basic visual interaction between two subjects (condition VisBas).

• Absolute delay: Realistic audio-visual interaction between three subjects (condition AudVisReal), and realistic auditory interaction between three subjects (condition AudReal).


4.2.1 Experimental Setup

The experimental set-up depicted in Figure 23 consists of two (in the case of basic interactions) or three (in the case of audio-only and audio-visual interactions) videoconference stations. Each station accommodates the so-called ETHMICS Kubus (which contains the ETHMICS videoconferencing system developed at the Computer Engineering and Networks Laboratory (Rothlisberger, 1998), the ATM transmission hardware, as well as a built-in Macintosh computer), a monitor, a camera, a microphone and headphones. The workstations are connected via fibre passing through a system called ARES (Kurmann, 1997), which simulates the behaviour of ATM channels in real time, with performance degradations (such as delay or errors) for various network configurations and assumptions about background traffic. The whole set-up is supervised by a control station, which sends delay settings to ARES. The control station is also connected to the workstations, in order to ask the test participants periodically to rate the delay (by means of a UDP-based client/server application). The ratings are sent back to the control station, where the next delay is calculated according to best-PEST.

Figure 23 Wiring of the experimental set-up. (Components: Kubus stations, ARES, recording station, control station. Connections: Videoconference Network (ATM), Control Network (Ethernet), Recording (IEEE 845i), serial bus.)

Furthermore, the video signals from the videoconference cameras are displayed on a monitor in the observation area. Additionally, these signals are recorded on digital video tape and are saved directly onto a hard drive. For further analysis, the data is encoded into MPEG-2 and burned to a DVD. The whole experimental set-up has a built-in one-way delay of about 65 ms (with buffered audio stream). This means that the no-delay situation presented to the subjects has in fact an absolute (sub-threshold) delay of 130 ms. Table 12 lists the separate delays inherent in the particular videoconference components. It can be seen that the major delay contributions are due to the capturing and processing of visual information.
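The delay budget listed in Table 12 can be checked with a few lines of arithmetic. The Python sketch below is an illustration only; it assumes that the ATM network contribution, given in the table as less than 1 ms, is counted at its upper bound of 1 ms, and that the 130 ms figure corresponds to twice the one-way delay. The dictionary name `components` is hypothetical.

```python
# One-way delay contributions in ms, as listed in Table 12.
# The ATM network is specified only as "< 1 ms"; 1.0 is an assumed upper bound.
components = {
    "CCD camera": 30.0,
    "Digitiser": 0.0,
    "JPEG encoder": 1.0,
    "Channel buffer outgoing": 0.5,
    "ATM network": 1.0,
    "Channel buffer incoming": 0.5,
    "JPEG decoder": 1.0,
    "Scaling and de-interlacing": 24.0,
    "Graphics card": 7.0,
}

one_way = sum(components.values())
round_trip = 2 * one_way  # assumption: delay experienced in an interaction

# Share of the one-way delay caused by capturing and scaling the picture.
visual_share = (components["CCD camera"] + components["Scaling and de-interlacing"]) / one_way

print(one_way, round_trip, round(visual_share, 2))
```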

Table 12 Minimal one-way delay in the videoconference network, subdivided into the particular components. Data from (Rothlisberger, 1998).

Component                     Average Delay [ms]
CCD-Camera                    30
Digitiser                     0
JPEG-Encoder                  1
Channel Buffer outgoing       0.5
ATM Network                   <1
Channel Buffer incoming       0.5
JPEG-Decoder                  1
Scaling and De-Interlacing    24
Graphics-Card                 7
TOTAL                         65

4.2.2 Procedure

All experiments investigating HHI are conducted with the above-specified set-up employing the best-PEST procedure. It is not possible to approach the thresholds of all interacting subjects simultaneously, since the subjects share the same delay values, which are calculated on the basis of the responses of only one subject. Therefore, in all HHI experiments using adaptive methods, one subject has to be designated whose threshold is then determined. The remaining subjects contribute only with their corresponding ratings. In the following the particular procedures for the HHI experiments are described.


Absolute Delay: Basic Interaction Task (AudBas and VisBas)

The aim of this task was to evaluate the absolute delay threshold for basic auditory and visual interactions. For this purpose the experimental subjects either had to count from one to ten in alternating order (auditory condition), or had to give hand signs in the same way (visual condition). One of the two subjects held the relevant information required for the best-PEST calculation. If the answer of this subject was correct, the delay value for the next trial was decreased; otherwise it was increased. The subjects were instructed to react as fast as possible after recognising the partner's expression. That way, the unknown reaction time could be better controlled, in the sense that no reasoning took place about the answer to give. Applying a 2AFC paradigm, this procedure had to be accomplished twice (see Figure 24): one course with an introduced delay computed by best-PEST, and another course without any additional delay. The delay was randomly introduced either in the first or the second course, and the subject's task was to indicate in which of the two courses the delay was. With this task we expect to measure the lowest possible threshold for absolute delays, since the subjects communicate with a maximal degree of interactivity.

Figure 24 Test sequence of one trial consisting of two courses with five stimulus expressions per subject. The stimulus s is of auditory or visual type. τ0 is the built-in delay plus reaction time. Δt is the transit delay equal to the maximum likelihood of the threshold computed with the best-PEST algorithm. The occurrence of Δt is randomly balanced between the first and the second course. The question after each sequence was: »In which course did you perceive a delay?« (Timing diagram of the alternating stimuli of subjects A and B in courses (1) and (2); x-axis: t [ms].)

Six female and 14 male subjects (aged between 21 and 35, mean=25) completed the basic interaction task, giving a total of 640 ratings for different delay values. The experimental design was within-group, i.e. all subjects performed all conditions. The subjects were mainly recruited on campus and received 10 CHF for participation in the 20 minute experiment.

Absolute Delay: Realistic Task (AudVisReal and AudReal)


The aim of this task was to evaluate the absolute delay thresholds for a realistic

communication scenario. Evoking natural conversations between subjects that are

captured by video cameras and observed by experimenters is still a great challenge.

Therefore, the least we can do is to select a discussion topic that is familiar to the target

group acting as experimental subjects. As for the basic task, the subjects for this task

were mainly university students. We consider them to be familiar with the problems of

shared flats, either from experience or from hearsay, and consequently they should have

a well-founded opinion to communicate. This makes this topic suitable to be discussed in

the experiment.

The task was structured in two parts. At first the subjects introduced themselves over the videoconference (condition AudVisReal) or over the audio channel (condition AudReal). They could do that autonomously or according to predefined questions to be asked of each other. During this phase the supervisor introduced either pronounced delay values or no delay, and informed the subjects accordingly, in order to acquaint them with the delay issue. In the second phase the subjects were required to communicate freely according to the following scenario: One of the three subjects has rented a four-room flat and needs to find two flat-mates. The remaining two subjects play the prospective flat-mates. As discussion hints they were given keywords such as shopping and food, visits of friends, or cleaning regime. Furthermore, they had floor plans of the flat that were to be used for the room allocation. During the second phase delay was introduced according to the best-PEST algorithm, using a yes-no paradigm, i.e. after one minute of conversation the subjects were asked whether they perceived a delay or not. For these experiments we ran two interleaved best-PEST calculations: one aiming to approach the perception threshold, and another aiming to approach the acceptance threshold. After having perceived a delay, the subjects were asked whether it was disturbing or not.

In the audio-visual condition, 30 female and 47 male subjects (aged between 20 and 45, mean=24) completed the realistic task, giving a total of 954 perception and 602 acceptance ratings for different delay values. The subjects received 30 CHF for participation in the 45 minute experiment. In the audio-only condition, 8 female and 22 male subjects (aged between 19 and 32, mean=24) completed the realistic task, giving a total of 438 perception and 326 acceptance ratings for different delay values. The subjects received 10 CHF for participation in the 30 minute experiment.


4.2.3 HHI-Results

Absolute Delay: Basic Interaction Tasks (AudBas and VisBas)

Figure 25 shows the logistic psychometric functions for absolute delays. The 75 % thresholds are at 196 ms (AudBas) and 204 ms (VisBas). The thresholds obtained by the adaptive procedure best-PEST (see Figure 26) are 216 (±44) ms for AudBas and 237 (±92) ms for VisBas (numbers in brackets stand for the 95% confidence intervals). A one-sided, paired t-test shows that the two means are not significantly different (p>0.05). Gender and age had no significant effect on the detection of basic interaction delays.

Figure 25 Psychometric functions for absolute delay perception with auditory interaction (solid line) and visual interaction (dashed line). Arrows indicate the 75% thresholds. The experimental data are fitted with a logistic model. The data points represent rates of correct answers for particular delay values that have been obtained with 30 or more measurements. (x-axis: Absolute Delay (ms), 0 to 480; legend: Visual Perception of Absolute Delay, Auditory Perception of Absolute Delay.)


Figure 26 The curves represent the mean of all subjects. Continuous lines show the progression of the threshold convergence. Dashed lines indicate the final thresholds; grey lines indicate the 95% confidence interval. A: Auditory interaction, B: Visual interaction. (Panels A and B, n=9; x-axis: # of trials, 1 to 16; y-axis: delay (ms), 0 to 500; legend: mean delay, plus and minus 95% C.I., threshold.)


Absolute Delay: Realistic Task (AudVisReal)


Figure 27 shows the logistic psychometric functions for perception and acceptance when interacting audio-visually in a realistic task (condition AudVisReal) (curve fitting by the method of least squares). The 50 % perception threshold obtained is 1220 ms, and the 50 % acceptance threshold is 2080 ms.

Figure 27 Psychometric functions for absolute delay perception (solid line) and acceptance (dashed line) in a realistic, conversational task using both the audio and the visual channel. Arrows indicate the 50% thresholds. The experimental data are fitted with a logistic model. The data points represent rates of yes-answers for particular delay values that have been obtained with 100 or more measurements. (x-axis: Absolute Delay (ms), 0 to 2800; legend: Perception of Delay, Non-Acceptance of Delay.)


Absolute Delay: Realistic Task (AudReal)

The thresholds of absolute delays obtained by the adaptive procedure best-PEST in a task where only the audio channel is supported are at 970 (±330) ms for perception and 1760 (±410) ms for acceptance (numbers in brackets stand for the 95% confidence intervals). A one-sided, paired t-test shows that the mean perception threshold is significantly lower (p<0.01) than the mean acceptance threshold. Gender and age had no significant effect on the perception and acceptance of absolute delays. Figure 28 shows the particular logistic psychometric functions, obtained by curve fitting with the method of least squares. The 50 % thresholds obtained are 800 ms (perception) and 1690 ms (acceptance), respectively.

Figure 28 Psychometric functions for absolute delay perception (solid line) and acceptance (dashed line) in a realistic, conversational task using only the audio channel. Arrows indicate the 50% thresholds. The experimental data are fitted with a logistic model. The data points represent rates of yes-answers for particular delay values that have been obtained with 30 or more measurements. (x-axis: Absolute Delay (ms), 0 to 2800; legend: Perception of Delay, Non-Acceptance of Delay.)


Table 13 summarises the results of the threshold determinations conducted in the HHI mode. Note that in the AudVisReal condition, no best-PEST procedure was applied, thus no individual thresholds were obtained. As a consequence it is not possible to quote confidence levels and statements about significance.

Table 13 Summary of results of the HHI experiments (n.a. means: not available).

                                          Basic Interaction               Realistic Task
                                          AudBas         VisBas           AudVisReal         AudReal
Perception Threshold best-PEST            216 (±44) ms   237 (±92) ms     n.a.               970 (±330) ms
Perception Threshold fitted model         196 ms         204 ms           1220 ms            800 ms
Acceptance Threshold best-PEST            n.a.           n.a.             n.a.               1760 (±410) ms
Acceptance Threshold fitted model         n.a.           n.a.             2080 ms            1690 ms
Slope β_s of the standardised
psychometric function (thresh. at 0.5)    3.316          3.162            0.8889 (perc.)     1.056 (perc.)
                                                                          1.575 (accept.)    2.324 (accept.)
Experimental design                       Within-group                    Between-group
Significance difference (p<0.05)          no                              n.a.
Significance age (p<0.05)                 no             no               n.a.               no
Significance gender (p<0.05)              no             no               n.a.               no


5 Discussion and Conclusions

In this chapter we discuss the results of the experiments described in the previous chapter. The discussion is divided into relative delays, absolute delays, and a section where we discuss the task dependency of the perception and acceptance of delay.

5.1 Regarding Relative Delays

5.1.1 In Human-Computer Interaction (HCI)

The summary of the HCI experiments in Table 11 (page 74) shows that the perception threshold of a visual stimulus preceding an auditory stimulus is approximately 30 ms higher than the perception threshold of the reverse order. This is plausible, since it reflects human experience in a natural environment, where light propagates much faster than sound. Humans are thus adapted to this situation and thereby less sensitive to it. Other studies (Dixon et al., 1980; McGrath et al., 1985; Lewkowicz, 1996) found that synchronisation errors are detected more easily the more artificial the presented situation is. The presentation chosen for our experiment is highly artificial. We therefore consider the thresholds we found to be suitable for the most stringent conditions, as might be present e.g. in telesurgery applications. As suggested, just noticeable relative delays may serve as decision support for content and service providers and network planners: below these values, users will not benefit from optimising the network with respect to relative delays.


Since the detection performance for relative delays is distributed over the user population, it is, from a service provider's point of view, a 'political' question what percentage of users may be allowed to perceive a particular relative delay. Such detailed information can be calculated from the psychometric functions of Figure 19 (page 70) (see Table 15). For this purpose, the inverse of the generic psychometric function of eq (8) (page 57) is determined:

$$\varphi = \theta \left[ 1 - \frac{\ln\left(100\,\psi^{-1} - 1\right)}{2\beta_s} \right] \qquad \text{eq (18)}$$

φ: delay [ms], {φ ∈ ℝ | φ ≥ 0}
ψ: user percentage [%], {ψ ∈ ℝ | 0 < ψ < 100}
βs: standardised slope (i.e. threshold is at stimulus intensity of 0.5)
θ: threshold [ms]

Table 14 Parameter values obtained from the resulting psychometric function of the relative delay experiments. These values inserted in eq (18) lead to the values listed in Table 15.

      AV      VA
θ     77      98
βs    1.756   2.252

Table 15 Relative delay values perceived by particular percentages of users. Reading example: It can be expected that not more than 25 % of users will detect an AV-delay of 53 ms, and a VA-delay of 74 ms, respectively.

ψ [%]: percentage of users detecting asynchrony in HCI
φ AV [ms]: extent of asynchrony when auditory precedes visual (AV)
φ VA [ms]: extent of asynchrony when visual precedes auditory (VA)

ψ [%]   φ AV [ms]   φ VA [ms]
5       12          34
10      29          50
25      53          74
33      61          83
50      77          98
67      92          113
75      101         122
90      125         146
95      141         162
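The entries of Table 15 can be recomputed from eq (18) and the parameters of Table 14. The following is a minimal sketch of that calculation; the function name `delay_for_percentage` is our own choice for illustration and is not part of the thesis software.

```python
import math


def delay_for_percentage(psi, theta, beta_s):
    """Delay [ms] detected by psi percent of users, following eq (18).

    psi    -- user percentage, strictly between 0 and 100
    theta  -- threshold [ms] (delay detected by 50 % of users)
    beta_s -- standardised slope of the psychometric function
    """
    if not 0 < psi < 100:
        raise ValueError("psi must lie strictly between 0 and 100")
    return theta * (1 - math.log(100 / psi - 1) / (2 * beta_s))


# Parameters taken from Table 14: AV (theta = 77 ms, beta_s = 1.756)
# and VA (theta = 98 ms, beta_s = 2.252).
for psi in (25, 50, 75):
    av = delay_for_percentage(psi, 77, 1.756)
    va = delay_for_percentage(psi, 98, 2.252)
    print(psi, round(av), round(va))
```

For ψ = 25 % this gives 53 ms (AV) and 74 ms (VA), the values quoted in the reading example of Table 15.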

Note that, for consistency with the adaptive method best-PEST, we used a logistic model to fit the data, thus assuming that the users' detection performance follows a logistic function. In some respects this procedure might look unusual, since measured values are commonly described by Gaussian distributions. However, since the notation of the logistic function is more practicable and the two distributions differ only negligibly, we consider the chosen procedure easier to maintain in practice. The procedure is also supported by the fact that the threshold values obtained with the model fitting procedure are close to those obtained with the adaptive procedure, i.e. they are within the 95 % confidence interval. From these results it can be concluded that the logistic model is appropriate, at least in the middle of the response range, i.e. around the threshold value.
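How small the difference between the two distributions is can be illustrated numerically. The sketch below compares the standard Gaussian cumulative distribution with a logistic function whose argument is scaled by 1.702, a commonly used matching constant. The scaling constant and the evaluation grid are assumptions of this illustration, not values taken from our experiments.

```python
import math


def normal_cdf(z):
    """Cumulative standard Gaussian distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z, scale=1.702):
    """Logistic function; 'scale' is the assumed matching constant."""
    return 1.0 / (1.0 + math.exp(-scale * z))


# Evaluate both functions on a fine grid between -6 and +6 standard units.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(z)) for z in grid)
print(f"largest absolute difference: {max_gap:.4f}")
```

On this grid the largest absolute difference stays below one percentage point, which is small compared with the confidence intervals reported for the thresholds.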

If we take a closer look at the results, taking into consideration the different processing times of auditory and visual stimuli, it appears that the threshold differences between AV and VA are inverted: considering the point in time when auditory and visual stimuli become available to consciousness, a bigger internal time difference τAV than τVA is needed to perceive an audio-visual event as asynchronous (see Figure 29). At the AV-threshold value of 71 ms, the perceived difference τAV, allowing for the different processing times for auditory (15 ms) and visual (55 ms) stimuli, takes a value of approximately 110 ms, whereas the perceived difference τVA at the VA-threshold value of 105 ms is approximately 65 ms.

Figure 29 Perceived time differences at the thresholds obtained from the relative delay experiments, when considering the processing time differences of auditory and visual stimuli. (Timelines of the auditory and visual stimuli over t [ms].)
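The arithmetic behind the perceived differences τAV and τVA can be written out as a short sketch. It assumes, as the text does, fixed processing times of 15 ms (auditory) and 55 ms (visual); the function and constant names are hypothetical.

```python
AUDITORY_PROCESSING_MS = 15  # processing time assumed in the text
VISUAL_PROCESSING_MS = 55    # processing time assumed in the text


def perceived_difference(external_delay_ms, first_latency_ms, second_latency_ms):
    """Internal onset difference when the second stimulus lags the first.

    The first stimulus becomes available after its own processing time;
    the second after the external delay plus its processing time.
    """
    first_available = first_latency_ms
    second_available = external_delay_ms + second_latency_ms
    return second_available - first_available


# AV: auditory first, visual follows after the threshold delay of 71 ms.
tau_av = perceived_difference(71, AUDITORY_PROCESSING_MS, VISUAL_PROCESSING_MS)
# VA: visual first, auditory follows after the threshold delay of 105 ms.
tau_va = perceived_difference(105, VISUAL_PROCESSING_MS, AUDITORY_PROCESSING_MS)
print(tau_av, tau_va)
```

The results, 111 ms and 65 ms, correspond to the approximate values of 110 ms and 65 ms given above.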

Recent ERP findings (Molholm et al., 2002) suggest that the auditory component of an audio-visual event prepares early visual areas in the cortex for the upcoming visual component. Under the assumption that this preparation leads to an earlier conscious perception of the visual stimulus, this effect could account for an equalisation of the differences between τAV and τVA, and could thereby indicate a universal time quantum within which multimodal synchrony is perceived. Following this line of argument, it would be necessary to know whether the complementary effect is also observed, i.e. whether the visual component of an audio-visual event is found to prepare the auditory cortex. To our knowledge, such an effect is not known and remains to be investigated. Although our data do not provide sufficient evidence to elucidate such effects, they can serve as a basis for formulating well-founded hypotheses.

5.2 Regarding Absolute Delays

5.2.1 In Human-Computer Interaction (HCI)

The results in Table 11 (page 74) show that it is significantly easier to perceive an absolute delay when interacting with the computer by mouse clicks rather than vocally. More precisely: voice-visual interaction delays are less likely to be detected than click-visual interaction delays. The difference between the two thresholds is approximately 30 ms. Since the voice trigger is less distinct than the mouse trigger, this difference makes sense: the sharp onsets of mouse clicks facilitate the detection of delays, compared to the blurred onsets of vocal utterances.

Table 16 Parameter values obtained from the resulting psychometric function of the absolute delay experiments in HCI. These values inserted in eq (18) (page 86) lead to the values listed in Table 17.

      VocVis   MouVis
θ     98       64.8
βs    1.111    1.133

Table 17 Absolute delay values perceived by particular percentages of users. Reading example: It can be expected that up to 75 % of users will detect an absolute delay of 146 ms when interacting by voice. And up to 75 % of users will detect an absolute delay of 96 ms when interacting by mouse clicks.

ψ [%]: percentage of users detecting absolute delays in HCI
φ VocVis [ms]: absolute delay in VocVis interaction mode
φ MouVis [ms]: absolute delay in MouVis interaction mode

ψ [%]   φ VocVis [ms]   φ MouVis [ms]
25      50              33
33      67              45
50      98              65
67      129             85
75      146             96
90      195             128
95      228             149


Table 11 (page 74) shows that the threshold obtained with the model fitting procedure is within the 95 % confidence interval of the mean threshold obtained with best-PEST. Thus, as in the relative delay experiments, we suggest that around the threshold the logistic model fits the experimental data well. However, considering the psychometric function of Figure 21 (page 72), it appears that this is no longer true for small delays: the best fit of the logistic function intersects the ordinate at a point above 50 %. Translated to user percentages, this would mean that a certain percentage of users, say 10 %, would detect a nonexistent delay. This scenario is conceivable in experiments using the yes-no mode, when subjects give yes-answers without perceiving a delay (commonly expressed by the false alarm rate ε). But in forced-choice experiments, as in the case at hand, this is assumed not to happen; rather, they are applied precisely because one aims to avoid such response bias. The following two reasons illustrate why these response biases are unlikely to occur:

• The subjects had to indicate in which presentation the delay occurred, not whether there was a delay. That way, the subjects could not pretend to perceive a delay.

• The position of the presentation containing the delay was random, and the delay varied following the best-PEST procedure; thus it could not be anticipated.

Thus, since the bias is unlikely to result from methodical weaknesses, we must reconsider, and possibly revise, the assumption that the logistic (and also the Gaussian) distribution describes the users' detection performance. At least for absolute delays, we have to take other distributions into account. Qualitatively, it seems that right-skewed distributions (e.g. Log-Normal and Poisson, see footnote q) could match the data better. In fact, Limpert et al. (2001) suggest that the Log-Normal distribution describes multiplicative biological processes better than the popular Gaussian distribution. Furthermore, they analysed arbitrary Gaussian measurement data and found that the Log-Normal distribution matched the data at least as well as the Gaussian distribution.

q Interestingly, in Pacemaker-Switch-Accumulator models (see section 3.4.3 on page 52) the scientific discourse concerns, among other things, the question of which distribution the pacemaker frequency is likely to follow. The hypothesis that the frequency follows a Poisson distribution is more and more evident (Gibbon, 1992; Wearden et al., 2001).

For further delay experiments, we suggest fitting the obtained data with a cumulative Log-Normal or cumulative Poisson distribution, and using one of these distributions as the underlying function of the best-PEST procedure. The drawback of this procedure is that both suggested models are computationally impractical. As a consequence, numerical approximations of the two functions must be implemented.
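As an illustration of such a numerical evaluation, the cumulative Log-Normal distribution can be expressed through the error function, which standard numerical libraries provide. The sketch below is an assumption-laden example, not the fitting procedure used in this thesis: the parameter names `median_ms` (playing the role of the threshold) and `sigma` (a shape parameter) are hypothetical.

```python
import math


def lognormal_cdf(delay_ms, median_ms, sigma):
    """Share of users (0..1) detecting a delay under a cumulative Log-Normal model.

    median_ms -- delay detected by 50 % of users (hypothetical threshold parameter)
    sigma     -- shape parameter on the logarithmic scale (hypothetical)
    """
    if delay_ms <= 0:
        # By construction, nobody detects a nonexistent delay.
        return 0.0
    z = (math.log(delay_ms) - math.log(median_ms)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Example with an assumed median of 98 ms and an assumed shape of 0.6.
for delay in (0, 50, 98, 200):
    print(delay, round(100 * lognormal_cdf(delay, 98, 0.6), 1))
```

Unlike the logistic function, this model passes through zero at zero delay, which is the property motivating the suggestion above.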

For the time being, we have to be satisfied with the data at hand. However, in order to decide whether the logistic model is adequate for small delays, we suggest a heuristic rule for its use: if the standardised slope (i.e. where the threshold is set to the stimulus intensity of 0.5) of the particular psychometric function is greater than 1.7, the model will account for small delays. Under this rule, the logistic function can be expected to intersect the ordinate at values smaller than 3.2 %, which means that less than 3.2 % of the users would detect a nonexistent delay. As can be seen in Table 16, the standardised slopes of the MouVis and the VoiVis conditions are smaller than 1.7. That is why we refrain from stating user percentages detecting small delays for these conditions (see Table 17 on page 89).
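The 3.2 % figure can be checked by evaluating the logistic function at zero delay. The sketch below uses the forward form that results from solving eq (18) for the user percentage; we assume that this is the form of eq (8), which is not repeated here. The function names are our own.

```python
import math


def percent_detecting(delay_ms, theta, beta_s):
    """User percentage detecting a delay; forward form obtained by inverting eq (18)."""
    return 100.0 / (1.0 + math.exp(2.0 * beta_s * (1.0 - delay_ms / theta)))


def intercept_at_zero(beta_s):
    """Percentage of users predicted to detect a nonexistent delay (delay = 0)."""
    return 100.0 / (1.0 + math.exp(2.0 * beta_s))


print(round(intercept_at_zero(1.7), 2))    # slope at the heuristic limit
print(round(intercept_at_zero(1.111), 2))  # slope of the VocVis column in Table 16
print(round(intercept_at_zero(1.133), 2))  # slope of the MouVis column in Table 16
```

With a slope of 1.7 the intercept is about 3.2 %, whereas the slopes of Table 16 give intercepts of more than 9 %, in line with the figure of "say 10 %" mentioned earlier. The intercept depends only on the standardised slope, not on the threshold.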

5.2.2 In Human-Human Interaction (HHI)

In contrast to HCI, in HHI experiments it is not possible to approach the thresholds of all interacting subjects simultaneously, since the subjects share the same delay values, which are calculated on the basis of the responses of only one subject. Therefore, in all HHI experiments using adaptive methods, one has to assign a determining subject whose threshold is ultimately determined. The remaining subjects contribute only their corresponding ratings. For this reason, the statistical power is not as high as could be expected from the number of recruited subjects. In order to include all available information, we therefore applied the curve fitting procedure already applied in the HCI experiments, i.e. we fitted an assumed logistic model to the data by means of the method of least squares. From the obtained psychometric function, the desired thresholds can be read off.
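The principle of fitting the logistic model by least squares can be sketched as follows. This is not the routine used for the thesis: it assumes the forward form implied by eq (18), replaces a proper optimiser by an exhaustive search over candidate parameter values, and uses synthetic, noise-free ratings generated from the model itself rather than experimental data.

```python
import math


def logistic_percent(delay_ms, theta, beta_s):
    """Assumed logistic model: percentage of positive ratings at a given delay."""
    return 100.0 / (1.0 + math.exp(2.0 * beta_s * (1.0 - delay_ms / theta)))


def fit_logistic(delays_ms, percents, thetas, slopes):
    """Exhaustive least-squares search over candidate (theta, beta_s) pairs."""
    best = None
    for theta in thetas:
        for beta_s in slopes:
            sse = sum(
                (logistic_percent(d, theta, beta_s) - p) ** 2
                for d, p in zip(delays_ms, percents)
            )
            if best is None or sse < best[0]:
                best = (sse, theta, beta_s)
    return best[1], best[2]


# Synthetic ratings (NOT thesis data), generated with theta = 200 ms, beta_s = 3.0.
delays = [50, 100, 150, 200, 250, 300, 350]
observed = [logistic_percent(d, 200, 3.0) for d in delays]

theta_hat, beta_hat = fit_logistic(
    delays,
    observed,
    thetas=range(100, 301, 5),              # candidate thresholds [ms]
    slopes=[s / 10 for s in range(5, 51)],  # candidate slopes 0.5 .. 5.0
)
print(theta_hat, beta_hat)
```

Because the ratings of all subjects enter the sum of squares, the fit uses the information of the non-determining subjects as well, which is the reason for preferring it here over the individual best-PEST thresholds.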

Basic Auditory and Visual Interaction

With the curve fitting procedure we found a threshold of 196 ms for auditory, and of 204 ms for visual interaction. These values agree (i.e. are within the 95 % confidence level) with the threshold values obtained from the best-PEST procedure including only half of the subjects (216 ±44 ms for auditory, 237 ±92 ms for visual interaction). To recapitulate the results from the basic interaction task, we found an absolute delay threshold of about 200 ms for both auditory and visual interactions. We should bear in mind that this value is a difference threshold DL, on the basis of the built-in delay of 130 ms plus the reaction time of the subjects, which is about 190 ms (Brebner, 1980). The suggested value has to be understood in the following way: when confronted with an absolute delay of 320 ms, 50 % of the subjects were able to detect an additional delay of 200 ms.

These results are in line with the results from the HCI experiments (see also Zuberbühler et al., 2003), where we investigated the absolute delay between vocal input and delayed visual computer-generated response (condition VoiVis). This absolute delay is 115 (±23) ms, or approximately half of the present value. This makes sense, since in the HCI experiments the subjects were not confronted with human interaction partners, and therefore did not have to allow for the ambiguous (and fluctuating) human reaction time.

In contrast to the findings in the HCI experiments, the logistic model in this case also accounts for small delays (i.e. the standardised slopes are greater than 1.7). For this reason we can quote percentages of users detecting small delays (see Table 19).

Table 18 Parameter values obtained from the resulting psychometric function of the absolute delay experiments in HHI. These values inserted in eq (18) (page 86) lead to the values listed in Table 19.

      AudBas (auditory)   Visual
θ     196                 204
βs    3.316               3.162

Table 19 Absolute delay values perceived by particular percentages of users. Reading example: It can be expected that up to 75 % of users will detect an absolute delay of 228 ms in auditory HHI. And up to 75 % of users will detect an absolute delay of 239 ms in visual HHI.

ψ [%]: percentage of users detecting absolute delays in HHI
φ [ms]: absolute delay in auditory and in visual interaction

ψ [%]   φ auditory [ms]   φ visual [ms]
5       109               109
10      131               133
25      164               169
33      175               181
50      196               204
67      217               227
75      228               239
90      261               275
95      283               299

Realistic Tasks

As we have seen from the basic interaction task, involving two or more people in interactive tasks almost doubles the perception threshold of absolute delays. This effect becomes even more striking when the persons involved solve a realistic task instead of a task designed to facilitate the perception of absolute delays. For the task of free discussion about a familiar topic, we found a perception threshold of over 1200 ms, and an acceptance threshold of almost 2100 ms (see Figure 27 on page 81). Since there was no evidence from other studies to support such high values, we had dimensioned the experimental set-up only for a maximal delay of 2800 ms. In the course of the experiment, we noticed that several subjects did not even detect such a high delay, and a greater number of subjects did not find it disturbing.

Such circumstances make the adaptive procedure best-PEST difficult to apply, since after a few non-detections of the highest stimulus the algorithm requires a huge number of detection trials in order to return to the testing range. Hence we did not pursue adaptive procedures further, but instead presented particular delay values in random order and recorded the respective ratings. As a consequence of this procedure we were no longer able to determine individual thresholds, and thus cannot quote a confidence interval. Instead we applied the model-fitting procedure described earlier to obtain the threshold values and the psychometric functions depicted in Figure 27. They show two things:

• The perception and the acceptance functions are relatively flat, signifying either that no sharp thresholds exist (in which case one might discuss whether the term threshold is appropriate in this context), or that there are great slope and threshold variances, i.e. some subjects have very good time discrimination skills, while others have moderate to poor ones. Based on qualitative observation of the subjects during the test, we tend to favour the latter explanation.

• The non-standardised slopes of the perception and the acceptance functions are essentially of the same size (1.02 versus 1.06). This may indicate that two similar, linearly interconnected mechanisms are involved in perceiving and rating absolute delays. This hypothesis must, of course, be confirmed by further experimentation.

Our finding that the perception threshold is much greater in the realistic than in the basic task, as well as the inconsistent threshold figures found in the literature (see footnote r), suggest three conclusions:

• Perception and acceptance of absolute delays are very much task-dependent. Therefore it is probably not helpful to recommend universal threshold values; rather, they should be suggested for different task categories.

• The choice of value ranges is not a simple task and should be kept as a business strategy of the service provider.

• The main difference between the realistic and the basic task concerns the degree of interactivity. Whereas in the basic task this variable is assumed to be at its maximum, it is at a considerably lower level in the realistic task, since the subjects spend more time studying the documents. Thus, the degree of interactivity could act as the variable by which particular tasks (and communication settings) can be classified. We consider the degree of interactivity as the sum parameter including some of the verbal interaction parameters suggested by O'Conaill et al. (1993): backchannels, interruptions, explicit handovers and number of turns.

r Bouch for instance suggests a value no greater than 400 ms (Bouch et al., 2000b), whereas Isaacs and Tang suggest a delay of between 640 ms and 840 ms to be acceptable (Isaac et al., 1994) (the three figures refer to roundtrip delay).

In order to find reasons for the unexpectedly high perception and acceptance thresholds in the audio-visual realistic task, we ran an experiment with the same task, using the audio channel only. In this condition we found a perception threshold of 800 ms, and an acceptance threshold of 1690 ms. These two values are within the 95 % confidence interval of the threshold means obtained with the best-PEST procedure (970 (±330) ms for perception, and 1760 (±410) ms for acceptance). The fact that the thresholds in the audio-only condition are well below the thresholds in the audio-visual condition suggests two possible explanations:

• The visual channel in an audio-visual application acts as a distractor, i.e. the focus of attention is divided between the audio and the visual channel. Since the audio channel apparently suffices to execute the chosen task, the additional visual information does not yield additional clues for detecting delays; on the contrary, it hampers the detection of delays. This does not mean that the visual channel does not yield usable information. But it seems that the gain of 'media richness' in audio-visual communication has to be paid for by a loss of focussed perception.

• The use of videoconferences (VC) is still unfamiliar to the users acting as experimental subjects, whereas audio-only conversation is not: since telephony is very common, users are well trained to perceive and evaluate situations differing from the ones considered normal. This is not the case in VC, insofar as the subjects do not have a point of reference to compare the experimental situation with (see footnote s).

The fact that, in the audio-only condition, the perception and acceptance thresholds are still above the values found in the literature suggests the following explanation:

• Three subjects participated in our experiment, whereas only two were employed in the experiments described in the available literature. The higher threshold in our experiments could signify that additional conversation members act as additional distractors, i.e. the focus of attention is divided among all members. A further reason could be that, since the videoconference does not support gaze awareness, the members are not sure when they are being addressed. This lowers the degree of interactivity, and thus the perceptibility of absolute delays.

s This explanation resembles in some respects the reasoning that computer-mediated real-time communication is assumed to be compared to the reference point of natural face-to-face communication, whereas for other areas of computer-mediated communication (e.g. browsing the WWW), such a reference point does not exist (see also section 2.2, Scope of Investigation).

Table 20 Parameter values obtained from the resulting psychometric function of the absolute delay experiments executing a specific realistic task. These values inserted in eq (18) (page 86) lead to the values listed in Table 21.

      Perception               Acceptance
      AudVisReal   AudReal     AudVisReal   AudReal
θ     1220         800         2080         1690
βs    0.889        1.06        1.58         2.32

Table 21 Absolute delay values perceived by particular percentages of users. Reading example: It can be expected that not more than 33 % of users will detect an absolute delay of 734 ms when interacting audio-visually, and 535 ms when interacting only auditorily. And not more than 33 % of users will find an absolute delay of 1610 ms disturbing when interacting audio-visually, or 1430 ms when interacting only auditorily. Note that these values apply only to the chosen task.

ψ [%]: percentage of users detecting or accepting absolute delays in HHI

        Perception of absolute delay, φ [ms]   Acceptance of absolute delay, φ [ms]
ψ [%]   AudVisReal   AudReal                   AudVisReal   AudReal
5       n.a.         n.a.                      n.a.         617
10      n.a.         n.a.                      629          889
25      466          386                       1350         1290
33      734          535                       1610         1430
50      1220         800                       2080         1690
67      1710         1070                      2550         1940
75      1970         1220                      2810         2090
90      2730         1640                      3530         2480
95      3240         1920                      4030         2750

Considering the values of the standardised slopes in Table 20, it appears that only the value for the acceptance function in the realistic audio-only task is greater than the suggested value of 1.7. That is why Table 21 lists no delay values for small user percentages in the other conditions.

5.3 Further Research

This thesis reveals several areas where further research is needed concerning the perception and acceptance of delays in multimodal real-time communication, as well as human time perception. In the following, future research needs are divided into relative and absolute delays.

5.3.1 Relative Delay

Although figures for perception thresholds of relative delays between auditory and visual stimuli have been suggested by several authors (Dixon et al., 1980; Summerfield, 1992; Lewkowicz, 1996; Steinmetz, 1996), only a few of them adopted psychophysical procedures in their study designs. While one might argue that this is not necessary, since the suggested values work well in practice, it is nevertheless of interest to verify these values with different study paradigms, and for different contexts.

Furthermore, questions concerning the different perception of asynchrony for different modality orders are predominantly answered by intuitive explanations. A consistent model of multimodal stimulus processing is still lacking. In this area, the emerging medical imaging techniques are a promising means of investigating questions concerning multimodal stimulus integration in humans. They could provide deeper insight into why, for example, asynchrony in AV-stimuli is detected more easily than in VA-stimuli.

5.3.2 Absolute Delay

Since perception and acceptance of absolute delay are strongly task-dependent, we suggested the degree of interactivity as the variable by which tasks and communication settings can be assessed in terms of their delay sensitivity. The usefulness of this variable must be verified. Should it turn out to be appropriate, further work has to be done to classify the abundance of relevant communication settings. Once the communication settings evoking the same degree of interactivity are pooled, further experiments must be conducted with some representative communication settings. The goal of such a procedure is to obtain psychometric functions for particular interactivity rates.

Since interactivity is considered a parameter that can be measured continuously in networked services, it should be possible to adjust delay values according to the measured degree of interactivity. Knowing the corresponding psychometric function, the delay can be set according to a predefined (or negotiated) percentage of users perceiving, or accepting, this particular delay.

On the more theoretical side, further research is needed on modelling time perception. Although a lot of work has already been done in different research areas, it remains to be discovered which neurological mechanisms are responsible for conscious time perception. As a consequence, for the time being, it remains an open question which distribution time perception follows. Most notably, it is not understood how contextual factors such as attention, arousal, modality, mood, age, and intelligence influence the conscious estimation of durations.



Annex

Developed Software: The best-PEST Calculator

As a methodical outcome of the threshold experiments, we developed the best-PEST method used into a fully independent, browser-based application. The idea was to provide experimenters with a tool for measuring thresholds that can be used without any installation, compilation, or even programming effort (in contrast to other available software). The drawback of this premise lies in the missing interface. For security reasons the program has no access to the client computer and therefore cannot provide it with the estimated values directly. The experimenters have to enter the received threshold values into their testing environment by hand. This makes the best-PEST Calculator especially useful for threshold estimations whose stimulus presentation cannot be done with the aid of common computer equipment, e.g. smell and taste thresholds. This manual and the program can be downloaded from the following internet address, also quoted in Zuberbühler (2002):

http://www.psychophysics.ethz.ch/tools/

Depending on the version used, the browser has to be updated with the Macromedia Director plug-in version 8.5. The software automatically recognises whether an update is necessary, in which case it is done within three or four mouse clicks.

In the following, the best-PEST Calculator is described. This description can also be downloaded from the above-mentioned link.

Description

In the following Figure 30, Figure 31, and Figure 32, screenshots of the three masks of the program are shown, and the input and output fields are explained where they are not self-explanatory (indicated by numbers).

Figure 30 Screenshot of the first mask (input), where the settings for the experiment are entered. If all the fields are filled out in the requested format, pressing the 'start' button will lead to the second input mask. If not, a dialogue window pops up, indicating the missing or false input. Clicking the arrow opens the 'advanced settings' fields. By default these settings are: 'slope β' = 2, 'false negative δ' = 0, 'false positive ε' = 0, 'mean of x trials' = 3.

(1) Mode
In the drop-down menu 'mode', the users have the choice between the yes-no and the forced-choice (nAFC) paradigm. If they choose nAFC, an additional input field appears, where the number of alternatives n is to be entered. If n > 100 is entered, the program switches automatically to the yes-no calculation mode. It should be noted that experimental subjects will most likely be overtaxed if they have to make repeated decisions about the presence of a stimulus among more than a hundred alternatives. In any case, if such experiments are planned, one can expect the error caused by the slightly inadequate calculation to be much smaller than the error caused by any other interference, for instance the subject's lapses.

(2) Start value k
Setting of the test interval [0, k], where k determines the highest stimulus value that can be obtained during the run. The upper limit k should be at least twice as large as the expected threshold value. Note that the start value will not be presented to the subject, on the assumption that this value is so high that subjects will perceive it in all cases. In order to deal with comparable slope values, the algorithm uses the normalised range [0, 1] of the stimulus intensity. The stimulus intensity φ is therefore:

$$\varphi = \frac{\varphi^{*}}{k} \qquad \text{eq (19)}$$

φ*: stimulus intensity in desired unit, {φ* ∈ ℝ | φ* ≥ 0}
k: stimulus maximum, {k ∈ ℝ | k > 0}

(3) Smallest Step Size
Determines the size of the smallest stimulus change that can be obtained. Ideally this is the difference threshold of the particular stimulus. If this value is not known, as in the case where we want to determine it in the first place, a suitable step size has to be estimated. Experimenters need to beware of step sizes that are too small or too big, since both result in a large measurement bias of the thresholds. If the ratio between 'start value' and 'smallest step size' is larger than 1000, the program will display a warning and ask for either a bigger step size or a smaller start value. This is a precautionary measure to prevent lengthy computing times.

(4) Termination Criterion
Users have the choice between 'Number of Trials' and 'Number of Reversals'. A reversal R is defined as a change from increasing to decreasing (or the other way around) of the presented stimulus intensities M.

$$M = \{\, m_i \in \mathbb{R} \mid m_i \text{ is presented at trial } i \,\} \qquad \text{eq (20)}$$

$$R = \{\, m_i \in M \mid (m_{i-1} > m_i < m_{i+1}) \lor (m_{i-1} < m_i > m_{i+1}) \,\} \qquad \text{eq (21)}$$

M: set of presented stimulus intensities
R: set of reversals

(5) Advanced Setting: Slope β
As an advanced setting, the users have the opportunity to enter the estimated or known slope of the particular psychometric function. For the definition of the slope see Figure 14 and eq (7). The slope value is calculated according to equal-scaled axes. Entering β implies knowledge about the tested cohort or subject, usually gained through pre-testing. If the slope is not known, β will be set to two by default.

(6) Advanced Setting: false negative δ
δ specifies the false negative rate (or miss rate). This rate is constituted by the observers' negative answers even though the stimulus intensity is at maximum. Entering δ implies knowledge about the tested cohort or subject, usually gained through pre-testing. By default this value is zero.

(7) Advanced Setting: false positive ε
ε specifies the false positive rate (or false alarm rate). This rate is constituted by the observers' positive answers even though the stimulus intensity is zero. In forced-choice experiments, ε does not comprise the methodical false alarm rate, which is the reciprocal value of the number of alternatives. Entering ε implies knowledge about the tested cohort or subject, usually gained through pre-testing. By default this value is zero.

(8) Advanced Setting: Mean of x Trials
x specifies the number of trials to take at the end of an experimental run for calculating the mean threshold value. As a rule of thumb, larger numbers of trials permit larger values of x. By default this value is three.
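The settings described above translate into a few small helper routines. The following sketch is written for illustration only and is not the source code of the best-PEST Calculator; all function names are hypothetical, the plausibility check raises an error where the program shows a warning, and the example run consists of made-up values.

```python
def check_settings(k, smallest_step):
    """Plausibility check on start value k and smallest step size."""
    if k <= 0 or smallest_step <= 0:
        raise ValueError("start value and smallest step size must be positive")
    if k / smallest_step > 1000:
        raise ValueError("ratio of start value to smallest step size exceeds 1000")


def normalise(intensity, k):
    """Map an intensity in the desired unit onto the range [0, 1], cf. eq (19)."""
    return intensity / k


def reversals(presented):
    """Presented intensities that are local minima or maxima, cf. eq (21).

    Returned as a list in order of occurrence, so repeated values are kept.
    """
    return [
        cur
        for prev, cur, nxt in zip(presented, presented[1:], presented[2:])
        if (prev > cur < nxt) or (prev < cur > nxt)
    ]


def final_threshold(presented, x=3):
    """Mean of the last x presented intensities (default x = 3)."""
    tail = presented[-x:]
    return sum(tail) / len(tail)


# Made-up run with start value k = 220 and smallest step size 1.
check_settings(220, 1)
run = [110, 70, 90, 80, 100, 95, 104, 102, 103]
print(len(reversals(run)), final_threshold(run))
```

With the termination criterion 'Number of Reversals', a run would end as soon as `len(reversals(run))` reaches the chosen number; with 'Number of Trials', as soon as `len(run)` does.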


Figure 31 Screenshot of the second mask (input/output), where the computation of the current maximum likelihood threshold is done. Pressing the 'back' button aborts the computation and returns to the first mask to modify the settings. Pressing the 'cancel' button aborts the computation and switches the program to the results mask, which displays the current status of the experiment, without the termination criterion having been reached. (Screenshot: the output field 'Next value to present to the subject is', the response buttons 'CORRECT' and 'INCORRECT', and the button 'Calculate next value'.)

CID Step 1: Output from the best-PEST algorithmThe output value mi is to be presented to the subject. This value is the maximum likelihood estimation of the threshold, obtained from all available information. Since there isno information available from the subjects in the first trial, the initialisation is conductedassuming that 100% of the subjects will perceive the stimulus at the start intensity k, andthat at zero intensity they will be certain not to perceive the stimulus. Therefore the firstoutput will fall somewhere in the middle of the test interval.


Step 2: Response of the subject
After the subject has been presented with the stimulus intensity obtained from step 1, the radio button corresponding to the subject's response is to be selected. In the nAFC mode the buttons are labelled 'CORRECT' and 'INCORRECT', and in the yes-no mode they are labelled 'YES' and 'NO'.

Step 3: Next value
Pressing the button 'calculate next value' triggers the next calculation, whereupon a new value appears in the output field. Steps 1 to 3 have to be repeated until the termination criterion is reached; pressing the button then brings the program to the 'results' mask.

[Screenshot: results mask with the field 'Threshold is at' (here 103), the field 'Values', and a graph of stimulus intensity (ordinate) against number of trials (abscissa) in which the threshold is marked.]

Figure 32 Screenshot of the third mask (output), where the results of the entire experimental run are displayed. Pressing the 'start again' button will return the program to the first mask, and leave the settings as they are.


Threshold value
Output of the final threshold estimate, which is the mean value of the last x trials.

All values
The presented stimulus intensities of the entire experimental run are displayed and selected in the field 'values' so that they can be copied to the clipboard (Ctrl + C).

Graph
The values of the entire experimental run as well as the final threshold are shown in a diagram with stimulus intensity as ordinate and number of trials as abscissa.

Monte-Carlo-Simulations

The following Monte-Carlo simulations were made to evaluate the convergence behaviour of the best-PEST algorithm. All simulations were made in the yes/no mode with equal start values. A built-in random process simulated the response behaviour of an assumed experimental subject, which we call the stochastic observer. For that purpose we assumed that the stochastic observer answers in a logistic manner with a stable threshold, an assumption that is in fact made by best-PEST itself.

According to eq (17) (page 62), θ_N is the N-th threshold estimate produced by best-PEST. For this estimate there is, according to eq (8) (page 57), a probability ψ(θ_N) of a positive response. We obtain the particular answer of the stochastic observer by applying the following procedure: if ψ(θ_N) is greater than a uniformly distributed random number between 0 and 1, the stochastic observer answers yes; if ψ(θ_N) is equal to or smaller than the random number, the stochastic observer answers no. That way, after a sufficient number of runs, we map the outcome of the best-PEST procedure onto the assumed psychometric function of the stochastic observer, and perhaps an empirical law of the algorithm's behaviour can be established.
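A stochastic observer of this kind can be sketched in a few lines of Python. The sketch draws a 'yes' with probability ψ, as the definition of ψ as the probability of a positive response requires. The exact form of eq (8) is not reproduced in this annex, so the logistic parametrisation and the names `psi` and `stochastic_observer` are assumptions for illustration only.

```python
import math
import random

def psi(x, theta, slope):
    """Assumed logistic psychometric function: probability of a 'yes' at intensity x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - theta)))

def stochastic_observer(x, theta, slope, rng):
    """Return True ('yes') with probability psi(x), otherwise False ('no')."""
    return rng.random() < psi(x, theta, slope)

rng = random.Random(1)  # fixed seed so that the run is repeatable
n = 10_000
yes_share = sum(stochastic_observer(1.0, 1.0, 2.0, rng) for _ in range(n)) / n
print(yes_share)  # should lie near psi = 0.5 when the stimulus is at the threshold
```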

In the following we show the results of three simulation runs. Table 22 lists the corresponding parameter settings for the conducted simulations, whose results are displayed in Figure 33, Figure 34, and Figure 35.


Table 22 Parameter settings used for the Monte-Carlo simulations, separated for three conditions. For an explanation of the parameters see the previous chapter.

Parameter                                                      Figure 33     Figure 34     Figure 35
-----------------------------------------------------------------------------------------------------
Mode                                                           yes/no        yes/no        yes/no
Start value k                                                  1.7391        1.7391        1.7391
Threshold θ of the stochastic observer                         1.0000        1.0000        1.0000
Start value k / smallest step size                             40            40            40
Termination criterion: number of trials                        15            5 to 50       50
Slopes of best-PEST's model                                    1.0 to 3.5    0.1 to 5.0    0.1 to 5.0
Slopes of the stochastic observer's psychometric function      same steps    same steps    0.1 to 5.0
False negative δ                                               0             0             0
False positive ε                                               0             0             0
Mean of x trials                                               3             3             3
Number of threshold determinations per measuring point         3             1000          1000
Number of measuring points                                     2500          2500          2500

In order to gain an idea of the accuracy the best-PEST algorithm provides, we ran a simulation with realistic parameter settings: as a trade-off between accuracy and practicability, the simulated subject completes three threshold determinations, each consisting of 15 stimulus presentations with the corresponding decisions. The whole procedure corresponds to a real time of approximately 30 minutes, which of course depends on the duration of each stimulus presentation. With such a scenario, experimenters can be sure that the subjects' fatigue will play a negligible role. For the simulations, we ran the above-mentioned scenario with slopes from 1.0 to 3.5, resulting in a total of 2500 threshold means. The histogram of this distribution can be seen in Figure 33.
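The scenario can be sketched as a small Monte-Carlo loop in Python. This is an illustration under stated assumptions, not the program used for the thesis: the logistic function in linear intensity, the single slope of 3.5 shared by model and observer, the grid of 40 steps, the use of the last three presented stimuli for the mean, and the name `run_best_pest` are all hypothetical, so the printed numbers should not be expected to reproduce the reported ones.

```python
import math
import random
import statistics

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def run_best_pest(theta_true, slope, k=1.7391, n_trials=15, x_last=3,
                  n_steps=40, rng=random):
    """One simulated threshold determination; returns the mean of the last x_last stimuli."""
    grid = [k * i / n_steps for i in range(n_steps + 1)]
    data = [(k, True), (0.0, False)]  # initialisation pseudo-trials
    presented = []
    for _ in range(n_trials):
        # Maximum likelihood estimate from everything observed so far.
        stimulus = max(grid, key=lambda t: sum(
            log_sigmoid(slope * (x - t) if yes else -slope * (x - t))
            for x, yes in data))
        presented.append(stimulus)
        # Stochastic observer: 'yes' with logistic probability at this stimulus.
        p_yes = math.exp(log_sigmoid(slope * (stimulus - theta_true)))
        data.append((stimulus, rng.random() < p_yes))
    return statistics.mean(presented[-x_last:])

rng = random.Random(2003)  # fixed seed for repeatability
# Each measuring point is the mean of three threshold determinations.
means = [statistics.mean(run_best_pest(1.0, 3.5, rng=rng) for _ in range(3))
         for _ in range(100)]
print(statistics.mean(means), statistics.variance(means))
```

The sketch uses 100 measuring points instead of 2500 only to keep the run time short.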


[Histogram: frequency (ordinate) against threshold (abscissa, 0.7 to 1.3; target value = 1.0).]

Figure 33 Distribution of the threshold values obtained with the best-PEST algorithm. The stochastic observer's threshold is 1.0 (target value). The distribution is based on 2500 threshold determinations, each representing the mean of 3 runs.

The distribution is approximately Gaussian with a mean of 0.99755 and a variance of 0.00764.
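As a worked reading of these two figures, and assuming the Gaussian approximation holds, the variance corresponds to a standard deviation of about 0.087, the deviation of the mean from the target value is about -0.002, and roughly 95% of the threshold means lie within about 0.17 of the mean:

```latex
\sigma = \sqrt{0.00764} \approx 0.0874, \qquad
0.99755 - 1.0 = -0.00245, \qquad
0.99755 \pm 1.96\,\sigma \approx [\,0.826,\ 1.169\,]
```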

The aim of the second simulation was to gain insight into the convergence behaviour of best-PEST for different numbers of trials until termination, and for different slope values of both the stochastic observer and the best-PEST model. For that purpose we calculated the variance of the mean threshold after 1000 runs as a function of these variables. The contour lines of equal variance in the range [0, 0.05] can be seen in Figure 34.

[Contour plot: number of trials (abscissa, 5 to 50) against slope (ordinate).]

Figure 34 Simulation of threshold determination with the best-PEST algorithm. The curves show contour lines of threshold variances up to 0.05. The number of trials until a threshold determination stops is on the abscissa; the slopes of the psychometric functions of both stochastic observer and model are on the ordinate. The variance is calculated on the basis of 1000 threshold determinations for each measuring point. The slope's increment is 0.1; the number of trials' increment is 1.

The curves of equal variance of the mean threshold are approximately exponential, which is consistent with the interpretation that the marginal utility of additional trials diminishes as their number increases. This interpretation follows from the nature of the best-PEST procedure: the information gained relative to the existing information decreases with every additional trial, and successive threshold estimates therefore approach the true threshold value. A further prediction that can be made from these data is that the number of trials plays an important role only for large slopes of the psychometric functions.

The third simulation was made in order to analyse the convergence behaviour of best-PEST for different, interdependent slope values of the observer's and of the model's psychometric function. For that purpose we calculated, as in the second simulation, the variance of the mean threshold after 1000 runs as a function of the two slope variables. The contour lines of equal variance in the range [0, 0.05] can be seen in Figure 35.

[Contour plot: slope of model (abscissa, 0.1 to 5.0) against slope of stochastic observer (ordinate).]

Figure 35 Simulation of threshold determination with the best-PEST algorithm. The curves show contour lines of threshold variances up to 0.05. The slope of the model is on the abscissa; the slope of the stochastic observer is on the ordinate. The variance is calculated on the basis of 1000 threshold determinations for each measuring point. The increment is 0.1 for both variables.

At first sight the curves of equal variance do not suggest a simple, explainable model of how the two slope parameters interact. What can be read from them is that there is no reason to choose model slopes much larger than the observer slope, since they increase the variance for a given observer slope, especially in its lower range. As a rule of thumb, a model slope twice as large as the observer slope will provide the best results, since there appears to be a relative minimum on each contour line at these points.

References

Alfano, M. (2000). QUASIMODO - Quality of Service Methodologies and solutions within the service framework: measuring, managing and charging QoS: EURESCOM: European Institute for Research and Strategic Studies in Telecommunications.

Armitage, G. (2000). MPLS: The Magic Behind the Myths. IEEE Communications Magazine, 38(1), 124-131.

Baird, J. C., & Noma, E. (1978). Fundamentals of scaling and psychophysics. New York: Wiley.

Bales, R. F. (1955). How people interact in conferences. Scientific American, 3-7.

Bales, R. F. (1999). Social interaction systems: Theory and measurement. New Brunswick: Transaction Publishers.

Block, R. A., & Zakay, D. (2001). Internal Clocks and the Representation of Time. In C. Hoerl & T. McCormack (Eds.), Time and Memory - Issues in Philosophy and Psychology (pp. 59-76). Oxford: Oxford University Press Inc.

Boltz, M. G. (1994). Changes in internal tempo and effects on the learning and remembering of event durations. Journal of Experimental Psychology, 20, 1154-1171.

Bouch, A., Bhatti, N., & Kuchinsky, A. J. (2000a). Quality is in the eye of the beholder: Meeting users' requirements for Internet Quality of Service. CHI'2000, Hague.

Bouch, A., Sasse, M. A., & DeMeer, H. (2000b). Of Packets and People: A User-Centred Approach to Quality of Service. IWQoS 2000, Pittsburgh, PA.

Braun, A. (2003). Qualitätsaspekte multimodaler Kommunikation: Subjektive und objektive Messungen. PhD thesis, Swiss Federal Institute of Technology, Zurich.

Brebner, J. T. (1980). Reaction Time in Personality Theory. In A. T. Welford (Ed.), Reaction Times (pp. 309-320). New York: Academic Press.


Brebner, J. T., & Welford, A. T. (1980). Introduction: An Historical Background Sketch. In A. T. Welford (Ed.), Reaction Times (pp. 1-23). New York: Academic Press.

Brown, S. W. (1995). Time, change, and motion: The effects of stimulus movement on temporal perception. Perception & Psychophysics, 57, 105-116.

Buonomano, D. V., & Karmarkar, U. R. (2002). How Do We Tell Time? The Neuroscientist, 8(1), 42-51.

Carr, C. E. (1993). Processing of temporal information in the brain. Annual Review of Neuroscience, 16, 223-243.

Celesia, G. G., & Puletti, F. (1971). Auditory input to the human cortex during states of drowsiness and surgical anesthesia. Electroencephalography and Clinical Neurophysiology, 31, 603-609.

Chen, T. M., Walrand, J., & Messerschmitt, D. G. (1989). Protocols for Packet Voice. IEEE Selected Areas in Communication.

Church, R. M. (1984). Properties of the internal clock. Annals of the New York Academy of Sciences, 424, 566-582.

Church, R. M., Meck, W. H., & Gibbon, J. (1994). Application of scalar timing theory to individual trials. Journal of Experimental Psychology: Animal Behaviour, 20, 135-155.

Clark, V. P., Fan, S., & Hillyard, S. A. (1995). Identification of early visual evoked potential generators by retinotopic and topographic analyses. Human Brain Mapping, 2, 170-187.

Clark, V. P., & Hillyard, S. A. (1996). Spatial selective attention affects early extrastriate but not striate components of the visual evoked potential. Journal of Cognitive Neuroscience, 8, 387-402.

Coffman, K. G., & Odlyzko, A. M. (1998). The size and growth rate of the internet. [URL illegible in source]

Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9, 719-721.

Fieandt, K., Huhtala, A., Kullberg, P., & Saarl, K. (1956). Personal tempo and phenomenal time at different age levels. (2). Helsinki: University of Helsinki.

Fluckiger, F. (1995). Understanding networked multimedia: applications and technology. London: Prentice Hall.


Foxe, J. J., & Simpson, G. V. (2002). Flow of activation from V1 to frontal cortex in humans: a framework for defining 'early' visual processing. Experimental Brain Research.

Fraisse, P. (1964). The psychology of time. London: Eyre and Spottiswoode.

Galambos, R., Makeig, S., & Talmachoff, P. J. (1981). A 40-Hz auditory potential recorded from the human scalp. Proceedings of the National Academy of Sciences, 78, 2643-2647.

Galton, F. (1899). On instruments for (1) testing perception of differences of tint and for (2) determining reaction time. Journal of the Anthropological Institute(19), 27-29.

Gescheider, G. A. (1997). Psychophysics: The Fundamentals (3 ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Giard, M. H., & Peronnet, F. (1999). Auditory-Visual Integration during Multimodal Object Recognition in Humans: A Behavioral and Electrophysiological Study. Journal of Cognitive Neuroscience, 11(5), 473-490.

Gibbon, J. (1992). Ubiquity of scalar timing with a Poisson clock. Journal of Mathematical Psychology, 36, 283-293.

Gibbon, J., Church, R. M., & Meck, W. H. (1984). Scalar timing in memory. Annals of the New York Academy of Sciences, 424, 52-77.

Goldstone, S., & Lhamon, W. T. (1974). Studies of auditory-visual differences in human time judgment: 1. Sounds are judged longer than lights. Perceptual and Motor Skills, 39, 63-82.

Gonsalves, T. (1989). Comparative Performance of Voice/Data Local Area Networks. IEEE Selected Areas in Communication.

Guttormsen Schär, S., Arial, M., Zuberbühler, H. J., & Krueger, H. (2002). Distributed Co-operative Design Systems: supporting Human Factors with 'Communicate-It'. 28th Annual Conference of the IEEE Industrial Electronics Society, Sevilla, Spain.

Helder, G. K. (1966). Customer Evaluation of Telephone Circuits with Delay. Bell System Technical Journal, 38(9).

Hershenson, M. (1962). Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology, 63, 289-293.

Hirsh, I. J., & Sherrick, C. E. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62, 423-432.


Isaac, E., & Tang, J. (1994). What video can and can't do for collaboration: a case study. Multimedia Systems, 2, 63-73.

Jokeit, H. (1990). Analysis of periodicities in human reaction times. Naturwissenschaften, 77, 289-291.

Kohfeld, D. L. (1971). Simple reaction time as a function of stimulus intensity in decibels of light and sound. Journal of Experimental Psychology, 88, 251-257.

Kouvelas, I., Hardman, V., & Watson, A. (1996). Lip Synchronisation for Use Over the Internet: Analysis and Implementation. IEEE Globecom'96, London UK.

Krueger, H. (1994). Wahrnehmung und Befindlichkeit ins richtige Licht gesetzt. 11. Gemeinschaftstagung der Lichttechnischen Gesellschaften der Schweiz, Deutschlands, der Niederlande und Österreichs, Interlaken.

Kündig, A., Zuberbühler, H. J., & Braun, A. (2001). QoS User Expectations: State of the Art - Key Parameters - their Relevance and their Determination (QED-R-2). Zürich: ETHZ / TIK, IHA.

Kurmann, H. (1997). On the Emulation of Impairments in ATM-Networks. PhD Thesis, Swiss Federal Institute of Technology, Zurich.

Lejeune, H. (1998). Switching or gating? The attentional challenge in cognitive models of psychological time. Behavioural Processes(44), 127-145.

Lewkowicz, D. J. (1996). Perception of auditory-visual temporal synchrony in human infants. Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1094-1106.

Limpert, E., Stahel, W. A., & Abbt, M. (2001). Log-normal distributions across the sciences - keys and clues. BioScience, 51, 341-352.

Longuet-Higgins, H. C. (1968). Holographic model of temporal recall. Nature, 217, 104.

Madler, C., & Pöppel, E. (1987). Auditory evoked potentials indicate the loss of neuronal oscillations during general anaesthesia. Naturwissenschaften, 74, 42-43.

McDonald, J. J., & Teder-Sälejärvi, W. A. (2000). Involuntary orienting to sound improves visual perception. Nature(407), 906-908.

McGrath, M., & Summerfield, Q. (1985). Intermodal timing relations and audiovisual speech recognition by normal-hearing adults. J. Acoust. Soc. Am., 77(2), 678-685.

Miall, C. (1996). Models of neural timing. In M. A. Pastor & J. Artieda (Eds.), Time, Internal Clocks and Movement (pp. 69-94). Amsterdam: Elsevier Science B.V.


Miller, J. O., & Low, K. (2001). Motor processes in simple, go/no-go, and choice reaction time tasks: a psychophysiological analysis. Journal of Experimental Psychology: Human Perception and Performance, 27, 266.

Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002). Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Cognitive Brain Research, 14, 115-128.

O'Conaill, B., Wittaker, S., & Willbur, S. (1993). Conversations Over Videoconference: an Evaluation of the Spoken Aspects of Video-Mediated Communications. Human-computer interaction, 8, 389-428.

Odlyzko, A. M. (2000). Internet Growth: Myth and Reality, Use and Abuse. iMP: Information Impacts Magazine(November).

Oviatt, S., & Cohen, P. R. (2000). Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions. Human Computer Interaction, 15, 263-322.

Pandey, P. C., Kunov, H., & Abel, S. (1986). Disruptive effects of auditory signal delay on speech perception with lipreading. The Journal of Auditory Research, 26, 27-41.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28(4), 377-379.

Pöppel, E. (1971). Oscillations as possible basis for time perception. Studium Generale, 24, 85-107.

Pöppel, E. (1978). Time Perception. In R. Held & H. Leibowitz & H.-L. Teuber (Eds.), Handbook of Sensory Physiology (Vol. VIII: Perception, pp. 713-729). Berlin: Springer.

Pöppel, E. (1986). Neuronal oscillations in the brain. Discontinuous initiations of pursuit eye movements indicate a 30-Hz temporal framework for visual information processing. Naturwissenschaften, 77, 289-291.

Pöppel, E. (1994). Temporal Mechanisms in Perception. International review of neurobiology, 37, 185-202.

Pöppel, E. (1997a). Grenzen des Bewusstseins. Frankfurt am Main: Insel Verlag.

Pöppel, E. (1997b). A hierarchical model of temporal perception. Trends in Cognitive Science, 1(2), 56-61.


Ranta-aho, M., Wilkins, M., & Egloff, P. (1998). JUPITER - Joint Usability, Performability and Interoperability Trials in Europe: EURESCOM: European Institute for Research and Strategic Studies in Telecommunications.

Röthlisberger, U. (1998). The Architecture of an Interactive Multimedia Communication System. PhD thesis, Swiss Federal Institute of Technology, Zurich.

Ruesch, J., & Bateson, G. (1951). Communication: The Social Matrix of Psychiatry. New York: W.W. Norton & Co.

Sanders, A. F. (1998). Elements of Human Performance: Reaction Processes and Attention in Human Skill. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Schwender, D., et al. (1994). Anaesthetic control of 40-Hz brain activity and implicit memory. Consciousness and Cognition, 3, 129-147.

Short, J., Williams, E., & Christie, B. (1976). The social psychology of telecommunication. London: Wiley.

Smith, R. L., Richetto, G. M., & Zima, J. P. (1972). Organizational behaviour: an approach to human communication. In R. W. Budd & B. D. Ruben (Eds.), Approaches to Human Communication (pp. 269-289). New York: Spartan Books.

Steinmetz, R. (1996). Human Perception of Jitter and Media Synchronization. IEEE Journal on Selected Areas in Communications, 14(1), 61-72.

Stern, L. W. (1897). Psychische Präsenzzeit. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 325-349.

Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652-654.

Stone, J. V., Hunkin, N. M., Porrill, J., Wood, R., Keeler, V., Beanland, M., Port, M., & Porter, N. R. (2001). When is now? Perception of simultaneity. Proceedings Biological Sciences: The Royal Society, 268, 31-38.

Stone, M. A., & Moore, B. C. (1999). Tolerable hearing aid delays. I. Estimation of limits imposed by the auditory path alone using simulated hearing losses. Ear and Hearing, 20(3), 182-192.

Summerfield, Q. (1992). Lipreading and audio-visual speech perception. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 335(1273), 71-78.

Thomas, E. A. C., & Brown, I. (1974). Time perception and the filled-duration illusion. Perception & Psychophysics, 16, 449-458.


Treisman, M., Faulkner, A., Naish, P., & Brogan, D. (1990). The internal clock: evidence for a temporal oscillator underlying time perception with some estimates of its characteristic frequency. Perception, 19(6), 705-743.

Treutwein, B. (1995). Adaptive Psychophysical Procedures. Vision Research, 35(17), 2503-2522.

Van Hoesel, R., Ramsden, R., & Odriscoll, M. (2002). Sound-direction identification, interaural time delay discrimination, and speech intelligibility advantages in noise for a bilateral cochlear implant user. Ear and Hearing, 23(2), 137-149.

Vaughan, H. G., & Arezzo, J. C. (1988). The neural basis of event-related potentials. In T. W. Picton (Ed.), Human Event-related Potentials, Handbook of Electroencephalography and Clinical Neurophysiology (Revised Series ed., Vol. 3, pp. 45-96). Amsterdam: Elsevier.

von Steinbüchel, N., Wittmann, M., & Pöppel, E. (1996). In M. A. Pastor & J. Artieda (Eds.), Time, Internal Clocks, and Movement (pp. 281-304): Elsevier.

Watzlawick, P., Bavelas, J. B., & Jackson, D. D. (1967). Pragmatics of Human Communication. New York: W.W. Norton Co.

Watzlawick, P., & Beavin, J. H. (1966). Einige formale Aspekte der Kommunikation. In B. Badura & K. Gloy (Eds.), Soziologie der Kommunikation: Eine Textauswahl zur Einführung. Stuttgart: Frommann.

Wearden, J. H., & Bray, S. (2001). Scalar timing without reference memory? Episodic temporal generalization and bisection in humans. The Quarterly Journal of Experimental Psychology, 54B(4), 289-309.

Wearden, J. H., Philpott, K., & Win, T. (1999). Speeding up and (... relatively...) slowing down an internal clock in humans. Behavioural Processes(46), 63-73.

Weidenmann, B. (1988). Psychische Prozesse beim Verstehen von Bildern. Bern: Verlag Hans Huber.

Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Kaufman & J. P. Thomas (Eds.), Handbook of Perception and Human Performance, Sensory Processes and Perception (Vol. 1, pp. 1-36). New York: Wiley.

Welford, A. T. (1980). Choice Reaction Time: Basic Concepts. In A. T. Welford (Ed.), Reaction Times (pp. 73-128). New York: Academic Press.

Wilkins, M., & Tuominen, J. (1998). Recommended Network Parameter Values for Acceptability Tests: EURESCOM: European Institute for Research and Strategic Studies in Telecommunications.


Wilson, G., & Sasse, M. A. (2000). Do Users Always Know What's Good For Them? Utilising Physiological Responses to Assess Media Quality. In S. McDonald & Y. Waern & G. Cockton (Eds.), People and Computers XIV - Usability or Else! Proceedings of HCI 2000 (pp. 327-339). Sunderland, UK: Springer.

Witherspoon, D., & Allan, L. G. (1985). Time judgments and the repetition effects in perceptual identification. Memory and Cognition, 13, 101-111.

Yamaguchi, H., Wada, M., & Yamamoto, H. (1986). A 64 kbit/s Integrated Visual Communication System - New Communication Medium for the ISDN. IEEE Selected Areas in Communication.

Zakay, D., & Block, R. A. (1998). New Perspective on Prospective Time Estimation. In V. De Keyser & G. Ydewalle & A. Vandierendonck (Eds.), Time and the Dynamic Control of Behavior. Hogrefe & Huber.

Zuberbühler, H. J. (2002). Rapid Evaluation of Perceptual Thresholds - The Best-Pest Calculator: A web-based application for non-expert users. [URL illegible in source]

Zuberbühler, H. J., Krueger, H., & Kündig, A. (2003). Delay Perception Thresholds in Human-Computer Interaction: Fundamentals for CSCW-Applications. GfA - XVII International Annual Occupational Ergonomics and Safety Conference, Munich.

Zuberbühler, H. J., Ruegg, S., Krueger, H., & Kündig, A. (2002). Intermedia Synchronisation in Network Design: Using an Adaptive Psychophysical Method to Specify the Perceivable Audio-Visual Delay. WWDU 2002 - Work With Display Units: World Wide Work, Berchtesgaden.

Zwicker, E., & Feldtkeller, R. (1967). Das Ohr als Nachrichtenempfänger. Stuttgart: S. Hirzel Verlag.

Glossary

2

2AFC 2-Alternative-Forced-Choice. See Forced-Choice procedure.

A

Absolute Threshold Minimal detectable amount of stimulation.

Application In our context, application describes what kind of processes the end user is trying to support when using services of a public network. This interpretation is purposely wider than the meaning of application program running on some computer, e.g. application may also mean that a phone call is made for some specific purpose.

ATM Asynchronous Transfer Mode: High speed packet switching technology using small packets (cells) of fixed size (48 data + 5 header = 53 bytes). ATM is also known as fast packet.

B

Bandwidth Technically, the difference, in Hertz (Hz), between the highest and lowest frequencies of a transmission channel. However, as typically used, the amount of data that can be sent through a given communications circuit.

Best-PEST see PEST

Bit rate Number of binary digits that the network is capable of accepting and delivering per unit of time.

BPS Bits per Second: A measure of the data transfer rate of the data channel.

C

Circuit Switched Mode Operational mode of a telecommunication network where connections are set up from an end system A to any other end system B, with network resources reserved in the network for this connection along a fixed path. Within network nodes, a very low delay link is dedicated to each connection, and a fixed bandwidth (bit rate) is reserved on each link participating in a connection.

Client A computer system or program which communicates with another such which provides special services (e.g. a workstation requesting the contents of a file from a file server is a client of the file server).

Codec Beginning and end point of a videoconferencing system. Codec is an acronym for compression decompression, compressor decompressor, or coder decoder. A codec compresses its video and audio input using computed algorithms. The compressed signal is adapted for transmission over a particular network.

Compression Mapping sets of bits produced by a source into a smaller number of bits to be transmitted. With compression, the original information content may be retained (so-called lossless compression) or reduced (so-called lossy compression). At the receiving side, suitable decompression algorithms restore the original information as far as feasible.

CSCW Computer-Supported Cooperative Work applications enable real-time collaboration among geographically-distributed work group members. They typically include file transfer, chat, shared whiteboard, application sharing, voice, and video.

Ctrl (Control) A key on a terminal or computer keyboard which modifies the effect of other (letter, number and some other) keys, in a similar way that the Shift key makes letter keys generate capital letters.

D

Difference Threshold Smallest detectable difference between two stimuli, the just noticeable difference (also called jnd or Differenz Limen DL).

DVD Digital Versatile Disc is an optical disc technology that holds 4.7 gigabytes of information on one of its two sides, or enough for a 133-minute movie. With two layers on each of its two sides, it will hold up to 17 gigabytes of video, audio, or other information. DVD uses the MPEG-2 file and compression standard.

F

Forced Choice Procedure The observer is given two or more observation intervals, one of which contains a signal. The observer is required to choose which observation interval contained the signal.

H

HCI Human-Computer Interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.

HHI In our context, Human-Human Interaction concerns information exchange between two or more users, over an intermediary computer and/or communication network.

I

IEEE Institute of Electrical and Electronics Engineers (US): Professional society, which sets standards.

Internet The global collection of interconnected regional and wide-area networks, which use IP as the network layer protocol.

IP Internet Protocol: The network layer, which describes a packet format for data to pass on a TCP/IP network and on the Internet. It is a connectionless, best-effort packet switching protocol.

ISDN Integrated Services Digital Network: A switched digital network operating in circuit-switched mode. International standard for digital phone and other services.

L

LAN Local Area Network: A network spanning a small physical area (e.g. building or campus) and operating at high speed (typically 10 - 100 Mbit/sec).

Layer Communication networks for computers may be organized as a set of more or less independent protocols, each in a different layer (or level). The lowest layer governs direct host-to-host communication between the hardware at different hosts; the highest consists of user applications. For each layer, programs at different hosts use protocols appropriate to the layer to communicate with each other. TCP/IP has five layers of protocols; OSI has seven. OSI layers:

    physical converts data bits (1s and 0s) into electrical (or optical) signals (specifying signal levels and timing) to allow transfer of data across parts of a network

    data link frames data into packets and checks the data transferred by level 1 to correct transmission errors (or retransmit lost data), and control the speed and direction of flow of data between end points of the network

    network controls addressing and routing of data through the network, controlling congestion, negotiating packet sizes and protocols between networks, and accounting and billing for data transferred

    transport provides end-to-end data transport between users or processes on different machines, interfacing with the network layer to present network connections of appropriate types to the higher layers (e.g. an error-corrected point-to-point channel, transport of messages without guaranteed delivery, or broadcasting of messages to multiple destinations)

    session allows higher layers to establish sessions across end-to-end transport links, controlling the direction of communications, providing tokens to regulate operations carried out across the link, and synchronising operations

    presentation performs conversion of data between end-systems' internal representations (e.g. ASCII or EBCDIC coding for characters, one's complement and two's complement representation of numbers etc) and abstract data structures, enabling interchange of data between different systems; and data compression and encryption

    application converts between specific characteristics of end-systems' hardware and software and virtual models, enabling applications to run between different systems (e.g. general file-transfer protocols use a model of a file system which is mapped into specific systems' representations of files' names, format, encoding etc; similarly for email, directory lookup, remote job entry, terminal emulation etc)

M

Method of Least Squares Method for determining particular parameters of a predefined function that best fits a set of data points in which, for each point, the Y value of the point is plotted as a function of its X value. The method minimises the sum of the squared deviations of the Y values from the drawn function.
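As an illustration of this entry, a minimal Python sketch fits a straight line to a handful of points by minimising the sum of squared Y deviations. The choice of a straight line as the predefined function, the data points, and the name `least_squares_line` are assumptions made for the example.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimising the sum of squared Y deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```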

Maximum Likelihood Methods: Adaptive procedures for measuring thresholds in which the intensity of the stimulus presented on each trial is determined by a statistical estimation of the observer's threshold that is made from all of the results obtained from the beginning of the test run.

MPEG: Moving Picture Experts Group. Develops standards for digital video and digital audio compression. It operates under the auspices of the International Organization for Standardization (ISO). The MPEG standards are an evolving series, each designed for a different purpose. MPEG-2 images have four times the resolution of MPEG-1 images and can be delivered at 60 interlaced fields per second, where two fields constitute one image frame. (MPEG-1 can deliver 30 noninterlaced frames per second.)

Monte-Carlo Simulation: A computer simulation with a built-in random process, allowing for testing different possible outcomes of a hypothesized model.

MPLS: MultiProtocol Label Switching. A data transfer mode blending the characteristics of IP and ATM. For a detailed description see e.g. (Armitage, 2000).
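To illustrate the Monte-Carlo Simulation entry above, the sketch below simulates a hypothetical observer who answers 'YES' with a fixed probability on every trial and estimates the resulting positive response rate. The function name, the fixed-probability observer model and the seed handling are assumptions made for this example only.

```python
import random


def simulate_positive_response_rate(p_yes, n_trials, seed=0):
    """Estimate the rate of 'YES' answers of a simulated observer.

    p_yes is the assumed probability of a 'YES' answer on a single trial.
    A seeded generator keeps the run reproducible.
    """
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must lie between 0 and 1")
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so the comparison is true
    # with probability p_yes.
    yes_count = sum(1 for _ in range(n_trials) if rng.random() < p_yes)
    return yes_count / n_trials


# The estimate approaches p_yes as the number of simulated trials grows.
for n in (10, 100, 10000):
    print(n, simulate_positive_response_rate(0.75, n, seed=1))
```

Repeating such runs with different seeds shows how much an estimated rate can scatter for a given number of trials.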

N

nAFC: n-Alternative Forced Choice. Psychophysical testing paradigm in which the experimental subject is forced to choose in which of n possibilities the stimulus lies.

Network: A set of interconnected computers, peripherals and terminals. Its purpose is to enable each computing service to be accessed from other computers and terminals. Consists of an ensemble of switching nodes and transmission links; for mobile services, it includes all entities supporting mobile end-systems roaming through different cells of the network or even moving from one administrative domain to another.

Network service: An application available on a network, e.g. electronic mail, file transfer, job transfer or interactive terminal connection.

N-ISDN: Narrowband ISDN. Two 64 Kbps channels plus one 16 Kbps signalling channel.

O

OSI: Open Systems Interconnection. A model developed by ISO (International Organization for Standardization) to allow computer systems made by different vendors to communicate with each other. The goal of OSI is to create a worldwide open systems networking environment where all systems can interconnect.

OSI reference model: ISO model for communication between equipment and networks - the famous 7-layer model.

P

Packet: A block of information with a defined format containing control information and data. "Packet" is a generic term used to describe units of data at all levels of the protocol stack, but it is most correctly used to describe application data units.

Packet Switched Mode: Operational mode of a telecommunication network where information is conveyed in packets of constant or variable length, with packets undergoing temporary storage in nodes. The resources both within nodes and on links are allocated dynamically, such that, on a statistical basis, a better resource utilization is achieved for bursty traffic. There are two variants of packet mode: (1) with connectionless operation, no network resources are reserved for a particular end-user, i.e. the network is operating in a so-called best-effort mode (no QoS guarantee); (2) with connection-oriented operation, network resources are reserved for a so-called virtual connection such that some QoS guarantees (such as sustainable bit rate or limited delay) can be given.

Perception: The interpretation of sensory information to produce an internal representation of the world.

PEST: Parameter Estimation by Sequential Trials. Adaptive psychophysical testing method.

Positive response rate ψ: Rate of 'YES' answers in the yes-no paradigm, or rate of correct answers in the forced-choice paradigm.

POTS: Plain Old Telephone Service. The service provided by the conventional analogue telephone network, i.e. circuit-switched analogue connections with a bandwidth of 3.1 kHz. Its digital equivalent is provided by ISDN.

Pragmatics: The study of language seen in relation to its users; a branch of semiotics.

Protocol: A formal description of message formats and the rules two computers must follow to exchange those messages. Protocols can describe low-level details of machine-to-machine interfaces (e.g. the order in which bits and bytes are sent across a wire) or high-level exchanges between application programs (e.g. the way in which two programs transfer a file across the Internet).

Q

QoS: Quality of Service. Formal definition of quality for some specific telecommunication service, using specific parameters. A certain QoS may be agreed by a network user and the network operator at different instances and for different durations, i.e. its validity may be limited to a connection (or even only part thereof), or it may be the subject of a so-called service level agreement. For a detailed description see (Fluckiger, 1995).

R

Response Bias: A tendency for the observer to favour one response over another, which is determined by factors other than the intensity of the stimulus.

Retinotopy: The notion that receptor cells in the retina are mapped to points, e.g. on the surface of the visual cortex.

Return trip delay: The elapsed time between the emission of the first bit of a data block and its reception by the same end-system after the block has been echoed by the destination end-system.

S

Semantics: The study of meanings; a branch of semiotics.

Semiotics: The science of signs and/or sign systems.

Sensation: Process of detecting a stimulus or some aspect of it.

SMS: Short Message Service. An e-mail service with very limited capabilities offered in the framework of the GSM mobile phone system.

Source Coding: Bringing the raw information produced by a source into a form suitable for transmission. Usually involves A/D conversion and may involve compression.

Syntax: The rules by which signs are combined to make statements; a branch of semiotics.

T

Threshold: In our context, the term threshold describes what elsewhere is referred to as Empirical or Statistical Threshold: the intensity of a stimulus required for a specified level of performance by an observer. Examples are the intensity of the stimulus corresponding to reporting the stimulus 50% of the time in the yes-no paradigm, or correctly detecting the stimulus 75% of the time in a 2AFC paradigm. See also Absolute Threshold and Difference Threshold.
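The two examples in the Threshold entry (50% in yes-no, 75% in 2AFC) can be sketched with a psychometric function. The logistic shape, the parameter names `alpha`, `beta` and `guess_rate`, and the numeric values below are assumptions chosen for illustration; they are not the parameterisation used in the thesis.

```python
import math


def psychometric(intensity, alpha, beta, guess_rate):
    """Positive response rate at a given stimulus intensity.

    alpha: location of the curve, beta: spread (> 0),
    guess_rate: lower asymptote (0.0 for yes-no, 0.5 for 2AFC).
    """
    z = (intensity - alpha) / beta
    # Guard against overflow of exp() for intensities far below alpha.
    core = 1.0 / (1.0 + math.exp(-z)) if z > -700 else 0.0
    return guess_rate + (1.0 - guess_rate) * core


def threshold(level, alpha, beta, guess_rate):
    """Intensity at which the positive response rate equals `level`."""
    if not guess_rate < level < 1.0:
        raise ValueError("level must lie strictly between guess_rate and 1")
    core = (level - guess_rate) / (1.0 - guess_rate)
    return alpha + beta * math.log(core / (1.0 - core))


# Hypothetical curve located at an intensity of 120 (arbitrary units).
print(threshold(0.50, alpha=120.0, beta=15.0, guess_rate=0.0))  # yes-no
print(threshold(0.75, alpha=120.0, beta=15.0, guess_rate=0.5))  # 2AFC
```

With this parameterisation both conventions pick out the same point of the curve, its midpoint between the guess rate and 1.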

TCP/IP: Usually refers to the suite of transport and application protocols, especially TCP, which run over IP.

Throughput: see bit rate

U

UMTS: Universal Mobile Telecommunications System. UMTS is one of the Third Generation mobile systems being developed within the framework defined by the International Telecommunication Union (ITU) and known as IMT-2000. It seeks to build on and extend the capability of today's mobile, cordless and satellite technologies by providing increased capacity, data capability and a greater range of services.

Underflow: A condition that can occur when the result of a floating-point operation would be smaller in magnitude than the smallest quantity representable. Underflow is actually negative overflow of the exponent. For example, a result smaller than 10^-128 would cause underflow.
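The following sketch shows underflow in Python, whose floats are IEEE 754 double-precision numbers; the limit is therefore far below the 10^-128 quoted in the entry, which refers to a different number format. Multiplying many small probabilities is used here as a hypothetical trigger, and summing logarithms is shown as one common way around it. The function names and the probabilities are assumptions for illustration.

```python
import math
import sys


def naive_likelihood(probabilities):
    """Multiply probabilities directly; may underflow to 0.0."""
    product = 1.0
    for p in probabilities:
        product *= p
    return product


def log_likelihood(probabilities):
    """Sum of logarithms; stays representable where the product does not."""
    return sum(math.log(p) for p in probabilities)


# Hypothetical per-trial probabilities; their true product is 1e-500.
trial_probabilities = [1e-5] * 100

print(sys.float_info.min)                     # smallest positive normalised double
print(naive_likelihood(trial_probabilities))  # product is no longer representable
print(log_likelihood(trial_probabilities))    # finite negative number
```

Working on a logarithmic scale keeps the comparison between candidate values meaningful even when the raw product has collapsed to zero.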

V

VoIP: Voice over IP. Sometimes called Internet telephony, IP telephony, or Voice over the Internet (VOI). A category of hardware and software that enables people to use the Internet as the transmission medium for telephone calls. For users who have free or fixed-price Internet access, Internet telephony software essentially provides free telephone calls anywhere in the world. There are many Internet telephony applications available. Some come bundled with popular Web browsers, others are stand-alone products.

W

Weber's Law: States that the size of the just noticeable difference (see also Difference Threshold) is a constant proportion of the original stimulus value.
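Weber's Law can be written as ΔI = k · I, where I is the original stimulus value, ΔI the just noticeable difference and k the constant proportion. The short sketch below applies this relation; the function names and the value k = 0.1 are hypothetical and serve only as an example.

```python
def just_noticeable_difference(intensity, weber_fraction):
    """Just noticeable difference predicted by Weber's Law (delta_I = k * I)."""
    if intensity < 0 or weber_fraction <= 0:
        raise ValueError("intensity must be >= 0 and weber_fraction > 0")
    return weber_fraction * intensity


def is_noticeable(reference, comparison, weber_fraction):
    """True if the comparison differs from the reference by at least one JND."""
    jnd = just_noticeable_difference(reference, weber_fraction)
    return abs(comparison - reference) >= jnd


# With an assumed k of 0.1, the JND grows in proportion to the stimulus value.
for reference in (10.0, 100.0, 1000.0):
    print(reference, just_noticeable_difference(reference, 0.1))
```

The same absolute change can thus be noticeable against a weak stimulus and unnoticeable against a strong one.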

WAN: Wide Area Network. Network extending over large distances/areas (typically 10-1000 km) and operating at relatively slow speeds (10 kbit/s to 10 Mbit/s).

WWW: World Wide Web. Hypertext-based distributed information service, created by researchers at CERN in Switzerland. WWW uses the HyperText Markup Language (HTML) for its formatting, and interfaces for various systems are available. Users may create, edit or browse hypertext documents.

Y

yes-no: Psychophysical testing paradigm in which an experimental subject has to answer after each presentation whether she/he detects the stimulus. The presentations contain a predefined percentage of stimuli.


Index

2AFC 54, 66, 77, 121
30-ms-hypothesis 47
3-seconds-hypothesis 45
accumulator 52
action potential 49
adaptive psychophysical procedure 55, 59
affective judgement 20
ambiguity 29
amplifying principle 11
anaesthesia 48
ARES 75
arousal 52
ATM 9, 74, 75, 121
attention 45, 52, 96
attribution 20
audiometer Bosch ST10 68
axon 49
backchannel 95
background noise 68
background traffic 75
bandwidth 10, 121
beat frequency 50
best-PEST 55, 60, 66, 77, 78
bit rate 10, 121
brain activity 42
CCD-Camera 76
circadian rhythm 48
circuit-switched 1, 9
    definition of 122
classic conditioning 52
client 75, 101, 122
cochlear nucleus 49
coding 29, 38
    non-verbal 31, 38
    verbal 30, 38
collectivism 24
communication 23
    audio-visual 15, 39
    business 13, 26
    dialog 34
    face-to-face 13, 39
    formal 26
    informal 26
    interactive 34
    interpersonal 24
    layered model of 24
    multimodal 32, 97
    private 13
    taxonomy of 23
comparator 53
compression 10, 15, 122
content provider 85
cortex
    auditory 39, 42, 88
    visual 39, 42, 88
CSCW 13, 122
culture 24
decision criterion 66
degree of interactivity 77, 94, 96
de-interlacing 76
delay 15
    absolute 3, 18, 35, 36, 39, 89
    interaural 49
    relative 3, 17, 37, 39, 85
    return trip 3, 127
    round trip 3, 35, 36
    transit 36
digitiser 76
distractor 95
distribution
    cumulative normal 59
    gaussian 53, 87, 90
    logistic 59, 90
    log-normal 90
    Poisson 90
    response 47, 51
    right-skewed 90
    Weibull 59
DVD 76, 122
electroencephalography (EEG) 41

ETHMICS 75
event-related potential (ERP) 41, 88
explicit handover 95
facilitation
    cross-modal 41
    intensity 41
false alarm 55
false negative 55, 66, 104
false positive 55, 66, 104
forced-choice 54, 123
formality 26, 37
graphics-card 76
human-computer-interaction (HCI) 3, 65, 85, 89
    definition of 123
human-human-interaction (HHI) 3, 74, 91
    definition of 123
individualism 24
inflection point 57
information
    prosodic 15
    temporal 52
interaction
    basic auditory 74, 77, 91
    basic visual 74, 77, 91
    click-visual 89
    realistic audio-visual 74, 78
    realistic auditory 74, 78
    synchronous 43
    unit of 34
    voice-visual 89
intermedia synchronisation 3, 37
interruption 95
inverse function 86
IP 1, 9, 123
ISDN 1, 9, 123
isometric perspective 44
JPEG-decoder 76
JPEG-encoder 76
labelled lines 49
LAN 18, 123
Landolt-rings acuity chart 68
Lingo 66
lip synchronisation 3, 37
logarithmic transformation 62
logistic 56, 87
Macromedia 66, 101

man-machine interaction 13
marginal utility 110
masking principle 11
maximum likelihood 59, 125
media richness 95
memory
    long-term 53
    reference 53
    short-term 47, 53
    working 53
mental construct system 20
method of least squares 70, 79, 82, 124
miss 55
modality 31, 39, 53
    auditory 39
    visual 39
mode 66, 102
Monte-Carlo simulation 107, 125
MPEG-2 76, 125
MPLS 9, 125
n-alternative-forced-choice (nAFC) 54, 102
Necker cube 45
network 1, 10, 74, 125
network planner 85
network service 10, 15, 125
neural networks 50
neuron 49, 50
number of turns 95
orientation 27, 38
    content 28
    non-person 27, 38
    person 27, 38
    relationship 28
oscillations
    neuronal 47
OSI 24, 39, 125
pacemaker 50, 52, 53
pacemaker-switch-accumulator 52
packet-switched 1, 9
    definition of 126
pattern recognition 13, 48
perception 126
perceptual store 52
population clocks 50
population models 50
POTS 11, 126
pragmatics 31, 126
present
    abstract connotation of 44
    subjective 44

pre-test 67, 104
processing
    high frequency 46
    low frequency 44
    microsecond 49
processing time
    auditory 40, 87
    visual 40, 87
psychometric function 55
    elevation 61
    scaling 61
psychophysics 40
    definition 19
    theory 54
pulse 52, 53
QED 11
Quality of Service 1, 10
    definition of 127
reaction time 40
    choice 41
    recognition 41
    simple 41
response
    correct 54
    positive 55, 126
    yes 55
response bias 90, 127
retinotopy 48, 127
sampling point 61
scaling 76
semantics 31, 127
semiotics 31, 127
sensation 31, 127
service provider 85
shared flat 78
sink 35
slope 57, 104
smallest step size 66, 103
SMS 10, 127
social context 26, 37
source
    coding of 10, 36, 127
    decoding of 36
start value 66, 103
step function 58
stimulus 54
    absence of 54
    auditory 53
    filled 53
    intensity of 53, 55, 56

    moving 53
    offset of 67
    onset of 67
    order 67
    presence of 54
    producer of 74
    receiver of 74
    static 53
    visual 53, 68, 88
stochastic observer 107
superior olivary complex 49
switch 52, 53
    ATM 75
    ethernet 75
SYMLOG 28
synchronisation error 3
syntax 31, 127
telesurgery 85
temporal information processing (TIP) 52
temporal pattern 14
temporal reproduction 45
termination criterion 66, 103
threshold 128
    absolute delay 65, 74
    acceptance 14, 83
    definition of 56
    difference 92, 103, 122
    hearing 68
    measuring 54, 59
    perception 14, 83
    relative delay 65
    temporal order 46
throughput 15, 128
time
    magnitudes of 48
    perception of 48
timing 34, 39
    asynchronous 34
    synchronous 34
topographic map 49
trigger
    mouse 69
    vocal 68
t-test 70, 79, 82
two-alternative forced-choice 54
UDP 75
UMTS 13, 31, 128
underflow 62, 128
videoconference 1, 37, 38, 75, 78
visual acuity 68
voice-over-IP 1, 128
WAN 18, 129
Weber's Law 51, 128
WWW 13, 129
yes-no 54, 78, 102
    definition of 129

About the Author

Hans-Jörg Zuberbühler was born on 11 February 1968 in St. Gallen, Switzerland. After primary school, he completed an apprenticeship at the Swiss Federal Laboratories for Materials Testing and Research (EMPA) in St. Gallen. After some years of industrial experience, he studied environmental sciences and ergonomics at the Swiss Federal Institute of Technology (ETH) in Zurich. In 1999 he received a master's degree with a thesis about human motion perception and its impact on the acquisition of procedural knowledge. Since then he has been employed as a research assistant at the Institute for Hygiene and Applied Physiology at the ETH. His research interests comprise the fields of cognitive ergonomics, sensory physiology and psychophysics, as well as methodological issues.
